feat(whatsapp): add WhatsApp Business Cloud API adapter

Add an official, production-grade WhatsApp integration via Meta's Business Cloud API as a complement to the existing Baileys bridge. There is no bridge subprocess, no QR code, and no account-ban risk. The cost is a Meta Business account and a public HTTPS webhook URL.

Setup is fully wizard-driven: 'hermes whatsapp-cloud' walks through every credential with paste-time validation (catching the #1 trap of pasting a phone number into the Phone Number ID field), generates a verify token, and ends with copy-paste instructions for the cloudflared, Meta-dashboard, and Business Manager steps that can't be automated. The wizard also points users at Meta's Business Manager for setting the bot's display name and profile picture.

Feature set:
- Inbound: text, images (with native-vision routing), voice notes (STT), documents (small text files inlined, larger files cached), and reply context.
- Outbound: text with WhatsApp-flavored markdown conversion, images, videos, documents, and Opus voice notes via ffmpeg with an MP3 fallback.
- Native interactive buttons for clarify, dangerous-command approval, and slash-command confirmation flows, matching the Telegram / Discord UX and gracefully degrading to plain text.
- Read receipts (blue double checkmarks) and a typing indicator, sent through Meta's combined endpoint so both fire in a single API call.
- Webhook security: X-Hub-Signature-256 HMAC verification (computed over the raw body, compared in constant time), wamid deduplication, and refusal of group-shaped messages (groups are deferred to v2; Baileys still covers them).
- Full integration with the gateway's session, cron, display-tier, prompt-hint, and auth-allowlist systems.

Cloud and Baileys can run side by side against different phone numbers.

Also wires STT (speech-to-text) through Nous's managed audio gateway for Nous subscribers. Previously, the default stt.provider=local required a separate faster-whisper install; new subscribers now get voice-note transcription out of the box.
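The webhook-security item in the feature list can be sketched in Python. This is a minimal illustration, not the adapter's actual code: it assumes Meta's documented `X-Hub-Signature-256: sha256=<hex digest>` header, carrying an HMAC-SHA256 of the raw request body keyed with the app secret. The `verify_signature` and `WamidDeduper` names and the 10,000-entry capacity are hypothetical.

```python
import hashlib
import hmac
from collections import OrderedDict

SIGNATURE_PREFIX = "sha256="


def verify_signature(raw_body: bytes, header_value: str, app_secret: str) -> bool:
    """Return True if X-Hub-Signature-256 matches an HMAC-SHA256 of raw_body.

    raw_body must be the exact bytes received: re-serialising parsed JSON can
    change whitespace or key order and make a valid signature fail.
    """
    if not header_value or not header_value.startswith(SIGNATURE_PREFIX):
        return False
    received = header_value[len(SIGNATURE_PREFIX):].encode()
    expected = hmac.new(app_secret.encode(), raw_body, hashlib.sha256).hexdigest().encode()
    # compare_digest avoids content-based short-circuiting, so response
    # timing doesn't reveal how much of a forged digest was correct.
    return hmac.compare_digest(expected, received)


class WamidDeduper:
    """Hypothetical bounded memory of recently seen message IDs (wamids).

    Meta may redeliver the same webhook, so each wamid should be handled once.
    """

    def __init__(self, capacity: int = 10_000) -> None:
        self.capacity = capacity
        self._seen: OrderedDict[str, None] = OrderedDict()

    def first_time(self, wamid: str) -> bool:
        """Record wamid and return True only on its first appearance."""
        if wamid in self._seen:
            self._seen.move_to_end(wamid)  # keep recently repeated IDs warm
            return False
        self._seen[wamid] = None
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # evict the oldest entry
        return True
```

In a webhook handler, the signature check runs first on the untouched request bytes; only after it passes is the JSON parsed and each message's `id` (its wamid) passed through `first_time()`.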
Docs: a 418-line user guide at website/docs/user-guide/messaging/whatsapp-cloud.md, a sidebar entry, the environment-variables reference, and ADDING_A_PLATFORM.md updated with the optional interactive-UX contract for future adapter authors.

Tests: 100 dedicated tests for the adapter, 32 for the setup wizard, and 20 for the Nous subscription STT wiring, plus regression coverage across display_config, prompt_builder, and the cron scheduler.

Known limitations (deferred until there is a clear demand signal):
- Group chats: use the Baileys bridge if you need them.
- Message templates for sends outside the 24-hour conversation window: reactive chat is unaffected, but cron / delegate_task sends with gaps > 24h will fail with a clear error. The agent's system prompt warns the model about this so it knows to mention it when scheduling delayed messages.
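The template limitation follows from WhatsApp's 24-hour customer-service window: free-form (non-template) messages can only be sent within 24 hours of the user's last inbound message. The sketch below shows the kind of pre-send check that turns a doomed delayed send into a clear error. It assumes the adapter tracks a timezone-aware timestamp of each user's last inbound message; `OutsideServiceWindow` and `check_service_window` are hypothetical names, not the adapter's API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Free-form (non-template) messages are only allowed inside this window.
SERVICE_WINDOW = timedelta(hours=24)


class OutsideServiceWindow(RuntimeError):
    """Raised when a free-form send would fall outside the service window."""


def check_service_window(last_inbound: Optional[datetime], now: Optional[datetime] = None) -> None:
    """Raise OutsideServiceWindow unless the user wrote within the last 24h.

    last_inbound is the (timezone-aware) time of the user's most recent
    message to this number, or None if they have never written.
    """
    now = now or datetime.now(timezone.utc)
    if last_inbound is None:
        raise OutsideServiceWindow(
            "This user has never messaged the bot; only template messages can be sent."
        )
    elapsed = now - last_inbound
    if elapsed > SERVICE_WINDOW:
        hours = elapsed.total_seconds() / 3600
        raise OutsideServiceWindow(
            f"Last inbound message was {hours:.1f}h ago, outside WhatsApp's 24-hour "
            "window; free-form messages are rejected without an approved template."
        )
```

A cron or delegate_task delivery would call this before sending and surface the exception text, rather than letting the Cloud API reject the request with a less readable error.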
feat(tui): mouse_tracking DEC mode presets (salvage of #26681) (#30084)
2026-05-23 01:07:01 -04:00 · 2026-05-21 20:25:52 -05:00 · 2026-05-22 10:38:32 +10:00 · 2026-05-21 17:38:19 -07:00 · 2026-05-21 17:38:19 -07:00 · 2026-05-21 17:38:19 -07:00
1557 changed files with 217895 additions and 49928 deletions
--- a/.env.example
+++ b/.env.example
@@ -14,6 +14,14 @@
 # LLM_MODEL is no longer read from .env — this line is kept for reference only.
 # LLM_MODEL=anthropic/claude-opus-4.6

+# =============================================================================
+# LLM PROVIDER (NovitaAI)
+# =============================================================================
+# NovitaAI — 90+ models, pay-per-use
+# Get your key at: https://novita.ai/settings/key-management
+# NOVITA_API_KEY=
+# NOVITA_BASE_URL=https://api.novita.ai/openai/v1  # Override default base URL
+
 # =============================================================================
 # LLM PROVIDER (Google AI Studio / Gemini)
 # =============================================================================
@@ -143,6 +151,18 @@
 # Also requires ~/.honcho/config.json with enabled=true (see README).
 # HONCHO_API_KEY=

+# =============================================================================
+# HYPERLIQUID OPTIONAL SKILL
+# =============================================================================
+# Optional defaults for the Hyperliquid skill in optional-skills/blockchain/hyperliquid
+#
+# Hyperliquid API base URL override
+# Default: https://api.hyperliquid.xyz
+# HYPERLIQUID_API_URL=https://api.hyperliquid-testnet.xyz
+#
+# Default address for account-level commands like state, fills, orders, and review
+# HYPERLIQUID_USER_ADDRESS=0x0000000000000000000000000000000000000000
+
 # =============================================================================
 # TERMINAL TOOL CONFIGURATION
 # =============================================================================
@@ -261,6 +281,27 @@ BROWSER_SESSION_TIMEOUT=300
 # Browser sessions are automatically closed after this period of no activity
 BROWSER_INACTIVITY_TIMEOUT=120

+# Extra Chromium launch flags passed to agent-browser, comma- or newline-separated.
+# Hermes auto-injects "--no-sandbox,--disable-dev-shm-usage" when it detects root
+# or AppArmor-restricted unprivileged user namespaces (Ubuntu 23.10+, DGX Spark,
+# many container images), so leave this unset unless you need extra flags.
+# Setting this disables the auto-injection.
+# AGENT_BROWSER_ARGS=--no-sandbox
+
+# Camofox local anti-detection browser (Camoufox-based Firefox).
+# Set CAMOFOX_URL to route the browser tools through a local Camofox server
+# instead of agent-browser/Browserbase. See docs/user-guide/features/browser.md.
+# CAMOFOX_URL=http://localhost:9377
+
+# Externally managed Camofox sessions — when another app owns the visible
+# Camofox browser, set these so Hermes shares the same userId/profile instead
+# of creating its own isolated session.
+# CAMOFOX_USER_ID=
+# CAMOFOX_SESSION_KEY=
+# Set to true to reuse an already-open Camofox tab for this identity before
+# creating a new one (useful for gateway restarts).
+# CAMOFOX_ADOPT_EXISTING_TAB=false
+
 # =============================================================================
 # SESSION LOGGING
 # =============================================================================
@@ -298,6 +339,7 @@ BROWSER_INACTIVITY_TIMEOUT=120
 # TELEGRAM_ALLOWED_USERS=                  # Comma-separated user IDs
 # TELEGRAM_HOME_CHANNEL=                   # Default chat for cron delivery
 # TELEGRAM_HOME_CHANNEL_NAME=              # Display name for home channel
+# TELEGRAM_CRON_THREAD_ID=                 # Forum topic ID for cron deliveries; overrides TELEGRAM_HOME_CHANNEL_THREAD_ID for cron so replies work in topic mode

 # Webhook mode (optional — for cloud deployments like Fly.io/Railway)
 # Default is long polling. Setting TELEGRAM_WEBHOOK_URL switches to webhook mode.
@@ -353,24 +395,6 @@ IMAGE_TOOLS_DEBUG=false
 # CONTEXT_COMPRESSION_THRESHOLD=0.85      # Compress at 85% of context limit
 # Model is set via compression.summary_model in config.yaml (default: google/gemini-3-flash-preview)

-# =============================================================================
-# RL TRAINING (Tinker + Atropos)
-# =============================================================================
-# Run reinforcement learning training on language models using the Tinker API.
-# Requires the rl-server to be running (from tinker-atropos package).
-
-# Tinker API Key - RL training service
-# Get at: https://tinker-console.thinkingmachines.ai/keys
-# TINKER_API_KEY=
-
-# Weights & Biases API Key - Experiment tracking and metrics
-# Get at: https://wandb.ai/authorize
-# WANDB_API_KEY=
-
-# RL API Server URL (default: http://localhost:8080)
-# Change if running the rl-server on a different host/port
-# RL_API_URL=http://localhost:8080
-
 # =============================================================================
 # SKILLS HUB (GitHub integration for skill search/install/publish)
 # =============================================================================
--- a/.github/workflows/contributor-check.yml
+++ b/.github/workflows/contributor-check.yml
@@ -16,7 +16,7 @@ jobs:
  check-attribution:
    runs-on: ubuntu-latest
    steps:
-      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
        with:
          fetch-depth: 0  # Full history needed for git log

--- a/.github/workflows/deploy-site.yml
+++ b/.github/workflows/deploy-site.yml
@@ -35,7 +35,7 @@ jobs:
      name: github-pages
      url: ${{ steps.deploy.outputs.page_url }}
    steps:
-      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2

      - uses: actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020  # v4
        with:
@@ -43,7 +43,7 @@ jobs:
          cache: npm
          cache-dependency-path: website/package-lock.json

-      - uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065  # v5
+      - uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405  # v6.2.0
        with:
          python-version: '3.11'

--- a/.github/workflows/docker-publish.yml
+++ b/.github/workflows/docker-publish.yml
@@ -27,10 +27,11 @@ on:
 permissions:
  contents: read

-# Concurrency: push/release runs are NEVER cancelled so every merge gets its
-# own SHA-tagged image; :latest is guarded separately by the move-latest job.
-# PR runs reuse a PR-scoped group with cancel-in-progress: true so rapid
-# pushes to the same PR collapse to the latest commit.
+# Concurrency: push/release runs are NEVER cancelled so every merge gets
+# its own :main or release-tagged image.  :latest is guarded separately
+# by the move-latest job.  PR runs reuse a PR-scoped group with
+# cancel-in-progress: true so rapid pushes to the same PR collapse to the
+# latest commit.
 concurrency:
  group: docker-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: ${{ github.event_name == 'pull_request' }}
@@ -53,7 +54,7 @@ jobs:
      digest: ${{ steps.push.outputs.digest }}
    steps:
      - name: Checkout code
-        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
        with:
          submodules: recursive

@@ -64,7 +65,7 @@ jobs:
      # to gha with a per-arch scope; the push step below reuses every
      # layer from this build.
      - name: Build image (amd64, smoke test)
-        uses: docker/build-push-action@10e90e3645eae34f1e60eeb005ba3a3d33f178e8  # v6
+        uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f  # v7.1.0
        with:
          context: .
          file: Dockerfile
@@ -81,7 +82,7 @@ jobs:

      - name: Log in to Docker Hub
        if: github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release'
-        uses: docker/login-action@c94ce9fb468520275223c153574b00df6fe4bcc9  # v3
+        uses: docker/login-action@4907a6ddec9925e35a0a9e82d7399ccc52663121  # v4.1.0
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
@@ -91,14 +92,14 @@ jobs:
      # pattern for multi-runner multi-platform builds.
      #
      # We apply the OCI revision label here (and again on arm64) because
-      # the move-latest job reads it off the linux/amd64 sub-manifest config
-      # of `:latest` to decide whether it's safe to advance.  The label must
-      # be on each per-arch image — manifest lists themselves don't carry
-      # image config labels.
+      # the move-latest job reads it off the linux/amd64 sub-manifest
+      # config of the floating tag to decide whether it's safe to advance.
+      # The label must be on each per-arch image — manifest lists themselves
+      # don't carry image config labels.
      - name: Push amd64 by digest
        id: push
        if: github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release'
-        uses: docker/build-push-action@10e90e3645eae34f1e60eeb005ba3a3d33f178e8  # v6
+        uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f  # v7.1.0
        with:
          context: .
          file: Dockerfile
@@ -141,7 +142,7 @@ jobs:
      digest: ${{ steps.push.outputs.digest }}
    steps:
      - name: Checkout code
-        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
        with:
          submodules: recursive

@@ -152,7 +153,7 @@ jobs:
      # to gha with a per-arch scope; the push step below reuses every
      # layer from this build.
      - name: Build image (arm64, smoke test)
-        uses: docker/build-push-action@10e90e3645eae34f1e60eeb005ba3a3d33f178e8  # v6
+        uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f  # v7.1.0
        with:
          context: .
          file: Dockerfile
@@ -169,7 +170,7 @@ jobs:

      - name: Log in to Docker Hub
        if: github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release'
-        uses: docker/login-action@c94ce9fb468520275223c153574b00df6fe4bcc9  # v3
+        uses: docker/login-action@4907a6ddec9925e35a0a9e82d7399ccc52663121  # v4.1.0
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
@@ -177,7 +178,7 @@ jobs:
      - name: Push arm64 by digest
        id: push
        if: github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release'
-        uses: docker/build-push-action@10e90e3645eae34f1e60eeb005ba3a3d33f178e8  # v6
+        uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f  # v7.1.0
        with:
          context: .
          file: Dockerfile
@@ -207,8 +208,14 @@ jobs:
  # ---------------------------------------------------------------------------
  # Stitch both per-arch digests into a single tagged multi-arch manifest.
  # This is a registry-side operation — no building, no layer re-push —
-  # so it runs in ~30 seconds.  On main pushes it produces :sha-<sha>.
-  # On releases it produces :<release_tag_name>.
+  # so it runs in ~30 seconds.  On main pushes it produces :main; on
+  # releases it produces :<release_tag_name>.
+  #
+  # For main pushes the ancestor check runs BEFORE the manifest push so
+  # we never overwrite :main with an older commit.  The top-level
+  # concurrency group (`docker-${{ github.ref }}` with
+  # `cancel-in-progress: false`) already serialises runs per ref; the
+  # ancestor check is defense-in-depth.
  # ---------------------------------------------------------------------------
  merge:
    if: github.repository == 'NousResearch/hermes-agent' && (github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release')
@@ -216,8 +223,15 @@ jobs:
    needs: [build-amd64, build-arm64]
    timeout-minutes: 10
    outputs:
-      pushed_sha_tag: ${{ steps.mark_pushed.outputs.pushed }}
+      pushed_release_tag: ${{ steps.mark_release_pushed.outputs.pushed }}
+      release_tag: ${{ steps.tag.outputs.tag }}
    steps:
+      - name: Checkout code
+        if: github.event_name == 'push' && github.ref == 'refs/heads/main'
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
+        with:
+          fetch-depth: 1000
+
      - name: Download digests
        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093  # v4
        with:
@@ -229,30 +243,94 @@ jobs:
        uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f  # v3

      - name: Log in to Docker Hub
-        uses: docker/login-action@c94ce9fb468520275223c153574b00df6fe4bcc9  # v3
+        uses: docker/login-action@4907a6ddec9925e35a0a9e82d7399ccc52663121  # v4.1.0
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

-      # Compute the tag for this run.  Main pushes use sha-<sha> (so every
-      # commit gets its own immutable tag); releases use the release tag name.
+      # Read the git revision label off the current :main manifest, then
+      # use `git merge-base --is-ancestor` to check whether our commit is
+      # a descendant of it.  If :main doesn't exist yet, or its label is
+      # missing, we treat that as "safe to publish".  If another run
+      # already advanced :main past us (or diverged), we skip and leave
+      # it alone.
+      - name: Decide whether to move :main
+        if: github.event_name == 'push' && github.ref == 'refs/heads/main'
+        id: main_check
+        run: |
+          set -euo pipefail
+          image=nousresearch/hermes-agent
+
+          image_json=$(
+            docker buildx imagetools inspect "${image}:main" \
+              --format '{{ json (index .Image "linux/amd64") }}' \
+              2>/dev/null || true
+          )
+
+          if [ -z "${image_json}" ]; then
+            echo "No existing :main (or inspect failed) — safe to publish."
+            echo "push_main=true" >> "$GITHUB_OUTPUT"
+            exit 0
+          fi
+
+          current_sha=$(
+            printf '%s' "${image_json}" \
+              | jq -r '.config.Labels."org.opencontainers.image.revision" // ""'
+          )
+
+          if [ -z "${current_sha}" ]; then
+            echo "Registry :main has no revision label — safe to publish."
+            echo "push_main=true" >> "$GITHUB_OUTPUT"
+            exit 0
+          fi
+
+          echo "Registry :main is at ${current_sha}"
+          echo "This run is at      ${GITHUB_SHA}"
+
+          if [ "${current_sha}" = "${GITHUB_SHA}" ]; then
+            echo ":main already points at our SHA — nothing to do."
+            echo "push_main=false" >> "$GITHUB_OUTPUT"
+            exit 0
+          fi
+
+          if ! git cat-file -e "${current_sha}^{commit}" 2>/dev/null; then
+            git fetch --no-tags --prune origin \
+              "+refs/heads/main:refs/remotes/origin/main" \
+              || true
+          fi
+
+          if ! git cat-file -e "${current_sha}^{commit}" 2>/dev/null; then
+            echo "Registry :main points at an unknown commit (${current_sha}); refusing to overwrite."
+            echo "push_main=false" >> "$GITHUB_OUTPUT"
+            exit 0
+          fi
+
+          if git merge-base --is-ancestor "${current_sha}" "${GITHUB_SHA}"; then
+            echo "Our commit is a descendant of :main — safe to advance."
+            echo "push_main=true" >> "$GITHUB_OUTPUT"
+          else
+            echo "Another run advanced :main past us (or diverged) — leaving it alone."
+            echo "push_main=false" >> "$GITHUB_OUTPUT"
+          fi
+
+      # Compute the tag for this run.  Main pushes tag directly as :main
+      # (no per-commit SHA tags); releases use the release tag name.
      - name: Compute tag
        id: tag
        run: |
          if [ "${{ github.event_name }}" = "release" ]; then
            echo "tag=${{ github.event.release.tag_name }}" >> "$GITHUB_OUTPUT"
          else
-            echo "tag=sha-${{ github.sha }}" >> "$GITHUB_OUTPUT"
+            echo "tag=main" >> "$GITHUB_OUTPUT"
          fi

+      # Gate the manifest push on the ancestor check for main pushes.
+      # For releases there is no gate — the check doesn't even run.
      - name: Create manifest list and push
+        if: github.event_name != 'push' || steps.main_check.outputs.push_main == 'true'
        working-directory: /tmp/digests
        run: |
          set -euo pipefail
-          # Build the arg array from each digest file (filename = the digest
-          # hex, with no sha256: prefix; empty file content, only the name
-          # matters).  Using an array avoids shellcheck SC2046 and keeps
-          # every digest a single argv token even under pathological names.
          args=()
          for digest_file in *; do
            args+=("${IMAGE_NAME}@sha256:${digest_file}")
@@ -265,53 +343,46 @@ jobs:
          TAG: ${{ steps.tag.outputs.tag }}

      - name: Inspect image
+        if: github.event_name != 'push' || steps.main_check.outputs.push_main == 'true'
        run: |
          docker buildx imagetools inspect "${IMAGE_NAME}:${TAG}"
        env:
          IMAGE_NAME: ${{ env.IMAGE_NAME }}
          TAG: ${{ steps.tag.outputs.tag }}

-      # Signal to move-latest that the SHA tag is live.  Only on main pushes;
-      # releases don't trigger move-latest (they use their own release tag).
-      - name: Mark SHA tag pushed
-        id: mark_pushed
-        if: github.event_name == 'push' && github.ref == 'refs/heads/main'
+      # Signal to move-latest that the release tag is live.
+      - name: Mark release tag pushed
+        id: mark_release_pushed
+        if: github.event_name == 'release'
        run: echo "pushed=true" >> "$GITHUB_OUTPUT"

  # ---------------------------------------------------------------------------
-  # Move :latest to point at the SHA tag the merge job pushed.
+  # Move :latest to point at the release tag the merge job pushed.
  #
-  # The real serialization guarantee comes from the top-level concurrency
-  # group (`docker-${{ github.ref }}` with `cancel-in-progress: false`),
-  # which ensures at most one workflow run for this ref executes at a time.
-  # That means two move-latest steps for the same ref cannot overlap.
+  # :latest is the floating tag that tracks the most recent stable release.
+  # Only `release: published` events advance it — never main pushes.
  #
-  # This job has its own concurrency group as defense-in-depth: if the
-  # top-level group is ever loosened, queued move-latests will run serially
-  # in arrival order, each one running the ancestor check below and either
-  # advancing :latest or skipping.  `cancel-in-progress: false` matches the
-  # top-level setting — we don't want rapid pushes to cancel a queued
-  # move-latest, because the ancestor check is the real safety mechanism
-  # and queueing is cheap (move-latest is a ~30s registry op).
-  #
-  # Combined with the ancestor check, this means :latest only ever moves
-  # forward in git history.
+  # We still run an ancestor check against the existing :latest so that a
+  # backport release on an older branch (e.g. patching v1.1.5 after v1.2.3
+  # is out) doesn't drag :latest backwards.  The check is the same shape
+  # as the ancestor check in the merge job for :main: read the OCI
+  # revision label off the current :latest, look up that commit in git,
+  # and only advance if our release commit is a strict descendant.
  # ---------------------------------------------------------------------------
  move-latest:
    if: |
      github.repository == 'NousResearch/hermes-agent'
-      && github.event_name == 'push'
-      && github.ref == 'refs/heads/main'
-      && needs.merge.outputs.pushed_sha_tag == 'true'
+      && github.event_name == 'release'
+      && needs.merge.outputs.pushed_release_tag == 'true'
    needs: merge
    runs-on: ubuntu-latest
    timeout-minutes: 10
    concurrency:
-      group: docker-move-latest-${{ github.ref }}
+      group: docker-move-latest
      cancel-in-progress: false
    steps:
      - name: Checkout code
-        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
        with:
          fetch-depth: 1000

@@ -319,25 +390,17 @@ jobs:
        uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f  # v3

      - name: Log in to Docker Hub
-        uses: docker/login-action@c94ce9fb468520275223c153574b00df6fe4bcc9  # v3
+        uses: docker/login-action@4907a6ddec9925e35a0a9e82d7399ccc52663121  # v4.1.0
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

-      # Read the git revision label off the current :latest manifest, then
-      # use `git merge-base --is-ancestor` to check whether our commit is a
-      # descendant of it.  If :latest doesn't exist yet, or its label is
-      # missing, we treat that as "safe to publish".  If another run already
-      # advanced :latest past us (or diverged), we skip and leave it alone.
      - name: Decide whether to move :latest
        id: latest_check
        run: |
          set -euo pipefail
          image=nousresearch/hermes-agent

-          # Pull the JSON for the linux/amd64 sub-manifest's config and extract
-          # the OCI revision label with jq — Go template field access can't
-          # handle dots in map keys, so using json+jq is the robust route.
          image_json=$(
            docker buildx imagetools inspect "${image}:latest" \
              --format '{{ json (index .Image "linux/amd64") }}' \
@@ -362,7 +425,7 @@ jobs:
          fi

          echo "Registry :latest is at ${current_sha}"
-          echo "This run is at      ${GITHUB_SHA}"
+          echo "This release is at  ${GITHUB_SHA}"

          if [ "${current_sha}" = "${GITHUB_SHA}" ]; then
            echo ":latest already points at our SHA — nothing to do."
@@ -371,6 +434,7 @@ jobs:
          fi

          # Make sure we have the :latest commit locally for merge-base.
+          # Releases can be cut from any branch, but only main is fetched here; a :latest commit still missing afterwards is refused below.
          if ! git cat-file -e "${current_sha}^{commit}" 2>/dev/null; then
            git fetch --no-tags --prune origin \
              "+refs/heads/main:refs/remotes/origin/main" \
@@ -383,25 +447,25 @@ jobs:
            exit 0
          fi

-          # Our SHA must be a descendant of the current :latest to be safe.
+          # Our release SHA must be a descendant of the current :latest.
+          # Backport releases on older branches won't satisfy this and will
+          # be left alone — :latest stays on the newer release.
          if git merge-base --is-ancestor "${current_sha}" "${GITHUB_SHA}"; then
-            echo "Our commit is a descendant of :latest — safe to advance."
+            echo "Our release commit is a descendant of :latest — safe to advance."
            echo "push_latest=true" >> "$GITHUB_OUTPUT"
          else
-            echo "Another run advanced :latest past us (or diverged) — leaving it alone."
+            echo "Existing :latest is newer than this release (likely a backport) — leaving it alone."
            echo "push_latest=false" >> "$GITHUB_OUTPUT"
          fi

-      # Retag the already-pushed SHA manifest as :latest.  This is a registry-
-      # side operation — no rebuild, no layer re-push — so it's quick and
-      # atomic per-tag.  The ancestor check above plus the cancel-in-progress
-      # concurrency on this job together guarantee we only ever move :latest
-      # forward in git history.
-      - name: Move :latest to this SHA
+      # Retag the already-pushed release manifest as :latest.
+      - name: Move :latest to this release tag
        if: steps.latest_check.outputs.push_latest == 'true'
+        env:
+          RELEASE_TAG: ${{ needs.merge.outputs.release_tag }}
        run: |
          set -euo pipefail
          image=nousresearch/hermes-agent
          docker buildx imagetools create \
            --tag "${image}:latest" \
-            "${image}:sha-${GITHUB_SHA}"
+            "${image}:${RELEASE_TAG}"
--- a/.github/workflows/docs-site-checks.yml
+++ b/.github/workflows/docs-site-checks.yml
@@ -14,7 +14,7 @@ jobs:
  docs-site-checks:
    runs-on: ubuntu-latest
    steps:
-      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2

      - uses: actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020  # v4
        with:
@@ -26,7 +26,7 @@ jobs:
        run: npm ci
        working-directory: website

-      - uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065  # v5
+      - uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405  # v6.2.0
        with:
          python-version: '3.11'

--- a/.github/workflows/history-check.yml
+++ b/.github/workflows/history-check.yml
@@ -0,0 +1,58 @@
+name: History Check
+
+# Rejects PRs whose branch has no common ancestor with main.
+#
+# In May 2026 PR #25045 was merged from a branch that had been disconnected
+# from main's history (likely an accidental `git checkout --orphan` or
+# `.git/` re-init).  GitHub's merge UI does not refuse merges of unrelated
+# histories, so the PR landed cleanly with the intended one-file change —
+# but its parent-less root commit (413990c94) got grafted into main as a
+# second root, and ~1500 files' worth of `git blame` history collapsed
+# onto that single commit.
+#
+# This check catches the failure mode by requiring `git merge-base` between
+# the PR head and main to be non-empty.
+
+on:
+  pull_request:
+    branches: [main]
+
+permissions:
+  contents: read
+
+jobs:
+  check-common-ancestor:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
+        with:
+          fetch-depth: 0  # full history both sides for merge-base
+
+      - name: Reject PRs with no common ancestor on main
+        run: |
+          # Compare against the PR head, not HEAD: on pull_request events HEAD
+          # is GitHub's test merge commit, which always has main as a parent.
+          # `git merge-base` exits non-zero AND prints nothing when the two
+          # commits share no ancestor; we check both conditions explicitly.
+          if ! BASE=$(git merge-base origin/main "${{ github.event.pull_request.head.sha }}" 2>/dev/null) || [ -z "$BASE" ]; then
+            echo ""
+            echo "::error::This PR has no common ancestor with main."
+            echo ""
+            echo "Your branch's history is disconnected from main.  Common causes:"
+            echo "  - the branch was created with 'git checkout --orphan'"
+            echo "  - '.git/' was re-initialized at some point during the work"
+            echo "  - the branch was force-pushed from an unrelated repository"
+            echo ""
+            echo "Merging an unrelated-history PR grafts a parent-less root commit"
+            echo "into main and collapses git blame for every file in that snapshot."
+            echo "Reference: PR #25045 caused this and re-rooted blame on ~1500"
+            echo "files to a single orphan commit."
+            echo ""
+            echo "To fix, rebase your changes onto current main:"
+            echo "  git fetch origin main"
+            echo "  git checkout -b fix-branch origin/main"
+            echo "  # re-apply your changes (cherry-pick, copy files, etc.)"
+            echo "  git push -f origin fix-branch"
+            exit 1
+          fi
+          echo "::notice::Common ancestor with main: $BASE"
--- a/.github/workflows/lint.yml
+++ b/.github/workflows/lint.yml
@@ -37,7 +37,7 @@ jobs:
    timeout-minutes: 10
    steps:
      - name: Checkout code
-        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
        with:
          fetch-depth: 0 # need full history for merge-base + worktree

@@ -122,7 +122,8 @@ jobs:
          retention-days: 14

      - name: Post / update PR comment
-        if: github.event_name == 'pull_request'
+        if: github.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name == github.repository
+        continue-on-error: true
        uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea # v7
        with:
          script: |
@@ -166,7 +167,7 @@ jobs:
    timeout-minutes: 5
    steps:
      - name: Checkout code
-        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2

      - name: Install uv
        uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86 # v5
@@ -190,10 +191,10 @@ jobs:
    timeout-minutes: 5
    steps:
      - name: Checkout code
-        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2

      - name: Set up Python
-        uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5
+        uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
        with:
          python-version: "3.11"

--- a/.github/workflows/nix-lockfile-fix.yml
+++ b/.github/workflows/nix-lockfile-fix.yml
@@ -56,7 +56,7 @@ jobs:
          app-id: ${{ secrets.APP_ID }}
          private-key: ${{ secrets.APP_PRIVATE_KEY }}

-      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
        with:
          ref: main
          token: ${{ steps.app-token.outputs.token }}
@@ -194,7 +194,7 @@ jobs:

            Triggered by @${{ github.actor }} — [workflow run](${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}).

-      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
        with:
          repository: ${{ steps.resolve.outputs.owner }}/${{ steps.resolve.outputs.repo }}
          ref: ${{ steps.resolve.outputs.ref }}
--- a/.github/workflows/nix.yml
+++ b/.github/workflows/nix.yml
@@ -21,7 +21,7 @@ jobs:
    runs-on: ${{ matrix.os }}
    timeout-minutes: 30
    steps:
-      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
      - uses: ./.github/actions/nix-setup
        with:
          cachix-auth-token: ${{ secrets.CACHIX_AUTH_TOKEN }}
--- a/.github/workflows/osv-scanner.yml
+++ b/.github/workflows/osv-scanner.yml
@@ -56,7 +56,7 @@ permissions:
 jobs:
  scan:
    name: Scan lockfiles
-    uses: google/osv-scanner-action/.github/workflows/osv-scanner-reusable.yml@c51854704019a247608d928f370c98740469d4b5  # v2.3.5
+    uses: google/osv-scanner-action/.github/workflows/osv-scanner-reusable.yml@9a498708959aeaef5ef730655706c5a1df1edbc2  # v2.3.8
    with:
      # Scan explicit lockfiles rather than recursing, so we only look at
      # the three sources of truth and skip vendored / test / worktree dirs.
--- a/.github/workflows/skills-index.yml
+++ b/.github/workflows/skills-index.yml
@@ -20,9 +20,9 @@ jobs:
    if: github.repository == 'NousResearch/hermes-agent'
    runs-on: ubuntu-latest
    steps:
-      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2

-      - uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065  # v5
+      - uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405  # v6.2.0
        with:
          python-version: '3.11'

@@ -53,7 +53,7 @@ jobs:
    # Only deploy on schedule or manual trigger (not on every push to the script)
    if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
    steps:
-      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2

      - uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093  # v4
        with:
@@ -66,7 +66,7 @@ jobs:
          cache: npm
          cache-dependency-path: website/package-lock.json

-      - uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065  # v5
+      - uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405  # v6.2.0
        with:
          python-version: '3.11'

--- a/.github/workflows/supply-chain-audit.yml
+++ b/.github/workflows/supply-chain-audit.yml
@@ -11,6 +11,7 @@ on:
      - '**/sitecustomize.py'
      - '**/usercustomize.py'
      - '**/__init__.pth'
+      - 'pyproject.toml'

 permissions:
  pull-requests: write
@@ -31,7 +32,7 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
-        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
        with:
          fetch-depth: 0

@@ -137,3 +138,68 @@ jobs:
        run: |
          echo "::error::CRITICAL supply chain risk patterns detected in this PR. See the PR comment for details."
          exit 1
+
+  dep-bounds:
+    name: Check PyPI dependency upper bounds
+    runs-on: ubuntu-latest
+    if: github.event.pull_request != null
+    steps:
+      - name: Checkout
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
+        with:
+          fetch-depth: 0
+
+      - name: Check for unbounded PyPI deps
+        id: bounds
+        run: |
+          set -euo pipefail
+
+          BASE="${{ github.event.pull_request.base.sha }}"
+          HEAD="${{ github.event.pull_request.head.sha }}"
+
+          # Only check added lines in pyproject.toml
+          ADDED=$(git diff "$BASE".."$HEAD" -- pyproject.toml | grep '^+' | grep -v '^+++' || true)
+
+          if [ -z "$ADDED" ]; then
+            echo "found=false" >> "$GITHUB_OUTPUT"
+            exit 0
+          fi
+
+          # Match PyPI dep specs that have >= but no < ceiling.
+          # Pattern: "package>=version" without a following ",<" bound.
+          # Excludes git+ URLs (which use commit SHAs) and comments.
+          UNBOUNDED=$(echo "$ADDED" | grep -vE '^\+[[:space:]]*#' | grep -oE '"[a-zA-Z0-9_-]+(\[[^\]]*\])?>=[ 0-9.]+"' | grep -v ',<' || true)
+
+          if [ -n "$UNBOUNDED" ]; then
+            echo "found=true" >> "$GITHUB_OUTPUT"
+            echo "$UNBOUNDED" > /tmp/unbounded.txt
+          else
+            echo "found=false" >> "$GITHUB_OUTPUT"
+          fi
+
+      - name: Post unbounded dep warning
+        if: steps.bounds.outputs.found == 'true'
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+        run: |
+          BODY="## ⚠️ Unbounded PyPI Dependency Detected
+
+          This PR adds PyPI dependencies without a \`<next_major\` upper bound. Per our [supply chain policy](../blob/main/CONTRIBUTING.md#dependency-pinning-policy-supply-chain-hardening), all PyPI deps must be pinned as \`>=floor,<next_major\`.
+
+          **Unbounded specs found:**
+          \`\`\`
+          $(cat /tmp/unbounded.txt)
+          \`\`\`
+
+          **Fix:** Add an upper bound, e.g. \`\"package>=1.2.0,<2\"\`
+
+          ---
+          *See PR #2810 and CONTRIBUTING.md for the full policy rationale.*"
+
+          gh pr comment "${{ github.event.pull_request.number }}" --body "$BODY" || echo "::warning::Could not post PR comment (expected for fork PRs)"
+
+      - name: Fail on unbounded deps
+        if: steps.bounds.outputs.found == 'true'
+        run: |
+          echo "::error::PyPI dependencies without upper bounds detected. Add <next_major ceiling per CONTRIBUTING.md policy."
+          exit 1
--- a/.github/workflows/tests.yml
+++ b/.github/workflows/tests.yml
@@ -23,13 +23,24 @@ concurrency:
 jobs:
  test:
    runs-on: ubuntu-latest
-    timeout-minutes: 20
+    timeout-minutes: 60
    steps:
      - name: Checkout code
-        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2

-      - name: Install system dependencies
-        run: sudo apt-get update && sudo apt-get install -y ripgrep
+      - name: Install ripgrep (prebuilt binary)
+        run: |
+          set -euo pipefail
+          RG_VERSION=15.1.0
+          RG_SHA256=1c9297be4a084eea7ecaedf93eb03d058d6faae29bbc57ecdaf5063921491599
+          RG_TARBALL=ripgrep-${RG_VERSION}-x86_64-unknown-linux-musl.tar.gz
+          curl -sSfL -o "$RG_TARBALL" \
+            "https://github.com/BurntSushi/ripgrep/releases/download/${RG_VERSION}/${RG_TARBALL}"
+          echo "${RG_SHA256}  ${RG_TARBALL}" | sha256sum -c -
+          tar -xzf "$RG_TARBALL"
+          sudo mv "ripgrep-${RG_VERSION}-x86_64-unknown-linux-musl/rg" /usr/local/bin/rg
+          rm -rf "$RG_TARBALL" "ripgrep-${RG_VERSION}-x86_64-unknown-linux-musl"
+          rg --version

      - name: Install uv
        uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86  # v5
@@ -44,9 +55,26 @@ jobs:
          uv pip install -e ".[all,dev]"

      - name: Run tests
+        # Per-file isolation via scripts/run_tests_parallel.py: discovers
+        # every test_*.py file under tests/ (excluding integration/ + e2e/),
+        # then runs `python -m pytest <file>` in a freshly-spawned subprocess
+        # with bounded parallelism. No xdist, no shared workers, no
+        # module-level state leakage between files.
+        #
+        # Why per-file (not per-test): per-test spawn cost (~250ms × 17k
+        # tests = 70min CPU minimum) blew the wall-clock budget. Per-file
+        # spawn (~250ms × ~850 files = ~3.5min) fits while still giving
+        # every file a fresh interpreter — the only isolation boundary
+        # that matters in practice (cross-file leakage was the original
+        # flake source; intra-file is the test author's responsibility).
+        #
+        # Why drop xdist entirely: xdist's persistent workers accumulate
+        # state across files, which is exactly the leakage we wanted to
+        # fix. ThreadPoolExecutor + subprocess.run is ~60 lines and does
+        # the job with cleaner semantics.
        run: |
          source .venv/bin/activate
-          python -m pytest tests/ -q --ignore=tests/integration --ignore=tests/e2e --tb=short -n auto
+          python scripts/run_tests_parallel.py
        env:
          # Ensure tests don't accidentally call real APIs
          OPENROUTER_API_KEY: ""
@@ -55,10 +83,24 @@ jobs:

  e2e:
    runs-on: ubuntu-latest
-    timeout-minutes: 10
+    timeout-minutes: 15
    steps:
      - name: Checkout code
-        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
+
+      - name: Install ripgrep (prebuilt binary)
+        run: |
+          set -euo pipefail
+          RG_VERSION=15.1.0
+          RG_SHA256=1c9297be4a084eea7ecaedf93eb03d058d6faae29bbc57ecdaf5063921491599
+          RG_TARBALL=ripgrep-${RG_VERSION}-x86_64-unknown-linux-musl.tar.gz
+          curl -sSfL -o "$RG_TARBALL" \
+            "https://github.com/BurntSushi/ripgrep/releases/download/${RG_VERSION}/${RG_TARBALL}"
+          echo "${RG_SHA256}  ${RG_TARBALL}" | sha256sum -c -
+          tar -xzf "$RG_TARBALL"
+          sudo mv "ripgrep-${RG_VERSION}-x86_64-unknown-linux-musl/rg" /usr/local/bin/rg
+          rm -rf "$RG_TARBALL" "ripgrep-${RG_VERSION}-x86_64-unknown-linux-musl"
+          rg --version

      - name: Install uv
        uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86  # v5
--- a/.github/workflows/upload_to_pypi.yml
+++ b/.github/workflows/upload_to_pypi.yml
@@ -0,0 +1,164 @@
+name: Publish to PyPI
+
+# Triggered by CalVer tag pushes from scripts/release.py (e.g. v2026.5.15).
+# Can also be triggered manually from the Actions tab as an escape hatch.
+on:
+  push:
+    tags:
+      - 'v20*'  # CalVer tags: v2026.5.15, v2026.5.15.2, etc.
+  workflow_dispatch:
+    inputs:
+      confirm_tag:
+        description: 'Tag to publish (e.g. v2026.5.15). Must already exist.'
+        required: true
+        type: string
+
+# Restrict default token to read-only; each job escalates as needed.
+permissions:
+  contents: read
+
+# Prevent overlapping publishes (e.g. two same-day tags pushed quickly).
+concurrency:
+  group: pypi-publish
+  cancel-in-progress: false
+
+jobs:
+  build:
+    name: Build distribution 📦
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
+        with:
+          persist-credentials: false
+          # On workflow_dispatch, check out the confirmed tag.
+          ref: ${{ inputs.confirm_tag || github.ref }}
+          fetch-tags: true
+
+      - name: Validate tag exists
+        if: github.event_name == 'workflow_dispatch'
+        run: |
+          if ! git tag -l "${{ inputs.confirm_tag }}" | grep -q .; then
+            echo "::error::Tag '${{ inputs.confirm_tag }}' does not exist in the repo"
+            exit 1
+          fi
+
+      - name: Set up Python
+        uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405  # v6.2.0
+        with:
+          python-version: '3.13'
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@d0cc045d04ccac9d8b7881df0226f9e82c39688e  # v6
+
+      - name: Set up Node.js
+        uses: actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020  # v4
+        with:
+          node-version: '22'
+
+      - name: Build web dashboard
+        run: cd web && npm ci && npm run build
+
+      - name: Build TUI bundle
+        run: cd ui-tui && npm ci && npm run build
+
+      - name: Bundle TUI into hermes_cli
+        run: |
+          mkdir -p hermes_cli/tui_dist
+          cp ui-tui/dist/entry.js hermes_cli/tui_dist/entry.js
+
+      - name: Verify frontend assets exist
+        run: |
+          test -f hermes_cli/web_dist/index.html || { echo "ERROR: web_dist not built"; exit 1; }
+          test -f hermes_cli/tui_dist/entry.js || { echo "ERROR: tui_dist not built"; exit 1; }
+
+      - name: Bundle install scripts into wheel
+        run: |
+          mkdir -p hermes_cli/scripts
+          cp scripts/install.sh hermes_cli/scripts/install.sh
+          cp scripts/install.ps1 hermes_cli/scripts/install.ps1
+
+      - name: Build wheel and sdist
+        run: uv build --sdist --wheel
+
+      - name: Upload distribution artifacts
+        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02  # v4
+        with:
+          name: python-package-distributions
+          path: dist/
+
+  publish:
+    name: Publish to PyPI
+    needs: build
+    runs-on: ubuntu-latest
+    environment:
+      name: pypi
+      url: https://pypi.org/p/hermes-agent
+    permissions:
+      id-token: write  # OIDC trusted publishing
+
+    steps:
+      - name: Download distribution artifacts
+        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093  # v4
+        with:
+          name: python-package-distributions
+          path: dist/
+
+      - name: Publish to PyPI
+        uses: pypa/gh-action-pypi-publish@cef221092ed1bacb1cc03d23a2d87d1d172e277b  # v1.14.0
+        with:
+          skip-existing: true
+
+  sign:
+    name: Sign and attach to GitHub Release
+    # Only runs on tag pushes — release.py creates the GitHub Release,
+    # and workflow_dispatch won't have a matching release to attach to.
+    if: startsWith(github.ref, 'refs/tags/')
+    needs: publish
+    runs-on: ubuntu-latest
+    permissions:
+      contents: write   # attach assets to the existing release
+      id-token: write   # sigstore signing
+
+    steps:
+      - name: Download distribution artifacts
+        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093  # v4
+        with:
+          name: python-package-distributions
+          path: dist/
+
+      - name: Wait for GitHub Release to exist
+        env:
+          GITHUB_TOKEN: ${{ github.token }}
+        # release.py creates the GitHub Release after pushing the tag,
+        # but this workflow starts from the tag push — wait for it.
+        run: |
+          for i in $(seq 1 30); do
+            if gh release view "$GITHUB_REF_NAME" --repo "$GITHUB_REPOSITORY" >/dev/null 2>&1; then
+              echo "Release $GITHUB_REF_NAME found"
+              exit 0
+            fi
+            echo "Waiting for release... ($i/30)"
+            sleep 10
+          done
+          echo "::warning::Release $GITHUB_REF_NAME not found after 5 minutes — skipping signature upload"
+          echo "skip_sign=true" >> "$GITHUB_ENV"
+
+      - name: Sign with Sigstore
+        if: env.skip_sign != 'true'
+        uses: sigstore/gh-action-sigstore-python@04cffa1d795717b140764e8b640de88853c92acc  # v3.3.0
+        with:
+          inputs: >-
+            ./dist/*.tar.gz
+            ./dist/*.whl
+
+      - name: Attach signed artifacts to GitHub Release
+        if: env.skip_sign != 'true'
+        env:
+          GITHUB_TOKEN: ${{ github.token }}
+        # release.py already created the GitHub Release — just upload
+        # the Sigstore signatures alongside the existing assets.
+        run: >-
+          gh release upload
+          "$GITHUB_REF_NAME" dist/*.sigstore.json
+          --repo "$GITHUB_REPOSITORY"
+          --clobber
--- a/.github/workflows/uv-lockfile-check.yml
+++ b/.github/workflows/uv-lockfile-check.yml
@@ -71,7 +71,7 @@ jobs:
    timeout-minutes: 5
    steps:
      - name: Checkout code
-        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2

      - name: Install uv
        uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86  # v5
--- a/.gitignore
+++ b/.gitignore
@@ -18,6 +18,7 @@ __pycache__/web_tools.cpython-310.pyc
 logs/
 data/
 .pytest_cache/
+.pytest-cache/
 tmp/
 temp_vision_images/
 hermes-*/*
@@ -70,3 +71,6 @@ mini-swe-agent/
 result
 website/static/api/skills-index.json
 models-dev-upstream/
+hermes_cli/tui_dist/*
+hermes_cli/scripts/
+docs/superpowers/*
--- a/.gitmodules
+++ b/.gitmodules
@@ -1,3 +0,0 @@
-[submodule "tinker-atropos"]
-	path = tinker-atropos
-	url = https://github.com/nousresearch/tinker-atropos
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -56,7 +56,6 @@ hermes-agent/
 ├── tui_gateway/          # Python JSON-RPC backend for the TUI
 ├── acp_adapter/          # ACP server (VS Code / Zed / JetBrains integration)
 ├── cron/                 # Scheduler — jobs.py, scheduler.py
-├── environments/         # RL training environments (Atropos)
 ├── scripts/              # run_tests.sh, release.py, auxiliary scripts
 ├── website/              # Docusaurus docs site
 └── tests/                # Pytest suite (~17k tests across ~900 files as of May 2026)
@@ -309,6 +308,29 @@ The registry handles schema collection, dispatch, availability checking, and err

 ---

+## Dependency Pinning Policy
+
+All dependencies must have upper bounds to limit supply-chain attack surface.
+This policy was established after the litellm compromise (PR #2796, #2810) and
+reinforced after the Mini Shai-Hulud worm campaign (May 2026).
+
+| Source type | Treatment | Example |
+|---|---|---|
+| PyPI package | `>=floor,<next_major` | `"httpx>=0.28.1,<1"` |
+| Git URL | Commit SHA | `git+https://...@<40-char-sha>` |
+| GitHub Actions | Commit SHA + comment | `uses: actions/checkout@<sha>  # v4` |
+| CI-only pip | `==exact` | `pyyaml==6.0.2` |
+
+**When adding a new dependency to `pyproject.toml`:**
+1. Pin to `>=current_version,<next_major` for post-1.0 (e.g. `>=1.5.0,<2`).
+2. For pre-1.0 packages, use `<0.(current_minor + 2)` (e.g. `>=0.29,<0.31`).
+3. Never commit a bare `>=X.Y.Z` without a ceiling — CI and reviewers will reject it.
+4. Run `uv lock` to regenerate `uv.lock` with hashes.
+
+Reference: #2810 (bounds pass), #9801 (SHA pinning + audit CI).
+
+---
+
 ## Adding Configuration

 ### config.yaml options:
@@ -513,6 +535,17 @@ generic plugin surface (new hook, new ctx method) — never hardcode
 plugin-specific logic into core. PR #5295 removed 95 lines of hardcoded
 honcho argparse from `main.py` for exactly this reason.

+**No new in-tree memory providers (policy, May 2026):** the set of
+built-in memory providers under `plugins/memory/` is closed. New memory
+backends must ship as **standalone plugin repos** that users install
+into `~/.hermes/plugins/` (or via pip entry points) — they implement
+the same `MemoryProvider` ABC, register through the same discovery
+path, and integrate via `hermes memory setup` / `post_setup()` without
+landing in this tree. PRs that add a new directory under
+`plugins/memory/` will be closed with a pointer to publish the
+provider as its own repo. Existing in-tree providers stay; bug fixes
+to them are welcome.
+
 ### Model-provider plugins (`plugins/model-providers/<name>/`)

 Every inference backend (openrouter, anthropic, gmi, deepseek, nvidia, …)
@@ -540,10 +573,14 @@ Full authoring guide: `website/docs/developer-guide/model-provider-plugin.md`.

 ### Dashboard / context-engine / image-gen plugin directories

-`plugins/context_engine/`, `plugins/image_gen/`, `plugins/example-dashboard/`,
-etc. follow the same pattern (ABC + orchestrator + per-plugin directory).
-Context engines plug into `agent/context_engine.py`; image-gen providers
-into `agent/image_gen_provider.py`.
+`plugins/context_engine/`, `plugins/image_gen/`, etc. follow the same
+pattern (ABC + orchestrator + per-plugin directory). Context engines
+plug into `agent/context_engine.py`; image-gen providers into
+`agent/image_gen_provider.py`. Reference / docs-companion plugins
+(`example-dashboard`, `strike-freedom-cockpit`, `plugin-llm-example`,
+`plugin-llm-async-example`) live in the
+[`hermes-example-plugins`](https://github.com/NousResearch/hermes-example-plugins)
+companion repo, not in this tree.

 ---

@@ -576,6 +613,86 @@ during setup, injected at load time).
 Top-level `tags:` and `category:` are also accepted and mirrored from
 `metadata.hermes.*` by the loader.

+### Skill authoring standards (HARDLINE)
+
+Every new or modernized skill — bundled, optional, or contributed —
+must meet these standards before merge. Reviewers reject PRs that
+violate them.
+
+1. **`description` ≤ 60 characters, one sentence, ends with a period.**
+   Long descriptions bloat skill listings and dilute the model's
+   attention when many skills are loaded. State the capability, not
+   the implementation. No marketing words ("powerful",
+   "comprehensive", "seamless", "advanced"). Don't repeat the skill
+   name. Verify with:
+   ```python
+   import re, pathlib
+   m = re.search(r'^description: (.*)$',
+                 pathlib.Path('skills/<cat>/<name>/SKILL.md').read_text(),
+                 re.MULTILINE)
+   assert len(m.group(1)) <= 60, len(m.group(1))
+   ```
+
+2. **Tools referenced in SKILL.md prose must be native Hermes tools or
+   MCP servers the skill explicitly expects.** When the skill needs a
+   capability, point at the proper tool by name in backticks
+   (`` `terminal` ``, `` `web_extract` ``, `` `read_file` ``,
+   `` `patch` ``, `` `search_files` ``, `` `vision_analyze` ``,
+   `` `browser_navigate` ``, `` `delegate_task` ``, etc.). Do NOT
+   name shell utilities the agent already has wrapped — `grep` →
+   `search_files`, `cat`/`head`/`tail` → `read_file`, `sed`/`awk` →
+   `patch`, `find`/`ls` → `search_files target='files'`. If the skill
+   depends on an MCP server, name the MCP server and document the
+   expected setup in `## Prerequisites`. Anything else (third-party
+   CLIs, shell pipelines, etc.) is fair game inside script files but
+   should not be the headline interaction surface in the prose.
+
+3. **`platforms:` gating audited against actual script imports.**
+   Skills that use POSIX-only primitives (`fcntl`, `termios`,
+   `os.setsid`, `os.kill(pid, 0)` for liveness, `/proc`, `/tmp`
+   hardcoded, `signal.SIGKILL`, bash heredocs, `osascript`, `apt`,
+   `systemctl`) must declare their supported platforms. Default
+   posture: try to fix it cross-platform first — `tempfile.gettempdir`,
+   `pathlib.Path`, `psutil.pid_exists`, Python-level filtering instead
+   of `grep`. Gate to a narrower set only when the dependency is
+   genuinely platform-bound.
+
+4. **`author` credits the human contributor first.** For external
+   contributions, the contributor's real name + GitHub handle goes
+   first; "Hermes Agent" is the secondary collaborator. If the
+   contributor's commit shows "Hermes Agent" as author (because they
+   used Hermes to draft the skill), replace it with their actual name
+   — credit the human, not the tool.
+
+5. **SKILL.md body uses the modern section order.** `# <Skill> Skill`
+   title, 2-3 sentence intro stating what it does and doesn't do,
+   `## When to Use`, `## Prerequisites`, `## How to Run`,
+   `## Quick Reference`, `## Procedure`, `## Pitfalls`,
+   `## Verification`. Target ~200 lines for a complex skill,
+   ~100 lines for a simple one. Cut redundant intro fluff, marketing
+   prose, and re-explanations of env vars already in
+   `## Prerequisites`.
+
+6. **Scripts go in `scripts/`, references in `references/`,
+   templates in `templates/`.** Don't expect the model to inline-write
+   parsers, XML walkers, or non-trivial logic every call — ship a
+   helper script. Reference it from SKILL.md by path relative to the
+   skill directory.
+
+7. **Tests live at `tests/skills/test_<skill>_skill.py`** and use only
+   stdlib + pytest + `unittest.mock`. No live network calls. Run via
+   `scripts/run_tests.sh tests/skills/test_<skill>_skill.py -q`.
+
+8. **`.env.example` additions are isolated to a clearly delimited
+   block.** Don't touch the surrounding file — contributor-supplied
+   `.env.example` versions are usually stale and edits outside the
+   skill's own block must be dropped during salvage.
+
+The full salvage / modernization checklist for external skill PRs
+lives in the `hermes-agent-dev` skill at
+`references/new-skill-pr-salvage.md` — load it before polishing
+contributor skill PRs.
+
 ---

 ## Toolsets
@@ -713,10 +830,11 @@ kanban task.
  `unlink`, `comment`, `complete`, `block`, `unblock`, `archive`,
  `tail`, plus less-commonly-used `watch`, `stats`, `runs`, `log`,
  `assignees`, `heartbeat`, `notify-*`, `dispatch`, `daemon`, `gc`.
- **Worker toolset:** `tools/kanban_tools.py` exposes `kanban_show`,
-  `kanban_complete`, `kanban_block`, `kanban_heartbeat`, `kanban_comment`,
-  `kanban_create`, `kanban_link` — gated by `HERMES_KANBAN_TASK` so
-  the schema only appears for processes actually running as a worker.
+- **Worker/orchestrator toolset:** `tools/kanban_tools.py` exposes
+  `kanban_show`, `kanban_complete`, `kanban_block`, `kanban_heartbeat`,
+  `kanban_comment`, `kanban_create`, `kanban_link`; profiles that
+  explicitly enable the `kanban` toolset outside a dispatcher-spawned
+  task also get `kanban_list` and `kanban_unblock` for board routing.
 - **Dispatcher:** long-lived loop that (default every 60s) reclaims
  stale claims, promotes ready tasks, atomically claims, and spawns
  assigned profiles. Runs **inside the gateway** by default via
@@ -732,8 +850,9 @@ Isolation model:
 - **Tenant** is a soft namespace *within* a board — one specialist
  fleet can serve multiple businesses with workspace-path + memory-key
  isolation.
- After ~5 consecutive spawn failures on the same task the dispatcher
-  auto-blocks it to prevent spin loops.
+- After `kanban.failure_limit` consecutive non-success attempts on the
+  same task (default: 2), the dispatcher auto-blocks it to prevent spin
+  loops.

 Full user-facing docs: `website/docs/user-guide/features/kanban.md`.

@@ -894,17 +1013,39 @@ def profile_env(tmp_path, monkeypatch):

 **ALWAYS use `scripts/run_tests.sh`** — do not call `pytest` directly. The script enforces
 hermetic environment parity with CI (unset credential vars, TZ=UTC, LANG=C.UTF-8,
-4 xdist workers matching GHA ubuntu-latest). Direct `pytest` on a 16+ core
-developer machine with API keys set diverges from CI in ways that have caused
-multiple "works locally, fails in CI" incidents (and the reverse).
+`-n auto` xdist workers, in-tree subprocess-isolation plugin). Direct `pytest`
+on a 16+ core developer machine with API keys set diverges from CI in ways
+that have caused multiple "works locally, fails in CI" incidents (and the reverse).

 ```bash
 scripts/run_tests.sh                                  # full suite, CI-parity
 scripts/run_tests.sh tests/gateway/                   # one directory
 scripts/run_tests.sh tests/agent/test_foo.py::test_x  # one test
 scripts/run_tests.sh -v --tb=long                     # pass-through pytest flags
+scripts/run_tests.sh --no-isolate tests/foo/          # disable subprocess isolation (faster, for debugging)
 ```

+### Subprocess-per-test isolation
+
+Every test runs in a freshly-spawned Python subprocess via the in-tree plugin
+at `tests/_isolate_plugin.py`. This means module-level dicts/sets and
+ContextVars from one test cannot leak into the next — the historic
+`_reset_module_state` autouse fixture is gone.
+
+Implementation notes:
+
+- The plugin uses `multiprocessing.get_context("spawn")`, which works on
+  Linux, macOS, and Windows alike (POSIX `fork` is not used).
+- Per-test overhead is ~0.5–1.0s (Python startup + pytest collection). xdist
+  parallelism amortizes this across cores; on a 20-core box the full suite
+  finishes in roughly the same wall time as before, but flake-free.
+- `isolate_timeout` (configured in `pyproject.toml`) caps each test at 30s.
+  Hangs are killed and surfaced as a failure report.
+- Pass `--no-isolate` to disable isolation — useful when debugging a single
+  test interactively, or when you specifically want to verify state leakage.
+- The plugin disables itself in child processes (sentinel envvar
+  `HERMES_ISOLATE_CHILD=1`), so there's no fork-bomb risk.
+
 ### Why the wrapper (and why the old "just call pytest" doesn't work)

 Five real sources of local-vs-CI drift the script closes:
@@ -915,7 +1056,7 @@ Five real sources of local-vs-CI drift the script closes:
 | HOME / `~/.hermes/` | Your real config+auth.json | Temp dir per test |
 | Timezone | Local TZ (PDT etc.) | UTC |
 | Locale | Whatever is set | C.UTF-8 |
-| xdist workers | `-n auto` = all cores (20+ on a workstation) | `-n 4` matching CI |
+| xdist workers | `-n auto` = all cores | `-n auto` (safe — subprocess isolation prevents cross-worker flakes) |

 `tests/conftest.py` also enforces points 1-4 as an autouse fixture so ANY pytest
 invocation (including IDE integrations) gets hermetic behavior — but the wrapper
@@ -923,15 +1064,21 @@ is belt-and-suspenders.

 ### Running without the wrapper (only if you must)

-If you can't use the wrapper (e.g. on Windows or inside an IDE that shells
-pytest directly), at minimum activate the venv and pass `-n 4`:
+If you can't use the wrapper (e.g. inside an IDE that shells pytest directly),
+at minimum activate the venv. The isolation plugin loads automatically from
+`addopts` in `pyproject.toml`, so you get the same per-test process isolation
+either way.

 ```bash
 source .venv/bin/activate   # or: source venv/bin/activate
-python -m pytest tests/ -q -n 4
+python -m pytest tests/ -q
 ```

-Worker count above 4 will surface test-ordering flakes that CI never sees.
+If you need to bypass isolation for fast feedback while debugging:
+
+```bash
+python -m pytest tests/agent/test_foo.py -q --no-isolate
+```

 Always run the full suite before pushing changes.

--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -49,6 +49,24 @@ If your skill is specialized, community-contributed, or niche, it's better suite

 ---

+## Memory Providers: Ship as a Standalone Plugin
+
+**We are no longer accepting new memory providers into this repo.** The set of built-in providers under `plugins/memory/` (honcho, mem0, supermemory, byterover, hindsight, holographic, openviking, retaindb) is closed. If you want to add a new memory backend, publish it as a **standalone plugin repo** that users install into `~/.hermes/plugins/` (or via a pip entry point).
+
+Standalone memory plugins:
+
+- Implement the same `MemoryProvider` ABC (`agent/memory_provider.py`) — `sync_turn`, `prefetch`, `shutdown`, and optionally `post_setup(hermes_home, config)` for setup-wizard integration
+- Use the same discovery system — `discover_memory_providers()` picks them up from user/project plugin directories and pip entry points
+- Integrate with `hermes memory setup` via `post_setup()` — no need to touch core code
+- Can register their own CLI subcommands via `register_cli(subparser)` in a `cli.py` file
+- Get all the same lifecycle hooks and config plumbing as in-tree providers
+
+PRs that add a new directory under `plugins/memory/` will be closed with a pointer to publish the provider as its own repo. Existing in-tree providers stay; bug fixes to them are welcome.
+
+This isn't a quality bar — it's a coupling-and-maintenance decision. Memory providers are the most common plugin type and they shouldn't all live in this tree.
+
+---
+
 ## Development Setup

 ### Prerequisites
@@ -73,9 +91,6 @@ export VIRTUAL_ENV="$(pwd)/venv"
 # Install with all extras (messaging, cron, CLI menus, dev tools)
 uv pip install -e ".[all,dev]"

-# Optional: RL training submodule
-# git submodule update --init tinker-atropos && uv pip install -e "./tinker-atropos"
-
 # Optional: browser tools
 npm install
 ```
@@ -157,7 +172,7 @@ hermes-agent/
 │   ├── vision_tools.py           # Image analysis via multimodal models
 │   ├── delegate_tool.py          # Subagent spawning and parallel task execution
 │   ├── code_execution_tool.py    # Sandboxed Python with RPC tool access
-│   ├── session_search_tool.py    # Search past conversations with FTS5 + summarization
+│   ├── session_search_tool.py    # Search past conversations with FTS5 + anchored windows
 │   ├── cronjob_tools.py          # Scheduled task management
 │   ├── skill_tools.py            # Skill search, load, manage
 │   └── environments/             # Terminal execution backends
@@ -178,7 +193,6 @@ hermes-agent/
 │
 ├── skills/                   # Bundled skills (copied to ~/.hermes/skills/ on install)
 ├── optional-skills/          # Official optional skills (discoverable via hub, not activated by default)
-├── environments/             # RL training environments (Atropos integration)
 ├── tests/                    # Test suite
 ├── website/                  # Documentation site (hermes-agent.nousresearch.com)
 │
@@ -196,7 +210,7 @@ hermes-agent/
 | `~/.hermes/skills/` | All active skills (bundled + hub-installed + agent-created) |
 | `~/.hermes/memories/` | Persistent memory (MEMORY.md, USER.md) |
 | `~/.hermes/state.db` | SQLite session database |
-| `~/.hermes/sessions/` | JSON session logs |
+| `~/.hermes/sessions/` | Gateway routing index (`sessions.json`), request-dump breadcrumbs, gateway `*.jsonl` transcripts, and (optionally) per-session JSON snapshots when `sessions.write_json_snapshots: true` is set. The per-session snapshots are off by default; state.db is canonical. |
 | `~/.hermes/cron/` | Scheduled job data |
 | `~/.hermes/whatsapp/session/` | WhatsApp bridge credentials |

@@ -225,7 +239,7 @@ User message → AIAgent._run_agent_loop()

 - **Self-registering tools**: Each tool file calls `registry.register()` at import time. `model_tools.py` triggers discovery by importing all tool modules.
 - **Toolset grouping**: Tools are grouped into toolsets (`web`, `terminal`, `file`, `browser`, etc.) that can be enabled/disabled per platform.
- **Session persistence**: All conversations are stored in SQLite (`hermes_state.py`) with full-text search and unique session titles. JSON logs go to `~/.hermes/sessions/`.
+- **Session persistence**: All conversations are stored in SQLite (`hermes_state.py`) with full-text search and unique session titles. Per-session JSON snapshots in `~/.hermes/sessions/` were superseded by the SQLite store and are off by default; opt back in with `sessions.write_json_snapshots: true` if you have external tooling that consumes the JSON files directly.
 - **Ephemeral injection**: System prompts and prefill messages are injected at API call time, never persisted to the database or logs.
 - **Provider abstraction**: The agent works with any OpenAI-compatible API. Provider resolution happens at init time (Nous Portal OAuth, OpenRouter API key, or custom endpoint).
 - **Provider routing**: When using OpenRouter, `provider_routing` in config.yaml controls provider selection (sort by throughput/latency/price, allow/ignore specific providers, data retention policies). These are injected as `extra_body.provider` in API requests.
@@ -461,6 +475,58 @@ Gateway and messaging sessions never collect secrets in-band; they instruct the

 See `skills/gifs/gif-search/` and `skills/email/himalaya/` for examples.

+### Skill authoring standards (HARDLINE)
+
+Every new or modernized skill — bundled, optional, or contributed — must meet these standards before merge. Reviewers reject PRs that violate them.
+
+1. **`description` ≤ 60 characters, one sentence, ends with a period.** Long descriptions bloat the skill listing UI and dilute the model's attention when many skills are loaded. State the capability, not the implementation. No marketing words ("powerful", "comprehensive", "seamless", "advanced"). Don't repeat the skill name. Verify with:
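+   ```python
+   import re, pathlib
+   text = pathlib.Path('skills/<cat>/<name>/SKILL.md').read_text()
+   m = re.search(r'^description: (.*)$', text, re.MULTILINE)
+   assert m, 'no description: line in SKILL.md'
+   desc = m.group(1).strip().strip('"\'')  # ignore optional YAML quotes
+   assert len(desc) <= 60, len(desc)
+   assert desc.endswith('.'), desc
+   ```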
+
+   Good: `Search arXiv papers by keyword, author, category, or ID.`
+   Bad: `A powerful and comprehensive skill that allows the agent to search arXiv for relevant academic papers using various criteria including keywords, authors, and categories.`
+
+2. **Tools referenced in SKILL.md prose must be native Hermes tools or MCP servers the skill explicitly expects.** When the skill needs a capability, point at the proper tool by name in backticks: `` `terminal` ``, `` `web_extract` ``, `` `web_search` ``, `` `read_file` ``, `` `write_file` ``, `` `patch` ``, `` `search_files` ``, `` `vision_analyze` ``, `` `browser_navigate` ``, `` `delegate_task` ``, `` `image_generate` ``, `` `text_to_speech` ``, `` `cronjob` ``, `` `memory` ``, `` `skill_view` ``, `` `todo` ``, `` `execute_code` ``.
+
+   Do NOT name shell utilities the agent already has wrapped:
+
+   | Don't say | Say |
+   |---|---|
+   | `grep`, `rg` | `search_files` |
+   | `cat`, `head`, `tail` | `read_file` |
+   | `sed`, `awk` | `patch` |
+   | `find`, `ls` | `search_files` (with `target='files'`) |
+   | `curl` for content extraction | `web_extract` |
+   | `echo > file`, `cat <<EOF` | `write_file` |
+
+   If the skill depends on an MCP server, name the MCP server and document its setup in `## Prerequisites`. Third-party CLIs (e.g. `ffmpeg`, `gh`, a specific SDK) are fine to invoke from inside script files, but the prose should frame the interaction as "invoke through the `terminal` tool", not as a manual shell session.
+
+3. **`platforms:` gating audited against actual script imports.** Skills that use POSIX-only primitives (`fcntl`, `termios`, `os.setsid`, `os.kill(pid, 0)` for liveness, `/proc`, hardcoded `/tmp` paths, `signal.SIGKILL`, bash heredocs, `osascript`, `apt`, `systemctl`) must declare their supported platforms via the `platforms:` frontmatter. Default posture is to fix it cross-platform first — `tempfile.gettempdir()`, `pathlib.Path`, `psutil.pid_exists()`, Python-level filtering instead of `grep`. Gate to a narrower set only when the dependency is genuinely platform-bound (e.g. `osascript` is macOS-only, `/proc` is Linux-only).
+
+4. **`author` credits the human contributor first.** For external contributions, the contributor's real name + GitHub handle goes first (`Jane Doe (jane-doe)`); "Hermes Agent" is the secondary collaborator. If the contributor's commit shows "Hermes Agent" as author because they used Hermes to draft the skill, replace it with their actual name — credit the human, not the tool.
+
+5. **SKILL.md body uses the modern section order.** `# <Skill> Skill` title, 2-3 sentence intro stating what it does and what it doesn't do, then:
+   - `## When to Use` — trigger conditions
+   - `## Prerequisites` — env vars, install steps, MCP setup, API key sourcing
+   - `## How to Run` — canonical invocation through the `terminal` tool
+   - `## Quick Reference` — flat command/API reference
+   - `## Procedure` — numbered steps with copy-paste commands
+   - `## Pitfalls` — known limits, rate limits, things that look broken but aren't
+   - `## Verification` — single command that proves the skill works
+
+   Target ~200 lines for a complex skill, ~100 lines for a simple one. Cut redundant intro fluff, marketing prose, and re-explanations of env vars already documented in `## Prerequisites`.
+
+6. **Scripts go in `scripts/`, references in `references/`, templates in `templates/`.** Don't expect the model to inline-write parsers, XML walkers, or non-trivial logic every call — ship a helper script. Reference scripts from SKILL.md by path relative to the skill directory.
+
+7. **Tests live at `tests/skills/test_<skill>_skill.py`** and use only stdlib + pytest + `unittest.mock`. No live network calls. Run via `scripts/run_tests.sh tests/skills/test_<skill>_skill.py -q`. Must pass under the hermetic CI env (no API keys leaking through). Use `monkeypatch` and `tmp_path` for any env-var or filesystem dependencies.
+
+8. **`.env.example` additions are isolated to a clearly delimited block.** Don't touch the surrounding file — contributor-supplied `.env.example` versions are usually stale, and edits outside the skill's own block will be dropped during salvage. Comment all values with `#` (it's documentation, not live config).
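
Standard 5 is mechanical enough to check with a script. Below is a minimal sketch built around a hypothetical helper, `section_order_problems`, which is not part of the repo or its CI. It reads every line that starts with `## `, so a `## ` line inside a fenced code block would be miscounted as a heading.

```python
import re

# Required level-2 sections, in the order standard 5 lists them.
REQUIRED_SECTIONS = [
    "When to Use",
    "Prerequisites",
    "How to Run",
    "Quick Reference",
    "Procedure",
    "Pitfalls",
    "Verification",
]


def section_order_problems(skill_md: str) -> list[str]:
    """Return human-readable problems with a SKILL.md body's section layout."""
    problems = []
    # The body must open with a "# <Skill> Skill" title.
    if not re.search(r"^# .+ Skill\s*$", skill_md, re.MULTILINE):
        problems.append("missing '# <Skill> Skill' title")
    # Collect "## Heading" lines only; "###" and deeper are ignored.
    headings = re.findall(r"^## (.+?)\s*$", skill_md, re.MULTILINE)
    present = [h for h in headings if h in REQUIRED_SECTIONS]
    for name in REQUIRED_SECTIONS:
        if name not in present:
            problems.append(f"missing section: ## {name}")
    # Whatever required sections do appear must follow the canonical order.
    expected = [name for name in REQUIRED_SECTIONS if name in present]
    if present != expected:
        problems.append(f"sections out of order: {present}")
    return problems
```

An empty list means the layout matches; extra sections outside the required seven are allowed by this sketch.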
+
 ### Skill guidelines

 - **No external dependencies unless absolutely necessary.** Prefer stdlib Python, curl, and existing Hermes tools (`web_extract`, `terminal`, `read_file`).
@@ -734,6 +800,47 @@ Hermes has terminal access. Security matters.

 If your PR affects security, note it explicitly in the description.

+### Dependency pinning policy (supply chain hardening)
+
+After the [litellm supply chain compromise](https://github.com/BerriAI/litellm/issues/24512) in March 2026 and the [Mini Shai-Hulud worm campaign](https://socket.dev/blog/tanstack-npm-packages-compromised-mini-shai-hulud-supply-chain-attack) in May 2026, all dependencies must follow these rules:
+
+| Source type | Required treatment | Rationale |
+|---|---|---|
+| **PyPI package** | `>=floor,<next_major` | PyPI versions are immutable once published, but new versions can be pushed into your range. A `<next_major` ceiling stops a 1.x install from upgrading to a malicious 2.0.0. |
+| **Git URL** (atroposlib, tinker, yc-bench, Baileys) | Full commit SHA | Branches and tags are mutable refs; SHA is content-addressed. |
+| **GitHub Actions** | Full commit SHA + version comment | Action tags are mutable refs (e.g. tj-actions/changed-files March 2025). Pin as `uses: owner/action@<sha>  # vX.Y.Z` |
+| **CI-only pip installs** | `==exact` | Hermetic CI builds; churn is acceptable. |
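
The GitHub Actions row can also be checked locally before a PR goes up. This is a minimal sketch using a hypothetical helper, `unpinned_actions`, and it is not the `supply-chain-audit.yml` workflow. It scans workflow text line by line with regexes instead of a YAML parser, so a quoted or unusually formatted `uses:` value may be reported as unpinned.

```python
import re

# Matches a step- or job-level "uses:" line and captures the reference.
USES_LINE = re.compile(r"^\s*-?\s*uses:\s*(\S+)")
# owner/action (optionally with a sub-path) pinned to a full 40-hex commit SHA.
PINNED_REF = re.compile(r"[\w.-]+/[\w./-]+@[0-9a-f]{40}")


def unpinned_actions(workflow_yaml: str) -> list[str]:
    """Return every `uses:` reference that is not pinned to a full commit SHA."""
    bad = []
    for line in workflow_yaml.splitlines():
        match = USES_LINE.match(line)
        if match is None:
            continue
        ref = match.group(1)
        # Local actions and docker:// images are not tag-based GitHub refs.
        if ref.startswith(("./", "docker://")):
            continue
        if not PINNED_REF.fullmatch(ref):
            bad.append(ref)
    return bad
```

The trailing `# vX.Y.Z` comment is ignored because the capture stops at the first whitespace after the reference.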
+
+**Every new PyPI dependency in a PR must have a `<next_major` upper bound.** PRs adding unbounded `>=X.Y.Z` specs will be rejected by reviewers. The `supply-chain-audit.yml` CI workflow also flags dependency manifest changes for manual review.
+
+**How to determine the ceiling:**
+- If the package is at version `1.x.y`, use `<2`.
+- If the package is at version `0.x.y` (pre-1.0), use `<0.(current_minor + 3)`, i.e. the current minor plus two more. For example, if current is `0.29.x`, use `<0.32`. This gives ~2 minor versions of headroom while keeping the window small enough that a hostile takeover version is unlikely to land inside it.
+- Exception: packages with very stable APIs (e.g. `aiohttp-socks`) can use `<1` at reviewer discretion.
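
The ceiling rules can be expressed as a small function. This is a hypothetical helper, not part of the repo. It assumes a plain numeric `X.Y.Z` version and, for pre-1.0 packages, follows the worked example (`0.29.x` gives `<0.32`).

```python
def upper_bound(version: str) -> str:
    """Suggest a '<ceiling' specifier for a plain numeric X.Y.Z version."""
    major, minor = (int(part) for part in version.split(".")[:2])
    if major >= 1:
        # Post-1.0: block the next major release.
        return f"<{major + 1}"
    # Pre-1.0: allow the current minor plus two more (0.29.x -> <0.32).
    return f"<0.{minor + 3}"
```

For example, `f"asyncpg>=0.29,{upper_bound('0.29.0')}"` produces `asyncpg>=0.29,<0.32`. Reviewers can still choose a tighter ceiling, as the `hindsight-client>=0.4.22,<0.5` example does, or `<1` under the stable-API exception.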
+
+**Examples:**
+```toml
+# ✅ Correct — post-1.0
+"openai>=2.21.0,<3"
+"pydantic>=2.12.5,<3"
+
+# ✅ Correct — pre-1.0 (tight minor window)
+"asyncpg>=0.29,<0.32"
+"aiosqlite>=0.20,<0.23"
+"hindsight-client>=0.4.22,<0.5"
+
+# ❌ Rejected — no upper bound
+"some-package>=1.2.3"
+
+# ❌ Rejected — too tight (blocks legitimate patches)
+"some-package==1.2.3"
+
+# ❌ Rejected — too loose for pre-1.0 (allows 80 minor versions)
+"some-package>=0.20,<1"
+```
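
A reviewer could pre-screen requirement strings against these examples with a check like the one below. It is a hypothetical sketch that understands only plain `name[extras]<op><version>` clauses (no environment markers or direct URLs), and it targets runtime PyPI dependencies, since CI-only installs are supposed to use `==exact`.

```python
import re

# One specifier clause such as ">=0.29" or "<3"; longer operators come first.
CLAUSE = re.compile(r"(===|==|~=|!=|<=|>=|<|>)\s*(\S+)$")


def spec_problems(requirement: str) -> list[str]:
    """Return reviewer objections to one runtime PyPI requirement string."""
    # Split "name[extras]>=a,<b" into the project name and its specifier text.
    name_match = re.match(r"\s*([A-Za-z0-9._-]+)(\[[^\]]*\])?(.*)", requirement)
    if name_match is None:
        return [f"unparseable requirement: {requirement!r}"]
    spec = name_match.group(3)
    problems, floor, upper = [], None, None
    for clause in (c.strip() for c in spec.split(",")):
        if not clause:
            continue
        m = CLAUSE.match(clause)
        if m is None:
            problems.append(f"unparsed clause: {clause!r}")
            continue
        op, version = m.groups()
        if op in ("==", "==="):
            problems.append("exact pin: blocks legitimate patches")
            upper = version  # bounded, just too tightly
        elif op == ">=":
            floor = version
        elif op in ("<", "<=", "~="):
            upper = version
    if upper is None:
        problems.append("no upper bound")
    elif floor and floor.startswith("0.") and upper in ("1", "1.0"):
        problems.append("pre-1.0 ceiling <1 is too loose")
    return problems
```

It flags `<1` on a pre-1.0 floor even though reviewers may allow it for very stable APIs such as `aiohttp-socks`, so treat the output as a prompt for review rather than a verdict.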
+
+**Reference PRs:** #2796 (litellm removal), #2810 (upper bounds pass), #9801 (SHA pinning + supply-chain-audit CI).
+
 ---

 ## Pull Request Process
--- a/17
+++ b/17
@@ -66,9 +66,11 @@ RUN npm install --prefer-offline --no-audit && \
 # frontend stats the readme path during dep resolution, so we `touch` an
 # empty placeholder — the real README is restored by `COPY . .` below.
 #
-# `uv sync --frozen --no-install-project --extra all` installs only the
-# deps reachable through the composite `[all]` extra (handpicked set
-# intended for the production image).  We do NOT use `--all-extras`:
+# `uv sync --frozen --no-install-project --extra all --extra messaging`
+# installs the deps reachable through the composite `[all]` extra
+# (handpicked set intended for the production image), plus gateway
+# messaging adapters that should work in the published image without a
+# first-boot lazy install.  We do NOT use `--all-extras`:
 # that would pull in `[rl]` (atroposlib + tinker + torch + wandb from
 # git), `[yc-bench]` (another git dep), and `[termux-all]` (Android
 # redundancy), none of which belong in the published container.
@@ -76,7 +78,7 @@ RUN npm install --prefer-offline --no-audit && \
 # The editable link is created after the source copy below.
 COPY pyproject.toml uv.lock ./
 RUN touch ./README.md
-RUN uv sync --frozen --no-install-project --extra all
+RUN uv sync --frozen --no-install-project --extra all --extra messaging

 # ---------- Source code ----------
 # .dockerignore excludes node_modules, so the installs above survive.
@@ -94,9 +96,13 @@ RUN cd web && npm run build && \
 # hermes_cli/main.py succeeds (see #18800). /opt/hermes/web is build-time
 # only (HERMES_WEB_DIST points at hermes_cli/web_dist) and is intentionally
 # not chowned here.
+# The .venv MUST remain hermes-writable so lazy_deps.py can install
+# remaining optional platform packages and future pin bumps at first use.
+# Without this, `uv pip install` fails with EACCES and adapters silently
+# fail to load.  See tools/lazy_deps.py.
 USER root
 RUN chmod -R a+rX /opt/hermes && \
-    chown -R hermes:hermes /opt/hermes/ui-tui /opt/hermes/node_modules
+    chown -R hermes:hermes /opt/hermes/.venv /opt/hermes/ui-tui /opt/hermes/node_modules
 # Start as root so the entrypoint can usermod/groupmod + gosu.
 # If HERMES_UID is unset, the entrypoint drops to the default hermes user (10000).

@@ -109,5 +115,6 @@ RUN uv pip install --no-cache-dir --no-deps -e "."
 ENV HERMES_WEB_DIST=/opt/hermes/hermes_cli/web_dist
 ENV HERMES_HOME=/opt/data
 ENV PATH="/opt/data/.local/bin:${PATH}"
+RUN mkdir -p /opt/data
 VOLUME [ "/opt/data" ]
 ENTRYPOINT [ "/usr/bin/tini", "-g", "--", "/opt/hermes/docker/entrypoint.sh" ]
--- a/README.md
+++ b/README.md
@@ -14,7 +14,7 @@

 **The self-improving AI agent built by [Nous Research](https://nousresearch.com).** It's the only agent with a built-in learning loop — it creates skills from experience, improves them during use, nudges itself to persist knowledge, searches its own past conversations, and builds a deepening model of who you are across sessions. Run it on a $5 VPS, a GPU cluster, or serverless infrastructure that costs nearly nothing when idle. It's not tied to your laptop — talk to it from Telegram while it works on a cloud VM.

-Use any model you want — [Nous Portal](https://portal.nousresearch.com), [OpenRouter](https://openrouter.ai) (200+ models), [NVIDIA NIM](https://build.nvidia.com) (Nemotron), [Xiaomi MiMo](https://platform.xiaomimimo.com), [z.ai/GLM](https://z.ai), [Kimi/Moonshot](https://platform.moonshot.ai), [MiniMax](https://www.minimax.io), [Hugging Face](https://huggingface.co), OpenAI, or your own endpoint. Switch with `hermes model` — no code changes, no lock-in.
+Use any model you want — [Nous Portal](https://portal.nousresearch.com), [OpenRouter](https://openrouter.ai) (200+ models), [NovitaAI](https://novita.ai) (AI-native cloud for Model API, Agent Sandbox, and GPU Cloud), [NVIDIA NIM](https://build.nvidia.com) (Nemotron), [Xiaomi MiMo](https://platform.xiaomimimo.com), [z.ai/GLM](https://z.ai), [Kimi/Moonshot](https://platform.moonshot.ai), [MiniMax](https://www.minimax.io), [Hugging Face](https://huggingface.co), OpenAI, or your own endpoint. Switch with `hermes model` — no code changes, no lock-in.

 <table>
 <tr><td><b>A real terminal interface</b></td><td>Full TUI with multiline editing, slash-command autocomplete, conversation history, interrupt-and-redirect, and streaming tool output.</td></tr>
@@ -23,7 +23,7 @@ Use any model you want — [Nous Portal](https://portal.nousresearch.com), [Open
 <tr><td><b>Scheduled automations</b></td><td>Built-in cron scheduler with delivery to any platform. Daily reports, nightly backups, weekly audits — all in natural language, running unattended.</td></tr>
 <tr><td><b>Delegates and parallelizes</b></td><td>Spawn isolated subagents for parallel workstreams. Write Python scripts that call tools via RPC, collapsing multi-step pipelines into zero-context-cost turns.</td></tr>
 <tr><td><b>Runs anywhere, not just your laptop</b></td><td>Seven terminal backends — local, Docker, SSH, Singularity, Modal, Daytona, and Vercel Sandbox. Daytona and Modal offer serverless persistence — your agent's environment hibernates when idle and wakes on demand, costing nearly nothing between sessions. Run it on a $5 VPS or a GPU cluster.</td></tr>
-<tr><td><b>Research-ready</b></td><td>Batch trajectory generation, Atropos RL environments, trajectory compression for training the next generation of tool-calling models.</td></tr>
+<tr><td><b>Research-ready</b></td><td>Batch trajectory generation, trajectory compression for training the next generation of tool-calling models.</td></tr>
 </table>

 ---
@@ -43,7 +43,7 @@ curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scri
 Run this in PowerShell:

 ```powershell
-irm https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.ps1 | iex
+iex (irm https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.ps1)
 ```

 The installer handles everything: uv, Python 3.11, Node.js, ripgrep, ffmpeg, **and a portable Git Bash** (MinGit, unpacked to `%LOCALAPPDATA%\hermes\git` — no admin required, completely isolated from any system Git install).  Hermes uses this bundled Git Bash to run shell commands.
@@ -175,8 +175,6 @@ uv pip install -e ".[all,dev]"
 scripts/run_tests.sh
 ```

-> **RL Training (optional):** The RL/Atropos integration (`environments/`) — see [`CONTRIBUTING.md`](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md#development-setup) for the full setup.
-
 ---

 ## Community
@@ -184,6 +182,7 @@ scripts/run_tests.sh
 - 💬 [Discord](https://discord.gg/NousResearch)
 - 📚 [Skills Hub](https://agentskills.io)
 - 🐛 [Issues](https://github.com/NousResearch/hermes-agent/issues)
+- 🔌 [computer-use-linux](https://github.com/avifenesh/computer-use-linux) — Linux desktop-control MCP server for Hermes and other MCP hosts, with AT-SPI accessibility trees, Wayland/X11 input, screenshots, and compositor window targeting.
 - 🔌 [HermesClaw](https://github.com/AaronWong1999/hermesclaw) — Community WeChat bridge: Run Hermes Agent and OpenClaw on the same WeChat account.

 ---
--- a/README.zh-CN.md
+++ b/README.zh-CN.md
@@ -23,7 +23,7 @@
 <tr><td><b>定时自动化</b></td><td>内置 cron 调度器，支持向任何平台投递。日报、夜间备份、周审计——全部用自然语言描述，无人值守运行。</td></tr>
 <tr><td><b>委派与并行</b></td><td>生成隔离子代理处理并行工作流。编写 Python 脚本通过 RPC 调用工具，将多步管道压缩为零上下文开销的轮次。</td></tr>
 <tr><td><b>随处运行</b></td><td>六种终端后端——本地、Docker、SSH、Daytona、Singularity 和 Modal。Daytona 和 Modal 提供 Serverless 持久化——代理环境空闲时休眠、按需唤醒，空闲期间几乎零成本。$5 VPS 或 GPU 集群都能跑。</td></tr>
-<tr><td><b>研究就绪</b></td><td>批量轨迹生成、Atropos RL 环境、轨迹压缩——用于训练下一代工具调用模型。</td></tr>
+<tr><td><b>研究就绪</b></td><td>批量轨迹生成、轨迹压缩——用于训练下一代工具调用模型。</td></tr>
 </table>

 ---
@@ -161,12 +161,6 @@ uv pip install -e ".[all,dev]"
 python -m pytest tests/ -q
 ```

-> **RL 训练（可选）：** 如需参与 RL/Tinker-Atropos 集成开发：
-> ```bash
-> git submodule update --init tinker-atropos
-> uv pip install -e "./tinker-atropos"
-> ```
-
 ---

 ## 社区
--- a/RELEASE_v0.14.0.md
+++ b/RELEASE_v0.14.0.md
@@ -0,0 +1,479 @@
+# Hermes Agent v0.14.0 (v2026.5.16)
+
+**Release Date:** May 16, 2026
+**Since v0.13.0:** 808 commits · 633 merged PRs · 1393 files changed · 165,061 insertions · 545 issues closed (12 P0, 50 P1) · 215 community contributors (including co-authors)
+
+> The Foundation Release — Hermes installs and runs anywhere, ships with the things you actually want to use, and stops shipping the things you don't. xAI Grok lands as a SuperGrok OAuth provider with grok-4.3 bumped to a 1M context window. A new OpenAI-compatible local proxy turns any OAuth-authed Hermes provider — Claude Pro, ChatGPT Pro, SuperGrok — into an endpoint that Codex / Aider / Cline / Continue can hit. `x_search` lands as a first-class X (Twitter) search tool with OAuth-or-API-key auth. The Microsoft Teams stack is wired end-to-end (Graph auth + webhook listener + pipeline runtime + outbound delivery). A debloating wave makes installs dramatically lighter — heavyweight backends now lazy-install on first use, the `[all]` extras drop everything covered by lazy-deps, and a tiered install falls back when a wheel rejects on your platform. `pip install hermes-agent` works from PyPI. The cold-start wave shaves ~19 seconds off `hermes` launch. Browser CDP calls are 180x faster. Two new messaging platforms (LINE + SimpleX Chat) bring the total to 22. Cross-session 1-hour Claude prompt caching, `/handoff` that actually transfers sessions live, native button UI for `clarify` on Telegram and Discord, Discord channel history backfill, LSP semantic diagnostics on every write, a unified pluggable `video_generate`, a `computer_use` cua-driver backend that finally works with non-Anthropic providers, clickable URLs in any terminal, Zed ACP Registry integration via `uvx`, native Windows beta, 9 new optional skills, OpenRouter Pareto Code router, huggingface/skills as a trusted default tap. 12 P0 + 50 P1 closures.
+
+---
+
+## ✨ Highlights
+
+- **xAI Grok via SuperGrok OAuth — and grok-4.3 jumps to a 1M context window** — If you pay for SuperGrok, you can now use Grok inside Hermes by signing in with your xAI account — no API key, no separate billing. The wire-through also bumps grok-4.3 to a 1M token context window, so you can drop whole codebases or research corpora into a single prompt. Includes proper handling for entitlement errors and a docs page on SSH tunneling for completing the OAuth flow when you're SSH'd into a remote box. ([#26534](https://github.com/NousResearch/hermes-agent/pull/26534), [#26664](https://github.com/NousResearch/hermes-agent/pull/26664), [#26644](https://github.com/NousResearch/hermes-agent/pull/26644), [#26592](https://github.com/NousResearch/hermes-agent/pull/26592))
+
+- **OpenAI-compatible local proxy for OAuth providers** — Run `hermes proxy` and you get an `http://localhost:port` endpoint that speaks the OpenAI API but is backed by whichever OAuth provider you're signed into — Claude Pro, ChatGPT Pro, SuperGrok. Now any tool that expects an OpenAI-compatible endpoint (Codex CLI, Aider, Cline, Continue, your custom scripts) just works with your existing subscription, no API key required. One subscription, every tool. ([#25969](https://github.com/NousResearch/hermes-agent/pull/25969))
+
+- **`x_search` — first-class X (Twitter) search tool** — The agent can now search X directly without installing a skill or wiring up a custom integration. Search the timeline, find threads, surface specific posts — straight from the chat. Auth with either your X OAuth login or an API key, whichever you have. ([#26763](https://github.com/NousResearch/hermes-agent/pull/26763))
+
+- **Microsoft Teams — end-to-end** — Hermes can now read messages from Teams and post back. The full Microsoft Graph stack lands together: auth + client foundation, a webhook listener that receives Teams events, a pipeline plugin runtime, and outbound delivery. Wire up the bot once, then chat to your agent from any Teams channel, DM, or group. (salvages of #21408–#21411) ([#21922](https://github.com/NousResearch/hermes-agent/pull/21922), [#21969](https://github.com/NousResearch/hermes-agent/pull/21969), [#22007](https://github.com/NousResearch/hermes-agent/pull/22007), [#22024](https://github.com/NousResearch/hermes-agent/pull/22024))
+
+- **Debloating wave — lighter installs, less of what you don't use** — A clean install used to pull down everything: every messaging adapter SDK, every image-gen SDK, every voice/TTS provider, whether you used them or not. Now those heavy backends (Slack / Matrix / Feishu / DingTalk adapters, hindsight client, codex app-server, Pixverse / Camofox / image-gen SDKs, voice/TTS providers) install automatically the first time you actually use them. The `[all]` extras drop everything covered by lazy-deps, the installer falls back through tiers when a wheel doesn't fit your platform, and a supply-chain advisory checker scans every install for unsafe versions. Faster installs, smaller disk footprint, fewer transitive vulnerabilities. ([#24220](https://github.com/NousResearch/hermes-agent/pull/24220), [#24515](https://github.com/NousResearch/hermes-agent/pull/24515), [#25014](https://github.com/NousResearch/hermes-agent/pull/25014), [#25038](https://github.com/NousResearch/hermes-agent/pull/25038), [#25766](https://github.com/NousResearch/hermes-agent/pull/25766), [#21818](https://github.com/NousResearch/hermes-agent/pull/21818))
+
+- **`pip install hermes-agent && hermes`** — Hermes Agent is now a real PyPI package. No more cloning the repo or running shell installers — one pip command and you're running. The wheel ships with the Ink TUI bundle and the shell launcher, so the full experience comes out of the box. (salvage of [#26350](https://github.com/NousResearch/hermes-agent/pull/26350)) ([#26593](https://github.com/NousResearch/hermes-agent/pull/26593), [#26148](https://github.com/NousResearch/hermes-agent/pull/26148))
+
+- **Cross-session 1h Claude prompt cache** — When you use Claude through Anthropic, OpenRouter, or Nous Portal, the prompt prefix (system prompt, skills, memory) now caches for an hour across sessions. Start a `/new` session and the first response comes back faster and cheaper because the cache is still warm from your last session. Background memory review hits the cache too, so it's not paying full price every turn. ([#23828](https://github.com/NousResearch/hermes-agent/pull/23828), [#25434](https://github.com/NousResearch/hermes-agent/pull/25434), [#24778](https://github.com/NousResearch/hermes-agent/pull/24778))
+
+- **180x faster `browser_console` evaluations** — When the agent uses the browser tool to inspect a page or run JavaScript, those calls now share one persistent connection to Chrome instead of spinning up a new DevTools session every time. The difference is huge: things that used to take a couple of seconds per call return in milliseconds. Real-world page interactions feel instant. ([#23226](https://github.com/NousResearch/hermes-agent/pull/23226))
+
+- **Cold-start performance wave — ~19 seconds off `hermes` launch** — Running `hermes` used to make you wait through a chunk of import overhead and network calls before you saw a prompt. Now the launch path is mostly deferred: heavy adapters only load when you use them, model catalogs come from disk cache first, doctor checks run in parallel, and `chat -q` skips the welcome banner entirely. The `hermes tools` All-Platforms screen alone dropped from 14 seconds to under 1.5 seconds. ([#22138](https://github.com/NousResearch/hermes-agent/pull/22138), [#22120](https://github.com/NousResearch/hermes-agent/pull/22120), [#22681](https://github.com/NousResearch/hermes-agent/pull/22681), [#22790](https://github.com/NousResearch/hermes-agent/pull/22790), [#22808](https://github.com/NousResearch/hermes-agent/pull/22808), [#22831](https://github.com/NousResearch/hermes-agent/pull/22831), [#22859](https://github.com/NousResearch/hermes-agent/pull/22859), [#22904](https://github.com/NousResearch/hermes-agent/pull/22904), [#22766](https://github.com/NousResearch/hermes-agent/pull/22766), [#25341](https://github.com/NousResearch/hermes-agent/pull/25341))
+
+- **Two new messaging platforms — LINE + SimpleX Chat** — LINE is huge in Japan, Korea, and Taiwan, and now Hermes runs natively on the LINE Messaging API. SimpleX Chat is the privacy-focused decentralized messenger with no user IDs — also wired up as a first-class platform. That brings Hermes to 22 messaging platforms total, so wherever you and your team chat, the agent can be there. ([#23197](https://github.com/NousResearch/hermes-agent/pull/23197), [#26232](https://github.com/NousResearch/hermes-agent/pull/26232))
+
+- **`/handoff` actually transfers the session live** — Switching models or personalities mid-conversation used to mean losing context or starting over. Now `/handoff` moves your active session — every message, every tool call, every piece of context — to the target model, persona, or profile, live, without dropping anything. Mid-debugging hand off from a fast model to a deep-reasoning one, or pass a session between profiles for different parts of a task. ([#23395](https://github.com/NousResearch/hermes-agent/pull/23395))
+
+- **Native button UI for `clarify` on Telegram and Discord** — When the agent uses the `clarify` tool to ask you a multiple-choice question, it now shows real platform-native buttons on Telegram and Discord instead of asking you to type back the option number. Tap the button, the agent gets your answer. Especially nice on mobile. ([#24199](https://github.com/NousResearch/hermes-agent/pull/24199), [#25485](https://github.com/NousResearch/hermes-agent/pull/25485))
+
+- **Discord channel history backfill (default on)** — When Hermes joins a Discord channel or thread for the first time, it now reads the recent message history so it knows what's been said before it responds. No more "what are we talking about?" — the agent has the context that's already on screen for everyone else. ([#25984](https://github.com/NousResearch/hermes-agent/pull/25984))
+
+- **`vision_analyze` returns pixels to vision-capable models** — When you point the agent at an image with `vision_analyze` and the active model can actually see (GPT-5, Claude, Gemini, Grok-vision), Hermes now passes the raw pixels straight to the model instead of converting them to a text description first. You get the model's actual visual reasoning instead of a degraded text-summary round-trip. ([#22955](https://github.com/NousResearch/hermes-agent/pull/22955))
+
+- **Per-turn file-mutation verifier footer** — After every turn that wrote or edited files, the agent now gets a short footer summarizing exactly what changed on disk — the file paths, the line counts, the actual delta. That means the agent catches its own mistakes when a write didn't land or got silently overwritten, instead of confidently telling you "I added the function" when the file wasn't actually saved. ([#24498](https://github.com/NousResearch/hermes-agent/pull/24498))
+
+- **LSP semantic diagnostics on every write** — When the agent uses `write_file` or `patch`, Hermes now runs a real language server against the edited file and surfaces any new errors back to the agent before the next turn. Type errors, undefined symbols, missing imports — caught immediately. Goes way beyond v0.13.0's basic Python/JSON/YAML/TOML linting because it's actual semantic analysis. ([#24168](https://github.com/NousResearch/hermes-agent/pull/24168), [#25978](https://github.com/NousResearch/hermes-agent/pull/25978))
+
+- **Unified `video_generate` with pluggable provider backends** — One tool, any video model. Hermes ships with the obvious backends already, but you can drop in a new video provider as a plugin without touching core. So when a new video model lands next month, it can be a one-file plugin instead of a fork. ([#25126](https://github.com/NousResearch/hermes-agent/pull/25126))
+
+- **`computer_use` cua-driver backend — works with non-Anthropic models now** — Computer-use (the agent controlling your mouse and keyboard to drive GUI apps) used to be locked to Anthropic's SDK. The new cua-driver backend works with non-Anthropic providers too, has proper focus-safe operations, and refreshes itself on `hermes update`. Now any vision-capable model can drive your desktop. (re-salvage of #16936) ([#21967](https://github.com/NousResearch/hermes-agent/pull/21967), [#24063](https://github.com/NousResearch/hermes-agent/pull/24063))
+
+- **Clickable URLs in any terminal** — Links in agent output are now real OSC8 hyperlinks with hover-highlight in any terminal that supports them. Click to open in your browser — no more copy-paste-trim of long URLs from the transcript. Just works in iTerm2, Kitty, Ghostty, modern Windows Terminal, etc. (@OutThisLife) ([#25071](https://github.com/NousResearch/hermes-agent/pull/25071), [#24013](https://github.com/NousResearch/hermes-agent/pull/24013))
+
+- **Zed ACP Registry — `uvx` install in one click** — Hermes is now listed in Zed's Agent Client Protocol registry, so Zed users can install it with one click. The install path uses `uvx` so there's no npm dependency. `hermes acp --setup-browser` bootstraps the browser tools for registry-driven installs. (salvage of [#25908](https://github.com/NousResearch/hermes-agent/pull/25908)) ([#26079](https://github.com/NousResearch/hermes-agent/pull/26079), [#26120](https://github.com/NousResearch/hermes-agent/pull/26120), [#26234](https://github.com/NousResearch/hermes-agent/pull/26234))
+
+- **OpenRouter Pareto Code router with `min_coding_score` knob** — OpenRouter's "Pareto" router automatically picks the cheapest model that meets a minimum quality bar. The new `min_coding_score` config lets you set that bar for coding tasks specifically — Hermes routes to the most affordable model that's at least that good at code. Stop paying for top-tier models when a mid-tier one would do. ([#22838](https://github.com/NousResearch/hermes-agent/pull/22838))
+
+- **NovitaAI as a new model provider** — NovitaAI joins the provider lineup, giving you another option for open-source model hosting (Llama, Qwen, DeepSeek, etc.) with their pricing and rate limits. (salvage #7219) (@kshitijk4poor) ([#25507](https://github.com/NousResearch/hermes-agent/pull/25507))
+
+- **Codex app-server runtime for OpenAI/Codex models** — An optional runtime that drives OpenAI's Codex CLI under the hood when you're using OpenAI or Codex paths. You get session reuse, automatic retirement of wedged sessions, and proper OAuth refresh classification, the kind of plumbing that keeps long agentic runs from falling over. ([#24182](https://github.com/NousResearch/hermes-agent/pull/24182), [#25769](https://github.com/NousResearch/hermes-agent/pull/25769))
+
+- **`huggingface/skills` as a trusted default tap** — The community skills index hosted at huggingface.co/skills is now wired into the Skills Hub by default. So when somebody publishes a useful skill there, you can install it from your own `hermes skills` browser without any extra config. (closes #2549) ([#26219](https://github.com/NousResearch/hermes-agent/pull/26219))
+
+- **9 new optional skills** — Hyperliquid (perp + spot trading via the SDK and REST API), Yahoo Finance (live market data, fundamentals, historicals), api-testing (REST + GraphQL debug recipes), unified EVM multi-chain (one skill covers Ethereum + L2s + Base), darwinian-evolver (evolutionary prompt/skill tuning), osint-investigation (OSINT recipes for people / domains / orgs), pinggy-tunnel (expose local services to the public internet), watchers (polls RSS / HTTP JSON / GitHub via cron `no_agent` mode for change detection), and a full Notion overhaul for the May 2026 Developer Platform. ([#23582](https://github.com/NousResearch/hermes-agent/pull/23582), [#23583](https://github.com/NousResearch/hermes-agent/pull/23583), [#23590](https://github.com/NousResearch/hermes-agent/pull/23590), [#25299](https://github.com/NousResearch/hermes-agent/pull/25299), [#26760](https://github.com/NousResearch/hermes-agent/pull/26760), [#26729](https://github.com/NousResearch/hermes-agent/pull/26729), [#26765](https://github.com/NousResearch/hermes-agent/pull/26765), [#21881](https://github.com/NousResearch/hermes-agent/pull/21881), [#26612](https://github.com/NousResearch/hermes-agent/pull/26612))
+
+- **API server exposes run approval events** — If you're driving Hermes programmatically through the HTTP API, long-running runs no longer silently hang when the agent hits an approval-required command. The approval request now surfaces on the API stream so your client can prompt the user and reply — no more silent stalls. (salvage of [#20311](https://github.com/NousResearch/hermes-agent/pull/20311)) ([#21899](https://github.com/NousResearch/hermes-agent/pull/21899))
+
+- **Plugins can run any LLM call via `ctx.llm` + replace built-in tools via `tool_override`** — If you're writing a Hermes plugin, you now get first-class access to make LLM calls through the active provider and credentials — no manual client wiring. The new `tool_override` flag lets a plugin swap out a built-in tool with its own implementation cleanly. Plugin authors get the same model-routing and auth plumbing the core agent uses. (closes #11049) ([#23194](https://github.com/NousResearch/hermes-agent/pull/23194), [#26759](https://github.com/NousResearch/hermes-agent/pull/26759))
+
+- **Brave Search (free tier) + DuckDuckGo (DDGS) as web-search providers** — Two new free web-search backends join Tavily, SearXNG, and Exa. Brave Search has a generous free tier; DDGS is the DuckDuckGo scraper that needs no key at all. Pick whichever fits your budget and rate-limit needs. ([#21337](https://github.com/NousResearch/hermes-agent/pull/21337))
+
+- **Sudo brute-force block + 3 dangerous-command bypasses closed + tool-error sanitization** — The approval gate now blocks `sudo -S` brute-force attempts and classifies stdin-fed or askpass-stripped sudo invocations as DANGEROUS. Three known bypasses of dangerous-command detection are closed (inspired by Claude Code's command-detection work). And tool error strings are now sanitized before being re-injected into the model context, so a malicious file or remote service can't pass instructions to your agent through error output. ([#23736](https://github.com/NousResearch/hermes-agent/pull/23736), [#26829](https://github.com/NousResearch/hermes-agent/pull/26829), [#26823](https://github.com/NousResearch/hermes-agent/pull/26823))
+
+- **`/subgoal` — user-added criteria appended to an active `/goal`** — When you've got a `/goal` running (the persistent Ralph-loop goal where the agent keeps going until criteria are met), you can now use `/subgoal <text>` to layer extra success criteria onto it mid-run. The judge factors your new criteria into the done-or-keep-going decision without restarting the loop. ([#25449](https://github.com/NousResearch/hermes-agent/pull/25449))
+
+- **Provider rename — Alibaba Cloud → Qwen Cloud** — The Alibaba Cloud provider is renamed to Qwen Cloud in the picker and config to match what the rest of the world calls it. Existing config keys still work — no breaking changes — but the UI matches the actual brand now. ([#24835](https://github.com/NousResearch/hermes-agent/pull/24835))
+
+- **Native Windows support (early beta)** — Hermes now runs natively on `cmd.exe` and PowerShell without WSL. A full PowerShell installer handles MinGit auto-install, Microsoft Store python stub detection, and the foreground Ctrl+C dance. There are still rough edges (hence the "early beta" stamp), and ~40 follow-up Windows-only fixes have already landed in this release window, but the basic loop works end-to-end on a clean Windows box. ([#21561](https://github.com/NousResearch/hermes-agent/pull/21561))
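+
+For readers wondering what an OSC 8 hyperlink is (see the clickable-URL item above): it is an escape sequence that wraps visible text with a target URI, and terminals without support generally just show the text. Here is a minimal Python sketch of the sequence. `osc8_link` is a hypothetical helper, not Hermes' actual renderer:
+
+```python
+def osc8_link(url: str, text: str) -> str:
+    """Wrap `text` in an OSC 8 hyperlink that points at `url`.
+
+    Layout: ESC ] 8 ; params ; URI ST <text> ESC ] 8 ; ; ST
+    where ST, the string terminator, is ESC followed by a backslash.
+    """
+    esc = "\x1b"
+    st = esc + "\\"  # string terminator: ESC \
+    # Empty params field; the second, URI-less sequence closes the link.
+    return f"{esc}]8;;{url}{st}{text}{esc}]8;;{st}"
+
+
+if __name__ == "__main__":
+    # In an OSC 8-capable terminal this prints a clickable "Hermes Agent" label.
+    print(osc8_link("https://github.com/NousResearch/hermes-agent", "Hermes Agent"))
+```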
+
+
+---
+
+## 🪟 Windows — Native Support (Early Beta)
+
+### Bootstrap & installer
+- **Native Windows support (early beta)** — first-class native Windows path across CLI / gateway / TUI / tools ([#21561](https://github.com/NousResearch/hermes-agent/pull/21561))
+- **PyPI wheel packaging — `pip install hermes-agent && hermes`** (salvage of #26350) ([#26593](https://github.com/NousResearch/hermes-agent/pull/26593))
+- **Recognise Shift+Enter as a newline key** + Windows docs (salvage #21545) ([#22130](https://github.com/NousResearch/hermes-agent/pull/22130))
+- **Preserve Ctrl+C for Windows foreground runs** (@helix4u) ([#22752](https://github.com/NousResearch/hermes-agent/pull/22752))
+- **Stop spamming cwd-missing + tirith-spawn warnings on every terminal call** ([#26618](https://github.com/NousResearch/hermes-agent/pull/26618))
+- **Use `--extra all` not `--all-extras`; drop lazy-covered extras from `[all]`** ([#24515](https://github.com/NousResearch/hermes-agent/pull/24515))
+
+### Windows-specific fixes (40+ across cli / tools / gateway / curator / TUI)
+A long tail of native-Windows fixes shipped alongside the beta — taskkill-based subprocess management, MinGit auto-install, Microsoft Store python stub detection, npm prefix handling, native PTY paths, signal handling differences, foreground process management, ANSI sequence handling, path normalization, file-locking semantics, and many more. Full list in commit log under `fix(windows)` / `feat(windows)` / `windows`.
+
+---
+
+## 🚀 Performance Wave
+
+### Cold start
+- **Cut ~19s from `hermes` cold start** — skills cache + lazy Feishu + no Nous HTTP at startup ([#22138](https://github.com/NousResearch/hermes-agent/pull/22138))
+- **Skip eager plugin discovery on known built-in subcommands** ([#22120](https://github.com/NousResearch/hermes-agent/pull/22120))
+- **Cache Nous auth + .env loads** — cuts `hermes tools` "All Platforms" load time from 14s to <1.5s ([#25341](https://github.com/NousResearch/hermes-agent/pull/25341))
+- **Skip welcome banner on `chat -q` single-query mode** ([#22904](https://github.com/NousResearch/hermes-agent/pull/22904))
+- **Defer heavy google-cloud imports in google_chat to first adapter use** ([#22681](https://github.com/NousResearch/hermes-agent/pull/22681))
+- **Defer QQAdapter and YuanbaoAdapter imports via PEP 562** ([#22790](https://github.com/NousResearch/hermes-agent/pull/22790))
+- **Defer httpx import in teams to first webhook call** ([#22831](https://github.com/NousResearch/hermes-agent/pull/22831))
+- **Defer fal_client import to first generation request** ([#22859](https://github.com/NousResearch/hermes-agent/pull/22859))
+- **models.dev cache-first lookup, skip network when disk cache is fresh** ([#22808](https://github.com/NousResearch/hermes-agent/pull/22808))
+- **Parallelize API connectivity checks in `hermes doctor` and disable IMDS** ([#22766](https://github.com/NousResearch/hermes-agent/pull/22766))
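+
+Several cold-start items above defer heavy imports until first use, and #22790 does this through PEP 562 (module-level `__getattr__`). Below is a minimal sketch of that pattern as it might appear in a package `__init__.py`. The attribute map is hypothetical, and stdlib modules stand in for heavy platform SDKs so the sketch runs anywhere:
+
+```python
+# Sketch of a package __init__.py that defers heavy imports (PEP 562).
+# The name -> module map is hypothetical; stdlib modules stand in for SDKs.
+import importlib
+from typing import Any
+
+_LAZY_ATTRS = {
+    "JSONDecoder": "json",
+    "Decimal": "decimal",
+}
+
+
+def __getattr__(name: str) -> Any:
+    """Runs only when normal module attribute lookup fails (PEP 562)."""
+    module_path = _LAZY_ATTRS.get(name)
+    if module_path is None:
+        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
+    value = getattr(importlib.import_module(module_path), name)
+    globals()[name] = value  # cache: later lookups no longer reach __getattr__
+    return value
+
+
+def __dir__() -> list[str]:
+    """Advertise lazy names to dir() and tab completion."""
+    return sorted(set(globals()) | set(_LAZY_ATTRS))
+```
+
+With this in place, importing the package stays cheap. The first `package.Decimal` access triggers the real import, and later lookups hit the cached global.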
+
+### Runtime
+- **180x faster `browser_console` evaluations** — route through supervisor's persistent CDP WebSocket ([#23226](https://github.com/NousResearch/hermes-agent/pull/23226))
+- **Tune Telegram cadence + adaptive fast-path for short replies** (salvage of #10388) ([#23587](https://github.com/NousResearch/hermes-agent/pull/23587))
+- **Accumulate length-continuation prefix via list+join** ([#26237](https://github.com/NousResearch/hermes-agent/pull/26237))
+
+### Prompt caching
+- **Cross-session 1h prefix cache for Claude on Anthropic / OpenRouter / Nous Portal** ([#23828](https://github.com/NousResearch/hermes-agent/pull/23828))
+- **Hit prefix cache in background review fork** (salvage #17276 + #25427) ([#25434](https://github.com/NousResearch/hermes-agent/pull/25434))
+
+---
+
+## 📦 Installation & Distribution
+
+### PyPI + supply-chain
+- **PyPI wheel packaging — `pip install hermes-agent && hermes`** (salvage of #26350) ([#26593](https://github.com/NousResearch/hermes-agent/pull/26593))
+- **Supply-chain advisory checker + lazy-install framework + tiered install fallback** ([#24220](https://github.com/NousResearch/hermes-agent/pull/24220))
+- **Use `--extra all` not `--all-extras`; drop lazy-covered extras from `[all]`** ([#24515](https://github.com/NousResearch/hermes-agent/pull/24515))
+- **Skip browser download when system chromium exists** (@helix4u) ([#25317](https://github.com/NousResearch/hermes-agent/pull/25317))
+
+### Nix
+- **`extraDependencyGroups` for sealed venv extras** (@alt-glitch) ([#21817](https://github.com/NousResearch/hermes-agent/pull/21817))
+- **Refresh npm lockfile hashes** — keeps Nix flake builds reproducible
+
+### Docker
+- **Bootstrap auth.json from env on first boot** ([#21880](https://github.com/NousResearch/hermes-agent/pull/21880))
+- **Drop manual @hermes/ink build, rely on esbuild bundle** — slimmer image
+
+### ACP / Zed
+- **Zed ACP Registry integration** (salvage of #25908) ([#26079](https://github.com/NousResearch/hermes-agent/pull/26079))
+- **Switch to uvx distribution, drop npm launcher** ([#26120](https://github.com/NousResearch/hermes-agent/pull/26120))
+- **`hermes acp --setup-browser` bootstraps browser tools for registry installs** ([#26234](https://github.com/NousResearch/hermes-agent/pull/26234))
+
+---
+
+## 🏗️ Core Agent & Architecture
+
+### Sessions & handoff
+- **`/handoff` actually transfers the session live** ([#23395](https://github.com/NousResearch/hermes-agent/pull/23395))
+- **Expose `HERMES_SESSION_ID` env var to agent tools** (@alt-glitch) ([#23847](https://github.com/NousResearch/hermes-agent/pull/23847))
+
+### Goals (Ralph loop)
+- **`/subgoal` — user-added criteria appended to active `/goal`** ([#25449](https://github.com/NousResearch/hermes-agent/pull/25449))
+- **`/goal` checklist + `/subgoal` user controls** ([#23456](https://github.com/NousResearch/hermes-agent/pull/23456)) — rolled back within this release window ([#23813](https://github.com/NousResearch/hermes-agent/pull/23813)); `/subgoal` returned in a simpler form via [#25449](https://github.com/NousResearch/hermes-agent/pull/25449)
+
+### Compression
+- **Make `protect_first_n` configurable** ([#25447](https://github.com/NousResearch/hermes-agent/pull/25447))
+
+### Verification
+- **Per-turn file-mutation verifier footer** ([#24498](https://github.com/NousResearch/hermes-agent/pull/24498))
+
+### Stream retry
+- **Log inner cause, upstream headers, bytes/elapsed on every drop** ([#23005](https://github.com/NousResearch/hermes-agent/pull/23005))
+
+---
+
+## 🤖 Models & Providers
+
+### New providers
+- **xAI Grok OAuth (SuperGrok Subscription) provider** ([#26534](https://github.com/NousResearch/hermes-agent/pull/26534))
+- **NovitaAI provider** (salvage #7219) (@kshitijk4poor) ([#25507](https://github.com/NousResearch/hermes-agent/pull/25507))
+- **NVIDIA NIM billing origin header** (salvage #25211) ([#26585](https://github.com/NousResearch/hermes-agent/pull/26585))
+
+### Provider work
+- **OpenRouter Pareto Code router with `min_coding_score` knob** ([#22838](https://github.com/NousResearch/hermes-agent/pull/22838))
+- **Optional codex app-server runtime for OpenAI/Codex models** ([#24182](https://github.com/NousResearch/hermes-agent/pull/24182))
+- **Codex-runtime: retire wedged sessions + post-tool watchdog + OAuth refresh classify** ([#25769](https://github.com/NousResearch/hermes-agent/pull/25769))
+- **Codex-runtime: skip unavailable plugins during migration** ([#25437](https://github.com/NousResearch/hermes-agent/pull/25437))
+- **Codex-runtime: de-dup `[plugins.X]` tables and stop leaking HERMES_HOME into config.toml** (#26250) (@kshitijk4poor) ([#26260](https://github.com/NousResearch/hermes-agent/pull/26260))
+- **Pass `reasoning.effort` to xAI Responses API** ([#22807](https://github.com/NousResearch/hermes-agent/pull/22807))
+- **Custom provider: prompt and persist explicit `api_mode`** ([#25068](https://github.com/NousResearch/hermes-agent/pull/25068))
+- **Rename Alibaba Cloud → Qwen Cloud, reorder picker** ([#24835](https://github.com/NousResearch/hermes-agent/pull/24835))
+- **Restore gpt-5.3-codex-spark for ChatGPT Pro** (salvage #18286 + #19530, fixes #16172) (@kshitijk4poor) ([#22991](https://github.com/NousResearch/hermes-agent/pull/22991))
+- **Inject tool-use enforcement for GLM models** ([#24715](https://github.com/NousResearch/hermes-agent/pull/24715))
+- **Use Nous Portal as model metadata authority** (@rob-maron) ([#24502](https://github.com/NousResearch/hermes-agent/pull/24502))
+- **Unified `client=hermes-client-v<version>` tag on every Portal request** ([#24779](https://github.com/NousResearch/hermes-agent/pull/24779))
+- **Prevent stale Ollama credentials after provider switch** (@kshitijk4poor) ([#21703](https://github.com/NousResearch/hermes-agent/pull/21703))
+- **Auxiliary client: rotate pooled auth after quota failures** (salvage #22779) ([#22792](https://github.com/NousResearch/hermes-agent/pull/22792))
+- **Auxiliary client: skip providers without credentials immediately** (#25395) ([#25487](https://github.com/NousResearch/hermes-agent/pull/25487))
+- **Auth: send Nous refresh token via header** (@shannonsands) ([#21578](https://github.com/NousResearch/hermes-agent/pull/21578))
+- **MiniMax: harden OAuth dashboard and runtime** ([#24165](https://github.com/NousResearch/hermes-agent/pull/24165))
+
+### OpenAI-compatible proxy
+- **Local OpenAI-compatible proxy for OAuth providers** — Codex / Aider / Cline can hit Claude Pro, ChatGPT Pro, SuperGrok ([#25969](https://github.com/NousResearch/hermes-agent/pull/25969))
+
+---
+
+## 📱 Messaging Platforms (Gateway)
+
+### New platforms
+- **LINE Messaging API platform plugin** ([#23197](https://github.com/NousResearch/hermes-agent/pull/23197))
+- **SimpleX Chat platform plugin** (salvages #2558) ([#26232](https://github.com/NousResearch/hermes-agent/pull/26232))
+
+### Microsoft Graph foundation
+- **msgraph: add auth and client foundation** (salvage of #21408) ([#21922](https://github.com/NousResearch/hermes-agent/pull/21922))
+- **msgraph: add webhook listener platform** (salvage of #21409) ([#21969](https://github.com/NousResearch/hermes-agent/pull/21969))
+- **teams-pipeline: add plugin runtime and operator cli** (salvage of #21410) ([#22007](https://github.com/NousResearch/hermes-agent/pull/22007))
+- **teams: add pipeline outbound delivery via existing adapter** (salvage of #21411) ([#22024](https://github.com/NousResearch/hermes-agent/pull/22024))
+
+### Cross-platform
+- **Per-platform admin/user split for slash commands** (salvage of #4443) ([#23373](https://github.com/NousResearch/hermes-agent/pull/23373))
+- **Forensics on signal handling — non-blocking diag, per-phase timing, stale-unit warning** ([#23285](https://github.com/NousResearch/hermes-agent/pull/23285))
+- **Keep gateway running when platforms fail; add per-platform circuit breaker + `/platform`** ([#26600](https://github.com/NousResearch/hermes-agent/pull/26600))
+- **Wire `clarify` tool with inline keyboard buttons on Telegram** ([#24199](https://github.com/NousResearch/hermes-agent/pull/24199))
+- **Add `chat_id` to `hook_ctx` for message source tracking** ([#24710](https://github.com/NousResearch/hermes-agent/pull/24710))
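+
+The per-platform circuit breaker from #26600 keeps one failing adapter from taking down the whole gateway. The general pattern is sketched below. The threshold, cooldown, state names, and the `PlatformBreaker` class are illustrative assumptions, not the gateway's actual code:
+
+```python
+import time
+from typing import Callable, Optional
+
+
+class PlatformBreaker:
+    """Minimal circuit breaker: closed -> open after `threshold` consecutive
+    failures; open -> half-open once `cooldown_s` has elapsed."""
+
+    def __init__(self, threshold: int = 3, cooldown_s: float = 60.0,
+                 clock: Callable[[], float] = time.monotonic) -> None:
+        self.threshold = threshold
+        self.cooldown_s = cooldown_s
+        self.clock = clock
+        self.failures = 0
+        self.opened_at: Optional[float] = None
+
+    @property
+    def state(self) -> str:
+        if self.opened_at is None:
+            return "closed"
+        if self.clock() - self.opened_at >= self.cooldown_s:
+            return "half-open"
+        return "open"
+
+    def allow(self) -> bool:
+        """Whether a connect/send attempt should be made right now."""
+        return self.state != "open"
+
+    def record_success(self) -> None:
+        self.failures = 0
+        self.opened_at = None
+
+    def record_failure(self) -> None:
+        self.failures += 1
+        if self.state == "half-open" or self.failures >= self.threshold:
+            self.opened_at = self.clock()  # trip (or re-trip) and restart cooldown
+```
+
+Injecting `clock` keeps the breaker testable without sleeping. The half-open state lets the gateway probe a recovered platform again instead of giving up on it for good.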
+
+### Telegram
+- **Native draft streaming via `sendMessageDraft` (Bot API 9.5+)** (salvage of #3412) ([#23512](https://github.com/NousResearch/hermes-agent/pull/23512))
+- **Stream Telegram edits safely** (salvage of #22264) (@kshitijk4poor) ([#22518](https://github.com/NousResearch/hermes-agent/pull/22518))
+- **Telegram notification mode** (salvage #22772) ([#22793](https://github.com/NousResearch/hermes-agent/pull/22793))
+- **Telegram guest mention mode** (@kshitijk4poor) ([#22759](https://github.com/NousResearch/hermes-agent/pull/22759))
+- **Split-and-deliver oversized edits instead of silent truncation** (salvage of #19537) ([#23576](https://github.com/NousResearch/hermes-agent/pull/23576))
+- **Preserve DM topic routing via reply fallback** (salvage #22053) (@kshitijk4poor) ([#22410](https://github.com/NousResearch/hermes-agent/pull/22410))
+- **Pass `source.thread_id` explicitly on auto-reset notice** (carve-out of #7404) ([#23440](https://github.com/NousResearch/hermes-agent/pull/23440))
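+
+Telegram rejects text messages longer than 4096 characters, so oversized edits have to be split rather than truncated (#23576). Below is a hedged sketch of one splitting strategy that prefers newline boundaries. The function name and boundary heuristic are assumptions. The sketch also ignores markup/entity parsing and counts Python `str` characters, which can differ from Telegram's own counting for some emoji:
+
+```python
+TELEGRAM_MAX_CHARS = 4096  # Bot API limit for one text message
+
+
+def split_for_telegram(text: str, limit: int = TELEGRAM_MAX_CHARS) -> list[str]:
+    """Split `text` into chunks of at most `limit` characters, preferring to
+    break at the last newline inside each window. Nothing is dropped except
+    the newline characters consumed at break points."""
+    if limit <= 0:
+        raise ValueError("limit must be positive")
+    chunks: list[str] = []
+    rest = text
+    while len(rest) > limit:
+        window = rest[:limit]
+        cut = window.rfind("\n")
+        if cut <= 0:
+            cut = limit  # no usable newline: hard split at the limit
+        chunks.append(rest[:cut])
+        rest = rest[cut:].lstrip("\n")
+    if rest:
+        chunks.append(rest)
+    return chunks
+```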
+
+### Discord
+- **Render clarify choices as buttons** ([#25485](https://github.com/NousResearch/hermes-agent/pull/25485))
+- **Channel history backfill — default on, broadened scope** ([#25984](https://github.com/NousResearch/hermes-agent/pull/25984))
+- **`thread_require_mention` for multi-bot threads** (salvage #25313) ([#25445](https://github.com/NousResearch/hermes-agent/pull/25445))
+
+### Slack
+- **Support `!cmd` as alternate prefix for slash commands in threads** ([#25355](https://github.com/NousResearch/hermes-agent/pull/25355))
+
+### WhatsApp
+- **Surface quoted reply metadata from Baileys** (#25398) ([#25489](https://github.com/NousResearch/hermes-agent/pull/25489))
+
+### Feishu / Google Chat / others
+- **Feishu: native update prompt cards** (@kshitijk4poor) ([#22448](https://github.com/NousResearch/hermes-agent/pull/22448))
+- **Google Chat: repair setup prompt imports** (@helix4u) ([#22038](https://github.com/NousResearch/hermes-agent/pull/22038))
+- **Google Chat: honor relay-declared sender_type** (salvage of #22107) (@kshitijk4poor) ([#22432](https://github.com/NousResearch/hermes-agent/pull/22432))
+- **LINE: use `build_source` instead of nonexistent `create_source`** ([#24717](https://github.com/NousResearch/hermes-agent/pull/24717))
+- **Add `weixin, and more` to gateway docs** (salvage of #21063 by @wuwuzhijing)
+
+---
+
+## 🖥️ CLI & TUI
+
+### CLI
+- **Show YOLO mode warning in banner and status bar** ([#26238](https://github.com/NousResearch/hermes-agent/pull/26238))
+- **Confirm prompt for destructive slash commands** (#4069) ([#22687](https://github.com/NousResearch/hermes-agent/pull/22687))
+- **`docker_extra_args` + `display.timestamps`** ([#23599](https://github.com/NousResearch/hermes-agent/pull/23599))
+- **Delegate tool: show user's actual concurrency / spawn-depth limits in description** ([#22694](https://github.com/NousResearch/hermes-agent/pull/22694))
+
+### TUI
+- **`/sessions` slash command for browsing and resuming previous sessions** (@austinpickett) ([#20805](https://github.com/NousResearch/hermes-agent/pull/20805))
+- **Segment turns with rule above non-first user msgs; trim ticker dead space** (@OutThisLife) ([#21846](https://github.com/NousResearch/hermes-agent/pull/21846))
+- **Support attaching to an existing gateway** (@OutThisLife) ([#21978](https://github.com/NousResearch/hermes-agent/pull/21978))
+- **Resolve markdown links to readable page titles** (@OutThisLife) ([#24013](https://github.com/NousResearch/hermes-agent/pull/24013))
+- **Width-aware markdown table rendering with vertical fallback** (@alt-glitch) ([#26195](https://github.com/NousResearch/hermes-agent/pull/26195))
+- **Keep Ink displayCursor in sync with fast-echo writes so cursor stops drifting** (@OutThisLife) ([#26717](https://github.com/NousResearch/hermes-agent/pull/26717))
+- **Allow transcript scroll + Esc during approval/clarify/confirm prompts** (@OutThisLife) ([#26414](https://github.com/NousResearch/hermes-agent/pull/26414))
+- **Preserve session when switching personality** (@austinpickett) ([#20942](https://github.com/NousResearch/hermes-agent/pull/20942))
+- **Skip native safety net on OSC52-capable terminals** (@benbarclay) ([#20954](https://github.com/NousResearch/hermes-agent/pull/20954))
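+
+The width-aware table rendering in #26195 falls back to a vertical layout when a table won't fit. The idea is sketched below in Python. The layout rules and the `render_table` name are assumptions, not taken from the TUI's code. The sketch draws a grid when the table fits the terminal width, and otherwise prints each row as a block of `header: value` lines:
+
+```python
+def render_table(headers: list[str], rows: list[list[str]], width: int) -> str:
+    """Render a plain-text table, falling back to a vertical layout when
+    the horizontal form would exceed `width` columns."""
+    if not headers:
+        return ""
+    widths = [max([len(h)] + [len(row[i]) for row in rows])
+              for i, h in enumerate(headers)]
+    total = sum(widths) + 3 * (len(widths) - 1)  # " | " between columns
+    if total <= width:
+        def fmt(cells: list[str]) -> str:
+            return " | ".join(c.ljust(w) for c, w in zip(cells, widths)).rstrip()
+        lines = [fmt(headers), "-+-".join("-" * w for w in widths)]
+        lines += [fmt(row) for row in rows]
+        return "\n".join(lines)
+    # Vertical fallback: one right-aligned "header: value" line per cell.
+    label_w = max(len(h) for h in headers)
+    blocks = ["\n".join(f"{h.rjust(label_w)}: {c}" for h, c in zip(headers, row))
+              for row in rows]
+    return "\n\n".join(blocks)
+```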
+
+### Dashboard / GUI
+- **Route embedded TUI through dashboard gateway** (@OutThisLife) ([#21979](https://github.com/NousResearch/hermes-agent/pull/21979))
+- **Hide token/cost analytics behind config flag (default off)** ([#25438](https://github.com/NousResearch/hermes-agent/pull/25438))
+- **Fix Langfuse observability — trace I/O, tool outputs, placeholder credentials** (closes #22342, #22763) (@kshitijk4poor) ([#26320](https://github.com/NousResearch/hermes-agent/pull/26320))
+- **Fix MiniMax 'Login' button launching Claude OAuth** (salvage #22849) ([#24058](https://github.com/NousResearch/hermes-agent/pull/24058))
+- **Update cron modals** (@austinpickett) ([#25985](https://github.com/NousResearch/hermes-agent/pull/25985))
+- **Analytics: prevent silent token loss and add Claude 4.5–4.7 pricing** (@austinpickett) ([#21455](https://github.com/NousResearch/hermes-agent/pull/21455))
+
+---
+
+## 🔧 Tools & Capabilities
+
+### Vision & video
+- **`vision_analyze` returns pixels to vision-capable models** ([#22955](https://github.com/NousResearch/hermes-agent/pull/22955))
+- **Unified `video_generate` with pluggable provider backends** ([#25126](https://github.com/NousResearch/hermes-agent/pull/25126))
+- **`image_gen`: actionable setup message when no FAL backend is reachable** ([#26222](https://github.com/NousResearch/hermes-agent/pull/26222))
+
+### Computer use
+- **`computer_use` cua-driver backend + focus-safe ops + non-Anthropic provider fix** (re-salvage #16936) ([#21967](https://github.com/NousResearch/hermes-agent/pull/21967))
+- **Refresh cua-driver on `hermes update` + add `install --upgrade`** ([#24063](https://github.com/NousResearch/hermes-agent/pull/24063))
+
+### LSP & write-time diagnostics
+- **Semantic diagnostics from real language servers in `write_file`/`patch`** ([#24168](https://github.com/NousResearch/hermes-agent/pull/24168))
+- **Shift baseline diagnostics into post-edit coordinates** ([#25978](https://github.com/NousResearch/hermes-agent/pull/25978))
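+
+Shifting baseline diagnostics (#25978) matters because an edit moves lines. A pre-existing error on line 10 may sit on line 12 afterwards, and it must not be reported as new. Below is a sketch of the line-mapping idea using `difflib.SequenceMatcher`. The `(line, message)` diagnostic shape and the exact-match comparison rule are assumptions, not the LSP integration's real data model:
+
+```python
+import difflib
+
+
+def map_old_lines(old: list[str], new: list[str]) -> dict[int, int]:
+    """Map 0-based line numbers in `old` to their position in `new`,
+    for lines that survived the edit unchanged."""
+    mapping: dict[int, int] = {}
+    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
+    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
+        if tag == "equal":
+            for offset in range(i2 - i1):
+                mapping[i1 + offset] = j1 + offset
+    return mapping
+
+
+def new_diagnostics(baseline, current, old, new):
+    """Return diagnostics in `current` that don't match a baseline diagnostic
+    once it is shifted into post-edit coordinates. Diagnostics are
+    hypothetical (line, message) tuples."""
+    mapping = map_old_lines(old, new)
+    shifted = {(mapping[line], msg) for line, msg in baseline if line in mapping}
+    return [d for d in current if d not in shifted]
+```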
+
+### Search & web
+- **Brave Search (free tier) and DDGS search providers** ([#21337](https://github.com/NousResearch/hermes-agent/pull/21337))
+- **Bearer auth header for Tavily `/crawl` endpoint** ([#24658](https://github.com/NousResearch/hermes-agent/pull/24658))
+
+### X (Twitter)
+- **Gated `x_search` tool with OAuth-or-API-key auth** ([#26763](https://github.com/NousResearch/hermes-agent/pull/26763))
+
+### Browser
+- **Route `browser_console` eval through supervisor's persistent CDP WS (180x faster)** ([#23226](https://github.com/NousResearch/hermes-agent/pull/23226))
+- **Support externally managed Camofox sessions** ([#24499](https://github.com/NousResearch/hermes-agent/pull/24499))
+
+### MCP
+- **`supports_parallel_tool_calls` for MCP servers** (salvage of #9944) ([#26825](https://github.com/NousResearch/hermes-agent/pull/26825))
+- **Codex preset for Codex CLI MCP server** (salvage #22663) ([#22679](https://github.com/NousResearch/hermes-agent/pull/22679))
+- **Stop retrying initial MCP auth failures** (#25624) ([#25776](https://github.com/NousResearch/hermes-agent/pull/25776))
+
+### Google Workspace
+- **Drive write ops + Docs/Sheets create/append** ([#21895](https://github.com/NousResearch/hermes-agent/pull/21895))
+
+### Per-turn verifier
+- **Per-turn file-mutation verifier footer** ([#24498](https://github.com/NousResearch/hermes-agent/pull/24498))
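+
+The verifier footer (#24498) reports what actually changed on disk after a turn. Below is a sketch of how such a summary could be computed with `difflib`, assuming pre-turn snapshots of each touched file are available. The snapshot mechanism, the `mutation_footer` name, and the output wording are all hypothetical:
+
+```python
+import difflib
+from pathlib import Path
+
+
+def mutation_footer(before: dict[str, str], paths: list[str]) -> str:
+    """Compare pre-turn snapshots (path -> text) with what is on disk now and
+    return one summary line per path: "+added -removed", "unchanged", or
+    "MISSING on disk"."""
+    out = []
+    for path in paths:
+        p = Path(path)
+        if not p.exists():
+            out.append(f"{path}: MISSING on disk")
+            continue
+        old = before.get(path, "").splitlines()  # no snapshot: treat as new file
+        new = p.read_text().splitlines()
+        added = removed = 0
+        matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
+        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
+            if tag in ("replace", "delete"):
+                removed += i2 - i1
+            if tag in ("replace", "insert"):
+                added += j2 - j1
+        status = "unchanged" if added == removed == 0 else f"+{added} -{removed}"
+        out.append(f"{path}: {status}")
+    return "\n".join(out)
+```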
+
+---
+
+## 🧩 Kanban (Multi-Agent)
+
+- **`specify` — auxiliary LLM fleshes out triage tasks** ([#21435](https://github.com/NousResearch/hermes-agent/pull/21435))
+- **Orchestrator board tools — `kanban_list` + `kanban_unblock`** (carve-out of #20568) ([#23012](https://github.com/NousResearch/hermes-agent/pull/23012))
+- **`stranded_in_ready` diagnostic for unclaimed tasks** ([#23578](https://github.com/NousResearch/hermes-agent/pull/23578))
+- **Dashboard batch QOL upgrade** (salvage of #23240) ([#23550](https://github.com/NousResearch/hermes-agent/pull/23550))
+- **Tooltips and docs link across dashboard** ([#21541](https://github.com/NousResearch/hermes-agent/pull/21541))
+- **Dedupe notifier delivery via atomic claim + rewind on failure** (salvage #22558) ([#23401](https://github.com/NousResearch/hermes-agent/pull/23401))
+- **Keep notifier subscriptions alive across retry cycles** (salvage #21398) ([#23423](https://github.com/NousResearch/hermes-agent/pull/23423))
+- **Drop caller-controlled author override in `kanban_comment`** (salvage of #22109) (@kshitijk4poor) ([#22435](https://github.com/NousResearch/hermes-agent/pull/22435))
+- **Sanitize comment author rendering in `build_worker_context`** ([#22769](https://github.com/NousResearch/hermes-agent/pull/22769))
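+
+The notifier dedupe in #23401 relies on an atomic claim that is rewound on failure. Below is a sketch of one way to express that pattern with SQLite. The `notifications` table, its `status` values, and the `deliver` callback are hypothetical:
+
+```python
+import sqlite3
+from typing import Callable
+
+
+def deliver_once(conn: sqlite3.Connection, notif_id: int,
+                 deliver: Callable[[int], None]) -> bool:
+    """Claim a pending notification atomically, deliver it, and rewind the
+    claim if delivery raises. Returns True only for the caller that sent it."""
+    with conn:  # commit the claim before doing any network work
+        cur = conn.execute(
+            "UPDATE notifications SET status = 'claimed' "
+            "WHERE id = ? AND status = 'pending'",
+            (notif_id,),
+        )
+    if cur.rowcount != 1:
+        return False  # another worker already claimed or sent it
+    try:
+        deliver(notif_id)
+    except Exception:
+        with conn:  # rewind so a later retry can claim it again
+            conn.execute(
+                "UPDATE notifications SET status = 'pending' WHERE id = ?",
+                (notif_id,),
+            )
+        raise
+    with conn:
+        conn.execute("UPDATE notifications SET status = 'sent' WHERE id = ?",
+                     (notif_id,))
+    return True
+```
+
+A crash between claim and delivery would leave a row stuck in `claimed`. A real system needs a lease or timeout for that case, which this sketch omits.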
+
+---
+
+## 🧠 Plugins & Extension
+
+### Plugin surface
+- **Run any LLM call from inside a plugin via `ctx.llm`** ([#23194](https://github.com/NousResearch/hermes-agent/pull/23194))
+- **`tool_override` flag for replacing built-in tools** (closes #11049) ([#26759](https://github.com/NousResearch/hermes-agent/pull/26759))
+- **`standalone_sender_fn` for out-of-process cron delivery** (@kshitijk4poor) ([#22461](https://github.com/NousResearch/hermes-agent/pull/22461))
+- **`HERMES_PLUGINS_DEBUG=1` surfaces plugin discovery logs** ([#22684](https://github.com/NousResearch/hermes-agent/pull/22684))
+- **Hindsight-client as optional dependency** (@alt-glitch) ([#21818](https://github.com/NousResearch/hermes-agent/pull/21818))
+
+### Profile & distribution
+- **Shareable profile distributions via git** ([#20831](https://github.com/NousResearch/hermes-agent/pull/20831))
+
+---
+
+## ⏰ Cron
+
+- **Routing intent — `deliver=all` fans out to every connected channel** ([#21495](https://github.com/NousResearch/hermes-agent/pull/21495))
+- **Support name-based lookup for job operations** ([#26231](https://github.com/NousResearch/hermes-agent/pull/26231))
+- **Fix blank Cron dashboard tab + partial-record crashes** (salvage #21042 + #22330) (@kshitijk4poor) ([#22389](https://github.com/NousResearch/hermes-agent/pull/22389))
+- **Do not seed `HERMES_SESSION_*` contextvars from cron origin** (salvage of #22356) (@kshitijk4poor) ([#22382](https://github.com/NousResearch/hermes-agent/pull/22382))
+- **Scan assembled prompt including skill content for prompt injection** (#3968)
+
+---
+
+## 🧩 Skills Ecosystem
+
+### Skills Hub
+- **`hermes-skills/huggingface` as a trusted default tap** (closes #2549) ([#26219](https://github.com/NousResearch/hermes-agent/pull/26219))
+- **Show per-skill pages in the left sidebar** ([#26646](https://github.com/NousResearch/hermes-agent/pull/26646))
+- **Richer info panels on the Skills Hub** ([#22905](https://github.com/NousResearch/hermes-agent/pull/22905))
+- **Refuse `skill_view` name collisions instead of guessing** (closes #6136) (@polkn)
+
+### Curator
+- **Show rename map in user-visible summary** ([#22910](https://github.com/NousResearch/hermes-agent/pull/22910))
+- **Hint at `hermes curator pin` in the rename block** ([#23212](https://github.com/NousResearch/hermes-agent/pull/23212))
+
+### New optional skills
+- **Hyperliquid** — perp/spot trading via SDK + REST (salvage of #1952) ([#23583](https://github.com/NousResearch/hermes-agent/pull/23583))
+- **Yahoo Finance** market data ([#23590](https://github.com/NousResearch/hermes-agent/pull/23590))
+- **api-testing** (REST/GraphQL debug, salvages #1800) ([#23582](https://github.com/NousResearch/hermes-agent/pull/23582))
+- **Unified EVM multi-chain skill** (salvages #25291 + #2010 + folds in base/) ([#25299](https://github.com/NousResearch/hermes-agent/pull/25299))
+- **darwinian-evolver** ([#26760](https://github.com/NousResearch/hermes-agent/pull/26760))
+- **osint-investigation** (closes #355) ([#26729](https://github.com/NousResearch/hermes-agent/pull/26729))
+- **pinggy-tunnel** ([#26765](https://github.com/NousResearch/hermes-agent/pull/26765))
+- **watchers** — RSS / HTTP JSON / GitHub polling via cron no-agent ([#21881](https://github.com/NousResearch/hermes-agent/pull/21881))
+- **Notion overhaul for the Developer Platform** (May 2026) ([#26612](https://github.com/NousResearch/hermes-agent/pull/26612))
+
+---
+
+## 🔒 Security & Reliability
+
+### Security hardening
+- **Sudo brute-force block + classify sudo-stdin/askpass invocations as DANGEROUS** (salvage of #22194 + #21128) (@kshitijk4poor) ([#23736](https://github.com/NousResearch/hermes-agent/pull/23736))
+- **Drop caller-controlled author override in `kanban_comment`** (salvage of #22109) (@kshitijk4poor) ([#22435](https://github.com/NousResearch/hermes-agent/pull/22435))
+- **Cover remaining SSRF fetch paths in skills-hub** (salvage #22804) ([#22843](https://github.com/NousResearch/hermes-agent/pull/22843))
+- **Use credential_pool for custom endpoint model listing probes** (salvage #22810) ([#22842](https://github.com/NousResearch/hermes-agent/pull/22842))
+- **Require dashboard auth for plugin API routes** (salvage #19541) ([#23220](https://github.com/NousResearch/hermes-agent/pull/23220))
+- **Sanitize env and redact output in quick commands + remove write-only `_pending_messages`** ([#23584](https://github.com/NousResearch/hermes-agent/pull/23584))
+- **Reduce unnecessary `shell=True` in subprocess calls** ([#25149](https://github.com/NousResearch/hermes-agent/pull/25149))
+- **Sanitize Google Chat sender_type from relay** (salvage of #22107) (@kshitijk4poor) ([#22432](https://github.com/NousResearch/hermes-agent/pull/22432))
+- **Supply-chain advisory checker** ([#24220](https://github.com/NousResearch/hermes-agent/pull/24220))
+- **Rewrite security policy around OS-level isolation as the boundary** (@jquesnelle) ([#20317](https://github.com/NousResearch/hermes-agent/pull/20317))
+- **Remove public security advisory page** ([#24253](https://github.com/NousResearch/hermes-agent/pull/24253))
+
+### Reliability — notable bug closures
+- **SQLite: fall back to `journal_mode=DELETE` on NFS/SMB/FUSE** (fixes `/resume` on network mounts) (@kshitijk4poor) ([#22043](https://github.com/NousResearch/hermes-agent/pull/22043))
+- **Codex-runtime: retire wedged sessions + post-tool watchdog + OAuth refresh classify** ([#25769](https://github.com/NousResearch/hermes-agent/pull/25769))
+- **Codex-runtime: de-dup `[plugins.X]` tables and stop leaking HERMES_HOME** (#26250) (@kshitijk4poor) ([#26260](https://github.com/NousResearch/hermes-agent/pull/26260))
+- **Daytona: migrate legacy-sandbox lookup to cursor-based `list()`** ([#24587](https://github.com/NousResearch/hermes-agent/pull/24587))
+- **MCP: stop retrying initial MCP auth failures** (#25624) ([#25776](https://github.com/NousResearch/hermes-agent/pull/25776))
+- **Gateway: enable text-intercept for multi-choice clarify fallback** (#25587) ([#25778](https://github.com/NousResearch/hermes-agent/pull/25778))
+- **Gateway: keep running when platforms fail; per-platform circuit breaker + `/platform`** ([#26600](https://github.com/NousResearch/hermes-agent/pull/26600))
+- **Delegate: salvage #21933 JSON-string batch + diagnostic logging** (@kshitijk4poor) ([#22436](https://github.com/NousResearch/hermes-agent/pull/22436))
+- **Profiles+banner: exclude infrastructure from `--clone-all` + fix stale update-check repo resolution** (@kshitijk4poor) ([#22475](https://github.com/NousResearch/hermes-agent/pull/22475))
+- **ACP: inline file attachment resources** (salvage #21400 + image support) ([#21407](https://github.com/NousResearch/hermes-agent/pull/21407))
+- **CI: unblock shared PR checks** (@stephenschoettler) ([#21012](https://github.com/NousResearch/hermes-agent/pull/21012), [#25957](https://github.com/NousResearch/hermes-agent/pull/25957))
+
+### Notable reverts in window
+- **`/goal` checklist + /subgoal feature stack** — rolled back ([#23813](https://github.com/NousResearch/hermes-agent/pull/23813)); `/subgoal` returned in simpler form via [#25449](https://github.com/NousResearch/hermes-agent/pull/25449)
+- **Scrollback box width clamp** (#25975) rolled back to restore full-width borders ([#26163](https://github.com/NousResearch/hermes-agent/pull/26163))
+- **`fix(cli): tolerate unreadable dirs when building systemd PATH`** rolled back
+
+---
+
+## 🌍 i18n
+
+- **Localize all gateway commands + web dashboard, add 8 new locales (16 total)** ([#22914](https://github.com/NousResearch/hermes-agent/pull/22914))
+
+---
+
+## 📚 Documentation
+
+- **Repair Voice & TTS provider table** (@nightcityblade, fixes #24101) ([#24138](https://github.com/NousResearch/hermes-agent/pull/24138))
+- **Show per-skill pages in the left sidebar** ([#26646](https://github.com/NousResearch/hermes-agent/pull/26646))
+- **Mention Weixin in gateway help and docstrings** (salvage of #21063 by @wuwuzhijing)
+- **Richer info panels on the Skills Hub** ([#22905](https://github.com/NousResearch/hermes-agent/pull/22905))
+- Many more doc updates across providers, platforms, skills, Windows install paths, and dashboard.
+
+---
+
+## 🧪 Testing & CI
+
+- **Unblock shared PR checks** (@stephenschoettler) ([#21012](https://github.com/NousResearch/hermes-agent/pull/21012))
+- **Stabilize shared test state after 21012** (@stephenschoettler) ([#25957](https://github.com/NousResearch/hermes-agent/pull/25957))
+- A long tail of test additions for platforms, providers, plugins, and edge cases — 8 explicit `test:` PRs plus ~250 fix PRs that also added regression coverage.
+
+---
+
+## 👥 Contributors
+
+### Core
+- @teknium1 — release lead, architecture, ~406 PRs merged in window
+
+### Top community contributors
+- **@kshitijk4poor** — 38 PRs · Telegram cadence/streaming/topic routing, security hardening (sudo, SSRF, kanban_comment, dashboard auth), codex-runtime hygiene, NovitaAI provider, profile/banner fixes, Feishu update cards, gateway QOL across the board
+- **@alt-glitch** — 13 PRs · Markdown-table TUI rendering, `HERMES_SESSION_ID` env var, hindsight-client optional dep, Nix `extraDependencyGroups`
+- **@OutThisLife** (Brooklyn Nicholson) — 12 PRs · TUI turn segmentation, attach-to-gateway, markdown link titles, embedded TUI via dashboard gateway, Ink cursor sync, scroll/Esc during prompts
+- **@austinpickett** — 8 PRs · `/sessions` slash command, personality switching preserves session, cron modals, dashboard analytics
+- **@helix4u** — 5 PRs · Google Chat setup, browser install skip on system chromium, Windows Ctrl+C preservation
+- **@rob-maron** — 4 PRs · Nous Portal as model metadata authority, provider polish
+- **@stephenschoettler** — 3 PRs · CI stabilization
+- **@ethernet8023** — 3 PRs · platform/gateway work
+
+### All contributors (alphabetical)
+
+@02356abc, @0xbyt4, @0xharryriddle, @1000Delta, @1RB, @29206394, @A-kamal, @aashizpoudel, @Abd0r,
+@adybag14-cyber, @AgentArcLab, @ahmedbadr3, @AhmetArif0, @alblez, @Alex-yang00, @ALIYILD, @AllynSheep,
+@alt-glitch, @am423, @amathxbt, @amethystani, @ArecaNon, @Arkmusn, @askclaw-vesper, @AsoTora, @austinpickett,
+@aydnOktay, @ayushere, @baocin, @Bartok9, @benbarclay, @BennetYrWang, @Bihruze, @binhnt92, @briandevans,
+@brooklynnicholson, @btorresgil, @buntingszn, @CalmProton, @chrisworksai, @CoinTheHat, @dandacompany, @Dangooy,
+@DanielLSM, @David-0x221Eight, @ddupont808, @dhruv-saxena, @diablozzc, @dlkakbs, @dmahan93, @dmnkhorvath,
+@domtriola, @donrhmexe, @Dusk1e, @eloklam, @emozilla, @ephron-ren, @erenkarakus, @EthanGuo-coder,
+@ethernet8023, @evgyur, @explainanalyze, @fahdad, @fr33d3m0n, @Freeman-Consulting, @freqyfreqy, @Frowtek,
+@fu576, @github-actions[bot], @gnanirahulnutakki, @GodsBoy, @guglielmofonda, @Gutslabs, @hanzckernel,
+@heathley, @hekaru-agent, @helix4u, @HenkDz, @HiddenPuppy, @hllqkb, @hrygo, @HuangYuChuh, @Hugo-SEQUIER, @HxT9,
+@iacker, @InB4DevOps, @isaachuangGMICLOUD, @iuyup, @Jaaneek, @jackey8616, @jackjin1997, @Jaggia, @jak983464779,
+@jelrod27, @jethac, @JithendraNara, @johnisag, @Julientalbot, @Jwd-gity, @kallidean, @keyuyuan, @kfa-ai,
+@kidonng, @KiraKatana, @kjames2001, @konsisumer, @Korkyzer, @kshitijk4poor, @KvnGz, @lars-hagen, @leehack,
+@leepoweii, @LeonSGP43, @li0near, @libo1106, @liquidchen, @littlewwwhite, @liuhao1024, @liyoungc, @luandiasrj,
+@luoyuctl, @luyao618, @magic524, @mbac, @McClean, @memosr, @Mibayy, @ming1523, @mizgyo, @mrshu, @ms-alan,
+@MustafaKara7, @nederev, @nicoechaniz, @nidhi-singh02, @nightcityblade, @nik1t7n, @Ninso112, @NivOO5,
+@novax635, @nv-kasikritc, @oferlaor, @oswaldb22, @outdoorsea, @oxngon, @PaTTeeL, @pearjelly, @pefontana,
+@perng, @PhilipAD, @phuongvm, @polkn, @Prasanna28Devadiga, @princepal9120, @pty819, @purzbeats, @Quarkex,
+@quocanh261997, @qWaitCrypto, @Qwinty, @rahimsais, @raymaylee, @ReqX, @rewbs, @RhombusMaximus, @rob-maron,
+@Ruzzgar, @ryptotalent, @Sanjays2402, @shannonsands, @shaun0927, @SiliconID, @silv-mt-holdings, @simpolism,
+@smwbev, @soichiyo, @sprmn24, @steezkelly, @stephenschoettler, @Sylw3ster, @szymonclawd, @teyrebaz33,
+@Tianyu199509, @Tranquil-Flow, @TreyDong, @TurgutKural, @tw2818, @tymrtn, @uzunkuyruk, @v1b3coder,
+@vanthinh6886, @VinceZcrikl, @vKongv, @vominh1919, @voteblake, @VTRiot, @wali-reheman, @wesleysimplicio,
+@wilsen0, @WorldWriter, @worlldz, @wuli666, @wuwuzhijing, @Wysie, @XiaoXiao0221, @xieNniu, @xxxigm, @yehuosi,
+@ygd58, @yifengingit, @yuga-hashimoto, @zccyman, @ZeterMordio, @Zhekinmaksim, @zhengyn0001
+
+Also: @Nagatha (Claude Opus 4.7).
+
+---
+
+**Full Changelog**: [v2026.5.7...v2026.5.16](https://github.com/NousResearch/hermes-agent/compare/v2026.5.7...v2026.5.16)
--- a/SECURITY.md
+++ b/SECURITY.md
@@ -1,84 +1,331 @@
 # Hermes Agent Security Policy

-This document outlines the security protocols, trust model, and deployment hardening guidelines for the **Hermes Agent** project.
+This document describes Hermes Agent's trust model, names the one
+security boundary the project treats as load-bearing, and defines the
+scope for vulnerability reports.

-## 1. Vulnerability Reporting
+## 1. Reporting a Vulnerability

-Hermes Agent does **not** operate a bug bounty program. Security issues should be reported via [GitHub Security Advisories (GHSA)](https://github.com/NousResearch/hermes-agent/security/advisories/new) or by emailing **security@nousresearch.com**. Do not open public issues for security vulnerabilities.
+Report privately via [GitHub Security Advisories](https://github.com/NousResearch/hermes-agent/security/advisories/new)
+or **security@nousresearch.com**. Do not open public issues for
+security vulnerabilities. **Hermes Agent does not operate a bug
+bounty program.**

-### Required Submission Details
- **Title & Severity:** Concise description and CVSS score/rating.
- **Affected Component:** Exact file path and line range (e.g., `tools/approval.py:120-145`).
- **Environment:** Output of `hermes version`, commit SHA, OS, and Python version.
- **Reproduction:** Step-by-step Proof-of-Concept (PoC) against `main` or the latest release.
- **Impact:** Explanation of what trust boundary was crossed.
+A useful report includes:
+
+- A concise description and severity assessment.
+- The affected component, identified by file path and line range
+  (e.g. `path/to/file.py:120-145`).
+- Environment details (`hermes version`, commit SHA, OS, Python
+  version).
+- A reproduction against `main` or the latest release.
+- A statement of which trust boundary in §2 is crossed.
+
+Please read §2 and §3 before submitting. Reports that only
+demonstrate the limits of an in-process heuristic (§2.4), which this
+policy does not treat as a boundary, will be closed as out of scope
+under §3.2. They are still welcome as regular issues or pull
+requests, just not through the private security channel.

 ---

 ## 2. Trust Model

-The core assumption is that Hermes is a **personal agent** with one trusted operator.
+Hermes Agent is a single-tenant personal agent. Its posture is
+layered, and the layers are not equally load-bearing. Reporters and
+operators should reason about them in the same terms.

-### Operator & Session Trust
- **Single Tenant:** The system protects the operator from LLM actions, not from malicious co-tenants. Multi-user isolation must happen at the OS/host level.
- **Gateway Security:** Authorized callers (Telegram, Discord, Slack, etc.) receive equal trust. Session keys are used for routing, not as authorization boundaries.
- **Execution:** Defaults to `terminal.backend: local` (direct host execution). Container isolation (Docker, Modal, Daytona) is opt-in for sandboxing.
+### 2.1 Definitions

-### Dangerous Command Approval
-The approval system (`tools/approval.py`) is a core security boundary. Terminal commands, file operations, and other potentially destructive actions are gated behind explicit user confirmation before execution. The approval mode is configurable via `approvals.mode` in `config.yaml`:
- `"on"` (default) — prompts the user to approve dangerous commands.
- `"auto"` — auto-approves after a configurable delay.
- `"off"` — disables the gate entirely (break-glass; see Section 3).
+- **Agent process.** The Python interpreter running Hermes Agent,
+  including any Python modules it has loaded (skills, plugins,
+  hook handlers).
+- **Terminal backend.** A pluggable execution target for the
+  `terminal()` tool. The default runs commands directly on the host.
+  Other backends run commands inside a container, cloud sandbox, or
+  remote host.
+- **Input surface.** Any channel through which content enters the
+  agent's context: operator input, web fetches, email, gateway
+  messages, file reads, MCP server responses, tool results.
+- **Trust envelope.** The set of resources an operator has implicitly
+  granted Hermes Agent access to by running it — typically, whatever
+  the operator's own user account can reach on the host.
+- **Stance.** An explicit statement in Hermes Agent's documentation
+  or code about how a consuming layer (adapter, UI, file writer,
+  shell) should treat agent output — e.g. "the dashboard renders
+  agent output as inert HTML."

-### Output Redaction
-`agent/redact.py` strips secret-like patterns (API keys, tokens, credentials) from all display output before it reaches the terminal or gateway platform. This prevents accidental credential leakage in chat logs, tool previews, and response text. Redaction operates on the display layer only — underlying values remain intact for internal agent operations.
+### 2.2 The Boundary: OS-Level Isolation

-### Skills vs. MCP Servers
- **Installed Skills:** High trust. Equivalent to local host code; skills can read environment variables and run arbitrary commands.
- **MCP Servers:** Lower trust. MCP subprocesses receive a filtered environment (`_build_safe_env()` in `tools/mcp_tool.py`) — only safe baseline variables (`PATH`, `HOME`, `XDG_*`) plus variables explicitly declared in the server's `env` config block are passed through. Host credentials are stripped by default. Additionally, packages invoked via `npx`/`uvx` are checked against the OSV malware database before spawning.
+**The only security boundary against an adversarial LLM is the
+operating system.** Nothing inside the agent process constitutes
+containment — not the approval gate, not output redaction, not any
+pattern scanner, not any tool allowlist. Any in-process component
+that screens LLM output is a heuristic operating on an
+attacker-influenced string, and this policy treats it as such.

-### Code Execution Sandbox
-The `execute_code` tool (`tools/code_execution_tool.py`) runs LLM-generated Python scripts in a child process with API keys and tokens stripped from the environment to prevent credential exfiltration. Only environment variables explicitly declared by loaded skills (via `env_passthrough`) or by the user in `config.yaml` (`terminal.env_passthrough`) are passed through. The child accesses Hermes tools via RPC, not direct API calls.
+Hermes Agent supports two OS-level isolation postures. They address
+different threats, and an operator should choose between them
+deliberately.

-### Subagents
- **No recursive delegation:** The `delegate_task` tool is disabled for child agents.
- **Depth limit:** `MAX_DEPTH = 2` — parent (depth 0) can spawn a child (depth 1); grandchildren are rejected.
- **Memory isolation:** Subagents run with `skip_memory=True` and do not have access to the parent's persistent memory provider. The parent receives only the task prompt and final response as an observation.
+#### Terminal-backend isolation
+
+A non-default terminal backend runs LLM-emitted shell commands
+inside a container, remote host, or cloud sandbox. The file tools
+(`read_file`, `write_file`, `patch`) also run through this backend,
+since they are implemented on top of the shell contract — they
+cannot reach paths the backend doesn't expose.
+
+What this confines: anything the agent does by issuing shell or
+file operations. What this does **not** confine: everything the
+agent does in its own Python process. That includes the
+code-execution tool (spawned as a host subprocess), MCP subprocesses
+(spawned from the agent's environment), plugin loading, hook
+dispatch, and skill loading (all imported into the agent
+interpreter).
+
+Terminal-backend isolation is the right posture when the concern is
+LLM-emitted destructive shell or unwanted file-tool writes, and the
+operator is otherwise trusted.
+
+#### Whole-process wrapping
+
+Whole-process wrapping runs the entire agent process tree inside a
+sandbox. Every code path — shell, code-execution, MCP, file tools,
+plugins, hooks, skill loading — is subject to the same filesystem,
+network, process, and (where applicable) inference policy.
+
+Hermes Agent supports this in two ways:
+
+- **Hermes Agent's own Docker image and Compose setup.** The
+  lighter-weight option: the agent runs in a standard container
+  with operator-configured mounts and network policy.
+- **[NVIDIA OpenShell](https://github.com/NVIDIA/OpenShell)**.
+  OpenShell provides per-session sandboxes with declarative policy
+  across filesystem, network (L7 egress), process/syscall, and
+  inference-routing layers. Network and inference policies are
+  hot-reloadable. Credentials are injected from a Provider store
+  and never touch the sandbox filesystem.
+
+Under a whole-process wrapper, Hermes Agent's in-process heuristics
+(§2.4) function as accident-prevention layered on top of a real
+boundary. This is the supported posture when the agent ingests
+content from surfaces the operator does not control — the open web,
+inbound email, multi-user channels, untrusted MCP servers — and for
+production or shared deployments.
+
+Operators running the default local backend with untrusted input
+surfaces, or running a terminal-backend sandbox and expecting it to
+contain code paths that don't go through the shell, are operating
+outside the supported security posture.
+
+### 2.3 Credential Scoping
+
+Hermes Agent filters the environment it passes to its lower-trust
+in-process components: shell subprocesses, MCP subprocesses, and
+the code-execution child. Credentials like provider API keys and
+gateway tokens are stripped by default; variables explicitly
+declared by the operator or by a loaded skill are passed through.
+
+This reduces casual exfiltration. It is not containment. Any
+component running inside the agent process (skills, plugins, hook
+handlers) can read whatever the agent itself can read, including
+in-memory credentials. The mitigation against a compromised
+in-process component is operator review before install (§2.4,
+§2.5), not environment scrubbing.
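
The shape of this filtering can be sketched in Python as an allowlist over the parent environment. The baseline set (`PATH`, `HOME`, `XDG_*`) is taken from the previous revision of this policy, and `build_filtered_env` is a hypothetical name, not Hermes Agent's actual function. The sketch only controls what a *child* process inherits; code already running inside the agent process is unaffected, which is exactly the limit described above.

```python
from collections.abc import Iterable, Mapping

# Hypothetical baseline: variables a child process needs to function.
# (Assumed for illustration; not Hermes Agent's actual list.)
BASELINE_VARS = {"PATH", "HOME"}
BASELINE_PREFIXES = ("XDG_",)


def build_filtered_env(
    parent_env: Mapping[str, str],
    declared_passthrough: Iterable[str] = (),
) -> dict[str, str]:
    """Return a copy of parent_env keeping only baseline and declared variables.

    Everything else (provider API keys, gateway tokens, ...) is dropped,
    because the filter is an allowlist rather than a denylist of secret names.
    """
    allowed = BASELINE_VARS | set(declared_passthrough)
    return {
        name: value
        for name, value in parent_env.items()
        if name in allowed or name.startswith(BASELINE_PREFIXES)
    }


if __name__ == "__main__":
    parent = {
        "PATH": "/usr/bin",
        "HOME": "/home/op",
        "XDG_CONFIG_HOME": "/home/op/.config",
        "OPENROUTER_API_KEY": "sk-example",
        "TELEGRAM_BOT_TOKEN": "123:abc",
        "MY_SKILL_ENDPOINT": "https://example.invalid",
    }
    child = build_filtered_env(parent, declared_passthrough=["MY_SKILL_ENDPOINT"])
    print(sorted(child))  # ['HOME', 'MY_SKILL_ENDPOINT', 'PATH', 'XDG_CONFIG_HOME']
    # A child would then be launched with subprocess.run(cmd, env=child).
```

Because the filter is an allowlist, a newly introduced credential variable is withheld by default instead of leaking until someone remembers to add it to a denylist.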
+
+### 2.4 In-Process Heuristics
+
+The following components screen or warn about LLM behavior. They
+are useful. They are not boundaries.
+
+- The **approval gate** detects common destructive shell patterns
+  and prompts the operator before execution. Shell is
+  Turing-complete; a denylist over shell strings is structurally
+  incomplete. The gate catches cooperative-mode mistakes, not
+  adversarial output.
+- **Output redaction** strips secret-like patterns from display.
+  A motivated output producer will defeat it.
+- **Skills Guard** scans installable skill content for injection
+  patterns. It is a review aid; the boundary for third-party skills
+  is operator review before install. Reviewing a skill means
+  reading its Python code and scripts, not just its SKILL.md
+  description — skills execute arbitrary Python at import time.
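
A deliberately small Python sketch of why string-level screening is a heuristic. The patterns below are illustrative assumptions, not the ones Hermes Agent ships: each filter recognizes only the spellings it was written for, while the shell (and anyone reading the output) accepts many equivalent ones.

```python
import re

# Illustrative denylist for an approval gate (assumed patterns).
DANGEROUS_PATTERNS = [
    re.compile(r"\brm\s+-\w*r\w*\s+/"),  # recursive rm on an absolute path
    re.compile(r"\bmkfs\b"),             # filesystem formatting
]

# Illustrative secret shape: "sk-" followed by 20+ alphanumerics.
SECRET_PATTERN = re.compile(r"sk-[A-Za-z0-9]{20,}")


def needs_approval(command: str) -> bool:
    """Return True if the command matches any denylisted pattern."""
    return any(p.search(command) for p in DANGEROUS_PATTERNS)


def redact(text: str) -> str:
    """Replace secret-shaped substrings with a placeholder."""
    return SECRET_PATTERN.sub("[REDACTED]", text)


if __name__ == "__main__":
    # The same destructive intent, spelled three ways.
    for cmd in ("rm -rf /srv", "r''m -rf /srv", "find /srv -delete"):
        print(f"{cmd!r}: gated={needs_approval(cmd)}")

    key = "sk-" + "A1b2C3d4E5f6G7h8I9j0K"
    print(redact(f"token: {key}"))        # token: [REDACTED]
    print(redact(f"token: {key[::-1]}"))  # reversed key passes through unredacted
```

Adding a pattern for each bypass is welcome hardening (§3.2), but the set of equivalent spellings is unbounded, which is why containment has to come from the OS layer (§2.2).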
+
+### 2.5 Plugin Trust Model
+
+Plugins load into the agent process and run with full agent
+privileges: they can read the same credentials, call the same
+tools, register the same hooks, and import the same modules as
+anything shipped in-tree. The boundary for third-party plugins is
+operator review before install — the same rule as skills (§2.4),
+called out separately because plugins are architecturally heavier
+and often ship their own background services, network listeners,
+and dependencies.
+
+A malicious or buggy plugin is not a vulnerability in Hermes Agent
+itself. Bugs in Hermes Agent's plugin-install or plugin-discovery
+path that prevent the operator from seeing what they're installing
+are in scope under §3.1.
+
+### 2.6 External Surfaces
+
+An **external surface** is any channel outside the local agent
+process through which a caller can dispatch agent work, resolve
+approvals, or receive agent output. Each surface has its own
+authorization model, but the rules below apply uniformly.
+
+**Surfaces in Hermes Agent:**
+
+- **Gateway platform adapters.** Messaging integrations in
+  `gateway/platforms/` (Telegram, Discord, Slack, email, SMS, etc.)
+  and analogous adapters shipped as plugins.
+- **Network-exposed HTTP surfaces.** The API server adapter, the
+  dashboard plugin, the kanban plugin's HTTP endpoints, and any
+  other plugin that binds a listening socket.
+- **Editor / IDE adapters.** The ACP adapter (`acp_adapter/`) and
+  equivalent integrations that accept requests from a local client
+  process.
+- **The TUI gateway (`tui_gateway/`).** JSON-RPC backend for the
+  Ink terminal UI, reached over local IPC.
+
+**Uniform rules:**
+
+1. **Authorization is required at every surface that crosses a
+   trust boundary.** For messaging and network HTTP surfaces, the
+   boundary is the network: authorization means an
+   operator-configured caller allowlist. For editor and local-IPC
+   surfaces (ACP, TUI gateway), the boundary is the host's user account:
+   authorization means relying on OS-level access control (file
+   permissions, loopback-only binds) and not exposing the surface
+   beyond the local user without an explicit network auth layer.
+2. **An allowlist is required for every enabled network-exposed
+   adapter.** Adapters must refuse to dispatch agent work, resolve
+   approvals, or relay output until an allowlist is set. Code paths
+   that fail open when no allowlist is configured are code bugs in
+   scope under §3.1.
+3. **Session identifiers are routing handles, not authorization
+   boundaries.** Knowing another caller's session ID does not grant
+   access to their approvals or output; authorization is always
+   re-checked against the allowlist (or OS-level equivalent).
+4. **Within the authorized set, all callers are equally trusted.**
+   Hermes Agent does not model per-caller capabilities inside a
+   single adapter. Operators who need capability separation should
+   run separate agent instances with separate allowlists.
+5. **Binding a local-only surface to a non-loopback interface is a
+   break-glass operator decision (§3.2).** The dashboard and other
+   plugin HTTP servers default to loopback; exposing them via
+   `--host 0.0.0.0` or equivalent makes public-exposure hardening
+   (§4) the operator's responsibility.
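
Rules 2, 3, and 5 can be sketched in Python as below. Everything here is hypothetical: `authorize`, `resolve_approval`, and `is_loopback_bind` are illustrative names, not Hermes Agent's adapter API. The sketch shows the shape the rules imply: an unset or empty allowlist denies everyone, a session lookup re-checks the caller, and a bind address is classified with the standard-library `ipaddress` module.

```python
from __future__ import annotations

import ipaddress
from collections.abc import Mapping
from typing import Optional


def authorize(caller_id: str, allowlist: Optional[frozenset[str]]) -> bool:
    """Rule 2: fail closed. An unset or empty allowlist authorizes nobody."""
    if not allowlist:
        return False
    return caller_id in allowlist


def resolve_approval(
    caller_id: str,
    session_id: str,
    pending: Mapping[str, str],
    allowlist: Optional[frozenset[str]],
) -> str:
    """Rule 3: a session ID only routes the request; it never authorizes it.

    The caller is re-checked on every call. Per rule 4, any allowlisted
    caller is trusted equally, so there is no per-owner check here.
    """
    if not authorize(caller_id, allowlist):
        raise PermissionError(f"caller {caller_id!r} is not allowlisted")
    return pending[session_id]


def is_loopback_bind(host: str) -> bool:
    """Rule 5: True only for loopback binds; 0.0.0.0 and :: expose every interface."""
    if host == "localhost":  # assumption: localhost resolves to loopback
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other hostname may resolve to a public interface: treat as exposed.
        return False


if __name__ == "__main__":
    print(authorize("alice", None))  # False: no allowlist configured yet
    allow = frozenset({"alice"})
    pending = {"sess-42": "Approve `git push --force`?"}
    print(resolve_approval("alice", "sess-42", pending, allow))
    try:
        resolve_approval("mallory", "sess-42", pending, allow)
    except PermissionError as exc:
        print("denied:", exc)
    for host in ("127.0.0.1", "::1", "localhost", "0.0.0.0"):
        print(host, is_loopback_bind(host))
```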

 ---

-## 3. Out of Scope (Non-Vulnerabilities)
+## 3. Scope

-The following scenarios are **not** considered security breaches:
- **Prompt Injection:** Unless it results in a concrete bypass of the approval system, toolset restrictions, or container sandbox.
- **Public Exposure:** Deploying the gateway to the public internet without external authentication or network protection.
- **Trusted State Access:** Reports that require pre-existing write access to `~/.hermes/`, `.env`, or `config.yaml` (these are operator-owned files).
- **Default Behavior:** Host-level command execution when `terminal.backend` is set to `local` — this is the documented default, not a vulnerability.
- **Configuration Trade-offs:** Intentional break-glass settings such as `approvals.mode: "off"` or `terminal.backend: local` in production.
- **Tool-level read/access restrictions:** The agent has unrestricted shell access via the `terminal` tool by design. Reports that a specific tool (e.g., `read_file`) can access a resource are not vulnerabilities if the same access is available through `terminal`. Tool-level deny lists only constitute a meaningful security boundary when paired with equivalent restrictions on the terminal side (as with write operations, where `WRITE_DENIED_PATHS` is paired with the dangerous command approval system).
+### 3.1 In Scope
+
+- Escape from a declared OS-level isolation posture (§2.2): an
+  attacker-controlled code path reaching state that the posture
+  claimed to confine.
+- Unauthorized external-surface access: a caller outside the
+  configured authorization set (allowlist, or OS-level equivalent
+  for local-IPC surfaces) dispatching work, receiving output, or
+  resolving approvals (§2.6).
+- Credential exfiltration: leakage of operator credentials or
+  session authorization material to a destination outside the
+  trust envelope, via a mechanism that should have prevented it
+  (environment scrubbing bug, adapter logging, transport error
+  that flushes credentials to an upstream, etc.).
+- Trust-model documentation violations: code behaving contrary to
+  what this policy, Hermes Agent's own documentation, or reasonable
+  operator expectations would predict — including cases where
+  Hermes Agent has documented a stance about how its output should
+  be rendered by a consuming layer (dashboard, gateway adapter,
+  file writer, shell) and a code path breaks that stance.
+
+### 3.2 Out of Scope
+
+"Out of scope" here means "not a security vulnerability under this
+policy." It does not mean "not worth reporting." Improvements to the
+in-process heuristics, hardening ideas, and UX fixes are welcome as
+regular issues or pull requests — the approval gate can always catch
+more patterns, redaction can always get smarter, adapter behavior
+can always be tightened. These items just don't go through the
+private-disclosure channel and don't receive advisories.
+
+- **Bypasses of in-process heuristics (§2.4)** — approval-gate regex
+  bypasses, redaction bypasses, Skills Guard pattern bypasses, and
+  analogous reports against future heuristics. These components are
+  not boundaries; defeating them is not a vulnerability under this
+  policy.
+- **Prompt injection per se.** Getting the LLM to emit unusual
+  output — via injected content, hallucination, training artifacts,
+  or any other cause — is not itself a vulnerability. "I achieved
+  prompt injection" without a chained §3.1 outcome is not an
+  actionable report under this policy.
+- **Consequences of a chosen isolation posture.** Reports that a
+  code path operating within its posture's scope can do what that
+  posture permits are not vulnerabilities. Examples: shell or file
+  tools reaching host state under the local backend; code-execution
+  or MCP subprocesses reaching host state under terminal-backend
+  isolation that only sandboxes shell; reports whose preconditions
+  require pre-existing write access to operator-owned configuration
+  or credential files (those are already inside the trust envelope).
+- **Documented break-glass settings.** Operator-selected trade-offs
+  that explicitly disable protections: `--insecure` and equivalent
+  flags on the dashboard or other components, disabled approvals,
+  local backend in production, development profiles that bypass
+  hermes-home security, and similar. Reports against those
+  configurations are not vulnerabilities — that's the flag's job.
+- **Community-contributed skills and plugins.** Third-party skills
+  (including the community skills repository) and third-party
+  plugins are in the operator's review surface, not Hermes Agent's
+  trust surface (§2.4, §2.5). A skill or plugin doing something
+  malicious is the expected failure mode of one that wasn't
+  reviewed, not a vulnerability in Hermes Agent. Bugs in Hermes
+  Agent's skill-install or plugin-install path that prevent the
+  operator from seeing what they're installing are in scope under
+  §3.1.
+- **Public exposure without external controls.** Exposing the
+  gateway or API to the public internet without authentication,
+  VPN, or firewall.
+- **Tool-level read/write restrictions on a posture where shell is
+  permitted.** If a path is reachable via the terminal tool, a report
+  that another file tool can also reach it demonstrates no new
+  capability.

 ---

-## 4. Deployment Hardening & Best Practices
+## 4. Deployment Hardening

-### Filesystem & Network
- **Production sandboxing:** Use container backends (`docker`, `modal`, `daytona`) instead of `local` for untrusted workloads.
- **File permissions:** Run as non-root (the Docker image uses UID 10000); protect credentials with `chmod 600 ~/.hermes/.env` on local installs.
- **Network exposure:** Do not expose the gateway or API server to the public internet without VPN, Tailscale, or firewall protection. SSRF protection is enabled by default across all gateway platform adapters (Telegram, Discord, Slack, Matrix, Mattermost, etc.) with redirect validation. Note: the local terminal backend does not apply SSRF filtering, as it operates within the trusted operator's environment.
+The single most important hardening decision is matching isolation
+(§2.2) to the trust of the content the agent will ingest. Beyond
+that:

-### Skills & Supply Chain
- **Skill installation:** Review Skills Guard reports (`tools/skills_guard.py`) before installing third-party skills. The audit log at `~/.hermes/skills/.hub/audit.log` tracks every install and removal.
- **MCP safety:** OSV malware checking runs automatically for `npx`/`uvx` packages before MCP server processes are spawned.
- **CI/CD:** GitHub Actions are pinned to full commit SHAs. The `supply-chain-audit.yml` workflow blocks PRs containing `.pth` files or suspicious `base64`+`exec` patterns.
-
-### Credential Storage
- API keys and tokens belong exclusively in `~/.hermes/.env` — never in `config.yaml` or checked into version control.
- The credential pool system (`agent/credential_pool.py`) handles key rotation and fallback. Credentials are resolved from environment variables, not stored in plaintext databases.
+- Run the agent as a non-root user. The supplied container image
+  does this by default.
+- Keep credentials in the operator credential file with tight
+  permissions, never in the main config, never in version control.
+  Under OpenShell, use the Provider store rather than an on-disk
+  credential file.
+- Do not expose the gateway or API to the public internet without
+  VPN, Tailscale, or firewall protection. Under OpenShell, use the
+  network policy layer to restrict egress.
+- Configure a caller allowlist for every network-exposed adapter
+  you enable (§2.6).
+- Review third-party skills and plugins before install (§2.4,
+  §2.5). For skills, this means reading the Python and scripts,
+  not just SKILL.md. Skills Guard reports and the install audit
+  log are the review surface.
+- Hermes Agent includes supply-chain guards for MCP server
+  launches and for dependency / bundled-package changes in CI; see
+  `CONTRIBUTING.md` for specifics.

 ---

-## 5. Disclosure Process
+## 5. Disclosure

- **Coordinated Disclosure:** 90-day window or until a fix is released, whichever comes first.
- **Communication:** All updates occur via the GHSA thread or email correspondence with security@nousresearch.com.
- **Credits:** Reporters are credited in release notes unless anonymity is requested.
+- **Coordinated disclosure window:** 90 days from report, or until a
+  fix is released, whichever comes first.
+- **Channel:** the GHSA thread or email correspondence with
+  security@nousresearch.com.
+- **Credit:** reporters are credited in release notes unless
+  anonymity is requested.
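
The disclosure window reduces to a small date calculation. A minimal Python sketch, assuming calendar days and representing a not-yet-released fix as `None`; `disclosure_date` is an illustrative name, not part of Hermes Agent.

```python
from datetime import date, timedelta
from typing import Optional

DISCLOSURE_WINDOW = timedelta(days=90)


def disclosure_date(reported: date, fix_released: Optional[date] = None) -> date:
    """Earlier of (report date + 90 days) and the fix release date, if any."""
    deadline = reported + DISCLOSURE_WINDOW
    if fix_released is None:
        return deadline
    return min(deadline, fix_released)


if __name__ == "__main__":
    print(disclosure_date(date(2026, 5, 16)))                    # 2026-08-14
    print(disclosure_date(date(2026, 5, 16), date(2026, 6, 1)))  # 2026-06-01
```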
--- a/acp_adapter/auth.py
+++ b/acp_adapter/auth.py
@@ -1,18 +1,32 @@
-"""ACP auth helpers — detect the currently configured Hermes provider."""
+"""ACP auth helpers — detect and advertise Hermes authentication methods."""

 from __future__ import annotations

-from typing import Optional
+from typing import Any, Optional
+
+
+TERMINAL_SETUP_AUTH_METHOD_ID = "hermes-setup"


 def detect_provider() -> Optional[str]:
-    """Resolve the active Hermes runtime provider, or None if unavailable."""
+    """Resolve the active Hermes runtime provider, or None if unavailable.
+
+    Treats a ``Callable`` ``api_key`` (Azure Foundry Entra ID bearer
+    token provider — see :mod:`agent.azure_identity_adapter`) as a valid
+    credential. Without this, ACP sessions for Entra-configured Foundry
+    deployments silently default to ``"openrouter"`` and the ACP auth
+    handshake rejects the legitimate provider.
+    """
    try:
        from hermes_cli.runtime_provider import resolve_runtime_provider
        runtime = resolve_runtime_provider()
        api_key = runtime.get("api_key")
        provider = runtime.get("provider")
-        if isinstance(api_key, str) and api_key.strip() and isinstance(provider, str) and provider.strip():
+        if not isinstance(provider, str) or not provider.strip():
+            return None
+        is_string_key = isinstance(api_key, str) and api_key.strip()
+        is_callable_provider = callable(api_key) and not isinstance(api_key, str)
+        if is_string_key or is_callable_provider:
            return provider.strip().lower()
    except Exception:
        return None
@@ -22,3 +36,44 @@ def detect_provider() -> Optional[str]:
 def has_provider() -> bool:
    """Return True if Hermes can resolve any runtime provider credentials."""
    return detect_provider() is not None
+
+
+def build_auth_methods() -> list[Any]:
+    """Return registry-compatible ACP auth methods for Hermes.
+
+    The official ACP registry validates that agents advertise at least one
+    usable auth method during the initial handshake. A fresh Zed install may
+    not have Hermes provider credentials configured yet, so Hermes always
+    advertises a terminal setup method. When credentials are already present,
+    it also advertises the resolved provider as the default agent-managed
+    runtime credential method.
+    """
+    from acp.schema import AuthMethodAgent, TerminalAuthMethod
+
+    methods: list[Any] = []
+    provider = detect_provider()
+    if provider:
+        methods.append(
+            AuthMethodAgent(
+                id=provider,
+                name=f"{provider} runtime credentials",
+                description=(
+                    "Authenticate Hermes using the currently configured "
+                    f"{provider} runtime credentials."
+                ),
+            )
+        )
+
+    methods.append(
+        TerminalAuthMethod(
+            id=TERMINAL_SETUP_AUTH_METHOD_ID,
+            name="Configure Hermes provider",
+            description=(
+                "Open Hermes' interactive model/provider setup in a terminal. "
+                "Use this when Hermes has not been configured on this machine yet."
+            ),
+            type="terminal",
+            args=["--setup"],
+        )
+    )
+    return methods
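The acceptance rule in `detect_provider()` can be exercised on its own. The sketch below assumes the `provider` and `api_key` values returned by `resolve_runtime_provider()` have already been extracted; `accepted_provider` is a hypothetical name, and the `"azure-foundry"` provider string and lambda token provider are illustrative stand-ins, not the real Entra ID adapter.

```python
from typing import Any, Callable, Optional, Union

# Hypothetical stand-in for a bearer-token provider such as the Azure
# Entra ID callable described in detect_provider()'s docstring.
TokenProvider = Callable[[], str]


def accepted_provider(provider: Any, api_key: Union[str, TokenProvider, None]) -> Optional[str]:
    """Mirror the credential check in detect_provider() (illustrative only)."""
    if not isinstance(provider, str) or not provider.strip():
        return None  # no usable provider name at all
    is_string_key = isinstance(api_key, str) and bool(api_key.strip())
    is_callable_provider = callable(api_key) and not isinstance(api_key, str)
    if is_string_key or is_callable_provider:
        return provider.strip().lower()
    return None


print(accepted_provider(" Azure-Foundry ", lambda: "token"))  # azure-foundry
print(accepted_provider("openrouter", "   "))                  # None
```

A plain `str` is never callable, so the `not isinstance(api_key, str)` guard is defensive; it documents that only non-string callables count as token providers.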
--- a/acp_adapter/edit_approval.py
+++ b/acp_adapter/edit_approval.py
@@ -0,0 +1,286 @@
+"""Pre-execution ACP edit approval helpers.
+
+This module is intentionally isolated from the generic tool registry.  ACP binds
+an edit approval requester in a ContextVar for the duration of one ACP agent run;
+CLI, gateway, and other sessions leave it unset and therefore bypass this guard.
+"""
+
+from __future__ import annotations
+
+import asyncio
+import json
+import logging
+import tempfile
+from concurrent.futures import TimeoutError as FutureTimeout
+from contextvars import ContextVar, Token
+from dataclasses import dataclass
+from itertools import count
+from pathlib import Path
+from typing import Any, Callable
+
+logger = logging.getLogger(__name__)
+
+
+@dataclass(frozen=True)
+class EditProposal:
+    """A proposed single-file edit that can be shown to an ACP client."""
+
+    tool_name: str
+    path: str
+    old_text: str | None
+    new_text: str
+    arguments: dict[str, Any]
+
+
+EditApprovalRequester = Callable[[EditProposal], bool]
+
+_EDIT_APPROVAL_REQUESTER: ContextVar[EditApprovalRequester | None] = ContextVar(
+    "ACP_EDIT_APPROVAL_REQUESTER",
+    default=None,
+)
+_PERMISSION_REQUEST_IDS = count(1)
+
+
+SENSITIVE_AUTO_APPROVE_NAMES = {".env", ".env.local", ".env.production", "id_rsa", "id_ed25519"}
+AUTO_APPROVE_ASK = "ask"
+AUTO_APPROVE_WORKSPACE = "workspace_session"
+AUTO_APPROVE_SESSION = "session"
+
+
+def set_edit_approval_requester(requester: EditApprovalRequester | None) -> Token:
+    """Bind an ACP edit approval requester for the current context."""
+
+    return _EDIT_APPROVAL_REQUESTER.set(requester)
+
+
+def reset_edit_approval_requester(token: Token) -> None:
+    """Restore a previous edit approval requester binding."""
+
+    _EDIT_APPROVAL_REQUESTER.reset(token)
+
+
+def clear_edit_approval_requester() -> None:
+    """Clear the current requester; primarily used by tests."""
+
+    _EDIT_APPROVAL_REQUESTER.set(None)
+
+
+def get_edit_approval_requester() -> EditApprovalRequester | None:
+    return _EDIT_APPROVAL_REQUESTER.get()
+
+
+def _read_text_if_exists(path: str) -> str | None:
+    p = Path(path).expanduser()
+    if not p.exists():
+        return None
+    if not p.is_file():
+        raise OSError(f"Cannot edit non-file path: {path}")
+    return p.read_text(encoding="utf-8", errors="replace")
+
+
+def _proposal_for_write_file(arguments: dict[str, Any]) -> EditProposal:
+    path = str(arguments.get("path") or "")
+    if not path:
+        raise ValueError("path required")
+    content = arguments.get("content")
+    if content is None:
+        raise ValueError("content required")
+    return EditProposal(
+        tool_name="write_file",
+        path=path,
+        old_text=_read_text_if_exists(path),
+        new_text=str(content),
+        arguments=dict(arguments),
+    )
+
+
+def _proposal_for_patch_replace(arguments: dict[str, Any]) -> EditProposal:
+    path = str(arguments.get("path") or "")
+    if not path:
+        raise ValueError("path required")
+    old_string = arguments.get("old_string")
+    new_string = arguments.get("new_string")
+    if old_string is None or new_string is None:
+        raise ValueError("old_string and new_string required")
+
+    old_text = _read_text_if_exists(path)
+    if old_text is None:
+        raise ValueError(f"Failed to read file: {path}")
+
+    from tools.fuzzy_match import fuzzy_find_and_replace
+
+    new_text, match_count, _strategy, error = fuzzy_find_and_replace(
+        old_text,
+        str(old_string),
+        str(new_string),
+        bool(arguments.get("replace_all", False)),
+    )
+    if error or match_count == 0:
+        raise ValueError(error or f"Could not find match for old_string in {path}")
+
+    return EditProposal(
+        tool_name="patch",
+        path=path,
+        old_text=old_text,
+        new_text=new_text,
+        arguments=dict(arguments),
+    )
+
+
+def build_edit_proposal(tool_name: str, arguments: dict[str, Any]) -> EditProposal | None:
+    """Return an edit proposal for supported file mutation calls."""
+
+    if tool_name == "write_file":
+        return _proposal_for_write_file(arguments)
+    if tool_name == "patch" and arguments.get("mode", "replace") == "replace":
+        return _proposal_for_patch_replace(arguments)
+    return None
+
+
+def _is_sensitive_auto_approve_path(path: str) -> bool:
+    parts = Path(path).expanduser().parts
+    lowered = {part.lower() for part in parts}
+    if ".git" in lowered or ".ssh" in lowered:
+        return True
+    return Path(path).name.lower() in SENSITIVE_AUTO_APPROVE_NAMES
+
+
+def should_auto_approve_edit(proposal: EditProposal, policy: str, cwd: str | None = None) -> bool:
+    """Return whether an ACP edit proposal may bypass the prompt for this session.
+
+    This is intentionally session-scoped and conservative: edits to sensitive
+    paths still prompt even under autonomous policies.
+    """
+
+    policy = str(policy or AUTO_APPROVE_ASK).strip()
+    if policy == AUTO_APPROVE_ASK or _is_sensitive_auto_approve_path(proposal.path):
+        return False
+    path = Path(proposal.path).expanduser().resolve(strict=False)
+    if policy == AUTO_APPROVE_SESSION:
+        return True
+    if policy == AUTO_APPROVE_WORKSPACE:
+        # `/tmp` is the POSIX path but tempfile.gettempdir() is the real one on
+        # every platform: `/private/tmp` on macOS (because `/tmp` is a symlink
+        # and Path.resolve() follows it) and the per-user Temp dir on Windows.
+        tmp_root = Path(tempfile.gettempdir()).resolve(strict=False)
+        try:
+            path.relative_to(tmp_root)
+            return True
+        except ValueError:
+            pass
+        if cwd:
+            root = Path(cwd).expanduser().resolve(strict=False)
+            try:
+                path.relative_to(root)
+                return True
+            except ValueError:
+                return False
+    return False
+
+
+def maybe_require_edit_approval(tool_name: str, arguments: dict[str, Any]) -> str | None:
+    """Run ACP edit approval if bound.
+
+    Returns a JSON tool-error string when the edit must be blocked, otherwise
+    ``None`` so dispatch can continue.  Requester exceptions deny by default.
+    """
+
+    requester = get_edit_approval_requester()
+    if requester is None:
+        return None
+
+    try:
+        proposal = build_edit_proposal(tool_name, arguments)
+    except Exception as exc:
+        logger.warning("Could not build ACP edit approval proposal for %s: %s", tool_name, exc)
+        return json.dumps({"error": f"Edit approval denied: could not prepare diff ({exc})"}, ensure_ascii=False)
+
+    if proposal is None:
+        return None
+
+    try:
+        approved = bool(requester(proposal))
+    except Exception as exc:
+        logger.warning("ACP edit approval requester failed: %s", exc)
+        approved = False
+
+    if approved:
+        return None
+    return json.dumps({"error": "Edit approval denied by ACP client; file was not modified."}, ensure_ascii=False)
+
+
+def build_acp_edit_tool_call(proposal: EditProposal):
+    """Build the ToolCallUpdate payload for ACP request_permission."""
+
+    import acp
+
+    tool_call_id = f"edit-approval-{next(_PERMISSION_REQUEST_IDS)}"
+    return acp.update_tool_call(
+        tool_call_id,
+        title=f"Approve edit: {proposal.path}",
+        kind="edit",
+        status="pending",
+        content=[
+            acp.tool_diff_content(
+                path=proposal.path,
+                old_text=proposal.old_text,
+                new_text=proposal.new_text,
+            )
+        ],
+        raw_input={"tool": proposal.tool_name, "arguments": proposal.arguments},
+    )
+
+
+def make_acp_edit_approval_requester(
+    request_permission_fn: Callable,
+    loop: asyncio.AbstractEventLoop,
+    session_id: str,
+    timeout: float = 60.0,
+    auto_approve_getter: Callable[[], tuple[str, str | None]] | None = None,
+) -> EditApprovalRequester:
+    """Return a sync requester that bridges edit proposals to ACP permissions."""
+
+    def _requester(proposal: EditProposal) -> bool:
+        from acp.schema import PermissionOption
+        from agent.async_utils import safe_schedule_threadsafe
+
+        if auto_approve_getter is not None:
+            try:
+                policy, cwd = auto_approve_getter()
+                if should_auto_approve_edit(proposal, policy, cwd):
+                    logger.info("Auto-approved ACP edit under policy %s: %s", policy, proposal.path)
+                    return True
+            except Exception:
+                logger.debug("ACP edit auto-approval policy check failed", exc_info=True)
+
+        options = [
+            PermissionOption(option_id="allow_once", kind="allow_once", name="Allow edit"),
+            PermissionOption(option_id="deny", kind="reject_once", name="Deny"),
+        ]
+        tool_call = build_acp_edit_tool_call(proposal)
+        coro = request_permission_fn(
+            session_id=session_id,
+            tool_call=tool_call,
+            options=options,
+        )
+        future = safe_schedule_threadsafe(
+            coro,
+            loop,
+            logger=logger,
+            log_message="Edit approval request: failed to schedule on loop",
+        )
+        if future is None:
+            return False
+        try:
+            response = future.result(timeout=timeout)
+        except (FutureTimeout, Exception) as exc:
+            future.cancel()
+            logger.warning("Edit approval request timed out or failed: %s", exc)
+            return False
+        outcome = getattr(response, "outcome", None)
+        return (
+            getattr(outcome, "outcome", None) == "selected"
+            and getattr(outcome, "option_id", None) == "allow_once"
+        )
+
+    return _requester
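The auto-approval decision in `should_auto_approve_edit()` depends only on the target path, the policy string, and the session `cwd`. The path-only sketch below mirrors that logic so it can be tried without building an `EditProposal`; `auto_approve` and `is_sensitive` are hypothetical helpers, and `/srv/project` is an illustrative workspace.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Same names as SENSITIVE_AUTO_APPROVE_NAMES in edit_approval.py.
SENSITIVE_NAMES = {".env", ".env.local", ".env.production", "id_rsa", "id_ed25519"}


def is_sensitive(path: str) -> bool:
    lowered = {part.lower() for part in Path(path).expanduser().parts}
    if ".git" in lowered or ".ssh" in lowered:
        return True
    return Path(path).name.lower() in SENSITIVE_NAMES


def auto_approve(path: str, policy: str, cwd: Optional[str] = None) -> bool:
    """Path-only sketch of should_auto_approve_edit() (hypothetical helper)."""
    if policy == "ask" or is_sensitive(path):
        return False  # sensitive paths always prompt
    if policy == "session":
        return True
    if policy != "workspace_session":
        return False  # unknown policies fall back to prompting
    target = Path(path).expanduser().resolve(strict=False)
    roots = [Path(tempfile.gettempdir())]
    if cwd:
        roots.append(Path(cwd).expanduser())
    for root in roots:
        try:
            # Resolve the root too, so /tmp -> /private/tmp on macOS matches.
            target.relative_to(root.resolve(strict=False))
            return True
        except ValueError:
            continue
    return False


print(auto_approve("/srv/project/src/app.py", "workspace_session", "/srv/project"))  # True
print(auto_approve("/srv/project/.env", "workspace_session", "/srv/project"))        # False
```

The sensitive-path check runs before any policy branch, so neither `workspace_session` nor `session` can skip the prompt for `.env`, key files, or anything under `.git` or `.ssh`.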
--- a/acp_adapter/entry.py
+++ b/acp_adapter/entry.py
@@ -24,6 +24,7 @@ except ModuleNotFoundError:
    # means UTF-8 stdio setup is skipped on Windows; POSIX is unaffected.
    pass

+import argparse
 import asyncio
 import logging
 import sys
@@ -107,8 +108,125 @@ def _load_env() -> None:
        )


-def main() -> None:
+def _parse_args(argv: list[str] | None = None) -> argparse.Namespace:
+    parser = argparse.ArgumentParser(
+        prog="hermes-acp",
+        description="Run Hermes Agent as an ACP stdio server.",
+    )
+    parser.add_argument("--version", action="store_true", help="Print Hermes version and exit")
+    parser.add_argument(
+        "--check",
+        action="store_true",
+        help="Verify ACP dependencies and adapter imports, then exit",
+    )
+    parser.add_argument(
+        "--setup",
+        action="store_true",
+        help="Run interactive Hermes provider/model setup for ACP terminal auth",
+    )
+    parser.add_argument(
+        "--setup-browser",
+        action="store_true",
+        help="Install agent-browser + Playwright Chromium into ~/.hermes/node/ "
+             "for browser tool support. Idempotent.",
+    )
+    parser.add_argument(
+        "--yes",
+        "-y",
+        action="store_true",
+        dest="assume_yes",
+        help="Accept all prompts (currently used by --setup-browser to skip the "
+             "~400 MB Chromium download confirmation).",
+    )
+    return parser.parse_args(argv)
+
+
+def _print_version() -> None:
+    from hermes_cli import __version__ as hermes_version
+
+    print(hermes_version)
+
+
+def _run_check() -> None:
+    import acp  # noqa: F401
+    from acp_adapter.server import HermesACPAgent  # noqa: F401
+
+    print("Hermes ACP check OK")
+
+
+def _run_setup() -> None:
+    from hermes_cli.main import main as hermes_main
+
+    old_argv = sys.argv[:]
+    try:
+        sys.argv = [old_argv[0] if old_argv else "hermes", "model"]
+        hermes_main()
+    finally:
+        sys.argv = old_argv
+
+    # Offer browser-tools install as a follow-up. The terminal auth method
+    # is the one supported first-run UX for registry installs, so this is
+    # the natural moment to ask. Skip silently if stdin isn't a TTY (the
+    # answer can't be collected anyway).
+    if not sys.stdin.isatty():
+        return
+    try:
+        reply = input(
+            "\nInstall browser tools? Downloads agent-browser (npm) and "
+            "optionally Playwright Chromium (~400 MB). [y/N] "
+        ).strip().lower()
+    except (EOFError, KeyboardInterrupt):
+        return
+    if reply in {"y", "yes"}:
+        _run_setup_browser(assume_yes=False)
+
+
+def _run_setup_browser(assume_yes: bool = False) -> int:
+    """Bootstrap agent-browser + Chromium.
+
+    Routes through dep_ensure -> install.{sh,ps1} --ensure, sharing code
+    with ``hermes postinstall`` and the runtime lazy installer.
+
+    Returns 0 on success, 1 on failure.
+    """
+    from hermes_cli.dep_ensure import ensure_dependency
+
+    try:
+        node_ok = ensure_dependency("node", interactive=not assume_yes)
+        if not node_ok:
+            print("Node.js installation failed — cannot proceed with browser tools.",
+                  file=sys.stderr)
+            return 1
+
+        browser_ok = ensure_dependency("browser", interactive=not assume_yes)
+        if not browser_ok:
+            print("Browser tools installation failed.", file=sys.stderr)
+            return 1
+
+        return 0
+    except OSError as exc:
+        print(f"Browser bootstrap failed: {exc}", file=sys.stderr)
+        return 1
+
+
+def main(argv: list[str] | None = None) -> None:
    """Entry point: load env, configure logging, run the ACP agent."""
+    args = _parse_args(argv)
+    if args.version:
+        _print_version()
+        return
+    if args.check:
+        _run_check()
+        return
+    if args.setup:
+        _run_setup()
+        return
+    if args.setup_browser:
+        rc = _run_setup_browser(assume_yes=args.assume_yes)
+        if rc != 0:
+            sys.exit(rc)
+        return
+
    _setup_logging()
    _load_env()

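`main()` checks the parsed flags in a fixed order and returns after the first match, so combining flags never runs several actions. The sketch below rebuilds the same flag set (help strings omitted) and reports which branch would run; `parse` and `selected_action` are hypothetical helpers for illustration only.

```python
import argparse
from typing import List, Optional


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # Mirrors the flags registered by _parse_args() in acp_adapter/entry.py.
    parser = argparse.ArgumentParser(prog="hermes-acp")
    parser.add_argument("--version", action="store_true")
    parser.add_argument("--check", action="store_true")
    parser.add_argument("--setup", action="store_true")
    parser.add_argument("--setup-browser", action="store_true")
    parser.add_argument("--yes", "-y", action="store_true", dest="assume_yes")
    return parser.parse_args(argv)


def selected_action(args: argparse.Namespace) -> str:
    """Return which branch main() would take (hypothetical helper)."""
    if args.version:
        return "version"
    if args.check:
        return "check"
    if args.setup:
        return "setup"
    if args.setup_browser:  # argparse maps --setup-browser to setup_browser
        return f"setup-browser (assume_yes={args.assume_yes})"
    return "serve"  # fall through to logging, env loading, and the stdio server


for argv in ([], ["--check"], ["--setup-browser", "-y"], ["--setup", "--version"]):
    print(argv, "->", selected_action(parse(argv)))
```

`--yes` / `-y` only changes behavior on the `--setup-browser` path, and `--version` wins over every other flag because it is checked first.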
--- a/acp_adapter/events.py
+++ b/acp_adapter/events.py
@@ -14,6 +14,7 @@ from collections import deque
 from typing import Any, Callable, Deque, Dict

 import acp
+from acp.schema import AgentPlanUpdate, PlanEntry

 from .tools import (
    build_tool_complete,
@@ -24,6 +25,65 @@ from .tools import (
 logger = logging.getLogger(__name__)


+def _json_loads_maybe_prefix(value: str) -> Any:
+    """Parse a JSON value even when Hermes has appended a human-readable hint after it."""
+    text = value.strip()
+    try:
+        return json.loads(text)
+    except Exception:
+        decoder = json.JSONDecoder()
+        data, _ = decoder.raw_decode(text)
+        return data
+
+
+def _build_plan_update_from_todo_result(result: Any) -> AgentPlanUpdate | None:
+    """Translate Hermes' todo tool result into ACP's native plan update.
+
+    Zed renders ``sessionUpdate: plan`` as its first-class task/todo panel. The
+    Hermes agent already maintains task state through the ``todo`` tool, so the
+    ACP adapter should expose that state natively instead of only as a generic
+    tool-call transcript block.
+    """
+    if not isinstance(result, str) or not result.strip():
+        return None
+
+    try:
+        data = _json_loads_maybe_prefix(result)
+    except Exception:
+        return None
+
+    if not isinstance(data, dict) or not isinstance(data.get("todos"), list):
+        return None
+
+    todos = data["todos"]
+    if not todos:
+        return AgentPlanUpdate(session_update="plan", entries=[])
+
+    status_map = {
+        "pending": "pending",
+        "in_progress": "in_progress",
+        "completed": "completed",
+        # ACP plans only support pending/in_progress/completed. Map cancelled
+        # tasks to terminal entries rather than dropping them: the client
+        # replaces the full plan list on each update, so a dropped task vanishes.
+        "cancelled": "completed",
+    }
+    entries: list[PlanEntry] = []
+    for item in todos:
+        if not isinstance(item, dict):
+            continue
+        content = str(item.get("content") or item.get("id") or "").strip()
+        if not content:
+            continue
+        raw_status = str(item.get("status") or "pending").strip()
+        status = status_map.get(raw_status, "pending")
+        if raw_status == "cancelled":
+            content = f"[cancelled] {content}"
+        entries.append(PlanEntry(content=content, priority="medium", status=status))
+
+    return AgentPlanUpdate(session_update="plan", entries=entries)
+
+
 def _send_update(
    conn: acp.Client,
    session_id: str,
@@ -31,10 +91,17 @@ def _send_update(
    update: Any,
 ) -> None:
    """Fire-and-forget an ACP session update from a worker thread."""
+    from agent.async_utils import safe_schedule_threadsafe
+
+    future = safe_schedule_threadsafe(
+        conn.session_update(session_id, update),
+        loop,
+        logger=logger,
+        log_message="Failed to send ACP update",
+    )
+    if future is None:
+        return
    try:
-        future = asyncio.run_coroutine_threadsafe(
-            conn.session_update(session_id, update), loop
-        )
        future.result(timeout=5)
    except Exception:
        logger.debug("Failed to send ACP update", exc_info=True)
@@ -50,6 +117,7 @@ def make_tool_progress_cb(
    loop: asyncio.AbstractEventLoop,
    tool_call_ids: Dict[str, Deque[str]],
    tool_call_meta: Dict[str, Dict[str, Any]],
+    edit_approval_policy_getter: Callable[[], tuple[str, str | None]] | None = None,
 ) -> Callable:
    """Create a ``tool_progress_callback`` for AIAgent.

@@ -95,7 +163,20 @@ def make_tool_progress_cb(
                logger.debug("Failed to capture ACP edit snapshot for %s", name, exc_info=True)
        tool_call_meta[tc_id] = {"args": args, "snapshot": snapshot}

-        update = build_tool_start(tc_id, name, args)
+        edit_diff = None
+        if name in {"write_file", "patch"} and edit_approval_policy_getter is not None:
+            try:
+                from acp_adapter.edit_approval import build_edit_proposal, should_auto_approve_edit
+
+                proposal = build_edit_proposal(name, args)
+                if proposal is not None:
+                    policy, cwd = edit_approval_policy_getter()
+                    if should_auto_approve_edit(proposal, policy, cwd):
+                        edit_diff = proposal
+            except Exception:
+                logger.debug("Failed to prepare auto-approved ACP edit diff for %s", name, exc_info=True)
+
+        update = build_tool_start(tc_id, name, args, edit_diff=edit_diff)
        _send_update(conn, session_id, loop, update)

    return _tool_progress
@@ -168,6 +249,10 @@ def make_step_cb(
                        snapshot=meta.get("snapshot"),
                    )
                    _send_update(conn, session_id, loop, update)
+                    if tool_name == "todo":
+                        plan_update = _build_plan_update_from_todo_result(result)
+                        if plan_update is not None:
+                            _send_update(conn, session_id, loop, plan_update)
                    if not queue:
                        tool_call_ids.pop(tool_name, None)

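The todo-to-plan translation has two parts: tolerant JSON decoding (`json.JSONDecoder.raw_decode` stops at the end of the first JSON value, so trailing text is ignored) and a status collapse onto ACP's three plan states. The sketch uses plain dicts instead of `AgentPlanUpdate` / `PlanEntry`; `plan_entries` and `loads_maybe_prefix` are hypothetical names, and the trailing hint string is invented for the example.

```python
import json
from typing import Any, Dict, List, Optional

# Same status collapse as _build_plan_update_from_todo_result().
STATUS_MAP = {
    "pending": "pending",
    "in_progress": "in_progress",
    "completed": "completed",
    "cancelled": "completed",
}


def loads_maybe_prefix(value: str) -> Any:
    """Decode the leading JSON value, ignoring any trailing human hint."""
    text = value.strip()  # raw_decode() rejects leading whitespace
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        data, _end = json.JSONDecoder().raw_decode(text)
        return data


def plan_entries(result: str) -> Optional[List[Dict[str, str]]]:
    """Return plain-dict plan entries (hypothetical stand-in for PlanEntry)."""
    try:
        data = loads_maybe_prefix(result)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not isinstance(data.get("todos"), list):
        return None
    entries: List[Dict[str, str]] = []
    for item in data["todos"]:
        if not isinstance(item, dict):
            continue
        content = str(item.get("content") or item.get("id") or "").strip()
        if not content:
            continue
        raw_status = str(item.get("status") or "pending").strip()
        if raw_status == "cancelled":
            content = f"[cancelled] {content}"  # keep cancelled work visible
        entries.append({"content": content, "priority": "medium",
                        "status": STATUS_MAP.get(raw_status, "pending")})
    return entries


result = json.dumps({"todos": [
    {"id": "1", "content": "Write tests", "status": "in_progress"},
    {"id": "2", "content": "Drop legacy flag", "status": "cancelled"},
]}) + "\n\nHint: call todo again to update progress."
for entry in plan_entries(result):
    print(entry)
```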
--- a/acp_adapter/permissions.py
+++ b/acp_adapter/permissions.py
@@ -1,10 +1,11 @@
-"""ACP permission bridging — maps ACP approval requests to hermes approval callbacks."""
+"""ACP permission bridging for Hermes dangerous-command approvals."""

 from __future__ import annotations

 import asyncio
 import logging
 from concurrent.futures import TimeoutError as FutureTimeout
+from itertools import count
 from typing import Callable

 from acp.schema import (
@@ -14,24 +15,107 @@ from acp.schema import (

 logger = logging.getLogger(__name__)

-# Maps ACP PermissionOptionKind -> hermes approval result strings
-_KIND_TO_HERMES = {
+# Maps ACP permission option ids to Hermes approval result strings.
+# Option ids are stable across both the ``allow_permanent=True`` and
+# ``allow_permanent=False`` paths even though the option list differs.
+_OPTION_ID_TO_HERMES = {
    "allow_once": "once",
+    "allow_session": "session",
    "allow_always": "always",
-    "reject_once": "deny",
-    "reject_always": "deny",
+    "deny": "deny",
+    "deny_always": "deny",
 }

+_PERMISSION_REQUEST_IDS = count(1)
+
+
+def _permission_option_supports_kind(kind: str) -> bool:
+    """Return whether the installed ACP SDK accepts a permission option kind."""
+    try:
+        PermissionOption(option_id="__probe__", kind=kind, name="probe")
+    except Exception:
+        return False
+    return True
+
+
+def _build_permission_options(*, allow_permanent: bool) -> list[PermissionOption]:
+    """Return ACP options that match Hermes approval semantics."""
+    options = [
+        PermissionOption(option_id="allow_once", kind="allow_once", name="Allow once"),
+        PermissionOption(
+            option_id="allow_session",
+            # ACP has no session-scoped kind, so use the closest persistent
+            # hint while keeping Hermes semantics in the option id.
+            kind="allow_always",
+            name="Allow for session",
+        ),
+    ]
+    if allow_permanent:
+        options.append(
+            PermissionOption(
+                option_id="allow_always",
+                kind="allow_always",
+                name="Allow always",
+            ),
+        )
+    options.append(PermissionOption(option_id="deny", kind="reject_once", name="Deny"))
+    if _permission_option_supports_kind("reject_always"):
+        options.append(
+            PermissionOption(
+                option_id="deny_always",
+                kind="reject_always",
+                name="Deny always",
+            ),
+        )
+    return options
+
+
+def _build_permission_tool_call(command: str, description: str):
+    """Return the ACP tool-call update attached to a permission request.
+
+    ``request_permission`` expects a ``ToolCallUpdate`` payload — produced
+    by ``_acp.update_tool_call`` — not a ``ToolCallStart``. Each request
+    gets a unique ``perm-check-N`` id so concurrent requests don't collide.
+    """
+    import acp as _acp
+
+    tool_call_id = f"perm-check-{next(_PERMISSION_REQUEST_IDS)}"
+    title = f"{description}: {command}" if description else command
+    content_text = f"{description}\n$ {command}" if description else f"$ {command}"
+    return _acp.update_tool_call(
+        tool_call_id,
+        title=title,
+        kind="execute",
+        status="pending",
+        content=[_acp.tool_content(_acp.text_block(content_text))],
+        raw_input={"command": command, "description": description},
+    )
+
+
+def _map_outcome_to_hermes(outcome: object, *, allowed_option_ids: set[str]) -> str:
+    """Map an ACP permission outcome into Hermes approval strings."""
+    if not isinstance(outcome, AllowedOutcome):
+        return "deny"
+
+    option_id = outcome.option_id
+    if option_id not in allowed_option_ids:
+        logger.warning("Permission request returned unknown option_id: %s", option_id)
+        return "deny"
+    return _OPTION_ID_TO_HERMES.get(option_id, "deny")
+

 def make_approval_callback(
    request_permission_fn: Callable,
    loop: asyncio.AbstractEventLoop,
    session_id: str,
    timeout: float = 60.0,
-) -> Callable[[str, str], str]:
+) -> Callable[..., str]:
    """
-    Return a hermes-compatible ``approval_callback(command, description) -> str``
-    that bridges to the ACP client's ``request_permission`` call.
+    Return a Hermes-compatible approval callback that bridges to ACP.
+
+    The callback accepts ``command`` and ``description`` plus optional
+    keyword arguments such as ``allow_permanent`` used by
+    ``tools.approval.prompt_dangerous_approval()``.

    Args:
        request_permission_fn: The ACP connection's ``request_permission`` coroutine.
@@ -40,41 +124,45 @@ def make_approval_callback(
        timeout: Seconds to wait for a response before auto-denying.
    """

-    def _callback(command: str, description: str) -> str:
-        options = [
-            PermissionOption(option_id="allow_once", kind="allow_once", name="Allow once"),
-            PermissionOption(option_id="allow_always", kind="allow_always", name="Allow always"),
-            PermissionOption(option_id="deny", kind="reject_once", name="Deny"),
-        ]
-        import acp as _acp
+    def _callback(
+        command: str,
+        description: str,
+        *,
+        allow_permanent: bool = True,
+        **_: object,
+    ) -> str:
+        from agent.async_utils import safe_schedule_threadsafe

-        tool_call = _acp.start_tool_call("perm-check", command, kind="execute")
+        options = _build_permission_options(allow_permanent=allow_permanent)

+        tool_call = _build_permission_tool_call(command, description)
        coro = request_permission_fn(
            session_id=session_id,
            tool_call=tool_call,
            options=options,
        )
+        future = safe_schedule_threadsafe(
+            coro, loop,
+            logger=logger,
+            log_message="Permission request: failed to schedule on loop",
+        )
+        if future is None:
+            return "deny"

        try:
-            future = asyncio.run_coroutine_threadsafe(coro, loop)
            response = future.result(timeout=timeout)
        except (FutureTimeout, Exception) as exc:
+            future.cancel()
            logger.warning("Permission request timed out or failed: %s", exc)
            return "deny"

        if response is None:
            return "deny"

-        outcome = response.outcome
-        if isinstance(outcome, AllowedOutcome):
-            option_id = outcome.option_id
-            # Look up the kind from our options list
-            for opt in options:
-                if opt.option_id == option_id:
-                    return _KIND_TO_HERMES.get(opt.kind, "deny")
-            return "once"  # fallback for unknown option_id
-        else:
-            return "deny"
+        allowed_option_ids = {option.option_id for option in options}
+        return _map_outcome_to_hermes(
+            response.outcome,
+            allowed_option_ids=allowed_option_ids,
+        )

    return _callback
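`make_approval_callback()` runs on a worker thread but has to await a coroutine on the ACP event loop. A minimal sketch of that bridge using `asyncio.run_coroutine_threadsafe()` directly (the real code goes through `safe_schedule_threadsafe`); `fake_request_permission` and `approve_from_worker` are hypothetical, and the fake returns a bare option id instead of an ACP response object.

```python
import asyncio
import threading
from concurrent.futures import TimeoutError as FutureTimeout

# Subset of _OPTION_ID_TO_HERMES from acp_adapter/permissions.py.
OPTION_ID_TO_HERMES = {"allow_once": "once", "allow_session": "session", "deny": "deny"}


async def fake_request_permission(option_id: str, delay: float) -> str:
    """Hypothetical stand-in for the ACP client's request_permission()."""
    await asyncio.sleep(delay)
    return option_id


def approve_from_worker(loop: asyncio.AbstractEventLoop, option_id: str,
                        delay: float, timeout: float) -> str:
    """Block the calling thread on a coroutine scheduled onto the ACP loop."""
    future = asyncio.run_coroutine_threadsafe(
        fake_request_permission(option_id, delay), loop
    )
    try:
        selected = future.result(timeout=timeout)
    except FutureTimeout:
        future.cancel()  # also cancels the task on the loop
        return "deny"
    # Unknown option ids deny instead of silently allowing.
    return OPTION_ID_TO_HERMES.get(selected, "deny")


loop = asyncio.new_event_loop()
loop_thread = threading.Thread(target=loop.run_forever, daemon=True)
loop_thread.start()
try:
    results = [
        approve_from_worker(loop, "allow_session", delay=0.0, timeout=1.0),
        approve_from_worker(loop, "allow_once", delay=5.0, timeout=0.1),
        approve_from_worker(loop, "allow_forever", delay=0.0, timeout=1.0),
    ]
finally:
    loop.call_soon_threadsafe(loop.stop)
    loop_thread.join()
    loop.close()
print(results)  # ['session', 'deny', 'deny']
```

Cancelling the `concurrent.futures.Future` on timeout propagates to the scheduled task, so an abandoned prompt does not keep running after the callback has already answered `"deny"`.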
--- a/acp_adapter/server.py
+++ b/acp_adapter/server.py
@@ -3,6 +3,7 @@
 from __future__ import annotations

 import asyncio
+from datetime import datetime, timezone
 import base64
 import contextvars
 import json
@@ -18,6 +19,7 @@ import acp
 from acp.schema import (
    AgentCapabilities,
    AgentMessageChunk,
+    AgentThoughtChunk,
    AuthenticateResponse,
    AvailableCommand,
    AvailableCommandsUpdate,
@@ -45,7 +47,10 @@ from acp.schema import (
    ResourceContentBlock,
    SessionCapabilities,
    SessionForkCapabilities,
+    SessionInfoUpdate,
    SessionListCapabilities,
+    SessionMode,
+    SessionModeState,
    SessionModelState,
    SessionResumeCapabilities,
    SessionInfo,
@@ -57,14 +62,9 @@ from acp.schema import (
    UserMessageChunk,
 )

-# AuthMethodAgent was renamed from AuthMethod in agent-client-protocol 0.9.0
-try:
-    from acp.schema import AuthMethodAgent
-except ImportError:
-    from acp.schema import AuthMethod as AuthMethodAgent  # type: ignore[attr-defined]
-
-from acp_adapter.auth import detect_provider
+from acp_adapter.auth import TERMINAL_SETUP_AUTH_METHOD_ID, build_auth_methods, detect_provider
 from acp_adapter.events import (
+    _build_plan_update_from_todo_result,
    make_message_cb,
    make_step_cb,
    make_thinking_cb,
@@ -499,6 +499,20 @@ class HermesACPAgent(acp.Agent):
        },
    )

+    _EDIT_APPROVAL_POLICY_CONFIG_ID = "edit_approval_policy"
+    _EDIT_APPROVAL_POLICY_DEFAULT = "ask"
+    _MODE_DEFAULT = "default"
+    _MODE_ACCEPT_EDITS = "accept_edits"
+    _MODE_DONT_ASK = "dont_ask"
+    _MODE_TO_EDIT_APPROVAL_POLICY = {
+        _MODE_DEFAULT: "ask",
+        _MODE_ACCEPT_EDITS: "workspace_session",
+        _MODE_DONT_ASK: "session",
+    }
+    _EDIT_APPROVAL_POLICY_TO_MODE = {
+        value: key for key, value in _MODE_TO_EDIT_APPROVAL_POLICY.items()
+    }
+
    def __init__(self, session_manager: SessionManager | None = None):
        super().__init__()
        self.session_manager = session_manager or SessionManager()
@@ -511,6 +525,45 @@ class HermesACPAgent(acp.Agent):
        self._conn = conn
        logger.info("ACP client connected")

+
+    def _session_modes(self, state: SessionState) -> SessionModeState:
+        """Return ACP session modes while preserving Zed's separate model picker.
+
+        Zed renders ``config_options`` in the prominent selector slot that would
+        otherwise show the model picker. Claude/Codex expose policy-like controls
+        as ACP modes, which coexist with the model picker, so Hermes maps the
+        edit approval policy onto modes instead of advertising config options.
+        """
+
+        current = str(getattr(state, "mode", "") or self._MODE_DEFAULT)
+        if current not in self._MODE_TO_EDIT_APPROVAL_POLICY:
+            current = self._MODE_DEFAULT
+        return SessionModeState(
+            current_mode_id=current,
+            available_modes=[
+                SessionMode(
+                    id=self._MODE_DEFAULT,
+                    name="Default",
+                    description="Ask before edits.",
+                ),
+                SessionMode(
+                    id=self._MODE_ACCEPT_EDITS,
+                    name="Accept Edits",
+                    description="Auto-allow workspace and /tmp edits; still asks for sensitive paths.",
+                ),
+                SessionMode(
+                    id=self._MODE_DONT_ASK,
+                    name="Don't Ask",
+                    description="Auto-allow file edits for this session except sensitive paths.",
+                ),
+            ],
+        )
+
+    def _edit_approval_policy_for_state(self, state: SessionState) -> tuple[str, str | None]:
+        mode = str(getattr(state, "mode", "") or self._MODE_DEFAULT)
+        policy = self._MODE_TO_EDIT_APPROVAL_POLICY.get(mode, self._EDIT_APPROVAL_POLICY_DEFAULT)
+        return policy, state.cwd
+
    @staticmethod
    def _encode_model_choice(provider: str | None, model: str | None) -> str:
        """Encode a model selection so ACP clients can keep provider context."""
@@ -656,6 +709,37 @@ class HermesACPAgent(acp.Agent):
                exc_info=True,
            )

+    async def _send_session_info_update(self, session_id: str) -> None:
+        """Send ACP-native session metadata (title, updated_at) after Hermes changes it."""
+        if not self._conn:
+            return
+        try:
+            row = self.session_manager._get_db().get_session(session_id)
+        except Exception:
+            logger.debug("Could not read ACP session info for %s", session_id, exc_info=True)
+            return
+        if not row:
+            return
+
+        title = row.get("title")
+        # The `sessions` table does not have an `updated_at` column (see
+        # hermes_state.py schema — only started_at/ended_at). Use "now" as
+        # the updated_at since we're emitting this notification precisely
+        # because the title was just refreshed.
+        updated_at = datetime.now(timezone.utc).isoformat()
+        update = SessionInfoUpdate(
+            session_update="session_info_update",
+            title=title if isinstance(title, str) and title.strip() else None,
+            updated_at=updated_at,
+        )
+        try:
+            await self._conn.session_update(
+                session_id=session_id,
+                update=update,
+            )
+        except Exception:
+            logger.debug("Could not send ACP session info update for %s", session_id, exc_info=True)
+
    def _schedule_usage_update(self, state: SessionState) -> None:
        """Schedule native context indicator refresh after ACP responses."""
        if not self._conn:
@@ -744,16 +828,7 @@ class HermesACPAgent(acp.Agent):
        resolved_protocol_version = (
            protocol_version if isinstance(protocol_version, int) else acp.PROTOCOL_VERSION
        )
-        provider = detect_provider()
-        auth_methods = None
-        if provider:
-            auth_methods = [
-                AuthMethodAgent(
-                    id=provider,
-                    name=f"{provider} runtime credentials",
-                    description=f"Authenticate Hermes using the currently configured {provider} runtime credentials.",
-                )
-            ]
+        auth_methods = build_auth_methods()

        client_name = client_info.name if client_info else "unknown"
        logger.info(
@@ -784,24 +859,38 @@ class HermesACPAgent(acp.Agent):
        # server has provider credentials configured — harmless under
        # Hermes' threat model (ACP is stdio-only, local-trust), but poor
        # API hygiene and confusing if ACP ever grows multi-method auth.
-        provider = detect_provider()
-        if not provider:
+        if not isinstance(method_id, str):
            return None
-        if not isinstance(method_id, str) or method_id.strip().lower() != provider:
+        normalized_method = method_id.strip().lower()
+        provider = detect_provider()
+
+        if normalized_method == TERMINAL_SETUP_AUTH_METHOD_ID:
+            # Terminal auth launches Hermes setup/model selection out-of-band.
+            # Only report success once that flow has produced usable runtime
+            # credentials for the normal ACP session.
+            return AuthenticateResponse() if provider else None
+
+        if not provider or normalized_method != provider:
            return None
        return AuthenticateResponse()

    # ---- Session management -------------------------------------------------

    @staticmethod
-    def _history_message_text(message: dict[str, Any]) -> str:
-        """Extract displayable text from a persisted OpenAI-style message."""
-        content = message.get("content")
-        if isinstance(content, str):
-            return content.strip()
-        if isinstance(content, list):
+    def _flatten_history_text(value: Any) -> str:
+        """Normalize a persisted text-or-text-parts value into a single string.
+
+        OpenAI-style assistant content (and provider reasoning fields) can arrive
+        as either a scalar string or a list of ``{"text": ...}`` /
+        ``{"type": "text", "content": ...}`` parts. Whitespace-only inputs
+        collapse to an empty string so callers can treat ``""`` as "nothing to
+        emit".
+        """
+        if isinstance(value, str):
+            return value.strip()
+        if isinstance(value, list):
            parts: list[str] = []
-            for item in content:
+            for item in value:
                if isinstance(item, dict):
                    text = item.get("text")
                    if isinstance(text, str):
@@ -813,6 +902,29 @@ class HermesACPAgent(acp.Agent):
            return "\n".join(part.strip() for part in parts if part and part.strip()).strip()
        return ""

+    @classmethod
+    def _history_message_text(cls, message: dict[str, Any]) -> str:
+        """Extract displayable text from a persisted OpenAI-style message."""
+        return cls._flatten_history_text(message.get("content"))
+
+    @classmethod
+    def _history_reasoning_text(cls, message: dict[str, Any]) -> str:
+        """Extract displayable reasoning/thought text from a persisted assistant message.
+
+        Returns the first non-empty value among ``reasoning_content`` (the
+        canonical field used by DeepSeek / Moonshot and the post-#16892
+        chat-completions normalizer) and ``reasoning`` (used by the codex
+        event projector and several other transports). Both keys are
+        actively written by live code paths, so neither branch is
+        deprecated — they cover different transports rather than old vs.
+        new sessions.
+        """
+        for key in ("reasoning_content", "reasoning"):
+            text = cls._flatten_history_text(message.get(key))
+            if text:
+                return text
+        return ""
+
    @staticmethod
    def _history_message_update(
        *,
@@ -833,6 +945,11 @@ class HermesACPAgent(acp.Agent):
            )
        return None

+    @staticmethod
+    def _history_thought_update(text: str) -> AgentThoughtChunk:
+        """Build an ACP history replay update for an assistant thought."""
+        return acp.update_agent_thought_text(text)
+
    @staticmethod
    def _history_tool_call_name_args(tool_call: dict[str, Any]) -> tuple[str, dict[str, Any]]:
        """Extract function name/arguments from an OpenAI-style tool_call."""
@@ -860,13 +977,17 @@ class HermesACPAgent(acp.Agent):
        ).strip()

    async def _replay_session_history(self, state: SessionState) -> None:
-        """Send persisted user/assistant history to clients during session/load.
+        """Replay persisted user/assistant history during session/load or session/resume.

-        Zed's ACP history UI calls ``session/load`` after the user picks an item
-        from the Agents sidebar. The agent must then replay the full conversation
-        as user/assistant chunks plus reconstructed tool-call start/completion
-        notifications; merely restoring server-side state makes Hermes remember
-        context, but leaves the editor looking like a clean thread.
+        Invoked inline (``await``) from both ``load_session`` and
+        ``resume_session`` so that spec-compliant ACP clients receive the
+        full transcript within the request's lifetime — see the comment at
+        the call sites for the rationale and prior-art citations.
+
+        Replays the conversation as user/assistant chunks, thinking-mode
+        thought chunks, plus reconstructed tool-call start/completion
+        notifications. Merely restoring server-side state makes Hermes
+        remember context, but leaves the editor looking like a clean thread.
        """
        if not self._conn or not state.history:
            return
@@ -888,24 +1009,37 @@ class HermesACPAgent(acp.Agent):
        for message in state.history:
            role = str(message.get("role") or "")

-            if role in {"user", "assistant"}:
+            if role == "user":
+                text = self._history_message_text(message)
+                if text:
+                    update = self._history_message_update(role=role, text=text)
+                    if update is not None and not await _send(update):
+                        return
+                continue
+
+            if role == "assistant":
+                thought = self._history_reasoning_text(message)
+                if thought and not await _send(self._history_thought_update(thought)):
+                    return
+
                text = self._history_message_text(message)
                if text:
                    update = self._history_message_update(role=role, text=text)
                    if update is not None and not await _send(update):
                        return

-            if role == "assistant" and isinstance(message.get("tool_calls"), list):
-                for tool_call in message["tool_calls"]:
-                    if not isinstance(tool_call, dict):
-                        continue
-                    tool_call_id = self._history_tool_call_id(tool_call)
-                    if not tool_call_id:
-                        continue
-                    tool_name, args = self._history_tool_call_name_args(tool_call)
-                    active_tool_calls[tool_call_id] = (tool_name, args)
-                    if not await _send(build_tool_start(tool_call_id, tool_name, args)):
-                        return
+                tool_calls = message.get("tool_calls")
+                if isinstance(tool_calls, list):
+                    for tool_call in tool_calls:
+                        if not isinstance(tool_call, dict):
+                            continue
+                        tool_call_id = self._history_tool_call_id(tool_call)
+                        if not tool_call_id:
+                            continue
+                        tool_name, args = self._history_tool_call_name_args(tool_call)
+                        active_tool_calls[tool_call_id] = (tool_name, args)
+                        if not await _send(build_tool_start(tool_call_id, tool_name, args)):
+                            return
                continue

            if role == "tool":
@@ -917,15 +1051,20 @@ class HermesACPAgent(acp.Agent):
                if not tool_call_id or not tool_name:
                    continue
                result = message.get("content")
+                result_text = result if isinstance(result, str) else None
                if not await _send(
                    build_tool_complete(
                        tool_call_id,
                        tool_name,
-                        result=result if isinstance(result, str) else None,
+                        result=result_text,
                        function_args=function_args,
                    )
                ):
                    return
+                if tool_name == "todo":
+                    plan_update = _build_plan_update_from_todo_result(result_text)
+                    if plan_update is not None and not await _send(plan_update):
+                        return

    async def new_session(
        self,
@@ -941,20 +1080,9 @@ class HermesACPAgent(acp.Agent):
        return NewSessionResponse(
            session_id=state.session_id,
            models=self._build_model_state(state),
+            modes=self._session_modes(state),
        )

-    def _schedule_history_replay(self, state: SessionState) -> None:
-        """Replay persisted history after session/load or session/resume returns.
-
-        Zed only attaches streamed transcript/tool updates once the load/resume
-        response has completed. Sending replay notifications while the request is
-        still in-flight can make the server look correct in logs while the editor
-        drops or fails to attach the tool-call history.
-        """
-        loop = asyncio.get_running_loop()
-        replay_coro = self._replay_session_history(state)
-        loop.call_soon(asyncio.create_task, replay_coro)
-
    async def load_session(
        self,
        cwd: str,
@@ -968,10 +1096,36 @@ class HermesACPAgent(acp.Agent):
            return None
        await self._register_session_mcp_servers(state, mcp_servers)
        logger.info("Loaded session %s", session_id)
-        self._schedule_history_replay(state)
+        # Per ACP spec, `session/load` must stream the prior conversation back
+        # to the client via `session/update` notifications BEFORE responding,
+        # so the client receives the full transcript within the load request's
+        # lifetime. Awaiting the replay here matches Codex / Claude Code /
+        # OpenCode / Pi and the Zed client (which registers the session-update
+        # routing entry before awaiting the loadSession RPC specifically so
+        # in-call history replay updates can find the thread). Deferring this
+        # via `loop.call_soon` (as we did briefly in May 2026) broke every
+        # spec-compliant ACP client that expects history notifications to arrive
+        # before the load response (see #12285 follow-up).
+        try:
+            await self._replay_session_history(state)
+        except Exception:
+            # Replay is best-effort — a corrupted or unexpected message shape
+            # must not turn a successful session/load into a JSON-RPC error
+            # response. Per-notification failures are already caught inside
+            # ``_replay_session_history``; this outer guard covers anything
+            # raised by the helpers themselves before reaching ``_send``.
+            logger.warning(
+                "ACP history replay raised during session/load for %s — "
+                "load will still succeed, partial transcript may be missing",
+                session_id,
+                exc_info=True,
+            )
        self._schedule_available_commands_update(session_id)
        self._schedule_usage_update(state)
-        return LoadSessionResponse(models=self._build_model_state(state))
+        return LoadSessionResponse(
+            models=self._build_model_state(state),
+            modes=self._session_modes(state),
+        )

    async def resume_session(
        self,
@@ -986,10 +1140,24 @@ class HermesACPAgent(acp.Agent):
            state = self.session_manager.create_session(cwd=cwd)
        await self._register_session_mcp_servers(state, mcp_servers)
        logger.info("Resumed session %s", state.session_id)
-        self._schedule_history_replay(state)
+        # See `load_session` above for the spec rationale — replay must
+        # complete before the response so clients receive the full transcript
+        # within the request's lifetime.
+        try:
+            await self._replay_session_history(state)
+        except Exception:
+            logger.warning(
+                "ACP history replay raised during session/resume for %s — "
+                "resume will still succeed, partial transcript may be missing",
+                state.session_id,
+                exc_info=True,
+            )
        self._schedule_available_commands_update(state.session_id)
        self._schedule_usage_update(state)
-        return ResumeSessionResponse(models=self._build_model_state(state))
+        return ResumeSessionResponse(
+            models=self._build_model_state(state),
+            modes=self._session_modes(state),
+        )

    async def cancel(self, session_id: str, **kwargs: Any) -> None:
        state = self.session_manager.get_session(session_id)
@@ -1019,7 +1187,11 @@ class HermesACPAgent(acp.Agent):
        logger.info("Forked session %s -> %s", session_id, new_id)
        if new_id:
            self._schedule_available_commands_update(new_id)
-        return ForkSessionResponse(session_id=new_id)
+        return ForkSessionResponse(
+            session_id=new_id,
+            models=self._build_model_state(state) if state is not None else None,
+            modes=self._session_modes(state) if state is not None else None,
+        )

    async def list_sessions(
        self,
@@ -1170,11 +1342,19 @@ class HermesACPAgent(acp.Agent):
        tool_call_ids: dict[str, Deque[str]] = defaultdict(deque)
        tool_call_meta: dict[str, dict[str, Any]] = {}
        previous_approval_cb = None
+        edit_approval_requester = None

        streamed_message = False

        if conn:
-            tool_progress_cb = make_tool_progress_cb(conn, session_id, loop, tool_call_ids, tool_call_meta)
+            tool_progress_cb = make_tool_progress_cb(
+                conn,
+                session_id,
+                loop,
+                tool_call_ids,
+                tool_call_meta,
+                edit_approval_policy_getter=lambda: self._edit_approval_policy_for_state(state),
+            )
            reasoning_cb = make_thinking_cb(conn, session_id, loop)
            step_cb = make_step_cb(conn, session_id, loop, tool_call_ids, tool_call_meta)
            message_cb = make_message_cb(conn, session_id, loop)
@@ -1186,6 +1366,17 @@ class HermesACPAgent(acp.Agent):
                message_cb(text)

            approval_cb = make_approval_callback(conn.request_permission, loop, session_id)
+            try:
+                from acp_adapter.edit_approval import make_acp_edit_approval_requester
+
+                edit_approval_requester = make_acp_edit_approval_requester(
+                    conn.request_permission,
+                    loop,
+                    session_id,
+                    auto_approve_getter=lambda: self._edit_approval_policy_for_state(state),
+                )
+            except Exception:
+                logger.debug("Could not create ACP edit approval requester", exc_info=True)
        else:
            tool_progress_cb = None
            reasoning_cb = None
@@ -1215,9 +1406,11 @@ class HermesACPAgent(acp.Agent):
        # which requires a notify_cb registered in _gateway_notify_cbs.
        previous_approval_cb = None
        previous_interactive = None
+        edit_approval_token = None
+        previous_session_id = None

        def _run_agent() -> dict:
-            nonlocal previous_approval_cb, previous_interactive
+            nonlocal previous_approval_cb, previous_interactive, edit_approval_token, previous_session_id
            # Bind HERMES_SESSION_KEY for this session so per-session caches
            # (e.g. the interactive sudo password cache in tools.terminal_tool)
            # scope to the ACP session rather than leaking across sessions
@@ -1241,10 +1434,24 @@ class HermesACPAgent(acp.Agent):
                    _terminal_tool.set_approval_callback(approval_cb)
                except Exception:
                    logger.debug("Could not set ACP approval callback", exc_info=True)
+            if edit_approval_requester:
+                try:
+                    from acp_adapter.edit_approval import set_edit_approval_requester
+
+                    edit_approval_token = set_edit_approval_requester(edit_approval_requester)
+                except Exception:
+                    logger.debug("Could not set ACP edit approval requester", exc_info=True)
            # Signal to tools.approval that we have an interactive callback
            # and the non-interactive auto-approve path must not fire.
            previous_interactive = os.environ.get("HERMES_INTERACTIVE")
            os.environ["HERMES_INTERACTIVE"] = "1"
+            # Propagate the originating ACP session id to tools that want to
+            # tag side-effects with it (e.g. ``kanban_create`` stamps it on
+            # the new task so clients can render a per-session board). Save
+            # and restore around the agent call so a re-used executor thread
+            # never leaks one session's id into the next session's tools.
+            previous_session_id = os.environ.get("HERMES_SESSION_ID")
+            os.environ["HERMES_SESSION_ID"] = session_id
            try:
                result = agent.run_conversation(
                    user_message=user_content,
@@ -1262,12 +1469,24 @@ class HermesACPAgent(acp.Agent):
                    os.environ.pop("HERMES_INTERACTIVE", None)
                else:
                    os.environ["HERMES_INTERACTIVE"] = previous_interactive
+                # Restore HERMES_SESSION_ID symmetrically.
+                if previous_session_id is None:
+                    os.environ.pop("HERMES_SESSION_ID", None)
+                else:
+                    os.environ["HERMES_SESSION_ID"] = previous_session_id
                if approval_cb:
                    try:
                        from tools import terminal_tool as _terminal_tool
                        _terminal_tool.set_approval_callback(previous_approval_cb)
                    except Exception:
                        logger.debug("Could not restore approval callback", exc_info=True)
+                if edit_approval_token is not None:
+                    try:
+                        from acp_adapter.edit_approval import reset_edit_approval_requester
+
+                        reset_edit_approval_requester(edit_approval_token)
+                    except Exception:
+                        logger.debug("Could not restore ACP edit approval requester", exc_info=True)
                if session_tokens is not None and clear_session_vars is not None:
                    try:
                        clear_session_vars(session_tokens)
@@ -1298,12 +1517,20 @@ class HermesACPAgent(acp.Agent):
            try:
                from agent.title_generator import maybe_auto_title

+                def _notify_title_update(_title: str) -> None:
+                    if conn:
+                        loop.call_soon_threadsafe(
+                            asyncio.create_task,
+                            self._send_session_info_update(session_id),
+                        )
+
                maybe_auto_title(
                    self.session_manager._get_db(),
                    session_id,
                    user_text,
                    final_response,
                    state.history,
+                    title_callback=_notify_title_update,
                )
            except Exception:
                logger.debug("Failed to auto-title ACP session %s", session_id, exc_info=True)
@@ -1690,9 +1917,12 @@ class HermesACPAgent(acp.Agent):
        if state is None:
            logger.warning("Session %s: mode switch requested for missing session", session_id)
            return None
-        setattr(state, "mode", mode_id)
+        normalized_mode = str(mode_id or "").strip()
+        if normalized_mode not in self._MODE_TO_EDIT_APPROVAL_POLICY:
+            normalized_mode = self._MODE_DEFAULT
+        setattr(state, "mode", normalized_mode)
        self.session_manager.save_session(session_id)
-        logger.info("Session %s: mode switched to %s", session_id, mode_id)
+        logger.info("Session %s: mode switched to %s", session_id, normalized_mode)
        return SetSessionModeResponse()

    async def set_config_option(
@@ -1704,11 +1934,15 @@ class HermesACPAgent(acp.Agent):
            logger.warning("Session %s: config update requested for missing session", session_id)
            return None

-        options = getattr(state, "config_options", None)
-        if not isinstance(options, dict):
-            options = {}
-        options[str(config_id)] = value
-        setattr(state, "config_options", options)
+        if str(config_id) == self._EDIT_APPROVAL_POLICY_CONFIG_ID:
+            mode = self._EDIT_APPROVAL_POLICY_TO_MODE.get(str(value), self._MODE_DEFAULT)
+            setattr(state, "mode", mode)
+        else:
+            options = getattr(state, "config_options", None)
+            if not isinstance(options, dict):
+                options = {}
+            options[str(config_id)] = value
+            setattr(state, "config_options", options)
        self.session_manager.save_session(session_id)
        logger.info("Session %s: config option %s updated", session_id, config_id)
        return SetSessionConfigOptionResponse(config_options=[])
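The `HermesACPAgent` changes above route edit approval through ACP session modes. `set_session_mode` normalizes unknown ids to `default`, `set_config_option` maps a legacy `edit_approval_policy` value back onto a mode, and the tool callbacks read the policy from the current mode. The sketch below shows that round trip in standalone form. It uses plain strings in place of `SessionState`, and the names `MODE_TO_POLICY`, `POLICY_TO_MODE`, `normalize_mode`, `policy_for_mode` and `mode_for_policy` are hypothetical. None of them are part of the patch.

```python
from typing import Dict, Optional

# Hypothetical standalone mirror of the mode constants added to HermesACPAgent.
MODE_DEFAULT = "default"
MODE_TO_POLICY: Dict[str, str] = {
    "default": "ask",
    "accept_edits": "workspace_session",
    "dont_ask": "session",
}
# Reverse map, used when a client sets the "edit_approval_policy" config option.
POLICY_TO_MODE: Dict[str, str] = {policy: mode for mode, policy in MODE_TO_POLICY.items()}


def normalize_mode(mode_id: Optional[str]) -> str:
    """Strip the requested mode id and fall back to the default mode if it is unknown."""
    mode = str(mode_id or "").strip()
    return mode if mode in MODE_TO_POLICY else MODE_DEFAULT


def policy_for_mode(mode_id: Optional[str]) -> str:
    """Return the edit-approval policy that tool callbacks would consult for a mode."""
    return MODE_TO_POLICY[normalize_mode(mode_id)]


def mode_for_policy(policy: object) -> str:
    """Map a config-option policy value back onto a session mode (unknown -> default)."""
    return POLICY_TO_MODE.get(str(policy), MODE_DEFAULT)
```

Both maps come from one table, so an unrecognized value degrades to the most cautious setting instead of raising. For example, `mode_for_policy("typo")` yields `default`, whose policy is `ask`.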
--- a/acp_adapter/session.py
+++ b/acp_adapter/session.py
@@ -601,6 +601,7 @@ class SessionManager:
            ),
            "quiet_mode": True,
            "session_id": session_id,
+            "session_db": self._get_db(),
            "model": model or default_model,
        }

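The `acp_adapter/tools.py` hunk that follows adds `_tool_result_failed`. It reports ACP failed status only for structured failures or for the executor's `Error executing tool '` wrapper, so free text that merely mentions "error" is not marked as failed. The sketch below follows the same decision order in standalone form and assumes results arrive as JSON strings. `POLISHED_TOOLS` is a hypothetical placeholder set. `_loads_maybe` is an assumed stand-in for the adapter's `_json_loads_maybe`, whose exact behavior is not shown here.

```python
import json
from typing import Any, Optional

# Hypothetical placeholder for the adapter's _POLISHED_TOOLS set.
POLISHED_TOOLS = frozenset({"read_file", "write_file", "terminal"})
EXECUTOR_ERROR_PREFIX = "Error executing tool '"


def _loads_maybe(value: Optional[str]) -> Any:
    """Parse JSON if possible; return None for missing or non-JSON input (assumed behavior)."""
    if not isinstance(value, str):
        return None
    try:
        return json.loads(value)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return None


def tool_result_failed(result: Optional[str], tool_name: Optional[str] = None) -> bool:
    """Conservatively decide whether a tool result should be reported as failed."""
    # The executor's exception wrapper is the only plain-text signal trusted.
    if isinstance(result, str) and result.startswith(EXECUTOR_ERROR_PREFIX):
        return True
    data = _loads_maybe(result)
    if not isinstance(data, dict):
        return False  # free-form text that mentions "error" is not a failure
    if data.get("success") is False or data.get("ok") is False:
        return True
    exit_code = data.get("exit_code", data.get("returncode"))
    if isinstance(exit_code, int) and exit_code != 0:
        return True
    # A bare {"error": ...} payload only counts for known tools without content.
    return bool(tool_name in POLISHED_TOOLS and data.get("error") and not data.get("content"))
```

The design choice is to prefer false negatives over false positives. A plugin tool that returns `{"error": "..."}` stays green unless it also sets `success: false` or a non-zero exit code.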
--- a/acp_adapter/tools.py
+++ b/acp_adapter/tools.py
@@ -202,6 +202,44 @@ def _json_loads_maybe(value: Optional[str]) -> Any:
        return None


+def _tool_result_failed(result: Optional[str], tool_name: str | None = None) -> bool:
+    """Return True when a structured Hermes tool result clearly failed.
+
+    Keep this deliberately conservative: plain text can mention "error" when tests
+    fail or a command prints diagnostics, so only structured tool-level failures
+    (and the executor's exception wrapper) are reported to Zed as failed.
+    """
+    # Raised exceptions from the agent's tool executor get wrapped in a
+    # canonical "Error executing tool '<name>': ..." prefix (see
+    # agent/tool_executor.py around the try/except). That prefix is uniquely
+    # produced by the wrapper itself — it cannot legitimately appear in
+    # well-behaved tool output. Catch it so a tool that blew up shows as
+    # failed in Zed instead of misleadingly green.
+    if isinstance(result, str) and result.startswith("Error executing tool '"):
+        return True
+
+    data = _json_loads_maybe(result)
+    if not isinstance(data, dict):
+        return False
+
+    for key in ("success", "ok"):
+        if data.get(key) is False:
+            return True
+
+    exit_code = data.get("exit_code", data.get("returncode"))
+    if isinstance(exit_code, int) and exit_code != 0:
+        return True
+
+    # Hermes core/polished tools commonly report tool-level failures as a
+    # structured {"error": "..."} payload without an explicit success flag.
+    # Keep generic plugin/unknown tool payloads conservative to avoid marking
+    # optional diagnostic messages as failed.
+    if tool_name in _POLISHED_TOOLS and data.get("error") and not data.get("content"):
+        return True
+
+    return False
+
+
 def _truncate_text(text: str, limit: int = 5000) -> str:
    if len(text) <= limit:
        return text
@@ -278,6 +316,26 @@ def _format_search_files_result(result: Optional[str]) -> Optional[str]:
    data = _json_loads_maybe(result)
    if not isinstance(data, dict):
        return None
+
+    files = data.get("files")
+    if isinstance(files, list):
+        total = data.get("total_count", len(files))
+        shown = min(len(files), 20)
+        truncated = bool(data.get("truncated")) or len(files) > shown
+        lines = [
+            "File search results",
+            f"Found {total} file{'s' if total != 1 else ''}; showing {shown}.",
+            "",
+        ]
+        for path in files[:shown]:
+            lines.append(f"- {path}")
+        if truncated:
+            lines.extend([
+                "",
+                "Results truncated. Narrow the search, add path/file_glob, or use offset to page.",
+            ])
+        return _truncate_text("\n".join(lines), limit=7000)
+
    matches = data.get("matches")
    if not isinstance(matches, list):
        return None
@@ -668,14 +726,114 @@ def _format_media_or_cron_result(tool_name: str, result: Optional[str]) -> Optio
    return "\n".join(lines)


-def _format_generic_structured_result(tool_name: str, result: Optional[str]) -> Optional[str]:
+def _format_structured_value(
+    key: str,
+    value: Any,
+    *,
+    indent: int = 0,
+    max_depth: int = 3,
+    max_items: int = 8,
+) -> List[str]:
+    """Render nested JSON-ish values as compact Markdown bullets, not inline blobs."""
+    prefix = "  " * indent
+    bullet = f"{prefix}- "
+    label = f"**{key}:**" if key else ""
+
+    if value in (None, "", [], {}):
+        return []
+
+    if max_depth <= 0:
+        if isinstance(value, (dict, list)):
+            preview = json.dumps(value, ensure_ascii=False, default=str)
+        else:
+            preview = str(value)
+        return [f"{bullet}{label} {_truncate_text(preview, limit=240)}" if label else f"{bullet}{_truncate_text(preview, limit=240)}"]
+
+    if isinstance(value, dict):
+        lines = [f"{bullet}{label}" if label else f"{bullet}{len(value)} fields"]
+        shown = 0
+        for child_key, child_value in value.items():
+            if child_value in (None, "", [], {}):
+                continue
+            lines.extend(
+                _format_structured_value(
+                    str(child_key),
+                    child_value,
+                    indent=indent + 1,
+                    max_depth=max_depth - 1,
+                    max_items=max_items,
+                )
+            )
+            shown += 1
+            if shown >= max_items:
+                remaining = max(0, len(value) - shown)
+                if remaining:
+                    lines.append(f"{'  ' * (indent + 1)}- ... {remaining} more fields")
+                break
+        return lines
+
+    if isinstance(value, list):
+        lines = [f"{bullet}{label} {len(value)} item{'s' if len(value) != 1 else ''}" if label else f"{bullet}{len(value)} item{'s' if len(value) != 1 else ''}"]
+        for idx, item in enumerate(value[:max_items], 1):
+            if isinstance(item, dict):
+                headline = str(item.get("content") or item.get("message") or item.get("title") or item.get("name") or item.get("id") or "").strip()
+                if headline:
+                    lines.append(f"{'  ' * (indent + 1)}{idx}. {_truncate_text(headline, limit=220)}")
+                    for child_key in ("id", "status", "type", "scope", "quality_score", "score", "path", "url"):
+                        child_value = item.get(child_key)
+                        if child_value not in (None, "", [], {}):
+                            lines.append(f"{'  ' * (indent + 2)}- **{child_key}:** {_truncate_text(str(child_value), limit=180)}")
+                else:
+                    lines.append(f"{'  ' * (indent + 1)}{idx}.")
+                    for child_key, child_value in list(item.items())[:max_items]:
+                        lines.extend(
+                            _format_structured_value(
+                                str(child_key),
+                                child_value,
+                                indent=indent + 2,
+                                max_depth=max_depth - 1,
+                                max_items=max_items,
+                            )
+                        )
+            elif isinstance(item, list):
+                lines.append(f"{'  ' * (indent + 1)}{idx}. {len(item)} items")
+                for nested in item[:max_items]:
+                    lines.extend(
+                        _format_structured_value(
+                            "",
+                            nested,
+                            indent=indent + 2,
+                            max_depth=max_depth - 1,
+                            max_items=max_items,
+                        )
+                    )
+            else:
+                lines.append(f"{'  ' * (indent + 1)}{idx}. {_truncate_text(str(item), limit=240)}")
+        if len(value) > max_items:
+            lines.append(f"{'  ' * (indent + 1)}... {len(value) - max_items} more items")
+        return lines
+
+    return [f"{bullet}{label} {_truncate_text(str(value), limit=500)}" if label else f"{bullet}{_truncate_text(str(value), limit=500)}"]
+
+
+def _format_generic_structured_result(
+    tool_name: str,
+    result: Optional[str],
+    *,
+    fallback_to_text: bool = True,
+) -> Optional[str]:
    data = _json_loads_maybe(result)
    if not isinstance(data, (dict, list)):
-        return result if isinstance(result, str) and result.strip() else None
+        return result if fallback_to_text and isinstance(result, str) and result.strip() else None
    if isinstance(data, list):
        lines = [f"{tool_name}: {len(data)} item{'s' if len(data) != 1 else ''}"]
        for item in data[:12]:
-            lines.append(f"- {_truncate_text(str(item), limit=240)}")
+            if isinstance(item, (dict, list)):
+                lines.extend(_format_structured_value("", item, indent=0, max_depth=2, max_items=6))
+            else:
+                lines.append(f"- {_truncate_text(str(item), limit=240)}")
+        if len(data) > 12:
+            lines.append(f"... {len(data) - 12} more items")
        return _truncate_text("\n".join(lines), limit=5000)

    if data.get("success") is False or data.get("error"):
@@ -699,12 +857,9 @@ def _format_generic_structured_result(tool_name: str, result: Optional[str]) ->
            continue
        if value in (None, "", [], {}):
            continue
-        if isinstance(value, (dict, list)):
-            preview = json.dumps(value, ensure_ascii=False, default=str)
-        else:
-            preview = str(value)
-        lines.append(f"- **{key}:** {_truncate_text(preview, limit=500)}")
-        if len(lines) >= 14:
+        lines.extend(_format_structured_value(str(key), value, indent=0, max_depth=3, max_items=8))
+        if len(lines) >= 40:
+            lines.append("- ... more fields truncated")
            break

    content = data.get("content")
@@ -744,8 +899,9 @@ def _build_polished_completion_content(
    if formatter is None and tool_name in _POLISHED_TOOLS:
        formatter = lambda: _format_generic_structured_result(tool_name, result)
    if formatter is None:
-        return None
-    text = formatter()
+        text = _format_generic_structured_result(tool_name, result, fallback_to_text=False)
+    else:
+        text = formatter()
    if not text:
        return None
    return [_text(text)]
@@ -769,8 +925,8 @@ def _build_patch_mode_content(patch_text: str) -> List[Any]:
                old_chunks: list[str] = []
                new_chunks: list[str] = []
                for hunk in op.hunks:
-                    old_lines = [line.content for line in hunk.lines if line.prefix in (" ", "-")]
-                    new_lines = [line.content for line in hunk.lines if line.prefix in (" ", "+")]
+                    old_lines = [line.content for line in hunk.lines if line.prefix in {" ", "-"}]
+                    new_lines = [line.content for line in hunk.lines if line.prefix in {" ", "+"}]
                    if old_lines or new_lines:
                        old_chunks.append("\n".join(old_lines))
                        new_chunks.append("\n".join(new_lines))
@@ -895,7 +1051,7 @@ def _build_tool_complete_content(
    if len(display_result) > 5000:
        display_result = display_result[:4900] + f"\n... ({len(result)} chars total, truncated)"

-    if tool_name in {"write_file", "patch", "skill_manage"}:
+    if tool_name == "skill_manage":
        try:
            from agent.display import extract_edit_diff

@@ -928,6 +1084,8 @@ def build_tool_start(
    tool_call_id: str,
    tool_name: str,
    arguments: Dict[str, Any],
+    *,
+    edit_diff: Any = None,
 ) -> ToolCallStart:
    """Create a ToolCallStart event for the given hermes tool invocation."""
    kind = get_tool_kind(tool_name)
@@ -935,23 +1093,34 @@ def build_tool_start(
    locations = extract_locations(arguments)

    if tool_name == "patch":
-        mode = arguments.get("mode", "replace")
-        if mode == "replace":
-            path = arguments.get("path", "")
-            old = arguments.get("old_string", "")
-            new = arguments.get("new_string", "")
-            content = [acp.tool_diff_content(path=path, new_text=new, old_text=old)]
+        if edit_diff is not None:
+            content = [
+                acp.tool_diff_content(
+                    path=edit_diff.path,
+                    old_text=edit_diff.old_text,
+                    new_text=edit_diff.new_text,
+                )
+            ]
        else:
-            patch_text = arguments.get("patch", "")
-            content = _build_patch_mode_content(patch_text)
+            mode = arguments.get("mode", "replace")
+            path = arguments.get("path") or "patch input"
+            content = [_text(f"Preparing {mode} edit for {path}. Approval prompt shows the diff.")]
        return acp.start_tool_call(
            tool_call_id, title, kind=kind, content=content, locations=locations,
        )

    if tool_name == "write_file":
-        path = arguments.get("path", "")
-        file_content = arguments.get("content", "")
-        content = [acp.tool_diff_content(path=path, new_text=file_content)]
+        if edit_diff is not None:
+            content = [
+                acp.tool_diff_content(
+                    path=edit_diff.path,
+                    old_text=edit_diff.old_text,
+                    new_text=edit_diff.new_text,
+                )
+            ]
+        else:
+            path = arguments.get("path", "")
+            content = [_text(f"Preparing write to {path}. Approval prompt shows the diff." if path else "Preparing file write. Approval prompt shows the diff.")]
        return acp.start_tool_call(
            tool_call_id, title, kind=kind, content=content, locations=locations,
        )
@@ -1122,8 +1291,12 @@ def build_tool_start(
            tool_call_id, title, kind=kind, content=content, locations=locations,
        )

+    if not arguments:
+        return acp.start_tool_call(
+            tool_call_id, title, kind=kind, content=None, locations=locations, raw_input=None,
+        )
+
    # Generic fallback
-    import json
    try:
        args_text = json.dumps(arguments, indent=2, default=str)
    except (TypeError, ValueError):
@@ -1135,6 +1308,10 @@ def build_tool_start(
    )


+def _is_structured_json_result(result: Optional[str]) -> bool:
+    return isinstance(_json_loads_maybe(result), (dict, list))
+
+
 def build_tool_complete(
    tool_call_id: str,
    tool_name: str,
@@ -1157,9 +1334,9 @@ def build_tool_complete(
    return acp.update_tool_call(
        tool_call_id,
        kind=kind,
-        status="completed",
+        status="failed" if _tool_result_failed(result, tool_name) else "completed",
        content=content,
-        raw_output=None if tool_name in _POLISHED_TOOLS else result,
+        raw_output=None if tool_name in _POLISHED_TOOLS or _is_structured_json_result(result) else result,
    )


--- a/acp_registry/agent.json
+++ b/acp_registry/agent.json
@@ -1,12 +1,16 @@
 {
-  "schema_version": 1,
-  "name": "hermes-agent",
-  "display_name": "Hermes Agent",
-  "description": "AI agent by Nous Research with 90+ tools, persistent memory, and multi-platform support",
-  "icon": "icon.svg",
+  "id": "hermes-agent",
+  "name": "Hermes Agent",
+  "version": "0.14.0",
+  "description": "Self-improving open-source AI agent by Nous Research with ACP editor integration, persistent memory, skills, and rich tool support.",
+  "repository": "https://github.com/NousResearch/hermes-agent",
+  "website": "https://hermes-agent.nousresearch.com/docs/user-guide/features/acp",
+  "authors": ["Nous Research"],
+  "license": "MIT",
  "distribution": {
-    "type": "command",
-    "command": "hermes",
-    "args": ["acp"]
+    "uvx": {
+      "package": "hermes-agent[acp]==0.14.0",
+      "args": ["hermes-acp"]
+    }
  }
 }
--- a/acp_registry/icon.svg
+++ b/acp_registry/icon.svg
@@ -1,25 +1,8 @@
-<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 64 64" width="64" height="64">
-  <defs>
-    <linearGradient id="gold" x1="0%" y1="0%" x2="0%" y2="100%">
-      <stop offset="0%" style="stop-color:#F5C542;stop-opacity:1" />
-      <stop offset="100%" style="stop-color:#D4961C;stop-opacity:1" />
-    </linearGradient>
-  </defs>
-  <!-- Staff -->
-  <rect x="30" y="10" width="4" height="46" rx="2" fill="url(#gold)" />
-  <!-- Wings (left) -->
-  <path d="M30 18 C24 14, 14 14, 10 18 C14 16, 22 16, 28 20" fill="#F5C542" opacity="0.9" />
-  <path d="M30 22 C26 19, 18 19, 14 22 C18 20, 24 20, 28 24" fill="#D4961C" opacity="0.8" />
-  <!-- Wings (right) -->
-  <path d="M34 18 C40 14, 50 14, 54 18 C50 16, 42 16, 36 20" fill="#F5C542" opacity="0.9" />
-  <path d="M34 22 C38 19, 46 19, 50 22 C46 20, 40 20, 36 24" fill="#D4961C" opacity="0.8" />
-  <!-- Left serpent -->
-  <path d="M32 48 C22 44, 20 38, 26 34 C20 36, 18 42, 24 46 C18 40, 22 30, 30 28 C24 32, 22 38, 28 42"
-        fill="none" stroke="#F5C542" stroke-width="2.5" stroke-linecap="round" />
-  <!-- Right serpent -->
-  <path d="M32 48 C42 44, 44 38, 38 34 C44 36, 46 42, 40 46 C46 40, 42 30, 34 28 C40 32, 42 38, 36 42"
-        fill="none" stroke="#D4961C" stroke-width="2.5" stroke-linecap="round" />
-  <!-- Orb at top -->
-  <circle cx="32" cy="10" r="4" fill="#F5C542" />
-  <circle cx="32" cy="10" r="2" fill="#FFF8E1" opacity="0.7" />
+<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16" width="16" height="16" fill="none">
+  <path d="M8 1.5v13" stroke="currentColor" stroke-width="1.5" stroke-linecap="round"/>
+  <path d="M8 3.25c-2.35-1.4-4.7-.95-6.25.35 1.85-.2 3.8.2 5.55 1.55" stroke="currentColor" stroke-width="1.1" stroke-linecap="round" stroke-linejoin="round"/>
+  <path d="M8 3.25c2.35-1.4 4.7-.95 6.25.35-1.85-.2-3.8.2-5.55 1.55" stroke="currentColor" stroke-width="1.1" stroke-linecap="round" stroke-linejoin="round"/>
+  <path d="M8 13.25c-2.3-1-3.05-2.65-1.35-4.15-2 .8-2.35 2.95-.35 4" stroke="currentColor" stroke-width="1.1" stroke-linecap="round" stroke-linejoin="round"/>
+  <path d="M8 13.25c2.3-1 3.05-2.65 1.35-4.15 2 .8 2.35 2.95.35 4" stroke="currentColor" stroke-width="1.1" stroke-linecap="round" stroke-linejoin="round"/>
+  <circle cx="8" cy="1.8" r="1.1" fill="currentColor"/>
 </svg>
--- a/agent/account_usage.py
+++ b/agent/account_usage.py
@@ -47,7 +47,7 @@ def _title_case_slug(value: Optional[str]) -> Optional[str]:


 def _parse_dt(value: Any) -> Optional[datetime]:
-    if value in (None, ""):
+    if value is None or value == "":
        return None
    if isinstance(value, (int, float)):
        return datetime.fromtimestamp(float(value), tz=timezone.utc)
--- a/agent/agent_init.py
+++ b/agent/agent_init.py
--- a/agent/agent_runtime_helpers.py
+++ b/agent/agent_runtime_helpers.py
--- a/agent/anthropic_adapter.py
+++ b/agent/anthropic_adapter.py
@@ -17,6 +17,7 @@ import os
 import platform
 import subprocess
 from pathlib import Path
+from urllib.parse import urlparse

 from hermes_constants import get_hermes_home
 from typing import Any, Dict, List, Optional, Tuple
@@ -35,6 +36,14 @@ def _get_anthropic_sdk():
    """Return the ``anthropic`` SDK module, importing lazily. None if not installed."""
    global _anthropic_sdk
    if _anthropic_sdk is ...:
+        try:
+            from tools.lazy_deps import ensure as _lazy_ensure
+            _lazy_ensure("provider.anthropic", prompt=False)
+        except ImportError:
+            pass
+        except Exception:
+            # FeatureUnavailable — fall through to ImportError handling below
+            pass
        try:
            import anthropic as _sdk
            _anthropic_sdk = _sdk
@@ -356,7 +365,7 @@ def _normalize_base_url_text(base_url) -> str:
 def _is_third_party_anthropic_endpoint(base_url: str | None) -> bool:
    """Return True for non-Anthropic endpoints using the Anthropic Messages API.

-    Third-party proxies (Azure AI Foundry, AWS Bedrock, self-hosted) authenticate
+    Third-party proxies (Microsoft Foundry, AWS Bedrock, self-hosted) authenticate
    with their own API keys via x-api-key, not Anthropic OAuth tokens. OAuth
    detection should be skipped for these endpoints.
    """
@@ -463,14 +472,18 @@ def _requires_bearer_auth(base_url: str | None) -> bool:
    """Return True for Anthropic-compatible providers that require Bearer auth.

    Some third-party /anthropic endpoints implement Anthropic's Messages API but
-    require Authorization: Bearer *** of Anthropic's native x-api-key header.
-    MiniMax's global and China Anthropic-compatible endpoints follow this pattern.
+    require Authorization: Bearer instead of Anthropic's native x-api-key header.
+    MiniMax's global and China Anthropic-compatible endpoints follow this
+    pattern, as does Microsoft Foundry's Anthropic-style endpoint.
    """
    normalized = _normalize_base_url_text(base_url)
    if not normalized:
        return False
    normalized = normalized.rstrip("/").lower()
-    return normalized.startswith(("https://api.minimax.io/anthropic", "https://api.minimaxi.com/anthropic"))
+    return (
+        normalized.startswith(("https://api.minimax.io/anthropic", "https://api.minimaxi.com/anthropic"))
+        or "azure.com" in normalized
+    )


 def _base_url_needs_context_1m_beta(base_url: str | None) -> bool:
@@ -481,6 +494,44 @@ def _base_url_needs_context_1m_beta(base_url: str | None) -> bool:
    return "azure.com" in normalized


+def _is_minimax_anthropic_endpoint(base_url: str | None) -> bool:
+    """Return True for MiniMax's Anthropic-compatible endpoints.
+
+    MiniMax rejects the fine-grained-tool-streaming and context-1m betas, so
+    they are stripped here rather than for every Bearer-auth endpoint.
+    """
+    normalized = _normalize_base_url_text(base_url)
+    if not normalized:
+        return False
+    normalized = normalized.rstrip("/").lower()
+    return normalized.startswith(
+        ("https://api.minimax.io/anthropic", "https://api.minimaxi.com/anthropic")
+    )
+
+
+def _is_azure_anthropic_endpoint(base_url: str | None) -> bool:
+    """Return True for Azure-hosted Anthropic Messages endpoints.
+
+    Covers both the modern Foundry host family (``*.services.ai.azure.*``)
+    and the legacy Azure OpenAI host family (``*.openai.azure.*``) when
+    serving Anthropic's ``/anthropic`` route. Used to opt those hosts in
+    to the ``api-version`` query-param plumbing required by Azure.
+
+    Intentionally avoids a finite allow-list of TLD suffixes so it works
+    across sovereign / private Azure clouds.
+    """
+    normalized = _normalize_base_url_text(base_url)
+    if not normalized:
+        return False
+    parsed = urlparse(normalized)
+    host = (parsed.hostname or "").lower().rstrip(".")
+    path = (parsed.path or "").lower()
+    host_padded = f".{host}."
+    is_foundry_host = ".services.ai.azure." in host_padded
+    is_legacy_azoai_host = ".openai.azure." in host_padded
+    return (is_foundry_host or is_legacy_azoai_host) and "/anthropic" in path
+
+
 def _common_betas_for_base_url(
    base_url: str | None,
    *,
@@ -490,11 +541,13 @@ def _common_betas_for_base_url(

    MiniMax's Anthropic-compatible endpoints (Bearer-auth) reject requests
    that include Anthropic's ``fine-grained-tool-streaming`` beta — every
-    tool-use message triggers a connection error.
+    tool-use message triggers a connection error. They also reject the
+    1M-context beta. Microsoft Foundry's Anthropic endpoint likewise uses
+    Bearer auth but keeps both betas (it needs the 1M beta for 1M context).

    The ``context-1m-2025-08-07`` beta is not sent to native Anthropic by
    default because some subscriptions reject it. Add it only for endpoint
-    families that still require it for 1M context, currently Azure AI Foundry.
+    families that still require it for 1M context, currently Microsoft Foundry.
    Bedrock uses its own client helper below and opts in explicitly.

    ``drop_context_1m_beta=True`` strips the 1M-context beta from any path that
@@ -503,7 +556,7 @@ def _common_betas_for_base_url(
    betas = list(_COMMON_BETAS)
    if _base_url_needs_context_1m_beta(base_url) and not drop_context_1m_beta:
        betas.append(_CONTEXT_1M_BETA)
-    if _requires_bearer_auth(base_url):
+    if _is_minimax_anthropic_endpoint(base_url):
        _stripped = {_TOOL_STREAMING_BETA, _CONTEXT_1M_BETA}
        return [b for b in betas if b not in _stripped]
    if drop_context_1m_beta:
@@ -511,8 +564,81 @@ def _common_betas_for_base_url(
    return betas


+def _build_anthropic_client_with_bearer_hook(
+    token_provider,
+    base_url: str = None,
+    timeout: float = None,
+    *,
+    drop_context_1m_beta: bool = False,
+):
+    """Anthropic-on-Foundry Entra ID variant of :func:`build_anthropic_client`.
+
+    Anthropic SDK 0.86.0 stores ``api_key`` / ``auth_token`` as static
+    strings; there is no callable-token contract. To get per-request
+    bearer refresh (Microsoft's documented Foundry pattern), we hand
+    the SDK a custom ``httpx.Client`` whose request event hook mints a
+    fresh JWT from the Entra credential chain and rewrites
+    ``Authorization: Bearer <jwt>`` on every outbound request. The SDK
+    still sets its own auth header when ``http_client`` is provided, but
+    the hook replaces any pre-set Authorization before the request is sent.
+
+    The placeholder ``auth_token`` is required because the SDK raises
+    ``AnthropicError`` at construction if neither ``api_key`` nor
+    ``auth_token`` is set — but the hook overrides it per-request so
+    the placeholder value never reaches Azure.
+    """
+    _anthropic_sdk = _get_anthropic_sdk()
+    if _anthropic_sdk is None:
+        raise ImportError(
+            "The 'anthropic' package is required for Azure Foundry Anthropic-style "
+            "endpoints with Entra ID auth. Install with: pip install 'anthropic>=0.39.0'"
+        )
+
+    normalize_proxy_env_vars()
+
+    from httpx import Timeout
+    from agent.azure_identity_adapter import build_bearer_http_client
+
+    _read_timeout = timeout if (isinstance(timeout, (int, float)) and timeout > 0) else 900.0
+    timeout_obj = Timeout(timeout=float(_read_timeout), connect=10.0)
+
+    # Strip any trailing /v1 — the Anthropic SDK appends /v1/messages.
+    normalized_base_url = _normalize_base_url_text(base_url)
+    if normalized_base_url:
+        import re as _re
+        normalized_base_url = _re.sub(r"/v1/?$", "", normalized_base_url.rstrip("/"))
+
+    http_client = build_bearer_http_client(token_provider, timeout=timeout_obj)
+
+    kwargs = {
+        "timeout": timeout_obj,
+        "http_client": http_client,
+        # The SDK requires *something* for api_key/auth_token. Our
+        # event hook overrides Authorization per request so this value
+        # is never sent. The sentinel string makes accidental leaks
+        # diagnosable in logs.
+        "auth_token": "entra-id-bearer-via-http-hook",
+    }
+
+    if normalized_base_url:
+        kwargs["base_url"] = normalized_base_url
+        # Azure Anthropic endpoints need api-version as a query param; pass it
+        # via default_query so the SDK appends it without corrupting base_url.
+        if _is_azure_anthropic_endpoint(normalized_base_url) and "api-version" not in normalized_base_url:
+            kwargs["default_query"] = {"api-version": "2025-04-15"}
+
+    common_betas = _common_betas_for_base_url(
+        normalized_base_url,
+        drop_context_1m_beta=drop_context_1m_beta,
+    )
+    if common_betas:
+        kwargs["default_headers"] = {"anthropic-beta": ",".join(common_betas)}
+
+    return _anthropic_sdk.Anthropic(**kwargs)
+
+
 def build_anthropic_client(
-    api_key: str,
+    api_key,
    base_url: str = None,
    timeout: float = None,
    *,
@@ -520,6 +646,17 @@ def build_anthropic_client(
 ):
    """Create an Anthropic client, auto-detecting setup-tokens vs API keys.

+    ``api_key`` accepts either:
+
+    * a static ``str`` — the historical contract for all key-based and
+      OAuth flows.
+    * a ``Callable[[], str]`` — an Entra ID bearer token provider from
+      :mod:`agent.azure_identity_adapter`. The Anthropic SDK itself
+      requires a static string, so when given a callable we construct
+      a custom ``httpx.Client`` with a request event hook that mints a
+      fresh JWT per outbound request and rewrites the ``Authorization``
+      header. The SDK never sees the callable directly.
+
    If *timeout* is provided it overrides the default 900s read timeout.  The
    connect timeout stays at 10s.  Callers pass this from the per-provider /
    per-model ``request_timeout_seconds`` config so Anthropic-native and
@@ -541,6 +678,14 @@ def build_anthropic_client(
            "Install it with: pip install 'anthropic>=0.39.0'"
        )

+    # Callable api_key → Entra ID bearer provider path. Delegated to a
+    # helper so the existing static-key code below stays unchanged.
+    if callable(api_key) and not isinstance(api_key, str):
+        return _build_anthropic_client_with_bearer_hook(
+            api_key, base_url, timeout,
+            drop_context_1m_beta=drop_context_1m_beta,
+        )
+
    normalize_proxy_env_vars()

    from httpx import Timeout
@@ -555,8 +700,7 @@ def build_anthropic_client(
        # Pass it via default_query so the SDK appends it to every request URL
        # without corrupting the base_url (appending it directly produces
        # malformed paths like /anthropic?api-version=.../v1/messages).
-        _is_azure_endpoint = "azure.com" in normalized_base_url.lower()
-        if _is_azure_endpoint and "api-version" not in normalized_base_url:
+        if _is_azure_anthropic_endpoint(normalized_base_url) and "api-version" not in normalized_base_url:
            kwargs["base_url"] = normalized_base_url.rstrip("/")
            kwargs["default_query"] = {"api-version": "2025-04-15"}
        else:
@@ -586,7 +730,7 @@ def build_anthropic_client(
        if common_betas:
            kwargs["default_headers"] = {"anthropic-beta": ",".join(common_betas)}
    elif _is_third_party_anthropic_endpoint(base_url):
-        # Third-party proxies (Azure AI Foundry, AWS Bedrock, etc.) use their
+        # Third-party proxies (Microsoft Foundry, AWS Bedrock, etc.) use their
        # own API keys with x-api-key auth. Skip OAuth detection — their keys
        # don't follow Anthropic's sk-ant-* prefix convention and would be
        # misclassified as OAuth tokens.
@@ -1052,10 +1196,12 @@ def _generate_pkce() -> tuple:

 def run_hermes_oauth_login_pure() -> Optional[Dict[str, Any]]:
    """Run Hermes-native OAuth PKCE flow and return credential state."""
+    import secrets
    import time
    import webbrowser

    verifier, challenge = _generate_pkce()
+    oauth_state = secrets.token_urlsafe(32)

    params = {
        "code": "true",
@@ -1065,7 +1211,7 @@ def run_hermes_oauth_login_pure() -> Optional[Dict[str, Any]]:
        "scope": _OAUTH_SCOPES,
        "code_challenge": challenge,
        "code_challenge_method": "S256",
-        "state": verifier,
+        "state": oauth_state,
    }
    from urllib.parse import urlencode

@@ -1102,7 +1248,12 @@ def run_hermes_oauth_login_pure() -> Optional[Dict[str, Any]]:

    splits = auth_code.split("#")
    code = splits[0]
-    state = splits[1] if len(splits) > 1 else ""
+    received_state = splits[1] if len(splits) > 1 else ""
+
+    # Validate state to prevent CSRF (RFC 6749 §10.12)
+    if received_state != oauth_state:
+        logger.warning("OAuth state mismatch — possible CSRF, aborting")
+        return None

    try:
        import urllib.request
@@ -1111,7 +1262,7 @@ def run_hermes_oauth_login_pure() -> Optional[Dict[str, Any]]:
            "grant_type": "authorization_code",
            "client_id": _OAUTH_CLIENT_ID,
            "code": code,
-            "state": state,
+            "state": received_state,
            "redirect_uri": _OAUTH_REDIRECT_URI,
            "code_verifier": verifier,
        }).encode()
@@ -1289,13 +1440,20 @@ def convert_tools_to_anthropic(tools: List[Dict]) -> List[Dict]:
            continue
        if name:
            seen_names.add(name)
-        result.append({
+        anthropic_tool: Dict[str, Any] = {
            "name": name,
            "description": fn.get("description", ""),
            "input_schema": _normalize_tool_input_schema(
                fn.get("parameters", {"type": "object", "properties": {}})
            ),
-        })
+        }
+        # Forward the cache_control marker when present on the OpenAI-format
+        # tool dict. Anthropic honors cache_control on the last tool, which
+        # caches every tool definition up to it across requests.
+        cache_control = t.get("cache_control")
+        if isinstance(cache_control, dict):
+            anthropic_tool["cache_control"] = dict(cache_control)
+        result.append(anthropic_tool)
    return result


@@ -1537,7 +1695,7 @@ def convert_messages_to_anthropic(
            # downgraded to a spurious text block on the last assistant message.
            reasoning_content = m.get("reasoning_content")
            _already_has_thinking = any(
-                isinstance(b, dict) and b.get("type") in ("thinking", "redacted_thinking")
+                isinstance(b, dict) and b.get("type") in {"thinking", "redacted_thinking"}
                for b in blocks
            )
            if isinstance(reasoning_content, str) and not _already_has_thinking:
@@ -1688,7 +1846,7 @@ def convert_messages_to_anthropic(
                if isinstance(m["content"], list):
                    m["content"] = [
                        b for b in m["content"]
-                        if not (isinstance(b, dict) and b.get("type") in ("thinking", "redacted_thinking"))
+                        if not (isinstance(b, dict) and b.get("type") in {"thinking", "redacted_thinking"})
                    ]
                prev_blocks = fixed[-1]["content"]
                curr_blocks = m["content"]
@@ -1714,7 +1872,7 @@ def convert_messages_to_anthropic(
    # causing HTTP 400 "Invalid signature in thinking block".
    #
    # Signatures are Anthropic-proprietary.  Third-party endpoints
-    # (MiniMax, Azure AI Foundry, self-hosted proxies) cannot validate
+    # (MiniMax, Microsoft Foundry, self-hosted proxies) cannot validate
    # them and will reject them outright.  When targeting a third-party
    # endpoint, strip ALL thinking/redacted_thinking blocks from every
    # assistant message — the third-party will generate its own
@@ -2060,5 +2218,3 @@ def build_anthropic_kwargs(
        kwargs["extra_headers"] = {"anthropic-beta": ",".join(betas)}

    return kwargs
-
-
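The padded-host check in `_is_azure_anthropic_endpoint` matches whole DNS labels rather than raw substrings, which is what lets it cover sovereign clouds without a TLD allow-list. Below is a standalone sketch of the same logic. `is_azure_anthropic_url` is a hypothetical name, and a plain `strip()` stands in for `_normalize_base_url_text`, whose body is not part of this diff.

```python
from urllib.parse import urlparse


def is_azure_anthropic_url(base_url: str) -> bool:
    """Hypothetical standalone mirror of _is_azure_anthropic_endpoint."""
    text = (base_url or "").strip()  # stand-in for _normalize_base_url_text
    if not text:
        return False
    parsed = urlparse(text)
    host = (parsed.hostname or "").lower().rstrip(".")
    path = (parsed.path or "").lower()
    # Wrapping the host in dots turns the substring test into a label-boundary
    # test: "x.services.ai.azure.us" matches, "notservices.ai.azure.com" does not.
    padded = f".{host}."
    is_foundry = ".services.ai.azure." in padded
    is_legacy_aoai = ".openai.azure." in padded
    return (is_foundry or is_legacy_aoai) and "/anthropic" in path


if __name__ == "__main__":
    for url in (
        "https://myres.services.ai.azure.com/anthropic",
        "https://myres.openai.azure.us/anthropic/v1",
        "https://notservices.ai.azure.com/anthropic",
        "https://myres.services.ai.azure.com/openai/v1",
        "https://api.minimax.io/anthropic",
    ):
        print(f"{url} -> {is_azure_anthropic_url(url)}")
```

The `.us` legacy host passes without any suffix list. A Foundry host serving a non-Anthropic route is rejected because the path check is independent of the host check.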
--- a/agent/async_utils.py
+++ b/agent/async_utils.py
@@ -0,0 +1,68 @@
+"""Async/sync bridging helpers.
+
+The codebase has ~30 sites that schedule a coroutine onto an event loop from a
+worker thread via :func:`asyncio.run_coroutine_threadsafe`.  That function can
+raise :class:`RuntimeError` (e.g. the loop was closed during a shutdown race),
+and when it does the coroutine object is never awaited and never closed —
+which triggers a ``"coroutine '<name>' was never awaited"`` RuntimeWarning and
+leaks the coroutine's frame until GC.
+
+:func:`safe_schedule_threadsafe` wraps the call, closes the coroutine on
+scheduling failure, and returns ``None`` (instead of a half-formed future) so
+callers can branch cleanly:
+
+    fut = safe_schedule_threadsafe(coro, loop)
+    if fut is None:
+        return  # or fallback behavior
+    fut.result(timeout=5)
+
+The helper deliberately does NOT also handle ``future.result()`` failures —
+that is a separate concern.  Once the loop has accepted the coroutine, its
+lifecycle belongs to the loop, not the scheduling thread.
+"""
+from __future__ import annotations
+
+import asyncio
+import logging
+from concurrent.futures import Future
+from typing import Any, Coroutine, Optional
+
+
+_DEFAULT_LOGGER = logging.getLogger(__name__)
+
+
+def safe_schedule_threadsafe(
+    coro: Coroutine[Any, Any, Any],
+    loop: Optional[asyncio.AbstractEventLoop],
+    *,
+    logger: Optional[logging.Logger] = None,
+    log_message: str = "Failed to schedule coroutine on loop",
+    log_level: int = logging.DEBUG,
+) -> Optional[Future]:
+    """Schedule ``coro`` on ``loop`` from a sync context, leak-safe.
+
+    Returns the :class:`concurrent.futures.Future` on success, or ``None`` if
+    the loop is missing or :func:`asyncio.run_coroutine_threadsafe` raised
+    (e.g. the loop was closed during a shutdown race).  In all failure paths
+    the coroutine is :meth:`close`-d so it does not trigger
+    ``"coroutine was never awaited"`` warnings or leak its frame.
+
+    Callers retain full control over the returned future: call
+    ``.result(timeout=...)``, attach a callback with ``add_done_callback``,
+    or drop it for fire-and-forget use.
+    """
+    log = logger if logger is not None else _DEFAULT_LOGGER
+
+    if loop is None:
+        if asyncio.iscoroutine(coro):
+            coro.close()
+        log.log(log_level, "%s: loop is None", log_message)
+        return None
+
+    try:
+        return asyncio.run_coroutine_threadsafe(coro, loop)
+    except Exception as exc:
+        if asyncio.iscoroutine(coro):
+            coro.close()
+        log.log(log_level, "%s: %s", log_message, exc)
+        return None
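The sketch below runs the same pattern against a real event loop. `schedule_or_close` is a hypothetical, logging-free stand-in for `safe_schedule_threadsafe`, and `add` is an illustrative coroutine.

The first call succeeds on a loop that is running in a worker thread. The second call targets the same loop after it has been closed. There, `asyncio.run_coroutine_threadsafe` raises `RuntimeError`, and the coroutine ends up closed rather than leaking.

```python
import asyncio
import inspect
import threading
from concurrent.futures import Future
from typing import Any, Coroutine, Optional


def schedule_or_close(
    coro: Coroutine[Any, Any, Any],
    loop: Optional[asyncio.AbstractEventLoop],
) -> Optional[Future]:
    """Hypothetical minimal stand-in for safe_schedule_threadsafe (no logging)."""
    if loop is None:
        coro.close()
        return None
    try:
        return asyncio.run_coroutine_threadsafe(coro, loop)
    except Exception:
        # The loop never accepted the coroutine, so closing it is our job.
        coro.close()
        return None


async def add(a: int, b: int) -> int:
    await asyncio.sleep(0)
    return a + b


def demo() -> tuple:
    # Success path: a loop running forever in a worker thread accepts the coroutine.
    loop = asyncio.new_event_loop()
    worker = threading.Thread(target=loop.run_forever, daemon=True)
    worker.start()
    fut = schedule_or_close(add(2, 3), loop)
    ok = fut.result(timeout=5) if fut is not None else None

    # Shut the loop down, then schedule again: call_soon_threadsafe raises
    # RuntimeError("Event loop is closed") and the helper closes the coroutine.
    loop.call_soon_threadsafe(loop.stop)
    worker.join()
    loop.close()
    late = add(1, 1)
    failed = schedule_or_close(late, loop)
    return ok, failed, inspect.getcoroutinestate(late)


if __name__ == "__main__":
    print(demo())
```

Running the module prints `(5, None, 'CORO_CLOSED')`. Without the `close()` call in the failure branch, `late` would stay in `CORO_CREATED` and trigger the "never awaited" warning when garbage-collected.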
--- a/agent/auxiliary_client.py
+++ b/agent/auxiliary_client.py
--- a/agent/azure_identity_adapter.py
+++ b/agent/azure_identity_adapter.py
@@ -0,0 +1,555 @@
+"""Microsoft Entra ID adapter for Microsoft Foundry.
+
+Provides keyless authentication for Microsoft Foundry deployments using the
+`azure-identity` SDK's `DefaultAzureCredential` chain (env service principal
+→ workload identity → managed identity → VS Code → Azure CLI → azd →
+PowerShell → broker).
+
+Architecture mirrors `agent/bedrock_adapter.py`:
+
+* Lazy import. `azure-identity` is only loaded when ``model.auth_mode =
+  entra_id`` is selected. Users who stick with `AZURE_FOUNDRY_API_KEY`
+  never pay the import cost.
+* SDK-callable contract. The public entry point ``build_token_provider``
+  returns a zero-arg callable produced by ``get_bearer_token_provider`` —
+  this is exactly the value Microsoft's documented sample plugs into
+  ``OpenAI(api_key=token_provider, base_url=...)``. The OpenAI SDK calls
+  it before every request, so token refresh is transparent.
+* Three explicit consumer-side helpers (display / cache / http-bearer)
+  rather than one generic "materialize" function — splitting them by
+  purpose prevents accidental token-minting in logging paths or token
+  leakage into cache keys / dashboard JSON.
+* No persisted JWT. ``azure-identity`` caches in-process and (where
+  available) in the OS keychain or ``~/.IdentityService``. Hermes does
+  not duplicate that storage in ``auth.json``.
+
+Reference: https://learn.microsoft.com/azure/ai-foundry/foundry-models/how-to/configure-entra-id
+
+Requires: ``azure-identity`` (optional dependency — only needed when
+``model.auth_mode = entra_id``).
+"""
+
+from __future__ import annotations
+
+import functools
+import logging
+import os
+import threading
+from dataclasses import dataclass
+from typing import Any, Callable, Dict, Optional
+
+logger = logging.getLogger(__name__)
+
+# Microsoft-documented scope for Foundry inference auth. Both the new
+# Foundry portal and the legacy Azure OpenAI managed-identity docs use
+# this scope for ALL Foundry endpoint shapes (*.openai.azure.com,
+# *.services.ai.azure.com, *.ai.azure.com). The older
+# ``https://cognitiveservices.azure.com/.default`` scope is rejected for
+# inference by newer resources, so it is not the default; users whose
+# resources still require it can override the scope via
+# ``model.entra.scope`` in config.yaml.
+SCOPE_AI_AZURE_DEFAULT = "https://ai.azure.com/.default"
+
+# ---------------------------------------------------------------------------
+# Lazy SDK import — only loaded when the Entra path is actually used.
+# ---------------------------------------------------------------------------
+
+_AZURE_IDENTITY_FEATURE = "provider.azure_identity"
+
+
+def has_azure_identity_installed() -> bool:
+    """Return True if `azure-identity` can be imported right now.
+
+    Cheap check — does not walk the credential chain.
+    """
+    try:
+        import azure.identity  # noqa: F401
+        return True
+    except Exception:
+        return False
+
+
+def _require_azure_identity():
+    """Import ``azure.identity``, lazy-installing it if allowed.
+
+    Raises ``ImportError`` with a clear actionable message when the
+    package is missing and lazy installs are disabled.
+    """
+    try:
+        import azure.identity as _ai
+        return _ai
+    except ImportError:
+        try:
+            from tools.lazy_deps import ensure, FeatureUnavailable
+        except ImportError as exc:
+            raise ImportError(
+                "The 'azure-identity' package is required for Azure AI "
+                "Foundry Entra ID authentication. Install it with: "
+                "pip install azure-identity"
+            ) from exc
+
+        try:
+            ensure(_AZURE_IDENTITY_FEATURE, prompt=False)
+        except FeatureUnavailable as exc:
+            raise ImportError(
+                "The 'azure-identity' package is required for Azure AI "
+                "Foundry Entra ID authentication. " + str(exc)
+            ) from exc
+
+        # Retry import after lazy install.
+        import azure.identity as _ai  # noqa: WPS440
+        return _ai
+
+
+def reset_credential_cache() -> None:
+    """Clear the cached ``DefaultAzureCredential``. Used by tests and
+    profile switches.
+
+    Defensive against tests that ``monkeypatch.setattr`` over
+    ``build_credential`` with a plain (non-lru-cached) function — those
+    won't expose ``cache_clear()`` until pytest reverts the patch.
+    """
+    cache_clear = getattr(build_credential, "cache_clear", None)
+    if callable(cache_clear):
+        cache_clear()
+
+
+# ---------------------------------------------------------------------------
+# Token-provider construction
+# ---------------------------------------------------------------------------
+
+
+@dataclass(frozen=True)
+class EntraIdentityConfig:
+    """Serializable Entra ID config.
+
+    Captures only the Entra knobs that Hermes manages itself, outside the
+    Azure SDK's environment configuration. Everything else (tenant ID,
+    service principal secret, federated token file, sovereign cloud
+    authority, etc.) flows through azure-identity's standard ``AZURE_*``
+    env vars; see the Bedrock pattern in
+    ``hermes_cli/runtime_provider.py:1310-1377`` for the analogous
+    "let the SDK read env" approach.
+
+    ``scope`` defaults to Microsoft's documented Foundry inference audience.
+    Almost everyone keeps it; sovereign-cloud / non-standard tenants can
+    override via ``model.entra.scope``. Identity selection (user-assigned
+    managed identity, workload identity, service principal, tenant, authority)
+    stays in the standard Azure SDK env vars such as ``AZURE_CLIENT_ID``.
+
+    ``exclude_interactive_browser`` is kept as an internal constructor knob
+    so probes stay non-interactive by default. It is not written by the setup
+    wizard.
+
+    The dataclass is frozen so it's hashable and usable as a
+    ``functools.lru_cache`` key; its plain fields also make it picklable
+    across multiprocessing boundaries (workers rebuild the credential
+    inside their own process).
+    """
+
+    scope: str = SCOPE_AI_AZURE_DEFAULT
+    exclude_interactive_browser: bool = True
+
+    def __post_init__(self) -> None:
+        scope = str(self.scope or "").strip() or SCOPE_AI_AZURE_DEFAULT
+        object.__setattr__(self, "scope", scope)
+
+    def to_dict(self) -> Dict[str, Any]:
+        return {
+            "scope": self.scope,
+            "exclude_interactive_browser": self.exclude_interactive_browser,
+        }
+
+    @classmethod
+    def from_dict(cls, data: Optional[Dict[str, Any]],
+                  *, default_scope: Optional[str] = None) -> "EntraIdentityConfig":
+        data = data or {}
+        scope = str(data.get("scope") or "").strip() or default_scope or SCOPE_AI_AZURE_DEFAULT
+        exclude_browser = bool(data.get("exclude_interactive_browser", True))
+        return cls(
+            scope=scope,
+            exclude_interactive_browser=exclude_browser,
+        )
+
+
+def _build_default_credential(config: EntraIdentityConfig) -> Any:
+    """Construct a ``DefaultAzureCredential`` for ``config``.
+
+    Only Hermes-selected knobs are passed as kwargs. Everything else
+    (tenant, service principal secret, federated token file, sovereign
+    cloud authority, etc.) is read by ``azure-identity`` from the
+    standard ``AZURE_*`` environment variables — see Microsoft's
+    documented credential resolution chain. Users configure those in
+    ``~/.hermes/.env`` or the deployment environment.
+    """
+    ai = _require_azure_identity()
+    kwargs: Dict[str, Any] = {}
+    # SDK default is True (browser excluded); only pass when the user
+    # explicitly opts in to interactive browser auth.
+    if not config.exclude_interactive_browser:
+        kwargs["exclude_interactive_browser_credential"] = False
+    return ai.DefaultAzureCredential(**kwargs)
+
+
+@functools.lru_cache(maxsize=1)
+def build_credential(config: EntraIdentityConfig) -> Any:
+    """Return the cached ``DefaultAzureCredential`` for ``config``.
+
+    Hermes processes use exactly one Entra config at a time (the
+    ``model.entra.*`` block in config.yaml drives every aux task,
+    subagent, and credential probe in the session). ``maxsize=1`` is
+    intentional: it reflects the actual usage pattern and keeps the
+    cache trivially small.
+
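+    ``EntraIdentityConfig`` is a frozen dataclass, so it's hashable and
+    safe as an LRU-cache key. ``functools.lru_cache`` keeps its internal
+    state coherent across threads, but concurrent first calls may each
+    build a credential before one is cached (harmless; only one is kept).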
+
+    If two distinct configs are ever passed (tests do this; production
+    rarely), the LRU eviction handles it correctly — each call still
+    returns a credential matching its config; only one is cached at a
+    time. Use :func:`reset_credential_cache` to clear (e.g. in tests).
+    """
+    return _build_default_credential(config)
+
+
+def build_token_provider(scope: Optional[str] = None,
+                         *,
+                         config: Optional[EntraIdentityConfig] = None,
+                         base_url: Optional[str] = None,
+                         exclude_interactive_browser: bool = True,
+                         ) -> Callable[[], str]:
+    """Return a zero-arg callable that mints a fresh Entra bearer JWT.
+
+    The returned callable is exactly what Microsoft's documented Foundry
+    sample expects::
+
+        from openai import OpenAI
+        client = OpenAI(
+            base_url="https://my-resource.openai.azure.com/openai/v1/",
+            api_key=build_token_provider(),
+        )
+
+    Scope resolution order:
+      1. ``config.scope`` when a config object is supplied
+      2. explicit ``scope`` kwarg
+      3. ``SCOPE_AI_AZURE_DEFAULT`` (Microsoft's documented Foundry scope)
+
+    ``base_url`` is unused today and kept for back-compat. Tenant /
+    service-principal / sovereign-cloud configuration flows through
+    ``azure-identity``'s standard ``AZURE_*`` environment variables —
+    see :func:`_build_default_credential` for the rationale.
+
+    NOT serializable across process boundaries. For multiprocessing
+    workers, serialize the ``EntraIdentityConfig`` and rebuild the
+    provider inside the worker.
+    """
+    ai = _require_azure_identity()
+    if config is None:
+        config = EntraIdentityConfig(
+            scope=scope or SCOPE_AI_AZURE_DEFAULT,
+            exclude_interactive_browser=exclude_interactive_browser,
+        )
+    credential = build_credential(config)
+    return ai.get_bearer_token_provider(credential, config.scope)
+
+
+# ---------------------------------------------------------------------------
+# Credential probing
+# ---------------------------------------------------------------------------
+
+
+def has_azure_identity_credentials(scope: Optional[str] = None,
+                                   *,
+                                   config: Optional[EntraIdentityConfig] = None,
+                                   timeout_seconds: float = 10.0,
+                                   allow_install: bool = True,
+                                   **overrides: Any) -> bool:
+    """Best-effort probe: can `DefaultAzureCredential` mint a token now?
+
+    Runs ``credential.get_token(scope)`` under a thread-based timeout so
+    a slow token service can't hang the caller. Returns False on any
+    error — never raises. Use for ``hermes doctor`` /
+    ``hermes auth status`` / wizard preflight.
+
+    ``allow_install``: when True (default) and ``azure-identity`` is not
+    importable, the adapter triggers the standard lazy-install path
+    (subject to ``security.allow_lazy_installs``) before probing. Set
+    False to make this strictly an "is installed?" check — used on hot
+    paths like CLI startup where we never want pip to run.
+
+    NOT used by ``is_provider_configured()`` — that path is structural
+    only (no token mint), so CLI startup doesn't pay this latency.
+    """
+    if not has_azure_identity_installed():
+        if not allow_install:
+            return False
+        try:
+            _require_azure_identity()
+        except ImportError as exc:
+            logger.debug("azure-identity lazy install unavailable: %s", exc)
+            return False
+    if config is None:
+        effective_scope = (scope or "").strip() or SCOPE_AI_AZURE_DEFAULT
+        config = EntraIdentityConfig(scope=effective_scope, **overrides)
+
+    result = {"ok": False}
+
+    def _probe() -> None:
+        try:
+            credential = build_credential(config)
+            tok = credential.get_token(config.scope)
+            result["ok"] = bool(getattr(tok, "token", None))
+        except Exception as exc:
+            logger.debug("Entra credential probe failed: %s", exc)
+            result["ok"] = False
+
+    thread = threading.Thread(target=_probe, daemon=True)
+    thread.start()
+    thread.join(timeout=max(0.01, timeout_seconds))
+    if thread.is_alive():
+        logger.debug("Entra token service probe timed out after %ss", timeout_seconds)
+        return False
+    return bool(result.get("ok"))
+
+
+def describe_active_credential(config: Optional[EntraIdentityConfig] = None,
+                               *,
+                               scope: Optional[str] = None,
+                               timeout_seconds: float = 10.0,
+                               allow_install: bool = True,
+                               **overrides: Any) -> Dict[str, Any]:
+    """Return diagnostic info about the active credential chain.
+
+    Best-effort: runs ``get_token()`` and inspects what came back.
+    Designed for ``hermes doctor`` and the wizard preflight — never
+    raises, returns ``{"ok": False, "error": ...}`` on failure.
+
+    ``allow_install``: when True (default) and ``azure-identity`` is not
+    importable, the adapter triggers the standard lazy-install path
+    (subject to ``security.allow_lazy_installs``) before probing. The
+    install failure is surfaced as the diagnostic error when it fails.
+    Set False for hot CLI paths that should never trigger pip.
+
+    ``azure-identity`` doesn't expose the winning inner credential as
+    a public field, so we report a coarse picture (env-var sources
+    present, ``AZURE_TENANT_ID``, token expiry) rather than the
+    credential class name. Users wanting the precise class can run with
+    ``AZURE_LOG_LEVEL=DEBUG``.
+    """
+    info: Dict[str, Any] = {"ok": False}
+    if not has_azure_identity_installed():
+        if not allow_install:
+            info["error"] = "azure-identity not installed"
+            info["hint"] = (
+                "pip install azure-identity (or rely on lazy install at "
+                "first use)"
+            )
+            return info
+        try:
+            _require_azure_identity()
+        except ImportError as exc:
+            info["error"] = str(exc) or "azure-identity not installed"
+            info["hint"] = (
+                "pip install azure-identity manually, or enable lazy "
+                "installs (security.allow_lazy_installs: true in "
+                "config.yaml)."
+            )
+            return info
+
+    if config is None:
+        effective_scope = (scope or "").strip() or SCOPE_AI_AZURE_DEFAULT
+        config = EntraIdentityConfig(scope=effective_scope, **overrides)
+
+    info["scope"] = config.scope
+    # Tenant / authority / service-principal configuration flows through
+    # the standard ``AZURE_*`` env vars; report the tenant if one is set.
+    if os.environ.get("AZURE_TENANT_ID", "").strip():
+        info["tenant_id_env"] = os.environ["AZURE_TENANT_ID"].strip()
+
+    # Surface which env-var sources are present without minting yet.
+    env_sources = []
+    if os.environ.get("AZURE_FEDERATED_TOKEN_FILE", "").strip():
+        env_sources.append("WorkloadIdentityCredential (AZURE_FEDERATED_TOKEN_FILE)")
+    if (os.environ.get("AZURE_CLIENT_ID", "").strip()
+            and os.environ.get("AZURE_CLIENT_SECRET", "").strip()
+            and os.environ.get("AZURE_TENANT_ID", "").strip()):
+        env_sources.append("EnvironmentCredential (client secret)")
+    if os.environ.get("IDENTITY_ENDPOINT", "").strip() or os.environ.get("MSI_ENDPOINT", "").strip():
+        env_sources.append("ManagedIdentityCredential (IDENTITY_ENDPOINT / MSI_ENDPOINT)")
+    info["env_sources"] = env_sources
+
+    # Now try minting.
+    result: Dict[str, Any] = {}
+
+    def _probe() -> None:
+        try:
+            credential = build_credential(config)
+            tok = credential.get_token(config.scope)
+            result["token"] = tok
+        except Exception as exc:
+            result["error"] = str(exc)
+
+    thread = threading.Thread(target=_probe, daemon=True)
+    thread.start()
+    thread.join(timeout=max(0.01, timeout_seconds))
+    if thread.is_alive():
+        info["error"] = f"Token probe timed out after {timeout_seconds:g}s"
+        info["hint"] = (
+            "DefaultAzureCredential can be slow when the token service is unreachable "
+            "or when az login state is stale. Try `az login` or set "
+            "AZURE_CLIENT_ID / AZURE_TENANT_ID / AZURE_CLIENT_SECRET."
+        )
+        return info
+
+    if "error" in result:
+        info["error"] = result["error"]
+        return info
+
+    token = result.get("token")
+    if token is None:
+        info["error"] = "credential chain exhausted"
+        return info
+
+    info["ok"] = True
+    info["expires_on"] = getattr(token, "expires_on", None)
+    return info
+
+
+# ---------------------------------------------------------------------------
+# Consumer-side helpers — split by purpose to prevent accidental token
+# minting in logging / cache-key / dashboard paths.
+# ---------------------------------------------------------------------------
+
+
+def is_token_provider(value: Any) -> bool:
+    """Return True when ``value`` is a callable Entra token provider.
+
+    Used at the seams where a consumer must decide between
+    string-API-key semantics and bearer-callable semantics.
+    """
+    return callable(value) and not isinstance(value, str)
+
+
+def materialize_bearer_for_http(value: Any) -> str:
+    """Return a fresh Bearer JWT for a manual HTTP request.
+
+    Only call this at sites that must construct an ``Authorization``
+    header outside the OpenAI SDK (e.g. ``hermes_cli/azure_detect.py``).
+    A callable provider is invoked exactly once; a non-empty string is
+    returned unchanged.
+
+    **Anthropic SDK integration:** the Anthropic Python SDK does not
+    accept a ``Callable[[], str]`` for ``auth_token``. Instead,
+    :func:`build_bearer_http_client` returns an ``httpx.Client`` whose
+    request event hook calls this function and rewrites the
+    ``Authorization`` header per request — and that client is passed to
+    the Anthropic SDK via ``http_client=...``. See
+    :func:`agent.anthropic_adapter.build_anthropic_client` for the
+    consumer.
+
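+    Raises ``ValueError`` if ``value`` is neither a callable token provider
+    nor a non-empty string, or if the provider returns an empty token.
+    Exceptions raised by the provider itself (e.g. azure-identity
+    authentication errors) propagate unchanged.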
+    """
+    if is_token_provider(value):
+        token = value()
+        if not isinstance(token, str) or not token:
+            raise ValueError("token provider returned empty value")
+        return token
+    if isinstance(value, str) and value:
+        return value
+    raise ValueError("no usable api_key / token provider")
+
+
+def build_bearer_http_client(token_provider: Callable[[], str], **httpx_kwargs: Any) -> Any:
+    """Return an ``httpx.Client`` that mints a fresh Entra bearer JWT
+    per outbound request.
+
+    The Anthropic SDK (≤ 0.86.0 at the time of writing) stores
+    ``api_key`` / ``auth_token`` as static strings and computes the
+    ``Authorization`` header at construction time. To get per-request
+    token refresh (the Microsoft-recommended Foundry pattern for
+    callable bearer providers), we install an httpx ``request`` event
+    hook on a custom client and pass that client to the SDK via
+    ``http_client=...``. The hook:
+
+      1. Calls :func:`materialize_bearer_for_http` to mint a fresh JWT
+         (azure-identity caches internally — this is cheap when the
+         cached token is still valid).
+      2. Strips any pre-set ``Authorization`` / ``api-key`` /
+         ``x-api-key`` headers the SDK may have added (avoids
+         conflicting auth values).
+      3. Sets ``Authorization: Bearer <fresh-jwt>``.
+
+    ``token_provider`` must be a zero-arg callable returning a string —
+    typically the result of :func:`build_token_provider`.
+
+    ``httpx_kwargs`` are forwarded verbatim to ``httpx.Client(...)`` so
+    callers can attach a ``timeout``, ``transport``, ``proxy``, etc.
+
+    Raises ``ValueError`` if ``token_provider`` is not callable, and
+    ``ImportError`` if ``httpx`` is not installed (it is a transitive
+    dependency of both the ``openai`` and ``anthropic`` SDKs, so in
+    practice it is always available when this helper is reached).
+    """
+    if not is_token_provider(token_provider):
+        raise ValueError(
+            "build_bearer_http_client requires a zero-arg callable "
+            "token provider"
+        )
+
+    try:
+        import httpx
+    except ImportError as exc:  # pragma: no cover — httpx ships with openai/anthropic
+        raise ImportError(
+            "httpx is required for Entra ID bearer auth on Microsoft Foundry "
+            "Anthropic-style endpoints. It is normally a transitive "
+            "dependency of the openai/anthropic SDKs."
+        ) from exc
+
+    def _inject_bearer(request: "httpx.Request") -> None:
+        # Strip any auth headers the SDK may have set (avoids conflicting
+        # auth values), including our own placeholder sentinel
+        # ``entra-id-bearer-via-http-hook`` from
+        # ``_build_anthropic_client_with_bearer_hook``. httpx.Headers is
+        # case-insensitive, so one spelling per header name is enough.
+        for header_name in ("Authorization", "Api-Key", "X-Api-Key"):
+            request.headers.pop(header_name, None)
+        try:
+            token = materialize_bearer_for_http(token_provider)
+        except Exception as exc:
+            # Token provider failed: chain exhausted, token service
+            # unreachable, az login expired (azure-identity raises
+            # ClientAuthenticationError for these, not ValueError), or an
+            # empty token (ValueError). The request then hits Azure with NO
+            # Authorization rather than with the placeholder. Azure returns
+            # a clean 401 "missing auth" that is easier to diagnose than a
+            # 401 against the sentinel string, and the sentinel never
+            # appears in upstream access logs.
+            #
+            # Log at WARNING (not DEBUG) so the misconfiguration is
+            # visible at default log levels.
+            logger.warning(
+                "Bearer hook: Entra ID token provider failed (%s); "
+                "Authorization headers stripped, so Azure will respond 401. "
+                "Run `hermes doctor` or `az login` to recover.",
+                exc,
+            )
+            return
+        request.headers["Authorization"] = f"Bearer {token}"
+
+    return httpx.Client(
+        event_hooks={"request": [_inject_bearer]},
+        **httpx_kwargs,
+    )
+
+
+__all__ = [
+    "EntraIdentityConfig",
+    "SCOPE_AI_AZURE_DEFAULT",
+    "build_bearer_http_client",
+    "build_credential",
+    "build_token_provider",
+    "describe_active_credential",
+    "has_azure_identity_credentials",
+    "has_azure_identity_installed",
+    "is_token_provider",
+    "materialize_bearer_for_http",
+    "reset_credential_cache",
+]
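The cached-credential plus timed-probe pattern used by `build_credential` and `has_azure_identity_credentials` can be illustrated without `azure-identity` installed. What follows is a minimal stdlib-only sketch, not the module's code. It assumes a hypothetical `FakeCredential` whose `get_token` sleeps for a configurable delay in place of a real token-service round trip. `ProbeConfig`, `FakeCredential`, `cached_credential`, and `probe` are illustrative names that do not exist in the adapter.

```python
import functools
import threading
import time
from dataclasses import dataclass
from typing import Any, Dict, NamedTuple


class AccessToken(NamedTuple):
    # Hypothetical stand-in shaped like azure-core's (token, expires_on) pair.
    token: str
    expires_on: int


@dataclass(frozen=True)
class ProbeConfig:
    # Hypothetical stand-in for EntraIdentityConfig: frozen, therefore
    # hashable and usable as an lru_cache key.
    scope: str = "https://ai.azure.com/.default"
    delay_seconds: float = 0.0  # simulated token-service latency


class FakeCredential:
    """Hypothetical credential: get_token() sleeps, then returns a token."""

    def __init__(self, delay_seconds: float) -> None:
        self.delay_seconds = delay_seconds

    def get_token(self, scope: str) -> AccessToken:
        time.sleep(self.delay_seconds)
        return AccessToken(token=f"jwt-for-{scope}", expires_on=int(time.time()) + 3600)


@functools.lru_cache(maxsize=1)
def cached_credential(config: ProbeConfig) -> FakeCredential:
    # One credential per config; only the most recently used config stays cached.
    return FakeCredential(config.delay_seconds)


def probe(config: ProbeConfig, timeout_seconds: float) -> bool:
    """Return True if a token is minted within timeout_seconds; never raise."""
    result: Dict[str, Any] = {"ok": False}

    def _run() -> None:
        try:
            tok = cached_credential(config).get_token(config.scope)
            result["ok"] = bool(tok.token)
        except Exception:
            result["ok"] = False

    # Daemon thread: a hung token service cannot block interpreter exit.
    thread = threading.Thread(target=_run, daemon=True)
    thread.start()
    thread.join(timeout=max(0.01, timeout_seconds))
    if thread.is_alive():
        return False  # timed out; the worker keeps running in the background
    return bool(result["ok"])


print(probe(ProbeConfig(), timeout_seconds=1.0))                     # True
print(probe(ProbeConfig(delay_seconds=0.5), timeout_seconds=0.05))  # False
```

On timeout the caller gets `False` immediately, while the slow `get_token` call keeps running in the background. Because the thread is a daemon, a hung call cannot prevent the interpreter from exiting. This is the trade-off the adapter makes so that `hermes doctor` and wizard preflight never hang.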
--- /dev/null
+++ b/agent/background_review.py
@@ -0,0 +1,587 @@
+"""Background memory/skill review — fork the agent to evaluate the turn.
+
+After every turn, ``AIAgent.run_conversation`` may call
+:func:`spawn_background_review` to fire off a daemon thread that replays
+the conversation snapshot in a forked :class:`AIAgent` and asks itself
+"should any skill/memory be saved or updated?".  Writes go straight to
+the memory + skill stores.  Main conversation and prompt cache are never
+touched.
+
+The fork inherits the parent's live runtime (provider, model, base_url,
+credentials, cached system prompt) so it hits the same prefix cache and
+uses the same auth.  It runs with a tool whitelist limited to memory and
+skill management tools; everything else is denied at runtime.
+
+See the ``hermes-agent-dev`` skill (``references/self-improvement-loop.md``)
+for invariants and PR review criteria.
+"""
+
+from __future__ import annotations
+
+import contextlib
+import json
+import logging
+import os
+from typing import Any, Dict, List, Optional
+
+logger = logging.getLogger(__name__)
+
+
+# Review-prompt strings, used by ``spawn_background_review_thread`` to build
+# the user message that the forked review agent receives.  AIAgent exposes
+# them as class attributes (``_MEMORY_REVIEW_PROMPT`` etc.) for back-compat;
+# the text itself lives only here so future edits happen in one place.
+_MEMORY_REVIEW_PROMPT = (
+    "Review the conversation above and consider saving to memory if appropriate.\n\n"
+    "Focus on:\n"
+    "1. Has the user revealed things about themselves — their persona, desires, "
+    "preferences, or personal details worth remembering?\n"
+    "2. Has the user expressed expectations about how you should behave, their work "
+    "style, or ways they want you to operate?\n\n"
+    "If something stands out, save it using the memory tool. "
+    "If nothing is worth saving, just say 'Nothing to save.' and stop."
+)
+
+_SKILL_REVIEW_PROMPT = (
+    "Review the conversation above and update the skill library. Be "
+    "ACTIVE — most sessions produce at least one skill update, even if "
+    "small. A pass that does nothing is a missed learning opportunity, "
+    "not a neutral outcome.\n\n"
+    "Target shape of the library: CLASS-LEVEL skills, each with a rich "
+    "SKILL.md and a `references/` directory for session-specific detail. "
+    "Not a long flat list of narrow one-session-one-skill entries. This "
+    "shapes HOW you update, not WHETHER you update.\n\n"
+    "Signals to look for (any one of these warrants action):\n"
+    "  • User corrected your style, tone, format, legibility, or "
+    "verbosity. Frustration signals like 'stop doing X', 'this is too "
+    "verbose', 'don't format like this', 'why are you explaining', "
+    "'just give me the answer', 'you always do Y and I hate it', or an "
+    "explicit 'remember this' are FIRST-CLASS skill signals, not just "
+    "memory signals. Update the relevant skill(s) to embed the "
+    "preference so the next session starts already knowing.\n"
+    "  • User corrected your workflow, approach, or sequence of steps. "
+    "Encode the correction as a pitfall or explicit step in the skill "
+    "that governs that class of task.\n"
+    "  • Non-trivial technique, fix, workaround, debugging path, or "
+    "tool-usage pattern emerged that a future session would benefit "
+    "from. Capture it.\n"
+    "  • A skill that got loaded or consulted this session turned out "
+    "to be wrong, missing a step, or outdated. Patch it NOW.\n\n"
+    "Preference order — prefer the earliest action that fits, but do "
+    "pick one when a signal above fired:\n"
+    "  1. UPDATE A CURRENTLY-LOADED SKILL. Look back through the "
+    "conversation for skills the user loaded via /skill-name or you "
+    "read via skill_view. If any of them covers the territory of the "
+    "new learning, PATCH that one first. It is the skill that was in "
+    "play, so it's the right one to extend.\n"
+    "  2. UPDATE AN EXISTING UMBRELLA (via skills_list + skill_view). "
+    "If no loaded skill fits but an existing class-level skill does, "
+    "patch it. Add a subsection, a pitfall, or broaden a trigger.\n"
+    "  3. ADD A SUPPORT FILE under an existing umbrella. Skills can be "
+    "packaged with three kinds of support files — use the right "
+    "directory per kind:\n"
+    "     • `references/<topic>.md` — session-specific detail (error "
+    "transcripts, reproduction recipes, provider quirks) AND "
+    "condensed knowledge banks: quoted research, API docs, external "
+    "authoritative excerpts, or domain notes you found while working "
+    "on the problem. Keep it concise and focused on what the task needs, "
+    "not a full mirror of upstream docs.\n"
+    "     • `templates/<name>.<ext>` — starter files meant to be "
+    "copied and modified (boilerplate configs, scaffolding, a "
+    "known-good example the agent can reproduce with modifications).\n"
+    "     • `scripts/<name>.<ext>` — statically re-runnable actions "
+    "the skill can invoke directly (verification scripts, fixture "
+    "generators, deterministic probes, anything the agent should run "
+    "rather than hand-type each time).\n"
+    "     Add support files via skill_manage action=write_file with "
+    "file_path starting 'references/', 'templates/', or 'scripts/'. "
+    "The umbrella's SKILL.md should gain a one-line pointer to any "
+    "new support file so future agents know it exists.\n"
+    "  4. CREATE A NEW CLASS-LEVEL UMBRELLA SKILL when no existing "
+    "skill covers the class. The name MUST be at the class level. "
+    "The name MUST NOT be a specific PR number, error string, feature "
+    "codename, bare library name, or 'fix-X / debug-Y / audit-Z-today' "
+    "session artifact. If the proposed name only makes sense for "
+    "today's task, it's wrong — fall back to (1), (2), or (3).\n\n"
+    "User-preference embedding (important): when the user expressed a "
+    "style/format/workflow preference, the update belongs in the "
+    "SKILL.md body, not just in memory. Memory captures 'who the user "
+    "is and what the current situation and state of your operations "
+    "are'; skills capture 'how to do this class of task for this "
+    "user'. When they complain about how you handled a task, the "
+    "skill that governs that task needs to carry the lesson.\n\n"
+    "If you notice two existing skills that overlap, note it in your "
+    "reply — the background curator handles consolidation at scale.\n\n"
+    "Protected skills (DO NOT edit these):\n"
+    "  • Bundled skills (shipped with Hermes, e.g. 'hermes-agent').\n"
+    "  • Hub-installed skills (installed via 'hermes skills install').\n"
+    "  • Pinned skills (marked via 'hermes curator pin').\n"
+    "If the only skills that need updating are protected, say "
+    "'Nothing to save.' and stop.\n\n"
+    "Do NOT capture (these become persistent self-imposed constraints "
+    "that bite you later when the environment changes):\n"
+    "  • Environment-dependent failures: missing binaries, fresh-install "
+    "errors, post-migration path mismatches, 'command not found', "
+    "unconfigured credentials, uninstalled packages. The user can fix "
+    "these — they are not durable rules.\n"
+    "  • Negative claims about tools or features ('browser tools do not "
+    "work', 'X tool is broken', 'cannot use Y from execute_code'). These "
+    "harden into refusals the agent cites against itself for months "
+    "after the actual problem was fixed.\n"
+    "  • Session-specific transient errors that resolved before the "
+    "conversation ended. If retrying worked, the lesson is the retry "
+    "pattern, not the original failure.\n"
+    "  • One-off task narratives. A user asking 'summarize today's "
+    "market' or 'analyze this PR' is not a class of work that warrants "
+    "a skill.\n\n"
+    "If a tool failed because of setup state, capture the FIX (install "
+    "command, config step, env var to set) under an existing setup or "
+    "troubleshooting skill — never 'this tool does not work' as a "
+    "standalone constraint.\n\n"
+    "'Nothing to save.' is a real option but should NOT be the "
+    "default. If the session ran smoothly with no corrections and "
+    "produced no new technique, just say 'Nothing to save.' and stop. "
+    "Otherwise, act."
+)
+
+_COMBINED_REVIEW_PROMPT = (
+    "Review the conversation above and update two things:\n\n"
+    "**Memory**: who the user is. Did the user reveal persona, "
+    "desires, preferences, personal details, or expectations about "
+    "how you should behave? Save facts about the user and durable "
+    "preferences with the memory tool.\n\n"
+    "**Skills**: how to do this class of task. Be ACTIVE — most "
+    "sessions produce at least one skill update. A pass that does "
+    "nothing is a missed learning opportunity, not a neutral outcome.\n\n"
+    "Target shape of the skill library: CLASS-LEVEL skills with a rich "
+    "SKILL.md and a `references/` directory for session-specific detail. "
+    "Not a long flat list of narrow one-session-one-skill entries.\n\n"
+    "Signals that warrant a skill update (any one is enough):\n"
+    "  • User corrected your style, tone, format, legibility, "
+    "verbosity, or approach. Frustration is a FIRST-CLASS skill "
+    "signal, not just a memory signal. 'stop doing X', 'don't format "
+    "like this', 'I hate when you Y' — embed the lesson in the skill "
+    "that governs that task so the next session starts fixed.\n"
+    "  • Non-trivial technique, fix, workaround, or debugging path "
+    "emerged.\n"
+    "  • A skill that was loaded or consulted turned out wrong, "
+    "missing, or outdated — patch it now.\n\n"
+    "Preference order for skills — pick the earliest that fits:\n"
+    "  1. UPDATE A CURRENTLY-LOADED SKILL. Check what skills were "
+    "loaded via /skill-name or skill_view in the conversation. If one "
+    "of them covers the learning, PATCH it first. It was in play; "
+    "it's the right place.\n"
+    "  2. UPDATE AN EXISTING UMBRELLA (skills_list + skill_view to "
+    "find the right one). Patch it.\n"
+    "  3. ADD A SUPPORT FILE under an existing umbrella via "
+    "skill_manage action=write_file. Three kinds: "
+    "`references/<topic>.md` for session-specific detail OR condensed "
+    "knowledge banks (quoted research, API docs excerpts, domain "
+    "notes) written concise and task-focused; `templates/<name>.<ext>` "
+    "for starter files meant to be copied and modified; "
+    "`scripts/<name>.<ext>` for statically re-runnable actions "
+    "(verification, fixture generators, probes). Add a one-line "
+    "pointer in SKILL.md so future agents find them.\n"
+    "  4. CREATE A NEW CLASS-LEVEL UMBRELLA when nothing exists. "
+    "Name at the class level — NOT a PR number, error string, "
+    "codename, library-alone name, or 'fix-X / debug-Y' session "
+    "artifact. If the name only fits today's task, fall back to (1), "
+    "(2), or (3).\n\n"
+    "User-preference embedding: when the user complains about how "
+    "you handled a task, update the skill that governs that task — "
+    "memory alone isn't enough. Memory says 'who the user is and "
+    "what the current situation and state of your operations are'; "
+    "skills say 'how to do this class of task for this user'. Both "
+    "should carry user-preference lessons when relevant.\n\n"
+    "If you notice overlapping existing skills, mention it — the "
+    "background curator handles consolidation.\n\n"
+    "Protected skills (DO NOT edit these):\n"
+    "  • Bundled skills (shipped with Hermes, e.g. 'hermes-agent').\n"
+    "  • Hub-installed skills (installed via 'hermes skills install').\n"
+    "  • Pinned skills (marked via 'hermes curator pin').\n"
+    "If the only skills that need updating are protected, say\n"
+    "'Nothing to save.' and stop.\n\n"
+    "Do NOT capture as skills (these become persistent self-imposed "
+    "constraints that bite you later when the environment changes):\n"
+    "  • Environment-dependent failures: missing binaries, fresh-install "
+    "errors, post-migration path mismatches, 'command not found', "
+    "unconfigured credentials, uninstalled packages. The user can fix "
+    "these — they are not durable rules.\n"
+    "  • Negative claims about tools or features ('browser tools do not "
+    "work', 'X tool is broken', 'cannot use Y from execute_code'). These "
+    "harden into refusals the agent cites against itself for months "
+    "after the actual problem was fixed.\n"
+    "  • Session-specific transient errors that resolved before the "
+    "conversation ended. If retrying worked, the lesson is the retry "
+    "pattern, not the original failure.\n"
+    "  • One-off task narratives. A user asking 'summarize today's "
+    "market' or 'analyze this PR' is not a class of work that warrants "
+    "a skill.\n\n"
+    "If a tool failed because of setup state, capture the FIX (install "
+    "command, config step, env var to set) under an existing setup or "
+    "troubleshooting skill — never 'this tool does not work' as a "
+    "standalone constraint.\n\n"
+    "Act on whichever of the two dimensions has real signal. If "
+    "genuinely nothing stands out on either, say 'Nothing to save.' "
+    "and stop — but don't reach for that conclusion as a default."
+)
+
+
+def summarize_background_review_actions(
+    review_messages: List[Dict],
+    prior_snapshot: List[Dict],
+) -> List[str]:
+    """Build the human-facing action summary for a background review pass.
+
+    Walks the review agent's session messages and collects "successful tool
+    action" descriptions to surface to the user (e.g. "Memory updated").
+    Tool messages already present in ``prior_snapshot`` are skipped so we
+    don't re-surface stale results from the prior conversation that the
+    review agent inherited via ``conversation_history`` (issue #14944).
+
+    Matching is by ``tool_call_id`` when available, with a content-equality
+    fallback for tool messages that lack one.
+    """
+    existing_tool_call_ids = set()
+    existing_tool_contents = set()
+    for prior in prior_snapshot or []:
+        if not isinstance(prior, dict) or prior.get("role") != "tool":
+            continue
+        tcid = prior.get("tool_call_id")
+        if tcid:
+            existing_tool_call_ids.add(tcid)
+        else:
+            content = prior.get("content")
+            if isinstance(content, str):
+                existing_tool_contents.add(content)
+
+    actions: List[str] = []
+    for msg in review_messages or []:
+        if not isinstance(msg, dict) or msg.get("role") != "tool":
+            continue
+        tcid = msg.get("tool_call_id")
+        if tcid and tcid in existing_tool_call_ids:
+            continue
+        if not tcid:
+            content_str = msg.get("content")
+            if isinstance(content_str, str) and content_str in existing_tool_contents:
+                continue
+        try:
+            data = json.loads(msg.get("content", "{}"))
+        except (json.JSONDecodeError, TypeError):
+            continue
+        if not isinstance(data, dict) or not data.get("success"):
+            continue
+        message = str(data.get("message") or "")
+        target = data.get("target") or ""
+        lowered = message.lower()
+        if "created" in lowered or "updated" in lowered:
+            # The tool's own message already reads as a summary.
+            actions.append(message)
+        elif (
+            "added" in lowered
+            or (target and "add" in lowered)
+            or "removed" in lowered
+            or "replaced" in lowered
+        ):
+            # Memory-store edits collapse to a single "<store> updated" line.
+            label = "Memory" if target == "memory" else "User profile" if target == "user" else target
+            actions.append(f"{label} updated")
+    return actions
+
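The skip rule in `summarize_background_review_actions` can be exercised in isolation. The sketch below uses a hypothetical standalone helper (`new_tool_messages` is not part of the module) that applies the same `tool_call_id`-first, content-fallback matching, then joins the surviving messages with `dict.fromkeys` the way `_run_review_in_thread` builds its summary line, so repeated actions appear once and in first-seen order. The tool payloads are made up for illustration.

```python
import json
from typing import Dict, List


def new_tool_messages(review: List[Dict], prior: List[Dict]) -> List[Dict]:
    """Hypothetical helper: tool messages in ``review`` not already in ``prior``."""
    prior_tools = [m for m in prior if isinstance(m, dict) and m.get("role") == "tool"]
    seen_ids = {m["tool_call_id"] for m in prior_tools if m.get("tool_call_id")}
    seen_contents = {
        m["content"]
        for m in prior_tools
        if not m.get("tool_call_id") and isinstance(m.get("content"), str)
    }
    fresh = []
    for m in review:
        if not isinstance(m, dict) or m.get("role") != "tool":
            continue
        tcid = m.get("tool_call_id")
        if tcid and tcid in seen_ids:
            continue  # inherited from the parent conversation
        content = m.get("content")
        if not tcid and isinstance(content, str) and content in seen_contents:
            continue  # id-less duplicate, matched by content
        fresh.append(m)
    return fresh


def tool_result(tcid: str, message: str) -> Dict:
    return {"role": "tool", "tool_call_id": tcid,
            "content": json.dumps({"success": True, "message": message})}


prior = [tool_result("a", "Skill 'deploy' created")]
review = prior + [
    tool_result("b", "Skill 'deploy' updated"),
    tool_result("c", "Skill 'deploy' updated"),
]
fresh = new_tool_messages(review, prior)
messages = [json.loads(m["content"])["message"] for m in fresh]
summary = " · ".join(dict.fromkeys(messages))  # order-preserving dedup
```

The stale "created" result from the inherited history is skipped, and the two identical "updated" results collapse into one summary entry.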
+
+def build_memory_write_metadata(
+    agent: Any,
+    *,
+    write_origin: Optional[str] = None,
+    execution_context: Optional[str] = None,
+    task_id: Optional[str] = None,
+    tool_call_id: Optional[str] = None,
+) -> Dict[str, Any]:
+    """Build provenance metadata for external memory-provider mirrors."""
+    metadata: Dict[str, Any] = {
+        "write_origin": write_origin or getattr(agent, "_memory_write_origin", "assistant_tool"),
+        "execution_context": (
+            execution_context
+            or getattr(agent, "_memory_write_context", "foreground")
+        ),
+        "session_id": agent.session_id or "",
+        "parent_session_id": agent._parent_session_id or "",
+        "platform": agent.platform or os.environ.get("HERMES_SESSION_SOURCE", "cli"),
+        "tool_name": "memory",
+    }
+    if task_id:
+        metadata["task_id"] = task_id
+    if tool_call_id:
+        metadata["tool_call_id"] = tool_call_id
+    return {k: v for k, v in metadata.items() if v not in {None, ""}}
+
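To see what a write carries, here is a condensed copy of `build_memory_write_metadata` (without the keyword overrides and `tool_call_id`) driven by stand-in agents built with `SimpleNamespace`. The attribute values are illustrative only. The point is that the `_memory_write_origin` / `_memory_write_context` attributes set on the review fork flow through, and empty fields such as a missing parent session are dropped rather than stored as blank strings.

```python
import os
from types import SimpleNamespace
from typing import Any, Dict, Optional


def memory_write_metadata(agent: Any, task_id: Optional[str] = None) -> Dict[str, Any]:
    # Condensed copy of build_memory_write_metadata's default behaviour.
    metadata: Dict[str, Any] = {
        "write_origin": getattr(agent, "_memory_write_origin", "assistant_tool"),
        "execution_context": getattr(agent, "_memory_write_context", "foreground"),
        "session_id": agent.session_id or "",
        "parent_session_id": agent._parent_session_id or "",
        "platform": agent.platform or os.environ.get("HERMES_SESSION_SOURCE", "cli"),
        "tool_name": "memory",
    }
    if task_id:
        metadata["task_id"] = task_id
    # Drop None / "" so providers never record blank provenance keys.
    return {k: v for k, v in metadata.items() if v not in (None, "")}


# Foreground agent: no origin overrides, no parent session.
foreground = SimpleNamespace(session_id="sess-1", _parent_session_id=None, platform="cli")
# Review fork: pinned to the parent's session_id, parented to it, origin overridden.
review_fork = SimpleNamespace(
    session_id="sess-1",
    _parent_session_id="sess-1",
    platform="cli",
    _memory_write_origin="background_review",
    _memory_write_context="background_review",
)
fg = memory_write_metadata(foreground)
bg = memory_write_metadata(review_fork, task_id="t1")
```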
+
+def _run_review_in_thread(
+    agent: Any,
+    messages_snapshot: List[Dict],
+    prompt: str,
+) -> None:
+    """Worker function executed in the background-review daemon thread.
+
+    Spawns a forked ``AIAgent`` inheriting the parent's runtime, runs the
+    review prompt, and surfaces a compact action summary back to the user
+    via ``agent._safe_print`` and ``agent.background_review_callback``.
+    """
+    # Local import to avoid a hard circular dep at module load.
+    from run_agent import AIAgent
+    from tools.terminal_tool import set_approval_callback as _set_approval_callback
+
+    # Install a non-interactive approval callback on this worker
+    # thread so any dangerous-command guard the review agent trips
+    # resolves to "deny" instead of falling back to input() -- which
+    # deadlocks against the parent's prompt_toolkit TUI (#15216).
+    # Same pattern as _subagent_auto_deny in tools/delegate_tool.py.
+    def _bg_review_auto_deny(command, description, **kwargs):
+        logger.warning(
+            "Background review auto-denied dangerous command: %s (%s)",
+            command, description,
+        )
+        return "deny"
+    try:
+        _set_approval_callback(_bg_review_auto_deny)
+    except Exception:
+        pass
+
+    review_agent = None
+    review_messages: List[Dict] = []
+    try:
+        with open(os.devnull, "w", encoding="utf-8") as _devnull, \
+             contextlib.redirect_stdout(_devnull), \
+             contextlib.redirect_stderr(_devnull):
+            # Inherit the parent agent's live runtime (provider, model,
+            # base_url, api_key, api_mode) so the fork uses the exact
+            # same credentials the main turn is using.  Without this,
+            # AIAgent.__init__ re-runs auto-resolution from env vars,
+            # which fails for OAuth-only providers, session-scoped
+            # creds, or credential-pool setups where the resolver can't
+            # reconstruct auth from scratch -- producing the spurious
+            # "No LLM provider configured" warning at end of turn.
+            _parent_runtime = agent._current_main_runtime()
+            _parent_api_mode = _parent_runtime.get("api_mode") or None
+            # The review fork needs to call agent-loop tools (memory,
+            # skill_manage). Those tools require Hermes' own dispatch,
+            # which the codex_app_server runtime bypasses entirely
+            # (it runs the turn inside codex's subprocess). So when
+            # the parent is on codex_app_server, downgrade the review
+            # fork to codex_responses — same auth/credentials, but
+            # talks to the OpenAI Responses API directly so Hermes
+            # owns the loop and the agent-loop tools dispatch.
+            if _parent_api_mode == "codex_app_server":
+                _parent_api_mode = "codex_responses"
+            # skip_memory=True keeps the review fork from
+            # touching external memory plugins (honcho, mem0,
+            # supermemory, etc.).  Without it, the fork's
+            # __init__ rebuilds its own _memory_manager from
+            # config, scoped to the parent's session_id, and
+            # run_conversation() then leaks the harness prompt
+            # into the user's real memory namespace via three
+            # ingestion sites: on_turn_start (cadence + turn
+            # message), prefetch_all (recall query), and
+            # sync_all (harness prompt + review output recorded
+            # as a (user, assistant) turn pair).  Built-in
+            # MEMORY.md / USER.md state is re-bound from the
+            # parent below so memory(action="add") writes from
+            # the review still land on disk; the review just
+            # has zero side effects on external providers.
+            # Match parent's toolset config so ``tools[]`` is byte-identical
+            # in the request body — Anthropic's cache key includes it.
+            # (The runtime whitelist below still restricts dispatch.)
+            review_agent = AIAgent(
+                model=agent.model,
+                max_iterations=16,
+                quiet_mode=True,
+                platform=agent.platform,
+                provider=agent.provider,
+                api_mode=_parent_api_mode,
+                base_url=_parent_runtime.get("base_url") or None,
+                api_key=_parent_runtime.get("api_key") or None,
+                credential_pool=getattr(agent, "_credential_pool", None),
+                parent_session_id=agent.session_id,
+                enabled_toolsets=getattr(agent, "enabled_toolsets", None),
+                disabled_toolsets=getattr(agent, "disabled_toolsets", None),
+                skip_memory=True,
+            )
+            review_agent._memory_write_origin = "background_review"
+            review_agent._memory_write_context = "background_review"
+            review_agent._memory_store = agent._memory_store
+            review_agent._memory_enabled = agent._memory_enabled
+            review_agent._user_profile_enabled = agent._user_profile_enabled
+            review_agent._memory_nudge_interval = 0
+            review_agent._skill_nudge_interval = 0
+            # Suppress all status/warning emits from the fork so the
+            # user only sees the final successful-action summary.
+            # Without this, mid-review "Iteration budget exhausted",
+            # rate-limit retries, compression warnings, and other
+            # lifecycle messages bubble up through _emit_status ->
+            # _vprint and leak past the stdout redirect (they go via
+            # _print_fn/status_callback, which bypass sys.stdout).
+            review_agent.suppress_status_output = True
+            # Inherit the parent's cached system prompt verbatim so
+            # the review fork's outbound HTTP request hits the same
+            # Anthropic/OpenRouter prefix cache the parent warmed.
+            # Without this, the fork rebuilds the system prompt from
+            # scratch (fresh _hermes_now() timestamp, fresh
+            # session_id, narrower toolset → different skills_prompt)
+            # and the byte-exact prefix-cache key misses. See
+            # issue #25322 and PR #17276 for the full analysis +
+            # measured impact (~26% end-to-end cost reduction on
+            # Sonnet 4.5).
+            review_agent._cached_system_prompt = agent._cached_system_prompt
+            # Defensive: pin session_start + session_id to the
+            # parent's so any code path that re-renders parts of
+            # the system prompt (compression, plugin hooks) still
+            # produces byte-identical output. The cached-prompt
+            # assignment above already short-circuits the normal
+            # rebuild path, but these pins guarantee parity even
+            # if a future code path bypasses the cache.
+            review_agent.session_start = agent.session_start
+            review_agent.session_id = agent.session_id
+
+            from model_tools import get_tool_definitions
+            from hermes_cli.plugins import (
+                set_thread_tool_whitelist,
+                clear_thread_tool_whitelist,
+            )
+
+            review_whitelist = {
+                t["function"]["name"]
+                for t in get_tool_definitions(
+                    enabled_toolsets=["memory", "skills"],
+                    quiet_mode=True,
+                )
+            }
+            set_thread_tool_whitelist(
+                review_whitelist,
+                deny_msg_fmt=(
+                    "Background review denied non-whitelisted tool: "
+                    "{tool_name}. Only memory/skill tools are allowed."
+                ),
+            )
+            try:
+                review_agent.run_conversation(
+                    user_message=(
+                        prompt
+                        + "\n\nYou can only call memory and skill "
+                        "management tools. Other tools will be denied "
+                        "at runtime — do not attempt them."
+                    ),
+                    conversation_history=messages_snapshot,
+                )
+            finally:
+                clear_thread_tool_whitelist()
+
+            # Tear down memory providers while stdout is still
+            # redirected so background thread teardown (Honcho flush,
+            # Hindsight sync, etc.) stays silent.  The finally block
+            # below is a safety net for the exception path.
+            try:
+                review_agent.shutdown_memory_provider()
+            except Exception:
+                pass
+            try:
+                review_agent.close()
+            except Exception:
+                pass
+            review_messages = list(getattr(review_agent, "_session_messages", []))
+            review_agent = None
+
+        # Scan the review agent's messages for successful tool actions
+        # and surface a compact summary to the user. Tool messages
+        # already present in messages_snapshot must be skipped, since
+        # the review agent inherits that history and would otherwise
+        # re-surface stale "created"/"updated" messages from the prior
+        # conversation as if they just happened (issue #14944).
+        actions = summarize_background_review_actions(
+            review_messages,
+            messages_snapshot,
+        )
+
+        if actions:
+            summary = " · ".join(dict.fromkeys(actions))
+            agent._safe_print(
+                f"  💾 Self-improvement review: {summary}"
+            )
+            _bg_cb = agent.background_review_callback
+            if _bg_cb:
+                try:
+                    _bg_cb(
+                        f"💾 Self-improvement review: {summary}"
+                    )
+                except Exception:
+                    pass
+
+    except Exception as e:
+        logger.warning("Background memory/skill review failed: %s", e)
+        agent._emit_auxiliary_failure("background review", e)
+    finally:
+        # Safety-net cleanup for the exception path.  Normal
+        # completion already shut down inside redirect_stdout above.
+        # Re-open devnull here so any teardown output (Honcho flush,
+        # Hindsight sync, background thread joins) stays silent even
+        # on the exception path where redirect_stdout already exited.
+        if review_agent is not None:
+            try:
+                with open(os.devnull, "w", encoding="utf-8") as _fn, \
+                     contextlib.redirect_stdout(_fn), \
+                     contextlib.redirect_stderr(_fn):
+                    try:
+                        review_agent.shutdown_memory_provider()
+                    except Exception:
+                        pass
+                    try:
+                        review_agent.close()
+                    except Exception:
+                        pass
+            except Exception:
+                pass
+        # Clear the approval callback on this bg-review thread so a
+        # recycled thread-id doesn't inherit a stale reference.
+        try:
+            _set_approval_callback(None)
+        except Exception:
+            pass
+
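The tool restriction above relies on state scoped to the worker thread: the whitelist is set before `run_conversation` and cleared in a `finally`. The sketch below assumes a `threading.local`-based store; it is not the `hermes_cli.plugins` implementation, and `set_whitelist`, `clear_whitelist`, and `dispatch_tool` are hypothetical names. It shows why the pattern is safe: the review thread's whitelist never leaks into the parent thread, and the `finally` clears it even if the review raises.

```python
import threading
from typing import Dict, Iterable, List, Union

_state = threading.local()  # each thread sees its own attributes


def set_whitelist(names: Iterable[str], deny_msg_fmt: str) -> None:
    _state.whitelist = frozenset(names)
    _state.deny_msg_fmt = deny_msg_fmt


def clear_whitelist() -> None:
    _state.whitelist = None


def dispatch_tool(tool_name: str) -> str:
    whitelist = getattr(_state, "whitelist", None)
    if whitelist is not None and tool_name not in whitelist:
        return _state.deny_msg_fmt.format(tool_name=tool_name)
    return f"ran {tool_name}"


results: Dict[str, Union[str, List[str]]] = {}


def review_worker() -> None:
    set_whitelist(
        {"memory", "skill_manage"},
        deny_msg_fmt="Background review denied non-whitelisted tool: {tool_name}.",
    )
    try:
        results["review"] = [dispatch_tool("memory"), dispatch_tool("terminal")]
    finally:
        clear_whitelist()  # runs even if the review raises


worker = threading.Thread(target=review_worker, daemon=True)
worker.start()
worker.join()
results["parent"] = dispatch_tool("terminal")  # parent thread is unrestricted
```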
+
+def spawn_background_review_thread(
+    agent: Any,
+    messages_snapshot: List[Dict],
+    review_memory: bool = False,
+    review_skills: bool = False,
+):
+    """Build the review thread target and prompt for a background review.
+
+    Returns a ``(target, prompt)`` tuple.  The caller (``AIAgent._spawn_background_review``)
+    owns the actual ``threading.Thread`` construction so test-level patches
+    of ``run_agent.threading.Thread`` keep working.
+    """
+    # Pick the prompt based on which triggers fired (skills-only is the
+    # fallback). A per-agent attribute overrides the module-level constant,
+    # so older code paths that set agent._MEMORY_REVIEW_PROMPT etc. directly
+    # keep working.
+    if review_memory and review_skills:
+        prompt = getattr(agent, "_COMBINED_REVIEW_PROMPT", _COMBINED_REVIEW_PROMPT)
+    elif review_memory:
+        prompt = getattr(agent, "_MEMORY_REVIEW_PROMPT", _MEMORY_REVIEW_PROMPT)
+    else:
+        prompt = getattr(agent, "_SKILL_REVIEW_PROMPT", _SKILL_REVIEW_PROMPT)
+
+    def _target() -> None:
+        _run_review_in_thread(agent, messages_snapshot, prompt)
+
+    return _target, prompt
+
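The prompt selection and the split of responsibilities (this module builds the target, the caller owns the `threading.Thread`) can be sketched with stand-ins. The prompt strings, the `record_review` stub standing in for `_run_review_in_thread`, and the `SimpleNamespace` agents are all hypothetical; the `getattr` fallbacks mirror the per-agent override described in the function's comments.

```python
import threading
from types import SimpleNamespace
from typing import Callable, Dict, List, Tuple

MEMORY_PROMPT = "review memory"
SKILL_PROMPT = "review skills"
COMBINED_PROMPT = "review memory and skills"

calls: List[Tuple[str, int]] = []


def record_review(agent: object, snapshot: List[Dict], prompt: str) -> None:
    # Stub for _run_review_in_thread: remember what it was asked to do.
    calls.append((prompt, len(snapshot)))


def pick_prompt(agent: object, review_memory: bool, review_skills: bool) -> str:
    if review_memory and review_skills:
        return getattr(agent, "_COMBINED_REVIEW_PROMPT", COMBINED_PROMPT)
    if review_memory:
        return getattr(agent, "_MEMORY_REVIEW_PROMPT", MEMORY_PROMPT)
    return getattr(agent, "_SKILL_REVIEW_PROMPT", SKILL_PROMPT)  # fallback


def build_review_target(
    agent: object,
    snapshot: List[Dict],
    review_memory: bool = False,
    review_skills: bool = False,
) -> Tuple[Callable[[], None], str]:
    prompt = pick_prompt(agent, review_memory, review_skills)

    def _target() -> None:
        record_review(agent, snapshot, prompt)

    return _target, prompt


plain_agent = SimpleNamespace()
legacy_agent = SimpleNamespace(_MEMORY_REVIEW_PROMPT="custom memory prompt")

target, prompt = build_review_target(
    plain_agent, [{"role": "user", "content": "hi"}],
    review_memory=True, review_skills=True,
)
thread = threading.Thread(target=target, daemon=True)  # caller owns the Thread
thread.start()
thread.join()
```

Keeping `Thread` construction in the caller is what lets tests patch `run_agent.threading.Thread` without touching this module.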
+
+__all__ = [
+    "_MEMORY_REVIEW_PROMPT",
+    "_SKILL_REVIEW_PROMPT",
+    "_COMBINED_REVIEW_PROMPT",
+    "spawn_background_review_thread",
+    "summarize_background_review_actions",
+    "build_memory_write_metadata",
+]
--- a/agent/bedrock_adapter.py
+++ b/agent/bedrock_adapter.py
@@ -36,6 +36,19 @@ from typing import Any, Dict, List, Optional, Tuple

 logger = logging.getLogger(__name__)

+# ---------------------------------------------------------------------------
+# Best-effort: ensure boto3/botocore are installed when this module is imported.
+# Upstream removed boto3 from [all] extras (PRs #24220, #24515); lazy_deps
+# handles on-demand installation so the Bedrock provider still works in the
+# EKS deployment without baking boto3 into the base image.
+# ---------------------------------------------------------------------------
+try:
+    from tools.lazy_deps import ensure
+    ensure("provider.bedrock", prompt=False)
+except Exception:
+    pass  # lazy_deps unavailable or install failed — let downstream imports surface the real error
+
+
 # ---------------------------------------------------------------------------
 # Lazy boto3 import — only loaded when the Bedrock provider is actually used.
 # This keeps startup fast for users who don't use Bedrock.
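The bedrock hunk follows a simple shape: attempt an optional install once, swallow any failure, and let the later real `import boto3` raise the informative error. A generic sketch of that shape follows. It does not use `tools.lazy_deps` (whose internals are not shown here); it assumes a caller-supplied `installer` callable instead, and `ensure_importable` is a hypothetical name.

```python
import importlib
import importlib.util
import logging
from typing import Callable, List, Optional

logger = logging.getLogger(__name__)


def ensure_importable(
    module_name: str, installer: Optional[Callable[[str], None]] = None
) -> bool:
    """Return True if ``module_name`` can be imported, trying ``installer`` once.

    Never raises: a failed install is logged and reported as False so the
    caller's real import statement surfaces the actual ImportError later.
    """
    if importlib.util.find_spec(module_name) is not None:
        return True
    if installer is None:
        return False
    try:
        installer(module_name)
    except Exception:
        logger.debug("Optional install of %s failed", module_name, exc_info=True)
        return False
    importlib.invalidate_caches()  # let the import system see new packages
    return importlib.util.find_spec(module_name) is not None


attempts: List[str] = []


def failing_installer(name: str) -> None:
    # Hypothetical installer that always fails (e.g. no network).
    attempts.append(name)
    raise RuntimeError("no network")
```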
--- a/agent/browser_provider.py
+++ b/agent/browser_provider.py
@@ -0,0 +1,175 @@
+"""
+Browser Provider ABC
+====================
+
+Defines the pluggable-backend interface for cloud browser providers
+(Browserbase, Browser Use, Firecrawl, …). Providers register instances via
+:meth:`PluginContext.register_browser_provider`; the active one (selected via
+``browser.cloud_provider`` in ``config.yaml``) services every cloud-mode
+``browser_*`` tool call.
+
+Providers live in ``<repo>/plugins/browser/<name>/`` (built-in, auto-loaded as
+``kind: backend``) or ``~/.hermes/plugins/browser/<name>/`` (user, opt-in via
+``plugins.enabled``).
+
+This ABC mirrors :class:`agent.web_search_provider.WebSearchProvider` (PR
+#25182) — same shape, same registration flow, same picker integration. The
+legacy in-tree ``tools.browser_providers.base.CloudBrowserProvider`` ABC was
+deleted in PR #25214 (this work) along with the per-vendor inline modules in
+``tools/browser_providers/``; the lifecycle contract documented below is
+preserved bit-for-bit so the tool wrapper (:mod:`tools.browser_tool`) does
+not have to translate.
+
+Session metadata contract (preserved from the legacy ``CloudBrowserProvider``)::
+
+    {
+        "session_name": str,        # unique name for agent-browser --session
+        "bb_session_id": str,       # provider session ID (for close/cleanup)
+        "cdp_url": str,             # CDP websocket URL
+        "features": dict,           # feature flags that were enabled
+        "external_call_id": str,    # optional, managed-gateway billing key
+    }
+
+``bb_session_id`` is a legacy key name kept verbatim for backward compat with
+:mod:`tools.browser_tool` — it holds the provider's session ID regardless of
+which provider is in use.
+"""
+
+from __future__ import annotations
+
+import abc
+from typing import Any, Dict
+
+
+# ---------------------------------------------------------------------------
+# ABC
+# ---------------------------------------------------------------------------
+
+
+class BrowserProvider(abc.ABC):
+    """Abstract base class for a cloud browser backend.
+
+    Subclasses must implement :attr:`name`, :meth:`is_available`, and the
+    three lifecycle methods: :meth:`create_session`, :meth:`close_session`,
+    :meth:`emergency_cleanup`.
+
+    The lifecycle shape preserves the legacy ``CloudBrowserProvider`` contract
+    bit-for-bit so the dispatcher in :mod:`tools.browser_tool` is a pure
+    registry lookup — no per-provider conditionals, no shape translation.
+    """
+
+    @property
+    @abc.abstractmethod
+    def name(self) -> str:
+        """Stable short identifier used in the ``browser.cloud_provider``
+        config key.
+
+        Lowercase, hyphens permitted to preserve existing user-visible names.
+        Examples: ``browserbase``, ``browser-use``, ``firecrawl``.
+        """
+
+    @property
+    def display_name(self) -> str:
+        """Human-readable label shown in ``hermes tools``. Defaults to ``name``."""
+        return self.name
+
+    @abc.abstractmethod
+    def is_available(self) -> bool:
+        """Return True when this provider can service calls.
+
+        Typically a cheap check (env var present, managed-gateway token
+        readable, optional Python dep importable). Must NOT make network
+        calls — this runs at tool-registration time and on every
+        ``hermes tools`` paint.
+
+        Mirrors the legacy ``CloudBrowserProvider.is_configured()`` method;
+        renamed for parity with :class:`agent.web_search_provider.WebSearchProvider`.
+        """
+
+    @abc.abstractmethod
+    def create_session(self, task_id: str) -> Dict[str, object]:
+        """Create a cloud browser session and return session metadata.
+
+        Must return a dict with at least::
+
+            {
+                "session_name": str,    # unique name for agent-browser --session
+                "bb_session_id": str,   # provider session ID (for close/cleanup)
+                "cdp_url": str,         # CDP websocket URL
+                "features": dict,       # feature flags that were enabled
+            }
+
+        ``bb_session_id`` is a legacy key name kept for backward compat with
+        the rest of :mod:`tools.browser_tool` — it holds the provider's
+        session ID regardless of which provider is in use.
+
+        May raise ``ValueError`` (missing credentials) or ``RuntimeError``
+        (network / API failure); the dispatcher surfaces these to the user.
+        """
+
+    @abc.abstractmethod
+    def close_session(self, session_id: str) -> bool:
+        """Release / terminate a cloud session by its provider session ID.
+
+        Returns True on success, False on failure. Should not raise — log and
+        return False on any exception so the dispatcher's cleanup loop keeps
+        moving across sessions.
+        """
+
+    @abc.abstractmethod
+    def emergency_cleanup(self, session_id: str) -> None:
+        """Best-effort session teardown during process exit.
+
+        Called from atexit / signal handlers. Must tolerate missing
+        credentials, network errors, etc. — log and move on. Must not raise.
+        """
+
+    def get_setup_schema(self) -> Dict[str, Any]:
+        """Return provider metadata for the ``hermes tools`` picker.
+
+        Used by :mod:`hermes_cli.tools_config` to inject this provider as a
+        row in the Browser Automation picker. Shape mirrors the existing
+        hardcoded entries in ``TOOL_CATEGORIES["browser"]``::
+
+            {
+                "name": "Browserbase",
+                "badge": "paid",
+                "tag": "Cloud browser with stealth and proxies",
+                "env_vars": [
+                    {"key": "BROWSERBASE_API_KEY",
+                     "prompt": "Browserbase API key",
+                     "url": "https://browserbase.com"},
+                ],
+                "post_setup": "agent_browser",
+            }
+
+        Default: minimal entry derived from :attr:`display_name`. Override to
+        expose API key prompts, badges, managed-Nous gating, and the
+        ``post_setup`` install hook.
+        """
+        return {
+            "name": self.display_name,
+            "badge": "",
+            "tag": "",
+            "env_vars": [],
+        }
+
+    # ------------------------------------------------------------------
+    # Backward-compat shims for the legacy CloudBrowserProvider API
+    # ------------------------------------------------------------------
+    #
+    # The pre-PR-#25214 ABC exposed ``is_configured()`` and ``provider_name()``;
+    # ``tools.browser_tool`` has ~6 callers that still use those names. Rather
+    # than churn every callsite (and break out-of-tree downstream code that
+    # subclassed CloudBrowserProvider), we expose the old names as thin
+    # delegations to the new API. Subclasses MUST implement :meth:`is_available`
+    # and :attr:`name`; overriding ``is_configured`` / ``provider_name`` is
+    # optional and only needed when a subclass wants legacy-specific behaviour.
+
+    def is_configured(self) -> bool:
+        """Backward-compat alias for :meth:`is_available`."""
+        return self.is_available()
+
+    def provider_name(self) -> str:
+        """Backward-compat alias returning :attr:`display_name`."""
+        return self.display_name
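A minimal provider that satisfies this contract could look like the sketch below. To stay self-contained it repeats a trimmed copy of the ABC (without `get_setup_schema`); `InMemoryProvider`, its `in-memory` name, the `FAKE_BROWSER_API_KEY` variable, and the fake CDP URL are all hypothetical. It never touches the network: `is_available()` is an environment lookup, `close_session()` reports failure instead of raising, and `emergency_cleanup()` swallows errors, as the docstrings above require.

```python
import abc
import os
import uuid
from typing import Dict, Mapping, Optional


class BrowserProvider(abc.ABC):
    """Trimmed copy of the ABC above, for a self-contained example."""

    @property
    @abc.abstractmethod
    def name(self) -> str: ...

    @property
    def display_name(self) -> str:
        return self.name

    @abc.abstractmethod
    def is_available(self) -> bool: ...

    @abc.abstractmethod
    def create_session(self, task_id: str) -> Dict[str, object]: ...

    @abc.abstractmethod
    def close_session(self, session_id: str) -> bool: ...

    @abc.abstractmethod
    def emergency_cleanup(self, session_id: str) -> None: ...

    def is_configured(self) -> bool:  # legacy alias
        return self.is_available()

    def provider_name(self) -> str:  # legacy alias
        return self.display_name


class InMemoryProvider(BrowserProvider):
    """Hypothetical provider that tracks fake sessions in a dict."""

    def __init__(self, env: Optional[Mapping[str, str]] = None) -> None:
        self._env = os.environ if env is None else env
        self._sessions: Dict[str, str] = {}

    @property
    def name(self) -> str:
        return "in-memory"

    def is_available(self) -> bool:
        # Cheap, offline check only.
        return bool(self._env.get("FAKE_BROWSER_API_KEY"))

    def create_session(self, task_id: str) -> Dict[str, object]:
        if not self.is_available():
            raise ValueError("FAKE_BROWSER_API_KEY is not set")
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = task_id
        return {
            "session_name": f"hermes-{task_id}",
            "bb_session_id": session_id,  # legacy key; holds this provider's ID
            "cdp_url": f"ws://127.0.0.1:9222/devtools/browser/{session_id}",
            "features": {},
        }

    def close_session(self, session_id: str) -> bool:
        # False (not an exception) when the session is unknown.
        return self._sessions.pop(session_id, None) is not None

    def emergency_cleanup(self, session_id: str) -> None:
        try:
            self._sessions.pop(session_id, None)
        except Exception:
            pass  # must never raise during shutdown


provider = InMemoryProvider(env={"FAKE_BROWSER_API_KEY": "test-key"})
session = provider.create_session("task-1")
closed_first = provider.close_session(session["bb_session_id"])
closed_again = provider.close_session(session["bb_session_id"])
```

Because the legacy `is_configured()` / `provider_name()` names are plain delegations, a subclass only has to implement the new API for old call sites to keep working.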
--- a/agent/browser_registry.py
+++ b/agent/browser_registry.py
@@ -0,0 +1,223 @@
+"""
+Browser Provider Registry
+=========================
+
+Central map of registered cloud browser providers. Populated by plugins at
+import-time via :meth:`PluginContext.register_browser_provider`; consumed by
+:func:`tools.browser_tool._get_cloud_provider` to route each cloud-mode
+``browser_*`` tool call to the active backend.
+
+Active selection
+----------------
+The active provider is chosen by configuration with this precedence:
+
+1. ``browser.cloud_provider`` in ``config.yaml`` (explicit override; the
+   value ``local`` disables cloud mode and resolves to ``None``).
+2. Legacy preference order — ``browser-use`` → ``browserbase`` — filtered by
+   availability. Matches the historic auto-detect order in
+   :func:`tools.browser_tool._get_cloud_provider` (Browser Use checked first
+   because it covers both the managed Nous gateway and direct API key path;
+   Browserbase as the older direct-credentials fallback). ``firecrawl`` is
+   intentionally NOT in the legacy walk — users only get Firecrawl as a
+   cloud browser when they explicitly set ``browser.cloud_provider:
+   firecrawl``, matching pre-migration behaviour where Firecrawl was never
+   auto-selected.
+3. Otherwise ``None`` — the dispatcher falls back to local browser mode.
+
+The explicit-config branch (rule 1) intentionally ignores ``is_available()``
+so the dispatcher surfaces a typed "X_API_KEY is not set" error to the user
+instead of silently switching backends. Matches the legacy
+:func:`tools.browser_tool._get_cloud_provider` behaviour for configured names.
+
+Note: there is no "capability" split here (unlike the web subsystem, which
+has search/extract/crawl). Every browser provider implements the full
+:class:`agent.browser_provider.BrowserProvider` lifecycle; the registry's
+job is purely selection, not capability routing.
+"""
+
+from __future__ import annotations
+
+import logging
+import threading
+from typing import Dict, List, Optional
+
+from agent.browser_provider import BrowserProvider
+
+logger = logging.getLogger(__name__)
+
+
+_providers: Dict[str, BrowserProvider] = {}
+_lock = threading.Lock()
+
+
+def register_provider(provider: BrowserProvider) -> None:
+    """Register a cloud browser provider.
+
+    Re-registration (same ``name``) overwrites the previous entry and logs
+    a debug message, which keeps hot-reload scenarios (tests, dev loops)
+    predictable.
+    """
+    if not isinstance(provider, BrowserProvider):
+        raise TypeError(
+            f"register_provider() expects a BrowserProvider instance, "
+            f"got {type(provider).__name__}"
+        )
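+    name = provider.name
+    if not isinstance(name, str) or not name.strip():
+        raise ValueError("Browser provider .name must be a non-empty string")
+    # Store under the stripped name so get_provider(), which strips its
+    # argument, always finds the entry.
+    name = name.strip()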
+    with _lock:
+        existing = _providers.get(name)
+        _providers[name] = provider
+    if existing is not None:
+        logger.debug(
+            "Browser provider '%s' re-registered (was %r)",
+            name, type(existing).__name__,
+        )
+    else:
+        logger.debug(
+            "Registered browser provider '%s' (%s)",
+            name, type(provider).__name__,
+        )
+
+
+def list_providers() -> List[BrowserProvider]:
+    """Return all registered providers, sorted by name."""
+    with _lock:
+        items = list(_providers.values())
+    return sorted(items, key=lambda p: p.name)
+
+
+def get_provider(name: str) -> Optional[BrowserProvider]:
+    """Return the provider registered under *name*, or None."""
+    if not isinstance(name, str):
+        return None
+    with _lock:
+        return _providers.get(name.strip())
+
+
+# ---------------------------------------------------------------------------
+# Active-provider resolution
+# ---------------------------------------------------------------------------
+
+
+# Legacy auto-detect order — used when no ``browser.cloud_provider`` is set.
+# Matches the pre-migration walk in :func:`tools.browser_tool._get_cloud_provider`.
+# Firecrawl is intentionally absent so users with ``FIRECRAWL_API_KEY`` set
+# for web-extract don't get silently routed to a paid cloud browser. See
+# :func:`_resolve` for the full rationale.
+_LEGACY_PREFERENCE = (
+    "browser-use",
+    "browserbase",
+)
+
+
+def _resolve(configured: Optional[str]) -> Optional[BrowserProvider]:
+    """Resolve the active browser provider.
+
+    Resolution rules (in order):
+
+    1. **Explicit "local".** Returns None, so the dispatcher disables cloud
+       mode entirely. Mirrors the legacy short-circuit in
+       :func:`tools.browser_tool._get_cloud_provider`.
+    2. **Explicit config wins, ignoring availability.** If ``configured``
+       names a registered provider, return it even if its
+       :meth:`is_available` returns False — the dispatcher will surface a
+       precise "X_API_KEY is not set" error instead of silently routing
+       somewhere else.
+    3. **Legacy preference walk, filtered by availability.** Walk
+       :data:`_LEGACY_PREFERENCE` (``browser-use`` → ``browserbase``) looking
+       for a provider whose ``is_available()`` is True.
+
+    There is intentionally NO "single-eligible shortcut" rule here (unlike
+    :func:`agent.web_search_registry._resolve`). Pre-migration, the
+    auto-detect branch in ``tools.browser_tool._get_cloud_provider`` only
+    considered Browser Use and Browserbase; Firecrawl was reachable only
+    via an explicit ``browser.cloud_provider: firecrawl`` config key.
+    Preserving that gate matters because Firecrawl shares its API key with
+    the *web* extract plugin (``plugins/web/firecrawl/``), so users who set
+    ``FIRECRAWL_API_KEY`` for web extract must NOT get silently routed to a
+    paid cloud browser on a fresh install. Third-party browser-provider
+    plugins added under ``~/.hermes/plugins/browser/<vendor>/`` are subject
+    to the same gate — they must be explicitly configured to take effect.
+
+    Returns None when no provider is configured AND no available provider
+    matches the legacy preference; the dispatcher then falls back to local
+    browser mode.
+    """
+    with _lock:
+        snapshot = dict(_providers)
+
+    def _is_available_safe(p: BrowserProvider) -> bool:
+        """Wrap ``is_available()`` so a buggy provider doesn't kill resolution."""
+        try:
+            return bool(p.is_available())
+        except Exception as exc:  # noqa: BLE001
+            logger.warning(
+                "Browser provider %s.is_available() raised %s — treating as unavailable",
+                p.name, exc, exc_info=True,
+            )
+            return False
+
+    # 1. Explicit "local" short-circuit.
+    if configured == "local":
+        return None
+
+    # 2. Explicit config wins — return regardless of is_available() so the
+    #    user gets a precise downstream error message rather than a silent
+    #    backend switch. Matches _get_cloud_provider() in browser_tool.py.
+    if configured:
+        provider = snapshot.get(configured)
+        if provider is not None:
+            return provider
+        logger.debug(
+            "browser cloud_provider '%s' configured but not registered; "
+            "falling back to auto-detect",
+            configured,
+        )
+
+    # 3. Legacy preference walk — only providers in _LEGACY_PREFERENCE are
+    #    auto-eligible. Filtered by availability so we don't surface a
+    #    provider the user has no credentials for. See docstring for why
+    #    we do NOT fall back to "any single-eligible registered provider".
+    for legacy in _LEGACY_PREFERENCE:
+        provider = snapshot.get(legacy)
+        if provider is not None and _is_available_safe(provider):
+            return provider
+
+    return None
+
+
+def get_active_browser_provider() -> Optional[BrowserProvider]:
+    """Resolve the currently-active cloud browser provider.
+
+    Reads ``browser.cloud_provider`` from config.yaml and applies the
+    resolution rules documented on :func:`_resolve`. Returns None for local
+    mode or when no provider is available.
+    """
+    try:
+        from hermes_cli.config import read_raw_config
+
+        cfg = read_raw_config()
+        browser_cfg = cfg.get("browser", {})
+    except Exception as exc:
+        logger.debug("Could not read browser config: %s", exc)
+        browser_cfg = {}
+
+    configured: Optional[str] = None
+    if isinstance(browser_cfg, dict) and "cloud_provider" in browser_cfg:
+        try:
+            from tools.tool_backend_helpers import normalize_browser_cloud_provider
+
+            configured = normalize_browser_cloud_provider(
+                browser_cfg.get("cloud_provider")
+            )
+        except Exception as exc:
+            logger.debug("normalize_browser_cloud_provider failed: %s", exc)
+            configured = None
+
+    return _resolve(configured)
+
+
+def _reset_for_tests() -> None:
+    """Clear the registry. **Test-only.**"""
+    with _lock:
+        _providers.clear()
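The three resolution rules in `_resolve` can be exercised in isolation. The sketch below is a simplified, standalone model under stated assumptions, not the module's code: `FakeProvider` is a hypothetical stand-in for `BrowserProvider` (assumed to need only `.name` and `is_available()`), and the registry is a plain dict rather than the locked module-level `_providers`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Auto-detect order, mirroring _LEGACY_PREFERENCE above (Firecrawl deliberately absent).
LEGACY_PREFERENCE = ("browser-use", "browserbase")


@dataclass
class FakeProvider:
    """Hypothetical stand-in for BrowserProvider: a name plus an availability check."""

    name: str
    available: Callable[[], bool]

    def is_available(self) -> bool:
        return self.available()


def resolve(configured: Optional[str], registry: Dict[str, FakeProvider]) -> Optional[FakeProvider]:
    # Rule 1: explicit "local" disables cloud mode entirely.
    if configured == "local":
        return None
    # Rule 2: a configured, registered provider wins even if it is unavailable,
    # so the caller can report the precise missing-credential error.
    if configured and configured in registry:
        return registry[configured]
    # Rule 3: walk the legacy preference list, skipping unavailable providers.
    for name in LEGACY_PREFERENCE:
        provider = registry.get(name)
        if provider is None:
            continue
        try:
            if provider.is_available():
                return provider
        except Exception:
            continue  # a crashing is_available() is treated as unavailable
    return None


def _broken_check() -> bool:
    raise RuntimeError("credentials lookup failed")


registry = {
    "browser-use": FakeProvider("browser-use", _broken_check),
    "browserbase": FakeProvider("browserbase", lambda: True),
    "firecrawl": FakeProvider("firecrawl", lambda: True),
}

print(resolve(None, registry).name)         # browserbase (browser-use raised, so it was skipped)
print(resolve("firecrawl", registry).name)  # firecrawl (reachable only via explicit config)
print(resolve("local", registry))           # None
print(resolve(None, {"firecrawl": registry["firecrawl"]}))  # None: Firecrawl is never auto-detected
```

The last call is the case the docstring guards against: a user whose `FIRECRAWL_API_KEY` makes Firecrawl "available" still gets no cloud browser unless they configure it explicitly.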
--- a/agent/chat_completion_helpers.py
+++ b/agent/chat_completion_helpers.py
--- a/agent/codex_responses_adapter.py
+++ b/agent/codex_responses_adapter.py
@@ -244,8 +244,24 @@ def _normalize_responses_message_status(value: Any, *, default: str = "completed
    return default


-def _chat_messages_to_responses_input(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
-    """Convert internal chat-style messages to Responses input items."""
+def _chat_messages_to_responses_input(
+    messages: List[Dict[str, Any]],
+    *,
+    is_xai_responses: bool = False,
+) -> List[Dict[str, Any]]:
+    """Convert internal chat-style messages to Responses input items.
+
+    ``is_xai_responses`` is kept for transport signature compatibility but
+    no longer suppresses encrypted reasoning replay.  Earlier (PR #26644,
+    May 2026) we believed xAI's OAuth/SuperGrok ``/v1/responses`` surface
+    rejected replayed ``encrypted_content`` reasoning items minted by
+    prior turns, and we stripped them.  That decision was wrong — xAI
+    explicitly relies on Hermes threading encrypted reasoning back across
+    turns for cross-turn coherence (the whole point of their partnership
+    integration).  We now replay encrypted reasoning on every Responses
+    transport (xAI, native Codex, custom relays) and let xAI tell us
+    explicitly if a specific surface ever rejects a payload.
+    """
    items: List[Dict[str, Any]] = []
    seen_item_ids: set = set()

@@ -271,6 +287,9 @@ def _chat_messages_to_responses_input(messages: List[Dict[str, Any]]) -> List[Di
            if role == "assistant":
                # Replay encrypted reasoning items from previous turns
                # so the API can maintain coherent reasoning chains.
+                # This applies to every Responses transport including
+                # xAI — see _chat_messages_to_responses_input docstring
+                # for the May 2026 reversal of the earlier xAI gate.
                codex_reasoning = msg.get("codex_reasoning_items")
                has_codex_reasoning = False
                if isinstance(codex_reasoning, list):
@@ -410,10 +429,29 @@ def _chat_messages_to_responses_input(messages: List[Dict[str, Any]]) -> List[Di
                    call_id = raw_tool_call_id.strip()
            if not isinstance(call_id, str) or not call_id.strip():
                continue
+
+            # Multimodal tool result: convert OpenAI-style content list into
+            # Responses ``function_call_output.output`` array. The Responses
+            # API accepts ``output`` as either a string or an array of
+            # ``input_text``/``input_image`` items. See
+            # https://developers.openai.com/api/reference/python/resources/responses/.
+            tool_content = msg.get("content")
+            output_value: Any
+            if isinstance(tool_content, list):
+                converted = _chat_content_to_responses_parts(
+                    tool_content, role="user",
+                )
+                if converted:
+                    output_value = converted
+                else:
+                    output_value = ""
+            else:
+                output_value = str(tool_content or "")
+
            items.append({
                "type": "function_call_output",
                "call_id": call_id,
-                "output": str(msg.get("content", "") or ""),
+                "output": output_value,
            })

    return items
@@ -466,6 +504,38 @@ def _preflight_codex_input_items(raw_items: Any) -> List[Dict[str, Any]]:
            output = item.get("output", "")
            if output is None:
                output = ""
+            # Output may be a string OR an array of structured content
+            # items (input_text / input_image) for multimodal tool results.
+            # Both shapes are accepted by the Responses API. We preserve
+            # the array form when present.
+            if isinstance(output, list):
+                # Validate each item is a recognised content shape; drop
+                # anything else to avoid 4xx from the API.
+                cleaned: List[Dict[str, Any]] = []
+                for part in output:
+                    if not isinstance(part, dict):
+                        continue
+                    ptype = part.get("type")
+                    if ptype == "input_text":
+                        text = part.get("text")
+                        if isinstance(text, str) and text:
+                            cleaned.append({"type": "input_text", "text": text})
+                    elif ptype == "input_image":
+                        url = part.get("image_url")
+                        if isinstance(url, str) and url:
+                            entry: Dict[str, Any] = {"type": "input_image", "image_url": url}
+                            detail = part.get("detail")
+                            if isinstance(detail, str) and detail.strip():
+                                entry["detail"] = detail.strip()
+                            cleaned.append(entry)
+                normalized.append(
+                    {
+                        "type": "function_call_output",
+                        "call_id": call_id.strip(),
+                        "output": cleaned if cleaned else "",
+                    }
+                )
+                continue
            if not isinstance(output, str):
                output = str(output)

@@ -675,7 +745,7 @@ def _preflight_codex_api_kwargs(
        "model", "instructions", "input", "tools", "store",
        "reasoning", "include", "max_output_tokens", "temperature",
        "tool_choice", "parallel_tool_calls", "prompt_cache_key", "service_tier",
-        "extra_headers",
+        "extra_headers", "extra_body",
    }
    normalized: Dict[str, Any] = {
        "model": model,
@@ -725,6 +795,19 @@ def _preflight_codex_api_kwargs(
        if normalized_headers:
            normalized["extra_headers"] = normalized_headers

+    extra_body = api_kwargs.get("extra_body")
+    if extra_body is not None:
+        if not isinstance(extra_body, dict):
+            raise ValueError("Codex Responses request 'extra_body' must be an object.")
+        # Pass extra_body through verbatim (as a shallow copy). xAI Responses
+        # uses it to carry `prompt_cache_key` as a body-level field (the
+        # documented cache-routing surface on /v1/responses). The openai SDK
+        # serializes extra_body into the JSON body without per-field type
+        # checks, so it survives Responses.stream() kwarg-signature changes
+        # that would otherwise raise TypeError before the request is sent.
+        if extra_body:
+            normalized["extra_body"] = dict(extra_body)
+
    if allow_stream:
        stream = api_kwargs.get("stream")
        if stream is not None and stream is not True:
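The list-output branch added to `_preflight_codex_input_items` keeps only well-formed `input_text` / `input_image` parts and collapses an array that cleans down to nothing into `""`. As a minimal sketch, the same filtering is pulled out below into a hypothetical standalone helper, `clean_function_call_output`, so its behavior on mixed input can be checked without building a full request. It assumes the part shapes shown in the diff and is not part of the module's API.

```python
from typing import Any, Dict, List, Union


def clean_function_call_output(output: Any) -> Union[str, List[Dict[str, Any]]]:
    """Hypothetical standalone version of the function_call_output preflight."""
    if output is None:
        return ""
    if not isinstance(output, list):
        # Non-list outputs are coerced to a string, as in the original path.
        return output if isinstance(output, str) else str(output)
    cleaned: List[Dict[str, Any]] = []
    for part in output:
        if not isinstance(part, dict):
            continue  # drop non-dict parts rather than risk a 4xx
        ptype = part.get("type")
        if ptype == "input_text":
            text = part.get("text")
            if isinstance(text, str) and text:
                cleaned.append({"type": "input_text", "text": text})
        elif ptype == "input_image":
            url = part.get("image_url")
            if isinstance(url, str) and url:
                entry: Dict[str, Any] = {"type": "input_image", "image_url": url}
                detail = part.get("detail")
                if isinstance(detail, str) and detail.strip():
                    entry["detail"] = detail.strip()
                cleaned.append(entry)
        # Any other part type is silently dropped.
    return cleaned if cleaned else ""


raw = [
    {"type": "input_text", "text": "screenshot attached"},
    {"type": "input_image", "image_url": "data:image/png;base64,AAAA", "detail": " low "},
    {"type": "output_text", "text": "not an input part"},
    "stray string",
    {"type": "input_text", "text": ""},
]
print(clean_function_call_output(raw))
```

Dropping unknown parts instead of raising keeps one malformed tool result from failing the whole request; the trade-off is that the model silently loses that part.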
--- a/agent/codex_runtime.py
+++ b/agent/codex_runtime.py
@@ -0,0 +1,448 @@
+"""Codex API runtime — App Server and Responses-API streaming paths.
+
+Extracted from :class:`AIAgent` to keep the agent loop file focused.
+Each function takes the parent ``AIAgent`` as its first argument
+(``agent``).  AIAgent keeps thin forwarder methods for backward
+compatibility.
+
+* ``run_codex_app_server_turn``: drives one turn through the
+  ``codex_app_server`` subprocess client (used when a Codex CLI install
+  is the active provider).
+* ``run_codex_stream``: streams a Codex Responses API call (the
+  ``codex_responses`` api_mode).
+* ``run_codex_create_stream_fallback``: re-issues the request via raw
+  ``responses.create(stream=True)`` when the ``responses.stream()`` helper fails.
+"""
+
+from __future__ import annotations
+
+import json
+import logging
+import os
+from types import SimpleNamespace
+from typing import Any, Dict, List
+
+logger = logging.getLogger(__name__)
+
+
+def run_codex_app_server_turn(
+    agent,
+    *,
+    user_message: str,
+    original_user_message: Any,
+    messages: List[Dict[str, Any]],
+    effective_task_id: str,
+    should_review_memory: bool = False,
+) -> Dict[str, Any]:
+    """Codex app-server runtime path. Hands the entire turn to a `codex
+    app-server` subprocess and projects its events back into Hermes'
+    messages list so memory/skill review keep working.
+
+    Called from run_conversation() when agent.api_mode == "codex_app_server".
+    Returns the same dict shape as the chat_completions path.
+    """
+    from agent.transports.codex_app_server_session import CodexAppServerSession
+
+    # Lazy session: one CodexAppServerSession per AIAgent instance.
+    # Spawned on first turn, reused across turns, closed at AIAgent
+    # shutdown (see _cleanup hook).
+    if not hasattr(agent, "_codex_session") or agent._codex_session is None:
+        cwd = getattr(agent, "session_cwd", None) or os.getcwd()
+        # Approval callback: defer to Hermes' standard prompt flow if a
+        # CLI thread has installed one. Gateway / cron contexts get the
+        # codex-side fail-closed default.
+        try:
+            from tools.terminal_tool import _get_approval_callback
+            approval_callback = _get_approval_callback()
+        except Exception:
+            approval_callback = None
+        agent._codex_session = CodexAppServerSession(
+            cwd=cwd,
+            approval_callback=approval_callback,
+        )
+
+    # NOTE: the standard run_conversation() flow (line ~11823) has ALREADY
+    # appended the user message to `messages` before dispatching here.
+    # Do NOT append it again, or the message will be duplicated.
+
+    try:
+        turn = agent._codex_session.run_turn(user_input=user_message)
+    except Exception as exc:
+        logger.exception("codex app-server turn failed")
+        # Crash → unconditionally drop the session so the next turn
+        # respawns from scratch instead of reusing a dead client.
+        try:
+            agent._codex_session.close()
+        except Exception:
+            pass
+        agent._codex_session = None
+        return {
+            "final_response": (
+                f"Codex app-server turn failed: {exc}. "
+                f"Fall back to default runtime with `/codex-runtime auto`."
+            ),
+            "messages": messages,
+            "api_calls": 0,
+            "completed": False,
+            "partial": True,
+            "error": str(exc),
+        }
+
+    # If the turn signalled the underlying client is wedged (deadline
+    # blown, post-tool watchdog tripped, OAuth refresh died, subprocess
+    # exited), retire the session so the next turn respawns codex
+    # rather than riding the broken process. Mirrors openclaw beta.8's
+    # "retire timed-out app-server clients" fix.
+    if getattr(turn, "should_retire", False):
+        logger.warning(
+            "codex app-server session retired (turn error: %s)",
+            turn.error,
+        )
+        try:
+            agent._codex_session.close()
+        except Exception:
+            pass
+        agent._codex_session = None
+
+    # Splice projected messages into the conversation. The projector emits
+    # standard {role, content, tool_calls, tool_call_id} entries, which
+    # is exactly what curator.py / sessions DB expect.
+    if turn.projected_messages:
+        messages.extend(turn.projected_messages)
+
+    # Counter ticks for the agent-improvement loop.
+    # _turns_since_memory and _user_turn_count are ALREADY incremented
+    # in the run_conversation() pre-loop block (lines ~11793-11817) so we
+    # do NOT touch them here — that would double-count.
+    # Only _iters_since_skill needs explicit increment, since the
+    # chat_completions loop bumps it per tool iteration (line ~12110)
+    # and that loop is bypassed on this path.
+    agent._iters_since_skill = (
+        getattr(agent, "_iters_since_skill", 0) + turn.tool_iterations
+    )
+
+    # Now check the skill nudge AFTER iters were incremented — same
+    # pattern the chat_completions path uses (line ~15432).
+    should_review_skills = False
+    if (
+        agent._skill_nudge_interval > 0
+        and agent._iters_since_skill >= agent._skill_nudge_interval
+        and "skill_manage" in agent.valid_tool_names
+    ):
+        should_review_skills = True
+        agent._iters_since_skill = 0
+
+    # External memory provider sync (mirrors line ~15439). Skipped on
+    # interrupt/error to avoid feeding partial transcripts to memory.
+    if not turn.interrupted and turn.error is None:
+        try:
+            agent._sync_external_memory_for_turn(
+                original_user_message=original_user_message,
+                final_response=turn.final_text,
+                interrupted=False,
+            )
+        except Exception:
+            logger.debug("external memory sync raised", exc_info=True)
+
+    # Background review fork — same cadence + signature as the default
+    # path (line ~15449). Only fires when a trigger actually tripped AND
+    # we have a real final response.
+    if (
+        turn.final_text
+        and not turn.interrupted
+        and (should_review_memory or should_review_skills)
+    ):
+        try:
+            agent._spawn_background_review(
+                messages_snapshot=list(messages),
+                review_memory=should_review_memory,
+                review_skills=should_review_skills,
+            )
+        except Exception:
+            logger.debug("background review spawn raised", exc_info=True)
+
+    return {
+        "final_response": turn.final_text,
+        "messages": messages,
+        "api_calls": 1,  # one app-server "turn" maps to one logical API call
+        "completed": not turn.interrupted and turn.error is None,
+        "partial": turn.interrupted or turn.error is not None,
+        "error": turn.error,
+        "codex_thread_id": turn.thread_id,
+        "codex_turn_id": turn.turn_id,
+    }
+
+
+
+
+def run_codex_stream(agent, api_kwargs: dict, client: Any = None, on_first_delta: callable = None):
+    """Execute one streaming Responses API request and return the final response."""
+    import httpx as _httpx
+
+    active_client = client or agent._ensure_primary_openai_client(reason="codex_stream_direct")
+    max_stream_retries = 1
+    has_tool_calls = False
+    first_delta_fired = False
+    for attempt in range(max_stream_retries + 1):
+        # Accumulate streamed text so we can recover if get_final_response()
+        # returns empty output (e.g. chatgpt.com backend-api sends response.incomplete
+        # instead of response.completed). Reset per attempt so a retry never duplicates it.
+        agent._codex_streamed_text_parts = []
+        if agent._interrupt_requested:
+            raise InterruptedError("Agent interrupted before Codex stream retry")
+        collected_output_items: list = []
+        try:
+            with active_client.responses.stream(**api_kwargs) as stream:
+                for event in stream:
+                    agent._touch_activity("receiving stream response")
+                    if agent._interrupt_requested:
+                        break
+                    event_type = getattr(event, "type", "")
+                    # Fire callbacks on text content deltas (suppress during tool calls)
+                    if "output_text.delta" in event_type:
+                        delta_text = getattr(event, "delta", "")
+                        if delta_text:
+                            agent._codex_streamed_text_parts.append(delta_text)
+                        if delta_text and not has_tool_calls:
+                            if not first_delta_fired:
+                                first_delta_fired = True
+                                if on_first_delta:
+                                    try:
+                                        on_first_delta()
+                                    except Exception:
+                                        pass
+                            agent._fire_stream_delta(delta_text)
+                    # Track tool calls to suppress text streaming
+                    elif "function_call" in event_type:
+                        has_tool_calls = True
+                    # Fire reasoning callbacks
+                    elif "reasoning" in event_type and "delta" in event_type:
+                        reasoning_text = getattr(event, "delta", "")
+                        if reasoning_text:
+                            agent._fire_reasoning_delta(reasoning_text)
+                    # Collect completed output items — some backends
+                    # (chatgpt.com/backend-api/codex) stream valid items
+                    # via response.output_item.done but the SDK's
+                    # get_final_response() returns an empty output list.
+                    elif event_type == "response.output_item.done":
+                        done_item = getattr(event, "item", None)
+                        if done_item is not None:
+                            collected_output_items.append(done_item)
+                    # Log non-completed terminal events for diagnostics
+                    elif event_type in {"response.incomplete", "response.failed"}:
+                        resp_obj = getattr(event, "response", None)
+                        status = getattr(resp_obj, "status", None) if resp_obj else None
+                        incomplete_details = getattr(resp_obj, "incomplete_details", None) if resp_obj else None
+                        logger.warning(
+                            "Codex Responses stream received terminal event %s "
+                            "(status=%s, incomplete_details=%s, streamed_chars=%d). %s",
+                            event_type, status, incomplete_details,
+                            sum(len(p) for p in agent._codex_streamed_text_parts),
+                            agent._client_log_context(),
+                        )
+                final_response = stream.get_final_response()
+                # PATCH: ChatGPT Codex backend streams valid output items
+                # but get_final_response() can return an empty output list.
+                # Backfill from collected items or synthesize from deltas.
+                _out = getattr(final_response, "output", None)
+                if isinstance(_out, list) and not _out:
+                    if collected_output_items:
+                        final_response.output = list(collected_output_items)
+                        logger.debug(
+                            "Codex stream: backfilled %d output items from stream events",
+                            len(collected_output_items),
+                        )
+                    elif agent._codex_streamed_text_parts and not has_tool_calls:
+                        assembled = "".join(agent._codex_streamed_text_parts)
+                        final_response.output = [SimpleNamespace(
+                            type="message",
+                            role="assistant",
+                            status="completed",
+                            content=[SimpleNamespace(type="output_text", text=assembled)],
+                        )]
+                        logger.debug(
+                            "Codex stream: synthesized output from %d text deltas (%d chars)",
+                            len(agent._codex_streamed_text_parts), len(assembled),
+                        )
+                return final_response
+        except (_httpx.RemoteProtocolError, _httpx.ReadTimeout, _httpx.ConnectError, ConnectionError) as exc:
+            if attempt < max_stream_retries:
+                logger.debug(
+                    "Codex Responses stream transport failed (attempt %s/%s); retrying. %s error=%s",
+                    attempt + 1,
+                    max_stream_retries + 1,
+                    agent._client_log_context(),
+                    exc,
+                )
+                continue
+            logger.debug(
+                "Codex Responses stream transport failed; falling back to create(stream=True). %s error=%s",
+                agent._client_log_context(),
+                exc,
+            )
+            return agent._run_codex_create_stream_fallback(api_kwargs, client=active_client)
+        except RuntimeError as exc:
+            err_text = str(exc)
+            missing_completed = "response.completed" in err_text
+            # The OpenAI SDK's Responses streaming state machine raises
+            # ``RuntimeError("Expected to have received `response.created`
+            # before `<event-type>`")`` when the first SSE event from the
+            # server is anything other than ``response.created`` — and it
+            # discards the event's payload before we can read it.  Three
+            # real-world backends emit a different first frame:
+            #
+            #   * xAI on grok-4.x OAuth: sends ``error`` (reported around the
+            #     May 2026 SuperGrok rollout, e.g. for OAuth accounts whose
+            #     Grok subscription is missing or exhausted; see
+            #     run_codex_create_stream_fallback)
+            #   * codex-lb relays: send ``codex.rate_limits`` (#14634)
+            #   * custom Responses relays: send ``response.in_progress``
+            #     (#8133)
+            #
+            # In all three cases the underlying byte stream is still
+            # readable: a raw ``responses.create(stream=True)`` fallback,
+            # which bypasses the SDK's stream state machine, succeeds and
+            # surfaces the real provider error as a normal exception with
+            # body+status_code attached, which ``_summarize_api_error`` can
+            # translate into a useful user-facing line.  Treat
+            # ``response.created`` prelude errors the same way we already
+            # treat ``response.completed`` postlude errors.
+            prelude_error = (
+                "Expected to have received `response.created`" in err_text
+                or "Expected to have received \"response.created\"" in err_text
+            )
+            if (missing_completed or prelude_error) and attempt < max_stream_retries:
+                logger.debug(
+                    "Responses stream %s (attempt %s/%s); retrying. %s",
+                    "prelude rejected" if prelude_error else "closed before completion",
+                    attempt + 1,
+                    max_stream_retries + 1,
+                    agent._client_log_context(),
+                )
+                continue
+            if missing_completed or prelude_error:
+                logger.debug(
+                    "Responses stream %s; falling back to create(stream=True). %s err=%s",
+                    "rejected before response.created" if prelude_error else "did not emit response.completed",
+                    agent._client_log_context(),
+                    err_text,
+                )
+                return agent._run_codex_create_stream_fallback(api_kwargs, client=active_client)
+            raise
+
+
+
+def run_codex_create_stream_fallback(agent, api_kwargs: dict, client: Any = None):
+    """Fallback path for stream completion edge cases on Codex-style Responses backends."""
+    active_client = client or agent._ensure_primary_openai_client(reason="codex_create_stream_fallback")
+    fallback_kwargs = dict(api_kwargs)
+    fallback_kwargs["stream"] = True
+    fallback_kwargs = agent._get_transport().preflight_kwargs(fallback_kwargs, allow_stream=True)
+    stream_or_response = active_client.responses.create(**fallback_kwargs)
+
+    # Compatibility shim for mocks or providers that still return a concrete response.
+    if hasattr(stream_or_response, "output"):
+        return stream_or_response
+    if not hasattr(stream_or_response, "__iter__"):
+        return stream_or_response
+
+    terminal_response = None
+    collected_output_items: list = []
+    collected_text_deltas: list = []
+    try:
+        for event in stream_or_response:
+            agent._touch_activity("receiving stream response")
+            event_type = getattr(event, "type", None)
+            if not event_type and isinstance(event, dict):
+                event_type = event.get("type")
+
+            # ``error`` SSE frames carry the provider's real failure
+            # reason (subscription / quota / model-not-available /
+            # rejected-reasoning-replay) but never appear in the
+            # ``{completed, incomplete, failed}`` terminal set, so the
+            # raw loop below would silently consume them and end with
+            # "did not emit a terminal response".  xAI in particular
+            # emits ``type=error`` as the FIRST frame for OAuth
+            # accounts whose Grok subscription is missing/exhausted —
+            # the SDK's stream helper raises ``RuntimeError(Expected
+            # to have received response.created before error)`` which
+            # the caller catches and routes here, expecting this
+            # fallback to surface the message.  Synthesize an
+            # APIError-shaped exception so ``_summarize_api_error``
+            # and the credential-pool entitlement detector see the
+            # real text instead of a generic RuntimeError.
+            if event_type == "error":
+                err_message = getattr(event, "message", None)
+                if not err_message and isinstance(event, dict):
+                    err_message = event.get("message")
+                err_code = getattr(event, "code", None)
+                if not err_code and isinstance(event, dict):
+                    err_code = event.get("code")
+                err_param = getattr(event, "param", None)
+                if not err_param and isinstance(event, dict):
+                    err_param = event.get("param")
+                err_message = (err_message or "stream emitted error event").strip()
+                from run_agent import _StreamErrorEvent
+                raise _StreamErrorEvent(err_message, code=err_code, param=err_param)
+
+            # Collect output items and text deltas for backfill
+            if event_type == "response.output_item.done":
+                done_item = getattr(event, "item", None)
+                if done_item is None and isinstance(event, dict):
+                    done_item = event.get("item")
+                if done_item is not None:
+                    collected_output_items.append(done_item)
+            elif event_type == "response.output_text.delta":
+                delta = getattr(event, "delta", "")
+                if not delta and isinstance(event, dict):
+                    delta = event.get("delta", "")
+                if delta:
+                    collected_text_deltas.append(delta)
+
+            if event_type not in {"response.completed", "response.incomplete", "response.failed"}:
+                continue
+
+            terminal_response = getattr(event, "response", None)
+            if terminal_response is None and isinstance(event, dict):
+                terminal_response = event.get("response")
+            if terminal_response is not None:
+                # Backfill empty output from collected stream events
+                _out = getattr(terminal_response, "output", None)
+                if isinstance(_out, list) and not _out:
+                    if collected_output_items:
+                        terminal_response.output = list(collected_output_items)
+                        logger.debug(
+                            "Codex fallback stream: backfilled %d output items",
+                            len(collected_output_items),
+                        )
+                    elif collected_text_deltas:
+                        assembled = "".join(collected_text_deltas)
+                        terminal_response.output = [SimpleNamespace(
+                            type="message", role="assistant",
+                            status="completed",
+                            content=[SimpleNamespace(type="output_text", text=assembled)],
+                        )]
+                        logger.debug(
+                            "Codex fallback stream: synthesized from %d deltas (%d chars)",
+                            len(collected_text_deltas), len(assembled),
+                        )
+                return terminal_response
+    finally:
+        close_fn = getattr(stream_or_response, "close", None)
+        if callable(close_fn):
+            try:
+                close_fn()
+            except Exception:
+                pass
+
+    if terminal_response is not None:
+        return terminal_response
+    raise RuntimeError("Responses create(stream=True) fallback did not emit a terminal response.")
+
+
+
+__all__ = [
+    "run_codex_app_server_turn",
+    "run_codex_stream",
+    "run_codex_create_stream_fallback",
+]
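The terminal-event backfill in the Codex fallback loop above is easier to follow in isolation. The block below is a minimal sketch. It assumes events are `SimpleNamespace` objects exposing the same `type`/`item`/`delta`/`response` attributes the loop reads. It also omits the real loop's dict handling and error-event handling, and it returns `None` instead of raising `RuntimeError` when no terminal event arrives. `backfill_terminal_response` is a hypothetical name, not part of the module.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List, Optional

# Event types that end a Responses stream (same set the fallback loop checks).
TERMINAL_TYPES = {"response.completed", "response.incomplete", "response.failed"}


def backfill_terminal_response(events: Iterable[Any]) -> Optional[Any]:
    """Hypothetical helper: return the first terminal response, filling an
    empty ``output`` list from items or text deltas seen earlier."""
    items: List[Any] = []
    deltas: List[str] = []
    for event in events:
        etype = getattr(event, "type", None)
        if etype == "response.output_item.done":
            item = getattr(event, "item", None)
            if item is not None:
                items.append(item)
        elif etype == "response.output_text.delta":
            delta = getattr(event, "delta", "")
            if delta:
                deltas.append(delta)
        if etype not in TERMINAL_TYPES:
            continue
        response = getattr(event, "response", None)
        if response is None:
            continue  # terminal event without a response: keep reading
        output = getattr(response, "output", None)
        if isinstance(output, list) and not output:
            if items:
                # Prefer fully formed output items when the stream sent them.
                response.output = list(items)
            elif deltas:
                # Otherwise synthesize one assistant message from the deltas.
                response.output = [SimpleNamespace(
                    type="message", role="assistant", status="completed",
                    content=[SimpleNamespace(type="output_text", text="".join(deltas))],
                )]
        return response
    return None


# Usage: a stream whose terminal event arrives with an empty output list.
stream = [
    SimpleNamespace(type="response.output_text.delta", delta="Hel"),
    SimpleNamespace(type="response.output_text.delta", delta="lo"),
    SimpleNamespace(type="response.completed", response=SimpleNamespace(output=[])),
]
final = backfill_terminal_response(stream)
print(final.output[0].content[0].text)  # Hello
```

Preferring collected output items over concatenated deltas presumably matters because items can carry non-text output (for example tool calls), while `output_text` deltas only carry visible text.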
--- a/agent/context_compressor.py
+++ b/agent/context_compressor.py
@@ -23,7 +23,7 @@ import re
 import time
 from typing import Any, Dict, List, Optional

-from agent.auxiliary_client import call_llm
+from agent.auxiliary_client import call_llm, _is_connection_error
 from agent.context_engine import ContextEngine
 from agent.model_metadata import (
    MINIMUM_CONTEXT_LENGTH,
@@ -167,7 +167,7 @@ def _strip_image_parts_from_parts(parts: Any) -> Any:
            out.append(part)
            continue
        ptype = part.get("type")
-        if ptype in ("image", "image_url", "input_image"):
+        if ptype in {"image", "image_url", "input_image"}:
            had_image = True
            out.append({"type": "text", "text": "[screenshot removed to save context]"})
        else:
@@ -221,6 +221,114 @@ def _truncate_tool_call_args_json(args: str, head_chars: int = 200) -> str:
    return json.dumps(shrunken, ensure_ascii=False)


+_IMAGE_PART_TYPES = frozenset({"image_url", "input_image", "image"})
+
+
+def _is_image_part(part: Any) -> bool:
+    """True if ``part`` is a multimodal image content block.
+
+    Recognizes all three shapes the agent handles:
+      - OpenAI chat.completions: ``{"type": "image_url", "image_url": ...}``
+      - OpenAI Responses API:    ``{"type": "input_image", "image_url": "..."}``
+      - Anthropic native:        ``{"type": "image", "source": {...}}``
+    """
+    if not isinstance(part, dict):
+        return False
+    return part.get("type") in _IMAGE_PART_TYPES
+
+
+def _content_has_images(content: Any) -> bool:
+    """True if a message's ``content`` is a multimodal list with image parts."""
+    if not isinstance(content, list):
+        return False
+    return any(_is_image_part(p) for p in content)
+
+
+def _strip_images_from_content(content: Any) -> Any:
+    """Return ``content`` with every image part replaced by a short text
+    placeholder.
+
+    - Non-list content (plain strings included) is returned unchanged.
+    - List content without image parts is returned as the same object.
+    - List content: image parts become ``{"type": "text", "text": "[Attached
+      image — stripped after compression]"}``; other parts are preserved as-is.
+
+    Input is never mutated.
+    """
+    if not isinstance(content, list):
+        return content
+    if not any(_is_image_part(p) for p in content):
+        return content
+
+    new_parts: List[Any] = []
+    for p in content:
+        if _is_image_part(p):
+            new_parts.append({
+                "type": "text",
+                "text": "[Attached image — stripped after compression]",
+            })
+        else:
+            new_parts.append(p)
+    return new_parts
+
+
+def _strip_historical_media(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
+    """Replace image parts in older messages with placeholder text.
+
+    The anchor is the *last* user message that has any image content. Every
+    message before that anchor gets its image parts replaced with a short
+    placeholder so the outgoing request stops re-shipping the same multi-MB
+    base-64 image blobs on every turn.
+
+    If no user message carries images, the list is returned unchanged.
+    If the only user message with images is the very first one (nothing
+    earlier to strip), the list is returned unchanged.
+
+    Only touched messages are shallow-copied; the input is never mutated.
+    Port of Kilo-Org/kilocode#9434 (adapted for the OpenAI-style message
+    shape the hermes compressor emits).
+    """
+    if not messages:
+        return messages
+
+    # Find the newest user message that carries at least one image part.
+    # We anchor on image-bearing user messages (not all user messages) so
+    # the newest image survives plain-text follow-ups while every older
+    # image is still replaced, which is the bloat kilocode#9434 set out to fix.
+    anchor = -1
+    for i in range(len(messages) - 1, -1, -1):
+        msg = messages[i]
+        if not isinstance(msg, dict):
+            continue
+        if msg.get("role") != "user":
+            continue
+        if _content_has_images(msg.get("content")):
+            anchor = i
+            break
+
+    if anchor <= 0:
+        # No image-bearing user message, or it's the very first message —
+        # nothing before it to strip.
+        return messages
+
+    changed = False
+    result: List[Dict[str, Any]] = []
+    for i, msg in enumerate(messages):
+        if i >= anchor or not isinstance(msg, dict):
+            result.append(msg)
+            continue
+        content = msg.get("content")
+        if not _content_has_images(content):
+            result.append(msg)
+            continue
+        new_msg = msg.copy()
+        new_msg["content"] = _strip_images_from_content(content)
+        result.append(new_msg)
+        changed = True
+
+    return result if changed else messages
+
+
 def _summarize_tool_result(tool_name: str, tool_args: str, tool_content: str) -> str:
    """Create an informative 1-line summary of a tool call + result.

@@ -274,8 +382,8 @@ def _summarize_tool_result(tool_name: str, tool_args: str, tool_content: str) ->
        mode = args.get("mode", "replace")
        return f"[patch] {mode} in {path} ({content_len:,} chars result)"

-    if tool_name in ("browser_navigate", "browser_click", "browser_snapshot",
-                     "browser_type", "browser_scroll", "browser_vision"):
+    if tool_name in {"browser_navigate", "browser_click", "browser_snapshot",
+                     "browser_type", "browser_scroll", "browser_vision"}:
        url = args.get("url", "")
        ref = args.get("ref", "")
        detail = f" {url}" if url else (f" ref={ref}" if ref else "")
@@ -304,7 +412,7 @@ def _summarize_tool_result(tool_name: str, tool_args: str, tool_content: str) ->
            code_preview += "..."
        return f"[execute_code] `{code_preview}` ({line_count} lines output)"

-    if tool_name in ("skill_view", "skills_list", "skill_manage"):
+    if tool_name in {"skill_view", "skills_list", "skill_manage"}:
        name = args.get("name", "?")
        return f"[{tool_name}] name={name} ({content_len:,} chars)"

@@ -378,7 +486,7 @@ class ContextCompressor(ContextEngine):
        model: str,
        context_length: int,
        base_url: str = "",
-        api_key: str = "",
+        api_key: Any = "",
        provider: str = "",
        api_mode: str = "",
    ) -> None:
@@ -415,6 +523,7 @@ class ContextCompressor(ContextEngine):
        config_context_length: int | None = None,
        provider: str = "",
        api_mode: str = "",
+        abort_on_summary_failure: bool = False,
    ):
        self.model = model
        self.base_url = base_url
@@ -426,6 +535,11 @@ class ContextCompressor(ContextEngine):
        self.protect_last_n = protect_last_n
        self.summary_target_ratio = max(0.10, min(summary_target_ratio, 0.80))
        self.quiet_mode = quiet_mode
+        # When True, summary-generation failure aborts compression entirely
+        # (returns messages unchanged, sets _last_compress_aborted=True).
+        # When False (default = historical behavior), insert a static
+        # "summary unavailable" placeholder and drop the middle window.
+        self.abort_on_summary_failure = abort_on_summary_failure

        self.context_length = get_model_context_length(
            model, base_url=base_url, api_key=api_key,
@@ -478,6 +592,12 @@ class ContextCompressor(ContextEngine):
        # (gateway hygiene, /compress) can surface a visible warning.
        self._last_summary_dropped_count: int = 0
        self._last_summary_fallback_used: bool = False
+        # When summary generation fails and abort_on_summary_failure is True,
+        # compression is ABORTED: the original messages are returned unchanged
+        # instead of the middle window being dropped behind a static placeholder.
+        # Callers inspect this flag to know "compression was attempted but
+        # aborted; freeze the chat until the user manually retries via /compress".
+        self._last_compress_aborted: bool = False
        # When a user-configured summary model fails and we recover by
        # retrying on the main model, record the failure so gateway /
        # CLI callers can still warn the user even though compression
@@ -979,13 +1099,13 @@ The user has requested that this compaction PRIORITISE preserving all informatio
            _status = getattr(e, "status_code", None) or getattr(getattr(e, "response", None), "status_code", None)
            _err_str = str(e).lower()
            _is_model_not_found = (
-                _status in (404, 503)
+                _status in {404, 503}
                or "model_not_found" in _err_str
                or "does not exist" in _err_str
                or "no available channel" in _err_str
            )
            _is_timeout = (
-                _status in (408, 429, 502, 504)
+                _status in {408, 429, 502, 504}
                or "timeout" in _err_str
            )
            # Non-JSON / malformed-body responses from misconfigured providers
@@ -1000,6 +1120,14 @@ The user has requested that this compaction PRIORITISE preserving all informatio
                isinstance(e, json.JSONDecodeError)
                or "expecting value" in _err_str
            )
+            # httpcore / httpx streaming premature-close errors surface as
+            # ConnectionError subclasses or as plain Exceptions with characteristic
+            # substrings ("incomplete chunked read", "peer closed connection",
+            # "response ended prematurely", "unexpected eof").  These are
+            # transient network events; treat them like a timeout so we fall
+            # back to the main model instead of entering a 60-second cooldown.
+            # See issue #18458.
+            _is_streaming_closed = _is_connection_error(e)
            if _is_json_decode and not _is_model_not_found and not _is_timeout:
                logger.error(
                    "Context compression failed: auxiliary LLM returned a "
@@ -1012,7 +1140,7 @@ The user has requested that this compaction PRIORITISE preserving all informatio
                    e,
                )
            if (
-                (_is_model_not_found or _is_timeout or _is_json_decode)
+                (_is_model_not_found or _is_timeout or _is_json_decode or _is_streaming_closed)
                and self.summary_model
                and self.summary_model != self.model
                and not getattr(self, "_summary_model_fallen_back", False)
@@ -1021,6 +1149,8 @@ The user has requested that this compaction PRIORITISE preserving all informatio
                    _reason = "returned invalid JSON"
                elif _is_model_not_found:
                    _reason = "unavailable"
+                elif _is_streaming_closed:
+                    _reason = "closed stream prematurely"
                else:
                    _reason = "timed out"
                self._fallback_to_main_for_compression(e, _reason)
@@ -1043,10 +1173,10 @@ The user has requested that this compaction PRIORITISE preserving all informatio
                self._fallback_to_main_for_compression(e, "failed")
                return self._generate_summary(turns_to_summarize, focus_topic=focus_topic)

-            # Transient errors (timeout, rate limit, network, JSON decode) —
-            # shorter cooldown for JSON decode since the body shape can flip
-            # back to valid quickly when an upstream proxy recovers.
-            _transient_cooldown = 30 if _is_json_decode else 60
+            # Transient errors (timeout, rate limit, network, JSON decode,
+            # streaming premature-close) — shorter cooldown for JSON decode and
+            # streaming-closed since those conditions can self-resolve quickly.
+            _transient_cooldown = 30 if (_is_json_decode or _is_streaming_closed) else 60
            self._summary_failure_cooldown_until = time.monotonic() + _transient_cooldown
            err_text = str(e).strip() or e.__class__.__name__
            if len(err_text) > 220:
@@ -1175,6 +1305,26 @@ The user has requested that this compaction PRIORITISE preserving all informatio
            idx += 1
        return idx

+    def _protect_head_size(self, messages: List[Dict[str, Any]]) -> int:
+        """Total count of head messages to protect.
+
+        ``protect_first_n`` is defined as *additional* messages protected
+        beyond the system prompt.  The system prompt (if present at index 0)
+        is always implicitly protected — it's load-bearing context that
+        must never be summarised away.  This keeps semantics stable across
+        call paths where the system prompt may or may not be included in
+        the ``messages`` list (e.g. the gateway ``/compress`` handler
+        strips it before calling compress()).
+
+        Examples:
+          protect_first_n=0 → system prompt only (or nothing if no system msg)
+          protect_first_n=3 → system + first 3 non-system messages
+        """
+        head = 0
+        if messages and messages[0].get("role") == "system":
+            head = 1
+        return head + self.protect_first_n
+
    def _align_boundary_backward(self, messages: List[Dict[str, Any]], idx: int) -> int:
        """Pull a compress-end boundary backward to avoid splitting a
        tool_call / result group.
@@ -1306,8 +1456,7 @@ The user has requested that this compaction PRIORITISE preserving all informatio

        # Ensure we protect at least min_tail messages
        fallback_cut = n - min_tail
-        if cut_idx > fallback_cut:
-            cut_idx = fallback_cut
+        cut_idx = min(cut_idx, fallback_cut)

        # If the token budget would protect everything (small conversations),
        # force a cut after the head so compression can still remove middle turns.
@@ -1334,7 +1483,7 @@ The user has requested that this compaction PRIORITISE preserving all informatio
        skip the LLM call when the transcript is still entirely inside
        the protected head/tail.
        """
-        compress_start = self._align_boundary_forward(messages, self.protect_first_n)
+        compress_start = self._align_boundary_forward(messages, self._protect_head_size(messages))
        compress_end = self._find_tail_cut_by_tokens(messages, compress_start)
        return compress_start < compress_end

@@ -1342,7 +1491,7 @@ The user has requested that this compaction PRIORITISE preserving all informatio
    # Main compression entry point
    # ------------------------------------------------------------------

-    def compress(self, messages: List[Dict[str, Any]], current_tokens: int = None, focus_topic: str = None) -> List[Dict[str, Any]]:
+    def compress(self, messages: List[Dict[str, Any]], current_tokens: Optional[int] = None, focus_topic: Optional[str] = None, force: bool = False) -> List[Dict[str, Any]]:
        """Compress conversation messages by summarizing middle turns.

        Algorithm:
@@ -1360,6 +1509,9 @@ The user has requested that this compaction PRIORITISE preserving all informatio
                provided, the summariser will prioritise preserving information
                related to this topic and be more aggressive about compressing
                everything else.  Inspired by Claude Code's ``/compact``.
+            force: If True, clear any active summary-failure cooldown before
+                running so a manual ``/compress`` can retry immediately after
+                an auto-compression abort.  Auto-compress callers pass False.
        """
        # Reset per-call summary failure state — callers inspect these fields
        # after compress() returns to decide whether to surface a warning.
@@ -1368,9 +1520,16 @@ The user has requested that this compaction PRIORITISE preserving all informatio
        self._last_summary_error = None
        self._last_aux_model_failure_error = None
        self._last_aux_model_failure_model = None
+        self._last_compress_aborted = False
+
+        # Manual /compress (force=True) bypasses the failure cooldown so the
+        # user can retry immediately after an auto-compress abort.  Without
+        # this, /compress would silently no-op for 30-60s after a failure.
+        if force and self._summary_failure_cooldown_until > 0.0:
+            self._summary_failure_cooldown_until = 0.0
        n_messages = len(messages)
        # Only need head + 3 tail messages minimum (token budget decides the real tail size)
-        _min_for_compress = self.protect_first_n + 3 + 1
+        _min_for_compress = self._protect_head_size(messages) + 3 + 1
        if n_messages <= _min_for_compress:
            if not self.quiet_mode:
                logger.warning(
@@ -1390,7 +1549,7 @@ The user has requested that this compaction PRIORITISE preserving all informatio
            logger.info("Pre-compression: pruned %d old tool result(s)", pruned_count)

        # Phase 2: Determine boundaries
-        compress_start = self.protect_first_n
+        compress_start = self._protect_head_size(messages)
        compress_start = self._align_boundary_forward(messages, compress_start)

        # Use token-budget tail protection instead of fixed message count
@@ -1400,15 +1559,23 @@ The user has requested that this compaction PRIORITISE preserving all informatio
            return messages

        turns_to_summarize = messages[compress_start:compress_end]
+        # A persisted handoff summary can sit in the protected head after a
+        # resume (commonly immediately after the system prompt). Search from
+        # the first non-system message through the compression window so we can
+        # rehydrate iterative-summary state without serializing that handoff as
+        # a new turn. Protected messages after the handoff remain live context,
+        # so only summarize messages that are both after the handoff and inside
+        # the current compression window.
+        summary_search_start = 1 if messages and messages[0].get("role") == "system" else 0
        summary_idx, summary_body = self._find_latest_context_summary(
            messages,
-            compress_start,
+            summary_search_start,
            compress_end,
        )
        if summary_idx is not None:
            if summary_body and not self._previous_summary:
                self._previous_summary = summary_body
-            turns_to_summarize = messages[summary_idx + 1:compress_end]
+            turns_to_summarize = messages[max(compress_start, summary_idx + 1):compress_end]

        if not self.quiet_mode:
            logger.info(
@@ -1435,6 +1602,32 @@ The user has requested that this compaction PRIORITISE preserving all informatio
        # Phase 3: Generate structured summary
        summary = self._generate_summary(turns_to_summarize, focus_topic=focus_topic)

+        # If summary generation failed, behavior splits on
+        # ``abort_on_summary_failure`` (config: compression.abort_on_summary_failure):
+        #   True  → ABORT compression entirely. Return messages unchanged
+        #           and set _last_compress_aborted=True so callers can warn
+        #           the user and stop the auto-compress retry loop.
+        #   False → Fall through to the legacy fallback path below: insert
+        #           a static "summary unavailable" placeholder and drop the
+        #           middle window.  Records _last_summary_fallback_used /
+        #           _last_summary_dropped_count for gateway hygiene to
+        #           surface a warning.
+        # Default is False (historical behavior).
+        if not summary and self.abort_on_summary_failure:
+            n_skipped = compress_end - compress_start
+            self._last_summary_dropped_count = 0  # nothing actually dropped
+            self._last_summary_fallback_used = False
+            self._last_compress_aborted = True
+            if not self.quiet_mode:
+                logger.warning(
+                    "Summary generation failed — aborting compression "
+                    "(compression.abort_on_summary_failure=true). "
+                    "%d message(s) preserved unchanged. Conversation is "
+                    "frozen until the next /compress or /new.",
+                    n_skipped,
+                )
+            return messages
+
        # Phase 4: Assemble compressed message list
        compressed = []
        for i in range(compress_start):
@@ -1449,7 +1642,8 @@ The user has requested that this compaction PRIORITISE preserving all informatio
                    )
            compressed.append(msg)

-        # If LLM summary failed, insert a static fallback so the model
+        # Legacy fallback path: LLM summary failed and abort_on_summary_failure
+        # is False (the default).  Insert a static placeholder so the model
        # knows context was lost rather than silently dropping everything.
        if not summary:
            if not self.quiet_mode:
@@ -1470,7 +1664,7 @@ The user has requested that this compaction PRIORITISE preserving all informatio
        first_tail_role = messages[compress_end].get("role", "user") if compress_end < n_messages else "user"
        # Pick a role that avoids consecutive same-role with both neighbors.
        # Priority: avoid colliding with head (already committed), then tail.
-        if last_head_role in ("assistant", "tool"):
+        if last_head_role in {"assistant", "tool"}:
            summary_role = "user"
        else:
            summary_role = "assistant"
@@ -1522,6 +1716,14 @@ The user has requested that this compaction PRIORITISE preserving all informatio

        compressed = self._sanitize_tool_pairs(compressed)

+        # Replace image parts in all compressed messages before the newest
+        # image-bearing user turn with a short text placeholder. Without
+        # this, tail messages keep their original multi-MB base-64 image
+        # payloads forever, which can push every subsequent API request
+        # past the provider's body-size limit and wedge the session.
+        # Port of Kilo-Org/kilocode#9434.
+        compressed = _strip_historical_media(compressed)
+
        new_estimate = estimate_messages_tokens_rough(compressed)
        saved_estimate = display_tokens - new_estimate

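The anchoring rule in `_strip_historical_media` is easiest to see on a concrete transcript. The block below is a condensed, self-contained restatement for illustration only. It keeps the same anchor rule but uses a shorter placeholder string. It also always builds a new list once the anchor is past index 0, rather than returning the input when nothing changed. The helper names and the sample transcript are hypothetical.

```python
from typing import Any, Dict, List

IMAGE_TYPES = {"image_url", "input_image", "image"}
# Placeholder text shortened for this sketch (hypothetical, not the real string).
STRIPPED = {"type": "text", "text": "[image stripped]"}


def is_image(part: Any) -> bool:
    return isinstance(part, dict) and part.get("type") in IMAGE_TYPES


def has_images(content: Any) -> bool:
    return isinstance(content, list) and any(is_image(p) for p in content)


def strip_historical_media(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Anchor on the newest *user* message that carries an image part.
    anchor = next(
        (i for i in range(len(messages) - 1, -1, -1)
         if messages[i].get("role") == "user" and has_images(messages[i].get("content"))),
        -1,
    )
    if anchor <= 0:
        return messages  # no image turn, or nothing before it to strip
    result = []
    for i, msg in enumerate(messages):
        if i < anchor and has_images(msg.get("content")):
            # Shallow-copy the message and rebuild only its content list,
            # using a fresh placeholder dict for each stripped part.
            msg = {**msg, "content": [dict(STRIPPED) if is_image(p) else p
                                      for p in msg["content"]]}
        result.append(msg)
    return result


img = {"type": "image_url", "image_url": {"url": "data:image/png;base64,AAAA"}}
history = [
    {"role": "user", "content": [{"type": "text", "text": "first"}, img]},
    {"role": "assistant", "content": "ok"},
    {"role": "user", "content": [img]},        # anchor: newest image-bearing turn
    {"role": "assistant", "content": "seen"},
    {"role": "user", "content": "and now?"},   # plain-text follow-up
]
result = strip_historical_media(history)
```

Here the first image is replaced and the anchor turn keeps its image. The caller's `history` list and message dicts are left untouched.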
--- a/agent/context_engine.py
+++ b/agent/context_engine.py
@@ -55,6 +55,11 @@ class ContextEngine(ABC):
    # These control the preflight compression check.  Subclasses may
    # override via __init__ or property; defaults are sensible for most
    # engines.
+    #
+    # protect_first_n semantics (since PR #13754): count of non-system head
+    # messages always preserved verbatim, IN ADDITION to the system prompt
+    # which is always implicitly protected.  Default 3 keeps the
+    # historical "system + first 3 non-system messages" head shape.

    threshold_percent: float = 0.75
    protect_first_n: int = 3
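The revised `protect_first_n` semantics described above can be checked with a free-function mirror of `ContextCompressor._protect_head_size`. The function name and the transcript below are illustrative, not part of the module. The sketch shows that the same non-system head messages are protected whether or not the caller includes the system prompt in the list.

```python
from typing import Any, Dict, List


def protect_head_size(messages: List[Dict[str, Any]], protect_first_n: int = 3) -> int:
    # A system prompt at index 0 is always protected, on top of protect_first_n.
    head = 1 if messages and messages[0].get("role") == "system" else 0
    return head + protect_first_n


turns = [{"role": "user" if i % 2 == 0 else "assistant", "content": f"turn {i}"}
         for i in range(6)]
with_system = [{"role": "system", "content": "system prompt"}] + turns
without_system = turns  # e.g. a caller that strips the system prompt first

head_with = protect_head_size(with_system)        # 4: system + 3 non-system
head_without = protect_head_size(without_system)  # 3
```

Either way, `turns[0:3]` lands in the protected head. That is the stability the `_protect_head_size` docstring describes for the gateway `/compress` path.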
--- a/agent/conversation_compression.py
+++ b/agent/conversation_compression.py
@@ -0,0 +1,603 @@
+"""Context compression — extract the AIAgent methods that drive summarisation.
+
+Four concerns live here:
+
+* :func:`check_compression_model_feasibility` — startup probe of the
+  configured auxiliary compression model.  Warns when the aux context
+  window can't fit the main model's compression threshold; auto-lowers
+  the session threshold when possible; hard-rejects auxes below
+  ``MINIMUM_CONTEXT_LENGTH``.
+
+* :func:`replay_compression_warning` — re-emit a stored warning through
+  the gateway ``status_callback`` once it's wired up (the callback is
+  set after :class:`AIAgent` construction).
+
+* :func:`compress_context` — the actual compression call.  Runs the
+  configured compressor, splits the SQLite session, rotates the
+  session_id, notifies plugin context engines / memory providers, and
+  returns the compressed message list and freshly-built system prompt.
+
+* :func:`try_shrink_image_parts_in_messages` — image-too-large recovery
+  helper that re-encodes ``data:image/...;base64,...`` parts at a smaller
+  size so retries can fit under provider ceilings (Anthropic's 5 MB).
+
+``run_agent`` keeps thin wrappers for each so existing call sites
+(``self._compress_context(...)``) keep working.  Tests that exercise
+these paths see no behavioural change.
+"""
+
+from __future__ import annotations
+
+import logging
+import os
+import tempfile
+import uuid
+from datetime import datetime
+from pathlib import Path
+from typing import Any, List, Optional, Tuple
+
+from agent.model_metadata import estimate_request_tokens_rough
+
+logger = logging.getLogger(__name__)
+
+
+def check_compression_model_feasibility(agent: Any) -> None:
+    """Warn at session start if the auxiliary compression model's context
+    window is smaller than the main model's compression threshold.
+
+    When the auxiliary model cannot fit the content that needs summarising,
+    compression will either fail outright (the LLM call errors) or produce
+    a severely truncated summary.
+
+    Called during ``AIAgent.__init__`` so CLI users see the warning
+    immediately (via ``_vprint``).  The gateway sets ``status_callback``
+    *after* construction, so :func:`replay_compression_warning` re-sends
+    the stored warning through the callback on the first
+    ``run_conversation()`` call.
+    """
+    if not agent.compression_enabled:
+        return
+    try:
+        from agent.auxiliary_client import (
+            _resolve_task_provider_model,
+            get_text_auxiliary_client,
+        )
+        from agent.model_metadata import (
+            MINIMUM_CONTEXT_LENGTH,
+            get_model_context_length,
+        )
+
+        client, aux_model = get_text_auxiliary_client(
+            "compression",
+            main_runtime=agent._current_main_runtime(),
+        )
+        # Best-effort aux provider label for the warning message. The
+        # configured provider may be "auto", in which case we fall back
+        # to the client's base_url hostname so the user can still tell
+        # where the compression model is actually being called.
+        try:
+            _aux_cfg_provider, _, _, _, _ = _resolve_task_provider_model("compression")
+        except Exception:
+            _aux_cfg_provider = ""
+        if client is None or not aux_model:
+            if _aux_cfg_provider and _aux_cfg_provider != "auto":
+                msg = (
+                    "⚠ Configured auxiliary compression provider "
+                    f"'{_aux_cfg_provider}' is unavailable — context "
+                    "compression will drop middle turns without a summary. "
+                    "Check auxiliary.compression in config.yaml and "
+                    "reauthenticate that provider."
+                )
+            else:
+                msg = (
+                    "⚠ No auxiliary LLM provider configured — context "
+                    "compression will drop middle turns without a summary. "
+                    "Run `hermes setup` or set OPENROUTER_API_KEY."
+                )
+            agent._compression_warning = msg
+            agent._emit_status(msg)
+            logger.warning(
+                "No auxiliary LLM provider for compression — "
+                "summaries will be unavailable."
+            )
+            return
+
+        aux_base_url = str(getattr(client, "base_url", ""))
+        # ``client.api_key`` may be a callable (Azure Foundry Entra ID
+        # bearer provider). The context-length resolver chain expects a
+        # string, but it only needs a key for live catalogue probes
+        # (provider model lists). For Entra clients the model-metadata
+        # chain still resolves via models.dev + hardcoded family
+        # fallbacks, which don't require auth — pass empty string rather
+        # than minting a bearer JWT just to look up a context length.
+        _raw_aux_key = getattr(client, "api_key", "")
+        aux_api_key = "" if (callable(_raw_aux_key) and not isinstance(_raw_aux_key, str)) else str(_raw_aux_key or "")
+
+        aux_context = get_model_context_length(
+            aux_model,
+            base_url=aux_base_url,
+            api_key=aux_api_key,
+            config_context_length=getattr(agent, "_aux_compression_context_length_config", None),
+            # Each model must be resolved with its own provider so that
+            # provider-specific paths (e.g. Bedrock static table, OpenRouter API)
+            # are invoked for the correct client, not inherited from the main model.
+            provider=(_aux_cfg_provider if _aux_cfg_provider and _aux_cfg_provider != "auto" else getattr(agent, "provider", "")),
+            custom_providers=agent._custom_providers,
+        )
+
+        # Hard floor: the auxiliary compression model must have at least
+        # MINIMUM_CONTEXT_LENGTH (64K) tokens of context.  The main model
+        # is already required to meet this floor (checked earlier in
+        # __init__), so the compression model must too — otherwise it
+        # cannot summarise a full threshold-sized window of main-model
+        # content.  Mirrors the main-model rejection pattern.
+        if aux_context and aux_context < MINIMUM_CONTEXT_LENGTH:
+            raise ValueError(
+                f"Auxiliary compression model {aux_model} has a context "
+                f"window of {aux_context:,} tokens, which is below the "
+                f"minimum {MINIMUM_CONTEXT_LENGTH:,} required by Hermes "
+                f"Agent.  Choose a compression model with at least "
+                f"{MINIMUM_CONTEXT_LENGTH // 1000}K context (set "
+                f"auxiliary.compression.model in config.yaml), or set "
+                f"auxiliary.compression.context_length to override the "
+                f"detected value if it is wrong."
+            )
+
+        threshold = agent.context_compressor.threshold_tokens
+        if aux_context and aux_context < threshold:
+            # Auto-correct: lower the live session threshold so
+            # compression actually works this session.  The hard floor
+            # above guarantees aux_context >= MINIMUM_CONTEXT_LENGTH,
+            # so the new threshold is always >= 64K.
+            #
+            # The compression summariser sends a single user-role
+            # prompt (no system prompt, no tools) to the aux model, so
+            # new_threshold == aux_context is safe: the request is
+            # the raw messages plus a small summarisation instruction.
+            old_threshold = threshold
+            new_threshold = aux_context
+            agent.context_compressor.threshold_tokens = new_threshold
+            # Keep threshold_percent in sync so future main-model
+            # context_length changes (update_model) re-derive from a
+            # sensible number rather than the original too-high value.
+            main_ctx = agent.context_compressor.context_length
+            if main_ctx:
+                agent.context_compressor.threshold_percent = (
+                    new_threshold / main_ctx
+                )
+            safe_pct = int((aux_context / main_ctx) * 100) if main_ctx else 50
+            # Build human-readable "model (provider)" labels for both
+            # the main model and the compression model so users can
+            # tell at a glance which provider each side is actually
+            # using. When the configured provider is empty or "auto",
+            # fall back to the client's base_url hostname.
+            _main_model = getattr(agent, "model", "") or "?"
+            _main_provider = getattr(agent, "provider", "") or ""
+            _aux_provider_label = (
+                _aux_cfg_provider
+                if _aux_cfg_provider and _aux_cfg_provider != "auto"
+                else ""
+            )
+            if not _aux_provider_label:
+                try:
+                    from urllib.parse import urlparse
+                    _aux_provider_label = (
+                        urlparse(aux_base_url).hostname or aux_base_url
+                    )
+                except Exception:
+                    _aux_provider_label = aux_base_url or "auto"
+            _main_label = (
+                f"{_main_model} ({_main_provider})"
+                if _main_provider
+                else _main_model
+            )
+            _aux_label = f"{aux_model} ({_aux_provider_label})"
+            msg = (
+                f"⚠ Compression model {_aux_label} context is "
+                f"{aux_context:,} tokens, but the main model "
+                f"{_main_label}'s compression threshold was "
+                f"{old_threshold:,} tokens. "
+                f"Auto-lowered this session's threshold to "
+                f"{new_threshold:,} tokens so compression can run.\n"
+                f"  To make this permanent, edit config.yaml — either:\n"
+                f"  1. Use a larger compression model:\n"
+                f"       auxiliary:\n"
+                f"         compression:\n"
+                f"           model: <model-with-{old_threshold:,}+-context>\n"
+                f"  2. Lower the compression threshold:\n"
+                f"       compression:\n"
+                f"         threshold: 0.{safe_pct:02d}"
+            )
+            agent._compression_warning = msg
+            agent._emit_status(msg)
+            logger.warning(
+                "Auxiliary compression model %s has %d token context, "
+                "below the main model's compression threshold of %d "
+                "tokens — auto-lowered session threshold to %d to "
+                "keep compression working.",
+                aux_model,
+                aux_context,
+                old_threshold,
+                new_threshold,
+            )
+    except ValueError:
+        # Hard rejections (aux below minimum context) must propagate
+        # so the session refuses to start.
+        raise
+    except Exception as exc:
+        logger.debug(
+            "Compression feasibility check failed (non-fatal): %s", exc
+        )
+
+
+def replay_compression_warning(agent: Any) -> None:
+    """Re-send the compression warning through ``status_callback``.
+
+    During ``__init__`` the gateway's ``status_callback`` is not yet
+    wired, so ``_emit_status`` only reaches ``_vprint`` (CLI).  This
+    function is called once at the start of the first
+    ``run_conversation()`` — by then the gateway has set the callback,
+    so every platform (Telegram, Discord, Slack, etc.) receives the
+    warning.
+    """
+    msg = getattr(agent, "_compression_warning", None)
+    if msg and agent.status_callback:
+        try:
+            agent.status_callback("lifecycle", msg)
+        except Exception:
+            pass
+
+
+def compress_context(
+    agent: Any,
+    messages: list,
+    system_message: str,
+    *,
+    approx_tokens: Optional[int] = None,
+    task_id: str = "default",
+    focus_topic: Optional[str] = None,
+    force: bool = False,
+) -> Tuple[list, str]:
+    """Compress conversation context and split the session in SQLite.
+
+    Args:
+        agent: The owning :class:`AIAgent`.
+        messages: Current message history (will be summarised).
+        system_message: Current system prompt; rebuilt after compression.
+        approx_tokens: Pre-compression token estimate, logged for ops.
+        task_id: Tool task scope (used for clearing file-read dedup state).
+        focus_topic: Optional focus string for guided compression — the
+            summariser will prioritise preserving information related to
+            this topic.  Inspired by Claude Code's ``/compact <focus>``.
+        force: If True, bypass any active summary-failure cooldown.  Set
+            by the manual ``/compress`` slash command so users can retry
+            immediately after an auto-compress abort.  Auto-compress
+            callers use the default ``False``.
+
+    Returns:
+        ``(compressed_messages, new_system_prompt)`` tuple.  When
+        compression aborts (aux LLM failed to produce a usable summary),
+        returns the original messages unchanged and the existing system
+        prompt — the session is NOT rotated.  Callers should detect the
+        no-op via ``len(returned) == len(input)`` and stop the retry loop.
+    """
+    # Lazy feasibility check — run the auxiliary-provider probe + context
+    # length lookup just-in-time on the first compression attempt instead of
+    # at AIAgent.__init__. Saves ~400 ms of cold start on every short session
+    # that never reaches the threshold (the vast majority of ``chat -q`` runs).
+    # The check itself sets ``agent._compression_warning`` so the
+    # status-callback replay machinery still emits the warning to the user
+    # the first time it would matter.
+    if not getattr(agent, "_compression_feasibility_checked", True):
+        try:
+            check_compression_model_feasibility(agent)
+        finally:
+            agent._compression_feasibility_checked = True
+
+    _pre_msg_count = len(messages)
+    logger.info(
+        "context compression started: session=%s messages=%d tokens=~%s model=%s focus=%r",
+        agent.session_id or "none", _pre_msg_count,
+        f"{approx_tokens:,}" if approx_tokens else "unknown", agent.model,
+        focus_topic,
+    )
+    agent._emit_status(
+        "🗜️ Compacting context — summarizing earlier conversation so I can continue..."
+    )
+
+    # Notify external memory provider before compression discards context
+    if agent._memory_manager:
+        try:
+            agent._memory_manager.on_pre_compress(messages)
+        except Exception:
+            pass
+
+    try:
+        compressed = agent.context_compressor.compress(messages, current_tokens=approx_tokens, focus_topic=focus_topic, force=force)
+    except TypeError:
+        # Plugin context engine with strict signature that doesn't accept
+        # focus_topic / force — fall back to calling without them.
+        compressed = agent.context_compressor.compress(messages, current_tokens=approx_tokens)
+
+    # If compression aborted (aux LLM failed to produce a usable summary)
+    # the compressor returns the input messages unchanged.  Surface the
+    # error to the user, skip the session-rotation work entirely (no
+    # session has logically ended), and let auto-compress callers detect
+    # the no-op via len(returned) == len(input).
+    if getattr(agent.context_compressor, "_last_compress_aborted", False):
+        _err = getattr(agent.context_compressor, "_last_summary_error", None) or "unknown error"
+        if getattr(agent, "_last_compression_summary_warning", None) != _err:
+            agent._last_compression_summary_warning = _err
+            agent._emit_warning(
+                f"⚠ Compression aborted: {_err}. "
+                "No messages were dropped — conversation continues unchanged. "
+                "Run /compress to retry, or /new to start a fresh session."
+            )
+        _existing_sp = getattr(agent, "_cached_system_prompt", None)
+        if not _existing_sp:
+            _existing_sp = agent._build_system_prompt(system_message)
+        return messages, _existing_sp
+
+    summary_error = getattr(agent.context_compressor, "_last_summary_error", None)
+    if summary_error:
+        if getattr(agent, "_last_compression_summary_warning", None) != summary_error:
+            agent._last_compression_summary_warning = summary_error
+            agent._emit_warning(
+                f"⚠ Compression summary failed: {summary_error}. "
+                "Inserted a fallback context marker."
+            )
+    else:
+        # No hard failure — but did the configured aux model error out
+        # and get recovered by retrying on main?  Surface that so users
+        # know their auxiliary.compression.model setting is broken even
+        # though compression succeeded.
+        _aux_fail_model = getattr(agent.context_compressor, "_last_aux_model_failure_model", None)
+        _aux_fail_err = getattr(agent.context_compressor, "_last_aux_model_failure_error", None)
+        if _aux_fail_model:
+            # Dedup on (model, error) so we don't spam on every compaction
+            _aux_key = (_aux_fail_model, _aux_fail_err)
+            if getattr(agent, "_last_aux_fallback_warning_key", None) != _aux_key:
+                agent._last_aux_fallback_warning_key = _aux_key
+                agent._emit_warning(
+                    f"ℹ Configured compression model '{_aux_fail_model}' failed "
+                    f"({_aux_fail_err or 'unknown error'}). Recovered using main model — "
+                    "check auxiliary.compression.model in config.yaml."
+                )
+
+    todo_snapshot = agent._todo_store.format_for_injection()
+    if todo_snapshot:
+        compressed.append({"role": "user", "content": todo_snapshot})
+
+    agent._invalidate_system_prompt()
+    new_system_prompt = agent._build_system_prompt(system_message)
+    agent._cached_system_prompt = new_system_prompt
+
+    if agent._session_db:
+        try:
+            # Propagate title to the new session with auto-numbering
+            old_title = agent._session_db.get_session_title(agent.session_id)
+            # Trigger memory extraction on the old session before it rotates.
+            agent.commit_memory_session(messages)
+            agent._session_db.end_session(agent.session_id, "compression")
+            old_session_id = agent.session_id
+            agent.session_id = f"{datetime.now().strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:6]}"
+            os.environ["HERMES_SESSION_ID"] = agent.session_id
+            try:
+                from gateway.session_context import _SESSION_ID
+                _SESSION_ID.set(agent.session_id)
+            except Exception:
+                pass
+            agent._session_db_created = False
+            agent._session_db.create_session(
+                session_id=agent.session_id,
+                source=agent.platform or os.environ.get("HERMES_SESSION_SOURCE", "cli"),
+                model=agent.model,
+                model_config=agent._session_init_model_config,
+                parent_session_id=old_session_id,
+            )
+            agent._session_db_created = True
+            # Auto-number the title for the continuation session
+            if old_title:
+                try:
+                    new_title = agent._session_db.get_next_title_in_lineage(old_title)
+                    agent._session_db.set_session_title(agent.session_id, new_title)
+                except Exception as e:
+                    logger.debug("Could not propagate title on compression: %s", e)
+            agent._session_db.update_system_prompt(agent.session_id, new_system_prompt)
+            # Reset flush cursor — new session starts with no messages written
+            agent._last_flushed_db_idx = 0
+        except Exception as e:
+            logger.warning("Session DB compression split failed — new session will NOT be indexed: %s", e)
+
+    # Notify the context engine that the session_id rotated because of
+    # compression (not a fresh /new). Plugin engines (e.g. hermes-lcm) use
+    # boundary_reason="compression" to preserve DAG lineage across the
+    # rollover instead of re-initializing fresh per-session state.
+    # See hermes-lcm#68. Built-in ContextCompressor ignores kwargs.
+    try:
+        _old_sid = locals().get("old_session_id")
+        if _old_sid and hasattr(agent.context_compressor, "on_session_start"):
+            agent.context_compressor.on_session_start(
+                agent.session_id or "",
+                boundary_reason="compression",
+                old_session_id=_old_sid,
+            )
+    except Exception as _ce_err:
+        logger.debug("context engine on_session_start (compression): %s", _ce_err)
+
+    # Notify memory providers of the compression-driven session_id rotation
+    # so provider-cached per-session state (Hindsight's _document_id,
+    # accumulated turn buffers, counters) refreshes. reset=False because
+    # the logical conversation continues; only the id and DB row rolled
+    # over. See #6672.
+    try:
+        _old_sid = locals().get("old_session_id")
+        if _old_sid and agent._memory_manager:
+            agent._memory_manager.on_session_switch(
+                agent.session_id or "",
+                parent_session_id=_old_sid,
+                reset=False,
+                reason="compression",
+            )
+    except Exception as _me_err:
+        logger.debug("memory manager on_session_switch (compression): %s", _me_err)
+
+    # Warn on repeated compressions (quality degrades with each pass)
+    _cc = agent.context_compressor.compression_count
+    if _cc >= 2:
+        agent._vprint(
+            f"{agent.log_prefix}⚠️  Session compressed {_cc} times — "
+            f"accuracy may degrade. Consider /new to start fresh.",
+            force=True,
+        )
+
+    # Update token estimate after compaction so pressure calculations
+    # use the post-compression count, not the stale pre-compression one.
+    # Use estimate_request_tokens_rough() so tool schemas are included —
+    # with 50+ tools enabled, schemas alone can add 20-30K tokens, and
+    # omitting them delays the next compression cycle far past the
+    # configured threshold (issue #14695).
+    _compressed_est = estimate_request_tokens_rough(
+        compressed,
+        system_prompt=new_system_prompt or "",
+        tools=agent.tools or None,
+    )
+    agent.context_compressor.last_prompt_tokens = _compressed_est
+    agent.context_compressor.last_completion_tokens = 0
+
+    # Clear the file-read dedup cache.  After compression the original
+    # read content is summarised away — if the model re-reads the same
+    # file it needs the full content, not a "file unchanged" stub.
+    try:
+        from tools.file_tools import reset_file_dedup
+        reset_file_dedup(task_id)
+    except Exception:
+        pass
+
+    logger.info(
+        "context compression done: session=%s messages=%d->%d tokens=~%s",
+        agent.session_id or "none", _pre_msg_count, len(compressed),
+        f"{_compressed_est:,}",
+    )
+    return compressed, new_system_prompt
+
+
+def try_shrink_image_parts_in_messages(api_messages: list) -> bool:
+    """Re-encode oversized native image parts at a smaller size to recover from
+    image-too-large errors (Anthropic's 5 MB limit; other providers' limits are unknown).
+
+    Mutates ``api_messages`` in place. Returns True if any image part was
+    actually replaced, False if there were no image parts to shrink or
+    Pillow couldn't help (caller should surface the original error).
+
+    Strategy: look for ``image_url`` / ``input_image`` parts carrying a
+    ``data:image/...;base64,...`` payload.  For each one whose encoded
+    size exceeds 4 MB (a safe target that slides under Anthropic's 5 MB
+    ceiling with header overhead), write the base64 to a tempfile, call
+    ``vision_tools._resize_image_for_vision`` to produce a smaller data
+    URL, and substitute it in place.
+
+    Non-data-URL images (http/https URLs) are not touched — the provider
+    fetches those itself and the size limit is different.
+    """
+    if not api_messages:
+        return False
+
+    try:
+        from tools.vision_tools import _resize_image_for_vision
+    except Exception as exc:
+        logger.warning("image-shrink recovery: vision_tools unavailable — %s", exc)
+        return False
+
+    # 4 MB target leaves comfortable headroom under Anthropic's 5 MB.
+    # Providers we haven't seen reject images accept much larger payloads;
+    # shrinking to 4 MB here costs quality, but it only fires after a
+    # confirmed provider rejection, so the alternative is failure.
+    target_bytes = 4 * 1024 * 1024
+    changed_count = 0
+
+    def _shrink_data_url(url: str) -> Optional[str]:
+        """Return a smaller data URL, or None if shrink can't help."""
+        if not isinstance(url, str) or not url.startswith("data:"):
+            return None
+        if len(url) <= target_bytes:
+            # This specific image wasn't the oversized one.
+            return None
+        try:
+            header, _, data = url.partition(",")
+            mime = "image/jpeg"
+            if header.startswith("data:"):
+                mime_part = header[len("data:"):].split(";", 1)[0].strip()
+                if mime_part.startswith("image/"):
+                    mime = mime_part
+            import base64 as _b64
+            raw = _b64.b64decode(data)
+            suffix = {
+                "image/png": ".png", "image/gif": ".gif", "image/webp": ".webp",
+                "image/jpeg": ".jpg", "image/jpg": ".jpg", "image/bmp": ".bmp",
+            }.get(mime, ".jpg")
+            tmp = tempfile.NamedTemporaryFile(
+                prefix="hermes_shrink_", suffix=suffix, delete=False,
+            )
+            try:
+                tmp.write(raw)
+                tmp.close()
+                resized = _resize_image_for_vision(
+                    Path(tmp.name),
+                    mime_type=mime,
+                    max_base64_bytes=target_bytes,
+                )
+            finally:
+                try:
+                    Path(tmp.name).unlink(missing_ok=True)
+                except Exception:
+                    pass
+            if not resized or len(resized) >= len(url):
+                # Shrink didn't help (or made it bigger — corrupt input?).
+                return None
+            return resized
+        except Exception as exc:
+            logger.warning("image-shrink recovery: re-encode failed — %s", exc)
+            return None
+
+    for msg in api_messages:
+        if not isinstance(msg, dict):
+            continue
+        content = msg.get("content")
+        if not isinstance(content, list):
+            continue
+        for part in content:
+            if not isinstance(part, dict):
+                continue
+            ptype = part.get("type")
+            if ptype not in {"image_url", "input_image"}:
+                continue
+            image_value = part.get("image_url")
+            # OpenAI chat.completions: {"image_url": {"url": "data:..."}}
+            # OpenAI Responses: {"image_url": "data:..."}
+            if isinstance(image_value, dict):
+                url = image_value.get("url", "")
+                resized = _shrink_data_url(url)
+                if resized:
+                    image_value["url"] = resized
+                    changed_count += 1
+            elif isinstance(image_value, str):
+                resized = _shrink_data_url(image_value)
+                if resized:
+                    part["image_url"] = resized
+                    changed_count += 1
+
+    if changed_count:
+        logger.info(
+            "image-shrink recovery: re-encoded %d image part(s) to fit under %.0f MB",
+            changed_count, target_bytes / (1024 * 1024),
+        )
+    return changed_count > 0
+
+
+__all__ = [
+    "check_compression_model_feasibility",
+    "replay_compression_warning",
+    "compress_context",
+    "try_shrink_image_parts_in_messages",
+]
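The feasibility rules in `check_compression_model_feasibility` reduce to a small decision table: skip when the aux context is unknown, reject below the floor, clamp the threshold when the aux model is smaller than it. Below is a minimal standalone sketch of that arithmetic, assuming the "64K" floor equals 64_000 tokens (the real `MINIMUM_CONTEXT_LENGTH` value is not shown in this diff). `plan_compression_threshold` is a hypothetical name, not part of this module.

```python
from typing import Optional, Tuple


def plan_compression_threshold(
    main_ctx: int,
    threshold_tokens: int,
    aux_context: Optional[int],
    minimum: int = 64_000,  # assumption: stand-in for MINIMUM_CONTEXT_LENGTH
) -> Tuple[int, Optional[float]]:
    """Return (threshold_tokens, threshold_percent) after the feasibility rules.

    Hypothetical helper that restates the checks without touching an agent.
    """

    def as_percent(tokens: int) -> Optional[float]:
        # threshold_percent is only derivable when the main context is known.
        return tokens / main_ctx if main_ctx else None

    # Unknown aux context (None or 0): nothing to check, keep the threshold.
    if not aux_context:
        return threshold_tokens, as_percent(threshold_tokens)
    # Hard floor: an aux model below the minimum is rejected outright.
    if aux_context < minimum:
        raise ValueError(
            f"aux context {aux_context:,} is below the minimum {minimum:,}"
        )
    # Auto-correct: clamp the threshold to what the aux model can read in one
    # request (the summariser sends no system prompt or tools).
    if aux_context < threshold_tokens:
        return aux_context, as_percent(aux_context)
    return threshold_tokens, as_percent(threshold_tokens)
```

For example, a 200K main model with a 160K threshold paired with a 128K aux model ends up with a 128K threshold, i.e. `threshold_percent` of 0.64.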
--- a/agent/conversation_loop.py
+++ b/agent/conversation_loop.py
--- a/agent/copilot_acp_client.py
+++ b/agent/copilot_acp_client.py
@@ -30,6 +30,28 @@ _DEFAULT_TIMEOUT_SECONDS = 900.0
 _TOOL_CALL_BLOCK_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)
 _TOOL_CALL_JSON_RE = re.compile(r"\{\s*\"id\"\s*:\s*\"[^\"]+\"\s*,\s*\"type\"\s*:\s*\"function\"\s*,\s*\"function\"\s*:\s*\{.*?\}\s*\}", re.DOTALL)

+# Stderr fingerprint of the deprecated `gh copilot` CLI extension
+# (https://github.blog/changelog/2025-09-25-upcoming-deprecation-of-gh-copilot-cli-extension).
+# We require BOTH the literal product name ("gh-copilot") AND a deprecation
+# marker, so generic stderr from the NEW `@github/copilot` CLI — whose repo
+# is github.com/github/copilot-cli and which legitimately mentions "copilot-cli"
+# in its own banners and error messages — doesn't get misclassified as the
+# deprecated extension.
+_DEPRECATION_REQUIRED = ("gh-copilot",)
+_DEPRECATION_MARKERS = (
+    "has been deprecated",
+    "no commands will be executed",
+)
+
+
+def _is_gh_copilot_deprecation_message(stderr_text: str) -> bool:
+    """True iff stderr looks like the deprecated gh-copilot extension's banner."""
+
+    lower = stderr_text.lower()
+    if not any(req in lower for req in _DEPRECATION_REQUIRED):
+        return False
+    return any(marker in lower for marker in _DEPRECATION_MARKERS)
+

 def _resolve_command() -> str:
    return (
@@ -506,6 +528,21 @@ class CopilotACPClient:

            stderr_text = "\n".join(stderr_tail).strip()
            if proc.poll() is not None and stderr_text:
+                if _is_gh_copilot_deprecation_message(stderr_text):
+                    raise RuntimeError(
+                        "Hermes ACP mode requires the NEW GitHub Copilot CLI "
+                        "(github.com/github/copilot-cli), but the binary it just "
+                        "spawned is the deprecated `gh copilot` extension.\n\n"
+                        "Install the new CLI:\n"
+                        "  npm install -g @github/copilot\n"
+                        "  # then verify with: copilot --help\n\n"
+                        "If `copilot` already resolves to the new CLI but you still see this,\n"
+                        "point Hermes at it explicitly:\n"
+                        "  export HERMES_COPILOT_ACP_COMMAND=/path/to/new/copilot\n\n"
+                        "Alternative: use the `copilot` provider (no ACP, hits the Copilot API\n"
+                        "directly with a Copilot subscription token) via `hermes setup`.\n\n"
+                        f"Original error:\n{stderr_text}"
+                    )
                raise RuntimeError(f"Copilot ACP process exited early: {stderr_text}")
            raise TimeoutError(f"Timed out waiting for Copilot ACP response to {method}.")

@@ -599,7 +636,10 @@ class CopilotACPClient:
                block_error = get_read_block_error(str(path))
                if block_error:
                    raise PermissionError(block_error)
-                content = path.read_text() if path.exists() else ""
+                try:
+                    content = path.read_text()
+                except FileNotFoundError:
+                    content = ""
                line = params.get("line")
                limit = params.get("limit")
                if isinstance(line, int) and line > 1:
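The deprecation fingerprint in `copilot_acp_client.py` requires two independent conditions so that the new CLI's own "copilot-cli" mentions never trigger it. The sketch below restates that check as a hypothetical `looks_like_deprecated_extension` helper using the same required name and marker phrases; it is an illustration, not a replacement for `_is_gh_copilot_deprecation_message`.

```python
from typing import Iterable


def looks_like_deprecated_extension(
    stderr_text: str,
    required: Iterable[str] = ("gh-copilot",),
    markers: Iterable[str] = ("has been deprecated", "no commands will be executed"),
) -> bool:
    """Two-condition fingerprint: product name AND a deprecation marker."""
    lower = stderr_text.lower()
    # Condition 1: the deprecated product's literal name must appear.
    if not any(req in lower for req in required):
        return False
    # Condition 2: at least one deprecation phrase must also appear.
    return any(marker in lower for marker in markers)
```

With invented sample strings (not the real banners of either CLI), `"gh-copilot has been deprecated"` matches, while `"copilot-cli: option has been deprecated"` does not, because "copilot-cli" never contains the literal "gh-copilot".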
--- a/agent/credential_pool.py
+++ b/agent/credential_pool.py
@@ -10,7 +10,7 @@ import time
 import uuid
 import re
 from dataclasses import dataclass, fields, replace
-from datetime import datetime
+from datetime import datetime, timezone
 from typing import Any, Dict, List, Optional, Set, Tuple

 from hermes_constants import OPENROUTER_BASE_URL
@@ -29,6 +29,7 @@ from hermes_cli.auth import (
    _resolve_zai_base_url,
    _save_auth_store,
    _save_provider_state,
+    _store_provider_state,
    read_credential_pool,
    write_credential_pool,
 )
@@ -128,6 +129,9 @@ class PooledCredential:
    def from_dict(cls, provider: str, payload: Dict[str, Any]) -> "PooledCredential":
        field_names = {f.name for f in fields(cls) if f.name != "provider"}
        data = {k: payload.get(k) for k in field_names if k in payload}
+        # Rehydrated last_status_at may be an ISO string from to_dict() — normalize to float epoch
+        if "last_status_at" in data and isinstance(data["last_status_at"], str):
+            data["last_status_at"] = _parse_absolute_timestamp(data["last_status_at"])
        extra = {k: payload[k] for k in _EXTRA_KEYS if k in payload and payload[k] is not None}
        data["extra"] = extra
        data.setdefault("id", uuid.uuid4().hex[:6])
@@ -149,7 +153,7 @@ class PooledCredential:
        }
        result: Dict[str, Any] = {}
        for field_def in fields(self):
-            if field_def.name in ("provider", "extra"):
+            if field_def.name in {"provider", "extra"}:
                continue
            value = getattr(self, field_def.name)
            if value is not None or field_def.name in _ALWAYS_EMIT:
@@ -162,6 +166,8 @@ class PooledCredential:
    @property
    def runtime_api_key(self) -> str:
        if self.provider == "nous":
+            # Nous stores the runtime inference credential in agent_key for
+            # compatibility. It may be a NAS invoke JWT or a legacy opaque key.
            return str(self.agent_key or self.access_token or "")
        return str(self.access_token or "")

@@ -539,6 +545,64 @@ class CredentialPool:
            logger.debug("Failed to sync Codex entry from auth.json: %s", exc)
        return entry

+    def _sync_xai_oauth_entry_from_auth_store(self, entry: PooledCredential) -> PooledCredential:
+        """Sync an xAI OAuth pool entry from auth.json if tokens differ.
+
+        xAI OAuth refresh tokens are single-use.  When another Hermes process
+        (or another profile sharing the same auth.json) refreshes the token,
+        it writes the new pair to ``providers["xai-oauth"]["tokens"]`` under
+        ``_auth_store_lock``.  Without this resync, our in-memory pool entry
+        keeps the consumed refresh_token and the next ``_refresh_entry`` call
+        would replay it and get a ``refresh_token_reused``-style 4xx.
+
+        Only applies to entries seeded from the singleton (``loopback_pkce``);
+        manually added entries (``manual:xai_pkce``) are independent
+        credentials with their own refresh-token lifecycle.
+        """
+        if self.provider != "xai-oauth" or entry.source != "loopback_pkce":
+            return entry
+        try:
+            with _auth_store_lock():
+                auth_store = _load_auth_store()
+                state = _load_provider_state(auth_store, "xai-oauth")
+            if not isinstance(state, dict):
+                return entry
+            tokens = state.get("tokens")
+            if not isinstance(tokens, dict):
+                return entry
+            store_access = tokens.get("access_token", "")
+            store_refresh = tokens.get("refresh_token", "")
+            entry_access = entry.access_token or ""
+            entry_refresh = entry.refresh_token or ""
+            if store_access and (
+                store_access != entry_access
+                or (store_refresh and store_refresh != entry_refresh)
+            ):
+                logger.debug(
+                    "Pool entry %s: syncing xAI OAuth tokens from auth.json "
+                    "(refreshed by another process)",
+                    entry.id,
+                )
+                field_updates: Dict[str, Any] = {
+                    "access_token": store_access,
+                    "refresh_token": store_refresh or entry.refresh_token,
+                    "last_status": None,
+                    "last_status_at": None,
+                    "last_error_code": None,
+                    "last_error_reason": None,
+                    "last_error_message": None,
+                    "last_error_reset_at": None,
+                }
+                if state.get("last_refresh"):
+                    field_updates["last_refresh"] = state["last_refresh"]
+                updated = replace(entry, **field_updates)
+                self._replace_entry(entry, updated)
+                self._persist()
+                return updated
+        except Exception as exc:
+            logger.debug("Failed to sync xAI OAuth entry from auth.json: %s", exc)
+        return entry
+
    def _sync_nous_entry_from_auth_store(self, entry: PooledCredential) -> PooledCredential:
        """Sync a Nous pool entry from auth.json if tokens differ.

@@ -559,18 +623,35 @@ class CredentialPool:
                return entry
            store_refresh = state.get("refresh_token", "")
            store_access = state.get("access_token", "")
-            if store_refresh and store_refresh != entry.refresh_token:
+            comparable_updates = {
+                "access_token": store_access,
+                "refresh_token": store_refresh,
+                "expires_at": state.get("expires_at"),
+                "agent_key": state.get("agent_key"),
+                "agent_key_expires_at": state.get("agent_key_expires_at"),
+                "inference_base_url": state.get("inference_base_url"),
+            }
+            should_sync = any(
+                value not in (None, "") and getattr(entry, key, None) != value
+                for key, value in comparable_updates.items()
+            )
+            if should_sync:
                logger.debug(
-                    "Pool entry %s: syncing tokens from auth.json (Nous refresh token changed)",
+                    "Pool entry %s: syncing Nous state from auth.json",
                    entry.id,
                )
                field_updates: Dict[str, Any] = {
-                    "access_token": store_access,
-                    "refresh_token": store_refresh,
                    "last_status": None,
                    "last_status_at": None,
                    "last_error_code": None,
+                    "last_error_reason": None,
+                    "last_error_message": None,
+                    "last_error_reset_at": None,
                }
+                if store_access:
+                    field_updates["access_token"] = store_access
+                if store_refresh:
+                    field_updates["refresh_token"] = store_refresh
                if state.get("expires_at"):
                    field_updates["expires_at"] = state["expires_at"]
                if state.get("agent_key"):
@@ -604,9 +685,22 @@ class CredentialPool:
        re-seeding a consumed single-use refresh token.

        Applies to any OAuth provider whose singleton lives in auth.json
-        (currently Nous and OpenAI Codex).
+        (currently Nous, OpenAI Codex, and xAI Grok OAuth).
+
+        ``set_active=False`` on every write: a pool sync-back is a
+        token-rotation side effect, not the user choosing a provider.
+        Using ``_save_provider_state`` (which sets ``active_provider``)
+        here would mean every Nous/Codex/xAI refresh in a multi-provider
+        setup silently flips the ``active_provider`` flag — the next
+        ``hermes`` invocation that defaults to the active provider
+        (e.g. setup wizard, ``hermes auth status``) would land on
+        whatever provider happened to refresh last, not whatever the
+        user actually chose.
        """
-        if entry.source != "device_code":
+        # Only sync entries that were seeded *from* a singleton.  Manually
+        # added pool entries (source="manual:*") are independent credentials
+        # and must not write back to the singleton.
+        if entry.source not in {"device_code", "loopback_pkce"}:
            return
        try:
            with _auth_store_lock():
@@ -632,7 +726,7 @@ class CredentialPool:
                            state[extra_key] = val
                    if entry.inference_base_url:
                        state["inference_base_url"] = entry.inference_base_url
-                    _save_provider_state(auth_store, "nous", state)
+                    _store_provider_state(auth_store, "nous", state, set_active=False)

                elif self.provider == "openai-codex":
                    state = _load_provider_state(auth_store, "openai-codex")
@@ -646,7 +740,21 @@ class CredentialPool:
                        tokens["refresh_token"] = entry.refresh_token
                    if entry.last_refresh:
                        state["last_refresh"] = entry.last_refresh
-                    _save_provider_state(auth_store, "openai-codex", state)
+                    _store_provider_state(auth_store, "openai-codex", state, set_active=False)
+
+                elif self.provider == "xai-oauth":
+                    state = _load_provider_state(auth_store, "xai-oauth")
+                    if not isinstance(state, dict):
+                        return
+                    tokens = state.get("tokens")
+                    if not isinstance(tokens, dict):
+                        return
+                    tokens["access_token"] = entry.access_token
+                    if entry.refresh_token:
+                        tokens["refresh_token"] = entry.refresh_token
+                    if entry.last_refresh:
+                        state["last_refresh"] = entry.last_refresh
+                    _store_provider_state(auth_store, "xai-oauth", state, set_active=False)

                else:
                    return
@@ -689,6 +797,13 @@ class CredentialPool:
                    except Exception as wexc:
                        logger.debug("Failed to write refreshed token to credentials file: %s", wexc)
            elif self.provider == "openai-codex":
+                # Adopt fresher tokens from auth.json before spending the
+                # refresh_token — single-use tokens consumed by another Hermes
+                # process sharing the same auth.json singleton would otherwise
+                # trigger ``refresh_token_reused`` on the next POST.
+                synced = self._sync_codex_entry_from_auth_store(entry)
+                if synced is not entry:
+                    entry = synced
                refreshed = auth_mod.refresh_codex_oauth_pure(
                    entry.access_token,
                    entry.refresh_token,
@@ -699,40 +814,38 @@ class CredentialPool:
                    refresh_token=refreshed["refresh_token"],
                    last_refresh=refreshed.get("last_refresh"),
                )
+            elif self.provider == "xai-oauth":
+                # Adopt fresher tokens from auth.json before spending the
+                # refresh_token — single-use tokens consumed by another
+                # process (or another profile sharing the singleton) would
+                # otherwise trigger ``refresh_token_reused`` on the next
+                # POST.  Only meaningful for singleton-seeded entries.
+                synced = self._sync_xai_oauth_entry_from_auth_store(entry)
+                if synced is not entry:
+                    entry = synced
+                refreshed = auth_mod.refresh_xai_oauth_pure(
+                    entry.access_token,
+                    entry.refresh_token,
+                )
+                updated = replace(
+                    entry,
+                    access_token=refreshed["access_token"],
+                    refresh_token=refreshed["refresh_token"],
+                    last_refresh=refreshed.get("last_refresh"),
+                )
            elif self.provider == "nous":
                synced = self._sync_nous_entry_from_auth_store(entry)
                if synced is not entry:
                    entry = synced
-                nous_state = {
-                    "access_token": entry.access_token,
-                    "refresh_token": entry.refresh_token,
-                    "client_id": entry.client_id,
-                    "portal_base_url": entry.portal_base_url,
-                    "inference_base_url": entry.inference_base_url,
-                    "token_type": entry.token_type,
-                    "scope": entry.scope,
-                    "obtained_at": entry.obtained_at,
-                    "expires_at": entry.expires_at,
-                    "agent_key": entry.agent_key,
-                    "agent_key_expires_at": entry.agent_key_expires_at,
-                    "tls": entry.tls,
-                }
-                refreshed = auth_mod.refresh_nous_oauth_from_state(
-                    nous_state,
+                auth_mod.resolve_nous_runtime_credentials(
                    min_key_ttl_seconds=DEFAULT_AGENT_KEY_MIN_TTL_SECONDS,
-                    force_refresh=force,
-                    force_mint=force,
+                    inference_auth_mode=(
+                        auth_mod.NOUS_INFERENCE_AUTH_MODE_LEGACY
+                        if force
+                        else auth_mod.NOUS_INFERENCE_AUTH_MODE_AUTO
+                    ),
                )
-                # Apply returned fields: dataclass fields via replace, extras via dict update
-                field_updates = {}
-                extra_updates = dict(entry.extra)
-                _field_names = {f.name for f in fields(entry)}
-                for k, v in refreshed.items():
-                    if k in _field_names:
-                        field_updates[k] = v
-                    elif k in _EXTRA_KEYS:
-                        extra_updates[k] = v
-                updated = replace(entry, extra=extra_updates, **field_updates)
+                updated = self._sync_nous_entry_from_auth_store(entry)
            else:
                return entry
        except Exception as exc:
@@ -777,6 +890,140 @@ class CredentialPool:
                    # Credentials file had a valid (non-expired) token — use it directly
                    logger.debug("Credentials file has valid token, using without refresh")
                    return synced
+            # For xai-oauth: same race as nous — another process may have
+            # consumed the refresh token between our proactive sync and the
+            # HTTP call.  Re-check auth.json and adopt the fresh tokens if
+            # they have rotated since.  Only meaningful for singleton-seeded
+            # (loopback_pkce) entries; manual entries don't share state with
+            # the singleton.
+            if self.provider == "xai-oauth":
+                synced = self._sync_xai_oauth_entry_from_auth_store(entry)
+                if synced.refresh_token != entry.refresh_token:
+                    logger.debug(
+                        "xAI OAuth refresh failed but auth.json has newer tokens — adopting"
+                    )
+                    updated = replace(
+                        synced,
+                        last_status=STATUS_OK,
+                        last_status_at=None,
+                        last_error_code=None,
+                        last_error_reason=None,
+                        last_error_message=None,
+                        last_error_reset_at=None,
+                    )
+                    self._replace_entry(synced, updated)
+                    self._persist()
+                    return updated
+                # Terminal error: auth.json has no newer tokens — the stored
+                # refresh_token is dead.  Clear it from auth.json so the next
+                # session does not re-seed the same revoked credentials, and
+                # remove all singleton-seeded (loopback_pkce) entries from the
+                # in-memory pool.  Mirrors the Nous quarantine path below.
+                if auth_mod._is_terminal_xai_oauth_refresh_error(exc):
+                    logger.debug(
+                        "xAI OAuth refresh token is terminally invalid; clearing local token state"
+                    )
+                    try:
+                        with _auth_store_lock():
+                            auth_store = _load_auth_store()
+                            state = _load_provider_state(auth_store, "xai-oauth") or {}
+                            if isinstance(state, dict):
+                                tokens = state.get("tokens") or {}
+                                if isinstance(tokens, dict):
+                                    store_refresh = str(tokens.get("refresh_token") or "").strip()
+                                    entry_refresh = str(entry.refresh_token or "").strip()
+                                    if not store_refresh or store_refresh == entry_refresh:
+                                        tokens.pop("access_token", None)
+                                        tokens.pop("refresh_token", None)
+                                        state["tokens"] = tokens
+                                        state["last_auth_error"] = {
+                                            "provider": "xai-oauth",
+                                            "code": getattr(exc, "code", "unknown"),
+                                            "message": str(exc),
+                                            "reason": "credential_pool_refresh_failure",
+                                            "relogin_required": True,
+                                            "at": datetime.now(timezone.utc).isoformat(),
+                                        }
+                                        _store_provider_state(auth_store, "xai-oauth", state, set_active=False)
+                                        _save_auth_store(auth_store)
+                    except Exception as clear_exc:
+                        logger.debug(
+                            "Failed to clear terminal xAI OAuth state: %s", clear_exc
+                        )
+                    self._entries = [
+                        item for item in self._entries
+                        if item.source != "loopback_pkce"
+                    ]
+                    if self._current_id == entry.id:
+                        self._current_id = None
+                    self._persist()
+                    return None
+            # For openai-codex: same race as xAI/nous — another Hermes process
+            # may have consumed the refresh token between our proactive sync
+            # and the HTTP call.  Re-check auth.json and adopt the fresh tokens
+            # if they have rotated since.
+            if self.provider == "openai-codex":
+                synced = self._sync_codex_entry_from_auth_store(entry)
+                if synced.refresh_token != entry.refresh_token:
+                    logger.debug(
+                        "Codex OAuth refresh failed but auth.json has newer tokens — adopting"
+                    )
+                    updated = replace(
+                        synced,
+                        last_status=STATUS_OK,
+                        last_status_at=None,
+                        last_error_code=None,
+                        last_error_reason=None,
+                        last_error_message=None,
+                        last_error_reset_at=None,
+                    )
+                    self._replace_entry(synced, updated)
+                    self._persist()
+                    return updated
+                # Terminal error: auth.json has no newer tokens — the stored
+                # refresh_token is dead.  Clear it from auth.json so the next
+                # session does not re-seed the same revoked credentials, and
+                # remove all singleton-seeded (device_code) entries from the
+                # in-memory pool.  Mirrors the xAI and Nous quarantine paths.
+                if auth_mod._is_terminal_codex_oauth_refresh_error(exc):
+                    logger.debug(
+                        "Codex OAuth refresh token is terminally invalid; clearing local token state"
+                    )
+                    try:
+                        with _auth_store_lock():
+                            auth_store = _load_auth_store()
+                            state = _load_provider_state(auth_store, "openai-codex") or {}
+                            if isinstance(state, dict):
+                                tokens = state.get("tokens") or {}
+                                if isinstance(tokens, dict):
+                                    store_refresh = str(tokens.get("refresh_token") or "").strip()
+                                    entry_refresh = str(entry.refresh_token or "").strip()
+                                    if not store_refresh or store_refresh == entry_refresh:
+                                        tokens.pop("access_token", None)
+                                        tokens.pop("refresh_token", None)
+                                        state["tokens"] = tokens
+                                        state["last_auth_error"] = {
+                                            "provider": "openai-codex",
+                                            "code": getattr(exc, "code", "unknown"),
+                                            "message": str(exc),
+                                            "reason": "credential_pool_refresh_failure",
+                                            "relogin_required": True,
+                                            "at": datetime.now(timezone.utc).isoformat(),
+                                        }
+                                        _store_provider_state(auth_store, "openai-codex", state, set_active=False)
+                                        _save_auth_store(auth_store)
+                    except Exception as clear_exc:
+                        logger.debug(
+                            "Failed to clear terminal Codex OAuth state: %s", clear_exc
+                        )
+                    self._entries = [
+                        item for item in self._entries
+                        if item.source != "device_code"
+                    ]
+                    if self._current_id == entry.id:
+                        self._current_id = None
+                    self._persist()
+                    return None
            # For nous: another process may have consumed the refresh token
            # between our proactive sync and the HTTP call.  Re-sync from
            # auth.json and adopt the fresh tokens if available.
@@ -797,6 +1044,49 @@ class CredentialPool:
                    self._persist()
                    self._sync_device_code_entry_to_auth_store(updated)
                    return updated
+                if auth_mod._is_terminal_nous_refresh_error(exc):
+                    logger.debug("Nous refresh token is terminally invalid; clearing local token state")
+                    try:
+                        with _auth_store_lock():
+                            auth_store = _load_auth_store()
+                            state = _load_provider_state(auth_store, "nous") or {
+                                "client_id": entry.client_id,
+                                "portal_base_url": entry.portal_base_url,
+                                "inference_base_url": entry.inference_base_url,
+                                "token_type": entry.token_type,
+                                "scope": entry.scope,
+                                "tls": entry.tls,
+                            }
+                            store_refresh = str(state.get("refresh_token") or "").strip()
+                            entry_refresh = str(entry.refresh_token or "").strip()
+                            if not store_refresh or store_refresh == entry_refresh:
+                                auth_mod._quarantine_nous_oauth_state(
+                                    state,
+                                    exc,
+                                    reason="credential_pool_refresh_failure",
+                                )
+                                auth_mod._quarantine_nous_pool_entries(
+                                    auth_store,
+                                    exc,
+                                    reason="credential_pool_refresh_failure",
+                                )
+                                _store_provider_state(auth_store, "nous", state, set_active=False)
+                                _save_auth_store(auth_store)
+                    except Exception as clear_exc:
+                        logger.debug("Failed to clear terminal Nous OAuth state: %s", clear_exc)
+
+                    singleton_sources = {
+                        auth_mod.NOUS_DEVICE_CODE_SOURCE,
+                        f"manual:{auth_mod.NOUS_DEVICE_CODE_SOURCE}",
+                    }
+                    self._entries = [
+                        item for item in self._entries
+                        if item.source not in singleton_sources
+                    ]
+                    if self._current_id == entry.id:
+                        self._current_id = None
+                    self._persist()
+                    return None
            self._mark_exhausted(entry, None)
            return None

@@ -829,6 +1119,11 @@ class CredentialPool:
                entry.access_token,
                CODEX_ACCESS_TOKEN_REFRESH_SKEW_SECONDS,
            )
+        if self.provider == "xai-oauth":
+            return auth_mod._xai_access_token_is_expiring(
+                entry.access_token,
+                auth_mod.XAI_ACCESS_TOKEN_REFRESH_SKEW_SECONDS,
+            )
        if self.provider == "nous":
            # Nous refresh/mint can require network access and should happen when
            # runtime credentials are actually resolved, not merely when the pool
@@ -883,6 +1178,17 @@ class CredentialPool:
                if synced is not entry:
                    entry = synced
                    cleared_any = True
+            # For xai-oauth singleton-seeded entries, identical pattern:
+            # an entry frozen as exhausted may simply be holding stale
+            # tokens that another process (or a fresh `hermes model` ->
+            # xAI Grok OAuth login) has since rotated in auth.json.
+            if (self.provider == "xai-oauth"
+                    and entry.source == "loopback_pkce"
+                    and entry.last_status == STATUS_EXHAUSTED):
+                synced = self._sync_xai_oauth_entry_from_auth_store(entry)
+                if synced is not entry:
+                    entry = synced
+                    cleared_any = True
            if entry.last_status == STATUS_EXHAUSTED:
                exhausted_until = _exhausted_until(entry)
                if exhausted_until is not None and now < exhausted_until:
@@ -1217,7 +1523,22 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup

    elif provider == "nous":
        state = _load_provider_state(auth_store, "nous")
-        if state and not _is_suppressed(provider, "device_code"):
+        has_runtime_material = bool(
+            isinstance(state, dict)
+            and (
+                str(state.get("access_token") or "").strip()
+                or str(state.get("agent_key") or "").strip()
+            )
+        )
+        if state and not has_runtime_material:
+            retained = [
+                entry for entry in entries
+                if entry.source not in {"device_code", "manual:device_code"}
+            ]
+            if len(retained) != len(entries):
+                entries[:] = retained
+                changed = True
+        if state and has_runtime_material and not _is_suppressed(provider, "device_code"):
            active_sources.add("device_code")
            # Prefer a user-supplied label embedded in the singleton state
            # (set by persist_nous_credentials(label=...) when the user ran
@@ -1394,6 +1715,37 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
                },
            )

+    elif provider == "xai-oauth":
+        # When the user logs in via ``hermes model`` -> xAI Grok OAuth,
+        # tokens are written to the auth.json singleton
+        # (``providers["xai-oauth"]``).  Surface them in the pool too so
+        # ``hermes auth list`` reflects the logged-in state and so the pool
+        # is the single source of truth for refresh during runtime resolution.
+        if _is_suppressed(provider, "loopback_pkce"):
+            return changed, active_sources
+
+        state = _load_provider_state(auth_store, "xai-oauth")
+        tokens = state.get("tokens") if isinstance(state, dict) else None
+        if isinstance(tokens, dict) and tokens.get("access_token"):
+            active_sources.add("loopback_pkce")
+            from hermes_cli.auth import DEFAULT_XAI_OAUTH_BASE_URL
+
+            base_url = DEFAULT_XAI_OAUTH_BASE_URL
+            changed |= _upsert_entry(
+                entries,
+                provider,
+                "loopback_pkce",
+                {
+                    "source": "loopback_pkce",
+                    "auth_type": AUTH_TYPE_OAUTH,
+                    "access_token": tokens.get("access_token", ""),
+                    "refresh_token": tokens.get("refresh_token"),
+                    "base_url": base_url,
+                    "last_refresh": state.get("last_refresh"),
+                    "label": label_from_token(tokens.get("access_token", ""), "loopback_pkce"),
+                },
+            )
+
    return changed, active_sources


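The xAI, Codex and Nous refresh paths above share one ordering for single-use refresh tokens kept in auth.json: sync from the store before spending the token, re-check the store if the refresh fails, and clear the shared copy only on a terminal error when no newer token has appeared. Below is a minimal sketch of that ordering. It assumes a plain dict stands in for the locked auth.json singleton. `Entry`, `sync_from_store`, `refresh_entry`, `TerminalRefreshError` and the `refresh` callable are hypothetical names, not the pool's API. The real code also holds `_auth_store_lock()`, records `last_auth_error`, and evicts singleton-seeded pool entries.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Optional


class TerminalRefreshError(Exception):
    """Hypothetical: the provider rejected the refresh token permanently."""


@dataclass(frozen=True)
class Entry:
    access_token: str
    refresh_token: str


def sync_from_store(entry: Entry, store: Dict[str, str]) -> Entry:
    """Adopt the store's tokens if another process has rotated them."""
    store_refresh = store.get("refresh_token", "")
    if store_refresh and store_refresh != entry.refresh_token:
        return replace(
            entry,
            access_token=store.get("access_token", ""),
            refresh_token=store_refresh,
        )
    return entry


def refresh_entry(
    entry: Entry,
    store: Dict[str, str],
    refresh: Callable[[str], Dict[str, str]],
) -> Optional[Entry]:
    # 1. Proactive sync: never spend a refresh token the store already replaced.
    entry = sync_from_store(entry, store)
    try:
        tokens = refresh(entry.refresh_token)
    except Exception as exc:
        # 2. A peer may have rotated the tokens while our request was in flight.
        synced = sync_from_store(entry, store)
        if synced.refresh_token != entry.refresh_token:
            return synced
        # 3. Terminal failure and nothing newer: clear the shared copy so the
        #    next session does not re-seed the same dead credential.
        if isinstance(exc, TerminalRefreshError) and store.get(
            "refresh_token", ""
        ) in ("", entry.refresh_token):
            store.pop("access_token", None)
            store.pop("refresh_token", None)
        return None
    updated = replace(
        entry,
        access_token=tokens["access_token"],
        refresh_token=tokens["refresh_token"],
    )
    # Mirror the rotation into the shared store so other processes adopt it.
    store.update(access_token=updated.access_token, refresh_token=updated.refresh_token)
    return updated
```

Step 1 avoids `refresh_token_reused` in the common case. Step 2 covers a peer that rotates the tokens while the HTTP call is in flight. The guard in step 3 prevents one process from wiping tokens that another process has just written.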
--- a/agent/credential_sources.py
+++ b/agent/credential_sources.py
@@ -265,6 +265,31 @@ def _remove_minimax_oauth(provider: str, removed) -> RemovalResult:
    return result


+def _remove_xai_oauth_loopback_pkce(provider: str, removed) -> RemovalResult:
+    """xAI OAuth tokens live in auth.json providers.xai-oauth — clear them.
+
+    Without this step, ``hermes auth remove xai-oauth <N>`` silently undoes
+    itself: the central dispatcher only removes the in-memory pool entry,
+    leaves ``providers.xai-oauth`` in auth.json intact, and on the next
+    ``load_pool("xai-oauth")`` call ``_seed_from_singletons`` re-seeds the
+    entry from the still-present singleton — credentials reappear with no
+    user feedback. Clearing the singleton in step with the suppression set
+    by the central dispatcher makes the removal stick.
+
+    The manual entry path needs no such step: ``hermes auth add
+    xai-oauth`` produces a ``manual:xai_pkce`` entry whose removal step
+    falls through to "unregistered → nothing to clean up", which is
+    correct because manual entries are pool-only.
+    """
+    result = RemovalResult()
+    if _clear_auth_store_provider(provider):
+        result.cleaned.append(f"Cleared {provider} OAuth tokens from auth store")
+    result.hints.append(
+        "Run `hermes model` → xAI Grok OAuth (SuperGrok Subscription) to re-authenticate if needed."
+    )
+    return result
+
+
 def _remove_codex_device_code(provider: str, removed) -> RemovalResult:
    """Codex tokens live in TWO places: our auth store AND ~/.codex/auth.json.

@@ -397,6 +422,11 @@ def _register_all_sources() -> None:
        remove_fn=_remove_codex_device_code,
        description="auth.json providers.openai-codex + ~/.codex/auth.json",
    ))
+    register(RemovalStep(
+        provider="xai-oauth", source_id="loopback_pkce",
+        remove_fn=_remove_xai_oauth_loopback_pkce,
+        description="auth.json providers.xai-oauth",
+    ))
    register(RemovalStep(
        provider="qwen-oauth", source_id="qwen-cli",
        remove_fn=_remove_qwen_cli,
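The removal step has to touch auth.json because the pool is re-seeded from the singleton on every load. If only the pool entry is deleted, the next load undoes the removal. Below is a minimal sketch, assuming hypothetical dict stand-ins for auth.json and the pool. `seed_pool` and `remove_entry` are illustrative names, and the dispatcher's suppression set is left out.

```python
from typing import Dict, List


def seed_pool(auth_store: Dict[str, dict], provider: str) -> List[dict]:
    """Rebuild pool entries from the singleton, as each pool load does."""
    tokens = auth_store.get(provider, {}).get("tokens", {})
    if tokens.get("access_token"):
        return [{"source": "loopback_pkce", "access_token": tokens["access_token"]}]
    return []


def remove_entry(
    auth_store: Dict[str, dict],
    pool: List[dict],
    provider: str,
    index: int,
    clear_singleton: bool,
) -> None:
    del pool[index]  # the central dispatcher only drops the pool entry
    if clear_singleton:  # the registered RemovalStep also clears auth.json
        auth_store.pop(provider, None)


# Without the extra step the credential comes straight back on the next load.
store = {"xai-oauth": {"tokens": {"access_token": "tok"}}}
pool = seed_pool(store, "xai-oauth")
remove_entry(store, pool, "xai-oauth", 0, clear_singleton=False)
print(len(seed_pool(store, "xai-oauth")))  # 1: re-seeded

remove_entry(store, seed_pool(store, "xai-oauth"), "xai-oauth", 0, clear_singleton=True)
print(len(seed_pool(store, "xai-oauth")))  # 0: removal sticks
```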
--- a/agent/curator.py
+++ b/agent/curator.py
@@ -72,6 +72,7 @@ def _default_state() -> Dict[str, Any]:
        "last_run_at": None,
        "last_run_duration_seconds": None,
        "last_run_summary": None,
+        "last_run_summary_shown_at": None,
        "last_report_path": None,
        "paused": False,
        "run_count": 0,
@@ -876,6 +877,96 @@ def _reconcile_classification(
    return {"consolidated": consolidated, "pruned": pruned}


+def _build_rename_summary(
+    *,
+    before_names: Set[str],
+    after_report: List[Dict[str, Any]],
+    tool_calls: List[Dict[str, Any]],
+    model_final: str,
+) -> str:
+    """Format the user-visible rename map for a curator run.
+
+    Renders the "where did my skills go?" lines that get appended to the
+    `final_summary` string fed to gateway/CLI receivers. Returns an empty
+    string when nothing was archived this run, since most ticks are no-ops
+    and shouldn't add extra log noise.
+
+    Format::
+
+        archived 4 skill(s):
+          • pdf-extraction → document-tools
+          • docx-extraction → document-tools
+          • flaky-thing — pruned (stale)
+          • old-utility → spreadsheet-ops
+        full report: hermes curator status
+        keep an umbrella stable: hermes curator pin document-tools
+
+    Cap is 10 entries so a 50-skill consolidation doesn't blow up
+    agent.log; the full list is always in REPORT.md. The pin hint only
+    appears when at least one consolidation produced an umbrella worth
+    pinning (pruned-only runs skip it).
+    """
+    after_by_name = {r.get("name"): r for r in after_report if isinstance(r, dict)}
+    after_names = set(after_by_name.keys())
+    removed = sorted(before_names - after_names)
+    added = sorted(after_names - before_names)
+    if not removed:
+        return ""
+
+    heuristic = _classify_removed_skills(
+        removed=removed,
+        added=added,
+        after_names=after_names,
+        tool_calls=tool_calls,
+    )
+    model_block = _parse_structured_summary(model_final)
+    destinations = set(after_names) | set(added)
+    absorbed_declarations = _extract_absorbed_into_declarations(tool_calls)
+    classification = _reconcile_classification(
+        removed=removed,
+        heuristic=heuristic,
+        model_block=model_block,
+        destinations=destinations,
+        absorbed_declarations=absorbed_declarations,
+    )
+    consolidated = classification["consolidated"]
+    pruned = classification["pruned"]
+
+    SHOW = 10
+    lines: List[str] = []
+    total = len(consolidated) + len(pruned)
+    lines.append(f"archived {total} skill(s):")
+    shown = 0
+    for entry in consolidated:
+        if shown >= SHOW:
+            break
+        name = entry.get("name", "?")
+        into = entry.get("into", "?")
+        lines.append(f"  • {name} → {into}")
+        shown += 1
+    for entry in pruned:
+        if shown >= SHOW:
+            break
+        name = entry.get("name", "?") if isinstance(entry, dict) else str(entry)
+        lines.append(f"  • {name} — pruned (stale)")
+        shown += 1
+    if total > SHOW:
+        lines.append(f"  … and {total - SHOW} more")
+    lines.append("full report: hermes curator status")
+    # Pin hint — only surface it when there's actually a destination skill
+    # worth pinning. The umbrella skills that absorbed content are the natural
+    # candidates: pinning one tells future curator runs to leave it alone.
+    # Pruned-only runs don't get this hint (nothing surviving to pin).
+    if consolidated:
+        umbrellas = sorted({e.get("into") for e in consolidated if e.get("into")})
+        if umbrellas:
+            example = umbrellas[0]
+            lines.append(
+                f"keep an umbrella stable: hermes curator pin {example}"
+            )
+    return "\n".join(lines)
+
+
 def _write_run_report(
    *,
    started_at: datetime,
@@ -1398,6 +1489,22 @@ def run_curator_review(
                "error": str(e),
            }

+        # Append the rename map (`old-name → umbrella`) to the user-visible
+        # summary so people don't have to dig into REPORT.md to find out where
+        # their skills went. Best-effort: classification is pure, but a
+        # formatting issue must never block the run.
+        try:
+            rename_lines = _build_rename_summary(
+                before_names=before_names,
+                after_report=skill_usage.agent_created_report(),
+                tool_calls=llm_meta.get("tool_calls", []) or [],
+                model_final=llm_meta.get("final", "") or "",
+            )
+            if rename_lines:
+                final_summary = f"{final_summary}\n{rename_lines}"
+        except Exception as e:
+            logger.debug("Curator rename summary build failed: %s", e, exc_info=True)
+
        elapsed = (datetime.now(timezone.utc) - start).total_seconds()
        state2 = load_state()
        state2["last_run_duration_seconds"] = elapsed
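Below is a compact sketch of the rename-summary layout described in the `_build_rename_summary` docstring. It assumes classification has already produced the consolidated and pruned lists. `format_rename_summary` is a hypothetical helper that skips the heuristic/model reconciliation step. The sketch shows that one 10-entry cap is shared across both lists, with consolidated rows listed first, and that the pin hint appears only when an umbrella exists.

```python
from typing import Dict, List


def format_rename_summary(
    consolidated: List[Dict[str, str]], pruned: List[str], cap: int = 10
) -> str:
    total = len(consolidated) + len(pruned)
    if total == 0:
        return ""  # nothing archived this tick: add no log noise
    # Consolidations first, then prunes; one cap covers both lists.
    rows = [f"  • {c['name']} → {c['into']}" for c in consolidated]
    rows += [f"  • {name} — pruned (stale)" for name in pruned]
    lines = [f"archived {total} skill(s):", *rows[:cap]]
    if total > cap:
        lines.append(f"  … and {total - cap} more")
    lines.append("full report: hermes curator status")
    # Pin hint only when a surviving umbrella exists (pruned-only runs skip it).
    umbrellas = sorted({c["into"] for c in consolidated if c.get("into")})
    if umbrellas:
        lines.append(f"keep an umbrella stable: hermes curator pin {umbrellas[0]}")
    return "\n".join(lines)
```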
--- a/agent/curator_backup.py
+++ b/agent/curator_backup.py
@@ -50,6 +50,7 @@ from pathlib import Path
 from typing import Any, Dict, List, Optional, Tuple

 from hermes_constants import get_hermes_home
+from agent.skill_utils import is_excluded_skill_path

 logger = logging.getLogger(__name__)

@@ -176,7 +177,9 @@ def get_keep() -> int:

 def _count_skill_files(base: Path) -> int:
    try:
-        return sum(1 for _ in base.rglob("SKILL.md"))
+        return sum(
+            1 for p in base.rglob("SKILL.md") if not is_excluded_skill_path(p)
+        )
    except OSError:
        return 0

--- a/agent/display.py
+++ b/agent/display.py
@@ -14,6 +14,7 @@ from difflib import unified_diff
 from pathlib import Path

 from utils import safe_json_loads
+from agent.tool_result_classification import file_mutation_result_landed

 # ANSI escape codes for coloring tool failure indicators
 _RED = "\033[31m"
@@ -239,21 +240,6 @@ def build_tool_preview(tool_name: str, args: dict, max_len: int | None = None) -
            msg = msg[:17] + "..."
        return f"to {target}: \"{msg}\""

-    if tool_name.startswith("rl_"):
-        rl_previews = {
-            "rl_list_environments": "listing envs",
-            "rl_select_environment": args.get("name", ""),
-            "rl_get_current_config": "reading config",
-            "rl_edit_config": f"{args.get('field', '')}={args.get('value', '')}",
-            "rl_start_training": "starting",
-            "rl_check_status": args.get("run_id", "")[:16],
-            "rl_stop_training": f"stopping {args.get('run_id', '')[:16]}",
-            "rl_get_results": args.get("run_id", "")[:16],
-            "rl_list_runs": "listing runs",
-            "rl_test_inference": f"{args.get('num_steps', 3)} steps",
-        }
-        return rl_previews.get(tool_name)
-
    key = primary_args.get(tool_name)
    if not key:
        for fallback_key in ("query", "text", "command", "path", "name", "prompt", "code", "goal"):
@@ -810,6 +796,8 @@ def _detect_tool_failure(tool_name: str, result: str | None) -> tuple[bool, str]
    """
    if result is None:
        return False, ""
+    if file_mutation_result_landed(tool_name, result):
+        return False, ""

    if tool_name == "terminal":
        data = safe_json_loads(result)
@@ -978,15 +966,6 @@ def get_cute_tool_message(
        if action == "list":
            return _wrap(f"┊ ⏰ cron      listing  {dur}")
        return _wrap(f"┊ ⏰ cron      {action} {args.get('job_id', '')}  {dur}")
-    if tool_name.startswith("rl_"):
-        rl = {
-            "rl_list_environments": "list envs", "rl_select_environment": f"select {args.get('name', '')}",
-            "rl_get_current_config": "get config", "rl_edit_config": f"set {args.get('field', '?')}",
-            "rl_start_training": "start training", "rl_check_status": f"status {args.get('run_id', '?')[:12]}",
-            "rl_stop_training": f"stop {args.get('run_id', '?')[:12]}", "rl_get_results": f"results {args.get('run_id', '?')[:12]}",
-            "rl_list_runs": "list runs", "rl_test_inference": "test inference",
-        }
-        return _wrap(f"┊ 🧪 rl        {rl.get(tool_name, tool_name.replace('rl_', ''))}  {dur}")
    if tool_name == "execute_code":
        code = args.get("code", "")
        first_line = code.strip().split("\n")[0] if code.strip() else ""
--- a/agent/error_classifier.py
+++ b/agent/error_classifier.py
@@ -83,7 +83,7 @@ class ClassifiedError:

    @property
    def is_auth(self) -> bool:
-        return self.reason in (FailoverReason.auth, FailoverReason.auth_permanent)
+        return self.reason in {FailoverReason.auth, FailoverReason.auth_permanent}



@@ -254,6 +254,20 @@ _THINKING_SIG_PATTERNS = [
    "signature",  # Combined with "thinking" check
 ]

+# Message-string patterns that indicate a provider-side timeout even when
+# the exception type is generic (e.g. RuntimeError from a local shim that
+# wraps a subprocess timeout).  Checked before the type-based transport
+# heuristics so custom-provider "timed out" errors don't fall through to
+# the unknown bucket and get misreported as empty responses.
+_TIMEOUT_MESSAGE_PATTERNS = [
+    "timed out",
+    "turn timed out",
+    "request timed out",
+    "deadline exceeded",
+    "operation timed out",
+    "upstream timed out",
+]
+
 # Transport error type names
 _TRANSPORT_ERROR_TYPES = frozenset({
    "ReadTimeout", "ConnectTimeout", "PoolTimeout",
@@ -496,6 +510,35 @@ def classify_api_error(
            should_compress=False,
        )

+    # xAI Grok subscription entitlement errors.
+    #
+    # xAI returns "You have either run out of available resources or do not
+    # have an active Grok subscription" through two distinct code paths:
+    #
+    #   • HTTP 403 — status_code is set; _classify_by_status (step 2) routes
+    #     it to FailoverReason.auth correctly, and _is_entitlement_failure
+    #     then prevents the credential-refresh loop.
+    #
+    #   • SSE ``type=error`` frame — surfaced as _StreamErrorEvent with
+    #     status_code=None.  _classify_by_status is skipped entirely, and
+    #     "grok subscription" / "out of available resources" appear in none
+    #     of the message-pattern lists below.  Without this guard the error
+    #     falls through to FailoverReason.unknown (retryable=True), burning
+    #     max_retries before the agent stops — and _is_entitlement_failure
+    #     is never called because it only runs under FailoverReason.auth.
+    #
+    # Both X Premium+ and SuperGrok subscribers hit this path when their
+    # subscription tier does not cover the requested model or feature.
+    if (
+        "do not have an active grok subscription" in error_msg
+        or ("out of available resources" in error_msg and "grok" in error_msg)
+    ):
+        return _result(
+            FailoverReason.auth,
+            retryable=False,
+            should_fallback=True,
+        )
+
    # ── 2. HTTP status code classification ──────────────────────────

    if status_code is not None:
@@ -674,10 +717,10 @@ def _classify_by_status(
            result_fn=result_fn,
        )

-    if status_code in (500, 502):
+    if status_code in {500, 502}:
        return result_fn(FailoverReason.server_error, retryable=True)

-    if status_code in (503, 529):
+    if status_code in {503, 529}:
        return result_fn(FailoverReason.overloaded, retryable=True)

    # Other 4xx — non-retryable
@@ -796,7 +839,7 @@ def _classify_400(
        # Responses API (and some providers) use flat body: {"message": "..."}
        if not err_body_msg:
            err_body_msg = str(body.get("message") or "").strip().lower()
-    is_generic = len(err_body_msg) < 30 or err_body_msg in ("error", "")
+    is_generic = len(err_body_msg) < 30 or err_body_msg in {"error", ""}
    # Absolute token/message-count thresholds are only a proxy for smaller
    # context windows.  Large-context sessions can have many messages while
    # still being far below their actual token budget.
@@ -827,14 +870,14 @@ def _classify_by_error_code(
    """Classify by structured error codes from the response body."""
    code_lower = error_code.lower()

-    if code_lower in ("resource_exhausted", "throttled", "rate_limit_exceeded"):
+    if code_lower in {"resource_exhausted", "throttled", "rate_limit_exceeded"}:
        return result_fn(
            FailoverReason.rate_limit,
            retryable=True,
            should_rotate_credential=True,
        )

-    if code_lower in ("insufficient_quota", "billing_not_active", "payment_required"):
+    if code_lower in {"insufficient_quota", "billing_not_active", "payment_required"}:
        return result_fn(
            FailoverReason.billing,
            retryable=False,
@@ -842,14 +885,14 @@ def _classify_by_error_code(
            should_fallback=True,
        )

-    if code_lower in ("model_not_found", "model_not_available", "invalid_model"):
+    if code_lower in {"model_not_found", "model_not_available", "invalid_model"}:
        return result_fn(
            FailoverReason.model_not_found,
            retryable=False,
            should_fallback=True,
        )

-    if code_lower in ("context_length_exceeded", "max_tokens_exceeded"):
+    if code_lower in {"context_length_exceeded", "max_tokens_exceeded"}:
        return result_fn(
            FailoverReason.context_overflow,
            retryable=True,
@@ -963,6 +1006,14 @@ def _classify_by_message(
            should_fallback=True,
        )

+    # Timeout message patterns — generic exception types (e.g. RuntimeError)
+    # raised by local shims or custom providers that internally wrap a
+    # subprocess/HTTP timeout.  Classified as transport timeout so the retry
+    # loop rebuilds the client instead of treating the turn as an empty
+    # model response.
+    if any(p in error_msg for p in _TIMEOUT_MESSAGE_PATTERNS):
+        return result_fn(FailoverReason.timeout, retryable=True)
+
    return None


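Both message-level additions to `error_classifier.py` rest on the same rule. When an error reaches the classifier without an HTTP status (an SSE `type=error` frame, or a generic exception raised by a local shim), the lowercased message is the only signal left. Specific patterns therefore have to match before the error falls into the retryable `unknown` bucket. Below is a minimal sketch of that ordering. `Reason`, `Classified`, and `classify_message` are hypothetical stand-ins, not the module's `FailoverReason` / `_classify_by_message` API:

```python
from dataclasses import dataclass
from enum import Enum


class Reason(Enum):
    # Hypothetical stand-in for FailoverReason.
    AUTH = "auth"
    TIMEOUT = "timeout"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class Classified:
    reason: Reason
    retryable: bool


# "timed out" already matches "request timed out", "turn timed out", etc.
TIMEOUT_PATTERNS = ("timed out", "deadline exceeded")


def classify_message(message: str) -> Classified:
    msg = message.lower()
    # Entitlement errors first: with no status code they would otherwise
    # land in UNKNOWN and be retried until max_retries is exhausted.
    if "do not have an active grok subscription" in msg or (
        "out of available resources" in msg and "grok" in msg
    ):
        return Classified(Reason.AUTH, retryable=False)
    # Generic exception types that only say "timed out" become transport
    # timeouts, so the caller can rebuild the client and retry.
    if any(p in msg for p in TIMEOUT_PATTERNS):
        return Classified(Reason.TIMEOUT, retryable=True)
    return Classified(Reason.UNKNOWN, retryable=True)
```

The sketch keeps only the distinct phrasings. Every entry in `_TIMEOUT_MESSAGE_PATTERNS` that ends in "timed out" is already matched by the bare `"timed out"` entry.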
--- a/agent/file_safety.py
+++ b/agent/file_safety.py
@@ -16,9 +16,19 @@ def _hermes_home_path() -> Path:
        return Path(os.path.expanduser("~/.hermes"))


+def _hermes_root_path() -> Path:
+    """Resolve the Hermes root dir (always the parent of any profile, never per-profile)."""
+    try:
+        from hermes_constants import get_default_hermes_root  # local import to avoid cycles
+        return get_default_hermes_root()
+    except Exception:
+        return Path(os.path.expanduser("~/.hermes"))
+
+
 def build_write_denied_paths(home: str) -> set[str]:
    """Return exact sensitive paths that must never be written."""
    hermes_home = _hermes_home_path()
+    hermes_root = _hermes_root_path()
    return {
        os.path.realpath(p)
        for p in [
@@ -26,7 +36,11 @@ def build_write_denied_paths(home: str) -> set[str]:
            os.path.join(home, ".ssh", "id_rsa"),
            os.path.join(home, ".ssh", "id_ed25519"),
            os.path.join(home, ".ssh", "config"),
+            # Active profile .env (or top-level .env when not in profile mode).
            str(hermes_home / ".env"),
+            # Top-level .env, even when running under a profile — overwriting it
+            # leaks credentials across every profile that inherits from root (#15981).
+            str(hermes_root / ".env"),
            os.path.join(home, ".bashrc"),
            os.path.join(home, ".zshrc"),
            os.path.join(home, ".profile"),
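Every entry goes through `os.path.realpath` and lands in a set. As a result, adding the root `.env` costs nothing when no profile is active: the profile home and the root resolve to the same file and collapse into a single entry. Below is a minimal sketch. `denied_env_paths` and `is_write_denied` are hypothetical helpers that take both directories explicitly instead of resolving them from `hermes_constants`:

```python
import os


def denied_env_paths(hermes_home: str, hermes_root: str) -> set[str]:
    """Return the resolved .env paths that must never be written (illustrative)."""
    return {
        os.path.realpath(p)
        for p in (
            os.path.join(hermes_home, ".env"),  # active profile (or root when no profile)
            os.path.join(hermes_root, ".env"),  # root .env inherited by every profile
        )
    }


def is_write_denied(target: str, denied: set[str]) -> bool:
    # Resolve the candidate the same way, so symlinks and ".." segments
    # cannot slip a write past the exact-path check.
    return os.path.realpath(target) in denied
```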
--- a/agent/gemini_cloudcode_adapter.py
+++ b/agent/gemini_cloudcode_adapter.py
@@ -77,7 +77,7 @@ def _coerce_content_to_text(content: Any) -> str:
                if p.get("type") == "text" and isinstance(p.get("text"), str):
                    pieces.append(p["text"])
                # Multimodal (image_url, etc.) — stub for now; log and skip
-                elif p.get("type") in ("image_url", "input_audio"):
+                elif p.get("type") in {"image_url", "input_audio"}:
                    logger.debug("Dropping multimodal part (not yet supported): %s", p.get("type"))
        return "\n".join(pieces)
    return str(content)
@@ -450,7 +450,13 @@ def _make_stream_chunk(
    finish_reason: Optional[str] = None,
    reasoning: str = "",
 ) -> _GeminiStreamChunk:
-    delta_kwargs: Dict[str, Any] = {"role": "assistant"}
+    delta_kwargs: Dict[str, Any] = {
+        "role": "assistant",
+        "content": None,
+        "tool_calls": None,
+        "reasoning": None,
+        "reasoning_content": None,
+    }
    if content:
        delta_kwargs["content"] = content
    if tool_call_delta is not None:
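Pre-seeding every delta field with `None` matters when downstream code reads a chunk the way it reads an OpenAI SDK delta, through plain attribute access. The sketch below shows the failure this change avoids. It assumes the delta is built as a `types.SimpleNamespace`, which is an assumption: the adapter's actual chunk type is not shown in this hunk. `make_delta` and `read_tool_calls` are hypothetical:

```python
from types import SimpleNamespace
from typing import Any, Dict, Optional


def make_delta(content: str = "", reasoning: str = "", seed_none: bool = True) -> SimpleNamespace:
    kwargs: Dict[str, Any] = {"role": "assistant"}
    if seed_none:
        # Every field a consumer might read exists, even when empty.
        kwargs.update(content=None, tool_calls=None, reasoning=None, reasoning_content=None)
    if content:
        kwargs["content"] = content
    if reasoning:
        kwargs["reasoning"] = reasoning
    return SimpleNamespace(**kwargs)


def read_tool_calls(delta: SimpleNamespace) -> Optional[list]:
    # Mirrors consumer code written against OpenAI-style deltas.
    return delta.tool_calls
```

Without the seeded keys, a text-only chunk has no `tool_calls` attribute at all, and the consumer raises `AttributeError` instead of seeing `None`.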
--- a/agent/gemini_native_adapter.py
+++ b/agent/gemini_native_adapter.py
@@ -945,6 +945,12 @@ class AsyncGeminiNativeClient:
        self.api_key = sync_client.api_key
        self.base_url = sync_client.base_url
        self.chat = _AsyncGeminiChatNamespace(self)
+        # Expose the underlying sync client as _real_client so the auxiliary
+        # cache's eviction-by-leaf-client helper (#23482) can find and drop
+        # this async entry when the sync GeminiNativeClient is poisoned.
+        # GeminiNativeClient is itself the leaf (no OpenAI client beneath
+        # it), so we point at the sync_client directly.
+        self._real_client = sync_client

    async def _create_chat_completion(self, **kwargs: Any) -> Any:
        stream = bool(kwargs.get("stream"))
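The `_real_client` attribute gives wrapper clients a uniform pointer to the object that actually talks to the network. One poisoned leaf can then be evicted together with every wrapper built on it. Below is a minimal sketch of that eviction. It assumes a plain dict cache keyed by strings; `leaf_of`, `evict_by_leaf`, `SyncClient`, and `AsyncWrapper` are hypothetical, and the real auxiliary cache from #23482 is not shown here:

```python
from typing import Any, Dict


def leaf_of(client: Any) -> Any:
    # Follow _real_client links to the innermost object; the id() set
    # guards against an accidental self-reference looping forever.
    seen = set()
    while hasattr(client, "_real_client") and id(client) not in seen:
        seen.add(id(client))
        client = client._real_client
    return client


def evict_by_leaf(cache: Dict[str, Any], poisoned: Any) -> int:
    """Drop every entry whose leaf is `poisoned`; return how many were dropped."""
    doomed = [key for key, client in cache.items() if leaf_of(client) is poisoned]
    for key in doomed:
        del cache[key]
    return len(doomed)


class SyncClient:
    """Leaf client: nothing beneath it, so no _real_client attribute."""


class AsyncWrapper:
    def __init__(self, sync_client: SyncClient) -> None:
        self._real_client = sync_client
```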
--- a/agent/google_oauth.py
+++ b/agent/google_oauth.py
@@ -59,7 +59,7 @@ from dataclasses import dataclass
 from pathlib import Path
 from typing import Any, Dict, Optional, Tuple

-from hermes_constants import get_hermes_home
+from hermes_constants import get_hermes_home, secure_parent_dir

 logger = logging.getLogger(__name__)

@@ -491,10 +491,8 @@ def save_credentials(creds: GoogleCredentials) -> Path:
    path.parent.mkdir(parents=True, exist_ok=True)
    # Tighten parent dir to 0o700 so siblings can't traverse to the creds file.
    # On Windows this is a no-op (POSIX mode bits aren't enforced); ignore failures.
-    try:
-        os.chmod(path.parent, 0o700)
-    except OSError:
-        pass
+    # secure_parent_dir refuses to chmod / or top-level dirs (#25821).
+    secure_parent_dir(path)
    payload = json.dumps(creds.to_dict(), indent=2, sort_keys=True) + "\n"

    with _credentials_lock():
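The hunk shows only the call site. Per the inline comment, `secure_parent_dir` refuses to chmod `/` or top-level directories (#25821). Below is a sketch of one way such a guard could work. It assumes "top-level" means a direct child of the filesystem root. `tighten_parent_dir` is a hypothetical name, and its behavior is not taken from `hermes_constants`:

```python
import os
from pathlib import Path


def tighten_parent_dir(path: Path, mode: int = 0o700) -> bool:
    """chmod path's parent to `mode` unless it is "/" or a top-level dir.

    Returns True only if the chmod was attempted and succeeded.
    """
    parent = path.resolve().parent
    # "/" is its own parent; a top-level dir such as /home has "/" as parent.
    if parent == parent.parent or parent.parent == parent.parent.parent:
        return False
    try:
        os.chmod(parent, mode)
    except OSError:
        # Best effort: POSIX mode bits are not enforced on Windows.
        return False
    return True
```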
--- a/agent/i18n.py
+++ b/agent/i18n.py
@@ -39,20 +39,45 @@ from typing import Any

 logger = logging.getLogger(__name__)

-SUPPORTED_LANGUAGES: tuple[str, ...] = ("en", "zh", "ja", "de", "es", "fr", "tr", "uk")
+SUPPORTED_LANGUAGES: tuple[str, ...] = (
+    "en", "zh", "zh-hant", "ja", "de", "es", "fr", "tr", "uk",
+    "af", "ko", "it", "ga", "pt", "ru", "hu",
+)
 DEFAULT_LANGUAGE = "en"

 # Accept a few natural aliases so users who type "chinese" / "zh-CN" / "jp"
 # get the right catalog instead of silently falling back to English.
 _LANGUAGE_ALIASES: dict[str, str] = {
    "english": "en", "en-us": "en", "en-gb": "en",
-    "chinese": "zh", "mandarin": "zh", "zh-cn": "zh", "zh-tw": "zh", "zh-hans": "zh", "zh-hant": "zh",
+    # Simplified Chinese — explicit codes route here; bare "chinese" / "mandarin"
+    # also default to Simplified since that's the larger user base.
+    "chinese": "zh", "mandarin": "zh", "zh-cn": "zh", "zh-hans": "zh", "zh-sg": "zh",
+    # Traditional Chinese — distinct catalog.  Cover Taiwan / Hong Kong / Macau
+    # locale tags plus the common "traditional" alias.
+    "traditional-chinese": "zh-hant", "traditional_chinese": "zh-hant",
+    "zh-tw": "zh-hant", "zh-hk": "zh-hant", "zh-mo": "zh-hant",
    "japanese": "ja", "jp": "ja", "ja-jp": "ja",
-    "german": "de", "deutsch": "de", "de-de": "de",
-    "spanish": "es", "español": "es", "espanol": "es", "es-es": "es", "es-mx": "es",
+    "german": "de", "deutsch": "de", "de-de": "de", "de-at": "de", "de-ch": "de",
+    "spanish": "es", "español": "es", "espanol": "es", "es-es": "es", "es-mx": "es", "es-ar": "es",
    "french": "fr", "français": "fr", "france": "fr", "fr-fr": "fr", "fr-be": "fr", "fr-ca": "fr", "fr-ch": "fr",
    "ukrainian": "uk", "ukrainisch": "uk", "українська": "uk", "uk-ua": "uk", "ua": "uk",
    "turkish": "tr", "türkçe": "tr", "tr-tr": "tr",
+    # Afrikaans — South African Dutch-derived language; "af-ZA" is the common BCP-47 tag.
+    "afrikaans": "af", "af-za": "af",
+    # Korean
+    "korean": "ko", "한국어": "ko", "ko-kr": "ko",
+    # Italian
+    "italian": "it", "italiano": "it", "it-it": "it", "it-ch": "it",
+    # Irish (Gaeilge) — ga is the BCP-47 code
+    "irish": "ga", "gaeilge": "ga", "ga-ie": "ga",
+    # Portuguese — bare "portuguese" routes to European Portuguese; pt-br
+    # is in the same family but rendered identically here (no separate br catalog).
+    "portuguese": "pt", "português": "pt", "portugues": "pt",
+    "pt-pt": "pt", "pt-br": "pt", "brazilian": "pt", "brasileiro": "pt",
+    # Russian
+    "russian": "ru", "русский": "ru", "ru-ru": "ru",
+    # Hungarian
+    "hungarian": "hu", "magyar": "hu", "hu-hu": "hu",
 }

 _catalog_cache: dict[str, dict[str, str]] = {}
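How these tables are consulted is outside this hunk. Below is a minimal sketch of a resolver. It assumes input is trimmed and lowercased, supported codes match first, aliases second, and anything unknown falls back to English. `resolve_language` and the trimmed tables are illustrative:

```python
SUPPORTED = ("en", "zh", "zh-hant", "ja", "pt")
ALIASES = {
    "chinese": "zh", "zh-cn": "zh",
    "zh-tw": "zh-hant", "zh-hk": "zh-hant",
    "jp": "ja",
    "pt-br": "pt", "brazilian": "pt",
}
DEFAULT = "en"


def resolve_language(raw: str) -> str:
    tag = raw.strip().lower()
    if tag in SUPPORTED:  # exact catalog code wins
        return tag
    return ALIASES.get(tag, DEFAULT)  # alias, else English
```

The diff moves `zh-hant` out of the alias table and into `SUPPORTED_LANGUAGES`. Under this ordering it resolves directly to its own catalog instead of collapsing to Simplified Chinese.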
--- a/agent/image_gen_registry.py
+++ b/agent/image_gen_registry.py
@@ -77,6 +77,17 @@ def get_active_provider() -> Optional[ImageGenProvider]:

    Reads ``image_gen.provider`` from config.yaml; falls back per the
    module docstring.
+
+    **Availability semantics** (mirrors :mod:`agent.web_search_registry`):
+
+    - When ``image_gen.provider`` is explicitly set, the configured
+      provider is returned even if :meth:`ImageGenProvider.is_available`
+      reports False — the dispatcher surfaces a precise "X_API_KEY is not
+      set" error rather than silently switching backends.
+    - When ``image_gen.provider`` is unset, the fallback path (single-
+      provider shortcut and the FAL legacy preference) is filtered by
+      ``is_available()`` so we don't pick a provider the user has no
+      credentials for.
    """
    configured: Optional[str] = None
    try:
@@ -94,6 +105,17 @@ def get_active_provider() -> Optional[ImageGenProvider]:
    with _lock:
        snapshot = dict(_providers)

+    def _is_available_safe(p: ImageGenProvider) -> bool:
+        """Wrap ``is_available()`` so a buggy provider doesn't kill resolution."""
+        try:
+            return bool(p.is_available())
+        except Exception as exc:  # noqa: BLE001
+            logger.debug("image_gen provider %s.is_available() raised %s", p.name, exc)
+            return False
+
+    # 1. Explicit config wins — return regardless of is_available() so the
+    #    user gets a precise downstream error message rather than a silent
+    #    backend switch.
    if configured:
        provider = snapshot.get(configured)
        if provider is not None:
@@ -103,13 +125,16 @@ def get_active_provider() -> Optional[ImageGenProvider]:
            configured,
        )

-    # Fallback: single-provider case
-    if len(snapshot) == 1:
-        return next(iter(snapshot.values()))
+    # 2. Fallback: exactly one *available* provider, even if others are
+    #    registered without credentials (those are never "active").
+    available = [p for p in snapshot.values() if _is_available_safe(p)]
+    if len(available) == 1:
+        return available[0]

-    # Fallback: prefer legacy FAL for backward compat
-    if "fal" in snapshot:
-        return snapshot["fal"]
+    # 3. Fallback: prefer legacy FAL for backward compat, when available.
+    fal = snapshot.get("fal")
+    if fal is not None and _is_available_safe(fal):
+        return fal

    return None

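The three-step order in `get_active_provider()` can be exercised in isolation. Below is a minimal sketch. `Provider` is a hypothetical dataclass standing in for `ImageGenProvider`, and only `name` and `is_available()` are modeled:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Provider:
    name: str
    has_key: bool

    def is_available(self) -> bool:
        return self.has_key


def resolve(providers: Dict[str, Provider], configured: Optional[str]) -> Optional[Provider]:
    # 1. Explicit config wins even without credentials, so the caller can
    #    report a precise "missing API key" error.
    if configured and configured in providers:
        return providers[configured]
    # 2. Exactly one provider with credentials: use it.
    available = [p for p in providers.values() if p.is_available()]
    if len(available) == 1:
        return available[0]
    # 3. Otherwise prefer the legacy FAL backend, but only if it is usable.
    fal = providers.get("fal")
    if fal is not None and fal.is_available():
        return fal
    return None
```

The behavioral change shows up in the last case. A lone registered provider with no credentials used to be returned as "active"; under the new rules, resolution yields `None`.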
--- a/agent/image_routing.py
+++ b/agent/image_routing.py
@@ -46,6 +46,84 @@ logger = logging.getLogger(__name__)
 _VALID_MODES = frozenset({"auto", "native", "text"})


+# Strict YAML/JSON boolean coercion for capability overrides.
+#
+# ``bool("false")`` is True in Python because non-empty strings are truthy, so
+# a user writing ``supports_vision: "false"`` (quoted — a common YAML mistake)
+# would silently enable native vision routing on a model that can't actually
+# handle it. Accept only the common YAML boolean spellings (true/yes/on,
+# false/no/off), the strings "1"/"0", and real ``bool`` or integer 0/1.
+# Anything else returns None so the caller falls through to models.dev.
+_TRUE_TOKENS = frozenset({"true", "yes", "on", "1"})
+_FALSE_TOKENS = frozenset({"false", "no", "off", "0"})
+
+
+def _coerce_capability_bool(raw: Any) -> Optional[bool]:
+    """Return True/False for recognised boolean values, None otherwise."""
+    if isinstance(raw, bool):
+        return raw
+    if isinstance(raw, int):
+        if raw in (0, 1):
+            return bool(raw)
+        return None
+    if isinstance(raw, str):
+        s = raw.strip().lower()
+        if s in _TRUE_TOKENS:
+            return True
+        if s in _FALSE_TOKENS:
+            return False
+    return None
+
+
+def _supports_vision_override(
+    cfg: Optional[Dict[str, Any]],
+    provider: str,
+    model: str,
+) -> Optional[bool]:
+    """Resolve user-declared vision capability from config.yaml.
+
+    Resolution order, first hit wins:
+      1. ``model.supports_vision`` (top-level shortcut for the active model)
+      2. ``providers.<provider>.models.<model>.supports_vision``
+         (named custom providers — ``provider`` may be the runtime-resolved
+         value ``"custom"`` and/or the user-declared name under
+         ``model.provider``; both are tried)
+
+    Returns None when no override is set, so the caller falls through to
+    models.dev. Returns False explicitly only when the user wrote a
+    recognised boolean false token.
+    """
+    if not isinstance(cfg, dict):
+        return None
+
+    # 1. Top-level shortcut
+    model_cfg_raw = cfg.get("model")
+    model_cfg: Dict[str, Any] = model_cfg_raw if isinstance(model_cfg_raw, dict) else {}
+    top = _coerce_capability_bool(model_cfg.get("supports_vision"))
+    if top is not None:
+        return top
+
+    # 2. Per-provider, per-model. Named custom providers (e.g. "my-vllm")
+    # get rewritten to provider="custom" at runtime
+    # (hermes_cli/runtime_provider.py:_resolve_named_custom_runtime), so the
+    # config still holds the user-declared name under model.provider. Try
+    # both as candidate provider keys.
+    config_provider = str(model_cfg.get("provider") or "").strip()
+    providers_raw = cfg.get("providers")
+    providers_cfg: Dict[str, Any] = providers_raw if isinstance(providers_raw, dict) else {}
+    for p in dict.fromkeys(filter(None, (provider, config_provider))):
+        entry_raw = providers_cfg.get(p)
+        entry: Dict[str, Any] = entry_raw if isinstance(entry_raw, dict) else {}
+        models_raw = entry.get("models")
+        models_cfg: Dict[str, Any] = models_raw if isinstance(models_raw, dict) else {}
+        per_model_raw = models_cfg.get(model)
+        per_model: Dict[str, Any] = per_model_raw if isinstance(per_model_raw, dict) else {}
+        coerced = _coerce_capability_bool(per_model.get("supports_vision"))
+        if coerced is not None:
+            return coerced
+    return None
+
+
 def _coerce_mode(raw: Any) -> str:
    """Normalize a config value into one of the valid modes."""
    if not isinstance(raw, str):
@@ -76,13 +154,25 @@ def _explicit_aux_vision_override(cfg: Optional[Dict[str, Any]]) -> bool:
    base_url = str(vision.get("base_url") or "").strip()

    # "auto" / "" / blank = not explicit
-    if provider in ("", "auto") and not model and not base_url:
+    if provider in {"", "auto"} and not model and not base_url:
        return False
    return True


-def _lookup_supports_vision(provider: str, model: str) -> Optional[bool]:
-    """Return True/False if we can resolve caps, None if unknown."""
+def _lookup_supports_vision(
+    provider: str,
+    model: str,
+    cfg: Optional[Dict[str, Any]] = None,
+) -> Optional[bool]:
+    """Return True/False if we can resolve caps, None if unknown.
+
+    Consults the user's ``supports_vision`` override in config.yaml first
+    (so custom/local models declared as vision-capable don't fall through to
+    text routing in ``auto`` mode), then falls back to models.dev.
+    """
+    override = _supports_vision_override(cfg, provider, model)
+    if override is not None:
+        return override
    if not provider or not model:
        return None
    try:
@@ -123,7 +213,7 @@ def decide_image_input_mode(
    if _explicit_aux_vision_override(cfg):
        return "text"

-    supports = _lookup_supports_vision(provider, model)
+    supports = _lookup_supports_vision(provider, model, cfg)
    if supports is True:
        return "native"
    return "text"
@@ -163,7 +253,7 @@ def _sniff_mime_from_bytes(raw: bytes) -> Optional[str]:
    if raw.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    # GIF87a / GIF89a
-    if raw[:6] in (b"GIF87a", b"GIF89a"):
+    if raw[:6] in {b"GIF87a", b"GIF89a"}:
        return "image/gif"
    # WEBP: "RIFF" .... "WEBP"
    if len(raw) >= 12 and raw[:4] == b"RIFF" and raw[8:12] == b"WEBP":
@@ -172,9 +262,9 @@ def _sniff_mime_from_bytes(raw: bytes) -> Optional[str]:
    if raw.startswith(b"BM"):
        return "image/bmp"
    # HEIC/HEIF: ftypheic / ftypheix / ftypmif1 / ftypmsf1 etc.
-    if len(raw) >= 12 and raw[4:8] == b"ftyp" and raw[8:12] in (
+    if len(raw) >= 12 and raw[4:8] == b"ftyp" and raw[8:12] in {
        b"heic", b"heix", b"hevc", b"hevx", b"mif1", b"msf1", b"heim", b"heis",
-    ):
+    }:
        return "image/heic"
    return None

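Strict coercion matters because `bool("false")` is `True` in Python. Below is a sketch of the override lookup. It assumes config values arrive as raw YAML scalars. `coerce` follows the same rules as `_coerce_capability_bool`, and `resolve_vision` is a hypothetical, flattened version of the resolution order: top-level `model.supports_vision` first, then `providers.<name>.models.<model>.supports_vision` under both the runtime and the user-declared provider name.

```python
from typing import Any, Dict, Optional

TRUE_TOKENS = {"true", "yes", "on", "1"}
FALSE_TOKENS = {"false", "no", "off", "0"}


def coerce(raw: Any) -> Optional[bool]:
    """Strict boolean coercion: None means "not a recognised boolean"."""
    if isinstance(raw, bool):  # must precede the int check: bool subclasses int
        return raw
    if isinstance(raw, int):
        return bool(raw) if raw in (0, 1) else None
    if isinstance(raw, str):
        token = raw.strip().lower()
        if token in TRUE_TOKENS:
            return True
        if token in FALSE_TOKENS:
            return False
    return None


def _as_dict(value: Any) -> Dict[str, Any]:
    return value if isinstance(value, dict) else {}


def resolve_vision(cfg: Dict[str, Any], provider: str, model: str) -> Optional[bool]:
    model_cfg = _as_dict(cfg.get("model"))
    top = coerce(model_cfg.get("supports_vision"))
    if top is not None:
        return top
    providers = _as_dict(cfg.get("providers"))
    # Runtime provider name first, then the user-declared one (deduplicated).
    for name in dict.fromkeys(filter(None, (provider, model_cfg.get("provider")))):
        models = _as_dict(_as_dict(providers.get(name)).get("models"))
        value = coerce(_as_dict(models.get(model)).get("supports_vision"))
        if value is not None:
            return value
    return None  # no override: the caller would consult models.dev next
```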
--- a/agent/iteration_budget.py
+++ b/agent/iteration_budget.py
@@ -0,0 +1,62 @@
+"""Per-agent iteration budget — thread-safe consume/refund counter.
+
+Extracted from ``run_agent.py``.  Each ``AIAgent`` instance (parent or
+subagent) holds an :class:`IterationBudget`; the parent's cap comes from
+``max_iterations`` (default 90), each subagent's cap comes from
+``delegation.max_iterations`` (default 50).
+
+``run_agent`` re-exports ``IterationBudget`` so existing
+``from run_agent import IterationBudget`` imports keep working unchanged.
+"""
+
+from __future__ import annotations
+
+import threading
+
+
+class IterationBudget:
+    """Thread-safe iteration counter for an agent.
+
+    Each agent (parent or subagent) gets its own ``IterationBudget``.
+    The parent's budget is capped at ``max_iterations`` (default 90).
+    Each subagent gets an independent budget capped at
+    ``delegation.max_iterations`` (default 50) — this means total
+    iterations across parent + subagents can exceed the parent's cap.
+    Users control the per-subagent limit via ``delegation.max_iterations``
+    in config.yaml.
+
+    ``execute_code`` (programmatic tool calling) iterations are refunded via
+    :meth:`refund` so they don't eat into the budget.
+    """
+
+    def __init__(self, max_total: int):
+        self.max_total = max_total
+        self._used = 0
+        self._lock = threading.Lock()
+
+    def consume(self) -> bool:
+        """Try to consume one iteration.  Returns True if allowed."""
+        with self._lock:
+            if self._used >= self.max_total:
+                return False
+            self._used += 1
+            return True
+
+    def refund(self) -> None:
+        """Give back one iteration (e.g. for execute_code turns)."""
+        with self._lock:
+            if self._used > 0:
+                self._used -= 1
+
+    @property
+    def used(self) -> int:
+        with self._lock:
+            return self._used
+
+    @property
+    def remaining(self) -> int:
+        with self._lock:
+            return max(0, self.max_total - self._used)
+
+
+__all__ = ["IterationBudget"]
--- a/agent/lsp/__init__.py
+++ b/agent/lsp/__init__.py
@@ -0,0 +1,106 @@
+"""Language Server Protocol (LSP) integration for Hermes Agent.
+
+Hermes runs full language servers (pyright, gopls, rust-analyzer,
+typescript-language-server, etc.) as subprocesses and pipes their
+``textDocument/publishDiagnostics`` output into the post-write lint
+delta filter used by ``write_file`` and ``patch``.
+
+LSP is **gated on git workspace detection** — if the agent's cwd is
+inside a git repository, LSP runs against that workspace; otherwise the
+file_operations layer falls back to its existing in-process syntax
+checks.  This keeps users whose cwd is their home directory (e.g.
+Telegram gateway chats) from spawning daemons they don't need.
+
+Public API:
+
+    from agent.lsp import get_service
+
+    svc = get_service()
+    if svc and svc.enabled_for(path):
+        await svc.touch_file(path)
+        diags = svc.diagnostics_for(path)
+
+The bulk of the wiring is internal; most callers never touch it because
+:meth:`tools.file_operations.FileOperations._check_lint_delta` already
+wires it in (see that module).
+
+Architecture is documented in ``website/docs/user-guide/features/lsp.md``.
+"""
+from __future__ import annotations
+
+import atexit
+import logging
+import threading
+from typing import Optional
+
+from agent.lsp.manager import LSPService
+
+logger = logging.getLogger("agent.lsp")
+
+_service: Optional[LSPService] = None
+_atexit_registered = False
+_service_lock = threading.Lock()
+
+
+def get_service() -> Optional[LSPService]:
+    """Return the process-wide LSP service singleton, or None when disabled.
+
+    The service is created lazily on first call.  ``None`` is returned
+    when LSP is disabled in config, when no workspace can be detected,
+    or when the platform doesn't support subprocess-based LSP servers.
+
+    On first creation, registers an :mod:`atexit` handler that tears
+    down spawned language servers on Python exit so a long-running
+    CLI or gateway session doesn't leak pyright/gopls/etc. processes
+    when it terminates.
+    """
+    global _service, _atexit_registered
+    if (svc := _service) is not None:
+        return svc if svc.is_active() else None
+    with _service_lock:
+        if _service is not None:
+            return _service if _service.is_active() else None
+        _service = LSPService.create_from_config()
+        if not _atexit_registered:
+            # ``atexit`` handlers run in LIFO order on normal Python
+            # exit and on SystemExit, but NOT on os._exit() or
+            # uncaught signals.  Language servers are stateless
+            # subprocesses, so losing the hook on SIGKILL is fine:
+            # their stdin pipe closes with the parent, they typically
+            # exit on EOF, and init reaps them.  We care about clean
+            # exits where Python flushes stdio before terminating;
+            # without this hook every ``hermes chat`` exit would leak
+            # pyright processes that outlive the parent for a few
+            # seconds while their stdout buffers drain.
+            atexit.register(_atexit_shutdown)
+            _atexit_registered = True
+    return svc if ((svc := _service) is not None and svc.is_active()) else None
+
+
+def shutdown_service() -> None:
+    """Tear down the LSP service if one was started.
+
+    Safe to call multiple times; safe to call when no service was created.
+    """
+    global _service
+    with _service_lock:
+        svc = _service
+        _service = None
+    if svc is not None:
+        try:
+            svc.shutdown()
+        except Exception as e:  # noqa: BLE001
+            logger.debug("LSP shutdown error: %s", e)
+
+
+def _atexit_shutdown() -> None:
+    """atexit-registered wrapper.  Logs at debug because by the time
+    atexit fires the user has already seen the agent's final output —
+    a noisy shutdown line on top of that is just clutter."""
+    try:
+        shutdown_service()
+    except Exception as e:  # noqa: BLE001
+        logger.debug("atexit LSP shutdown failed: %s", e)
+
+
+__all__ = ["get_service", "shutdown_service", "LSPService"]
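`get_service()` combines three patterns. It uses a lazily created module-level singleton, double-checked locking so the common path skips the lock, and a one-time `atexit` registration for cleanup. Below is a generic sketch of the same shape. `Service` is a hypothetical stand-in for `LSPService`. The sketch reads the global into a local once, so a concurrent `shutdown_service()` cannot swap it to `None` between the check and the return.

```python
import atexit
import threading
from typing import Optional


class Service:
    def __init__(self) -> None:
        self.active = True

    def shutdown(self) -> None:
        self.active = False


_service: Optional[Service] = None
_registered = False
_lock = threading.Lock()


def get_service() -> Optional[Service]:
    global _service, _registered
    svc = _service
    if svc is not None:  # fast path: no lock once created
        return svc if svc.active else None
    with _lock:
        if _service is None:  # re-check: another thread may have won the race
            _service = Service()
        if not _registered:  # register cleanup exactly once per process
            atexit.register(shutdown_service)
            _registered = True
        svc = _service
    return svc if svc.active else None


def shutdown_service() -> None:
    global _service
    with _lock:
        svc, _service = _service, None
    if svc is not None:  # shut down outside the lock
        svc.shutdown()
```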
--- a/agent/lsp/cli.py
+++ b/agent/lsp/cli.py
@@ -0,0 +1,308 @@
+"""``hermes lsp`` CLI subcommand.
+
+Subcommands:
+
+- ``status`` — show service state, configured servers, install status.
+- ``install <server_id>`` — eagerly install one server's binary.
+- ``install-all`` — try to install every server with a known recipe.
+- ``restart`` — tear down running clients so the next edit re-spawns.
+- ``which <server_id>`` — print the resolved binary path for one server.
+- ``list`` — print the registry of supported servers.
+
+The handlers are kept here (rather than in
+``hermes_cli/main.py``) so the LSP module ships self-contained.
+"""
+from __future__ import annotations
+
+import argparse
+import sys
+from typing import Optional
+
+
+def register_subparser(subparsers: argparse._SubParsersAction) -> None:
+    """Wire the ``hermes lsp`` subcommand tree into the main argparse."""
+    parser = subparsers.add_parser(
+        "lsp",
+        help="Language Server Protocol management",
+        description=(
+            "Manage the LSP layer that powers post-write semantic "
+            "diagnostics in write_file/patch."
+        ),
+    )
+    sub = parser.add_subparsers(dest="lsp_command")
+
+    sub_status = sub.add_parser("status", help="Show LSP service status")
+    sub_status.add_argument(
+        "--json", action="store_true", help="Emit machine-readable JSON"
+    )
+
+    sub_list = sub.add_parser("list", help="List supported language servers")
+    sub_list.add_argument(
+        "--installed-only",
+        action="store_true",
+        help="Only show servers whose binary is currently available",
+    )
+
+    sub_install = sub.add_parser("install", help="Install a server binary")
+    sub_install.add_argument("server", help="Server id (e.g. pyright, gopls)")
+
+    sub_install_all = sub.add_parser(
+        "install-all",
+        help="Install every server with a known auto-install recipe",
+    )
+    sub_install_all.add_argument(
+        "--include-manual",
+        action="store_true",
+        help="Even attempt servers marked manual-install (best effort)",
+    )
+
+    sub_restart = sub.add_parser(
+        "restart",
+        help="Tear down running LSP clients (next edit re-spawns)",
+    )
+
+    sub_which = sub.add_parser("which", help="Print binary path for a server")
+    sub_which.add_argument("server", help="Server id")
+
+    parser.set_defaults(func=run_lsp_command)
+
+
+def run_lsp_command(args: argparse.Namespace) -> int:
+    """Top-level dispatcher for ``hermes lsp <subcommand>``."""
+    sub = getattr(args, "lsp_command", None) or "status"
+    try:
+        if sub == "status":
+            return _cmd_status(getattr(args, "json", False))
+        if sub == "list":
+            return _cmd_list(getattr(args, "installed_only", False))
+        if sub == "install":
+            return _cmd_install(args.server)
+        if sub == "install-all":
+            return _cmd_install_all(getattr(args, "include_manual", False))
+        if sub == "restart":
+            return _cmd_restart()
+        if sub == "which":
+            return _cmd_which(args.server)
+        sys.stderr.write(f"unknown lsp subcommand: {sub}\n")
+        return 2
+    except KeyboardInterrupt:
+        return 130
+
+
+def _cmd_status(emit_json: bool) -> int:
+    from agent.lsp import get_service
+    from agent.lsp.servers import SERVERS
+    from agent.lsp.install import detect_status
+
+    svc = get_service()
+    service_active = svc is not None
+    info = svc.get_status() if svc is not None else {"enabled": False}
+
+    if emit_json:
+        import json
+        payload = {
+            "service": info,
+            "registry": [
+                {
+                    "server_id": s.server_id,
+                    "extensions": list(s.extensions),
+                    "description": s.description,
+                    "binary_status": detect_status(_recipe_pkg_for(s.server_id)),
+                }
+                for s in SERVERS
+            ],
+        }
+        sys.stdout.write(json.dumps(payload, indent=2) + "\n")
+        return 0
+
+    out = []
+    out.append("LSP Service")
+    out.append("===========")
+    out.append(f"  enabled:         {info.get('enabled', False)}")
+    if service_active:
+        out.append(f"  wait_mode:       {info.get('wait_mode')}")
+        out.append(f"  wait_timeout:    {info.get('wait_timeout')}s")
+        out.append(f"  install_strategy:{info.get('install_strategy')}")
+        clients = info.get("clients") or []
+        if clients:
+            out.append(f"  active clients:  {len(clients)}")
+            for c in clients:
+                out.append(
+                    f"    - {c['server_id']:20s} state={c['state']:10s} root={c['workspace_root']}"
+                )
+        else:
+            out.append("  active clients:  none")
+        broken = info.get("broken") or []
+        if broken:
+            out.append(f"  broken pairs:    {len(broken)}")
+            for b in broken:
+                out.append(f"    - {b}")
+        disabled = info.get("disabled_servers") or []
+        if disabled:
+            out.append(f"  disabled in cfg: {', '.join(disabled)}")
+
+    # Surface backend-tool gaps that aren't visible in the registry table:
+    # some servers spawn fine but emit no diagnostics without a sidecar
+    # binary (bash-language-server -> shellcheck).
+    backend_warnings = _backend_warnings()
+    if backend_warnings:
+        out.append("")
+        out.append("Backend warnings")
+        out.append("================")
+        for line in backend_warnings:
+            out.append(f"  ! {line}")
+    out.append("")
+    out.append("Registered Servers")
+    out.append("==================")
+    for s in SERVERS:
+        pkg = _recipe_pkg_for(s.server_id)
+        status = detect_status(pkg)
+        marker = {
+            "installed": "✓",
+            "missing": "·",
+            "manual-only": "?",
+        }.get(status, " ")
+        ext_summary = ", ".join(list(s.extensions)[:5])
+        if len(s.extensions) > 5:
+            ext_summary += f", … (+{len(s.extensions) - 5})"
+        out.append(
+            f"  {marker} {s.server_id:24s} [{status:11s}] {ext_summary}"
+        )
+        if s.description:
+            out.append(f"      {s.description}")
+    sys.stdout.write("\n".join(out) + "\n")
+    return 0
+
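The registry table caps each extension list at five entries and appends a count of the rest. Pulled out as a pure helper (the name `summarize_extensions` is hypothetical), the formatting can be unit-tested without the registry:

```python
from typing import Sequence


def summarize_extensions(extensions: Sequence[str], limit: int = 5) -> str:
    """Join the first `limit` extensions and note how many were elided.

    Hypothetical helper mirroring the ext_summary logic in _cmd_status.
    """
    exts = list(extensions)
    summary = ", ".join(exts[:limit])
    if len(exts) > limit:
        summary += f", … (+{len(exts) - limit})"
    return summary
```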
+
+def _cmd_list(installed_only: bool) -> int:
+    from agent.lsp.servers import SERVERS
+    from agent.lsp.install import detect_status
+
+    for s in SERVERS:
+        pkg = _recipe_pkg_for(s.server_id)
+        status = detect_status(pkg)
+        if installed_only and status != "installed":
+            continue
+        sys.stdout.write(
+            f"{s.server_id:24s} [{status:11s}] {','.join(s.extensions)}\n"
+        )
+    return 0
+
+
+def _cmd_install(server_id: str) -> int:
+    from agent.lsp.install import try_install, INSTALL_RECIPES, detect_status
+    pkg = _recipe_pkg_for(server_id)
+    pre_status = detect_status(pkg)
+    if pre_status == "installed":
+        sys.stdout.write(f"{server_id} already installed\n")
+        return 0
+    sys.stdout.write(f"installing {server_id} (pkg={pkg}) ...\n")
+    sys.stdout.flush()
+    bin_path = try_install(pkg, "auto")
+    if bin_path is None:
+        recipe = INSTALL_RECIPES.get(pkg)
+        if recipe and recipe.get("strategy") == "manual":
+            sys.stderr.write(
+                f"{server_id}: this server requires a manual install. "
+                "See the documentation.\n"
+            )
+        else:
+            sys.stderr.write(f"{server_id}: install failed (see logs).\n")
+        return 1
+    sys.stdout.write(f"installed: {bin_path}\n")
+    return 0
+
+
+def _cmd_install_all(include_manual: bool) -> int:
+    from agent.lsp.servers import SERVERS
+    from agent.lsp.install import try_install, INSTALL_RECIPES, detect_status
+
+    rc = 0
+    for s in SERVERS:
+        pkg = _recipe_pkg_for(s.server_id)
+        recipe = INSTALL_RECIPES.get(pkg)
+        if recipe is None:
+            continue
+        if recipe.get("strategy") == "manual" and not include_manual:
+            continue
+        if detect_status(pkg) == "installed":
+            sys.stdout.write(f"  {s.server_id:24s} already installed\n")
+            continue
+        sys.stdout.write(f"  installing {s.server_id} (pkg={pkg}) ... ")
+        sys.stdout.flush()
+        path = try_install(pkg, "auto")
+        if path:
+            sys.stdout.write(f"ok ({path})\n")
+        else:
+            sys.stdout.write("FAILED\n")
+            rc = 1
+    return rc
+
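`_cmd_install_all` skips three kinds of package: those with no recipe, manual-only recipes (unless `--include-manual` is set), and anything already installed. The sketch below expresses the same filter as a pure function. The recipe table and status lookup are passed in as stand-ins for `INSTALL_RECIPES` and `detect_status`, an assumption made so the sketch runs without `agent.lsp.install`.

```python
from typing import Callable, Dict, List, Optional


def packages_to_install(
    packages: List[str],
    recipes: Dict[str, dict],
    status_of: Callable[[str], str],
    include_manual: bool = False,
) -> List[str]:
    """Return the packages an install-all pass would actually attempt.

    `recipes` and `status_of` are injected stand-ins (assumption) for
    INSTALL_RECIPES and detect_status.
    """
    selected: List[str] = []
    for pkg in packages:
        recipe: Optional[dict] = recipes.get(pkg)
        if recipe is None:
            continue  # no recipe: nothing to run
        if recipe.get("strategy") == "manual" and not include_manual:
            continue  # manual installs are opt-in
        if status_of(pkg) == "installed":
            continue  # already present
        selected.append(pkg)
    return selected
```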
+
+def _cmd_restart() -> int:
+    from agent.lsp import shutdown_service
+
+    shutdown_service()
+    sys.stdout.write("LSP service shut down. Next edit will respawn clients.\n")
+    return 0
+
+
+def _cmd_which(server_id: str) -> int:
+    from agent.lsp.install import INSTALL_RECIPES, hermes_lsp_bin_dir
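+    import shutil as _shutil
+
+    # INSTALL_RECIPES is keyed by package name, which differs from the
+    # server_id for a few servers (see _recipe_pkg_for).
+    recipe = INSTALL_RECIPES.get(_recipe_pkg_for(server_id))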
+    bin_name = (recipe or {}).get("bin", server_id)
+    staged = hermes_lsp_bin_dir() / bin_name
+    if staged.exists():
+        sys.stdout.write(str(staged) + "\n")
+        return 0
+    on_path = _shutil.which(bin_name)
+    if on_path:
+        sys.stdout.write(on_path + "\n")
+        return 0
+    sys.stderr.write(f"{server_id}: not installed\n")
+    return 1
+
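`_cmd_which` prefers a binary staged in Hermes' own bin directory and only falls back to `PATH` when none is staged. Here is a self-contained sketch of that lookup order. `staged_dir` is any directory you pass in and stands in for `hermes_lsp_bin_dir()`, which is an assumption made for illustration.

```python
import shutil
from pathlib import Path
from typing import Optional


def resolve_binary(bin_name: str, staged_dir: Path) -> Optional[str]:
    """Return the staged binary if present, else whatever PATH finds.

    `staged_dir` stands in for hermes_lsp_bin_dir() (assumption).
    """
    staged = staged_dir / bin_name
    if staged.exists():
        return str(staged)
    # shutil.which returns None when nothing executable is on PATH.
    return shutil.which(bin_name)
```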
+
+def _recipe_pkg_for(server_id: str) -> str:
+    """Map a registry ``server_id`` to its install-recipe package key."""
+    # The mapping lives here (not in install.py) because it's a CLI
+    # convenience layer.  Most server_ids are also their own recipe
+    # key, but a few differ (e.g. ``vue-language-server`` →
+    # ``@vue/language-server``).
+    aliases = {
+        "vue-language-server": "@vue/language-server",
+        "astro-language-server": "@astrojs/language-server",
+        "dockerfile-ls": "dockerfile-language-server-nodejs",
+        "typescript": "typescript-language-server",
+    }
+    return aliases.get(server_id, server_id)
+
+
+def _backend_warnings() -> list:
+    """Return human-readable notes about LSP backend tools that are missing
+    in a way that won't surface elsewhere.
+
+    Some language servers ship as thin wrappers around an external CLI for
+    actual diagnostics — they spawn cleanly but never emit any errors when
+    the sidecar binary isn't on PATH.  bash-language-server / shellcheck
+    is the load-bearing example.
+
+    Returned strings are short, actionable, and include the install
+    suggestion across common platforms.
+    """
+    import shutil as _shutil
+    from agent.lsp.install import hermes_lsp_bin_dir
+    notes: list = []
+    bash_installed = _shutil.which("bash-language-server") is not None or (
+        (hermes_lsp_bin_dir() / "bash-language-server").exists()
+    )
+    if bash_installed and _shutil.which("shellcheck") is None:
+        notes.append(
+            "bash-language-server is installed but shellcheck is missing — "
+            "diagnostics will be empty (apt: shellcheck, brew: shellcheck, "
+            "scoop: shellcheck)."
+        )
+    return notes
--- a/agent/lsp/client.py
+++ b/agent/lsp/client.py
@@ -0,0 +1,930 @@
+"""Async LSP client over stdin/stdout.
+
+One :class:`LSPClient` corresponds to one ``(language_server, workspace_root)``
+pair — exactly what OpenCode keys clients on, and the same shape Claude
+Code uses.  The client owns a child process, drives the JSON-RPC
+exchange, and exposes:
+
+- :meth:`open_file` / :meth:`change_file` — text document sync
+- :meth:`wait_for_diagnostics` — block until the server emits fresh
+  diagnostics for a specific file (or a timeout fires)
+- :meth:`diagnostics_for` — read the current per-file diagnostic store
+- :meth:`shutdown` — graceful close + SIGTERM/SIGKILL fallback
+
+The class is designed for async use from a single asyncio event loop.
+The :class:`agent.lsp.manager.LSPService` runs an event loop in a
+background thread so the synchronous file_operations layer can call
+into it via :meth:`agent.lsp.manager.LSPService.touch_file`.
+
+Implementation notes:
+
+- Push diagnostics from ``textDocument/publishDiagnostics``
+  notifications are stored per file path (decoded from the URI) in
+  :attr:`_push_diagnostics`.  Pull diagnostics go in
+  :attr:`_pull_diagnostics`.  The merged view dedupes by content.
+
+- Whole-document sync.  Even when the server advertises incremental
+  sync, we send a single ``contentChanges`` entry replacing the
+  entire document.  Pretending to be incremental while sending a
+  full replacement is well-tolerated by every major server and saves
+  range bookkeeping.  See OpenCode's ``client.ts:584-659`` for the
+  same trick.
+
+- The "touch-file dance": every ``open_file`` call also fires a
+  ``workspace/didChangeWatchedFiles`` notification (CREATED on the
+  first open, CHANGED thereafter).  Some servers (clangd, eslint)
+  only re-scan when this notification fires, even though the LSP spec
+  doesn't strictly require it.
+
+- ``ContentModified`` (-32801) errors get retried with exponential
+  backoff up to 3 times.  This matches Claude Code's
+  ``LSPServerInstance.sendRequest``.
+"""
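In concrete terms, the "touch-file dance" works like this. Alongside `didOpen`/`didChange`, the client sends a `workspace/didChangeWatchedFiles` notification whose `FileChangeType` is 1 (Created) on the first open and 2 (Changed) afterwards. The sketch below builds that JSON-RPC notification and frames it with the base protocol's `Content-Length` header. It is illustrative only and is not the `agent.lsp.protocol` helpers themselves.

```python
import json
from typing import Any, Dict

# FileChangeType values from the LSP specification.
FILE_CREATED = 1
FILE_CHANGED = 2


def watched_files_notification(uri: str, first_open: bool) -> Dict[str, Any]:
    # Illustrative only; not the agent.lsp.protocol helpers.
    change_type = FILE_CREATED if first_open else FILE_CHANGED
    return {
        "jsonrpc": "2.0",
        "method": "workspace/didChangeWatchedFiles",
        "params": {"changes": [{"uri": uri, "type": change_type}]},
    }


def frame(message: Dict[str, Any]) -> bytes:
    # Base protocol: "Content-Length: N\r\n\r\n" then N bytes of UTF-8 JSON.
    body = json.dumps(message, separators=(",", ":")).encode("utf-8")
    return b"Content-Length: %d\r\n\r\n" % len(body) + body
```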
+from __future__ import annotations
+
+import asyncio
+import logging
+import os
+from pathlib import Path
+from typing import Any, Awaitable, Callable, Dict, List, Optional, Set
+from urllib.parse import quote, unquote
+
+from agent.lsp.protocol import (
+    ERROR_CONTENT_MODIFIED,
+    ERROR_METHOD_NOT_FOUND,
+    LSPProtocolError,
+    LSPRequestError,
+    classify_message,
+    encode_message,
+    make_error_response,
+    make_notification,
+    make_request,
+    make_response,
+    read_message,
+)
+
+logger = logging.getLogger("agent.lsp.client")
+
+# Timeouts (seconds) — mirror OpenCode's constants, scaled to seconds.
+INITIALIZE_TIMEOUT = 45.0
+DIAGNOSTICS_DOCUMENT_WAIT = 5.0
+DIAGNOSTICS_FULL_WAIT = 10.0
+DIAGNOSTICS_REQUEST_TIMEOUT = 3.0
+PUSH_DEBOUNCE = 0.15
+SHUTDOWN_GRACE = 1.0  # seconds between SIGTERM and SIGKILL
+
+# Retry policy for transient ContentModified errors.
+MAX_CONTENT_MODIFIED_RETRIES = 3
+RETRY_BASE_DELAY = 0.5  # 0.5, 1.0, 2.0 — exponential
+
+
+def file_uri(path: str) -> str:
+    """Return ``file://`` URI for an absolute filesystem path.
+
+    Mirrors Node's ``pathToFileURL`` — handles spaces, unicode, and
+    Windows drive letters (``C:\\foo`` → ``file:///C:/foo``).
+    """
+    abs_path = os.path.abspath(path)
+    if os.name == "nt":
+        # Windows: backslash → forward slash, prepend extra slash so
+        # the drive letter shows up as part of the path component.
+        abs_path = abs_path.replace("\\", "/")
+        if not abs_path.startswith("/"):
+            abs_path = "/" + abs_path
+    return "file://" + quote(abs_path, safe="/:")
+
+
+def uri_to_path(uri: str) -> str:
+    """Inverse of :func:`file_uri`."""
+    if not uri.startswith("file://"):
+        return uri
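+    # Decode first so percent-encoded drive colons ("/c%3A/...") are
+    # recognised as well as literal ones.
+    raw = unquote(uri[len("file://"):])
+    if os.name == "nt" and raw.startswith("/") and len(raw) > 2 and raw[2] == ":":
+        raw = raw[1:]  # strip leading slash before drive letter
+    return os.path.normpath(raw)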
+
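Because `file_uri`/`uri_to_path` branch on `os.name`, their Windows path is hard to exercise on a POSIX CI machine. The sketch below writes the same conversions against `pathlib.PureWindowsPath`, which runs on any OS. The function names are hypothetical. The decoder unquotes before looking for the drive colon, so URIs of the form `file:///c%3A/...` parse as well.

```python
from pathlib import PureWindowsPath
from urllib.parse import quote, unquote, urlparse

# Hypothetical helpers; PureWindowsPath lets this run on any OS.


def windows_path_to_uri(path: str) -> str:
    # "C:\\proj\\a b.py" -> "C:/proj/a b.py" -> "file:///C:/proj/a%20b.py"
    posix = PureWindowsPath(path).as_posix()
    return "file:///" + quote(posix, safe="/:")


def windows_uri_to_path(uri: str) -> str:
    raw = unquote(urlparse(uri).path)  # e.g. "/C:/proj/a b.py"
    if len(raw) > 2 and raw[0] == "/" and raw[2] == ":":
        raw = raw[1:]  # drop the slash in front of the drive letter
    return str(PureWindowsPath(raw))
```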
+
+def _end_position(text: str) -> Dict[str, int]:
+    """Return the LSP Position at the end of ``text``.
+
+    Used to construct a single-range "replace whole document" change
+    for ``textDocument/didChange`` regardless of the server's declared
+    sync mode.  The column is measured in UTF-16 code units, matching
+    the ``positionEncodings`` advertised in ``initialize``.
+    """
+    if not text:
+        return {"line": 0, "character": 0}
+    # LSP only treats "\r\n", "\n" and "\r" as line terminators, whereas
+    # str.splitlines() also splits on "\x0b", "\x0c", "\u2028", etc.
+    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
+    # A trailing newline yields a final empty element, so the end
+    # position is then the start of that (empty) last line.
+    last_col = len(lines[-1].encode("utf-16-le")) // 2
+    return {"line": len(lines) - 1, "character": last_col}
+
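Under whole-document sync, a `didChange` carries a single content change. Its range spans the previous text and its `text` is the new buffer. The sketch below builds those params. `utf16_end` is a hypothetical helper that counts columns in UTF-16 code units, since that is the position encoding the client advertises. For servers that declare Full sync, the spec's own form simply omits `range`; the sketch keeps the range, following the approach described in the module docstring.

```python
from typing import Any, Dict

# Hypothetical helpers illustrating whole-document sync.


def utf16_end(text: str) -> Dict[str, int]:
    # LSP line terminators are "\r\n", "\n" and "\r" only.
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    return {"line": len(lines) - 1, "character": len(lines[-1].encode("utf-16-le")) // 2}


def full_replace_change(uri: str, version: int, old_text: str, new_text: str) -> Dict[str, Any]:
    """Params for textDocument/didChange that replace the whole document."""
    return {
        "textDocument": {"uri": uri, "version": version},
        "contentChanges": [
            {
                "range": {"start": {"line": 0, "character": 0}, "end": utf16_end(old_text)},
                "text": new_text,
            }
        ],
    }
```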
+
+class LSPClient:
+    """Async LSP client tied to one server process and one workspace root.
+
+    Lifecycle:
+
+        c = LSPClient(server_id=..., workspace_root=..., command=[...])
+        await c.start()       # spawn + initialize
+        ver = await c.open_file("/path/to/foo.py")
+        await c.wait_for_diagnostics("/path/to/foo.py", ver)
+        diags = c.diagnostics_for("/path/to/foo.py")
+        await c.shutdown()
+    """
+
+    # ------------------------------------------------------------------
+    # construction + lifecycle
+    # ------------------------------------------------------------------
+
+    def __init__(
+        self,
+        *,
+        server_id: str,
+        workspace_root: str,
+        command: List[str],
+        env: Optional[Dict[str, str]] = None,
+        cwd: Optional[str] = None,
+        initialization_options: Optional[Dict[str, Any]] = None,
+        seed_diagnostics_on_first_push: bool = False,
+    ) -> None:
+        self.server_id = server_id
+        self.workspace_root = workspace_root
+        self._command = list(command)
+        self._env = env
+        self._cwd = cwd or workspace_root
+        self._init_options = initialization_options or {}
+        self._seed_first_push = seed_diagnostics_on_first_push
+
+        # Process + streams
+        self._proc: Optional[asyncio.subprocess.Process] = None
+        self._stderr_task: Optional[asyncio.Task] = None
+        self._reader_task: Optional[asyncio.Task] = None
+
+        # Request/response correlation
+        self._next_id: int = 0
+        self._pending: Dict[int, asyncio.Future] = {}
+
+        # Server-side request handlers (server → client requests).
+        # Kept small and explicit; everything else returns method-not-found.
+        self._request_handlers: Dict[str, Callable[[Any], Awaitable[Any]]] = {
+            "window/workDoneProgress/create": self._handle_work_done_create,
+            "workspace/configuration": self._handle_workspace_configuration,
+            "client/registerCapability": self._handle_register_capability,
+            "client/unregisterCapability": self._handle_unregister_capability,
+            "workspace/workspaceFolders": self._handle_workspace_folders,
+            "workspace/diagnostic/refresh": self._handle_diagnostic_refresh,
+        }
+        # Notifications (server → client) we care about.
+        self._notification_handlers: Dict[str, Callable[[Any], None]] = {
+            "textDocument/publishDiagnostics": self._handle_publish_diagnostics,
+            # Everything else (window/showMessage, $/progress, etc.)
+            # is silently dropped by default.
+        }
+
+        # Tracked file state — required for didChange version bumps.
+        self._files: Dict[str, Dict[str, Any]] = {}
+        # Diagnostic stores, keyed by file path (NOT URI).
+        self._push_diagnostics: Dict[str, List[Dict[str, Any]]] = {}
+        self._pull_diagnostics: Dict[str, List[Dict[str, Any]]] = {}
+        # Per-path "last published" time so wait-for-fresh logic works.
+        self._published: Dict[str, float] = {}
+        # Per-path version of the latest push (matches our didChange
+        # version when the server respects it).
+        self._published_version: Dict[str, int] = {}
+        # First-push seen flag, for typescript-style seed-on-first-push.
+        self._first_push_seen: Set[str] = set()
+        # Capability registrations — only diagnostic ones are tracked.
+        self._diagnostic_registrations: Dict[str, Dict[str, Any]] = {}
+
+        # State machine
+        self._state: str = "stopped"
+        self._initialize_result: Optional[Dict[str, Any]] = None
+        self._sync_kind: int = 1  # 1=Full, 2=Incremental
+        self._stopping: bool = False
+
+        # Push event for waiters.
+        self._push_event = asyncio.Event()
+        # Monotonic counter incremented on every publishDiagnostics push.
+        # Waiters snapshot it on entry and treat any increase as
+        # "something happened, recheck the predicate".  Avoids the
+        # asyncio.Event sticky-state trap.
+        self._push_counter = 0
+        # Registration change event so wait_for_diagnostics can re-loop
+        # when the server announces a new dynamic provider.
+        self._registration_event = asyncio.Event()
+
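The `_push_counter` comment describes a common way around `asyncio.Event`'s sticky state. Each waiter snapshots a counter and wakes on any increase, so nobody has to remember to clear a flag. The sketch below shows that pattern with `asyncio.Condition`. `PushTracker` and its methods are hypothetical names, not `LSPClient`'s API.

```python
import asyncio


class PushTracker:
    """Counts diagnostic pushes; waiters block until the count moves.

    Hypothetical names; not LSPClient's API.
    """

    def __init__(self) -> None:
        self._count = 0
        self._cond = asyncio.Condition()

    @property
    def count(self) -> int:
        return self._count

    async def record_push(self) -> None:
        async with self._cond:
            self._count += 1
            self._cond.notify_all()

    async def wait_for_push(self, since: int, timeout: float) -> bool:
        """Return True if a push arrived after `since`, False on timeout."""
        async with self._cond:
            try:
                await asyncio.wait_for(
                    self._cond.wait_for(lambda: self._count > since), timeout
                )
            except asyncio.TimeoutError:
                return False
        return True
```

A push that lands between taking the snapshot and starting to wait is still observed, because the predicate is checked before blocking.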
+    @property
+    def is_running(self) -> bool:
+        return self._state == "running" and self._proc is not None and self._proc.returncode is None
+
+    @property
+    def state(self) -> str:
+        return self._state
+
+    async def start(self) -> None:
+        """Spawn the server and complete the initialize handshake.
+
+        Raises any exception encountered during spawn/init.  On failure
+        the process is killed and the client is left in state
+        ``"error"`` — re-call ``start()`` to retry.
+        """
+        if self._state in {"running", "starting"}:
+            return
+        self._state = "starting"
+        try:
+            await self._spawn()
+            await self._initialize()
+            self._state = "running"
+        except Exception:
+            self._state = "error"
+            await self._cleanup_process()
+            raise
+
+    async def _spawn(self) -> None:
+        env = dict(os.environ)
+        if self._env:
+            env.update(self._env)
+
+        try:
+            self._proc = await asyncio.create_subprocess_exec(
+                self._command[0],
+                *self._command[1:],
+                stdin=asyncio.subprocess.PIPE,
+                stdout=asyncio.subprocess.PIPE,
+                stderr=asyncio.subprocess.PIPE,
+                env=env,
+                cwd=self._cwd,
+            )
+        except FileNotFoundError as e:
+            raise LSPProtocolError(
+                f"LSP server binary not found: {self._command[0]} ({e})"
+            ) from e
+
+        # Drain stderr at debug level — if we don't, the pipe buffer
+        # fills and the server hangs.
+        self._stderr_task = asyncio.create_task(self._drain_stderr())
+        # Start the reader loop.
+        self._reader_task = asyncio.create_task(self._reader_loop())
+
+    async def _drain_stderr(self) -> None:
+        if self._proc is None or self._proc.stderr is None:
+            return
+        try:
+            while True:
+                line = await self._proc.stderr.readline()
+                if not line:
+                    break
+                text = line.decode("utf-8", errors="replace").rstrip()
+                if text:
+                    logger.debug("[%s] stderr: %s", self.server_id, text[:1000])
+        except (asyncio.CancelledError, OSError):
+            pass
+
+    async def _reader_loop(self) -> None:
+        if self._proc is None or self._proc.stdout is None:
+            return
+        try:
+            while True:
+                msg = await read_message(self._proc.stdout)
+                if msg is None:
+                    logger.debug("[%s] server closed stdout cleanly", self.server_id)
+                    break
+                kind, key = classify_message(msg)
+                if kind == "response":
+                    self._dispatch_response(key, msg)
+                elif kind == "request":
+                    asyncio.create_task(self._dispatch_request(key, msg))
+                elif kind == "notification":
+                    self._dispatch_notification(key, msg)
+                else:
+                    logger.warning("[%s] dropping invalid message: %r", self.server_id, msg)
+        except LSPProtocolError as e:
+            logger.warning("[%s] protocol error in reader loop: %s", self.server_id, e)
+        except (asyncio.CancelledError, OSError):
+            pass
+        finally:
+            # Wake up any pending requests so they can fail fast.
+            for fut in list(self._pending.values()):
+                if not fut.done():
+                    fut.set_exception(LSPProtocolError("server connection closed"))
+            self._pending.clear()
+
+    async def _initialize(self) -> None:
+        params = {
+            "rootUri": file_uri(self.workspace_root),
+            "rootPath": self.workspace_root,
+            "processId": os.getpid(),
+            "workspaceFolders": [
+                {"name": "workspace", "uri": file_uri(self.workspace_root)}
+            ],
+            "initializationOptions": self._init_options,
+            "capabilities": {
+                "window": {"workDoneProgress": True},
+                "workspace": {
+                    "configuration": True,
+                    "workspaceFolders": True,
+                    "didChangeWatchedFiles": {"dynamicRegistration": True},
+                    "diagnostics": {"refreshSupport": False},
+                },
+                "textDocument": {
+                    "synchronization": {
+                        "dynamicRegistration": False,
+                        "didOpen": True,
+                        "didChange": True,
+                        "didSave": True,
+                        "willSave": False,
+                        "willSaveWaitUntil": False,
+                    },
+                    "diagnostic": {
+                        "dynamicRegistration": True,
+                        "relatedDocumentSupport": True,
+                    },
+                    "publishDiagnostics": {
+                        "relatedInformation": True,
+                        "tagSupport": {"valueSet": [1, 2]},
+                        "versionSupport": True,
+                        "codeDescriptionSupport": True,
+                        "dataSupport": False,
+                    },
+                    "hover": {"contentFormat": ["markdown", "plaintext"]},
+                    "definition": {"linkSupport": True},
+                    "references": {},
+                    "documentSymbol": {"hierarchicalDocumentSymbolSupport": True},
+                },
+                "general": {"positionEncodings": ["utf-16"]},
+            },
+        }
+
+        result = await asyncio.wait_for(
+            self._send_request("initialize", params),
+            timeout=INITIALIZE_TIMEOUT,
+        )
+        self._initialize_result = result
+        self._sync_kind = self._extract_sync_kind(result.get("capabilities") or {})
+
+        await self._send_notification("initialized", {})
+        if self._init_options:
+            # Some servers (vtsls, eslint) want config pushed via
+            # didChangeConfiguration even if it was sent in
+            # initializationOptions.
+            await self._send_notification(
+                "workspace/didChangeConfiguration",
+                {"settings": self._init_options},
+            )
+
+    @staticmethod
+    def _extract_sync_kind(capabilities: dict) -> int:
+        sync = capabilities.get("textDocumentSync")
+        if isinstance(sync, int):
+            return sync
+        if isinstance(sync, dict):
+            change = sync.get("change")
+            if isinstance(change, int):
+                return change
+        return 1  # default to Full
+
+    async def shutdown(self) -> None:
+        """Best-effort graceful shutdown.
+
+        Sends ``shutdown`` + ``exit``, then SIGTERMs/SIGKILLs the
+        process if it doesn't exit cleanly.  Idempotent.
+        """
+        if self._stopping:
+            return
+        self._stopping = True
+        try:
+            if self.is_running:
+                try:
+                    await asyncio.wait_for(self._send_request("shutdown", None), timeout=2.0)
+                except (asyncio.TimeoutError, LSPRequestError, LSPProtocolError):
+                    pass
+                try:
+                    await self._send_notification("exit", None)
+                except Exception:
+                    pass
+        finally:
+            self._state = "stopped"
+            await self._cleanup_process()
+
+    async def _cleanup_process(self) -> None:
+        if self._reader_task is not None and not self._reader_task.done():
+            self._reader_task.cancel()
+            try:
+                await self._reader_task
+            except (asyncio.CancelledError, Exception):  # noqa: BLE001
+                pass
+        if self._stderr_task is not None and not self._stderr_task.done():
+            self._stderr_task.cancel()
+            try:
+                await self._stderr_task
+            except (asyncio.CancelledError, Exception):  # noqa: BLE001
+                pass
+        proc = self._proc
+        self._proc = None
+        if proc is None:
+            return
+        if proc.returncode is None:
+            try:
+                proc.terminate()
+                try:
+                    await asyncio.wait_for(proc.wait(), timeout=SHUTDOWN_GRACE)
+                except asyncio.TimeoutError:
+                    try:
+                        proc.kill()
+                        await proc.wait()
+                    except ProcessLookupError:
+                        pass
+            except ProcessLookupError:
+                pass
+
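`_cleanup_process` sends SIGTERM and escalates to SIGKILL if the process is still alive after `SHUTDOWN_GRACE`. The sketch below implements that escalation as a standalone coroutine and exercises it against a throwaway Python child that only sleeps. `stop_process` is a hypothetical helper name.

```python
import asyncio
import sys


async def stop_process(proc: asyncio.subprocess.Process, grace: float = 1.0) -> int:
    """Terminate `proc`, escalate to kill after `grace` seconds, return its exit code.

    Hypothetical helper mirroring _cleanup_process's escalation.
    """
    if proc.returncode is None:
        try:
            proc.terminate()  # SIGTERM on POSIX, TerminateProcess on Windows
            try:
                await asyncio.wait_for(proc.wait(), timeout=grace)
            except asyncio.TimeoutError:
                proc.kill()  # SIGKILL on POSIX
                await proc.wait()
        except ProcessLookupError:
            pass  # exited between the returncode check and the signal
    return await proc.wait()


async def demo() -> int:
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", "import time; time.sleep(30)"
    )
    return await stop_process(proc, grace=0.5)
```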
+    # ------------------------------------------------------------------
+    # request / notification plumbing
+    # ------------------------------------------------------------------
+
+    async def _send_request(self, method: str, params: Any) -> Any:
+        if self._proc is None or self._proc.stdin is None or self._proc.stdin.is_closing():
+            raise LSPProtocolError(f"cannot send {method!r}: stdin closed")
+        loop = asyncio.get_running_loop()
+        req_id = self._next_id
+        self._next_id += 1
+        fut: asyncio.Future = loop.create_future()
+        self._pending[req_id] = fut
+        try:
+            self._proc.stdin.write(encode_message(make_request(req_id, method, params)))
+            await self._proc.stdin.drain()
+        except (BrokenPipeError, ConnectionResetError, OSError) as e:
+            self._pending.pop(req_id, None)
+            raise LSPProtocolError(f"send failed for {method!r}: {e}") from e
+        try:
+            return await fut
+        finally:
+            self._pending.pop(req_id, None)
+
+    async def _send_request_with_retry(self, method: str, params: Any, *, timeout: float) -> Any:
+        """Send a request, retrying on ``ContentModified`` (-32801).
+
+        Other errors propagate.  The retry policy matches Claude Code's
+        ``LSPServerInstance.sendRequest``: up to 3 retries (4 attempts
+        in total), sleeping 0.5s, 1.0s and 2.0s between them.
+        """
+        for attempt in range(MAX_CONTENT_MODIFIED_RETRIES + 1):
+            try:
+                return await asyncio.wait_for(self._send_request(method, params), timeout=timeout)
+            except LSPRequestError as e:
+                if e.code == ERROR_CONTENT_MODIFIED and attempt < MAX_CONTENT_MODIFIED_RETRIES:
+                    await asyncio.sleep(RETRY_BASE_DELAY * (2 ** attempt))
+                    continue
+                raise
+
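Here is the retry loop in isolation. Only errors carrying the `ContentModified` code (-32801) are retried, and the delay doubles from a base value on each retry. `RequestError` is a hypothetical stand-in for `LSPRequestError`, and `base_delay` is a parameter so the test can run without real sleeps.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")
CONTENT_MODIFIED = -32801  # LSP ErrorCodes.ContentModified


class RequestError(Exception):
    """Hypothetical stand-in for LSPRequestError."""

    def __init__(self, code: int, message: str = "") -> None:
        super().__init__(message)
        self.code = code


async def retry_content_modified(
    send: Callable[[], Awaitable[T]], retries: int = 3, base_delay: float = 0.5
) -> T:
    """Call `send`; on ContentModified, sleep base_delay * 2**attempt and retry."""
    for attempt in range(retries + 1):
        try:
            return await send()
        except RequestError as e:
            if e.code == CONTENT_MODIFIED and attempt < retries:
                await asyncio.sleep(base_delay * (2 ** attempt))
                continue
            raise
    raise AssertionError("unreachable unless retries < 0")
```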
+    async def _send_notification(self, method: str, params: Any) -> None:
+        if self._proc is None or self._proc.stdin is None or self._proc.stdin.is_closing():
+            return
+        try:
+            self._proc.stdin.write(encode_message(make_notification(method, params)))
+            await self._proc.stdin.drain()
+        except (BrokenPipeError, ConnectionResetError, OSError) as e:
+            logger.debug("[%s] notify %s failed: %s", self.server_id, method, e)
+
+    async def _send_response(self, req_id: Any, result: Any) -> None:
+        if self._proc is None or self._proc.stdin is None or self._proc.stdin.is_closing():
+            return
+        try:
+            self._proc.stdin.write(encode_message(make_response(req_id, result)))
+            await self._proc.stdin.drain()
+        except (BrokenPipeError, ConnectionResetError, OSError):
+            pass
+
+    async def _send_error_response(self, req_id: Any, code: int, message: str) -> None:
+        if self._proc is None or self._proc.stdin is None or self._proc.stdin.is_closing():
+            return
+        try:
+            self._proc.stdin.write(encode_message(make_error_response(req_id, code, message)))
+            await self._proc.stdin.drain()
+        except (BrokenPipeError, ConnectionResetError, OSError):
+            pass
+
+    def _dispatch_response(self, req_id: int, msg: dict) -> None:
+        fut = self._pending.get(req_id)
+        if fut is None or fut.done():
+            return
+        if "error" in msg:
+            err = msg["error"] or {}
+            fut.set_exception(
+                LSPRequestError(
+                    code=int(err.get("code", -32000)),
+                    message=str(err.get("message", "unknown")),
+                    data=err.get("data"),
+                )
+            )
+        else:
+            fut.set_result(msg.get("result"))
+
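`_send_request` and `_dispatch_response` match replies to requests by id, using a dict of pending futures. The sketch below shows the pattern with the subprocess I/O stripped out. `Correlator` is a hypothetical name, and the server is simulated by calling `dispatch` directly. Unlike the client, which removes the entry in a `finally` block, this sketch pops the future when the reply arrives.

```python
import asyncio
from typing import Any, Dict, Tuple


class Correlator:
    """Match JSON-RPC responses to outstanding requests by id.

    Hypothetical sketch; the real client writes requests to the server's stdin.
    """

    def __init__(self) -> None:
        self._next_id = 0
        self._pending: Dict[int, asyncio.Future] = {}

    def begin(self, method: str, params: Any) -> Tuple[Dict[str, Any], asyncio.Future]:
        req_id = self._next_id
        self._next_id += 1
        fut = asyncio.get_running_loop().create_future()
        self._pending[req_id] = fut
        return {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}, fut

    def dispatch(self, msg: Dict[str, Any]) -> None:
        fut = self._pending.pop(msg.get("id"), None)
        if fut is None or fut.done():
            return  # unknown id, or the caller already gave up
        if "error" in msg:
            err = msg["error"] or {}
            fut.set_exception(RuntimeError(f"{err.get('code')}: {err.get('message')}"))
        else:
            fut.set_result(msg.get("result"))
```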
+    async def _dispatch_request(self, req_id: Any, msg: dict) -> None:
+        method = msg.get("method", "")
+        params = msg.get("params")
+        handler = self._request_handlers.get(method)
+        if handler is None:
+            await self._send_error_response(req_id, ERROR_METHOD_NOT_FOUND, f"method not found: {method}")
+            return
+        try:
+            result = await handler(params)
+        except Exception as e:  # noqa: BLE001 — protocol must not blow up
+            logger.warning("[%s] request handler %s failed: %s", self.server_id, method, e)
+            await self._send_error_response(req_id, -32000, f"handler failed: {e}")
+            return
+        await self._send_response(req_id, result)
+
+    def _dispatch_notification(self, method: str, msg: dict) -> None:
+        handler = self._notification_handlers.get(method)
+        if handler is None:
+            return
+        try:
+            handler(msg.get("params"))
+        except Exception as e:  # noqa: BLE001
+            logger.debug("[%s] notification handler %s failed: %s", self.server_id, method, e)
+
+    # ------------------------------------------------------------------
+    # built-in server-→-client request handlers
+    # ------------------------------------------------------------------
+
+    async def _handle_work_done_create(self, params: Any) -> Any:
+        # Acknowledge progress tokens — required by some servers.
+        return None
+
+    async def _handle_workspace_configuration(self, params: Any) -> Any:
+        # Walk dotted sections through initializationOptions.  Mirrors
+        # OpenCode's `client.ts:198-220` — return null when missing.
+        if not isinstance(params, dict):
+            return [None]
+        items = params.get("items") or []
+        out: List[Any] = []
+        for item in items:
+            if not isinstance(item, dict):
+                out.append(None)
+                continue
+            section = item.get("section")
+            if not section or not self._init_options:
+                out.append(self._init_options or None)
+                continue
+            cur: Any = self._init_options
+            for part in str(section).split("."):
+                if isinstance(cur, dict) and part in cur:
+                    cur = cur[part]
+                else:
+                    cur = None
+                    break
+            out.append(cur)
+        return out
+
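A `workspace/configuration` request asks for sections such as `python.analysis`. The handler answers by walking the dotted path through `initializationOptions` and returning `None` wherever the path runs out. Here is the lookup on its own. `lookup_section` is a hypothetical helper name, and the option values in the test are invented.

```python
from typing import Any, Dict, Optional


def lookup_section(options: Dict[str, Any], section: Optional[str]) -> Any:
    """Resolve a dotted `section` inside `options`; no section means everything.

    Hypothetical helper mirroring _handle_workspace_configuration.
    """
    if not section or not options:
        return options or None
    cur: Any = options
    for part in section.split("."):
        if isinstance(cur, dict) and part in cur:
            cur = cur[part]
        else:
            return None
    return cur
```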
+    async def _handle_register_capability(self, params: Any) -> Any:
+        if not isinstance(params, dict):
+            return None
+        for reg in params.get("registrations") or []:
+            if not isinstance(reg, dict):
+                continue
+            method = reg.get("method")
+            reg_id = reg.get("id")
+            if method == "textDocument/diagnostic" and reg_id:
+                self._diagnostic_registrations[str(reg_id)] = reg
+                self._registration_event.set()
+        return None
+
+    async def _handle_unregister_capability(self, params: Any) -> Any:
+        if not isinstance(params, dict):
+            return None
+        # "unregisterations" (sic) is the field name the LSP spec defines.
+        for unreg in params.get("unregisterations") or []:
+            if not isinstance(unreg, dict):
+                continue
+            reg_id = unreg.get("id")
+            if reg_id:
+                self._diagnostic_registrations.pop(str(reg_id), None)
+        return None
+
+    async def _handle_workspace_folders(self, params: Any) -> Any:
+        return [{"name": "workspace", "uri": file_uri(self.workspace_root)}]
+
+    async def _handle_diagnostic_refresh(self, params: Any) -> Any:
+        # We don't honour refresh — we re-pull on every touchFile.
+        return None
+
+    # ------------------------------------------------------------------
+    # publishDiagnostics handler
+    # ------------------------------------------------------------------
+
+    def _handle_publish_diagnostics(self, params: Any) -> None:
+        if not isinstance(params, dict):
+            return
+        uri = params.get("uri")
+        if not isinstance(uri, str):
+            return
+        path = uri_to_path(uri)
+        diagnostics = params.get("diagnostics") or []
+        if not isinstance(diagnostics, list):
+            diagnostics = []
+        version = params.get("version")
+        loop_time = asyncio.get_event_loop().time()
+
+        if self._seed_first_push and path not in self._first_push_seen:
+            # First push: seed without firing the event so a waiter
+            # doesn't resolve on the very first push (which arrives
+            # before the user-triggered didChange could've produced
+            # fresh diagnostics).
+            self._first_push_seen.add(path)
+            self._push_diagnostics[path] = diagnostics
+            self._published[path] = loop_time
+            if isinstance(version, int):
+                self._published_version[path] = version
+            return
+
+        self._push_diagnostics[path] = diagnostics
+        self._published[path] = loop_time
+        if isinstance(version, int):
+            self._published_version[path] = version
+        self._first_push_seen.add(path)
+        # Bump the monotonic push counter and wake every waiter.  We
+        # keep the Event sticky-set so any wait already in progress
+        # resolves; waiters re-check their predicate after waking and
+        # decide whether to keep waiting.  ``_push_counter`` is what
+        # they actually compare against to detect a fresh event.
+        self._push_counter += 1
+        self._push_event.set()
+
+    # ------------------------------------------------------------------
+    # public file-sync API
+    # ------------------------------------------------------------------
+
+    async def open_file(self, path: str, *, language_id: str = "plaintext") -> int:
+        """Send didOpen (first time) or didChange (subsequent) for ``path``.
+
+        Returns the new document version number that the agent's
+        ``wait_for_diagnostics`` should match against.
+        """
+        if not self.is_running:
+            raise LSPProtocolError("client not running")
+
+        abs_path = os.path.abspath(path)
+        try:
+            text = Path(abs_path).read_text(encoding="utf-8", errors="replace")
+        except OSError as e:
+            raise LSPProtocolError(f"cannot read {abs_path}: {e}") from e
+
+        uri = file_uri(abs_path)
+        existing = self._files.get(abs_path)
+
+        if existing is not None:
+            # Re-open: bump version, fire didChangeWatchedFiles + didChange.
+            await self._send_notification(
+                "workspace/didChangeWatchedFiles",
+                {"changes": [{"uri": uri, "type": 2}]},  # 2 = CHANGED
+            )
+            new_version = existing["version"] + 1
+            old_text = existing["text"]
+            content_changes: List[Dict[str, Any]]
+            if self._sync_kind == 2:
+                content_changes = [
+                    {
+                        "range": {
+                            "start": {"line": 0, "character": 0},
+                            "end": _end_position(old_text),
+                        },
+                        "text": text,
+                    }
+                ]
+            else:
+                content_changes = [{"text": text}]
+            await self._send_notification(
+                "textDocument/didChange",
+                {
+                    "textDocument": {"uri": uri, "version": new_version},
+                    "contentChanges": content_changes,
+                },
+            )
+            self._files[abs_path] = {"version": new_version, "text": text}
+            return new_version
+
+        # First open: didChangeWatchedFiles CREATED + didOpen.
+        await self._send_notification(
+            "workspace/didChangeWatchedFiles",
+            {"changes": [{"uri": uri, "type": 1}]},  # 1 = CREATED
+        )
+        # Clear any stale push/pull entries — fresh open should start
+        # from scratch.
+        self._push_diagnostics.pop(abs_path, None)
+        self._pull_diagnostics.pop(abs_path, None)
+        self._published.pop(abs_path, None)
+        self._published_version.pop(abs_path, None)
+        await self._send_notification(
+            "textDocument/didOpen",
+            {
+                "textDocument": {
+                    "uri": uri,
+                    "languageId": language_id,
+                    "version": 0,
+                    "text": text,
+                }
+            },
+        )
+        self._files[abs_path] = {"version": 0, "text": text}
+        return 0
+
+    async def save_file(self, path: str) -> None:
+        """Send didSave for ``path``.  Some linters re-scan only on save."""
+        if not self.is_running:
+            return
+        abs_path = os.path.abspath(path)
+        await self._send_notification(
+            "textDocument/didSave",
+            {"textDocument": {"uri": file_uri(abs_path)}},
+        )
+
+    # ------------------------------------------------------------------
+    # diagnostics: pull + wait
+    # ------------------------------------------------------------------
+
+    async def _pull_document_diagnostics(self, path: str) -> None:
+        """Send ``textDocument/diagnostic`` for one file.
+
+        Stores results into :attr:`_pull_diagnostics`.  Silently
+        no-ops on errors (server may not support the pull endpoint).
+        """
+        try:
+            params: Dict[str, Any] = {
+                "textDocument": {"uri": file_uri(os.path.abspath(path))}
+            }
+            result = await self._send_request_with_retry(
+                "textDocument/diagnostic",
+                params,
+                timeout=DIAGNOSTICS_REQUEST_TIMEOUT,
+            )
+        except (LSPRequestError, LSPProtocolError, asyncio.TimeoutError) as e:
+            logger.debug("[%s] document diagnostic pull failed: %s", self.server_id, e)
+            return
+        if not isinstance(result, dict):
+            return
+        items = result.get("items")
+        if isinstance(items, list):
+            self._pull_diagnostics[os.path.abspath(path)] = items
+        related = result.get("relatedDocuments")
+        if isinstance(related, dict):
+            for uri, sub in related.items():
+                if not isinstance(sub, dict):
+                    continue
+                sub_items = sub.get("items")
+                if isinstance(sub_items, list):
+                    self._pull_diagnostics[uri_to_path(uri)] = sub_items
+
+    async def wait_for_diagnostics(
+        self,
+        path: str,
+        version: int,
+        *,
+        mode: str = "document",
+    ) -> None:
+        """Wait for the server to publish diagnostics for ``path`` at ``version``.
+
+        ``mode`` selects the wait budget: ``"document"`` uses
+        ``DIAGNOSTICS_DOCUMENT_WAIT`` (5s) and ``"full"`` uses
+        ``DIAGNOSTICS_FULL_WAIT`` (10s).  Both modes currently issue
+        document pulls only.  Best-effort: returns silently on timeout and
+        does NOT raise if the server lacks pull diagnostics (push still works).
+        budget = DIAGNOSTICS_FULL_WAIT if mode == "full" else DIAGNOSTICS_DOCUMENT_WAIT
+        deadline = asyncio.get_event_loop().time() + budget
+        abs_path = os.path.abspath(path)
+
+        while True:
+            remaining = deadline - asyncio.get_event_loop().time()
+            if remaining <= 0:
+                return
+
+            # Concurrent: document pull + push wait.
+            pull_task = asyncio.create_task(self._pull_document_diagnostics(abs_path))
+            push_task = asyncio.create_task(self._wait_for_fresh_push(abs_path, version, remaining))
+            done, pending = await asyncio.wait(
+                {pull_task, push_task},
+                timeout=remaining,
+                return_when=asyncio.FIRST_COMPLETED,
+            )
+            for t in pending:
+                t.cancel()
+            for t in pending:
+                try:
+                    await t
+                except (asyncio.CancelledError, Exception):  # noqa: BLE001
+                    pass
+
+            # If we got a fresh push for our version, we're done.
+            current_v = self._published_version.get(abs_path)
+            if abs_path in self._published and (
+                current_v is None or current_v >= version
+            ):
+                return
+
+            # Pull may have populated _pull_diagnostics — that's also
+            # success.
+            if abs_path in self._pull_diagnostics:
+                return
+
+            # Loop until budget runs out.
+
+    async def _wait_for_fresh_push(self, path: str, version: int, timeout: float) -> None:
+        """Wait until a publishDiagnostics arrives for ``path`` at ``version``+."""
+        deadline = asyncio.get_event_loop().time() + timeout
+        baseline = self._push_counter
+        while True:
+            current_v = self._published_version.get(path)
+            if path in self._published and (current_v is None or current_v >= version):
+                # Debounce — wait a tick in case more diagnostics arrive
+                # immediately after.  TS often emits in pairs.  We
+                # snapshot the counter so we wake on a *new* push, not
+                # on the one that satisfied us a moment ago.
+                debounce_baseline = self._push_counter
+                debounce_deadline = asyncio.get_event_loop().time() + PUSH_DEBOUNCE
+                while self._push_counter == debounce_baseline:
+                    remaining = debounce_deadline - asyncio.get_event_loop().time()
+                    if remaining <= 0:
+                        break
+                    self._push_event.clear()
+                    try:
+                        await asyncio.wait_for(self._push_event.wait(), timeout=remaining)
+                    except asyncio.TimeoutError:
+                        break
+                return
+            remaining = deadline - asyncio.get_event_loop().time()
+            if remaining <= 0:
+                return
+            if self._push_counter > baseline:
+                # New event arrived but predicate still false — re-check
+                # immediately without waiting again.
+                baseline = self._push_counter
+                continue
+            self._push_event.clear()
+            try:
+                await asyncio.wait_for(self._push_event.wait(), timeout=min(remaining, 0.5))
+            except asyncio.TimeoutError:
+                continue
+
+    def diagnostics_for(self, path: str) -> List[Dict[str, Any]]:
+        """Return current merged + deduped diagnostics for one file.
+
+        Diagnostics from the push and pull stores are concatenated and
+        deduplicated by a ``(severity, code, source, message, range)``
+        content key.  Empty list if the server hasn't published anything.
+        """
+        abs_path = os.path.abspath(path)
+        push = self._push_diagnostics.get(abs_path) or []
+        pull = self._pull_diagnostics.get(abs_path) or []
+        return _dedupe(push, pull)
+
+
+def _dedupe(*lists: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
+    seen: Set[str] = set()
+    out: List[Dict[str, Any]] = []
+    for lst in lists:
+        for d in lst:
+            if not isinstance(d, dict):
+                continue
+            key = _diagnostic_key(d)
+            if key in seen:
+                continue
+            seen.add(key)
+            out.append(d)
+    return out
+
+
+def _diagnostic_key(d: Dict[str, Any]) -> str:
+    """Content-equality key for a diagnostic.
+
+    Matches the structural-equality used in claude-code's
+    ``areDiagnosticsEqual`` — message + severity + source + code +
+    range coords.  The range is flattened to a ``line:char-line:char``
+    string so the key does not depend on dict ordering.
+    """
+    rng = d.get("range") or {}
+    start = rng.get("start") or {}
+    end = rng.get("end") or {}
+    code = d.get("code")
+    if code is not None and not isinstance(code, str):
+        code = str(code)
+    return "\x00".join(
+        [
+            str(d.get("severity") or 1),
+            str(code or ""),
+            str(d.get("source") or ""),
+            str(d.get("message") or "").strip(),
+            f"{start.get('line', 0)}:{start.get('character', 0)}-{end.get('line', 0)}:{end.get('character', 0)}",
+        ]
+    )
+
+
+__all__ = [
+    "LSPClient",
+    "file_uri",
+    "uri_to_path",
+    "INITIALIZE_TIMEOUT",
+    "DIAGNOSTICS_DOCUMENT_WAIT",
+    "DIAGNOSTICS_FULL_WAIT",
+]
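The push/pull merge that `diagnostics_for` delegates to `_dedupe` and `_diagnostic_key` can be illustrated outside the client. The sketch below is a standalone re-implementation for illustration only. It assumes diagnostics are plain LSP `Diagnostic` dicts, and `diagnostic_key`, `merge_diagnostics` and the sample data are hypothetical. It shows two things. A diagnostic reported by both the `publishDiagnostics` push and the `textDocument/diagnostic` pull is kept once, with the first copy winning. Integer and string `code` values produce the same key.

```python
from typing import Any, Dict, List, Set


def diagnostic_key(d: Dict[str, Any]) -> str:
    """Content key: severity, code, source, message and range coordinates."""
    rng = d.get("range") or {}
    start = rng.get("start") or {}
    end = rng.get("end") or {}
    code = d.get("code")
    return "\x00".join(
        [
            str(d.get("severity") or 1),  # a missing severity is keyed as 1 (Error)
            "" if code is None else str(code),  # 42 and "42" produce the same key
            str(d.get("source") or ""),
            str(d.get("message") or "").strip(),
            f"{start.get('line', 0)}:{start.get('character', 0)}-"
            f"{end.get('line', 0)}:{end.get('character', 0)}",
        ]
    )


def merge_diagnostics(*stores: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Concatenate stores in order, keeping the first copy of each key."""
    seen: Set[str] = set()
    merged: List[Dict[str, Any]] = []
    for store in stores:
        for d in store:
            if not isinstance(d, dict):
                continue  # skip malformed entries instead of raising
            key = diagnostic_key(d)
            if key not in seen:
                seen.add(key)
                merged.append(d)
    return merged


# Made-up sample data: the pull result repeats the push diagnostic with
# trailing whitespace in its message and adds one new warning.
span = {"start": {"line": 3, "character": 4}, "end": {"line": 3, "character": 5}}
push = [{"severity": 1, "message": "x is undefined", "range": span}]
pull = [
    {"severity": 1, "message": "x is undefined  ", "range": span},
    {"severity": 2, "message": "unused import", "range": span},
]
merged = merge_diagnostics(push, pull)
print(len(merged))  # 2
```

`diagnostics_for` passes the push store first. Because the first copy wins, push results take precedence whenever both stores hold the same diagnostic.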
--- a/agent/lsp/eventlog.py
+++ b/agent/lsp/eventlog.py
@@ -0,0 +1,213 @@
+"""Structured logging with steady-state silence for the LSP layer.
+
+The LSP layer fires on every write_file/patch.  In a busy session
+that's hundreds of events.  We want users to be able to ``rg`` the
+log for "did LSP fire on that edit?" without drowning in noise.
+
+The level model:
+
+- ``DEBUG`` for steady-state events that have no novel signal:
+  ``clean``, ``feature off``, ``extension not mapped``, ``no project
+  root for already-announced file``, ``server unavailable for
+  already-announced binary``.  These never reach ``agent.log`` at the
+  default INFO threshold.
+
+- ``INFO`` for state transitions worth surfacing exactly once per
+  session: ``active for <root>`` the first time a (server_id,
+  workspace_root) client starts, ``no project root for <path>``
+  the first time we see that file.  Plus every diagnostic event
+  (those are inherently rare and per-edit, exactly what users grep
+  for).
+
+- ``WARNING`` for action-required failures: ``server unavailable``
+  (binary not on PATH) the first time per (server_id, binary),
+  ``no server configured`` once per language.  Per-call WARNING for
+  timeouts and unexpected bridge exceptions.
+
+The dedup state lives in module-level, in-process sets.  Each set holds
+one entry per distinct key seen in the process ((server_id, root),
+(server_id, binary), (server_id, file_path) or (server_id,), depending
+on the set): negligible memory even in an aggressive monorepo session.
+Bounded LRU was rejected: evicting an entry would risk re-firing the
+WARNING/INFO line we explicitly want to suppress.
+
+Grep recipe::
+
+    tail -f ~/.hermes/logs/agent.log | rg 'lsp\\['
+"""
+from __future__ import annotations
+
+import logging
+import os
+import threading
+from typing import Tuple
+
+# Dedicated logger name so the documented grep recipe survives a
+# ``logging.getLogger(__name__)`` rename of any internal module.
+event_log = logging.getLogger("hermes.lint.lsp")
+
+# ---------------------------------------------------------------------------
+# Once-per-X dedup sets
+# ---------------------------------------------------------------------------
+
+_announce_lock = threading.Lock()
+_announced_active: set = set()        # keys: (server_id, workspace_root)
+_announced_unavailable: set = set()   # keys: (server_id, binary_path_or_name)
+_announced_no_root: set = set()       # keys: (server_id, file_path)
+_announced_no_server: set = set()     # keys: (server_id,)
+
+
+def _short_path(file_path: str) -> str:
+    """Render *file_path* relative to the cwd when sensible, else absolute.
+
+    Keeps log lines readable for the common case (the user is inside
+    the project they're editing) without emitting brittle ``../../..``
+    chains for the cross-tree case.
+    """
+    if not file_path:
+        return file_path
+    try:
+        rel = os.path.relpath(file_path)
+    except ValueError:
+        return file_path
+    if rel.startswith(".." + os.sep) or rel == "..":
+        return file_path
+    return rel
+
+
+def _emit(server_id: str, level: int, message: str) -> None:
+    event_log.log(level, "lsp[%s] %s", server_id, message)
+
+
+def _announce_once(bucket: set, key: Tuple) -> bool:
+    """Return True if *key* has not been announced for *bucket* yet.
+
+    Atomically marks the key as announced so concurrent callers
+    cannot both win the race and double-log.
+    """
+    with _announce_lock:
+        if key in bucket:
+            return False
+        bucket.add(key)
+        return True
+
+
+# ---------------------------------------------------------------------------
+# Public event helpers — call these from the LSP layer.
+# ---------------------------------------------------------------------------
+
+
+def log_clean(server_id: str, file_path: str) -> None:
+    """No diagnostics emitted for *file_path*.  DEBUG (silent at default)."""
+    _emit(server_id, logging.DEBUG, f"clean ({_short_path(file_path)})")
+
+
+def log_disabled(server_id: str, file_path: str, reason: str) -> None:
+    """LSP intentionally skipped for this file (feature off, ext unmapped,
+    backend not local, etc.).  DEBUG."""
+    _emit(server_id, logging.DEBUG, f"skipped: {reason} ({_short_path(file_path)})")
+
+
+def log_active(server_id: str, workspace_root: str) -> None:
+    """A new LSP client started for (server_id, workspace_root).
+
+    INFO once per (server_id, workspace_root); DEBUG thereafter.
+    Lets users verify "is LSP actually running?" with a single grep.
+    """
+    key = (server_id, workspace_root)
+    if _announce_once(_announced_active, key):
+        _emit(server_id, logging.INFO, f"active for {workspace_root}")
+    else:
+        _emit(server_id, logging.DEBUG, f"reused client for {workspace_root}")
+
+
+def log_diagnostics(server_id: str, file_path: str, count: int) -> None:
+    """Diagnostics arrived for a file.  INFO every time — these are the
+    failure signals users actually want to grep for, and they are
+    inherently rare per edit."""
+    _emit(server_id, logging.INFO, f"{count} diags ({_short_path(file_path)})")
+
+
+def log_no_project_root(server_id: str, file_path: str) -> None:
+    """File had no recognised project marker.  INFO once per file,
+    DEBUG thereafter."""
+    key = (server_id, file_path)
+    if _announce_once(_announced_no_root, key):
+        _emit(server_id, logging.INFO, f"no project root for {_short_path(file_path)}")
+    else:
+        _emit(server_id, logging.DEBUG, f"no project root for {_short_path(file_path)}")
+
+
+def log_server_unavailable(server_id: str, binary_or_pkg: str) -> None:
+    """The server binary couldn't be resolved.  WARNING once per
+    (server_id, binary), DEBUG thereafter so a hundred subsequent
+    .py edits don't spam the log."""
+    key = (server_id, binary_or_pkg)
+    if _announce_once(_announced_unavailable, key):
+        _emit(
+            server_id,
+            logging.WARNING,
+            f"server unavailable: {binary_or_pkg} not found "
+            "(install via `hermes lsp install <id>` or set lsp.servers.<id>.command)",
+        )
+    else:
+        _emit(server_id, logging.DEBUG, f"server still unavailable: {binary_or_pkg}")
+
+
+def log_no_server_configured(server_id: str) -> None:
+    """No spawn recipe for this language.  WARNING once."""
+    if _announce_once(_announced_no_server, (server_id,)):
+        _emit(server_id, logging.WARNING, "no server configured")
+
+
+def log_timeout(server_id: str, file_path: str, kind: str = "diagnostics") -> None:
+    """A request to the server timed out.  WARNING every time — these are
+    inherently novel events worth surfacing on each occurrence."""
+    _emit(
+        server_id,
+        logging.WARNING,
+        f"{kind} timed out for {_short_path(file_path)}",
+    )
+
+
+def log_server_error(server_id: str, file_path: str, exc: BaseException) -> None:
+    """An unexpected exception bubbled out of the LSP layer.  WARNING."""
+    _emit(
+        server_id,
+        logging.WARNING,
+        f"unexpected error for {_short_path(file_path)}: {type(exc).__name__}: {exc}",
+    )
+
+
+def log_spawn_failed(server_id: str, workspace_root: str, exc: BaseException) -> None:
+    """The LSP server failed to spawn or initialize.  WARNING."""
+    _emit(
+        server_id,
+        logging.WARNING,
+        f"spawn/initialize failed for {workspace_root}: {type(exc).__name__}: {exc}",
+    )
+
+
+def reset_announce_caches() -> None:
+    """Test-only: clear the dedup caches.  Production code never calls this."""
+    with _announce_lock:
+        _announced_active.clear()
+        _announced_unavailable.clear()
+        _announced_no_root.clear()
+        _announced_no_server.clear()
+
+
+__all__ = [
+    "event_log",
+    "log_clean",
+    "log_disabled",
+    "log_active",
+    "log_diagnostics",
+    "log_no_project_root",
+    "log_server_unavailable",
+    "log_no_server_configured",
+    "log_timeout",
+    "log_server_error",
+    "log_spawn_failed",
+    "reset_announce_caches",
+]
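The once-per-key logging pattern used by `_announce_once` and `log_server_unavailable` reduces to a short standard-library sketch. `announce_once`, `report_unavailable` and the `example.lsp` logger name are hypothetical stand-ins, not this module's API. The membership check and the `add` run under one lock, which makes the pair atomic. When several threads race on the same key, exactly one of them sees `True` and logs at WARNING.

```python
import logging
import threading
from typing import Hashable, List, Set

log = logging.getLogger("example.lsp")  # hypothetical logger name
_lock = threading.Lock()
_announced: Set[Hashable] = set()


def announce_once(key: Hashable) -> bool:
    """Return True only for the first caller that presents *key*."""
    with _lock:  # check and add under one lock so two racers can't both win
        if key in _announced:
            return False
        _announced.add(key)
        return True


def report_unavailable(server_id: str, binary: str) -> int:
    """Log at WARNING the first time, DEBUG afterwards; return the level used."""
    first = announce_once((server_id, binary))
    level = logging.WARNING if first else logging.DEBUG
    log.log(level, "lsp[%s] server unavailable: %s", server_id, binary)
    return level


# Eight threads race on the same key; exactly one should win.
winners: List[int] = []


def worker() -> None:
    if announce_once(("pyright", "pyright-langserver")):
        winners.append(threading.get_ident())


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(winners))  # 1
```

The set is deliberately unbounded, for the reason the module docstring gives: an LRU that evicted a key would let a suppressed WARNING fire again.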
--- a/agent/lsp/install.py
+++ b/agent/lsp/install.py
@@ -0,0 +1,376 @@
+"""Auto-installation of LSP server binaries.
+
+Tries to install missing servers with the appropriate package manager.
+Everything lands under the Hermes-owned ``<HERMES_HOME>/lsp/`` tree,
+with executables exposed in ``<HERMES_HOME>/lsp/bin/``, so we don't
+pollute the user's global toolchain.
+
+Strategies:
+
+- ``auto`` — attempt to install with the best available package
+  manager.  This is the default.
+- ``manual`` — never install; if a binary is missing, the server is
+  silently skipped and the user is told about it via ``hermes lsp
+  status``.
+- ``off`` — same as ``manual`` for now (kept distinct so we can
+  evolve behavior later, e.g. logging differently).
+
+The actual installs happen synchronously the first time a server is
+needed, and concurrent :func:`try_install` calls for the same package
+are deduplicated via a per-package lock.
+
+Failure modes are non-fatal: every install path is wrapped in
+try/except and returns ``None`` on failure.  The tool layer then
+falls back to its in-process syntax checker, exactly as if the user
+hadn't enabled LSP at all.
+"""
+from __future__ import annotations
+
+import logging
+import os
+import shutil
+import subprocess
+import sys
+import threading
+from pathlib import Path
+from typing import Any, Dict, Optional
+
+logger = logging.getLogger("agent.lsp.install")
+
+# Package-name → install-recipe registry.  Each entry is a dict with
+# ``strategy`` (npm/go/pip/manual), ``pkg`` (the package-manager name)
+# and ``bin`` (the executable name).  After an install we look for the
+# executable in ``<HERMES_HOME>/lsp/bin/`` first, then on PATH.
+#
+# Optional fields:
+#   - ``extra_pkgs``: list of sibling packages to install alongside
+#     ``pkg`` in the same node_modules tree.  Used when an LSP server
+#     has a runtime peer dependency that npm doesn't auto-pull (e.g.
+#     typescript-language-server needs ``typescript``).
+INSTALL_RECIPES: Dict[str, Dict[str, Any]] = {
+    # Python
+    "pyright": {"strategy": "npm", "pkg": "pyright", "bin": "pyright-langserver"},
+    # JS/TS family
+    "typescript-language-server": {
+        "strategy": "npm",
+        "pkg": "typescript-language-server",
+        "bin": "typescript-language-server",
+        # typescript-language-server requires the `typescript` SDK
+        # (tsserver) to be importable from the same node_modules tree;
+        # otherwise initialize() fails with "Could not find a valid
+        # TypeScript installation".  Install them together.
+        "extra_pkgs": ["typescript"],
+    },
+    "@vue/language-server": {
+        "strategy": "npm",
+        "pkg": "@vue/language-server",
+        "bin": "vue-language-server",
+    },
+    "svelte-language-server": {
+        "strategy": "npm",
+        "pkg": "svelte-language-server",
+        "bin": "svelteserver",
+    },
+    "@astrojs/language-server": {
+        "strategy": "npm",
+        "pkg": "@astrojs/language-server",
+        "bin": "astro-ls",
+    },
+    "yaml-language-server": {
+        "strategy": "npm",
+        "pkg": "yaml-language-server",
+        "bin": "yaml-language-server",
+    },
+    "bash-language-server": {
+        "strategy": "npm",
+        "pkg": "bash-language-server",
+        "bin": "bash-language-server",
+    },
+    "intelephense": {"strategy": "npm", "pkg": "intelephense", "bin": "intelephense"},
+    "dockerfile-language-server-nodejs": {
+        "strategy": "npm",
+        "pkg": "dockerfile-language-server-nodejs",
+        "bin": "docker-langserver",
+    },
+    # Go
+    "gopls": {"strategy": "go", "pkg": "golang.org/x/tools/gopls@latest", "bin": "gopls"},
+    # Rust — too heavy (hundreds of MB to bootstrap).  We do NOT
+    # auto-install rust-analyzer; users install via rustup.
+    "rust-analyzer": {"strategy": "manual", "pkg": "", "bin": "rust-analyzer"},
+    # C/C++ — manual (clangd ships with LLVM, very heavy)
+    "clangd": {"strategy": "manual", "pkg": "", "bin": "clangd"},
+    # Lua — manual (LuaLS is platform-specific binaries from GitHub
+    # releases; complex enough that we punt to the user)
+    "lua-language-server": {"strategy": "manual", "pkg": "", "bin": "lua-language-server"},
+}
+
+
+_install_locks: Dict[str, threading.Lock] = {}
+_install_results: Dict[str, Optional[str]] = {}
+_install_lock_meta = threading.Lock()
+
+
+def hermes_lsp_bin_dir() -> Path:
+    """Return the Hermes-owned bin staging dir for LSP servers."""
+    home = os.environ.get("HERMES_HOME")
+    if not home:  # unset or empty: fall back to ~/.hermes
+        home = os.path.join(os.path.expanduser("~"), ".hermes")
+    p = Path(home) / "lsp" / "bin"
+    p.mkdir(parents=True, exist_ok=True)
+    return p
+
+
+def _existing_binary(name: str) -> Optional[str]:
+    """Probe the staging dir + PATH for a binary named ``name``."""
+    staged = hermes_lsp_bin_dir() / name
+    if staged.exists() and os.access(staged, os.X_OK):
+        return str(staged)
+    on_path = shutil.which(name)
+    if on_path:
+        return on_path
+    return None
+
+
+def _get_lock(pkg: str) -> threading.Lock:
+    with _install_lock_meta:
+        lock = _install_locks.get(pkg)
+        if lock is None:
+            lock = threading.Lock()
+            _install_locks[pkg] = lock
+        return lock
+
+
+def try_install(pkg: str, strategy: str = "auto") -> Optional[str]:
+    """Try to install ``pkg`` and return the binary path if successful.
+
+    ``strategy`` is ``"auto"``, ``"manual"``, or ``"off"``.  In
+    ``manual``/``off`` mode, this function only probes for an
+    existing binary and returns ``None`` if not found.
+
+    The install is cached per-package — a second call returns the
+    same path (or ``None``) without reinstalling.  Concurrent calls
+    are serialized.
+    """
+    if strategy != "auto":
+        # Only ``auto`` triggers an actual install.  In manual/off,
+        # we still check whether the binary already exists.
+        recipe = INSTALL_RECIPES.get(pkg, {})
+        bin_name = recipe.get("bin", pkg)
+        return _existing_binary(bin_name)
+
+    if pkg in _install_results:
+        return _install_results[pkg]
+
+    lock = _get_lock(pkg)
+    with lock:
+        # Double-check after acquiring lock.
+        if pkg in _install_results:
+            return _install_results[pkg]
+        result = _do_install(pkg)
+        _install_results[pkg] = result
+        return result
+
+
+def _do_install(pkg: str) -> Optional[str]:
+    recipe = INSTALL_RECIPES.get(pkg)
+    if recipe is None:
+        # Not in our registry — best-effort: just probe PATH.
+        return shutil.which(pkg)
+
+    strategy = recipe.get("strategy", "manual")
+    bin_name = recipe.get("bin", pkg)
+
+    # Check if already present (staging dir first, then PATH)
+    existing = _existing_binary(bin_name)
+    if existing:
+        return existing
+
+    if strategy == "manual":
+        logger.debug("[install] %s requires manual install (recipe=%s)", pkg, recipe)
+        return None
+
+    if strategy == "npm":
+        return _install_npm(
+            recipe.get("pkg", pkg),
+            bin_name,
+            extra_pkgs=recipe.get("extra_pkgs") or [],
+        )
+    if strategy == "go":
+        return _install_go(recipe.get("pkg", pkg), bin_name)
+    if strategy == "pip":
+        return _install_pip(recipe.get("pkg", pkg), bin_name)
+
+    logger.warning("[install] unknown strategy %r for %s", strategy, pkg)
+    return None
+
+
+def _install_npm(
+    pkg: str,
+    bin_name: str,
+    extra_pkgs: Optional[list] = None,
+) -> Optional[str]:
+    """Install an npm package into our staging dir.
+
+    Uses ``npm install --prefix`` so the binaries land in
+    ``<staging>/node_modules/.bin/<bin_name>`` and we symlink them up
+    one level for direct PATH-style access.
+
+    ``extra_pkgs`` is a list of sibling packages to install in the
+    same ``node_modules`` tree.  Used for LSP servers with runtime
+    peer deps that npm doesn't auto-pull (typescript-language-server
+    needs ``typescript`` next to it; intelephense ships standalone).
+    """
+    npm = shutil.which("npm")
+    if npm is None:
+        logger.info("[install] cannot install %s: npm not on PATH", pkg)
+        return None
+    staging = hermes_lsp_bin_dir().parent  # <HERMES_HOME>/lsp/
+    install_targets = [pkg] + list(extra_pkgs or [])
+    try:
+        logger.info(
+            "[install] npm install --prefix %s %s",
+            staging,
+            " ".join(install_targets),
+        )
+        proc = subprocess.run(
+            [npm, "install", "--prefix", str(staging), "--silent", "--no-fund", "--no-audit", *install_targets],
+            check=False,
+            capture_output=True,
+            text=True,
+            timeout=300,
+        )
+        if proc.returncode != 0:
+            logger.warning(
+                "[install] npm install failed for %s: %s", pkg, proc.stderr.strip()[:500]
+            )
+            return None
+    except (subprocess.TimeoutExpired, OSError) as e:
+        logger.warning("[install] npm install errored for %s: %s", pkg, e)
+        return None
+
+    # Find the bin
+    nm_bin = staging / "node_modules" / ".bin" / bin_name
+    if os.name == "nt":
+        # On Windows npm sometimes drops `.cmd` shims
+        candidates = [nm_bin, nm_bin.with_suffix(".cmd")]
+    else:
+        candidates = [nm_bin]
+    for c in candidates:
+        if c.exists():
+            # Symlink into our `lsp/bin/` for stable PATH access.
+            link = hermes_lsp_bin_dir() / c.name
+            if not link.exists():
+                try:
+                    link.symlink_to(c)
+                except (OSError, NotImplementedError):
+                    # Symlinks fail on some Windows setups — copy instead.
+                    try:
+                        shutil.copy2(c, link)
+                    except OSError:
+                        return str(c)
+            return str(link if link.exists() else c)
+    logger.warning("[install] npm install for %s succeeded but bin %s not found", pkg, bin_name)
+    return None
+
+
+def _install_go(pkg: str, bin_name: str) -> Optional[str]:
+    """Install a Go module to GOBIN=<staging>."""
+    go = shutil.which("go")
+    if go is None:
+        logger.info("[install] cannot install %s: go not on PATH", pkg)
+        return None
+    staging = hermes_lsp_bin_dir()
+    env = dict(os.environ)
+    env["GOBIN"] = str(staging)
+    try:
+        logger.info("[install] go install %s (GOBIN=%s)", pkg, staging)
+        proc = subprocess.run(
+            [go, "install", pkg],
+            check=False,
+            capture_output=True,
+            text=True,
+            timeout=600,
+            env=env,
+        )
+        if proc.returncode != 0:
+            logger.warning(
+                "[install] go install failed for %s: %s", pkg, proc.stderr.strip()[:500]
+            )
+            return None
+    except (subprocess.TimeoutExpired, OSError) as e:
+        logger.warning("[install] go install errored for %s: %s", pkg, e)
+        return None
+    bin_path = staging / bin_name
+    if os.name == "nt":
+        bin_path = bin_path.with_suffix(".exe")
+    if bin_path.exists():
+        return str(bin_path)
+    logger.warning("[install] go install for %s succeeded but bin %s not found", pkg, bin_name)
+    return None
+
+
+def _install_pip(pkg: str, bin_name: str) -> Optional[str]:
+    """Install a Python package into a hermes-owned target dir.
+
+    We avoid polluting the user's site-packages by using
+    ``pip install --target``.  Bins go into
+    ``<staging>/python-packages/bin/`` which we symlink into
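+    ``<staging>/bin``.  Note: this only works for packages that ship a
+    console script, and that script can only import its package when
+    ``<staging>/python-packages`` is on ``PYTHONPATH`` at spawn time.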
+    """
+    pip_target = hermes_lsp_bin_dir().parent / "python-packages"
+    pip_target.mkdir(parents=True, exist_ok=True)
+    try:
+        logger.info("[install] pip install --target %s %s", pip_target, pkg)
+        proc = subprocess.run(
+            [sys.executable, "-m", "pip", "install", "--target", str(pip_target), "--quiet", pkg],
+            check=False,
+            capture_output=True,
+            text=True,
+            timeout=300,
+        )
+        if proc.returncode != 0:
+            logger.warning(
+                "[install] pip install failed for %s: %s", pkg, proc.stderr.strip()[:500]
+            )
+            return None
+    except (subprocess.TimeoutExpired, OSError) as e:
+        logger.warning("[install] pip install errored for %s: %s", pkg, e)
+        return None
+    # Look for the script
+    bin_path = pip_target / "bin" / bin_name
+    if bin_path.exists():
+        link = hermes_lsp_bin_dir() / bin_name
+        if not link.exists():
+            try:
+                link.symlink_to(bin_path)
+            except (OSError, NotImplementedError):
+                try:
+                    shutil.copy2(bin_path, link)
+                except OSError:
+                    return str(bin_path)
+        return str(link if link.exists() else bin_path)
+    logger.warning("[install] pip install for %s succeeded but bin %s not found", pkg, bin_name)
+    return None
+
+
+def detect_status(pkg: str) -> str:
+    """Return ``installed``, ``missing``, or ``manual-only`` for a package.
+
+    Used by the ``hermes lsp status`` CLI to give users a quick
+    overview of what's available without spawning anything.
+    """
+    recipe = INSTALL_RECIPES.get(pkg)
+    bin_name = recipe.get("bin", pkg) if recipe else pkg
+    if _existing_binary(bin_name):
+        return "installed"
+    if recipe and recipe.get("strategy") == "manual":
+        return "manual-only"
+    return "missing"
+
+
+__all__ = [
+    "INSTALL_RECIPES",
+    "try_install",
+    "detect_status",
+    "hermes_lsp_bin_dir",
+]
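Both `_install_npm` and `_install_pip` above end with the same link-or-copy fallback. A minimal sketch of pulling it into one helper, assuming the same semantics as the inline code (keep an existing entry, else symlink, else copy, else return the source path). `link_into_bin` is a hypothetical name that the module does not define, and the `mkdir` call exists only so the sketch runs on its own.

```python
import shutil
from pathlib import Path


def link_into_bin(src: Path, bin_dir: Path) -> str:
    """Expose ``src`` as ``bin_dir / src.name`` and return the path to run.

    Same fallback chain as the inline code in ``_install_npm`` and
    ``_install_pip``: keep an existing entry, else symlink, else copy,
    else hand back ``src`` itself.
    """
    bin_dir.mkdir(parents=True, exist_ok=True)  # sketch-only: make it standalone
    link = bin_dir / src.name
    if not link.exists():
        try:
            link.symlink_to(src)
        except (OSError, NotImplementedError):
            # Symlinks can be unavailable (e.g. unprivileged Windows): copy.
            try:
                shutil.copy2(src, link)
            except OSError:
                # Neither worked: run the binary from where it was installed.
                return str(src)
    return str(link if link.exists() else src)
```

With a helper like this, the npm branch would reduce to `return link_into_bin(c, hermes_lsp_bin_dir())` and the pip branch to `return link_into_bin(bin_path, hermes_lsp_bin_dir())`, since in both cases the link name equals the source file name.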
--- a/agent/lsp/manager.py
+++ b/agent/lsp/manager.py
@@ -0,0 +1,644 @@
+"""Service-level orchestration for LSP clients.
+
+The :class:`LSPService` is the bridge between the synchronous
+file_operations layer and the async :class:`agent.lsp.client.LSPClient`.
+
+Design choices:
+
+- A **single asyncio event loop** runs in a background thread.  All
+  client work happens on that loop.  Synchronous callers from
+  ``tools/file_operations.py`` use :meth:`get_diagnostics_sync` to
+  open + wait + drain in one blocking call.
+
+- One client per ``(server_id, workspace_root)`` key.  Lazy spawn:
+  the first request for a key spawns the client; subsequent requests
+  re-use it.
+
+- A **broken-set** records ``(server_id, workspace_root)`` pairs that
+  failed to spawn or initialize.  These are never retried for the
+  life of the service.  Mirrors OpenCode's design.
+
+- A **delta baseline** map keeps "diagnostics-as-of-the-last-snapshot"
+  per file.  ``snapshot_baseline()`` is called BEFORE a write; the
+  next ``get_diagnostics_sync()`` returns only diagnostics that
+  weren't in the baseline.  The pattern is lifted from Claude Code's
+  ``beforeFileEdited`` / ``getNewDiagnostics``, except that here it
+  is wired to the local LSP layer instead of MCP IDE RPC.
+
+:meth:`is_active` only reports the global ``enabled`` switch from
+config; per-file gating lives in :meth:`enabled_for`.  When LSP is
+disabled in config, when no git workspace can be detected, when no
+server matches the file, or when the matching server is disabled or
+in the broken-set (for example after a failed spawn because its
+binary is missing and auto-install is off), the service is skipped
+and the file_operations layer falls through to the in-process
+syntax check.
+"""
+from __future__ import annotations
+
+import asyncio
+import logging
+import os
+import threading
+import time
+from concurrent.futures import Future as ConcurrentFuture
+from typing import Any, Callable, Dict, List, Optional, Tuple
+
+from agent.lsp import eventlog
+from agent.lsp.client import (
+    DIAGNOSTICS_DOCUMENT_WAIT,
+    LSPClient,
+    file_uri,
+)
+from agent.lsp.servers import (
+    ServerContext,
+    ServerDef,
+    SpawnSpec,
+    find_server_for_file,
+    language_id_for,
+)
+from agent.lsp.workspace import (
+    clear_cache,
+    is_inside_workspace,
+    resolve_workspace_for_file,
+)
+
+logger = logging.getLogger("agent.lsp.manager")
+
+DEFAULT_IDLE_TIMEOUT = 600  # seconds; servers idle for >10min get reaped
+
+
+class _BackgroundLoop:
+    """A daemon thread that owns one asyncio event loop.
+
+    Provides :meth:`run` for synchronous callers — submits a coroutine
+    to the loop and blocks until it finishes (or a timeout fires).
+    """
+
+    def __init__(self) -> None:
+        self._loop: Optional[asyncio.AbstractEventLoop] = None
+        self._thread: Optional[threading.Thread] = None
+        self._ready = threading.Event()
+
+    def start(self) -> None:
+        if self._thread is not None:
+            return
+        self._thread = threading.Thread(
+            target=self._run_forever,
+            name="hermes-lsp-loop",
+            daemon=True,
+        )
+        self._thread.start()
+        self._ready.wait(timeout=5.0)
+
+    def _run_forever(self) -> None:
+        loop = asyncio.new_event_loop()
+        self._loop = loop
+        asyncio.set_event_loop(loop)
+        self._ready.set()
+        try:
+            loop.run_forever()
+        finally:
+            try:
+                loop.close()
+            except Exception:  # noqa: BLE001
+                pass
+
+    def run(self, coro, *, timeout: Optional[float] = None) -> Any:
+        """Submit a coroutine to the loop and block until done.
+
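+        Returns the coroutine's result, or raises its exception.  If
+        the wait times out or fails, the scheduled future is cancelled
+        before the exception propagates to the caller.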
+        """
+        from agent.async_utils import safe_schedule_threadsafe
+        if self._loop is None:
+            if asyncio.iscoroutine(coro):
+                coro.close()
+            raise RuntimeError("background loop not started")
+        fut = safe_schedule_threadsafe(coro, self._loop)
+        if fut is None:
+            raise RuntimeError("background loop not running")
+        try:
+            return fut.result(timeout=timeout)
+        except Exception:
+            fut.cancel()
+            raise
+
+    def stop(self) -> None:
+        loop = self._loop
+        if loop is None:
+            return
+        try:
+            loop.call_soon_threadsafe(loop.stop)
+        except RuntimeError:
+            pass
+        if self._thread is not None:
+            self._thread.join(timeout=2.0)
+        self._loop = None
+        self._thread = None
+
+
+class LSPService:
+    """The process-wide LSP service.
+
+    Created once via :meth:`create_from_config`; the
+    :func:`agent.lsp.get_service` accessor manages the singleton.
+    Most callers should use that accessor rather than constructing
+    :class:`LSPService` directly.
+    """
+
+    # ------------------------------------------------------------------
+    # construction + factory
+    # ------------------------------------------------------------------
+
+    def __init__(
+        self,
+        *,
+        enabled: bool,
+        wait_mode: str,
+        wait_timeout: float,
+        install_strategy: str,
+        binary_overrides: Optional[Dict[str, List[str]]] = None,
+        env_overrides: Optional[Dict[str, Dict[str, str]]] = None,
+        init_overrides: Optional[Dict[str, Dict[str, Any]]] = None,
+        disabled_servers: Optional[List[str]] = None,
+        idle_timeout: float = DEFAULT_IDLE_TIMEOUT,
+    ) -> None:
+        self._enabled = enabled
+        self._wait_mode = wait_mode if wait_mode in {"document", "full"} else "document"
+        self._wait_timeout = wait_timeout
+        self._install_strategy = install_strategy
+        self._binary_overrides = binary_overrides or {}
+        self._env_overrides = env_overrides or {}
+        self._init_overrides = init_overrides or {}
+        self._disabled_servers = set(disabled_servers or [])
+        self._idle_timeout = idle_timeout
+
+        self._loop = _BackgroundLoop()
+        if self._enabled:
+            self._loop.start()
+
+        # Per-(server_id, workspace_root) state
+        self._clients: Dict[Tuple[str, str], LSPClient] = {}
+        self._broken: set = set()
+        self._spawning: Dict[Tuple[str, str], asyncio.Future] = {}
+        self._last_used: Dict[Tuple[str, str], float] = {}
+        self._state_lock = threading.Lock()
+
+        # Delta baseline: file path → snapshot of diagnostics taken
+        # immediately before a write.  ``get_diagnostics_sync`` filters
+        # out anything in the baseline so the agent only sees errors
+        # introduced by the current edit.
+        self._delta_baseline: Dict[str, List[Dict[str, Any]]] = {}
+
+    @classmethod
+    def create_from_config(cls) -> Optional["LSPService"]:
+        """Build a service from ``hermes_cli.config`` settings.
+
+        Returns ``None`` if the config can't be loaded.  The service
+        itself returns ``is_active()`` False when LSP is disabled.
+        """
+        try:
+            from hermes_cli.config import load_config
+            cfg = load_config()
+        except Exception as e:  # noqa: BLE001
+            logger.debug("LSP config load failed: %s", e)
+            return None
+
+        lsp_cfg = (cfg.get("lsp") or {}) if isinstance(cfg, dict) else {}
+        if not isinstance(lsp_cfg, dict):
+            lsp_cfg = {}
+
+        enabled = bool(lsp_cfg.get("enabled", True))
+        wait_mode = lsp_cfg.get("wait_mode", "document")
+        wait_timeout = float(lsp_cfg.get("wait_timeout", DIAGNOSTICS_DOCUMENT_WAIT))
+        install_strategy = lsp_cfg.get("install_strategy", "auto")
+        servers_cfg = lsp_cfg.get("servers") or {}
+        disabled = []
+        binary_overrides: Dict[str, List[str]] = {}
+        env_overrides: Dict[str, Dict[str, str]] = {}
+        init_overrides: Dict[str, Dict[str, Any]] = {}
+        if isinstance(servers_cfg, dict):
+            for name, sub in servers_cfg.items():
+                if not isinstance(sub, dict):
+                    continue
+                if sub.get("disabled"):
+                    disabled.append(name)
+                cmd = sub.get("command")
+                if isinstance(cmd, list) and cmd:
+                    binary_overrides[name] = cmd
+                env = sub.get("env")
+                if isinstance(env, dict):
+                    env_overrides[name] = {k: str(v) for k, v in env.items()}
+                init = sub.get("initialization_options")
+                if isinstance(init, dict):
+                    init_overrides[name] = init
+
+        return cls(
+            enabled=enabled,
+            wait_mode=wait_mode,
+            wait_timeout=wait_timeout,
+            install_strategy=install_strategy,
+            binary_overrides=binary_overrides,
+            env_overrides=env_overrides,
+            init_overrides=init_overrides,
+            disabled_servers=disabled,
+        )
+
+    # ------------------------------------------------------------------
+    # public API
+    # ------------------------------------------------------------------
+
+    def is_active(self) -> bool:
+        """Return True iff this service should be consulted at all."""
+        return self._enabled
+
+    def enabled_for(self, file_path: str) -> bool:
+        """Return True iff LSP should run for this specific file.
+
+        Gates on workspace detection (file or cwd inside a git worktree),
+        on whether any registered server matches the extension, and
+        on whether the (server_id, workspace_root) pair is in the
+        broken-set from a previous spawn failure.
+
+        Files in already-broken pairs return False so the file_operations
+        layer skips the LSP path entirely — no spawn attempts, no
+        timeout cost — until the service is restarted (``hermes lsp
+        restart``) or the process exits.
+        """
+        if not self._enabled:
+            return False
+        srv = find_server_for_file(file_path)
+        if srv is None or srv.server_id in self._disabled_servers:
+            return False
+        ws_root, gated_in = resolve_workspace_for_file(file_path)
+        if not (ws_root and gated_in):
+            return False
+        # Broken-set short-circuit.  Use the per-server root if we can
+        # compute one cheaply; otherwise fall back to the workspace
+        # root as the broken key (which is what _get_or_spawn would
+        # have used anyway when it failed).
+        try:
+            per_server_root = srv.resolve_root(file_path, ws_root) or ws_root
+        except Exception:  # noqa: BLE001
+            per_server_root = ws_root
+        if (srv.server_id, per_server_root) in self._broken:
+            return False
+        return True
+
+    def snapshot_baseline(self, file_path: str) -> None:
+        """Snapshot current diagnostics for ``file_path`` as the delta baseline.
+
+        Called BEFORE a write so the next ``get_diagnostics_sync()``
+        can filter out pre-existing errors.  Best-effort — failures
+        are silently swallowed so a flaky server can't break a write.
+
+        Any failure of the outer call (typically a timeout while the
+        server hangs during initialize) marks the (server_id,
+        workspace_root) pair as broken so subsequent edits skip it
+        instantly instead of re-paying the timeout cost.
+        """
+        if not self.enabled_for(file_path):
+            return
+        try:
+            diags = self._loop.run(self._snapshot_async(file_path), timeout=8.0)
+            self._delta_baseline[os.path.abspath(file_path)] = diags or []
+        except Exception as e:  # noqa: BLE001
+            logger.debug("baseline snapshot failed for %s: %s", file_path, e)
+            self._mark_broken_for_file(file_path, e)
+            self._delta_baseline[os.path.abspath(file_path)] = []
+
+    def get_diagnostics_sync(
+        self,
+        file_path: str,
+        *,
+        delta: bool = True,
+        timeout: Optional[float] = None,
+        line_shift: Optional[Callable[[int], Optional[int]]] = None,
+    ) -> List[Dict[str, Any]]:
+        """Synchronously open ``file_path`` in the right server, wait for
+        diagnostics, return them.
+
+        If ``delta`` is True (default), the result is filtered against
+        any baseline previously captured via :meth:`snapshot_baseline`.
+        Diagnostics present in the baseline are removed so the caller
+        only sees errors introduced by the current edit.
+
+        When ``line_shift`` is provided, baseline diagnostics are
+        remapped through it before the set-difference.  This handles
+        the case where the edit deleted or inserted lines, causing
+        pre-existing diagnostics below the edit point to surface at
+        different line numbers in the post-edit snapshot — without
+        the shift, they'd all look "introduced by this edit".  Pass
+        a callable built by
+        :func:`agent.lsp.range_shift.build_line_shift` (pre_text,
+        post_text).  Omit when pre/post content isn't available;
+        the unshifted comparison still catches diagnostics that
+        didn't move.
+
+        Returns an empty list when LSP is disabled, when no workspace
+        can be detected, when no server matches, or when the server
+        can't be spawned.  Never raises.
+        """
+        if not self.enabled_for(file_path):
+            return []
+
+        # Resolve server_id eagerly so we can emit structured logs even
+        # when the request errors out below.
+        srv = find_server_for_file(file_path)
+        server_id = srv.server_id if srv else "?"
+
+        try:
+            t = timeout if timeout is not None else self._wait_timeout + 2.0
+            diags = self._loop.run(self._open_and_wait_async(file_path), timeout=t) or []
+        except asyncio.TimeoutError as e:
+            eventlog.log_timeout(server_id, file_path)
+            logger.debug("LSP diagnostics timeout for %s: %s", file_path, e)
+            self._mark_broken_for_file(file_path, e)
+            return []
+        except Exception as e:  # noqa: BLE001
+            eventlog.log_server_error(server_id, file_path, e)
+            logger.debug("LSP diagnostics fetch failed for %s: %s", file_path, e)
+            self._mark_broken_for_file(file_path, e)
+            return []
+
+        abs_path = os.path.abspath(file_path)
+        if delta:
+            baseline = self._delta_baseline.get(abs_path) or []
+            if baseline:
+                if line_shift is not None:
+                    # Remap baseline diagnostics into post-edit
+                    # coordinates so shifted-but-otherwise-identical
+                    # entries hash equal under _diag_key.  Entries
+                    # that mapped into a deleted region drop out
+                    # silently — they no longer apply.
+                    from agent.lsp.range_shift import shift_baseline
+                    baseline = shift_baseline(baseline, line_shift)
+                seen = {_diag_key(d) for d in baseline}
+                diags = [d for d in diags if _diag_key(d) not in seen]
+            # Roll baseline forward — next call returns deltas relative
+            # to the just-emitted state, mirroring claude-code's
+            # diagnosticTracking.
+            try:
+                fresh = self._loop.run(self._current_diags_async(file_path), timeout=2.0) or []
+            except Exception:  # noqa: BLE001
+                fresh = []
+            if fresh:
+                self._delta_baseline[abs_path] = fresh
+
+        if diags:
+            eventlog.log_diagnostics(server_id, file_path, len(diags))
+        else:
+            eventlog.log_clean(server_id, file_path)
+        return diags
+
+    def _mark_broken_for_file(self, file_path: str, exc: BaseException) -> None:
+        """Mark the (server_id, workspace_root) pair as broken so subsequent
+        edits skip it instantly instead of re-paying timeout cost.
+
+        Called when the outer ``_loop.run`` timeout cancels an in-flight
+        spawn/initialize that the inner ``_get_or_spawn`` task was still
+        holding open.  Without this, every subsequent write would re-enter
+        the spawn path and re-pay the full ``snapshot_baseline``
+        timeout (8s) until the binary is fixed.
+
+        Also kills any orphan client process that survived the cancelled
+        future, and emits a single eventlog WARNING so the user knows
+        which server gave up.
+
+        ``exc`` is whatever exception the outer wrapper caught — used
+        only for logging, never re-raised.
+        """
+        srv = find_server_for_file(file_path)
+        if srv is None:
+            return
+        ws_root, gated = resolve_workspace_for_file(file_path)
+        if not (ws_root and gated):
+            return
+        try:
+            per_server_root = srv.resolve_root(file_path, ws_root) or ws_root
+        except Exception:  # noqa: BLE001
+            per_server_root = ws_root
+        key = (srv.server_id, per_server_root)
+        already_broken = key in self._broken
+        self._broken.add(key)
+
+        # Kill any client we managed to spawn before the timeout.  The
+        # cancelled future never reached the broken-set add inside
+        # ``_get_or_spawn`` so the client may still be hanging in
+        # ``_clients`` with a half-initialized state.
+        with self._state_lock:
+            client = self._clients.pop(key, None)
+        if client is not None:
+            try:
+                # Best-effort shutdown bounded to one second: we're already
+                # on a slow path, so don't wait on a wedged server any longer.
+                self._loop.run(client.shutdown(), timeout=1.0)
+            except Exception:  # noqa: BLE001
+                pass
+
+        if not already_broken:
+            eventlog.log_spawn_failed(srv.server_id, per_server_root, exc)
+
+    def shutdown(self) -> None:
+        """Tear down all clients and stop the background loop."""
+        if not self._enabled:
+            return
+        try:
+            self._loop.run(self._shutdown_async(), timeout=10.0)
+        except Exception as e:  # noqa: BLE001
+            logger.debug("LSP shutdown error: %s", e)
+        self._loop.stop()
+        clear_cache()
+
+    # ------------------------------------------------------------------
+    # async internals
+    # ------------------------------------------------------------------
+
+    async def _snapshot_async(self, file_path: str) -> List[Dict[str, Any]]:
+        client = await self._get_or_spawn(file_path)
+        if client is None:
+            return []
+        try:
+            version = await client.open_file(file_path, language_id=language_id_for(file_path))
+            await client.wait_for_diagnostics(file_path, version, mode=self._wait_mode)
+        except Exception as e:  # noqa: BLE001
+            logger.debug("snapshot open/wait failed: %s", e)
+            return []
+        self._last_used[(client.server_id, client.workspace_root)] = time.time()
+        return list(client.diagnostics_for(file_path))
+
+    async def _open_and_wait_async(self, file_path: str) -> List[Dict[str, Any]]:
+        client = await self._get_or_spawn(file_path)
+        if client is None:
+            return []
+        try:
+            version = await client.open_file(file_path, language_id=language_id_for(file_path))
+            await client.save_file(file_path)
+            await client.wait_for_diagnostics(file_path, version, mode=self._wait_mode)
+        except Exception as e:  # noqa: BLE001
+            logger.debug("open/wait failed for %s: %s", file_path, e)
+            return []
+        self._last_used[(client.server_id, client.workspace_root)] = time.time()
+        return list(client.diagnostics_for(file_path))
+
+    async def _current_diags_async(self, file_path: str) -> List[Dict[str, Any]]:
+        ws, gated = resolve_workspace_for_file(file_path)
+        srv = find_server_for_file(file_path)
+        if not (ws and gated and srv):
+            return []
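+        # Clients are keyed by the per-server root (see _get_or_spawn).
+        key = (srv.server_id, srv.resolve_root(file_path, ws) or ws)
+        with self._state_lock:
+            client = self._clients.get(key)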
+        if client is None:
+            return []
+        return list(client.diagnostics_for(file_path))
+
+    async def _get_or_spawn(self, file_path: str) -> Optional[LSPClient]:
+        srv = find_server_for_file(file_path)
+        if srv is None:
+            return None
+        if srv.server_id in self._disabled_servers:
+            eventlog.log_disabled(srv.server_id, file_path, "disabled in config")
+            return None
+        ws_root, gated = resolve_workspace_for_file(file_path)
+        if not (ws_root and gated):
+            eventlog.log_no_project_root(srv.server_id, file_path)
+            return None
+        per_server_root = srv.resolve_root(file_path, ws_root)
+        if per_server_root is None:
+            eventlog.log_disabled(
+                srv.server_id, file_path, "exclude marker hit (server gated off)"
+            )
+            return None  # exclude marker hit, server gated off
+
+        key = (srv.server_id, per_server_root)
+        if key in self._broken:
+            return None
+        with self._state_lock:
+            client = self._clients.get(key)
+            if client is not None and client.is_running:
+                eventlog.log_active(srv.server_id, per_server_root)
+                return client
+            spawning = self._spawning.get(key)
+        if spawning is not None:
+            try:
+                return await spawning
+            except Exception:  # noqa: BLE001
+                return None
+
+        # Begin spawn
+        loop = asyncio.get_running_loop()
+        spawn_future: asyncio.Future = loop.create_future()
+        with self._state_lock:
+            self._spawning[key] = spawn_future
+        try:
+            ctx = ServerContext(
+                workspace_root=per_server_root,
+                install_strategy=self._install_strategy,
+                binary_overrides=self._binary_overrides,
+                env_overrides=self._env_overrides,
+                init_overrides=self._init_overrides,
+            )
+            spec = srv.build_spawn(per_server_root, ctx)
+            if spec is None:
+                # ``build_spawn`` returns None when the binary can't be
+                # located (auto-install disabled, manual-only server,
+                # or install attempt failed).  Surface this once via
+                # the structured logger so the user can act on it.
+                eventlog.log_server_unavailable(srv.server_id, srv.server_id)
+                self._broken.add(key)
+                spawn_future.set_result(None)
+                return None
+            client = LSPClient(
+                server_id=srv.server_id,
+                workspace_root=spec.workspace_root,
+                command=spec.command,
+                env=spec.env,
+                cwd=spec.cwd,
+                initialization_options=spec.initialization_options,
+                seed_diagnostics_on_first_push=spec.seed_diagnostics_on_first_push or srv.seed_first_push,
+            )
+            try:
+                await client.start()
+            except Exception as e:  # noqa: BLE001
+                eventlog.log_spawn_failed(srv.server_id, per_server_root, e)
+                self._broken.add(key)
+                spawn_future.set_result(None)
+                return None
+            with self._state_lock:
+                self._clients[key] = client
+            self._last_used[key] = time.time()
+            eventlog.log_active(srv.server_id, per_server_root)
+            spawn_future.set_result(client)
+            return client
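+        finally:
+            # If we raised or were cancelled before resolving the future,
+            # resolve it to None so concurrent waiters don't hang.
+            if not spawn_future.done():
+                spawn_future.set_result(None)
+            with self._state_lock:
+                self._spawning.pop(key, None)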
+
+    async def _shutdown_async(self) -> None:
+        with self._state_lock:
+            clients = list(self._clients.values())
+            self._clients.clear()
+            self._broken.clear()
+            self._last_used.clear()
+        await asyncio.gather(
+            *(c.shutdown() for c in clients),
+            return_exceptions=True,
+        )
+
+    # ------------------------------------------------------------------
+    # status / introspection (used by ``hermes lsp status``)
+    # ------------------------------------------------------------------
+
+    def get_status(self) -> Dict[str, Any]:
+        """Return a snapshot of the service for the CLI status command."""
+        with self._state_lock:
+            clients = [
+                {
+                    "server_id": k[0],
+                    "workspace_root": k[1],
+                    "state": c.state,
+                    "running": c.is_running,
+                }
+                for k, c in self._clients.items()
+            ]
+            broken = list(self._broken)
+        return {
+            "enabled": self._enabled,
+            "wait_mode": self._wait_mode,
+            "wait_timeout": self._wait_timeout,
+            "install_strategy": self._install_strategy,
+            "clients": clients,
+            "broken": broken,
+            "disabled_servers": sorted(self._disabled_servers),
+        }
+
+
+def _diag_key(d: Dict[str, Any]) -> str:
+    """Content equality key used for cross-edit delta filtering.
+
+    Includes the diagnostic's position range — when used together
+    with :func:`agent.lsp.range_shift.shift_baseline`, the baseline
+    is line-shifted into post-edit coordinates BEFORE this key is
+    computed, so identical-but-shifted diagnostics hash equal.  Two
+    genuinely distinct diagnostics at different lines (e.g. the same
+    error class introduced at a second site) hash differently and
+    are surfaced as new.
+
+    Mirrors :func:`agent.lsp.client._diagnostic_key`; intentionally
+    identical so the two layers agree on diagnostic identity.
+    """
+    rng = d.get("range") or {}
+    start = rng.get("start") or {}
+    end = rng.get("end") or {}
+    code = d.get("code")
+    if code is not None and not isinstance(code, str):
+        code = str(code)
+    return "\x00".join(
+        [
+            str(d.get("severity") or 1),
+            str(code or ""),
+            str(d.get("source") or ""),
+            str(d.get("message") or "").strip(),
+            f"{start.get('line', 0)}:{start.get('character', 0)}-{end.get('line', 0)}:{end.get('character', 0)}",
+        ]
+    )
+
+
+__all__ = ["LSPService"]
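The delta filter in `get_diagnostics_sync` is a set difference over `_diag_key`, taken after the baseline has been moved into post-edit line numbers. The sketch below reproduces that flow for the simplest case, a single contiguous edit. `single_edit_shift`, `shift_diag`, and `new_diagnostics` are hypothetical stand-ins: the real code builds its shift with `agent.lsp.range_shift.build_line_shift(pre_text, post_text)` and applies it with `shift_baseline`, whose internals are not shown here and may treat partially overlapping ranges differently.

```python
from typing import Any, Callable, Dict, List, Optional

Diag = Dict[str, Any]
LineShift = Callable[[int], Optional[int]]


def diag_key(d: Diag) -> str:
    """Identity key built from the same fields as ``_diag_key`` in manager.py."""
    rng = d.get("range") or {}
    start = rng.get("start") or {}
    end = rng.get("end") or {}
    code = d.get("code")
    if code is not None and not isinstance(code, str):
        code = str(code)
    return "\x00".join(
        [
            str(d.get("severity") or 1),
            str(code or ""),
            str(d.get("source") or ""),
            str(d.get("message") or "").strip(),
            f"{start.get('line', 0)}:{start.get('character', 0)}"
            f"-{end.get('line', 0)}:{end.get('character', 0)}",
        ]
    )


def single_edit_shift(edit_start: int, removed: int, added: int) -> LineShift:
    """Hypothetical shift for one edit that replaced ``removed`` lines
    starting at 0-based ``edit_start`` with ``added`` new lines."""

    def shift(line: int) -> Optional[int]:
        if line < edit_start:
            return line  # above the edit: unchanged
        if line < edit_start + removed:
            return None  # inside the replaced region: no longer applies
        return line + (added - removed)  # below the edit: moved

    return shift


def shift_diag(d: Diag, line_shift: LineShift) -> Optional[Diag]:
    """Return a copy of ``d`` in post-edit coordinates, or None if dropped."""
    rng = d.get("range") or {}
    start = dict(rng.get("start") or {})
    end = dict(rng.get("end") or {})
    new_start = line_shift(start.get("line", 0))
    new_end = line_shift(end.get("line", 0))
    if new_start is None or new_end is None:
        return None
    start["line"], end["line"] = new_start, new_end
    return {**d, "range": {"start": start, "end": end}}


def new_diagnostics(
    baseline: List[Diag],
    current: List[Diag],
    line_shift: Optional[LineShift] = None,
) -> List[Diag]:
    """Diagnostics in ``current`` that the (optionally shifted) baseline lacks."""
    if line_shift is not None:
        shifted = (shift_diag(d, line_shift) for d in baseline)
        baseline = [d for d in shifted if d is not None]
    seen = {diag_key(d) for d in baseline}
    return [d for d in current if diag_key(d) not in seen]
```

For example, if two lines are inserted above an existing "unused import" warning, it moves from line 10 to line 12. With the shift applied it is recognized as pre-existing; without it, the old warning would be reported again as new.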
--- a/agent/lsp/protocol.py
+++ b/agent/lsp/protocol.py
@@ -0,0 +1,196 @@
+"""Minimal LSP JSON-RPC 2.0 framer over async streams.
+
+LSP wire format:
+
+    Content-Length: <bytes>\\r\\n
+    \\r\\n
+    <utf-8 JSON body>
+
+The body is a JSON-RPC 2.0 envelope: request, response, or notification.
+
+This module replaces what ``vscode-jsonrpc/node`` would do in a
+TypeScript implementation.  We keep it deliberately small — just the
+framer + envelope helpers — so :class:`agent.lsp.client.LSPClient` can
+focus on protocol semantics.
+"""
+from __future__ import annotations
+
+import asyncio
+import json
+import logging
+from typing import Any, Optional, Tuple
+
+logger = logging.getLogger("agent.lsp.protocol")
+
+# LSP error codes we care about.  Full list in
+# https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#errorCodes
+ERROR_CONTENT_MODIFIED = -32801
+ERROR_REQUEST_CANCELLED = -32800
+ERROR_METHOD_NOT_FOUND = -32601
+
+
+class LSPProtocolError(Exception):
+    """Raised when the wire protocol is violated.
+
+    Distinct from :class:`LSPRequestError` which represents a server
+    returning a JSON-RPC error response — that's protocol-conformant.
+    This exception means the framing or envelope itself is broken.
+    """
+
+
+class LSPRequestError(Exception):
+    """Raised when an LSP request returns an error response.
+
+    Carries the JSON-RPC ``code``, ``message``, and optional ``data``.
+    """
+
+    def __init__(self, code: int, message: str, data: Any = None) -> None:
+        super().__init__(f"LSP error {code}: {message}")
+        self.code = code
+        self.message = message
+        self.data = data
+
+
+def encode_message(obj: dict) -> bytes:
+    """Encode a JSON-RPC envelope as a Content-Length framed byte string.
+
+    The body is encoded as compact UTF-8 JSON (no spaces between
+    separators) — matches what ``vscode-jsonrpc`` emits and keeps the
+    Content-Length count exact.
+    """
+    body = json.dumps(obj, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
+    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
+    return header + body
+
+
+async def read_message(reader: asyncio.StreamReader) -> Optional[dict]:
+    """Read one Content-Length framed JSON-RPC message from the stream.
+
+    Returns ``None`` on clean EOF (server closed stdout cleanly between
+    messages — typical shutdown).  Raises :class:`LSPProtocolError` on
+    malformed framing.
+
+    The reader is advanced to just past the JSON body on success.
+    """
+    headers: dict = {}
+    header_bytes = 0
+    while True:
+        try:
+            line = await reader.readuntil(b"\r\n")
+        except asyncio.LimitOverrunError as e:
+            raise LSPProtocolError("LSP header line exceeded the stream buffer limit") from e
+        except asyncio.IncompleteReadError as e:
+            # EOF mid-headers is clean only if no header block has started.
+            if not e.partial and not headers:
+                return None
+            raise LSPProtocolError(
+                f"unexpected EOF while reading LSP headers (partial={e.partial!r})") from e
+        # Defensive cap against a server streaming headers without ever
+        # emitting CRLF-CRLF.  Caps total header bytes at 8 KiB — a
+        # well-behaved server fits in well under 200 bytes.
+        header_bytes += len(line)
+        if header_bytes > 8192:
+            raise LSPProtocolError(
+                "LSP header block exceeded 8 KiB without terminator"
+            )
+        line = line[:-2]  # strip CRLF
+        if not line:
+            break  # blank line ends header block
+        try:
+            key, sep, value = line.decode("ascii").partition(":")
+        except UnicodeDecodeError as e:
+            raise LSPProtocolError(f"non-ASCII LSP header: {line!r}") from e
+        if not sep or not key.strip():
+            raise LSPProtocolError(f"malformed LSP header line: {line!r}")
+        headers[key.strip().lower()] = value.strip()
+
+    cl = headers.get("content-length")
+    if cl is None:
+        raise LSPProtocolError(f"LSP message missing Content-Length: {headers!r}")
+    try:
+        n = int(cl)
+    except ValueError as e:
+        raise LSPProtocolError(f"non-integer Content-Length: {cl!r}") from e
+    if n < 0 or n > 64 * 1024 * 1024:  # 64 MiB sanity cap
+        raise LSPProtocolError(f"unreasonable Content-Length: {n}")
+
+    try:
+        body = await reader.readexactly(n)
+    except asyncio.IncompleteReadError as e:
+        raise LSPProtocolError(
+            f"truncated LSP body: expected {n} bytes, got {len(e.partial)}"
+        ) from e
+
+    try:
+        return json.loads(body.decode("utf-8"))
+    except json.JSONDecodeError as e:
+        raise LSPProtocolError(f"invalid JSON in LSP body: {e}") from e
+    except UnicodeDecodeError as e:
+        raise LSPProtocolError(f"non-UTF-8 LSP body: {e}") from e
+
+
+def make_request(req_id: int, method: str, params: Any) -> dict:
+    """Build a JSON-RPC 2.0 request envelope."""
+    msg: dict = {"jsonrpc": "2.0", "id": req_id, "method": method}
+    if params is not None:
+        msg["params"] = params
+    return msg
+
+
+def make_notification(method: str, params: Any) -> dict:
+    """Build a JSON-RPC 2.0 notification envelope (no ``id``)."""
+    msg: dict = {"jsonrpc": "2.0", "method": method}
+    if params is not None:
+        msg["params"] = params
+    return msg
+
+
+def make_response(req_id: Any, result: Any) -> dict:
+    """Build a JSON-RPC 2.0 success response envelope."""
+    return {"jsonrpc": "2.0", "id": req_id, "result": result}
+
+
+def make_error_response(req_id: Any, code: int, message: str, data: Any = None) -> dict:
+    """Build a JSON-RPC 2.0 error response envelope."""
+    err: dict = {"code": code, "message": message}
+    if data is not None:
+        err["data"] = data
+    return {"jsonrpc": "2.0", "id": req_id, "error": err}
+
+
+def classify_message(msg: dict) -> Tuple[str, Any]:
+    """Return ``(kind, key)`` where kind is one of ``request``,
+    ``response``, ``notification``, ``invalid``.
+
+    The key is the request id for request/response, the method name
+    for notifications, and ``None`` for invalid messages.
+    """
+    if not isinstance(msg, dict):
+        return "invalid", None
+    if msg.get("jsonrpc") != "2.0":
+        return "invalid", None
+    has_id = "id" in msg
+    has_method = "method" in msg
+    if has_id and has_method:
+        return "request", msg["id"]
+    if has_id and ("result" in msg or "error" in msg):
+        return "response", msg["id"]
+    if has_method and not has_id:
+        return "notification", msg["method"]
+    return "invalid", None
+
+
+__all__ = [
+    "ERROR_CONTENT_MODIFIED",
+    "ERROR_REQUEST_CANCELLED",
+    "ERROR_METHOD_NOT_FOUND",
+    "LSPProtocolError",
+    "LSPRequestError",
+    "encode_message",
+    "read_message",
+    "make_request",
+    "make_notification",
+    "make_response",
+    "make_error_response",
+    "classify_message",
+]
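A minimal, stdlib-only sketch of the Content-Length framing that encode_message() and read_message() implement above. The names frame() and read_one() are hypothetical, and error handling is reduced to a single ValueError; the real module additionally enforces header/body size caps, distinguishes clean EOF, and raises LSPProtocolError.

```python
import asyncio
import json


def frame(obj: dict) -> bytes:
    # Compact UTF-8 JSON body; Content-Length counts bytes, not characters.
    body = json.dumps(obj, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    return f"Content-Length: {len(body)}\r\n\r\n".encode("ascii") + body


async def read_one(reader: asyncio.StreamReader) -> dict:
    # Read CRLF-terminated header lines until the blank line ending the block.
    headers = {}
    while True:
        line = (await reader.readuntil(b"\r\n"))[:-2]
        if not line:
            break
        key, sep, value = line.decode("ascii").partition(":")
        if not sep:
            raise ValueError(f"malformed header: {line!r}")
        headers[key.strip().lower()] = value.strip()
    # The body is exactly Content-Length bytes of UTF-8 JSON.
    body = await reader.readexactly(int(headers["content-length"]))
    return json.loads(body.decode("utf-8"))


async def demo() -> list:
    reader = asyncio.StreamReader()
    # Two back-to-back messages; the second carries a non-ASCII payload.
    reader.feed_data(frame({"jsonrpc": "2.0", "id": 1, "method": "initialize"}))
    reader.feed_data(frame({"jsonrpc": "2.0", "method": "log", "params": {"msg": "héllo"}}))
    reader.feed_eof()
    return [await read_one(reader), await read_one(reader)]


messages = asyncio.run(demo())
```

Encoding the body before measuring it is what keeps the header honest: the "é" in the second message costs two bytes, so a character count would under-report the body length.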
--- a/agent/lsp/range_shift.py
+++ b/agent/lsp/range_shift.py
@@ -0,0 +1,149 @@
+"""Diff-aware line-shift map for cross-edit LSP delta filtering.
+
+When an edit deletes or inserts lines in the middle of a file, every
+diagnostic below the edit point shifts to a new line number.  The
+LSPService delta filter subtracts the pre-edit baseline from the
+post-edit diagnostics keyed on ``(severity, code, source, message,
+range)`` — without an adjustment, the shifted-but-otherwise-identical
+diagnostics look brand-new and the agent gets flooded with noise.
+
+This uses the same trick as git blame and unified diffs: build a
+piecewise-linear map from pre-edit line numbers to post-edit line
+numbers, then apply that map to baseline diagnostics before the
+set-difference.  Baseline diagnostics on a pre-edit line that the edit
+deleted or replaced map to ``None`` and are dropped from the baseline
+(they genuinely no longer apply).
+
+Compared with the previous fix (dropping ``range`` from the key
+entirely), this preserves the "same error, new location" signal: if
+the model introduces a second instance of an existing error class at a
+different line, that instance is surfaced as new instead of being
+swallowed by content-only dedup.
+
+The map is derived from ``difflib.SequenceMatcher.get_opcodes()`` and
+exposed as a single callable so callers don't have to reason about
+diff regions.
+"""
+from __future__ import annotations
+
+import difflib
+from typing import Any, Callable, Dict, List, Optional
+
+
+def build_line_shift(pre_text: str, post_text: str) -> Callable[[int], Optional[int]]:
+    """Build a function mapping pre-edit line numbers to post-edit line numbers.
+
+    Lines are 0-indexed to match the LSP wire format
+    (``range.start.line`` is 0-indexed).
+
+    The returned callable takes a pre-edit 0-indexed line number and
+    returns the corresponding post-edit 0-indexed line number, or
+    ``None`` if that line was deleted by the edit (no post-edit
+    counterpart exists).
+
+    Cost: one ``SequenceMatcher.get_opcodes()`` call up front; the
+    returned closure is a linear scan over the opcode regions, O(k) per
+    call for k regions (typically single digits).  Cheap enough to call
+    once per write/patch and apply to every baseline diagnostic.
+    """
+    pre_lines = pre_text.splitlines() if pre_text else []
+    post_lines = post_text.splitlines() if post_text else []
+
+    # Trivial case: identical content or no content — identity map.
+    if pre_lines == post_lines:
+        return lambda line: line
+
+    # SequenceMatcher.get_opcodes() returns a list of
+    # (tag, i1, i2, j1, j2) where tag is 'equal', 'replace', 'delete',
+    # or 'insert'.  i1:i2 is the range in pre, j1:j2 is the range in
+    # post.  The regions are sorted and tile pre contiguously, so
+    # shift() below scans them in order for each lookup.
+    sm = difflib.SequenceMatcher(a=pre_lines, b=post_lines, autojunk=False)
+    opcodes = sm.get_opcodes()
+
+    def shift(line: int) -> Optional[int]:
+        # Find the opcode region whose i1 <= line < i2.
+        # Linear scan is fine — typical opcode count is small (single
+        # digits for a typical patch-tool edit).
+        for tag, i1, i2, j1, j2 in opcodes:
+            if i1 <= line < i2:
+                if tag == "equal":
+                    # Pre-line N → post-line (N - i1 + j1).
+                    return line - i1 + j1
+                if tag == "delete":
+                    # Pre-line is in a deleted region — no post counterpart.
+                    return None
+                if tag == "replace":
+                    # Replace == delete + insert; the pre-line has no
+                    # post counterpart in any meaningful sense.  Drop.
+                    return None
+                # 'insert' has i1 == i2 so line < i2 can't be hit.
+            if line < i1:
+                # Regions are sorted, so no later region can contain it.
+                break
+        # Past the last opcode region (line >= len(pre_lines)).
+        # Anchor at end of post.
+        return max(0, len(post_lines) - 1) if post_lines else None
+
+    return shift
+
+
+def shift_diagnostic_range(diag: Dict[str, Any],
+                           shift: Callable[[int], Optional[int]]) -> Optional[Dict[str, Any]]:
+    """Return a copy of ``diag`` with its line range remapped through ``shift``.
+
+    Returns ``None`` if the diagnostic's start line maps to ``None``
+    (the line was deleted by the edit) — caller drops it from the
+    baseline since the diagnostic no longer applies.
+
+    Both ``start.line`` and ``end.line`` are remapped independently;
+    when only the end maps to ``None`` (rare, multi-line diagnostic
+    straddling the edit boundary) we collapse to a single-line range
+    at the shifted start to keep the diagnostic in the baseline.
+
+    The original ``diag`` is not mutated.
+    """
+    rng = diag.get("range") or {}
+    start = rng.get("start") or {}
+    end = rng.get("end") or {}
+
+    pre_start_line = int(start.get("line", 0))
+    pre_end_line = int(end.get("line", pre_start_line))
+
+    new_start_line = shift(pre_start_line)
+    if new_start_line is None:
+        return None
+
+    new_end_line = shift(pre_end_line)
+    if new_end_line is None:
+        # Diagnostic straddled the deletion — collapse to start.
+        new_end_line = new_start_line
+
+    shifted = dict(diag)
+    shifted["range"] = {
+        "start": {
+            "line": new_start_line,
+            "character": int(start.get("character", 0)),
+        },
+        "end": {
+            "line": new_end_line,
+            "character": int(end.get("character", 0)),
+        },
+    }
+    return shifted
+
+
+def shift_baseline(baseline: List[Dict[str, Any]],
+                   shift: Callable[[int], Optional[int]]) -> List[Dict[str, Any]]:
+    """Apply ``shift`` to every diagnostic in ``baseline``, dropping deleted entries."""
+    out: List[Dict[str, Any]] = []
+    for d in baseline:
+        if not isinstance(d, dict):
+            continue
+        shifted = shift_diagnostic_range(d, shift)
+        if shifted is not None:
+            out.append(shifted)
+    return out
+
+
+__all__ = ["build_line_shift", "shift_diagnostic_range", "shift_baseline"]
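A stripped-down sketch of the idea behind build_line_shift() and shift_baseline(), using only difflib. build_shift() is a hypothetical name, and diagnostics are reduced to (message, line) tuples for illustration; the real code keys on severity, code, source, message, and full range.

```python
import difflib
from typing import Callable, Optional


def build_shift(pre: str, post: str) -> Callable[[int], Optional[int]]:
    """Map 0-indexed pre-edit lines to post-edit lines, or None if removed."""
    opcodes = difflib.SequenceMatcher(
        a=pre.splitlines(), b=post.splitlines(), autojunk=False
    ).get_opcodes()

    def shift(line: int) -> Optional[int]:
        for tag, i1, i2, j1, _j2 in opcodes:
            if i1 <= line < i2:
                # 'equal' regions move by a constant offset; lines in
                # 'delete' or 'replace' regions have no counterpart.
                return line - i1 + j1 if tag == "equal" else None
        return None  # outside the pre-edit text

    return shift


pre = "import os\nimport sys\nx = 1\ny = undefined_name\n"
post = "import os\nx = 1\ny = undefined_name\n"  # the edit deleted line 1

shift = build_shift(pre, post)
mapped = [shift(n) for n in range(4)]  # [0, None, 1, 2]

# Baseline keyed on (message, line): remap it before the set-difference.
baseline = {("undefined name 'undefined_name'", 3)}
after_edit = {("undefined name 'undefined_name'", 2)}
remapped = {(msg, shift(n)) for msg, n in baseline if shift(n) is not None}
new_diagnostics = after_edit - remapped  # empty: the error only moved
```

Unlike build_line_shift(), this sketch returns None for lines past the end of the pre-edit text instead of anchoring them to the last post-edit line, and it skips the identity fast path for unchanged text.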
--- a/agent/lsp/reporter.py
+++ b/agent/lsp/reporter.py
@@ -0,0 +1,78 @@
+"""Format LSP diagnostics for inclusion in tool output.
+
+The model sees a compact, severity-filtered, line-bounded summary of
+diagnostics introduced by the latest edit.  Format matches what
+OpenCode's ``lsp/diagnostic.ts`` and Claude Code's
+``formatDiagnosticsSummary`` produce — ``<diagnostics>`` blocks with
+1-indexed line/column, capped at ``MAX_PER_FILE`` errors.
+"""
+from __future__ import annotations
+
+from typing import Any, Dict, List
+
+# Severity 1 (ERROR) only by default; warnings/info/hints would flood
+# the agent.  Widen the set via ``lsp.severities`` in config if needed.
+SEVERITY_NAMES = {1: "ERROR", 2: "WARN", 3: "INFO", 4: "HINT"}
+DEFAULT_SEVERITIES = frozenset({1})  # ERROR only
+
+MAX_PER_FILE = 20
+MAX_TOTAL_CHARS = 4000
+
+
+def format_diagnostic(d: Dict[str, Any]) -> str:
+    """One-line representation of a single diagnostic."""
+    sev = SEVERITY_NAMES.get(d.get("severity") or 1, "ERROR")
+    rng = d.get("range") or {}
+    start = rng.get("start") or {}
+    line = int(start.get("line", 0)) + 1
+    col = int(start.get("character", 0)) + 1
+    msg = str(d.get("message") or "").rstrip()
+    code = d.get("code")
+    code_part = f" [{code}]" if code not in {None, ""} else ""
+    source = d.get("source")
+    source_part = f" ({source})" if source else ""
+    return f"{sev} [{line}:{col}] {msg}{code_part}{source_part}"
+
+
+def report_for_file(
+    file_path: str,
+    diagnostics: List[Dict[str, Any]],
+    *,
+    severities: frozenset = DEFAULT_SEVERITIES,
+    max_per_file: int = MAX_PER_FILE,
+) -> str:
+    """Build a ``<diagnostics file=...>`` block for one file.
+
+    Returns an empty string when no diagnostics pass the severity
+    filter, so callers can do ``if block:`` to skip empty cases.
+    """
+    if not diagnostics:
+        return ""
+    filtered = [d for d in diagnostics if (d.get("severity") or 1) in severities]
+    if not filtered:
+        return ""
+    limited = filtered[:max_per_file]
+    extra = len(filtered) - len(limited)
+    lines = [format_diagnostic(d) for d in limited]
+    body = "\n".join(lines)
+    if extra > 0:
+        body += f"\n... and {extra} more"
+    return f"<diagnostics file=\"{file_path}\">\n{body}\n</diagnostics>"
+
+
+def truncate(s: str, *, limit: int = MAX_TOTAL_CHARS) -> str:
+    """Hard-cap a formatted summary string."""
+    if len(s) <= limit:
+        return s
+    marker = "\n…[truncated]"
+    return s[: limit - len(marker)] + marker
+
+
+__all__ = [
+    "SEVERITY_NAMES",
+    "DEFAULT_SEVERITIES",
+    "MAX_PER_FILE",
+    "format_diagnostic",
+    "report_for_file",
+    "truncate",
+]
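A stdlib-only sketch of the format_diagnostic() / report_for_file() pipeline, trimmed to show the severity filter, the 0-indexed to 1-indexed position conversion, and the overflow line. one_line() and summary_block() are hypothetical names; the ``[code]`` and ``(source)`` suffixes are omitted for brevity.

```python
from typing import Any, Dict, FrozenSet, List

SEVERITY_NAMES = {1: "ERROR", 2: "WARN", 3: "INFO", 4: "HINT"}


def one_line(d: Dict[str, Any]) -> str:
    start = (d.get("range") or {}).get("start") or {}
    # LSP positions are 0-indexed; the summary shows 1-indexed line:col.
    line = int(start.get("line", 0)) + 1
    col = int(start.get("character", 0)) + 1
    sev = SEVERITY_NAMES.get(d.get("severity") or 1, "ERROR")
    return f"{sev} [{line}:{col}] {d.get('message', '')}"


def summary_block(path: str, diags: List[Dict[str, Any]],
                  severities: FrozenSet[int] = frozenset({1}), cap: int = 20) -> str:
    # A missing severity is treated as ERROR, as in report_for_file().
    kept = [d for d in diags if (d.get("severity") or 1) in severities]
    if not kept:
        return ""
    lines = [one_line(d) for d in kept[:cap]]
    if len(kept) > cap:
        lines.append(f"... and {len(kept) - cap} more")
    return f'<diagnostics file="{path}">\n' + "\n".join(lines) + "\n</diagnostics>"


diags = [
    {"severity": 1, "message": "undefined name 'foo'",
     "range": {"start": {"line": 9, "character": 4}}},
    {"severity": 2, "message": "unused import 'os'",
     "range": {"start": {"line": 0, "character": 0}}},
]
print(summary_block("app.py", diags))
```

With the default ERROR-only filter, the warning is dropped and the error at LSP position (9, 4) is reported as ``ERROR [10:5] undefined name 'foo'`` inside the ``<diagnostics file="app.py">`` block.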
--- a/agent/lsp/servers.py
+++ b/agent/lsp/servers.py
--- a/agent/lsp/workspace.py
+++ b/agent/lsp/workspace.py
@@ -0,0 +1,223 @@
+"""Workspace and project-root resolution for LSP.
+
+Two concerns live here:
+
+1. **Workspace gate** — the upper-level "is this directory a project?"
+   check.  Hermes only runs LSP when the cwd (or the file being edited)
+   sits inside a git worktree.  Files outside any git root never
+   trigger LSP, even if a server is configured.  This keeps Telegram
+   gateway users running from a home-directory cwd from spawning daemons.
+
+2. **NearestRoot** — the per-server project-root walk.  Each language
+   server cares about a different marker (``pyproject.toml`` for
+   Python, ``Cargo.toml`` for Rust, ``go.mod`` for Go, etc.) and
+   wants the directory containing that marker.  ``nearest_root()``
+   walks up from a starting path looking for any of a list of marker
+   files, optionally bailing if an exclude marker shows up first.
+"""
+from __future__ import annotations
+
+import logging
+import os
+from pathlib import Path
+from typing import Iterable, Optional, Tuple
+
+logger = logging.getLogger("agent.lsp.workspace")
+
+# Cache: start directory → (worktree_root, is_git) so repeated calls
+# don't re-stat.  Cleared on shutdown.  Keyed by the normalized absolute
+# path; symlinks are not resolved (see ``normalize_path``).
+_workspace_cache: dict = {}
+
+
+def normalize_path(path: str) -> str:
+    """Normalize a path for use as a stable map key.
+
+    Resolves ``~``, makes absolute, and collapses ``.``/``..``.  We do
+    NOT resolve symlinks here — symlink stability matters for some
+    LSP servers (rust-analyzer cares about Cargo workspace identity)
+    and we want the canonical path the user typed when possible.
+    """
+    return os.path.abspath(os.path.expanduser(path))
+
+
+def find_git_worktree(start: str) -> Optional[str]:
+    """Walk up from ``start`` looking for a ``.git`` entry (file or dir).
+
+    Returns the directory containing ``.git``, or ``None`` if no git
+    root is found before hitting the filesystem root.
+
+    A ``.git`` *file* (not directory) means we're inside a worktree set
+    up via ``git worktree add`` or inside a submodule; both forms count.
+    """
+    try:
+        start_path = Path(normalize_path(start))
+        if start_path.is_file():
+            start_path = start_path.parent
+    except (OSError, RuntimeError, ValueError):
+        # Pathological input (loop in symlinks, encoding error, etc.) —
+        # bail out rather than crash the lint hook.
+        return None
+
+    # Cache check
+    cached = _workspace_cache.get(str(start_path))
+    if cached is not None:
+        root, _is_git = cached
+        return root
+
+    cur = start_path
+    # Defensive cap: the deepest reasonable monorepo is well under 64
+    # levels.  Caps the walk so a pathological cwd or a symlink cycle
+    # we somehow traverse can't keep us looping.
+    for _ in range(64):
+        git_marker = cur / ".git"
+        try:
+            if git_marker.exists():
+                resolved = str(cur)
+                _workspace_cache[str(start_path)] = (resolved, True)
+                return resolved
+        except OSError:
+            # Permission error on a parent dir — bail out cleanly.
+            break
+        parent = cur.parent
+        if parent == cur:
+            break
+        cur = parent
+
+    _workspace_cache[str(start_path)] = (None, False)
+    return None
+
+
+def is_inside_workspace(path: str, workspace_root: str) -> bool:
+    """Return True iff ``path`` is inside (or equal to) ``workspace_root``.
+
+    Compares normalized absolute paths without resolving symlinks: a path
+    is judged by where it appears, not where it points, so a symlink
+    outside the workspace counts as outside even if its target is inside.
+    Matches LSP behaviour where servers reject didOpen for unrelated files.
+    """
+    p = normalize_path(path)
+    root = normalize_path(workspace_root)
+    if p == root:
+        return True
+    # Use os.path.commonpath rather than a string-prefix check so a
+    # sibling like /repo2 isn't treated as inside /repo.
+    try:
+        common = os.path.commonpath([p, root])
+    except ValueError:
+        # Different drives on Windows.
+        return False
+    return common == root
+
+
+def nearest_root(
+    start: str,
+    markers: Iterable[str],
+    *,
+    excludes: Optional[Iterable[str]] = None,
+    ceiling: Optional[str] = None,
+) -> Optional[str]:
+    """Walk up from ``start`` looking for any of the given marker files.
+
+    Returns the **directory containing** the first matched marker, or
+    ``None`` if no marker is found before hitting ``ceiling`` (or the
+    filesystem root if no ceiling).
+
+    If ``excludes`` is provided and an exclude marker matches *first*
+    in the upward walk, returns ``None`` — the server is gated off
+    for that file.  Mirrors OpenCode's NearestRoot exclude semantics
+    (e.g. typescript skips deno projects when ``deno.json`` is found
+    before ``package.json``).
+    """
+    start_path = Path(normalize_path(start))
+    try:
+        if start_path.is_file():
+            start_path = start_path.parent
+    except (OSError, RuntimeError, ValueError):
+        return None
+    ceiling_path = Path(normalize_path(ceiling)) if ceiling else None
+
+    markers_list = list(markers)
+    excludes_list = list(excludes) if excludes else []
+
+    cur = start_path
+    # Defensive cap matching ``find_git_worktree``.  Bounded walk
+    # protects against pathological inputs even though the
+    # parent-equality stop normally terminates within ~10 steps.
+    for _ in range(64):
+        # Check excludes first — if an exclude is found at this level,
+        # the server is gated off for this file.
+        for exc in excludes_list:
+            try:
+                if (cur / exc).exists():
+                    return None
+            except OSError:
+                continue
+        # Then check markers.
+        for marker in markers_list:
+            try:
+                if (cur / marker).exists():
+                    return str(cur)
+            except OSError:
+                continue
+        # Stop conditions.
+        if ceiling_path is not None and cur == ceiling_path:
+            return None
+        parent = cur.parent
+        if parent == cur:
+            return None
+        cur = parent
+    return None
+
+
+def resolve_workspace_for_file(
+    file_path: str,
+    *,
+    cwd: Optional[str] = None,
+) -> Tuple[Optional[str], bool]:
+    """Resolve the workspace root for a file.
+
+    Returns ``(workspace_root, gated_in)`` where ``gated_in`` is True
+    iff LSP should run for this file at all.  Currently the gate is
+    "file is inside a git worktree found by walking up from cwd OR
+    from the file itself".
+
+    The cwd path takes precedence: if the agent was launched in a git
+    project, that worktree is the workspace, and any file inside it (at
+    any depth) is in scope.  If the cwd isn't in a git worktree, or the
+    file lies outside the cwd's worktree, we fall back to the file's own
+    location.
+
+    Returns ``(None, False)`` when neither path is in a git worktree.
+    """
+    cwd = cwd or os.getcwd()
+    cwd_root = find_git_worktree(cwd)
+    if cwd_root is not None:
+        if is_inside_workspace(file_path, cwd_root):
+            return cwd_root, True
+        # File is outside the cwd's worktree — try the file's own
+        # location as a secondary anchor.  Useful when the agent edits a
+        # file in a separate checkout outside the cwd's repository.
+    file_root = find_git_worktree(file_path)
+    if file_root is not None:
+        return file_root, True
+    return None, False
+
+
+def clear_cache() -> None:
+    """Clear the workspace-resolution cache.
+
+    Called on service shutdown so a subsequent re-init doesn't pick
+    up stale results from a previous session.
+    """
+    _workspace_cache.clear()
+
+
+__all__ = [
+    "find_git_worktree",
+    "is_inside_workspace",
+    "nearest_root",
+    "normalize_path",
+    "resolve_workspace_for_file",
+    "clear_cache",
+]
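A simplified sketch of the upward walk in nearest_root(), dropping the ``ceiling`` argument and the 64-level cap for brevity. walk_for_root() and the temporary directory layout are hypothetical; the package.json / deno.json pair follows the TypeScript-versus-Deno example in the nearest_root() docstring.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable, Optional


def walk_for_root(start: str, markers: Iterable[str],
                  excludes: Iterable[str] = ()) -> Optional[str]:
    """Return the nearest ancestor dir holding a marker, or None."""
    markers, excludes = list(markers), list(excludes)
    cur = Path(os.path.abspath(os.path.expanduser(start)))
    if cur.is_file():
        cur = cur.parent
    while True:
        # An exclude at this level gates the server off for the file.
        if any((cur / name).exists() for name in excludes):
            return None
        if any((cur / name).exists() for name in markers):
            return str(cur)
        if cur.parent == cur:  # reached the filesystem root
            return None
        cur = cur.parent


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "package.json").write_text("{}")
    (root / "web" / "src").mkdir(parents=True)
    (root / "deno" / "src").mkdir(parents=True)
    (root / "deno" / "deno.json").write_text("{}")
    # A TypeScript server wants package.json but should skip Deno projects.
    web = walk_for_root(str(root / "web" / "src" / "app.ts"), ["package.json"], ["deno.json"])
    deno = walk_for_root(str(root / "deno" / "src" / "main.ts"), ["package.json"], ["deno.json"])
    expected_web = str(root)
```

Because excludes are checked before markers at every level, the Deno file stops at ``deno/`` and never reaches the outer package.json, while the web file walks up to the project root.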
--- a/agent/markdown_tables.py
+++ b/agent/markdown_tables.py
@@ -0,0 +1,309 @@
+"""CJK/wide-character-aware re-alignment of model-emitted markdown tables.
+
+Models pad markdown tables assuming each character occupies one terminal
+cell. CJK glyphs and most emoji render as two cells, so the model's
+spacing breaks the moment a table reaches a real terminal: rows without
+wide characters line up, but each row containing them drifts right by
+one extra cell per wide character.
+
+This module rebuilds row padding using ``wcwidth.wcswidth`` (display
+columns), preserving the table's pipes and dashes so it still reads as a
+plain-text table in ``strip`` / unrendered display modes. Standard Rich
+markdown rendering already aligns CJK correctly inside a wide enough
+panel; this helper is for the paths that print the model's text more or
+less verbatim.
+
+The helper is deliberately conservative:
+
+* Only contiguous ``| ... |`` blocks with a divider line are rewritten.
+* Anything that does not look like a table is passed through unchanged.
+* Single-line / mid-stream fragments are left alone — callers buffer
+  table rows and flush them once the block is complete.
+
+There is a small, intentional caveat: ``wcwidth`` returns ``-1`` for some
+emoji-with-variation-selector sequences (e.g. ``⚠️``); we clamp those to
+0 so they do not corrupt the column width math. The clamp applies to the
+whole cell, so the affected row can drift by up to that cell's width;
+that is preferable to silently widening every table that contains one.
+"""
+
+from __future__ import annotations
+
+import re
+from typing import List
+
+from wcwidth import wcswidth
+
+__all__ = [
+    "is_table_divider",
+    "looks_like_table_row",
+    "realign_markdown_tables",
+    "split_table_row",
+]
+
+
+_DIVIDER_CELL_RE = re.compile(r"^\s*:?-{3,}:?\s*$")
+_MIN_COL_WIDTH = 3  # matches the divider's minimum dash run.
+
+
+def _disp_width(s: str) -> int:
+    """``wcswidth`` clamped to a non-negative integer.
+
+    ``wcswidth`` returns ``-1`` for the whole string when it meets a
+    control char or an unknown sequence; treat that string as zero-width
+    rather than letting a negative number break the column-width math.
+    """
+
+    w = wcswidth(s)
+    return w if w > 0 else 0
+
+
+def _pad_to_width(s: str, target: int) -> str:
+    return s + " " * max(0, target - _disp_width(s))
+
+
+def split_table_row(row: str) -> List[str]:
+    """Split ``| a | b | c |`` into ``["a", "b", "c"]`` with trims."""
+
+    s = row.strip()
+    if s.startswith("|"):
+        s = s[1:]
+    if s.endswith("|"):
+        s = s[:-1]
+    return [c.strip() for c in s.split("|")]
+
+
+def is_table_divider(row: str) -> bool:
+    """True when ``row`` is a markdown table separator line."""
+
+    cells = split_table_row(row)
+    return len(cells) > 1 and all(_DIVIDER_CELL_RE.match(c) for c in cells)
+
+
+def looks_like_table_row(row: str) -> bool:
+    """True when ``row`` could plausibly be a markdown table row.
+
+    Used by streaming callers to decide whether to buffer an in-flight
+    line. We are intentionally permissive here — the realigner itself
+    only rewrites blocks that are accompanied by a divider, so a false
+    positive here at most delays the print of one line.
+    """
+
+    if "|" not in row:
+        return False
+    stripped = row.strip()
+    if not stripped:
+        return False
+    # A leading pipe is the strongest signal; without it we still allow
+    # rows with at least two pipes so models that omit the leading pipe
+    # don't slip past us.
+    if stripped.startswith("|"):
+        return True
+    return stripped.count("|") >= 2
+
+
+def _render_block(rows: List[List[str]], available_width: int | None = None) -> List[str]:
+    """Render ``rows`` (header + body, divider implied) at uniform widths.
+
+    If ``available_width`` is given and the rebuilt horizontal table
+    would exceed it, fall back to a vertical key-value rendering so
+    rows do not soft-wrap mid-cell — terminal soft-wrap destroys
+    column alignment visually even when the underlying bytes are
+    perfectly padded, which is exactly the "tables look broken"
+    user report this code path is meant to address.
+    """
+
+    ncols = max(len(r) for r in rows)
+    rows = [r + [""] * (ncols - len(r)) for r in rows]
+
+    widths = [
+        max(_MIN_COL_WIDTH, *(_disp_width(r[c]) for r in rows))
+        for c in range(ncols)
+    ]
+
+    # Total horizontal width for the rendered row:
+    #   `| ` + cell + ` ` for each column, plus the final closing `|`.
+    horizontal_width = sum(widths) + 3 * ncols + 1
+
+    if available_width is not None and horizontal_width > max(available_width, 20):
+        return _render_vertical(rows, ncols, available_width)
+
+    def _row(cells: List[str]) -> str:
+        return (
+            "| "
+            + " | ".join(_pad_to_width(c, widths[k]) for k, c in enumerate(cells))
+            + " |"
+        )
+
+    out = [_row(rows[0])]
+    out.append("|" + "|".join("-" * (w + 2) for w in widths) + "|")
+    for r in rows[1:]:
+        out.append(_row(r))
+    return out
+
+
+def _wrap_to_width(text: str, width: int) -> List[str]:
+    """Soft-wrap ``text`` at word boundaries to fit ``width`` display cells.
+
+    Falls back to hard-breaking a token character by character when it
+    is wider than ``width``.  Empty input yields a single empty string so
+    the caller's row count stays predictable.
+    """
+
+    if width <= 0 or not text:
+        return [text]
+
+    words = text.split()
+    if not words:
+        return [""]
+
+    lines: List[str] = []
+    current = ""
+    current_w = 0
+
+    def _hard_break(word: str, w: int) -> List[str]:
+        out: List[str] = []
+        buf = ""
+        bw = 0
+        for ch in word:
+            cw = _disp_width(ch) or 1
+            if bw + cw > w and buf:
+                out.append(buf)
+                buf = ch
+                bw = cw
+            else:
+                buf += ch
+                bw += cw
+        if buf:
+            out.append(buf)
+        return out
+
+    for word in words:
+        ww = _disp_width(word)
+        if not current:
+            if ww <= width:
+                current = word
+                current_w = ww
+            else:
+                pieces = _hard_break(word, width)
+                lines.extend(pieces[:-1])
+                current = pieces[-1] if pieces else ""
+                current_w = _disp_width(current)
+            continue
+        if current_w + 1 + ww <= width:
+            current += " " + word
+            current_w += 1 + ww
+        else:
+            lines.append(current)
+            if ww <= width:
+                current = word
+                current_w = ww
+            else:
+                pieces = _hard_break(word, width)
+                lines.extend(pieces[:-1])
+                current = pieces[-1] if pieces else ""
+                current_w = _disp_width(current)
+    if current:
+        lines.append(current)
+    return lines or [""]
+
+
+def _render_vertical(
+    rows: List[List[str]], ncols: int, available_width: int
+) -> List[str]:
+    """Render a too-wide table as vertical ``Header: value`` rows.
+
+    Mirrors Claude Code's narrow-terminal fallback in
+    ``MarkdownTable.tsx``: each body row becomes a small block of
+    ``Header: cell-value`` lines (continuation lines indented two
+    spaces) separated by a thin ``─`` divider between rows.  Aims to keep
+    every line within ``available_width`` (budgets never drop below 10
+    cells) so the terminal does not soft-wrap mid-cell.
+    """
+
+    if not rows:
+        return []
+
+    headers = rows[0] + [""] * (ncols - len(rows[0]))
+    body = rows[1:]
+
+    labels = [h or f"Column {i + 1}" for i, h in enumerate(headers)]
+
+    sep_width = max(20, min(40, available_width - 2)) if available_width else 30
+    separator = "─" * sep_width
+    indent = "  "
+    indent_w = _disp_width(indent)
+
+    out: List[str] = []
+    for ri, row in enumerate(body or [[]]):  # header-only: bare labels
+        if ri > 0:
+            out.append(separator)
+        for ci in range(ncols):
+            label = labels[ci]
+            value = row[ci] if ci < len(row) else ""
+            label_w = _disp_width(label)
+            first_budget = max(10, available_width - label_w - 2)
+            cont_budget = max(10, available_width - indent_w)
+            if not value:
+                out.append(f"{label}:")
+                continue
+            wrapped = _wrap_to_width(value, first_budget)
+            out.append(f"{label}: {wrapped[0]}")
+            if len(wrapped) > 1:
+                # Re-flow continuation text at the wider continuation
+                # budget — words split across the narrower first-line
+                # budget should re-pack greedily for the rest.
+                cont_text = " ".join(wrapped[1:])
+                for cl in _wrap_to_width(cont_text, cont_budget):
+                    if cl.strip():
+                        out.append(f"{indent}{cl}")
+    return out
+
+
+def realign_markdown_tables(text: str, available_width: int | None = None) -> str:
+    """Rewrite every ``| ... |`` + divider block with wcwidth-aware padding.
+
+    Lines that are not part of a recognised table are returned verbatim,
+    so this is safe to apply to arbitrary assistant prose.
+
+    If ``available_width`` is given (terminal cells available for the
+    rendered table), tables wider than that are rendered as vertical
+    key-value pairs instead of a horizontal pipe-bordered grid.  This
+    avoids the terminal soft-wrapping mid-cell, which destroys column
+    alignment visually even when the bytes are perfectly padded.
+    """
+
+    if "|" not in text:
+        return text
+
+    lines = text.split("\n")
+    out: List[str] = []
+    i = 0
+    n = len(lines)
+
+    while i < n:
+        line = lines[i]
+        # A table starts with a header row whose next line is a divider.
+        if (
+            "|" in line
+            and i + 1 < n
+            and is_table_divider(lines[i + 1])
+        ):
+            header = split_table_row(line)
+            body: List[List[str]] = []
+            j = i + 2
+            while j < n and "|" in lines[j] and lines[j].strip():
+                if is_table_divider(lines[j]):
+                    j += 1
+                    continue
+                body.append(split_table_row(lines[j]))
+                j += 1
+
+            if any(c for c in header) or body:
+                out.extend(_render_block([header] + body, available_width))
+                i = j
+                continue
+        out.append(line)
+        i += 1
+
+    return "\n".join(out)
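``realign_markdown_tables`` exists because ``len()`` counts code points, while a terminal counts cells. The sketch below is a minimal, stdlib-only illustration of that idea and not the module's code. It uses ``unicodedata.east_asian_width`` as a rough stand-in for the wcwidth-based ``_disp_width``, which is not shown in this chunk. ``disp_width``, ``render_rows`` and ``render_fit`` are hypothetical names, and the record layout in the narrow-terminal fallback only approximates ``_render_block``.

```python
import unicodedata
from typing import List


def disp_width(s: str) -> int:
    """Approximate terminal cell width (stand-in for a wcwidth-based helper)."""
    width = 0
    for ch in s:
        if unicodedata.combining(ch):
            continue  # combining marks draw on the previous cell
        # East Asian Wide / Fullwidth characters occupy two cells.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def pad_to(s: str, width: int) -> str:
    # Pad by display cells rather than len(), so wide glyphs line up.
    return s + " " * max(0, width - disp_width(s))


def render_rows(rows: List[List[str]]) -> List[str]:
    """Render rows[0] as the header of a pipe table, padded per column."""
    ncols = max(len(r) for r in rows)
    cells = [[r[c] if c < len(r) else "" for c in range(ncols)] for r in rows]
    widths = [max(disp_width(row[c]) for row in cells) for c in range(ncols)]
    out = []
    for i, row in enumerate(cells):
        out.append("| " + " | ".join(pad_to(v, w) for v, w in zip(row, widths)) + " |")
        if i == 0:
            out.append("|" + "|".join("-" * (w + 2) for w in widths) + "|")
    return out


def render_fit(rows: List[List[str]], available_width: int) -> List[str]:
    """Fall back to 'label: value' records when the grid would soft-wrap."""
    grid = render_rows(rows)
    if disp_width(grid[0]) <= available_width:
        return grid
    header, out = rows[0], []
    for row in rows[1:]:
        if out:
            out.append("")  # blank line between records
        for c, label in enumerate(header):
            out.append(f"{label}: {row[c] if c < len(row) else ''}")
    return out


rows = [["name", "city"], ["日本", "Tokyo"], ["ab", "x"]]
for line in render_fit(rows, available_width=80):
    print(line)
```

In this example the ``日本`` row is two characters shorter by ``len()`` than the header. It still occupies the same 16 cells, which is the property the real helper preserves. With ``available_width=10`` the same rows come back as ``name: ...`` / ``city: ...`` records.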
--- a/agent/memory_manager.py
+++ b/agent/memory_manager.py
@@ -91,10 +91,12 @@ class StreamingContextScrubber:
    def __init__(self) -> None:
        self._in_span: bool = False
        self._buf: str = ""
+        self._at_block_boundary: bool = True

    def reset(self) -> None:
        self._in_span = False
        self._buf = ""
+        self._at_block_boundary = True

    def feed(self, text: str) -> str:
        """Return the visible portion of ``text`` after scrubbing.
@@ -121,19 +123,22 @@ class StreamingContextScrubber:
                buf = buf[idx + len(self._CLOSE_TAG):]
                self._in_span = False
            else:
-                idx = buf.lower().find(self._OPEN_TAG)
+                idx = self._find_boundary_open_tag(buf)
                if idx == -1:
                    # No open tag — hold back a potential partial open tag
-                    held = self._max_partial_suffix(buf, self._OPEN_TAG)
+                    held = (
+                        self._max_pending_open_suffix(buf)
+                        or self._max_partial_suffix(buf, self._OPEN_TAG)
+                    )
                    if held:
-                        out.append(buf[:-held])
+                        self._append_visible(out, buf[:-held])
                        self._buf = buf[-held:]
                    else:
-                        out.append(buf)
+                        self._append_visible(out, buf)
                    return "".join(out)
                # Emit text before the tag, enter span
                if idx > 0:
-                    out.append(buf[:idx])
+                    self._append_visible(out, buf[:idx])
                buf = buf[idx + len(self._OPEN_TAG):]
                self._in_span = True

@@ -169,6 +174,55 @@ class StreamingContextScrubber:
                return i
        return 0

+    def _find_boundary_open_tag(self, buf: str) -> int:
+        """Find an opening fence only when it starts a block-like span."""
+        buf_lower = buf.lower()
+        search_start = 0
+        while True:
+            idx = buf_lower.find(self._OPEN_TAG, search_start)
+            if idx == -1:
+                return -1
+            if self._is_block_boundary(buf, idx) and self._has_block_opener_suffix(buf, idx):
+                return idx
+            search_start = idx + 1
+
+    def _max_pending_open_suffix(self, buf: str) -> int:
+        """Hold a complete boundary tag until the following char confirms it."""
+        if not buf.lower().endswith(self._OPEN_TAG):
+            return 0
+        idx = len(buf) - len(self._OPEN_TAG)
+        if not self._is_block_boundary(buf, idx):
+            return 0
+        return len(self._OPEN_TAG)
+
+    def _has_block_opener_suffix(self, buf: str, idx: int) -> bool:
+        after_idx = idx + len(self._OPEN_TAG)
+        if after_idx >= len(buf):
+            return False
+        return buf[after_idx] in "\r\n"
+
+    def _is_block_boundary(self, buf: str, idx: int) -> bool:
+        if idx == 0:
+            return self._at_block_boundary
+        preceding = buf[:idx]
+        last_newline = preceding.rfind("\n")
+        if last_newline == -1:
+            return self._at_block_boundary and preceding.strip() == ""
+        return preceding[last_newline + 1:].strip() == ""
+
+    def _append_visible(self, out: list[str], text: str) -> None:
+        if not text:
+            return
+        out.append(text)
+        self._update_block_boundary(text)
+
+    def _update_block_boundary(self, text: str) -> None:
+        last_newline = text.rfind("\n")
+        if last_newline != -1:
+            self._at_block_boundary = text[last_newline + 1:].strip() == ""
+        else:
+            self._at_block_boundary = self._at_block_boundary and text.strip() == ""
+

 def build_memory_context_block(raw_context: str) -> str:
    """Wrap prefetched memory in a fenced block with system note."""
@@ -470,11 +524,11 @@ class MemoryManager:

        accepted = [
            p for p in params
-            if p.kind in (
+            if p.kind in {
                inspect.Parameter.POSITIONAL_ONLY,
                inspect.Parameter.POSITIONAL_OR_KEYWORD,
                inspect.Parameter.KEYWORD_ONLY,
-            )
+            }
        ]
        if len(accepted) >= 4:
            return "positional"
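The boundary rule added to ``StreamingContextScrubber`` is easiest to see in isolation. The sketch below is a simplified, hypothetical re-implementation and not the class itself. The tags ``<ctx>`` / ``</ctx>`` stand in for ``_OPEN_TAG`` / ``_CLOSE_TAG``, whose values are not in this chunk. Matching is case-sensitive, whereas the real code lowercases, and the in-span branch is an assumption based on the visible context. The sketch shows three behaviours from the diff:

- An opening tag only counts when it starts a line and is followed by a line break.
- A complete tag at the end of a chunk is held back until the next character confirms it.
- Boundary state carries across chunks.

```python
from typing import List

OPEN = "<ctx>"    # hypothetical stand-in for _OPEN_TAG
CLOSE = "</ctx>"  # hypothetical stand-in for _CLOSE_TAG


def partial_suffix(buf: str, tag: str) -> int:
    """Length of the longest proper prefix of ``tag`` that ends ``buf``."""
    for k in range(min(len(buf), len(tag) - 1), 0, -1):
        if buf.endswith(tag[:k]):
            return k
    return 0


class BoundaryScrubber:
    def __init__(self) -> None:
        self.in_span = False
        self.buf = ""
        self.at_boundary = True  # the start of the stream counts as a line start

    def _is_boundary(self, buf: str, idx: int) -> bool:
        # Only whitespace may precede the tag on its line.
        line_start = buf.rfind("\n", 0, idx) + 1
        if buf[line_start:idx].strip():
            return False
        # No newline in this buffer: fall back to what was already emitted.
        return self.at_boundary if line_start == 0 else True

    def _find_open(self, buf: str) -> int:
        start = 0
        while (idx := buf.find(OPEN, start)) != -1:
            after = idx + len(OPEN)
            # Must start a line AND be followed by a line break.
            if self._is_boundary(buf, idx) and after < len(buf) and buf[after] in "\r\n":
                return idx
            start = idx + 1
        return -1

    def _pending_open(self, buf: str) -> int:
        # A boundary tag at the very end is unconfirmed: hold all of it back.
        if buf.endswith(OPEN) and self._is_boundary(buf, len(buf) - len(OPEN)):
            return len(OPEN)
        return 0

    def _emit(self, out: List[str], text: str) -> None:
        if not text:
            return
        out.append(text)
        nl = text.rfind("\n")
        if nl != -1:
            self.at_boundary = not text[nl + 1:].strip()
        else:
            self.at_boundary = self.at_boundary and not text.strip()

    def feed(self, text: str) -> str:
        buf, self.buf, out = self.buf + text, "", []
        while buf:
            if self.in_span:
                idx = buf.find(CLOSE)
                if idx == -1:
                    # Drop span content but keep a possible partial close tag.
                    held = partial_suffix(buf, CLOSE)
                    self.buf = buf[len(buf) - held:]
                    return "".join(out)
                buf = buf[idx + len(CLOSE):]
                self.in_span = False
                continue
            idx = self._find_open(buf)
            if idx == -1:
                held = self._pending_open(buf) or partial_suffix(buf, OPEN)
                self._emit(out, buf[:len(buf) - held])
                self.buf = buf[len(buf) - held:]
                return "".join(out)
            self._emit(out, buf[:idx])
            buf = buf[idx + len(OPEN):]
            self.in_span = True
        return "".join(out)
```

Suppose a fresh scrubber is fed ``"x\n<ct"``, ``"x>"`` and ``"\nhidden</ctx>\nok"`` as three chunks. It returns ``"x\n"``, ``""`` and ``"\nok"``. The split tag is reassembled, held back once complete, and then confirmed by the newline. An inline mention such as ``"use <ctx>\n tags"`` passes through untouched because text precedes the tag on its line.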
--- a/agent/message_sanitization.py
+++ b/agent/message_sanitization.py
@@ -0,0 +1,444 @@
+"""Message and tool-payload sanitization helpers.
+
+Pure functions extracted from ``run_agent.py`` so the AIAgent module can
+stay focused on the conversation loop.  These walk OpenAI-format message
+lists and structured payloads, repairing or stripping problematic
+characters that would otherwise crash ``json.dumps`` inside the OpenAI
+SDK or be rejected by upstream APIs.
+
+All helpers are stateless and side-effect-free except for in-place
+mutation of their input (where documented).  Backward-compatible
+re-exports from ``run_agent`` remain in place so existing imports
+``from run_agent import _sanitize_surrogates`` keep working.
+"""
+
+from __future__ import annotations
+
+import json
+import logging
+import re
+from typing import Any
+
+logger = logging.getLogger(__name__)
+
+# Lone surrogate code points are invalid in UTF-8 and crash json.dumps
+# inside the OpenAI SDK.  Used by every surrogate-sanitization helper
+# below as well as by run_agent and the CLI for paste-from-clipboard
+# scrubbing.
+_SURROGATE_RE = re.compile(r'[\ud800-\udfff]')
+
+
+def _sanitize_surrogates(text: str) -> str:
+    """Replace lone surrogate code points with U+FFFD (replacement character).
+
+    Surrogates are invalid in UTF-8 and will crash ``json.dumps()`` inside the
+    OpenAI SDK.  This is a fast no-op when the text contains no surrogates.
+    """
+    if _SURROGATE_RE.search(text):
+        return _SURROGATE_RE.sub('\ufffd', text)
+    return text
+
+
+def _sanitize_structure_surrogates(payload: Any) -> bool:
+    """Replace surrogate code points in nested dict/list payloads in-place.
+
+    Mirror of ``_sanitize_structure_non_ascii`` but for surrogate recovery.
+    Used to scrub nested structured fields (e.g. ``reasoning_details`` — an
+    array of dicts with ``summary``/``text`` strings) that flat per-field
+    checks don't reach.  Returns True if any surrogates were replaced.
+    """
+    found = False
+
+    def _walk(node):
+        nonlocal found
+        if isinstance(node, dict):
+            for key, value in node.items():
+                if isinstance(value, str):
+                    if _SURROGATE_RE.search(value):
+                        node[key] = _SURROGATE_RE.sub('\ufffd', value)
+                        found = True
+                elif isinstance(value, (dict, list)):
+                    _walk(value)
+        elif isinstance(node, list):
+            for idx, value in enumerate(node):
+                if isinstance(value, str):
+                    if _SURROGATE_RE.search(value):
+                        node[idx] = _SURROGATE_RE.sub('\ufffd', value)
+                        found = True
+                elif isinstance(value, (dict, list)):
+                    _walk(value)
+
+    _walk(payload)
+    return found
+
+
+def _sanitize_messages_surrogates(messages: list) -> bool:
+    """Sanitize surrogate characters from all string content in a messages list.
+
+    Walks message dicts in-place. Returns True if any surrogates were found
+    and replaced, False otherwise. Covers content/text, name, tool call
+    metadata/arguments, AND any additional string or nested structured fields
+    (``reasoning``, ``reasoning_content``, ``reasoning_details``, etc.) so
+    retries don't fail on a non-content field.  Byte-level reasoning models
+    (xiaomi/mimo, kimi, glm) can emit lone surrogates in reasoning output
+    that flow through to ``api_messages["reasoning_content"]`` on the next
+    turn and crash json.dumps inside the OpenAI SDK.
+    """
+    found = False
+    for msg in messages:
+        if not isinstance(msg, dict):
+            continue
+        content = msg.get("content")
+        if isinstance(content, str) and _SURROGATE_RE.search(content):
+            msg["content"] = _SURROGATE_RE.sub('\ufffd', content)
+            found = True
+        elif isinstance(content, list):
+            for part in content:
+                if isinstance(part, dict):
+                    text = part.get("text")
+                    if isinstance(text, str) and _SURROGATE_RE.search(text):
+                        part["text"] = _SURROGATE_RE.sub('\ufffd', text)
+                        found = True
+        name = msg.get("name")
+        if isinstance(name, str) and _SURROGATE_RE.search(name):
+            msg["name"] = _SURROGATE_RE.sub('\ufffd', name)
+            found = True
+        tool_calls = msg.get("tool_calls")
+        if isinstance(tool_calls, list):
+            for tc in tool_calls:
+                if not isinstance(tc, dict):
+                    continue
+                tc_id = tc.get("id")
+                if isinstance(tc_id, str) and _SURROGATE_RE.search(tc_id):
+                    tc["id"] = _SURROGATE_RE.sub('\ufffd', tc_id)
+                    found = True
+                fn = tc.get("function")
+                if isinstance(fn, dict):
+                    fn_name = fn.get("name")
+                    if isinstance(fn_name, str) and _SURROGATE_RE.search(fn_name):
+                        fn["name"] = _SURROGATE_RE.sub('\ufffd', fn_name)
+                        found = True
+                    fn_args = fn.get("arguments")
+                    if isinstance(fn_args, str) and _SURROGATE_RE.search(fn_args):
+                        fn["arguments"] = _SURROGATE_RE.sub('\ufffd', fn_args)
+                        found = True
+        # Walk any additional string / nested fields (reasoning,
+        # reasoning_content, reasoning_details, etc.) — surrogates from
+        # byte-level reasoning models (xiaomi/mimo, kimi, glm) can lurk
+        # in these fields and aren't covered by the per-field checks above.
+        # Matches _sanitize_messages_non_ascii's coverage (PR #10537).
+        for key, value in msg.items():
+            if key in {"content", "name", "tool_calls", "role"}:
+                continue
+            if isinstance(value, str):
+                if _SURROGATE_RE.search(value):
+                    msg[key] = _SURROGATE_RE.sub('\ufffd', value)
+                    found = True
+            elif isinstance(value, (dict, list)):
+                if _sanitize_structure_surrogates(value):
+                    found = True
+    return found
+
+
+def _escape_invalid_chars_in_json_strings(raw: str) -> str:
+    """Escape unescaped control chars inside JSON string values.
+
+    Walks the raw JSON character-by-character, tracking whether we are
+    inside a double-quoted string. Inside strings, replaces literal
+    control characters (0x00-0x1F) that aren't already part of an escape
+    sequence with their ``\\uXXXX`` equivalents. Pass-through for everything
+    else.
+
+    Ported from #12093. Complements the other repair passes in
+    ``_repair_tool_call_arguments`` when ``json.loads(strict=False)`` is
+    not enough (e.g. llama.cpp backends that emit literal tabs or
+    newlines alongside other malformations such as trailing commas).
+    """
+    out: list[str] = []
+    in_string = False
+    i = 0
+    n = len(raw)
+    while i < n:
+        ch = raw[i]
+        if in_string:
+            if ch == "\\" and i + 1 < n:
+                # Already-escaped char — pass through as-is
+                out.append(ch)
+                out.append(raw[i + 1])
+                i += 2
+                continue
+            if ch == '"':
+                in_string = False
+                out.append(ch)
+            elif ord(ch) < 0x20:
+                out.append(f"\\u{ord(ch):04x}")
+            else:
+                out.append(ch)
+        else:
+            if ch == '"':
+                in_string = True
+            out.append(ch)
+        i += 1
+    return "".join(out)
+
+
+def _repair_tool_call_arguments(raw_args: str, tool_name: str = "?") -> str:
+    """Attempt to repair malformed tool_call argument JSON.
+
+    Models like GLM-5.1 via Ollama can produce truncated JSON, trailing
+    commas, Python ``None``, etc.  The API proxy rejects these with HTTP 400
+    "invalid tool call arguments".  This function applies common repairs;
+    if all fail it returns ``"{}"`` so the request succeeds (better than
+    crashing the session).  All repairs are logged at WARNING level.
+    """
+    raw_stripped = raw_args.strip() if isinstance(raw_args, str) else ""
+
+    # Fast-path: empty / whitespace-only -> empty object
+    if not raw_stripped:
+        logger.warning("Sanitized empty tool_call arguments for %s", tool_name)
+        return "{}"
+
+    # Python-literal None -> normalise to {}
+    if raw_stripped == "None":
+        logger.warning("Sanitized Python-None tool_call arguments for %s", tool_name)
+        return "{}"
+
+    # Repair pass 0: llama.cpp backends sometimes emit literal control
+    # characters (tabs, newlines) inside JSON string values. json.loads
+    # with strict=False accepts these and lets us re-serialise the
+    # result into wire-valid JSON without any string surgery. This is
+    # the most common local-model repair case (#12068).
+    try:
+        parsed = json.loads(raw_stripped, strict=False)
+        reserialised = json.dumps(parsed, separators=(",", ":"))
+        if reserialised != raw_stripped:
+            logger.warning(
+                "Repaired unescaped control chars in tool_call arguments for %s",
+                tool_name,
+            )
+        return reserialised
+    except (json.JSONDecodeError, TypeError, ValueError):
+        pass
+
+    # Attempt common JSON repairs
+    fixed = raw_stripped
+    # 1. Strip trailing commas before } or ]
+    fixed = re.sub(r',\s*([}\]])', r'\1', fixed)
+    # 2. Close unclosed structures
+    open_curly = fixed.count('{') - fixed.count('}')
+    open_bracket = fixed.count('[') - fixed.count(']')
+    if open_curly > 0:
+        fixed += '}' * open_curly
+    if open_bracket > 0:
+        fixed += ']' * open_bracket
+    # 3. Remove excess closing braces/brackets (bounded to 50 iterations)
+    for _ in range(50):
+        try:
+            json.loads(fixed)
+            break
+        except json.JSONDecodeError:
+            if fixed.endswith('}') and fixed.count('}') > fixed.count('{'):
+                fixed = fixed[:-1]
+            elif fixed.endswith(']') and fixed.count(']') > fixed.count('['):
+                fixed = fixed[:-1]
+            else:
+                break
+
+    try:
+        json.loads(fixed)
+        logger.warning(
+            "Repaired malformed tool_call arguments for %s: %s → %s",
+            tool_name, raw_stripped[:80], fixed[:80],
+        )
+        return fixed
+    except json.JSONDecodeError:
+        pass
+
+    # Repair pass 4: escape unescaped control chars inside JSON strings,
+    # then retry. Catches cases where strict=False alone fails because
+    # other malformations are present too.
+    try:
+        escaped = _escape_invalid_chars_in_json_strings(fixed)
+        if escaped != fixed:
+            json.loads(escaped)
+            logger.warning(
+                "Repaired control-char-laced tool_call arguments for %s: %s → %s",
+                tool_name, raw_stripped[:80], escaped[:80],
+            )
+            return escaped
+    except (json.JSONDecodeError, TypeError, ValueError):
+        pass
+
+    # Last resort: replace with empty object so the API request doesn't
+    # crash the entire session.
+    logger.warning(
+        "Unrepairable tool_call arguments for %s — "
+        "replaced with empty object (was: %s)",
+        tool_name, raw_stripped[:80],
+    )
+    return "{}"
+
+
+def _strip_non_ascii(text: str) -> str:
+    """Drop every non-ASCII character from ``text`` (no transliteration).
+
+    Used as a last resort when the system encoding is ASCII and can't handle
+    any non-ASCII characters (e.g. LANG=C on Chromebooks).
+    """
+    return text.encode('ascii', errors='ignore').decode('ascii')
+
+
+def _sanitize_messages_non_ascii(messages: list) -> bool:
+    """Strip non-ASCII characters from all string content in a messages list.
+
+    This is a last-resort recovery for systems with ASCII-only encoding
+    (LANG=C, Chromebooks, minimal containers).  Returns True if any
+    non-ASCII content was found and sanitized.
+    """
+    found = False
+    for msg in messages:
+        if not isinstance(msg, dict):
+            continue
+        # Sanitize content (string)
+        content = msg.get("content")
+        if isinstance(content, str):
+            sanitized = _strip_non_ascii(content)
+            if sanitized != content:
+                msg["content"] = sanitized
+                found = True
+        elif isinstance(content, list):
+            for part in content:
+                if isinstance(part, dict):
+                    text = part.get("text")
+                    if isinstance(text, str):
+                        sanitized = _strip_non_ascii(text)
+                        if sanitized != text:
+                            part["text"] = sanitized
+                            found = True
+        # Sanitize name field (can contain non-ASCII in tool results)
+        name = msg.get("name")
+        if isinstance(name, str):
+            sanitized = _strip_non_ascii(name)
+            if sanitized != name:
+                msg["name"] = sanitized
+                found = True
+        # Sanitize tool_calls
+        tool_calls = msg.get("tool_calls")
+        if isinstance(tool_calls, list):
+            for tc in tool_calls:
+                if isinstance(tc, dict):
+                    fn = tc.get("function", {})
+                    if isinstance(fn, dict):
+                        fn_args = fn.get("arguments")
+                        if isinstance(fn_args, str):
+                            sanitized = _strip_non_ascii(fn_args)
+                            if sanitized != fn_args:
+                                fn["arguments"] = sanitized
+                                found = True
+        # Sanitize any additional top-level string fields (e.g. reasoning_content)
+        for key, value in msg.items():
+            if key in {"content", "name", "tool_calls", "role"}:
+                continue
+            if isinstance(value, str):
+                sanitized = _strip_non_ascii(value)
+                if sanitized != value:
+                    msg[key] = sanitized
+                    found = True
+    return found
+
+
+def _sanitize_tools_non_ascii(tools: list) -> bool:
+    """Strip non-ASCII characters from tool payloads in-place."""
+    return _sanitize_structure_non_ascii(tools)
+
+
+def _strip_images_from_messages(messages: list) -> bool:
+    """Remove image_url content parts from all messages in-place.
+
+    Called when a server signals it does not support images (e.g.
+    "Only 'text' content type is supported.").  Mutates messages so the
+    next API call sends text only.
+
+    Preserves message alternation invariants:
+      * ``tool``-role messages whose content was entirely images are replaced
+        with a plaintext placeholder, NOT deleted — deleting them would leave
+        the paired ``tool_call_id`` on the prior assistant message unmatched,
+        which providers reject with HTTP 400.
+      * Non-tool messages whose content becomes empty are dropped.  In
+        practice this only hits synthetic image-only user messages appended
+        for attachment delivery; real user turns always include text.
+
+    Returns True if any image parts were removed.
+    """
+    found = False
+    to_delete = []
+    for i, msg in enumerate(messages):
+        if not isinstance(msg, dict):
+            continue
+        content = msg.get("content")
+        if not isinstance(content, list):
+            continue
+        new_parts = []
+        for part in content:
+            if isinstance(part, dict) and part.get("type") in {"image_url", "image", "input_image"}:
+                found = True
+            else:
+                new_parts.append(part)
+        if len(new_parts) < len(content):
+            if new_parts:
+                msg["content"] = new_parts
+            elif msg.get("role") == "tool":
+                # Preserve tool_call_id linkage — providers require every
+                # assistant tool_call to have a matching tool response.
+                msg["content"] = "[image content removed — server does not support images]"
+            else:
+                # Synthetic image-only user/assistant message with no text;
+                # safe to drop.
+                to_delete.append(i)
+    for i in reversed(to_delete):
+        del messages[i]
+    return found
+
+
+def _sanitize_structure_non_ascii(payload: Any) -> bool:
+    """Strip non-ASCII characters from nested dict/list payloads in-place."""
+    found = False
+
+    def _walk(node):
+        nonlocal found
+        if isinstance(node, dict):
+            for key, value in node.items():
+                if isinstance(value, str):
+                    sanitized = _strip_non_ascii(value)
+                    if sanitized != value:
+                        node[key] = sanitized
+                        found = True
+                elif isinstance(value, (dict, list)):
+                    _walk(value)
+        elif isinstance(node, list):
+            for idx, value in enumerate(node):
+                if isinstance(value, str):
+                    sanitized = _strip_non_ascii(value)
+                    if sanitized != value:
+                        node[idx] = sanitized
+                        found = True
+                elif isinstance(value, (dict, list)):
+                    _walk(value)
+
+    _walk(payload)
+    return found
+
+
+__all__ = [
+    "_SURROGATE_RE",
+    "_sanitize_surrogates",
+    "_sanitize_structure_surrogates",
+    "_sanitize_messages_surrogates",
+    "_escape_invalid_chars_in_json_strings",
+    "_repair_tool_call_arguments",
+    "_strip_non_ascii",
+    "_sanitize_messages_non_ascii",
+    "_sanitize_tools_non_ascii",
+    "_strip_images_from_messages",
+    "_sanitize_structure_non_ascii",
+]
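Two of the techniques in ``message_sanitization.py`` depend only on standard-library behaviour and can be checked in isolation:

- Lone surrogates are replaced with U+FFFD, because a lone surrogate cannot be encoded as UTF-8.
- In repair pass 0, ``json.loads(..., strict=False)`` accepts literal control characters inside strings, and ``json.dumps`` then re-escapes them.

The sketch below assumes nothing beyond the stdlib. ``scrub_surrogates`` and ``repair_control_chars`` are hypothetical condensed versions of ``_sanitize_surrogates`` and the first stage of ``_repair_tool_call_arguments``. They omit the trailing-comma, bracket-closing and escape passes.

```python
import json
import re

# Same character class as _SURROGATE_RE in the module above.
SURROGATE_RE = re.compile(r"[\ud800-\udfff]")


def scrub_surrogates(text: str) -> str:
    """Replace lone surrogates (not encodable as UTF-8) with U+FFFD."""
    return SURROGATE_RE.sub("\ufffd", text)


def repair_control_chars(raw: str) -> str:
    """Condensed repair pass 0: lenient parse, then strict re-serialisation."""
    try:
        # strict=False allows literal tabs/newlines inside JSON strings.
        parsed = json.loads(raw, strict=False)
    except json.JSONDecodeError:
        return "{}"  # mirrors the module's last-resort fallback
    # json.dumps always escapes control characters, so the output is valid
    # strict JSON; compact separators match the module's re-serialisation.
    return json.dumps(parsed, separators=(",", ":"))


raw = '{"cmd": "ls\t-la"}'  # contains a literal tab character
try:
    json.loads(raw)
except json.JSONDecodeError as exc:
    print("strict parse fails:", exc.msg)
print(repair_control_chars(raw))
print(scrub_surrogates("ok\ud800").encode("utf-8"))
```

Re-serialising, rather than patching the string in place, is the design choice the module's comment describes as avoiding "string surgery". Once the lenient parse succeeds, ``json.dumps`` is guaranteed to produce wire-valid JSON.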
--- a/agent/model_metadata.py
+++ b/agent/model_metadata.py
@@ -10,7 +10,7 @@ import os
 import re
 import time
 from pathlib import Path
-from typing import Any, Dict, List, Optional
+from typing import Any, Dict, List, Optional, Tuple
 from urllib.parse import urlparse

 import requests
@@ -47,7 +47,7 @@ def _resolve_requests_verify() -> bool | str:
 _PROVIDER_PREFIXES: frozenset[str] = frozenset({
    "openrouter", "nous", "openai-codex", "copilot", "copilot-acp",
    "gemini", "ollama-cloud", "zai", "kimi-coding", "kimi-coding-cn", "stepfun", "minimax", "minimax-oauth", "minimax-cn", "anthropic", "deepseek",
-    "opencode-zen", "opencode-go", "ai-gateway", "kilocode", "alibaba",
+    "opencode-zen", "opencode-go", "ai-gateway", "kilocode", "alibaba", "novita",
    "qwen-oauth",
    "xiaomi",
    "arcee",
@@ -66,7 +66,7 @@ _PROVIDER_PREFIXES: frozenset[str] = frozenset({
    "gmi-cloud", "gmicloud",
    "xai", "x-ai", "x.ai", "grok",
    "nvidia", "nim", "nvidia-nim", "nemotron",
-    "qwen-portal",
+    "qwen-portal", "novita-ai", "novitaai",
 })


@@ -104,6 +104,8 @@ def _strip_provider_prefix(model: str) -> str:

 _model_metadata_cache: Dict[str, Dict[str, Any]] = {}
 _model_metadata_cache_time: float = 0
+_novita_metadata_cache: Dict[str, Dict[str, Any]] = {}
+_novita_metadata_cache_time: float = 0
 _MODEL_CACHE_TTL = 3600
 _endpoint_model_metadata_cache: Dict[str, Dict[str, Dict[str, Any]]] = {}
 _endpoint_model_metadata_cache_time: Dict[str, float] = {}
@@ -157,6 +159,13 @@ DEFAULT_CONTEXT_LENGTHS = {
    "gpt-5.4-nano": 400000,           # 400k (not 1.05M like full 5.4)
    "gpt-5.4-mini": 400000,           # 400k (not 1.05M like full 5.4)
    "gpt-5.4": 1050000,               # GPT-5.4, GPT-5.4 Pro (1.05M context)
+    # gpt-5.3-codex-spark is Codex-OAuth-only (ChatGPT Pro entitlement) and
+    # uses a smaller 128k window than other gpt-5.x slugs. Listed here as
+    # a defensive override so the longest-substring fallback doesn't match
+    # the generic "gpt-5" entry below (400k) and report the wrong limit if
+    # Spark's context ever needs to be resolved through this path. Real
+    # usage flows through _CODEX_OAUTH_CONTEXT_FALLBACK.
+    "gpt-5.3-codex-spark": 128000,
    "gpt-5.1-chat": 128000,           # Chat variant has 128k context
    "gpt-5": 400000,                  # GPT-5.x base, mini, codex variants (400k)
    "gpt-4.1": 1047576,
@@ -185,6 +194,7 @@ DEFAULT_CONTEXT_LENGTHS = {
    "llama": 131072,
    # Qwen — specific model families before the catch-all.
    # Official docs: https://help.aliyun.com/zh/model-studio/developer-reference/
+    "qwen3.6-plus": 1048576,      # 1M context (DashScope/Alibaba & OpenRouter)
    "qwen3-coder-plus": 1000000,  # 1M context
    "qwen3-coder": 262144,        # 256K context
    "qwen": 131072,
@@ -204,14 +214,17 @@ DEFAULT_CONTEXT_LENGTHS = {
    "grok-2-vision": 8192,      # grok-2-vision, -1212, -latest
    "grok-4-fast": 2000000,     # grok-4-fast-(non-)reasoning
    "grok-4.20": 2000000,       # grok-4.20-0309-(non-)reasoning, -multi-agent-0309
+    "grok-4.3": 1000000,        # grok-4.3, grok-4.3-latest — 1M context per docs.x.ai
    "grok-4": 256000,           # grok-4, grok-4-0709
    "grok-3": 131072,           # grok-3, grok-3-mini, grok-3-fast, grok-3-mini-fast
    "grok-2": 131072,           # grok-2, grok-2-1212, grok-2-latest
    "grok": 131072,             # catch-all (grok-beta, unknown grok-*)
    # Kimi
    "kimi": 262144,
-    # Tencent — Hy3 Preview (Hunyuan) with 256K context window
-    "hy3-preview": 256000,
+    # Tencent — Hy3 Preview (Hunyuan) with 256K context window.
+    # OpenRouter live metadata reports 262144 (256 × 1024); align the
+    # static fallback so cache and offline both agree (issue #22268).
+    "hy3-preview": 262144,
    # Nemotron — NVIDIA's open-weights series (128K context across all sizes)
    "nemotron": 131072,
    # Arcee
@@ -235,9 +248,48 @@ DEFAULT_CONTEXT_LENGTHS = {
    "zai-org/GLM-5": 202752,
 }

+# xAI Grok models that ACCEPT the `reasoning.effort` parameter on
+# api.x.ai. Verified live against /v1/responses 2026-05-10:
+#
+#   ACCEPTS effort:  grok-3-mini, grok-3-mini-fast, grok-4.20-multi-agent-0309,
+#                    grok-4.3
+#   REJECTS effort:  grok-3, grok-4, grok-4-0709, grok-4-fast-(non-)reasoning,
+#                    grok-4-1-fast-(non-)reasoning, grok-4.20-0309-(non-)reasoning,
+#                    grok-code-fast-1
+#
+# REJECTS-side models still reason natively — they just don't expose an
+# effort dial — so callers should send no `reasoning` key at all rather
+# than a default `medium` (which 400s with "Model X does not support
+# parameter reasoningEffort").
+_GROK_EFFORT_CAPABLE_PREFIXES = (
+    "grok-3-mini",
+    "grok-4.20-multi-agent",
+    "grok-4.3",
+)
+
+
+def grok_supports_reasoning_effort(model: str) -> bool:
+    """Return True when an xAI Grok model accepts ``reasoning.effort``.
+
+    Allowlist by prefix on the last ``/``-separated segment, so it matches
+    both bare ``grok-3-mini`` and aggregator-prefixed ``x-ai/grok-3-mini``.
+    Conservative by design: if a future Grok model isn't listed, we send no
+    effort dial rather than risk a 400.
+    """
+    name = (model or "").strip().lower()
+    if not name:
+        return False
+    # Strip common aggregator prefixes (x-ai/, openrouter/x-ai/, xai/, ...)
+    # by keeping only the final "/"-separated segment; rsplit leaves a
+    # bare name unchanged.
+    name = name.rsplit("/", 1)[-1]
+    return any(name.startswith(prefix) for prefix in _GROK_EFFORT_CAPABLE_PREFIXES)
+
+
 _CONTEXT_LENGTH_KEYS = (
    "context_length",
    "context_window",
+    "context_size",
    "max_context_length",
    "max_position_embeddings",
    "max_model_len",
@@ -307,6 +359,12 @@ _URL_TO_PROVIDER: Dict[str, str] = {
    "api.deepseek.com": "deepseek",
    "api.githubcopilot.com": "copilot",
    "models.github.ai": "copilot",
+    # GitHub Models free tier (Azure-hosted prototyping endpoint) — same
+    # canonical provider as the Copilot API.  Hard per-request token cap
+    # (often 8K) makes it unusable for Hermes' system prompt, but mapping
+    # it here lets us recognize the endpoint and emit a targeted hint
+    # instead of falling through the unknown-custom-endpoint path.
+    "models.inference.ai.azure.com": "copilot",
    "api.fireworks.ai": "fireworks",
    "opencode.ai": "opencode-go",
    "api.x.ai": "xai",
@@ -314,6 +372,7 @@ _URL_TO_PROVIDER: Dict[str, str] = {
    "api.xiaomimimo.com": "xiaomi",
    "xiaomimimo.com": "xiaomi",
    "api.gmi-serving.com": "gmi",
+    "api.novita.ai": "novita",
    "tokenhub.tencentmaas.com": "tencent-tokenhub",
    "ollama.com": "ollama-cloud",
 }
@@ -510,6 +569,16 @@ def _extract_max_completion_tokens(payload: Dict[str, Any]) -> Optional[int]:


 def _extract_pricing(payload: Dict[str, Any]) -> Dict[str, Any]:
+    novita_input = payload.get("input_token_price_per_m")
+    novita_output = payload.get("output_token_price_per_m")
+    if novita_input is not None or novita_output is not None:
+        pricing: Dict[str, Any] = {}
+        if novita_input is not None:
+            pricing["prompt"] = str(float(novita_input) / 10_000 / 1_000_000)
+        if novita_output is not None:
+            pricing["completion"] = str(float(novita_output) / 10_000 / 1_000_000)
+        return pricing
+
    alias_map = {
        "prompt": ("prompt", "input", "input_cost_per_token", "prompt_token_cost"),
        "completion": ("completion", "output", "output_cost_per_token", "completion_token_cost"),
@@ -524,7 +593,7 @@ def _extract_pricing(payload: Dict[str, Any]) -> Dict[str, Any]:
        pricing: Dict[str, Any] = {}
        for target, aliases in alias_map.items():
            for alias in aliases:
-                if alias in normalized and normalized[alias] not in (None, ""):
+                if alias in normalized and normalized[alias] not in {None, ""}:
                    pricing[target] = normalized[alias]
                    break
        if pricing:
@@ -959,6 +1028,79 @@ def query_ollama_num_ctx(model: str, base_url: str, api_key: str = "") -> Option
    return None


+def _query_ollama_api_show(model: str, base_url: str, api_key: str = "") -> Optional[int]:
+    """Query an Ollama server's native ``/api/show`` for context length.
+
+    Provider-agnostic: works against ANY Ollama-compatible server regardless
+    of hostname — local Ollama, Ollama Cloud (``ollama.com``), custom Ollama
+    hosting behind a reverse proxy, etc.  For non-Ollama servers the POST
+    usually fails fast with 404/405; any non-200 status or exception yields ``None``.
+
+    For hosted servers the GGUF ``model_info.*.context_length`` is the
+    authoritative source: the user can't set their own ``num_ctx``, and the
+    OpenAI-compat ``/v1/models`` endpoint correctly omits ``context_length``
+    per the OpenAI schema.
+
+    Resolution order for hosted Ollama:
+      1. ``model_info.*.context_length`` — GGUF training max (authoritative)
+      2. ``parameters`` → ``num_ctx`` — server-side Modelfile override
+    The order is flipped vs ``query_ollama_num_ctx()`` because local users
+    control ``num_ctx`` themselves; hosted users can't.
+    """
+    import httpx
+
+    server_url = base_url.rstrip("/")
+    if server_url.endswith("/v1"):
+        server_url = server_url[:-3]
+
+    headers = _auth_headers(api_key)
+
+    try:
+        with httpx.Client(timeout=5.0, headers=headers) as client:
+            resp = client.post(f"{server_url}/api/show", json={"name": model})
+            if resp.status_code != 200:
+                return None
+            data = resp.json()
+
+            # Hosted Ollama: GGUF model_info is the real max — prefer it over
+            # num_ctx which the Cloud operator may have capped arbitrarily.
+            model_info = data.get("model_info") or {}
+            for key, value in model_info.items():
+                if "context_length" in key and isinstance(value, (int, float)):
+                    ctx = int(value)
+                    if ctx >= 1024:
+                        return ctx
+
+            # Fall back to num_ctx from Modelfile parameters (rare on Cloud)
+            params = data.get("parameters", "")
+            if "num_ctx" in params:
+                for line in params.split("\n"):
+                    if "num_ctx" in line:
+                        parts = line.strip().split()
+                        if len(parts) >= 2:
+                            try:
+                                ctx = int(parts[-1])
+                                if ctx >= 1024:
+                                    return ctx
+                            except ValueError:
+                                pass
+    except Exception:
+        pass
+    return None
+
+
+def _model_name_suggests_kimi(model: str) -> bool:
+    """Return True if the model name looks like a Kimi-family model.
+
+    Catches ``kimi-k2.6``, ``kimi-k2.5``, ``kimi-k2-thinking``,
+    ``moonshotai/Kimi-K2.6``, and similar variants.  Used as a guard
+    against stale OpenRouter metadata that underreports these models
+    as 32K context when they actually support 262K+.
+    """
+    lower = model.lower()
+    return lower.startswith("kimi") or "moonshot" in lower
+
+
 def _query_local_context_length(model: str, base_url: str, api_key: str = "") -> Optional[int]:
    """Query a local server for the model's context length."""
    import httpx
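`_query_ollama_api_show` combines an HTTP POST with a parse of the response. The minimal sketch below isolates the parse step as a hypothetical pure function, `parse_api_show` (not part of the diff), so the `model_info`-before-`num_ctx` order can be checked offline. The payload shape is assumed from the code above rather than taken from Ollama's API reference. The sketch is also slightly stricter than the original: it skips `bool` values and requires `num_ctx` to be the first token on its line.

```python
from typing import Any, Dict, Optional

MIN_CTX = 1024  # same sanity floor as _query_ollama_api_show


def parse_api_show(data: Dict[str, Any]) -> Optional[int]:
    """Hypothetical offline version of the /api/show parsing above.

    1. Prefer any model_info key containing "context_length" (GGUF max).
    2. Otherwise use a "num_ctx <n>" line from the Modelfile parameters.
    """
    model_info = data.get("model_info") or {}
    for key, value in model_info.items():
        # bool is a subclass of int, so rule it out explicitly.
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if "context_length" in key and is_number and int(value) >= MIN_CTX:
            return int(value)

    params = data.get("parameters") or ""
    for line in params.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "num_ctx":
            try:
                ctx = int(parts[-1])
            except ValueError:
                continue
            if ctx >= MIN_CTX:
                return ctx
    return None


hosted = {
    "model_info": {"llama.context_length": 131072},
    "parameters": "num_ctx 8192\nstop <eos>",
}
print(parse_api_show(hosted))                          # 131072 (model_info wins)
print(parse_api_show({"parameters": "num_ctx 8192"}))  # 8192
print(parse_api_show({}))                              # None
```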
@@ -1106,6 +1248,12 @@ _CODEX_OAUTH_CONTEXT_FALLBACK: Dict[str, int] = {
    "gpt-5.1-codex-max": 272_000,
    "gpt-5.1-codex-mini": 272_000,
    "gpt-5.3-codex": 272_000,
+    # Spark runs on specialised low-latency hardware and exposes a smaller
+    # 128k window than other Codex OAuth slugs. Listed explicitly so the
+    # longest-key-first fallback resolves it correctly — substring match
+    # on "gpt-5.3-codex" otherwise wins and reports 272k. Availability is
+    # gated by ChatGPT Pro entitlement on the Codex backend.
+    "gpt-5.3-codex-spark": 128_000,
    "gpt-5.2-codex": 272_000,
    "gpt-5.4-mini": 272_000,
    "gpt-5.5": 272_000,
@@ -1204,27 +1352,66 @@ def _resolve_codex_oauth_context_length(
    return None


-def _resolve_nous_context_length(model: str) -> Optional[int]:
-    """Resolve Nous Portal model context length via OpenRouter metadata.
+def _resolve_nous_context_length(
+    model: str,
+    base_url: str = "",
+    api_key: str = "",
+) -> Tuple[Optional[int], str]:
+    """Resolve Nous Portal model context length.

-    Nous model IDs are bare (e.g. 'claude-opus-4-6') while OpenRouter uses
-    prefixed IDs (e.g. 'anthropic/claude-opus-4.6'). Try suffix matching
-    with version normalization (dot↔dash).
+    Tries the live Nous inference endpoint first (authoritative), then falls
+    back to OpenRouter metadata with suffix/version matching.
+
+    Nous model IDs are bare after prefix-stripping (e.g. 'qwen3.6-plus',
+    'claude-opus-4-6') while OpenRouter uses prefixed IDs (e.g.
+    'qwen/qwen3.6-plus', 'anthropic/claude-opus-4.6').  Version
+    normalization (dot↔dash) is applied to absorb naming drift.
+
+    Returns ``(context_length, source)`` where ``source`` is one of:
+      - ``"portal"``    — live /v1/models response (authoritative)
+      - ``"openrouter"`` — OpenRouter cache fallback (non-authoritative;
+        callers must NOT persist this to the on-disk cache or a single
+        portal blip will freeze the wrong value in forever)
+      - ``""``           — could not resolve
    """
-    metadata = fetch_model_metadata()  # OpenRouter cache
-    # Exact match first
+    # Portal first — the Nous /models endpoint is authoritative for what our
+    # infrastructure enforces and may differ from OR (e.g. OR reports 1M for
+    # qwen3.6-plus; the portal correctly says 262144).  Fall back to the OR
+    # catalog only if the portal doesn't list the model.
+    if base_url:
+        portal_ctx = _resolve_endpoint_context_length(model, base_url, api_key=api_key)
+        if portal_ctx is not None:
+            return portal_ctx, "portal"
+
+    metadata = fetch_model_metadata()
+
+    def _safe_ctx(or_id: str, entry: dict) -> Optional[int]:
+        ctx = entry.get("context_length")
+        if ctx is None:
+            return None
+        if ctx <= 32768 and _model_name_suggests_kimi(or_id):
+            logger.info(
+                "Rejecting OpenRouter metadata context=%s for %r "
+                "(Kimi-family underreport, Nous path); falling through to hardcoded defaults",
+                ctx, or_id,
+            )
+            return None
+        return ctx
+
    if model in metadata:
-        return metadata[model].get("context_length")
+        ctx = _safe_ctx(model, metadata[model])
+        if ctx is not None:
+            return ctx, "openrouter"

    normalized = _normalize_model_version(model).lower()

    for or_id, entry in metadata.items():
        bare = or_id.split("/", 1)[1] if "/" in or_id else or_id
        if bare.lower() == model.lower() or _normalize_model_version(bare).lower() == normalized:
-            return entry.get("context_length")
+            ctx = _safe_ctx(or_id, entry)
+            if ctx is not None:
+                return ctx, "openrouter"

-    # Partial prefix match for cases like gemini-3-flash → gemini-3-flash-preview
-    # Require match to be at a word boundary (followed by -, :, or end of string)
    model_lower = model.lower()
    for or_id, entry in metadata.items():
        bare = or_id.split("/", 1)[1] if "/" in or_id else or_id
@@ -1232,9 +1419,11 @@ def _resolve_nous_context_length(model: str) -> Optional[int]:
            if candidate.startswith(query) and (
                len(candidate) == len(query) or candidate[len(query)] in "-:."
            ):
-                return entry.get("context_length")
+                ctx = _safe_ctx(or_id, entry)
+                if ctx is not None:
+                    return ctx, "openrouter"

-    return None
+    return None, ""


 def get_model_context_length(
@@ -1249,17 +1438,26 @@ def get_model_context_length(

    Resolution order:
    0. Explicit config override (model.context_length or custom_providers per-model)
-    1. Persistent cache (previously discovered via probing)
+    1. Persistent cache (previously discovered via probing).  Nous URLs
+       bypass the cache here so step 5b can always reconcile against
+       the authoritative portal /v1/models response.
    1b. AWS Bedrock static table (must precede custom-endpoint probe)
    2. Active endpoint metadata (/models for explicit custom endpoints)
    3. Local server query (for local endpoints)
    4. Anthropic /v1/models API (API-key users only, not OAuth)
-    5. OpenRouter live API metadata
-    6. Nous suffix-match via OpenRouter cache
-    7. models.dev registry lookup (provider-aware)
-    8. Thin hardcoded defaults (broad family patterns)
-    9. Default fallback (256K)
-    """
+    5. Provider-aware lookups (before generic OpenRouter cache):
+       a. Copilot live /models API
+       b. Nous: live /v1/models probe first (authoritative), then OR
+          cache fallback with suffix/version normalization.  Only
+          portal-derived values are persisted to disk.
+       c. Codex OAuth /models probe
+       d. GMI /models endpoint
+       e. Ollama native /api/show probe (any base_url, provider-agnostic)
+       f. models.dev registry lookup (with :cloud/-cloud suffix fallback)
+    6. OpenRouter live API metadata (unknown providers only; Kimi-family 32k guard)
+    7. Hardcoded defaults (broad family patterns, longest-key-first)
+    8. Local server query (last resort)
+    9. Default fallback (256K)"""
    # 0. Explicit config override — user knows best
    if config_context_length is not None and isinstance(config_context_length, int) and config_context_length > 0:
        return config_context_length
@@ -1306,6 +1504,28 @@ def get_model_context_length(
                    model, base_url, f"{cached:,}",
                )
                _invalidate_cached_context_length(model, base_url)
+            # Invalidate stale 32k cache entries for Kimi-family models.
+            elif cached <= 32768 and _model_name_suggests_kimi(model):
+                logger.info(
+                    "Dropping stale Kimi cache entry %s@%s -> %s (OpenRouter underreport); "
+                    "re-resolving via hardcoded defaults",
+                    model, base_url, f"{cached:,}",
+                )
+                _invalidate_cached_context_length(model, base_url)
+            # Nous Portal: the portal /v1/models endpoint is authoritative.
+            # Bypass the persistent cache so step 5b can always reconcile
+            # against it — this corrects pre-fix entries seeded from the
+            # OR catalog (the same OR underreport class that the Kimi/Qwen
+            # DEFAULT_CONTEXT_LENGTHS overrides exist to mitigate) without
+            # touching the on-disk file when the portal is unreachable.
+            # The in-memory 300s endpoint metadata cache makes the per-call
+            # cost amortise to ~0 within a process.
+            elif _infer_provider_from_url(base_url) == "nous":
+                logger.debug(
+                    "Bypassing persistent cache for %s@%s (Nous portal authoritative)",
+                    model, base_url,
+                )
+                # Fall through; step 5b reconciles and overwrites if portal responds.
            else:
                return cached

@@ -1329,6 +1549,13 @@ def get_model_context_length(
        except ImportError:
            pass  # boto3 not installed — fall through to generic resolution

+    if provider == "novita" or (base_url and base_url_host_matches(base_url, "api.novita.ai")):
+        ctx = _resolve_endpoint_context_length(model, base_url or "https://api.novita.ai/openai/v1", api_key=api_key)
+        if ctx is not None:
+            if base_url:
+                save_context_length(model, base_url, ctx)
+            return ctx
+
    # 2. Active endpoint metadata for truly custom/unknown endpoints.
    # Known providers (Copilot, OpenAI, Anthropic, etc.) skip this — their
    # /models endpoint may report a provider-imposed limit (e.g. Copilot
@@ -1339,6 +1566,13 @@ def get_model_context_length(
        if context_length is not None:
            return context_length
        if not _is_known_provider_base_url(base_url):
+            # 2b. Ollama native /api/show — any URL might be an Ollama server
+            # (local, cloud, or custom hosting).  Non-Ollama servers return
+            # 404/405 quickly.  Fall through on failure.
+            ctx = _query_ollama_api_show(model, base_url, api_key=api_key)
+            if ctx is not None:
+                save_context_length(model, base_url, ctx)
+                return ctx
            # 3. Try querying local server directly
            if is_local_endpoint(base_url):
                local_ctx = _query_local_context_length(model, base_url, api_key=api_key)
@@ -1370,7 +1604,7 @@ def get_model_context_length(
    # (e.g. claude-opus-4.6 is 1M on Anthropic but 128K on GitHub Copilot).
    # If provider is generic (openrouter/custom/empty), try to infer from URL.
    effective_provider = provider
-    if not effective_provider or effective_provider in ("openrouter", "custom"):
+    if not effective_provider or effective_provider in {"openrouter", "custom"}:
        if base_url:
            inferred = _infer_provider_from_url(base_url)
            if inferred:
@@ -1380,7 +1614,7 @@ def get_model_context_length(
    # This catches account-specific models (e.g. claude-opus-4.6-1m) that
    # don't exist in models.dev. For models that ARE in models.dev, this
    # returns the provider-enforced limit which is what users can actually use.
-    if effective_provider in ("copilot", "copilot-acp", "github-copilot"):
+    if effective_provider in {"copilot", "copilot-acp", "github-copilot"}:
        try:
            from hermes_cli.models import get_copilot_model_context
            ctx = get_copilot_model_context(model, api_key=api_key)
@@ -1390,8 +1624,18 @@ def get_model_context_length(
            pass  # Fall through to models.dev

    if effective_provider == "nous":
-        ctx = _resolve_nous_context_length(model)
+        ctx, source = _resolve_nous_context_length(
+            model, base_url=base_url or "", api_key=api_key or ""
+        )
        if ctx:
+            # Persist ONLY portal-derived values.  Caching an OR-fallback
+            # value here would freeze in a wrong number on the first portal
+            # blip / auth glitch and step-1 would short-circuit it forever.
+            # OR's catalog is community-maintained and is precisely why the
+            # Kimi/Qwen DEFAULT_CONTEXT_LENGTHS overrides exist — we don't
+            # want it leaking into the persistent cache for Nous URLs.
+            if base_url and source == "portal":
+                save_context_length(model, base_url, ctx)
            return ctx
    if effective_provider == "openai-codex":
        # Codex OAuth enforces lower context limits than the direct OpenAI
@@ -1408,16 +1652,45 @@ def get_model_context_length(
        ctx = _resolve_endpoint_context_length(model, base_url, api_key=api_key)
        if ctx is not None:
            return ctx
+    # 5e. Ollama native /api/show probe — runs for ANY provider with a
+    # base_url, not just ollama-cloud.  Ollama-compatible servers expose
+    # this endpoint regardless of hostname (local Ollama, Ollama Cloud,
+    # custom Ollama hosting).  The OpenAI-compat /v1/models endpoint
+    # correctly omits context_length per the OpenAI schema, but /api/show
+    # returns the authoritative GGUF model_info.context_length.
+    # For non-Ollama servers (OpenAI, Anthropic, etc.), the POST returns
+    # 404/405 quickly.  Only successful results are persisted, so a
+    # failed probe repeats on every lookup that reaches this step.
+    if base_url:
+        ctx = _query_ollama_api_show(model, base_url, api_key=api_key)
+        if ctx is not None:
+            save_context_length(model, base_url, ctx)
+            return ctx
    if effective_provider:
        from agent.models_dev import lookup_models_dev_context
        ctx = lookup_models_dev_context(effective_provider, model)
        if ctx:
            return ctx

-    # 6. OpenRouter live API metadata (provider-unaware fallback)
-    metadata = fetch_model_metadata()
-    if model in metadata:
-        return metadata[model].get("context_length", DEFAULT_FALLBACK_CONTEXT)
+    # 6. OpenRouter live API metadata — provider-unaware fallback.
+    # Only consulted when the provider is unknown (no effective_provider),
+    # because OpenRouter data is community-maintained and can be incorrect
+    # for models that belong to known providers with curated defaults.
+    if not effective_provider:
+        metadata = fetch_model_metadata()
+        if model in metadata:
+            or_ctx = metadata[model].get("context_length", DEFAULT_FALLBACK_CONTEXT)
+            # Guard against stale OpenRouter metadata for Kimi-family models.
+            if or_ctx <= 32768 and _model_name_suggests_kimi(model):
+                logger.info(
+                    "Rejecting OpenRouter metadata context=%s for %r "
+                    "(Kimi-family underreport); falling through to hardcoded defaults",
+                    or_ctx, model,
+                )
+            else:
+                return or_ctx
+
+    # 7. (reserved)

    # 8. Hardcoded defaults (fuzzy match — longest key first for specificity)
    # Only check `default_model in model` (is the key a substring of the input).
@@ -1480,7 +1753,7 @@ def _count_image_tokens(msg: Dict[str, Any], cost_per_image: int) -> int:
            if not isinstance(part, dict):
                continue
            ptype = part.get("type")
-            if ptype in ("image", "image_url", "input_image"):
+            if ptype in {"image", "image_url", "input_image"}:
                count += 1
    stashed = msg.get("_anthropic_content_blocks") if isinstance(msg, dict) else None
    if isinstance(stashed, list):
@@ -1492,7 +1765,7 @@ def _count_image_tokens(msg: Dict[str, Any], cost_per_image: int) -> int:
        inner = content.get("content")
        if isinstance(inner, list):
            for part in inner:
-                if isinstance(part, dict) and part.get("type") in ("image", "image_url"):
+                if isinstance(part, dict) and part.get("type") in {"image", "image_url"}:
                    count += 1
    return count * cost_per_image

@@ -1514,7 +1787,7 @@ def _estimate_message_chars(msg: Dict[str, Any]) -> int:
                cleaned = []
                for part in v:
                    if isinstance(part, dict):
-                        if part.get("type") in ("image", "image_url", "input_image"):
+                        if part.get("type") in {"image", "image_url", "input_image"}:
                            cleaned.append({"type": part.get("type"), "image": "[stripped]"})
                        else:
                            cleaned.append(part)
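The Nous changes in this file combine three rules: consult the portal before OpenRouter, reject OpenRouter values of 32k or less for Kimi-family names, and persist only portal-derived values. The minimal sketch below shows how those rules fit together. It assumes injected stand-ins for the portal probe, the OpenRouter catalog, and the on-disk cache. `resolve_nous`, `lookup_and_maybe_persist` and the example URL are hypothetical, and the OpenRouter suffix/version matching is omitted.

```python
from typing import Callable, Dict, Optional, Tuple

PortalLookup = Callable[[str], Optional[int]]


def looks_like_kimi(model: str) -> bool:
    # Same test as _model_name_suggests_kimi in the diff.
    lower = model.lower()
    return lower.startswith("kimi") or "moonshot" in lower


def resolve_nous(
    model: str, portal: PortalLookup, or_catalog: Dict[str, Dict[str, int]]
) -> Tuple[Optional[int], str]:
    """Return (context_length, source); source is "portal", "openrouter" or ""."""
    ctx = portal(model)
    if ctx is not None:
        return ctx, "portal"
    or_ctx = or_catalog.get(model, {}).get("context_length")
    if or_ctx is not None and not (or_ctx <= 32768 and looks_like_kimi(model)):
        return or_ctx, "openrouter"
    return None, ""


def lookup_and_maybe_persist(
    model: str,
    base_url: str,
    portal: PortalLookup,
    or_catalog: Dict[str, Dict[str, int]],
    disk_cache: Dict[Tuple[str, str], int],
) -> Optional[int]:
    ctx, source = resolve_nous(model, portal, or_catalog)
    # Only the authoritative portal value is written to the persistent cache;
    # an OpenRouter fallback is returned but never frozen on disk.
    if ctx and base_url and source == "portal":
        disk_cache[(model, base_url)] = ctx
    return ctx


def portal_down(model: str) -> Optional[int]:
    return None  # simulates an unreachable portal


catalog = {
    "qwen3.6-plus": {"context_length": 1_000_000},  # OR overreport cited in the diff
    "kimi-k2.6": {"context_length": 32_768},        # stale Kimi underreport
}
portal_up: PortalLookup = {"qwen3.6-plus": 262_144}.get
cache: Dict[Tuple[str, str], int] = {}
url = "https://portal.example/v1"  # placeholder, not the real Nous URL

print(lookup_and_maybe_persist("qwen3.6-plus", url, portal_down, catalog, cache), len(cache))  # 1000000 0
print(lookup_and_maybe_persist("qwen3.6-plus", url, portal_up, catalog, cache), len(cache))    # 262144 1
print(lookup_and_maybe_persist("kimi-k2.6", url, portal_down, catalog, cache))                 # None
```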
--- a/agent/models_dev.py
+++ b/agent/models_dev.py
@@ -141,11 +141,14 @@ class ProviderInfo:
 # Hermes provider names → models.dev provider IDs
 PROVIDER_TO_MODELS_DEV: Dict[str, str] = {
    "openrouter": "openrouter",
+    "novita": "novita-ai",
    "anthropic": "anthropic",
    "openai": "openai",
    "openai-codex": "openai",
    "zai": "zai",
+    "kimi": "kimi-for-coding",
    "kimi-coding": "kimi-for-coding",
+    "moonshot": "kimi-for-coding",
    "stepfun": "stepfun",
    "kimi-coding-cn": "kimi-for-coding",
    "minimax": "minimax",
@@ -197,6 +200,32 @@ def _load_disk_cache() -> Dict[str, Any]:
    return {}


+def _disk_cache_age_seconds() -> Optional[float]:
+    """Return age (in seconds) of the disk cache file, or None if missing.
+
+    Used by ``fetch_models_dev`` to short-circuit the network probe when
+    a recent on-disk cache exists. Errors (missing file, permission
+    denied, weird filesystem) all return None — callers fall through
+    to the network fetch path.
+    """
+    try:
+        cache_path = _get_cache_path()
+        if not cache_path.exists():
+            return None
+        mtime = cache_path.stat().st_mtime
+        age = time.time() - mtime
+        # Negative age means the file's mtime is in the future (clock skew
+        # or system clock reset). Treat as "unknown freshness" → fall
+        # through to network so we don't serve potentially-bad data
+        # forever.
+        if age < 0:
+            return None
+        return age
+    except Exception as e:
+        logger.debug("Failed to stat models.dev disk cache: %s", e)
+        return None
+
+
 def _save_disk_cache(data: Dict[str, Any]) -> None:
    """Save models.dev data to disk cache atomically."""
    try:
@@ -207,13 +236,29 @@ def _save_disk_cache(data: Dict[str, Any]) -> None:


 def fetch_models_dev(force_refresh: bool = False) -> Dict[str, Any]:
-    """Fetch models.dev registry. In-memory cache (1hr) + disk fallback.
+    """Fetch models.dev registry. Cache hierarchy: in-mem → disk → network.

    Returns the full registry dict keyed by provider ID, or empty dict on failure.
+
+    Cache hierarchy (when ``force_refresh=False``):
+      1. In-memory cache, populated and < TTL old → return immediately.
+      2. **Disk cache file < TTL old by mtime → load, populate in-mem, return.**
+         No network call. Saves ~500 ms per cold-start agent construction;
+         ``models.dev`` only changes when providers add new models, so a
+         1 hour staleness window is acceptable (same TTL as in-mem cache).
+      3. Network fetch → on success, save to disk + in-mem and return.
+      4. Network fails → fall back to ANY available disk cache (even stale)
+         with a short 5 min in-mem grace period before retrying network.
+
+    When ``force_refresh=True`` (used by ``hermes config refresh``, the
+    "refresh model catalog" code path), stages 1 and 2 are skipped. The
+    function always hits the network and only falls back to disk if the
+    network call fails.
    """
    global _models_dev_cache, _models_dev_cache_time

-    # Check in-memory cache
+    # Stage 1: fresh in-memory cache wins. This is the hot path on
+    # long-lived processes — no I/O, no system calls.
    if (
        not force_refresh
        and _models_dev_cache
@@ -221,7 +266,27 @@ def fetch_models_dev(force_refresh: bool = False) -> Dict[str, Any]:
    ):
        return _models_dev_cache

-    # Try network fetch
+    # Stage 2: fresh-by-mtime disk cache short-circuits the network call.
+    # Only kicks in on cold-start processes (in-mem cache is empty or
+    # expired) and only when the user hasn't asked for a forced refresh.
+    # Skipped if the disk cache file is missing, unreadable, or older
+    # than _MODELS_DEV_CACHE_TTL.
+    if not force_refresh:
+        disk_age = _disk_cache_age_seconds()
+        if disk_age is not None and disk_age < _MODELS_DEV_CACHE_TTL:
+            disk_data = _load_disk_cache()
+            if disk_data:
+                _models_dev_cache = disk_data
+                # Anchor in-mem TTL to the disk file's age so we don't
+                # extend an already-aging cache by another full hour.
+                _models_dev_cache_time = time.time() - disk_age
+                logger.debug(
+                    "Loaded models.dev from fresh disk cache "
+                    "(%d providers, age=%.0fs)", len(disk_data), disk_age,
+                )
+                return _models_dev_cache
+
+    # Stage 3: network fetch.
    try:
        response = requests.get(MODELS_DEV_URL, timeout=15)
        response.raise_for_status()
@@ -239,8 +304,9 @@ def fetch_models_dev(force_refresh: bool = False) -> Dict[str, Any]:
    except Exception as e:
        logger.debug("Failed to fetch models.dev: %s", e)

-    # Fall back to disk cache — use a short TTL (5 min) so we retry
-    # the network fetch soon instead of serving stale data for a full hour.
+    # Stage 4: network failed — fall back to whatever disk cache exists,
+    # even if it's stale. Give it a short 5 min in-mem TTL so we retry
+    # the network soon instead of serving stale data for a full hour.
    if not _models_dev_cache:
        _models_dev_cache = _load_disk_cache()
        if _models_dev_cache:
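The four-stage hierarchy in the `fetch_models_dev` docstring can be modelled in a small standalone class. The sketch below is a hypothetical `TieredCache`, not the module's code. It assumes a JSON file on disk and an injected `fetch` callable in place of the `requests` call, and it tracks the stale-fallback lifetime with an explicit `mem_ttl` field rather than the module's globals.

```python
import json
import os
import tempfile
import time
from pathlib import Path
from typing import Any, Callable, Dict, List, Optional

TTL = 3600.0         # freshness window shared by the in-memory and disk tiers
STALE_GRACE = 300.0  # in-memory lifetime of a stale disk fallback


class TieredCache:
    """Hypothetical in-mem -> disk -> network cache modelled on fetch_models_dev."""

    def __init__(self, path: Path, fetch: Callable[[], Dict[str, Any]]) -> None:
        self.path = path
        self.fetch = fetch
        self.mem: Dict[str, Any] = {}
        self.mem_time = 0.0
        self.mem_ttl = TTL

    def _disk_age(self) -> Optional[float]:
        try:
            age = time.time() - self.path.stat().st_mtime
        except OSError:
            return None
        return age if age >= 0 else None  # future mtime: freshness unknown

    def _load_disk(self) -> Dict[str, Any]:
        try:
            return json.loads(self.path.read_text())
        except (OSError, ValueError):
            return {}

    def _save_disk(self, data: Dict[str, Any]) -> None:
        tmp = self.path.with_suffix(".tmp")
        try:
            tmp.write_text(json.dumps(data))
            os.replace(tmp, self.path)  # atomic rename: no half-written file
        except OSError:
            pass  # a failed write only costs a future network fetch

    def get(self, force_refresh: bool = False) -> Dict[str, Any]:
        now = time.time()
        # Stage 1: fresh in-memory copy, no I/O at all.
        if not force_refresh and self.mem and now - self.mem_time < self.mem_ttl:
            return self.mem
        # Stage 2: disk file fresh by mtime; anchor the in-mem age to the file's.
        if not force_refresh:
            age = self._disk_age()
            if age is not None and age < TTL:
                data = self._load_disk()
                if data:
                    self.mem, self.mem_time, self.mem_ttl = data, now - age, TTL
                    return self.mem
        # Stage 3: network, then write-through to disk.
        try:
            data = self.fetch()
        except Exception:
            data = {}
        if data:
            self._save_disk(data)
            self.mem, self.mem_time, self.mem_ttl = data, now, TTL
            return self.mem
        # Stage 4: any disk copy, even stale, but retry the network soon.
        if not self.mem:
            self.mem = self._load_disk()
            if self.mem:
                self.mem_time, self.mem_ttl = now, STALE_GRACE
        return self.mem


calls: List[int] = []


def fake_fetch() -> Dict[str, Any]:
    calls.append(1)
    return {"openai": {"models": {}}}


with tempfile.TemporaryDirectory() as d:
    path = Path(d) / "models_dev.json"
    TieredCache(path, fake_fetch).get()  # cold start: network + disk write
    ten_seconds_ago = time.time() - 10
    os.utime(path, (ten_seconds_ago, ten_seconds_ago))
    TieredCache(path, fake_fetch).get()  # "new process": fresh disk, no network
    TieredCache(path, fake_fetch).get(force_refresh=True)  # forced: network
    print(len(calls))  # 2
```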
@@ -284,6 +350,28 @@ def lookup_models_dev_context(provider: str, model: str) -> Optional[int]:
            if ctx:
                return ctx

+    # Suffix-aware fallback: some providers (e.g. ollama-cloud) store
+    # model IDs with :cloud / -cloud suffixes in models.dev while the
+    # live API returns bare names.  Without this, kimi-k2.6 misses the
+    # kimi-k2.6:cloud entry and falls through to stale OpenRouter metadata
+    # reporting 32768 — tripping the 64k minimum-context guard.
+    # The suffix-stripping in fetch_ollama_cloud_models() handles the
+    # model-picker UX; this handles the context-length lookup path.
+    for suffix in (":cloud", "-cloud"):
+        suffixed_key = model + suffix
+        entry = models.get(suffixed_key)
+        if entry:
+            ctx = _extract_context(entry)
+            if ctx:
+                return ctx
+        # Also try case-insensitive
+        suffixed_lower = model_lower + suffix
+        for mid, mdata in models.items():
+            if mid.lower() == suffixed_lower:
+                ctx = _extract_context(mdata)
+                if ctx:
+                    return ctx
+
    return None


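The suffix-aware fallback reduces to a small standalone function. The sketch below, `lookup_context`, is hypothetical. It works on a flattened `{model_id: context_length}` map rather than models.dev entries (the real code goes through `_extract_context`). Apart from `kimi-k2.6:cloud`, the catalog IDs are made up, and the context values are illustrative.

```python
from typing import Dict, Optional

SUFFIXES = (":cloud", "-cloud")


def lookup_context(models: Dict[str, int], model: str) -> Optional[int]:
    """Try the bare ID, then each cloud suffix; exact match before case-insensitive."""
    lowered = {mid.lower(): ctx for mid, ctx in models.items()}
    for candidate in (model, *(model + suffix for suffix in SUFFIXES)):
        ctx = models.get(candidate) or lowered.get(candidate.lower())
        if ctx:
            return ctx
    return None


catalog = {
    "kimi-k2.6:cloud": 262_144,
    "Example-Model-cloud": 131_072,  # made-up ID to exercise the case-insensitive path
}
print(lookup_context(catalog, "kimi-k2.6"))      # 262144
print(lookup_context(catalog, "example-model"))  # 131072
print(lookup_context(catalog, "unknown-model"))  # None
```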
--- a/agent/moonshot_schema.py
+++ b/agent/moonshot_schema.py
@@ -15,6 +15,18 @@ and MoonshotAI/kimi-cli#1595:
 2. When ``anyOf`` is used, ``type`` must be on the ``anyOf`` children, not
   the parent.  Presence of both causes "type should be defined in anyOf
   items instead of the parent schema".
+3. ``enum`` arrays on scalar-typed nodes may not contain ``null`` or empty
+   strings.  Strip those entries (drop the enum entirely if it becomes empty).
+4. ``$ref`` nodes may not carry sibling keywords.  Moonshot expands the
+   reference before validation and then rejects the node if sibling keys
+   like ``description`` remain on the same node as ``$ref``.  Strip every
+   sibling from ``$ref`` nodes so only ``{"$ref": "..."}`` survives.
+   (Ported from anomalyco/opencode#24730.)
+5. ``items`` may not be a tuple-style array (``items: [schemaA, schemaB]``
+   for positional element schemas).  Moonshot's schema engine requires a
+   single object schema applied to every array element.  Collapse tuple
+   ``items`` to the first element schema (or ``{}`` if the tuple is empty).
+   (Ported from anomalyco/opencode#24730.)

 The ``#/definitions/...`` → ``#/$defs/...`` rewrite for draft-07 refs is
 handled separately in ``tools/mcp_tool._normalize_mcp_input_schema`` so it
@@ -66,6 +78,16 @@ def _repair_schema(node: Any, is_schema: bool = True) -> Any:
            }
        elif key in _SCHEMA_LIST_KEYS and isinstance(value, list):
            repaired[key] = [_repair_schema(v, is_schema=True) for v in value]
+        elif key == "items" and isinstance(value, list):
+            # Rule 5: tuple-style ``items`` arrays (positional element
+            # schemas) are not accepted by Moonshot.  Collapse to the
+            # first element schema if present, else to ``{}``.  This
+            # matches opencode's behaviour for moonshotai / kimi models.
+            first = value[0] if value else {}
+            if isinstance(first, dict):
+                repaired[key] = _repair_schema(first, is_schema=True)
+            else:
+                repaired[key] = first
        elif key in _SCHEMA_NODE_KEYS:
            # items / not / additionalProperties: single nested schema.
            # additionalProperties can also be a bool — leave those alone.
@@ -122,7 +144,7 @@ def _repair_schema(node: Any, is_schema: bool = True) -> Any:
    # empty, drop it entirely.
    if "enum" in repaired and isinstance(repaired["enum"], list):
        node_type = repaired.get("type")
        if node_type in ("string", "integer", "number", "boolean"):
            cleaned = [v for v in repaired["enum"]
                       if v is not None and v != ""]
            if cleaned:
@@ -130,12 +152,21 @@ def _repair_schema(node: Any, is_schema: bool = True) -> Any:
            else:
                repaired.pop("enum")

+    # Rule 4: $ref nodes must not have sibling keywords.  Moonshot expands
+    # the reference before validation and then rejects the node if siblings
+    # like ``description`` / ``type`` / ``default`` appear alongside $ref.
+    # The referenced definition still carries its own description on the
+    # target node, which Moonshot accepts.
+    # (Ported from anomalyco/opencode#24730.)
+    if "$ref" in repaired:
+        return {"$ref": repaired["$ref"]}
+
    return repaired


 def _fill_missing_type(node: Dict[str, Any]) -> Dict[str, Any]:
    """Infer a reasonable ``type`` if this schema node has none."""
    if "type" in node and node["type"] not in (None, ""):
        return node

    # Heuristic: presence of ``properties`` → object, ``items`` → array, ``enum``
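Rules 3 to 5 from the module docstring can be exercised on a toy schema. The sketch below is a hypothetical `repair_for_moonshot` that implements only those three rules (not rules 1 and 2 or the draft-07 `$ref` rewrite). Its key groupings are assumptions rather than the module's `_SCHEMA_NODE_KEYS` / `_SCHEMA_LIST_KEYS`. It also uses tuple membership for the scalar-type check, because JSON Schema allows `"type"` to be a list, and testing a list against a set literal raises `TypeError: unhashable type`.

```python
from typing import Any, Dict

SCALAR_TYPES = ("string", "integer", "number", "boolean")
SINGLE_SCHEMA_KEYS = ("items", "additionalProperties", "not")
SCHEMA_LIST_KEYS = ("anyOf", "oneOf", "allOf")
SCHEMA_MAP_KEYS = ("properties", "$defs")


def repair_for_moonshot(node: Any) -> Any:
    """Hypothetical subset of _repair_schema implementing rules 3, 4 and 5 only."""
    if not isinstance(node, dict):
        return node  # e.g. additionalProperties: false
    # Rule 4: a $ref node keeps nothing but the reference itself.
    if "$ref" in node:
        return {"$ref": node["$ref"]}

    out: Dict[str, Any] = {}
    for key, value in node.items():
        if key == "items" and isinstance(value, list):
            # Rule 5: collapse tuple-style items to the first element schema.
            out[key] = repair_for_moonshot(value[0]) if value else {}
        elif key in SINGLE_SCHEMA_KEYS:
            out[key] = repair_for_moonshot(value)
        elif key in SCHEMA_LIST_KEYS and isinstance(value, list):
            out[key] = [repair_for_moonshot(v) for v in value]
        elif key in SCHEMA_MAP_KEYS and isinstance(value, dict):
            out[key] = {name: repair_for_moonshot(s) for name, s in value.items()}
        else:
            out[key] = value

    # Rule 3: scalar enums may not contain null or "" (drop an emptied enum).
    # Tuple membership keeps list-valued "type" from raising TypeError.
    if isinstance(out.get("enum"), list) and out.get("type") in SCALAR_TYPES:
        cleaned = [v for v in out["enum"] if v is not None and v != ""]
        if cleaned:
            out["enum"] = cleaned
        else:
            del out["enum"]
    return out


schema = {
    "type": "object",
    "properties": {
        "mode": {"type": "string", "enum": ["fast", "", None]},
        "target": {"$ref": "#/$defs/Target", "description": "dropped"},
        "pair": {"type": "array", "items": [{"type": "integer"}, {"type": "string"}]},
    },
}
print(repair_for_moonshot(schema)["properties"]["mode"])  # {'type': 'string', 'enum': ['fast']}
```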
--- a/agent/plugin_llm.py
+++ b/agent/plugin_llm.py
--- a/agent/portal_tags.py
+++ b/agent/portal_tags.py
@@ -0,0 +1,64 @@
+"""Centralized Nous Portal request tags.
+
+Every Hermes request that hits the Nous Portal — main agent loop, auxiliary
+client (compression / titles / vision / web_extract / session_search / etc.),
+and any future code path — must carry the same product-attribution tags so
+Nous can attribute usage to Hermes Agent and bucket it by client release.
+
+Tag shape (sent in OpenAI-compatible ``extra_body['tags']``):
+
+    [
+        "product=hermes-agent",
+        "client=hermes-client-v<__version__>",
+    ]
+
+The version is sourced live from ``hermes_cli.__version__`` so it auto-aligns
+to whatever release is installed; the release script
+(``scripts/release.py``) regex-bumps that single string, and every Portal
+request picks up the new tag on the next process start.
+
+Why one helper instead of inlining the literal at each site:
+* Four call sites (main loop profile, aux client, run_agent compression
+  fallback, web_tools fallback) used to drift apart: PR #24194, for example,
+  updated only the aux site, leaving the main loop on a different tag set.
+* Tests should assert the same tag list everywhere; centralizing makes that
+  assertion a one-liner against this module.
+
+Do NOT pre-compute these as module-level constants in the consumers. The
+version can change at runtime (editable installs, hot-reload tooling), and
+``hermes_cli.__version__`` is the canonical source of truth.
+"""
+
+from __future__ import annotations
+
+from typing import List
+
+
+def _hermes_version() -> str:
+    """Return the current Hermes release version, e.g. ``"0.13.0"``.
+
+    Falls back to ``"unknown"`` if ``hermes_cli`` cannot be imported (should
+    never happen in a real install — guarded for defensive testing).
+    """
+    try:
+        from hermes_cli import __version__
+        return __version__
+    except Exception:
+        return "unknown"
+
+
+def hermes_client_tag() -> str:
+    """Return the ``client=...`` tag for Nous Portal requests.
+
+    Format: ``client=hermes-client-v<MAJOR>.<MINOR>.<PATCH>``.
+    """
+    return f"client=hermes-client-v{_hermes_version()}"
+
+
+def nous_portal_tags() -> List[str]:
+    """Return the canonical list of Nous Portal product tags.
+
+    Always returns a fresh list so callers can mutate it freely
+    (e.g. ``merged_extra.setdefault("tags", []).extend(nous_portal_tags())``).
+    """
+    return ["product=hermes-agent", hermes_client_tag()]
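The `nous_portal_tags()` docstring hints at how call sites consume the list. Below is a minimal sketch of one such call site. It assumes a hypothetical `merge_portal_tags` helper and a stand-in `portal_tags_for(version)` that takes the version as a parameter, whereas the real helper reads `hermes_cli.__version__`. The sketch copies the caller's `extra_body` and tag list rather than mutating them, and it skips tags that are already present. That de-duplication is this sketch's own choice, not necessarily what the real call sites do. The `feature=compression` tag is made up.

```python
from typing import Any, Dict, List, Optional


def portal_tags_for(version: str) -> List[str]:
    # Stand-in for nous_portal_tags(); takes the version as an argument so the
    # sketch runs without hermes_cli installed.
    return ["product=hermes-agent", f"client=hermes-client-v{version}"]


def merge_portal_tags(extra_body: Optional[Dict[str, Any]], version: str) -> Dict[str, Any]:
    """Return a copy of ``extra_body`` whose "tags" list includes the portal tags."""
    merged = dict(extra_body or {})
    tags = list(merged.get("tags") or [])  # copy: never mutate the caller's list
    for tag in portal_tags_for(version):
        if tag not in tags:  # idempotent if the same body is merged twice
            tags.append(tag)
    merged["tags"] = tags
    return merged


base = {"temperature": 0.2, "tags": ["feature=compression"]}
print(merge_portal_tags(base, "0.13.0")["tags"])
# ['feature=compression', 'product=hermes-agent', 'client=hermes-client-v0.13.0']
```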
--- a/agent/process_bootstrap.py
+++ b/agent/process_bootstrap.py
@@ -0,0 +1,167 @@
+"""Process-level bootstrap helpers for ``run_agent``.
+
+Three concerns, all tied to ``AIAgent`` boot-time / runtime IO setup:
+
+1. **Lazy OpenAI SDK import** — ``_load_openai_cls`` + ``_OpenAIProxy``
+   defer the roughly 240 ms ``from openai import OpenAI`` cost until first use,
+   while preserving ``isinstance(client, OpenAI)`` checks and
+   ``patch("run_agent.OpenAI", ...)`` test patterns.
+
+2. **Crash-resistant stdio** — ``_SafeWriter`` wraps stdout/stderr so
+   ``OSError: Input/output error`` from broken pipes (systemd, Docker,
+   thread teardown races) cannot crash the agent.  ``_install_safe_stdio``
+   applies the wrapper.
+
+3. **HTTP proxy resolution** — ``_get_proxy_from_env`` reads
+   ``HTTPS_PROXY`` / ``HTTP_PROXY`` / ``ALL_PROXY``;
+   ``_get_proxy_for_base_url`` respects ``NO_PROXY`` for the given base URL.
+
+``run_agent`` re-exports every name so existing
+``from run_agent import _get_proxy_from_env`` imports keep working
+unchanged.
+"""
+
+from __future__ import annotations
+
+import os
+import sys
+import urllib.request
+from typing import Optional
+
+from utils import base_url_hostname, normalize_proxy_url
+
+
+# Cached at module level so we only pay the OpenAI SDK import cost once
+# per process (after the first lazy load).
+_OPENAI_CLS_CACHE = None
+
+
+def _load_openai_cls() -> type:
+    """Import and cache ``openai.OpenAI``."""
+    global _OPENAI_CLS_CACHE
+    if _OPENAI_CLS_CACHE is None:
+        from openai import OpenAI as _cls
+        _OPENAI_CLS_CACHE = _cls
+    return _OPENAI_CLS_CACHE
+
+
+class _OpenAIProxy:
+    """Module-level proxy that looks like ``openai.OpenAI`` but imports lazily."""
+
+    __slots__ = ()
+
+    def __call__(self, *args, **kwargs):
+        return _load_openai_cls()(*args, **kwargs)
+
+    def __instancecheck__(self, obj):
+        return isinstance(obj, _load_openai_cls())
+
+    def __repr__(self):
+        return "<lazy openai.OpenAI proxy>"
+
+
+class _SafeWriter:
+    """Transparent stdio wrapper that catches OSError/ValueError from broken pipes.
+
+    When hermes-agent runs as a systemd service, Docker container, or headless
+    daemon, the stdout/stderr pipe can become unavailable (idle timeout, buffer
+    exhaustion, socket reset). Any print() call then raises
+    ``OSError: [Errno 5] Input/output error``, which can crash agent setup or
+    run_conversation() — especially via double-fault when an except handler
+    also tries to print.
+
+    Additionally, when subagents run in ThreadPoolExecutor threads, the shared
+    stdout handle can close between thread teardown and cleanup, raising
+    ``ValueError: I/O operation on closed file`` instead of OSError.
+
+    This wrapper delegates all writes to the underlying stream and silently
+    catches both OSError and ValueError. It is transparent when the wrapped
+    stream is healthy.
+    """
+
+    __slots__ = ("_inner",)
+
+    def __init__(self, inner):
+        object.__setattr__(self, "_inner", inner)
+
+    def write(self, data):
+        try:
+            return self._inner.write(data)
+        except (OSError, ValueError):
+            return len(data) if isinstance(data, str) else 0
+
+    def flush(self):
+        try:
+            self._inner.flush()
+        except (OSError, ValueError):
+            pass
+
+    def fileno(self):
+        return self._inner.fileno()
+
+    def isatty(self):
+        try:
+            return self._inner.isatty()
+        except (OSError, ValueError):
+            return False
+
+    def __getattr__(self, name):
+        return getattr(self._inner, name)
+
+
+def _get_proxy_from_env() -> Optional[str]:
+    """Read proxy URL from environment variables.
+
+    Checks HTTPS_PROXY, HTTP_PROXY, ALL_PROXY (and lowercase variants) in order.
+    Returns the first valid proxy URL found, or None if no proxy is configured.
+    """
+    for key in ("HTTPS_PROXY", "HTTP_PROXY", "ALL_PROXY",
+                "https_proxy", "http_proxy", "all_proxy"):
+        value = os.environ.get(key, "").strip()
+        if value:
+            return normalize_proxy_url(value)
+    return None
+
+
+def _get_proxy_for_base_url(base_url: Optional[str]) -> Optional[str]:
+    """Return an env-configured proxy unless NO_PROXY excludes this base URL."""
+    proxy = _get_proxy_from_env()
+    if not proxy or not base_url:
+        return proxy
+
+    host = base_url_hostname(base_url)
+    if not host:
+        return proxy
+
+    try:
+        if urllib.request.proxy_bypass_environment(host):
+            return None
+    except Exception:
+        pass
+
+    return proxy
+
+
+def _install_safe_stdio() -> None:
+    """Wrap stdout/stderr so best-effort console output cannot crash the agent."""
+    for stream_name in ("stdout", "stderr"):
+        stream = getattr(sys, stream_name, None)
+        if stream is not None and not isinstance(stream, _SafeWriter):
+            setattr(sys, stream_name, _SafeWriter(stream))
+
+
+# Module-level proxy instance — drops in for ``openai.OpenAI``.  Imported as
+# ``from agent.process_bootstrap import OpenAI`` (or re-exported via
+# ``run_agent`` for legacy tests).
+OpenAI = _OpenAIProxy()
+
+
+__all__ = [
+    "OpenAI",
+    "_OpenAIProxy",
+    "_load_openai_cls",
+    "_SafeWriter",
+    "_install_safe_stdio",
+    "_get_proxy_from_env",
+    "_get_proxy_for_base_url",
+]
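The lazy ``OpenAI`` proxy in ``process_bootstrap`` works because ``isinstance()`` looks up ``__instancecheck__`` on the *type* of its second argument, so an object instance can stand in for a class. The following self-contained sketch applies the same pattern, using the stdlib ``fractions.Fraction`` as an illustrative stand-in for ``openai.OpenAI``. It is not how Hermes itself is wired.

```python
from typing import Optional

_FRACTION_CLS: Optional[type] = None


def _load_fraction_cls() -> type:
    """Import and cache fractions.Fraction on first use."""
    global _FRACTION_CLS
    if _FRACTION_CLS is None:
        from fractions import Fraction as _cls  # deferred import
        _FRACTION_CLS = _cls
    return _FRACTION_CLS


class _LazyFractionProxy:
    """Looks like the Fraction class but imports it only when first used."""

    __slots__ = ()

    def __call__(self, *args, **kwargs):
        # Calling the proxy constructs a real Fraction.
        return _load_fraction_cls()(*args, **kwargs)

    def __instancecheck__(self, obj):
        # isinstance(x, proxy) calls type(proxy).__instancecheck__(proxy, x).
        return isinstance(obj, _load_fraction_cls())

    def __repr__(self):
        return "<lazy fractions.Fraction proxy>"


Fraction = _LazyFractionProxy()

half = Fraction(1, 2)
print(isinstance(half, Fraction))  # True
print(isinstance(0.5, Fraction))   # False
```

Because ``isinstance`` dispatches on the proxy's type, existing ``isinstance(client, OpenAI)`` checks keep working even though the module-level ``OpenAI`` name is no longer a class.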
--- a/agent/prompt_builder.py
+++ b/agent/prompt_builder.py
@@ -157,6 +157,9 @@ MEMORY_GUIDANCE = (
    "User preferences and recurring corrections matter more than procedural task details.\n"
    "Do NOT save task progress, session outcomes, completed-work logs, or temporary TODO "
    "state to memory; use session_search to recall those from past transcripts. "
+    "Specifically: do not record PR numbers, issue numbers, commit SHAs, 'fixed bug X', "
+    "'submitted PR Y', 'Phase N done', file counts, or any artifact that will be stale "
+    "in 7 days. If a fact will be stale in a week, it does not belong in memory. "
    "If you've discovered a new way to do something, solved a problem that could be "
    "necessary later, save it as a skill with the skill tool.\n"
    "Write memories as declarative facts, not instructions to yourself. "
@@ -203,7 +206,12 @@ KANBAN_GUIDANCE = (
    "files outside it unless the task explicitly asks.\n"
    "3. **Heartbeat on long operations.** Call `kanban_heartbeat(note=...)` "
    "every few minutes during long subprocesses (training, encoding, crawling). "
-    "Skip heartbeats for short tasks.\n"
+    "Skip heartbeats for short tasks. **If your task may run longer than 1 hour, "
+    "you MUST call `kanban_heartbeat` at least once an hour** — the dispatcher "
+    "reclaims tasks running past `kanban.dispatch_stale_timeout_seconds` "
+    "(default 4 hours) when no heartbeat has arrived in the last hour. A "
+    "reclaim re-queues the task as `ready` without penalty (no failure counter "
+    "tick), but you lose your current run's progress.\n"
    "4. **Block on genuine ambiguity.** If you need a human decision you cannot "
    "infer (missing credentials, UX choice, paywalled source, peer output you "
    "need first), call `kanban_block(reason=\"...\")` and stop. Don't guess. "
@@ -213,7 +221,15 @@ KANBAN_GUIDANCE = (
    "artifacts. `metadata` is machine-readable facts "
    "(`{changed_files: [...], tests_run: N, decisions: [...]}`). Downstream "
    "workers read both via their own `kanban_show`. Never put secrets / "
-    "tokens / raw PII in either field — run rows are durable forever.\n"
+    "tokens / raw PII in either field — run rows are durable forever. "
+    "Exception: if your output is a code change that needs human review "
+    "before counting as merged/done (most coding tasks), drop the "
+    "structured metadata (changed_files / tests_run / diff_path) into a "
+    "`kanban_comment` first, then end with "
+    "`kanban_block(reason=\"review-required: <one-line summary>\")` so a "
+    "reviewer can approve+unblock or request changes. Reviewing-then-"
+    "completing is more honest than auto-completing work that still needs "
+    "eyes on it.\n"
    "6. **If follow-up work appears, create it; don't do it.** Use "
    "`kanban_create(title=..., assignee=<right-profile>, parents=[your-task-id])` "
    "to spawn a child task for the appropriate specialist profile instead of "
@@ -257,12 +273,16 @@ TOOL_USE_ENFORCEMENT_GUIDANCE = (

 # Model name substrings that trigger tool-use enforcement guidance.
 # Add new patterns here when a model family needs explicit steering.
-TOOL_USE_ENFORCEMENT_MODELS = ("gpt", "codex", "gemini", "gemma", "grok")
+TOOL_USE_ENFORCEMENT_MODELS = ("gpt", "codex", "gemini", "gemma", "grok", "glm", "qwen", "deepseek")

 # OpenAI GPT/Codex-specific execution guidance.  Addresses known failure modes
 # where GPT models abandon work on partial results, skip prerequisite lookups,
 # hallucinate instead of using tools, and declare "done" without verification.
 # Inspired by patterns from OpenAI's GPT-5.4 prompting guide & OpenClaw PR #38953.
+# Also applied to xAI Grok — same failure modes in practice (claims completion
+# without tool calls, suggests workarounds instead of using existing tools,
+# replies with plans/suggestions instead of executing). The body is
+# family-agnostic; the OPENAI_ prefix reflects origin, not exclusivity.
 OPENAI_MODEL_EXECUTION_GUIDANCE = (
    "# Execution discipline\n"
    "<tool_persistence>\n"
@@ -408,6 +428,23 @@ PLATFORM_HINTS = {
        "files arrive as downloadable documents. You can also include image "
        "URLs in markdown format ![alt](url) and they will be sent as photos."
    ),
+    "whatsapp_cloud": (
+        "You are on a text messaging communication platform, WhatsApp "
+        "(via Meta's official Business Cloud API). Standard markdown "
+        "(**bold**, ~~strike~~, # headers, [links](url)) is auto-converted "
+        "to WhatsApp's native syntax (*bold*, ~strike~, etc.) — feel free "
+        "to write in markdown. Tables are NOT supported — prefer bullet "
+        "lists or labeled key:value pairs. "
+        "You can send media files natively: include MEDIA:/absolute/path/to/file "
+        "in your response. Images (.jpg, .png) become photo attachments, "
+        "videos (.mp4) play inline, audio (.mp3, .ogg) is sent as voice/audio "
+        "messages, other files arrive as documents. Image URLs in markdown "
+        "format ![alt](url) also work. "
+        "IMPORTANT: this platform has a 24-hour conversation window — if the "
+        "user hasn't messaged in 24h, free-form replies are refused by Meta "
+        "(error 131047). This rarely matters for live chat, but is worth "
+        "knowing if you're scheduling a delayed message."
+    ),
    "telegram": (
        "You are on a text messaging communication platform, Telegram. "
        "Standard markdown is automatically converted to Telegram format. "
@@ -1259,13 +1296,13 @@ def build_nous_subscription_prompt(valid_tool_names: "set[str] | None" = None) -

    lines = [
        "# Nous Subscription",
-        "Nous subscription includes managed web tools (Firecrawl), image generation (FAL), OpenAI TTS, and browser automation (Browser Use) by default. Modal execution is optional.",
+        "Nous subscription includes managed web tools (Firecrawl), image generation (FAL), OpenAI TTS, OpenAI Whisper STT, and browser automation (Browser Use) by default. Modal execution is optional.",
        "Current capability status:",
    ]
    lines.extend(_status_line(feature) for feature in features.items())
    lines.extend(
        [
-            "When a Nous-managed feature is active, do not ask the user for Firecrawl, FAL, OpenAI TTS, or Browser-Use API keys.",
+            "When a Nous-managed feature is active, do not ask the user for Firecrawl, FAL, OpenAI TTS, OpenAI Whisper, or Browser-Use API keys.",
            "If the user is not subscribed and asks for a capability that Nous subscription would unlock or simplify, suggest Nous subscription as one option alongside direct setup or local alternatives.",
            "Do not mention subscription unless the user asks about it or it directly solves the current missing capability.",
            "Useful commands: hermes setup, hermes setup tools, hermes setup terminal, hermes status.",
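``TOOL_USE_ENFORCEMENT_MODELS`` is described as a tuple of model-name substrings. The matching code is not part of this diff, so the helper below is a hypothetical sketch. It assumes a case-insensitive substring test, and the model ids are illustrative.

```python
from typing import Tuple

# Copied from the prompt_builder change above.
TOOL_USE_ENFORCEMENT_MODELS: Tuple[str, ...] = (
    "gpt", "codex", "gemini", "gemma", "grok", "glm", "qwen", "deepseek",
)


def needs_tool_use_enforcement(model_name: str) -> bool:
    """Hypothetical helper: True if any trigger substring occurs in the model id."""
    name = model_name.lower()  # assumption: matching ignores case
    return any(pattern in name for pattern in TOOL_USE_ENFORCEMENT_MODELS)


for model in ("openai/gpt-5.4", "z-ai/GLM-4.6", "anthropic/claude-sonnet-4"):
    print(model, needs_tool_use_enforcement(model))
```

Under this assumption, substring matching covers every variant of a family (``qwen3-coder``, ``qwen-max``) without listing each one. The cost is that it also matches any id that happens to contain a trigger string.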
--- a/agent/prompt_caching.py
+++ b/agent/prompt_caching.py
@@ -1,9 +1,9 @@
-"""Anthropic prompt caching (system_and_3 strategy).
+"""Anthropic prompt caching strategy.

-Reduces input token costs by ~75% on multi-turn conversations by caching
-the conversation prefix. Uses 4 cache_control breakpoints (Anthropic max):
-  1. System prompt (stable across all turns)
-  2-4. Last 3 non-system messages (rolling window)
+Single layout: ``system_and_3``. 4 cache_control breakpoints — system
+prompt + last 3 non-system messages, all at the same TTL (5m or 1h).
+Reduces input token costs by ~75% on multi-turn conversations within a
+single session.

 Pure functions -- no class state, no AIAgent dependency.
 """
@@ -38,6 +38,14 @@ def _apply_cache_marker(msg: dict, cache_marker: dict, native_anthropic: bool =
            last["cache_control"] = cache_marker


+def _build_marker(ttl: str) -> Dict[str, str]:
+    """Build a cache_control marker dict for the given TTL ('5m' or '1h')."""
+    marker: Dict[str, str] = {"type": "ephemeral"}
+    if ttl == "1h":
+        marker["ttl"] = "1h"
+    return marker
+
+
 def apply_anthropic_cache_control(
    api_messages: List[Dict[str, Any]],
    cache_ttl: str = "5m",
@@ -45,7 +53,8 @@ def apply_anthropic_cache_control(
 ) -> List[Dict[str, Any]]:
    """Apply system_and_3 caching strategy to messages for Anthropic models.

-    Places up to 4 cache_control breakpoints: system prompt + last 3 non-system messages.
+    Places up to 4 cache_control breakpoints: system prompt + last 3 non-system
+    messages, all at the same TTL.

    Returns:
        Deep copy of messages with cache_control breakpoints injected.
@@ -54,9 +63,7 @@ def apply_anthropic_cache_control(
    if not messages:
        return messages

-    marker = {"type": "ephemeral"}
-    if cache_ttl == "1h":
-        marker["ttl"] = "1h"
+    marker = _build_marker(cache_ttl)

    breakpoints_used = 0

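The following is a simplified sketch of the ``system_and_3`` layout that ``apply_anthropic_cache_control`` describes: one marker on the system prompt plus one on each of the last three non-system messages, all sharing the TTL from ``_build_marker``. For brevity, it sets ``cache_control`` directly on each message dict. That placement is an assumption: the real ``_apply_cache_marker`` (partly shown above) can attach the marker to the last content block instead.

```python
import copy
from typing import Any, Dict, List


def build_marker(ttl: str) -> Dict[str, str]:
    # Same shape as _build_marker: "ttl" is only set for the 1h variant.
    marker: Dict[str, str] = {"type": "ephemeral"}
    if ttl == "1h":
        marker["ttl"] = "1h"
    return marker


def apply_system_and_3(
    messages: List[Dict[str, Any]], ttl: str = "5m"
) -> List[Dict[str, Any]]:
    """Mark the system prompt plus the last 3 non-system messages (max 4)."""
    out = copy.deepcopy(messages)  # leave the caller's list untouched
    marker = build_marker(ttl)
    if out and out[0].get("role") == "system":
        out[0]["cache_control"] = dict(marker)
    non_system = [m for m in out if m.get("role") != "system"]
    for msg in non_system[-3:]:  # rolling window over the newest turns
        msg["cache_control"] = dict(marker)
    return out


msgs = [
    {"role": "system", "content": "You are Hermes."},
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello"},
    {"role": "user", "content": "what's 2+2?"},
    {"role": "assistant", "content": "4"},
]
marked = apply_system_and_3(msgs, ttl="1h")
print(["cache_control" in m for m in marked])  # [True, False, True, True, True]
```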
--- a/agent/redact.py
+++ b/agent/redact.py
@@ -64,7 +64,7 @@ _SENSITIVE_BODY_KEYS = frozenset({
 # cli.py) or `HERMES_REDACT_SECRETS=false` in ~/.hermes/.env. An opt-out
 # warning is logged at gateway and CLI startup so operators see the
 # downgrade — see `_log_redaction_status()` in gateway/run.py and cli.py.
-_REDACT_ENABLED = os.getenv("HERMES_REDACT_SECRETS", "true").lower() in ("1", "true", "yes", "on")
+_REDACT_ENABLED = os.getenv("HERMES_REDACT_SECRETS", "true").lower() in {"1", "true", "yes", "on"}

 # Known API key prefixes -- match the prefix + contiguous token chars
 _PREFIX_PATTERNS = [
@@ -103,6 +103,7 @@ _PREFIX_PATTERNS = [
    r"hsk-[A-Za-z0-9]{10,}",            # Hindsight API key
    r"mem0_[A-Za-z0-9]{10,}",           # Mem0 Platform API key
    r"brv_[A-Za-z0-9]{10,}",            # ByteRover API key
+    r"xai-[A-Za-z0-9]{30,}",            # xAI (Grok) API key
 ]

 # ENV assignment patterns: KEY=value where KEY contains a secret-like name
@@ -320,6 +321,15 @@ def redact_sensitive_text(text: str, *, force: bool = False, code_file: bool = F
    patterns when the text is known to be source code (e.g. MAX_TOKENS=***
    constants, "apiKey": "test" fixtures). Prefix patterns, auth headers,
    private keys, DB connstrings, JWTs, and URL secrets are still redacted.
+
+    Performance: each regex pattern is gated behind a cheap substring
+    pre-check (e.g. ``"=" in text`` for ENV assignments, ``"://" in text``
+    for URLs, ``"eyJ" in text`` for JWTs). On a typical hermes log line
+    (no secrets) this drops the 13-pattern scan from ~5.6us to ~1.8us per
+    record (-68%). The pre-checks are conservative — false positives
+    still run the full regex, which then doesn't match. False negatives
+    are impossible because every regex requires the gated substring to
+    match.
    """
    if text is None:
        return None
@@ -330,68 +340,122 @@ def redact_sensitive_text(text: str, *, force: bool = False, code_file: bool = F
    if not (force or _REDACT_ENABLED):
        return text

-    # Known prefixes (sk-, ghp_, etc.)
-    text = _PREFIX_RE.sub(lambda m: _mask_token(m.group(1)), text)
+    # Known prefixes (sk-, ghp_, etc.) — gate on substring presence
+    if _has_known_prefix_substring(text):
+        text = _PREFIX_RE.sub(lambda m: _mask_token(m.group(1)), text)

    # ENV assignments: OPENAI_API_KEY=***  (skip for code files — false positives)
    if not code_file:
-        def _redact_env(m):
-            name, quote, value = m.group(1), m.group(2), m.group(3)
-            return f"{name}={quote}{_mask_token(value)}{quote}"
-        text = _ENV_ASSIGN_RE.sub(_redact_env, text)
+        if "=" in text:
+            def _redact_env(m):
+                name, quote, value = m.group(1), m.group(2), m.group(3)
+                return f"{name}={quote}{_mask_token(value)}{quote}"
+            text = _ENV_ASSIGN_RE.sub(_redact_env, text)

        # JSON fields: "apiKey": "***"  (skip for code files — false positives)
-        def _redact_json(m):
-            key, value = m.group(1), m.group(2)
-            return f'{key}: "{_mask_token(value)}"'
-        text = _JSON_FIELD_RE.sub(_redact_json, text)
+        if ":" in text and '"' in text:
+            def _redact_json(m):
+                key, value = m.group(1), m.group(2)
+                return f'{key}: "{_mask_token(value)}"'
+            text = _JSON_FIELD_RE.sub(_redact_json, text)

-    # Authorization headers
-    text = _AUTH_HEADER_RE.sub(
-        lambda m: m.group(1) + _mask_token(m.group(2)),
-        text,
-    )
+    # Authorization headers: _AUTH_HEADER_RE ("Authorization: Bearer ...") is
+    # case-insensitive, so gate on a lowercased copy; a fixed-case substring
+    # check would miss mixed-case spellings such as "AUTHorization".
+    if "authorization" in text.lower():
+        text = _AUTH_HEADER_RE.sub(
+            lambda m: m.group(1) + _mask_token(m.group(2)),
+            text,
+        )

-    # Telegram bot tokens
-    def _redact_telegram(m):
-        prefix = m.group(1) or ""
-        digits = m.group(2)
-        return f"{prefix}{digits}:***"
-    text = _TELEGRAM_RE.sub(_redact_telegram, text)
+    # Telegram bot tokens — pattern requires ":<token>" with digits prefix
+    if ":" in text:
+        def _redact_telegram(m):
+            prefix = m.group(1) or ""
+            digits = m.group(2)
+            return f"{prefix}{digits}:***"
+        text = _TELEGRAM_RE.sub(_redact_telegram, text)

    # Private key blocks
-    text = _PRIVATE_KEY_RE.sub("[REDACTED PRIVATE KEY]", text)
+    if "BEGIN" in text and "-----" in text:
+        text = _PRIVATE_KEY_RE.sub("[REDACTED PRIVATE KEY]", text)

    # Database connection string passwords
-    text = _DB_CONNSTR_RE.sub(lambda m: f"{m.group(1)}***{m.group(3)}", text)
+    if "://" in text:
+        text = _DB_CONNSTR_RE.sub(lambda m: f"{m.group(1)}***{m.group(3)}", text)

    # JWT tokens (eyJ... — base64-encoded JSON headers)
-    text = _JWT_RE.sub(lambda m: _mask_token(m.group(0)), text)
+    if "eyJ" in text:
+        text = _JWT_RE.sub(lambda m: _mask_token(m.group(0)), text)

    # URL userinfo (http(s)://user:pass@host) — redact for non-DB schemes.
    # DB schemes are handled above by _DB_CONNSTR_RE.
-    text = _redact_url_userinfo(text)
+    if "://" in text:
+        text = _redact_url_userinfo(text)

-    # URL query params containing opaque tokens (?access_token=…&code=…)
-    text = _redact_url_query_params(text)
+        # URL query params containing opaque tokens (?access_token=…&code=…)
+        if "?" in text:
+            text = _redact_url_query_params(text)

    # Form-urlencoded bodies (only triggers on clean k=v&k=v inputs).
-    text = _redact_form_body(text)
+    if "&" in text and "=" in text:
+        text = _redact_form_body(text)

    # Discord user/role mentions (<@snowflake_id>)
-    text = _DISCORD_MENTION_RE.sub(lambda m: f"<@{'!' if '!' in m.group(0) else ''}***>", text)
+    if "<@" in text:
+        text = _DISCORD_MENTION_RE.sub(lambda m: f"<@{'!' if '!' in m.group(0) else ''}***>", text)

    # E.164 phone numbers (Signal, WhatsApp)
-    def _redact_phone(m):
-        phone = m.group(1)
-        if len(phone) <= 8:
-            return phone[:2] + "****" + phone[-2:]
-        return phone[:4] + "****" + phone[-4:]
-    text = _SIGNAL_PHONE_RE.sub(_redact_phone, text)
+    if "+" in text:
+        def _redact_phone(m):
+            phone = m.group(1)
+            if len(phone) <= 8:
+                return phone[:2] + "****" + phone[-2:]
+            return phone[:4] + "****" + phone[-4:]
+        text = _SIGNAL_PHONE_RE.sub(_redact_phone, text)

    return text


+# Substrings used to gate ``_PREFIX_RE`` execution. If none of these appear in
+# the input string, the prefix regex cannot match anything, so we skip it.
+# False positives are fine (they just run the regex, which then matches
+# nothing) — the bound is "no false negatives" and that holds because every
+# pattern in ``_PREFIX_PATTERNS`` has at least one of these as a literal
+# substring of its leading characters.
+#
+# Derived automatically from ``_PREFIX_PATTERNS`` at module load time so a
+# future PR that adds a new prefix to the regex list can't silently break
+# the screen.
+
+def _extract_literal_prefix(pattern: str) -> str:
+    """Return the leading literal characters of a regex pattern.
+
+    Stops at the first metacharacter (``[ ( \\ . ? * + | { ^ $``), dropping
+    the char before ``?``/``*``/``{`` (it may occur zero times) and returning
+    "" if the pattern has ``|``, so the pre-screen never gives false negatives.
+    """
+    meta = "[(\\.?*+|{^$"
+    for i, ch in enumerate(pattern):
+        if ch in meta:
+            keep = i - 1 if ch in "?*{" and i else i  # drop an optional char
+            return "" if "|" in pattern else pattern[:keep]
+    return pattern
+
+
+_PREFIX_SUBSTRINGS = tuple(
+    _extract_literal_prefix(p) for p in _PREFIX_PATTERNS
+)
+
+
+def _has_known_prefix_substring(text: str) -> bool:
+    """Return True if ``text`` contains any known credential prefix substring.
+
+    Used as a cheap pre-check before invoking the expensive ``_PREFIX_RE``.
+    """
+    return any(p in text for p in _PREFIX_SUBSTRINGS)
+
+
 class RedactingFormatter(logging.Formatter):
    """Log formatter that redacts secrets from all log messages."""

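The no-false-negative guarantee for ``_PREFIX_SUBSTRINGS`` holds only if each extracted literal really appears in every match. The sketch below shows a conservative extractor that uses two patterns from this diff plus a hypothetical ``ghps?_`` pattern. A character followed by ``?``, ``*`` or ``{m,n}`` may occur zero times and must be dropped. A ``|`` leaves no single required literal, so the extractor returns the empty string.

```python
import re
from typing import Tuple

# Two patterns from the diff plus a hypothetical one with an optional "s".
PATTERNS = [
    r"hsk-[A-Za-z0-9]{10,}",
    r"xai-[A-Za-z0-9]{30,}",
    r"ghps?_[A-Za-z0-9]{10,}",
]
PREFIX_RE = re.compile("|".join(f"(?:{p})" for p in PATTERNS))
META = "[(\\.?*+|{^$"


def literal_prefix(pattern: str) -> str:
    """Leading literal that every match of ``pattern`` must contain."""
    if "|" in pattern:
        return ""  # alternation: no single literal is required
    for i, ch in enumerate(pattern):
        if ch in META:
            # ?, * and {m,n} may allow zero copies of the previous char: drop it.
            return pattern[: i - 1] if ch in "?*{" and i > 0 else pattern[:i]
    return pattern


SCREEN: Tuple[str, ...] = tuple(literal_prefix(p) for p in PATTERNS)


def may_contain_secret(text: str) -> bool:
    """Cheap pre-check: only run PREFIX_RE when this returns True."""
    return any(s in text for s in SCREEN)


line = "token=ghp_ABCDEFGHIJKLMNOP"
print(SCREEN)  # ('hsk-', 'xai-', 'ghp')
print(may_contain_secret(line), PREFIX_RE.search(line) is not None)  # True True
```

If the extractor stopped at ``?`` without dropping the preceding ``s``, it would produce ``ghps``. The screen would then skip ``ghp_...`` tokens that the full regex matches.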
--- a/agent/secret_sources/__init__.py
+++ b/agent/secret_sources/__init__.py
@@ -0,0 +1,13 @@
+"""External secret source integrations.
+
+A secret source is anything that can supply environment-variable-shaped
+credentials at process startup, _after_ ~/.hermes/.env has loaded.  By
+default sources are non-destructive: they only set values for env vars
+that aren't already present, so .env and shell exports continue to win.
+
+Currently shipped:
+
+  - ``bitwarden`` — Bitwarden Secrets Manager (`bws` CLI).  See
+    ``agent.secret_sources.bitwarden`` for the integration and
+    ``hermes_cli.secrets_cli`` for the user-facing setup wizard.
+"""
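The package docstring says secret sources are non-destructive: they fill in only variables that are not already present. The sketch below illustrates that rule. It assumes a plain mapping in place of ``os.environ`` and uses illustrative variable names. The ``applied``/``skipped`` split mirrors the ``FetchResult`` fields defined in ``bitwarden.py``, but the helper itself is hypothetical.

```python
import os
from typing import Dict, List, Mapping, MutableMapping, Tuple


def apply_non_destructive(
    secrets: Mapping[str, str],
    environ: MutableMapping[str, str] = os.environ,
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: set each secret only if the name is not present yet.

    Returns (applied, skipped).
    """
    applied: List[str] = []
    skipped: List[str] = []
    for name, value in secrets.items():
        if name in environ:
            skipped.append(name)  # .env or a shell export already won
        else:
            environ[name] = value
            applied.append(name)
    return applied, skipped


env: Dict[str, str] = {"OPENAI_API_KEY": "from-dotenv"}
applied, skipped = apply_non_destructive(
    {"OPENAI_API_KEY": "from-vault", "FIRECRAWL_API_KEY": "from-vault"}, env
)
print(applied, skipped)       # ['FIRECRAWL_API_KEY'] ['OPENAI_API_KEY']
print(env["OPENAI_API_KEY"])  # from-dotenv
```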
--- a/agent/secret_sources/bitwarden.py
+++ b/agent/secret_sources/bitwarden.py
@@ -0,0 +1,515 @@
+"""Bitwarden Secrets Manager (`bws` CLI) integration.
+
+Hermes pulls API keys from Bitwarden Secrets Manager at process startup
+so they don't have to live in plaintext in ``~/.hermes/.env``.
+
+Design summary
+--------------
+
+* The ``bws`` binary is auto-installed into ``<hermes_home>/bin/bws`` on
+  first use.  Hermes pins one version (``_BWS_VERSION``) and downloads
+  the matching asset from the official GitHub Releases page, verifying
+  the SHA-256 against the release's published checksum file.
+* The access token is stored in ``~/.hermes/.env`` as
+  ``BWS_ACCESS_TOKEN`` (or whatever name the user picked in
+  ``secrets.bitwarden.access_token_env``).  This is the one
+  bootstrap secret — every other provider key can live in Bitwarden.
+* Pulling secrets is a single ``bws secret list <project_id>
+  --output json`` call.  We cache the result in-process for
+  ``cache_ttl_seconds`` so repeated loads within one process (gateway
+  hot-reload, test suites) don't hammer the API.
+* Failures NEVER block Hermes startup.  Missing binary, no network,
+  expired token, etc. all emit a one-line warning and continue with
+  whatever credentials ``.env`` already had.
+
+The module is intentionally subprocess-driven rather than going through
+the ``bitwarden-sdk-secrets`` Python package: one cross-platform binary
+is easier to lazy-install than a wheels-with-Rust-extension dependency.
+"""
+
+from __future__ import annotations
+
+import hashlib
+import json
+import logging
+import os
+import platform
+import shutil
+import stat
+import subprocess
+import sys
+import tempfile
+import time
+import urllib.error
+import urllib.request
+import zipfile
+from dataclasses import dataclass, field
+from pathlib import Path
+from typing import Dict, List, Optional, Tuple
+
+logger = logging.getLogger(__name__)
+
+
+# ---------------------------------------------------------------------------
+# Configuration constants
+# ---------------------------------------------------------------------------
+
+# Pinned upstream version.  Bump in a follow-up PR — never auto-resolve
+# "latest" because upstream release shape (asset names, CLI flags) is
+# allowed to change between majors and we want updates to be deliberate.
+_BWS_VERSION = "2.0.0"
+
+_BWS_RELEASE_BASE = (
+    f"https://github.com/bitwarden/sdk-sm/releases/download/bws-v{_BWS_VERSION}"
+)
+_BWS_CHECKSUM_NAME = f"bws-sha256-checksums-{_BWS_VERSION}.txt"
+
+# How long to wait for bws subprocesses and HTTP downloads, in seconds.
+_BWS_DOWNLOAD_TIMEOUT = 60
+_BWS_RUN_TIMEOUT = 30
+
+# In-process cache so repeated load_hermes_dotenv() calls (CLI startup,
+# gateway hot-reload, test suites) don't re-fetch from BSM.
+_CacheKey = Tuple[str, str]  # (access_token_fingerprint, project_id)
+_CACHE: Dict[_CacheKey, "_CachedFetch"] = {}
+
+
+@dataclass
+class _CachedFetch:
+    secrets: Dict[str, str]
+    fetched_at: float
+
+    def is_fresh(self, ttl_seconds: float) -> bool:
+        if ttl_seconds <= 0:
+            return False
+        return (time.time() - self.fetched_at) < ttl_seconds
+
+
+# ---------------------------------------------------------------------------
+# Public dataclasses
+# ---------------------------------------------------------------------------
+
+
+@dataclass
+class FetchResult:
+    """Outcome of a single BSM pull."""
+
+    secrets: Dict[str, str] = field(default_factory=dict)
+    applied: List[str] = field(default_factory=list)   # set into os.environ
+    skipped: List[str] = field(default_factory=list)   # already set, not overridden
+    warnings: List[str] = field(default_factory=list)  # non-fatal issues
+    error: Optional[str] = None                        # fatal: nothing was fetched
+    binary_path: Optional[Path] = None
+
+    @property
+    def ok(self) -> bool:
+        return self.error is None
+
+
+# ---------------------------------------------------------------------------
+# Binary discovery + lazy install
+# ---------------------------------------------------------------------------
+
+
+def _hermes_bin_dir() -> Path:
+    """Where Hermes stores its managed binaries.  Profile-aware."""
+    from hermes_constants import get_hermes_home
+
+    return get_hermes_home() / "bin"
+
+
+def find_bws(*, install_if_missing: bool = False) -> Optional[Path]:
+    """Return a path to a usable ``bws`` binary, or None.
+
+    Resolution order:
+      1. ``<hermes_home>/bin/bws``  (our managed copy — preferred)
+      2. ``shutil.which("bws")``    (system PATH)
+
+    When ``install_if_missing`` is True and neither resolves, this calls
+    :func:`install_bws` to download and verify the pinned version.
+    """
+    managed = _hermes_bin_dir() / _platform_binary_name()
+    if managed.exists() and os.access(managed, os.X_OK):
+        return managed
+
+    system = shutil.which("bws")
+    if system:
+        return Path(system)
+
+    if install_if_missing:
+        try:
+            return install_bws()
+        except Exception as exc:  # noqa: BLE001 — never block startup
+            logger.warning("bws auto-install failed: %s", exc)
+            return None
+    return None
+
+
+def _platform_binary_name() -> str:
+    return "bws.exe" if platform.system() == "Windows" else "bws"
+
+
+def _platform_asset_name() -> str:
+    """Map (uname, arch, libc) → the upstream asset filename.
+
+    Asset names follow Rust's target triple convention.  Linux defaults
+    to gnu (glibc); we switch to musl only if ldd --version says so.
+    """
+    system = platform.system()
+    machine = platform.machine().lower()
+
+    if system == "Darwin":
+        # Universal binary works on both Intel and Apple Silicon — no
+        # need to pick a per-arch asset.
+        return f"bws-macos-universal-{_BWS_VERSION}.zip"
+
+    if system == "Windows":
+        arch = "aarch64" if machine in ("arm64", "aarch64") else "x86_64"
+        return f"bws-{arch}-pc-windows-msvc-{_BWS_VERSION}.zip"
+
+    if system == "Linux":
+        arch = "aarch64" if machine in ("arm64", "aarch64") else "x86_64"
+        libc = "gnu"
+        # ldd --version writes to stdout on glibc, stderr on musl.  We
+        # don't need bullet-proof detection — getting it wrong falls
+        # back to a clear error from the binary loader, which we catch.
+        try:
+            res = subprocess.run(
+                ["ldd", "--version"],
+                capture_output=True,
+                text=True,
+                timeout=2,
+            )
+            if "musl" in (res.stdout + res.stderr).lower():
+                libc = "musl"
+        except (OSError, subprocess.TimeoutExpired):
+            pass
+        return f"bws-{arch}-unknown-linux-{libc}-{_BWS_VERSION}.zip"
+
+    raise RuntimeError(
+        f"Unsupported platform for bws auto-install: {system} {machine}"
+    )
+
+
+def install_bws(*, force: bool = False) -> Path:
+    """Download, verify, and install the pinned ``bws`` binary.
+
+    Returns the path to the installed executable.  Raises on any
+    failure (network, checksum, extraction) — callers in the auto-install
+    path catch these; the user-facing ``hermes secrets bitwarden setup``
+    surface lets them propagate so the wizard can show a clear error.
+    """
+    bin_dir = _hermes_bin_dir()
+    bin_dir.mkdir(parents=True, exist_ok=True)
+    target = bin_dir / _platform_binary_name()
+
+    if target.exists() and not force:
+        return target
+
+    asset_name = _platform_asset_name()
+    asset_url = f"{_BWS_RELEASE_BASE}/{asset_name}"
+    checksum_url = f"{_BWS_RELEASE_BASE}/{_BWS_CHECKSUM_NAME}"
+
+    with tempfile.TemporaryDirectory(prefix="hermes-bws-") as tmpdir:
+        tmp = Path(tmpdir)
+        zip_path = tmp / asset_name
+        checksum_path = tmp / _BWS_CHECKSUM_NAME
+
+        logger.info("Downloading %s", asset_url)
+        _http_download(asset_url, zip_path)
+        _http_download(checksum_url, checksum_path)
+
+        expected = _expected_sha256(checksum_path, asset_name)
+        actual = _sha256_file(zip_path)
+        if expected.lower() != actual.lower():
+            raise RuntimeError(
+                f"Checksum mismatch for {asset_name}: "
+                f"expected {expected}, got {actual}"
+            )
+
+        with zipfile.ZipFile(zip_path) as zf:
+            member = _pick_zip_member(zf, _platform_binary_name())
+            zf.extract(member, tmp)
+            extracted = tmp / member
+
+        # Move into place atomically.  We write to a sibling tempfile in
+        # the final directory so the rename can't cross filesystems.
+        fd, staged = tempfile.mkstemp(dir=str(bin_dir), prefix=".bws_")
+        os.close(fd)
+        shutil.copy2(extracted, staged)
+        os.chmod(
+            staged,
+            stat.S_IRUSR | stat.S_IWUSR | stat.S_IXUSR
+            | stat.S_IRGRP | stat.S_IXGRP
+            | stat.S_IROTH | stat.S_IXOTH,
+        )
+        os.replace(staged, target)
+
+    logger.info("Installed bws %s at %s", _BWS_VERSION, target)
+    return target
+
+
+def _http_download(url: str, dest: Path) -> None:
+    req = urllib.request.Request(url, headers={"User-Agent": "hermes-agent"})
+    try:
+        with urllib.request.urlopen(req, timeout=_BWS_DOWNLOAD_TIMEOUT) as resp:  # noqa: S310
+            with open(dest, "wb") as f:
+                shutil.copyfileobj(resp, f)
+    except urllib.error.URLError as exc:
+        raise RuntimeError(f"Failed to download {url}: {exc}") from exc
+
+
+def _expected_sha256(checksum_file: Path, asset_name: str) -> str:
+    """Parse the upstream ``bws-sha256-checksums-X.Y.Z.txt`` file.
+
+    Format is the standard ``sha256sum`` output: ``<hex>  <filename>``,
+    one per line.
+    """
+    text = checksum_file.read_text(encoding="utf-8", errors="replace")
+    for line in text.splitlines():
+        parts = line.strip().split()
+        if len(parts) >= 2 and parts[-1] == asset_name:
+            return parts[0]
+    raise RuntimeError(
+        f"No checksum entry for {asset_name} in {checksum_file.name}"
+    )
+
+
+def _sha256_file(path: Path) -> str:
+    h = hashlib.sha256()
+    with open(path, "rb") as f:
+        for chunk in iter(lambda: f.read(65536), b""):
+            h.update(chunk)
+    return h.hexdigest()
+
+
+def _pick_zip_member(zf: zipfile.ZipFile, binary_name: str) -> str:
+    """Find the binary inside the upstream zip.
+
+    Historically the archive has been flat (``bws`` at the root) but we
+    tolerate a top-level directory just in case upstream changes.
+    """
+    candidates = [n for n in zf.namelist() if n.split("/")[-1] == binary_name]
+    if not candidates:
+        raise RuntimeError(
+            f"Could not find {binary_name} inside downloaded archive "
+            f"(members: {zf.namelist()[:5]}...)"
+        )
+    # Prefer the shortest path (i.e. root over nested) for determinism.
+    candidates.sort(key=len)
+    return candidates[0]
+
+
+# ---------------------------------------------------------------------------
+# Secret fetch + apply
+# ---------------------------------------------------------------------------
+
+
+def _token_fingerprint(token: str) -> str:
+    """SHA-256 prefix used as a cache key — never logged, never displayed."""
+    return hashlib.sha256(token.encode("utf-8")).hexdigest()[:16]
+
+
+def fetch_bitwarden_secrets(
+    *,
+    access_token: str,
+    project_id: str,
+    binary: Optional[Path] = None,
+    cache_ttl_seconds: float = 300,
+    use_cache: bool = True,
+) -> Tuple[Dict[str, str], List[str]]:
+    """Pull the secrets for ``project_id`` from Bitwarden Secrets Manager.
+
+    Returns ``(secrets_dict, warnings_list)``.
+
+    Raises :class:`RuntimeError` for fatal conditions (missing binary,
+    auth failure, unparseable output).  Callers in the env_loader path
+    catch this and emit a single warning; callers in the user-facing
+    setup wizard let it propagate.
+    """
+    if not access_token:
+        raise RuntimeError("Bitwarden access token is empty")
+    if not project_id:
+        raise RuntimeError("Bitwarden project_id is empty")
+
+    cache_key = (_token_fingerprint(access_token), project_id)
+    if use_cache:
+        cached = _CACHE.get(cache_key)
+        if cached and cached.is_fresh(cache_ttl_seconds):
+            return cached.secrets, []
+
+    bws = binary or find_bws(install_if_missing=True)
+    if bws is None:
+        raise RuntimeError(
+            "bws binary not available — auto-install failed and `bws` is "
+            "not on PATH.  Install manually from "
+            "https://github.com/bitwarden/sdk-sm/releases or re-run "
+            "`hermes secrets bitwarden setup`."
+        )
+
+    secrets, warnings = _run_bws_list(bws, access_token, project_id)
+    _CACHE[cache_key] = _CachedFetch(secrets=secrets, fetched_at=time.time())
+    return secrets, warnings
+
+
+def _run_bws_list(
+    bws: Path, access_token: str, project_id: str
+) -> Tuple[Dict[str, str], List[str]]:
+    cmd = [str(bws), "secret", "list", project_id, "--output", "json"]
+    env = os.environ.copy()
+    env["BWS_ACCESS_TOKEN"] = access_token
+    # Keep ANSI colour codes out of the JSON output.
+    env.setdefault("NO_COLOR", "1")
+
+    try:
+        proc = subprocess.run(  # noqa: S603 — bws path is trusted
+            cmd,
+            env=env,
+            capture_output=True,
+            text=True,
+            timeout=_BWS_RUN_TIMEOUT,
+        )
+    except subprocess.TimeoutExpired as exc:
+        raise RuntimeError(
+            f"bws timed out after {_BWS_RUN_TIMEOUT}s fetching secrets"
+        ) from exc
+    except OSError as exc:
+        raise RuntimeError(f"failed to invoke bws: {exc}") from exc
+
+    if proc.returncode != 0:
+        # bws writes auth/network errors to stderr in plain English.
+        # Drop raw ESC bytes just in case and surface the first 200 chars.
+        err = (proc.stderr or proc.stdout or "").strip().replace("\x1b", "")
+        raise RuntimeError(
+            f"bws exited {proc.returncode}: {err[:200]}"
+        )
+
+    raw = proc.stdout.strip()
+    if not raw:
+        return {}, ["bws returned no output (empty project?)"]
+
+    try:
+        payload = json.loads(raw)
+    except json.JSONDecodeError as exc:
+        raise RuntimeError(f"bws returned non-JSON output: {exc}") from exc
+
+    if not isinstance(payload, list):
+        raise RuntimeError(
+            f"bws returned unexpected shape: {type(payload).__name__}"
+        )
+
+    secrets: Dict[str, str] = {}
+    warnings: List[str] = []
+    for item in payload:
+        if not isinstance(item, dict):
+            continue
+        key = item.get("key")
+        value = item.get("value")
+        if not isinstance(key, str) or not isinstance(value, str):
+            continue
+        if not _is_valid_env_name(key):
+            warnings.append(
+                f"Skipping secret {key!r}: not a valid env-var name"
+            )
+            continue
+        secrets[key] = value
+    return secrets, warnings
+
+
+def _is_valid_env_name(name: str) -> bool:
+    if not name:
+        return False
+    if not (name[0].isalpha() or name[0] == "_"):
+        return False
+    return all(c.isalnum() or c == "_" for c in name)
+
+
+# ---------------------------------------------------------------------------
+# Public entry point — called from hermes_cli.env_loader
+# ---------------------------------------------------------------------------
+
+
+def apply_bitwarden_secrets(
+    *,
+    enabled: bool,
+    access_token_env: str = "BWS_ACCESS_TOKEN",
+    project_id: str = "",
+    override_existing: bool = False,
+    cache_ttl_seconds: float = 300,
+    auto_install: bool = True,
+) -> FetchResult:
+    """Pull secrets from BSM and set them on ``os.environ``.
+
+    This is the function ``load_hermes_dotenv()`` calls after the .env
+    files have loaded.  It is intentionally defensive — any failure
+    returns a :class:`FetchResult` with ``error`` set; it never raises.
+
+    Parameters mirror the ``secrets.bitwarden.*`` config keys so the
+    caller can just splat the dict in.
+    """
+    result = FetchResult()
+
+    if not enabled:
+        return result
+
+    access_token = os.environ.get(access_token_env, "").strip()
+    if not access_token:
+        result.error = (
+            f"secrets.bitwarden.enabled is true but {access_token_env} is "
+            "not set.  Run `hermes secrets bitwarden setup`."
+        )
+        return result
+
+    if not project_id:
+        result.error = (
+            "secrets.bitwarden.project_id is empty.  "
+            "Run `hermes secrets bitwarden setup`."
+        )
+        return result
+
+    binary = find_bws(install_if_missing=auto_install)
+    result.binary_path = binary
+    if binary is None:
+        result.error = (
+            "bws binary not available (auto-install disabled or failed).  "
+            "Run `hermes secrets bitwarden setup` to install."
+        )
+        return result
+
+    try:
+        secrets, warnings = fetch_bitwarden_secrets(
+            access_token=access_token,
+            project_id=project_id,
+            binary=binary,
+            cache_ttl_seconds=cache_ttl_seconds,
+        )
+    except RuntimeError as exc:
+        result.error = str(exc)
+        return result
+
+    result.secrets = secrets
+    result.warnings.extend(warnings)
+
+    for key, value in secrets.items():
+        if key == access_token_env:
+            # Don't let BSM clobber the very token we used to fetch
+            # itself — that would be a footgun if someone stored the
+            # token as a BSM secret too.
+            result.skipped.append(key)
+            continue
+        if not override_existing and os.environ.get(key):
+            result.skipped.append(key)
+            continue
+        os.environ[key] = value
+        result.applied.append(key)
+
+    return result
+
+
+# ---------------------------------------------------------------------------
+# Test hook — used by hermetic tests to flush the cache between cases.
+# ---------------------------------------------------------------------------
+
+
+def _reset_cache_for_tests() -> None:
+    _CACHE.clear()
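The precedence rules at the end of ``apply_bitwarden_secrets`` can be exercised without Bitwarden at all. The sketch below is a hypothetical stand-alone helper (``merge_secrets`` is not part of the module) that copies the apply loop but writes into a plain dict instead of ``os.environ``, so the skip/apply outcomes are easy to inspect.

```python
from typing import Dict, List, MutableMapping, Tuple


def merge_secrets(
    secrets: Dict[str, str],
    environ: MutableMapping[str, str],
    token_env: str = "BWS_ACCESS_TOKEN",
    override_existing: bool = False,
) -> Tuple[List[str], List[str]]:
    """Hypothetical stand-alone copy of the apply loop.

    Mutates ``environ`` and returns ``(applied, skipped)`` key lists.
    """
    applied: List[str] = []
    skipped: List[str] = []
    for key, value in secrets.items():
        if key == token_env:
            # A copy of the access token stored in BSM never replaces the
            # token that was used to fetch it.
            skipped.append(key)
            continue
        if not override_existing and environ.get(key):
            # A non-empty value from .env or the shell wins by default.
            skipped.append(key)
            continue
        environ[key] = value
        applied.append(key)
    return applied, skipped


env = {"BWS_ACCESS_TOKEN": "live-token", "OPENAI_API_KEY": "from-dotenv", "EMPTY": ""}
applied, skipped = merge_secrets(
    {
        "BWS_ACCESS_TOKEN": "stored-token",
        "OPENAI_API_KEY": "from-bsm",
        "EMPTY": "from-bsm",
        "SLACK_BOT_TOKEN": "from-bsm",
    },
    env,
)
print(applied)  # ['EMPTY', 'SLACK_BOT_TOKEN']
print(skipped)  # ['BWS_ACCESS_TOKEN', 'OPENAI_API_KEY']
```

Because the existing-value check is ``os.environ.get(key)`` truthiness rather than ``key in os.environ``, a variable that is set but empty is treated as unset and gets filled from Bitwarden.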
--- a/agent/shell_hooks.py
+++ b/agent/shell_hooks.py
@@ -83,6 +83,7 @@ logger = logging.getLogger(__name__)
 DEFAULT_TIMEOUT_SECONDS = 60
 MAX_TIMEOUT_SECONDS = 300
 ALLOWLIST_FILENAME = "shell-hooks-allowlist.json"
+_DEFAULT_BLOCK_MESSAGE = "Blocked by shell hook."

 # (event, matcher, command) triples that have been wired to the plugin
 # manager in the current process.  Matcher is part of the key because
@@ -312,7 +313,7 @@ def _parse_single_entry(
        )
        matcher = None

-    if matcher is not None and event not in ("pre_tool_call", "post_tool_call"):
+    if matcher is not None and event not in {"pre_tool_call", "post_tool_call"}:
        logger.warning(
            "hooks.%s[%d].matcher=%r will be ignored at runtime — the "
            "matcher field is only honored for pre_tool_call / "
@@ -423,7 +424,7 @@ def _make_callback(spec: ShellHookSpec) -> Callable[..., Optional[Dict[str, Any]

    def _callback(**kwargs: Any) -> Optional[Dict[str, Any]]:
        # Matcher gate — only meaningful for tool-scoped events.
-        if spec.event in ("pre_tool_call", "post_tool_call"):
+        if spec.event in {"pre_tool_call", "post_tool_call"}:
            if not spec.matches_tool(kwargs.get("tool_name")):
                return None

@@ -481,6 +482,17 @@ def _serialize_payload(event: str, kwargs: Dict[str, Any]) -> str:
    return json.dumps(payload, ensure_ascii=False, default=str)


+def _block_message(primary: Any, secondary: Any) -> str:
+    """Return a validated string block message, falling back to the default.
+
+    Accepts two candidate fields (primary wins over secondary) so callers
+    can express field-priority differences between the two hook wire formats
+    without duplicating the type-check logic.
+    """
+    raw = primary or secondary
+    return raw if isinstance(raw, str) and raw else _DEFAULT_BLOCK_MESSAGE
+
+
 def _parse_response(event: str, stdout: str) -> Optional[Dict[str, Any]]:
    """Translate stdout JSON into a Hermes wire-shape dict.

@@ -515,13 +527,9 @@ def _parse_response(event: str, stdout: str) -> Optional[Dict[str, Any]]:

    if event == "pre_tool_call":
        if data.get("action") == "block":
-            message = data.get("message") or data.get("reason") or ""
-            if isinstance(message, str) and message:
-                return {"action": "block", "message": message}
+            return {"action": "block", "message": _block_message(data.get("message"), data.get("reason"))}
        if data.get("decision") == "block":
-            message = data.get("reason") or data.get("message") or ""
-            if isinstance(message, str) and message:
-                return {"action": "block", "message": message}
+            return {"action": "block", "message": _block_message(data.get("reason"), data.get("message"))}
        return None

    context = data.get("context")
@@ -624,7 +632,10 @@ def _locked_update_approvals() -> Iterator[Dict[str, Any]]:
            yield data
            save_allowlist(data)
        finally:
-            fcntl.flock(lock_fh.fileno(), fcntl.LOCK_UN)
+            try:
+                fcntl.flock(lock_fh.fileno(), fcntl.LOCK_UN)
+            except OSError:
+                pass


 def _prompt_and_record(
@@ -658,7 +669,7 @@ def _prompt_and_record(
        print()  # keep the terminal tidy after ^C
        return False

-    if answer in ("y", "yes"):
+    if answer in {"y", "yes"}:
        _record_approval(event, command)
        return True

@@ -752,13 +763,13 @@ def _resolve_effective_accept(
    if accept_hooks_arg:
        return True
    env = os.environ.get("HERMES_ACCEPT_HOOKS", "").strip().lower()
-    if env in ("1", "true", "yes", "on"):
+    if env in {"1", "true", "yes", "on"}:
        return True
    cfg_val = cfg.get("hooks_auto_accept", False)
    if isinstance(cfg_val, bool):
        return cfg_val
    if isinstance(cfg_val, str):
-        return cfg_val.strip().lower() in ("1", "true", "yes", "on")
+        return cfg_val.strip().lower() in {"1", "true", "yes", "on"}
    return False


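The two ``pre_tool_call`` response shapes differ only in which field wins, which is what ``_block_message`` encodes. A minimal sketch of that branch follows; ``parse_pre_tool_call`` is a hypothetical stand-in, and it assumes invalid JSON or a non-object payload simply yields ``None`` (the earlier part of the real ``_parse_response`` is not shown in this diff).

```python
import json
from typing import Any, Dict, Optional

DEFAULT_BLOCK_MESSAGE = "Blocked by shell hook."


def block_message(primary: Any, secondary: Any) -> str:
    # The primary field wins; a missing, empty, or non-string value
    # falls back to the default text.
    raw = primary or secondary
    return raw if isinstance(raw, str) and raw else DEFAULT_BLOCK_MESSAGE


def parse_pre_tool_call(stdout: str) -> Optional[Dict[str, str]]:
    """Hypothetical: map hook stdout to a block decision or None."""
    try:
        data = json.loads(stdout)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    if data.get("action") == "block":
        # action-style output: "message" first, then "reason".
        msg = block_message(data.get("message"), data.get("reason"))
        return {"action": "block", "message": msg}
    if data.get("decision") == "block":
        # decision-style output: "reason" first, then "message".
        msg = block_message(data.get("reason"), data.get("message"))
        return {"action": "block", "message": msg}
    return None


print(parse_pre_tool_call('{"decision": "block", "reason": "no rm -rf", "message": "x"}'))
# {'action': 'block', 'message': 'no rm -rf'}
print(parse_pre_tool_call('{"action": "block"}'))
# {'action': 'block', 'message': 'Blocked by shell hook.'}
```

With the new helper, a bare ``{"action": "block"}`` blocks with the default text. Under the old code the empty message made the branch fall through to ``return None``, so the tool call was not blocked.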
--- a/agent/skill_bundles.py
+++ b/agent/skill_bundles.py
@@ -0,0 +1,410 @@
+"""Skill bundles — aliases that load multiple skills under one slash command.
+
+A skill bundle is a small YAML file that names a set of skills to load
+together. Invoking ``/<bundle-name>`` from the CLI or gateway loads every
+referenced skill's full content into a single user message, the same way
+``/<skill-name>`` does — but for N skills at once.
+
+Storage
+-------
+Bundles live in ``~/.hermes/skill-bundles/*.yaml`` (and the equivalent
+profile-aware directory under ``HERMES_HOME``). Each file looks like::
+
+    name: backend-dev
+    description: Backend feature work — code review, testing, PR workflow.
+    skills:
+      - github-code-review
+      - test-driven-development
+      - github-pr-workflow
+    instruction: |
+      Optional extra guidance to inject above the skill bodies.
+
+The file's stem is treated as a fallback name when ``name:`` is absent, so
+dropping a YAML into the directory is enough to register a new bundle.
+
+Conflict resolution
+-------------------
+If a bundle and a skill share the same slash name, the bundle wins. The
+slash command dispatch checks bundles first, then falls back to skills.
+This is the intended behavior — a user who names a bundle ``research``
+explicitly wants ``/research`` to mean their bundle, not whatever skill
+happens to share the slug.
+
+Public API
+----------
+- :func:`get_skill_bundles` — return ``{"/slug": bundle_info}``
+- :func:`resolve_bundle_command_key` — map a user-typed command to its slug
+- :func:`build_bundle_invocation_message` — produce the full user message
+- :func:`reload_bundles` — re-scan disk and return a diff
+- :func:`list_bundles` — return rich info for display (``hermes bundles``)
+- :func:`save_bundle` / :func:`delete_bundle` — file-level operations
+"""
+
+from __future__ import annotations
+
+import logging
+import os
+import re
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple
+
+import yaml
+
+from hermes_constants import get_hermes_home
+
+logger = logging.getLogger(__name__)
+
+# Slug normalization — matches agent/skill_commands.py so a bundle and a
+# skill called "Foo Bar" both resolve to "/foo-bar".
+_BUNDLE_INVALID_CHARS = re.compile(r"[^a-z0-9-]")
+_BUNDLE_MULTI_HYPHEN = re.compile(r"-{2,}")
+
+_bundles_cache: Dict[str, Dict[str, Any]] = {}
+_bundles_cache_mtime: Optional[float] = None
+
+
+def _bundles_dir() -> Path:
+    """Return the canonical bundles directory under HERMES_HOME.
+
+    Honors ``HERMES_BUNDLES_DIR`` for tests; falls back to
+    ``<HERMES_HOME>/skill-bundles``.
+    """
+    override = os.environ.get("HERMES_BUNDLES_DIR")
+    if override:
+        return Path(override).expanduser()
+    return get_hermes_home() / "skill-bundles"
+
+
+def _slugify(name: str) -> str:
+    cmd = name.lower().replace(" ", "-").replace("_", "-")
+    cmd = _BUNDLE_INVALID_CHARS.sub("", cmd)
+    cmd = _BUNDLE_MULTI_HYPHEN.sub("-", cmd).strip("-")
+    return cmd
+
+
+def _iter_bundle_files() -> List[Path]:
+    base = _bundles_dir()
+    if not base.exists():
+        return []
+    files: List[Path] = []
+    for ext in ("*.yaml", "*.yml"):
+        files.extend(base.glob(ext))
+    return sorted(files)
+
+
+def _max_mtime(files: List[Path]) -> float:
+    """Highest mtime across the bundle files plus the dir itself.
+
+    Watching the directory mtime catches deletions; watching individual
+    files catches edits. Together they're a cheap freshness check.
+    """
+    base = _bundles_dir()
+    mtimes = []
+    if base.exists():
+        try:
+            mtimes.append(base.stat().st_mtime)
+        except OSError:
+            pass
+    for f in files:
+        try:
+            mtimes.append(f.stat().st_mtime)
+        except OSError:
+            continue
+    return max(mtimes) if mtimes else 0.0
+
+
+def _load_bundle_file(path: Path) -> Optional[Dict[str, Any]]:
+    """Parse a single bundle YAML file. Returns ``None`` on any error.
+
+    Errors are logged at WARNING level. We don't raise — a broken bundle
+    shouldn't take down slash command discovery.
+    """
+    try:
+        raw = path.read_text(encoding="utf-8")
+    except OSError as exc:
+        logger.warning("Could not read bundle %s: %s", path, exc)
+        return None
+    try:
+        data = yaml.safe_load(raw)
+    except yaml.YAMLError as exc:
+        logger.warning("Invalid YAML in bundle %s: %s", path, exc)
+        return None
+    if not isinstance(data, dict):
+        logger.warning("Bundle %s is not a mapping; skipping", path)
+        return None
+
+    name = str(data.get("name") or path.stem).strip()
+    if not name:
+        logger.warning("Bundle %s has no name; skipping", path)
+        return None
+
+    skills = data.get("skills") or []
+    if not isinstance(skills, list) or not skills:
+        logger.warning("Bundle %s has no skills list; skipping", path)
+        return None
+    skills = [str(s).strip() for s in skills if str(s).strip()]
+    if not skills:
+        logger.warning("Bundle %s has empty skills list; skipping", path)
+        return None
+
+    description = str(data.get("description") or "").strip()
+    instruction = str(data.get("instruction") or "").strip()
+
+    slug = _slugify(name)
+    if not slug:
+        logger.warning("Bundle %s yielded empty slug; skipping", path)
+        return None
+
+    return {
+        "name": name,
+        "slug": slug,
+        "description": description or f"Load {len(skills)} skills as a bundle",
+        "skills": skills,
+        "instruction": instruction,
+        "path": str(path),
+    }
+
+
+def scan_bundles() -> Dict[str, Dict[str, Any]]:
+    """Scan the bundles directory and rebuild the cache.
+
+    Returns the same mapping as :func:`get_skill_bundles` — ``"/slug"`` →
+    bundle info dict. Later bundles with a duplicate slug are skipped with
+    a warning (first wins, alphabetical order).
+    """
+    global _bundles_cache, _bundles_cache_mtime
+    files = _iter_bundle_files()
+    out: Dict[str, Dict[str, Any]] = {}
+    for f in files:
+        info = _load_bundle_file(f)
+        if not info:
+            continue
+        key = f"/{info['slug']}"
+        if key in out:
+            logger.warning(
+                "Duplicate bundle slug %s from %s; keeping %s",
+                key, f, out[key]["path"],
+            )
+            continue
+        out[key] = info
+    _bundles_cache = out
+    _bundles_cache_mtime = _max_mtime(files)
+    return out
+
+
+def get_skill_bundles() -> Dict[str, Dict[str, Any]]:
+    """Return the current bundle mapping, rescanning when disk changed.
+
+    Cheap to call repeatedly: only rescans when the bundles directory or
+    any bundle file's mtime is newer than the cached snapshot.
+    """
+    files = _iter_bundle_files()
+    current_mtime = _max_mtime(files)
+    if not _bundles_cache or _bundles_cache_mtime != current_mtime:
+        scan_bundles()
+    return _bundles_cache
+
+
+def resolve_bundle_command_key(command: str) -> Optional[str]:
+    """Resolve a user-typed command to its canonical bundle slash key.
+
+    Hyphens and underscores are treated interchangeably to mirror the
+    skill-command behavior (Telegram converts hyphens to underscores in
+    bot command names).
+    """
+    if not command:
+        return None
+    cmd_key = f"/{command.replace('_', '-')}"
+    return cmd_key if cmd_key in get_skill_bundles() else None
+
+
+def reload_bundles() -> Dict[str, Any]:
+    """Re-scan the bundles directory and return a diff.
+
+    Mirrors :func:`agent.skill_commands.reload_skills` so callers can use
+    the same display logic. Returns a dict with ``added``, ``removed``,
+    ``unchanged``, and ``total`` keys.
+    """
+    def _snapshot(cmds: Dict[str, Dict[str, Any]]) -> Dict[str, str]:
+        return {k.lstrip("/"): (v or {}).get("description", "") for k, v in cmds.items()}
+
+    before = _snapshot(_bundles_cache)
+    new = scan_bundles()
+    after = _snapshot(new)
+
+    added_names = sorted(set(after) - set(before))
+    removed_names = sorted(set(before) - set(after))
+    unchanged = sorted(set(after) & set(before))
+
+    return {
+        "added": [{"name": n, "description": after[n]} for n in added_names],
+        "removed": [{"name": n, "description": before[n]} for n in removed_names],
+        "unchanged": unchanged,
+        "total": len(after),
+    }
+
+
+def list_bundles() -> List[Dict[str, Any]]:
+    """Return a sorted list of bundle info dicts for display."""
+    bundles = get_skill_bundles()
+    return sorted(bundles.values(), key=lambda b: b["slug"])
+
+
+def build_bundle_invocation_message(
+    cmd_key: str,
+    user_instruction: str = "",
+    task_id: str | None = None,
+) -> Optional[Tuple[str, List[str], List[str]]]:
+    """Build the user message content for a bundle slash command invocation.
+
+    Returns ``(message, loaded_skill_names, missing_skill_names)``, or
+    ``None`` if the bundle wasn't found or none of its skills loaded.
+
+    A bundle that references skills the user doesn't have installed still
+    loads — the agent gets a note about which ones were skipped. This is
+    the same forgiving stance ``build_preloaded_skills_prompt`` uses for
+    ``-s`` CLI preloading.
+    """
+    bundles = get_skill_bundles()
+    info = bundles.get(cmd_key)
+    if not info:
+        return None
+
+    # Late import to avoid pulling tools/* at module import time and to
+    # keep skill_bundles cheap to import in test environments.
+    from agent.skill_commands import _load_skill_payload, _build_skill_message
+
+    loaded_names: List[str] = []
+    missing: List[str] = []
+    skill_blocks: List[str] = []
+    seen: set[str] = set()
+
+    bundle_name = info["name"]
+    skills = info["skills"]
+    extra_instruction = info.get("instruction") or ""
+
+    for skill_id in skills:
+        identifier = (skill_id or "").strip()
+        if not identifier or identifier in seen:
+            continue
+        seen.add(identifier)
+
+        loaded = _load_skill_payload(identifier, task_id=task_id)
+        if not loaded:
+            missing.append(identifier)
+            continue
+        loaded_skill, skill_dir, skill_name = loaded
+
+        try:
+            from tools.skill_usage import bump_use
+            bump_use(skill_name)
+        except Exception:  # usage tracking is best-effort
+            pass
+
+        activation_note = (
+            f'[Loaded as part of the "{bundle_name}" skill bundle.]'
+        )
+        skill_blocks.append(
+            _build_skill_message(
+                loaded_skill,
+                skill_dir,
+                activation_note,
+                session_id=task_id,
+            )
+        )
+        loaded_names.append(skill_name)
+
+    if not skill_blocks:
+        return None
+
+    # Header — tells the agent this is a bundle, lists the skills, and
+    # provides any author-supplied instruction.
+    header_lines = [
+        f'[IMPORTANT: The user has invoked the "{bundle_name}" skill bundle, '
+        f"loading {len(loaded_names)} skills together. Treat every skill below "
+        "as active guidance for this turn.]",
+        "",
+        f"Bundle: {bundle_name}",
+        f"Skills loaded: {', '.join(loaded_names)}",
+    ]
+    if missing:
+        header_lines.append(f"Skills missing (skipped): {', '.join(missing)}")
+    if extra_instruction:
+        header_lines.extend(["", f"Bundle instruction: {extra_instruction}"])
+    if user_instruction:
+        header_lines.extend(
+            ["", f"User instruction: {user_instruction}"]
+        )
+
+    header = "\n".join(header_lines)
+    return ("\n\n".join([header, *skill_blocks]), loaded_names, missing)
+
+
+# ---------------------------------------------------------------------------
+# File-level CRUD helpers — used by `hermes bundles` CLI subcommand.
+# ---------------------------------------------------------------------------
+
+
+def bundle_path_for(name: str) -> Path:
+    """Return the canonical filesystem path for a bundle name."""
+    slug = _slugify(name)
+    if not slug:
+        raise ValueError(f"Bundle name {name!r} normalizes to an empty slug")
+    return _bundles_dir() / f"{slug}.yaml"
+
+
+def save_bundle(
+    name: str,
+    skills: List[str],
+    description: str = "",
+    instruction: str = "",
+    overwrite: bool = False,
+) -> Path:
+    """Write a bundle to disk and invalidate the cache.
+
+    Raises ``FileExistsError`` if the target exists and ``overwrite`` is
+    False. Raises ``ValueError`` if the inputs are unusable.
+    """
+    name = (name or "").strip()
+    if not name:
+        raise ValueError("Bundle name is required")
+    cleaned_skills = [str(s).strip() for s in skills if str(s).strip()]
+    if not cleaned_skills:
+        raise ValueError("Bundle must reference at least one skill")
+
+    path = bundle_path_for(name)
+    if path.exists() and not overwrite:
+        raise FileExistsError(f"Bundle already exists at {path}")
+
+    path.parent.mkdir(parents=True, exist_ok=True)
+    payload: Dict[str, Any] = {"name": name, "skills": cleaned_skills}
+    if description:
+        payload["description"] = description
+    if instruction:
+        payload["instruction"] = instruction
+
+    path.write_text(
+        yaml.safe_dump(payload, sort_keys=False, allow_unicode=True),
+        encoding="utf-8",
+    )
+    scan_bundles()  # refresh cache
+    return path
+
+
+def delete_bundle(name: str) -> Path:
+    """Delete a bundle by name. Returns the deleted path.
+
+    Raises ``FileNotFoundError`` if the bundle doesn't exist.
+    """
+    path = bundle_path_for(name)
+    if not path.exists():
+        raise FileNotFoundError(f"No bundle at {path}")
+    path.unlink()
+    scan_bundles()
+    return path
+
+
+def get_bundle(name: str) -> Optional[Dict[str, Any]]:
+    """Look up a bundle by name (slug-normalized)."""
+    slug = _slugify(name)
+    return get_skill_bundles().get(f"/{slug}")
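Bundle and skill commands share one normalization, so it helps to see the rules on concrete input. The sketch below copies ``_slugify`` and the lookup in ``resolve_bundle_command_key`` using only the standard library; ``build_index`` is a hypothetical helper that mimics the first-wins duplicate rule of ``scan_bundles`` on plain names instead of YAML files.

```python
import re
from typing import Dict, List, Optional

_INVALID_CHARS = re.compile(r"[^a-z0-9-]")
_MULTI_HYPHEN = re.compile(r"-{2,}")


def slugify(name: str) -> str:
    # Lowercase, turn spaces and underscores into hyphens, drop any other
    # character, collapse hyphen runs, and trim hyphens at both ends.
    cmd = name.lower().replace(" ", "-").replace("_", "-")
    cmd = _INVALID_CHARS.sub("", cmd)
    return _MULTI_HYPHEN.sub("-", cmd).strip("-")


def build_index(names: List[str]) -> Dict[str, str]:
    # Hypothetical: the first name to claim a slug keeps it; later
    # duplicates and names with an empty slug are skipped.
    index: Dict[str, str] = {}
    for name in names:
        slug = slugify(name)
        if slug and f"/{slug}" not in index:
            index[f"/{slug}"] = name
    return index


def resolve(command: str, index: Dict[str, str]) -> Optional[str]:
    # Underscores (Telegram's form of hyphens) map back to hyphens; the
    # lookup itself is case-sensitive.
    if not command:
        return None
    key = f"/{command.replace('_', '-')}"
    return key if key in index else None


index = build_index(["Backend Dev", "backend_dev", "!!!"])
print(index)                          # {'/backend-dev': 'Backend Dev'}
print(resolve("backend_dev", index))  # /backend-dev
```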
--- a/agent/skill_commands.py
+++ b/agent/skill_commands.py
@@ -58,13 +58,35 @@ def _load_skill_payload(skill_identifier: str, task_id: str | None = None) -> tu

    try:
        from tools.skills_tool import SKILLS_DIR, skill_view
+        from agent.skill_utils import get_external_skills_dirs

        identifier_path = Path(raw_identifier).expanduser()
        if identifier_path.is_absolute():
+            normalized = None
+            trusted_roots = [SKILLS_DIR]
            try:
-                normalized = str(identifier_path.resolve().relative_to(SKILLS_DIR.resolve()))
+                trusted_roots.extend(get_external_skills_dirs())
            except Exception:
-                normalized = raw_identifier
+                pass
+
+            # Prefer the lexical path under a trusted skill root before
+            # resolving symlinks.  Slash-command discovery can legitimately
+            # find a skill via ~/.hermes/skills/<name> where <name> is a
+            # symlink to a checked-out skill elsewhere.  Resolving first turns
+            # that trusted visible path into an arbitrary absolute path that
+            # skill_view() refuses to load.
+            for root in trusted_roots:
+                try:
+                    normalized = str(identifier_path.relative_to(root))
+                    break
+                except ValueError:
+                    continue
+
+            if normalized is None:
+                try:
+                    normalized = str(identifier_path.resolve().relative_to(SKILLS_DIR.resolve()))
+                except Exception:
+                    normalized = raw_identifier
        else:
            normalized = raw_identifier.lstrip("/")

@@ -261,7 +283,7 @@ def scan_skill_commands() -> Dict[str, Dict[str, Any]]:

        for scan_dir in dirs_to_scan:
            for skill_md in iter_skill_index_files(scan_dir, "SKILL.md"):
-                if any(part in ('.git', '.github', '.hub', '.archive') for part in skill_md.parts):
+                if any(part in {'.git', '.github', '.hub', '.archive'} for part in skill_md.parts):
                    continue
                try:
                    content = skill_md.read_text(encoding='utf-8')
@@ -425,7 +447,7 @@ def build_skill_invocation_message(

    loaded = _load_skill_payload(skill_info["skill_dir"], task_id=task_id)
    if not loaded:
-        return f"[Failed to load skill: {skill_info['name']}]"
+        return None

    loaded_skill, skill_dir, skill_name = loaded

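The order of the checks in ``_load_skill_payload`` matters because ``Path.relative_to`` is purely lexical while ``Path.resolve`` follows symlinks. A minimal sketch of the two-step normalization, using a hypothetical ``normalize_skill_path`` helper and made-up POSIX paths (``/home/u/.hermes/skills`` and ``/srv/shared-skills`` stand in for the real roots):

```python
from pathlib import Path
from typing import Iterable


def normalize_skill_path(
    identifier: Path, trusted_roots: Iterable[Path], skills_dir: Path
) -> str:
    # 1. Lexical check first: keep the visible path if it sits under a
    #    trusted root, even if one of its components is a symlink.
    for root in trusted_roots:
        try:
            return str(identifier.relative_to(root))
        except ValueError:
            continue
    # 2. Otherwise compare resolved paths, falling back to the raw input.
    try:
        return str(identifier.resolve().relative_to(skills_dir.resolve()))
    except ValueError:
        return str(identifier)


skills = Path("/home/u/.hermes/skills")
external = Path("/srv/shared-skills")
print(normalize_skill_path(skills / "my-skill", [skills, external], skills))
# my-skill
print(normalize_skill_path(external / "team" / "lint", [skills, external], skills))
# team/lint (on POSIX)
```

Neither call touches the filesystem, since both paths match a trusted root lexically. Resolving first would instead turn a ``my-skill`` symlink into the absolute path of its target, which is the case the source comment says ``skill_view()`` refuses to load.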
--- a/agent/skill_preprocessing.py
+++ b/agent/skill_preprocessing.py
@@ -79,6 +79,14 @@ def run_inline_shell(command: str, cwd: Path | None, timeout: int) -> str:
        return f"[inline-shell timeout after {timeout}s: {command}]"
    except FileNotFoundError:
        return "[inline-shell error: bash not found]"
+    except RuntimeError as exc:
+        # tests/conftest.py installs a live-system guard that blocks real
+        # os.kill on out-of-tree PIDs. subprocess.run(timeout=...) may trip
+        # that guard while trying to clean up the timed-out shell; treat that
+        # as the same timeout outcome instead of surfacing the guard error.
+        if "live-system guard: blocked os.kill" in str(exc):
+            return f"[inline-shell timeout after {timeout}s: {command}]"
+        return f"[inline-shell error: {exc}]"
    except Exception as exc:
        return f"[inline-shell error: {exc}]"

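The new ``except RuntimeError`` branch stops the test guard's error from masking what is really a timeout. The sketch below shows the same exception ladder around ``subprocess.run``; ``run_inline`` and its ``runner`` parameter are hypothetical (the parameter only exists so the guard branch can be triggered without the real ``tests/conftest.py`` guard), and it runs an argv list directly instead of going through ``bash``.

```python
import subprocess
import sys
from typing import Callable, List

GUARD_MARKER = "live-system guard: blocked os.kill"


def run_inline(
    argv: List[str],
    timeout: float,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> str:
    """Hypothetical inline-shell runner mirroring the error mapping."""
    label = " ".join(argv)
    try:
        proc = runner(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return f"[inline-shell timeout after {timeout}s: {label}]"
    except FileNotFoundError:
        return "[inline-shell error: executable not found]"
    except RuntimeError as exc:
        # A guard that blocks the timeout cleanup is reported as the
        # timeout it really was; any other RuntimeError stays an error.
        if GUARD_MARKER in str(exc):
            return f"[inline-shell timeout after {timeout}s: {label}]"
        return f"[inline-shell error: {exc}]"
    except Exception as exc:
        return f"[inline-shell error: {exc}]"
    return proc.stdout.strip()


def guarded_runner(*args, **kwargs):
    # Simulates the conftest guard firing during timeout cleanup.
    raise RuntimeError("live-system guard: blocked os.kill(4242)")


print(run_inline([sys.executable, "-c", "print('hi')"], timeout=5))  # hi
print(run_inline(["sleep", "10"], timeout=1, runner=guarded_runner))
# [inline-shell timeout after 1s: sleep 10]
```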
--- a/agent/skill_utils.py
+++ b/agent/skill_utils.py
@@ -24,7 +24,43 @@ PLATFORM_MAP = {
    "windows": "win32",
 }

-EXCLUDED_SKILL_DIRS = frozenset((".git", ".github", ".hub", ".archive"))
+EXCLUDED_SKILL_DIRS = frozenset(
+    (
+        ".git",
+        ".github",
+        ".hub",
+        ".archive",
+        ".venv",
+        "venv",
+        "node_modules",
+        "site-packages",
+        "__pycache__",
+        ".tox",
+        ".nox",
+        ".pytest_cache",
+        ".mypy_cache",
+        ".ruff_cache",
+    )
+)
+
+
+def is_excluded_skill_path(path) -> bool:
+    """True if any component of *path* is in EXCLUDED_SKILL_DIRS.
+
+    Use this on every SKILL.md path produced by ``rglob`` to prune
+    dependency, virtualenv, VCS, and cache directories. Centralising the
+    check here keeps every skill-scanning site in sync with the shared
+    exclusion set.
+
+    Accepts a Path or string.
+    """
+    try:
+        parts = path.parts  # Path
+    except AttributeError:
+        from pathlib import PurePath
+        parts = PurePath(str(path)).parts
+    return any(part in EXCLUDED_SKILL_DIRS for part in parts)
+

 # ── Lazy YAML loader ─────────────────────────────────────────────────────

@@ -478,7 +514,8 @@ def extract_skill_description(frontmatter: Dict[str, Any]) -> str:
 def iter_skill_index_files(skills_dir: Path, filename: str):
    """Walk skills_dir yielding sorted paths matching *filename*.

-    Excludes ``.git``, ``.github``, ``.hub``, ``.archive`` directories.
+    Excludes Hermes metadata, VCS, virtualenv/dependency, and cache
+    directories so dependencies cannot register nested skills.
    """
    matches = []
    for root, dirs, files in os.walk(skills_dir, followlinks=True):
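The widened exclusion set is most useful when applied during the directory walk, so dependency trees are never entered. A minimal sketch, assuming in-place pruning of ``dirs`` (the body of ``iter_skill_index_files`` is not shown in this diff); ``find_skill_files`` is hypothetical, while ``is_excluded_skill_path`` mirrors the new helper in simplified form.

```python
import os
import tempfile
from pathlib import Path, PurePath
from typing import List

EXCLUDED_SKILL_DIRS = frozenset((
    ".git", ".github", ".hub", ".archive", ".venv", "venv", "node_modules",
    "site-packages", "__pycache__", ".tox", ".nox", ".pytest_cache",
    ".mypy_cache", ".ruff_cache",
))


def is_excluded_skill_path(path) -> bool:
    # Works for str or Path: check every component against the set.
    return any(part in EXCLUDED_SKILL_DIRS for part in PurePath(str(path)).parts)


def find_skill_files(skills_dir: Path, filename: str = "SKILL.md") -> List[Path]:
    """Hypothetical walker that prunes excluded dirs before descending."""
    matches: List[Path] = []
    for root, dirs, files in os.walk(skills_dir, followlinks=True):
        # Assigning to dirs[:] in place stops os.walk from entering them.
        dirs[:] = [d for d in dirs if d not in EXCLUDED_SKILL_DIRS]
        if filename in files:
            matches.append(Path(root) / filename)
    return sorted(matches)


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    for rel in ("a", "a/node_modules/pkg", ".venv/lib", "b"):
        (base / rel).mkdir(parents=True, exist_ok=True)
        (base / rel / "SKILL.md").write_text("name: x\n")
    found = [p.relative_to(base).as_posix() for p in find_skill_files(base)]
    print(found)  # ['a/SKILL.md', 'b/SKILL.md']
```

Pruning by directory name handles the walk itself; ``is_excluded_skill_path`` covers paths that come from elsewhere, such as ``rglob`` results, by checking every component.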
--- a/agent/stream_diag.py
+++ b/agent/stream_diag.py
@@ -0,0 +1,280 @@
+"""Stream diagnostics — per-attempt counters, exception chains, retry logging.
+
+When a streaming chat-completions request dies mid-response, we want to
+know why: which Cloudflare edge served the request, which OpenRouter
+downstream provider answered, how many bytes/chunks we got before the
+drop, the HTTP status, the underlying httpx error class.  These helpers
+collect that info and emit it both to ``agent.log`` (full detail) and to
+the user-facing status line (compact).
+
+All helpers are extracted from :class:`AIAgent` for cleanliness.
+``run_agent`` keeps thin forwarder methods so existing call sites and
+tests that patch ``run_agent.<helper>`` keep working.
+"""
+
+from __future__ import annotations
+
+import logging
+import time
+from typing import Any, Dict, List, Optional
+
+logger = logging.getLogger(__name__)
+
+
+# Per-attempt stream diagnostic headers, all lowercase.  httpx exposes
+# response headers as a case-insensitive ``httpx.Headers`` mapping, so these
+# names match any casing and also give consistent keys in agent.log.
+STREAM_DIAG_HEADERS = (
+    "cf-ray",
+    "cf-cache-status",
+    "x-openrouter-provider",
+    "x-openrouter-model",
+    "x-openrouter-id",
+    "x-request-id",
+    "x-vercel-id",
+    "via",
+    "server",
+    "x-forwarded-for",
+)
+
+
+def stream_diag_init() -> Dict[str, Any]:
+    """Return a fresh per-attempt diagnostic dict.
+
+    Mutated in-place by the streaming functions and read from the retry
+    block when a stream dies.  Lives on ``request_client_holder`` so it
+    survives across the closure boundary.
+    """
+    return {
+        "started_at": time.time(),
+        "first_chunk_at": None,
+        "chunks": 0,
+        "bytes": 0,
+        "headers": {},
+        "http_status": None,
+    }
+
+
+def stream_diag_capture_response(agent: Any, diag: Dict[str, Any], http_response: Any) -> None:
+    """Snapshot interesting headers + HTTP status from the live stream.
+
+    Called once at stream open (before iterating chunks) so the metadata
+    survives even if the stream dies before any chunk arrives.  Failures
+    are swallowed — diag is best-effort.
+    """
+    if http_response is None or not isinstance(diag, dict):
+        return
+    try:
+        diag["http_status"] = getattr(http_response, "status_code", None)
+    except Exception:
+        pass
+    try:
+        headers = getattr(http_response, "headers", None) or {}
+        captured: Dict[str, str] = {}
+        # Allow per-agent override of the headers list (back-compat).
+        target_headers = getattr(agent, "_STREAM_DIAG_HEADERS", STREAM_DIAG_HEADERS)
+        for name in target_headers:
+            try:
+                val = headers.get(name)
+                if val:
+                    # Truncate each value to 120 chars to keep log lines bounded.
+                    captured[name] = str(val)[:120]
+            except Exception:
+                continue
+        diag["headers"] = captured
+    except Exception:
+        pass
+
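This minimal sketch walks through the per-attempt diag lifecycle. It creates the dict, snapshots status and headers at stream open, and then updates the counters as chunks arrive. Several pieces are hypothetical: the fake response (a ``SimpleNamespace`` with a plain ``dict`` for headers), the shortened ``DIAG_HEADERS`` tuple, and the ``on_chunk`` hook. The streaming code that really mutates ``chunks``, ``bytes`` and ``first_chunk_at`` is not part of this diff.

```python
import time
from types import SimpleNamespace
from typing import Any, Dict

# Reduced, illustrative header list (the real STREAM_DIAG_HEADERS is longer).
DIAG_HEADERS = ("cf-ray", "x-openrouter-provider", "server")


def diag_init() -> Dict[str, Any]:
    return {"started_at": time.time(), "first_chunk_at": None,
            "chunks": 0, "bytes": 0, "headers": {}, "http_status": None}


def capture(diag: Dict[str, Any], resp: Any) -> None:
    diag["http_status"] = getattr(resp, "status_code", None)
    headers = getattr(resp, "headers", None) or {}
    # Keep only headers that are present and non-empty, each truncated to 120 chars.
    diag["headers"] = {
        name: str(headers.get(name))[:120]
        for name in DIAG_HEADERS
        if headers.get(name)
    }


def on_chunk(diag: Dict[str, Any], chunk: bytes) -> None:
    # Hypothetical per-chunk hook: stamp first-chunk time, bump counters.
    if diag["first_chunk_at"] is None:
        diag["first_chunk_at"] = time.time()
    diag["chunks"] += 1
    diag["bytes"] += len(chunk)


fake = SimpleNamespace(status_code=200, headers={"cf-ray": "8a1b2c-AMS", "server": "x" * 500})
diag = diag_init()
capture(diag, fake)
for chunk in (b"data: {}\n\n", b"data: [DONE]\n\n"):
    on_chunk(diag, chunk)
print(diag["http_status"], sorted(diag["headers"]), len(diag["headers"]["server"]))
# 200 ['cf-ray', 'server'] 120
```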
+
+def flatten_exception_chain(error: BaseException) -> str:
+    """Return a compact ``Outer(msg) <- Inner(msg) <- ...`` rendering.
+
+    OpenAI SDK wraps httpx errors as ``APIConnectionError`` /
+    ``APIError`` and only the wrapper's class is visible at the catch
+    site — but the underlying ``RemoteProtocolError`` /
+    ``ConnectError`` / ``ReadError`` is what tells us WHY the stream
+    died.  Follows ``__cause__`` (falling back to ``__context__``) up to
+    4 links deep, stopping on cycles, to surface the chain in one line.
+    """
+    seen: List[BaseException] = []
+    link: Optional[BaseException] = error
+    while link is not None and len(seen) < 4:
+        if link in seen:
+            break
+        seen.append(link)
+        nxt = getattr(link, "__cause__", None) or getattr(
+            link, "__context__", None
+        )
+        if nxt is None or nxt is link:
+            break
+        link = nxt
+    parts: List[str] = []
+    for e in seen:
+        msg = str(e).strip().replace("\n", " ")
+        if len(msg) > 140:
+            msg = msg[:140] + "…"
+        parts.append(f"{type(e).__name__}({msg})" if msg else type(e).__name__)
+    return " <- ".join(parts) if parts else type(error).__name__
+
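Here is a self-contained sketch of the same walk. The exception classes are stand-ins named after the httpx and OpenAI SDK errors; they are plain ``Exception`` subclasses, not the real types. The 140-character message truncation is left out for brevity. ``raise ... from`` sets ``__cause__``, and the walk prefers that over the implicit ``__context__``.

```python
from typing import List, Optional


def flatten_chain(error: BaseException, max_links: int = 4) -> str:
    """Render Outer(msg) <- Inner(msg), preferring __cause__ over __context__."""
    seen: List[BaseException] = []
    link: Optional[BaseException] = error
    while link is not None and len(seen) < max_links and link not in seen:
        seen.append(link)
        link = link.__cause__ or link.__context__
    parts = []
    for exc in seen:
        msg = str(exc).strip().replace("\n", " ")
        parts.append(f"{type(exc).__name__}({msg})" if msg else type(exc).__name__)
    return " <- ".join(parts)


class RemoteProtocolError(Exception):
    """Stand-in for httpx.RemoteProtocolError (illustrative only)."""


class APIConnectionError(Exception):
    """Stand-in for the OpenAI SDK wrapper (illustrative only)."""


try:
    try:
        raise RemoteProtocolError("peer closed connection")
    except RemoteProtocolError as inner:
        raise APIConnectionError("Connection error.") from inner
except APIConnectionError as outer:
    rendered = flatten_chain(outer)

print(rendered)
# APIConnectionError(Connection error.) <- RemoteProtocolError(peer closed connection)
```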
+
+def log_stream_retry(
+    agent: Any,
+    *,
+    kind: str,
+    error: BaseException,
+    attempt: int,
+    max_attempts: int,
+    mid_tool_call: bool,
+    diag: Optional[Dict[str, Any]] = None,
+) -> None:
+    """Record a transient stream-drop and retry to ``agent.log``.
+
+    Always logs a structured WARNING so users have a breadcrumb regardless
+    of UI verbosity.  The UI status line stays compact, so the file log
+    carries the full detail (provider, error class, attempt, base_url,
+    subagent_id) for top-level agents and subagents alike.
+
+    When *diag* is provided (the per-attempt stream-diagnostic dict from
+    :func:`stream_diag_init`), the WARNING also captures upstream headers
+    (cf-ray, x-openrouter-provider, x-openrouter-id), HTTP status, bytes
+    streamed before the drop, and elapsed time on the dying attempt.
+    These are the breadcrumbs needed to answer "is one CF edge / one
+    downstream provider responsible, or is it random across runs?"
+    """
+    try:
+        try:
+            _summary = agent._summarize_api_error(error)
+        except Exception:
+            _summary = str(error)
+        if _summary and len(_summary) > 240:
+            _summary = _summary[:240] + "…"
+
+        # Inner-cause chain (httpx errors hide under openai.APIError).
+        try:
+            _chain = flatten_exception_chain(error)
+        except Exception:
+            _chain = type(error).__name__
+
+        # Per-attempt counters and upstream headers.
+        _now = time.time()
+        _bytes = 0
+        _chunks = 0
+        _elapsed = 0.0
+        _ttfb = None
+        _headers_repr = "-"
+        _http_status = "-"
+        if isinstance(diag, dict):
+            try:
+                _bytes = int(diag.get("bytes") or 0)
+                _chunks = int(diag.get("chunks") or 0)
+                _started = float(diag.get("started_at") or _now)
+                _elapsed = max(0.0, _now - _started)
+                _first = diag.get("first_chunk_at")
+                if _first is not None:
+                    _ttfb = max(0.0, float(_first) - _started)
+                headers = diag.get("headers") or {}
+                if isinstance(headers, dict) and headers:
+                    _headers_repr = " ".join(
+                        f"{k}={v}" for k, v in headers.items()
+                    )
+                if diag.get("http_status") is not None:
+                    _http_status = str(diag.get("http_status"))
+            except Exception:
+                pass
+
+        logger.warning(
+            "Stream %s on attempt %s/%s — retrying. "
+            "subagent_id=%s depth=%s provider=%s base_url=%s "
+            "error_type=%s error=%s "
+            "chain=%s "
+            "http_status=%s bytes=%d chunks=%d elapsed=%.2fs ttfb=%s "
+            "upstream=[%s]",
+            kind,
+            attempt,
+            max_attempts,
+            getattr(agent, "_subagent_id", None) or "-",
+            getattr(agent, "_delegate_depth", 0),
+            agent.provider or "-",
+            agent.base_url or "-",
+            type(error).__name__,
+            _summary,
+            _chain,
+            _http_status,
+            _bytes,
+            _chunks,
+            _elapsed,
+            f"{_ttfb:.2f}s" if _ttfb is not None else "-",
+            _headers_repr,
+            extra={"mid_tool_call": mid_tool_call},
+        )
+    except Exception:
+        logger.debug("stream-retry log emit failed", exc_info=True)
+
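The following minimal sketch shows the timing arithmetic behind ``elapsed`` and ``ttfb`` in the WARNING line. It is factored into a hypothetical ``diag_timing`` helper that takes ``now`` as a parameter, so it can be checked deterministically. A missing ``ttfb`` means no chunk arrived before the drop. That helps separate failures before streaming began from drops in the middle of a stream.

```python
from typing import Any, Dict, Optional, Tuple


def diag_timing(diag: Optional[Dict[str, Any]], now: float) -> Tuple[float, Optional[float]]:
    """Return (elapsed, ttfb) in seconds, clamped at zero; ttfb is None if no chunk arrived."""
    if not isinstance(diag, dict):
        return 0.0, None
    started = float(diag.get("started_at") or now)
    elapsed = max(0.0, now - started)
    first = diag.get("first_chunk_at")
    ttfb = max(0.0, float(first) - started) if first is not None else None
    return elapsed, ttfb


def format_timing(elapsed: float, ttfb: Optional[float]) -> str:
    # Mirrors the WARNING line: two decimals, "-" when ttfb is unknown.
    ttfb_repr = f"{ttfb:.2f}s" if ttfb is not None else "-"
    return f"elapsed={elapsed:.2f}s ttfb={ttfb_repr}"


mid_stream = {"started_at": 100.0, "first_chunk_at": 100.5}
no_chunks = {"started_at": 100.0, "first_chunk_at": None}
print(format_timing(*diag_timing(mid_stream, now=130.25)))  # elapsed=30.25s ttfb=0.50s
print(format_timing(*diag_timing(no_chunks, now=100.0)))    # elapsed=0.00s ttfb=-
```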
+
+def emit_stream_drop(
+    agent: Any,
+    *,
+    error: BaseException,
+    attempt: int,
+    max_attempts: int,
+    mid_tool_call: bool,
+    diag: Optional[Dict[str, Any]] = None,
+) -> None:
+    """Emit a single user-visible line for a stream drop+retry.
+
+    Both top-level agents and subagents announce drops in the UI — the
+    parent prefixes subagent lines with ``[subagent-N]`` via ``log_prefix``
+    so they're easy to attribute.  All cases also write a structured
+    WARNING to ``agent.log`` via :func:`log_stream_retry` with the full
+    diagnostic detail (subagent_id, provider, base_url, error_type,
+    cf-ray, x-openrouter-provider, bytes/chunks, elapsed) for post-hoc
+    analysis.
+
+    The user-visible status line is intentionally compact: provider,
+    error class, attempt N/M, plus ``after Xs`` whenever *diag* records a
+    start time.  Full diagnostic detail goes to ``agent.log`` only; run
+    ``hermes logs --level WARNING | grep "Stream drop"`` to inspect it.
+    """
+    kind = "drop mid tool-call" if mid_tool_call else "drop"
+    log_stream_retry(
+        agent,
+        kind=kind,
+        error=error,
+        attempt=attempt,
+        max_attempts=max_attempts,
+        mid_tool_call=mid_tool_call,
+        diag=diag,
+    )
+    provider = agent.provider or "provider"
+    # Compose a brief "after Xs" suffix when we have timing data — helps
+    # the user distinguish "couldn't connect" (0s) from "died after 30s
+    # of streaming" (likely upstream idle-kill or proxy timeout).
+    _suffix = ""
+    if isinstance(diag, dict):
+        try:
+            started = diag.get("started_at")
+            if started is not None:
+                _suffix = f" after {max(0.0, time.time() - float(started)):.1f}s"
+        except Exception:
+            pass
+    try:
+        agent._emit_status(
+            f"⚠️ {provider} stream {kind} ({type(error).__name__}){_suffix} "
+            f"— reconnecting, retry {attempt}/{max_attempts}"
+        )
+        agent._touch_activity(
+            f"stream retry {attempt}/{max_attempts} "
+            f"after {type(error).__name__}"
+        )
+    except Exception:
+        pass
+
+
+__all__ = [
+    "STREAM_DIAG_HEADERS",
+    "stream_diag_init",
+    "stream_diag_capture_response",
+    "flatten_exception_chain",
+    "log_stream_retry",
+    "emit_stream_drop",
+]
--- a/agent/system_prompt.py
+++ b/agent/system_prompt.py
@@ -0,0 +1,346 @@
+"""System-prompt assembly for :class:`AIAgent`.
+
+The agent's system prompt is built once per session and reused across all
+turns — only context compression triggers a rebuild.  This keeps the
+upstream prefix cache warm.  See ``hermes-agent-dev``'s
+``references/system-prompt-invariant.md`` for the invariants and
+``references/self-improvement-loop.md`` for how the background-review
+fork inherits the cached prompt verbatim.
+
+Three tiers are joined with ``\\n\\n``:
+
+* ``stable``   — identity (SOUL.md or DEFAULT_AGENT_IDENTITY), tool
+  guidance, computer-use guidance, nous subscription block, tool-use
+  enforcement guidance + per-model operational guidance, skills prompt,
+  alibaba model-name workaround, environment hints, platform hints.
+* ``context``  — caller-supplied ``system_message`` plus context files
+  (AGENTS.md / .cursorrules / etc.) discovered under ``TERMINAL_CWD``.
+* ``volatile`` — memory snapshot, USER.md profile, external memory
+  provider block, timestamp/session/model/provider line.
+
+Module-level helpers that take the agent as an argument.  AIAgent keeps thin forwarders.
+"""
+
+from __future__ import annotations
+
+import json
+import os
+from typing import Any, Dict, List, Optional
+
+from agent.prompt_builder import (
+    DEFAULT_AGENT_IDENTITY,
+    GOOGLE_MODEL_OPERATIONAL_GUIDANCE,
+    HERMES_AGENT_HELP_GUIDANCE,
+    KANBAN_GUIDANCE,
+    MEMORY_GUIDANCE,
+    OPENAI_MODEL_EXECUTION_GUIDANCE,
+    PLATFORM_HINTS,
+    SESSION_SEARCH_GUIDANCE,
+    SKILLS_GUIDANCE,
+    TOOL_USE_ENFORCEMENT_GUIDANCE,
+    TOOL_USE_ENFORCEMENT_MODELS,
+)
+
+
+def _ra():
+    """Lazy reference to the ``run_agent`` module.
+
+    Helpers like ``load_soul_md``, ``build_environment_hints``,
+    ``build_context_files_prompt``, ``build_nous_subscription_prompt``,
+    ``build_skills_system_prompt`` and ``get_toolset_for_tool`` are
+    imported into ``run_agent``'s namespace.  Many tests
+    ``patch("run_agent.load_soul_md", ...)``; if we imported them
+    directly here those patches would not reach us.  Looking them up
+    through ``run_agent`` on every call preserves the patch contract.
+    """
+    import run_agent
+    return run_agent
+
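This runnable illustration shows why ``_ra()`` looks helpers up through the module on every call. ``fake_run_agent`` is a throwaway module registered in ``sys.modules`` purely for the demo. ``unittest.mock.patch`` replaces the module attribute, so only the late lookup sees the patch. A name bound earlier with ``from ... import`` keeps the original function.

```python
import sys
import types
from unittest import mock

# Throwaway stand-in for run_agent, registered only for this demo.
fake_run_agent = types.ModuleType("fake_run_agent")
fake_run_agent.load_soul_md = lambda: "real soul"
sys.modules["fake_run_agent"] = fake_run_agent

# Binding the name at import time captures the original function object.
from fake_run_agent import load_soul_md as bound_early  # noqa: E402


def _ra():
    """Resolve the module lazily, in the style of agent.system_prompt._ra()."""
    import fake_run_agent
    return fake_run_agent


def late_lookup() -> str:
    # Attribute lookup on every call sees whatever is patched right now.
    return _ra().load_soul_md()


with mock.patch("fake_run_agent.load_soul_md", return_value="patched soul"):
    early, late = bound_early(), late_lookup()

print(early, "|", late)  # real soul | patched soul
```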
+
+def build_system_prompt_parts(agent: Any, system_message: Optional[str] = None) -> Dict[str, str]:
+    """Assemble the system prompt as three ordered parts.
+
+    Returns a dict with three keys:
+      * ``stable``   — identity, tool guidance, skills prompt,
+        environment hints, platform hints, model-family operational
+        guidance.
+      * ``context``  — context files (AGENTS.md, .cursorrules, etc.)
+        and caller-supplied system_message.
+      * ``volatile`` — memory snapshot, user profile, external
+        memory provider block, timestamp line.
+
+    Joined into a single string by :func:`build_system_prompt` and
+    cached on ``agent._cached_system_prompt`` for the lifetime of the
+    AIAgent.  Hermes never re-renders parts of this string mid-
+    session — that's the only way to keep upstream prompt caches
+    warm across turns.
+    """
+    # Local import to avoid pulling model_tools at module load.  Tests
+    # patch ``run_agent.get_toolset_for_tool`` and similar helpers, so
+    # we resolve through ``_ra()`` to honor those patches.
+    _r = _ra()
+
+    # ── Stable tier ────────────────────────────────────────────────
+    stable_parts: List[str] = []
+
+    # Try SOUL.md as primary identity unless the caller explicitly skipped it.
+    # Some execution modes (cron) still want HERMES_HOME persona while keeping
+    # cwd project instructions disabled.
+    _soul_loaded = False
+    if agent.load_soul_identity or not agent.skip_context_files:
+        _soul_content = _r.load_soul_md()
+        if _soul_content:
+            stable_parts.append(_soul_content)
+            _soul_loaded = True
+
+    if not _soul_loaded:
+        # Fallback to hardcoded identity
+        stable_parts.append(DEFAULT_AGENT_IDENTITY)
+
+    # Pointer to the hermes-agent skill + docs for user questions about Hermes itself.
+    stable_parts.append(HERMES_AGENT_HELP_GUIDANCE)
+
+    # Tool-aware behavioral guidance: only inject when the tools are loaded
+    tool_guidance = []
+    if "memory" in agent.valid_tool_names:
+        tool_guidance.append(MEMORY_GUIDANCE)
+    if "session_search" in agent.valid_tool_names:
+        tool_guidance.append(SESSION_SEARCH_GUIDANCE)
+    if "skill_manage" in agent.valid_tool_names:
+        tool_guidance.append(SKILLS_GUIDANCE)
+    # Kanban worker/orchestrator lifecycle — only present when the
+    # dispatcher spawned this process (kanban_show check_fn gates on
+    # HERMES_KANBAN_TASK env var). Normal chat sessions never see
+    # this block. Resolved once at __init__ (see _kanban_worker_guidance).
+    _kanban_guidance = getattr(agent, "_kanban_worker_guidance", None)
+    if _kanban_guidance:
+        tool_guidance.append(_kanban_guidance)
+    elif _kanban_guidance is None and "kanban_show" in agent.valid_tool_names:
+        # Fallback for code paths that bypass agent_init (rare).
+        tool_guidance.append(KANBAN_GUIDANCE)
+    if tool_guidance:
+        stable_parts.append(" ".join(tool_guidance))
+
+    # Computer-use (macOS) — goes in as its own block rather than being
+    # merged into tool_guidance because the content is multi-paragraph.
+    if "computer_use" in agent.valid_tool_names:
+        from agent.prompt_builder import COMPUTER_USE_GUIDANCE
+        stable_parts.append(COMPUTER_USE_GUIDANCE)
+
+    nous_subscription_prompt = _r.build_nous_subscription_prompt(agent.valid_tool_names)
+    if nous_subscription_prompt:
+        stable_parts.append(nous_subscription_prompt)
+    # Tool-use enforcement: tells the model to actually call tools instead
+    # of describing intended actions.  Controlled by config.yaml
+    # agent.tool_use_enforcement:
+    #   "auto" (default) — matches TOOL_USE_ENFORCEMENT_MODELS
+    #   true  — always inject (all models)
+    #   false — never inject
+    #   list  — custom model-name substrings to match
+    if agent.valid_tool_names:
+        _enforce = agent._tool_use_enforcement
+        _inject = False
+        if _enforce is True or (isinstance(_enforce, str) and _enforce.lower() in {"true", "always", "yes", "on"}):
+            _inject = True
+        elif _enforce is False or (isinstance(_enforce, str) and _enforce.lower() in {"false", "never", "no", "off"}):
+            _inject = False
+        elif isinstance(_enforce, list):
+            model_lower = (agent.model or "").lower()
+            _inject = any(p.lower() in model_lower for p in _enforce if isinstance(p, str))
+        else:
+            # "auto" or any unrecognised value — use hardcoded defaults
+            model_lower = (agent.model or "").lower()
+            _inject = any(p in model_lower for p in TOOL_USE_ENFORCEMENT_MODELS)
+        if _inject:
+            stable_parts.append(TOOL_USE_ENFORCEMENT_GUIDANCE)
+            _model_lower = (agent.model or "").lower()
+            # Google model operational guidance (conciseness, absolute
+            # paths, parallel tool calls, verify-before-edit, etc.)
+            if "gemini" in _model_lower or "gemma" in _model_lower:
+                stable_parts.append(GOOGLE_MODEL_OPERATIONAL_GUIDANCE)
+            # OpenAI GPT/Codex execution discipline (tool persistence,
+            # prerequisite checks, verification, anti-hallucination).
+            # Also applied to xAI Grok — same failure modes (claims completion
+            # without tool calls, suggests workarounds instead of using
+            # existing tools, replies with plans instead of executing).
+            if "gpt" in _model_lower or "codex" in _model_lower or "grok" in _model_lower:
+                stable_parts.append(OPENAI_MODEL_EXECUTION_GUIDANCE)
+
+    has_skills_tools = any(name in agent.valid_tool_names for name in ['skills_list', 'skill_view', 'skill_manage'])
+    if has_skills_tools:
+        avail_toolsets = {
+            toolset
+            for toolset in (
+                _r.get_toolset_for_tool(tool_name) for tool_name in agent.valid_tool_names
+            )
+            if toolset
+        }
+        skills_prompt = _r.build_skills_system_prompt(
+            available_tools=agent.valid_tool_names,
+            available_toolsets=avail_toolsets,
+        )
+    else:
+        skills_prompt = ""
+    if skills_prompt:
+        stable_parts.append(skills_prompt)
+
+    # Alibaba Coding Plan API always returns "glm-4.7" as model name regardless
+    # of the requested model. Inject explicit model identity into the system prompt
+    # so the agent can correctly report which model it is (workaround for API bug).
+    # Stable for the lifetime of an agent instance — model and provider are fixed
+    # at construction time.
+    if agent.provider == "alibaba":
+        _model_short = agent.model.split("/")[-1] if "/" in agent.model else agent.model
+        stable_parts.append(
+            f"You are powered by the model named {_model_short}. "
+            f"The exact model ID is {agent.model}. "
+            f"When asked what model you are, always answer based on this information, "
+            f"not on any model name returned by the API."
+        )
+
+    # Environment hints (WSL, Termux, etc.) — tell the agent about the
+    # execution environment so it can translate paths and adapt behavior.
+    # Stable for the lifetime of the process.
+    _env_hints = _r.build_environment_hints()
+    if _env_hints:
+        stable_parts.append(_env_hints)
+
+    platform_key = (agent.platform or "").lower().strip()
+    if platform_key in PLATFORM_HINTS:
+        stable_parts.append(PLATFORM_HINTS[platform_key])
+    elif platform_key:
+        # Check plugin registry for platform-specific LLM guidance
+        try:
+            from gateway.platform_registry import platform_registry
+            _entry = platform_registry.get(platform_key)
+            if _entry and _entry.platform_hint:
+                stable_parts.append(_entry.platform_hint)
+        except Exception:
+            pass
+
+    # ── Context tier (cwd-dependent, may change between sessions) ─
+    context_parts: List[str] = []
+
+    # Note: ephemeral_system_prompt is NOT included here. It's injected at
+    # API-call time only so it stays out of the cached/stored system prompt.
+    if system_message is not None:
+        context_parts.append(system_message)
+
+    if not agent.skip_context_files:
+        # Use TERMINAL_CWD for context file discovery when set (gateway
+        # mode).  The gateway process runs from the hermes-agent install
+        # dir, so os.getcwd() would pick up the repo's AGENTS.md and
+        # other dev files — inflating token usage by ~10k for no benefit.
+        _context_cwd = os.getenv("TERMINAL_CWD") or None
+        context_files_prompt = _r.build_context_files_prompt(
+            cwd=_context_cwd, skip_soul=_soul_loaded)
+        if context_files_prompt:
+            context_parts.append(context_files_prompt)
+
+    # ── Volatile tier (may change between sessions; kept last to protect the cached prefix) ───
+    volatile_parts: List[str] = []
+
+    if agent._memory_store:
+        if agent._memory_enabled:
+            mem_block = agent._memory_store.format_for_system_prompt("memory")
+            if mem_block:
+                volatile_parts.append(mem_block)
+        # USER.md is always included when enabled.
+        if agent._user_profile_enabled:
+            user_block = agent._memory_store.format_for_system_prompt("user")
+            if user_block:
+                volatile_parts.append(user_block)
+
+    # External memory provider system prompt block (additive to built-in)
+    if agent._memory_manager:
+        try:
+            _ext_mem_block = agent._memory_manager.build_system_prompt()
+            if _ext_mem_block:
+                volatile_parts.append(_ext_mem_block)
+        except Exception:
+            pass
+
+    from hermes_time import now as _hermes_now
+    now = _hermes_now()
+    # Date-only (not minute-precision) so the system prompt is byte-stable
+    # for the full day.  Minute-precision changes invalidate prefix-cache KV
+    # on every rebuild path (compression boundary, fresh-agent gateway turns,
+    # session resume without a stored prompt).  The model can still query the
+    # exact wall-clock time via tools when it actually needs it.
+    # Credit: @iamfoz (PR #20451).
+    timestamp_line = f"Conversation started: {now.strftime('%A, %B %d, %Y')}"
+    if agent.pass_session_id and agent.session_id:
+        timestamp_line += f"\nSession ID: {agent.session_id}"
+    if agent.model:
+        timestamp_line += f"\nModel: {agent.model}"
+    if agent.provider:
+        timestamp_line += f"\nProvider: {agent.provider}"
+    volatile_parts.append(timestamp_line)
+
+    return {
+        "stable":   "\n\n".join(p.strip() for p in stable_parts   if p and p.strip()),
+        "context":  "\n\n".join(p.strip() for p in context_parts  if p and p.strip()),
+        "volatile": "\n\n".join(p.strip() for p in volatile_parts if p and p.strip()),
+    }
+
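Below is a minimal sketch of the ``agent.tool_use_enforcement`` resolution above, rewritten as a standalone function. ``DEFAULT_MODELS`` is a hypothetical stand-in for ``TOOL_USE_ENFORCEMENT_MODELS``, whose real contents are not shown here. The original also skips this whole check when the agent has no tools loaded.

```python
from typing import Any, Iterable

# Hypothetical defaults standing in for TOOL_USE_ENFORCEMENT_MODELS.
DEFAULT_MODELS = ("gpt", "gemini", "grok")
TRUTHY = {"true", "always", "yes", "on"}
FALSY = {"false", "never", "no", "off"}


def should_inject(enforce: Any, model: str, defaults: Iterable[str] = DEFAULT_MODELS) -> bool:
    """Resolve a tool_use_enforcement setting for one model name."""
    model_lower = (model or "").lower()
    if enforce is True or (isinstance(enforce, str) and enforce.lower() in TRUTHY):
        return True
    if enforce is False or (isinstance(enforce, str) and enforce.lower() in FALSY):
        return False
    if isinstance(enforce, list):
        # Custom substrings: both sides lowercased, non-string entries ignored.
        return any(p.lower() in model_lower for p in enforce if isinstance(p, str))
    # "auto" or any unrecognised value: fall back to the default substrings.
    return any(p in model_lower for p in defaults)


print(should_inject("auto", "openai/GPT-5"))               # True
print(should_inject("off", "openai/gpt-5"))                # False
print(should_inject(["Qwen"], "qwen/qwen3-coder"))         # True
print(should_inject("auto", "anthropic/claude-sonnet-4"))  # False
```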
+
+def build_system_prompt(agent: Any, system_message: Optional[str] = None) -> str:
+    """Assemble the full system prompt from all layers.
+
+    Called once per session (cached on ``agent._cached_system_prompt``) and
+    only rebuilt after context compression events. This ensures the system
+    prompt is stable across all turns in a session, maximizing prefix cache
+    hits.
+
+    Layers are ordered cache-friendly: stable identity/guidance first,
+    then session-stable context files, then volatile content that can
+    differ between sessions (memory, USER profile, timestamp).  The whole
+    string is treated as one cached block; Hermes never rebuilds or
+    reinjects parts of it mid-session, which is the only way to keep
+    upstream prompt caches warm across turns.
+    """
+    parts = build_system_prompt_parts(agent, system_message=system_message)
+    return "\n\n".join(p for p in (parts["stable"], parts["context"], parts["volatile"]) if p)
+
+
+def invalidate_system_prompt(agent: Any) -> None:
+    """Invalidate the cached system prompt, forcing a rebuild on the next turn.
+
+    Called after context compression events. Also reloads memory from disk
+    so the rebuilt prompt captures any writes from this session.
+    """
+    agent._cached_system_prompt = None
+    if agent._memory_store:
+        agent._memory_store.load_from_disk()
+
+
+def format_tools_for_system_message(agent: Any) -> str:
+    """Format tool definitions for the system message in the trajectory format.
+
+    Returns:
+        str: JSON string representation of tool definitions
+    """
+    if not agent.tools:
+        return "[]"
+
+    # Convert tool definitions to the format expected in trajectories
+    formatted_tools = []
+    for tool in agent.tools:
+        func = tool["function"]
+        formatted_tool = {
+            "name": func["name"],
+            "description": func.get("description", ""),
+            "parameters": func.get("parameters", {}),
+            "required": None  # Match the format in the example
+        }
+        formatted_tools.append(formatted_tool)
+
+    return json.dumps(formatted_tools, ensure_ascii=False)
+
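This small sketch shows the trajectory shape that ``format_tools_for_system_message`` produces. It is written as a free function over a hypothetical tool list in the ``{"function": {...}}`` shape the method reads. A missing ``description`` falls back to ``""`` and missing ``parameters`` to ``{}``. ``required`` is always ``null`` in the JSON.

```python
import json
from typing import Any, Dict, List

# Hypothetical tools; the second has neither a description nor parameters.
tools: List[Dict[str, Any]] = [
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read a file from disk.",
            "parameters": {"type": "object", "properties": {"path": {"type": "string"}}},
        },
    },
    {"type": "function", "function": {"name": "noop"}},
]


def format_tools(tools: List[Dict[str, Any]]) -> str:
    if not tools:
        return "[]"
    return json.dumps(
        [
            {
                "name": t["function"]["name"],
                "description": t["function"].get("description", ""),
                "parameters": t["function"].get("parameters", {}),
                "required": None,
            }
            for t in tools
        ],
        ensure_ascii=False,
    )


decoded = json.loads(format_tools(tools))
print(decoded[1])  # {'name': 'noop', 'description': '', 'parameters': {}, 'required': None}
```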
+
+__all__ = [
+    "build_system_prompt_parts",
+    "build_system_prompt",
+    "invalidate_system_prompt",
+    "format_tools_for_system_message",
+]
--- a/agent/tool_dispatch_helpers.py
+++ b/agent/tool_dispatch_helpers.py
@@ -0,0 +1,350 @@
+"""Tool-dispatch helpers — parallelism gating, multimodal envelopes, mutation tracking.
+
+Pure module-level utilities extracted from ``run_agent.py``:
+
+* ``_is_destructive_command`` — terminal-command heuristic used to gate
+  parallel batch dispatch.
+* ``_should_parallelize_tool_batch`` / ``_extract_parallel_scope_path`` /
+  ``_paths_overlap`` — the rules engine deciding when a multi-tool batch
+  can run concurrently.
+* ``_is_multimodal_tool_result`` / ``_multimodal_text_summary`` /
+  ``_append_subdir_hint_to_multimodal`` — envelope helpers for the
+  ``{"_multimodal": True, "content": [...], "text_summary": ...}`` dict
+  shape returned by tools like ``computer_use``.
+* ``_extract_file_mutation_targets`` / ``_extract_error_preview`` —
+  per-turn file-mutation verifier inputs.
+* ``_trajectory_normalize_msg`` — strip image blobs from a message for
+  trajectory saving.
+
+All helpers are stateless.  ``run_agent`` re-exports each name so existing
+``from run_agent import ...`` imports in tests and other modules keep
+working unchanged.
+"""
+
+from __future__ import annotations
+
+import json
+import logging
+import os
+import re
+from pathlib import Path
+from typing import Any, Dict, List, Optional
+
+from agent.tool_result_classification import (
+    FILE_MUTATING_TOOL_NAMES as _FILE_MUTATING_TOOLS,
+)
+
+logger = logging.getLogger(__name__)
+
+# Tools that must never run concurrently (interactive / user-facing).
+# When any of these appear in a batch, we fall back to sequential execution.
+_NEVER_PARALLEL_TOOLS = frozenset({"clarify"})
+
+# Read-only tools with no shared mutable session state.
+_PARALLEL_SAFE_TOOLS = frozenset({
+    "ha_get_state",
+    "ha_list_entities",
+    "ha_list_services",
+    "read_file",
+    "search_files",
+    "session_search",
+    "skill_view",
+    "skills_list",
+    "vision_analyze",
+    "web_extract",
+    "web_search",
+})
+
+# File tools can run concurrently when they target independent paths.
+_PATH_SCOPED_TOOLS = frozenset({"read_file", "write_file", "patch"})
+
+# Patterns that indicate a terminal command may modify/delete files.
+_DESTRUCTIVE_PATTERNS = re.compile(
+    r"""(?:^|\s|&&|\|\||;|`)(?:
+        rm\s|rmdir\s|
+        cp\s|install\s|
+        mv\s|
+        sed\s+-i|
+        truncate\s|
+        dd\s|
+        shred\s|
+        git\s+(?:reset|clean|checkout)\s
+    )""",
+    re.VERBOSE,
+)
+# Output redirects that overwrite files (> but not >>)
+_REDIRECT_OVERWRITE = re.compile(r'[^>]>[^>]|^>[^>]')
+
+
+def _is_destructive_command(cmd: str) -> bool:
+    """Heuristic: does this terminal command look like it modifies/deletes files?"""
+    if not cmd:
+        return False
+    if _DESTRUCTIVE_PATTERNS.search(cmd):
+        return True
+    if _REDIRECT_OVERWRITE.search(cmd):
+        return True
+    return False
+
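Here are the two regexes, copied verbatim into a standalone check so the heuristic's behaviour is concrete. The heuristic is deliberately coarse. ``>>`` appends are not flagged, but file-descriptor redirects such as ``2>&1`` are, and ``install`` also matches ``pip install``.

```python
import re

# Patterns copied verbatim from the module above.
DESTRUCTIVE = re.compile(
    r"""(?:^|\s|&&|\|\||;|`)(?:
        rm\s|rmdir\s|
        cp\s|install\s|
        mv\s|
        sed\s+-i|
        truncate\s|
        dd\s|
        shred\s|
        git\s+(?:reset|clean|checkout)\s
    )""",
    re.VERBOSE,
)
REDIRECT_OVERWRITE = re.compile(r"[^>]>[^>]|^>[^>]")


def is_destructive(cmd: str) -> bool:
    """True when either pattern matches; empty commands are never destructive."""
    if not cmd:
        return False
    return bool(DESTRUCTIVE.search(cmd) or REDIRECT_OVERWRITE.search(cmd))


commands = [
    "ls -la",
    "rm -rf build",
    "echo hi >> log.txt",
    "echo hi > log.txt",
    "make && git checkout main",
    "ls 2>&1",
    "pip install requests",
]
for cmd in commands:
    print(f"{cmd!r:30} destructive={is_destructive(cmd)}")
```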
+
+def _is_mcp_tool_parallel_safe(tool_name: str) -> bool:
+    """Check if an MCP tool comes from a server with parallel tool calls enabled.
+
+    Lazy-imports from ``tools.mcp_tool`` to avoid circular dependencies.
+    Returns False if the MCP module is not available.
+    """
+    try:
+        from tools.mcp_tool import is_mcp_tool_parallel_safe
+        return is_mcp_tool_parallel_safe(tool_name)
+    except Exception:
+        return False
+
+
+def _should_parallelize_tool_batch(tool_calls) -> bool:
+    """Return True when a tool-call batch is safe to run concurrently."""
+    if len(tool_calls) <= 1:
+        return False
+
+    tool_names = [tc.function.name for tc in tool_calls]
+    if any(name in _NEVER_PARALLEL_TOOLS for name in tool_names):
+        return False
+
+    reserved_paths: list[Path] = []
+    for tool_call in tool_calls:
+        tool_name = tool_call.function.name
+        try:
+            function_args = json.loads(tool_call.function.arguments)
+        except Exception:
+            logger.debug(
+                "Could not parse args for %s — defaulting to sequential; raw=%s",
+                tool_name,
+                str(tool_call.function.arguments)[:200],  # str(): may be None
+            )
+            return False
+        if not isinstance(function_args, dict):
+            logger.debug(
+                "Non-dict args for %s (%s) — defaulting to sequential",
+                tool_name,
+                type(function_args).__name__,
+            )
+            return False
+
+        if tool_name in _PATH_SCOPED_TOOLS:
+            scoped_path = _extract_parallel_scope_path(tool_name, function_args)
+            if scoped_path is None:
+                return False
+            if any(_paths_overlap(scoped_path, existing) for existing in reserved_paths):
+                return False
+            reserved_paths.append(scoped_path)
+            continue
+
+        if tool_name not in _PARALLEL_SAFE_TOOLS:
+            # Check if it's an MCP tool from a server that opted into parallel calls.
+            if not _is_mcp_tool_parallel_safe(tool_name):
+                return False
+
+    return True
+
+
+def _extract_parallel_scope_path(tool_name: str, function_args: dict) -> Optional[Path]:
+    """Return the normalized file target for path-scoped tools."""
+    if tool_name not in _PATH_SCOPED_TOOLS:
+        return None
+
+    raw_path = function_args.get("path")
+    if not isinstance(raw_path, str) or not raw_path.strip():
+        return None
+
+    expanded = Path(raw_path).expanduser()
+    if expanded.is_absolute():
+        return Path(os.path.abspath(str(expanded)))
+
+    # Avoid resolve(); the file may not exist yet.
+    return Path(os.path.abspath(str(Path.cwd() / expanded)))
+
+
+def _paths_overlap(left: Path, right: Path) -> bool:
+    """Return True when two paths may refer to the same subtree."""
+    left_parts = left.parts
+    right_parts = right.parts
+    if not left_parts or not right_parts:
+        # Empty paths shouldn't reach here (guarded upstream); never overlap.
+        return False
+    common_len = min(len(left_parts), len(right_parts))
+    return left_parts[:common_len] == right_parts[:common_len]
+
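This minimal sketch shows how two path-scoped calls are checked for conflicts. Each ``path`` argument is normalised lexically, without touching the filesystem. The calls then count as overlapping when one path's components are a prefix of the other's. The explicit ``cwd`` parameter, used instead of ``Path.cwd()``, is a hypothetical change for determinism. The example paths assume a POSIX filesystem.

```python
import os
from pathlib import Path


def scope_path(raw: str, cwd: str) -> Path:
    """Absolute, lexically normalised target; never calls resolve()."""
    expanded = Path(raw).expanduser()
    if not expanded.is_absolute():
        expanded = Path(cwd) / expanded
    return Path(os.path.abspath(str(expanded)))


def paths_overlap(left: Path, right: Path) -> bool:
    """True when one path equals the other or is one of its ancestors."""
    n = min(len(left.parts), len(right.parts))
    return left.parts[:n] == right.parts[:n]


cwd = "/work/repo"
a = scope_path("src/app.py", cwd)
b = scope_path("./src/../src/app.py", cwd)
c = scope_path("src", cwd)
d = scope_path("docs/index.md", cwd)
print(paths_overlap(a, b), paths_overlap(a, c), paths_overlap(a, d))  # True True False
```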
+
+def _is_multimodal_tool_result(value: Any) -> bool:
+    """True if the value is a multimodal tool result envelope.
+
+    Multimodal handlers (e.g. tools/computer_use) return a dict with
+    `_multimodal=True`, a `content` key holding OpenAI-style content
+    parts, and an optional `text_summary` for string-only fallbacks.
+    """
+    return (
+        isinstance(value, dict)
+        and value.get("_multimodal") is True
+        and isinstance(value.get("content"), list)
+    )
+
+
+def _multimodal_text_summary(value: Any) -> str:
+    """Extract a plain text view of a multimodal tool result.
+
+    Used wherever downstream code needs a string — logging, previews,
+    persistence size heuristics, and fallback content for providers that
+    don't support multipart tool messages.
+    """
+    if _is_multimodal_tool_result(value):
+        if value.get("text_summary"):
+            return str(value["text_summary"])
+        parts = []
+        for p in value.get("content") or []:
+            if isinstance(p, dict) and p.get("type") == "text":
+                parts.append(str(p.get("text", "")))
+        if parts:
+            return "\n".join(parts)
+        return "[multimodal tool result]"
+    if isinstance(value, str):
+        return value
+    try:
+        return json.dumps(value, default=str)
+    except Exception:
+        return str(value)
+
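Below is a sketch of the envelope check and its string fallback, run against a hypothetical screenshot-style result. The helper names `is_multimodal` and `text_summary` are illustrative and are not part of the module's API. The envelope shape follows the docstring: OpenAI-style content parts plus an optional `text_summary`.

```python
import json
from typing import Any


def is_multimodal(value: Any) -> bool:
    # The flag must be the literal True, not merely truthy.
    return (
        isinstance(value, dict)
        and value.get("_multimodal") is True
        and isinstance(value.get("content"), list)
    )


def text_summary(value: Any) -> str:
    if is_multimodal(value):
        if value.get("text_summary"):
            return str(value["text_summary"])
        # Otherwise join the text parts, ignoring image parts.
        texts = [
            str(p.get("text", ""))
            for p in value["content"]
            if isinstance(p, dict) and p.get("type") == "text"
        ]
        return "\n".join(texts) if texts else "[multimodal tool result]"
    if isinstance(value, str):
        return value
    try:
        return json.dumps(value, default=str)
    except Exception:
        return str(value)


# Hypothetical envelope: one text part and one inline image part.
envelope = {
    "_multimodal": True,
    "content": [
        {"type": "text", "text": "Clicked the Save button."},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,AAAA"}},
    ],
}
```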
+
+def _append_subdir_hint_to_multimodal(value: Dict[str, Any], hint: str) -> None:
+    """Mutate a multimodal tool-result envelope to append a subdir hint.
+
+    The hint is added to the first text part so the model sees it; image
+    parts are left untouched. `text_summary` is also updated for
+    string-fallback callers.
+    """
+    if not _is_multimodal_tool_result(value):
+        return
+    parts = value.get("content") or []
+    for p in parts:
+        if isinstance(p, dict) and p.get("type") == "text":
+            p["text"] = str(p.get("text", "")) + hint
+            break
+    else:
+        parts.insert(0, {"type": "text", "text": hint})
+        value["content"] = parts
+    if isinstance(value.get("text_summary"), str):
+        value["text_summary"] = value["text_summary"] + hint
+
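The hint-appending helper relies on Python's `for`/`else`. The `else` branch runs only when the loop completes without `break`, which means no text part exists. The standalone sketch below follows that reading. The function name `append_hint` is hypothetical:

```python
from typing import Any, Dict


def append_hint(envelope: Dict[str, Any], hint: str) -> None:
    """Append ``hint`` to the first text part of a multimodal envelope."""
    if not (
        isinstance(envelope, dict)
        and envelope.get("_multimodal") is True
        and isinstance(envelope.get("content"), list)
    ):
        return  # not an envelope: leave it alone
    parts = envelope["content"]
    for p in parts:
        if isinstance(p, dict) and p.get("type") == "text":
            p["text"] = str(p.get("text", "")) + hint
            break
    else:
        # Reached only if the loop never hit `break`: there was no text
        # part, so the hint becomes a new leading text part.
        parts.insert(0, {"type": "text", "text": hint})
    # Keep the string fallback in sync with what the model sees.
    if isinstance(envelope.get("text_summary"), str):
        envelope["text_summary"] += hint
```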
+
+def _extract_file_mutation_targets(tool_name: str, args: Dict[str, Any]) -> List[str]:
+    """Return the file paths a ``write_file`` or ``patch`` call is targeting.
+
+    For ``write_file`` and ``patch`` in replace mode this is just ``args["path"]``.
+    For ``patch`` in V4A patch mode we parse the patch content for
+    ``*** Update File:`` / ``*** Add File:`` / ``*** Delete File:`` headers so
+    the verifier can track each file in a multi-file patch separately.
+    """
+    if tool_name not in _FILE_MUTATING_TOOLS:
+        return []
+    if tool_name == "write_file":
+        p = args.get("path")
+        return [str(p)] if p else []
+    # tool_name == "patch"
+    mode = args.get("mode") or "replace"
+    if mode == "replace":
+        p = args.get("path")
+        return [str(p)] if p else []
+    if mode == "patch":
+        body = args.get("patch") or ""
+        if not isinstance(body, str) or not body:
+            return []
+        paths: List[str] = []
+        for _m in re.finditer(
+            r'^\*\*\*[ \t]+(?:Update|Add|Delete)[ \t]+File:[ \t]*(.+)$',
+            body,
+            re.MULTILINE,
+        ):
+            p = _m.group(1).strip()
+            if p:
+                paths.append(p)
+        return paths
+    return []
+
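Below is a sketch of the V4A header scan, applied to a hypothetical multi-file patch. It uses `[ \t]` rather than `\s` around the header tokens. `\s` also matches a newline, so with `\s` an empty `*** Add File:` header could capture the following line as its path.

```python
import re
from typing import List

# [ \t] instead of \s: \s also matches "\n", which would let an empty
# header swallow the next line as its path.
V4A_HEADER = re.compile(
    r"^\*\*\*[ \t]+(?:Update|Add|Delete)[ \t]+File:[ \t]*(.+)$", re.MULTILINE
)


def v4a_targets(patch_body: str) -> List[str]:
    """Return every file path named in a V4A patch's file headers, in order."""
    paths = []
    for m in V4A_HEADER.finditer(patch_body):
        path = m.group(1).strip()
        if path:
            paths.append(path)
    return paths


# Hypothetical multi-file patch in V4A layout.
SAMPLE = """\
*** Begin Patch
*** Update File: src/app.py
@@ def main():
-    print("hi")
+    print("hello")
*** Add File: docs/notes.md
+New notes.
*** Delete File: old/legacy.py
*** End Patch
"""
```

The `Begin Patch` and `End Patch` markers are ignored because the pattern only accepts `Update`, `Add`, or `Delete` before `File:`.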
+
+def _extract_error_preview(result: Any, max_len: int = 180) -> str:
+    """Pull a one-line error summary out of a tool result for footer display."""
+    text = _multimodal_text_summary(result) if result is not None else ""
+    if not isinstance(text, str):
+        try:
+            text = str(text)
+        except Exception:
+            return ""
+    # Try to parse JSON and pull the ``error`` field — tool handlers return
+    # ``{"success": false, "error": "..."}``; raw string wins if parse fails.
+    stripped = text.strip()
+    if stripped.startswith("{"):
+        try:
+            data = json.loads(stripped)
+            if isinstance(data, dict) and isinstance(data.get("error"), str):
+                text = data["error"]
+        except Exception:
+            pass
+    # Collapse whitespace, trim to max_len.
+    text = " ".join(text.split())
+    if len(text) > max_len:
+        text = text[: max_len - 1] + "…"
+    return text
+
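The sketch below reproduces the preview logic on plain strings, omitting the multimodal unwrapping step. `error_preview` is a hypothetical name. The truncation keeps the result at exactly `max_len` characters, including the ellipsis.

```python
import json


def error_preview(text: str, max_len: int = 180) -> str:
    stripped = text.strip()
    if stripped.startswith("{"):
        try:
            data = json.loads(stripped)
            # Prefer the handler's ``error`` field when it is a string.
            if isinstance(data, dict) and isinstance(data.get("error"), str):
                text = data["error"]
        except json.JSONDecodeError:
            pass  # not JSON after all: keep the raw text
    text = " ".join(text.split())  # collapse newlines, tabs, repeated spaces
    if len(text) > max_len:
        text = text[: max_len - 1] + "…"  # reserve one char for the ellipsis
    return text
```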
+
+def _trajectory_normalize_msg(msg: Dict[str, Any]) -> Dict[str, Any]:
+    """Strip image blobs from a message for trajectory saving.
+
+    Returns a shallow copy with multimodal tool results replaced by their
+    text_summary, and image parts in content lists replaced by
+    `[screenshot]` placeholders. Keeps the message schema otherwise intact.
+    """
+    if not isinstance(msg, dict):
+        return msg
+    content = msg.get("content")
+    if _is_multimodal_tool_result(content):
+        return {**msg, "content": _multimodal_text_summary(content)}
+    if isinstance(content, list):
+        cleaned = []
+        for p in content:
+            if isinstance(p, dict) and p.get("type") in {"image", "image_url", "input_image"}:
+                cleaned.append({"type": "text", "text": "[screenshot]"})
+            else:
+                cleaned.append(p)
+        return {**msg, "content": cleaned}
+    return msg
+
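A reduced sketch of the image-stripping branch of `_trajectory_normalize_msg` follows. The multimodal-envelope branch is left out, and the function name `strip_images` is hypothetical.

```python
from typing import Any, Dict

IMAGE_PART_TYPES = {"image", "image_url", "input_image"}


def strip_images(msg: Dict[str, Any]) -> Dict[str, Any]:
    content = msg.get("content")
    if not isinstance(content, list):
        return msg  # plain-string content has no image parts
    cleaned = [
        {"type": "text", "text": "[screenshot]"}
        if isinstance(p, dict) and p.get("type") in IMAGE_PART_TYPES
        else p
        for p in content
    ]
    # Shallow copy: the caller's message and its content list are untouched.
    return {**msg, "content": cleaned}
```

Only the top-level dict and the content list are copied, so unchanged parts are shared with the original message. That is safe here because the cleaner never mutates a part in place.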
+
+def make_tool_result_message(name: str, content: Any, tool_call_id: str) -> dict:
+    """Build a tool-result message dict with both the OpenAI-format ``name``
+    field (required by the wire format and provider adapters) and the internal
+    ``tool_name`` field (written to the session DB messages table)."""
+    return {
+        "role": "tool",
+        "name": name,
+        "tool_name": name,
+        "content": content,
+        "tool_call_id": tool_call_id,
+    }
+
+
+__all__ = [
+    "_NEVER_PARALLEL_TOOLS",
+    "_PARALLEL_SAFE_TOOLS",
+    "_PATH_SCOPED_TOOLS",
+    "_DESTRUCTIVE_PATTERNS",
+    "_REDIRECT_OVERWRITE",
+    "_is_destructive_command",
+    "_should_parallelize_tool_batch",
+    "_extract_parallel_scope_path",
+    "_paths_overlap",
+    "_is_multimodal_tool_result",
+    "_multimodal_text_summary",
+    "_append_subdir_hint_to_multimodal",
+    "_extract_file_mutation_targets",
+    "_extract_error_preview",
+    "_trajectory_normalize_msg",
+    "make_tool_result_message",
+]