feat: add transport types + migrate Anthropic normalize path

Add agent/transports/types.py with three shared dataclasses:

- NormalizedResponse: content, tool_calls, finish_reason, reasoning, usage, provider_data
- ToolCall: id, name, arguments, provider_data (per-tool-call protocol metadata)
- Usage: prompt_tokens, completion_tokens, total_tokens, cached_tokens

Add normalize_anthropic_response_v2() to anthropic_adapter.py. It wraps the existing v1 function and maps its output to NormalizedResponse. One call site in run_agent.py (the main normalize branch) uses v2, with a back-compat shim to SimpleNamespace for downstream code.

No ABC, no registry, no streaming, no client lifecycle. Those land in PR 3 with the first concrete transport (AnthropicTransport).

46 new tests:

- test_types.py: dataclass construction, build_tool_call, map_finish_reason
- test_anthropic_normalize_v2.py: v1-vs-v2 regression tests (text, tools, thinking, mixed, stop reasons, MCP prefix stripping, edge cases)

Part of the provider transport refactor (PR 2 of 9).
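For reviewers, a minimal sketch of how the three dataclasses could be declared, using only the field names listed above. The field types, the defaults, and the body of `build_tool_call` are assumptions, not the contents of agent/transports/types.py; `map_finish_reason` is left out because its signature is not described here.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ToolCall:
    id: str
    name: str
    arguments: str  # assumption: JSON-encoded string, as on the OpenAI wire format
    provider_data: Optional[Dict[str, Any]] = None  # per-tool-call protocol metadata


@dataclass
class Usage:
    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0
    cached_tokens: int = 0


@dataclass
class NormalizedResponse:
    content: Optional[str]
    tool_calls: Optional[List[ToolCall]]
    finish_reason: Optional[str]
    reasoning: Optional[str] = None
    usage: Optional[Usage] = None
    provider_data: Optional[Dict[str, Any]] = None  # e.g. Anthropic reasoning_details


def build_tool_call(id: str, name: str, arguments: str,
                    provider_data: Optional[Dict[str, Any]] = None) -> ToolCall:
    # Hypothetical body: the real helper may also validate or normalize arguments.
    return ToolCall(id=id, name=name, arguments=arguments, provider_data=provider_data)


# Illustrative values only.
resp = NormalizedResponse(
    content=None,
    tool_calls=[build_tool_call(id="toolu_01", name="read_file", arguments='{"path": "a.txt"}')],
    finish_reason="tool_calls",
)
```

Keeping `tool_calls` as `None` rather than an empty list mirrors what `normalize_anthropic_response_v2()` returns when the model made no tool calls.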
fix(tui): fix Linux Ctrl+C regression, remove double clipboard write
2026-04-20 20:13:33 +05:30 · 2026-04-20 07:14:33 -07:00
330 changed files with 28288 additions and 6349 deletions
@@ -3,8 +3,13 @@ name: Docker Build and Publish
 on:
  push:
    branches: [main]
-  pull_request:
-    branches: [main]
+    paths:
+      - '**/*.py'
+      - 'pyproject.toml'
+      - 'uv.lock'
+      - 'Dockerfile'
+      - 'docker/**'
+      - '.github/workflows/docker-publish.yml'
  release:
    types: [published]

@@ -49,6 +54,14 @@ jobs:

      - name: Test image starts
        run: |
+          # The image runs as the hermes user (UID 10000).  Left alone,
+          # `docker run -v` creates a missing /tmp/hermes-test root-owned,
+          # which hermes can't write to, so create it here and chown it to
+          # match the in-container UID before bind-mounting.  Real users doing
+          # `docker run -v ~/.hermes:...` with their own UID hit the same issue
+          # and have their own remediations (HERMES_UID env var, or chown locally).
+          mkdir -p /tmp/hermes-test
+          sudo chown -R 10000:10000 /tmp/hermes-test
          docker run --rm \
            -v /tmp/hermes-test:/opt/data \
            --entrypoint /opt/hermes/docker/entrypoint.sh \
@@ -3,14 +3,31 @@ name: Supply Chain Audit
 on:
  pull_request:
    types: [opened, synchronize, reopened]
+    paths:
+      - '**/*.py'
+      - '**/*.pth'
+      - '**/setup.py'
+      - '**/setup.cfg'
+      - '**/sitecustomize.py'
+      - '**/usercustomize.py'
+      - '**/__init__.pth'

 permissions:
  pull-requests: write
  contents: read

+# Narrow, high-signal scanner. Only fires on critical indicators of supply
+# chain attacks (e.g. the litellm-style payloads). Low-signal heuristics
+# (plain base64, plain exec/eval, dependency/Dockerfile/workflow edits,
+# Actions version unpinning, outbound POST/PUT) were intentionally
+# removed — they fired on nearly every PR and trained reviewers to ignore
+# the scanner. Keep this file's checks ruthlessly narrow: if you find
+# yourself adding WARNING-tier patterns here again, make a separate
+# advisory-only workflow instead.
+
 jobs:
  scan:
-    name: Scan PR for supply chain risks
+    name: Scan PR for critical supply chain risks
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
@@ -18,7 +35,7 @@ jobs:
        with:
          fetch-depth: 0

-      - name: Scan diff for suspicious patterns
+      - name: Scan diff for critical patterns
        id: scan
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
@@ -28,19 +45,19 @@ jobs:
          BASE="${{ github.event.pull_request.base.sha }}"
          HEAD="${{ github.event.pull_request.head.sha }}"

-          # Get the full diff (added lines only)
+          # Full diff, excluding lockfiles; content checks below keep only added lines via grep '^\+'.
          DIFF=$(git diff "$BASE".."$HEAD" -- . ':!uv.lock' ':!*.lock' ':!package-lock.json' ':!yarn.lock' || true)

          FINDINGS=""
-          CRITICAL=false

          # --- .pth files (auto-execute on Python startup) ---
+          # The exact mechanism used in the litellm supply chain attack:
+          # https://github.com/BerriAI/litellm/issues/24512
          PTH_FILES=$(git diff --name-only "$BASE".."$HEAD" | grep '\.pth$' || true)
          if [ -n "$PTH_FILES" ]; then
-            CRITICAL=true
            FINDINGS="${FINDINGS}
          ### 🚨 CRITICAL: .pth file added or modified
-          Python \`.pth\` files in \`site-packages/\` execute automatically when the interpreter starts — no import required. This is the exact mechanism used in the [litellm supply chain attack](https://github.com/BerriAI/litellm/issues/24512).
+          Python \`.pth\` files in \`site-packages/\` execute automatically when the interpreter starts — no import required.

          **Files:**
          \`\`\`
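Why `.pth` files rank as critical: the standard library's `site.addsitedir()` runs the same `.pth` processing that interpreter startup applies to `site-packages`, and any line starting with `import` followed by a space or tab is executed. The self-contained demonstration that follows uses a throwaway directory; the file name and the `PTH_DEMO_RAN` variable are illustrative only.

```python
import os
import site
import sys
import tempfile
from pathlib import Path

saved_path = list(sys.path)  # addsitedir() appends the directory to sys.path
os.environ.pop("PTH_DEMO_RAN", None)
try:
    with tempfile.TemporaryDirectory() as tmp:
        # site.addpackage() exec()s any .pth line that starts with "import ".
        Path(tmp, "demo_hook.pth").write_text(
            'import os; os.environ["PTH_DEMO_RAN"] = "1"\n'
        )
        site.addsitedir(tmp)  # nothing from tmp is ever imported explicitly
finally:
    sys.path[:] = saved_path

print(os.environ.get("PTH_DEMO_RAN"))  # → 1
```

No module from the directory was imported, yet the side effect ran; a malicious wheel only needs to ship such a file into `site-packages`.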
@@ -49,13 +66,12 @@ jobs:
          "
          fi

-          # --- base64 + exec/eval combo (the litellm attack pattern) ---
+          # --- base64 decode + exec/eval on the same line (the litellm attack pattern) ---
          B64_EXEC_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -iE 'base64\.(b64decode|decodebytes|urlsafe_b64decode)' | grep -iE 'exec\(|eval\(' | head -10 || true)
          if [ -n "$B64_EXEC_HITS" ]; then
-            CRITICAL=true
            FINDINGS="${FINDINGS}
          ### 🚨 CRITICAL: base64 decode + exec/eval combo
-          This is the exact pattern used in the [litellm supply chain attack](https://github.com/BerriAI/litellm/issues/24512) — base64-decoded strings passed to exec/eval to hide credential-stealing payloads.
+          Base64-decoded strings passed directly to exec/eval — the signature of hidden credential-stealing payloads.

          **Matches:**
          \`\`\`
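A Python restatement of the base64 + exec/eval pipeline can be handy for unit-testing the pattern before touching the workflow. It is a sketch, not part of the PR: it keeps the `grep -n` numbering (line numbers within the full diff), the `head -10` cap, and the same-line requirement, which also means a payload split across two lines is not flagged.

```python
import re

B64_DECODE = re.compile(r"base64\.(b64decode|decodebytes|urlsafe_b64decode)", re.IGNORECASE)
EXEC_EVAL = re.compile(r"exec\(|eval\(", re.IGNORECASE)


def base64_exec_hits(diff_text: str, limit: int = 10) -> list:
    """Added diff lines containing both a base64 decode call and exec(/eval(."""
    hits = []
    for lineno, line in enumerate(diff_text.splitlines(), start=1):
        if not line.startswith("+"):  # grep '^\+': added lines only
            continue
        if B64_DECODE.search(line) and EXEC_EVAL.search(line):
            hits.append(f"{lineno}:{line}")  # same "N:text" shape as grep -n
    return hits[:limit]  # head -10


# Hypothetical diff fragment.
sample = "\n".join([
    "+++ b/pkg/util.py",
    "+import base64",
    '+exec(base64.b64decode("cHJpbnQoMSk="))',
    "-eval(base64.b64decode(old))",
    "+data = base64.b64decode(blob)",
])
print(base64_exec_hits(sample))  # → ['3:+exec(base64.b64decode("cHJpbnQoMSk="))']
```

The removed `-eval(...)` line and the plain `b64decode` call are both ignored, which is exactly the narrowing this PR is after.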
@@ -64,41 +80,12 @@ jobs:
          "
          fi

-          # --- base64 decode/encode (alone — legitimate uses exist) ---
-          B64_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -iE 'base64\.(b64decode|b64encode|decodebytes|encodebytes|urlsafe_b64decode)|atob\(|btoa\(|Buffer\.from\(.*base64' | head -20 || true)
-          if [ -n "$B64_HITS" ]; then
-            FINDINGS="${FINDINGS}
-          ### ⚠️ WARNING: base64 encoding/decoding detected
-          Base64 has legitimate uses (images, JWT, etc.) but is also commonly used to obfuscate malicious payloads. Verify the usage is appropriate.
-
-          **Matches (first 20):**
-          \`\`\`
-          ${B64_HITS}
-          \`\`\`
-          "
-          fi
-
-          # --- exec/eval with string arguments ---
-          EXEC_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -E '(exec|eval)\s*\(' | grep -v '^\+\s*#' | grep -v 'test_\|mock\|assert\|# ' | head -20 || true)
-          if [ -n "$EXEC_HITS" ]; then
-            FINDINGS="${FINDINGS}
-          ### ⚠️ WARNING: exec() or eval() usage
-          Dynamic code execution can hide malicious behavior, especially when combined with base64 or network fetches.
-
-          **Matches (first 20):**
-          \`\`\`
-          ${EXEC_HITS}
-          \`\`\`
-          "
-          fi
-
-          # --- subprocess with encoded/obfuscated commands ---
-          PROC_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -E 'subprocess\.(Popen|call|run)\s*\(' | grep -iE 'base64|decode|encode|\\x|chr\(' | head -10 || true)
+          # --- subprocess with encoded/obfuscated command argument ---
+          PROC_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -E 'subprocess\.(Popen|call|run)\s*\(' | grep -iE 'base64|\\x[0-9a-f]{2}|chr\(' | head -10 || true)
          if [ -n "$PROC_HITS" ]; then
-            CRITICAL=true
            FINDINGS="${FINDINGS}
          ### 🚨 CRITICAL: subprocess with encoded/obfuscated command
-          Subprocess calls with encoded arguments are a strong indicator of payload execution.
+          Subprocess calls whose command strings are base64- or hex-encoded are a strong indicator of payload execution.

          **Matches:**
          \`\`\`
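The narrowed subprocess check drops the bare `decode|encode` alternatives, which matched ordinary calls such as `.stdout.decode()`, and now requires an actual `\xNN` hex escape rather than any `\x`. A small Python sketch comparing the old and new regexes on three hypothetical added lines:

```python
import re

SUBPROCESS = re.compile(r"subprocess\.(Popen|call|run)\s*\(")  # grep -E, case-sensitive
OLD_ENCODED = re.compile(r"base64|decode|encode|\\x|chr\(", re.IGNORECASE)
NEW_ENCODED = re.compile(r"base64|\\x[0-9a-f]{2}|chr\(", re.IGNORECASE)


def flagged(line: str, encoded: "re.Pattern") -> bool:
    """Added line with a subprocess call and an encoded-looking argument."""
    return line.startswith("+") and bool(SUBPROCESS.search(line)) and bool(encoded.search(line))


samples = [
    r'+subprocess.run(base64.b64decode(payload), shell=True)',
    r'+subprocess.Popen("\x63\x75\x72\x6c evil.sh", shell=True)',  # literal \x escapes
    r'+out = subprocess.run(["git", "status"], capture_output=True).stdout.decode()',
]
old = [flagged(s, OLD_ENCODED) for s in samples]
new = [flagged(s, NEW_ENCODED) for s in samples]
print(old)  # → [True, True, True]
print(new)  # → [True, True, False]
```

Only the benign `.decode()` line changes verdict, so the true positives survive the narrowing.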
@@ -107,25 +94,12 @@ jobs:
          "
          fi

-          # --- Network calls to non-standard domains ---
-          EXFIL_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -iE 'requests\.(post|put)\(|httpx\.(post|put)\(|urllib\.request\.urlopen' | grep -v '^\+\s*#' | grep -v 'test_\|mock\|assert' | head -10 || true)
-          if [ -n "$EXFIL_HITS" ]; then
-            FINDINGS="${FINDINGS}
-          ### ⚠️ WARNING: Outbound network calls (POST/PUT)
-          Outbound POST/PUT requests in new code could be data exfiltration. Verify the destination URLs are legitimate.
-
-          **Matches (first 10):**
-          \`\`\`
-          ${EXFIL_HITS}
-          \`\`\`
-          "
-          fi
-
-          # --- setup.py / setup.cfg install hooks ---
-          SETUP_HITS=$(git diff --name-only "$BASE".."$HEAD" | grep -E '(setup\.py|setup\.cfg|__init__\.pth|sitecustomize\.py|usercustomize\.py)$' || true)
+          # --- Install-hook files (setup.py/sitecustomize/usercustomize/__init__.pth) ---
+          # These execute during pip install or interpreter startup.
+          SETUP_HITS=$(git diff --name-only "$BASE".."$HEAD" | grep -E '(^|/)(setup\.py|setup\.cfg|sitecustomize\.py|usercustomize\.py|__init__\.pth)$' || true)
          if [ -n "$SETUP_HITS" ]; then
            FINDINGS="${FINDINGS}
-          ### ⚠️ WARNING: Install hook files modified
+          ### 🚨 CRITICAL: Install-hook file added or modified
          These files can execute code during package installation or interpreter startup.

          **Files:**
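Anchoring the install-hook regex on `(^|/)` matters now that this check is CRITICAL and fails the job: without it, any path that merely ends in `setup.py`, such as a test module, would match. A quick comparison of the old and new patterns on hypothetical paths:

```python
import re

OLD = re.compile(r"(setup\.py|setup\.cfg|__init__\.pth|sitecustomize\.py|usercustomize\.py)$")
NEW = re.compile(r"(^|/)(setup\.py|setup\.cfg|sitecustomize\.py|usercustomize\.py|__init__\.pth)$")

# Hypothetical output of `git diff --name-only`, one path per line.
paths = [
    "setup.py",
    "pkg/setup.cfg",
    "src/sitecustomize.py",
    "tests/test_setup.py",
    "tools/mysetup.py",
]
old_hits = [p for p in paths if OLD.search(p)]
new_hits = [p for p in paths if NEW.search(p)]
print(new_hits)  # → ['setup.py', 'pkg/setup.cfg', 'src/sitecustomize.py']
```

The old pattern flags all five paths; the anchored one drops `tests/test_setup.py` and `tools/mysetup.py`.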
@@ -135,114 +109,31 @@ jobs:
          "
          fi

-          # --- Compile/marshal/pickle (code object injection) ---
-          MARSHAL_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -iE 'marshal\.loads|pickle\.loads|compile\(' | grep -v '^\+\s*#' | grep -v 'test_\|re\.compile\|ast\.compile' | head -10 || true)
-          if [ -n "$MARSHAL_HITS" ]; then
-            FINDINGS="${FINDINGS}
-          ### ⚠️ WARNING: marshal/pickle/compile usage
-          These can deserialize or construct executable code objects.
-
-          **Matches:**
-          \`\`\`
-          ${MARSHAL_HITS}
-          \`\`\`
-          "
-          fi
-
-          # --- CI/CD workflow files modified ---
-          WORKFLOW_HITS=$(git diff --name-only "$BASE".."$HEAD" | grep -E '\.github/workflows/.*\.ya?ml$' || true)
-          if [ -n "$WORKFLOW_HITS" ]; then
-            FINDINGS="${FINDINGS}
-          ### ⚠️ WARNING: CI/CD workflow files modified
-          Changes to workflow files can alter build pipelines, inject steps, or modify permissions. Verify no unauthorized actions or secrets access were added.
-
-          **Files:**
-          \`\`\`
-          ${WORKFLOW_HITS}
-          \`\`\`
-          "
-          fi
-
-          # --- Dockerfile / container build files modified ---
-          DOCKER_HITS=$(git diff --name-only "$BASE".."$HEAD" | grep -iE '(Dockerfile|\.dockerignore|docker-compose)' || true)
-          if [ -n "$DOCKER_HITS" ]; then
-            FINDINGS="${FINDINGS}
-          ### ⚠️ WARNING: Container build files modified
-          Changes to Dockerfiles or compose files can alter base images, add build steps, or expose ports. Verify base image pins and build commands.
-
-          **Files:**
-          \`\`\`
-          ${DOCKER_HITS}
-          \`\`\`
-          "
-          fi
-
-          # --- Dependency manifest files modified ---
-          DEP_HITS=$(git diff --name-only "$BASE".."$HEAD" | grep -E '(pyproject\.toml|requirements.*\.txt|package\.json|Gemfile|go\.mod|Cargo\.toml)$' || true)
-          if [ -n "$DEP_HITS" ]; then
-            FINDINGS="${FINDINGS}
-          ### ⚠️ WARNING: Dependency manifest files modified
-          Changes to dependency files can introduce new packages or change version pins. Verify all dependency changes are intentional and from trusted sources.
-
-          **Files:**
-          \`\`\`
-          ${DEP_HITS}
-          \`\`\`
-          "
-          fi
-
-          # --- GitHub Actions version unpinning (mutable tags instead of SHAs) ---
-          ACTIONS_UNPIN=$(echo "$DIFF" | grep -n '^\+' | grep 'uses:' | grep -v '#' | grep -E '@v[0-9]' | head -10 || true)
-          if [ -n "$ACTIONS_UNPIN" ]; then
-            FINDINGS="${FINDINGS}
-          ### ⚠️ WARNING: GitHub Actions with mutable version tags
-          Actions should be pinned to full commit SHAs (not \`@v4\`, \`@v5\`). Mutable tags can be retargeted silently if a maintainer account is compromised.
-
-          **Matches:**
-          \`\`\`
-          ${ACTIONS_UNPIN}
-          \`\`\`
-          "
-          fi
-
-          # --- Output results ---
          if [ -n "$FINDINGS" ]; then
            echo "found=true" >> "$GITHUB_OUTPUT"
-            if [ "$CRITICAL" = true ]; then
-              echo "critical=true" >> "$GITHUB_OUTPUT"
-            else
-              echo "critical=false" >> "$GITHUB_OUTPUT"
-            fi
-            # Write findings to a file (multiline env vars are fragile)
            echo "$FINDINGS" > /tmp/findings.md
          else
            echo "found=false" >> "$GITHUB_OUTPUT"
-            echo "critical=false" >> "$GITHUB_OUTPUT"
          fi

-      - name: Post warning comment
+      - name: Post critical finding comment
        if: steps.scan.outputs.found == 'true'
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
-          SEVERITY="⚠️ Supply Chain Risk Detected"
-          if [ "${{ steps.scan.outputs.critical }}" = "true" ]; then
-            SEVERITY="🚨 CRITICAL Supply Chain Risk Detected"
-          fi
+          BODY="## 🚨 CRITICAL Supply Chain Risk Detected

-          BODY="## ${SEVERITY}
-
-          This PR contains patterns commonly associated with supply chain attacks. This does **not** mean the PR is malicious — but these patterns require careful human review before merging.
+          This PR contains a pattern that has been used in real supply chain attacks. A maintainer must review the flagged code carefully before merging.

          $(cat /tmp/findings.md)

          ---
-          *Automated scan triggered by [supply-chain-audit](/.github/workflows/supply-chain-audit.yml). If this is a false positive, a maintainer can approve after manual review.*"
+          *Scanner only fires on high-signal indicators: .pth files, base64+exec/eval combos, subprocess with encoded commands, or install-hook files. Low-signal warnings were removed intentionally — if you're seeing this comment, the finding is worth inspecting.*"

          gh pr comment "${{ github.event.pull_request.number }}" --body "$BODY" || echo "::warning::Could not post PR comment (expected for fork PRs — GITHUB_TOKEN is read-only)"

      - name: Fail on critical findings
-        if: steps.scan.outputs.critical == 'true'
+        if: steps.scan.outputs.found == 'true'
        run: |
          echo "::error::CRITICAL supply chain risk patterns detected in this PR. See the PR comment for details."
          exit 1
@@ -3,8 +3,14 @@ name: Tests
 on:
  push:
    branches: [main]
+    paths-ignore:
+      - '**/*.md'
+      - 'docs/**'
  pull_request:
    branches: [main]
+    paths-ignore:
+      - '**/*.md'
+      - 'docs/**'

 permissions:
  contents: read
@@ -17,7 +23,7 @@ concurrency:
 jobs:
  test:
    runs-on: ubuntu-latest
-    timeout-minutes: 10
+    timeout-minutes: 20
    steps:
      - name: Checkout code
        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
@@ -54,6 +54,11 @@ environments/benchmarks/evals/
 # Web UI build output
 hermes_cli/web_dist/

+# Web UI assets — synced from @nous-research/ui at build time via
+# `npm run sync-assets` (see web/package.json).
+web/public/fonts/
+web/public/ds-assets/
+
 # Release script temp files
 .release_notes.md
 mini-swe-agent/
@@ -27,12 +27,10 @@ WORKDIR /opt/hermes
 # Copy only package manifests first so npm install + Playwright are cached
 # unless the lockfiles themselves change.
 COPY package.json package-lock.json ./
-COPY scripts/whatsapp-bridge/package.json scripts/whatsapp-bridge/package-lock.json scripts/whatsapp-bridge/
 COPY web/package.json web/package-lock.json web/

 RUN npm install --prefer-offline --no-audit && \
    npx playwright install --with-deps chromium --only-shell && \
-    (cd scripts/whatsapp-bridge && npm install --prefer-offline --no-audit) && \
    (cd web && npm install --prefer-offline --no-audit) && \
    npm cache clean --force

@@ -20,6 +20,46 @@ from pathlib import Path
 from hermes_constants import get_hermes_home


+# Methods clients send as periodic liveness probes. They are not part of the
+# ACP schema, so the acp router correctly returns JSON-RPC -32601 to the
+# caller — but the supervisor task that dispatches the request then surfaces
+# the raised RequestError via ``logging.exception("Background task failed")``,
+# which dumps a traceback to stderr every probe interval. Clients like
+# acp-bridge already treat the -32601 response as "agent alive", so the
+# traceback is pure noise. We keep the protocol response intact and only
+# silence the stderr noise for this specific benign case.
+_BENIGN_PROBE_METHODS = frozenset({"ping", "health", "healthcheck"})
+
+
+class _BenignProbeMethodFilter(logging.Filter):
+    """Suppress acp 'Background task failed' tracebacks caused by unknown
+    liveness-probe methods (e.g. ``ping``) while leaving every other
+    background-task error — including method_not_found for any non-probe
+    method — visible in stderr.
+    """
+
+    def filter(self, record: logging.LogRecord) -> bool:
+        if record.getMessage() != "Background task failed":
+            return True
+        exc_info = record.exc_info
+        if not exc_info:
+            return True
+        exc = exc_info[1]
+        # Imported lazily so this module stays importable when the optional
+        # ``agent-client-protocol`` dependency is not installed.
+        try:
+            from acp.exceptions import RequestError
+        except ImportError:
+            return True
+        if not isinstance(exc, RequestError):
+            return True
+        if getattr(exc, "code", None) != -32601:
+            return True
+        data = getattr(exc, "data", None)
+        method = data.get("method") if isinstance(data, dict) else None
+        return method not in _BENIGN_PROBE_METHODS
+
+
 def _setup_logging() -> None:
    """Route all logging to stderr so stdout stays clean for ACP stdio."""
    handler = logging.StreamHandler(sys.stderr)
@@ -29,6 +69,7 @@ def _setup_logging() -> None:
            datefmt="%Y-%m-%d %H:%M:%S",
        )
    )
+    handler.addFilter(_BenignProbeMethodFilter())
    root = logging.getLogger()
    root.handlers.clear()
    root.addHandler(handler)
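To check the filter's decision logic without installing `agent-client-protocol`, one can replay it against a stand-in exception class. In this sketch `RequestError` is a hypothetical substitute for `acp.exceptions.RequestError`, assumed (as the filter itself assumes) to expose `.code` and a `.data` dict carrying `method`; the non-probe method name is made up.

```python
import logging
import sys

_BENIGN_PROBE_METHODS = frozenset({"ping", "health", "healthcheck"})


class RequestError(Exception):
    """Hypothetical stand-in for acp.exceptions.RequestError."""

    def __init__(self, code, message, data=None):
        super().__init__(message)
        self.code = code
        self.data = data


class BenignProbeFilter(logging.Filter):
    """Same decision logic as _BenignProbeMethodFilter, minus the lazy acp import."""

    def filter(self, record: logging.LogRecord) -> bool:
        if record.getMessage() != "Background task failed" or not record.exc_info:
            return True
        exc = record.exc_info[1]
        if not isinstance(exc, RequestError) or getattr(exc, "code", None) != -32601:
            return True
        data = getattr(exc, "data", None)
        method = data.get("method") if isinstance(data, dict) else None
        return method not in _BENIGN_PROBE_METHODS  # False drops the record


def make_record(exc: BaseException) -> logging.LogRecord:
    # Build the record a logging.exception("Background task failed") call would emit.
    try:
        raise exc
    except BaseException:
        exc_info = sys.exc_info()
    return logging.LogRecord("acp", logging.ERROR, "demo.py", 0,
                             "Background task failed", None, exc_info)


f = BenignProbeFilter()
probe = make_record(RequestError(-32601, "Method not found", {"method": "ping"}))
other = make_record(RequestError(-32601, "Method not found", {"method": "session/frobnicate"}))
print(f.filter(probe), f.filter(other))  # → False True
```

Because the filter sits on the handler rather than the logger, the JSON-RPC response to the client is untouched; only the stderr traceback disappears.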
@@ -292,9 +292,15 @@ def _common_betas_for_base_url(base_url: str | None) -> list[str]:
    return _COMMON_BETAS


-def build_anthropic_client(api_key: str, base_url: str = None):
+def build_anthropic_client(api_key: str, base_url: str | None = None, timeout: float | None = None):
    """Create an Anthropic client, auto-detecting setup-tokens vs API keys.

+    If *timeout* is a positive number it overrides the default 900s timeout,
+    which httpx applies to reads, writes and pool acquisition; the connect
+    timeout stays at 10s.  Callers pass this from the per-provider /
+    per-model ``request_timeout_seconds`` config so Anthropic-native and
+    Anthropic-compatible providers respect the same knob as OpenAI-wire
+    providers.
+
    Returns an anthropic.Anthropic instance.
    """
    if _anthropic_sdk is None:
@@ -305,8 +311,9 @@ def build_anthropic_client(api_key: str, base_url: str = None):
    from httpx import Timeout

    normalized_base_url = _normalize_base_url_text(base_url)
+    _read_timeout = timeout if (isinstance(timeout, (int, float)) and timeout > 0) else 900.0
    kwargs = {
-        "timeout": Timeout(timeout=900.0, connect=10.0),
+        "timeout": Timeout(timeout=float(_read_timeout), connect=10.0),
    }
    if normalized_base_url:
        kwargs["base_url"] = normalized_base_url
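The timeout fallback is easy to misread, so here is a pure-Python replay of the expression that picks `_read_timeout`. It avoids importing httpx and returns the per-phase values that `httpx.Timeout(timeout=..., connect=10.0)` would end up with; `resolve_timeouts` is a hypothetical name used only for illustration.

```python
_DEFAULT_TIMEOUT = 900.0  # seconds, from the diff
_CONNECT_TIMEOUT = 10.0


def resolve_timeouts(timeout: object = None) -> dict:
    """Replays the fallback expression from build_anthropic_client."""
    ok = isinstance(timeout, (int, float)) and timeout > 0
    total = float(timeout) if ok else _DEFAULT_TIMEOUT
    # httpx.Timeout(timeout=total, connect=10.0) uses `total` for every phase
    # except connect.
    return {"connect": _CONNECT_TIMEOUT, "read": total, "write": total, "pool": total}


print(resolve_timeouts()["read"])     # → 900.0
print(resolve_timeouts(120)["read"])  # → 120.0
```

Note the edges: a value that arrives from YAML as the string `"30"` silently falls back to 900s, and `True` (a `bool` is an `int`) becomes a 1s timeout, so callers should coerce `request_timeout_seconds` to a float before passing it.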
@@ -1518,3 +1525,42 @@ def normalize_anthropic_response(
        ),
        finish_reason,
    )
+
+
+def normalize_anthropic_response_v2(
+    response,
+    strip_tool_prefix: bool = False,
+) -> "NormalizedResponse":
+    """Normalize Anthropic response to NormalizedResponse.
+
+    Wraps the existing normalize_anthropic_response() and maps its output
+    to the shared transport types.  This allows incremental migration —
+    one call site at a time — without changing the original function.
+    """
+    from agent.transports.types import NormalizedResponse, build_tool_call
+
+    assistant_msg, finish_reason = normalize_anthropic_response(response, strip_tool_prefix)
+
+    tool_calls = None
+    if assistant_msg.tool_calls:
+        tool_calls = [
+            build_tool_call(
+                id=tc.id,
+                name=tc.function.name,
+                arguments=tc.function.arguments,
+            )
+            for tc in assistant_msg.tool_calls
+        ]
+
+    provider_data = {}
+    if getattr(assistant_msg, "reasoning_details", None):
+        provider_data["reasoning_details"] = assistant_msg.reasoning_details
+
+    return NormalizedResponse(
+        content=assistant_msg.content,
+        tool_calls=tool_calls,
+        finish_reason=finish_reason,
+        reasoning=getattr(assistant_msg, "reasoning", None),
+        usage=None,  # the v1 normalizer doesn't return usage; it lives on the raw response
+        provider_data=provider_data or None,
+    )
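The commit message mentions a back-compat shim that turns the v2 result back into a `SimpleNamespace` for downstream code in run_agent.py; that shim is not part of this hunk. A minimal sketch of what it could look like, assuming downstream code reads the same attributes that `normalize_anthropic_response_v2()` consumes from the v1 message (`content`, `tool_calls[i].id`, `.function.name`, `.function.arguments`, `reasoning`, `reasoning_details`). `to_legacy_message` and the stand-in dataclasses are hypothetical.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Dict, List, Optional


@dataclass
class ToolCall:  # minimal stand-in for agent.transports.types.ToolCall
    id: str
    name: str
    arguments: str


@dataclass
class NormalizedResponse:  # minimal stand-in; field names from the commit message
    content: Optional[str]
    tool_calls: Optional[List[ToolCall]]
    finish_reason: Optional[str]
    reasoning: Optional[str] = None
    provider_data: Optional[Dict[str, Any]] = None


def to_legacy_message(nr: NormalizedResponse) -> SimpleNamespace:
    """Hypothetical shim: rebuild the v1 message shape read by downstream code."""
    tool_calls = None
    if nr.tool_calls:
        tool_calls = [
            SimpleNamespace(
                id=tc.id,
                function=SimpleNamespace(name=tc.name, arguments=tc.arguments),
            )
            for tc in nr.tool_calls
        ]
    extra = nr.provider_data or {}
    return SimpleNamespace(
        content=nr.content,
        tool_calls=tool_calls,
        reasoning=nr.reasoning,
        reasoning_details=extra.get("reasoning_details"),
    )


nr = NormalizedResponse(
    content="Reading it now.",
    tool_calls=[ToolCall("toolu_01", "read_file", '{"path": "a.txt"}')],
    finish_reason="tool_calls",
)
msg = to_legacy_message(nr)
print(msg.tool_calls[0].function.name, msg.reasoning_details)  # → read_file None
```

`finish_reason` is deliberately not copied onto the namespace: v1 returns it separately as the second element of `(assistant_msg, finish_reason)`.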
@@ -116,8 +116,25 @@ _KIMI_THINKING_MODELS: frozenset = frozenset({
    "kimi-k2-thinking-turbo",
 })

+# Moonshot's public chat endpoints (api.moonshot.ai/v1 and api.moonshot.cn/v1)
+# enforce a different temperature contract than the Coding Plan endpoint
+# above.  Empirically, `kimi-k2.5` on the public API rejects 0.6 with HTTP 400
+# "invalid temperature: only 1 is allowed for this model"; the Coding Plan
+# lock (0.6 for non-thinking) does not apply.  `kimi-k2-turbo-preview` and the
+# thinking variants already match the Coding Plan contract on the public
+# endpoint, so we only override the models that diverge.
+# Users hit this endpoint when `KIMI_API_KEY` is a legacy `sk-*` key (the
+# `sk-kimi-*` prefix routes to api.kimi.com/coding/v1 instead; see
+# hermes_cli/auth.py:_kimi_base_url_for_key).
+_KIMI_PUBLIC_API_OVERRIDES: Dict[str, float] = {
+    "kimi-k2.5": 1.0,
+}

-def _fixed_temperature_for_model(model: Optional[str]) -> Optional[float]:
+
+def _fixed_temperature_for_model(
+    model: Optional[str],
+    base_url: Optional[str] = None,
+) -> Optional[float]:
    """Return a required temperature override for models with strict contracts.

    Moonshot's kimi-for-coding endpoint rejects any non-approved temperature on
@@ -125,15 +142,31 @@ def _fixed_temperature_for_model(model: Optional[str]) -> Optional[float]:
    variants require 1.0.  An optional ``vendor/`` prefix (e.g.
    ``moonshotai/kimi-k2.5``) is tolerated for aggregator routings.

+    When ``base_url`` points to one of Moonshot's public chat endpoints
+    (``api.moonshot.ai`` or ``api.moonshot.cn``), the contract changes for
+    ``kimi-k2.5``: the public API only accepts ``temperature=1``, not 0.6.
+    That override takes precedence over the Coding Plan defaults above.
+
    Returns ``None`` for every other model, including ``kimi-k2-instruct*``
    which is the separate non-coding K2 family with variable temperature.
    """
    normalized = (model or "").strip().lower()
+    bare = normalized.rsplit("/", 1)[-1]
+
+    # Public Moonshot API has a stricter contract for some models than the
+    # Coding Plan endpoint — check it first so it wins on conflict.
+    if base_url and ("api.moonshot.ai" in base_url.lower() or "api.moonshot.cn" in base_url.lower()):
+        public = _KIMI_PUBLIC_API_OVERRIDES.get(bare)
+        if public is not None:
+            logger.debug(
+                "Forcing temperature=%s for %r on public Moonshot API", public, model
+            )
+            return public
+
    fixed = _FIXED_TEMPERATURE_MODELS.get(normalized)
    if fixed is not None:
        logger.debug("Forcing temperature=%s for model %r (fixed map)", fixed, model)
        return fixed
-    bare = normalized.rsplit("/", 1)[-1]
    if bare in _KIMI_THINKING_MODELS:
        logger.debug("Forcing temperature=1.0 for kimi thinking model %r", model)
        return 1.0
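The precedence in `_fixed_temperature_for_model` (public-endpoint override first, then the fixed map, then thinking models) can be sketched as follows. Only `_KIMI_PUBLIC_API_OVERRIDES`, the two hostnames, and one thinking-model name come from this diff; the one-entry `_FIXED_TEMPERATURE_MODELS` stand-in (0.6 for `kimi-k2.5`, per the Coding Plan lock described above) is hypothetical, and the sketch stops at the branches visible in this hunk.

```python
from typing import Dict, Optional

# From this diff:
_KIMI_PUBLIC_API_OVERRIDES: Dict[str, float] = {"kimi-k2.5": 1.0}
_KIMI_THINKING_MODELS = frozenset({"kimi-k2-thinking-turbo"})
_PUBLIC_MOONSHOT_HOSTS = ("api.moonshot.ai", "api.moonshot.cn")
# Hypothetical stand-in: the real fixed map is not shown in this diff.
_FIXED_TEMPERATURE_MODELS: Dict[str, float] = {"kimi-k2.5": 0.6}


def fixed_temperature(model: Optional[str], base_url: Optional[str] = None) -> Optional[float]:
    normalized = (model or "").strip().lower()
    bare = normalized.rsplit("/", 1)[-1]  # "moonshotai/kimi-k2.5" -> "kimi-k2.5"

    # 1) Public Moonshot endpoint: its override wins on conflict.
    if base_url and any(host in base_url.lower() for host in _PUBLIC_MOONSHOT_HOSTS):
        public = _KIMI_PUBLIC_API_OVERRIDES.get(bare)
        if public is not None:
            return public

    # 2) Coding Plan fixed map, looked up by the full normalized name.
    fixed = _FIXED_TEMPERATURE_MODELS.get(normalized)
    if fixed is not None:
        return fixed

    # 3) Thinking variants always require 1.0.
    if bare in _KIMI_THINKING_MODELS:
        return 1.0
    return None


print(fixed_temperature("kimi-k2.5", "https://api.kimi.com/coding/v1"))  # → 0.6
print(fixed_temperature("kimi-k2.5", "https://api.moonshot.ai/v1"))      # → 1.0
```

Computing `bare` before the public-endpoint check is why the diff moves that assignment up from its old position below the fixed-map lookup.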
@@ -200,6 +233,45 @@ _CODEX_AUX_MODEL = "gpt-5.2-codex"
 _CODEX_AUX_BASE_URL = "https://chatgpt.com/backend-api/codex"


+def _codex_cloudflare_headers(access_token: str) -> Dict[str, str]:
+    """Headers required to avoid Cloudflare 403s on chatgpt.com/backend-api/codex.
+
+    The Cloudflare layer in front of the Codex endpoint whitelists a small set of
+    first-party originators (``codex_cli_rs``, ``codex_vscode``, ``codex_sdk_ts``,
+    anything starting with ``Codex``). Requests from non-residential IPs (VPS,
+    server-hosted agents) that don't advertise an allowed originator are served
+    a 403 with ``cf-mitigated: challenge`` regardless of auth correctness.
+
+    We pin ``originator: codex_cli_rs`` to match the upstream codex-rs CLI, set
+    ``User-Agent`` to a codex_cli_rs-shaped string (beats SDK fingerprinting),
+    and extract ``ChatGPT-Account-ID`` (canonical casing, from codex-rs
+    ``auth.rs``) out of the OAuth JWT's ``chatgpt_account_id`` claim.
+
+    Malformed tokens are tolerated — we drop the account-ID header rather than
+    raise, so a bad token still surfaces as an auth error (401) instead of a
+    crash at client construction.
+    """
+    headers = {
+        "User-Agent": "codex_cli_rs/0.0.0 (Hermes Agent)",
+        "originator": "codex_cli_rs",
+    }
+    if not isinstance(access_token, str) or not access_token.strip():
+        return headers
+    try:
+        import base64
+        parts = access_token.split(".")
+        if len(parts) < 2:
+            return headers
+        payload_b64 = parts[1] + "=" * (-len(parts[1]) % 4)
+        claims = json.loads(base64.urlsafe_b64decode(payload_b64))
+        acct_id = claims.get("https://api.openai.com/auth", {}).get("chatgpt_account_id")
+        if isinstance(acct_id, str) and acct_id:
+            headers["ChatGPT-Account-ID"] = acct_id
+    except Exception:
+        pass
+    return headers
+
+
 def _to_openai_base_url(base_url: str) -> str:
    """Normalize an Anthropic-style base URL to OpenAI-compatible format.

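To exercise the account-ID extraction without a real OAuth token, a test can mint an unsigned, JWT-shaped string. The sketch condenses the JWT handling from `_codex_cloudflare_headers` into a standalone `codex_headers` function (a hypothetical name); `acct-123` is a dummy account ID and the signature segment is never verified.

```python
import base64
import json
from typing import Dict


def _b64url(obj: dict) -> str:
    # JWT segments are base64url without "=" padding.
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


def codex_headers(access_token: str) -> Dict[str, str]:
    """Condensed copy of the JWT handling in _codex_cloudflare_headers."""
    headers = {"User-Agent": "codex_cli_rs/0.0.0 (Hermes Agent)", "originator": "codex_cli_rs"}
    parts = access_token.split(".") if isinstance(access_token, str) else []
    if len(parts) < 2:
        return headers
    try:
        payload = parts[1] + "=" * (-len(parts[1]) % 4)  # restore stripped padding
        claims = json.loads(base64.urlsafe_b64decode(payload))
        acct = claims.get("https://api.openai.com/auth", {}).get("chatgpt_account_id")
        if isinstance(acct, str) and acct:
            headers["ChatGPT-Account-ID"] = acct
    except Exception:
        pass  # malformed token: omit the header and let the server answer 401
    return headers


token = ".".join([
    _b64url({"alg": "none"}),
    _b64url({"https://api.openai.com/auth": {"chatgpt_account_id": "acct-123"}}),
    "sig",  # dummy signature segment
])
print(codex_headers(token)["ChatGPT-Account-ID"])         # → acct-123
print("ChatGPT-Account-ID" in codex_headers("not-a-jwt"))  # → False
```

The `-len % 4` padding trick is what lets `urlsafe_b64decode` accept the unpadded segments real JWTs carry.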
@@ -775,6 +847,11 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
            if model is None:
                continue  # skip provider if we don't know a valid aux model
            logger.debug("Auxiliary text client: %s (%s) via pool", pconfig.name, model)
+            if provider_id == "gemini":
+                from agent.gemini_native_adapter import GeminiNativeClient, is_native_gemini_base_url
+
+                if is_native_gemini_base_url(base_url):
+                    return GeminiNativeClient(api_key=api_key, base_url=base_url), model
            extra = {}
            if "api.kimi.com" in base_url.lower():
                extra["default_headers"] = {"User-Agent": "KimiCLI/1.30.0"}
@@ -796,6 +873,11 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
        if model is None:
            continue  # skip provider if we don't know a valid aux model
        logger.debug("Auxiliary text client: %s (%s)", pconfig.name, model)
+        if provider_id == "gemini":
+            from agent.gemini_native_adapter import GeminiNativeClient, is_native_gemini_base_url
+
+            if is_native_gemini_base_url(base_url):
+                return GeminiNativeClient(api_key=api_key, base_url=base_url), model
        extra = {}
        if "api.kimi.com" in base_url.lower():
            extra["default_headers"] = {"User-Agent": "KimiCLI/1.30.0"}
@@ -1016,7 +1098,7 @@ def _validate_base_url(base_url: str) -> None:
        ) from exc


-def _try_custom_endpoint() -> Tuple[Optional[OpenAI], Optional[str]]:
+def _try_custom_endpoint() -> Tuple[Optional[Any], Optional[str]]:
    runtime = _resolve_custom_runtime()
    if len(runtime) == 2:
        custom_base, custom_key = runtime
@@ -1032,6 +1114,23 @@ def _try_custom_endpoint() -> Tuple[Optional[OpenAI], Optional[str]]:
    if custom_mode == "codex_responses":
        real_client = OpenAI(api_key=custom_key, base_url=custom_base)
        return CodexAuxiliaryClient(real_client, model), model
+    if custom_mode == "anthropic_messages":
+        # Third-party Anthropic-compatible gateway (MiniMax, Zhipu GLM,
+        # LiteLLM proxies, etc.).  Must NEVER be treated as OAuth —
+        # Anthropic OAuth claims only apply to api.anthropic.com.
+        try:
+            from agent.anthropic_adapter import build_anthropic_client
+            real_client = build_anthropic_client(custom_key, custom_base)
+        except ImportError:
+            logger.warning(
+                "Custom endpoint declares api_mode=anthropic_messages but the "
+                "anthropic SDK is not installed — falling back to OpenAI-wire."
+            )
+            return OpenAI(api_key=custom_key, base_url=custom_base), model
+        return (
+            AnthropicAuxiliaryClient(real_client, model, custom_key, custom_base, is_oauth=False),
+            model,
+        )
    return OpenAI(api_key=custom_key, base_url=custom_base), model


@@ -1052,7 +1151,11 @@ def _try_codex() -> Tuple[Optional[Any], Optional[str]]:
            return None, None
        base_url = _CODEX_AUX_BASE_URL
    logger.debug("Auxiliary client: Codex OAuth (%s via Responses API)", _CODEX_AUX_MODEL)
-    real_client = OpenAI(api_key=codex_token, base_url=base_url)
+    real_client = OpenAI(
+        api_key=codex_token,
+        base_url=base_url,
+        default_headers=_codex_cloudflare_headers(codex_token),
+    )
    return CodexAuxiliaryClient(real_client, _CODEX_AUX_MODEL), _CODEX_AUX_MODEL


@@ -1348,6 +1451,13 @@ def _to_async_client(sync_client, model: str):
        return AsyncCodexAuxiliaryClient(sync_client), model
    if isinstance(sync_client, AnthropicAuxiliaryClient):
        return AsyncAnthropicAuxiliaryClient(sync_client), model
+    try:
+        from agent.gemini_native_adapter import GeminiNativeClient, AsyncGeminiNativeClient
+
+        if isinstance(sync_client, GeminiNativeClient):
+            return AsyncGeminiNativeClient(sync_client), model
+    except ImportError:
+        pass
    try:
        from agent.copilot_acp_client import CopilotACPClient
        if isinstance(sync_client, CopilotACPClient):
@@ -1512,7 +1622,11 @@ def resolve_provider_client(
                               "but no Codex OAuth token found (run: hermes model)")
                return None, None
            final_model = _normalize_resolved_model(model or _CODEX_AUX_MODEL, provider)
-            raw_client = OpenAI(api_key=codex_token, base_url=_CODEX_AUX_BASE_URL)
+            raw_client = OpenAI(
+                api_key=codex_token,
+                base_url=_CODEX_AUX_BASE_URL,
+                default_headers=_codex_cloudflare_headers(codex_token),
+            )
            return (raw_client, final_model)
        # Standard path: wrap in CodexAuxiliaryClient adapter
        client, default = _try_codex()
@@ -1640,6 +1754,15 @@ def resolve_provider_client(
        default_model = _API_KEY_PROVIDER_AUX_MODELS.get(provider, "")
        final_model = _normalize_resolved_model(model or default_model, provider)

+        if provider == "gemini":
+            from agent.gemini_native_adapter import GeminiNativeClient, is_native_gemini_base_url
+
+            if is_native_gemini_base_url(base_url):
+                client = GeminiNativeClient(api_key=api_key, base_url=base_url)
+                logger.debug("resolve_provider_client: %s (%s)", provider, final_model)
+                return (_to_async_client(client, final_model) if async_mode
+                        else (client, final_model))
+
        # Provider-specific headers
        headers = {}
        if "api.kimi.com" in base_url.lower():
@@ -2190,7 +2313,6 @@ def _resolve_task_provider_model(
    to "custom" and the task uses that direct endpoint. api_mode is one of
    "chat_completions", "codex_responses", or None (auto-detect).
    """
-    config = {}
    cfg_provider = None
    cfg_model = None
    cfg_base_url = None
@@ -2198,16 +2320,7 @@ def _resolve_task_provider_model(
    cfg_api_mode = None

    if task:
-        try:
-            from hermes_cli.config import load_config
-            config = load_config()
-        except ImportError:
-            config = {}
-
-        aux = config.get("auxiliary", {}) if isinstance(config, dict) else {}
-        task_config = aux.get(task, {}) if isinstance(aux, dict) else {}
-        if not isinstance(task_config, dict):
-            task_config = {}
+        task_config = _get_auxiliary_task_config(task)
        cfg_provider = str(task_config.get("provider", "")).strip() or None
        cfg_model = str(task_config.get("model", "")).strip() or None
        cfg_base_url = str(task_config.get("base_url", "")).strip() or None
@@ -2237,17 +2350,25 @@ def _resolve_task_provider_model(
 _DEFAULT_AUX_TIMEOUT = 30.0


-def _get_task_timeout(task: str, default: float = _DEFAULT_AUX_TIMEOUT) -> float:
-    """Read timeout from auxiliary.{task}.timeout in config, falling back to *default*."""
+def _get_auxiliary_task_config(task: str) -> Dict[str, Any]:
+    """Return the config dict for auxiliary.<task>, or {} when unavailable."""
    if not task:
-        return default
+        return {}
    try:
        from hermes_cli.config import load_config
        config = load_config()
    except ImportError:
-        return default
+        return {}
    aux = config.get("auxiliary", {}) if isinstance(config, dict) else {}
    task_config = aux.get(task, {}) if isinstance(aux, dict) else {}
+    return task_config if isinstance(task_config, dict) else {}
+
+
+def _get_task_timeout(task: str, default: float = _DEFAULT_AUX_TIMEOUT) -> float:
+    """Read timeout from auxiliary.{task}.timeout in config, falling back to *default*."""
+    if not task:
+        return default
+    task_config = _get_auxiliary_task_config(task)
    raw = task_config.get("timeout")
    if raw is not None:
        try:
@@ -2257,6 +2378,15 @@ def _get_task_timeout(task: str, default: float = _DEFAULT_AUX_TIMEOUT) -> float
    return default


+def _get_task_extra_body(task: str) -> Dict[str, Any]:
+    """Read auxiliary.<task>.extra_body and return a shallow copy when valid."""
+    task_config = _get_auxiliary_task_config(task)
+    raw = task_config.get("extra_body")
+    if isinstance(raw, dict):
+        return dict(raw)
+    return {}
+
+
 # ---------------------------------------------------------------------------
 # Anthropic-compatible endpoint detection + image block conversion
 # ---------------------------------------------------------------------------
@@ -2344,7 +2474,7 @@ def _build_call_kwargs(
        "timeout": timeout,
    }

-    fixed_temperature = _fixed_temperature_for_model(model)
+    fixed_temperature = _fixed_temperature_for_model(model, base_url)
    if fixed_temperature is not None:
        temperature = fixed_temperature

@@ -2457,6 +2587,8 @@ def call_llm(
    """
    resolved_provider, resolved_model, resolved_base_url, resolved_api_key, resolved_api_mode = _resolve_task_provider_model(
        task, provider, model, base_url, api_key)
+    effective_extra_body = _get_task_extra_body(task)
+    effective_extra_body.update(extra_body or {})

    if task == "vision":
        effective_provider, client, final_model = resolve_vision_provider_client(
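Here is a minimal sketch of how the per-task `extra_body` is now resolved. It assumes the config has already been loaded into a plain dict, whereas the real helpers call `hermes_cli.config.load_config()` themselves. `get_auxiliary_task_config` and `effective_extra_body` are hypothetical stand-ins for `_get_auxiliary_task_config`, `_get_task_extra_body`, and the two merge lines added to `call_llm`. The task name `my_task` and the `extra_body` keys are made up for illustration.

```python
from typing import Any, Dict


def get_auxiliary_task_config(config: Any, task: str) -> Dict[str, Any]:
    """Return config["auxiliary"][task] when every level is a dict, else {}."""
    if not task or not isinstance(config, dict):
        return {}
    aux = config.get("auxiliary", {})
    task_config = aux.get(task, {}) if isinstance(aux, dict) else {}
    return task_config if isinstance(task_config, dict) else {}


def effective_extra_body(config: Any, task: str, extra_body: Any = None) -> Dict[str, Any]:
    """Config-level extra_body first; per-call extra_body overrides matching keys."""
    raw = get_auxiliary_task_config(config, task).get("extra_body")
    merged = dict(raw) if isinstance(raw, dict) else {}  # shallow copy of the config value
    merged.update(extra_body or {})
    return merged


cfg = {"auxiliary": {"my_task": {"extra_body": {"reasoning": {"effort": "low"}, "top_k": 20}}}}
print(effective_extra_body(cfg, "my_task", {"top_k": 40}))
# {'reasoning': {'effort': 'low'}, 'top_k': 40}
```

The config value is shallow-copied before `update()`, so a per-call `extra_body` overrides top-level keys without mutating the loaded config. Nested dicts are still shared, though, so a caller that mutates them in place would leak changes back into the config.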
@@ -2525,11 +2657,14 @@ def call_llm(
                     task, resolved_provider or "auto", final_model or "default",
                     f" at {_base_info}" if _base_info and "openrouter" not in _base_info else "")

+    # Pass the client's actual base_url (not just resolved_base_url) so
+    # endpoint-specific temperature overrides can distinguish
+    # api.moonshot.ai vs api.kimi.com/coding even on auto-detected routes.
    kwargs = _build_call_kwargs(
        resolved_provider, final_model, messages,
        temperature=temperature, max_tokens=max_tokens,
-        tools=tools, timeout=effective_timeout, extra_body=extra_body,
-        base_url=resolved_base_url)
+        tools=tools, timeout=effective_timeout, extra_body=effective_extra_body,
+        base_url=_base_info or resolved_base_url)

    # Convert image blocks for Anthropic-compatible endpoints (e.g. MiniMax)
    _client_base = str(getattr(client, "base_url", "") or "")
@@ -2583,7 +2718,8 @@ def call_llm(
                    fb_label, fb_model, messages,
                    temperature=temperature, max_tokens=max_tokens,
                    tools=tools, timeout=effective_timeout,
-                    extra_body=extra_body)
+                    extra_body=effective_extra_body,
+                    base_url=str(getattr(fb_client, "base_url", "") or ""))
                return _validate_llm_response(
                    fb_client.chat.completions.create(**fb_kwargs), task)
        raise
@@ -2665,6 +2801,8 @@ async def async_call_llm(
    """
    resolved_provider, resolved_model, resolved_base_url, resolved_api_key, resolved_api_mode = _resolve_task_provider_model(
        task, provider, model, base_url, api_key)
+    effective_extra_body = _get_task_extra_body(task)
+    effective_extra_body.update(extra_body or {})

    if task == "vision":
        effective_provider, client, final_model = resolve_vision_provider_client(
@@ -2718,14 +2856,17 @@ async def async_call_llm(

    effective_timeout = timeout if timeout is not None else _get_task_timeout(task)

+    # Pass the client's actual base_url (not just resolved_base_url) so
+    # endpoint-specific temperature overrides can distinguish
+    # api.moonshot.ai vs api.kimi.com/coding even on auto-detected routes.
+    _client_base = str(getattr(client, "base_url", "") or "")
    kwargs = _build_call_kwargs(
        resolved_provider, final_model, messages,
        temperature=temperature, max_tokens=max_tokens,
-        tools=tools, timeout=effective_timeout, extra_body=extra_body,
-        base_url=resolved_base_url)
+        tools=tools, timeout=effective_timeout, extra_body=effective_extra_body,
+        base_url=_client_base or resolved_base_url)

    # Convert image blocks for Anthropic-compatible endpoints (e.g. MiniMax)
-    _client_base = str(getattr(client, "base_url", "") or "")
    if _is_anthropic_compat_endpoint(resolved_provider, _client_base):
        kwargs["messages"] = _convert_openai_images_to_anthropic(kwargs["messages"])

@@ -2761,7 +2902,8 @@ async def async_call_llm(
                    fb_label, fb_model, messages,
                    temperature=temperature, max_tokens=max_tokens,
                    tools=tools, timeout=effective_timeout,
-                    extra_body=extra_body)
+                    extra_body=effective_extra_body,
+                    base_url=str(getattr(fb_client, "base_url", "") or ""))
                # Convert sync fallback client to async
                async_fb, async_fb_model = _to_async_client(fb_client, fb_model or "")
                if async_fb_model and async_fb_model != fb_kwargs.get("model"):
@@ -633,7 +633,9 @@ class ContextCompressor(ContextEngine):
            "assistant that continues the conversation. "
            "Do NOT respond to any questions or requests in the conversation — "
            "only output the structured summary. "
-            "Do NOT include any preamble, greeting, or prefix."
+            "Do NOT include any preamble, greeting, or prefix. "
+            "Write the summary in the same language the user was using in the "
+            "conversation — do not translate or switch to English."
        )

        # Shared structured template (used by both paths).
@@ -483,9 +483,7 @@ def _rg_files(path: Path, cwd: Path, limit: int) -> list[Path] | None:
            text=True,
            timeout=10,
        )
-    except FileNotFoundError:
-        return None
-    except subprocess.TimeoutExpired:
+    except (FileNotFoundError, OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
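The widened `except` in `_rg_files` matters because `FileNotFoundError` is only one subclass of `OSError`. For example, launching a binary that exists but is not executable surfaces as `PermissionError`. Below is a minimal sketch of the same pattern, using the hypothetical helper name `run_or_none`:

```python
import subprocess
from typing import List, Optional


def run_or_none(cmd: List[str], timeout: float = 10.0) -> Optional[str]:
    """Return stdout on success, or None when the command can't run or fails."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        # OSError covers FileNotFoundError (binary missing) and PermissionError
        # (binary not executable); TimeoutExpired covers a hung process.
        return None
    if result.returncode != 0:
        return None
    return result.stdout
```

Since `FileNotFoundError` already inherits from `OSError`, listing both in the tuple is redundant. It is harmless, though, and documents the most common failure explicitly.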
@@ -225,9 +225,11 @@ def build_tool_preview(tool_name: str, args: dict, max_len: int | None = None) -
            content = _oneline(args.get("content", ""))
            return f"+{target}: \"{content[:25]}{'...' if len(content) > 25 else ''}\""
        elif action == "replace":
-            return f"~{target}: \"{_oneline(args.get('old_text', '')[:20])}\""
+            old = _oneline(args.get("old_text") or "") or "<missing old_text>"
+            return f"~{target}: \"{old[:20]}\""
        elif action == "remove":
-            return f"-{target}: \"{_oneline(args.get('old_text', '')[:20])}\""
+            old = _oneline(args.get("old_text") or "") or "<missing old_text>"
+            return f"-{target}: \"{old[:20]}\""
        return action

    if tool_name == "send_message":
@@ -939,9 +941,13 @@ def get_cute_tool_message(
        if action == "add":
            return _wrap(f"┊ 🧠 memory    +{target}: \"{_trunc(args.get('content', ''), 30)}\"  {dur}")
        elif action == "replace":
-            return _wrap(f"┊ 🧠 memory    ~{target}: \"{_trunc(args.get('old_text', ''), 20)}\"  {dur}")
+            old = args.get("old_text") or "<missing old_text>"
+            return _wrap(f"┊ 🧠 memory    ~{target}: \"{_trunc(old, 20)}\"  {dur}")
        elif action == "remove":
-            return _wrap(f"┊ 🧠 memory    -{target}: \"{_trunc(args.get('old_text', ''), 20)}\"  {dur}")
+            old = args.get("old_text") or "<missing old_text>"
+            return _wrap(f"┊ 🧠 memory    -{target}: \"{_trunc(old, 20)}\"  {dur}")
        return _wrap(f"┊ 🧠 memory    {action}  {dur}")
    if tool_name == "skills_list":
        return _wrap(f"┊ 📚 skills    list {args.get('category', 'all')}  {dur}")
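The `old_text` fix targets a subtle `dict.get` behavior: the default applies only when the key is absent. A tool call that sends `"old_text": null` therefore used to pass `None` to the slice, which raised `TypeError`. Here is a minimal sketch contrasting the two patterns (the function names are hypothetical):

```python
from typing import Any, Dict

MISSING = "<missing old_text>"


def preview_before(args: Dict[str, Any]) -> str:
    # The "" default applies only when "old_text" is absent, not when it is None.
    return args.get("old_text", "")[:20]


def preview_after(args: Dict[str, Any]) -> str:
    # `or` also replaces None and "" with a visible placeholder.
    old = args.get("old_text") or MISSING
    return old[:20]
```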
@@ -290,7 +290,7 @@ def classify_api_error(
    if isinstance(body, dict):
        _err_obj = body.get("error", {})
        if isinstance(_err_obj, dict):
-            _body_msg = (_err_obj.get("message") or "").lower()
+            _body_msg = str(_err_obj.get("message") or "").lower()
            # Parse metadata.raw for wrapped provider errors
            _metadata = _err_obj.get("metadata", {})
            if isinstance(_metadata, dict):
@@ -302,11 +302,11 @@ def classify_api_error(
                        if isinstance(_inner, dict):
                            _inner_err = _inner.get("error", {})
                            if isinstance(_inner_err, dict):
-                                _metadata_msg = (_inner_err.get("message") or "").lower()
+                                _metadata_msg = str(_inner_err.get("message") or "").lower()
                    except (json.JSONDecodeError, TypeError):
                        pass
        if not _body_msg:
-            _body_msg = (body.get("message") or "").lower()
+            _body_msg = str(body.get("message") or "").lower()
    # Combine all message sources for pattern matching
    parts = [_raw_msg]
    if _body_msg and _body_msg not in _raw_msg:
@@ -606,10 +606,10 @@ def _classify_400(
    if isinstance(body, dict):
        err_obj = body.get("error", {})
        if isinstance(err_obj, dict):
-            err_body_msg = (err_obj.get("message") or "").strip().lower()
+            err_body_msg = str(err_obj.get("message") or "").strip().lower()
        # Responses API (and some providers) use flat body: {"message": "..."}
        if not err_body_msg:
-            err_body_msg = (body.get("message") or "").strip().lower()
+            err_body_msg = str(body.get("message") or "").strip().lower()
    is_generic = len(err_body_msg) < 30 or err_body_msg in ("error", "")
    is_large = approx_tokens > context_length * 0.4 or approx_tokens > 80000 or num_messages > 80
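Wrapping the message in `str()` before `.lower()` keeps the classifier from crashing when a provider puts a non-string value in `error.message`. The sketch below assumes such a payload and compares the two forms. The dict-valued `message` is a hypothetical example, not a documented provider response.

```python
from typing import Any, Dict


def message_before(err_obj: Dict[str, Any]) -> str:
    return (err_obj.get("message") or "").lower()


def message_after(err_obj: Dict[str, Any]) -> str:
    # str() first: dicts, lists, and numbers become text that can be lowercased
    # and pattern-matched instead of raising AttributeError.
    return str(err_obj.get("message") or "").lower()
```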

@@ -39,6 +39,7 @@ from typing import Any, Dict, Iterator, List, Optional
 import httpx

 from agent import google_oauth
+from agent.gemini_schema import sanitize_gemini_tool_parameters
 from agent.google_code_assist import (
    CODE_ASSIST_ENDPOINT,
    FREE_TIER_ID,
@@ -205,7 +206,7 @@ def _translate_tools_to_gemini(tools: Any) -> List[Dict[str, Any]]:
            decl["description"] = str(fn["description"])
        params = fn.get("parameters")
        if isinstance(params, dict):
-            decl["parameters"] = params
+            decl["parameters"] = sanitize_gemini_tool_parameters(params)
        declarations.append(decl)
    if not declarations:
        return []
@@ -504,9 +505,16 @@ def _iter_sse_events(response: httpx.Response) -> Iterator[Dict[str, Any]]:
 def _translate_stream_event(
    event: Dict[str, Any],
    model: str,
-    tool_call_indices: Dict[str, int],
+    tool_call_counter: List[int],
 ) -> List[_GeminiStreamChunk]:
-    """Unwrap Code Assist envelope and emit OpenAI-shaped chunk(s)."""
+    """Unwrap Code Assist envelope and emit OpenAI-shaped chunk(s).
+
+    ``tool_call_counter`` is a single-element list used as a mutable counter
+    across events in the same stream. Each ``functionCall`` part gets a
+    fresh, unique OpenAI ``index`` — keying by function name would collide
+    whenever the model issues parallel calls to the same tool (e.g. reading
+    three files in one turn).
+    """
    inner = event.get("response") if isinstance(event.get("response"), dict) else event
    candidates = inner.get("candidates") or []
    if not candidates:
@@ -532,7 +540,8 @@ def _translate_stream_event(
        fc = part.get("functionCall")
        if isinstance(fc, dict) and fc.get("name"):
            name = str(fc["name"])
-            idx = tool_call_indices.setdefault(name, len(tool_call_indices))
+            idx = tool_call_counter[0]
+            tool_call_counter[0] += 1
            try:
                args_str = json.dumps(fc.get("args") or {}, ensure_ascii=False)
            except (TypeError, ValueError):
@@ -549,7 +558,7 @@ def _translate_stream_event(
    finish_reason_raw = str(cand.get("finishReason") or "")
    if finish_reason_raw:
        mapped = _map_gemini_finish_reason(finish_reason_raw)
-        if tool_call_indices:
+        if tool_call_counter[0] > 0:
            mapped = "tool_calls"
        chunks.append(_make_stream_chunk(model=model, finish_reason=mapped))
    return chunks
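The counter matters because OpenAI-style stream consumers typically accumulate tool-call deltas by `index`. If two calls share an index, their `arguments` strings get concatenated into one malformed call. Here is a minimal sketch that assumes a simple accumulator of that kind. `accumulate`, `indices_by_name`, and `indices_by_counter` are hypothetical names, and each inner list in `indices_by_counter` stands in for the `functionCall` parts of one SSE event.

```python
from typing import Dict, List, Tuple


def indices_by_name(names: List[str]) -> List[int]:
    # Old scheme: one index per distinct function name.
    seen: Dict[str, int] = {}
    return [seen.setdefault(name, len(seen)) for name in names]


def indices_by_counter(events: List[List[str]]) -> List[int]:
    # New scheme: a single-element list acts as a counter shared across events,
    # so every functionCall part gets the next index regardless of its name.
    counter: List[int] = [0]
    indices: List[int] = []
    for event in events:
        for _name in event:
            indices.append(counter[0])
            counter[0] += 1
    return indices


def accumulate(deltas: List[Tuple[int, str]]) -> Dict[int, str]:
    # Join argument fragments per index, as a typical stream consumer would.
    calls: Dict[int, str] = {}
    for index, arguments in deltas:
        calls[index] = calls.get(index, "") + arguments
    return calls


args = ['{"path": "a.py"}', '{"path": "b.py"}', '{"path": "c.py"}']
old = accumulate(list(zip(indices_by_name(["read_file"] * 3), args)))
new = accumulate(list(zip(indices_by_counter([["read_file", "read_file"], ["read_file"]]), args)))
# old -> {0: '{"path": "a.py"}{"path": "b.py"}{"path": "c.py"}'}
# new -> {0: '{"path": "a.py"}', 1: '{"path": "b.py"}', 2: '{"path": "c.py"}'}
```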
@@ -733,9 +742,9 @@ class GeminiCloudCodeClient:
                        # Materialize error body for better diagnostics
                        response.read()
                        raise _gemini_http_error(response)
-                    tool_call_indices: Dict[str, int] = {}
+                    tool_call_counter: List[int] = [0]
                    for event in _iter_sse_events(response):
-                        for chunk in _translate_stream_event(event, model, tool_call_indices):
+                        for chunk in _translate_stream_event(event, model, tool_call_counter):
                            yield chunk
            except httpx.HTTPError as exc:
                raise CodeAssistError(
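The new `agent/gemini_native_adapter.py` that follows only takes over when the configured base URL points at Google AI Studio's native REST API. Endpoints ending in `/openai` keep the OpenAI-compatible path. Below is a minimal sketch of that routing decision, reusing the checks from the adapter's `is_native_gemini_base_url`. `pick_gemini_transport` is a hypothetical helper that mirrors the new `provider == "gemini"` branch in `resolve_provider_client`, and its return labels are illustrative.

```python
from typing import Optional

DEFAULT_GEMINI_BASE_URL = "https://generativelanguage.googleapis.com/v1beta"


def is_native_gemini_base_url(base_url: Optional[str]) -> bool:
    # Same checks as the adapter: AI Studio host, and not the /openai suffix.
    normalized = str(base_url or "").strip().rstrip("/").lower()
    if not normalized:
        return False
    if "generativelanguage.googleapis.com" not in normalized:
        return False
    return not normalized.endswith("/openai")


def pick_gemini_transport(provider: str, base_url: Optional[str]) -> str:
    """Hypothetical helper mirroring the gemini branch in resolve_provider_client."""
    if provider == "gemini" and is_native_gemini_base_url(base_url):
        return "native"  # GeminiNativeClient -> models/{model}:generateContent
    return "openai-compatible"  # presumably continues to the OpenAI client path


print(pick_gemini_transport("gemini", DEFAULT_GEMINI_BASE_URL))  # native
```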
@@ -0,0 +1,846 @@
+"""OpenAI-compatible facade over Google AI Studio's native Gemini API.
+
+Hermes keeps ``api_mode='chat_completions'`` for the ``gemini`` provider so the
+main agent loop can keep using its existing OpenAI-shaped message flow.
+This adapter is the transport shim that converts those OpenAI-style
+``messages[]`` / ``tools[]`` requests into Gemini's native
+``models/{model}:generateContent`` schema and converts the responses back.
+
+Why this exists
+---------------
+Google's OpenAI-compatible endpoint has been brittle for Hermes's multi-turn
+agent/tool loop (auth churn, tool-call replay quirks, thought-signature
+requirements).  The native Gemini API is the canonical path and avoids the
+OpenAI-compat layer entirely.
+"""
+
+from __future__ import annotations
+
+import asyncio
+import base64
+import json
+import logging
+import time
+import uuid
+from types import SimpleNamespace
+from typing import Any, Dict, Iterator, List, Optional
+
+import httpx
+
+from agent.gemini_schema import sanitize_gemini_tool_parameters
+
+logger = logging.getLogger(__name__)
+
+DEFAULT_GEMINI_BASE_URL = "https://generativelanguage.googleapis.com/v1beta"
+
+
+def is_native_gemini_base_url(base_url: str) -> bool:
+    """Return True when the endpoint speaks Gemini's native REST API."""
+    normalized = str(base_url or "").strip().rstrip("/").lower()
+    if not normalized:
+        return False
+    if "generativelanguage.googleapis.com" not in normalized:
+        return False
+    return not normalized.endswith("/openai")
+
+
+class GeminiAPIError(Exception):
+    """Error shape compatible with Hermes retry/error classification."""
+
+    def __init__(
+        self,
+        message: str,
+        *,
+        code: str = "gemini_api_error",
+        status_code: Optional[int] = None,
+        response: Optional[httpx.Response] = None,
+        retry_after: Optional[float] = None,
+        details: Optional[Dict[str, Any]] = None,
+    ) -> None:
+        super().__init__(message)
+        self.code = code
+        self.status_code = status_code
+        self.response = response
+        self.retry_after = retry_after
+        self.details = details or {}
+
+
+def _coerce_content_to_text(content: Any) -> str:
+    if content is None:
+        return ""
+    if isinstance(content, str):
+        return content
+    if isinstance(content, list):
+        pieces: List[str] = []
+        for part in content:
+            if isinstance(part, str):
+                pieces.append(part)
+            elif isinstance(part, dict) and part.get("type") == "text":
+                text = part.get("text")
+                if isinstance(text, str):
+                    pieces.append(text)
+        return "\n".join(pieces)
+    return str(content)
+
+
+def _extract_multimodal_parts(content: Any) -> List[Dict[str, Any]]:
+    if not isinstance(content, list):
+        text = _coerce_content_to_text(content)
+        return [{"text": text}] if text else []
+
+    parts: List[Dict[str, Any]] = []
+    for item in content:
+        if isinstance(item, str):
+            parts.append({"text": item})
+            continue
+        if not isinstance(item, dict):
+            continue
+        ptype = item.get("type")
+        if ptype == "text":
+            text = item.get("text")
+            if isinstance(text, str) and text:
+                parts.append({"text": text})
+        elif ptype == "image_url":
+            url = ((item.get("image_url") or {}).get("url") or "")
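+            # Only inline data: URLs are translated; remote image URLs are skipped.
+            if not isinstance(url, str) or not url.startswith("data:"):
+                continue
+            # Decode and re-encode the payload: b64decode drops characters outside
+            # the base64 alphabet (e.g. embedded newlines), and malformed input
+            # (bad padding, missing header) raises, so the part is skipped.
+            try: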
+                header, encoded = url.split(",", 1)
+                mime = header.split(":", 1)[1].split(";", 1)[0]
+                raw = base64.b64decode(encoded)
+            except Exception:
+                continue
+            parts.append(
+                {
+                    "inlineData": {
+                        "mimeType": mime,
+                        "data": base64.b64encode(raw).decode("ascii"),
+                    }
+                }
+            )
+    return parts
+
+
+def _tool_call_extra_signature(tool_call: Dict[str, Any]) -> Optional[str]:
+    extra = tool_call.get("extra_content") or {}
+    if not isinstance(extra, dict):
+        return None
+    google = extra.get("google") or extra.get("thought_signature")
+    if isinstance(google, dict):
+        sig = google.get("thought_signature") or google.get("thoughtSignature")
+        return str(sig) if isinstance(sig, str) and sig else None
+    if isinstance(google, str) and google:
+        return google
+    return None
+
+
+def _translate_tool_call_to_gemini(tool_call: Dict[str, Any]) -> Dict[str, Any]:
+    fn = tool_call.get("function") or {}
+    args_raw = fn.get("arguments", "")
+    try:
+        args = json.loads(args_raw) if isinstance(args_raw, str) and args_raw else {}
+    except json.JSONDecodeError:
+        args = {"_raw": args_raw}
+    if not isinstance(args, dict):
+        args = {"_value": args}
+
+    part: Dict[str, Any] = {
+        "functionCall": {
+            "name": str(fn.get("name") or ""),
+            "args": args,
+        }
+    }
+    thought_signature = _tool_call_extra_signature(tool_call)
+    if thought_signature:
+        part["thoughtSignature"] = thought_signature
+    return part
+
+
+def _translate_tool_result_to_gemini(
+    message: Dict[str, Any],
+    tool_name_by_call_id: Optional[Dict[str, str]] = None,
+) -> Dict[str, Any]:
+    tool_name_by_call_id = tool_name_by_call_id or {}
+    tool_call_id = str(message.get("tool_call_id") or "")
+    name = str(
+        message.get("name")
+        or tool_name_by_call_id.get(tool_call_id)
+        or tool_call_id
+        or "tool"
+    )
+    content = _coerce_content_to_text(message.get("content"))
+    try:
+        parsed = json.loads(content) if content.strip().startswith(("{", "[")) else None
+    except json.JSONDecodeError:
+        parsed = None
+    response = parsed if isinstance(parsed, dict) else {"output": content}
+    return {
+        "functionResponse": {
+            "name": name,
+            "response": response,
+        }
+    }
+
+
+def _build_gemini_contents(messages: List[Dict[str, Any]]) -> tuple[List[Dict[str, Any]], Optional[Dict[str, Any]]]:
+    system_text_parts: List[str] = []
+    contents: List[Dict[str, Any]] = []
+    tool_name_by_call_id: Dict[str, str] = {}
+
+    for msg in messages:
+        if not isinstance(msg, dict):
+            continue
+        role = str(msg.get("role") or "user")
+
+        if role == "system":
+            system_text_parts.append(_coerce_content_to_text(msg.get("content")))
+            continue
+
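+        if role in {"tool", "function"}:
+            response_part = _translate_tool_result_to_gemini(
+                msg,
+                tool_name_by_call_id=tool_name_by_call_id,
+            )
+            # Parallel tool calls produce several tool messages in a row; keep
+            # their functionResponse parts together in one user turn so they
+            # line up with the preceding model turn's functionCall parts.
+            prev = contents[-1] if contents else None
+            if (
+                prev is not None
+                and prev.get("role") == "user"
+                and prev.get("parts")
+                and all("functionResponse" in p for p in prev["parts"])
+            ):
+                prev["parts"].append(response_part)
+            else:
+                contents.append({"role": "user", "parts": [response_part]})
+            continue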
+
+        gemini_role = "model" if role == "assistant" else "user"
+        parts: List[Dict[str, Any]] = []
+
+        content_parts = _extract_multimodal_parts(msg.get("content"))
+        parts.extend(content_parts)
+
+        tool_calls = msg.get("tool_calls") or []
+        if isinstance(tool_calls, list):
+            for tool_call in tool_calls:
+                if isinstance(tool_call, dict):
+                    tool_call_id = str(tool_call.get("id") or tool_call.get("call_id") or "")
+                    tool_name = str(((tool_call.get("function") or {}).get("name") or ""))
+                    if tool_call_id and tool_name:
+                        tool_name_by_call_id[tool_call_id] = tool_name
+                    parts.append(_translate_tool_call_to_gemini(tool_call))
+
+        if parts:
+            contents.append({"role": gemini_role, "parts": parts})
+
+    system_instruction = None
+    joined_system = "\n".join(part for part in system_text_parts if part).strip()
+    if joined_system:
+        system_instruction = {"parts": [{"text": joined_system}]}
+    return contents, system_instruction
+
+
+def _translate_tools_to_gemini(tools: Any) -> List[Dict[str, Any]]:
+    if not isinstance(tools, list):
+        return []
+    declarations: List[Dict[str, Any]] = []
+    for tool in tools:
+        if not isinstance(tool, dict):
+            continue
+        fn = tool.get("function") or {}
+        if not isinstance(fn, dict):
+            continue
+        name = fn.get("name")
+        if not isinstance(name, str) or not name:
+            continue
+        decl: Dict[str, Any] = {"name": name}
+        description = fn.get("description")
+        if isinstance(description, str) and description:
+            decl["description"] = description
+        parameters = fn.get("parameters")
+        if isinstance(parameters, dict):
+            decl["parameters"] = sanitize_gemini_tool_parameters(parameters)
+        declarations.append(decl)
+    return [{"functionDeclarations": declarations}] if declarations else []
+
+
+def _translate_tool_choice_to_gemini(tool_choice: Any) -> Optional[Dict[str, Any]]:
+    if tool_choice is None:
+        return None
+    if isinstance(tool_choice, str):
+        if tool_choice == "auto":
+            return {"functionCallingConfig": {"mode": "AUTO"}}
+        if tool_choice == "required":
+            return {"functionCallingConfig": {"mode": "ANY"}}
+        if tool_choice == "none":
+            return {"functionCallingConfig": {"mode": "NONE"}}
+    if isinstance(tool_choice, dict):
+        fn = tool_choice.get("function") or {}
+        name = fn.get("name")
+        if isinstance(name, str) and name:
+            return {"functionCallingConfig": {"mode": "ANY", "allowedFunctionNames": [name]}}
+    return None
+
+
+def _normalize_thinking_config(config: Any) -> Optional[Dict[str, Any]]:
+    if not isinstance(config, dict) or not config:
+        return None
+    budget = config.get("thinkingBudget", config.get("thinking_budget"))
+    include = config.get("includeThoughts", config.get("include_thoughts"))
+    level = config.get("thinkingLevel", config.get("thinking_level"))
+    normalized: Dict[str, Any] = {}
+    if isinstance(budget, (int, float)):
+        normalized["thinkingBudget"] = int(budget)
+    if isinstance(include, bool):
+        normalized["includeThoughts"] = include
+    if isinstance(level, str) and level.strip():
+        normalized["thinkingLevel"] = level.strip().lower()
+    return normalized or None
+
+
+def build_gemini_request(
+    *,
+    messages: List[Dict[str, Any]],
+    tools: Any = None,
+    tool_choice: Any = None,
+    temperature: Optional[float] = None,
+    max_tokens: Optional[int] = None,
+    top_p: Optional[float] = None,
+    stop: Any = None,
+    thinking_config: Any = None,
+) -> Dict[str, Any]:
+    contents, system_instruction = _build_gemini_contents(messages)
+    request: Dict[str, Any] = {"contents": contents}
+    if system_instruction:
+        request["systemInstruction"] = system_instruction
+
+    gemini_tools = _translate_tools_to_gemini(tools)
+    if gemini_tools:
+        request["tools"] = gemini_tools
+
+    tool_config = _translate_tool_choice_to_gemini(tool_choice)
+    if tool_config:
+        request["toolConfig"] = tool_config
+
+    generation_config: Dict[str, Any] = {}
+    if temperature is not None:
+        generation_config["temperature"] = temperature
+    if max_tokens is not None:
+        generation_config["maxOutputTokens"] = max_tokens
+    if top_p is not None:
+        generation_config["topP"] = top_p
+    if stop:
+        generation_config["stopSequences"] = stop if isinstance(stop, list) else [str(stop)]
+    normalized_thinking = _normalize_thinking_config(thinking_config)
+    if normalized_thinking:
+        generation_config["thinkingConfig"] = normalized_thinking
+    if generation_config:
+        request["generationConfig"] = generation_config
+
+    return request
+
+
+def _map_gemini_finish_reason(reason: str) -> str:
+    mapping = {
+        "STOP": "stop",
+        "MAX_TOKENS": "length",
+        "SAFETY": "content_filter",
+        "RECITATION": "content_filter",
+        "OTHER": "stop",
+    }
+    return mapping.get(str(reason or "").upper(), "stop")
+
+
+def _tool_call_extra_from_part(part: Dict[str, Any]) -> Optional[Dict[str, Any]]:
+    sig = part.get("thoughtSignature")
+    if isinstance(sig, str) and sig:
+        return {"google": {"thought_signature": sig}}
+    return None
+
+
+def _empty_response(model: str) -> SimpleNamespace:
+    message = SimpleNamespace(
+        role="assistant",
+        content="",
+        tool_calls=None,
+        reasoning=None,
+        reasoning_content=None,
+        reasoning_details=None,
+    )
+    choice = SimpleNamespace(index=0, message=message, finish_reason="stop")
+    usage = SimpleNamespace(
+        prompt_tokens=0,
+        completion_tokens=0,
+        total_tokens=0,
+        prompt_tokens_details=SimpleNamespace(cached_tokens=0),
+    )
+    return SimpleNamespace(
+        id=f"chatcmpl-{uuid.uuid4().hex[:12]}",
+        object="chat.completion",
+        created=int(time.time()),
+        model=model,
+        choices=[choice],
+        usage=usage,
+    )
+
+
+def translate_gemini_response(resp: Dict[str, Any], model: str) -> SimpleNamespace:
+    candidates = resp.get("candidates") or []
+    if not isinstance(candidates, list) or not candidates:
+        return _empty_response(model)
+
+    cand = candidates[0] if isinstance(candidates[0], dict) else {}
+    content_obj = cand.get("content") if isinstance(cand, dict) else {}
+    parts = content_obj.get("parts") if isinstance(content_obj, dict) else []
+
+    text_pieces: List[str] = []
+    reasoning_pieces: List[str] = []
+    tool_calls: List[SimpleNamespace] = []
+
+    for index, part in enumerate(parts or []):
+        if not isinstance(part, dict):
+            continue
+        if part.get("thought") is True and isinstance(part.get("text"), str):
+            reasoning_pieces.append(part["text"])
+            continue
+        if isinstance(part.get("text"), str):
+            text_pieces.append(part["text"])
+            continue
+        fc = part.get("functionCall")
+        if isinstance(fc, dict) and fc.get("name"):
+            try:
+                args_str = json.dumps(fc.get("args") or {}, ensure_ascii=False)
+            except (TypeError, ValueError):
+                args_str = "{}"
+            tool_call = SimpleNamespace(
+                id=f"call_{uuid.uuid4().hex[:12]}",
+                type="function",
+                index=index,
+                function=SimpleNamespace(name=str(fc["name"]), arguments=args_str),
+            )
+            extra_content = _tool_call_extra_from_part(part)
+            if extra_content:
+                tool_call.extra_content = extra_content
+            tool_calls.append(tool_call)
+
+    finish_reason = "tool_calls" if tool_calls else _map_gemini_finish_reason(str(cand.get("finishReason") or ""))
+    usage_meta = resp.get("usageMetadata") or {}
+    usage = SimpleNamespace(
+        prompt_tokens=int(usage_meta.get("promptTokenCount") or 0),
+        completion_tokens=int(usage_meta.get("candidatesTokenCount") or 0),
+        total_tokens=int(usage_meta.get("totalTokenCount") or 0),
+        prompt_tokens_details=SimpleNamespace(
+            cached_tokens=int(usage_meta.get("cachedContentTokenCount") or 0),
+        ),
+    )
+    reasoning = "".join(reasoning_pieces) or None
+    message = SimpleNamespace(
+        role="assistant",
+        content="".join(text_pieces) if text_pieces else None,
+        tool_calls=tool_calls or None,
+        reasoning=reasoning,
+        reasoning_content=reasoning,
+        reasoning_details=None,
+    )
+    choice = SimpleNamespace(index=0, message=message, finish_reason=finish_reason)
+    return SimpleNamespace(
+        id=f"chatcmpl-{uuid.uuid4().hex[:12]}",
+        object="chat.completion",
+        created=int(time.time()),
+        model=model,
+        choices=[choice],
+        usage=usage,
+    )
+
+
+class _GeminiStreamChunk(SimpleNamespace):
+    pass
+
+
+def _make_stream_chunk(
+    *,
+    model: str,
+    content: str = "",
+    tool_call_delta: Optional[Dict[str, Any]] = None,
+    finish_reason: Optional[str] = None,
+    reasoning: str = "",
+) -> _GeminiStreamChunk:
+    delta_kwargs: Dict[str, Any] = {
+        "role": "assistant",
+        "content": None,
+        "tool_calls": None,
+        "reasoning": None,
+        "reasoning_content": None,
+    }
+    if content:
+        delta_kwargs["content"] = content
+    if tool_call_delta is not None:
+        tool_delta = SimpleNamespace(
+            index=tool_call_delta.get("index", 0),
+            id=tool_call_delta.get("id") or f"call_{uuid.uuid4().hex[:12]}",
+            type="function",
+            function=SimpleNamespace(
+                name=tool_call_delta.get("name") or "",
+                arguments=tool_call_delta.get("arguments") or "",
+            ),
+        )
+        extra_content = tool_call_delta.get("extra_content")
+        if isinstance(extra_content, dict):
+            tool_delta.extra_content = extra_content
+        delta_kwargs["tool_calls"] = [tool_delta]
+    if reasoning:
+        delta_kwargs["reasoning"] = reasoning
+        delta_kwargs["reasoning_content"] = reasoning
+    delta = SimpleNamespace(**delta_kwargs)
+    choice = SimpleNamespace(index=0, delta=delta, finish_reason=finish_reason)
+    return _GeminiStreamChunk(
+        id=f"chatcmpl-{uuid.uuid4().hex[:12]}",
+        object="chat.completion.chunk",
+        created=int(time.time()),
+        model=model,
+        choices=[choice],
+        usage=None,
+    )
+
+
+def _iter_sse_events(response: httpx.Response) -> Iterator[Dict[str, Any]]:
+    buffer = ""
+    for chunk in response.iter_text():
+        if not chunk:
+            continue
+        buffer += chunk
+        while "\n" in buffer:
+            line, buffer = buffer.split("\n", 1)
+            line = line.rstrip("\r")
+            if not line:
+                continue
+            if not line.startswith("data:"):  # the space after "data:" is optional in SSE
+                continue
+            data = line[5:].lstrip(" ")
+            if data == "[DONE]":
+                return
+            try:
+                payload = json.loads(data)
+            except json.JSONDecodeError:
+                logger.debug("Non-JSON Gemini SSE line: %s", data[:200])
+                continue
+            if isinstance(payload, dict):
+                yield payload
+
+
+def translate_stream_event(event: Dict[str, Any], model: str, tool_call_indices: Dict[str, Dict[str, Any]]) -> List[_GeminiStreamChunk]:
+    candidates = event.get("candidates") or []
+    if not candidates:
+        return []
+    cand = candidates[0] if isinstance(candidates[0], dict) else {}
+    parts = ((cand.get("content") or {}).get("parts") or []) if isinstance(cand, dict) else []
+    chunks: List[_GeminiStreamChunk] = []
+
+    for part_index, part in enumerate(parts):
+        if not isinstance(part, dict):
+            continue
+        if part.get("thought") is True and isinstance(part.get("text"), str):
+            chunks.append(_make_stream_chunk(model=model, reasoning=part["text"]))
+            continue
+        if isinstance(part.get("text"), str) and part["text"]:
+            chunks.append(_make_stream_chunk(model=model, content=part["text"]))
+        fc = part.get("functionCall")
+        if isinstance(fc, dict) and fc.get("name"):
+            name = str(fc["name"])
+            try:
+                args_str = json.dumps(fc.get("args") or {}, ensure_ascii=False, sort_keys=True)
+            except (TypeError, ValueError):
+                args_str = "{}"
+            thought_signature = part.get("thoughtSignature") if isinstance(part.get("thoughtSignature"), str) else ""
+            call_key = json.dumps(
+                {
+                    "part_index": part_index,
+                    "name": name,
+                    "thought_signature": thought_signature,
+                },
+                sort_keys=True,
+            )
+            slot = tool_call_indices.get(call_key)
+            if slot is None:
+                slot = {
+                    "index": len(tool_call_indices),
+                    "id": f"call_{uuid.uuid4().hex[:12]}",
+                    "last_arguments": "",
+                }
+                tool_call_indices[call_key] = slot
+            emitted_arguments = args_str
+            last_arguments = str(slot.get("last_arguments") or "")
+            if last_arguments:
+                if args_str == last_arguments:
+                    emitted_arguments = ""
+                elif args_str.startswith(last_arguments):
+                    emitted_arguments = args_str[len(last_arguments):]
+            slot["last_arguments"] = args_str
+            chunks.append(
+                _make_stream_chunk(
+                    model=model,
+                    tool_call_delta={
+                        "index": slot["index"],
+                        "id": slot["id"],
+                        "name": name,
+                        "arguments": emitted_arguments,
+                        "extra_content": _tool_call_extra_from_part(part),
+                    },
+                )
+            )
+
+    finish_reason_raw = str(cand.get("finishReason") or "")
+    if finish_reason_raw:
+        mapped = "tool_calls" if tool_call_indices else _map_gemini_finish_reason(finish_reason_raw)
+        chunks.append(_make_stream_chunk(model=model, finish_reason=mapped))
+    return chunks
+
+
+def gemini_http_error(response: httpx.Response) -> GeminiAPIError:
+    status = response.status_code
+    body_text = ""
+    body_json: Dict[str, Any] = {}
+    try:
+        body_text = response.text
+    except Exception:
+        body_text = ""
+    if body_text:
+        try:
+            parsed = json.loads(body_text)
+            if isinstance(parsed, dict):
+                body_json = parsed
+        except (ValueError, TypeError):
+            body_json = {}
+
+    err_obj = body_json.get("error") if isinstance(body_json, dict) else None
+    if not isinstance(err_obj, dict):
+        err_obj = {}
+    err_status = str(err_obj.get("status") or "").strip()
+    err_message = str(err_obj.get("message") or "").strip()
+    details_list = err_obj.get("details") if isinstance(err_obj.get("details"), list) else []
+
+    reason = ""
+    retry_after: Optional[float] = None
+    metadata: Dict[str, Any] = {}
+    for detail in details_list:
+        if not isinstance(detail, dict):
+            continue
+        type_url = str(detail.get("@type") or "")
+        if not reason and type_url.endswith("/google.rpc.ErrorInfo"):
+            reason_value = detail.get("reason")
+            if isinstance(reason_value, str):
+                reason = reason_value
+            md = detail.get("metadata")
+            if isinstance(md, dict):
+                metadata = md
+    header_retry = response.headers.get("Retry-After") or response.headers.get("retry-after")
+    if header_retry:
+        try:
+            retry_after = float(header_retry)
+        except (TypeError, ValueError):
+            retry_after = None
+
+    code = f"gemini_http_{status}"
+    if status == 401:
+        code = "gemini_unauthorized"
+    elif status == 429:
+        code = "gemini_rate_limited"
+    elif status == 404:
+        code = "gemini_model_not_found"
+
+    if err_message:
+        message = f"Gemini HTTP {status} ({err_status or 'error'}): {err_message}"
+    else:
+        message = f"Gemini returned HTTP {status}: {body_text[:500]}"
+
+    return GeminiAPIError(
+        message,
+        code=code,
+        status_code=status,
+        response=response,
+        retry_after=retry_after,
+        details={
+            "status": err_status,
+            "reason": reason,
+            "metadata": metadata,
+            "message": err_message,
+        },
+    )
+
+
+class _GeminiChatCompletions:
+    def __init__(self, client: "GeminiNativeClient"):
+        self._client = client
+
+    def create(self, **kwargs: Any) -> Any:
+        return self._client._create_chat_completion(**kwargs)
+
+
+class _AsyncGeminiChatCompletions:
+    def __init__(self, client: "AsyncGeminiNativeClient"):
+        self._client = client
+
+    async def create(self, **kwargs: Any) -> Any:
+        return await self._client._create_chat_completion(**kwargs)
+
+
+class _GeminiChatNamespace:
+    def __init__(self, client: "GeminiNativeClient"):
+        self.completions = _GeminiChatCompletions(client)
+
+
+class _AsyncGeminiChatNamespace:
+    def __init__(self, client: "AsyncGeminiNativeClient"):
+        self.completions = _AsyncGeminiChatCompletions(client)
+
+
+class GeminiNativeClient:
+    """Minimal OpenAI-SDK-compatible facade over Gemini's native REST API."""
+
+    def __init__(
+        self,
+        *,
+        api_key: str,
+        base_url: Optional[str] = None,
+        default_headers: Optional[Dict[str, str]] = None,
+        timeout: Any = None,
+        http_client: Optional[httpx.Client] = None,
+        **_: Any,
+    ) -> None:
+        self.api_key = api_key
+        normalized_base = (base_url or DEFAULT_GEMINI_BASE_URL).rstrip("/")
+        if normalized_base.endswith("/openai"):
+            normalized_base = normalized_base[: -len("/openai")]
+        self.base_url = normalized_base
+        self._default_headers = dict(default_headers or {})
+        self.chat = _GeminiChatNamespace(self)
+        self.is_closed = False
+        self._http = http_client or httpx.Client(
+            timeout=timeout or httpx.Timeout(connect=15.0, read=600.0, write=30.0, pool=30.0)
+        )
+
+    def close(self) -> None:
+        self.is_closed = True
+        try:
+            self._http.close()
+        except Exception:
+            pass
+
+    def __enter__(self):
+        return self
+
+    def __exit__(self, exc_type, exc_val, exc_tb):
+        self.close()
+
+    def _headers(self) -> Dict[str, str]:
+        headers = {
+            "Content-Type": "application/json",
+            "Accept": "application/json",
+            "x-goog-api-key": self.api_key,
+            "User-Agent": "hermes-agent (gemini-native)",
+        }
+        headers.update(self._default_headers)
+        return headers
+
+    @staticmethod
+    def _advance_stream_iterator(iterator: Iterator[_GeminiStreamChunk]) -> tuple[bool, Optional[_GeminiStreamChunk]]:
+        try:
+            return False, next(iterator)
+        except StopIteration:
+            return True, None
+
+    def _create_chat_completion(
+        self,
+        *,
+        model: str = "gemini-2.5-flash",
+        messages: Optional[List[Dict[str, Any]]] = None,
+        stream: bool = False,
+        tools: Any = None,
+        tool_choice: Any = None,
+        temperature: Optional[float] = None,
+        max_tokens: Optional[int] = None,
+        top_p: Optional[float] = None,
+        stop: Any = None,
+        extra_body: Optional[Dict[str, Any]] = None,
+        timeout: Any = None,
+        **_: Any,
+    ) -> Any:
+        thinking_config = None
+        if isinstance(extra_body, dict):
+            thinking_config = extra_body.get("thinking_config") or extra_body.get("thinkingConfig")
+
+        request = build_gemini_request(
+            messages=messages or [],
+            tools=tools,
+            tool_choice=tool_choice,
+            temperature=temperature,
+            max_tokens=max_tokens,
+            top_p=top_p,
+            stop=stop,
+            thinking_config=thinking_config,
+        )
+
+        if stream:
+            return self._stream_completion(model=model, request=request, timeout=timeout)
+
+        url = f"{self.base_url}/models/{model}:generateContent"
+        response = self._http.post(url, json=request, headers=self._headers(), timeout=timeout)
+        if response.status_code != 200:
+            raise gemini_http_error(response)
+        try:
+            payload = response.json()
+        except ValueError as exc:
+            raise GeminiAPIError(
+                f"Invalid JSON from Gemini native API: {exc}",
+                code="gemini_invalid_json",
+                status_code=response.status_code,
+                response=response,
+            ) from exc
+        return translate_gemini_response(payload, model=model)
+
+    def _stream_completion(self, *, model: str, request: Dict[str, Any], timeout: Any = None) -> Iterator[_GeminiStreamChunk]:
+        url = f"{self.base_url}/models/{model}:streamGenerateContent?alt=sse"
+        stream_headers = dict(self._headers())
+        stream_headers["Accept"] = "text/event-stream"
+
+        def _generator() -> Iterator[_GeminiStreamChunk]:
+            try:
+                with self._http.stream("POST", url, json=request, headers=stream_headers, timeout=timeout) as response:
+                    if response.status_code != 200:
+                        response.read()
+                        raise gemini_http_error(response)
+                    tool_call_indices: Dict[str, Dict[str, Any]] = {}
+                    for event in _iter_sse_events(response):
+                        for chunk in translate_stream_event(event, model, tool_call_indices):
+                            yield chunk
+            except httpx.HTTPError as exc:
+                raise GeminiAPIError(
+                    f"Gemini streaming request failed: {exc}",
+                    code="gemini_stream_error",
+                ) from exc
+
+        return _generator()
+
+
+class AsyncGeminiNativeClient:
+    """Async wrapper used by auxiliary_client for native Gemini calls."""
+
+    def __init__(self, sync_client: GeminiNativeClient):
+        self._sync = sync_client
+        self.api_key = sync_client.api_key
+        self.base_url = sync_client.base_url
+        self.chat = _AsyncGeminiChatNamespace(self)
+
+    async def _create_chat_completion(self, **kwargs: Any) -> Any:
+        stream = bool(kwargs.get("stream"))
+        result = await asyncio.to_thread(self._sync.chat.completions.create, **kwargs)
+        if not stream:
+            return result
+
+        async def _async_stream() -> Any:
+            while True:
+                done, chunk = await asyncio.to_thread(self._sync._advance_stream_iterator, result)
+                if done:
+                    break
+                yield chunk
+
+        return _async_stream()
+
+    async def close(self) -> None:
+        await asyncio.to_thread(self._sync.close)
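The duplicate-suppression step in `translate_stream_event` is the subtlest part of the streaming path. The sketch below pulls it out into a standalone function under simplified assumptions. `argument_delta` and its `slots` dict are hypothetical names. A plain string key stands in for the per-call slot, which the real code keys by part index, tool name, and thought signature.

```python
import json
from typing import Any, Dict


def argument_delta(slots: Dict[str, str], call_key: str, args: Dict[str, Any]) -> str:
    """Return only the new portion of a tool call's serialized arguments.

    Hypothetical helper that mirrors the branch order in translate_stream_event.
    Gemini may resend a whole functionCall on a later SSE event, while
    OpenAI-style consumers concatenate argument fragments.
    """
    args_str = json.dumps(args, ensure_ascii=False, sort_keys=True)
    last = slots.get(call_key, "")
    slots[call_key] = args_str
    if not last:
        return args_str                 # first sighting: emit everything
    if args_str == last:
        return ""                       # exact repeat: emit nothing
    if args_str.startswith(last):
        return args_str[len(last):]     # strict extension: emit only the suffix
    return args_str                     # diverged: fall back to the full string
```

Every snapshot here is a complete JSON object, so a changed payload can never be a strict prefix extension of the previous one. In practice it takes the full-string fallback rather than the suffix branch. A consumer that concatenates fragments would then see two objects back to back. That only matters if Gemini resends a call with different arguments under the same key.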
@@ -0,0 +1,85 @@
+"""Helpers for translating OpenAI-style tool schemas to Gemini's schema subset."""
+
+from __future__ import annotations
+
+from typing import Any, Dict, List
+
+# Gemini's ``FunctionDeclaration.parameters`` field accepts the ``Schema``
+# object, which is only a subset of OpenAPI 3.0 / JSON Schema.  Strip fields
+# outside that subset before sending Hermes tool schemas to Google.
+_GEMINI_SCHEMA_ALLOWED_KEYS = {
+    "type",
+    "format",
+    "title",
+    "description",
+    "nullable",
+    "enum",
+    "maxItems",
+    "minItems",
+    "properties",
+    "required",
+    "minProperties",
+    "maxProperties",
+    "minLength",
+    "maxLength",
+    "pattern",
+    "example",
+    "anyOf",
+    "propertyOrdering",
+    "default",
+    "items",
+    "minimum",
+    "maximum",
+}
+
+
+def sanitize_gemini_schema(schema: Any) -> Dict[str, Any]:
+    """Return a Gemini-compatible copy of a tool parameter schema.
+
+    Hermes tool schemas are OpenAI-flavored JSON Schema and may contain keys
+    such as ``$schema`` or ``additionalProperties`` that Google's Gemini
+    ``Schema`` object rejects.  This helper preserves the documented Gemini
+    subset and recursively sanitizes nested ``properties`` / ``items`` /
+    ``anyOf`` definitions.
+    """
+
+    if not isinstance(schema, dict):
+        return {}
+
+    cleaned: Dict[str, Any] = {}
+    for key, value in schema.items():
+        if key not in _GEMINI_SCHEMA_ALLOWED_KEYS:
+            continue
+        if key == "properties":
+            if not isinstance(value, dict):
+                continue
+            props: Dict[str, Any] = {}
+            for prop_name, prop_schema in value.items():
+                if not isinstance(prop_name, str):
+                    continue
+                props[prop_name] = sanitize_gemini_schema(prop_schema)
+            cleaned[key] = props
+            continue
+        if key == "items":
+            cleaned[key] = sanitize_gemini_schema(value)
+            continue
+        if key == "anyOf":
+            if not isinstance(value, list):
+                continue
+            cleaned[key] = [
+                sanitize_gemini_schema(item)
+                for item in value
+                if isinstance(item, dict)
+            ]
+            continue
+        cleaned[key] = value
+    return cleaned
+
+
+def sanitize_gemini_tool_parameters(parameters: Any) -> Dict[str, Any]:
+    """Normalize tool parameters to a valid Gemini object schema."""
+
+    cleaned = sanitize_gemini_schema(parameters)
+    if not cleaned:
+        return {"type": "object", "properties": {}}
+    return cleaned
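The example below shows what the sanitizer keeps. It is a trimmed, self-contained re-implementation of the same recursion with a smaller allow-list, applied to a typical OpenAI-style tool schema. `ALLOWED` and `sanitize` are illustrative names, not the module's API.

```python
from typing import Any, Dict

# Trimmed allow-list for illustration only; the module above keeps the full
# documented Gemini Schema subset.
ALLOWED = {"type", "description", "properties", "required", "items", "anyOf", "enum"}


def sanitize(schema: Any) -> Dict[str, Any]:
    """Recursively drop keys outside ALLOWED, descending into nested schemas."""
    if not isinstance(schema, dict):
        return {}
    cleaned: Dict[str, Any] = {}
    for key, value in schema.items():
        if key not in ALLOWED:
            continue  # e.g. $schema, additionalProperties, exclusiveMinimum
        if key == "properties":
            if isinstance(value, dict):
                cleaned[key] = {k: sanitize(v) for k, v in value.items() if isinstance(k, str)}
        elif key == "items":
            cleaned[key] = sanitize(value)
        elif key == "anyOf":
            if isinstance(value, list):
                cleaned[key] = [sanitize(v) for v in value if isinstance(v, dict)]
        else:
            cleaned[key] = value
    return cleaned


example = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "additionalProperties": False,
    "properties": {
        "path": {"type": "string", "description": "File to read"},
        "lines": {"type": "array", "items": {"type": "integer", "exclusiveMinimum": 0}},
    },
    "required": ["path"],
}
clean = sanitize(example)
```

Unsupported keys disappear at every nesting level. `properties`, `items`, and `anyOf` are recursed into rather than copied verbatim.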
@@ -124,6 +124,7 @@ class InsightsEngine:
        # Gather raw data
        sessions = self._get_sessions(cutoff, source)
        tool_usage = self._get_tool_usage(cutoff, source)
+        skill_usage = self._get_skill_usage(cutoff, source)
        message_stats = self._get_message_stats(cutoff, source)

        if not sessions:
@@ -135,6 +136,15 @@ class InsightsEngine:
                "models": [],
                "platforms": [],
                "tools": [],
+                "skills": {
+                    "summary": {
+                        "total_skill_loads": 0,
+                        "total_skill_edits": 0,
+                        "total_skill_actions": 0,
+                        "distinct_skills_used": 0,
+                    },
+                    "top_skills": [],
+                },
                "activity": {},
                "top_sessions": [],
            }
@@ -144,6 +154,7 @@ class InsightsEngine:
        models = self._compute_model_breakdown(sessions)
        platforms = self._compute_platform_breakdown(sessions)
        tools = self._compute_tool_breakdown(tool_usage)
+        skills = self._compute_skill_breakdown(skill_usage)
        activity = self._compute_activity_patterns(sessions)
        top_sessions = self._compute_top_sessions(sessions)

@@ -156,6 +167,7 @@ class InsightsEngine:
            "models": models,
            "platforms": platforms,
            "tools": tools,
+            "skills": skills,
            "activity": activity,
            "top_sessions": top_sessions,
        }
@@ -284,6 +296,82 @@ class InsightsEngine:
            for name, count in tool_counts.most_common()
        ]

+    def _get_skill_usage(self, cutoff: float, source: str = None) -> List[Dict]:
+        """Extract per-skill usage from assistant tool calls."""
+        skill_counts: Dict[str, Dict[str, Any]] = {}
+
+        if source:
+            cursor = self._conn.execute(
+                """SELECT m.tool_calls, m.timestamp
+                   FROM messages m
+                   JOIN sessions s ON s.id = m.session_id
+                   WHERE s.started_at >= ? AND s.source = ?
+                     AND m.role = 'assistant' AND m.tool_calls IS NOT NULL""",
+                (cutoff, source),
+            )
+        else:
+            cursor = self._conn.execute(
+                """SELECT m.tool_calls, m.timestamp
+                   FROM messages m
+                   JOIN sessions s ON s.id = m.session_id
+                   WHERE s.started_at >= ?
+                     AND m.role = 'assistant' AND m.tool_calls IS NOT NULL""",
+                (cutoff,),
+            )
+
+        for row in cursor.fetchall():
+            try:
+                calls = row["tool_calls"]
+                if isinstance(calls, str):
+                    calls = json.loads(calls)
+                if not isinstance(calls, list):
+                    continue
+            except (json.JSONDecodeError, TypeError):
+                continue
+
+            timestamp = row["timestamp"]
+            for call in calls:
+                if not isinstance(call, dict):
+                    continue
+                func = call.get("function") or {}
+                tool_name = func.get("name") if isinstance(func, dict) else None
+                if tool_name not in {"skill_view", "skill_manage"}:
+                    continue
+
+                args = func.get("arguments")
+                if isinstance(args, str):
+                    try:
+                        args = json.loads(args)
+                    except (json.JSONDecodeError, TypeError):
+                        continue
+                if not isinstance(args, dict):
+                    continue
+
+                skill_name = args.get("name")
+                if not isinstance(skill_name, str) or not skill_name.strip():
+                    continue
+
+                entry = skill_counts.setdefault(
+                    skill_name,
+                    {
+                        "skill": skill_name,
+                        "view_count": 0,
+                        "manage_count": 0,
+                        "last_used_at": None,
+                    },
+                )
+                if tool_name == "skill_view":
+                    entry["view_count"] += 1
+                else:
+                    entry["manage_count"] += 1
+
+                if timestamp is not None and (
+                    entry["last_used_at"] is None or timestamp > entry["last_used_at"]
+                ):
+                    entry["last_used_at"] = timestamp
+
+        return list(skill_counts.values())
+
    def _get_message_stats(self, cutoff: float, source: str = None) -> Dict:
        """Get aggregate message statistics."""
        if source:
@@ -475,6 +563,46 @@ class InsightsEngine:
            })
        return result

+    def _compute_skill_breakdown(self, skill_usage: List[Dict]) -> Dict[str, Any]:
+        """Process per-skill usage into summary + ranked list."""
+        total_skill_loads = sum(s["view_count"] for s in skill_usage)
+        total_skill_edits = sum(s["manage_count"] for s in skill_usage)
+        total_skill_actions = total_skill_loads + total_skill_edits
+
+        top_skills = []
+        for skill in skill_usage:
+            total_count = skill["view_count"] + skill["manage_count"]
+            percentage = (total_count / total_skill_actions * 100) if total_skill_actions else 0
+            top_skills.append({
+                "skill": skill["skill"],
+                "view_count": skill["view_count"],
+                "manage_count": skill["manage_count"],
+                "total_count": total_count,
+                "percentage": percentage,
+                "last_used_at": skill.get("last_used_at"),
+            })
+
+        # Most-used first; ties fall back to loads, edits, recency, then alphabetical name.
+        top_skills.sort(
+            key=lambda s: (
+                -s["total_count"],
+                -s["view_count"],
+                -s["manage_count"],
+                -(s["last_used_at"] or 0),
+                s["skill"],
+            ),
+        )
+
+        return {
+            "summary": {
+                "total_skill_loads": total_skill_loads,
+                "total_skill_edits": total_skill_edits,
+                "total_skill_actions": total_skill_actions,
+                "distinct_skills_used": len(skill_usage),
+            },
+            "top_skills": top_skills,
+        }
+
    def _compute_activity_patterns(self, sessions: List[Dict]) -> Dict:
        """Analyze activity patterns by day of week and hour."""
        day_counts = Counter()  # 0=Monday ... 6=Sunday
@@ -670,6 +798,28 @@ class InsightsEngine:
                lines.append(f"  ... and {len(report['tools']) - 15} more tools")
            lines.append("")

+        # Skill usage
+        skills = report.get("skills", {})
+        top_skills = skills.get("top_skills", [])
+        if top_skills:
+            lines.append("  🧠 Top Skills")
+            lines.append("  " + "─" * 56)
+            lines.append(f"  {'Skill':<28} {'Loads':>7} {'Edits':>7} {'Last used':>11}")
+            for skill in top_skills[:10]:
+                last_used = "—"
+                if skill.get("last_used_at"):
+                    last_used = datetime.fromtimestamp(skill["last_used_at"]).strftime("%b %d")
+                lines.append(
+                    f"  {skill['skill'][:28]:<28} {skill['view_count']:>7,} {skill['manage_count']:>7,} {last_used:>11}"
+                )
+            summary = skills.get("summary", {})
+            lines.append(
+                f"  Distinct skills: {summary.get('distinct_skills_used', 0)}  "
+                f"Loads: {summary.get('total_skill_loads', 0):,}  "
+                f"Edits: {summary.get('total_skill_edits', 0):,}"
+            )
+            lines.append("")
+
        # Activity patterns
        act = report.get("activity", {})
        if act.get("by_day"):
@@ -753,6 +903,18 @@ class InsightsEngine:
                lines.append(f"  {t['tool']} — {t['count']:,} calls ({t['percentage']:.1f}%)")
            lines.append("")

+        skills = report.get("skills", {})
+        if skills.get("top_skills"):
+            lines.append("**🧠 Top Skills:**")
+            for skill in skills["top_skills"][:5]:
+                suffix = ""
+                if skill.get("last_used_at"):
+                    suffix = f", last used {datetime.fromtimestamp(skill['last_used_at']).strftime('%b %d')}"
+                lines.append(
+                    f"  {skill['skill']} — {skill['view_count']:,} loads, {skill['manage_count']:,} edits{suffix}"
+                )
+            lines.append("")
+
        # Activity summary
        act = report.get("activity", {})
        if act.get("busiest_day") and act.get("busiest_hour"):
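You can exercise the counting in `_get_skill_usage` and the ranking in `_compute_skill_breakdown` without a database. The sketch below uses a hypothetical `tally_skills` function that walks a flat list of OpenAI-style tool-call dicts. It counts `skill_view` as a load and `skill_manage` as an edit, then ranks by total. It omits `last_used_at` tracking and uses a simpler tie-break than the real sort key.

```python
import json
from collections import defaultdict
from typing import Any, Dict, List


def tally_skills(tool_calls: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Count skill loads/edits per skill name and rank by total usage."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: {"view_count": 0, "manage_count": 0})
    for call in tool_calls:
        func = call.get("function") or {}
        if not isinstance(func, dict):
            continue
        tool_name = func.get("name")
        if tool_name not in ("skill_view", "skill_manage"):
            continue  # ignore unrelated tools
        args = func.get("arguments")
        if isinstance(args, str):
            try:
                args = json.loads(args)  # arguments usually arrive as a JSON string
            except json.JSONDecodeError:
                continue
        skill = args.get("name") if isinstance(args, dict) else None
        if not isinstance(skill, str) or not skill.strip():
            continue
        field = "view_count" if tool_name == "skill_view" else "manage_count"
        counts[skill][field] += 1
    ranked = [
        {"skill": name, **c, "total_count": c["view_count"] + c["manage_count"]}
        for name, c in counts.items()
    ]
    ranked.sort(key=lambda r: (-r["total_count"], r["skill"]))  # most used first
    return ranked
```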
@@ -116,7 +116,6 @@ DEFAULT_CONTEXT_LENGTHS = {
    "gpt-5.4-nano": 400000,           # 400k (not 1.05M like full 5.4)
    "gpt-5.4-mini": 400000,           # 400k (not 1.05M like full 5.4)
    "gpt-5.4": 1050000,               # GPT-5.4, GPT-5.4 Pro (1.05M context)
-    "gpt-5.3-codex-spark": 128000,    # Spark variant has reduced 128k context
    "gpt-5.1-chat": 128000,           # Chat variant has 128k context
    "gpt-5": 400000,                  # GPT-5.x base, mini, codex variants (400k)
    "gpt-4.1": 1047576,
@@ -152,7 +152,13 @@ MEMORY_GUIDANCE = (
    "Do NOT save task progress, session outcomes, completed-work logs, or temporary TODO "
    "state to memory; use session_search to recall those from past transcripts. "
    "If you've discovered a new way to do something, solved a problem that could be "
-    "necessary later, save it as a skill with the skill tool."
+    "necessary later, save it as a skill with the skill tool.\n"
+    "Write memories as declarative facts, not instructions to yourself. "
+    "'User prefers concise responses' ✓ — 'Always respond concisely' ✗. "
+    "'Project uses pytest with xdist' ✓ — 'Run tests with pytest -n 4' ✗. "
+    "Imperative phrasing gets re-read as a directive in later sessions and can "
+    "cause repeated work or override the user's current request. Procedures and "
+    "workflows belong in skills, not memory."
 )

 SESSION_SEARCH_GUIDANCE = (
@@ -613,12 +619,14 @@ def build_skills_system_prompt(
        or get_session_env("HERMES_SESSION_PLATFORM")
        or ""
    )
+    disabled = get_disabled_skill_names()
    cache_key = (
        str(skills_dir.resolve()),
        tuple(str(d) for d in external_dirs),
        tuple(sorted(str(t) for t in (available_tools or set()))),
        tuple(sorted(str(ts) for ts in (available_toolsets or set()))),
        _platform_hint,
+        tuple(sorted(disabled)),
    )
    with _SKILLS_PROMPT_CACHE_LOCK:
        cached = _SKILLS_PROMPT_CACHE.get(cache_key)
@@ -626,8 +634,6 @@ def build_skills_system_prompt(
            _SKILLS_PROMPT_CACHE.move_to_end(cache_key)
            return cached

-    disabled = get_disabled_skill_names()
-
    # ── Layer 2: disk snapshot ────────────────────────────────────────
    snapshot = _load_skills_snapshot(skills_dir)

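This hunk moves `get_disabled_skill_names()` ahead of the cache lookup and adds it to `cache_key`. As a result, toggling a skill now yields a new key instead of serving a stale prompt. The sketch below shows that keying with an LRU `OrderedDict`. It assumes a hypothetical `render` callback in place of the real snapshot and formatting layers, and it omits the lock the real code holds.

```python
from collections import OrderedDict
from typing import Callable, FrozenSet, Tuple

CacheKey = Tuple[str, Tuple[str, ...]]
_CACHE: "OrderedDict[CacheKey, str]" = OrderedDict()
_MAX_ENTRIES = 8


def cached_skills_prompt(
    skills_dir: str,
    disabled: FrozenSet[str],
    render: Callable[[FrozenSet[str]], str],
) -> str:
    # The disabled set is part of the key, so enabling or disabling a skill
    # misses the cache and forces a fresh render.
    key: CacheKey = (skills_dir, tuple(sorted(disabled)))
    cached = _CACHE.get(key)
    if cached is not None:
        _CACHE.move_to_end(key)  # mark as most recently used
        return cached
    prompt = render(disabled)
    _CACHE[key] = prompt
    if len(_CACHE) > _MAX_ENTRIES:
        _CACHE.popitem(last=False)  # evict the least recently used entry
    return prompt
```

Before this change, `disabled` was read only after a cache miss, so a hit could return a prompt that still listed a freshly disabled skill.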
@@ -1,195 +0,0 @@
-"""Helpers for optional cheap-vs-strong model routing."""
-
-from __future__ import annotations
-
-import os
-import re
-from typing import Any, Dict, Optional
-
-from utils import is_truthy_value
-
-_COMPLEX_KEYWORDS = {
-    "debug",
-    "debugging",
-    "implement",
-    "implementation",
-    "refactor",
-    "patch",
-    "traceback",
-    "stacktrace",
-    "exception",
-    "error",
-    "analyze",
-    "analysis",
-    "investigate",
-    "architecture",
-    "design",
-    "compare",
-    "benchmark",
-    "optimize",
-    "optimise",
-    "review",
-    "terminal",
-    "shell",
-    "tool",
-    "tools",
-    "pytest",
-    "test",
-    "tests",
-    "plan",
-    "planning",
-    "delegate",
-    "subagent",
-    "cron",
-    "docker",
-    "kubernetes",
-}
-
-_URL_RE = re.compile(r"https?://|www\.", re.IGNORECASE)
-
-
-def _coerce_bool(value: Any, default: bool = False) -> bool:
-    return is_truthy_value(value, default=default)
-
-
-def _coerce_int(value: Any, default: int) -> int:
-    try:
-        return int(value)
-    except (TypeError, ValueError):
-        return default
-
-
-def choose_cheap_model_route(user_message: str, routing_config: Optional[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
-    """Return the configured cheap-model route when a message looks simple.
-
-    Conservative by design: if the message has signs of code/tool/debugging/
-    long-form work, keep the primary model.
-    """
-    cfg = routing_config or {}
-    if not _coerce_bool(cfg.get("enabled"), False):
-        return None
-
-    cheap_model = cfg.get("cheap_model") or {}
-    if not isinstance(cheap_model, dict):
-        return None
-    provider = str(cheap_model.get("provider") or "").strip().lower()
-    model = str(cheap_model.get("model") or "").strip()
-    if not provider or not model:
-        return None
-
-    text = (user_message or "").strip()
-    if not text:
-        return None
-
-    max_chars = _coerce_int(cfg.get("max_simple_chars"), 160)
-    max_words = _coerce_int(cfg.get("max_simple_words"), 28)
-
-    if len(text) > max_chars:
-        return None
-    if len(text.split()) > max_words:
-        return None
-    if text.count("\n") > 1:
-        return None
-    if "```" in text or "`" in text:
-        return None
-    if _URL_RE.search(text):
-        return None
-
-    lowered = text.lower()
-    words = {token.strip(".,:;!?()[]{}\"'`") for token in lowered.split()}
-    if words & _COMPLEX_KEYWORDS:
-        return None
-
-    route = dict(cheap_model)
-    route["provider"] = provider
-    route["model"] = model
-    route["routing_reason"] = "simple_turn"
-    return route
-
-
-def resolve_turn_route(user_message: str, routing_config: Optional[Dict[str, Any]], primary: Dict[str, Any]) -> Dict[str, Any]:
-    """Resolve the effective model/runtime for one turn.
-
-    Returns a dict with model/runtime/signature/label fields.
-    """
-    route = choose_cheap_model_route(user_message, routing_config)
-    if not route:
-        return {
-            "model": primary.get("model"),
-            "runtime": {
-                "api_key": primary.get("api_key"),
-                "base_url": primary.get("base_url"),
-                "provider": primary.get("provider"),
-                "api_mode": primary.get("api_mode"),
-                "command": primary.get("command"),
-                "args": list(primary.get("args") or []),
-                "credential_pool": primary.get("credential_pool"),
-            },
-            "label": None,
-            "signature": (
-                primary.get("model"),
-                primary.get("provider"),
-                primary.get("base_url"),
-                primary.get("api_mode"),
-                primary.get("command"),
-                tuple(primary.get("args") or ()),
-            ),
-        }
-
-    from hermes_cli.runtime_provider import resolve_runtime_provider
-
-    explicit_api_key = None
-    api_key_env = str(route.get("api_key_env") or "").strip()
-    if api_key_env:
-        explicit_api_key = os.getenv(api_key_env) or None
-
-    try:
-        runtime = resolve_runtime_provider(
-            requested=route.get("provider"),
-            explicit_api_key=explicit_api_key,
-            explicit_base_url=route.get("base_url"),
-        )
-    except Exception:
-        return {
-            "model": primary.get("model"),
-            "runtime": {
-                "api_key": primary.get("api_key"),
-                "base_url": primary.get("base_url"),
-                "provider": primary.get("provider"),
-                "api_mode": primary.get("api_mode"),
-                "command": primary.get("command"),
-                "args": list(primary.get("args") or []),
-                "credential_pool": primary.get("credential_pool"),
-            },
-            "label": None,
-            "signature": (
-                primary.get("model"),
-                primary.get("provider"),
-                primary.get("base_url"),
-                primary.get("api_mode"),
-                primary.get("command"),
-                tuple(primary.get("args") or ()),
-            ),
-        }
-
-    return {
-        "model": route.get("model"),
-        "runtime": {
-            "api_key": runtime.get("api_key"),
-            "base_url": runtime.get("base_url"),
-            "provider": runtime.get("provider"),
-            "api_mode": runtime.get("api_mode"),
-            "command": runtime.get("command"),
-            "args": list(runtime.get("args") or []),
-            "credential_pool": runtime.get("credential_pool"),
-        },
-        "label": f"smart route → {route.get('model')} ({runtime.get('provider')})",
-        "signature": (
-            route.get("model"),
-            runtime.get("provider"),
-            runtime.get("base_url"),
-            runtime.get("api_mode"),
-            runtime.get("command"),
-            tuple(runtime.get("args") or ()),
-        ),
-    }
@@ -0,0 +1 @@
+"""Transport layer types for provider response normalization."""
@@ -0,0 +1,100 @@
+"""Shared types for normalized provider responses.
+
+These dataclasses define the canonical shape that all provider adapters
+normalize responses to.  The shared surface is intentionally minimal —
+only fields that every downstream consumer reads are top-level.
+Protocol-specific state goes in ``provider_data`` dicts (response-level
+and per-tool-call) so that protocol-aware code paths can access it
+without polluting the shared type.
+"""
+
+from __future__ import annotations
+
+import json
+from dataclasses import dataclass, field
+from typing import Any, Dict, List, Optional
+
+
+@dataclass
+class ToolCall:
+    """A normalized tool call from any provider.
+
+    ``id`` is the protocol's canonical identifier — what gets used in
+    ``tool_call_id`` / ``tool_use_id`` when constructing tool result
+    messages.  May be ``None`` when the provider omits it; the agent
+    fills it via ``_deterministic_call_id()`` before storing in history.
+
+    ``provider_data`` carries per-tool-call protocol metadata that only
+    protocol-aware code reads:
+
+    * Codex: ``{"call_id": "call_XXX", "response_item_id": "fc_XXX"}``
+    * Gemini: ``{"extra_content": {"google": {"thought_signature": "..."}}}``
+    * Others: ``None``
+    """
+
+    id: Optional[str]
+    name: str
+    arguments: str  # JSON string
+    provider_data: Optional[Dict[str, Any]] = field(default=None, repr=False)
+
+
+@dataclass
+class Usage:
+    """Token usage from an API response."""
+
+    prompt_tokens: int = 0
+    completion_tokens: int = 0
+    total_tokens: int = 0
+    cached_tokens: int = 0
+
+
+@dataclass
+class NormalizedResponse:
+    """Normalized API response from any provider.
+
+    Shared fields are truly cross-provider — every caller can rely on
+    them without branching on api_mode.  Protocol-specific state goes in
+    ``provider_data`` so that only protocol-aware code paths read it.
+
+    Response-level ``provider_data`` examples:
+
+    * Anthropic: ``{"reasoning_details": [...]}``
+    * Codex: ``{"codex_reasoning_items": [...]}``
+    * Others: ``None``
+    """
+
+    content: Optional[str]
+    tool_calls: Optional[List[ToolCall]]
+    finish_reason: str  # "stop", "tool_calls", "length", "content_filter"
+    reasoning: Optional[str] = None
+    usage: Optional[Usage] = None
+    provider_data: Optional[Dict[str, Any]] = field(default=None, repr=False)
+
+
+# ---------------------------------------------------------------------------
+# Factory helpers
+# ---------------------------------------------------------------------------
+
+def build_tool_call(
+    id: Optional[str],
+    name: str,
+    arguments: Any,
+    **provider_fields: Any,
+) -> ToolCall:
+    """Build a ``ToolCall``, auto-serialising *arguments* if it's a dict.
+
+    Any extra keyword arguments are collected into ``provider_data``.
+    """
+    args_str = json.dumps(arguments) if isinstance(arguments, dict) else str(arguments)
+    pd = dict(provider_fields) if provider_fields else None
+    return ToolCall(id=id, name=name, arguments=args_str, provider_data=pd)
+
+
+def map_finish_reason(reason: Optional[str], mapping: Dict[str, str]) -> str:
+    """Translate a provider-specific stop reason to the normalised set.
+
+    Falls back to ``"stop"`` for unknown or ``None`` reasons.
+    """
+    if reason is None:
+        return "stop"
+    return mapping.get(reason, "stop")
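A minimal usage sketch for the two factory helpers. It assumes an Anthropic-style `tool_use` block whose `input` arrives as a dict. The helpers are copied here in trimmed form so the snippet runs on its own. `ANTHROPIC_FINISH_REASONS` is a hypothetical table: the mapping the adapter actually passes to `map_finish_reason` is not part of this file.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


# Trimmed copies of the helpers in agent/transports/types.py so the sketch
# runs standalone.
@dataclass
class ToolCall:
    id: Optional[str]
    name: str
    arguments: str  # JSON string
    provider_data: Optional[Dict[str, Any]] = field(default=None, repr=False)


def build_tool_call(id: Optional[str], name: str, arguments: Any, **provider_fields: Any) -> ToolCall:
    args_str = json.dumps(arguments) if isinstance(arguments, dict) else str(arguments)
    pd = dict(provider_fields) if provider_fields else None
    return ToolCall(id=id, name=name, arguments=args_str, provider_data=pd)


def map_finish_reason(reason: Optional[str], mapping: Dict[str, str]) -> str:
    if reason is None:
        return "stop"
    return mapping.get(reason, "stop")


# Hypothetical mapping from Anthropic stop_reason values to the normalized set.
ANTHROPIC_FINISH_REASONS = {
    "end_turn": "stop",
    "stop_sequence": "stop",
    "tool_use": "tool_calls",
    "max_tokens": "length",
    "refusal": "content_filter",
}

# A tool_use block shaped like one from a Messages API response.
block = {"type": "tool_use", "id": "toolu_01", "name": "read_file", "input": {"path": "README.md"}}
call = build_tool_call(block["id"], block["name"], block["input"])
print(call.arguments)  # {"path": "README.md"}

# Codex-style call: arguments are already a JSON string; extras go to provider_data.
codex = build_tool_call(None, "shell", '{"cmd": "ls"}', call_id="call_abc", response_item_id="fc_abc")
print(codex.provider_data)  # {'call_id': 'call_abc', 'response_item_id': 'fc_abc'}

print(map_finish_reason("tool_use", ANTHROPIC_FINISH_REASONS))  # tool_calls
print(map_finish_reason("pause_turn", ANTHROPIC_FINISH_REASONS))  # stop (unmapped reason falls back)
```

Dict arguments are serialized with `json.dumps`, while string arguments pass through unchanged. That is why a Codex-style call can hand over its raw JSON string, with its protocol fields landing in `provider_data`.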
@@ -444,6 +444,7 @@ def _process_batch_worker(args: Tuple) -> Dict[str, Any]:
            if not reasoning.get("has_any_reasoning", True):
                print(f"   🚫 Prompt {prompt_index} discarded (no reasoning in any turn)")
                discarded_no_reasoning += 1
+                completed_in_batch.append(prompt_index)
                continue
            
            # Get and normalize tool stats for consistent schema across all entries
@@ -63,7 +63,38 @@ model:
  #   Leave unset to use the model's native output ceiling (recommended).
  #   Set only if you want to deliberately limit individual response length.
  #
-  # max_tokens: 8192
+# max_tokens: 8192
+
+# Named provider overrides (optional)
+# Use this for per-provider request timeouts, non-stream stale timeouts,
+# and per-model exceptions.
+# Applies to the primary turn client on the OpenAI-wire, native Anthropic,
+# and Anthropic-compatible api_modes, the fallback chain, and client
+# rebuilds during credential rotation.  For OpenAI-wire chat
+# completions (streaming and non-streaming) the configured value is also
+# used as the per-request ``timeout=`` kwarg so it wins over the legacy
+# HERMES_API_TIMEOUT env var (which still applies when no config is set).
+# ``stale_timeout_seconds`` controls the non-streaming stale-call detector and
+# wins over the legacy HERMES_API_CALL_STALE_TIMEOUT env var. Leaving these
+# unset keeps the legacy defaults (HERMES_API_TIMEOUT=1800s,
+# HERMES_API_CALL_STALE_TIMEOUT=300s, native Anthropic 900s).
+#
+# Not currently wired for AWS Bedrock (bedrock_converse + AnthropicBedrock
+# SDK paths) — those use boto3 with its own timeout configuration.
+#
+# providers:
+#   ollama-local:
+#     request_timeout_seconds: 300   # Longer timeout for local cold-starts
+#     stale_timeout_seconds: 900     # Explicitly re-enable stale detection on local endpoints
+#   anthropic:
+#     request_timeout_seconds: 30    # Fast-fail cloud requests
+#     models:
+#       claude-opus-4.6:
+#         timeout_seconds: 600       # Longer timeout for extended-thinking Opus calls
+#   openai-codex:
+#     models:
+#       gpt-5.4:
+#         stale_timeout_seconds: 1800  # Longer non-stream stale timeout for slow large-context turns
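One way to read the precedence described in the comment above, written as a hypothetical resolver. The name `resolve_request_timeout` and its exact fallback order are assumptions for illustration: per-model `timeout_seconds`, then provider `request_timeout_seconds`, then `HERMES_API_TIMEOUT`, then the 1800s default. The native Anthropic 900s default is left out for brevity.

```python
from typing import Any, Dict, Mapping

LEGACY_DEFAULT_TIMEOUT = 1800.0  # HERMES_API_TIMEOUT default quoted in the comment above


def resolve_request_timeout(
    providers: Dict[str, Any],
    provider: str,
    model: str,
    env: Mapping[str, str],
) -> float:
    """Hypothetical lookup: per-model > per-provider > env var > default."""
    provider_cfg = providers.get(provider) or {}
    model_cfg = (provider_cfg.get("models") or {}).get(model) or {}
    if model_cfg.get("timeout_seconds") is not None:
        return float(model_cfg["timeout_seconds"])
    if provider_cfg.get("request_timeout_seconds") is not None:
        return float(provider_cfg["request_timeout_seconds"])
    if env.get("HERMES_API_TIMEOUT"):
        return float(env["HERMES_API_TIMEOUT"])
    return LEGACY_DEFAULT_TIMEOUT


# The commented-out YAML example above, as the dict it would parse into.
providers = {
    "ollama-local": {"request_timeout_seconds": 300, "stale_timeout_seconds": 900},
    "anthropic": {
        "request_timeout_seconds": 30,
        "models": {"claude-opus-4.6": {"timeout_seconds": 600}},
    },
}

print(resolve_request_timeout(providers, "anthropic", "claude-opus-4.6", env={}))  # 600.0
print(resolve_request_timeout(providers, "anthropic", "claude-haiku", env={}))  # 30.0
print(resolve_request_timeout(providers, "openrouter", "any", env={"HERMES_API_TIMEOUT": "120"}))  # 120.0
```

Passing `env` explicitly, rather than reading `os.environ` inside the function, keeps the sketch deterministic. A real caller would pass `os.environ`.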

 # =============================================================================
 # OpenRouter Provider Routing (only applies when using OpenRouter)
@@ -91,20 +122,6 @@ model:
 #   # Data policy: "allow" (default) or "deny" to exclude providers that may store data
 #   # data_collection: "deny"

-# =============================================================================
-# Smart Model Routing (optional)
-# =============================================================================
-# Use a cheaper model for short/simple turns while keeping your main model for
-# more complex requests. Disabled by default.
-#
-# smart_model_routing:
-#   enabled: true
-#   max_simple_chars: 160
-#   max_simple_words: 28
-#   cheap_model:
-#     provider: openrouter
-#     model: google/gemini-2.5-flash
-
 # =============================================================================
 # Git Worktree Isolation
 # =============================================================================
@@ -357,6 +374,18 @@ compression:
 #   web_extract:
 #     provider: "auto"
 #     model: ""
+#
+#   # Session search — summarizes matching past sessions
+#   session_search:
+#     provider: "auto"
+#     model: ""
+#     timeout: 30
+#     max_concurrency: 3    # Limit parallel summaries to reduce request-burst 429s
+#     extra_body: {}        # Provider-specific OpenAI-compatible request fields
+#                           # Example for providers that support request-body
+#                           # reasoning controls:
+#                           # extra_body:
+#                           #   enable_thinking: false

 # =============================================================================
 # Persistent Memory
@@ -310,12 +310,6 @@ def load_cli_config() -> Dict[str, Any]:
            "enabled": True,      # Auto-compress when approaching context limit
            "threshold": 0.50,    # Compress at 50% of model's context limit
        },
-        "smart_model_routing": {
-            "enabled": False,
-            "max_simple_chars": 160,
-            "max_simple_words": 28,
-            "cheap_model": {},
-        },
        "agent": {
            "max_turns": 90,  # Default max tool-calling iterations (shared with subagents)
            "verbose": False,
@@ -1147,6 +1141,43 @@ def _rich_text_from_ansi(text: str) -> _RichText:
    return _RichText.from_ansi(text or "")


+def _strip_markdown_syntax(text: str) -> str:
+    """Best-effort markdown marker removal for plain-text display."""
+    import re
+
+    plain = _rich_text_from_ansi(text or "").plain
+    plain = re.sub(r"^\s{0,3}(?:[-*_]\s*){3,}$", "", plain, flags=re.MULTILINE)
+    plain = re.sub(r"^\s{0,3}#{1,6}\s+", "", plain, flags=re.MULTILINE)
+    # Preserve blockquotes, lists, and checkboxes because they carry structure.
+    plain = re.sub(r"(```+|~~~+)", "", plain)
+    plain = re.sub(r"`([^`]*)`", r"\1", plain)
+    plain = re.sub(r"!\[([^\]]*)\]\([^\)]*\)", r"\1", plain)
+    plain = re.sub(r"\[([^\]]+)\]\([^\)]*\)", r"\1", plain)
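+    # Emphasis must not span lines, and underscore emphasis must not be
+    # intraword (CommonMark), so snake_case identifiers survive stripping.
+    plain = re.sub(r"\*\*\*([^*\n]+)\*\*\*", r"\1", plain)
+    plain = re.sub(r"(?<!\w)___([^_\n]+)___(?!\w)", r"\1", plain)
+    plain = re.sub(r"\*\*([^*\n]+)\*\*", r"\1", plain)
+    plain = re.sub(r"(?<!\w)__([^_\n]+)__(?!\w)", r"\1", plain)
+    plain = re.sub(r"\*([^*\n]+)\*", r"\1", plain)
+    plain = re.sub(r"(?<!\w)_([^_\n]+)_(?!\w)", r"\1", plain)
+    plain = re.sub(r"~~([^~\n]+)~~", r"\1", plain)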
+    plain = re.sub(r"\n{3,}", "\n\n", plain)
+    return plain.strip("\n")
+
+
+def _render_final_assistant_content(text: str, mode: str = "render"):
+    """Render final assistant content as markdown, stripped text, or raw text."""
+    from rich.markdown import Markdown
+
+    normalized_mode = str(mode or "render").strip().lower()
+    if normalized_mode == "strip":
+        return _RichText(_strip_markdown_syntax(text))
+    if normalized_mode == "raw":
+        return _rich_text_from_ansi(text or "")
+
+    plain = _rich_text_from_ansi(text or "").plain
+    return Markdown(plain)
+
+
 def _cprint(text: str):
    """Print ANSI-colored text through prompt_toolkit's native renderer.

@@ -1724,10 +1755,30 @@ class HermesCLI:
        
        # streaming: stream tokens to the terminal as they arrive (display.streaming in config.yaml)
        self.streaming_enabled = CLI_CONFIG["display"].get("streaming", False)
+        self.final_response_markdown = str(
+            CLI_CONFIG["display"].get("final_response_markdown", "strip")
+        ).strip().lower() or "strip"
+        if self.final_response_markdown not in {"render", "strip", "raw"}:
+            self.final_response_markdown = "strip"

        # Inline diff previews for write actions (display.inline_diffs in config.yaml)
        self._inline_diffs_enabled = CLI_CONFIG["display"].get("inline_diffs", True)

+        # Submitted multiline user-message preview (display.user_message_preview in config.yaml)
+        _ump = CLI_CONFIG["display"].get("user_message_preview", {})
+        if not isinstance(_ump, dict):
+            _ump = {}
+        try:
+            _ump_first_lines = int(_ump.get("first_lines", 2))
+        except (TypeError, ValueError):
+            _ump_first_lines = 2
+        try:
+            _ump_last_lines = int(_ump.get("last_lines", 2))
+        except (TypeError, ValueError):
+            _ump_last_lines = 2
+        self.user_message_preview_first_lines = max(1, _ump_first_lines)
+        self.user_message_preview_last_lines = max(0, _ump_last_lines)
+
        # Streaming display state
        self._stream_buf = ""        # Partial line buffer for line-buffered rendering
        self._stream_started = False  # True once first delta arrives
@@ -1857,8 +1908,9 @@ class HermesCLI:
            fb = [fb] if fb.get("provider") and fb.get("model") else []
        self._fallback_model = fb

-        # Optional cheap-vs-strong routing for simple turns
-        self._smart_model_routing = CLI_CONFIG.get("smart_model_routing", {}) or {}
+        # Signature of the currently-initialised agent's runtime.  Used to
+        # rebuild the agent when provider / model / base_url changes across
+        # turns (e.g. after /model or credential rotation).
        self._active_agent_route_signature = None

        # Agent will be initialized on first use
@@ -1869,6 +1921,10 @@ class HermesCLI:
        self.conversation_history: List[Dict[str, Any]] = []
        self.session_start = datetime.now()
        self._resumed = False
+        # Per-prompt elapsed timer — started at the beginning of each chat turn,
+        # frozen when the agent thread completes, displayed in the status bar.
+        self._prompt_start_time: Optional[float] = None  # time.time() when turn started
+        self._prompt_duration: float = 0.0  # frozen duration of last completed turn
        # Initialize SQLite session store early so /title works before first message
        self._session_db = None
        try:
@@ -1967,6 +2023,44 @@ class HermesCLI:
        filled = round((safe_percent / 100) * width)
        return f"[{('█' * filled) + ('░' * max(0, width - filled))}]"

+    @staticmethod
+    def _format_prompt_elapsed(prompt_start_time: Optional[float], prompt_duration: float, live: bool = False) -> str:
+        """Format per-prompt elapsed time for the status bar.
+
+        Always returns a string — shows 0s on fresh start before first turn.
+        Keeps seconds visible below one day so it increments smoothly:
+            59s → 1m → 1m 1s → ... → 1m 59s → 2m → 2m 1s → ...
+            59m 59s → 1h 0m → 1h 0m 1s → ...
+            23h 59m 59s → 1d 0h 0m → 1d 0h 1m → ...
+
+        Emoji prefix: ⏱ when turn is live, ⏲ when frozen or fresh start.
+        Uses width-1 (no variation selector) glyphs so the status bar stays
+        aligned in monospace terminals.
+        """
+        if prompt_start_time is None and prompt_duration == 0.0:
+            return "⏲ 0s"
+        elapsed = time.time() - prompt_start_time if prompt_start_time is not None else prompt_duration
+        elapsed = max(0.0, elapsed)
+
+        days = int(elapsed // 86400)
+        remaining = elapsed % 86400
+        hours = int(remaining // 3600)
+        remaining = remaining % 3600
+        minutes = int(remaining // 60)
+        seconds = int(remaining % 60)
+
+        if days > 0:
+            time_str = f"{days}d {hours}h {minutes}m"
+        elif hours > 0:
+            time_str = f"{hours}h {minutes}m {seconds}s" if seconds else f"{hours}h {minutes}m"
+        elif minutes > 0:
+            time_str = f"{minutes}m {seconds}s" if seconds else f"{minutes}m"
+        else:
+            time_str = f"{int(elapsed)}s"
+
+        emoji = "⏱" if live else "⏲"
+        return f"{emoji} {time_str}"
+
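The formatting branches in `_format_prompt_elapsed` are easiest to check in isolation. This sketch lifts the arithmetic into `format_elapsed(seconds)`, a hypothetical pure helper that takes the elapsed seconds directly instead of reading `time.time()`. That makes the boundary cases easy to see.

```python
def format_elapsed(elapsed: float) -> str:
    """Hypothetical pure version of the time_str logic in _format_prompt_elapsed."""
    elapsed = max(0.0, elapsed)
    days = int(elapsed // 86400)
    remaining = elapsed % 86400
    hours = int(remaining // 3600)
    remaining %= 3600
    minutes = int(remaining // 60)
    seconds = int(remaining % 60)

    if days > 0:
        # Seconds are dropped once the timer passes a day.
        return f"{days}d {hours}h {minutes}m"
    if hours > 0:
        return f"{hours}h {minutes}m {seconds}s" if seconds else f"{hours}h {minutes}m"
    if minutes > 0:
        return f"{minutes}m {seconds}s" if seconds else f"{minutes}m"
    return f"{int(elapsed)}s"


for value in (59, 60, 61, 3600, 3601, 86400, 86460):
    print(value, "->", format_elapsed(value))
```

Whole hours render as `1h 0m` and whole days as `1d 0h 0m`. Seconds disappear from the output once the timer passes a day.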
    def _get_status_bar_snapshot(self) -> Dict[str, Any]:
        # Prefer the agent's model name — it updates on fallback.
        # self.model reflects the originally configured model and never
@@ -1985,6 +2079,11 @@ class HermesCLI:
            "model_name": model_name,
            "model_short": model_short,
            "duration": format_duration_compact(elapsed_seconds),
+            "prompt_elapsed": self._format_prompt_elapsed(
+                getattr(self, "_prompt_start_time", None),
+                getattr(self, "_prompt_duration", 0.0),
+                live=getattr(self, "_prompt_start_time", None) is not None,
+            ),
            "context_tokens": 0,
            "context_length": None,
            "context_percent": None,
@@ -2176,6 +2275,9 @@ class HermesCLI:

            parts = [f"⚕ {snapshot['model_short']}", context_label, percent_label]
            parts.append(duration_label)
+            prompt_elapsed = snapshot.get("prompt_elapsed")
+            if prompt_elapsed:
+                parts.append(prompt_elapsed)
            return self._trim_status_bar_text(" │ ".join(parts), width)
        except Exception:
            return f"⚕ {self.model if getattr(self, 'model', None) else 'Hermes'}"
@@ -2234,8 +2336,13 @@ class HermesCLI:
                        (bar_style, percent_label),
                        ("class:status-bar-dim", " │ "),
                        ("class:status-bar-dim", duration_label),
-                        ("class:status-bar", " "),
                    ]
+                    # Position 7: per-prompt elapsed timer (live or frozen)
+                    prompt_elapsed = snapshot.get("prompt_elapsed")
+                    if prompt_elapsed:
+                        frags.append(("class:status-bar-dim", " │ "))
+                        frags.append(("class:status-bar-dim", prompt_elapsed))
+                    frags.append(("class:status-bar", " "))

            total_width = sum(self._status_bar_display_width(text) for _, text in frags)
            if total_width > width:
@@ -2454,6 +2561,61 @@ class HermesCLI:
        if flush_text:
            self._emit_reasoning_preview(flush_text)

+    def _format_submitted_user_message_preview(self, user_input: str) -> str:
+        """Format the submitted user-message scrollback preview."""
+        lines = user_input.split("\n")
+        if len(lines) <= 1:
+            return f"[bold {_accent_hex()}]●[/] [bold]{_escape(user_input)}[/]"
+
+        first_lines = int(getattr(self, "user_message_preview_first_lines", 2))
+        last_lines = int(getattr(self, "user_message_preview_last_lines", 2))
+        first_lines = max(1, first_lines)
+        last_lines = max(0, last_lines)
+        head = lines[:first_lines]
+        remaining_after_head = max(0, len(lines) - len(head))
+        tail_count = min(last_lines, remaining_after_head)
+        tail = lines[-tail_count:] if tail_count else []
+
+        hidden_middle_count = len(lines) - len(head) - len(tail)
+        if hidden_middle_count < 0:
+            hidden_middle_count = 0
+            tail = []
+
+        preview_lines = [
+            f"[bold {_accent_hex()}]●[/] [bold]{_escape(head[0])}[/]"
+        ]
+        preview_lines.extend(f"[bold]{_escape(line)}[/]" for line in head[1:])
+
+        if hidden_middle_count > 0:
+            noun = "line" if hidden_middle_count == 1 else "lines"
+            preview_lines.append(f"[dim]... (+{hidden_middle_count} more {noun})[/]")
+
+        preview_lines.extend(f"[bold]{_escape(line)}[/]" for line in tail)
+        return "\n".join(preview_lines)
+
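The head/tail windowing in the preview reduces to list slicing. Here is a minimal sketch with the Rich markup stripped out. `preview_lines` is a hypothetical standalone helper, and its summary marker copies the text used in `_format_submitted_user_message_preview`.

```python
from typing import List


def preview_lines(text: str, first_lines: int = 2, last_lines: int = 2) -> List[str]:
    """Keep the first/last lines of a multiline message and summarize the middle."""
    lines = text.split("\n")
    if len(lines) <= 1:
        return lines
    first_lines = max(1, first_lines)
    last_lines = max(0, last_lines)
    head = lines[:first_lines]
    # The tail can never overlap the head, so short messages print in full.
    tail_count = min(last_lines, len(lines) - len(head))
    tail = lines[-tail_count:] if tail_count else []
    hidden = len(lines) - len(head) - len(tail)
    out = list(head)
    if hidden > 0:
        noun = "line" if hidden == 1 else "lines"
        out.append(f"... (+{hidden} more {noun})")
    out.extend(tail)
    return out


print(preview_lines("a\nb\nc\nd\ne\nf"))
```

Because `tail_count` is capped by the lines left after the head, the hidden count is never negative.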
+    def _expand_paste_references(self, text: str | None) -> str:
+        """Expand [Pasted text #N: M lines → path] placeholders into file contents."""
+        if not isinstance(text, str) or "[Pasted text #" not in text:
+            return text or ""
+        import re as _re
+
+        paste_ref_re = _re.compile(r'\[Pasted text #\d+: \d+ lines \u2192 (.+?)\]')
+
+        def _expand_ref(match):
+            path = Path(match.group(1))
+            return path.read_text(encoding="utf-8") if path.exists() else match.group(0)
+
+        return paste_ref_re.sub(_expand_ref, text)
+
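The placeholder round trip can be exercised against a temporary file. This sketch reuses the regex from `_expand_paste_references`. The placeholder written into `draft` is an assumed example built to match that regex, since the code that emits placeholders is not part of this hunk.

```python
import re
import tempfile
from pathlib import Path

# Same pattern as _expand_paste_references: "[Pasted text #N: M lines → path]".
PASTE_REF_RE = re.compile(r'\[Pasted text #\d+: \d+ lines \u2192 (.+?)\]')


def expand_paste_references(text: str) -> str:
    def _expand_ref(match):
        path = Path(match.group(1))
        # Leave the placeholder untouched if the backing file has been removed.
        return path.read_text(encoding="utf-8") if path.exists() else match.group(0)

    return PASTE_REF_RE.sub(_expand_ref, text)


with tempfile.TemporaryDirectory() as tmp:
    paste = Path(tmp) / "paste_1.txt"
    paste.write_text("line one\nline two\n", encoding="utf-8")
    draft = f"Please review:\n[Pasted text #1: 2 lines \u2192 {paste}]"
    expanded = expand_paste_references(draft)
    print(expanded)

missing = "[Pasted text #2: 5 lines \u2192 /nonexistent/paste_2.txt]"
print(expand_paste_references(missing) == missing)
```

The lazy `(.+?)` stops at the first `]`, so a path that itself contains `]` would not round-trip through this pattern.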
+    def _print_user_message_preview(self, user_input: str) -> None:
+        """Render a user message using the normal chat scrollback style."""
+        ChatConsole().print(f"[{_accent_hex()}]{'─' * 40}[/]")
+        text = str(user_input or "")
+        if "\n" in text:
+            ChatConsole().print(self._format_submitted_user_message_preview(text))
+        else:
+            ChatConsole().print(f"[bold {_accent_hex()}]●[/] [bold]{_escape(text)}[/]")
+
    def _stream_reasoning_delta(self, text: str) -> None:
        """Stream reasoning/thinking tokens into a dim box above the response.

@@ -2697,6 +2859,8 @@ class HermesCLI:
        _tc = getattr(self, "_stream_text_ansi", "")
        while "\n" in self._stream_buf:
            line, self._stream_buf = self._stream_buf.split("\n", 1)
+            if self.final_response_markdown == "strip":
+                line = _strip_markdown_syntax(line)
            _cprint(f"{_STREAM_PAD}{_tc}{line}{_RST}" if _tc else f"{_STREAM_PAD}{line}")

    def _flush_stream(self) -> None:
@@ -2714,7 +2878,8 @@ class HermesCLI:

        if self._stream_buf:
            _tc = getattr(self, "_stream_text_ansi", "")
-            _cprint(f"{_STREAM_PAD}{_tc}{self._stream_buf}{_RST}" if _tc else f"{_STREAM_PAD}{self._stream_buf}")
+            line = _strip_markdown_syntax(self._stream_buf) if self.final_response_markdown == "strip" else self._stream_buf
+            _cprint(f"{_STREAM_PAD}{_tc}{line}{_RST}" if _tc else f"{_STREAM_PAD}{line}")
            self._stream_buf = ""

        # Close the response box
@@ -2776,6 +2941,39 @@ class HermesCLI:
            self._command_status = ""
            self._invalidate(min_interval=0.0)

+    def _open_external_editor(self, buffer=None) -> bool:
+        """Open the active input buffer in an external editor."""
+        app = getattr(self, "_app", None)
+        if not app:
+            _cprint(f"{_DIM}External editor is only available inside the interactive CLI.{_RST}")
+            return False
+        if self._command_running:
+            _cprint(f"{_DIM}Wait for the current command to finish before opening the editor.{_RST}")
+            return False
+        if self._sudo_state or self._secret_state or self._approval_state or self._clarify_state:
+            _cprint(f"{_DIM}Finish the active prompt before opening the editor.{_RST}")
+            return False
+        target_buffer = buffer or getattr(app, "current_buffer", None)
+        if target_buffer is None:
+            _cprint(f"{_DIM}No active input buffer is available for the external editor.{_RST}")
+            return False
+        try:
+            existing_text = getattr(target_buffer, "text", "")
+            expanded_text = self._expand_paste_references(existing_text)
+            if expanded_text != existing_text and hasattr(target_buffer, "text"):
+                self._skip_paste_collapse = True
+                target_buffer.text = expanded_text
+                if hasattr(target_buffer, "cursor_position"):
+                    target_buffer.cursor_position = len(expanded_text)
+            # Set skip flag (again) so the text-change event fired when the
+            # editor closes does not re-collapse the returned content.
+            self._skip_paste_collapse = True
+            target_buffer.open_in_editor(validate_and_handle=False)
+            return True
+        except Exception as exc:
+            _cprint(f"{_DIM}Failed to open external editor: {exc}{_RST}")
+            return False
+
    def _ensure_runtime_credentials(self) -> bool:
        """
        Ensure runtime credentials are resolved before agent use.
@@ -2883,24 +3081,36 @@ class HermesCLI:
        return True

    def _resolve_turn_agent_config(self, user_message: str) -> dict:
-        """Resolve model/runtime overrides for a single user turn."""
-        from agent.smart_model_routing import resolve_turn_route
+        """Build the effective model/runtime config for a single user turn.
+
+        Always uses the session's primary model/provider.  If the user has
+        toggled `/fast` on and the current model supports Priority
+        Processing / Anthropic fast mode, attach `request_overrides` so the
+        API call is marked accordingly.
+        """
        from hermes_cli.models import resolve_fast_mode_overrides

-        route = resolve_turn_route(
-            user_message,
-            self._smart_model_routing,
-            {
-                "model": self.model,
-                "api_key": self.api_key,
-                "base_url": self.base_url,
-                "provider": self.provider,
-                "api_mode": self.api_mode,
-                "command": self.acp_command,
-                "args": list(self.acp_args or []),
-                "credential_pool": getattr(self, "_credential_pool", None),
-            },
-        )
+        runtime = {
+            "api_key": self.api_key,
+            "base_url": self.base_url,
+            "provider": self.provider,
+            "api_mode": self.api_mode,
+            "command": self.acp_command,
+            "args": list(self.acp_args or []),
+            "credential_pool": getattr(self, "_credential_pool", None),
+        }
+        route = {
+            "model": self.model,
+            "runtime": runtime,
+            "signature": (
+                self.model,
+                runtime["provider"],
+                runtime["base_url"],
+                runtime["api_mode"],
+                runtime["command"],
+                tuple(runtime["args"]),
+            ),
+        }

        service_tier = getattr(self, "service_tier", None)
        if not service_tier:
@@ -2908,13 +3118,13 @@ class HermesCLI:
            return route

        try:
-            overrides = resolve_fast_mode_overrides(route.get("model"))
+            overrides = resolve_fast_mode_overrides(route["model"])
        except Exception:
            overrides = None
        route["request_overrides"] = overrides
        return route

-    def _init_agent(self, *, model_override: str = None, runtime_override: dict = None, route_label: str = None, request_overrides: dict | None = None) -> bool:
+    def _init_agent(self, *, model_override: str = None, runtime_override: dict = None, request_overrides: dict | None = None) -> bool:
        """
        Initialize the agent on first use.
        When resuming a session, restores conversation history from SQLite.
@@ -3941,6 +4151,7 @@ class HermesCLI:

        _cprint(f"\n  {_DIM}Tip: Just type your message to chat with Hermes!{_RST}")
        _cprint(f"  {_DIM}Multi-line: Alt+Enter for a new line{_RST}")
+        _cprint(f"  {_DIM}Draft editor: Ctrl+G{_RST}")
        if _is_termux_environment():
            _cprint(f"  {_DIM}Attach image: /image {_termux_example_image_path()} or start your prompt with a local image path{_RST}\n")
        else:
@@ -5287,7 +5498,7 @@ class HermesCLI:
            print("    /cron list")
            print('    /cron add "every 2h" "Check server status" [--skill blogwatcher]')
            print('    /cron edit <job_id> --schedule "every 4h" --prompt "New task"')
-            print("    /cron edit <job_id> --skill blogwatcher --skill find-nearby")
+            print("    /cron edit <job_id> --skill blogwatcher --skill maps")
            print("    /cron edit <job_id> --remove-skill blogwatcher")
            print("    /cron edit <job_id> --clear-skills")
            print("    /cron pause <job_id>")
@@ -6040,7 +6251,7 @@ class HermesCLI:

                    _chat_console = ChatConsole()
                    _chat_console.print(Panel(
-                        _rich_text_from_ansi(response),
+                        _render_final_assistant_content(response, mode=self.final_response_markdown),
                        title=f"[{_resp_color} bold]{label} (background #{task_num})[/]",
                        title_align="left",
                        border_style=_resp_color,
@@ -6165,7 +6376,7 @@ class HermesCLI:
                        _resp_color = "#4F6D4A"

                    ChatConsole().print(Panel(
-                        _rich_text_from_ansi(response),
+                        _render_final_assistant_content(response, mode=self.final_response_markdown),
                        title=f"[{_resp_color} bold]⚕ /btw[/]",
                        title_align="left",
                        border_style=_resp_color,
@@ -6657,6 +6868,18 @@ class HermesCLI:
                focus_topic=focus_topic or None,
            )
            self.conversation_history = compressed
+            # _compress_context ends the old session and creates a new child
+            # session on the agent (run_agent.py::_compress_context). Sync the
+            # CLI's session_id so /status, /resume, exit summary, and title
+            # generation all point at the live continuation session, not the
+            # ended parent. Without this, subsequent end_session() calls target
+            # the already-closed parent and the child is orphaned.
+            if (
+                getattr(self.agent, "session_id", None)
+                and self.agent.session_id != self.session_id
+            ):
+                self.session_id = self.agent.session_id
+                self._pending_title = None
            new_tokens = estimate_messages_tokens_rough(self.conversation_history)
            summary = summarize_manual_compression(
                original_history,
@@ -7911,7 +8134,6 @@ class HermesCLI:
        if not self._init_agent(
            model_override=turn_route["model"],
            runtime_override=turn_route["runtime"],
-            route_label=turn_route["label"],
            request_overrides=turn_route.get("request_overrides"),
        ):
            return None
@@ -8069,6 +8291,10 @@ class HermesCLI:
+            # Start the per-prompt elapsed timer; it is frozen after the
+            # agent thread finishes and reset on the next turn.
+            self._prompt_start_time = time.time()
+            self._prompt_duration = 0.0
            # Start agent in background thread (daemon so it cannot keep the
            # process alive when the user closes the terminal tab — SIGHUP
            # exits the main thread and daemon threads are reaped automatically).
            agent_thread = threading.Thread(target=run_agent, daemon=True)
            agent_thread.start()

@@ -8146,6 +8372,12 @@ class HermesCLI:
                # but guard against edge cases.
                agent_thread.join(timeout=30)

+            # Freeze per-prompt elapsed timer once the agent thread has
+            # exited (or been abandoned as a daemon after interrupt).
+            if self._prompt_start_time is not None:
+                self._prompt_duration = max(0.0, time.time() - self._prompt_start_time)
+                self._prompt_start_time = None
+
            # Proactively clean up async clients whose event loop is dead.
            # The agent thread may have created AsyncOpenAI clients bound
            # to a per-thread event loop; if that loop is now closed, those
@@ -8176,6 +8408,20 @@ class HermesCLI:
            # Update history with full conversation
            self.conversation_history = result.get("messages", self.conversation_history) if result else self.conversation_history

+            # If auto-compression fired mid-turn, the agent created a new
+            # continuation session and mutated self.agent.session_id. Sync
+            # the CLI's session_id so /status, /resume, title generation,
+            # and the exit summary all target the live child session rather
+            # than the ended parent. Mirrors the gateway's post-run sync
+            # (gateway/run.py around line 9983).
+            if (
+                self.agent
+                and getattr(self.agent, "session_id", None)
+                and self.agent.session_id != self.session_id
+            ):
+                self.session_id = self.agent.session_id
+                self._pending_title = None
+
            # Get the final response
            response = result.get("final_response", "") if result else ""
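The post-run sync comes down to one rule: when the agent reports a truthy `session_id` that differs from the CLI's copy, the CLI adopts it and drops `_pending_title`. Here is a minimal sketch of that rule as a pure function. It uses `SimpleNamespace` stand-ins for the CLI and agent objects, and the name `sync_session_id` is hypothetical.

```python
from types import SimpleNamespace
from typing import Any


def sync_session_id(cli: Any) -> bool:
    """Adopt the agent's session_id if auto-compression moved it to a child session.

    `cli` is any object with `agent`, `session_id` and `_pending_title`
    attributes (a hypothetical stand-in for HermesCLI). Returns True if updated.
    """
    agent = cli.agent
    new_id = getattr(agent, "session_id", None) if agent else None
    if new_id and new_id != cli.session_id:
        cli.session_id = new_id
        # Mirrors the diff: discard any title queued while the parent was live.
        cli._pending_title = None
        return True
    return False


# Example: compression replaced "parent" with "child" during the turn.
cli = SimpleNamespace(
    agent=SimpleNamespace(session_id="child"),
    session_id="parent",
    _pending_title="Draft title",
)
sync_session_id(cli)  # cli.session_id is now "child"; cli._pending_title is None
```

The single-query path in `main()` applies the same comparison but leaves `_pending_title` alone.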

@@ -8265,7 +8511,7 @@ class HermesCLI:
                else:
                    _chat_console = ChatConsole()
                    _chat_console.print(Panel(
-                        _rich_text_from_ansi(response),
+                        _render_final_assistant_content(response, mode=self.final_response_markdown),
                        title=f"[{_resp_color} bold]{label}[/]",
                        title_align="left",
                        border_style=_resp_color,
@@ -8831,6 +9077,16 @@ class HermesCLI:
            """Ctrl+Enter (c-j) inserts a newline. Most terminals send c-j for Ctrl+Enter."""
            event.current_buffer.insert_text('\n')

+        @kb.add(
+            'c-g',
+            filter=Condition(
+                lambda: not self._clarify_state and not self._approval_state and not self._sudo_state and not self._secret_state
+            ),
+        )
+        def handle_open_in_editor(event):
+            """Ctrl+G opens the current draft in an external editor."""
+            cli_ref._open_external_editor(event.current_buffer)
+
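The diff binds Ctrl+G to `_open_external_editor`, but that method's body is not part of this hunk. A common way to implement such a helper is to write the draft to a temporary file, run the user's editor on it, and read the file back. The sketch below assumes that approach. `edit_in_external_editor`, the `$VISUAL` then `$EDITOR` then `vi` fallback order, and the `.md` suffix are illustrative choices and are not taken from Hermes.

```python
import os
import shlex
import subprocess
import tempfile
from typing import Optional, Sequence


def edit_in_external_editor(text: str, editor_cmd: Optional[Sequence[str]] = None) -> str:
    """Round-trip `text` through an external editor and return the edited result.

    Hypothetical helper: when no command is given, uses $VISUAL, then $EDITOR, then "vi".
    """
    if editor_cmd is None:
        editor = os.environ.get("VISUAL") or os.environ.get("EDITOR") or "vi"
        editor_cmd = shlex.split(editor)  # allows values like "code --wait"
    # delete=False so the file survives closing and the editor can open it by path.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".md", delete=False, encoding="utf-8"
    ) as f:
        f.write(text)
        path = f.name
    try:
        # Blocks until the editor exits; a non-zero exit raises CalledProcessError.
        subprocess.run([*editor_cmd, path], check=True)
        with open(path, encoding="utf-8") as f:
            return f.read()
    finally:
        os.unlink(path)  # always remove the temp file, even if the editor failed
```

In a prompt_toolkit application the editor has to take over the terminal, so a call like this typically runs under `run_in_terminal`. prompt_toolkit's `Buffer` also provides `open_in_editor()`, which performs a similar temp-file round trip. A hand-rolled helper is mainly useful when the caller wants to control editor lookup or post-process the result.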
        @kb.add('tab', eager=True)
        def handle_tab(event):
            """Tab: accept completion, auto-suggestion, or start completions.
@@ -9282,6 +9538,7 @@ class HermesCLI:
        _prev_text_len = [0]
        _prev_newline_count = [0]
        _paste_just_collapsed = [False]
+        self._skip_paste_collapse = False

        def _on_text_changed(buf):
            """Detect large pastes and collapse them to a file reference.
@@ -9301,8 +9558,9 @@ class HermesCLI:
            text = buf.text
            chars_added = len(text) - _prev_text_len[0]
            _prev_text_len[0] = len(text)
-            if _paste_just_collapsed[0]:
+            if _paste_just_collapsed[0] or self._skip_paste_collapse:
                _paste_just_collapsed[0] = False
+                self._skip_paste_collapse = False
                _prev_newline_count[0] = text.count('\n')
                return
            line_count = text.count('\n')
@@ -9311,12 +9569,10 @@ class HermesCLI:
            is_paste = chars_added > 1 or newlines_added >= 4
            if line_count >= 5 and is_paste and not text.startswith('/'):
                _paste_counter[0] += 1
-                # Save to temp file
                paste_dir = _hermes_home / "pastes"
                paste_dir.mkdir(parents=True, exist_ok=True)
                paste_file = paste_dir / f"paste_{_paste_counter[0]}_{datetime.now().strftime('%H%M%S')}.txt"
                paste_file.write_text(text, encoding="utf-8")
-                # Replace buffer with compact reference
                _paste_just_collapsed[0] = True
                buf.text = f"[Pasted text #{_paste_counter[0]}: {line_count + 1} lines \u2192 {paste_file}]"
                buf.cursor_position = len(buf.text)
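The collapse heuristic in `_on_text_changed` treats a change as a paste if it adds more than one character or four or more newlines. It only collapses the buffer when the buffer holds at least five newlines and is not a slash command. The line that computes `newlines_added` falls outside the hunk, so the sketch below assumes it is the difference between the current and previous newline counts. `PasteState` and `should_collapse` are hypothetical names. `skip_next` stands in for both `_paste_just_collapsed` and `_skip_paste_collapse`, which the diff handles identically.

```python
from dataclasses import dataclass


@dataclass
class PasteState:
    prev_text_len: int = 0
    prev_newline_count: int = 0
    skip_next: bool = False  # one-shot bypass, like _paste_just_collapsed / _skip_paste_collapse


def should_collapse(state: PasteState, text: str) -> bool:
    """Return True if this buffer change looks like a large paste to collapse."""
    chars_added = len(text) - state.prev_text_len
    state.prev_text_len = len(text)
    if state.skip_next:
        # Consume the flag and resync the newline baseline without collapsing.
        state.skip_next = False
        state.prev_newline_count = text.count("\n")
        return False
    line_count = text.count("\n")
    # Assumption: the elided line diffs against the previous newline count.
    newlines_added = line_count - state.prev_newline_count
    state.prev_newline_count = line_count
    is_paste = chars_added > 1 or newlines_added >= 4
    # Five or more newlines (six or more lines), and not a slash command.
    return line_count >= 5 and is_paste and not text.startswith("/")
```

The new `_skip_paste_collapse` flag gives other code a one-shot way to replace the buffer without triggering a collapse. Which callers set it is not visible in this diff.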
@@ -10038,45 +10294,9 @@ class HermesCLI:
                    _paste_ref_re = _re.compile(r'\[Pasted text #\d+: \d+ lines \u2192 (.+?)\]')
                    paste_refs = list(_paste_ref_re.finditer(user_input)) if isinstance(user_input, str) else []
                    if paste_refs:
-                        def _expand_ref(m):
-                            p = Path(m.group(1))
-                            return p.read_text(encoding="utf-8") if p.exists() else m.group(0)
-                        expanded = _paste_ref_re.sub(_expand_ref, user_input)
-                        total_lines = expanded.count('\n') + 1
-                        n_pastes = len(paste_refs)
-                        _user_bar = f"[{_accent_hex()}]{'─' * 40}[/]"
-                        print()
-                        ChatConsole().print(_user_bar)
-                        # Show any surrounding user text alongside the paste summary
-                        split_parts = _paste_ref_re.split(user_input)
-                        visible_user_text = " ".join(
-                            split_parts[i].strip() for i in range(0, len(split_parts), 2) if split_parts[i].strip()
-                        )
-                        if visible_user_text:
-                            ChatConsole().print(
-                                f"[bold {_accent_hex()}]\u25cf[/] [bold]{_escape(visible_user_text)}[/] "
-                                f"[dim]({n_pastes} pasted block{'s' if n_pastes > 1 else ''}, {total_lines} lines total)[/]"
-                            )
-                        else:
-                            ChatConsole().print(
-                                f"[bold {_accent_hex()}]\u25cf[/] [bold]{_escape(f'[Pasted text: {total_lines} lines]')}[/]"
-                            )
-                        user_input = expanded
-                    else:
-                        _user_bar = f"[{_accent_hex()}]{'─' * 40}[/]"
-                        if '\n' in user_input:
-                            first_line = user_input.split('\n')[0]
-                            line_count = user_input.count('\n') + 1
-                            print()
-                            ChatConsole().print(_user_bar)
-                            ChatConsole().print(
-                                f"[bold {_accent_hex()}]●[/] [bold]{_escape(first_line)}[/] "
-                                f"[dim](+{line_count - 1} lines)[/]"
-                            )
-                        else:
-                            print()
-                            ChatConsole().print(_user_bar)
-                            ChatConsole().print(f"[bold {_accent_hex()}]●[/] [bold]{_escape(user_input)}[/]")
+                        user_input = self._expand_paste_references(user_input)
+                    print()
+                    self._print_user_message_preview(user_input)
                    
                    # Show image attachment count
                    if submit_images:
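The removed inline code shows what placeholder expansion did before it moved into `_expand_paste_references`, whose body is not part of this diff. Below is a sketch of an equivalent standalone function. It assumes the new method keeps the old behavior of leaving references to missing files untouched, and `expand_paste_references` is a hypothetical name.

```python
import re
from pathlib import Path

# Same pattern as _paste_ref_re in the diff; \u2192 matches the "→" arrow.
PASTE_REF_RE = re.compile(r"\[Pasted text #\d+: \d+ lines \u2192 (.+?)\]")


def expand_paste_references(user_input: str) -> str:
    """Replace each paste placeholder with the contents of the file it points to.

    Modeled on the removed inline _expand_ref: placeholders whose file no longer
    exists are returned unchanged.
    """
    def _expand(match: re.Match) -> str:
        path = Path(match.group(1))
        return path.read_text(encoding="utf-8") if path.exists() else match.group(0)

    # re.sub calls _expand once per match and splices in its return value.
    return PASTE_REF_RE.sub(_expand, user_input)
```

Because the lazy `(.+?)\]` group stops at the first `]`, a paste directory whose path contains `]` would not round-trip through this pattern.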
@@ -10535,7 +10755,6 @@ def main(
                if cli._init_agent(
                    model_override=turn_route["model"],
                    runtime_override=turn_route["runtime"],
-                    route_label=turn_route["label"],
                    request_overrides=turn_route.get("request_overrides"),
                ):
                    cli.agent.quiet_mode = True
@@ -10549,6 +10768,15 @@ def main(
                        user_message=effective_query,
                        conversation_history=cli.conversation_history,
                    )
+                    # Sync session_id if mid-run compression created a
+                    # continuation session. The exit line below reports
+                    # session_id to stderr for automation wrappers; without
+                    # this sync it would point at the ended parent.
+                    if (
+                        getattr(cli.agent, "session_id", None)
+                        and cli.agent.session_id != cli.session_id
+                    ):
+                        cli.session_id = cli.agent.session_id
                    response = result.get("final_response", "") if isinstance(result, dict) else str(result)
                    if response:
                        print(response)
@@ -826,7 +826,6 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:

        # Provider routing
        pr = _cfg.get("provider_routing", {})
-        smart_routing = _cfg.get("smart_model_routing", {}) or {}

        from hermes_cli.runtime_provider import (
            resolve_runtime_provider,
@@ -843,24 +842,9 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
            message = format_runtime_provider_error(exc)
            raise RuntimeError(message) from exc

-        from agent.smart_model_routing import resolve_turn_route
-        turn_route = resolve_turn_route(
-            prompt,
-            smart_routing,
-            {
-                "model": model,
-                "api_key": runtime.get("api_key"),
-                "base_url": runtime.get("base_url"),
-                "provider": runtime.get("provider"),
-                "api_mode": runtime.get("api_mode"),
-                "command": runtime.get("command"),
-                "args": list(runtime.get("args") or []),
-            },
-        )
-
        fallback_model = _cfg.get("fallback_providers") or _cfg.get("fallback_model") or None
        credential_pool = None
-        runtime_provider = str(turn_route["runtime"].get("provider") or "").strip().lower()
+        runtime_provider = str(runtime.get("provider") or "").strip().lower()
        if runtime_provider:
            try:
                from agent.credential_pool import load_pool
@@ -877,13 +861,13 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
                logger.debug("Job '%s': failed to load credential pool for %s: %s", job_id, runtime_provider, e)

        agent = AIAgent(
-            model=turn_route["model"],
-            api_key=turn_route["runtime"].get("api_key"),
-            base_url=turn_route["runtime"].get("base_url"),
-            provider=turn_route["runtime"].get("provider"),
-            api_mode=turn_route["runtime"].get("api_mode"),
-            acp_command=turn_route["runtime"].get("command"),
-            acp_args=turn_route["runtime"].get("args"),
+            model=model,
+            api_key=runtime.get("api_key"),
+            base_url=runtime.get("base_url"),
+            provider=runtime.get("provider"),
+            api_mode=runtime.get("api_mode"),
+            acp_command=runtime.get("command"),
+            acp_args=runtime.get("args"),
            max_iterations=max_iterations,
            reasoning_config=reasoning_config,
            prefill_messages=prefill_messages,
@@ -1,228 +0,0 @@
-# Hermes Agent — ACP (Agent Client Protocol) Setup Guide
-
-Hermes Agent supports the **Agent Client Protocol (ACP)**, allowing it to run as
-a coding agent inside your editor. ACP lets your IDE send tasks to Hermes, and
-Hermes responds with file edits, terminal commands, and explanations — all shown
-natively in the editor UI.
-
---
-
-## Prerequisites
-
- Hermes Agent installed and configured (`hermes setup` completed)
- An API key / provider set up in `~/.hermes/.env` or via `hermes login`
- Python 3.11+
-
-Install the ACP extra:
-
-```bash
-pip install -e ".[acp]"
-```
-
---
-
-## VS Code Setup
-
-### 1. Install the ACP Client extension
-
-Open VS Code and install **ACP Client** from the marketplace:
-
- Press `Ctrl+Shift+X` (or `Cmd+Shift+X` on macOS)
- Search for **"ACP Client"**
- Click **Install**
-
-Or install from the command line:
-
-```bash
-code --install-extension anysphere.acp-client
-```
-
-### 2. Configure settings.json
-
-Open your VS Code settings (`Ctrl+,` → click the `{}` icon for JSON) and add:
-
-```json
-{
-  "acpClient.agents": [
-    {
-      "name": "hermes-agent",
-      "registryDir": "/path/to/hermes-agent/acp_registry"
-    }
-  ]
-}
-```
-
-Replace `/path/to/hermes-agent` with the actual path to your Hermes Agent
-installation (e.g. `~/.hermes/hermes-agent`).
-
-Alternatively, if `hermes` is on your PATH, the ACP Client can discover it
-automatically via the registry directory.
-
-### 3. Restart VS Code
-
-After configuring, restart VS Code. You should see **Hermes Agent** appear in
-the ACP agent picker in the chat/agent panel.
-
---
-
-## Zed Setup
-
-Zed has built-in ACP support.
-
-### 1. Configure Zed settings
-
-Open Zed settings (`Cmd+,` on macOS or `Ctrl+,` on Linux) and add to your
-`settings.json`:
-
-```json
-{
-  "agent_servers": {
-    "hermes-agent": {
-      "type": "custom",
-      "command": "hermes",
-      "args": ["acp"],
-    },
-  },
-}
-```
-
-### 2. Restart Zed
-
-Hermes Agent will appear in the agent panel. Select it and start a conversation.
-
---
-
-## JetBrains Setup (IntelliJ, PyCharm, WebStorm, etc.)
-
-### 1. Install the ACP plugin
-
- Open **Settings** → **Plugins** → **Marketplace**
- Search for **"ACP"** or **"Agent Client Protocol"**
- Install and restart the IDE
-
-### 2. Configure the agent
-
- Open **Settings** → **Tools** → **ACP Agents**
- Click **+** to add a new agent
- Set the registry directory to your `acp_registry/` folder:
-  `/path/to/hermes-agent/acp_registry`
- Click **OK**
-
-### 3. Use the agent
-
-Open the ACP panel (usually in the right sidebar) and select **Hermes Agent**.
-
---
-
-## What You Will See
-
-Once connected, your editor provides a native interface to Hermes Agent:
-
-### Chat Panel
-A conversational interface where you can describe tasks, ask questions, and
-give instructions. Hermes responds with explanations and actions.
-
-### File Diffs
-When Hermes edits files, you see standard diffs in the editor. You can:
- **Accept** individual changes
- **Reject** changes you don't want
- **Review** the full diff before applying
-
-### Terminal Commands
-When Hermes needs to run shell commands (builds, tests, installs), the editor
-shows them in an integrated terminal. Depending on your settings:
- Commands may run automatically
- Or you may be prompted to **approve** each command
-
-### Approval Flow
-For potentially destructive operations, the editor will prompt you for
-approval before Hermes proceeds. This includes:
- File deletions
- Shell commands
- Git operations
-
---
-
-## Configuration
-
-Hermes Agent under ACP uses the **same configuration** as the CLI:
-
- **API keys / providers**: `~/.hermes/.env`
- **Agent config**: `~/.hermes/config.yaml`
- **Skills**: `~/.hermes/skills/`
- **Sessions**: `~/.hermes/state.db`
-
-You can run `hermes setup` to configure providers, or edit `~/.hermes/.env`
-directly.
-
-### Changing the model
-
-Edit `~/.hermes/config.yaml`:
-
-```yaml
-model: openrouter/nous/hermes-3-llama-3.1-70b
-```
-
-Or set the `HERMES_MODEL` environment variable.
-
-### Toolsets
-
-ACP sessions use the curated `hermes-acp` toolset by default. It is designed for editor workflows and intentionally excludes things like messaging delivery, cronjob management, and audio-first UX features.
-
---
-
-## Troubleshooting
-
-### Agent doesn't appear in the editor
-
-1. **Check the registry path** — make sure the `acp_registry/` directory path
-   in your editor settings is correct and contains `agent.json`.
-2. **Check `hermes` is on PATH** — run `which hermes` in a terminal. If not
-   found, you may need to activate your virtualenv or add it to PATH.
-3. **Restart the editor** after changing settings.
-
-### Agent starts but errors immediately
-
-1. Run `hermes doctor` to check your configuration.
-2. Check that you have a valid API key: `hermes status`
-3. Try running `hermes acp` directly in a terminal to see error output.
-
-### "Module not found" errors
-
-Make sure you installed the ACP extra:
-
-```bash
-pip install -e ".[acp]"
-```
-
-### Slow responses
-
- ACP streams responses, so you should see incremental output. If the agent
-  appears stuck, check your network connection and API provider status.
- Some providers have rate limits. Try switching to a different model/provider.
-
-### Permission denied for terminal commands
-
-If the editor blocks terminal commands, check your ACP Client extension
-settings for auto-approval or manual-approval preferences.
-
-### Logs
-
-Hermes logs are written to stderr when running in ACP mode. Check:
- VS Code: **Output** panel → select **ACP Client** or **Hermes Agent**
- Zed: **View** → **Toggle Terminal** and check the process output
- JetBrains: **Event Log** or the ACP tool window
-
-You can also enable verbose logging:
-
-```bash
-HERMES_LOG_LEVEL=DEBUG hermes acp
-```
-
---
-
-## Further Reading
-
- [ACP Specification](https://github.com/anysphere/acp)
- [Hermes Agent Documentation](https://github.com/NousResearch/hermes-agent)
- Run `hermes --help` for all CLI options
@@ -1,698 +0,0 @@
-<!DOCTYPE html>
-<html lang="en">
-<head>
-<meta charset="UTF-8">
-<meta name="viewport" content="width=device-width, initial-scale=1.0">
-<title>honcho-integration-spec</title>
-<style>
-  :root {
-    --bg:             #0b0e14;
-    --bg-surface:     #11151c;
-    --bg-elevated:    #181d27;
-    --bg-code:        #0d1018;
-    --fg:             #c9d1d9;
-    --fg-bright:      #e6edf3;
-    --fg-muted:       #6e7681;
-    --fg-subtle:      #484f58;
-    --accent:         #7eb8f6;
-    --accent-dim:     #3d6ea5;
-    --accent-glow:    rgba(126, 184, 246, 0.08);
-    --green:          #7ee6a8;
-    --green-dim:      #2ea04f;
-    --orange:         #e6a855;
-    --red:            #f47067;
-    --purple:         #bc8cff;
-    --cyan:           #56d4dd;
-    --border:         #21262d;
-    --border-subtle:  #161b22;
-    --radius:         6px;
-    --font-sans:      'New York', ui-serif, 'Iowan Old Style', 'Apple Garamond', Baskerville, 'Times New Roman', 'Noto Emoji', serif;
-    --font-mono:      'Departure Mono', 'Noto Emoji', monospace;
-  }
-
-  *, *::before, *::after { box-sizing: border-box; margin: 0; padding: 0; }
-  html { scroll-behavior: smooth; scroll-padding-top: 2rem; }
-  body {
-    font-family: var(--font-sans);
-    background: var(--bg);
-    color: var(--fg);
-    line-height: 1.7;
-    font-size: 15px;
-    -webkit-font-smoothing: antialiased;
-  }
-
-  .container { max-width: 860px; margin: 0 auto; padding: 3rem 2rem 6rem; }
-
-  .hero {
-    text-align: center;
-    padding: 4rem 0 3rem;
-    border-bottom: 1px solid var(--border);
-    margin-bottom: 3rem;
-  }
-  .hero h1 { font-family: var(--font-mono); font-size: 2.2rem; font-weight: 700; color: var(--fg-bright); letter-spacing: -0.03em; margin-bottom: 0.5rem; }
-  .hero h1 span { color: var(--accent); }
-  .hero .subtitle { font-family: var(--font-sans); color: var(--fg-muted); font-size: 0.92rem; max-width: 560px; margin: 0 auto; line-height: 1.6; }
-  .hero .meta { margin-top: 1.5rem; display: flex; justify-content: center; gap: 1.5rem; flex-wrap: wrap; }
-  .hero .meta span { font-size: 0.8rem; color: var(--fg-subtle); font-family: var(--font-mono); }
-
-  .toc { background: var(--bg-surface); border: 1px solid var(--border); border-radius: var(--radius); padding: 1.5rem 2rem; margin-bottom: 3rem; }
-  .toc h2 { font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.1em; color: var(--fg-muted); margin-bottom: 1rem; }
-  .toc ol { list-style: none; counter-reset: toc; columns: 2; column-gap: 2rem; }
-  .toc li { counter-increment: toc; break-inside: avoid; margin-bottom: 0.35rem; }
-  .toc li::before { content: counter(toc, decimal-leading-zero) " "; color: var(--fg-subtle); font-family: var(--font-mono); font-size: 0.75rem; margin-right: 0.25rem; }
-  .toc a { font-family: var(--font-mono); color: var(--fg); text-decoration: none; font-size: 0.82rem; transition: color 0.15s; }
-  .toc a:hover { color: var(--accent); }
-
-  section { margin-bottom: 4rem; }
-  section + section { padding-top: 1rem; }
-
-  h2 { font-family: var(--font-mono); font-size: 1.3rem; font-weight: 700; color: var(--fg-bright); letter-spacing: -0.01em; margin-bottom: 1.25rem; padding-bottom: 0.5rem; border-bottom: 1px solid var(--border); }
-  h3 { font-family: var(--font-mono); font-size: 1rem; font-weight: 600; color: var(--fg-bright); margin-top: 2rem; margin-bottom: 0.75rem; }
-  h4 { font-family: var(--font-mono); font-size: 0.9rem; font-weight: 600; color: var(--accent); margin-top: 1.5rem; margin-bottom: 0.5rem; }
-
-  p { margin-bottom: 1rem; font-size: 0.95rem; line-height: 1.75; }
-  strong { color: var(--fg-bright); font-weight: 600; }
-  a { color: var(--accent); text-decoration: none; }
-  a:hover { text-decoration: underline; }
-
-  ul, ol { margin-bottom: 1rem; padding-left: 1.5rem; font-size: 0.93rem; line-height: 1.7; }
-  li { margin-bottom: 0.35rem; }
-  li::marker { color: var(--fg-subtle); }
-
-  .table-wrap { overflow-x: auto; margin-bottom: 1.5rem; }
-  table { width: 100%; border-collapse: collapse; font-size: 0.88rem; }
-  th, td { text-align: left; padding: 0.6rem 1rem; border-bottom: 1px solid var(--border-subtle); }
-  th { font-family: var(--font-mono); font-size: 0.72rem; text-transform: uppercase; letter-spacing: 0.06em; color: var(--fg-muted); background: var(--bg-surface); border-bottom-color: var(--border); white-space: nowrap; }
-  td { font-family: var(--font-sans); font-size: 0.88rem; color: var(--fg); }
-  tr:hover td { background: var(--accent-glow); }
-  td code { background: var(--bg-elevated); padding: 0.15em 0.4em; border-radius: 3px; font-family: var(--font-mono); font-size: 0.82em; color: var(--cyan); }
-
-  pre { background: var(--bg-code); border: 1px solid var(--border); border-radius: var(--radius); padding: 1.25rem 1.5rem; overflow-x: auto; margin-bottom: 1.5rem; font-family: var(--font-mono); font-size: 0.82rem; line-height: 1.65; color: var(--fg); }
-  pre code { background: none; padding: 0; color: inherit; font-size: inherit; }
-  code { font-family: var(--font-mono); font-size: 0.85em; }
-  p code, li code { background: var(--bg-elevated); padding: 0.15em 0.4em; border-radius: 3px; color: var(--cyan); font-size: 0.85em; }
-
-  .kw { color: var(--purple); }
-  .str { color: var(--green); }
-  .cm { color: var(--fg-subtle); font-style: italic; }
-  .num { color: var(--orange); }
-  .key { color: var(--accent); }
-
-  .mermaid { margin: 1.5rem 0 2rem; text-align: center; }
-  .mermaid svg { max-width: 100%; height: auto; }
-
-  .callout { font-family: var(--font-sans); background: var(--bg-surface); border-left: 3px solid var(--accent-dim); border-radius: 0 var(--radius) var(--radius) 0; padding: 1rem 1.25rem; margin-bottom: 1.5rem; font-size: 0.88rem; color: var(--fg-muted); line-height: 1.6; }
-  .callout strong { font-family: var(--font-mono); color: var(--fg-bright); }
-  .callout.success { border-left-color: var(--green-dim); }
-  .callout.warn { border-left-color: var(--orange); }
-
-  .badge { display: inline-block; font-family: var(--font-mono); font-size: 0.65rem; font-weight: 600; text-transform: uppercase; letter-spacing: 0.05em; padding: 0.2em 0.6em; border-radius: 3px; vertical-align: middle; margin-left: 0.4rem; }
-  .badge-done { background: var(--green-dim); color: #fff; }
-  .badge-wip { background: var(--orange); color: #0b0e14; }
-  .badge-todo { background: var(--fg-subtle); color: var(--fg); }
-
-  .checklist { list-style: none; padding-left: 0; }
-  .checklist li { padding-left: 1.5rem; position: relative; margin-bottom: 0.5rem; }
-  .checklist li::before { position: absolute; left: 0; font-family: var(--font-mono); font-size: 0.85rem; }
-  .checklist li.done { color: var(--fg-muted); }
-  .checklist li.done::before { content: "\2713"; color: var(--green); }
-  .checklist li.todo::before { content: "\25CB"; color: var(--fg-subtle); }
-  .checklist li.wip::before { content: "\25D4"; color: var(--orange); }
-
-  .compare { display: grid; grid-template-columns: 1fr 1fr; gap: 1rem; margin-bottom: 2rem; }
-  .compare-card { background: var(--bg-surface); border: 1px solid var(--border); border-radius: var(--radius); padding: 1.25rem; }
-  .compare-card h4 { margin-top: 0; font-size: 0.82rem; }
-  .compare-card.after { border-color: var(--accent-dim); }
-  .compare-card ul { font-family: var(--font-mono); padding-left: 1.25rem; font-size: 0.8rem; }
-
-  hr { border: none; border-top: 1px solid var(--border); margin: 3rem 0; }
-
-  .progress-bar { position: fixed; top: 0; left: 0; height: 2px; background: var(--accent); z-index: 999; transition: width 0.1s linear; }
-
-  @media (max-width: 640px) {
-    .container { padding: 2rem 1rem 4rem; }
-    .hero h1 { font-size: 1.6rem; }
-    .toc ol { columns: 1; }
-    .compare { grid-template-columns: 1fr; }
-    table { font-size: 0.8rem; }
-    th, td { padding: 0.4rem 0.6rem; }
-  }
-</style>
-<link rel="preconnect" href="https://fonts.googleapis.com">
-<link href="https://fonts.googleapis.com/css2?family=Noto+Emoji&display=swap" rel="stylesheet">
-<style>
-  @font-face {
-    font-family: 'Departure Mono';
-    src: url('https://cdn.jsdelivr.net/gh/rektdeckard/departure-mono@latest/fonts/DepartureMono-Regular.woff2') format('woff2');
-    font-weight: normal;
-    font-style: normal;
-    font-display: swap;
-  }
-</style>
-</head>
-<body>
-
-<div class="progress-bar" id="progress"></div>
-
-<div class="container">
-
-<header class="hero">
-  <h1>honcho<span>-integration-spec</span></h1>
-  <p class="subtitle">Comparison of Hermes Agent vs. openclaw-honcho — and a porting spec for bringing Hermes patterns into other Honcho integrations.</p>
-  <div class="meta">
-    <span>hermes-agent / openclaw-honcho</span>
-    <span>Python + TypeScript</span>
-    <span>2026-03-09</span>
-  </div>
-</header>
-
-<nav class="toc">
-  <h2>Contents</h2>
-  <ol>
-    <li><a href="#overview">Overview</a></li>
-    <li><a href="#architecture">Architecture comparison</a></li>
-    <li><a href="#diff-table">Diff table</a></li>
-    <li><a href="#patterns">Hermes patterns to port</a></li>
-    <li><a href="#spec-async">Spec: async prefetch</a></li>
-    <li><a href="#spec-reasoning">Spec: dynamic reasoning level</a></li>
-    <li><a href="#spec-modes">Spec: per-peer memory modes</a></li>
-    <li><a href="#spec-identity">Spec: AI peer identity formation</a></li>
-    <li><a href="#spec-sessions">Spec: session naming strategies</a></li>
-    <li><a href="#spec-cli">Spec: CLI surface injection</a></li>
-    <li><a href="#openclaw-checklist">openclaw-honcho checklist</a></li>
-    <li><a href="#nanobot-checklist">nanobot-honcho checklist</a></li>
-  </ol>
-</nav>
-
-<!-- OVERVIEW -->
-<section id="overview">
-  <h2>Overview</h2>
-
-  <p>Two independent Honcho integrations have been built for two different agent runtimes: <strong>Hermes Agent</strong> (Python, baked into the runner) and <strong>openclaw-honcho</strong> (TypeScript plugin via hook/tool API). Both use the same Honcho peer paradigm — dual peer model, <code>session.context()</code>, <code>peer.chat()</code> — but they made different tradeoffs at every layer.</p>
-
-  <p>This document maps those tradeoffs and defines a porting spec: a set of Hermes-originated patterns, each stated as an integration-agnostic interface, that any Honcho integration can adopt regardless of runtime or language.</p>
-
-  <div class="callout">
-    <strong>Scope</strong> Both integrations work correctly today. This spec is about the delta — patterns in Hermes that are worth propagating and patterns in openclaw-honcho that Hermes should eventually adopt. The spec is additive, not prescriptive.
-  </div>
-</section>
-
-<!-- ARCHITECTURE -->
-<section id="architecture">
-  <h2>Architecture comparison</h2>
-
-  <h3>Hermes: baked-in runner</h3>
-  <p>Honcho is initialised directly inside <code>AIAgent.__init__</code>. There is no plugin boundary. Session management, context injection, async prefetch, and CLI surface are all first-class concerns of the runner. Context is injected once per session (baked into <code>_cached_system_prompt</code>) and never re-fetched mid-session — this maximises prefix cache hits at the LLM provider.</p>
-
-  <div class="mermaid">
-%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#1f3150', 'primaryTextColor': '#c9d1d9', 'primaryBorderColor': '#3d6ea5', 'lineColor': '#3d6ea5', 'secondaryColor': '#162030', 'tertiaryColor': '#11151c' }}}%%
-flowchart TD
-    U["user message"] --> P["_honcho_prefetch()<br/>(reads cache — no HTTP)"]
-    P --> SP["_build_system_prompt()<br/>(first turn only, cached)"]
-    SP --> LLM["LLM call"]
-    LLM --> R["response"]
-    R --> FP["_honcho_fire_prefetch()<br/>(daemon threads, turn end)"]
-    FP --> C1["prefetch_context() thread"]
-    FP --> C2["prefetch_dialectic() thread"]
-    C1 --> CACHE["_context_cache / _dialectic_cache"]
-    C2 --> CACHE
-
-    style U fill:#162030,stroke:#3d6ea5,color:#c9d1d9
-    style P fill:#1f3150,stroke:#3d6ea5,color:#c9d1d9
-    style SP fill:#1f3150,stroke:#3d6ea5,color:#c9d1d9
-    style LLM fill:#162030,stroke:#3d6ea5,color:#c9d1d9
-    style R fill:#162030,stroke:#3d6ea5,color:#c9d1d9
-    style FP fill:#2a1a40,stroke:#bc8cff,color:#c9d1d9
-    style C1 fill:#2a1a40,stroke:#bc8cff,color:#c9d1d9
-    style C2 fill:#2a1a40,stroke:#bc8cff,color:#c9d1d9
-    style CACHE fill:#11151c,stroke:#484f58,color:#6e7681
-  </div>
-
-  <h3>openclaw-honcho: hook-based plugin</h3>
-  <p>The plugin registers hooks against OpenClaw's event bus. Context is fetched synchronously inside <code>before_prompt_build</code> on every turn. Message capture happens in <code>agent_end</code>. The multi-agent hierarchy is tracked via <code>subagent_spawned</code>. This model is correct but every turn pays a blocking Honcho round-trip before the LLM call can begin.</p>
-
-  <div class="mermaid">
-%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#1f3150', 'primaryTextColor': '#c9d1d9', 'primaryBorderColor': '#3d6ea5', 'lineColor': '#3d6ea5', 'secondaryColor': '#162030', 'tertiaryColor': '#11151c' }}}%%
-flowchart TD
-    U2["user message"] --> BPB["before_prompt_build<br/>(BLOCKING HTTP — every turn)"]
-    BPB --> CTX["session.context()"]
-    CTX --> SP2["system prompt assembled"]
-    SP2 --> LLM2["LLM call"]
-    LLM2 --> R2["response"]
-    R2 --> AE["agent_end hook"]
-    AE --> SAVE["session.addMessages()<br/>session.setMetadata()"]
-
-    style U2 fill:#162030,stroke:#3d6ea5,color:#c9d1d9
-    style BPB fill:#3a1515,stroke:#f47067,color:#c9d1d9
-    style CTX fill:#3a1515,stroke:#f47067,color:#c9d1d9
-    style SP2 fill:#1f3150,stroke:#3d6ea5,color:#c9d1d9
-    style LLM2 fill:#162030,stroke:#3d6ea5,color:#c9d1d9
-    style R2 fill:#162030,stroke:#3d6ea5,color:#c9d1d9
-    style AE fill:#162030,stroke:#3d6ea5,color:#c9d1d9
-    style SAVE fill:#11151c,stroke:#484f58,color:#6e7681
-  </div>
-</section>
-
-<!-- DIFF TABLE -->
-<section id="diff-table">
-  <h2>Diff table</h2>
-
-  <div class="table-wrap">
-    <table>
-      <thead>
-        <tr>
-          <th>Dimension</th>
-          <th>Hermes Agent</th>
-          <th>openclaw-honcho</th>
-        </tr>
-      </thead>
-      <tbody>
-        <tr>
-          <td><strong>Context injection timing</strong></td>
-          <td>Once per session (cached). Zero HTTP on response path after turn 1.</td>
-          <td>Every turn, blocking. Fresh context per turn but adds latency.</td>
-        </tr>
-        <tr>
-          <td><strong>Prefetch strategy</strong></td>
-          <td>Daemon threads fire at turn end; consumed next turn from cache.</td>
-          <td>None. Blocking call at prompt-build time.</td>
-        </tr>
-        <tr>
-          <td><strong>Dialectic (peer.chat)</strong></td>
-          <td>Prefetched async; result injected into system prompt next turn.</td>
-          <td>On-demand via <code>honcho_recall</code> / <code>honcho_analyze</code> tools.</td>
-        </tr>
-        <tr>
-          <td><strong>Reasoning level</strong></td>
-          <td>Dynamic: scales with message length. Floor = config default. Cap = "high".</td>
-          <td>Fixed per tool: recall=minimal, analyze=medium.</td>
-        </tr>
-        <tr>
-          <td><strong>Memory modes</strong></td>
-          <td><code>user_memory_mode</code> / <code>agent_memory_mode</code>: hybrid / honcho / local.</td>
-          <td>None. Always writes to Honcho.</td>
-        </tr>
-        <tr>
-          <td><strong>Write frequency</strong></td>
-          <td>async (background queue), turn, session, N turns.</td>
-          <td>After every agent_end (no control).</td>
-        </tr>
-        <tr>
-          <td><strong>AI peer identity</strong></td>
-          <td><code>observe_me=True</code>, <code>seed_ai_identity()</code>, <code>get_ai_representation()</code>, SOUL.md → AI peer.</td>
-          <td>Agent files uploaded to agent peer at setup. No ongoing self-observation seeding.</td>
-        </tr>
-        <tr>
-          <td><strong>Context scope</strong></td>
-          <td>User peer + AI peer representation, both injected.</td>
-          <td>User peer (owner) representation + conversation summary. <code>peerPerspective</code> on context call.</td>
-        </tr>
-        <tr>
-          <td><strong>Session naming</strong></td>
-          <td>per-directory / global / manual map / title-based.</td>
-          <td>Derived from platform session key.</td>
-        </tr>
-        <tr>
-          <td><strong>Multi-agent</strong></td>
-          <td>Single-agent only.</td>
-          <td>Parent observer hierarchy via <code>subagent_spawned</code>.</td>
-        </tr>
-        <tr>
-          <td><strong>Tool surface</strong></td>
-          <td>Single <code>query_user_context</code> tool (on-demand dialectic).</td>
-          <td>6 tools: session, profile, search, context (fast) + recall, analyze (LLM).</td>
-        </tr>
-        <tr>
-          <td><strong>Platform metadata</strong></td>
-          <td>Not stripped.</td>
-          <td>Explicitly stripped before Honcho storage.</td>
-        </tr>
-        <tr>
-          <td><strong>Message dedup</strong></td>
-          <td>None (sends on every save cycle).</td>
-          <td><code>lastSavedIndex</code> in session metadata prevents re-sending.</td>
-        </tr>
-        <tr>
-          <td><strong>CLI surface in prompt</strong></td>
-          <td>Management commands injected into system prompt. Agent knows its own CLI.</td>
-          <td>Not injected.</td>
-        </tr>
-        <tr>
-          <td><strong>AI peer name in identity</strong></td>
-          <td>Replaces "Hermes Agent" in DEFAULT_AGENT_IDENTITY when configured.</td>
-          <td>Not implemented.</td>
-        </tr>
-        <tr>
-          <td><strong>QMD / local file search</strong></td>
-          <td>Not implemented.</td>
-          <td>Passthrough tools when QMD backend configured.</td>
-        </tr>
-        <tr>
-          <td><strong>Workspace metadata</strong></td>
-          <td>Not implemented.</td>
-          <td><code>agentPeerMap</code> in workspace metadata tracks agent&#8594;peer ID.</td>
-        </tr>
-      </tbody>
-    </table>
-  </div>
-</section>
-
-<!-- PATTERNS -->
-<section id="patterns">
-  <h2>Hermes patterns to port</h2>
-
-  <p>Six patterns from Hermes are worth adopting in any Honcho integration. They are described below as integration-agnostic interfaces — the implementation will differ per runtime, but the contract is the same.</p>
-
-  <div class="compare">
-    <div class="compare-card">
-      <h4>Patterns Hermes contributes</h4>
-      <ul>
-        <li>Async prefetch (zero-latency)</li>
-        <li>Dynamic reasoning level</li>
-        <li>Per-peer memory modes</li>
-        <li>AI peer identity formation</li>
-        <li>Session naming strategies</li>
-        <li>CLI surface injection</li>
-      </ul>
-    </div>
-    <div class="compare-card after">
-      <h4>Patterns openclaw contributes back</h4>
-      <ul>
-        <li>lastSavedIndex dedup</li>
-        <li>Platform metadata stripping</li>
-        <li>Multi-agent observer hierarchy</li>
-        <li>peerPerspective on context()</li>
-        <li>Tiered tool surface (fast/LLM)</li>
-        <li>Workspace agentPeerMap</li>
-      </ul>
-    </div>
-  </div>
-</section>
-
-<!-- SPEC: ASYNC PREFETCH -->
-<section id="spec-async">
-  <h2>Spec: async prefetch</h2>
-
-  <h3>Problem</h3>
-  <p>Calling <code>session.context()</code> and <code>peer.chat()</code> synchronously before each LLM call adds 200–800ms of Honcho round-trip latency to every turn. Users experience this as the agent "thinking slowly."</p>
-
-  <h3>Pattern</h3>
-  <p>Fire both calls as non-blocking background work at the <strong>end</strong> of each turn. Store results in a per-session cache keyed by session ID. At the <strong>start</strong> of the next turn, pop from cache — the HTTP is already done. First turn is cold (empty cache); all subsequent turns are zero-latency on the response path.</p>
-
-  <h3>Interface contract</h3>
-  <pre><code><span class="cm">// TypeScript (openclaw / nanobot plugin shape)</span>
-
-<span class="kw">interface</span> <span class="key">AsyncPrefetch</span> {
-  <span class="cm">// Fire context + dialectic fetches at turn end. Non-blocking.</span>
-  firePrefetch(sessionId: <span class="str">string</span>, userMessage: <span class="str">string</span>): <span class="kw">void</span>;
-
-  <span class="cm">// Pop cached results at turn start. Returns null if the cache is cold.</span>
-  popContextResult(sessionId: <span class="str">string</span>): ContextResult | <span class="kw">null</span>;
-  popDialecticResult(sessionId: <span class="str">string</span>): <span class="str">string</span> | <span class="kw">null</span>;
-}
-
-<span class="kw">type</span> <span class="key">ContextResult</span> = {
-  representation: <span class="str">string</span>;
-  card: <span class="str">string</span>[];
-  aiRepresentation?: <span class="str">string</span>;  <span class="cm">// AI peer context if enabled</span>
-  summary?: <span class="str">string</span>;            <span class="cm">// conversation summary if fetched</span>
-};</code></pre>
-
-  <h3>Implementation notes</h3>
-  <ul>
-    <li>Python: <code>threading.Thread(daemon=True)</code>. Write to <code>dict[session_id, result]</code> — GIL makes this safe for simple writes.</li>
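-    <li>TypeScript: start the <code>Promise</code> at turn end and, in its <code>.then()</code> callback, store the settled value in a <code>Map&lt;string, ContextResult&gt;</code>. The pop reads that map synchronously and returns null if nothing has landed yet. Never await the Promise on the response path.</li>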
-    <li>The pop is destructive: clears the cache entry after reading so stale data never accumulates.</li>
-    <li>Prefetch should also fire on first turn (even though it won't be consumed until turn 2) — this ensures turn 2 is never cold.</li>
-  </ul>
-
-  <h3>openclaw-honcho adoption</h3>
-  <p>Move <code>session.context()</code> from <code>before_prompt_build</code> to a post-<code>agent_end</code> background task. Store result in <code>state.contextCache</code>. In <code>before_prompt_build</code>, read from cache instead of calling Honcho. If cache is empty (turn 1), inject nothing — the prompt is still valid without Honcho context on the first turn.</p>
-</section>
-
-<!-- SPEC: DYNAMIC REASONING LEVEL -->
-<section id="spec-reasoning">
-  <h2>Spec: dynamic reasoning level</h2>
-
-  <h3>Problem</h3>
-  <p>Honcho's dialectic endpoint supports reasoning levels from <code>minimal</code> to <code>max</code>. A fixed level per tool wastes budget on simple queries and under-serves complex ones.</p>
-
-  <h3>Pattern</h3>
-  <p>Select the reasoning level dynamically based on the user's message. Use the configured default as a floor. Bump by message length. Cap auto-selection at <code>high</code> — never select <code>max</code> automatically.</p>
-
-  <h3>Interface contract</h3>
-  <pre><code><span class="cm">// Shared helper — identical logic in any language</span>
-
-<span class="kw">const</span> LEVELS = [<span class="str">"minimal"</span>, <span class="str">"low"</span>, <span class="str">"medium"</span>, <span class="str">"high"</span>, <span class="str">"max"</span>];
-
-<span class="kw">function</span> <span class="key">dynamicReasoningLevel</span>(
-  query: <span class="str">string</span>,
-  configDefault: <span class="str">string</span> = <span class="str">"low"</span>
-): <span class="str">string</span> {
-  <span class="kw">const</span> baseIdx = Math.max(<span class="num">0</span>, LEVELS.indexOf(configDefault));
-  <span class="kw">const</span> n = query.length;
-  <span class="kw">const</span> bump = n &lt; <span class="num">120</span> ? <span class="num">0</span> : n &lt; <span class="num">400</span> ? <span class="num">1</span> : <span class="num">2</span>;
-  <span class="kw">return</span> LEVELS[Math.max(baseIdx, Math.min(baseIdx + bump, <span class="num">3</span>))]; <span class="cm">// bump capped at "high" (idx 3), never below the floor</span>
-}</code></pre>
-
-  <h3>Config key</h3>
-  <p>Add a <code>dialecticReasoningLevel</code> config field (string, default <code>"low"</code>). This sets the floor. Users can raise or lower it. The dynamic bump always applies on top.</p>
-
-  <h3>openclaw-honcho adoption</h3>
-  <p>Apply in <code>honcho_recall</code> and <code>honcho_analyze</code>: replace the fixed <code>reasoningLevel</code> with the dynamic selector. <code>honcho_recall</code> should use floor <code>"minimal"</code> and <code>honcho_analyze</code> floor <code>"medium"</code> — both still bump with message length.</p>
-</section>
-
-<!-- SPEC: PER-PEER MEMORY MODES -->
-<section id="spec-modes">
-  <h2>Spec: per-peer memory modes</h2>
-
-  <h3>Problem</h3>
-  <p>Users want independent control over whether user context and agent context are written locally, to Honcho, or both. A single <code>memoryMode</code> shorthand is not granular enough.</p>
-
-  <h3>Pattern</h3>
-  <p>Three modes per peer: <code>hybrid</code> (write both local + Honcho), <code>honcho</code> (Honcho only, disable local files), <code>local</code> (local files only, skip Honcho sync for this peer). Two orthogonal axes: user peer and agent peer.</p>
-
-  <h3>Config schema</h3>
-  <pre><code><span class="cm">// ~/.openclaw/openclaw.json  (or ~/.nanobot/config.json)</span>
-{
-  <span class="str">"plugins"</span>: {
-    <span class="str">"openclaw-honcho"</span>: {
-      <span class="str">"config"</span>: {
-        <span class="str">"apiKey"</span>: <span class="str">"..."</span>,
-        <span class="str">"memoryMode"</span>: <span class="str">"hybrid"</span>,          <span class="cm">// shorthand: both peers</span>
-        <span class="str">"userMemoryMode"</span>: <span class="str">"honcho"</span>,       <span class="cm">// override for user peer</span>
-        <span class="str">"agentMemoryMode"</span>: <span class="str">"hybrid"</span>       <span class="cm">// override for agent peer</span>
-      }
-    }
-  }
-}</code></pre>
-
-  <h3>Resolution order</h3>
-  <ol>
-    <li>Per-peer field (<code>userMemoryMode</code> / <code>agentMemoryMode</code>) — wins if present.</li>
-    <li>Shorthand <code>memoryMode</code> — applies to both peers as default.</li>
-    <li>Hardcoded default: <code>"hybrid"</code>.</li>
-  </ol>
-
-  <h3>Effect on Honcho sync</h3>
-  <ul>
-    <li><code>userMemoryMode=local</code>: skip adding user peer messages to Honcho.</li>
-    <li><code>agentMemoryMode=local</code>: skip adding assistant peer messages to Honcho.</li>
-    <li>Both local: skip <code>session.addMessages()</code> entirely.</li>
-    <li><code>userMemoryMode=honcho</code>: disable local USER.md writes.</li>
-    <li><code>agentMemoryMode=honcho</code>: disable local MEMORY.md / SOUL.md writes.</li>
-  </ul>
-</section>
-
-<!-- SPEC: AI PEER IDENTITY -->
-<section id="spec-identity">
-  <h2>Spec: AI peer identity formation</h2>
-
-  <h3>Problem</h3>
-  <p>Honcho builds the user's representation organically by observing what the user says. The same mechanism exists for the AI peer — but only if <code>observe_me=True</code> is set for the agent peer. Without it, the agent peer accumulates nothing and Honcho's AI-side model never forms.</p>
-
-  <p>Additionally, existing persona files (SOUL.md, IDENTITY.md) should seed the AI peer's Honcho representation at first activation, rather than waiting for it to emerge from scratch.</p>
-
-  <h3>Part A: observe_me=True for agent peer</h3>
-  <pre><code><span class="cm">// TypeScript — in session.addPeers() call</span>
-<span class="kw">await</span> session.addPeers([
-  [ownerPeer.id, { observeMe: <span class="kw">true</span>,  observeOthers: <span class="kw">false</span> }],
-  [agentPeer.id, { observeMe: <span class="kw">true</span>,  observeOthers: <span class="kw">true</span>  }], <span class="cm">// observeMe was false</span>
-]);</code></pre>
-
-  <p>This is a one-line change but foundational. Without it, Honcho's AI peer representation stays empty regardless of what the agent says.</p>
-
-  <h3>Part B: seedAiIdentity()</h3>
-  <pre><code><span class="kw">async function</span> <span class="key">seedAiIdentity</span>(
-  session: HonchoSession,
-  agentPeer: Peer,
-  content: <span class="str">string</span>,
-  source: <span class="str">string</span>
-): Promise&lt;<span class="kw">boolean</span>&gt; {
-  <span class="kw">const</span> wrapped = [
-    <span class="str">`&lt;ai_identity_seed&gt;`</span>,
-    <span class="str">`&lt;source&gt;${source}&lt;/source&gt;`</span>,
-    <span class="str">``</span>,
-    content.trim(),
-    <span class="str">`&lt;/ai_identity_seed&gt;`</span>,
-  ].join(<span class="str">"\n"</span>);
-
-  <span class="kw">await</span> agentPeer.addMessage(<span class="str">"assistant"</span>, wrapped);
-  <span class="kw">return true</span>;
-}</code></pre>
-
-  <h3>Part C: migrate agent files at setup</h3>
-  <p>During <code>openclaw honcho setup</code>, upload agent-self files (SOUL.md, IDENTITY.md, AGENTS.md, BOOTSTRAP.md) to the agent peer using <code>seedAiIdentity()</code> instead of <code>session.uploadFile()</code>. This routes the content through Honcho's observation pipeline rather than the file store.</p>
-
-  <h3>Part D: AI peer name in identity</h3>
-  <p>When the agent has a configured name (non-default), inject it into the agent's self-identity prefix. In OpenClaw this means adding to the injected system prompt section:</p>
-  <pre><code><span class="cm">// In context hook return value</span>
-<span class="kw">return</span> {
-  systemPrompt: [
-    agentName ? <span class="str">`You are ${agentName}.`</span> : <span class="str">""</span>,
-    <span class="str">"## User Memory Context"</span>,
-    ...sections,
-  ].filter(Boolean).join(<span class="str">"\n\n"</span>)
-};</code></pre>
-
-  <h3>CLI surface: honcho identity subcommand</h3>
-  <pre><code>openclaw honcho identity &lt;file&gt;    <span class="cm"># seed from file</span>
-openclaw honcho identity --show    <span class="cm"># show current AI peer representation</span></code></pre>
-</section>
-
-<!-- SPEC: SESSION NAMING -->
-<section id="spec-sessions">
-  <h2>Spec: session naming strategies</h2>
-
-  <h3>Problem</h3>
-  <p>When Honcho is used across multiple projects or directories, a single global session means every project shares the same context. Per-directory sessions provide isolation without requiring users to name sessions manually.</p>
-
-  <h3>Strategies</h3>
-  <div class="table-wrap">
-    <table>
-      <thead><tr><th>Strategy</th><th>Session key</th><th>When to use</th></tr></thead>
-      <tbody>
-        <tr><td><code>per-directory</code></td><td>basename of CWD</td><td>Default. Each project gets its own session.</td></tr>
-        <tr><td><code>global</code></td><td>fixed string <code>"global"</code></td><td>Single cross-project session.</td></tr>
-        <tr><td>manual map</td><td>user-configured per path</td><td><code>sessions</code> config map overrides directory basename.</td></tr>
-        <tr><td>title-based</td><td>sanitized session title</td><td>When agent supports named sessions; title set mid-conversation.</td></tr>
-      </tbody>
-    </table>
-  </div>
-
-  <h3>Config schema</h3>
-  <pre><code>{
-  <span class="str">"sessionStrategy"</span>: <span class="str">"per-directory"</span>,   <span class="cm">// "per-directory" | "global"</span>
-  <span class="str">"sessionPeerPrefix"</span>: <span class="kw">false</span>,            <span class="cm">// prepend peer name to session key</span>
-  <span class="str">"sessions"</span>: {                            <span class="cm">// manual overrides</span>
-    <span class="str">"/home/user/projects/foo"</span>: <span class="str">"foo-project"</span>
-  }
-}</code></pre>
-
-  <h3>CLI surface</h3>
-  <pre><code>openclaw honcho sessions              <span class="cm"># list all mappings</span>
-openclaw honcho map &lt;name&gt;           <span class="cm"># map cwd to session name</span>
-openclaw honcho map                   <span class="cm"># no-arg = list mappings</span></code></pre>
-
-  <p>Resolution order: manual map wins &rarr; session title &rarr; directory basename &rarr; platform key.</p>
-</section>
-
-<!-- SPEC: CLI SURFACE INJECTION -->
-<section id="spec-cli">
-  <h2>Spec: CLI surface injection</h2>
-
-  <h3>Problem</h3>
-  <p>When a user asks "how do I change my memory settings?" or "what Honcho commands are available?" the agent either hallucinates or says it doesn't know. The agent should know its own management interface.</p>
-
-  <h3>Pattern</h3>
-  <p>When Honcho is active, append a compact command reference to the system prompt. The agent can cite these commands directly instead of guessing.</p>
-
-  <pre><code><span class="cm">// In context hook, append to systemPrompt</span>
-<span class="kw">const</span> honchoSection = [
-  <span class="str">"# Honcho memory integration"</span>,
-  <span class="str">`Active. Session: ${sessionKey}. Mode: ${mode}.`</span>,
-  <span class="str">"Management commands:"</span>,
-  <span class="str">"  openclaw honcho status                    — show config + connection"</span>,
-  <span class="str">"  openclaw honcho mode [hybrid|honcho|local] — show or set memory mode"</span>,
-  <span class="str">"  openclaw honcho sessions                  — list session mappings"</span>,
-  <span class="str">"  openclaw honcho map &lt;name&gt;                — map directory to session"</span>,
-  <span class="str">"  openclaw honcho identity [file] [--show]  — seed or show AI identity"</span>,
-  <span class="str">"  openclaw honcho setup                     — full interactive wizard"</span>,
-].join(<span class="str">"\n"</span>);</code></pre>
-
-  <div class="callout warn">
-    <strong>Keep it compact.</strong> This section is injected every turn, so its size is paid on every LLM call. Keep it to one short line per command, as in the example above: list commands, not explanations. The agent can explain them on request.
-  </div>
-</section>
-
-<!-- OPENCLAW CHECKLIST -->
-<section id="openclaw-checklist">
-  <h2>openclaw-honcho checklist</h2>
-
-  <p>Ordered by impact. Each item maps to a spec section above.</p>
-
-  <ul class="checklist">
-    <li class="todo"><strong>Async prefetch</strong> — move <code>session.context()</code> out of <code>before_prompt_build</code> into post-<code>agent_end</code> background Promise. Pop from cache at prompt build. (<a href="#spec-async">spec</a>)</li>
-    <li class="todo"><strong>observe_me=True for agent peer</strong> — one-line change in <code>session.addPeers()</code> config for agent peer. (<a href="#spec-identity">spec</a>)</li>
-    <li class="todo"><strong>Dynamic reasoning level</strong> — add <code>dynamicReasoningLevel()</code> helper; apply in <code>honcho_recall</code> and <code>honcho_analyze</code>. Add <code>dialecticReasoningLevel</code> to config schema. (<a href="#spec-reasoning">spec</a>)</li>
-    <li class="todo"><strong>Per-peer memory modes</strong> — add <code>userMemoryMode</code> / <code>agentMemoryMode</code> to config; gate Honcho sync and local writes accordingly. (<a href="#spec-modes">spec</a>)</li>
-    <li class="todo"><strong>seedAiIdentity()</strong> — add helper; apply during setup migration for SOUL.md / IDENTITY.md instead of <code>session.uploadFile()</code>. (<a href="#spec-identity">spec</a>)</li>
-    <li class="todo"><strong>Session naming strategies</strong> — add <code>sessionStrategy</code>, <code>sessions</code> map, <code>sessionPeerPrefix</code> to config; implement resolution function. (<a href="#spec-sessions">spec</a>)</li>
-    <li class="todo"><strong>CLI surface injection</strong> — append command reference to <code>before_prompt_build</code> return value when Honcho is active. (<a href="#spec-cli">spec</a>)</li>
-    <li class="todo"><strong>honcho identity subcommand</strong> — add <code>openclaw honcho identity</code> CLI command. (<a href="#spec-identity">spec</a>)</li>
-    <li class="todo"><strong>AI peer name injection</strong> — if <code>aiPeer</code> name configured, prepend to injected system prompt. (<a href="#spec-identity">spec</a>)</li>
-    <li class="todo"><strong>honcho mode / honcho sessions / honcho map</strong> — CLI parity with Hermes. (<a href="#spec-sessions">spec</a>)</li>
-  </ul>
-
-  <div class="callout success">
-    <strong>Already done in openclaw-honcho (do not re-implement):</strong> lastSavedIndex dedup, platform metadata stripping, multi-agent parent observer hierarchy, peerPerspective on context(), tiered tool surface (fast/LLM), workspace agentPeerMap, QMD passthrough, self-hosted Honcho support.
-  </div>
-</section>
-
-<!-- NANOBOT CHECKLIST -->
-<section id="nanobot-checklist">
-  <h2>nanobot-honcho checklist</h2>
-
-  <p>nanobot-honcho is a greenfield integration. Start from openclaw-honcho's architecture (hook-based, dual peer) and apply all Hermes patterns from day one rather than retrofitting. Priority order:</p>
-
-  <h3>Phase 1 — core correctness</h3>
-  <ul class="checklist">
-    <li class="todo">Dual peer model (owner + agent peer), both with <code>observe_me=True</code></li>
-    <li class="todo">Message capture at turn end with <code>lastSavedIndex</code> dedup</li>
-    <li class="todo">Platform metadata stripping before Honcho storage</li>
-    <li class="todo">Async prefetch from day one — do not implement blocking context injection</li>
-    <li class="todo">Legacy file migration at first activation (USER.md → owner peer, SOUL.md → <code>seedAiIdentity()</code>)</li>
-  </ul>
-
-  <h3>Phase 2 — configuration</h3>
-  <ul class="checklist">
-    <li class="todo">Config schema: <code>apiKey</code>, <code>workspaceId</code>, <code>baseUrl</code>, <code>memoryMode</code>, <code>userMemoryMode</code>, <code>agentMemoryMode</code>, <code>dialecticReasoningLevel</code>, <code>sessionStrategy</code>, <code>sessions</code></li>
-    <li class="todo">Per-peer memory mode gating</li>
-    <li class="todo">Dynamic reasoning level</li>
-    <li class="todo">Session naming strategies</li>
-  </ul>
-
-  <h3>Phase 3 — tools and CLI</h3>
-  <ul class="checklist">
-    <li class="todo">Tool surface: <code>honcho_profile</code>, <code>honcho_recall</code>, <code>honcho_analyze</code>, <code>honcho_search</code>, <code>honcho_context</code></li>
-    <li class="todo">CLI: <code>setup</code>, <code>status</code>, <code>sessions</code>, <code>map</code>, <code>mode</code>, <code>identity</code></li>
-    <li class="todo">CLI surface injection into system prompt</li>
-    <li class="todo">AI peer name wired into agent identity</li>
-  </ul>
-</section>
-
-</div>
-
-<script type="module">
-  import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@11/dist/mermaid.esm.min.mjs';
-  mermaid.initialize({ startOnLoad: true, securityLevel: 'loose', fontFamily: 'Departure Mono, Noto Emoji, monospace' });
-</script>
-<script>
-  window.addEventListener('scroll', () => {
-    const bar = document.getElementById('progress');
-    const max = document.documentElement.scrollHeight - window.innerHeight;
-    bar.style.width = (max > 0 ? (window.scrollY / max) * 100 : 0) + '%';
-  });
-</script>
-</body>
-</html>
@@ -1,377 +0,0 @@
-# honcho-integration-spec
-
-Comparison of Hermes Agent vs. openclaw-honcho — and a porting spec for bringing Hermes patterns into other Honcho integrations.
-
---
-
-## Overview
-
-Two independent Honcho integrations have been built for two different agent runtimes: **Hermes Agent** (Python, baked into the runner) and **openclaw-honcho** (TypeScript plugin via hook/tool API). Both use the same Honcho peer paradigm — dual peer model, `session.context()`, `peer.chat()` — but they made different tradeoffs at every layer.
-
-This document maps those tradeoffs and defines a porting spec: a set of Hermes-originated patterns, each stated as an integration-agnostic interface, that any Honcho integration can adopt regardless of runtime or language.
-
-> **Scope:** Both integrations work correctly today. This spec is about the delta: patterns in Hermes that are worth propagating and patterns in openclaw-honcho that Hermes should eventually adopt. The spec is additive, not prescriptive.
-
---
-
-## Architecture comparison
-
-### Hermes: baked-in runner
-
-Honcho is initialised directly inside `AIAgent.__init__`. There is no plugin boundary. Session management, context injection, async prefetch, and CLI surface are all first-class concerns of the runner. Context is injected once per session (baked into `_cached_system_prompt`) and never re-fetched mid-session — this maximises prefix cache hits at the LLM provider.
-
-Turn flow:
-
-```
-user message
-  → _honcho_prefetch()       (reads cache — no HTTP)
-  → _build_system_prompt()   (first turn only, cached)
-  → LLM call
-  → response
-  → _honcho_fire_prefetch()  (daemon threads, turn end)
-       → prefetch_context() thread  ──┐
-       → prefetch_dialectic() thread ─┴→ _context_cache / _dialectic_cache
-```
-
-### openclaw-honcho: hook-based plugin
-
-The plugin registers hooks against OpenClaw's event bus. Context is fetched synchronously inside `before_prompt_build` on every turn. Message capture happens in `agent_end`. The multi-agent hierarchy is tracked via `subagent_spawned`. This model is correct but every turn pays a blocking Honcho round-trip before the LLM call can begin.
-
-Turn flow:
-
-```
-user message
-  → before_prompt_build (BLOCKING HTTP — every turn)
-       → session.context()
-  → system prompt assembled
-  → LLM call
-  → response
-  → agent_end hook
-       → session.addMessages()
-       → session.setMetadata()
-```
-
---
-
-## Diff table
-
-| Dimension | Hermes Agent | openclaw-honcho |
-|---|---|---|
-| **Context injection timing** | Once per session (cached). Zero HTTP on response path after turn 1. | Every turn, blocking. Fresh context per turn but adds latency. |
-| **Prefetch strategy** | Daemon threads fire at turn end; consumed next turn from cache. | None. Blocking call at prompt-build time. |
-| **Dialectic (peer.chat)** | Prefetched async; result injected into system prompt next turn. | On-demand via `honcho_recall` / `honcho_analyze` tools. |
-| **Reasoning level** | Dynamic: scales with message length. Floor = config default. Cap = "high". | Fixed per tool: recall=minimal, analyze=medium. |
-| **Memory modes** | `user_memory_mode` / `agent_memory_mode`: hybrid / honcho / local. | None. Always writes to Honcho. |
-| **Write frequency** | async (background queue), turn, session, N turns. | After every agent_end (no control). |
-| **AI peer identity** | `observe_me=True`, `seed_ai_identity()`, `get_ai_representation()`, SOUL.md → AI peer. | Agent files uploaded to agent peer at setup. No ongoing self-observation. |
-| **Context scope** | User peer + AI peer representation, both injected. | User peer (owner) representation + conversation summary. `peerPerspective` on context call. |
-| **Session naming** | per-directory / global / manual map / title-based. | Derived from platform session key. |
-| **Multi-agent** | Single-agent only. | Parent observer hierarchy via `subagent_spawned`. |
-| **Tool surface** | Single `query_user_context` tool (on-demand dialectic). | 6 tools: session, profile, search, context (fast) + recall, analyze (LLM). |
-| **Platform metadata** | Not stripped. | Explicitly stripped before Honcho storage. |
-| **Message dedup** | None. | `lastSavedIndex` in session metadata prevents re-sending. |
-| **CLI surface in prompt** | Management commands injected into system prompt. Agent knows its own CLI. | Not injected. |
-| **AI peer name in identity** | Replaces "Hermes Agent" in DEFAULT_AGENT_IDENTITY when configured. | Not implemented. |
-| **QMD / local file search** | Not implemented. | Passthrough tools when QMD backend configured. |
-| **Workspace metadata** | Not implemented. | `agentPeerMap` in workspace metadata tracks agent→peer ID. |
-
---
-
-## Patterns
-
-Six patterns from Hermes are worth adopting in any Honcho integration. Each is described as an integration-agnostic interface.
-
-**Hermes contributes:**
-- Async prefetch (zero-latency)
-- Dynamic reasoning level
-- Per-peer memory modes
-- AI peer identity formation
-- Session naming strategies
-- CLI surface injection
-
-**openclaw-honcho contributes back (Hermes should adopt):**
-- `lastSavedIndex` dedup
-- Platform metadata stripping
-- Multi-agent observer hierarchy
-- `peerPerspective` on `context()`
-- Tiered tool surface (fast/LLM)
-- Workspace `agentPeerMap`
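For `lastSavedIndex` dedup, the diff table says only that the index lives in session metadata and prevents re-sending. A minimal sketch of the selection step, assuming the index points at the last message already sent and that history is append-only; `selectUnsavedMessages` and the `Message` shape are hypothetical, and reading or writing the metadata is left to the caller.

```typescript
// Hypothetical message shape; real integrations carry more fields.
type Message = { role: "user" | "assistant"; content: string };

// Returns the messages not yet sent to Honcho and the index to persist
// back into session metadata once the send succeeds.
function selectUnsavedMessages(
  history: Message[],
  lastSavedIndex: number | undefined,
): { toSend: Message[]; nextSavedIndex: number } {
  // No stored index means nothing has been saved yet: start at 0.
  const start = lastSavedIndex === undefined ? 0 : lastSavedIndex + 1;
  return {
    toSend: history.slice(start),
    // Index of the newest message; -1 for an empty history.
    nextSavedIndex: history.length - 1,
  };
}
```

Persist `nextSavedIndex` only after `session.addMessages()` succeeds, so a failed send is retried on the next save cycle rather than silently skipped.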
-
---
-
-## Spec: async prefetch
-
-### Problem
-
-Calling `session.context()` and `peer.chat()` synchronously before each LLM call adds 200–800ms of Honcho round-trip latency to every turn.
-
-### Pattern
-
-Fire both calls as non-blocking background work at the **end** of each turn. Store results in a per-session cache keyed by session ID. At the **start** of the next turn, pop from cache — the HTTP is already done. First turn is cold (empty cache); all subsequent turns are zero-latency on the response path.
-
-### Interface contract
-
-```typescript
-interface AsyncPrefetch {
-  // Fire context + dialectic fetches at turn end. Non-blocking.
-  firePrefetch(sessionId: string, userMessage: string): void;
-
-  // Pop cached results at turn start. Returns null if the cache is cold.
-  popContextResult(sessionId: string): ContextResult | null;
-  popDialecticResult(sessionId: string): string | null;
-}
-
-type ContextResult = {
-  representation: string;
-  card: string[];
-  aiRepresentation?: string;  // AI peer context if enabled
-  summary?: string;           // conversation summary if fetched
-};
-```
-
-### Implementation notes
-
-- **Python:** `threading.Thread(daemon=True)`. Write to `dict[session_id, result]`; the GIL makes this safe for simple writes.
-- **TypeScript:** start the `Promise` at turn end and store its settled value in a `Map<string, ContextResult>` from the `.then()` callback. The pop reads the map synchronously and returns null if nothing has landed yet; never await on the response path.
-- The pop is destructive: it clears the cache entry after reading so stale data never accumulates.
-- Prefetch should also fire on the first turn (even though it won't be consumed until turn 2), so turn 2 is never cold.
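The notes above can be sketched as an in-memory implementation of the `AsyncPrefetch` contract. This assumes the two fetchers are injected stand-ins for `session.context()` and `peer.chat()`; `ContextFetcher`, `DialecticFetcher`, and `InMemoryPrefetch` are hypothetical names. Only settled values enter the cache, so the pop methods stay synchronous.

```typescript
// Shapes copied from the interface contract above.
type ContextResult = {
  representation: string;
  card: string[];
  aiRepresentation?: string;
  summary?: string;
};

interface AsyncPrefetch {
  firePrefetch(sessionId: string, userMessage: string): void;
  popContextResult(sessionId: string): ContextResult | null;
  popDialecticResult(sessionId: string): string | null;
}

// Hypothetical fetchers standing in for session.context() and peer.chat().
type ContextFetcher = (sessionId: string) => Promise<ContextResult>;
type DialecticFetcher = (sessionId: string, query: string) => Promise<string>;

class InMemoryPrefetch implements AsyncPrefetch {
  // Settled values only; pending promises are never stored or awaited.
  private readonly contextCache = new Map<string, ContextResult>();
  private readonly dialecticCache = new Map<string, string>();
  private readonly fetchContext: ContextFetcher;
  private readonly fetchDialectic: DialecticFetcher;

  constructor(fetchContext: ContextFetcher, fetchDialectic: DialecticFetcher) {
    this.fetchContext = fetchContext;
    this.fetchDialectic = fetchDialectic;
  }

  firePrefetch(sessionId: string, userMessage: string): void {
    // Fire-and-forget: each result lands in its cache when it resolves.
    this.fetchContext(sessionId)
      .then((r) => this.contextCache.set(sessionId, r))
      .catch(() => { /* a failed prefetch just leaves the cache cold */ });
    this.fetchDialectic(sessionId, userMessage)
      .then((r) => this.dialecticCache.set(sessionId, r))
      .catch(() => { /* same: cold cache on failure */ });
  }

  popContextResult(sessionId: string): ContextResult | null {
    const hit = this.contextCache.get(sessionId) ?? null;
    this.contextCache.delete(sessionId); // destructive pop
    return hit;
  }

  popDialecticResult(sessionId: string): string | null {
    const hit = this.dialecticCache.get(sessionId) ?? null;
    this.dialecticCache.delete(sessionId); // destructive pop
    return hit;
  }
}
```

A failed fetch is swallowed and simply leaves the cache cold, which matches the first-turn behaviour: the prompt is built without Honcho context rather than waiting for it.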
-
-### openclaw-honcho adoption
-
-Move `session.context()` from `before_prompt_build` to a post-`agent_end` background task. Store result in `state.contextCache`. In `before_prompt_build`, read from cache instead of calling Honcho. If cache is empty (turn 1), inject nothing — the prompt is still valid without Honcho context on the first turn.
-
---
-
-## Spec: dynamic reasoning level
-
-### Problem
-
-Honcho's dialectic endpoint supports reasoning levels from `minimal` to `max`. A fixed level per tool wastes budget on simple queries and under-serves complex ones.
-
-### Pattern
-
-Select the reasoning level dynamically based on the user's message. Use the configured default as a floor. Bump by message length. Cap auto-selection at `high` — never select `max` automatically.
-
-### Logic
-
-```
-< 120 chars   → default (typically "low")
-120–399 chars → one level above default (cap at "high")
->= 400 chars  → two levels above default (cap at "high")
-```
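The thresholds translate into a small helper. A sketch, assuming the level names listed in the Problem statement; the `Math.max` against the floor is an interpretation of "floor = config default" that keeps an explicitly configured `max` instead of demoting it to `high`, while automatic bumps still stop at `high`.

```typescript
const LEVELS = ["minimal", "low", "medium", "high", "max"] as const;
type ReasoningLevel = (typeof LEVELS)[number];

// Automatic selection never goes above "high".
const AUTO_CAP = LEVELS.indexOf("high");

function dynamicReasoningLevel(
  query: string,
  configDefault: ReasoningLevel = "low",
): ReasoningLevel {
  const baseIdx = LEVELS.indexOf(configDefault);
  const n = query.length;
  // < 120 chars: no bump; 120–399: +1; 400 or more: +2
  const bump = n < 120 ? 0 : n < 400 ? 1 : 2;
  // Cap the bumped level at "high", but never return less than the floor.
  const idx = Math.max(baseIdx, Math.min(baseIdx + bump, AUTO_CAP));
  return LEVELS[idx];
}
```

Per the adoption note, `honcho_recall` would call `dynamicReasoningLevel(query, "minimal")` and `honcho_analyze` would call `dynamicReasoningLevel(query, "medium")`.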
-
-### Config key
-
-Add `dialecticReasoningLevel` (string, default `"low"`). This sets the floor. The dynamic bump always applies on top.
-
-### openclaw-honcho adoption
-
-Apply in `honcho_recall` and `honcho_analyze`: replace fixed `reasoningLevel` with the dynamic selector. `honcho_recall` uses floor `"minimal"`, `honcho_analyze` uses floor `"medium"` — both still bump with message length.
-
---
-
-## Spec: per-peer memory modes
-
-### Problem
-
-Users want independent control over whether user context and agent context are written locally, to Honcho, or both.
-
-### Modes
-
-| Mode | Effect |
-|---|---|
-| `hybrid` | Write to both local files and Honcho (default) |
-| `honcho` | Honcho only — disable corresponding local file writes |
-| `local` | Local files only — skip Honcho sync for this peer |
-
-### Config schema
-
-```json
-{
-  "memoryMode": "hybrid",
-  "userMemoryMode": "honcho",
-  "agentMemoryMode": "hybrid"
-}
-```
-
-Resolution order: per-peer field wins → shorthand `memoryMode` → default `"hybrid"`.
-
-### Effect on Honcho sync
-
-- `userMemoryMode=local`: skip adding user peer messages to Honcho
-- `agentMemoryMode=local`: skip adding assistant peer messages to Honcho
-- Both local: skip `session.addMessages()` entirely
-- `userMemoryMode=honcho`: disable local USER.md writes
-- `agentMemoryMode=honcho`: disable local MEMORY.md / SOUL.md writes
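A sketch of the resolution order and sync gating described above. It assumes the config is a plain dict that uses the JSON keys from the schema. `resolve_mode`, `sync_plan` and the returned flag names are hypothetical.

```python
def resolve_mode(config: dict, peer_key: str) -> str:
    """Per-peer field wins, then the `memoryMode` shorthand, then "hybrid"."""
    return config.get(peer_key) or config.get("memoryMode") or "hybrid"


def sync_plan(config: dict) -> dict[str, bool]:
    """Translate the two resolved peer modes into Honcho-sync and local-write flags."""
    user = resolve_mode(config, "userMemoryMode")
    agent = resolve_mode(config, "agentMemoryMode")
    return {
        "sync_user_messages": user != "local",
        "sync_agent_messages": agent != "local",
        # Both peers local: skip session.addMessages() entirely.
        "call_add_messages": not (user == "local" and agent == "local"),
        "write_user_md": user != "honcho",
        "write_memory_and_soul_md": agent != "honcho",
    }
```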
-
---
-
-## Spec: AI peer identity formation
-
-### Problem
-
-Honcho builds the user's representation organically by observing what the user says. The same mechanism exists for the AI peer — but only if `observe_me=True` is set for the agent peer. Without it, the agent peer accumulates nothing.
-
-Additionally, existing persona files (SOUL.md, IDENTITY.md) should seed the AI peer's Honcho representation at first activation.
-
-### Part A: observe_me=True for agent peer
-
-```typescript
-await session.addPeers([
-  [ownerPeer.id, { observeMe: true,  observeOthers: false }],
-  [agentPeer.id, { observeMe: true,  observeOthers: true  }], // was false
-]);
-```
-
-One-line change. Foundational. Without it, the AI peer representation stays empty regardless of what the agent says.
-
-### Part B: seedAiIdentity()
-
-```typescript
-async function seedAiIdentity(
-  agentPeer: Peer,
-  content: string,
-  source: string
-): Promise<boolean> {
-  const wrapped = [
-    `<ai_identity_seed>`,
-    `<source>${source}</source>`,
-    ``,
-    content.trim(),
-    `</ai_identity_seed>`,
-  ].join("\n");
-
-  await agentPeer.addMessage("assistant", wrapped);
-  return true;
-}
-```
-
-### Part C: migrate agent files at setup
-
-During `honcho setup`, upload agent-self files (SOUL.md, IDENTITY.md, AGENTS.md) to the agent peer via `seedAiIdentity()` instead of `session.uploadFile()`. This routes content through Honcho's observation pipeline.
-
-### Part D: AI peer name in identity
-
-When the agent has a configured name, prepend it to the injected system prompt:
-
-```typescript
-const namePrefix = agentName ? `You are ${agentName}.\n\n` : "";
-return { systemPrompt: namePrefix + "## User Memory Context\n\n" + sections };
-```
-
-### CLI surface
-
-```
-honcho identity <file>    # seed from file
-honcho identity --show    # show current AI peer representation
-```
-
---
-
-## Spec: session naming strategies
-
-### Problem
-
-A single global session means every project shares the same Honcho context. Per-directory sessions provide isolation without requiring users to name sessions manually.
-
-### Strategies
-
-| Strategy | Session key | When to use |
-|---|---|---|
-| `per-directory` | basename of CWD | Default. Each project gets its own session. |
-| `global` | fixed string `"global"` | Single cross-project session. |
-| manual map | user-configured per path | `sessions` config map overrides directory basename. |
-| title-based | sanitized session title | When agent supports named sessions set mid-conversation. |
-
-### Config schema
-
-```json
-{
-  "sessionStrategy": "per-directory",
-  "sessionPeerPrefix": false,
-  "sessions": {
-    "/home/user/projects/foo": "foo-project"
-  }
-}
-```
-
-### CLI surface
-
-```
-honcho sessions              # list all mappings
-honcho map <name>            # map cwd to session name
-honcho map                   # no-arg = list mappings
-```
-
-Resolution order: manual map → session title → directory basename → platform key.
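The resolution order can be sketched as follows. `resolve_session_key`, the title sanitizer (lowercase, with runs of non-alphanumerics collapsed to `-`) and the fallback `platform_key` value are assumptions for illustration. The `global` strategy and `sessionPeerPrefix` are not modeled here.

```python
import os
import re
from typing import Optional


def sanitize_title(title: str) -> str:
    # Hypothetical sanitizer: lowercase, collapse non-alphanumeric runs to "-".
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def resolve_session_key(cwd: str, sessions: dict[str, str],
                        title: Optional[str] = None,
                        platform_key: str = "default") -> str:
    if cwd in sessions:                          # 1. manual map
        return sessions[cwd]
    if title and sanitize_title(title):          # 2. session title
        return sanitize_title(title)
    base = os.path.basename(os.path.normpath(cwd))
    if base:                                     # 3. directory basename
        return base
    return platform_key                          # 4. platform key
```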
-
---
-
-## Spec: CLI surface injection
-
-### Problem
-
-When a user asks "how do I change my memory settings?" the agent either hallucinates or says it doesn't know. The agent should know its own management interface.
-
-### Pattern
-
-When Honcho is active, append a compact command reference to the system prompt. Keep it under 300 chars.
-
-```
-# Honcho memory integration
-Active. Session: {sessionKey}. Mode: {mode}.
-Management commands:
-  honcho status                    — show config + connection
-  honcho mode [hybrid|honcho|local] — show or set memory mode
-  honcho sessions                  — list session mappings
-  honcho map <name>                — map directory to session
-  honcho identity [file] [--show]  — seed or show AI identity
-  honcho setup                     — full interactive wizard
-```
-
---
-
-## openclaw-honcho checklist
-
-Ordered by impact:
-
-- [ ] **Async prefetch** — move `session.context()` out of `before_prompt_build` into post-`agent_end` background Promise
-- [ ] **observe_me=True for agent peer** — one-line change in `session.addPeers()`
-- [ ] **Dynamic reasoning level** — add helper; apply in `honcho_recall` and `honcho_analyze`; add `dialecticReasoningLevel` to config
-- [ ] **Per-peer memory modes** — add `userMemoryMode` / `agentMemoryMode` to config; gate Honcho sync and local writes
-- [ ] **seedAiIdentity()** — add helper; use during setup migration for SOUL.md / IDENTITY.md
-- [ ] **Session naming strategies** — add `sessionStrategy`, `sessions` map, `sessionPeerPrefix`
-- [ ] **CLI surface injection** — append command reference to `before_prompt_build` return value
-- [ ] **honcho identity subcommand** — seed from file or `--show` current representation
-- [ ] **AI peer name injection** — if `aiPeer` name configured, prepend to injected system prompt
-- [ ] **honcho mode / sessions / map** — CLI parity with Hermes
-
-Already done in openclaw-honcho (do not re-implement): `lastSavedIndex` dedup, platform metadata stripping, multi-agent parent observer, `peerPerspective` on `context()`, tiered tool surface, workspace `agentPeerMap`, QMD passthrough, self-hosted Honcho.
-
---
-
-## nanobot-honcho checklist
-
-Greenfield integration. Start from openclaw-honcho's architecture and apply all Hermes patterns from day one.
-
-### Phase 1 — core correctness
-
-- [ ] Dual peer model (owner + agent peer), both with `observe_me=True`
-- [ ] Message capture at turn end with `lastSavedIndex` dedup
-- [ ] Platform metadata stripping before Honcho storage
-- [ ] Async prefetch from day one — do not implement blocking context injection
-- [ ] Legacy file migration at first activation (USER.md → owner peer, SOUL.md → `seedAiIdentity()`)
-
-### Phase 2 — configuration
-
-- [ ] Config schema: `apiKey`, `workspaceId`, `baseUrl`, `memoryMode`, `userMemoryMode`, `agentMemoryMode`, `dialecticReasoningLevel`, `sessionStrategy`, `sessions`
-- [ ] Per-peer memory mode gating
-- [ ] Dynamic reasoning level
-- [ ] Session naming strategies
-
-### Phase 3 — tools and CLI
-
-- [ ] Tool surface: `honcho_profile`, `honcho_recall`, `honcho_analyze`, `honcho_search`, `honcho_context`
-- [ ] CLI: `setup`, `status`, `sessions`, `map`, `mode`, `identity`
-- [ ] CLI surface injection into system prompt
-- [ ] AI peer name wired into agent identity
@@ -1,142 +0,0 @@
-# Migrating from OpenClaw to Hermes Agent
-
-This guide covers how to import your OpenClaw settings, memories, skills, and API keys into Hermes Agent.
-
-## Three Ways to Migrate
-
-### 1. Automatic (during first-time setup)
-
-When you run `hermes setup` for the first time and Hermes detects `~/.openclaw`, it automatically offers to import your OpenClaw data before configuration begins. Just accept the prompt and everything is handled for you.
-
-### 2. CLI Command (quick, scriptable)
-
-```bash
-hermes claw migrate                      # Preview then migrate (always shows preview first)
-hermes claw migrate --dry-run            # Preview only, no changes
-hermes claw migrate --preset user-data   # Migrate without API keys/secrets
-hermes claw migrate --yes                # Skip confirmation prompt
-```
-
-The migration always shows a full preview of what will be imported before making any changes. You review the preview and confirm before anything is written.
-
-**All options:**
-
-| Flag | Description |
-|------|-------------|
-| `--source PATH` | Path to OpenClaw directory (default: `~/.openclaw`) |
-| `--dry-run` | Preview only — no files are modified |
-| `--preset {user-data,full}` | Migration preset (default: `full`). `user-data` excludes secrets |
-| `--overwrite` | Overwrite existing files (default: skip conflicts) |
-| `--migrate-secrets` | Include allowlisted secrets (auto-enabled with `full` preset) |
-| `--workspace-target PATH` | Copy workspace instructions (AGENTS.md) to this absolute path |
-| `--skill-conflict {skip,overwrite,rename}` | How to handle skill name conflicts (default: `skip`) |
-| `--yes`, `-y` | Skip confirmation prompts |
-
-### 3. Agent-Guided (interactive, with previews)
-
-Ask the agent to run the migration for you:
-
-```
-> Migrate my OpenClaw setup to Hermes
-```
-
-The agent will use the `openclaw-migration` skill to:
-1. Run a preview first to show what would change
-2. Ask about conflict resolution (SOUL.md, skills, etc.)
-3. Let you choose between `user-data` and `full` presets
-4. Execute the migration with your choices
-5. Print a detailed summary of what was migrated
-
-## What Gets Migrated
-
-### `user-data` preset
-| Item | Source | Destination |
-|------|--------|-------------|
-| SOUL.md | `~/.openclaw/workspace/SOUL.md` | `~/.hermes/SOUL.md` |
-| Memory entries | `~/.openclaw/workspace/MEMORY.md` | `~/.hermes/memories/MEMORY.md` |
-| User profile | `~/.openclaw/workspace/USER.md` | `~/.hermes/memories/USER.md` |
-| Skills | `~/.openclaw/workspace/skills/` | `~/.hermes/skills/openclaw-imports/` |
-| Command allowlist | `~/.openclaw/workspace/exec_approval_patterns.yaml` | Merged into `~/.hermes/config.yaml` |
-| Messaging settings | `~/.openclaw/config.yaml` (TELEGRAM_ALLOWED_USERS, MESSAGING_CWD) | `~/.hermes/.env` |
-| TTS assets | `~/.openclaw/workspace/tts/` | `~/.hermes/tts/` |
-
-Workspace files are also checked at `workspace.default/` and `workspace-main/` as fallback paths (OpenClaw renamed `workspace/` to `workspace-main/` in recent versions).
-
-### `full` preset (adds to `user-data`)
-| Item | Source | Destination |
-|------|--------|-------------|
-| Telegram bot token | `openclaw.json` channels config | `~/.hermes/.env` |
-| OpenRouter API key | `.env`, `openclaw.json`, or `openclaw.json["env"]` | `~/.hermes/.env` |
-| OpenAI API key | `.env`, `openclaw.json`, or `openclaw.json["env"]` | `~/.hermes/.env` |
-| Anthropic API key | `.env`, `openclaw.json`, or `openclaw.json["env"]` | `~/.hermes/.env` |
-| ElevenLabs API key | `.env`, `openclaw.json`, or `openclaw.json["env"]` | `~/.hermes/.env` |
-
-API keys are searched across four sources: inline config values, `~/.openclaw/.env`, the `openclaw.json` `"env"` sub-object, and per-agent auth profiles.
-
-Only allowlisted secrets are ever imported. Other credentials are skipped and reported.
-
-## OpenClaw Schema Compatibility
-
-The migration handles both old and current OpenClaw config layouts:
-
-- **Channel tokens**: Reads from flat paths (`channels.telegram.botToken`) and the newer `accounts.default` layout (`channels.telegram.accounts.default.botToken`)
-- **TTS provider**: OpenClaw renamed "edge" to "microsoft" — both are recognized and mapped to Hermes' "edge"
-- **Provider API types**: Both short (`openai`, `anthropic`) and hyphenated (`openai-completions`, `anthropic-messages`, `google-generative-ai`) values are mapped correctly
-- **thinkingDefault**: All enum values are handled including newer ones (`minimal`, `xhigh`, `adaptive`)
-- **Matrix**: Uses `accessToken` field (not `botToken`)
-- **SecretRef formats**: Plain strings, env templates (`${VAR}`), and `source: "env"` SecretRefs are resolved. `source: "file"` and `source: "exec"` SecretRefs produce a warning — add those keys manually after migration.
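A sketch of how the three resolvable SecretRef forms might be handled. It assumes a SecretRef arrives as a dict with a `source` field and a hypothetical `id` field naming the environment variable; OpenClaw's actual field names may differ. `resolve_secret` is not a real Hermes function.

```python
import os
import re
import warnings
from typing import Mapping, Optional, Union

_TEMPLATE = re.compile(r"^\$\{([A-Za-z_][A-Za-z0-9_]*)\}$")


def resolve_secret(value: Union[str, dict],
                   env: Mapping[str, str] = os.environ) -> Optional[str]:
    if isinstance(value, str):
        match = _TEMPLATE.match(value)
        if match:                     # env template "${VAR}"
            return env.get(match.group(1))
        return value                  # plain string
    source = value.get("source")
    if source == "env":
        return env.get(value.get("id", ""))  # "id" field name is an assumption
    if source in ("file", "exec"):
        warnings.warn(f"SecretRef source {source!r} cannot be resolved; add the key manually")
        return None
    return None
```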
-
-## Conflict Handling
-
-By default, the migration **will not overwrite** existing Hermes data:
-
-- **SOUL.md** — skipped if one already exists in `~/.hermes/`
-- **Memory entries** — skipped if memories already exist (to avoid duplicates)
-- **Skills** — skipped if a skill with the same name already exists
-- **API keys** — skipped if the key is already set in `~/.hermes/.env`
-
-To overwrite conflicts, use `--overwrite`. The migration creates backups before overwriting.
-
-For skills, you can also use `--skill-conflict rename` to import conflicting skills under a new name (e.g., `skill-name-imported`).
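The `--skill-conflict` policies could map to a destination name as below. `plan_skill_import` is hypothetical, and the numeric suffix used when `<name>-imported` is itself taken is an assumption, not documented migration behavior.

```python
from typing import Optional


def plan_skill_import(name: str, existing: set[str], policy: str = "skip") -> Optional[str]:
    """Return the destination skill name, or None to skip the import."""
    if name not in existing:
        return name
    if policy == "skip":
        return None
    if policy == "overwrite":
        return name  # the caller is expected to back up the existing skill first
    if policy == "rename":
        candidate = f"{name}-imported"
        n = 2
        while candidate in existing:  # hypothetical tie-break if the renamed slot is taken
            candidate = f"{name}-imported-{n}"
            n += 1
        return candidate
    raise ValueError(f"unknown skill conflict policy: {policy}")
```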
-
-## Migration Report
-
-Every migration produces a report showing:
-- **Migrated items** — what was successfully imported
-- **Conflicts** — items skipped because they already exist
-- **Skipped items** — items not found in the source
-- **Errors** — items that failed to import
-
-For executed migrations, the full report is saved to `~/.hermes/migration/openclaw/<timestamp>/`.
-
-## Post-Migration Notes
-
-- **Skills require a new session** — imported skills take effect after restarting your agent or starting a new chat.
-- **WhatsApp requires re-pairing** — WhatsApp uses QR-code pairing, not token-based auth. Run `hermes whatsapp` to pair.
-- **Archive cleanup** — after migration, you'll be offered the option to rename `~/.openclaw/` to `.openclaw.pre-migration/` to prevent state confusion. You can also run `hermes claw cleanup` later.
-
-## Troubleshooting
-
-### "OpenClaw directory not found"
-The migration looks for `~/.openclaw` by default, then tries `~/.clawdbot` and `~/.moltbot`. If your OpenClaw is installed elsewhere, use `--source`:
-```bash
-hermes claw migrate --source /path/to/.openclaw
-```
-
-### "Migration script not found"
-The migration script ships with Hermes Agent. If you installed via pip (not git clone), the `optional-skills/` directory may not be present. Install the skill from the Skills Hub:
-```bash
-hermes skills install openclaw-migration
-```
-
-### Memory overflow
-If your OpenClaw MEMORY.md or USER.md exceeds Hermes' character limits, excess entries are exported to an overflow file in the migration report directory. You can manually review and add the most important ones.
-
-### API keys not found
-Keys might be stored in different places depending on your OpenClaw setup:
-- `~/.openclaw/.env` file
-- Inline in `openclaw.json` under `models.providers.*.apiKey`
-- In `openclaw.json` under the `"env"` or `"env.vars"` sub-objects
-- In `~/.openclaw/agents/main/agent/auth-profiles.json`
-
-The migration checks all four. If keys use `source: "file"` or `source: "exec"` SecretRefs, they can't be resolved automatically — add them via `hermes config set`.
@@ -1,608 +0,0 @@
-# Pricing Accuracy Architecture
-
-Date: 2026-03-16
-
-## Goal
-
-Hermes should only show dollar costs when they are backed by an official source for the user's actual billing path.
-
-This design replaces the current static, heuristic pricing flow in:
-
-- `run_agent.py`
-- `agent/usage_pricing.py`
-- `agent/insights.py`
-- `cli.py`
-
-with a provider-aware pricing system that:
-
-- handles cache billing correctly
-- distinguishes `actual` vs `estimated` vs `included` vs `unknown`
-- reconciles post-hoc costs when providers expose authoritative billing data
-- supports direct providers, OpenRouter, subscriptions, enterprise pricing, and custom endpoints
-
-## Problems In The Current Design
-
-Current Hermes behavior has four structural issues:
-
-1. It stores only `prompt_tokens` and `completion_tokens`, which is insufficient for providers that bill cache reads and cache writes separately.
-2. It uses a static model price table and fuzzy heuristics, which can drift from current official pricing.
-3. It assumes public API list pricing matches the user's real billing path.
-4. It has no distinction between live estimates and reconciled billed cost.
-
-## Design Principles
-
-1. Normalize usage before pricing.
-2. Never fold cached tokens into plain input cost.
-3. Track certainty explicitly.
-4. Treat the billing path as part of the model identity.
-5. Prefer official machine-readable sources over scraped docs.
-6. Use post-hoc provider cost APIs when available.
-7. Show `n/a` rather than inventing precision.
-
-## High-Level Architecture
-
-The new system has four layers:
-
-1. `usage_normalization`
-   Converts raw provider usage into a canonical usage record.
-2. `pricing_source_resolution`
-   Determines the billing path, source of truth, and applicable pricing source.
-3. `cost_estimation_and_reconciliation`
-   Produces an immediate estimate when possible, then replaces or annotates it with actual billed cost later.
-4. `presentation`
-   `/usage`, `/insights`, and the status bar display cost with certainty metadata.
-
-## Canonical Usage Record
-
-Add a canonical usage model that every provider path maps into before any pricing math happens.
-
-Suggested structure:
-
-```python
-from dataclasses import dataclass
-from typing import Any
-
-@dataclass
-class CanonicalUsage:
-    provider: str
-    billing_provider: str
-    model: str
-    billing_route: str
-
-    input_tokens: int = 0
-    output_tokens: int = 0
-    cache_read_tokens: int = 0
-    cache_write_tokens: int = 0
-    reasoning_tokens: int = 0
-    request_count: int = 1
-
-    raw_usage: dict[str, Any] | None = None
-    raw_usage_fields: dict[str, str] | None = None
-    computed_fields: set[str] | None = None
-
-    provider_request_id: str | None = None
-    provider_generation_id: str | None = None
-    provider_response_id: str | None = None
-```
-
-Rules:
-
-- `input_tokens` means non-cached input only.
-- `cache_read_tokens` and `cache_write_tokens` are never merged into `input_tokens`.
-- `output_tokens` excludes cache metrics.
-- `reasoning_tokens` is telemetry unless a provider officially bills it separately.
-
-This is the same normalization pattern used by `opencode`, extended with provenance and reconciliation ids.
-
-## Provider Normalization Rules
-
-### OpenAI Direct
-
-Source usage fields:
-
-- `prompt_tokens`
-- `completion_tokens`
-- `prompt_tokens_details.cached_tokens`
-
-Normalization:
-
-- `cache_read_tokens = cached_tokens`
-- `input_tokens = prompt_tokens - cached_tokens`
-- `cache_write_tokens = 0` unless OpenAI exposes it in the relevant route
-- `output_tokens = completion_tokens`
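The OpenAI mapping written as code. `TokenBuckets` is a trimmed, hypothetical stand-in for `CanonicalUsage` that keeps only the four token buckets. The input is assumed to be the response's `usage` object already converted to a dict.

```python
from dataclasses import dataclass


@dataclass
class TokenBuckets:
    input_tokens: int = 0
    output_tokens: int = 0
    cache_read_tokens: int = 0
    cache_write_tokens: int = 0


def normalize_openai_usage(usage: dict) -> TokenBuckets:
    prompt = usage.get("prompt_tokens") or 0
    cached = (usage.get("prompt_tokens_details") or {}).get("cached_tokens") or 0
    return TokenBuckets(
        input_tokens=prompt - cached,                   # non-cached input only
        output_tokens=usage.get("completion_tokens") or 0,
        cache_read_tokens=cached,
        cache_write_tokens=0,                           # no cache-write count on this route
    )
```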
-
-### Anthropic Direct
-
-Source usage fields:
-
-- `input_tokens`
-- `output_tokens`
-- `cache_read_input_tokens`
-- `cache_creation_input_tokens`
-
-Normalization:
-
-- `input_tokens = input_tokens`
-- `output_tokens = output_tokens`
-- `cache_read_tokens = cache_read_input_tokens`
-- `cache_write_tokens = cache_creation_input_tokens`
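The Anthropic mapping needs no subtraction because Anthropic reports `input_tokens` separately from the two cache buckets. As before, `TokenBuckets` is a hypothetical stand-in for `CanonicalUsage`.

```python
from dataclasses import dataclass


@dataclass
class TokenBuckets:
    input_tokens: int = 0
    output_tokens: int = 0
    cache_read_tokens: int = 0
    cache_write_tokens: int = 0


def normalize_anthropic_usage(usage: dict) -> TokenBuckets:
    # Each field maps one-to-one; cache reads and writes stay in their own buckets.
    return TokenBuckets(
        input_tokens=usage.get("input_tokens") or 0,
        output_tokens=usage.get("output_tokens") or 0,
        cache_read_tokens=usage.get("cache_read_input_tokens") or 0,
        cache_write_tokens=usage.get("cache_creation_input_tokens") or 0,
    )
```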
-
-### OpenRouter
-
-Estimate-time usage normalization should use the response usage payload with the same rules as the underlying provider when possible.
-
-Reconciliation-time records should also store:
-
-- OpenRouter generation id
-- native token fields when available
-- `total_cost`
-- `cache_discount`
-- `upstream_inference_cost`
-- `is_byok`
-
-### Gemini / Vertex
-
-Use official Gemini or Vertex usage fields where available.
-
-If cached content tokens are exposed:
-
-- map them to `cache_read_tokens`
-
-If a route exposes no cache creation metric:
-
-- store `cache_write_tokens = 0`
-- preserve the raw usage payload for later extension
-
-### DeepSeek And Other Direct Providers
-
-Normalize only the fields that are officially exposed.
-
-If a provider does not expose cache buckets:
-
-- do not infer them unless the provider explicitly documents how to derive them
-
-### Subscription / Included-Cost Routes
-
-These still use the canonical usage model.
-
-Tokens are tracked normally. Cost depends on billing mode, not on whether usage exists.
-
-## Billing Route Model
-
-Hermes must stop keying pricing solely by `model`.
-
-Introduce a billing route descriptor:
-
-```python
-from dataclasses import dataclass
-
-@dataclass
-class BillingRoute:
-    provider: str
-    base_url: str | None
-    model: str
-    billing_mode: str
-    organization_hint: str | None = None
-```
-
-`billing_mode` values:
-
-- `official_cost_api`
-- `official_generation_api`
-- `official_models_api`
-- `official_docs_snapshot`
-- `subscription_included`
-- `user_override`
-- `custom_contract`
-- `unknown`
-
-Examples:
-
-- OpenAI direct API with Costs API access: `official_cost_api`
-- Anthropic direct API with Usage & Cost API access: `official_cost_api`
-- OpenRouter request before reconciliation: `official_models_api`
-- OpenRouter request after generation lookup: `official_generation_api`
-- GitHub Copilot-style subscription route: `subscription_included`
-- local OpenAI-compatible server: `unknown`
-- enterprise contract with configured rates: `custom_contract`
-
-## Cost Status Model
-
-Every displayed cost should have:
-
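-```python
-from dataclasses import dataclass
-from datetime import datetime
-from decimal import Decimal
-from typing import Literal
-
-@dataclass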
-class CostResult:
-    amount_usd: Decimal | None
-    status: Literal["actual", "estimated", "included", "unknown"]
-    source: Literal[
-        "provider_cost_api",
-        "provider_generation_api",
-        "provider_models_api",
-        "official_docs_snapshot",
-        "user_override",
-        "custom_contract",
-        "none",
-    ]
-    label: str
-    fetched_at: datetime | None
-    pricing_version: str | None
-    notes: list[str]
-```
-
-Presentation rules:
-
-- `actual`: show dollar amount as final
-- `estimated`: show dollar amount with estimate labeling
-- `included`: show `included` or `$0.00 (included)` depending on UX choice
-- `unknown`: show `n/a`
-
-## Official Source Hierarchy
-
-Resolve cost using this order:
-
-1. Request-level or account-level official billed cost
-2. Official machine-readable model pricing
-3. Official docs snapshot
-4. User override or custom contract
-5. Unknown
-
-The system must never skip to a lower level if a higher-confidence source exists for the current billing route.
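The hierarchy can be expressed as an ordered scan. This sketch reuses the `source` names from `CostResult` and assumes each candidate amount is `None` when that source has nothing for the current billing route. `resolve_cost_source` is hypothetical.

```python
from decimal import Decimal
from typing import Mapping, Optional

SOURCE_ORDER = (
    "provider_cost_api",        # 1. request- or account-level billed cost
    "provider_generation_api",  # 1. per-generation billed cost
    "provider_models_api",      # 2. official machine-readable pricing
    "official_docs_snapshot",   # 3. official docs snapshot
    "user_override",            # 4. user override ...
    "custom_contract",          # 4. ... or custom contract
)


def resolve_cost_source(candidates: Mapping[str, Optional[Decimal]]) -> tuple[str, Optional[Decimal]]:
    """Return the highest-confidence source that has an amount, else ("none", None)."""
    for source in SOURCE_ORDER:
        amount = candidates.get(source)
        if amount is not None:
            return source, amount
    return "none", None
```

A `"none"` result corresponds to the `unknown` status and an `n/a` display.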
-
-## Provider-Specific Truth Rules
-
-### OpenAI Direct
-
-Preferred truth:
-
-1. Costs API for reconciled spend
-2. Official pricing page for live estimate
-
-### Anthropic Direct
-
-Preferred truth:
-
-1. Usage & Cost API for reconciled spend
-2. Official pricing docs for live estimate
-
-### OpenRouter
-
-Preferred truth:
-
-1. `GET /api/v1/generation` for reconciled `total_cost`
-2. `GET /api/v1/models` pricing for live estimate
-
-Do not use underlying provider public pricing as the source of truth for OpenRouter billing.
-
-### Gemini / Vertex
-
-Preferred truth:
-
-1. official billing export or billing API for reconciled spend when available for the route
-2. official pricing docs for estimate
-
-### DeepSeek
-
-Preferred truth:
-
-1. official machine-readable cost source if available in the future
-2. official pricing docs snapshot today
-
-### Subscription-Included Routes
-
-Preferred truth:
-
-1. explicit route config marking the model as included in subscription
-
-These should display `included`, not an API list-price estimate.
-
-### Custom Endpoint / Local Model
-
-Preferred truth:
-
-1. user override
-2. custom contract config
-3. unknown
-
-These should default to `unknown`.
-
-## Pricing Catalog
-
-Replace the current `MODEL_PRICING` dict with a richer pricing catalog.
-
-Suggested record:
-
-```python
-from dataclasses import dataclass
-from datetime import datetime
-from decimal import Decimal
-
-@dataclass
-class PricingEntry:
-    provider: str
-    route_pattern: str
-    model_pattern: str
-
-    input_cost_per_million: Decimal | None = None
-    output_cost_per_million: Decimal | None = None
-    cache_read_cost_per_million: Decimal | None = None
-    cache_write_cost_per_million: Decimal | None = None
-    request_cost: Decimal | None = None
-    image_cost: Decimal | None = None
-
-    source: str = "official_docs_snapshot"
-    source_url: str | None = None
-    fetched_at: datetime | None = None
-    pricing_version: str | None = None
-```
-
-The catalog should be route-aware:
-
-- `openai:gpt-5`
-- `anthropic:claude-opus-4-6`
-- `openrouter:anthropic/claude-opus-4.6`
-- `copilot:gpt-4o`
-
-This avoids conflating direct-provider billing with aggregator billing.
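A sketch of route-aware lookup plus per-bucket estimation with `Decimal`. The bucket names (`input`, `output`, `cache_read`, `cache_write`) and both helpers are hypothetical. Rates are per million tokens, as in `PricingEntry`. Overrides are checked before the catalog, and a bucket that has tokens but no rate yields `None` instead of being priced at the input rate.

```python
from decimal import Decimal
from typing import Mapping, Optional

MILLION = Decimal(1_000_000)
Rates = Mapping[str, Optional[Decimal]]


def lookup_rates(catalog: Mapping[str, Rates], overrides: Mapping[str, Rates],
                 provider: str, model: str) -> Optional[Rates]:
    key = f"{provider}:{model}"  # route-aware key, e.g. "openrouter:anthropic/claude-opus-4.6"
    if key in overrides:         # user overrides win for the matching route
        return overrides[key]
    return catalog.get(key)


def estimate_cost(tokens: Mapping[str, int], rates: Rates) -> Optional[Decimal]:
    total = Decimal(0)
    for bucket, count in tokens.items():
        if count == 0:
            continue
        rate = rates.get(bucket)
        if rate is None:
            return None  # never fold an unpriced bucket into the plain input rate
        total += rate * count / MILLION
    return total
```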
-
-## Pricing Sync Architecture
-
-Introduce a pricing sync subsystem instead of manually maintaining a single hardcoded table.
-
-Suggested modules:
-
-- `agent/pricing/catalog.py`
-- `agent/pricing/sources.py`
-- `agent/pricing/sync.py`
-- `agent/pricing/reconcile.py`
-- `agent/pricing/types.py`
-
-### Sync Sources
-
-- OpenRouter models API
-- official provider docs snapshots where no API exists
-- user overrides from config
-
-### Sync Output
-
-Cache pricing entries locally with:
-
-- source URL
-- fetch timestamp
-- version/hash
-- confidence/source type
-
-### Sync Frequency
-
-- startup warm cache
-- background refresh every 6 to 24 hours depending on source
-- manual `hermes pricing sync`
-
-## Reconciliation Architecture
-
-Live requests may produce only an estimate initially. Hermes should reconcile them later when a provider exposes actual billed cost.
-
-Suggested flow:
-
-1. Agent call completes.
-2. Hermes stores canonical usage plus reconciliation ids.
-3. Hermes computes an immediate estimate if a pricing source exists.
-4. A reconciliation worker fetches actual cost when supported.
-5. Session and message records are updated with `actual` cost.
-
-This can run:
-
-- inline for cheap lookups
-- asynchronously for delayed provider accounting
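The estimate-then-reconcile lifecycle for a single request might look like this. Field names follow the `estimated_cost_usd`, `actual_cost_usd` and `cost_status` fields proposed under Persistence Changes. `CostRecord` and its methods are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class CostRecord:
    request_id: str
    estimated_cost_usd: Optional[Decimal] = None
    actual_cost_usd: Optional[Decimal] = None
    cost_status: str = "unknown"

    def record_estimate(self, amount: Optional[Decimal]) -> None:
        """Step 3: store an immediate estimate when a pricing source exists."""
        self.estimated_cost_usd = amount
        if amount is not None and self.cost_status != "actual":
            self.cost_status = "estimated"

    def reconcile(self, billed: Decimal) -> None:
        """Steps 4-5: the billed amount becomes authoritative; the estimate is kept for comparison."""
        self.actual_cost_usd = billed
        self.cost_status = "actual"

    @property
    def display_amount(self) -> Optional[Decimal]:
        return self.actual_cost_usd if self.actual_cost_usd is not None else self.estimated_cost_usd
```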
-
-## Persistence Changes
-
-Session storage should stop storing only aggregate prompt/completion totals.
-
-Add fields for both usage and cost certainty:
-
-- `input_tokens`
-- `output_tokens`
-- `cache_read_tokens`
-- `cache_write_tokens`
-- `reasoning_tokens`
-- `estimated_cost_usd`
-- `actual_cost_usd`
-- `cost_status`
-- `cost_source`
-- `pricing_version`
-- `billing_provider`
-- `billing_mode`
-
-If schema expansion is too large for one PR, add a new pricing events table:
-
-```text
-session_cost_events
-  id
-  session_id
-  request_id
-  provider
-  model
-  billing_mode
-  input_tokens
-  output_tokens
-  cache_read_tokens
-  cache_write_tokens
-  estimated_cost_usd
-  actual_cost_usd
-  cost_status
-  cost_source
-  pricing_version
-  created_at
-  updated_at
-```
-
-## Hermes Touchpoints
-
-### `run_agent.py`
-
-Current responsibility:
-
-- parse raw provider usage
-- update session token counters
-
-New responsibility:
-
-- build `CanonicalUsage`
-- update canonical counters
-- store reconciliation ids
-- emit usage event to pricing subsystem
-
-### `agent/usage_pricing.py`
-
-Current responsibility:
-
-- static lookup table
-- direct cost arithmetic
-
-New responsibility:
-
-- move or replace with pricing catalog facade
-- no fuzzy model-family heuristics
-- no direct pricing without billing-route context
-
-### `cli.py`
-
-Current responsibility:
-
-- compute session cost directly from prompt/completion totals
-
-New responsibility:
-
-- display `CostResult`
-- show status badges:
-  - `actual`
-  - `estimated`
-  - `included`
-  - `n/a`
-
-### `agent/insights.py`
-
-Current responsibility:
-
-- recompute historical estimates from static pricing
-
-New responsibility:
-
-- aggregate stored pricing events
-- prefer actual cost over estimate
-- surface estimates only when reconciliation is unavailable
-
-## UX Rules
-
-### Status Bar
-
-Show one of:
-
- `$1.42`
- `~$1.42`
- `included`
- `cost n/a`
-
-Where:
-
-- `$1.42` means `actual`
-- `~$1.42` means `estimated`
-- `included` means subscription-backed or explicitly zero-cost route
-- `cost n/a` means unknown
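The status-bar rules as a formatter. Rounding to cents with `ROUND_HALF_UP` is an assumption, and `status_bar_cost` is a hypothetical name.

```python
from decimal import ROUND_HALF_UP, Decimal
from typing import Optional


def status_bar_cost(status: str, amount_usd: Optional[Decimal]) -> str:
    if status == "included":
        return "included"
    if status in ("actual", "estimated") and amount_usd is not None:
        text = f"${amount_usd.quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)}"
        return text if status == "actual" else "~" + text  # "~" marks an estimate
    return "cost n/a"  # unknown status, or no amount to show
```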
-
-### `/usage`
-
-Show:
-
-- token buckets
-- estimated cost
-- actual cost if available
-- cost status
-- pricing source
-
-### `/insights`
-
-Aggregate:
-
-- actual cost totals
-- estimated-only totals
-- unknown-cost sessions count
-- included-cost sessions count
-
-## Config And Overrides
-
-Add user-configurable pricing overrides in config:
-
-```yaml
-pricing:
-  mode: hybrid
-  sync_on_startup: true
-  sync_interval_hours: 12
-  overrides:
-    - provider: openrouter
-      model: anthropic/claude-opus-4.6
-      billing_mode: custom_contract
-      input_cost_per_million: 4.25
-      output_cost_per_million: 22.0
-      cache_read_cost_per_million: 0.5
-      cache_write_cost_per_million: 6.0
-  included_routes:
-    - provider: copilot
-      model: "*"
-    - provider: codex-subscription
-      model: "*"
-```
-
-Overrides must win over catalog defaults for the matching billing route.
-
-## Rollout Plan
-
-### Phase 1
-
-- add canonical usage model
-- split cache token buckets in `run_agent.py`
-- stop pricing cache-inflated prompt totals
-- preserve current UI with improved backend math
-
-### Phase 2
-
-- add route-aware pricing catalog
-- integrate OpenRouter models API sync
-- add `estimated` vs `included` vs `unknown`
-
-### Phase 3
-
-- add reconciliation for OpenRouter generation cost
-- add actual cost persistence
-- update `/insights` to prefer actual cost
-
-### Phase 4
-
-- add direct OpenAI and Anthropic reconciliation paths
-- add user overrides and contract pricing
-- add pricing sync CLI command
-
-## Testing Strategy
-
-Add tests for:
-
-- OpenAI cached token subtraction
-- Anthropic cache read/write separation
-- OpenRouter estimated vs actual reconciliation
-- subscription-backed models showing `included`
-- custom endpoints showing `n/a`
-- override precedence
-- stale catalog fallback behavior
-
-Current tests that assume heuristic pricing should be replaced with route-aware expectations.
-
-## Non-Goals
-
-- exact enterprise billing reconstruction without an official source or user override
-- backfilling perfect historical cost for old sessions that lack cache bucket data
-- scraping arbitrary provider web pages at request time
-
-## Recommendation
-
-Do not expand the existing `MODEL_PRICING` dict.
-
-That path cannot satisfy the product requirement. Hermes should instead migrate to:
-
-- canonical usage normalization
-- route-aware pricing sources
-- estimate-then-reconcile cost lifecycle
-- explicit certainty states in the UI
-
-This is the minimum architecture that makes the statement "Hermes pricing is backed by official sources where possible, and otherwise clearly labeled" defensible.
@@ -1,108 +0,0 @@
-# Ink Gateway TUI Migration — Post-mortem
-
-Planned: 2026-04-01 · Delivered: 2026-04 · Status: shipped, classic (prompt_toolkit) CLI still present
-
-## What Shipped
-
-Three layers, same repo, Python runtime unchanged.
-
-```
-ui-tui (Node/TS)  ──stdio JSON-RPC──▶  tui_gateway (Py)  ──▶  AIAgent (run_agent.py)
-```
-
-### Backend — `tui_gateway/`
-
-```
-tui_gateway/
-├── entry.py          # subprocess entrypoint, stdio read/write loop
-├── server.py         # everything: sessions dict, @method handlers, _emit
-├── render.py         # stream renderer, diff rendering, message rendering
-├── slash_worker.py   # subprocess that runs hermes_cli slash commands
-└── __init__.py
-```
-
-`server.py` owns the full runtime-control surface: session store (`_sessions: dict[str, dict]`), method registry (`@method("…")` decorator), event emitter (`_emit`), agent lifecycle (`_make_agent`, `_init_session`, `_wire_callbacks`), approval/sudo/clarify round-trips, and JSON-RPC dispatch.
-
-Protocol methods (`@method(...)` in `server.py`):
-
-- session: `session.{create, resume, list, close, interrupt, usage, history, compress, branch, title, save, undo}`
-- prompt: `prompt.{submit, background, btw}`
-- tools: `tools.{list, show, configure}`
-- slash: `slash.exec`, `command.{dispatch, resolve}`, `commands.catalog`, `complete.{path, slash}`
-- approvals: `approval.respond`, `sudo.respond`, `clarify.respond`, `secret.respond`
-- config/state: `config.{get, set, show}`, `model.options`, `reload.mcp`
-- ops: `shell.exec`, `cli.exec`, `terminal.resize`, `input.detect_drop`, `clipboard.paste`, `paste.collapse`, `image.attach`, `process.stop`
-- misc: `agents.list`, `skills.manage`, `plugins.list`, `cron.manage`, `insights.get`, `rollback.{list, diff, restore}`, `browser.manage`
-
-Protocol events (`_emit(…)` → handled in `ui-tui/src/app/createGatewayEventHandler.ts`):
-
-- lifecycle: `gateway.{ready, stderr}`, `session.info`, `skin.changed`
-- stream: `message.{start, delta, complete}`, `thinking.delta`, `reasoning.{delta, available}`, `status.update`
-- tools: `tool.{start, progress, complete, generating}`, `subagent.{start, thinking, tool, progress, complete}`
-- interactive: `approval.request`, `sudo.request`, `clarify.request`, `secret.request`
-- async: `background.complete`, `btw.complete`, `error`
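A minimal sketch of the `@method` registry and dispatch shape described above. It assumes newline-delimited JSON-RPC 2.0 over stdio. The event envelope used by `_emit` and the body of the `session.title` handler are hypothetical; only the method name comes from the list above. `-32601` is JSON-RPC 2.0's standard "Method not found" error code.

```python
import json
import sys
from typing import Any, Callable

_methods: dict[str, Callable[[dict], Any]] = {}


def method(name: str):
    """Register a handler under a JSON-RPC method name."""
    def register(fn: Callable[[dict], Any]) -> Callable[[dict], Any]:
        _methods[name] = fn
        return fn
    return register


def _emit(event: str, payload: dict, out=sys.stdout) -> None:
    # Hypothetical event envelope: one JSON object per line on stdout.
    out.write(json.dumps({"jsonrpc": "2.0", "method": "event",
                          "params": {"type": event, "payload": payload}}) + "\n")
    out.flush()


def dispatch(line: str) -> dict:
    """Decode one request line and route it to its registered handler."""
    req = json.loads(line)
    fn = _methods.get(req.get("method"))
    if fn is None:
        return {"jsonrpc": "2.0", "id": req.get("id"),
                "error": {"code": -32601, "message": "Method not found"}}
    return {"jsonrpc": "2.0", "id": req.get("id"), "result": fn(req.get("params") or {})}


@method("session.title")
def session_title(params: dict) -> dict:
    return {"title": params.get("title", "untitled")}  # illustrative body only
```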
-
-### Frontend — `ui-tui/src/`
-
-```
-src/
-├── entry.tsx            # node bootstrap: bootBanner → spawn python → dynamic-import Ink → render(<App/>)
-├── app.tsx              # <GatewayProvider> wraps <AppLayout>
-├── bootBanner.ts        # raw-ANSI banner to stdout in ~2ms, pre-React
-├── gatewayClient.ts     # JSON-RPC client over child_process stdio
-├── gatewayTypes.ts      # typed RPC responses + GatewayEvent union
-├── theme.ts             # DEFAULT_THEME + fromSkin
-│
-├── app/                 # hooks + stores — the orchestration layer
-│   ├── uiStore.ts             # nanostore: sid, info, busy, usage, theme, status…
-│   ├── turnStore.ts           # nanostore: per-turn activity / reasoning / tools
-│   ├── turnController.ts      # imperative singleton for stream-time operations
-│   ├── overlayStore.ts        # nanostore: modal/overlay state
-│   ├── useMainApp.ts          # top-level composition hook
-│   ├── useSessionLifecycle.ts # session.create/resume/close/reset
-│   ├── useSubmission.ts       # shell/slash/prompt dispatch + interpolation
-│   ├── useConfigSync.ts       # config.get + mtime poll
-│   ├── useComposerState.ts    # input buffer, paste snippets, editor mode
-│   ├── useInputHandlers.ts    # key bindings
-│   ├── createGatewayEventHandler.ts  # event-stream dispatcher
-│   ├── createSlashHandler.ts         # slash command router (registry + python fallback)
-│   └── slash/commands/        # core.ts, ops.ts, session.ts — TS-owned slash commands
-│
-├── components/          # AppLayout, AppChrome, AppOverlays, MessageLine, Thinking, Markdown, pickers, prompts, Banner, SessionPanel
-├── config/              # env, limits, timing constants
-├── content/             # charms, faces, fortunes, hotkeys, placeholders, verbs
-├── domain/              # details, messages, paths, roles, slash, usage, viewport
-├── protocol/            # interpolation, paste regex
-├── hooks/               # useCompletion, useInputHistory, useQueue, useVirtualHistory
-└── lib/                 # history, messages, osc52, rpc, text
-```
-
-### CLI entry points — `hermes_cli/main.py`
-
- `hermes --tui`      → `node dist/entry.js` (auto-builds when `.ts`/`.tsx` newer than `dist/entry.js`)
- `hermes --tui --dev` → `tsx src/entry.tsx` (skip build)
- `HERMES_TUI_DIR=…`  → external prebuilt dist (nix, distro packaging)
-
-## Diverged From Original Plan
-
-| Plan | Reality | Why |
-|---|---|---|
-| `tui_gateway/{controller,session_state,events,protocol}.py` | all collapsed into `server.py` | no second consumer ever emerged, keeping one file cheaper than four |
-| `ui-tui/src/main.tsx` | split into `entry.tsx` (bootstrap) + `app.tsx` (shell) | boot banner + early python spawn wanted a pre-React moment |
-| `ui-tui/src/state/store.ts` | three nanostores (`uiStore`, `turnStore`, `overlayStore`) | separate lifetimes: ui persists, turn resets per reply, overlay is modal |
-| `approval.requested` / `sudo.requested` / `clarify.requested` | `*.request` (no `-ed`) | cosmetic |
-| `session.cancel` | dropped | `session.interrupt` covers it |
-| `HERMES_EXPERIMENTAL_TUI=1`, `display.experimental_tui: true`, `/tui on/off/status` | none shipped | `--tui` went from opt-in to first-class without an experimental phase |
-
-## Post-migration Additions (not in original plan)
-
- **Async `session.create`** — returns sid in ~1ms, agent builds on a background thread, `session.info` broadcasts when ready; `_wait_agent()` gates every agent-touching handler via `_sess`
- **`bootBanner`** — raw-ANSI logo painted to stdout at T≈2ms, before Ink loads; `<AlternateScreen>` wipes it seamlessly when React mounts
- **Selection uniform bg** — `theme.color.selectionBg` wired via `useSelection().setSelectionBgColor`; replaces SGR-inverse per-cell swap that fragmented over amber/gold fg
- **Slash command registry** — TS-owned commands in `app/slash/commands/{core,ops,session}.ts`, everything else falls through to `slash.exec` (python worker)
- **Turn store + controller split** — imperative singleton (`turnController`) holds refs/timers, nanostore (`turnStore`) holds render-visible state
-
-## What's Still Open
-
- **Classic CLI not deleted.** `cli.py` still has ~80 `prompt_toolkit` references; classic REPL is still the default when `--tui` is absent. The original plan's "Cut 4 · prompt_toolkit removal later" hasn't happened.
- **No config-file opt-in.** `HERMES_EXPERIMENTAL_TUI` and `display.experimental_tui` were never built; only the CLI flag exists. Fine for now — if we want "default to TUI", a single line in `main.py` flips it.
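The `@method("…")` registry plus `_emit` split described above can be sketched in a few lines. This is a minimal, hypothetical illustration that assumes JSON-RPC 2.0 framing with one JSON object per stdio line; `dispatch`, `emit_event`, the `"event"` notification name, and the `session_id` result field are illustrative names, not the actual `tui_gateway/server.py` API.

```python
import json
from typing import Any, Callable, Dict

Handler = Callable[[dict], Any]

# Hypothetical registry, in the spirit of server.py's @method("…") decorator.
_METHODS: Dict[str, Handler] = {}


def method(name: str) -> Callable[[Handler], Handler]:
    """Register a handler under a dotted JSON-RPC method name."""
    def register(fn: Handler) -> Handler:
        _METHODS[name] = fn
        return fn
    return register


def dispatch(line: str) -> str:
    """Handle one request line read from stdin and return one response line."""
    req = json.loads(line)
    handler = _METHODS.get(req.get("method"))
    if handler is None:
        # -32601 is the JSON-RPC 2.0 "Method not found" error code.
        error = {"code": -32601, "message": f"Method not found: {req.get('method')}"}
        return json.dumps({"jsonrpc": "2.0", "id": req.get("id"), "error": error})
    result = handler(req.get("params") or {})
    return json.dumps({"jsonrpc": "2.0", "id": req.get("id"), "result": result})


def emit_event(event_type: str, payload: dict) -> str:
    """Build a server-initiated event: a notification with no "id", so no reply is expected."""
    return json.dumps({"jsonrpc": "2.0", "method": "event",
                       "params": {"type": event_type, "payload": payload}})


@method("session.create")
def session_create(params: dict) -> dict:
    # Hypothetical: return a session id immediately; agent setup happens elsewhere.
    return {"session_id": "demo"}
```

Keeping requests (which carry an `id`) and events (which do not) as distinct message shapes is what lets the Node side route replies to pending calls and everything else to an event handler.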
@@ -1,106 +0,0 @@
-# ============================================================================
-# Hermes Agent — Example Skin Template
-# ============================================================================
-#
-# Copy this file to ~/.hermes/skins/<name>.yaml to create a custom skin.
-# All fields are optional — missing values inherit from the default skin.
-# Activate with: /skin <name>  or  display.skin: <name> in config.yaml
-#
-# Keys are marked:
-#   (both)    — applies to both the classic CLI and the TUI
-#   (classic) — classic CLI only (see hermes --tui in user-guide/tui.md)
-#   (tui)     — TUI only
-#
-# See hermes_cli/skin_engine.py for the full schema reference.
-# ============================================================================
-
-# Required: unique skin name (used in /skin command and config)
-name: example
-description: An example custom skin — copy and modify this template
-
-# ── Colors ──────────────────────────────────────────────────────────────────
-# Hex color values. These control the visual palette.
-colors:
-  # Banner panel (the startup welcome box) — (both)
-  banner_border: "#CD7F32"        # Panel border
-  banner_title: "#FFD700"         # Panel title text
-  banner_accent: "#FFBF00"        # Section headers (Available Tools, Skills, etc.)
-  banner_dim: "#B8860B"           # Dim/muted text (separators, model info)
-  banner_text: "#FFF8DC"          # Body text (tool names, skill names)
-
-  # UI elements — (both)
-  ui_accent: "#FFBF00"            # General accent (falls back to banner_accent)
-  ui_label: "#4dd0e1"             # Labels
-  ui_ok: "#4caf50"                # Success indicators
-  ui_error: "#ef5350"             # Error indicators
-  ui_warn: "#ffa726"              # Warning indicators
-
-  # Input area
-  prompt: "#FFF8DC"               # Prompt text / `❯` glyph color (both)
-  input_rule: "#CD7F32"           # Horizontal rule above input (classic)
-
-  # Response box — (classic)
-  response_border: "#FFD700"      # Response box border
-
-  # Session display — (both)
-  session_label: "#DAA520"        # "Session: " label
-  session_border: "#8B8682"       # Session ID text
-
-  # TUI / CLI surfaces — (classic: status bar, voice badge, completion meta)
-  status_bar_bg: "#1a1a2e"              # Status / usage bar background (classic)
-  voice_status_bg: "#1a1a2e"            # Voice-mode badge background (classic)
-  completion_menu_bg: "#1a1a2e"         # Completion list background (both)
-  completion_menu_current_bg: "#333355" # Active completion row background (both)
-  completion_menu_meta_bg: "#1a1a2e"    # Completion meta column bg (classic)
-  completion_menu_meta_current_bg: "#333355"  # Active meta bg (classic)
-
-  # Drag-to-select background — (tui)
-  selection_bg: "#3a3a55"               # Uniform selection highlight in the TUI
-
-# ── Spinner ─────────────────────────────────────────────────────────────────
-# (classic) — the TUI uses its own animated indicators; spinner config here
-# is only read by the classic prompt_toolkit CLI.
-spinner:
-  # Faces shown while waiting for the API response
-  waiting_faces:
-    - "(｡◕‿◕｡)"
-    - "(◕‿◕✿)"
-    - "٩(◕‿◕｡)۶"
-
-  # Faces shown during extended thinking/reasoning
-  thinking_faces:
-    - "(｡•́︿•̀｡)"
-    - "(◔_◔)"
-    - "(¬‿¬)"
-
-  # Verbs used in spinner messages (e.g., "pondering your request...")
-  thinking_verbs:
-    - "pondering"
-    - "contemplating"
-    - "musing"
-    - "ruminating"
-
-  # Optional: left/right decorations around the spinner
-  # Each entry is a [left, right] pair. Omit entirely for no wings.
-  # wings:
-  #   - ["⟪⚔", "⚔⟫"]
-  #   - ["⟪▲", "▲⟫"]
-
-# ── Branding ────────────────────────────────────────────────────────────────
-# Text strings used throughout the interface.
-branding:
-  agent_name: "Hermes Agent"                  # (both) Banner title, about display
-  welcome: "Welcome! Type your message or /help for commands."  # (both)
-  goodbye: "Goodbye! ⚕"                       # (both) Exit message
-  response_label: " ⚕ Hermes "                # (classic) Response box header label
-  prompt_symbol: "❯ "                          # (both) Input prompt glyph
-  help_header: "(^_^)? Available Commands"     # (both) /help overlay title
-
-# ── Tool Output ─────────────────────────────────────────────────────────────
-# Character used as the prefix for tool output lines. (both)
-# Default is "┊" (thin dotted vertical line). Some alternatives:
-#   "╎" (light triple dash vertical)
-#   "▏" (left one-eighth block)
-#   "│" (box drawing light vertical)
-#   "┃" (box drawing heavy vertical)
-tool_prefix: "┊"
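The template's rule that every field is optional and that missing values inherit from the default skin amounts to a recursive dictionary merge. A minimal sketch, assuming skins are loaded as plain nested dicts (for example by a YAML loader); `merge_skin` and the `DEFAULT`/`CUSTOM` values are hypothetical, and `hermes_cli/skin_engine.py` remains the authoritative implementation.

```python
from typing import Any, Dict


def merge_skin(default: Dict[str, Any], custom: Dict[str, Any]) -> Dict[str, Any]:
    """Overlay a custom skin on the default skin; unset keys keep default values."""
    merged: Dict[str, Any] = dict(default)  # shallow copy, so `default` is never mutated
    for key, value in custom.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested sections (colors, spinner, branding) merge key by key.
            merged[key] = merge_skin(merged[key], value)
        else:
            # Scalars and lists (e.g. waiting_faces) replace the default wholesale.
            merged[key] = value
    return merged


# Hypothetical default and custom skins, trimmed to a few keys.
DEFAULT = {
    "name": "default",
    "colors": {"banner_border": "#CD7F32", "prompt": "#FFF8DC"},
    "tool_prefix": "┊",
}
CUSTOM = {"name": "example", "colors": {"prompt": "#4dd0e1"}}

skin = merge_skin(DEFAULT, CUSTOM)
```

In this sketch lists are replaced rather than appended, so a skin that sets `thinking_verbs` supplies the complete verb list.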
@@ -1,329 +0,0 @@
-# Container-Aware CLI Review Fixes Spec
-
-**PR:** NousResearch/hermes-agent#7543
-**Review:** cursor[bot] bugbot review (4094049442) + two prior rounds
-**Date:** 2026-04-12
-**Branch:** `feat/container-aware-cli-clean`
-
-## Review Issues Summary
-
-Six issues were raised across three bugbot review rounds. Three were fixed in intermediate commits (38277a6a, 726cf90f). This spec addresses remaining design concerns surfaced by those reviews and simplifies the implementation based on interview decisions.
-
-| # | Issue | Severity | Status |
-|---|-------|----------|--------|
-| 1 | `os.execvp` retry loop unreachable | Medium | Fixed in 79e8cd12 (switched to subprocess.run) |
-| 2 | Redundant `shutil.which("sudo")` | Medium | Fixed in 38277a6a (reuses `sudo` var) |
-| 3 | Missing `chown -h` on symlink update | Low | Fixed in 38277a6a |
-| 4 | Container routing after `parse_args()` | High | Fixed in 726cf90f |
-| 5 | Hardcoded `/home/${user}` | Medium | Fixed in 726cf90f |
-| 6 | Group membership not gated on `container.enable` | Low | Fixed in 726cf90f |
-
-The mechanical fixes are in place but the overall design needs revision. The retry loop, error swallowing, and process model have deeper issues than what the bugbot flagged.
-
---
-
-## Spec: Revised `_exec_in_container`
-
-### Design Principles
-
-1. **Let it crash.** No silent fallbacks. If `.container-mode` exists but something goes wrong, the error propagates naturally (Python traceback). The only case where container routing is skipped is when `.container-mode` doesn't exist or `HERMES_DEV=1`.
-2. **No retries.** Probe once for sudo, exec once. If it fails, docker/podman's stderr reaches the user verbatim.
-3. **Completely transparent.** No error wrapping, no prefixes, no spinners. Docker's output goes straight through.
-4. **`os.execvp` on the happy path.** Replace the Python process entirely so there's no idle parent during interactive sessions. Note: `execvp` never returns on success (process is replaced) and raises `OSError` on failure (it does not return a value). The container process's exit code becomes the process exit code by definition — no explicit propagation needed.
-5. **One human-readable exception to "let it crash".** `subprocess.TimeoutExpired` from the sudo probe gets a specific catch with a readable message, since a raw traceback for "your Docker daemon is slow" is confusing. All other exceptions propagate naturally.
-
-### Execution Flow
-
-```
-1. get_container_exec_info()
-   - HERMES_DEV=1 → return None (skip routing)
-   - Inside container → return None (skip routing)
-   - .container-mode doesn't exist → return None (skip routing)
-   - .container-mode exists → parse and return dict
-   - .container-mode exists but malformed/unreadable → LET IT CRASH (no try/except)
-
-2. _exec_in_container(container_info, sys.argv[1:])
-   a. shutil.which(backend) → if None, print "{backend} not found on PATH" and sys.exit(1)
-   b. Sudo probe: subprocess.run([runtime, "inspect", "--format", "ok", container_name], timeout=15)
-      - If succeeds → needs_sudo = False
-      - If fails → try subprocess.run([sudo, "-n", runtime, "inspect", ...], timeout=15)
-        - If succeeds → needs_sudo = True
-        - If fails → print error with sudoers hint (including why -n is required) and sys.exit(1)
-      - If TimeoutExpired → catch specifically, print human-readable message about slow daemon
-   c. Build exec_cmd: [sudo? + runtime, "exec", tty_flags, "-u", exec_user, env_flags, container, hermes_bin, *cli_args]
-   d. os.execvp(exec_cmd[0], exec_cmd)
-      - On success: process is replaced — Python is gone, container exit code IS the process exit code
-      - On OSError: let it crash (natural traceback)
-```
-
-### Changes to `hermes_cli/main.py`
-
-#### `_exec_in_container` — rewrite
-
-Remove:
- The entire retry loop (`max_retries`, `for attempt in range(...)`)
- Spinner logic (`"Waiting for container..."`, dots)
- Exit code classification (125/126/127 handling)
- `subprocess.run` for the exec call (keep it only for the sudo probe)
- Special TTY vs non-TTY retry counts
- The `time` import (no longer needed)
-
-Change:
- Use `os.execvp(exec_cmd[0], exec_cmd)` as the final call
- Keep the `subprocess` import only for the sudo probe
- Keep TTY detection for the `-it` vs `-i` flag
- Keep env var forwarding (TERM, COLORTERM, LANG, LC_ALL)
- Keep the sudo probe as-is (it's the one "smart" part)
- Bump probe `timeout` from 5s to 15s — cold podman on a loaded machine needs headroom
- Catch `subprocess.TimeoutExpired` specifically on both probe calls — print a readable message about the daemon being unresponsive instead of a raw traceback
- Expand the sudoers hint error message to explain *why* `-n` (non-interactive) is required: a password prompt would hang the CLI or break piped commands
-
-The function becomes roughly:
-
-```python
-def _exec_in_container(container_info: dict, cli_args: list):
-    """Replace the current process with a command inside the managed container.
-
-    Probes whether sudo is needed (rootful containers), then os.execvp
-    into the container. If exec fails, the OS error propagates naturally.
-    """
-    import shutil
-    import subprocess
-
-    backend = container_info["backend"]
-    container_name = container_info["container_name"]
-    exec_user = container_info["exec_user"]
-    hermes_bin = container_info["hermes_bin"]
-
-    runtime = shutil.which(backend)
-    if not runtime:
-        print(f"Error: {backend} not found on PATH. Cannot route to container.",
-              file=sys.stderr)
-        sys.exit(1)
-
-    # Probe whether we need sudo to see the rootful container.
-    # Timeout is 15s — cold podman on a loaded machine can take a while.
-    # TimeoutExpired is caught specifically for a human-readable message;
-    # all other exceptions propagate naturally.
-    needs_sudo = False
-    sudo = None
-    try:
-        probe = subprocess.run(
-            [runtime, "inspect", "--format", "ok", container_name],
-            capture_output=True, text=True, timeout=15,
-        )
-    except subprocess.TimeoutExpired:
-        print(
-            f"Error: timed out waiting for {backend} to respond.\n"
-            f"The {backend} daemon may be unresponsive or starting up.",
-            file=sys.stderr,
-        )
-        sys.exit(1)
-
-    if probe.returncode != 0:
-        sudo = shutil.which("sudo")
-        if sudo:
-            try:
-                probe2 = subprocess.run(
-                    [sudo, "-n", runtime, "inspect", "--format", "ok", container_name],
-                    capture_output=True, text=True, timeout=15,
-                )
-            except subprocess.TimeoutExpired:
-                print(
-                    f"Error: timed out waiting for sudo {backend} to respond.",
-                    file=sys.stderr,
-                )
-                sys.exit(1)
-
-            if probe2.returncode == 0:
-                needs_sudo = True
-            else:
-                print(
-                    f"Error: container '{container_name}' not found via {backend}.\n"
-                    f"\n"
-                    f"The NixOS service runs the container as root. Your user cannot\n"
-                    f"see it because {backend} uses per-user namespaces.\n"
-                    f"\n"
-                    f"Fix: grant passwordless sudo for {backend}. The -n (non-interactive)\n"
-                    f"flag is required because the CLI calls sudo non-interactively —\n"
-                    f"a password prompt would hang or break piped commands:\n"
-                    f"\n"
-                    f'  security.sudo.extraRules = [{{\n'
-                    f'    users = [ "{os.getenv("USER", "your-user")}" ];\n'
-                    f'    commands = [{{ command = "{runtime}"; options = [ "NOPASSWD" ]; }}];\n'
-                    f'  }}];\n'
-                    f"\n"
-                    f"Or run: sudo hermes {' '.join(cli_args)}",
-                    file=sys.stderr,
-                )
-                sys.exit(1)
-        else:
-            print(
-                f"Error: container '{container_name}' not found via {backend}.\n"
-                f"The container may be running under root. Try: sudo hermes {' '.join(cli_args)}",
-                file=sys.stderr,
-            )
-            sys.exit(1)
-
-    is_tty = sys.stdin.isatty()
-    tty_flags = ["-it"] if is_tty else ["-i"]
-
-    env_flags = []
-    for var in ("TERM", "COLORTERM", "LANG", "LC_ALL"):
-        val = os.environ.get(var)
-        if val:
-            env_flags.extend(["-e", f"{var}={val}"])
-
-    cmd_prefix = [sudo, "-n", runtime] if needs_sudo else [runtime]
-    exec_cmd = (
-        cmd_prefix + ["exec"]
-        + tty_flags
-        + ["-u", exec_user]
-        + env_flags
-        + [container_name, hermes_bin]
-        + cli_args
-    )
-
-    # execvp replaces this process entirely — it never returns on success.
-    # On failure it raises OSError, which propagates naturally.
-    os.execvp(exec_cmd[0], exec_cmd)
-```
-
-#### Container routing call site in `main()` — remove try/except
-
-Current:
-```python
-try:
-    from hermes_cli.config import get_container_exec_info
-    container_info = get_container_exec_info()
-    if container_info:
-        _exec_in_container(container_info, sys.argv[1:])
-        sys.exit(1)  # exec failed if we reach here
-except SystemExit:
-    raise
-except Exception:
-    pass  # Container routing unavailable, proceed locally
-```
-
-Revised:
-```python
-from hermes_cli.config import get_container_exec_info
-container_info = get_container_exec_info()
-if container_info:
-    _exec_in_container(container_info, sys.argv[1:])
-    # Unreachable: os.execvp never returns on success (process is replaced)
-    # and raises OSError on failure (which propagates as a traceback).
-    # This line exists only as a defensive assertion.
-    sys.exit(1)
-```
-
-No try/except. If `.container-mode` doesn't exist, `get_container_exec_info()` returns `None` and we skip routing. If it exists but is broken, the exception propagates with a natural traceback.
-
-Note: `sys.exit(1)` after `_exec_in_container` is dead code in all paths — `os.execvp` either replaces the process or raises. It's kept as a belt-and-suspenders assertion with a comment marking it unreachable, not as actual error handling.
-
-### Changes to `hermes_cli/config.py`
-
-#### `get_container_exec_info` — remove inner try/except
-
-Current code catches `(OSError, IOError)` and returns `None`. This silently hides permission errors, corrupt files, etc.
-
-Change: Remove the try/except around file reading. Keep the early returns for `HERMES_DEV=1` and `_is_inside_container()`. The `FileNotFoundError` from `open()` when `.container-mode` doesn't exist should still return `None` (this is the "container mode not enabled" case). All other exceptions propagate.
-
-```python
-def get_container_exec_info() -> Optional[dict]:
-    if os.environ.get("HERMES_DEV") == "1":
-        return None
-    if _is_inside_container():
-        return None
-
-    container_mode_file = get_hermes_home() / ".container-mode"
-
-    try:
-        with open(container_mode_file, "r") as f:
-            # ... parse key=value lines ...
-    except FileNotFoundError:
-        return None
-    # All other exceptions (PermissionError, malformed data, etc.) propagate
-
-    return { ... }
-```
-
---
-
-## Spec: NixOS Module Changes
-
-### Symlink creation — simplify to two branches
-
-Current: 4 branches (symlink exists, directory exists, other file, doesn't exist).
-
-Revised: 2 branches.
-
-```bash
-if [ -d "${symlinkPath}" ] && [ ! -L "${symlinkPath}" ]; then
-  # Real directory — back it up, then create symlink
-  _backup="${symlinkPath}.bak.$(date +%s)"
-  echo "hermes-agent: backing up existing ${symlinkPath} to $_backup"
-  mv "${symlinkPath}" "$_backup"
-fi
-# For everything else (symlink, doesn't exist, etc.) — just force-create
-ln -sfn "${target}" "${symlinkPath}"
-chown -h ${user}:${cfg.group} "${symlinkPath}"
-```
-
-`ln -sfn` handles: existing symlink (replaces), doesn't exist (creates), and after the `mv` above (creates). The only case that needs special handling is a real directory, because `ln -sfn` cannot atomically replace a directory.
-
-Note: there is a theoretical race between the `[ -d ... ]` check and the `mv` (something could create/remove the directory in between). In practice this is a NixOS activation script running as root during `nixos-rebuild switch` — no other process should be touching `~/.hermes` at that moment. Not worth adding locking for.
-
-### Sudoers — document, don't auto-configure
-
-Do NOT add `security.sudo.extraRules` to the module. Document the sudoers requirement in the module's description/comments and in the error message the CLI prints when sudo probe fails.
-
-### Group membership gating — keep as-is
-
-The fix in 726cf90f (`cfg.container.enable && cfg.container.hostUsers != []`) is correct. Leftover group membership when container mode is disabled is harmless. No cleanup needed.
-
---
-
-## Spec: Test Rewrite
-
-The existing test file (`tests/hermes_cli/test_container_aware_cli.py`) has 16 tests. With the simplified exec model, several are obsolete.
-
-### Tests to keep (update as needed)
-
- `test_is_inside_container_dockerenv` — unchanged
- `test_is_inside_container_containerenv` — unchanged
- `test_is_inside_container_cgroup_docker` — unchanged
- `test_is_inside_container_false_on_host` — unchanged
- `test_get_container_exec_info_returns_metadata` — unchanged
- `test_get_container_exec_info_none_inside_container` — unchanged
- `test_get_container_exec_info_none_without_file` — unchanged
- `test_get_container_exec_info_skipped_when_hermes_dev` — unchanged
- `test_get_container_exec_info_not_skipped_when_hermes_dev_zero` — unchanged
- `test_get_container_exec_info_defaults` — unchanged
- `test_get_container_exec_info_docker_backend` — unchanged
-
-### Tests to add
-
- `test_get_container_exec_info_crashes_on_permission_error` — verify that `PermissionError` propagates (no silent `None` return)
- `test_exec_in_container_calls_execvp` — verify `os.execvp` is called with correct args (runtime, tty flags, user, env, container, binary, cli args)
- `test_exec_in_container_sudo_probe_sets_prefix` — verify that when first probe fails and sudo probe succeeds, `os.execvp` is called with `sudo -n` prefix
- `test_exec_in_container_no_runtime_hard_fails` — keep existing, verify `sys.exit(1)` when `shutil.which` returns None
- `test_exec_in_container_non_tty_uses_i_only` — update to check `os.execvp` args instead of `subprocess.run` args
- `test_exec_in_container_probe_timeout_prints_message` — verify that `subprocess.TimeoutExpired` from the probe produces a human-readable error and `sys.exit(1)`, not a raw traceback
- `test_exec_in_container_container_not_running_no_sudo` — verify the path where runtime exists (`shutil.which` returns a path) but probe returns non-zero and no sudo is available. Should print the "container may be running under root" error. This is distinct from `no_runtime_hard_fails` which covers `shutil.which` returning None.
-
-### Tests to delete
-
- `test_exec_in_container_tty_retries_on_container_failure` — retry loop removed
- `test_exec_in_container_non_tty_retries_silently_exits_126` — retry loop removed
- `test_exec_in_container_propagates_hermes_exit_code` — no subprocess.run to check exit codes; execvp replaces the process. Note: exit code propagation still works correctly — when `os.execvp` succeeds, the container's process *becomes* this process, so its exit code is the process exit code by OS semantics. No application code needed, no test needed. A comment in the function docstring documents this intent for future readers.
-
---
-
-## Out of Scope
-
- Auto-configuring sudoers rules in the NixOS module
- Any changes to `get_container_exec_info` parsing logic beyond the try/except narrowing
- Changes to `.container-mode` file format
- Changes to the `HERMES_DEV=1` bypass
- Changes to container detection logic (`_is_inside_container`)
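The exec-command assembly in the spec's `_exec_in_container` is easiest to unit-test (as `test_exec_in_container_calls_execvp` and `test_exec_in_container_sudo_probe_sets_prefix` intend) when it is factored into a pure function whose result is then handed to `os.execvp`. A sketch under that assumption; `build_exec_cmd` is a hypothetical helper whose argument order mirrors the spec's code.

```python
from typing import List, Mapping, Optional, Sequence

# Terminal/locale variables the spec forwards into the container.
FORWARDED_ENV = ("TERM", "COLORTERM", "LANG", "LC_ALL")


def build_exec_cmd(
    runtime: str,
    container_name: str,
    exec_user: str,
    hermes_bin: str,
    cli_args: Sequence[str],
    *,
    needs_sudo: bool,
    sudo: Optional[str],
    is_tty: bool,
    environ: Mapping[str, str],
) -> List[str]:
    """Return the argv that _exec_in_container would pass to os.execvp."""
    if needs_sudo and not sudo:
        raise ValueError("needs_sudo requires a sudo path")
    # -it allocates a TTY for interactive sessions; piped input only gets -i.
    tty_flags = ["-it"] if is_tty else ["-i"]
    env_flags: List[str] = []
    for var in FORWARDED_ENV:
        val = environ.get(var)
        if val:  # unset or empty variables are not forwarded
            env_flags.extend(["-e", f"{var}={val}"])
    # -n keeps sudo non-interactive so a password prompt can never hang the CLI.
    prefix = [sudo, "-n", runtime] if needs_sudo else [runtime]
    return (
        prefix + ["exec"] + tty_flags + ["-u", exec_user]
        + env_flags + [container_name, hermes_bin] + list(cli_args)
    )
```

The caller then does `cmd = build_exec_cmd(...)` followed by `os.execvp(cmd[0], cmd)`, which never returns on success, so tests only need to assert on the returned list.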
@@ -117,6 +117,160 @@ def _normalize_chat_content(
        return ""


+# Content part type aliases used by the OpenAI Chat Completions and Responses
+# APIs.  We accept both spellings on input and emit a single canonical internal
+# shape (``{"type": "text", ...}`` / ``{"type": "image_url", ...}``) that the
+# rest of the agent pipeline already understands.
+_TEXT_PART_TYPES = frozenset({"text", "input_text", "output_text"})
+_IMAGE_PART_TYPES = frozenset({"image_url", "input_image"})
+_FILE_PART_TYPES = frozenset({"file", "input_file"})
+
+
+def _normalize_multimodal_content(content: Any) -> Any:
+    """Validate and normalize multimodal content for the API server.
+
+    Returns a plain string when the content is text-only, or a list of
+    ``{"type": "text"|"image_url", ...}`` parts when images are present.
+    The output shape is the native OpenAI Chat Completions vision format,
+    which the agent pipeline accepts verbatim (OpenAI-wire providers) or
+    converts (``_preprocess_anthropic_content`` for Anthropic).
+
+    Raises ``ValueError("<code>:<message>")`` with an OpenAI-style code:
+      * ``unsupported_content_type``: file/input_file parts, unknown part
+        types, or non-image ``data:`` URLs.
+      * ``invalid_image_url``: missing URL or unsupported scheme.
+      * ``invalid_content_part``: image ``detail`` present but not a non-empty string.
+
+    Callers translate the ValueError into a 400 response.
+    """
+    # Scalar passthrough mirrors ``_normalize_chat_content``.
+    if content is None:
+        return ""
+    if isinstance(content, str):
+        return content[:MAX_NORMALIZED_TEXT_LENGTH] if len(content) > MAX_NORMALIZED_TEXT_LENGTH else content
+    if not isinstance(content, list):
+        # Mirror the legacy text normalizer's fallback so callers written
+        # before image support still get a string back.
+        return _normalize_chat_content(content)
+
+    items = content[:MAX_CONTENT_LIST_SIZE] if len(content) > MAX_CONTENT_LIST_SIZE else content
+    normalized_parts: List[Dict[str, Any]] = []
+    text_accum_len = 0
+
+    for part in items:
+        if isinstance(part, str):
+            if part:
+                trimmed = part[:MAX_NORMALIZED_TEXT_LENGTH]
+                normalized_parts.append({"type": "text", "text": trimmed})
+                text_accum_len += len(trimmed)
+            continue
+
+        if not isinstance(part, dict):
+            # Ignore non-dict, non-string items (numbers, None, nested lists)
+            # for forward compatibility, matching the text normalizer's
+            # policy.  Unknown *dict* part types are rejected further down.
+            continue
+
+        raw_type = part.get("type")
+        part_type = str(raw_type or "").strip().lower()
+
+        if part_type in _TEXT_PART_TYPES:
+            text = part.get("text")
+            if text is None:
+                continue
+            if not isinstance(text, str):
+                text = str(text)
+            if text:
+                trimmed = text[:MAX_NORMALIZED_TEXT_LENGTH]
+                normalized_parts.append({"type": "text", "text": trimmed})
+                text_accum_len += len(trimmed)
+            continue
+
+        if part_type in _IMAGE_PART_TYPES:
+            detail = part.get("detail")
+            image_ref = part.get("image_url")
+            # OpenAI Responses sends ``input_image`` with a top-level
+            # ``image_url`` string; Chat Completions sends ``image_url`` as
+            # ``{"url": "...", "detail": "..."}``.  Support both.
+            if isinstance(image_ref, dict):
+                url_value = image_ref.get("url")
+                detail = image_ref.get("detail", detail)
+            else:
+                url_value = image_ref
+            if not isinstance(url_value, str) or not url_value.strip():
+                raise ValueError("invalid_image_url:Image parts must include a non-empty image URL.")
+            url_value = url_value.strip()
+            lowered = url_value.lower()
+            if lowered.startswith("data:"):
+                if not lowered.startswith("data:image/") or "," not in url_value:
+                    raise ValueError(
+                        "unsupported_content_type:Only image data URLs are supported. "
+                        "Non-image data payloads are not supported."
+                    )
+            elif not (lowered.startswith("http://") or lowered.startswith("https://")):
+                raise ValueError(
+                    "invalid_image_url:Image inputs must use http(s) URLs or data:image/... URLs."
+                )
+            image_part: Dict[str, Any] = {"type": "image_url", "image_url": {"url": url_value}}
+            if detail is not None:
+                if not isinstance(detail, str) or not detail.strip():
+                    raise ValueError("invalid_content_part:Image detail must be a non-empty string when provided.")
+                image_part["image_url"]["detail"] = detail.strip()
+            normalized_parts.append(image_part)
+            continue
+
+        if part_type in _FILE_PART_TYPES:
+            raise ValueError(
+                "unsupported_content_type:Inline image inputs are supported, "
+                "but uploaded files and document inputs are not supported on this endpoint."
+            )
+
+        # Unknown part type — reject explicitly so clients get a clear error
+        # instead of a silently dropped turn.
+        raise ValueError(
+            f"unsupported_content_type:Unsupported content part type {raw_type!r}. "
+            "Only text and image_url/input_image parts are supported."
+        )
+
+    if not normalized_parts:
+        return ""
+
+    # Text-only: collapse to a plain string so downstream logging/trajectory
+    # code sees the native shape and prompt caching on text-only turns is
+    # unaffected.
+    if all(p.get("type") == "text" for p in normalized_parts):
+        return "\n".join(p["text"] for p in normalized_parts if p.get("text"))
+
+    return normalized_parts
+
+
+def _content_has_visible_payload(content: Any) -> bool:
+    """True when content has any text or image attachment.  Used to reject empty turns."""
+    if isinstance(content, str):
+        return bool(content.strip())
+    if isinstance(content, list):
+        for part in content:
+            if isinstance(part, dict):
+                ptype = str(part.get("type") or "").strip().lower()
+                if ptype in _TEXT_PART_TYPES and str(part.get("text") or "").strip():
+                    return True
+                if ptype in _IMAGE_PART_TYPES:
+                    return True
+    return False
+
+
+def _multimodal_validation_error(exc: ValueError, *, param: str) -> "web.Response":
+    """Translate a ``_normalize_multimodal_content`` ValueError into a 400 response."""
+    raw = str(exc)
+    code, _, message = raw.partition(":")
+    if not message:
+        code, message = "invalid_content_part", raw
+    return web.json_response(
+        _openai_error(message, code=code, param=param),
+        status=400,
+    )
+
+
 def check_api_server_requirements() -> bool:
    """Check if API server dependencies are available."""
    return AIOHTTP_AVAILABLE
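Two helpers above work together. `_normalize_multimodal_content` raises `ValueError`s in a `code:message` format, and `_multimodal_validation_error` splits that string back into an error code and a message. `_content_has_visible_payload` then decides whether a turn is empty. Both can be sketched standalone. This is a minimal sketch, not the module's code: `TEXT_PART_TYPES` and `IMAGE_PART_TYPES` are hypothetical stand-ins for `_TEXT_PART_TYPES` / `_IMAGE_PART_TYPES`, and this diff does not show what those sets contain.

```python
from typing import Any, Tuple

# Hypothetical stand-ins; the real _TEXT_PART_TYPES / _IMAGE_PART_TYPES are not shown here.
TEXT_PART_TYPES = {"text", "input_text"}
IMAGE_PART_TYPES = {"image_url", "input_image"}


def split_error_code(raw: str, default_code: str = "invalid_content_part") -> Tuple[str, str]:
    """Split a "code:message" string the same way _multimodal_validation_error does."""
    code, _, message = raw.partition(":")
    if not message:
        # No colon, or nothing after it: the whole string becomes the message.
        return default_code, raw
    return code, message


def has_visible_payload(content: Any) -> bool:
    """Mirror of _content_has_visible_payload: non-blank text or any image part."""
    if isinstance(content, str):
        return bool(content.strip())
    if isinstance(content, list):
        for part in content:
            if not isinstance(part, dict):
                continue
            ptype = str(part.get("type") or "").strip().lower()
            if ptype in TEXT_PART_TYPES and str(part.get("text") or "").strip():
                return True
            if ptype in IMAGE_PART_TYPES:
                return True
    return False
```

The split happens at the first colon. An unprefixed message that itself contains a colon would therefore be mis-split into a bogus code. Every raise site visible in this diff carries an explicit prefix, so this only matters for future callers.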
@@ -637,26 +791,32 @@ class APIServerAdapter(BasePlatformAdapter):
        system_prompt = None
        conversation_messages: List[Dict[str, str]] = []

-        for msg in messages:
+        for idx, msg in enumerate(messages):
            role = msg.get("role", "")
-            content = _normalize_chat_content(msg.get("content", ""))
+            raw_content = msg.get("content", "")
            if role == "system":
-                # Accumulate system messages
+                # System messages can't carry images (Anthropic rejects them and
+                # OpenAI text models don't render them), so flatten to text.
+                content = _normalize_chat_content(raw_content)
                if system_prompt is None:
                    system_prompt = content
                else:
                    system_prompt = system_prompt + "\n" + content
            elif role in ("user", "assistant"):
+                try:
+                    content = _normalize_multimodal_content(raw_content)
+                except ValueError as exc:
+                    return _multimodal_validation_error(exc, param=f"messages[{idx}].content")
                conversation_messages.append({"role": role, "content": content})

        # Extract the last user message as the primary input
-        user_message = ""
+        user_message: Any = ""
        history = []
        if conversation_messages:
            user_message = conversation_messages[-1].get("content", "")
            history = conversation_messages[:-1]

-        if not user_message:
+        if not _content_has_visible_payload(user_message):
            return web.json_response(
                {"error": {"message": "No user message found in messages", "type": "invalid_request_error"}},
                status=400,
@@ -1424,16 +1584,19 @@ class APIServerAdapter(BasePlatformAdapter):
            # No error if conversation doesn't exist yet — it's a new conversation

        # Normalize input to message list
-        input_messages: List[Dict[str, str]] = []
+        input_messages: List[Dict[str, Any]] = []
        if isinstance(raw_input, str):
            input_messages = [{"role": "user", "content": raw_input}]
        elif isinstance(raw_input, list):
-            for item in raw_input:
+            for idx, item in enumerate(raw_input):
                if isinstance(item, str):
                    input_messages.append({"role": "user", "content": item})
                elif isinstance(item, dict):
                    role = item.get("role", "user")
-                    content = _normalize_chat_content(item.get("content", ""))
+                    try:
+                        content = _normalize_multimodal_content(item.get("content", ""))
+                    except ValueError as exc:
+                        return _multimodal_validation_error(exc, param=f"input[{idx}].content")
                    input_messages.append({"role": role, "content": content})
        else:
            return web.json_response(_openai_error("'input' must be a string or array"), status=400)
@@ -1442,7 +1605,7 @@ class APIServerAdapter(BasePlatformAdapter):
        # This lets stateless clients supply their own history instead of
        # relying on server-side response chaining via previous_response_id.
        # Precedence: explicit conversation_history > previous_response_id.
-        conversation_history: List[Dict[str, str]] = []
+        conversation_history: List[Dict[str, Any]] = []
        raw_history = body.get("conversation_history")
        if raw_history:
            if not isinstance(raw_history, list):
@@ -1456,7 +1619,11 @@ class APIServerAdapter(BasePlatformAdapter):
                        _openai_error(f"conversation_history[{i}] must have 'role' and 'content' fields"),
                        status=400,
                    )
-                conversation_history.append({"role": str(entry["role"]), "content": str(entry["content"])})
+                try:
+                    entry_content = _normalize_multimodal_content(entry["content"])
+                except ValueError as exc:
+                    return _multimodal_validation_error(exc, param=f"conversation_history[{i}].content")
+                conversation_history.append({"role": str(entry["role"]), "content": entry_content})
            if previous_response_id:
                logger.debug("Both conversation_history and previous_response_id provided; using conversation_history")

@@ -1476,8 +1643,8 @@ class APIServerAdapter(BasePlatformAdapter):
            conversation_history.append(msg)

        # Last input message is the user_message
-        user_message = input_messages[-1].get("content", "") if input_messages else ""
-        if not user_message:
+        user_message: Any = input_messages[-1].get("content", "") if input_messages else ""
+        if not _content_has_visible_payload(user_message):
            return web.json_response(_openai_error("No user message found in input"), status=400)

        # Truncation support
@@ -6,6 +6,7 @@ and implement the required methods.
 """

 import asyncio
+import inspect
 import ipaddress
 import logging
 import os
@@ -551,6 +552,39 @@ async def cache_audio_from_url(url: str, ext: str = ".ogg", retries: int = 2) ->
    raise last_exc


+# ---------------------------------------------------------------------------
+# Video cache utilities
+#
+# Same pattern as image/audio cache -- videos from platforms are downloaded
+# here so the agent can reference them by local file path.
+# ---------------------------------------------------------------------------
+
+VIDEO_CACHE_DIR = get_hermes_dir("cache/videos", "video_cache")
+
+SUPPORTED_VIDEO_TYPES = {
+    ".mp4": "video/mp4",
+    ".mov": "video/quicktime",
+    ".webm": "video/webm",
+    ".mkv": "video/x-matroska",
+    ".avi": "video/x-msvideo",
+}
+
+
+def get_video_cache_dir() -> Path:
+    """Return the video cache directory, creating it if it doesn't exist."""
+    VIDEO_CACHE_DIR.mkdir(parents=True, exist_ok=True)
+    return VIDEO_CACHE_DIR
+
+
+def cache_video_from_bytes(data: bytes, ext: str = ".mp4") -> str:
+    """Save raw video bytes to the cache and return the absolute file path."""
+    cache_dir = get_video_cache_dir()
+    filename = f"video_{uuid.uuid4().hex[:12]}{ext}"
+    filepath = cache_dir / filename
+    filepath.write_bytes(data)
+    return str(filepath)
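The video cache flow can be sketched standalone, assuming the caller knows the platform's original filename. `video_ext_for` is a hypothetical helper that is not part of this diff; it picks the cache extension from `SUPPORTED_VIDEO_TYPES`. In this sketch, `cache_dir` is passed in explicitly rather than coming from `get_hermes_dir`.

```python
import tempfile
import uuid
from pathlib import Path

SUPPORTED_VIDEO_TYPES = {
    ".mp4": "video/mp4",
    ".mov": "video/quicktime",
    ".webm": "video/webm",
    ".mkv": "video/x-matroska",
    ".avi": "video/x-msvideo",
}


def video_ext_for(filename: str, default: str = ".mp4") -> str:
    """Hypothetical helper: use the file's suffix if it is a known video type."""
    ext = Path(filename).suffix.lower()
    return ext if ext in SUPPORTED_VIDEO_TYPES else default


def cache_video_from_bytes(data: bytes, cache_dir: Path, ext: str = ".mp4") -> str:
    """Write bytes under a random name in cache_dir and return the absolute path."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    filepath = cache_dir / f"video_{uuid.uuid4().hex[:12]}{ext}"
    filepath.write_bytes(data)
    return str(filepath.resolve())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = cache_video_from_bytes(b"\x00\x00", Path(tmp) / "videos", video_ext_for("clip.MOV"))
        print(path, SUPPORTED_VIDEO_TYPES[Path(path).suffix])
```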
+
+
 # ---------------------------------------------------------------------------
 # Document cache utilities
 #
@@ -880,10 +914,11 @@ class BasePlatformAdapter(ABC):
        # working on a task after --replace or manual restarts.
        self._background_tasks: set[asyncio.Task] = set()
        # One-shot callbacks to fire after the main response is delivered.
-        # Keyed by session_key.  GatewayRunner uses this to defer
-        # background-review notifications ("💾 Skill created") until the
-        # primary reply has been sent.
-        self._post_delivery_callbacks: Dict[str, Callable] = {}
+        # Keyed by session_key. Values are either a bare callback (legacy) or
+        # a ``(generation, callback)`` tuple so GatewayRunner can make deferred
+        # deliveries generation-aware and avoid stale runs clearing callbacks
+        # registered by a fresher run for the same session.
+        self._post_delivery_callbacks: Dict[str, Any] = {}
        self._expected_cancelled_tasks: set[asyncio.Task] = set()
        self._busy_session_handler: Optional[Callable[[MessageEvent, str], Awaitable[bool]]] = None
        # Chats where auto-TTS on voice input is disabled (set by /voice off)
@@ -1401,7 +1436,13 @@ class BasePlatformAdapter(ABC):

        return paths, cleaned

-    async def _keep_typing(self, chat_id: str, interval: float = 2.0, metadata=None) -> None:
+    async def _keep_typing(
+        self,
+        chat_id: str,
+        interval: float = 2.0,
+        metadata=None,
+        stop_event: asyncio.Event | None = None,
+    ) -> None:
        """
        Continuously send typing indicator until cancelled.
        
@@ -1415,9 +1456,18 @@ class BasePlatformAdapter(ABC):
        """
        try:
            while True:
+                if stop_event is not None and stop_event.is_set():
+                    return
                if chat_id not in self._typing_paused:
                    await self.send_typing(chat_id, metadata=metadata)
-                await asyncio.sleep(interval)
+                if stop_event is None:
+                    await asyncio.sleep(interval)
+                    continue
+                try:
+                    await asyncio.wait_for(stop_event.wait(), timeout=interval)
+                except asyncio.TimeoutError:
+                    continue
+                return
        except asyncio.CancelledError:
            pass  # Normal cancellation when handler completes
        finally:
@@ -1444,6 +1494,59 @@ class BasePlatformAdapter(ABC):
        """Resume typing indicator for a chat after approval resolves."""
        self._typing_paused.discard(chat_id)
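The `stop_event` change replaces the fixed `asyncio.sleep(interval)` with `asyncio.wait_for(stop_event.wait(), timeout=interval)`. The loop still ticks every `interval` seconds, but it now exits as soon as the event is set instead of sleeping out the remainder. Below is a minimal standalone sketch. It assumes a zero-argument `send_typing` coroutine in place of the adapter method and omits the `_typing_paused` check.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def keep_typing(
    send_typing: Callable[[], Awaitable[None]],
    interval: float = 2.0,
    stop_event: Optional[asyncio.Event] = None,
) -> None:
    """Refresh a typing indicator every `interval` seconds until stopped or cancelled."""
    try:
        while True:
            if stop_event is not None and stop_event.is_set():
                return
            await send_typing()
            if stop_event is None:
                await asyncio.sleep(interval)
                continue
            try:
                # Sleep for `interval`, but wake immediately if the event is set.
                await asyncio.wait_for(stop_event.wait(), timeout=interval)
            except asyncio.TimeoutError:
                continue  # interval elapsed: send the next indicator
            return  # event was set while waiting
    except asyncio.CancelledError:
        pass


async def demo_stop_event() -> int:
    calls = []

    async def send_typing() -> None:
        calls.append("typing")

    stop = asyncio.Event()
    # interval=10s: without the event, the loop would sleep far longer than this demo.
    task = asyncio.create_task(keep_typing(send_typing, interval=10.0, stop_event=stop))
    await asyncio.sleep(0.01)
    stop.set()
    await asyncio.wait_for(task, timeout=1.0)  # finishes promptly because the event woke the loop
    return len(calls)
```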

+    async def interrupt_session_activity(self, session_key: str, chat_id: str) -> None:
+        """Signal the active session loop to stop and clear typing immediately."""
+        if session_key:
+            interrupt_event = self._active_sessions.get(session_key)
+            if interrupt_event is not None:
+                interrupt_event.set()
+        try:
+            await self.stop_typing(chat_id)
+        except Exception:
+            pass
+
+    def register_post_delivery_callback(
+        self,
+        session_key: str,
+        callback: Callable,
+        *,
+        generation: int | None = None,
+    ) -> None:
+        """Register a deferred callback to fire after the main response.
+
+        ``generation`` lets callers tie the callback to a specific gateway run
+        generation so stale runs cannot clear callbacks owned by a fresher run.
+        """
+        if not session_key or not callable(callback):
+            return
+        if generation is None:
+            self._post_delivery_callbacks[session_key] = callback
+        else:
+            self._post_delivery_callbacks[session_key] = (int(generation), callback)
+
+    def pop_post_delivery_callback(
+        self,
+        session_key: str,
+        *,
+        generation: int | None = None,
+    ) -> Callable | None:
+        """Pop a deferred callback, optionally requiring generation ownership."""
+        if not session_key:
+            return None
+        entry = self._post_delivery_callbacks.get(session_key)
+        if entry is None:
+            return None
+        if isinstance(entry, tuple) and len(entry) == 2:
+            entry_generation, callback = entry
+            if generation is not None and int(entry_generation) != int(generation):
+                return None
+            self._post_delivery_callbacks.pop(session_key, None)
+            return callback if callable(callback) else None
+        if generation is not None:
+            return None
+        self._post_delivery_callbacks.pop(session_key, None)
+        return entry if callable(entry) else None
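The ownership rule in `register_post_delivery_callback` / `pop_post_delivery_callback` can be exercised in isolation. `CallbackRegistry` is a hypothetical, simplified stand-in for the adapter's `_post_delivery_callbacks` handling, with the same semantics:

- A generation-tagged entry is popped only by the same generation, or by an ungated pop.
- A legacy bare callback is popped only by an ungated pop.

```python
from typing import Any, Callable, Dict, Optional


class CallbackRegistry:
    """Sketch of generation-aware one-shot callbacks keyed by session."""

    def __init__(self) -> None:
        self._callbacks: Dict[str, Any] = {}

    def register(self, session_key: str, callback: Callable, *, generation: Optional[int] = None) -> None:
        if not session_key or not callable(callback):
            return
        self._callbacks[session_key] = callback if generation is None else (int(generation), callback)

    def pop(self, session_key: str, *, generation: Optional[int] = None) -> Optional[Callable]:
        entry = self._callbacks.get(session_key)
        if entry is None:
            return None
        if isinstance(entry, tuple) and len(entry) == 2:
            owner, callback = entry
            if generation is not None and owner != int(generation):
                return None  # a stale run must not consume a fresher run's callback
            del self._callbacks[session_key]
            return callback
        if generation is not None:
            return None  # legacy bare callback: only an ungated pop may take it
        del self._callbacks[session_key]
        return entry
```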
+
    # ── Processing lifecycle hooks ──────────────────────────────────────────
    # Subclasses override these to react to message processing events
    # (e.g. Discord adds 👀/✅/❌ reactions).
@@ -1714,10 +1817,23 @@ class BasePlatformAdapter(ABC):
        # Fall back to a new Event only if the entry was removed externally.
        interrupt_event = self._active_sessions.get(session_key) or asyncio.Event()
        self._active_sessions[session_key] = interrupt_event
+        callback_generation = getattr(interrupt_event, "_hermes_run_generation", None)
        
        # Start continuous typing indicator (refreshes every 2 seconds)
        _thread_metadata = {"thread_id": event.source.thread_id} if event.source.thread_id else None
-        typing_task = asyncio.create_task(self._keep_typing(event.source.chat_id, metadata=_thread_metadata))
+        _keep_typing_kwargs = {"metadata": _thread_metadata}
+        try:
+            _keep_typing_sig = inspect.signature(self._keep_typing)
+        except (TypeError, ValueError):
+            _keep_typing_sig = None
+        if _keep_typing_sig is None or "stop_event" in _keep_typing_sig.parameters:
+            _keep_typing_kwargs["stop_event"] = interrupt_event
+        typing_task = asyncio.create_task(
+            self._keep_typing(
+                event.source.chat_id,
+                **_keep_typing_kwargs,
+            )
+        )
        
        try:
            await self._run_processing_hook("on_processing_start", event)
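The `inspect.signature` probe keeps subclasses that still override `_keep_typing` with the old signature working. `stop_event` is passed only when the resolved method declares it; if the signature can't be read, the kwarg is passed anyway. Below is a minimal sketch of that check. `accepts_kwarg`, `Base`, and `LegacyAdapter` are illustrative names, not part of the codebase.

```python
import inspect
from typing import Any, Callable


def accepts_kwarg(func: Callable[..., Any], name: str) -> bool:
    """True if func declares a parameter called `name` (or its signature is unreadable)."""
    try:
        sig = inspect.signature(func)
    except (TypeError, ValueError):
        return True  # same fallback as the diff: assume the newer signature
    return name in sig.parameters


class Base:
    async def keep_typing(self, chat_id, interval=2.0, metadata=None, stop_event=None):
        ...


class LegacyAdapter(Base):
    # Override written before stop_event existed.
    async def keep_typing(self, chat_id, interval=2.0, metadata=None):
        ...
```

The check looks the parameter up by name. An override that accepts only `**kwargs` would therefore not be given `stop_event`, which errs on the side of not breaking older overrides.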
@@ -1976,7 +2092,14 @@ class BasePlatformAdapter(ABC):
        finally:
            # Fire any one-shot post-delivery callback registered for this
            # session (e.g. deferred background-review notifications).
-            _post_cb = getattr(self, "_post_delivery_callbacks", {}).pop(session_key, None)
+            _callback_generation = callback_generation
+            if hasattr(self, "pop_post_delivery_callback"):
+                _post_cb = self.pop_post_delivery_callback(
+                    session_key,
+                    generation=_callback_generation,
+                )
+            else:
+                _post_cb = getattr(self, "_post_delivery_callbacks", {}).pop(session_key, None)
            if callable(_post_cb):
                try:
                    _post_cb()
@@ -2022,10 +2145,10 @@ class BasePlatformAdapter(ABC):
                    pass
                # Leave _active_sessions[session_key] populated — the drain
                # task's own lifecycle will clean it up.
-                return
-            # Clean up session tracking
-            if session_key in self._active_sessions:
-                del self._active_sessions[session_key]
+            else:
+                # Clean up session tracking
+                if session_key in self._active_sessions:
+                    del self._active_sessions[session_key]
    
    async def cancel_background_tasks(self) -> None:
        """Cancel any in-flight background message-processing tasks.
@@ -2033,12 +2156,26 @@ class BasePlatformAdapter(ABC):
        Used during gateway shutdown/replacement so active sessions from the old
        process do not keep running after adapters are being torn down.
        """
-        tasks = [task for task in self._background_tasks if not task.done()]
-        for task in tasks:
-            self._expected_cancelled_tasks.add(task)
-            task.cancel()
-        if tasks:
+        # Repeat the drain (up to MAX_DRAIN_ROUNDS) until no new tasks
+        # appear.  Without this, a message arriving during the
+        # `await asyncio.gather` below would spawn a fresh
+        # _process_message_background task (added to self._background_tasks
+        # by handle_message), and the _background_tasks.clear() at the end
+        # of this method would drop the reference: the task would run
+        # untracked against a disconnecting adapter, log send failures, and
+        # linger until it completes on its own.  Re-draining until the task
+        # set stabilizes narrows that window; tasks that still arrive after
+        # the last round are dropped by the clear() below.
+        MAX_DRAIN_ROUNDS = 5
+        for _ in range(MAX_DRAIN_ROUNDS):
+            tasks = [task for task in self._background_tasks if not task.done()]
+            if not tasks:
+                break
+            for task in tasks:
+                self._expected_cancelled_tasks.add(task)
+                task.cancel()
            await asyncio.gather(*tasks, return_exceptions=True)
+            # Loop: late-arrival tasks spawned during the gather above
+            # will be in self._background_tasks now.  Re-check.
        self._background_tasks.clear()
        self._expected_cancelled_tasks.clear()
        self._pending_messages.clear()
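The bounded re-drain in `cancel_background_tasks` can be demonstrated with a task that spawns another task while it is being cancelled, simulating a message that arrives mid-shutdown. `cancel_until_stable` and `demo_late_arrival` are hypothetical names for this sketch. The set of tracked tasks stands in for `self._background_tasks`.

```python
import asyncio
from typing import Set, Tuple


async def cancel_until_stable(tasks: Set[asyncio.Task], max_rounds: int = 5) -> int:
    """Cancel tracked tasks, re-checking for late arrivals; return rounds used."""
    rounds = 0
    for _ in range(max_rounds):
        pending = [t for t in tasks if not t.done()]
        if not pending:
            break
        rounds += 1
        for t in pending:
            t.cancel()
        # Tasks spawned while we wait here land in `tasks` and are seen next round.
        await asyncio.gather(*pending, return_exceptions=True)
    tasks.clear()
    return rounds


async def demo_late_arrival() -> Tuple[int, int]:
    tracked: Set[asyncio.Task] = set()

    async def late_message() -> None:
        await asyncio.sleep(10)

    async def worker() -> None:
        try:
            await asyncio.sleep(10)
        except asyncio.CancelledError:
            # Simulate a message arriving mid-shutdown and spawning a new task.
            tracked.add(asyncio.create_task(late_message()))
            raise

    tracked.add(asyncio.create_task(worker()))
    await asyncio.sleep(0)  # let worker start so it is inside the try block
    rounds = await cancel_until_stable(tracked)
    return rounds, len(tracked)
```

With a single pass, the late task would be dropped by `clear()` while still pending. Here, a second round cancels it first.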
@@ -498,6 +498,7 @@ class DiscordAdapter(BasePlatformAdapter):
        self._allowed_role_ids: set = set()  # For DISCORD_ALLOWED_ROLES filtering
        # Voice channel state (per-guild)
        self._voice_clients: Dict[int, Any] = {}  # guild_id -> VoiceClient
+        self._voice_locks: Dict[int, asyncio.Lock] = {}  # guild_id -> serialize join/leave
        # Text batching: merge rapid successive messages (Telegram-style)
        self._text_batch_delay_seconds = float(os.getenv("HERMES_DISCORD_TEXT_BATCH_DELAY_SECONDS", "0.6"))
        self._text_batch_split_delay_seconds = float(os.getenv("HERMES_DISCORD_TEXT_BATCH_SPLIT_DELAY_SECONDS", "2.0"))
@@ -636,6 +637,15 @@ class DiscordAdapter(BasePlatformAdapter):

            @self._client.event
            async def on_message(message: DiscordMessage):
+                # Wait (up to 30 s) until _resolve_allowed_usernames has
+                # swapped any raw usernames in DISCORD_ALLOWED_USERS for
+                # numeric IDs; otherwise on_message's author.id lookup can
+                # miss. On timeout, fall through and process anyway.
+                if not adapter_self._ready_event.is_set():
+                    try:
+                        await asyncio.wait_for(adapter_self._ready_event.wait(), timeout=30.0)
+                    except asyncio.TimeoutError:
+                        pass
+
                # Dedup: Discord RESUME replays events after reconnects (#4777)
                if adapter_self._dedup.is_duplicate(str(message.id)):
                    return
@@ -1071,6 +1081,8 @@ class DiscordAdapter(BasePlatformAdapter):
        chat_id: str,
        message_id: str,
        content: str,
+        *,
+        finalize: bool = False,
    ) -> SendResult:
        """Edit a previously sent Discord message."""
        if not self._client:
@@ -1237,51 +1249,53 @@ class DiscordAdapter(BasePlatformAdapter):
            return False
        guild_id = channel.guild.id

-        # Already connected in this guild?
-        existing = self._voice_clients.get(guild_id)
-        if existing and existing.is_connected():
-            if existing.channel.id == channel.id:
+        async with self._voice_locks.setdefault(guild_id, asyncio.Lock()):
+            # Already connected in this guild?
+            existing = self._voice_clients.get(guild_id)
+            if existing and existing.is_connected():
+                if existing.channel.id == channel.id:
+                    self._reset_voice_timeout(guild_id)
+                    return True
+                await existing.move_to(channel)
                self._reset_voice_timeout(guild_id)
                return True
-            await existing.move_to(channel)
+
+            vc = await channel.connect()
+            self._voice_clients[guild_id] = vc
            self._reset_voice_timeout(guild_id)
+
+            # Start voice receiver (Phase 2: listen to users)
+            try:
+                receiver = VoiceReceiver(vc, allowed_user_ids=self._allowed_user_ids)
+                receiver.start()
+                self._voice_receivers[guild_id] = receiver
+                self._voice_listen_tasks[guild_id] = asyncio.ensure_future(
+                    self._voice_listen_loop(guild_id)
+                )
+            except Exception as e:
+                logger.warning("Voice receiver failed to start: %s", e)
+
            return True

-        vc = await channel.connect()
-        self._voice_clients[guild_id] = vc
-        self._reset_voice_timeout(guild_id)
-
-        # Start voice receiver (Phase 2: listen to users)
-        try:
-            receiver = VoiceReceiver(vc, allowed_user_ids=self._allowed_user_ids)
-            receiver.start()
-            self._voice_receivers[guild_id] = receiver
-            self._voice_listen_tasks[guild_id] = asyncio.ensure_future(
-                self._voice_listen_loop(guild_id)
-            )
-        except Exception as e:
-            logger.warning("Voice receiver failed to start: %s", e)
-
-        return True
-
    async def leave_voice_channel(self, guild_id: int) -> None:
        """Disconnect from the voice channel in a guild."""
-        # Stop voice receiver first
-        receiver = self._voice_receivers.pop(guild_id, None)
-        if receiver:
-            receiver.stop()
-        listen_task = self._voice_listen_tasks.pop(guild_id, None)
-        if listen_task:
-            listen_task.cancel()
+        async with self._voice_locks.setdefault(guild_id, asyncio.Lock()):
+            # Stop voice receiver first
+            receiver = self._voice_receivers.pop(guild_id, None)
+            if receiver:
+                receiver.stop()
+            listen_task = self._voice_listen_tasks.pop(guild_id, None)
+            if listen_task:
+                listen_task.cancel()

-        vc = self._voice_clients.pop(guild_id, None)
-        if vc and vc.is_connected():
-            await vc.disconnect()
-        task = self._voice_timeout_tasks.pop(guild_id, None)
-        if task:
-            task.cancel()
-        self._voice_text_channels.pop(guild_id, None)
-        self._voice_sources.pop(guild_id, None)
+            vc = self._voice_clients.pop(guild_id, None)
+            if vc and vc.is_connected():
+                await vc.disconnect()
+            task = self._voice_timeout_tasks.pop(guild_id, None)
+            if task:
+                task.cancel()
+            self._voice_text_channels.pop(guild_id, None)
+            self._voice_sources.pop(guild_id, None)

    # Maximum seconds to wait for voice playback before giving up
    PLAYBACK_TIMEOUT = 120
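Wrapping `join_voice_channel` and `leave_voice_channel` in the same per-guild `asyncio.Lock` means a leave issued while `channel.connect()` is still in flight waits for the join to finish. Without the lock, the leave could interleave and tear down half-initialised state. Below is a minimal sketch; `GuildVoiceState` is a hypothetical stand-in for the adapter's bookkeeping. It creates the lock with an explicit membership check instead of `setdefault(guild_id, asyncio.Lock())`, which avoids constructing a throwaway lock on every call. Both approaches yield one lock per guild.

```python
import asyncio
from typing import Dict, List


class GuildVoiceState:
    """Hypothetical stand-in for the adapter's per-guild join/leave bookkeeping."""

    def __init__(self) -> None:
        self._locks: Dict[int, asyncio.Lock] = {}
        self.events: List[str] = []

    def _lock_for(self, guild_id: int) -> asyncio.Lock:
        # One lock per guild: different guilds never block each other.
        if guild_id not in self._locks:
            self._locks[guild_id] = asyncio.Lock()
        return self._locks[guild_id]

    async def join(self, guild_id: int) -> None:
        async with self._lock_for(guild_id):
            self.events.append(f"join-start:{guild_id}")
            await asyncio.sleep(0.01)  # stands in for `await channel.connect()`
            self.events.append(f"join-end:{guild_id}")

    async def leave(self, guild_id: int) -> None:
        async with self._lock_for(guild_id):
            self.events.append(f"leave:{guild_id}")


async def demo_join_then_leave() -> List[str]:
    state = GuildVoiceState()
    await asyncio.gather(state.join(1), state.leave(1))
    return state.events
```

Without the lock, the same `gather` would record `join-start`, `leave`, `join-end`.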
@@ -8,7 +8,8 @@ Supports:
 - Gateway allowlist integration via FEISHU_ALLOWED_USERS
 - Persistent dedup state across restarts
 - Per-chat serial message processing (matches openclaw createChatQueue)
- Persistent ACK emoji reaction on inbound messages
+- Processing status reactions: Typing while working, removed on success,
+  swapped for CrossMark on failure
 - Reaction events routed as synthetic text events (matches openclaw)
 - Interactive card button-click events routed as synthetic COMMAND events
 - Webhook anomaly tracking (matches openclaw createWebhookAnomalyTracker)
@@ -29,6 +30,7 @@ import re
 import threading
 import time
 import uuid
+from collections import OrderedDict
 from dataclasses import dataclass, field
 from datetime import datetime
 from pathlib import Path
@@ -98,6 +100,7 @@ from gateway.platforms.base import (
    BasePlatformAdapter,
    MessageEvent,
    MessageType,
+    ProcessingOutcome,
    SendResult,
    SUPPORTED_DOCUMENT_TYPES,
    cache_document_from_bytes,
@@ -119,6 +122,8 @@ _MARKDOWN_HINT_RE = re.compile(
    re.MULTILINE,
 )
 _MARKDOWN_LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")
+_MARKDOWN_FENCE_OPEN_RE = re.compile(r"^```([^\n`]*)\s*$")
+_MARKDOWN_FENCE_CLOSE_RE = re.compile(r"^```\s*$")
 _MENTION_RE = re.compile(r"@_user_\d+")
 _MULTISPACE_RE = re.compile(r"[ \t]{2,}")
 _POST_CONTENT_INVALID_RE = re.compile(r"content format of the post type is incorrect", re.IGNORECASE)
@@ -188,7 +193,17 @@ _APPROVAL_LABEL_MAP: Dict[str, str] = {
 }
 _FEISHU_BOT_MSG_TRACK_SIZE = 512                   # LRU size for tracking sent message IDs
 _FEISHU_REPLY_FALLBACK_CODES = frozenset({230011, 231003})  # reply target withdrawn/missing → create fallback
-_FEISHU_ACK_EMOJI = "OK"
+
+# Feishu reactions render as prominent badges, unlike Discord/Telegram's
+# small footer emoji — a success badge on every message would add noise, so
+# we only mark start (Typing) and failure (CrossMark); the reply itself is
+# the success signal.
+_FEISHU_REACTION_IN_PROGRESS = "Typing"
+_FEISHU_REACTION_FAILURE = "CrossMark"
+# Bound on the (message_id → reaction_id) handle cache. Happy-path entries
+# drain on completion; the cap is a safeguard against unbounded growth from
+# delete-failures, not a capacity plan.
+_FEISHU_PROCESSING_REACTION_CACHE_SIZE = 1024

 # QR onboarding constants
 _ONBOARD_ACCOUNTS_URLS = {
@@ -430,23 +445,66 @@ def _coerce_required_int(value: Any, default: int, min_value: int = 0) -> int:


 def _build_markdown_post_payload(content: str) -> str:
+    rows = _build_markdown_post_rows(content)
    return json.dumps(
        {
            "zh_cn": {
-                "content": [
-                    [
-                        {
-                            "tag": "md",
-                            "text": content,
-                        }
-                    ]
-                ],
+                "content": rows,
            }
        },
        ensure_ascii=False,
    )


+def _build_markdown_post_rows(content: str) -> List[List[Dict[str, str]]]:
+    """Build Feishu post rows while isolating fenced code blocks.
+
+    Feishu's `md` renderer can swallow trailing content when a fenced code block
+    appears inside one large markdown element. Split the reply at real fence
+    lines so prose before/after the code block remains visible while code stays
+    in a dedicated row.
+    """
+    if not content:
+        return [[{"tag": "md", "text": ""}]]
+    if "```" not in content:
+        return [[{"tag": "md", "text": content}]]
+
+    rows: List[List[Dict[str, str]]] = []
+    current: List[str] = []
+    in_code_block = False
+
+    def _flush_current() -> None:
+        nonlocal current
+        if not current:
+            return
+        segment = "\n".join(current)
+        if segment.strip():
+            rows.append([{"tag": "md", "text": segment}])
+        current = []
+
+    for raw_line in content.splitlines():
+        stripped_line = raw_line.strip()
+        is_fence = bool(
+            _MARKDOWN_FENCE_CLOSE_RE.match(stripped_line)
+            if in_code_block
+            else _MARKDOWN_FENCE_OPEN_RE.match(stripped_line)
+        )
+
+        if is_fence:
+            if not in_code_block:
+                _flush_current()
+            current.append(raw_line)
+            in_code_block = not in_code_block
+            if not in_code_block:
+                _flush_current()
+            continue
+
+        current.append(raw_line)
+
+    _flush_current()
+    return rows or [[{"tag": "md", "text": content}]]
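The fence-splitting state machine in `_build_markdown_post_rows` can be checked standalone. A line counts as an opening fence only outside a code block and as a closing fence only inside one, so an inline run like ```` ```x``` ```` in prose is not mistaken for a fence. Below is a sketch that reuses the same two regexes. `split_markdown_rows` is an illustrative name, and the sketch assumes Python 3.8+.

```python
import re
from typing import Dict, List

FENCE_OPEN_RE = re.compile(r"^```([^\n`]*)\s*$")
FENCE_CLOSE_RE = re.compile(r"^```\s*$")


def split_markdown_rows(content: str) -> List[List[Dict[str, str]]]:
    """Split content into Feishu post rows, giving each fenced block its own row."""
    if "```" not in content:
        return [[{"tag": "md", "text": content}]]
    rows: List[List[Dict[str, str]]] = []
    current: List[str] = []
    in_code = False

    def flush() -> None:
        if current:
            segment = "\n".join(current)
            if segment.strip():
                rows.append([{"tag": "md", "text": segment}])
            current.clear()

    for line in content.splitlines():
        pattern = FENCE_CLOSE_RE if in_code else FENCE_OPEN_RE
        if pattern.match(line.strip()):
            if not in_code:
                flush()  # close out prose before the code block
            current.append(line)
            in_code = not in_code
            if not in_code:
                flush()  # emit the finished code block as its own row
            continue
        current.append(line)
    flush()
    return rows or [[{"tag": "md", "text": content}]]
```

An unclosed fence is not repaired. Everything from it to the end of the reply simply stays together in the final row.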
+
+
 def parse_feishu_post_payload(payload: Any) -> FeishuPostParseResult:
    resolved = _resolve_post_payload(payload)
    if not resolved:
@@ -1096,6 +1154,9 @@ class FeishuAdapter(BasePlatformAdapter):
        # Exec approval button state (approval_id → {session_key, message_id, chat_id})
        self._approval_state: Dict[int, Dict[str, str]] = {}
        self._approval_counter = itertools.count(1)
+        # Feishu reaction deletion requires the opaque reaction_id returned
+        # by create, so we cache it per message_id.
+        self._pending_processing_reactions: "OrderedDict[str, str]" = OrderedDict()
        self._load_seen_message_ids()

    @staticmethod
@@ -1423,6 +1484,8 @@ class FeishuAdapter(BasePlatformAdapter):
        chat_id: str,
        message_id: str,
        content: str,
+        *,
+        finalize: bool = False,
    ) -> SendResult:
        """Edit a previously sent Feishu text/post message."""
        if not self._client:
@@ -1925,8 +1988,8 @@ class FeishuAdapter(BasePlatformAdapter):
        if not message_id or self._is_duplicate(message_id):
            logger.debug("[Feishu] Dropping duplicate/missing message_id: %s", message_id)
            return
-        if getattr(sender, "sender_type", "") == "bot":
-            logger.debug("[Feishu] Dropping bot-originated event: %s", message_id)
+        if self._is_self_sent_bot_message(event):
+            logger.debug("[Feishu] Dropping self-sent bot event: %s", message_id)
            return

        chat_type = getattr(message, "chat_type", "p2p")
@@ -2003,12 +2066,12 @@ class FeishuAdapter(BasePlatformAdapter):
            operator_type,
            emoji_type,
        )
-        # Only process reactions from real users. Ignore app/bot-generated reactions
-        # and Hermes' own ACK emoji to avoid feedback loops.
+        # Drop bot/app-origin reactions to break the feedback loop from our
+        # own lifecycle reactions. A human reacting with the same emoji (e.g.
+        # clicking Typing on a bot message) is still routed through.
        loop = self._loop
        if (
            operator_type in {"bot", "app"}
-            or emoji_type == _FEISHU_ACK_EMOJI
            or not message_id
            or loop is None
            or bool(getattr(loop, "is_closed", lambda: False)())
@@ -2232,33 +2295,35 @@ class FeishuAdapter(BasePlatformAdapter):

    async def _handle_message_with_guards(self, event: MessageEvent) -> None:
        """Dispatch a single event through the agent pipeline with per-chat serialization
-        and a persistent ACK emoji reaction before processing starts.
+        before handing the event off to the agent.

-        - Per-chat lock: ensures messages in the same chat are processed one at a time
-          (matches openclaw's createChatQueue serial queue behaviour).
-        - ACK indicator: adds a CHECK reaction to the triggering message before handing
-          off to the agent and leaves it in place as a receipt marker.
+        Per-chat lock ensures messages in the same chat are processed one at a
+        time (matches openclaw's createChatQueue serial queue behaviour).
        """
        chat_id = getattr(event.source, "chat_id", "") or "" if event.source else ""
        chat_lock = self._get_chat_lock(chat_id)
        async with chat_lock:
-            message_id = event.message_id
-            if message_id:
-                await self._add_ack_reaction(message_id)
            await self.handle_message(event)

-    async def _add_ack_reaction(self, message_id: str) -> Optional[str]:
-        """Add a persistent ACK emoji reaction to signal the message was received."""
-        if not self._client or not message_id:
+    # =========================================================================
+    # Processing status reactions
+    # =========================================================================
+
+    def _reactions_enabled(self) -> bool:
+        return os.getenv("FEISHU_REACTIONS", "true").strip().lower() not in ("false", "0", "no")
+
+    async def _add_reaction(self, message_id: str, emoji_type: str) -> Optional[str]:
+        """Return the reaction_id on success, else None. The id is needed later for deletion."""
+        if not self._client or not message_id or not emoji_type:
            return None
        try:
-            from lark_oapi.api.im.v1 import (  # lazy import — keeps optional dep optional
+            from lark_oapi.api.im.v1 import (
                CreateMessageReactionRequest,
                CreateMessageReactionRequestBody,
            )
            body = (
                CreateMessageReactionRequestBody.builder()
-                .reaction_type({"emoji_type": _FEISHU_ACK_EMOJI})
+                .reaction_type({"emoji_type": emoji_type})
                .build()
            )
            request = (
@@ -2271,16 +2336,93 @@ class FeishuAdapter(BasePlatformAdapter):
            if response and getattr(response, "success", lambda: False)():
                data = getattr(response, "data", None)
                return getattr(data, "reaction_id", None)
-            logger.warning(
-                "[Feishu] Failed to add ack reaction to %s: code=%s msg=%s",
+            logger.debug(
+                "[Feishu] Add reaction %s on %s rejected: code=%s msg=%s",
+                emoji_type,
                message_id,
                getattr(response, "code", None),
                getattr(response, "msg", None),
            )
        except Exception:
-            logger.warning("[Feishu] Failed to add ack reaction to %s", message_id, exc_info=True)
+            logger.warning(
+                "[Feishu] Add reaction %s on %s raised",
+                emoji_type,
+                message_id,
+                exc_info=True,
+            )
        return None

+    async def _remove_reaction(self, message_id: str, reaction_id: str) -> bool:
+        if not self._client or not message_id or not reaction_id:
+            return False
+        try:
+            from lark_oapi.api.im.v1 import DeleteMessageReactionRequest
+            request = (
+                DeleteMessageReactionRequest.builder()
+                .message_id(message_id)
+                .reaction_id(reaction_id)
+                .build()
+            )
+            response = await asyncio.to_thread(self._client.im.v1.message_reaction.delete, request)
+            if response and getattr(response, "success", lambda: False)():
+                return True
+            logger.debug(
+                "[Feishu] Remove reaction %s on %s rejected: code=%s msg=%s",
+                reaction_id,
+                message_id,
+                getattr(response, "code", None),
+                getattr(response, "msg", None),
+            )
+        except Exception:
+            logger.warning(
+                "[Feishu] Remove reaction %s on %s raised",
+                reaction_id,
+                message_id,
+                exc_info=True,
+            )
+        return False
+
+    def _remember_processing_reaction(self, message_id: str, reaction_id: str) -> None:
+        cache = self._pending_processing_reactions
+        cache[message_id] = reaction_id
+        cache.move_to_end(message_id)
+        while len(cache) > _FEISHU_PROCESSING_REACTION_CACHE_SIZE:
+            cache.popitem(last=False)
+
+    def _pop_processing_reaction(self, message_id: str) -> Optional[str]:
+        return self._pending_processing_reactions.pop(message_id, None)
+
+    async def on_processing_start(self, event: MessageEvent) -> None:
+        if not self._reactions_enabled():
+            return
+        message_id = event.message_id
+        if not message_id or message_id in self._pending_processing_reactions:
+            return
+        reaction_id = await self._add_reaction(message_id, _FEISHU_REACTION_IN_PROGRESS)
+        if reaction_id:
+            self._remember_processing_reaction(message_id, reaction_id)
+
+    async def on_processing_complete(
+        self, event: MessageEvent, outcome: ProcessingOutcome
+    ) -> None:
+        if not self._reactions_enabled():
+            return
+        message_id = event.message_id
+        if not message_id:
+            return
+
+        start_reaction_id = self._pending_processing_reactions.get(message_id)
+        if start_reaction_id:
+            if not await self._remove_reaction(message_id, start_reaction_id):
+                # Don't stack a second badge on top of a Typing reaction we
+                # couldn't remove: the UI would read as both "working" and
+                # "done/failed" at once. Keep the handle; the LRU bound evicts it eventually.
+                return
+            self._pop_processing_reaction(message_id)
+
+        if outcome is ProcessingOutcome.FAILURE:
+            await self._add_reaction(message_id, _FEISHU_REACTION_FAILURE)
+
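`_remember_processing_reaction` bounds the handle cache with `move_to_end` and `popitem(last=False)`, which implies `_pending_processing_reactions` is an `OrderedDict`. `on_processing_complete` drops a handle only after the in-progress reaction has actually been removed. The bookkeeping can be sketched standalone as follows. `ProcessingReactionCache` and the `remove` callback, which stands in for the Feishu delete call, are hypothetical names and not part of the adapter.

```python
from collections import OrderedDict
from typing import Callable


class ProcessingReactionCache:
    """Bounded message_id -> reaction_id map that evicts the oldest entry first."""

    def __init__(self, max_size: int) -> None:
        self._max_size = max_size
        self._entries: "OrderedDict[str, str]" = OrderedDict()

    def __contains__(self, message_id: object) -> bool:
        return message_id in self._entries

    def remember(self, message_id: str, reaction_id: str) -> None:
        self._entries[message_id] = reaction_id
        self._entries.move_to_end(message_id)  # re-inserting refreshes recency
        while len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # evict the oldest handle

    def complete(self, message_id: str, remove: Callable[[str, str], bool]) -> bool:
        """Clear the in-progress reaction for message_id.

        Returns False only when a handle exists and removal failed; the
        handle is then kept rather than dropped.
        """
        reaction_id = self._entries.get(message_id)
        if reaction_id is None:
            return True
        if not remove(message_id, reaction_id):
            return False
        self._entries.pop(message_id, None)
        return True
```

Keeping the handle on a failed removal mirrors the adapter's choice. The stale entry is not silently lost, and the size bound still caps how many such entries can accumulate.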
    # =========================================================================
    # Webhook server and security
    # =========================================================================
@@ -3249,6 +3391,23 @@ class FeishuAdapter(BasePlatformAdapter):
            return self._post_mentions_bot(normalized.mentioned_ids)
        return False

+    def _is_self_sent_bot_message(self, event: Any) -> bool:
+        """Return True only for Feishu events emitted by this Hermes bot."""
+        sender = getattr(event, "sender", None)
+        sender_type = str(getattr(sender, "sender_type", "") or "").strip().lower()
+        if sender_type not in {"bot", "app"}:
+            return False
+
+        sender_id = getattr(sender, "sender_id", None)
+        sender_open_id = str(getattr(sender_id, "open_id", "") or "").strip()
+        sender_user_id = str(getattr(sender_id, "user_id", "") or "").strip()
+
+        if self._bot_open_id and sender_open_id == self._bot_open_id:
+            return True
+        if self._bot_user_id and sender_user_id == self._bot_user_id:
+            return True
+        return False
+
    def _message_mentions_bot(self, mentions: List[Any]) -> bool:
        """Check whether any mention targets the configured or inferred bot identity."""
        for mention in mentions:
@@ -3276,10 +3435,55 @@ class FeishuAdapter(BasePlatformAdapter):
        return False

    async def _hydrate_bot_identity(self) -> None:
-        """Best-effort discovery of bot identity for precise group mention gating."""
+        """Best-effort discovery of bot identity for precise group mention gating
+        and self-sent bot event filtering.
+
+        Populates ``_bot_open_id`` and ``_bot_name`` from /open-apis/bot/v3/info
+        (no extra scopes required beyond the tenant access token). Falls back to
+        the application info endpoint for ``_bot_name`` only when the first probe
+        doesn't return it. Each field is hydrated independently — a value already
+        supplied via env vars (FEISHU_BOT_OPEN_ID / FEISHU_BOT_USER_ID /
+        FEISHU_BOT_NAME) is preserved and skips its probe.
+        """
        if not self._client:
            return
-        if any((self._bot_open_id, self._bot_user_id, self._bot_name)):
+        if self._bot_open_id and self._bot_name:
+            # Everything the self-send filter and precise mention gate need is
+            # already in place; nothing to probe.
+            return
+
+        # Primary probe: /open-apis/bot/v3/info — returns bot_name + open_id, no
+        # extra scopes required. This is the same endpoint the onboarding wizard
+        # uses via probe_bot().
+        if not self._bot_open_id or not self._bot_name:
+            try:
+                resp = await asyncio.to_thread(
+                    self._client.request,
+                    method="GET",
+                    url="/open-apis/bot/v3/info",
+                    body=None,
+                    raw_response=True,
+                )
+                content = getattr(resp, "content", None)
+                if content:
+                    payload = json.loads(content)
+                    parsed = _parse_bot_response(payload) or {}
+                    open_id = (parsed.get("bot_open_id") or "").strip()
+                    bot_name = (parsed.get("bot_name") or "").strip()
+                    if open_id and not self._bot_open_id:
+                        self._bot_open_id = open_id
+                    if bot_name and not self._bot_name:
+                        self._bot_name = bot_name
+            except Exception:
+                logger.debug(
+                    "[Feishu] /bot/v3/info probe failed during hydration",
+                    exc_info=True,
+                )
+
+        # Fallback probe for _bot_name only: application info endpoint. Needs
+        # admin:app.info:readonly or application:application:self_manage scope,
+        # so it's best-effort.
+        if self._bot_name:
            return
        try:
            request = self._build_get_application_request(app_id=self._app_id, lang="en_us")
@@ -3288,17 +3492,17 @@ class FeishuAdapter(BasePlatformAdapter):
                code = getattr(response, "code", None)
                if code == 99991672:
                    logger.warning(
-                        "[Feishu] Unable to hydrate bot identity from application info. "
+                        "[Feishu] Unable to hydrate bot name from application info. "
                        "Grant admin:app.info:readonly or application:application:self_manage "
                        "so group @mention gating can resolve the bot name precisely."
                    )
                return
            app = getattr(getattr(response, "data", None), "app", None)
            app_name = (getattr(app, "app_name", None) or "").strip()
-            if app_name:
+            if app_name and not self._bot_name:
                self._bot_name = app_name
        except Exception:
-            logger.debug("[Feishu] Failed to hydrate bot identity", exc_info=True)
+            logger.debug("[Feishu] Failed to hydrate bot name from application info", exc_info=True)

    # =========================================================================
    # Deduplication — seen message ID cache (persistent)
@@ -825,7 +825,7 @@ class MatrixAdapter(BasePlatformAdapter):


    async def edit_message(
-        self, chat_id: str, message_id: str, content: str
+        self, chat_id: str, message_id: str, content: str, *, finalize: bool = False
    ) -> SendResult:
        """Edit an existing message (via m.replace)."""

@@ -304,7 +304,7 @@ class MattermostAdapter(BasePlatformAdapter):
        )

    async def edit_message(
-        self, chat_id: str, message_id: str, content: str
+        self, chat_id: str, message_id: str, content: str, *, finalize: bool = False
    ) -> SendResult:
        """Edit an existing post."""
        formatted = self.format_message(content)
@@ -18,6 +18,7 @@ import logging
 import os
 import random
 import time
+import uuid
 from datetime import datetime, timezone
 from pathlib import Path
 from typing import Dict, List, Optional, Any
@@ -127,6 +128,27 @@ def _render_mentions(text: str, mentions: list) -> str:
    return text


+def _is_signal_service_id(value: str) -> bool:
+    """Return True if *value* already looks like a Signal service identifier."""
+    if not value:
+        return False
+    if value.startswith("PNI:") or value.startswith("u:"):
+        return True
+    try:
+        uuid.UUID(value)
+        return True
+    except (ValueError, AttributeError, TypeError):
+        return False
+
+
+def _looks_like_e164_number(value: str) -> bool:
+    """Return True for a plausible E.164 phone number."""
+    if not value or not value.startswith("+"):
+        return False
+    digits = value[1:]
+    return digits.isdigit() and 7 <= len(digits) <= 15
+
+
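Together, these two predicates decide whether a `chat_id` is a phone number worth upgrading to a UUID. The sketch below shows how they combine into the early-return check at the top of `_resolve_recipient`. The helper bodies mirror the diff, and `needs_uuid_lookup` is a hypothetical name used only for illustration.

```python
import uuid


def is_signal_service_id(value: str) -> bool:
    # "PNI:" / "u:" prefixed identifiers, or a bare UUID.
    if not value:
        return False
    if value.startswith(("PNI:", "u:")):
        return True
    try:
        uuid.UUID(value)
        return True
    except (ValueError, AttributeError, TypeError):
        return False


def looks_like_e164_number(value: str) -> bool:
    # "+" followed by 7 to 15 digits (E.164 allows at most 15 digits).
    if not value or not value.startswith("+"):
        return False
    digits = value[1:]
    return digits.isdigit() and 7 <= len(digits) <= 15


def needs_uuid_lookup(chat_id: str) -> bool:
    """True only for direct chats addressed by a plausible phone number."""
    if not chat_id or chat_id.startswith("group:"):
        return False
    if is_signal_service_id(chat_id):
        return False
    return looks_like_e164_number(chat_id)
```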
 def check_signal_requirements() -> bool:
    """Check if Signal is configured (has URL and account)."""
    return bool(os.getenv("SIGNAL_HTTP_URL") and os.getenv("SIGNAL_ACCOUNT"))
@@ -179,6 +201,12 @@ class SignalAdapter(BasePlatformAdapter):
        # in Note to Self / self-chat mode (mirrors WhatsApp recentlySentIds)
        self._recent_sent_timestamps: set = set()
        self._max_recent_timestamps = 50
+        # Signal increasingly exposes ACI/PNI UUIDs as stable recipient IDs.
+        # Keep a best-effort mapping so outbound sends can upgrade from a
+        # phone number to the corresponding UUID when signal-cli prefers it.
+        self._recipient_uuid_by_number: Dict[str, str] = {}
+        self._recipient_number_by_uuid: Dict[str, str] = {}
+        self._recipient_cache_lock = asyncio.Lock()

        logger.info("Signal adapter initialized: url=%s account=%s groups=%s",
                     self.http_url, redact_phone(self.account),
@@ -195,31 +223,40 @@ class SignalAdapter(BasePlatformAdapter):
            return False

        # Acquire scoped lock to prevent duplicate Signal listeners for the same phone
+        lock_acquired = False
        try:
            if not self._acquire_platform_lock('signal-phone', self.account, 'Signal account'):
                return False
+            lock_acquired = True
        except Exception as e:
            logger.warning("Signal: Could not acquire phone lock (non-fatal): %s", e)

        self.client = httpx.AsyncClient(timeout=30.0)
-
-        # Health check — verify signal-cli daemon is reachable
        try:
-            resp = await self.client.get(f"{self.http_url}/api/v1/check", timeout=10.0)
-            if resp.status_code != 200:
-                logger.error("Signal: health check failed (status %d)", resp.status_code)
+            # Health check — verify signal-cli daemon is reachable
+            try:
+                resp = await self.client.get(f"{self.http_url}/api/v1/check", timeout=10.0)
+                if resp.status_code != 200:
+                    logger.error("Signal: health check failed (status %d)", resp.status_code)
+                    return False
+            except Exception as e:
+                logger.error("Signal: cannot reach signal-cli at %s: %s", self.http_url, e)
                return False
-        except Exception as e:
-            logger.error("Signal: cannot reach signal-cli at %s: %s", self.http_url, e)
-            return False

-        self._running = True
-        self._last_sse_activity = time.time()
-        self._sse_task = asyncio.create_task(self._sse_listener())
-        self._health_monitor_task = asyncio.create_task(self._health_monitor())
+            self._running = True
+            self._last_sse_activity = time.time()
+            self._sse_task = asyncio.create_task(self._sse_listener())
+            self._health_monitor_task = asyncio.create_task(self._health_monitor())

-        logger.info("Signal: connected to %s", self.http_url)
-        return True
+            logger.info("Signal: connected to %s", self.http_url)
+            return True
+        finally:
+            if not self._running:
+                if self.client:
+                    await self.client.aclose()
+                    self.client = None
+                if lock_acquired:
+                    self._release_platform_lock()
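Signal's `connect()`, like Slack's in its own hunk, now relies on `finally` running on every exit path. Any `return False` after the platform lock is taken falls through to cleanup keyed on `self._running`. Below is a toy sketch of that control flow, in which hypothetical event strings take the place of the real lock and `httpx` client.

```python
import asyncio
from typing import List


class ToyAdapter:
    """Hypothetical adapter that records which resources connect() touches."""

    def __init__(self, lock_ok: bool, healthy: bool) -> None:
        self.lock_ok = lock_ok  # whether the (non-fatal) lock acquisition succeeds
        self.healthy = healthy  # whether the health check passes
        self._running = False
        self.events: List[str] = []

    async def connect(self) -> bool:
        lock_acquired = False
        if self.lock_ok:
            self.events.append("lock-acquired")  # stands in for _acquire_platform_lock
            lock_acquired = True
        self.events.append("client-opened")  # stands in for httpx.AsyncClient(...)
        try:
            await asyncio.sleep(0)  # stands in for the health-check request
            if not self.healthy:
                return False  # the finally block still runs on this return
            self._running = True
            return True
        finally:
            if not self._running:
                self.events.append("client-closed")
                if lock_acquired:
                    self.events.append("lock-released")
```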

    async def disconnect(self) -> None:
        """Stop SSE listener and clean up."""
@@ -400,6 +437,7 @@ class SignalAdapter(BasePlatformAdapter):
        )
        sender_name = envelope_data.get("sourceName", "")
        sender_uuid = envelope_data.get("sourceUuid", "")
+        self._remember_recipient_identifiers(sender, sender_uuid)

        if not sender:
            logger.debug("Signal: ignoring envelope with no sender")
@@ -518,6 +556,64 @@ class SignalAdapter(BasePlatformAdapter):

        await self.handle_message(event)

+    def _remember_recipient_identifiers(self, number: Optional[str], service_id: Optional[str]) -> None:
+        """Cache any number↔UUID mapping observed from Signal envelopes."""
+        if not number or not service_id or not _is_signal_service_id(service_id):
+            return
+        self._recipient_uuid_by_number[number] = service_id
+        self._recipient_number_by_uuid[service_id] = number
+
+    def _extract_contact_uuid(self, contact: Any, phone_number: str) -> Optional[str]:
+        """Best-effort extraction of a Signal service ID from listContacts output."""
+        if not isinstance(contact, dict):
+            return None
+
+        number = contact.get("number")
+        recipient = contact.get("recipient")
+        service_id = contact.get("uuid") or contact.get("serviceId")
+        if not service_id:
+            profile = contact.get("profile")
+            if isinstance(profile, dict):
+                service_id = profile.get("serviceId") or profile.get("uuid")
+
+        if service_id and _is_signal_service_id(service_id):
+            matches_number = number == phone_number or recipient == phone_number
+            if matches_number:
+                return service_id
+        return None
+
+    async def _resolve_recipient(self, chat_id: str) -> str:
+        """Return the preferred Signal recipient identifier for a direct chat."""
+        if (
+            not chat_id
+            or chat_id.startswith("group:")
+            or _is_signal_service_id(chat_id)
+            or not _looks_like_e164_number(chat_id)
+        ):
+            return chat_id
+
+        cached = self._recipient_uuid_by_number.get(chat_id)
+        if cached:
+            return cached
+
+        async with self._recipient_cache_lock:
+            cached = self._recipient_uuid_by_number.get(chat_id)
+            if cached:
+                return cached
+
+            contacts = await self._rpc("listContacts", {
+                "account": self.account,
+                "allRecipients": True,
+            })
+            if isinstance(contacts, list):
+                for contact in contacts:
+                    number = contact.get("number") if isinstance(contact, dict) else None
+                    service_id = self._extract_contact_uuid(contact, chat_id)
+                    if number and service_id:
+                        self._remember_recipient_identifiers(number, service_id)
+
+            return self._recipient_uuid_by_number.get(chat_id, chat_id)
+
    # ------------------------------------------------------------------
    # Attachment Handling
    # ------------------------------------------------------------------
@@ -633,7 +729,7 @@ class SignalAdapter(BasePlatformAdapter):
        if chat_id.startswith("group:"):
            params["groupId"] = chat_id[6:]
        else:
-            params["recipient"] = [chat_id]
+            params["recipient"] = [await self._resolve_recipient(chat_id)]

        result = await self._rpc("send", params)

@@ -684,7 +780,7 @@ class SignalAdapter(BasePlatformAdapter):
        if chat_id.startswith("group:"):
            params["groupId"] = chat_id[6:]
        else:
-            params["recipient"] = [chat_id]
+            params["recipient"] = [await self._resolve_recipient(chat_id)]

        fails = self._typing_failures.get(chat_id, 0)
        result = await self._rpc(
@@ -745,7 +841,7 @@ class SignalAdapter(BasePlatformAdapter):
        if chat_id.startswith("group:"):
            params["groupId"] = chat_id[6:]
        else:
-            params["recipient"] = [chat_id]
+            params["recipient"] = [await self._resolve_recipient(chat_id)]

        result = await self._rpc("send", params)
        if result is not None:
@@ -784,7 +880,7 @@ class SignalAdapter(BasePlatformAdapter):
        if chat_id.startswith("group:"):
            params["groupId"] = chat_id[6:]
        else:
-            params["recipient"] = [chat_id]
+            params["recipient"] = [await self._resolve_recipient(chat_id)]

        result = await self._rpc("send", params)
        if result is not None:
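The direct-chat call sites above now pass the recipient through `_resolve_recipient`. That method uses a check, lock, re-check pattern so that concurrent sends to an unmapped number trigger a single `listContacts` RPC. A minimal asyncio sketch of the pattern follows. `RecipientResolver` and the injected `fetch_contacts` coroutine are hypothetical stand-ins for the adapter and its `_rpc` call.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


class RecipientResolver:
    """Hypothetical stand-in for the adapter's number -> UUID cache."""

    def __init__(self, fetch_contacts: Callable[[], Awaitable[List[dict]]]) -> None:
        self._fetch_contacts = fetch_contacts  # stands in for _rpc("listContacts", ...)
        self._uuid_by_number: Dict[str, str] = {}
        self._lock = asyncio.Lock()

    async def resolve(self, number: str) -> str:
        cached = self._uuid_by_number.get(number)
        if cached:  # fast path: a cache hit needs no lock
            return cached
        async with self._lock:
            # Re-check: another task may have filled the cache while we waited.
            cached = self._uuid_by_number.get(number)
            if cached:
                return cached
            for contact in await self._fetch_contacts():
                num, uid = contact.get("number"), contact.get("uuid")
                if num and uid:
                    self._uuid_by_number[num] = uid
            # No mapping found: fall back to the phone number itself.
            return self._uuid_by_number.get(number, number)
```

As in the adapter, a number with no contact entry is not negatively cached, so every send to it repeats the lookup. If that turns out to matter, a short-lived miss cache would be one way to avoid it.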
@@ -150,9 +150,11 @@ class SlackAdapter(BasePlatformAdapter):
            except Exception as e:
                logger.warning("[Slack] Failed to read %s: %s", tokens_file, e)

+        lock_acquired = False
        try:
            if not self._acquire_platform_lock('slack-app-token', app_token, 'Slack app token'):
                return False
+            lock_acquired = True

            # First token is the primary — used for AsyncApp / Socket Mode
            primary_token = bot_tokens[0]
@@ -228,6 +230,9 @@ class SlackAdapter(BasePlatformAdapter):
        except Exception as e:  # pragma: no cover - defensive logging
            logger.error("[Slack] Connection failed: %s", e, exc_info=True)
            return False
+        finally:
+            if lock_acquired and not self._running:
+                self._release_platform_lock()

    async def disconnect(self) -> None:
        """Disconnect from Slack."""
@@ -316,6 +321,8 @@ class SlackAdapter(BasePlatformAdapter):
        chat_id: str,
        message_id: str,
        content: str,
+        *,
+        finalize: bool = False,
    ) -> SendResult:
        """Edit a previously sent Slack message."""
        if not self._app:
@@ -11,6 +11,7 @@ import asyncio
 import json
 import logging
 import os
+import tempfile
 import html as _html
 import re
 from typing import Dict, List, Optional, Any
@@ -70,8 +71,10 @@ from gateway.platforms.base import (
    SendResult,
    cache_image_from_bytes,
    cache_audio_from_bytes,
+    cache_video_from_bytes,
    cache_document_from_bytes,
    resolve_proxy_url,
+    SUPPORTED_VIDEO_TYPES,
    SUPPORTED_DOCUMENT_TYPES,
    utf16_len,
    _prefix_within_utf16_limit,
@@ -534,8 +537,23 @@ class TelegramAdapter(BasePlatformAdapter):
                        break

            if changed:
-                with open(config_path, "w") as f:
-                    _yaml.dump(config, f, default_flow_style=False, sort_keys=False)
+                fd, tmp_path = tempfile.mkstemp(
+                    dir=str(config_path.parent),
+                    suffix=".tmp",
+                    prefix=".config_",
+                )
+                try:
+                    with os.fdopen(fd, "w", encoding="utf-8") as f:
+                        _yaml.dump(config, f, default_flow_style=False, sort_keys=False)
+                        f.flush()
+                        os.fsync(f.fileno())
+                    os.replace(tmp_path, config_path)
+                except BaseException:
+                    try:
+                        os.unlink(tmp_path)
+                    except OSError:
+                        pass
+                    raise
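Config persistence now follows the usual recipe: write to a temporary file, fsync it, then call `os.replace`. A crash mid-write therefore cannot leave a truncated `config.yaml`. The sketch below packages the same steps as a reusable standard-library function. `atomic_write_text` is a hypothetical name, and it takes pre-serialized text instead of calling `yaml.dump`.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(path: Path, text: str) -> None:
    """Replace *path* with *text* so readers see either the old or the new file."""
    # Create the temp file next to the target: os.replace is only atomic
    # when source and destination are on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=str(path.parent), prefix=".config_", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes hit disk before the rename
        os.replace(tmp_path, path)
    except BaseException:
        # Any failure (including KeyboardInterrupt) removes the partial temp file.
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        raise
```

One side effect is worth knowing. `tempfile.mkstemp` creates the file readable and writable only by its owner, so after `os.replace` the config ends up with mode 0600 regardless of its previous permissions. If that matters, copying the old mode onto the temp file before the replace (for example with `shutil.copymode`) would preserve it.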
                logger.info(
                    "[%s] Persisted thread_id=%s for topic '%s' in config.yaml",
                    self.name, thread_id, topic_name,
@@ -1081,6 +1099,8 @@ class TelegramAdapter(BasePlatformAdapter):
        chat_id: str,
        message_id: str,
        content: str,
+        *,
+        finalize: bool = False,
    ) -> SendResult:
        """Edit a previously sent Telegram message."""
        if not self._bot:
@@ -1657,6 +1677,21 @@ class TelegramAdapter(BasePlatformAdapter):
        except Exception as exc:
            logger.error("Failed to write update response from callback: %s", exc)

+    def _missing_media_path_error(self, label: str, path: str) -> str:
+        """Build an actionable file-not-found error for gateway MEDIA delivery.
+
+        Paths under /workspace/, /output/ or /outputs/ often exist only inside
+        the Docker sandbox, while the gateway process runs on the host.
+        """
+        error = f"{label} file not found: {path}"
+        if path.startswith(("/workspace/", "/output/", "/outputs/")):
+            error += (
+                " (path may only exist inside the Docker sandbox. "
+                "Bind-mount a host directory and emit the host-visible "
+                "path in MEDIA: for gateway file delivery.)"
+            )
+        return error
+
    async def send_voice(
        self,
        chat_id: str,
@@ -1673,7 +1708,7 @@ class TelegramAdapter(BasePlatformAdapter):
        try:
            import os
            if not os.path.exists(audio_path):
-                return SendResult(success=False, error=f"Audio file not found: {audio_path}")
+                return SendResult(success=False, error=self._missing_media_path_error("Audio", audio_path))
            
            with open(audio_path, "rb") as audio_file:
                # .ogg files -> send as voice (round playable bubble)
@@ -1722,7 +1757,7 @@ class TelegramAdapter(BasePlatformAdapter):
        try:
            import os
            if not os.path.exists(image_path):
-                return SendResult(success=False, error=f"Image file not found: {image_path}")
+                return SendResult(success=False, error=self._missing_media_path_error("Image", image_path))

            _thread = self._metadata_thread_id(metadata)
            with open(image_path, "rb") as image_file:
@@ -1759,7 +1794,7 @@ class TelegramAdapter(BasePlatformAdapter):

        try:
            if not os.path.exists(file_path):
-                return SendResult(success=False, error=f"File not found: {file_path}")
+                return SendResult(success=False, error=self._missing_media_path_error("File", file_path))

            display_name = file_name or os.path.basename(file_path)
            _thread = self._metadata_thread_id(metadata)
@@ -1793,7 +1828,7 @@ class TelegramAdapter(BasePlatformAdapter):

        try:
            if not os.path.exists(video_path):
-                return SendResult(success=False, error=f"Video file not found: {video_path}")
+                return SendResult(success=False, error=self._missing_media_path_error("Video", video_path))

            _thread = self._metadata_thread_id(metadata)
            with open(video_path, "rb") as f:
@@ -2241,22 +2276,27 @@ class TelegramAdapter(BasePlatformAdapter):

        bot_username = (getattr(self._bot, "username", None) or "").lstrip("@").lower()
        bot_id = getattr(self._bot, "id", None)
+        expected = f"@{bot_username}" if bot_username else None

        def _iter_sources():
            yield getattr(message, "text", None) or "", getattr(message, "entities", None) or []
            yield getattr(message, "caption", None) or "", getattr(message, "caption_entities", None) or []

+        # Telegram parses mentions server-side and emits MessageEntity objects
+        # (type=mention for an @username, type=text_mention for a mention of a
+        # user who has no public username). Only those entities are authoritative;
+        # raw substring matches such as "foo@hermes_bot.example" are not mentions
+        # (bug #12545). Entities also handle @handles inside URLs, code blocks,
+        # and quoted text correctly, where a substring or regex scan would over-match.
        for source_text, entities in _iter_sources():
-            if bot_username and f"@{bot_username}" in source_text.lower():
-                return True
            for entity in entities:
                entity_type = str(getattr(entity, "type", "")).split(".")[-1].lower()
-                if entity_type == "mention" and bot_username:
+                if entity_type == "mention" and expected:
                    offset = int(getattr(entity, "offset", -1))
                    length = int(getattr(entity, "length", 0))
                    if offset < 0 or length <= 0:
                        continue
-                    if source_text[offset:offset + length].strip().lower() == f"@{bot_username}":
+                    if source_text[offset:offset + length].strip().lower() == expected:
                        return True
                elif entity_type == "text_mention":
                    user = getattr(entity, "user", None)
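The Telegram Bot API reports `MessageEntity.offset` and `length` in UTF-16 code units, whereas Python `str` slicing counts code points. An astral-plane character such as most emoji, appearing before the mention, therefore shifts `source_text[offset:offset + length]`. Now that the substring fallback is gone, that comparison can miss a real mention. Below is a standard-library sketch of a UTF-16-aware slice. `utf16_slice` and `is_mention_of` are hypothetical helpers. python-telegram-bot's `Message.parse_entity()` and `Message.parse_caption_entity()` are documented to perform this conversion, so calling them may be the simpler fix.

```python
def utf16_slice(text: str, offset: int, length: int) -> str:
    """Return the part of *text* covered by a Telegram entity.

    Telegram counts offset and length in UTF-16 code units. In UTF-16-LE
    every code unit is exactly two bytes, so slicing the encoded bytes at
    2 * offset and 2 * (offset + length) lands on code-unit boundaries.
    """
    encoded = text.encode("utf-16-le")  # "-le" variant: no byte-order mark is emitted
    chunk = encoded[offset * 2:(offset + length) * 2]
    # errors="replace": a malformed entity that splits a surrogate pair
    # yields replacement characters instead of raising UnicodeDecodeError.
    return chunk.decode("utf-16-le", errors="replace")


def is_mention_of(text: str, offset: int, length: int, bot_username: str) -> bool:
    """Compare a type=mention entity against @bot_username, case-insensitively."""
    return utf16_slice(text, offset, length).strip().lower() == f"@{bot_username.lower()}"
```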
@@ -2590,6 +2630,23 @@ class TelegramAdapter(BasePlatformAdapter):
            except Exception as e:
                logger.warning("[Telegram] Failed to cache audio: %s", e, exc_info=True)

+        elif msg.video:
+            try:
+                file_obj = await msg.video.get_file()
+                video_bytes = await file_obj.download_as_bytearray()
+                ext = ".mp4"
+                if getattr(file_obj, "file_path", None):
+                    for candidate in SUPPORTED_VIDEO_TYPES:
+                        if file_obj.file_path.lower().endswith(candidate):
+                            ext = candidate
+                            break
+                cached_path = cache_video_from_bytes(bytes(video_bytes), ext=ext)
+                event.media_urls = [cached_path]
+                event.media_types = [SUPPORTED_VIDEO_TYPES.get(ext, "video/mp4")]
+                logger.info("[Telegram] Cached user video at %s", cached_path)
+            except Exception as e:
+                logger.warning("[Telegram] Failed to cache video: %s", e, exc_info=True)
+
        # Download document files to cache for agent processing
        elif msg.document:
            doc = msg.document
@@ -2606,6 +2663,21 @@ class TelegramAdapter(BasePlatformAdapter):
                    mime_to_ext = {v: k for k, v in SUPPORTED_DOCUMENT_TYPES.items()}
                    ext = mime_to_ext.get(doc.mime_type, "")

+                if not ext and doc.mime_type:
+                    video_mime_to_ext = {v: k for k, v in SUPPORTED_VIDEO_TYPES.items()}
+                    ext = video_mime_to_ext.get(doc.mime_type, "")
+
+                if ext in SUPPORTED_VIDEO_TYPES:
+                    file_obj = await doc.get_file()
+                    video_bytes = await file_obj.download_as_bytearray()
+                    cached_path = cache_video_from_bytes(bytes(video_bytes), ext=ext)
+                    event.media_urls = [cached_path]
+                    event.media_types = [SUPPORTED_VIDEO_TYPES[ext]]
+                    event.message_type = MessageType.VIDEO
+                    logger.info("[Telegram] Cached user video document at %s", cached_path)
+                    await self.handle_message(event)
+                    return
+
                # Check if supported
                if ext not in SUPPORTED_DOCUMENT_TYPES:
                    supported_list = ", ".join(sorted(SUPPORTED_DOCUMENT_TYPES.keys()))
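The two new video branches choose a cache extension differently. The `msg.video` branch takes it from the downloaded file's path and defaults to `.mp4`. The document branch also consults the MIME type and treats the file as video only when the resolved extension is a known video type. A condensed sketch of that selection order follows. `VIDEO_TYPES` is a hypothetical stand-in for `SUPPORTED_VIDEO_TYPES`, whose real contents this diff does not show, and `pick_video_ext` is illustrative only.

```python
from typing import Mapping, Optional

# Hypothetical stand-in for SUPPORTED_VIDEO_TYPES (extension -> MIME type);
# the real mapping lives in gateway.platforms.base and may differ.
VIDEO_TYPES: Mapping[str, str] = {
    ".mp4": "video/mp4",
    ".mov": "video/quicktime",
    ".webm": "video/webm",
}


def pick_video_ext(
    file_path: Optional[str],
    mime_type: Optional[str],
    default: Optional[str] = None,
) -> Optional[str]:
    """Resolve a video extension: path suffix first, then MIME type, then default."""
    if file_path:
        lowered = file_path.lower()
        for ext in VIDEO_TYPES:
            if lowered.endswith(ext):
                return ext
    if mime_type:
        ext_by_mime = {mime: ext for ext, mime in VIDEO_TYPES.items()}
        if mime_type in ext_by_mime:
            return ext_by_mime[mime_type]
    return default  # ".mp4" for msg.video; None means "not a video" for documents
```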
@@ -13,6 +13,10 @@ Each route defines:
  - skills: optional list of skills to load for the agent
  - deliver: where to send the response (github_comment, telegram, etc.)
  - deliver_extra: additional delivery config (repo, pr_number, chat_id)
+  - deliver_only: if true, skip the agent — the rendered prompt IS the
+    message that gets delivered.  Use for external push notifications
+    (Supabase, monitoring alerts, inter-agent pings) where zero LLM cost
+    and sub-second delivery matter more than agent reasoning.

 Security:
  - HMAC secret is required per route (validated at startup)
@@ -122,6 +126,19 @@ class WebhookAdapter(BasePlatformAdapter):
                    f"For testing without auth, set secret to '{_INSECURE_NO_AUTH}'."
                )

+            # deliver_only routes bypass the agent — the POST body becomes a
+            # direct push notification via the configured delivery target.
+            # Validate up-front so misconfiguration surfaces at startup rather
+            # than on the first webhook POST.
+            if route.get("deliver_only"):
+                deliver = route.get("deliver", "log")
+                if not deliver or deliver == "log":
+                    raise ValueError(
+                        f"[webhook] Route '{name}' has deliver_only=true but "
+                        f"deliver is '{deliver}'. Direct delivery requires a "
+                        f"real target (telegram, discord, slack, github_comment, etc.)."
+                    )
+
        app = web.Application()
        app.router.add_get("/health", self._handle_health)
        app.router.add_post("/webhooks/{route_name}", self._handle_webhook)
@@ -296,24 +313,14 @@ class WebhookAdapter(BasePlatformAdapter):
                {"error": "Payload too large"}, status=413
            )

-        # ── Rate limiting ────────────────────────────────────────
-        now = time.time()
-        window = self._rate_counts.setdefault(route_name, [])
-        window[:] = [t for t in window if now - t < 60]
-        if len(window) >= self._rate_limit:
-            return web.json_response(
-                {"error": "Rate limit exceeded"}, status=429
-            )
-        window.append(now)
-
-        # Read body
+        # Read body (the raw bytes are needed for HMAC signature validation)
        try:
            raw_body = await request.read()
        except Exception as e:
            logger.error("[webhook] Failed to read body: %s", e)
            return web.json_response({"error": "Bad request"}, status=400)

-        # Validate HMAC signature (skip for INSECURE_NO_AUTH testing mode)
+        # Validate HMAC signature FIRST (skip for INSECURE_NO_AUTH testing mode)
        secret = route_config.get("secret", self._global_secret)
        if secret and secret != _INSECURE_NO_AUTH:
            if not self._validate_signature(request, raw_body, secret):
@@ -324,6 +331,16 @@ class WebhookAdapter(BasePlatformAdapter):
                    {"error": "Invalid signature"}, status=401
                )

+        # ── Rate limiting (after auth) ───────────────────────────
+        now = time.time()
+        window = self._rate_counts.setdefault(route_name, [])
+        window[:] = [t for t in window if now - t < 60]
+        if len(window) >= self._rate_limit:
+            return web.json_response(
+                {"error": "Rate limit exceeded"}, status=429
+            )
+        window.append(now)
+
        # Parse payload
        try:
            payload = json.loads(raw_body)
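This hunk moves rate limiting behind HMAC validation. Under the old order, forged or unsigned requests still consumed a slot in the route's 60-second window, so an attacker could exhaust the quota and lock out the legitimate sender. The sketch below shows the new order in isolation. It assumes a hex-encoded HMAC-SHA256 of the raw body as the signature scheme; the real `_validate_signature` and its header names are not shown in this diff, and `RouteGuard` is a hypothetical name.

```python
import hashlib
import hmac
import time
from collections import defaultdict
from typing import Dict, List, Optional


class RouteGuard:
    """Hypothetical sketch: authenticate first, then count against the rate limit."""

    def __init__(self, secret: bytes, limit_per_minute: int) -> None:
        self.secret = secret
        self.limit = limit_per_minute
        self.windows: Dict[str, List[float]] = defaultdict(list)

    def _signature_ok(self, body: bytes, signature_hex: str) -> bool:
        # Assumed scheme: hex-encoded HMAC-SHA256 over the raw request body.
        expected = hmac.new(self.secret, body, hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, signature_hex)

    def check(self, route: str, body: bytes, signature_hex: str,
              now: Optional[float] = None) -> int:
        """Return the HTTP status the handler would send (200, 401 or 429)."""
        if not self._signature_ok(body, signature_hex):
            # Unauthenticated requests never touch the per-route window.
            return 401
        now = time.time() if now is None else now
        window = self.windows[route]
        window[:] = [t for t in window if now - t < 60]  # keep only the last 60 s
        if len(window) >= self.limit:
            return 429
        window.append(now)
        return 200


guard = RouteGuard(b"s3cret", limit_per_minute=2)
body = b'{"event": "ping"}'
good_sig = hmac.new(b"s3cret", body, hashlib.sha256).hexdigest()
```

With this order, a flood of bad signatures returns 401 without shrinking the quota that correctly signed requests rely on.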
@@ -419,6 +436,64 @@ class WebhookAdapter(BasePlatformAdapter):
            )
        self._seen_deliveries[delivery_id] = now

+        # ── Direct delivery mode (deliver_only) ─────────────────
+        # Skip the agent entirely — the rendered prompt IS the message we
+        # deliver.  Use case: external services (Supabase, monitoring,
+        # cron jobs, other agents) that need to push a plain notification
+        # to a user's chat with zero LLM cost.  Reuses the same HMAC auth,
+        # rate limiting, idempotency, and template rendering as agent mode.
+        if route_config.get("deliver_only"):
+            delivery = {
+                "deliver": route_config.get("deliver", "log"),
+                "deliver_extra": self._render_delivery_extra(
+                    route_config.get("deliver_extra", {}), payload
+                ),
+                "payload": payload,
+            }
+            logger.info(
+                "[webhook] direct-deliver event=%s route=%s target=%s msg_len=%d delivery=%s",
+                event_type,
+                route_name,
+                delivery["deliver"],
+                len(prompt),
+                delivery_id,
+            )
+            try:
+                result = await self._direct_deliver(prompt, delivery)
+            except Exception:
+                logger.exception(
+                    "[webhook] direct-deliver failed route=%s delivery=%s",
+                    route_name,
+                    delivery_id,
+                )
+                return web.json_response(
+                    {"status": "error", "error": "Delivery failed", "delivery_id": delivery_id},
+                    status=502,
+                )
+
+            if result.success:
+                return web.json_response(
+                    {
+                        "status": "delivered",
+                        "route": route_name,
+                        "target": delivery["deliver"],
+                        "delivery_id": delivery_id,
+                    },
+                    status=200,
+                )
+            # Delivery attempted but target rejected it — surface as 502
+            # with a generic error (don't leak adapter-level detail).
+            logger.warning(
+                "[webhook] direct-deliver target rejected route=%s target=%s error=%s",
+                route_name,
+                delivery["deliver"],
+                result.error,
+            )
+            return web.json_response(
+                {"status": "error", "error": "Delivery failed", "delivery_id": delivery_id},
+                status=502,
+            )
+
        # Use delivery_id in session key so concurrent webhooks on the
        # same route get independent agent runs (not queued/interrupted).
        session_chat_id = f"webhook:{route_name}:{delivery_id}"
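The `deliver_only` branch above has a small response contract: 200 with `"status": "delivered"` when the target accepts the message, and a generic 502 both when `_direct_deliver` raises and when the target rejects the message. The sketch restates that contract as a pure function. `direct_delivery_response` is a hypothetical name, and the `SendResult` here is a minimal stand-in carrying only the two fields the branch reads.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass
class SendResult:
    # Minimal stand-in for the adapter's SendResult (only the fields used here).
    success: bool
    error: Optional[str] = None


def direct_delivery_response(
    route: str,
    target: str,
    delivery_id: str,
    result: Optional[SendResult],
) -> Tuple[int, Dict[str, Any]]:
    """Map a deliver_only outcome to (HTTP status, JSON body).

    ``result=None`` models the case where _direct_deliver raised.
    """
    if result is None or not result.success:
        # The adapter-level error is logged server-side and never echoed back.
        return 502, {"status": "error", "error": "Delivery failed", "delivery_id": delivery_id}
    return 200, {"status": "delivered", "route": route, "target": target, "delivery_id": delivery_id}
```

Returning the same opaque body for "raised" and "rejected" keeps adapter internals (chat IDs, API error text) out of responses to external callers.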
@@ -572,6 +647,34 @@ class WebhookAdapter(BasePlatformAdapter):
    # Response delivery
    # ------------------------------------------------------------------

+    async def _direct_deliver(
+        self, content: str, delivery: dict
+    ) -> SendResult:
+        """Deliver *content* directly without invoking the agent.
+
+        Used by ``deliver_only`` routes: the rendered template becomes the
+        literal message body, and we dispatch to the same delivery helpers
+        that the agent-mode ``send()`` flow uses.  All target types that
+        work in agent mode work here — Telegram, Discord, Slack, GitHub
+        PR comments, etc.
+        """
+        deliver_type = delivery.get("deliver", "log")
+
+        if deliver_type == "log":
+            # Shouldn't reach here — startup validation rejects deliver_only
+            # with deliver=log — but guard defensively.
+            logger.info("[webhook] direct-deliver log-only: %s", content[:200])
+            return SendResult(success=True)
+
+        if deliver_type == "github_comment":
+            return await self._deliver_github_comment(content, delivery)
+
+        # Fall through to the cross-platform dispatcher, which validates the
+        # target name and routes via the gateway runner.
+        return await self._deliver_cross_platform(
+            deliver_type, content, delivery
+        )
+
    async def _deliver_github_comment(
        self, content: str, delivery: dict
    ) -> SendResult:
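`_direct_deliver` checks targets in a fixed order: `log` (defensive no-op), then `github_comment`, then everything else through the cross-platform dispatcher. The sketch below injects the two delivery helpers as callables so the routing can be exercised with fakes. The fakes and the injected-callable signature are assumptions for illustration, not the adapter's real interface.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional, Tuple


@dataclass
class SendResult:
    # Minimal stand-in for the adapter's SendResult.
    success: bool
    error: Optional[str] = None


async def direct_deliver(
    content: str,
    delivery: dict,
    github_comment: Callable[[str, dict], Awaitable[SendResult]],
    cross_platform: Callable[[str, str, dict], Awaitable[SendResult]],
) -> SendResult:
    """Same branch order as _direct_deliver, with the helpers injected."""
    deliver_type = delivery.get("deliver", "log")
    if deliver_type == "log":
        # Unreachable after startup validation; kept as a defensive no-op.
        return SendResult(success=True)
    if deliver_type == "github_comment":
        return await github_comment(content, delivery)
    # Everything else (telegram, discord, slack, ...) goes cross-platform.
    return await cross_platform(deliver_type, content, delivery)


calls: List[Tuple[str, str]] = []


async def fake_github(content: str, delivery: dict) -> SendResult:
    calls.append(("github_comment", content))
    return SendResult(success=True)


async def fake_cross(target: str, content: str, delivery: dict) -> SendResult:
    calls.append((target, content))
    return SendResult(success=False, error="unknown chat")


results = [
    asyncio.run(direct_deliver(msg, {"deliver": target}, fake_github, fake_cross))
    for target, msg in [("telegram", "deploy done"), ("github_comment", "LGTM"), ("log", "x")]
]
```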
@@ -289,33 +289,35 @@ class WhatsAppAdapter(BasePlatformAdapter):
        logger.info("[%s] Bridge found at %s", self.name, bridge_path)
        
        # Acquire scoped lock to prevent duplicate sessions
+        lock_acquired = False
        try:
            if not self._acquire_platform_lock('whatsapp-session', str(self._session_path), 'WhatsApp session'):
                return False
+            lock_acquired = True
        except Exception as e:
            logger.warning("[%s] Could not acquire session lock (non-fatal): %s", self.name, e)

-        # Auto-install npm dependencies if node_modules doesn't exist
-        bridge_dir = bridge_path.parent
-        if not (bridge_dir / "node_modules").exists():
-            print(f"[{self.name}] Installing WhatsApp bridge dependencies...")
-            try:
-                install_result = subprocess.run(
-                    ["npm", "install", "--silent"],
-                    cwd=str(bridge_dir),
-                    capture_output=True,
-                    text=True,
-                    timeout=60,
-                )
-                if install_result.returncode != 0:
-                    print(f"[{self.name}] npm install failed: {install_result.stderr}")
-                    return False
-                print(f"[{self.name}] Dependencies installed")
-            except Exception as e:
-                print(f"[{self.name}] Failed to install dependencies: {e}")
-                return False
-        
        try:
+            # Auto-install npm dependencies if node_modules doesn't exist
+            bridge_dir = bridge_path.parent
+            if not (bridge_dir / "node_modules").exists():
+                print(f"[{self.name}] Installing WhatsApp bridge dependencies...")
+                try:
+                    install_result = subprocess.run(
+                        ["npm", "install", "--silent"],
+                        cwd=str(bridge_dir),
+                        capture_output=True,
+                        text=True,
+                        timeout=60,
+                    )
+                    if install_result.returncode != 0:
+                        print(f"[{self.name}] npm install failed: {install_result.stderr}")
+                        return False
+                    print(f"[{self.name}] Dependencies installed")
+                except Exception as e:
+                    print(f"[{self.name}] Failed to install dependencies: {e}")
+                    return False
+
            # Ensure session directory exists
            self._session_path.mkdir(parents=True, exist_ok=True)
            
@@ -452,10 +454,13 @@ class WhatsAppAdapter(BasePlatformAdapter):
            return True
            
        except Exception as e:
-            self._release_platform_lock()
            logger.error("[%s] Failed to start bridge: %s", self.name, e, exc_info=True)
-            self._close_bridge_log()
            return False
+        finally:
+            if not self._running:
+                if lock_acquired:
+                    self._release_platform_lock()
+                self._close_bridge_log()
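Before this change, the `npm install` step ran outside the `try`, so a failed install returned `False` without releasing the session lock. Moving it inside the `try` and doing cleanup in `finally` (guarded by `not self._running` and the new `lock_acquired` flag) covers every early exit. The hypothetical `BridgeStartSketch` below reproduces only that control flow, recording cleanup steps in a list instead of touching real locks or log files.

```python
from typing import List


class BridgeStartSketch:
    """Hypothetical stand-in for the lock handling in WhatsAppAdapter.connect()."""

    def __init__(self, lock_free: bool, deps_ok: bool) -> None:
        self.lock_free = lock_free
        self.deps_ok = deps_ok
        self._running = False
        self.cleanup: List[str] = []

    def connect(self) -> bool:
        lock_acquired = False
        try:
            if not self.lock_free:
                return False  # another process holds the session
            lock_acquired = True
        except Exception:
            pass  # the real adapter logs this and carries on without the lock
        try:
            if not self.deps_ok:
                return False  # e.g. `npm install` failed
            self._running = True
            return True
        except Exception:
            return False
        finally:
            # Runs on every exit from the second try, including early returns.
            if not self._running:
                if lock_acquired:
                    self.cleanup.append("release_lock")
                self.cleanup.append("close_bridge_log")
```

Releasing only when `lock_acquired` is true avoids releasing a lock this process never took, e.g. when acquisition raised and was treated as non-fatal.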
    
    def _close_bridge_log(self) -> None:
        """Close the bridge log file handle if open."""
@@ -655,6 +660,8 @@ class WhatsAppAdapter(BasePlatformAdapter):
        chat_id: str,
        message_id: str,
        content: str,
+        *,
+        finalize: bool = False,
    ) -> SendResult:
        """Edit a previously sent message via the WhatsApp bridge."""
        if not self._running or not self._http_session:
@@ -766,6 +773,17 @@ class WhatsAppAdapter(BasePlatformAdapter):
        """Send a video natively via bridge — plays inline in WhatsApp."""
        return await self._send_media_to_bridge(chat_id, video_path, "video", caption)

+    async def send_voice(
+        self,
+        chat_id: str,
+        audio_path: str,
+        caption: Optional[str] = None,
+        reply_to: Optional[str] = None,
+        **kwargs,
+    ) -> SendResult:
+        """Send an audio file as a WhatsApp voice message via bridge."""
+        return await self._send_media_to_bridge(chat_id, audio_path, "audio", caption)
+
    async def send_document(
        self,
        chat_id: str,
@@ -96,6 +96,10 @@ from hermes_cli.env_loader import load_hermes_dotenv
 _env_path = _hermes_home / '.env'
 load_hermes_dotenv(hermes_home=_hermes_home, project_env=Path(__file__).resolve().parents[1] / '.env')

+
+_DOCKER_VOLUME_SPEC_RE = re.compile(r"^(?P<host>.+):(?P<container>/[^:]+?)(?::(?P<options>[^:]+))?$")
+_DOCKER_MEDIA_OUTPUT_CONTAINER_PATHS = {"/output", "/outputs"}
+
 # Bridge config.yaml values into the environment so os.getenv() picks them up.
 # config.yaml is authoritative for terminal settings — overrides .env.
 _config_path = _hermes_home / 'config.yaml'
@@ -398,6 +402,33 @@ def _dequeue_pending_event(adapter, session_key: str) -> MessageEvent | None:
    return adapter.get_pending_message(session_key)


+_INTERRUPT_REASON_STOP = "Stop requested"
+_INTERRUPT_REASON_RESET = "Session reset requested"
+_INTERRUPT_REASON_TIMEOUT = "Execution timed out (inactivity)"
+_INTERRUPT_REASON_SSE_DISCONNECT = "SSE client disconnected"
+_INTERRUPT_REASON_GATEWAY_SHUTDOWN = "Gateway shutting down"
+_INTERRUPT_REASON_GATEWAY_RESTART = "Gateway restarting"
+
+_CONTROL_INTERRUPT_MESSAGES = frozenset(
+    {
+        _INTERRUPT_REASON_STOP.lower(),
+        _INTERRUPT_REASON_RESET.lower(),
+        _INTERRUPT_REASON_TIMEOUT.lower(),
+        _INTERRUPT_REASON_SSE_DISCONNECT.lower(),
+        _INTERRUPT_REASON_GATEWAY_SHUTDOWN.lower(),
+        _INTERRUPT_REASON_GATEWAY_RESTART.lower(),
+    }
+)
+
+
+def _is_control_interrupt_message(message: Optional[str]) -> bool:
+    """Return True when an interrupt message is internal control flow."""
+    if not message:
+        return False
+    normalized = " ".join(str(message).strip().split()).lower()
+    return normalized in _CONTROL_INTERRUPT_MESSAGES
+
+
 def _check_unavailable_skill(command_name: str) -> str | None:
    """Check if a command matches a known-but-inactive skill.

@@ -585,6 +616,7 @@ class GatewayRunner:
    def __init__(self, config: Optional[GatewayConfig] = None):
        self.config = config or load_gateway_config()
        self.adapters: Dict[Platform, BasePlatformAdapter] = {}
+        self._warn_if_docker_media_delivery_is_risky()

        # Load ephemeral config from config.yaml / env vars.
        # Both are injected at API-call time only and never persisted.
@@ -597,7 +629,6 @@ class GatewayRunner:
        self._restart_drain_timeout = self._load_restart_drain_timeout()
        self._provider_routing = self._load_provider_routing()
        self._fallback_model = self._load_fallback_model()
-        self._smart_model_routing = self._load_smart_model_routing()

        # Wire process registry into session store for reset protection
        from tools.process_registry import process_registry
@@ -625,6 +656,7 @@ class GatewayRunner:
        self._running_agents_ts: Dict[str, float] = {}  # start timestamp per session
        self._pending_messages: Dict[str, str] = {}  # Queued messages during interrupt
        self._busy_ack_ts: Dict[str, float] = {}  # last busy-ack timestamp per session (debounce)
+        self._session_run_generation: Dict[str, int] = {}

        # Cache AIAgent instances per session to preserve prompt caching.
        # Without this, a new AIAgent is created per message, rebuilding the
@@ -691,6 +723,53 @@ class GatewayRunner:
        self._background_tasks: set = set()


+    def _warn_if_docker_media_delivery_is_risky(self) -> None:
+        """Warn when Docker-backed gateways lack an explicit export mount.
+
+        MEDIA delivery happens in the gateway process, so paths emitted by the model
+        must be readable from the host. A plain container-local path like
+        `/workspace/report.txt` or `/output/report.txt` often exists only inside
+        Docker, so users commonly need a dedicated export mount such as
+        `host-dir:/output`.
+        """
+        if os.getenv("TERMINAL_ENV", "").strip().lower() != "docker":
+            return
+
+        connected = self.config.get_connected_platforms()
+        messaging_platforms = [p for p in connected if p not in {Platform.LOCAL, Platform.API_SERVER, Platform.WEBHOOK}]
+        if not messaging_platforms:
+            return
+
+        raw_volumes = os.getenv("TERMINAL_DOCKER_VOLUMES", "").strip()
+        volumes: List[str] = []
+        if raw_volumes:
+            try:
+                parsed = json.loads(raw_volumes)
+                if isinstance(parsed, list):
+                    volumes = [str(v) for v in parsed if isinstance(v, str)]
+            except Exception:
+                logger.debug("Could not parse TERMINAL_DOCKER_VOLUMES for gateway media warning", exc_info=True)
+
+        has_explicit_output_mount = False
+        for spec in volumes:
+            match = _DOCKER_VOLUME_SPEC_RE.match(spec)
+            if not match:
+                continue
+            container_path = match.group("container")
+            if container_path in _DOCKER_MEDIA_OUTPUT_CONTAINER_PATHS:
+                has_explicit_output_mount = True
+                break
+
+        if has_explicit_output_mount:
+            return
+
+        logger.warning(
+            "Docker backend is enabled for the messaging gateway but no explicit host-visible "
+            "output mount (for example '/home/user/.hermes/cache/documents:/output') is configured. "
+            "This is fine if the model already emits host-visible paths, but MEDIA file delivery can fail "
+            "for container-local paths like '/workspace/...' or '/output/...'."
+        )
+


    # -- Setup skill availability ----------------------------------------
@@ -707,6 +786,10 @@ class GatewayRunner:

    _VOICE_MODE_PATH = _hermes_home / "gateway_voice_mode.json"

+    def _voice_key(self, platform: Platform, chat_id: str) -> str:
+        """Return a platform-namespaced key for voice mode state."""
+        return f"{platform.value}:{chat_id}"
+
    def _load_voice_modes(self) -> Dict[str, str]:
        try:
            data = json.loads(self._VOICE_MODE_PATH.read_text())
@@ -717,11 +800,21 @@ class GatewayRunner:
            return {}

        valid_modes = {"off", "voice_only", "all"}
-        return {
-            str(chat_id): mode
-            for chat_id, mode in data.items()
-            if mode in valid_modes
-        }
+        result = {}
+        for chat_id, mode in data.items():
+            if mode not in valid_modes:
+                continue
+            key = str(chat_id)
+            # Legacy keys had no platform prefix and cannot be mapped; warn and drop them.
+            if ":" not in key:
+                logger.warning(
+                    "Skipping legacy unprefixed voice mode key %r during migration. "
+                    "Re-enable voice mode on that chat to rebuild the prefixed key.",
+                    key,
+                )
+                continue
+            result[key] = mode
+        return result

    def _save_voice_modes(self) -> None:
        try:
@@ -747,9 +840,14 @@ class GatewayRunner:
        disabled_chats = getattr(adapter, "_auto_tts_disabled_chats", None)
        if not isinstance(disabled_chats, set):
            return
+        platform = getattr(adapter, "platform", None)
+        if not isinstance(platform, Platform):
+            return
        disabled_chats.clear()
+        prefix = f"{platform.value}:"
        disabled_chats.update(
-            chat_id for chat_id, mode in self._voice_mode.items() if mode == "off"
+            key[len(prefix):] for key, mode in self._voice_mode.items()
+            if mode == "off" and key.startswith(prefix)
        )
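Voice-mode state is now keyed as `"<platform>:<chat_id>"`, presumably so that the same chat ID on two platforms no longer shares one setting. The three hunks above (`_voice_key`, the legacy-key filter in `_load_voice_modes`, and the prefix strip when syncing `_auto_tts_disabled_chats`) are sketched below as plain functions over dicts. The function names are hypothetical, platforms are plain strings instead of the `Platform` enum, and legacy keys are dropped without the warning the real loader logs.

```python
from typing import Dict, Set

VALID_MODES = {"off", "voice_only", "all"}


def voice_key(platform_value: str, chat_id: str) -> str:
    return f"{platform_value}:{chat_id}"


def load_voice_modes(data: Dict[object, str]) -> Dict[str, str]:
    result: Dict[str, str] = {}
    for chat_id, mode in data.items():
        key = str(chat_id)
        if mode not in VALID_MODES or ":" not in key:
            continue  # invalid mode, or legacy key without a platform prefix
        result[key] = mode
    return result


def disabled_chats_for(platform_value: str, modes: Dict[str, str]) -> Set[str]:
    prefix = f"{platform_value}:"
    # Strip only the leading prefix so chat IDs that contain ':' survive intact.
    return {key[len(prefix):] for key, mode in modes.items()
            if mode == "off" and key.startswith(prefix)}
```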

    async def _safe_adapter_disconnect(self, adapter, platform) -> None:
@@ -1002,11 +1100,16 @@ class GatewayRunner:
        return model, runtime_kwargs

    def _resolve_turn_agent_config(self, user_message: str, model: str, runtime_kwargs: dict) -> dict:
-        from agent.smart_model_routing import resolve_turn_route
+        """Build the effective model/runtime config for a single turn.
+
+        Always uses the session's primary model/provider.  If `/fast` is
+        enabled and the model supports Priority Processing / Anthropic fast
+        mode, attach `request_overrides` so the API call is marked
+        accordingly.
+        """
        from hermes_cli.models import resolve_fast_mode_overrides

-        primary = {
-            "model": model,
+        runtime = {
            "api_key": runtime_kwargs.get("api_key"),
            "base_url": runtime_kwargs.get("base_url"),
            "provider": runtime_kwargs.get("provider"),
@@ -1015,7 +1118,18 @@ class GatewayRunner:
            "args": list(runtime_kwargs.get("args") or []),
            "credential_pool": runtime_kwargs.get("credential_pool"),
        }
-        route = resolve_turn_route(user_message, getattr(self, "_smart_model_routing", {}), primary)
+        route = {
+            "model": model,
+            "runtime": runtime,
+            "signature": (
+                model,
+                runtime["provider"],
+                runtime["base_url"],
+                runtime["api_mode"],
+                runtime["command"],
+                tuple(runtime["args"]),
+            ),
+        }

        service_tier = getattr(self, "_service_tier", None)
        if not service_tier:
@@ -1023,7 +1137,7 @@ class GatewayRunner:
            return route

        try:
-            overrides = resolve_fast_mode_overrides(route.get("model"))
+            overrides = resolve_fast_mode_overrides(route["model"])
        except Exception:
            overrides = None
        route["request_overrides"] = overrides
@@ -1381,20 +1495,6 @@ class GatewayRunner:
            pass
        return None

-    @staticmethod
-    def _load_smart_model_routing() -> dict:
-        """Load optional smart cheap-vs-strong model routing config."""
-        try:
-            import yaml as _y
-            cfg_path = _hermes_home / "config.yaml"
-            if cfg_path.exists():
-                with open(cfg_path, encoding="utf-8") as _f:
-                    cfg = _y.safe_load(_f) or {}
-                return cfg.get("smart_model_routing", {}) or {}
-        except Exception:
-            pass
-        return {}
-
    def _snapshot_running_agents(self) -> Dict[str, Any]:
        return {
            session_key: agent
@@ -1567,12 +1667,32 @@ class GatewayRunner:

        notified: set = set()
        for session_key in active:
-            # Parse platform + chat_id from the session key.
-            _parsed = _parse_session_key(session_key)
-            if not _parsed:
-                continue
-            platform_str = _parsed["platform"]
-            chat_id = _parsed["chat_id"]
+            source = None
+            try:
+                if getattr(self, "session_store", None) is not None:
+                    self.session_store._ensure_loaded()
+                    entry = self.session_store._entries.get(session_key)
+                    source = getattr(entry, "origin", None) if entry else None
+            except Exception as e:
+                logger.debug(
+                    "Failed to load session origin for shutdown notification %s: %s",
+                    session_key,
+                    e,
+                )
+
+            if source is not None:
+                platform_str = source.platform.value
+                chat_id = source.chat_id
+                thread_id = source.thread_id
+            else:
+                # Fall back to parsing the session key when no persisted
+                # origin is available (legacy sessions/tests).
+                _parsed = _parse_session_key(session_key)
+                if not _parsed:
+                    continue
+                platform_str = _parsed["platform"]
+                chat_id = _parsed["chat_id"]
+                thread_id = _parsed.get("thread_id")

            # Deduplicate: one notification per chat, even if multiple
            # sessions (different users/threads) share the same chat.
@@ -1588,7 +1708,6 @@ class GatewayRunner:

                # Include thread_id if present so the message lands in the
                # correct forum topic / thread.
-                thread_id = _parsed.get("thread_id")
                metadata = {"thread_id": thread_id} if thread_id else None

                await adapter.send(chat_id, msg, metadata=metadata)
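The shutdown notifier now prefers the origin persisted in the session store and only falls back to parsing the session key for legacy sessions. The sketch models that resolution plus the one-notification-per-chat dedup. The dedup key `(platform, chat_id)` is an assumption, since the diff elides the lines that build it, and `toy_parse_session_key` is a hypothetical parser for `platform:chat_id` strings, not the real `_parse_session_key` format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Tuple


@dataclass
class Origin:
    # Stand-in for the persisted session origin.
    platform: str
    chat_id: str
    thread_id: Optional[str] = None


Target = Tuple[str, str, Optional[str]]


def toy_parse_session_key(key: str) -> Optional[dict]:
    """Hypothetical parser for keys shaped 'platform:chat_id'; not the real format."""
    if ":" not in key:
        return None
    platform, chat_id = key.split(":", 1)
    return {"platform": platform, "chat_id": chat_id}


def shutdown_targets(
    session_keys: Iterable[str],
    origins: Dict[str, Origin],
    parse_key: Callable[[str], Optional[dict]],
) -> List[Target]:
    seen = set()
    targets: List[Target] = []
    for key in session_keys:
        origin = origins.get(key)
        if origin is not None:
            platform, chat_id, thread_id = origin.platform, origin.chat_id, origin.thread_id
        else:
            parsed = parse_key(key)  # legacy fallback
            if not parsed:
                continue
            platform, chat_id = parsed["platform"], parsed["chat_id"]
            thread_id = parsed.get("thread_id")
        if (platform, chat_id) in seen:  # one notification per chat
            continue
        seen.add((platform, chat_id))
        targets.append((platform, chat_id, thread_id))
    return targets
```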
@@ -2441,7 +2560,7 @@ class GatewayRunner:
                            _sk[:20], _e,
                        )
                self._interrupt_running_agents(
-                    "Gateway restarting" if self._restart_requested else "Gateway shutting down"
+                    _INTERRUPT_REASON_GATEWAY_RESTART if self._restart_requested else _INTERRUPT_REASON_GATEWAY_SHUTDOWN
                )
                interrupt_deadline = asyncio.get_running_loop().time() + 5.0
                while self._running_agents and asyncio.get_running_loop().time() < interrupt_deadline:
@@ -2862,10 +2981,59 @@ class GatewayRunner:
        return bool(check_ids & allowed_ids)

    def _get_unauthorized_dm_behavior(self, platform: Optional[Platform]) -> str:
-        """Return how unauthorized DMs should be handled for a platform."""
+        """Return how unauthorized DMs should be handled for a platform.
+
+        Resolution order:
+        1. Explicit per-platform ``unauthorized_dm_behavior`` in config — always wins.
+        2. Explicit global ``unauthorized_dm_behavior`` in config — wins when no per-platform.
+        3. When an allowlist (``PLATFORM_ALLOWED_USERS`` or ``GATEWAY_ALLOWED_USERS``) is
+           configured, default to ``"ignore"`` — the allowlist signals that the owner has
+           deliberately restricted access; spamming unknown contacts with pairing codes
+           is both noisy and a potential info-leak. (#9337)
+        4. No allowlist and no explicit config → ``"pair"`` (open-gateway default).
+        """
        config = getattr(self, "config", None)
-        if config and hasattr(config, "get_unauthorized_dm_behavior"):
-            return config.get_unauthorized_dm_behavior(platform)
+
+        # Check for an explicit per-platform override first.
+        if config and hasattr(config, "get_unauthorized_dm_behavior") and platform:
+            platform_cfg = config.platforms.get(platform) if hasattr(config, "platforms") else None
+            if platform_cfg and "unauthorized_dm_behavior" in getattr(platform_cfg, "extra", {}):
+                # Operator explicitly configured behavior for this platform — respect it.
+                return config.get_unauthorized_dm_behavior(platform)
+
+        # Check for an explicit global config override.
+        if config and hasattr(config, "unauthorized_dm_behavior"):
+            # A non-default value counts as an explicit override (an explicit
+            # global "pair" is indistinguishable from the default).
+            if config.unauthorized_dm_behavior != "pair":
+                return config.unauthorized_dm_behavior
+
+        # No explicit override.  Fall back to allowlist-aware default:
+        # if any allowlist is configured for this platform, silently drop
+        # unauthorized messages instead of sending pairing codes.
+        if platform:
+            platform_env_map = {
+                Platform.TELEGRAM: "TELEGRAM_ALLOWED_USERS",
+                Platform.DISCORD:  "DISCORD_ALLOWED_USERS",
+                Platform.WHATSAPP: "WHATSAPP_ALLOWED_USERS",
+                Platform.SLACK:    "SLACK_ALLOWED_USERS",
+                Platform.SIGNAL:   "SIGNAL_ALLOWED_USERS",
+                Platform.EMAIL:    "EMAIL_ALLOWED_USERS",
+                Platform.SMS:      "SMS_ALLOWED_USERS",
+                Platform.MATTERMOST: "MATTERMOST_ALLOWED_USERS",
+                Platform.MATRIX:   "MATRIX_ALLOWED_USERS",
+                Platform.DINGTALK: "DINGTALK_ALLOWED_USERS",
+                Platform.FEISHU:   "FEISHU_ALLOWED_USERS",
+                Platform.WECOM:    "WECOM_ALLOWED_USERS",
+                Platform.WECOM_CALLBACK: "WECOM_CALLBACK_ALLOWED_USERS",
+                Platform.WEIXIN:   "WEIXIN_ALLOWED_USERS",
+                Platform.BLUEBUBBLES: "BLUEBUBBLES_ALLOWED_USERS",
+                Platform.QQBOT:    "QQ_ALLOWED_USERS",
+            }
+            if os.getenv(platform_env_map.get(platform, ""), "").strip():
+                return "ignore"
+
+        if os.getenv("GATEWAY_ALLOWED_USERS", "").strip():
+            return "ignore"
+
        return "pair"
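The four-step resolution order in the new docstring can be written as a pure function, which makes the precedence easy to test. In this sketch, `resolve_unauthorized_dm_behavior` is a hypothetical name. `per_platform` stands for an explicit per-platform setting (`None` when absent), `platform_allowlist_env` is the platform's variable name from the map above, and `env` replaces `os.getenv`.

```python
from typing import Mapping, Optional


def resolve_unauthorized_dm_behavior(
    per_platform: Optional[str],
    global_setting: str,
    platform_allowlist_env: Optional[str],
    env: Mapping[str, str],
) -> str:
    # 1. An explicit per-platform setting always wins.
    if per_platform is not None:
        return per_platform
    # 2. A non-default global setting is treated as explicit.
    if global_setting != "pair":
        return global_setting
    # 3. Any non-blank allowlist means unknown senders are silently ignored.
    if platform_allowlist_env and env.get(platform_allowlist_env, "").strip():
        return "ignore"
    if env.get("GATEWAY_ALLOWED_USERS", "").strip():
        return "ignore"
    # 4. Open gateway: send pairing codes.
    return "pair"
```

Because step 2 keys off the value rather than its presence, an operator who wants pairing codes despite an allowlist has to set `unauthorized_dm_behavior: pair` per platform; a global `pair` looks the same as the default.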
    
    async def _handle_message(self, event: MessageEvent) -> Optional[str]:
@@ -3012,6 +3180,10 @@ class GatewayRunner:
                    _quick_key[:30], _stale_age, _stale_idle,
                    _raw_stale_timeout, _stale_detail,
                )
+                self._invalidate_session_run_generation(
+                    _quick_key,
+                    reason="stale_running_agent_eviction",
+                )
                self._release_running_agent_state(_quick_key)

        if _quick_key in self._running_agents:
@@ -3035,15 +3207,12 @@ class GatewayRunner:
            # _interrupt_requested.  Force-clean _running_agents so the session
            # is unlocked and subsequent messages are processed normally.
            if _cmd_def_inner and _cmd_def_inner.name == "stop":
-                running_agent = self._running_agents.get(_quick_key)
-                if running_agent and running_agent is not _AGENT_PENDING_SENTINEL:
-                    running_agent.interrupt("Stop requested")
-                # Force-clean: remove the session lock regardless of agent state
-                adapter = self.adapters.get(source.platform)
-                if adapter and hasattr(adapter, 'get_pending_message'):
-                    adapter.get_pending_message(_quick_key)  # consume and discard
-                self._pending_messages.pop(_quick_key, None)
-                self._release_running_agent_state(_quick_key)
+                await self._interrupt_and_clear_session(
+                    _quick_key,
+                    source,
+                    interrupt_reason=_INTERRUPT_REASON_STOP,
+                    invalidation_reason="stop_command",
+                )
                logger.info("STOP for session %s — agent interrupted, session lock released", _quick_key[:20])
                return "⚡ Stopped. You can continue this session."

@@ -3055,17 +3224,15 @@ class GatewayRunner:
            # doesn't get re-processed as a user message after the
            # interrupt completes.
            if _cmd_def_inner and _cmd_def_inner.name == "new":
-                running_agent = self._running_agents.get(_quick_key)
-                if running_agent and running_agent is not _AGENT_PENDING_SENTINEL:
-                    running_agent.interrupt("Session reset requested")
                # Clear any pending messages so the old text doesn't replay
-                adapter = self.adapters.get(source.platform)
-                if adapter and hasattr(adapter, 'get_pending_message'):
-                    adapter.get_pending_message(_quick_key)  # consume and discard
-                self._pending_messages.pop(_quick_key, None)
+                await self._interrupt_and_clear_session(
+                    _quick_key,
+                    source,
+                    interrupt_reason=_INTERRUPT_REASON_RESET,
+                    invalidation_reason="new_command",
+                )
                # Clean up the running agent entry so the reset handler
                # doesn't think an agent is still active.
-                self._release_running_agent_state(_quick_key)
                return await self._handle_reset_command(event)

            # /queue <prompt> — queue without interrupting
@@ -3156,6 +3323,20 @@ class GatewayRunner:
            if _cmd_def_inner and _cmd_def_inner.name == "background":
                return await self._handle_background_command(event)

+            # Session-level toggles that are safe to run mid-agent —
+            # /yolo can unblock a pending approval prompt, /verbose cycles
+            # the tool-progress display mode for the ongoing stream.
+            # Both modify session state without needing agent interaction
+            # and must not be queued (the safety net would discard them).
+            # /fast and /reasoning are config-only and take effect on the next
+            # message, so they fall through to the catch-all busy response
+            # below; users should wait and set them between turns.
+            if _cmd_def_inner and _cmd_def_inner.name in ("yolo", "verbose"):
+                if _cmd_def_inner.name == "yolo":
+                    return await self._handle_yolo_command(event)
+                if _cmd_def_inner.name == "verbose":
+                    return await self._handle_verbose_command(event)
+
            # Gateway-handled info/control commands with dedicated
            # running-agent handlers.
            if _cmd_def_inner and _cmd_def_inner.name in _DEDICATED_HANDLERS:
@@ -3546,9 +3727,10 @@ class GatewayRunner:
        # same session — corrupting the transcript.
        self._running_agents[_quick_key] = _AGENT_PENDING_SENTINEL
        self._running_agents_ts[_quick_key] = time.time()
+        _run_generation = self._begin_session_run_generation(_quick_key)

        try:
-            return await self._handle_message_with_agent(event, source, _quick_key)
+            return await self._handle_message_with_agent(event, source, _quick_key, _run_generation)
        finally:
            # If _run_agent replaced the sentinel with a real agent and
            # then cleaned it up, this is a no-op.  If we exited early
@@ -3719,7 +3901,7 @@ class GatewayRunner:

        return message_text

-    async def _handle_message_with_agent(self, event, source, _quick_key: str):
+    async def _handle_message_with_agent(self, event, source, _quick_key: str, run_generation: int):
        """Inner handler that runs under the _running_agents sentinel guard."""
        _msg_start_time = time.time()
        _platform_name = source.platform.value if hasattr(source.platform, "value") else str(source.platform)
@@ -4176,6 +4358,15 @@ class GatewayRunner:
        if message_text is None:
            return

+        # Bind this gateway run generation to the adapter's active-session
+        # event so deferred post-delivery callbacks can be released by the
+        # same run that registered them.
+        self._bind_adapter_run_generation(
+            self.adapters.get(source.platform),
+            session_key,
+            run_generation,
+        )
+
        try:
            # Emit agent:start hook
            hook_ctx = {
@@ -4194,6 +4385,7 @@ class GatewayRunner:
                source=source,
                session_id=session_entry.session_id,
                session_key=session_key,
+                run_generation=run_generation,
                event_message_id=event.message_id,
                channel_prompt=event.channel_prompt,
            )
@@ -4206,6 +4398,22 @@ class GatewayRunner:
            except Exception:
                pass

+            if not self._is_session_run_current(_quick_key, run_generation):
+                logger.info(
+                    "Discarding stale agent result for %s — generation %d is no longer current",
+                    _quick_key[:20] if _quick_key else "?",
+                    run_generation,
+                )
+                _stale_adapter = self.adapters.get(source.platform)
+                if getattr(type(_stale_adapter), "pop_post_delivery_callback", None) is not None:
+                    _stale_adapter.pop_post_delivery_callback(
+                        _quick_key,
+                        generation=run_generation,
+                    )
+                elif _stale_adapter and hasattr(_stale_adapter, "_post_delivery_callbacks"):
+                    _stale_adapter._post_delivery_callbacks.pop(_quick_key, None)
+                return None
+
            response = agent_result.get("final_response") or ""

            # Convert the agent's internal "(empty)" sentinel into a
@@ -4620,6 +4828,7 @@ class GatewayRunner:
        
        # Get existing session key
        session_key = self._session_key_for_source(source)
+        self._invalidate_session_run_generation(session_key, reason="session_reset")
        
        # Flush memories in the background (fire-and-forget) so the user
        # gets the "Session reset!" response immediately.
@@ -4879,14 +5088,23 @@ class GatewayRunner:
        agent = self._running_agents.get(session_key)
        if agent is _AGENT_PENDING_SENTINEL:
            # Force-clean the sentinel so the session is unlocked.
-            self._release_running_agent_state(session_key)
+            await self._interrupt_and_clear_session(
+                session_key,
+                source,
+                interrupt_reason=_INTERRUPT_REASON_STOP,
+                invalidation_reason="stop_command_pending",
+            )
            logger.info("STOP (pending) for session %s — sentinel cleared", session_key[:20])
            return "⚡ Stopped. The agent hadn't started yet — you can continue this session."
        if agent:
-            agent.interrupt("Stop requested")
            # Force-clean the session lock so a truly hung agent doesn't
            # keep it locked forever.
-            self._release_running_agent_state(session_key)
+            await self._interrupt_and_clear_session(
+                session_key,
+                source,
+                interrupt_reason=_INTERRUPT_REASON_STOP,
+                invalidation_reason="stop_command_handler",
+            )
            return "⚡ Stopped. You can continue this session."
        else:
            return "No active task to stop."
@@ -5664,11 +5882,13 @@ class GatewayRunner:
        """Handle /voice [on|off|tts|channel|leave|status] command."""
        args = event.get_command_args().strip().lower()
        chat_id = event.source.chat_id
+        platform = event.source.platform
+        voice_key = self._voice_key(platform, chat_id)

-        adapter = self.adapters.get(event.source.platform)
+        adapter = self.adapters.get(platform)

        if args in ("on", "enable"):
-            self._voice_mode[chat_id] = "voice_only"
+            self._voice_mode[voice_key] = "voice_only"
            self._save_voice_modes()
            if adapter:
                self._set_adapter_auto_tts_disabled(adapter, chat_id, disabled=False)
@@ -5678,13 +5898,13 @@ class GatewayRunner:
                "Use /voice tts to get voice replies for all messages."
            )
        elif args in ("off", "disable"):
-            self._voice_mode[chat_id] = "off"
+            self._voice_mode[voice_key] = "off"
            self._save_voice_modes()
            if adapter:
                self._set_adapter_auto_tts_disabled(adapter, chat_id, disabled=True)
            return "Voice mode disabled. Text-only replies."
        elif args == "tts":
-            self._voice_mode[chat_id] = "all"
+            self._voice_mode[voice_key] = "all"
            self._save_voice_modes()
            if adapter:
                self._set_adapter_auto_tts_disabled(adapter, chat_id, disabled=False)
@@ -5697,7 +5917,7 @@ class GatewayRunner:
        elif args == "leave":
            return await self._handle_voice_channel_leave(event)
        elif args == "status":
-            mode = self._voice_mode.get(chat_id, "off")
+            mode = self._voice_mode.get(voice_key, "off")
            labels = {
                "off": "Off (text only)",
                "voice_only": "On (voice reply to voice messages)",
@@ -5721,15 +5941,15 @@ class GatewayRunner:
            return f"Voice mode: {labels.get(mode, mode)}"
        else:
            # Toggle: off → on, on/all → off
-            current = self._voice_mode.get(chat_id, "off")
+            current = self._voice_mode.get(voice_key, "off")
            if current == "off":
-                self._voice_mode[chat_id] = "voice_only"
+                self._voice_mode[voice_key] = "voice_only"
                self._save_voice_modes()
                if adapter:
                    self._set_adapter_auto_tts_disabled(adapter, chat_id, disabled=False)
                return "Voice mode enabled."
            else:
-                self._voice_mode[chat_id] = "off"
+                self._voice_mode[voice_key] = "off"
                self._save_voice_modes()
                if adapter:
                    self._set_adapter_auto_tts_disabled(adapter, chat_id, disabled=True)
@@ -5775,7 +5995,7 @@ class GatewayRunner:
            adapter._voice_text_channels[guild_id] = int(event.source.chat_id)
            if hasattr(adapter, "_voice_sources"):
                adapter._voice_sources[guild_id] = event.source.to_dict()
-            self._voice_mode[event.source.chat_id] = "all"
+            self._voice_mode[self._voice_key(event.source.platform, event.source.chat_id)] = "all"
            self._save_voice_modes()
            self._set_adapter_auto_tts_disabled(adapter, event.source.chat_id, disabled=False)
            return (
@@ -5802,7 +6022,7 @@ class GatewayRunner:
        except Exception as e:
            logger.warning("Error leaving voice channel: %s", e)
        # Always clean up state even if leave raised an exception
-        self._voice_mode[event.source.chat_id] = "off"
+        self._voice_mode[self._voice_key(event.source.platform, event.source.chat_id)] = "off"
        self._save_voice_modes()
        self._set_adapter_auto_tts_disabled(adapter, event.source.chat_id, disabled=True)
        if hasattr(adapter, "_voice_input_callback"):
@@ -5814,7 +6034,7 @@ class GatewayRunner:

        Cleans up runner-side voice_mode state that the adapter cannot reach.
        """
-        self._voice_mode[chat_id] = "off"
+        self._voice_mode[self._voice_key(Platform.DISCORD, chat_id)] = "off"
        self._save_voice_modes()
        adapter = self.adapters.get(Platform.DISCORD)
        self._set_adapter_auto_tts_disabled(adapter, chat_id, disabled=True)
@@ -5900,7 +6120,7 @@ class GatewayRunner:
            return False

        chat_id = event.source.chat_id
-        voice_mode = self._voice_mode.get(chat_id, "off")
+        voice_mode = self._voice_mode.get(self._voice_key(event.source.platform, chat_id), "off")
        is_voice_input = (event.message_type == MessageType.VOICE)

        should = (
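The voice-mode hunks above stop keying `_voice_mode` by `chat_id` alone and use `self._voice_key(platform, chat_id)` instead. This excerpt does not show how `_voice_key` builds its key. The sketch below assumes a `"platform:chat_id"` string, and `Platform` here is a stand-in enum, not the gateway's real one. It shows why scoping the key by platform matters: the same chat ID on two platforms no longer shares a voice setting.

```python
from enum import Enum


class Platform(Enum):
    # Hypothetical stand-in for the gateway's Platform enum.
    TELEGRAM = "telegram"
    DISCORD = "discord"


def voice_key(platform, chat_id) -> str:
    """Build a platform-scoped key so equal chat IDs on different platforms don't collide."""
    # Accept an Enum member or a plain string, mirroring the
    # `platform.value if hasattr(platform, "value") else str(platform)` pattern in the gateway.
    name = platform.value if hasattr(platform, "value") else str(platform)
    return f"{name}:{chat_id}"


voice_mode: dict = {}
voice_mode[voice_key(Platform.TELEGRAM, "12345")] = "voice_only"
# The same chat ID on Discord is now an independent entry.
print(voice_mode.get(voice_key(Platform.DISCORD, "12345"), "off"))  # → off
```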
@@ -8333,6 +8553,84 @@ class GatewayRunner:
        if hasattr(self, "_busy_ack_ts"):
            self._busy_ack_ts.pop(session_key, None)

+    def _begin_session_run_generation(self, session_key: str) -> int:
+        """Claim a fresh run generation token for ``session_key``.
+
+        Every top-level gateway turn gets a monotonically increasing token.
+        If a later command like /stop or /new invalidates that token while the
+        old worker is still unwinding, the late result can be recognized and
+        dropped instead of bleeding into the fresh session.
+        """
+        if not session_key:
+            return 0
+        generations = self.__dict__.get("_session_run_generation")
+        if generations is None:
+            generations = {}
+            self._session_run_generation = generations
+        next_generation = int(generations.get(session_key, 0)) + 1
+        generations[session_key] = next_generation
+        return next_generation
+
+    def _invalidate_session_run_generation(self, session_key: str, *, reason: str = "") -> int:
+        """Invalidate any in-flight run token for ``session_key``."""
+        generation = self._begin_session_run_generation(session_key)
+        if reason:
+            logger.info(
+                "Invalidated run generation for %s → %d (%s)",
+                session_key[:20],
+                generation,
+                reason,
+            )
+        return generation
+
+    def _is_session_run_current(self, session_key: str, generation: int) -> bool:
+        """Return True when ``generation`` is still current for ``session_key``."""
+        if not session_key:
+            return True
+        generations = self.__dict__.get("_session_run_generation") or {}
+        return int(generations.get(session_key, 0)) == int(generation)
+
+    def _bind_adapter_run_generation(
+        self,
+        adapter: Any,
+        session_key: str,
+        generation: int | None,
+    ) -> None:
+        """Bind a gateway run generation to the adapter's active-session event."""
+        if not adapter or not session_key or generation is None:
+            return
+        try:
+            interrupt_event = getattr(adapter, "_active_sessions", {}).get(session_key)
+            if interrupt_event is not None:
+                setattr(interrupt_event, "_hermes_run_generation", int(generation))
+        except Exception:
+            pass
+
+    async def _interrupt_and_clear_session(
+        self,
+        session_key: str,
+        source: SessionSource,
+        *,
+        interrupt_reason: str,
+        invalidation_reason: str,
+        release_running_state: bool = True,
+    ) -> None:
+        """Interrupt the current run and clear queued session state consistently."""
+        if not session_key:
+            return
+        running_agent = self._running_agents.get(session_key)
+        if running_agent and running_agent is not _AGENT_PENDING_SENTINEL:
+            running_agent.interrupt(interrupt_reason)
+        self._invalidate_session_run_generation(session_key, reason=invalidation_reason)
+        adapter = self.adapters.get(source.platform)
+        if adapter and hasattr(adapter, "interrupt_session_activity"):
+            await adapter.interrupt_session_activity(session_key, source.chat_id)
+        if adapter and hasattr(adapter, "get_pending_message"):
+            adapter.get_pending_message(session_key)  # consume and discard
+        self._pending_messages.pop(session_key, None)
+        if release_running_state:
+            self._release_running_agent_state(session_key)
+
    def _evict_cached_agent(self, session_key: str) -> None:
        """Remove a cached agent for a session (called on /new, /model, etc)."""
        _lock = getattr(self, "_agent_cache_lock", None)
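`_begin_session_run_generation`, `_invalidate_session_run_generation` and `_is_session_run_current` above implement a per-session generation counter. Each turn claims the next token. `/stop` and `/new` bump the counter, and a worker whose token no longer matches knows its result is stale. The sketch below pulls that logic into a standalone class; `RunGenerations` and the `"telegram:42"` session key are hypothetical names used only for illustration.

```python
class RunGenerations:
    """Hypothetical standalone version of the per-session run-generation counter."""

    def __init__(self) -> None:
        self._generations: dict = {}

    def begin(self, session_key: str) -> int:
        # Each new turn bumps the counter; an empty key opts out (token 0).
        if not session_key:
            return 0
        nxt = self._generations.get(session_key, 0) + 1
        self._generations[session_key] = nxt
        return nxt

    def invalidate(self, session_key: str) -> int:
        # Invalidation is a "begin" with no worker attached: whoever holds
        # the previous token is now stale.
        return self.begin(session_key)

    def is_current(self, session_key: str, generation: int) -> bool:
        if not session_key:
            return True
        return self._generations.get(session_key, 0) == generation


gens = RunGenerations()
turn = gens.begin("telegram:42")   # worker starts with generation 1
gens.invalidate("telegram:42")     # /stop arrives while that worker is still unwinding
print(gens.is_current("telegram:42", turn))  # → False, so the late result is dropped
```

Reusing "begin" for invalidation keeps the counter strictly monotonic. A token can therefore never become current again after it has been superseded.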
@@ -8514,6 +8812,7 @@ class GatewayRunner:
        source: "SessionSource",
        session_id: str,
        session_key: str = None,
+        run_generation: Optional[int] = None,
        event_message_id: Optional[str] = None,
    ) -> Dict[str, Any]:
        """Forward the message to a remote Hermes API server instead of
@@ -8549,6 +8848,11 @@ class GatewayRunner:

        proxy_key = os.getenv("GATEWAY_PROXY_KEY", "").strip()

+        def _run_still_current() -> bool:
+            if run_generation is None or not session_key:
+                return True
+            return self._is_session_run_current(session_key, run_generation)
+
        # Build messages in OpenAI chat format --------------------------
        #
        # The remote api_server can maintain session continuity via
@@ -8678,6 +8982,21 @@ class GatewayRunner:
                    # Parse SSE stream
                    buffer = ""
                    async for chunk in resp.content.iter_any():
+                        if not _run_still_current():
+                            logger.info(
+                                "Discarding stale proxy stream for %s — generation %d is no longer current",
+                                session_key[:20] if session_key else "?",
+                                run_generation or 0,
+                            )
+                            return {
+                                "final_response": "",
+                                "messages": [],
+                                "api_calls": 0,
+                                "tools": [],
+                                "history_offset": len(history),
+                                "session_id": session_id,
+                                "response_previewed": False,
+                            }
                        text = chunk.decode("utf-8", errors="replace")
                        buffer += text

@@ -8727,6 +9046,21 @@ class GatewayRunner:
                    stream_task.cancel()

        _elapsed = time.time() - _start
+        if not _run_still_current():
+            logger.info(
+                "Discarding stale proxy result for %s — generation %d is no longer current",
+                session_key[:20] if session_key else "?",
+                run_generation or 0,
+            )
+            return {
+                "final_response": "",
+                "messages": [],
+                "api_calls": 0,
+                "tools": [],
+                "history_offset": len(history),
+                "session_id": session_id,
+                "response_previewed": False,
+            }
        logger.info(
            "proxy response: url=%s session=%s time=%.1fs response=%d chars",
            proxy_url, (session_id or "")[:20], _elapsed, len(full_response),
@@ -8755,6 +9089,7 @@ class GatewayRunner:
        source: SessionSource,
        session_id: str,
        session_key: str = None,
+        run_generation: Optional[int] = None,
        _interrupt_depth: int = 0,
        event_message_id: Optional[str] = None,
        channel_prompt: Optional[str] = None,
@@ -8780,11 +9115,17 @@ class GatewayRunner:
                source=source,
                session_id=session_id,
                session_key=session_key,
+                run_generation=run_generation,
                event_message_id=event_message_id,
            )

        from run_agent import AIAgent
        import queue
+
+        def _run_still_current() -> bool:
+            if run_generation is None or not session_key:
+                return True
+            return self._is_session_run_current(session_key, run_generation)
        
        user_config = _load_gateway_config()
        platform_key = _platform_config_key(source.platform)
@@ -8839,7 +9180,7 @@ class GatewayRunner:
        
        def progress_callback(event_type: str, tool_name: str = None, preview: str = None, args: dict = None, **kwargs):
            """Callback invoked by agent on tool lifecycle events."""
-            if not progress_queue:
+            if not progress_queue or not _run_still_current():
                return

            # Only act on tool.started events (ignore tool.completed, reasoning.available, etc.)
@@ -8944,6 +9285,14 @@ class GatewayRunner:

            while True:
                try:
+                    if not _run_still_current():
+                        while not progress_queue.empty():
+                            try:
+                                progress_queue.get_nowait()
+                            except Exception:
+                                break
+                        return
+
                    raw = progress_queue.get_nowait()

                    # Handle dedup messages: update last line with repeat counter
@@ -8969,6 +9318,9 @@ class GatewayRunner:
                        await asyncio.sleep(_remaining)
                        continue

+                    if not _run_still_current():
+                        return
+
                    if can_edit and progress_msg_id is not None:
                        # Try to edit the existing progress message
                        full_text = "\n".join(progress_lines)
@@ -9004,7 +9356,8 @@ class GatewayRunner:

                    # Restore typing indicator
                    await asyncio.sleep(0.3)
-                    await adapter.send_typing(source.chat_id, metadata=_progress_metadata)
+                    if _run_still_current():
+                        await adapter.send_typing(source.chat_id, metadata=_progress_metadata)

                except queue.Empty:
                    await asyncio.sleep(0.3)
@@ -9048,6 +9401,8 @@ class GatewayRunner:
        _hooks_ref = self.hooks

        def _step_callback_sync(iteration: int, prev_tools: list) -> None:
+            if not _run_still_current():
+                return
            try:
                # prev_tools may be list[str] or list[dict] with "name"/"result"
                # keys.  Normalise to keep "tool_names" backward-compatible for
@@ -9078,7 +9433,7 @@ class GatewayRunner:
        _status_thread_metadata = {"thread_id": _progress_thread_id} if _progress_thread_id else None

        def _status_callback_sync(event_type: str, message: str) -> None:
-            if not _status_adapter:
+            if not _status_adapter or not _run_still_current():
                return
            try:
                asyncio.run_coroutine_threadsafe(
@@ -9209,12 +9564,16 @@ class GatewayRunner:
                            metadata={"thread_id": _progress_thread_id} if _progress_thread_id else None,
                        )
                        if _want_stream_deltas:
-                            _stream_delta_cb = _stream_consumer.on_delta
+                            def _stream_delta_cb(text: str) -> None:
+                                if _run_still_current():
+                                    _stream_consumer.on_delta(text)
                        stream_consumer_holder[0] = _stream_consumer
                except Exception as _sc_err:
                    logger.debug("Could not set up stream consumer: %s", _sc_err)

            def _interim_assistant_cb(text: str, *, already_streamed: bool = False) -> None:
+                if not _run_still_current():
+                    return
                if _stream_consumer is not None:
                    if already_streamed:
                        _stream_consumer.on_segment_break()
@@ -9318,7 +9677,7 @@ class GatewayRunner:
            _bg_review_pending_lock = threading.Lock()

            def _deliver_bg_review_message(message: str) -> None:
-                if not _status_adapter:
+                if not _status_adapter or not _run_still_current():
                    return
                try:
                    asyncio.run_coroutine_threadsafe(
@@ -9342,7 +9701,7 @@ class GatewayRunner:

            # Background review delivery — send "💾 Memory updated" etc. to user
            def _bg_review_send(message: str) -> None:
-                if not _status_adapter:
+                if not _status_adapter or not _run_still_current():
                    return
                if not _bg_review_release.is_set():
                    with _bg_review_pending_lock:
@@ -9355,9 +9714,16 @@ class GatewayRunner:
            # Register the release hook on the adapter so base.py's finally
            # block can fire it after delivering the main response.
            if _status_adapter and session_key:
-                _pdc = getattr(_status_adapter, "_post_delivery_callbacks", None)
-                if _pdc is not None:
-                    _pdc[session_key] = _release_bg_review_messages
+                if getattr(type(_status_adapter), "register_post_delivery_callback", None) is not None:
+                    _status_adapter.register_post_delivery_callback(
+                        session_key,
+                        _release_bg_review_messages,
+                        generation=run_generation,
+                    )
+                else:
+                    _pdc = getattr(_status_adapter, "_post_delivery_callbacks", None)
+                    if _pdc is not None:
+                        _pdc[session_key] = _release_bg_review_messages

            # Store agent reference for interrupt support
            agent_holder[0] = agent
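The `_run_agent` hunks above add a `_run_still_current()` check to every outward-facing callback: progress, step, status, stream deltas, interim assistant text and background-review delivery. Once the run is stale, each callback silently does nothing. The diff inlines these checks. As an alternative sketch, the same guard can be written as one wrapper; `guard_callback` is a hypothetical helper and does not appear in the codebase.

```python
from typing import Any, Callable, Optional


def guard_callback(
    cb: Callable[..., Any],
    still_current: Callable[[], bool],
) -> Callable[..., Optional[Any]]:
    """Wrap ``cb`` so it silently becomes a no-op once the run is stale."""
    def wrapper(*args: Any, **kwargs: Any) -> Optional[Any]:
        if not still_current():
            return None
        return cb(*args, **kwargs)
    return wrapper


delivered: list = []
state = {"current": True}
on_delta = guard_callback(delivered.append, lambda: state["current"])

on_delta("Hello")
state["current"] = False   # e.g. /new invalidated this run's generation
on_delta(" world")         # dropped
print(delivered)           # → ['Hello']
```

The check runs on every call rather than once at wrap time. A run that becomes stale partway through stops emitting immediately.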
@@ -9959,7 +10325,7 @@ class GatewayRunner:
                # Interrupt the agent if it's still running so the thread
                # pool worker is freed.
                if _timed_out_agent and hasattr(_timed_out_agent, "interrupt"):
-                    _timed_out_agent.interrupt("Execution timed out (inactivity)")
+                    _timed_out_agent.interrupt(_INTERRUPT_REASON_TIMEOUT)

                _timeout_mins = int(_agent_timeout // 60) or 1

@@ -10024,11 +10390,29 @@ class GatewayRunner:
            if result and adapter and session_key:
                pending_event = _dequeue_pending_event(adapter, session_key)
                if result.get("interrupted") and not pending_event and result.get("interrupt_message"):
-                    pending = result.get("interrupt_message")
+                    interrupt_message = result.get("interrupt_message")
+                    if _is_control_interrupt_message(interrupt_message):
+                        logger.info(
+                            "Ignoring control interrupt message for session %s: %s",
+                            session_key[:20] if session_key else "?",
+                            interrupt_message,
+                        )
+                    else:
+                        pending = interrupt_message
                elif pending_event:
                    pending = pending_event.text or _build_media_placeholder(pending_event)
                    logger.debug("Processing queued message after agent completion: '%s...'", pending[:40])

+            # Leftover /steer: if a steer arrived after the last tool batch
+            # (e.g. during the final API call), the agent couldn't inject it
+            # and returned it in result["pending_steer"]. Deliver it as the
+            # next user turn so it isn't silently dropped.
+            if result and not pending and not pending_event:
+                _leftover_steer = result.get("pending_steer")
+                if _leftover_steer:
+                    pending = _leftover_steer
+                    logger.debug("Delivering leftover /steer as next turn: '%s...'", pending[:40])
+
            # Safety net: if the pending text is a slash command (e.g. "/stop",
            # "/new"), discard it — commands should never be passed to the agent
            # as user input.  The primary fix is in base.py (commands bypass the
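The hunk above sets a precedence for choosing the next turn after a run finishes. A queued message wins. Otherwise a user-authored interrupt message is used, but control interrupts are ignored. Otherwise a leftover `/steer` is delivered. The sketch below restates that order as a pure function. `next_turn_text` is hypothetical, and `CONTROL_INTERRUPT_REASONS` is an assumption: the diff swaps the literals `"Stop requested"` and `"Execution timed out (inactivity)"` for `_INTERRUPT_REASON_STOP` and `_INTERRUPT_REASON_TIMEOUT`, but the constants' values and `_is_control_interrupt_message` itself are not shown.

```python
from typing import Any, Dict, Optional

# Assumed values; the real constants are defined outside this excerpt.
CONTROL_INTERRUPT_REASONS = {"Stop requested", "Execution timed out (inactivity)"}


def next_turn_text(result: Optional[Dict[str, Any]], queued_text: Optional[str]) -> Optional[str]:
    """Pick the text for the follow-up turn after an agent run finishes."""
    if not result:
        return None
    # 1. A message queued while the agent was busy always wins.
    if queued_text:
        return queued_text
    # 2. A user-authored interrupt message becomes the next turn; control reasons do not.
    if result.get("interrupted"):
        message = result.get("interrupt_message")
        if message and message not in CONTROL_INTERRUPT_REASONS:
            return message
    # 3. Otherwise deliver a /steer that arrived too late to be injected.
    return result.get("pending_steer") or None


print(next_turn_text(
    {"interrupted": True, "interrupt_message": "Stop requested", "pending_steer": "use tabs"},
    None,
))  # → use tabs
```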
@@ -10129,7 +10513,17 @@ class GatewayRunner:
                    # first response has been delivered.  Pop from the
                    # adapter's callback dict (prevents double-fire in
                    # base.py's finally block) and call it.
-                    if adapter and hasattr(adapter, "_post_delivery_callbacks"):
+                    if getattr(type(adapter), "pop_post_delivery_callback", None) is not None:
+                        _bg_cb = adapter.pop_post_delivery_callback(
+                            session_key,
+                            generation=run_generation,
+                        )
+                        if callable(_bg_cb):
+                            try:
+                                _bg_cb()
+                            except Exception:
+                                pass
+                    elif adapter and hasattr(adapter, "_post_delivery_callbacks"):
                        _bg_cb = adapter._post_delivery_callbacks.pop(session_key, None)
                        if callable(_bg_cb):
                            try:
@@ -10177,6 +10571,7 @@ class GatewayRunner:
                    source=next_source,
                    session_id=session_id,
                    session_key=session_key,
+                    run_generation=run_generation,
                    _interrupt_depth=_interrupt_depth + 1,
                    event_message_id=next_message_id,
                    channel_prompt=next_channel_prompt,
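Where the adapter class defines them, the gateway hunks above now call `register_post_delivery_callback(..., generation=...)` and `pop_post_delivery_callback(..., generation=...)`. Otherwise they fall back to the raw `_post_delivery_callbacks` dict. The adapter-side implementation is not part of this excerpt. The sketch below assumes one plausible behavior: a pop from a stale generation leaves the newer run's callback in place. `PostDeliveryCallbacks` is a hypothetical class.

```python
from typing import Callable, Dict, Optional, Tuple

Callback = Callable[[], None]


class PostDeliveryCallbacks:
    """Hypothetical generation-aware registry of post-delivery callbacks."""

    def __init__(self) -> None:
        # session_key -> (generation that registered it, callback)
        self._callbacks: Dict[str, Tuple[Optional[int], Callback]] = {}

    def register(self, session_key: str, cb: Callback, *, generation: Optional[int] = None) -> None:
        self._callbacks[session_key] = (generation, cb)

    def pop(self, session_key: str, *, generation: Optional[int] = None) -> Optional[Callback]:
        entry = self._callbacks.get(session_key)
        if entry is None:
            return None
        owner, cb = entry
        # A stale run must neither fire nor discard a callback registered by a newer run.
        if generation is not None and owner is not None and owner != generation:
            return None
        del self._callbacks[session_key]
        return cb


registry = PostDeliveryCallbacks()
registry.register("session-a", lambda: print("Memory updated"), generation=2)
print(registry.pop("session-a", generation=1))  # → None (generation 1 is stale; entry kept)
cb = registry.pop("session-a", generation=2)
if callable(cb):
    cb()  # prints "Memory updated"
```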
@@ -926,12 +926,18 @@ class SessionStore:
                    continue
                # Never prune sessions with an active background process
                # attached — the user may still be waiting on output.
+                # The callback is keyed by session_key (see process_registry.
+                # has_active_for_session); passing session_id here used to
+                # never match, so active sessions got pruned anyway.
                if self._has_active_processes_fn is not None:
                    try:
-                        if self._has_active_processes_fn(entry.session_id):
+                        if self._has_active_processes_fn(entry.session_key):
                            continue
-                    except Exception:
-                        pass
+                    except Exception as exc:
+                        logger.debug(
+                            "has_active_processes_fn raised during prune for %s: %s",
+                            entry.session_key, exc,
+                        )
                if entry.updated_at < cutoff:
                    removed_keys.append(key)
            for key in removed_keys:
@@ -571,6 +571,30 @@ class GatewayStreamConsumer:
            if final_text.strip() and final_text != self._visible_prefix():
                continuation = final_text
            else:
+                # Defence-in-depth for #7183: the last edit may still show the
+                # cursor character because fallback mode was entered after an
+                # edit failure left it stuck.  Try one final edit to strip it
+                # so the message doesn't freeze with a visible ▉.  Best-effort
+                # — if this edit also fails (flood control still active),
+                # _try_strip_cursor has already been called on fallback entry
+                # and the adaptive-backoff retries will have had their shot.
+                if (
+                    self._message_id
+                    and self._last_sent_text
+                    and self.cfg.cursor
+                    and self._last_sent_text.endswith(self.cfg.cursor)
+                ):
+                    clean_text = self._last_sent_text[:-len(self.cfg.cursor)]
+                    try:
+                        result = await self.adapter.edit_message(
+                            chat_id=self.chat_id,
+                            message_id=self._message_id,
+                            content=clean_text,
+                        )
+                        if result.success:
+                            self._last_sent_text = clean_text
+                    except Exception:
+                        pass
                self._already_sent = True
                self._final_response_sent = True
                return
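The `#7183` hunk above attempts a final edit only when four conditions hold: a message ID exists, the last sent text is known, a cursor is configured, and that text ends with the cursor. The sketch below isolates the string part of that check as a hypothetical `strip_trailing_cursor` helper. It also shows why the `self.cfg.cursor` truthiness test matters.

```python
from typing import Optional


def strip_trailing_cursor(text: Optional[str], cursor: Optional[str]) -> Optional[str]:
    """Return ``text`` minus a trailing cursor glyph, or None when no edit is needed."""
    # An empty cursor must be rejected explicitly: text[:-len("")] is text[:0],
    # which would replace the whole message with an empty string.
    if not text or not cursor or not text.endswith(cursor):
        return None
    return text[: -len(cursor)]


print(repr(strip_trailing_cursor("Partial answer ▉", "▉")))  # → 'Partial answer '
print(strip_trailing_cursor("Done.", "▉"))                    # → None
```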
@@ -20,6 +20,7 @@ import logging
 import os
 import shutil
 import shlex
+import ssl
 import stat
 import base64
 import hashlib
@@ -151,7 +152,7 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
        id="gemini",
        name="Google AI Studio",
        auth_type="api_key",
-        inference_base_url="https://generativelanguage.googleapis.com/v1beta/openai",
+        inference_base_url="https://generativelanguage.googleapis.com/v1beta",
        api_key_env_vars=("GOOGLE_API_KEY", "GEMINI_API_KEY"),
        base_url_env_var="GEMINI_BASE_URL",
    ),
@@ -353,6 +354,9 @@ def _resolve_kimi_base_url(api_key: str, default_url: str, env_override: str) ->
    """
    if env_override:
        return env_override
+    # No key → nothing to infer from.  Return default without inspecting.
+    if not api_key:
+        return default_url
    if api_key.startswith("sk-kimi-"):
        return KIMI_CODE_BASE_URL
    return default_url
@@ -480,6 +484,14 @@ def _resolve_zai_base_url(api_key: str, default_url: str, env_override: str) ->
    if env_override:
        return env_override

+    # No API key set → don't probe (would fire N×M HTTPS requests with an
+    # empty Bearer token, all returning 401).  This path is hit during
+    # auxiliary-client auto-detection when the user has no Z.AI credentials
+    # at all — the caller discards the result immediately, so the probe is
+    # pure latency for every AIAgent construction.
+    if not api_key:
+        return default_url
+
    # Check provider-state cache for a previously-detected endpoint.
    auth_store = _load_auth_store()
    state = _load_provider_state(auth_store, "zai") or {}
@@ -1652,7 +1664,7 @@ def _resolve_verify(
    insecure: Optional[bool] = None,
    ca_bundle: Optional[str] = None,
    auth_state: Optional[Dict[str, Any]] = None,
-) -> bool | str:
+) -> bool | ssl.SSLContext:
    tls_state = auth_state.get("tls") if isinstance(auth_state, dict) else {}
    tls_state = tls_state if isinstance(tls_state, dict) else {}

@@ -1672,13 +1684,12 @@ def _resolve_verify(
    if effective_ca:
        ca_path = str(effective_ca)
        if not os.path.isfile(ca_path):
-            import logging
-            logging.getLogger("hermes.auth").warning(
+            logger.warning(
                "CA bundle path does not exist: %s — falling back to default certificates",
                ca_path,
            )
            return True
-        return ca_path
+        return ssl.create_default_context(cafile=ca_path)
    return True
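With this change, `_resolve_verify` returns an `ssl.SSLContext` built from the CA bundle instead of the bare path string. A missing file still falls back to `True` (default certificates). The sketch below is a simplified stand-in. The `insecure → False` branch is an assumption, because that part of the function lies outside the hunk, and `auth_state` handling is omitted.

```python
import logging
import os
import ssl
from typing import Optional, Union

logger = logging.getLogger(__name__)


def resolve_verify(ca_bundle: Optional[str] = None, insecure: bool = False) -> Union[bool, ssl.SSLContext]:
    """Simplified stand-in for _resolve_verify: False, True, or an SSLContext."""
    if insecure:
        # Assumed behavior: disable certificate verification entirely.
        return False
    if ca_bundle:
        if not os.path.isfile(ca_bundle):
            logger.warning("CA bundle path does not exist: %s; falling back to default certificates", ca_bundle)
            return True
        # Load the custom CA file into a context instead of returning the path.
        return ssl.create_default_context(cafile=ca_bundle)
    return True
```

HTTP clients such as httpx accept an `SSLContext` for their `verify` argument. Which client this auth code passes the value to is not visible here. One side effect of building the context eagerly: a file that exists but is not a valid PEM bundle will likely raise `ssl.SSLError` at resolution time rather than on the first request.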


@@ -2721,6 +2732,17 @@ def _update_config_for_provider(
        # Clear stale base_url to prevent contamination when switching providers
        model_cfg.pop("base_url", None)

+    # Clear stale api_key/api_mode left over from a previous custom provider.
+    # When the user switches from e.g. a MiniMax custom endpoint
+    # (api_mode=anthropic_messages, api_key=mxp-...) to a built-in provider
+    # (e.g. OpenRouter), the stale api_key/api_mode would override the new
+    # provider's credentials and transport choice.  Built-in providers that
+    # need a specific api_mode (copilot, xai) set it at request-resolution
+    # time via `_copilot_runtime_api_mode` / `_detect_api_mode_for_url`, so
+    # removing the persisted value here is safe.
+    model_cfg.pop("api_key", None)
+    model_cfg.pop("api_mode", None)
+
    # When switching to a non-OpenRouter provider, ensure model.default is
    # valid for the new provider.  An OpenRouter-formatted name like
    # "anthropic/claude-opus-4.6" will fail on direct-API providers.
@@ -201,7 +201,7 @@ def run_backup(args) -> None:
                else:
                    zf.write(abs_path, arcname=str(rel_path))
                    total_bytes += abs_path.stat().st_size
-            except (PermissionError, OSError) as exc:
+            except (PermissionError, OSError, ValueError) as exc:
                errors.append(f"  {rel_path}: {exc}")
                continue

@@ -24,7 +24,6 @@ _FORWARD_COMPAT_TEMPLATE_MODELS: List[tuple[str, tuple[str, ...]]] = [
    ("gpt-5.4-mini", ("gpt-5.3-codex", "gpt-5.2-codex")),
    ("gpt-5.4", ("gpt-5.3-codex", "gpt-5.2-codex")),
    ("gpt-5.3-codex", ("gpt-5.2-codex",)),
-    ("gpt-5.3-codex-spark", ("gpt-5.3-codex", "gpt-5.2-codex")),
 ]


@@ -497,9 +497,8 @@ def _collect_gateway_skill_entries(
    # --- Tier 1: Plugin slash commands (never trimmed) ---------------------
    plugin_pairs: list[tuple[str, str]] = []
    try:
-        from hermes_cli.plugins import get_plugin_manager
-        pm = get_plugin_manager()
-        plugin_cmds = getattr(pm, "_plugin_commands", {})
+        from hermes_cli.plugins import get_plugin_commands
+        plugin_cmds = get_plugin_commands()
        for cmd_name in sorted(plugin_cmds):
            name = sanitize_name(cmd_name) if sanitize_name else cmd_name
            if not name:
@@ -13,6 +13,7 @@ This module provides:
 """

 import copy
+import logging
 import os
 import platform
 import re
@@ -24,6 +25,7 @@ from dataclasses import dataclass
 from pathlib import Path
 from typing import Dict, Any, Optional, List, Tuple

+logger = logging.getLogger(__name__)

 _IS_WINDOWS = platform.system() == "Windows"
 _ENV_VAR_NAME_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
@@ -403,7 +405,11 @@ DEFAULT_CONFIG = {
        "container_persistent": True,   # Persist filesystem across sessions
        # Docker volume mounts — share host directories with the container.
        # Each entry is "host_path:container_path" (standard Docker -v syntax).
-        # Example: ["/home/user/projects:/workspace/projects", "/data:/data"]
+        # Example:
+        # ["/home/user/projects:/workspace/projects",
+        #  "/home/user/.hermes/cache/documents:/output"]
+        # For gateway MEDIA delivery, write inside Docker to /output/... and emit
+        # the host-visible path in MEDIA:, not the container path.
        "docker_volumes": [],
        # Explicit opt-in: mount the host cwd into /workspace for Docker sessions.
        # Default off because passing host directories into a sandbox weakens isolation.
@@ -470,13 +476,6 @@ DEFAULT_CONFIG = {
        },
    },

-    "smart_model_routing": {
-        "enabled": False,
-        "max_simple_chars": 160,
-        "max_simple_words": 28,
-        "cheap_model": {},
-    },
-    
    # Auxiliary model config — provider:model for each side task.
    # Format: provider is the provider name, model is the model slug.
    # "auto" for provider = auto-detect best available provider.
@@ -490,6 +489,7 @@ DEFAULT_CONFIG = {
            "base_url": "",        # direct OpenAI-compatible endpoint (takes precedence over provider)
            "api_key": "",         # API key for base_url (falls back to OPENAI_API_KEY)
            "timeout": 120,        # seconds — LLM API call timeout; vision payloads need generous timeout
+            "extra_body": {},      # OpenAI-compatible provider-specific request fields
            "download_timeout": 30,  # seconds — image HTTP download timeout; increase for slow connections
        },
        "web_extract": {
@@ -498,6 +498,7 @@ DEFAULT_CONFIG = {
            "base_url": "",
            "api_key": "",
            "timeout": 360,        # seconds (6min) — per-attempt LLM summarization timeout; increase for slow local models
+            "extra_body": {},
        },
        "compression": {
            "provider": "auto",
@@ -505,6 +506,7 @@ DEFAULT_CONFIG = {
            "base_url": "",
            "api_key": "",
            "timeout": 120,        # seconds — compression summarises large contexts; increase for local models
+            "extra_body": {},
        },
        "session_search": {
            "provider": "auto",
@@ -512,6 +514,8 @@ DEFAULT_CONFIG = {
            "base_url": "",
            "api_key": "",
            "timeout": 30,
+            "extra_body": {},
+            "max_concurrency": 3,  # Clamp parallel summaries to avoid request-burst 429s on small providers
        },
        "skills_hub": {
            "provider": "auto",
@@ -519,6 +523,7 @@ DEFAULT_CONFIG = {
            "base_url": "",
            "api_key": "",
            "timeout": 30,
+            "extra_body": {},
        },
        "approval": {
            "provider": "auto",
@@ -526,6 +531,7 @@ DEFAULT_CONFIG = {
            "base_url": "",
            "api_key": "",
            "timeout": 30,
+            "extra_body": {},
        },
        "mcp": {
            "provider": "auto",
@@ -533,6 +539,7 @@ DEFAULT_CONFIG = {
            "base_url": "",
            "api_key": "",
            "timeout": 30,
+            "extra_body": {},
        },
        "flush_memories": {
            "provider": "auto",
@@ -540,6 +547,7 @@ DEFAULT_CONFIG = {
            "base_url": "",
            "api_key": "",
            "timeout": 30,
+            "extra_body": {},
        },
        "title_generation": {
            "provider": "auto",
@@ -547,6 +555,7 @@ DEFAULT_CONFIG = {
            "base_url": "",
            "api_key": "",
            "timeout": 30,
+            "extra_body": {},
        },
    },
    
@@ -558,9 +567,14 @@ DEFAULT_CONFIG = {
        "bell_on_complete": False,
        "show_reasoning": False,
        "streaming": False,
+        "final_response_markdown": "strip",  # render | strip | raw
        "inline_diffs": True,     # Show inline diff previews for write actions (write_file, patch, skill_manage)
        "show_cost": False,       # Show $ cost in the status bar (off by default)
        "skin": "default",
+        "user_message_preview": {  # CLI: how many submitted user-message lines to echo back in scrollback
+            "first_lines": 2,
+            "last_lines": 2,
+        },
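The `user_message_preview` block above only sets how many head and tail lines to echo. Below is a minimal sketch of that kind of head/tail preview. It assumes an elision marker between the two parts. `preview_lines` and the marker text are hypothetical; this is not the CLI's actual renderer.

```python
from typing import List


def preview_lines(message: str, first_lines: int = 2, last_lines: int = 2) -> List[str]:
    """Hypothetical head/tail preview: keep the first and last lines of a
    long message and replace the hidden middle with one marker line."""
    lines = message.splitlines()
    # Short messages are echoed in full.
    if len(lines) <= first_lines + last_lines:
        return lines
    hidden = len(lines) - first_lines - last_lines
    # Guard against lines[-0:], which would return the whole list.
    tail = lines[-last_lines:] if last_lines > 0 else []
    return lines[:first_lines] + [f"... ({hidden} more line(s))"] + tail
```

With the defaults (2/2), a seven-line message shows lines 1-2, a marker for the three hidden lines, and lines 6-7.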
        "interim_assistant_messages": True,  # Gateway: show natural mid-turn assistant status messages
        "tool_progress_command": False,  # Enable /verbose command in messaging gateway
        "tool_progress_overrides": {},  # DEPRECATED — use display.platforms instead
@@ -708,6 +722,14 @@ DEFAULT_CONFIG = {
        "auto_thread": True,           # Auto-create threads on @mention in channels (like Slack)
        "reactions": True,             # Add 👀/✅/❌ reactions to messages during processing
        "channel_prompts": {},         # Per-channel ephemeral system prompts (forum parents apply to child threads)
+        # discord_server tool: restrict which actions the agent may call.
+        # Default (empty) = all actions allowed (subject to bot privileged intents).
+        # Accepts comma-separated string ("list_guilds,list_channels,fetch_messages")
+        # or YAML list. Unknown names are dropped with a warning at load time.
+        # Actions: list_guilds, server_info, list_channels, channel_info,
+        # list_roles, member_info, search_members, fetch_messages, list_pins,
+        # pin_message, unpin_message, create_thread, add_role, remove_role.
+        "server_actions": "",
    },
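The `server_actions` comment describes the accepted input shapes but not how they are parsed. Here is a rough sketch under stated assumptions. `parse_server_actions` and `KNOWN_ACTIONS` are hypothetical names. Treating an empty value as "all known actions" follows the comment above, not the shipped loader.

```python
import logging
from typing import Any, FrozenSet, List

logger = logging.getLogger(__name__)

# Action names listed in the config comment above.
KNOWN_ACTIONS: FrozenSet[str] = frozenset({
    "list_guilds", "server_info", "list_channels", "channel_info",
    "list_roles", "member_info", "search_members", "fetch_messages",
    "list_pins", "pin_message", "unpin_message", "create_thread",
    "add_role", "remove_role",
})


def parse_server_actions(raw: Any) -> FrozenSet[str]:
    """Hypothetical parser for discord.server_actions.

    Accepts a comma-separated string or a list. An empty value allows every
    known action. Unknown names are dropped with a warning.
    """
    if isinstance(raw, str):
        items: List[str] = [p.strip() for p in raw.split(",")]
    elif isinstance(raw, (list, tuple)):
        items = [str(p).strip() for p in raw]
    else:
        items = []
    items = [p for p in items if p]
    if not items:
        return KNOWN_ACTIONS
    unknown = sorted(set(items) - KNOWN_ACTIONS)
    if unknown:
        logger.warning("discord.server_actions: dropping unknown actions: %s", ", ".join(unknown))
    return frozenset(items) & KNOWN_ACTIONS
```

One design choice to flag: in this sketch, a list that contains only unknown names yields an empty set, which blocks every action. The real loader may handle that case differently.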

    # WhatsApp platform settings (gateway mode)
@@ -807,7 +829,7 @@ DEFAULT_CONFIG = {
    },

    # Config schema version - bump this when adding new required fields
-    "_config_version": 19,
+    "_config_version": 21,
 }

 # =============================================================================
@@ -1830,12 +1852,53 @@ def _normalize_custom_provider_entry(
    if not isinstance(entry, dict):
        return None

+    # Accept camelCase aliases commonly used in hand-written configs.
+    _CAMEL_ALIASES: Dict[str, str] = {
+        "apiKey": "api_key",
+        "baseUrl": "base_url",
+        "apiMode": "api_mode",
+        "keyEnv": "key_env",
+        "defaultModel": "default_model",
+        "contextLength": "context_length",
+        "rateLimitDelay": "rate_limit_delay",
+    }
+    _KNOWN_KEYS = {
+        "name", "api", "url", "base_url", "api_key", "key_env",
+        "api_mode", "transport", "model", "default_model", "models",
+        "context_length", "rate_limit_delay",
+    }
+    for camel, snake in _CAMEL_ALIASES.items():
+        if camel in entry and snake not in entry:
+            logger.warning(
+                "providers.%s: camelCase key '%s' auto-mapped to '%s' "
+                "(use snake_case to avoid this warning)",
+                provider_key or "?", camel, snake,
+            )
+            entry[snake] = entry[camel]
+    unknown = set(entry.keys()) - _KNOWN_KEYS - set(_CAMEL_ALIASES.keys())
+    if unknown:
+        logger.warning(
+            "providers.%s: unknown config keys ignored: %s",
+            provider_key or "?", ", ".join(sorted(unknown)),
+        )
+
+    from urllib.parse import urlparse
+
    base_url = ""
-    for url_key in ("api", "url", "base_url"):
+    for url_key in ("base_url", "url", "api"):
        raw_url = entry.get(url_key)
        if isinstance(raw_url, str) and raw_url.strip():
-            base_url = raw_url.strip()
-            break
+            candidate = raw_url.strip()
+            parsed = urlparse(candidate)
+            if parsed.scheme and parsed.netloc:
+                base_url = candidate
+                break
+            else:
+                logger.warning(
+                    "providers.%s: '%s' value '%s' is not a valid URL "
+                    "(no scheme or host) — skipped",
+                    provider_key or "?", url_key, candidate,
+                )
    if not base_url:
        return None
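The alias mapping and URL check in this hunk can be condensed into a standalone sketch. `normalize_entry` is a hypothetical helper, and its alias table is only a subset of `_CAMEL_ALIASES`. The warnings that the real function logs are left out.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlparse

# Illustrative subset of the camelCase -> snake_case aliases in the hunk.
CAMEL_ALIASES: Dict[str, str] = {
    "apiKey": "api_key",
    "baseUrl": "base_url",
    "defaultModel": "default_model",
}


def normalize_entry(entry: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Hypothetical helper: copy camelCase aliases onto snake_case keys,
    then keep the first well-formed URL in base_url > url > api order."""
    out = dict(entry)
    for camel, snake in CAMEL_ALIASES.items():
        # An explicit snake_case key wins when both spellings are present.
        if camel in out and snake not in out:
            out[snake] = out[camel]
    for key in ("base_url", "url", "api"):
        raw = out.get(key)
        if isinstance(raw, str) and raw.strip():
            parsed = urlparse(raw.strip())
            # A usable endpoint needs both a scheme and a host.
            if parsed.scheme and parsed.netloc:
                out["base_url"] = raw.strip()
                return out
    return None
```

Because `base_url` is tried first, a snake_case URL takes precedence over the legacy `url`/`api` keys. A scheme-less value such as `localhost:8000` is rejected rather than used silently.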

@@ -2464,6 +2527,72 @@ def migrate_config(interactive: bool = True, quiet: bool = False) -> Dict[str, A
                    else:
                        print("  ✓ Removed unused compression.summary_* keys")

+    # ── Version 20 → 21: plugins are now opt-in; grandfather existing user plugins ──
+    # The loader now requires plugins to appear in ``plugins.enabled`` before
+    # loading. Existing installs had all discovered plugins loading by default
+    # (minus anything in ``plugins.disabled``). To avoid silently breaking
+    # those setups on upgrade, populate ``plugins.enabled`` with the set of
+    # currently-installed user plugins that aren't already disabled.
+    #
+    # Bundled plugins (shipped in the repo itself) are NOT grandfathered:
+    # they ship disabled for everyone, including existing users, so anyone
+    # who wants one has to opt in explicitly.
+    if current_ver < 21:
+        config = read_raw_config()
+        plugins_cfg = config.get("plugins")
+        if not isinstance(plugins_cfg, dict):
+            plugins_cfg = {}
+        # Only migrate if the enabled allow-list hasn't been set yet.
+        if "enabled" not in plugins_cfg:
+            disabled = plugins_cfg.get("disabled", []) or []
+            if not isinstance(disabled, list):
+                disabled = []
+            disabled_set = set(disabled)
+
+            # Scan ``$HERMES_HOME/plugins/`` for currently installed user plugins.
+            grandfathered: List[str] = []
+            try:
+                from hermes_constants import get_hermes_home as _ghome
+                user_plugins_dir = _ghome() / "plugins"
+                if user_plugins_dir.is_dir():
+                    for child in sorted(user_plugins_dir.iterdir()):
+                        if not child.is_dir():
+                            continue
+                        manifest_file = child / "plugin.yaml"
+                        if not manifest_file.exists():
+                            manifest_file = child / "plugin.yml"
+                        if not manifest_file.exists():
+                            continue
+                        try:
+                            with open(manifest_file) as _mf:
+                                manifest = yaml.safe_load(_mf) or {}
+                        except Exception:
+                            manifest = {}
+                        name = manifest.get("name") or child.name
+                        if name in disabled_set:
+                            continue
+                        grandfathered.append(name)
+            except Exception:
+                grandfathered = []
+
+            plugins_cfg["enabled"] = grandfathered
+            config["plugins"] = plugins_cfg
+            save_config(config)
+            results["config_added"].append(
+                f"plugins.enabled (opt-in allow-list, {len(grandfathered)} grandfathered)"
+            )
+            if not quiet:
+                if grandfathered:
+                    print(
+                        f"  ✓ Plugins now opt-in: grandfathered "
+                        f"{len(grandfathered)} existing plugin(s) into plugins.enabled"
+                    )
+                else:
+                    print(
+                        "  ✓ Plugins now opt-in: no existing plugins to grandfather. "
+                        "Use `hermes plugins enable <name>` to activate."
+                    )
+
    if current_ver < latest_ver and not quiet:
        print(f"Config version: {current_ver} → {latest_ver}")
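The v20 → v21 migration seeds `plugins.enabled` from whatever is already installed. Here is a simplified sketch of that scan. `grandfather_plugins` is hypothetical. To stay stdlib-only, it uses the directory name as the plugin name, whereas the hunk prefers the manifest's YAML `name` field.

```python
from pathlib import Path
from typing import Iterable, List


def grandfather_plugins(plugins_dir: Path, disabled: Iterable[str]) -> List[str]:
    """Hypothetical sketch of the allow-list seeding.

    Every subdirectory that has a plugin.yaml or plugin.yml manifest and is
    not already disabled is carried forward. Unlike the real migration, the
    name is simply the directory name (no YAML parsing).
    """
    disabled_set = set(disabled)
    enabled: List[str] = []
    if not plugins_dir.is_dir():
        return enabled
    for child in sorted(plugins_dir.iterdir()):
        if not child.is_dir():
            continue
        has_manifest = (child / "plugin.yaml").exists() or (child / "plugin.yml").exists()
        if has_manifest and child.name not in disabled_set:
            enabled.append(child.name)
    return enabled
```

A missing plugins directory produces an empty allow-list. This matches the "no existing plugins to grandfather" message in the hunk.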
    
@@ -2866,19 +2995,6 @@ _FALLBACK_COMMENT = """
 # fallback_model:
 #   provider: openrouter
 #   model: anthropic/claude-sonnet-4
-#
-# ── Smart Model Routing ────────────────────────────────────────────────
-# Optional cheap-vs-strong routing for simple turns.
-# Keeps the primary model for complex work, but can route short/simple
-# messages to a cheaper model across providers.
-#
-# smart_model_routing:
-#   enabled: true
-#   max_simple_chars: 160
-#   max_simple_words: 28
-#   cheap_model:
-#     provider: openrouter
-#     model: google/gemini-2.5-flash
 """


@@ -2910,19 +3026,6 @@ _COMMENTED_SECTIONS = """
 # fallback_model:
 #   provider: openrouter
 #   model: anthropic/claude-sonnet-4
-#
-# ── Smart Model Routing ────────────────────────────────────────────────
-# Optional cheap-vs-strong routing for simple turns.
-# Keeps the primary model for complex work, but can route short/simple
-# messages to a cheaper model across providers.
-#
-# smart_model_routing:
-#   enabled: true
-#   max_simple_chars: 160
-#   max_simple_words: 28
-#   cheap_model:
-#     provider: openrouter
-#     model: google/gemini-2.5-flash
 """


@@ -3385,6 +3488,10 @@ def show_config():
    print(f"  Personality:  {display.get('personality', 'kawaii')}")
    print(f"  Reasoning:    {'on' if display.get('show_reasoning', False) else 'off'}")
    print(f"  Bell:         {'on' if display.get('bell_on_complete', False) else 'off'}")
+    ump = display.get('user_message_preview')
+    if not isinstance(ump, dict):
+        ump = {}
+    ump_first = ump.get('first_lines', 2)
+    ump_last = ump.get('last_lines', 2)
+    print(f"  User preview: first {ump_first} line(s), last {ump_last} line(s)")

    # Terminal
    print()
@@ -277,6 +277,86 @@ def run_doctor(args):
    config_path = HERMES_HOME / 'config.yaml'
    if config_path.exists():
        check_ok(f"{_DHH}/config.yaml exists")
+
+        # Validate model.provider and model.default values
+        try:
+            import yaml as _yaml
+            cfg = _yaml.safe_load(config_path.read_text(encoding="utf-8")) or {}
+            model_section = cfg.get("model") or {}
+            provider_raw = (model_section.get("provider") or "").strip()
+            provider = provider_raw.lower()
+            default_model = (model_section.get("default") or model_section.get("model") or "").strip()
+
+            known_providers: set = set()
+            try:
+                from hermes_cli.auth import PROVIDER_REGISTRY
+                known_providers = set(PROVIDER_REGISTRY.keys()) | {"openrouter", "custom", "auto"}
+            except Exception:
+                pass
+            try:
+                from hermes_cli.auth import resolve_provider as _resolve_provider
+            except Exception:
+                _resolve_provider = None
+
+            canonical_provider = provider
+            if provider and _resolve_provider is not None and provider != "auto":
+                try:
+                    canonical_provider = _resolve_provider(provider)
+                except Exception:
+                    canonical_provider = None
+
+            if provider and provider != "auto":
+                if canonical_provider is None or (known_providers and canonical_provider not in known_providers):
+                    known_list = ", ".join(sorted(known_providers)) if known_providers else "(unavailable)"
+                    check_fail(
+                        f"model.provider '{provider_raw}' is not a recognised provider",
+                        f"(known: {known_list})",
+                    )
+                    issues.append(
+                        f"model.provider '{provider_raw}' is unknown. "
+                        f"Valid providers: {known_list}. "
+                        f"Fix: run 'hermes config set model.provider <valid_provider>'"
+                    )
+
+            # Warn if model.default is a vendor-prefixed slug (vendor/model) but the provider is a direct API that doesn't use them
+            if default_model and "/" in default_model and canonical_provider and canonical_provider not in ("openrouter", "custom", "auto", "ai-gateway", "kilocode", "opencode-zen", "huggingface", "nous"):
+                check_warn(
+                    f"model.default '{default_model}' uses a vendor/model slug but provider is '{provider_raw}'",
+                    "(vendor-prefixed slugs belong to aggregators like openrouter)",
+                )
+                issues.append(
+                    f"model.default '{default_model}' is vendor-prefixed but model.provider is '{provider_raw}'. "
+                    "Either set model.provider to 'openrouter', or drop the vendor prefix."
+                )
+
+            # Check credentials for the configured provider.
+            # Limit to API-key providers in PROVIDER_REGISTRY — other provider
+            # types (OAuth, SDK, openrouter/anthropic/custom/auto) have their
+            # own env-var checks elsewhere in doctor, and get_auth_status()
+            # returns a bare {logged_in: False} for anything it doesn't
+            # explicitly dispatch, which would produce false positives.
+            if canonical_provider and canonical_provider not in ("auto", "custom", "openrouter"):
+                try:
+                    from hermes_cli.auth import PROVIDER_REGISTRY, get_auth_status
+                    pconfig = PROVIDER_REGISTRY.get(canonical_provider)
+                    if pconfig and getattr(pconfig, "auth_type", "") == "api_key":
+                        status = get_auth_status(canonical_provider) or {}
+                        configured = bool(status.get("configured") or status.get("logged_in") or status.get("api_key"))
+                        if not configured:
+                            check_fail(
+                                f"model.provider '{canonical_provider}' is set but no API key is configured",
+                                f"(check {_DHH}/.env or run 'hermes setup')",
+                            )
+                            issues.append(
+                                f"No credentials found for provider '{canonical_provider}'. "
+                                f"Run 'hermes setup' or set the provider's API key in {_DHH}/.env, "
+                                f"or switch providers with 'hermes config set model.provider <name>'"
+                            )
+                except Exception:
+                    pass
+
+        except Exception as e:
+            check_warn("Could not validate model/provider config", f"({e})")
    else:
        fallback_config = PROJECT_ROOT / 'cli-config.yaml'
        if fallback_config.exists():
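The doctor's vendor-prefix check reduces to a small predicate. In this sketch, `vendor_prefix_warning` is a hypothetical helper. The exempt set copies the aggregator list from the condition above.

```python
from typing import Optional

# Providers the hunk exempts from the warning (aggregators and routers
# that expect "vendor/model" slugs).
SLUG_PREFIX_OK = frozenset({
    "openrouter", "custom", "auto", "ai-gateway",
    "kilocode", "opencode-zen", "huggingface", "nous",
})


def vendor_prefix_warning(provider: Optional[str], model: str) -> Optional[str]:
    """Return a warning when a direct-API provider is paired with a
    vendor-prefixed slug such as "anthropic/claude-opus-4.6", else None."""
    if not provider or not model or "/" not in model:
        return None
    if provider in SLUG_PREFIX_OK:
        return None
    return (
        f"model.default '{model}' is vendor-prefixed but model.provider is "
        f"'{provider}'; set the provider to 'openrouter' or drop the prefix"
    )
```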
@@ -778,6 +858,16 @@ def run_doctor(args):
            elif response.status_code == 401:
                print(f"\r  {color('✗', Colors.RED)} OpenRouter API {color('(invalid API key)', Colors.DIM)}                ")
                issues.append("Check OPENROUTER_API_KEY in .env")
+            elif response.status_code == 402:
+                print(f"\r  {color('✗', Colors.RED)} OpenRouter API {color('(out of credits — payment required)', Colors.DIM)}")
+                issues.append(
+                    "OpenRouter account has insufficient credits. "
+                    "Fix: run 'hermes config set model.provider <provider>' to switch providers, "
+                    "or fund your OpenRouter account at https://openrouter.ai/settings/credits"
+                )
+            elif response.status_code == 429:
+                print(f"\r  {color('✗', Colors.RED)} OpenRouter API {color('(rate limited)', Colors.DIM)}                ")
+                issues.append("OpenRouter rate limit hit — consider switching to a different provider or waiting")
            else:
                print(f"\r  {color('✗', Colors.RED)} OpenRouter API {color(f'(HTTP {response.status_code})', Colors.DIM)}                ")
        except Exception as e:
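The new 402/429 branches sit alongside the existing 401 branch. As a table-driven sketch covering only the error statuses visible in this hunk, with a hypothetical `classify_openrouter_error` and paraphrased hint texts:

```python
from typing import Optional, Tuple


def classify_openrouter_error(status: int) -> Tuple[str, Optional[str]]:
    """Map a non-success status from the OpenRouter probe to a short label
    and an optional remediation hint, mirroring the branches in this hunk."""
    hints = {
        401: ("invalid API key", "Check OPENROUTER_API_KEY in .env"),
        402: ("out of credits", "Fund the account or switch providers"),
        429: ("rate limited", "Wait, or switch to a different provider"),
    }
    # Unlisted statuses are reported verbatim, with no specific fix.
    return hints.get(status, (f"HTTP {status}", None))
```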
@@ -160,7 +160,6 @@ def _config_overrides(config: dict) -> dict[str, str]:
        ("display", "streaming"),
        ("display", "skin"),
        ("display", "show_reasoning"),
-        ("smart_model_routing", "enabled"),
        ("privacy", "redact_pii"),
        ("tts", "provider"),
    ]
@@ -693,6 +693,10 @@ def _resolve_session_by_name_or_id(name_or_id: str) -> Optional[str]:
    - If it looks like a session ID (contains underscore + hex), try direct lookup first.
    - Otherwise, treat it as a title and use resolve_session_by_title (auto-latest).
    - Falls back to the other method if the first doesn't match.
+    - If the resolved session is a compression root, follow the chain forward
+      to the latest continuation. Users who remember the old root ID (e.g.
+      from an exit summary printed before the bug fix, or from notes) get
+      resumed at the live tip instead of a stale parent with no messages.
    """
    try:
        from hermes_state import SessionDB
@@ -701,14 +705,23 @@ def _resolve_session_by_name_or_id(name_or_id: str) -> Optional[str]:

        # Try as exact session ID first
        session = db.get_session(name_or_id)
+        resolved_id: Optional[str] = None
        if session:
-            db.close()
-            return session["id"]
+            resolved_id = session["id"]
+        else:
+            # Try as title (with auto-latest for lineage)
+            resolved_id = db.resolve_session_by_title(name_or_id)
+
+        if resolved_id:
+            # Project forward through compression chain so resumes land on
+            # the live tip instead of a dead compressed parent.
+            try:
+                resolved_id = db.get_compression_tip(resolved_id) or resolved_id
+            except Exception:
+                pass

-        # Try as title (with auto-latest for lineage)
-        session_id = db.resolve_session_by_title(name_or_id)
        db.close()
-        return session_id
+        return resolved_id
    except Exception:
        pass
    return None
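Projecting a resolved ID forward to the "live tip" is a walk along continuation links. This sketch assumes a simple parent → continuation mapping. How `db.get_compression_tip()` actually stores the chain is not shown in this diff, and both helper names are hypothetical.

```python
from typing import Dict, Optional, Set


def compression_tip(session_id: str, continuations: Dict[str, str]) -> str:
    """Follow parent -> continuation links until a session has no successor.

    Stand-in for db.get_compression_tip(). The visited set stops the walk
    if the links ever form a cycle.
    """
    current = session_id
    seen = {current}
    while current in continuations:
        nxt = continuations[current]
        if nxt in seen:
            break
        seen.add(nxt)
        current = nxt
    return current


def resolve_session(name_or_id: str, session_ids: Set[str],
                    titles: Dict[str, str],
                    continuations: Dict[str, str]) -> Optional[str]:
    # Exact ID first, then title lookup; either way, project forward to the tip.
    resolved = name_or_id if name_or_id in session_ids else titles.get(name_or_id)
    if resolved is None:
        return None
    return compression_tip(resolved, continuations)
```

With links `root → c1 → c2`, resuming by the old `root` ID (or by its title) lands on `c2`.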
@@ -2351,7 +2364,7 @@ def _model_flow_google_gemini_cli(_config, current_model=""):
        return

    models = list(_PROVIDER_MODELS.get("google-gemini-cli") or [])
-    default = current_model or (models[0] if models else "gemini-2.5-flash")
+    default = current_model or (models[0] if models else "gemini-3-flash-preview")
    selected = _prompt_model_selection(models, current_model=default)
    if selected:
        _save_model_choice(selected)
@@ -7002,6 +7015,13 @@ For more help on a command:
    wh_sub.add_argument(
        "--secret", default="", help="HMAC secret (auto-generated if omitted)"
    )
+    wh_sub.add_argument(
+        "--deliver-only",
+        action="store_true",
+        help="Skip the agent — deliver the rendered prompt directly as the "
+        "message. Zero LLM cost. Requires --deliver to be a real target "
+        "(not 'log').",
+    )

    webhook_subparsers.add_parser(
        "list", aliases=["ls"], help="List all dynamic subscriptions"
@@ -7429,6 +7449,17 @@ Examples:
        action="store_true",
        help="Remove existing plugin and reinstall",
    )
+    _install_enable_group = plugins_install.add_mutually_exclusive_group()
+    _install_enable_group.add_argument(
+        "--enable",
+        action="store_true",
+        help="Auto-enable the plugin after install (skip confirmation prompt)",
+    )
+    _install_enable_group.add_argument(
+        "--no-enable",
+        action="store_true",
+        help="Install disabled (skip confirmation prompt); enable later with `hermes plugins enable <name>`",
+    )
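The `--enable`/`--no-enable` pair uses argparse's mutually exclusive groups, so passing both is a usage error. Below is a standalone sketch. The parser and the positional `identifier` are hypothetical stand-ins for `plugins_install`. `enable_decision` shows one way the three outcomes could be read.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical standalone parser reproducing the install flags above.
    parser = argparse.ArgumentParser(prog="plugins-install")
    parser.add_argument("identifier")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--enable", action="store_true")
    group.add_argument("--no-enable", action="store_true")
    return parser


def enable_decision(args: argparse.Namespace) -> Optional[bool]:
    """True/False when a flag was given, None when the user should be prompted."""
    if args.enable:
        return True
    if args.no_enable:
        return False
    return None
```

argparse exits with status 2 when both flags appear on one command line.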

    plugins_update = plugins_subparsers.add_parser(
        "update", help="Pull latest changes for an installed plugin"
@@ -1035,21 +1035,49 @@ def list_authenticated_providers(
        seen_slugs.add(_cp.slug.lower())

    # --- 3. User-defined endpoints from config ---
+    # Track (name, base_url) of what section 3 emits so section 4 can skip
+    # any overlapping ``custom_providers:`` entries.  Callers typically pass
+    # both (gateway/CLI invoke ``get_compatible_custom_providers()`` which
+    # merges ``providers:`` into the list) — without this, the same endpoint
+    # produces two picker rows: one bare-slug ("openrouter") from section 3
+    # and one "custom:openrouter" from section 4, both labelled identically.
+    _section3_emitted_pairs: set = set()
    if user_providers and isinstance(user_providers, dict):
        for ep_name, ep_cfg in user_providers.items():
            if not isinstance(ep_cfg, dict):
                continue
+            # Skip if this slug was already emitted by an earlier section
+            # (e.g. a canonical provider with the same name).
+            if ep_name.lower() in seen_slugs:
+                continue
            display_name = ep_cfg.get("name", "") or ep_name
-            api_url = ep_cfg.get("api", "") or ep_cfg.get("url", "") or ""
-            default_model = ep_cfg.get("default_model", "")
+            # ``base_url`` is Hermes's canonical write key (matches
+            # custom_providers and _save_custom_provider); ``api`` / ``url``
+            # remain as fallbacks for hand-edited / legacy configs.
+            api_url = (
+                ep_cfg.get("base_url", "")
+                or ep_cfg.get("api", "")
+                or ep_cfg.get("url", "")
+                or ""
+            )
+            # ``default_model`` is the legacy key; ``model`` matches what
+            # custom_providers entries use, so accept either.
+            default_model = ep_cfg.get("default_model", "") or ep_cfg.get("model", "")

            # Build models list from both default_model and full models array
            models_list = []
            if default_model:
                models_list.append(default_model)
-            # Also include the full models list from config
+            # Also include the full models list from config.
+            # Hermes writes ``models:`` as a dict keyed by model id
+            # (see hermes_cli/main.py::_save_custom_provider); older
+            # configs or hand-edited files may still use a list.
            cfg_models = ep_cfg.get("models", [])
-            if isinstance(cfg_models, list):
+            if isinstance(cfg_models, dict):
+                for m in cfg_models:
+                    if m and m not in models_list:
+                        models_list.append(m)
+            elif isinstance(cfg_models, list):
                for m in cfg_models:
                    if m and m not in models_list:
                        models_list.append(m)
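Sections 3 and 4 now share the same dict-or-list handling for `models:`. It can be sketched as one helper. `merge_model_ids` is hypothetical. It relies on the fact that iterating a dict yields its keys.

```python
from typing import Any, List


def merge_model_ids(default_model: str, cfg_models: Any) -> List[str]:
    """Default model first, then every id from ``models:``, whether it is a
    dict keyed by model id (the current writer's format) or a legacy list.
    Order is preserved; duplicates and empty entries are dropped."""
    out: List[str] = []
    if default_model:
        out.append(default_model)
    if isinstance(cfg_models, (dict, list)):
        # Iterating a dict yields its keys, i.e. the model ids.
        for m in cfg_models:
            if m and m not in out:
                out.append(m)
    return out
```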
@@ -1066,6 +1094,13 @@ def list_authenticated_providers(
                "source": "user-config",
                "api_url": api_url,
            })
+            seen_slugs.add(ep_name.lower())
+            _pair = (
+                str(display_name).strip().lower(),
+                str(api_url).strip().rstrip("/").lower(),
+            )
+            if _pair[0] and _pair[1]:
+                _section3_emitted_pairs.add(_pair)

    # --- 4. Saved custom providers from config ---
    # Each ``custom_providers`` entry represents one model under a named
@@ -1100,13 +1135,41 @@ def list_authenticated_providers(
                    "api_url": api_url,
                    "models": [],
                }
+            # The singular ``model:`` field only holds the currently
+            # active model. Hermes's own writer (main.py::_save_custom_provider)
+            # stores every configured model as a dict under ``models:``;
+            # downstream readers (agent/models_dev.py, gateway/run.py,
+            # run_agent.py, hermes_cli/config.py) already consume that dict.
+            # The /model picker previously ignored it, so multi-model
+            # custom providers appeared to have only the active model.
            default_model = (entry.get("model") or "").strip()
            if default_model and default_model not in groups[slug]["models"]:
                groups[slug]["models"].append(default_model)

+            cfg_models = entry.get("models", {})
+            if isinstance(cfg_models, dict):
+                for m in cfg_models:
+                    if m and m not in groups[slug]["models"]:
+                        groups[slug]["models"].append(m)
+            elif isinstance(cfg_models, list):
+                for m in cfg_models:
+                    if m and m not in groups[slug]["models"]:
+                        groups[slug]["models"].append(m)
+
        for slug, grp in groups.items():
            if slug.lower() in seen_slugs:
                continue
+            # Skip if section 3 already emitted this endpoint under its
+            # ``providers:`` dict key — matches on (display_name, base_url),
+            # the tuple section 4 groups by.  Prevents two picker rows
+            # labelled identically when callers pass both ``user_providers``
+            # and a compatibility-merged ``custom_providers`` list.
+            _pair_key = (
+                str(grp["name"]).strip().lower(),
+                str(grp["api_url"]).strip().rstrip("/").lower(),
+            )
+            if _pair_key[0] and _pair_key[1] and _pair_key in _section3_emitted_pairs:
+                continue
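The cross-section dedupe keys each endpoint on a normalized `(name, base_url)` pair. Below is a sketch of that rule, reduced to plain tuples. `endpoint_key` and `dedupe_rows` are hypothetical names. The normalization (trim, strip trailing `/`, lowercase) matches the hunk.

```python
from typing import Iterable, List, Set, Tuple


def endpoint_key(name: str, api_url: str) -> Tuple[str, str]:
    # Same normalization as the hunk: trim, drop trailing slashes, lowercase.
    return (str(name).strip().lower(), str(api_url).strip().rstrip("/").lower())


def dedupe_rows(section3: Iterable[Tuple[str, str]],
                section4: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep every section-3 row, then drop section-4 rows whose normalized
    key was already emitted. Rows with an empty name or URL never count as
    duplicates."""
    rows = list(section3)
    seen: Set[Tuple[str, str]] = set()
    for name, url in rows:
        key = endpoint_key(name, url)
        if key[0] and key[1]:
            seen.add(key)
    for name, url in section4:
        key = endpoint_key(name, url)
        if key[0] and key[1] and key in seen:
            continue
        rows.append((name, url))
    return rows
```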
            results.append({
                "slug": slug,
                "name": grp["name"],
@@ -16,6 +16,12 @@ from difflib import get_close_matches
 from pathlib import Path
 from typing import Any, NamedTuple, Optional

+from hermes_cli import __version__ as _HERMES_VERSION
+
+# Identify ourselves so endpoints fronted by Cloudflare's Browser Integrity
+# Check (error 1010) don't reject the default ``Python-urllib/*`` signature.
+_HERMES_USER_AGENT = f"hermes-cli/{_HERMES_VERSION}"
+
 COPILOT_BASE_URL = "https://api.githubcopilot.com"
 COPILOT_MODELS_URL = f"{COPILOT_BASE_URL}/models"
 COPILOT_EDITOR_VERSION = "vscode/1.104.1"
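Sending an explicit `User-Agent` with urllib only requires passing it in the request headers. urllib adds its default `Python-urllib/x.y` value only when none is set. Below is a sketch that builds the request without sending it. `USER_AGENT` uses a placeholder version, and `build_models_request` is hypothetical.

```python
import urllib.request
from typing import Dict, Optional

# Placeholder version; the real code interpolates hermes_cli.__version__.
USER_AGENT = "hermes-cli/0.0.0"


def build_models_request(url: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a GET request with an explicit User-Agent and
    an optional bearer token."""
    headers: Dict[str, str] = {"User-Agent": USER_AGENT}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(url, headers=headers, method="GET")
```

`Request` stores header names via `str.capitalize()`, so the value is read back with `get_header("User-agent")`.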
@@ -128,16 +134,14 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
    ],
    "gemini": [
        "gemini-3.1-pro-preview",
+        "gemini-3-pro-preview",
        "gemini-3-flash-preview",
        "gemini-3.1-flash-lite-preview",
-        "gemini-2.5-pro",
-        "gemini-2.5-flash",
-        "gemini-2.5-flash-lite",
    ],
    "google-gemini-cli": [
-        "gemini-2.5-pro",
-        "gemini-2.5-flash",
-        "gemini-2.5-flash-lite",
+        "gemini-3.1-pro-preview",
+        "gemini-3-pro-preview",
+        "gemini-3-flash-preview",
    ],
    "zai": [
        "glm-5.1",
@@ -227,7 +231,6 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
        "gpt-5.4-pro",
        "gpt-5.4",
        "gpt-5.3-codex",
-        "gpt-5.3-codex-spark",
        "gpt-5.2",
        "gpt-5.2-codex",
        "gpt-5.1",
@@ -552,7 +555,7 @@ CANONICAL_PROVIDERS: list[ProviderEntry] = [
    ProviderEntry("copilot",        "GitHub Copilot",           "GitHub Copilot (uses GITHUB_TOKEN or gh auth token)"),
    ProviderEntry("copilot-acp",    "GitHub Copilot ACP",       "GitHub Copilot ACP (spawns `copilot --acp --stdio`)"),
    ProviderEntry("huggingface",    "Hugging Face",             "Hugging Face Inference Providers (20+ open models)"),
-    ProviderEntry("gemini",         "Google AI Studio",         "Google AI Studio (Gemini models — OpenAI-compatible endpoint)"),
+    ProviderEntry("gemini",         "Google AI Studio",         "Google AI Studio (Gemini models — native Gemini API)"),
    ProviderEntry("google-gemini-cli", "Google Gemini (OAuth)",   "Google Gemini via OAuth + Code Assist (free tier supported; no API key needed)"),
    ProviderEntry("deepseek",       "DeepSeek",                 "DeepSeek (DeepSeek-V3, R1, coder — direct API)"),
    ProviderEntry("xai",            "xAI",                      "xAI (Grok models — direct API)"),
@@ -1771,7 +1774,7 @@ def probe_api_models(
        candidates.append((alternate_base, True))

    tried: list[str] = []
-    headers: dict[str, str] = {}
+    headers: dict[str, str] = {"User-Agent": _HERMES_USER_AGENT}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    if normalized.startswith(COPILOT_BASE_URL):
@@ -2106,6 +2109,51 @@ def validate_requested_model(
                ),
            }

+    # MiniMax providers don't expose a /models endpoint — validate against
+    # the static catalog instead, similar to openai-codex.
+    if normalized in ("minimax", "minimax-cn"):
+        try:
+            catalog_models = provider_model_ids(normalized)
+        except Exception:
+            catalog_models = []
+        if catalog_models:
+            # Case-insensitive lookup (catalog uses mixed case like MiniMax-M2.7)
+            catalog_lower = {m.lower(): m for m in catalog_models}
+            if requested_for_lookup.lower() in catalog_lower:
+                return {
+                    "accepted": True,
+                    "persist": True,
+                    "recognized": True,
+                    "message": None,
+                }
+            # Auto-correct close matches (case-insensitive)
+            catalog_lower_list = list(catalog_lower.keys())
+            auto = get_close_matches(requested_for_lookup.lower(), catalog_lower_list, n=1, cutoff=0.9)
+            if auto:
+                corrected = catalog_lower[auto[0]]
+                return {
+                    "accepted": True,
+                    "persist": True,
+                    "recognized": True,
+                    "corrected_model": corrected,
+                    "message": f"Auto-corrected `{requested}` → `{corrected}`",
+                }
+            suggestions = get_close_matches(requested_for_lookup.lower(), catalog_lower_list, n=3, cutoff=0.5)
+            suggestion_text = ""
+            if suggestions:
+                suggestion_text = "\n  Similar models: " + ", ".join(f"`{catalog_lower[s]}`" for s in suggestions)
+            return {
+                "accepted": True,
+                "persist": True,
+                "recognized": False,
+                "message": (
+                    f"Note: `{requested}` was not found in the MiniMax catalog."
+                    f"{suggestion_text}"
+                    "\n  MiniMax does not expose a /models endpoint, so Hermes cannot verify the model name."
+                    "\n  The model may still work if it exists on the server."
+                ),
+            }
+
    # Probe the live API to check if the model actually exists
    api_models = fetch_api_models(api_key, base_url)
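The MiniMax branch above does a case-insensitive exact lookup, then a strict `difflib.get_close_matches` auto-correct, and finally accepts unknown names with a warning. Below is a minimal standalone sketch of that decision flow. `validate_against_catalog` is a hypothetical helper, the returned dict is trimmed to three keys, and every catalog entry except `MiniMax-M2.7` (which appears in the diff's comment) is illustrative only.

```python
from difflib import get_close_matches


def validate_against_catalog(requested: str, catalog: list[str]) -> dict:
    """Case-insensitive catalog check with auto-correction (illustrative sketch)."""
    # Map lowercase names to their canonical mixed-case spelling.
    catalog_lower = {m.lower(): m for m in catalog}
    key = requested.lower()
    if key in catalog_lower:
        return {"accepted": True, "recognized": True, "corrected_model": None}
    # cutoff=0.9 only auto-corrects near-identical spellings.
    auto = get_close_matches(key, list(catalog_lower), n=1, cutoff=0.9)
    if auto:
        return {"accepted": True, "recognized": True,
                "corrected_model": catalog_lower[auto[0]]}
    # No /models endpoint to consult, so unknown names are accepted but flagged.
    return {"accepted": True, "recognized": False, "corrected_model": None}


catalog = ["MiniMax-M2.7", "MiniMax-M2.5"]  # illustrative entries only
print(validate_against_catalog("minimax-m27", catalog)["corrected_model"])  # MiniMax-M2.7
```

With a 0.9 cutoff, `minimax-m27` scores about 0.96 against `minimax-m2.7` but only about 0.87 against `minimax-m2.5`, so just one candidate survives.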

@@ -2,14 +2,20 @@
 Hermes Plugin System
 ====================

-Discovers, loads, and manages plugins from three sources:
+Discovers, loads, and manages plugins from four sources:

-1. **User plugins**   – ``~/.hermes/plugins/<name>/``
-2. **Project plugins** – ``./.hermes/plugins/<name>/`` (opt-in via
+1. **Bundled plugins** – ``<repo>/plugins/<name>/`` (shipped with hermes-agent;
+   ``memory/`` and ``context_engine/`` subdirs are excluded — they have their
+   own discovery paths)
+2. **User plugins**   – ``~/.hermes/plugins/<name>/``
+3. **Project plugins** – ``./.hermes/plugins/<name>/`` (opt-in via
   ``HERMES_ENABLE_PROJECT_PLUGINS``)
-3. **Pip plugins**     – packages that expose the ``hermes_agent.plugins``
+4. **Pip plugins**     – packages that expose the ``hermes_agent.plugins``
   entry-point group.

+Later sources override earlier ones on name collision, so a user or project
+plugin with the same name as a bundled plugin replaces it.
+
 Each directory plugin must contain a ``plugin.yaml`` manifest **and** an
 ``__init__.py`` with a ``register(ctx)`` function.

@@ -54,6 +60,8 @@ logger = logging.getLogger(__name__)
 VALID_HOOKS: Set[str] = {
    "pre_tool_call",
    "post_tool_call",
+    "transform_terminal_output",
+    "transform_tool_result",
    "pre_llm_call",
    "post_llm_call",
    "pre_api_request",
@@ -75,7 +83,12 @@ def _env_enabled(name: str) -> bool:


 def _get_disabled_plugins() -> set:
-    """Read the disabled plugins list from config.yaml."""
+    """Read the disabled plugins list from config.yaml.
+
+    Kept for backward compat and explicit deny-list semantics. A plugin
+    name in this set will never load, even if it appears in
+    ``plugins.enabled``.
+    """
    try:
        from hermes_cli.config import load_config
        config = load_config()
@@ -85,6 +98,36 @@ def _get_disabled_plugins() -> set:
        return set()


+def _get_enabled_plugins() -> Optional[set]:
+    """Read the enabled-plugins allow-list from config.yaml.
+
+    Plugins are opt-in by default — only plugins whose name appears in
+    this set are loaded. Returns:
+
+    * ``None`` — the key is missing or malformed. Callers should treat
+      this as "nothing enabled yet" (the opt-in default); the first
+      ``migrate_config`` run populates the key with a grandfathered set
+      of currently-installed user plugins so existing setups don't
+      break on upgrade.
+    * ``set()`` — an empty list was explicitly set; nothing loads.
+    * ``set(...)`` — the concrete allow-list.
+    """
+    try:
+        from hermes_cli.config import load_config
+        config = load_config()
+        plugins_cfg = config.get("plugins")
+        if not isinstance(plugins_cfg, dict):
+            return None
+        if "enabled" not in plugins_cfg:
+            return None
+        enabled = plugins_cfg.get("enabled")
+        if not isinstance(enabled, list):
+            return None
+        return set(enabled)
+    except Exception:
+        return None
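The docstring of `_get_enabled_plugins` describes a tri-state result: `None`, an empty set, or a concrete allow-list. Here is a small sketch of that contract. It assumes the loaded config behaves like a plain dict, which is a stand-in for whatever `load_config()` actually returns. `read_enabled` and `is_allowed` are hypothetical names.

```python
from typing import Any, Optional


def read_enabled(config: dict[str, Any]) -> Optional[set[str]]:
    """Mirror of the tri-state contract described for _get_enabled_plugins()."""
    plugins_cfg = config.get("plugins")
    if not isinstance(plugins_cfg, dict) or "enabled" not in plugins_cfg:
        return None  # key missing: opt-in default, nothing enabled yet
    enabled = plugins_cfg["enabled"]
    if not isinstance(enabled, list):
        return None  # malformed value is treated like a missing key
    return set(enabled)  # may be empty: explicitly nothing enabled


def is_allowed(name: str, enabled: Optional[set[str]]) -> bool:
    # None and set() both load nothing; only a concrete allow-list admits plugins.
    return enabled is not None and name in enabled
```

For the loader, `None` and `set()` behave the same way. The distinction only matters to the migration step, which can tell "never configured" apart from "deliberately emptied".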
+
+
 # ---------------------------------------------------------------------------
 # Data classes
 # ---------------------------------------------------------------------------
@@ -420,27 +463,66 @@ class PluginManager:

        manifests: List[PluginManifest] = []

-        # 1. User plugins (~/.hermes/plugins/)
+        # 1. Bundled plugins (<repo>/plugins/<name>/)
+        # Repo-shipped generic plugins live next to hermes_cli/.  Memory and
+        # context_engine subdirs are handled by their own discovery paths, so
+        # skip those names here.  Bundled plugins are discovered (so they
+        # show up in `hermes plugins`) but only loaded when added to
+        # `plugins.enabled` in config.yaml — opt-in like any other plugin.
+        repo_plugins = Path(__file__).resolve().parent.parent / "plugins"
+        manifests.extend(
+            self._scan_directory(
+                repo_plugins,
+                source="bundled",
+                skip_names={"memory", "context_engine"},
+            )
+        )
+
+        # 2. User plugins (~/.hermes/plugins/)
        user_dir = get_hermes_home() / "plugins"
        manifests.extend(self._scan_directory(user_dir, source="user"))

-        # 2. Project plugins (./.hermes/plugins/)
+        # 3. Project plugins (./.hermes/plugins/)
        if _env_enabled("HERMES_ENABLE_PROJECT_PLUGINS"):
            project_dir = Path.cwd() / ".hermes" / "plugins"
            manifests.extend(self._scan_directory(project_dir, source="project"))

-        # 3. Pip / entry-point plugins
+        # 4. Pip / entry-point plugins
        manifests.extend(self._scan_entry_points())

-        # Load each manifest (skip user-disabled plugins)
+        # Load each manifest (skip user-disabled plugins).
+        # Later sources override earlier ones on name collision: user beats
+        # bundled, project beats user, and pip entry points beat all three.
+        # Dedup here so only the final winner is loaded.
        disabled = _get_disabled_plugins()
+        enabled = _get_enabled_plugins()  # None = opt-in default (nothing enabled)
+        winners: Dict[str, PluginManifest] = {}
        for manifest in manifests:
+            winners[manifest.name] = manifest
+        for manifest in winners.values():
+            # Explicit disable always wins.
            if manifest.name in disabled:
                loaded = LoadedPlugin(manifest=manifest, enabled=False)
                loaded.error = "disabled via config"
                self._plugins[manifest.name] = loaded
                logger.debug("Skipping disabled plugin '%s'", manifest.name)
                continue
+            # Opt-in gate: plugins must be in the enabled allow-list.
+            # If the allow-list is missing (None), treat as "nothing enabled"
+            # (users have to enable plugins explicitly to load them).
+            # Memory and context_engine providers never reach this gate: the
+            # bundled scan skips those subdirs, and they are selected through
+            # their own single-select config (memory.provider / context.engine).
+            if enabled is None or manifest.name not in enabled:
+                loaded = LoadedPlugin(manifest=manifest, enabled=False)
+                loaded.error = "not enabled in config (run `hermes plugins enable {}` to activate)".format(
+                    manifest.name
+                )
+                self._plugins[manifest.name] = loaded
+                logger.debug(
+                    "Skipping '%s' (not in plugins.enabled)", manifest.name
+                )
+                continue
            self._load_plugin(manifest)
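The loop above performs two passes. First it deduplicates by name, with the last source winning. Then it gates each winner: the deny-list is checked first, then the opt-in allow-list. The sketch below reproduces that selection logic on its own. `Manifest` is a simplified, hypothetical stand-in for `PluginManifest`, and the reason strings are shortened.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Manifest:  # simplified stand-in for PluginManifest
    name: str
    source: str


def select_plugins(
    manifests: List[Manifest], disabled: Set[str], enabled: Optional[Set[str]]
) -> Tuple[List[Manifest], Dict[str, str]]:
    """Return (to_load, skipped) following the discover_and_load() ordering."""
    # Later sources overwrite earlier ones, so the last manifest with a name wins.
    winners: Dict[str, Manifest] = {}
    for m in manifests:
        winners[m.name] = m
    to_load: List[Manifest] = []
    skipped: Dict[str, str] = {}
    for m in winners.values():
        if m.name in disabled:  # explicit deny-list always wins
            skipped[m.name] = "disabled via config"
        elif enabled is None or m.name not in enabled:  # opt-in gate
            skipped[m.name] = "not enabled in config"
        else:
            to_load.append(m)
    return to_load, skipped
```

Reassigning an existing dict key keeps that key's original position. As a result, load order follows where a name first appeared, even though the manifest that wins comes from the later source.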

        if manifests:
@@ -454,8 +536,18 @@ class PluginManager:
    # Directory scanning
    # -----------------------------------------------------------------------

-    def _scan_directory(self, path: Path, source: str) -> List[PluginManifest]:
-        """Read ``plugin.yaml`` manifests from subdirectories of *path*."""
+    def _scan_directory(
+        self,
+        path: Path,
+        source: str,
+        skip_names: Optional[Set[str]] = None,
+    ) -> List[PluginManifest]:
+        """Read ``plugin.yaml`` manifests from subdirectories of *path*.
+
+        *skip_names* is an optional set of subdirectory names to ignore
+        (used by the bundled scan to exclude the ``memory`` /
+        ``context_engine`` subdirs, which have their own discovery paths).
+        """
        manifests: List[PluginManifest] = []
        if not path.is_dir():
            return manifests
@@ -463,6 +555,8 @@ class PluginManager:
        for child in sorted(path.iterdir()):
            if not child.is_dir():
                continue
+            if skip_names and child.name in skip_names:
+                continue
            manifest_file = child / "plugin.yaml"
            if not manifest_file.exists():
                manifest_file = child / "plugin.yml"
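Here is a self-contained sketch of the directory scan with `skip_names`. It accepts `plugin.yaml` or `plugin.yml` and skips any subdirectory whose name is in the skip set. For simplicity it returns directory names instead of parsed manifests. `scan_plugin_dirs` and `demo` are hypothetical helpers, and `demo` builds a throwaway tree in a temporary directory.

```python
import tempfile
from pathlib import Path
from typing import List, Optional, Set


def scan_plugin_dirs(path: Path, skip_names: Optional[Set[str]] = None) -> List[str]:
    """Return names of subdirectories that contain plugin.yaml or plugin.yml."""
    found: List[str] = []
    if not path.is_dir():
        return found
    for child in sorted(path.iterdir()):
        if not child.is_dir():
            continue
        if skip_names and child.name in skip_names:
            continue  # e.g. memory/ and context_engine/ have their own discovery
        if (child / "plugin.yaml").exists() or (child / "plugin.yml").exists():
            found.append(child.name)
    return found


def demo() -> List[str]:
    # Two valid plugins, one skipped name, and one directory without a manifest.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name, manifest in [("alpha", "plugin.yaml"), ("beta", "plugin.yml"),
                               ("memory", "plugin.yaml")]:
            (root / name).mkdir()
            (root / name / manifest).write_text("name: " + name + "\n")
        (root / "no_manifest").mkdir()
        return scan_plugin_dirs(root, skip_names={"memory", "context_engine"})
```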
@@ -530,7 +624,7 @@ class PluginManager:
        loaded = LoadedPlugin(manifest=manifest)

        try:
-            if manifest.source in ("user", "project"):
+            if manifest.source in ("user", "project", "bundled"):
                module = self._load_directory_module(manifest)
            else:
                module = self._load_entrypoint_module(manifest)
@@ -779,23 +873,31 @@ def get_pre_tool_call_block_message(
    return None


+def _ensure_plugins_discovered() -> PluginManager:
+    """Return the global manager after running idempotent plugin discovery."""
+    manager = get_plugin_manager()
+    manager.discover_and_load()
+    return manager
+
+
 def get_plugin_context_engine():
    """Return the plugin-registered context engine, or None."""
-    return get_plugin_manager()._context_engine
+    return _ensure_plugins_discovered()._context_engine


 def get_plugin_command_handler(name: str) -> Optional[Callable]:
    """Return the handler for a plugin-registered slash command, or ``None``."""
-    entry = get_plugin_manager()._plugin_commands.get(name)
+    entry = _ensure_plugins_discovered()._plugin_commands.get(name)
    return entry["handler"] if entry else None


 def get_plugin_commands() -> Dict[str, dict]:
    """Return the full plugin commands dict (name → {handler, description, plugin}).

-    Safe to call before discovery — returns an empty dict if no plugins loaded.
+    Triggers idempotent plugin discovery so callers can use plugin commands
+    before any explicit discover_plugins() call.
    """
-    return get_plugin_manager()._plugin_commands
+    return _ensure_plugins_discovered()._plugin_commands


 def get_plugin_toolsets() -> List[tuple]:
@@ -15,6 +15,7 @@ import shutil
 import subprocess
 import sys
 from pathlib import Path
+from typing import Optional

 from hermes_constants import get_hermes_home

@@ -281,8 +282,16 @@ def _require_installed_plugin(name: str, plugins_dir: Path, console) -> Path:
 # ---------------------------------------------------------------------------


-def cmd_install(identifier: str, force: bool = False) -> None:
-    """Install a plugin from a Git URL or owner/repo shorthand."""
+def cmd_install(
+    identifier: str,
+    force: bool = False,
+    enable: Optional[bool] = None,
+) -> None:
+    """Install a plugin from a Git URL or owner/repo shorthand.
+
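+    After install, prompt "Enable '<name>' now? [y/N]" unless *enable* is
+    provided (True = auto-enable without prompting, False = install
+    disabled). Non-interactive runs (stdin or stdout not a TTY) never prompt
+    and leave the plugin disabled.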
+    """
    import tempfile
    from rich.console import Console

@@ -391,6 +400,40 @@ def cmd_install(identifier: str, force: bool = False) -> None:

    _display_after_install(target, identifier)

+    # Determine the canonical plugin name for enable-list bookkeeping.
+    installed_name = installed_manifest.get("name") or target.name
+
+    # Decide whether to enable: explicit flag > interactive prompt > default off
+    should_enable = enable
+    if should_enable is None:
+        # Interactive prompt unless stdin isn't a TTY (scripted install).
+        if sys.stdin.isatty() and sys.stdout.isatty():
+            try:
+                answer = input(
+                    f"  Enable '{installed_name}' now? [y/N]: "
+                ).strip().lower()
+                should_enable = answer in ("y", "yes")
+            except (EOFError, KeyboardInterrupt):
+                should_enable = False
+        else:
+            should_enable = False
+
+    if should_enable:
+        enabled = _get_enabled_set()
+        disabled = _get_disabled_set()
+        enabled.add(installed_name)
+        disabled.discard(installed_name)
+        _save_enabled_set(enabled)
+        _save_disabled_set(disabled)
+        console.print(
+            f"[green]✓[/green] Plugin [bold]{installed_name}[/bold] enabled."
+        )
+    else:
+        console.print(
+            f"[dim]Plugin installed but not enabled. "
+            f"Run `hermes plugins enable {installed_name}` to activate.[/dim]"
+        )
+
    console.print("[dim]Restart the gateway for the plugin to take effect:[/dim]")
    console.print("[dim]  hermes gateway restart[/dim]")
    console.print()
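The enable decision in `cmd_install` has a fixed precedence: an explicit flag wins, then the interactive prompt, then the default of off. In the sketch below, the TTY check is passed in as `interactive` and the prompt as `ask`, so the function can be exercised without a terminal. Both parameters, the `name` default, and the function `decide_enable` itself are hypothetical.

```python
from typing import Callable, Optional


def decide_enable(
    enable: Optional[bool],
    interactive: bool,
    ask: Callable[[str], str] = input,
    name: str = "example",
) -> bool:
    """Explicit flag > interactive prompt > default off."""
    if enable is not None:
        return enable
    if not interactive:  # scripted install: never block on input()
        return False
    try:
        answer = ask(f"  Enable '{name}' now? [y/N]: ").strip().lower()
    except (EOFError, KeyboardInterrupt):
        return False
    return answer in ("y", "yes")
```

In the diff, `interactive` corresponds to `sys.stdin.isatty() and sys.stdout.isatty()`.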
@@ -468,7 +511,11 @@ def cmd_remove(name: str) -> None:


 def _get_disabled_set() -> set:
-    """Read the disabled plugins set from config.yaml."""
+    """Read the disabled plugins set from config.yaml.
+
+    An explicit deny-list. A plugin name here never loads, even if also
+    listed in ``plugins.enabled``.
+    """
    try:
        from hermes_cli.config import load_config
        config = load_config()
@@ -488,103 +535,196 @@ def _save_disabled_set(disabled: set) -> None:
    save_config(config)


+def _get_enabled_set() -> set:
+    """Read the enabled plugins allow-list from config.yaml.
+
+    Plugins are opt-in: only names here are loaded. Returns ``set()`` if
+    the key is missing (same behaviour as "nothing enabled yet").
+    """
+    try:
+        from hermes_cli.config import load_config
+        config = load_config()
+        plugins_cfg = config.get("plugins", {})
+        if not isinstance(plugins_cfg, dict):
+            return set()
+        enabled = plugins_cfg.get("enabled", [])
+        return set(enabled) if isinstance(enabled, list) else set()
+    except Exception:
+        return set()
+
+
+def _save_enabled_set(enabled: set) -> None:
+    """Write the enabled plugins list to config.yaml."""
+    from hermes_cli.config import load_config, save_config
+    config = load_config()
+    if "plugins" not in config:
+        config["plugins"] = {}
+    config["plugins"]["enabled"] = sorted(enabled)
+    save_config(config)
+
+
 def cmd_enable(name: str) -> None:
-    """Enable a previously disabled plugin."""
+    """Add a plugin to the enabled allow-list (and remove it from disabled)."""
    from rich.console import Console

    console = Console()
-    plugins_dir = _plugins_dir()
-
-    # Verify the plugin exists
-    target = plugins_dir / name
-    if not target.is_dir():
-        console.print(f"[red]Plugin '{name}' is not installed.[/red]")
+    # Discover the plugin — check installed (user) AND bundled.
+    if not _plugin_exists(name):
+        console.print(f"[red]Plugin '{name}' is not installed or bundled.[/red]")
        sys.exit(1)

+    enabled = _get_enabled_set()
    disabled = _get_disabled_set()
-    if name not in disabled:
+
+    if name in enabled and name not in disabled:
        console.print(f"[dim]Plugin '{name}' is already enabled.[/dim]")
        return

+    enabled.add(name)
    disabled.discard(name)
+    _save_enabled_set(enabled)
    _save_disabled_set(disabled)
-    console.print(f"[green]✓[/green] Plugin [bold]{name}[/bold] enabled. Takes effect on next session.")
+    console.print(
+        f"[green]✓[/green] Plugin [bold]{name}[/bold] enabled. "
+        "Takes effect on next session."
+    )


 def cmd_disable(name: str) -> None:
-    """Disable a plugin without removing it."""
+    """Remove a plugin from the enabled allow-list (and add to disabled)."""
    from rich.console import Console

    console = Console()
-    plugins_dir = _plugins_dir()
-
-    # Verify the plugin exists
-    target = plugins_dir / name
-    if not target.is_dir():
-        console.print(f"[red]Plugin '{name}' is not installed.[/red]")
+    if not _plugin_exists(name):
+        console.print(f"[red]Plugin '{name}' is not installed or bundled.[/red]")
        sys.exit(1)

+    enabled = _get_enabled_set()
    disabled = _get_disabled_set()
-    if name in disabled:
+
+    if name not in enabled and name in disabled:
        console.print(f"[dim]Plugin '{name}' is already disabled.[/dim]")
        return

+    enabled.discard(name)
    disabled.add(name)
+    _save_enabled_set(enabled)
    _save_disabled_set(disabled)
-    console.print(f"[yellow]\u2298[/yellow] Plugin [bold]{name}[/bold] disabled. Takes effect on next session.")
+    console.print(
+        f"[yellow]\u2298[/yellow] Plugin [bold]{name}[/bold] disabled. "
+        "Takes effect on next session."
+    )


-def cmd_list() -> None:
-    """List installed plugins."""
-    from rich.console import Console
-    from rich.table import Table
+def _plugin_exists(name: str) -> bool:
+    """Return True if a plugin with *name* is installed (user) or bundled."""
+    # Installed: directory name or manifest name match in user plugins dir
+    user_dir = _plugins_dir()
+    if user_dir.is_dir():
+        if (user_dir / name).is_dir():
+            return True
+        for child in user_dir.iterdir():
+            if not child.is_dir():
+                continue
+            manifest = _read_manifest(child)
+            if manifest.get("name") == name:
+                return True
+    # Bundled: <repo>/plugins/<name>/
+    import hermes_cli
+    repo_plugins = Path(hermes_cli.__file__).resolve().parent.parent / "plugins"
+    if repo_plugins.is_dir():
+        candidate = repo_plugins / name
+        if candidate.is_dir() and (
+            (candidate / "plugin.yaml").exists()
+            or (candidate / "plugin.yml").exists()
+        ):
+            return True
+    return False

+
+def _discover_all_plugins() -> list:
+    """Return a list of (name, version, description, source, dir_path) for
+    every plugin in the bundled and user plugin directories.
+
+    Matches the ordering/dedup of ``PluginManager.discover_and_load``:
+    bundled first, then user; user overrides bundled on name collision.
+    Project and pip plugins are not listed.
+    """
    try:
        import yaml
    except ImportError:
        yaml = None

-    console = Console()
-    plugins_dir = _plugins_dir()
+    seen: dict = {}  # name -> (name, version, description, source, path)

-    dirs = sorted(d for d in plugins_dir.iterdir() if d.is_dir())
-    if not dirs:
+    # Bundled (<repo>/plugins/<name>/), excluding memory/ and context_engine/
+    import hermes_cli
+    repo_plugins = Path(hermes_cli.__file__).resolve().parent.parent / "plugins"
+    for base, source in ((repo_plugins, "bundled"), (_plugins_dir(), "user")):
+        if not base.is_dir():
+            continue
+        for d in sorted(base.iterdir()):
+            if not d.is_dir():
+                continue
+            if source == "bundled" and d.name in ("memory", "context_engine"):
+                continue
+            manifest_file = d / "plugin.yaml"
+            if not manifest_file.exists():
+                manifest_file = d / "plugin.yml"
+            if not manifest_file.exists():
+                continue
+            name = d.name
+            version = ""
+            description = ""
+            if yaml:
+                try:
+                    with open(manifest_file) as f:
+                        manifest = yaml.safe_load(f) or {}
+                    name = manifest.get("name", d.name)
+                    version = manifest.get("version", "")
+                    description = manifest.get("description", "")
+                except Exception:
+                    pass
+            # User plugins override bundled on name collision.
+            if name in seen and source == "bundled":
+                continue
+            src_label = source
+            if source == "user" and (d / ".git").exists():
+                src_label = "git"
+            seen[name] = (name, version, description, src_label, d)
+    return list(seen.values())
+
+
+def cmd_list() -> None:
+    """List all plugins (bundled + user) with enabled/disabled state."""
+    from rich.console import Console
+    from rich.table import Table
+
+    console = Console()
+    entries = _discover_all_plugins()
+    if not entries:
        console.print("[dim]No plugins installed.[/dim]")
        console.print("[dim]Install with:[/dim] hermes plugins install owner/repo")
        return

+    enabled = _get_enabled_set()
    disabled = _get_disabled_set()

-    table = Table(title="Installed Plugins", show_lines=False)
+    table = Table(title="Plugins", show_lines=False)
    table.add_column("Name", style="bold")
    table.add_column("Status")
    table.add_column("Version", style="dim")
    table.add_column("Description")
    table.add_column("Source", style="dim")

-    for d in dirs:
-        manifest_file = d / "plugin.yaml"
-        name = d.name
-        version = ""
-        description = ""
-        source = "local"
-
-        if manifest_file.exists() and yaml:
-            try:
-                with open(manifest_file) as f:
-                    manifest = yaml.safe_load(f) or {}
-                name = manifest.get("name", d.name)
-                version = manifest.get("version", "")
-                description = manifest.get("description", "")
-            except Exception:
-                pass
-
-        # Check if it's a git repo (installed via hermes plugins install)
-        if (d / ".git").exists():
-            source = "git"
-
-        is_disabled = name in disabled or d.name in disabled
-        status = "[red]disabled[/red]" if is_disabled else "[green]enabled[/green]"
+    for name, version, description, source, _dir in entries:
+        if name in disabled:
+            status = "[red]disabled[/red]"
+        elif name in enabled:
+            status = "[green]enabled[/green]"
+        else:
+            status = "[yellow]not enabled[/yellow]"
        table.add_row(name, status, str(version), description, source)

    console.print()
@@ -592,6 +732,7 @@ def cmd_list() -> None:
    console.print()
    console.print("[dim]Interactive toggle:[/dim] hermes plugins")
    console.print("[dim]Enable/disable:[/dim] hermes plugins enable/disable <name>")
+    console.print("[dim]Plugins are opt-in by default — only 'enabled' plugins load.[/dim]")
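`cmd_list` now reports three states instead of two, and the deny-list is checked before the allow-list. The sketch below isolates that classification as a hypothetical `plugin_status` helper. It returns plain strings in place of the Rich markup.

```python
from typing import List, Set, Tuple


def plugin_status(name: str, enabled: Set[str], disabled: Set[str]) -> str:
    """Three-way status shown by `hermes plugins list` (sketch)."""
    if name in disabled:  # deny-list beats the allow-list
        return "disabled"
    if name in enabled:
        return "enabled"
    return "not enabled"  # opt-in default: discovered but never turned on


def build_rows(names: List[str], enabled: Set[str], disabled: Set[str]) -> List[Tuple[str, str]]:
    return [(n, plugin_status(n, enabled, disabled)) for n in names]
```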


 # ---------------------------------------------------------------------------
@@ -742,41 +883,25 @@ def cmd_toggle() -> None:
    """Interactive composite UI — general plugins + provider plugin categories."""
    from rich.console import Console

-    try:
-        import yaml
-    except ImportError:
-        yaml = None
-
    console = Console()
-    plugins_dir = _plugins_dir()

-    # -- General plugins discovery --
-    dirs = sorted(d for d in plugins_dir.iterdir() if d.is_dir())
-    disabled = _get_disabled_set()
+    # -- General plugins discovery (bundled + user) --
+    entries = _discover_all_plugins()
+    enabled_set = _get_enabled_set()
+    disabled_set = _get_disabled_set()

    plugin_names = []
    plugin_labels = []
    plugin_selected = set()

-    for i, d in enumerate(dirs):
-        manifest_file = d / "plugin.yaml"
-        name = d.name
-        description = ""
-
-        if manifest_file.exists() and yaml:
-            try:
-                with open(manifest_file) as f:
-                    manifest = yaml.safe_load(f) or {}
-                name = manifest.get("name", d.name)
-                description = manifest.get("description", "")
-            except Exception:
-                pass
-
-        plugin_names.append(name)
+    for i, (name, _version, description, source, _d) in enumerate(entries):
        label = f"{name} \u2014 {description}" if description else name
+        if source == "bundled":
+            label = f"{label} [bundled]"
+        plugin_names.append(name)
        plugin_labels.append(label)
-
-        if name not in disabled and d.name not in disabled:
+        # Selected (enabled) when in enabled-set AND not in disabled-set
+        if name in enabled_set and name not in disabled_set:
            plugin_selected.add(i)

    # -- Provider categories --
@@ -804,10 +929,10 @@ def cmd_toggle() -> None:
    try:
        import curses
        _run_composite_ui(curses, plugin_names, plugin_labels, plugin_selected,
-                          disabled, categories, console)
+                          disabled_set, categories, console)
    except ImportError:
        _run_composite_fallback(plugin_names, plugin_labels, plugin_selected,
-                                disabled, categories, console)
+                                disabled_set, categories, console)


 def _run_composite_ui(curses, plugin_names, plugin_labels, plugin_selected,
@@ -1020,18 +1145,29 @@ def _run_composite_ui(curses, plugin_names, plugin_labels, plugin_selected,
    curses.wrapper(_draw)
    flush_stdin()

-    # Persist general plugin changes
-    new_disabled = set()
+    # Persist general plugin changes. The new allow-list is the set of
+    # plugin names that were checked; anything left unchecked is added to
+    # the disabled list so it stays off even if a later change starts
+    # auto-enabling plugins.
+    new_enabled: set = set()
+    new_disabled: set = set(disabled)  # preserve existing disabled state for unseen plugins
    for i, name in enumerate(plugin_names):
-        if i not in chosen:
+        if i in chosen:
+            new_enabled.add(name)
+            new_disabled.discard(name)
+        else:
            new_disabled.add(name)

-    if new_disabled != disabled:
+    prev_enabled = _get_enabled_set()
+    enabled_changed = new_enabled != prev_enabled
+    disabled_changed = new_disabled != disabled
+
+    if enabled_changed or disabled_changed:
+        _save_enabled_set(new_enabled)
        _save_disabled_set(new_disabled)
-        enabled_count = len(plugin_names) - len(new_disabled)
        console.print(
-            f"\n[green]\u2713[/green] General plugins: {enabled_count} enabled, "
-            f"{len(new_disabled)} disabled."
+            f"\n[green]\u2713[/green] General plugins: {len(new_enabled)} enabled, "
+            f"{len(plugin_names) - len(new_enabled)} disabled."
        )
    elif n_plugins > 0:
        console.print("\n[dim]General plugins unchanged.[/dim]")
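Both the curses UI and the fallback now derive two sets from the checked row indices. The sketch below shows that computation in isolation. `apply_selection` is a hypothetical helper.

```python
from typing import Iterable, List, Set, Tuple


def apply_selection(
    plugin_names: List[str], chosen: Iterable[int], disabled: Set[str]
) -> Tuple[Set[str], Set[str]]:
    """Compute (new_enabled, new_disabled) from the checked row indices."""
    picked = set(chosen)
    new_enabled: Set[str] = set()
    new_disabled: Set[str] = set(disabled)  # keep deny entries for plugins not on screen
    for i, name in enumerate(plugin_names):
        if i in picked:
            new_enabled.add(name)
            new_disabled.discard(name)
        else:
            new_disabled.add(name)
    return new_enabled, new_disabled
```

The new allow-list contains only names shown in the menu. Any enabled entry for a plugin that `_discover_all_plugins` does not scan (project or pip plugins) therefore appears to be dropped whenever the selection is saved.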
@@ -1078,11 +1214,17 @@ def _run_composite_fallback(plugin_names, plugin_labels, plugin_selected,
                return
            print()

-        new_disabled = set()
+        new_enabled: set = set()
+        new_disabled: set = set(disabled)
        for i, name in enumerate(plugin_names):
-            if i not in chosen:
+            if i in chosen:
+                new_enabled.add(name)
+                new_disabled.discard(name)
+            else:
                new_disabled.add(name)
-        if new_disabled != disabled:
+        prev_enabled = _get_enabled_set()
+        if new_enabled != prev_enabled or new_disabled != disabled:
+            _save_enabled_set(new_enabled)
            _save_disabled_set(new_disabled)

    # Provider categories
@@ -1108,7 +1250,17 @@ def plugins_command(args) -> None:
    action = getattr(args, "plugins_action", None)

    if action == "install":
-        cmd_install(args.identifier, force=getattr(args, "force", False))
+        # Map argparse tri-state: --enable=True, --no-enable=False, neither=None (prompt)
+        enable_arg = None
+        if getattr(args, "enable", False):
+            enable_arg = True
+        elif getattr(args, "no_enable", False):
+            enable_arg = False
+        cmd_install(
+            args.identifier,
+            force=getattr(args, "force", False),
+            enable=enable_arg,
+        )
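The diff reads `args.enable` and `args.no_enable` but does not show how the parser defines them. Below is one plausible wiring, offered as an assumption rather than the project's actual parser. It uses a mutually exclusive group so that both flags cannot be passed together. `build_parser` and `enable_from_args` are hypothetical.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical flag wiring for `hermes plugins install`.
    parser = argparse.ArgumentParser(prog="hermes plugins install")
    parser.add_argument("identifier")
    parser.add_argument("--force", action="store_true")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--enable", action="store_true")
    group.add_argument("--no-enable", dest="no_enable", action="store_true")
    return parser


def enable_from_args(args: argparse.Namespace) -> Optional[bool]:
    # Tri-state: --enable -> True, --no-enable -> False, neither -> None (prompt later).
    if getattr(args, "enable", False):
        return True
    if getattr(args, "no_enable", False):
        return False
    return None
```

As an alternative, `argparse.BooleanOptionalAction` (Python 3.9+) with `default=None` produces the same tri-state from a single `add_argument("--enable", ...)` call, which defines both `--enable` and `--no-enable`.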
    elif action == "update":
        cmd_update(args.name)
    elif action in ("remove", "rm", "uninstall"):
@@ -322,12 +322,16 @@ def normalize_provider(name: str) -> str:


 def get_provider(name: str) -> Optional[ProviderDef]:
-    """Look up a provider by id or alias, merging all data sources.
+    """Look up a built-in provider by id or alias.

    Resolution order:
      1. Hermes overlays (for providers not in models.dev: nous, openai-codex, etc.)
      2. models.dev catalog + Hermes overlay
-      3. User-defined providers from config (TODO: Phase 4)
+
+    User-defined providers from config.yaml (``providers:`` / ``custom_providers:``)
+    are resolved by :func:`resolve_provider_full`, which layers ``resolve_user_provider``
+    and ``resolve_custom_provider`` on top of this function. Callers that need
+    user-config support should use ``resolve_provider_full`` instead.

    Returns a fully-resolved ProviderDef or None.
    """
@@ -38,14 +38,21 @@ def _normalize_custom_provider_name(value: str) -> str:
 def _detect_api_mode_for_url(base_url: str) -> Optional[str]:
    """Auto-detect api_mode from the resolved base URL.

-    Direct api.openai.com endpoints need the Responses API for GPT-5.x
-    tool calls with reasoning (chat/completions returns 400).
+    - Direct api.openai.com endpoints need the Responses API for GPT-5.x
+      tool calls with reasoning (chat/completions returns 400).
+    - Third-party Anthropic-compatible gateways (MiniMax, Zhipu GLM,
+      LiteLLM proxies, etc.) conventionally expose the native Anthropic
+      protocol under a ``/anthropic`` suffix — treat those as
+      ``anthropic_messages`` transport instead of the default
+      ``chat_completions``.
    """
    normalized = (base_url or "").strip().lower().rstrip("/")
    if "api.x.ai" in normalized:
        return "codex_responses"
    if "api.openai.com" in normalized and "openrouter" not in normalized:
        return "codex_responses"
+    if normalized.endswith("/anthropic"):
+        return "anthropic_messages"
    return None
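The checks in `_detect_api_mode_for_url` are order-sensitive. The x.ai and OpenAI host checks run before the `/anthropic` suffix check, and trailing slashes are stripped first. Below is the same heuristic lifted out as a standalone function, `detect_api_mode`, which is a hypothetical name, so the cases can be exercised directly.

```python
from typing import Optional


def detect_api_mode(base_url: Optional[str]) -> Optional[str]:
    """Standalone copy of the URL heuristics in _detect_api_mode_for_url()."""
    normalized = (base_url or "").strip().lower().rstrip("/")
    if "api.x.ai" in normalized:
        return "codex_responses"
    if "api.openai.com" in normalized and "openrouter" not in normalized:
        return "codex_responses"
    if normalized.endswith("/anthropic"):
        return "anthropic_messages"
    return None  # caller keeps its default (chat_completions)
```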


@@ -194,8 +201,12 @@ def _resolve_runtime_from_pool_entry(
        elif provider in ("opencode-zen", "opencode-go"):
            from hermes_cli.models import opencode_model_api_mode
            api_mode = opencode_model_api_mode(provider, model_cfg.get("default", ""))
-        elif base_url.rstrip("/").endswith("/anthropic"):
-            api_mode = "anthropic_messages"
+        else:
+            # Auto-detect api_mode from the URL (/anthropic suffix ->
+            # anthropic_messages; api.openai.com and api.x.ai -> codex_responses).
+            detected = _detect_api_mode_for_url(base_url)
+            if detected:
+                api_mode = detected

    # OpenCode base URLs end with /v1 for OpenAI-compatible models, but the
    # Anthropic SDK appends its own /v1/messages to the base_url.  Strip the
@@ -642,8 +653,11 @@ def _resolve_explicit_runtime(
            configured_mode = _parse_api_mode(model_cfg.get("api_mode"))
            if configured_mode:
                api_mode = configured_mode
-            elif base_url.rstrip("/").endswith("/anthropic"):
-                api_mode = "anthropic_messages"
+            else:
+                # Auto-detect api_mode from the URL (e.g. /anthropic suffix → anthropic_messages).
+                detected = _detect_api_mode_for_url(base_url)
+                if detected:
+                    api_mode = detected

        return {
            "provider": provider,
@@ -965,10 +979,13 @@ def resolve_runtime_provider(
            elif provider in ("opencode-zen", "opencode-go"):
                from hermes_cli.models import opencode_model_api_mode
                api_mode = opencode_model_api_mode(provider, model_cfg.get("default", ""))
-            # Auto-detect Anthropic-compatible endpoints by URL convention
-            # (e.g. https://api.minimax.io/anthropic, https://dashscope.../anthropic)
-            elif base_url.rstrip("/").endswith("/anthropic"):
-                api_mode = "anthropic_messages"
+            else:
+                # Auto-detect Anthropic-compatible endpoints by URL convention
+                # (e.g. https://api.minimax.io/anthropic, https://dashscope.../anthropic)
+                # plus api.openai.com → codex_responses and api.x.ai → codex_responses.
+                detected = _detect_api_mode_for_url(base_url)
+                if detected:
+                    api_mode = detected
        # Strip trailing /v1 for OpenCode Anthropic models (see comment above).
        if api_mode == "anthropic_messages" and provider in ("opencode-zen", "opencode-go"):
            base_url = re.sub(r"/v1/?$", "", base_url)
@@ -89,8 +89,8 @@ _DEFAULT_PROVIDER_MODELS = {
        "grok-code-fast-1",
    ],
    "gemini": [
-        "gemini-3.1-pro-preview", "gemini-3-flash-preview", "gemini-3.1-flash-lite-preview",
-        "gemini-2.5-pro", "gemini-2.5-flash", "gemini-2.5-flash-lite",
+        "gemini-3.1-pro-preview", "gemini-3-pro-preview",
+        "gemini-3-flash-preview", "gemini-3.1-flash-lite-preview",
    ],
    "zai": ["glm-5.1", "glm-5", "glm-4.7", "glm-4.5", "glm-4.5-flash"],
    "kimi-coding": ["kimi-k2.5", "kimi-k2-thinking", "kimi-k2-turbo-preview"],
@@ -0,0 +1,82 @@
+from __future__ import annotations
+
+
+def _coerce_timeout(raw: object) -> float | None:
+    try:
+        timeout = float(raw)
+    except (TypeError, ValueError):
+        return None
+    if timeout <= 0:
+        return None
+    return timeout
+
+
+def get_provider_request_timeout(
+    provider_id: str, model: str | None = None
+) -> float | None:
+    """Return a configured provider request timeout in seconds, if any."""
+    if not provider_id:
+        return None
+
+    try:
+        from hermes_cli.config import load_config
+    except ImportError:
+        return None
+
+    config = load_config()
+    providers = config.get("providers", {}) if isinstance(config, dict) else {}
+    provider_config = (
+        providers.get(provider_id, {}) if isinstance(providers, dict) else {}
+    )
+    if not isinstance(provider_config, dict):
+        return None
+
+    model_config = _get_model_config(provider_config, model)
+    if model_config is not None:
+        timeout = _coerce_timeout(model_config.get("timeout_seconds"))
+        if timeout is not None:
+            return timeout
+
+    return _coerce_timeout(provider_config.get("request_timeout_seconds"))
+
+
+def get_provider_stale_timeout(
+    provider_id: str, model: str | None = None
+) -> float | None:
+    """Return a configured non-stream stale timeout in seconds, if any."""
+    if not provider_id:
+        return None
+
+    try:
+        from hermes_cli.config import load_config
+    except ImportError:
+        return None
+
+    config = load_config()
+    providers = config.get("providers", {}) if isinstance(config, dict) else {}
+    provider_config = (
+        providers.get(provider_id, {}) if isinstance(providers, dict) else {}
+    )
+    if not isinstance(provider_config, dict):
+        return None
+
+    model_config = _get_model_config(provider_config, model)
+    if model_config is not None:
+        timeout = _coerce_timeout(model_config.get("stale_timeout_seconds"))
+        if timeout is not None:
+            return timeout
+
+    return _coerce_timeout(provider_config.get("stale_timeout_seconds"))
+
+
+def _get_model_config(
+    provider_config: dict[str, object], model: str | None
+) -> dict[str, object] | None:
+    if not model:
+        return None
+
+    models = provider_config.get("models", {})
+    model_config = models.get(model, {}) if isinstance(models, dict) else {}
+    if isinstance(model_config, dict):
+        return model_config
+    return None
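The lookup order in `get_provider_request_timeout` can be sketched against a plain dict instead of `load_config()`. This assumes the parsed config nests values as `providers.<id>.models.<model>.timeout_seconds` and `providers.<id>.request_timeout_seconds`, which is how the code above reads them. `resolve_request_timeout`, the provider id and the model names are hypothetical.

```python
from __future__ import annotations

from typing import Any


def coerce_timeout(raw: object) -> float | None:
    # Same rules as _coerce_timeout: numeric strings are accepted,
    # anything non-numeric or <= 0 means "not configured".
    try:
        timeout = float(raw)  # type: ignore[arg-type]
    except (TypeError, ValueError):
        return None
    return None if timeout <= 0 else timeout


def resolve_request_timeout(
    config: dict[str, Any], provider_id: str, model: str | None = None
) -> float | None:
    providers = config.get("providers", {})
    provider_cfg = providers.get(provider_id, {}) if isinstance(providers, dict) else {}
    if not isinstance(provider_cfg, dict):
        return None
    # 1) A valid per-model override wins.
    if model:
        models = provider_cfg.get("models", {})
        model_cfg = models.get(model, {}) if isinstance(models, dict) else {}
        if isinstance(model_cfg, dict):
            timeout = coerce_timeout(model_cfg.get("timeout_seconds"))
            if timeout is not None:
                return timeout
    # 2) Otherwise fall back to the provider-wide value.
    return coerce_timeout(provider_cfg.get("request_timeout_seconds"))


# Hypothetical parsed config.yaml; provider and model names are made up.
config = {
    "providers": {
        "my-gateway": {
            "request_timeout_seconds": 120,
            "models": {
                "slow-model": {"timeout_seconds": "600"},
                "bad-model": {"timeout_seconds": 0},
            },
        }
    }
}

print(resolve_request_timeout(config, "my-gateway", "slow-model"))  # 600.0
print(resolve_request_timeout(config, "my-gateway", "bad-model"))  # 120.0
```

`get_provider_stale_timeout` has the same shape, reading `stale_timeout_seconds` at both the model and provider level.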
@@ -245,7 +245,7 @@ TIPS = [
    "Three plugin types: general (tools/hooks), memory providers, and context engines.",
    "hermes plugins install owner/repo installs plugins directly from GitHub.",
    "8 external memory providers available: Honcho, OpenViking, Mem0, Hindsight, and more.",
-    "Plugin hooks include pre_tool_call, post_tool_call, pre_llm_call, and post_llm_call.",
+    "Plugin hooks include pre/post_tool_call, pre/post_llm_call, and transform_terminal_output for output canonicalization.",

    # --- Miscellaneous ---
    "Prompt caching (Anthropic) reduces costs by reusing cached system prompt prefixes.",
@@ -323,7 +323,6 @@ TIPS = [
    "GPT-5 and Codex use 'developer' role instead of 'system' in the message format.",
    "Per-task auxiliary overrides: auxiliary.vision.provider, auxiliary.compression.model, etc. in config.yaml.",
    "The auxiliary client treats 'main' as a provider alias — resolves to your actual primary provider + model.",
-    "Smart routing can auto-route simple queries to a cheaper model — set smart_model_routing.enabled: true.",
    "hermes claw migrate --dry-run previews OpenClaw migration without writing anything.",
    "File paths pasted with quotes or escaped spaces are handled automatically — no manual cleanup needed.",
    "Slash commands never trigger the large-paste collapse — /command with big arguments works correctly.",
@@ -346,4 +345,3 @@ def get_random_tip(exclude_recent: int = 0) -> str:
    return random.choice(TIPS)


-
@@ -232,8 +232,8 @@ _CATEGORY_MERGE: Dict[str, str] = {
    "checkpoints": "agent",
    "approvals": "security",
    "human_delay": "display",
-    "smart_model_routing": "agent",
    "dashboard": "display",
+    "code_execution": "agent",
 }

 # Display order for tabs — unlisted categories sort alphabetically after these.
@@ -1958,6 +1958,8 @@ async def update_config_raw(body: RawConfigUpdate):
@app.get("/api/analytics/usage")
 async def get_usage_analytics(days: int = 30):
    from hermes_state import SessionDB
+    from agent.insights import InsightsEngine
+
    db = SessionDB()
    try:
        cutoff = time.time() - (days * 86400)
@@ -1997,8 +1999,24 @@ async def get_usage_analytics(days: int = 30):
            FROM sessions WHERE started_at > ?
        """, (cutoff,))
        totals = dict(cur3.fetchone())
+        insights_report = InsightsEngine(db).generate(days=days)
+        skills = insights_report.get("skills", {
+            "summary": {
+                "total_skill_loads": 0,
+                "total_skill_edits": 0,
+                "total_skill_actions": 0,
+                "distinct_skills_used": 0,
+            },
+            "top_skills": [],
+        })

-        return {"daily": daily, "by_model": by_model, "totals": totals, "period_days": days}
+        return {
+            "daily": daily,
+            "by_model": by_model,
+            "totals": totals,
+            "period_days": days,
+            "skills": skills,
+        }
    finally:
        db.close()

@@ -155,6 +155,15 @@ def _cmd_subscribe(args):
        "created_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }

+    if getattr(args, "deliver_only", False):
+        if route["deliver"] == "log":
+            print(
+                "Error: --deliver-only requires --deliver to be a real target "
+                "(telegram, discord, slack, github_comment, etc.) — not 'log'."
+            )
+            return
+        route["deliver_only"] = True
+
    if args.deliver_chat_id:
        route["deliver_extra"] = {"chat_id": args.deliver_chat_id}

@@ -172,9 +181,12 @@ def _cmd_subscribe(args):
    else:
        print("  Events: (all)")
    print(f"  Deliver: {route['deliver']}")
+    if route.get("deliver_only"):
+        print("  Mode: direct delivery (no agent, zero LLM cost)")
    if route.get("prompt"):
        prompt_preview = route["prompt"][:80] + ("..." if len(route["prompt"]) > 80 else "")
-        print(f"  Prompt: {prompt_preview}")
+        label = "Message" if route.get("deliver_only") else "Prompt"
+        print(f"  {label}: {prompt_preview}")
    print(f"\n  Configure your service to POST to the URL above.")
    print(f"  Use the secret for HMAC-SHA256 signature validation.")
    print(f"  The gateway must be running to receive events (hermes gateway run).\n")
@@ -192,6 +204,8 @@ def _cmd_list(args):
    for name, route in subs.items():
        events = ", ".join(route.get("events", [])) or "(all)"
        deliver = route.get("deliver", "log")
+        if route.get("deliver_only"):
+            deliver = f"{deliver} (direct — no agent)"
        desc = route.get("description", "")
        print(f"  ◆ {name}")
        if desc:
@@ -383,10 +383,19 @@ class SessionDB:
        return session_id

    def end_session(self, session_id: str, end_reason: str) -> None:
-        """Mark a session as ended."""
+        """Mark a session as ended.
+
+        No-ops when the session is already ended. The first end_reason wins:
+        compression-split sessions must keep their ``end_reason = 'compression'``
+        record even if a later stale ``end_session()`` call (e.g. from a
+        desynced CLI session_id after ``/resume`` or ``/branch``) targets them
+        with a different reason. Use ``reopen_session()`` first if you
+        intentionally need to re-end a closed session with a new reason.
+        """
        def _do(conn):
            conn.execute(
-                "UPDATE sessions SET ended_at = ?, end_reason = ? WHERE id = ?",
+                "UPDATE sessions SET ended_at = ?, end_reason = ? "
+                "WHERE id = ? AND ended_at IS NULL",
                (time.time(), end_reason, session_id),
            )
        self._execute_write(_do)
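The first-reason-wins behaviour comes entirely from the `ended_at IS NULL` guard. Here is a minimal sketch against an in-memory SQLite table. It assumes a three-column schema and a hypothetical `user_exit` reason. Unlike `SessionDB.end_session`, this helper returns `rowcount` so that the no-op is visible.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
# Minimal assumed subset of the sessions table, just enough for the guard.
conn.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY, ended_at REAL, end_reason TEXT)")
conn.execute("INSERT INTO sessions (id) VALUES ('s1')")


def end_session(conn: sqlite3.Connection, session_id: str, end_reason: str) -> int:
    # Same WHERE clause as the patched UPDATE: already-ended rows are skipped.
    cur = conn.execute(
        "UPDATE sessions SET ended_at = ?, end_reason = ? "
        "WHERE id = ? AND ended_at IS NULL",
        (time.time(), end_reason, session_id),
    )
    return cur.rowcount  # 1 if this call ended the session, 0 if it was a no-op


first = end_session(conn, "s1", "compression")
stale = end_session(conn, "s1", "user_exit")  # e.g. from a desynced CLI session_id
reason = conn.execute("SELECT end_reason FROM sessions WHERE id = 's1'").fetchone()[0]
print(first, stale, reason)  # 1 0 compression
```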
@@ -714,6 +723,42 @@ class SessionDB:

        return f"{base} #{max_num + 1}"

+    def get_compression_tip(self, session_id: str) -> Optional[str]:
+        """Walk the compression-continuation chain forward and return the tip.
+
+        A compression continuation is a child session where:
+        1. The parent's ``end_reason = 'compression'``
+        2. The child was created AFTER the parent was ended (started_at >= ended_at)
+
+        The second condition distinguishes compression continuations from
+        delegate subagents or branch children, which can also have a
+        ``parent_session_id`` but were created while the parent was still live.
+
+        Returns the session_id of the latest continuation in the chain, or the
+        input ``session_id`` if it isn't part of a compression chain (or if the
+        input itself doesn't exist).
+        """
+        current = session_id
+        # Bound the walk defensively — compression chains this deep are
+        # pathological and shouldn't happen in practice. 100 = plenty.
+        for _ in range(100):
+            with self._lock:
+                cursor = self._conn.execute(
+                    "SELECT id FROM sessions "
+                    "WHERE parent_session_id = ? "
+                    "  AND started_at >= ("
+                    "      SELECT ended_at FROM sessions "
+                    "      WHERE id = ? AND end_reason = 'compression'"
+                    "  ) "
+                    "ORDER BY started_at DESC LIMIT 1",
+                    (current, current),
+                )
+                row = cursor.fetchone()
+            if row is None:
+                return current
+            current = row["id"]
+        return current
+
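A small chain shows how the `started_at >= ended_at` condition separates compression continuations from delegates. The schema is a minimal assumed subset of `sessions`, and the session ids and the `done` reason are made up. The query string is the one from `get_compression_tip`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # so rows can be read as row["id"]
conn.execute(
    "CREATE TABLE sessions (id TEXT PRIMARY KEY, parent_session_id TEXT, "
    "started_at REAL, ended_at REAL, end_reason TEXT)"
)
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?, ?, ?)",
    [
        ("root", None, 0.0, 10.0, "compression"),
        ("delegate", "root", 5.0, 6.0, "done"),  # spawned while root was live
        ("cont1", "root", 10.0, 20.0, "compression"),
        ("cont2", "cont1", 20.0, None, None),  # live tip of the chain
    ],
)

# A child counts as a continuation only if the parent ended with
# 'compression' and the child started afterwards. If the parent is not
# compression-ended, the subquery yields NULL and no row passes the filter.
NEXT_LINK_SQL = (
    "SELECT id FROM sessions "
    "WHERE parent_session_id = ? "
    "  AND started_at >= ("
    "      SELECT ended_at FROM sessions "
    "      WHERE id = ? AND end_reason = 'compression'"
    "  ) "
    "ORDER BY started_at DESC LIMIT 1"
)


def compression_tip(conn: sqlite3.Connection, session_id: str, max_hops: int = 100) -> str:
    current = session_id
    for _ in range(max_hops):  # bounded walk, as in the original
        row = conn.execute(NEXT_LINK_SQL, (current, current)).fetchone()
        if row is None:
            return current
        current = row["id"]
    return current


print(compression_tip(conn, "root"))  # cont2
print(compression_tip(conn, "delegate"))  # delegate
```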
    def list_sessions_rich(
        self,
        source: str = None,
@@ -721,6 +766,7 @@ class SessionDB:
        limit: int = 20,
        offset: int = 0,
        include_children: bool = False,
+        project_compression_tips: bool = True,
    ) -> List[Dict[str, Any]]:
        """List sessions with preview (first user message) and last active timestamp.

@@ -732,6 +778,14 @@ class SessionDB:

        By default, child sessions (subagent runs, compression continuations)
        are excluded.  Pass ``include_children=True`` to include them.
+
+        With ``project_compression_tips=True`` (default), sessions that are
+        roots of compression chains are projected forward to their latest
+        continuation — one logical conversation = one list entry, showing the
+        live continuation's id/message_count/title/last_active. This prevents
+        compressed continuations from being invisible to users while keeping
+        delegate subagents and branches hidden. Pass ``False`` to return the
+        raw root rows (useful for admin/debug UIs).
        """
        where_clauses = []
        params = []
@@ -782,8 +836,77 @@ class SessionDB:
                s["preview"] = ""
            sessions.append(s)

+        # Project compression roots forward to their tips. Each row whose
+        # end_reason is 'compression' has a continuation child; replace the
+        # surfaced fields (id, message_count, title, last_active, ended_at,
+        # end_reason, preview) with the tip's values so the list entry acts
+        # as the live conversation. Keep the root's started_at to preserve
+        # chronological ordering by original conversation start.
+        if project_compression_tips and not include_children:
+            projected = []
+            for s in sessions:
+                if s.get("end_reason") != "compression":
+                    projected.append(s)
+                    continue
+                tip_id = self.get_compression_tip(s["id"])
+                if tip_id == s["id"]:
+                    projected.append(s)
+                    continue
+                tip_row = self._get_session_rich_row(tip_id)
+                if not tip_row:
+                    projected.append(s)
+                    continue
+                # Preserve the root's started_at for stable sort order, but
+                # surface the tip's identity and activity data.
+                merged = dict(s)
+                for key in (
+                    "id", "ended_at", "end_reason", "message_count",
+                    "tool_call_count", "title", "last_active", "preview",
+                    "model", "system_prompt",
+                ):
+                    if key in tip_row:
+                        merged[key] = tip_row[key]
+                merged["_lineage_root_id"] = s["id"]
+                projected.append(merged)
+            sessions = projected
+
        return sessions

+    def _get_session_rich_row(self, session_id: str) -> Optional[Dict[str, Any]]:
+        """Fetch a single session with the same enriched columns as
+        ``list_sessions_rich`` (preview + last_active). Returns None if the
+        session doesn't exist.
+        """
+        query = """
+            SELECT s.*,
+                COALESCE(
+                    (SELECT SUBSTR(REPLACE(REPLACE(m.content, X'0A', ' '), X'0D', ' '), 1, 63)
+                     FROM messages m
+                     WHERE m.session_id = s.id AND m.role = 'user' AND m.content IS NOT NULL
+                     ORDER BY m.timestamp, m.id LIMIT 1),
+                    ''
+                ) AS _preview_raw,
+                COALESCE(
+                    (SELECT MAX(m2.timestamp) FROM messages m2 WHERE m2.session_id = s.id),
+                    s.started_at
+                ) AS last_active
+            FROM sessions s
+            WHERE s.id = ?
+        """
+        with self._lock:
+            cursor = self._conn.execute(query, (session_id,))
+            row = cursor.fetchone()
+        if not row:
+            return None
+        s = dict(row)
+        raw = s.pop("_preview_raw", "").strip()
+        if raw:
+            text = raw[:60]
+            s["preview"] = text + ("..." if len(raw) > 60 else "")
+        else:
+            s["preview"] = ""
+        return s
+
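Once the tip row is fetched, the projection step in `list_sessions_rich` is pure dictionary merging. The sketch below injects the tip lookup and the row fetch as callables standing in for `get_compression_tip` and `_get_session_rich_row`. `project_tips`, `get_tip`, `get_row` and the sample rows are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]

# Fields copied from the tip; everything else (notably started_at) stays
# from the root so ordering by original conversation start is unchanged.
TIP_FIELDS = (
    "id", "ended_at", "end_reason", "message_count", "tool_call_count",
    "title", "last_active", "preview", "model", "system_prompt",
)


def project_tips(
    sessions: List[Row],
    get_tip: Callable[[str], str],
    get_row: Callable[[str], Optional[Row]],
) -> List[Row]:
    projected: List[Row] = []
    for s in sessions:
        if s.get("end_reason") != "compression":
            projected.append(s)
            continue
        tip_id = get_tip(s["id"])
        tip_row = get_row(tip_id) if tip_id != s["id"] else None
        if not tip_row:
            projected.append(s)  # no usable tip: keep the raw root row
            continue
        merged = dict(s)
        for key in TIP_FIELDS:
            if key in tip_row:
                merged[key] = tip_row[key]
        merged["_lineage_root_id"] = s["id"]
        projected.append(merged)
    return projected


roots = [
    {"id": "root", "started_at": 0.0, "end_reason": "compression", "message_count": 40},
    {"id": "solo", "started_at": 5.0, "end_reason": None, "message_count": 3},
]
tips = {"cont2": {"id": "cont2", "started_at": 20.0, "end_reason": None, "message_count": 12}}
result = project_tips(roots, lambda sid: "cont2" if sid == "root" else sid, tips.get)
print(result[0]["id"], result[0]["started_at"], result[0]["_lineage_root_id"])  # cont2 0.0 root
```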
    # =========================================================================
    # Message storage
    # =========================================================================
@@ -1126,10 +1249,37 @@ class SessionDB:
            try:
                with self._lock:
                    ctx_cursor = self._conn.execute(
-                        """SELECT role, content FROM messages
-                           WHERE session_id = ? AND id >= ? - 1 AND id <= ? + 1
-                           ORDER BY id""",
-                        (match["session_id"], match["id"], match["id"]),
+                        """WITH target AS (
+                               SELECT session_id, timestamp, id
+                               FROM messages
+                               WHERE id = ?
+                           )
+                           SELECT role, content
+                           FROM (
+                               SELECT m.id, m.timestamp, m.role, m.content
+                               FROM messages m
+                               JOIN target t ON t.session_id = m.session_id
+                               WHERE (m.timestamp < t.timestamp)
+                                  OR (m.timestamp = t.timestamp AND m.id < t.id)
+                               ORDER BY m.timestamp DESC, m.id DESC
+                               LIMIT 1
+                           )
+                           UNION ALL
+                           SELECT role, content
+                           FROM messages
+                           WHERE id = ?
+                           UNION ALL
+                           SELECT role, content
+                           FROM (
+                               SELECT m.id, m.timestamp, m.role, m.content
+                               FROM messages m
+                               JOIN target t ON t.session_id = m.session_id
+                               WHERE (m.timestamp > t.timestamp)
+                                  OR (m.timestamp = t.timestamp AND m.id > t.id)
+                               ORDER BY m.timestamp ASC, m.id ASC
+                               LIMIT 1
+                           )""",
+                        (match["id"], match["id"]),
                    )
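To see why the search-context lookup moved from `id ± 1` arithmetic to `(timestamp, id)` ordering, the new query can be run against a throwaway in-memory table. The five-column `messages` schema is a minimal assumption. The SQL is the query from the patch, lightly reflowed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, session_id TEXT, "
    "timestamp REAL, role TEXT, content TEXT)"
)
# Row ids deliberately disagree with timestamps, and id 4 belongs to
# another session: both cases break the old "id - 1 .. id + 1" window.
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?, ?, ?)",
    [
        (1, "s1", 3.0, "assistant", "third"),
        (2, "s1", 1.0, "user", "first"),
        (3, "s1", 2.0, "assistant", "second"),
        (4, "s2", 2.5, "user", "other session"),
    ],
)

CONTEXT_SQL = """
WITH target AS (SELECT session_id, timestamp, id FROM messages WHERE id = ?)
SELECT role, content FROM (
    SELECT m.id, m.timestamp, m.role, m.content
    FROM messages m JOIN target t ON t.session_id = m.session_id
    WHERE m.timestamp < t.timestamp
       OR (m.timestamp = t.timestamp AND m.id < t.id)
    ORDER BY m.timestamp DESC, m.id DESC LIMIT 1
)
UNION ALL
SELECT role, content FROM messages WHERE id = ?
UNION ALL
SELECT role, content FROM (
    SELECT m.id, m.timestamp, m.role, m.content
    FROM messages m JOIN target t ON t.session_id = m.session_id
    WHERE m.timestamp > t.timestamp
       OR (m.timestamp = t.timestamp AND m.id > t.id)
    ORDER BY m.timestamp ASC, m.id ASC LIMIT 1
)
"""

rows = conn.execute(CONTEXT_SQL, (3, 3)).fetchall()
context = [r["content"] for r in rows]
print(context)  # ['first', 'second', 'third']
```

With the old window, the same target (id 3) would have returned only ids 2 and 3 from `s1`. The chronologically next message, id 1, would have been missed.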
                    context_msgs = [
                        {"role": r["role"], "content": (r["content"] or "")[:200]}
@@ -43,13 +43,16 @@ from dotenv import load_dotenv
 load_dotenv()


-def _effective_temperature_for_model(model: str) -> Optional[float]:
+def _effective_temperature_for_model(
+    model: str,
+    base_url: Optional[str] = None,
+) -> Optional[float]:
    """Return a fixed temperature for models with strict sampling contracts."""
    try:
        from agent.auxiliary_client import _fixed_temperature_for_model
    except Exception:
        return None
-    return _fixed_temperature_for_model(model)
+    return _fixed_temperature_for_model(model, base_url)



@@ -457,7 +460,10 @@ Complete the user's task step by step."""
                        "tools": self.tools,
                        "timeout": 300.0,
                    }
-                    fixed_temperature = _effective_temperature_for_model(self.model)
+                    fixed_temperature = _effective_temperature_for_model(
+                        self.model,
+                        str(getattr(self.client, "base_url", "") or ""),
+                    )
                    if fixed_temperature is not None:
                        api_kwargs["temperature"] = fixed_temperature

@@ -282,6 +282,31 @@ def get_tool_definitions(
                filtered_tools[i] = {"type": "function", "function": dynamic_schema}
                break

+    # Rebuild discord_server schema based on the bot's privileged intents
+    # (detected from GET /applications/@me) and the user's action allowlist
+    # in config.  Hides actions the bot's intents don't support so the
+    # model never attempts them, and annotates fetch_messages when the
+    # MESSAGE_CONTENT intent is missing.
+    if "discord_server" in available_tool_names:
+        try:
+            from tools.discord_tool import get_dynamic_schema
+            dynamic = get_dynamic_schema()
+        except Exception:  # pragma: no cover — defensive, fall back to static
+            dynamic = None
+        if dynamic is None:
+            # Tool filtered out entirely (empty allowlist, or intent detection
+            # removed every remaining action).  Drop it from the schema list.
+            filtered_tools = [
+                t for t in filtered_tools
+                if t.get("function", {}).get("name") != "discord_server"
+            ]
+            available_tool_names.discard("discord_server")
+        else:
+            for i, td in enumerate(filtered_tools):
+                if td.get("function", {}).get("name") == "discord_server":
+                    filtered_tools[i] = {"type": "function", "function": dynamic}
+                    break
+
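The discord_server block either drops the tool or swaps in the rebuilt schema. Here is a generic sketch of that drop-or-replace step. The `apply_dynamic_schema` helper is hypothetical, and the tool dicts are assumed to follow the `{"type": "function", "function": {...}}` shape used above.

```python
from typing import Any, Dict, List, Optional, Set

Tool = Dict[str, Any]


def apply_dynamic_schema(
    tools: List[Tool], available: Set[str], name: str, dynamic: Optional[Dict[str, Any]]
) -> List[Tool]:
    if dynamic is None:
        # No usable actions left: remove the tool from the list and the name set.
        available.discard(name)
        return [t for t in tools if t.get("function", {}).get("name") != name]
    out = list(tools)  # shallow copy so the caller's list is not mutated
    for i, td in enumerate(out):
        if td.get("function", {}).get("name") == name:
            out[i] = {"type": "function", "function": dynamic}  # swap in the rebuilt schema
            break
    return out


tools = [
    {"type": "function", "function": {"name": "discord_server", "description": "static"}},
    {"type": "function", "function": {"name": "web_search"}},
]
names = {"discord_server", "web_search"}
dropped = apply_dynamic_schema(tools, names, "discord_server", None)
print([t["function"]["name"] for t in dropped], sorted(names))  # ['web_search'] ['web_search']
```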
    # Strip web tool cross-references from browser_navigate description when
    # web_search / web_extract are not available.  The static schema says
    # "prefer web_search or web_extract" which causes the model to hallucinate
@@ -525,6 +550,30 @@ def handle_function_call(
        except Exception:
            pass

+        # Generic tool-result canonicalization seam: plugins receive the
+        # final result string (JSON, usually) and may replace it by
+        # returning a string from transform_tool_result. Runs after
+        # post_tool_call (which stays observational) and before the result
+        # is appended back into conversation context. Fail-open; the first
+        # valid string return wins; non-string returns are ignored.
+        try:
+            from hermes_cli.plugins import invoke_hook
+            hook_results = invoke_hook(
+                "transform_tool_result",
+                tool_name=function_name,
+                args=function_args,
+                result=result,
+                task_id=task_id or "",
+                session_id=session_id or "",
+                tool_call_id=tool_call_id or "",
+            )
+            for hook_result in hook_results:
+                if isinstance(hook_result, str):
+                    result = hook_result
+                    break
+        except Exception:
+            pass
+
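The `transform_tool_result` contract works as follows: it is fail-open, the first string return wins, and non-string returns are ignored. The sketch models it with plain callables instead of the plugin manager. Two details are assumptions: that `invoke_hook` returns a list of per-plugin results, and that individual hook failures are skipped (the patch itself only wraps the whole call in one `try`). `apply_transform_hooks` and the sample hooks are hypothetical.

```python
import json
from typing import Any, Callable, Iterable, List


def apply_transform_hooks(
    result: str, hooks: Iterable[Callable[..., Any]], **context: Any
) -> str:
    # Collect every hook's return value, as invoke_hook is assumed to do.
    outputs: List[Any] = []
    for hook in hooks:
        try:
            outputs.append(hook(result=result, **context))
        except Exception:
            continue  # fail-open: a broken plugin never loses the tool result
    # The first string return replaces the result; None, dicts, etc. are ignored.
    for out in outputs:
        if isinstance(out, str):
            return out
    return result


def observe_only(**kwargs: Any) -> None:
    return None  # purely observational plugin


def drop_api_key(result: str, **kwargs: Any) -> str:
    data = json.loads(result)
    data.pop("api_key", None)
    return json.dumps(data)


def late_rewrite(**kwargs: Any) -> str:
    return "never used"  # loses because drop_api_key returned a string first


raw = json.dumps({"ok": True, "api_key": "sk-test"})
print(apply_transform_hooks(raw, [observe_only, drop_api_key, late_rewrite], tool_name="terminal"))
# {"ok": true}
```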
        return result

    except Exception as e:
@@ -7,7 +7,7 @@ license: MIT
 metadata:
  hermes:
    tags: [telephony, phone, sms, mms, voice, twilio, bland.ai, vapi, calling, texting]
-    related_skills: [find-nearby, google-workspace, agentmail]
+    related_skills: [maps, google-workspace, agentmail]
    category: productivity
 ---

@@ -57,32 +57,32 @@ Use the `ddgs` command via `terminal` when it exists. This is the preferred path

 ```bash
 # Text search
-ddgs text -k "python async programming" -m 5
+ddgs text -q "python async programming" -m 5

 # News search
-ddgs news -k "artificial intelligence" -m 5
+ddgs news -q "artificial intelligence" -m 5

 # Image search
-ddgs images -k "landscape photography" -m 10
+ddgs images -q "landscape photography" -m 10

 # Video search
-ddgs videos -k "python tutorial" -m 5
+ddgs videos -q "python tutorial" -m 5

 # With region filter
-ddgs text -k "best restaurants" -m 5 -r us-en
+ddgs text -q "best restaurants" -m 5 -r us-en

 # Recent results only (d=day, w=week, m=month, y=year)
-ddgs text -k "latest AI news" -m 5 -t w
+ddgs text -q "latest AI news" -m 5 -t w

 # JSON output for parsing
-ddgs text -k "fastapi tutorial" -m 5 -o json
+ddgs text -q "fastapi tutorial" -m 5 -o json
 ```

 ### CLI Flags

 | Flag | Description | Example |
 |------|-------------|---------|
-| `-k` | Keywords (query) — **required** | `-k "search terms"` |
+| `-q` | Query — **required** | `-q "search terms"` |
 | `-m` | Max results | `-m 5` |
 | `-r` | Region | `-r us-en` |
 | `-t` | Time limit | `-t w` (week) |
@@ -189,7 +189,7 @@ DuckDuckGo returns titles, URLs, and snippets — not full page content. To get
 CLI example:

 ```bash
-ddgs text -k "fastapi deployment guide" -m 3 -o json
+ddgs text -q "fastapi deployment guide" -m 3 -o json
 ```

 Python example, only after verifying `ddgs` is installed in that runtime:
@@ -229,7 +229,7 @@ Then extract the best URL with `web_extract` or another content-retrieval tool.
 - **Do not assume the CLI exists**: Check `command -v ddgs` before using it.
 - **Do not assume `execute_code` can import `ddgs`**: `from ddgs import DDGS` may fail with `ModuleNotFoundError` unless that runtime was prepared separately.
 - **Package name**: The package is `ddgs` (previously `duckduckgo-search`). Install with `pip install ddgs`.
- **Don't confuse `-k` and `-m`** (CLI): `-k` is for keywords, `-m` is for max results count.
+- **Don't confuse `-q` and `-m`** (CLI): `-q` is for the query, `-m` is for max results count.
 - **Empty results**: If `ddgs` returns nothing, it may be rate-limited. Wait a few seconds and retry.

 ## Validated With
@@ -25,4 +25,4 @@ if ! command -v ddgs &> /dev/null; then
    exit 1
 fi

-ddgs text -k "$QUERY" -m "$MAX_RESULTS"
+ddgs text -q "$QUERY" -m "$MAX_RESULTS"
@@ -4,7 +4,7 @@
 Add a first-class `gemini` provider that authenticates via Google OAuth, using the standard Gemini API (not Cloud Code Assist). Users who have a Google AI subscription or Gemini API access can authenticate through the browser without needing to manually copy API keys.

 ## Architecture Decision
- **Path A (chosen):** Standard Gemini API at `generativelanguage.googleapis.com/v1beta/openai/`
+- **Path A (chosen):** Standard Gemini API at `generativelanguage.googleapis.com/v1beta`
 - **NOT Path B:** Cloud Code Assist (`cloudcode-pa.googleapis.com`) — rate-limited free tier, internal API, account ban risk
 - Standard `chat_completions` api_mode via OpenAI SDK — no new api_mode needed
 - Our own OAuth credentials — NOT sharing tokens with Gemini CLI
@@ -32,9 +32,9 @@ Add a first-class `gemini` provider that authenticates via Google OAuth, using t
 - File locking for concurrent access (multiple agent sessions)

 ## API Integration
- Base URL: `https://generativelanguage.googleapis.com/v1beta/openai/`
- Auth: `Authorization: Bearer <access_token>` (passed as `api_key` to OpenAI SDK)
- api_mode: `chat_completions` (standard)
+- Base URL: `https://generativelanguage.googleapis.com/v1beta`
+- Auth: native Gemini API authentication handled by the provider adapter
+- api_mode: `chat_completions` (standard facade over native transport)
 - Models: gemini-2.5-pro, gemini-2.5-flash, gemini-2.0-flash, etc.

 ## Files to Create/Modify
@@ -0,0 +1,51 @@
+# disk-cleanup
+
+Auto-tracks and cleans up ephemeral files created during Hermes Agent
+sessions — test scripts, temp outputs, cron logs, stale chrome profiles.
+Scoped strictly to `$HERMES_HOME` and `/tmp/hermes-*`.
+
+Originally contributed by [@LVT382009](https://github.com/LVT382009) as a
+skill in PR #12212.  Ported to the plugin system so the behaviour runs
+automatically via `post_tool_call` and `on_session_end` hooks — the agent
+never needs to remember to call a tool.
+
+## How it works
+
+| Hook | Behaviour |
+|---|---|
+| `post_tool_call` | When `write_file` / `terminal` / `patch` creates a file matching `test_*`, `tmp_*`, or `*.test.*` inside `HERMES_HOME`, track it silently as `test` / `temp` / `cron-output`. |
+| `on_session_end` | If any test files were auto-tracked during this turn, run `quick` cleanup (no prompts). |
+
+Deletion rules (same as the original PR):
+
+| Category | Threshold | Confirmation |
+|---|---|---|
+| `test` | every session end | Never |
+| `temp` | >7 days since tracked | Never |
+| `cron-output` | >14 days since tracked | Never |
+| empty dirs under HERMES_HOME | always | Never |
+| `research` | >30 days, beyond 10 newest | Always (deep only) |
+| `chrome-profile` | >14 days since tracked | Always (deep only) |
+| files >500 MB | never auto | Always (deep only) |
+
+## Slash command
+
+```
+/disk-cleanup status                     # breakdown + top-10 largest
+/disk-cleanup dry-run                    # preview without deleting
+/disk-cleanup quick                      # run safe cleanup now
+/disk-cleanup deep                       # quick + list items needing prompt
+/disk-cleanup track <path> <category>    # manual tracking
+/disk-cleanup forget <path>              # stop tracking
+```
+
+## Safety
+
+- `is_safe_path()` rejects anything outside `HERMES_HOME` or `/tmp/hermes-*`
+- Windows mounts (`/mnt/c` etc.) are rejected
+- The state directory `$HERMES_HOME/disk-cleanup/` is itself excluded
+- `$HERMES_HOME/logs/`, `memories/`, `sessions/`, `skills/`, `plugins/`,
+  and config files are never tracked
+- Backup/restore is scoped to `tracked.json` — the plugin never touches
+  agent logs
+- Atomic writes: `.tmp` → backup → rename
@@ -0,0 +1,316 @@
+"""disk-cleanup plugin — auto-cleanup of ephemeral Hermes session files.
+
+Wires three behaviours:
+
+1. ``post_tool_call`` hook — inspects ``write_file``, ``patch``, and
+   ``terminal`` tool calls for newly-created paths matching test/temp patterns
+   under ``HERMES_HOME`` and tracks them silently.  Zero agent
+   compliance required.
+
+2. ``on_session_end`` hook — when any test files were auto-tracked
+   during the just-finished turn, runs :func:`disk_cleanup.quick` and
+   logs a single line to ``$HERMES_HOME/disk-cleanup/cleanup.log``.
+
+3. ``/disk-cleanup`` slash command — manual ``status``, ``dry-run``,
+   ``quick``, ``deep``, ``track``, ``forget``.
+
+Replaces PR #12212's skill-plus-script design: the agent no longer
+needs to remember to run commands.
+"""
+
+from __future__ import annotations
+
+import logging
+import re
+import shlex
+import threading
+from pathlib import Path
+from typing import Any, Dict, Optional, Set
+
+from . import disk_cleanup as dg
+
+logger = logging.getLogger(__name__)
+
+
+# Per-task set of "test files newly tracked this turn".  Keyed by task_id
+# (or session_id as fallback) so on_session_end can decide whether to run
+# cleanup.  Guarded by a lock — post_tool_call can fire concurrently on
+# parallel tool calls.
+_recent_test_tracks: Dict[str, Set[str]] = {}
+_lock = threading.Lock()
+
+
+# Tool-call result shapes we can parse
+_WRITE_FILE_PATH_KEY = "path"
+_TERMINAL_PATH_REGEX = re.compile(r"(?:^|\s)(/[^\s'\"`]+|\~/[^\s'\"`]+)")
+
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+def _tracker_key(task_id: str, session_id: str) -> str:
+    return task_id or session_id or "default"
+
+
+def _record_track(task_id: str, session_id: str, path: Path, category: str) -> None:
+    """Record that we tracked *path* as *category* during this turn."""
+    if category != "test":
+        return
+    key = _tracker_key(task_id, session_id)
+    with _lock:
+        _recent_test_tracks.setdefault(key, set()).add(str(path))
+
+
+def _drain(task_id: str, session_id: str) -> Set[str]:
+    """Pop the set of test paths tracked during this turn."""
+    key = _tracker_key(task_id, session_id)
+    with _lock:
+        return _recent_test_tracks.pop(key, set())
+
+
+def _attempt_track(path_str: str, task_id: str, session_id: str) -> None:
+    """Best-effort auto-track. Never raises."""
+    try:
+        p = Path(path_str).expanduser()
+    except Exception:
+        return
+    if not p.exists():
+        return
+    category = dg.guess_category(p)
+    if category is None:
+        return
+    newly = dg.track(str(p), category, silent=True)
+    if newly:
+        _record_track(task_id, session_id, p, category)
+
+
+def _extract_paths_from_write_file(args: Dict[str, Any]) -> Set[str]:
+    path = args.get(_WRITE_FILE_PATH_KEY)
+    return {path} if isinstance(path, str) and path else set()
+
+
+def _extract_paths_from_patch(args: Dict[str, Any]) -> Set[str]:
+    # The patch tool creates new files via the `mode="patch"` path too, but
+    # most of its use is editing existing files — we only care about new
+    # ephemeral creations, so treat patch conservatively and only pick up
+    # the single-file `path` arg.  Track-then-cleanup is idempotent, so
+    # re-tracking an already-tracked file is a no-op (dedup in track()).
+    path = args.get("path")
+    return {path} if isinstance(path, str) and path else set()
+
+
+def _extract_paths_from_terminal(args: Dict[str, Any], result: str) -> Set[str]:
+    """Best-effort: pull candidate filesystem paths from a terminal command
+    and its output, then let ``guess_category`` / ``is_safe_path`` filter.
+    """
+    paths: Set[str] = set()
+    cmd = args.get("command") or ""
+    if isinstance(cmd, str) and cmd:
+        # Tokenise the command — catches `touch /tmp/hermes-x/test_foo.py`
+        try:
+            for tok in shlex.split(cmd, posix=True):
+                if tok.startswith(("/", "~")):
+                    paths.add(tok)
+        except ValueError:
+            pass
+    # Only scan the result text if it's a reasonable size (avoid 50KB dumps).
+    if isinstance(result, str) and len(result) < 4096:
+        for match in _TERMINAL_PATH_REGEX.findall(result):
+            paths.add(match)
+    return paths
+
+
+# ---------------------------------------------------------------------------
+# Hooks
+# ---------------------------------------------------------------------------
+
+def _on_post_tool_call(
+    tool_name: str = "",
+    args: Optional[Dict[str, Any]] = None,
+    result: Any = None,
+    task_id: str = "",
+    session_id: str = "",
+    tool_call_id: str = "",
+    **_: Any,
+) -> None:
+    """Auto-track ephemeral files created by recent tool calls."""
+    if not isinstance(args, dict):
+        return
+
+    candidates: Set[str] = set()
+    if tool_name == "write_file":
+        candidates = _extract_paths_from_write_file(args)
+    elif tool_name == "patch":
+        candidates = _extract_paths_from_patch(args)
+    elif tool_name == "terminal":
+        candidates = _extract_paths_from_terminal(args, result if isinstance(result, str) else "")
+    else:
+        return
+
+    for path_str in candidates:
+        _attempt_track(path_str, task_id, session_id)
+
+
+def _on_session_end(
+    session_id: str = "",
+    completed: bool = True,
+    interrupted: bool = False,
+    **_: Any,
+) -> None:
+    """Run quick cleanup if any test files were tracked during this turn."""
+    # Drain both task-level and session-level buckets.  In practice only one
+    # is populated per turn; the other is empty.
+    drained_session = _drain("", session_id)
+    # Also drain any task-scoped buckets that happen to exist.  This is a
+    # cheap sweep: if an agent spawned subagents (each with its own
+    # task_id), they will have recorded into separate buckets, and we want
+    # to clean them all up at session end.  Pop under the lock, since
+    # post_tool_call may still be writing from parallel tool calls.
+    with _lock:
+        task_buckets = [k for k in _recent_test_tracks if k and k != session_id]
+        for key in task_buckets:
+            _recent_test_tracks.pop(key, None)
+
+    if not drained_session and not task_buckets:
+        return
+
+    try:
+        summary = dg.quick()
+    except Exception as exc:
+        logger.debug("disk-cleanup quick cleanup failed: %s", exc)
+        return
+
+    if summary["deleted"] or summary["empty_dirs"]:
+        dg._log(
+            f"AUTO_QUICK (session_end): deleted={summary['deleted']} "
+            f"dirs={summary['empty_dirs']} freed={dg.fmt_size(summary['freed'])}"
+        )
+
+
+# ---------------------------------------------------------------------------
+# Slash command
+# ---------------------------------------------------------------------------
+
+_HELP_TEXT = """\
+/disk-cleanup — ephemeral-file cleanup
+
+Subcommands:
+  status                     Per-category breakdown + top-10 largest
+  dry-run                    Preview what quick/deep would delete
+  quick                      Run safe cleanup now (no prompts)
+  deep                       Run quick, then list items that need prompts
+  track <path> <category>    Manually add a path to tracking
+  forget <path>              Stop tracking a path (does not delete)
+
+Categories: temp | test | research | download | chrome-profile | cron-output | other
+
+All operations are scoped to HERMES_HOME and /tmp/hermes-*.
+Test files are auto-tracked on write_file / patch / terminal and auto-cleaned at session end.
+"""
+
+
+def _fmt_summary(summary: Dict[str, Any]) -> str:
+    base = (
+        f"[disk-cleanup] Cleaned {summary['deleted']} files + "
+        f"{summary['empty_dirs']} empty dirs, freed {dg.fmt_size(summary['freed'])}."
+    )
+    if summary.get("errors"):
+        base += f"\n  {len(summary['errors'])} error(s); see cleanup.log."
+    return base
+
+
+def _handle_slash(raw_args: str) -> Optional[str]:
+    argv = raw_args.strip().split()
+    if not argv or argv[0] in ("help", "-h", "--help"):
+        return _HELP_TEXT
+
+    sub = argv[0]
+
+    if sub == "status":
+        return dg.format_status(dg.status())
+
+    if sub == "dry-run":
+        auto, prompt = dg.dry_run()
+        auto_size = sum(i["size"] for i in auto)
+        prompt_size = sum(i["size"] for i in prompt)
+        lines = [
+            "Dry-run preview (nothing deleted):",
+            f"  Auto-delete : {len(auto)} files ({dg.fmt_size(auto_size)})",
+        ]
+        for item in auto:
+            lines.append(f"    [{item['category']}] {item['path']}")
+        lines.append(
+            f"  Needs prompt: {len(prompt)} files ({dg.fmt_size(prompt_size)})"
+        )
+        for item in prompt:
+            lines.append(f"    [{item['category']}] {item['path']}")
+        lines.append(
+            f"\n  Total potential: {dg.fmt_size(auto_size + prompt_size)}"
+        )
+        return "\n".join(lines)
+
+    if sub == "quick":
+        return _fmt_summary(dg.quick())
+
+    if sub == "deep":
+        # In-session deep can't prompt the user interactively — show what
+        # quick cleaned plus the items that WOULD need confirmation.
+        quick_summary = dg.quick()
+        _auto, prompt_items = dg.dry_run()
+        lines = [_fmt_summary(quick_summary)]
+        if prompt_items:
+            size = sum(i["size"] for i in prompt_items)
+            lines.append(
+                f"\n{len(prompt_items)} item(s) need confirmation "
+                f"({dg.fmt_size(size)}):"
+            )
+            for item in prompt_items:
+                lines.append(f"  [{item['category']}] {item['path']}")
+            lines.append(
+                "\nRun `/disk-cleanup forget <path>` to skip, or delete "
+                "manually via terminal."
+            )
+        return "\n".join(lines)
+
+    if sub == "track":
+        if len(argv) < 3:
+            return "Usage: /disk-cleanup track <path> <category>"
+        path_arg = argv[1]
+        category = argv[2]
+        if category not in dg.ALLOWED_CATEGORIES:
+            return (
+                f"Unknown category '{category}'. "
+                f"Allowed: {sorted(dg.ALLOWED_CATEGORIES)}"
+            )
+        if dg.track(path_arg, category, silent=True):
+            return f"Tracked {path_arg} as '{category}'."
+        return (
+            f"Not tracked (already present, missing, or outside HERMES_HOME and /tmp/hermes-*): "
+            f"{path_arg}"
+        )
+
+    if sub == "forget":
+        if len(argv) < 2:
+            return "Usage: /disk-cleanup forget <path>"
+        n = dg.forget(argv[1])
+        return (
+            f"Removed {n} tracking entr{'y' if n == 1 else 'ies'} for {argv[1]}."
+            if n else f"Not found in tracking: {argv[1]}"
+        )
+
+    return f"Unknown subcommand: {sub}\n\n{_HELP_TEXT}"
+
+
+# ---------------------------------------------------------------------------
+# Plugin registration
+# ---------------------------------------------------------------------------
+
+def register(ctx) -> None:
+    ctx.register_hook("post_tool_call", _on_post_tool_call)
+    ctx.register_hook("on_session_end", _on_session_end)
+    ctx.register_command(
+        "disk-cleanup",
+        handler=_handle_slash,
+        description="Track and clean up ephemeral Hermes session files.",
+    )
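The terminal branch of the `post_tool_call` hook above collects candidate paths from two sources: the shell tokens of the command and, when it is short, the command output. A minimal standalone sketch of that extraction, assuming the same 4 KB output cap and an equivalent regex; `candidate_paths` is a hypothetical name, not part of the plugin, and the later `guess_category` / `is_safe_path` filtering is left out:

```python
import re
import shlex
from typing import Set

# Equivalent to _TERMINAL_PATH_REGEX above: an absolute or "~/" path that
# starts the text or follows whitespace, ending at whitespace or a quote.
PATH_RE = re.compile(r"(?:^|\s)(/[^\s'\"`]+|~/[^\s'\"`]+)")


def candidate_paths(command: str, output: str, max_output: int = 4096) -> Set[str]:
    """Collect path-like strings from a shell command and its (short) output."""
    paths: Set[str] = set()
    try:
        # POSIX tokenisation honours quotes, so "touch '/tmp/a b'" yields one token.
        for tok in shlex.split(command, posix=True):
            if tok.startswith(("/", "~")):
                paths.add(tok)
    except ValueError:
        # Unbalanced quotes: give up on the command but still scan the output.
        pass
    # Skip large outputs entirely rather than scanning a partial prefix.
    if len(output) < max_output:
        paths.update(PATH_RE.findall(output))
    return paths
```

Neither source picks up a path that follows `=` (for example `--out=/tmp/hermes-x/a.txt`): the shlex token does not start with `/`, and the regex requires whitespace before the path. For a best-effort tracker that seems acceptable, since a missed file is simply not auto-cleaned.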
@@ -0,0 +1,496 @@
+"""disk_cleanup — ephemeral file cleanup for Hermes Agent.
+
+Library module wrapping the deterministic cleanup rules written by
+@LVT382009 in PR #12212. The plugin ``__init__.py`` wires these
+functions into ``post_tool_call`` and ``on_session_end`` hooks so
+tracking and cleanup happen automatically — the agent never needs to
+call a tool or remember a skill.
+
+Rules:
+  - test files     → delete immediately at task end (age >= 0)
+  - temp files     → delete after 7 days
+  - cron-output    → delete after 14 days
+  - empty dirs     → always delete (under HERMES_HOME)
+  - research       → older than 30 days: keep the 10 newest, prompt for the rest (deep only)
+  - chrome-profile → prompt after 14 days (deep only)
+  - >500 MB files  → prompt always (deep only)
+
+Scope: strictly HERMES_HOME and /tmp/hermes-*
+Never touches: ~/.hermes/logs/ or any system directory.
+"""
+
+from __future__ import annotations
+
+import json
+import logging
+import shutil
+from datetime import datetime, timezone
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple
+
+try:
+    from hermes_constants import get_hermes_home
+except Exception:  # pragma: no cover - fallback when hermes_constants is not importable
+    import os
+
+    def get_hermes_home() -> Path:  # type: ignore[no-redef]
+        val = (os.environ.get("HERMES_HOME") or "").strip()
+        return Path(val).resolve() if val else (Path.home() / ".hermes").resolve()
+
+
+logger = logging.getLogger(__name__)
+
+
+# ---------------------------------------------------------------------------
+# Paths
+# ---------------------------------------------------------------------------
+
+def get_state_dir() -> Path:
+    """State dir — separate from ``$HERMES_HOME/logs/``."""
+    return get_hermes_home() / "disk-cleanup"
+
+
+def get_tracked_file() -> Path:
+    return get_state_dir() / "tracked.json"
+
+
+def get_log_file() -> Path:
+    """Audit log — intentionally NOT under ``$HERMES_HOME/logs/``."""
+    return get_state_dir() / "cleanup.log"
+
+
+# ---------------------------------------------------------------------------
+# Path safety
+# ---------------------------------------------------------------------------
+
+def is_safe_path(path: Path) -> bool:
+    """Accept only paths under HERMES_HOME or ``/tmp/hermes-*``.
+
+    Rejects Windows mounts (``/mnt/c`` etc.) and any system directory.
+    """
+    hermes_home = get_hermes_home()
+    try:
+        path.resolve().relative_to(hermes_home)
+        return True
+    except (ValueError, OSError):
+        pass
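+    # Allow /tmp/hermes-* explicitly.  This branch inspects the unresolved
+    # path, so require an absolute path and reject ".." components;
+    # otherwise "/tmp/hermes-x/../../etc" would pass the prefix check.
+    parts = path.parts
+    if ".." in parts:
+        return False
+    if len(parts) >= 3 and parts[:2] == ("/", "tmp") and parts[2].startswith("hermes-"):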
+        return True
+    return False
+
+
+# ---------------------------------------------------------------------------
+# Audit log
+# ---------------------------------------------------------------------------
+
+def _log(message: str) -> None:
+    try:
+        log_file = get_log_file()
+        log_file.parent.mkdir(parents=True, exist_ok=True)
+        ts = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
+        with open(log_file, "a") as f:
+            f.write(f"[{ts}] {message}\n")
+    except OSError:
+        # Never let the audit log break the agent loop.
+        pass
+
+
+# ---------------------------------------------------------------------------
+# tracked.json — atomic read/write, backup scoped to tracked.json only
+# ---------------------------------------------------------------------------
+
+def load_tracked() -> List[Dict[str, Any]]:
+    """Load tracked.json.  Restores from ``.bak`` on corruption."""
+    tf = get_tracked_file()
+    tf.parent.mkdir(parents=True, exist_ok=True)
+
+    if not tf.exists():
+        return []
+
+    try:
+        return json.loads(tf.read_text())
+    except (json.JSONDecodeError, ValueError):
+        bak = tf.with_suffix(".json.bak")
+        if bak.exists():
+            try:
+                data = json.loads(bak.read_text())
+                _log("WARN: tracked.json corrupted — restored from .bak")
+                return data
+            except Exception:
+                pass
+        _log("WARN: tracked.json corrupted, no backup — starting fresh")
+        return []
+
+
+def save_tracked(tracked: List[Dict[str, Any]]) -> None:
+    """Atomic write: ``.tmp`` → backup old → rename."""
+    tf = get_tracked_file()
+    tf.parent.mkdir(parents=True, exist_ok=True)
+    tmp = tf.with_suffix(".json.tmp")
+    tmp.write_text(json.dumps(tracked, indent=2))
+    if tf.exists():
+        shutil.copy2(tf, tf.with_suffix(".json.bak"))
+    tmp.replace(tf)
+
+
+# ---------------------------------------------------------------------------
+# Categories
+# ---------------------------------------------------------------------------
+
+ALLOWED_CATEGORIES = {
+    "temp", "test", "research", "download",
+    "chrome-profile", "cron-output", "other",
+}
+
+
+def fmt_size(n: float) -> str:
+    for unit in ("B", "KB", "MB", "GB", "TB"):
+        if n < 1024:
+            return f"{n:.1f} {unit}"
+        n /= 1024
+    return f"{n:.1f} PB"
+
+
+# ---------------------------------------------------------------------------
+# Track / forget
+# ---------------------------------------------------------------------------
+
+def track(path_str: str, category: str, silent: bool = False) -> bool:
+    """Register a file for tracking. Returns True if newly tracked."""
+    if category not in ALLOWED_CATEGORIES:
+        _log(f"WARN: unknown category '{category}', using 'other'")
+        category = "other"
+
+    path = Path(path_str).expanduser().resolve()
+
+    if not path.exists():
+        _log(f"SKIP: {path} (does not exist)")
+        return False
+
+    if not is_safe_path(path):
+        _log(f"REJECT: {path} (outside HERMES_HOME and /tmp/hermes-*)")
+        return False
+
+    size = path.stat().st_size if path.is_file() else 0
+    tracked = load_tracked()
+
+    # Deduplicate
+    if any(item["path"] == str(path) for item in tracked):
+        return False
+
+    tracked.append({
+        "path": str(path),
+        "timestamp": datetime.now(timezone.utc).isoformat(),
+        "category": category,
+        "size": size,
+    })
+    save_tracked(tracked)
+    _log(f"TRACKED: {path} ({category}, {fmt_size(size)})")
+    if not silent:
+        print(f"Tracked: {path} ({category}, {fmt_size(size)})")
+    return True
+
+
+def forget(path_str: str) -> int:
+    """Remove a path from tracking without deleting the file."""
+    p = Path(path_str).expanduser().resolve()
+    tracked = load_tracked()
+    before = len(tracked)
+    tracked = [i for i in tracked if Path(i["path"]).resolve() != p]
+    removed = before - len(tracked)
+    if removed:
+        save_tracked(tracked)
+        _log(f"FORGOT: {p} ({removed} entries)")
+    return removed
+
+
+# ---------------------------------------------------------------------------
+# Dry run
+# ---------------------------------------------------------------------------
+
+def dry_run() -> Tuple[List[Dict], List[Dict]]:
+    """Return (auto_delete_list, needs_prompt_list) without touching files."""
+    tracked = load_tracked()
+    now = datetime.now(timezone.utc)
+
+    auto: List[Dict] = []
+    prompt: List[Dict] = []
+
+    for item in tracked:
+        p = Path(item["path"])
+        if not p.exists():
+            continue
+        age = (now - datetime.fromisoformat(item["timestamp"])).days
+        cat = item["category"]
+        size = item["size"]
+
+        if cat == "test":
+            auto.append(item)
+        elif cat == "temp" and age > 7:
+            auto.append(item)
+        elif cat == "cron-output" and age > 14:
+            auto.append(item)
+        elif cat == "research" and age > 30:
+            prompt.append(item)
+        elif cat == "chrome-profile" and age > 14:
+            prompt.append(item)
+        elif size > 500 * 1024 * 1024:
+            prompt.append(item)
+
+    return auto, prompt
+
+
+# ---------------------------------------------------------------------------
+# Quick cleanup
+# ---------------------------------------------------------------------------
+
+def quick() -> Dict[str, Any]:
+    """Safe deterministic cleanup — no prompts.
+
+    Returns: ``{"deleted": N, "empty_dirs": N, "freed": bytes,
+               "errors": [str, ...]}``.
+    """
+    tracked = load_tracked()
+    now = datetime.now(timezone.utc)
+    deleted = 0
+    freed = 0
+    new_tracked: List[Dict] = []
+    errors: List[str] = []
+
+    for item in tracked:
+        p = Path(item["path"])
+        cat = item["category"]
+
+        if not p.exists():
+            _log(f"STALE: {p} (removed from tracking)")
+            continue
+
+        age = (now - datetime.fromisoformat(item["timestamp"])).days
+
+        should_delete = (
+            cat == "test"
+            or (cat == "temp" and age > 7)
+            or (cat == "cron-output" and age > 14)
+        )
+
+        if should_delete:
+            try:
+                if p.is_file():
+                    p.unlink()
+                elif p.is_dir():
+                    shutil.rmtree(p)
+                freed += item["size"]
+                deleted += 1
+                _log(f"DELETED: {p} ({cat}, {fmt_size(item['size'])})")
+            except OSError as e:
+                _log(f"ERROR deleting {p}: {e}")
+                errors.append(f"{p}: {e}")
+                new_tracked.append(item)
+        else:
+            new_tracked.append(item)
+
+    # Remove empty dirs under HERMES_HOME (but leave HERMES_HOME itself and
+    # a short list of well-known top-level state dirs alone — a fresh install
+    # has these empty, and deleting them would surprise the user).
+    hermes_home = get_hermes_home()
+    _PROTECTED_TOP_LEVEL = {
+        "logs", "memories", "sessions", "cron", "cronjobs",
+        "cache", "skills", "plugins", "disk-cleanup", "optional-skills",
+        "hermes-agent", "backups", "profiles", ".worktrees",
+    }
+    empty_removed = 0
+    try:
+        for dirpath in sorted(hermes_home.rglob("*"), reverse=True):
+            if not dirpath.is_dir() or dirpath == hermes_home:
+                continue
+            try:
+                rel_parts = dirpath.relative_to(hermes_home).parts
+            except ValueError:
+                continue
+            # Skip the well-known top-level state dirs themselves.
+            if len(rel_parts) == 1 and rel_parts[0] in _PROTECTED_TOP_LEVEL:
+                continue
+            try:
+                if not any(dirpath.iterdir()):
+                    dirpath.rmdir()
+                    empty_removed += 1
+                    _log(f"DELETED: {dirpath} (empty dir)")
+            except OSError:
+                pass
+    except OSError:
+        pass
+
+    save_tracked(new_tracked)
+    _log(
+        f"QUICK_SUMMARY: {deleted} files, {empty_removed} dirs, "
+        f"{fmt_size(freed)}"
+    )
+    return {
+        "deleted": deleted,
+        "empty_dirs": empty_removed,
+        "freed": freed,
+        "errors": errors,
+    }
+
+
+# ---------------------------------------------------------------------------
+# Deep cleanup (interactive — not called from plugin hooks)
+# ---------------------------------------------------------------------------
+
+def deep(
+    confirm: Optional[callable] = None,
+) -> Dict[str, Any]:
+    """Deep cleanup.
+
+    Runs :func:`quick` first, then asks the *confirm* callable for each
+    risky item (research older than 30 days beyond the 10 newest,
+    chrome-profile older than 14 days, any file > 500 MB).
+    *confirm(item)* must return True to delete.
+
+    Returns: ``{"quick": {...}, "deep_deleted": N, "deep_freed": bytes}``.
+    """
+    quick_result = quick()
+
+    if confirm is None:
+        # No interactive confirmer — deep stops after the quick pass.
+        return {"quick": quick_result, "deep_deleted": 0, "deep_freed": 0}
+
+    tracked = load_tracked()
+    now = datetime.now(timezone.utc)
+    research, chrome, large = [], [], []
+
+    for item in tracked:
+        p = Path(item["path"])
+        if not p.exists():
+            continue
+        age = (now - datetime.fromisoformat(item["timestamp"])).days
+        cat = item["category"]
+
+        if cat == "research" and age > 30:
+            research.append(item)
+        elif cat == "chrome-profile" and age > 14:
+            chrome.append(item)
+        elif item["size"] > 500 * 1024 * 1024:
+            large.append(item)
+
+    research.sort(key=lambda x: x["timestamp"], reverse=True)
+    old_research = research[10:]
+
+    freed, count = 0, 0
+    to_remove: List[Dict] = []
+
+    for group in (old_research, chrome, large):
+        for item in group:
+            if confirm(item):
+                try:
+                    p = Path(item["path"])
+                    if p.is_file():
+                        p.unlink()
+                    elif p.is_dir():
+                        shutil.rmtree(p)
+                    to_remove.append(item)
+                    freed += item["size"]
+                    count += 1
+                    _log(
+                        f"DELETED: {p} ({item['category']}, "
+                        f"{fmt_size(item['size'])})"
+                    )
+                except OSError as e:
+                    _log(f"ERROR deleting {item['path']}: {e}")
+
+    if to_remove:
+        remove_paths = {i["path"] for i in to_remove}
+        save_tracked([i for i in tracked if i["path"] not in remove_paths])
+
+    return {"quick": quick_result, "deep_deleted": count, "deep_freed": freed}
+
+
+# ---------------------------------------------------------------------------
+# Status
+# ---------------------------------------------------------------------------
+
+def status() -> Dict[str, Any]:
+    """Return per-category breakdown and top 10 largest tracked files."""
+    tracked = load_tracked()
+    cats: Dict[str, Dict] = {}
+    for item in tracked:
+        c = item["category"]
+        cats.setdefault(c, {"count": 0, "size": 0})
+        cats[c]["count"] += 1
+        cats[c]["size"] += item["size"]
+
+    existing = [
+        (i["path"], i["size"], i["category"])
+        for i in tracked if Path(i["path"]).exists()
+    ]
+    existing.sort(key=lambda x: x[1], reverse=True)
+
+    return {
+        "categories": cats,
+        "top10": existing[:10],
+        "total_tracked": len(tracked),
+    }
+
+
+def format_status(s: Dict[str, Any]) -> str:
+    """Human-readable status string (for slash command output)."""
+    lines = [f"{'Category':<20} {'Files':>6}  {'Size':>10}", "-" * 40]
+    cats = s["categories"]
+    for cat, d in sorted(cats.items(), key=lambda x: x[1]["size"], reverse=True):
+        lines.append(f"{cat:<20} {d['count']:>6}  {fmt_size(d['size']):>10}")
+
+    if not cats:
+        lines.append("(nothing tracked yet)")
+
+    lines.append("")
+    lines.append("Top 10 largest tracked files:")
+    if not s["top10"]:
+        lines.append("  (none)")
+    else:
+        for rank, (path, size, cat) in enumerate(s["top10"], 1):
+            lines.append(f"  {rank:>2}. {fmt_size(size):>8}  [{cat}]  {path}")
+    return "\n".join(lines)
+
+
+# ---------------------------------------------------------------------------
+# Auto-categorisation from tool-call inspection
+# ---------------------------------------------------------------------------
+
+_TEST_PATTERNS = ("test_", "tmp_")
+_TEST_SUFFIXES = (".test.py", ".test.js", ".test.ts", ".test.md")
+
+
+def guess_category(path: Path) -> Optional[str]:
+    """Return a category label for *path*, or None if we shouldn't track it.
+
+    Used by the ``post_tool_call`` hook to auto-track ephemeral files.
+    """
+    if not is_safe_path(path):
+        return None
+
+    # Skip the state dir itself, logs, memory files, sessions, config.
+    hermes_home = get_hermes_home()
+    try:
+        rel = path.resolve().relative_to(hermes_home)
+        top = rel.parts[0] if rel.parts else ""
+        if top in {
+            "disk-cleanup", "logs", "memories", "sessions", "config.yaml",
+            "skills", "plugins", ".env", "USER.md", "MEMORY.md", "SOUL.md",
+            "auth.json", "hermes-agent",
+        }:
+            return None
+        if top == "cron" or top == "cronjobs":
+            return "cron-output"
+        if top == "cache":
+            return "temp"
+    except ValueError:
+        # Path isn't under HERMES_HOME (e.g. /tmp/hermes-*) — fall through.
+        pass
+
+    name = path.name
+    if name.startswith(_TEST_PATTERNS):
+        return "test"
+    if any(name.endswith(sfx) for sfx in _TEST_SUFFIXES):
+        return "test"
+    return None
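The auto-delete and prompt rules that `dry_run` applies can be written as a pure function over tracked entries, which makes the day boundaries easy to test. A sketch under these assumptions: entries have the `tracked.json` shape used above (`category`, ISO-8601 `timestamp`, `size`), `now` is injectable, and both the on-disk existence check and the research "10 newest" rule are omitted; `classify` and `LARGE_FILE_BYTES` are hypothetical names:

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional, Tuple

LARGE_FILE_BYTES = 500 * 1024 * 1024  # the 500 MB prompt threshold


def classify(
    items: List[Dict[str, Any]], now: Optional[datetime] = None
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Split tracked entries into (auto_delete, needs_prompt), dry_run-style."""
    if now is None:
        now = datetime.now(timezone.utc)
    auto: List[Dict[str, Any]] = []
    prompt: List[Dict[str, Any]] = []
    for item in items:
        # Whole days since tracking (timedelta.days rounds down).
        age = (now - datetime.fromisoformat(item["timestamp"])).days
        cat = item["category"]
        if (
            cat == "test"
            or (cat == "temp" and age > 7)
            or (cat == "cron-output" and age > 14)
        ):
            auto.append(item)
        elif (
            (cat == "research" and age > 30)
            or (cat == "chrome-profile" and age > 14)
            or item["size"] > LARGE_FILE_BYTES
        ):
            prompt.append(item)
    return auto, prompt
```

Because the comparisons are strict (`age > 7`) and `.days` rounds down, a temp file 7 days and 23 hours old is still kept; it becomes eligible on its eighth full day.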
@@ -0,0 +1,7 @@
+name: disk-cleanup
+version: 2.0.0
+description: "Auto-track and clean up ephemeral files (test scripts, temp outputs, cron logs) created during Hermes sessions. Runs via plugin hooks — no agent action required."
+author: "@LVT382009 (original), NousResearch (plugin port)"
+hooks:
+  - post_tool_call
+  - on_session_end
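All three plugin files rely on one scope rule: a path is eligible only if it lies under HERMES_HOME or inside a `/tmp/hermes-*` directory. A lexical sketch of that rule, assuming POSIX paths; `in_cleanup_scope` is hypothetical. It collapses `..` before comparing but, unlike `track()`, does not follow symlinks, so it works as a cheap pre-filter rather than a substitute for `Path.resolve()`:

```python
import posixpath
from pathlib import PurePosixPath


def in_cleanup_scope(path: str, hermes_home: str) -> bool:
    """Return True if *path* lies under *hermes_home* or under /tmp/hermes-*.

    Purely lexical: ".." segments are collapsed first; symlinks are not followed.
    """
    norm = PurePosixPath(posixpath.normpath(path))
    home = PurePosixPath(posixpath.normpath(hermes_home))
    # HERMES_HOME itself and anything below it.
    if norm == home or home in norm.parents:
        return True
    # /tmp/hermes-<anything>; requiring an absolute path rules out "x/tmp/hermes-y".
    parts = norm.parts
    return (
        len(parts) >= 3
        and parts[0] == "/"
        and parts[1] == "tmp"
        and parts[2].startswith("hermes-")
    )
```

Keeping the check lexical avoids one surprise of resolving first: on macOS, `/tmp` is a symlink to `/private/tmp`, so a resolved path no longer starts with `/tmp`.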
@@ -630,7 +630,7 @@ function Copy-ConfigTemplates {
    New-Item -ItemType Directory -Force -Path "$HermesHome\audio_cache" | Out-Null
    New-Item -ItemType Directory -Force -Path "$HermesHome\memories" | Out-Null
    New-Item -ItemType Directory -Force -Path "$HermesHome\skills" | Out-Null
-    New-Item -ItemType Directory -Force -Path "$HermesHome\whatsapp\session" | Out-Null
+
    
    # Create .env
    $envPath = "$HermesHome\.env"
@@ -735,19 +735,7 @@ function Install-NodeDeps {
        Pop-Location
    }

-    # Install WhatsApp bridge dependencies
-    $bridgeDir = "$InstallDir\scripts\whatsapp-bridge"
-    if (Test-Path "$bridgeDir\package.json") {
-        Write-Info "Installing WhatsApp bridge dependencies..."
-        Push-Location $bridgeDir
-        try {
-            npm install --silent 2>&1 | Out-Null
-            Write-Success "WhatsApp bridge dependencies installed"
-        } catch {
-            Write-Warn "WhatsApp bridge npm install failed (WhatsApp may not work)"
-        }
-        Pop-Location
-    }
+
    
    Pop-Location
 }
@@ -297,7 +297,7 @@ check_python() {
        if command -v python >/dev/null 2>&1; then
            PYTHON_PATH="$(command -v python)"
            if "$PYTHON_PATH" -c 'import sys; raise SystemExit(0 if sys.version_info >= (3, 11) else 1)' 2>/dev/null; then
-                PYTHON_FOUND_VERSION=$($PYTHON_PATH --version 2>/dev/null)
+                PYTHON_FOUND_VERSION="$("$PYTHON_PATH" --version 2>/dev/null)"
                log_success "Python found: $PYTHON_FOUND_VERSION"
                return 0
            fi
@@ -306,7 +306,7 @@ check_python() {
        log_info "Installing Python via pkg..."
        pkg install -y python >/dev/null
        PYTHON_PATH="$(command -v python)"
-        PYTHON_FOUND_VERSION=$($PYTHON_PATH --version 2>/dev/null)
+        PYTHON_FOUND_VERSION="$("$PYTHON_PATH" --version 2>/dev/null)"
        log_success "Python installed: $PYTHON_FOUND_VERSION"
        return 0
    fi
@@ -315,18 +315,17 @@ check_python() {

    # Let uv handle Python — it can download and manage Python versions
    # First check if a suitable Python is already available
-    if $UV_CMD python find "$PYTHON_VERSION" &> /dev/null; then
-        PYTHON_PATH=$($UV_CMD python find "$PYTHON_VERSION")
-        PYTHON_FOUND_VERSION=$($PYTHON_PATH --version 2>/dev/null)
+    if PYTHON_PATH="$("$UV_CMD" python find "$PYTHON_VERSION" 2>/dev/null)"; then
+        PYTHON_FOUND_VERSION="$("$PYTHON_PATH" --version 2>/dev/null)"
        log_success "Python found: $PYTHON_FOUND_VERSION"
        return 0
    fi

    # Python not found — use uv to install it (no sudo needed!)
    log_info "Python $PYTHON_VERSION not found, installing via uv..."
-    if $UV_CMD python install "$PYTHON_VERSION"; then
-        PYTHON_PATH=$($UV_CMD python find "$PYTHON_VERSION")
-        PYTHON_FOUND_VERSION=$($PYTHON_PATH --version 2>/dev/null)
+    if "$UV_CMD" python install "$PYTHON_VERSION"; then
+        PYTHON_PATH="$("$UV_CMD" python find "$PYTHON_VERSION")"
+        PYTHON_FOUND_VERSION="$("$PYTHON_PATH" --version 2>/dev/null)"
        log_success "Python installed: $PYTHON_FOUND_VERSION"
    else
        log_error "Failed to install Python $PYTHON_VERSION"
@@ -1052,7 +1051,7 @@ copy_config_templates() {
    log_info "Setting up configuration files..."

    # Create ~/.hermes directory structure (config at top level, code in subdir)
-    mkdir -p "$HERMES_HOME"/{cron,sessions,logs,pairing,hooks,image_cache,audio_cache,memories,skills,whatsapp/session}
+    mkdir -p "$HERMES_HOME"/{cron,sessions,logs,pairing,hooks,image_cache,audio_cache,memories,skills}

    # Create .env at ~/.hermes/.env (top level, easy to find)
    if [ ! -f "$HERMES_HOME/.env" ]; then
@@ -1122,7 +1121,7 @@ install_node_deps() {

    if [ "$DISTRO" = "termux" ]; then
        log_info "Skipping automatic Node/browser dependency setup on Termux"
-        log_info "Browser automation and WhatsApp bridge are not part of the tested Termux install path yet."
+        log_info "Browser automation is not part of the tested Termux install path yet."
        log_info "If you want to experiment manually later, run: cd $INSTALL_DIR && npm install"
        return 0
    fi
@@ -1204,15 +1203,7 @@ install_node_deps() {
        log_success "TUI dependencies installed"
    fi

-    # Install WhatsApp bridge dependencies
-    if [ -f "$INSTALL_DIR/scripts/whatsapp-bridge/package.json" ]; then
-        log_info "Installing WhatsApp bridge dependencies..."
-        cd "$INSTALL_DIR/scripts/whatsapp-bridge"
-        npm install --silent 2>/dev/null || {
-            log_warn "WhatsApp bridge npm install failed (WhatsApp may not work)"
-        }
-        log_success "WhatsApp bridge dependencies installed"
-    fi
+
 }

 run_setup_wizard() {
@@ -66,7 +66,10 @@ AUTHOR_MAP = {
    "104278804+Sertug17@users.noreply.github.com": "Sertug17",
    "112503481+caentzminger@users.noreply.github.com": "caentzminger",
    "258577966+voidborne-d@users.noreply.github.com": "voidborne-d",
+    "sir_even@icloud.com": "sirEven",
+    "36056348+sirEven@users.noreply.github.com": "sirEven",
    "70424851+insecurejezza@users.noreply.github.com": "insecurejezza",
+    "254021826+dodo-reach@users.noreply.github.com": "dodo-reach",
    "259807879+Bartok9@users.noreply.github.com": "Bartok9",
    "241404605+MestreY0d4-Uninter@users.noreply.github.com": "MestreY0d4-Uninter",
    "268667990+Roy-oss1@users.noreply.github.com": "Roy-oss1",
@@ -76,7 +79,18 @@ AUTHOR_MAP = {
    "39405770+yyq4193@users.noreply.github.com": "yyq4193",
    "Asunfly@users.noreply.github.com": "Asunfly",
    "2500400+honghua@users.noreply.github.com": "honghua",
+    "462836+jplew@users.noreply.github.com": "jplew",
    "nish3451@users.noreply.github.com": "nish3451",
+    "Mibayy@users.noreply.github.com": "Mibayy",
+    "mibayy@users.noreply.github.com": "Mibayy",
+    "135070653+sgaofen@users.noreply.github.com": "sgaofen",
+    "nocoo@users.noreply.github.com": "nocoo",
+    "30841158+n-WN@users.noreply.github.com": "n-WN",
+    "leoyuan0099@gmail.com": "keyuyuan",
+    "bxzt2006@163.com": "Only-Code-A",
+    "i@troy-y.org": "TroyMitchell911",
+    "mygamez@163.com": "zhongyueming1121",
+    "hansnow@users.noreply.github.com": "hansnow",
    # contributors (manual mapping from git names)
    "ahmedsherif95@gmail.com": "asheriif",
    "liujinkun@bytedance.com": "liujinkun2025",
@@ -94,14 +108,22 @@ AUTHOR_MAP = {
    "xiewenxuan462@gmail.com": "yule975",
    "yiweimeng.dlut@hotmail.com": "meng93",
    "hakanerten02@hotmail.com": "teyrebaz33",
+    "linux2010@users.noreply.github.com": "Linux2010",
+    "elmatadorgh@users.noreply.github.com": "elmatadorgh",
+    "alexazzjjtt@163.com": "alexzhu0",
+    "1180176+Swift42@users.noreply.github.com": "Swift42",
    "ruzzgarcn@gmail.com": "Ruzzgar",
    "alireza78.crypto@gmail.com": "alireza78a",
    "brooklyn.bb.nicholson@gmail.com": "brooklynnicholson",
+    "withapurpose37@gmail.com": "StefanIsMe",
    "4317663+helix4u@users.noreply.github.com": "helix4u",
    "331214+counterposition@users.noreply.github.com": "counterposition",
    "blspear@gmail.com": "BrennerSpear",
    "akhater@gmail.com": "akhater",
    "239876380+handsdiff@users.noreply.github.com": "handsdiff",
+    "hesapacicam112@gmail.com": "etherman-os",
+    "mark.ramsell@rivermounts.com": "mark-ramsell",
+    "taeng02@icloud.com": "taeng0204",
    "gpickett00@gmail.com": "gpickett00",
    "mcosma@gmail.com": "wakamex",
    "clawdia.nash@proton.me": "clawdia-nash",
@@ -112,6 +134,7 @@ AUTHOR_MAP = {
    "noonou7@gmail.com": "HenkDz",
    "dean.kerr@gmail.com": "deankerr",
    "socrates1024@gmail.com": "socrates1024",
+    "seanalt555@gmail.com": "Salt-555",
    "satelerd@gmail.com": "satelerd",
    "numman.ali@gmail.com": "nummanali",
    "0xNyk@users.noreply.github.com": "0xNyk",
@@ -123,12 +146,14 @@ AUTHOR_MAP = {
    "aryan@synvoid.com": "aryansingh",
    "johnsonblake1@gmail.com": "blakejohnson",
    "hcn518@gmail.com": "pedh",
+    "haileymarshall005@gmail.com": "haileymarshall",
    "greer.guthrie@gmail.com": "g-guthrie",
    "kennyx102@gmail.com": "bobashopcashier",
    "shokatalishaikh95@gmail.com": "areu01or00",
    "bryan@intertwinesys.com": "bryanyoung",
    "christo.mitov@gmail.com": "christomitov",
    "hermes@nousresearch.com": "NousResearch",
+    "hermes@noushq.ai": "benbarclay",
    "chinmingcock@gmail.com": "ChimingLiu",
    "openclaw@sparklab.ai": "openclaw",
    "semihcvlk53@gmail.com": "Himess",
@@ -143,7 +168,7 @@ AUTHOR_MAP = {
    "jack.47@gmail.com": "JackTheGit",
    "dalvidjr2022@gmail.com": "Jr-kenny",
    "m@statecraft.systems": "mbierling",
-    "balyan.sid@gmail.com": "balyansid",
+    "balyan.sid@gmail.com": "alt-glitch",
    "oluwadareab12@gmail.com": "bennytimz",
    "simon@simonmarcus.org": "simon-marcus",
    "xowiekk@gmail.com": "Xowiek",
@@ -153,6 +178,10 @@ AUTHOR_MAP = {
    "1115117931@qq.com": "aaronagent",
    "1506751656@qq.com": "hqhq1025",
    "364939526@qq.com": "luyao618",
+    "hgk324@gmail.com": "houziershi",
+    "176644217+PStarH@users.noreply.github.com": "PStarH",
+    "51058514+Sanjays2402@users.noreply.github.com": "Sanjays2402",
+    "906014227@qq.com": "bingo906",
    "aaronwong1999@icloud.com": "AaronWong1999",
    "agents@kylefrench.dev": "DeployFaith",
    "angelos@oikos.lan.home.malaiwah.com": "angelos",
@@ -175,6 +204,7 @@ AUTHOR_MAP = {
    "duerzy@gmail.com": "duerzy",
    "emozilla@nousresearch.com": "emozilla",
    "fancydirty@gmail.com": "fancydirty",
+    "farion1231@gmail.com": "farion1231",
    "floptopbot33@gmail.com": "flobo3",
    "fontana.pedro93@gmail.com": "pefontana",
    "francis.x.fitzpatrick@gmail.com": "fxfitz",
@@ -193,6 +223,7 @@ AUTHOR_MAP = {
    "kagura.chen28@gmail.com": "kagura-agent",
    "1342088860@qq.com": "youngDoo",
    "kamil@gwozdz.me": "kamil-gwozdz",
+    "skmishra1991@gmail.com": "bugkill3r",
    "karamusti912@gmail.com": "MustafaKara7",
    "kira@ariaki.me": "kira-ariaki",
    "knopki@duck.com": "knopki",
@@ -203,6 +234,7 @@ AUTHOR_MAP = {
    "82095453+iacker@users.noreply.github.com": "iacker",
    "sontianye@users.noreply.github.com": "sontianye",
    "jackjin1997@users.noreply.github.com": "jackjin1997",
+    "1037461232@qq.com": "jackjin1997",
    "danieldoderlein@users.noreply.github.com": "danieldoderlein",
    "lrawnsley@users.noreply.github.com": "lrawnsley",
    "taeuk178@users.noreply.github.com": "taeuk178",
@@ -211,6 +243,7 @@ AUTHOR_MAP = {
    "ygd58@users.noreply.github.com": "ygd58",
    "vominh1919@users.noreply.github.com": "vominh1919",
    "iamagenius00@users.noreply.github.com": "iamagenius00",
+    "9219265+cresslank@users.noreply.github.com": "cresslank",
    "trevmanthony@gmail.com": "trevthefoolish",
    "ziliangpeng@users.noreply.github.com": "ziliangpeng",
    "centripetal-star@users.noreply.github.com": "centripetal-star",
@@ -268,9 +301,15 @@ AUTHOR_MAP = {
    "asurla@nvidia.com": "anniesurla",
    "limkuan24@gmail.com": "WideLee",
    "aviralarora002@gmail.com": "AviArora02-commits",
+    "draixagent@gmail.com": "draix",
    "junminliu@gmail.com": "JimLiu",
    "jarvischer@gmail.com": "maxchernin",
    "levantam.98.2324@gmail.com": "LVT382009",
+    "zhurongcheng@rcrai.com": "heykb",
+    "withapurpose37@gmail.com": "StefanIsMe",
+    "261797239+lumenradley@users.noreply.github.com": "lumenradley",
+    "166376523+sjz-ks@users.noreply.github.com": "sjz-ks",
+    "haileymarshall005@gmail.com": "haileymarshall",
 }
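A Python dict literal quietly keeps only the last value for a repeated key. That means an entry added twice goes unnoticed at runtime. In the hunks above, `withapurpose37@gmail.com` and `haileymarshall005@gmail.com` are each added twice. The check below is a minimal lint sketch. It assumes `AUTHOR_MAP` is a module-level dict literal with string keys. `duplicate_dict_keys()` is a hypothetical helper and is not part of the repo.

```python
import ast
from collections import Counter
from typing import List


def duplicate_dict_keys(source: str, name: str = "AUTHOR_MAP") -> List[str]:
    """Return string keys that appear more than once in the dict literal bound to `name`.

    Handles `NAME = {...}` and `NAME: dict[str, str] = {...}`; any other way of
    building the mapping is out of scope for this sketch.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            targets = node.targets
        elif isinstance(node, ast.AnnAssign):
            targets = [node.target]
        else:
            continue
        if not any(isinstance(t, ast.Name) and t.id == name for t in targets):
            continue
        if not isinstance(node.value, ast.Dict):
            continue
        # `**spread` entries show up with key None; skip those and non-string keys.
        keys = [
            k.value
            for k in node.value.keys
            if isinstance(k, ast.Constant) and isinstance(k.value, str)
        ]
        return sorted(k for k, n in Counter(keys).items() if n > 1)
    return []
```

As a usage example, you could run `duplicate_dict_keys(Path(...).read_text())` on the script that defines `AUTHOR_MAP` at this revision. It should report at least the two addresses named above. Comment lines such as `# contributors (manual mapping from git names)` are ignored by the parser, so they need no special handling.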


@@ -338,7 +338,6 @@ Edit with `hermes config edit` or `hermes config set section.key value`.
 | `memory` | `memory_enabled`, `user_profile_enabled`, `provider` |
 | `security` | `tirith_enabled`, `website_blocklist` |
 | `delegation` | `model`, `provider`, `base_url`, `api_key`, `max_iterations` (50), `reasoning_effort` |
-| `smart_model_routing` | `enabled`, `cheap_model` |
 | `checkpoints` | `enabled`, `max_snapshots` (50) |

 Full config reference: https://hermes-agent.nousresearch.com/docs/user-guide/configuration
@@ -0,0 +1,54 @@
+# Attribution
+
+This skill bundles code ported from a third-party MIT-licensed project.
+All reuse is credited here.
+
+## pixel-art-studio (Synero)
+
+- Source: https://github.com/Synero/pixel-art-studio
+- License: MIT
+- Copyright: © Synero, MIT-licensed contributors
+
+### What was ported
+
+**`scripts/palettes.py`** — the `PALETTES` dict containing 23 named RGB
+palettes (hardware and artistic). Values are reproduced verbatim from
+`scripts/pixelart.py` of pixel-art-studio.
+
+**`scripts/pixel_art_video.py`** — the 12 procedural animation init/draw pairs
+(`stars`, `fireflies`, `leaves`, `dust_motes`, `sparkles`, `rain`,
+`lightning`, `bubbles`, `embers`, `snowflakes`, `neon_pulse`, `heat_shimmer`)
+and the `SCENES` → layer mapping. Ported from `scripts/pixelart_video.py`
+with minor refactors:
+- Names prefixed with `_` for private helpers (`_px`, `_pixel_cross`)
+- `SCENE_ANIMATIONS` renamed to `SCENES` and restructured to hold layer
+  names (strings) instead of function-name strings resolved via `globals()`
+- `generate_video()` split: the Pollinations text-to-image call was removed
+  (Hermes uses its own `image_generate` + `pixel_art()` pipeline for base
+  frames). Only the overlay + ffmpeg encoding remains.
+- Frame directory is now a `tempfile.TemporaryDirectory` instead of
+  hand-managed cleanup.
+- `ffmpeg` invocation switched from `os.system` to `subprocess.run(check=True)`
+  for safety: a non-zero exit now raises `CalledProcessError`.
+
+### What was NOT ported
+
+- Wu's Color Quantization (PIL's built-in `quantize` suffices)
+- Sobel edge-aware downsampling (requires scipy; not worth the dep)
+- Bayer / Atkinson dither (would need numpy reimplementation; kept scope tight)
+- Pollinations text-to-image generation (`pixelart_image.py`,
+  `generate_base()` in `pixelart_video.py`) — Hermes has `image_generate`
+
+### License compatibility
+
+pixel-art-studio ships under the MIT License, which permits redistribution
+with attribution. This skill preserves the original copyright notice here
+and in the SKILL.md credits block. No code was relicensed.
+
+---
+
+## pixel-art skill itself
+
+- License: MIT (inherits from hermes-agent repo)
+- Original author of the skill shell: dodo-reach
+- Expansion with palettes + video: Hermes Agent contributors
@@ -0,0 +1 @@
+"""Transport layer types for provider response normalization."""