feat(sessions): add --sanitize flag to sessions export

Port from anomalyco/opencode#22489: redact user/model content from session exports before sharing for bug reports or training data.

Adds hermes_state.sanitize_session_export() which returns a deep-copied session with:
- Message content, reasoning, and reasoning_details replaced with [redacted:<kind>:<id>] tokens
- Tool-call arguments redacted (tool id, type, and function name preserved)
- Session title and system_prompt redacted
- All structural/metric fields preserved: ids, timestamps, token counts, tool names, finish reasons, model info, cost data, message counts

Wired into 'hermes sessions export --sanitize' (applies to both --session-id and full exports). The flag is opt-in; default behaviour is unchanged. User sees '(sanitized)' suffix on the export summary when the flag is active.

5 new tests covering content redaction, reasoning/tool-call redaction, empty-value preservation, input immutability, and reasoning_details block structure.

E2E verified: raw export still leaks sk-proj-* API keys and usernames, sanitized export replaces them with redaction tokens while preserving model names, tool names, and tool call ids.

Authored-by: Hermes Agent (autonomous weekly OpenCode PR scout)
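
As a rough illustration of the redaction behaviour described in this commit message, here is a minimal sketch. It is not the code from the commit: the function name `sanitize_session_sketch`, the session layout (`messages`, `tool_calls`, `function.arguments`), and the `kind` labels inside the tokens are all assumptions made for the example, and `reasoning_details` handling is left out.

```python
import copy

# Hypothetical field names; the real session schema may differ.
_REDACTED_MESSAGE_FIELDS = ("content", "reasoning")


def _token(kind, ident):
    """Build a placeholder in the [redacted:<kind>:<id>] shape from the commit message."""
    return f"[redacted:{kind}:{ident}]"


def sanitize_session_sketch(session):
    """Return a redacted deep copy; the input dict is never mutated."""
    out = copy.deepcopy(session)
    sid = out.get("id", "unknown")
    for field in ("title", "system_prompt"):
        if out.get(field):  # empty or None values are kept as they are
            out[field] = _token(field, sid)
    for msg in out.get("messages", []):
        mid = msg.get("id", "unknown")
        for field in _REDACTED_MESSAGE_FIELDS:
            if msg.get(field):
                msg[field] = _token(field, mid)
        for call in msg.get("tool_calls") or []:
            fn = call.get("function", {})
            if fn.get("arguments"):
                # Tool id, type, and function name stay; only arguments are replaced.
                fn["arguments"] = _token("tool_args", call.get("id", "unknown"))
    return out


raw = {
    "id": "s1",
    "title": "debugging my key",
    "model": "example-model",
    "messages": [
        {"id": "m1", "role": "user", "content": "my key is sk-proj-abc", "reasoning": None},
        {"id": "m2", "role": "assistant", "content": "",
         "tool_calls": [{"id": "call_1", "type": "function",
                         "function": {"name": "terminal", "arguments": "{\"cmd\": \"ls\"}"}}]},
    ],
}
clean = sanitize_session_sketch(raw)
```

Working on a deep copy is what gives the input-immutability property the tests mention: `raw` still holds the original text after the call, while `clean` carries only tokens plus the untouched structural fields.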
chore(release): map mbelleau@Michels-MacBook-Pro.local to @malaiwah
2026-04-16 17:11:11 -07:00 · 2026-04-16 16:50:15 -07:00 · 2026-04-16 16:50:15 -07:00 · 2026-04-16 16:49:22 -07:00 · 2026-04-16 16:49:00 -07:00 · 2026-04-16 16:48:14 -07:00
296 changed files with 25766 additions and 5106 deletions
@@ -24,6 +24,15 @@
 # Optional base URL override (default: Google's OpenAI-compatible endpoint)
 # GEMINI_BASE_URL=https://generativelanguage.googleapis.com/v1beta/openai

+# =============================================================================
+# LLM PROVIDER (Ollama Cloud)
+# =============================================================================
+# Cloud-hosted open models via Ollama's OpenAI-compatible endpoint.
+# Get your key at: https://ollama.com/settings
+# OLLAMA_API_KEY=your_ollama_key_here
+# Optional base URL override (default: https://ollama.com/v1)
+# OLLAMA_BASE_URL=https://ollama.com/v1
+
 # =============================================================================
 # LLM PROVIDER (z.ai / GLM)
 # =============================================================================
@@ -1,11 +1,12 @@
 name: Deploy Site

 on:
+  release:
+    types: [published]
  push:
    branches: [main]
    paths:
      - 'website/**'
-      - 'landingpage/**'
      - 'skills/**'
      - 'optional-skills/**'
      - '.github/workflows/deploy-site.yml'
@@ -20,8 +21,14 @@ concurrency:
  cancel-in-progress: false

 jobs:
-  build-and-deploy:
-    # Only run on the upstream repository, not on forks
+  deploy-vercel:
+    if: github.event_name == 'release'
+    runs-on: ubuntu-latest
+    steps:
+      - name: Trigger Vercel Deploy
+        run: curl -X POST "${{ secrets.VERCEL_DEPLOY_HOOK }}"
+
+  deploy-docs:
    if: github.repository == 'NousResearch/hermes-agent'
    runs-on: ubuntu-latest
    environment:
@@ -65,12 +72,7 @@ jobs:
      - name: Stage deployment
        run: |
          mkdir -p _site/docs
-          # Landing page at root
-          cp -r landingpage/* _site/
-          # Docusaurus at /docs/
          cp -r website/build/* _site/docs/
-          # CNAME so GitHub Pages keeps the custom domain between deploys
-          echo "hermes-agent.nousresearch.com" > _site/CNAME

      - name: Upload artifact
        uses: actions/upload-pages-artifact@56afc609e74202658d3ffba0e8f6dda462b719fa  # v3
@@ -105,3 +105,4 @@ tesseracttars-creator <tesseracttars@gmail.com> <tesseracttars@gmail.com>
 xinbenlv <zzn+pa@zzn.im> <zzn+pa@zzn.im>
 SaulJWu <saul.jj.wu@gmail.com> <saul.jj.wu@gmail.com>
 angelos <angelos@oikos.lan.home.malaiwah.com> <angelos@oikos.lan.home.malaiwah.com>
+MestreY0d4-Uninter <241404605+MestreY0d4-Uninter@users.noreply.github.com> <MestreY0d4-Uninter@users.noreply.github.com>
@@ -0,0 +1,27 @@
+# Hermes Agent v0.10.0 (v2026.4.16)
+
+**Release Date:** April 16, 2026
+
+> The Tool Gateway release — paid Nous Portal subscribers can now use web search, image generation, text-to-speech, and browser automation through their existing subscription with zero additional API keys.
+
+---
+
+## ✨ Highlights
+
+- **Nous Tool Gateway** — Paid [Nous Portal](https://portal.nousresearch.com) subscribers now get automatic access to **web search** (Firecrawl), **image generation** (FAL / FLUX 2 Pro), **text-to-speech** (OpenAI TTS), and **browser automation** (Browser Use) through their existing subscription. No separate API keys needed — just run `hermes model`, select Nous Portal, and pick which tools to enable. Per-tool opt-in via `use_gateway` config, full integration with `hermes tools` and `hermes status`, and the runtime correctly prefers the gateway even when direct API keys exist. Replaces the old hidden `HERMES_ENABLE_NOUS_MANAGED_TOOLS` env var with clean subscription-based detection. ([#11206](https://github.com/NousResearch/hermes-agent/pull/11206), based on work by @jquesnelle; docs: [#11208](https://github.com/NousResearch/hermes-agent/pull/11208))
+
+---
+
+## 🐛 Bug Fixes & Improvements
+
+This release includes 180+ commits with numerous bug fixes, platform improvements, and reliability enhancements across the agent core, gateway, CLI, and tool system. Full details will be published in the v0.11.0 changelog.
+
+---
+
+## 👥 Contributors
+
+- **@jquesnelle** (emozilla) — Original Tool Gateway implementation ([#10799](https://github.com/NousResearch/hermes-agent/pull/10799)), salvaged and shipped in this release
+
+---
+
+**Full Changelog**: [v2026.4.13...v2026.4.16](https://github.com/NousResearch/hermes-agent/compare/v2026.4.13...v2026.4.16)
@@ -0,0 +1,84 @@
+# Hermes Agent Security Policy
+
+This document outlines the security protocols, trust model, and deployment hardening guidelines for the **Hermes Agent** project.
+
+## 1. Vulnerability Reporting
+
+Hermes Agent does **not** operate a bug bounty program. Security issues should be reported via [GitHub Security Advisories (GHSA)](https://github.com/NousResearch/hermes-agent/security/advisories/new) or by emailing **security@nousresearch.com**. Do not open public issues for security vulnerabilities.
+
+### Required Submission Details
+- **Title & Severity:** Concise description and CVSS score/rating.
+- **Affected Component:** Exact file path and line range (e.g., `tools/approval.py:120-145`).
+- **Environment:** Output of `hermes version`, commit SHA, OS, and Python version.
+- **Reproduction:** Step-by-step Proof-of-Concept (PoC) against `main` or the latest release.
+- **Impact:** Explanation of what trust boundary was crossed.
+
+---
+
+## 2. Trust Model
+
+The core assumption is that Hermes is a **personal agent** with one trusted operator.
+
+### Operator & Session Trust
+- **Single Tenant:** The system protects the operator from LLM actions, not from malicious co-tenants. Multi-user isolation must happen at the OS/host level.
+- **Gateway Security:** Authorized callers (Telegram, Discord, Slack, etc.) receive equal trust. Session keys are used for routing, not as authorization boundaries.
+- **Execution:** Defaults to `terminal.backend: local` (direct host execution). Container isolation (Docker, Modal, Daytona) is opt-in for sandboxing.
+
+### Dangerous Command Approval
+The approval system (`tools/approval.py`) is a core security boundary. Terminal commands, file operations, and other potentially destructive actions are gated behind explicit user confirmation before execution. The approval mode is configurable via `approvals.mode` in `config.yaml`:
+- `"on"` (default) — prompts the user to approve dangerous commands.
+- `"auto"` — auto-approves after a configurable delay.
+- `"off"` — disables the gate entirely (break-glass; see Section 3).
+
+### Output Redaction
+`agent/redact.py` strips secret-like patterns (API keys, tokens, credentials) from all display output before it reaches the terminal or gateway platform. This prevents accidental credential leakage in chat logs, tool previews, and response text. Redaction operates on the display layer only — underlying values remain intact for internal agent operations.
+
+### Skills vs. MCP Servers
+- **Installed Skills:** High trust. Equivalent to local host code; skills can read environment variables and run arbitrary commands.
+- **MCP Servers:** Lower trust. MCP subprocesses receive a filtered environment (`_build_safe_env()` in `tools/mcp_tool.py`) — only safe baseline variables (`PATH`, `HOME`, `XDG_*`) plus variables explicitly declared in the server's `env` config block are passed through. Host credentials are stripped by default. Additionally, packages invoked via `npx`/`uvx` are checked against the OSV malware database before spawning.
+
+### Code Execution Sandbox
+The `execute_code` tool (`tools/code_execution_tool.py`) runs LLM-generated Python scripts in a child process with API keys and tokens stripped from the environment to prevent credential exfiltration. Only environment variables explicitly declared by loaded skills (via `env_passthrough`) or by the user in `config.yaml` (`terminal.env_passthrough`) are passed through. The child accesses Hermes tools via RPC, not direct API calls.
+
+### Subagents
+- **No recursive delegation:** The `delegate_task` tool is disabled for child agents.
+- **Depth limit:** `MAX_DEPTH = 2` — parent (depth 0) can spawn a child (depth 1); grandchildren are rejected.
+- **Memory isolation:** Subagents run with `skip_memory=True` and do not have access to the parent's persistent memory provider. The parent receives only the task prompt and final response as an observation.
+
+---
+
+## 3. Out of Scope (Non-Vulnerabilities)
+
+The following scenarios are **not** considered security breaches:
+- **Prompt Injection:** Unless it results in a concrete bypass of the approval system, toolset restrictions, or container sandbox.
+- **Public Exposure:** Deploying the gateway to the public internet without external authentication or network protection.
+- **Trusted State Access:** Reports that require pre-existing write access to `~/.hermes/`, `.env`, or `config.yaml` (these are operator-owned files).
+- **Default Behavior:** Host-level command execution when `terminal.backend` is set to `local` — this is the documented default, not a vulnerability.
+- **Configuration Trade-offs:** Intentional break-glass settings such as `approvals.mode: "off"` or `terminal.backend: local` in production.
+- **Tool-level read/access restrictions:** The agent has unrestricted shell access via the `terminal` tool by design. Reports that a specific tool (e.g., `read_file`) can access a resource are not vulnerabilities if the same access is available through `terminal`. Tool-level deny lists only constitute a meaningful security boundary when paired with equivalent restrictions on the terminal side (as with write operations, where `WRITE_DENIED_PATHS` is paired with the dangerous command approval system).
+
+---
+
+## 4. Deployment Hardening & Best Practices
+
+### Filesystem & Network
+- **Production sandboxing:** Use container backends (`docker`, `modal`, `daytona`) instead of `local` for untrusted workloads.
+- **File permissions:** Run as non-root (the Docker image uses UID 10000); protect credentials with `chmod 600 ~/.hermes/.env` on local installs.
+- **Network exposure:** Do not expose the gateway or API server to the public internet without VPN, Tailscale, or firewall protection. SSRF protection is enabled by default across all gateway platform adapters (Telegram, Discord, Slack, Matrix, Mattermost, etc.) with redirect validation. Note: the local terminal backend does not apply SSRF filtering, as it operates within the trusted operator's environment.
+
+### Skills & Supply Chain
+- **Skill installation:** Review Skills Guard reports (`tools/skills_guard.py`) before installing third-party skills. The audit log at `~/.hermes/skills/.hub/audit.log` tracks every install and removal.
+- **MCP safety:** OSV malware checking runs automatically for `npx`/`uvx` packages before MCP server processes are spawned.
+- **CI/CD:** GitHub Actions are pinned to full commit SHAs. The `supply-chain-audit.yml` workflow blocks PRs containing `.pth` files or suspicious `base64`+`exec` patterns.
+
+### Credential Storage
+- API keys and tokens belong exclusively in `~/.hermes/.env` — never in `config.yaml` or checked into version control.
+- The credential pool system (`agent/credential_pool.py`) handles key rotation and fallback. Credentials are resolved from environment variables, not stored in plaintext databases.
+
+---
+
+## 5. Disclosure Process
+
+- **Coordinated Disclosure:** 90-day window or until a fix is released, whichever comes first.
+- **Communication:** All updates occur via the GHSA thread or email correspondence with security@nousresearch.com.
+- **Credits:** Reporters are credited in release notes unless anonymity is requested.
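
The filtered MCP environment described under "Skills vs. MCP Servers" in the policy above can be pictured with a small sketch. This is an illustration only, not the body of `_build_safe_env()`: the function name `build_safe_env_sketch` is hypothetical, and the allowlist contains just the baseline names the policy text mentions (`PATH`, `HOME`, `XDG_*`), so the real list may well be longer.

```python
# Baseline names taken from the policy text; the real allowlist may be longer.
_SAFE_NAMES = {"PATH", "HOME"}
_SAFE_PREFIXES = ("XDG_",)


def build_safe_env_sketch(host_env, declared_env=None):
    """Keep baseline variables, then layer on the server's declared `env` block."""
    safe = {
        key: value
        for key, value in host_env.items()
        if key in _SAFE_NAMES or key.startswith(_SAFE_PREFIXES)
    }
    # Variables the operator explicitly declared for this server are passed through.
    safe.update(declared_env or {})
    return safe


host = {
    "PATH": "/usr/bin",
    "HOME": "/home/op",
    "XDG_CONFIG_HOME": "/home/op/.config",
    "OPENAI_API_KEY": "sk-not-for-mcp",
}
child_env = build_safe_env_sketch(host, {"MCP_TOKEN": "declared-in-config"})
```

An allowlist is the safer direction here: a new credential variable added to the host environment is excluded by default, whereas a deny list would have to be updated for it.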
@@ -28,19 +28,45 @@ except ImportError:
 logger = logging.getLogger(__name__)

 THINKING_BUDGET = {"xhigh": 32000, "high": 16000, "medium": 8000, "low": 4000}
+# Hermes effort → Anthropic adaptive-thinking effort (output_config.effort).
+# Anthropic exposes 5 levels on 4.7+: low, medium, high, xhigh, max.
+# Opus/Sonnet 4.6 only expose 4 levels: low, medium, high, max — no xhigh.
+# We preserve xhigh as xhigh on 4.7+ (the recommended default for coding/
+# agentic work) and downgrade it to max on pre-4.7 adaptive models (which
+# is the strongest level they accept).  "minimal" is a legacy alias that
+# maps to low on every model.  See:
+# https://platform.claude.com/docs/en/about-claude/models/migration-guide
 ADAPTIVE_EFFORT_MAP = {
-    "xhigh": "max",
-    "high": "high",
-    "medium": "medium",
-    "low": "low",
+    "max":     "max",
+    "xhigh":   "xhigh",
+    "high":    "high",
+    "medium":  "medium",
+    "low":     "low",
    "minimal": "low",
 }

+# Models that accept the "xhigh" output_config.effort level.  Opus 4.7 added
+# xhigh as a distinct level between high and max; older adaptive-thinking
+# models (4.6) reject it with a 400.  Keep this substring list in sync with
+# the Anthropic migration guide as new model families ship.
+_XHIGH_EFFORT_SUBSTRINGS = ("4-7", "4.7")
+
+# Models where extended thinking is deprecated/removed (4.6+ behavior: adaptive
+# is the only supported mode; 4.7 additionally forbids manual thinking entirely
+# and drops temperature/top_p/top_k).
+_ADAPTIVE_THINKING_SUBSTRINGS = ("4-6", "4.6", "4-7", "4.7")
+
+# Models where temperature/top_p/top_k return 400 if set to non-default values.
+# This is the Opus 4.7 contract; future 4.x+ models are expected to follow it.
+_NO_SAMPLING_PARAMS_SUBSTRINGS = ("4-7", "4.7")
+
 # ── Max output token limits per Anthropic model ───────────────────────
 # Source: Anthropic docs + Cline model catalog.  Anthropic's API requires
 # max_tokens as a mandatory field.  Previously we hardcoded 16384, which
 # starves thinking-enabled models (thinking tokens count toward the limit).
 _ANTHROPIC_OUTPUT_LIMITS = {
+    # Claude 4.7
+    "claude-opus-4-7":   128_000,
    # Claude 4.6
    "claude-opus-4-6":   128_000,
    "claude-sonnet-4-6":  64_000,
@@ -91,11 +117,37 @@ def _get_anthropic_max_output(model: str) -> int:


 def _supports_adaptive_thinking(model: str) -> bool:
-    """Return True for Claude 4.6 models that support adaptive thinking."""
-    return any(v in model for v in ("4-6", "4.6"))
+    """Return True for Claude 4.6+ models that support adaptive thinking."""
+    return any(v in model for v in _ADAPTIVE_THINKING_SUBSTRINGS)


-# Beta headers for enhanced features (sent with ALL auth types)
+def _supports_xhigh_effort(model: str) -> bool:
+    """Return True for models that accept the 'xhigh' adaptive effort level.
+
+    Opus 4.7 introduced xhigh as a distinct level between high and max.
+    Pre-4.7 adaptive models (Opus/Sonnet 4.6) only accept low/medium/high/max
+    and reject xhigh with an HTTP 400. Callers should downgrade xhigh→max
+    when this returns False.
+    """
+    return any(v in model for v in _XHIGH_EFFORT_SUBSTRINGS)
+
+
+def _forbids_sampling_params(model: str) -> bool:
+    """Return True for models that 400 on any non-default temperature/top_p/top_k.
+
+    Opus 4.7 explicitly rejects sampling parameters; later Claude releases are
+    expected to follow suit.  Callers should omit these fields entirely rather
+    than passing zero/default values (the API rejects anything non-null).
+    """
+    return any(v in model for v in _NO_SAMPLING_PARAMS_SUBSTRINGS)
+
+
+# Beta headers for enhanced features (sent with ALL auth types).
+# As of Opus 4.7 (2026-04-16), both of these are GA on Claude 4.6+ — the
+# beta headers are still accepted (harmless no-op) but not required. Kept
+# here so older Claude (4.5, 4.1) + third-party Anthropic-compat endpoints
+# that still gate on the headers continue to get the enhanced features.
+# Migration guide: remove these if you no longer support ≤4.5 models.
 _COMMON_BETAS = [
    "interleaved-thinking-2025-05-14",
    "fine-grained-tool-streaming-2025-05-14",
@@ -298,6 +350,33 @@ def build_anthropic_client(api_key: str, base_url: str = None):
    return _anthropic_sdk.Anthropic(**kwargs)


+def build_anthropic_bedrock_client(region: str):
+    """Create an AnthropicBedrock client for Bedrock Claude models.
+
+    Uses the Anthropic SDK's native Bedrock adapter, which provides full
+    Claude feature parity: prompt caching, thinking budgets, adaptive
+    thinking, fast mode — features not available via the Converse API.
+
+    Auth uses the boto3 default credential chain (IAM roles, SSO, env vars).
+    """
+    if _anthropic_sdk is None:
+        raise ImportError(
+            "The 'anthropic' package is required for the Bedrock provider. "
+            "Install it with: pip install 'anthropic>=0.39.0'"
+        )
+    if not hasattr(_anthropic_sdk, "AnthropicBedrock"):
+        raise ImportError(
+            "anthropic.AnthropicBedrock not available. "
+            "Upgrade with: pip install 'anthropic>=0.39.0'"
+        )
+    from httpx import Timeout
+
+    return _anthropic_sdk.AnthropicBedrock(
+        aws_region=region,
+        timeout=Timeout(timeout=900.0, connect=10.0),
+    )
+
+
 def read_claude_code_credentials() -> Optional[Dict[str, Any]]:
    """Read refreshable Claude Code OAuth credentials from ~/.claude/.credentials.json.

@@ -1314,18 +1393,31 @@ def build_anthropic_kwargs(
            kwargs["tool_choice"] = {"type": "tool", "name": tool_choice}

    # Map reasoning_config to Anthropic's thinking parameter.
-    # Claude 4.6 models use adaptive thinking + output_config.effort.
+    # Claude 4.6+ models use adaptive thinking + output_config.effort.
    # Older models use manual thinking with budget_tokens.
    # MiniMax Anthropic-compat endpoints support thinking (manual mode only,
    # not adaptive).  Haiku does NOT support extended thinking — skip entirely.
+    #
+    # On 4.7+ the `thinking.display` field defaults to "omitted", which
+    # silently hides reasoning text that Hermes surfaces in its CLI. We
+    # request "summarized" so the reasoning blocks stay populated — matching
+    # 4.6 behavior and preserving the activity-feed UX during long tool runs.
    if reasoning_config and isinstance(reasoning_config, dict):
        if reasoning_config.get("enabled") is not False and "haiku" not in model.lower():
            effort = str(reasoning_config.get("effort", "medium")).lower()
            budget = THINKING_BUDGET.get(effort, 8000)
            if _supports_adaptive_thinking(model):
-                kwargs["thinking"] = {"type": "adaptive"}
+                kwargs["thinking"] = {
+                    "type": "adaptive",
+                    "display": "summarized",
+                }
+                adaptive_effort = ADAPTIVE_EFFORT_MAP.get(effort, "medium")
+                # Downgrade xhigh→max on models that don't list xhigh as a
+                # supported level (Opus/Sonnet 4.6). Opus 4.7+ keeps xhigh.
+                if adaptive_effort == "xhigh" and not _supports_xhigh_effort(model):
+                    adaptive_effort = "max"
                kwargs["output_config"] = {
-                    "effort": ADAPTIVE_EFFORT_MAP.get(effort, "medium")
+                    "effort": adaptive_effort,
                }
            else:
                kwargs["thinking"] = {"type": "enabled", "budget_tokens": budget}
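
To make the effort resolution in this hunk easier to follow, here is a standalone sketch of the same decision. The map and substring tuple are copied from the diff; the function name `resolve_adaptive_effort` is hypothetical and does not exist in the module.

```python
# Values copied from the diff; the function name below is hypothetical.
EFFORT_MAP = {
    "max": "max",
    "xhigh": "xhigh",
    "high": "high",
    "medium": "medium",
    "low": "low",
    "minimal": "low",
}
XHIGH_SUBSTRINGS = ("4-7", "4.7")


def resolve_adaptive_effort(model, effort):
    """Map a Hermes effort level, downgrading xhigh to max where it is unsupported."""
    adaptive = EFFORT_MAP.get(effort.lower(), "medium")
    supports_xhigh = any(s in model for s in XHIGH_SUBSTRINGS)
    if adaptive == "xhigh" and not supports_xhigh:
        adaptive = "max"
    return adaptive
```

With this shape, `resolve_adaptive_effort("claude-opus-4-6", "xhigh")` yields `"max"`, while the same effort on a model name containing `4-7` passes through as `"xhigh"`. Unknown effort strings fall back to `"medium"`, matching the `.get(effort, "medium")` default in the diff.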
@@ -1333,6 +1425,15 @@ def build_anthropic_kwargs(
                kwargs["temperature"] = 1
                kwargs["max_tokens"] = max(effective_max_tokens, budget + 4096)

+    # ── Strip sampling params on 4.7+ ─────────────────────────────────
+    # Opus 4.7 rejects any non-default temperature/top_p/top_k with a 400.
+    # Callers (auxiliary_client, flush_memories, etc.) may set these for
+    # older models; drop them here as a safety net so upstream 4.6 → 4.7
+    # migrations don't require coordinated edits everywhere.
+    if _forbids_sampling_params(model):
+        for _sampling_key in ("temperature", "top_p", "top_k"):
+            kwargs.pop(_sampling_key, None)
+
    # ── Fast mode (Opus 4.6 only) ────────────────────────────────────
    # Adds extra_body.speed="fast" + the fast-mode beta header for ~2.5x
    # output speed. Only for native Anthropic endpoints — third-party
@@ -1390,12 +1491,20 @@ def normalize_anthropic_response(
                )
            )

-    # Map Anthropic stop_reason to OpenAI finish_reason
+    # Map Anthropic stop_reason to OpenAI finish_reason.
+    # Newer stop reasons added in Claude 4.5+ / 4.7:
+    #   - refusal: the model declined to answer (cyber safeguards, CSAM, etc.)
+    #   - model_context_window_exceeded: hit context limit (not max_tokens)
+    # Both need distinct handling upstream — a refusal should surface to the
+    # user with a clear message, and a context-window overflow should trigger
+    # compression/truncation rather than be treated as normal end-of-turn.
    stop_reason_map = {
        "end_turn": "stop",
        "tool_use": "tool_calls",
        "max_tokens": "length",
        "stop_sequence": "stop",
+        "refusal": "content_filter",
+        "model_context_window_exceeded": "length",
    }
    finish_reason = stop_reason_map.get(response.stop_reason, "stop")
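
One consequence of the mapping above is that `max_tokens` and `model_context_window_exceeded` both become `"length"`, so any caller that wants the distinct handling the comment asks for has to look at the raw stop reason. The sketch below shows one way that could be done. The map is copied from the diff; `next_action` and its return labels are hypothetical and are not part of Hermes.

```python
# Copied from the diff above.
STOP_REASON_MAP = {
    "end_turn": "stop",
    "tool_use": "tool_calls",
    "max_tokens": "length",
    "stop_sequence": "stop",
    "refusal": "content_filter",
    "model_context_window_exceeded": "length",
}


def to_finish_reason(stop_reason):
    """Translate an Anthropic stop_reason; unknown values fall back to 'stop'."""
    return STOP_REASON_MAP.get(stop_reason, "stop")


def next_action(stop_reason):
    """Hypothetical upstream dispatch keyed on the raw stop_reason."""
    if stop_reason == "refusal":
        return "show_refusal_message"
    if stop_reason == "model_context_window_exceeded":
        return "compress_context"
    if stop_reason == "max_tokens":
        return "continue_generation"
    return "finish_turn"
```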

@@ -58,6 +58,9 @@ _PROVIDER_ALIASES = {
    "google": "gemini",
    "google-gemini": "gemini",
    "google-ai-studio": "gemini",
+    "x-ai": "xai",
+    "x.ai": "xai",
+    "grok": "xai",
    "glm": "zai",
    "z-ai": "zai",
    "z.ai": "zai",
@@ -104,6 +107,7 @@ _API_KEY_PROVIDER_AUX_MODELS: Dict[str, str] = {
    "opencode-zen": "gemini-3-flash",
    "opencode-go": "glm-5",
    "kilocode": "google/gemini-3-flash-preview",
+    "ollama-cloud": "nemotron-3-nano:30b",
 }

 # Vision-specific model overrides for direct providers.
@@ -514,8 +518,13 @@ class _AnthropicCompletionsAdapter:
            tool_choice=normalized_tool_choice,
            is_oauth=self._is_oauth,
        )
+        # Opus 4.7+ rejects any non-default temperature/top_p/top_k; only set
+        # temperature for models that still accept it. build_anthropic_kwargs
+        # additionally strips these keys as a safety net — keep both layers.
        if temperature is not None:
-            anthropic_kwargs["temperature"] = temperature
+            from agent.anthropic_adapter import _forbids_sampling_params
+            if not _forbids_sampling_params(model):
+                anthropic_kwargs["temperature"] = temperature

        response = self._client.messages.create(**anthropic_kwargs)
        assistant_message, finish_reason = normalize_anthropic_response(response)
@@ -775,6 +784,21 @@ def _try_openrouter() -> Tuple[Optional[OpenAI], Optional[str]]:


 def _try_nous(vision: bool = False) -> Tuple[Optional[OpenAI], Optional[str]]:
+    # Check cross-session rate limit guard before attempting Nous —
+    # if another session already recorded a 429, skip Nous entirely
+    # to avoid piling more requests onto the tapped RPH bucket.
+    try:
+        from agent.nous_rate_guard import nous_rate_limit_remaining
+        _remaining = nous_rate_limit_remaining()
+        if _remaining is not None and _remaining > 0:
+            logger.debug(
+                "Auxiliary: skipping Nous Portal (rate-limited, resets in %.0fs)",
+                _remaining,
+            )
+            return None, None
+    except Exception:
+        pass
+
    nous = _read_nous_auth()
    if not nous:
        return None, None
@@ -899,6 +923,51 @@ def _current_custom_base_url() -> str:
    return custom_base or ""


+def _validate_proxy_env_urls() -> None:
+    """Fail fast with a clear error when proxy env vars have malformed URLs.
+
+    Common cause: shell config (e.g. .zshrc) with a typo like
+    ``export HTTP_PROXY=http://127.0.0.1:6153export NEXT_VAR=...``
+    which concatenates 'export' into the port number.  Without this
+    check the OpenAI/httpx client raises a cryptic ``Invalid port``
+    error that doesn't name the offending env var.
+    """
+    from urllib.parse import urlparse
+
+    for key in ("HTTPS_PROXY", "HTTP_PROXY", "ALL_PROXY",
+                "https_proxy", "http_proxy", "all_proxy"):
+        value = str(os.environ.get(key) or "").strip()
+        if not value:
+            continue
+        try:
+            parsed = urlparse(value)
+            if parsed.scheme:
+                _ = parsed.port          # raises ValueError for e.g. '6153export'
+        except ValueError as exc:
+            raise RuntimeError(
+                f"Malformed proxy environment variable {key}={value!r}. "
+                "Fix or unset your proxy settings and try again."
+            ) from exc
+
+
+def _validate_base_url(base_url: str) -> None:
+    """Reject obviously broken custom endpoint URLs before they reach httpx."""
+    from urllib.parse import urlparse
+
+    candidate = str(base_url or "").strip()
+    if not candidate or candidate.startswith("acp://"):
+        return
+    try:
+        parsed = urlparse(candidate)
+        if parsed.scheme in {"http", "https"}:
+            _ = parsed.port              # raises ValueError for malformed ports
+    except ValueError as exc:
+        raise RuntimeError(
+            f"Malformed custom endpoint URL: {candidate!r}. "
+            "Run `hermes setup` or `hermes model` and enter a valid http(s) base URL."
+        ) from exc
+
+
 def _try_custom_endpoint() -> Tuple[Optional[OpenAI], Optional[str]]:
    runtime = _resolve_custom_runtime()
    if len(runtime) == 2:
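
Both validators in this hunk rely on a detail of `urllib.parse`: `urlparse()` accepts a URL with a malformed port without complaint, and the `ValueError` only surfaces when the `.port` attribute is read. A minimal sketch of that behaviour follows; the helper name `port_error` is made up for the example.

```python
from urllib.parse import urlparse


def port_error(url):
    """Return the ValueError message for a malformed port, or None if the port parses."""
    parsed = urlparse(url)  # does not raise for a non-numeric port
    try:
        parsed.port  # the port is validated lazily, on attribute access
    except ValueError as exc:
        return str(exc)
    return None


good = port_error("http://127.0.0.1:6153")
bad = port_error("http://127.0.0.1:6153export")
```

Here `good` is `None`, and `bad` holds the parser's error text. Touching `.port` up front is what lets the validators name the offending variable in their own `RuntimeError`.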
@@ -1299,6 +1368,7 @@ def resolve_provider_client(
    Returns:
        (client, resolved_model) or (None, None) if auth is unavailable.
    """
+    _validate_proxy_env_urls()
    # Normalise aliases
    provider = _normalize_aux_provider(provider)

@@ -1835,9 +1905,15 @@ def auxiliary_max_tokens_param(value: int) -> dict:
 # Every auxiliary LLM consumer should use these instead of manually
 # constructing clients and calling .chat.completions.create().

-# Client cache: (provider, async_mode, base_url, api_key) -> (client, default_model)
+# Client cache: (provider, async_mode, base_url, api_key, api_mode, runtime_key) -> (client, default_model, loop)
+# NOTE: loop identity is NOT part of the key.  On async cache hits we check
+# whether the cached loop is the *current* loop; if not, the stale entry is
+# replaced in-place.  This bounds cache growth to one entry per unique
+# provider config rather than one per (config × event-loop), which previously
+# caused unbounded fd accumulation in long-running gateway processes (#10200).
 _client_cache: Dict[tuple, tuple] = {}
 _client_cache_lock = threading.Lock()
+_CLIENT_CACHE_MAX_SIZE = 64  # safety belt — evict oldest when exceeded


 def neuter_async_httpx_del() -> None:
@@ -1970,39 +2046,49 @@ def _get_cached_client(
    Async clients (AsyncOpenAI) use httpx.AsyncClient internally, which
    binds to the event loop that was current when the client was created.
    Using such a client on a *different* loop causes deadlocks or
-    RuntimeError.  To prevent cross-loop issues (especially in gateway
-    mode where _run_async() may spawn fresh loops in worker threads), the
-    cache key for async clients includes the current event loop's identity
-    so each loop gets its own client instance.
+    RuntimeError.  To prevent cross-loop issues, the cache validates on
+    every async hit that the cached loop is the *current, open* loop.
+    If the loop changed (e.g. a new gateway worker-thread loop), the stale
+    entry is replaced in-place rather than creating an additional entry.
+
+    This keeps cache size bounded to one entry per unique provider config,
+    preventing the fd-exhaustion that previously occurred in long-running
+    gateways where recycled worker threads created unbounded entries (#10200).
    """
-    # Include loop identity for async clients to prevent cross-loop reuse.
-    # httpx.AsyncClient (inside AsyncOpenAI) is bound to the loop where it
-    # was created — reusing it on a different loop causes deadlocks (#2681).
-    loop_id = 0
+    # Resolve the current event loop for async clients so we can validate
+    # cached entries.  Loop identity is NOT in the cache key — instead we
+    # check at hit time whether the cached loop is still current and open.
+    # This prevents unbounded cache growth from recycled worker-thread loops
+    # while still guaranteeing we never reuse a client on the wrong loop
+    # (which causes deadlocks, see #2681).
    current_loop = None
    if async_mode:
        try:
            import asyncio as _aio
            current_loop = _aio.get_event_loop()
-            loop_id = id(current_loop)
        except RuntimeError:
            pass
    runtime = _normalize_main_runtime(main_runtime)
    runtime_key = tuple(runtime.get(field, "") for field in _MAIN_RUNTIME_FIELDS) if provider == "auto" else ()
-    cache_key = (provider, async_mode, base_url or "", api_key or "", api_mode or "", loop_id, runtime_key)
+    cache_key = (provider, async_mode, base_url or "", api_key or "", api_mode or "", runtime_key)
    with _client_cache_lock:
        if cache_key in _client_cache:
            cached_client, cached_default, cached_loop = _client_cache[cache_key]
            if async_mode:
-                # A cached async client whose loop has been closed will raise
-                # "Event loop is closed" when httpx tries to clean up its
-                # transport.  Discard the stale client and create a fresh one.
-                if cached_loop is not None and cached_loop.is_closed():
-                    _force_close_async_httpx(cached_client)
-                    del _client_cache[cache_key]
-                else:
+                # Validate: the cached client must be bound to the CURRENT,
+                # OPEN loop.  If the loop changed or was closed, the httpx
+                # transport inside is dead — force-close and replace.
+                loop_ok = (
+                    cached_loop is not None
+                    and cached_loop is current_loop
+                    and not cached_loop.is_closed()
+                )
+                if loop_ok:
                    effective = _compat_model(cached_client, model, cached_default)
                    return cached_client, effective
+                # Stale — evict and fall through to create a new client.
+                _force_close_async_httpx(cached_client)
+                del _client_cache[cache_key]
            else:
                effective = _compat_model(cached_client, model, cached_default)
                return cached_client, effective
@@ -2022,6 +2108,12 @@ def _get_cached_client(
        bound_loop = current_loop
        with _client_cache_lock:
            if cache_key not in _client_cache:
+                # Safety belt: if the cache has reached the max size, evict
+                # the oldest entries (FIFO; dict preserves insertion order).
+                while len(_client_cache) >= _CLIENT_CACHE_MAX_SIZE:
+                    evict_key, evict_entry = next(iter(_client_cache.items()))
+                    _force_close_async_httpx(evict_entry[0])
+                    del _client_cache[evict_key]
                _client_cache[cache_key] = (client, default_model, bound_loop)
            else:
                client, default_model, _ = _client_cache[cache_key]
@@ -2201,6 +2293,15 @@ def _build_call_kwargs(
        "timeout": timeout,
    }

+    # Opus 4.7+ rejects any non-default temperature/top_p/top_k.  Silently
+    # drop temperature here so auxiliary callers that hardcode it (e.g. 0.3
+    # on flush_memories, 0 on structured-JSON extraction) don't get a 400
+    # the moment the aux model is flipped to 4.7.
+    if temperature is not None:
+        from agent.anthropic_adapter import _forbids_sampling_params
+        if _forbids_sampling_params(model):
+            temperature = None
+
    if temperature is not None:
        kwargs["temperature"] = temperature

@@ -2304,10 +2405,10 @@ def call_llm(

    if task == "vision":
        effective_provider, client, final_model = resolve_vision_provider_client(
-            provider=provider,
-            model=model,
-            base_url=base_url,
-            api_key=api_key,
+            provider=resolved_provider if resolved_provider != "auto" else provider,
+            model=resolved_model or model,
+            base_url=resolved_base_url or base_url,
+            api_key=resolved_api_key or api_key,
            async_mode=False,
        )
        if client is None and resolved_provider != "auto" and not resolved_base_url:
@@ -2512,10 +2613,10 @@ async def async_call_llm(

    if task == "vision":
        effective_provider, client, final_model = resolve_vision_provider_client(
-            provider=provider,
-            model=model,
-            base_url=base_url,
-            api_key=api_key,
+            provider=resolved_provider if resolved_provider != "auto" else provider,
+            model=resolved_model or model,
+            base_url=resolved_base_url or base_url,
+            api_key=resolved_api_key or api_key,
            async_mode=True,
        )
        if client is None and resolved_provider != "auto" and not resolved_base_url:
@@ -39,7 +39,10 @@ SUMMARY_PREFIX = (
    "into the summary below. This is a handoff from a previous context "
    "window — treat it as background reference, NOT as active instructions. "
    "Do NOT answer questions or fulfill requests mentioned in this summary; "
-    "they were already addressed. Respond ONLY to the latest user message "
+    "they were already addressed. "
+    "Your current task is identified in the '## Active Task' section of the "
+    "summary — resume exactly from there. "
+    "Respond ONLY to the latest user message "
    "that appears AFTER this summary. The current session state (files, "
    "config, etc.) may reflect work described here — avoid repeating it:"
 )
@@ -581,8 +584,16 @@ class ContextCompressor(ContextEngine):
        )

        # Shared structured template (used by both paths).
-        _template_sections = f"""## Goal
-[What the user is trying to accomplish]
+        _template_sections = f"""## Active Task
+[THE SINGLE MOST IMPORTANT FIELD. Copy the user's most recent request or
+task assignment verbatim — the exact words they used. If multiple tasks
+were requested and only some are done, list only the ones NOT yet completed.
+The next assistant must pick up exactly here. Example:
+"User asked: 'Now refactor the auth module to use JWT instead of sessions'"
+If no outstanding task exists, write "None."]
+
+## Goal
+[What the user is trying to accomplish overall]

 ## Constraints & Preferences
 [User preferences, coding style, constraints, important decisions]
@@ -644,7 +655,7 @@ PREVIOUS SUMMARY:
 NEW TURNS TO INCORPORATE:
 {content_to_summarize}

-Update the summary using this exact structure. PRESERVE all existing information that is still relevant. ADD new completed actions to the numbered list (continue numbering). Move items from "In Progress" to "Completed Actions" when done. Move answered questions to "Resolved Questions". Update "Active State" to reflect current state. Remove information only if it is clearly obsolete.
+Update the summary using this exact structure. PRESERVE all existing information that is still relevant. ADD new completed actions to the numbered list (continue numbering). Move items from "In Progress" to "Completed Actions" when done. Move answered questions to "Resolved Questions". Update "Active State" to reflect current state. Remove information only if it is clearly obsolete. CRITICAL: Update "## Active Task" to reflect the user's most recent unfulfilled request — this is the most important field for task continuity.

 {_template_sections}"""
        else:
@@ -862,6 +873,62 @@ The user has requested that this compaction PRIORITISE preserving all informatio
    # Tail protection by token budget
    # ------------------------------------------------------------------

+    def _find_last_user_message_idx(
+        self, messages: List[Dict[str, Any]], head_end: int
+    ) -> int:
+        """Return the index of the last user-role message at or after *head_end*, or -1."""
+        for i in range(len(messages) - 1, head_end - 1, -1):
+            if messages[i].get("role") == "user":
+                return i
+        return -1
+
+    def _ensure_last_user_message_in_tail(
+        self,
+        messages: List[Dict[str, Any]],
+        cut_idx: int,
+        head_end: int,
+    ) -> int:
+        """Guarantee the most recent user message is in the protected tail.
+
+        Context compressor bug (#10896): ``_align_boundary_backward`` can pull
+        ``cut_idx`` past a user message when it tries to keep tool_call/result
+        groups together.  If the last user message ends up in the *compressed*
+        middle region the LLM summariser writes it into "Pending User Asks",
+        but ``SUMMARY_PREFIX`` tells the next model to respond only to user
+        messages *after* the summary — so the task effectively disappears from
+        the active context, causing the agent to stall, repeat completed work,
+        or silently drop the user's latest request.
+
+        Fix: if the last user-role message is not already in the tail
+        (``messages[cut_idx:]``), walk ``cut_idx`` back to include it.  No
+        further re-alignment is done: a user message is already a clean
+        boundary, so no tool_call/result group can be split by cutting there.
+        """
+        last_user_idx = self._find_last_user_message_idx(messages, head_end)
+        if last_user_idx < 0:
+            # No user message found beyond head — nothing to anchor.
+            return cut_idx
+
+        if last_user_idx >= cut_idx:
+            # Already in the tail; nothing to do.
+            return cut_idx
+
+        # The last user message is in the middle (compressed) region.
+        # Pull cut_idx back to it directly — a user message is already a
+        # clean boundary (no tool_call/result splitting risk), so there is no
+        # need to call _align_boundary_backward here; doing so would
+        # unnecessarily pull the cut further back into the preceding
+        # assistant + tool_calls group.
+        if not self.quiet_mode:
+            logger.debug(
+                "Anchoring tail cut to last user message at index %d "
+                "(was %d) to prevent active-task loss after compression",
+                last_user_idx,
+                cut_idx,
+            )
+        # Safety: never go back into the head region.
+        return max(last_user_idx, head_end + 1)
+
    def _find_tail_cut_by_tokens(
        self, messages: List[Dict[str, Any]], head_end: int,
        token_budget: int | None = None,
@@ -879,7 +946,8 @@ The user has requested that this compaction PRIORITISE preserving all informatio
        read, etc.).  If even the minimum 3 messages exceed 1.5x the budget
        the cut is placed right after the head so compression still runs.

-        Never cuts inside a tool_call/result group.
+        Never cuts inside a tool_call/result group.  Always ensures the most
+        recent user message is in the tail (see ``_ensure_last_user_message_in_tail``).
        """
        if token_budget is None:
            token_budget = self.tail_token_budget
@@ -918,6 +986,10 @@ The user has requested that this compaction PRIORITISE preserving all informatio
        # Align to avoid splitting tool groups
        cut_idx = self._align_boundary_backward(messages, cut_idx)

+        # Ensure the most recent user message is always in the tail so the
+        # active task is never lost to compression (fixes #10896).
+        cut_idx = self._ensure_last_user_message_in_tail(messages, cut_idx, head_end)
+
        return max(cut_idx, head_end + 1)

    # ------------------------------------------------------------------
@@ -313,9 +313,25 @@ class CopilotACPClient:
            tools=tools,
            tool_choice=tool_choice,
        )
+        # Normalise timeout: run_agent.py may pass an httpx.Timeout object
+        # (used natively by the OpenAI SDK) rather than a plain float.
+        if timeout is None:
+            _effective_timeout = _DEFAULT_TIMEOUT_SECONDS
+        elif isinstance(timeout, (int, float)):
+            _effective_timeout = float(timeout)
+        else:
+            # httpx.Timeout or similar — pick the largest component so the
+            # subprocess has enough wall-clock time for the full response.
+            _candidates = [
+                getattr(timeout, attr, None)
+                for attr in ("read", "write", "connect", "pool", "timeout")
+            ]
+            _numeric = [float(v) for v in _candidates if isinstance(v, (int, float))]
+            _effective_timeout = max(_numeric) if _numeric else _DEFAULT_TIMEOUT_SECONDS
+
        response_text, reasoning_text = self._run_prompt(
            prompt_text,
-            timeout_seconds=float(timeout or _DEFAULT_TIMEOUT_SECONDS),
+            timeout_seconds=_effective_timeout,
        )

        tool_calls, cleaned_text = _extract_tool_calls_from_text(response_text)
@@ -1162,6 +1162,7 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
            if token:
                source_name = "gh_cli" if "gh" in source.lower() else f"env:{source}"
                active_sources.add(source_name)
+                pconfig = PROVIDER_REGISTRY.get(provider)
                changed |= _upsert_entry(
                    entries,
                    provider,
@@ -1170,6 +1171,7 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
                        "source": source_name,
                        "auth_type": AUTH_TYPE_API_KEY,
                        "access_token": token,
+                        "base_url": pconfig.inference_base_url if pconfig else "",
                        "label": source,
                    },
                )
@@ -600,6 +600,45 @@ class KawaiiSpinner:
        "analyzing", "computing", "synthesizing", "formulating", "brainstorming",
    ]

+    @classmethod
+    def get_waiting_faces(cls) -> list:
+        """Return waiting faces from the active skin, falling back to KAWAII_WAITING."""
+        try:
+            skin = _get_skin()
+            if skin:
+                faces = skin.spinner.get("waiting_faces", [])
+                if faces:
+                    return faces
+        except Exception:
+            pass
+        return cls.KAWAII_WAITING
+
+    @classmethod
+    def get_thinking_faces(cls) -> list:
+        """Return thinking faces from the active skin, falling back to KAWAII_THINKING."""
+        try:
+            skin = _get_skin()
+            if skin:
+                faces = skin.spinner.get("thinking_faces", [])
+                if faces:
+                    return faces
+        except Exception:
+            pass
+        return cls.KAWAII_THINKING
+
+    @classmethod
+    def get_thinking_verbs(cls) -> list:
+        """Return thinking verbs from the active skin, falling back to THINKING_VERBS."""
+        try:
+            skin = _get_skin()
+            if skin:
+                verbs = skin.spinner.get("thinking_verbs", [])
+                if verbs:
+                    return verbs
+        except Exception:
+            pass
+        return cls.THINKING_VERBS
+
    def __init__(self, message: str = "", spinner_type: str = 'dots', print_fn=None):
        self.message = message
        self.spinner_frames = self.SPINNERS.get(spinner_type, self.SPINNERS['dots'])
@@ -954,84 +993,4 @@ def get_cute_tool_message(
 # Honcho session line (one-liner with clickable OSC 8 hyperlink)
 # =========================================================================

-_DIM = "\033[2m"
-_SKY_BLUE = "\033[38;5;117m"
-_ANSI_RESET = "\033[0m"

-
-# =========================================================================
-# Context pressure display (CLI user-facing warnings)
-# =========================================================================
-
-# ANSI color codes for context pressure tiers
-_CYAN = "\033[36m"
-_YELLOW = "\033[33m"
-_BOLD = "\033[1m"
-_DIM_ANSI = "\033[2m"
-
-# Bar characters
-_BAR_FILLED = "▰"
-_BAR_EMPTY = "▱"
-_BAR_WIDTH = 20
-
-
-def format_context_pressure(
-    compaction_progress: float,
-    threshold_tokens: int,
-    threshold_percent: float,
-    compression_enabled: bool = True,
-) -> str:
-    """Build a formatted context pressure line for CLI display.
-
-    The bar and percentage show progress toward the compaction threshold,
-    NOT the raw context window.  100% = compaction fires.
-
-    Args:
-        compaction_progress: How close to compaction (0.0–1.0, 1.0 = fires).
-        threshold_tokens: Compaction threshold in tokens.
-        threshold_percent: Compaction threshold as a fraction of context window.
-        compression_enabled: Whether auto-compression is active.
-    """
-    pct_int = min(int(compaction_progress * 100), 100)
-    filled = min(int(compaction_progress * _BAR_WIDTH), _BAR_WIDTH)
-    bar = _BAR_FILLED * filled + _BAR_EMPTY * (_BAR_WIDTH - filled)
-
-    threshold_k = f"{threshold_tokens // 1000}k" if threshold_tokens >= 1000 else str(threshold_tokens)
-    threshold_pct_int = int(threshold_percent * 100)
-
-    color = f"{_BOLD}{_YELLOW}"
-    icon = "⚠"
-    if compression_enabled:
-        hint = "compaction approaching"
-    else:
-        hint = "no auto-compaction"
-
-    return (
-        f"  {color}{icon} context {bar} {pct_int}% to compaction{_ANSI_RESET}"
-        f"  {_DIM_ANSI}{threshold_k} threshold ({threshold_pct_int}%) · {hint}{_ANSI_RESET}"
-    )
-
-
-def format_context_pressure_gateway(
-    compaction_progress: float,
-    threshold_percent: float,
-    compression_enabled: bool = True,
-) -> str:
-    """Build a plain-text context pressure notification for messaging platforms.
-
-    No ANSI — just Unicode and plain text suitable for Telegram/Discord/etc.
-    The percentage shows progress toward the compaction threshold.
-    """
-    pct_int = min(int(compaction_progress * 100), 100)
-    filled = min(int(compaction_progress * _BAR_WIDTH), _BAR_WIDTH)
-    bar = _BAR_FILLED * filled + _BAR_EMPTY * (_BAR_WIDTH - filled)
-
-    threshold_pct_int = int(threshold_percent * 100)
-
-    icon = "⚠️"
-    if compression_enabled:
-        hint = f"Context compaction approaching (threshold: {threshold_pct_int}% of window)."
-    else:
-        hint = "Auto-compaction is disabled — context may be truncated."
-
-    return f"{icon} Context: {bar} {pct_int}% to compaction\n{hint}"
@@ -112,6 +112,10 @@ _RATE_LIMIT_PATTERNS = [
    "please retry after",
    "resource_exhausted",
    "rate increased too quickly",  # Alibaba/DashScope throttling
+    # AWS Bedrock throttling
+    "throttlingexception",
+    "too many concurrent requests",
+    "servicequotaexceededexception",
 ]

 # Usage-limit patterns that need disambiguation (could be billing OR rate_limit)
@@ -171,6 +175,11 @@ _CONTEXT_OVERFLOW_PATTERNS = [
    # Chinese error messages (some providers return these)
    "超过最大长度",
    "上下文长度",
+    # AWS Bedrock Converse API error patterns
+    "input is too long",
+    "max input token",
+    "input token",
+    "exceeds the maximum number of input tokens",
 ]

 # Model not found patterns
@@ -0,0 +1,764 @@
+"""OpenAI-compatible facade that talks to Google's Cloud Code Assist backend.
+
+This adapter lets Hermes use the ``google-gemini-cli`` provider as if it were
+a standard OpenAI-shaped chat completion endpoint, while the underlying HTTP
+traffic goes to ``cloudcode-pa.googleapis.com/v1internal:{generateContent,
+streamGenerateContent}`` with a Bearer access token obtained via OAuth PKCE.
+
+Architecture
+------------
+- ``GeminiCloudCodeClient`` exposes ``.chat.completions.create(**kwargs)``
+  mirroring the subset of the OpenAI SDK that ``run_agent.py`` uses.
+- Incoming OpenAI ``messages[]`` / ``tools[]`` / ``tool_choice`` are translated
+  to Gemini's native ``contents[]`` / ``tools[].functionDeclarations`` /
+  ``toolConfig`` / ``systemInstruction`` shape.
+- The request body is wrapped in an envelope of the form
+  ``{project, model, user_prompt_id, request}``, as Code Assist expects.
+- Responses (``candidates[].content.parts[]``) are converted back to
+  OpenAI ``choices[0].message`` shape with ``content`` + ``tool_calls``.
+- Streaming uses SSE (``?alt=sse``) and yields OpenAI-shaped delta chunks.
+
+Attribution
+-----------
+Translation semantics follow jenslys/opencode-gemini-auth (MIT) and the public
+Gemini API docs. The request envelope shape
+(``{project, model, user_prompt_id, request}``) is not publicly documented; it
+was reverse-engineered from the opencode-gemini-auth and clawdbot implementations.
+"""
+
+from __future__ import annotations
+
+import json
+import logging
+import os
+import time
+import uuid
+from types import SimpleNamespace
+from typing import Any, Dict, Iterator, List, Optional
+
+import httpx
+
+from agent import google_oauth
+from agent.google_code_assist import (
+    CODE_ASSIST_ENDPOINT,
+    FREE_TIER_ID,
+    CodeAssistError,
+    ProjectContext,
+    resolve_project_context,
+)
+
+logger = logging.getLogger(__name__)
+
+
+# =============================================================================
+# Request translation: OpenAI → Gemini
+# =============================================================================
+
+_ROLE_MAP_OPENAI_TO_GEMINI = {
+    "user": "user",
+    "assistant": "model",
+    "system": "user",   # handled separately via systemInstruction
+    "tool": "user",     # functionResponse is wrapped in a user-role turn
+    "function": "user",
+}
+
+
+def _coerce_content_to_text(content: Any) -> str:
+    """OpenAI content may be str or a list of parts; reduce to plain text."""
+    if content is None:
+        return ""
+    if isinstance(content, str):
+        return content
+    if isinstance(content, list):
+        pieces: List[str] = []
+        for p in content:
+            if isinstance(p, str):
+                pieces.append(p)
+            elif isinstance(p, dict):
+                if p.get("type") == "text" and isinstance(p.get("text"), str):
+                    pieces.append(p["text"])
+                # Multimodal (image_url, etc.) — stub for now; log and skip
+                elif p.get("type") in ("image_url", "input_audio"):
+                    logger.debug("Dropping multimodal part (not yet supported): %s", p.get("type"))
+        return "\n".join(pieces)
+    return str(content)
+
+
+def _translate_tool_call_to_gemini(tool_call: Dict[str, Any]) -> Dict[str, Any]:
+    """OpenAI tool_call -> Gemini functionCall part."""
+    fn = tool_call.get("function") or {}
+    args_raw = fn.get("arguments", "")
+    try:
+        args = json.loads(args_raw) if isinstance(args_raw, str) and args_raw else {}
+    except json.JSONDecodeError:
+        args = {"_raw": args_raw}
+    if not isinstance(args, dict):
+        args = {"_value": args}
+    return {
+        "functionCall": {
+            "name": fn.get("name") or "",
+            "args": args,
+        },
+        # Sentinel signature — matches opencode-gemini-auth's approach.
+        # Without this, Code Assist rejects function calls that originated
+        # outside its own chain.
+        "thoughtSignature": "skip_thought_signature_validator",
+    }
+
+
+def _translate_tool_result_to_gemini(message: Dict[str, Any]) -> Dict[str, Any]:
+    """OpenAI tool-role message -> Gemini functionResponse part.
+
+    The function name isn't in the OpenAI tool message directly; it must be
+    passed via the assistant message that issued the call. For simplicity we
+    read ``name`` from the message when present, fall back to the raw
+    ``tool_call_id``, and finally to the literal ``"tool"``.
+    """
+    name = str(message.get("name") or message.get("tool_call_id") or "tool")
+    content = _coerce_content_to_text(message.get("content"))
+    # Gemini expects the response as a dict under `response`. We wrap plain
+    # text in {"output": "..."}.
+    try:
+        parsed = json.loads(content) if content.strip().startswith(("{", "[")) else None
+    except json.JSONDecodeError:
+        parsed = None
+    response = parsed if isinstance(parsed, dict) else {"output": content}
+    return {
+        "functionResponse": {
+            "name": name,
+            "response": response,
+        },
+    }
+
+
+def _build_gemini_contents(
+    messages: List[Dict[str, Any]],
+) -> tuple[List[Dict[str, Any]], Optional[Dict[str, Any]]]:
+    """Convert OpenAI messages[] to Gemini contents[] + systemInstruction."""
+    system_text_parts: List[str] = []
+    contents: List[Dict[str, Any]] = []
+
+    for msg in messages:
+        if not isinstance(msg, dict):
+            continue
+        role = str(msg.get("role") or "user")
+
+        if role == "system":
+            system_text_parts.append(_coerce_content_to_text(msg.get("content")))
+            continue
+
+        # Tool result message — emit a user-role turn with functionResponse
+        if role == "tool" or role == "function":
+            contents.append({
+                "role": "user",
+                "parts": [_translate_tool_result_to_gemini(msg)],
+            })
+            continue
+
+        gemini_role = _ROLE_MAP_OPENAI_TO_GEMINI.get(role, "user")
+        parts: List[Dict[str, Any]] = []
+
+        text = _coerce_content_to_text(msg.get("content"))
+        if text:
+            parts.append({"text": text})
+
+        # Assistant messages can carry tool_calls
+        tool_calls = msg.get("tool_calls") or []
+        if isinstance(tool_calls, list):
+            for tc in tool_calls:
+                if isinstance(tc, dict):
+                    parts.append(_translate_tool_call_to_gemini(tc))
+
+        if not parts:
+            # Gemini rejects empty parts; skip the turn entirely
+            continue
+
+        contents.append({"role": gemini_role, "parts": parts})
+
+    system_instruction: Optional[Dict[str, Any]] = None
+    joined_system = "\n".join(p for p in system_text_parts if p).strip()
+    if joined_system:
+        system_instruction = {
+            "role": "system",
+            "parts": [{"text": joined_system}],
+        }
+
+    return contents, system_instruction
+
+
+def _translate_tools_to_gemini(tools: Any) -> List[Dict[str, Any]]:
+    """OpenAI tools[] -> Gemini tools[].functionDeclarations[]."""
+    if not isinstance(tools, list) or not tools:
+        return []
+    declarations: List[Dict[str, Any]] = []
+    for t in tools:
+        if not isinstance(t, dict):
+            continue
+        fn = t.get("function") or {}
+        if not isinstance(fn, dict):
+            continue
+        name = fn.get("name")
+        if not name:
+            continue
+        decl = {"name": str(name)}
+        if fn.get("description"):
+            decl["description"] = str(fn["description"])
+        params = fn.get("parameters")
+        if isinstance(params, dict):
+            decl["parameters"] = params
+        declarations.append(decl)
+    if not declarations:
+        return []
+    return [{"functionDeclarations": declarations}]
+
+
+def _translate_tool_choice_to_gemini(tool_choice: Any) -> Optional[Dict[str, Any]]:
+    """OpenAI tool_choice -> Gemini toolConfig.functionCallingConfig."""
+    if tool_choice is None:
+        return None
+    if isinstance(tool_choice, str):
+        if tool_choice == "auto":
+            return {"functionCallingConfig": {"mode": "AUTO"}}
+        if tool_choice == "required":
+            return {"functionCallingConfig": {"mode": "ANY"}}
+        if tool_choice == "none":
+            return {"functionCallingConfig": {"mode": "NONE"}}
+    if isinstance(tool_choice, dict):
+        fn = tool_choice.get("function") or {}
+        name = fn.get("name")
+        if name:
+            return {
+                "functionCallingConfig": {
+                    "mode": "ANY",
+                    "allowedFunctionNames": [str(name)],
+                },
+            }
+    return None
+
+
+def _normalize_thinking_config(config: Any) -> Optional[Dict[str, Any]]:
+    """Accept thinkingBudget / thinkingLevel / includeThoughts (+ snake_case)."""
+    if not isinstance(config, dict) or not config:
+        return None
+    budget = config.get("thinkingBudget", config.get("thinking_budget"))
+    level = config.get("thinkingLevel", config.get("thinking_level"))
+    include = config.get("includeThoughts", config.get("include_thoughts"))
+    normalized: Dict[str, Any] = {}
+    if isinstance(budget, (int, float)):
+        normalized["thinkingBudget"] = int(budget)
+    if isinstance(level, str) and level.strip():
+        normalized["thinkingLevel"] = level.strip().lower()
+    if isinstance(include, bool):
+        normalized["includeThoughts"] = include
+    return normalized or None
+
+
+def build_gemini_request(
+    *,
+    messages: List[Dict[str, Any]],
+    tools: Any = None,
+    tool_choice: Any = None,
+    temperature: Optional[float] = None,
+    max_tokens: Optional[int] = None,
+    top_p: Optional[float] = None,
+    stop: Any = None,
+    thinking_config: Any = None,
+) -> Dict[str, Any]:
+    """Build the inner Gemini request body (goes inside ``request`` wrapper)."""
+    contents, system_instruction = _build_gemini_contents(messages)
+
+    body: Dict[str, Any] = {"contents": contents}
+    if system_instruction is not None:
+        body["systemInstruction"] = system_instruction
+
+    gemini_tools = _translate_tools_to_gemini(tools)
+    if gemini_tools:
+        body["tools"] = gemini_tools
+    tool_cfg = _translate_tool_choice_to_gemini(tool_choice)
+    if tool_cfg is not None:
+        body["toolConfig"] = tool_cfg
+
+    generation_config: Dict[str, Any] = {}
+    if isinstance(temperature, (int, float)):
+        generation_config["temperature"] = float(temperature)
+    if isinstance(max_tokens, int) and max_tokens > 0:
+        generation_config["maxOutputTokens"] = max_tokens
+    if isinstance(top_p, (int, float)):
+        generation_config["topP"] = float(top_p)
+    if isinstance(stop, str) and stop:
+        generation_config["stopSequences"] = [stop]
+    elif isinstance(stop, list) and stop:
+        generation_config["stopSequences"] = [str(s) for s in stop if s]
+    normalized_thinking = _normalize_thinking_config(thinking_config)
+    if normalized_thinking:
+        generation_config["thinkingConfig"] = normalized_thinking
+    if generation_config:
+        body["generationConfig"] = generation_config
+
+    return body
+
+
+def wrap_code_assist_request(
+    *,
+    project_id: str,
+    model: str,
+    inner_request: Dict[str, Any],
+    user_prompt_id: Optional[str] = None,
+) -> Dict[str, Any]:
+    """Wrap the inner Gemini request in the Code Assist envelope."""
+    return {
+        "project": project_id,
+        "model": model,
+        "user_prompt_id": user_prompt_id or str(uuid.uuid4()),
+        "request": inner_request,
+    }
+
+
+# =============================================================================
+# Response translation: Gemini → OpenAI
+# =============================================================================
+
+def _translate_gemini_response(
+    resp: Dict[str, Any],
+    model: str,
+) -> SimpleNamespace:
+    """Non-streaming Gemini response -> OpenAI-shaped SimpleNamespace.
+
+    Code Assist wraps the actual Gemini response inside ``response``, so we
+    unwrap it first if present.
+    """
+    inner = resp.get("response") if isinstance(resp.get("response"), dict) else resp
+
+    candidates = inner.get("candidates") or []
+    if not isinstance(candidates, list) or not candidates:
+        return _empty_response(model)
+
+    cand = candidates[0]
+    content_obj = cand.get("content") if isinstance(cand, dict) else {}
+    parts = content_obj.get("parts") if isinstance(content_obj, dict) else []
+
+    text_pieces: List[str] = []
+    reasoning_pieces: List[str] = []
+    tool_calls: List[SimpleNamespace] = []
+
+    for i, part in enumerate(parts or []):
+        if not isinstance(part, dict):
+            continue
+        # Thought parts are the model's internal reasoning; surface them as
+        # reasoning and don't mix them into content.
+        if part.get("thought") is True:
+            if isinstance(part.get("text"), str):
+                reasoning_pieces.append(part["text"])
+            continue
+        if isinstance(part.get("text"), str):
+            text_pieces.append(part["text"])
+            continue
+        fc = part.get("functionCall")
+        if isinstance(fc, dict) and fc.get("name"):
+            try:
+                args_str = json.dumps(fc.get("args") or {}, ensure_ascii=False)
+            except (TypeError, ValueError):
+                args_str = "{}"
+            tool_calls.append(SimpleNamespace(
+                id=f"call_{uuid.uuid4().hex[:12]}",
+                type="function",
+                index=i,
+                function=SimpleNamespace(name=str(fc["name"]), arguments=args_str),
+            ))
+
+    finish_reason = "tool_calls" if tool_calls else _map_gemini_finish_reason(
+        str(cand.get("finishReason") or "")
+    )
+
+    usage_meta = inner.get("usageMetadata") or {}
+    usage = SimpleNamespace(
+        prompt_tokens=int(usage_meta.get("promptTokenCount") or 0),
+        completion_tokens=int(usage_meta.get("candidatesTokenCount") or 0),
+        total_tokens=int(usage_meta.get("totalTokenCount") or 0),
+        prompt_tokens_details=SimpleNamespace(
+            cached_tokens=int(usage_meta.get("cachedContentTokenCount") or 0),
+        ),
+    )
+
+    message = SimpleNamespace(
+        role="assistant",
+        content="".join(text_pieces) if text_pieces else None,
+        tool_calls=tool_calls or None,
+        reasoning="".join(reasoning_pieces) or None,
+        reasoning_content="".join(reasoning_pieces) or None,
+        reasoning_details=None,
+    )
+    choice = SimpleNamespace(
+        index=0,
+        message=message,
+        finish_reason=finish_reason,
+    )
+    return SimpleNamespace(
+        id=f"chatcmpl-{uuid.uuid4().hex[:12]}",
+        object="chat.completion",
+        created=int(time.time()),
+        model=model,
+        choices=[choice],
+        usage=usage,
+    )
+
+
+def _empty_response(model: str) -> SimpleNamespace:
+    message = SimpleNamespace(
+        role="assistant", content="", tool_calls=None,
+        reasoning=None, reasoning_content=None, reasoning_details=None,
+    )
+    choice = SimpleNamespace(index=0, message=message, finish_reason="stop")
+    usage = SimpleNamespace(
+        prompt_tokens=0, completion_tokens=0, total_tokens=0,
+        prompt_tokens_details=SimpleNamespace(cached_tokens=0),
+    )
+    return SimpleNamespace(
+        id=f"chatcmpl-{uuid.uuid4().hex[:12]}",
+        object="chat.completion",
+        created=int(time.time()),
+        model=model,
+        choices=[choice],
+        usage=usage,
+    )
+
+
+def _map_gemini_finish_reason(reason: str) -> str:
+    mapping = {
+        "STOP": "stop",
+        "MAX_TOKENS": "length",
+        "SAFETY": "content_filter",
+        "RECITATION": "content_filter",
+        "OTHER": "stop",
+    }
+    return mapping.get(reason.upper(), "stop")
+
+
+# =============================================================================
+# Streaming SSE iterator
+# =============================================================================
+
+class _GeminiStreamChunk(SimpleNamespace):
+    """Mimics an OpenAI ChatCompletionChunk with .choices[0].delta."""
+    pass
+
+
+def _make_stream_chunk(
+    *,
+    model: str,
+    content: str = "",
+    tool_call_delta: Optional[Dict[str, Any]] = None,
+    finish_reason: Optional[str] = None,
+    reasoning: str = "",
+) -> _GeminiStreamChunk:
+    delta_kwargs: Dict[str, Any] = {"role": "assistant"}
+    if content:
+        delta_kwargs["content"] = content
+    if tool_call_delta is not None:
+        delta_kwargs["tool_calls"] = [SimpleNamespace(
+            index=tool_call_delta.get("index", 0),
+            id=tool_call_delta.get("id") or f"call_{uuid.uuid4().hex[:12]}",
+            type="function",
+            function=SimpleNamespace(
+                name=tool_call_delta.get("name") or "",
+                arguments=tool_call_delta.get("arguments") or "",
+            ),
+        )]
+    if reasoning:
+        delta_kwargs["reasoning"] = reasoning
+        delta_kwargs["reasoning_content"] = reasoning
+    delta = SimpleNamespace(**delta_kwargs)
+    choice = SimpleNamespace(index=0, delta=delta, finish_reason=finish_reason)
+    return _GeminiStreamChunk(
+        id=f"chatcmpl-{uuid.uuid4().hex[:12]}",
+        object="chat.completion.chunk",
+        created=int(time.time()),
+        model=model,
+        choices=[choice],
+        usage=None,
+    )
+
+
+def _iter_sse_events(response: httpx.Response) -> Iterator[Dict[str, Any]]:
+    """Parse Server-Sent Events from an httpx streaming response."""
+    buffer = ""
+    for chunk in response.iter_text():
+        if not chunk:
+            continue
+        buffer += chunk
+        while "\n" in buffer:
+            line, buffer = buffer.split("\n", 1)
+            line = line.rstrip("\r")
+            if not line:
+                continue
+            if line.startswith("data: "):
+                data = line[6:]
+                if data == "[DONE]":
+                    return
+                try:
+                    yield json.loads(data)
+                except json.JSONDecodeError:
+                    logger.debug("Non-JSON SSE line: %s", data[:200])
+
+
+def _translate_stream_event(
+    event: Dict[str, Any],
+    model: str,
+    tool_call_indices: Dict[str, int],
+) -> List[_GeminiStreamChunk]:
+    """Unwrap Code Assist envelope and emit OpenAI-shaped chunk(s)."""
+    inner = event.get("response") if isinstance(event.get("response"), dict) else event
+    candidates = inner.get("candidates") or []
+    if not candidates:
+        return []
+    cand = candidates[0]
+    if not isinstance(cand, dict):
+        return []
+
+    chunks: List[_GeminiStreamChunk] = []
+
+    content = cand.get("content") or {}
+    parts = content.get("parts") if isinstance(content, dict) else []
+    for part in parts or []:
+        if not isinstance(part, dict):
+            continue
+        if part.get("thought") is True and isinstance(part.get("text"), str):
+            chunks.append(_make_stream_chunk(
+                model=model, reasoning=part["text"],
+            ))
+            continue
+        if isinstance(part.get("text"), str) and part["text"]:
+            chunks.append(_make_stream_chunk(model=model, content=part["text"]))
+        fc = part.get("functionCall")
+        if isinstance(fc, dict) and fc.get("name"):
+            name = str(fc["name"])
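+            # Give every functionCall part its own index so repeated calls
+            # to the same function do not collide on one index.
+            idx = len(tool_call_indices)
+            tool_call_indices[f"{idx}:{name}"] = idx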
+            try:
+                args_str = json.dumps(fc.get("args") or {}, ensure_ascii=False)
+            except (TypeError, ValueError):
+                args_str = "{}"
+            chunks.append(_make_stream_chunk(
+                model=model,
+                tool_call_delta={
+                    "index": idx,
+                    "name": name,
+                    "arguments": args_str,
+                },
+            ))
+
+    finish_reason_raw = str(cand.get("finishReason") or "")
+    if finish_reason_raw:
+        mapped = _map_gemini_finish_reason(finish_reason_raw)
+        if tool_call_indices:
+            mapped = "tool_calls"
+        chunks.append(_make_stream_chunk(model=model, finish_reason=mapped))
+    return chunks
+
+
+# =============================================================================
+# GeminiCloudCodeClient — OpenAI-compatible facade
+# =============================================================================
+
+MARKER_BASE_URL = "cloudcode-pa://google"
+
+
+class _GeminiChatCompletions:
+    def __init__(self, client: "GeminiCloudCodeClient"):
+        self._client = client
+
+    def create(self, **kwargs: Any) -> Any:
+        return self._client._create_chat_completion(**kwargs)
+
+
+class _GeminiChatNamespace:
+    def __init__(self, client: "GeminiCloudCodeClient"):
+        self.completions = _GeminiChatCompletions(client)
+
+
+class GeminiCloudCodeClient:
+    """Minimal OpenAI-SDK-compatible facade over Code Assist v1internal."""
+
+    def __init__(
+        self,
+        *,
+        api_key: Optional[str] = None,
+        base_url: Optional[str] = None,
+        default_headers: Optional[Dict[str, str]] = None,
+        project_id: str = "",
+        **_: Any,
+    ):
+        # `api_key` here is a dummy — real auth is the OAuth access token
+        # fetched on every call via agent.google_oauth.get_valid_access_token().
+        # We accept the kwarg for openai.OpenAI interface parity.
+        self.api_key = api_key or "google-oauth"
+        self.base_url = base_url or MARKER_BASE_URL
+        self._default_headers = dict(default_headers or {})
+        self._configured_project_id = project_id
+        self._project_context: Optional[ProjectContext] = None
+        self._project_context_lock = False  # simple single-thread guard
+        self.chat = _GeminiChatNamespace(self)
+        self.is_closed = False
+        self._http = httpx.Client(timeout=httpx.Timeout(connect=15.0, read=600.0, write=30.0, pool=30.0))
+
+    def close(self) -> None:
+        self.is_closed = True
+        try:
+            self._http.close()
+        except Exception:
+            pass
+
+    # Context-manager support, so the client can be used in a `with` block like the OpenAI SDK client
+    def __enter__(self):
+        return self
+
+    def __exit__(self, exc_type, exc_val, exc_tb):
+        self.close()
+
+    def _ensure_project_context(self, access_token: str, model: str) -> ProjectContext:
+        """Lazily resolve and cache the project context for this client."""
+        if self._project_context is not None:
+            return self._project_context
+
+        env_project = google_oauth.resolve_project_id_from_env()
+        creds = google_oauth.load_credentials()
+        stored_project = creds.project_id if creds else ""
+
+        # Prefer what's already baked into the creds
+        if stored_project:
+            self._project_context = ProjectContext(
+                project_id=stored_project,
+                managed_project_id=creds.managed_project_id if creds else "",
+                tier_id="",
+                source="stored",
+            )
+            return self._project_context
+
+        ctx = resolve_project_context(
+            access_token,
+            configured_project_id=self._configured_project_id,
+            env_project_id=env_project,
+            user_agent_model=model,
+        )
+        # Persist discovered project back to the creds file so the next
+        # session doesn't re-run the discovery.
+        if ctx.project_id or ctx.managed_project_id:
+            google_oauth.update_project_ids(
+                project_id=ctx.project_id,
+                managed_project_id=ctx.managed_project_id,
+            )
+        self._project_context = ctx
+        return ctx
+
+    def _create_chat_completion(
+        self,
+        *,
+        model: str = "gemini-2.5-flash",
+        messages: Optional[List[Dict[str, Any]]] = None,
+        stream: bool = False,
+        tools: Any = None,
+        tool_choice: Any = None,
+        temperature: Optional[float] = None,
+        max_tokens: Optional[int] = None,
+        top_p: Optional[float] = None,
+        stop: Any = None,
+        extra_body: Optional[Dict[str, Any]] = None,
+        timeout: Any = None,
+        **_: Any,
+    ) -> Any:
+        access_token = google_oauth.get_valid_access_token()
+        ctx = self._ensure_project_context(access_token, model)
+
+        thinking_config = None
+        if isinstance(extra_body, dict):
+            thinking_config = extra_body.get("thinking_config") or extra_body.get("thinkingConfig")
+
+        inner = build_gemini_request(
+            messages=messages or [],
+            tools=tools,
+            tool_choice=tool_choice,
+            temperature=temperature,
+            max_tokens=max_tokens,
+            top_p=top_p,
+            stop=stop,
+            thinking_config=thinking_config,
+        )
+        wrapped = wrap_code_assist_request(
+            project_id=ctx.project_id,
+            model=model,
+            inner_request=inner,
+        )
+
+        headers = {
+            "Content-Type": "application/json",
+            "Accept": "application/json",
+            "Authorization": f"Bearer {access_token}",
+            "User-Agent": "hermes-agent (gemini-cli-compat)",
+            "X-Goog-Api-Client": "gl-python/hermes",
+            "x-activity-request-id": str(uuid.uuid4()),
+        }
+        headers.update(self._default_headers)
+
+        if stream:
+            return self._stream_completion(model=model, wrapped=wrapped, headers=headers)
+
+        url = f"{CODE_ASSIST_ENDPOINT}/v1internal:generateContent"
+        response = self._http.post(url, json=wrapped, headers=headers)
+        if response.status_code != 200:
+            raise _gemini_http_error(response)
+        try:
+            payload = response.json()
+        except ValueError as exc:
+            raise CodeAssistError(
+                f"Invalid JSON from Code Assist: {exc}",
+                code="code_assist_invalid_json",
+            ) from exc
+        return _translate_gemini_response(payload, model=model)
+
+    def _stream_completion(
+        self,
+        *,
+        model: str,
+        wrapped: Dict[str, Any],
+        headers: Dict[str, str],
+    ) -> Iterator[_GeminiStreamChunk]:
+        """Generator that yields OpenAI-shaped streaming chunks."""
+        url = f"{CODE_ASSIST_ENDPOINT}/v1internal:streamGenerateContent?alt=sse"
+        stream_headers = dict(headers)
+        stream_headers["Accept"] = "text/event-stream"
+
+        def _generator() -> Iterator[_GeminiStreamChunk]:
+            try:
+                with self._http.stream("POST", url, json=wrapped, headers=stream_headers) as response:
+                    if response.status_code != 200:
+                        # Materialize error body for better diagnostics
+                        response.read()
+                        raise _gemini_http_error(response)
+                    tool_call_indices: Dict[str, int] = {}
+                    for event in _iter_sse_events(response):
+                        for chunk in _translate_stream_event(event, model, tool_call_indices):
+                            yield chunk
+            except httpx.HTTPError as exc:
+                raise CodeAssistError(
+                    f"Streaming request failed: {exc}",
+                    code="code_assist_stream_error",
+                ) from exc
+
+        return _generator()
+
+
+def _gemini_http_error(response: httpx.Response) -> CodeAssistError:
+    status = response.status_code
+    try:
+        body = response.text[:500]
+    except Exception:
+        body = ""
+    # Let run_agent's retry logic see auth errors as rotatable via `api_key`
+    code = f"code_assist_http_{status}"
+    if status == 401:
+        code = "code_assist_unauthorized"
+    elif status == 429:
+        code = "code_assist_rate_limited"
+    return CodeAssistError(
+        f"Code Assist returned HTTP {status}: {body}",
+        code=code,
+    )
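The line-buffered SSE parsing used by `_iter_sse_events` can be tried without a network connection. The sketch below is a standalone illustration and not part of the patch: `iter_sse_json` is a hypothetical helper that applies the same buffering approach to any iterable of text chunks, and the sample chunks are made up to show a JSON payload split across a chunk boundary.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def iter_sse_json(chunks: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield decoded JSON payloads from `data: ` lines (hypothetical helper)."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Only complete lines are processed; a partial line stays buffered
        # until the chunk carrying its newline arrives.
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            line = line.rstrip("\r")
            if not line.startswith("data: "):
                continue  # blank separators and other SSE fields are ignored
            data = line[6:]
            if data == "[DONE]":
                return
            try:
                yield json.loads(data)
            except json.JSONDecodeError:
                continue  # skip non-JSON payloads instead of failing


# Made-up input: the first payload is split across two chunks.
sample_chunks = ['data: {"a"', ': 1}\r\n\r\ndata: {"b": 2}\n', "data: [DONE]\n"]
events = list(iter_sse_json(sample_chunks))
print(events)
```

As in the patch, any trailing text that never receives a newline is left in the buffer and dropped when the stream ends.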
@@ -0,0 +1,417 @@
+"""Google Code Assist API client — project discovery, onboarding, quota.
+
+The Code Assist API powers Google's official gemini-cli. It sits at
+``cloudcode-pa.googleapis.com`` and provides:
+
+- Free tier access (generous daily quota) for personal Google accounts
+- Paid tier access via GCP projects with billing / Workspace / Standard / Enterprise
+
+This module handles the control-plane dance needed before inference:
+
+1. ``load_code_assist()`` — probe the user's account to learn what tier they're on
+   and whether a ``cloudaicompanionProject`` is already assigned.
+2. ``onboard_user()`` — if the user hasn't been onboarded yet (new account, fresh
+   free tier, etc.), call this with the chosen tier + project id. Supports LRO
+   polling for slow provisioning.
+3. ``retrieve_user_quota()`` — fetch the ``buckets[]`` array showing remaining
+   quota per model, used by the ``/gquota`` slash command.
+
+VPC-SC handling: enterprise accounts under a VPC Service Controls perimeter
+will get ``SECURITY_POLICY_VIOLATED`` on ``load_code_assist``. We catch this
+and force the account to ``standard-tier`` so the call chain still succeeds.
+
+Derived from opencode-gemini-auth (MIT) and clawdbot/extensions/google. The
+request/response shapes are specific to Google's internal Code Assist API,
+documented nowhere public — we copy them from the reference implementations.
+"""
+
+from __future__ import annotations
+
+import json
+import logging
+import os
+import time
+import urllib.error
+import urllib.parse
+import urllib.request
+import uuid
+from dataclasses import dataclass, field
+from typing import Any, Dict, List, Optional
+
+logger = logging.getLogger(__name__)
+
+
+# =============================================================================
+# Constants
+# =============================================================================
+
+CODE_ASSIST_ENDPOINT = "https://cloudcode-pa.googleapis.com"
+
+# Fallback endpoints tried when prod returns an error during project discovery
+FALLBACK_ENDPOINTS = [
+    "https://daily-cloudcode-pa.sandbox.googleapis.com",
+    "https://autopush-cloudcode-pa.sandbox.googleapis.com",
+]
+
+# Tier identifiers that Google's API uses
+FREE_TIER_ID = "free-tier"
+LEGACY_TIER_ID = "legacy-tier"
+STANDARD_TIER_ID = "standard-tier"
+
+# Default HTTP headers matching gemini-cli's fingerprint.
+# Google may reject unrecognized User-Agents on these internal endpoints.
+_GEMINI_CLI_USER_AGENT = "google-api-nodejs-client/9.15.1 (gzip)"
+_X_GOOG_API_CLIENT = "gl-node/24.0.0"
+_DEFAULT_REQUEST_TIMEOUT = 30.0
+_ONBOARDING_POLL_ATTEMPTS = 12
+_ONBOARDING_POLL_INTERVAL_SECONDS = 5.0
+
+
+class CodeAssistError(RuntimeError):
+    def __init__(self, message: str, *, code: str = "code_assist_error") -> None:
+        super().__init__(message)
+        self.code = code
+
+
+class ProjectIdRequiredError(CodeAssistError):
+    def __init__(self, message: str = "GCP project id required for this tier") -> None:
+        super().__init__(message, code="code_assist_project_id_required")
+
+
+# =============================================================================
+# HTTP primitive (auth via Bearer token passed per-call)
+# =============================================================================
+
+def _build_headers(access_token: str, *, user_agent_model: str = "") -> Dict[str, str]:
+    ua = _GEMINI_CLI_USER_AGENT
+    if user_agent_model:
+        ua = f"{ua} model/{user_agent_model}"
+    return {
+        "Content-Type": "application/json",
+        "Accept": "application/json",
+        "Authorization": f"Bearer {access_token}",
+        "User-Agent": ua,
+        "X-Goog-Api-Client": _X_GOOG_API_CLIENT,
+        "x-activity-request-id": str(uuid.uuid4()),
+    }
+
+
+def _client_metadata() -> Dict[str, str]:
+    """Match Google's gemini-cli exactly — unrecognized metadata may be rejected."""
+    return {
+        "ideType": "IDE_UNSPECIFIED",
+        "platform": "PLATFORM_UNSPECIFIED",
+        "pluginType": "GEMINI",
+    }
+
+
+def _post_json(
+    url: str,
+    body: Dict[str, Any],
+    access_token: str,
+    *,
+    timeout: float = _DEFAULT_REQUEST_TIMEOUT,
+    user_agent_model: str = "",
+) -> Dict[str, Any]:
+    data = json.dumps(body).encode("utf-8")
+    request = urllib.request.Request(
+        url, data=data, method="POST",
+        headers=_build_headers(access_token, user_agent_model=user_agent_model),
+    )
+    try:
+        with urllib.request.urlopen(request, timeout=timeout) as response:
+            raw = response.read().decode("utf-8", errors="replace")
+            return json.loads(raw) if raw else {}
+    except urllib.error.HTTPError as exc:
+        detail = ""
+        try:
+            detail = exc.read().decode("utf-8", errors="replace")
+        except Exception:
+            pass
+        # Special case: VPC-SC violation should be distinguishable
+        if _is_vpc_sc_violation(detail):
+            raise CodeAssistError(
+                f"VPC-SC policy violation: {detail}",
+                code="code_assist_vpc_sc",
+            ) from exc
+        raise CodeAssistError(
+            f"Code Assist HTTP {exc.code}: {detail or exc.reason}",
+            code=f"code_assist_http_{exc.code}",
+        ) from exc
+    except urllib.error.URLError as exc:
+        raise CodeAssistError(
+            f"Code Assist request failed: {exc}",
+            code="code_assist_network_error",
+        ) from exc
+
+
+def _is_vpc_sc_violation(body: str) -> bool:
+    """Detect a VPC Service Controls violation from a response body."""
+    if not body:
+        return False
+    try:
+        parsed = json.loads(body)
+    except (json.JSONDecodeError, ValueError):
+        return "SECURITY_POLICY_VIOLATED" in body
+    # Walk the nested error structure Google uses
+    error = parsed.get("error") if isinstance(parsed, dict) else None
+    if not isinstance(error, dict):
+        return False
+    details = error.get("details") or []
+    if isinstance(details, list):
+        for item in details:
+            if isinstance(item, dict):
+                reason = item.get("reason") or ""
+                if reason == "SECURITY_POLICY_VIOLATED":
+                    return True
+    msg = str(error.get("message", ""))
+    return "SECURITY_POLICY_VIOLATED" in msg
+
+
+# =============================================================================
+# load_code_assist — discovers current tier + assigned project
+# =============================================================================
+
+@dataclass
+class CodeAssistProjectInfo:
+    """Result from ``load_code_assist``."""
+    current_tier_id: str = ""
+    cloudaicompanion_project: str = ""   # Google-managed project (free tier)
+    allowed_tiers: List[str] = field(default_factory=list)
+    raw: Dict[str, Any] = field(default_factory=dict)
+
+
+def load_code_assist(
+    access_token: str,
+    *,
+    project_id: str = "",
+    user_agent_model: str = "",
+) -> CodeAssistProjectInfo:
+    """Call ``POST /v1internal:loadCodeAssist`` with prod → sandbox fallback.
+
+    Returns whatever tier + project info Google reports. On VPC-SC violations,
+    returns a synthetic ``standard-tier`` result so the chain can continue.
+    """
+    body: Dict[str, Any] = {
+        "metadata": {
+            "duetProject": project_id,
+            **_client_metadata(),
+        },
+    }
+    if project_id:
+        body["cloudaicompanionProject"] = project_id
+
+    endpoints = [CODE_ASSIST_ENDPOINT] + FALLBACK_ENDPOINTS
+    last_err: Optional[Exception] = None
+    for endpoint in endpoints:
+        url = f"{endpoint}/v1internal:loadCodeAssist"
+        try:
+            resp = _post_json(url, body, access_token, user_agent_model=user_agent_model)
+            return _parse_load_response(resp)
+        except CodeAssistError as exc:
+            if exc.code == "code_assist_vpc_sc":
+                logger.info("VPC-SC violation on %s — defaulting to standard-tier", endpoint)
+                return CodeAssistProjectInfo(
+                    current_tier_id=STANDARD_TIER_ID,
+                    cloudaicompanion_project=project_id,
+                )
+            last_err = exc
+            logger.warning("loadCodeAssist failed on %s: %s", endpoint, exc)
+            continue
+    if last_err:
+        raise last_err
+    return CodeAssistProjectInfo()
+
+
+def _parse_load_response(resp: Dict[str, Any]) -> CodeAssistProjectInfo:
+    current_tier = resp.get("currentTier") or {}
+    tier_id = str(current_tier.get("id") or "") if isinstance(current_tier, dict) else ""
+    project = str(resp.get("cloudaicompanionProject") or "")
+    allowed = resp.get("allowedTiers") or []
+    allowed_ids: List[str] = []
+    if isinstance(allowed, list):
+        for t in allowed:
+            if isinstance(t, dict):
+                tid = str(t.get("id") or "")
+                if tid:
+                    allowed_ids.append(tid)
+    return CodeAssistProjectInfo(
+        current_tier_id=tier_id,
+        cloudaicompanion_project=project,
+        allowed_tiers=allowed_ids,
+        raw=resp,
+    )
+
+
+# =============================================================================
+# onboard_user — provisions a new user on a tier (with LRO polling)
+# =============================================================================
+
+def onboard_user(
+    access_token: str,
+    *,
+    tier_id: str,
+    project_id: str = "",
+    user_agent_model: str = "",
+) -> Dict[str, Any]:
+    """Call ``POST /v1internal:onboardUser`` to provision the user.
+
+    For paid tiers, ``project_id`` is REQUIRED (raises ProjectIdRequiredError).
+    For the free and legacy tiers it is optional; on the free tier Google assigns one.
+
+    Returns the completed operation, or the initial response if polling times
+    out. Polls ``/v1internal/<name>`` up to ``_ONBOARDING_POLL_ATTEMPTS`` times at
+    ``_ONBOARDING_POLL_INTERVAL_SECONDS`` intervals (default: 12 × 5s = 1 min).
+    """
+    if tier_id != FREE_TIER_ID and tier_id != LEGACY_TIER_ID and not project_id:
+        raise ProjectIdRequiredError(
+            f"Tier {tier_id!r} requires a GCP project id. "
+            "Set HERMES_GEMINI_PROJECT_ID or GOOGLE_CLOUD_PROJECT."
+        )
+
+    body: Dict[str, Any] = {
+        "tierId": tier_id,
+        "metadata": _client_metadata(),
+    }
+    if project_id:
+        body["cloudaicompanionProject"] = project_id
+
+    endpoint = CODE_ASSIST_ENDPOINT
+    url = f"{endpoint}/v1internal:onboardUser"
+    resp = _post_json(url, body, access_token, user_agent_model=user_agent_model)
+
+    # Poll if LRO (long-running operation)
+    if not resp.get("done"):
+        op_name = resp.get("name", "")
+        if not op_name:
+            return resp
+        for attempt in range(_ONBOARDING_POLL_ATTEMPTS):
+            time.sleep(_ONBOARDING_POLL_INTERVAL_SECONDS)
+            poll_url = f"{endpoint}/v1internal/{op_name}"
+            try:
+                poll_resp = _post_json(poll_url, {}, access_token, user_agent_model=user_agent_model)
+            except CodeAssistError as exc:
+                logger.warning("Onboarding poll attempt %d failed: %s", attempt + 1, exc)
+                continue
+            if poll_resp.get("done"):
+                return poll_resp
+        logger.warning("Onboarding did not complete within %d attempts", _ONBOARDING_POLL_ATTEMPTS)
+    return resp
+
+
+# =============================================================================
+# retrieve_user_quota — for /gquota
+# =============================================================================
+
+@dataclass
+class QuotaBucket:
+    model_id: str
+    token_type: str = ""
+    remaining_fraction: float = 0.0
+    reset_time_iso: str = ""
+    raw: Dict[str, Any] = field(default_factory=dict)
+
+
+def retrieve_user_quota(
+    access_token: str,
+    *,
+    project_id: str = "",
+    user_agent_model: str = "",
+) -> List[QuotaBucket]:
+    """Call ``POST /v1internal:retrieveUserQuota`` and parse ``buckets[]``."""
+    body: Dict[str, Any] = {}
+    if project_id:
+        body["project"] = project_id
+    url = f"{CODE_ASSIST_ENDPOINT}/v1internal:retrieveUserQuota"
+    resp = _post_json(url, body, access_token, user_agent_model=user_agent_model)
+    raw_buckets = resp.get("buckets") or []
+    buckets: List[QuotaBucket] = []
+    if not isinstance(raw_buckets, list):
+        return buckets
+    for b in raw_buckets:
+        if not isinstance(b, dict):
+            continue
+        buckets.append(QuotaBucket(
+            model_id=str(b.get("modelId") or ""),
+            token_type=str(b.get("tokenType") or ""),
+            remaining_fraction=float(b.get("remainingFraction") or 0.0),
+            reset_time_iso=str(b.get("resetTime") or ""),
+            raw=b,
+        ))
+    return buckets
+
+
+# =============================================================================
+# Project context resolution
+# =============================================================================
+
+@dataclass
+class ProjectContext:
+    """Resolved state for a given OAuth session."""
+    project_id: str = ""           # effective project id sent on requests
+    managed_project_id: str = ""   # Google-assigned project (free tier)
+    tier_id: str = ""
+    source: str = ""               # "env", "config", "discovered", "onboarded", or "stored" (set by the client)
+
+
+def resolve_project_context(
+    access_token: str,
+    *,
+    configured_project_id: str = "",
+    env_project_id: str = "",
+    user_agent_model: str = "",
+) -> ProjectContext:
+    """Figure out what project id + tier to use for requests.
+
+    Priority:
+      1. If configured_project_id or env_project_id is set, use that directly
+         and short-circuit (no discovery needed).
+      2. Otherwise call loadCodeAssist to see what Google says.
+      3. If no tier assigned yet, onboard the user (free tier default).
+    """
+    # Short-circuit: caller provided a project id
+    if configured_project_id:
+        return ProjectContext(
+            project_id=configured_project_id,
+            tier_id=STANDARD_TIER_ID,  # assume paid since they specified one
+            source="config",
+        )
+    if env_project_id:
+        return ProjectContext(
+            project_id=env_project_id,
+            tier_id=STANDARD_TIER_ID,
+            source="env",
+        )
+
+    # Discover via loadCodeAssist
+    info = load_code_assist(access_token, user_agent_model=user_agent_model)
+
+    effective_project = info.cloudaicompanion_project
+    tier = info.current_tier_id
+
+    if not tier:
+        # User hasn't been onboarded — provision them on free tier
+        onboard_resp = onboard_user(
+            access_token,
+            tier_id=FREE_TIER_ID,
+            project_id="",
+            user_agent_model=user_agent_model,
+        )
+        # Re-parse from the onboard response
+        response_body = onboard_resp.get("response") or {}
+        if isinstance(response_body, dict):
+            effective_project = (
+                effective_project
+                or str(response_body.get("cloudaicompanionProject") or "")
+            )
+        tier = FREE_TIER_ID
+        source = "onboarded"
+    else:
+        source = "discovered"
+
+    return ProjectContext(
+        project_id=effective_project,
+        managed_project_id=effective_project if tier == FREE_TIER_ID else "",
+        tier_id=tier,
+        source=source,
+    )
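The priority order described in the `resolve_project_context` docstring can be sketched with the network calls replaced by injected callables. This is a simplified illustration under stated assumptions, not the patch's implementation: `Ctx`, `resolve`, `discover`, and `onboard` are hypothetical names, and the tier strings are copied from the constants above.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Ctx:
    """Hypothetical stand-in for ProjectContext."""
    project_id: str
    tier_id: str
    source: str


def resolve(
    configured: str,
    env: str,
    discover: Callable[[], Tuple[str, str]],  # returns (project, tier)
    onboard: Callable[[], str],               # returns the assigned project
) -> Ctx:
    # 1. An explicit project id short-circuits discovery; config beats env.
    if configured:
        return Ctx(configured, "standard-tier", "config")
    if env:
        return Ctx(env, "standard-tier", "env")
    # 2. Otherwise ask the backend what is already assigned.
    project, tier = discover()
    if tier:
        return Ctx(project, tier, "discovered")
    # 3. No tier yet: onboard on the free tier, keeping any known project.
    assigned = onboard()
    return Ctx(project or assigned, "free-tier", "onboarded")


def _unreachable():
    raise AssertionError("discovery should not run when a project id is given")


from_config = resolve("cfg-proj", "env-proj", _unreachable, _unreachable)
from_env = resolve("", "env-proj", _unreachable, _unreachable)
discovered = resolve("", "", lambda: ("p-1", "standard-tier"), _unreachable)
onboarded = resolve("", "", lambda: ("", ""), lambda: "managed-1")
print(from_config.source, from_env.source, discovered.source, onboarded.source)
```

Passing the discovery and onboarding steps in as callables keeps the ordering testable without real OAuth tokens or HTTP requests.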
@@ -28,6 +28,7 @@ Usage in run_agent.py:

 from __future__ import annotations

+import json
 import logging
 import re
 from typing import Any, Dict, List, Optional
@@ -43,11 +44,22 @@ logger = logging.getLogger(__name__)
 # ---------------------------------------------------------------------------

 _FENCE_TAG_RE = re.compile(r'</?\s*memory-context\s*>', re.IGNORECASE)
+_INTERNAL_CONTEXT_RE = re.compile(
+    r'<\s*memory-context\s*>[\s\S]*?</\s*memory-context\s*>',
+    re.IGNORECASE,
+)
+_INTERNAL_NOTE_RE = re.compile(
+    r'\[System note:\s*The following is recalled memory context,\s*NOT new user input\.\s*Treat as informational background data\.\]\s*',
+    re.IGNORECASE,
+)


 def sanitize_context(text: str) -> str:
-    """Strip fence-escape sequences from provider output."""
-    return _FENCE_TAG_RE.sub('', text)
+    """Strip fence tags, injected context blocks, and system notes from provider output."""
+    text = _INTERNAL_CONTEXT_RE.sub('', text)
+    text = _INTERNAL_NOTE_RE.sub('', text)
+    text = _FENCE_TAG_RE.sub('', text)
+    return text


 def build_memory_context_block(raw_context: str) -> str:
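The effect of the reworked `sanitize_context` can be checked on a small sample. The sketch below copies two of the three patterns from the hunk above (the system-note pattern is left out for brevity) and compares them with the previous tag-only behaviour; the sample string is made up for illustration.

```python
import re

# Patterns copied from the hunk above.
_FENCE_TAG_RE = re.compile(r'</?\s*memory-context\s*>', re.IGNORECASE)
_INTERNAL_CONTEXT_RE = re.compile(
    r'<\s*memory-context\s*>[\s\S]*?</\s*memory-context\s*>',
    re.IGNORECASE,
)


def strip_tags_only(text: str) -> str:
    """Previous behaviour: remove the tags but keep what was between them."""
    return _FENCE_TAG_RE.sub('', text)


def strip_blocks_then_tags(text: str) -> str:
    """New behaviour: drop whole blocks first, then any stray tags."""
    text = _INTERNAL_CONTEXT_RE.sub('', text)
    return _FENCE_TAG_RE.sub('', text)


# Made-up sample: one complete block plus one stray closing tag.
sample = "Hello <memory-context>secret recall</memory-context>world </MEMORY-CONTEXT>!"
before = strip_tags_only(sample)
after = strip_blocks_then_tags(sample)
print(repr(before))
print(repr(after))
```

Order matters here: running the tag-only substitution first would delete the delimiters that the block pattern needs in order to match.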
@@ -23,7 +23,7 @@ logger = logging.getLogger(__name__)
 # are preserved so the full model name reaches cache lookups and server queries.
 _PROVIDER_PREFIXES: frozenset[str] = frozenset({
    "openrouter", "nous", "openai-codex", "copilot", "copilot-acp",
-    "gemini", "zai", "kimi-coding", "kimi-coding-cn", "minimax", "minimax-cn", "anthropic", "deepseek",
+    "gemini", "ollama-cloud", "zai", "kimi-coding", "kimi-coding-cn", "minimax", "minimax-cn", "anthropic", "deepseek",
    "opencode-zen", "opencode-go", "ai-gateway", "kilocode", "alibaba",
    "qwen-oauth",
    "xiaomi",
@@ -33,6 +33,7 @@ _PROVIDER_PREFIXES: frozenset[str] = frozenset({
    "google", "google-gemini", "google-ai-studio",
    "glm", "z-ai", "z.ai", "zhipu", "github", "github-copilot",
    "github-models", "kimi", "moonshot", "kimi-cn", "moonshot-cn", "claude", "deep-seek",
+    "ollama",
    "opencode", "zen", "go", "vercel", "kilo", "dashscope", "aliyun", "qwen",
    "mimo", "xiaomi-mimo",
    "arcee-ai", "arceeai",
@@ -101,6 +102,8 @@ DEFAULT_CONTEXT_LENGTHS = {
    # fuzzy-match collisions (e.g. "anthropic/claude-sonnet-4" is a
    # substring of "anthropic/claude-sonnet-4.6").
    # OpenRouter-prefixed models resolve via OpenRouter live API or models.dev.
+    "claude-opus-4-7": 1000000,
+    "claude-opus-4.7": 1000000,
    "claude-opus-4-6": 1000000,
    "claude-sonnet-4-6": 1000000,
    "claude-opus-4.6": 1000000,
@@ -239,6 +242,7 @@ _URL_TO_PROVIDER: Dict[str, str] = {
    "api.x.ai": "xai",
    "api.xiaomimimo.com": "xiaomi",
    "xiaomimimo.com": "xiaomi",
+    "ollama.com": "ollama-cloud",
 }


@@ -1012,6 +1016,16 @@ def get_model_context_length(
        if ctx:
            return ctx

+    # 4b. AWS Bedrock — use static context length table.
+    # Bedrock's ListFoundationModels doesn't expose context window sizes,
+    # so we maintain a curated table in bedrock_adapter.py.
+    if provider == "bedrock" or (base_url and "bedrock-runtime" in base_url):
+        try:
+            from agent.bedrock_adapter import get_bedrock_context_length
+            return get_bedrock_context_length(model)
+        except ImportError:
+            pass  # boto3 not installed — fall through to generic resolution
+
    # 5. Provider-aware lookups (before generic OpenRouter cache)
    # These are provider-specific and take priority over the generic OR cache,
    # since the same model can have different context limits per provider
@@ -169,6 +169,7 @@ PROVIDER_TO_MODELS_DEV: Dict[str, str] = {
    "togetherai": "togetherai",
    "perplexity": "perplexity",
    "cohere": "cohere",
+    "ollama-cloud": "ollama-cloud",
 }

 # Reverse mapping: models.dev → Hermes (built lazily)
@@ -0,0 +1,182 @@
+"""Cross-session rate limit guard for Nous Portal.
+
+Writes rate limit state to a shared file so all sessions (CLI, gateway,
+cron, auxiliary) can check whether Nous Portal is currently rate-limited
+before making requests.  Prevents retry amplification when the hourly request quota (RPH) is exhausted.
+
+Each 429 from Nous triggers up to 9 API calls per conversation turn
+(3 SDK retries x 3 Hermes retries), and every one of those calls counts
+against RPH.  By recording the rate limit state on first 429 and checking
+it before subsequent attempts, we eliminate the amplification effect.
+"""
+
+from __future__ import annotations
+
+import json
+import logging
+import os
+import tempfile
+import time
+from typing import Any, Mapping, Optional
+
+logger = logging.getLogger(__name__)
+
+_STATE_SUBDIR = "rate_limits"
+_STATE_FILENAME = "nous.json"
+
+
+def _state_path() -> str:
+    """Return the path to the Nous rate limit state file."""
+    try:
+        from hermes_constants import get_hermes_home
+        base = get_hermes_home()
+    except ImportError:
+        base = os.path.join(os.path.expanduser("~"), ".hermes")
+    return os.path.join(base, _STATE_SUBDIR, _STATE_FILENAME)
+
+
+def _parse_reset_seconds(headers: Optional[Mapping[str, str]]) -> Optional[float]:
+    """Extract the best available reset-time estimate from response headers.
+
+    Priority:
+      1. x-ratelimit-reset-requests-1h  (hourly RPH window — most useful)
+      2. x-ratelimit-reset-requests     (per-minute RPM window)
+      3. retry-after                     (generic HTTP header)
+
+    Returns seconds-from-now, or None if no usable header found.
+    """
+    if not headers:
+        return None
+
+    lowered = {k.lower(): v for k, v in headers.items()}
+
+    for key in (
+        "x-ratelimit-reset-requests-1h",
+        "x-ratelimit-reset-requests",
+        "retry-after",
+    ):
+        raw = lowered.get(key)
+        if raw is not None:
+            try:
+                val = float(raw)
+                if val > 0:
+                    return val
+            except (TypeError, ValueError):
+                pass
+
+    return None
+
+
+def record_nous_rate_limit(
+    *,
+    headers: Optional[Mapping[str, str]] = None,
+    error_context: Optional[dict[str, Any]] = None,
+    default_cooldown: float = 300.0,
+) -> None:
+    """Record that Nous Portal is rate-limited.
+
+    Parses the reset time from response headers or error context.
+    Falls back to ``default_cooldown`` (5 minutes) if no reset info
+    is available.  Writes to a shared file that all sessions can read.
+
+    Args:
+        headers: HTTP response headers from the 429 error.
+        error_context: Structured error context from _extract_api_error_context().
+        default_cooldown: Fallback cooldown in seconds when no header data.
+    """
+    now = time.time()
+    reset_at = None
+
+    # Try headers first (most accurate)
+    header_seconds = _parse_reset_seconds(headers)
+    if header_seconds is not None:
+        reset_at = now + header_seconds
+
+    # Try error_context reset_at (from body parsing)
+    if reset_at is None and isinstance(error_context, dict):
+        ctx_reset = error_context.get("reset_at")
+        if isinstance(ctx_reset, (int, float)) and ctx_reset > now:
+            reset_at = float(ctx_reset)
+
+    # Default cooldown
+    if reset_at is None:
+        reset_at = now + default_cooldown
+
+    path = _state_path()
+    try:
+        state_dir = os.path.dirname(path)
+        os.makedirs(state_dir, exist_ok=True)
+
+        state = {
+            "reset_at": reset_at,
+            "recorded_at": now,
+            "reset_seconds": reset_at - now,
+        }
+
+        # Atomic write: write to temp file + rename
+        fd, tmp_path = tempfile.mkstemp(dir=state_dir, suffix=".tmp")
+        try:
+            with os.fdopen(fd, "w") as f:
+                json.dump(state, f)
+            os.replace(tmp_path, path)
+        except Exception:
+            # Clean up temp file on failure
+            try:
+                os.unlink(tmp_path)
+            except OSError:
+                pass
+            raise
+
+        logger.info(
+            "Nous rate limit recorded: resets in %.0fs (at %.0f)",
+            reset_at - now, reset_at,
+        )
+    except Exception as exc:
+        logger.debug("Failed to write Nous rate limit state: %s", exc)
+
+
+def nous_rate_limit_remaining() -> Optional[float]:
+    """Check if Nous Portal is currently rate-limited.
+
+    Returns:
+        Seconds remaining until reset, or None if not rate-limited.
+    """
+    path = _state_path()
+    try:
+        with open(path) as f:
+            state = json.load(f)
+        reset_at = state.get("reset_at", 0)
+        remaining = reset_at - time.time()
+        if remaining > 0:
+            return remaining
+        # Expired — clean up
+        try:
+            os.unlink(path)
+        except OSError:
+            pass
+        return None
+    # Unreadable, corrupt, or non-dict state: treat as not rate-limited.
+    except (OSError, ValueError, KeyError, TypeError, AttributeError):
+        return None
+
+
+def clear_nous_rate_limit() -> None:
+    """Clear the rate limit state (e.g., after a successful Nous request)."""
+    try:
+        os.unlink(_state_path())
+    except FileNotFoundError:
+        pass
+    except OSError as exc:
+        logger.debug("Failed to clear Nous rate limit state: %s", exc)
+
+
+def format_remaining(seconds: float) -> str:
+    """Format seconds remaining into human-readable duration."""
+    s = max(0, int(seconds))
+    if s < 60:
+        return f"{s}s"
+    if s < 3600:
+        m, sec = divmod(s, 60)
+        return f"{m}m {sec}s" if sec else f"{m}m"
+    h, remainder = divmod(s, 3600)
+    m = remainder // 60
+    return f"{h}h {m}m" if m else f"{h}h"
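A minimal sketch of how a caller might use a guard like the one above: check before sending, record on a 429, clear on success. The module's import path is not shown in this diff, so `InMemoryGuard` is a hypothetical in-memory stand-in for the file-backed functions, and `RateLimited`, `guarded_call`, and `always_429` are illustrative names, not part of the codebase.

```python
import time
from typing import Callable, Mapping, Optional


class RateLimited(Exception):
    """Hypothetical error type carrying the 429 response headers."""

    def __init__(self, headers: Mapping[str, str]):
        super().__init__("429 Too Many Requests")
        self.headers = headers


class InMemoryGuard:
    """Stand-in for the file-backed guard (same three operations)."""

    def __init__(self) -> None:
        self.reset_at: Optional[float] = None

    def record(self, headers: Mapping[str, str], default_cooldown: float = 300.0) -> None:
        # Simplified: only honours retry-after, else the default cooldown.
        try:
            seconds = float(headers.get("retry-after", ""))
        except ValueError:
            seconds = default_cooldown
        self.reset_at = time.time() + (seconds if seconds > 0 else default_cooldown)

    def remaining(self) -> Optional[float]:
        if self.reset_at is None:
            return None
        left = self.reset_at - time.time()
        return left if left > 0 else None

    def clear(self) -> None:
        self.reset_at = None


def guarded_call(guard: InMemoryGuard, send_request: Callable[[], str]) -> Optional[str]:
    """Skip the request while rate-limited; record on 429; clear on success."""
    if guard.remaining() is not None:
        return None  # still cooling down: do not spend another request
    try:
        result = send_request()
    except RateLimited as exc:
        guard.record(exc.headers)
        return None
    guard.clear()
    return result


def always_429() -> str:
    """Demo request that always hits the rate limit."""
    raise RateLimited({"retry-after": "60"})
```

The point of checking before the call is that a second attempt during the cooldown never reaches the network, which is what removes the retry amplification described in the module docstring.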
@@ -295,7 +295,9 @@ PLATFORM_HINTS = {
    ),
    "telegram": (
        "You are on a text messaging communication platform, Telegram. "
-        "Please do not use markdown as it does not render. "
+        "Standard markdown is automatically converted to Telegram format. "
+        "Supported: **bold**, *italic*, ~~strikethrough~~, ||spoiler||, "
+        "`inline code`, ```code blocks```, [links](url), and ## headers. "
        "You can send media files natively: to deliver a file to the user, "
        "include MEDIA:/absolute/path/to/file in your response. Images "
        "(.png, .jpg, .webp) appear as photos, audio (.ogg) sends as voice "
@@ -93,6 +93,17 @@ _DB_CONNSTR_RE = re.compile(
    re.IGNORECASE,
 )

+# JWT tokens: header.payload[.signature]. They always start with "eyJ", the
+# base64url prefix of a JSON header that begins with '{"'.
+# Matches 1-part (header only), 2-part (header.payload), and full 3-part JWTs.
+_JWT_RE = re.compile(
+    r"eyJ[A-Za-z0-9_-]{10,}"           # Header (always starts with eyJ)
+    r"(?:\.[A-Za-z0-9_=-]{4,}){0,2}"   # Optional payload and/or signature
+)
+
+# Discord user mentions: <@123456789012345678> or <@!123456789012345678> (nickname form).
+# Role mentions (<@&id>) are not matched by this pattern.
+# Snowflake IDs are 17-20 digit integers that resolve to specific Discord accounts.
+_DISCORD_MENTION_RE = re.compile(r"<@!?(\d{17,20})>")
+
 # E.164 phone numbers: +<country><number>, 7-15 digits
 # Negative lookahead prevents matching hex strings or identifiers
 _SIGNAL_PHONE_RE = re.compile(r"(\+[1-9]\d{6,14})(?![A-Za-z0-9])")
@@ -159,6 +170,12 @@ def redact_sensitive_text(text: str) -> str:
    # Database connection string passwords
    text = _DB_CONNSTR_RE.sub(lambda m: f"{m.group(1)}***{m.group(3)}", text)

+    # JWT tokens (eyJ... — base64-encoded JSON headers)
+    text = _JWT_RE.sub(lambda m: _mask_token(m.group(0)), text)
+
+    # Discord user mentions (<@snowflake_id> or <@!snowflake_id>)
+    text = _DISCORD_MENTION_RE.sub(lambda m: f"<@{'!' if '!' in m.group(0) else ''}***>", text)
+
    # E.164 phone numbers (Signal, WhatsApp)
    def _redact_phone(m):
        phone = m.group(1)
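A small sketch of what the two new patterns match. The regexes are copied from the hunk above; `mask` is a hypothetical stand-in for `_mask_token`, whose real behaviour is not shown in this diff, and `redact` is an illustrative wrapper rather than the actual `redact_sensitive_text`.

```python
import re

# Patterns copied from the hunk above.
JWT_RE = re.compile(
    r"eyJ[A-Za-z0-9_-]{10,}"           # header
    r"(?:\.[A-Za-z0-9_=-]{4,}){0,2}"   # optional payload and/or signature
)
DISCORD_MENTION_RE = re.compile(r"<@!?(\d{17,20})>")


def mask(token: str) -> str:
    """Hypothetical stand-in for _mask_token: keep a short prefix only."""
    return token[:4] + "***"


def redact(text: str) -> str:
    """Apply the JWT and Discord-mention redactions in the same order as the diff."""
    text = JWT_RE.sub(lambda m: mask(m.group(0)), text)
    # Preserve the optional "!" so the mention form stays recognisable.
    return DISCORD_MENTION_RE.sub(
        lambda m: "<@" + ("!" if "!" in m.group(0) else "") + "***>", text
    )
```

Note that a role mention such as `<@&123456789012345678>` passes through unchanged, because the pattern only allows an optional `!` after `<@`.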
@@ -12,6 +12,8 @@ from datetime import datetime
 from pathlib import Path
 from typing import Any, Dict, Optional

+from hermes_constants import display_hermes_home
+
 logger = logging.getLogger(__name__)

 _skill_commands: Dict[str, Dict[str, Any]] = {}
@@ -70,7 +72,14 @@ def _load_skill_payload(skill_identifier: str, task_id: str | None = None) -> tu
    skill_name = str(loaded_skill.get("name") or normalized)
    skill_path = str(loaded_skill.get("path") or "")
    skill_dir = None
-    if skill_path:
+    # Prefer the absolute skill_dir returned by skill_view() — this is
+    # correct for both local and external skills.  Fall back to the old
+    # SKILLS_DIR-relative reconstruction only when skill_dir is absent
+    # (e.g. legacy skill_view responses).
+    abs_skill_dir = loaded_skill.get("skill_dir")
+    if abs_skill_dir:
+        skill_dir = Path(abs_skill_dir)
+    elif skill_path:
        try:
            skill_dir = SKILLS_DIR / Path(skill_path).parent
        except Exception:
@@ -108,7 +117,7 @@ def _inject_skill_config(loaded_skill: dict[str, Any], parts: list[str]) -> None
        if not resolved:
            return

-        lines = ["", "[Skill config (from ~/.hermes/config.yaml):"]
+        lines = ["", f"[Skill config (from {display_hermes_home()}/config.yaml):"]
        for key, value in resolved.items():
            display_val = str(value) if value else "(not set)"
            lines.append(f"  {key} = {display_val}")
@@ -284,6 +284,80 @@ _OFFICIAL_DOCS_PRICING: Dict[tuple[str, str], PricingEntry] = {
        source_url="https://ai.google.dev/pricing",
        pricing_version="google-pricing-2026-03-16",
    ),
+    # AWS Bedrock — pricing per the Bedrock pricing page.
+    # Bedrock charges the same per-token rates as the model provider but
+    # through AWS billing.  These are the on-demand prices (no commitment).
+    # Source: https://aws.amazon.com/bedrock/pricing/
+    (
+        "bedrock",
+        "anthropic.claude-opus-4-6",
+    ): PricingEntry(
+        input_cost_per_million=Decimal("15.00"),
+        output_cost_per_million=Decimal("75.00"),
+        source="official_docs_snapshot",
+        source_url="https://aws.amazon.com/bedrock/pricing/",
+        pricing_version="bedrock-pricing-2026-04",
+    ),
+    (
+        "bedrock",
+        "anthropic.claude-sonnet-4-6",
+    ): PricingEntry(
+        input_cost_per_million=Decimal("3.00"),
+        output_cost_per_million=Decimal("15.00"),
+        source="official_docs_snapshot",
+        source_url="https://aws.amazon.com/bedrock/pricing/",
+        pricing_version="bedrock-pricing-2026-04",
+    ),
+    (
+        "bedrock",
+        "anthropic.claude-sonnet-4-5",
+    ): PricingEntry(
+        input_cost_per_million=Decimal("3.00"),
+        output_cost_per_million=Decimal("15.00"),
+        source="official_docs_snapshot",
+        source_url="https://aws.amazon.com/bedrock/pricing/",
+        pricing_version="bedrock-pricing-2026-04",
+    ),
+    (
+        "bedrock",
+        "anthropic.claude-haiku-4-5",
+    ): PricingEntry(
+        input_cost_per_million=Decimal("0.80"),
+        output_cost_per_million=Decimal("4.00"),
+        source="official_docs_snapshot",
+        source_url="https://aws.amazon.com/bedrock/pricing/",
+        pricing_version="bedrock-pricing-2026-04",
+    ),
+    (
+        "bedrock",
+        "amazon.nova-pro",
+    ): PricingEntry(
+        input_cost_per_million=Decimal("0.80"),
+        output_cost_per_million=Decimal("3.20"),
+        source="official_docs_snapshot",
+        source_url="https://aws.amazon.com/bedrock/pricing/",
+        pricing_version="bedrock-pricing-2026-04",
+    ),
+    (
+        "bedrock",
+        "amazon.nova-lite",
+    ): PricingEntry(
+        input_cost_per_million=Decimal("0.06"),
+        output_cost_per_million=Decimal("0.24"),
+        source="official_docs_snapshot",
+        source_url="https://aws.amazon.com/bedrock/pricing/",
+        pricing_version="bedrock-pricing-2026-04",
+    ),
+    (
+        "bedrock",
+        "amazon.nova-micro",
+    ): PricingEntry(
+        input_cost_per_million=Decimal("0.035"),
+        output_cost_per_million=Decimal("0.14"),
+        source="official_docs_snapshot",
+        source_url="https://aws.amazon.com/bedrock/pricing/",
+        pricing_version="bedrock-pricing-2026-04",
+    ),
 }


@@ -561,7 +561,10 @@ class BatchRunner:
            provider_sort (str): Sort providers by price/throughput/latency (optional)
            max_tokens (int): Maximum tokens for model responses (optional, uses model default if not set)
            reasoning_config (Dict): OpenRouter reasoning config override (e.g. {"effort": "none"} to disable thinking)
-            prefill_messages (List[Dict]): Messages to prepend as prefilled conversation context (few-shot priming)
+            prefill_messages (List[Dict]): Messages to prepend as prefilled conversation context (few-shot priming).
+                NOTE: Anthropic Sonnet 4.6+ and Opus 4.6+ reject a trailing assistant-role prefill
+                (400 error).  For those models use output_config.format or structured-output
+                schemas instead.  Safe here for user-role priming and for older Claude / non-Claude models.
            max_samples (int): Only process the first N samples from the dataset (optional, processes all if not set)
        """
        self.dataset_file = Path(dataset_file)
@@ -16,7 +16,7 @@ model:
  #   "nous"         - Nous Portal OAuth (requires: hermes login)
  #   "nous-api"     - Nous Portal API key (requires: NOUS_API_KEY)
  #   "anthropic"    - Direct Anthropic API (requires: ANTHROPIC_API_KEY)
-  #   "openai-codex" - OpenAI Codex (requires: hermes login --provider openai-codex)
+  #   "openai-codex" - OpenAI Codex (requires: hermes auth)
  #   "copilot"      - GitHub Copilot / GitHub Models (requires: GITHUB_TOKEN)
  #   "gemini"      - Use Google AI Studio direct (requires: GOOGLE_API_KEY or GEMINI_API_KEY)
  #   "zai"         - Use z.ai / ZhipuAI GLM models (requires: GLM_API_KEY)
@@ -26,6 +26,7 @@ model:
  #   "huggingface"  - Hugging Face Inference (requires: HF_TOKEN)
  #   "xiaomi"       - Xiaomi MiMo (requires: XIAOMI_API_KEY)
  #   "arcee"        - Arcee AI Trinity models (requires: ARCEEAI_API_KEY)
+  #   "ollama-cloud" - Ollama Cloud (requires: OLLAMA_API_KEY — https://ollama.com/settings)
  #   "kilocode"     - KiloCode gateway (requires: KILOCODE_API_KEY)
  #   "ai-gateway"   - Vercel AI Gateway (requires: AI_GATEWAY_API_KEY)
  #
@@ -37,12 +38,6 @@ model:
  #     base_url: "http://localhost:1234/v1"
  #   No API key needed — local servers typically ignore auth.
  #
-  #   For Ollama Cloud (https://ollama.com/pricing):
-  #     provider: "custom"
-  #     base_url: "https://ollama.com/v1"
-  #   Set OLLAMA_API_KEY in .env — automatically picked up when base_url
-  #   points to ollama.com.
-  #
  # Can also be overridden with --provider flag or HERMES_INFERENCE_PROVIDER env var.
  provider: "auto"
  
@@ -337,6 +332,7 @@ compression:
 #   "openrouter" - Force OpenRouter (requires OPENROUTER_API_KEY)
 #   "nous"       - Force Nous Portal (requires: hermes login)
 #   "gemini"      - Force Google AI Studio direct (requires: GOOGLE_API_KEY or GEMINI_API_KEY)
+#   "ollama-cloud" - Ollama Cloud (requires: OLLAMA_API_KEY)
 #   "codex"       - Force Codex OAuth (requires: hermes model → Codex).
 #                  Uses gpt-5.3-codex which supports vision.
 #   "main"       - Use your custom endpoint (OPENAI_BASE_URL + OPENAI_API_KEY).
@@ -564,6 +560,18 @@ platform_toolsets:
  homeassistant: [hermes-homeassistant]
  qqbot: [hermes-qqbot]

+# =============================================================================
+# Gateway Platform Settings
+# =============================================================================
+# Optional per-platform messaging settings.
+# Platform-specific knobs live under `extra`.
+#
+# platforms:
+#   telegram:
+#     reply_to_mode: "first"  # off | first | all
+#     extra:
+#       disable_link_previews: false  # Set true to suppress Telegram URL previews in bot messages
+
 # ─────────────────────────────────────────────────────────────────────────────
 # Available toolsets (use these names in platform_toolsets or the toolsets list)
 #
@@ -401,14 +401,27 @@ def load_cli_config() -> Dict[str, Any]:
    # filesystem is directly accessible.  For ALL remote/container backends
    # (ssh, docker, modal, singularity), the host path doesn't exist on the
    # target -- remove the key so terminal_tool.py uses its per-backend default.
-    if terminal_config.get("cwd") in (".", "auto", "cwd"):
-        effective_backend = terminal_config.get("env_type", "local")
-        if effective_backend == "local":
-            terminal_config["cwd"] = os.getcwd()
-            defaults["terminal"]["cwd"] = terminal_config["cwd"]
+    #
+    # GUARD: If TERMINAL_CWD is already set to a real absolute path (by the
+    # gateway's config bridge earlier in the process), don't clobber it.
+    # This prevents a lazy import of cli.py during gateway runtime from
+    # rewriting TERMINAL_CWD to the service's working directory.
+    # See issue #10817.
+    _CWD_PLACEHOLDERS = (".", "auto", "cwd")
+    if terminal_config.get("cwd") in _CWD_PLACEHOLDERS:
+        _existing_cwd = os.environ.get("TERMINAL_CWD", "")
+        if _existing_cwd and _existing_cwd not in _CWD_PLACEHOLDERS and os.path.isabs(_existing_cwd):
+            # Gateway (or earlier startup) already resolved a real path — keep it
+            terminal_config["cwd"] = _existing_cwd
+            defaults["terminal"]["cwd"] = _existing_cwd
        else:
-            # Remove so TERMINAL_CWD stays unset → tool picks backend default
-            terminal_config.pop("cwd", None)
+            effective_backend = terminal_config.get("env_type", "local")
+            if effective_backend == "local":
+                terminal_config["cwd"] = os.getcwd()
+                defaults["terminal"]["cwd"] = terminal_config["cwd"]
+            else:
+                # Remove so TERMINAL_CWD stays unset → tool picks backend default
+                terminal_config.pop("cwd", None)
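The decision order in this hunk can be sketched as a pure function. This is an illustration under assumptions, not the project's code: `resolve_cwd` and its parameters are hypothetical names, with `env_cwd` standing in for the `TERMINAL_CWD` environment value and `process_cwd` for `os.getcwd()`.

```python
import os
from typing import Optional

CWD_PLACEHOLDERS = (".", "auto", "cwd")


def resolve_cwd(
    configured: Optional[str],
    backend: str,
    env_cwd: str,
    process_cwd: str,
) -> Optional[str]:
    """Return the effective cwd, or None to let the backend pick its default."""
    if configured not in CWD_PLACEHOLDERS:
        return configured  # explicit value: leave untouched
    if env_cwd and env_cwd not in CWD_PLACEHOLDERS and os.path.isabs(env_cwd):
        return env_cwd  # already resolved earlier in the process: keep it
    if backend == "local":
        return process_cwd  # placeholder on a local backend: use the process cwd
    return None  # remote/container backend: host path is meaningless there
```

As in the diff, the guard on the existing absolute path is checked before the backend type, so it wins even for non-local backends.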
    
    env_mappings = {
        "env_type": "TERMINAL_ENV",
@@ -2013,7 +2026,17 @@ class HermesCLI:
        """Return the visible height for the spinner/status text line above the status bar."""
        if not getattr(self, "_spinner_text", ""):
            return 0
-        return 0 if self._use_minimal_tui_chrome(width=width) else 1
+        if self._use_minimal_tui_chrome(width=width):
+            return 0
+        # Compute how many lines the spinner text needs when wrapped.
+        # The rendered text is "  {emoji} {label}  ({elapsed})" — about
+        # len(_spinner_text) + 16 chars for indent + timer suffix.
+        width = width or self._get_tui_terminal_width()
+        if width and width > 10:
+            import math
+            text_len = len(self._spinner_text) + 16  # indent + timer
+            return max(1, math.ceil(text_len / width))
+        return 1
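The wrap estimate above reduces to one ceiling division. A standalone sketch, where `spinner_rows` is a hypothetical name and `overhead` mirrors the +16 allowance for indent and timer suffix:

```python
import math


def spinner_rows(text: str, width: int, overhead: int = 16) -> int:
    """Rows needed for the spinner line once it wraps at `width` columns."""
    if width <= 10:
        return 1  # width unknown or too narrow to estimate: assume one row
    return max(1, math.ceil((len(text) + overhead) / width))
```

For example, a 65-character label in an 80-column terminal is estimated at 81 characters and therefore two rows.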

    def _get_voice_status_fragments(self, width: Optional[int] = None):
        """Return the voice status bar fragments for the interactive TUI."""
@@ -3897,23 +3920,14 @@ class HermesCLI:
    
    def _handle_profile_command(self):
        """Display active profile name and home directory."""
-        from hermes_constants import get_hermes_home, display_hermes_home
+        from hermes_constants import display_hermes_home
+        from hermes_cli.profiles import get_active_profile_name

-        home = get_hermes_home()
        display = display_hermes_home()
-
-        profiles_parent = Path.home() / ".hermes" / "profiles"
-        try:
-            rel = home.relative_to(profiles_parent)
-            profile_name = str(rel).split("/")[0]
-        except ValueError:
-            profile_name = None
+        profile_name = get_active_profile_name()

        print()
-        if profile_name:
-            print(f"  Profile: {profile_name}")
-        else:
-            print("  Profile: default")
+        print(f"  Profile: {profile_name}")
        print(f"  Home:    {display}")
        print()

@@ -4100,6 +4114,8 @@ class HermesCLI:
                self.agent.flush_memories(self.conversation_history)
            except (Exception, KeyboardInterrupt):
                pass
+            # Trigger memory extraction on the old session before session_id rotates.
+            self.agent.commit_memory_session(self.conversation_history)
            self._notify_session_boundary("on_session_finalize")
        elif self.agent:
            # First session or empty history — still finalize the old session
@@ -4588,16 +4604,19 @@ class HermesCLI:
                self._close_model_picker()
                return
            provider_data = providers[selected]
-            model_list = []
-            try:
-                from hermes_cli.models import provider_model_ids
-                live = provider_model_ids(provider_data["slug"])
-                if live:
-                    model_list = live
-            except Exception:
-                pass
+            # Use the curated model list from list_authenticated_providers()
+            # (same lists as `hermes model` and gateway pickers).
+            # Only fall back to the live provider catalog when the curated
+            # list is empty (e.g. user-defined endpoints with no curated list).
+            model_list = provider_data.get("models", [])
            if not model_list:
-                model_list = provider_data.get("models", [])
+                try:
+                    from hermes_cli.models import provider_model_ids
+                    live = provider_model_ids(provider_data["slug"])
+                    if live:
+                        model_list = live
+                except Exception:
+                    pass
            state["stage"] = "model"
            state["provider_data"] = provider_data
            state["model_list"] = model_list
@@ -4905,6 +4924,52 @@ class HermesCLI:
            return "\n".join(p for p in parts if p)
        return str(value)

+    def _handle_gquota_command(self, cmd_original: str) -> None:
+        """Show Google Gemini Code Assist quota usage for the current OAuth account."""
+        try:
+            from agent.google_oauth import get_valid_access_token, GoogleOAuthError, load_credentials
+            from agent.google_code_assist import retrieve_user_quota, CodeAssistError
+        except ImportError as exc:
+            self.console.print(f"  [red]Gemini modules unavailable: {exc}[/]")
+            return
+
+        try:
+            access_token = get_valid_access_token()
+        except GoogleOAuthError as exc:
+            self.console.print(f"  [yellow]{exc}[/]")
+            self.console.print("  Run [bold]/model[/] and pick 'Google Gemini (OAuth)' to sign in.")
+            return
+
+        creds = load_credentials()
+        project_id = (creds.project_id if creds else "") or ""
+
+        try:
+            buckets = retrieve_user_quota(access_token, project_id=project_id)
+        except CodeAssistError as exc:
+            self.console.print(f"  [red]Quota lookup failed:[/] {exc}")
+            return
+
+        if not buckets:
+            self.console.print("  [dim]No quota buckets reported (account may be on legacy/unmetered tier).[/]")
+            return
+
+        # Sort for stable display, group by model
+        buckets.sort(key=lambda b: (b.model_id, b.token_type))
+        self.console.print()
+        self.console.print(f"  [bold]Gemini Code Assist quota[/]  (project: {project_id or '(auto / free-tier)'})")
+        self.console.print()
+        for b in buckets:
+            pct = max(0.0, min(1.0, b.remaining_fraction))
+            width = 20
+            filled = int(round(pct * width))
+            bar = "▓" * filled + "░" * (width - filled)
+            pct_str = f"{int(pct * 100):3d}%"
+            header = b.model_id
+            if b.token_type:
+                header += f" [{b.token_type}]"
+            self.console.print(f"    {header:40s}  {bar}  {pct_str}")
+        self.console.print()
+
    def _handle_personality_command(self, cmd: str):
        """Handle the /personality command to set predefined personalities."""
        parts = cmd.split(maxsplit=1)
@@ -5414,6 +5479,8 @@ class HermesCLI:
            self._handle_model_switch(cmd_original)
        elif canonical == "provider":
            self._show_model_and_providers()
+        elif canonical == "gquota":
+            self._handle_gquota_command(cmd_original)

        elif canonical == "personality":
            # Use original case (handler lowercases the personality name itself)
@@ -5488,7 +5555,8 @@ class HermesCLI:
                        version = f" v{p['version']}" if p["version"] else ""
                        tools = f"{p['tools']} tools" if p["tools"] else ""
                        hooks = f"{p['hooks']} hooks" if p["hooks"] else ""
-                        parts = [x for x in [tools, hooks] if x]
+                        commands = f"{p['commands']} commands" if p.get("commands") else ""
+                        parts = [x for x in [tools, hooks, commands] if x]
                        detail = f" ({', '.join(parts)})" if parts else ""
                        error = f" — {p['error']}" if p["error"] else ""
                        print(f"  {status} {p['name']}{version}{detail}{error}")
@@ -5953,7 +6021,7 @@ class HermesCLI:
        parts = cmd.strip().split(None, 1)
        sub = parts[1].lower().strip() if len(parts) > 1 else "status"

-        _DEFAULT_CDP = "http://localhost:9222"
+        _DEFAULT_CDP = "http://127.0.0.1:9222"
        current = os.environ.get("BROWSER_CDP_URL", "").strip()

        if sub.startswith("connect"):
@@ -6200,13 +6268,21 @@ class HermesCLI:
    def _toggle_yolo(self):
        """Toggle YOLO mode — skip all dangerous command approval prompts."""
        import os
+        from hermes_cli.colors import Colors as _Colors
+
        current = bool(os.environ.get("HERMES_YOLO_MODE"))
        if current:
            os.environ.pop("HERMES_YOLO_MODE", None)
-            self.console.print("  ⚠ YOLO mode [bold red]OFF[/] — dangerous commands will require approval.")
+            _cprint(
+                f"  ⚠ YOLO mode {_Colors.BOLD}{_Colors.RED}OFF{_Colors.RESET}"
+                " — dangerous commands will require approval."
+            )
        else:
            os.environ["HERMES_YOLO_MODE"] = "1"
-            self.console.print("  ⚡ YOLO mode [bold green]ON[/] — all commands auto-approved. Use with caution.")
+            _cprint(
+                f"  ⚡ YOLO mode {_Colors.BOLD}{_Colors.GREEN}ON{_Colors.RESET}"
+                " — all commands auto-approved. Use with caution."
+            )

    def _handle_reasoning_command(self, cmd: str):
        """Handle /reasoning — manage effort level and display toggle.
@@ -7383,7 +7459,15 @@ class HermesCLI:
        self._invalidate()

    def _get_approval_display_fragments(self):
-        """Render the dangerous-command approval panel for the prompt_toolkit UI."""
+        """Render the dangerous-command approval panel for the prompt_toolkit UI.
+
+        Layout priority: title + command + choices must always render, even if
+        the terminal is short or the description is long. Description is placed
+        at the bottom of the panel and gets truncated to fit the remaining row
+        budget. This prevents HSplit from clipping approve/deny off-screen when
+        tirith findings produce multi-paragraph descriptions or when the user
+        runs in a compact terminal pane.
+        """
        state = self._approval_state
        if not state:
            return []
@@ -7442,22 +7526,89 @@ class HermesCLI:
        box_width = _panel_box_width(title, preview_lines)
        inner_text_width = max(8, box_width - 2)

+        # Pre-wrap the mandatory content — command + choices must always render.
+        cmd_wrapped = _wrap_panel_text(cmd_display, inner_text_width)
+
+        # (choice_index, wrapped_line) so we can re-apply selected styling below
+        choice_wrapped: list[tuple[int, str]] = []
+        for i, choice in enumerate(choices):
+            label = choice_labels.get(choice, choice)
+            prefix = '❯ ' if i == selected else '  '
+            for wrapped in _wrap_panel_text(f"{prefix}{label}", inner_text_width, subsequent_indent="  "):
+                choice_wrapped.append((i, wrapped))
+
+        # Budget vertical space so HSplit never clips the command or choices.
+        # Panel chrome (full layout with separators):
+        #   top border + title + blank_after_title
+        #   + blank_between_cmd_choices + bottom border = 5 rows.
+        # In tight terminals we collapse to:
+        #   top border + title + bottom border = 3 rows (no blanks).
+        #
+        # reserved_below: rows consumed below the approval panel by the
+        # spinner/tool-progress line, status bar, input area, separators, and
+        # prompt symbol. Measured at ~6 rows during live PTY approval prompts;
+        # budget 6 so we don't overestimate the panel's room.
+        term_rows = shutil.get_terminal_size((100, 24)).lines
+        chrome_full = 5
+        chrome_tight = 3
+        reserved_below = 6
+
+        available = max(0, term_rows - reserved_below)
+        mandatory_full = chrome_full + len(cmd_wrapped) + len(choice_wrapped)
+
+        # If the full-chrome panel doesn't fit, drop the separator blanks.
+        # This keeps the command and every choice on-screen in compact terminals.
+        use_compact_chrome = mandatory_full > available
+        chrome_rows = chrome_tight if use_compact_chrome else chrome_full
+
+        # If the command itself is too long to leave room for choices (e.g. user
+        # hit "view" on a multi-hundred-character command), truncate it so the
+        # approve/deny buttons still render. Keep at least 1 row of command.
+        max_cmd_rows = max(1, available - chrome_rows - len(choice_wrapped))
+        if len(cmd_wrapped) > max_cmd_rows:
+            keep = max(1, max_cmd_rows - 1)
+            cmd_wrapped = cmd_wrapped[:keep] + ["… (command truncated — use /logs or /debug for full text)"]
+
+        # Allocate any remaining rows to description. The extra -1 in full mode
+        # accounts for the blank separator between choices and description.
+        mandatory_no_desc = chrome_rows + len(cmd_wrapped) + len(choice_wrapped)
+        desc_sep_cost = 0 if use_compact_chrome else 1
+        available_for_desc = available - mandatory_no_desc - desc_sep_cost
+        # Even on huge terminals, cap description height so the panel stays compact.
+        available_for_desc = max(0, min(available_for_desc, 10))
+
+        desc_wrapped = _wrap_panel_text(description, inner_text_width) if description else []
+        if available_for_desc < 1 or not desc_wrapped:
+            desc_wrapped = []
+        elif len(desc_wrapped) > available_for_desc:
+            keep = max(1, available_for_desc - 1)
+            desc_wrapped = desc_wrapped[:keep] + ["… (description truncated)"]
+
+        # Render: title → command → choices → description (description last so
+        # any remaining overflow clips from the bottom of the least-critical
+        # content, never from the command or choices). Use compact chrome (no
+        # blank separators) when the terminal is tight.
        lines = []
        lines.append(('class:approval-border', '╭' + ('─' * box_width) + '╮\n'))
        _append_panel_line(lines, 'class:approval-border', 'class:approval-title', title, box_width)
-        _append_blank_panel_line(lines, 'class:approval-border', box_width)
-        for wrapped in _wrap_panel_text(description, inner_text_width):
-            _append_panel_line(lines, 'class:approval-border', 'class:approval-desc', wrapped, box_width)
-        for wrapped in _wrap_panel_text(cmd_display, inner_text_width):
+        if not use_compact_chrome:
+            _append_blank_panel_line(lines, 'class:approval-border', box_width)
+
+        for wrapped in cmd_wrapped:
            _append_panel_line(lines, 'class:approval-border', 'class:approval-cmd', wrapped, box_width)
-        _append_blank_panel_line(lines, 'class:approval-border', box_width)
-        for i, choice in enumerate(choices):
-            label = choice_labels.get(choice, choice)
+        if not use_compact_chrome:
+            _append_blank_panel_line(lines, 'class:approval-border', box_width)
+
+        for i, wrapped in choice_wrapped:
            style = 'class:approval-selected' if i == selected else 'class:approval-choice'
-            prefix = '❯ ' if i == selected else '  '
-            for wrapped in _wrap_panel_text(f"{prefix}{label}", inner_text_width, subsequent_indent="  "):
-                _append_panel_line(lines, 'class:approval-border', style, wrapped, box_width)
-        _append_blank_panel_line(lines, 'class:approval-border', box_width)
+            _append_panel_line(lines, 'class:approval-border', style, wrapped, box_width)
+
+        if desc_wrapped:
+            if not use_compact_chrome:
+                _append_blank_panel_line(lines, 'class:approval-border', box_width)
+            for wrapped in desc_wrapped:
+                _append_panel_line(lines, 'class:approval-border', 'class:approval-desc', wrapped, box_width)
+
        lines.append(('class:approval-border', '╰' + ('─' * box_width) + '╯\n'))
        return lines
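
The truncate-with-marker step used for the command and description rows can be looked at in isolation. A minimal sketch, assuming the text is already wrapped into rows; `clip_rows` is a hypothetical helper for illustration and is not part of this patch:

```python
def clip_rows(rows: list[str], budget: int, marker: str) -> list[str]:
    """Hypothetical helper: keep rows within a budget, ending with a marker if clipped."""
    if len(rows) <= budget:
        return rows
    # Reserve one row for the marker, but always keep at least one row of content.
    keep = max(1, budget - 1)
    return rows[:keep] + [marker]


clipped = clip_rows(["a", "b", "c", "d"], budget=3, marker="… (truncated)")
untouched = clip_rows(["a", "b"], budget=3, marker="… (truncated)")
tight = clip_rows(["a", "b", "c"], budget=1, marker="… (truncated)")
```

With a budget of 1 the result is two rows (one content row plus the marker), so under this rule the panel can exceed the budget by one row in the tightest case.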

@@ -7753,7 +7904,33 @@ class HermesCLI:
                    # Fallback for non-interactive mode (e.g., single-query)
                    agent_thread.join(0.1)

-            agent_thread.join()  # Ensure agent thread completes
+            # Wait for the agent thread to finish.  After an interrupt the
+            # agent may take a few seconds to clean up (kill subprocess, persist
+            # session).  Poll instead of a blocking join so the process_loop
+            # stays responsive — if the user sent another interrupt or the
+            # agent gets stuck, we can break out instead of freezing forever.
+            if interrupt_msg is not None:
+                # Interrupt path: poll briefly, then move on.  The agent
+                # thread is a daemon thread, so it dies on process exit regardless.
+                for _wait_tick in range(50):  # 50 * 0.2s = 10s max
+                    agent_thread.join(timeout=0.2)
+                    if not agent_thread.is_alive():
+                        break
+                    # Check if user fired ANOTHER interrupt (Ctrl+C sets
+                    # _should_exit which process_loop checks on next pass).
+                    if getattr(self, '_should_exit', False):
+                        break
+                if agent_thread.is_alive():
+                    logger.warning(
+                        "Agent thread still alive after interrupt "
+                        "(thread %s). Daemon thread will be cleaned up "
+                        "on exit.",
+                        agent_thread.ident,
+                    )
+            else:
+                # Normal completion: agent thread should be done already,
+                # but guard against edge cases.
+                agent_thread.join(timeout=30)

            # Proactively clean up async clients whose event loop is dead.
            # The agent thread may have created AsyncOpenAI clients bound
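
The polling join in the hunk above follows a general pattern: wait in short slices so the caller can react to a stop signal between slices. A minimal sketch under stated assumptions; `join_with_polling` and `should_stop` are hypothetical names, not code from this patch:

```python
import threading
import time


def join_with_polling(thread, should_stop, tick=0.2, max_ticks=50):
    """Hypothetical helper: wait for a thread in short slices.

    Returns True if the thread finished, False if it is still alive when
    the tick budget runs out or should_stop() asks us to give up.
    """
    for _ in range(max_ticks):
        thread.join(timeout=tick)
        if not thread.is_alive():
            return True
        if should_stop():
            break
    return not thread.is_alive()


# A short-lived worker finishes well inside the polling budget.
worker = threading.Thread(target=time.sleep, args=(0.05,), daemon=True)
worker.start()
finished = join_with_polling(worker, should_stop=lambda: False, tick=0.02)

# A slower worker is abandoned after the first slice when a stop is requested.
slow = threading.Thread(target=time.sleep, args=(0.5,), daemon=True)
slow.start()
abandoned = join_with_polling(slow, should_stop=lambda: True, tick=0.01)
```

Both workers are daemon threads, as in the patch, so an abandoned one does not keep the process alive at exit.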
@@ -9046,6 +9223,7 @@ class HermesCLI:
        spinner_widget = Window(
            content=FormattedTextControl(get_spinner_text),
            height=get_spinner_height,
+            wrap_lines=True,
        )

        spacer = Window(
@@ -9082,7 +9260,13 @@ class HermesCLI:
            lines.append((border_style, "│" + (" " * box_width) + "│\n"))

        def _get_clarify_display():
-            """Build styled text for the clarify question/choices panel."""
+            """Build styled text for the clarify question/choices panel.
+
+            Layout priority: choices + Other option must always render even if
+            the question is very long. The question is budgeted to leave enough
+            rows for the choices and trailing chrome; anything over the budget
+            is truncated with a marker.
+            """
            state = cli_ref._clarify_state
            if not state:
                return []
@@ -9103,48 +9287,97 @@ class HermesCLI:
            box_width = _panel_box_width("Hermes needs your input", preview_lines)
            inner_text_width = max(8, box_width - 2)

+            # Pre-wrap choices + Other option — these are mandatory.
+            choice_wrapped: list[tuple[int, str]] = []
+            if choices:
+                for i, choice in enumerate(choices):
+                    prefix = '❯ ' if i == selected and not cli_ref._clarify_freetext else '  '
+                    for wrapped in _wrap_panel_text(f"{prefix}{choice}", inner_text_width, subsequent_indent="  "):
+                        choice_wrapped.append((i, wrapped))
+                # Trailing Other row(s)
+                other_idx = len(choices)
+                if selected == other_idx and not cli_ref._clarify_freetext:
+                    other_label_mand = '❯ Other (type your answer)'
+                elif cli_ref._clarify_freetext:
+                    other_label_mand = '❯ Other (type below)'
+                else:
+                    other_label_mand = '  Other (type your answer)'
+                other_wrapped = _wrap_panel_text(other_label_mand, inner_text_width, subsequent_indent="  ")
+            elif cli_ref._clarify_freetext:
+                # Freetext-only mode: the guidance line takes the place of choices.
+                other_wrapped = _wrap_panel_text(
+                    "Type your answer in the prompt below, then press Enter.",
+                    inner_text_width,
+                )
+            else:
+                other_wrapped = []
+
+            # Budget the question so mandatory rows always render.
+            # Chrome layouts:
+            #   full : top border + blank_after_title + blank_after_question
+            #          + blank_before_bottom + bottom border = 5 rows
+            #   tight: top border + bottom border = 2 rows (drop all blanks)
+            #
+            # reserved_below matches the approval-panel budget (~6 rows for
+            # spinner/tool-progress + status + input + separators + prompt).
+            term_rows = shutil.get_terminal_size((100, 24)).lines
+            chrome_full = 5
+            chrome_tight = 2
+            reserved_below = 6
+
+            available = max(0, term_rows - reserved_below)
+            mandatory_full = chrome_full + len(choice_wrapped) + len(other_wrapped)
+
+            use_compact_chrome = mandatory_full > available
+            chrome_rows = chrome_tight if use_compact_chrome else chrome_full
+
+            max_question_rows = max(1, available - chrome_rows - len(choice_wrapped) - len(other_wrapped))
+            max_question_rows = min(max_question_rows, 12)  # soft cap on huge terminals
+
+            question_wrapped = _wrap_panel_text(question, inner_text_width)
+            if len(question_wrapped) > max_question_rows:
+                keep = max(1, max_question_rows - 1)
+                question_wrapped = question_wrapped[:keep] + ["… (question truncated)"]
+
            lines = []
            # Box top border
            lines.append(('class:clarify-border', '╭─ '))
            lines.append(('class:clarify-title', 'Hermes needs your input'))
            lines.append(('class:clarify-border', ' ' + ('─' * max(0, box_width - len("Hermes needs your input") - 3)) + '╮\n'))
-            _append_blank_panel_line(lines, 'class:clarify-border', box_width)
+            if not use_compact_chrome:
+                _append_blank_panel_line(lines, 'class:clarify-border', box_width)

-            # Question text
-            for wrapped in _wrap_panel_text(question, inner_text_width):
+            # Question text (bounded)
+            for wrapped in question_wrapped:
                _append_panel_line(lines, 'class:clarify-border', 'class:clarify-question', wrapped, box_width)
-            _append_blank_panel_line(lines, 'class:clarify-border', box_width)
+            if not use_compact_chrome:
+                _append_blank_panel_line(lines, 'class:clarify-border', box_width)

            if cli_ref._clarify_freetext and not choices:
-                guidance = "Type your answer in the prompt below, then press Enter."
-                for wrapped in _wrap_panel_text(guidance, inner_text_width):
+                for wrapped in other_wrapped:
                    _append_panel_line(lines, 'class:clarify-border', 'class:clarify-choice', wrapped, box_width)
-                _append_blank_panel_line(lines, 'class:clarify-border', box_width)
+                if not use_compact_chrome:
+                    _append_blank_panel_line(lines, 'class:clarify-border', box_width)

            if choices:
                # Multiple-choice mode: show selectable options
-                for i, choice in enumerate(choices):
+                for i, wrapped in choice_wrapped:
                    style = 'class:clarify-selected' if i == selected and not cli_ref._clarify_freetext else 'class:clarify-choice'
-                    prefix = '❯ ' if i == selected and not cli_ref._clarify_freetext else '  '
-                    wrapped_lines = _wrap_panel_text(f"{prefix}{choice}", inner_text_width, subsequent_indent="  ")
-                    for wrapped in wrapped_lines:
-                        _append_panel_line(lines, 'class:clarify-border', style, wrapped, box_width)
+                    _append_panel_line(lines, 'class:clarify-border', style, wrapped, box_width)

-                # "Other" option (5th line, only shown when choices exist)
+                # "Other" option (trailing row(s), only shown when choices exist)
                other_idx = len(choices)
                if selected == other_idx and not cli_ref._clarify_freetext:
                    other_style = 'class:clarify-selected'
-                    other_label = '❯ Other (type your answer)'
                elif cli_ref._clarify_freetext:
                    other_style = 'class:clarify-active-other'
-                    other_label = '❯ Other (type below)'
                else:
                    other_style = 'class:clarify-choice'
-                    other_label = '  Other (type your answer)'
-                for wrapped in _wrap_panel_text(other_label, inner_text_width, subsequent_indent="  "):
+                for wrapped in other_wrapped:
                    _append_panel_line(lines, 'class:clarify-border', other_style, wrapped, box_width)

-            _append_blank_panel_line(lines, 'class:clarify-border', box_width)
+            if not use_compact_chrome:
+                _append_blank_panel_line(lines, 'class:clarify-border', box_width)
            lines.append(('class:clarify-border', '╰' + ('─' * box_width) + '╯\n'))
            return lines

@@ -10005,6 +10238,11 @@ def main(
                ):
                    cli.agent.quiet_mode = True
                    cli.agent.suppress_status_output = True
+                    # Suppress streaming display callbacks so stdout stays
+                    # machine-readable (no styled "Hermes" box, no tool-gen
+                    # status lines).  The response is printed once below.
+                    cli.agent.stream_delta_callback = None
+                    cli.agent.tool_gen_callback = None
                    result = cli.agent.run_conversation(
                        user_message=effective_query,
                        conversation_history=cli.conversation_history,
@@ -10012,7 +10250,8 @@ def main(
                    response = result.get("final_response", "") if isinstance(result, dict) else str(result)
                    if response:
                        print(response)
-                    print(f"\nsession_id: {cli.session_id}")
+                    # Session ID goes to stderr so piped stdout is clean.
+                    print(f"\nsession_id: {cli.session_id}", file=sys.stderr)
                    
                    # Ensure proper exit code for automation wrappers
                    sys.exit(1 if isinstance(result, dict) and result.get("failed") else 0)
@@ -501,6 +501,12 @@ def update_job(job_id: str, updates: Dict[str, Any]) -> Optional[Dict[str, Any]]

        if schedule_changed:
            updated_schedule = updated["schedule"]
+            # The API may pass schedule as a raw string (e.g. "every 10m")
+            # instead of a pre-parsed dict.  Normalize it the same way
+            # create_job() does so downstream code can call .get() safely.
+            if isinstance(updated_schedule, str):
+                updated_schedule = parse_schedule(updated_schedule)
+                updated["schedule"] = updated_schedule
            updated["schedule_display"] = updates.get(
                "schedule_display",
                updated_schedule.get("display", updated.get("schedule_display")),
@@ -10,6 +10,7 @@ runs at a time if multiple processes overlap.

 import asyncio
 import concurrent.futures
+import contextvars
 import json
 import logging
 import os
@@ -288,11 +289,13 @@ def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> Option

    if wrap_response:
        task_name = job.get("name", job["id"])
+        job_id = job.get("id", "")
        delivery_content = (
            f"Cronjob Response: {task_name}\n"
+            f"(job_id: {job_id})\n"
            f"-------------\n\n"
            f"{content}\n\n"
-            f"Note: The agent cannot see this message, and therefore cannot respond to it."
+            f"To stop or manage this job, send me a new message (e.g. \"stop reminder {task_name}\")."
        )
    else:
        delivery_content = content
@@ -768,7 +771,11 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
        _cron_inactivity_limit = _cron_timeout if _cron_timeout > 0 else None
        _POLL_INTERVAL = 5.0
        _cron_pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
-        _cron_future = _cron_pool.submit(agent.run_conversation, prompt)
+        # Preserve scheduler-scoped ContextVar state (for example skill-declared
+        # env passthrough registrations) when the cron run hops into the worker
+        # thread used for inactivity timeout monitoring.
+        _cron_context = contextvars.copy_context()
+        _cron_future = _cron_pool.submit(_cron_context.run, agent.run_conversation, prompt)
        _inactivity_timeout = False
        try:
            if _cron_inactivity_limit is None:
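
The reason for `copy_context()` here is that a function submitted to a `ThreadPoolExecutor` does not automatically run in the submitting thread's context. A minimal, self-contained sketch of the same technique; the `passthrough` variable is a hypothetical stand-in for the scheduler-scoped state mentioned in the comment:

```python
import concurrent.futures
import contextvars

# Hypothetical ContextVar standing in for scheduler-scoped state.
passthrough = contextvars.ContextVar("passthrough", default=())


def read_passthrough():
    # Runs in the worker thread; sees whatever context it was started in.
    return passthrough.get()


passthrough.set(("API_TOKEN",))

# Snapshot the caller's context and run the worker function inside it.
ctx = contextvars.copy_context()
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
    seen = pool.submit(ctx.run, read_passthrough).result()
```

Because the worker runs inside a copy, any `ContextVar.set()` calls it makes stay in that copy and do not leak back into the caller's context.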
@@ -830,6 +837,9 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
            )

        final_response = result.get("final_response", "") or ""
+        # Strip leaked placeholder text that upstream may inject on empty completions.
+        if final_response.strip() == "(No response generated)":
+            final_response = ""
        # Use a separate variable for log display; keep final_response clean
        # for delivery logic (empty response = no delivery).
        logged_response = final_response if final_response else "(No response generated)"
@@ -969,6 +979,13 @@ def tick(verbose: bool = True, adapters=None, loop=None) -> int:
                        delivery_error = str(de)
                        logger.error("Delivery failed for job %s: %s", job["id"], de)

+                # Treat empty final_response as a soft failure so last_status
+                # is not "ok" — the agent ran but produced nothing useful.
+                # (issue #8585)
+                if success and not final_response:
+                    success = False
+                    error = "Agent completed but produced empty response (model error, timeout, or misconfiguration)"
+
                mark_job_run(job["id"], success, error, delivery_error=delivery_error)
                executed += 1

@@ -554,6 +554,12 @@ def load_gateway_config() -> GatewayConfig:
                    bridged["mention_patterns"] = platform_cfg["mention_patterns"]
                if plat == Platform.DISCORD and "channel_skill_bindings" in platform_cfg:
                    bridged["channel_skill_bindings"] = platform_cfg["channel_skill_bindings"]
+                if "channel_prompts" in platform_cfg:
+                    channel_prompts = platform_cfg["channel_prompts"]
+                    if isinstance(channel_prompts, dict):
+                        bridged["channel_prompts"] = {str(k): v for k, v in channel_prompts.items()}
+                    else:
+                        bridged["channel_prompts"] = channel_prompts
                if not bridged:
                    continue
                plat_data = platforms_data.setdefault(plat.value, {})
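
The `str(k)` normalization above matters because a YAML loader typically parses unquoted numeric channel IDs as integers, while lookups use string IDs. A small sketch with made-up IDs and prompts, assuming the config arrives as a plain dict:

```python
# Hypothetical channel_prompts mapping as a YAML loader might produce it:
# one key parsed as int, one kept as str because it was quoted.
raw = {1234567890: "Be concise.", "987": "Use a formal tone."}

# Same normalization as in the patch: every key becomes a string.
normalized = {str(k): v for k, v in raw.items()}

missed = raw.get("1234567890")         # int key does not match a str lookup
found = normalized.get("1234567890")   # matches after normalization
```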
@@ -632,6 +638,18 @@ def load_gateway_config() -> GatewayConfig:
                    os.environ["TELEGRAM_IGNORED_THREADS"] = str(ignored_threads)
                if "reactions" in telegram_cfg and not os.getenv("TELEGRAM_REACTIONS"):
                    os.environ["TELEGRAM_REACTIONS"] = str(telegram_cfg["reactions"]).lower()
+                if "proxy_url" in telegram_cfg and not os.getenv("TELEGRAM_PROXY"):
+                    os.environ["TELEGRAM_PROXY"] = str(telegram_cfg["proxy_url"]).strip()
+                if "disable_link_previews" in telegram_cfg:
+                    plat_data = platforms_data.setdefault(Platform.TELEGRAM.value, {})
+                    if not isinstance(plat_data, dict):
+                        plat_data = {}
+                        platforms_data[Platform.TELEGRAM.value] = plat_data
+                    extra = plat_data.setdefault("extra", {})
+                    if not isinstance(extra, dict):
+                        extra = {}
+                        plat_data["extra"] = extra
+                    extra["disable_link_previews"] = telegram_cfg["disable_link_previews"]

            whatsapp_cfg = yaml_cfg.get("whatsapp", {})
            if isinstance(whatsapp_cfg, dict):
@@ -902,7 +902,7 @@ class APIServerAdapter(BasePlatformAdapter):
                return time.monotonic()

            # Stream content chunks as they arrive from the agent
-            loop = asyncio.get_event_loop()
+            loop = asyncio.get_running_loop()
            while True:
                try:
                    delta = await loop.run_in_executor(None, lambda: stream_q.get(timeout=0.5))
@@ -1241,7 +1241,7 @@ class APIServerAdapter(BasePlatformAdapter):
                    await _emit_text_delta(it)
                # Other types (non-string, non-tuple) are silently dropped.

-            loop = asyncio.get_event_loop()
+            loop = asyncio.get_running_loop()
            while True:
                try:
                    item = await loop.run_in_executor(None, lambda: stream_q.get(timeout=0.5))
@@ -2004,7 +2004,7 @@ class APIServerAdapter(BasePlatformAdapter):
        callers (e.g. the SSE writer) to call ``agent.interrupt()`` from
        another thread to stop in-progress LLM calls.
        """
-        loop = asyncio.get_event_loop()
+        loop = asyncio.get_running_loop()

        def _run():
            agent = self._create_agent(
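
`asyncio.get_running_loop()` must be called from inside a coroutine or callback and raises `RuntimeError` when no loop is running, which makes the intent in these three call sites explicit. A minimal sketch of the same queue-draining pattern; `drain` and the `None` sentinel are assumptions for illustration, not the adapter's actual protocol:

```python
import asyncio
import queue


async def drain(q, sentinel=None):
    """Hypothetical helper: collect items from a thread-safe queue without blocking the loop."""
    loop = asyncio.get_running_loop()
    items = []
    while True:
        try:
            # The blocking get() runs in the default executor, so the event
            # loop stays free to service other tasks while we wait.
            item = await loop.run_in_executor(None, lambda: q.get(timeout=0.5))
        except queue.Empty:
            continue
        if item is sentinel:
            return items
        items.append(item)


stream_q = queue.Queue()
for value in ("a", "b", None):
    stream_q.put(value)
result = asyncio.run(drain(stream_q))
```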
@@ -682,6 +682,10 @@ class MessageEvent:
    # Auto-loaded skill(s) for topic/channel bindings (e.g., Telegram DM Topics,
    # Discord channel_skill_bindings).  A single name or ordered list.
    auto_skill: Optional[str | list[str]] = None
+
+    # Per-channel ephemeral system prompt (e.g. Discord channel_prompts).
+    # Applied at API call time and never persisted to transcript history.
+    channel_prompt: Optional[str] = None
    
    # Internal flag — set for synthetic events (e.g. background process
    # completion notifications) that must bypass user authorization checks.
@@ -730,25 +734,56 @@ def merge_pending_message_event(
    pending_messages: Dict[str, MessageEvent],
    session_key: str,
    event: MessageEvent,
+    *,
+    merge_text: bool = False,
 ) -> None:
    """Store or merge a pending event for a session.

    Photo bursts/albums often arrive as multiple near-simultaneous PHOTO
    events. Merge those into the existing queued event so the next turn sees
-    the whole burst, while non-photo follow-ups still replace the pending
-    event normally.
+    the whole burst.
+
+    When ``merge_text`` is enabled, rapid follow-up TEXT events are appended
+    instead of replacing the pending turn. This is used for Telegram bursty
+    follow-ups so a multi-part user thought is not silently truncated to only
+    the last queued fragment.
    """
    existing = pending_messages.get(session_key)
-    if (
-        existing
-        and getattr(existing, "message_type", None) == MessageType.PHOTO
-        and event.message_type == MessageType.PHOTO
-    ):
-        existing.media_urls.extend(event.media_urls)
-        existing.media_types.extend(event.media_types)
-        if event.text:
-            existing.text = BasePlatformAdapter._merge_caption(existing.text, event.text)
-        return
+    if existing:
+        existing_is_photo = getattr(existing, "message_type", None) == MessageType.PHOTO
+        incoming_is_photo = event.message_type == MessageType.PHOTO
+        existing_has_media = bool(existing.media_urls)
+        incoming_has_media = bool(event.media_urls)
+
+        if existing_is_photo and incoming_is_photo:
+            existing.media_urls.extend(event.media_urls)
+            existing.media_types.extend(event.media_types)
+            if event.text:
+                existing.text = BasePlatformAdapter._merge_caption(existing.text, event.text)
+            return
+
+        if existing_has_media or incoming_has_media:
+            if incoming_has_media:
+                existing.media_urls.extend(event.media_urls)
+                existing.media_types.extend(event.media_types)
+            if event.text:
+                if existing.text:
+                    existing.text = BasePlatformAdapter._merge_caption(existing.text, event.text)
+                else:
+                    existing.text = event.text
+            if existing_is_photo or incoming_is_photo:
+                existing.message_type = MessageType.PHOTO
+            return
+
+        if (
+            merge_text
+            and getattr(existing, "message_type", None) == MessageType.TEXT
+            and event.message_type == MessageType.TEXT
+        ):
+            if event.text:
+                existing.text = f"{existing.text}\n{event.text}" if existing.text else event.text
+            return
+
    pending_messages[session_key] = event


@@ -776,6 +811,36 @@ _RETRYABLE_ERROR_PATTERNS = (
 MessageHandler = Callable[[MessageEvent], Awaitable[Optional[str]]]


+def resolve_channel_prompt(
+    config_extra: dict,
+    channel_id: str,
+    parent_id: str | None = None,
+) -> str | None:
+    """Resolve a per-channel ephemeral prompt from platform config.
+
+    Looks up ``channel_prompts`` in the adapter's ``config.extra`` dict.
+    Prefers an exact match on *channel_id*; falls back to *parent_id*
+    (useful for forum threads / child channels inheriting a parent prompt).
+
+    Returns the prompt string, or None if no match is found.  Blank/whitespace-
+    only prompts are treated as absent.
+    """
+    prompts = config_extra.get("channel_prompts") or {}
+    if not isinstance(prompts, dict):
+        return None
+
+    for key in (channel_id, parent_id):
+        if not key:
+            continue
+        prompt = prompts.get(key)
+        if prompt is None:
+            continue
+        prompt = str(prompt).strip()
+        if prompt:
+            return prompt
+    return None
+
+
 class BasePlatformAdapter(ABC):
    """
    Base class for platform adapters.
@@ -805,6 +870,11 @@ class BasePlatformAdapter(ABC):
        # Gateway shutdown cancels these so an old gateway instance doesn't keep
        # working on a task after --replace or manual restarts.
        self._background_tasks: set[asyncio.Task] = set()
+        # One-shot callbacks to fire after the main response is delivered.
+        # Keyed by session_key.  GatewayRunner uses this to defer
+        # background-review notifications ("💾 Skill created") until the
+        # primary reply has been sent.
+        self._post_delivery_callbacks: Dict[str, Callable] = {}
        self._expected_cancelled_tasks: set[asyncio.Task] = set()
        self._busy_session_handler: Optional[Callable[[MessageEvent, str], Awaitable[bool]]] = None
        # Chats where auto-TTS on voice input is disabled (set by /voice off)
@@ -1221,7 +1291,7 @@ class BasePlatformAdapter(ABC):
                path = path[1:-1].strip()
            path = path.lstrip("`\"'").rstrip("`\"',.;:)}]")
            if path:
-                media.append((path, has_voice_tag))
+                media.append((os.path.expanduser(path), has_voice_tag))

        # Remove MEDIA tags from content (including surrounding quote/backtick wrappers)
        if media:
@@ -1509,7 +1579,7 @@ class BasePlatformAdapter(ABC):
            # session lifecycle and its cleanup races with the running task
            # (see PR #4926).
            cmd = event.get_command()
-            if cmd in ("approve", "deny", "status", "stop", "new", "reset", "background", "restart"):
+            if cmd in ("approve", "deny", "status", "stop", "new", "reset", "background", "restart", "queue", "q"):
                logger.debug(
                    "[%s] Command '/%s' bypassing active-session guard for %s",
                    self.name, cmd, session_key,
@@ -1624,6 +1694,21 @@ class BasePlatformAdapter(ABC):
            # streaming already delivered the text (already_sent=True) or
            # when the message was queued behind an active agent.  Log at
            # DEBUG to avoid noisy warnings for expected behavior.
+            #
+            # Suppress stale response when the session was interrupted by a
+            # new message that hasn't been consumed yet.  The pending message
+            # is processed by the pending-message handler below (#8221/#2483).
+            if (
+                response
+                and interrupt_event.is_set()
+                and session_key in self._pending_messages
+            ):
+                logger.info(
+                    "[%s] Suppressing stale response for interrupted session %s",
+                    self.name,
+                    session_key,
+                )
+                response = None
            if not response:
                logger.debug("[%s] Handler returned empty/None response for %s", self.name, event.source.chat_id)
            if response:
@@ -1845,6 +1930,14 @@ class BasePlatformAdapter(ABC):
            except Exception:
                pass  # Last resort — don't let error reporting crash the handler
        finally:
+            # Fire any one-shot post-delivery callback registered for this
+            # session (e.g. deferred background-review notifications).
+            _post_cb = getattr(self, "_post_delivery_callbacks", {}).pop(session_key, None)
+            if callable(_post_cb):
+                try:
+                    _post_cb()
+                except Exception:
+                    pass
            # Stop typing indicator
            typing_task.cancel()
            try:
@@ -235,6 +235,7 @@ class VoiceReceiver:
        # Calculate dynamic RTP header size (RFC 9335 / rtpsize mode)
        cc = first_byte & 0x0F  # CSRC count
        has_extension = bool(first_byte & 0x10)  # extension bit
+        has_padding = bool(first_byte & 0x20)  # padding bit (RFC 3550 §5.1)
        header_size = 12 + (4 * cc) + (4 if has_extension else 0)

        if len(data) < header_size + 4:  # need at least header + nonce
@@ -278,6 +279,31 @@ class VoiceReceiver:
        if ext_data_len and len(decrypted) > ext_data_len:
            decrypted = decrypted[ext_data_len:]

+        # --- Strip RTP padding (RFC 3550 §5.1) ---
+        # When the P bit is set, the last payload byte holds the count of
+        # trailing padding bytes (including itself) that must be removed
+        # before further processing. Skipping this passes padding-contaminated
+        # bytes into DAVE/Opus and corrupts inbound audio.
+        if has_padding:
+            if not decrypted:
+                if self._packet_debug_count <= 10:
+                    logger.warning(
+                        "RTP padding bit set but no payload (ssrc=%d)", ssrc,
+                    )
+                return
+            pad_len = decrypted[-1]
+            if pad_len == 0 or pad_len > len(decrypted):
+                if self._packet_debug_count <= 10:
+                    logger.warning(
+                        "Invalid RTP padding length %d for payload size %d (ssrc=%d)",
+                        pad_len, len(decrypted), ssrc,
+                    )
+                return
+            decrypted = decrypted[:-pad_len]
+            if not decrypted:
+                # Padding consumed entire payload — nothing to decode
+                return
+
        # --- DAVE E2EE decrypt ---
        if self._dave_session:
            with self._lock:
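
The padding rule from RFC 3550 §5.1 can be checked on its own. A minimal sketch, assuming the caller has already confirmed that the P bit is set; `strip_rtp_padding` is a hypothetical helper, not a function in this patch:

```python
from typing import Optional


def strip_rtp_padding(payload: bytes) -> Optional[bytes]:
    """Hypothetical helper: drop RTP padding, or return None if the padding is invalid."""
    if not payload:
        return None
    # The last byte holds the padding length, counting itself.
    pad_len = payload[-1]
    if pad_len == 0 or pad_len > len(payload):
        return None
    return payload[:-pad_len]


stripped = strip_rtp_padding(b"\x01\x02\x03\x00\x00\x03")
zero_len = strip_rtp_padding(b"\x01\x00")
too_long = strip_rtp_padding(b"\x01\x05")
all_padding = strip_rtp_padding(b"\x02\x02")
```

The sketch returns an empty payload when padding consumes everything; the patch drops the packet in that case, since there is nothing left to decode.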
@@ -1379,6 +1405,68 @@ class DiscordAdapter(BasePlatformAdapter):
            )
            return await super().send_image(chat_id, image_url, caption, reply_to)

+    async def send_animation(
+        self,
+        chat_id: str,
+        animation_url: str,
+        caption: Optional[str] = None,
+        reply_to: Optional[str] = None,
+        metadata: Optional[Dict[str, Any]] = None,
+    ) -> SendResult:
+        """Send an animated GIF natively as a Discord file attachment."""
+        if not self._client:
+            return SendResult(success=False, error="Not connected")
+
+        if not is_safe_url(animation_url):
+            logger.warning("[%s] Blocked unsafe animation URL during Discord send_animation", self.name)
+            return await super().send_animation(chat_id, animation_url, caption, reply_to, metadata=metadata)
+
+        try:
+            import aiohttp
+
+            channel = self._client.get_channel(int(chat_id))
+            if not channel:
+                channel = await self._client.fetch_channel(int(chat_id))
+            if not channel:
+                return SendResult(success=False, error=f"Channel {chat_id} not found")
+
+            # Download the GIF and send as a Discord file attachment
+            # (Discord renders .gif attachments as auto-playing animations inline)
+            from gateway.platforms.base import resolve_proxy_url, proxy_kwargs_for_aiohttp
+            _proxy = resolve_proxy_url(platform_env_var="DISCORD_PROXY")
+            _sess_kw, _req_kw = proxy_kwargs_for_aiohttp(_proxy)
+            async with aiohttp.ClientSession(**_sess_kw) as session:
+                async with session.get(animation_url, timeout=aiohttp.ClientTimeout(total=30), **_req_kw) as resp:
+                    if resp.status != 200:
+                        raise Exception(f"Failed to download animation: HTTP {resp.status}")
+
+                    animation_data = await resp.read()
+
+                    import io
+                    file = discord.File(io.BytesIO(animation_data), filename="animation.gif")
+
+                    msg = await channel.send(
+                        content=caption if caption else None,
+                        file=file,
+                    )
+                    return SendResult(success=True, message_id=str(msg.id))
+
+        except ImportError:
+            logger.warning(
+                "[%s] aiohttp not installed, falling back to URL. Run: pip install aiohttp",
+                self.name,
+                exc_info=True,
+            )
+            return await super().send_animation(chat_id, animation_url, caption, reply_to, metadata=metadata)
+        except Exception as e:  # pragma: no cover - defensive logging
+            logger.error(
+                "[%s] Failed to send animation attachment, falling back to URL: %s",
+                self.name,
+                e,
+                exc_info=True,
+            )
+            return await super().send_animation(chat_id, animation_url, caption, reply_to, metadata=metadata)
+
    async def send_video(
        self,
        chat_id: str,
@@ -1740,6 +1828,76 @@ class DiscordAdapter(BasePlatformAdapter):
        async def slash_btw(interaction: discord.Interaction, question: str):
            await self._run_simple_slash(interaction, f"/btw {question}")

+        # ── Auto-register any gateway-available commands not yet on the tree ──
+        # This ensures new commands added to COMMAND_REGISTRY in
+        # hermes_cli/commands.py automatically appear as Discord slash
+        # commands without needing a manual entry here.
+        try:
+            from hermes_cli.commands import COMMAND_REGISTRY, _is_gateway_available, _resolve_config_gates
+
+            already_registered = set()
+            try:
+                already_registered = {cmd.name for cmd in tree.get_commands()}
+            except Exception:
+                pass
+
+            config_overrides = _resolve_config_gates()
+
+            for cmd_def in COMMAND_REGISTRY:
+                if not _is_gateway_available(cmd_def, config_overrides):
+                    continue
+                # Discord command names: lowercase, hyphens OK, max 32 chars.
+                discord_name = cmd_def.name.lower()[:32]
+                if discord_name in already_registered:
+                    continue
+                # Only canonical command names are auto-registered in this loop;
+                # aliases for explicitly registered commands are handled above.
+                desc = (cmd_def.description or f"Run /{cmd_def.name}")[:100]
+                has_args = bool(cmd_def.args_hint)
+
+                if has_args:
+                    # Command takes optional arguments — create handler with
+                    # an optional ``args`` string parameter.
+                    def _make_args_handler(_name: str, _hint: str):
+                        @discord.app_commands.describe(args=f"Arguments: {_hint}"[:100])
+                        async def _handler(interaction: discord.Interaction, args: str = ""):
+                            await self._run_simple_slash(
+                                interaction, f"/{_name} {args}".strip()
+                            )
+                        _handler.__name__ = f"auto_slash_{_name.replace('-', '_')}"
+                        return _handler
+
+                    handler = _make_args_handler(cmd_def.name, cmd_def.args_hint)
+                else:
+                    # Parameterless command.
+                    def _make_simple_handler(_name: str):
+                        async def _handler(interaction: discord.Interaction):
+                            await self._run_simple_slash(interaction, f"/{_name}")
+                        _handler.__name__ = f"auto_slash_{_name.replace('-', '_')}"
+                        return _handler
+
+                    handler = _make_simple_handler(cmd_def.name)
+
+                auto_cmd = discord.app_commands.Command(
+                    name=discord_name,
+                    description=desc,
+                    callback=handler,
+                )
+                try:
+                    tree.add_command(auto_cmd)
+                    already_registered.add(discord_name)
+                except Exception:
+                    # Silently skip commands that fail registration (e.g.
+                    # name conflict with a subcommand group).
+                    pass
+
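+            # already_registered holds pre-existing names plus the ones added
+            # above, so this is the total tracked count, not only new commands.
+            logger.debug(
+                "Discord auto-register from COMMAND_REGISTRY done: %d command names tracked",
+                len(already_registered),
+            )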
+        except Exception as e:
+            logger.warning("Discord auto-register from COMMAND_REGISTRY failed: %s", e)
+
        # Register skills under a single /skill command group with category
        # subcommand groups.  This uses 1 top-level slot instead of N,
        # supporting up to 25 categories × 25 skills = 625 skills.
@@ -1860,11 +2018,14 @@ class DiscordAdapter(BasePlatformAdapter):
        )

        msg_type = MessageType.COMMAND if text.startswith("/") else MessageType.TEXT
+        channel_id = str(interaction.channel_id)
+        parent_id = str(getattr(getattr(interaction, "channel", None), "parent_id", "") or "")
        return MessageEvent(
            text=text,
            message_type=msg_type,
            source=source,
            raw_message=interaction,
+            channel_prompt=self._resolve_channel_prompt(channel_id, parent_id or None),
        )

    # ------------------------------------------------------------------
@@ -1935,14 +2096,17 @@ class DiscordAdapter(BasePlatformAdapter):
            chat_topic=chat_topic,
        )

-        _parent_id = str(getattr(getattr(interaction, "channel", None), "parent_id", "") or "")
+        _parent_channel = self._thread_parent_channel(getattr(interaction, "channel", None))
+        _parent_id = str(getattr(_parent_channel, "id", "") or "")
        _skills = self._resolve_channel_skills(thread_id, _parent_id or None)
+        _channel_prompt = self._resolve_channel_prompt(thread_id, _parent_id or None)
        event = MessageEvent(
            text=text,
            message_type=MessageType.TEXT,
            source=source,
            raw_message=interaction,
            auto_skill=_skills,
+            channel_prompt=_channel_prompt,
        )
        await self.handle_message(event)

@@ -1971,6 +2135,11 @@ class DiscordAdapter(BasePlatformAdapter):
                    return list(dict.fromkeys(skills))  # dedup, preserve order
        return None

+    def _resolve_channel_prompt(self, channel_id: str, parent_id: str | None = None) -> str | None:
+        """Resolve a Discord per-channel prompt, preferring the exact channel over its parent."""
+        from gateway.platforms.base import resolve_channel_prompt
+        return resolve_channel_prompt(self.config.extra, channel_id, parent_id)
+
    def _thread_parent_channel(self, channel: Any) -> Any:
        """Return the parent text channel when invoked from a thread."""
        return getattr(channel, "parent", None) or channel
@@ -2522,6 +2691,7 @@ class DiscordAdapter(BasePlatformAdapter):
        _parent_id = str(getattr(_chan, "parent_id", "") or "")
        _chan_id = str(getattr(_chan, "id", ""))
        _skills = self._resolve_channel_skills(_chan_id, _parent_id or None)
+        _channel_prompt = self._resolve_channel_prompt(_chan_id, _parent_id or None)

        reply_to_id = None
        reply_to_text = None
@@ -2542,6 +2712,7 @@ class DiscordAdapter(BasePlatformAdapter):
            reply_to_text=reply_to_text,
            timestamp=message.created_at,
            auto_skill=_skills,
+            channel_prompt=_channel_prompt,
        )

        # Track thread participation so the bot won't require @mention for
@@ -49,7 +49,10 @@ class MessageDeduplicator:
            return False
        now = time.time()
        if msg_id in self._seen:
-            return True
+            if now - self._seen[msg_id] < self._ttl:
+                return True
+            # Entry has expired — remove it and treat as new
+            del self._seen[msg_id]
        self._seen[msg_id] = now
        if len(self._seen) > self._max_size:
            cutoff = now - self._ttl
@@ -718,6 +718,12 @@ class MattermostAdapter(BasePlatformAdapter):
            thread_id=thread_id,
        )

+        # Per-channel ephemeral prompt
+        from gateway.platforms.base import resolve_channel_prompt
+        _channel_prompt = resolve_channel_prompt(
+            self.config.extra, channel_id, None,
+        )
+
        msg_event = MessageEvent(
            text=message_text,
            message_type=msg_type,
@@ -726,6 +732,7 @@ class MattermostAdapter(BasePlatformAdapter):
            message_id=post_id,
            media_urls=media_urls if media_urls else None,
            media_types=media_types if media_types else None,
+            channel_prompt=_channel_prompt,
        )

        await self.handle_message(msg_event)
@@ -366,6 +366,20 @@ class SlackAdapter(BasePlatformAdapter):
            # in an assistant-enabled context. Falls back to reactions.
            logger.debug("[Slack] assistant.threads.setStatus failed: %s", e)

+    def _dm_top_level_threads_as_sessions(self) -> bool:
+        """Whether top-level Slack DMs get per-message session threads.
+
+        Defaults to ``True`` so each visible DM reply thread is isolated as its
+        own Hermes session — matching the per-thread behavior channels already
+        have.  Set ``platforms.slack.extra.dm_top_level_threads_as_sessions``
+        to ``false`` in config.yaml to revert to the legacy behavior where all
+        top-level DMs share one continuous session.
+        """
+        raw = self.config.extra.get("dm_top_level_threads_as_sessions")
+        if raw is None:
+            return True  # default: each DM thread is its own session
+        return str(raw).strip().lower() in ("1", "true", "yes", "on")
+
    def _resolve_thread_ts(
        self,
        reply_to: Optional[str] = None,
@@ -996,10 +1010,14 @@ class SlackAdapter(BasePlatformAdapter):
        # Build thread_ts for session keying.
        # In channels: fall back to ts so each top-level @mention starts a
        #   new thread/session (the bot always replies in a thread).
-        # In DMs: only use the real thread_ts — top-level DMs should share
-        #   one continuous session, threaded DMs get their own session.
+        # In DMs: fall back to ts so each top-level DM reply thread gets
+        #   its own session key (matching channel behavior). Set
+        #   dm_top_level_threads_as_sessions: false in config to revert to
+        #   legacy single-session-per-DM-channel behavior.
        if is_dm:
-            thread_ts = event.get("thread_ts") or assistant_meta.get("thread_ts")  # None for top-level DMs
+            thread_ts = event.get("thread_ts") or assistant_meta.get("thread_ts")
+            if not thread_ts and self._dm_top_level_threads_as_sessions():
+                thread_ts = ts
        else:
            thread_ts = event.get("thread_ts") or ts  # ts fallback for channels

@@ -1167,6 +1185,12 @@ class SlackAdapter(BasePlatformAdapter):
            thread_id=thread_ts,
        )

+        # Per-channel ephemeral prompt
+        from gateway.platforms.base import resolve_channel_prompt
+        _channel_prompt = resolve_channel_prompt(
+            self.config.extra, channel_id, None,
+        )
+
        msg_event = MessageEvent(
            text=text,
            message_type=msg_type,
@@ -1176,6 +1200,7 @@ class SlackAdapter(BasePlatformAdapter):
            media_urls=media_urls,
            media_types=media_types,
            reply_to_message_id=thread_ts if thread_ts != ts else None,
+            channel_prompt=_channel_prompt,
        )

        # Only react when bot is directly addressed (DM or @mention).
@@ -11,6 +11,7 @@ import asyncio
 import json
 import logging
 import os
+import html as _html
 import re
 from typing import Dict, List, Optional, Any

@@ -18,6 +19,10 @@ logger = logging.getLogger(__name__)

 try:
    from telegram import Update, Bot, Message, InlineKeyboardButton, InlineKeyboardMarkup
+    try:
+        from telegram import LinkPreviewOptions
+    except ImportError:
+        LinkPreviewOptions = None
    from telegram.ext import (
        Application,
        CommandHandler,
@@ -36,6 +41,7 @@ except ImportError:
    Message = Any
    InlineKeyboardButton = Any
    InlineKeyboardMarkup = Any
+    LinkPreviewOptions = None
    Application = Any
    CommandHandler = Any
    CallbackQueryHandler = Any
@@ -129,6 +135,7 @@ class TelegramAdapter(BasePlatformAdapter):
    # When a chunk is near this limit, a continuation is almost certain.
    _SPLIT_THRESHOLD = 4000
    MEDIA_GROUP_WAIT_SECONDS = 0.8
+    _GENERAL_TOPIC_THREAD_ID = "1"
    
    def __init__(self, config: PlatformConfig):
        super().__init__(config, Platform.TELEGRAM)
@@ -137,6 +144,7 @@ class TelegramAdapter(BasePlatformAdapter):
        self._webhook_mode: bool = False
        self._mention_patterns = self._compile_mention_patterns()
        self._reply_to_mode: str = getattr(config, 'reply_to_mode', 'first') or 'first'
+        self._disable_link_previews: bool = self._coerce_bool_extra("disable_link_previews", False)
        # Buffer rapid/album photo updates so Telegram image bursts are handled
        # as a single MessageEvent instead of self-interrupting multiple turns.
        self._media_batch_delay_seconds = float(os.getenv("HERMES_TELEGRAM_MEDIA_BATCH_DELAY_SECONDS", "0.8"))
@@ -163,6 +171,38 @@ class TelegramAdapter(BasePlatformAdapter):
        # Approval button state: message_id → session_key
        self._approval_state: Dict[int, str] = {}

+    @staticmethod
+    def _is_callback_user_authorized(user_id: str) -> bool:
+        """Return whether a Telegram inline-button caller may perform gated actions."""
+        allowed_csv = os.getenv("TELEGRAM_ALLOWED_USERS", "").strip()
+        if not allowed_csv:
+            # No allowlist configured: inline-button callbacks are unrestricted.
+            return True
+        allowed_ids = {uid.strip() for uid in allowed_csv.split(",") if uid.strip()}
+        return "*" in allowed_ids or user_id in allowed_ids
+
+    @classmethod
+    def _metadata_thread_id(cls, metadata: Optional[Dict[str, Any]]) -> Optional[str]:
+        """Read the thread id from metadata (``thread_id`` or ``message_thread_id``) as a string."""
+        if not metadata:
+            return None
+        thread_id = metadata.get("thread_id") or metadata.get("message_thread_id")
+        return str(thread_id) if thread_id is not None else None
+
+    @classmethod
+    def _message_thread_id_for_send(cls, thread_id: Optional[str]) -> Optional[int]:
+        """Return the thread id to pass on sends; the forum General topic maps to None."""
+        if not thread_id or str(thread_id) == cls._GENERAL_TOPIC_THREAD_ID:
+            return None
+        return int(thread_id)
+
+    @classmethod
+    def _message_thread_id_for_typing(cls, thread_id: Optional[str]) -> Optional[int]:
+        """Return the thread id for typing indicators; the General topic id is kept as-is."""
+        if not thread_id:
+            return None
+        return int(thread_id)
+
+    @staticmethod
+    def _is_thread_not_found_error(error: Exception) -> bool:
+        return "thread not found" in str(error).lower()
+
    def _fallback_ips(self) -> list[str]:
        """Return validated fallback IPs from config (populated by _apply_env_overrides)."""
        configured = self.config.extra.get("fallback_ips", []) if getattr(self.config, "extra", None) else []
@@ -193,6 +233,26 @@ class TelegramAdapter(BasePlatformAdapter):
            pass
        return isinstance(error, OSError)

+    def _coerce_bool_extra(self, key: str, default: bool = False) -> bool:
+        value = self.config.extra.get(key) if getattr(self.config, "extra", None) else None
+        if value is None:
+            return default
+        if isinstance(value, str):
+            lowered = value.strip().lower()
+            if lowered in ("true", "1", "yes", "on"):
+                return True
+            if lowered in ("false", "0", "no", "off"):
+                return False
+            return default
+        return bool(value)
+
+    def _link_preview_kwargs(self) -> Dict[str, Any]:
+        if not getattr(self, "_disable_link_previews", False):
+            return {}
+        if LinkPreviewOptions is not None:
+            return {"link_preview_options": LinkPreviewOptions(is_disabled=True)}
+        return {"disable_web_page_preview": True}
+
    async def _handle_polling_network_error(self, error: Exception) -> None:
        """Reconnect polling after a transient network interruption.

@@ -540,7 +600,7 @@ class TelegramAdapter(BasePlatformAdapter):
                "write_timeout": _env_float("HERMES_TELEGRAM_HTTP_WRITE_TIMEOUT", 20.0),
            }

-            proxy_url = resolve_proxy_url()
+            proxy_url = resolve_proxy_url("TELEGRAM_PROXY")
            disable_fallback = (os.getenv("HERMES_TELEGRAM_DISABLE_FALLBACK_IPS", "").strip().lower() in ("1", "true", "yes", "on"))
            fallback_ips = self._fallback_ips()
            if not fallback_ips:
@@ -606,14 +666,14 @@ class TelegramAdapter(BasePlatformAdapter):
                from telegram.error import NetworkError, TimedOut
            except ImportError:
                NetworkError = TimedOut = OSError  # type: ignore[misc,assignment]
-            _max_connect = 3
+            _max_connect = 8
            for _attempt in range(_max_connect):
                try:
                    await self._app.initialize()
                    break
                except (NetworkError, TimedOut, OSError) as init_err:
                    if _attempt < _max_connect - 1:
-                        wait = 2 ** _attempt
+                        wait = min(2 ** _attempt, 15)
                        logger.warning(
                            "[%s] Connect attempt %d/%d failed: %s — retrying in %ds",
                            self.name, _attempt + 1, _max_connect, init_err, wait,
@@ -814,7 +874,7 @@ class TelegramAdapter(BasePlatformAdapter):
                ]
            
            message_ids = []
-            thread_id = metadata.get("thread_id") if metadata else None
+            thread_id = self._metadata_thread_id(metadata)
            
            try:
                from telegram.error import NetworkError as _NetErr
@@ -834,7 +894,7 @@ class TelegramAdapter(BasePlatformAdapter):
            for i, chunk in enumerate(chunks):
                should_thread = self._should_thread_reply(reply_to, i)
                reply_to_id = int(reply_to) if should_thread else None
-                effective_thread_id = int(thread_id) if thread_id else None
+                effective_thread_id = self._message_thread_id_for_send(thread_id)

                msg = None
                for _send_attempt in range(3):
@@ -847,6 +907,7 @@ class TelegramAdapter(BasePlatformAdapter):
                                parse_mode=ParseMode.MARKDOWN_V2,
                                reply_to_message_id=reply_to_id,
                                message_thread_id=effective_thread_id,
+                                **self._link_preview_kwargs(),
                            )
                        except Exception as md_error:
                            # Markdown parsing failed, try plain text
@@ -859,6 +920,7 @@ class TelegramAdapter(BasePlatformAdapter):
                                    parse_mode=None,
                                    reply_to_message_id=reply_to_id,
                                    message_thread_id=effective_thread_id,
+                                    **self._link_preview_kwargs(),
                                )
                            else:
                                raise
@@ -869,8 +931,7 @@ class TelegramAdapter(BasePlatformAdapter):
                        # (not transient network issues). Detect and handle
                        # specific cases instead of blindly retrying.
                        if _BadReq and isinstance(send_err, _BadReq):
-                            err_lower = str(send_err).lower()
-                            if "thread not found" in err_lower and effective_thread_id is not None:
+                            if self._is_thread_not_found_error(send_err) and effective_thread_id is not None:
                                # Thread doesn't exist — retry without
                                # message_thread_id so the message still
                                # reaches the chat.
@@ -880,6 +941,7 @@ class TelegramAdapter(BasePlatformAdapter):
                                )
                                effective_thread_id = None
                                continue
+                            err_lower = str(send_err).lower()
                            if "message to be replied not found" in err_lower and reply_to_id is not None:
                                # Original message was deleted before we
                                # could reply — clear reply target and retry
@@ -1046,6 +1108,7 @@ class TelegramAdapter(BasePlatformAdapter):
                text=text,
                parse_mode=ParseMode.MARKDOWN,
                reply_markup=keyboard,
+                **self._link_preview_kwargs(),
            )
            return SendResult(success=True, message_id=str(msg.message_id))
        except Exception as e:
@@ -1068,15 +1131,13 @@ class TelegramAdapter(BasePlatformAdapter):
        try:
            cmd_preview = command[:3800] + "..." if len(command) > 3800 else command
            text = (
-                f"⚠️ *Command Approval Required*\n\n"
-                f"`{cmd_preview}`\n\n"
-                f"Reason: {description}"
+                f"⚠️ <b>Command Approval Required</b>\n\n"
+                f"<pre>{_html.escape(cmd_preview)}</pre>\n\n"
+                f"Reason: {_html.escape(description)}"
            )

            # Resolve thread context for thread replies
-            thread_id = None
-            if metadata:
-                thread_id = metadata.get("thread_id") or metadata.get("message_thread_id")
+            thread_id = self._metadata_thread_id(metadata)

            # We'll use the message_id as part of callback_data to look up session_key
            # Send a placeholder first, then update — or use a counter.
@@ -1100,11 +1161,13 @@ class TelegramAdapter(BasePlatformAdapter):
            kwargs: Dict[str, Any] = {
                "chat_id": int(chat_id),
                "text": text,
-                "parse_mode": ParseMode.MARKDOWN,
+                "parse_mode": ParseMode.HTML,
                "reply_markup": keyboard,
+                **self._link_preview_kwargs(),
            }
-            if thread_id:
-                kwargs["message_thread_id"] = int(thread_id)
+            message_thread_id = self._message_thread_id_for_send(thread_id)
+            if message_thread_id is not None:
+                kwargs["message_thread_id"] = message_thread_id

            msg = await self._bot.send_message(**kwargs)

@@ -1172,6 +1235,7 @@ class TelegramAdapter(BasePlatformAdapter):
                parse_mode=ParseMode.MARKDOWN,
                reply_markup=keyboard,
                message_thread_id=int(thread_id) if thread_id else None,
+                **self._link_preview_kwargs(),
            )

            # Store picker state keyed by chat_id
@@ -1440,12 +1504,9 @@ class TelegramAdapter(BasePlatformAdapter):

                # Only authorized users may click approval buttons.
                caller_id = str(getattr(query.from_user, "id", ""))
-                allowed_csv = os.getenv("TELEGRAM_ALLOWED_USERS", "").strip()
-                if allowed_csv:
-                    allowed_ids = {uid.strip() for uid in allowed_csv.split(",") if uid.strip()}
-                    if "*" not in allowed_ids and caller_id not in allowed_ids:
-                        await query.answer(text="⛔ You are not authorized to approve commands.")
-                        return
+                if not self._is_callback_user_authorized(caller_id):
+                    await query.answer(text="⛔ You are not authorized to approve commands.")
+                    return

                session_key = self._approval_state.pop(approval_id, None)
                if not session_key:
@@ -1490,6 +1551,10 @@ class TelegramAdapter(BasePlatformAdapter):
        if not data.startswith("update_prompt:"):
            return
        answer = data.split(":", 1)[1]  # "y" or "n"
+        caller_id = str(getattr(query.from_user, "id", ""))
+        if not self._is_callback_user_authorized(caller_id):
+            await query.answer(text="⛔ You are not authorized to answer update prompts.")
+            return
        await query.answer(text=f"Sent '{answer}' to the update process.")
        # Edit the message to show the choice and remove buttons
        label = "Yes" if answer == "y" else "No"
@@ -1535,23 +1600,23 @@ class TelegramAdapter(BasePlatformAdapter):
            with open(audio_path, "rb") as audio_file:
                # .ogg files -> send as voice (round playable bubble)
                if audio_path.endswith((".ogg", ".opus")):
-                    _voice_thread = metadata.get("thread_id") if metadata else None
+                    _voice_thread = self._metadata_thread_id(metadata)
                    msg = await self._bot.send_voice(
                        chat_id=int(chat_id),
                        voice=audio_file,
                        caption=caption[:1024] if caption else None,
                        reply_to_message_id=int(reply_to) if reply_to else None,
-                        message_thread_id=int(_voice_thread) if _voice_thread else None,
+                        message_thread_id=self._message_thread_id_for_send(_voice_thread),
                    )
                else:
                    # .mp3 and others -> send as audio file
-                    _audio_thread = metadata.get("thread_id") if metadata else None
+                    _audio_thread = self._metadata_thread_id(metadata)
                    msg = await self._bot.send_audio(
                        chat_id=int(chat_id),
                        audio=audio_file,
                        caption=caption[:1024] if caption else None,
                        reply_to_message_id=int(reply_to) if reply_to else None,
-                        message_thread_id=int(_audio_thread) if _audio_thread else None,
+                        message_thread_id=self._message_thread_id_for_send(_audio_thread),
                    )
            return SendResult(success=True, message_id=str(msg.message_id))
        except Exception as e:
@@ -1581,14 +1646,14 @@ class TelegramAdapter(BasePlatformAdapter):
            if not os.path.exists(image_path):
                return SendResult(success=False, error=f"Image file not found: {image_path}")

-            _thread = metadata.get("thread_id") if metadata else None
+            _thread = self._metadata_thread_id(metadata)
            with open(image_path, "rb") as image_file:
                msg = await self._bot.send_photo(
                    chat_id=int(chat_id),
                    photo=image_file,
                    caption=caption[:1024] if caption else None,
                    reply_to_message_id=int(reply_to) if reply_to else None,
-                    message_thread_id=int(_thread) if _thread else None,
+                    message_thread_id=self._message_thread_id_for_send(_thread),
                )
            return SendResult(success=True, message_id=str(msg.message_id))
        except Exception as e:
@@ -1619,7 +1684,7 @@ class TelegramAdapter(BasePlatformAdapter):
                return SendResult(success=False, error=f"File not found: {file_path}")

            display_name = file_name or os.path.basename(file_path)
-            _thread = metadata.get("thread_id") if metadata else None
+            _thread = self._metadata_thread_id(metadata)

            with open(file_path, "rb") as f:
                msg = await self._bot.send_document(
@@ -1628,7 +1693,7 @@ class TelegramAdapter(BasePlatformAdapter):
                    filename=display_name,
                    caption=caption[:1024] if caption else None,
                    reply_to_message_id=int(reply_to) if reply_to else None,
-                    message_thread_id=int(_thread) if _thread else None,
+                    message_thread_id=self._message_thread_id_for_send(_thread),
                )
            return SendResult(success=True, message_id=str(msg.message_id))
        except Exception as e:
@@ -1652,14 +1717,14 @@ class TelegramAdapter(BasePlatformAdapter):
            if not os.path.exists(video_path):
                return SendResult(success=False, error=f"Video file not found: {video_path}")

-            _thread = metadata.get("thread_id") if metadata else None
+            _thread = self._metadata_thread_id(metadata)
            with open(video_path, "rb") as f:
                msg = await self._bot.send_video(
                    chat_id=int(chat_id),
                    video=f,
                    caption=caption[:1024] if caption else None,
                    reply_to_message_id=int(reply_to) if reply_to else None,
-                    message_thread_id=int(_thread) if _thread else None,
+                    message_thread_id=self._message_thread_id_for_send(_thread),
                )
            return SendResult(success=True, message_id=str(msg.message_id))
        except Exception as e:
@@ -1689,13 +1754,13 @@ class TelegramAdapter(BasePlatformAdapter):

        try:
            # Telegram can send photos directly from URLs (up to ~5MB)
-            _photo_thread = metadata.get("thread_id") if metadata else None
+            _photo_thread = self._metadata_thread_id(metadata)
            msg = await self._bot.send_photo(
                chat_id=int(chat_id),
                photo=image_url,
                caption=caption[:1024] if caption else None,  # Telegram caption limit
                reply_to_message_id=int(reply_to) if reply_to else None,
-                message_thread_id=int(_photo_thread) if _photo_thread else None,
+                message_thread_id=self._message_thread_id_for_send(_photo_thread),
            )
            return SendResult(success=True, message_id=str(msg.message_id))
        except Exception as e:
@@ -1718,6 +1783,7 @@ class TelegramAdapter(BasePlatformAdapter):
                    photo=image_data,
                    caption=caption[:1024] if caption else None,
                    reply_to_message_id=int(reply_to) if reply_to else None,
+                    message_thread_id=self._message_thread_id_for_send(_photo_thread),
                )
                return SendResult(success=True, message_id=str(msg.message_id))
            except Exception as e2:
@@ -1743,13 +1809,13 @@ class TelegramAdapter(BasePlatformAdapter):
            return SendResult(success=False, error="Not connected")
        
        try:
-            _anim_thread = metadata.get("thread_id") if metadata else None
+            _anim_thread = self._metadata_thread_id(metadata)
            msg = await self._bot.send_animation(
                chat_id=int(chat_id),
                animation=animation_url,
                caption=caption[:1024] if caption else None,
                reply_to_message_id=int(reply_to) if reply_to else None,
-                message_thread_id=int(_anim_thread) if _anim_thread else None,
+                message_thread_id=self._message_thread_id_for_send(_anim_thread),
            )
            return SendResult(success=True, message_id=str(msg.message_id))
        except Exception as e:
@@ -1766,12 +1832,23 @@ class TelegramAdapter(BasePlatformAdapter):
        """Send typing indicator."""
        if self._bot:
            try:
-                _typing_thread = metadata.get("thread_id") if metadata else None
-                await self._bot.send_chat_action(
-                    chat_id=int(chat_id),
-                    action="typing",
-                    message_thread_id=int(_typing_thread) if _typing_thread else None,
-                )
+                _typing_thread = self._metadata_thread_id(metadata)
+                message_thread_id = self._message_thread_id_for_typing(_typing_thread)
+                try:
+                    await self._bot.send_chat_action(
+                        chat_id=int(chat_id),
+                        action="typing",
+                        message_thread_id=message_thread_id,
+                    )
+                except Exception as e:
+                    if message_thread_id is not None and self._is_thread_not_found_error(e):
+                        await self._bot.send_chat_action(
+                            chat_id=int(chat_id),
+                            action="typing",
+                            message_thread_id=None,
+                        )
+                    else:
+                        raise
            except Exception as e:
                # Typing failures are non-fatal; log at debug level only.
                logger.debug(
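Reviewer note: the typing-indicator hunk above retries once without a thread id when the topic lookup fails. The following is a minimal sketch of that retry shape, not the adapter's code. `FakeBot`, `is_thread_not_found`, and the error text are hypothetical stand-ins for the real bot client and for `_is_thread_not_found_error`.

```python
import asyncio
from typing import List, Optional


class FakeBot:
    """Hypothetical stand-in for the bot client: rejects any threaded call."""

    def __init__(self) -> None:
        self.calls: List[Optional[int]] = []

    async def send_chat_action(self, chat_id: int, action: str,
                               message_thread_id: Optional[int] = None) -> None:
        self.calls.append(message_thread_id)
        if message_thread_id is not None:
            # Assumed error text, used only for this illustration.
            raise RuntimeError("Bad Request: message thread not found")


def is_thread_not_found(exc: Exception) -> bool:
    # Hypothetical matcher; the real check lives in the adapter.
    return "thread not found" in str(exc).lower()


async def send_typing(bot: FakeBot, chat_id: int, thread_id: Optional[int]) -> None:
    try:
        await bot.send_chat_action(chat_id=chat_id, action="typing",
                                   message_thread_id=thread_id)
    except Exception as exc:
        # Retry without the thread only for the specific "thread not found" case.
        if thread_id is not None and is_thread_not_found(exc):
            await bot.send_chat_action(chat_id=chat_id, action="typing",
                                       message_thread_id=None)
        else:
            raise


bot = FakeBot()
asyncio.run(send_typing(bot, chat_id=123, thread_id=42))
```

Any other exception still propagates to the outer handler, which logs it at debug level as before.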
@@ -2716,7 +2793,9 @@ class TelegramAdapter(BasePlatformAdapter):

        # Resolve DM topic name and skill binding
        thread_id_raw = message.message_thread_id
-        thread_id_str = str(thread_id_raw) if thread_id_raw else None
+        thread_id_str = str(thread_id_raw) if thread_id_raw is not None else None
+        if chat_type == "group" and thread_id_str is None and getattr(chat, "is_forum", False):
+            thread_id_str = self._GENERAL_TOPIC_THREAD_ID
        chat_topic = None
        topic_skill = None

@@ -2765,6 +2844,15 @@ class TelegramAdapter(BasePlatformAdapter):
            reply_to_id = str(message.reply_to_message.message_id)
            reply_to_text = message.reply_to_message.text or message.reply_to_message.caption or None

+        # Per-channel/topic ephemeral prompt
+        from gateway.platforms.base import resolve_channel_prompt
+        _chat_id_str = str(chat.id)
+        _channel_prompt = resolve_channel_prompt(
+            self.config.extra,
+            thread_id_str or _chat_id_str,
+            _chat_id_str if thread_id_str else None,
+        )
+
        return MessageEvent(
            text=message.text or "",
            message_type=msg_type,
@@ -2774,6 +2862,7 @@ class TelegramAdapter(BasePlatformAdapter):
            reply_to_message_id=reply_to_id,
            reply_to_text=reply_to_text,
            auto_skill=topic_skill,
+            channel_prompt=_channel_prompt,
            timestamp=message.date,
        )

@@ -46,7 +46,7 @@ _SEED_FALLBACK_IPS: list[str] = ["149.154.167.220"]
 def _resolve_proxy_url() -> str | None:
    # Delegate to shared implementation (env vars + macOS system proxy detection)
    from gateway.platforms.base import resolve_proxy_url
-    return resolve_proxy_url()
+    return resolve_proxy_url("TELEGRAM_PROXY")


 class TelegramFallbackTransport(httpx.AsyncBaseTransport):
@@ -258,6 +258,20 @@ class WecomCallbackAdapter(BasePlatformAdapter):
                )
                event = self._build_event(app, decrypted)
                if event is not None:
+                    # Deduplicate: WeCom retries callbacks on timeout,
+                    # producing duplicate inbound messages (#10305).
+                    if event.message_id:
+                        now = time.time()
+                        if event.message_id in self._seen_messages:
+                            if now - self._seen_messages[event.message_id] < MESSAGE_DEDUP_TTL_SECONDS:
+                                logger.debug("[WecomCallback] Duplicate MsgId %s, skipping", event.message_id)
+                                return web.Response(text="success", content_type="text/plain")
+                            del self._seen_messages[event.message_id]
+                        self._seen_messages[event.message_id] = now
+                        # Prune expired entries when cache grows large
+                        if len(self._seen_messages) > 2000:
+                            cutoff = now - MESSAGE_DEDUP_TTL_SECONDS
+                            self._seen_messages = {k: v for k, v in self._seen_messages.items() if v > cutoff}
                    # Record which app this user belongs to.
                    if event.source and event.source.user_id:
                        map_key = self._user_app_key(
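Reviewer note: the deduplication added in this hunk is a small TTL cache keyed by message id. A self-contained sketch of the same idea follows. The class name, the default TTL, and the injectable `now` parameter are assumptions made for illustration and testing; the real code uses `MESSAGE_DEDUP_TTL_SECONDS` and `self._seen_messages`.

```python
import time
from typing import Dict, Optional


class SeenMessages:
    """Hypothetical TTL cache mirroring the dedup logic in the hunk above."""

    def __init__(self, ttl: float = 300.0, max_entries: int = 2000) -> None:
        self.ttl = ttl                      # assumed value; the real constant is defined elsewhere
        self.max_entries = max_entries
        self._seen: Dict[str, float] = {}

    def is_duplicate(self, message_id: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        first_seen = self._seen.get(message_id)
        if first_seen is not None and now - first_seen < self.ttl:
            return True                     # retry inside the TTL window: skip it
        self._seen[message_id] = now        # new or expired id: (re)record it
        if len(self._seen) > self.max_entries:
            # Prune expired entries only when the cache has grown large.
            cutoff = now - self.ttl
            self._seen = {k: v for k, v in self._seen.items() if v > cutoff}
        return False


cache = SeenMessages(ttl=10.0)
first = cache.is_duplicate("msg-1", now=0.0)
retry = cache.is_duplicate("msg-1", now=5.0)
after_expiry = cache.is_duplicate("msg-1", now=20.0)
```

As in the hunk, a duplicate does not refresh the stored timestamp, so the window is measured from the first delivery.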
@@ -301,6 +301,8 @@ def build_session_context_prompt(
    lines.append("")
    lines.append("**Delivery options for scheduled tasks:**")
    
+    from hermes_constants import display_hermes_home
+
    # Origin delivery
    if context.source.platform == Platform.LOCAL:
        lines.append("- `\"origin\"` → Local output (saved to files)")
@@ -309,9 +311,11 @@ def build_session_context_prompt(
            _hash_chat_id(context.source.chat_id) if redact_pii else context.source.chat_id
        )
        lines.append(f"- `\"origin\"` → Back to this chat ({_origin_label})")
-    
+
    # Local always available
-    lines.append("- `\"local\"` → Save to local files only (~/.hermes/cron/output/)")
+    lines.append(
+        f"- `\"local\"` → Save to local files only ({display_hermes_home()}/cron/output/)"
+    )
    
    # Platform home channels
    for platform, home in context.home_channels.items():
@@ -37,18 +37,24 @@ needs to replace the import + call site:
 """

 from contextvars import ContextVar
+from typing import Any
+
+# Sentinel to distinguish "never set in this context" from "explicitly set to empty".
+# When a contextvar holds _UNSET, we fall back to os.environ (CLI/cron compat).
+# When it holds "" (after clear_session_vars resets it), we return "" — no fallback.
+_UNSET: Any = object()

 # ---------------------------------------------------------------------------
 # Per-task session variables
 # ---------------------------------------------------------------------------

-_SESSION_PLATFORM: ContextVar[str] = ContextVar("HERMES_SESSION_PLATFORM", default="")
-_SESSION_CHAT_ID: ContextVar[str] = ContextVar("HERMES_SESSION_CHAT_ID", default="")
-_SESSION_CHAT_NAME: ContextVar[str] = ContextVar("HERMES_SESSION_CHAT_NAME", default="")
-_SESSION_THREAD_ID: ContextVar[str] = ContextVar("HERMES_SESSION_THREAD_ID", default="")
-_SESSION_USER_ID: ContextVar[str] = ContextVar("HERMES_SESSION_USER_ID", default="")
-_SESSION_USER_NAME: ContextVar[str] = ContextVar("HERMES_SESSION_USER_NAME", default="")
-_SESSION_KEY: ContextVar[str] = ContextVar("HERMES_SESSION_KEY", default="")
+_SESSION_PLATFORM: ContextVar = ContextVar("HERMES_SESSION_PLATFORM", default=_UNSET)
+_SESSION_CHAT_ID: ContextVar = ContextVar("HERMES_SESSION_CHAT_ID", default=_UNSET)
+_SESSION_CHAT_NAME: ContextVar = ContextVar("HERMES_SESSION_CHAT_NAME", default=_UNSET)
+_SESSION_THREAD_ID: ContextVar = ContextVar("HERMES_SESSION_THREAD_ID", default=_UNSET)
+_SESSION_USER_ID: ContextVar = ContextVar("HERMES_SESSION_USER_ID", default=_UNSET)
+_SESSION_USER_NAME: ContextVar = ContextVar("HERMES_SESSION_USER_NAME", default=_UNSET)
+_SESSION_KEY: ContextVar = ContextVar("HERMES_SESSION_KEY", default=_UNSET)

 _VAR_MAP = {
    "HERMES_SESSION_PLATFORM": _SESSION_PLATFORM,
@@ -91,10 +97,17 @@ def set_session_vars(


 def clear_session_vars(tokens: list) -> None:
-    """Restore session context variables to their pre-handler values."""
-    if not tokens:
-        return
-    vars_in_order = [
+    """Mark session context variables as explicitly cleared.
+
+    Sets all variables to ``""`` so that ``get_session_env`` returns an empty
+    string instead of falling back to (potentially stale) ``os.environ``
+    values.  The *tokens* argument is accepted for API compatibility with
+    callers that saved the return value of ``set_session_vars``, but the
+    actual clearing uses ``var.set("")`` rather than ``var.reset(token)``
+    to ensure the "explicitly cleared" state is distinguishable from
+    "never set" (which holds the ``_UNSET`` sentinel).
+    """
+    for var in (
        _SESSION_PLATFORM,
        _SESSION_CHAT_ID,
        _SESSION_CHAT_NAME,
@@ -102,9 +115,8 @@ def clear_session_vars(tokens: list) -> None:
        _SESSION_USER_ID,
        _SESSION_USER_NAME,
        _SESSION_KEY,
-    ]
-    for var, token in zip(vars_in_order, tokens):
-        var.reset(token)
+    ):
+        var.set("")


 def get_session_env(name: str, default: str = "") -> str:
@@ -113,8 +125,13 @@ def get_session_env(name: str, default: str = "") -> str:
    Drop-in replacement for ``os.getenv("HERMES_SESSION_*", default)``.

    Resolution order:
-    1. Context variable (set by the gateway for concurrency-safe access)
-    2. ``os.environ`` (used by CLI, cron scheduler, and tests)
+    1. Context variable (set by the gateway for concurrency-safe access).
+       If the variable was explicitly set (even to ``""``) via
+       ``set_session_vars`` or ``clear_session_vars``, that value is
+       returned — **no fallback to os.environ**.
+    2. ``os.environ`` (only when the context variable was never set in
+       this context — i.e. CLI, cron scheduler, and test processes that
+       don't use ``set_session_vars`` at all).
    3. *default*
    """
    import os
@@ -122,7 +139,7 @@ def get_session_env(name: str, default: str = "") -> str:
    var = _VAR_MAP.get(name)
    if var is not None:
        value = var.get()
-        if value:
+        if value is not _UNSET:
            return value
    # Fall back to os.environ for CLI, cron, and test compatibility
    return os.getenv(name, default)
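Reviewer note: the sentinel change above makes "never set" distinguishable from "explicitly cleared". The sketch below reproduces that behaviour in isolation. `DEMO_SESSION_PLATFORM` and the helper names are hypothetical, and `copy_context().run` is used only so the demo does not leak state.

```python
import os
from contextvars import ContextVar, copy_context
from typing import Any

ENV_NAME = "DEMO_SESSION_PLATFORM"          # hypothetical variable name
_UNSET: Any = object()                      # sentinel: never set in this context
_DEMO_VAR: ContextVar = ContextVar(ENV_NAME, default=_UNSET)


def get_demo_env(default: str = "") -> str:
    value = _DEMO_VAR.get()
    if value is not _UNSET:
        return value                        # explicitly set, even to ""
    return os.getenv(ENV_NAME, default)     # fallback only when never set


def demo() -> tuple:
    os.environ[ENV_NAME] = "stale-from-env"
    try:
        never_set = get_demo_env()          # falls back to os.environ
        _DEMO_VAR.set("telegram")
        explicitly_set = get_demo_env()
        _DEMO_VAR.set("")                   # what clear_session_vars now does
        cleared = get_demo_env()            # "" wins over the stale env value
    finally:
        os.environ.pop(ENV_NAME, None)
    return never_set, explicitly_set, cleared


result = copy_context().run(demo)
```

With the old `if value:` check, the cleared case would have fallen through to the stale environment value.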
@@ -43,6 +43,7 @@ class StreamConsumerConfig:
    edit_interval: float = 1.0
    buffer_threshold: int = 40
    cursor: str = " ▉"
+    buffer_only: bool = False


 class GatewayStreamConsumer:
@@ -295,10 +296,13 @@ class GatewayStreamConsumer:
                    got_done
                    or got_segment_break
                    or commentary_text is not None
-                    or (elapsed >= self._current_edit_interval
-                        and self._accumulated)
-                    or len(self._accumulated) >= self.cfg.buffer_threshold
                )
+                if not self.cfg.buffer_only:
+                    should_edit = should_edit or (
+                        (elapsed >= self._current_edit_interval
+                            and self._accumulated)
+                        or len(self._accumulated) >= self.cfg.buffer_threshold
+                    )

                current_update_visible = False
                if should_edit and self._accumulated:
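Reviewer note: with `buffer_only` enabled, only the done, segment-break, and commentary triggers can flush; the time-based and size-based triggers are skipped. The decision can be sketched as a pure function. The function and parameter names are illustrative and are not part of `GatewayStreamConsumer`.

```python
def should_edit(*, got_done: bool, got_segment_break: bool, has_commentary: bool,
                elapsed: float, edit_interval: float, accumulated: str,
                buffer_threshold: int, buffer_only: bool) -> bool:
    # Hard triggers apply in every mode.
    decision = got_done or got_segment_break or has_commentary
    if not buffer_only:
        # Incremental triggers: enough time has passed, or enough text is buffered.
        decision = decision or (
            (elapsed >= edit_interval and bool(accumulated))
            or len(accumulated) >= buffer_threshold
        )
    return bool(decision)


base = dict(got_done=False, got_segment_break=False, has_commentary=False,
            elapsed=5.0, edit_interval=1.0, accumulated="hello",
            buffer_threshold=40)
streaming = should_edit(buffer_only=False, **base)
buffered = should_edit(buffer_only=True, **base)
```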
@@ -403,18 +407,20 @@ class GatewayStreamConsumer:

        except asyncio.CancelledError:
            # Best-effort final edit on cancellation
+            _best_effort_ok = False
            if self._accumulated and self._message_id:
                try:
-                    await self._send_or_edit(self._accumulated)
+                    _best_effort_ok = bool(await self._send_or_edit(self._accumulated))
                except Exception:
                    pass
-            # If we delivered any content before being cancelled, mark the
-            # final response as sent so the gateway's already_sent check
-            # doesn't trigger a duplicate message.  The 5-second
-            # stream_task timeout (gateway/run.py) can cancel us while
-            # waiting on a slow Telegram API call — without this flag the
-            # gateway falls through to the normal send path.
-            if self._already_sent:
+            # Only confirm final delivery if the best-effort send above
+            # actually succeeded OR if the final response was already
+            # confirmed before we were cancelled.  Previously this
+            # promoted any partial send (already_sent=True) to
+            # final_response_sent — which suppressed the gateway's
+            # fallback send even when only intermediate text (e.g.
+            # "Let me search…") had been delivered, not the real answer.
+            if _best_effort_ok and not self._final_response_sent:
                self._final_response_sent = True
        except Exception as e:
            logger.error("Stream consumer error: %s", e)
@@ -513,9 +519,17 @@ class GatewayStreamConsumer:
        self._fallback_final_send = False
        if not continuation.strip():
            # Nothing new to send — the visible partial already matches final text.
-            self._already_sent = True
-            self._final_response_sent = True
-            return
+            # BUT: if final_text itself has meaningful content (e.g. a timeout
+            # message after a long tool call), the prefix-based continuation
+            # calculation may wrongly conclude "already shown" because the
+            # streamed prefix was from a *previous* segment (before the tool
+            # boundary).  In that case, send the full final_text as-is (#10807).
+            if final_text.strip() and final_text != self._visible_prefix():
+                continuation = final_text
+            else:
+                self._already_sent = True
+                self._final_response_sent = True
+                return

        raw_limit = getattr(self.adapter, "MAX_MESSAGE_LENGTH", 4096)
        safe_limit = max(500, raw_limit - 100)
@@ -609,12 +623,15 @@ class GatewayStreamConsumer:
                content=text,
                metadata=self.metadata,
            )
-            if result.success:
-                self._already_sent = True
-                return True
+            # Note: do NOT set _already_sent = True here.
+            # Commentary messages are interim status updates (e.g. "Using browser
+            # tool..."), not the final response. Setting already_sent would cause
+            # the final response to be incorrectly suppressed when there are
+            # multiple tool calls. See: https://github.com/NousResearch/hermes-agent/issues/10454
+            return result.success
        except Exception as e:
            logger.error("Commentary send error: %s", e)
-        return False
+            return False

    async def _send_or_edit(self, text: str) -> bool:
        """Send or edit the streaming message.
@@ -11,5 +11,5 @@ Provides subcommands for:
 - hermes cron          - Manage cron jobs
 """

-__version__ = "0.9.0"
-__release_date__ = "2026.4.13"
+__version__ = "0.10.0"
+__release_date__ = "2026.4.16"
@@ -70,6 +70,7 @@ DEFAULT_CODEX_BASE_URL = "https://chatgpt.com/backend-api/codex"
 DEFAULT_QWEN_BASE_URL = "https://portal.qwen.ai/v1"
 DEFAULT_GITHUB_MODELS_BASE_URL = "https://api.githubcopilot.com"
 DEFAULT_COPILOT_ACP_BASE_URL = "acp://copilot"
+DEFAULT_OLLAMA_CLOUD_BASE_URL = "https://ollama.com/v1"
 CODEX_OAUTH_CLIENT_ID = "app_EMoamEEZ73f0CkXaXp7hrann"
 CODEX_OAUTH_TOKEN_URL = "https://auth.openai.com/oauth/token"
 CODEX_ACCESS_TOKEN_REFRESH_SKEW_SECONDS = 120
@@ -77,6 +78,10 @@ QWEN_OAUTH_CLIENT_ID = "f0304373b74a44d2b584a3fb70ca9e56"
 QWEN_OAUTH_TOKEN_URL = "https://chat.qwen.ai/api/v1/oauth2/token"
 QWEN_ACCESS_TOKEN_REFRESH_SKEW_SECONDS = 120

+# Google Gemini OAuth (google-gemini-cli provider, Cloud Code Assist backend)
+DEFAULT_GEMINI_CLOUDCODE_BASE_URL = "cloudcode-pa://google"
+GEMINI_OAUTH_ACCESS_TOKEN_REFRESH_SKEW_SECONDS = 60  # refresh 60s before expiry
+

 # =============================================================================
 # Provider Registry
@@ -121,6 +126,12 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
        auth_type="oauth_external",
        inference_base_url=DEFAULT_QWEN_BASE_URL,
    ),
+    "google-gemini-cli": ProviderConfig(
+        id="google-gemini-cli",
+        name="Google Gemini (OAuth)",
+        auth_type="oauth_external",
+        inference_base_url=DEFAULT_GEMINI_CLOUDCODE_BASE_URL,
+    ),
    "copilot": ProviderConfig(
        id="copilot",
        name="GitHub Copilot",
@@ -274,6 +285,22 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
        api_key_env_vars=("XIAOMI_API_KEY",),
        base_url_env_var="XIAOMI_BASE_URL",
    ),
+    "ollama-cloud": ProviderConfig(
+        id="ollama-cloud",
+        name="Ollama Cloud",
+        auth_type="api_key",
+        inference_base_url=DEFAULT_OLLAMA_CLOUD_BASE_URL,
+        api_key_env_vars=("OLLAMA_API_KEY",),
+        base_url_env_var="OLLAMA_BASE_URL",
+    ),
+    "bedrock": ProviderConfig(
+        id="bedrock",
+        name="AWS Bedrock",
+        auth_type="aws_sdk",
+        inference_base_url="https://bedrock-runtime.us-east-1.amazonaws.com",
+        api_key_env_vars=(),
+        base_url_env_var="BEDROCK_BASE_URL",
+    ),
 }


@@ -911,6 +938,7 @@ def resolve_provider(
    _PROVIDER_ALIASES = {
        "glm": "zai", "z-ai": "zai", "z.ai": "zai", "zhipu": "zai",
        "google": "gemini", "google-gemini": "gemini", "google-ai-studio": "gemini",
+        "x-ai": "xai", "x.ai": "xai", "grok": "xai",
        "kimi": "kimi-coding", "kimi-for-coding": "kimi-coding", "moonshot": "kimi-coding",
        "kimi-cn": "kimi-coding-cn", "moonshot-cn": "kimi-coding-cn",
        "arcee-ai": "arcee", "arceeai": "arcee",
@@ -921,14 +949,16 @@ def resolve_provider(
        "github-copilot-acp": "copilot-acp", "copilot-acp-agent": "copilot-acp",
        "aigateway": "ai-gateway", "vercel": "ai-gateway", "vercel-ai-gateway": "ai-gateway",
        "opencode": "opencode-zen", "zen": "opencode-zen",
-        "qwen-portal": "qwen-oauth", "qwen-cli": "qwen-oauth", "qwen-oauth": "qwen-oauth",
+        "qwen-portal": "qwen-oauth", "qwen-cli": "qwen-oauth", "qwen-oauth": "qwen-oauth", "google-gemini-cli": "google-gemini-cli", "gemini-cli": "google-gemini-cli", "gemini-oauth": "google-gemini-cli",
        "hf": "huggingface", "hugging-face": "huggingface", "huggingface-hub": "huggingface",
        "mimo": "xiaomi", "xiaomi-mimo": "xiaomi",
+        "aws": "bedrock", "aws-bedrock": "bedrock", "amazon-bedrock": "bedrock", "amazon": "bedrock",
        "go": "opencode-go", "opencode-go-sub": "opencode-go",
        "kilo": "kilocode", "kilo-code": "kilocode", "kilo-gateway": "kilocode",
        # Local server aliases — route through the generic custom provider
        "lmstudio": "custom", "lm-studio": "custom", "lm_studio": "custom",
-        "ollama": "custom", "vllm": "custom", "llamacpp": "custom",
+        "ollama": "custom", "ollama_cloud": "ollama-cloud",
+        "vllm": "custom", "llamacpp": "custom",
        "llama.cpp": "custom", "llama-cpp": "custom",
    }
    normalized = _PROVIDER_ALIASES.get(normalized, normalized)
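Reviewer note: the alias table resolves user-facing spellings to canonical provider ids, and unknown names pass through unchanged. The sketch below uses a subset of the aliases from this hunk. The `strip().lower()` normalisation is an assumption about code outside the hunk.

```python
# Subset of the alias table shown in the hunk above.
_ALIASES = {
    "x-ai": "xai", "x.ai": "xai", "grok": "xai",
    "aws": "bedrock", "aws-bedrock": "bedrock", "amazon-bedrock": "bedrock",
    "gemini-cli": "google-gemini-cli", "gemini-oauth": "google-gemini-cli",
    "ollama": "custom",              # local Ollama stays on the generic custom provider
    "ollama_cloud": "ollama-cloud",  # hosted Ollama gets its own provider id
}


def normalize_provider(name: str) -> str:
    normalized = name.strip().lower()  # assumed pre-normalisation
    return _ALIASES.get(normalized, normalized)


resolved = [normalize_provider(n) for n in ("Grok", "ollama", "ollama_cloud", "openrouter")]
```

Note that `ollama` and `ollama_cloud` deliberately resolve to different providers.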
@@ -980,6 +1010,15 @@ def resolve_provider(
            if has_usable_secret(os.getenv(env_var, "")):
                return pid

+    # AWS Bedrock — detect via boto3 credential chain (IAM roles, SSO, env vars).
+    # This runs after API-key providers so explicit keys always win.
+    try:
+        from agent.bedrock_adapter import has_aws_credentials
+        if has_aws_credentials():
+            return "bedrock"
+    except ImportError:
+        pass  # boto3 not installed — skip Bedrock auto-detection
+
    raise AuthError(
        "No inference provider configured. Run 'hermes model' to choose a "
        "provider and model, or set an API key (OPENROUTER_API_KEY, "
@@ -1222,6 +1261,83 @@ def get_qwen_auth_status() -> Dict[str, Any]:
        }


+# =============================================================================
+# Google Gemini OAuth (google-gemini-cli) — PKCE flow + Cloud Code Assist.
+#
+# Tokens live in ~/.hermes/auth/google_oauth.json (managed by agent.google_oauth).
+# The `base_url` here is the marker "cloudcode-pa://google" that run_agent.py
+# uses to construct a GeminiCloudCodeClient instead of the default OpenAI SDK.
+# Actual HTTP traffic goes to https://cloudcode-pa.googleapis.com/v1internal:*.
+# =============================================================================
+
+def resolve_gemini_oauth_runtime_credentials(
+    *,
+    force_refresh: bool = False,
+) -> Dict[str, Any]:
+    """Resolve runtime OAuth creds for google-gemini-cli."""
+    try:
+        from agent.google_oauth import (
+            GoogleOAuthError,
+            _credentials_path,
+            get_valid_access_token,
+            load_credentials,
+        )
+    except ImportError as exc:
+        raise AuthError(
+            f"agent.google_oauth is not importable: {exc}",
+            provider="google-gemini-cli",
+            code="google_oauth_module_missing",
+        ) from exc
+
+    try:
+        access_token = get_valid_access_token(force_refresh=force_refresh)
+    except GoogleOAuthError as exc:
+        raise AuthError(
+            str(exc),
+            provider="google-gemini-cli",
+            code=exc.code,
+        ) from exc
+
+    creds = load_credentials()
+    base_url = DEFAULT_GEMINI_CLOUDCODE_BASE_URL
+    return {
+        "provider": "google-gemini-cli",
+        "base_url": base_url,
+        "api_key": access_token,
+        "source": "google-oauth",
+        "expires_at_ms": (creds.expires_ms if creds else None),
+        "auth_file": str(_credentials_path()),
+        "email": (creds.email if creds else "") or "",
+        "project_id": (creds.project_id if creds else "") or "",
+    }
+
+
+def get_gemini_oauth_auth_status() -> Dict[str, Any]:
+    """Return a status dict for `hermes auth list` / `hermes status`."""
+    try:
+        from agent.google_oauth import _credentials_path, load_credentials
+    except ImportError:
+        return {"logged_in": False, "error": "agent.google_oauth unavailable"}
+    auth_path = _credentials_path()
+    creds = load_credentials()
+    if creds is None or not creds.access_token:
+        return {
+            "logged_in": False,
+            "auth_file": str(auth_path),
+            "error": "not logged in",
+        }
+    return {
+        "logged_in": True,
+        "auth_file": str(auth_path),
+        "source": "google-oauth",
+        "api_key": creds.access_token,
+        "expires_at_ms": creds.expires_ms,
+        "email": creds.email,
+        "project_id": creds.project_id,
+    }
+
+
+
 # =============================================================================
 # SSH / remote session detection
 # =============================================================================
@@ -2384,7 +2500,7 @@ def get_api_key_provider_status(provider_id: str) -> Dict[str, Any]:
    if pconfig.base_url_env_var:
        env_url = os.getenv(pconfig.base_url_env_var, "").strip()

-    if provider_id == "kimi-coding":
+    if provider_id in ("kimi-coding", "kimi-coding-cn"):
        base_url = _resolve_kimi_base_url(api_key, pconfig.inference_base_url, env_url)
    elif env_url:
        base_url = env_url
@@ -2440,12 +2556,21 @@ def get_auth_status(provider_id: Optional[str] = None) -> Dict[str, Any]:
        return get_codex_auth_status()
    if target == "qwen-oauth":
        return get_qwen_auth_status()
+    if target == "google-gemini-cli":
+        return get_gemini_oauth_auth_status()
    if target == "copilot-acp":
        return get_external_process_provider_status(target)
    # API-key providers
    pconfig = PROVIDER_REGISTRY.get(target)
    if pconfig and pconfig.auth_type == "api_key":
        return get_api_key_provider_status(target)
+    # AWS SDK providers (Bedrock) — check via boto3 credential chain
+    if pconfig and pconfig.auth_type == "aws_sdk":
+        try:
+            from agent.bedrock_adapter import has_aws_credentials
+            return {"logged_in": has_aws_credentials(), "provider": target}
+        except ImportError:
+            return {"logged_in": False, "provider": target, "error": "boto3 not installed"}
    return {"logged_in": False}


@@ -2470,7 +2595,7 @@ def resolve_api_key_provider_credentials(provider_id: str) -> Dict[str, Any]:
    if pconfig.base_url_env_var:
        env_url = os.getenv(pconfig.base_url_env_var, "").strip()

-    if provider_id == "kimi-coding":
+    if provider_id in ("kimi-coding", "kimi-coding-cn"):
        base_url = _resolve_kimi_base_url(api_key, pconfig.inference_base_url, env_url)
    elif provider_id == "zai":
        base_url = _resolve_zai_base_url(api_key, pconfig.inference_base_url, env_url)
@@ -4,6 +4,7 @@ from __future__ import annotations

 from getpass import getpass
 import math
+import sys
 import time
 from types import SimpleNamespace
 import uuid
@@ -32,7 +33,7 @@ from hermes_constants import OPENROUTER_BASE_URL


 # Providers that support OAuth login in addition to API keys.
-_OAUTH_CAPABLE_PROVIDERS = {"anthropic", "nous", "openai-codex", "qwen-oauth"}
+_OAUTH_CAPABLE_PROVIDERS = {"anthropic", "nous", "openai-codex", "qwen-oauth", "google-gemini-cli"}


 def _get_custom_provider_names() -> list:
@@ -147,7 +148,7 @@ def auth_add_command(args) -> None:
        if provider.startswith(CUSTOM_POOL_PREFIX):
            requested_type = AUTH_TYPE_API_KEY
        else:
-            requested_type = AUTH_TYPE_OAUTH if provider in {"anthropic", "nous", "openai-codex", "qwen-oauth"} else AUTH_TYPE_API_KEY
+            requested_type = AUTH_TYPE_OAUTH if provider in {"anthropic", "nous", "openai-codex", "qwen-oauth", "google-gemini-cli"} else AUTH_TYPE_API_KEY

    pool = load_pool(provider)

@@ -160,7 +161,10 @@ def auth_add_command(args) -> None:
        default_label = _api_key_default_label(len(pool.entries()) + 1)
        label = (getattr(args, "label", None) or "").strip()
        if not label:
-            label = input(f"Label (optional, default: {default_label}): ").strip() or default_label
+            if sys.stdin.isatty():
+                label = input(f"Label (optional, default: {default_label}): ").strip() or default_label
+            else:
+                label = default_label
        entry = PooledCredential(
            provider=provider,
            id=uuid.uuid4().hex[:6],
@@ -250,6 +254,27 @@ def auth_add_command(args) -> None:
        print(f'Added {provider} OAuth credential #{len(pool.entries())}: "{entry.label}"')
        return

+    if provider == "google-gemini-cli":
+        from agent.google_oauth import run_gemini_oauth_login_pure
+
+        creds = run_gemini_oauth_login_pure()
+        label = (getattr(args, "label", None) or "").strip() or (
+            creds.get("email") or _oauth_default_label(provider, len(pool.entries()) + 1)
+        )
+        entry = PooledCredential(
+            provider=provider,
+            id=uuid.uuid4().hex[:6],
+            label=label,
+            auth_type=AUTH_TYPE_OAUTH,
+            priority=0,
+            source=f"{SOURCE_MANUAL}:google_pkce",
+            access_token=creds["access_token"],
+            refresh_token=creds.get("refresh_token"),
+        )
+        pool.add_entry(entry)
+        print(f'Added {provider} OAuth credential #{len(pool.entries())}: "{entry.label}"')
+        return
+
    if provider == "qwen-oauth":
        creds = auth_mod.resolve_qwen_runtime_credentials(refresh_if_expiring=False)
        label = (getattr(args, "label", None) or "").strip() or label_from_token(
@@ -368,6 +393,27 @@ def _interactive_auth() -> None:
    print("=" * 50)

    auth_list_command(SimpleNamespace(provider=None))
+
+    # Show AWS Bedrock credential status (not in the pool — uses boto3 chain)
+    try:
+        from agent.bedrock_adapter import has_aws_credentials, resolve_aws_auth_env_var, resolve_bedrock_region
+        if has_aws_credentials():
+            auth_source = resolve_aws_auth_env_var() or "unknown"
+            region = resolve_bedrock_region()
+            print("bedrock (AWS SDK credential chain):")
+            print(f"  Auth: {auth_source}")
+            print(f"  Region: {region}")
+            try:
+                import boto3
+                sts = boto3.client("sts", region_name=region)
+                identity = sts.get_caller_identity()
+                arn = identity.get("Arn", "unknown")
+                print(f"  Identity: {arn}")
+            except Exception:
+                print("  Identity: (could not resolve — boto3 STS call failed)")
+            print()
+    except ImportError:
+        pass  # boto3 or bedrock_adapter not available
    print()

    # Main menu
@@ -102,6 +102,7 @@ COMMAND_REGISTRY: list[CommandDef] = [
    CommandDef("model", "Switch model for this session", "Configuration", args_hint="[model] [--global]"),
    CommandDef("provider", "Show available providers and current provider",
               "Configuration"),
+    CommandDef("gquota", "Show Google Gemini Code Assist quota usage", "Info"),

    CommandDef("personality", "Set a predefined personality", "Configuration",
               args_hint="[name]"),
@@ -164,7 +165,7 @@ COMMAND_REGISTRY: list[CommandDef] = [

    # Exit
    CommandDef("quit", "Exit the CLI", "Exit",
-               cli_only=True, aliases=("exit", "q")),
+               cli_only=True, aliases=("exit",)),
 ]


@@ -450,7 +451,7 @@ def _collect_gateway_skill_entries(
            name = sanitize_name(cmd_name) if sanitize_name else cmd_name
            if not name:
                continue
-            desc = "Plugin command"
+            desc = plugin_cmds[cmd_name].get("description", "Plugin command")
            if len(desc) > desc_limit:
                desc = desc[:desc_limit - 3] + "..."
            plugin_pairs.append((name, desc))
@@ -1139,6 +1140,22 @@ class SlashCommandCompleter(Completer):
                    display_meta=f"⚡ {short_desc}",
                )

+        # Plugin-registered slash commands
+        try:
+            from hermes_cli.plugins import get_plugin_commands
+            for cmd_name, cmd_info in get_plugin_commands().items():
+                if cmd_name.startswith(word):
+                    desc = str(cmd_info.get("description", "Plugin command"))
+                    short_desc = desc[:50] + ("..." if len(desc) > 50 else "")
+                    yield Completion(
+                        self._completion_text(cmd_name, word),
+                        start_position=-len(word),
+                        display=f"/{cmd_name}",
+                        display_meta=f"🔌 {short_desc}",
+                    )
+        except Exception:
+            pass
+

 # ---------------------------------------------------------------------------
 # Inline auto-suggest (ghost text) for slash commands
@@ -23,7 +23,6 @@ from dataclasses import dataclass
 from pathlib import Path
 from typing import Dict, Any, Optional, List, Tuple

-from tools.tool_backend_helpers import managed_nous_tools_enabled as _managed_nous_tools_enabled

 _IS_WINDOWS = platform.system() == "Windows"
 _ENV_VAR_NAME_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
@@ -241,13 +240,41 @@ def _secure_dir(path):
        pass


+def _is_container() -> bool:
+    """Detect if we're running inside a Docker/Podman/LXC container.
+
+    When Hermes runs in a container with volume-mounted config files, forcing
+    0o600 permissions breaks multi-process setups where the gateway and
+    dashboard run as different UIDs or the volume mount requires broader
+    permissions.
+    """
+    # Explicit override: either env var forces container mode, which skips chmod
+    if os.environ.get("HERMES_CONTAINER") or os.environ.get("HERMES_SKIP_CHMOD"):
+        return True
+    # Docker / Podman marker file
+    if os.path.exists("/.dockerenv"):
+        return True
+    # LXC / cgroup-based detection
+    try:
+        with open("/proc/1/cgroup", "r") as f:
+            cgroup_content = f.read()
+        if "docker" in cgroup_content or "lxc" in cgroup_content or "kubepods" in cgroup_content:
+            return True
+    except OSError:  # IOError is an alias of OSError on Python 3
+        pass
+    return False
+
+
 def _secure_file(path):
    """Set file to owner-only read/write (0600). No-op on Windows.

    Skipped in managed mode — the NixOS activation script sets
    group-readable permissions (0640) on config files.
+
+    Skipped in containers — Docker/Podman volume mounts often need broader
+    permissions.  Set HERMES_SKIP_CHMOD=1 to force-skip on other systems.
    """
-    if is_managed():
+    if is_managed() or _is_container():
        return
    try:
        if os.path.exists(str(path)):
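Reviewer note: the container check above layers three signals: an explicit env override, the `/.dockerenv` marker, and cgroup contents. The sketch below follows the same order, with the environment and paths made injectable so it can be exercised off-container. The function name and parameters are hypothetical.

```python
import os
from typing import Mapping


def looks_like_container(env: Mapping[str, str],
                         dockerenv_path: str = "/.dockerenv",
                         cgroup_path: str = "/proc/1/cgroup") -> bool:
    # 1. Explicit override via environment variables.
    if env.get("HERMES_CONTAINER") or env.get("HERMES_SKIP_CHMOD"):
        return True
    # 2. Marker file checked by the hunk above.
    if os.path.exists(dockerenv_path):
        return True
    # 3. cgroup-based detection; a missing or unreadable file means "not detected".
    try:
        with open(cgroup_path, "r") as f:
            content = f.read()
    except OSError:
        return False
    return any(marker in content for marker in ("docker", "lxc", "kubepods"))


forced = looks_like_container({"HERMES_SKIP_CHMOD": "1"})
bare_host = looks_like_container({}, "/nonexistent/dockerenv", "/nonexistent/cgroup")
```

One consequence worth confirming in review: once any signal fires, `_secure_file` returns without tightening permissions, so config files keep whatever mode the volume mount gives them.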
@@ -392,8 +419,7 @@ DEFAULT_CONFIG = {
        "allow_private_urls": False,  # Allow navigating to private/internal IPs (localhost, 192.168.x.x, etc.)
        "camofox": {
            # When true, Hermes sends a stable profile-scoped userId to Camofox
-            # so the server can map it to a persistent browser profile directory.
-            # Requires Camofox server to be configured with CAMOFOX_PROFILE_DIR.
+            # so the server maps it to a persistent Firefox profile automatically.
            # When false (default), each session gets a random userId (ephemeral).
            "managed_persistence": False,
        },
@@ -419,6 +445,27 @@ DEFAULT_CONFIG = {
        "protect_last_n": 20,         # minimum recent messages to keep uncompressed

    },
+
+    # AWS Bedrock provider configuration.
+    # Only used when model.provider is "bedrock".
+    "bedrock": {
+        "region": "",  # AWS region for Bedrock API calls (empty = AWS_REGION env var → us-east-1)
+        "discovery": {
+            "enabled": True,           # Auto-discover models via ListFoundationModels
+            "provider_filter": [],     # Only show models from these providers (e.g. ["anthropic", "amazon"])
+            "refresh_interval": 3600,  # Cache discovery results for this many seconds
+        },
+        "guardrail": {
+            # Amazon Bedrock Guardrails — content filtering and safety policies.
+            # Create a guardrail in the Bedrock console, then set the ID and version here.
+            # See: https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails.html
+            "guardrail_identifier": "",  # e.g. "abc123def456"
+            "guardrail_version": "",     # e.g. "1" or "DRAFT"
+            "stream_processing_mode": "async",  # "sync" or "async"
+            "trace": "disabled",         # "enabled", "disabled", or "enabled_full"
+        },
+    },
+
    "smart_model_routing": {
        "enabled": False,
        "max_simple_chars": 160,
@@ -510,6 +557,11 @@ DEFAULT_CONFIG = {
        "platforms": {},  # Per-platform display overrides: {"telegram": {"tool_progress": "all"}, "slack": {"tool_progress": "off"}}
    },

+    # Web dashboard settings
+    "dashboard": {
+        "theme": "default",  # Dashboard visual theme: "default", "midnight", "ember", "mono", "cyberpunk", "rose"
+    },
+
    # Privacy settings
    "privacy": {
        "redact_pii": False,  # When True, hash user IDs and strip phone numbers from LLM context
@@ -517,7 +569,7 @@ DEFAULT_CONFIG = {
    
    # Text-to-speech configuration
    "tts": {
-        "provider": "edge",  # "edge" (free) | "elevenlabs" (premium) | "openai" | "minimax" | "mistral" | "neutts" (local)
+        "provider": "edge",  # "edge" (free) | "elevenlabs" (premium) | "openai" | "xai" | "minimax" | "mistral" | "neutts" (local)
        "edge": {
            "voice": "en-US-AriaNeural",
            # Popular: AriaNeural, JennyNeural, AndrewNeural, BrianNeural, SoniaNeural
@@ -531,6 +583,12 @@ DEFAULT_CONFIG = {
            "voice": "alloy",
            # Voices: alloy, echo, fable, onyx, nova, shimmer
        },
+        "xai": {
+            "voice_id": "eve",
+            "language": "en",
+            "sample_rate": 24000,
+            "bit_rate": 128000,
+        },
        "mistral": {
            "model": "voxtral-mini-tts-2603",
            "voice_id": "c69964a6-ab8b-4f8a-9465-ec0925096ec8",  # Paul - Neutral
@@ -638,6 +696,7 @@ DEFAULT_CONFIG = {
        "allowed_channels": "",        # If set, bot ONLY responds in these channel IDs (whitelist)
        "auto_thread": True,           # Auto-create threads on @mention in channels (like Slack)
        "reactions": True,             # Add 👀/✅/❌ reactions to messages during processing
+        "channel_prompts": {},         # Per-channel ephemeral system prompts (forum parents apply to child threads)
    },

    # WhatsApp platform settings (gateway mode)
@@ -648,6 +707,21 @@ DEFAULT_CONFIG = {
        # Supports \n for newlines, e.g. "🤖 *My Bot*\n──────\n"
    },

+    # Telegram platform settings (gateway mode)
+    "telegram": {
+        "channel_prompts": {},         # Per-chat/topic ephemeral system prompts (topics inherit from parent group)
+    },
+
+    # Slack platform settings (gateway mode)
+    "slack": {
+        "channel_prompts": {},         # Per-channel ephemeral system prompts
+    },
+
+    # Mattermost platform settings (gateway mode)
+    "mattermost": {
+        "channel_prompts": {},         # Per-channel ephemeral system prompts
+    },
+
    # Approval mode for dangerous commands:
    #   manual — always prompt the user (default)
    #   smart  — use auxiliary LLM to auto-approve low-risk commands, prompt for high-risk
@@ -703,7 +777,7 @@ DEFAULT_CONFIG = {
    },

    # Config schema version - bump this when adding new required fields
-    "_config_version": 17,
+    "_config_version": 18,
 }

 # =============================================================================
@@ -771,6 +845,22 @@ OPTIONAL_ENV_VARS = {
        "category": "provider",
        "advanced": True,
    },
+    "XAI_API_KEY": {
+        "description": "xAI API key",
+        "prompt": "xAI API key",
+        "url": "https://console.x.ai/",
+        "password": True,
+        "category": "provider",
+        "advanced": True,
+    },
+    "XAI_BASE_URL": {
+        "description": "xAI base URL override",
+        "prompt": "xAI base URL (leave empty for default)",
+        "url": None,
+        "password": False,
+        "category": "provider",
+        "advanced": True,
+    },
    "GLM_API_KEY": {
        "description": "Z.AI / GLM API key (also recognized as ZAI_API_KEY / Z_AI_API_KEY)",
        "prompt": "Z.AI / GLM API key",
@@ -912,6 +1002,30 @@ OPTIONAL_ENV_VARS = {
        "category": "provider",
        "advanced": True,
    },
+    "HERMES_GEMINI_CLIENT_ID": {
+        "description": "Google OAuth client ID for google-gemini-cli (optional; defaults to Google's public gemini-cli client)",
+        "prompt": "Google OAuth client ID (optional — leave empty to use the public default)",
+        "url": "https://console.cloud.google.com/apis/credentials",
+        "password": False,
+        "category": "provider",
+        "advanced": True,
+    },
+    "HERMES_GEMINI_CLIENT_SECRET": {
+        "description": "Google OAuth client secret for google-gemini-cli (optional)",
+        "prompt": "Google OAuth client secret (optional)",
+        "url": "https://console.cloud.google.com/apis/credentials",
+        "password": True,
+        "category": "provider",
+        "advanced": True,
+    },
+    "HERMES_GEMINI_PROJECT_ID": {
+        "description": "GCP project ID for paid Gemini tiers (free tier auto-provisions)",
+        "prompt": "GCP project ID for Gemini OAuth (leave empty for free tier)",
+        "url": None,
+        "password": False,
+        "category": "provider",
+        "advanced": True,
+    },
    "OPENCODE_ZEN_API_KEY": {
        "description": "OpenCode Zen API key (pay-as-you-go access to curated models)",
        "prompt": "OpenCode Zen API key",
@@ -959,6 +1073,22 @@ OPTIONAL_ENV_VARS = {
        "category": "provider",
        "advanced": True,
    },
+    "OLLAMA_API_KEY": {
+        "description": "Ollama Cloud API key (ollama.com — cloud-hosted open models)",
+        "prompt": "Ollama Cloud API key",
+        "url": "https://ollama.com/settings",
+        "password": True,
+        "category": "provider",
+        "advanced": True,
+    },
+    "OLLAMA_BASE_URL": {
+        "description": "Ollama Cloud base URL override (default: https://ollama.com/v1)",
+        "prompt": "Ollama base URL (leave empty for default)",
+        "url": None,
+        "password": False,
+        "category": "provider",
+        "advanced": True,
+    },
    "XIAOMI_API_KEY": {
        "description": "Xiaomi MiMo API key for MiMo models (mimo-v2-pro, mimo-v2-omni, mimo-v2-flash)",
        "prompt": "Xiaomi MiMo API Key",
@@ -974,6 +1104,22 @@ OPTIONAL_ENV_VARS = {
        "category": "provider",
        "advanced": True,
    },
+    "AWS_REGION": {
+        "description": "AWS region for Bedrock API calls (e.g. us-east-1, eu-central-1)",
+        "prompt": "AWS Region",
+        "url": "https://docs.aws.amazon.com/bedrock/latest/userguide/bedrock-regions.html",
+        "password": False,
+        "category": "provider",
+        "advanced": True,
+    },
+    "AWS_PROFILE": {
+        "description": "AWS named profile for Bedrock authentication (from ~/.aws/credentials)",
+        "prompt": "AWS Profile",
+        "url": None,
+        "password": False,
+        "category": "provider",
+        "advanced": True,
+    },

    # ── Tool API keys ──
    "EXA_API_KEY": {
@@ -1171,6 +1317,12 @@ OPTIONAL_ENV_VARS = {
        "password": False,
        "category": "messaging",
    },
+    "TELEGRAM_PROXY": {
+        "description": "Proxy URL for Telegram connections (overrides HTTPS_PROXY). Supports http://, https://, socks5://",
+        "prompt": "Telegram proxy URL (optional)",
+        "password": False,
+        "category": "messaging",
+    },
    "DISCORD_BOT_TOKEN": {
        "description": "Discord bot token from Developer Portal",
        "prompt": "Discord bot token",
@@ -1468,13 +1620,8 @@ OPTIONAL_ENV_VARS = {
    },

    # ── Agent settings ──
-    "MESSAGING_CWD": {
-        "description": "Working directory for terminal commands via messaging",
-        "prompt": "Messaging working directory (default: home)",
-        "url": None,
-        "password": False,
-        "category": "setting",
-    },
+    # NOTE: MESSAGING_CWD was removed here — use terminal.cwd in config.yaml
+    # instead.  The gateway reads TERMINAL_CWD (bridged from terminal.cwd).
    "SUDO_PASSWORD": {
        "description": "Sudo password for terminal commands requiring root access; set to an explicit empty string to try empty without prompting",
        "prompt": "Sudo password",
@@ -1522,14 +1669,8 @@ OPTIONAL_ENV_VARS = {
    },
 }

-if not _managed_nous_tools_enabled():
-    for _hidden_var in (
-        "FIRECRAWL_GATEWAY_URL",
-        "TOOL_GATEWAY_DOMAIN",
-        "TOOL_GATEWAY_SCHEME",
-        "TOOL_GATEWAY_USER_TOKEN",
-    ):
-        OPTIONAL_ENV_VARS.pop(_hidden_var, None)
+# Tool Gateway env vars are always visible — they're useful for
+# self-hosted / custom gateway setups regardless of subscription state.


 def get_missing_env_vars(required_only: bool = False) -> List[Dict[str, Any]]:
@@ -1953,6 +2094,52 @@ def print_config_warnings(config: Optional[Dict[str, Any]] = None) -> None:
    sys.stderr.write("\n".join(lines) + "\n\n")


+def warn_deprecated_cwd_env_vars(config: Optional[Dict[str, Any]] = None) -> None:
+    """Warn if MESSAGING_CWD or TERMINAL_CWD is set in .env instead of config.yaml.
+
+    These env vars are deprecated — the canonical setting is terminal.cwd
+    in config.yaml.  Prints a migration hint to stderr.
+    """
+    import os, sys
+    messaging_cwd = os.environ.get("MESSAGING_CWD")
+    terminal_cwd_env = os.environ.get("TERMINAL_CWD")
+
+    if config is None:
+        try:
+            config = load_config()
+        except Exception:
+            return
+
+    terminal_cfg = config.get("terminal", {})
+    config_cwd = terminal_cfg.get("cwd", ".") if isinstance(terminal_cfg, dict) else "."
+    # Only warn if config.yaml doesn't have an explicit path
+    config_has_explicit_cwd = config_cwd not in (".", "auto", "cwd", "")
+
+    lines: list[str] = []
+    if messaging_cwd:
+        lines.append(
+            f"  \033[33m⚠\033[0m MESSAGING_CWD={messaging_cwd} found in .env — "
+            f"this is deprecated."
+        )
+    if terminal_cwd_env and not config_has_explicit_cwd:
+        # TERMINAL_CWD in env but not from config bridge — likely from .env
+        lines.append(
+            f"  \033[33m⚠\033[0m TERMINAL_CWD={terminal_cwd_env} found in .env — "
+            f"this is deprecated."
+        )
+    if lines:
+        hint_path = os.environ.get("HERMES_HOME", "~/.hermes")
+        lines.insert(0, "\033[33m⚠ Deprecated .env settings detected:\033[0m")
+        lines.append(
+            f"  \033[2mMove to config.yaml instead:  "
+            f"terminal:\\n    cwd: /your/project/path\033[0m"
+        )
+        lines.append(
+            f"  \033[2mThen remove the old entries from {hint_path}/.env\033[0m"
+        )
+        sys.stderr.write("\n".join(lines) + "\n\n")
+
+
 def migrate_config(interactive: bool = True, quiet: bool = False) -> Dict[str, Any]:
    """
    Migrate config to latest version, prompting for new required fields.
@@ -2847,12 +3034,25 @@ def save_env_value(key: str, value: str):
        lines.append(f"{key}={value}\n")
    
    fd, tmp_path = tempfile.mkstemp(dir=str(env_path.parent), suffix='.tmp', prefix='.env_')
+    # Preserve original permissions so Docker volume mounts aren't clobbered.
+    original_mode = None
+    if env_path.exists():
+        try:
+            original_mode = stat.S_IMODE(env_path.stat().st_mode)
+        except OSError:
+            pass
    try:
        with os.fdopen(fd, 'w', **write_kw) as f:
            f.writelines(lines)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, env_path)
+        # Restore original permissions before _secure_file may tighten them.
+        if original_mode is not None:
+            try:
+                os.chmod(env_path, original_mode)
+            except OSError:
+                pass
    except BaseException:
        try:
            os.unlink(tmp_path)
@@ -2863,13 +3063,6 @@ def save_env_value(key: str, value: str):

    os.environ[key] = value

-    # Restrict .env permissions to owner-only (contains API keys)
-    if not _IS_WINDOWS:
-        try:
-            os.chmod(env_path, stat.S_IRUSR | stat.S_IWUSR)
-        except OSError:
-            pass
-

 def remove_env_value(key: str) -> bool:
    """Remove a key from ~/.hermes/.env and os.environ.
@@ -2898,12 +3091,23 @@ def remove_env_value(key: str) -> bool:

    if found:
        fd, tmp_path = tempfile.mkstemp(dir=str(env_path.parent), suffix='.tmp', prefix='.env_')
+        # Preserve original permissions so Docker volume mounts aren't clobbered.
+        original_mode = None
+        try:
+            original_mode = stat.S_IMODE(env_path.stat().st_mode)
+        except OSError:
+            pass
        try:
            with os.fdopen(fd, 'w', **write_kw) as f:
                f.writelines(new_lines)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp_path, env_path)
+            if original_mode is not None:
+                try:
+                    os.chmod(env_path, original_mode)
+                except OSError:
+                    pass
        except BaseException:
            try:
                os.unlink(tmp_path)
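The permission-preserving rewrite added to both `save_env_value` and `remove_env_value` above can be sketched in isolation. This is a minimal illustration, assuming a POSIX filesystem; `replace_preserving_mode` is a hypothetical helper name and is not part of this PR.

```python
import os
import stat
import tempfile
from pathlib import Path


def replace_preserving_mode(path: Path, text: str) -> None:
    """Atomically rewrite *path*, keeping its existing permission bits.

    Hypothetical helper for illustration only.
    """
    # Record the current mode (if the file exists) before replacing it.
    original_mode = None
    if path.exists():
        try:
            original_mode = stat.S_IMODE(path.stat().st_mode)
        except OSError:
            pass

    # mkstemp creates the temp file with mode 0600, so os.replace alone
    # would silently tighten a file that was, for example, 0640.
    fd, tmp_path = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file on any failure, then re-raise.
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        raise

    # Put the recorded mode back on the replaced file (best effort).
    if original_mode is not None:
        try:
            os.chmod(path, original_mode)
        except OSError:
            pass
```

A brand-new file keeps the 0600 mode that `mkstemp` assigns, which is why the explicit owner-only `chmod` could be dropped from `save_env_value` without loosening permissions on fresh installs.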
@@ -166,6 +166,7 @@ def curses_radiolist(
    selected: int = 0,
    *,
    cancel_returns: int | None = None,
+    description: str | None = None,
 ) -> int:
    """Curses single-select radio list. Returns the selected index.

@@ -174,6 +175,9 @@ def curses_radiolist(
        items: Display labels for each row.
        selected: Index that starts selected (pre-selected).
        cancel_returns: Returned on ESC/q. Defaults to the original *selected*.
+        description: Optional multi-line text shown between the title and
+            the item list.  Useful for context that should survive the
+            curses screen clear.
    """
    if cancel_returns is None:
        cancel_returns = selected
@@ -181,6 +185,10 @@ def curses_radiolist(
    if not sys.stdin.isatty():
        return cancel_returns

+    desc_lines: list[str] = []
+    if description:
+        desc_lines = description.splitlines()
+
    try:
        import curses
        result_holder: list = [None]
@@ -199,22 +207,35 @@ def curses_radiolist(
                stdscr.clear()
                max_y, max_x = stdscr.getmaxyx()

+                row = 0
+
                # Header
                try:
                    hattr = curses.A_BOLD
                    if curses.has_colors():
                        hattr |= curses.color_pair(2)
-                    stdscr.addnstr(0, 0, title, max_x - 1, hattr)
+                    stdscr.addnstr(row, 0, title, max_x - 1, hattr)
+                    row += 1
+
+                    # Description lines
+                    for dline in desc_lines:
+                        if row >= max_y - 1:
+                            break
+                        stdscr.addnstr(row, 0, dline, max_x - 1, curses.A_NORMAL)
+                        row += 1
+
                    stdscr.addnstr(
-                        1, 0,
+                        row, 0,
                        "  \u2191\u2193 navigate  ENTER/SPACE select  ESC cancel",
                        max_x - 1, curses.A_DIM,
                    )
+                    row += 1
                except curses.error:
                    pass

                # Scrollable item list
-                visible_rows = max_y - 4
+                items_start = row + 1
+                visible_rows = max(1, max_y - items_start - 1)
                if cursor < scroll_offset:
                    scroll_offset = cursor
                elif cursor >= scroll_offset + visible_rows:
@@ -223,7 +244,7 @@ def curses_radiolist(
                for draw_i, i in enumerate(
                    range(scroll_offset, min(len(items), scroll_offset + visible_rows))
                ):
-                    y = draw_i + 3
+                    y = draw_i + items_start
                    if y >= max_y - 1:
                        break
                    radio = "\u25cf" if i == selected else "\u25cb"
@@ -27,6 +27,110 @@ _DPASTE_COM_URL = "https://dpaste.com/api/"
 # paste.rs caps at ~1 MB; we stay under that with headroom.
 _MAX_LOG_BYTES = 512_000

+# Auto-delete pastes after this many seconds (6 hours).
+_AUTO_DELETE_SECONDS = 21600
+
+
+# ---------------------------------------------------------------------------
+# Privacy / delete helpers
+# ---------------------------------------------------------------------------
+
+_PRIVACY_NOTICE = """\
+⚠️  This will upload the following to a public paste service:
+  • System info (OS, Python version, Hermes version, provider, which API keys
+    are configured — NOT the actual keys)
+  • Recent log lines (agent.log, errors.log, gateway.log — may contain
+    conversation fragments and file paths)
+  • Full agent.log and gateway.log (up to 512 KB each — likely contains
+    conversation content, tool outputs, and file paths)
+
+Pastes auto-delete after 6 hours.
+"""
+
+_GATEWAY_PRIVACY_NOTICE = (
+    "⚠️ **Privacy notice:** This uploads system info + recent log tails "
+    "(may contain conversation fragments) to a public paste service. "
+    "Full logs are NOT included from the gateway — use `hermes debug share` "
+    "from the CLI for full log uploads.\n"
+    "Pastes auto-delete after 6 hours."
+)
+
+
+def _extract_paste_id(url: str) -> Optional[str]:
+    """Extract the paste ID from a paste.rs URL.
+
+    Returns the ID string, or None if the URL is not a paste.rs URL (e.g. dpaste.com).
+    """
+    url = url.strip().rstrip("/")
+    for prefix in ("https://paste.rs/", "http://paste.rs/"):
+        if url.startswith(prefix):
+            return url[len(prefix):]
+    return None
+
+
+def delete_paste(url: str) -> bool:
+    """Delete a paste from paste.rs.  Returns True on success.
+
+    Only paste.rs supports unauthenticated DELETE.  dpaste.com pastes
+    expire automatically but cannot be deleted via API.
+    """
+    paste_id = _extract_paste_id(url)
+    if not paste_id:
+        raise ValueError(
+            f"Cannot delete: only paste.rs URLs are supported.  Got: {url}"
+        )
+
+    target = f"{_PASTE_RS_URL}{paste_id}"
+    req = urllib.request.Request(
+        target, method="DELETE",
+        headers={"User-Agent": "hermes-agent/debug-share"},
+    )
+    with urllib.request.urlopen(req, timeout=30) as resp:
+        return 200 <= resp.status < 300
+
+
+def _schedule_auto_delete(urls: list[str], delay_seconds: int = _AUTO_DELETE_SECONDS):
+    """Spawn a detached process to delete paste.rs pastes after *delay_seconds*.
+
+    The child process is fully detached (``start_new_session=True``) so it
+    survives the parent exiting (important for CLI mode).  Only paste.rs
+    URLs are attempted — dpaste.com pastes auto-expire on their own.
+    """
+    import subprocess
+
+    paste_rs_urls = [u for u in urls if _extract_paste_id(u)]
+    if not paste_rs_urls:
+        return
+
+    # Build a tiny inline Python script.  No imports beyond stdlib.
+    url_list = ", ".join(repr(u) for u in paste_rs_urls)
+    script = (
+        "import time, urllib.request; "
+        f"time.sleep({delay_seconds}); "
+        f"[urllib.request.urlopen(urllib.request.Request(u, method='DELETE', "
+        f"headers={{'User-Agent': 'hermes-agent/auto-delete'}}), timeout=15) "
+        f"for u in [{url_list}]]"
+    )
+
+    try:
+        subprocess.Popen(
+            [sys.executable, "-c", script],
+            start_new_session=True,
+            stdout=subprocess.DEVNULL,
+            stderr=subprocess.DEVNULL,
+        )
+    except Exception:
+        pass  # Best-effort; manual delete still available.
+
+
+def _delete_hint(url: str) -> str:
+    """Return a one-liner delete command for the given paste URL."""
+    paste_id = _extract_paste_id(url)
+    if paste_id:
+        return f"hermes debug delete {url}"
+    # dpaste.com — no API delete, expires on its own.
+    return "(auto-expires per dpaste.com policy)"
+

 def _upload_paste_rs(content: str) -> str:
    """Upload to paste.rs.  Returns the paste URL.
@@ -250,6 +354,9 @@ def run_debug_share(args):
    expiry = getattr(args, "expire", 7)
    local_only = getattr(args, "local", False)

+    if not local_only:
+        print(_PRIVACY_NOTICE)
+
    print("Collecting debug report...")

    # Capture dump once — prepended to every paste for context.
@@ -315,22 +422,56 @@ def run_debug_share(args):
    if failures:
        print(f"\n  (failed to upload: {', '.join(failures)})")

+    # Schedule auto-deletion after 6 hours
+    _schedule_auto_delete(list(urls.values()))
+    print("\n⏱  Pastes will auto-delete in 6 hours.")
+
+    # Manual delete fallback
+    print("To delete now:  hermes debug delete <url>")
+
    print(f"\nShare these links with the Hermes team for support.")


+def run_debug_delete(args):
+    """Delete one or more paste URLs uploaded by /debug."""
+    urls = getattr(args, "urls", [])
+    if not urls:
+        print("Usage: hermes debug delete <url> [<url> ...]")
+        print("  Deletes paste.rs pastes uploaded by 'hermes debug share'.")
+        return
+
+    for url in urls:
+        try:
+            ok = delete_paste(url)
+            if ok:
+                print(f"  ✓ Deleted: {url}")
+            else:
+                print(f"  ✗ Failed to delete: {url} (unexpected response)")
+        except ValueError as exc:
+            print(f"  ✗ {exc}")
+        except Exception as exc:
+            print(f"  ✗ Could not delete {url}: {exc}")
+
+
 def run_debug(args):
    """Route debug subcommands."""
    subcmd = getattr(args, "debug_command", None)
    if subcmd == "share":
        run_debug_share(args)
+    elif subcmd == "delete":
+        run_debug_delete(args)
    else:
        # Default: show help
-        print("Usage: hermes debug share [--lines N] [--expire N] [--local]")
+        print("Usage: hermes debug <command>")
        print()
        print("Commands:")
        print("  share    Upload debug report to a paste service and print URL")
+        print("  delete   Delete a previously uploaded paste")
        print()
-        print("Options:")
+        print("Options (share):")
        print("  --lines N    Number of log lines to include (default: 200)")
        print("  --expire N   Paste expiry in days (default: 7)")
        print("  --local      Print report locally instead of uploading")
+        print()
+        print("Options (delete):")
+        print("  <url> ...    One or more paste URLs to delete")
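The inline script that `_schedule_auto_delete` hands to a detached child process can be built so that unusual characters in a URL cannot break the generated source, and so that one failed DELETE does not stop the rest. The following is a sketch under stated assumptions: `build_delete_script` is a hypothetical helper, not a function in this PR, and it only builds the source string without sending any request.

```python
def build_delete_script(urls: list[str], delay_seconds: int) -> str:
    """Return Python source that sleeps, then sends DELETE to each URL.

    Hypothetical helper for illustration only.
    """
    # repr() of a list of str is a valid Python literal even when a URL
    # contains single or double quotes, so no manual quoting is needed.
    url_list = repr(list(urls))
    return (
        "import time, urllib.request\n"
        f"time.sleep({int(delay_seconds)})\n"
        f"for u in {url_list}:\n"
        "    try:\n"
        "        req = urllib.request.Request(u, method='DELETE')\n"
        "        urllib.request.urlopen(req, timeout=15)\n"
        "    except Exception:\n"
        "        pass  # best effort: continue with the next URL\n"
    )
```

The returned string could be passed as `[sys.executable, "-c", script]` to `subprocess.Popen` with `start_new_session=True`, the same way the hunk above launches its one-liner.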
@@ -373,7 +373,11 @@ def run_doctor(args):
    print(color("◆ Auth Providers", Colors.CYAN, Colors.BOLD))

    try:
-        from hermes_cli.auth import get_nous_auth_status, get_codex_auth_status
+        from hermes_cli.auth import (
+            get_nous_auth_status,
+            get_codex_auth_status,
+            get_gemini_oauth_auth_status,
+        )

        nous_status = get_nous_auth_status()
        if nous_status.get("logged_in"):
@@ -388,6 +392,20 @@ def run_doctor(args):
            check_warn("OpenAI Codex auth", "(not logged in)")
            if codex_status.get("error"):
                check_info(codex_status["error"])
+
+        gemini_status = get_gemini_oauth_auth_status()
+        if gemini_status.get("logged_in"):
+            email = gemini_status.get("email") or ""
+            project = gemini_status.get("project_id") or ""
+            pieces = []
+            if email:
+                pieces.append(email)
+            if project:
+                pieces.append(f"project={project}")
+            suffix = f" ({', '.join(pieces)})" if pieces else ""
+            check_ok("Google Gemini OAuth", f"(logged in{suffix})")
+        else:
+            check_warn("Google Gemini OAuth", "(not logged in)")
    except Exception as e:
        check_warn("Auth provider status", f"(could not check: {e})")

@@ -814,7 +832,8 @@ def run_doctor(args):
        ("Vercel AI Gateway",       ("AI_GATEWAY_API_KEY",),                          "https://ai-gateway.vercel.sh/v1/models", "AI_GATEWAY_BASE_URL", True),
        ("Kilo Code",        ("KILOCODE_API_KEY",),                            "https://api.kilo.ai/api/gateway/models",  "KILOCODE_BASE_URL", True),
        ("OpenCode Zen",     ("OPENCODE_ZEN_API_KEY",),                        "https://opencode.ai/zen/v1/models",  "OPENCODE_ZEN_BASE_URL", True),
-        ("OpenCode Go",      ("OPENCODE_GO_API_KEY",),                         "https://opencode.ai/zen/go/v1/models", "OPENCODE_GO_BASE_URL", True),
+        # OpenCode Go has no shared /models endpoint; skip the health check.
+        ("OpenCode Go",      ("OPENCODE_GO_API_KEY",),                         None,                                  "OPENCODE_GO_BASE_URL", False),
    ]
    for _pname, _env_vars, _default_url, _base_env, _supports_health_check in _apikey_providers:
        _key = ""
@@ -859,6 +878,31 @@ def run_doctor(args):
            except Exception as _e:
                print(f"\r  {color('⚠', Colors.YELLOW)} {_label} {color(f'({_e})', Colors.DIM)}           ")

+    # -- AWS Bedrock --
+    # Bedrock uses the AWS SDK credential chain, not API keys.
+    try:
+        from agent.bedrock_adapter import has_aws_credentials, resolve_aws_auth_env_var, resolve_bedrock_region
+        if has_aws_credentials():
+            _auth_var = resolve_aws_auth_env_var()
+            _region = resolve_bedrock_region()
+            _label = "AWS Bedrock".ljust(20)
+            print("  Checking AWS Bedrock...", end="", flush=True)
+            try:
+                import boto3
+                _br_client = boto3.client("bedrock", region_name=_region)
+                _br_resp = _br_client.list_foundation_models()
+                _model_count = len(_br_resp.get("modelSummaries", []))
+                print(f"\r  {color('✓', Colors.GREEN)} {_label} {color(f'({_auth_var}, {_region}, {_model_count} models)', Colors.DIM)}           ")
+            except ImportError:
+                print(f"\r  {color('⚠', Colors.YELLOW)} {_label} {color('(boto3 not installed — pip install hermes-agent[bedrock])', Colors.DIM)}           ")
+                issues.append("Install boto3 for Bedrock: pip install hermes-agent[bedrock]")
+            except Exception as _e:
+                _err_name = type(_e).__name__
+                print(f"\r  {color('⚠', Colors.YELLOW)} {_label} {color(f'({_err_name}: {_e})', Colors.DIM)}           ")
+                issues.append(f"AWS Bedrock: {_err_name} — check IAM permissions for bedrock:ListFoundationModels")
+    except ImportError:
+        pass  # bedrock_adapter not available — skip silently
+
    # =========================================================================
    # Check: Submodules
    # =========================================================================
@@ -222,7 +222,7 @@ def find_gateway_pids(exclude_pids: set | None = None, all_profiles: bool = Fals
                    current_cmd = ""
        else:
            result = subprocess.run(
-                ["ps", "eww", "-ax", "-o", "pid=,command="],
+                ["ps", "-A", "eww", "-o", "pid=,command="],
                capture_output=True,
                text=True,
                timeout=10,
@@ -1118,6 +1118,8 @@ def select_provider_and_model(args=None):
        _model_flow_openai_codex(config, current_model)
    elif selected_provider == "qwen-oauth":
        _model_flow_qwen_oauth(config, current_model)
+    elif selected_provider == "google-gemini-cli":
+        _model_flow_google_gemini_cli(config, current_model)
    elif selected_provider == "copilot-acp":
        _model_flow_copilot_acp(config, current_model)
    elif selected_provider == "copilot":
@@ -1139,7 +1141,9 @@ def select_provider_and_model(args=None):
        _model_flow_anthropic(config, current_model)
    elif selected_provider == "kimi-coding":
        _model_flow_kimi(config, current_model)
-    elif selected_provider in ("gemini", "deepseek", "xai", "zai", "kimi-coding-cn", "minimax", "minimax-cn", "kilocode", "opencode-zen", "opencode-go", "ai-gateway", "alibaba", "huggingface", "xiaomi", "arcee"):
+    elif selected_provider == "bedrock":
+        _model_flow_bedrock(config, current_model)
+    elif selected_provider in ("gemini", "deepseek", "xai", "zai", "kimi-coding-cn", "minimax", "minimax-cn", "kilocode", "opencode-zen", "opencode-go", "ai-gateway", "alibaba", "huggingface", "xiaomi", "arcee", "ollama-cloud"):
        _model_flow_api_key_provider(config, selected_provider, current_model)

    # ── Post-switch cleanup: clear stale OPENAI_BASE_URL ──────────────
@@ -1275,11 +1279,8 @@ def _model_flow_nous(config, current_model="", args=None):
        AuthError, format_auth_error,
        _login_nous, PROVIDER_REGISTRY,
    )
-    from hermes_cli.config import get_env_value, save_config, save_env_value
-    from hermes_cli.nous_subscription import (
-        apply_nous_provider_defaults,
-        get_nous_subscription_explainer_lines,
-    )
+    from hermes_cli.config import get_env_value, load_config, save_config, save_env_value
+    from hermes_cli.nous_subscription import prompt_enable_tool_gateway
    import argparse

    state = get_provider_auth_state("nous")
@@ -1298,9 +1299,12 @@ def _model_flow_nous(config, current_model="", args=None):
                insecure=bool(getattr(args, "insecure", False)),
            )
            _login_nous(mock_args, PROVIDER_REGISTRY["nous"])
-            print()
-            for line in get_nous_subscription_explainer_lines():
-                print(line)
+            # Offer Tool Gateway enablement for paid subscribers
+            try:
+                _refreshed = load_config() or {}
+                prompt_enable_tool_gateway(_refreshed)
+            except Exception:
+                pass
        except SystemExit:
            print("Login cancelled or failed.")
            return
@@ -1408,18 +1412,10 @@ def _model_flow_nous(config, current_model="", args=None):
        if get_env_value("OPENAI_BASE_URL"):
            save_env_value("OPENAI_BASE_URL", "")
            save_env_value("OPENAI_API_KEY", "")
-        changed_defaults = apply_nous_provider_defaults(config)
        save_config(config)
        print(f"Default model set to: {selected} (via Nous Portal)")
-        if "tts" in changed_defaults:
-            print("TTS provider set to: OpenAI TTS via your Nous subscription")
-        else:
-            current_tts = str(config.get("tts", {}).get("provider") or "edge")
-            if current_tts.lower() not in {"", "edge"}:
-                print(f"Keeping your existing TTS provider: {current_tts}")
-        print()
-        for line in get_nous_subscription_explainer_lines():
-            print(line)
+        # Offer Tool Gateway enablement for paid subscribers
+        prompt_enable_tool_gateway(config)
    else:
        print("No change.")

@@ -1526,6 +1522,76 @@ def _model_flow_qwen_oauth(_config, current_model=""):
        print("No change.")


+def _model_flow_google_gemini_cli(_config, current_model=""):
+    """Google Gemini OAuth (PKCE) via Cloud Code Assist — supports free AND paid tiers.
+
+    Flow:
+      1. Show upfront warning about Google's ToS stance (per opencode-gemini-auth).
+      2. If creds missing, run PKCE browser OAuth via agent.google_oauth.
+      3. Resolve project context (env -> config -> auto-discover -> free tier).
+      4. Prompt user to pick a model.
+      5. Save to ~/.hermes/config.yaml.
+    """
+    from hermes_cli.auth import (
+        DEFAULT_GEMINI_CLOUDCODE_BASE_URL,
+        get_gemini_oauth_auth_status,
+        resolve_gemini_oauth_runtime_credentials,
+        _prompt_model_selection,
+        _save_model_choice,
+        _update_config_for_provider,
+    )
+    from hermes_cli.models import _PROVIDER_MODELS
+
+    print()
+    print("⚠  Google considers using the Gemini CLI OAuth client with third-party")
+    print("   software a policy violation. Some users have reported account")
+    print("   restrictions. You can use your own API key via 'gemini' provider")
+    print("   for the lowest-risk experience.")
+    print()
+    try:
+        proceed = input("Continue with OAuth login? [y/N]: ").strip().lower()
+    except (EOFError, KeyboardInterrupt):
+        print("Cancelled.")
+        return
+    if proceed not in {"y", "yes"}:
+        print("Cancelled.")
+        return
+
+    status = get_gemini_oauth_auth_status()
+    if not status.get("logged_in"):
+        try:
+            from agent.google_oauth import resolve_project_id_from_env, start_oauth_flow
+
+            env_project = resolve_project_id_from_env()
+            start_oauth_flow(force_relogin=True, project_id=env_project)
+        except Exception as exc:
+            print(f"OAuth login failed: {exc}")
+            return
+
+    # Verify creds resolve + trigger project discovery
+    try:
+        creds = resolve_gemini_oauth_runtime_credentials(force_refresh=False)
+        project_id = creds.get("project_id", "")
+        if project_id:
+            print(f"  Using GCP project: {project_id}")
+        else:
+            print("  No GCP project configured — free tier will be auto-provisioned on first request.")
+    except Exception as exc:
+        print(f"Failed to resolve Gemini credentials: {exc}")
+        return
+
+    models = list(_PROVIDER_MODELS.get("google-gemini-cli") or [])
+    default = current_model or (models[0] if models else "gemini-2.5-flash")
+    selected = _prompt_model_selection(models, current_model=default)
+    if selected:
+        _save_model_choice(selected)
+        _update_config_for_provider("google-gemini-cli", DEFAULT_GEMINI_CLOUDCODE_BASE_URL)
+        print(f"Default model set to: {selected} (via Google Gemini OAuth / Code Assist)")
+    else:
+        print("No change.")
+
+
+

 def _model_flow_custom(config):
    """Custom endpoint: collect URL, API key, and model name.
@@ -1566,6 +1632,27 @@ def _model_flow_custom(config):

    effective_key = api_key or current_key

+    # Hint: most local model servers (Ollama, vLLM, llama.cpp) require /v1
+    # in the base URL for OpenAI-compatible chat completions.  Prompt the
+    # user if the URL looks like a local server without /v1.
+    _url_lower = effective_url.rstrip("/").lower()
+    _looks_local = any(h in _url_lower for h in ("localhost", "127.0.0.1", "0.0.0.0", ":11434", ":8080", ":5000"))
+    if _looks_local and not _url_lower.endswith("/v1"):
+        print()
+        print("  Hint: Did you mean to add /v1 at the end?")
+        print("  Most local model servers (Ollama, vLLM, llama.cpp) require it.")
+        print(f"  e.g. {effective_url.rstrip('/')}/v1")
+        try:
+            _add_v1 = input("  Add /v1? [Y/n]: ").strip().lower()
+        except (KeyboardInterrupt, EOFError):
+            _add_v1 = "n"
+        if _add_v1 in ("", "y", "yes"):
+            effective_url = effective_url.rstrip("/") + "/v1"
+            if base_url:
+                base_url = effective_url
+            print(f"  Updated URL: {effective_url}")
+        print()
+
    from hermes_cli.models import probe_api_models

    probe = probe_api_models(effective_key, effective_url)
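Reviewer sketch (not part of this patch): the /v1 hint in the hunk above can be isolated into a pure function, which would make the heuristic easy to unit test without driving `input()`. The name `suggest_v1_url` is hypothetical; the host and port markers are copied from the hunk.

```python
from typing import Optional

# Markers copied from the hunk above; a URL containing any of them is
# treated as "probably a local model server".
_LOCAL_MARKERS = ("localhost", "127.0.0.1", "0.0.0.0", ":11434", ":8080", ":5000")


def suggest_v1_url(url: str) -> Optional[str]:
    """Return url with /v1 appended if it looks local and lacks /v1, else None.

    Hypothetical helper for illustration; not a function in hermes_cli.
    """
    trimmed = url.rstrip("/")
    lowered = trimmed.lower()
    looks_local = any(marker in lowered for marker in _LOCAL_MARKERS)
    if looks_local and not lowered.endswith("/v1"):
        return trimmed + "/v1"
    return None
```

A caller would only prompt the user when the function returns a suggestion, which keeps the interactive part down to a single `input()` call.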
@@ -2425,6 +2512,252 @@ def _model_flow_kimi(config, current_model=""):
        print("No change.")


+def _model_flow_bedrock_api_key(config, region, current_model=""):
+    """Bedrock API Key mode — uses the OpenAI-compatible bedrock-mantle endpoint.
+
+    For developers who don't have an AWS account but received a Bedrock API Key
+    from their AWS admin. Works like any OpenAI-compatible endpoint.
+    """
+    from hermes_cli.auth import _prompt_model_selection, _save_model_choice, deactivate_provider
+    from hermes_cli.config import load_config, save_config, get_env_value, save_env_value
+    from hermes_cli.models import _PROVIDER_MODELS
+
+    mantle_base_url = f"https://bedrock-mantle.{region}.api.aws/v1"
+
+    # Prompt for API key
+    existing_key = get_env_value("AWS_BEARER_TOKEN_BEDROCK") or ""
+    if existing_key:
+        print(f"  Bedrock API Key: {existing_key[:12]}... ✓")
+    else:
+        print(f"  Endpoint: {mantle_base_url}")
+        print()
+        try:
+            import getpass
+            api_key = getpass.getpass("  Bedrock API Key: ").strip()
+        except (KeyboardInterrupt, EOFError):
+            print()
+            return
+        if not api_key:
+            print("  Cancelled.")
+            return
+        save_env_value("AWS_BEARER_TOKEN_BEDROCK", api_key)
+        existing_key = api_key
+        print("  ✓ API key saved.")
+    print()
+
+    # Model selection — use static list (mantle doesn't need boto3 for discovery)
+    model_list = _PROVIDER_MODELS.get("bedrock", [])
+    print(f"  Showing {len(model_list)} curated models")
+
+    if model_list:
+        selected = _prompt_model_selection(model_list, current_model=current_model)
+    else:
+        try:
+            selected = input("  Model ID: ").strip()
+        except (KeyboardInterrupt, EOFError):
+            selected = None
+
+    if selected:
+        _save_model_choice(selected)
+
+        # Save as custom provider pointing to bedrock-mantle
+        cfg = load_config()
+        model = cfg.get("model")
+        if not isinstance(model, dict):
+            model = {"default": model} if model else {}
+            cfg["model"] = model
+        model["provider"] = "custom"
+        model["base_url"] = mantle_base_url
+        model.pop("api_mode", None)  # chat_completions is the default
+
+        # Also save region in bedrock config for reference
+        bedrock_cfg = cfg.get("bedrock", {})
+        if not isinstance(bedrock_cfg, dict):
+            bedrock_cfg = {}
+        bedrock_cfg["region"] = region
+        cfg["bedrock"] = bedrock_cfg
+
+        # Mirror the key value and endpoint into OPENAI_API_KEY / OPENAI_BASE_URL for the custom provider
+        save_env_value("OPENAI_API_KEY", existing_key)
+        save_env_value("OPENAI_BASE_URL", mantle_base_url)
+
+        save_config(cfg)
+        deactivate_provider()
+
+        print(f"  Default model set to: {selected} (via Bedrock API Key, {region})")
+        print(f"  Endpoint: {mantle_base_url}")
+    else:
+        print("  No change.")
+
+
+def _model_flow_bedrock(config, current_model=""):
+    """AWS Bedrock provider: verify credentials, pick region, discover models.
+
+    Uses the native Converse API via boto3 — not the OpenAI-compatible endpoint.
+    Auth is handled by the AWS SDK default credential chain (env vars, profile,
+    instance role), so no API key prompt is needed.
+    """
+    from hermes_cli.auth import _prompt_model_selection, _save_model_choice, deactivate_provider
+    from hermes_cli.config import load_config, save_config
+    from hermes_cli.models import _PROVIDER_MODELS
+
+    # 1. Check for AWS credentials
+    try:
+        from agent.bedrock_adapter import (
+            has_aws_credentials,
+            resolve_aws_auth_env_var,
+            resolve_bedrock_region,
+            discover_bedrock_models,
+        )
+    except ImportError:
+        print("  ✗ boto3 is not installed. Install it with:")
+        print("    pip install boto3")
+        print()
+        return
+
+    if not has_aws_credentials():
+        print("  ⚠ No AWS credentials detected via environment variables.")
+        print("  Bedrock will use boto3's default credential chain (IMDS, SSO, etc.)")
+        print()
+
+    auth_var = resolve_aws_auth_env_var()
+    if auth_var:
+        print(f"  AWS credentials: {auth_var} ✓")
+    else:
+        print("  AWS credentials: boto3 default chain (instance role / SSO)")
+    print()
+
+    # 2. Region selection
+    current_region = resolve_bedrock_region()
+    try:
+        region_input = input(f"  AWS Region [{current_region}]: ").strip()
+    except (KeyboardInterrupt, EOFError):
+        print()
+        return
+    region = region_input or current_region
+
+    # 2b. Authentication mode
+    print("  Choose authentication method:")
+    print()
+    print("    1. IAM credential chain (recommended)")
+    print("       Works with EC2 instance roles, SSO, env vars, aws configure")
+    print("    2. Bedrock API Key")
+    print("       Enter your Bedrock API Key directly — also supports")
+    print("       team scenarios where an admin distributes keys")
+    print()
+    try:
+        auth_choice = input("  Choice [1]: ").strip()
+    except (KeyboardInterrupt, EOFError):
+        print()
+        return
+
+    if auth_choice == "2":
+        _model_flow_bedrock_api_key(config, region, current_model)
+        return
+
+    # 3. Model discovery — try live API first, fall back to static list
+    print(f"  Discovering models in {region}...")
+    live_models = discover_bedrock_models(region)
+
+    if live_models:
+        _EXCLUDE_PREFIXES = (
+            "stability.", "cohere.embed", "twelvelabs.", "us.stability.",
+            "us.cohere.embed", "us.twelvelabs.", "global.cohere.embed",
+            "global.twelvelabs.",
+        )
+        _EXCLUDE_SUBSTRINGS = ("safeguard", "voxtral", "palmyra-vision")
+        filtered = []
+        for m in live_models:
+            mid = m["id"]
+            if any(mid.startswith(p) for p in _EXCLUDE_PREFIXES):
+                continue
+            if any(s in mid.lower() for s in _EXCLUDE_SUBSTRINGS):
+                continue
+            filtered.append(m)
+
+        # Deduplicate: prefer inference profiles (us.*, global.*) over bare
+        # foundation model IDs.
+        profile_base_ids = set()
+        for m in filtered:
+            mid = m["id"]
+            if mid.startswith(("us.", "global.")):
+                base = mid.split(".", 1)[1] if "." in mid[3:] else mid
+                profile_base_ids.add(base)
+
+        deduped = []
+        for m in filtered:
+            mid = m["id"]
+            if not mid.startswith(("us.", "global.")) and mid in profile_base_ids:
+                continue
+            deduped.append(m)
+
+        _RECOMMENDED = [
+            "us.anthropic.claude-sonnet-4-6",
+            "us.anthropic.claude-opus-4-6",
+            "us.anthropic.claude-haiku-4-5",
+            "us.amazon.nova-pro",
+            "us.amazon.nova-lite",
+            "us.amazon.nova-micro",
+            "deepseek.v3",
+            "us.meta.llama4-maverick",
+            "us.meta.llama4-scout",
+        ]
+
+        def _sort_key(m):
+            mid = m["id"]
+            for i, rec in enumerate(_RECOMMENDED):
+                if mid.startswith(rec):
+                    return (0, i, mid)
+            if mid.startswith("global."):
+                return (1, 0, mid)
+            return (2, 0, mid)
+
+        deduped.sort(key=_sort_key)
+        model_list = [m["id"] for m in deduped]
+        print(f"  Found {len(model_list)} text model(s) (filtered from {len(live_models)} total)")
+    else:
+        model_list = _PROVIDER_MODELS.get("bedrock", [])
+        if model_list:
+            print(f"  Using {len(model_list)} curated models (live discovery unavailable)")
+        else:
+            print("  No models found. Check IAM permissions for bedrock:ListFoundationModels.")
+            return
+
+    # 4. Model selection
+    if model_list:
+        selected = _prompt_model_selection(model_list, current_model=current_model)
+    else:
+        try:
+            selected = input("  Model ID: ").strip()
+        except (KeyboardInterrupt, EOFError):
+            selected = None
+
+    if selected:
+        _save_model_choice(selected)
+
+        cfg = load_config()
+        model = cfg.get("model")
+        if not isinstance(model, dict):
+            model = {"default": model} if model else {}
+            cfg["model"] = model
+        model["provider"] = "bedrock"
+        model["base_url"] = f"https://bedrock-runtime.{region}.amazonaws.com"
+        model.pop("api_mode", None)  # bedrock_converse is auto-detected
+
+        bedrock_cfg = cfg.get("bedrock", {})
+        if not isinstance(bedrock_cfg, dict):
+            bedrock_cfg = {}
+        bedrock_cfg["region"] = region
+        cfg["bedrock"] = bedrock_cfg
+
+        save_config(cfg)
+        deactivate_provider()
+
+        print(f"  Default model set to: {selected} (via AWS Bedrock, {region})")
+    else:
+        print("  No change.")
+
+
 def _model_flow_api_key_provider(config, provider_id, current_model=""):
    """Generic flow for API-key providers (z.ai, MiniMax, OpenCode, etc.)."""
    from hermes_cli.auth import (
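Reviewer sketch (not part of this patch): the deduplication step in `_model_flow_bedrock` above prefers inference profiles (`us.*`, `global.*`) over bare foundation model IDs. The version below works on plain ID strings instead of the `{"id": ...}` dicts used in the patch, and `drop_shadowed_base_ids` is a hypothetical name. It also drops the `"." in mid[3:]` guard, on the assumption that every profile ID has a model name after its prefix.

```python
from typing import List

_PROFILE_PREFIXES = ("us.", "global.")


def drop_shadowed_base_ids(model_ids: List[str]) -> List[str]:
    """Remove bare model IDs that are also reachable via an inference profile.

    Illustrative only; the real flow operates on dicts with an "id" key.
    """
    # Collect the base ID behind every profile, e.g.
    # "us.anthropic.claude-x" -> "anthropic.claude-x".
    profile_bases = set()
    for mid in model_ids:
        if mid.startswith(_PROFILE_PREFIXES):
            profile_bases.add(mid.split(".", 1)[1])

    # Keep every profile ID, and keep a bare ID only when no profile covers it.
    return [
        mid
        for mid in model_ids
        if mid.startswith(_PROFILE_PREFIXES) or mid not in profile_bases
    ]
```

Input order is preserved, so the later `_sort_key` pass would still see the same relative ordering among the surviving entries.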
@@ -2486,34 +2819,43 @@ def _model_flow_api_key_provider(config, provider_id, current_model=""):
    #   1. models.dev registry (cached, filtered for agentic/tool-capable models)
    #   2. Curated static fallback list (offline insurance)
    #   3. Live /models endpoint probe (small providers without models.dev data)
-    curated = _PROVIDER_MODELS.get(provider_id, [])
-
-    # Try models.dev first — returns tool-capable models, filtered for noise
-    mdev_models: list = []
-    try:
-        from agent.models_dev import list_agentic_models
-        mdev_models = list_agentic_models(provider_id)
-    except Exception:
-        pass
-
-    if mdev_models:
-        model_list = mdev_models
-        print(f"  Found {len(model_list)} model(s) from models.dev registry")
-    elif curated and len(curated) >= 8:
-        # Curated list is substantial — use it directly, skip live probe
-        model_list = curated
-        print(f"  Showing {len(model_list)} curated models — use \"Enter custom model name\" for others.")
-    else:
+    #
+    # Ollama Cloud: dedicated merged discovery (live API + models.dev + disk cache)
+    if provider_id == "ollama-cloud":
+        from hermes_cli.models import fetch_ollama_cloud_models
        api_key_for_probe = existing_key or (get_env_value(key_env) if key_env else "")
-        live_models = fetch_api_models(api_key_for_probe, effective_base)
-        if live_models and len(live_models) >= len(curated):
-            model_list = live_models
-            print(f"  Found {len(model_list)} model(s) from {pconfig.name} API")
-        else:
+        model_list = fetch_ollama_cloud_models(api_key=api_key_for_probe, base_url=effective_base)
+        if model_list:
+            print(f"  Found {len(model_list)} model(s) from Ollama Cloud")
+    else:
+        curated = _PROVIDER_MODELS.get(provider_id, [])
+
+        # Try models.dev first — returns tool-capable models, filtered for noise
+        mdev_models: list = []
+        try:
+            from agent.models_dev import list_agentic_models
+            mdev_models = list_agentic_models(provider_id)
+        except Exception:
+            pass
+
+        if mdev_models:
+            model_list = mdev_models
+            print(f"  Found {len(model_list)} model(s) from models.dev registry")
+        elif curated and len(curated) >= 8:
+            # Curated list is substantial — use it directly, skip live probe
            model_list = curated
-            if model_list:
-                print(f"  Showing {len(model_list)} curated models — use \"Enter custom model name\" for others.")
-        # else: no defaults either, will fall through to raw input
+            print(f"  Showing {len(model_list)} curated models — use \"Enter custom model name\" for others.")
+        else:
+            api_key_for_probe = existing_key or (get_env_value(key_env) if key_env else "")
+            live_models = fetch_api_models(api_key_for_probe, effective_base)
+            if live_models and len(live_models) >= len(curated):
+                model_list = live_models
+                print(f"  Found {len(model_list)} model(s) from {pconfig.name} API")
+            else:
+                model_list = curated
+                if model_list:
+                    print(f"  Showing {len(model_list)} curated models — use \"Enter custom model name\" for others.")
+            # else: no defaults either, will fall through to raw input

    if provider_id in {"opencode-zen", "opencode-go"}:
        model_list = [normalize_opencode_model_id(provider_id, mid) for mid in model_list]
@@ -4612,7 +4954,7 @@ For more help on a command:
    )
    chat_parser.add_argument(
        "--provider",
-        choices=["auto", "openrouter", "nous", "openai-codex", "copilot-acp", "copilot", "anthropic", "gemini", "huggingface", "zai", "kimi-coding", "kimi-coding-cn", "minimax", "minimax-cn", "kilocode", "xiaomi", "arcee"],
+        choices=["auto", "openrouter", "nous", "openai-codex", "copilot-acp", "copilot", "anthropic", "gemini", "xai", "ollama-cloud", "huggingface", "zai", "kimi-coding", "kimi-coding-cn", "minimax", "minimax-cn", "kilocode", "xiaomi", "arcee"],
        default=None,
        help="Inference provider (default: auto)"
    )
@@ -5073,6 +5415,7 @@ Examples:
    hermes debug share --lines 500  Include more log lines
    hermes debug share --expire 30  Keep paste for 30 days
    hermes debug share --local      Print report locally (no upload)
+    hermes debug delete <url>       Delete a previously uploaded paste
 """,
    )
    debug_sub = debug_parser.add_subparsers(dest="debug_command")
@@ -5092,6 +5435,14 @@ Examples:
        "--local", action="store_true",
        help="Print the report locally instead of uploading",
    )
+    delete_parser = debug_sub.add_parser(
+        "delete",
+        help="Delete a paste uploaded by 'hermes debug share'",
+    )
+    delete_parser.add_argument(
+        "urls", nargs="*", default=[],
+        help="One or more paste URLs to delete (e.g. https://paste.rs/abc123)",
+    )
    debug_parser.set_defaults(func=cmd_debug)

    # =========================================================================
@@ -5372,6 +5723,18 @@ Examples:
    memory_sub.add_parser("setup", help="Interactive provider selection and configuration")
    memory_sub.add_parser("status", help="Show current memory provider config")
    memory_sub.add_parser("off", help="Disable external provider (built-in only)")
+    _reset_parser = memory_sub.add_parser(
+        "reset",
+        help="Erase all built-in memory (MEMORY.md and USER.md)",
+    )
+    _reset_parser.add_argument(
+        "--yes", "-y", action="store_true",
+        help="Skip confirmation prompt",
+    )
+    _reset_parser.add_argument(
+        "--target", choices=["all", "memory", "user"], default="all",
+        help="Which store to reset: 'all' (default), 'memory', or 'user'",
+    )

    def cmd_memory(args):
        sub = getattr(args, "memory_command", None)
@@ -5384,6 +5747,44 @@ Examples:
            save_config(config)
            print("\n  ✓ Memory provider: built-in only")
            print("  Saved to config.yaml\n")
+        elif sub == "reset":
+            from hermes_constants import get_hermes_home, display_hermes_home
+            mem_dir = get_hermes_home() / "memories"
+            target = getattr(args, "target", "all")
+            files_to_reset = []
+            if target in ("all", "memory"):
+                files_to_reset.append(("MEMORY.md", "agent notes"))
+            if target in ("all", "user"):
+                files_to_reset.append(("USER.md", "user profile"))
+
+            # Check what exists
+            existing = [(f, desc) for f, desc in files_to_reset if (mem_dir / f).exists()]
+            if not existing:
+                print(f"\n  Nothing to reset — no memory files found in {display_hermes_home()}/memories/\n")
+                return
+
+            print("\n  This will permanently erase the following memory files:")
+            for f, desc in existing:
+                path = mem_dir / f
+                size = path.stat().st_size
+                print(f"    ◆ {f} ({desc}) — {size:,} bytes")
+
+            if not getattr(args, "yes", False):
+                try:
+                    answer = input("\n  Type 'yes' to confirm: ").strip().lower()
+                except (EOFError, KeyboardInterrupt):
+                    print("\n  Cancelled.\n")
+                    return
+                if answer != "yes":
+                    print("  Cancelled.\n")
+                    return
+
+            for f, desc in existing:
+                (mem_dir / f).unlink()
+                print(f"  ✓ Deleted {f} ({desc})")
+
+            print("\n  Memory reset complete. New sessions will start with a blank slate.")
+            print(f"  Files were in: {display_hermes_home()}/memories/\n")
        else:
            from hermes_cli.memory_setup import memory_command
            memory_command(args)
@@ -5527,6 +5928,13 @@ Examples:
    sessions_export.add_argument("output", help="Output JSONL file path (use - for stdout)")
    sessions_export.add_argument("--source", help="Filter by source")
    sessions_export.add_argument("--session-id", help="Export a specific session")
+    sessions_export.add_argument(
+        "--sanitize",
+        action="store_true",
+        help="Redact user/model content (message text, reasoning, tool args/output, titles, "
+             "system prompt) before export. Structure and metrics are preserved. "
+             "Use when sharing exports for bug reports or training data.",
+    )

    sessions_delete = sessions_subparsers.add_parser("delete", help="Delete a specific session")
    sessions_delete.add_argument("session_id", help="Session ID to delete")
@@ -5596,6 +6004,19 @@ Examples:
                    print(f"{preview:<50} {last_active:<13} {s['source']:<6} {sid}")

        elif action == "export":
+            sanitize = getattr(args, "sanitize", False)
+            if sanitize:
+                try:
+                    from hermes_state import sanitize_session_export as _sanitize_fn
+                except Exception:
+                    _sanitize_fn = None
+                    print("Warning: sanitize_session_export unavailable — exporting raw data.")
+            else:
+                _sanitize_fn = None
+
+            def _maybe_sanitize(d):
+                return _sanitize_fn(d) if _sanitize_fn else d
+
            if args.session_id:
                resolved_session_id = db.resolve_session_id(args.session_id)
                if not resolved_session_id:
@@ -5605,6 +6026,7 @@ Examples:
                if not data:
                    print(f"Session '{args.session_id}' not found.")
                    return
+                data = _maybe_sanitize(data)
                line = _json.dumps(data, ensure_ascii=False) + "\n"
                if args.output == "-":
                    import sys
@@ -5612,18 +6034,20 @@ Examples:
                else:
                    with open(args.output, "w", encoding="utf-8") as f:
                        f.write(line)
-                    print(f"Exported 1 session to {args.output}")
+                    suffix = " (sanitized)" if sanitize and _sanitize_fn else ""
+                    print(f"Exported 1 session to {args.output}{suffix}")
            else:
                sessions = db.export_all(source=args.source)
                if args.output == "-":
                    import sys
                    for s in sessions:
-                        sys.stdout.write(_json.dumps(s, ensure_ascii=False) + "\n")
+                        sys.stdout.write(_json.dumps(_maybe_sanitize(s), ensure_ascii=False) + "\n")
                else:
                    with open(args.output, "w", encoding="utf-8") as f:
                        for s in sessions:
-                            f.write(_json.dumps(s, ensure_ascii=False) + "\n")
-                    print(f"Exported {len(sessions)} sessions to {args.output}")
+                            f.write(_json.dumps(_maybe_sanitize(s), ensure_ascii=False) + "\n")
+                    suffix = " (sanitized)" if sanitize and _sanitize_fn else ""
+                    print(f"Exported {len(sessions)} sessions to {args.output}{suffix}")

        elif action == "delete":
            resolved_session_id = db.resolve_session_id(args.session_id)
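Reviewer sketch (not part of this patch): `hermes_state.sanitize_session_export()` is not shown in this section, so the code below only illustrates the behaviour the commit message describes (deep copy, `[redacted:<kind>:<id>]` tokens, empty values left alone, tool names and ids preserved). The session schema (`id`, `title`, `system_prompt`, `messages`, `tool_calls`) and the function name are assumptions, and `reasoning_details` handling is omitted.

```python
import copy


def _token(kind, ident):
    # Token format taken from the commit message: [redacted:<kind>:<id>]
    return f"[redacted:{kind}:{ident}]"


def sanitize_session_sketch(session: dict) -> dict:
    """Return a redacted deep copy; the input dict is never mutated.

    Hypothetical stand-in, not the hermes_state implementation.
    """
    out = copy.deepcopy(session)
    sid = out.get("id", "unknown")

    # Session-level free text. Empty values stay empty.
    for field in ("title", "system_prompt"):
        if out.get(field):
            out[field] = _token(field, sid)

    for idx, msg in enumerate(out.get("messages") or []):
        mid = msg.get("id", idx)
        for field in ("content", "reasoning"):
            if msg.get(field):
                msg[field] = _token(field, mid)
        # Redact tool-call arguments but keep id, type, and function name.
        for call in msg.get("tool_calls") or []:
            fn = call.get("function") or {}
            if fn.get("arguments"):
                fn["arguments"] = _token("tool_args", call.get("id", mid))
    return out
```

Working on a deep copy matches the "input immutability" test mentioned in the commit message: the caller can still write the raw session elsewhere after sanitizing.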
@@ -6068,8 +6492,13 @@ Examples:
            sys.stderr = _io.StringIO()
            args = parser.parse_args(_processed_argv)
            sys.stderr = _saved_stderr
-        except SystemExit:
+        except SystemExit as exc:
            sys.stderr = _saved_stderr
+            # Help/version flags (exit code 0) already printed output —
+            # re-raise immediately to avoid a second parse_args printing
+            # the same help text again (#10230).
+            if exc.code == 0:
+                raise
            # Subcommand name was consumed as a flag value (e.g. -c model).
            # Fall back to optional subparsers so argparse handles it normally.
            subparsers.required = False
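Reviewer sketch (not part of this patch): the `exc.code == 0` check above can be reproduced with a small standalone parser. `build_parser` and `parse_with_fallback` are hypothetical names, and the stderr capture from the real code is left out. The point is that `--help` exits with code 0 after printing, so retrying the parse would print the help text a second time.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="demo")
    subparsers = parser.add_subparsers(dest="command", required=True)
    subparsers.add_parser("chat")
    return parser, subparsers


def parse_with_fallback(parser, subparsers, argv):
    """Parse argv; on a real error, retry with optional subcommands."""
    try:
        return parser.parse_args(argv)
    except SystemExit as exc:
        # Exit code 0 means help/version output was already printed;
        # re-raise instead of parsing (and printing) again.
        if exc.code == 0:
            raise
        subparsers.required = False
        return parser.parse_args(argv)
```

With an empty argument list, the first parse fails because a subcommand is required, and the retry returns a namespace whose `command` is `None`.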
@@ -58,9 +58,11 @@ def _prompt(label: str, default: str | None = None, secret: bool = False) -> str
 def _install_dependencies(provider_name: str) -> None:
    """Install pip dependencies declared in plugin.yaml."""
    import subprocess
-    from pathlib import Path as _Path
+    from plugins.memory import find_provider_dir

-    plugin_dir = _Path(__file__).parent.parent / "plugins" / "memory" / provider_name
+    plugin_dir = find_provider_dir(provider_name)
+    if not plugin_dir:
+        return
    yaml_path = plugin_dir / "plugin.yaml"
    if not yaml_path.exists():
        return
@@ -96,6 +96,7 @@ _MATCHING_PREFIX_STRIP_PROVIDERS: frozenset[str] = frozenset({
    "qwen-oauth",
    "xiaomi",
    "arcee",
+    "ollama-cloud",
    "custom",
 })

@@ -274,6 +274,11 @@ def parse_model_flags(raw_args: str) -> tuple[str, str, bool]:
    is_global = False
    explicit_provider = ""

+    # Normalize Unicode dashes (Telegram/iOS auto-converts -- to em/en dash)
+    # A single Unicode dash before a flag keyword becomes "--"
+    import re as _re
+    raw_args = _re.sub(r'[\u2012\u2013\u2014\u2015](provider|global)', r'--\1', raw_args)
+
    # Extract --global
    if "--global" in raw_args:
        is_global = True
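Reviewer sketch (not part of this patch): the Unicode dash normalization added to `parse_model_flags` can be exercised on its own. The regex is the one from the hunk above; the wrapper name `normalize_flag_dashes` is hypothetical.

```python
import re

# Figure dash, en dash, em dash, horizontal bar: the characters that chat
# clients may substitute for a typed "--".
_DASH_BEFORE_FLAG = re.compile(r"[\u2012\u2013\u2014\u2015](provider|global)")


def normalize_flag_dashes(raw_args: str) -> str:
    """Rewrite a single Unicode dash before a known flag name into '--'."""
    return _DASH_BEFORE_FLAG.sub(r"--\1", raw_args)
```

Only dashes directly followed by `provider` or `global` are rewritten, so a dash that appears elsewhere in the arguments is left untouched.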
@@ -452,6 +457,7 @@ def switch_model(
        ModelSwitchResult with all information the caller needs.
    """
    from hermes_cli.models import (
+        copilot_model_api_mode,
        detect_provider_for_model,
        validate_requested_model,
        opencode_model_api_mode,
@@ -709,8 +715,12 @@ def switch_model(
    if validation.get("corrected_model"):
        new_model = validation["corrected_model"]

+    # --- Copilot api_mode override ---
+    if target_provider in {"copilot", "github-copilot"}:
+        api_mode = copilot_model_api_mode(new_model, api_key=api_key)
+
    # --- OpenCode api_mode override ---
-    if target_provider in {"opencode-zen", "opencode-go", "opencode", "opencode-go"}:
+    if target_provider in {"opencode-zen", "opencode-go", "opencode"}:
        api_mode = opencode_model_api_mode(target_provider, new_model)

    # --- Determine api_mode if not already set ---
@@ -786,7 +796,8 @@ def list_authenticated_providers(
    from hermes_cli.models import OPENROUTER_MODELS, _PROVIDER_MODELS

    results: List[dict] = []
-    seen_slugs: set = set()
+    seen_slugs: set = set()  # lowercase-normalized to catch case variants (#9545)
+    seen_mdev_ids: set = set()  # prevent duplicate entries for aliases (e.g. kimi-coding + kimi-coding-cn)

    data = fetch_models_dev()

@@ -796,9 +807,18 @@ def list_authenticated_providers(
    # "nous" shares OpenRouter's curated list if not separately defined
    if "nous" not in curated:
        curated["nous"] = curated["openrouter"]
+    # Ollama Cloud uses dynamic discovery (no static curated list)
+    if "ollama-cloud" not in curated:
+        from hermes_cli.models import fetch_ollama_cloud_models
+        curated["ollama-cloud"] = fetch_ollama_cloud_models()

    # --- 1. Check Hermes-mapped providers ---
    for hermes_id, mdev_id in PROVIDER_TO_MODELS_DEV.items():
+        # Skip aliases that map to the same models.dev provider (e.g.
+        # kimi-coding and kimi-coding-cn both → kimi-for-coding).
+        # The first one with valid credentials wins (#10526).
+        if mdev_id in seen_mdev_ids:
+            continue
        pdata = data.get(mdev_id)
        if not isinstance(pdata, dict):
            continue
@@ -837,7 +857,8 @@ def list_authenticated_providers(
            "total_models": total,
            "source": "built-in",
        })
-        seen_slugs.add(slug)
+        seen_slugs.add(slug.lower())
+        seen_mdev_ids.add(mdev_id)

    # --- 2. Check Hermes-only providers (nous, openai-codex, copilot, opencode-go) ---
    from hermes_cli.providers import HERMES_OVERLAYS
@@ -849,12 +870,12 @@ def list_authenticated_providers(
    _mdev_to_hermes = {v: k for k, v in PROVIDER_TO_MODELS_DEV.items()}

    for pid, overlay in HERMES_OVERLAYS.items():
-        if pid in seen_slugs:
+        if pid.lower() in seen_slugs:
            continue

        # Resolve Hermes slug — e.g. "github-copilot" → "copilot"
        hermes_slug = _mdev_to_hermes.get(pid, pid)
-        if hermes_slug in seen_slugs:
+        if hermes_slug.lower() in seen_slugs:
            continue

        # Check if credentials exist
@@ -935,8 +956,8 @@ def list_authenticated_providers(
            "total_models": total,
            "source": "hermes",
        })
-        seen_slugs.add(pid)
-        seen_slugs.add(hermes_slug)
+        seen_slugs.add(pid.lower())
+        seen_slugs.add(hermes_slug.lower())

    # --- 2b. Cross-check canonical provider list ---
    # Catches providers that are in CANONICAL_PROVIDERS but weren't found
@@ -948,7 +969,7 @@ def list_authenticated_providers(
        _canon_provs = []

    for _cp in _canon_provs:
-        if _cp.slug in seen_slugs:
+        if _cp.slug.lower() in seen_slugs:
            continue

        # Check credentials via PROVIDER_REGISTRY (auth.py)
@@ -995,7 +1016,7 @@ def list_authenticated_providers(
            "total_models": _cp_total,
            "source": "canonical",
        })
-        seen_slugs.add(_cp.slug)
+        seen_slugs.add(_cp.slug.lower())

    # --- 3. User-defined endpoints from config ---
    if user_providers and isinstance(user_providers, dict):
@@ -1068,7 +1089,7 @@ def list_authenticated_providers(
                groups[slug]["models"].append(default_model)

        for slug, grp in groups.items():
-            if slug in seen_slugs:
+            if slug.lower() in seen_slugs:
                continue
            results.append({
                "slug": slug,
@@ -1080,11 +1101,9 @@ def list_authenticated_providers(
                "source": "user-config",
                "api_url": grp["api_url"],
            })
-            seen_slugs.add(slug)
+            seen_slugs.add(slug.lower())

    # Sort: current provider first, then by model count descending
    results.sort(key=lambda r: (not r["is_current"], -r["total_models"]))

    return results
-
-
@@ -11,7 +11,9 @@ import json
 import os
 import urllib.request
 import urllib.error
+import time
 from difflib import get_close_matches
+from pathlib import Path
 from typing import Any, NamedTuple, Optional

 COPILOT_BASE_URL = "https://api.githubcopilot.com"
@@ -24,7 +26,8 @@ COPILOT_REASONING_EFFORTS_O_SERIES = ["low", "medium", "high"]
 # Fallback OpenRouter snapshot used when the live catalog is unavailable.
 # (model_id, display description shown in menus)
 OPENROUTER_MODELS: list[tuple[str, str]] = [
-    ("anthropic/claude-opus-4.6",       "recommended"),
+    ("anthropic/claude-opus-4.7",       "recommended"),
+    ("anthropic/claude-opus-4.6",       ""),
    ("anthropic/claude-sonnet-4.6",     ""),
    ("qwen/qwen3.6-plus",               ""),
    ("anthropic/claude-sonnet-4.5",     ""),
@@ -133,6 +136,11 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
        "gemma-4-31b-it",
        "gemma-4-26b-it",
    ],
+    "google-gemini-cli": [
+        "gemini-2.5-pro",
+        "gemini-2.5-flash",
+        "gemini-2.5-flash-lite",
+    ],
    "zai": [
        "glm-5.1",
        "glm-5",
@@ -143,17 +151,8 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
        "glm-4.5-flash",
    ],
    "xai": [
-        "grok-4.20-0309-reasoning",
-        "grok-4.20-0309-non-reasoning",
-        "grok-4.20-multi-agent-0309",
+        "grok-4.20-reasoning",
        "grok-4-1-fast-reasoning",
-        "grok-4-1-fast-non-reasoning",
-        "grok-4-fast-reasoning",
-        "grok-4-fast-non-reasoning",
-        "grok-4-0709",
-        "grok-code-fast-1",
-        "grok-3",
-        "grok-3-mini",
    ],
    "kimi-coding": [
        "kimi-for-coding",
@@ -188,6 +187,7 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
        "MiniMax-M2",
    ],
    "anthropic": [
+        "claude-opus-4-7",
        "claude-opus-4-6",
        "claude-sonnet-4-6",
        "claude-opus-4-5-20251101",
@@ -249,6 +249,7 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
        "big-pickle",
    ],
    "opencode-go": [
+        "glm-5.1",
        "glm-5",
        "kimi-k2.5",
        "mimo-v2-pro",
@@ -303,6 +304,22 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
        "XiaomiMiMo/MiMo-V2-Flash",
        "moonshotai/Kimi-K2-Thinking",
    ],
+    # AWS Bedrock — static fallback list used when dynamic discovery is
+    # unavailable (no boto3, no credentials, or API error).  The agent
+    # prefers live discovery via ListFoundationModels + ListInferenceProfiles.
+    # Use inference profile IDs (us.*) since most models require them.
+    "bedrock": [
+        "us.anthropic.claude-sonnet-4-6",
+        "us.anthropic.claude-opus-4-6-v1",
+        "us.anthropic.claude-haiku-4-5-20251001-v1:0",
+        "us.anthropic.claude-sonnet-4-5-20250929-v1:0",
+        "us.amazon.nova-pro-v1:0",
+        "us.amazon.nova-lite-v1:0",
+        "us.amazon.nova-micro-v1:0",
+        "deepseek.v3.2",
+        "us.meta.llama4-maverick-17b-instruct-v1:0",
+        "us.meta.llama4-scout-17b-instruct-v1:0",
+    ],
 }

 # ---------------------------------------------------------------------------
@@ -523,25 +540,29 @@ CANONICAL_PROVIDERS: list[ProviderEntry] = [
    ProviderEntry("copilot-acp",    "GitHub Copilot ACP",       "GitHub Copilot ACP (spawns `copilot --acp --stdio`)"),
    ProviderEntry("huggingface",    "Hugging Face",             "Hugging Face Inference Providers (20+ open models)"),
    ProviderEntry("gemini",         "Google AI Studio",         "Google AI Studio (Gemini models — OpenAI-compatible endpoint)"),
+    ProviderEntry("google-gemini-cli", "Google Gemini (OAuth)",   "Google Gemini via OAuth + Code Assist (free tier supported; no API key needed)"),
    ProviderEntry("deepseek",       "DeepSeek",                 "DeepSeek (DeepSeek-V3, R1, coder — direct API)"),
    ProviderEntry("xai",            "xAI",                      "xAI (Grok models — direct API)"),
    ProviderEntry("zai",            "Z.AI / GLM",               "Z.AI / GLM (Zhipu AI direct API)"),
-    ProviderEntry("kimi-coding",    "Kimi / Moonshot",          "Kimi / Moonshot (Moonshot AI direct API)"),
+    ProviderEntry("kimi-coding",    "Kimi / Kimi Coding Plan",  "Kimi Coding Plan (api.kimi.com) & Moonshot API"),
    ProviderEntry("kimi-coding-cn", "Kimi / Moonshot (China)",  "Kimi / Moonshot China (Moonshot CN direct API)"),
    ProviderEntry("minimax",        "MiniMax",                  "MiniMax (global direct API)"),
    ProviderEntry("minimax-cn",     "MiniMax (China)",          "MiniMax China (domestic direct API)"),
    ProviderEntry("alibaba",        "Alibaba Cloud (DashScope)","Alibaba Cloud / DashScope Coding (Qwen + multi-provider)"),
+    ProviderEntry("ollama-cloud",   "Ollama Cloud",             "Ollama Cloud (cloud-hosted open models — ollama.com)"),
    ProviderEntry("arcee",          "Arcee AI",                 "Arcee AI (Trinity models — direct API)"),
    ProviderEntry("kilocode",       "Kilo Code",                "Kilo Code (Kilo Gateway API)"),
    ProviderEntry("opencode-zen",   "OpenCode Zen",             "OpenCode Zen (35+ curated models, pay-as-you-go)"),
    ProviderEntry("opencode-go",    "OpenCode Go",              "OpenCode Go (open models, $10/month subscription)"),
    ProviderEntry("ai-gateway",     "Vercel AI Gateway",        "Vercel AI Gateway (200+ models, pay-per-use)"),
+    ProviderEntry("bedrock",        "AWS Bedrock",              "AWS Bedrock (Claude, Nova, Llama, DeepSeek — IAM or API key)"),
 ]

 # Derived dicts — used throughout the codebase
 _PROVIDER_LABELS = {p.slug: p.label for p in CANONICAL_PROVIDERS}
 _PROVIDER_LABELS["custom"] = "Custom endpoint"  # special case: not a named provider

+
 _PROVIDER_ALIASES = {
    "glm": "zai",
    "z-ai": "zai",
@@ -582,14 +603,22 @@ _PROVIDER_ALIASES = {
    "qwen": "alibaba",
    "alibaba-cloud": "alibaba",
    "qwen-portal": "qwen-oauth",
+    "gemini-cli": "google-gemini-cli",
+    "gemini-oauth": "google-gemini-cli",
    "hf": "huggingface",
    "hugging-face": "huggingface",
    "huggingface-hub": "huggingface",
    "mimo": "xiaomi",
    "xiaomi-mimo": "xiaomi",
+    "aws": "bedrock",
+    "aws-bedrock": "bedrock",
+    "amazon-bedrock": "bedrock",
+    "amazon": "bedrock",
    "grok": "xai",
    "x-ai": "xai",
    "x.ai": "xai",
+    "ollama": "custom",  # bare "ollama" = local; use "ollama-cloud" for cloud
+    "ollama_cloud": "ollama-cloud",
 }


@@ -1026,7 +1055,7 @@ def detect_provider_for_model(
            return (resolved_provider, default_models[0])

    # Aggregators list other providers' models — never auto-switch TO them
-    _AGGREGATORS = {"nous", "openrouter"}
+    _AGGREGATORS = {"nous", "openrouter", "ai-gateway", "copilot", "kilocode"}

    # If the model belongs to the current provider's catalog, don't suggest switching
    current_models = _PROVIDER_MODELS.get(current_provider, [])
@@ -1043,7 +1072,8 @@ def detect_provider_for_model(
            break

    if direct_match:
-        # Check if we have credentials for this provider
+        # Check if we have credentials for this provider — env vars,
+        # credential pool, or auth store entries.
        has_creds = False
        try:
            from hermes_cli.auth import PROVIDER_REGISTRY
@@ -1056,16 +1086,28 @@ def detect_provider_for_model(
                        break
        except Exception:
            pass
+        # Also check credential pool and auth store — covers OAuth,
+        # Claude Code tokens, and other non-env-var credentials (#10300).
+        if not has_creds:
+            try:
+                from agent.credential_pool import load_pool
+                pool = load_pool(direct_match)
+                if pool.has_credentials():
+                    has_creds = True
+            except Exception:
+                pass
+        if not has_creds:
+            try:
+                from hermes_cli.auth import _load_auth_store
+                store = _load_auth_store()
+                if direct_match in store.get("providers", {}) or direct_match in store.get("credential_pool", {}):
+                    has_creds = True
+            except Exception:
+                pass

-        if has_creds:
-            return (direct_match, name)
-
-        # No direct creds — try to find this model on OpenRouter instead
-        or_slug = _find_openrouter_slug(name)
-        if or_slug:
-            return ("openrouter", or_slug)
-        # Still return the direct provider — credential resolution will
-        # give a clear error rather than silently using the wrong provider
+        # Always return the direct provider match.  If credentials are
+        # missing, the client init will give a clear error rather than
+        # silently routing through the wrong provider (#10300).
        return (direct_match, name)

    # --- Step 2: check OpenRouter catalog ---
@@ -1255,6 +1297,10 @@ def provider_model_ids(provider: Optional[str], *, force_refresh: bool = False)
        live = _fetch_ai_gateway_models()
        if live:
            return live
+    if normalized == "ollama-cloud":
+        live = fetch_ollama_cloud_models(force_refresh=force_refresh)
+        if live:
+            return live
    if normalized == "custom":
        base_url = _get_custom_base_url()
        if base_url:
@@ -1539,6 +1585,11 @@ def copilot_model_api_mode(
    primary signal.  Falls back to the catalog's ``supported_endpoints``
    only for models not covered by the pattern check.
    """
+    # Fetch the catalog once so normalize + endpoint check share it
+    # (avoids two redundant network calls for non-GPT-5 models).
+    if catalog is None and api_key:
+        catalog = fetch_github_model_catalog(api_key=api_key)
+
    normalized = normalize_copilot_model_id(model_id, catalog=catalog, api_key=api_key)
    if not normalized:
        return "chat_completions"
@@ -1548,9 +1599,6 @@ def copilot_model_api_mode(
        return "codex_responses"

    # Secondary: check catalog for non-GPT-5 models (Claude via /v1/messages, etc.)
-    if catalog is None and api_key:
-        catalog = fetch_github_model_catalog(api_key=api_key)
-
    if catalog:
        catalog_entry = next((item for item in catalog if item.get("id") == normalized), None)
        if isinstance(catalog_entry, dict):
@@ -1765,6 +1813,125 @@ def fetch_api_models(
    return probe_api_models(api_key, base_url, timeout=timeout).get("models")


+# ---------------------------------------------------------------------------
+# Ollama Cloud — merged model discovery with disk cache
+# ---------------------------------------------------------------------------
+
+
+
+_OLLAMA_CLOUD_CACHE_TTL = 3600  # 1 hour
+
+
+def _ollama_cloud_cache_path() -> Path:
+    """Return the path for the Ollama Cloud model cache."""
+    from hermes_constants import get_hermes_home
+    return get_hermes_home() / "ollama_cloud_models_cache.json"
+
+
+def _load_ollama_cloud_cache(*, ignore_ttl: bool = False) -> Optional[dict]:
+    """Load cached Ollama Cloud models from disk.
+
+    Args:
+        ignore_ttl: If True, return data even if the TTL has expired (stale fallback).
+    """
+    try:
+        cache_path = _ollama_cloud_cache_path()
+        if not cache_path.exists():
+            return None
+        with open(cache_path, encoding="utf-8") as f:
+            data = json.load(f)
+        if not isinstance(data, dict):
+            return None
+        models = data.get("models")
+        if not (isinstance(models, list) and models):
+            return None
+        if not ignore_ttl:
+            cached_at = data.get("cached_at", 0)
+            if (time.time() - cached_at) > _OLLAMA_CLOUD_CACHE_TTL:
+                return None  # stale
+        return data
+    except Exception:
+        pass
+    return None
+
+
+def _save_ollama_cloud_cache(models: list[str]) -> None:
+    """Persist the merged Ollama Cloud model list to disk."""
+    try:
+        from utils import atomic_json_write
+        cache_path = _ollama_cloud_cache_path()
+        cache_path.parent.mkdir(parents=True, exist_ok=True)
+        atomic_json_write(cache_path, {"models": models, "cached_at": time.time()}, indent=None)
+    except Exception:
+        pass
+
+
+def fetch_ollama_cloud_models(
+    api_key: Optional[str] = None,
+    base_url: Optional[str] = None,
+    *,
+    force_refresh: bool = False,
+) -> list[str]:
+    """Fetch Ollama Cloud models by merging live API + models.dev, with disk cache.
+
+    Resolution order:
+      1. Disk cache (if fresh, < 1 hour, and not force_refresh)
+      2. Live ``/v1/models`` endpoint (primary — freshest source)
+      3. models.dev registry (secondary — fills gaps for unlisted models)
+      4. Merge: live models first, then models.dev additions (deduped)
+
+    Returns a list of model IDs (never None). On total failure it returns the stale cache if one exists, else an empty list.
+    """
+    # 1. Check disk cache
+    if not force_refresh:
+        cached = _load_ollama_cloud_cache()
+        if cached is not None:
+            return cached["models"]
+
+    # 2. Live API probe
+    if not api_key:
+        api_key = os.getenv("OLLAMA_API_KEY", "")
+    if not base_url:
+        base_url = os.getenv("OLLAMA_BASE_URL", "") or "https://ollama.com/v1"
+
+    live_models: list[str] = []
+    if api_key:
+        result = fetch_api_models(api_key, base_url, timeout=8.0)
+        if result:
+            live_models = result
+
+    # 3. models.dev registry
+    mdev_models: list[str] = []
+    try:
+        from agent.models_dev import list_agentic_models
+        mdev_models = list_agentic_models("ollama-cloud")
+    except Exception:
+        pass
+
+    # 4. Merge: live first, then models.dev additions (deduped, order-preserving)
+    if live_models or mdev_models:
+        seen: set[str] = set()
+        merged: list[str] = []
+        for m in live_models:
+            if m and m not in seen:
+                seen.add(m)
+                merged.append(m)
+        for m in mdev_models:
+            if m and m not in seen:
+                seen.add(m)
+                merged.append(m)
+        if merged:
+            _save_ollama_cloud_cache(merged)
+            return merged
+
+    # Total failure — return stale cache if available (ignore TTL)
+    stale = _load_ollama_cloud_cache(ignore_ttl=True)
+    if stale is not None:
+        return stale["models"]
+
+    return []
+
+
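Note on `fetch_ollama_cloud_models` above: the resolution order is fresh cache, then a merge of live and registry sources, then a stale-cache fallback. The sketch below reproduces that flow under stated assumptions. `resolve_models`, `load_cache`, and `merge_ordered` are hypothetical names, the two sources are plain callables instead of network calls, and the cache is written with `Path.write_text` rather than the commit's `atomic_json_write`.

```python
import json
import tempfile
import time
from pathlib import Path

CACHE_TTL = 3600  # seconds; mirrors the one-hour TTL in the diff


def load_cache(path, *, ignore_ttl=False):
    """Return the cached model list, or None if missing, invalid, or stale."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return None
    if not isinstance(data, dict):
        return None
    models = data.get("models")
    if not isinstance(models, list) or not models:
        return None
    if not ignore_ttl and (time.time() - data.get("cached_at", 0)) > CACHE_TTL:
        return None  # stale
    return models


def merge_ordered(*sources):
    """Concatenate sources, dropping empties and duplicates, keeping order."""
    seen = set()
    merged = []
    for source in sources:
        for model in source:
            if model and model not in seen:
                seen.add(model)
                merged.append(model)
    return merged


def resolve_models(path, fetch_live, fetch_registry, *, force_refresh=False):
    """Fresh cache -> merged live + registry -> stale cache -> empty list."""
    if not force_refresh:
        cached = load_cache(path)
        if cached is not None:
            return cached
    merged = merge_ordered(fetch_live(), fetch_registry())
    if merged:
        payload = {"models": merged, "cached_at": time.time()}
        path.write_text(json.dumps(payload), encoding="utf-8")
        return merged
    return load_cache(path, ignore_ttl=True) or []


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "models_cache.json"
    first = resolve_models(cache, lambda: ["a", "b"], lambda: ["b", "c"])
    second = resolve_models(cache, lambda: [], lambda: [])  # fresh cache hit
    third = resolve_models(cache, lambda: [], lambda: [], force_refresh=True)  # fallback
```

The third call shows why the stale fallback matters: a forced refresh that finds nothing still returns the last known list instead of an empty menu.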
 def validate_requested_model(
    model_name: str,
    provider: Optional[str],
@@ -1957,6 +2124,42 @@ def validate_requested_model(

    # api_models is None — couldn't reach API.  Accept and persist,
    # but warn so typos don't silently break things.
+
+    # Bedrock: use our own discovery instead of HTTP /models endpoint.
+    # Bedrock's bedrock-runtime URL doesn't support /models — it uses the
+    # AWS SDK control plane (ListFoundationModels + ListInferenceProfiles).
+    if normalized == "bedrock":
+        try:
+            from agent.bedrock_adapter import discover_bedrock_models, resolve_bedrock_region
+            region = resolve_bedrock_region()
+            discovered = discover_bedrock_models(region)
+            discovered_ids = {m["id"] for m in discovered}
+            if requested in discovered_ids:
+                return {
+                    "accepted": True,
+                    "persist": True,
+                    "recognized": True,
+                    "message": None,
+                }
+            # Not in discovered list — still accept (user may have custom
+            # inference profiles or cross-account access), but warn.
+            suggestions = get_close_matches(requested, list(discovered_ids), n=3, cutoff=0.4)
+            suggestion_text = ""
+            if suggestions:
+                suggestion_text = "\n  Similar models: " + ", ".join(f"`{s}`" for s in suggestions)
+            return {
+                "accepted": True,
+                "persist": True,
+                "recognized": False,
+                "message": (
+                    f"Note: `{requested}` was not found in Bedrock model discovery for {region}. "
+                    f"It may still work with custom inference profiles or cross-account access."
+                    f"{suggestion_text}"
+                ),
+            }
+        except Exception:
+            pass  # Fall through to generic warning
+
    provider_label = _PROVIDER_LABELS.get(normalized, normalized)
    return {
        "accepted": True,
@@ -143,6 +143,7 @@ def _tts_label(current_provider: str) -> str:
        "openai": "OpenAI TTS",
        "elevenlabs": "ElevenLabs",
        "edge": "Edge TTS",
+        "xai": "xAI TTS",
        "mistral": "Mistral Voxtral TTS",
        "neutts": "NeuTTS",
    }
@@ -257,6 +258,15 @@ def get_nous_subscription_features(
        terminal_cfg.get("modal_mode")
    )

+    # use_gateway flags — when True, the user explicitly opted into the
+    # Tool Gateway via `hermes model`, so direct credentials should NOT
+    # prevent gateway routing.
+    web_use_gateway = bool(web_cfg.get("use_gateway"))
+    tts_use_gateway = bool(tts_cfg.get("use_gateway"))
+    browser_use_gateway = bool(browser_cfg.get("use_gateway"))
+    image_gen_cfg = config.get("image_gen") if isinstance(config.get("image_gen"), dict) else {}
+    image_use_gateway = bool(image_gen_cfg.get("use_gateway"))
+
    direct_exa = bool(get_env_value("EXA_API_KEY"))
    direct_firecrawl = bool(get_env_value("FIRECRAWL_API_KEY") or get_env_value("FIRECRAWL_API_URL"))
    direct_parallel = bool(get_env_value("PARALLEL_API_KEY"))
@@ -269,6 +279,21 @@ def get_nous_subscription_features(
    direct_browser_use = bool(get_env_value("BROWSER_USE_API_KEY"))
    direct_modal = has_direct_modal_credentials()

+    # When use_gateway is set, suppress direct credentials for managed detection
+    if web_use_gateway:
+        direct_firecrawl = False
+        direct_exa = False
+        direct_parallel = False
+        direct_tavily = False
+    if image_use_gateway:
+        direct_fal = False
+    if tts_use_gateway:
+        direct_openai_tts = False
+        direct_elevenlabs = False
+    if browser_use_gateway:
+        direct_browser_use = False
+        direct_browserbase = False
+
    managed_web_available = managed_tools_flag and nous_auth_present and is_managed_tool_gateway_ready("firecrawl")
    managed_image_available = managed_tools_flag and nous_auth_present and is_managed_tool_gateway_ready("fal-queue")
    managed_tts_available = managed_tools_flag and nous_auth_present and is_managed_tool_gateway_ready("openai-audio")
@@ -439,37 +464,7 @@ def get_nous_subscription_features(
    )


-def get_nous_subscription_explainer_lines() -> list[str]:
-    if not managed_nous_tools_enabled():
-        return []

-    return [
-        "Nous subscription enables managed web tools, image generation, OpenAI TTS, and browser automation by default.",
-        "Those managed tools bill to your Nous subscription. Modal execution is optional and can bill to your subscription too.",
-        "Change these later with: hermes setup tools, hermes setup terminal, or hermes status.",
-    ]
-
-
-def apply_nous_provider_defaults(config: Dict[str, object]) -> set[str]:
-    """Apply provider-level Nous defaults shared by `hermes setup` and `hermes model`."""
-    if not managed_nous_tools_enabled():
-        return set()
-
-    features = get_nous_subscription_features(config)
-    if not features.provider_is_nous:
-        return set()
-
-    tts_cfg = config.get("tts")
-    if not isinstance(tts_cfg, dict):
-        tts_cfg = {}
-        config["tts"] = tts_cfg
-
-    current_tts = str(tts_cfg.get("provider") or "edge").strip().lower()
-    if current_tts not in {"", "edge"}:
-        return set()
-
-    tts_cfg["provider"] = "openai"
-    return {"tts"}


 def apply_nous_managed_defaults(
@@ -529,3 +524,255 @@ def apply_nous_managed_defaults(
        changed.add("image_gen")

    return changed
+
+
+# ---------------------------------------------------------------------------
+# Tool Gateway offer — single Y/n prompt after model selection
+# ---------------------------------------------------------------------------
+
+_GATEWAY_TOOL_LABELS = {
+    "web": "Web search & extract (Firecrawl)",
+    "image_gen": "Image generation (FAL)",
+    "tts": "Text-to-speech (OpenAI TTS)",
+    "browser": "Browser automation (Browser Use)",
+}
+
+
+def _get_gateway_direct_credentials() -> Dict[str, bool]:
+    """Return a dict of tool_key -> has_direct_credentials."""
+    return {
+        "web": bool(
+            get_env_value("FIRECRAWL_API_KEY")
+            or get_env_value("FIRECRAWL_API_URL")
+            or get_env_value("PARALLEL_API_KEY")
+            or get_env_value("TAVILY_API_KEY")
+            or get_env_value("EXA_API_KEY")
+        ),
+        "image_gen": bool(get_env_value("FAL_KEY")),
+        "tts": bool(
+            resolve_openai_audio_api_key()
+            or get_env_value("ELEVENLABS_API_KEY")
+        ),
+        "browser": bool(
+            get_env_value("BROWSER_USE_API_KEY")
+            or (get_env_value("BROWSERBASE_API_KEY") and get_env_value("BROWSERBASE_PROJECT_ID"))
+        ),
+    }
+
+
+_GATEWAY_DIRECT_LABELS = {
+    "web": "Firecrawl/Exa/Parallel/Tavily key",
+    "image_gen": "FAL key",
+    "tts": "OpenAI/ElevenLabs key",
+    "browser": "Browser Use/Browserbase key",
+}
+
+_ALL_GATEWAY_KEYS = ("web", "image_gen", "tts", "browser")
+
+
+def get_gateway_eligible_tools(
+    config: Optional[Dict[str, object]] = None,
+) -> tuple[list[str], list[str], list[str]]:
+    """Return (unconfigured, has_direct, already_managed) tool key lists.
+
+    - unconfigured: tools with no direct credentials (easy switch)
+    - has_direct: tools where the user has their own API keys
+    - already_managed: tools already routed through the gateway
+
+    All lists are empty when the user is not a paid Nous subscriber or
+    is not using Nous as their provider.
+    """
+    if not managed_nous_tools_enabled():
+        return [], [], []
+
+    if config is None:
+        from hermes_cli.config import load_config
+        config = load_config() or {}
+
+    # Quick provider check without the heavy get_nous_subscription_features call
+    model_cfg = config.get("model")
+    if not isinstance(model_cfg, dict) or str(model_cfg.get("provider") or "").strip().lower() != "nous":
+        return [], [], []
+
+    direct = _get_gateway_direct_credentials()
+
+    # Check which tools the user has explicitly opted into the gateway for.
+    # This is distinct from managed_by_nous which fires implicitly when
+    # no direct keys exist — we only skip the prompt for tools where
+    # use_gateway was explicitly set.
+    opted_in = {
+        "web": bool((config.get("web") if isinstance(config.get("web"), dict) else {}).get("use_gateway")),
+        "image_gen": bool((config.get("image_gen") if isinstance(config.get("image_gen"), dict) else {}).get("use_gateway")),
+        "tts": bool((config.get("tts") if isinstance(config.get("tts"), dict) else {}).get("use_gateway")),
+        "browser": bool((config.get("browser") if isinstance(config.get("browser"), dict) else {}).get("use_gateway")),
+    }
+
+    unconfigured: list[str] = []
+    has_direct: list[str] = []
+    already_managed: list[str] = []
+    for key in _ALL_GATEWAY_KEYS:
+        if opted_in.get(key):
+            already_managed.append(key)
+        elif direct.get(key):
+            has_direct.append(key)
+        else:
+            unconfigured.append(key)
+    return unconfigured, has_direct, already_managed
+
+
+def apply_gateway_defaults(
+    config: Dict[str, object],
+    tool_keys: list[str],
+) -> set[str]:
+    """Apply Tool Gateway config for the given tool keys.
+
+    Sets ``use_gateway: true`` in each tool's config section so the
+    runtime prefers the gateway even when direct API keys are present.
+
+    Returns the set of tools that were actually changed.
+    """
+    changed: set[str] = set()
+
+    web_cfg = config.get("web")
+    if not isinstance(web_cfg, dict):
+        web_cfg = {}
+        config["web"] = web_cfg
+
+    tts_cfg = config.get("tts")
+    if not isinstance(tts_cfg, dict):
+        tts_cfg = {}
+        config["tts"] = tts_cfg
+
+    browser_cfg = config.get("browser")
+    if not isinstance(browser_cfg, dict):
+        browser_cfg = {}
+        config["browser"] = browser_cfg
+
+    if "web" in tool_keys:
+        web_cfg["backend"] = "firecrawl"
+        web_cfg["use_gateway"] = True
+        changed.add("web")
+
+    if "tts" in tool_keys:
+        tts_cfg["provider"] = "openai"
+        tts_cfg["use_gateway"] = True
+        changed.add("tts")
+
+    if "browser" in tool_keys:
+        browser_cfg["cloud_provider"] = "browser-use"
+        browser_cfg["use_gateway"] = True
+        changed.add("browser")
+
+    if "image_gen" in tool_keys:
+        image_cfg = config.get("image_gen")
+        if not isinstance(image_cfg, dict):
+            image_cfg = {}
+            config["image_gen"] = image_cfg
+        image_cfg["use_gateway"] = True
+        changed.add("image_gen")
+
+    return changed
+
+
+def prompt_enable_tool_gateway(config: Dict[str, object]) -> set[str]:
+    """If eligible tools exist, prompt the user to enable the Tool Gateway.
+
+    Uses prompt_choice() with a description parameter so the curses TUI
+    shows the tool context alongside the choices.
+
+    Returns the set of tools that were enabled, or empty set if the user
+    declined or no tools were eligible.
+    """
+    unconfigured, has_direct, already_managed = get_gateway_eligible_tools(config)
+    if not unconfigured and not has_direct:
+        return set()
+
+    try:
+        from hermes_cli.setup import prompt_choice
+    except Exception:
+        return set()
+
+    # Build description lines showing full status of all gateway tools
+    desc_parts: list[str] = [
+        "",
+        "  The Tool Gateway gives you access to web search, image generation,",
+        "  text-to-speech, and browser automation through your Nous subscription.",
+        "  No need to sign up for separate API keys — just pick the tools you want.",
+        "",
+    ]
+    if already_managed:
+        for k in already_managed:
+            desc_parts.append(f"  ✓ {_GATEWAY_TOOL_LABELS[k]} — using Tool Gateway")
+    if unconfigured:
+        for k in unconfigured:
+            desc_parts.append(f"  ○ {_GATEWAY_TOOL_LABELS[k]} — not configured")
+    if has_direct:
+        for k in has_direct:
+            desc_parts.append(f"  ○ {_GATEWAY_TOOL_LABELS[k]} — using {_GATEWAY_DIRECT_LABELS[k]}")
+
+    # Build short choice labels — detail is in the description above
+    choices: list[str] = []
+    choice_keys: list[str] = []  # maps choice index -> action
+
+    if unconfigured and has_direct:
+        choices.append("Enable for all tools (existing keys kept, not used)")
+        choice_keys.append("all")
+
+        choices.append("Enable only for tools without existing keys")
+        choice_keys.append("unconfigured")
+
+        choices.append("Skip")
+        choice_keys.append("skip")
+
+    elif unconfigured:
+        choices.append("Enable Tool Gateway")
+        choice_keys.append("unconfigured")
+
+        choices.append("Skip")
+        choice_keys.append("skip")
+
+    else:
+        choices.append("Enable Tool Gateway (existing keys kept, not used)")
+        choice_keys.append("all")
+
+        choices.append("Skip")
+        choice_keys.append("skip")
+
+    description = "\n".join(desc_parts) if desc_parts else None
+    # Default to "Enable" when user has no direct keys (new user),
+    # default to "Skip" when they have existing keys to preserve.
+    default_idx = 0 if not has_direct else len(choices) - 1
+
+    try:
+        idx = prompt_choice(
+            "Your Nous subscription includes the Tool Gateway.",
+            choices,
+            default_idx,
+            description=description,
+        )
+    except (KeyboardInterrupt, EOFError, OSError, SystemExit):
+        return set()
+
+    action = choice_keys[idx]
+    if action == "skip":
+        return set()
+
+    if action == "all":
+        # Apply to switchable tools + ensure already-managed tools also
+        # have use_gateway persisted in config for consistency.
+        to_apply = list(_ALL_GATEWAY_KEYS)
+    else:
+        to_apply = unconfigured
+
+    changed = apply_gateway_defaults(config, to_apply)
+    if changed:
+        from hermes_cli.config import save_config
+        save_config(config)
+        # Only report the tools that actually switched (not already-managed ones)
+        newly_switched = changed - set(already_managed)
+        for key in sorted(newly_switched):
+            label = _GATEWAY_TOOL_LABELS.get(key, key)
+            print(f"  ✓ {label}: enabled via Nous subscription")
+        if already_managed and not newly_switched:
+            print("  (all tools already using Tool Gateway)")
+    return changed
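Note on `get_gateway_eligible_tools` above: each tool lands in exactly one bucket, with an explicit `use_gateway` opt-in taking precedence over direct credentials. The sketch below shows that precedence on its own. `classify_tools` is a hypothetical name, and the `direct` mapping is passed in rather than read from environment variables as the commit does.

```python
GATEWAY_KEYS = ("web", "image_gen", "tts", "browser")


def classify_tools(config, direct):
    """Partition tools into (unconfigured, has_direct, already_managed)."""
    unconfigured, has_direct, already_managed = [], [], []
    for key in GATEWAY_KEYS:
        section = config.get(key)
        opted_in = isinstance(section, dict) and bool(section.get("use_gateway"))
        if opted_in:
            already_managed.append(key)  # explicit opt-in wins
        elif direct.get(key):
            has_direct.append(key)       # user has their own API key
        else:
            unconfigured.append(key)     # nothing set up yet
    return unconfigured, has_direct, already_managed


# Hypothetical config: web is opted in even though a direct key exists.
config = {"web": {"use_gateway": True}, "tts": {"provider": "edge"}}
direct = {"web": True, "tts": True}
buckets = classify_tools(config, direct)
```

Here `web` counts as already managed despite the direct key, which matches the commit's stated intent that direct credentials should not prevent gateway routing once the user has opted in.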
@@ -112,6 +112,7 @@ class LoadedPlugin:
    module: Optional[types.ModuleType] = None
    tools_registered: List[str] = field(default_factory=list)
    hooks_registered: List[str] = field(default_factory=list)
+    commands_registered: List[str] = field(default_factory=list)
    enabled: bool = False
    error: Optional[str] = None

@@ -211,6 +212,84 @@ class PluginContext:
        }
        logger.debug("Plugin %s registered CLI command: %s", self.manifest.name, name)

+    # -- slash command registration -------------------------------------------
+
+    def register_command(
+        self,
+        name: str,
+        handler: Callable,
+        description: str = "",
+    ) -> None:
+        """Register a slash command (e.g. ``/lcm``) available in CLI and gateway sessions.
+
+        The handler signature is ``fn(raw_args: str) -> str | None``.
+        It may also be an async callable — the gateway dispatch handles both.
+
+        Unlike ``register_cli_command()`` (which creates ``hermes <subcommand>``
+        terminal commands), this registers in-session slash commands that users
+        invoke during a conversation.
+
+        Names conflicting with built-in commands are rejected with a warning.
+        """
+        clean = name.lower().strip().lstrip("/").replace(" ", "-")
+        if not clean:
+            logger.warning(
+                "Plugin '%s' tried to register a command with an empty name.",
+                self.manifest.name,
+            )
+            return
+
+        # Reject if it conflicts with a built-in command
+        try:
+            from hermes_cli.commands import resolve_command
+            if resolve_command(clean) is not None:
+                logger.warning(
+                    "Plugin '%s' tried to register command '/%s' which conflicts "
+                    "with a built-in command. Skipping.",
+                    self.manifest.name, clean,
+                )
+                return
+        except Exception:
+            pass  # If commands module isn't available, skip the check
+
+        self._manager._plugin_commands[clean] = {
+            "handler": handler,
+            "description": description or "Plugin command",
+            "plugin": self.manifest.name,
+        }
+        logger.debug("Plugin %s registered command: /%s", self.manifest.name, clean)
+
+    # -- tool dispatch -------------------------------------------------------
+
+    def dispatch_tool(self, tool_name: str, args: dict, **kwargs) -> str:
+        """Dispatch a tool call through the registry, with parent agent context.
+
+        This is the public interface for plugin slash commands that need to call
+        tools like ``delegate_task`` without reaching into framework internals.
+        The parent agent (if available) is resolved automatically — plugins never
+        need to access the agent directly.
+
+        Args:
+            tool_name: Registry name of the tool (e.g. ``"delegate_task"``).
+            args: Tool arguments dict (same as what the model would pass).
+            **kwargs: Extra keyword args forwarded to the registry dispatch.
+
+        Returns:
+            JSON string from the tool handler (same format as model tool calls).
+        """
+        from tools.registry import registry
+
+        # Wire up parent agent context when available (CLI mode).
+        # In gateway mode _cli_ref is None — tools degrade gracefully
+        # (workspace hints fall back to TERMINAL_CWD, no spinner).
+        if "parent_agent" not in kwargs:
+            cli = self._manager._cli_ref
+            agent = getattr(cli, "agent", None) if cli else None
+            if agent is not None:
+                kwargs["parent_agent"] = agent
+
+        return registry.dispatch(tool_name, args, **kwargs)
+
    # -- context engine registration -----------------------------------------

    def register_context_engine(self, engine) -> None:
@@ -323,6 +402,7 @@ class PluginManager:
        self._plugin_tool_names: Set[str] = set()
        self._cli_commands: Dict[str, dict] = {}
        self._context_engine = None  # Set by a plugin via register_context_engine()
+        self._plugin_commands: Dict[str, dict] = {}  # Slash commands registered by plugins
        self._discovered: bool = False
        self._cli_ref = None  # Set by CLI after plugin discovery
        # Plugin skill registry: qualified name → metadata dict.
@@ -485,6 +565,10 @@ class PluginManager:
                        for h in p.hooks_registered
                    }
                )
+                loaded.commands_registered = [
+                    c for c in self._plugin_commands
+                    if self._plugin_commands[c].get("plugin") == manifest.name
+                ]
                loaded.enabled = True

        except Exception as exc:
@@ -598,6 +682,7 @@ class PluginManager:
                    "enabled": loaded.enabled,
                    "tools": len(loaded.tools_registered),
                    "hooks": len(loaded.hooks_registered),
+                    "commands": len(loaded.commands_registered),
                    "error": loaded.error,
                }
            )
@@ -699,6 +784,20 @@ def get_plugin_context_engine():
    return get_plugin_manager()._context_engine


+def get_plugin_command_handler(name: str) -> Optional[Callable]:
+    """Return the handler for a plugin-registered slash command, or ``None``."""
+    entry = get_plugin_manager()._plugin_commands.get(name)
+    return entry["handler"] if entry else None
+
+
+def get_plugin_commands() -> Dict[str, dict]:
+    """Return the full plugin commands dict (name → {handler, description, plugin}).
+
+    Safe to call before discovery — returns an empty dict if no plugins loaded.
+    """
+    return get_plugin_manager()._plugin_commands
+
+
 def get_plugin_toolsets() -> List[tuple]:
    """Return plugin toolsets as ``(key, label, description)`` tuples.

@@ -64,6 +64,11 @@ HERMES_OVERLAYS: Dict[str, HermesOverlay] = {
        base_url_override="https://portal.qwen.ai/v1",
        base_url_env_var="HERMES_QWEN_BASE_URL",
    ),
+    "google-gemini-cli": HermesOverlay(
+        transport="openai_chat",
+        auth_type="oauth_external",
+        base_url_override="cloudcode-pa://google",
+    ),
    "copilot-acp": HermesOverlay(
        transport="codex_responses",
        auth_type="external_process",
@@ -128,7 +133,7 @@ HERMES_OVERLAYS: Dict[str, HermesOverlay] = {
        base_url_env_var="HF_BASE_URL",
    ),
    "xai": HermesOverlay(
-        transport="openai_chat",
+        transport="codex_responses",
        base_url_override="https://api.x.ai/v1",
        base_url_env_var="XAI_BASE_URL",
    ),
@@ -141,6 +146,10 @@ HERMES_OVERLAYS: Dict[str, HermesOverlay] = {
        base_url_override="https://api.arcee.ai/api/v1",
        base_url_env_var="ARCEE_BASE_URL",
    ),
+    "ollama-cloud": HermesOverlay(
+        transport="openai_chat",
+        base_url_env_var="OLLAMA_BASE_URL",
+    ),
 }


@@ -180,6 +189,7 @@ ALIASES: Dict[str, str] = {
    # xai
    "x-ai": "xai",
    "x.ai": "xai",
+    "grok": "xai",

    # kimi-for-coding (models.dev ID)
    "kimi": "kimi-for-coding",
@@ -227,6 +237,11 @@ ALIASES: Dict[str, str] = {
    "qwen": "alibaba",
    "alibaba-cloud": "alibaba",

+    # google-gemini-cli (OAuth + Code Assist)
+    "gemini-cli": "google-gemini-cli",
+    "gemini-oauth": "google-gemini-cli",
+
+
    # huggingface
    "hf": "huggingface",
    "hugging-face": "huggingface",
@@ -236,6 +251,12 @@ ALIASES: Dict[str, str] = {
    "mimo": "xiaomi",
    "xiaomi-mimo": "xiaomi",

+    # bedrock
+    "aws": "bedrock",
+    "aws-bedrock": "bedrock",
+    "amazon-bedrock": "bedrock",
+    "amazon": "bedrock",
+
    # arcee
    "arcee-ai": "arcee",
    "arceeai": "arcee",
@@ -244,7 +265,7 @@ ALIASES: Dict[str, str] = {
    "lmstudio": "lmstudio",
    "lm-studio": "lmstudio",
    "lm_studio": "lmstudio",
-    "ollama": "ollama-cloud",
+    "ollama": "custom",  # bare "ollama" = local; use "ollama-cloud" for cloud
    "vllm": "local",
    "llamacpp": "local",
    "llama.cpp": "local",
@@ -262,6 +283,8 @@ _LABEL_OVERRIDES: Dict[str, str] = {
    "copilot-acp": "GitHub Copilot ACP",
    "xiaomi": "Xiaomi MiMo",
    "local": "Local endpoint",
+    "bedrock": "AWS Bedrock",
+    "ollama-cloud": "Ollama Cloud",
 }


@@ -271,6 +294,7 @@ TRANSPORT_TO_API_MODE: Dict[str, str] = {
    "openai_chat": "chat_completions",
    "anthropic_messages": "anthropic_messages",
    "codex_responses": "codex_responses",
+    "bedrock_converse": "bedrock_converse",
 }


@@ -388,6 +412,10 @@ def determine_api_mode(provider: str, base_url: str = "") -> str:
    if pdef is not None:
        return TRANSPORT_TO_API_MODE.get(pdef.transport, "chat_completions")

+    # Direct provider checks for providers not in HERMES_OVERLAYS
+    if provider == "bedrock":
+        return "bedrock_converse"
+
    # URL-based heuristics for custom / unknown providers
    if base_url:
        url_lower = base_url.rstrip("/").lower()
@@ -395,6 +423,8 @@ def determine_api_mode(provider: str, base_url: str = "") -> str:
            return "anthropic_messages"
        if "api.openai.com" in url_lower:
            return "codex_responses"
+        if "bedrock-runtime" in url_lower and "amazonaws.com" in url_lower:
+            return "bedrock_converse"

    return "chat_completions"

@@ -22,6 +22,7 @@ from hermes_cli.auth import (
    resolve_nous_runtime_credentials,
    resolve_codex_runtime_credentials,
    resolve_qwen_runtime_credentials,
+    resolve_gemini_oauth_runtime_credentials,
    resolve_api_key_provider_credentials,
    resolve_external_process_provider_credentials,
    has_usable_secret,
@@ -41,6 +42,8 @@ def _detect_api_mode_for_url(base_url: str) -> Optional[str]:
    tool calls with reasoning (chat/completions returns 400).
    """
    normalized = (base_url or "").strip().lower().rstrip("/")
+    if "api.x.ai" in normalized:
+        return "codex_responses"
    if "api.openai.com" in normalized and "openrouter" not in normalized:
        return "codex_responses"
    return None
@@ -124,7 +127,7 @@ def _copilot_runtime_api_mode(model_cfg: Dict[str, Any], api_key: str) -> str:
        return "chat_completions"


-_VALID_API_MODES = {"chat_completions", "codex_responses", "anthropic_messages"}
+_VALID_API_MODES = {"chat_completions", "codex_responses", "anthropic_messages", "bedrock_converse"}


 def _parse_api_mode(raw: Any) -> Optional[str]:
@@ -154,6 +157,9 @@ def _resolve_runtime_from_pool_entry(
    elif provider == "qwen-oauth":
        api_mode = "chat_completions"
        base_url = base_url or DEFAULT_QWEN_BASE_URL
+    elif provider == "google-gemini-cli":
+        api_mode = "chat_completions"
+        base_url = base_url or "cloudcode-pa://google"
    elif provider == "anthropic":
        api_mode = "anthropic_messages"
        cfg_provider = str(model_cfg.get("provider") or "").strip().lower()
@@ -163,10 +169,13 @@ def _resolve_runtime_from_pool_entry(
        base_url = cfg_base_url or base_url or "https://api.anthropic.com"
    elif provider == "openrouter":
        base_url = base_url or OPENROUTER_BASE_URL
+    elif provider == "xai":
+        api_mode = "codex_responses"
    elif provider == "nous":
        api_mode = "chat_completions"
    elif provider == "copilot":
        api_mode = _copilot_runtime_api_mode(model_cfg, getattr(entry, "runtime_api_key", ""))
+        base_url = base_url or PROVIDER_REGISTRY["copilot"].inference_base_url
    else:
        configured_provider = str(model_cfg.get("provider") or "").strip().lower()
        # Honour model.base_url from config.yaml when the configured provider
@@ -627,6 +636,8 @@ def _resolve_explicit_runtime(
        api_mode = "chat_completions"
        if provider == "copilot":
            api_mode = _copilot_runtime_api_mode(model_cfg, api_key)
+        elif provider == "xai":
+            api_mode = "codex_responses"
        else:
            configured_mode = _parse_api_mode(model_cfg.get("api_mode"))
            if configured_mode:
@@ -797,6 +808,26 @@ def resolve_runtime_provider(
            logger.info("Qwen OAuth credentials failed; "
                        "falling through to next provider.")

+    if provider == "google-gemini-cli":
+        try:
+            creds = resolve_gemini_oauth_runtime_credentials()
+            return {
+                "provider": "google-gemini-cli",
+                "api_mode": "chat_completions",
+                "base_url": creds.get("base_url", ""),
+                "api_key": creds.get("api_key", ""),
+                "source": creds.get("source", "google-oauth"),
+                "expires_at_ms": creds.get("expires_at_ms"),
+                "email": creds.get("email", ""),
+                "project_id": creds.get("project_id", ""),
+                "requested_provider": requested_provider,
+            }
+        except AuthError:
+            if requested_provider != "auto":
+                raise
+            logger.info("Google Gemini OAuth credentials failed; "
+                        "falling through to next provider.")
+
    if provider == "copilot-acp":
        creds = resolve_external_process_provider_credentials(provider)
        return {
@@ -836,6 +867,77 @@ def resolve_runtime_provider(
            "requested_provider": requested_provider,
        }

+    # AWS Bedrock (native Converse API via boto3)
+    if provider == "bedrock":
+        from agent.bedrock_adapter import (
+            has_aws_credentials,
+            resolve_aws_auth_env_var,
+            resolve_bedrock_region,
+            is_anthropic_bedrock_model,
+        )
+        # When the user explicitly selected bedrock (not auto-detected),
+        # trust boto3's credential chain — it handles IMDS, ECS task roles,
+        # Lambda execution roles, SSO, and other implicit sources that our
+        # env-var check can't detect.
+        is_explicit = requested_provider in ("bedrock", "aws", "aws-bedrock", "amazon-bedrock", "amazon")
+        if not is_explicit and not has_aws_credentials():
+            raise AuthError(
+                "No AWS credentials found for Bedrock. Configure one of:\n"
+                "  - AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY\n"
+                "  - AWS_PROFILE (for SSO / named profiles)\n"
+                "  - IAM instance role (EC2, ECS, Lambda)\n"
+                "Or run 'aws configure' to set up credentials.",
+                code="no_aws_credentials",
+            )
+        # Read bedrock-specific config from config.yaml
+        from hermes_cli.config import load_config as _load_bedrock_config
+        _bedrock_cfg = _load_bedrock_config().get("bedrock", {})
+        # Region priority: config.yaml bedrock.region → env var → us-east-1
+        region = (_bedrock_cfg.get("region") or "").strip() or resolve_bedrock_region()
+        auth_source = resolve_aws_auth_env_var() or "aws-sdk-default-chain"
+        # Build guardrail config if configured
+        _gr = _bedrock_cfg.get("guardrail", {})
+        guardrail_config = None
+        if _gr.get("guardrail_identifier") and _gr.get("guardrail_version"):
+            guardrail_config = {
+                "guardrailIdentifier": _gr["guardrail_identifier"],
+                "guardrailVersion": _gr["guardrail_version"],
+            }
+            if _gr.get("stream_processing_mode"):
+                guardrail_config["streamProcessingMode"] = _gr["stream_processing_mode"]
+            if _gr.get("trace"):
+                guardrail_config["trace"] = _gr["trace"]
+        # Dual-path routing: Claude models use AnthropicBedrock SDK for full
+        # feature parity (prompt caching, thinking budgets, adaptive thinking).
+        # Non-Claude models use the Converse API for multi-model support.
+        _current_model = str(model_cfg.get("default") or "").strip()
+        if is_anthropic_bedrock_model(_current_model):
+            # Claude on Bedrock → AnthropicBedrock SDK → anthropic_messages path
+            runtime = {
+                "provider": "bedrock",
+                "api_mode": "anthropic_messages",
+                "base_url": f"https://bedrock-runtime.{region}.amazonaws.com",
+                "api_key": "aws-sdk",
+                "source": auth_source,
+                "region": region,
+                "bedrock_anthropic": True,  # Signal to use AnthropicBedrock client
+                "requested_provider": requested_provider,
+            }
+        else:
+            # Non-Claude (Nova, DeepSeek, Llama, etc.) → Converse API
+            runtime = {
+                "provider": "bedrock",
+                "api_mode": "bedrock_converse",
+                "base_url": f"https://bedrock-runtime.{region}.amazonaws.com",
+                "api_key": "aws-sdk",
+                "source": auth_source,
+                "region": region,
+                "requested_provider": requested_provider,
+            }
+        if guardrail_config:
+            runtime["guardrail_config"] = guardrail_config
+        return runtime
+
    # API-key providers (z.ai/GLM, Kimi, MiniMax, MiniMax-CN)
    pconfig = PROVIDER_REGISTRY.get(provider)
    if pconfig and pconfig.auth_type == "api_key":
@@ -852,6 +954,8 @@ def resolve_runtime_provider(
        api_mode = "chat_completions"
        if provider == "copilot":
            api_mode = _copilot_runtime_api_mode(model_cfg, creds.get("api_key", ""))
+        elif provider == "xai":
+            api_mode = "codex_responses"
        else:
            configured_provider = str(model_cfg.get("provider") or "").strip().lower()
            # Only honor persisted api_mode when it belongs to the same provider family.
@@ -20,10 +20,7 @@ import copy
 from pathlib import Path
 from typing import Optional, Dict, Any

-from hermes_cli.nous_subscription import (
-    apply_nous_provider_defaults,
-    get_nous_subscription_features,
-)
+from hermes_cli.nous_subscription import get_nous_subscription_features
 from tools.tool_backend_helpers import managed_nous_tools_enabled
 from hermes_constants import get_optional_skills_dir

@@ -105,7 +102,7 @@ _DEFAULT_PROVIDER_MODELS = {
    "ai-gateway": ["anthropic/claude-opus-4.6", "anthropic/claude-sonnet-4.6", "openai/gpt-5", "google/gemini-3-flash"],
    "kilocode": ["anthropic/claude-opus-4.6", "anthropic/claude-sonnet-4.6", "openai/gpt-5.4", "google/gemini-3-pro-preview", "google/gemini-3-flash-preview"],
    "opencode-zen": ["gpt-5.4", "gpt-5.3-codex", "claude-sonnet-4-6", "gemini-3-flash", "glm-5", "kimi-k2.5", "minimax-m2.7"],
-    "opencode-go": ["glm-5", "kimi-k2.5", "mimo-v2-pro", "mimo-v2-omni", "minimax-m2.5", "minimax-m2.7"],
+    "opencode-go": ["glm-5.1", "glm-5", "kimi-k2.5", "mimo-v2-pro", "mimo-v2-omni", "minimax-m2.5", "minimax-m2.7"],
    "huggingface": [
        "Qwen/Qwen3.5-397B-A17B", "Qwen/Qwen3-235B-A22B-Thinking-2507",
        "Qwen/Qwen3-Coder-480B-A35B-Instruct", "deepseek-ai/DeepSeek-R1-0528",
@@ -213,20 +210,20 @@ def prompt(question: str, default: str = None, password: bool = False) -> str:
        sys.exit(1)


-def _curses_prompt_choice(question: str, choices: list, default: int = 0) -> int:
+def _curses_prompt_choice(question: str, choices: list, default: int = 0, description: str | None = None) -> int:
    """Single-select menu using curses. Delegates to curses_radiolist."""
    from hermes_cli.curses_ui import curses_radiolist
-    return curses_radiolist(question, choices, selected=default, cancel_returns=-1)
+    return curses_radiolist(question, choices, selected=default, cancel_returns=-1, description=description)



-def prompt_choice(question: str, choices: list, default: int = 0) -> int:
+def prompt_choice(question: str, choices: list, default: int = 0, description: str | None = None) -> int:
    """Prompt for a choice from a list with arrow key navigation.

    Escape keeps the current default (skips the question).
    Ctrl+C exits the wizard.
    """
-    idx = _curses_prompt_choice(question, choices, default)
+    idx = _curses_prompt_choice(question, choices, default, description=description)
    if idx >= 0:
        if idx == default:
            print_info("  Skipped (keeping current)")
@@ -433,6 +430,8 @@ def _print_setup_summary(config: dict, hermes_home):
        tool_status.append(("Text-to-Speech (MiniMax)", True, None))
    elif tts_provider == "mistral" and get_env_value("MISTRAL_API_KEY"):
        tool_status.append(("Text-to-Speech (Mistral Voxtral)", True, None))
+    elif tts_provider == "gemini" and (get_env_value("GEMINI_API_KEY") or get_env_value("GOOGLE_API_KEY")):
+        tool_status.append(("Text-to-Speech (Google Gemini)", True, None))
    elif tts_provider == "neutts":
        try:
            import importlib.util
@@ -835,14 +834,7 @@ def setup_model_provider(config: dict, *, quick: bool = False):
            print_info("Skipped — add later with 'hermes setup' or configure AUXILIARY_VISION_* settings")


-    if selected_provider == "nous" and nous_subscription_selected:
-        changed_defaults = apply_nous_provider_defaults(config)
-        current_tts = str(config.get("tts", {}).get("provider") or "edge")
-        if "tts" in changed_defaults:
-            print_success("TTS provider set to: OpenAI TTS via your Nous subscription")
-        else:
-            print_info(f"Keeping your existing TTS provider: {current_tts}")
-
+    # Tool Gateway prompt is already shown by _model_flow_nous() above.
    save_config(config)

    if not quick and selected_provider != "nous":
@@ -920,8 +912,10 @@ def _setup_tts_provider(config: dict):
        "edge": "Edge TTS",
        "elevenlabs": "ElevenLabs",
        "openai": "OpenAI TTS",
+        "xai": "xAI TTS",
        "minimax": "MiniMax TTS",
        "mistral": "Mistral Voxtral TTS",
+        "gemini": "Google Gemini TTS",
        "neutts": "NeuTTS",
    }
    current_label = provider_labels.get(current_provider, current_provider)
@@ -941,12 +935,14 @@ def _setup_tts_provider(config: dict):
            "Edge TTS (free, cloud-based, no setup needed)",
            "ElevenLabs (premium quality, needs API key)",
            "OpenAI TTS (good quality, needs API key)",
+            "xAI TTS (Grok voices, needs API key)",
            "MiniMax TTS (high quality with voice cloning, needs API key)",
            "Mistral Voxtral TTS (multilingual, native Opus, needs API key)",
+            "Google Gemini TTS (30 prebuilt voices, prompt-controllable, needs API key)",
            "NeuTTS (local on-device, free, ~300MB model download)",
        ]
    )
-    providers.extend(["edge", "elevenlabs", "openai", "minimax", "mistral", "neutts"])
+    providers.extend(["edge", "elevenlabs", "openai", "xai", "minimax", "mistral", "gemini", "neutts"])
    choices.append(f"Keep current ({current_label})")
    keep_current_idx = len(choices) - 1
    idx = prompt_choice("Select TTS provider:", choices, keep_current_idx)
@@ -1012,6 +1008,23 @@ def _setup_tts_provider(config: dict):
                print_warning("No API key provided. Falling back to Edge TTS.")
                selected = "edge"

+    elif selected == "xai":
+        existing = get_env_value("XAI_API_KEY")
+        if not existing:
+            print()
+            api_key = prompt("xAI API key for TTS", password=True)
+            if api_key:
+                save_env_value("XAI_API_KEY", api_key)
+                print_success("xAI TTS API key saved")
+            else:
+                from hermes_constants import display_hermes_home as _dhh
+                print_warning(
+                    "No xAI API key provided for TTS. Configure XAI_API_KEY via "
+                    f"hermes setup model or {_dhh()}/.env to use xAI TTS. "
+                    "Falling back to Edge TTS."
+                )
+                selected = "edge"
+
    elif selected == "minimax":
        existing = get_env_value("MINIMAX_API_KEY")
        if not existing:
@@ -1036,6 +1049,19 @@ def _setup_tts_provider(config: dict):
                print_warning("No API key provided. Falling back to Edge TTS.")
                selected = "edge"

+    elif selected == "gemini":
+        existing = get_env_value("GEMINI_API_KEY") or get_env_value("GOOGLE_API_KEY")
+        if not existing:
+            print()
+            print_info("Get a free API key at https://aistudio.google.com/app/apikey")
+            api_key = prompt("Gemini API key for TTS", password=True)
+            if api_key:
+                save_env_value("GEMINI_API_KEY", api_key)
+                print_success("Gemini TTS API key saved")
+            else:
+                print_warning("No API key provided. Falling back to Edge TTS.")
+                selected = "edge"
+
    # Save the selection
    if "tts" not in config:
        config["tts"] = {}
@@ -1611,9 +1637,19 @@ def _setup_telegram():
            return

    print_info("Create a bot via @BotFather on Telegram")
-    token = prompt("Telegram bot token", password=True)
-    if not token:
-        return
+    import re
+
+    while True:
+        token = prompt("Telegram bot token", password=True)
+        if not token:
+            return
+        if not re.match(r"^\d+:[A-Za-z0-9_-]{30,}$", token):
+            print_error(
+                "Invalid token format. Expected: <numeric_id>:<alphanumeric_hash> "
+                "(e.g., 123456789:ABCdefGHI-jklMNOpqrSTUvwxYZ01234567)"
+            )
+            continue
+        break
    save_env_value("TELEGRAM_BOT_TOKEN", token)
    print_success("Telegram token saved")

@@ -708,7 +708,9 @@ def init_skin_from_config(config: dict) -> None:

    Call this once during CLI init with the loaded config dict.
    """
-    display = config.get("display", {})
+    display = config.get("display") or {}
+    if not isinstance(display, dict):
+        display = {}
    skin_name = display.get("skin", "default")
    if isinstance(skin_name, str) and skin_name.strip():
        set_active_skin(skin_name.strip())
@@ -212,7 +212,7 @@ def show_status(args):
    if managed_nous_tools_enabled():
        features = get_nous_subscription_features(config)
        print()
-        print(color("◆ Nous Subscription Features", Colors.CYAN, Colors.BOLD))
+        print(color("◆ Nous Tool Gateway", Colors.CYAN, Colors.BOLD))
        if not features.nous_auth_present:
            print("  Nous Portal   ✗ not logged in")
        else:
@@ -230,6 +230,18 @@ def show_status(args):
            else:
                state = "not configured"
            print(f"  {feature.label:<15} {check_mark(feature.available or feature.active or feature.managed_by_nous)} {state}")
+    elif nous_logged_in:
+        # Logged into Nous but on the free tier — show upgrade nudge
+        print()
+        print(color("◆ Nous Tool Gateway", Colors.CYAN, Colors.BOLD))
+        print("  Your free-tier Nous account does not include Tool Gateway access.")
+        print("  Upgrade your subscription to unlock managed web, image, TTS, and browser tools.")
+        try:
+            portal_url = nous_status.get("portal_base_url", "").rstrip("/")
+            if portal_url:
+                print(f"  Upgrade: {portal_url}")
+        except Exception:
+            pass

    # =========================================================================
    # API-Key Providers
@@ -146,6 +146,14 @@ TOOL_CATEGORIES = {
                ],
                "tts_provider": "openai",
            },
+            {
+                "name": "xAI TTS",
+                "tag": "Grok voices - requires xAI API key",
+                "env_vars": [
+                    {"key": "XAI_API_KEY", "prompt": "xAI API key", "url": "https://console.x.ai/"},
+                ],
+                "tts_provider": "xai",
+            },
            {
                "name": "ElevenLabs",
                "badge": "paid",
@@ -164,6 +172,15 @@ TOOL_CATEGORIES = {
                ],
                "tts_provider": "mistral",
            },
+            {
+                "name": "Google Gemini TTS",
+                "badge": "preview",
+                "tag": "30 prebuilt voices, controllable via prompts",
+                "env_vars": [
+                    {"key": "GEMINI_API_KEY", "prompt": "Gemini API key", "url": "https://aistudio.google.com/app/apikey"},
+                ],
+                "tts_provider": "gemini",
+            },
        ],
    },
    "web": {
@@ -946,34 +963,49 @@ def _configure_provider(provider: dict, config: dict):

    # Set TTS provider in config if applicable
    if provider.get("tts_provider"):
-        config.setdefault("tts", {})["provider"] = provider["tts_provider"]
+        tts_cfg = config.setdefault("tts", {})
+        tts_cfg["provider"] = provider["tts_provider"]
+        tts_cfg["use_gateway"] = bool(managed_feature)

    # Set browser cloud provider in config if applicable
    if "browser_provider" in provider:
        bp = provider["browser_provider"]
+        browser_cfg = config.setdefault("browser", {})
        if bp == "local":
-            config.setdefault("browser", {})["cloud_provider"] = "local"
+            browser_cfg["cloud_provider"] = "local"
            _print_success("  Browser set to local mode")
        elif bp:
-            config.setdefault("browser", {})["cloud_provider"] = bp
+            browser_cfg["cloud_provider"] = bp
            _print_success(f"  Browser cloud provider set to: {bp}")
+        browser_cfg["use_gateway"] = bool(managed_feature)

    # Set web search backend in config if applicable
    if provider.get("web_backend"):
-        config.setdefault("web", {})["backend"] = provider["web_backend"]
+        web_cfg = config.setdefault("web", {})
+        web_cfg["backend"] = provider["web_backend"]
+        web_cfg["use_gateway"] = bool(managed_feature)
        _print_success(f"  Web backend set to: {provider['web_backend']}")

+    # For tools without a specific config key (e.g. image_gen), still
+    # track use_gateway so the runtime knows the user's intent.
+    if managed_feature and managed_feature not in ("web", "tts", "browser"):
+        config.setdefault(managed_feature, {})["use_gateway"] = True
+    elif not managed_feature:
+        # User picked a non-gateway provider — find which category this
+        # belongs to and clear use_gateway if it was previously set.
+        for cat_key, cat in TOOL_CATEGORIES.items():
+            if provider in cat.get("providers", []):
+                section = config.get(cat_key)
+                if isinstance(section, dict) and section.get("use_gateway"):
+                    section["use_gateway"] = False
+                break
+
    if not env_vars:
        if provider.get("post_setup"):
            _run_post_setup(provider["post_setup"])
        _print_success(f"  {provider['name']} - no configuration needed!")
        if managed_feature:
            _print_info("  Requests for this tool will be billed to your Nous subscription.")
-            override_envs = provider.get("override_env_vars", [])
-            if any(get_env_value(env_var) for env_var in override_envs):
-                _print_warning(
-                    "  Direct credentials are still configured and may take precedence until you remove them from ~/.hermes/.env."
-                )
        return

    # Prompt for each required env var
@@ -1179,11 +1211,6 @@ def _reconfigure_provider(provider: dict, config: dict):
        _print_success(f"  {provider['name']} - no configuration needed!")
        if managed_feature:
            _print_info("  Requests for this tool will be billed to your Nous subscription.")
-            override_envs = provider.get("override_env_vars", [])
-            if any(get_env_value(env_var) for env_var in override_envs):
-                _print_warning(
-                    "  Direct credentials are still configured and may take precedence until you remove them from ~/.hermes/.env."
-                )
        return

    for var in env_vars:
@@ -11,6 +11,7 @@ Usage:

 import asyncio
 import hmac
+import importlib.util
 import json
 import logging
 import os
@@ -96,6 +97,9 @@ _PUBLIC_API_PATHS: frozenset = frozenset({
    "/api/config/defaults",
    "/api/config/schema",
    "/api/model/info",
+    "/api/dashboard/themes",
+    "/api/dashboard/plugins",
+    "/api/dashboard/plugins/rescan",
 })


@@ -114,7 +118,7 @@ def _require_token(request: Request) -> None:
 async def auth_middleware(request: Request, call_next):
    """Require the session token on all /api/ routes except the public list."""
    path = request.url.path
-    if path.startswith("/api/") and path not in _PUBLIC_API_PATHS:
+    if path.startswith("/api/") and path not in _PUBLIC_API_PATHS and not path.startswith("/api/plugins/"):
        auth = request.headers.get("authorization", "")
        expected = f"Bearer {_SESSION_TOKEN}"
        if not hmac.compare_digest(auth.encode(), expected.encode()):
@@ -166,6 +170,11 @@ _SCHEMA_OVERRIDES: Dict[str, Dict[str, Any]] = {
        "description": "CLI visual theme",
        "options": ["default", "ares", "mono", "slate"],
    },
+    "dashboard.theme": {
+        "type": "select",
+        "description": "Web dashboard visual theme",
+        "options": ["default", "midnight", "ember", "mono", "cyberpunk", "rose"],
+    },
    "display.resume_display": {
        "type": "select",
        "description": "How resumed sessions display history",
@@ -224,6 +233,7 @@ _CATEGORY_MERGE: Dict[str, str] = {
    "approvals": "security",
    "human_delay": "display",
    "smart_model_routing": "agent",
+    "dashboard": "display",
 }

 # Display order for tabs — unlisted categories sort alphabetically after these.
@@ -457,6 +467,7 @@ async def get_status():
        "latest_config_version": latest_ver,
        "gateway_running": gateway_running,
        "gateway_pid": gateway_pid,
+        "gateway_health_url": _GATEWAY_HEALTH_URL,
        "gateway_state": gateway_state,
        "gateway_platforms": gateway_platforms,
        "gateway_exit_reason": gateway_exit_reason,
@@ -1977,8 +1988,7 @@ async def update_config_raw(body: RawConfigUpdate):
 @app.get("/api/analytics/usage")
 async def get_usage_analytics(days: int = 30):
    from hermes_state import SessionDB
-    from hermes_constants import get_hermes_home
-    db = SessionDB(db_path=get_hermes_home() / "state.db")
+    db = SessionDB()
    try:
        cutoff = time.time() - (days * 86400)
        cur = db._conn.execute("""
@@ -1986,12 +1996,10 @@ async def get_usage_analytics(days: int = 30):
                   SUM(input_tokens) as input_tokens,
                   SUM(output_tokens) as output_tokens,
                   SUM(cache_read_tokens) as cache_read_tokens,
-                   SUM(cache_write_tokens) as cache_write_tokens,
                   SUM(reasoning_tokens) as reasoning_tokens,
                   COALESCE(SUM(estimated_cost_usd), 0) as estimated_cost,
                   COALESCE(SUM(actual_cost_usd), 0) as actual_cost,
-                   COUNT(*) as sessions,
-                   SUM(COALESCE(api_call_count, 0)) as api_calls
+                   COUNT(*) as sessions
            FROM sessions WHERE started_at > ?
            GROUP BY day ORDER BY day
        """, (cutoff,))
@@ -2001,13 +2009,10 @@ async def get_usage_analytics(days: int = 30):
            SELECT model,
                   SUM(input_tokens) as input_tokens,
                   SUM(output_tokens) as output_tokens,
-                   SUM(cache_read_tokens) as cache_read_tokens,
-                   SUM(cache_write_tokens) as cache_write_tokens,
                   COALESCE(SUM(estimated_cost_usd), 0) as estimated_cost,
-                   COUNT(*) as sessions,
-                   SUM(COALESCE(api_call_count, 0)) as api_calls
+                   COUNT(*) as sessions
            FROM sessions WHERE started_at > ? AND model IS NOT NULL
-            GROUP BY model ORDER BY SUM(input_tokens) + SUM(cache_read_tokens) + SUM(cache_write_tokens) + SUM(output_tokens) DESC
+            GROUP BY model ORDER BY SUM(input_tokens) + SUM(output_tokens) DESC
        """, (cutoff,))
        by_model = [dict(r) for r in cur2.fetchall()]

@@ -2015,12 +2020,10 @@ async def get_usage_analytics(days: int = 30):
            SELECT SUM(input_tokens) as total_input,
                   SUM(output_tokens) as total_output,
                   SUM(cache_read_tokens) as total_cache_read,
-                   SUM(cache_write_tokens) as total_cache_write,
                   SUM(reasoning_tokens) as total_reasoning,
                   COALESCE(SUM(estimated_cost_usd), 0) as total_estimated_cost,
                   COALESCE(SUM(actual_cost_usd), 0) as total_actual_cost,
-                   COUNT(*) as total_sessions,
-                   SUM(COALESCE(api_call_count, 0)) as total_api_calls
+                   COUNT(*) as total_sessions
            FROM sessions WHERE started_at > ?
        """, (cutoff,))
        totals = dict(cur3.fetchone())
@@ -2076,6 +2079,237 @@ def mount_spa(application: FastAPI):
        return _serve_index()


+# ---------------------------------------------------------------------------
+# Dashboard theme endpoints
+# ---------------------------------------------------------------------------
+
+# Built-in dashboard themes — label + description only.  The actual color
+# definitions live in the frontend (web/src/themes/presets.ts).
+_BUILTIN_DASHBOARD_THEMES = [
+    {"name": "default",   "label": "Hermes Teal",  "description": "Classic dark teal — the canonical Hermes look"},
+    {"name": "midnight",  "label": "Midnight",      "description": "Deep blue-violet with cool accents"},
+    {"name": "ember",     "label": "Ember",          "description": "Warm crimson and bronze — forge vibes"},
+    {"name": "mono",      "label": "Mono",           "description": "Clean grayscale — minimal and focused"},
+    {"name": "cyberpunk", "label": "Cyberpunk",      "description": "Neon green on black — matrix terminal"},
+    {"name": "rose",      "label": "Rosé",           "description": "Soft pink and warm ivory — easy on the eyes"},
+]
+
+
+def _discover_user_themes() -> list:
+    """Scan ~/.hermes/dashboard-themes/*.yaml for user-created themes."""
+    themes_dir = get_hermes_home() / "dashboard-themes"
+    if not themes_dir.is_dir():
+        return []
+    result = []
+    for f in sorted(themes_dir.glob("*.yaml")):
+        try:
+            data = yaml.safe_load(f.read_text(encoding="utf-8"))
+            if isinstance(data, dict) and data.get("name"):
+                result.append({
+                    "name": data["name"],
+                    "label": data.get("label", data["name"]),
+                    "description": data.get("description", ""),
+                })
+        except Exception:
+            continue
+    return result
+
+
+@app.get("/api/dashboard/themes")
+async def get_dashboard_themes():
+    """Return available themes and the currently active one."""
+    config = load_config()
+    active = config.get("dashboard", {}).get("theme", "default")
+    user_themes = _discover_user_themes()
+    # Merge built-in + user themes; built-ins win on name clashes (duplicate user names are skipped).
+    seen = set()
+    themes = []
+    for t in _BUILTIN_DASHBOARD_THEMES:
+        seen.add(t["name"])
+        themes.append(t)
+    for t in user_themes:
+        if t["name"] not in seen:
+            themes.append(t)
+            seen.add(t["name"])
+    return {"themes": themes, "active": active}
+
+
+class ThemeSetBody(BaseModel):
+    name: str
+
+
+@app.put("/api/dashboard/theme")
+async def set_dashboard_theme(body: ThemeSetBody):
+    """Set the active dashboard theme (persists to config.yaml)."""
+    config = load_config()
+    if "dashboard" not in config:
+        config["dashboard"] = {}
+    config["dashboard"]["theme"] = body.name
+    save_config(config)
+    return {"ok": True, "theme": body.name}
+
+
+# ---------------------------------------------------------------------------
+# Dashboard plugin system
+# ---------------------------------------------------------------------------
+
+def _discover_dashboard_plugins() -> list:
+    """Scan plugins/*/dashboard/manifest.json for dashboard extensions.
+
+    Checks three plugin sources (same as hermes_cli.plugins):
+    1. User plugins:    ~/.hermes/plugins/<name>/dashboard/manifest.json
+    2. Bundled plugins: <repo>/plugins/<name>/dashboard/manifest.json  (memory/, etc.)
+    3. Project plugins: ./.hermes/plugins/  (only if HERMES_ENABLE_PROJECT_PLUGINS)
+    """
+    plugins = []
+    seen_names: set = set()
+
+    search_dirs = [
+        (get_hermes_home() / "plugins", "user"),
+        (PROJECT_ROOT / "plugins" / "memory", "bundled"),
+        (PROJECT_ROOT / "plugins", "bundled"),
+    ]
+    if os.environ.get("HERMES_ENABLE_PROJECT_PLUGINS"):
+        search_dirs.append((Path.cwd() / ".hermes" / "plugins", "project"))
+
+    for plugins_root, source in search_dirs:
+        if not plugins_root.is_dir():
+            continue
+        for child in sorted(plugins_root.iterdir()):
+            if not child.is_dir():
+                continue
+            manifest_file = child / "dashboard" / "manifest.json"
+            if not manifest_file.exists():
+                continue
+            try:
+                data = json.loads(manifest_file.read_text(encoding="utf-8"))
+                name = data.get("name", child.name)
+                if name in seen_names:
+                    continue
+                seen_names.add(name)
+                plugins.append({
+                    "name": name,
+                    "label": data.get("label", name),
+                    "description": data.get("description", ""),
+                    "icon": data.get("icon", "Puzzle"),
+                    "version": data.get("version", "0.0.0"),
+                    "tab": data.get("tab", {"path": f"/{name}", "position": "end"}),
+                    "entry": data.get("entry", "dist/index.js"),
+                    "css": data.get("css"),
+                    "has_api": bool(data.get("api")),
+                    "source": source,
+                    "_dir": str(child / "dashboard"),
+                    "_api_file": data.get("api"),
+                })
+            except Exception as exc:
+                _log.warning("Bad dashboard plugin manifest %s: %s", manifest_file, exc)
+                continue
+    return plugins
+
+
+# Cache discovered plugins per-process (refresh on explicit re-scan).
+_dashboard_plugins_cache: Optional[list] = None
+
+
+def _get_dashboard_plugins(force_rescan: bool = False) -> list:
+    global _dashboard_plugins_cache
+    if _dashboard_plugins_cache is None or force_rescan:
+        _dashboard_plugins_cache = _discover_dashboard_plugins()
+    return _dashboard_plugins_cache
+
+
+@app.get("/api/dashboard/plugins")
+async def get_dashboard_plugins():
+    """Return discovered dashboard plugins."""
+    plugins = _get_dashboard_plugins()
+    # Strip internal fields before sending to frontend.
+    return [
+        {k: v for k, v in p.items() if not k.startswith("_")}
+        for p in plugins
+    ]
+
+
+@app.get("/api/dashboard/plugins/rescan")
+async def rescan_dashboard_plugins():
+    """Force re-scan of dashboard plugins."""
+    plugins = _get_dashboard_plugins(force_rescan=True)
+    return {"ok": True, "count": len(plugins)}
+
+
+@app.get("/dashboard-plugins/{plugin_name}/{file_path:path}")
+async def serve_plugin_asset(plugin_name: str, file_path: str):
+    """Serve static assets from a dashboard plugin directory.
+
+    Only serves files from the plugin's ``dashboard/`` subdirectory.
+    Path traversal is blocked by checking ``resolve().is_relative_to()``.
+    """
+    plugins = _get_dashboard_plugins()
+    plugin = next((p for p in plugins if p["name"] == plugin_name), None)
+    if not plugin:
+        raise HTTPException(status_code=404, detail="Plugin not found")
+
+    base = Path(plugin["_dir"])
+    target = (base / file_path).resolve()
+
+    if not target.is_relative_to(base.resolve()):
+        raise HTTPException(status_code=403, detail="Path traversal blocked")
+    if not target.exists() or not target.is_file():
+        raise HTTPException(status_code=404, detail="File not found")
+
+    # Guess content type
+    suffix = target.suffix.lower()
+    content_types = {
+        ".js": "application/javascript",
+        ".mjs": "application/javascript",
+        ".css": "text/css",
+        ".json": "application/json",
+        ".html": "text/html",
+        ".svg": "image/svg+xml",
+        ".png": "image/png",
+        ".jpg": "image/jpeg",
+        ".woff2": "font/woff2",
+        ".woff": "font/woff",
+    }
+    media_type = content_types.get(suffix, "application/octet-stream")
+    return FileResponse(target, media_type=media_type)
+
+
+def _mount_plugin_api_routes():
+    """Import and mount backend API routes from plugins that declare them.
+
+    Each plugin's ``api`` field points to a Python file that must expose
+    a ``router`` (FastAPI APIRouter).  Routes are mounted under
+    ``/api/plugins/<name>/``.
+    """
+    for plugin in _get_dashboard_plugins():
+        api_file_name = plugin.get("_api_file")
+        if not api_file_name:
+            continue
+        api_path = Path(plugin["_dir"]) / api_file_name
+        if not api_path.exists():
+            _log.warning("Plugin %s declares api=%s but file not found", plugin["name"], api_file_name)
+            continue
+        try:
+            spec = importlib.util.spec_from_file_location(
+                f"hermes_dashboard_plugin_{plugin['name']}", api_path,
+            )
+            if spec is None or spec.loader is None:
+                continue
+            mod = importlib.util.module_from_spec(spec)
+            spec.loader.exec_module(mod)
+            router = getattr(mod, "router", None)
+            if router is None:
+                _log.warning("Plugin %s api file has no 'router' attribute", plugin["name"])
+                continue
+            app.include_router(router, prefix=f"/api/plugins/{plugin['name']}")
+            _log.info("Mounted plugin API routes: /api/plugins/%s/", plugin["name"])
+        except Exception as exc:
+            _log.warning("Failed to load plugin %s API routes: %s", plugin["name"], exc)
+
+
+# Mount plugin API routes before the SPA catch-all.
+_mount_plugin_api_routes()
+
 mount_spa(app)
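
The path traversal guard in `serve_plugin_asset` above can be exercised in isolation. The following is a minimal sketch of the same `resolve()` plus `is_relative_to()` check, assuming Python 3.9+; the helper name `safe_join` and the temporary directory layout are hypothetical and are not part of this change.

```python
import tempfile
from pathlib import Path
from typing import Optional


def safe_join(base: Path, requested: str) -> Optional[Path]:
    """Return the resolved target if it stays inside ``base``, else None.

    Hypothetical helper mirroring the check in serve_plugin_asset:
    resolve both paths first so ``..`` segments and symlinks are
    collapsed before the containment test runs.
    """
    base_resolved = base.resolve()
    target = (base / requested).resolve()
    if not target.is_relative_to(base_resolved):
        return None
    return target


# Illustrative layout only: a fake plugin "dashboard" directory.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "dashboard"
    base.mkdir()
    (base / "index.js").write_text("// plugin entry", encoding="utf-8")

    inside = safe_join(base, "index.js")         # stays under base
    outside = safe_join(base, "../secrets.txt")  # escapes base, rejected
```

Resolving before comparing matters: a plain string prefix check on the unresolved path would accept `dashboard/../secrets.txt`.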


@@ -31,7 +31,7 @@ T = TypeVar("T")

 DEFAULT_DB_PATH = get_hermes_home() / "state.db"

-SCHEMA_VERSION = 7
+SCHEMA_VERSION = 6

 SCHEMA_SQL = """
 CREATE TABLE IF NOT EXISTS schema_version (
@@ -65,7 +65,6 @@ CREATE TABLE IF NOT EXISTS sessions (
    cost_source TEXT,
    pricing_version TEXT,
    title TEXT,
-    api_call_count INTEGER DEFAULT 0,
    FOREIGN KEY (parent_session_id) REFERENCES sessions(id)
 );

@@ -330,17 +329,6 @@ class SessionDB:
                    except sqlite3.OperationalError:
                        pass  # Column already exists
                cursor.execute("UPDATE schema_version SET version = 6")
-            if current_version < 7:
-                # v7: add api_call_count column to sessions — tracks the number
-                # of individual LLM API calls made within a session (as opposed
-                # to the session count itself).
-                try:
-                    cursor.execute(
-                        'ALTER TABLE sessions ADD COLUMN "api_call_count" INTEGER DEFAULT 0'
-                    )
-                except sqlite3.OperationalError:
-                    pass  # Column already exists
-                cursor.execute("UPDATE schema_version SET version = 7")

        # Unique title index — always ensure it exists (safe to run after migrations
        # since the title column is guaranteed to exist at this point)
@@ -438,7 +426,6 @@ class SessionDB:
        billing_provider: Optional[str] = None,
        billing_base_url: Optional[str] = None,
        billing_mode: Optional[str] = None,
-        api_call_count: int = 0,
        absolute: bool = False,
    ) -> None:
        """Update token counters and backfill model if not already set.
@@ -468,8 +455,7 @@ class SessionDB:
                   billing_provider = COALESCE(billing_provider, ?),
                   billing_base_url = COALESCE(billing_base_url, ?),
                   billing_mode = COALESCE(billing_mode, ?),
-                   model = COALESCE(model, ?),
-                   api_call_count = ?
+                   model = COALESCE(model, ?)
                   WHERE id = ?"""
        else:
            sql = """UPDATE sessions SET
@@ -489,8 +475,7 @@ class SessionDB:
                   billing_provider = COALESCE(billing_provider, ?),
                   billing_base_url = COALESCE(billing_base_url, ?),
                   billing_mode = COALESCE(billing_mode, ?),
-                   model = COALESCE(model, ?),
-                   api_call_count = COALESCE(api_call_count, 0) + ?
+                   model = COALESCE(model, ?)
                   WHERE id = ?"""
        params = (
            input_tokens,
@@ -508,7 +493,6 @@ class SessionDB:
            billing_base_url,
            billing_mode,
            model,
-            api_call_count,
            session_id,
        )
        def _do(conn):
@@ -1176,6 +1160,23 @@ class SessionDB:
            results.append({**session, "messages": messages})
        return results

+    # ---------------------------------------------------------------
+    # Export sanitization
+    # ---------------------------------------------------------------
+    #
+    # When users share session exports for debugging or training, the
+    # raw JSON contains every user message, tool output, and reasoning
+    # trace — which often includes file contents, command output, env
+    # variables, paths, and other confidential information.
+    #
+    # ``sanitize_session_export`` produces a deep copy of the export
+    # with all content fields replaced by opaque ``[redacted:<kind>:<id>]``
+    # tokens. Structural metadata (IDs, roles, timestamps, token counts,
+    # tool names, finish reasons, model info, cost data) is preserved
+    # so that the shape of a conversation is still analysable.
+    #
+    # Inspired by anomalyco/opencode#22489 (opencode's ``export --sanitize``).
+
    def clear_messages(self, session_id: str) -> None:
        """Delete all messages for a session and reset its counters."""
        def _do(conn):
@@ -1252,3 +1253,136 @@ class SessionDB:
            return len(session_ids)

        return self._execute_write(_do)
+
+
+# =========================================================================
+# Session export sanitization
+# =========================================================================
+#
+# Ported from anomalyco/opencode#22489 — users often want to share a
+# session export for bug reports, feature requests, or training data
+# collection, but the raw export contains every user prompt, tool
+# output, file content, and reasoning trace. ``sanitize_session_export``
+# replaces content fields with opaque tokens while preserving the
+# conversation's structure and metrics.
+
+# Message-level content fields that are always redacted on a message.
+_REDACT_MSG_STRING_FIELDS = (
+    "content",
+    "reasoning",
+)
+
+# Session-level fields that can contain user-facing text.
+_REDACT_SESSION_STRING_FIELDS = (
+    "system_prompt",
+    "title",
+)
+
+
+def _redact_token(kind: str, id_: Any, value: Any) -> Any:
+    """Produce an opaque redaction token. Preserves empty/None values."""
+    if value in (None, "", b""):
+        return value
+    return f"[redacted:{kind}:{id_}]"
+
+
+def _redact_tool_call(call: Any, msg_id: Any, index: int) -> Any:
+    """Redact arguments inside a tool_call while preserving structure (id, name)."""
+    if not isinstance(call, dict):
+        return call
+    out = dict(call)
+    tcid = out.get("id") or f"{msg_id}-{index}"
+    fn = out.get("function")
+    if isinstance(fn, dict):
+        new_fn = dict(fn)
+        if "arguments" in new_fn and new_fn["arguments"] not in (None, "", "{}"):
+            new_fn["arguments"] = _redact_token("tool-input", tcid, new_fn["arguments"])
+        out["function"] = new_fn
+    # Some schemas put args at the top level rather than under ``function``.
+    if "arguments" in out and out["arguments"] not in (None, "", "{}"):
+        out["arguments"] = _redact_token("tool-input", tcid, out["arguments"])
+    return out
+
+
+def _redact_reasoning_details(details: Any, msg_id: Any) -> Any:
+    """Redact text inside OpenAI / Anthropic reasoning_details blocks.
+
+    ``reasoning_details`` is a list of dicts with shapes like::
+
+        {"type": "reasoning.text", "text": "..."}
+        {"type": "reasoning.encrypted", "data": "..."}
+        {"type": "reasoning.summary", "summary": "..."}
+
+    We preserve the block type/structure and redact the inner payload.
+    """
+    if not isinstance(details, list):
+        return details
+    out = []
+    for idx, block in enumerate(details):
+        if not isinstance(block, dict):
+            out.append(block)
+            continue
+        new_block = dict(block)
+        for key in ("text", "data", "summary", "content"):
+            if key in new_block and new_block[key] not in (None, ""):
+                new_block[key] = _redact_token(f"reasoning-{key}", f"{msg_id}-{idx}", new_block[key])
+        out.append(new_block)
+    return out
+
+
+def _redact_message(msg: Dict[str, Any]) -> Dict[str, Any]:
+    """Return a sanitized copy of a single message row."""
+    if not isinstance(msg, dict):
+        return msg
+    msg_id = msg.get("id", "msg")
+    out = dict(msg)
+
+    # Plain string content fields.
+    for field in _REDACT_MSG_STRING_FIELDS:
+        if field in out and out[field] not in (None, ""):
+            out[field] = _redact_token(field.replace("_", "-"), msg_id, out[field])
+
+    # Tool calls: keep structure (id, name) but redact arguments.
+    tcs = out.get("tool_calls")
+    if isinstance(tcs, list):
+        out["tool_calls"] = [_redact_tool_call(tc, msg_id, i) for i, tc in enumerate(tcs)]
+
+    # Reasoning details: preserve block structure, redact text/data.
+    if "reasoning_details" in out:
+        out["reasoning_details"] = _redact_reasoning_details(out["reasoning_details"], msg_id)
+
+    # Codex reasoning items follow the same shape as reasoning_details.
+    if "codex_reasoning_items" in out:
+        out["codex_reasoning_items"] = _redact_reasoning_details(out["codex_reasoning_items"], msg_id)
+
+    return out
+
+
+def sanitize_session_export(session: Dict[str, Any]) -> Dict[str, Any]:
+    """Return a deep-sanitized copy of a session export.
+
+    All user-facing content (message text, reasoning, tool arguments and
+    outputs, system prompt, title) is replaced by ``[redacted:<kind>:<id>]``
+    tokens. Structural metadata (ids, timestamps, token counts, tool names,
+    model/provider info, cost data, finish reasons) is preserved so the
+    export remains useful for debugging schema issues, analysing tool-use
+    patterns, or counting sessions without leaking confidential data.
+
+    The input dict is not mutated.
+    """
+    if not isinstance(session, dict):
+        return session
+    sid = session.get("id", "session")
+    out = dict(session)
+
+    # Session-level text fields (title, system prompt).
+    for field in _REDACT_SESSION_STRING_FIELDS:
+        if field in out and out[field] not in (None, ""):
+            out[field] = _redact_token(field.replace("_", "-"), sid, out[field])
+
+    # Messages list: sanitize each row.
+    msgs = out.get("messages")
+    if isinstance(msgs, list):
+        out["messages"] = [_redact_message(m) for m in msgs]
+
+    return out
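
One way to check that a sanitized export no longer carries secrets is to walk the JSON and search every string for known patterns. The sketch below is an assumption-laden illustration, not part of this change: `find_leaks`, `iter_strings`, and `SECRET_PATTERNS` are hypothetical names, the `sk-proj-` regex is a rough guess at the key shape mentioned in the commit message, and the `sanitized` dict is written by hand in the `[redacted:<kind>:<id>]` format rather than produced by `sanitize_session_export`.

```python
import re
from typing import Any, Iterator, List, Tuple

# Hypothetical pattern list; extend with whatever secrets matter locally.
SECRET_PATTERNS = [re.compile(r"sk-proj-[A-Za-z0-9_-]+")]


def iter_strings(obj: Any, path: str = "$") -> Iterator[Tuple[str, str]]:
    """Yield (json_path, text) for every string nested inside obj."""
    if isinstance(obj, str):
        yield path, obj
    elif isinstance(obj, dict):
        for key, value in obj.items():
            yield from iter_strings(value, f"{path}.{key}")
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            yield from iter_strings(value, f"{path}[{index}]")


def find_leaks(export: Any) -> List[str]:
    """Return the JSON paths whose string value matches a secret pattern."""
    return [
        path
        for path, text in iter_strings(export)
        if any(p.search(text) for p in SECRET_PATTERNS)
    ]


# Hand-written sample data, shaped like a session export.
raw = {
    "id": "s1",
    "title": "debug",
    "messages": [{"id": 1, "role": "user", "content": "my key is sk-proj-abc123"}],
}
sanitized = {
    "id": "s1",
    "title": "[redacted:title:s1]",
    "messages": [{"id": 1, "role": "user", "content": "[redacted:content:1]"}],
}

raw_leaks = find_leaks(raw)
sanitized_leaks = find_leaks(sanitized)
```

With the sample data, `raw_leaks` points at the user message content and `sanitized_leaks` is empty. A check like this only covers patterns it knows about, so it complements the redaction rather than replacing it.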
@@ -1,665 +0,0 @@
-<!doctype html>
-<html lang="en">
-  <head>
-    <meta charset="UTF-8" />
-    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <title>Hermes Agent — An Agent That Grows With You</title>
-    <meta
-      name="description"
-      content="An open-source agent that grows with you — learns your projects, builds its own skills, and reaches you wherever you are. By Nous Research."
-    />
-    <meta name="theme-color" content="#0A0E1A" />
-
-    <meta property="og:title" content="Hermes Agent — AI Agent Framework" />
-    <meta
-      property="og:description"
-      content="An open-source agent that grows with you. Install it, give it your messaging accounts, and it becomes a persistent personal agent — learning your projects, building its own skills, and reaching you wherever you are."
-    />
-    <meta property="og:type" content="website" />
-    <meta property="og:url" content="https://hermes-agent.nousresearch.com" />
-    <meta
-      property="og:image"
-      content="https://hermes-agent.nousresearch.com/hermes-agent-banner.png"
-    />
-
-    <link rel="preconnect" href="https://fonts.googleapis.com" />
-    <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin />
-    <link
-      href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&family=JetBrains+Mono:wght@400;500&display=swap"
-      rel="stylesheet"
-    />
-
-    <script
-      src="https://cdnjs.cloudflare.com/ajax/libs/three.js/r128/three.min.js"
-      defer
-    ></script>
-    <link rel="stylesheet" href="style.css" />
-    <link rel="icon" type="image/x-icon" href="favicon.ico" />
-    <link rel="icon" type="image/png" sizes="32x32" href="favicon-32x32.png" />
-    <link rel="icon" type="image/png" sizes="16x16" href="favicon-16x16.png" />
-    <link rel="apple-touch-icon" sizes="180x180" href="apple-touch-icon.png" />
-  </head>
-  <body>
-    <canvas id="noise-overlay"></canvas>
-
-    <div class="ambient-glow glow-1"></div>
-    <div class="ambient-glow glow-2"></div>
-
-    <nav class="nav">
-      <div class="nav-inner">
-        <a href="#" class="nav-logo">
-          <img src="nous-logo.png" alt="Nous Research" class="nav-nous-logo" />
-          <span class="nav-brand"
-            >Hermes Agent <span class="nav-by">by Nous Research</span></span
-          >
-        </a>
-        <div class="nav-links">
-          <a href="#install">Install</a>
-          <a href="#features">Features</a>
-          <a href="/docs/">Docs</a>
-          <a
-            href="https://github.com/NousResearch/hermes-agent"
-            target="_blank"
-            rel="noopener"
-            >GitHub</a
-          >
-          <a
-            href="https://discord.gg/NousResearch"
-            target="_blank"
-            rel="noopener"
-            >Discord</a
-          >
-        </div>
-        <button
-          class="nav-hamburger"
-          id="nav-hamburger"
-          onclick="toggleMobileNav()"
-          aria-label="Toggle menu"
-        >
-          <span class="hamburger-bar"></span>
-          <span class="hamburger-bar"></span>
-          <span class="hamburger-bar"></span>
-        </button>
-        <div class="nav-mobile" id="nav-mobile">
-          <a href="#install" onclick="toggleMobileNav()">Install</a>
-          <a href="#features" onclick="toggleMobileNav()">Features</a>
-          <a href="/docs/">Docs</a>
-          <a
-            href="https://github.com/NousResearch/hermes-agent"
-            target="_blank"
-            rel="noopener"
-            >GitHub</a
-          >
-          <a
-            href="https://discord.gg/NousResearch"
-            target="_blank"
-            rel="noopener"
-            >Discord</a
-          >
-        </div>
-      </div>
-    </nav>
-
-    <section class="hero">
-      <div class="hero-content">
-        <div class="hero-badge">
-          <span class="badge-dot"></span>
-          Open Source &bull; MIT License
-        </div>
-
-        <!-- prettier-ignore -->
-        <pre class="hero-ascii" aria-hidden="true" style="font-family: monospace; line-height: 1.1">
-██╗  ██╗███████╗██████╗ ███╗   ███╗███████╗███████╗     █████╗  ██████╗ ███████╗███╗   ██╗████████╗
-██║  ██║██╔════╝██╔══██╗████╗ ████║██╔════╝██╔════╝    ██╔══██╗██╔════╝ ██╔════╝████╗  ██║╚══██╔══╝
-███████║█████╗  ██████╔╝██╔████╔██║█████╗  ███████╗    ███████║██║  ███╗█████╗  ██╔██╗ ██║   ██║   
-██╔══██║██╔══╝  ██╔══██╗██║╚██╔╝██║██╔══╝  ╚════██║    ██╔══██║██║   ██║██╔══╝  ██║╚██╗██║   ██║   
-██║  ██║███████╗██║  ██║██║ ╚═╝ ██║███████╗███████║    ██║  ██║╚██████╔╝███████╗██║ ╚████║   ██║   
-╚═╝  ╚═╝╚══════╝╚═╝  ╚═╝╚═╝     ╚═╝╚══════╝╚══════╝    ╚═╝  ╚═╝ ╚═════╝ ╚══════╝╚═╝  ╚═══╝   ╚═╝   
-</pre>
-
-        <h1 class="hero-title">
-          An agent that<br />
-          <span class="hero-gradient">grows with you.</span>
-        </h1>
-
-        <p class="hero-subtitle">
-          It's not a coding copilot tethered to an IDE or a chatbot wrapper
-          around a single API. It's an <strong>autonomous agent</strong> that
-          lives on your server, remembers what it learns, and gets more capable
-          the longer it runs.
-        </p>
-
-        <div class="hero-install">
-          <div class="install-widget">
-            <div class="install-widget-header">
-              <div class="install-dots">
-                <span class="dot dot-red"></span>
-                <span class="dot dot-yellow"></span>
-                <span class="dot dot-green"></span>
-              </div>
-              <div class="install-tabs">
-                <button
-                  class="install-tab active"
-                  data-platform="linux"
-                  onclick="switchPlatform('linux')"
-                >
-                  Linux / macOS / WSL
-                </button>
-              </div>
-            </div>
-            <div class="install-widget-body">
-              <span class="install-prompt" id="install-prompt">$</span>
-              <code id="install-command"
-                >curl -fsSL
-                https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh
-                | bash</code
-              >
-              <button
-                class="copy-btn"
-                onclick="copyInstall()"
-                title="Copy to clipboard"
-              >
-                <svg
-                  width="16"
-                  height="16"
-                  viewBox="0 0 24 24"
-                  fill="none"
-                  stroke="currentColor"
-                  stroke-width="2"
-                  stroke-linecap="round"
-                  stroke-linejoin="round"
-                >
-                  <rect x="9" y="9" width="13" height="13" rx="2" ry="2" />
-                  <path
-                    d="M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1"
-                  />
-                </svg>
-                <span class="copy-text">Copy</span>
-              </button>
-            </div>
-          </div>
-          <p class="install-note" id="install-note">
-            Works on Linux, macOS & WSL2 · No prerequisites · Installs
-            everything automatically
-          </p>
-        </div>
-
-        <div class="hero-links">
-          <a
-            href="https://portal.nousresearch.com"
-            class="btn btn-primary"
-            target="_blank"
-            rel="noopener"
-          >
-            <svg
-              width="20"
-              height="20"
-              viewBox="0 0 24 24"
-              fill="none"
-              stroke="currentColor"
-              stroke-width="2"
-              stroke-linecap="round"
-              stroke-linejoin="round"
-            >
-              <path d="M15 3h4a2 2 0 0 1 2 2v14a2 2 0 0 1-2 2h-4" />
-              <polyline points="10 17 15 12 10 7" />
-              <line x1="15" y1="12" x2="3" y2="12" />
-            </svg>
-            Sign Up on Nous Portal
-          </a>
-        </div>
-      </div>
-    </section>
-
-    <section class="section section-install" id="install">
-      <div class="container">
-        <div class="section-header">
-          <h2>Get started in 60 seconds</h2>
-        </div>
-
-        <div class="install-steps">
-          <div class="install-step">
-            <div class="step-number">1</div>
-            <div class="step-content">
-              <h4>Install</h4>
-              <div class="code-block">
-                <div class="code-header">
-                  <div class="code-tabs">
-                    <button
-                      class="code-tab active"
-                      data-platform="linux"
-                      onclick="switchStepPlatform('linux')"
-                    >
-                      Linux / macOS / WSL
-                    </button>
-                  </div>
-                  <button
-                    class="copy-btn"
-                    id="step1-copy"
-                    onclick="copyText(this)"
-                    data-text="curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash"
-                  >
-                    Copy
-                  </button>
-                </div>
-                <pre><code id="step1-command">curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash</code></pre>
-              </div>
-              <p class="step-note" id="step1-note">
-                Installs uv, Python 3.11, clones the repo, sets up everything.
-                No sudo needed.
-              </p>
-            </div>
-          </div>
-
-          <div class="install-step">
-            <div class="step-number">2</div>
-            <div class="step-content">
-              <h4>Configure</h4>
-              <div class="code-block">
-                <div class="code-header">
-                  <span>bash</span>
-                  <button
-                    class="copy-btn"
-                    onclick="copyText(this)"
-                    data-text="hermes setup"
-                  >
-                    Copy
-                  </button>
-                </div>
-                <pre><code><span class="code-comment"># Interactive setup wizard</span>
-hermes setup
-
-<span class="code-comment"># Or choose your model</span>
-hermes model</code></pre>
-              </div>
-              <p class="step-note">
-                Connect to Nous Portal (OAuth), OpenRouter (API key), or your
-                own endpoint.
-              </p>
-            </div>
-          </div>
-
-          <div class="install-step">
-            <div class="step-number">3</div>
-            <div class="step-content">
-              <h4>Start chatting</h4>
-              <div class="code-block">
-                <div class="code-header">
-                  <span>bash</span>
-                  <button
-                    class="copy-btn"
-                    onclick="copyText(this)"
-                    data-text="hermes"
-                  >
-                    Copy
-                  </button>
-                </div>
-                <pre><code>hermes</code></pre>
-              </div>
-              <p class="step-note">
-                That's it. Full interactive CLI with tools, memory, and skills.
-              </p>
-            </div>
-          </div>
-
-          <div class="install-step">
-            <div class="step-number">4</div>
-            <div class="step-content">
-              <h4>
-                Go multi-platform <span class="step-optional">(optional)</span>
-              </h4>
-              <div class="code-block">
-                <div class="code-header">
-                  <span>bash</span>
-                  <button
-                    class="copy-btn"
-                    onclick="copyText(this)"
-                    data-text="hermes gateway setup"
-                  >
-                    Copy
-                  </button>
-                </div>
-                <pre><code><span class="code-comment"># Interactive gateway setup wizard</span>
-hermes gateway setup
-
-<span class="code-comment"># Start the messaging gateway</span>
-hermes gateway
-
-<span class="code-comment"># Install as a system service</span>
-hermes gateway install</code></pre>
-              </div>
-              <p class="step-note">
-                Walk through connecting Telegram, Discord, Slack, or WhatsApp.
-                Runs as a systemd service.
-              </p>
-            </div>
-          </div>
-
-          <div class="install-step">
-            <div class="step-number">5</div>
-            <div class="step-content">
-              <h4>Keep it up to date</h4>
-              <div class="code-block">
-                <div class="code-header">
-                  <span>bash</span>
-                  <button
-                    class="copy-btn"
-                    onclick="copyText(this)"
-                    data-text="hermes update"
-                  >
-                    Copy
-                  </button>
-                </div>
-                <pre><code>hermes update</code></pre>
-              </div>
-              <p class="step-note">
-                Pulls the latest changes and reinstalls dependencies. Run
-                anytime to get new features and fixes.
-              </p>
-            </div>
-          </div>
-        </div>
-
-        <div class="install-windows">
-          <p>
-            Native Windows support is extremely experimental and unsupported.
-            Please install
-            <a
-              href="https://learn.microsoft.com/en-us/windows/wsl/install"
-              target="_blank"
-              rel="noopener"
-              >WSL2</a
-            >
-            and run Hermes Agent from there.
-          </p>
-        </div>
-      </div>
-    </section>
-
-    <!-- Terminal Demo -->
-    <section class="section section-demo" id="demo">
-      <div class="container">
-        <div class="section-header">
-          <h2>See it in action</h2>
-        </div>
-
-        <div class="terminal-window">
-          <div class="terminal-header">
-            <div class="terminal-dots">
-              <span class="dot dot-red"></span>
-              <span class="dot dot-yellow"></span>
-              <span class="dot dot-green"></span>
-            </div>
-            <span class="terminal-title">hermes</span>
-          </div>
-          <div class="terminal-body" id="terminal-demo"></div>
-        </div>
-      </div>
-    </section>
-
-    <!-- Features + Specs -->
-    <section class="section" id="features">
-      <div class="container">
-        <div class="section-header">
-          <h2>Features</h2>
-        </div>
-
-        <div class="features-grid">
-          <div class="feature-card">
-            <div class="feature-header">
-              <div class="feature-icon">
-                <svg
-                  width="20"
-                  height="20"
-                  viewBox="0 0 24 24"
-                  fill="none"
-                  stroke="currentColor"
-                  stroke-width="1.5"
-                  stroke-linecap="round"
-                  stroke-linejoin="round"
-                >
-                  <path
-                    d="M21 15a2 2 0 0 1-2 2H7l-4 4V5a2 2 0 0 1 2-2h14a2 2 0 0 1 2 2z"
-                  />
-                </svg>
-              </div>
-              <h3>Lives Where You Do</h3>
-            </div>
-            <p>
-              Telegram, Discord, Slack, WhatsApp, and CLI from a single gateway
-              — start on one, pick up on another.
-            </p>
-          </div>
-
-          <div class="feature-card">
-            <div class="feature-header">
-              <div class="feature-icon">
-                <svg
-                  width="20"
-                  height="20"
-                  viewBox="0 0 24 24"
-                  fill="none"
-                  stroke="currentColor"
-                  stroke-width="1.5"
-                  stroke-linecap="round"
-                  stroke-linejoin="round"
-                >
-                  <polyline points="22 7 13.5 15.5 8.5 10.5 2 17" />
-                  <polyline points="16 7 22 7 22 13" />
-                </svg>
-              </div>
-              <h3>Grows the Longer It Runs</h3>
-            </div>
-            <p>
-              Persistent memory and auto-generated skills — it learns your
-              projects and never forgets how it solved a problem.
-            </p>
-          </div>
-
-          <div class="feature-card">
-            <div class="feature-header">
-              <div class="feature-icon">
-                <svg
-                  width="20"
-                  height="20"
-                  viewBox="0 0 24 24"
-                  fill="none"
-                  stroke="currentColor"
-                  stroke-width="1.5"
-                  stroke-linecap="round"
-                  stroke-linejoin="round"
-                >
-                  <circle cx="12" cy="12" r="10" />
-                  <polyline points="12 6 12 12 16 14" />
-                </svg>
-              </div>
-              <h3>Scheduled Automations</h3>
-            </div>
-            <p>
-              Natural language cron scheduling for reports, backups, and
-              briefings — running unattended through the gateway.
-            </p>
-          </div>
-
-          <div class="feature-card">
-            <div class="feature-header">
-              <div class="feature-icon">
-                <svg
-                  width="20"
-                  height="20"
-                  viewBox="0 0 24 24"
-                  fill="none"
-                  stroke="currentColor"
-                  stroke-width="1.5"
-                  stroke-linecap="round"
-                  stroke-linejoin="round"
-                >
-                  <circle cx="18" cy="18" r="3" />
-                  <circle cx="6" cy="6" r="3" />
-                  <path d="M6 21V9a9 9 0 0 0 9 9" />
-                  <path d="M18 3v12a9 9 0 0 1-9-9" />
-                </svg>
-              </div>
-              <h3>Delegates & Parallelizes</h3>
-            </div>
-            <p>
-              Isolated subagents with their own conversations, terminals, and
-              Python RPC scripts for zero-context-cost pipelines.
-            </p>
-          </div>
-
-          <div class="feature-card">
-            <div class="feature-header">
-              <div class="feature-icon">
-                <svg
-                  width="20"
-                  height="20"
-                  viewBox="0 0 24 24"
-                  fill="none"
-                  stroke="currentColor"
-                  stroke-width="1.5"
-                  stroke-linecap="round"
-                  stroke-linejoin="round"
-                >
-                  <rect x="3" y="11" width="18" height="11" rx="2" ry="2" />
-                  <path d="M7 11V7a5 5 0 0 1 10 0v4" />
-                </svg>
-              </div>
-              <h3>Real Sandboxing</h3>
-            </div>
-            <p>
-              Five backends — local, Docker, SSH, Singularity, Modal — with
-              container hardening and namespace isolation.
-            </p>
-          </div>
-
-          <div class="feature-card">
-            <div class="feature-header">
-              <div class="feature-icon">
-                <svg
-                  width="20"
-                  height="20"
-                  viewBox="0 0 24 24"
-                  fill="none"
-                  stroke="currentColor"
-                  stroke-width="1.5"
-                  stroke-linecap="round"
-                  stroke-linejoin="round"
-                >
-                  <circle cx="12" cy="12" r="10" />
-                  <line x1="2" y1="12" x2="22" y2="12" />
-                  <path
-                    d="M12 2a15.3 15.3 0 0 1 4 10 15.3 15.3 0 0 1-4 10 15.3 15.3 0 0 1-4-10 15.3 15.3 0 0 1 4-10z"
-                  />
-                </svg>
-              </div>
-              <h3>Full Web & Browser Control</h3>
-            </div>
-            <p>
-              Web search, browser automation, vision, image generation,
-              text-to-speech, and multi-model reasoning.
-            </p>
-          </div>
-        </div>
-
-        <div class="features-more">
-          <button class="more-toggle" onclick="toggleSpecs()" id="specs-toggle">
-            <span class="toggle-label">More details</span>
-            <svg
-              class="more-chevron"
-              width="16"
-              height="16"
-              viewBox="0 0 24 24"
-              fill="none"
-              stroke="currentColor"
-              stroke-width="2"
-              stroke-linecap="round"
-              stroke-linejoin="round"
-            >
-              <polyline points="6 9 12 15 18 9" />
-            </svg>
-          </button>
-        </div>
-
-        <div class="specs-wrapper" id="specs-wrapper">
-          <div class="specs-list">
-            <div class="spec-row">
-              <h3 class="spec-label">Tools</h3>
-              <p class="spec-value">
-                40+ built-in — web search, terminal, file system, browser
-                automation, vision, image generation, text-to-speech, code
-                execution, subagent delegation, memory, task planning, cron
-                scheduling, multi-model reasoning, and more.
-              </p>
-            </div>
-
-            <div class="spec-row">
-              <h3 class="spec-label">Platforms</h3>
-              <p class="spec-value">
-                Telegram, Discord, Slack, WhatsApp, Signal, Email, and CLI — all
-                from a single gateway. Connect to
-                <a
-                  href="https://portal.nousresearch.com"
-                  target="_blank"
-                  rel="noopener"
-                  >Nous Portal</a
-                >, OpenRouter, or any OpenAI-compatible API.
-              </p>
-            </div>
-
-            <div class="spec-row">
-              <h3 class="spec-label">Environments</h3>
-              <p class="spec-value">
-                Run locally, in Docker, over SSH, on Modal, Daytona, or
-                Singularity. Container hardening with read-only root, dropped
-                capabilities, and namespace isolation.
-              </p>
-            </div>
-
-            <div class="spec-row">
-              <h3 class="spec-label">Skills</h3>
-              <p class="spec-value">
-                40+ bundled skills covering MLOps, GitHub workflows, research,
-                and more. The agent creates new skills on the fly and shares
-                them via the open
-                <a href="https://agentskills.io" target="_blank" rel="noopener"
-                  >agentskills.io</a
-                >
-                format. Install community skills from
-                <a href="https://clawhub.ai" target="_blank" rel="noopener"
-                  >ClawHub</a
-                >,
-                <a href="https://lobehub.com" target="_blank" rel="noopener"
-                  >LobeHub</a
-                >, and GitHub.
-              </p>
-            </div>
-
-            <div class="spec-row">
-              <h3 class="spec-label">Research</h3>
-              <p class="spec-value">
-                Batch trajectory generation with parallel workers and
-                checkpointing. Atropos integration for RL training. Export to
-                ShareGPT for fine-tuning with trajectory compression.
-              </p>
-            </div>
-          </div>
-        </div>
-      </div>
-    </section>
-
-    <footer class="footer">
-      <div class="container">
-        <p class="footer-copy">
-          Built by
-          <a href="https://nousresearch.com" target="_blank" rel="noopener"
-            >Nous Research</a
-          >
-          &middot; MIT License &middot; 2026
-        </p>
-      </div>
-    </footer>
-
-    <script src="script.js"></script>
-  </body>
-</html>
@@ -1,521 +0,0 @@
-// =========================================================================
-// Hermes Agent Landing Page — Interactions
-// =========================================================================
-
-// --- Platform install commands ---
-const PLATFORMS = {
-  linux: {
-    command:
-      "curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash",
-    prompt: "$",
-    note: "Works on Linux, macOS & WSL2 · No prerequisites · Installs everything automatically",
-    stepNote:
-      "Installs uv, Python 3.11, clones the repo, sets up everything. No sudo needed.",
-  },
-};
-
-function detectPlatform() {
-  return "linux";
-}
-
-function switchPlatform(platform) {
-  const cfg = PLATFORMS[platform];
-  if (!cfg) return;
-
-  // Update hero install widget
-  const commandEl = document.getElementById("install-command");
-  const promptEl = document.getElementById("install-prompt");
-  const noteEl = document.getElementById("install-note");
-
-  if (commandEl) commandEl.textContent = cfg.command;
-  if (promptEl) promptEl.textContent = cfg.prompt;
-  if (noteEl) noteEl.textContent = cfg.note;
-
-  // Update active tab in hero
-  document.querySelectorAll(".install-tab").forEach((tab) => {
-    tab.classList.toggle("active", tab.dataset.platform === platform);
-  });
-
-  // Sync the step section tabs too
-  switchStepPlatform(platform);
-}
-
-function switchStepPlatform(platform) {
-  const cfg = PLATFORMS[platform];
-  if (!cfg) return;
-
-  const commandEl = document.getElementById("step1-command");
-  const copyBtn = document.getElementById("step1-copy");
-  const noteEl = document.getElementById("step1-note");
-
-  if (commandEl) commandEl.textContent = cfg.command;
-  if (copyBtn) copyBtn.setAttribute("data-text", cfg.command);
-  if (noteEl) noteEl.textContent = cfg.stepNote;
-
-  // Update active tab in step section
-  document.querySelectorAll(".code-tab").forEach((tab) => {
-    tab.classList.toggle("active", tab.dataset.platform === platform);
-  });
-}
-
-function toggleMobileNav() {
-  document.getElementById("nav-mobile").classList.toggle("open");
-  document.getElementById("nav-hamburger").classList.toggle("open");
-}
-
-function toggleSpecs() {
-  const wrapper = document.getElementById("specs-wrapper");
-  const btn = document.getElementById("specs-toggle");
-  const label = btn.querySelector(".toggle-label");
-  const isOpen = wrapper.classList.contains("open");
-
-  if (isOpen) {
-    wrapper.style.maxHeight = wrapper.scrollHeight + "px";
-    requestAnimationFrame(() => {
-      wrapper.style.maxHeight = "0";
-    });
-    wrapper.classList.remove("open");
-    btn.classList.remove("open");
-    if (label) label.textContent = "More details";
-  } else {
-    wrapper.classList.add("open");
-    wrapper.style.maxHeight = wrapper.scrollHeight + "px";
-    btn.classList.add("open");
-    if (label) label.textContent = "Less";
-    wrapper.addEventListener(
-      "transitionend",
-      () => {
-        if (wrapper.classList.contains("open")) {
-          wrapper.style.maxHeight = "none";
-        }
-      },
-      { once: true }
-    );
-  }
-}
-
-// --- Copy to clipboard ---
-function copyInstall() {
-  const text = document.getElementById("install-command").textContent;
-  navigator.clipboard.writeText(text).then(() => {
-    const btn = document.querySelector(".install-widget-body .copy-btn");
-    const original = btn.querySelector(".copy-text").textContent;
-    btn.querySelector(".copy-text").textContent = "Copied!";
-    btn.style.color = "var(--primary-light)";
-    setTimeout(() => {
-      btn.querySelector(".copy-text").textContent = original;
-      btn.style.color = "";
-    }, 2000);
-  });
-}
-
-function copyText(btn) {
-  const text = btn.getAttribute("data-text");
-  navigator.clipboard.writeText(text).then(() => {
-    const original = btn.textContent;
-    btn.textContent = "Copied!";
-    btn.style.color = "var(--primary-light)";
-    setTimeout(() => {
-      btn.textContent = original;
-      btn.style.color = "";
-    }, 2000);
-  });
-}
-
-// --- Scroll-triggered fade-in ---
-function initScrollAnimations() {
-  const elements = document.querySelectorAll(
-    ".feature-card, .install-step, " +
-      ".section-header, .terminal-window",
-  );
-
-  elements.forEach((el) => el.classList.add("fade-in"));
-
-  const observer = new IntersectionObserver(
-    (entries) => {
-      entries.forEach((entry) => {
-        if (entry.isIntersecting) {
-          // Stagger children within grids
-          const parent = entry.target.parentElement;
-          if (parent) {
-            const siblings = parent.querySelectorAll(".fade-in");
-            let idx = Array.from(siblings).indexOf(entry.target);
-            if (idx < 0) idx = 0;
-            setTimeout(() => {
-              entry.target.classList.add("visible");
-            }, idx * 60);
-          } else {
-            entry.target.classList.add("visible");
-          }
-          observer.unobserve(entry.target);
-        }
-      });
-    },
-    { threshold: 0.1, rootMargin: "0px 0px -40px 0px" },
-  );
-
-  elements.forEach((el) => observer.observe(el));
-}
-
-// --- Terminal Demo ---
-const CURSOR = '<span class="terminal-cursor">█</span>';
-
-const demoSequence = [
-  { type: "prompt", text: "❯ " },
-  {
-    type: "type",
-    text: "Research the latest approaches to GRPO training and write a summary",
-    delay: 30,
-  },
-  { type: "pause", ms: 600 },
-  {
-    type: "output",
-    lines: [
-      "",
-      '<span class="t-dim">  web_search "GRPO reinforcement learning 2026"       1.2s</span>',
-    ],
-  },
-  { type: "pause", ms: 400 },
-  {
-    type: "output",
-    lines: [
-      '<span class="t-dim">  web_extract arxiv.org/abs/2402.03300                3.1s</span>',
-    ],
-  },
-  { type: "pause", ms: 400 },
-  {
-    type: "output",
-    lines: [
-      '<span class="t-dim">  web_search "GRPO vs PPO ablation results"           0.9s</span>',
-    ],
-  },
-  { type: "pause", ms: 400 },
-  {
-    type: "output",
-    lines: [
-      '<span class="t-dim">  web_extract huggingface.co/blog/grpo                2.8s</span>',
-    ],
-  },
-  { type: "pause", ms: 400 },
-  {
-    type: "output",
-    lines: [
-      '<span class="t-dim">  write_file ~/research/grpo-summary.md               0.1s</span>',
-    ],
-  },
-  { type: "pause", ms: 500 },
-  {
-    type: "output",
-    lines: [
-      "",
-      '<span class="t-text">Done! I\'ve written a summary covering:</span>',
-      "",
-      '<span class="t-text">  <span class="t-green">✓</span> GRPO\'s group-relative advantage (no critic model needed)</span>',
-      '<span class="t-text">  <span class="t-green">✓</span> Comparison with PPO/DPO on reasoning benchmarks</span>',
-      '<span class="t-text">  <span class="t-green">✓</span> Implementation notes for Axolotl and TRL</span>',
-      "",
-      '<span class="t-text">Saved to</span> <span class="t-accent">~/research/grpo-summary.md</span>',
-    ],
-  },
-  { type: "pause", ms: 2500 },
-
-  { type: "clear" },
-  { type: "prompt", text: "❯ " },
-  {
-    type: "type",
-    text: "Review the PR at NousResearch/hermes-agent#42 and fix any issues",
-    delay: 30,
-  },
-  { type: "pause", ms: 600 },
-  {
-    type: "output",
-    lines: [
-      "",
-      '<span class="t-dim">  delegate_task "review PR #42 changes"                2.1s</span>',
-    ],
-  },
-  { type: "pause", ms: 500 },
-  {
-    type: "output",
-    lines: [
-      '<span class="t-dim">  git diff main..pr-42                                 0.4s</span>',
-    ],
-  },
-  { type: "pause", ms: 400 },
-  {
-    type: "output",
-    lines: [
-      '<span class="t-dim">  patch tools/registry.py                              0.1s</span>',
-    ],
-  },
-  { type: "pause", ms: 400 },
-  {
-    type: "output",
-    lines: [
-      '<span class="t-dim">  python -m pytest tests/ -x                           3.2s</span>',
-    ],
-  },
-  { type: "pause", ms: 400 },
-  {
-    type: "output",
-    lines: [
-      '<span class="t-dim">  git commit -m "fix: handle empty tool schemas"       0.3s</span>',
-    ],
-  },
-  { type: "pause", ms: 500 },
-  {
-    type: "output",
-    lines: [
-      "",
-      '<span class="t-text">Found 2 issues in the PR and fixed both:</span>',
-      "",
-      '<span class="t-text">  <span class="t-green">✓</span> Empty tool schema crash in registry.py — added guard</span>',
-      '<span class="t-text">  <span class="t-green">✓</span> Missing error handling in delegate_tool.py — added try/except</span>',
-      "",
-      '<span class="t-text">Tests pass. Committed the fix and pushed to the PR branch.</span>',
-      '<span class="t-text">I also saved a</span> <span class="t-accent">skill</span> <span class="t-text">for this PR review pattern.</span>',
-    ],
-  },
-  { type: "pause", ms: 2500 },
-
-  { type: "clear" },
-  { type: "prompt", text: "❯ " },
-  {
-    type: "type",
-    text: "How did we fix that Docker networking issue?",
-    delay: 35,
-  },
-  { type: "pause", ms: 500 },
-  {
-    type: "output",
-    lines: [
-      "",
-      '<span class="t-dim">  session_search "Docker networking"                   1.4s</span>',
-    ],
-  },
-  { type: "pause", ms: 500 },
-  {
-    type: "output",
-    lines: [
-      "",
-      '<span class="t-text">Found it — from a session on February 12th:</span>',
-      "",
-      '<span class="t-text">The containers couldn\'t reach each other because the compose</span>',
-      '<span class="t-text">file was using the default bridge network. We switched to a</span>',
-      '<span class="t-text">custom network with</span> <span class="t-accent">driver: overlay</span><span class="t-text">, added explicit</span>',
-      '<span class="t-text">aliases, and set</span> <span class="t-accent">dns: 8.8.8.8</span> <span class="t-text">as a fallback.</span>',
-      "",
-      '<span class="t-text">The fix was committed in</span> <span class="t-accent">docker-compose.prod.yml</span><span class="t-text">.</span>',
-    ],
-  },
-  { type: "pause", ms: 3000 },
-];
-
-class TerminalDemo {
-  constructor(container) {
-    this.container = container;
-    this.running = false;
-    this.content = "";
-  }
-
-  async start() {
-    if (this.running) return;
-    this.running = true;
-
-    while (this.running) {
-      for (const step of demoSequence) {
-        if (!this.running) return;
-        await this.execute(step);
-      }
-      this.clear();
-      await this.sleep(1000);
-    }
-  }
-
-  stop() {
-    this.running = false;
-  }
-
-  async execute(step) {
-    switch (step.type) {
-      case "prompt":
-        this.append(`<span class="t-prompt">${step.text}</span>`);
-        break;
-      case "type":
-        for (const char of step.text) {
-          if (!this.running) return;
-          this.append(`<span class="t-cmd">${char}</span>`);
-          await this.sleep(step.delay || 30);
-        }
-        break;
-      case "output":
-        for (const line of step.lines) {
-          if (!this.running) return;
-          this.append("\n" + line);
-          await this.sleep(50);
-        }
-        break;
-      case "pause":
-        await this.sleep(step.ms);
-        break;
-      case "clear":
-        this.clear();
-        break;
-    }
-  }
-
-  append(html) {
-    this.content += html;
-    this.render();
-  }
-
-  render() {
-    this.container.innerHTML = this.content + CURSOR;
-    this.container.scrollTop = this.container.scrollHeight;
-  }
-
-  clear() {
-    this.content = "";
-    this.container.innerHTML = "";
-  }
-
-  sleep(ms) {
-    return new Promise((resolve) => setTimeout(resolve, ms));
-  }
-}
-
-// --- Noise Overlay (ported from hermes-chat NoiseOverlay) ---
-function initNoiseOverlay() {
-  if (window.matchMedia("(prefers-reduced-motion: reduce)").matches) return;
-  if (typeof THREE === "undefined") return;
-
-  const canvas = document.getElementById("noise-overlay");
-  if (!canvas) return;
-
-  const vertexShader = `
-        varying vec2 vUv;
-        void main() {
-            vUv = uv;
-            gl_Position = projectionMatrix * modelViewMatrix * vec4(position, 1.0);
-        }
-    `;
-
-  const fragmentShader = `
-        uniform vec2 uRes;
-        uniform float uDpr, uSize, uDensity, uOpacity;
-        uniform vec3 uColor;
-        varying vec2 vUv;
-
-        float hash(vec2 p) {
-            vec3 p3 = fract(vec3(p.xyx) * 0.1031);
-            p3 += dot(p3, p3.yzx + 33.33);
-            return fract((p3.x + p3.y) * p3.z);
-        }
-
-        void main() {
-            float n = hash(floor(vUv * uRes / (uSize * uDpr)));
-            gl_FragColor = vec4(uColor, step(1.0 - uDensity, n)) * uOpacity;
-        }
-    `;
-
-  function hexToVec3(hex) {
-    const c = hex.replace("#", "");
-    return new THREE.Vector3(
-      parseInt(c.substring(0, 2), 16) / 255,
-      parseInt(c.substring(2, 4), 16) / 255,
-      parseInt(c.substring(4, 6), 16) / 255,
-    );
-  }
-
-  const renderer = new THREE.WebGLRenderer({
-    alpha: true,
-    canvas,
-    premultipliedAlpha: false,
-  });
-  renderer.setClearColor(0x000000, 0);
-
-  const scene = new THREE.Scene();
-  const camera = new THREE.OrthographicCamera(-1, 1, 1, -1, 0, 1);
-  const geo = new THREE.PlaneGeometry(2, 2);
-
-  const mat = new THREE.ShaderMaterial({
-    vertexShader,
-    fragmentShader,
-    transparent: true,
-    uniforms: {
-      uColor: { value: hexToVec3("#8090BB") },
-      uDensity: { value: 0.1 },
-      uDpr: { value: 1 },
-      uOpacity: { value: 0.4 },
-      uRes: { value: new THREE.Vector2() },
-      uSize: { value: 1.0 },
-    },
-  });
-
-  scene.add(new THREE.Mesh(geo, mat));
-
-  function resize() {
-    const dpr = window.devicePixelRatio;
-    const w = window.innerWidth;
-    const h = window.innerHeight;
-    renderer.setSize(w, h);
-    renderer.setPixelRatio(dpr);
-    mat.uniforms.uRes.value.set(w * dpr, h * dpr);
-    mat.uniforms.uDpr.value = dpr;
-  }
-
-  resize();
-  window.addEventListener("resize", resize);
-
-  function loop() {
-    requestAnimationFrame(loop);
-    renderer.render(scene, camera);
-  }
-  loop();
-}
-
-// --- Initialize ---
-document.addEventListener("DOMContentLoaded", () => {
-  const detectedPlatform = detectPlatform();
-  switchPlatform(detectedPlatform);
-
-  initScrollAnimations();
-  initNoiseOverlay();
-
-  const terminalEl = document.getElementById("terminal-demo");
-
-  if (terminalEl) {
-    const demo = new TerminalDemo(terminalEl);
-
-    const observer = new IntersectionObserver(
-      (entries) => {
-        entries.forEach((entry) => {
-          if (entry.isIntersecting) {
-            demo.start();
-          } else {
-            demo.stop();
-          }
-        });
-      },
-      { threshold: 0.3 },
-    );
-
-    observer.observe(document.querySelector(".terminal-window"));
-  }
-
-  const nav = document.querySelector(".nav");
-  let ticking = false;
-  window.addEventListener("scroll", () => {
-    if (!ticking) {
-      requestAnimationFrame(() => {
-        if (window.scrollY > 50) {
-          nav.style.borderBottomColor = "rgba(48, 80, 255, 0.15)";
-        } else {
-          nav.style.borderBottomColor = "";
-        }
-        ticking = false;
-      });
-      ticking = true;
-    }
-  });
-});
@@ -1,12 +1,12 @@
 ---
 name: honcho
-description: Configure and use Honcho memory with Hermes -- cross-session user modeling, multi-profile peer isolation, observation config, and dialectic reasoning. Use when setting up Honcho, troubleshooting memory, managing profiles with Honcho peers, or tuning observation and recall settings.
-version: 1.0.0
+description: Configure and use Honcho memory with Hermes -- cross-session user modeling, multi-profile peer isolation, observation config, dialectic reasoning, session summaries, and context budget enforcement. Use when setting up Honcho, troubleshooting memory, managing profiles with Honcho peers, or tuning observation, recall, and dialectic settings.
+version: 2.0.0
 author: Hermes Agent
 license: MIT
 metadata:
  hermes:
-    tags: [Honcho, Memory, Profiles, Observation, Dialectic, User-Modeling]
+    tags: [Honcho, Memory, Profiles, Observation, Dialectic, User-Modeling, Session-Summary]
    homepage: https://docs.honcho.dev
    related_skills: [hermes-agent]
 prerequisites:
@@ -22,8 +22,9 @@ Honcho provides AI-native cross-session user modeling. It learns who the user is
 - Setting up Honcho (cloud or self-hosted)
 - Troubleshooting memory not working / peers not syncing
 - Creating multi-profile setups where each agent has its own Honcho peer
-- Tuning observation, recall, or write frequency settings
-- Understanding what the 4 Honcho tools do and when to use them
+- Tuning observation, recall, dialectic depth, or write frequency settings
+- Understanding what the 5 Honcho tools do and when to use them
+- Configuring context budgets and session summary injection

 ## Setup

@@ -51,6 +52,27 @@ hermes honcho status    # shows resolved config, connection test, peer info

 ## Architecture

+### Base Context Injection
+
+When Honcho injects context into the system prompt (in `hybrid` or `context` recall modes), it assembles the base context block in this order:
+
+1. **Session summary** -- a short digest of the current session so far (placed first so the model has immediate conversational continuity)
+2. **User representation** -- Honcho's accumulated model of the user (preferences, facts, patterns)
+3. **AI peer card** -- the identity card for this Hermes profile's AI peer
+
+The session summary is generated automatically by Honcho at the start of each turn (when a prior session exists). It gives the model a warm start without replaying full history.
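The ordering above can be sketched in a few lines of Python. This is an illustration only, not Honcho's or Hermes's actual code: the function name `assemble_base_context` and the `## <title>` section headers are hypothetical, and the sketch assumes that empty sections are simply skipped.

```python
from typing import Optional


def assemble_base_context(
    session_summary: Optional[str],
    user_representation: Optional[str],
    ai_peer_card: Optional[str],
) -> str:
    """Join the non-empty sections in the documented order.

    Hypothetical helper: the section titles and the "## " header
    format are assumptions made for this example.
    """
    sections = [
        ("Session summary", session_summary),          # 1. placed first
        ("User representation", user_representation),  # 2.
        ("AI peer card", ai_peer_card),                # 3.
    ]
    blocks = [
        f"## {title}\n{body.strip()}"
        for title, body in sections
        if body and body.strip()  # assumption: missing sections are skipped
    ]
    return "\n\n".join(blocks)
```

Calling `assemble_base_context("We fixed the CI job.", "Prefers terse answers.", "coder")` yields three blocks with the summary first, which matches the order the list describes.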
+
+### Cold / Warm Prompt Selection
+
+Honcho automatically selects between two prompt strategies:
+
+| Condition | Strategy | What happens |
+|-----------|----------|--------------|
+| No prior session or empty representation | **Cold start** | Lightweight intro prompt; skips summary injection; encourages the model to learn about the user |
+| Existing representation and/or session history | **Warm start** | Full base context injection (summary → representation → card); richer system prompt |
+
+You do not need to configure this -- it is automatic based on session state.
+
 ### Peers

 Honcho models conversations as interactions between **peers**. Hermes creates two peers per session:
@@ -112,6 +134,63 @@ How the agent accesses Honcho memory:
 | `context` | Yes | No (hidden) | Minimal token cost, no tool calls |
 | `tools` | No | Yes | Agent controls all memory access explicitly |

+## Three Orthogonal Knobs
+
+Honcho's dialectic behavior is controlled by three independent dimensions. Each can be tuned without affecting the others:
+
+### Cadence (when)
+
+Controls **how often** dialectic and context calls happen.
+
+| Key | Default | Description |
+|-----|---------|-------------|
+| `contextCadence` | `1` | Min turns between context API calls |
+| `dialecticCadence` | `3` | Min turns between dialectic API calls |
+| `injectionFrequency` | `every-turn` | `every-turn` or `first-turn` for base context injection |
+
+Higher cadence values reduce API calls and cost. With the default `dialecticCadence: 3`, the dialectic engine fires at most once every three turns.
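One way to read "min turns between calls" is as a simple gate on the turn counter. The sketch below is an assumption about the semantics, not the plugin's implementation: `should_call` is a hypothetical helper, and it assumes the first eligible turn always fires.

```python
from typing import Optional


def should_call(turn: int, last_call_turn: Optional[int], cadence: int) -> bool:
    """Return True when at least `cadence` turns have passed since the last call.

    Hypothetical helper. Assumes the very first turn always fires.
    """
    if cadence < 1:
        raise ValueError("cadence must be >= 1")
    if last_call_turn is None:
        return True
    return turn - last_call_turn >= cadence


def simulate(turns: int, cadence: int) -> list:
    """Return the turn numbers (1-based) on which a call would fire."""
    fired = []
    last = None
    for turn in range(1, turns + 1):
        if should_call(turn, last, cadence):
            fired.append(turn)
            last = turn
    return fired
```

Under these assumptions, `simulate(7, 3)` fires on turns 1, 4, and 7, while `simulate(3, 1)` fires on every turn, which is consistent with the `contextCadence: 1` default.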
+
+### Depth (how many)
+
+Controls **how many rounds** of dialectic reasoning Honcho performs per query.
+
+| Key | Default | Range | Description |
+|-----|---------|-------|-------------|
+| `dialecticDepth` | `1` | 1-3 | Number of dialectic reasoning rounds per query |
+| `dialecticDepthLevels` | -- | array | Optional per-depth-round level overrides (see below) |
+
+`dialecticDepth: 2` means Honcho runs two rounds of dialectic synthesis. The first round produces an initial answer; the second refines it.
+
+`dialecticDepthLevels` lets you set the reasoning level for each round independently:
+
+```json
+{
+  "dialecticDepth": 3,
+  "dialecticDepthLevels": ["low", "medium", "high"]
+}
+```
+
+If `dialecticDepthLevels` is omitted, rounds use **proportional levels** derived from `dialecticReasoningLevel` (the base):
+
+| Depth | Pass levels |
+|-------|-------------|
+| 1 | [base] |
+| 2 | [minimal, base] |
+| 3 | [minimal, base, low] |
+
+This keeps earlier passes cheap while using full depth on the final synthesis.
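
The proportional-level table above can be sketched as a small lookup. This is an illustration only: `resolve_pass_levels` is a hypothetical helper, and the rule that explicit overrides apply only when they cover every round is an assumption, not documented behavior.

```python
from typing import List, Optional

# Proportional pass levels per depth, copied from the table above.
# "base" stands in for the configured dialecticReasoningLevel.
_PROPORTIONAL = {
    1: ["base"],
    2: ["minimal", "base"],
    3: ["minimal", "base", "low"],
}


def resolve_pass_levels(depth: int, base: str = "low",
                        overrides: Optional[List[str]] = None) -> List[str]:
    """Return the reasoning level used for each dialectic round."""
    depth = max(1, min(3, depth))  # documented range is 1-3
    if overrides and len(overrides) >= depth:
        # Assumption: explicit overrides win when they cover every round.
        return list(overrides[:depth])
    return [base if level == "base" else level for level in _PROPORTIONAL[depth]]
```

For example, `resolve_pass_levels(2, "medium")` yields `["minimal", "medium"]`, matching the depth-2 row of the table.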
+
+### Level (how hard)
+
+Controls the **intensity** of each dialectic reasoning round.
+
+| Key | Default | Description |
+|-----|---------|-------------|
+| `dialecticReasoningLevel` | `low` | `minimal`, `low`, `medium`, `high`, `max` |
+| `dialecticDynamic` | `true` | When `true`, the model can pass `reasoning_level` to `honcho_reasoning` to override the default per-call. `false` = always use `dialecticReasoningLevel`, model overrides ignored |
+
+Higher levels produce richer synthesis but cost more tokens on Honcho's backend.
+
 ## Multi-Profile Setup

 Each Hermes profile gets its own Honcho AI peer while sharing the same workspace (user context). This means:
@@ -149,6 +228,7 @@ Override any setting in the host block:
    "hermes.coder": {
      "aiPeer": "coder",
      "recallMode": "tools",
+      "dialecticDepth": 2,
      "observation": {
        "user": { "observeMe": true, "observeOthers": false },
        "ai": { "observeMe": true, "observeOthers": true }
@@ -160,19 +240,97 @@ Override any setting in the host block:

 ## Tools

-The agent has 4 Honcho tools (hidden in `context` recall mode):
+The agent has 5 bidirectional Honcho tools (hidden in `context` recall mode):
+
+| Tool | LLM call? | Cost | Use when |
+|------|-----------|------|----------|
+| `honcho_profile` | No | minimal | Quick factual snapshot at conversation start or for fast name/role/pref lookups |
+| `honcho_search` | No | low | Fetch specific past facts to reason over yourself — raw excerpts, no synthesis |
+| `honcho_context` | No | low | Full session context snapshot: summary, representation, card, recent messages |
+| `honcho_reasoning` | Yes | medium–high | Natural language question synthesized by Honcho's dialectic engine |
+| `honcho_conclude` | No | minimal | Write or delete a persistent fact; pass `peer: "ai"` for AI self-knowledge |

 ### `honcho_profile`
-Quick factual snapshot of the user -- name, role, preferences, patterns. No LLM call, minimal cost. Use at conversation start or for fast lookups.
+Read or update a peer card — curated key facts (name, role, preferences, communication style). Pass `card: [...]` to update; omit to read. No LLM call.

 ### `honcho_search`
-Semantic search over stored context. Returns raw excerpts ranked by relevance, no LLM synthesis. Default 800 tokens, max 2000. Use when you want specific past facts to reason over yourself.
+Semantic search over stored context for a specific peer. Returns raw excerpts ranked by relevance, no synthesis. Default 800 tokens, max 2000. Good when you need specific past facts to reason over yourself rather than a synthesized answer.

 ### `honcho_context`
-Natural language question answered by Honcho's dialectic reasoning (LLM call on Honcho's backend). Higher cost, higher quality. Can query about user (default) or the AI peer.
+Full session context snapshot from Honcho — session summary, peer representation, peer card, and recent messages. No LLM call. Use when you want to see everything Honcho knows about the current session and peer in one shot.
+
+### `honcho_reasoning`
+Natural language question answered by Honcho's dialectic reasoning engine (LLM call on Honcho's backend). Higher cost, higher quality. Pass `reasoning_level` to control depth: `minimal` (fast/cheap) → `low` → `medium` → `high` → `max` (thorough). Omit to use the configured default (`low`). Use for synthesized understanding of the user's patterns, goals, or current state.

 ### `honcho_conclude`
-Write a persistent fact about the user. Conclusions build the user's profile over time. Use when the user states a preference, corrects you, or shares something to remember.
+Write or delete a persistent conclusion about a peer. Pass `conclusion: "..."` to create. Pass `delete_id: "..."` to remove a conclusion (for PII removal — Honcho self-heals incorrect conclusions over time, so deletion is only needed for PII). You MUST pass exactly one of the two.
+
+### Bidirectional peer targeting
+
+All 5 tools accept an optional `peer` parameter:
+- `peer: "user"` (default) — operates on the user peer
+- `peer: "ai"` — operates on this profile's AI peer
+- `peer: "<explicit-id>"` — any peer ID in the workspace
+
+Examples:
+```
+honcho_profile                        # read user's card
+honcho_profile peer="ai"              # read AI peer's card
+honcho_reasoning query="What does this user care about most?"
+honcho_reasoning query="What are my interaction patterns?" peer="ai" reasoning_level="medium"
+honcho_conclude conclusion="Prefers terse answers"
+honcho_conclude conclusion="I tend to over-explain code" peer="ai"
+honcho_conclude delete_id="abc123"    # PII removal
+```
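
The "exactly one of the two" rule for `honcho_conclude` can be checked before any call is made. The sketch below is a hypothetical validator, not the tool's actual implementation; the function name and return values are assumptions made for illustration.

```python
from typing import Optional


def validate_conclude_args(conclusion: Optional[str] = None,
                           delete_id: Optional[str] = None) -> str:
    """Return "create" or "delete"; raise ValueError unless exactly one is set."""
    has_conclusion = bool(conclusion and conclusion.strip())
    has_delete = bool(delete_id and delete_id.strip())
    if has_conclusion == has_delete:
        # Both set or both missing: the call is ambiguous or empty.
        raise ValueError("pass exactly one of `conclusion` or `delete_id`")
    return "create" if has_conclusion else "delete"
```

Rejecting the both-set case up front avoids guessing whether the caller meant to replace a conclusion or remove one.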
+
+## Agent Usage Patterns
+
+Guidelines for Hermes when Honcho memory is active.
+
+### On conversation start
+
+```
+1. honcho_profile                  → fast warmup, no LLM cost
+2. If context looks thin → honcho_context  (full snapshot, still no LLM)
+3. If deep synthesis needed → honcho_reasoning  (LLM call, use sparingly)
+```
+
+Do NOT call `honcho_reasoning` on every turn. Auto-injection already handles ongoing context refresh. Use the reasoning tool only when you genuinely need synthesized insight the base context doesn't provide.
+
+### When the user shares something to remember
+
+```
+honcho_conclude conclusion="<specific, actionable fact>"
+```
+
+Good conclusions: "Prefers code examples over prose explanations", "Working on a Rust async project through April 2026"
+Bad conclusions: "User said something about Rust" (too vague), "User seems technical" (already in representation)
+
+### When the user asks about past context / you need to recall specifics
+
+```
+honcho_search query="<topic>"       → fast, no LLM, good for specific facts
+honcho_context                       → full snapshot with summary + messages
+honcho_reasoning query="<question>"  → synthesized answer, use when search isn't enough
+```
+
+### When to use `peer: "ai"`
+
+Use AI peer targeting to build and query the agent's own self-knowledge:
+- `honcho_conclude conclusion="I tend to be verbose when explaining architecture" peer="ai"` — self-correction
+- `honcho_reasoning query="How do I typically handle ambiguous requests?" peer="ai"` — self-audit
+- `honcho_profile peer="ai"` — review own identity card
+
+### When NOT to call tools
+
+In `hybrid` and `context` modes, base context (user representation + card + session summary) is auto-injected before every turn. Do not re-fetch what was already injected. Call tools only when:
+- You need something the injected context doesn't have
+- The user explicitly asks you to recall or check memory
+- You're writing a conclusion about something new
+
+### Cadence awareness
+
+`honcho_reasoning` on the tool side shares the same cost as auto-injection dialectic. After an explicit tool call, the auto-injection cadence resets — avoiding double-charging the same turn.

 ## Config Reference

@@ -191,18 +349,39 @@ Config file: `$HERMES_HOME/honcho.json` (profile-local) or `~/.honcho/config.jso
 | `observation` | all on | Per-peer `observeMe`/`observeOthers` booleans |
 | `writeFrequency` | `async` | `async`, `turn`, `session`, or integer N |
 | `sessionStrategy` | `per-directory` | `per-directory`, `per-repo`, `per-session`, `global` |
-| `dialecticReasoningLevel` | `low` | `minimal`, `low`, `medium`, `high`, `max` |
-| `dialecticDynamic` | `true` | Auto-bump reasoning by query length. `false` = fixed level |
 | `messageMaxChars` | `25000` | Max chars per message (chunked if exceeded) |
-| `dialecticMaxInputChars` | `10000` | Max chars for dialectic query input |

-### Cost-awareness (advanced, root config only)
+### Dialectic settings

 | Key | Default | Description |
 |-----|---------|-------------|
+| `dialecticReasoningLevel` | `low` | `minimal`, `low`, `medium`, `high`, `max` |
+| `dialecticDynamic` | `true` | When `true`, the model can override the level per call via `reasoning_level` on `honcho_reasoning`. `false` = fixed level |
+| `dialecticDepth` | `1` | Number of dialectic rounds per query (1-3) |
+| `dialecticDepthLevels` | -- | Optional array of per-round levels, e.g. `["low", "high"]` |
+| `dialecticMaxInputChars` | `10000` | Max chars for dialectic query input |
+
+### Context budget and injection
+
+| Key | Default | Description |
+|-----|---------|-------------|
+| `contextTokens` | uncapped | Max tokens for the combined base context injection (summary + representation + card). Opt-in cap — omit to leave uncapped, set to an integer to bound injection size. |
 | `injectionFrequency` | `every-turn` | `every-turn` or `first-turn` |
 | `contextCadence` | `1` | Min turns between context API calls |
-| `dialecticCadence` | `1` | Min turns between dialectic API calls |
+| `dialecticCadence` | `3` | Min turns between dialectic LLM calls |
+
+The `contextTokens` budget is enforced at injection time. If the session summary + representation + card exceed the budget, Honcho trims the summary first, then the representation, preserving the card. This prevents context blowup in long sessions.
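
A rough sketch of the trimming order described above (summary first, then representation, card preserved). It assumes a heuristic of about 4 characters per token and plain character slicing; `trim_to_budget` is a hypothetical name and the real enforcement may differ.

```python
from typing import Optional, Tuple

CHARS_PER_TOKEN = 4  # rough heuristic, an assumption for this sketch


def trim_to_budget(summary: str, representation: str, card: str,
                   context_tokens: Optional[int] = None) -> Tuple[str, str, str]:
    """Trim summary first, then representation; the card is left intact."""
    if context_tokens is None:  # uncapped: nothing to trim
        return summary, representation, card
    budget = context_tokens * CHARS_PER_TOKEN
    overflow = len(summary) + len(representation) + len(card) - budget
    if overflow > 0:
        cut = min(overflow, len(summary))
        summary = summary[:len(summary) - cut]
        overflow -= cut
    if overflow > 0:
        cut = min(overflow, len(representation))
        representation = representation[:len(representation) - cut]
    return summary, representation, card
```

Note that in this sketch a card larger than the whole budget is still returned untouched, since the text above only says the card is preserved.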
+
+### Memory-context sanitization
+
+Honcho sanitizes the `memory-context` block before injection to prevent prompt injection and malformed content:
+
+- Strips XML/HTML tags from user-authored conclusions
+- Normalizes whitespace and control characters
+- Truncates individual conclusions that exceed `messageMaxChars`
+- Escapes delimiter sequences that could break the system prompt structure
+
+Sanitization covers edge cases where raw user conclusions containing markup or special characters could corrupt the injected context block.
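
The first three sanitization steps listed above could look roughly like this. It is a simplified sketch with a hypothetical `sanitize_conclusion` helper: the tag regex is naive (it would also strip text such as `a < b > c`), and delimiter escaping is left out because the delimiter sequences are not specified here.

```python
import re

_TAG_RE = re.compile(r"<[^>]+>")
_CONTROL_RE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def sanitize_conclusion(text: str, max_chars: int = 25000) -> str:
    """Strip tags, drop control characters, collapse whitespace, truncate."""
    text = _TAG_RE.sub("", text)      # strip XML/HTML-style tags
    text = _CONTROL_RE.sub("", text)  # drop control characters
    text = " ".join(text.split())     # normalize whitespace runs
    return text[:max_chars]           # truncate over-long conclusions
```

The `max_chars` default mirrors the documented `messageMaxChars` default of 25000.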

 ## Troubleshooting

@@ -221,6 +400,12 @@ Observation config is synced from the server on each session init. Start a new s
 ### Messages truncated
 Messages over `messageMaxChars` (default 25k) are automatically chunked with `[continued]` markers. If you're hitting this often, check if tool results or skill content is inflating message size.

+### Context injection too large
+If you see warnings that the context budget was exceeded, lower `contextTokens` or reduce `dialecticDepth`. The session summary is trimmed first when the budget is tight.
+
+### Session summary missing
+Session summary requires at least one prior turn in the current Honcho session. On cold start (new session, no history), the summary is omitted and Honcho uses the cold-start prompt strategy instead.
+
 ## CLI Commands

 | Command | Description |
@@ -0,0 +1,94 @@
+/**
+ * Example Dashboard Plugin
+ *
+ * Demonstrates how to build a dashboard plugin using the Hermes Plugin SDK.
+ * No build step needed — this is a plain IIFE that uses globals from the SDK.
+ */
+(function () {
+  "use strict";
+
+  const SDK = window.__HERMES_PLUGIN_SDK__;
+  const { React } = SDK;
+  const { Card, CardHeader, CardTitle, CardContent, Badge, Button } = SDK.components;
+  const { useState, useEffect } = SDK.hooks;
+  const { cn } = SDK.utils;
+
+  function ExamplePage() {
+    const [greeting, setGreeting] = useState(null);
+    const [loading, setLoading] = useState(false);
+
+    function fetchGreeting() {
+      setLoading(true);
+      SDK.fetchJSON("/api/plugins/example/hello")
+        .then(function (data) { setGreeting(data.message); })
+        .catch(function () { setGreeting("(backend not available)"); })
+        .finally(function () { setLoading(false); });
+    }
+
+    return React.createElement("div", { className: "flex flex-col gap-6" },
+      // Header card
+      React.createElement(Card, null,
+        React.createElement(CardHeader, null,
+          React.createElement("div", { className: "flex items-center gap-3" },
+            React.createElement(CardTitle, { className: "text-lg" }, "Example Plugin"),
+            React.createElement(Badge, { variant: "outline" }, "v1.0.0"),
+          ),
+        ),
+        React.createElement(CardContent, { className: "flex flex-col gap-4" },
+          React.createElement("p", { className: "text-sm text-muted-foreground" },
+            "This is an example dashboard plugin. It demonstrates using the Plugin SDK to build ",
+            "custom tabs with React components, connect to backend API routes, and integrate with ",
+            "the existing Hermes UI system.",
+          ),
+          React.createElement("div", { className: "flex items-center gap-3" },
+            React.createElement(Button, {
+              onClick: fetchGreeting,
+              disabled: loading,
+              className: cn(
+                "inline-flex items-center gap-2 border border-border bg-background/40 px-4 py-2",
+                "text-sm font-courier transition-colors hover:bg-foreground/10 cursor-pointer",
+              ),
+            }, loading ? "Loading..." : "Call Backend API"),
+            greeting && React.createElement("span", {
+              className: "text-sm font-courier text-muted-foreground",
+            }, greeting),
+          ),
+        ),
+      ),
+
+      // Info card about the SDK
+      React.createElement(Card, null,
+        React.createElement(CardHeader, null,
+          React.createElement(CardTitle, { className: "text-base" }, "Plugin SDK Reference"),
+        ),
+        React.createElement(CardContent, null,
+          React.createElement("div", { className: "grid gap-3 text-sm" },
+            React.createElement("div", { className: "flex flex-col gap-1 border border-border p-3" },
+              React.createElement("span", { className: "font-medium" }, "window.__HERMES_PLUGIN_SDK__.React"),
+              React.createElement("span", { className: "text-muted-foreground text-xs" }, "React instance — use instead of importing react"),
+            ),
+            React.createElement("div", { className: "flex flex-col gap-1 border border-border p-3" },
+              React.createElement("span", { className: "font-medium" }, "window.__HERMES_PLUGIN_SDK__.hooks"),
+              React.createElement("span", { className: "text-muted-foreground text-xs" }, "useState, useEffect, useCallback, useMemo, useRef, useContext, createContext"),
+            ),
+            React.createElement("div", { className: "flex flex-col gap-1 border border-border p-3" },
+              React.createElement("span", { className: "font-medium" }, "window.__HERMES_PLUGIN_SDK__.components"),
+              React.createElement("span", { className: "text-muted-foreground text-xs" }, "Card, Badge, Button, Input, Label, Select, Separator, Tabs, etc."),
+            ),
+            React.createElement("div", { className: "flex flex-col gap-1 border border-border p-3" },
+              React.createElement("span", { className: "font-medium" }, "window.__HERMES_PLUGIN_SDK__.api"),
+              React.createElement("span", { className: "text-muted-foreground text-xs" }, "Hermes API client — getStatus(), getSessions(), etc."),
+            ),
+            React.createElement("div", { className: "flex flex-col gap-1 border border-border p-3" },
+              React.createElement("span", { className: "font-medium" }, "window.__HERMES_PLUGIN_SDK__.utils"),
+              React.createElement("span", { className: "text-muted-foreground text-xs" }, "cn(), timeAgo(), isoTimeAgo()"),
+            ),
+          ),
+        ),
+      ),
+    );
+  }
+
+  // Register this plugin — the dashboard picks it up automatically.
+  window.__HERMES_PLUGINS__.register("example", ExamplePage);
+})();
@@ -0,0 +1,13 @@
+{
+  "name": "example",
+  "label": "Example",
+  "description": "Example dashboard plugin — demonstrates the plugin SDK",
+  "icon": "Sparkles",
+  "version": "1.0.0",
+  "tab": {
+    "path": "/example",
+    "position": "after:skills"
+  },
+  "entry": "dist/index.js",
+  "api": "plugin_api.py"
+}
@@ -0,0 +1,14 @@
+"""Example dashboard plugin — backend API routes.
+
+Mounted at /api/plugins/example/ by the dashboard plugin system.
+"""
+
+from fastapi import APIRouter
+
+router = APIRouter()
+
+
+@router.get("/hello")
+async def hello():
+    """Simple greeting endpoint to demonstrate plugin API routes."""
+    return {"message": "Hello from the example plugin!", "plugin": "example", "version": "1.0.0"}
@@ -1,18 +1,22 @@
 """Memory provider plugin discovery.

-Scans ``plugins/memory/<name>/`` directories for memory provider plugins.
-Each subdirectory must contain ``__init__.py`` with a class implementing
-the MemoryProvider ABC.
+Scans two directories for memory provider plugins:

-Memory providers are separate from the general plugin system — they live
-in the repo and are always available without user installation. Only ONE
-can be active at a time, selected via ``memory.provider`` in config.yaml.
+1. Bundled providers: ``plugins/memory/<name>/`` (shipped with hermes-agent)
+2. User-installed providers: ``$HERMES_HOME/plugins/<name>/``
+
+Each subdirectory must contain ``__init__.py`` with a class implementing
+the MemoryProvider ABC.  On name collisions, bundled providers take
+precedence.
+
+Only ONE provider can be active at a time, selected via
+``memory.provider`` in config.yaml.

 Usage:
    from plugins.memory import discover_memory_providers, load_memory_provider

    available = discover_memory_providers()   # [(name, desc, available), ...]
-    provider = load_memory_provider("openviking")  # MemoryProvider instance
+    provider = load_memory_provider("mnemosyne")  # MemoryProvider instance
 """

 from __future__ import annotations
@@ -29,24 +33,101 @@ logger = logging.getLogger(__name__)
 _MEMORY_PLUGINS_DIR = Path(__file__).parent


+# ---------------------------------------------------------------------------
+# Directory helpers
+# ---------------------------------------------------------------------------
+
+def _get_user_plugins_dir() -> Optional[Path]:
+    """Return ``$HERMES_HOME/plugins/`` or None if unavailable."""
+    try:
+        from hermes_constants import get_hermes_home
+        d = get_hermes_home() / "plugins"
+        return d if d.is_dir() else None
+    except Exception:
+        return None
+
+
+def _is_memory_provider_dir(path: Path) -> bool:
+    """Heuristic: does *path* look like a memory provider plugin?
+
+    Checks for ``register_memory_provider`` or ``MemoryProvider`` in the
+    ``__init__.py`` source.  Cheap text scan — no import needed.
+    """
+    init_file = path / "__init__.py"
+    if not init_file.exists():
+        return False
+    try:
+        source = init_file.read_text(errors="replace")[:8192]
+        return "register_memory_provider" in source or "MemoryProvider" in source
+    except Exception:
+        return False
+
+
+def _iter_provider_dirs() -> List[Tuple[str, Path]]:
+    """Return ``(name, path)`` pairs for all discovered provider directories.
+
+    Scans bundled first, then user-installed.  Bundled takes precedence
+    on name collisions (first-seen wins via ``seen`` set).
+    """
+    seen: set = set()
+    dirs: List[Tuple[str, Path]] = []
+
+    # 1. Bundled providers (plugins/memory/<name>/)
+    if _MEMORY_PLUGINS_DIR.is_dir():
+        for child in sorted(_MEMORY_PLUGINS_DIR.iterdir()):
+            if not child.is_dir() or child.name.startswith(("_", ".")):
+                continue
+            if not (child / "__init__.py").exists():
+                continue
+            seen.add(child.name)
+            dirs.append((child.name, child))
+
+    # 2. User-installed providers ($HERMES_HOME/plugins/<name>/)
+    user_dir = _get_user_plugins_dir()
+    if user_dir:
+        for child in sorted(user_dir.iterdir()):
+            if not child.is_dir() or child.name.startswith(("_", ".")):
+                continue
+            if child.name in seen:
+                continue  # bundled takes precedence
+            if not _is_memory_provider_dir(child):
+                continue  # skip non-memory plugins
+            dirs.append((child.name, child))
+
+    return dirs
+
+
+def find_provider_dir(name: str) -> Optional[Path]:
+    """Resolve a provider name to its directory.
+
+    Checks bundled first, then user-installed.
+    """
+    # Bundled
+    bundled = _MEMORY_PLUGINS_DIR / name
+    if bundled.is_dir() and (bundled / "__init__.py").exists():
+        return bundled
+    # User-installed
+    user_dir = _get_user_plugins_dir()
+    if user_dir:
+        user = user_dir / name
+        if user.is_dir() and _is_memory_provider_dir(user):
+            return user
+    return None
+
+
+# ---------------------------------------------------------------------------
+# Public API
+# ---------------------------------------------------------------------------
+
 def discover_memory_providers() -> List[Tuple[str, str, bool]]:
-    """Scan plugins/memory/ for available providers.
+    """Scan bundled and user-installed directories for available providers.

    Returns list of (name, description, is_available) tuples.
-    Does NOT import the providers — just reads plugin.yaml for metadata
-    and does a lightweight availability check.
+    Bundled providers take precedence on name collisions.
    """
    results = []
-    if not _MEMORY_PLUGINS_DIR.is_dir():
-        return results
-
-    for child in sorted(_MEMORY_PLUGINS_DIR.iterdir()):
-        if not child.is_dir() or child.name.startswith(("_", ".")):
-            continue
-        init_file = child / "__init__.py"
-        if not init_file.exists():
-            continue

+    for name, child in _iter_provider_dirs():
        # Read description from plugin.yaml if available
        desc = ""
        yaml_file = child / "plugin.yaml"
@@ -70,7 +151,7 @@ def discover_memory_providers() -> List[Tuple[str, str, bool]]:
        except Exception:
            available = False

-        results.append((child.name, desc, available))
+        results.append((name, desc, available))

    return results

@@ -78,11 +159,15 @@ def discover_memory_providers() -> List[Tuple[str, str, bool]]:
 def load_memory_provider(name: str) -> Optional["MemoryProvider"]:
    """Load and return a MemoryProvider instance by name.

+    Checks both bundled (``plugins/memory/<name>/``) and user-installed
+    (``$HERMES_HOME/plugins/<name>/``) directories.  Bundled takes
+    precedence on name collisions.
+
    Returns None if the provider is not found or fails to load.
    """
-    provider_dir = _MEMORY_PLUGINS_DIR / name
-    if not provider_dir.is_dir():
-        logger.debug("Memory provider '%s' not found in %s", name, _MEMORY_PLUGINS_DIR)
+    provider_dir = find_provider_dir(name)
+    if not provider_dir:
+        logger.debug("Memory provider '%s' not found in bundled or user plugins", name)
        return None

    try:
@@ -104,7 +189,10 @@ def _load_provider_from_dir(provider_dir: Path) -> Optional["MemoryProvider"]:
    - A top-level class that extends MemoryProvider — we instantiate it
    """
    name = provider_dir.name
-    module_name = f"plugins.memory.{name}"
+    # Use a separate namespace for user-installed plugins so they don't
+    # collide with bundled providers in sys.modules.
+    _is_bundled = _MEMORY_PLUGINS_DIR in provider_dir.parents or provider_dir.parent == _MEMORY_PLUGINS_DIR
+    module_name = f"plugins.memory.{name}" if _is_bundled else f"_hermes_user_memory.{name}"
    init_file = provider_dir / "__init__.py"

    if not init_file.exists():
@@ -257,15 +345,16 @@ def discover_plugin_cli_commands() -> List[dict]:
        return results

    # Only look at the active provider's directory
-    plugin_dir = _MEMORY_PLUGINS_DIR / active_provider
-    if not plugin_dir.is_dir():
+    plugin_dir = find_provider_dir(active_provider)
+    if not plugin_dir:
        return results

    cli_file = plugin_dir / "cli.py"
    if not cli_file.exists():
        return results

-    module_name = f"plugins.memory.{active_provider}.cli"
+    _is_bundled = _MEMORY_PLUGINS_DIR in plugin_dir.parents or plugin_dir.parent == _MEMORY_PLUGINS_DIR
+    module_name = f"plugins.memory.{active_provider}.cli" if _is_bundled else f"_hermes_user_memory.{active_provider}.cli"
    try:
        # Import the CLI module (lightweight — no SDK needed)
        if module_name in sys.modules:
@@ -1,6 +1,6 @@
 # Honcho Memory Provider

-AI-native cross-session user modeling with dialectic Q&A, semantic search, peer cards, and persistent conclusions.
+AI-native cross-session user modeling with multi-pass dialectic reasoning, session summaries, bidirectional peer tools, and persistent conclusions.

 > **Honcho docs:** <https://docs.honcho.dev/v3/guides/integrations/hermes>

@@ -19,9 +19,86 @@ hermes memory setup    # generic picker, also works
 Or manually:
 ```bash
 hermes config set memory.provider honcho
-echo "HONCHO_API_KEY=your-key" >> ~/.hermes/.env
+echo "HONCHO_API_KEY=***" >> ~/.hermes/.env
 ```

+## Architecture Overview
+
+### Two-Layer Context Injection
+
+Context is injected into the **user message** at API-call time (not the system prompt) to preserve prompt caching. Only a static mode header goes in the system prompt. The injected block is wrapped in `<memory-context>` fences with a system note clarifying it's background data, not new user input.
+
+Two independent layers, each on its own cadence:
+
+**Layer 1 — Base context** (refreshed every `contextCadence` turns):
+1. **SESSION SUMMARY** — from `session.context(summary=True)`, placed first
+2. **User Representation** — Honcho's evolving model of the user
+3. **User Peer Card** — key facts snapshot
+4. **AI Self-Representation** — Honcho's model of the AI peer
+5. **AI Identity Card** — AI peer facts
+
+**Layer 2 — Dialectic supplement** (fired every `dialecticCadence` turns):
+Multi-pass `.chat()` reasoning about the user, appended after base context.
+
+Both layers are joined, then truncated to fit the `contextTokens` budget via `_truncate_to_budget` (tokens × 4 chars, word-boundary safe).
+
+### Cold Start vs Warm Session Prompts
+
+Dialectic pass 0 automatically selects its prompt based on session state:
+
+- **Cold** (no base context cached): "Who is this person? What are their preferences, goals, and working style? Focus on facts that would help an AI assistant be immediately useful."
+- **Warm** (base context exists): "Given what's been discussed in this session so far, what context about this user is most relevant to the current conversation? Prioritize active context over biographical facts."
+
+Not configurable — determined automatically.
+
+### Dialectic Depth (Multi-Pass Reasoning)
+
+`dialecticDepth` (1–3, clamped) controls how many `.chat()` calls fire per dialectic cycle:
+
+| Depth | Passes | Behavior |
+|-------|--------|----------|
+| 1 | single `.chat()` | Base query only (cold or warm prompt) |
+| 2 | audit + synthesis | Pass 0 result is self-audited; pass 1 does targeted synthesis. Conditional bail-out if pass 0 returns strong signal (>300 chars or structured with bullets/sections >100 chars) |
+| 3 | audit + synthesis + reconciliation | Pass 2 reconciles contradictions across prior passes into a final synthesis |
+
+### Proportional Reasoning Levels
+
+When `dialecticDepthLevels` is not set, each pass uses a proportional level relative to `dialecticReasoningLevel` (the "base"):
+
+| Depth | Pass levels |
+|-------|-------------|
+| 1 | [base] |
+| 2 | [minimal, base] |
+| 3 | [minimal, base, low] |
+
+Override with `dialecticDepthLevels`: an explicit array of reasoning level strings per pass.
+
+### Three Orthogonal Dialectic Knobs
+
+| Knob | Controls | Type |
+|------|----------|------|
+| `dialecticCadence` | How often — minimum turns between dialectic firings | int |
+| `dialecticDepth` | How many — passes per firing (1–3) | int |
+| `dialecticReasoningLevel` | How hard — reasoning ceiling per `.chat()` call | string |
+
+### Input Sanitization
+
+`run_conversation` strips leaked `<memory-context>` blocks from user input before processing. When `saveMessages` persists a turn that included injected context, the block can reappear in subsequent turns via message history. The sanitizer removes `<memory-context>` blocks plus associated system notes.
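
A minimal sketch of stripping leaked blocks from user input, assuming they are delimited by literal `<memory-context>` tags. `strip_memory_context` is a hypothetical helper; the actual sanitizer in `run_conversation` also removes the associated system notes, which this sketch does not attempt.

```python
import re

# Assumption: blocks are delimited by literal <memory-context> tags.
_MEMORY_BLOCK_RE = re.compile(
    r"<memory-context>.*?</memory-context>\s*", re.DOTALL | re.IGNORECASE
)


def strip_memory_context(user_input: str) -> str:
    """Remove every <memory-context>...</memory-context> block from the input."""
    return _MEMORY_BLOCK_RE.sub("", user_input).strip()
```

The non-greedy match keeps text between two separate blocks intact instead of swallowing everything from the first opening tag to the last closing tag.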
+
+## Tools
+
+Five bidirectional tools. All accept an optional `peer` parameter (`"user"` or `"ai"`, default `"user"`).
+
+| Tool | LLM call? | Description |
+|------|-----------|-------------|
+| `honcho_profile` | No | Peer card — key facts snapshot |
+| `honcho_search` | No | Semantic search over stored context (800 tok default, 2000 max) |
+| `honcho_context` | No | Full session context: summary, representation, card, messages |
+| `honcho_reasoning` | Yes | LLM-synthesized answer via dialectic `.chat()` |
+| `honcho_conclude` | No | Write or delete a persistent conclusion about a peer |
+
+Tool visibility depends on `recallMode`: hidden in `context` mode, always present in `tools` and `hybrid`.
+
 ## Config Resolution

 Config is read from the first file that exists:
@@ -34,42 +111,128 @@ Config is read from the first file that exists:

 Host key is derived from the active Hermes profile: `hermes` (default) or `hermes.<profile>`.

-## Tools
-
-| Tool | LLM call? | Description |
-|------|-----------|-------------|
-| `honcho_profile` | No | User's peer card -- key facts snapshot |
-| `honcho_search` | No | Semantic search over stored context (800 tok default, 2000 max) |
-| `honcho_context` | Yes | LLM-synthesized answer via dialectic reasoning |
-| `honcho_conclude` | No | Write a persistent fact about the user |
-
-Tool availability depends on `recallMode`: hidden in `context` mode, always present in `tools` and `hybrid`.
+For every key, resolution order is: **host block > root > env var > default**.
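
The resolution order can be sketched as a chain of lookups. This is illustrative only: `resolve_setting` is a hypothetical function, and placing host blocks under a top-level `hosts` key is an assumption about the config layout.

```python
from typing import Any, Mapping, Optional


def resolve_setting(key: str, config: Mapping[str, Any], host_key: str,
                    env: Mapping[str, str], env_var: Optional[str] = None,
                    default: Any = None) -> Any:
    """Resolve a setting as: host block > root > env var > default."""
    # Assumption: host blocks live under a top-level "hosts" mapping.
    host_block = config.get("hosts", {}).get(host_key, {})
    if key in host_block:
        return host_block[key]
    if key in config:
        return config[key]
    if env_var and env_var in env:
        return env[env_var]
    return default
```

For instance, with `recallMode` set to `"hybrid"` at the root and `"tools"` in the `hermes.coder` host block, the `coder` profile resolves to `"tools"` while the default profile resolves to `"hybrid"`.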

 ## Full Configuration Reference

 ### Identity & Connection

-| Key | Type | Default | Scope | Description |
-|-----|------|---------|-------|-------------|
-| `apiKey` | string | -- | root / host | API key. Falls back to `HONCHO_API_KEY` env var |
-| `baseUrl` | string | -- | root | Base URL for self-hosted Honcho. Local URLs (`localhost`, `127.0.0.1`, `::1`) auto-skip API key auth |
-| `environment` | string | `"production"` | root / host | SDK environment mapping |
-| `enabled` | bool | auto | root / host | Master toggle. Auto-enables when `apiKey` or `baseUrl` present |
-| `workspace` | string | host key | root / host | Honcho workspace ID |
-| `peerName` | string | -- | root / host | User peer identity |
-| `aiPeer` | string | host key | root / host | AI peer identity |
+| Key | Type | Default | Description |
+|-----|------|---------|-------------|
+| `apiKey` | string | — | API key. Falls back to `HONCHO_API_KEY` env var |
+| `baseUrl` | string | — | Base URL for self-hosted Honcho. Local URLs auto-skip API key auth |
+| `environment` | string | `"production"` | SDK environment mapping |
+| `enabled` | bool | auto | Master toggle. Auto-enables when `apiKey` or `baseUrl` present |
+| `workspace` | string | host key | Honcho workspace ID. Shared environment — all profiles in the same workspace can see the same user identity and related memories |
+| `peerName` | string | — | User peer identity |
+| `aiPeer` | string | host key | AI peer identity |

 ### Memory & Recall

-| Key | Type | Default | Scope | Description |
-|-----|------|---------|-------|-------------|
-| `recallMode` | string | `"hybrid"` | root / host | `"hybrid"` (auto-inject + tools), `"context"` (auto-inject only, tools hidden), `"tools"` (tools only, no injection). Legacy `"auto"` normalizes to `"hybrid"` |
-| `observationMode` | string | `"directional"` | root / host | Shorthand preset: `"directional"` (all on) or `"unified"` (shared pool). Use `observation` object for granular control |
-| `observation` | object | -- | root / host | Per-peer observation config (see below) |
+| Key | Type | Default | Description |
+|-----|------|---------|-------------|
+| `recallMode` | string | `"hybrid"` | `"hybrid"` (auto-inject + tools), `"context"` (auto-inject only, tools hidden), `"tools"` (tools only, no injection). Legacy `"auto"` → `"hybrid"` |
+| `observationMode` | string | `"directional"` | Preset: `"directional"` (all on) or `"unified"` (shared pool). Use `observation` object for granular control |
+| `observation` | object | — | Per-peer observation config (see Observation section) |

-#### Observation (granular)
+### Write Behavior

-Maps 1:1 to Honcho's per-peer `SessionPeerConfig`. Set at root or per host block -- each profile can have different observation settings. When present, overrides `observationMode` preset.
+| Key | Type | Default | Description |
+|-----|------|---------|-------------|
+| `writeFrequency` | string/int | `"async"` | `"async"` (background), `"turn"` (sync per turn), `"session"` (batch on end), or integer N (every N turns) |
+| `saveMessages` | bool | `true` | Persist messages to Honcho API |
+
+### Session Resolution
+
+| Key | Type | Default | Description |
+|-----|------|---------|-------------|
+| `sessionStrategy` | string | `"per-directory"` | `"per-directory"`, `"per-session"`, `"per-repo"` (git root), `"global"` |
+| `sessionPeerPrefix` | bool | `false` | Prepend peer name to session keys |
+| `sessions` | object | `{}` | Manual directory-to-session-name mappings |
+
+#### Session Name Resolution
+
+The Honcho session name determines which conversation bucket memory lands in. Resolution follows a priority chain — first match wins:
+
+| Priority | Source | Example session name |
+|----------|--------|---------------------|
+| 1 | Manual map (`sessions` config) | `"myproject-main"` |
+| 2 | `/title` command (mid-session rename) | `"refactor-auth"` |
+| 3 | Gateway session key (Telegram, Discord, etc.) | `"agent-main-telegram-dm-8439114563"` |
+| 4 | `per-session` strategy | Hermes session ID (`20260415_a3f2b1`) |
+| 5 | `per-repo` strategy | Git root directory name (`hermes-agent`) |
+| 6 | `per-directory` strategy | Current directory basename (`src`) |
+| 7 | `global` strategy | Workspace name (`hermes`) |
+
+Gateway platforms always resolve via priority 3 (per-chat isolation) regardless of `sessionStrategy`. The strategy setting only affects CLI sessions.
+
+If `sessionPeerPrefix` is `true`, the peer name is prepended to the resolved session name. For example, peer `eri` with session name `hermes-agent` resolves to `eri-hermes-agent`.
+
+#### What each strategy produces
+
+- **`per-directory`** — basename of `$PWD`. Opening hermes in `~/code/myapp` and `~/code/other` gives two separate sessions. Same directory = same session across runs.
+- **`per-repo`** — git root directory name. All subdirectories within a repo share one session. Falls back to `per-directory` if not inside a git repo.
+- **`per-session`** — Hermes session ID (timestamp + hex). Every `hermes` invocation starts a fresh Honcho session. Falls back to `per-directory` if no session ID is available.
+- **`global`** — workspace name. One session for everything. Memory accumulates across all directories and runs.
+
+### Multi-Profile Pattern
+
+Multiple Hermes profiles can share one workspace while maintaining separate AI identities. Config resolution is **host block > root > env var > default** — host blocks inherit from root, so shared settings only need to be declared once:
+
+```json
+{
+  "apiKey": "***",
+  "workspace": "hermes",
+  "peerName": "yourname",
+  "hosts": {
+    "hermes": {
+      "aiPeer": "hermes",
+      "recallMode": "hybrid",
+      "sessionStrategy": "per-directory"
+    },
+    "hermes.coder": {
+      "aiPeer": "coder",
+      "recallMode": "tools",
+      "sessionStrategy": "per-repo"
+    }
+  }
+}
+```
+
+Both profiles see the same user (`yourname`) in the same shared environment (`hermes`), but each AI peer builds its own observations, conclusions, and behavior patterns. The coder's memory stays code-oriented; the main agent's stays broad.
+
+Host key is derived from the active Hermes profile: `hermes` (default) or `hermes.<profile>` (e.g. `hermes -p coder` → host key `hermes.coder`).
+
+### Dialectic & Reasoning
+
+| Key | Type | Default | Description |
+|-----|------|---------|-------------|
+| `dialecticDepth` | int | `1` | Passes per dialectic cycle (1–3, clamped). 1=single query, 2=audit+synthesis, 3=audit+synthesis+reconciliation |
+| `dialecticDepthLevels` | array | — | Optional array of reasoning level strings per pass. Overrides proportional defaults. Example: `["minimal", "low", "medium"]` |
+| `dialecticReasoningLevel` | string | `"low"` | Base reasoning level for `.chat()`: `"minimal"`, `"low"`, `"medium"`, `"high"`, `"max"` |
+| `dialecticDynamic` | bool | `true` | When `true`, model can override reasoning level per-call via `honcho_reasoning` tool. When `false`, always uses `dialecticReasoningLevel` |
+| `dialecticMaxChars` | int | `600` | Max chars of dialectic result injected into system prompt |
+| `dialecticMaxInputChars` | int | `10000` | Max chars for dialectic query input to `.chat()`. Honcho cloud limit: 10k |
+
+### Token Budgets
+
+| Key | Type | Default | Description |
+|-----|------|---------|-------------|
+| `contextTokens` | int | SDK default | Token budget for `context()` API calls. Also gates prefetch truncation (tokens × 4 chars) |
+| `messageMaxChars` | int | `25000` | Max chars per message sent via `add_messages()`. Exceeding this triggers chunking with `[continued]` markers. Honcho cloud limit: 25k |
+
+### Cadence (Cost Control)
+
+| Key | Type | Default | Description |
+|-----|------|---------|-------------|
+| `contextCadence` | int | `1` | Minimum turns between base context refreshes (session summary + representation + card) |
+| `dialecticCadence` | int | `3` | Minimum turns between dialectic `.chat()` firings |
+| `injectionFrequency` | string | `"every-turn"` | `"every-turn"` or `"first-turn"` (inject context on the first user message only, skip from turn 2 onward) |
+| `reasoningLevelCap` | string | — | Hard cap on reasoning level: `"minimal"`, `"low"`, `"medium"`, `"high"` |
+
+### Observation (Granular)
+
+Maps 1:1 to Honcho's per-peer `SessionPeerConfig`. When present, overrides `observationMode` preset.

 ```json
 "observation": {
@@ -85,74 +248,16 @@ Maps 1:1 to Honcho's per-peer `SessionPeerConfig`. Set at root or per host block
 | `ai.observeMe` | `true` | AI peer self-observation (Honcho builds AI representation) |
 | `ai.observeOthers` | `true` | AI peer observes user messages (enables cross-peer dialectic) |

-Presets for `observationMode`:
-- `"directional"` (default): all four booleans `true`
+Presets:
+- `"directional"` (default): all four `true`
 - `"unified"`: user `observeMe=true`, AI `observeOthers=true`, rest `false`

-Per-profile example -- coder profile observes the user but user doesn't observe coder:
+### Hardcoded Limits

-```json
-"hosts": {
-  "hermes.coder": {
-    "observation": {
-      "user": { "observeMe": true, "observeOthers": false },
-      "ai":   { "observeMe": true, "observeOthers": true }
-    }
-  }
-}
-```
-
-Settings changed in the [Honcho dashboard](https://app.honcho.dev) are synced back on session init.
-
-### Write Behavior
-
-| Key | Type | Default | Scope | Description |
-|-----|------|---------|-------|-------------|
-| `writeFrequency` | string or int | `"async"` | root / host | `"async"` (background thread), `"turn"` (sync per turn), `"session"` (batch on end), or integer N (every N turns) |
-| `saveMessages` | bool | `true` | root / host | Whether to persist messages to Honcho API |
-
-### Session Resolution
-
-| Key | Type | Default | Scope | Description |
-|-----|------|---------|-------|-------------|
-| `sessionStrategy` | string | `"per-directory"` | root / host | `"per-directory"`, `"per-session"` (new each run), `"per-repo"` (git root name), `"global"` (single session) |
-| `sessionPeerPrefix` | bool | `false` | root / host | Prepend peer name to session keys |
-| `sessions` | object | `{}` | root | Manual directory-to-session-name mappings: `{"/path/to/project": "my-session"}` |
-
-### Token Budgets & Dialectic
-
-| Key | Type | Default | Scope | Description |
-|-----|------|---------|-------|-------------|
-| `contextTokens` | int | SDK default | root / host | Token budget for `context()` API calls. Also gates prefetch truncation (tokens x 4 chars) |
-| `dialecticReasoningLevel` | string | `"low"` | root / host | Base reasoning level for `peer.chat()`: `"minimal"`, `"low"`, `"medium"`, `"high"`, `"max"` |
-| `dialecticDynamic` | bool | `true` | root / host | Auto-bump reasoning based on query length: `<120` chars = base level, `120-400` = +1, `>400` = +2 (capped at `"high"`). Set `false` to always use `dialecticReasoningLevel` as-is |
-| `dialecticMaxChars` | int | `600` | root / host | Max chars of dialectic result injected into system prompt |
-| `dialecticMaxInputChars` | int | `10000` | root / host | Max chars for dialectic query input to `peer.chat()`. Honcho cloud limit: 10k |
-| `messageMaxChars` | int | `25000` | root / host | Max chars per message sent via `add_messages()`. Messages exceeding this are chunked with `[continued]` markers. Honcho cloud limit: 25k |
-
-### Cost Awareness (Advanced)
-
-These are read from the root config object, not the host block. Must be set manually in `honcho.json`.
-
-| Key | Type | Default | Description |
-|-----|------|---------|-------------|
-| `injectionFrequency` | string | `"every-turn"` | `"every-turn"` or `"first-turn"` (inject context only on turn 0) |
-| `contextCadence` | int | `1` | Minimum turns between `context()` API calls |
-| `dialecticCadence` | int | `1` | Minimum turns between `peer.chat()` API calls |
-| `reasoningLevelCap` | string | -- | Hard cap on auto-bumped reasoning: `"minimal"`, `"low"`, `"mid"`, `"high"` |
-
-### Hardcoded Limits (Not Configurable)
-
-| Limit | Value | Location |
-|-------|-------|----------|
-| Search tool max tokens | 2000 (hard cap), 800 (default) | `__init__.py` handle_tool_call |
-| Peer card fetch tokens | 200 | `session.py` get_peer_card |
-
-## Config Precedence
-
-For every key, resolution order is: **host block > root > env var > default**.
-
-Host key derivation: `HERMES_HONCHO_HOST` env > active profile (`hermes.<profile>`) > `"hermes"`.
+| Limit | Value |
+|-------|-------|
+| Search tool max tokens | 2000 (hard cap), 800 (default) |
+| Peer card fetch tokens | 200 |

 ## Environment Variables

@@ -182,15 +287,16 @@ Host key derivation: `HERMES_HONCHO_HOST` env > active profile (`hermes.<profile

 ```json
 {
-  "apiKey": "your-key",
+  "apiKey": "***",
  "workspace": "hermes",
-  "peerName": "eri",
+  "peerName": "username",
+  "contextCadence": 2,
+  "dialecticCadence": 3,
+  "dialecticDepth": 2,
  "hosts": {
    "hermes": {
      "enabled": true,
      "aiPeer": "hermes",
-      "workspace": "hermes",
-      "peerName": "eri",
      "recallMode": "hybrid",
      "observation": {
        "user": { "observeMe": true, "observeOthers": true },
@@ -199,14 +305,16 @@ Host key derivation: `HERMES_HONCHO_HOST` env > active profile (`hermes.<profile
      "writeFrequency": "async",
      "sessionStrategy": "per-directory",
      "dialecticReasoningLevel": "low",
+      "dialecticDepth": 2,
      "dialecticMaxChars": 600,
      "saveMessages": true
    },
    "hermes.coder": {
      "enabled": true,
      "aiPeer": "coder",
-      "workspace": "hermes",
-      "peerName": "eri",
+      "sessionStrategy": "per-repo",
+      "dialecticDepth": 1,
+      "dialecticDepthLevels": ["low"],
      "observation": {
        "user": { "observeMe": true, "observeOthers": false },
        "ai": { "observeMe": true, "observeOthers": true }
@@ -17,6 +17,7 @@ from __future__ import annotations

 import json
 import logging
+import re
 import threading
 from typing import Any, Dict, List, Optional

@@ -33,20 +34,33 @@ logger = logging.getLogger(__name__)
 PROFILE_SCHEMA = {
    "name": "honcho_profile",
    "description": (
-        "Retrieve the user's peer card from Honcho — a curated list of key facts "
-        "about them (name, role, preferences, communication style, patterns). "
-        "Fast, no LLM reasoning, minimal cost. "
-        "Use this at conversation start or when you need a quick factual snapshot."
+        "Retrieve or update a peer card from Honcho — a curated list of key facts "
+        "about that peer (name, role, preferences, communication style, patterns). "
+        "Pass `card` to update; omit `card` to read."
    ),
-    "parameters": {"type": "object", "properties": {}, "required": []},
+    "parameters": {
+        "type": "object",
+        "properties": {
+            "peer": {
+                "type": "string",
+                "description": "Peer to query. Built-in aliases: 'user' (default), 'ai'. Or pass any peer ID from this workspace.",
+            },
+            "card": {
+                "type": "array",
+                "items": {"type": "string"},
+                "description": "New peer card as a list of fact strings. Omit to read the current card.",
+            },
+        },
+        "required": [],
+    },
 }

 SEARCH_SCHEMA = {
    "name": "honcho_search",
    "description": (
-        "Semantic search over Honcho's stored context about the user. "
+        "Semantic search over Honcho's stored context about a peer. "
        "Returns raw excerpts ranked by relevance — no LLM synthesis. "
-        "Cheaper and faster than honcho_context. "
+        "Cheaper and faster than honcho_reasoning. "
        "Good when you want to find specific past facts and reason over them yourself."
    ),
    "parameters": {
@@ -60,17 +74,23 @@ SEARCH_SCHEMA = {
                "type": "integer",
                "description": "Token budget for returned context (default 800, max 2000).",
            },
+            "peer": {
+                "type": "string",
+                "description": "Peer to query. Built-in aliases: 'user' (default), 'ai'. Or pass any peer ID from this workspace.",
+            },
        },
        "required": ["query"],
    },
 }

-CONTEXT_SCHEMA = {
-    "name": "honcho_context",
+REASONING_SCHEMA = {
+    "name": "honcho_reasoning",
    "description": (
        "Ask Honcho a natural language question and get a synthesized answer. "
        "Uses Honcho's LLM (dialectic reasoning) — higher cost than honcho_profile or honcho_search. "
-        "Can query about any peer: the user (default) or the AI assistant."
+        "Can query about any peer via alias or explicit peer ID. "
+        "Pass reasoning_level to control depth: minimal (fast/cheap), low (default), "
+        "medium, high, max (deep/expensive). Omit for configured default."
    ),
    "parameters": {
        "type": "object",
@@ -79,37 +99,84 @@ CONTEXT_SCHEMA = {
                "type": "string",
                "description": "A natural language question.",
            },
+            "reasoning_level": {
+                "type": "string",
+                "description": (
+                    "Override the default reasoning depth. "
+                    "Omit to use the configured default (typically low). "
+                    "Guide:\n"
+                    "- minimal: quick factual lookups (name, role, simple preference)\n"
+                    "- low: straightforward questions with clear answers\n"
+                    "- medium: multi-aspect questions requiring synthesis across observations\n"
+                    "- high: complex behavioral patterns, contradictions, deep analysis\n"
+                    "- max: thorough audit-level analysis, leave no stone unturned"
+                ),
+                "enum": ["minimal", "low", "medium", "high", "max"],
+            },
            "peer": {
                "type": "string",
-                "description": "Which peer to query about: 'user' (default) or 'ai'.",
+                "description": "Peer to query. Built-in aliases: 'user' (default), 'ai'. Or pass any peer ID from this workspace.",
            },
        },
        "required": ["query"],
    },
 }

+CONTEXT_SCHEMA = {
+    "name": "honcho_context",
+    "description": (
+        "Retrieve full session context from Honcho — summary, peer representation, "
+        "peer card, and recent messages. No LLM synthesis. "
+        "Cheaper than honcho_reasoning. Use this to see what Honcho knows about "
+        "the current conversation and the specified peer."
+    ),
+    "parameters": {
+        "type": "object",
+        "properties": {
+            "query": {
+                "type": "string",
+                "description": "Optional focus query to filter context. Omit for full session context snapshot.",
+            },
+            "peer": {
+                "type": "string",
+                "description": "Peer to query. Built-in aliases: 'user' (default), 'ai'. Or pass any peer ID from this workspace.",
+            },
+        },
+        "required": [],
+    },
+}
+
 CONCLUDE_SCHEMA = {
    "name": "honcho_conclude",
    "description": (
-        "Write a conclusion about the user back to Honcho's memory. "
-        "Conclusions are persistent facts that build the user's profile. "
-        "Use when the user states a preference, corrects you, or shares "
-        "something to remember across sessions."
+        "Write or delete a conclusion about a peer in Honcho's memory. "
+        "Conclusions are persistent facts that build a peer's profile. "
+        "You MUST pass exactly one of: `conclusion` (to create) or `delete_id` (to delete). "
+        "Passing neither is an error. "
+        "Deletion is only for PII removal — Honcho self-heals incorrect conclusions over time."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "conclusion": {
                "type": "string",
-                "description": "A factual statement about the user to persist.",
-            }
+                "description": "A factual statement to persist. Provide this when creating a conclusion. Do not send it together with delete_id.",
+            },
+            "delete_id": {
+                "type": "string",
+                "description": "Conclusion ID to delete for PII removal. Provide this when deleting a conclusion. Do not send it together with conclusion.",
+            },
+            "peer": {
+                "type": "string",
+                "description": "Peer to query. Built-in aliases: 'user' (default), 'ai'. Or pass any peer ID from this workspace.",
+            },
        },
-        "required": ["conclusion"],
+        "required": [],
    },
 }


-ALL_TOOL_SCHEMAS = [PROFILE_SCHEMA, SEARCH_SCHEMA, CONTEXT_SCHEMA, CONCLUDE_SCHEMA]
+ALL_TOOL_SCHEMAS = [PROFILE_SCHEMA, SEARCH_SCHEMA, REASONING_SCHEMA, CONTEXT_SCHEMA, CONCLUDE_SCHEMA]


 # ---------------------------------------------------------------------------
@@ -131,16 +198,18 @@ class HonchoMemoryProvider(MemoryProvider):
        # B1: recall_mode — set during initialize from config
        self._recall_mode = "hybrid"  # "context", "tools", or "hybrid"

-        # B4: First-turn context baking
-        self._first_turn_context: Optional[str] = None
-        self._first_turn_lock = threading.Lock()
+        # Base context cache — refreshed on context_cadence, not frozen
+        self._base_context_cache: Optional[str] = None
+        self._base_context_lock = threading.Lock()

        # B5: Cost-awareness turn counting and cadence
        self._turn_count = 0
        self._injection_frequency = "every-turn"  # or "first-turn"
        self._context_cadence = 1   # minimum turns between context API calls
-        self._dialectic_cadence = 1  # minimum turns between dialectic API calls
-        self._reasoning_level_cap: Optional[str] = None  # "minimal", "low", "mid", "high"
+        self._dialectic_cadence = 3  # minimum turns between dialectic API calls
+        self._dialectic_depth = 1   # how many .chat() calls per dialectic cycle (1-3)
+        self._dialectic_depth_levels: list[str] | None = None  # per-pass reasoning levels
+        self._reasoning_level_cap: Optional[str] = None  # "minimal", "low", "medium", "high"
        self._last_context_turn = -999
        self._last_dialectic_turn = -999

@@ -236,9 +305,11 @@ class HonchoMemoryProvider(MemoryProvider):
                raw = cfg.raw or {}
                self._injection_frequency = raw.get("injectionFrequency", "every-turn")
                self._context_cadence = int(raw.get("contextCadence", 1))
-                self._dialectic_cadence = int(raw.get("dialecticCadence", 1))
+                self._dialectic_cadence = int(raw.get("dialecticCadence", 3))
+                self._dialectic_depth = max(1, min(cfg.dialectic_depth, 3))
+                self._dialectic_depth_levels = cfg.dialectic_depth_levels
                cap = raw.get("reasoningLevelCap")
-                if cap and cap in ("minimal", "low", "mid", "high"):
+                if cap and cap in ("minimal", "low", "medium", "high"):
                    self._reasoning_level_cap = cap
            except Exception as e:
                logger.debug("Honcho cost-awareness config parse error: %s", e)
@@ -251,9 +322,7 @@ class HonchoMemoryProvider(MemoryProvider):
            # ----- Port #1957: lazy session init for tools-only mode -----
            if self._recall_mode == "tools":
                if cfg.init_on_session_start:
-                    # Eager init: create session now so sync_turn() works from turn 1.
-                    # Does NOT enable auto-injection — prefetch() still returns empty.
-                    logger.debug("Honcho tools-only mode — eager session init (initOnSessionStart=true)")
+                    # Eager init even in tools mode (opt-in)
                    self._do_session_init(cfg, session_id, **kwargs)
                    return
                # Defer actual session creation until first tool call
@@ -287,8 +356,13 @@ class HonchoMemoryProvider(MemoryProvider):

        # ----- B3: resolve_session_name -----
        session_title = kwargs.get("session_title")
+        gateway_session_key = kwargs.get("gateway_session_key")
        self._session_key = (
-            cfg.resolve_session_name(session_title=session_title, session_id=session_id)
+            cfg.resolve_session_name(
+                session_title=session_title,
+                session_id=session_id,
+                gateway_session_key=gateway_session_key,
+            )
            or session_id
            or "hermes-default"
        )
@@ -299,12 +373,21 @@ class HonchoMemoryProvider(MemoryProvider):
        self._session_initialized = True

        # ----- B6: Memory file migration (one-time, for new sessions) -----
+        # Skip under per-session strategy: every Hermes run creates a fresh
+        # Honcho session by design, so uploading MEMORY.md/USER.md/SOUL.md to
+        # each one would flood the backend with short-lived duplicates instead
+        # of performing a one-time migration.
        try:
-            if not session.messages:
+            if not session.messages and cfg.session_strategy != "per-session":
                from hermes_constants import get_hermes_home
                mem_dir = str(get_hermes_home() / "memories")
                self._manager.migrate_memory_files(self._session_key, mem_dir)
                logger.debug("Honcho memory file migration attempted for new session: %s", self._session_key)
+            elif cfg.session_strategy == "per-session":
+                logger.debug(
+                    "Honcho memory file migration skipped: per-session strategy creates a fresh session per run (%s)",
+                    self._session_key,
+                )
        except Exception as e:
            logger.debug("Honcho memory file migration skipped: %s", e)

@@ -347,6 +430,11 @@ class HonchoMemoryProvider(MemoryProvider):
        """Format the prefetch context dict into a readable system prompt block."""
        parts = []

+        # Session summary — session-scoped context, placed first for relevance
+        summary = ctx.get("summary", "")
+        if summary:
+            parts.append(f"## Session Summary\n{summary}")
+
        rep = ctx.get("representation", "")
        if rep:
            parts.append(f"## User Representation\n{rep}")
@@ -370,9 +458,9 @@ class HonchoMemoryProvider(MemoryProvider):
    def system_prompt_block(self) -> str:
        """Return system prompt text, adapted by recall_mode.

-        B4: On the FIRST call, fetch and bake the full Honcho context
-        (user representation, peer card, AI representation, continuity synthesis).
-        Subsequent calls return the cached block for prompt caching stability.
+        Returns only the mode header and tool instructions — static text
+        that doesn't change between turns (prompt-cache friendly).
+        Live context (representation, card) is injected via prefetch().
        """
        if self._cron_skipped:
            return ""
@@ -382,24 +470,10 @@ class HonchoMemoryProvider(MemoryProvider):
                return (
                    "# Honcho Memory\n"
                    "Active (tools-only mode). Use honcho_profile, honcho_search, "
-                    "honcho_context, and honcho_conclude tools to access user memory."
+                    "honcho_reasoning, honcho_context, and honcho_conclude tools to access user memory."
                )
            return ""

-        # ----- B4: First-turn context baking -----
-        first_turn_block = ""
-        if self._recall_mode in ("context", "hybrid"):
-            with self._first_turn_lock:
-                if self._first_turn_context is None:
-                    # First call — fetch and cache
-                    try:
-                        ctx = self._manager.get_prefetch_context(self._session_key)
-                        self._first_turn_context = self._format_first_turn_context(ctx) if ctx else ""
-                    except Exception as e:
-                        logger.debug("Honcho first-turn context fetch failed: %s", e)
-                        self._first_turn_context = ""
-                first_turn_block = self._first_turn_context
-
        # ----- B1: adapt text based on recall_mode -----
        if self._recall_mode == "context":
            header = (
@@ -412,7 +486,8 @@ class HonchoMemoryProvider(MemoryProvider):
            header = (
                "# Honcho Memory\n"
                "Active (tools-only mode). Use honcho_profile for a quick factual snapshot, "
-                "honcho_search for raw excerpts, honcho_context for synthesized answers, "
+                "honcho_search for raw excerpts, honcho_context for raw peer context, "
+                "honcho_reasoning for synthesized answers, "
                "honcho_conclude to save facts about the user. "
                "No automatic context injection — you must use tools to access memory."
            )
@@ -421,16 +496,19 @@ class HonchoMemoryProvider(MemoryProvider):
                "# Honcho Memory\n"
                "Active (hybrid mode). Relevant context is auto-injected AND memory tools are available. "
                "Use honcho_profile for a quick factual snapshot, "
-                "honcho_search for raw excerpts, honcho_context for synthesized answers, "
+                "honcho_search for raw excerpts, honcho_context for raw peer context, "
+                "honcho_reasoning for synthesized answers, "
                "honcho_conclude to save facts about the user."
            )

-        if first_turn_block:
-            return f"{header}\n\n{first_turn_block}"
        return header

    def prefetch(self, query: str, *, session_id: str = "") -> str:
-        """Return prefetched dialectic context from background thread.
+        """Return base context (representation + card) plus dialectic supplement.
+
+        Assembles two layers:
+        1. Base context from peer.context() — cached, refreshed on context_cadence
+        2. Dialectic supplement — cached, refreshed on dialectic_cadence

        B1: Returns empty when recall_mode is "tools" (no injection).
        B5: Respects injection_frequency — "first-turn" returns cached/empty after turn 0.
@@ -443,22 +521,95 @@ class HonchoMemoryProvider(MemoryProvider):
        if self._recall_mode == "tools":
            return ""

-        # B5: injection_frequency — if "first-turn" and past first turn, return empty
-        if self._injection_frequency == "first-turn" and self._turn_count > 0:
+        # B5: injection_frequency — if "first-turn" and past first turn, return empty.
+        # _turn_count is 1-indexed (first user message = 1), so > 1 means "past first".
+        if self._injection_frequency == "first-turn" and self._turn_count > 1:
            return ""

+        parts = []
+
+        # ----- Layer 1: Base context (representation + card) -----
+        # On first call, fetch synchronously so turn 1 isn't empty.
+        # After that, serve from cache and refresh in background on cadence.
+        with self._base_context_lock:
+            if self._base_context_cache is None:
+                # First call — synchronous fetch
+                try:
+                    ctx = self._manager.get_prefetch_context(self._session_key)
+                    self._base_context_cache = self._format_first_turn_context(ctx) if ctx else ""
+                    self._last_context_turn = self._turn_count
+                except Exception as e:
+                    logger.debug("Honcho base context fetch failed: %s", e)
+                    self._base_context_cache = ""
+            base_context = self._base_context_cache
+
+        # Check if background context prefetch has a fresher result
+        if self._manager:
+            fresh_ctx = self._manager.pop_context_result(self._session_key)
+            if fresh_ctx:
+                formatted = self._format_first_turn_context(fresh_ctx)
+                if formatted:
+                    with self._base_context_lock:
+                        self._base_context_cache = formatted
+                    base_context = formatted
+
+        if base_context:
+            parts.append(base_context)
+
+        # ----- Layer 2: Dialectic supplement -----
+        # On the very first turn, no queue_prefetch() has run yet so the
+        # dialectic result is empty.  Run with a bounded timeout so a slow
+        # Honcho connection doesn't block the first response indefinitely.
+        # On timeout the result is skipped and queue_prefetch() will pick it
+        # up at the next cadence-allowed turn.
+        if self._last_dialectic_turn == -999 and query:
+            _first_turn_timeout = (
+                self._config.timeout if self._config and self._config.timeout else 8.0
+            )
+            _result_holder: list[str] = []
+
+            def _run_first_turn() -> None:
+                try:
+                    _result_holder.append(self._run_dialectic_depth(query))
+                except Exception as exc:
+                    logger.debug("Honcho first-turn dialectic failed: %s", exc)
+
+            _t = threading.Thread(target=_run_first_turn, daemon=True)
+            _t.start()
+            _t.join(timeout=_first_turn_timeout)
+            if not _t.is_alive():
+                first_turn_dialectic = _result_holder[0] if _result_holder else ""
+                if first_turn_dialectic and first_turn_dialectic.strip():
+                    with self._prefetch_lock:
+                        self._prefetch_result = first_turn_dialectic
+                self._last_dialectic_turn = self._turn_count
+            else:
+                logger.debug(
+                    "Honcho first-turn dialectic timed out (%.1fs) — "
+                    "will inject at next cadence-allowed turn",
+                    _first_turn_timeout,
+                )
+                # Don't update _last_dialectic_turn: queue_prefetch() will
+                # retry at the next cadence-allowed turn via the async path.
+
        if self._prefetch_thread and self._prefetch_thread.is_alive():
            self._prefetch_thread.join(timeout=3.0)
        with self._prefetch_lock:
-            result = self._prefetch_result
+            dialectic_result = self._prefetch_result
            self._prefetch_result = ""
-        if not result:
+
+        if dialectic_result and dialectic_result.strip():
+            parts.append(dialectic_result)
+
+        if not parts:
            return ""

+        result = "\n\n".join(parts)
+
        # ----- Port #3265: token budget enforcement -----
        result = self._truncate_to_budget(result)

-        return f"## Honcho Context\n{result}"
+        return result
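
Review note: the first-turn path above bounds a blocking call by running it in a daemon thread and joining with a timeout. The following is a minimal standalone sketch of that pattern, not the patch's code; `run_with_timeout` and `slow_call` are hypothetical names introduced only for illustration.

```python
import threading
import time
from typing import Callable


def run_with_timeout(fn: Callable[[], str], timeout: float) -> tuple[bool, str]:
    """Run fn in a daemon thread and return (finished, result).

    If fn has not finished after `timeout` seconds, (False, "") is returned
    and the thread keeps running in the background; its late result is
    discarded because nothing reads the holder afterwards.
    """
    holder: list[str] = []

    def _worker() -> None:
        try:
            holder.append(fn())
        except Exception:
            # As in the patch, a failure is treated as "no result".
            pass

    t = threading.Thread(target=_worker, daemon=True)
    t.start()
    t.join(timeout=timeout)
    if t.is_alive():
        return False, ""
    return True, (holder[0] if holder else "")


def slow_call() -> str:
    """Hypothetical stand-in for a slow remote call."""
    time.sleep(0.5)
    return "late"


fast = run_with_timeout(lambda: "ready", timeout=1.0)
slow = run_with_timeout(slow_call, timeout=0.05)
```

Because the worker is a daemon thread, a call that never returns does not keep the process alive at exit. The trade-off is that the work cannot be cancelled; it can only be abandoned.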

    def _truncate_to_budget(self, text: str) -> str:
        """Truncate text to fit within context_tokens budget if set."""
@@ -475,9 +626,11 @@ class HonchoMemoryProvider(MemoryProvider):
        return truncated + " …"

    def queue_prefetch(self, query: str, *, session_id: str = "") -> None:
-        """Fire a background dialectic query for the upcoming turn.
+        """Fire background prefetch threads for the upcoming turn.

-        B5: Checks cadence before firing background threads.
+        B5: Checks cadence independently for dialectic and context refresh.
+        Context refresh updates the base layer (representation + card).
+        Dialectic fires the LLM reasoning supplement.
        """
        if self._cron_skipped:
            return
@@ -488,6 +641,15 @@ class HonchoMemoryProvider(MemoryProvider):
        if self._recall_mode == "tools":
            return

+        # ----- Context refresh (base layer) — independent cadence -----
+        if self._context_cadence <= 1 or (self._turn_count - self._last_context_turn) >= self._context_cadence:
+            self._last_context_turn = self._turn_count
+            try:
+                self._manager.prefetch_context(self._session_key, query)
+            except Exception as e:
+                logger.debug("Honcho context prefetch failed: %s", e)
+
+        # ----- Dialectic prefetch (supplement layer) -----
        # B5: cadence check — skip if too soon since last dialectic call
        if self._dialectic_cadence > 1:
            if (self._turn_count - self._last_dialectic_turn) < self._dialectic_cadence:
@@ -499,9 +661,7 @@ class HonchoMemoryProvider(MemoryProvider):

        def _run():
            try:
-                result = self._manager.dialectic_query(
-                    self._session_key, query, peer="user"
-                )
+                result = self._run_dialectic_depth(query)
                if result and result.strip():
                    with self._prefetch_lock:
                        self._prefetch_result = result
@@ -513,13 +673,140 @@ class HonchoMemoryProvider(MemoryProvider):
        )
        self._prefetch_thread.start()

-        # Also fire context prefetch if cadence allows
-        if self._context_cadence <= 1 or (self._turn_count - self._last_context_turn) >= self._context_cadence:
-            self._last_context_turn = self._turn_count
-            try:
-                self._manager.prefetch_context(self._session_key, query)
-            except Exception as e:
-                logger.debug("Honcho context prefetch failed: %s", e)
+    # ----- Dialectic depth: multi-pass .chat() with cold/warm prompts -----
+
+    # Proportional reasoning levels per depth/pass when dialecticDepthLevels
+    # is not configured. The base level is dialecticReasoningLevel.
+    # Index: (depth, pass) → level relative to base.
+    _PROPORTIONAL_LEVELS: dict[tuple[int, int], str] = {
+        # depth 1: single pass at base level
+        (1, 0): "base",
+        # depth 2: pass 0 lighter, pass 1 at base
+        (2, 0): "minimal",
+        (2, 1): "base",
+        # depth 3: pass 0 lighter, pass 1 at base, pass 2 one above minimal
+        (3, 0): "minimal",
+        (3, 1): "base",
+        (3, 2): "low",
+    }
+
+    _LEVEL_ORDER = ("minimal", "low", "medium", "high", "max")
+
+    def _resolve_pass_level(self, pass_idx: int) -> str:
+        """Resolve reasoning level for a given pass index.
+
+        Uses dialecticDepthLevels if configured, otherwise proportional
+        defaults relative to dialecticReasoningLevel.
+        """
+        if self._dialectic_depth_levels and pass_idx < len(self._dialectic_depth_levels):
+            return self._dialectic_depth_levels[pass_idx]
+
+        base = (self._config.dialectic_reasoning_level if self._config else "low")
+        mapping = self._PROPORTIONAL_LEVELS.get((self._dialectic_depth, pass_idx))
+        if mapping is None or mapping == "base":
+            return base
+        return mapping
+
+    def _build_dialectic_prompt(self, pass_idx: int, prior_results: list[str], is_cold: bool) -> str:
+        """Build the prompt for a given dialectic pass.
+
+        Pass 0: cold start (general user query) or warm (session-scoped).
+        Pass 1: self-audit / targeted synthesis against gaps from pass 0.
+        Pass 2: reconciliation / contradiction check across prior passes.
+        """
+        if pass_idx == 0:
+            if is_cold:
+                return (
+                    "Who is this person? What are their preferences, goals, "
+                    "and working style? Focus on facts that would help an AI "
+                    "assistant be immediately useful."
+                )
+            return (
+                "Given what's been discussed in this session so far, what "
+                "context about this user is most relevant to the current "
+                "conversation? Prioritize active context over biographical facts."
+            )
+        elif pass_idx == 1:
+            prior = prior_results[-1] if prior_results else ""
+            return (
+                f"Given this initial assessment:\n\n{prior}\n\n"
+                "What gaps remain in your understanding that would help "
+                "going forward? Synthesize what you actually know about "
+                "the user's current state and immediate needs, grounded "
+                "in evidence from recent sessions."
+            )
+        else:
+            # pass 2: reconciliation
+            return (
+                f"Prior passes produced:\n\n"
+                f"Pass 1:\n{prior_results[0] if len(prior_results) > 0 else '(empty)'}\n\n"
+                f"Pass 2:\n{prior_results[1] if len(prior_results) > 1 else '(empty)'}\n\n"
+                "Do these assessments cohere? Reconcile any contradictions "
+                "and produce a final, concise synthesis of what matters most "
+                "for the current conversation."
+            )
+
+    @staticmethod
+    def _signal_sufficient(result: str) -> bool:
+        """Check if a dialectic pass returned enough signal to skip further passes.
+
+        Heuristic: at least 100 chars with some structure (section headers,
+        bullets, or an ordered list), or more than 300 chars of any shape.
+        """
+        if not result or len(result.strip()) < 100:
+            return False
+        # Structured output with sections/bullets is strong signal
+        if "\n" in result and (
+            "##" in result
+            or "•" in result
+            or re.search(r"^[*-] ", result, re.MULTILINE)
+            or re.search(r"^\s*\d+\. ", result, re.MULTILINE)
+        ):
+            return True
+        # Long enough even without structure
+        return len(result.strip()) > 300
+
+    def _run_dialectic_depth(self, query: str) -> str:
+        """Execute up to dialecticDepth .chat() calls with conditional bail-out.
+
+        Cold start (no base context): general user-oriented query.
+        Warm session (base context exists): session-scoped query.
+        Each pass is conditional — bails early if prior pass returned strong signal.
+        Returns the last non-empty result (the deepest pass that produced output).
+        """
+        if not self._manager or not self._session_key:
+            return ""
+
+        is_cold = not self._base_context_cache
+        results: list[str] = []
+
+        for i in range(self._dialectic_depth):
+            if i == 0:
+                prompt = self._build_dialectic_prompt(0, results, is_cold)
+            else:
+                # Skip further passes if prior pass delivered strong signal
+                if results and self._signal_sufficient(results[-1]):
+                    logger.debug("Honcho dialectic depth %d: pass %d skipped, prior signal sufficient",
+                                 self._dialectic_depth, i)
+                    break
+                prompt = self._build_dialectic_prompt(i, results, is_cold)
+
+            level = self._resolve_pass_level(i)
+            logger.debug("Honcho dialectic depth %d: pass %d, level=%s, cold=%s",
+                         self._dialectic_depth, i, level, is_cold)
+
+            result = self._manager.dialectic_query(
+                self._session_key, prompt,
+                reasoning_level=level,
+                peer="user",
+            )
+            results.append(result or "")
+
+        # Return the last non-empty result (deepest pass that ran)
+        for r in reversed(results):
+            if r and r.strip():
+                return r
+        return ""
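
Review note: the bail-out behaviour of `_run_dialectic_depth` can be exercised in isolation. The sketch below restates the `_signal_sufficient` heuristic as a free function and pairs it with a simplified pass loop; `run_passes` and the `ask` callback are hypothetical stand-ins for the real loop and for `dialectic_query`, so treat this as an illustration under those assumptions.

```python
import re
from typing import Callable


def signal_sufficient(result: str) -> bool:
    """Restatement of the heuristic: structure at >=100 chars, or >300 chars."""
    text = result.strip() if result else ""
    if len(text) < 100:
        return False
    if "\n" in result and (
        "##" in result
        or "•" in result
        or re.search(r"^[*-] ", result, re.MULTILINE)
        or re.search(r"^\s*\d+\. ", result, re.MULTILINE)
    ):
        return True
    return len(text) > 300


def run_passes(depth: int, ask: Callable[[int], str]) -> list[str]:
    """Run up to `depth` passes, stopping once a pass returns strong signal."""
    results: list[str] = []
    for i in range(depth):
        if i > 0 and signal_sufficient(results[-1]):
            break
        results.append(ask(i) or "")
    return results


# A structured answer: a header plus five bullet lines.
structured = "## Preferences\n" + "\n".join(
    f"- fact number {n} about the user" for n in range(5)
)
short = "Not much known yet."

calls_structured = run_passes(3, lambda i: structured)
calls_short = run_passes(3, lambda i: short)
```

With a structured first answer only one pass runs even at depth 3, while a thin answer lets all three passes run.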

    def on_turn_start(self, turn_number: int, message: str, **kwargs) -> None:
        """Track turn count for cadence and injection_frequency logic."""
@@ -659,7 +946,14 @@ class HonchoMemoryProvider(MemoryProvider):

        try:
            if tool_name == "honcho_profile":
-                card = self._manager.get_peer_card(self._session_key)
+                peer = args.get("peer", "user")
+                card_update = args.get("card")
+                if card_update:
+                    result = self._manager.set_peer_card(self._session_key, card_update, peer=peer)
+                    if result is None:
+                        return tool_error("Failed to update peer card.")
+                    return json.dumps({"result": f"Peer card updated ({len(result)} facts).", "card": result})
+                card = self._manager.get_peer_card(self._session_key, peer=peer)
                if not card:
                    return json.dumps({"result": "No profile facts available yet."})
                return json.dumps({"result": card})
@@ -669,30 +963,68 @@ class HonchoMemoryProvider(MemoryProvider):
                if not query:
                    return tool_error("Missing required parameter: query")
                max_tokens = min(int(args.get("max_tokens", 800)), 2000)
+                peer = args.get("peer", "user")
                result = self._manager.search_context(
-                    self._session_key, query, max_tokens=max_tokens
+                    self._session_key, query, max_tokens=max_tokens, peer=peer
                )
                if not result:
                    return json.dumps({"result": "No relevant context found."})
                return json.dumps({"result": result})

-            elif tool_name == "honcho_context":
+            elif tool_name == "honcho_reasoning":
                query = args.get("query", "")
                if not query:
                    return tool_error("Missing required parameter: query")
                peer = args.get("peer", "user")
+                reasoning_level = args.get("reasoning_level")
                result = self._manager.dialectic_query(
-                    self._session_key, query, peer=peer
+                    self._session_key, query,
+                    reasoning_level=reasoning_level,
+                    peer=peer,
                )
+                # Update cadence tracker so auto-injection respects the gap after an explicit call
+                self._last_dialectic_turn = self._turn_count
                return json.dumps({"result": result or "No result from Honcho."})

+            elif tool_name == "honcho_context":
+                peer = args.get("peer", "user")
+                ctx = self._manager.get_session_context(self._session_key, peer=peer)
+                if not ctx:
+                    return json.dumps({"result": "No context available yet."})
+                parts = []
+                if ctx.get("summary"):
+                    parts.append(f"## Summary\n{ctx['summary']}")
+                if ctx.get("representation"):
+                    parts.append(f"## Representation\n{ctx['representation']}")
+                if ctx.get("card"):
+                    parts.append(f"## Card\n{ctx['card']}")
+                if ctx.get("recent_messages"):
+                    msgs = ctx["recent_messages"]
+                    msg_str = "\n".join(
+                        f"  [{m['role']}] {m['content'][:200]}"
+                        for m in msgs[-5:]  # last 5 for brevity
+                    )
+                    parts.append(f"## Recent messages\n{msg_str}")
+                return json.dumps({"result": "\n\n".join(parts) or "No context available."})
+
            elif tool_name == "honcho_conclude":
-                conclusion = args.get("conclusion", "")
-                if not conclusion:
-                    return tool_error("Missing required parameter: conclusion")
-                ok = self._manager.create_conclusion(self._session_key, conclusion)
+                delete_id = (args.get("delete_id") or "").strip()
+                conclusion = (args.get("conclusion") or "").strip()
+                peer = args.get("peer", "user")
+
+                has_delete_id = bool(delete_id)
+                has_conclusion = bool(conclusion)
+                if has_delete_id == has_conclusion:
+                    return tool_error("Exactly one of conclusion or delete_id must be provided.")
+
+                if has_delete_id:
+                    ok = self._manager.delete_conclusion(self._session_key, delete_id, peer=peer)
+                    if ok:
+                        return json.dumps({"result": f"Conclusion {delete_id} deleted."})
+                    return tool_error(f"Failed to delete conclusion {delete_id}.")
+                ok = self._manager.create_conclusion(self._session_key, conclusion, peer=peer)
                if ok:
-                    return json.dumps({"result": f"Conclusion saved: {conclusion}"})
+                    return json.dumps({"result": f"Conclusion saved for {peer}: {conclusion}"})
                return tool_error("Failed to save conclusion.")

            return tool_error(f"Unknown tool: {tool_name}")
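
Review note: `honcho_conclude` now requires exactly one of `conclusion` or `delete_id`, enforced by comparing two booleans for equality. A minimal sketch of that check follows; `pick_action` is a hypothetical helper, and the sketch normalizes both fields with `or ""` so that an explicit `null` from the caller is treated as absent.

```python
def pick_action(args: dict) -> str:
    """Return "create", "delete", or "error" for a conclude-style call."""
    delete_id = (args.get("delete_id") or "").strip()
    conclusion = (args.get("conclusion") or "").strip()

    # Equal truthiness means either both were given or neither was.
    if bool(delete_id) == bool(conclusion):
        return "error"
    return "delete" if delete_id else "create"


neither = pick_action({})
both = pick_action({"conclusion": "likes tea", "delete_id": "abc"})
create = pick_action({"conclusion": "likes tea"})
delete = pick_action({"delete_id": "abc", "conclusion": None})
blank = pick_action({"conclusion": "   "})
```

Stripping before the comparison means a whitespace-only value counts as missing, which matches the intent of the handler above.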
@@ -440,11 +440,43 @@ def cmd_setup(args) -> None:
    if new_recall in ("hybrid", "context", "tools"):
        hermes_host["recallMode"] = new_recall

-    # --- 7. Session strategy ---
-    current_strat = hermes_host.get("sessionStrategy") or cfg.get("sessionStrategy", "per-directory")
+    # --- 7. Context token budget ---
+    current_ctx_tokens = hermes_host.get("contextTokens") or cfg.get("contextTokens")
+    current_display = str(current_ctx_tokens) if current_ctx_tokens else "uncapped"
+    print("\n  Context injection per turn (hybrid/context recall modes only):")
+    print("    uncapped -- no limit (default)")
+    print("    N        -- token limit per turn (e.g. 1200)")
+    new_ctx_tokens = _prompt("Context tokens", default=current_display)
+    if new_ctx_tokens.strip().lower() in ("none", "uncapped", "no limit"):
+        hermes_host.pop("contextTokens", None)
+    elif new_ctx_tokens.strip() == "":
+        pass  # keep current
+    else:
+        try:
+            val = int(new_ctx_tokens)
+            if val >= 0:
+                hermes_host["contextTokens"] = val
+        except (ValueError, TypeError):
+            pass  # keep current
+
+    # --- 7b. Dialectic cadence ---
+    current_dialectic = str(hermes_host.get("dialecticCadence") or cfg.get("dialecticCadence") or "3")
+    print("\n  Dialectic cadence:")
+    print("    How often Honcho rebuilds its user model (LLM call on Honcho backend).")
+    print("    1 = every turn (aggressive), 3 = every 3 turns (recommended), 5+ = sparse.")
+    new_dialectic = _prompt("Dialectic cadence", default=current_dialectic)
+    try:
+        val = int(new_dialectic)
+        if val >= 1:
+            hermes_host["dialecticCadence"] = val
+    except (ValueError, TypeError):
+        hermes_host["dialecticCadence"] = 3
+
+    # --- 8. Session strategy ---
+    current_strat = hermes_host.get("sessionStrategy") or cfg.get("sessionStrategy", "per-session")
    print("\n  Session strategy:")
-    print("    per-directory -- one session per working directory (default)")
-    print("    per-session   -- new Honcho session each run")
+    print("    per-session   -- each run starts clean, Honcho injects context automatically")
+    print("    per-directory -- reuses session per dir, prior context auto-injected each run")
    print("    per-repo      -- one session per git repository")
    print("    global        -- single session across all directories")
    new_strat = _prompt("Session strategy", default=current_strat)
@@ -490,10 +522,11 @@ def cmd_setup(args) -> None:
    print(f"  Recall:    {hcfg.recall_mode}")
    print(f"  Sessions:  {hcfg.session_strategy}")
    print("\n  Honcho tools available in chat:")
-    print("    honcho_context   -- ask Honcho about the user (LLM-synthesized)")
-    print("    honcho_search    -- semantic search over history (no LLM)")
-    print("    honcho_profile   -- peer card, key facts (no LLM)")
-    print("    honcho_conclude  -- persist a user fact to memory (no LLM)")
+    print("    honcho_context   -- session context: summary, representation, card, messages")
+    print("    honcho_search    -- semantic search over history")
+    print("    honcho_profile   -- peer card, key facts")
+    print("    honcho_reasoning -- ask Honcho a question, synthesized answer")
+    print("    honcho_conclude  -- persist a user fact to memory")
    print("\n  Other commands:")
    print("    hermes honcho status     -- show full config")
    print("    hermes honcho mode       -- change recall/observation mode")
@@ -585,13 +618,26 @@ def cmd_status(args) -> None:
    print(f"  Enabled:        {hcfg.enabled}")
    print(f"  API key:        {masked}")
    print(f"  Workspace:      {hcfg.workspace_id}")
-    print(f"  Config path:    {active_path}")
+
+    # Config paths — show where config was read from and where writes go
+    global_path = Path.home() / ".honcho" / "config.json"
+    print(f"  Config:         {active_path}")
    if write_path != active_path:
-        print(f"  Write path:     {write_path}  (instance-local)")
+        print(f"  Write to:       {write_path}  (profile-local)")
+    if active_path == global_path:
+        print(f"  Fallback:       (none — using global ~/.honcho/config.json)")
+    elif global_path.exists():
+        print(f"  Fallback:       {global_path}  (exists, cross-app interop)")
+
    print(f"  AI peer:        {hcfg.ai_peer}")
    print(f"  User peer:      {hcfg.peer_name or 'not set'}")
    print(f"  Session key:    {hcfg.resolve_session_name()}")
+    print(f"  Session strat:  {hcfg.session_strategy}")
    print(f"  Recall mode:    {hcfg.recall_mode}")
+    print(f"  Context budget: {hcfg.context_tokens or '(uncapped)'} tokens")
+    raw = getattr(hcfg, "raw", None) or {}
+    dialectic_cadence = raw.get("dialecticCadence") or 3
+    print(f"  Dialectic cad:  every {dialectic_cadence} turn{'s' if dialectic_cadence != 1 else ''}")
    print(f"  Observation:    user(me={hcfg.user_observe_me},others={hcfg.user_observe_others}) ai(me={hcfg.ai_observe_me},others={hcfg.ai_observe_others})")
    print(f"  Write freq:     {hcfg.write_frequency}")

@@ -599,8 +645,8 @@ def cmd_status(args) -> None:
        print("\n  Connection... ", end="", flush=True)
        try:
            client = get_honcho_client(hcfg)
-            print("OK")
            _show_peer_cards(hcfg, client)
+            print("OK")
        except Exception as e:
            print(f"FAILED ({e})\n")
    else:
@@ -824,6 +870,41 @@ def cmd_mode(args) -> None:
    print(f"  {label}Recall mode -> {mode_arg}  ({MODES[mode_arg]})\n")


+def cmd_strategy(args) -> None:
+    """Show or set the session strategy."""
+    STRATEGIES = {
+        "per-session": "each run starts clean, Honcho injects context automatically",
+        "per-directory": "reuses session per dir, prior context auto-injected each run",
+        "per-repo": "one session per git repository",
+        "global": "single session across all directories",
+    }
+    cfg = _read_config()
+    strat_arg = getattr(args, "strategy", None)
+
+    if strat_arg is None:
+        current = (
+            (cfg.get("hosts") or {}).get(_host_key(), {}).get("sessionStrategy")
+            or cfg.get("sessionStrategy")
+            or "per-session"
+        )
+        print("\nHoncho session strategy\n" + "─" * 40)
+        for s, desc in STRATEGIES.items():
+            marker = " <-" if s == current else ""
+            print(f"  {s:<15}  {desc}{marker}")
+        print("\n  Set with: hermes honcho strategy [per-session|per-directory|per-repo|global]\n")
+        return
+
+    if strat_arg not in STRATEGIES:
+        print(f"  Invalid strategy '{strat_arg}'. Options: {', '.join(STRATEGIES)}\n")
+        return
+
+    host = _host_key()
+    label = f"[{host}] " if host != "hermes" else ""
+    cfg.setdefault("hosts", {}).setdefault(host, {})["sessionStrategy"] = strat_arg
+    _write_config(cfg)
+    print(f"  {label}Session strategy -> {strat_arg}  ({STRATEGIES[strat_arg]})\n")
+
+
 def cmd_tokens(args) -> None:
    """Show or set token budget settings."""
    cfg = _read_config()
@@ -1143,10 +1224,11 @@ def cmd_migrate(args) -> None:
    print("              automatically. Files become the seed, not the live store.")
    print()
    print("  Honcho tools (available to the agent during conversation)")
-    print("    honcho_context   — ask Honcho a question, get a synthesized answer (LLM)")
-    print("    honcho_search        — semantic search over stored context (no LLM)")
-    print("    honcho_profile       — fast peer card snapshot (no LLM)")
-    print("    honcho_conclude      — write a conclusion/fact back to memory (no LLM)")
+    print("    honcho_context   — session context: summary, representation, card, messages")
+    print("    honcho_search        — semantic search over stored context")
+    print("    honcho_profile       — fast peer card snapshot")
+    print("    honcho_reasoning     — ask Honcho a question, synthesized answer")
+    print("    honcho_conclude      — write a conclusion/fact back to memory")
    print()
    print("  Session naming")
    print("    OpenClaw: no persistent session concept — files are global.")
@@ -1197,6 +1279,8 @@ def honcho_command(args) -> None:
        cmd_peer(args)
    elif sub == "mode":
        cmd_mode(args)
+    elif sub == "strategy":
+        cmd_strategy(args)
    elif sub == "tokens":
        cmd_tokens(args)
    elif sub == "identity":
@@ -1211,7 +1295,7 @@ def honcho_command(args) -> None:
        cmd_sync(args)
    else:
        print(f"  Unknown honcho command: {sub}")
-        print("  Available: status, sessions, map, peer, mode, tokens, identity, migrate, enable, disable, sync\n")
+        print("  Available: status, sessions, map, peer, mode, strategy, tokens, identity, migrate, enable, disable, sync\n")


 def register_cli(subparser) -> None:
@@ -1270,6 +1354,15 @@ def register_cli(subparser) -> None:
        help="Recall mode to set (hybrid/context/tools). Omit to show current.",
    )

+    strategy_parser = subs.add_parser(
+        "strategy", help="Show or set session strategy (per-session/per-directory/per-repo/global)",
+    )
+    strategy_parser.add_argument(
+        "strategy", nargs="?", metavar="STRATEGY",
+        choices=("per-session", "per-directory", "per-repo", "global"),
+        help="Session strategy to set. Omit to show current.",
+    )
+
    tokens_parser = subs.add_parser(
        "tokens", help="Show or set token budget for context and dialectic",
    )
@@ -94,6 +94,68 @@ def _resolve_bool(host_val, root_val, *, default: bool) -> bool:
    return default


+def _parse_context_tokens(host_val, root_val) -> int | None:
+    """Parse contextTokens: host wins, then root, then None (uncapped)."""
+    for val in (host_val, root_val):
+        if val is not None:
+            try:
+                return int(val)
+            except (ValueError, TypeError):
+                pass
+    return None
+
+
+def _parse_dialectic_depth(host_val, root_val) -> int:
+    """Parse dialecticDepth: host wins, then root, then 1. Clamped to 1-3."""
+    for val in (host_val, root_val):
+        if val is not None:
+            try:
+                return max(1, min(int(val), 3))
+            except (ValueError, TypeError):
+                pass
+    return 1
+
+
+_VALID_REASONING_LEVELS = ("minimal", "low", "medium", "high", "max")
+
+
+def _parse_dialectic_depth_levels(host_val, root_val, depth: int) -> list[str] | None:
+    """Parse dialecticDepthLevels: optional array of reasoning levels per pass.
+
+    Returns None when not configured (use proportional defaults).
+    When configured, validates each level and truncates/pads to match depth.
+    """
+    for val in (host_val, root_val):
+        if isinstance(val, list):
+            levels = [
+                lvl if lvl in _VALID_REASONING_LEVELS else "low"
+                for lvl in val[:depth]
+            ]
+            # Pad with "low" if array is shorter than depth
+            while len(levels) < depth:
+                levels.append("low")
+            return levels
+    return None
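
Review note: the parsers above share one pattern: try the host value, fall back to the root value, then apply a default. The sketch below mirrors `_parse_dialectic_depth` and `_parse_dialectic_depth_levels` under shortened, hypothetical names (`parse_depth`, `parse_depth_levels`) to show how clamping, truncation, and padding behave on sample inputs.

```python
VALID_LEVELS = ("minimal", "low", "medium", "high", "max")


def parse_depth(host_val, root_val) -> int:
    """Host wins, then root, then 1; the result is clamped to 1..3."""
    for val in (host_val, root_val):
        if val is not None:
            try:
                return max(1, min(int(val), 3))
            except (ValueError, TypeError):
                pass  # unparsable: fall through to the next candidate
    return 1


def parse_depth_levels(host_val, root_val, depth: int):
    """Return `depth` validated levels, or None when no list is configured."""
    for val in (host_val, root_val):
        if isinstance(val, list):
            # Truncate to depth and replace unknown levels with "low".
            levels = [lvl if lvl in VALID_LEVELS else "low" for lvl in val[:depth]]
            # Pad with "low" when the list is shorter than depth.
            levels += ["low"] * (depth - len(levels))
            return levels
    return None


clamped = parse_depth(None, "5")
fallback = parse_depth("abc", 2)
padded = parse_depth_levels(["high"], None, 3)
truncated = parse_depth_levels(["bogus", "max", "medium", "high"], None, 2)
unset = parse_depth_levels(None, None, 2)
```

Note that an unparsable host value does not block the root value, so a typo in a host block degrades to the root setting and not straight to the default.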
+
+
+def _resolve_optional_float(*values: Any) -> float | None:
+    """Return the first value that parses to a positive float, else None."""
+    for value in values:
+        if value is None:
+            continue
+        if isinstance(value, str):
+            value = value.strip()
+            if not value:
+                continue
+        try:
+            parsed = float(value)
+        except (TypeError, ValueError):
+            continue
+        if parsed > 0:
+            return parsed
+    return None
+
+
 _VALID_OBSERVATION_MODES = {"unified", "directional"}
 _OBSERVATION_MODE_ALIASES = {"shared": "unified", "separate": "directional", "cross": "directional"}

@@ -159,6 +221,8 @@ class HonchoClientConfig:
    environment: str = "production"
    # Optional base URL for self-hosted Honcho (overrides environment mapping)
    base_url: str | None = None
+    # Optional request timeout in seconds for Honcho SDK HTTP calls
+    timeout: float | None = None
    # Identity
    peer_name: str | None = None
    ai_peer: str = "hermes"
@@ -168,17 +232,25 @@ class HonchoClientConfig:
    # Write frequency: "async" (background thread), "turn" (sync per turn),
    # "session" (flush on session end), or int (every N turns)
    write_frequency: str | int = "async"
-    # Prefetch budget
+    # Prefetch budget (None = no cap; set to an integer to bound auto-injected context)
    context_tokens: int | None = None
    # Dialectic (peer.chat) settings
    # reasoning_level: "minimal" | "low" | "medium" | "high" | "max"
    dialectic_reasoning_level: str = "low"
-    # dynamic: auto-bump reasoning level based on query length
-    #   true  — low->medium (120+ chars), low->high (400+ chars), capped at "high"
-    #   false — always use dialecticReasoningLevel as-is
+    # When true, the model can override reasoning_level per-call via the
+    # honcho_reasoning tool param (agentic). When false, always uses
+    # dialecticReasoningLevel and ignores model-provided overrides.
    dialectic_dynamic: bool = True
    # Max chars of dialectic result to inject into Hermes system prompt
    dialectic_max_chars: int = 600
+    # Dialectic depth: how many .chat() calls per dialectic cycle (1-3).
+    # Depth 1: single call. Depth 2: initial pass + self-audit/targeted synthesis.
+    # Depth 3: initial pass + self-audit/synthesis + reconciliation.
+    dialectic_depth: int = 1
+    # Optional per-pass reasoning level override. Array of reasoning levels
+    # matching dialectic_depth length. When None, uses proportional defaults
+    # derived from dialectic_reasoning_level.
+    dialectic_depth_levels: list[str] | None = None
    # Honcho API limits — configurable for self-hosted instances
    # Max chars per message sent via add_messages() (Honcho cloud: 25000)
    message_max_chars: int = 25000
@@ -189,10 +261,8 @@ class HonchoClientConfig:
    # "context" — auto-injected context only, Honcho tools removed
    # "tools"   — Honcho tools only, no auto-injected context
    recall_mode: str = "hybrid"
-    # When True and recallMode is "tools", create the Honcho session eagerly
-    # during initialize() instead of deferring to the first tool call.
-    # This ensures sync_turn() can write from the very first turn.
-    # Does NOT enable automatic context injection — only changes init timing.
+    # Eager init in tools mode — when true, initializes session during
+    # initialize() instead of deferring to first tool call
    init_on_session_start: bool = False
    # Observation mode: legacy string shorthand ("directional" or "unified").
    # Kept for backward compat; granular per-peer booleans below are preferred.
@@ -224,12 +294,14 @@ class HonchoClientConfig:
        resolved_host = host or resolve_active_host()
        api_key = os.environ.get("HONCHO_API_KEY")
        base_url = os.environ.get("HONCHO_BASE_URL", "").strip() or None
+        timeout = _resolve_optional_float(os.environ.get("HONCHO_TIMEOUT"))
        return cls(
            host=resolved_host,
            workspace_id=workspace_id,
            api_key=api_key,
            environment=os.environ.get("HONCHO_ENVIRONMENT", "production"),
            base_url=base_url,
+            timeout=timeout,
            ai_peer=resolved_host,
            enabled=bool(api_key or base_url),
        )
@@ -290,6 +362,11 @@ class HonchoClientConfig:
            or os.environ.get("HONCHO_BASE_URL", "").strip()
            or None
        )
+        timeout = _resolve_optional_float(
+            raw.get("timeout"),
+            raw.get("requestTimeout"),
+            os.environ.get("HONCHO_TIMEOUT"),
+        )

        # Auto-enable when API key or base_url is present (unless explicitly disabled)
        # Host-level enabled wins, then root-level, then auto-enable if key/url exists.
@@ -335,12 +412,16 @@ class HonchoClientConfig:
            api_key=api_key,
            environment=environment,
            base_url=base_url,
+            timeout=timeout,
            peer_name=host_block.get("peerName") or raw.get("peerName"),
            ai_peer=ai_peer,
            enabled=enabled,
            save_messages=save_messages,
            write_frequency=write_frequency,
-            context_tokens=host_block.get("contextTokens") or raw.get("contextTokens"),
+            context_tokens=_parse_context_tokens(
+                host_block.get("contextTokens"),
+                raw.get("contextTokens"),
+            ),
            dialectic_reasoning_level=(
                host_block.get("dialecticReasoningLevel")
                or raw.get("dialecticReasoningLevel")
@@ -356,6 +437,15 @@ class HonchoClientConfig:
                or raw.get("dialecticMaxChars")
                or 600
            ),
+            dialectic_depth=_parse_dialectic_depth(
+                host_block.get("dialecticDepth"),
+                raw.get("dialecticDepth"),
+            ),
+            dialectic_depth_levels=_parse_dialectic_depth_levels(
+                host_block.get("dialecticDepthLevels"),
+                raw.get("dialecticDepthLevels"),
+                depth=_parse_dialectic_depth(host_block.get("dialecticDepth"), raw.get("dialecticDepth")),
+            ),
            message_max_chars=int(
                host_block.get("messageMaxChars")
                or raw.get("messageMaxChars")
@@ -422,16 +512,18 @@ class HonchoClientConfig:
        cwd: str | None = None,
        session_title: str | None = None,
        session_id: str | None = None,
+        gateway_session_key: str | None = None,
    ) -> str | None:
        """Resolve Honcho session name.

        Resolution order:
          1. Manual directory override from sessions map
          2. Hermes session title (from /title command)
-          3. per-session strategy — Hermes session_id ({timestamp}_{hex})
-          4. per-repo strategy — git repo root directory name
-          5. per-directory strategy — directory basename
-          6. global strategy — workspace name
+          3. Gateway session key (stable per-chat identifier from gateway platforms)
+          4. per-session strategy — Hermes session_id ({timestamp}_{hex})
+          5. per-repo strategy — git repo root directory name
+          6. per-directory strategy — directory basename
+          7. global strategy — workspace name
        """
        import re

@@ -445,12 +537,22 @@ class HonchoClientConfig:

        # /title mid-session remap
        if session_title:
-            sanitized = re.sub(r'[^a-zA-Z0-9_-]', '-', session_title).strip('-')
+            sanitized = re.sub(r'[^a-zA-Z0-9_-]+', '-', session_title).strip('-')
            if sanitized:
                if self.session_peer_prefix and self.peer_name:
                    return f"{self.peer_name}-{sanitized}"
                return sanitized

+        # Gateway session key: stable per-chat identifier passed by the gateway
+        # (e.g. "agent:main:telegram:dm:8439114563"). Runs of characters outside
+        # [a-zA-Z0-9_-] (here, colons) become single hyphens for Honcho session ID
+        # compatibility. This takes priority over strategy-based resolution because
+        # gateway platforms need per-chat isolation that cwd-based strategies lack.
+        if gateway_session_key:
+            sanitized = re.sub(r'[^a-zA-Z0-9_-]+', '-', gateway_session_key).strip('-')
+            if sanitized:
+                return sanitized
+
        # per-session: inherit Hermes session_id (new Honcho session each run)
        if self.session_strategy == "per-session" and session_id:
            if self.session_peer_prefix and self.peer_name:
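The sanitization change in this hunk can be illustrated in isolation. This is a minimal sketch that uses only the two regular expressions shown above; `sanitize_old` and `sanitize_new` are hypothetical names for illustration, not functions in the codebase.

```python
import re


def sanitize_old(name: str) -> str:
    # Previous pattern: every disallowed character becomes its own hyphen.
    return re.sub(r'[^a-zA-Z0-9_-]', '-', name).strip('-')


def sanitize_new(name: str) -> str:
    # New pattern: a run of disallowed characters collapses to one hyphen.
    return re.sub(r'[^a-zA-Z0-9_-]+', '-', name).strip('-')


print(sanitize_old("fix: login bug"))  # fix--login-bug
print(sanitize_new("fix: login bug"))  # fix-login-bug
print(sanitize_new("agent:main:telegram:dm:8439114563"))
# agent-main-telegram-dm-8439114563
```

With the `+` quantifier, titles containing adjacent punctuation and spaces no longer produce doubled hyphens, and gateway keys map to a single hyphen-separated identifier.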
@@ -512,13 +614,20 @@ def get_honcho_client(config: HonchoClientConfig | None = None) -> Honcho:
    # mapping, enabling remote self-hosted Honcho deployments without
    # requiring the server to live on localhost.
    resolved_base_url = config.base_url
-    if not resolved_base_url:
+    resolved_timeout = config.timeout
+    if not resolved_base_url or resolved_timeout is None:
        try:
            from hermes_cli.config import load_config
            hermes_cfg = load_config()
            honcho_cfg = hermes_cfg.get("honcho", {})
            if isinstance(honcho_cfg, dict):
-                resolved_base_url = honcho_cfg.get("base_url", "").strip() or None
+                if not resolved_base_url:
+                    resolved_base_url = honcho_cfg.get("base_url", "").strip() or None
+                if resolved_timeout is None:
+                    resolved_timeout = _resolve_optional_float(
+                        honcho_cfg.get("timeout"),
+                        honcho_cfg.get("request_timeout"),
+                    )
        except Exception:
            pass

@@ -553,6 +662,8 @@ def get_honcho_client(config: HonchoClientConfig | None = None) -> Honcho:
    }
    if resolved_base_url:
        kwargs["base_url"] = resolved_base_url
+    if resolved_timeout is not None:
+        kwargs["timeout"] = resolved_timeout

    _honcho_client = Honcho(**kwargs)
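The timeout lookup above tries several sources in order and keeps the first usable one. The body of `_resolve_optional_float` is not shown in this diff, so the following is only a sketch of that "first value that parses" idea; `first_float` is a hypothetical helper, and the real function may treat invalid or non-positive values differently.

```python
from typing import Any, Optional


def first_float(*candidates: Any) -> Optional[float]:
    """Return the first candidate that parses as a float, else None (assumed behavior)."""
    for value in candidates:
        # Skip unset sources such as a missing config key or empty env var.
        if value is None or value == "":
            continue
        try:
            return float(value)
        except (TypeError, ValueError):
            # Unparseable value: fall through to the next source.
            continue
    return None


# Order mirrors the hunk: "timeout", then "requestTimeout", then HONCHO_TIMEOUT.
print(first_float(None, None, "12.5"))  # 12.5
print(first_float(30, "5"))             # 30.0
print(first_float(None, "abc"))         # None
```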

@@ -486,36 +486,9 @@ class HonchoSessionManager:

    _REASONING_LEVELS = ("minimal", "low", "medium", "high", "max")

-    def _dynamic_reasoning_level(self, query: str) -> str:
-        """
-        Pick a reasoning level for a dialectic query.
-
-        When dialecticDynamic is true (default), auto-bumps based on query
-        length so Honcho applies more inference where it matters:
-
-          < 120 chars  -> configured default (typically "low")
-          120-400 chars -> +1 level above default (cap at "high")
-          > 400 chars  -> +2 levels above default (cap at "high")
-
-        "max" is never selected automatically -- reserve it for explicit config.
-
-        When dialecticDynamic is false, always returns the configured level.
-        """
-        if not self._dialectic_dynamic:
-            return self._dialectic_reasoning_level
-
-        levels = self._REASONING_LEVELS
-        default_idx = levels.index(self._dialectic_reasoning_level) if self._dialectic_reasoning_level in levels else 1
-        n = len(query)
-        if n < 120:
-            bump = 0
-        elif n < 400:
-            bump = 1
-        else:
-            bump = 2
-        # Cap at "high" (index 3) for auto-selection
-        idx = min(default_idx + bump, 3)
-        return levels[idx]
+    def _default_reasoning_level(self) -> str:
+        """Return the configured default reasoning level."""
+        return self._dialectic_reasoning_level

    def dialectic_query(
        self, session_key: str, query: str,
@@ -532,8 +505,9 @@ class HonchoSessionManager:
        Args:
            session_key: The session key to query against.
            query: Natural language question.
-            reasoning_level: Override the config default. If None, uses
-                             _dynamic_reasoning_level(query).
+            reasoning_level: Override the configured default (dialecticReasoningLevel).
+                             Only honored when dialecticDynamic is true.
+                             If None or dialecticDynamic is false, uses the configured default.
            peer: Which peer to query — "user" (default) or "ai".

        Returns:
@@ -543,29 +517,34 @@ class HonchoSessionManager:
        if not session:
            return ""

+        target_peer_id = self._resolve_peer_id(session, peer)
+        if target_peer_id is None:
+            return ""
+
        # Guard: truncate query to Honcho's dialectic input limit
        if len(query) > self._dialectic_max_input_chars:
            query = query[:self._dialectic_max_input_chars].rsplit(" ", 1)[0]

-        level = reasoning_level or self._dynamic_reasoning_level(query)
+        if self._dialectic_dynamic and reasoning_level:
+            level = reasoning_level
+        else:
+            level = self._default_reasoning_level()

        try:
            if self._ai_observe_others:
-                # AI peer can observe user — use cross-observation routing
-                if peer == "ai":
-                    ai_peer_obj = self._get_or_create_peer(session.assistant_peer_id)
+                # AI peer can observe other peers — use assistant as observer.
+                ai_peer_obj = self._get_or_create_peer(session.assistant_peer_id)
+                if target_peer_id == session.assistant_peer_id:
                    result = ai_peer_obj.chat(query, reasoning_level=level) or ""
                else:
-                    ai_peer_obj = self._get_or_create_peer(session.assistant_peer_id)
                    result = ai_peer_obj.chat(
                        query,
-                        target=session.user_peer_id,
+                        target=target_peer_id,
                        reasoning_level=level,
                    ) or ""
            else:
-                # AI can't observe others — each peer queries self
-                peer_id = session.assistant_peer_id if peer == "ai" else session.user_peer_id
-                target_peer = self._get_or_create_peer(peer_id)
+                # Without cross-observation, each peer queries its own context.
+                target_peer = self._get_or_create_peer(target_peer_id)
                result = target_peer.chat(query, reasoning_level=level) or ""

            # Apply Hermes-side char cap before caching
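The level selection in this hunk reduces to a small gate: a per-call override is honored only when dynamic mode is on. A minimal sketch as a standalone function follows; `pick_reasoning_level` is a hypothetical name, and its parameters stand in for `self._dialectic_dynamic`, the `reasoning_level` argument, and the configured default.

```python
from typing import Optional


def pick_reasoning_level(dynamic: bool, override: Optional[str], default: str) -> str:
    # The override wins only when dialecticDynamic is true and a level was passed.
    if dynamic and override:
        return override
    # Otherwise fall back to the configured dialecticReasoningLevel.
    return default


print(pick_reasoning_level(True, "high", "low"))   # high
print(pick_reasoning_level(False, "high", "low"))  # low
print(pick_reasoning_level(True, None, "low"))     # low
```

Unlike the removed `_dynamic_reasoning_level`, nothing here depends on query length; the level changes only when the caller asks for it.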
@@ -647,10 +626,11 @@ class HonchoSessionManager:
        """
        Pre-fetch user and AI peer context from Honcho.

-        Fetches peer_representation and peer_card for both peers. search_query
-        is intentionally omitted — it would only affect additional excerpts
-        that this code does not consume, and passing the raw message exposes
-        conversation content in server access logs.
+        Fetches peer_representation and peer_card for both peers, plus the
+        session summary when available. search_query is intentionally omitted
+        — it would only affect additional excerpts that this code does not
+        consume, and passing the raw message exposes conversation content in
+        server access logs.

        Args:
            session_key: The session key to get context for.
@@ -658,15 +638,29 @@ class HonchoSessionManager:

        Returns:
            Dictionary with 'representation', 'card', 'ai_representation',
-            and 'ai_card' keys.
+            'ai_card', and optionally 'summary' keys.
        """
        session = self._cache.get(session_key)
        if not session:
            return {}

        result: dict[str, str] = {}
+
+        # Session summary — provides session-scoped context.
+        # Fresh sessions (per-session cold start, or first-ever per-directory)
+        # return null summary — the guard below handles that gracefully.
+        # Per-directory returning sessions get their accumulated summary.
        try:
-            user_ctx = self._fetch_peer_context(session.user_peer_id)
+            honcho_session = self._sessions_cache.get(session.honcho_session_id)
+            if honcho_session:
+                ctx = honcho_session.context(summary=True)
+                if ctx.summary and getattr(ctx.summary, "content", None):
+                    result["summary"] = ctx.summary.content
+        except Exception as e:
+            logger.debug("Failed to fetch session summary from Honcho: %s", e)
+
+        try:
+            user_ctx = self._fetch_peer_context(session.user_peer_id, target=session.user_peer_id)
            result["representation"] = user_ctx["representation"]
            result["card"] = "\n".join(user_ctx["card"])
        except Exception as e:
@@ -674,7 +668,7 @@ class HonchoSessionManager:

        # Also fetch AI peer's own representation so Hermes knows itself.
        try:
-            ai_ctx = self._fetch_peer_context(session.assistant_peer_id)
+            ai_ctx = self._fetch_peer_context(session.assistant_peer_id, target=session.assistant_peer_id)
            result["ai_representation"] = ai_ctx["representation"]
            result["ai_card"] = "\n".join(ai_ctx["card"])
        except Exception as e:
@@ -862,7 +856,7 @@ class HonchoSessionManager:
            return [str(item) for item in card if item]
        return [str(card)]

-    def _fetch_peer_card(self, peer_id: str) -> list[str]:
+    def _fetch_peer_card(self, peer_id: str, *, target: str | None = None) -> list[str]:
        """Fetch a peer card directly from the peer object.

        This avoids relying on session.context(), which can return an empty
@@ -872,22 +866,33 @@ class HonchoSessionManager:
        peer = self._get_or_create_peer(peer_id)
        getter = getattr(peer, "get_card", None)
        if callable(getter):
-            return self._normalize_card(getter())
+            return self._normalize_card(getter(target=target) if target is not None else getter())

        legacy_getter = getattr(peer, "card", None)
        if callable(legacy_getter):
-            return self._normalize_card(legacy_getter())
+            return self._normalize_card(legacy_getter(target=target) if target is not None else legacy_getter())

        return []

-    def _fetch_peer_context(self, peer_id: str, search_query: str | None = None) -> dict[str, Any]:
+    def _fetch_peer_context(
+        self,
+        peer_id: str,
+        search_query: str | None = None,
+        *,
+        target: str | None = None,
+    ) -> dict[str, Any]:
        """Fetch representation + peer card directly from a peer object."""
        peer = self._get_or_create_peer(peer_id)
        representation = ""
        card: list[str] = []

        try:
-            ctx = peer.context(search_query=search_query) if search_query else peer.context()
+            context_kwargs: dict[str, Any] = {}
+            if target is not None:
+                context_kwargs["target"] = target
+            if search_query is not None:
+                context_kwargs["search_query"] = search_query
+            ctx = peer.context(**context_kwargs) if context_kwargs else peer.context()
            representation = (
                getattr(ctx, "representation", None)
                or getattr(ctx, "peer_representation", None)
@@ -899,24 +904,111 @@ class HonchoSessionManager:

        if not representation:
            try:
-                representation = peer.representation() or ""
+                representation = (
+                    peer.representation(target=target) if target is not None else peer.representation()
+                ) or ""
            except Exception as e:
                logger.debug("Direct peer.representation() failed for '%s': %s", peer_id, e)

        if not card:
            try:
-                card = self._fetch_peer_card(peer_id)
+                card = self._fetch_peer_card(peer_id, target=target)
            except Exception as e:
                logger.debug("Direct peer card fetch failed for '%s': %s", peer_id, e)

        return {"representation": representation, "card": card}

-    def get_peer_card(self, session_key: str) -> list[str]:
+    def get_session_context(self, session_key: str, peer: str = "user") -> dict[str, Any]:
+        """Fetch full session context from Honcho including summary.
+
+        Uses the session-level context() API which returns summary,
+        peer_representation, peer_card, and messages.
        """
-        Fetch the user peer's card — a curated list of key facts.
+        session = self._cache.get(session_key)
+        if not session:
+            return {}
+
+        honcho_session = self._sessions_cache.get(session.honcho_session_id)
+        if not honcho_session:
+            # Fall back to peer-level context, respecting the requested peer
+            peer_id = self._resolve_peer_id(session, peer)
+            if peer_id is None:
+                peer_id = session.user_peer_id
+            return self._fetch_peer_context(peer_id, target=peer_id)
+
+        try:
+            peer_id = self._resolve_peer_id(session, peer)
+            ctx = honcho_session.context(
+                summary=True,
+                peer_target=peer_id,
+                peer_perspective=session.user_peer_id if peer == "user" else session.assistant_peer_id,
+            )
+
+            result: dict[str, Any] = {}
+
+            # Summary
+            if ctx.summary:
+                result["summary"] = ctx.summary.content
+
+            # Peer representation and card
+            if ctx.peer_representation:
+                result["representation"] = ctx.peer_representation
+            if ctx.peer_card:
+                result["card"] = "\n".join(ctx.peer_card)
+
+            # Messages (last N for context)
+            if ctx.messages:
+                recent = ctx.messages[-10:]  # last 10 messages
+                result["recent_messages"] = [
+                    {"role": getattr(m, "peer_id", "unknown"), "content": (m.content or "")[:500]}
+                    for m in recent
+                ]
+
+            return result
+        except Exception as e:
+            logger.debug("Session context fetch failed: %s", e)
+            return {}
+
+    def _resolve_peer_id(self, session: HonchoSession, peer: str | None) -> str:
+        """Resolve a peer alias or explicit peer ID to a concrete Honcho peer ID.
+
+        Always returns a non-empty string: either a known peer ID or a
+        sanitized version of the caller-supplied alias/ID.
+        """
+        candidate = (peer or "user").strip()
+        if not candidate:
+            return session.user_peer_id
+
+        normalized = self._sanitize_id(candidate)
+        if normalized == self._sanitize_id("user"):
+            return session.user_peer_id
+        if normalized == self._sanitize_id("ai"):
+            return session.assistant_peer_id
+
+        return normalized
+
+    def _resolve_observer_target(
+        self,
+        session: HonchoSession,
+        peer: str | None,
+    ) -> tuple[str, str | None]:
+        """Resolve observer and target peer IDs for context/search/profile queries."""
+        target_peer_id = self._resolve_peer_id(session, peer)
+
+        if target_peer_id == session.assistant_peer_id:
+            return session.assistant_peer_id, session.assistant_peer_id
+
+        if self._ai_observe_others:
+            return session.assistant_peer_id, target_peer_id
+
+        return target_peer_id, None
+
+    def get_peer_card(self, session_key: str, peer: str = "user") -> list[str]:
+        """
+        Fetch a peer card — a curated list of key facts.

        Fast, no LLM reasoning. Returns raw structured facts Honcho has
-        inferred about the user (name, role, preferences, patterns).
+        inferred about the target peer (name, role, preferences, patterns).
        Empty list if unavailable.
        """
        session = self._cache.get(session_key)
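The observer/target rule added in `_resolve_observer_target` has three cases. The sketch below restates them outside the class, assuming the target peer ID has already been resolved; `Session` is a hypothetical stand-in for `HonchoSession`, and `ai_observe_others` is passed in rather than read from `self`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Session:
    # Hypothetical stand-in carrying only the two fields the rule needs.
    user_peer_id: str
    assistant_peer_id: str


def resolve_observer_target(
    session: Session, target_peer_id: str, ai_observe_others: bool
) -> Tuple[str, Optional[str]]:
    # Case 1: querying the AI peer about itself.
    if target_peer_id == session.assistant_peer_id:
        return session.assistant_peer_id, session.assistant_peer_id
    # Case 2: cross-observation enabled, so the AI peer observes the target.
    if ai_observe_others:
        return session.assistant_peer_id, target_peer_id
    # Case 3: no cross-observation, so the target queries its own context.
    return target_peer_id, None


s = Session(user_peer_id="alice", assistant_peer_id="hermes")
print(resolve_observer_target(s, "hermes", False))  # ('hermes', 'hermes')
print(resolve_observer_target(s, "alice", True))    # ('hermes', 'alice')
print(resolve_observer_target(s, "alice", False))   # ('alice', None)
```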
@@ -924,12 +1016,19 @@ class HonchoSessionManager:
            return []

        try:
-            return self._fetch_peer_card(session.user_peer_id)
+            observer_peer_id, target_peer_id = self._resolve_observer_target(session, peer)
+            return self._fetch_peer_card(observer_peer_id, target=target_peer_id)
        except Exception as e:
            logger.debug("Failed to fetch peer card from Honcho: %s", e)
            return []

-    def search_context(self, session_key: str, query: str, max_tokens: int = 800) -> str:
+    def search_context(
+        self,
+        session_key: str,
+        query: str,
+        max_tokens: int = 800,
+        peer: str = "user",
+    ) -> str:
        """
        Semantic search over Honcho session context.

@@ -941,6 +1040,7 @@ class HonchoSessionManager:
            session_key: Session to search against.
            query: Search query for semantic matching.
            max_tokens: Token budget for returned content.
+            peer: Peer alias or explicit peer ID to search about.

        Returns:
            Relevant context excerpts as a string, or empty string if none.
@@ -950,7 +1050,13 @@ class HonchoSessionManager:
            return ""

        try:
-            ctx = self._fetch_peer_context(session.user_peer_id, search_query=query)
+            observer_peer_id, target = self._resolve_observer_target(session, peer)
+
+            ctx = self._fetch_peer_context(
+                observer_peer_id,
+                search_query=query,
+                target=target,
+            )
            parts = []
            if ctx["representation"]:
                parts.append(ctx["representation"])
@@ -962,16 +1068,17 @@ class HonchoSessionManager:
            logger.debug("Honcho search_context failed: %s", e)
            return ""

-    def create_conclusion(self, session_key: str, content: str) -> bool:
-        """Write a conclusion about the user back to Honcho.
+    def create_conclusion(self, session_key: str, content: str, peer: str = "user") -> bool:
+        """Write a conclusion about a target peer back to Honcho.

-        Conclusions are facts the AI peer observes about the user —
-        preferences, corrections, clarifications, project context.
-        They feed into the user's peer card and representation.
+        Conclusions are facts a peer observes about another peer or itself —
+        preferences, corrections, clarifications, and project context.
+        They feed into the target peer's card and representation.

        Args:
            session_key: Session to associate the conclusion with.
-            content: The conclusion text (e.g. "User prefers dark mode").
+            content: The conclusion text.
+            peer: Peer alias or explicit peer ID. "user" is the default alias.

        Returns:
            True on success, False on failure.
@@ -985,25 +1092,90 @@ class HonchoSessionManager:
            return False

        try:
-            if self._ai_observe_others:
-                # AI peer creates conclusion about user (cross-observation)
+            target_peer_id = self._resolve_peer_id(session, peer)
+            if target_peer_id is None:
+                logger.warning("Could not resolve conclusion peer '%s' for session '%s'", peer, session_key)
+                return False
+
+            if target_peer_id == session.assistant_peer_id:
                assistant_peer = self._get_or_create_peer(session.assistant_peer_id)
-                conclusions_scope = assistant_peer.conclusions_of(session.user_peer_id)
+                conclusions_scope = assistant_peer.conclusions_of(session.assistant_peer_id)
+            elif self._ai_observe_others:
+                assistant_peer = self._get_or_create_peer(session.assistant_peer_id)
+                conclusions_scope = assistant_peer.conclusions_of(target_peer_id)
            else:
-                # AI can't observe others — user peer creates self-conclusion
-                user_peer = self._get_or_create_peer(session.user_peer_id)
-                conclusions_scope = user_peer.conclusions_of(session.user_peer_id)
+                target_peer = self._get_or_create_peer(target_peer_id)
+                conclusions_scope = target_peer.conclusions_of(target_peer_id)

            conclusions_scope.create([{
                "content": content.strip(),
                "session_id": session.honcho_session_id,
            }])
-            logger.info("Created conclusion for %s: %s", session_key, content[:80])
+            logger.info("Created conclusion about %s for %s: %s", target_peer_id, session_key, content[:80])
            return True
        except Exception as e:
            logger.error("Failed to create conclusion: %s", e)
            return False

+    def delete_conclusion(self, session_key: str, conclusion_id: str, peer: str = "user") -> bool:
+        """Delete a conclusion by ID. Use only for PII removal.
+
+        Args:
+            session_key: Session key for peer resolution.
+            conclusion_id: The conclusion ID to delete.
+            peer: Peer alias or explicit peer ID.
+
+        Returns:
+            True on success, False on failure.
+        """
+        session = self._cache.get(session_key)
+        if not session:
+            return False
+        try:
+            target_peer_id = self._resolve_peer_id(session, peer)
+            if target_peer_id == session.assistant_peer_id:
+                observer = self._get_or_create_peer(session.assistant_peer_id)
+                scope = observer.conclusions_of(session.assistant_peer_id)
+            elif self._ai_observe_others:
+                observer = self._get_or_create_peer(session.assistant_peer_id)
+                scope = observer.conclusions_of(target_peer_id)
+            else:
+                target_peer = self._get_or_create_peer(target_peer_id)
+                scope = target_peer.conclusions_of(target_peer_id)
+            scope.delete(conclusion_id)
+            logger.info("Deleted conclusion %s for %s", conclusion_id, session_key)
+            return True
+        except Exception as e:
+            logger.error("Failed to delete conclusion %s: %s", conclusion_id, e)
+            return False
+
+    def set_peer_card(self, session_key: str, card: list[str], peer: str = "user") -> list[str] | None:
+        """Update a peer's card.
+
+        Args:
+            session_key: Session key for peer resolution.
+            card: New peer card as list of fact strings.
+            peer: Peer alias or explicit peer ID.
+
+        Returns:
+            Updated card on success, None on failure.
+        """
+        session = self._cache.get(session_key)
+        if not session:
+            return None
+        try:
+            peer_id = self._resolve_peer_id(session, peer)
+            if peer_id is None:
+                logger.warning("Could not resolve peer '%s' for set_peer_card in session '%s'", peer, session_key)
+                return None
+            peer_obj = self._get_or_create_peer(peer_id)
+            result = peer_obj.set_card(card)
+            logger.info("Updated peer card for %s (%d facts)", peer_id, len(card))
+            return result
+        except Exception as e:
+            logger.error("Failed to set peer card: %s", e)
+            return None
+
    def seed_ai_identity(self, session_key: str, content: str, source: str = "manual") -> bool:
        """
        Seed the AI peer's Honcho representation from text content.
@@ -1061,7 +1233,7 @@ class HonchoSessionManager:
            return {"representation": "", "card": ""}

        try:
-            ctx = self._fetch_peer_context(session.assistant_peer_id)
+            ctx = self._fetch_peer_context(session.assistant_peer_id, target=session.assistant_peer_id)
            return {
                "representation": ctx["representation"] or "",
                "card": "\n".join(ctx["card"]),
@@ -10,8 +10,9 @@ lifecycle instead of read-only search endpoints.
 Config via environment variables (profile-scoped via each profile's .env):
  OPENVIKING_ENDPOINT  — Server URL (default: http://127.0.0.1:1933)
  OPENVIKING_API_KEY   — API key (required for authenticated servers)
-  OPENVIKING_ACCOUNT   — Tenant account (default: root)
+  OPENVIKING_ACCOUNT   — Tenant account (default: default)
  OPENVIKING_USER      — Tenant user (default: default)
+  OPENVIKING_AGENT     — Tenant agent (default: hermes)

 Capabilities:
  - Automatic memory extraction on session commit (6 categories)
@@ -80,11 +81,12 @@ class _VikingClient:
    """Thin HTTP client for the OpenViking REST API."""

    def __init__(self, endpoint: str, api_key: str = "",
-                 account: str = "", user: str = ""):
+                 account: str = "", user: str = "", agent: str = ""):
        self._endpoint = endpoint.rstrip("/")
        self._api_key = api_key
-        self._account = account or os.environ.get("OPENVIKING_ACCOUNT", "root")
+        self._account = account or os.environ.get("OPENVIKING_ACCOUNT", "default")
        self._user = user or os.environ.get("OPENVIKING_USER", "default")
+        self._agent = agent or os.environ.get("OPENVIKING_AGENT", "hermes")
        self._httpx = _get_httpx()
        if self._httpx is None:
            raise ImportError("httpx is required for OpenViking: pip install httpx")
@@ -94,6 +96,7 @@ class _VikingClient:
            "Content-Type": "application/json",
            "X-OpenViking-Account": self._account,
            "X-OpenViking-User": self._user,
+            "X-OpenViking-Agent": self._agent,
        }
        if self._api_key:
            h["X-API-Key"] = self._api_key
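The tenant headers built above follow a simple precedence: explicit argument, then environment variable, then a fixed default. This is a sketch of that precedence as a free function; `build_headers` and its `env` parameter are hypothetical and exist only so the example can run without touching the real process environment.

```python
import os
from typing import Dict, Mapping, Optional


def build_headers(
    api_key: str = "",
    account: str = "",
    user: str = "",
    agent: str = "",
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    env = os.environ if env is None else env
    headers = {
        "Content-Type": "application/json",
        # Explicit argument first, then the env var, then the default.
        "X-OpenViking-Account": account or env.get("OPENVIKING_ACCOUNT", "default"),
        "X-OpenViking-User": user or env.get("OPENVIKING_USER", "default"),
        "X-OpenViking-Agent": agent or env.get("OPENVIKING_AGENT", "hermes"),
    }
    # The API key header is sent only when a key is configured.
    if api_key:
        headers["X-API-Key"] = api_key
    return headers


print(build_headers(env={})["X-OpenViking-Agent"])  # hermes
print(build_headers(agent="scout", env={"OPENVIKING_AGENT": "other"})["X-OpenViking-Agent"])  # scout
```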
@@ -282,20 +285,44 @@ class OpenVikingMemoryProvider(MemoryProvider):
            },
            {
                "key": "api_key",
-                "description": "OpenViking API key",
+                "description": "OpenViking API key (leave blank for local dev mode)",
                "secret": True,
                "env_var": "OPENVIKING_API_KEY",
            },
+            {
+                "key": "account",
+                "description": "OpenViking tenant account ID ([default]; used in local mode, when OPENVIKING_API_KEY is empty)",
+                "default": "default",
+                "env_var": "OPENVIKING_ACCOUNT",
+            },
+            {
+                "key": "user",
+                "description": "OpenViking user ID within the account ([default]; used in local mode, when OPENVIKING_API_KEY is empty)",
+                "default": "default",
+                "env_var": "OPENVIKING_USER",
+            },
+            {
+                "key": "agent",
+                "description": "OpenViking agent ID within the account ([hermes], useful in multi-agent mode)",
+                "default": "hermes",
+                "env_var": "OPENVIKING_AGENT",
+            },
        ]

    def initialize(self, session_id: str, **kwargs) -> None:
        self._endpoint = os.environ.get("OPENVIKING_ENDPOINT", _DEFAULT_ENDPOINT)
        self._api_key = os.environ.get("OPENVIKING_API_KEY", "")
+        self._account = os.environ.get("OPENVIKING_ACCOUNT", "default")
+        self._user = os.environ.get("OPENVIKING_USER", "default")
+        self._agent = os.environ.get("OPENVIKING_AGENT", "hermes")
        self._session_id = session_id
        self._turn_count = 0

        try:
-            self._client = _VikingClient(self._endpoint, self._api_key)
+            self._client = _VikingClient(
+                self._endpoint, self._api_key,
+                account=self._account, user=self._user, agent=self._agent,
+            )
            if not self._client.health():
                logger.warning("OpenViking server at %s is not reachable", self._endpoint)
                self._client = None
@@ -325,7 +352,8 @@ class OpenVikingMemoryProvider(MemoryProvider):
                "(abstract/overview/full), viking_browse to explore.\n"
                "Use viking_remember to store facts, viking_add_resource to index URLs/docs."
            )
-        except Exception:
+        except Exception as e:
+            logger.warning("OpenViking system_prompt_block failed: %s", e)
            return (
                "# OpenViking Knowledge Base\n"
                f"Active. Endpoint: {self._endpoint}\n"
@@ -351,7 +379,10 @@ class OpenVikingMemoryProvider(MemoryProvider):

        def _run():
            try:
-                client = _VikingClient(self._endpoint, self._api_key)
+                client = _VikingClient(
+                    self._endpoint, self._api_key,
+                    account=self._account, user=self._user, agent=self._agent,
+                )
                resp = client.post("/api/v1/search/find", {
                    "query": query,
                    "top_k": 5,
@@ -386,7 +417,10 @@ class OpenVikingMemoryProvider(MemoryProvider):

        def _sync():
            try:
-                client = _VikingClient(self._endpoint, self._api_key)
+                client = _VikingClient(
+                    self._endpoint, self._api_key,
+                    account=self._account, user=self._user, agent=self._agent,
+                )
                sid = self._session_id

                # Add user message
@@ -442,7 +476,10 @@ class OpenVikingMemoryProvider(MemoryProvider):

        def _write():
            try:
-                client = _VikingClient(self._endpoint, self._api_key)
+                client = _VikingClient(
+                    self._endpoint, self._api_key,
+                    account=self._account, user=self._user, agent=self._agent,
+                )
                # Add as a user message with memory context so the commit
                # picks it up as an explicit memory during extraction
                client.post(f"/api/v1/sessions/{self._session_id}/messages", {
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

 [project]
 name = "hermes-agent"
-version = "0.9.0"
+version = "0.10.0"
 description = "The self-improving AI agent — creates skills from experience, improves them during use, and runs anywhere"
 readme = "README.md"
 requires-python = ">=3.11"
@@ -63,10 +63,12 @@ homeassistant = ["aiohttp>=3.9.0,<4"]
 sms = ["aiohttp>=3.9.0,<4"]
 acp = ["agent-client-protocol>=0.9.0,<1.0"]
 mistral = ["mistralai>=2.3.0,<3"]
+bedrock = ["boto3>=1.35.0,<2"]
 termux = [
  # Tested Android / Termux path: keeps the core CLI feature-rich while
  # avoiding extras that currently depend on non-Android wheels (notably
  # faster-whisper -> ctranslate2 via the voice extra).
+  "python-telegram-bot[webhooks]>=22.6,<23",
  "hermes-agent[cron]",
  "hermes-agent[cli]",
  "hermes-agent[pty]",
@@ -108,6 +110,7 @@ all = [
  "hermes-agent[dingtalk]",
  "hermes-agent[feishu]",
  "hermes-agent[mistral]",
+  "hermes-agent[bedrock]",
  "hermes-agent[web]",
 ]

@@ -28,7 +28,7 @@ BOLD='\033[1m'
 # Configuration
 REPO_URL_SSH="git@github.com:NousResearch/hermes-agent.git"
 REPO_URL_HTTPS="https://github.com/NousResearch/hermes-agent.git"
-HERMES_HOME="$HOME/.hermes"
+HERMES_HOME="${HERMES_HOME:-$HOME/.hermes}"
 INSTALL_DIR="${HERMES_INSTALL_DIR:-$HERMES_HOME/hermes-agent}"
 PYTHON_VERSION="3.11"
 NODE_VERSION="22"
@@ -66,6 +66,10 @@ while [[ $# -gt 0 ]]; do
            INSTALL_DIR="$2"
            shift 2
            ;;
+        --hermes-home)
+            HERMES_HOME="$2"
+            shift 2
+            ;;
        -h|--help)
            echo "Hermes Agent Installer"
            echo ""
@@ -76,6 +80,7 @@ while [[ $# -gt 0 ]]; do
            echo "  --skip-setup   Skip interactive setup wizard"
            echo "  --branch NAME  Git branch to install (default: main)"
            echo "  --dir PATH     Installation directory (default: ~/.hermes/hermes-agent)"
+            echo "  --hermes-home PATH  Data directory (default: ~/.hermes, or \$HERMES_HOME)"
            echo "  -h, --help     Show this help"
            exit 0
            ;;
@@ -46,6 +46,7 @@ AUTHOR_MAP = {
    # contributors (from noreply pattern)
    "35742124+0xbyt4@users.noreply.github.com": "0xbyt4",
    "82637225+kshitijk4poor@users.noreply.github.com": "kshitijk4poor",
+    "kshitijk4poor@users.noreply.github.com": "kshitijk4poor",
    "16443023+stablegenius49@users.noreply.github.com": "stablegenius49",
    "185121704+stablegenius49@users.noreply.github.com": "stablegenius49",
    "101283333+batuhankocyigit@users.noreply.github.com": "batuhankocyigit",
@@ -62,8 +63,13 @@ AUTHOR_MAP = {
    "258577966+voidborne-d@users.noreply.github.com": "voidborne-d",
    "70424851+insecurejezza@users.noreply.github.com": "insecurejezza",
    "259807879+Bartok9@users.noreply.github.com": "Bartok9",
+    "241404605+MestreY0d4-Uninter@users.noreply.github.com": "MestreY0d4-Uninter",
    "268667990+Roy-oss1@users.noreply.github.com": "Roy-oss1",
+    "27917469+nosleepcassette@users.noreply.github.com": "nosleepcassette",
+    "241404605+MestreY0d4-Uninter@users.noreply.github.com": "MestreY0d4-Uninter",
+    "109555139+davetist@users.noreply.github.com": "davetist",
    # contributors (manual mapping from git names)
+    "ahmedsherif95@gmail.com": "asheriif",
    "dmayhem93@gmail.com": "dmahan93",
    "samherring99@gmail.com": "samherring99",
    "desaiaum08@gmail.com": "Aum08Desai",
@@ -75,8 +81,13 @@ AUTHOR_MAP = {
    "abdullahfarukozden@gmail.com": "Farukest",
    "lovre.pesut@gmail.com": "rovle",
    "hakanerten02@hotmail.com": "teyrebaz33",
+    "ruzzgarcn@gmail.com": "Ruzzgar",
    "alireza78.crypto@gmail.com": "alireza78a",
    "brooklyn.bb.nicholson@gmail.com": "brooklynnicholson",
+    "4317663+helix4u@users.noreply.github.com": "helix4u",
+    "331214+counterposition@users.noreply.github.com": "counterposition",
+    "blspear@gmail.com": "BrennerSpear",
+    "239876380+handsdiff@users.noreply.github.com": "handsdiff",
    "gpickett00@gmail.com": "gpickett00",
    "mcosma@gmail.com": "wakamex",
    "clawdia.nash@proton.me": "clawdia-nash",
@@ -118,6 +129,7 @@ AUTHOR_MAP = {
    "balyan.sid@gmail.com": "balyansid",
    "oluwadareab12@gmail.com": "bennytimz",
    "simon@simonmarcus.org": "simon-marcus",
+    "xowiekk@gmail.com": "Xowiek",
    "1243352777@qq.com": "zons-zhaozhy",
    # ── bulk addition: 75 emails resolved via API, PR salvage bodies, noreply
    #    crossref, and GH contributor list matching (April 2026 audit) ──
@@ -169,6 +181,22 @@ AUTHOR_MAP = {
    "limars874@gmail.com": "limars874",
    "lisicheng168@gmail.com": "lesterli",
    "mingjwan@microsoft.com": "MagicRay1217",
+    "orangeko@gmail.com": "GenKoKo",
+    "82095453+iacker@users.noreply.github.com": "iacker",
+    "sontianye@users.noreply.github.com": "sontianye",
+    "jackjin1997@users.noreply.github.com": "jackjin1997",
+    "danieldoderlein@users.noreply.github.com": "danieldoderlein",
+    "lrawnsley@users.noreply.github.com": "lrawnsley",
+    "taeuk178@users.noreply.github.com": "taeuk178",
+    "ogzerber@users.noreply.github.com": "ogzerber",
+    "cola-runner@users.noreply.github.com": "cola-runner",
+    "ygd58@users.noreply.github.com": "ygd58",
+    "vominh1919@users.noreply.github.com": "vominh1919",
+    "trevmanthony@gmail.com": "trevthefoolish",
+    "ziliangpeng@users.noreply.github.com": "ziliangpeng",
+    "centripetal-star@users.noreply.github.com": "centripetal-star",
+    "LeonSGP43@users.noreply.github.com": "LeonSGP43",
+    "Lubrsy706@users.noreply.github.com": "Lubrsy706",
    "niyant@spicefi.xyz": "spniyant",
    "olafthiele@gmail.com": "olafthiele",
    "oncuevtv@gmail.com": "sprmn24",
@@ -191,12 +219,15 @@ AUTHOR_MAP = {
    "yangzhi.see@gmail.com": "SeeYangZhi",
    "yongtenglei@gmail.com": "yongtenglei",
    "young@YoungdeMacBook-Pro.local": "YoungYang963",
-    "ysfalweshcan@gmail.com": "Awsh1",
+    "ysfalweshcan@gmail.com": "Junass1",
    "ysfwaxlycan@gmail.com": "WAXLYY",
    "yusufalweshdemir@gmail.com": "Dusk1e",
    "zhouboli@gmail.com": "zhouboli",
    "zqiao@microsoft.com": "tomqiaozc",
    "zzn+pa@zzn.im": "xinbenlv",
+    "zaynjarvis@gmail.com": "ZaynJarvis",
+    "zhiheng.liu@bytedance.com": "ZaynJarvis",
+    "mbelleau@Michels-MacBook-Pro.local": "malaiwah",
 }


@@ -313,7 +313,7 @@ Type these during an interactive chat session.
 ```
 ~/.hermes/config.yaml       Main configuration
 ~/.hermes/.env              API keys and secrets
-~/.hermes/skills/           Installed skills
+$HERMES_HOME/skills/        Installed skills
 ~/.hermes/sessions/         Session transcripts
 ~/.hermes/logs/             Gateway and error logs
 ~/.hermes/auth.json         OAuth tokens and credential pools
@@ -351,8 +351,8 @@ Full config reference: https://hermes-agent.nousresearch.com/docs/user-guide/con
 |----------|------|-------------|
 | OpenRouter | API key | `OPENROUTER_API_KEY` |
 | Anthropic | API key | `ANTHROPIC_API_KEY` |
-| Nous Portal | OAuth | `hermes login --provider nous` |
-| OpenAI Codex | OAuth | `hermes login --provider openai-codex` |
+| Nous Portal | OAuth | `hermes auth` |
+| OpenAI Codex | OAuth | `hermes auth` |
 | GitHub Copilot | Token | `COPILOT_GITHUB_TOKEN` |
 | Google Gemini | API key | `GOOGLE_API_KEY` or `GEMINI_API_KEY` |
 | DeepSeek | API key | `DEEPSEEK_API_KEY` |
@@ -334,7 +334,7 @@ When the user asks you to "review PR #N", "look at this PR", or gives you a PR U
 ### Step 1: Set up environment

 ```bash
-source ~/.hermes/skills/github/github-auth/scripts/gh-env.sh
+source "${HERMES_HOME:-$HOME/.hermes}/skills/github/github-auth/scripts/gh-env.sh"
 # Or run the inline setup block from the top of this skill
 ```

@@ -6,7 +6,7 @@ All requests need: `-H "Authorization: token $GITHUB_TOKEN"`

 Use the `gh-env.sh` helper to set `$GITHUB_TOKEN`, `$GH_OWNER`, `$GH_REPO` automatically:
 ```bash
-source ~/.hermes/skills/github/github-auth/scripts/gh-env.sh
+source "${HERMES_HOME:-$HOME/.hermes}/skills/github/github-auth/scripts/gh-env.sh"
 ```

 ## Repositories
@@ -98,7 +98,7 @@ def find_nearby(lat: float, lon: float, types: list[str], radius: int = 1500, li
        # Get coordinates (nodes have lat/lon directly, ways/relations use center)
        plat = el.get("lat") or (el.get("center", {}) or {}).get("lat")
        plon = el.get("lon") or (el.get("center", {}) or {}).get("lon")
-        if not plat or not plon:
+        if plat is None or plon is None:
            continue

        dist = haversine(lat, lon, plat, plon)
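The change from `not plat` to `plat is None` matters because `0.0` is a valid latitude or longitude (the equator, the prime meridian) but is falsy in Python. The sketch below contrasts the two checks. The helper names are hypothetical and not part of the skill. Note that the `el.get("lat") or ...` fallback on the context lines above would still replace a node-level `0.0` with the `center` lookup, so `coord` shows one possible way to extract the value with an explicit `None` check instead.

```python
def coord(el: dict, key: str):
    """Hypothetical helper: el[key], else el["center"][key], else None."""
    value = el.get(key)
    if value is None:  # explicit None check, so 0.0 is kept
        value = (el.get("center") or {}).get(key)
    return value


def skipped_by_truthiness(plat, plon) -> bool:
    # Old check: treats 0.0 the same as a missing coordinate.
    return not plat or not plon


def skipped_by_none_check(plat, plon) -> bool:
    # New check: skips only when a coordinate is actually missing.
    return plat is None or plon is None


if __name__ == "__main__":
    element = {"lat": 0.0, "lon": 12.5}  # illustrative point on the equator
    plat, plon = coord(element, "lat"), coord(element, "lon")
    print(skipped_by_truthiness(plat, plon), skipped_by_none_check(plat, plon))
```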
@@ -32,7 +32,7 @@ on CLI, Telegram, Discord, or any platform.
 Define a shorthand first:

 ```bash
-GSETUP="python ~/.hermes/skills/productivity/google-workspace/scripts/setup.py"
+GSETUP="python ${HERMES_HOME:-$HOME/.hermes}/skills/productivity/google-workspace/scripts/setup.py"
 ```

 ### Step 0: Check if already set up
@@ -163,7 +163,7 @@ Should print `AUTHENTICATED`. Setup is complete — token refreshes automaticall
 All commands go through the API script. Set `GAPI` as a shorthand:

 ```bash
-GAPI="python ~/.hermes/skills/productivity/google-workspace/scripts/google_api.py"
+GAPI="python ${HERMES_HOME:-$HOME/.hermes}/skills/productivity/google-workspace/scripts/google_api.py"
 ```

 ### Gmail
@@ -47,6 +47,13 @@ SCOPES = [
 ]


+def _normalize_authorized_user_payload(payload: dict) -> dict:
+    normalized = dict(payload)
+    if not normalized.get("type"):
+        normalized["type"] = "authorized_user"
+    return normalized
+
+
 def _ensure_authenticated():
    if not TOKEN_PATH.exists():
        print("Not authenticated. Run the setup script first:", file=sys.stderr)
@@ -170,7 +177,12 @@ def get_credentials():
    creds = Credentials.from_authorized_user_file(str(TOKEN_PATH), _stored_token_scopes())
    if creds.expired and creds.refresh_token:
        creds.refresh(Request())
-        TOKEN_PATH.write_text(creds.to_json())
+        TOKEN_PATH.write_text(
+            json.dumps(
+                _normalize_authorized_user_payload(json.loads(creds.to_json())),
+                indent=2,
+            )
+        )
    if not creds.valid:
        print("Token is invalid. Re-run setup.", file=sys.stderr)
        sys.exit(1)