fix(dashboard): persist chat tab state across tab switches

The dashboard's Chat tab (hermes dashboard --tui) lost its session
whenever the user navigated to another tab and came back. React Router
unmounted ChatPage on path change, which ran the cleanup function,
closed the PTY WebSocket, and terminated the underlying TUI child - so
the next mount generated a fresh channel id, spawned a new PTY, and
started a brand-new conversation.

Rather than rebuild the destroyed state (session id capture + resume
via HERMES_TUI_RESUME would reload history from disk but drop in-flight
tool state, scrollback, and picker position), keep the component tree
alive.

* Pull ChatPage out of Routes into a sibling always-mounted host that
  toggles visibility via display:none keyed off the current route. A
  tiny ChatRouteSink still claims /chat so the catch-all redirect does
  not fire.

* xterm instance, WebSocket, PTY child, and TUI/agent state all
  survive; returning to /chat shows the exact conversation the user
  left.

* Respect plugin `/chat` overrides: if a plugin manifest declares
  `tab.override: "/chat"`, the Routes tree already swaps the element
  for <PluginPage />; we additionally suppress the persistent host so
  the two don't paint on top of each other. Preserves the
  pre-persistence contract that a plugin owning /chat replaces the
  built-in chat UI entirely.

* Wait for usePlugins() to finish loading before mounting the
  persistent host. Manifests arrive asynchronously from
  /api/dashboard/plugins, so without the `!pluginsLoading` gate the
  host would mount with manifests=[], spawn a PTY, and then unmount
  mid-session when the manifest list resolves and reveals a /chat
  override. Typical delay is <50ms; worst case is the 2s
  plugin-registration safety timeout. Cheaper than killing someone's
  conversation underneath them.

* Gate page-header slot (`setEnd`), the mobile sheet's portalled
  render, and body-scroll lock on a new `isActive` prop so the hidden
  ChatPage doesn't fight the active page for shared state. The
  scroll-lock effect keys on the *derived* `mobilePanelOpen` (which is
  `isActive && mobilePanelOpenRaw`) rather than the raw state: that
  way a tab switch flips the dep to false, fires the cleanup, and
  releases `document.body.style.overflow`. Keying on the raw state
  would leave body.overflow="hidden" stuck on /sessions and every
  other tab until the user navigated back to /chat and explicitly
  closed the sheet.

* When isActive flips false to true, force a double-rAF fit:
  display:none collapses the host box and ResizeObserver does not fire
  on display changes, so xterm would otherwise stay at a stale or 1x1
  grid. Also early-return from syncTerminalMetrics when the host has
  zero area, since fit() on a zero-sized element produces a 1x1
  terminal.

* Focus handling on tab return: only steal focus into the terminal if
  focus wasn't already parked somewhere inside ChatPage (e.g. the
  sidebar model picker, a tool-call entry). Yanking focus away from
  whatever the user last clicked is surprising and a screen-reader
  foot-gun; the typical "first activation" case still focuses the
  terminal because document.activeElement is <body> at that point.

Trade-off worth flagging, deliberately not mitigated in this change:
while hidden, ChatPage still holds a PTY child + WebSocket + xterm
instance for the dashboard's full lifetime. The WS keeps delivering
bytes and xterm keeps parsing them into a display:none host (cheap: no
paint work, but not free). Reasonable costs to pay for the session
preservation; if they become a problem we can pause `term.write` when
!isActive or idle-disconnect after N minutes hidden.

Lint clean on touched files. tsc -b && vite build pass.

test(gateway): cover /compress summary-failure warning path
2026-04-28 02:40:25 -04:00 · 2026-04-27 19:18:13 -07:00
481 changed files with 50536 additions and 2451 deletions
@@ -69,3 +69,4 @@ mini-swe-agent/
 .nix-stamps/
 result
 website/static/api/skills-index.json
+models-dev-upstream/
@@ -30,18 +30,22 @@ WORKDIR /opt/hermes
 # unless the lockfiles themselves change.
 COPY package.json package-lock.json ./
 COPY web/package.json web/package-lock.json web/
+COPY ui-tui/package.json ui-tui/package-lock.json ui-tui/
+COPY ui-tui/packages/hermes-ink/package.json ui-tui/packages/hermes-ink/package-lock.json ui-tui/packages/hermes-ink/

 RUN npm install --prefer-offline --no-audit && \
    npx playwright install --with-deps chromium --only-shell && \
    (cd web && npm install --prefer-offline --no-audit) && \
+    (cd ui-tui && npm install --prefer-offline --no-audit) && \
    npm cache clean --force

 # ---------- Source code ----------
 # .dockerignore excludes node_modules, so the installs above survive.
 COPY --chown=hermes:hermes . .

-# Build web dashboard (Vite outputs to hermes_cli/web_dist/)
-RUN cd web && npm run build
+# Build browser dashboard and terminal UI assets.
+RUN cd web && npm run build && \
+    cd ../ui-tui && npm run build

 # ---------- Permissions ----------
 # Make install dir world-readable so any HERMES_UID can read it at runtime.
@@ -82,6 +82,8 @@ _PROVIDER_ALIASES = {
    "moonshot": "kimi-coding",
    "kimi-cn": "kimi-coding-cn",
    "moonshot-cn": "kimi-coding-cn",
+    "gmi-cloud": "gmi",
+    "gmicloud": "gmi",
    "minimax-china": "minimax-cn",
    "minimax_cn": "minimax-cn",
    "claude": "anthropic",
@@ -155,6 +157,7 @@ _API_KEY_PROVIDER_AUX_MODELS: Dict[str, str] = {
    "kimi-coding": "kimi-k2-turbo-preview",
    "stepfun": "step-3.5-flash",
    "kimi-coding-cn": "kimi-k2-turbo-preview",
+    "gmi": "google/gemini-3.1-flash-lite-preview",
    "minimax": "MiniMax-M2.7",
    "minimax-cn": "MiniMax-M2.7",
    "anthropic": "claude-haiku-4-5-20251001",
@@ -1617,8 +1620,14 @@ def _resolve_auto(main_runtime: Optional[Dict[str, Any]] = None) -> Tuple[Option
 # below — never look up auth env vars ad-hoc.


-def _to_async_client(sync_client, model: str):
-    """Convert a sync client to its async counterpart, preserving Codex routing."""
+def _to_async_client(sync_client, model: str, is_vision: bool = False):
+    """Convert a sync client to its async counterpart, preserving Codex routing.
+
+    When ``is_vision=True`` and the underlying base URL is Copilot, the
+    resulting async client carries the ``Copilot-Vision-Request: true``
+    header so the request is routed to Copilot's vision-capable
+    infrastructure (otherwise vision payloads silently time out).
+    """
    from openai import AsyncOpenAI

    if isinstance(sync_client, CodexAuxiliaryClient):
@@ -1647,9 +1656,11 @@ def _to_async_client(sync_client, model: str):
    if base_url_host_matches(sync_base_url, "openrouter.ai"):
        async_kwargs["default_headers"] = dict(_OR_HEADERS)
    elif base_url_host_matches(sync_base_url, "api.githubcopilot.com"):
-        from hermes_cli.models import copilot_default_headers
+        from hermes_cli.copilot_auth import copilot_request_headers

-        async_kwargs["default_headers"] = copilot_default_headers()
+        async_kwargs["default_headers"] = copilot_request_headers(
+            is_agent_turn=True, is_vision=is_vision
+        )
    elif base_url_host_matches(sync_base_url, "api.kimi.com"):
        async_kwargs["default_headers"] = {"User-Agent": "claude-code/0.1.0"}
    return AsyncOpenAI(**async_kwargs), model
@@ -1676,6 +1687,7 @@ def resolve_provider_client(
    explicit_api_key: str = None,
    api_mode: str = None,
    main_runtime: Optional[Dict[str, Any]] = None,
+    is_vision: bool = False,
 ) -> Tuple[Optional[Any], Optional[str]]:
    """Central router: given a provider name and optional model, return a
    configured client with the correct auth, base URL, and API format.
@@ -1759,7 +1771,7 @@ def resolve_provider_client(
                "auxiliary provider (using %r instead)", model, resolved)
            model = None
        final_model = model or resolved
-        return (_to_async_client(client, final_model) if async_mode
+        return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
                else (client, final_model))

    # ── OpenRouter ───────────────────────────────────────────────────
@@ -1772,7 +1784,7 @@ def resolve_provider_client(
            )
            return None, None
        final_model = _normalize_resolved_model(model or default, provider)
-        return (_to_async_client(client, final_model) if async_mode
+        return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
                else (client, final_model))

    # ── Nous Portal (OAuth) ──────────────────────────────────────────
@@ -1789,7 +1801,7 @@ def resolve_provider_client(
                           "but Nous Portal not configured (run: hermes auth)")
            return None, None
        final_model = _normalize_resolved_model(model or default, provider)
-        return (_to_async_client(client, final_model) if async_mode
+        return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
                else (client, final_model))

    # ── OpenAI Codex (OAuth → Responses API) ─────────────────────────
@@ -1816,7 +1828,7 @@ def resolve_provider_client(
                           "but no Codex OAuth token found (run: hermes model)")
            return None, None
        final_model = _normalize_resolved_model(model or default, provider)
-        return (_to_async_client(client, final_model) if async_mode
+        return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
                else (client, final_model))

    # ── Custom endpoint (OPENAI_BASE_URL + OPENAI_API_KEY) ───────────
@@ -1845,11 +1857,13 @@ def resolve_provider_client(
            if base_url_host_matches(custom_base, "api.kimi.com"):
                extra["default_headers"] = {"User-Agent": "claude-code/0.1.0"}
            elif base_url_host_matches(custom_base, "api.githubcopilot.com"):
-                from hermes_cli.models import copilot_default_headers
-                extra["default_headers"] = copilot_default_headers()
+                from hermes_cli.copilot_auth import copilot_request_headers
+                extra["default_headers"] = copilot_request_headers(
+                    is_agent_turn=True, is_vision=is_vision
+                )
            client = OpenAI(api_key=custom_key, base_url=_clean_base, **extra)
            client = _wrap_if_needed(client, final_model, custom_base)
-            return (_to_async_client(client, final_model) if async_mode
+            return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
                    else (client, final_model))
        # Try custom first, then codex, then API-key providers
        for try_fn in (_try_custom_endpoint, _try_codex,
@@ -1859,7 +1873,7 @@ def resolve_provider_client(
                final_model = _normalize_resolved_model(model or default, provider)
                _cbase = str(getattr(client, "base_url", "") or "")
                client = _wrap_if_needed(client, final_model, _cbase)
-                return (_to_async_client(client, final_model) if async_mode
+                return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
                        else (client, final_model))
        logger.warning("resolve_provider_client: custom/main requested "
                       "but no endpoint credentials found")
@@ -1904,7 +1918,7 @@ def resolve_provider_client(
                            provider,
                        )
                        client = OpenAI(api_key=custom_key, base_url=_clean_base2, **_extra2)
-                        return (_to_async_client(client, final_model) if async_mode
+                        return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
                                else (client, final_model))
                    sync_anthropic = AnthropicAuxiliaryClient(
                        real_client, final_model, custom_key, custom_base, is_oauth=False,
@@ -1923,7 +1937,7 @@ def resolve_provider_client(
                    client = CodexAuxiliaryClient(client, final_model)
                else:
                    client = _wrap_if_needed(client, final_model, custom_base)
-                return (_to_async_client(client, final_model) if async_mode
+                return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
                        else (client, final_model))
            logger.warning(
                "resolve_provider_client: named custom provider %r has no base_url",
@@ -1955,7 +1969,7 @@ def resolve_provider_client(
                logger.warning("resolve_provider_client: anthropic requested but no Anthropic credentials found")
                return None, None
            final_model = _normalize_resolved_model(model or default_model, provider)
-            return (_to_async_client(client, final_model) if async_mode else (client, final_model))
+            return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode else (client, final_model))

        creds = resolve_api_key_provider_credentials(provider)
        api_key = str(creds.get("api_key", "")).strip()
@@ -1981,7 +1995,7 @@ def resolve_provider_client(
            if is_native_gemini_base_url(base_url):
                client = GeminiNativeClient(api_key=api_key, base_url=base_url)
                logger.debug("resolve_provider_client: %s (%s)", provider, final_model)
-                return (_to_async_client(client, final_model) if async_mode
+                return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
                        else (client, final_model))

        # Provider-specific headers
@@ -1989,9 +2003,11 @@ def resolve_provider_client(
        if base_url_host_matches(base_url, "api.kimi.com"):
            headers["User-Agent"] = "claude-code/0.1.0"
        elif base_url_host_matches(base_url, "api.githubcopilot.com"):
-            from hermes_cli.models import copilot_default_headers
+            from hermes_cli.copilot_auth import copilot_request_headers

-            headers.update(copilot_default_headers())
+            headers.update(copilot_request_headers(
+                is_agent_turn=True, is_vision=is_vision
+            ))
        client = OpenAI(api_key=api_key, base_url=base_url,
                        **({"default_headers": headers} if headers else {}))

@@ -2017,7 +2033,7 @@ def resolve_provider_client(
        client = _wrap_if_needed(client, final_model, base_url)

        logger.debug("resolve_provider_client: %s (%s)", provider, final_model)
-        return (_to_async_client(client, final_model) if async_mode
+        return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
                else (client, final_model))

    if pconfig.auth_type == "external_process":
@@ -2049,7 +2065,7 @@ def resolve_provider_client(
                args=args,
            )
            logger.debug("resolve_provider_client: %s (%s)", provider, final_model)
-            return (_to_async_client(client, final_model) if async_mode
+            return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
                    else (client, final_model))
        logger.warning("resolve_provider_client: external-process provider %s not "
                       "directly supported", provider)
@@ -2085,7 +2101,7 @@ def resolve_provider_client(
            base_url=f"https://bedrock-runtime.{region}.amazonaws.com",
        )
        logger.debug("resolve_provider_client: bedrock (%s, %s)", final_model, region)
-        return (_to_async_client(client, final_model) if async_mode
+        return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
                else (client, final_model))

    elif pconfig.auth_type in ("oauth_device_code", "oauth_external"):
@@ -2160,8 +2176,13 @@ def _normalize_vision_provider(provider: Optional[str]) -> str:
    return _normalize_aux_provider(provider)


-def _resolve_strict_vision_backend(provider: str) -> Tuple[Optional[Any], Optional[str]]:
+def _resolve_strict_vision_backend(
+    provider: str,
+    model: Optional[str] = None,
+) -> Tuple[Optional[Any], Optional[str]]:
    provider = _normalize_vision_provider(provider)
+    if provider == "copilot":
+        return resolve_provider_client("copilot", model, is_vision=True)
    if provider == "openrouter":
        return _try_openrouter()
    if provider == "nous":
@@ -2229,7 +2250,7 @@ def resolve_vision_provider_client(
            return resolved_provider, None, None
        final_model = resolved_model or default_model
        if async_mode:
-            async_client, async_model = _to_async_client(sync_client, final_model)
+            async_client, async_model = _to_async_client(sync_client, final_model, is_vision=True)
            return resolved_provider, async_client, async_model
        return resolved_provider, sync_client, final_model

@@ -2261,8 +2282,11 @@ def resolve_vision_provider_client(
        main_provider = _read_main_provider()
        main_model = _read_main_model()
        if main_provider and main_provider not in ("auto", ""):
+            vision_model = _PROVIDER_VISION_MODELS.get(main_provider, main_model)
            if main_provider == "nous":
-                sync_client, default_model = _resolve_strict_vision_backend(main_provider)
+                sync_client, default_model = _resolve_strict_vision_backend(
+                    main_provider, vision_model
+                )
                if sync_client is not None:
                    logger.info(
                        "Vision auto-detect: using main provider %s (%s)",
@@ -2270,10 +2294,10 @@ def resolve_vision_provider_client(
                    )
                    return _finalize(main_provider, sync_client, default_model)
            else:
-                vision_model = _PROVIDER_VISION_MODELS.get(main_provider, main_model)
                rpc_client, rpc_model = resolve_provider_client(
                    main_provider, vision_model,
-                    api_mode=resolved_api_mode)
+                    api_mode=resolved_api_mode,
+                    is_vision=True)
                if rpc_client is not None:
                    logger.info(
                        "Vision auto-detect: using main provider %s (%s)",
@@ -2295,11 +2319,14 @@ def resolve_vision_provider_client(
        return None, None, None

    if requested in _VISION_AUTO_PROVIDER_ORDER:
-        sync_client, default_model = _resolve_strict_vision_backend(requested)
+        sync_client, default_model = _resolve_strict_vision_backend(
+            requested, resolved_model
+        )
        return _finalize(requested, sync_client, default_model)

    client, final_model = _get_cached_client(requested, resolved_model, async_mode,
-                                             api_mode=resolved_api_mode)
+                                             api_mode=resolved_api_mode,
+                                             is_vision=True)
    if client is None:
        return requested, None, None
    return requested, client, final_model
@@ -2363,10 +2390,11 @@ def _client_cache_key(
    api_key: Optional[str] = None,
    api_mode: Optional[str] = None,
    main_runtime: Optional[Dict[str, Any]] = None,
+    is_vision: bool = False,
 ) -> tuple:
    runtime = _normalize_main_runtime(main_runtime)
    runtime_key = tuple(runtime.get(field, "") for field in _MAIN_RUNTIME_FIELDS) if provider == "auto" else ()
-    return (provider, async_mode, base_url or "", api_key or "", api_mode or "", runtime_key)
+    return (provider, async_mode, base_url or "", api_key or "", api_mode or "", runtime_key, is_vision)
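A minimal sketch of why `is_vision` needs to be part of the cache key, using a toy cache rather than the project's `_client_cache`. The names `toy_cache_key`, `toy_get_client`, and the `X-Example-Client` header are hypothetical; only the `Copilot-Vision-Request: true` header comes from the `_to_async_client` docstring. Without the extra key field, a text client cached first would be handed back for a later vision call, and the vision header would never be sent.

```python
from typing import Dict, Optional

_toy_cache: Dict[tuple, dict] = {}


def toy_cache_key(provider: str, async_mode: bool,
                  base_url: Optional[str] = None,
                  is_vision: bool = False) -> tuple:
    # Hypothetical, reduced key: the real one also carries api_key,
    # api_mode, and a runtime tuple.
    return (provider, async_mode, base_url or "", is_vision)


def toy_get_client(provider: str, async_mode: bool = False,
                   is_vision: bool = False) -> dict:
    """Return a cached stand-in 'client' (just a dict of default headers)."""
    key = toy_cache_key(provider, async_mode, is_vision=is_vision)
    if key not in _toy_cache:
        headers = {"X-Example-Client": "toy"}  # placeholder header
        if is_vision:
            headers["Copilot-Vision-Request"] = "true"
        _toy_cache[key] = {"default_headers": headers}
    return _toy_cache[key]


text_client = toy_get_client("copilot")
vision_client = toy_get_client("copilot", is_vision=True)

# Two distinct cache entries, so the vision header is neither lost nor leaked.
print(text_client is vision_client)  # False
print("Copilot-Vision-Request" in vision_client["default_headers"])  # True
```

The cost of this design is one extra cached client per provider that is used for both text and vision, which seems small next to a silently timing-out vision request.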


 def _store_cached_client(cache_key: tuple, client: Any, default_model: Optional[str], *, bound_loop: Any = None) -> None:
@@ -2392,6 +2420,7 @@ def _refresh_nous_auxiliary_client(
    api_key: Optional[str] = None,
    api_mode: Optional[str] = None,
    main_runtime: Optional[Dict[str, Any]] = None,
+    is_vision: bool = False,
 ) -> Tuple[Optional[Any], Optional[str]]:
    """Refresh Nous runtime creds, rebuild the client, and replace the cache entry."""
    runtime = _resolve_nous_runtime_api(force_refresh=True)
@@ -2409,7 +2438,7 @@ def _refresh_nous_auxiliary_client(
            current_loop = _aio.get_event_loop()
        except RuntimeError:
            pass
-        client, final_model = _to_async_client(sync_client, final_model or "")
+        client, final_model = _to_async_client(sync_client, final_model or "", is_vision=is_vision)
    else:
        client = sync_client

@@ -2420,6 +2449,7 @@ def _refresh_nous_auxiliary_client(
        api_key=api_key,
        api_mode=api_mode,
        main_runtime=main_runtime,
+        is_vision=is_vision,
    )
    _store_cached_client(cache_key, client, final_model, bound_loop=current_loop)
    return client, final_model
@@ -2531,12 +2561,19 @@ def _is_openrouter_client(client: Any) -> bool:
    return False


+def _cached_client_accepts_slash_models(client: Any, cached_default: Optional[str]) -> bool:
+    """Best-effort check for cached clients that accept ``vendor/model`` IDs."""
+    if _is_openrouter_client(client):
+        return True
+    return bool(cached_default and "/" in cached_default)
+
+
 def _compat_model(client: Any, model: Optional[str], cached_default: Optional[str]) -> Optional[str]:
-    """Drop OpenRouter-format model slugs (with '/') for non-OpenRouter clients.
+    """Keep slash-bearing model IDs only for cached clients that support them.

    Mirrors the guard in resolve_provider_client() which is skipped on cache hits.
    """
-    if model and "/" in model and not _is_openrouter_client(client):
+    if model and "/" in model and not _cached_client_accepts_slash_models(client, cached_default):
        return cached_default
    return model or cached_default
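The relaxed guard can be exercised in isolation. The sketch below re-implements the two helpers against a hypothetical `ToyClient` and a simplified `is_openrouter_client` check (the real `_is_openrouter_client` may inspect the client differently); `vendor/model-b` is a made-up model ID used only for illustration.

```python
from typing import Any, Optional


class ToyClient:
    """Hypothetical client stand-in exposing only a base_url."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url


def is_openrouter_client(client: Any) -> bool:
    # Simplified assumption: detect OpenRouter by host substring.
    return "openrouter.ai" in str(getattr(client, "base_url", "") or "")


def accepts_slash_models(client: Any, cached_default: Optional[str]) -> bool:
    if is_openrouter_client(client):
        return True
    # A cached default that itself contains "/" suggests the provider
    # accepts vendor/model IDs.
    return bool(cached_default and "/" in cached_default)


def compat_model(client: Any, model: Optional[str],
                 cached_default: Optional[str]) -> Optional[str]:
    if model and "/" in model and not accepts_slash_models(client, cached_default):
        return cached_default
    return model or cached_default


slash_provider = ToyClient("https://example.invalid/v1")
plain_provider = ToyClient("https://example.invalid/v1")

# Default has a slash, so a slash-bearing override is kept.
print(compat_model(slash_provider, "vendor/model-b",
                   "google/gemini-3.1-flash-lite-preview"))  # vendor/model-b
# Default has no slash, so the slash-bearing override is dropped.
print(compat_model(plain_provider, "vendor/model-b",
                   "claude-haiku-4-5-20251001"))  # claude-haiku-4-5-20251001
```

The heuristic is best-effort, as the docstring says: a provider whose default happens to lack a slash would still have slash-bearing overrides replaced by the default.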

@@ -2549,6 +2586,7 @@ def _get_cached_client(
    api_key: str = None,
    api_mode: str = None,
    main_runtime: Optional[Dict[str, Any]] = None,
+    is_vision: bool = False,
 ) -> Tuple[Optional[Any], Optional[str]]:
    """Get or create a cached client for the given provider.

@@ -2585,6 +2623,7 @@ def _get_cached_client(
        api_key=api_key,
        api_mode=api_mode,
        main_runtime=main_runtime,
+        is_vision=is_vision,
    )
    with _client_cache_lock:
        if cache_key in _client_cache:
@@ -2616,6 +2655,7 @@ def _get_cached_client(
        explicit_api_key=api_key,
        api_mode=api_mode,
        main_runtime=runtime,
+        is_vision=is_vision,
    )
    if client is not None:
        # For async clients, remember which loop they were created on so we
@@ -3079,6 +3119,7 @@ def call_llm(
                api_key=resolved_api_key,
                api_mode=resolved_api_mode,
                main_runtime=main_runtime,
+                is_vision=(task == "vision"),
            )
            if refreshed_client is not None:
                logger.info("Auxiliary %s: refreshed Nous runtime credentials after 401, retrying",
@@ -3369,6 +3410,7 @@ async def async_call_llm(
                base_url=resolved_base_url,
                api_key=resolved_api_key,
                api_mode=resolved_api_mode,
+                is_vision=(task == "vision"),
            )
            if refreshed_client is not None:
                logger.info("Auxiliary %s (async): refreshed Nous runtime credentials after 401, retrying",
@@ -3437,7 +3479,9 @@ async def async_call_llm(
                    extra_body=effective_extra_body,
                    base_url=str(getattr(fb_client, "base_url", "") or ""))
                # Convert sync fallback client to async
-                async_fb, async_fb_model = _to_async_client(fb_client, fb_model or "")
+                async_fb, async_fb_model = _to_async_client(
+                    fb_client, fb_model or "", is_vision=(task == "vision")
+                )
                if async_fb_model and async_fb_model != fb_kwargs.get("model"):
                    fb_kwargs["model"] = async_fb_model
                return _validate_llm_response(
@@ -61,9 +61,52 @@ _PRUNED_TOOL_PLACEHOLDER = "[Old tool output cleared to save context space]"

 # Chars per token rough estimate
 _CHARS_PER_TOKEN = 4
+# Flat token cost per attached image part.  Real cost varies by provider and
+# dimensions (Anthropic ≈ width×height/750, GPT-4o up to ~1700 for
+# high-detail 2048×2048, Gemini 258/tile), but 1600 is a realistic ceiling
+# that keeps compression budgeting honest for multi-image conversations.
+# Matches Claude Code's IMAGE_TOKEN_ESTIMATE constant.
+_IMAGE_TOKEN_ESTIMATE = 1600
+# Same figure expressed in the char-budget currency the rest of the
+# compressor speaks in.  Used when accumulating message "content length"
+# for tail-cut decisions.
+_IMAGE_CHAR_EQUIVALENT = _IMAGE_TOKEN_ESTIMATE * _CHARS_PER_TOKEN
 _SUMMARY_FAILURE_COOLDOWN_SECONDS = 600


+def _content_length_for_budget(raw_content: Any) -> int:
+    """Return the effective char-length of a message's content for token budgeting.
+
+    Plain strings: ``len(content)``. Multimodal lists: sum of text-part
+    ``len(text)`` plus a flat ``_IMAGE_CHAR_EQUIVALENT`` per image part
+    (``image_url`` / ``input_image`` / Anthropic-style ``image``). This
+    keeps the compressor from treating a turn with 5 attached images as
+    near-zero tokens just because the text part is empty.
+    """
+    if isinstance(raw_content, str):
+        return len(raw_content)
+    if not isinstance(raw_content, list):
+        return len(str(raw_content or ""))
+
+    total = 0
+    for p in raw_content:
+        if isinstance(p, str):
+            total += len(p)
+            continue
+        if not isinstance(p, dict):
+            total += len(str(p))
+            continue
+        ptype = p.get("type")
+        if ptype in {"image_url", "input_image", "image"}:
+            total += _IMAGE_CHAR_EQUIVALENT
+        else:
+            # text / input_text / tool_result-with-text / anything else with
+            # a text field.  Ignore the raw base64 payload inside image_url
+            # dicts — dimensions don't matter, only whether it's an image.
+            total += len(p.get("text", "") or "")
+    return total
+
+
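A rough worked example of the budgeting arithmetic, as a simplified re-implementation for illustration rather than the project's module. The constants mirror the values in the hunk (1600 tokens × 4 chars = 6400 chars per image); the function names without a leading underscore and the `data:...` URL are placeholders.

```python
from typing import Any

CHARS_PER_TOKEN = 4
IMAGE_TOKEN_ESTIMATE = 1600
IMAGE_CHAR_EQUIVALENT = IMAGE_TOKEN_ESTIMATE * CHARS_PER_TOKEN  # 6400 chars per image

IMAGE_PART_TYPES = {"image_url", "input_image", "image"}


def content_length_for_budget(content: Any) -> int:
    """Text parts count by length, image parts by a flat char cost."""
    if isinstance(content, str):
        return len(content)
    if not isinstance(content, list):
        return len(str(content or ""))
    total = 0
    for part in content:
        if isinstance(part, dict) and part.get("type") in IMAGE_PART_TYPES:
            total += IMAGE_CHAR_EQUIVALENT
        elif isinstance(part, dict):
            total += len(part.get("text", "") or "")
        else:
            total += len(str(part))
    return total


def estimate_tokens(content: Any) -> int:
    # Same shape as the compressor's per-message estimate: chars // 4 + 10.
    return content_length_for_budget(content) // CHARS_PER_TOKEN + 10


five_images = [{"type": "image_url", "image_url": {"url": "data:..."}}] * 5

# Previous behaviour: only text parts were summed, so images counted as zero.
old_estimate = sum(len(p.get("text", "")) for p in five_images) // CHARS_PER_TOKEN + 10
new_estimate = estimate_tokens(five_images)
print(old_estimate, new_estimate)  # 10 8010
```

Five images with no text go from an estimate of 10 tokens to 8010 (5 × 6400 chars ÷ 4, plus the fixed 10), which is the gap the tail-cut logic previously could not see.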
 def _content_text_for_contains(content: Any) -> str:
    """Return a best-effort text view of message content.

@@ -295,6 +338,8 @@ class ContextCompressor(ContextEngine):
        self._context_probe_persistable = False
        self._previous_summary = None
        self._last_summary_error = None
+        self._last_summary_dropped_count = 0
+        self._last_summary_fallback_used = False
        self._last_compression_savings_pct = 100.0
        self._ineffective_compression_count = 0

@@ -398,6 +443,11 @@ class ContextCompressor(ContextEngine):
        self._ineffective_compression_count: int = 0
        self._summary_failure_cooldown_until: float = 0.0
        self._last_summary_error: Optional[str] = None
+        # When summary generation fails and a static fallback is inserted,
+        # record how many turns were unrecoverably dropped so callers
+        # (gateway hygiene, /compress) can surface a visible warning.
+        self._last_summary_dropped_count: int = 0
+        self._last_summary_fallback_used: bool = False

    def update_from_response(self, usage: Dict[str, Any]):
        """Update tracked token usage from API response."""
@@ -484,7 +534,7 @@ class ContextCompressor(ContextEngine):
            for i in range(len(result) - 1, -1, -1):
                msg = result[i]
                raw_content = msg.get("content") or ""
-                content_len = sum(len(p.get("text", "")) for p in raw_content) if isinstance(raw_content, list) else len(raw_content)
+                content_len = _content_length_for_budget(raw_content)
                msg_tokens = content_len // _CHARS_PER_TOKEN + 10
                for tc in msg.get("tool_calls") or []:
                    if isinstance(tc, dict):
@@ -1082,8 +1132,9 @@ The user has requested that this compaction PRIORITISE preserving all informatio

        for i in range(n - 1, head_end - 1, -1):
            msg = messages[i]
-            content = msg.get("content") or ""
-            msg_tokens = len(content) // _CHARS_PER_TOKEN + 10  # +10 for role/metadata
+            raw_content = msg.get("content") or ""
+            content_len = _content_length_for_budget(raw_content)
+            msg_tokens = content_len // _CHARS_PER_TOKEN + 10  # +10 for role/metadata
            # Include tool call arguments in estimate
            for tc in msg.get("tool_calls") or []:
                if isinstance(tc, dict):
@@ -1152,6 +1203,11 @@ The user has requested that this compaction PRIORITISE preserving all informatio
                related to this topic and be more aggressive about compressing
                everything else.  Inspired by Claude Code's ``/compact``.
        """
+        # Reset per-call summary failure state — callers inspect these fields
+        # after compress() returns to decide whether to surface a warning.
+        self._last_summary_dropped_count = 0
+        self._last_summary_fallback_used = False
+        self._last_summary_error = None
        n_messages = len(messages)
        # Only need head + 3 tail messages minimum (token budget decides the real tail size)
        _min_for_compress = self.protect_first_n + 3 + 1
@@ -1230,11 +1286,13 @@ The user has requested that this compaction PRIORITISE preserving all informatio
            if not self.quiet_mode:
                logger.warning("Summary generation failed — inserting static fallback context marker")
            n_dropped = compress_end - compress_start
+            self._last_summary_dropped_count = n_dropped
+            self._last_summary_fallback_used = True
            summary = (
                f"{SUMMARY_PREFIX}\n"
-                f"Summary generation was unavailable. {n_dropped} conversation turns were "
+                f"Summary generation was unavailable. {n_dropped} message(s) were "
                f"removed to free context space but could not be summarized. The removed "
-                f"turns contained earlier work in this session. Continue based on the "
+                f"messages contained earlier work in this session. Continue based on the "
                f"recent messages below and the current state of any files or resources."
            )
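A sketch of how a caller might read the two new status fields after `compress()` returns. `ToyCompressor` and `summary_warning` are hypothetical stand-ins, and the warning wording is invented here; only the field names `_last_summary_dropped_count` and `_last_summary_fallback_used`, and the reset-at-start behaviour, come from the diff.

```python
from typing import List, Optional


class ToyCompressor:
    """Hypothetical stand-in that mimics only the two status fields."""

    def __init__(self, summary_available: bool) -> None:
        self._summary_available = summary_available
        self._last_summary_dropped_count = 0
        self._last_summary_fallback_used = False

    def compress(self, messages: List[dict], keep_tail: int = 2) -> List[dict]:
        # Reset per-call state first, so a stale failure is never reported.
        self._last_summary_dropped_count = 0
        self._last_summary_fallback_used = False
        dropped = messages[:-keep_tail]
        if not self._summary_available and dropped:
            self._last_summary_dropped_count = len(dropped)
            self._last_summary_fallback_used = True
        return messages[-keep_tail:]


def summary_warning(compressor: ToyCompressor) -> Optional[str]:
    """Build a user-visible warning, or None if no fallback was used."""
    if not compressor._last_summary_fallback_used:
        return None
    n = compressor._last_summary_dropped_count
    return f"Summary generation failed; {n} message(s) were dropped unsummarized."


history = [{"role": "user", "content": str(i)} for i in range(6)]
failing = ToyCompressor(summary_available=False)
failing.compress(history)
print(summary_warning(failing))
# Summary generation failed; 4 message(s) were dropped unsummarized.
```

Resetting the fields at the top of each call matters for the caller: a successful compression after a failed one then reports no warning.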

@@ -14,6 +14,7 @@ from datetime import datetime
 from typing import Any, Dict, List, Optional, Set, Tuple

 from hermes_constants import OPENROUTER_BASE_URL
+from hermes_cli.config import get_env_value
 import hermes_cli.auth as auth_mod
 from hermes_cli.auth import (
    CODEX_ACCESS_TOKEN_REFRESH_SKEW_SECONDS,
@@ -1273,7 +1274,8 @@ def _seed_from_env(provider: str, entries: List[PooledCredential]) -> Tuple[bool
        def _is_source_suppressed(_p, _s):  # type: ignore[misc]
            return False
    if provider == "openrouter":
-        token = os.getenv("OPENROUTER_API_KEY", "").strip()
+        # Check both os.environ and ~/.hermes/.env file
+        token = (get_env_value("OPENROUTER_API_KEY") or "").strip()
        if token:
            source = "env:OPENROUTER_API_KEY"
            if _is_source_suppressed(provider, source):
@@ -1299,7 +1301,7 @@ def _seed_from_env(provider: str, entries: List[PooledCredential]) -> Tuple[bool

    env_url = ""
    if pconfig.base_url_env_var:
-        env_url = os.getenv(pconfig.base_url_env_var, "").strip().rstrip("/")
+        env_url = (get_env_value(pconfig.base_url_env_var) or "").strip().rstrip("/")

    env_vars = list(pconfig.api_key_env_vars)
    if provider == "anthropic":
@@ -1310,7 +1312,8 @@ def _seed_from_env(provider: str, entries: List[PooledCredential]) -> Tuple[bool
        ]

    for env_var in env_vars:
-        token = os.getenv(env_var, "").strip()
+        # Check both os.environ and ~/.hermes/.env file
+        token = (get_env_value(env_var) or "").strip()
        if not token:
            continue
        source = f"env:{env_var}"
@@ -42,6 +42,7 @@ class FailoverReason(enum.Enum):
    # Context / payload
    context_overflow = "context_overflow"  # Context too large — compress, not failover
    payload_too_large = "payload_too_large"  # 413 — compress payload
+    image_too_large = "image_too_large"   # Native image part exceeds provider's per-image limit — shrink and retry

    # Model
    model_not_found = "model_not_found"  # 404 or invalid model — fallback to different model
@@ -147,6 +148,20 @@ _PAYLOAD_TOO_LARGE_PATTERNS = [
    "error code: 413",
 ]

+# Image-size patterns.  Matched against 400 bodies (not 413) because most
+# providers return a 400 with a specific image-too-big message before the
+# whole request hits the 413 size limit.  Anthropic's wording is the most
+# important here (hard 5 MB per image, returned as
+# "messages.N.content.K.image.source.base64: image exceeds 5 MB maximum").
+_IMAGE_TOO_LARGE_PATTERNS = [
+    "image exceeds",        # Anthropic: "image exceeds 5 MB maximum"
+    "image too large",      # generic
+    "image_too_large",      # error_code variant
+    "image size exceeds",   # variant
+    # "request_too_large" on a request known to contain an image → image is
+    # the likely culprit; we still try the shrink path before giving up.
+]
+
 # Context overflow patterns
 _CONTEXT_OVERFLOW_PATTERNS = [
    "context length",
@@ -671,6 +686,15 @@ def _classify_400(
 ) -> ClassifiedError:
    """Classify 400 Bad Request — context overflow, format error, or generic."""

+    # Image-too-large from 400 (Anthropic's 5 MB per-image check fires this way).
+    # Must be checked BEFORE context_overflow because messages can trip both
+    # patterns ("exceeds" + "image") and image-shrink is a cheaper recovery.
+    if any(p in error_msg for p in _IMAGE_TOO_LARGE_PATTERNS):
+        return result_fn(
+            FailoverReason.image_too_large,
+            retryable=True,
+        )
+
    # Context overflow from 400
    if any(p in error_msg for p in _CONTEXT_OVERFLOW_PATTERNS):
        return result_fn(
@@ -798,6 +822,13 @@ def _classify_by_message(
            should_compress=True,
        )

+    # Image-too-large patterns (from message text when no status_code)
+    if any(p in error_msg for p in _IMAGE_TOO_LARGE_PATTERNS):
+        return result_fn(
+            FailoverReason.image_too_large,
+            retryable=True,
+        )
+
    # Usage-limit patterns need the same disambiguation as 402: some providers
    # surface "usage limit" errors without an HTTP status code.  A transient
    # signal ("try again", "resets at", …) means it's a periodic quota, not
@@ -0,0 +1,236 @@
+"""Routing helpers for inbound user-attached images.
+
+Two modes:
+
+  native  — attach images as OpenAI-style ``image_url`` content parts on the
+            user turn. Provider adapters (Anthropic, Gemini, Bedrock, Codex,
+            OpenAI chat.completions) already translate these into their
+            vendor-specific multimodal formats.
+
+  text    — run ``vision_analyze`` on each image up-front and prepend the
+            description to the user's text. The model never sees the pixels;
+            it only sees a lossy text summary. This is the pre-existing
+            behaviour and still the right choice for non-vision models.
+
+The decision is made once per message turn by :func:`decide_image_input_mode`.
+It reads ``agent.image_input_mode`` from config.yaml (``auto`` | ``native``
+| ``text``, default ``auto``) and the active model's capability metadata.
+
+In ``auto`` mode:
+  - If the user has explicitly configured ``auxiliary.vision`` (a
+    ``provider`` other than ``auto``/empty, or any ``model`` or
+    ``base_url``), we assume they want the text pipeline regardless of
+    the main model: they've opted in to a specific vision backend for a
+    reason (cost, quality, local-only, etc.).
+  - Otherwise, if the active model reports ``supports_vision=True`` in its
+    models.dev metadata, we attach natively.
+  - Otherwise (non-vision model, no explicit override), we fall back to text.
+
+This keeps ``vision_analyze`` surfaced as a tool in every session — skills
+and agent flows that chain it (browser screenshots, deeper inspection of
+URL-referenced images, style-gating loops) keep working. The routing only
+affects *how user-attached images on the current turn* are presented to the
+main model.
+"""
+
+from __future__ import annotations
+
+import base64
+import logging
+import mimetypes
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple
+
+logger = logging.getLogger(__name__)
+
+
+_VALID_MODES = frozenset({"auto", "native", "text"})
+
+
+def _coerce_mode(raw: Any) -> str:
+    """Normalize a config value into one of the valid modes."""
+    if not isinstance(raw, str):
+        return "auto"
+    val = raw.strip().lower()
+    if val in _VALID_MODES:
+        return val
+    return "auto"
+
+
+def _explicit_aux_vision_override(cfg: Optional[Dict[str, Any]]) -> bool:
+    """True when the user configured a specific auxiliary vision backend.
+
+    An explicit override means the user *wants* the text pipeline (they're
+    paying for a dedicated vision model), so we don't silently bypass it.
+    """
+    if not isinstance(cfg, dict):
+        return False
+    aux = cfg.get("auxiliary") or {}
+    if not isinstance(aux, dict):
+        return False
+    vision = aux.get("vision") or {}
+    if not isinstance(vision, dict):
+        return False
+
+    provider = str(vision.get("provider") or "").strip().lower()
+    model = str(vision.get("model") or "").strip()
+    base_url = str(vision.get("base_url") or "").strip()
+
+    # "auto" / "" / blank = not explicit
+    if provider in ("", "auto") and not model and not base_url:
+        return False
+    return True
+
+
+def _lookup_supports_vision(provider: str, model: str) -> Optional[bool]:
+    """Return True/False if we can resolve caps, None if unknown."""
+    if not provider or not model:
+        return None
+    try:
+        from agent.models_dev import get_model_capabilities
+        caps = get_model_capabilities(provider, model)
+    except Exception as exc:  # pragma: no cover - defensive
+        logger.debug("image_routing: caps lookup failed for %s:%s — %s", provider, model, exc)
+        return None
+    if caps is None:
+        return None
+    return bool(caps.supports_vision)
+
+
+def decide_image_input_mode(
+    provider: str,
+    model: str,
+    cfg: Optional[Dict[str, Any]],
+) -> str:
+    """Return ``"native"`` or ``"text"`` for the given turn.
+
+    Args:
+      provider: active inference provider ID (e.g. ``"anthropic"``, ``"openrouter"``).
+      model:    active model slug as it would be sent to the provider.
+      cfg:      loaded config.yaml dict, or None. When None, behaves as auto.
+    """
+    mode_cfg = "auto"
+    if isinstance(cfg, dict):
+        agent_cfg = cfg.get("agent") or {}
+        if isinstance(agent_cfg, dict):
+            mode_cfg = _coerce_mode(agent_cfg.get("image_input_mode"))
+
+    if mode_cfg == "native":
+        return "native"
+    if mode_cfg == "text":
+        return "text"
+
+    # auto
+    if _explicit_aux_vision_override(cfg):
+        return "text"
+
+    supports = _lookup_supports_vision(provider, model)
+    if supports is True:
+        return "native"
+    return "text"
+
+
+# Image size handling is REACTIVE rather than proactive: we attempt native
+# attachment at full size regardless of provider, and rely on
+# ``run_agent._try_shrink_image_parts_in_messages`` to shrink + retry if
+# the provider rejects the request (e.g. Anthropic's hard 5 MB per-image
+# ceiling returned as HTTP 400 "image exceeds 5 MB maximum").
+#
+# Why reactive: our knowledge of provider ceilings is partial and evolving
+# (OpenAI accepts 49 MB+, Anthropic 5 MB, Gemini 100 MB, others unknown).
+# A proactive per-provider table would be stale the moment a provider raises
+# or lowers its limit, and silently degrading quality for users on providers
+# that would have accepted the full image is the worse failure mode.
+# The shrink-on-reject path loses 1 API call + maybe 1s of Pillow work when
+# it fires, which is cheaper than permanent quality loss.
+
+
+def _guess_mime(path: Path) -> str:
+    mime, _ = mimetypes.guess_type(str(path))
+    if mime and mime.startswith("image/"):
+        return mime
+    # mimetypes on some Linux distros mis-maps .jpg, so fall back to a
+    # suffix table and default to JPEG for any unrecognized suffix.
+    suffix = path.suffix.lower()
+    return {
+        ".jpg": "image/jpeg",
+        ".jpeg": "image/jpeg",
+        ".png": "image/png",
+        ".gif": "image/gif",
+        ".webp": "image/webp",
+        ".bmp": "image/bmp",
+    }.get(suffix, "image/jpeg")
+
+
+def _file_to_data_url(path: Path) -> Optional[str]:
+    """Encode a local image as a base64 data URL at its native size.
+
+    Size limits are NOT enforced here — the agent retry loop
+    (``run_agent._try_shrink_image_parts_in_messages``) shrinks on the
+    provider's first rejection. Keeping this simple means providers that
+    accept large images (OpenAI 49 MB+, Gemini 100 MB) don't pay a silent
+    quality tax just because one other provider is stricter.
+
+    Returns None only if the file can't be read (missing, permission
+    denied, etc.); the caller reports those paths in ``skipped``.
+    """
+    try:
+        raw = path.read_bytes()
+    except Exception as exc:
+        logger.warning("image_routing: failed to read %s — %s", path, exc)
+        return None
+    mime = _guess_mime(path)
+    b64 = base64.b64encode(raw).decode("ascii")
+    return f"data:{mime};base64,{b64}"
+
+
+def build_native_content_parts(
+    user_text: str,
+    image_paths: List[str],
+) -> Tuple[List[Dict[str, Any]], List[str]]:
+    """Build an OpenAI-style ``content`` list for a user turn.
+
+    Shape:
+      [{"type": "text", "text": "..."},
+       {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}},
+       ...]
+
+    Images are attached at their native size. If a provider rejects the
+    request because an image is too large (e.g. Anthropic's 5 MB per-image
+    ceiling), the agent's retry loop transparently shrinks and retries
+    once — see ``run_agent._try_shrink_image_parts_in_messages``.
+
+    Returns (content_parts, skipped_paths). Skipped paths are files that
+    couldn't be read from disk.
+    """
+    parts: List[Dict[str, Any]] = []
+    skipped: List[str] = []
+
+    text = (user_text or "").strip()
+    if text:
+        parts.append({"type": "text", "text": text})
+
+    for raw_path in image_paths:
+        p = Path(raw_path)
+        if not p.exists() or not p.is_file():
+            skipped.append(str(raw_path))
+            continue
+        data_url = _file_to_data_url(p)
+        if not data_url:
+            skipped.append(str(raw_path))
+            continue
+        parts.append({
+            "type": "image_url",
+            "image_url": {"url": data_url},
+        })
+
+    # If the text was empty, add a neutral prompt so the turn isn't just images.
+    if not text and any(p.get("type") == "image_url" for p in parts):
+        parts.insert(0, {"type": "text", "text": "What do you see in this image?"})
+
+    return parts, skipped
+
+
+__all__ = [
+    "decide_image_input_mode",
+    "build_native_content_parts",
+]
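
Reviewer sketch: the precedence that `decide_image_input_mode` applies can be restated as a small standalone function. `pick_mode` and its boolean parameters are hypothetical stand-ins for the config and models.dev lookups in the module above; they are not part of this patch.

```python
from typing import Optional


def pick_mode(
    configured: object,
    aux_vision_explicit: bool,
    supports_vision: Optional[bool],
) -> str:
    """Hypothetical restatement of the precedence in decide_image_input_mode."""
    # Anything that is not a recognized string collapses to "auto",
    # mirroring _coerce_mode.
    mode = configured.strip().lower() if isinstance(configured, str) else "auto"
    if mode in ("native", "text"):
        return mode  # an explicit setting wins outright
    # auto: an explicit auxiliary vision backend forces the text pipeline
    if aux_vision_explicit:
        return "text"
    # auto: attach natively only when the model is known to support vision
    return "native" if supports_vision is True else "text"
```

Unknown capability (`None`) deliberately lands on `"text"`, which matches the conservative default in the module.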
@@ -63,15 +63,124 @@ def sanitize_context(text: str) -> str:
    return text


-def build_memory_context_block(raw_context: str) -> str:
-    """Wrap prefetched memory in a fenced block with system note.
+class StreamingContextScrubber:
+    """Stateful scrubber for streaming text that may contain split memory-context spans.

-    The fence prevents the model from treating recalled context as user
-    discourse.  Injected at API-call time only — never persisted.
+    The one-shot ``sanitize_context`` regex cannot survive chunk boundaries:
+    a ``<memory-context>`` opened in one delta and closed in a later delta
+    leaks its payload to the UI because the non-greedy block regex needs
+    both tags in one string.  This scrubber runs a small state machine
+    across deltas, holding back partial-tag tails and discarding
+    everything inside a span (including the system-note line).
+
+    Usage::
+
+        scrubber = StreamingContextScrubber()
+        for delta in stream:
+            visible = scrubber.feed(delta)
+            if visible:
+                emit(visible)
+        trailing = scrubber.flush()  # at end of stream
+        if trailing:
+            emit(trailing)
+
+    The scrubber is stateful, so keep one instance per agent.  Callers
+    building new top-level responses (new turn) should create a fresh
+    scrubber or call ``reset()``.
    """
+
+    _OPEN_TAG = "<memory-context>"
+    _CLOSE_TAG = "</memory-context>"
+
+    def __init__(self) -> None:
+        self._in_span: bool = False
+        self._buf: str = ""
+
+    def reset(self) -> None:
+        self._in_span = False
+        self._buf = ""
+
+    def feed(self, text: str) -> str:
+        """Return the visible portion of ``text`` after scrubbing.
+
+        Any trailing fragment that could be the start of an open/close tag
+        is held back in the internal buffer and surfaced on the next
+        ``feed()`` call or discarded/emitted by ``flush()``.
+        """
+        if not text:
+            return ""
+        buf = self._buf + text
+        self._buf = ""
+        out: list[str] = []
+
+        while buf:
+            if self._in_span:
+                idx = buf.lower().find(self._CLOSE_TAG)
+                if idx == -1:
+                    # Hold back a potential partial close tag; drop the rest
+                    held = self._max_partial_suffix(buf, self._CLOSE_TAG)
+                    self._buf = buf[-held:] if held else ""
+                    return "".join(out)
+                # Found close — skip span content + tag, continue
+                buf = buf[idx + len(self._CLOSE_TAG):]
+                self._in_span = False
+            else:
+                idx = buf.lower().find(self._OPEN_TAG)
+                if idx == -1:
+                    # No open tag — hold back a potential partial open tag
+                    held = self._max_partial_suffix(buf, self._OPEN_TAG)
+                    if held:
+                        out.append(buf[:-held])
+                        self._buf = buf[-held:]
+                    else:
+                        out.append(buf)
+                    return "".join(out)
+                # Emit text before the tag, enter span
+                if idx > 0:
+                    out.append(buf[:idx])
+                buf = buf[idx + len(self._OPEN_TAG):]
+                self._in_span = True
+
+        return "".join(out)
+
+    def flush(self) -> str:
+        """Emit any held-back buffer at end-of-stream.
+
+        If we're still inside an unterminated span the remaining content is
+        discarded (safer: leaking partial memory context is worse than a
+        truncated answer).  Otherwise the held-back partial-tag tail is
+        emitted verbatim (it turned out not to be a real tag).
+        """
+        if self._in_span:
+            self._buf = ""
+            self._in_span = False
+            return ""
+        tail = self._buf
+        self._buf = ""
+        return tail
+
+    @staticmethod
+    def _max_partial_suffix(buf: str, tag: str) -> int:
+        """Return the length of the longest buf-suffix that is a tag-prefix.
+
+        Case-insensitive.  Returns 0 if no suffix could start the tag.
+        """
+        tag_lower = tag.lower()
+        buf_lower = buf.lower()
+        max_check = min(len(buf_lower), len(tag_lower) - 1)
+        for i in range(max_check, 0, -1):
+            if tag_lower.startswith(buf_lower[-i:]):
+                return i
+        return 0
+
+
+def build_memory_context_block(raw_context: str) -> str:
+    """Wrap prefetched memory in a fenced block with system note."""
    if not raw_context or not raw_context.strip():
        return ""
    clean = sanitize_context(raw_context)
+    if clean != raw_context:
+        logger.warning("memory provider returned pre-wrapped context; stripped")
    return (
        "<memory-context>\n"
        "[System note: The following is recalled memory context, "
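
Reviewer sketch: the chunk-boundary failure that motivates `StreamingContextScrubber` can be reproduced in a few lines. `BLOCK` is an assumed stand-in for the block regex inside `sanitize_context` (the real pattern is not shown in this diff), and `max_partial_suffix` mirrors the static helper added above.

```python
import re

# Assumed stand-in for the one-shot block regex; the real one is not in the diff.
BLOCK = re.compile(r"<memory-context>.*?</memory-context>", re.DOTALL | re.IGNORECASE)


def scrub_one_shot(text: str) -> str:
    """Strip complete memory-context spans from a single string."""
    return BLOCK.sub("", text)


def max_partial_suffix(buf: str, tag: str) -> int:
    """Length of the longest suffix of buf that is a proper prefix of tag."""
    buf_l, tag_l = buf.lower(), tag.lower()
    for i in range(min(len(buf_l), len(tag_l) - 1), 0, -1):
        if tag_l.startswith(buf_l[-i:]):
            return i
    return 0


# The open tag is split across two deltas.
chunks = ["Hello <memory-", "context>secret</memory-context> world"]

# Scrubbing each delta independently never sees both tags, so the payload leaks.
leaked = "".join(scrub_one_shot(c) for c in chunks)

# Scrubbing the joined text works, but is not available while streaming.
clean = scrub_one_shot("".join(chunks))

# The streaming fix: hold back the tail that could still become a tag.
held = max_partial_suffix(chunks[0], "<memory-context>")
```

Here `held` covers the trailing `<memory-` fragment, which is the part a streaming scrubber would withhold until the next delta settles whether it is a real tag.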
@@ -51,6 +51,7 @@ _PROVIDER_PREFIXES: frozenset[str] = frozenset({
    "qwen-oauth",
    "xiaomi",
    "arcee",
+    "gmi",
    "custom", "local",
    # Common aliases
    "google", "google-gemini", "google-ai-studio",
@@ -60,6 +61,7 @@ _PROVIDER_PREFIXES: frozenset[str] = frozenset({
    "stepfun", "opencode", "zen", "go", "vercel", "kilo", "dashscope", "aliyun", "qwen",
    "mimo", "xiaomi-mimo",
    "arcee-ai", "arceeai",
+    "gmi-cloud", "gmicloud",
    "xai", "x-ai", "x.ai", "grok",
    "nvidia", "nim", "nvidia-nim", "nemotron",
    "qwen-portal",
@@ -145,10 +147,11 @@ DEFAULT_CONTEXT_LENGTHS = {
    "claude": 200000,
    # OpenAI — GPT-5 family (most have 400k; specific overrides first)
    # Source: https://developers.openai.com/api/docs/models
-    # GPT-5.5 (launched Apr 23 2026). 400k is the fallback for providers we
-    # can't probe live. ChatGPT Codex OAuth actually caps lower (272k as of
-    # Apr 2026) and is resolved via _resolve_codex_oauth_context_length().
-    "gpt-5.5": 400000,
+    # GPT-5.5 (launched Apr 23 2026) is 1.05M on the direct OpenAI API and
+    # ChatGPT Codex OAuth caps it at 272K; both paths resolve via their own
+    # provider-aware branches (_resolve_codex_oauth_context_length + models.dev).
+    # This hardcoded value is only reached when every probe misses.
+    "gpt-5.5": 1050000,
    "gpt-5.4-nano": 400000,           # 400k (not 1.05M like full 5.4)
    "gpt-5.4-mini": 400000,           # 400k (not 1.05M like full 5.4)
    "gpt-5.4": 1050000,               # GPT-5.4, GPT-5.4 Pro (1.05M context)
@@ -164,7 +167,17 @@ DEFAULT_CONTEXT_LENGTHS = {
    "gemma-4-31b": 256000,
    "gemma-3": 131072,
    "gemma": 8192,  # fallback for older gemma models
-    # DeepSeek
+    # DeepSeek — V4 family ships with a 1M context window. The legacy
+    # aliases ``deepseek-chat`` / ``deepseek-reasoner`` are server-side
+    # mapped to the non-thinking / thinking modes of ``deepseek-v4-flash``
+    # and inherit the same 1M window. The ``deepseek`` substring entry
+    # below remains as a 128K fallback for older / unknown DeepSeek model
+    # ids (e.g. via custom endpoints).
+    # https://api-docs.deepseek.com/zh-cn/quick_start/pricing
+    "deepseek-v4-pro": 1_000_000,
+    "deepseek-v4-flash": 1_000_000,
+    "deepseek-chat": 1_000_000,
+    "deepseek-reasoner": 1_000_000,
    "deepseek": 128000,
    # Meta
    "llama": 131072,
@@ -296,6 +309,7 @@ _URL_TO_PROVIDER: Dict[str, str] = {
    "integrate.api.nvidia.com": "nvidia",
    "api.xiaomimimo.com": "xiaomi",
    "xiaomimimo.com": "xiaomi",
+    "api.gmi-serving.com": "gmi",
    "ollama.com": "ollama-cloud",
 }

@@ -691,6 +705,29 @@ def fetch_endpoint_model_metadata(
    return {}


+def _resolve_endpoint_context_length(
+    model: str,
+    base_url: str,
+    api_key: str = "",
+) -> Optional[int]:
+    """Resolve context length from an endpoint's live ``/models`` metadata."""
+    endpoint_metadata = fetch_endpoint_model_metadata(base_url, api_key=api_key)
+    matched = endpoint_metadata.get(model)
+    if not matched:
+        if len(endpoint_metadata) == 1:
+            matched = next(iter(endpoint_metadata.values()))
+        else:
+            for key, entry in endpoint_metadata.items():
+                if model in key or key in model:
+                    matched = entry
+                    break
+    if matched:
+        context_length = matched.get("context_length")
+        if isinstance(context_length, int):
+            return context_length
+    return None
+
+
 def _get_context_cache_path() -> Path:
    """Return path to the persistent context length cache file."""
    from hermes_constants import get_hermes_home
@@ -1284,22 +1321,9 @@ def get_model_context_length(
    # returns 128k) instead of the model's full context (400k).  models.dev
    # has the correct per-provider values and is checked at step 5+.
    if _is_custom_endpoint(base_url) and not _is_known_provider_base_url(base_url):
-        endpoint_metadata = fetch_endpoint_model_metadata(base_url, api_key=api_key)
-        matched = endpoint_metadata.get(model)
-        if not matched:
-            # Single-model servers: if only one model is loaded, use it
-            if len(endpoint_metadata) == 1:
-                matched = next(iter(endpoint_metadata.values()))
-            else:
-                # Fuzzy match: substring in either direction
-                for key, entry in endpoint_metadata.items():
-                    if model in key or key in model:
-                        matched = entry
-                        break
-        if matched:
-            context_length = matched.get("context_length")
-            if isinstance(context_length, int):
-                return context_length
+        context_length = _resolve_endpoint_context_length(model, base_url, api_key=api_key)
+        if context_length is not None:
+            return context_length
        if not _is_known_provider_base_url(base_url):
            # 3. Try querying local server directly
            if is_local_endpoint(base_url):
@@ -1363,6 +1387,12 @@ def get_model_context_length(
            if base_url:
                save_context_length(model, base_url, codex_ctx)
            return codex_ctx
+    if effective_provider == "gmi" and base_url:
+        # GMI exposes authoritative context_length via /models, but it is not
+        # in models.dev yet. Preserve that higher-fidelity endpoint lookup.
+        ctx = _resolve_endpoint_context_length(model, base_url, api_key=api_key)
+        if ctx is not None:
+            return ctx
    if effective_provider:
        from agent.models_dev import lookup_models_dev_context
        ctx = lookup_models_dev_context(effective_provider, model)
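
Reviewer sketch: the matching order inside `_resolve_endpoint_context_length` (exact id, then single-model server, then substring in either direction) can be isolated from the network call. `match_entry` and the sample metadata below are hypothetical and only illustrate the lookup order.

```python
from typing import Any, Dict, Optional


def match_entry(
    model: str,
    metadata: Dict[str, Dict[str, Any]],
) -> Optional[Dict[str, Any]]:
    """Hypothetical restatement of the three-step match used by the helper."""
    exact = metadata.get(model)
    if exact:
        return exact
    # Single-model servers: whatever is loaded is the model in use.
    if len(metadata) == 1:
        return next(iter(metadata.values()))
    # Fuzzy match: substring in either direction, first hit wins.
    for key, entry in metadata.items():
        if model in key or key in model:
            return entry
    return None


# Made-up metadata for illustration only.
sample = {
    "org/model-a": {"context_length": 131072},
    "org/model-b": {"context_length": 32768},
}
```

Because the fuzzy step returns the first hit in dictionary order, two ids that both contain the requested name resolve to whichever the endpoint listed first.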
@@ -180,3 +180,145 @@ def format_remaining(seconds: float) -> str:
    h, remainder = divmod(s, 3600)
    m = remainder // 60
    return f"{h}h {m}m" if m else f"{h}h"
+
+
+# Buckets with reset windows shorter than this are treated as transient
+# (upstream jitter, secondary throttling) rather than a genuine quota
+# exhaustion worth a cross-session breaker trip.
+_MIN_RESET_FOR_BREAKER_SECONDS = 60.0
+
+
+def is_genuine_nous_rate_limit(
+    *,
+    headers: Optional[Mapping[str, str]] = None,
+    last_known_state: Optional[Any] = None,
+) -> bool:
+    """Decide whether a 429 from Nous Portal is a real account rate limit.
+
+    Nous Portal multiplexes multiple upstream providers (DeepSeek, Kimi,
+    MiMo, Hermes, ...) behind one endpoint.  A 429 can mean either:
+
+      (a) The caller's own RPM / RPH / TPM / TPH bucket on Nous is
+          exhausted — a genuine rate limit that will last until the
+          bucket resets.
+      (b) The upstream provider is out of capacity for a specific model
+          — transient, clears in seconds, and has nothing to do with
+          the caller's quota on Nous.
+
+    Tripping the cross-session breaker on (b) blocks ALL Nous requests
+    (and all models, since Nous is one provider key) for minutes even
+    though the caller's account is healthy and a different model would
+    have worked.  That's the bug users hit when DeepSeek V4 Pro 429s
+    trigger a breaker that then blocks Kimi 2.6 and MiMo V2.5 Pro.
+
+    We tell the two apart by looking at:
+
+      1. The 429 response's own ``x-ratelimit-*`` headers.  Nous emits
+         the full suite on every response including 429s.  An exhausted
+         bucket (``remaining == 0`` with a reset window >= 60s) is
+         proof of (a).
+      2. The last-known-good rate-limit state captured by
+         ``_capture_rate_limits()`` on the previous successful
+         response.  If any bucket there was already near-exhausted with
+         a substantial reset window, the current 429 is almost
+         certainly (a) continuing from that condition.
+
+    If neither signal fires, we treat the 429 as (b): fail the single
+    request, let the retry loop or model-switch proceed, and do NOT
+    write the cross-session breaker file.
+
+    Returns True when the evidence points at (a).
+    """
+    # Signal 1: current 429 response headers.
+    state = _parse_buckets_from_headers(headers)
+    if _has_exhausted_bucket(state):
+        return True
+
+    # Signal 2: last-known-good state from a recent successful response.
+    # Expects a RateLimitState-like object (dataclass from rate_limit_tracker)
+    # exposing buckets as attributes; anything else, including a plain dict,
+    # yields no signal.
+    if last_known_state is not None and _has_exhausted_bucket_in_object(last_known_state):
+        return True
+
+    return False
+
+
+def _parse_buckets_from_headers(
+    headers: Optional[Mapping[str, str]],
+) -> dict[str, tuple[Optional[int], Optional[float]]]:
+    """Extract (remaining, reset_seconds) per bucket from x-ratelimit-* headers.
+
+    Returns empty dict when no rate-limit headers are present.
+    """
+    if not headers:
+        return {}
+
+    lowered = {k.lower(): v for k, v in headers.items()}
+    if not any(k.startswith("x-ratelimit-") for k in lowered):
+        return {}
+
+    def _maybe_int(raw: Optional[str]) -> Optional[int]:
+        if raw is None:
+            return None
+        try:
+            return int(float(raw))
+        except (TypeError, ValueError):
+            return None
+
+    def _maybe_float(raw: Optional[str]) -> Optional[float]:
+        if raw is None:
+            return None
+        try:
+            return float(raw)
+        except (TypeError, ValueError):
+            return None
+
+    result: dict[str, tuple[Optional[int], Optional[float]]] = {}
+    for tag in ("requests", "requests-1h", "tokens", "tokens-1h"):
+        remaining = _maybe_int(lowered.get(f"x-ratelimit-remaining-{tag}"))
+        reset = _maybe_float(lowered.get(f"x-ratelimit-reset-{tag}"))
+        if remaining is not None or reset is not None:
+            result[tag] = (remaining, reset)
+    return result
+
+
+def _has_exhausted_bucket(
+    buckets: Mapping[str, tuple[Optional[int], Optional[float]]],
+) -> bool:
+    """Return True when any bucket has remaining == 0 AND a meaningful reset window."""
+    for remaining, reset in buckets.values():
+        if remaining is None or remaining > 0:
+            continue
+        if reset is None:
+            continue
+        if reset >= _MIN_RESET_FOR_BREAKER_SECONDS:
+            return True
+    return False
+
+
+def _has_exhausted_bucket_in_object(state: Any) -> bool:
+    """Check a RateLimitState-like object for an exhausted bucket.
+
+    Accepts the dataclass from ``agent.rate_limit_tracker`` (buckets
+    exposed as attributes ``requests_min``, ``requests_hour``,
+    ``tokens_min``, ``tokens_hour``) and falls back gracefully for any
+    object missing those attributes.
+    """
+    for attr in ("requests_min", "requests_hour", "tokens_min", "tokens_hour"):
+        bucket = getattr(state, attr, None)
+        if bucket is None:
+            continue
+        limit = getattr(bucket, "limit", 0) or 0
+        remaining = getattr(bucket, "remaining", 0) or 0
+        # Prefer the adjusted "remaining_seconds_now" property when present;
+        # fall back to raw reset_seconds.
+        reset = getattr(bucket, "remaining_seconds_now", None)
+        if reset is None:
+            reset = getattr(bucket, "reset_seconds", 0.0) or 0.0
+        if limit <= 0:
+            continue
+        if remaining > 0:
+            continue
+        if reset >= _MIN_RESET_FOR_BREAKER_SECONDS:
+            return True
+    return False
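
Reviewer sketch: signal 1 (the 429's own headers) reduces to one rule, a bucket with nothing remaining and a reset window of at least 60 seconds. `looks_exhausted` is a hypothetical condensed version of `_parse_buckets_from_headers` plus `_has_exhausted_bucket`; the header values in the examples are made up.

```python
from typing import Mapping

MIN_RESET_SECONDS = 60.0  # mirrors _MIN_RESET_FOR_BREAKER_SECONDS


def looks_exhausted(headers: Mapping[str, str]) -> bool:
    """Hypothetical condensed check: any empty bucket with a long reset window."""
    lowered = {k.lower(): v for k, v in headers.items()}
    for tag in ("requests", "requests-1h", "tokens", "tokens-1h"):
        try:
            remaining = int(float(lowered[f"x-ratelimit-remaining-{tag}"]))
            reset = float(lowered[f"x-ratelimit-reset-{tag}"])
        except (KeyError, ValueError):
            continue  # missing or unparsable bucket gives no evidence
        if remaining <= 0 and reset >= MIN_RESET_SECONDS:
            return True
    return False


# Made-up header sets for illustration.
genuine = {"X-RateLimit-Remaining-Requests-1h": "0", "X-RateLimit-Reset-Requests-1h": "1800"}
transient = {"x-ratelimit-remaining-requests": "0", "x-ratelimit-reset-requests": "5"}
healthy = {"x-ratelimit-remaining-tokens": "12000", "x-ratelimit-reset-tokens": "3000"}
```

Only `genuine` should trip the breaker; `transient` has an empty bucket but resets within seconds, and `healthy` still has quota left.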
@@ -0,0 +1,191 @@
+"""
+Contextual first-touch onboarding hints.
+
+Instead of blocking first-run questionnaires, show a one-time hint the *first*
+time a user hits a behavior fork — message-while-running, first long-running
+tool, etc.  Each hint is shown once per install (tracked in ``config.yaml`` under
+``onboarding.seen.<flag>``) and then never again.
+
+Keep this module tiny and dependency-free so both the CLI and gateway can import
+it without pulling in heavy modules.
+"""
+
+from __future__ import annotations
+
+import logging
+from pathlib import Path
+from typing import Any, Mapping, Optional
+
+logger = logging.getLogger(__name__)
+
+
+# -------------------------------------------------------------------------
+# Flag names (stable — used as config.yaml keys under onboarding.seen)
+# -------------------------------------------------------------------------
+
+BUSY_INPUT_FLAG = "busy_input_prompt"
+TOOL_PROGRESS_FLAG = "tool_progress_prompt"
+OPENCLAW_RESIDUE_FLAG = "openclaw_residue_cleanup"
+
+
+# -------------------------------------------------------------------------
+# Hint content
+# -------------------------------------------------------------------------
+
+def busy_input_hint_gateway(mode: str) -> str:
+    """Hint shown the first time a user messages while the agent is busy.
+
+    ``mode`` is the effective busy_input_mode that was just applied, so the
+    message matches reality ("I just interrupted…" vs "I just queued…").
+    """
+    if mode == "queue":
+        return (
+            "💡 First-time tip — I queued your message instead of interrupting. "
+            "Send `/busy interrupt` to make new messages stop the current task "
+            "immediately, or `/busy status` to check. This notice won't appear again."
+        )
+    if mode == "steer":
+        return (
+            "💡 First-time tip — I steered your message into the current run; "
+            "it will arrive after the next tool call instead of interrupting. "
+            "Send `/busy interrupt` or `/busy queue` to change this, or "
+            "`/busy status` to check. This notice won't appear again."
+        )
+    return (
+        "💡 First-time tip — I just interrupted my current task to answer you. "
+        "Send `/busy queue` to queue follow-ups for after the current task instead, "
+        "`/busy steer` to inject them mid-run without interrupting, or "
+        "`/busy status` to check. This notice won't appear again."
+    )
+
+
+def busy_input_hint_cli(mode: str) -> str:
+    """CLI version of the busy-input hint (plain text, no markdown)."""
+    if mode == "queue":
+        return (
+            "(tip) Your message was queued for the next turn. "
+            "Use /busy interrupt to make Enter stop the current run instead, "
+            "or /busy steer to inject mid-run. This tip only shows once."
+        )
+    if mode == "steer":
+        return (
+            "(tip) Your message was steered into the current run; it arrives "
+            "after the next tool call. Use /busy interrupt or /busy queue to "
+            "change this. This tip only shows once."
+        )
+    return (
+        "(tip) Your message interrupted the current run. "
+        "Use /busy queue to queue messages for the next turn instead, "
+        "or /busy steer to inject mid-run. This tip only shows once."
+    )
+
+
+def tool_progress_hint_gateway() -> str:
+    return (
+        "💡 First-time tip — that tool took a while and I'm streaming every step. "
+        "If the progress messages feel noisy, send `/verbose` to cycle modes "
+        "(all → new → off). This notice won't appear again."
+    )
+
+
+def tool_progress_hint_cli() -> str:
+    return (
+        "(tip) That tool ran for a while. Use /verbose to cycle tool-progress "
+        "display modes (all -> new -> off -> verbose). This tip only shows once."
+    )
+
+
+def openclaw_residue_hint_cli() -> str:
+    """Banner shown the first time Hermes starts and finds ``~/.openclaw/``.
+
+    OpenClaw-era config, memory, and skill paths in ``~/.openclaw/`` will
+    otherwise attract the agent (memory entries like ``~/.openclaw/config.yaml``
+    get carried forward and the agent dutifully reads them). ``hermes claw
+    cleanup`` renames the directory so the agent stops finding it.
+    """
+    return (
+        "Heads up — an OpenClaw workspace was detected at ~/.openclaw/.\n"
+        "After migrating, the agent can still get confused and read that "
+        "directory's config/memory instead of Hermes's.\n"
+        "Run `hermes claw cleanup` to archive it (rename → .openclaw.pre-migration). "
+        "This tip only shows once; you can run `hermes claw cleanup` at any time."
+    )
+
+
+def detect_openclaw_residue(home: Optional[Path] = None) -> bool:
+    """Return True if an OpenClaw workspace directory is present in ``$HOME``.
+
+    Pure filesystem check — no side effects. ``home`` override exists for tests.
+    """
+    base = home or Path.home()
+    try:
+        return (base / ".openclaw").is_dir()
+    except OSError:
+        return False
+
+
+# -------------------------------------------------------------------------
+# State read / write
+# -------------------------------------------------------------------------
+
+def _get_seen_dict(config: Mapping[str, Any]) -> Mapping[str, Any]:
+    onboarding = config.get("onboarding") if isinstance(config, Mapping) else None
+    if not isinstance(onboarding, Mapping):
+        return {}
+    seen = onboarding.get("seen")
+    return seen if isinstance(seen, Mapping) else {}
+
+
+def is_seen(config: Mapping[str, Any], flag: str) -> bool:
+    """Return True if the user has already been shown this first-touch hint."""
+    return bool(_get_seen_dict(config).get(flag))
+
+
+def mark_seen(config_path: Path, flag: str) -> bool:
+    """Persist ``onboarding.seen.<flag> = True`` to ``config_path``.
+
+    Uses the atomic YAML writer so a concurrent process can't observe a
+    partially-written file.  Returns True on success, False on any error
+    (including the config file being absent — onboarding is best-effort).
+    """
+    try:
+        import yaml
+        from utils import atomic_yaml_write
+    except Exception as e:  # pragma: no cover — dependency issue
+        logger.debug("onboarding: failed to import yaml/utils: %s", e)
+        return False
+
+    try:
+        cfg: dict = {}
+        if config_path.exists():
+            with open(config_path, encoding="utf-8") as f:
+                cfg = yaml.safe_load(f) or {}
+        if not isinstance(cfg.get("onboarding"), dict):
+            cfg["onboarding"] = {}
+        seen = cfg["onboarding"].get("seen")
+        if not isinstance(seen, dict):
+            seen = {}
+            cfg["onboarding"]["seen"] = seen
+        if seen.get(flag) is True:
+            return True  # already marked — nothing to do
+        seen[flag] = True
+        atomic_yaml_write(config_path, cfg)
+        return True
+    except Exception as e:
+        logger.debug("onboarding: failed to mark flag %s: %s", flag, e)
+        return False
+
+
+__all__ = [
+    "BUSY_INPUT_FLAG",
+    "TOOL_PROGRESS_FLAG",
+    "OPENCLAW_RESIDUE_FLAG",
+    "busy_input_hint_gateway",
+    "busy_input_hint_cli",
+    "tool_progress_hint_gateway",
+    "tool_progress_hint_cli",
+    "openclaw_residue_hint_cli",
+    "detect_openclaw_residue",
+    "is_seen",
+    "mark_seen",
+]
@@ -141,6 +141,12 @@ DEFAULT_AGENT_IDENTITY = (
    "Be targeted and efficient in your exploration and investigations."
 )

+HERMES_AGENT_HELP_GUIDANCE = (
+    "If the user asks about configuring, setting up, or using Hermes Agent "
+    "itself, load the `hermes-agent` skill with skill_view(name='hermes-agent') "
+    "before answering. Docs: https://hermes-agent.nousresearch.com/docs"
+)
+
 MEMORY_GUIDANCE = (
    "You have persistent memory across sessions. Save durable facts using the memory "
    "tool: user preferences, environment details, tool quirks, and stable conventions. "
@@ -422,6 +428,29 @@ PLATFORM_HINTS = {
        "your response. Images are sent as native photos, and other files arrive as downloadable "
        "documents."
    ),
+    "yuanbao": (
+        "You are on Yuanbao (腾讯元宝), a Chinese AI assistant platform. "
+        "Markdown formatting is supported (code blocks, tables, bold/italic). "
+        "You CAN send media files natively — to deliver a file to the user, include "
+        "MEDIA:/absolute/path/to/file in your response. The file will be sent as a native "
+        "Yuanbao attachment: images (.jpg, .png, .webp, .gif) are sent as photos, "
+        "and other files (.pdf, .docx, .txt, .zip, etc.) arrive as downloadable documents "
+        "(max 50 MB). You can also include image URLs in markdown format ![alt](url) and "
+        "they will be downloaded and sent as native photos. "
+        "Do NOT tell the user you lack file-sending capability — use MEDIA: syntax "
+        "whenever a file delivery is appropriate.\n\n"
+        "Stickers (贴纸 / 表情包 / TIM face): Yuanbao has a built-in sticker catalogue. "
+        "When the user sends a sticker (you see '[emoji: 名称]' in their message) or asks "
+        "you to send/reply-with a 贴纸/表情/表情包, you MUST use the sticker tools:\n"
+        "  1. Call yb_search_sticker with a Chinese keyword (e.g. '666', '比心', '吃瓜', "
+        "     '捂脸', '合十') to discover matching sticker_ids.\n"
+        "  2. Call yb_send_sticker with the chosen sticker_id or name — this sends a real "
+        "     TIMFaceElem that renders as a native sticker in the chat.\n"
+        "DO NOT draw sticker-like PNGs with execute_code/Pillow/matplotlib and then send "
+        "them via MEDIA: or send_image_file. That produces a fake low-quality 'sticker' "
+        "image and is the WRONG path. Bare Unicode emoji in text is also not a substitute "
+        "— when a sticker is the right response, use yb_send_sticker."
+    ),
 }

 # ---------------------------------------------------------------------------
@@ -825,6 +854,11 @@ def build_skills_system_prompt(
            "Skills also encode the user's preferred approach, conventions, and quality standards "
            "for tasks like code review, planning, and testing — load them even for tasks you "
            "already know how to do, because the skill defines how it should be done here.\n"
+            "Whenever the user asks you to configure, set up, install, enable, disable, modify, "
+            "or troubleshoot Hermes Agent itself — its CLI, config, models, providers, tools, "
+            "skills, voice, gateway, plugins, or any feature — load the `hermes-agent` skill "
+            "first. It has the actual commands (e.g. `hermes config set …`, `hermes tools`, "
+            "`hermes setup`) so you don't have to guess or invent workarounds.\n"
            "If a skill has issues, fix it with skill_manage(action='patch').\n"
            "After difficult/iterative tasks, offer to save as a skill. "
            "If a skill you loaded was missing steps, had wrong commands, or needed "
@@ -754,7 +754,11 @@ def _resolve_effective_accept(
    if env in ("1", "true", "yes", "on"):
        return True
    cfg_val = cfg.get("hooks_auto_accept", False)
-    return bool(cfg_val)
+    if isinstance(cfg_val, bool):
+        return cfg_val
+    if isinstance(cfg_val, str):
+        return cfg_val.strip().lower() in ("1", "true", "yes", "on")
+    return False


 # ---------------------------------------------------------------------------
@@ -329,7 +329,7 @@ def build_skill_invocation_message(

    loaded_skill, skill_dir, skill_name = loaded
    activation_note = (
-        f'[SYSTEM: The user has invoked the "{skill_name}" skill, indicating they want '
+        f'[IMPORTANT: The user has invoked the "{skill_name}" skill, indicating they want '
        "you to follow its instructions. The full skill content is loaded below.]"
    )
    return _build_skill_message(
@@ -368,7 +368,7 @@ def build_preloaded_skills_prompt(

        loaded_skill, skill_dir, skill_name = loaded
        activation_note = (
-            f'[SYSTEM: The user launched this CLI session with the "{skill_name}" skill '
+            f'[IMPORTANT: The user launched this CLI session with the "{skill_name}" skill '
            "preloaded. Treat its instructions as active guidance for the duration of this "
            "session unless the user overrides them.]"
        )
@@ -6,12 +6,18 @@ adds latency to the user-facing reply.

 import logging
 import threading
-from typing import Optional
+from typing import Callable, Optional

 from agent.auxiliary_client import call_llm

 logger = logging.getLogger(__name__)

+# Callback signature: (task_name, exception) -> None. Used to surface
+# auxiliary failures to the user through AIAgent._emit_auxiliary_failure
+# so silent drops (e.g. OpenRouter 402 exhausting the fallback chain)
+# become visible instead of piling up as NULL session titles.
+FailureCallback = Callable[[str, BaseException], None]
+
 _TITLE_PROMPT = (
    "Generate a short, descriptive title (3-7 words) for a conversation that starts with the "
    "following exchange. The title should capture the main topic or intent. "
@@ -19,11 +25,21 @@ _TITLE_PROMPT = (
 )


-def generate_title(user_message: str, assistant_response: str, timeout: float = 30.0) -> Optional[str]:
+def generate_title(
+    user_message: str,
+    assistant_response: str,
+    timeout: float = 30.0,
+    failure_callback: Optional[FailureCallback] = None,
+) -> Optional[str]:
    """Generate a session title from the first exchange.

    Uses the auxiliary LLM client (cheapest/fastest available model).
    Returns the title string or None on failure.
+
+    ``failure_callback`` is invoked with ``(task, exception)`` when the
+    auxiliary call raises — the caller typically wires this to
+    ``AIAgent._emit_auxiliary_failure`` so the user sees a warning instead
+    of silently accumulating untitled sessions.
    """
    # Truncate long messages to keep the request small
    user_snippet = user_message[:500] if user_message else ""
@@ -52,7 +68,15 @@ def generate_title(user_message: str, assistant_response: str, timeout: float =
            title = title[:77] + "..."
        return title if title else None
    except Exception as e:
-        logger.debug("Title generation failed: %s", e)
+        # Log at WARNING so this shows up in agent.log without debug mode.
+        # Full detail at debug level for operators who need the stack.
+        logger.warning("Title generation failed: %s", e)
+        logger.debug("Title generation traceback", exc_info=True)
+        if failure_callback is not None:
+            try:
+                failure_callback("title generation", e)
+            except Exception:
+                logger.debug("Title generation failure_callback raised", exc_info=True)
        return None


@@ -61,6 +85,7 @@ def auto_title_session(
    session_id: str,
    user_message: str,
    assistant_response: str,
+    failure_callback: Optional[FailureCallback] = None,
 ) -> None:
    """Generate and set a session title if one doesn't already exist.

@@ -81,7 +106,9 @@ def auto_title_session(
    except Exception:
        return

-    title = generate_title(user_message, assistant_response)
+    title = generate_title(
+        user_message, assistant_response, failure_callback=failure_callback
+    )
    if not title:
        return

@@ -98,6 +125,7 @@ def maybe_auto_title(
    user_message: str,
    assistant_response: str,
    conversation_history: list,
+    failure_callback: Optional[FailureCallback] = None,
 ) -> None:
    """Fire-and-forget title generation after the first exchange.

@@ -119,6 +147,7 @@ def maybe_auto_title(
    thread = threading.Thread(
        target=auto_title_session,
        args=(session_db, session_id, user_message, assistant_response),
+        kwargs={"failure_callback": failure_callback},
        daemon=True,
        name="auto-title",
    )
@@ -606,6 +606,7 @@ platform_toolsets:
  signal: [hermes-signal]
  homeassistant: [hermes-homeassistant]
  qqbot: [hermes-qqbot]
+  yuanbao: [hermes-yuanbao]

 # =============================================================================
 # Gateway Platform Settings
@@ -824,7 +825,9 @@ delegation:
 # Display
 # =============================================================================
 display:
-  # Use compact banner mode
+  # Use compact banner mode (hides the ASCII-art banner, shows a single line).
+  #   true:  Compact single-line banner
+  #   false: Full ASCII banner with tool/skill summary (default)
  compact: false

  # Tool progress display level (CLI and gateway)
@@ -838,12 +841,19 @@ display:
  # Gateway-only natural mid-turn assistant updates.
  # When true, completed assistant status messages are sent as separate chat
  # messages. This is independent of tool_progress and gateway streaming.
+  #   true:  Send mid-turn assistant updates as separate messages (default)
+  #   false: Only send the final response
  interim_assistant_messages: true

-  # What Enter does when Hermes is already busy in the CLI.
+  # What Enter does when Hermes is already busy (CLI and gateway platforms).
  #   interrupt: Interrupt the current run and redirect Hermes (default)
  #   queue:     Queue your message for the next turn
-  # Ctrl+C always interrupts regardless of this setting.
+  #   steer:     Inject your message mid-run via /steer, arriving at the agent
+  #              after the next tool call — no interrupt, no role violation.
+  #              Falls back to 'queue' if the agent isn't running yet or if
+  #              images are attached (steer only carries text).
+  # Ctrl+C (or /stop in gateway) always interrupts regardless of this setting.
+  # Toggle at runtime with /busy <interrupt|queue|steer>.
  busy_input_mode: interrupt

  # Background process notifications (gateway/messaging only).
@@ -859,17 +869,22 @@ display:
  # Play terminal bell when agent finishes a response.
  # Useful for long-running tasks — your terminal will ding when the agent is done.
  # Works over SSH. Most terminals can be configured to flash the taskbar or play a sound.
+  #   true:  Ring the terminal bell on each response
+  #   false: Silent (default)
  bell_on_complete: false

  # Show model reasoning/thinking before each response.
  # When enabled, a dim box shows the model's thought process above the response.
  # Toggle at runtime with /reasoning show or /reasoning hide.
+  #   true:  Show the reasoning box
+  #   false: Hide reasoning (default)
  show_reasoning: false

  # Stream tokens to the terminal as they arrive instead of waiting for the
  # full response. The response box opens on first token and text appears
  # line-by-line. Tool calls are still captured silently.
-  # Stream tokens to the terminal in real-time. Disable to wait for full responses.
+  #   true:  Stream tokens as they arrive (default)
+  #   false: Wait for the full response before rendering
  streaming: true

  # ───────────────────────────────────────────────────────────────────────────
@@ -879,10 +894,15 @@ display:
  # response box label, and branding text. Change at runtime with /skin <name>.
  #
  # Built-in skins:
-  #   default  — Classic Hermes gold/kawaii
-  #   ares     — Crimson/bronze war-god theme with spinner wings
-  #   mono     — Clean grayscale monochrome
-  #   slate    — Cool blue developer-focused
+  #   default        — Classic Hermes gold/kawaii
+  #   ares           — Crimson/bronze war-god theme with spinner wings
+  #   mono           — Clean grayscale monochrome
+  #   slate          — Cool blue developer-focused
+  #   daylight       — Bright light-mode theme
+  #   warm-lightmode — Warm paper-tone light-mode theme
+  #   poseidon       — Sea-green/teal Olympian theme
+  #   sisyphus       — Earthy stone-and-moss theme
+  #   charizard      — Fiery orange dragon theme
  #
  # Custom skins: drop a YAML file in ~/.hermes/skins/<name>.yaml
  # Schema (all fields optional, missing values inherit from default):
@@ -15,6 +15,7 @@ Usage:

 import logging
 import os
+import re
 import shutil
 import sys
 import json
@@ -417,6 +418,11 @@ def load_cli_config() -> Dict[str, Any]:
            "base_url": "",    # Direct OpenAI-compatible endpoint for subagents
            "api_key": "",     # API key for delegation.base_url (falls back to OPENAI_API_KEY)
        },
+        "onboarding": {
+            # First-touch hint flags (see agent/onboarding.py).  Each hint is
+            # shown once per install then latched here.
+            "seen": {},
+        },
    }
    
    # Track whether the config file explicitly set terminal config.
@@ -753,9 +759,17 @@ def _run_cleanup():
        pass
    try:
        if _active_agent_ref and hasattr(_active_agent_ref, 'shutdown_memory_provider'):
-            _active_agent_ref.shutdown_memory_provider(
-                getattr(_active_agent_ref, 'conversation_history', None) or []
-            )
+            # Forward the agent's own transcript so memory providers'
+            # ``on_session_end`` hooks see the real conversation instead of
+            # an empty list (#15165). ``_session_messages`` is set on
+            # ``AIAgent.__init__`` and refreshed every turn via
+            # ``_persist_session``. Fall back to no-arg on test stubs /
+            # partially-initialised agents where the attribute is missing.
+            _session_msgs = getattr(_active_agent_ref, '_session_messages', None)
+            if isinstance(_session_msgs, list):
+                _active_agent_ref.shutdown_memory_provider(_session_msgs)
+            else:
+                _active_agent_ref.shutdown_memory_provider()
    except Exception:
        pass

@@ -969,6 +983,7 @@ def _run_state_db_auto_maintenance(session_db) -> None:
        return
    try:
        from hermes_cli.config import load_config as _load_full_config
+        from hermes_constants import get_hermes_home as _get_hermes_home
        cfg = (_load_full_config().get("sessions") or {})
        if not cfg.get("auto_prune", False):
            return
@@ -976,11 +991,35 @@ def _run_state_db_auto_maintenance(session_db) -> None:
            retention_days=int(cfg.get("retention_days", 90)),
            min_interval_hours=int(cfg.get("min_interval_hours", 24)),
            vacuum=bool(cfg.get("vacuum_after_prune", True)),
+            sessions_dir=_get_hermes_home() / "sessions",
        )
    except Exception as exc:
        logger.debug("state.db auto-maintenance skipped: %s", exc)


+def _run_checkpoint_auto_maintenance() -> None:
+    """Call ``checkpoint_manager.maybe_auto_prune_checkpoints`` using current config.
+
+    Reads the ``checkpoints:`` section from config.yaml via
+    :func:`hermes_cli.config.load_config`. Honours ``auto_prune`` /
+    ``retention_days`` / ``delete_orphans`` / ``min_interval_hours``.
+    Never raises — maintenance must never block interactive startup.
+    """
+    try:
+        from hermes_cli.config import load_config as _load_full_config
+        cfg = (_load_full_config().get("checkpoints") or {})
+        if not cfg.get("auto_prune", False):
+            return
+        from tools.checkpoint_manager import maybe_auto_prune_checkpoints
+        maybe_auto_prune_checkpoints(
+            retention_days=int(cfg.get("retention_days", 7)),
+            min_interval_hours=int(cfg.get("min_interval_hours", 24)),
+            delete_orphans=bool(cfg.get("delete_orphans", True)),
+        )
+    except Exception as exc:
+        logger.debug("checkpoint auto-maintenance skipped: %s", exc)
+
+
 def _prune_stale_worktrees(repo_root: str, max_age_hours: int = 24) -> None:
    """Remove stale worktrees and orphaned branches on startup.

@@ -1373,7 +1412,7 @@ def _resolve_attachment_path(raw_path: str) -> Path | None:


 def _format_process_notification(evt: dict) -> "str | None":
-    """Format a process notification event into a [SYSTEM: ...] message.
+    """Format a process notification event into an [IMPORTANT: ...] message.

    Handles both completion events (notify_on_complete) and watch pattern
    match events from the unified completion_queue.
@@ -1383,14 +1422,14 @@ def _format_process_notification(evt: dict) -> "str | None":
    _cmd = evt.get("command", "unknown")

    if evt_type == "watch_disabled":
-        return f"[SYSTEM: {evt.get('message', '')}]"
+        return f"[IMPORTANT: {evt.get('message', '')}]"

    if evt_type == "watch_match":
        _pat = evt.get("pattern", "?")
        _out = evt.get("output", "")
        _sup = evt.get("suppressed", 0)
        text = (
-            f"[SYSTEM: Background process {_sid} matched "
+            f"[IMPORTANT: Background process {_sid} matched "
            f"watch pattern \"{_pat}\".\n"
            f"Command: {_cmd}\n"
            f"Matched output:\n{_out}"
@@ -1404,7 +1443,7 @@ def _format_process_notification(evt: dict) -> "str | None":
    _exit = evt.get("exit_code", "?")
    _out = evt.get("output", "")
    return (
-        f"[SYSTEM: Background process {_sid} completed "
+        f"[IMPORTANT: Background process {_sid} completed "
        f"(exit code {_exit}).\n"
        f"Command: {_cmd}\n"
        f"Output:\n{_out}]"
@@ -1517,6 +1556,60 @@ def _should_auto_attach_clipboard_image_on_paste(pasted_text: str) -> bool:
    return not pasted_text.strip()


+def _strip_leaked_bracketed_paste_wrappers(text: str) -> str:
+    """Strip leaked bracketed-paste wrapper markers from user-visible text.
+
+    Defensive normalization for cases where terminal/prompt_toolkit parsing
+    fails and bracketed-paste markers end up in the buffer as literal text.
+
+    We strip canonical wrappers unconditionally and also handle degraded
+    visible forms like ``[200~`` / ``[201~`` and ``00~`` / ``01~`` when they
+    look like wrapper boundaries, not arbitrary user content.
+    """
+    if not text:
+        return text
+
+    text = (
+        text.replace("\x1b[200~", "")
+        .replace("\x1b[201~", "")
+        .replace("^[[200~", "")
+        .replace("^[[201~", "")
+    )
+    text = re.sub(r"(^|[\s\n>:\]\)])\[200~", r"\1", text)
+    text = re.sub(r"\[201~(?=$|[\s\n<\[\(\):;.,!?])", "", text)
+    text = re.sub(r"(^|[\s\n>:\]\)])00~", r"\1", text)
+    text = re.sub(r"01~(?=$|[\s\n<\[\(\):;.,!?])", "", text)
+    return text
+
+
+# Cursor Position Report (CPR / DSR) response, format ``ESC[<row>;<col>R``.
+# prompt_toolkit's _on_resize() + renderer send ``ESC[6n`` queries to the
+# terminal; under resize storms or tab switches the terminal's reply can
+# race past the input parser and end up in the input buffer as literal
+# text (see issue #14692). Also matches the visible-form ``^[[<row>;<col>R``
+# that appears when the ESC byte was stripped by a prior filter.
+_DSR_CPR_ESC_RE = re.compile(r"\x1b\[\d+;\d+R")
+_DSR_CPR_VISIBLE_RE = re.compile(r"\^\[\[\d+;\d+R")
+
+
+def _strip_leaked_terminal_responses(text: str) -> str:
+    """Strip leaked terminal control-response sequences from user input.
+
+    Covers Cursor Position Report (CPR / DSR) responses — ``ESC[<row>;<col>R``
+    and the visible ``^[[<row>;<col>R`` form. These are replies the terminal
+    sends back to queries prompt_toolkit makes during ``_on_resize`` /
+    ``_request_absolute_cursor_position``. When the input parser drops one
+    (resize storms, multiplexer focus changes, slow PTYs) the response
+    lands in the input buffer as literal text and corrupts what the user
+    typed.
+    """
+    if not text:
+        return text
+    text = _DSR_CPR_ESC_RE.sub("", text)
+    text = _DSR_CPR_VISIBLE_RE.sub("", text)
+    return text
+
+
 def _collect_query_images(query: str | None, image_arg: str | None = None) -> tuple[str, list[Path]]:
    """Collect local image attachments for single-query CLI flows."""
    message = query or ""
@@ -1843,9 +1936,16 @@ class HermesCLI:
        self.bell_on_complete = CLI_CONFIG["display"].get("bell_on_complete", False)
        # show_reasoning: display model thinking/reasoning before the response
        self.show_reasoning = CLI_CONFIG["display"].get("show_reasoning", False)
-        # busy_input_mode: "interrupt" (Enter interrupts current run) or "queue" (Enter queues for next turn)
-        _bim = CLI_CONFIG["display"].get("busy_input_mode", "interrupt")
-        self.busy_input_mode = "queue" if str(_bim).strip().lower() == "queue" else "interrupt"
+        # busy_input_mode: "interrupt" (Enter interrupts current run),
+        # "queue" (Enter queues for next turn), or "steer" (Enter injects
+        # mid-run via /steer, arriving after the next tool call).
+        _bim = str(CLI_CONFIG["display"].get("busy_input_mode", "interrupt")).strip().lower()
+        if _bim == "queue":
+            self.busy_input_mode = "queue"
+        elif _bim == "steer":
+            self.busy_input_mode = "steer"
+        else:
+            self.busy_input_mode = "interrupt"

        self.verbose = verbose if verbose is not None else (self.tool_progress_mode == "verbose")
        
@@ -2040,6 +2140,11 @@ class HermesCLI:
        # Never blocks startup on failure.
        _run_state_db_auto_maintenance(self._session_db)

+        # Opportunistic shadow-repo cleanup — deletes orphan/stale
+        # checkpoint repos under ~/.hermes/checkpoints/.  Opt-in via
+        # checkpoints.auto_prune, idempotent via .last_prune marker.
+        _run_checkpoint_auto_maintenance()
+
        # Deferred title: stored in memory until the session is created in the DB
        self._pending_title: Optional[str] = None
        
@@ -2113,6 +2218,42 @@ class HermesCLI:
            self._last_invalidate = now
            self._app.invalidate()

+    def _force_full_redraw(self) -> None:
+        """Force a clean full-screen repaint of the prompt_toolkit UI.
+
+        Used to recover from terminal buffer drift caused by external
+        redraws we can't detect — e.g. macOS cmux / tmux tab switches,
+        ``clear`` issued from a subshell, or SSH window restores. These
+        wipe or repaint the terminal without firing SIGWINCH, so
+        prompt_toolkit's tracked ``_cursor_pos`` no longer matches reality
+        and the next incremental redraw stacks on top of stale content
+        (ghost status bars, duplicated prompts).
+
+        Bound to Ctrl+L and exposed as the ``/redraw`` slash command,
+        matching the standard terminal-UX convention (bash, zsh, fish,
+        vim, htop).
+        """
+        app = getattr(self, "_app", None)
+        if not app:
+            return
+        try:
+            renderer = app.renderer
+            out = renderer.output
+            out.reset_attributes()
+            out.erase_screen()
+            out.cursor_goto(0, 0)
+            out.flush()
+            # Drop prompt_toolkit's cached screen + cursor state so the
+            # next _redraw() starts from a known (0, 0) origin and
+            # re-renders every cell rather than diffing against stale state.
+            renderer.reset(leave_alternate_screen=False)
+        except Exception:
+            pass
+        try:
+            app.invalidate()
+        except Exception:
+            pass
+
    def _status_bar_context_style(self, percent_used: Optional[int]) -> str:
        if percent_used is None:
            return "class:status-bar-dim"
@@ -4910,6 +5051,12 @@ class HermesCLI:
        if self.agent:
            self.agent.session_id = new_session_id
            self.agent.session_start = now
+            # Redirect the JSON session log to the new branch session file so
+            # messages written after branching land in the correct file.
+            if hasattr(self.agent, "session_log_file") and hasattr(self.agent, "logs_dir"):
+                self.agent.session_log_file = (
+                    self.agent.logs_dir / f"session_{new_session_id}.json"
+                )
            self.agent.reset_session_state()
            if hasattr(self.agent, "_last_flushed_db_idx"):
                self.agent._last_flushed_db_idx = len(self.conversation_history)
@@ -4931,22 +5078,37 @@ class HermesCLI:
        _cprint(f"  Branch session:   {new_session_id}")

    def save_conversation(self):
-        """Save the current conversation to a file."""
+        """Save the current conversation to a JSON snapshot under ~/.hermes/sessions/saved/.
+
+        The snapshot is a convenience export for sharing or offline inspection;
+        every message is already persisted incrementally to the SQLite session
+        DB, so the live session remains resumable via ``hermes --resume <id>``
+        regardless of whether the user ever runs ``/save``.
+        """
        if not self.conversation_history:
            print("(;_;) No conversation to save.")
            return
-        
+
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
-        filename = f"hermes_conversation_{timestamp}.json"
-        
+        saved_dir = get_hermes_home() / "sessions" / "saved"
        try:
-            with open(filename, "w", encoding="utf-8") as f:
+            saved_dir.mkdir(parents=True, exist_ok=True)
+        except Exception as e:
+            print(f"(x_x) Failed to create save directory {saved_dir}: {e}")
+            return
+        path = saved_dir / f"hermes_conversation_{timestamp}.json"
+
+        try:
+            with open(path, "w", encoding="utf-8") as f:
                json.dump({
                    "model": self.model,
+                    "session_id": self.session_id,
                    "session_start": self.session_start.isoformat(),
                    "messages": self.conversation_history,
                }, f, indent=2, ensure_ascii=False)
-            print(f"(^_^)v Conversation saved to: {filename}")
+            print(f"(^_^)v Conversation snapshot saved to: {path}")
+            if self.session_id:
+                print(f"       Resume the live session with: hermes --resume {self.session_id}")
        except Exception as e:
            print(f"(x_x) Failed to save: {e}")
    
@@ -5153,27 +5315,29 @@ class HermesCLI:
        _cprint(f"  ✓ Model switched: {result.new_model}")
        _cprint(f"    Provider: {provider_label}")

+        # Context: always resolve via the provider-aware chain so Codex OAuth,
+        # Copilot, and Nous-enforced caps win over the raw models.dev entry
+        # (e.g. gpt-5.5 is 1.05M on openai but 272K on Codex OAuth).
        mi = result.model_info
+        try:
+            from hermes_cli.model_switch import resolve_display_context_length
+            ctx = resolve_display_context_length(
+                result.new_model,
+                result.target_provider,
+                base_url=result.base_url or self.base_url or "",
+                api_key=result.api_key or self.api_key or "",
+                model_info=mi,
+            )
+            if ctx:
+                _cprint(f"    Context: {ctx:,} tokens")
+        except Exception:
+            pass
        if mi:
-            if mi.context_window:
-                _cprint(f"    Context: {mi.context_window:,} tokens")
            if mi.max_output:
                _cprint(f"    Max output: {mi.max_output:,} tokens")
            if mi.has_cost_data():
                _cprint(f"    Cost: {mi.format_cost()}")
            _cprint(f"    Capabilities: {mi.format_capabilities()}")
-        else:
-            try:
-                from agent.model_metadata import get_model_context_length
-                ctx = get_model_context_length(
-                    result.new_model,
-                    base_url=result.base_url or self.base_url,
-                    api_key=result.api_key or self.api_key,
-                    provider=result.target_provider,
-                )
-                _cprint(f"    Context: {ctx:,} tokens")
-            except Exception:
-                pass

        cache_enabled = (
            (base_url_host_matches(result.base_url or "", "openrouter.ai") and "claude" in result.new_model.lower())
@@ -5836,6 +6000,7 @@ class HermesCLI:
            platform_status = {
                Platform.TELEGRAM: ("Telegram", "TELEGRAM_BOT_TOKEN"),
                Platform.DISCORD: ("Discord", "DISCORD_BOT_TOKEN"),
+                Platform.SLACK: ("Slack", "SLACK_BOT_TOKEN"),
                Platform.WHATSAPP: ("WhatsApp", "WHATSAPP_ENABLED"),
            }
            
@@ -5906,6 +6071,12 @@ class HermesCLI:
            self.show_toolsets()
        elif canonical == "config":
            self.show_config()
+        elif canonical == "redraw":
+            # Manual recovery for terminal buffer drift from multiplexer
+            # tab switches, subshell ``clear``, SSH window restores, etc.
+            # See issue #8688 (cmux). Ctrl+L is bound to the same helper.
+            self._force_full_redraw()
+            _cprint(f"  {_DIM}✓ UI redrawn{_RST}")
        elif canonical == "clear":
            self.new_session(silent=True)
            # Clear terminal screen.  Inside the TUI, Rich's console.clear()
@@ -6122,8 +6293,6 @@ class HermesCLI:
            self._handle_agents_command()
        elif canonical == "background":
            self._handle_background_command(cmd_original)
-        elif canonical == "btw":
-            self._handle_btw_command(cmd_original)
        elif canonical == "queue":
            # Extract prompt after "/queue " or "/q "
            parts = cmd_original.split(None, 1)
@@ -6302,6 +6471,12 @@ class HermesCLI:
        turn_route = self._resolve_turn_agent_config(prompt)

        def run_background():
+            set_sudo_password_callback(self._sudo_password_callback)
+            set_approval_callback(self._approval_callback)
+            try:
+                set_secret_capture_callback(self._secret_capture_callback)
+            except Exception:
+                pass
            try:
                bg_agent = AIAgent(
                    model=turn_route["model"],
@@ -6399,6 +6574,12 @@ class HermesCLI:
                print()
                _cprint(f"  ❌ Background task #{task_num} failed: {e}")
            finally:
+                try:
+                    set_sudo_password_callback(None)
+                    set_approval_callback(None)
+                    set_secret_capture_callback(None)
+                except Exception:
+                    pass
                self._background_tasks.pop(task_id, None)
                # Clear spinner only if no foreground agent owns it
                if not self._agent_running:
@@ -6410,122 +6591,6 @@ class HermesCLI:
        self._background_tasks[task_id] = thread
        thread.start()

-    def _handle_btw_command(self, cmd: str):
-        """Handle /btw <question> — ephemeral side question using session context.
-
-        Snapshots the current conversation history, spawns a no-tools agent in
-        a background thread, and prints the answer without persisting anything
-        to the main session.
-        """
-        parts = cmd.strip().split(maxsplit=1)
-        if len(parts) < 2 or not parts[1].strip():
-            _cprint("  Usage: /btw <question>")
-            _cprint("  Example: /btw what module owns session title sanitization?")
-            _cprint("  Answers using session context. No tools, not persisted.")
-            return
-
-        question = parts[1].strip()
-        task_id = f"btw_{datetime.now().strftime('%H%M%S')}_{uuid.uuid4().hex[:6]}"
-
-        if not self._ensure_runtime_credentials():
-            _cprint("  (>_<) Cannot start /btw: no valid credentials.")
-            return
-
-        turn_route = self._resolve_turn_agent_config(question)
-        history_snapshot = list(self.conversation_history)
-
-        preview = question[:60] + ("..." if len(question) > 60 else "")
-        _cprint(f'  💬 /btw: "{preview}"')
-
-        def run_btw():
-            try:
-                btw_agent = AIAgent(
-                    model=turn_route["model"],
-                    api_key=turn_route["runtime"].get("api_key"),
-                    base_url=turn_route["runtime"].get("base_url"),
-                    provider=turn_route["runtime"].get("provider"),
-                    api_mode=turn_route["runtime"].get("api_mode"),
-                    acp_command=turn_route["runtime"].get("command"),
-                    acp_args=turn_route["runtime"].get("args"),
-                    max_iterations=8,
-                    enabled_toolsets=[],
-                    quiet_mode=True,
-                    verbose_logging=False,
-                    session_id=task_id,
-                    platform="cli",
-                    reasoning_config=self.reasoning_config,
-                    service_tier=self.service_tier,
-                    request_overrides=turn_route.get("request_overrides"),
-                    providers_allowed=self._providers_only,
-                    providers_ignored=self._providers_ignore,
-                    providers_order=self._providers_order,
-                    provider_sort=self._provider_sort,
-                    provider_require_parameters=self._provider_require_params,
-                    provider_data_collection=self._provider_data_collection,
-                    fallback_model=self._fallback_model,
-                    session_db=None,
-                    skip_memory=True,
-                    skip_context_files=True,
-                    persist_session=False,
-                )
-
-                btw_prompt = (
-                    "[Ephemeral /btw side question. Answer using the conversation "
-                    "context. No tools available. Be direct and concise.]\n\n"
-                    + question
-                )
-                result = btw_agent.run_conversation(
-                    user_message=btw_prompt,
-                    conversation_history=history_snapshot,
-                    task_id=task_id,
-                )
-
-                response = (result.get("final_response") or "") if result else ""
-                if not response and result and result.get("error"):
-                    response = f"Error: {result['error']}"
-
-                # TUI refresh before printing
-                if self._app:
-                    self._app.invalidate()
-                    time.sleep(0.05)
-                print()
-
-                if response:
-                    try:
-                        from hermes_cli.skin_engine import get_active_skin
-                        _skin = get_active_skin()
-                        _resp_color = _skin.get_color("response_border", "#4F6D4A")
-                    except Exception:
-                        _resp_color = "#4F6D4A"
-
-                    ChatConsole().print(Panel(
-                        _render_final_assistant_content(response, mode=self.final_response_markdown),
-                        title=f"[{_resp_color} bold]⚕ /btw[/]",
-                        title_align="left",
-                        border_style=_resp_color,
-                        box=rich_box.HORIZONTALS,
-                        padding=(1, 4),
-                    ))
-                else:
-                    _cprint("  💬 /btw: (no response)")
-
-                if self.bell_on_complete:
-                    sys.stdout.write("\a")
-                    sys.stdout.flush()
-
-            except Exception as e:
-                if self._app:
-                    self._app.invalidate()
-                    time.sleep(0.05)
-                print()
-                _cprint(f"  ❌ /btw failed: {e}")
-            finally:
-                if self._app:
-                    self._invalidate(min_interval=0)
-
-        thread = threading.Thread(target=run_btw, daemon=True, name=f"btw-{task_id}")
-        thread.start()
-
    @staticmethod
    def _try_launch_chrome_debug(port: int, system: str) -> bool:
        """Try to launch Chrome/Chromium with remote debugging enabled.
@@ -6909,24 +6974,36 @@ class HermesCLI:
            /busy               Show current busy input mode
            /busy status        Show current busy input mode
            /busy queue         Queue input for the next turn instead of interrupting
+            /busy steer         Inject input mid-run via /steer on Enter (after next tool call)
            /busy interrupt     Interrupt the current run on Enter (default)
        """
        parts = cmd.strip().split(maxsplit=1)
        if len(parts) < 2 or parts[1].strip().lower() == "status":
            _cprint(f"  {_ACCENT}Busy input mode: {self.busy_input_mode}{_RST}")
-            _cprint(f"  {_DIM}Enter while busy: {'queues for next turn' if self.busy_input_mode == 'queue' else 'interrupts current run'}{_RST}")
-            _cprint(f"  {_DIM}Usage: /busy [queue|interrupt|status]{_RST}")
+            if self.busy_input_mode == "queue":
+                _behavior = "queues for next turn"
+            elif self.busy_input_mode == "steer":
+                _behavior = "steers into current run (after next tool call)"
+            else:
+                _behavior = "interrupts current run"
+            _cprint(f"  {_DIM}Enter while busy: {_behavior}{_RST}")
+            _cprint(f"  {_DIM}Usage: /busy [queue|steer|interrupt|status]{_RST}")
            return

        arg = parts[1].strip().lower()
-        if arg not in {"queue", "interrupt"}:
+        if arg not in {"queue", "interrupt", "steer"}:
            _cprint(f"  {_DIM}(._.) Unknown argument: {arg}{_RST}")
-            _cprint(f"  {_DIM}Usage: /busy [queue|interrupt|status]{_RST}")
+            _cprint(f"  {_DIM}Usage: /busy [queue|steer|interrupt|status]{_RST}")
            return

        self.busy_input_mode = arg
        if save_config_value("display.busy_input_mode", arg):
-            behavior = "Enter will queue follow-up input while Hermes is busy." if arg == "queue" else "Enter will interrupt the current run while Hermes is busy."
+            if arg == "queue":
+                behavior = "Enter will queue follow-up input while Hermes is busy."
+            elif arg == "steer":
+                behavior = "Enter will steer your message into the current run (after the next tool call)."
+            else:
+                behavior = "Enter will interrupt the current run while Hermes is busy."
            _cprint(f"  {_ACCENT}✓ Busy input mode set to '{arg}' (saved to config){_RST}")
            _cprint(f"  {_DIM}{behavior}{_RST}")
        else:
@@ -7328,7 +7405,7 @@ class HermesCLI:
            change_detail = ". ".join(change_parts) + ". " if change_parts else ""
            self.conversation_history.append({
                "role": "user",
-                "content": f"[SYSTEM: MCP servers have been reloaded. {change_detail}{tool_summary}. The tool list for this conversation has been updated accordingly.]",
+                "content": f"[IMPORTANT: MCP servers have been reloaded. {change_detail}{tool_summary}. The tool list for this conversation has been updated accordingly.]",
            })

            # Persist session immediately so the session log reflects the
@@ -7410,6 +7487,31 @@ class HermesCLI:
                    _cprint(f"  {line}")
                except Exception:
                    pass
+                # First-touch onboarding: on the first tool in this process
+                # that takes longer than the threshold while we're in the
+                # noisiest progress mode, print a one-time hint about
+                # /verbose.  Latched on self so it fires at most once per
+                # process; persisted to config.yaml so it never fires again
+                # across processes either.
+                try:
+                    if (
+                        not getattr(self, "_long_tool_hint_fired", False)
+                        and self.tool_progress_mode == "all"
+                        and duration >= 30.0
+                    ):
+                        from agent.onboarding import (
+                            TOOL_PROGRESS_FLAG,
+                            is_seen,
+                            mark_seen,
+                            tool_progress_hint_cli,
+                        )
+                        if not is_seen(CLI_CONFIG, TOOL_PROGRESS_FLAG):
+                            self._long_tool_hint_fired = True
+                            _cprint(f"  {_DIM}{tool_progress_hint_cli()}{_RST}")
+                            mark_seen(_hermes_home / "config.yaml", TOOL_PROGRESS_FLAG)
+                            CLI_CONFIG.setdefault("onboarding", {}).setdefault("seen", {})[TOOL_PROGRESS_FLAG] = True
+                except Exception:
+                    pass
            self._invalidate()
            return
        if event_type != "tool.started":
@@ -8340,13 +8442,62 @@ class HermesCLI:
        ):
            return None
        
-        # Pre-process images through the vision tool (Gemini Flash) so the
-        # main model receives text descriptions instead of raw base64 image
-        # content — works with any model, not just vision-capable ones.
+        # Route image attachments based on the active model's vision capability.
+        # "native" → pass pixels as OpenAI-style content parts (adapters
+        #            translate for Anthropic/Gemini/Bedrock).
+        # "text"   → pre-analyze each image with vision_analyze and prepend the
+        #            description as text — works with non-vision models.
+        # See agent/image_routing.py for the decision table.
        if images:
-            message = self._preprocess_images_with_vision(
-                message if isinstance(message, str) else "", images
-            )
+            try:
+                from agent.image_routing import (
+                    build_native_content_parts,
+                    decide_image_input_mode,
+                )
+                from hermes_cli.config import load_config
+
+                _img_mode = decide_image_input_mode(
+                    (self.provider or "").strip(),
+                    (self.model or "").strip(),
+                    load_config(),
+                )
+            except Exception as _img_exc:
+                logging.debug("image_routing decision failed, defaulting to text: %s", _img_exc)
+                _img_mode = "text"
+
+            if _img_mode == "native":
+                try:
+                    _text_for_parts = message if isinstance(message, str) else ""
+                    _img_str_paths = [str(p) for p in images]
+                    _parts, _skipped = build_native_content_parts(
+                        _text_for_parts,
+                        _img_str_paths,
+                    )
+                    if _skipped:
+                        _cprint(
+                            f"  {_DIM}⚠ skipped {len(_skipped)} unreadable image path(s){_RST}"
+                        )
+                    if any(p.get("type") == "image_url" for p in _parts):
+                        _img_names = ", ".join(Path(p).name for p in _img_str_paths)
+                        _cprint(
+                            f"  {_DIM}📎 attaching {len(images)} image(s) natively "
+                            f"(model supports vision): {_img_names}{_RST}"
+                        )
+                        message = _parts
+                    else:
+                        # All images unreadable — fall back to text enrichment.
+                        message = self._preprocess_images_with_vision(
+                            message if isinstance(message, str) else "", images
+                        )
+                except Exception as _img_exc:
+                    logging.warning("native image attach failed, falling back to text: %s", _img_exc)
+                    message = self._preprocess_images_with_vision(
+                        message if isinstance(message, str) else "", images
+                    )
+            else:
+                message = self._preprocess_images_with_vision(
+                    message if isinstance(message, str) else "", images
+                )

        # Expand @ context references (e.g. @file:main.py, @diff, @folder:src/)
        if isinstance(message, str) and "@" in message:
@@ -8649,12 +8800,20 @@ class HermesCLI:
            if response and result and not result.get("failed") and not result.get("partial"):
                try:
                    from agent.title_generator import maybe_auto_title
+                    # Route title-generation failures through the agent's
+                    # user-visible warning channel so a depleted auxiliary
+                    # provider doesn't silently leave sessions untitled
+                    # (issue #15775).
+                    _title_failure_cb = getattr(
+                        self.agent, "_emit_auxiliary_failure", None
+                    ) if self.agent else None
                    maybe_auto_title(
                        self._session_db,
                        self.session_id,
                        message,
                        response,
                        self.conversation_history,
+                        failure_callback=_title_failure_cb,
                    )
                except Exception:
                    pass
@@ -9077,6 +9236,30 @@ class HermesCLI:
            _welcome_text = "Welcome to Hermes Agent! Type your message or /help for commands."
            _welcome_color = "#FFF8DC"
        self._console_print(f"[{_welcome_color}]{_welcome_text}[/]")
+        # First-time OpenClaw-residue banner — fires once if ~/.openclaw/ exists
+        # after an OpenClaw→Hermes migration (especially migrations done by
+        # OpenClaw's own tool, which doesn't archive the source directory).
+        try:
+            from agent.onboarding import (
+                OPENCLAW_RESIDUE_FLAG,
+                detect_openclaw_residue,
+                is_seen,
+                mark_seen,
+                openclaw_residue_hint_cli,
+            )
+            if not is_seen(self.config, OPENCLAW_RESIDUE_FLAG) and detect_openclaw_residue():
+                try:
+                    _resid_color = _welcome_skin.get_color("banner_dim", "#B8860B")
+                except Exception:
+                    _resid_color = "#B8860B"
+                self._console_print(f"[{_resid_color}]{openclaw_residue_hint_cli()}[/]")
+                try:
+                    from hermes_cli.config import get_config_path as _get_cfg_path_resid
+                    mark_seen(_get_cfg_path_resid(), OPENCLAW_RESIDUE_FLAG)
+                except Exception:
+                    pass  # best-effort — banner will fire again next session
+        except Exception:
+            pass  # banner is non-critical — never break startup
        # Show a random tip to help users discover features
        try:
            from hermes_cli.tips import get_random_tip
@@ -9278,12 +9461,34 @@ class HermesCLI:
                # Bundle text + images as a tuple when images are present
                payload = (text, images) if images else text
                if self._agent_running and not (text and _looks_like_slash_command(text)):
-                    if self.busy_input_mode == "queue":
+                    _effective_mode = self.busy_input_mode
+                    if _effective_mode == "steer":
+                        # Route Enter through /steer — inject mid-run after the
+                        # next tool call.  Images can't ride along (steer only
+                        # appends text), so fall back to queue when images are
+                        # attached.  If the agent lacks steer() or rejects the
+                        # payload, also fall back to queue so nothing is lost.
+                        if images or not text:
+                            _effective_mode = "queue"
+                        else:
+                            accepted = False
+                            try:
+                                if self.agent is not None and hasattr(self.agent, "steer"):
+                                    accepted = bool(self.agent.steer(text))
+                            except Exception as exc:
+                                _cprint(f"  {_DIM}Steer failed ({exc}) — queued for next turn.{_RST}")
+                                accepted = False
+                            if accepted:
+                                preview = text[:80] + ("..." if len(text) > 80 else "")
+                                _cprint(f"  {_ACCENT}⏩ Steered: '{preview}'{_RST}")
+                            else:
+                                _effective_mode = "queue"
+                    if _effective_mode == "queue":
                        # Queue for the next turn instead of interrupting
                        self._pending_input.put(payload)
                        preview = text if text else f"[{len(images)} image{'s' if len(images) != 1 else ''} attached]"
                        _cprint(f"  Queued for the next turn: {preview[:80]}{'...' if len(preview) > 80 else ''}")
-                    else:
+                    elif _effective_mode == "interrupt":
                        self._interrupt_queue.put(payload)
                        # Debug: log to file when message enters interrupt queue
                        try:
@@ -9293,6 +9498,24 @@ class HermesCLI:
                                         f"agent_running={self._agent_running}\n")
                        except Exception:
                            pass
+                    # First-touch onboarding: on the very first busy-while-running
+                    # event for this install, print a one-line tip explaining the
+                    # /busy knob.  Flag persists to config.yaml and never fires
+                    # again.  Guarded for exceptions so onboarding can't break
+                    # the input loop.
+                    try:
+                        from agent.onboarding import (
+                            BUSY_INPUT_FLAG,
+                            busy_input_hint_cli,
+                            is_seen,
+                            mark_seen,
+                        )
+                        if not is_seen(CLI_CONFIG, BUSY_INPUT_FLAG):
+                            _cprint(f"  {_DIM}{busy_input_hint_cli(self.busy_input_mode)}{_RST}")
+                            mark_seen(_hermes_home / "config.yaml", BUSY_INPUT_FLAG)
+                            CLI_CONFIG.setdefault("onboarding", {}).setdefault("seen", {})[BUSY_INPUT_FLAG] = True
+                    except Exception:
+                        pass
                else:
                    self._pending_input.put(payload)
                event.app.current_buffer.reset(append_to_history=True)
@@ -9468,6 +9691,17 @@ class HermesCLI:
            """Down arrow: browse history when on last line, else move cursor down."""
            event.app.current_buffer.auto_down(count=event.arg)

+        @kb.add('c-l')
+        def handle_ctrl_l(event):
+            """Ctrl+L: force a clean full-screen repaint.
+
+            Recovers the UI after external terminal buffer drift — tmux /
+            cmux tab switches, ``clear`` from a subshell, SSH window
+            restores, etc. — that prompt_toolkit can't detect on its own.
+            Matches the universal bash/zsh/fish/vim/htop convention.
+            """
+            self._force_full_redraw()
+
        @kb.add('c-c')
        def handle_ctrl_c(event):
            """Handle Ctrl+C - cancel interactive prompts, interrupt agent, or exit.
@@ -9695,10 +9929,18 @@ class HermesCLI:
            placeholder while preserving any existing user text in the
            buffer.
            """
+            # Diagnostic canary: measure how long the paste handler blocks
+            # the prompt_toolkit event loop. If this exceeds ~500ms we log
+            # it so recurring "CLI freezes on paste" reports (issue #16263,
+            # macOS Tahoe 26 + iTerm2/Ghostty) arrive with data attached.
+            _paste_handler_start = time.perf_counter()
+            _paste_raw_size = len(event.data or "")
            pasted_text = event.data or ""
            # Normalise line endings — Windows \r\n and old Mac \r both become \n
            # so the 5-line collapse threshold and display are consistent.
            pasted_text = pasted_text.replace('\r\n', '\n').replace('\r', '\n')
+            pasted_text = _strip_leaked_bracketed_paste_wrappers(pasted_text)
+            pasted_text = _strip_leaked_terminal_responses(pasted_text)
            if _should_auto_attach_clipboard_image_on_paste(pasted_text) and self._try_attach_clipboard_image():
                event.app.invalidate()
            if pasted_text:
@@ -9721,6 +9963,17 @@ class HermesCLI:
                    buf.insert_text(prefix + placeholder)
                else:
                    buf.insert_text(pasted_text)
+            _paste_handler_elapsed_ms = (time.perf_counter() - _paste_handler_start) * 1000.0
+            if _paste_handler_elapsed_ms > 500.0:
+                logger.warning(
+                    "Slow bracketed-paste handler: %.1fms to process %d chars "
+                    "(%d lines) on %s. If the input becomes unresponsive after "
+                    "this, attach this log line to the bug report.",
+                    _paste_handler_elapsed_ms,
+                    _paste_raw_size,
+                    pasted_text.count('\n') + 1 if pasted_text else 0,
+                    sys.platform,
+                )

        @kb.add('c-v')
        def handle_ctrl_v(event):
@@ -9840,7 +10093,16 @@ class HermesCLI:
               still batch newlines.  Alt+Enter only adds 1 newline per
               event so it never triggers this.
            """
-            text = buf.text
+            text = _strip_leaked_bracketed_paste_wrappers(buf.text)
+            text = _strip_leaked_terminal_responses(text)
+            if text != buf.text:
+                cursor = min(buf.cursor_position, len(text))
+                _paste_just_collapsed[0] = True
+                buf.text = text
+                buf.cursor_position = cursor
+                _prev_text_len[0] = len(text)
+                _prev_newline_count[0] = text.count('\n')
+                return
            chars_added = len(text) - _prev_text_len[0]
            _prev_text_len[0] = len(text)
            if _paste_just_collapsed[0] or self._skip_paste_collapse:
@@ -9909,7 +10171,7 @@ class HermesCLI:
                status = cli_ref._command_status or "Processing command..."
                return f"{frame} {status}"
            if cli_ref._agent_running:
-                return "type a message + Enter to interrupt, Ctrl+C to cancel"
+                return "msg=interrupt · /queue · /bg · /steer · Ctrl+C cancel"
            if cli_ref._voice_mode:
                return "type or Ctrl+B to record"
            return ""
@@ -10497,36 +10759,30 @@ class HermesCLI:
        # only cursor_up()s by the stored layout height, missing the extra
        # rows created by reflow — leaving ghost duplicates visible.
        #
-        # Fix: before the standard erase, inflate _cursor_pos.y so the
-        # cursor moves up far enough to cover the reflowed ghost content.
+        # It's not just column-shrink: widening, row-shrinking, and
+        # multiplexer-driven SIGWINCH-less redraws (cmux / tmux tab switch)
+        # all produce the same class of drift, where the renderer's tracked
+        # _cursor_pos.y no longer matches terminal reality. The only reliable
+        # recovery is a full screen-clear (\x1b[2J\x1b[H) before the next
+        # redraw, so we force one on every resize rather than trying to
+        # compute the exact drift.
        _original_on_resize = app._on_resize

        def _resize_clear_ghosts():
-            from prompt_toolkit.data_structures import Point as _Pt
            renderer = app.renderer
            try:
-                old_size = renderer._last_size
-                new_size = renderer.output.get_size()
-                if (
-                    old_size
-                    and new_size.columns < old_size.columns
-                    and new_size.columns > 0
-                ):
-                    reflow_factor = (
-                        (old_size.columns + new_size.columns - 1)
-                        // new_size.columns
-                    )
-                    last_h = (
-                        renderer._last_screen.height
-                        if renderer._last_screen
-                        else 0
-                    )
-                    extra = last_h * (reflow_factor - 1)
-                    if extra > 0:
-                        renderer._cursor_pos = _Pt(
-                            x=renderer._cursor_pos.x,
-                            y=renderer._cursor_pos.y + extra,
-                        )
+                out = renderer.output
+                # Reset attributes, erase the entire screen, and home the
+                # cursor. This overwrites any reflowed status-bar rows or
+                # stale content the terminal kept from the prior layout.
+                out.reset_attributes()
+                out.erase_screen()
+                out.cursor_goto(0, 0)
+                out.flush()
+                # Tell the renderer its tracked position is fresh so its
+                # own erase() inside _on_resize doesn't cursor_up() past
+                # the top of the screen.
+                renderer.reset(leave_alternate_screen=False)
            except Exception:
                pass  # never break resize handling
            _original_on_resize()
@@ -10534,7 +10790,6 @@ class HermesCLI:
        app._on_resize = _resize_clear_ghosts

        def spinner_loop():
-            last_idle_refresh = 0.0
            while not self._should_exit:
                if not self._app:
                    time.sleep(0.1)
@@ -10543,10 +10798,11 @@ class HermesCLI:
                    self._invalidate(min_interval=0.1)
                    time.sleep(0.1)
                else:
-                    now = time.monotonic()
-                    if now - last_idle_refresh >= 1.0:
-                        last_idle_refresh = now
-                        self._invalidate(min_interval=1.0)
+                    # Do not repaint the idle prompt every second. In non-full-screen
+                    # prompt_toolkit mode, background redraws can fight tmux/Ghostty/cmux
+                    # viewport restoration after focus changes and visually move the
+                    # command input area. Keep idle stable; input/agent events still
+                    # invalidate explicitly when the UI actually changes.
                    time.sleep(0.2)

        spinner_thread = threading.Thread(target=spinner_loop, daemon=True)
@@ -10588,6 +10844,10 @@ class HermesCLI:
                    submit_images = []
                    if isinstance(user_input, tuple):
                        user_input, submit_images = user_input
+
+                    if isinstance(user_input, str):
+                        user_input = _strip_leaked_bracketed_paste_wrappers(user_input)
+                        user_input = _strip_leaked_terminal_responses(user_input)
                    
                    # Check for commands — but detect dragged/pasted file paths first.
                    # See _detect_file_drop() for details.
@@ -311,6 +311,12 @@ def compute_next_run(schedule: Dict[str, Any], last_run_at: Optional[str] = None

    elif schedule["kind"] == "cron":
        if not HAS_CRONITER:
+            logger.warning(
+                "Cannot compute next run for cron schedule %r: 'croniter' "
+                "is not installed. Install the 'cron' extra (pip install "
+                "'hermes-agent[cron]') to re-enable recurring cron jobs.",
+                schedule.get("expr"),
+            )
            return None
        cron = croniter(schedule["expr"], now)
        next_run = cron.get_next(datetime)
@@ -698,10 +704,32 @@ def mark_job_run(job_id: str, success: bool, error: Optional[str] = None,
                # Compute next run
                job["next_run_at"] = compute_next_run(job["schedule"], now)

-                # If no next run (one-shot completed), disable
+                # If no next run, decide whether this is terminal completion
+                # (one-shot) or a transient failure (recurring schedule couldn't
+                # compute — e.g. 'croniter' missing from the runtime env).
+                # Recurring jobs must NEVER be silently disabled: that turns a
+                # missing runtime dep into "job completed" and the user's
+                # schedule quietly stops firing. See issue #16265.
                if job["next_run_at"] is None:
-                    job["enabled"] = False
-                    job["state"] = "completed"
+                    kind = job.get("schedule", {}).get("kind")
+                    if kind in ("cron", "interval"):
+                        job["state"] = "error"
+                        if not job.get("last_error"):
+                            job["last_error"] = (
+                                "Failed to compute next run for recurring "
+                                "schedule (for cron schedules, check that "
+                                "'croniter' is installed in the gateway's env)"
+                            )
+                        logger.error(
+                            "Job '%s' (%s) could not compute next_run_at; "
+                            "leaving enabled and marking state=error so the "
+                            "job is not silently disabled.",
+                            job.get("name", job["id"]),
+                            kind,
+                        )
+                    else:
+                        job["enabled"] = False
+                        job["state"] = "completed"
                elif job.get("state") != "paused":
                    job["state"] = "scheduled"
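The branch above separates "nothing left to schedule" from "could not schedule". A minimal sketch of that decision in isolation, assuming a plain dict job; the helper name `settle_missing_next_run` and the one-shot kind name `"once"` are hypothetical and do not come from the diff:

```python
from typing import Any, Dict

# Schedule kinds that are expected to fire again (the two named in the diff).
RECURRING_KINDS = ("cron", "interval")


def settle_missing_next_run(job: Dict[str, Any]) -> Dict[str, Any]:
    """Update a job whose next_run_at came back as None (hypothetical helper)."""
    kind = (job.get("schedule") or {}).get("kind")
    if kind in RECURRING_KINDS:
        # Transient failure: leave "enabled" untouched and surface the error.
        job["state"] = "error"
        if not job.get("last_error"):
            job["last_error"] = "Failed to compute next run for recurring schedule"
    else:
        # One-shot job that has already run: nothing left to schedule.
        job["enabled"] = False
        job["state"] = "completed"
    return job


cron_job = settle_missing_next_run(
    {"enabled": True, "schedule": {"kind": "cron", "expr": "0 9 * * *"}}
)
# "once" is an assumed name for a non-recurring schedule kind.
once_job = settle_missing_next_run({"enabled": True, "schedule": {"kind": "once"}})
print(cron_job["enabled"], cron_job["state"])
print(once_job["enabled"], once_job["state"])
```

The recurring job stays enabled so that a later tick can recover once the missing dependency is installed; only the one-shot job is retired.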

@@ -77,7 +77,7 @@ _KNOWN_DELIVERY_PLATFORMS = frozenset({
    "telegram", "discord", "slack", "whatsapp", "signal",
    "matrix", "mattermost", "homeassistant", "dingtalk", "feishu",
    "wecom", "wecom_callback", "weixin", "sms", "email", "webhook", "bluebubbles",
-    "qqbot",
+    "qqbot", "yuanbao",
 })

 # Platforms that support a configured cron/notification home target, mapped to
@@ -337,6 +337,7 @@ def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> Option
        "sms": Platform.SMS,
        "bluebubbles": Platform.BLUEBUBBLES,
        "qqbot": Platform.QQBOT,
+        "yuanbao": Platform.YUANBAO,
    }

    # Optionally wrap the content with a header/footer so the user knows this
@@ -715,7 +716,7 @@ def _build_job_prompt(job: dict, prerun_script: Optional[tuple] = None) -> str:
    # Always prepend cron execution guidance so the agent knows how
    # delivery works and can suppress delivery when appropriate.
    cron_hint = (
-        "[SYSTEM: You are running as a scheduled cron job. "
+        "[IMPORTANT: You are running as a scheduled cron job. "
        "DELIVERY: Your final response will be automatically delivered "
        "to the user — do NOT use send_message or try to deliver "
        "the output yourself. Just produce your report/output as your "
@@ -751,7 +752,7 @@ def _build_job_prompt(job: dict, prerun_script: Optional[tuple] = None) -> str:
            parts.append("")
        parts.extend(
            [
-                f'[SYSTEM: The user has invoked the "{skill_name}" skill, indicating they want you to follow its instructions. The full skill content is loaded below.]',
+                f'[IMPORTANT: The user has invoked the "{skill_name}" skill, indicating they want you to follow its instructions. The full skill content is loaded below.]',
                "",
                content,
            ]
@@ -759,7 +760,7 @@ def _build_job_prompt(job: dict, prerun_script: Optional[tuple] = None) -> str:

    if skipped:
        notice = (
-            f"[SYSTEM: The following skill(s) were listed for this job but could not be found "
+            f"[IMPORTANT: The following skill(s) were listed for this job but could not be found "
            f"and were skipped: {', '.join(skipped)}. "
            f"Start your response with a brief notice so the user is aware, e.g.: "
            f"'⚠️ Skill(s) not found and skipped: {', '.join(skipped)}']"
@@ -821,6 +822,8 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
    logger.info("Running job '%s' (ID: %s)", job_name, job_id)
    logger.info("Prompt: %s", prompt[:100])

+    agent = None
+
    # Mark this as a cron session so the approval system can apply cron_mode.
    # This env var is process-wide and persists for the lifetime of the
    # scheduler process — every job this process runs is a cron job.
@@ -1169,6 +1172,24 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
                _session_db.close()
            except (Exception, KeyboardInterrupt) as e:
                logger.debug("Job '%s': failed to close SQLite session store: %s", job_id, e)
+        # Release subprocesses, terminal sandboxes, browser daemons, and the
+        # main OpenAI/httpx client held by this ephemeral cron agent. Without
+        # this, a gateway that ticks cron every N minutes leaks fds per job
+        # until it hits EMFILE (#10200 / "too many open files").
+        try:
+            if agent is not None:
+                agent.close()
+        except (Exception, KeyboardInterrupt) as e:
+            logger.debug("Job '%s': failed to close agent resources: %s", job_id, e)
+        # Each cron run spins up a short-lived worker thread whose event loop
+        # dies as soon as the ``ThreadPoolExecutor`` shuts down. Any async
+        # httpx clients cached under that loop are now unusable — reap them
+        # so their transports don't accumulate in the process-global cache.
+        try:
+            from agent.auxiliary_client import cleanup_stale_async_clients
+            cleanup_stale_async_clients()
+        except Exception as e:
+            logger.debug("Job '%s': failed to reap stale auxiliary clients: %s", job_id, e)


 def tick(verbose: bool = True, adapters=None, loop=None) -> int:
@@ -1308,6 +1329,17 @@ def tick(verbose: bool = True, adapters=None, loop=None) -> int:
                    _futures.append(_tick_pool.submit(_ctx.run, _process_job, job))
                _results.extend(f.result() for f in _futures)

+        # Best-effort sweep of MCP stdio subprocesses that survived their
+        # session teardown during this tick.  Runs AFTER every job has
+        # finished so active sessions (including live user chats) are
+        # never touched — only PIDs explicitly detected as orphans in
+        # tools.mcp_tool._run_stdio's finally block are reaped.
+        try:
+            from tools.mcp_tool import _kill_orphaned_mcp_children
+            _kill_orphaned_mcp_children()
+        except Exception as _e:
+            logger.debug("Post-tick MCP orphan cleanup failed: %s", _e)
+
        return sum(_results)
    finally:
        if fcntl:
@@ -41,6 +41,15 @@ if [ "$(id -u)" = "0" ]; then
            echo "Warning: chown failed (rootless container?) — continuing anyway"
    fi

+    # Ensure config.yaml is readable by the hermes runtime user even if it was
+    # edited on the host after initial ownership setup. Must run here (as root)
+    # rather than after the gosu drop, otherwise a non-root caller like
+    # `docker run -u $(id -u):$(id -g)` hits "Operation not permitted" (#15865).
+    if [ -f "$HERMES_HOME/config.yaml" ]; then
+        chown hermes:hermes "$HERMES_HOME/config.yaml" 2>/dev/null || true
+        chmod 640 "$HERMES_HOME/config.yaml" 2>/dev/null || true
+    fi
+
    echo "Dropping root privileges"
    exec gosu hermes "$0" "$@"
 fi
@@ -67,13 +76,6 @@ if [ ! -f "$HERMES_HOME/config.yaml" ]; then
    cp "$INSTALL_DIR/cli-config.yaml.example" "$HERMES_HOME/config.yaml"
 fi

-# Ensure the main config file remains accessible to the hermes runtime user
-# even if it was edited on the host after initial ownership setup.
-if [ -f "$HERMES_HOME/config.yaml" ]; then
-    chown hermes:hermes "$HERMES_HOME/config.yaml"
-    chmod 640 "$HERMES_HOME/config.yaml"
-fi
-
 # SOUL.md
 if [ ! -f "$HERMES_HOME/SOUL.md" ]; then
    cp "$INSTALL_DIR/docker/SOUL.md" "$HERMES_HOME/SOUL.md"
@@ -36,6 +36,7 @@

      imports = [
        ./nix/packages.nix
+        ./nix/overlays.nix
        ./nix/nixosModules.nix
        ./nix/checks.nix
        ./nix/devShell.nix
@@ -57,7 +57,7 @@ def _session_entry_name(origin: Dict[str, Any]) -> str:
 # Build / refresh
 # ---------------------------------------------------------------------------

-def build_channel_directory(adapters: Dict[Any, Any]) -> Dict[str, Any]:
+async def build_channel_directory(adapters: Dict[Any, Any]) -> Dict[str, Any]:
    """
    Build a channel directory from connected platform adapters and session data.

@@ -72,7 +72,7 @@ def build_channel_directory(adapters: Dict[Any, Any]) -> Dict[str, Any]:
            if platform == Platform.DISCORD:
                platforms["discord"] = _build_discord(adapter)
            elif platform == Platform.SLACK:
-                platforms["slack"] = _build_slack(adapter)
+                platforms["slack"] = await _build_slack(adapter)
        except Exception as e:
            logger.warning("Channel directory: failed to build %s: %s", platform.value, e)

@@ -136,21 +136,66 @@ def _build_discord(adapter) -> List[Dict[str, str]]:
    return channels


-def _build_slack(adapter) -> List[Dict[str, str]]:
-    """List Slack channels the bot has joined."""
-    # Slack adapter may expose a web client
-    client = getattr(adapter, "_app", None) or getattr(adapter, "_client", None)
-    if not client:
+async def _build_slack(adapter) -> List[Dict[str, Any]]:
+    """List Slack channels the bot has joined across all workspaces.
+
+    Uses ``users.conversations`` against each workspace's web client. Pulls
+    public + private channels the bot is a member of, then merges in DMs
+    discovered from session history (IMs aren't useful to enumerate
+    proactively).
+    """
+    team_clients = getattr(adapter, "_team_clients", None) or {}
+    if not team_clients:
        return _build_from_sessions("slack")

-    try:
-        from tools.send_message_tool import _send_slack  # noqa: F401
-        # Use the Slack Web API directly if available
-    except Exception:
-        pass
+    channels: List[Dict[str, Any]] = []
+    seen_ids: set = set()

-    # Fallback to session data
-    return _build_from_sessions("slack")
+    for team_id, client in team_clients.items():
+        try:
+            cursor: Optional[str] = None
+            for _page in range(20):  # safety cap on pagination
+                response = await client.users_conversations(
+                    types="public_channel,private_channel",
+                    exclude_archived=True,
+                    limit=200,
+                    cursor=cursor,
+                )
+                if not response.get("ok"):
+                    logger.warning(
+                        "Channel directory: users.conversations not ok for team %s: %s",
+                        team_id,
+                        response.get("error", "unknown"),
+                    )
+                    break
+                for ch in response.get("channels", []):
+                    cid = ch.get("id")
+                    name = ch.get("name")
+                    if not cid or not name or cid in seen_ids:
+                        continue
+                    seen_ids.add(cid)
+                    channels.append({
+                        "id": cid,
+                        "name": name,
+                        "type": "private" if ch.get("is_private") else "channel",
+                    })
+                cursor = (response.get("response_metadata") or {}).get("next_cursor")
+                if not cursor:
+                    break
+        except Exception as e:
+            logger.warning(
+                "Channel directory: failed to list Slack channels for team %s: %s",
+                team_id, e,
+            )
+            continue
+
+    # Merge in DM/group entries discovered from session history.
+    for entry in _build_from_sessions("slack"):
+        if entry.get("id") not in seen_ids:
+            channels.append(entry)
+            seen_ids.add(entry.get("id"))
+
+    return channels


 def _build_from_sessions(platform_name: str) -> List[Dict[str, str]]:
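The cursor loop in `_build_slack` can be exercised without a Slack workspace. The sketch below assumes a hypothetical `FakeSlackClient` that serves two canned pages; only the `users_conversations` call shape and the response keys are taken from the diff, everything else is illustrative:

```python
import asyncio
from typing import Any, Dict, List, Optional


class FakeSlackClient:
    """Hypothetical stand-in for a workspace web client; serves two pages."""

    def __init__(self) -> None:
        self._pages: Dict[Optional[str], Dict[str, Any]] = {
            None: {
                "ok": True,
                "channels": [{"id": "C1", "name": "general"}],
                "response_metadata": {"next_cursor": "page-2"},
            },
            "page-2": {
                "ok": True,
                "channels": [{"id": "C2", "name": "ops", "is_private": True}],
                "response_metadata": {"next_cursor": ""},
            },
        }

    async def users_conversations(self, *, cursor: Optional[str] = None, **_ignored: Any) -> Dict[str, Any]:
        return self._pages[cursor]


async def list_channels(client: Any, max_pages: int = 20) -> List[Dict[str, Any]]:
    """Follow next_cursor until it is empty or the page cap is reached."""
    channels: List[Dict[str, Any]] = []
    seen_ids: set = set()
    cursor: Optional[str] = None
    for _page in range(max_pages):
        response = await client.users_conversations(
            types="public_channel,private_channel",
            exclude_archived=True,
            limit=200,
            cursor=cursor,
        )
        if not response.get("ok"):
            break
        for ch in response.get("channels", []):
            cid, name = ch.get("id"), ch.get("name")
            if not cid or not name or cid in seen_ids:
                continue
            seen_ids.add(cid)
            channels.append(
                {"id": cid, "name": name, "type": "private" if ch.get("is_private") else "channel"}
            )
        cursor = (response.get("response_metadata") or {}).get("next_cursor")
        if not cursor:
            break
    return channels
```

Calling `asyncio.run(list_channels(FakeSlackClient()))` walks both pages and stops on the empty cursor; the page cap only matters if a server keeps returning a non-empty cursor.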
@@ -223,6 +268,14 @@ def resolve_channel_name(platform_name: str, name: str) -> Optional[str]:
    if not channels:
        return None

+    # 0. Exact ID match — case-sensitive, no normalization. Lets callers pass
+    # raw platform IDs (e.g. Slack "C0B0QV5434G") even when the format guard
+    # in _parse_target_ref hasn't recognized them as explicit.
+    raw = name.strip()
+    for ch in channels:
+        if ch.get("id") == raw:
+            return ch["id"]
+
    query = _normalize_channel_query(name)

    # 1. Exact name match, including the display labels shown by send_message(action="list")
@@ -67,6 +67,7 @@ class Platform(Enum):
    WEIXIN = "weixin"
    BLUEBUBBLES = "bluebubbles"
    QQBOT = "qqbot"
+    YUANBAO = "yuanbao"


@dataclass
@@ -195,6 +196,14 @@ class StreamingConfig:
    edit_interval: float = 1.0    # Seconds between message edits (Telegram rate-limits at ~1/s)
    buffer_threshold: int = 40    # Chars before forcing an edit
    cursor: str = " ▉"           # Cursor shown during streaming
+    # Ported from openclaw/openclaw#72038.  When >0, the final edit for
+    # a long-running streamed response is delivered as a fresh message
+    # if the original preview has been visible for at least this many
+    # seconds, so the platform's visible timestamp reflects completion
+    # time instead of the preview creation time.  Currently applied to
+    # Telegram only (other platforms ignore the setting).  Default 60s
+    # matches the OpenClaw rollout.  Set to 0 to disable.
+    fresh_final_after_seconds: float = 60.0

    def to_dict(self) -> Dict[str, Any]:
        return {
@@ -203,6 +212,7 @@ class StreamingConfig:
            "edit_interval": self.edit_interval,
            "buffer_threshold": self.buffer_threshold,
            "cursor": self.cursor,
+            "fresh_final_after_seconds": self.fresh_final_after_seconds,
        }

    @classmethod
@@ -215,6 +225,9 @@ class StreamingConfig:
            edit_interval=float(data.get("edit_interval", 1.0)),
            buffer_threshold=int(data.get("buffer_threshold", 40)),
            cursor=data.get("cursor", " ▉"),
+            fresh_final_after_seconds=float(
+                data.get("fresh_final_after_seconds", 60.0)
+            ),
        )


@@ -314,6 +327,9 @@ class GatewayConfig:
            # QQBot uses extra dict for app credentials
            elif platform == Platform.QQBOT and config.extra.get("app_id") and config.extra.get("client_secret"):
                connected.append(platform)
+            # Yuanbao uses extra dict for app credentials
+            elif platform == Platform.YUANBAO and config.extra.get("app_id") and config.extra.get("app_secret"):
+                connected.append(platform)
            # DingTalk uses client_id/client_secret from config.extra or env vars
            elif platform == Platform.DINGTALK and (
                config.extra.get("client_id") or os.getenv("DINGTALK_CLIENT_ID")
@@ -550,6 +566,8 @@ def load_gateway_config() -> GatewayConfig:
                        existing = {}
                    # Deep-merge extra dicts so gateway.json defaults survive
                    merged_extra = {**existing.get("extra", {}), **plat_block.get("extra", {})}
+                    if plat_name == Platform.SLACK.value and "enabled" in plat_block:
+                        merged_extra["_enabled_explicit"] = True
                    merged = {**existing, **plat_block}
                    if merged_extra:
                        merged["extra"] = merged_extra
@@ -570,6 +588,8 @@ def load_gateway_config() -> GatewayConfig:
                    )
                if "reply_prefix" in platform_cfg:
                    bridged["reply_prefix"] = platform_cfg["reply_prefix"]
+                if "reply_in_thread" in platform_cfg:
+                    bridged["reply_in_thread"] = platform_cfg["reply_in_thread"]
                if "require_mention" in platform_cfg:
                    bridged["require_mention"] = platform_cfg["require_mention"]
                if "free_response_channels" in platform_cfg:
@@ -584,7 +604,7 @@ def load_gateway_config() -> GatewayConfig:
                    bridged["group_policy"] = platform_cfg["group_policy"]
                if "group_allow_from" in platform_cfg:
                    bridged["group_allow_from"] = platform_cfg["group_allow_from"]
-                if plat == Platform.DISCORD and "channel_skill_bindings" in platform_cfg:
+                if plat in (Platform.DISCORD, Platform.SLACK) and "channel_skill_bindings" in platform_cfg:
                    bridged["channel_skill_bindings"] = platform_cfg["channel_skill_bindings"]
                if "channel_prompts" in platform_cfg:
                    channel_prompts = platform_cfg["channel_prompts"]
@@ -592,16 +612,21 @@ def load_gateway_config() -> GatewayConfig:
                        bridged["channel_prompts"] = {str(k): v for k, v in channel_prompts.items()}
                    else:
                        bridged["channel_prompts"] = channel_prompts
-                if not bridged:
+                enabled_was_explicit = "enabled" in platform_cfg
+                if not bridged and not enabled_was_explicit:
                    continue
                plat_data = platforms_data.setdefault(plat.value, {})
                if not isinstance(plat_data, dict):
                    plat_data = {}
                    platforms_data[plat.value] = plat_data
+                if enabled_was_explicit:
+                    plat_data["enabled"] = platform_cfg["enabled"]
                extra = plat_data.setdefault("extra", {})
                if not isinstance(extra, dict):
                    extra = {}
                    plat_data["extra"] = extra
+                if plat == Platform.SLACK and enabled_was_explicit:
+                    extra["_enabled_explicit"] = True
                extra.update(bridged)

            # Slack settings → env vars (env vars take precedence)
@@ -609,6 +634,8 @@ def load_gateway_config() -> GatewayConfig:
            if isinstance(slack_cfg, dict):
                if "require_mention" in slack_cfg and not os.getenv("SLACK_REQUIRE_MENTION"):
                    os.environ["SLACK_REQUIRE_MENTION"] = str(slack_cfg["require_mention"]).lower()
+                if "strict_mention" in slack_cfg and not os.getenv("SLACK_STRICT_MENTION"):
+                    os.environ["SLACK_STRICT_MENTION"] = str(slack_cfg["strict_mention"]).lower()
                if "allow_bots" in slack_cfg and not os.getenv("SLACK_ALLOW_BOTS"):
                    os.environ["SLACK_ALLOW_BOTS"] = str(slack_cfg["allow_bots"]).lower()
                frc = slack_cfg.get("free_response_channels")
@@ -918,8 +945,20 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
    slack_token = os.getenv("SLACK_BOT_TOKEN")
    if slack_token:
        if Platform.SLACK not in config.platforms:
+            # No yaml config for Slack — env-only setup, enable it
            config.platforms[Platform.SLACK] = PlatformConfig()
-        config.platforms[Platform.SLACK].enabled = True
+            config.platforms[Platform.SLACK].enabled = True
+        else:
+            slack_config = config.platforms[Platform.SLACK]
+            enabled_was_explicit = bool(slack_config.extra.pop("_enabled_explicit", False))
+            if not slack_config.enabled and not enabled_was_explicit:
+                # Top-level Slack settings such as channel prompts should not
+                # turn an env-token setup into a disabled platform. Only an
+                # explicit slack.enabled/platforms.slack.enabled false should.
+                slack_config.enabled = True
+        # An explicit enabled: false in yaml is respected and never overridden
+        # here. The token is still stored so skills that send Slack messages
+        # can use it without activating the gateway adapter.
        config.platforms[Platform.SLACK].token = slack_token
    slack_home = os.getenv("SLACK_HOME_CHANNEL")
    if slack_home and Platform.SLACK in config.platforms:
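The interaction between `SLACK_BOT_TOKEN`, the yaml `enabled` flag, and the `_enabled_explicit` marker is easier to check as a truth table. A small sketch, assuming the token is set; the function name `slack_enabled_after_env` and the use of `None` for "no Slack entry in yaml" are illustrative choices, not part of the diff:

```python
from typing import Optional


def slack_enabled_after_env(yaml_enabled: Optional[bool], enabled_was_explicit: bool) -> bool:
    """Effective Slack enabled flag once SLACK_BOT_TOKEN is present.

    yaml_enabled is None when the config has no Slack entry at all
    (an assumption of this sketch, standing in for the missing dict key).
    """
    if yaml_enabled is None:
        # Env-only setup: the token alone enables the platform.
        return True
    if not yaml_enabled and not enabled_was_explicit:
        # Disabled only by default, e.g. the entry exists because of
        # channel prompts; the token still enables it.
        return True
    # Either already enabled, or explicitly disabled by the user.
    return yaml_enabled


for case in [(None, False), (False, False), (False, True), (True, True)]:
    print(case, "->", slack_enabled_after_env(*case))
```

Only the explicit `enabled: false` case keeps the adapter off.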
@@ -1276,6 +1315,48 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
                name=os.getenv("QQBOT_HOME_CHANNEL_NAME") or os.getenv(qq_home_name_env, "Home"),
            )

+    # Yuanbao: YUANBAO_APP_ID preferred, YUANBAO_APP_KEY accepted as a fallback
+    yuanbao_app_id = os.getenv("YUANBAO_APP_ID") or os.getenv("YUANBAO_APP_KEY")
+    yuanbao_app_secret = os.getenv("YUANBAO_APP_SECRET")
+    if yuanbao_app_id and yuanbao_app_secret:
+        if Platform.YUANBAO not in config.platforms:
+            config.platforms[Platform.YUANBAO] = PlatformConfig()
+        config.platforms[Platform.YUANBAO].enabled = True
+        extra = config.platforms[Platform.YUANBAO].extra
+        extra["app_id"] = yuanbao_app_id
+        extra["app_secret"] = yuanbao_app_secret
+        yuanbao_bot_id = os.getenv("YUANBAO_BOT_ID")
+        if yuanbao_bot_id:
+            extra["bot_id"] = yuanbao_bot_id
+        yuanbao_ws_url = os.getenv("YUANBAO_WS_URL")
+        if yuanbao_ws_url:
+            extra["ws_url"] = yuanbao_ws_url
+        yuanbao_api_domain = os.getenv("YUANBAO_API_DOMAIN")
+        if yuanbao_api_domain:
+            extra["api_domain"] = yuanbao_api_domain
+        yuanbao_route_env = os.getenv("YUANBAO_ROUTE_ENV")
+        if yuanbao_route_env:
+            extra["route_env"] = yuanbao_route_env
+        yuanbao_home = os.getenv("YUANBAO_HOME_CHANNEL")
+        if yuanbao_home:
+            config.platforms[Platform.YUANBAO].home_channel = HomeChannel(
+                platform=Platform.YUANBAO,
+                chat_id=yuanbao_home,
+                name=os.getenv("YUANBAO_HOME_CHANNEL_NAME", "Home"),
+            )
+        yuanbao_dm_policy = os.getenv("YUANBAO_DM_POLICY")
+        if yuanbao_dm_policy:
+            extra["dm_policy"] = yuanbao_dm_policy.strip().lower()
+        yuanbao_dm_allow_from = os.getenv("YUANBAO_DM_ALLOW_FROM")
+        if yuanbao_dm_allow_from:
+            extra["dm_allow_from"] = yuanbao_dm_allow_from
+        yuanbao_group_policy = os.getenv("YUANBAO_GROUP_POLICY")
+        if yuanbao_group_policy:
+            extra["group_policy"] = yuanbao_group_policy.strip().lower()
+        yuanbao_group_allow_from = os.getenv("YUANBAO_GROUP_ALLOW_FROM")
+        if yuanbao_group_allow_from:
+            extra["group_allow_from"] = yuanbao_group_allow_from
+
    # Session settings
    idle_minutes = os.getenv("SESSION_IDLE_MINUTES")
    if idle_minutes:
@@ -79,7 +79,9 @@ _PLATFORM_DEFAULTS: dict[str, dict[str, Any]] = {
    "discord":     _TIER_HIGH,

    # Tier 2 — edit support, often customer/workspace channels
-    "slack":           _TIER_MEDIUM,
+    # Slack: tool_progress is off by default. Bolt posts cannot be edited like CLI
+    # output, so "new"/"all" leave permanent lines in channels (hermes-agent#14663).
+    "slack":           {**_TIER_MEDIUM, "tool_progress": "off"},
    "mattermost":      _TIER_MEDIUM,
    "matrix":          _TIER_MEDIUM,
    "feishu":          _TIER_MEDIUM,
@@ -28,6 +28,7 @@ def mirror_to_session(
    message_text: str,
    source_label: str = "cli",
    thread_id: Optional[str] = None,
+    user_id: Optional[str] = None,
 ) -> bool:
    """
    Append a delivery-mirror message to the target session's transcript.
@@ -39,9 +40,20 @@ def mirror_to_session(
    All errors are caught -- this is never fatal.
    """
    try:
-        session_id = _find_session_id(platform, str(chat_id), thread_id=thread_id)
+        session_id = _find_session_id(
+            platform,
+            str(chat_id),
+            thread_id=thread_id,
+            user_id=user_id,
+        )
        if not session_id:
-            logger.debug("Mirror: no session found for %s:%s:%s", platform, chat_id, thread_id)
+            logger.debug(
+                "Mirror: no session found for %s:%s:%s:%s",
+                platform,
+                chat_id,
+                thread_id,
+                user_id,
+            )
            return False

        mirror_msg = {
@@ -59,17 +71,33 @@ def mirror_to_session(
        return True

    except Exception as e:
-        logger.debug("Mirror failed for %s:%s:%s: %s", platform, chat_id, thread_id, e)
+        logger.debug(
+            "Mirror failed for %s:%s:%s:%s: %s",
+            platform,
+            chat_id,
+            thread_id,
+            user_id,
+            e,
+        )
        return False


-def _find_session_id(platform: str, chat_id: str, thread_id: Optional[str] = None) -> Optional[str]:
+def _find_session_id(
+    platform: str,
+    chat_id: str,
+    thread_id: Optional[str] = None,
+    user_id: Optional[str] = None,
+) -> Optional[str]:
    """
    Find the active session_id for a platform + chat_id pair.

    Scans sessions.json entries and matches where origin.chat_id == chat_id
    on the right platform.  DM session keys don't embed the chat_id
    (e.g. "agent:main:telegram:dm"), so we check the origin dict.
+
+    When *user_id* is provided, prefer exact sender matches. If multiple
+    same-chat candidates exist and none matches the user, return None instead
+    of guessing and contaminating another participant's session.
    """
    if not _SESSIONS_INDEX.exists():
        return None
@@ -81,8 +109,7 @@ def _find_session_id(platform: str, chat_id: str, thread_id: Optional[str] = Non
        return None

    platform_lower = platform.lower()
-    best_match = None
-    best_updated = ""
+    candidates = []

    for _key, entry in data.items():
        origin = entry.get("origin") or {}
@@ -96,12 +123,31 @@ def _find_session_id(platform: str, chat_id: str, thread_id: Optional[str] = Non
            origin_thread_id = origin.get("thread_id")
            if thread_id is not None and str(origin_thread_id or "") != str(thread_id):
                continue
-            updated = entry.get("updated_at", "")
-            if updated > best_updated:
-                best_updated = updated
-                best_match = entry.get("session_id")
+            candidates.append(entry)

-    return best_match
+    if not candidates:
+        return None
+
+    if user_id:
+        exact_user_matches = [
+            entry for entry in candidates
+            if str((entry.get("origin") or {}).get("user_id") or "") == str(user_id)
+        ]
+        if exact_user_matches:
+            candidates = exact_user_matches
+        elif len(candidates) > 1:
+            return None
+    elif len(candidates) > 1:
+        distinct_user_ids = {
+            str((entry.get("origin") or {}).get("user_id") or "").strip()
+            for entry in candidates
+            if str((entry.get("origin") or {}).get("user_id") or "").strip()
+        }
+        if len(distinct_user_ids) > 1:
+            return None
+
+    best_entry = max(candidates, key=lambda entry: entry.get("updated_at", ""))
+    return best_entry.get("session_id")


 def _append_to_jsonl(session_id: str, message: dict) -> None:
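The candidate-selection rules at the end of `_find_session_id` can be tried on in-memory entries. This is a sketch under the assumption that filtering by platform, chat and thread has already happened; `pick_session` and the sample entries are hypothetical:

```python
from typing import Any, Dict, List, Optional


def _origin_user(entry: Dict[str, Any]) -> str:
    return str((entry.get("origin") or {}).get("user_id") or "")


def pick_session(candidates: List[Dict[str, Any]], user_id: Optional[str] = None) -> Optional[str]:
    """Choose a session id without guessing between different participants."""
    if not candidates:
        return None
    if user_id:
        exact = [e for e in candidates if _origin_user(e) == str(user_id)]
        if exact:
            candidates = exact
        elif len(candidates) > 1:
            return None  # several sessions, none belongs to this user
    elif len(candidates) > 1:
        distinct = {_origin_user(e).strip() for e in candidates if _origin_user(e).strip()}
        if len(distinct) > 1:
            return None  # ambiguous: sessions from different users
    best = max(candidates, key=lambda e: e.get("updated_at", ""))
    return best.get("session_id")


alice = {"session_id": "s-alice", "updated_at": "2024-01-02T00:00:00", "origin": {"user_id": "alice"}}
bob = {"session_id": "s-bob", "updated_at": "2024-01-03T00:00:00", "origin": {"user_id": "bob"}}

print(pick_session([alice, bob], "alice"))
print(pick_session([alice, bob]))
```

With a single candidate the function still returns it even when `user_id` does not match, which mirrors the diff: only the multi-candidate case refuses to guess.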
@@ -10,10 +10,12 @@ Each adapter handles:

 from .base import BasePlatformAdapter, MessageEvent, SendResult
 from .qqbot import QQAdapter
+from .yuanbao import YuanbaoAdapter

 __all__ = [
    "BasePlatformAdapter",
    "MessageEvent",
    "SendResult",
    "QQAdapter",
+    "YuanbaoAdapter",
 ]
@@ -336,6 +336,39 @@ def proxy_kwargs_for_aiohttp(proxy_url: str | None) -> tuple[dict, dict]:
    return {}, {"proxy": proxy_url}


+def is_host_excluded_by_no_proxy(hostname: str, no_proxy_value: str | None = None) -> bool:
+    """Return True when ``hostname`` matches a ``NO_PROXY`` entry.
+
+    Supports comma- or whitespace-separated entries. A bare ``*`` matches every
+    host; leading dots and ``*.`` wildcards match the apex domain and subdomains.
+    """
+    raw = no_proxy_value
+    if raw is None:
+        raw = os.environ.get("NO_PROXY") or os.environ.get("no_proxy") or ""
+
+    raw = raw.strip()
+    if not raw:
+        return False
+
+    lower_hostname = hostname.lower()
+    for entry in re.split(r"[\s,]+", raw):
+        normalized = entry.strip().lower()
+        if not normalized:
+            continue
+        if normalized == "*":
+            return True
+
+        if normalized.startswith("*."):
+            normalized = normalized[2:]
+        elif normalized.startswith("."):
+            normalized = normalized[1:]
+
+        if lower_hostname == normalized or lower_hostname.endswith(f".{normalized}"):
+            return True
+
+    return False
+
+
 from dataclasses import dataclass, field
 from datetime import datetime
 from pathlib import Path
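A few concrete inputs make the `NO_PROXY` matching rules easier to verify. The sketch below restates the same rules in a standalone function that takes the value as an argument rather than reading the environment; the name `host_matches_no_proxy` is hypothetical:

```python
import re


def host_matches_no_proxy(hostname: str, no_proxy: str) -> bool:
    """Return True when hostname is covered by a NO_PROXY-style list."""
    host = hostname.lower()
    for entry in re.split(r"[\s,]+", no_proxy.strip()):
        pattern = entry.strip().lower()
        if not pattern:
            continue
        if pattern == "*":
            return True  # bare wildcard disables proxying everywhere
        if pattern.startswith("*."):
            pattern = pattern[2:]
        elif pattern.startswith("."):
            pattern = pattern[1:]
        # Match the apex itself or any subdomain, but not a mere suffix.
        if host == pattern or host.endswith("." + pattern):
            return True
    return False


print(host_matches_no_proxy("api.slack.com", "slack.com"))
print(host_matches_no_proxy("notslack.com", "slack.com"))
```

The dot in the `endswith` check is what keeps `notslack.com` from matching `slack.com`.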
@@ -693,7 +726,15 @@ SUPPORTED_DOCUMENT_TYPES = {
    ".pdf": "application/pdf",
    ".md": "text/markdown",
    ".txt": "text/plain",
+    ".csv": "text/csv",
    ".log": "text/plain",
+    ".json": "application/json",
+    ".xml": "application/xml",
+    ".yaml": "application/yaml",
+    ".yml": "application/yaml",
+    ".toml": "application/toml",
+    ".ini": "text/plain",
+    ".cfg": "text/plain",
    ".zip": "application/zip",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
@@ -982,6 +1023,61 @@ def resolve_channel_prompt(
    return None


+def resolve_channel_skills(
+    config_extra: dict,
+    channel_id: str,
+    parent_id: str | None = None,
+) -> list[str] | None:
+    """Resolve auto-loaded skill(s) for a channel/thread from platform config.
+
+    Looks up ``channel_skill_bindings`` in the adapter's ``config.extra`` dict.
+
+    Config format::
+
+        channel_skill_bindings:
+          - id: "C0123"          # Slack channel ID or Discord channel/forum ID
+            skills: ["skill-a", "skill-b"]
+          - id: "D0ABCDE"
+            skill: "solo-skill"  # single string also accepted
+
+    Matches a binding whose ``id`` equals *channel_id* or *parent_id*
+    (useful for forum threads / Slack threads inheriting the parent channel's
+    binding). When several entries match, the first one in list order wins.
+
+    Returns a deduplicated list of skill names (order preserved), or None if
+    no match is found.
+    """
+    bindings = config_extra.get("channel_skill_bindings") or []
+    if not isinstance(bindings, list) or not bindings:
+        return None
+    ids_to_check: set[str] = set()
+    if channel_id:
+        ids_to_check.add(str(channel_id))
+    if parent_id:
+        ids_to_check.add(str(parent_id))
+    if not ids_to_check:
+        return None
+    for entry in bindings:
+        if not isinstance(entry, dict):
+            continue
+        entry_id = str(entry.get("id", ""))
+        if entry_id in ids_to_check:
+            skills = entry.get("skills") or entry.get("skill")
+            if isinstance(skills, str):
+                s = skills.strip()
+                return [s] if s else None
+            if isinstance(skills, list) and skills:
+                seen: list[str] = []
+                for name in skills:
+                    if not isinstance(name, str):
+                        continue
+                    nm = name.strip()
+                    if nm and nm not in seen:
+                        seen.append(nm)
+                return seen or None
+    return None
+
+
 class BasePlatformAdapter(ABC):
    """
    Base class for platform adapters.
@@ -1025,7 +1121,20 @@ class BasePlatformAdapter(ABC):
        self._post_delivery_callbacks: Dict[str, Any] = {}
        self._expected_cancelled_tasks: set[asyncio.Task] = set()
        self._busy_session_handler: Optional[Callable[[MessageEvent, str], Awaitable[bool]]] = None
-        # Chats where auto-TTS on voice input is disabled (set by /voice off)
+        # Auto-TTS on voice input: ``_auto_tts_default`` is the global default
+        # (``voice.auto_tts`` in config.yaml, pushed by GatewayRunner on connect).
+        # Per-chat overrides live in two sets populated from ``_voice_mode``:
+        #   - ``_auto_tts_enabled_chats``: chat explicitly opted in via ``/voice on``
+        #     or ``/voice tts`` (mode is ``voice_only`` or ``all``). Fires even when
+        #     the global default is False.
+        #   - ``_auto_tts_disabled_chats``: chat explicitly opted out via
+        #     ``/voice off`` (mode is ``off``). Suppresses auto-TTS even when the
+        #     global default is True.
+        # The gate in _process_message() is:
+        #   fire if chat in _auto_tts_enabled_chats
+        #     OR (_auto_tts_default and chat not in _auto_tts_disabled_chats)
+        self._auto_tts_default: bool = False
+        self._auto_tts_enabled_chats: set = set()
        self._auto_tts_disabled_chats: set = set()
        # Chats where typing indicator is paused (e.g. during approval waits).
        # _keep_typing skips send_typing when the chat_id is in this set.
@@ -1047,6 +1156,21 @@ class BasePlatformAdapter(ABC):
    def fatal_error_retryable(self) -> bool:
        return self._fatal_error_retryable

+    def _should_auto_tts_for_chat(self, chat_id: str) -> bool:
+        """Whether auto-TTS on voice input should fire for ``chat_id``.
+
+        Decision layers (Issue #16007):
+          1. Explicit ``/voice on`` or ``/voice tts`` → always fire (even if
+             ``voice.auto_tts`` is False).
+          2. Explicit ``/voice off`` → never fire.
+          3. Fall back to the global ``voice.auto_tts`` config default.
+        """
+        if chat_id in self._auto_tts_enabled_chats:
+            return True
+        if chat_id in self._auto_tts_disabled_chats:
+            return False
+        return bool(self._auto_tts_default)
+
    def set_fatal_error_handler(self, handler: Callable[["BasePlatformAdapter"], Awaitable[None] | None]) -> None:
        self._fatal_error_handler = handler

@@ -1230,6 +1354,27 @@ class BasePlatformAdapter(ABC):
        """
        return SendResult(success=False, error="Not supported")

+    async def delete_message(
+        self,
+        chat_id: str,
+        message_id: str,
+    ) -> bool:
+        """
+        Delete a previously sent message.  Optional — platforms that don't
+        support deletion return ``False`` and callers fall back to leaving
+        the message in place.
+
+        Used by the stream consumer's fresh-final cleanup path (see
+        openclaw/openclaw#72038) to remove long-lived preview messages
+        after sending the completed reply as a fresh message so the
+        platform's visible timestamp reflects completion time.
+
+        Returns ``True`` on successful deletion, ``False`` otherwise.
+        Subclasses should override for platforms with a deletion API
+        (e.g. Telegram ``deleteMessage``).
+        """
+        return False
+
    async def send_typing(self, chat_id: str, metadata=None) -> None:
        """
        Send a typing indicator.
@@ -1557,13 +1702,41 @@ class BasePlatformAdapter(ABC):
        the agent is waiting for dangerous-command approval).  This is critical
        for Slack's Assistant API where ``assistant_threads_setStatus`` disables
        the compose box — pausing lets the user type ``/approve`` or ``/deny``.
+
+        Each ``send_typing`` call is bounded by a ~1.5s timeout so a slow
+        network round-trip can't stall the refresh cadence.  Telegram- and
+        Discord-side typing expire after ~5s; if any individual send_typing
+        takes longer than the refresh interval, the bubble would die and
+        stay dead until that call returns.  Abandoning the slow call lets
+        the next tick fire a fresh send_typing on schedule — as long as
+        one of them succeeds within the 5s platform-side window, the bubble
+        stays visible across provider stalls / upstream API timeouts.
        """
+        # Bound each send_typing round-trip so the refresh cadence isn't
+        # gated on network health.  Must stay below ``interval`` so a slow
+        # call gets abandoned before the next scheduled tick.
+        _send_typing_timeout = max(0.25, min(1.5, interval - 0.25))
        try:
            while True:
                if stop_event is not None and stop_event.is_set():
                    return
                if chat_id not in self._typing_paused:
-                    await self.send_typing(chat_id, metadata=metadata)
+                    try:
+                        await asyncio.wait_for(
+                            self.send_typing(chat_id, metadata=metadata),
+                            timeout=_send_typing_timeout,
+                        )
+                    except asyncio.TimeoutError:
+                        # Slow network — abandon this tick, keep the loop
+                        # on schedule so the next send_typing fires fresh.
+                        pass
+                    except asyncio.CancelledError:
+                        raise
+                    except Exception as typing_err:
+                        logger.debug(
+                            "[%s] send_typing error (non-fatal): %s",
+                            self.name, typing_err,
+                        )
                if stop_event is None:
                    await asyncio.sleep(interval)
                    continue
@@ -2214,12 +2387,14 @@ class BasePlatformAdapter(ABC):
                    logger.info("[%s] extract_local_files found %d file(s) in response", self.name, len(local_files))
                
                # Auto-TTS: if voice message, generate audio FIRST (before sending text)
-                # Skipped when the chat has voice mode disabled (/voice off)
+                # Gated via ``_should_auto_tts_for_chat``: fires when the chat has
+                # an explicit ``/voice on|tts`` opt-in OR when ``voice.auto_tts`` is
+                # True globally and no ``/voice off`` has been issued.
                _tts_path = None
-                if (event.message_type == MessageType.VOICE
+                if (self._should_auto_tts_for_chat(event.source.chat_id)
+                        and event.message_type == MessageType.VOICE
                        and text_content
-                        and not media_files
-                        and event.source.chat_id not in self._auto_tts_disabled_chats):
+                        and not media_files):
                    try:
                        from tools.tts_tool import text_to_speech_tool, check_tts_requirements
                        if check_tts_requirements():
@@ -2315,11 +2315,6 @@ class DiscordAdapter(BasePlatformAdapter):
        async def slash_background(interaction: discord.Interaction, prompt: str):
            await self._run_simple_slash(interaction, f"/background {prompt}", "Background task started~")

-        @tree.command(name="btw", description="Ephemeral side question using session context")
-        @discord.app_commands.describe(question="Your side question (no tools, not persisted)")
-        async def slash_btw(interaction: discord.Interaction, question: str):
-            await self._run_simple_slash(interaction, f"/btw {question}")
-
        # ── Auto-register any gateway-available commands not yet on the tree ──
        # This ensures new commands added to COMMAND_REGISTRY in
        # hermes_cli/commands.py automatically appear as Discord slash
@@ -2684,21 +2679,8 @@ class DiscordAdapter(BasePlatformAdapter):
                skills: ["skill-a", "skill-b"]
        Also checks parent_id so forum threads inherit the forum's bindings.
        """
-        bindings = self.config.extra.get("channel_skill_bindings", [])
-        if not bindings:
-            return None
-        ids_to_check = {channel_id}
-        if parent_id:
-            ids_to_check.add(parent_id)
-        for entry in bindings:
-            entry_id = str(entry.get("id", ""))
-            if entry_id in ids_to_check:
-                skills = entry.get("skills") or entry.get("skill")
-                if isinstance(skills, str):
-                    return [skills]
-                if isinstance(skills, list) and skills:
-                    return list(dict.fromkeys(skills))  # dedup, preserve order
-        return None
+        from gateway.platforms.base import resolve_channel_skills
+        return resolve_channel_skills(self.config.extra, channel_id, parent_id)

    def _resolve_channel_prompt(self, channel_id: str, parent_id: str | None = None) -> str | None:
        """Resolve a Discord per-channel prompt, preferring the exact channel over its parent."""
@@ -3312,6 +3294,7 @@ class DiscordAdapter(BasePlatformAdapter):
        chat_topic = self._get_effective_topic(message.channel, is_thread=is_thread)

        # Build source
+        guild = getattr(message, "guild", None)
        source = self.build_source(
            chat_id=str(effective_channel.id),
            chat_name=chat_name,
@@ -3321,7 +3304,7 @@ class DiscordAdapter(BasePlatformAdapter):
            thread_id=thread_id,
            chat_topic=chat_topic,
            is_bot=getattr(message.author, "bot", False),
-            guild_id=str(message.guild.id) if message.guild else None,
+            guild_id=str(guild.id) if guild else None,
            parent_chat_id=parent_channel_id,
            message_id=str(message.id),
        )
@@ -28,6 +28,7 @@ from email.header import decode_header
 from email.mime.multipart import MIMEMultipart
 from email.mime.text import MIMEText
 from email.mime.base import MIMEBase
+from email.utils import formatdate
 from email import encoders
 from pathlib import Path
 from typing import Any, Dict, List, Optional
@@ -504,6 +505,7 @@ class EmailAdapter(BasePlatformAdapter):
            msg["In-Reply-To"] = original_msg_id
            msg["References"] = original_msg_id

+        msg["Date"] = formatdate(localtime=True)
        msg_id = f"<hermes-{uuid.uuid4().hex[:12]}@{self._address.split('@')[1]}>"
        msg["Message-ID"] = msg_id

@@ -586,6 +588,7 @@ class EmailAdapter(BasePlatformAdapter):
            msg["In-Reply-To"] = original_msg_id
            msg["References"] = original_msg_id

+        msg["Date"] = formatdate(localtime=True)
        msg_id = f"<hermes-{uuid.uuid4().hex[:12]}@{self._address.split('@')[1]}>"
        msg["Message-ID"] = msg_id

@@ -57,6 +57,15 @@ class MessageDeduplicator:
        if len(self._seen) > self._max_size:
            cutoff = now - self._ttl
            self._seen = {k: v for k, v in self._seen.items() if v > cutoff}
+            if len(self._seen) > self._max_size:
+                # TTL pruning alone does not cap the cache when every entry is
+                # still fresh. Keep the newest entries so the helper's
+                # max_size bound is enforced under sustained traffic.
+                newest = sorted(
+                    self._seen.items(),
+                    key=lambda item: item[1],
+                )[-self._max_size:]
+                self._seen = dict(newest)
        return False

    def clear(self):
@@ -1178,13 +1178,83 @@ class MatrixAdapter(BasePlatformAdapter):
    # Event callbacks
    # ------------------------------------------------------------------

+    def _is_self_sender(self, sender: str) -> bool:
+        """Return True if the sender refers to the bot's own account.
+
+        Matrix user IDs are byte-compared after trimming whitespace and
+        lowercasing — some homeservers normalize the localpart case
+        differently at different API surfaces, and the reply-loop tail
+        of the "hall of mirrors" bug (#15763) has been observed with the
+        bot's own account bypassing a case-sensitive equality check.
+
+        When ``self._user_id`` is empty (whoami hasn't resolved yet, or
+        login failed), we cannot prove a sender is NOT us, so we return
+        True defensively — an unidentified bot dropping its own events
+        is always preferable to falling into an echo loop.
+        """
+        own = (self._user_id or "").strip().lower()
+        if not own:
+            return True
+        return sender.strip().lower() == own
+
+    @staticmethod
+    def _is_system_or_bridge_sender(sender: str) -> bool:
+        """Return True if the sender looks like a system / bridge / appservice
+        identity rather than a real user.
+
+        Appservice namespaces on Matrix conventionally prefix bot / puppet
+        user IDs with an underscore (e.g. ``@_telegram_12345:server``,
+        ``@_discord_999:server``, ``@_slack_...:server``).  Server-notices
+        bots and bridge-controller bots on many homeservers use the same
+        pattern.
+
+        We treat these as system identities for pairing purposes: they
+        should never be offered a pairing code, because an operator
+        approving the code would hand the bridge itself permanent
+        authorization — and every outbound message relayed by the bridge
+        would then loop back into the agent as an "authorized user
+        message", which is the root of issue #15763.
+
+        Matches:
+            ``@_something:server``   — appservice namespace convention
+            ``@:server``             — malformed / empty localpart
+            ``:server``              — malformed, no leading ``@``
+        """
+        s = (sender or "").strip()
+        if not s:
+            return True
+        # Localpart is everything between leading '@' and ':'
+        if s.startswith("@"):
+            s = s[1:]
+        if ":" in s:
+            localpart, _, _ = s.partition(":")
+        else:
+            localpart = s
+        if not localpart:
+            return True
+        return localpart.startswith("_")
+
    async def _on_room_message(self, event: Any) -> None:
        """Handle incoming room message events (text, media)."""
        room_id = str(getattr(event, "room_id", ""))
        sender = str(getattr(event, "sender", ""))

-        # Ignore own messages.
-        if sender == self._user_id:
+        # Ignore own messages (case-insensitive; also drops when our own
+        # user_id hasn't been resolved yet — see _is_self_sender docstring
+        # and issue #15763).
+        if self._is_self_sender(sender):
+            return
+
+        # Ignore appservice / bridge / system identities so they never
+        # trigger the pairing flow.  Once a bridge user is paired, every
+        # outbound message it relays would loop back as an authorized
+        # user message (the "hall of mirrors" in #15763).
+        if self._is_system_or_bridge_sender(sender):
+            logger.debug(
+                "Matrix: ignoring system/bridge sender %s in %s",
+                sender,
+                room_id,
+            )
            return

        # Deduplicate by event ID.
@@ -1654,7 +1724,7 @@ class MatrixAdapter(BasePlatformAdapter):
    async def _on_reaction(self, event: Any) -> None:
        """Handle incoming reaction events."""
        sender = str(getattr(event, "sender", ""))
-        if sender == self._user_id:
+        if self._is_self_sender(sender):
            return
        event_id = str(getattr(event, "event_id", ""))
        if self._is_duplicate_event(event_id):
@@ -1209,6 +1209,31 @@ class TelegramAdapter(BasePlatformAdapter):
            )
            return SendResult(success=False, error=str(e))

+    async def delete_message(self, chat_id: str, message_id: str) -> bool:
+        """Delete a previously sent Telegram message.
+
+        Used by the stream consumer's fresh-final cleanup path (ported
+        from openclaw/openclaw#72038) to remove long-lived preview
+        messages after sending the completed reply as a fresh message.
+        Telegram's Bot API ``deleteMessage`` works for bot-posted
+        messages in the last 48 hours.  Failures are non-fatal — the
+        caller leaves the preview in place and logs at debug level.
+        """
+        if not self._bot:
+            return False
+        try:
+            await self._bot.delete_message(
+                chat_id=int(chat_id),
+                message_id=int(message_id),
+            )
+            return True
+        except Exception as e:
+            logger.debug(
+                "[%s] Failed to delete Telegram message %s: %s",
+                self.name, message_id, e,
+            )
+            return False
+
    async def send_update_prompt(
        self, chat_id: str, prompt: str, default: str = "",
        session_key: str = "",
@@ -2328,6 +2353,26 @@ class TelegramAdapter(BasePlatformAdapter):
                    user = getattr(entity, "user", None)
                    if user and getattr(user, "id", None) == bot_id:
                        return True
+                elif entity_type == "bot_command" and expected:
+                    # Telegram's official group-disambiguation form for slash
+                    # commands (``/cmd@botname``) is emitted as a single
+                    # ``bot_command`` entity covering the whole span — there
+                    # is no accompanying ``mention`` entity. Treat it as a
+                    # direct address to this bot when the ``@botname`` suffix
+                    # matches. This is the form Telegram's own command menu
+                    # autocomplete produces in groups, so dropping it at the
+                    # mention gate would break /new, /reset, /help, ... for
+                    # every group that has ``require_mention`` enabled (#15415).
+                    offset = int(getattr(entity, "offset", -1))
+                    length = int(getattr(entity, "length", 0))
+                    if offset < 0 or length <= 0:
+                        continue
+                    command_text = source_text[offset:offset + length]
+                    at_index = command_text.find("@")
+                    if at_index < 0:
+                        continue
+                    if command_text[at_index:].strip().lower() == expected:
+                        return True
        return False

    def _message_matches_mention_patterns(self, message: Message) -> bool:
@@ -0,0 +1,647 @@
+"""
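+yuanbao_media.py: media handling module for the Yuanbao platform
+
+Provides COS upload, file download, and TIM media message construction.
+Ported from the TypeScript media.ts (yuanbao-openclaw-plugin),
+using httpx instead of cos-nodejs-sdk-v5 to avoid an extra SDK dependency.
+
+COS upload flow:
+  1. Call genUploadInfo to obtain temporary credentials (tmpSecretId/tmpSecretKey/sessionToken)
+  2. Build the Authorization header with an HMAC-SHA1 signature from those credentials
+  3. HTTP PUT the file to COS
+
+TIM message body construction: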
+  - buildImageMsgBody() → TIMImageElem
+  - buildFileMsgBody()  → TIMFileElem
+"""
+
+from __future__ import annotations
+
+import hashlib
+import hmac
+import logging
+import os
+import re
+import secrets
+import struct
+import time
+import urllib.parse
+from datetime import datetime, timezone, timedelta
+from typing import Optional, Any
+
+import httpx
+
+logger = logging.getLogger(__name__)
+
+# ============ Constants ============
+
+UPLOAD_INFO_PATH = "/api/resource/genUploadInfo"
+DEFAULT_API_DOMAIN = "yuanbao.tencent.com"
+DEFAULT_MAX_SIZE_MB = 50
+
+# COS accelerate domain suffix (prefer global acceleration)
+COS_USE_ACCELERATE = True
+
+# ============ Type mappings ============
+
+# MIME → image_format number (TIM protocol field)
+_MIME_TO_IMAGE_FORMAT: dict[str, int] = {
+    "image/jpeg": 1,
+    "image/jpg": 1,
+    "image/gif": 2,
+    "image/png": 3,
+    "image/bmp": 4,
+    "image/webp": 255,
+    "image/heic": 255,
+    "image/tiff": 255,
+}
+
+# File extension → MIME
+_EXT_TO_MIME: dict[str, str] = {
+    ".jpg": "image/jpeg",
+    ".jpeg": "image/jpeg",
+    ".png": "image/png",
+    ".gif": "image/gif",
+    ".webp": "image/webp",
+    ".bmp": "image/bmp",
+    ".heic": "image/heic",
+    ".tiff": "image/tiff",
+    ".ico": "image/x-icon",
+    ".pdf": "application/pdf",
+    ".doc": "application/msword",
+    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
+    ".xls": "application/vnd.ms-excel",
+    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
+    ".ppt": "application/vnd.ms-powerpoint",
+    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
+    ".txt": "text/plain",
+    ".zip": "application/zip",
+    ".tar": "application/x-tar",
+    ".gz": "application/gzip",
+    ".mp3": "audio/mpeg",
+    ".mp4": "video/mp4",
+    ".wav": "audio/wav",
+    ".ogg": "audio/ogg",
+    ".webm": "video/webm",
+}
+
+
+# ============ Utility functions ============
+
+def guess_mime_type(filename: str) -> str:
+    """Guess the MIME type from the file extension."""
+    ext = os.path.splitext(filename)[-1].lower()
+    return _EXT_TO_MIME.get(ext, "application/octet-stream")
+
+
+def is_image(filename: str, mime_type: str = "") -> bool:
+    """Return True if the file is an image type."""
+    if mime_type.startswith("image/"):
+        return True
+    ext = os.path.splitext(filename)[-1].lower()
+    return ext in {".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp", ".heic", ".tiff", ".ico"}
+
+
+def get_image_format(mime_type: str) -> int:
+    """Return the TIM image format number."""
+    return _MIME_TO_IMAGE_FORMAT.get(mime_type.lower(), 255)
+
+
+def md5_hex(data: bytes) -> str:
+    """Compute the MD5 hex digest."""
+    return hashlib.md5(data).hexdigest()
+
+
+def generate_file_id() -> str:
+    """Generate a random file ID (32 hex characters)."""
+    return secrets.token_hex(16)
+
+
+
+# ============ Image size parsing (pure Python, no Pillow) ============
+
+def parse_image_size(data: bytes) -> Optional[dict[str, int]]:
+    """
+    Parse image width and height (JPEG/PNG/GIF/WebP) without third-party dependencies.
+    Returns {"width": w, "height": h}, or None if the format is not recognized.
+    """
+    return (
+        _parse_png_size(data)
+        or _parse_jpeg_size(data)
+        or _parse_gif_size(data)
+        or _parse_webp_size(data)
+    )
+
+
+def _parse_png_size(buf: bytes) -> Optional[dict[str, int]]:
+    if len(buf) < 24:
+        return None
+    if buf[:4] != b"\x89PNG":
+        return None
+    w = struct.unpack(">I", buf[16:20])[0]
+    h = struct.unpack(">I", buf[20:24])[0]
+    return {"width": w, "height": h}
+
+
+def _parse_jpeg_size(buf: bytes) -> Optional[dict[str, int]]:
+    if len(buf) < 4 or buf[0] != 0xFF or buf[1] != 0xD8:
+        return None
+    i = 2
+    while i < len(buf) - 9:
+        if buf[i] != 0xFF:
+            i += 1
+            continue
+        marker = buf[i + 1]
+        if marker in (0xC0, 0xC2):
+            h = struct.unpack(">H", buf[i + 5: i + 7])[0]
+            w = struct.unpack(">H", buf[i + 7: i + 9])[0]
+            return {"width": w, "height": h}
+        if i + 3 < len(buf):
+            i += 2 + struct.unpack(">H", buf[i + 2: i + 4])[0]
+        else:
+            break
+    return None
+
+
+def _parse_gif_size(buf: bytes) -> Optional[dict[str, int]]:
+    if len(buf) < 10:
+        return None
+    sig = buf[:6].decode("ascii", errors="replace")
+    if sig not in ("GIF87a", "GIF89a"):
+        return None
+    w = struct.unpack("<H", buf[6:8])[0]
+    h = struct.unpack("<H", buf[8:10])[0]
+    return {"width": w, "height": h}
+
+
+def _parse_webp_size(buf: bytes) -> Optional[dict[str, int]]:
+    if len(buf) < 16:
+        return None
+    if buf[:4] != b"RIFF" or buf[8:12] != b"WEBP":
+        return None
+    chunk = buf[12:16].decode("ascii", errors="replace")
+    if chunk == "VP8 ":
+        if len(buf) >= 30 and buf[23] == 0x9D and buf[24] == 0x01 and buf[25] == 0x2A:
+            w = struct.unpack("<H", buf[26:28])[0] & 0x3FFF
+            h = struct.unpack("<H", buf[28:30])[0] & 0x3FFF
+            return {"width": w, "height": h}
+    elif chunk == "VP8L":
+        if len(buf) >= 25 and buf[20] == 0x2F:
+            bits = struct.unpack("<I", buf[21:25])[0]
+            w = (bits & 0x3FFF) + 1
+            h = ((bits >> 14) & 0x3FFF) + 1
+            return {"width": w, "height": h}
+    elif chunk == "VP8X":
+        if len(buf) >= 30:
+            w = (buf[24] | (buf[25] << 8) | (buf[26] << 16)) + 1
+            h = (buf[27] | (buf[28] << 8) | (buf[29] << 16)) + 1
+            return {"width": w, "height": h}
+    return None
+
+
+# ============ URL download ============
+
+async def download_url(
+    url: str,
+    max_size_mb: int = DEFAULT_MAX_SIZE_MB,
+) -> tuple[bytes, str]:
+    """
+    Download the content at a URL and return (bytes, content_type).
+
+    Args:
+        url:          HTTP(S) URL
+        max_size_mb:  maximum allowed size in MB; larger content raises an error
+
+    Returns:
+        (data_bytes, content_type_string)
+
+    Raises:
+        ValueError:  content exceeds the size limit
+        httpx.HTTPError: network/HTTP error
+    """
+    max_bytes = max_size_mb * 1024 * 1024
+    async with httpx.AsyncClient(timeout=30.0, follow_redirects=True) as client:
+        # Check the size with a HEAD request first
+        try:
+            head = await client.head(url)
+            content_length = int(head.headers.get("content-length", 0) or 0)
+            if content_length > 0 and content_length > max_bytes:
+                raise ValueError(
+                    f"文件过大: {content_length / 1024 / 1024:.1f} MB > {max_size_mb} MB"
+                )
+        except httpx.HTTPError:
+            pass  # Some servers do not support HEAD; ignore and fall through to GET
+
+        # Download with GET (streamed, so the size limit is enforced while reading)
+        async with client.stream("GET", url) as resp:
+            resp.raise_for_status()
+
+            content_type = resp.headers.get("content-type", "").split(";")[0].strip()
+
+            chunks: list[bytes] = []
+            downloaded = 0
+            async for chunk in resp.aiter_bytes(65536):
+                downloaded += len(chunk)
+                if downloaded > max_bytes:
+                    raise ValueError(
+                        f"文件过大: 已超过 {max_size_mb} MB 限制"
+                    )
+                chunks.append(chunk)
+
+        data = b"".join(chunks)
+        return data, content_type
+
+
+# ============ COS authentication (HMAC-SHA1) ============
+
+def _cos_sign(
+    method: str,
+    path: str,
+    params: dict[str, str],
+    headers: dict[str, str],
+    secret_id: str,
+    secret_key: str,
+    start_time: Optional[int] = None,
+    expire_seconds: int = 3600,
+) -> str:
+    """
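+    Build a COS request signature (q-sign-algorithm=sha1 scheme).
+    Reference: https://cloud.tencent.com/document/product/436/7778
+
+    Args:
+        method:         HTTP method (lowercase, e.g. "put")
+        path:           URL path (URL-encoded, lowercase)
+        params:         dict of URL query parameters (included in the signature)
+        headers:        dict of request headers to sign (keys must be lowercase)
+        secret_id:      temporary SecretId (tmpSecretId)
+        secret_key:     temporary SecretKey (tmpSecretKey)
+        start_time:     Unix timestamp at which the signature becomes valid (default: now)
+        expire_seconds: signature lifetime in seconds (default: 3600)
+
+    Returns:
+        The Authorization header value (full string)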
+    """
+    now = int(time.time())
+    q_sign_time = f"{start_time or now};{(start_time or now) + expire_seconds}"
+
+    # Step 1: SignKey = HMAC-SHA1(SecretKey, q-sign-time)
+    sign_key = hmac.new(
+        secret_key.encode("utf-8"),
+        q_sign_time.encode("utf-8"),
+        hashlib.sha1,
+    ).hexdigest()
+
+    # Step 2: HttpString
+    # Parameters and headers must be sorted lexicographically, with lowercase keys
+    sorted_params = sorted((k.lower(), urllib.parse.quote(str(v), safe="")) for k, v in params.items())
+    sorted_headers = sorted((k.lower(), urllib.parse.quote(str(v), safe="")) for k, v in headers.items())
+
+    url_param_list = ";".join(k for k, _ in sorted_params)
+    url_params = "&".join(f"{k}={v}" for k, v in sorted_params)
+    header_list = ";".join(k for k, _ in sorted_headers)
+    header_str = "&".join(f"{k}={v}" for k, v in sorted_headers)
+
+    http_string = "\n".join([
+        method.lower(),
+        path,
+        url_params,
+        header_str,
+        "",
+    ])
+
+    # Step 3: StringToSign = sha1 hash of HttpString
+    sha1_of_http = hashlib.sha1(http_string.encode("utf-8")).hexdigest()
+    string_to_sign = "\n".join([
+        "sha1",
+        q_sign_time,
+        sha1_of_http,
+        "",
+    ])
+
+    # Step 4: Signature = HMAC-SHA1(SignKey, StringToSign)
+    signature = hmac.new(
+        sign_key.encode("utf-8"),
+        string_to_sign.encode("utf-8"),
+        hashlib.sha1,
+    ).hexdigest()
+
+    return (
+        f"q-sign-algorithm=sha1"
+        f"&q-ak={secret_id}"
+        f"&q-sign-time={q_sign_time}"
+        f"&q-key-time={q_sign_time}"
+        f"&q-header-list={header_list}"
+        f"&q-url-param-list={url_param_list}"
+        f"&q-signature={signature}"
+    )
+
+
+# ============ Main public API ============
+
+async def get_cos_credentials(
+    app_key: str,
+    api_domain: str,
+    token: str,
+    filename: str = "file",
+    file_id: Optional[str] = None,
+    bot_id: str = "",
+    route_env: str = "",
+) -> dict:
+    """
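+    Call the genUploadInfo endpoint to obtain temporary COS keys and upload configuration.
+
+    Args:
+        app_key:        application key (X-ID header fallback when bot_id is empty)
+        api_domain:     API domain (e.g. https://bot.yuanbao.tencent.com)
+        token:          currently valid sign token (X-Token header)
+        filename:       name of the file to upload (including extension)
+        file_id:        client-generated unique file ID (auto-generated if omitted)
+        bot_id:         bot account ID (preferred X-ID header value)
+
+    Returns:
+        COS upload configuration dict with the following fields:
+            bucketName         (str)  - COS bucket name
+            region             (str)  - COS region
+            location           (str)  - upload key (object path)
+            encryptTmpSecretId (str)  - temporary SecretId
+            encryptTmpSecretKey(str)  - temporary SecretKey
+            encryptToken       (str)  - SessionToken
+            startTime          (int)  - credential start timestamp (Unix)
+            expiredTime        (int)  - credential expiry timestamp (Unix)
+            resourceUrl        (str)  - public access URL after upload
+            resourceID         (str)  - resource ID (optional)
+
+    Raises:
+        RuntimeError: the endpoint returned a non-zero code or required fields are missing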
+    """
+    if file_id is None:
+        file_id = generate_file_id()
+
+    upload_url = f"{api_domain.rstrip('/')}{UPLOAD_INFO_PATH}"
+
+    headers = {
+        "Content-Type": "application/json",
+        "X-Token": token,
+        "X-ID": bot_id or app_key,
+        "X-Source": "web",
+    }
+    if route_env:
+        headers["X-Route-Env"] = route_env
+    body = {
+        "fileName": filename,
+        "fileId": file_id,
+        "docFrom": "localDoc",
+        "docOpenId": "",
+    }
+
+    async with httpx.AsyncClient(timeout=15.0) as client:
+        resp = await client.post(upload_url, json=body, headers=headers)
+        resp.raise_for_status()
+        result: dict[str, Any] = resp.json()
+
+    code = result.get("code")
+    if code != 0 and code is not None:
+        raise RuntimeError(
+            f"genUploadInfo 失败: code={code}, msg={result.get('msg', '')}"
+        )
+
+    data = result.get("data") or result
+    required_fields = ["bucketName", "location"]
+    missing = [f for f in required_fields if not data.get(f)]
+    if missing:
+        raise RuntimeError(
+            f"genUploadInfo 返回字段不完整: 缺少字段 {missing}"
+        )
+
+    return data
+
+
+async def upload_to_cos(
+    file_bytes: bytes,
+    filename: str,
+    content_type: str,
+    credentials: dict,
+    bucket: str,
+    region: str,
+) -> dict:
+    """
+    Upload a file to COS with an httpx PUT request.
+    Builds the HMAC-SHA1 signature from temporary credentials (tmpSecretId/tmpSecretKey/sessionToken).
+
+    Args:
+        file_bytes:   binary file content
+        filename:     file name (used to help derive the MIME type and UUID)
+        content_type: MIME type (e.g. "image/jpeg")
+        credentials:  dict returned by get_cos_credentials(), containing:
+                        encryptTmpSecretId  → tmpSecretId
+                        encryptTmpSecretKey → tmpSecretKey
+                        encryptToken        → sessionToken
+                        location            → COS key (object path)
+                        resourceUrl         → public URL after upload
+                        startTime           → credential start time (Unix)
+                        expiredTime         → credential expiry time (Unix)
+        bucket:       COS bucket name (e.g. chatbot-1234567890)
+        region:       COS region (e.g. ap-guangzhou)
+
+    Returns:
+        Upload result dict containing:
+            url       (str)           - public COS access URL
+            uuid      (str)           - MD5 of the file content
+            size      (int)           - file size in bytes
+            width     (int, optional) - image width (images only)
+            height    (int, optional) - image height (images only)
+
+    Raises:
+        httpx.HTTPStatusError: COS returned a non-2xx status
+        RuntimeError:          required credentials fields are missing
+    """
+    secret_id: str = credentials.get("encryptTmpSecretId", "")
+    secret_key: str = credentials.get("encryptTmpSecretKey", "")
+    session_token: str = credentials.get("encryptToken", "")
+    cos_key: str = credentials.get("location", "")
+    resource_url: str = credentials.get("resourceUrl", "")
+    start_time: Optional[int] = credentials.get("startTime")
+    expired_time: Optional[int] = credentials.get("expiredTime")
+
+    if not secret_id or not secret_key or not cos_key:
+        raise RuntimeError(
+            f"Incomplete COS credentials: secretId={bool(secret_id)}, "
+            f"secretKey={bool(secret_key)}, location={bool(cos_key)}"
+        )
+
+    # Build the COS upload host (global acceleration domain when COS_USE_ACCELERATE is set)
+    if COS_USE_ACCELERATE:
+        cos_host = f"{bucket}.cos.accelerate.myqcloud.com"
+    else:
+        cos_host = f"{bucket}.cos.{region}.myqcloud.com"
+
+    # URL-encode cos_key (keep "/")
+    encoded_key = urllib.parse.quote(cos_key, safe="/")
+    cos_url = f"https://{cos_host}/{encoded_key.lstrip('/')}"
+
+    # Determine the Content-Type
+    if not content_type or content_type == "application/octet-stream":
+        if is_image(filename):
+            content_type = guess_mime_type(filename)
+        else:
+            content_type = "application/octet-stream"
+
+    # Compute the file MD5 and size
+    file_uuid = md5_hex(file_bytes)
+    file_size = len(file_bytes)
+
+    # Request headers that take part in the signature
+    sign_headers = {
+        "host": cos_host,
+        "content-type": content_type,
+        "x-cos-security-token": session_token,
+    }
+
+    # Compute the signature validity window
+    now = int(time.time())
+    sign_start = start_time if start_time else now
+    sign_expire = (expired_time - now) if expired_time and expired_time > now else 3600
+
+    authorization = _cos_sign(
+        method="put",
+        path=f"/{encoded_key.lstrip('/')}",
+        params={},
+        headers=sign_headers,
+        secret_id=secret_id,
+        secret_key=secret_key,
+        start_time=sign_start,
+        expire_seconds=sign_expire,
+    )
+
+    put_headers = {
+        "Authorization": authorization,
+        "Content-Type": content_type,
+        "x-cos-security-token": session_token,
+    }
+
+    logger.info(
+        "COS PUT: bucket=%s region=%s key=%s size=%d mime=%s",
+        bucket, region, cos_key, file_size, content_type,
+    )
+
+    async with httpx.AsyncClient(timeout=120.0) as client:
+        resp = await client.put(
+            cos_url,
+            content=file_bytes,
+            headers=put_headers,
+        )
+        resp.raise_for_status()
+
+    # Build the result; image dimensions are added for image types only
+    result: dict[str, Any] = {
+        "url": resource_url or cos_url,
+        "uuid": file_uuid,
+        "size": file_size,
+    }
+
+    if content_type.startswith("image/"):
+        size_info = parse_image_size(file_bytes)
+        if size_info:
+            result["width"] = size_info["width"]
+            result["height"] = size_info["height"]
+
+    logger.info(
+        "COS upload succeeded: url=%s size=%d",
+        result["url"], file_size,
+    )
+    return result
+
+
+# ============ TIM media message builders ============
+
+def build_image_msg_body(
+    url: str,
+    uuid: Optional[str] = None,
+    filename: Optional[str] = None,
+    size: int = 0,
+    width: int = 0,
+    height: int = 0,
+    mime_type: str = "",
+) -> list[dict]:
+    """
+    Build a Tencent IM TIMImageElem message body.
+    Reference: https://cloud.tencent.com/document/product/269/2720
+
+    Args:
+        url:       Public image URL (COS resourceUrl)
+        uuid:      File UUID (MD5 or another unique identifier)
+        filename:  File name (fallback when uuid is empty)
+        size:      File size in bytes
+        width:     Image width in pixels
+        height:    Image height in pixels
+        mime_type: MIME type (used to determine image_format)
+
+    Returns:
+        TIMImageElem message body list (ready to use as msg_body)
+    """
+    _uuid = uuid or filename or _basename_from_url(url) or "image"
+    image_format = get_image_format(mime_type) if mime_type else 255
+
+    return [
+        {
+            "msg_type": "TIMImageElem",
+            "msg_content": {
+                "uuid": _uuid,
+                "image_format": image_format,
+                "image_info_array": [
+                    {
+                        "type": 1,       # 1 = original image
+                        "size": size,
+                        "width": width,
+                        "height": height,
+                        "url": url,
+                    }
+                ],
+            },
+        }
+    ]
+
+
+def build_file_msg_body(
+    url: str,
+    filename: str,
+    uuid: Optional[str] = None,
+    size: int = 0,
+) -> list[dict]:
+    """
+    Build a Tencent IM TIMFileElem message body.
+    Reference: https://cloud.tencent.com/document/product/269/2720
+
+    Args:
+        url:      Public file URL (COS resourceUrl)
+        filename: File name (with extension)
+        uuid:     File UUID (MD5 or another unique identifier; defaults to filename)
+        size:     File size in bytes
+
+    Returns:
+        TIMFileElem message body list (ready to use as msg_body)
+    """
+    _uuid = uuid or filename
+
+    return [
+        {
+            "msg_type": "TIMFileElem",
+            "msg_content": {
+                "uuid": _uuid,
+                "file_name": filename,
+                "file_size": size,
+                "url": url,
+            },
+        }
+    ]
+
+
+# ============ Internal helpers ============
+
+def _basename_from_url(url: str) -> str:
+    """Extract the file name from a URL."""
+    try:
+        parsed = urllib.parse.urlparse(url)
+        return os.path.basename(parsed.path)
+    except Exception:
+        return ""
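The URL and validity-window steps in `upload_to_cos` can be sketched in isolation as below. This is a minimal illustration, not the module's code: `build_cos_url` and `signature_window` are hypothetical helper names, the `accelerate` flag stands in for `COS_USE_ACCELERATE`, and the signing itself (`_cos_sign`) is deliberately left out.

```python
import urllib.parse
from typing import Optional, Tuple


def build_cos_url(bucket: str, region: str, cos_key: str, accelerate: bool = False) -> str:
    """Hypothetical helper: mirror the host selection and key encoding from the diff."""
    if accelerate:
        host = f"{bucket}.cos.accelerate.myqcloud.com"
    else:
        host = f"{bucket}.cos.{region}.myqcloud.com"
    # Percent-encode the object key but keep "/" so the path structure survives.
    encoded_key = urllib.parse.quote(cos_key, safe="/")
    return f"https://{host}/{encoded_key.lstrip('/')}"


def signature_window(
    now: int,
    start_time: Optional[int],
    expired_time: Optional[int],
    default_ttl: int = 3600,
) -> Tuple[int, int]:
    """Hypothetical helper: return (sign_start, expire_seconds) as computed in the diff."""
    sign_start = start_time if start_time else now
    if expired_time and expired_time > now:
        expire_seconds = expired_time - now
    else:
        # Missing or already-expired credential window: fall back to the default TTL.
        expire_seconds = default_ttl
    return sign_start, expire_seconds


print(build_cos_url("chatbot-1234567890", "ap-guangzhou", "/img/a b.png"))
print(signature_window(now=1000, start_time=None, expired_time=1600))
```

Keeping `/` in the `safe` set matters because the signed path and the request path must be the same string; encoding the separators would change the object key.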
@@ -0,0 +1,558 @@
+"""
+Yuanbao sticker (TIMFaceElem) support.
+
+Ported from yuanbao-openclaw-plugin/src/sticker/.
+
+TIMFaceElem wire format:
+    {
+        "msg_type": "TIMFaceElem",
+        "msg_content": {
+            "index": 0,          # always 0 per Yuanbao convention
+            "data": "<json>",    # serialised sticker metadata
+        }
+    }
+
+The `data` field carries a JSON string with the sticker's metadata so the
+receiver can look up the correct asset in the emoji pack.
+"""
+
+from __future__ import annotations
+
+import json
+import random
+import re
+import unicodedata
+from typing import Optional
+
+# ---------------------------------------------------------------------------
+# Sticker catalogue – ported from builtin-stickers.json
+# Key   : canonical name (Chinese)
+# Value : {sticker_id, package_id, name, description, width, height, formats}
+# ---------------------------------------------------------------------------
+STICKER_MAP: dict[str, dict] = {
+    "六六六": {
+        "sticker_id": "278", "package_id": "1003", "name": "六六六",
+        "description": "666 厉害 牛 棒 绝了 好强 awesome",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "我想开了": {
+        "sticker_id": "262", "package_id": "1003", "name": "我想开了",
+        "description": "想开 佛系 释怀 顿悟 看淡了 无所谓",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "害羞": {
+        "sticker_id": "130", "package_id": "1003", "name": "害羞",
+        "description": "腼腆 不好意思 脸红 娇羞 羞涩 捂脸",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "比心": {
+        "sticker_id": "252", "package_id": "1003", "name": "比心",
+        "description": "笔芯 爱你 爱心手势 love heart 喜欢你",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "委屈": {
+        "sticker_id": "125", "package_id": "1003", "name": "委屈",
+        "description": "难过 想哭 可怜巴巴 瘪嘴 受伤 被欺负",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "亲亲": {
+        "sticker_id": "146", "package_id": "1003", "name": "亲亲",
+        "description": "么么 mua 亲一下 kiss 飞吻 啵",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "酷": {
+        "sticker_id": "131", "package_id": "1003", "name": "酷",
+        "description": "帅 墨镜 cool 高冷 有型 swagger",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "睡": {
+        "sticker_id": "145", "package_id": "1003", "name": "睡",
+        "description": "睡觉 困 zzZ 打盹 躺平 休眠 sleepy",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "发呆": {
+        "sticker_id": "152", "package_id": "1003", "name": "发呆",
+        "description": "懵 愣住 放空 呆滞 出神 脑子空白",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "可怜": {
+        "sticker_id": "157", "package_id": "1003", "name": "可怜",
+        "description": "卖萌 求饶 委屈巴巴 弱小 拜托 眼巴巴",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "摊手": {
+        "sticker_id": "200", "package_id": "1003", "name": "摊手",
+        "description": "无奈 没办法 耸肩 随便 那咋整 whatever",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "头大": {
+        "sticker_id": "213", "package_id": "1003", "name": "头大",
+        "description": "头疼 烦恼 郁闷 难搞 崩溃 一团乱",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "吓": {
+        "sticker_id": "256", "package_id": "1003", "name": "吓",
+        "description": "害怕 惊恐 震惊 吓一跳 恐怖 怂",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "吐血": {
+        "sticker_id": "203", "package_id": "1003", "name": "吐血",
+        "description": "无语 崩溃 被雷 内伤 一口老血 屮",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "哼": {
+        "sticker_id": "185", "package_id": "1003", "name": "哼",
+        "description": "傲娇 生气 不满 撇嘴 不理 赌气",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "嘿嘿": {
+        "sticker_id": "220", "package_id": "1003", "name": "嘿嘿",
+        "description": "坏笑 猥琐笑 偷笑 憨笑 得意 你懂的",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "头秃": {
+        "sticker_id": "218", "package_id": "1003", "name": "头秃",
+        "description": "程序员 加班 焦虑 没头发 秃了 肝爆",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "暗中观察": {
+        "sticker_id": "221", "package_id": "1003", "name": "暗中观察",
+        "description": "窥屏 潜水 偷偷看 角落 围观 屏住呼吸",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "我酸了": {
+        "sticker_id": "224", "package_id": "1003", "name": "我酸了",
+        "description": "嫉妒 柠檬精 羡慕 吃柠檬 眼红 恰柠檬",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "打call": {
+        "sticker_id": "246", "package_id": "1003", "name": "打call",
+        "description": "应援 加油 支持 喝彩 助威 call",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "庆祝": {
+        "sticker_id": "251", "package_id": "1003", "name": "庆祝",
+        "description": "祝贺 开心 耶 party 胜利 干杯",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "奋斗": {
+        "sticker_id": "151", "package_id": "1003", "name": "奋斗",
+        "description": "努力 加油 拼搏 冲 干劲 卷起来",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "惊讶": {
+        "sticker_id": "143", "package_id": "1003", "name": "惊讶",
+        "description": "震惊 哇 不敢相信 OMG 居然 这么离谱",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "疑问": {
+        "sticker_id": "144", "package_id": "1003", "name": "疑问",
+        "description": "问号 不懂 啥 为什么 啥情况 懵逼问",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "仔细分析": {
+        "sticker_id": "248", "package_id": "1003", "name": "仔细分析",
+        "description": "思考 推敲 认真 研究 琢磨 让我想想",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "撅嘴": {
+        "sticker_id": "184", "package_id": "1003", "name": "撅嘴",
+        "description": "嘟嘴 卖萌 不高兴 撒娇 嘴翘",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "泪奔": {
+        "sticker_id": "199", "package_id": "1003", "name": "泪奔",
+        "description": "大哭 伤心 破防 感动哭 泪流满面 呜呜",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "尊嘟假嘟": {
+        "sticker_id": "276", "package_id": "1003", "name": "尊嘟假嘟",
+        "description": "真的假的 真假 可爱问 你骗我 是不是",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "略略略": {
+        "sticker_id": "113", "package_id": "1003", "name": "略略略",
+        "description": "调皮 吐舌 不服 略 气死你 鬼脸",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "困": {
+        "sticker_id": "180", "package_id": "1003", "name": "困",
+        "description": "想睡 倦 打哈欠 睁不开眼 好困啊 sleepy",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "折磨": {
+        "sticker_id": "181", "package_id": "1003", "name": "折磨",
+        "description": "难受 痛苦 煎熬 蚌埠住了 受不了 要命",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "抠鼻": {
+        "sticker_id": "182", "package_id": "1003", "name": "抠鼻",
+        "description": "不屑 无聊 淡定 无所谓 鄙视 挖鼻",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "鼓掌": {
+        "sticker_id": "183", "package_id": "1003", "name": "鼓掌",
+        "description": "拍手 叫好 赞同 666 喝彩 掌声",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "斜眼笑": {
+        "sticker_id": "204", "package_id": "1003", "name": "斜眼笑",
+        "description": "滑稽 坏笑 doge 意味深长 阴阳怪气 嘿嘿嘿",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "辣眼睛": {
+        "sticker_id": "216", "package_id": "1003", "name": "辣眼睛",
+        "description": "看不下去 cringe 毁三观 太丑了 瞎了",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "哦哟": {
+        "sticker_id": "217", "package_id": "1003", "name": "哦哟",
+        "description": "惊讶 起哄 哇哦 有戏 不简单 哟",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "吃瓜": {
+        "sticker_id": "222", "package_id": "1003", "name": "吃瓜",
+        "description": "围观 看戏 八卦 路人 看热闹 板凳",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "狗头": {
+        "sticker_id": "225", "package_id": "1003", "name": "狗头",
+        "description": "doge 保命 开玩笑 滑稽 反讽 懂的都懂",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "敬礼": {
+        "sticker_id": "227", "package_id": "1003", "name": "敬礼",
+        "description": "salute 尊重 收到 遵命 致敬 报告",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "哦": {
+        "sticker_id": "231", "package_id": "1003", "name": "哦",
+        "description": "知道了 明白 敷衍 嗯 这样啊 收到",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "拿到红包": {
+        "sticker_id": "236", "package_id": "1003", "name": "拿到红包",
+        "description": "红包 谢谢老板 发财 开心 抢到了 欧气",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "牛吖": {
+        "sticker_id": "239", "package_id": "1003", "name": "牛吖",
+        "description": "牛 厉害 强 666 佩服 大佬",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "贴贴": {
+        "sticker_id": "272", "package_id": "1003", "name": "贴贴",
+        "description": "抱抱 亲昵 蹭蹭 亲密 靠靠 撒娇贴",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "爱心": {
+        "sticker_id": "138", "package_id": "1003", "name": "爱心",
+        "description": "心 love 喜欢你 红心 示爱 么么哒",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "晚安": {
+        "sticker_id": "170", "package_id": "1003", "name": "晚安",
+        "description": "好梦 睡了 night 早点休息 安啦 moon",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "太阳": {
+        "sticker_id": "176", "package_id": "1003", "name": "太阳",
+        "description": "晴天 早上好 阳光 morning 好天气 日",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "柠檬": {
+        "sticker_id": "266", "package_id": "1003", "name": "柠檬",
+        "description": "酸 嫉妒 柠檬精 羡慕 我酸 恰柠檬",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "大冤种": {
+        "sticker_id": "267", "package_id": "1003", "name": "大冤种",
+        "description": "倒霉 吃亏 自嘲 好心没好报 背锅 工具人",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "吐了": {
+        "sticker_id": "132", "package_id": "1003", "name": "吐了",
+        "description": "恶心 yue 受不了 嫌弃 想吐 生理不适",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "怒": {
+        "sticker_id": "134", "package_id": "1003", "name": "怒",
+        "description": "生气 愤怒 火大 暴躁 气炸 怼",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "玫瑰": {
+        "sticker_id": "165", "package_id": "1003", "name": "玫瑰",
+        "description": "花 示爱 表白 浪漫 送你花 情人节",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "凋谢": {
+        "sticker_id": "119", "package_id": "1003", "name": "凋谢",
+        "description": "花谢 失恋 难过 枯萎 心碎 凉了",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "点赞": {
+        "sticker_id": "159", "package_id": "1003", "name": "点赞",
+        "description": "赞 认同 好棒 good like 大拇指 顶",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "握手": {
+        "sticker_id": "164", "package_id": "1003", "name": "握手",
+        "description": "合作 你好 商务 hello deal 成交 友好",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "抱拳": {
+        "sticker_id": "163", "package_id": "1003", "name": "抱拳",
+        "description": "谢谢 失敬 江湖 承让 拜托 有礼",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "ok": {
+        "sticker_id": "169", "package_id": "1003", "name": "ok",
+        "description": "好的 收到 没问题 okay 行 可以 懂了",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "拳头": {
+        "sticker_id": "174", "package_id": "1003", "name": "拳头",
+        "description": "加油 干 冲 fight 力量 击拳 硬气",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "鞭炮": {
+        "sticker_id": "191", "package_id": "1003", "name": "鞭炮",
+        "description": "过年 喜庆 爆竹 春节 噼里啪啦 红",
+        "width": 128, "height": 128, "formats": "png",
+    },
+    "烟花": {
+        "sticker_id": "258", "package_id": "1003", "name": "烟花",
+        "description": "庆典 漂亮 新年 嘭 绽放 节日快乐",
+        "width": 128, "height": 128, "formats": "png",
+    },
+}
+
+
+def get_sticker_by_name(name: str) -> Optional[dict]:
+    """
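+    Look up a sticker by name, with fuzzy matching.
+
+    Match priority:
+      1. Exact match on the catalogue key
+      2. The key contains the query, or the query contains the key (substring)
+      3. The description contains the query (synonym search)
+      4. General fuzzy scoring (same algorithm as sticker-search); returns the highest-scoring entry
+
+    Returns the sticker dict, or None when the query is empty or nothing matches.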
+    """
+    # Strip first so whitespace-only input is rejected instead of matching every key
+    query = (name or "").strip()
+    if not query:
+        return None
+
+    if query in STICKER_MAP:
+        return STICKER_MAP[query]
+
+    for key, sticker in STICKER_MAP.items():
+        if query in key or key in query:
+            return sticker
+
+    for sticker in STICKER_MAP.values():
+        desc = sticker.get("description", "")
+        if query in desc:
+            return sticker
+
+    matches = search_stickers(query, limit=1)
+    return matches[0] if matches else None
+
+
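+def get_random_sticker(category: Optional[str] = None) -> dict:
+    """
+    Return a random sticker.
+
+    If category is given, pick among stickers whose name or description contains it
+    (falling back to the whole catalogue when none match); otherwise pick from all.
+    """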
+    if category:
+        candidates = [
+            s for s in STICKER_MAP.values()
+            if category in s.get("description", "") or category in s.get("name", "")
+        ]
+        if candidates:
+            return random.choice(candidates)
+    return random.choice(list(STICKER_MAP.values()))
+
+
+def get_sticker_by_id(sticker_id: str) -> Optional[dict]:
+    """Look up a sticker by exact sticker_id."""
+    if not sticker_id:
+        return None
+    sid = str(sticker_id).strip()
+    for sticker in STICKER_MAP.values():
+        if sticker.get("sticker_id") == sid:
+            return sticker
+    return None
+
+
+# ---------------------------------------------------------------------------
+# Fuzzy search (aligned with chatbot-web yuanbao-openclaw-plugin/sticker-cache.ts.searchStickers)
+# ---------------------------------------------------------------------------
+
+_PUNCT_RE = re.compile(r"[\s\u3000\-_·.,，。!！?？\"“”'‘’、/\\]+")
+
+
+def _normalize_text(raw: str) -> str:
+    return unicodedata.normalize("NFKC", str(raw or "")).strip().lower()
+
+
+def _compact_text(raw: str) -> str:
+    return _PUNCT_RE.sub("", _normalize_text(raw))
+
+
+def _multiset_char_hit_ratio(needle: str, haystack: str) -> float:
+    if not needle:
+        return 0.0
+    bag: dict[str, int] = {}
+    for ch in haystack:
+        bag[ch] = bag.get(ch, 0) + 1
+    hits = 0
+    for ch in needle:
+        n = bag.get(ch, 0)
+        if n > 0:
+            hits += 1
+            bag[ch] = n - 1
+    return hits / len(needle)
+
+
+def _bigram_jaccard(a: str, b: str) -> float:
+    if len(a) < 2 or len(b) < 2:
+        return 0.0
+    A = {a[i:i + 2] for i in range(len(a) - 1)}
+    B = {b[i:i + 2] for i in range(len(b) - 1)}
+    inter = len(A & B)
+    union = len(A) + len(B) - inter
+    return inter / union if union else 0.0
+
+
+def _longest_subsequence_ratio(needle: str, haystack: str) -> float:
+    if not needle:
+        return 0.0
+    j = 0
+    for ch in haystack:
+        if j >= len(needle):
+            break
+        if ch == needle[j]:
+            j += 1
+    return j / len(needle)
+
+
+def _score_field(haystack: str, query: str) -> float:
+    hay = _normalize_text(haystack)
+    q = _normalize_text(query)
+    if not hay or not q:
+        return 0.0
+    hay_c = _compact_text(haystack)
+    q_c = _compact_text(query)
+    best = 0.0
+    if hay == q:
+        best = max(best, 100.0)
+    if q in hay:
+        best = max(best, 92 + min(6, len(q)))
+    if len(q) >= 2 and hay.startswith(q):
+        best = max(best, 88.0)
+    if q_c and q_c in hay_c:
+        best = max(best, 86.0)
+    best = max(best, _multiset_char_hit_ratio(q_c, hay_c) * 62)
+    best = max(best, _bigram_jaccard(q_c, hay_c) * 58)
+    best = max(best, _longest_subsequence_ratio(q_c, hay_c) * 52)
+    if len(q) == 1 and q in hay:
+        best = max(best, 68.0)
+    return best
+
+
+def search_stickers(query: str, limit: int = 10) -> list[dict]:
+    """
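+    Return the top N built-in stickers ranked by fuzzy match.
+
+    Scores combine substring, character-multiset coverage, bigram Jaccard, and subsequence ratio over
+    name/description (description ×0.88) plus sticker_id matches. Empty query: first N in catalogue order.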
+    """
+    safe_limit = max(1, min(500, int(limit) if limit else 10))
+    if not query or not _normalize_text(query):
+        return list(STICKER_MAP.values())[:safe_limit]
+
+    scored: list[tuple[float, dict]] = []
+    for sticker in STICKER_MAP.values():
+        name_s = _score_field(sticker.get("name", ""), query)
+        desc_s = _score_field(sticker.get("description", ""), query) * 0.88
+        sid = str(sticker.get("sticker_id", "")).strip()
+        q_norm = _normalize_text(query)
+        id_s = 0.0
+        if sid and q_norm:
+            sid_norm = _normalize_text(sid)
+            if sid_norm == q_norm:
+                id_s = 100.0
+            elif q_norm in sid_norm:
+                id_s = 84.0
+        scored.append((max(name_s, desc_s, id_s), sticker))
+
+    scored.sort(key=lambda x: x[0], reverse=True)
+    top = scored[0][0] if scored else 0
+    if top <= 0:
+        return [s for _, s in scored[:safe_limit]]
+
+    if top >= 22:
+        floor = 18.0
+    elif top >= 12:
+        floor = max(10.0, top * 0.5)
+    else:
+        floor = max(6.0, top * 0.35)
+
+    filtered = [pair for pair in scored if pair[0] >= floor]
+    out = filtered if filtered else scored
+    return [s for _, s in out[:safe_limit]]
+
+
+def build_face_msg_body(
+    face_index: int,
+    face_type: int = 1,
+    data: Optional[str] = None,
+) -> list:
+    """
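+    Build a TIMFaceElem message body.
+
+    Yuanbao convention:
+      - index is always 0 (the server identifies the sticker from the data field)
+      - data is a JSON string containing sticker_id / package_id and related fields
+
+    Args:
+        face_index: Written to index as-is. Pass 0 for Yuanbao stickers;
+                    a value > 0 is treated as a legacy QQ emoji ID.
+        face_type:  Reserved (kept for compatibility with the old interface; currently unused).
+        data:       Pre-serialised JSON string; when None, only index is sent.
+
+    Returns:
+        A msg_body list following the Yuanbao TIM protocol, e.g.::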
+
+            [{"msg_type": "TIMFaceElem", "msg_content": {"index": 0, "data": "..."}}]
+    """
+    msg_content: dict = {"index": face_index}
+    if data is not None:
+        msg_content["data"] = data
+    return [{"msg_type": "TIMFaceElem", "msg_content": msg_content}]
+
+
+def build_sticker_msg_body(sticker: dict) -> list:
+    """
+    Build a TIMFaceElem message body directly from a STICKER_MAP sticker dict.
+
+    Internal helper for send_sticker(); keeps the data field consistent with the original JS plugin.
+    """
+    data_payload = json.dumps(
+        {
+            "sticker_id": sticker["sticker_id"],
+            "package_id": sticker["package_id"],
+            "width": sticker.get("width", 128),
+            "height": sticker.get("height", 128),
+            "formats": sticker.get("formats", "png"),
+            "name": sticker["name"],
+        },
+        ensure_ascii=False,
+        separators=(",", ":"),
+    )
+    return build_face_msg_body(face_index=0, data=data_payload)
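As a rough illustration of the TIMFaceElem wire format described in the module docstring, the sketch below serialises a sticker into `data` and decodes it again on the receiving side. `encode_sticker` and `decode_sticker` are hypothetical names for this example only; the decoder in particular is an assumption about how a receiver might read the payload, not something the diff contains.

```python
import json
from typing import Optional


def encode_sticker(sticker: dict) -> list:
    """Hypothetical stand-in for build_sticker_msg_body: compact JSON in `data`, index fixed at 0."""
    data = json.dumps(
        {
            "sticker_id": sticker["sticker_id"],
            "package_id": sticker["package_id"],
            "width": sticker.get("width", 128),
            "height": sticker.get("height", 128),
            "formats": sticker.get("formats", "png"),
            "name": sticker["name"],
        },
        ensure_ascii=False,      # keep Chinese names readable instead of \uXXXX escapes
        separators=(",", ":"),   # no spaces, matching the compact output of JSON.stringify
    )
    return [{"msg_type": "TIMFaceElem", "msg_content": {"index": 0, "data": data}}]


def decode_sticker(msg_body: list) -> Optional[dict]:
    """Assumed receiver-side helper: return the sticker metadata from the first TIMFaceElem."""
    for elem in msg_body:
        if elem.get("msg_type") != "TIMFaceElem":
            continue
        raw = elem.get("msg_content", {}).get("data")
        if raw:
            return json.loads(raw)
    return None


body = encode_sticker({"sticker_id": "278", "package_id": "1003", "name": "六六六"})
print(body[0]["msg_content"]["data"])
print(decode_sticker(body))
```

Because `data` is a string rather than a nested object, both sides have to agree on the serialisation; the compact separators keep the output byte-for-byte comparable with a JavaScript sender.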
@@ -310,8 +310,9 @@ def build_session_context_prompt(
            "**Platform notes:** You are running inside Slack. "
            "You do NOT have access to Slack-specific APIs — you cannot search "
            "channel history, pin/unpin messages, manage channels, or list users. "
-            "Do not promise to perform these actions. If the user asks, explain "
-            "that you can only read messages sent directly to you and respond."
+            "Do not promise to perform these actions. The gateway may inline the "
+            "current message's Slack block/attachment payload when available, but "
+            "you still cannot call Slack APIs yourself."
        )
    elif context.source.platform == Platform.DISCORD:
        # Inject the Discord IDs block only when the agent actually has
@@ -353,6 +354,14 @@ def build_session_context_prompt(
            "If the user needs a detailed answer, give the short version first "
            "and offer to elaborate."
        )
+    elif context.source.platform == Platform.YUANBAO:
+        lines.append("")
+        lines.append(
+            "**Platform notes:** You are running inside Yuanbao. "
+            "You CAN send private (DM) messages via the send_message tool. "
+            "Use target='yuanbao:direct:<account_id>' for DM "
+            "and target='yuanbao:group:<group_code>' for group chat."
+        )

    # Connected platforms
    platforms_list = ["local (files on this machine)"]
@@ -44,6 +44,14 @@ class StreamConsumerConfig:
    buffer_threshold: int = 40
    cursor: str = " ▉"
    buffer_only: bool = False
+    # When >0, the final edit for a streamed response is delivered as a
+    # fresh message if the original preview has been visible for at least
+    # this many seconds.  This makes the platform's visible timestamp
+    # reflect completion time instead of first-token time for long-running
+    # responses (e.g. reasoning models that stream slowly).  Ported from
+    # openclaw/openclaw#72038.  Default 0 = always edit in place (legacy
+    # behavior).  The gateway enables this selectively per-platform.
+    fresh_final_after_seconds: float = 0.0


 class GatewayStreamConsumer:
@@ -91,6 +99,12 @@ class GatewayStreamConsumer:
        self._queue: queue.Queue = queue.Queue()
        self._accumulated = ""
        self._message_id: Optional[str] = None
+        # Monotonic timestamp (``time.monotonic()``) taken when ``_message_id``
+        # was first assigned from a successful first-send.  Used by the
+        # fresh-final logic to detect long-lived previews whose edit
+        # timestamps would be stale by completion time.  Ported from
+        # openclaw/openclaw#72038.
+        self._message_created_ts: Optional[float] = None
        self._already_sent = False
        self._edit_supported = True  # Disabled when progressive edits are no longer usable
        self._last_edit_time = 0.0
@@ -136,6 +150,7 @@ class GatewayStreamConsumer:
        if preserve_no_edit and self._message_id == "__no_edit__":
            return
        self._message_id = None
+        self._message_created_ts = None
        self._accumulated = ""
        self._last_sent_text = ""
        self._fallback_final_send = False
@@ -734,6 +749,81 @@ class GatewayStreamConsumer:
            logger.error("Commentary send error: %s", e)
            return False

+    def _should_send_fresh_final(self) -> bool:
+        """Return True when a long-lived preview should be replaced with a
+        fresh final message instead of an edit.
+
+        Conditions:
+        - Fresh-final is enabled (``fresh_final_after_seconds > 0``).
+        - We have a real preview message id (not the ``__no_edit__`` sentinel
+          and not ``None``).
+        - The preview has been visible for at least the configured threshold.
+
+        Ported from openclaw/openclaw#72038.
+        """
+        threshold = getattr(self.cfg, "fresh_final_after_seconds", 0.0) or 0.0
+        if threshold <= 0:
+            return False
+        if not self._message_id or self._message_id == "__no_edit__":
+            return False
+        if self._message_created_ts is None:
+            return False
+        age = time.monotonic() - self._message_created_ts
+        return age >= threshold
+
+    async def _try_fresh_final(self, text: str) -> bool:
+        """Send ``text`` as a brand-new message (best-effort delete the old
+        preview) so the platform's visible timestamp reflects completion
+        time.  Returns True on successful delivery, False on any failure so
+        the caller falls back to the normal edit path.
+
+        Ported from openclaw/openclaw#72038.
+        """
+        old_message_id = self._message_id
+        try:
+            result = await self.adapter.send(
+                chat_id=self.chat_id,
+                content=text,
+                metadata=self.metadata,
+            )
+        except Exception as e:
+            logger.debug("Fresh-final send failed, falling back to edit: %s", e)
+            return False
+        if not getattr(result, "success", False):
+            return False
+        # Successful fresh send — try to delete the stale preview so the
+        # user doesn't see the old edit-stuck message underneath.  Cleanup
+        # is best-effort; platforms that don't implement ``delete_message``
+        # just leave the preview behind (still an acceptable outcome —
+        # the visible final timestamp is the important part).
+        if old_message_id and old_message_id != "__no_edit__":
+            delete_fn = getattr(self.adapter, "delete_message", None)
+            if delete_fn is not None:
+                try:
+                    await delete_fn(self.chat_id, old_message_id)
+                except Exception as e:
+                    logger.debug(
+                        "Fresh-final preview cleanup failed (%s): %s",
+                        old_message_id, e,
+                    )
+        # Adopt the new message id as the current message so subsequent
+        # callers (e.g. overflow split loops, finalize retries) see a
+        # consistent state.
+        new_message_id = getattr(result, "message_id", None)
+        if new_message_id:
+            self._message_id = new_message_id
+            self._message_created_ts = time.monotonic()
+        else:
+            # Send succeeded but platform didn't return an id — treat the
+            # delivery as final-only and fall back to "__no_edit__" so we
+            # don't try to edit something we can't address.
+            self._message_id = "__no_edit__"
+            self._message_created_ts = None
+        self._already_sent = True
+        self._last_sent_text = text
+        self._final_response_sent = True
+        return True
+
    async def _send_or_edit(self, text: str, *, finalize: bool = False) -> bool:
        """Send or edit the streaming message.

@@ -786,6 +876,22 @@ class GatewayStreamConsumer:
                        finalize and self._adapter_requires_finalize
                    ):
                        return True
+                    # Fresh-final for long-lived previews: when finalizing
+                    # the last edit in a streaming sequence, if the
+                    # original preview has been visible for at least
+                    # ``fresh_final_after_seconds``, send the completed
+                    # reply as a fresh message so the platform's visible
+                    # timestamp reflects completion time instead of the
+                    # preview creation time.  Best-effort cleanup of the
+                    # old preview follows.  Ported from
+                    # openclaw/openclaw#72038.  Gated by config so the
+                    # legacy edit-in-place path stays the default.
+                    if (
+                        finalize
+                        and self._should_send_fresh_final()
+                        and await self._try_fresh_final(text)
+                    ):
+                        return True
                    # Edit existing message
                    result = await self.adapter.edit_message(
                        chat_id=self.chat_id,
@@ -852,6 +958,10 @@ class GatewayStreamConsumer:
                if result.success:
                    if result.message_id:
                        self._message_id = result.message_id
+                        # Track when the preview first became visible to
+                        # the user so fresh-final logic can detect stale
+                        # preview timestamps on long-running responses.
+                        self._message_created_ts = time.monotonic()
                    else:
                        self._edit_supported = False
                    self._already_sent = True
@@ -31,8 +31,17 @@ Hermes' own session keys.
 from __future__ import annotations

 import json
+import logging
+import re
 from typing import Set

+logger = logging.getLogger(__name__)
+
+# WhatsApp JIDs are numeric (or plus-prefixed numeric) with an optional
+# server part, so only ASCII letters, digits, ``@``, ``.``, ``+`` and ``-``
+# are allowed; no ``\w``, so full-width digits / Unicode word chars fail.
+_SAFE_IDENTIFIER_RE = re.compile(r"^[A-Za-z0-9@.+\-]+$")
+
 from hermes_constants import get_hermes_home


@@ -81,6 +90,16 @@ def expand_whatsapp_aliases(identifier: str) -> Set[str]:
        current = queue.pop(0)
        if not current or current in resolved:
            continue
+        # Defense-in-depth: reject identifiers that could sneak path
+        # separators / traversal segments into the ``lid-mapping-{current}``
+        # filename below. The hardcoded ``lid-mapping-`` prefix already
+        # prevents escape via pathlib's component split (an attacker can't
+        # create ``lid-mapping-..`` as a real directory in session_dir), but
+        # this keeps the identifier space to the characters WhatsApp JIDs
+        # actually use and avoids depending on that filesystem-layout
+        # invariant.
+        if not _SAFE_IDENTIFIER_RE.match(current):
+            continue

        resolved.add(current)
        for suffix in ("", "_reverse"):
@@ -91,7 +110,8 @@ def expand_whatsapp_aliases(identifier: str) -> Set[str]:
                mapped = normalize_whatsapp_identifier(
                    json.loads(mapping_path.read_text(encoding="utf-8"))
                )
-            except Exception:
+            except (OSError, json.JSONDecodeError) as exc:
+                logger.debug("whatsapp_identity: failed to read %s: %s", mapping_path, exc)
                continue
            if mapped and mapped not in resolved:
                queue.append(mapped)
@@ -224,6 +224,14 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
        api_key_env_vars=("ARCEEAI_API_KEY",),
        base_url_env_var="ARCEE_BASE_URL",
    ),
+    "gmi": ProviderConfig(
+        id="gmi",
+        name="GMI Cloud",
+        auth_type="api_key",
+        inference_base_url="https://api.gmi-serving.com/v1",
+        api_key_env_vars=("GMI_API_KEY",),
+        base_url_env_var="GMI_BASE_URL",
+    ),
    "minimax": ProviderConfig(
        id="minimax",
        name="MiniMax",
@@ -467,11 +475,27 @@ def _resolve_api_key_provider_secret(
            pass
        return "", ""

+    from hermes_cli.config import get_env_value
    for env_var in pconfig.api_key_env_vars:
-        val = os.getenv(env_var, "").strip()
+        # Check both os.environ and ~/.hermes/.env file
+        val = (get_env_value(env_var) or "").strip()
        if has_usable_secret(val):
            return val, env_var

+    # Fallback: try credential pool (e.g. zai key stored via auth.json)
+    try:
+        from agent.credential_pool import load_pool
+        pool = load_pool(provider_id)
+        if pool and pool.has_credentials():
+            entry = pool.peek()
+            if entry:
+                key = getattr(entry, "access_token", "") or getattr(entry, "runtime_api_key", "")
+                key = str(key).strip()
+                if has_usable_secret(key):
+                    return key, f"credential_pool:{provider_id}"
+    except Exception:
+        pass
+
    return "", ""


@@ -1104,6 +1128,7 @@ def resolve_provider(
        "kimi-cn": "kimi-coding-cn", "moonshot-cn": "kimi-coding-cn",
        "step": "stepfun", "stepfun-coding-plan": "stepfun",
        "arcee-ai": "arcee", "arceeai": "arcee",
+        "gmi-cloud": "gmi", "gmicloud": "gmi",
        "minimax-china": "minimax-cn", "minimax_cn": "minimax-cn",
        "alibaba_coding": "alibaba-coding-plan", "alibaba-coding": "alibaba-coding-plan",
        "alibaba_coding_plan": "alibaba-coding-plan",
@@ -4244,10 +4269,10 @@ def _login_nous(args, pconfig: ProviderConfig) -> None:
                )

            from hermes_cli.models import (
-                _PROVIDER_MODELS, get_pricing_for_provider,
+                get_curated_nous_model_ids, get_pricing_for_provider,
                check_nous_free_tier, partition_nous_models_by_tier,
            )
-            model_ids = _PROVIDER_MODELS.get("nous", [])
+            model_ids = get_curated_nous_model_ids()

            print()
            unavailable_models: list = []
@@ -36,12 +36,23 @@ _EXCLUDED_DIRS = {
    "__pycache__",      # bytecode caches — regenerated on import
    ".git",             # nested git dirs (profiles shouldn't have these, but safety)
    "node_modules",     # js deps if website/ somehow leaks in
+    "backups",          # prior auto-backups — don't nest backups exponentially
+    "checkpoints",      # session-local trajectory caches — regenerated per-session,
+                        # session-hash-keyed so they don't port to another machine anyway
 }

 # File-name suffixes to skip
 _EXCLUDED_SUFFIXES = (
    ".pyc",
    ".pyo",
+    # SQLite sidecar files — the backup takes a consistent snapshot of ``*.db``
+    # via ``sqlite3.backup()``, so shipping the live WAL / shared-memory /
+    # rollback-journal alongside would pair a fresh snapshot with stale sidecar
+    # state and produce a torn restore on the next open. They're transient and
+    # regenerated on first connection anyway.
+    ".db-wal",
+    ".db-shm",
+    ".db-journal",
 )

 # File names to skip (runtime state that's meaningless on another machine)
@@ -454,6 +465,12 @@ def run_import(args) -> None:
 # Critical state files to include in quick snapshots (relative to HERMES_HOME).
 # Everything else is either regeneratable (logs, cache) or managed separately
 # (skills, repo, sessions/).
+#
+# Entries may be individual files OR directories.  Directories are captured
+# recursively; missing entries are silently skipped.  Pairing data lives in
+# platform-specific JSON blobs outside state.db, so it's listed here explicitly
+# — `hermes update` snapshots this set before pulling so approved-user lists
+# are recoverable if anything goes wrong (issue #15733).
 _QUICK_STATE_FILES = (
    "state.db",
    "config.yaml",
@@ -463,6 +480,10 @@ _QUICK_STATE_FILES = (
    "gateway_state.json",
    "channel_directory.json",
    "processes.json",
+    # Pairing stores (generic + per-platform JSONs outside state.db)
+    "pairing",                          # legacy location (gateway/pairing.py)
+    "platforms/pairing",                # new location (gateway/pairing.py)
+    "feishu_comment_pairing.json",      # Feishu comment subscription pairings
 )

 _QUICK_SNAPSHOTS_DIR = "state-snapshots"
@@ -498,7 +519,27 @@ def create_quick_snapshot(

    for rel in _QUICK_STATE_FILES:
        src = home / rel
-        if not src.exists() or not src.is_file():
+        if not src.exists():
+            continue
+
+        if src.is_dir():
+            # Walk the directory and record each file individually in the
+            # manifest so restore can treat them uniformly.  Empty dirs are
+            # skipped (nothing to snapshot).
+            for sub in src.rglob("*"):
+                if not sub.is_file():
+                    continue
+                sub_rel = sub.relative_to(home).as_posix()
+                dst = snap_dir / sub_rel
+                dst.parent.mkdir(parents=True, exist_ok=True)
+                try:
+                    shutil.copy2(sub, dst)
+                    manifest[sub_rel] = dst.stat().st_size
+                except (OSError, PermissionError) as exc:
+                    logger.warning("Could not snapshot %s: %s", sub_rel, exc)
+            continue
+
+        if not src.is_file():
            continue

        dst = snap_dir / rel
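
Reviewer sketch, not part of the patch: the file-or-directory handling in this hunk, reduced to a standalone function. `snapshot_entry` and the `telegram.json` file name are hypothetical; error handling and logging from the real code are omitted for brevity.

```python
import shutil
import tempfile
from pathlib import Path


def snapshot_entry(home: Path, snap_dir: Path, rel: str) -> dict[str, int]:
    """Hypothetical helper: copy one file or directory entry, return {rel_path: size}."""
    manifest: dict[str, int] = {}
    src = home / rel
    if not src.exists():
        return manifest  # missing entries are skipped silently
    if src.is_dir():
        # Directories are walked recursively; only regular files are recorded.
        files = [p for p in src.rglob("*") if p.is_file()]
    elif src.is_file():
        files = [src]
    else:
        files = []
    for path in files:
        sub_rel = path.relative_to(home).as_posix()
        dst = snap_dir / sub_rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, dst)
        manifest[sub_rel] = dst.stat().st_size
    return manifest


with tempfile.TemporaryDirectory() as tmp:
    home, snap = Path(tmp) / "home", Path(tmp) / "snap"
    (home / "platforms" / "pairing").mkdir(parents=True)
    (home / "platforms" / "pairing" / "telegram.json").write_text("{}", encoding="utf-8")
    manifest = snapshot_entry(home, snap, "platforms/pairing")
    missing = snapshot_entry(home, snap, "pairing")  # legacy dir absent here
```

Recording each file under its POSIX-style relative path keeps the manifest flat, so a restore step can treat directory members and top-level files the same way.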
@@ -653,3 +694,138 @@ def run_quick_backup(args) -> None:
        print(f"  Restore with: /snapshot restore {snap_id}")
    else:
        print("No state files found to snapshot.")
+
+
+# ---------------------------------------------------------------------------
+# Pre-update auto-backup
+# ---------------------------------------------------------------------------
+
+_PRE_UPDATE_BACKUPS_DIR = "backups"
+_PRE_UPDATE_PREFIX = "pre-update-"
+_PRE_UPDATE_DEFAULT_KEEP = 5
+
+
+def _pre_update_backup_dir(hermes_home: Optional[Path] = None) -> Path:
+    home = hermes_home or get_hermes_home()
+    return home / _PRE_UPDATE_BACKUPS_DIR
+
+
+def _prune_pre_update_backups(backup_dir: Path, keep: int) -> int:
+    """Remove oldest pre-update backups beyond the keep limit.
+
+    Returns the number of files deleted.  Only touches files matching
+    ``pre-update-*.zip`` so hand-made zips dropped in the same directory
+    are never touched.
+    """
+    if keep < 0:
+        keep = 0
+    if not backup_dir.exists():
+        return 0
+
+    backups = sorted(
+        (p for p in backup_dir.iterdir()
+         if p.is_file() and p.name.startswith(_PRE_UPDATE_PREFIX) and p.suffix.lower() == ".zip"),
+        key=lambda p: p.name,
+        reverse=True,
+    )
+
+    deleted = 0
+    for p in backups[keep:]:
+        try:
+            p.unlink()
+            deleted += 1
+        except OSError as exc:
+            logger.warning("Failed to prune backup %s: %s", p.name, exc)
+
+    return deleted
+
+
+def create_pre_update_backup(
+    hermes_home: Optional[Path] = None,
+    keep: int = _PRE_UPDATE_DEFAULT_KEEP,
+) -> Optional[Path]:
+    """Create a full zip backup of HERMES_HOME under ``backups/``.
+
+    Mirrors :func:`run_backup` (same exclusion rules, same SQLite safe-copy)
+    but writes to ``<HERMES_HOME>/backups/pre-update-<timestamp>.zip`` and
+    auto-prunes old pre-update backups.
+
+    Returns the path to the created zip, or ``None`` if no files were
+    found or the backup could not be created.  Never raises — the caller
+    (``hermes update``) should continue even if the backup fails.
+    """
+    hermes_root = hermes_home or get_default_hermes_root()
+    if not hermes_root.is_dir():
+        return None
+
+    backup_dir = _pre_update_backup_dir(hermes_root)
+    try:
+        backup_dir.mkdir(parents=True, exist_ok=True)
+    except OSError as exc:
+        logger.warning("Could not create pre-update backup dir %s: %s", backup_dir, exc)
+        return None
+
+    stamp = datetime.now().strftime("%Y-%m-%d-%H%M%S")
+    out_path = backup_dir / f"{_PRE_UPDATE_PREFIX}{stamp}.zip"
+
+    # Collect files (same logic as run_backup, minus the chatty progress prints)
+    files_to_add: list[tuple[Path, Path]] = []
+    try:
+        for dirpath, dirnames, filenames in os.walk(hermes_root, followlinks=False):
+            dp = Path(dirpath)
+            # Prune excluded directories in-place so os.walk doesn't descend
+            dirnames[:] = [d for d in dirnames if d not in _EXCLUDED_DIRS]
+
+            for fname in filenames:
+                fpath = dp / fname
+                try:
+                    rel = fpath.relative_to(hermes_root)
+                except ValueError:
+                    continue
+
+                if _should_exclude(rel):
+                    continue
+
+                # Skip the output zip itself if it already exists
+                try:
+                    if fpath.resolve() == out_path.resolve():
+                        continue
+                except (OSError, ValueError):
+                    pass
+
+                files_to_add.append((fpath, rel))
+    except OSError as exc:
+        logger.warning("Pre-update backup: walk failed: %s", exc)
+        return None
+
+    if not files_to_add:
+        return None
+
+    try:
+        with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED, compresslevel=6) as zf:
+            for abs_path, rel_path in files_to_add:
+                try:
+                    if abs_path.suffix == ".db":
+                        with tempfile.NamedTemporaryFile(suffix=".db", delete=False) as tmp:
+                            tmp_db = Path(tmp.name)
+                        try:
+                            if _safe_copy_db(abs_path, tmp_db):
+                                zf.write(tmp_db, arcname=str(rel_path))
+                        finally:
+                            tmp_db.unlink(missing_ok=True)
+                    else:
+                        zf.write(abs_path, arcname=str(rel_path))
+                except (PermissionError, OSError, ValueError) as exc:
+                    logger.debug("Skipping %s in pre-update backup: %s", rel_path, exc)
+                    continue
+    except OSError as exc:
+        logger.warning("Pre-update backup: zip write failed: %s", exc)
+        # Best-effort cleanup of partial file
+        try:
+            out_path.unlink(missing_ok=True)
+        except OSError:
+            pass
+        return None
+
+    _prune_pre_update_backups(backup_dir, keep=keep)
+    return out_path
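
Reviewer sketch, not part of the patch: the retention rule used by `_prune_pre_update_backups`, shown on a throwaway directory. The function name `prune`, the dates, and `my-manual-copy.zip` are hypothetical. It relies on the same property as the patch: zero-padded `%Y-%m-%d-%H%M%S` stamps sort chronologically as plain strings.

```python
import tempfile
from pathlib import Path

PREFIX = "pre-update-"  # same prefix as in the hunk above


def prune(backup_dir: Path, keep: int) -> list[str]:
    """Delete all but the ``keep`` newest pre-update zips; return deleted names."""
    candidates = sorted(
        (p for p in backup_dir.iterdir()
         if p.is_file() and p.name.startswith(PREFIX) and p.suffix.lower() == ".zip"),
        key=lambda p: p.name,
        reverse=True,  # newest first, because the timestamp is zero-padded
    )
    deleted = []
    for p in candidates[max(keep, 0):]:
        p.unlink()
        deleted.append(p.name)
    return deleted


with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    for day in range(1, 8):  # seven fake backups, one per day
        (d / f"pre-update-2025-01-{day:02d}-120000.zip").touch()
    (d / "my-manual-copy.zip").touch()  # must survive pruning
    deleted = prune(d, keep=5)
    remaining = sorted(p.name for p in d.iterdir())
```

Only the two oldest prefixed zips are removed; the hand-made zip is never a candidate because it fails the prefix filter.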
@@ -62,6 +62,8 @@ COMMAND_REGISTRY: list[CommandDef] = [
               aliases=("reset",)),
    CommandDef("clear", "Clear screen and start a new session", "Session",
               cli_only=True),
+    CommandDef("redraw", "Force a full UI repaint (recovers from terminal drift)", "Session",
+               cli_only=True),
    CommandDef("history", "Show conversation history", "Session",
               cli_only=True),
    CommandDef("save", "Save the current conversation", "Session",
@@ -84,9 +86,7 @@ COMMAND_REGISTRY: list[CommandDef] = [
    CommandDef("deny", "Deny a pending dangerous command", "Session",
               gateway_only=True),
    CommandDef("background", "Run a prompt in the background", "Session",
-               aliases=("bg",), args_hint="<prompt>"),
-    CommandDef("btw", "Ephemeral side question using session context (no tools, not persisted)", "Session",
-               args_hint="<question>"),
+               aliases=("bg", "btw"), args_hint="<prompt>"),
    CommandDef("agents", "Show active agents and running tasks", "Session",
               aliases=("tasks",)),
    CommandDef("queue", "Queue a prompt for the next turn (doesn't interrupt)", "Session",
@@ -128,8 +128,8 @@ COMMAND_REGISTRY: list[CommandDef] = [
    CommandDef("voice", "Toggle voice mode", "Configuration",
               args_hint="[on|off|tts|status]", subcommands=("on", "off", "tts", "status")),
    CommandDef("busy", "Control what Enter does while Hermes is working", "Configuration",
-               cli_only=True, args_hint="[queue|interrupt|status]",
-               subcommands=("queue", "interrupt", "status")),
+               cli_only=True, args_hint="[queue|steer|interrupt|status]",
+               subcommands=("queue", "steer", "interrupt", "status")),

    # Tools & Skills
    CommandDef("tools", "Manage tools: /tools [list|disable|enable] [name...]", "Tools & Skills",
@@ -808,6 +808,114 @@ def discord_skill_commands_by_category(
    return trimmed_categories, uncategorized, hidden


+# ---------------------------------------------------------------------------
+# Slack native slash commands
+# ---------------------------------------------------------------------------
+
+# Slack slash command name constraints: lowercase a-z, 0-9, hyphens,
+# underscores. Max 32 chars. Slack app manifest accepts up to 50 slash
+# commands per app.
+_SLACK_MAX_SLASH_COMMANDS = 50
+_SLACK_NAME_LIMIT = 32
+_SLACK_INVALID_CHARS = re.compile(r"[^a-z0-9_\-]")
+
+
+def _sanitize_slack_name(raw: str) -> str:
+    """Convert a command name to a valid Slack slash command name.
+
+    Slack allows lowercase a-z, digits, hyphens, and underscores. Max 32
+    chars. Uppercase is lowercased; invalid chars are stripped.
+    """
+    name = raw.lower()
+    name = _SLACK_INVALID_CHARS.sub("", name)
+    name = name.strip("-_")
+    return name[:_SLACK_NAME_LIMIT]
+
+
+def slack_native_slashes() -> list[tuple[str, str, str]]:
+    """Return (slash_name, description, usage_hint) triples for Slack.
+
+    Every gateway-available command in ``COMMAND_REGISTRY`` is surfaced as
+    a standalone Slack slash command (e.g. ``/btw``, ``/stop``, ``/model``),
+    matching Discord's and Telegram's model where every command is a
+    first-class slash and not a ``/hermes <verb>`` subcommand.
+
+    Both canonical names and aliases are included so users can type any
+    documented form (e.g. ``/background``, ``/bg``, and ``/btw`` all work).
+    Plugin-registered slash commands are included too.
+
+    Results are clamped to Slack's 50-command limit with duplicate-name
+    avoidance. ``/hermes`` is always reserved as the first entry so the
+    legacy ``/hermes <subcommand>`` form keeps working for anything that
+    gets dropped by the clamp or for free-form questions.
+    """
+    overrides = _resolve_config_gates()
+    entries: list[tuple[str, str, str]] = []
+    seen: set[str] = set()
+
+    # Reserve /hermes as the catch-all top-level command.
+    entries.append(("hermes", "Talk to Hermes or run a subcommand", "[subcommand] [args]"))
+    seen.add("hermes")
+
+    def _add(name: str, desc: str, hint: str) -> None:
+        slack_name = _sanitize_slack_name(name)
+        if not slack_name or slack_name in seen:
+            return
+        if len(entries) >= _SLACK_MAX_SLASH_COMMANDS:
+            return
+        # Slack description cap is 2000 chars; keep it short.
+        entries.append((slack_name, desc[:140], hint[:100]))
+        seen.add(slack_name)
+
+    # First pass: canonical names (so they win slots if we hit the cap).
+    for cmd in COMMAND_REGISTRY:
+        if not _is_gateway_available(cmd, overrides):
+            continue
+        _add(cmd.name, cmd.description, cmd.args_hint or "")
+
+    # Second pass: aliases.
+    for cmd in COMMAND_REGISTRY:
+        if not _is_gateway_available(cmd, overrides):
+            continue
+        for alias in cmd.aliases:
+            # Aliases that sanitize to an already-registered name (case or
+            # punctuation differences only) are dropped by _add's dedup.
+            _add(alias, f"Alias for /{cmd.name} — {cmd.description}", cmd.args_hint or "")
+
+    # Third pass: plugin commands.
+    for name, description, args_hint in _iter_plugin_command_entries():
+        _add(name, description, args_hint or "")
+
+    return entries
+
+
+def slack_app_manifest(request_url: str = "https://hermes-agent.local/slack/commands") -> dict[str, Any]:
+    """Generate the slash-command portion of a Slack app manifest.
+
+    ``request_url`` is required by Slack's manifest schema for every slash
+    command, but in Socket Mode (which we use) Slack ignores it and routes
+    the command event through the WebSocket. A placeholder URL is fine.
+
+    The returned dict is the ``features.slash_commands`` portion only —
+    callers compose it into a full manifest (or merge into an existing
+    one). Keeping it narrow avoids coupling us to the rest of the manifest
+    schema (display_information, oauth_config, settings, etc.) which users
+    set up once in the Slack UI and rarely change.
+    """
+    slashes = []
+    for name, desc, usage in slack_native_slashes():
+        entry = {
+            "command": f"/{name}",
+            "description": desc or f"Run /{name}",
+            "should_escape": False,
+            "url": request_url,
+        }
+        if usage:
+            entry["usage_hint"] = usage
+        slashes.append(entry)
+    return {"features": {"slash_commands": slashes}}
+
+
 def slack_subcommand_map() -> dict[str, str]:
    """Return subcommand -> /command mapping for Slack /hermes handler.

@@ -389,6 +389,20 @@ DEFAULT_CONFIG = {
        # (60+ tool iterations with tiny output) before users assume the
        # bot is dead and /restart.
        "gateway_notify_interval": 180,
+        # How user-attached images are presented to the main model on each turn.
+        #   "auto"   — attach natively when the active model reports
+        #              supports_vision=True AND the user hasn't explicitly
+        #              configured auxiliary.vision.provider.  Otherwise fall
+        #              back to text (vision_analyze pre-analysis).
+        #   "native" — always attach natively; non-vision models will either
+        #              error at the provider or get a last-chance text fallback
+        #              (see run_agent._prepare_messages_for_api).
+        #   "text"   — always pre-analyze with vision_analyze and prepend the
+        #              description as text; the main model never sees pixels.
+        # Affects gateway platforms, the TUI, and CLI /attach.  vision_analyze
+        # remains available as a tool regardless of this setting — the routing
+        # only controls how inbound user images are presented.
+        "image_input_mode": "auto",
    },
    
    "terminal": {
@@ -465,6 +479,7 @@ DEFAULT_CONFIG = {
        "command_timeout": 30,  # Timeout for browser commands in seconds (screenshot, navigate, etc.)
        "record_sessions": False,  # Auto-record browser sessions as WebM videos
        "allow_private_urls": False,  # Allow navigating to private/internal IPs (localhost, 192.168.x.x, etc.)
+        "auto_local_for_private_urls": True,  # When a cloud provider is set, auto-spawn local Chromium for LAN/localhost URLs instead of sending them to the cloud
        "cdp_url": "",  # Optional persistent CDP endpoint for attaching to an existing Chromium/Chrome
        # CDP supervisor — dialog + frame detection via a persistent WebSocket.
        # Active only when a CDP-capable backend is attached (Browserbase or
@@ -486,6 +501,19 @@ DEFAULT_CONFIG = {
    "checkpoints": {
        "enabled": True,
        "max_snapshots": 50,  # Max checkpoints to keep per directory
+        # Auto-maintenance: shadow repos accumulate forever under
+        # ~/.hermes/checkpoints/ (one per cd'd working directory). Field
+        # reports put the typical offender at 1000+ repos / ~12 GB. When
+        # auto_prune is on, hermes sweeps at startup (at most once per
+        # min_interval_hours) and deletes:
+        #   * orphan repos: HERMES_WORKDIR no longer exists on disk
+        #   * stale repos:  newest mtime older than retention_days
+        # Opt-in so users who rely on /rollback against long-ago sessions
+        # never lose data silently.
+        "auto_prune": False,
+        "retention_days": 7,
+        "delete_orphans": True,
+        "min_interval_hours": 24,
    },

    # Maximum characters returned by a single read_file call.  Reads that
@@ -626,7 +654,7 @@ DEFAULT_CONFIG = {
        "compact": False,
        "personality": "kawaii",
        "resume_display": "full",
-        "busy_input_mode": "interrupt",
+        "busy_input_mode": "interrupt",  # interrupt | queue | steer
        "bell_on_complete": False,
        "show_reasoning": False,
        "streaming": False,
@@ -959,6 +987,27 @@ DEFAULT_CONFIG = {
        "backup_count": 3,     # Number of rotated backup files to keep
    },

+    # Remotely-hosted model catalog manifest.  When enabled, the CLI fetches
+    # curated model lists for OpenRouter and Nous Portal from this URL,
+    # falling back to the in-repo snapshot on network failure.  Lets us
+    # update model picker lists without shipping a hermes-agent release.
+    # The default URL is served by the docs site GitHub Pages deploy.
+    "model_catalog": {
+        "enabled": True,
+        "url": "https://hermes-agent.nousresearch.com/docs/api/model-catalog.json",
+        # Disk cache TTL in hours.  Beyond this, the CLI refetches on the
+        # next /model or `hermes model` invocation; network failures
+        # silently fall back to the stale cache.
+        "ttl_hours": 24,
+        # Optional per-provider override URLs for third parties that want
+        # to self-host their own curation list using the same schema.
+        # Example:
+        #   providers:
+        #     openrouter:
+        #       url: https://example.com/my-curation.json
+        "providers": {},
+    },
+
    # Network settings — workarounds for connectivity issues.
    "network": {
        # Force IPv4 connections.  On servers with broken or unreachable IPv6,
@@ -995,6 +1044,27 @@ DEFAULT_CONFIG = {
        "min_interval_hours": 24,
    },

+    # Contextual first-touch onboarding hints (see agent/onboarding.py).
+    # Each hint is shown once per install and then latched here so it
+    # never fires again.  Users can wipe the section to re-see all hints.
+    "onboarding": {
+        "seen": {},
+    },
+
+    # ``hermes update`` behaviour.
+    "updates": {
+        # Run a full ``hermes backup``-style zip of HERMES_HOME before every
+        # ``hermes update``.  Backups land in ``<HERMES_HOME>/backups/`` and
+        # can be restored with ``hermes import <path>``.  Off by default —
+        # on large HERMES_HOME directories the zip can add minutes to every
+        # update.  Set to true to enable, or pass ``--backup`` to opt in
+        # for a single update run.
+        "pre_update_backup": False,
+        # How many pre-update backup zips to retain.  Older ones are pruned
+        # automatically after each successful backup.
+        "backup_keep": 5,
+    },
+
    # Config schema version - bump this when adding new required fields
    "_config_version": 22,
 }
@@ -1184,6 +1254,22 @@ OPTIONAL_ENV_VARS = {
        "category": "provider",
        "advanced": True,
    },
+    "GMI_API_KEY": {
+        "description": "GMI Cloud API key",
+        "prompt": "GMI Cloud API key",
+        "url": "https://www.gmicloud.ai/",
+        "password": True,
+        "category": "provider",
+        "advanced": True,
+    },
+    "GMI_BASE_URL": {
+        "description": "GMI Cloud base URL override",
+        "prompt": "GMI Cloud base URL (leave empty for default)",
+        "url": None,
+        "password": False,
+        "category": "provider",
+        "advanced": True,
+    },
    "MINIMAX_API_KEY": {
        "description": "MiniMax API key (international)",
        "prompt": "MiniMax API key",
@@ -1553,6 +1639,44 @@ OPTIONAL_ENV_VARS = {
        "category": "tool",
    },

+    # ── Bundled skills (opt-in: only needed if the user uses that skill) ──
+    # These use category="skill" (distinct from "tool") so the sandbox
+    # env blocklist in tools/environments/local.py does NOT rewrite them —
+    # skills legitimately need these passed through to curl via
+    # tools/env_passthrough.py when the user's skill calls out.
+    "NOTION_API_KEY": {
+        "description": "Notion integration token (used by the `notion` skill)",
+        "prompt": "Notion API key",
+        "url": "https://www.notion.so/my-integrations",
+        "password": True,
+        "category": "skill",
+        "advanced": True,
+    },
+    "LINEAR_API_KEY": {
+        "description": "Linear personal API key (used by the `linear` skill)",
+        "prompt": "Linear API key",
+        "url": "https://linear.app/settings/api",
+        "password": True,
+        "category": "skill",
+        "advanced": True,
+    },
+    "AIRTABLE_API_KEY": {
+        "description": "Airtable personal access token (used by the `airtable` skill)",
+        "prompt": "Airtable API key",
+        "url": "https://airtable.com/create/tokens",
+        "password": True,
+        "category": "skill",
+        "advanced": True,
+    },
+    "TENOR_API_KEY": {
+        "description": "Tenor API key for GIF search (used by the `gif-search` skill)",
+        "prompt": "Tenor API key",
+        "url": "https://developers.google.com/tenor/guides/quickstart",
+        "password": True,
+        "category": "skill",
+        "advanced": True,
+    },
+
    # ── Honcho ──
    "HONCHO_API_KEY": {
        "description": "Honcho API key for AI-native persistent memory",
@@ -45,8 +45,13 @@ def _pending_file() -> Path:
    Each entry: ``{"url": "...", "expire_at": <unix_ts>}``.  Scheduled
    DELETEs used to be handled by spawning a detached Python process per
    paste that slept for 6 hours; those accumulated forever if the user
-    ran ``hermes debug share`` repeatedly.  We now persist the schedule
-    to disk and sweep expired entries on the next debug invocation.
+    ran ``hermes debug share`` repeatedly.
+
+    Deletion is now driven by the gateway's cron ticker
+    (``gateway/run.py::_start_cron_ticker``) which calls
+    ``_sweep_expired_pastes`` once per hour.  ``hermes debug share`` also
+    runs an opportunistic sweep on entry as a fallback for CLI-only users
+    who never start the gateway.
    """
    return get_hermes_home() / "pastes" / "pending.json"

@@ -223,9 +228,10 @@ def _schedule_auto_delete(urls: list[str], delay_seconds: int = _AUTO_DELETE_SEC
    interpreters that never exited until the sleep completed.

    The replacement is stateless: we append to ``~/.hermes/pastes/pending.json``
-    and rely on opportunistic sweeps (``_sweep_expired_pastes``) called from
-    every ``hermes debug`` invocation.  If the user never runs ``hermes debug``
-    again, paste.rs's own retention policy handles cleanup.
+    and the gateway's cron ticker sweeps expired entries once per hour.
+    ``hermes debug share`` also runs an opportunistic sweep as a fallback
+    for CLI-only users.  If neither runs again, paste.rs's own retention
+    policy handles cleanup.
    """
    _record_pending(urls, delay_seconds=delay_seconds)

@@ -46,6 +46,7 @@ _PROVIDER_ENV_HINTS = (
    "Z_AI_API_KEY",
    "KIMI_API_KEY",
    "KIMI_CN_API_KEY",
+    "GMI_API_KEY",
    "MINIMAX_API_KEY",
    "MINIMAX_CN_API_KEY",
    "KILOCODE_API_KEY",
@@ -937,6 +938,7 @@ def run_doctor(args):
        ("StepFun Step Plan",   ("STEPFUN_API_KEY",),                           "https://api.stepfun.ai/step_plan/v1/models", "STEPFUN_BASE_URL", True),
        ("Kimi / Moonshot (China)", ("KIMI_CN_API_KEY",),                    "https://api.moonshot.cn/v1/models",   None, True),
        ("Arcee AI",         ("ARCEEAI_API_KEY",),                            "https://api.arcee.ai/api/v1/models",  "ARCEE_BASE_URL", True),
+        ("GMI Cloud",        ("GMI_API_KEY",),                                "https://api.gmi-serving.com/v1/models", "GMI_BASE_URL", True),
        ("DeepSeek",         ("DEEPSEEK_API_KEY",),                           "https://api.deepseek.com/v1/models",  "DEEPSEEK_BASE_URL", True),
        ("Hugging Face",     ("HF_TOKEN",),                                   "https://router.huggingface.co/v1/models", "HF_BASE_URL", True),
        ("NVIDIA NIM",       ("NVIDIA_API_KEY",),                             "https://integrate.api.nvidia.com/v1/models", "NVIDIA_BASE_URL", True),
@@ -0,0 +1,361 @@
+"""
+hermes fallback — manage the fallback provider chain.
+
+Fallback providers are tried in order when the primary model fails with
+rate-limit, overload, or connection errors. See:
+https://hermes-agent.nousresearch.com/docs/user-guide/features/fallback-providers
+
+Subcommands:
+  hermes fallback [list]   Show the current fallback chain (default when no subcommand)
+  hermes fallback add      Pick provider + model via the same picker as `hermes model`,
+                           then append the selection to the chain
+  hermes fallback remove   Pick an entry to delete from the chain
+  hermes fallback clear    Remove all fallback entries
+
+Storage: ``fallback_providers`` in ``~/.hermes/config.yaml`` (top-level, list of
+``{provider, model, base_url?, api_mode?}`` dicts).  The legacy single-dict
+``fallback_model`` format is migrated to the list format on the first write (add, remove, or clear).
+"""
+from __future__ import annotations
+
+import copy
+from typing import Any, Dict, List, Optional
+
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+def _read_chain(config: Dict[str, Any]) -> List[Dict[str, Any]]:
+    """Return the normalized fallback chain as a list of dicts.
+
+    Accepts both the new list format (``fallback_providers``) and the legacy
+    single-dict format (``fallback_model``).  The returned list is always a
+    fresh copy — callers can mutate without touching the config dict.
+    """
+    chain = config.get("fallback_providers") or []
+    if isinstance(chain, list):
+        result = [dict(e) for e in chain if isinstance(e, dict) and e.get("provider") and e.get("model")]
+        if result:
+            return result
+    legacy = config.get("fallback_model")
+    if isinstance(legacy, dict) and legacy.get("provider") and legacy.get("model"):
+        return [dict(legacy)]
+    if isinstance(legacy, list):
+        return [dict(e) for e in legacy if isinstance(e, dict) and e.get("provider") and e.get("model")]
+    return []
+
+
+def _write_chain(config: Dict[str, Any], chain: List[Dict[str, Any]]) -> None:
+    """Persist the chain to ``fallback_providers`` and clear the legacy key."""
+    config["fallback_providers"] = chain
+    # Drop the legacy single-dict key on write so there's only one source of truth.
+    if "fallback_model" in config:
+        config.pop("fallback_model", None)
+
+
+def _format_entry(entry: Dict[str, Any]) -> str:
+    """One-line human-readable rendering of a fallback entry."""
+    provider = entry.get("provider", "?")
+    model = entry.get("model", "?")
+    base = entry.get("base_url")
+    suffix = f"  [{base}]" if base else ""
+    return f"{model}  (via {provider}){suffix}"
+
+
+def _extract_fallback_from_model_cfg(model_cfg: Any) -> Optional[Dict[str, Any]]:
+    """Pull the ``{provider, model, base_url?, api_mode?}`` dict from a ``config["model"]`` snapshot."""
+    if not isinstance(model_cfg, dict):
+        return None
+    provider = (model_cfg.get("provider") or "").strip()
+    # The picker writes the selected model to ``model.default``.
+    model = (model_cfg.get("default") or model_cfg.get("model") or "").strip()
+    if not provider or not model:
+        return None
+    entry: Dict[str, Any] = {"provider": provider, "model": model}
+    base_url = (model_cfg.get("base_url") or "").strip()
+    if base_url:
+        entry["base_url"] = base_url
+    api_mode = (model_cfg.get("api_mode") or "").strip()
+    if api_mode:
+        entry["api_mode"] = api_mode
+    return entry
+
+
+def _snapshot_auth_active_provider() -> Any:
+    """Return the current ``active_provider`` in auth.json, or ``None`` if unavailable."""
+    try:
+        from hermes_cli.auth import _load_auth_store
+        store = _load_auth_store()
+        return store.get("active_provider")
+    except Exception:
+        return None
+
+
+def _restore_auth_active_provider(value: Any) -> None:
+    """Write back a previously snapshotted ``active_provider`` value."""
+    try:
+        from hermes_cli.auth import _auth_store_lock, _load_auth_store, _save_auth_store
+        with _auth_store_lock():
+            store = _load_auth_store()
+            store["active_provider"] = value
+            _save_auth_store(store)
+    except Exception:
+        # Best-effort — if auth.json can't be restored, the user's primary
+        # provider may have been deactivated by the picker.  They can re-run
+        # `hermes model` to fix it.  Don't fail the fallback add.
+        pass
+
+
+# ---------------------------------------------------------------------------
+# Subcommand handlers
+# ---------------------------------------------------------------------------
+
+def cmd_fallback_list(args) -> None:  # noqa: ARG001
+    """Print the current fallback chain."""
+    from hermes_cli.config import load_config
+
+    config = load_config()
+    chain = _read_chain(config)
+
+    print()
+    if not chain:
+        print("  No fallback providers configured.")
+        print()
+        print("  Add one with:  hermes fallback add")
+        print()
+        return
+
+    primary = _describe_primary(config)
+    if primary:
+        print(f"  Primary:   {primary}")
+        print()
+    print(f"  Fallback chain ({len(chain)} {'entry' if len(chain) == 1 else 'entries'}):")
+    for i, entry in enumerate(chain, 1):
+        print(f"    {i}. {_format_entry(entry)}")
+    print()
+    print("  Tried in order when the primary fails (rate-limit, 5xx, connection errors).")
+    print("  Docs: https://hermes-agent.nousresearch.com/docs/user-guide/features/fallback-providers")
+    print()
+
+
+def _describe_primary(config: Dict[str, Any]) -> Optional[str]:
+    """One-line description of the primary model for display purposes."""
+    model_cfg = config.get("model")
+    if isinstance(model_cfg, dict):
+        provider = (model_cfg.get("provider") or "?").strip() or "?"
+        model = (model_cfg.get("default") or model_cfg.get("model") or "?").strip() or "?"
+        return f"{model}  (via {provider})"
+    if isinstance(model_cfg, str) and model_cfg.strip():
+        return model_cfg.strip()
+    return None
+
+
+def cmd_fallback_add(args) -> None:
+    """Launch the same picker as `hermes model`, then append the selection to the chain."""
+    from hermes_cli.main import _require_tty, select_provider_and_model
+    from hermes_cli.config import load_config, save_config
+
+    _require_tty("fallback add")
+
+    # Snapshot BEFORE the picker runs so we can distinguish "user actually
+    # picked something" from "user cancelled" by comparing before/after.
+    before_cfg = load_config()
+    model_before = copy.deepcopy(before_cfg.get("model"))
+    active_provider_before = _snapshot_auth_active_provider()
+
+    print()
+    print("  Adding a fallback provider.  The picker below is the same one used by")
+    print("  `hermes model` — select the provider + model you want as a fallback.")
+    print()
+
+    try:
+        select_provider_and_model(args=args)
+    except SystemExit:
+        # Some provider flows exit on auth failure — restore state and re-raise.
+        _restore_model_cfg(model_before)
+        _restore_auth_active_provider(active_provider_before)
+        raise
+
+    # Read the post-picker state to see what the user selected.
+    after_cfg = load_config()
+    model_after = after_cfg.get("model")
+
+    new_entry = _extract_fallback_from_model_cfg(model_after)
+    if not new_entry:
+        # Picker didn't complete (user cancelled or flow bailed).  Nothing to do.
+        _restore_model_cfg(model_before)
+        _restore_auth_active_provider(active_provider_before)
+        print()
+        print("  No fallback added.")
+        return
+
+    # The picker returned the model that is already the primary, so nothing
+    # changed and there is nothing useful to add as a fallback to itself.
+    primary_entry = _extract_fallback_from_model_cfg(model_before)
+    if primary_entry and primary_entry["provider"] == new_entry["provider"] \
+            and primary_entry["model"] == new_entry["model"]:
+        _restore_model_cfg(model_before)
+        _restore_auth_active_provider(active_provider_before)
+        print()
+        print(f"  Selected model matches the current primary ({_format_entry(new_entry)}).")
+        print("  A provider cannot be a fallback for itself — no change.")
+        return
+
+    # Reload the config with the primary restored, then append the new entry
+    # to ``fallback_providers``.  We deliberately re-load (rather than mutating
+    # ``after_cfg``) because the picker may have touched other top-level keys
+    # (custom_providers, providers credentials) that we want to keep.
+    _restore_model_cfg(model_before)
+    _restore_auth_active_provider(active_provider_before)
+
+    final_cfg = load_config()
+    chain = _read_chain(final_cfg)
+
+    # Reject exact-duplicate fallback entries.
+    for existing in chain:
+        if existing.get("provider") == new_entry["provider"] \
+                and existing.get("model") == new_entry["model"]:
+            print()
+            print(f"  {_format_entry(new_entry)} is already in the fallback chain — skipped.")
+            return
+
+    chain.append(new_entry)
+    _write_chain(final_cfg, chain)
+    save_config(final_cfg)
+
+    print()
+    print(f"  Added fallback: {_format_entry(new_entry)}")
+    print(f"  Chain is now {len(chain)} {'entry' if len(chain) == 1 else 'entries'} long.")
+    print()
+    print("  Run `hermes fallback list` to view, or `hermes fallback remove` to delete.")
+
+
+def _restore_model_cfg(model_before: Any) -> None:
+    """Restore ``config["model"]`` to a previously captured snapshot."""
+    from hermes_cli.config import load_config, save_config
+
+    cfg = load_config()
+    if model_before is None:
+        cfg.pop("model", None)
+    else:
+        cfg["model"] = copy.deepcopy(model_before)
+    save_config(cfg)
+
+
+def cmd_fallback_remove(args) -> None:  # noqa: ARG001
+    """Pick an entry from the chain and remove it."""
+    from hermes_cli.config import load_config, save_config
+
+    config = load_config()
+    chain = _read_chain(config)
+
+    if not chain:
+        print()
+        print("  No fallback providers configured — nothing to remove.")
+        print()
+        return
+
+    choices = [_format_entry(e) for e in chain]
+    choices.append("Cancel")
+
+    try:
+        from hermes_cli.setup import _curses_prompt_choice
+        idx = _curses_prompt_choice("Select a fallback to remove:", choices, 0)
+    except Exception:
+        idx = _numbered_pick("Select a fallback to remove:", choices)
+
+    if idx is None or idx < 0 or idx >= len(chain):
+        print()
+        print("  Cancelled — no change.")
+        return
+
+    removed = chain.pop(idx)
+    _write_chain(config, chain)
+    save_config(config)
+
+    print()
+    print(f"  Removed fallback: {_format_entry(removed)}")
+    if chain:
+        print(f"  Chain is now {len(chain)} {'entry' if len(chain) == 1 else 'entries'} long.")
+    else:
+        print("  Fallback chain is now empty.")
+    print()
+
+
+def cmd_fallback_clear(args) -> None:  # noqa: ARG001
+    """Remove all fallback entries (with confirmation)."""
+    from hermes_cli.config import load_config, save_config
+
+    config = load_config()
+    chain = _read_chain(config)
+
+    if not chain:
+        print()
+        print("  No fallback providers configured — nothing to clear.")
+        print()
+        return
+
+    print()
+    print(f"  Current fallback chain ({len(chain)} {'entry' if len(chain) == 1 else 'entries'}):")
+    for i, entry in enumerate(chain, 1):
+        print(f"    {i}. {_format_entry(entry)}")
+    print()
+    try:
+        resp = input("  Clear all entries? [y/N]: ").strip().lower()
+    except (KeyboardInterrupt, EOFError):
+        print()
+        print("  Cancelled.")
+        return
+    if resp not in ("y", "yes"):
+        print("  Cancelled — no change.")
+        return
+
+    _write_chain(config, [])
+    save_config(config)
+    print()
+    print("  Fallback chain cleared.")
+    print()
+
+
+def _numbered_pick(question: str, choices: List[str]) -> Optional[int]:
+    """Fallback numbered-list picker when curses is unavailable."""
+    print(question)
+    for i, c in enumerate(choices, 1):
+        print(f"  {i}. {c}")
+    print()
+    while True:
+        try:
+            val = input(f"Choice [1-{len(choices)}]: ").strip()
+            if not val:
+                return None
+            idx = int(val) - 1
+            if 0 <= idx < len(choices):
+                return idx
+            print(f"Please enter 1-{len(choices)}")
+        except ValueError:
+            print("Please enter a number")
+        except (KeyboardInterrupt, EOFError):
+            print()
+            return None
+
+
+# ---------------------------------------------------------------------------
+# Dispatch
+# ---------------------------------------------------------------------------
+
+def cmd_fallback(args) -> None:
+    """Top-level dispatcher for ``hermes fallback [subcommand]``."""
+    sub = getattr(args, "fallback_command", None)
+    if sub in (None, "", "list", "ls"):
+        cmd_fallback_list(args)
+    elif sub == "add":
+        cmd_fallback_add(args)
+    elif sub in ("remove", "rm"):
+        cmd_fallback_remove(args)
+    elif sub == "clear":
+        cmd_fallback_clear(args)
+    else:
+        print(f"Unknown fallback subcommand: {sub}")
+        print("Use one of: list, add, remove, clear")
+        raise SystemExit(2)
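
Reviewer note: the storage migration described in the module docstring can be illustrated in isolation. The following is a minimal sketch, assuming the config is a plain dict; `read_chain`, `write_chain`, `_valid`, and the provider/model values are illustrative stand-ins rather than part of the patch.

```python
from typing import Any, Dict, List


def _valid(entry: Any) -> bool:
    """An entry counts only if it is a dict with both provider and model set."""
    return (
        isinstance(entry, dict)
        and bool(entry.get("provider"))
        and bool(entry.get("model"))
    )


def read_chain(config: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Prefer the list key; fall back to the legacy key (dict or list)."""
    chain = config.get("fallback_providers") or []
    if isinstance(chain, list):
        result = [dict(e) for e in chain if _valid(e)]
        if result:
            return result
    legacy = config.get("fallback_model")
    if _valid(legacy):
        return [dict(legacy)]
    if isinstance(legacy, list):
        return [dict(e) for e in legacy if _valid(e)]
    return []


def write_chain(config: Dict[str, Any], chain: List[Dict[str, Any]]) -> None:
    """Store the list form and drop the legacy key so one source of truth remains."""
    config["fallback_providers"] = chain
    config.pop("fallback_model", None)


# Hypothetical legacy config: a single dict under ``fallback_model``.
legacy_cfg = {"fallback_model": {"provider": "example-provider", "model": "example-model"}}
chain = read_chain(legacy_cfg)
chain.append({"provider": "other-provider", "model": "other-model"})
write_chain(legacy_cfg, chain)
```

After the write, the legacy key is gone and both entries live under `fallback_providers`. Because the legacy key is dropped inside the write helper, any subcommand that saves the chain performs the migration.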
@@ -2724,6 +2724,24 @@ _PLATFORMS = [
             "help": "OpenID to deliver cron results and notifications to."},
        ],
    },
+    {
+        "key": "yuanbao",
+        "label": "Yuanbao",
+        "emoji": "💎",
+        "token_var": "YUANBAO_APP_ID",
+        "setup_instructions": [
+            "1. Download the Yuanbao app from https://yuanbao.tencent.com/",
+            "2. In the app, go to PAI → My Bot and create a new bot",
+            "3. After the bot is created, copy the App ID and App Secret",
+            "4. Enter them below and Hermes will connect automatically over WebSocket",
+        ],
+        "vars": [
+            {"name": "YUANBAO_APP_ID", "prompt": "App ID", "password": False,
+             "help": "The App ID from your Yuanbao IM Bot credentials."},
+            {"name": "YUANBAO_APP_SECRET", "prompt": "App Secret", "password": True,
+             "help": "The App Secret (used for HMAC signing) from your Yuanbao IM Bot."},
+        ],
+    },
 ]


@@ -3108,6 +3126,12 @@ def _setup_wecom():
    print_success("💬 WeCom configured!")


+def _setup_yuanbao():
+    """Configure Yuanbao via the standard platform setup."""
+    yuanbao_platform = next(p for p in _PLATFORMS if p["key"] == "yuanbao")
+    _setup_standard_platform(yuanbao_platform)
+
+
 def _is_service_installed() -> bool:
    """Check if the gateway is installed as a system service."""
    if supports_systemd_services():
@@ -44,6 +44,7 @@ Usage:
 """

 import argparse
+import json
 import os
 import shutil
 import subprocess
@@ -595,17 +596,22 @@ def _session_browse_picker(sessions: list) -> Optional[str]:


 def _resolve_last_session(source: str = "cli") -> Optional[str]:
-    """Look up the most recent session ID for a source."""
+    """Look up the most recently used session ID for a source."""
+    db = None
    try:
        from hermes_state import SessionDB

        db = SessionDB()
        sessions = db.search_sessions(source=source, limit=1)
-        db.close()
-        if sessions:
-            return sessions[0]["id"]
+        return sessions[0]["id"] if sessions else None
    except Exception:
        pass
+    finally:
+        if db is not None:
+            try:
+                db.close()
+            except Exception:
+                pass
    return None


@@ -760,9 +766,20 @@ def _resolve_session_by_name_or_id(name_or_id: str) -> Optional[str]:
    return None


-def _print_tui_exit_summary(session_id: Optional[str]) -> None:
+def _read_tui_active_session_file(path: Optional[str]) -> Optional[str]:
+    if not path:
+        return None
+    try:
+        data = json.loads(Path(path).read_text(encoding="utf-8"))
+        sid = str(data.get("session_id") or "").strip()
+        return sid or None
+    except Exception:
+        return None
+
+
+def _print_tui_exit_summary(session_id: Optional[str], active_session_file: Optional[str] = None) -> None:
    """Print a shell-visible epilogue after TUI exits."""
-    target = session_id or _resolve_last_session(source="tui")
+    target = _read_tui_active_session_file(active_session_file) or session_id or _resolve_last_session(source="tui")
    if not target:
        return

@@ -812,8 +829,29 @@ def _print_tui_exit_summary(session_id: Optional[str]) -> None:
    )


+_NPM_LOCK_RUNTIME_KEYS = frozenset({"ideallyInert"})
+
+
 def _tui_need_npm_install(root: Path) -> bool:
-    """True when @hermes/ink is missing or node_modules is behind package-lock.json (post-pull)."""
+    """True when @hermes/ink is missing or node_modules is behind package-lock.json.
+
+    Compares ``package-lock.json`` against ``node_modules/.package-lock.json``
+    (npm's hidden lockfile) by **content**, not mtime: git checkouts and npm
+    rewrites can bump the root lockfile's timestamp even when installed deps
+    already match, which used to trigger a spurious "Installing TUI
+    dependencies" on every launch.
+
+    For each entry in the root lock's ``packages`` map:
+      - missing from hidden lock → reinstall (unless the entry is marked
+        ``optional`` or ``peer``, which npm may intentionally skip per platform)
+      - present but with differing fields (excluding npm-written runtime
+        annotations like ``ideallyInert``) → reinstall
+
+    Extra entries that exist only in the hidden lock are ignored — stale
+    transitives left over from a removed dependency don't break runtime and
+    we'd rather not force a reinstall for them. Falls back to mtime
+    comparison if either lockfile is unparseable.
+    """
    ink = root / "node_modules" / "@hermes" / "ink" / "package.json"
    if not ink.is_file():
        return True
@@ -823,7 +861,35 @@ def _tui_need_npm_install(root: Path) -> bool:
    marker = root / "node_modules" / ".package-lock.json"
    if not marker.is_file():
        return True
-    return lock.stat().st_mtime > marker.stat().st_mtime
+
+    # Compare lockfile contents, not mtimes: git checkouts and npm rewrites
+    # can bump the root lockfile timestamp even when installed deps already
+    # match. Fall back to mtime when either file is unparseable.
+    try:
+        wanted = json.loads(lock.read_text(encoding="utf-8")).get("packages") or {}
+        installed = json.loads(marker.read_text(encoding="utf-8")).get("packages") or {}
+    except (OSError, UnicodeDecodeError, json.JSONDecodeError):
+        return lock.stat().st_mtime > marker.stat().st_mtime
+
+    def comparable(pkg: dict) -> dict:
+        return {k: v for k, v in pkg.items() if k not in _NPM_LOCK_RUNTIME_KEYS}
+
+    for name, pkg in wanted.items():
+        if not name:
+            continue
+
+        if not isinstance(pkg, dict):
+            continue
+
+        if name not in installed:
+            if pkg.get("optional") or pkg.get("peer"):
+                continue
+            return True
+
+        if isinstance(installed[name], dict) and comparable(pkg) != comparable(installed[name]):
+            return True
+
+    return False


 def _find_bundled_tui(tui_dir: Path) -> Optional[Path]:
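
Reviewer note: the content comparison described in the `_tui_need_npm_install` docstring can be exercised without touching the filesystem. This is a minimal sketch, assuming both `packages` maps are already parsed into dicts; `needs_install`, `RUNTIME_KEYS`, and the sample package names and versions are illustrative, not taken from the patch.

```python
from typing import Any, Dict

# Keys npm may write into the hidden lockfile at install time (per the patch).
RUNTIME_KEYS = frozenset({"ideallyInert"})


def needs_install(wanted: Dict[str, Any], installed: Dict[str, Any]) -> bool:
    """True when the installed tree is behind the root lockfile's packages map."""

    def comparable(pkg: Dict[str, Any]) -> Dict[str, Any]:
        return {k: v for k, v in pkg.items() if k not in RUNTIME_KEYS}

    for name, pkg in wanted.items():
        # The "" key is the root project itself; non-dict values are ignored.
        if not name or not isinstance(pkg, dict):
            continue
        if name not in installed:
            # npm may skip optional/peer packages on a given platform.
            if pkg.get("optional") or pkg.get("peer"):
                continue
            return True
        other = installed[name]
        if isinstance(other, dict) and comparable(pkg) != comparable(other):
            return True
    return False


wanted = {
    "": {"name": "root"},
    "node_modules/a": {"version": "1.0.0"},
    "node_modules/b": {"version": "2.0.0", "optional": True},
}
# Matches on content; carries a runtime annotation and a stale extra entry.
installed_ok = {
    "node_modules/a": {"version": "1.0.0", "ideallyInert": True},
    "node_modules/stale": {"version": "0.1.0"},
}
# Same package at an older version.
installed_old = {"node_modules/a": {"version": "0.9.0"}}

up_to_date = needs_install(wanted, installed_ok)
behind = needs_install(wanted, installed_old)
```

The first comparison reports no reinstall even though the installed map has an extra entry and a runtime annotation; the second reports a reinstall because the version field differs.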
@@ -1037,12 +1103,20 @@ def _launch_tui(
    """Replace current process with the TUI."""
    tui_dir = PROJECT_ROOT / "ui-tui"

+    import tempfile
+
    env = os.environ.copy()
+    active_session_fd, active_session_file = tempfile.mkstemp(
+        prefix="hermes-tui-active-session-", suffix=".json"
+    )
+    os.close(active_session_fd)
+    env["HERMES_TUI_ACTIVE_SESSION_FILE"] = active_session_file
    env["HERMES_PYTHON_SRC_ROOT"] = os.environ.get(
        "HERMES_PYTHON_SRC_ROOT", str(PROJECT_ROOT)
    )
    env.setdefault("HERMES_PYTHON", sys.executable)
    env.setdefault("HERMES_CWD", os.getcwd())
+    env.setdefault("NODE_ENV", "development" if tui_dev else "production")
    if model:
        env["HERMES_MODEL"] = model
        env["HERMES_INFERENCE_MODEL"] = model
@@ -1064,13 +1138,20 @@ def _launch_tui(
        env["HERMES_TUI_RESUME"] = resume_session_id

    argv, cwd = _make_tui_argv(tui_dir, tui_dev)
+    code: Optional[int] = None
    try:
-        code = subprocess.call(argv, cwd=str(cwd), env=env)
-    except KeyboardInterrupt:
-        code = 130
+        try:
+            code = subprocess.call(argv, cwd=str(cwd), env=env)
+        except KeyboardInterrupt:
+            code = 130

-    if code in (0, 130):
-        _print_tui_exit_summary(resume_session_id)
+        if code in (0, 130):
+            _print_tui_exit_summary(resume_session_id, active_session_file)
+    finally:
+        try:
+            os.unlink(active_session_file)
+        except OSError:
+            pass

    sys.exit(code)
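
Reviewer note: the active-session handoff can be sketched on its own. The parent creates an empty temp file, the child is expected to write the current session id into it, and the parent reads it back before unlinking. This is a sketch under assumptions: the writer side is not part of this hunk, so the JSON payload written below is a hypothetical stand-in for the TUI process, and `read_active_session` only mirrors the reader added in the patch.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def read_active_session(path: Optional[str]) -> Optional[str]:
    """Return the session id stored in the handoff file, or None on any problem."""
    if not path:
        return None
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
        sid = str(data.get("session_id") or "").strip()
        return sid or None
    except Exception:
        # Missing, empty, or malformed file: callers fall back to other sources.
        return None


fd, handoff = tempfile.mkstemp(prefix="hermes-tui-active-session-", suffix=".json")
os.close(fd)  # only the path is needed; the child reopens it by name
try:
    # Nothing written yet: the empty file fails JSON parsing and yields None.
    empty_result = read_active_session(handoff)

    # Hypothetical stand-in for the child process recording its session.
    Path(handoff).write_text(
        json.dumps({"session_id": "example-session"}), encoding="utf-8"
    )
    written_result = read_active_session(handoff)
finally:
    os.unlink(handoff)

# After cleanup the file is gone, so the reader degrades to None again.
missing_result = read_active_session(handoff)
```

Because every failure mode collapses to `None`, the exit summary can chain the file, the `--resume` id, and the database lookup with `or`, as the patched `_print_tui_exit_summary` does.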

@@ -1736,6 +1817,7 @@ def select_provider_and_model(args=None):
        "huggingface",
        "xiaomi",
        "arcee",
+        "gmi",
        "nvidia",
        "ollama-cloud",
    ):
@@ -2315,13 +2397,13 @@ def _model_flow_nous(config, current_model="", args=None):
    # The live /models endpoint returns hundreds of models; the curated list
    # shows only agentic models users recognize from OpenRouter.
    from hermes_cli.models import (
-        _PROVIDER_MODELS,
+        get_curated_nous_model_ids,
        get_pricing_for_provider,
        check_nous_free_tier,
        partition_nous_models_by_tier,
    )

-    model_ids = _PROVIDER_MODELS.get("nous", [])
+    model_ids = get_curated_nous_model_ids()
    if not model_ids:
        print("No curated models available for Nous Portal.")
        return
@@ -3331,7 +3413,26 @@ def _model_flow_named_custom(config, provider_info):
            provider_entry = providers_cfg.get(provider_key)
            if isinstance(provider_entry, dict):
                provider_entry["default_model"] = model_name
-                if config_api_key and not str(provider_entry.get("api_key", "") or "").strip():
+                # Only persist an inline api_key when the user originally had
+                # one (either a literal secret or a ``${VAR}`` template). When
+                # the entry relies on ``key_env``, do not synthesize a
+                # ``${key_env}`` api_key — the runtime already resolves the
+                # key from ``key_env`` directly, and writing the resolved
+                # secret (or even a synthesized template) would silently
+                # downgrade credential hygiene on entries that intentionally
+                # keep plaintext out of ``config.yaml``. See issue #15803.
+                original_api_key_ref = str(
+                    provider_info.get("api_key_ref", "") or ""
+                ).strip()
+                original_api_key = str(
+                    provider_info.get("api_key", "") or ""
+                ).strip()
+                had_inline_api_key = bool(original_api_key_ref or original_api_key)
+                if (
+                    had_inline_api_key
+                    and config_api_key
+                    and not str(provider_entry.get("api_key", "") or "").strip()
+                ):
                    provider_entry["api_key"] = config_api_key
                if key_env and not str(provider_entry.get("key_env", "") or "").strip():
                    provider_entry["key_env"] = key_env
@@ -4412,8 +4513,14 @@ def _model_flow_api_key_provider(config, provider_id, current_model=""):
        from hermes_cli.models import fetch_ollama_cloud_models

        api_key_for_probe = existing_key or (get_env_value(key_env) if key_env else "")
+        # During setup, force a live refresh so the picker reflects newly
+        # released models (e.g. deepseek v4 flash, kimi k2.6) the moment
+        # the user enters their key — not an hour later when the disk
+        # cache TTL expires.
        model_list = fetch_ollama_cloud_models(
-            api_key=api_key_for_probe, base_url=effective_base
+            api_key=api_key_for_probe,
+            base_url=effective_base,
+            force_refresh=True,
        )
        if model_list:
            print(f"  Found {len(model_list)} model(s) from Ollama Cloud")
@@ -4780,6 +4887,37 @@ def cmd_webhook(args):
    webhook_command(args)


+def cmd_slack(args):
+    """Slack integration helpers.
+
+    Dispatches ``hermes slack <subcommand>``. Currently supports:
+      manifest — print or write a Slack app manifest with every gateway
+                 command registered as a first-class slash.
+    """
+    sub = getattr(args, "slack_command", None)
+    if sub in (None, ""):
+        # No subcommand — print usage hint.
+        print(
+            "usage: hermes slack <subcommand>\n"
+            "\n"
+            "subcommands:\n"
+            "  manifest   Generate a Slack app manifest with every gateway\n"
+            "             command registered as a native slash\n"
+            "\n"
+            "Run `hermes slack manifest -h` for details.",
+            file=sys.stderr,
+        )
+        return 1
+
+    if sub == "manifest":
+        from hermes_cli.slack_cli import slack_manifest_command
+
+        return slack_manifest_command(args)
+
+    print(f"Unknown slack subcommand: {sub}", file=sys.stderr)
+    return 1
+
+
 def cmd_hooks(args):
    """Shell-hook inspection and management."""
    from hermes_cli.hooks import hooks_command
@@ -4953,6 +5091,83 @@ def _gateway_prompt(prompt_text: str, default: str = "", timeout: float = 300.0)
    return default


+def _web_ui_build_needed(web_dir: Path) -> bool:
+    """Return True if the web UI dist is missing or stale.
+
+    Mirrors the staleness logic used by ``_tui_build_needed()`` for the TUI.
+    The Vite build outputs to ``hermes_cli/web_dist/`` (per vite.config.ts
+    outDir: "../hermes_cli/web_dist"), NOT to ``web/dist/``.  Uses the Vite
+    manifest as the sentinel because it is written last and therefore has the
+    newest mtime of any build output.
+    """
+    dist_dir = web_dir.parent / "hermes_cli" / "web_dist"
+    sentinel = dist_dir / ".vite" / "manifest.json"
+    if not sentinel.exists():
+        sentinel = dist_dir / "index.html"
+    if not sentinel.exists():
+        return True
+    dist_mtime = sentinel.stat().st_mtime
+    skip = frozenset({"node_modules", "dist"})
+    for dirpath, dirnames, filenames in os.walk(web_dir, topdown=True):
+        dirnames[:] = [d for d in dirnames if d not in skip]
+        for fn in filenames:
+            if fn.endswith((".ts", ".tsx", ".js", ".jsx", ".css", ".html", ".vue")):
+                if os.path.getmtime(os.path.join(dirpath, fn)) > dist_mtime:
+                    return True
+    for meta in (
+        "package.json",
+        "package-lock.json",
+        "yarn.lock",
+        "pnpm-lock.yaml",
+        "vite.config.ts",
+        "vite.config.js",
+    ):
+        mp = web_dir / meta
+        if mp.exists() and mp.stat().st_mtime > dist_mtime:
+            return True
+    return False
+
+
+def _run_npm_install_deterministic(
+    npm: str,
+    cwd: Path,
+    *,
+    extra_args: tuple[str, ...] = (),
+    capture_output: bool = True,
+) -> subprocess.CompletedProcess:
+    """Run a deterministic npm install that does not mutate ``package-lock.json``.
+
+    Prefers ``npm ci`` (strict, lockfile-preserving) when a lockfile is present;
+    falls back to ``npm install`` only if ``npm ci`` fails (e.g. lockfile out of
+    sync on a WIP checkout).  Without this, ``npm install`` on npm ≥ 10 silently
+    rewrites committed lockfiles (stripping ``"peer": true`` etc.), which leaves
+    the working tree dirty and causes the next ``hermes update`` to stash the
+    lockfile — repeatedly.
+    """
+    lockfile = cwd / "package-lock.json"
+    if lockfile.exists():
+        ci_cmd = [npm, "ci", *extra_args]
+        ci_result = subprocess.run(
+            ci_cmd,
+            cwd=cwd,
+            capture_output=capture_output,
+            text=True,
+            check=False,
+        )
+        if ci_result.returncode == 0:
+            return ci_result
+        # Fall through to `npm install` — lockfile may be out of sync on a
+        # WIP fork/branch, or `npm ci` may not be available on very old npm.
+    install_cmd = [npm, "install", *extra_args]
+    return subprocess.run(
+        install_cmd,
+        cwd=cwd,
+        capture_output=capture_output,
+        text=True,
+        check=False,
+    )
+
+
 def _build_web_ui(web_dir: Path, *, fatal: bool = False) -> bool:
    """Build the web UI frontend if npm is available.

@@ -4966,6 +5181,9 @@ def _build_web_ui(web_dir: Path, *, fatal: bool = False) -> bool:
    if not (web_dir / "package.json").exists():
        return True

+    if not _web_ui_build_needed(web_dir):
+        return True
+
    npm = shutil.which("npm")
    if not npm:
        if fatal:
@@ -4973,7 +5191,7 @@ def _build_web_ui(web_dir: Path, *, fatal: bool = False) -> bool:
            print("Install Node.js, then run:  cd web && npm install && npm run build")
        return not fatal
    print("→ Building web UI...")
-    r1 = subprocess.run([npm, "install", "--silent"], cwd=web_dir, capture_output=True)
+    r1 = _run_npm_install_deterministic(npm, web_dir, extra_args=("--silent",))
    if r1.returncode != 0:
        print(
            f"  {'✗' if fatal else '⚠'} Web UI npm install failed"
@@ -5684,12 +5902,10 @@ def _update_node_dependencies() -> None:
        if not (path / "package.json").exists():
            continue

-        result = subprocess.run(
-            [npm, "install", "--silent", "--no-fund", "--no-audit", "--progress=false"],
-            cwd=path,
-            capture_output=True,
-            text=True,
-            check=False,
+        result = _run_npm_install_deterministic(
+            npm,
+            path,
+            extra_args=("--silent", "--no-fund", "--no-audit", "--progress=false"),
        )
        if result.returncode == 0:
            print(f"  ✓ {label}")
@@ -5925,6 +6141,178 @@ def _cmd_update_check():
        print(f"  Run '{recommended_update_command()}' to install.")


+def _ensure_fhs_path_guard() -> None:
+    """Ensure /usr/local/bin is on PATH for RHEL-family root non-login shells.
+
+    Mirrors the post-symlink probe added to ``scripts/install.sh`` so that
+    existing FHS-layout root installs on RHEL/CentOS/Rocky/Alma 8+ get
+    repaired on ``hermes update`` without requiring a reinstall.  The
+    installer's assumption that ``/usr/local/bin`` is on PATH for every
+    standard shell breaks on those distros in non-login interactive shells
+    (su, sudo -s, tmux panes, some web terminals): /etc/bashrc doesn't
+    add /usr/local/bin and /root/.bash_profile doesn't either.  Symptom:
+    ``hermes`` prints ``command not found`` even though the symlink lives
+    at /usr/local/bin/hermes.
+
+    Silent no-op on: non-Linux, non-root, non-FHS installs, and any system
+    where ``bash -i -c 'command -v hermes'`` already resolves.  Idempotent.
+    """
+    if sys.platform != "linux":
+        return
+    try:
+        if os.geteuid() != 0:
+            return
+    except AttributeError:
+        return
+    # Only act when this is actually an FHS-layout install (command link at
+    # /usr/local/bin/hermes, code at /usr/local/lib/hermes-agent).
+    fhs_link = Path("/usr/local/bin/hermes")
+    if not fhs_link.is_symlink() and not fhs_link.exists():
+        return
+
+    # Probe a fresh non-login interactive bash the way the user will use it.
+    # ``bash -i -c`` sources ~/.bashrc but NOT ~/.bash_profile or /etc/profile,
+    # which is the exact scenario where RHEL root loses /usr/local/bin.
+    home = os.environ.get("HOME") or "/root"
+    try:
+        probe = subprocess.run(
+            ["env", "-i",
+             f"HOME={home}",
+             f"TERM={os.environ.get('TERM', 'dumb')}",
+             "bash", "-i", "-c", "command -v hermes"],
+            capture_output=True, text=True, timeout=10,
+        )
+    except (FileNotFoundError, subprocess.TimeoutExpired):
+        return  # no bash or probe hung — don't block update on this
+    if probe.returncode == 0:
+        return  # already on PATH, nothing to do
+
+    path_line = 'export PATH="/usr/local/bin:$PATH"'
+    path_comment = (
+        "# Hermes Agent — ensure /usr/local/bin is on PATH "
+        "(RHEL non-login shells)"
+    )
+    wrote_any = False
+    for candidate in (".bashrc", ".bash_profile"):
+        cfg = Path(home) / candidate
+        if not cfg.is_file():
+            continue
+        try:
+            existing = cfg.read_text(errors="replace")
+        except OSError:
+            continue
+        # Idempotency: skip if any uncommented PATH= line already references
+        # /usr/local/bin.  Mirrors the grep pattern used by install.sh.
+        already_guarded = any(
+            "/usr/local/bin" in line
+            and "PATH" in line
+            and not line.lstrip().startswith("#")
+            for line in existing.splitlines()
+        )
+        if already_guarded:
+            continue
+        try:
+            with cfg.open("a", encoding="utf-8") as f:
+                f.write("\n" + path_comment + "\n" + path_line + "\n")
+        except OSError as e:
+            print(f"  ⚠ Could not update {cfg}: {e}")
+            continue
+        print(f"  ✓ Added /usr/local/bin to PATH in {cfg}")
+        wrote_any = True
+    if wrote_any:
+        print("    (reload your shell or run 'source ~/.bashrc' to pick it up)")
+
+
+def _run_pre_update_backup(args) -> None:
+    """Create a full zip backup of HERMES_HOME before running the update.
+
+    Gated on ``updates.pre_update_backup`` in config (default false).  Off
+    by default because the zip can add minutes to every update on large
+    HERMES_HOME directories.  The ``--backup`` flag on ``hermes update``
+    opts in for a single run; ``--no-backup`` forces it off when config
+    has it enabled.  Never raises — a backup failure should not block the
+    update itself.
+    """
+    # CLI flags win over config.  --no-backup beats --backup if both are set.
+    if getattr(args, "no_backup", False):
+        print("◆ Pre-update backup: skipped (--no-backup)")
+        print()
+        return
+
+    force_backup = bool(getattr(args, "backup", False))
+
+    try:
+        from hermes_cli.config import load_config
+        cfg = load_config()
+    except Exception as exc:
+        logging.getLogger(__name__).debug("Could not load config for pre-update backup: %s", exc)
+        cfg = {}
+
+    updates_cfg = cfg.get("updates", {}) if isinstance(cfg, dict) else {}
+    enabled = updates_cfg.get("pre_update_backup", False)
+    keep = updates_cfg.get("backup_keep", 5)
+
+    if not enabled and not force_backup:
+        # Silent by default — the backup is off, most users don't need to
+        # hear about it on every update.  They can opt in via --backup
+        # or by flipping the config knob.
+        return
+
+    try:
+        from hermes_cli.backup import create_pre_update_backup
+    except Exception as exc:
+        print(f"⚠ Pre-update backup: could not load backup module ({exc}); continuing update.")
+        print()
+        return
+
+    print("◆ Creating pre-update backup...")
+    t0 = _time.monotonic()
+    try:
+        out_path = create_pre_update_backup(keep=int(keep))
+    except Exception as exc:  # defensive — helper already swallows, but just in case
+        print(f"  ⚠ Backup failed: {exc}")
+        print("  Continuing with update.")
+        print()
+        return
+
+    elapsed = _time.monotonic() - t0
+
+    if out_path is None:
+        print("  ⚠ Backup skipped (no files found or write failed); continuing update.")
+        print()
+        return
+
+    try:
+        size_bytes = out_path.stat().st_size
+    except OSError:
+        size_bytes = 0
+
+    # Human-readable size
+    size_str = f"{size_bytes} B"
+    for unit in ("KB", "MB", "GB"):
+        if size_bytes < 1024:
+            break
+        size_bytes /= 1024
+        size_str = f"{size_bytes:.1f} {unit}"
+
+    # Render path using display_hermes_home so the user sees ~/.hermes/...
+    try:
+        from hermes_constants import get_hermes_home, display_hermes_home
+        home = get_hermes_home()
+        try:
+            display_path = f"{display_hermes_home()}/{out_path.relative_to(home)}"
+        except ValueError:
+            display_path = str(out_path)
+    except Exception:
+        display_path = str(out_path)
+
+    print(f"  Saved:    {display_path} ({size_str}, {elapsed:.1f}s)")
+    print(f"  Restore:  hermes import {out_path}")
+    print("  Disable:  omit --backup (backups are off by default)")
+    print("            set updates.pre_update_backup: false in config.yaml")
+    print()
+
+
 def cmd_update(args):
    """Update Hermes Agent to the latest version.

@@ -5967,6 +6355,10 @@ def _cmd_update_impl(args, gateway_mode: bool):
    print("⚕ Updating Hermes Agent...")
    print()

+    # Pre-update backup — runs before any git/file mutation so users can
+    # always roll back to the exact state they had before this update.
+    _run_pre_update_backup(args)
+
    # Try git-based update first, fall back to ZIP download on Windows
    # when git file I/O is broken (antivirus, NTFS filter drivers, etc.)
    use_zip_update = False
@@ -6116,6 +6508,22 @@ def _cmd_update_impl(args, gateway_mode: bool):

        print(f"→ Found {commit_count} new commit(s)")

+        # Snapshot critical state (state.db, config, pairing JSONs, etc.)
+        # before pulling so a user can recover if something goes wrong.
+        # Issue #15733 reported missing pairing data after an update; even
+        # though `git pull` can't touch $HERMES_HOME, this is cheap
+        # belt-and-suspenders insurance and gives the user something to
+        # restore from via `/snapshot list` / `/snapshot restore <id>`.
+        try:
+            from hermes_cli.backup import create_quick_snapshot
+
+            snap_id = create_quick_snapshot(label="pre-update")
+            if snap_id:
+                print(f"  ✓ Pre-update snapshot: {snap_id}")
+        except Exception as exc:
+            # Never let a snapshot failure block an update.
+            logger.debug("Pre-update snapshot failed: %s", exc)
+
        print("→ Pulling updates...")
        update_succeeded = False
        try:
@@ -6368,6 +6776,13 @@ def _cmd_update_impl(args, gateway_mode: bool):
        print()
        print("✓ Update complete!")

+        # Repair RHEL-family root installs where /usr/local/bin isn't on PATH
+        # for non-login interactive shells.  No-op on every other platform.
+        try:
+            _ensure_fhs_path_guard()
+        except Exception as e:
+            logger.debug("FHS PATH guard check failed: %s", e)
+
        # Write exit code *before* the gateway restart attempt.
        # When running as ``hermes update --gateway`` (spawned by the gateway's
        # /update command), this process lives inside the gateway's systemd
@@ -7223,6 +7638,9 @@ Examples:
    hermes auth remove <p> <t>    Remove pooled credential by index, id, or label
    hermes auth reset <provider>  Clear exhaustion status for a provider
    hermes model                  Select default model
+    hermes fallback [list]        Show fallback provider chain
+    hermes fallback add           Add a fallback provider (same picker as `hermes model`)
+    hermes fallback remove        Remove a fallback provider from the chain
    hermes config                 View configuration
    hermes config edit            Edit config in $EDITOR
    hermes config set model gpt-4 Set a config value
@@ -7414,6 +7832,7 @@ For more help on a command:
            "kilocode",
            "xiaomi",
            "arcee",
+            "gmi",
            "nvidia",
        ],
        default=None,
@@ -7564,6 +7983,42 @@ For more help on a command:
    )
    model_parser.set_defaults(func=cmd_model)

+    # =========================================================================
+    # fallback command — manage the fallback provider chain
+    # =========================================================================
+    from hermes_cli.fallback_cmd import cmd_fallback
+
+    fallback_parser = subparsers.add_parser(
+        "fallback",
+        help="Manage fallback providers (tried when the primary model fails)",
+        description=(
+            "Manage the fallback provider chain.  Fallback providers are tried "
+            "in order when the primary model fails with rate-limit, overload, or "
+            "connection errors.  See: "
+            "https://hermes-agent.nousresearch.com/docs/user-guide/features/fallback-providers"
+        ),
+    )
+    fallback_subparsers = fallback_parser.add_subparsers(dest="fallback_command")
+    fallback_subparsers.add_parser(
+        "list",
+        aliases=["ls"],
+        help="Show the current fallback chain (default when no subcommand)",
+    )
+    fallback_subparsers.add_parser(
+        "add",
+        help="Pick a provider + model (same picker as `hermes model`) and append to the chain",
+    )
+    fallback_subparsers.add_parser(
+        "remove",
+        aliases=["rm"],
+        help="Pick an entry to delete from the chain",
+    )
+    fallback_subparsers.add_parser(
+        "clear",
+        help="Remove all fallback entries",
+    )
+    fallback_parser.set_defaults(func=cmd_fallback)
+
    # =========================================================================
    # gateway command
    # =========================================================================
@@ -7759,6 +8214,54 @@ For more help on a command:
    )
    whatsapp_parser.set_defaults(func=cmd_whatsapp)

+    # =========================================================================
+    # slack command
+    # =========================================================================
+    slack_parser = subparsers.add_parser(
+        "slack",
+        help="Slack integration helpers (manifest generation, etc.)",
+        description="Slack integration helpers for Hermes.",
+    )
+    slack_sub = slack_parser.add_subparsers(dest="slack_command")
+    slack_manifest = slack_sub.add_parser(
+        "manifest",
+        help="Print or write a Slack app manifest with every gateway command "
+             "registered as a native slash command (/btw, /stop, /model, ...)",
+        description=(
+            "Generate a Slack app manifest that registers every gateway "
+            "command in COMMAND_REGISTRY as a first-class Slack slash "
+            "command (matching Discord and Telegram parity). Paste the "
+            "output into Slack app config → Features → App Manifest → "
+            "Edit, then Save. Reinstall the app if Slack prompts for it."
+        ),
+    )
+    slack_manifest.add_argument(
+        "--write",
+        nargs="?",
+        const=True,
+        default=None,
+        metavar="PATH",
+        help="Write manifest to a file instead of stdout. With no PATH "
+             "writes to $HERMES_HOME/slack-manifest.json.",
+    )
+    slack_manifest.add_argument(
+        "--name",
+        default=None,
+        help='Bot display name (default: "Hermes")',
+    )
+    slack_manifest.add_argument(
+        "--description",
+        default=None,
+        help="Bot description shown in Slack's app directory.",
+    )
+    slack_manifest.add_argument(
+        "--slashes-only",
+        action="store_true",
+        help="Emit only the features.slash_commands array (for merging "
+             "into an existing manifest manually).",
+    )
+    slack_parser.set_defaults(func=cmd_slack)
+
    # =========================================================================
    # login command
    # =========================================================================
@@ -8390,11 +8893,17 @@ Examples:

    skills_install = skills_subparsers.add_parser("install", help="Install a skill")
    skills_install.add_argument(
-        "identifier", help="Skill identifier (e.g. openai/skills/skill-creator)"
+        "identifier",
+        help="Skill identifier (e.g. openai/skills/skill-creator) or a direct HTTP(S) URL to a SKILL.md file",
    )
    skills_install.add_argument(
        "--category", default="", help="Category folder to install into"
    )
+    skills_install.add_argument(
+        "--name",
+        default="",
+        help="Override the skill name (useful when installing from a URL whose SKILL.md has no `name:` frontmatter)",
+    )
    skills_install.add_argument(
        "--force", action="store_true", help="Install despite blocked scan verdict"
    )
@@ -8414,6 +8923,12 @@ Examples:
    skills_list.add_argument(
        "--source", default="all", choices=["all", "hub", "builtin", "local"]
    )
+    skills_list.add_argument(
+        "--enabled-only",
+        action="store_true",
+        help="Hide disabled skills. Use with -p <profile> to see exactly "
+             "which skills will load for that profile.",
+    )

    skills_check = skills_subparsers.add_parser(
        "check", help="Check installed hub skills for updates"
@@ -8920,7 +9435,7 @@ Examples:
        "--source", help="Filter by source (cli, telegram, discord, etc.)"
    )
    sessions_browse.add_argument(
-        "--limit", type=int, default=50, help="Max sessions to load (default: 50)"
+        "--limit", type=int, default=500, help="Max sessions to load (default: 500)"
    )

    def _confirm_prompt(prompt: str) -> bool:
@@ -9017,7 +9532,8 @@ Examples:
                ):
                    print("Cancelled.")
                    return
-            if db.delete_session(resolved_session_id):
+            sessions_dir = get_hermes_home() / "sessions"
+            if db.delete_session(resolved_session_id, sessions_dir=sessions_dir):
                print(f"Deleted session '{resolved_session_id}'.")
            else:
                print(f"Session '{args.session_id}' not found.")
@@ -9031,7 +9547,9 @@ Examples:
                ):
                    print("Cancelled.")
                    return
-            count = db.prune_sessions(older_than_days=days, source=args.source)
+            sessions_dir = get_hermes_home() / "sessions"
+            count = db.prune_sessions(older_than_days=days, source=args.source,
+                                      sessions_dir=sessions_dir)
            print(f"Pruned {count} session(s).")

        elif action == "rename":
@@ -9049,7 +9567,7 @@ Examples:
                print(f"Error: {e}")

        elif action == "browse":
-            limit = getattr(args, "limit", 50) or 50
+            limit = getattr(args, "limit", 500) or 500
            source = getattr(args, "source", None)
            _browse_exclude = None if source else ["tool"]
            sessions = db.list_sessions_rich(
@@ -9235,6 +9753,18 @@ Examples:
        default=False,
        help="Check whether an update is available without installing anything",
    )
+    update_parser.add_argument(
+        "--no-backup",
+        action="store_true",
+        default=False,
+        help="Skip the pre-update backup for this run (overrides updates.pre_update_backup)",
+    )
+    update_parser.add_argument(
+        "--backup",
+        action="store_true",
+        default=False,
+        help="Force a pre-update backup for this run (off by default; overrides updates.pre_update_backup)",
+    )
    update_parser.set_defaults(func=cmd_update)

    # =========================================================================
@@ -0,0 +1,329 @@
+"""Remote model catalog fetcher.
+
+The Hermes docs site hosts a JSON manifest of curated models for providers
+we want to update without shipping a release (currently OpenRouter and
+Nous Portal). This module fetches, validates, and caches that manifest,
+falling back to the in-repo hardcoded lists when the network is unavailable.
+
+Pipeline
+--------
+1. ``get_catalog()`` — returns a parsed manifest dict.
+   - Checks in-process cache (invalidated by TTL).
+   - Reads disk cache at ``~/.hermes/cache/model_catalog.json``.
+   - Fetches the master URL if disk cache is stale or missing.
+   - On any fetch failure, keeps using the stale cache (or empty dict).
+
+2. ``get_curated_openrouter_models()`` / ``get_curated_nous_models()`` —
+   thin accessors returning the shapes existing callers expect. Each
+   falls back to the in-repo hardcoded list on any lookup failure.
+
+Schema (version 1)
+------------------
+::
+
+    {
+      "version": 1,
+      "updated_at": "2026-04-25T22:00:00Z",
+      "metadata": {...},                # free-form
+      "providers": {
+        "openrouter": {
+          "metadata": {...},            # free-form
+          "models": [
+            {"id": "vendor/model", "description": "recommended",
+             "metadata": {...}}          # free-form, model-level
+          ]
+        },
+        "nous": {...}
+      }
+    }
+
+Unknown fields are ignored — extra metadata can be added at either level
+without bumping ``version``. ``version`` bumps are reserved for
+breaking changes (renaming ``providers``, changing ``models`` shape).
+"""
+
+from __future__ import annotations
+
+import json
+import logging
+import os
+import time
+import urllib.error
+import urllib.request
+from pathlib import Path
+from typing import Any
+
+from hermes_cli import __version__ as _HERMES_VERSION
+
+logger = logging.getLogger(__name__)
+
+# ---------------------------------------------------------------------------
+# Constants
+# ---------------------------------------------------------------------------
+
+DEFAULT_CATALOG_URL = (
+    "https://hermes-agent.nousresearch.com/docs/api/model-catalog.json"
+)
+DEFAULT_TTL_HOURS = 24
+DEFAULT_FETCH_TIMEOUT = 8.0
+SUPPORTED_SCHEMA_VERSION = 1
+
+_HERMES_USER_AGENT = f"hermes-cli/{_HERMES_VERSION}"
+
+# In-process cache to avoid repeated disk + parse work across multiple
+# calls within the same session. Invalidated by TTL against the disk file's
+# mtime, so calling code never has to think about this.
+_catalog_cache: dict[str, Any] | None = None
+_catalog_cache_source_mtime: float = 0.0
+
+
+# ---------------------------------------------------------------------------
+# Config
+# ---------------------------------------------------------------------------
+
+
+def _load_catalog_config() -> dict[str, Any]:
+    """Load the ``model_catalog`` config block with defaults filled in."""
+    try:
+        from hermes_cli.config import load_config
+        cfg = load_config() or {}
+    except Exception:
+        cfg = {}
+
+    raw = cfg.get("model_catalog")
+    if not isinstance(raw, dict):
+        raw = {}
+
+    return {
+        "enabled": bool(raw.get("enabled", True)),
+        "url": str(raw.get("url") or DEFAULT_CATALOG_URL),
+        "ttl_hours": float(raw.get("ttl_hours") or DEFAULT_TTL_HOURS),
+        "providers": raw.get("providers") if isinstance(raw.get("providers"), dict) else {},
+    }
+
+
+def _cache_path() -> Path:
+    """Return the disk cache path. Import lazily so tests can monkeypatch home."""
+    from hermes_constants import get_hermes_home
+    return get_hermes_home() / "cache" / "model_catalog.json"
+
+
+# ---------------------------------------------------------------------------
+# Fetch + validate + cache
+# ---------------------------------------------------------------------------
+
+
+def _fetch_manifest(url: str, timeout: float) -> dict[str, Any] | None:
+    """HTTP GET the manifest URL and return a parsed dict, or None on failure."""
+    try:
+        req = urllib.request.Request(
+            url,
+            headers={
+                "Accept": "application/json",
+                "User-Agent": _HERMES_USER_AGENT,
+            },
+        )
+        with urllib.request.urlopen(req, timeout=timeout) as resp:
+            data = json.loads(resp.read().decode())
+    except (urllib.error.URLError, TimeoutError, json.JSONDecodeError, OSError) as exc:
+        logger.info("model catalog fetch failed (%s): %s", url, exc)
+        return None
+    except Exception as exc:  # pragma: no cover — defensive
+        logger.info("model catalog fetch errored (%s): %s", url, exc)
+        return None
+
+    if not _validate_manifest(data):
+        logger.info("model catalog at %s failed schema validation", url)
+        return None
+
+    return data
+
+
+def _validate_manifest(data: Any) -> bool:
+    """Return True when ``data`` matches the minimum manifest shape."""
+    if not isinstance(data, dict):
+        return False
+    version = data.get("version")
+    if not isinstance(version, int) or not 1 <= version <= SUPPORTED_SCHEMA_VERSION:
+        # Future schema version we don't understand — refuse rather than
+        # guess. Older schemas (version < 1) aren't supported either.
+        return False
+    providers = data.get("providers")
+    if not isinstance(providers, dict):
+        return False
+    for pname, pblock in providers.items():
+        if not isinstance(pname, str) or not isinstance(pblock, dict):
+            return False
+        models = pblock.get("models")
+        if not isinstance(models, list):
+            return False
+        for m in models:
+            if not isinstance(m, dict):
+                return False
+            if not isinstance(m.get("id"), str) or not m["id"].strip():
+                return False
+    return True
+
+
+def _read_disk_cache() -> tuple[dict[str, Any] | None, float]:
+    """Return ``(data_or_none, mtime)``. mtime is 0 if file is missing."""
+    path = _cache_path()
+    try:
+        mtime = path.stat().st_mtime
+    except OSError:
+        return (None, 0.0)
+    try:
+        with open(path, encoding="utf-8") as fh:
+            data = json.load(fh)
+    except (OSError, json.JSONDecodeError):
+        return (None, 0.0)
+    if not _validate_manifest(data):
+        return (None, 0.0)
+    return (data, mtime)
+
+
+def _write_disk_cache(data: dict[str, Any]) -> None:
+    path = _cache_path()
+    try:
+        path.parent.mkdir(parents=True, exist_ok=True)
+        tmp = path.with_suffix(path.suffix + ".tmp")
+        with open(tmp, "w", encoding="utf-8") as fh:
+            json.dump(data, fh, indent=2)
+            fh.write("\n")
+        os.replace(tmp, path)
+    except OSError as exc:
+        logger.info("model catalog cache write failed: %s", exc)
+
+
+# ---------------------------------------------------------------------------
+# Public API
+# ---------------------------------------------------------------------------
+
+
+def get_catalog(*, force_refresh: bool = False) -> dict[str, Any]:
+    """Return the parsed model catalog manifest, or an empty dict on failure.
+
+    Callers should treat a missing provider/model as "use the in-repo fallback"
+    — never raise from this function so the CLI keeps working offline.
+    """
+    global _catalog_cache, _catalog_cache_source_mtime
+
+    cfg = _load_catalog_config()
+    if not cfg["enabled"]:
+        return {}
+
+    ttl_seconds = max(0.0, cfg["ttl_hours"] * 3600.0)
+
+    disk_data, disk_mtime = _read_disk_cache()
+    now = time.time()
+    disk_fresh = disk_data is not None and (now - disk_mtime) < ttl_seconds
+
+    # In-process cache hit: disk hasn't changed since we loaded it and is still fresh.
+    if (
+        not force_refresh
+        and _catalog_cache is not None
+        and disk_data is not None
+        and disk_mtime == _catalog_cache_source_mtime
+        and disk_fresh
+    ):
+        return _catalog_cache
+
+    # Disk is fresh enough — use it without a network hit.
+    if not force_refresh and disk_fresh and disk_data is not None:
+        _catalog_cache = disk_data
+        _catalog_cache_source_mtime = disk_mtime
+        return disk_data
+
+    # Need to (re)fetch. If it fails, fall back to any stale disk copy.
+    fetched = _fetch_manifest(cfg["url"], DEFAULT_FETCH_TIMEOUT)
+    if fetched is not None:
+        _write_disk_cache(fetched)
+        new_disk_data, new_mtime = _read_disk_cache()
+        if new_disk_data is not None:
+            _catalog_cache = new_disk_data
+            _catalog_cache_source_mtime = new_mtime
+            return new_disk_data
+        _catalog_cache = fetched
+        _catalog_cache_source_mtime = now
+        return fetched
+
+    if disk_data is not None:
+        _catalog_cache = disk_data
+        _catalog_cache_source_mtime = disk_mtime
+        return disk_data
+
+    return {}
+
+
+def _fetch_provider_override(provider: str) -> dict[str, Any] | None:
+    """If ``model_catalog.providers.<name>.url`` is set, fetch that instead."""
+    cfg = _load_catalog_config()
+    if not cfg["enabled"]:
+        return None
+    provider_cfg = cfg["providers"].get(provider)
+    if not isinstance(provider_cfg, dict):
+        return None
+    override_url = provider_cfg.get("url")
+    if not isinstance(override_url, str) or not override_url.strip():
+        return None
+    # Override fetches skip the disk cache because they're usually
+    # third-party self-hosted. Re-request on every call, bounded by
+    # DEFAULT_FETCH_TIMEOUT so a slow host can't block the picker for long.
+    return _fetch_manifest(override_url.strip(), DEFAULT_FETCH_TIMEOUT)
+
+
+def _get_provider_block(provider: str) -> dict[str, Any] | None:
+    """Return the provider's manifest block, respecting per-provider overrides."""
+    override = _fetch_provider_override(provider)
+    if override is not None:
+        block = override.get("providers", {}).get(provider)
+        if isinstance(block, dict):
+            return block
+
+    catalog = get_catalog()
+    if not catalog:
+        return None
+    block = catalog.get("providers", {}).get(provider)
+    return block if isinstance(block, dict) else None
+
+
+def get_curated_openrouter_models() -> list[tuple[str, str]] | None:
+    """Return OpenRouter's curated ``[(id, description), ...]`` from the manifest.
+
+    Returns ``None`` when the manifest is unavailable, so callers can fall
+    back to their hardcoded list.
+    """
+    block = _get_provider_block("openrouter")
+    if not block:
+        return None
+    out: list[tuple[str, str]] = []
+    for m in block.get("models", []):
+        mid = str(m.get("id") or "").strip()
+        if not mid:
+            continue
+        desc = str(m.get("description") or "")
+        out.append((mid, desc))
+    return out or None
+
+
+def get_curated_nous_models() -> list[str] | None:
+    """Return Nous Portal's curated list of model ids from the manifest.
+
+    Returns ``None`` when the manifest is unavailable.
+    """
+    block = _get_provider_block("nous")
+    if not block:
+        return None
+    out: list[str] = []
+    for m in block.get("models", []):
+        mid = str(m.get("id") or "").strip()
+        if mid:
+            out.append(mid)
+    return out or None
+
+
+def reset_cache() -> None:
+    """Clear the in-process cache. Used by tests and ``hermes model --refresh``."""
+    global _catalog_cache, _catalog_cache_source_mtime
+    _catalog_cache = None
+    _catalog_cache_source_mtime = 0.0
@@ -33,8 +33,6 @@ COPILOT_REASONING_EFFORTS_O_SERIES = ["low", "medium", "high"]
 # (model_id, display description shown in menus)
 OPENROUTER_MODELS: list[tuple[str, str]] = [
    ("moonshotai/kimi-k2.6",            "recommended"),
-    ("deepseek/deepseek-v4-pro",        ""),
-    ("deepseek/deepseek-v4-flash",      ""),
    ("anthropic/claude-opus-4.7",       ""),
    ("anthropic/claude-opus-4.6",       ""),
    ("anthropic/claude-sonnet-4.6",     ""),
@@ -111,8 +109,6 @@ def _codex_curated_models() -> list[str]:
 _PROVIDER_MODELS: dict[str, list[str]] = {
    "nous": [
        "moonshotai/kimi-k2.6",
-        "deepseek/deepseek-v4-pro",
-        "deepseek/deepseek-v4-flash",
        "xiaomi/mimo-v2.5-pro",
        "xiaomi/mimo-v2.5",
        "anthropic/claude-opus-4.7",
@@ -282,6 +278,14 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
        "trinity-large-preview",
        "trinity-mini",
    ],
+    "gmi": [
+        "zai-org/GLM-5.1-FP8",
+        "deepseek-ai/DeepSeek-V3.2",
+        "moonshotai/Kimi-K2.5",
+        "google/gemini-3.1-flash-lite-preview",
+        "anthropic/claude-sonnet-4.6",
+        "openai/gpt-5.4",
+    ],
    "opencode-zen": [
        "kimi-k2.5",
        "gpt-5.4-pro",
@@ -713,7 +717,6 @@ class ProviderEntry(NamedTuple):
    label: str
    tui_desc: str   # detailed description for `hermes model` TUI

-
 CANONICAL_PROVIDERS: list[ProviderEntry] = [
    ProviderEntry("nous",           "Nous Portal",              "Nous Portal (Nous Research subscription)"),
    ProviderEntry("openrouter",     "OpenRouter",               "OpenRouter (100+ models, pay-per-use)"),
@@ -739,6 +742,7 @@ CANONICAL_PROVIDERS: list[ProviderEntry] = [
    ProviderEntry("alibaba",        "Alibaba Cloud (DashScope)","Alibaba Cloud / DashScope Coding (Qwen + multi-provider)"),
    ProviderEntry("ollama-cloud",   "Ollama Cloud",             "Ollama Cloud (cloud-hosted open models — ollama.com)"),
    ProviderEntry("arcee",          "Arcee AI",                 "Arcee AI (Trinity models — direct API)"),
+    ProviderEntry("gmi",            "GMI Cloud",                "GMI Cloud (multi-model direct API)"),
    ProviderEntry("kilocode",       "Kilo Code",                "Kilo Code (Kilo Gateway API)"),
    ProviderEntry("opencode-zen",   "OpenCode Zen",             "OpenCode Zen (35+ curated models, pay-as-you-go)"),
    ProviderEntry("opencode-go",    "OpenCode Go",              "OpenCode Go (open models, $10/month subscription)"),
@@ -773,6 +777,8 @@ _PROVIDER_ALIASES = {
    "stepfun-coding-plan": "stepfun",
    "arcee-ai": "arcee",
    "arceeai": "arcee",
+    "gmi-cloud": "gmi",
+    "gmicloud": "gmi",
    "minimax-china": "minimax-cn",
    "minimax_cn": "minimax-cn",
    "claude": "anthropic",
@@ -876,7 +882,16 @@ def fetch_openrouter_models(
    if _openrouter_catalog_cache is not None and not force_refresh:
        return list(_openrouter_catalog_cache)

-    fallback = list(OPENROUTER_MODELS)
+    # Prefer the remotely-hosted catalog manifest; fall back to the in-repo
+    # snapshot when the manifest is unreachable. Both are curated lists that
+    # drive the picker; the OpenRouter live /v1/models filter (tool support,
+    # free pricing) is applied on top either way.
+    try:
+        from hermes_cli.model_catalog import get_curated_openrouter_models
+        remote = get_curated_openrouter_models()
+    except Exception:
+        remote = None
+    fallback = list(remote) if remote else list(OPENROUTER_MODELS)
    preferred_ids = [mid for mid, _ in fallback]

    try:
@@ -929,6 +944,24 @@ def model_ids(*, force_refresh: bool = False) -> list[str]:
    return [mid for mid, _ in fetch_openrouter_models(force_refresh=force_refresh)]


+def get_curated_nous_model_ids() -> list[str]:
+    """Return the curated Nous Portal model-id list.
+
+    Prefers the remotely-hosted catalog manifest (published under
+    ``website/static/api/model-catalog.json``); falls back to the in-repo
+    snapshot in ``_PROVIDER_MODELS["nous"]`` when the manifest is
+    unreachable. Always returns a list (never None).
+    """
+    try:
+        from hermes_cli.model_catalog import get_curated_nous_models
+        remote = get_curated_nous_models()
+    except Exception:
+        remote = None
+    if remote:
+        return list(remote)
+    return list(_PROVIDER_MODELS.get("nous", []))
+
+
 def _ai_gateway_model_is_free(pricing: Any) -> bool:
    """Return True if an AI Gateway model has $0 input AND output pricing."""
    if not isinstance(pricing, dict):
@@ -1826,6 +1859,19 @@ def provider_model_ids(provider: Optional[str], *, force_refresh: bool = False)
                    return live
            except Exception:
                pass
+    if normalized == "gmi":
+        try:
+            from hermes_cli.auth import resolve_api_key_provider_credentials
+
+            creds = resolve_api_key_provider_credentials("gmi")
+            api_key = str(creds.get("api_key") or "").strip()
+            base_url = str(creds.get("base_url") or "").strip()
+            if api_key and base_url:
+                live = fetch_api_models(api_key, base_url)
+                if live:
+                    return live
+        except Exception:
+            pass
    if normalized == "custom":
        base_url = _get_custom_base_url()
        if base_url:
@@ -2203,6 +2249,52 @@ def copilot_model_api_mode(
    return "chat_completions"


+# Azure Foundry model families that require the Responses API.  Azure
+# rejects /chat/completions against these deployments with
+# ``400 "The requested operation is unsupported."`` — the same payload Bob
+# Dobolina hit in April 2026 on ``gpt-5.3-codex`` while ``gpt-4o-pure`` on
+# the same endpoint worked fine.  Keep the patterns broad enough to cover
+# vendor-renamed deployments (e.g. ``gpt-5.3-codex``, ``gpt-5-codex``,
+# ``gpt-5.4``, ``o1-preview``) but tight enough to leave GPT-4 / 3.5 / Llama /
+# Mistral / Grok deployments on chat completions.
+_AZURE_FOUNDRY_RESPONSES_PREFIXES = (
+    "codex",       # codex-*, codex-mini
+    "gpt-5",       # gpt-5, gpt-5.x, gpt-5-codex, gpt-5.x-codex
+    "o1",          # o1, o1-preview, o1-mini
+    "o3",          # o3, o3-mini
+    "o4",          # o4, o4-mini
+)
+
+
+def azure_foundry_model_api_mode(model_name: Optional[str]) -> Optional[str]:
+    """Infer Azure Foundry api_mode from a deployment/model name.
+
+    Returns ``"codex_responses"`` when the model name matches a family that
+    only accepts the Responses API on Azure Foundry (GPT-5.x, codex, o1/o3/o4
+    reasoning models).  Returns ``None`` otherwise — the caller should fall
+    back to the configured/default api_mode (typically ``chat_completions``)
+    so GPT-4o, GPT-4 Turbo, Llama, Mistral, etc. keep working.
+
+    Intentionally does NOT return ``anthropic_messages``; Anthropic-style
+    Azure endpoints are disambiguated by URL (``/anthropic`` suffix) in
+    ``runtime_provider._detect_api_mode_for_url`` and by the user setting
+    ``model.api_mode: anthropic_messages`` explicitly.
+    """
+    raw = str(model_name or "").strip().lower()
+    if not raw:
+        return None
+    # Strip any vendor/ prefix a user may have copied from OpenRouter / Copilot.
+    if "/" in raw:
+        raw = raw.rsplit("/", 1)[-1]
+    # gpt-5-mini speaks chat completions on Copilot but Azure Foundry deploys
+    # the full gpt-5 family uniformly on Responses API — don't carve an
+    # exception here.
+    for prefix in _AZURE_FOUNDRY_RESPONSES_PREFIXES:
+        if raw.startswith(prefix):
+            return "codex_responses"
+    return None
+
+
 def normalize_opencode_model_id(provider_id: Optional[str], model_id: Optional[str]) -> str:
    """Normalize OpenCode config IDs to the bare model slug used in API requests."""
    provider = normalize_provider(provider_id)
@@ -9,6 +9,7 @@ from typing import Dict, Iterable, Optional, Set
 from hermes_cli.auth import get_nous_auth_status
 from hermes_cli.config import get_env_value, load_config
 from tools.managed_tool_gateway import is_managed_tool_gateway_ready
+from utils import is_truthy_value
 from tools.tool_backend_helpers import (
    fal_key_is_configured,
    has_direct_modal_credentials,
@@ -25,6 +26,13 @@ _DEFAULT_PLATFORM_TOOLSETS = {
 }


+def _uses_gateway(section: object) -> bool:
+    """Return True when a config section explicitly opts into the gateway."""
+    if not isinstance(section, dict):
+        return False
+    return is_truthy_value(section.get("use_gateway"), default=False)
+
+
 @dataclass(frozen=True)
 class NousFeatureState:
    key: str
@@ -262,11 +270,11 @@ def get_nous_subscription_features(
    # use_gateway flags — when True, the user explicitly opted into the
    # Tool Gateway via `hermes model`, so direct credentials should NOT
    # prevent gateway routing.
-    web_use_gateway = bool(web_cfg.get("use_gateway"))
-    tts_use_gateway = bool(tts_cfg.get("use_gateway"))
-    browser_use_gateway = bool(browser_cfg.get("use_gateway"))
+    web_use_gateway = _uses_gateway(web_cfg)
+    tts_use_gateway = _uses_gateway(tts_cfg)
+    browser_use_gateway = _uses_gateway(browser_cfg)
    image_gen_cfg = config.get("image_gen") if isinstance(config.get("image_gen"), dict) else {}
-    image_use_gateway = bool(image_gen_cfg.get("use_gateway"))
+    image_use_gateway = _uses_gateway(image_gen_cfg)

    direct_exa = bool(get_env_value("EXA_API_KEY"))
    direct_firecrawl = bool(get_env_value("FIRECRAWL_API_KEY") or get_env_value("FIRECRAWL_API_URL"))
@@ -601,10 +609,10 @@ def get_gateway_eligible_tools(
    # no direct keys exist — we only skip the prompt for tools where
    # use_gateway was explicitly set.
    opted_in = {
-        "web": bool((config.get("web") if isinstance(config.get("web"), dict) else {}).get("use_gateway")),
-        "image_gen": bool((config.get("image_gen") if isinstance(config.get("image_gen"), dict) else {}).get("use_gateway")),
-        "tts": bool((config.get("tts") if isinstance(config.get("tts"), dict) else {}).get("use_gateway")),
-        "browser": bool((config.get("browser") if isinstance(config.get("browser"), dict) else {}).get("use_gateway")),
+        "web": _uses_gateway(config.get("web")),
+        "image_gen": _uses_gateway(config.get("image_gen")),
+        "tts": _uses_gateway(config.get("tts")),
+        "browser": _uses_gateway(config.get("browser")),
    }

    unconfigured: list[str] = []
@@ -36,6 +36,7 @@ PLATFORMS: OrderedDict[str, PlatformInfo] = OrderedDict([
    ("wecom_callback", PlatformInfo(label="💬 WeCom Callback",  default_toolset="hermes-wecom-callback")),
    ("weixin",         PlatformInfo(label="💬 Weixin",          default_toolset="hermes-weixin")),
    ("qqbot",          PlatformInfo(label="💬 QQBot",           default_toolset="hermes-qqbot")),
+    ("yuanbao",        PlatformInfo(label="🤖 Yuanbao",         default_toolset="hermes-yuanbao")),
    ("webhook",        PlatformInfo(label="🔗 Webhook",         default_toolset="hermes-webhook")),
    ("api_server",     PlatformInfo(label="🌐 API Server",      default_toolset="hermes-api-server")),
    ("cron",           PlatformInfo(label="⏰ Cron",            default_toolset="hermes-cron")),
@@ -163,6 +163,12 @@ HERMES_OVERLAYS: Dict[str, HermesOverlay] = {
        base_url_override="https://api.arcee.ai/api/v1",
        base_url_env_var="ARCEE_BASE_URL",
    ),
+    "gmi": HermesOverlay(
+        transport="openai_chat",
+        extra_env_vars=("GMI_API_KEY",),
+        base_url_override="https://api.gmi-serving.com/v1",
+        base_url_env_var="GMI_BASE_URL",
+    ),
    "ollama-cloud": HermesOverlay(
        transport="openai_chat",
        base_url_env_var="OLLAMA_BASE_URL",
@@ -297,6 +303,10 @@ ALIASES: Dict[str, str] = {
    "arcee-ai": "arcee",
    "arceeai": "arcee",

+    # gmi
+    "gmi-cloud": "gmi",
+    "gmicloud": "gmi",
+
    # Local server aliases → virtual "local" concept (resolved via user config)
    "lmstudio": "lmstudio",
    "lm-studio": "lmstudio",
@@ -319,6 +329,7 @@ _LABEL_OVERRIDES: Dict[str, str] = {
    "copilot-acp": "GitHub Copilot ACP",
    "stepfun": "StepFun Step Plan",
    "xiaomi": "Xiaomi MiMo",
+    "gmi": "GMI Cloud",
    "local": "Local endpoint",
    "bedrock": "AWS Bedrock",
    "ollama-cloud": "Ollama Cloud",
@@ -231,6 +231,19 @@ def _resolve_runtime_from_pool_entry(
            configured_mode = _parse_api_mode(model_cfg.get("api_mode"))
            if configured_mode:
                api_mode = configured_mode
+        # Model-family inference for GPT-5.x / codex / o1-o4: Azure rejects
+        # /chat/completions on these with 400 "operation unsupported" — see
+        # azure_foundry_model_api_mode() for rationale.  Skip when the user
+        # explicitly picked anthropic_messages (Anthropic-style endpoint).
+        if effective_model and api_mode != "anthropic_messages":
+            try:
+                from hermes_cli.models import azure_foundry_model_api_mode
+
+                inferred = azure_foundry_model_api_mode(effective_model)
+            except Exception:
+                inferred = None
+            if inferred:
+                api_mode = inferred
        # For Anthropic-style endpoints, strip /v1 suffix
        if api_mode == "anthropic_messages":
            base_url = re.sub(r"/v1/?$", "", base_url)
@@ -608,6 +621,7 @@ def _resolve_azure_foundry_runtime(
    model_cfg: Dict[str, Any],
    explicit_api_key: Optional[str] = None,
    explicit_base_url: Optional[str] = None,
+    target_model: Optional[str] = None,
 ) -> Dict[str, Any]:
    """Resolve an Azure Foundry runtime entry.

@@ -628,6 +642,22 @@ def _resolve_azure_foundry_runtime(
        cfg_base_url = str(model_cfg.get("base_url") or "").strip().rstrip("/")
        cfg_api_mode = _parse_api_mode(model_cfg.get("api_mode")) or "chat_completions"

+    # Model-family inference: Azure Foundry deploys GPT-5.x / codex / o1-o4
+    # reasoning models as Responses-API-only.  Calling /chat/completions
+    # against them returns 400 "The requested operation is unsupported."
+    # Upgrade api_mode when the model name matches, unless the user has
+    # explicitly chosen anthropic_messages (Anthropic-style endpoint).
+    effective_model = str(target_model or model_cfg.get("default") or "").strip()
+    if effective_model and cfg_api_mode != "anthropic_messages":
+        try:
+            from hermes_cli.models import azure_foundry_model_api_mode
+
+            inferred = azure_foundry_model_api_mode(effective_model)
+        except Exception:
+            inferred = None
+        if inferred:
+            cfg_api_mode = inferred
+
    env_base_url = os.getenv("AZURE_FOUNDRY_BASE_URL", "").strip().rstrip("/")
    base_url = explicit_base_url_clean or cfg_base_url or env_base_url
    if not base_url:
@@ -864,6 +894,7 @@ def resolve_runtime_provider(
            model_cfg=_get_model_config(),
            explicit_api_key=explicit_api_key,
            explicit_base_url=explicit_base_url,
+            target_model=target_model,
        )
        return azure_runtime

@@ -1856,27 +1856,32 @@ def _setup_slack():
    if existing:
        print_info("Slack: already configured")
        if not prompt_yes_no("Reconfigure Slack?", False):
+            # Even without reconfiguring, offer to refresh the manifest so
+            # new commands (e.g. /btw, /stop, ...) get registered in Slack.
+            if prompt_yes_no(
+                "Regenerate the Slack app manifest with the latest command "
+                "list? (recommended after `hermes update`)",
+                True,
+            ):
+                _write_slack_manifest_and_instruct()
            return

    print_info("Steps to create a Slack app:")
-    print_info("   1. Go to https://api.slack.com/apps → Create New App (from scratch)")
+    print_info("   1. Go to https://api.slack.com/apps → Create New App")
+    print_info("      Pick 'From an app manifest' — we'll generate one for you below.")
    print_info("   2. Enable Socket Mode: Settings → Socket Mode → Enable")
    print_info("      • Create an App-Level Token with 'connections:write' scope")
-    print_info("   3. Add Bot Token Scopes: Features → OAuth & Permissions")
-    print_info("      Required scopes: chat:write, app_mentions:read,")
-    print_info("      channels:history, channels:read, im:history,")
-    print_info("      im:read, im:write, users:read, files:read, files:write")
-    print_info("      Optional for private channels: groups:history")
-    print_info("   4. Subscribe to Events: Features → Event Subscriptions → Enable")
-    print_info("      Required events: message.im, message.channels, app_mention")
-    print_info("      Optional for private channels: message.groups")
-    print_warning("   ⚠ Without message.channels the bot will ONLY work in DMs,")
-    print_warning("     not public channels.")
-    print_info("   5. Install to Workspace: Settings → Install App")
-    print_info("   6. Reinstall the app after any scope or event changes")
-    print_info("   7. After installing, invite the bot to channels: /invite @YourBot")
+    print_info("   3. Install to Workspace: Settings → Install App")
+    print_info("   4. After installing, invite the bot to channels: /invite @YourBot")
    print()
    print_info("   Full guide: https://hermes-agent.nousresearch.com/docs/user-guide/messaging/slack/")
+    print()
+
+    # Generate and write manifest up-front so the user can paste it into
+    # the "Create from manifest" flow instead of clicking through scopes /
+    # events / slash commands one at a time.
+    _write_slack_manifest_and_instruct()
+
    print()
    bot_token = prompt("Slack Bot Token (xoxb-...)", password=True)
    if not bot_token:
@@ -1902,6 +1907,49 @@ def _setup_slack():
        print_info("   Set SLACK_ALLOW_ALL_USERS=true or GATEWAY_ALLOW_ALL_USERS=true only if you intentionally want open workspace access.")


+def _write_slack_manifest_and_instruct():
+    """Generate the Slack manifest, write it under HERMES_HOME, and print
+    paste-into-Slack instructions.
+
+    Exposed as its own helper so both the initial setup flow and the
+    "reconfigure? → no" branch can refresh the manifest without the user
+    re-entering tokens. Failures are non-fatal — if the manifest write
+    fails for any reason, we print a warning and skip rather than abort
+    the whole Slack setup.
+    """
+    try:
+        from hermes_cli.slack_cli import _build_full_manifest
+        from hermes_constants import get_hermes_home
+
+        manifest = _build_full_manifest(
+            bot_name="Hermes",
+            bot_description="Your Hermes agent on Slack",
+        )
+        target = Path(get_hermes_home()) / "slack-manifest.json"
+        target.parent.mkdir(parents=True, exist_ok=True)
+        import json as _json
+        target.write_text(
+            _json.dumps(manifest, indent=2, ensure_ascii=False) + "\n",
+            encoding="utf-8",
+        )
+        print_success(f"Slack app manifest written to: {target}")
+        print_info(
+            "   Paste it into https://api.slack.com/apps → your app → Features "
+            "→ App Manifest → Edit, then Save.  Slack will prompt to "
+            "reinstall if scopes or slash commands changed."
+        )
+        print_info(
+            "   Re-run `hermes slack manifest --write` anytime to refresh after "
+            "Hermes adds new commands."
+        )
+    except Exception as exc:  # pragma: no cover - best-effort UX helper
+        print_warning(f"Couldn't write Slack manifest: {exc}")
+        print_info(
+            "   You can generate it manually later with: "
+            "hermes slack manifest --write"
+        )
+
+
 def _setup_matrix():
    """Configure Matrix credentials."""
    print_header("Matrix")
@@ -2085,6 +2133,12 @@ def _setup_feishu():
    _gateway_setup_feishu()


+def _setup_yuanbao():
+    """Configure Yuanbao via gateway setup."""
+    from hermes_cli.gateway import _setup_yuanbao as _gateway_setup_yuanbao
+    _gateway_setup_yuanbao()
+
+
 def _setup_wecom():
    """Configure WeCom (Enterprise WeChat) via gateway setup."""
    from hermes_cli.gateway import _setup_wecom as _gateway_setup_wecom
@@ -2229,6 +2283,7 @@ _GATEWAY_PLATFORMS = [
    ("WhatsApp", "WHATSAPP_ENABLED", _setup_whatsapp),
    ("DingTalk", "DINGTALK_CLIENT_ID", _setup_dingtalk),
    ("Feishu / Lark", "FEISHU_APP_ID", _setup_feishu),
+    ("Yuanbao", "YUANBAO_APP_ID", _setup_yuanbao),
    ("WeCom (Enterprise WeChat)", "WECOM_BOT_ID", _setup_wecom),
    ("WeCom Callback (Self-Built App)", "WECOM_CALLBACK_CORP_ID", _setup_wecom_callback),
    ("Weixin (WeChat)", "WEIXIN_ACCOUNT_ID", _setup_weixin),
@@ -11,9 +11,10 @@ handler are thin wrappers that parse args and delegate.
 """

 import json
+import re
 import shutil
 from pathlib import Path
-from typing import Any, Dict, Optional
+from typing import Any, Dict, List, Optional

 from rich.console import Console
 from rich.panel import Panel
@@ -141,6 +142,103 @@ def _derive_category_from_install_path(install_path: str) -> str:
    return "" if parent == "." else parent


+# ---------------------------------------------------------------------------
+# Interactive name/category resolution for URL-installed skills
+# ---------------------------------------------------------------------------
+
+_VALID_NAME_RE = re.compile(r"^[a-z][a-z0-9_-]*$")
+_VALID_CATEGORY_RE = re.compile(r"^[a-z][a-z0-9_/-]*$")
+
+
+def _is_valid_installed_skill_name(name: str) -> bool:
+    """Accept identifier-shaped names, reject empty / sentinel-y values."""
+    if not isinstance(name, str):
+        return False
+    candidate = name.strip().lower()
+    if not candidate or candidate in {"skill", "readme", "index", "unnamed-skill"}:
+        return False
+    return bool(_VALID_NAME_RE.match(candidate))
+
+
+def _existing_categories() -> List[str]:
+    """Return sorted subdirectory names under ``~/.hermes/skills/`` that look
+    like category buckets (contain at least one ``SKILL.md`` somewhere below).
+
+    Used to suggest reusable categories when interactively installing from a
+    URL. Hidden dirs (``.hub``, ``.trash``) are skipped.
+    """
+    from tools.skills_hub import SKILLS_DIR
+    out: List[str] = []
+    try:
+        for entry in SKILLS_DIR.iterdir():
+            if not entry.is_dir() or entry.name.startswith("."):
+                continue
+            # Only count as a category if it contains skills, not if it IS a skill.
+            # Heuristic: if ``<entry>/SKILL.md`` exists, it's a skill at the
+            # top level (no category); otherwise treat as a category bucket.
+            if (entry / "SKILL.md").exists():
+                continue
+            # Has at least one nested SKILL.md?
+            try:
+                if any(entry.rglob("SKILL.md")):
+                    out.append(entry.name)
+            except OSError:
+                continue
+    except OSError:
+        return []
+    return sorted(set(out))
+
+
+def _prompt_for_skill_name(c: Console, url: str, default: str = "") -> Optional[str]:
+    """Prompt interactively for a skill name. Returns None on cancel/EOF."""
+    c.print()
+    c.print(
+        f"[yellow]The SKILL.md at {url} doesn't declare a `name:` in its "
+        f"frontmatter,[/]\n[yellow]and the URL path doesn't produce a valid "
+        f"identifier either.[/]"
+    )
+    default_hint = f" [{default}]" if default else ""
+    c.print(
+        f"[bold]Enter a skill name{default_hint}:[/] "
+        f"[dim](lowercase letters, digits, hyphens, underscores; starts with a letter)[/]"
+    )
+    try:
+        answer = input("Name: ").strip()
+    except (EOFError, KeyboardInterrupt):
+        return None
+    if not answer and default:
+        answer = default
+    if not _is_valid_installed_skill_name(answer):
+        c.print(f"[bold red]Invalid name:[/] {answer!r}. Aborting install.\n")
+        return None
+    return answer
+
+
+def _prompt_for_category(c: Console, existing: List[str]) -> str:
+    """Prompt interactively for a category. Empty/None input means flat install."""
+    c.print()
+    if existing:
+        c.print(
+            "[bold]Pick a category[/] "
+            "[dim](reuse an existing bucket, type a new one, or press Enter to install flat)[/]"
+        )
+        c.print(f"[dim]Existing: {', '.join(existing)}[/]")
+    else:
+        c.print(
+            "[bold]Category[/] [dim](optional — press Enter to install flat at ~/.hermes/skills/<name>/)[/]"
+        )
+    try:
+        answer = input("Category: ").strip()
+    except (EOFError, KeyboardInterrupt):
+        return ""
+    if not answer:
+        return ""
+    if not _VALID_CATEGORY_RE.match(answer):
+        c.print(f"[dim]Invalid category {answer!r} — installing flat.[/]")
+        return ""
+    return answer
+
+
 def do_search(query: str, source: str = "all", limit: int = 10,
              console: Optional[Console] = None) -> None:
    """Search registries and display results as a Rich table."""
@@ -309,8 +407,17 @@ def do_browse(page: int = 1, page_size: int = 20, source: str = "all",

 def do_install(identifier: str, category: str = "", force: bool = False,
               console: Optional[Console] = None, skip_confirm: bool = False,
-               invalidate_cache: bool = True) -> None:
-    """Fetch, quarantine, scan, confirm, and install a skill."""
+               invalidate_cache: bool = True,
+               name_override: str = "") -> None:
+    """Fetch, quarantine, scan, confirm, and install a skill.
+
+    ``name_override`` lets non-interactive callers (slash commands, gateway,
+    scripts) supply a skill name when the upstream SKILL.md lacks a valid
+    ``name:`` frontmatter field. On interactive TTY surfaces, a missing name
+    triggers a prompt instead; ``skip_confirm=True`` means "non-interactive"
+    (so pair it with ``name_override`` when installing from a URL that has
+    no frontmatter).
+    """
    from tools.skills_hub import (
        GitHubAuth, create_source_router, ensure_hub_dirs,
        quarantine_bundle, install_from_quarantine, HubLockFile,
@@ -354,6 +461,58 @@ def do_install(identifier: str, category: str = "", force: bool = False,
            c.print()
        return

+    # URL-sourced skills may arrive with an empty name when SKILL.md has no
+    # ``name:`` in frontmatter AND the URL path doesn't yield a valid
+    # identifier. Resolve by (1) --name override, (2) interactive prompt on
+    # a TTY, (3) refuse with an actionable error on non-interactive surfaces.
+    bundle_meta = getattr(bundle, "metadata", {}) or {}
+    if bundle.source == "url" and (not bundle.name or bundle_meta.get("awaiting_name")):
+        if name_override and _is_valid_installed_skill_name(name_override):
+            bundle.name = name_override.strip()
+            bundle_meta["awaiting_name"] = False
+        elif name_override:
+            c.print(
+                f"[bold red]Invalid --name:[/] {name_override!r}. "
+                "Must be a lowercase identifier (letters, digits, hyphens, "
+                "underscores; starts with a letter).\n"
+            )
+            return
+        elif skip_confirm:
+            # Non-interactive surface (slash command / TUI / gateway). Can't
+            # prompt — emit an actionable error.
+            url = bundle_meta.get("url") or identifier
+            c.print(
+                f"[bold red]Cannot install from URL:[/] {url}\n"
+                "[yellow]The SKILL.md has no `name:` in its frontmatter, "
+                "and the URL path doesn't produce a valid identifier.[/]\n\n"
+                "Retry with an explicit name:\n"
+                f"  [bold]/skills install {url} --name <your-name>[/]\n"
+                f"  [bold]hermes skills install {url} --name <your-name>[/]\n\n"
+                "[dim]Or ask the SKILL.md's author to add a `name:` field to "
+                "its YAML frontmatter.[/]\n"
+            )
+            return
+        else:
+            # Interactive TTY — prompt.
+            url = bundle_meta.get("url") or identifier
+            chosen = _prompt_for_skill_name(c, url)
+            if not chosen:
+                c.print("[dim]Installation cancelled.[/]\n")
+                return
+            bundle.name = chosen
+            bundle_meta["awaiting_name"] = False
+        # Keep SkillMeta in sync so downstream "already installed" checks,
+        # audit logs, and display all see the final name.
+        if meta is not None:
+            meta.name = bundle.name
+            meta.path = bundle.name
+
+    # URL-sourced skills: offer to pick a category interactively when the
+    # caller didn't specify one (TTY only — non-interactive installs fall
+    # through to flat install, matching all other sources).
+    if bundle.source == "url" and not category and not skip_confirm:
+        category = _prompt_for_category(c, _existing_categories())
+
    # Auto-detect category for official skills (e.g. "official/autonomous-ai-agents/blackbox")
    if bundle.source == "official" and not category:
        id_parts = bundle.identifier.split("/")  # ["official", "category", "skill"]
@@ -599,11 +758,24 @@ def inspect_skill(identifier: str) -> Optional[dict]:
    return out


-def do_list(source_filter: str = "all", console: Optional[Console] = None) -> None:
-    """List installed skills, distinguishing hub, builtin, and local skills."""
+def do_list(source_filter: str = "all",
+            enabled_only: bool = False,
+            console: Optional[Console] = None) -> None:
+    """List installed skills, distinguishing hub, builtin, and local skills.
+
+    Args:
+        source_filter: ``all`` | ``hub`` | ``builtin`` | ``local``.
+        enabled_only: If True, hide disabled skills from the output.
+
+    Enabled/disabled state is resolved against the currently active profile's
+    config — ``hermes -p <profile> skills list`` reads that profile's
+    ``skills.disabled`` list because ``-p`` swaps ``HERMES_HOME`` at process
+    start.  No explicit profile flag needed here.
+    """
    from tools.skills_hub import HubLockFile, ensure_hub_dirs
    from tools.skills_sync import _read_manifest
    from tools.skills_tool import _find_all_skills
+    from agent.skill_utils import get_disabled_skill_names

    c = console or _console
    ensure_hub_dirs()
@@ -611,17 +783,26 @@ def do_list(source_filter: str = "all", console: Optional[Console] = None) -> No
    hub_installed = {e["name"]: e for e in lock.list_installed()}
    builtin_names = set(_read_manifest())

-    all_skills = _find_all_skills()
+    # Pull ALL skills (including disabled ones) so we can annotate status.
+    all_skills = _find_all_skills(skip_disabled=True)
+    disabled_names = get_disabled_skill_names()

-    table = Table(title="Installed Skills")
+    title = "Installed Skills"
+    if enabled_only:
+        title += " (enabled only)"
+
+    table = Table(title=title)
    table.add_column("Name", style="bold cyan")
    table.add_column("Category", style="dim")
    table.add_column("Source", style="dim")
    table.add_column("Trust", style="dim")
+    table.add_column("Status", style="dim")

    hub_count = 0
    builtin_count = 0
    local_count = 0
+    enabled_count = 0
+    disabled_count = 0

    for skill in sorted(all_skills, key=lambda s: (s.get("category") or "", s["name"])):
        name = skill["name"]
@@ -632,29 +813,48 @@ def do_list(source_filter: str = "all", console: Optional[Console] = None) -> No
            source_type = "hub"
            source_display = hub_entry.get("source", "hub")
            trust = hub_entry.get("trust_level", "community")
-            hub_count += 1
        elif name in builtin_names:
            source_type = "builtin"
            source_display = "builtin"
            trust = "builtin"
-            builtin_count += 1
        else:
            source_type = "local"
            source_display = "local"
            trust = "local"
-            local_count += 1

        if source_filter != "all" and source_filter != source_type:
            continue

+        is_enabled = name not in disabled_names
+        if enabled_only and not is_enabled:
+            continue
+
+        if source_type == "hub":
+            hub_count += 1
+        elif source_type == "builtin":
+            builtin_count += 1
+        else:
+            local_count += 1
+
+        if is_enabled:
+            enabled_count += 1
+            status_cell = "[bold green]enabled[/]"
+        else:
+            disabled_count += 1
+            status_cell = "[dim red]disabled[/]"
+
        trust_style = {"builtin": "bright_cyan", "trusted": "green", "community": "yellow", "local": "dim"}.get(trust, "dim")
        trust_label = "official" if source_display == "official" else trust
-        table.add_row(name, category, source_display, f"[{trust_style}]{trust_label}[/]")
+        table.add_row(name, category, source_display, f"[{trust_style}]{trust_label}[/]", status_cell)

    c.print(table)
-    c.print(
-        f"[dim]{hub_count} hub-installed, {builtin_count} builtin, {local_count} local[/]\n"
-    )
+    summary = f"[dim]{hub_count} hub-installed, {builtin_count} builtin, {local_count} local"
+    if enabled_only:
+        summary += f" — {enabled_count} enabled shown"
+    else:
+        summary += f" — {enabled_count} enabled, {disabled_count} disabled"
+    summary += "[/]\n"
+    c.print(summary)


 def do_check(name: Optional[str] = None, console: Optional[Console] = None) -> None:
@@ -1123,11 +1323,15 @@ def skills_command(args) -> None:
        do_search(args.query, source=args.source, limit=args.limit)
    elif action == "install":
        do_install(args.identifier, category=args.category, force=args.force,
-                   skip_confirm=getattr(args, "yes", False))
+                   skip_confirm=getattr(args, "yes", False),
+                   name_override=getattr(args, "name", "") or "")
    elif action == "inspect":
        do_inspect(args.identifier)
    elif action == "list":
-        do_list(source_filter=args.source)
+        do_list(
+            source_filter=args.source,
+            enabled_only=getattr(args, "enabled_only", False),
+        )
    elif action == "check":
        do_check(name=getattr(args, "name", None))
    elif action == "update":
@@ -1177,6 +1381,7 @@ def handle_skills_slash(cmd: str, console: Optional[Console] = None) -> None:
        /skills search kubernetes
        /skills install openai/skills/skill-creator
        /skills install openai/skills/skill-creator --force
+        /skills install https://example.com/path/SKILL.md
        /skills inspect openai/skills/skill-creator
        /skills list
        /skills list --source hub
@@ -1253,10 +1458,11 @@ def handle_skills_slash(cmd: str, console: Optional[Console] = None) -> None:

    elif action == "install":
        if not args:
-            c.print("[bold red]Usage:[/] /skills install <identifier> [--category <cat>] [--force] [--now]\n")
+            c.print("[bold red]Usage:[/] /skills install <identifier-or-url> [--name <name>] [--category <cat>] [--force] [--now]\n")
            return
        identifier = args[0]
        category = ""
+        name_override = ""
        # Slash commands run inside prompt_toolkit where input() hangs.
        # Always skip confirmation — the user typing the command is implicit consent.
        skip_confirm = True
@@ -1267,9 +1473,11 @@ def handle_skills_slash(cmd: str, console: Optional[Console] = None) -> None:
        for i, a in enumerate(args):
            if a == "--category" and i + 1 < len(args):
                category = args[i + 1]
+            elif a == "--name" and i + 1 < len(args):
+                name_override = args[i + 1]
        do_install(identifier, category=category, force=force,
                   skip_confirm=skip_confirm, invalidate_cache=invalidate_cache,
-                   console=c)
+                   name_override=name_override, console=c)

    elif action == "inspect":
        if not args:
@@ -1279,11 +1487,12 @@ def handle_skills_slash(cmd: str, console: Optional[Console] = None) -> None:

    elif action == "list":
        source_filter = "all"
+        enabled_only = "--enabled-only" in args or "--enabled" in args
        if "--source" in args:
            idx = args.index("--source")
            if idx + 1 < len(args):
                source_filter = args[idx + 1]
-        do_list(source_filter=source_filter, console=c)
+        do_list(source_filter=source_filter, enabled_only=enabled_only, console=c)

    elif action == "check":
        name = args[0] if args else None
@@ -1371,7 +1580,8 @@ def _print_skills_help(console: Console) -> None:
        "  [cyan]search[/] <query>              Search registries for skills\n"
        "  [cyan]install[/] <identifier>        Install a skill (with security scan)\n"
        "  [cyan]inspect[/] <identifier>        Preview a skill without installing\n"
-        "  [cyan]list[/] [--source hub|builtin|local] List installed skills\n"
+        "  [cyan]list[/] [--source hub|builtin|local] [--enabled-only]\n"
+        "       List installed skills; --enabled-only filters to the active profile's live set\n"
        "  [cyan]check[/] [name]                Check hub skills for upstream updates\n"
        "  [cyan]update[/] [name]               Update hub skills with upstream changes\n"
        "  [cyan]audit[/] [name]                Re-scan hub skills for security\n"
@@ -0,0 +1,152 @@
+"""``hermes slack ...`` CLI subcommands.
+
+Today only ``hermes slack manifest`` is implemented — it generates the
+Slack app manifest JSON for registering every gateway command as a native
+Slack slash command (``/btw``, ``/stop``, ``/model``, …) so users get the same
+first-class slash UX Discord and Telegram already have.
+
+Typical workflow::
+
+    $ hermes slack manifest > slack-manifest.json
+    # or:
+    $ hermes slack manifest --write
+
+Then paste the printed JSON into the Slack app config (Features → App
+Manifest → Edit) and click Save. Slack diffs the manifest and prompts
+for reinstall when scopes/commands change.
+"""
+from __future__ import annotations
+
+import json
+import sys
+from pathlib import Path
+
+
+def _build_full_manifest(bot_name: str, bot_description: str) -> dict:
+    """Build a full Slack manifest merging display info + our slash list.
+
+    The slash-command list is always generated from ``COMMAND_REGISTRY`` so
+    it stays in sync with the rest of Hermes. Other manifest sections
+    (display info, OAuth scopes, socket mode) are set to sensible defaults
+    for a Hermes deployment — users can tweak them in the Slack UI after
+    pasting.
+    """
+    from hermes_cli.commands import slack_app_manifest
+
+    partial = slack_app_manifest()
+    slashes = partial["features"]["slash_commands"]
+
+    return {
+        "_metadata": {
+            "major_version": 1,
+            "minor_version": 1,
+        },
+        "display_information": {
+            "name": bot_name[:35],
+            "description": (bot_description or "Your Hermes agent on Slack")[:140],
+            "background_color": "#1a1a2e",
+        },
+        "features": {
+            "bot_user": {
+                "display_name": bot_name[:80],
+                "always_online": True,
+            },
+            "slash_commands": slashes,
+            "assistant_view": {
+                "assistant_description": "Chat with Hermes in threads and DMs.",
+            },
+        },
+        "oauth_config": {
+            "scopes": {
+                "bot": [
+                    "app_mentions:read",
+                    "assistant:write",
+                    "channels:history",
+                    "channels:read",
+                    "chat:write",
+                    "commands",
+                    "files:read",
+                    "files:write",
+                    "groups:history",
+                    "im:history",
+                    "im:read",
+                    "im:write",
+                    "users:read",
+                ],
+            },
+        },
+        "settings": {
+            "event_subscriptions": {
+                "bot_events": [
+                    "app_mention",
+                    "assistant_thread_context_changed",
+                    "assistant_thread_started",
+                    "message.channels",
+                    "message.groups",
+                    "message.im",
+                ],
+            },
+            "interactivity": {
+                "is_enabled": True,
+            },
+            "org_deploy_enabled": False,
+            "socket_mode_enabled": True,
+            "token_rotation_enabled": False,
+        },
+    }
+
+
+def slack_manifest_command(args) -> int:
+    """Print or write a Slack app manifest JSON.
+
+    Flags (all parsed in ``hermes_cli/main.py``):
+      --write [PATH]  Write to file instead of stdout (default path:
+                      ``$HERMES_HOME/slack-manifest.json``)
+      --name NAME     Override the bot display name (default: "Hermes")
+      --description DESC  Override the bot description
+      --slashes-only  Emit only the ``features.slash_commands`` array (for
+                      merging into an existing manifest manually)
+    """
+    name = getattr(args, "name", None) or "Hermes"
+    description = getattr(args, "description", None) or "Your Hermes agent on Slack"
+
+    if getattr(args, "slashes_only", False):
+        from hermes_cli.commands import slack_app_manifest
+
+        manifest = slack_app_manifest()["features"]["slash_commands"]
+    else:
+        manifest = _build_full_manifest(name, description)
+
+    payload = json.dumps(manifest, indent=2, ensure_ascii=False) + "\n"
+
+    write_target = getattr(args, "write", None)
+    if write_target is not None:
+        if isinstance(write_target, bool) and write_target:
+            # --write with no value → default location
+            try:
+                from hermes_constants import get_hermes_home
+
+                target = Path(get_hermes_home()) / "slack-manifest.json"
+            except Exception:
+                target = Path.home() / ".hermes" / "slack-manifest.json"
+        else:
+            target = Path(write_target).expanduser()
+        target.parent.mkdir(parents=True, exist_ok=True)
+        target.write_text(payload, encoding="utf-8")
+        print(f"Slack manifest written to: {target}", file=sys.stderr)
+        print(
+            "\nNext steps:\n"
+            "  1. Open https://api.slack.com/apps and pick your Hermes app\n"
+            "     (or create a new one: Create New App → From an app manifest).\n"
+            "  2. Features → App Manifest → paste the contents of\n"
+            f"     {target}\n"
+            "  3. Save; Slack will prompt to reinstall the app if scopes or\n"
+            "     slash commands changed.\n"
+            "  4. Make sure Socket Mode is enabled and you have a bot token\n"
+            "     (xoxb-...) and app token (xapp-...) configured via\n"
+            "     `hermes setup`.\n",
+            file=sys.stderr,
+        )
+    else:
+        sys.stdout.write(payload)
+    return 0
@@ -326,7 +326,8 @@ def show_status(args):
        "WeCom Callback": ("WECOM_CALLBACK_CORP_ID", None),
        "Weixin": ("WEIXIN_ACCOUNT_ID", "WEIXIN_HOME_CHANNEL"),
        "BlueBubbles": ("BLUEBUBBLES_SERVER_URL", "BLUEBUBBLES_HOME_CHANNEL"),
-        "QQBot": ("QQ_APP_ID", "QQBOT_HOME_CHANNEL"),
+        "QQBot": ("QQ_APP_ID", "QQ_HOME_CHANNEL"),
+        "Yuanbao": ("YUANBAO_APP_ID", "YUANBAO_HOME_CHANNEL"),
    }
    
    for name, (token_var, home_var) in platforms.items():
@@ -20,10 +20,10 @@ def get_provider_request_timeout(

    try:
        from hermes_cli.config import load_config
-    except ImportError:
+        config = load_config()
+    except Exception:
        return None

-    config = load_config()
    providers = config.get("providers", {}) if isinstance(config, dict) else {}
    provider_config = (
        providers.get(provider_id, {}) if isinstance(providers, dict) else {}
@@ -49,10 +49,10 @@ def get_provider_stale_timeout(

    try:
        from hermes_cli.config import load_config
-    except ImportError:
+        config = load_config()
+    except Exception:
        return None

-    config = load_config()
    providers = config.get("providers", {}) if isinstance(config, dict) else {}
    provider_config = (
        providers.get(provider_id, {}) if isinstance(providers, dict) else {}
@@ -10,8 +10,7 @@ import random

 TIPS = [
    # --- Slash Commands ---
-    "/btw <question> asks a quick side question without tools or history — great for clarifications.",
-    "/background <prompt> runs a task in a separate session while your current one stays free.",
+    "/background <prompt> (alias /bg or /btw) runs a task in a separate session while your current one stays free.",
    "/branch forks the current session so you can explore a different direction without losing progress.",
    "/compress manually compresses conversation context when things get long.",
    "/rollback lists filesystem checkpoints — restore files the agent modified to any prior state.",
@@ -107,7 +106,7 @@ TIPS = [
    "Set display.streaming: true to see tokens appear in real time as the model generates.",
    "Set display.show_reasoning: true to watch the model's chain-of-thought reasoning.",
    "Set display.compact: true to reduce whitespace in output for denser information.",
-    "Set display.busy_input_mode: queue to queue messages instead of interrupting the agent.",
+    "Set display.busy_input_mode: queue to queue messages instead of interrupting the agent, or steer to inject them mid-run via /steer.",
    "Set display.resume_display: minimal to skip the full conversation recap on session resume.",
    "Set compression.threshold: 0.50 to control when auto-compression fires (default: 50% of context).",
    "Set agent.max_turns: 200 to let the agent take more tool-calling steps per turn.",
@@ -11,6 +11,7 @@ the `platform_toolsets` key.

 import json as _json
 import logging
+import os
 import sys
 from pathlib import Path
 from typing import Dict, List, Optional, Set
@@ -25,7 +26,7 @@ from hermes_cli.nous_subscription import (
    get_nous_subscription_features,
 )
 from tools.tool_backend_helpers import fal_key_is_configured, managed_nous_tools_enabled
-from utils import base_url_hostname
+from utils import base_url_hostname, is_truthy_value

 logger = logging.getLogger(__name__)

@@ -70,6 +71,7 @@ CONFIGURABLE_TOOLSETS = [
    ("spotify",          "🎵 Spotify",                  "playback, search, playlists, library"),
    ("discord",         "💬 Discord (read/participate)", "fetch messages, search members, create thread"),
    ("discord_admin",   "🛡️  Discord Server Admin",    "list channels/roles, pin, assign roles"),
+    ("yuanbao",          "🤖 Yuanbao",                  "group info, member queries, DM"),
 ]

 # Toolsets that are OFF by default for new installs.
@@ -676,6 +678,15 @@ def _get_platform_tools(
        # their own platform (e.g. `discord` + `discord` should stay OFF).
        if platform in default_off and platform not in _TOOLSET_PLATFORM_RESTRICTIONS:
            default_off.remove(platform)
+        # Home Assistant is already runtime-gated by its check_fn (requires
+        # HASS_TOKEN to register any tools). When a user has configured
+        # HASS_TOKEN, they've explicitly opted in — don't also strip it via
+        # _DEFAULT_OFF_TOOLSETS, which would silently drop HA from platforms
+        # (e.g. cron) that run through _get_platform_tools without an
+        # explicit saved toolset list. Without this, Norbert's HA cron jobs
+        # regressed after #14798 made cron honor per-platform tool config.
+        if "homeassistant" in default_off and os.getenv("HASS_TOKEN"):
+            default_off.remove("homeassistant")
        enabled_toolsets -= default_off

    # Recover non-configurable platform toolsets (e.g. discord, feishu_doc,
@@ -1177,7 +1188,7 @@ def _is_provider_active(provider: dict, config: dict) -> bool:
                configured_provider = image_cfg.get("provider")
                if configured_provider not in (None, "", "fal"):
                    return False
-                if image_cfg.get("use_gateway") is False:
+                if image_cfg.get("use_gateway") is not None and not is_truthy_value(image_cfg.get("use_gateway"), default=False):
                    return False
            return feature.managed_by_nous
        if provider.get("tts_provider"):
@@ -1209,7 +1220,7 @@ def _is_provider_active(provider: dict, config: dict) -> bool:
        return (
            provider["imagegen_backend"] == "fal"
            and configured_provider in (None, "", "fal")
-            and not image_cfg.get("use_gateway")
+            and not is_truthy_value(image_cfg.get("use_gateway"), default=False)
        )
    return False

@@ -287,7 +287,7 @@ _SCHEMA_OVERRIDES: Dict[str, Dict[str, Any]] = {
    "display.busy_input_mode": {
        "type": "select",
        "description": "Input behavior while agent is running",
-        "options": ["interrupt", "queue"],
+        "options": ["interrupt", "queue", "steer"],
    },
    "memory.provider": {
        "type": "select",
@@ -2327,16 +2327,14 @@ def _resolve_chat_argv(
    from hermes_cli.main import PROJECT_ROOT, _make_tui_argv

    argv, cwd = _make_tui_argv(PROJECT_ROOT / "ui-tui", tui_dev=False)
-    env: Optional[dict] = None
+    env = os.environ.copy()
+    env.setdefault("NODE_ENV", "production")

-    if resume or sidecar_url:
-        env = os.environ.copy()
+    if resume:
+        env["HERMES_TUI_RESUME"] = resume

-        if resume:
-            env["HERMES_TUI_RESUME"] = resume
-
-        if sidecar_url:
-            env["HERMES_TUI_SIDECAR_URL"] = sidecar_url
+    if sidecar_url:
+        env["HERMES_TUI_SIDECAR_URL"] = sidecar_url

    return list(argv), str(cwd) if cwd else None, env

@@ -195,10 +195,6 @@ def setup_logging(
        The ``logs/`` directory where files are written.
    """
    global _logging_initialized
-    if _logging_initialized and not force:
-        home = hermes_home or get_hermes_home()
-        return home / "logs"
-
    home = hermes_home or get_hermes_home()
    log_dir = home / "logs"
    log_dir.mkdir(parents=True, exist_ok=True)
@@ -248,6 +244,9 @@ def setup_logging(
            log_filter=_ComponentFilter(COMPONENT_PREFIXES["gateway"]),
        )

+    if _logging_initialized and not force:
+        return log_dir
+
    # Ensure root logger level is low enough for the handlers to fire.
    if root.level == logging.NOTSET or root.level > level:
        root.setLevel(level)
@@ -22,6 +22,8 @@ import sqlite3
 import threading
 import time
 from pathlib import Path
+
+from agent.memory_manager import sanitize_context
 from hermes_constants import get_hermes_home
 from typing import Any, Callable, Dict, List, Optional, TypeVar

@@ -31,7 +33,7 @@ T = TypeVar("T")

 DEFAULT_DB_PATH = get_hermes_home() / "state.db"

-SCHEMA_VERSION = 9
+SCHEMA_VERSION = 10

 SCHEMA_SQL = """
 CREATE TABLE IF NOT EXISTS schema_version (
@@ -119,6 +121,32 @@ CREATE TRIGGER IF NOT EXISTS messages_fts_update AFTER UPDATE ON messages BEGIN
 END;
 """

+# Trigram FTS5 table for CJK substring search.  The default unicode61
+# tokenizer splits CJK characters into individual tokens, breaking phrase
+# matching.  The trigram tokenizer indexes overlapping 3-character sequences so
+# substring queries work natively for any script (CJK, Thai, etc.).
+FTS_TRIGRAM_SQL = """
+CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts_trigram USING fts5(
+    content,
+    content=messages,
+    content_rowid=id,
+    tokenize='trigram'
+);
+
+CREATE TRIGGER IF NOT EXISTS messages_fts_trigram_insert AFTER INSERT ON messages BEGIN
+    INSERT INTO messages_fts_trigram(rowid, content) VALUES (new.id, new.content);
+END;
+
+CREATE TRIGGER IF NOT EXISTS messages_fts_trigram_delete AFTER DELETE ON messages BEGIN
+    INSERT INTO messages_fts_trigram(messages_fts_trigram, rowid, content) VALUES('delete', old.id, old.content);
+END;
+
+CREATE TRIGGER IF NOT EXISTS messages_fts_trigram_update AFTER UPDATE ON messages BEGIN
+    INSERT INTO messages_fts_trigram(messages_fts_trigram, rowid, content) VALUES('delete', old.id, old.content);
+    INSERT INTO messages_fts_trigram(rowid, content) VALUES (new.id, new.content);
+END;
+"""
+

 class SessionDB:
    """
@@ -366,6 +394,18 @@ class SessionDB:
                except sqlite3.OperationalError:
                    pass  # Column already exists
                cursor.execute("UPDATE schema_version SET version = 9")
+            if current_version < 10:
+                # v10: trigram FTS5 table for CJK/substring search.
+                # Created via FTS_TRIGRAM_SQL below; backfill existing messages.
+                try:
+                    cursor.execute("SELECT * FROM messages_fts_trigram LIMIT 0")
+                except sqlite3.OperationalError:
+                    cursor.executescript(FTS_TRIGRAM_SQL)
+                    cursor.execute(
+                        "INSERT INTO messages_fts_trigram(rowid, content) "
+                        "SELECT id, content FROM messages WHERE content IS NOT NULL"
+                    )
+                cursor.execute("UPDATE schema_version SET version = 10")

        # Unique title index — always ensure it exists (safe to run after migrations
        # since the title column is guaranteed to exist at this point)
@@ -383,6 +423,12 @@ class SessionDB:
        except sqlite3.OperationalError:
            cursor.executescript(FTS_SQL)

+        # Trigram FTS5 for CJK/substring search
+        try:
+            cursor.execute("SELECT * FROM messages_fts_trigram LIMIT 0")
+        except sqlite3.OperationalError:
+            cursor.executescript(FTS_TRIGRAM_SQL)
+
        self._conn.commit()

    # =========================================================================
@@ -832,7 +878,18 @@ class SessionDB:
        params = []

        if not include_children:
-            where_clauses.append("s.parent_session_id IS NULL")
+            # Show root sessions and branch sessions (whose parent ended with
+            # end_reason='branched' before the child was created), while still
+            # hiding sub-agent runs and compression continuations (which also
+            # carry a parent_session_id but were spawned while the parent was
+            # still live — i.e., started_at < parent.ended_at).
+            where_clauses.append(
+                "(s.parent_session_id IS NULL"
+                " OR EXISTS (SELECT 1 FROM sessions p"
+                "            WHERE p.id = s.parent_session_id"
+                "            AND p.end_reason = 'branched'"
+                "            AND s.started_at >= p.ended_at))"
+            )

        if source:
            where_clauses.append("s.source = ?")
@@ -1121,23 +1178,33 @@ class SessionDB:
                current = child_id
        return session_id

-    def get_messages_as_conversation(self, session_id: str) -> List[Dict[str, Any]]:
+    def get_messages_as_conversation(
+        self, session_id: str, include_ancestors: bool = False
+    ) -> List[Dict[str, Any]]:
        """
        Load messages in the OpenAI conversation format (role + content dicts).
        Used by the gateway to restore conversation history.
        """
+        session_ids = [session_id]
+        if include_ancestors:
+            session_ids = self._session_lineage_root_to_tip(session_id)
+
        with self._lock:
-            cursor = self._conn.execute(
+            placeholders = ",".join("?" for _ in session_ids)
+            rows = self._conn.execute(
                "SELECT role, content, tool_call_id, tool_calls, tool_name, "
                "reasoning, reasoning_content, reasoning_details, codex_reasoning_items, "
                "codex_message_items "
-                "FROM messages WHERE session_id = ? ORDER BY timestamp, id",
-                (session_id,),
-            )
-            rows = cursor.fetchall()
+                f"FROM messages WHERE session_id IN ({placeholders}) ORDER BY timestamp, id",
+                tuple(session_ids),
+            ).fetchall()
+
        messages = []
        for row in rows:
-            msg = {"role": row["role"], "content": row["content"]}
+            content = row["content"]
+            if row["role"] in {"user", "assistant"} and isinstance(content, str):
+                content = sanitize_context(content).strip()
+            msg = {"role": row["role"], "content": content}
            if row["tool_call_id"]:
                msg["tool_call_id"] = row["tool_call_id"]
            if row["tool_name"]:
@@ -1174,9 +1241,47 @@ class SessionDB:
                    except (json.JSONDecodeError, TypeError):
                        logger.warning("Failed to deserialize codex_message_items, falling back to None")
                        msg["codex_message_items"] = None
+            if include_ancestors and self._is_duplicate_replayed_user_message(messages, msg):
+                continue
            messages.append(msg)
        return messages

+    def _session_lineage_root_to_tip(self, session_id: str) -> List[str]:
+        if not session_id:
+            return [session_id]
+
+        chain = []
+        current = session_id
+        seen = set()
+        with self._lock:
+            for _ in range(100):
+                if not current or current in seen:
+                    break
+                seen.add(current)
+                chain.append(current)
+                row = self._conn.execute(
+                    "SELECT parent_session_id FROM sessions WHERE id = ?",
+                    (current,),
+                ).fetchone()
+                if row is None:
+                    break
+                current = row["parent_session_id"] if hasattr(row, "keys") else row[0]
+        return list(reversed(chain)) or [session_id]
+
+    @staticmethod
+    def _is_duplicate_replayed_user_message(messages: List[Dict[str, Any]], msg: Dict[str, Any]) -> bool:
+        if msg.get("role") != "user":
+            return False
+        content = msg.get("content")
+        if not isinstance(content, str) or not content:
+            return False
+        for prev in reversed(messages):
+            if prev.get("role") == "user" and prev.get("content") == content:
+                return True
+            if prev.get("role") == "assistant" and (prev.get("content") or prev.get("tool_calls")):
+                return False
+        return False
+
    # =========================================================================
    # Search
    # =========================================================================
@@ -1235,6 +1340,16 @@ class SessionDB:
        return sanitized.strip()


+    @staticmethod
+    def _is_cjk_codepoint(cp: int) -> bool:
+        return (0x4E00 <= cp <= 0x9FFF or    # CJK Unified Ideographs
+                0x3400 <= cp <= 0x4DBF or    # CJK Extension A
+                0x20000 <= cp <= 0x2A6DF or  # CJK Extension B
+                0x3000 <= cp <= 0x303F or    # CJK Symbols
+                0x3040 <= cp <= 0x309F or    # Hiragana
+                0x30A0 <= cp <= 0x30FF or    # Katakana
+                0xAC00 <= cp <= 0xD7AF)      # Hangul Syllables
+
    @staticmethod
    def _contains_cjk(text: str) -> bool:
        """Check if text contains CJK (Chinese, Japanese, Korean) characters."""
@@ -1250,6 +1365,11 @@ class SessionDB:
                return True
        return False

+    @classmethod
+    def _count_cjk(cls, text: str) -> int:
+        """Count CJK characters in text."""
+        return sum(1 for ch in text if cls._is_cjk_codepoint(ord(ch)))
+
    def search_messages(
        self,
        query: str,
@@ -1320,52 +1440,113 @@ class SessionDB:
            LIMIT ? OFFSET ?
        """

-        with self._lock:
-            try:
-                cursor = self._conn.execute(sql, params)
-            except sqlite3.OperationalError:
-                # FTS5 query syntax error despite sanitization — return empty
-                # unless query contains CJK (fall back to LIKE below)
-                if not self._contains_cjk(query):
-                    return []
-                matches = []
-            else:
-                matches = [dict(row) for row in cursor.fetchall()]
-
-        # LIKE fallback for CJK queries: FTS5 default tokenizer splits CJK
-        # characters individually, causing multi-character queries to fail.
-        if not matches and self._contains_cjk(query):
+        # CJK queries bypass the unicode61 FTS5 table.  The default tokenizer
+        # splits CJK characters into individual tokens, so "大别山项目" becomes
+        # "大 AND 别 AND 山 AND 项 AND 目" — producing false positives and
+        # missing exact phrase matches.
+        #
+        # For queries with 3+ CJK characters, we use the trigram FTS5 table
+        # (indexed substring matching with ranking and snippets).  For shorter
+        # CJK queries (1-2 chars), trigram can't match (it needs at least
+        # 3 characters per term), so we fall back to LIKE.
+        is_cjk = self._contains_cjk(query)
+        if is_cjk:
            raw_query = query.strip('"').strip()
-            like_where = ["m.content LIKE ?"]
-            like_params: list = [f"%{raw_query}%"]
-            if source_filter is not None:
-                like_where.append(f"s.source IN ({','.join('?' for _ in source_filter)})")
-                like_params.extend(source_filter)
-            if exclude_sources is not None:
-                like_where.append(f"s.source NOT IN ({','.join('?' for _ in exclude_sources)})")
-                like_params.extend(exclude_sources)
-            if role_filter:
-                like_where.append(f"m.role IN ({','.join('?' for _ in role_filter)})")
-                like_params.extend(role_filter)
-            like_sql = f"""
-                SELECT m.id, m.session_id, m.role,
-                       substr(m.content,
-                              max(1, instr(m.content, ?) - 40),
-                              120) AS snippet,
-                       m.content, m.timestamp, m.tool_name,
-                       s.source, s.model, s.started_at AS session_started
-                FROM messages m
-                JOIN sessions s ON s.id = m.session_id
-                WHERE {' AND '.join(like_where)}
-                ORDER BY m.timestamp DESC
-                LIMIT ? OFFSET ?
-            """
-            like_params.extend([limit, offset])
-            # instr() parameter goes first in the bound list
-            like_params = [raw_query] + like_params
+            cjk_count = self._count_cjk(raw_query)
+
+            if cjk_count >= 3:
+                # Trigram FTS5 path — quote each non-operator token to handle
+                # FTS5 special chars (%, *, etc.) while preserving boolean
+                # operators (AND, OR, NOT) for multi-term queries.
+                tokens = raw_query.split()
+                parts = []
+                for tok in tokens:
+                    if tok.upper() in ("AND", "OR", "NOT"):
+                        parts.append(tok)
+                    else:
+                        parts.append('"' + tok.replace('"', '""') + '"')
+                trigram_query = " ".join(parts)
+                tri_where = ["messages_fts_trigram MATCH ?"]
+                tri_params: list = [trigram_query]
+                if source_filter is not None:
+                    tri_where.append(f"s.source IN ({','.join('?' for _ in source_filter)})")
+                    tri_params.extend(source_filter)
+                if exclude_sources is not None:
+                    tri_where.append(f"s.source NOT IN ({','.join('?' for _ in exclude_sources)})")
+                    tri_params.extend(exclude_sources)
+                if role_filter:
+                    tri_where.append(f"m.role IN ({','.join('?' for _ in role_filter)})")
+                    tri_params.extend(role_filter)
+                tri_sql = f"""
+                    SELECT
+                        m.id,
+                        m.session_id,
+                        m.role,
+                        snippet(messages_fts_trigram, 0, '>>>', '<<<', '...', 40) AS snippet,
+                        m.content,
+                        m.timestamp,
+                        m.tool_name,
+                        s.source,
+                        s.model,
+                        s.started_at AS session_started
+                    FROM messages_fts_trigram
+                    JOIN messages m ON m.id = messages_fts_trigram.rowid
+                    JOIN sessions s ON s.id = m.session_id
+                    WHERE {' AND '.join(tri_where)}
+                    ORDER BY rank
+                    LIMIT ? OFFSET ?
+                """
+                tri_params.extend([limit, offset])
+                with self._lock:
+                    try:
+                        tri_cursor = self._conn.execute(tri_sql, tri_params)
+                    except sqlite3.OperationalError:
+                        matches = []
+                    else:
+                        matches = [dict(row) for row in tri_cursor.fetchall()]
+            else:
+                # Short CJK query (1-2 chars) — trigram needs ≥3 CJK chars.
+                # Fall back to LIKE substring search.
+                escaped = raw_query.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
+                like_where = ["m.content LIKE ? ESCAPE '\\'"]
+                like_params: list = [f"%{escaped}%"]
+                if source_filter is not None:
+                    like_where.append(f"s.source IN ({','.join('?' for _ in source_filter)})")
+                    like_params.extend(source_filter)
+                if exclude_sources is not None:
+                    like_where.append(f"s.source NOT IN ({','.join('?' for _ in exclude_sources)})")
+                    like_params.extend(exclude_sources)
+                if role_filter:
+                    like_where.append(f"m.role IN ({','.join('?' for _ in role_filter)})")
+                    like_params.extend(role_filter)
+                like_sql = f"""
+                    SELECT m.id, m.session_id, m.role,
+                           substr(m.content,
+                                  max(1, instr(m.content, ?) - 40),
+                                  120) AS snippet,
+                           m.content, m.timestamp, m.tool_name,
+                           s.source, s.model, s.started_at AS session_started
+                    FROM messages m
+                    JOIN sessions s ON s.id = m.session_id
+                    WHERE {' AND '.join(like_where)}
+                    ORDER BY m.timestamp DESC
+                    LIMIT ? OFFSET ?
+                """
+                like_params.extend([limit, offset])
+                # instr() parameter goes first in the bound list
+                like_params = [raw_query] + like_params
+                with self._lock:
+                    like_cursor = self._conn.execute(like_sql, like_params)
+                    matches = [dict(row) for row in like_cursor.fetchall()]
+        else:
            with self._lock:
-                like_cursor = self._conn.execute(like_sql, like_params)
-                matches = [dict(row) for row in like_cursor.fetchall()]
+                try:
+                    cursor = self._conn.execute(sql, params)
+                except sqlite3.OperationalError:
+                    # FTS5 query syntax error despite sanitization — return empty
+                    return []
+                else:
+                    matches = [dict(row) for row in cursor.fetchall()]

        # Add surrounding context (1 message before + after each match).
        # Done outside the lock so we don't hold it across N sequential queries.
@@ -1425,16 +1606,32 @@ class SessionDB:
        limit: int = 20,
        offset: int = 0,
    ) -> List[Dict[str, Any]]:
-        """List sessions, optionally filtered by source."""
+        """List sessions, optionally filtered by source.
+
+        Returns rows enriched with a computed ``last_active`` column (latest
+        message timestamp for the session, falling back to ``started_at``),
+        ordered by most-recently-used first.
+        """
+        select_with_last_active = (
+            "SELECT s.*, COALESCE(m.last_active, s.started_at) AS last_active "
+            "FROM sessions s "
+            "LEFT JOIN ("
+            "SELECT session_id, MAX(timestamp) AS last_active "
+            "FROM messages GROUP BY session_id"
+            ") m ON m.session_id = s.id "
+        )
        with self._lock:
            if source:
                cursor = self._conn.execute(
-                    "SELECT * FROM sessions WHERE source = ? ORDER BY started_at DESC LIMIT ? OFFSET ?",
+                    f"{select_with_last_active}"
+                    "WHERE s.source = ? "
+                    "ORDER BY last_active DESC, s.started_at DESC, s.id DESC LIMIT ? OFFSET ?",
                    (source, limit, offset),
                )
            else:
                cursor = self._conn.execute(
-                    "SELECT * FROM sessions ORDER BY started_at DESC LIMIT ? OFFSET ?",
+                    f"{select_with_last_active}"
+                    "ORDER BY last_active DESC, s.started_at DESC, s.id DESC LIMIT ? OFFSET ?",
                    (limit, offset),
                )
            return [dict(row) for row in cursor.fetchall()]
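
To illustrate the ordering the new `list_sessions` query produces, here is a minimal standalone sketch against an in-memory SQLite database. The two-table schema and the sample rows are assumptions made for the example; only the SELECT text is taken from the diff.

```python
import sqlite3

# Hypothetical, stripped-down schema: just the columns the query touches.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    CREATE TABLE sessions (id TEXT PRIMARY KEY, source TEXT, started_at REAL);
    CREATE TABLE messages (session_id TEXT, timestamp REAL);
    INSERT INTO sessions VALUES ('old-but-active', 'cli', 100.0);
    INSERT INTO sessions VALUES ('new-no-messages', 'cli', 200.0);
    INSERT INTO sessions VALUES ('stale', 'cli', 50.0);
    INSERT INTO messages VALUES ('old-but-active', 300.0);
    INSERT INTO messages VALUES ('stale', 60.0);
""")

# Same SELECT as in the diff: latest message timestamp per session,
# falling back to started_at when a session has no messages yet.
rows = conn.execute(
    "SELECT s.*, COALESCE(m.last_active, s.started_at) AS last_active "
    "FROM sessions s "
    "LEFT JOIN ("
    "SELECT session_id, MAX(timestamp) AS last_active "
    "FROM messages GROUP BY session_id"
    ") m ON m.session_id = s.id "
    "ORDER BY last_active DESC, s.started_at DESC, s.id DESC LIMIT ? OFFSET ?",
    (20, 0),
).fetchall()

ordered = [(row["id"], row["last_active"]) for row in rows]
print(ordered)
```

The session started earliest but messaged most recently sorts first, and the session without messages is ranked by its `started_at` rather than dropping to the bottom.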
@@ -1501,12 +1698,45 @@ class SessionDB:
            )
        self._execute_write(_do)

-    def delete_session(self, session_id: str) -> bool:
+    @staticmethod
+    def _remove_session_files(sessions_dir: Optional[Path], session_id: str) -> None:
+        """Remove on-disk transcript files for a session.
+
+        Cleans up ``{session_id}.json``, ``{session_id}.jsonl``, and any
+        ``request_dump_{session_id}_*.json`` files left by the gateway.
+        Silently skips files that don't exist and swallows OSError so a
+        filesystem hiccup never blocks a DB operation.
+        """
+        if sessions_dir is None:
+            return
+        for suffix in (".json", ".jsonl"):
+            p = sessions_dir / f"{session_id}{suffix}"
+            try:
+                p.unlink(missing_ok=True)
+            except OSError:
+                pass
+        # request_dump files use session_id as a prefix component
+        try:
+            for p in sessions_dir.glob(f"request_dump_{session_id}_*.json"):
+                try:
+                    p.unlink(missing_ok=True)
+                except OSError:
+                    pass
+        except OSError:
+            pass
+
+    def delete_session(
+        self,
+        session_id: str,
+        sessions_dir: Optional[Path] = None,
+    ) -> bool:
        """Delete a session and all its messages.

        Child sessions are orphaned (parent_session_id set to NULL) rather
        than cascade-deleted, so they remain accessible independently.
-        Returns True if the session was found and deleted.
+        When *sessions_dir* is provided, also removes on-disk transcript
+        files (``.json`` / ``.jsonl`` / ``request_dump_*``) for the deleted
+        session. Returns True if the session was found and deleted.
        """
        def _do(conn):
            cursor = conn.execute(
@@ -1523,16 +1753,29 @@ class SessionDB:
            conn.execute("DELETE FROM messages WHERE session_id = ?", (session_id,))
            conn.execute("DELETE FROM sessions WHERE id = ?", (session_id,))
            return True
-        return self._execute_write(_do)

-    def prune_sessions(self, older_than_days: int = 90, source: str = None) -> int:
+        deleted = self._execute_write(_do)
+        if deleted:
+            self._remove_session_files(sessions_dir, session_id)
+        return deleted
+
+    def prune_sessions(
+        self,
+        older_than_days: int = 90,
+        source: Optional[str] = None,
+        sessions_dir: Optional[Path] = None,
+    ) -> int:
        """Delete sessions older than N days. Returns count of deleted sessions.

        Only prunes ended sessions (not active ones).  Child sessions outside
        the prune window are orphaned (parent_session_id set to NULL) rather
-        than cascade-deleted.
+        than cascade-deleted.  When *sessions_dir* is provided, also removes
+        on-disk transcript files (``.json`` / ``.jsonl`` /
+        ``request_dump_*``) for every pruned session, outside the DB
+        transaction.
        """
        cutoff = time.time() - (older_than_days * 86400)
+        removed_ids: list[str] = []

        def _do(conn):
            if source:
@@ -1562,9 +1805,14 @@ class SessionDB:
            for sid in session_ids:
                conn.execute("DELETE FROM messages WHERE session_id = ?", (sid,))
                conn.execute("DELETE FROM sessions WHERE id = ?", (sid,))
+                removed_ids.append(sid)
            return len(session_ids)

-        return self._execute_write(_do)
+        count = self._execute_write(_do)
+        # Clean up on-disk files outside the DB transaction
+        for sid in removed_ids:
+            self._remove_session_files(sessions_dir, sid)
+        return count

    # ── Meta key/value (for scheduler bookkeeping) ──
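
The on-disk cleanup can be exercised in isolation. The sketch below is a standalone copy of the `_remove_session_files` logic (renamed `remove_session_files` here, and not wired to `SessionDB`) run against a temporary directory; the file names are made up for the example.

```python
import tempfile
from pathlib import Path
from typing import Optional


def remove_session_files(sessions_dir: Optional[Path], session_id: str) -> None:
    """Illustrative copy of the cleanup logic; not the project's method."""
    if sessions_dir is None:
        return
    # Transcript files named exactly after the session id.
    for suffix in (".json", ".jsonl"):
        try:
            (sessions_dir / f"{session_id}{suffix}").unlink(missing_ok=True)
        except OSError:
            pass
    # Request dumps carry the session id as a prefix component.
    try:
        for dump in sessions_dir.glob(f"request_dump_{session_id}_*.json"):
            try:
                dump.unlink(missing_ok=True)
            except OSError:
                pass
    except OSError:
        pass


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("abc.json", "abc.jsonl", "request_dump_abc_001.json", "other.json"):
        (root / name).write_text("{}")
    remove_session_files(root, "abc")
    remaining = sorted(p.name for p in root.iterdir())

print(remaining)
```

One point that may be worth checking: because the dump pattern is a glob on `request_dump_{session_id}_*`, a session id that is a prefix of another id followed by `_` would also match the other session's dump files.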

@@ -1618,6 +1866,7 @@ class SessionDB:
        retention_days: int = 90,
        min_interval_hours: int = 24,
        vacuum: bool = True,
+        sessions_dir: Optional[Path] = None,
    ) -> Dict[str, Any]:
        """Idempotent auto-maintenance: prune old sessions + optional VACUUM.

@@ -1625,6 +1874,10 @@ class SessionDB:
        within ``min_interval_hours`` no-op. Designed to be called once at
        startup from long-lived entrypoints (CLI, gateway, cron scheduler).

+        When *sessions_dir* is provided, on-disk transcript files
+        (``.json`` / ``.jsonl`` / ``request_dump_*``) for pruned sessions
+        are removed as part of the same sweep (issue #3015).
+
        Never raises. On any failure, logs a warning and returns a dict
        with ``"error"`` set.

@@ -1648,7 +1901,10 @@ class SessionDB:
                except (TypeError, ValueError):
                    pass  # corrupt meta; treat as no prior run

-            pruned = self.prune_sessions(older_than_days=retention_days)
+            pruned = self.prune_sessions(
+                older_than_days=retention_days,
+                sessions_dir=sessions_dir,
+            )
            result["pruned"] = pruned

            # Only VACUUM if we actually freed rows — VACUUM on a tight DB
@@ -7,9 +7,7 @@
  perSystem = { pkgs, system, lib, ... }:
    let
      hermes-agent = inputs.self.packages.${system}.default;
-      hermesVenv = pkgs.callPackage ./python.nix {
-        inherit (inputs) uv2nix pyproject-nix pyproject-build-systems;
-      };
+      hermesVenv = hermes-agent.hermesVenv;

      configMergeScript = pkgs.callPackage ./configMergeScript.nix { };

@@ -193,6 +191,35 @@ json.dump(sorted(leaf_paths(DEFAULT_CONFIG)), sys.stdout, indent=2)
          echo "ok" > $out/result
        '';

+        # Verify extraPythonPackages PYTHONPATH injection
+        extra-python-packages = let
+          testPkg = pkgs.python312Packages.pyfiglet;
+          hermesWithExtra = hermes-agent.override {
+            extraPythonPackages = [ testPkg ];
+          };
+        in pkgs.runCommand "hermes-extra-python-packages" { } ''
+          set -e
+          echo "=== Checking extraPythonPackages PYTHONPATH injection ==="
+
+          grep -q "PYTHONPATH" ${hermesWithExtra}/bin/hermes || \
+            (echo "FAIL: PYTHONPATH not in wrapper"; exit 1)
+          echo "PASS: PYTHONPATH present in wrapper"
+
+          grep -q "${testPkg}" ${hermesWithExtra}/bin/hermes || \
+            (echo "FAIL: test package path not in PYTHONPATH"; exit 1)
+          echo "PASS: test package path found in wrapper"
+
+          echo "=== Checking base package has no PYTHONPATH ==="
+          if grep -q "PYTHONPATH" ${hermes-agent}/bin/hermes; then
+            echo "FAIL: base package should not have PYTHONPATH"; exit 1
+          fi
+          echo "PASS: base package clean"
+
+          echo "=== All extraPythonPackages checks passed ==="
+          mkdir -p $out
+          echo "ok" > $out/result
+        '';
+
        # ── Config merge + round-trip test ────────────────────────────────
        # Tests the merge script (Nix activation behavior) across 7
        # scenarios, then verifies Python's load_config() reads correctly.
@@ -0,0 +1,186 @@
+# nix/hermes-agent.nix — Overridable Hermes Agent package
+#
+# callPackage auto-wires nixpkgs args; flake inputs are passed explicitly.
+# Users override via: pkgs.hermes-agent.override { extraPythonPackages = [...]; }
+{
+  lib,
+  stdenv,
+  makeWrapper,
+  callPackage,
+  python312,
+  nodejs_22,
+  ripgrep,
+  git,
+  openssh,
+  ffmpeg,
+  tirith,
+  # Flake inputs — passed explicitly by packages.nix and overlays.nix
+  uv2nix,
+  pyproject-nix,
+  pyproject-build-systems,
+  npm-lockfile-fix,
+  # Overridable parameters
+  extraPythonPackages ? [ ],
+}:
+let
+  hermesVenv = callPackage ./python.nix {
+    inherit uv2nix pyproject-nix pyproject-build-systems;
+  };
+
+  hermesNpmLib = callPackage ./lib.nix {
+    inherit npm-lockfile-fix;
+  };
+
+  hermesTui = callPackage ./tui.nix {
+    inherit hermesNpmLib;
+  };
+
+  hermesWeb = callPackage ./web.nix {
+    inherit hermesNpmLib;
+  };
+
+  bundledSkills = lib.cleanSourceWith {
+    src = ../skills;
+    filter = path: _type: !(lib.hasInfix "/index-cache/" path);
+  };
+
+  runtimeDeps = [
+    nodejs_22
+    ripgrep
+    git
+    openssh
+    ffmpeg
+    tirith
+  ];
+
+  runtimePath = lib.makeBinPath runtimeDeps;
+
+  sitePackagesPath = python312.sitePackages;
+
+  # Walk propagatedBuildInputs to include transitive Python deps in PYTHONPATH.
+  # Without this, a plugin listing e.g. requests as a dep would fail at runtime
+  # if requests isn't already in the sealed uv2nix venv.
+  allExtraPythonPackages = python312.pkgs.requiredPythonModules extraPythonPackages;
+
+  pythonPath = lib.makeSearchPath sitePackagesPath allExtraPythonPackages;
+
+  pyprojectHash = builtins.hashString "sha256" (builtins.readFile ../pyproject.toml);
+  uvLockHash =
+    if builtins.pathExists ../uv.lock then
+      builtins.hashString "sha256" (builtins.readFile ../uv.lock)
+    else
+      "none";
+in
+stdenv.mkDerivation {
+  pname = "hermes-agent";
+  version = (builtins.fromTOML (builtins.readFile ../pyproject.toml)).project.version;
+
+  dontUnpack = true;
+  dontBuild = true;
+  nativeBuildInputs = [ makeWrapper ];
+
+  installPhase = ''
+    runHook preInstall
+
+    mkdir -p $out/share/hermes-agent $out/bin
+    cp -r ${bundledSkills} $out/share/hermes-agent/skills
+    cp -r ${hermesWeb} $out/share/hermes-agent/web_dist
+
+    mkdir -p $out/ui-tui
+    cp -r ${hermesTui}/lib/hermes-tui/* $out/ui-tui/
+
+    ${lib.concatMapStringsSep "\n"
+      (name: ''
+        makeWrapper ${hermesVenv}/bin/${name} $out/bin/${name} \
+          --suffix PATH : "${runtimePath}" \
+          --set HERMES_BUNDLED_SKILLS $out/share/hermes-agent/skills \
+          --set HERMES_WEB_DIST $out/share/hermes-agent/web_dist \
+          --set HERMES_TUI_DIR $out/ui-tui \
+          --set HERMES_PYTHON ${hermesVenv}/bin/python3 \
+          --set HERMES_NODE ${nodejs_22}/bin/node \
+          ${lib.optionalString (extraPythonPackages != [ ]) ''--suffix PYTHONPATH : "${pythonPath}"''}
+      '')
+      [
+        "hermes"
+        "hermes-agent"
+        "hermes-acp"
+      ]
+    }
+
+    ${lib.optionalString (extraPythonPackages != [ ]) ''
+      echo "=== Checking for plugin/core package collisions ==="
+      ${hermesVenv}/bin/python3 -c "
+import pathlib, sys, re
+
+def canonical(name):
+    return re.sub(r'[-_.]+', '-', name).lower()
+
+# Collect core venv package names
+core = set()
+venv_sp = pathlib.Path('${hermesVenv}/${sitePackagesPath}')
+for di in venv_sp.glob('*.dist-info'):
+    meta = di / 'METADATA'
+    if meta.exists():
+        for line in meta.read_text().splitlines():
+            if line.startswith('Name:'):
+                core.add(canonical(line.split(':', 1)[1].strip()))
+                break
+
+# Check each extra package for collisions
+extras_dirs = [${lib.concatMapStringsSep ", " (p: "'${toString p}'") allExtraPythonPackages}]
+for edir in extras_dirs:
+    sp = pathlib.Path(edir) / '${sitePackagesPath}'
+    if not sp.exists():
+        continue
+    for di in sp.glob('*.dist-info'):
+        meta = di / 'METADATA'
+        if not meta.exists():
+            continue
+        for line in meta.read_text().splitlines():
+            if line.startswith('Name:'):
+                pkg = canonical(line.split(':', 1)[1].strip())
+                if pkg in core:
+                    print(f'ERROR: plugin package \"{pkg}\" collides with a package in hermes sealed venv', file=sys.stderr)
+                    print(f'  from: {di}', file=sys.stderr)
+                    print(f'  Remove this dependency from extraPythonPackages.', file=sys.stderr)
+                    sys.exit(1)
+                break
+
+print('No collisions found.')
+      "
+      echo "=== No collisions ==="
+    ''}
+
+    runHook postInstall
+  '';
+
+  passthru = {
+    inherit hermesTui hermesWeb hermesNpmLib hermesVenv;
+
+    devShellHook = ''
+      STAMP=".nix-stamps/hermes-agent"
+      STAMP_VALUE="${pyprojectHash}:${uvLockHash}"
+      if [ ! -f "$STAMP" ] || [ "$(cat "$STAMP")" != "$STAMP_VALUE" ]; then
+        echo "hermes-agent: installing Python dependencies..."
+        uv venv .venv --python ${python312}/bin/python3 2>/dev/null || true
+        source .venv/bin/activate
+        uv pip install -e ".[all]"
+        [ -d mini-swe-agent ] && uv pip install -e ./mini-swe-agent 2>/dev/null || true
+        [ -d tinker-atropos ] && uv pip install -e ./tinker-atropos 2>/dev/null || true
+        mkdir -p .nix-stamps
+        echo "$STAMP_VALUE" > "$STAMP"
+      else
+        source .venv/bin/activate
+        export HERMES_PYTHON=${hermesVenv}/bin/python3
+      fi
+    '';
+  };
+
+  meta = with lib; {
+    description = "AI agent with advanced tool-calling capabilities";
+    homepage = "https://github.com/NousResearch/hermes-agent";
+    mainProgram = "hermes";
+    license = licenses.mit;
+    platforms = platforms.unix;
+  };
+}
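
The collision check embedded in `installPhase` depends on comparing distribution names after normalization. A minimal sketch of that comparison follows; `find_collisions` and the package lists are hypothetical and only illustrate the `canonical()` helper from the diff.

```python
import re
from typing import Iterable, List


def canonical(name: str) -> str:
    # Same normalization as in the diff: collapse runs of '-', '_', '.'
    # into a single '-' and lowercase the result.
    return re.sub(r'[-_.]+', '-', name).lower()


def find_collisions(core_names: Iterable[str], extra_names: Iterable[str]) -> List[str]:
    """Hypothetical helper: names present in both sets after normalization."""
    core = {canonical(n) for n in core_names}
    extras = {canonical(n) for n in extra_names}
    return sorted(core & extras)


collisions = find_collisions(
    ["PyYAML", "typing_extensions"],      # stand-in for the sealed venv
    ["Typing.Extensions", "pyfiglet"],    # stand-in for extraPythonPackages
)
print(collisions)
```

Normalizing both sides means `typing_extensions` and `Typing.Extensions` are treated as the same distribution, which is the case the build-time check is meant to reject.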
@@ -28,6 +28,8 @@

  let
    cfg = config.services.hermes-agent;
+    effectivePackage = if cfg.extraPythonPackages == [ ] then cfg.package
+      else cfg.package.override { inherit (cfg) extraPythonPackages; };
    hermes-agent = inputs.self.packages.${pkgs.stdenv.hostPlatform.system}.default;

    # Deep-merge config type (from 0xrsydn/nix-hermes-agent)
@@ -456,6 +458,52 @@
        description = "Extra packages available on PATH.";
      };

+      extraPlugins = mkOption {
+        type = types.listOf types.package;
+        default = [ ];
+        description = ''
+          Directory-based plugin packages to symlink into the hermes plugins
+          directory. Each package should contain a plugin.yaml and __init__.py
+          at its root. Hermes discovers these automatically on startup.
+        '';
+        example = literalExpression ''
+          [
+            (pkgs.fetchFromGitHub {
+              owner = "stephenschoettler";
+              repo = "hermes-lcm";
+              name = "hermes-lcm";
+              rev = "v0.7.0";
+              hash = "sha256-...";
+            })
+          ]
+        '';
+      };
+
+      extraPythonPackages = mkOption {
+        type = types.listOf types.package;
+        default = [ ];
+        description = ''
+          Python packages to add to PYTHONPATH for entry-point plugin discovery.
+          These are pip-packaged plugins that register via the
+          hermes_agent.plugins entry-point group. Each package must be built
+          with the same Python interpreter as hermes (python312).
+        '';
+        example = literalExpression ''
+          [
+            (pkgs.python312Packages.buildPythonPackage {
+              pname = "rtk-hermes";
+              version = "1.0.0";
+              src = pkgs.fetchFromGitHub {
+                owner = "ogallotti";
+                repo = "rtk-hermes";
+                rev = "main";
+                hash = "sha256-...";
+              };
+            })
+          ]
+        '';
+      };
+
      restart = mkOption {
        type = types.str;
        default = "always";
@@ -570,7 +618,7 @@
      # so interactive shells share state (sessions, skills, cron) with the
      # gateway service instead of creating a separate ~/.hermes/.
      (lib.mkIf cfg.addToSystemPackages {
-        environment.systemPackages = [ cfg.package ];
+        environment.systemPackages = [ effectivePackage ];
        environment.variables.HERMES_HOME = "${cfg.stateDir}/.hermes";
      })

@@ -581,6 +629,16 @@
        });
      })

+      # ── Assertions ─────────────────────────────────────────────────────
+      {
+        assertions = let
+          names = map lib.getName cfg.extraPlugins;
+        in [{
+          assertion = (lib.length names) == (lib.length (lib.unique names));
+          message = "services.hermes-agent.extraPlugins: duplicate plugin names detected: ${toString names}. If using fetchFromGitHub, set name = \"plugin-name\" to disambiguate.";
+        }];
+      }
+
      # ── Warnings ──────────────────────────────────────────────────────
      (lib.mkIf (cfg.container.enable && !cfg.addToSystemPackages && cfg.container.hostUsers != []) {
        warnings = [
@@ -602,6 +660,7 @@
          "d ${cfg.stateDir}/.hermes/sessions 2770 ${cfg.user} ${cfg.group} - -"
          "d ${cfg.stateDir}/.hermes/logs   2770 ${cfg.user} ${cfg.group} - -"
          "d ${cfg.stateDir}/.hermes/memories 2770 ${cfg.user} ${cfg.group} - -"
+          "d ${cfg.stateDir}/.hermes/plugins 2770 ${cfg.user} ${cfg.group} - -"
          "d ${cfg.stateDir}/home           0750 ${cfg.user} ${cfg.group} - -"
          "d ${cfg.workingDirectory}         2770 ${cfg.user} ${cfg.group} - -"
        ];
@@ -623,7 +682,7 @@
          find ${cfg.stateDir}/.hermes -maxdepth 1 \
            \( -name "*.db" -o -name "*.db-wal" -o -name "*.db-shm" -o -name "SOUL.md" \) \
            -exec chmod g+rw {} + 2>/dev/null || true
-          for _subdir in cron sessions logs memories; do
+          for _subdir in cron sessions logs memories plugins; do
            mkdir -p "${cfg.stateDir}/.hermes/$_subdir"
            chown ${cfg.user}:${cfg.group} "${cfg.stateDir}/.hermes/$_subdir"
            chmod 2770 "${cfg.stateDir}/.hermes/$_subdir"
@@ -732,6 +791,22 @@ HERMES_NIX_ENV_EOF
          ${lib.concatStringsSep "\n" (lib.mapAttrsToList (name: _value: ''
            install -o ${cfg.user} -g ${cfg.group} -m 0640 ${documentDerivation}/${name} ${cfg.workingDirectory}/${name}
          '') cfg.documents)}
+
+        # ── Declarative plugins ─────────────────────────────────────────
+        # Remove stale managed symlinks (plugins removed from config)
+        find ${cfg.stateDir}/.hermes/plugins -maxdepth 1 -type l -name 'nix-managed-*' -delete 2>/dev/null || true
+
+        ${lib.concatStringsSep "\n" (map (plugin:
+          let
+            name = lib.getName plugin;
+          in ''
+            if [ ! -f "${plugin}/plugin.yaml" ]; then
+              echo "ERROR: extraPlugins entry '${plugin}' has no plugin.yaml" >&2
+              exit 1
+            fi
+            ln -sfn ${plugin} ${cfg.stateDir}/.hermes/plugins/nix-managed-${name}
+            chown -h ${cfg.user}:${cfg.group} ${cfg.stateDir}/.hermes/plugins/nix-managed-${name}
+          '') cfg.extraPlugins)}
        '';
      }

@@ -762,7 +837,7 @@ HERMES_NIX_ENV_EOF
            # reads them at Python startup — no systemd EnvironmentFile needed.

            ExecStart = lib.concatStringsSep " " ([
-              "${cfg.package}/bin/hermes"
+              "${effectivePackage}/bin/hermes"
              "gateway"
            ] ++ cfg.extraArgs);

@@ -785,7 +860,7 @@ HERMES_NIX_ENV_EOF
          };

          path = [
-            cfg.package
+            effectivePackage
            pkgs.bash
            pkgs.coreutils
            pkgs.git
@@ -810,11 +885,11 @@ HERMES_NIX_ENV_EOF

          preStart = ''
            # Stable symlinks — container references these, not store paths directly
-            ln -sfn ${cfg.package} ${cfg.stateDir}/current-package
+            ln -sfn ${effectivePackage} ${cfg.stateDir}/current-package
            ln -sfn ${containerEntrypoint} ${cfg.stateDir}/current-entrypoint

            # GC roots so nix-collect-garbage doesn't remove store paths in use
-            ${pkgs.nix}/bin/nix-store --add-root ${cfg.stateDir}/.gc-root --indirect -r ${cfg.package} 2>/dev/null || true
+            ${pkgs.nix}/bin/nix-store --add-root ${cfg.stateDir}/.gc-root --indirect -r ${effectivePackage} 2>/dev/null || true
            ${pkgs.nix}/bin/nix-store --add-root ${cfg.stateDir}/.gc-root-entrypoint --indirect -r ${containerEntrypoint} 2>/dev/null || true

            # Check if container needs (re)creation
@@ -0,0 +1,10 @@
+# nix/overlays.nix — Expose pkgs.hermes-agent for external NixOS configs
+{ inputs, ... }:
+{
+  flake.overlays.default = final: _: {
+    hermes-agent = final.callPackage ./hermes-agent.nix {
+      inherit (inputs) uv2nix pyproject-nix pyproject-build-systems;
+      npm-lockfile-fix = inputs.npm-lockfile-fix.packages.${final.stdenv.hostPlatform.system}.default;
+    };
+  };
+}
@@ -4,120 +4,19 @@
  perSystem =
    { pkgs, inputs', ... }:
    let
-      hermesVenv = pkgs.callPackage ./python.nix {
+      hermesAgent = pkgs.callPackage ./hermes-agent.nix {
        inherit (inputs) uv2nix pyproject-nix pyproject-build-systems;
-      };
-
-      hermesNpmLib = pkgs.callPackage ./lib.nix {
        npm-lockfile-fix = inputs'.npm-lockfile-fix.packages.default;
      };
-
-      hermesTui = pkgs.callPackage ./tui.nix {
-        inherit hermesNpmLib;
-      };
-
-      # Import bundled skills, excluding runtime caches
-      bundledSkills = pkgs.lib.cleanSourceWith {
-        src = ../skills;
-        filter = path: _type: !(pkgs.lib.hasInfix "/index-cache/" path);
-      };
-
-      hermesWeb = pkgs.callPackage ./web.nix {
-        inherit hermesNpmLib;
-      };
-
-      runtimeDeps = with pkgs; [
-        nodejs_22
-        ripgrep
-        git
-        openssh
-        ffmpeg
-        tirith
-      ];
-
-      runtimePath = pkgs.lib.makeBinPath runtimeDeps;
-
-      # Lockfile hashes for dev shell stamps
-      pyprojectHash = builtins.hashString "sha256" (builtins.readFile ../pyproject.toml);
-      uvLockHash =
-        if builtins.pathExists ../uv.lock then
-          builtins.hashString "sha256" (builtins.readFile ../uv.lock)
-        else
-          "none";
    in
    {
      packages = {
-        default = pkgs.stdenv.mkDerivation {
-          pname = "hermes-agent";
-          version = (fromTOML (builtins.readFile ../pyproject.toml)).project.version;
+        default = hermesAgent;
+        tui = hermesAgent.hermesTui;
+        web = hermesAgent.hermesWeb;

-          dontUnpack = true;
-          dontBuild = true;
-          nativeBuildInputs = [ pkgs.makeWrapper ];
-
-          installPhase = ''
-            runHook preInstall
-
-            mkdir -p $out/share/hermes-agent $out/bin
-            cp -r ${bundledSkills} $out/share/hermes-agent/skills
-            cp -r ${hermesWeb} $out/share/hermes-agent/web_dist
-
-            # copy pre-built TUI (same layout as dev: ui-tui/dist/ + node_modules/)
-            mkdir -p $out/ui-tui
-            cp -r ${hermesTui}/lib/hermes-tui/* $out/ui-tui/
-
-            ${pkgs.lib.concatMapStringsSep "\n"
-              (name: ''
-                makeWrapper ${hermesVenv}/bin/${name} $out/bin/${name} \
-                  --suffix PATH : "${runtimePath}" \
-                  --set HERMES_BUNDLED_SKILLS $out/share/hermes-agent/skills \
-                  --set HERMES_WEB_DIST $out/share/hermes-agent/web_dist \
-                  --set HERMES_TUI_DIR $out/ui-tui \
-                  --set HERMES_PYTHON ${hermesVenv}/bin/python3 \
-                  --set HERMES_NODE ${pkgs.nodejs_22}/bin/node
-              '')
-              [
-                "hermes"
-                "hermes-agent"
-                "hermes-acp"
-              ]
-            }
-
-            runHook postInstall
-          '';
-
-          passthru.devShellHook = ''
-            STAMP=".nix-stamps/hermes-agent"
-            STAMP_VALUE="${pyprojectHash}:${uvLockHash}"
-            if [ ! -f "$STAMP" ] || [ "$(cat "$STAMP")" != "$STAMP_VALUE" ]; then
-              echo "hermes-agent: installing Python dependencies..."
-              uv venv .venv --python ${pkgs.python312}/bin/python3 2>/dev/null || true
-              source .venv/bin/activate
-              uv pip install -e ".[all]"
-              [ -d mini-swe-agent ] && uv pip install -e ./mini-swe-agent 2>/dev/null || true
-              [ -d tinker-atropos ] && uv pip install -e ./tinker-atropos 2>/dev/null || true
-              mkdir -p .nix-stamps
-              echo "$STAMP_VALUE" > "$STAMP"
-            else
-              source .venv/bin/activate
-              export HERMES_PYTHON=${hermesVenv}/bin/python3
-            fi
-          '';
-
-          meta = with pkgs.lib; {
-            description = "AI agent with advanced tool-calling capabilities";
-            homepage = "https://github.com/NousResearch/hermes-agent";
-            mainProgram = "hermes";
-            license = licenses.mit;
-            platforms = platforms.unix;
-          };
-        };
-
-        tui = hermesTui;
-        web = hermesWeb;
-
-        fix-lockfiles = hermesNpmLib.mkFixLockfiles {
-          packages = [ hermesTui hermesWeb ];
+        fix-lockfiles = hermesAgent.hermesNpmLib.mkFixLockfiles {
+          packages = [ hermesAgent.hermesTui hermesAgent.hermesWeb ];
        };
      };
    };
@@ -7,6 +7,7 @@
  pyproject-nix,
  pyproject-build-systems,
  stdenv,
+  dependency-groups ? [ "all" ],
 }:
 let
  workspace = uv2nix.lib.workspace.loadWorkspace { workspaceRoot = ./..; };
@@ -96,5 +97,5 @@ let
      ]);
 in
 pythonSet.mkVirtualEnv "hermes-agent-env" {
-  hermes-agent = [ "all" ];
+  hermes-agent = dependency-groups;
 }
@@ -4,7 +4,7 @@ let
  src = ../ui-tui;
  npmDeps = pkgs.fetchNpmDeps {
    inherit src;
-    hash = "sha256-RU4qSHgJPMyfRSEJDzkG4+MReDZDc6QbTD2wisa5QE0=";
+    hash = "sha256-Chz+NW9NXqboXHOa6PKwf5bhAkkcFtKNhvKWwg2XSPc=";
  };

  npm = hermesNpmLib.mkNpmPassthru { folder = "ui-tui"; attr = "tui"; pname = "hermes-tui"; };
@@ -17,6 +17,7 @@ pkgs.buildNpmPackage (npm // {
  inherit src npmDeps version;

  doCheck = false;
+  npmFlags = [ "--legacy-peer-deps" ];

  installPhase = ''
    runHook preInstall
@@ -380,6 +380,10 @@ def backup_existing(path: Path, backup_root: Path) -> Optional[Path]:
 # Replace OpenClaw brand names with Hermes in migrated text so that
 # memory entries, user profiles, SOUL.md, and workspace instructions
 # read as self-referential to the new agent identity.
+#
+# Case-aware: ``OpenClaw`` → ``Hermes`` in prose, but all-lowercase matches
+# map ``openclaw`` → ``hermes``, so filesystem paths like ``~/.openclaw``
+# become ``~/.hermes`` (the real Hermes home), not the broken ``~/.Hermes``.
 _REBRAND_PATTERNS: List[Tuple[re.Pattern, str]] = [
    (re.compile(r'\bOpen[\s-]?Claw\b', re.IGNORECASE), 'Hermes'),
    (re.compile(r'\bClawdBot\b', re.IGNORECASE), 'Hermes'),
@@ -387,10 +391,31 @@ _REBRAND_PATTERNS: List[Tuple[re.Pattern, str]] = [
 ]


+def _case_preserving_replacement(replacement: str):
+    """Return a re.sub replacement fn that lowercases the result when the
+    matched text was all-lowercase.
+
+    Keeps ``OpenClaw`` → ``Hermes`` but maps ``openclaw`` → ``hermes`` so a
+    filesystem path like ``~/.openclaw/config.yaml`` rewrites to
+    ``~/.hermes/config.yaml`` (the real Hermes home) instead of the broken
+    ``~/.Hermes/config.yaml``.
+    """
+    def _sub(match: "re.Match[str]") -> str:
+        matched = match.group(0)
+        if matched and matched.islower():
+            return replacement.lower()
+        return replacement
+    return _sub
+
+
 def rebrand_text(text: str) -> str:
-    """Replace OpenClaw / ClawdBot / MoltBot brand names with Hermes."""
+    """Replace OpenClaw / ClawdBot / MoltBot brand names with Hermes.
+
+    Preserves case so filesystem-path matches (lowercase) don't become
+    capitalized directory names that don't exist.
+    """
    for pattern, replacement in _REBRAND_PATTERNS:
-        text = pattern.sub(replacement, text)
+        text = pattern.sub(_case_preserving_replacement(replacement), text)
    return text
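
A small self-contained sketch of the case-aware substitution, using only the OpenClaw pattern from `_REBRAND_PATTERNS`. The names `PATTERNS`, `case_preserving`, and `rebrand` are stand-ins for the module's private helpers, and the sample strings are assumptions.

```python
import re
from typing import Callable, List, Tuple

# One pattern copied from the diff; the real list has more entries.
PATTERNS: List[Tuple[re.Pattern, str]] = [
    (re.compile(r'\bOpen[\s-]?Claw\b', re.IGNORECASE), 'Hermes'),
]


def case_preserving(replacement: str) -> Callable[[re.Match], str]:
    """Lowercase the replacement only when the matched text is all-lowercase."""
    def _sub(match: re.Match) -> str:
        matched = match.group(0)
        if matched and matched.islower():
            return replacement.lower()
        return replacement
    return _sub


def rebrand(text: str) -> str:
    for pattern, replacement in PATTERNS:
        text = pattern.sub(case_preserving(replacement), text)
    return text


prose = rebrand("OpenClaw stores config in ~/.openclaw/config.yaml")
shouted = rebrand("OPENCLAW")
print(prose)
print(shouted)
```

Note that only the all-lowercase case is special: an all-caps match such as `OPENCLAW` still becomes `Hermes`, so the helper is case-aware rather than fully case-preserving.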


@@ -0,0 +1,131 @@
+# google_meet plugin
+
+Let the hermes agent join a Google Meet call, transcribe it, optionally speak
+in it, and do the follow-up work afterwards.
+
+## What ships
+
+| Version | What | Status |
+|---|---|---|
+| v1 | Transcribe-only: Playwright joins Meet, scrapes captions to a transcript file | ✓ ships by default |
+| v2 | Realtime duplex audio: bot speaks in-call via OpenAI Realtime + BlackHole/PulseAudio null-sink | ✓ opt in with `mode='realtime'` |
+| v3 | Remote node host: run the bot on a different machine than the gateway | ✓ opt in with `node='<name>'` |
+
+## Architecture
+
+```
+┌─ gateway (Linux box, where hermes runs) ────────────────────────────┐
+│                                                                      │
+│   agent → meet_join(url, mode='realtime', node='my-mac')             │
+│         │                                                            │
+│         └─ NodeClient ─── ws ────┐                                   │
+│                                  │                                   │
+└──────────────────────────────────┼───────────────────────────────────┘
+                                   │ wss (token auth)
+                                   ▼
+┌─ node host (user's Mac, signed-in Chrome lives here) ───────────────┐
+│                                                                      │
+│   NodeServer (from `hermes meet node run`)                           │
+│     │                                                                │
+│     ├─ start_bot → process_manager.start() → spawns meet_bot         │
+│     │                                                                │
+│     └─ meet_bot (Playwright)                                         │
+│        ├─ Chromium → meet.google.com                                 │
+│        ├─ caption scraper → transcript.txt                           │
+│        └─ (realtime mode only) RealtimeSpeaker thread                │
+│             ↓                                                        │
+│           OpenAI Realtime WS → speaker.pcm                           │
+│             ↓                                                        │
+│           paplay → null-sink ← Chrome fake mic                       │
+│                                                                      │
+└──────────────────────────────────────────────────────────────────────┘
+```
+
+Without v3: everything in the node-host box runs on the gateway machine.
+Without v2: the "realtime" path is skipped; transcribe runs alone.
+
+## Files
+
+| Path | Purpose |
+|---|---|
+| `plugin.yaml` | manifest |
+| `__init__.py` | `register(ctx)` — registers 5 tools + `on_session_end` hook + `hermes meet` CLI |
+| `meet_bot.py` | Playwright bot subprocess (standalone, `python -m plugins.google_meet.meet_bot`) |
+| `process_manager.py` | local bot lifecycle + `enqueue_say` |
+| `tools.py` | agent-facing tools + node-routing helper |
+| `cli.py` | `hermes meet setup / auth / join / status / transcript / say / stop / node ...` |
+| `audio_bridge.py` | v2: PulseAudio null-sink (Linux) + BlackHole probe (macOS) |
+| `realtime/openai_client.py` | v2: `RealtimeSession` + `RealtimeSpeaker` (file-queue → OpenAI Realtime WS → PCM) |
+| `node/protocol.py` | v3: message envelope + validation |
+| `node/registry.py` | v3: `$HERMES_HOME/workspace/meetings/nodes.json` |
+| `node/server.py` | v3: `NodeServer` (runs on host machine) |
+| `node/client.py` | v3: `NodeClient` (used by tool handlers + CLI on gateway) |
+| `node/cli.py` | v3: `hermes meet node {run,list,approve,remove,status,ping}` |
+| `SKILL.md` | agent usage guide |
+
+## Local quick start
+
+```bash
+hermes plugins enable google_meet
+hermes meet install                                      # pip + Chromium
+hermes meet setup                                        # preflight
+hermes meet auth                                         # optional
+hermes meet join https://meet.google.com/abc-defg-hij    # transcribe
+```
+
+## Realtime mode
+
+Linux (preferred, most automated):
+```bash
+hermes meet install --realtime                     # installs pulseaudio-utils
+echo 'OPENAI_API_KEY=sk-...' >> ~/.hermes/.env
+hermes meet join https://meet.google.com/abc-defg-hij --mode realtime
+# then from the agent or CLI:
+hermes meet say "Good morning everyone, I'm the note-taker bot."
+```
+
+macOS:
+```bash
+hermes meet install --realtime     # runs: brew install blackhole-2ch ffmpeg
+# then — manually! — open System Settings → Sound → Input → BlackHole 2ch
+echo 'OPENAI_API_KEY=sk-...' >> ~/.hermes/.env
+hermes meet join https://meet.google.com/abc-defg-hij --mode realtime
+```
+
+On macOS, hermes will **not** switch your system audio input automatically — the
+user has to do it. This is deliberate: switching default input on a whim would
+be a surprising side effect.
+
+## Remote node host
+
+On the node machine (e.g. user's Mac with a signed-in Chrome):
+```bash
+pip install playwright websockets
+python -m playwright install chromium
+hermes plugins enable google_meet
+hermes meet node run --display-name my-mac --host 0.0.0.0 --port 18789
+# prints the bearer token on first run; copy it
+```
+
+On the gateway:
+```bash
+hermes meet node approve my-mac ws://<mac-ip>:18789 <token>
+hermes meet node ping my-mac
+# now any meet_* tool call accepts node='my-mac' (or 'auto')
+```
+
+## Safety
+
+- URL gate: only `https://meet.google.com/abc-defg-hij`, `/new`, `/lookup/<id>`.
+- No calendar scanning, no auto-dial, no auto-consent announcement.
+- Node server uses bearer-token auth; no key exchange, no TLS termination
+  built in — run it on a LAN or behind a reverse proxy you trust.
+- One active meeting per (gateway, node) pair. A second `meet_join` leaves the first.
+- `meet_say` refuses unless the active meeting was started with `mode='realtime'`.
+
+## Out of scope
+
+- **Calendar scanning** — deliberately not implemented. Join URLs must be explicit.
+- **Multi-tenant node sharing** — a node serves one gateway at a time.
+- **Windows** — audio bridging isn't tested; `register()` no-ops on Windows.
+- **System audio input switching on macOS** — user responsibility, not the bot's.
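
Review note on the README's Safety section: the URL gate accepts only three path shapes on `meet.google.com`. Below is a minimal sketch of such a gate, assuming exactly those shapes. The name `looks_like_meet_url` and both regexes are hypothetical; the plugin's real check is `_is_safe_meet_url` in `meet_bot.py`, which is not part of this excerpt and may differ.

```python
import re
from urllib.parse import urlsplit

# Hypothetical patterns for illustration only.
_MEETING_CODE = re.compile(r"/[a-z]{3}-[a-z]{4}-[a-z]{3}")
_LOOKUP = re.compile(r"/lookup/[A-Za-z0-9_-]+")


def looks_like_meet_url(url: str) -> bool:
    """Return True only for https://meet.google.com/<code>, /new, or /lookup/<id>."""
    parts = urlsplit(url)
    # Exact scheme and host match: rejects http://, look-alike hosts,
    # and user@host tricks, because netloc must equal the bare hostname.
    if parts.scheme != "https" or parts.netloc != "meet.google.com":
        return False
    path = parts.path.rstrip("/")
    if path == "/new":
        return True
    # fullmatch so that trailing junk after a valid-looking code is rejected.
    return bool(_MEETING_CODE.fullmatch(path) or _LOOKUP.fullmatch(path))
```

Parsing with `urlsplit` before matching keeps the host comparison separate from the path comparison, which is harder to get wrong than a single regex over the whole URL.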
@@ -0,0 +1,148 @@
+---
+name: google_meet
+description: Join a Google Meet call, transcribe live captions, optionally speak in realtime, and do the followup work afterwards. Use when the user asks the agent to sit in on a meeting, take notes, summarize, respond in-call, or extract action items from it.
+version: 0.2.0
+platforms:
+  - linux
+  - macos
+metadata:
+  hermes:
+    tags: [meetings, google-meet, transcription, realtime-voice]
+---
+
+# google_meet
+
+## When to use
+
+The user says any of:
+
+- "join my Meet at <url>"
+- "take notes on this meeting"
+- "summarize the meeting and send followups"
+- "sit in on my standup"
+- "be a bot in this call and speak up when X"
+
+## Two modes
+
+| Mode | What the bot does |
+|---|---|
+| `transcribe` (default) | Joins, enables captions, scrapes a transcript. Listen-only. |
+| `realtime` | Same as transcribe PLUS speaks into the meeting via OpenAI Realtime. The agent calls `meet_say(text)` and the bot's voice comes out of the call. |
+
+Pick `realtime` only when the user actually wants the agent to speak. It costs real money (OpenAI Realtime is pay-per-audio-minute) and requires a virtual audio device set up on the machine running the bot.
+
+## Two locations
+
+| Location | When |
+|---|---|
+| Local (default) | Gateway machine runs the Playwright bot directly. |
+| Remote node (`node="<name>"`) | Bot runs on a different machine that has a signed-in Chrome and (for realtime) a configured audio bridge. Useful when the gateway runs on a headless Linux box but the user's real signed-in Chrome lives on their Mac. |
+
+## Prerequisites the user must handle once
+
+Easiest path — run the built-in installer:
+
+```bash
+hermes plugins enable google_meet
+hermes meet install                 # pip deps + Chromium (transcribe only)
+hermes meet install --realtime      # + pulseaudio-utils / brew blackhole+ffmpeg
+hermes meet auth                    # optional; skips guest-lobby wait
+hermes meet setup                   # preflight checks
+```
+
+`hermes meet install --realtime` prompts before running `sudo apt-get` (Linux)
+or `brew install` (macOS). Pass `--yes` to skip the prompt. It will NOT touch
+your macOS default-input setting — you have to select BlackHole 2ch in
+System Settings yourself before starting a realtime meeting.
+
+Or do it manually:
+```bash
+pip install playwright websockets && python -m playwright install chromium
+
+# For realtime mode, additionally:
+#   Linux:  sudo apt install pulseaudio-utils
+#   macOS:  brew install blackhole-2ch ffmpeg
+#           → System Settings → Sound → Input → BlackHole 2ch
+#   Then set OPENAI_API_KEY or HERMES_MEET_REALTIME_KEY in ~/.hermes/.env
+```
+
+For a remote node:
+```bash
+# on the user's Mac (where Chrome is signed in):
+pip install playwright websockets && python -m playwright install chromium
+hermes plugins enable google_meet
+hermes meet node run --display-name my-mac    # persistent server
+# copy the printed token
+
+# on the gateway:
+hermes meet node approve my-mac ws://<mac-ip>:18789 <token>
+hermes meet node ping my-mac                   # confirm reachable
+```
+
+Run `hermes meet setup` to preflight local prereqs.
+
+## Flow
+
+1. **Join** — call `meet_join(url=..., mode=..., node=...)`. Returns immediately.
+2. **Announce yourself** — no auto-consent. Say (in whatever channel the user is watching): "A Hermes agent bot is in this call taking notes."
+3. **Poll** — `meet_status()` for liveness, `meet_transcript(last=20)` for recent captions. Don't re-read the whole transcript every turn.
+4. **Speak (realtime only)** — `meet_say(text="...")` queues text for TTS. The speech lags by ~2s. Don't spam it.
+5. **Leave** — `meet_leave()` when done, or set `duration="30m"` on `meet_join` for auto-leave.
+6. **Follow up** — read `meet_transcript()` in full, summarize, and use regular tools to send the recap, file issues, schedule followups.
+
+## Tool reference
+
+| Tool | Parameters | Use |
+|---|---|---|
+| `meet_join` | `url`, `mode?`, `guest_name?`, `duration?`, `headed?`, `node?` | Start bot |
+| `meet_status` | `node?` | Liveness + progress |
+| `meet_transcript` | `last?`, `node?` | Read captions |
+| `meet_leave` | `node?` | Close bot |
+| `meet_say` | `text`, `node?` | Speak in realtime meeting |
+
+`node?` on all tools: pass a registered node name (or `"auto"` for the sole node) to operate a remote bot instead of a local one. Omit for local.
+
+## Important limits
+
+- Captions are only as good as Google Meet's live captions. English-biased, lossy on overlapping speakers.
+- In guest mode the bot sits in the lobby until a host admits it. Warn the user; `hermes meet auth` avoids this.
+- **Lobby timeout**: if the host doesn't admit the bot within 5 minutes (configurable via `HERMES_MEET_LOBBY_TIMEOUT` env), the bot leaves and `meet_status` reports `leaveReason: "lobby_timeout"`.
+- **One active meeting per install per location.** A second `meet_join` leaves the first.
+- **Windows not supported.**
+- Realtime mode needs a virtual audio device. If the audio bridge setup fails, the bot falls back to transcribe mode and flags it in `meet_status().error`.
+- `meet_say` requires `mode='realtime'` on the originating `meet_join`. Calling it against a transcribe-mode meeting returns a clear error.
+- **Barge-in is best-effort.** When a caption arrives attributed to a real participant while the bot is generating audio, the bot sends `response.cancel` to OpenAI Realtime. Captions take ~500ms to show up, so the bot will talk over the first second or so of a human interruption.
+
+## Status dict reference
+
+`meet_status()` returns (subset shown, there are more):
+
+| Key | Meaning |
+|---|---|
+| `inCall` | Past the lobby. False while waiting for admission. |
+| `lobbyWaiting` | Clicked "Ask to join", waiting on host. |
+| `joinAttemptedAt` / `joinedAt` | Timestamps for lobby-click and actual admission. |
+| `captioning` | Caption observer is installed. |
+| `transcriptLines` / `lastCaptionAt` | Transcript progress. |
+| `realtime` / `realtimeReady` | Realtime mode provisioned / WS connected. |
+| `realtimeDevice` | Audio device name the bot is feeding (e.g. `hermes_meet_src`). |
+| `audioBytesOut` / `lastAudioOutAt` | How much PCM the OpenAI session has produced. |
+| `lastBargeInAt` | Timestamp of the most recent `response.cancel` sent. |
+| `leaveReason` | `duration_expired`, `lobby_timeout`, `denied`, `page_closed`, or null. |
+| `error` | Last error (soft — bot may still be running). |
+
+## Transcript location
+
+Local:
+```
+$HERMES_HOME/workspace/meetings/<meeting-id>/transcript.txt
+```
+
+Remote node: transcript lives on the node host's disk. Use `meet_transcript(node=...)` to read it over RPC.
+
+## Safety
+
+- URL regex: only `https://meet.google.com/...` URLs pass.
+- No calendar scanning. No auto-dial.
+- Remote nodes use bearer-token auth; tokens are generated on the node (32 hex chars, persisted in `$HERMES_HOME/workspace/meetings/node_token.json`) and must be copied to the gateway via `hermes meet node approve`.
+- `meet_say` text is rate-limited by the OpenAI Realtime session. Spam protection is the bot's job, not the caller's, but don't queue hundreds of lines anyway.
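
Review note on the SKILL.md flow: `meet_join` returns immediately, so a caller has to poll `meet_status()` until the bot is admitted or gives up. A minimal sketch of that loop follows, assuming the `inCall` and `leaveReason` keys from the status dict reference. The helper name `wait_until_in_call`, its return strings, and the injected `get_status` callable are hypothetical; the real tools are invoked by the agent, not imported like this.

```python
import time
from typing import Callable


def wait_until_in_call(
    get_status: Callable[[], dict],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the bot is admitted, has left, or the deadline passes.

    Returns "in_call", "left:<reason>", or "timeout".
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status.get("inCall"):
            return "in_call"
        reason = status.get("leaveReason")
        if reason:
            # e.g. "lobby_timeout" or "denied" per the status reference.
            return f"left:{reason}"
        if clock() >= deadline:
            return "timeout"
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting; in normal use the defaults apply.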
@@ -0,0 +1,103 @@
+"""google_meet plugin — let the agent join a Meet call, transcribe it, follow up.
+
+v1: transcribe-only. Spawns a headless Chromium via Playwright, joins the Meet
+URL, enables live captions, scrapes them into a transcript file. The agent then
+has the transcript in its workspace and can do whatever followup work it needs
+using its regular tools.
+
+v2: realtime duplex audio so the agent can speak in the meeting, via OpenAI
+Realtime plus a BlackHole (macOS) or PulseAudio null-sink (Linux) audio bridge.
+``meet_say`` only works for meetings joined with ``mode='realtime'``.
+
+Explicit-by-design: only joins ``https://meet.google.com/`` URLs explicitly
+passed in. No calendar scanning, no auto-dial, no consent announcement.
+"""
+
+from __future__ import annotations
+
+import logging
+import platform
+
+from plugins.google_meet import process_manager as pm
+from plugins.google_meet.cli import register_cli as _register_meet_cli
+from plugins.google_meet.cli import meet_command as _meet_command
+from plugins.google_meet.tools import (
+    MEET_JOIN_SCHEMA,
+    MEET_LEAVE_SCHEMA,
+    MEET_SAY_SCHEMA,
+    MEET_STATUS_SCHEMA,
+    MEET_TRANSCRIPT_SCHEMA,
+    check_meet_requirements,
+    handle_meet_join,
+    handle_meet_leave,
+    handle_meet_say,
+    handle_meet_status,
+    handle_meet_transcript,
+)
+
+logger = logging.getLogger(__name__)
+
+
+_TOOLS = (
+    ("meet_join",       MEET_JOIN_SCHEMA,       handle_meet_join,       "📞"),
+    ("meet_status",     MEET_STATUS_SCHEMA,     handle_meet_status,     "🟢"),
+    ("meet_transcript", MEET_TRANSCRIPT_SCHEMA, handle_meet_transcript, "📝"),
+    ("meet_leave",      MEET_LEAVE_SCHEMA,      handle_meet_leave,      "👋"),
+    ("meet_say",        MEET_SAY_SCHEMA,        handle_meet_say,        "🗣️"),
+)
+
+
+def _on_session_end(**kwargs) -> None:
+    """Best-effort cleanup — if a meet bot is still running when the session
+    ends, leave the call so we don't orphan a headless Chromium.
+
+    No-ops when nothing is active. Swallows all exceptions — session end must
+    not fail because the bot cleanup hit an edge case.
+    """
+    try:
+        status = pm.status()
+        if status.get("ok") and status.get("alive"):
+            pm.stop(reason="session ended")
+    except Exception as e:  # pragma: no cover — defensive
+        logger.debug("google_meet on_session_end cleanup failed: %s", e)
+
+
+def register(ctx) -> None:
+    """Register tools, CLI, and lifecycle hooks.
+
+    Called once by the plugin loader when the plugin is enabled via
+    ``plugins.enabled`` in config.yaml.
+    """
+    # Windows is not supported in v1 — audio routing for v2 doesn't have a
+    # tested path there and guest-join Chromium is flakier. Refuse to register
+    # rather than half-working.
+    system = platform.system().lower()
+    if system not in ("linux", "darwin"):
+        logger.info(
+            "google_meet plugin: platform=%s not supported (linux/macos only)",
+            system,
+        )
+        return
+
+    for name, schema, handler, emoji in _TOOLS:
+        ctx.register_tool(
+            name=name,
+            toolset="google_meet",
+            schema=schema,
+            handler=handler,
+            check_fn=check_meet_requirements,
+            emoji=emoji,
+        )
+
+    ctx.register_cli_command(
+        name="meet",
+        help="Google Meet bot (join, transcribe, follow up)",
+        setup_fn=_register_meet_cli,
+        handler_fn=_meet_command,
+        description=(
+            "Let the hermes agent join a Google Meet call and scrape live "
+            "captions into a transcript. See: hermes meet setup"
+        ),
+    )
+
+    ctx.register_hook("on_session_end", _on_session_end)
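
Review note on `register(ctx)`: the function can be exercised without the real plugin loader by passing a recording stand-in. The sketch below is one way to do that. `RecordingCtx` and `demo_register` are hypothetical names; the method names and keyword arguments mirror the call sites in this file, and the real loader API may accept more.

```python
from __future__ import annotations

from typing import Any, Callable


class RecordingCtx:
    """Hypothetical test double for the plugin loader's ``ctx`` object."""

    def __init__(self) -> None:
        self.tools: dict[str, dict[str, Any]] = {}
        self.cli_commands: list[str] = []
        self.hooks: dict[str, list[Callable[..., None]]] = {}

    def register_tool(self, *, name, toolset, schema, handler, check_fn, emoji):
        # Keep everything so a test can assert on any field.
        self.tools[name] = {
            "toolset": toolset,
            "schema": schema,
            "handler": handler,
            "check_fn": check_fn,
            "emoji": emoji,
        }

    def register_cli_command(self, *, name, help, setup_fn, handler_fn, description):
        self.cli_commands.append(name)

    def register_hook(self, event, callback):
        self.hooks.setdefault(event, []).append(callback)


def demo_register(ctx) -> None:
    """Toy stand-in for register(): one placeholder tool and one hook."""
    ctx.register_tool(
        name="meet_status",
        toolset="google_meet",
        schema={},
        handler=lambda args: "{}",
        check_fn=lambda: True,
        emoji="🟢",
    )
    ctx.register_hook("on_session_end", lambda **kwargs: None)
```

With the real module importable, a test could call `register(RecordingCtx())` instead of `demo_register` and assert that five tools, the `meet` CLI command, and the `on_session_end` hook were recorded.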
@@ -0,0 +1,244 @@
+"""Virtual audio bridge for feeding generated speech into Chrome's mic.
+
+v2 module. Provisions a platform-specific virtual audio device so the
+Meet bot's Chromium instance can be pointed at an input source we
+control. The OpenAI Realtime client writes PCM bytes into this device;
+Chrome reads them as if they were coming from a microphone.
+
+Linux (primary): uses pactl (PulseAudio) to create a null-sink plus a
+virtual source whose master is the null-sink's monitor. Callers set
+PULSE_SOURCE=<source_name> in Chrome's env and pass the fake-mic flag.
+
+macOS: requires BlackHole 2ch to be installed. This module only
+verifies its presence and returns the device name; routing OS default
+input is left to the user (or a future switchaudio-osx integration) to
+avoid surprising the user's system audio state.
+
+Windows: not supported in v2.
+"""
+
+from __future__ import annotations
+
+import platform
+import subprocess
+from typing import Optional
+
+
+_BLACKHOLE_DEVICE = "BlackHole 2ch"
+
+
+class AudioBridge:
+    """Manages a virtual audio device for Chrome fake-mic input.
+
+    Call ``setup()`` once before launching the Meet bot and
+    ``teardown()`` when the session ends. ``teardown()`` is idempotent.
+    """
+
+    def __init__(self, name_prefix: str = "hermes_meet") -> None:
+        self._name_prefix = name_prefix
+        self._platform: Optional[str] = None
+        self._device_name: Optional[str] = None
+        self._write_target: Optional[str] = None
+        self._module_ids: list[int] = []
+        self._torn_down = False
+
+    # ── public properties ─────────────────────────────────────────────────
+
+    @property
+    def device_name(self) -> str:
+        if not self._device_name:
+            raise RuntimeError("AudioBridge not set up yet")
+        return self._device_name
+
+    @property
+    def write_target(self) -> str:
+        if not self._write_target:
+            raise RuntimeError("AudioBridge not set up yet")
+        return self._write_target
+
+    # ── lifecycle ─────────────────────────────────────────────────────────
+
+    def setup(self) -> dict:
+        """Provision the virtual audio device.
+
+        Returns a dict describing the device. Raises RuntimeError on
+        unsupported platforms or when required system tools are missing.
+        """
+        system = platform.system()
+        if system == "Linux":
+            return self._setup_linux()
+        if system == "Darwin":
+            return self._setup_darwin()
+        if system == "Windows":
+            raise RuntimeError("windows not supported in v2")
+        raise RuntimeError(f"unsupported platform: {system}")
+
+    def teardown(self) -> None:
+        """Release the virtual audio device. Idempotent."""
+        if self._torn_down:
+            return
+        # Only Linux needs explicit unloading.
+        if self._platform == "linux" and self._module_ids:
+            # Unload in reverse order (virtual-source before null-sink).
+            for mod_id in reversed(self._module_ids):
+                try:
+                    subprocess.run(
+                        ["pactl", "unload-module", str(mod_id)],
+                        check=False,
+                        capture_output=True,
+                    )
+                except Exception:
+                    # Best-effort teardown — never raise from here.
+                    pass
+            self._module_ids = []
+        self._torn_down = True
+
+    # ── platform impls ────────────────────────────────────────────────────
+
+    def _setup_linux(self) -> dict:
+        sink_name = f"{self._name_prefix}_sink"
+        src_name = f"{self._name_prefix}_src"
+
+        try:
+            sink_out = subprocess.run(
+                [
+                    "pactl",
+                    "load-module",
+                    "module-null-sink",
+                    f"sink_name={sink_name}",
+                    "sink_properties=device.description=HermesMeetSink",
+                ],
+                check=True,
+                capture_output=True,
+                text=True,
+            )
+        except FileNotFoundError as exc:
+            raise RuntimeError(
+                "pactl not found — install PulseAudio/pipewire-pulse"
+            ) from exc
+        except subprocess.CalledProcessError as exc:
+            raise RuntimeError(
+                f"pactl load-module null-sink failed: {exc.stderr or exc}"
+            ) from exc
+
+        sink_mod_id = self._parse_module_id(sink_out.stdout)
+
+        try:
+            src_out = subprocess.run(
+                [
+                    "pactl",
+                    "load-module",
+                    "module-virtual-source",
+                    f"source_name={src_name}",
+                    f"master={sink_name}.monitor",
+                ],
+                check=True,
+                capture_output=True,
+                text=True,
+            )
+        except subprocess.CalledProcessError as exc:
+            # Roll back the null-sink we just created so we don't leak it.
+            subprocess.run(
+                ["pactl", "unload-module", str(sink_mod_id)],
+                check=False,
+                capture_output=True,
+            )
+            raise RuntimeError(
+                f"pactl load-module virtual-source failed: {exc.stderr or exc}"
+            ) from exc
+
+        src_mod_id = self._parse_module_id(src_out.stdout)
+
+        self._platform = "linux"
+        self._device_name = src_name
+        self._write_target = sink_name
+        self._module_ids = [sink_mod_id, src_mod_id]
+        self._torn_down = False
+
+        return {
+            "platform": "linux",
+            "device_name": src_name,
+            "sample_rate": 48000,
+            "channels": 2,
+            "module_ids": list(self._module_ids),
+            "write_target": sink_name,
+        }
+
+    def _setup_darwin(self) -> dict:
+        try:
+            out = subprocess.check_output(
+                ["system_profiler", "SPAudioDataType"],
+                text=True,
+                stderr=subprocess.STDOUT,
+            )
+        except FileNotFoundError as exc:
+            raise RuntimeError(
+                "system_profiler not found (macOS-only command)"
+            ) from exc
+        except subprocess.CalledProcessError as exc:
+            raise RuntimeError(
+                f"system_profiler failed: {exc.output}"
+            ) from exc
+
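+        # Match the exact device name returned below; another BlackHole
+        # variant (e.g. 16ch) would not provide the 2ch device.
+        if _BLACKHOLE_DEVICE not in out: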
+            raise RuntimeError(
+                "BlackHole virtual audio device not installed. "
+                "Install via: brew install blackhole-2ch"
+            )
+
+        self._platform = "darwin"
+        self._device_name = _BLACKHOLE_DEVICE
+        self._write_target = _BLACKHOLE_DEVICE
+        self._module_ids = []
+        self._torn_down = False
+
+        return {
+            "platform": "darwin",
+            "device_name": _BLACKHOLE_DEVICE,
+            "sample_rate": 48000,
+            "channels": 2,
+            "module_ids": [],
+            "write_target": _BLACKHOLE_DEVICE,
+        }
+
+    # ── helpers ──────────────────────────────────────────────────────────
+
+    @staticmethod
+    def _parse_module_id(stdout: str) -> int:
+        """pactl load-module prints the new module ID to stdout."""
+        text = (stdout or "").strip()
+        if not text:
+            raise RuntimeError("pactl load-module returned empty stdout")
+        # Take the last whitespace-separated token on the first non-empty line.
+        first = text.splitlines()[0].strip()
+        token = first.split()[-1]
+        try:
+            return int(token)
+        except ValueError as exc:
+            raise RuntimeError(
+                f"could not parse pactl module id from: {stdout!r}"
+            ) from exc
+
+
+def chrome_fake_audio_flags(bridge_info: dict) -> list[str]:
+    """Return Chrome flags for using the fake audio input.
+
+    The PulseAudio source is selected via the ``PULSE_SOURCE`` env var,
+    which callers must set in Chrome's environment before launch:
+
+        env["PULSE_SOURCE"] = bridge_info["device_name"]
+
+    On macOS the caller must ensure the system default audio input is
+    set to the returned BlackHole device (we do not flip that switch).
+    """
+    system = platform.system()
+    if system == "Linux":
+        # Chromium on Linux picks up the PulseAudio source selected via
+        # PULSE_SOURCE env var; the fake-ui flag skips the permission
+        # prompt so the bot can pick "use my mic" without user input.
+        return ["--use-fake-ui-for-media-stream"]
+    if system == "Darwin":
+        return ["--use-fake-ui-for-media-stream"]
+    if system == "Windows":
+        raise RuntimeError("windows not supported in v2")
+    raise RuntimeError(f"unsupported platform: {system}")
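
Review note on `chrome_fake_audio_flags`: its docstring leaves setting `PULSE_SOURCE` to the caller. A minimal sketch of how a caller might combine the dict returned by `AudioBridge.setup()` with Chrome's launch environment follows. The helper name `chrome_launch_settings` is hypothetical, and the dict keys are the ones returned by `_setup_linux()` and `_setup_darwin()` above.

```python
from __future__ import annotations

import os
from typing import Mapping, Optional


def chrome_launch_settings(
    bridge_info: Mapping[str, object],
    base_env: Optional[Mapping[str, str]] = None,
) -> tuple[dict[str, str], list[str]]:
    """Build (env, flags) for launching Chrome against the audio bridge."""
    # Copy so the caller's environment is never mutated.
    env = dict(os.environ if base_env is None else base_env)
    if bridge_info["platform"] == "linux":
        # Point Chrome's PulseAudio client at the virtual source.
        env["PULSE_SOURCE"] = str(bridge_info["device_name"])
    # On macOS nothing is set here: the user selects BlackHole 2ch as the
    # system input manually, as the module docstring explains.
    return env, ["--use-fake-ui-for-media-stream"]
```

For the Linux dict shown above this would yield `PULSE_SOURCE=hermes_meet_src` when the default `name_prefix` is used.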
@@ -0,0 +1,478 @@
+"""CLI commands for the google_meet plugin.
+
+Wires ``hermes meet <subcommand>``:
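+  setup       — preflight playwright, chromium, auth file, print fixes
+  install     — install pip deps, Chromium, and (with --realtime) audio tools
+  auth        — open a browser to sign into Google, save storage state
+  join <url>  — join a Meet URL synchronously (also callable from the agent)
+  status      — print current bot state
+  transcript  — print the transcript
+  say <text>  — speak text in an active realtime meeting
+  stop        — leave the current meeting
+  node ...    — manage remote node hosts (run/list/approve/remove/status/ping)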
+"""
+
+from __future__ import annotations
+
+import argparse
+import json
+import os
+import sys
+from pathlib import Path
+from typing import Optional
+
+from hermes_constants import get_hermes_home
+
+from plugins.google_meet import process_manager as pm
+from plugins.google_meet.meet_bot import _is_safe_meet_url
+
+
+def _auth_state_path() -> Path:
+    return Path(get_hermes_home()) / "workspace" / "meetings" / "auth.json"
+
+
+# ---------------------------------------------------------------------------
+# argparse wiring
+# ---------------------------------------------------------------------------
+
+def register_cli(subparser: argparse.ArgumentParser) -> None:
+    """Build the ``hermes meet`` argparse tree.
+
+    Called by :func:`_register_cli_commands` at plugin load time.
+    """
+    subs = subparser.add_subparsers(dest="meet_command")
+
+    subs.add_parser("setup", help="Preflight: playwright, chromium, auth")
+
+    inst_p = subs.add_parser(
+        "install",
+        help="Install prerequisites (pip deps, Chromium, platform audio tools)",
+    )
+    inst_p.add_argument(
+        "--realtime", action="store_true",
+        help="Also install realtime audio tools (pulseaudio-utils on Linux, BlackHole+ffmpeg on macOS). Uses sudo/brew, prompts before invoking either.",
+    )
+    inst_p.add_argument(
+        "--yes", "-y", action="store_true",
+        help="Answer yes to all prompts (use with care; will run sudo apt-get or brew without asking).",
+    )
+
+    subs.add_parser("auth", help="Sign in to Google and save session state")
+
+    join_p = subs.add_parser("join", help="Join a Meet URL")
+    join_p.add_argument("url", help="https://meet.google.com/...")
+    join_p.add_argument("--guest-name", default="Hermes Agent")
+    join_p.add_argument("--duration", default=None, help="e.g. 30m, 2h, 90s")
+    join_p.add_argument("--headed", action="store_true", help="show browser")
+    join_p.add_argument(
+        "--mode", choices=("transcribe", "realtime"), default="transcribe",
+        help="transcribe (default, listen-only) or realtime (speak via OpenAI Realtime)"
+    )
+    join_p.add_argument(
+        "--node", default=None,
+        help="remote node name, or 'auto' to use the sole registered node"
+    )
+
+    subs.add_parser("status", help="Print current Meet bot state")
+
+    tr_p = subs.add_parser("transcript", help="Print the scraped transcript")
+    tr_p.add_argument("--last", type=int, default=None)
+
+    say_p = subs.add_parser("say", help="Speak text in an active realtime meeting")
+    say_p.add_argument("text", help="what to say")
+    say_p.add_argument("--node", default=None)
+
+    subs.add_parser("stop", help="Leave the current meeting")
+
+    # v3: remote node host management.
+    node_p = subs.add_parser(
+        "node",
+        help="Manage remote meet node hosts (run/list/approve/remove/status/ping)",
+    )
+    try:
+        from plugins.google_meet.node.cli import register_cli as _register_node_cli
+        _register_node_cli(node_p)
+    except Exception as e:  # pragma: no cover — defensive
+        # If the node module fails to import for any reason (optional dep
+        # missing at import time etc.), leave the subparser present but
+        # flag it. The argparse dispatch will surface a clear error.
+        # Python unbinds ``e`` when the except block exits, so capture the
+        # message now; a closure over ``e`` would raise NameError when called.
+        reason = str(e)
+
+        def _node_unavailable(args):
+            print(f"hermes meet node: module unavailable ({reason})")
+            return 1
+        node_p.set_defaults(func=_node_unavailable)
+
+    subparser.set_defaults(func=meet_command)
+
+
+# ---------------------------------------------------------------------------
+# Dispatch
+# ---------------------------------------------------------------------------
+
+def meet_command(args: argparse.Namespace) -> int:
+    sub = getattr(args, "meet_command", None)
+    if not sub:
+        print("usage: hermes meet {setup,install,auth,join,status,transcript,say,stop,node}")
+        return 2
+    if sub == "setup":
+        return _cmd_setup()
+    if sub == "install":
+        return _cmd_install(
+            realtime=bool(getattr(args, "realtime", False)),
+            assume_yes=bool(getattr(args, "yes", False)),
+        )
+    if sub == "auth":
+        return _cmd_auth()
+    if sub == "join":
+        return _cmd_join(
+            url=args.url,
+            guest_name=args.guest_name,
+            duration=args.duration,
+            headed=args.headed,
+            mode=getattr(args, "mode", "transcribe"),
+            node=getattr(args, "node", None),
+        )
+    if sub == "status":
+        return _cmd_status()
+    if sub == "transcript":
+        return _cmd_transcript(last=args.last)
+    if sub == "say":
+        return _cmd_say(text=args.text, node=getattr(args, "node", None))
+    if sub == "stop":
+        return _cmd_stop()
+    if sub == "node":
+        # Dispatch was set by the node cli's register_cli; fall through to
+        # whatever its subparsers wired.
+        fn = getattr(args, "func", None)
+        if fn is None or fn is meet_command:
+            print("usage: hermes meet node {run,list,approve,remove,status,ping}")
+            return 2
+        return fn(args)
+    print(f"unknown subcommand: {sub}")
+    return 2
+
+
+# ---------------------------------------------------------------------------
+# Subcommand handlers
+# ---------------------------------------------------------------------------
+
+def _cmd_setup() -> int:
+    import platform as _p
+
+    print("google_meet preflight")
+    print("---------------------")
+
+    system = _p.system()
+    system_ok = system in ("Linux", "Darwin")
+    print(f"  platform       : {system}  [{'ok' if system_ok else 'unsupported'}]")
+
+    try:
+        import playwright  # noqa: F401
+        pw_ok = True
+        pw_msg = "installed"
+    except ImportError:
+        pw_ok = False
+        pw_msg = "NOT installed — run: pip install playwright"
+    print(f"  playwright     : {pw_msg}")
+
+    chromium_ok = False
+    chromium_msg = "unknown"
+    if pw_ok:
+        try:
+            from playwright.sync_api import sync_playwright
+            with sync_playwright() as p:
+                try:
+                    exe = p.chromium.executable_path
+                    if exe and Path(exe).exists():
+                        chromium_ok = True
+                        chromium_msg = f"ok ({exe})"
+                    else:
+                        chromium_msg = (
+                            "not installed — run: "
+                            "python -m playwright install chromium"
+                        )
+                except Exception as e:
+                    chromium_msg = f"probe failed: {e}"
+        except Exception as e:
+            chromium_msg = f"probe failed: {e}"
+    print(f"  chromium       : {chromium_msg}")
+
+    auth_path = _auth_state_path()
+    auth_ok = auth_path.is_file()
+    print(
+        "  google auth    : "
+        + (f"ok ({auth_path})" if auth_ok else "not saved — run: hermes meet auth")
+    )
+
+    print()
+    all_ok = system_ok and pw_ok and chromium_ok
+    if all_ok:
+        print(
+            "ready. Join a meeting:  "
+            "hermes meet join https://meet.google.com/abc-defg-hij"
+        )
+    else:
+        print("not ready yet — fix the items above.")
+    return 0 if all_ok else 1
+
+
+def _cmd_install(*, realtime: bool, assume_yes: bool) -> int:
+    """Install the plugin's prerequisites.
+
+    Always: pip install playwright + websockets, then
+    ``python -m playwright install chromium``.
+
+    With ``--realtime``: also install the platform audio bridge deps.
+      Linux : ``sudo apt-get install -y pulseaudio-utils``
+      macOS : ``brew install blackhole-2ch ffmpeg``  (+ remind the user
+              to select BlackHole as the default input device manually)
+
+    Prompts before every package-manager invocation unless ``--yes``.
+    Refuses to run on Windows.
+    """
+    import platform as _p
+    import shutil as _shutil
+    import subprocess as _sp
+
+    system = _p.system()
+    if system not in ("Linux", "Darwin"):
+        print(f"google_meet install: {system} is not supported (linux/macos only)")
+        return 1
+
+    def _confirm(prompt: str) -> bool:
+        if assume_yes:
+            return True
+        try:
+            ans = input(f"{prompt} [y/N] ").strip().lower()
+        except EOFError:
+            return False
+        return ans in ("y", "yes")
+
+    print("google_meet install")
+    print("-------------------")
+
+    # 1) pip deps — always safe, venv-scoped.
+    pip_pkgs = ["playwright", "websockets"]
+    print(f"\n[1/3] pip install: {' '.join(pip_pkgs)}")
+    try:
+        res = _sp.run(
+            [sys.executable, "-m", "pip", "install", "--upgrade", *pip_pkgs],
+            check=False,
+        )
+        if res.returncode != 0:
+            print("  pip install failed")
+            return 1
+    except Exception as e:
+        print(f"  pip install failed: {e}")
+        return 1
+
+    # 2) Playwright browsers — pulls chromium (~300MB first run).
+    print("\n[2/3] python -m playwright install chromium")
+    try:
+        res = _sp.run(
+            [sys.executable, "-m", "playwright", "install", "chromium"],
+            check=False,
+        )
+        if res.returncode != 0:
+            print("  playwright install failed (may already be installed)")
+    except Exception as e:
+        print(f"  playwright install failed: {e}")
+        return 1
+
+    # 3) Platform audio deps for realtime mode.
+    if realtime:
+        print("\n[3/3] realtime audio deps")
+        if system == "Linux":
+            if _shutil.which("paplay") and _shutil.which("pactl"):
+                print("  pulseaudio-utils already installed.")
+            else:
+                if not _confirm(
+                    "  install pulseaudio-utils? this runs `sudo apt-get install -y pulseaudio-utils`"
+                ):
+                    print("  skipped (you can run it manually later)")
+                else:
+                    cmd = ["sudo", "apt-get", "install", "-y", "pulseaudio-utils"]
+                    print(f"  $ {' '.join(cmd)}")
+                    res = _sp.run(cmd, check=False)
+                    if res.returncode != 0:
+                        print("  apt install failed — install pulseaudio-utils manually")
+        elif system == "Darwin":
+            have_bh = False
+            try:
+                out = _sp.check_output(["system_profiler", "SPAudioDataType"], text=True)
+                have_bh = "BlackHole" in out
+            except Exception:
+                pass
+            have_ffmpeg = bool(_shutil.which("ffmpeg"))
+            needs = []
+            if not have_bh:
+                needs.append("blackhole-2ch")
+            if not have_ffmpeg:
+                needs.append("ffmpeg")
+            if not needs:
+                print("  BlackHole and ffmpeg already installed.")
+            elif not _shutil.which("brew"):
+                print(
+                    "  missing: " + ", ".join(needs) + "\n"
+                    "  install Homebrew first (https://brew.sh) or install the packages manually."
+                )
+            else:
+                if not _confirm(f"  install via brew: {' '.join(needs)}?"):
+                    print("  skipped (you can run it manually later)")
+                else:
+                    cmd = ["brew", "install", *needs]
+                    print(f"  $ {' '.join(cmd)}")
+                    res = _sp.run(cmd, check=False)
+                    if res.returncode != 0:
+                        print("  brew install failed — install them manually")
+            print(
+                "\n  NOTE: macOS does not auto-route audio. Open\n"
+                "    System Settings → Sound → Input\n"
+                "  and select 'BlackHole 2ch' before starting a realtime meeting.\n"
+                "  hermes will not switch your default input for you."
+            )
+    else:
+        print("\n[3/3] skipped (pass --realtime to install audio tooling too)")
+
+    print("\ndone. verify with: hermes meet setup")
+    return 0
+
+
+def _cmd_auth() -> int:
+    """Open a headed Chromium, let the user sign in, save storage_state."""
+    try:
+        from playwright.sync_api import sync_playwright
+    except ImportError:
+        print(
+            "playwright is not installed. run:\n"
+            "  pip install playwright && python -m playwright install chromium"
+        )
+        return 1
+
+    path = _auth_state_path()
+    path.parent.mkdir(parents=True, exist_ok=True)
+
+    print("opening Chromium: sign in to Google, then return here and press Enter.")
+    print(f"saving storage state to: {path}")
+    try:
+        with sync_playwright() as pw:
+            browser = pw.chromium.launch(headless=False)
+            context = browser.new_context()
+            page = context.new_page()
+            page.goto("https://accounts.google.com/", wait_until="domcontentloaded")
+            try:
+                input("press Enter after you've signed in ... ")
+            except EOFError:
+                pass
+            context.storage_state(path=str(path))
+            browser.close()
+    except Exception as e:
+        print(f"auth failed: {e}")
+        return 1
+    print("saved. you can now run: hermes meet join <url>")
+    return 0
+
+
+def _cmd_join(
+    url: str,
+    *,
+    guest_name: str,
+    duration: Optional[str],
+    headed: bool,
+    mode: str = "transcribe",
+    node: Optional[str] = None,
+) -> int:
+    if not _is_safe_meet_url(url):
+        print(f"refusing: not a meet.google.com URL: {url}")
+        return 2
+    if node:
+        # Remote: go through NodeClient.
+        try:
+            from plugins.google_meet.node.registry import NodeRegistry
+            from plugins.google_meet.node.client import NodeClient
+        except ImportError as e:
+            print(f"node module unavailable: {e}")
+            return 1
+        reg = NodeRegistry()
+        entry = reg.resolve(node if node != "auto" else None)
+        if entry is None:
+            print(f"no registered node matches {node!r}")
+            return 1
+        client = NodeClient(url=entry["url"], token=entry["token"])
+        try:
+            res = client.start_bot(
+                url=url, guest_name=guest_name, duration=duration,
+                headed=headed, mode=mode,
+            )
+        except Exception as e:
+            print(f"remote start_bot failed: {e}")
+            return 1
+        print(json.dumps({"node": entry.get("name"), **res}, indent=2))
+        return 0 if res.get("ok") else 1
+
+    auth = _auth_state_path()
+    res = pm.start(
+        url=url,
+        headed=headed,
+        guest_name=guest_name,
+        duration=duration,
+        auth_state=str(auth) if auth.is_file() else None,
+        mode=mode,
+    )
+    print(json.dumps(res, indent=2))
+    return 0 if res.get("ok") else 1
+
+
+def _cmd_say(text: str, node: Optional[str] = None) -> int:
+    if not (text or "").strip():
+        print("refusing: empty text")
+        return 2
+    if node:
+        try:
+            from plugins.google_meet.node.registry import NodeRegistry
+            from plugins.google_meet.node.client import NodeClient
+        except ImportError as e:
+            print(f"node module unavailable: {e}")
+            return 1
+        reg = NodeRegistry()
+        entry = reg.resolve(node if node != "auto" else None)
+        if entry is None:
+            print(f"no registered node matches {node!r}")
+            return 1
+        client = NodeClient(url=entry["url"], token=entry["token"])
+        try:
+            res = client.say(text)
+        except Exception as e:
+            print(f"remote say failed: {e}")
+            return 1
+        print(json.dumps({"node": entry.get("name"), **res}, indent=2))
+        return 0 if res.get("ok") else 1
+
+    res = pm.enqueue_say(text)
+    print(json.dumps(res, indent=2))
+    return 0 if res.get("ok") else 1
+
+
+def _cmd_status() -> int:
+    res = pm.status()
+    print(json.dumps(res, indent=2))
+    return 0 if res.get("ok") else 1
+
+
+def _cmd_transcript(last: Optional[int]) -> int:
+    res = pm.transcript(last=last)
+    if not res.get("ok"):
+        print(json.dumps(res, indent=2))
+        return 1
+    for ln in res.get("lines", []):
+        print(ln)
+    return 0
+
+
+def _cmd_stop() -> int:
+    res = pm.stop(reason="hermes meet stop")
+    print(json.dumps(res, indent=2))
+    return 0 if res.get("ok") else 1
+
+
+if __name__ == "__main__":  # pragma: no cover
+    parser = argparse.ArgumentParser(prog="hermes meet")
+    register_cli(parser)
+    ns = parser.parse_args()
+    sys.exit(meet_command(ns))
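Review note on the URL gate: `_cmd_join` above refuses anything that fails `_is_safe_meet_url`, and the bot applies the same check again before launching. The sketch below copies the `MEET_URL_RE` pattern from `meet_bot.py` into a standalone script to show which shapes pass and which are rejected. The function name `is_safe_meet_url` and the sample URL lists are illustrative assumptions for this note, not part of the plugin or its test suite.

```python
import re

# Pattern copied verbatim from MEET_URL_RE in meet_bot.py.
MEET_URL_RE = re.compile(
    r"^https://meet\.google\.com/("
    r"[a-z0-9]{3,}-[a-z0-9]{3,}-[a-z0-9]{3,}"
    r"|lookup/[^/?#]+"
    r"|new"
    r")(?:[/?#].*)?$"
)


def is_safe_meet_url(url: object) -> bool:
    """Standalone stand-in for the plugin's _is_safe_meet_url."""
    if not isinstance(url, str):
        return False
    return bool(MEET_URL_RE.match(url.strip()))


# Hypothetical sample inputs, chosen to exercise each branch of the pattern.
ACCEPTED = [
    "https://meet.google.com/abc-defg-hij",
    "https://meet.google.com/abc-defg-hij?authuser=0",
    "https://meet.google.com/lookup/xyz123",
    "https://meet.google.com/new",
]
REJECTED = [
    "http://meet.google.com/abc-defg-hij",                # not https
    "https://meet.google.com.evil.example/abc-defg-hij",  # lookalike host
    "https://meet.google.com/newsletter",                 # 'new' must end the segment
    "https://meet.google.com/ABC-DEFG-HIJ",               # pattern is lowercase-only
    "https://example.com/?next=https://meet.google.com/abc-defg-hij",
    "https://meet.google.com/",                           # no meeting code
]

if __name__ == "__main__":
    for u in ACCEPTED + REJECTED:
        print(f"{is_safe_meet_url(u)!s:5}  {u}")
```

Because the pattern is anchored at both ends and requires `/` immediately after the host, lookalike hosts and embedded Meet URLs fail without any separate hostname parsing. Uppercase meeting codes are also rejected as written; whether that is intended is a question for the author.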
@@ -0,0 +1,852 @@
+"""Headless Google Meet bot — Playwright + live-caption scraping.
+
+Runs as a standalone subprocess spawned by ``process_manager.py``. Reads config
+from env vars, writes status + transcript to files under
+``$HERMES_HOME/workspace/meetings/<meeting-id>/``. The main hermes process
+reads those files via the ``meet_*`` tools — no IPC beyond filesystem.
+
+The scraping strategy mirrors OpenUtter (sumansid/openutter): we don't parse
+WebRTC audio, we enable Google Meet's built-in live captions and observe the
+captions container in the DOM via a MutationObserver. This is lossy and
+English-biased, but it has three advantages:
+
+* it is deterministic (no API keys, no STT billing),
+* it works behind Meet's normal login / admission,
+* it survives Meet UI rewrites fairly well because the caption container
+  has a stable ARIA role.
+
+Run standalone for debugging::
+
+    HERMES_MEET_URL=https://meet.google.com/abc-defg-hij \\
+    HERMES_MEET_OUT_DIR=/tmp/meet-debug \\
+    HERMES_MEET_HEADED=1 \\
+    python -m plugins.google_meet.meet_bot
+
+No meet.google.com URL → exits non-zero. Any URL that doesn't start with
+``https://meet.google.com/`` is rejected (explicit-by-design).
+"""
+
+from __future__ import annotations
+
+import json
+import os
+import re
+import signal
+import sys
+import threading
+import time
+from pathlib import Path
+from typing import Optional
+
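+# Match ``https://meet.google.com/abc-defg-hij``, ``.../lookup/<id>``, or
+# ``.../new``: the short three-segment code, a lookup URL, or the new-meeting
+# shortcut, each optionally followed by a path, query, or fragment. Anything
+# else is rejected.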
+MEET_URL_RE = re.compile(
+    r"^https://meet\.google\.com/("
+    r"[a-z0-9]{3,}-[a-z0-9]{3,}-[a-z0-9]{3,}"
+    r"|lookup/[^/?#]+"
+    r"|new"
+    r")(?:[/?#].*)?$"
+)
+
+
+# Filenames the bot reads/writes in ``HERMES_MEET_OUT_DIR``.
+SAY_QUEUE_FILENAME = "say_queue.jsonl"
+SAY_PCM_FILENAME = "speaker.pcm"
+
+
+def _is_safe_meet_url(url: str) -> bool:
+    """Return True if *url* is a Google Meet URL we're willing to navigate to."""
+    if not isinstance(url, str):
+        return False
+    return bool(MEET_URL_RE.match(url.strip()))
+
+
+def _meeting_id_from_url(url: str) -> str:
+    """Extract the 3-segment meeting code from a Meet URL.
+
+    For ``https://meet.google.com/abc-defg-hij`` → ``abc-defg-hij``.
+    For ``.../lookup/<id>`` or ``/new`` we fall back to a timestamped id. The
+    bot won't know the real code until after the redirect, and callers only
+    use this value in file paths.
+    """
+    m = re.search(
+        r"meet\.google\.com/([a-z0-9]{3,}-[a-z0-9]{3,}-[a-z0-9]{3,})",
+        url or "",
+    )
+    if m:
+        return m.group(1)
+    return f"meet-{int(time.time())}"
+
+
+# ---------------------------------------------------------------------------
+# Status + transcript file writers
+# ---------------------------------------------------------------------------
+
+class _BotState:
+    """Single-process mutable state, flushed to ``status.json`` on each change."""
+
+    def __init__(self, out_dir: Path, meeting_id: str, url: str):
+        self.out_dir = out_dir
+        self.meeting_id = meeting_id
+        self.url = url
+        self.in_call = False
+        self.captioning = False
+        self.captions_enabled_attempted = False
+        self.lobby_waiting = False
+        self.join_attempted_at: Optional[float] = None
+        self.joined_at: Optional[float] = None
+        self.last_caption_at: Optional[float] = None
+        self.transcript_lines = 0
+        self.error: Optional[str] = None
+        self.exited = False
+        # v2 realtime fields.
+        self.realtime = False
+        self.realtime_ready = False
+        self.realtime_device: Optional[str] = None
+        self.audio_bytes_out: int = 0
+        self.last_audio_out_at: Optional[float] = None
+        self.last_barge_in_at: Optional[float] = None
+        self.leave_reason: Optional[str] = None
+        # Dedup keys for captions already written to the transcript. Each
+        # key is the string "<speaker>|<text>".
+        self._seen: set = set()
+        out_dir.mkdir(parents=True, exist_ok=True)
+        self.transcript_path = out_dir / "transcript.txt"
+        self.status_path = out_dir / "status.json"
+        self._flush()
+
+    # -------- transcript ------------------------------------------------
+
+    def record_caption(self, speaker: str, text: str) -> None:
+        """Append a caption line if we haven't seen this exact (speaker, text)."""
+        speaker = (speaker or "").strip() or "Unknown"
+        text = (text or "").strip()
+        if not text:
+            return
+        key = f"{speaker}|{text}"
+        if key in self._seen:
+            return
+        self._seen.add(key)
+        self.transcript_lines += 1
+        self.last_caption_at = time.time()
+        ts = time.strftime("%H:%M:%S", time.localtime(self.last_caption_at))
+        line = f"[{ts}] {speaker}: {text}\n"
+        # Atomic-ish append — good enough for a single-writer.
+        with self.transcript_path.open("a", encoding="utf-8") as f:
+            f.write(line)
+        self._flush()
+
+    # -------- status file ----------------------------------------------
+
+    def _flush(self) -> None:
+        data = {
+            "meetingId": self.meeting_id,
+            "url": self.url,
+            "inCall": self.in_call,
+            "captioning": self.captioning,
+            "captionsEnabledAttempted": self.captions_enabled_attempted,
+            "lobbyWaiting": self.lobby_waiting,
+            "joinAttemptedAt": self.join_attempted_at,
+            "joinedAt": self.joined_at,
+            "lastCaptionAt": self.last_caption_at,
+            "transcriptLines": self.transcript_lines,
+            "transcriptPath": str(self.transcript_path),
+            "error": self.error,
+            "exited": self.exited,
+            "pid": os.getpid(),
+            # v2 realtime telemetry.
+            "realtime": self.realtime,
+            "realtimeReady": self.realtime_ready,
+            "realtimeDevice": self.realtime_device,
+            "audioBytesOut": self.audio_bytes_out,
+            "lastAudioOutAt": self.last_audio_out_at,
+            "lastBargeInAt": self.last_barge_in_at,
+            "leaveReason": self.leave_reason,
+        }
+        tmp = self.status_path.with_suffix(".json.tmp")
+        tmp.write_text(json.dumps(data, indent=2), encoding="utf-8")
+        tmp.replace(self.status_path)
+
+    def set(self, **kwargs) -> None:
+        for k, v in kwargs.items():
+            setattr(self, k, v)
+        self._flush()
+
+
+# ---------------------------------------------------------------------------
+# Playwright bot entry point
+# ---------------------------------------------------------------------------
+
+# JavaScript injected into the Meet tab to observe captions. Captures
+# {speaker, text} tuples via a MutationObserver on the caption container,
+# and exposes ``window.__hermesMeetDrain()`` to pull new entries. This
+# mirrors the OpenUtter caption scraping approach.
+_CAPTION_OBSERVER_JS = r"""
+(() => {
+  if (window.__hermesMeetInstalled) return;
+  window.__hermesMeetInstalled = true;
+  window.__hermesMeetQueue = [];
+
+  const captionSelector = '[role="region"][aria-label*="aption" i], ' +
+                          'div[jsname="YSxPC"], ' +  // legacy
+                          'div[jsname="tgaKEf"]';    // current (Apr 2026)
+
+  function pushEntry(speaker, text) {
+    if (!text || !text.trim()) return;
+    window.__hermesMeetQueue.push({
+      ts: Date.now(),
+      speaker: (speaker || '').trim(),
+      text: text.trim(),
+    });
+  }
+
+  function scan(root) {
+    // Meet captions render as a list of rows; each row contains a speaker
+    // label and a text block. Selectors vary across Meet rewrites; we try
+    // a few shapes and fall back to raw text.
+    const rows = root.querySelectorAll('div[jsname="dsyhDe"], div.CNusmb, div.TBMuR');
+    if (rows.length) {
+      rows.forEach((row) => {
+        const spkEl = row.querySelector('div.KcIKyf, div.zs7s8d, span[jsname="YSxPC"]');
+        const txtEl = row.querySelector('div.bh44bd, span[jsname="tgaKEf"], div.iTTPOb');
+        const speaker = spkEl ? spkEl.innerText : '';
+        const text = txtEl ? txtEl.innerText : row.innerText;
+        pushEntry(speaker, text);
+      });
+      return;
+    }
+    // Fallback: treat the last non-empty line of the region's innerText as
+    // one anonymous caption.
+    const text = (root.innerText || '').split('\n').filter(Boolean).pop();
+    pushEntry('', text);
+  }
+
+  function attach() {
+    const el = document.querySelector(captionSelector);
+    if (!el) return false;
+    const obs = new MutationObserver(() => scan(el));
+    obs.observe(el, { childList: true, subtree: true, characterData: true });
+    scan(el);
+    return true;
+  }
+
+  // Try now and retry on interval — the caption region only appears after
+  // captions are enabled and someone speaks.
+  if (!attach()) {
+    const iv = setInterval(() => { if (attach()) clearInterval(iv); }, 1500);
+  }
+
+  window.__hermesMeetDrain = () => {
+    const out = window.__hermesMeetQueue.slice();
+    window.__hermesMeetQueue = [];
+    return out;
+  };
+})();
+"""
+
+
+def _enable_captions_js() -> str:
+    """Return a small JS snippet that tries to turn on captions.
+
+    Best-effort: Meet's caption toggle is keyboard-accessible via ``c``, so
+    the snippet dispatches that keystroke instead of clicking the 'Turn on
+    captions' button. Real click targeting is too brittle to rely on.
+    """
+    return r"""
+    (() => {
+      const ev = new KeyboardEvent('keydown', {
+        key: 'c', code: 'KeyC', keyCode: 67, which: 67, bubbles: true,
+      });
+      document.body.dispatchEvent(ev);
+      return true;
+    })();
+    """
+
+
+def _start_realtime_speaker(
+    *,
+    rt: dict,
+    out_dir: Path,
+    bridge_info: dict,
+    api_key: str,
+    model: str,
+    voice: str,
+    instructions: str,
+    stop_flag: dict,
+    state: "_BotState",
+) -> None:
+    """Wire up the OpenAI Realtime session + speaker thread + PCM pump.
+
+    The speaker thread reads text lines from ``say_queue.jsonl``, sends each
+    to OpenAI Realtime, and writes PCM audio into ``speaker.pcm``. A
+    separate *pump* thread forwards that PCM into the OS audio sink so
+    Chrome's fake mic picks it up. On Linux we run ``paplay`` against the
+    null-sink; on macOS we run ``ffmpeg`` against the BlackHole device, which
+    the user is expected to have selected as the default input.
+    """
+    try:
+        from plugins.google_meet.realtime.openai_client import (
+            RealtimeSession,
+            RealtimeSpeaker,
+        )
+    except Exception as e:
+        state.set(error=f"realtime import failed: {e}")
+        return
+
+    pcm_path = out_dir / SAY_PCM_FILENAME
+    queue_path = out_dir / SAY_QUEUE_FILENAME
+    processed_path = out_dir / "say_processed.jsonl"
+    # Reset the sink file so we start clean each session.
+    pcm_path.write_bytes(b"")
+    # Make sure the queue exists so the speaker poller doesn't error on
+    # first iteration.
+    queue_path.touch()
+
+    try:
+        session = RealtimeSession(
+            api_key=api_key,
+            model=model,
+            voice=voice,
+            instructions=instructions,
+            audio_sink_path=pcm_path,
+            sample_rate=24000,
+        )
+        session.connect()
+    except Exception as e:
+        state.set(error=f"realtime connect failed: {e}")
+        return
+
+    rt["session"] = session
+
+    def _stop_fn():
+        return stop_flag.get("stop", False)
+
+    rt["speaker_stop"] = lambda: stop_flag.__setitem__("stop", True)
+
+    speaker = RealtimeSpeaker(
+        session=session,
+        queue_path=queue_path,
+        processed_path=processed_path,
+    )
+
+    def _speaker_loop():
+        try:
+            speaker.run_until_stopped(_stop_fn)
+        except Exception as e:
+            state.set(error=f"realtime speaker crashed: {e}")
+
+    t_speaker = threading.Thread(target=_speaker_loop, name="meet-speaker", daemon=True)
+    t_speaker.start()
+    rt["speaker_thread"] = t_speaker
+
+    # PCM pump: feeds speaker.pcm (24kHz s16le mono) into the OS audio
+    # device that Chrome's fake mic reads from. Different tools per
+    # platform, but the contract is the same — block-read the growing
+    # PCM file and stream it to the device in near-real-time.
+    platform_tag = (bridge_info or {}).get("platform")
+    if platform_tag == "linux":
+        import subprocess as _sp
+
+        sink = (bridge_info or {}).get("write_target") or "hermes_meet_sink"
+        try:
+            proc = _sp.Popen(
+                [
+                    "paplay",
+                    "--raw",
+                    "--rate=24000",
+                    "--format=s16le",
+                    "--channels=1",
+                    f"--device={sink}",
+                    str(pcm_path),
+                ],
+                stdin=_sp.DEVNULL,
+                stdout=_sp.DEVNULL,
+                stderr=_sp.DEVNULL,
+            )
+            rt["pcm_pump"] = proc
+        except FileNotFoundError:
+            state.set(error="paplay not found — install pulseaudio-utils for realtime on Linux")
+    elif platform_tag == "darwin":
+        # macOS: use ffmpeg to read speaker.pcm and write it to the
+        # BlackHole output device. The user must have BlackHole selected
+        # as the default input in System Settings → Sound for Chrome to
+        # pick it up. We use ffmpeg because it's scriptable and can target
+        # an output device by index; if ffmpeg is absent we record an
+        # error in status.json rather than falling back to another tool.
+        import shutil as _shutil
+        import subprocess as _sp
+
+        device_name = (bridge_info or {}).get("write_target") or "BlackHole 2ch"
+        if _shutil.which("ffmpeg"):
+            try:
+                # -re: read input at its native rate.
+                # -f s16le -ar 24000 -ac 1 -i <pcm>: interpret the file as
+                #   raw 24kHz signed 16-bit mono PCM.
+                # -f audiotoolbox -audio_device_index <n>: write to the
+                #   output device with that index. Without the index,
+                #   ffmpeg's audiotoolbox output picks the current default
+                #   output device, which isn't what we want.
+                proc = _sp.Popen(
+                    [
+                        "ffmpeg",
+                        "-nostdin", "-hide_banner", "-loglevel", "error",
+                        "-re",
+                        "-f", "s16le", "-ar", "24000", "-ac", "1",
+                        "-i", str(pcm_path),
+                        "-f", "audiotoolbox",
+                        "-audio_device_index", _mac_audio_device_index(device_name),
+                        "-",
+                    ],
+                    stdin=_sp.DEVNULL,
+                    stdout=_sp.DEVNULL,
+                    stderr=_sp.DEVNULL,
+                )
+                rt["pcm_pump"] = proc
+            except FileNotFoundError:
+                state.set(error="ffmpeg not found — install via `brew install ffmpeg` for realtime on macOS")
+            except Exception as e:
+                state.set(error=f"macOS pcm pump failed to start: {e}")
+        else:
+            state.set(error="ffmpeg not found — install via `brew install ffmpeg` for realtime on macOS")
+
+
+def _mac_audio_device_index(device_name: str) -> str:
+    """Return the ffmpeg ``-audio_device_index`` for *device_name*, as a string.
+
+    Probes ``ffmpeg -f avfoundation -list_devices true -i ''`` (which prints
+    the device table on stderr) and matches *device_name* case-insensitively.
+    Defaults to ``"0"`` if the device can't be found — caller will get a
+    misrouted stream but not a crash, and the error will be obvious.
+    """
+    import subprocess as _sp
+
+    try:
+        out = _sp.run(
+            ["ffmpeg", "-f", "avfoundation", "-list_devices", "true", "-i", ""],
+            capture_output=True,
+            text=True,
+            timeout=10,
+        )
+    except Exception:
+        return "0"
+    # ffmpeg prints the table on stderr. Lines look like:
+    #   [AVFoundation indev @ 0x...] [0] BlackHole 2ch
+    import re as _re
+
+    needle = device_name.strip().lower()
+    for line in (out.stderr or "").splitlines():
+        m = _re.search(r"\[(\d+)\]\s+(.+)$", line)
+        if not m:
+            continue
+        if m.group(2).strip().lower() == needle:
+            return m.group(1)
+    return "0"
+
+
+def run_bot() -> int:  # noqa: C901 — orchestration, explicit branches
+    url = os.environ.get("HERMES_MEET_URL", "").strip()
+    out_dir_env = os.environ.get("HERMES_MEET_OUT_DIR", "").strip()
+    headed = os.environ.get("HERMES_MEET_HEADED", "").lower() in ("1", "true", "yes")
+    auth_state = os.environ.get("HERMES_MEET_AUTH_STATE", "").strip()
+    guest_name = os.environ.get("HERMES_MEET_GUEST_NAME", "Hermes Agent")
+    duration_s = _parse_duration(os.environ.get("HERMES_MEET_DURATION", ""))
+    # v2: optional realtime mode. Enabled when HERMES_MEET_MODE=realtime.
+    mode = os.environ.get("HERMES_MEET_MODE", "transcribe").strip().lower()
+    realtime_model = os.environ.get("HERMES_MEET_REALTIME_MODEL", "gpt-realtime")
+    realtime_voice = os.environ.get("HERMES_MEET_REALTIME_VOICE", "alloy")
+    realtime_instructions = os.environ.get("HERMES_MEET_REALTIME_INSTRUCTIONS", "")
+    realtime_api_key = os.environ.get("HERMES_MEET_REALTIME_KEY") or os.environ.get("OPENAI_API_KEY", "")
+
+    if not url or not _is_safe_meet_url(url):
+        sys.stderr.write(
+            "google_meet bot: refusing to launch — HERMES_MEET_URL must be a "
+            "meet.google.com URL. got: %r\n" % url
+        )
+        return 2
+    if not out_dir_env:
+        sys.stderr.write("google_meet bot: HERMES_MEET_OUT_DIR is required\n")
+        return 2
+
+    out_dir = Path(out_dir_env)
+    meeting_id = _meeting_id_from_url(url)
+    state = _BotState(out_dir=out_dir, meeting_id=meeting_id, url=url)
+
+    # SIGTERM → exit cleanly so the parent ``meet_leave`` gets a finalized
+    # transcript. We set a flag instead of raising so the Playwright context
+    # teardown runs in the finally block below.
+    stop_flag = {"stop": False}
+
+    def _on_signal(_sig, _frame):
+        stop_flag["stop"] = True
+
+    signal.signal(signal.SIGTERM, _on_signal)
+    signal.signal(signal.SIGINT, _on_signal)
+
+    # v2 realtime: provision virtual audio device + start speaker thread.
+    # We track these in a dict so the finally block can tear them down
+    # regardless of how we exit. If anything in the realtime setup fails we
+    # fall back to transcribe mode with a status flag.
+    rt = {
+        "enabled": mode == "realtime",
+        "bridge": None,            # AudioBridge | None
+        "bridge_info": None,       # dict | None
+        "session": None,           # RealtimeSession | None
+        "speaker_thread": None,    # threading.Thread | None
+        "speaker_stop": None,      # callable | None
+    }
+    if rt["enabled"]:
+        if not realtime_api_key:
+            state.set(error="realtime mode requested but no API key in HERMES_MEET_REALTIME_KEY/OPENAI_API_KEY — falling back to transcribe")
+            rt["enabled"] = False
+        else:
+            try:
+                from plugins.google_meet.audio_bridge import AudioBridge
+                bridge = AudioBridge()
+                rt["bridge_info"] = bridge.setup()
+                rt["bridge"] = bridge
+                state.set(realtime=True, realtime_device=rt["bridge_info"].get("device_name"))
+            except Exception as e:
+                state.set(error=f"audio bridge setup failed: {e} — falling back to transcribe")
+                rt["enabled"] = False
+
+    try:
+        from playwright.sync_api import sync_playwright
+    except ImportError as e:
+        state.set(error=f"playwright not installed: {e}", exited=True)
+        sys.stderr.write(
+            "google_meet bot: playwright is not installed. Run "
+            "`pip install playwright && python -m playwright install chromium`\n"
+        )
+        if rt["bridge"]:
+            rt["bridge"].teardown()
+        return 3
+
+    # Chrome env: if realtime is live on Linux, point PULSE_SOURCE at the
+    # virtual source so Chrome's fake mic reads the audio we generate.
+    chrome_env = os.environ.copy()
+    chrome_args = [
+        "--use-fake-ui-for-media-stream",
+        "--disable-blink-features=AutomationControlled",
+    ]
+    if not rt["enabled"]:
+        # v1-style fake device (silence) — we don't care about mic content
+        # when we're not speaking.
+        chrome_args.insert(1, "--use-fake-device-for-media-stream")
+    elif rt["bridge_info"] and rt["bridge_info"].get("platform") == "linux":
+        chrome_env["PULSE_SOURCE"] = rt["bridge_info"].get("device_name", "")
+
+    try:
+        with sync_playwright() as pw:
+            # Pass the adjusted environment (including PULSE_SOURCE, when
+            # set) to the browser process via launch(env=...).
+            browser = pw.chromium.launch(
+                headless=not headed,
+                args=chrome_args,
+                env=chrome_env,
+            )
+            context_args = {
+                "viewport": {"width": 1280, "height": 800},
+                "user_agent": (
+                    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
+                    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
+                ),
+                "permissions": ["microphone", "camera"],
+            }
+            if auth_state and Path(auth_state).is_file():
+                context_args["storage_state"] = auth_state
+            context = browser.new_context(**context_args)
+            page = context.new_page()
+
+            try:
+                page.goto(url, wait_until="domcontentloaded", timeout=30_000)
+            except Exception as e:
+                state.set(error=f"navigate failed: {e}", exited=True)
+                return 4
+
+            # Guest-mode: Meet shows a name field before "Ask to join". When
+            # we're authed, we instead see "Join now".
+            _try_guest_name(page, guest_name)
+            _click_join(page, state)
+
+            # Install caption observer and attempt to enable captions.
+            try:
+                page.evaluate(_enable_captions_js())
+                state.set(captions_enabled_attempted=True)
+            except Exception:
+                pass
+            try:
+                page.evaluate(_CAPTION_OBSERVER_JS)
+            except Exception as e:
+                state.set(error=f"caption observer install failed: {e}")
+
+            # Note: in_call=False until admission is confirmed (we detect
+            # either the Leave button or the caption region, signalling we
+            # made it past the lobby).
+            state.set(captioning=True, join_attempted_at=time.time())
+
+            # v2 realtime: start the speaker thread reading from the
+            # plugin-side say queue. The thread reads JSONL lines written by
+            # meet_say, calls OpenAI Realtime, and streams the audio PCM to
+            # the virtual sink that Chrome's fake-mic is pointed at.
+            if rt["enabled"]:
+                _start_realtime_speaker(
+                    rt=rt,
+                    out_dir=out_dir,
+                    bridge_info=rt["bridge_info"],
+                    api_key=realtime_api_key,
+                    model=realtime_model,
+                    voice=realtime_voice,
+                    instructions=realtime_instructions,
+                    stop_flag=stop_flag,
+                    state=state,
+                )
+                if rt["session"] is not None:
+                    state.set(realtime_ready=True)
+
+            # Admission + drain loop. Runs until SIGTERM, duration expiry,
+            # or the page detects "You were removed / you left the
+            # meeting". Responsible for:
+            #   * detecting admission (Leave button visible → in_call=True)
+            #   * timing out stuck-in-lobby (default 5 minutes)
+            #   * draining scraped captions into the transcript
+            #   * triggering realtime barge-in when a human speaks while
+            #     the bot is generating audio
+            #   * periodically flushing realtime counters into status.json
+            deadline = (time.time() + duration_s) if duration_s else None
+            lobby_deadline = time.time() + float(
+                os.environ.get("HERMES_MEET_LOBBY_TIMEOUT", "300")
+            )
+            last_admission_check = 0.0
+            while not stop_flag["stop"]:
+                now = time.time()
+                if deadline and now > deadline:
+                    state.set(leave_reason="duration_expired")
+                    break
+
+                # Admission detection every ~3s until admitted.
+                if not state.in_call and (now - last_admission_check) > 3.0:
+                    last_admission_check = now
+                    admitted = _detect_admission(page)
+                    if admitted:
+                        state.set(
+                            in_call=True,
+                            lobby_waiting=False,
+                            joined_at=now,
+                        )
+                    elif now > lobby_deadline:
+                        state.set(
+                            error=(
+                                "lobby timeout — host never admitted the bot "
+                                f"within {int(lobby_deadline - state.join_attempted_at) if state.join_attempted_at else 0}s"
+                            ),
+                            leave_reason="lobby_timeout",
+                        )
+                        break
+                    elif _detect_denied(page):
+                        state.set(
+                            error="host denied admission",
+                            leave_reason="denied",
+                        )
+                        break
+
+                try:
+                    queued = page.evaluate("window.__hermesMeetDrain && window.__hermesMeetDrain()")
+                    if isinstance(queued, list):
+                        for entry in queued:
+                            if not isinstance(entry, dict):
+                                continue
+                            speaker = str(entry.get("speaker", ""))
+                            text = str(entry.get("text", ""))
+                            state.record_caption(speaker=speaker, text=text)
+                            # Barge-in: if the bot is currently generating
+                            # audio AND a real human just spoke, cancel the
+                            # in-flight response so we don't talk over them.
+                            if rt["enabled"] and rt["session"] is not None:
+                                if _looks_like_human_speaker(speaker, guest_name):
+                                    try:
+                                        cancelled = rt["session"].cancel_response()
+                                        if cancelled:
+                                            state.set(last_barge_in_at=now)
+                                    except Exception:
+                                        pass
+                except Exception:
+                    # Meet reloaded or we got booted — try to detect and
+                    # exit gracefully rather than spinning.
+                    if page.is_closed():
+                        state.set(leave_reason="page_closed")
+                        break
+
+                # Fold the realtime session's byte/timestamp counters into
+                # the status file so meet_status can surface them.
+                if rt["session"] is not None:
+                    state.set(
+                        audio_bytes_out=getattr(rt["session"], "audio_bytes_out", 0),
+                        last_audio_out_at=getattr(rt["session"], "last_audio_out_at", None),
+                    )
+
+                time.sleep(1.0)
+
+            # Try to leave cleanly — click "Leave call" button if present.
+            try:
+                page.evaluate(
+                    "() => { const b = document.querySelector('button[aria-label*=\"eave call\"]');"
+                    " if (b) b.click(); }"
+                )
+            except Exception:
+                pass
+
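+            # Close defensively: if Chrome already exited, an exception here
+            # would skip the realtime teardown below.
+            try:
+                context.close()
+                browser.close()
+            except Exception:
+                pass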
+            # v2: teardown realtime speaker + audio bridge.
+            if rt["speaker_stop"]:
+                try:
+                    rt["speaker_stop"]()
+                except Exception:
+                    pass
+            if rt["speaker_thread"] is not None:
+                try:
+                    rt["speaker_thread"].join(timeout=5.0)
+                except Exception:
+                    pass
+            if rt["session"]:
+                try:
+                    rt["session"].close()
+                except Exception:
+                    pass
+            if rt["bridge"]:
+                try:
+                    rt["bridge"].teardown()
+                except Exception:
+                    pass
+            state.set(in_call=False, captioning=False, exited=True)
+            return 0
+
+    except Exception as e:
+        state.set(error=f"unhandled: {e}", exited=True)
+        return 1
+
+
+def _try_guest_name(page, guest_name: str) -> None:
+    """If Meet is showing a guest-name input, type *guest_name* into it."""
+    try:
+        # Match Meet's guest-name input by aria-label (case-insensitive "name").
+        locator = page.locator('input[aria-label*="name" i]').first
+        if locator.count() and locator.is_visible():
+            locator.fill(guest_name, timeout=2_000)
+    except Exception:
+        pass
+
+
+def _detect_admission(page) -> bool:
+    """True if we're clearly past the lobby and in the call itself.
+
+    Uses a JS-side probe because Meet's DOM structure varies by client
+    version. We check several high-signal indicators and declare admission
+    on the first hit:
+
+      1. Leave-call button is present (``aria-label`` contains "eave call").
+      2. Caption region has appeared (we installed the observer and it attached).
+      3. The participant list container is visible.
+
+    Conservative by default — returns False on any error.
+    """
+    probe = r"""
+    (() => {
+      const leave = document.querySelector('button[aria-label*="eave call" i]');
+      if (leave) return true;
+      if (window.__hermesMeetInstalled) {
+        const caps = document.querySelector(
+          '[role="region"][aria-label*="aption" i], ' +
+          'div[jsname="YSxPC"], div[jsname="tgaKEf"]'
+        );
+        if (caps) return true;
+      }
+      const parts = document.querySelector('[aria-label*="articipants" i]');
+      if (parts) return true;
+      return false;
+    })();
+    """
+    try:
+        return bool(page.evaluate(probe))
+    except Exception:
+        return False
+
+
+def _detect_denied(page) -> bool:
+    """True when Meet is showing a 'you were denied' / 'no one admitted' page."""
+    probe = r"""
+    (() => {
+      const text = document.body ? document.body.innerText || '' : '';
+      // English only — matches what shows up when the host denies or
+      // removes a guest.
+      if (/You can't join this video call/i.test(text)) return true;
+      if (/You were removed from the meeting/i.test(text)) return true;
+      if (/No one responded to your request to join/i.test(text)) return true;
+      return false;
+    })();
+    """
+    try:
+        return bool(page.evaluate(probe))
+    except Exception:
+        return False
+
+
+def _looks_like_human_speaker(speaker: str, bot_guest_name: str) -> bool:
+    """Whether a caption line's speaker is probably a human, not our bot echo.
+
+    Meet attributes captions to the speaker's display name. When Chrome is
+    reading our fake mic, Meet still attributes captions to *our* bot name
+    (because the bot is the one "speaking"). We don't want those to trigger
+    barge-in. Anything else — real participant names — does.
+
+    Conservative: unknown / blank speakers (common when caption scraping
+    falls back to raw text) do NOT trigger barge-in, because we can't tell
+    whether it was a human or us.
+    """
+    if not speaker or not speaker.strip():
+        return False
+    spk = speaker.strip().lower()
+    if spk in ("unknown", "you", bot_guest_name.strip().lower()):
+        return False
+    return True
+
+
+def _click_join(page, state: _BotState) -> None:
+    """Click 'Join now' or 'Ask to join' if either button is visible.
+
+    Flags ``lobby_waiting`` when we hit the "waiting for host to admit you"
+    state so the agent can surface that in status.
+    """
+    for label in ("Join now", "Ask to join"):
+        try:
+            btn = page.get_by_role("button", name=label, exact=False).first
+            if btn.count() and btn.is_visible():
+                btn.click(timeout=3_000)
+                if label == "Ask to join":
+                    state.set(lobby_waiting=True)
+                break
+        except Exception:
+            continue
+
+
+def _parse_duration(raw: str) -> Optional[float]:
+    """Parse ``30m`` / ``2h`` / ``45s`` / ``90`` (bare seconds) → float seconds, or None."""
+    if not raw:
+        return None
+    raw = raw.strip().lower()
+    try:
+        if raw.endswith("h"):
+            return float(raw[:-1]) * 3600
+        if raw.endswith("m"):
+            return float(raw[:-1]) * 60
+        if raw.endswith("s"):
+            return float(raw[:-1])
+        return float(raw)
+    except ValueError:
+        return None
+
+
+if __name__ == "__main__":  # pragma: no cover — subprocess entry point
+    sys.exit(run_bot())
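
To illustrate the duration grammar that `_parse_duration` accepts, here is a minimal standalone sketch. It restates the same parsing rules under a hypothetical name (`parse_duration`) so it runs without the plugin installed; the sample inputs are made up for illustration.

```python
from typing import Optional


def parse_duration(raw: str) -> Optional[float]:
    """Standalone copy of the rules: h/m/s suffix, or a bare number of seconds."""
    if not raw:
        return None
    raw = raw.strip().lower()
    try:
        if raw.endswith("h"):
            return float(raw[:-1]) * 3600
        if raw.endswith("m"):
            return float(raw[:-1]) * 60
        if raw.endswith("s"):
            return float(raw[:-1])
        return float(raw)
    except ValueError:
        # Anything that is not a number (with an optional suffix) is rejected.
        return None


# Illustrative inputs only; unparseable or empty strings come back as None.
for sample in ("30m", "2h", "45s", "90", "soon", ""):
    print(repr(sample), "->", parse_duration(sample))
```

A `None` result means "no duration limit" in the main loop above, since `deadline` is only set when `duration_s` is truthy.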
@@ -0,0 +1,54 @@
+"""Remote 'node host' primitive for the google_meet plugin.
+
+Lets the Meet bot (Playwright + Chrome) run on a different machine than
+the hermes-agent gateway. The gateway speaks a small JSON-over-WebSocket
+RPC protocol to the remote node; the node wraps the existing
+``plugins.google_meet.process_manager`` API.
+
+Topology
+--------
+    gateway (Linux)  ── ws://mac.local:18789 ──▶  node server (Mac)
+                                                  └─ process_manager
+                                                     └─ meet_bot (Playwright)
+
+Why: Google sign-in + Chrome profile live on the user's laptop. Running
+the bot there reuses that profile without shipping credentials to the
+server.
+
+Public surface
+--------------
+    NodeClient     — gateway-side RPC client (short-lived sync WS per call)
+    NodeServer     — long-running server that hosts the bot
+    NodeRegistry   — local JSON registry of approved nodes (name → url+token)
+    protocol       — message envelope helpers (make_request, encode, decode, ...)
+"""
+
+from __future__ import annotations
+
+from plugins.google_meet.node import protocol
+from plugins.google_meet.node.client import NodeClient
+from plugins.google_meet.node.protocol import (
+    VALID_REQUEST_TYPES,
+    decode,
+    encode,
+    make_error,
+    make_request,
+    make_response,
+    validate_request,
+)
+from plugins.google_meet.node.registry import NodeRegistry
+from plugins.google_meet.node.server import NodeServer
+
+__all__ = [
+    "NodeClient",
+    "NodeServer",
+    "NodeRegistry",
+    "protocol",
+    "make_request",
+    "make_response",
+    "make_error",
+    "encode",
+    "decode",
+    "validate_request",
+    "VALID_REQUEST_TYPES",
+]
@@ -0,0 +1,125 @@
+"""`hermes meet node ...` subcommand tree.
+
+Wired into the existing ``hermes meet`` parser by the plugin's top-level
+CLI. This module only defines the subparsers and their dispatch — it
+does not mutate the existing cli.py.
+"""
+
+from __future__ import annotations
+
+import argparse
+import asyncio
+import json
+import sys
+from typing import Any
+
+from plugins.google_meet.node.client import NodeClient
+from plugins.google_meet.node.registry import NodeRegistry
+from plugins.google_meet.node.server import NodeServer
+
+
+def register_cli(subparser: argparse.ArgumentParser) -> None:
+    """Add ``run / list / approve / remove / status / ping`` subparsers.
+
+    *subparser* is the ``hermes meet node`` argparse object — typically
+    the result of ``meet_parser.add_parser('node', ...)``.
+    """
+    sp = subparser.add_subparsers(dest="node_cmd", required=True)
+
+    run = sp.add_parser("run", help="Start a node server on this machine.")
+    run.add_argument("--host", default="0.0.0.0")
+    run.add_argument("--port", type=int, default=18789)
+    run.add_argument("--display-name", default="hermes-meet-node")
+    run.set_defaults(func=node_command)
+
+    lst = sp.add_parser("list", help="List approved remote nodes.")
+    lst.set_defaults(func=node_command)
+
+    app = sp.add_parser("approve", help="Register a remote node on the gateway.")
+    app.add_argument("name")
+    app.add_argument("url")
+    app.add_argument("token")
+    app.set_defaults(func=node_command)
+
+    rm = sp.add_parser("remove", help="Forget a registered node.")
+    rm.add_argument("name")
+    rm.set_defaults(func=node_command)
+
+    st = sp.add_parser("status", help="Ping a registered node.")
+    st.add_argument("name")
+    st.set_defaults(func=node_command)
+
+    pg = sp.add_parser("ping", help="Alias for status.")
+    pg.add_argument("name")
+    pg.set_defaults(func=node_command)
+
+
+def node_command(args: argparse.Namespace) -> int:
+    """Dispatch for ``hermes meet node ...``.
+
+    Returns a process exit code. Side-effects print to stdout/stderr.
+    """
+    cmd = getattr(args, "node_cmd", None)
+
+    if cmd == "run":
+        server = NodeServer(
+            host=args.host,
+            port=args.port,
+            display_name=args.display_name,
+        )
+        token = server.ensure_token()
+        print(f"[meet-node] display_name={server.display_name}")
+        print(f"[meet-node] listening on ws://{args.host}:{args.port}")
+        print(f"[meet-node] token (copy to gateway): {token}")
+        print("[meet-node] approve with:")
+        print(f"             hermes meet node approve <name> ws://<host>:{args.port} {token}")
+        try:
+            asyncio.run(server.serve())
+        except KeyboardInterrupt:
+            return 0
+        except RuntimeError as exc:
+            print(f"[meet-node] error: {exc}", file=sys.stderr)
+            return 2
+        return 0
+
+    reg = NodeRegistry()
+
+    if cmd == "list":
+        nodes = reg.list_all()
+        if not nodes:
+            print("no nodes registered")
+            return 0
+        for n in nodes:
+            print(f"{n['name']}\t{n['url']}\ttoken={n['token'][:6]}…")
+        return 0
+
+    if cmd == "approve":
+        reg.add(args.name, args.url, args.token)
+        print(f"approved node {args.name!r} at {args.url}")
+        return 0
+
+    if cmd == "remove":
+        ok = reg.remove(args.name)
+        print(f"removed {args.name!r}" if ok else f"no such node: {args.name!r}")
+        return 0 if ok else 1
+
+    if cmd in ("status", "ping"):
+        entry = reg.get(args.name)
+        if entry is None:
+            print(f"no such node: {args.name!r}", file=sys.stderr)
+            return 1
+        client = NodeClient(entry["url"], entry["token"])
+        try:
+            result = client.ping()
+        except Exception as exc:  # noqa: BLE001 — surface any connection error
+            print(json.dumps({"ok": False, "error": str(exc)}))
+            return 1
+        print(json.dumps({"ok": True, "node": args.name, **_coerce_dict(result)}))
+        return 0
+
+    print(f"unknown node command: {cmd!r}", file=sys.stderr)
+    return 2
+
+
+def _coerce_dict(value: Any) -> dict:
+    return value if isinstance(value, dict) else {"result": value}
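
The wiring described in the `register_cli` docstring can be sketched as follows. This is a reduced, self-contained illustration, not the plugin's actual CLI: the parent `hermes` / `meet` parser layout is an assumption, only two of the six subcommands are shown, and `node_command` here is a stand-in that just reports what it received.

```python
import argparse


def node_command(args: argparse.Namespace) -> int:
    """Stand-in dispatcher: report the chosen subcommand and succeed."""
    print(f"node_cmd={args.node_cmd}")
    return 0


# Hypothetical parent parsers; the real ones live in the plugin's top-level CLI.
parser = argparse.ArgumentParser(prog="hermes")
top = parser.add_subparsers(dest="cmd", required=True)
meet = top.add_parser("meet")
meet_sub = meet.add_subparsers(dest="meet_cmd", required=True)
node = meet_sub.add_parser("node")

# Reduced equivalent of register_cli(node): every subparser points at the
# same dispatcher through set_defaults(func=...).
sp = node.add_subparsers(dest="node_cmd", required=True)
run = sp.add_parser("run")
run.add_argument("--port", type=int, default=18789)
run.set_defaults(func=node_command)
rm = sp.add_parser("remove")
rm.add_argument("name")
rm.set_defaults(func=node_command)

args = parser.parse_args(["meet", "node", "run", "--port", "9000"])
exit_code = args.func(args)
```

Routing every subcommand to one function keeps the exit-code handling in a single place; the dispatcher then branches on `node_cmd`.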
@@ -0,0 +1,107 @@
+"""Gateway-side RPC client for a remote meet node.
+
+Each call opens a short-lived synchronous WebSocket to the node, sends
+exactly one request, reads exactly one response, and closes. This keeps
+the client trivial to use from non-async tool handlers and avoids
+maintaining persistent connection state across agent turns.
+
+The ``websockets`` package is an optional dep — we import it lazily so
+plugin load doesn't require it.
+"""
+
+from __future__ import annotations
+
+from typing import Any, Dict, Optional
+
+from plugins.google_meet.node import protocol as _proto
+
+
+class NodeClient:
+    """Thin synchronous WS client matching the server's request surface."""
+
+    def __init__(self, url: str, token: str, timeout: float = 10.0) -> None:
+        if not isinstance(url, str) or not url:
+            raise ValueError("url must be a non-empty string")
+        if not isinstance(token, str) or not token:
+            raise ValueError("token must be a non-empty string")
+        self.url = url
+        self.token = token
+        self.timeout = float(timeout)
+
+    # ----- core RPC -----------------------------------------------------
+
+    def _rpc(self, type: str, payload: Dict[str, Any]) -> Dict[str, Any]:
+        """Send one request, return the response payload dict.
+
+        Raises RuntimeError when the server sends an ``error`` envelope,
+        the response id doesn't match, or the payload is not a dict.
+        """
+        try:
+            from websockets.sync.client import connect  # type: ignore
+        except ImportError as exc:
+            raise RuntimeError(
+                "NodeClient requires the 'websockets' package. "
+                "Install it with: pip install websockets"
+            ) from exc
+
+        req = _proto.make_request(type, self.token, payload)
+        raw_out = _proto.encode(req)
+
+        with connect(self.url, open_timeout=self.timeout,
+                     close_timeout=self.timeout) as ws:
+            ws.send(raw_out)
+            raw_in = ws.recv(timeout=self.timeout)
+
+        if isinstance(raw_in, (bytes, bytearray)):
+            raw_in = raw_in.decode("utf-8")
+        resp = _proto.decode(raw_in)
+
+        if resp.get("type") == "error":
+            raise RuntimeError(f"node error: {resp.get('error', '<unknown>')}")
+        if resp.get("id") != req["id"]:
+            raise RuntimeError(
+                f"response id mismatch: sent {req['id']}, got {resp.get('id')!r}"
+            )
+        payload_out = resp.get("payload")
+        if not isinstance(payload_out, dict):
+            # Ping returns {"type": "pong", "payload": {...}} — still a dict.
+            raise RuntimeError("response missing payload dict")
+        return payload_out
+
+    # ----- convenience methods -----------------------------------------
+
+    def start_bot(
+        self,
+        url: str,
+        guest_name: str = "Hermes Agent",
+        duration: Optional[str] = None,
+        headed: bool = False,
+        mode: str = "transcribe",
+    ) -> Dict[str, Any]:
+        payload: Dict[str, Any] = {
+            "url": url,
+            "guest_name": guest_name,
+            "headed": bool(headed),
+            "mode": mode,
+        }
+        if duration is not None:
+            payload["duration"] = duration
+        return self._rpc("start_bot", payload)
+
+    def stop(self) -> Dict[str, Any]:
+        return self._rpc("stop", {})
+
+    def status(self) -> Dict[str, Any]:
+        return self._rpc("status", {})
+
+    def transcript(self, last: Optional[int] = None) -> Dict[str, Any]:
+        payload: Dict[str, Any] = {}
+        if last is not None:
+            payload["last"] = int(last)
+        return self._rpc("transcript", payload)
+
+    def say(self, text: str) -> Dict[str, Any]:
+        return self._rpc("say", {"text": str(text)})
+
+    def ping(self) -> Dict[str, Any]:
+        return self._rpc("ping", {})
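
The response handling at the end of `_rpc` applies three checks in a fixed order. The sketch below isolates that logic as a pure function so it can be exercised without a WebSocket; `unwrap_response` is a hypothetical helper name and the sample envelopes are invented.

```python
from typing import Any, Dict


def unwrap_response(req_id: str, resp: Dict[str, Any]) -> Dict[str, Any]:
    """Check order: error envelope first, then id match, then payload shape."""
    if resp.get("type") == "error":
        raise RuntimeError(f"node error: {resp.get('error', '<unknown>')}")
    if resp.get("id") != req_id:
        raise RuntimeError(
            f"response id mismatch: sent {req_id}, got {resp.get('id')!r}"
        )
    payload = resp.get("payload")
    if not isinstance(payload, dict):
        raise RuntimeError("response missing payload dict")
    return payload


# Invented envelopes: one success, one error envelope, one mismatched id.
samples = [
    {"type": "pong", "id": "abc", "payload": {"display_name": "hermes-meet-node"}},
    {"type": "error", "id": "abc", "error": "token mismatch"},
    {"type": "response", "id": "zzz", "payload": {}},
]
outcomes = []
for resp in samples:
    try:
        outcomes.append(unwrap_response("abc", resp))
    except RuntimeError as exc:
        outcomes.append(str(exc))
print(outcomes)
```

Checking the error envelope before the id means an auth failure is reported as such even when the server could not echo a usable id.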
@@ -0,0 +1,124 @@
+"""Wire protocol for gateway ↔ node RPC.
+
+Everything is a JSON object with the same envelope shape:
+
+    Request:   {"type": <str>, "id": <str>, "token": <str>, "payload": <dict>}
+    Response:  {"type": "response", "id": <req-id>, "payload": <dict>}
+    Pong:      {"type": "pong", "id": <req-id>, "payload": <dict>}
+    Error:     {"type": "error", "id": <req-id>, "error": <str>}
+
+Requests must carry the shared bearer token (set up via
+``hermes meet node approve`` on the gateway and read off disk on the
+server). Mismatched tokens are rejected before dispatch.
+"""
+
+from __future__ import annotations
+
+import json
+import uuid
+from typing import Any, Dict, Tuple
+
+
+VALID_REQUEST_TYPES = frozenset({
+    "start_bot",
+    "stop",
+    "status",
+    "transcript",
+    "say",
+    "ping",
+})
+
+
+def make_request(
+    type: str,
+    token: str,
+    payload: Dict[str, Any],
+    req_id: str | None = None,
+) -> Dict[str, Any]:
+    """Construct a request envelope.
+
+    ``req_id`` is auto-generated (uuid4 hex) when not supplied so callers
+    can correlate async responses.
+    """
+    if not isinstance(type, str) or not type:
+        raise ValueError("type must be a non-empty string")
+    if type not in VALID_REQUEST_TYPES:
+        raise ValueError(f"unknown request type: {type!r}")
+    if not isinstance(token, str):
+        raise ValueError("token must be a string")
+    if not isinstance(payload, dict):
+        raise ValueError("payload must be a dict")
+    return {
+        "type": type,
+        "id": req_id or uuid.uuid4().hex,
+        "token": token,
+        "payload": payload,
+    }
+
+
+def make_response(req_id: str, payload: Dict[str, Any]) -> Dict[str, Any]:
+    """Build a success response envelope.
+
+    The envelope type is always the generic ``"response"``; the request
+    type is not echoed back, so clients correlate replies by ``id``.
+    """
+    if not isinstance(payload, dict):
+        raise ValueError("payload must be a dict")
+    return {"type": "response", "id": req_id, "payload": payload}
+
+
+def make_error(req_id: str, error: str) -> Dict[str, Any]:
+    return {"type": "error", "id": req_id, "error": str(error)}
+
+
+def encode(msg: Dict[str, Any]) -> str:
+    """Serialize a message envelope to a JSON string."""
+    return json.dumps(msg, separators=(",", ":"), ensure_ascii=False)
+
+
+def decode(raw: str) -> Dict[str, Any]:
+    """Parse a JSON envelope, raising ValueError on anything malformed.
+
+    Minimal type validation: must be an object, must contain ``type`` and
+    ``id``. Heavier validation (token match, payload shape) happens in
+    :func:`validate_request` on the server side.
+    """
+    try:
+        obj = json.loads(raw)
+    except (TypeError, json.JSONDecodeError) as exc:
+        raise ValueError(f"malformed JSON: {exc}") from exc
+    if not isinstance(obj, dict):
+        raise ValueError("envelope must be a JSON object")
+    if "type" not in obj or not isinstance(obj["type"], str):
+        raise ValueError("envelope missing string 'type'")
+    if "id" not in obj or not isinstance(obj["id"], str):
+        raise ValueError("envelope missing string 'id'")
+    return obj
+
+
+def validate_request(msg: Dict[str, Any], expected_token: str) -> Tuple[bool, str]:
+    """Check a decoded request against the server's shared token.
+
+    Returns ``(True, "")`` when the envelope is acceptable or
+    ``(False, <reason>)`` otherwise. Reason strings are safe to surface
+    back to the client in an error envelope.
+    """
+    if not isinstance(msg, dict):
+        return False, "envelope must be a dict"
+    t = msg.get("type")
+    if not isinstance(t, str) or not t:
+        return False, "missing or non-string 'type'"
+    if t not in VALID_REQUEST_TYPES:
+        return False, f"unknown request type: {t!r}"
+    if not isinstance(msg.get("id"), str) or not msg.get("id"):
+        return False, "missing or non-string 'id'"
+    token = msg.get("token")
+    if not isinstance(token, str) or not token:
+        return False, "missing token"
+    if token != expected_token:
+        return False, "token mismatch"
+    payload = msg.get("payload")
+    if not isinstance(payload, dict):
+        return False, "payload must be a dict"
+    return True, ""
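
A round trip through the envelope format might look like the sketch below. It is a condensed, standalone approximation of the helpers in this module rather than an import of them: `check_request` is a hypothetical, shortened version of `validate_request`, and the token and payload values are placeholders.

```python
import json
import uuid
from typing import Any, Dict, Tuple

VALID_TYPES = frozenset({"start_bot", "stop", "status", "transcript", "say", "ping"})


def check_request(msg: Dict[str, Any], expected_token: str) -> Tuple[bool, str]:
    """Condensed checks: known type, string id, matching token, dict payload."""
    t = msg.get("type")
    if not isinstance(t, str) or t not in VALID_TYPES:
        return False, "unknown request type"
    if not isinstance(msg.get("id"), str) or not msg["id"]:
        return False, "missing or non-string 'id'"
    if msg.get("token") != expected_token:
        return False, "token mismatch"
    if not isinstance(msg.get("payload"), dict):
        return False, "payload must be a dict"
    return True, ""


token = "example-token"  # placeholder shared secret, not a real token

# Gateway side: build the request and serialize it compactly.
request = {"type": "transcript", "id": uuid.uuid4().hex,
           "token": token, "payload": {"last": 5}}
wire = json.dumps(request, separators=(",", ":"), ensure_ascii=False)

# Node side: parse, validate, and answer with a response or error envelope.
decoded = json.loads(wire)
ok, reason = check_request(decoded, token)
bad_ok, bad_reason = check_request({**decoded, "token": "wrong"}, token)
reply = (
    {"type": "response", "id": decoded["id"], "payload": {"ok": True}}
    if ok
    else {"type": "error", "id": decoded["id"], "error": reason}
)
print(ok, bad_ok, bad_reason, reply["type"])
```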
@@ -0,0 +1,112 @@
+"""Local JSON registry of approved remote meet nodes.
+
+Lives at ``$HERMES_HOME/workspace/meetings/nodes.json``. The gateway
+consults it to resolve a ``chrome_node`` name to a ``(url, token)`` pair
+before opening a WebSocket to the remote bot host.
+
+Schema
+------
+    {
+      "nodes": {
+        "<name>": {
+          "url":   "ws://host:port",
+          "token": "...",
+          "added_at": <epoch_float>
+        }
+      }
+    }
+"""
+
+from __future__ import annotations
+
+import json
+import time
+from pathlib import Path
+from typing import Any, Dict, List, Optional
+
+from hermes_constants import get_hermes_home
+
+
+def _default_path() -> Path:
+    return Path(get_hermes_home()) / "workspace" / "meetings" / "nodes.json"
+
+
+class NodeRegistry:
+    """Simple file-backed registry. Not concurrent-safe across processes
+    — single writer assumed (the gateway CLI)."""
+
+    def __init__(self, path: Optional[Path] = None) -> None:
+        self.path = Path(path) if path is not None else _default_path()
+
+    # ----- storage ------------------------------------------------------
+
+    def _load(self) -> Dict[str, Any]:
+        if not self.path.is_file():
+            return {"nodes": {}}
+        try:
+            data = json.loads(self.path.read_text(encoding="utf-8"))
+        except (OSError, json.JSONDecodeError):
+            return {"nodes": {}}
+        if not isinstance(data, dict) or not isinstance(data.get("nodes"), dict):
+            return {"nodes": {}}
+        return data
+
+    def _save(self, data: Dict[str, Any]) -> None:
+        self.path.parent.mkdir(parents=True, exist_ok=True)
+        tmp = self.path.with_suffix(".json.tmp")
+        tmp.write_text(json.dumps(data, indent=2), encoding="utf-8")
+        tmp.replace(self.path)
+
+    # ----- public API ---------------------------------------------------
+
+    def get(self, name: str) -> Optional[Dict[str, Any]]:
+        data = self._load()
+        entry = data["nodes"].get(name)
+        if entry is None:
+            return None
+        return {"name": name, **entry}
+
+    def add(self, name: str, url: str, token: str) -> None:
+        if not isinstance(name, str) or not name:
+            raise ValueError("node name must be a non-empty string")
+        if not isinstance(url, str) or not url:
+            raise ValueError("url must be a non-empty string")
+        if not isinstance(token, str) or not token:
+            raise ValueError("token must be a non-empty string")
+        data = self._load()
+        data["nodes"][name] = {
+            "url": url,
+            "token": token,
+            "added_at": time.time(),
+        }
+        self._save(data)
+
+    def remove(self, name: str) -> bool:
+        data = self._load()
+        if name in data["nodes"]:
+            del data["nodes"][name]
+            self._save(data)
+            return True
+        return False
+
+    def list_all(self) -> List[Dict[str, Any]]:
+        data = self._load()
+        out: List[Dict[str, Any]] = []
+        for name, entry in sorted(data["nodes"].items()):
+            out.append({"name": name, **entry})
+        return out
+
+    def resolve(self, chrome_node: Optional[str]) -> Optional[Dict[str, Any]]:
+        """Resolve a node name to its entry.
+
+        If ``chrome_node`` is provided, return that named node (or None).
+        If ``chrome_node`` is None, return the sole registered node when
+        exactly one is registered; otherwise return None (ambiguous or
+        empty).
+        """
+        if chrome_node:
+            return self.get(chrome_node)
+        nodes = self.list_all()
+        if len(nodes) == 1:
+            return nodes[0]
+        return None
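
`_save` relies on a write-then-rename pattern so a crash mid-write cannot leave a truncated `nodes.json`. A minimal sketch of that pattern, run against a temporary directory, is shown below; `save_atomic` is a hypothetical name and the node entry is invented.

```python
import json
import tempfile
from pathlib import Path


def save_atomic(path: Path, data: dict) -> None:
    """Write JSON to a sibling temp file, then rename it over the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".json.tmp")  # nodes.json -> nodes.json.tmp
    tmp.write_text(json.dumps(data, indent=2), encoding="utf-8")
    # Same directory as the target, so the rename stays on one filesystem.
    tmp.replace(path)


with tempfile.TemporaryDirectory() as root:
    target = Path(root) / "meetings" / "nodes.json"
    save_atomic(target, {"nodes": {}})
    save_atomic(target, {"nodes": {"mac": {"url": "ws://mac.local:18789"}}})
    loaded = json.loads(target.read_text(encoding="utf-8"))
    leftovers = sorted(p.name for p in target.parent.iterdir())
    print(loaded["nodes"]["mac"]["url"], leftovers)
```

Readers see either the old file or the new one. As the class docstring notes, this does not make concurrent writers safe, since two processes could still overwrite each other's changes.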
@@ -0,0 +1,193 @@
+"""Remote node server.
+
+Runs on the machine that will host the Meet bot (typically the user's
+Mac laptop with a signed-in Chrome). Exposes a WebSocket endpoint that
+accepts token-authenticated RPC requests and dispatches them to the existing
+``plugins.google_meet.process_manager`` module.
+
+Launched by ``hermes meet node run``.
+
+Token handling
+--------------
+On first boot we mint 32 hex chars of entropy and persist them at
+``$HERMES_HOME/workspace/meetings/node_token.json``. Subsequent boots
+reuse the same token so previously-approved gateways don't need to be
+re-paired. The operator copies this token out-of-band to the gateway
+via ``hermes meet node approve <name> <url> <token>``.
+
+Dependencies
+------------
+``websockets`` is an optional dep. We import it lazily inside
+:meth:`serve` so installing the plugin doesn't require it unless you
+actually host a node.
+"""
+
+from __future__ import annotations
+
+import json
+import secrets
+import time
+from pathlib import Path
+from typing import Any, Dict, Optional
+
+from hermes_constants import get_hermes_home
+from plugins.google_meet.node import protocol as _proto
+
+
+def _default_token_path() -> Path:
+    return Path(get_hermes_home()) / "workspace" / "meetings" / "node_token.json"
+
+
+class NodeServer:
+    """WebSocket server that executes meet bot RPCs locally."""
+
+    def __init__(
+        self,
+        host: str = "0.0.0.0",
+        port: int = 18789,
+        token_path: Optional[Path] = None,
+        display_name: str = "hermes-meet-node",
+    ) -> None:
+        self.host = host
+        self.port = port
+        self.display_name = display_name
+        self.token_path = Path(token_path) if token_path is not None else _default_token_path()
+        self._token: Optional[str] = None
+
+    # ----- token management --------------------------------------------
+
+    def ensure_token(self) -> str:
+        """Return the persisted shared secret, generating one on first use."""
+        if self._token:
+            return self._token
+        if self.token_path.is_file():
+            try:
+                data = json.loads(self.token_path.read_text(encoding="utf-8"))
+                tok = data.get("token")
+                if isinstance(tok, str) and tok:
+                    self._token = tok
+                    return tok
+            except (OSError, json.JSONDecodeError):
+                pass
+        tok = secrets.token_hex(16)  # 32 hex chars
+        self.token_path.parent.mkdir(parents=True, exist_ok=True)
+        tmp = self.token_path.with_suffix(".json.tmp")
+        tmp.write_text(
+            json.dumps({"token": tok, "generated_at": time.time()}, indent=2),
+            encoding="utf-8",
+        )
+        tmp.replace(self.token_path)
+        self._token = tok
+        return tok
+
+    def get_token(self) -> str:
+        """Alias for :meth:`ensure_token`; generates the token on first use, then returns the cached value."""
+        return self.ensure_token()
+
+    # ----- dispatch -----------------------------------------------------
+
+    async def _handle_request(self, msg: Dict[str, Any]) -> Dict[str, Any]:
+        """Validate + dispatch a single decoded request envelope.
+
+        Always returns a response envelope (success or error); never
+        raises. Errors from inside the process_manager are wrapped into
+        the response payload's ``ok``/``error`` keys (which pm already
+        does) rather than being re-encoded as error envelopes — the
+        envelope-level error channel is reserved for auth / protocol
+        failures.
+        """
+        expected = self.ensure_token()
+        ok, reason = _proto.validate_request(msg, expected)
+        if not ok:
+            return _proto.make_error(str(msg.get("id") or ""), reason)
+
+        req_id = msg["id"]
+        t = msg["type"]
+        payload = msg["payload"]
+
+        # Import lazily so test mocks can monkeypatch freely.
+        from plugins.google_meet import process_manager as pm
+
+        try:
+            if t == "ping":
+                return {"type": "pong", "id": req_id,
+                        "payload": {"display_name": self.display_name,
+                                    "ts": time.time()}}
+            if t == "start_bot":
+                # Whitelist kwargs we pass through to pm.start.
+                kwargs = {
+                    k: payload[k]
+                    for k in ("url", "guest_name", "duration", "headed",
+                              "auth_state", "session_id", "out_dir")
+                    if k in payload
+                }
+                if "url" not in kwargs:
+                    return _proto.make_error(req_id, "missing 'url' in payload")
+                result = pm.start(**kwargs)
+                return _proto.make_response(req_id, result)
+            if t == "stop":
+                reason_arg = payload.get("reason", "requested")
+                result = pm.stop(reason=reason_arg)
+                return _proto.make_response(req_id, result)
+            if t == "status":
+                return _proto.make_response(req_id, pm.status())
+            if t == "transcript":
+                last = payload.get("last")
+                result = pm.transcript(last=last)
+                return _proto.make_response(req_id, result)
+            if t == "say":
+                # Enqueue into say_queue.jsonl inside the active meeting's
+                # out_dir when present; otherwise report enqueued=False.
+                # Each entry carries an "id" so the queue consumer can
+                # drop it after speaking (matches pm.enqueue_say).
+                text = payload.get("text", "")
+                active = pm._read_active()  # type: ignore[attr-defined]
+                enqueued = False
+                if active and active.get("out_dir"):
+                    queue = Path(active["out_dir"]) / "say_queue.jsonl"
+                    try:
+                        queue.parent.mkdir(parents=True, exist_ok=True)
+                        entry = {"id": secrets.token_hex(6), "text": text, "ts": time.time()}
+                        with queue.open("a", encoding="utf-8") as fh:
+                            fh.write(json.dumps(entry) + "\n")
+                        enqueued = True
+                    except OSError:
+                        enqueued = False
+                return _proto.make_response(
+                    req_id,
+                    {"ok": True, "enqueued": enqueued, "text": text},
+                )
+        except Exception as exc:  # noqa: BLE001 — surface any pm crash to client
+            return _proto.make_error(req_id, f"{type(exc).__name__}: {exc}")
+
+        return _proto.make_error(req_id, f"unhandled type: {t!r}")
+
+    # ----- server loop --------------------------------------------------
+
+    async def serve(self) -> None:
+        """Run the WebSocket server until cancelled.
+
+        Blocks forever. Callers typically wrap this in ``asyncio.run``.
+        """
+        try:
+            import websockets  # type: ignore
+        except ImportError as exc:
+            raise RuntimeError(
+                "NodeServer.serve requires the 'websockets' package. "
+                "Install it with: pip install websockets"
+            ) from exc
+
+        self.ensure_token()
+
+        async def _handler(ws):
+            async for raw in ws:
+                try:
+                    msg = _proto.decode(raw if isinstance(raw, str) else raw.decode("utf-8"))
+                except ValueError as exc:
+                    await ws.send(_proto.encode(_proto.make_error("", f"decode: {exc}")))
+                    continue
+                reply = await self._handle_request(msg)
+                await ws.send(_proto.encode(reply))
+
+        async with websockets.serve(_handler, self.host, self.port):
+            # Run until cancelled.
+            import asyncio
+            await asyncio.Future()
@@ -0,0 +1,16 @@
+name: google_meet
+version: 0.2.0
+description: "Join a Google Meet call, transcribe live captions, speak in realtime, and follow up afterwards. v1 transcribe-only is the default; v2 realtime duplex audio via OpenAI Realtime + BlackHole/PulseAudio ships with mode='realtime'; v3 remote node host lets the bot run on a different machine than the gateway (gateway on Linux, Chrome+signed-in profile on the user's Mac). Explicit-by-design: only joins meet.google.com URLs passed in \u2014 no calendar scanning, no auto-dial."
+author: NousResearch
+kind: standalone
+platforms:
+  - linux
+  - macos
+provides_tools:
+  - meet_join
+  - meet_leave
+  - meet_status
+  - meet_transcript
+  - meet_say
+hooks:
+  - on_session_end
@@ -0,0 +1,326 @@
+"""Subprocess lifecycle manager for the google_meet bot.
+
+Single active meeting at a time. Stores the running pid + out_dir in a
+session-scoped state file under ``$HERMES_HOME/workspace/meetings/.active.json``
+so tool calls across turns can find the bot, and ``on_session_end`` can clean
+it up.
+
+The bot runs as a detached subprocess — we don't hold file descriptors open,
+so the parent agent loop can't block on it. We communicate via files only.
+"""
+
+from __future__ import annotations
+
+import json
+import os
+import signal
+import subprocess
+import sys
+import time
+from pathlib import Path
+from typing import Any, Dict, Optional
+
+from hermes_constants import get_hermes_home
+
+# File + directory layout (under $HERMES_HOME):
+#
+#   workspace/meetings/
+#       .active.json                # pointer to current session's bot
+#       <meeting-id>/
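+#           status.json             # live bot state (written by bot each tick)
+#           transcript.txt          # scraped captions
+#           bot.log                 # bot stdout/stderr
+#           say_queue.jsonl         # pending meet_say requests (realtime mode)
+#
+# .active.json holds:
+#   {"pid": 12345, "meeting_id": "abc-defg-hij", "out_dir": "...",
+#    "url": "https://meet.google.com/...", "started_at": 1714159200.0,
+#    "session_id": "optional", "log_path": ".../bot.log",
+#    "mode": "transcribe"}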
+
+
+def _root() -> Path:
+    return Path(get_hermes_home()) / "workspace" / "meetings"
+
+
+def _active_file() -> Path:
+    return _root() / ".active.json"
+
+
+def _read_active() -> Optional[Dict[str, Any]]:
+    p = _active_file()
+    if not p.is_file():
+        return None
+    try:
+        return json.loads(p.read_text(encoding="utf-8"))
+    except Exception:
+        return None
+
+
+def _write_active(data: Dict[str, Any]) -> None:
+    p = _active_file()
+    p.parent.mkdir(parents=True, exist_ok=True)
+    tmp = p.with_suffix(".json.tmp")
+    tmp.write_text(json.dumps(data, indent=2), encoding="utf-8")
+    tmp.replace(p)
+
+
+def _clear_active() -> None:
+    try:
+        _active_file().unlink()
+    except FileNotFoundError:
+        pass
+
+
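+def _pid_alive(pid: int) -> bool:
+    # pid <= 0 addresses process groups in os.kill, never a single bot.
+    if pid <= 0:
+        return False
+    try: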
+        os.kill(pid, 0)
+    except ProcessLookupError:
+        return False
+    except PermissionError:
+        # Process exists but we can't signal it — treat as alive.
+        return True
+    return True
+
+
+# ---------------------------------------------------------------------------
+# Public API — used by tool handlers + CLI
+# ---------------------------------------------------------------------------
+
+def start(
+    url: str,
+    *,
+    out_dir: Optional[Path] = None,
+    headed: bool = False,
+    auth_state: Optional[str] = None,
+    guest_name: str = "Hermes Agent",
+    duration: Optional[str] = None,
+    session_id: Optional[str] = None,
+    mode: str = "transcribe",
+    realtime_model: Optional[str] = None,
+    realtime_voice: Optional[str] = None,
+    realtime_instructions: Optional[str] = None,
+    realtime_api_key: Optional[str] = None,
+) -> Dict[str, Any]:
+    """Spawn the meet_bot subprocess for *url*.
+
+    If a bot is already running for this hermes install, leave it first —
+    we enforce single-active-meeting semantics.
+
+    Returns a dict summarizing the started bot.
+    """
+    from plugins.google_meet.meet_bot import _is_safe_meet_url, _meeting_id_from_url
+
+    if not _is_safe_meet_url(url):
+        return {
+            "ok": False,
+            "error": (
+                "refusing: only https://meet.google.com/ URLs are allowed. "
+                "got: " + repr(url)
+            ),
+        }
+
+    existing = _read_active()
+    if existing and _pid_alive(int(existing.get("pid", 0))):
+        stop(reason="replaced by new meet_join")
+
+    meeting_id = _meeting_id_from_url(url)
+    # out_dir may arrive as a str (e.g. from a remote node payload).
+    out = Path(out_dir) if out_dir else (_root() / meeting_id)
+    out.mkdir(parents=True, exist_ok=True)
+
+    # Wipe any stale transcript/status files from a previous run of this
+    # meeting id so polling isn't confused.
+    for name in ("transcript.txt", "status.json"):
+        f = out / name
+        if f.exists():
+            try:
+                f.unlink()
+            except OSError:
+                pass
+
+    env = os.environ.copy()
+    env["HERMES_MEET_URL"] = url
+    env["HERMES_MEET_OUT_DIR"] = str(out)
+    env["HERMES_MEET_GUEST_NAME"] = guest_name
+    if headed:
+        env["HERMES_MEET_HEADED"] = "1"
+    if auth_state:
+        env["HERMES_MEET_AUTH_STATE"] = auth_state
+    if duration:
+        env["HERMES_MEET_DURATION"] = duration
+    # v2: realtime mode + passthroughs. The bot defaults to transcribe
+    # mode if HERMES_MEET_MODE isn't set, matching v1 behavior.
+    if mode:
+        env["HERMES_MEET_MODE"] = mode
+    if realtime_model:
+        env["HERMES_MEET_REALTIME_MODEL"] = realtime_model
+    if realtime_voice:
+        env["HERMES_MEET_REALTIME_VOICE"] = realtime_voice
+    if realtime_instructions:
+        env["HERMES_MEET_REALTIME_INSTRUCTIONS"] = realtime_instructions
+    if realtime_api_key:
+        env["HERMES_MEET_REALTIME_KEY"] = realtime_api_key
+
+    log_path = out / "bot.log"
+    # Detach: stdin=devnull, stdout/stderr → log file, new session so parent
+    # signals don't propagate.
+    log_fh = open(log_path, "ab", buffering=0)
+    try:
+        proc = subprocess.Popen(
+            [sys.executable, "-m", "plugins.google_meet.meet_bot"],
+            stdin=subprocess.DEVNULL,
+            stdout=log_fh,
+            stderr=subprocess.STDOUT,
+            env=env,
+            start_new_session=True,
+            close_fds=True,
+        )
+    finally:
+        # The subprocess now owns the log fd; we can close ours.
+        log_fh.close()
+
+    record = {
+        "pid": proc.pid,
+        "meeting_id": meeting_id,
+        "out_dir": str(out),
+        "url": url,
+        "started_at": time.time(),
+        "session_id": session_id,
+        "log_path": str(log_path),
+        "mode": mode,
+    }
+    _write_active(record)
+    return {"ok": True, **record}
+
+
+def status() -> Dict[str, Any]:
+    """Return the current meeting state, or ``{"ok": False, "reason": ...}``."""
+    active = _read_active()
+    if not active:
+        return {"ok": False, "reason": "no active meeting"}
+
+    pid = int(active.get("pid", 0))
+    alive = _pid_alive(pid) if pid else False
+
+    status_path = Path(active.get("out_dir", "")) / "status.json"
+    bot_status: Dict[str, Any] = {}
+    if status_path.is_file():
+        try:
+            bot_status = json.loads(status_path.read_text(encoding="utf-8"))
+        except Exception:
+            pass
+
+    return {
+        "ok": True,
+        "alive": alive,
+        "pid": pid,
+        "meetingId": active.get("meeting_id"),
+        "url": active.get("url"),
+        "startedAt": active.get("started_at"),
+        "outDir": active.get("out_dir"),
+        **bot_status,
+    }
+
+
+def transcript(last: Optional[int] = None) -> Dict[str, Any]:
+    """Read the current transcript file. Returns ok=False if no meeting is active."""
+    active = _read_active()
+    if not active:
+        return {"ok": False, "reason": "no active meeting"}
+
+    tp = Path(active.get("out_dir", "")) / "transcript.txt"
+    if not tp.is_file():
+        return {
+            "ok": True,
+            "meetingId": active.get("meeting_id"),
+            "lines": [],
+            "total": 0,
+            "path": str(tp),
+        }
+    text = tp.read_text(encoding="utf-8", errors="replace")
+    all_lines = [ln for ln in text.splitlines() if ln.strip()]
+    lines = all_lines[-last:] if last and last > 0 else all_lines
+    return {
+        "ok": True,
+        "meetingId": active.get("meeting_id"),
+        "lines": lines,
+        "total": len(all_lines),
+        "path": str(tp),
+    }
+
+
+def enqueue_say(text: str) -> Dict[str, Any]:
+    """Append a ``say`` request to the active bot's JSONL queue.
+
+    Returns ``{"ok": False, "reason": ...}`` when no meeting is active or
+    the active bot is in transcribe-only mode. Otherwise writes a line to
+    ``<out_dir>/say_queue.jsonl`` that the bot's realtime speaker thread
+    will consume.
+    """
+    import uuid
+
+    text = (text or "").strip()
+    if not text:
+        return {"ok": False, "reason": "text is required"}
+
+    active = _read_active()
+    if not active:
+        return {"ok": False, "reason": "no active meeting"}
+    if active.get("mode") != "realtime":
+        return {
+            "ok": False,
+            "reason": (
+                "active meeting is in transcribe mode — pass mode='realtime' "
+                "to meet_join to enable agent speech"
+            ),
+        }
+
+    out_dir = Path(active.get("out_dir", ""))
+    if not out_dir.is_dir():
+        return {"ok": False, "reason": f"out_dir missing: {out_dir}"}
+
+    queue_path = out_dir / "say_queue.jsonl"
+    entry = {"id": uuid.uuid4().hex[:12], "text": text}
+    with queue_path.open("a", encoding="utf-8") as f:
+        f.write(json.dumps(entry) + "\n")
+    return {
+        "ok": True,
+        "meetingId": active.get("meeting_id"),
+        "enqueued_id": entry["id"],
+        "queue_path": str(queue_path),
+    }
+
+
+def stop(*, reason: str = "requested") -> Dict[str, Any]:
+    """Signal the active bot to leave cleanly, then clear the active pointer.
+
+    Sends SIGTERM and waits up to 10s for the bot to exit. Falls back to
+    SIGKILL if the bot doesn't respond.
+    """
+    active = _read_active()
+    if not active:
+        return {"ok": False, "reason": "no active meeting"}
+
+    pid = int(active.get("pid", 0))
+    out_dir = active.get("out_dir")
+    transcript_path = Path(out_dir) / "transcript.txt" if out_dir else None
+
+    if pid and _pid_alive(pid):
+        try:
+            os.kill(pid, signal.SIGTERM)
+        except ProcessLookupError:
+            pass
+        for _ in range(20):
+            if not _pid_alive(pid):
+                break
+            time.sleep(0.5)
+        if _pid_alive(pid):
+            try:
+                os.kill(pid, signal.SIGKILL)
+            except ProcessLookupError:
+                pass
+
+    _clear_active()
+    return {
+        "ok": True,
+        "reason": reason,
+        "meetingId": active.get("meeting_id"),
+        "transcriptPath": str(transcript_path) if transcript_path else None,
+    }
@@ -0,0 +1,10 @@
+"""Realtime speech subpackage for the google_meet plugin (v2).
+
+Provides a thin OpenAI Realtime API client and a file-queue speaker
+wrapper so the Meet bot can play synthesized speech through the
+virtual audio bridge.
+"""
+
+from .openai_client import RealtimeSession, RealtimeSpeaker  # noqa: F401
+
+__all__ = ["RealtimeSession", "RealtimeSpeaker"]
@@ -0,0 +1,332 @@
+"""OpenAI Realtime API WebSocket client + file-queue speaker.
+
+This module is the "output" side of the v2 voice bridge: it takes text,
+sends it to the OpenAI Realtime API, receives audio deltas back, and
+appends the PCM bytes to a file. A separate consumer (the audio
+bridge) streams that file into Chrome's fake microphone.
+
+Designed for simplicity: a single synchronous WebSocket connection per
+speaker, per session. The ``websockets`` package is imported lazily so
+that importing this module never fails just because the optional dep
+is missing.
+"""
+
+from __future__ import annotations
+
+import base64
+import json
+import time
+import uuid
+from pathlib import Path
+from typing import Any, Callable, Optional
+
+
+REALTIME_URL = "wss://api.openai.com/v1/realtime"
+
+
+def _require_websockets():
+    """Import ``websockets.sync.client.connect`` or raise with hint."""
+    try:
+        from websockets.sync.client import connect as _connect  # type: ignore
+    except ImportError as exc:  # pragma: no cover - exercised via test
+        raise RuntimeError(
+            "websockets package is required for OpenAI Realtime; "
+            "install with: pip install websockets"
+        ) from exc
+    return _connect
+
+
+class RealtimeSession:
+    """Minimal sync client for the OpenAI Realtime WebSocket API.
+
+    Usage:
+        sess = RealtimeSession(api_key=..., audio_sink_path=Path("out.pcm"))
+        sess.connect()
+        sess.speak("Hello team.")
+        sess.close()
+
+    Thread safety: ``speak`` and ``cancel_response`` may be called from
+    different threads; a lock serializes WebSocket writes.
+    """
+
+    def __init__(
+        self,
+        api_key: str,
+        model: str = "gpt-realtime",
+        voice: str = "alloy",
+        instructions: str = "",
+        audio_sink_path: Optional[Path] = None,
+        sample_rate: int = 24000,
+    ) -> None:
+        import threading as _threading
+        self.api_key = api_key
+        self.model = model
+        self.voice = voice
+        self.instructions = instructions
+        self.audio_sink_path = Path(audio_sink_path) if audio_sink_path else None
+        self.sample_rate = sample_rate
+        self._ws: Any = None
+        self._send_lock = _threading.Lock()
+        self._last_response_id: Optional[str] = None
+        # Public counters for status reporting.
+        self.audio_bytes_out: int = 0
+        self.last_audio_out_at: Optional[float] = None
+
+    # ── lifecycle ─────────────────────────────────────────────────────────
+
+    def connect(self) -> None:
+        """Open WS and send session.update with voice+instructions."""
+        connect = _require_websockets()
+        url = f"{REALTIME_URL}?model={self.model}"
+        headers = [
+            ("Authorization", f"Bearer {self.api_key}"),
+            ("OpenAI-Beta", "realtime=v1"),
+        ]
+        # websockets.sync.client.connect accepts either additional_headers=
+        # (newer) or extra_headers= depending on version; try the newer
+        # name first and fall back.
+        try:
+            self._ws = connect(url, additional_headers=headers)
+        except TypeError:
+            self._ws = connect(url, extra_headers=headers)
+
+        self._send_json(
+            {
+                "type": "session.update",
+                "session": {
+                    "voice": self.voice,
+                    "instructions": self.instructions,
+                    "modalities": ["audio", "text"],
+                    "output_audio_format": "pcm16",
+                    "input_audio_format": "pcm16",
+                },
+            }
+        )
+
+    def close(self) -> None:
+        if self._ws is not None:
+            try:
+                self._ws.close()
+            except Exception:
+                pass
+            self._ws = None
+
+    # ── speaking ──────────────────────────────────────────────────────────
+
+    def speak(self, text: str, timeout: float = 30.0) -> dict:
+        """Send ``text`` and accumulate the audio response.
+
+        Audio deltas are base64-decoded and appended to
+        ``audio_sink_path`` (opened 'ab' and closed per call, so a
+        separate streaming reader can consume whatever is there).
+        """
+        if self._ws is None:
+            raise RuntimeError("RealtimeSession.connect() must be called first")
+
+        start = time.monotonic()
+
+        self._send_json(
+            {
+                "type": "conversation.item.create",
+                "item": {
+                    "type": "message",
+                    "role": "user",
+                    "content": [{"type": "input_text", "text": text}],
+                },
+            }
+        )
+        self._send_json(
+            {
+                "type": "response.create",
+                "response": {"modalities": ["audio"]},
+            }
+        )
+
+        bytes_written = 0
+        sink_fp = None
+        if self.audio_sink_path is not None:
+            self.audio_sink_path.parent.mkdir(parents=True, exist_ok=True)
+            sink_fp = open(self.audio_sink_path, "ab")
+
+        try:
+            while True:
+                remaining = timeout - (time.monotonic() - start)
+                if remaining <= 0:
+                    raise TimeoutError(
+                        f"realtime response did not complete within {timeout}s"
+                    )
+                raw = self._recv(timeout=remaining)
+                if raw is None:
+                    # Connection closed by peer.
+                    break
+                try:
+                    frame = json.loads(raw) if isinstance(raw, (str, bytes, bytearray)) else raw
+                except (TypeError, ValueError):
+                    continue
+                if not isinstance(frame, dict):
+                    continue
+                ftype = frame.get("type")
+                if ftype == "response.audio.delta":
+                    b64 = frame.get("delta") or frame.get("audio") or ""
+                    if b64 and sink_fp is not None:
+                        try:
+                            chunk = base64.b64decode(b64)
+                        except (ValueError, TypeError):
+                            chunk = b""
+                        if chunk:
+                            sink_fp.write(chunk)
+                            sink_fp.flush()
+                            bytes_written += len(chunk)
+                            self.audio_bytes_out += len(chunk)
+                            self.last_audio_out_at = time.time()
+                elif ftype == "response.created":
+                    rid = (frame.get("response") or {}).get("id")
+                    if rid:
+                        self._last_response_id = rid
+                elif ftype in ("response.done", "response.completed", "response.cancelled"):
+                    break
+                elif ftype == "error":
+                    err = frame.get("error") or frame
+                    raise RuntimeError(f"realtime error: {err}")
+                # All other frames (response.output_item.*,
+                # response.audio_transcript.delta, rate_limits.updated, ...)
+                # are ignored for v2.
+        finally:
+            if sink_fp is not None:
+                sink_fp.close()
+
+        duration_ms = (time.monotonic() - start) * 1000.0
+        return {
+            "ok": True,
+            "bytes_written": bytes_written,
+            "duration_ms": duration_ms,
+        }
+
+    # ── ws plumbing ───────────────────────────────────────────────────────
+
+    def cancel_response(self) -> bool:
+        """Interrupt the in-flight response (barge-in).
+
+        Sends ``response.cancel`` on the current WebSocket so the model
+        stops generating audio. Safe to call at any time; returns True
+        if the cancel frame was sent, False when the socket isn't open
+        or the send fails. It does not check whether a response is
+        actually in flight.
+        """
+        if self._ws is None:
+            return False
+        try:
+            self._send_json({"type": "response.cancel"})
+            return True
+        except Exception:
+            return False
+
+    def _send_json(self, payload: dict) -> None:
+        assert self._ws is not None
+        with self._send_lock:
+            self._ws.send(json.dumps(payload))
+
+    def _recv(self, timeout: Optional[float] = None):
+        assert self._ws is not None
+        try:
+            if timeout is None:
+                return self._ws.recv()
+            return self._ws.recv(timeout=timeout)
+        except TypeError:
+            # Older websockets may not accept timeout kwarg.
+            return self._ws.recv()
+
+
+class RealtimeSpeaker:
+    """File-based JSONL queue wrapper around :class:`RealtimeSession`.
+
+    Each line in ``queue_path`` is a JSON object of the form
+    ``{"id": "<uuid>", "text": "..."}``. Processed lines are appended
+    to ``processed_path`` (if set) and then removed from the queue;
+    if ``processed_path`` is ``None``, processed lines are simply
+    dropped.
+    """
+
+    def __init__(
+        self,
+        session: RealtimeSession,
+        queue_path: Path,
+        processed_path: Optional[Path] = None,
+    ) -> None:
+        self.session = session
+        self.queue_path = Path(queue_path)
+        self.processed_path = Path(processed_path) if processed_path else None
+
+    # ── helpers ──────────────────────────────────────────────────────────
+
+    def _read_queue(self) -> list[dict]:
+        if not self.queue_path.exists():
+            return []
+        out: list[dict] = []
+        for line in self.queue_path.read_text().splitlines():
+            line = line.strip()
+            if not line:
+                continue
+            try:
+                entry = json.loads(line)
+            except ValueError:
+                continue
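+            if not isinstance(entry, dict):
+                continue
+            if "id" not in entry:
+                # Derive the id from the line content so it is stable
+                # across re-reads; a random id would never match in
+                # run_until_stopped and the entry would replay forever.
+                entry["id"] = str(uuid.uuid5(uuid.NAMESPACE_URL, line))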
+            out.append(entry)
+        return out
+
+    def _rewrite_queue(self, remaining: list[dict]) -> None:
+        if not remaining:
+            # Keep the file but empty — consumers may be watching for
+            # new writes via mtime, and delete-then-recreate is a race.
+            self.queue_path.write_text("")
+            return
+        self.queue_path.write_text(
+            "\n".join(json.dumps(e) for e in remaining) + "\n"
+        )
+
+    def _append_processed(self, entry: dict, result: dict) -> None:
+        if self.processed_path is None:
+            return
+        self.processed_path.parent.mkdir(parents=True, exist_ok=True)
+        record = {"id": entry.get("id"), "text": entry.get("text", ""), "result": result}
+        with open(self.processed_path, "a") as fp:
+            fp.write(json.dumps(record) + "\n")
+
+    # ── main loop ────────────────────────────────────────────────────────
+
+    def run_until_stopped(
+        self,
+        stop_fn: Callable[[], bool],
+        poll_interval: float = 0.5,
+    ) -> None:
+        while not stop_fn():
+            entries = self._read_queue()
+            if not entries:
+                time.sleep(poll_interval)
+                continue
+            # Process one at a time; re-check the queue file after each
+            # speak() call because new entries may have arrived.
+            head = entries[0]
+            text = (head.get("text") or "").strip()
+            if text:
+                try:
+                    result = self.session.speak(text)
+                except Exception as exc:
+                    result = {"ok": False, "error": str(exc)}
+            else:
+                result = {"ok": True, "bytes_written": 0, "duration_ms": 0.0}
+            self._append_processed(head, result)
+
+            # Re-read the queue from disk in case it was appended to
+            # while we were speaking, then drop the head.
+            latest = self._read_queue()
+            if latest and latest[0].get("id") == head.get("id"):
+                self._rewrite_queue(latest[1:])
+            else:
+                # Fallback: drop-by-id anywhere in the queue.
+                self._rewrite_queue(
+                    [e for e in latest if e.get("id") != head.get("id")]
+                )
@@ -0,0 +1,348 @@
+"""Agent-facing tools for the google_meet plugin.
+
+Tools:
+  meet_join        — join a Google Meet URL (spawns Playwright bot locally
+                     OR on a remote node host via node=<name>)
+  meet_status      — report bot liveness + transcript progress
+  meet_transcript  — read the current transcript (optional last-N)
+  meet_leave       — signal the bot to leave cleanly
+  meet_say         — (v2) speak text through the realtime audio bridge.
+                     Requires the active meeting to have been joined with
+                     mode='realtime'.
+"""
+
+from __future__ import annotations
+
+import json
+from typing import Any, Dict, Optional
+
+from plugins.google_meet import process_manager as pm
+
+
+# ---------------------------------------------------------------------------
+# Runtime gate
+# ---------------------------------------------------------------------------
+
+def check_meet_requirements() -> bool:
+    """Return True when the plugin can actually run LOCALLY.
+
+    Gates on:
+      * Python ``playwright`` package importable
+      * the plugin being on a supported platform (Linux or macOS)
+
+    Note: remote-node operation (``node=<name>``) only needs the
+    ``websockets`` dep on the gateway side — Chromium lives on the node.
+    But the plugin-level gate keeps the v1 semantics; individual tool
+    handlers relax the requirement when a node is addressed.
+    """
+    import platform as _p
+    if _p.system().lower() not in ("linux", "darwin"):
+        return False
+    try:
+        import playwright  # noqa: F401
+    except ImportError:
+        return False
+    return True
+
+
+# ---------------------------------------------------------------------------
+# Node client helper
+# ---------------------------------------------------------------------------
+
+def _resolve_node_client(node: Optional[str]):
+    """Return (NodeClient, node_name) for *node*, or (None, None) to run local.
+
+    Raises RuntimeError with a readable message if the node is named but
+    unresolvable, so the handler can surface a clear error to the agent.
+    """
+    if node is None or node == "":
+        return None, None
+    from plugins.google_meet.node.registry import NodeRegistry
+    from plugins.google_meet.node.client import NodeClient
+
+    reg = NodeRegistry()
+    entry = reg.resolve(node if node != "auto" else None)
+    if entry is None:
+        raise RuntimeError(
+            f"no registered meet node matches {node!r} — "
+            "run `hermes meet node approve <name> <url> <token>` first"
+        )
+    client = NodeClient(url=entry["url"], token=entry["token"])
+    return client, entry.get("name")
+
+
+# ---------------------------------------------------------------------------
+# Schemas
+# ---------------------------------------------------------------------------
+
+MEET_JOIN_SCHEMA: Dict[str, Any] = {
+    "name": "meet_join",
+    "description": (
+        "Join a Google Meet call and start scraping live captions into a "
+        "transcript file. Only meet.google.com URLs are accepted; no calendar "
+        "scanning, no auto-dial. Spawns a headless Chromium subprocess that "
+        "runs in parallel with the agent loop — returns immediately. Poll "
+        "with meet_status and read captions with meet_transcript. Reminder "
+        "to the agent: you should announce yourself in the meeting (there is "
+        "no automatic consent announcement)."
+    ),
+    "parameters": {
+        "type": "object",
+        "properties": {
+            "url": {
+                "type": "string",
+                "description": (
+                    "Full https://meet.google.com/... URL. Required."
+                ),
+            },
+            "mode": {
+                "type": "string",
+                "enum": ["transcribe", "realtime"],
+                "description": (
+                    "transcribe (default): listen-only, scrape captions. "
+                    "realtime: also enable agent speech via meet_say "
+                    "(requires OpenAI Realtime key + platform audio bridge)."
+                ),
+            },
+            "guest_name": {
+                "type": "string",
+                "description": (
+                    "Display name to use when joining as guest. Defaults to "
+                    "'Hermes Agent'."
+                ),
+            },
+            "duration": {
+                "type": "string",
+                "description": (
+                    "Optional max duration before auto-leave (e.g. '30m', "
+                    "'2h', '90s'). Omit to stay until meet_leave is called."
+                ),
+            },
+            "headed": {
+                "type": "boolean",
+                "description": (
+                    "Run Chromium headed instead of headless (debug only). "
+                    "Default false."
+                ),
+            },
+            "node": {
+                "type": "string",
+                "description": (
+                    "Name of a registered remote node to run the bot on "
+                    "(useful when the gateway runs on a headless Linux box "
+                    "but the user's Chrome with a signed-in Google profile "
+                    "lives on their Mac). Pass 'auto' to use the single "
+                    "registered node. Default: run locally. Nodes are "
+                    "approved via `hermes meet node approve`."
+                ),
+            },
+        },
+        "required": ["url"],
+        "additionalProperties": False,
+    },
+}
+
+MEET_STATUS_SCHEMA: Dict[str, Any] = {
+    "name": "meet_status",
+    "description": (
+        "Report the current Meet session state — whether the bot is alive, "
+        "has joined, is sitting in the lobby, number of transcript lines "
+        "captured, and last-caption timestamp."
+    ),
+    "parameters": {
+        "type": "object",
+        "properties": {
+            "node": {"type": "string"},
+        },
+        "additionalProperties": False,
+    },
+}
+
+MEET_TRANSCRIPT_SCHEMA: Dict[str, Any] = {
+    "name": "meet_transcript",
+    "description": (
+        "Read the scraped transcript for the active Meet session. Returns "
+        "full transcript unless 'last' is set, in which case returns the last "
+        "N lines only."
+    ),
+    "parameters": {
+        "type": "object",
+        "properties": {
+            "last": {
+                "type": "integer",
+                "description": (
+                    "Optional: return only the last N caption lines. Useful "
+                    "for polling during a meeting without re-reading the "
+                    "whole transcript."
+                ),
+                "minimum": 1,
+            },
+            "node": {"type": "string"},
+        },
+        "additionalProperties": False,
+    },
+}
+
+MEET_LEAVE_SCHEMA: Dict[str, Any] = {
+    "name": "meet_leave",
+    "description": (
+        "Leave the active Meet call cleanly, stop caption scraping, and "
+        "finalize the transcript file. Safe to call when no meeting is "
+        "active — returns ok=false with a reason."
+    ),
+    "parameters": {
+        "type": "object",
+        "properties": {
+            "node": {"type": "string"},
+        },
+        "additionalProperties": False,
+    },
+}
+
+MEET_SAY_SCHEMA: Dict[str, Any] = {
+    "name": "meet_say",
+    "description": (
+        "Speak text into the active Meet call. Requires the active meeting "
+        "to have been joined with mode='realtime'. The text is queued to "
+        "the bot's OpenAI Realtime session; the generated audio is streamed "
+        "into Chrome's fake microphone via a virtual audio device "
+        "(PulseAudio null-sink on Linux, BlackHole on macOS). Returns "
+        "immediately — the actual speech lags by a couple of seconds."
+    ),
+    "parameters": {
+        "type": "object",
+        "properties": {
+            "text": {"type": "string", "description": "Text to speak."},
+            "node": {"type": "string"},
+        },
+        "required": ["text"],
+        "additionalProperties": False,
+    },
+}
+
+
+# ---------------------------------------------------------------------------
+# Handlers
+# ---------------------------------------------------------------------------
+
+def _json(obj: Any) -> str:
+    return json.dumps(obj, ensure_ascii=False)
+
+
+def _err(msg: str, **extra) -> str:
+    return _json({"success": False, "error": msg, **extra})
+
+
+def handle_meet_join(args: Dict[str, Any], **_kw) -> str:
+    url = (args.get("url") or "").strip()
+    if not url:
+        return _err("url is required")
+    mode = (args.get("mode") or "transcribe").strip().lower()
+    if mode not in ("transcribe", "realtime"):
+        return _err(f"mode must be 'transcribe' or 'realtime' (got {mode!r})")
+
+    node = args.get("node")
+    try:
+        client, node_name = _resolve_node_client(node)
+    except RuntimeError as e:
+        return _err(str(e))
+
+    if client is not None:
+        # Remote path — delegate to the node host.
+        try:
+            res = client.start_bot(
+                url=url,
+                guest_name=str(args.get("guest_name") or "Hermes Agent"),
+                duration=str(args.get("duration")) if args.get("duration") else None,
+                headed=bool(args.get("headed", False)),
+                mode=mode,
+            )
+            return _json({"success": bool(res.get("ok")), "node": node_name, **res})
+        except Exception as e:
+            return _err(f"remote node start_bot failed: {e}", node=node_name)
+
+    # Local path — same as v1, with v2 params.
+    if not check_meet_requirements():
+        return _err(
+            "google_meet plugin prerequisites missing — install with "
+            "`pip install playwright && python -m playwright install "
+            "chromium`. Plugin is supported on Linux and macOS only."
+        )
+    res = pm.start(
+        url=url,
+        headed=bool(args.get("headed", False)),
+        guest_name=str(args.get("guest_name") or "Hermes Agent"),
+        duration=str(args.get("duration")) if args.get("duration") else None,
+        mode=mode,
+    )
+    return _json({"success": bool(res.get("ok")), **res})
+
+
+def handle_meet_status(args: Dict[str, Any], **_kw) -> str:
+    try:
+        client, node_name = _resolve_node_client(args.get("node"))
+    except RuntimeError as e:
+        return _err(str(e))
+    if client is not None:
+        try:
+            res = client.status()
+            return _json({"success": bool(res.get("ok")), "node": node_name, **res})
+        except Exception as e:
+            return _err(f"remote node status failed: {e}", node=node_name)
+    res = pm.status()
+    return _json({"success": bool(res.get("ok")), **res})
+
+
+def handle_meet_transcript(args: Dict[str, Any], **_kw) -> str:
+    last = args.get("last")
+    try:
+        last_i = int(last) if last is not None else None
+        if last_i is not None and last_i < 1:
+            last_i = None
+    except (TypeError, ValueError):
+        last_i = None
+    try:
+        client, node_name = _resolve_node_client(args.get("node"))
+    except RuntimeError as e:
+        return _err(str(e))
+    if client is not None:
+        try:
+            res = client.transcript(last=last_i)
+            return _json({"success": bool(res.get("ok")), "node": node_name, **res})
+        except Exception as e:
+            return _err(f"remote node transcript failed: {e}", node=node_name)
+    res = pm.transcript(last=last_i)
+    return _json({"success": bool(res.get("ok")), **res})
+
+
+def handle_meet_leave(args: Dict[str, Any], **_kw) -> str:
+    try:
+        client, node_name = _resolve_node_client(args.get("node"))
+    except RuntimeError as e:
+        return _err(str(e))
+    if client is not None:
+        try:
+            res = client.stop()
+            return _json({"success": bool(res.get("ok")), "node": node_name, **res})
+        except Exception as e:
+            return _err(f"remote node stop failed: {e}", node=node_name)
+    res = pm.stop(reason="agent called meet_leave")
+    return _json({"success": bool(res.get("ok")), **res})
+
+
+def handle_meet_say(args: Dict[str, Any], **_kw) -> str:
+    text = (args.get("text") or "").strip()
+    if not text:
+        return _err("text is required")
+    try:
+        client, node_name = _resolve_node_client(args.get("node"))
+    except RuntimeError as e:
+        return _err(str(e))
+    if client is not None:
+        try:
+            res = client.say(text)
+            return _json({"success": bool(res.get("ok")), "node": node_name, **res})
+        except Exception as e:
+            return _err(f"remote node say failed: {e}", node=node_name)
+    res = pm.enqueue_say(text)
+    return _json({"success": bool(res.get("ok")), **res})
@@ -3,7 +3,9 @@
 Long-term memory with knowledge graph, entity resolution, and multi-strategy
 retrieval. Supports cloud (API key) and local modes.

-Configurable timeout via HINDSIGHT_TIMEOUT env var or config.json.
+Configurable request timeout via HINDSIGHT_TIMEOUT env var or config.json.
+Configurable embedded daemon idle timeout via HINDSIGHT_IDLE_TIMEOUT env var
+or config.json idle_timeout.

 Original PR #1811 by benfrank241, adapted to MemoryProvider ABC.

@@ -14,6 +16,7 @@ Config via environment variables:
  HINDSIGHT_API_URL                — API endpoint
  HINDSIGHT_MODE                   — cloud or local (default: cloud)
  HINDSIGHT_TIMEOUT                — API request timeout in seconds (default: 120)
+  HINDSIGHT_IDLE_TIMEOUT           — embedded daemon idle timeout in seconds; 0 disables auto-shutdown (default: 300)
  HINDSIGHT_RETAIN_TAGS            — comma-separated tags attached to retained memories
  HINDSIGHT_RETAIN_SOURCE          — metadata source value attached to retained memories
  HINDSIGHT_RETAIN_USER_PREFIX     — label used before user turns in retained transcripts
@@ -45,6 +48,7 @@ _DEFAULT_API_URL = "https://api.hindsight.vectorize.io"
 _DEFAULT_LOCAL_URL = "http://localhost:8888"
 _MIN_CLIENT_VERSION = "0.4.22"
 _DEFAULT_TIMEOUT = 120  # seconds — cloud API can take 30-40s per request
+_DEFAULT_IDLE_TIMEOUT = 300  # seconds — Hindsight embedded daemon default
 _VALID_BUDGETS = {"low", "mid", "high"}
 _PROVIDER_DEFAULT_MODELS = {
    "openai": "gpt-4o-mini",
@@ -59,6 +63,17 @@ _PROVIDER_DEFAULT_MODELS = {
 }


+def _parse_int_setting(value: Any, default: int) -> int:
+    """Parse an integer config/env value, falling back on invalid input."""
+    if value is None or value == "":
+        return default
+    try:
+        return int(value)
+    except (TypeError, ValueError):
+        logger.warning("Invalid integer Hindsight setting %r; using default %s", value, default)
+        return default
+
+
 def _check_local_runtime() -> tuple[bool, str | None]:
    """Return whether local embedded Hindsight imports cleanly.

@@ -203,6 +218,8 @@ def _load_config() -> dict:
    return {
        "mode": os.environ.get("HINDSIGHT_MODE", "cloud"),
        "apiKey": os.environ.get("HINDSIGHT_API_KEY", ""),
+        "timeout": _parse_int_setting(os.environ.get("HINDSIGHT_TIMEOUT"), _DEFAULT_TIMEOUT),
+        "idle_timeout": _parse_int_setting(os.environ.get("HINDSIGHT_IDLE_TIMEOUT"), _DEFAULT_IDLE_TIMEOUT),
        "retain_tags": os.environ.get("HINDSIGHT_RETAIN_TAGS", ""),
        "retain_source": os.environ.get("HINDSIGHT_RETAIN_SOURCE", ""),
        "retain_user_prefix": os.environ.get("HINDSIGHT_RETAIN_USER_PREFIX", "User"),
@@ -304,6 +321,16 @@ def _build_embedded_profile_env(config: dict[str, Any], *, llm_api_key: str | No
    }
    if current_base_url:
        env_values["HINDSIGHT_API_LLM_BASE_URL"] = str(current_base_url)
+
+    idle_timeout = (
+        config.get("idle_timeout")
+        if config.get("idle_timeout") is not None
+        else os.environ.get("HINDSIGHT_IDLE_TIMEOUT")
+    )
+    if idle_timeout is not None and idle_timeout != "":
+        env_values["HINDSIGHT_EMBED_DAEMON_IDLE_TIMEOUT"] = str(
+            _parse_int_setting(idle_timeout, _DEFAULT_IDLE_TIMEOUT)
+        )
    return env_values


@@ -412,6 +439,7 @@ class HindsightMemoryProvider(MemoryProvider):
        self._turn_index = 0
        self._client = None
        self._timeout = _DEFAULT_TIMEOUT
+        self._idle_timeout = _DEFAULT_IDLE_TIMEOUT
        self._prefetch_result = ""
        self._prefetch_lock = threading.Lock()
        self._prefetch_thread = None
@@ -498,16 +526,24 @@ class HindsightMemoryProvider(MemoryProvider):

        print("\n  Configuring Hindsight memory:\n")

+        existing_config = self._config if isinstance(self._config, dict) else _load_config()
+        if not isinstance(existing_config, dict):
+            existing_config = {}
+
        # Step 1: Mode selection
+        mode_values = ["cloud", "local_embedded", "local_external"]
        mode_items = [
            ("Cloud", "Hindsight Cloud API (lightweight, just needs an API key)"),
            ("Local Embedded", "Run Hindsight locally (downloads ~200MB, needs LLM key)"),
            ("Local External", "Connect to an existing Hindsight instance"),
        ]
-        mode_idx = _curses_select("  Select mode", mode_items, default=0)
-        mode = ["cloud", "local_embedded", "local_external"][mode_idx]
+        existing_mode = existing_config.get("mode")
+        mode_default_idx = mode_values.index(existing_mode) if existing_mode in mode_values else 0
+        mode_idx = _curses_select("  Select mode", mode_items, default=mode_default_idx)
+        mode = mode_values[mode_idx]

-        provider_config: dict = {"mode": mode}
+        provider_config: dict = dict(existing_config)
+        provider_config["mode"] = mode
        env_writes: dict = {}

        # Step 2: Install/upgrade deps for selected mode
@@ -573,38 +609,59 @@ class HindsightMemoryProvider(MemoryProvider):
                (p, f"default model: {_PROVIDER_DEFAULT_MODELS[p]}")
                for p in providers_list
            ]
-            llm_idx = _curses_select("  Select LLM provider", llm_items, default=0)
+            existing_llm_provider = provider_config.get("llm_provider")
+            llm_default_idx = providers_list.index(existing_llm_provider) if existing_llm_provider in providers_list else 0
+            llm_idx = _curses_select("  Select LLM provider", llm_items, default=llm_default_idx)
            llm_provider = providers_list[llm_idx]

            provider_config["llm_provider"] = llm_provider

            if llm_provider == "openai_compatible":
-                val = input("  LLM endpoint URL (e.g. http://192.168.1.10:8080/v1): ").strip()
+                existing_base_url = provider_config.get("llm_base_url", "")
+                prompt = "  LLM endpoint URL (e.g. http://192.168.1.10:8080/v1)"
+                if existing_base_url:
+                    prompt += f" [{existing_base_url}]"
+                prompt += ": "
+                val = input(prompt).strip()
                if val:
                    provider_config["llm_base_url"] = val
            elif llm_provider == "openrouter":
                provider_config["llm_base_url"] = "https://openrouter.ai/api/v1"

-            default_model = _PROVIDER_DEFAULT_MODELS.get(llm_provider, "gpt-4o-mini")
-            val = input(f"  LLM model [{default_model}]: ").strip()
-            provider_config["llm_model"] = val or default_model
+            provider_default_model = _PROVIDER_DEFAULT_MODELS.get(llm_provider, "gpt-4o-mini")
+            current_model = provider_config.get("llm_model") or provider_default_model
+            val = input(f"  LLM model [{current_model}]: ").strip()
+            provider_config["llm_model"] = val or current_model

            sys.stdout.write("  LLM API key: ")
            sys.stdout.flush()
            llm_key = getpass.getpass(prompt="") if sys.stdin.isatty() else sys.stdin.readline().strip()
-            # Always write explicitly (including empty) so the provider sees ""
-            # rather than a missing variable.  The daemon reads from .env at
-            # startup and fails when HINDSIGHT_LLM_API_KEY is unset.
-            env_writes["HINDSIGHT_LLM_API_KEY"] = llm_key
+            if llm_key:
+                env_writes["HINDSIGHT_LLM_API_KEY"] = llm_key
+            else:
+                env_path = Path(hermes_home) / ".env"
+                existing_llm_key = ""
+                if env_path.exists():
+                    for line in env_path.read_text().splitlines():
+                        if line.startswith("HINDSIGHT_LLM_API_KEY="):
+                            existing_llm_key = line.split("=", 1)[1]
+                            break
+                env_writes["HINDSIGHT_LLM_API_KEY"] = existing_llm_key

        # Step 4: Save everything
-        provider_config["bank_id"] = "hermes"
-        provider_config["recall_budget"] = "mid"
-        # Read existing timeout from config if present, otherwise use default
-        existing_timeout = self._config.get("timeout") if self._config else None
-        timeout_val = existing_timeout if existing_timeout else _DEFAULT_TIMEOUT
+        provider_config.setdefault("bank_id", "hermes")
+        provider_config.setdefault("recall_budget", "mid")
+        # Read existing timeout from config if present, otherwise use default.
+        # Preserve explicit 0 values instead of treating them as blank.
+        existing_timeout = provider_config.get("timeout")
+        timeout_val = existing_timeout if existing_timeout is not None else _DEFAULT_TIMEOUT
        provider_config["timeout"] = timeout_val
        env_writes["HINDSIGHT_TIMEOUT"] = str(timeout_val)
+        if mode == "local_embedded":
+            existing_idle_timeout = provider_config.get("idle_timeout")
+            idle_timeout_val = existing_idle_timeout if existing_idle_timeout is not None else _DEFAULT_IDLE_TIMEOUT
+            provider_config["idle_timeout"] = idle_timeout_val
+            env_writes["HINDSIGHT_IDLE_TIMEOUT"] = str(idle_timeout_val)
        config["memory"]["provider"] = "hindsight"
        save_config(config)
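
The setup wizard in this hunk keeps the LLM key already stored in `.env` when the user presses Enter at the prompt. The sketch below isolates that fallback so it can be run on its own. `read_env_value` and `choose_llm_key` are hypothetical helper names, and the plain `KEY=VALUE` line format is the one assumed by the diff's parsing loop.

```python
import tempfile
from pathlib import Path


def read_env_value(env_path: Path, key: str) -> str:
    """Hypothetical helper: return the value for key in a KEY=VALUE file, or ''."""
    if not env_path.exists():
        return ""
    prefix = f"{key}="
    for line in env_path.read_text().splitlines():
        if line.startswith(prefix):
            # maxsplit=1 keeps any '=' characters that are part of the value.
            return line.split("=", 1)[1]
    return ""


def choose_llm_key(entered: str, env_path: Path) -> str:
    """Prefer a newly entered key; otherwise reuse the stored one."""
    return entered or read_env_value(env_path, "HINDSIGHT_LLM_API_KEY")


with tempfile.TemporaryDirectory() as tmp:
    env_file = Path(tmp) / ".env"
    env_file.write_text("OTHER=1\nHINDSIGHT_LLM_API_KEY=sk-existing=abc\n")
    kept = choose_llm_key("", env_file)
    replaced = choose_llm_key("sk-new", env_file)
    missing = choose_llm_key("", Path(tmp) / "absent.env")
```

Like the loop in the diff, this sketch does not strip quotes or surrounding whitespace from the stored value.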

@@ -693,6 +750,7 @@ class HindsightMemoryProvider(MemoryProvider):
            {"key": "recall_max_input_chars", "description": "Maximum input query length for auto-recall", "default": 800},
            {"key": "recall_prompt_preamble", "description": "Custom preamble for recalled memories in context"},
            {"key": "timeout", "description": "API request timeout in seconds", "default": _DEFAULT_TIMEOUT},
+            {"key": "idle_timeout", "description": "Embedded daemon idle timeout in seconds (0 disables auto-shutdown)", "default": _DEFAULT_IDLE_TIMEOUT, "when": {"mode": "local_embedded"}},
        ]

    def _get_client(self):
@@ -720,6 +778,14 @@ class HindsightMemoryProvider(MemoryProvider):
                )
                if self._llm_base_url:
                    kwargs["llm_base_url"] = self._llm_base_url
+                idle_timeout = _parse_int_setting(
+                    self._config.get("idle_timeout")
+                    if self._config.get("idle_timeout") is not None
+                    else os.environ.get("HINDSIGHT_IDLE_TIMEOUT", self._idle_timeout),
+                    _DEFAULT_IDLE_TIMEOUT,
+                )
+                self._idle_timeout = idle_timeout
+                kwargs["idle_timeout"] = idle_timeout
                self._client = HindsightEmbedded(**kwargs)
            else:
                from hindsight_client import Hindsight
@@ -736,6 +802,38 @@ class HindsightMemoryProvider(MemoryProvider):
        """Schedule *coro* on the shared loop using the configured timeout."""
        return _run_sync(coro, timeout=self._timeout)

+    def _is_retriable_embedded_connection_error(self, exc: Exception) -> bool:
+        """Return True for stale embedded-daemon connection failures."""
+        if self._mode != "local_embedded":
+            return False
+        text = f"{type(exc).__name__}: {exc}".lower()
+        return any(
+            marker in text
+            for marker in (
+                "cannot connect to host",
+                "connection refused",
+                "connect call failed",
+                "clientconnectorerror",
+            )
+        )
+
+    def _run_hindsight_operation(self, operation):
+        """Run an async Hindsight client operation, retrying once after idle shutdown."""
+        client = self._get_client()
+        try:
+            return self._run_sync(operation(client))
+        except Exception as exc:
+            if not self._is_retriable_embedded_connection_error(exc):
+                raise
+            logger.info(
+                "Hindsight embedded daemon appears unreachable; recreating client and retrying once: %s",
+                exc,
+            )
+            self._client = None
+            client = self._get_client()
+            self._client = client
+            return self._run_sync(operation(client))
+
    def initialize(self, session_id: str, **kwargs) -> None:
        self._session_id = str(session_id or "").strip()
        self._parent_session_id = str(kwargs.get("parent_session_id", "") or "").strip()
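
The retry-once behaviour of `_run_hindsight_operation` can be sketched without the async plumbing. This is a simplified, synchronous illustration: `FakeClient` and `Provider` are hypothetical stand-ins, and the real method schedules a coroutine through `_run_sync` rather than calling the operation directly.

```python
from typing import Any, Callable, Optional

_MARKERS = (
    "cannot connect to host",
    "connection refused",
    "connect call failed",
    "clientconnectorerror",
)


def is_retriable(exc: Exception) -> bool:
    # Same string-matching idea as the diff: class name plus message, lowercased.
    text = f"{type(exc).__name__}: {exc}".lower()
    return any(marker in text for marker in _MARKERS)


class FakeClient:
    """Hypothetical stand-in for the embedded client."""

    def __init__(self, alive: bool) -> None:
        self.alive = alive

    def recall(self, query: str) -> str:
        if not self.alive:
            raise ConnectionError("Cannot connect to host localhost:8888")
        return f"results for {query}"


class Provider:
    def __init__(self) -> None:
        # Start with a client whose daemon has already shut down.
        self._client: Optional[FakeClient] = FakeClient(alive=False)
        self.clients_created = 0

    def _get_client(self) -> FakeClient:
        if self._client is None:
            self._client = FakeClient(alive=True)
            self.clients_created += 1
        return self._client

    def run_operation(self, operation: Callable[[FakeClient], Any]) -> Any:
        client = self._get_client()
        try:
            return operation(client)
        except Exception as exc:
            if not is_retriable(exc):
                raise
            self._client = None  # drop the stale client
            return operation(self._get_client())  # retry exactly once


provider = Provider()
result = provider.run_operation(lambda client: client.recall("project deadlines"))
```

Passing the operation as a callable that receives the client, instead of a pre-built call, is what lets the retry bind to the freshly created client.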
@@ -790,7 +888,14 @@ class HindsightMemoryProvider(MemoryProvider):
        self._session_turns = []
        self._mode = self._config.get("mode", "cloud")
        # Read timeout from config or env var, fall back to default
-        self._timeout = self._config.get("timeout") or int(os.environ.get("HINDSIGHT_TIMEOUT", str(_DEFAULT_TIMEOUT)))
+        self._timeout = _parse_int_setting(
+            self._config.get("timeout") if self._config.get("timeout") is not None else os.environ.get("HINDSIGHT_TIMEOUT"),
+            _DEFAULT_TIMEOUT,
+        )
+        self._idle_timeout = _parse_int_setting(
+            self._config.get("idle_timeout") if self._config.get("idle_timeout") is not None else os.environ.get("HINDSIGHT_IDLE_TIMEOUT"),
+            _DEFAULT_IDLE_TIMEOUT,
+        )
        # "local" is a legacy alias for "local_embedded"
        if self._mode == "local":
            self._mode = "local_embedded"
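
The timeout settings in these hunks resolve in a fixed order: the config value if it is not `None`, then the environment variable, then the default. A minimal sketch of that precedence follows. `parse_int_setting` mirrors the diff's `_parse_int_setting`; `resolve_setting` is a hypothetical helper introduced only for illustration.

```python
import logging
from typing import Any, Mapping

logger = logging.getLogger(__name__)


def parse_int_setting(value: Any, default: int) -> int:
    """Mirror of the diff's _parse_int_setting: int(value), else the default."""
    if value is None or value == "":
        return default
    try:
        return int(value)
    except (TypeError, ValueError):
        logger.warning("Invalid integer setting %r; using default %s", value, default)
        return default


def resolve_setting(
    config: Mapping[str, Any],
    env: Mapping[str, str],
    key: str,
    env_name: str,
    default: int,
) -> int:
    """Hypothetical helper: config wins unless it is None, then env, then default."""
    raw = config.get(key) if config.get(key) is not None else env.get(env_name)
    return parse_int_setting(raw, default)


env = {"HINDSIGHT_IDLE_TIMEOUT": "60"}
explicit_zero = resolve_setting({"idle_timeout": 0}, env, "idle_timeout", "HINDSIGHT_IDLE_TIMEOUT", 300)
from_env = resolve_setting({}, env, "idle_timeout", "HINDSIGHT_IDLE_TIMEOUT", 300)
invalid = resolve_setting({"idle_timeout": "abc"}, {}, "idle_timeout", "HINDSIGHT_IDLE_TIMEOUT", 300)
```

The `is not None` check is the point of the change: the removed `config.get("timeout") or ...` expression treated an explicit `0` as missing, which matters for `idle_timeout`, where `0` disables auto-shutdown.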
@@ -981,10 +1086,9 @@ class HindsightMemoryProvider(MemoryProvider):

        def _run():
            try:
-                client = self._get_client()
                if self._prefetch_method == "reflect":
                    logger.debug("Prefetch: calling reflect (bank=%s, query_len=%d)", self._bank_id, len(query))
-                    resp = self._run_sync(client.areflect(bank_id=self._bank_id, query=query, budget=self._budget))
+                    resp = self._run_hindsight_operation(lambda client: client.areflect(bank_id=self._bank_id, query=query, budget=self._budget))
                    text = resp.text or ""
                else:
                    recall_kwargs: dict = {
@@ -998,7 +1102,7 @@ class HindsightMemoryProvider(MemoryProvider):
                        recall_kwargs["types"] = self._recall_types
                    logger.debug("Prefetch: calling recall (bank=%s, query_len=%d, budget=%s)",
                                 self._bank_id, len(query), self._budget)
-                    resp = self._run_sync(client.arecall(**recall_kwargs))
+                    resp = self._run_hindsight_operation(lambda client: client.arecall(**recall_kwargs))
                    num_results = len(resp.results) if resp.results else 0
                    logger.debug("Prefetch: recall returned %d results", num_results)
                    text = "\n".join(f"- {r.text}" for r in resp.results if r.text) if resp.results else ""
@@ -1117,7 +1221,6 @@ class HindsightMemoryProvider(MemoryProvider):

        def _sync():
            try:
-                client = self._get_client()
                item = self._build_retain_kwargs(
                    content,
                    context=self._retain_context,
@@ -1131,12 +1234,14 @@ class HindsightMemoryProvider(MemoryProvider):
                item.pop("retain_async", None)
                logger.debug("Hindsight retain: bank=%s, doc=%s, async=%s, content_len=%d, num_turns=%d",
                             self._bank_id, self._document_id, self._retain_async, len(content), len(self._session_turns))
-                self._run_sync(client.aretain_batch(
-                    bank_id=self._bank_id,
-                    items=[item],
-                    document_id=self._document_id,
-                    retain_async=self._retain_async,
-                ))
+                self._run_hindsight_operation(
+                    lambda client: client.aretain_batch(
+                        bank_id=self._bank_id,
+                        items=[item],
+                        document_id=self._document_id,
+                        retain_async=self._retain_async,
+                    )
+                )
                logger.debug("Hindsight retain succeeded")
            except Exception as e:
                logger.warning("Hindsight sync failed: %s", e, exc_info=True)
@@ -1152,12 +1257,6 @@ class HindsightMemoryProvider(MemoryProvider):
        return [RETAIN_SCHEMA, RECALL_SCHEMA, REFLECT_SCHEMA]

    def handle_tool_call(self, tool_name: str, args: dict, **kwargs) -> str:
-        try:
-            client = self._get_client()
-        except Exception as e:
-            logger.warning("Hindsight client init failed: %s", e)
-            return tool_error(f"Hindsight client unavailable: {e}")
-
        if tool_name == "hindsight_retain":
            content = args.get("content", "")
            if not content:
@@ -1171,7 +1270,7 @@ class HindsightMemoryProvider(MemoryProvider):
                )
                logger.debug("Tool hindsight_retain: bank=%s, content_len=%d, context=%s",
                             self._bank_id, len(content), context)
-                self._run_sync(client.aretain(**retain_kwargs))
+                self._run_hindsight_operation(lambda client: client.aretain(**retain_kwargs))
                logger.debug("Tool hindsight_retain: success")
                return json.dumps({"result": "Memory stored successfully."})
            except Exception as e:
@@ -1194,7 +1293,7 @@ class HindsightMemoryProvider(MemoryProvider):
                    recall_kwargs["types"] = self._recall_types
                logger.debug("Tool hindsight_recall: bank=%s, query_len=%d, budget=%s",
                             self._bank_id, len(query), self._budget)
-                resp = self._run_sync(client.arecall(**recall_kwargs))
+                resp = self._run_hindsight_operation(lambda client: client.arecall(**recall_kwargs))
                num_results = len(resp.results) if resp.results else 0
                logger.debug("Tool hindsight_recall: %d results", num_results)
                if not resp.results:
@@ -1212,9 +1311,11 @@ class HindsightMemoryProvider(MemoryProvider):
            try:
                logger.debug("Tool hindsight_reflect: bank=%s, query_len=%d, budget=%s",
                             self._bank_id, len(query), self._budget)
-                resp = self._run_sync(client.areflect(
-                    bank_id=self._bank_id, query=query, budget=self._budget
-                ))
+                resp = self._run_hindsight_operation(
+                    lambda client: client.areflect(
+                        bank_id=self._bank_id, query=query, budget=self._budget
+                    )
+                )
                logger.debug("Tool hindsight_reflect: response_len=%d", len(resp.text or ""))
                return json.dumps({"result": resp.text or "No relevant memories found."})
            except Exception as e:
@@ -1231,9 +1332,19 @@ class HindsightMemoryProvider(MemoryProvider):
        if self._client is not None:
            try:
                if self._mode == "local_embedded":
-                    # Use the public close() API. The RuntimeError from
-                    # aiohttp's "attached to a different loop" is expected
-                    # and harmless — the daemon keeps running independently.
+                    # HindsightEmbedded.close() delegates to its sync client.close().
+                    # When Hermes created/used that client on the shared async loop,
+                    # closing it from this thread can raise "attached to a different
+                    # loop" before aiohttp releases the session. Close the embedded
+                    # inner async client on the shared loop first, then let the
+                    # wrapper clean up daemon/UI bookkeeping.
+                    inner_client = getattr(self._client, "_client", None)
+                    if inner_client is not None and hasattr(inner_client, "aclose"):
+                        _run_sync(inner_client.aclose())
+                        try:
+                            self._client._client = None
+                        except Exception:
+                            pass
                    try:
                        self._client.close()
                    except RuntimeError:
@@ -22,6 +22,7 @@ import threading
 import time
 from typing import Any, Dict, List, Optional

+from agent.memory_manager import sanitize_context
 from agent.memory_provider import MemoryProvider
 from tools.registry import tool_error

@@ -37,7 +38,10 @@ PROFILE_SCHEMA = {
    "description": (
        "Retrieve or update a peer card from Honcho — a curated list of key facts "
        "about that peer (name, role, preferences, communication style, patterns). "
-        "Pass `card` to update; omit `card` to read."
+        "Pass `card` to update; omit `card` to read.  If the card is empty, the "
+        "result includes a `hint` field explaining why (observation disabled, "
+        "fresh peer, dialectic layer still warming up, etc.) — this is NOT an "
+        "error.  Peer cards accumulate over time from observed conversation."
    ),
    "parameters": {
        "type": "object",
@@ -1056,6 +1060,63 @@ class HonchoMemoryProvider(MemoryProvider):

        return chunks

+    def _empty_profile_hint(self, peer: str) -> Dict[str, Any]:
+        """Build a diagnostic hint when honcho_profile returns an empty card.
+
+        A literal "No profile facts available yet." tells the model nothing
+        about WHY.  The model then often surfaces it to the user as a cryptic
+        error.  This hint enumerates the likely causes so the model can
+        explain the situation (or retry with a different peer).
+
+        Ordered by likelihood for a typical deployment:
+          1. Observation is disabled for this peer
+          2. Card hasn't accumulated yet (fresh peer, not enough dialectic
+             cycles — dialectic cadence runs every N turns)
+          3. Self-hosted Honcho backend doesn't support peer cards
+             (honcho-ai server < 3.x)
+        """
+        cfg = self._config
+        reasons: List[str] = []
+
+        if cfg is not None:
+            if peer == "user":
+                observe_me = bool(getattr(cfg, "user_observe_me", True))
+                observe_others = bool(getattr(cfg, "user_observe_others", True))
+            else:
+                observe_me = bool(getattr(cfg, "ai_observe_me", True))
+                observe_others = bool(getattr(cfg, "ai_observe_others", True))
+            if not (observe_me or observe_others):
+                reasons.append(
+                    f"observation is disabled for peer '{peer}' "
+                    "(user_observe_*/ai_observe_* flags in config)"
+                )
+
+        cadence = getattr(self, "_dialectic_cadence", 1)
+        turn = getattr(self, "_turn_count", 0)
+        if turn < max(2, cadence):
+            reasons.append(
+                f"this session has only {turn} turn(s); peer cards accumulate "
+                f"as the dialectic layer reasons over conversation history "
+                f"(cadence every {cadence} turn(s))"
+            )
+
+        if not reasons:
+            reasons.append(
+                "peer card has no facts yet — Honcho's dialectic layer builds "
+                "this over time from observed turns; self-hosted Honcho < 3.x "
+                "does not support peer cards at all"
+            )
+
+        return {
+            "result": "No profile facts available yet.",
+            "hint": (
+                "This is not an error.  "
+                + "; ".join(reasons)
+                + ".  Try honcho_reasoning for a synthesized answer, or "
+                "honcho_search to query raw conversation excerpts."
+            ),
+        }
+
    def sync_turn(self, user_content: str, assistant_content: str, *, session_id: str = "") -> None:
        """Record the conversation turn in Honcho (non-blocking).

@@ -1068,13 +1129,15 @@ class HonchoMemoryProvider(MemoryProvider):
            return

        msg_limit = self._config.message_max_chars if self._config else 25000
+        clean_user_content = sanitize_context(user_content or "").strip()
+        clean_assistant_content = sanitize_context(assistant_content or "").strip()

        def _sync():
            try:
                session = self._manager.get_or_create(self._session_key)
-                for chunk in self._chunk_message(user_content, msg_limit):
+                for chunk in self._chunk_message(clean_user_content, msg_limit):
                    session.add_message("user", chunk)
-                for chunk in self._chunk_message(assistant_content, msg_limit):
+                for chunk in self._chunk_message(clean_assistant_content, msg_limit):
                    session.add_message("assistant", chunk)
                self._manager._flush_session(session)
            except Exception as e:
@@ -1087,8 +1150,20 @@ class HonchoMemoryProvider(MemoryProvider):
        )
        self._sync_thread.start()

-    def on_memory_write(self, action: str, target: str, content: str) -> None:
-        """Mirror built-in user profile writes as Honcho conclusions."""
+    def on_memory_write(
+        self,
+        action: str,
+        target: str,
+        content: str,
+        metadata: Optional[Dict[str, Any]] = None,
+    ) -> None:
+        """Mirror built-in user profile writes as Honcho conclusions.
+
+        ``metadata`` is accepted for compatibility with the write-origin
+        work landed in main (commit 6a957a74); it's not yet threaded into
+        the Honcho conclusion payload.  Left as a follow-up so this PR
+        stays focused on the 7-PR consolidation and its review follow-ups.
+        """
        if action != "add" or target != "user" or not content:
            return
        if self._cron_skipped:
@@ -1154,7 +1229,7 @@ class HonchoMemoryProvider(MemoryProvider):
                    return json.dumps({"result": f"Peer card updated ({len(result)} facts).", "card": result})
                card = self._manager.get_peer_card(self._session_key, peer=peer)
                if not card:
-                    return json.dumps({"result": "No profile facts available yet."})
+                    return json.dumps(self._empty_profile_hint(peer))
                return json.dumps({"result": card})

            elif tool_name == "honcho_search":
@@ -273,9 +273,38 @@ def _write_config(cfg: dict, path: Path | None = None) -> None:


 def _resolve_api_key(cfg: dict) -> str:
-    """Resolve API key with host -> root -> env fallback."""
+    """Resolve API key with host -> root -> env fallback.
+
+    For self-hosted instances configured with ``baseUrl`` instead of an API
+    key, returns ``"local"`` so that credential guards throughout the CLI
+    don't reject a valid configuration.  The ``baseUrl`` is scheme-validated
+    (http/https only) so that a typo like ``baseUrl: true`` can't silently
+    pass the guard.  Schemeless strings that look like host:port (legacy
+    config shapes, e.g. ``localhost:8000``) still pass — the Honcho SDK
+    will reject them itself with a clearer error than ours.
+    """
    host_key = ((cfg.get("hosts") or {}).get(_host_key()) or {}).get("apiKey")
-    return host_key or cfg.get("apiKey", "") or os.environ.get("HONCHO_API_KEY", "")
+    key = host_key or cfg.get("apiKey", "") or os.environ.get("HONCHO_API_KEY", "")
+    if not key:
+        base_url = cfg.get("baseUrl") or cfg.get("base_url") or os.environ.get("HONCHO_BASE_URL", "")
+        base_url = str(base_url or "").strip()
+        if base_url:
+            from urllib.parse import urlparse
+            try:
+                parsed = urlparse(base_url)
+            except (TypeError, ValueError):
+                parsed = None
+            if parsed and parsed.scheme in ("http", "https") and parsed.netloc:
+                return "local"
+            # Schemeless but looks like a host (contains '.' or ':' and isn't
+            # a boolean literal): let it through so legacy configs don't
+            # regress into "no API key configured" when they previously worked.
+            lowered = base_url.lower()
+            if lowered not in ("true", "false", "none", "null") and any(
+                c in base_url for c in ".:"
+            ) and not base_url.isdigit():
+                return "local"
+    return key


 def _prompt(label: str, default: str | None = None, secret: bool = False) -> str:
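The `baseUrl` guard in `_resolve_api_key` can be sketched in isolation as follows. This is a minimal illustration, not the CLI's code: `looks_like_self_hosted` is a hypothetical name, and the `str(...)` coercion is an assumption added so that a JSON boolean such as `baseUrl: true` is rejected as text rather than raising on `.strip()`.

```python
from urllib.parse import urlparse


def looks_like_self_hosted(base_url) -> bool:
    """Hypothetical helper mirroring the baseUrl guard described in the docstring."""
    # Coerce first: a JSON `true` arrives as the Python bool True, which has no .strip().
    text = str(base_url or "").strip()
    if not text:
        return False
    parsed = urlparse(text)
    # Proper URL: explicit http/https scheme plus a host part.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return True
    # Legacy schemeless shape (host:port or dotted host), excluding literal typos.
    lowered = text.lower()
    return (
        lowered not in ("true", "false", "none", "null")
        and any(c in text for c in ".:")
        and not text.isdigit()
    )


print(looks_like_self_hosted("http://localhost:8000"))
print(looks_like_self_hosted("localhost:8000"))
print(looks_like_self_hosted(True))
```

Under these assumptions the first two calls are accepted and the boolean typo is rejected, which is the behaviour the docstring describes.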
@@ -16,6 +16,7 @@ from __future__ import annotations
 import json
 import os
 import logging
+import hashlib
 from dataclasses import dataclass, field
 from pathlib import Path

@@ -27,7 +28,6 @@ if TYPE_CHECKING:

 logger = logging.getLogger(__name__)

-GLOBAL_CONFIG_PATH = Path.home() / ".honcho" / "config.json"
 HOST = "hermes"


@@ -53,6 +53,11 @@ def resolve_active_host() -> str:
    return HOST


+def resolve_global_config_path() -> Path:
+    """Return the shared Honcho config path for the current HOME."""
+    return Path.home() / ".honcho" / "config.json"
+
+
 def resolve_config_path() -> Path:
    """Return the active Honcho config path.

@@ -72,7 +77,7 @@ def resolve_config_path() -> Path:
    if default_path != local_path and default_path.exists():
        return default_path

-    return GLOBAL_CONFIG_PATH
+    return resolve_global_config_path()


 _RECALL_MODE_ALIASES = {"auto": "hybrid"}
@@ -138,6 +143,15 @@ def _parse_dialectic_depth_levels(host_val, root_val, depth: int) -> list[str] |
    return None


+# Default HTTP timeout (seconds) applied when no explicit timeout is
+# configured via HonchoClientConfig.timeout, honcho.timeout / requestTimeout,
+# or HONCHO_TIMEOUT. Honcho calls happen on the post-response path of
+# run_conversation; without a cap the agent can block indefinitely when
+# the Honcho backend is unreachable, preventing the gateway from
+# delivering the already-generated response.
+_DEFAULT_HTTP_TIMEOUT = 30.0
+
+
 def _resolve_optional_float(*values: Any) -> float | None:
    """Return the first non-empty value coerced to a positive float."""
    for value in values:
@@ -226,6 +240,13 @@ class HonchoClientConfig:
    # Identity
    peer_name: str | None = None
    ai_peer: str = "hermes"
+    # When True, ``peer_name`` wins over any gateway-supplied runtime
+    # identity (Telegram UID, Discord ID, …) when resolving the user peer.
+    # This keeps memory unified across platforms for single-user deployments
+    # where Honcho's one peer-name is an unambiguous identity — otherwise
+    # each platform would fork memory into its own peer (#14984).  Default
+    # ``False`` preserves existing multi-user behaviour.
+    pin_peer_name: bool = False
    # Toggles
    enabled: bool = False
    save_messages: bool = True
@@ -420,6 +441,11 @@ class HonchoClientConfig:
            timeout=timeout,
            peer_name=host_block.get("peerName") or raw.get("peerName"),
            ai_peer=ai_peer,
+            pin_peer_name=_resolve_bool(
+                host_block.get("pinPeerName"),
+                raw.get("pinPeerName"),
+                default=False,
+            ),
            enabled=enabled,
            save_messages=save_messages,
            write_frequency=write_frequency,
@@ -522,6 +548,39 @@ class HonchoClientConfig:
            pass
        return None

+    # Honcho enforces a 100-char limit on session IDs. Long gateway session keys
+    # (Matrix "!room:server" + thread event IDs, Telegram supergroup reply
+    # chains, Slack thread IDs with long workspace prefixes) can overflow this
+    # limit after sanitization; the Honcho API then rejects every call for that
+    # session with "session_id too long". See issue #13868.
+    _HONCHO_SESSION_ID_MAX_LEN = 100
+    _HONCHO_SESSION_ID_HASH_LEN = 8
+
+    @classmethod
+    def _enforce_session_id_limit(cls, sanitized: str, original: str) -> str:
+        """Truncate a sanitized session ID to Honcho's 100-char limit.
+
+        The common case (short keys) short-circuits with no modification.
+        For over-limit keys, keep a prefix of the sanitized ID and append a
+        deterministic ``-<sha256 prefix>`` suffix so two distinct long keys
+        that share a leading segment don't collide onto the same truncated ID.
+        The hash is taken over the *original* pre-sanitization key, so the same
+        key always maps to the same ID, while distinct long keys (even ones that
+        sanitize to the same string or share a prefix) almost always differ.
+        """
+        max_len = cls._HONCHO_SESSION_ID_MAX_LEN
+        if len(sanitized) <= max_len:
+            return sanitized
+
+        hash_len = cls._HONCHO_SESSION_ID_HASH_LEN
+        digest = hashlib.sha256(original.encode("utf-8")).hexdigest()[:hash_len]
+        # max_len - hash_len - 1 (for the '-' separator) chars of the sanitized
+        # prefix, then '-<hash>'. Strip any trailing hyphen from the prefix so
+        # the result doesn't double up on separators.
+        prefix_len = max_len - hash_len - 1
+        prefix = sanitized[:prefix_len].rstrip("-")
+        return f"{prefix}-{digest}"
+
    def resolve_session_name(
        self,
        cwd: str | None = None,
@@ -566,7 +625,7 @@ class HonchoClientConfig:
        if gateway_session_key:
            sanitized = re.sub(r'[^a-zA-Z0-9_-]+', '-', gateway_session_key).strip('-')
            if sanitized:
-                return sanitized
+                return self._enforce_session_id_limit(sanitized, gateway_session_key)

        # per-session: inherit Hermes session_id (new Honcho session each run)
        if self.session_strategy == "per-session" and session_id:
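The truncate-and-hash scheme above can be exercised on its own. The sketch below re-implements it as free functions (`sanitize` and `enforce_limit` are illustrative names, and the Matrix-style keys are made up) to show that two over-limit keys sharing a long prefix still map to distinct IDs of at most 100 characters.

```python
import hashlib
import re

MAX_LEN = 100   # Honcho's session ID limit, per the diff above
HASH_LEN = 8    # hex chars of the SHA-256 digest kept as a suffix


def sanitize(key: str) -> str:
    """Same character class as the gateway-key sanitization in the diff."""
    return re.sub(r"[^a-zA-Z0-9_-]+", "-", key).strip("-")


def enforce_limit(sanitized: str, original: str) -> str:
    """Keep short IDs untouched; otherwise truncate and append '-<hash>'."""
    if len(sanitized) <= MAX_LEN:
        return sanitized
    digest = hashlib.sha256(original.encode("utf-8")).hexdigest()[:HASH_LEN]
    prefix = sanitized[: MAX_LEN - HASH_LEN - 1].rstrip("-")
    return f"{prefix}-{digest}"


# Two hypothetical long keys that differ only at the very end.
key_a = "matrix:!room:example.org:" + "a" * 120 + ":thread-1"
key_b = "matrix:!room:example.org:" + "a" * 120 + ":thread-2"
id_a = enforce_limit(sanitize(key_a), key_a)
id_b = enforce_limit(sanitize(key_b), key_b)

print(len(id_a), len(id_b), id_a != id_b)
```

Plain truncation would have mapped both keys to the same 100-character prefix; the digest suffix is what keeps them apart.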
@@ -646,6 +705,11 @@ def get_honcho_client(config: HonchoClientConfig | None = None) -> Honcho:
        except Exception:
            pass

+    # Fall back to the default so an unconfigured install cannot hang
+    # indefinitely on a stalled Honcho request.
+    if resolved_timeout is None:
+        resolved_timeout = _DEFAULT_HTTP_TIMEOUT
+
    if resolved_base_url:
        logger.info("Initializing Honcho client (base_url: %s, workspace: %s)", resolved_base_url, config.workspace_id)
    else:
@@ -95,6 +95,7 @@ class HonchoSessionManager:
        self._config = config
        self._runtime_user_peer_name = runtime_user_peer_name
        self._cache: dict[str, HonchoSession] = {}
+        self._cache_lock = threading.RLock()
        self._peers_cache: dict[str, Any] = {}
        self._sessions_cache: dict[str, Any] = {}

@@ -273,17 +274,35 @@ class HonchoSessionManager:
        Returns:
            The session.
        """
-        if key in self._cache:
-            logger.debug("Local session cache hit: %s", key)
-            return self._cache[key]
+        with self._cache_lock:
+            if key in self._cache:
+                logger.debug("Local session cache hit: %s", key)
+                return self._cache[key]

-        # Gateway sessions should use the runtime user identity when available.
-        if self._runtime_user_peer_name:
+        # Determine peer IDs — no lock needed (read-only, no shared state mutation).
+        # Gateway sessions normally use the runtime user identity (the
+        # platform-native ID: Telegram UID, Discord snowflake, Slack user,
+        # etc.) so multi-user bots scope memory per user.  For a single-user
+        # deployment the config-supplied ``peer_name`` is an unambiguous
+        # identity and we should keep it unified across platforms — see
+        # #14984.  Opt into that with ``hosts.<host>.pinPeerName: true`` in
+        # ``honcho.json`` (or root-level ``pinPeerName: true``).
+        # `is True` (not `bool(...)`) is deliberate: several multi-user tests
+        # pass a ``MagicMock`` for ``config`` where ``mock.pin_peer_name``
+        # silently returns another MagicMock — truthy by default.  Requiring
+        # strict ``True`` keeps pinning as opt-in even for callers that
+        # haven't updated their mocks yet; real configs built via
+        # ``from_global_config`` always produce a proper boolean.
+        pin_peer_name = (
+            self._config is not None
+            and bool(getattr(self._config, "peer_name", None))
+            and getattr(self._config, "pin_peer_name", False) is True
+        )
+        if self._runtime_user_peer_name and not pin_peer_name:
            user_peer_id = self._sanitize_id(self._runtime_user_peer_name)
        elif self._config and self._config.peer_name:
            user_peer_id = self._sanitize_id(self._config.peer_name)
        else:
-            # Fallback: derive from session key
            parts = key.split(":", 1)
            channel = parts[0] if len(parts) > 1 else "default"
            chat_id = parts[1] if len(parts) > 1 else key
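The reason for `is True` rather than `bool(...)` is easy to reproduce. In this minimal sketch, `should_pin` is a hypothetical stand-alone copy of the pinning predicate and `SimpleNamespace` stands in for a real config object; a bare `MagicMock` config yields a truthy attribute but does not enable pinning.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


def should_pin(config) -> bool:
    """Hypothetical copy of the opt-in check: strict True, plus a non-empty peer_name."""
    return (
        config is not None
        and bool(getattr(config, "peer_name", None))
        and getattr(config, "pin_peer_name", False) is True
    )


mock_config = MagicMock()
real_config = SimpleNamespace(peer_name="alice", pin_peer_name=True)

# Auto-created MagicMock attributes are truthy, so bool(...) alone would opt in.
print(bool(mock_config.pin_peer_name))
print(should_pin(mock_config))
print(should_pin(real_config))
```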
@@ -293,19 +312,14 @@ class HonchoSessionManager:
            self._config.ai_peer if self._config else "hermes-assistant"
        )

-        # Sanitize session ID for Honcho
+        # All expensive I/O outside the lock — Honcho's persistence is source of truth
        honcho_session_id = self._sanitize_id(key)
-
-        # Get or create peers
        user_peer = self._get_or_create_peer(user_peer_id)
        assistant_peer = self._get_or_create_peer(assistant_peer_id)
-
-        # Get or create Honcho session
        honcho_session, existing_messages = self._get_or_create_honcho_session(
            honcho_session_id, user_peer, assistant_peer
        )

-        # Convert Honcho messages to local format
        local_messages = []
        for msg in existing_messages:
            role = "assistant" if msg.peer_id == assistant_peer_id else "user"
@@ -313,10 +327,9 @@ class HonchoSessionManager:
                "role": role,
                "content": msg.content,
                "timestamp": msg.created_at.isoformat() if msg.created_at else "",
-                "_synced": True,  # Already in Honcho
+                "_synced": True,
            })

-        # Create local session wrapper with existing messages
        session = HonchoSession(
            key=key,
            user_peer_id=user_peer_id,
@@ -325,7 +338,9 @@ class HonchoSessionManager:
            messages=local_messages,
        )

-        self._cache[key] = session
+        # Write to cache under lock; if two threads raced, the last writer wins
+        with self._cache_lock:
+            self._cache[key] = session
        return session

    def _flush_session(self, session: HonchoSession) -> bool:
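The locking shape used by `get_or_create` (check under the lock, do slow work outside it, publish under the lock) can be sketched with a toy cache. `TinySessionCache` and its `build` callback are hypothetical; note that, as in the diff, two threads that both miss the cache may each build a value, and the later write replaces the earlier one.

```python
import threading


class TinySessionCache:
    """Toy cache: lock held only for dict access, never across slow I/O."""

    def __init__(self):
        self._cache = {}
        self._lock = threading.RLock()
        self.builds = 0  # counts how many values were actually built

    def get_or_create(self, key, build):
        # 1. Fast path: look up under the lock.
        with self._lock:
            if key in self._cache:
                return self._cache[key]
        # 2. Slow path: build outside the lock so other keys are not blocked.
        value = build(key)
        # 3. Publish under the lock.
        with self._lock:
            self.builds += 1
            self._cache[key] = value
        return value


cache = TinySessionCache()
first = cache.get_or_create("telegram:42", lambda k: {"key": k})
second = cache.get_or_create("telegram:42", lambda k: {"key": k})
print(first is second, cache.builds)
```

If callers must always share a single object even under a race, step 3 could use `dict.setdefault` instead of plain assignment; that is a design alternative, not what the diff does.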
@@ -356,13 +371,15 @@ class HonchoSessionManager:
            for msg in new_messages:
                msg["_synced"] = True
            logger.debug("Synced %d messages to Honcho for %s", len(honcho_messages), session.key)
-            self._cache[session.key] = session
+            with self._cache_lock:
+                self._cache[session.key] = session
            return True
        except Exception as e:
            for msg in new_messages:
                msg["_synced"] = False
            logger.error("Failed to sync messages to Honcho: %s", e)
-            self._cache[session.key] = session
+            with self._cache_lock:
+                self._cache[session.key] = session
            return False

    def _async_writer_loop(self) -> None:
@@ -434,7 +451,9 @@ class HonchoSessionManager:
        Called at session end for "session" write_frequency, or to force
        a sync before process exit regardless of mode.
        """
-        for session in list(self._cache.values()):
+        with self._cache_lock:
+            sessions = list(self._cache.values())
+        for session in sessions:
            try:
                self._flush_session(session)
            except Exception as e:
@@ -459,9 +478,10 @@ class HonchoSessionManager:

    def delete(self, key: str) -> bool:
        """Delete a session from local cache."""
-        if key in self._cache:
-            del self._cache[key]
-            return True
+        with self._cache_lock:
+            if key in self._cache:
+                del self._cache[key]
+                return True
        return False

    def new_session(self, key: str) -> HonchoSession:
@@ -473,20 +493,25 @@ class HonchoSessionManager:
        """
        import time

-        # Remove old session from caches (but don't delete from Honcho)
-        old_session = self._cache.pop(key, None)
-        if old_session:
-            self._sessions_cache.pop(old_session.honcho_session_id, None)
+        # Hold the reentrant lock across get_or_create so a concurrent caller
+        # can't observe the (old-popped, new-not-yet-inserted) gap and create
+        # its own session under the raw key.  `_cache_lock` is an RLock so
+        # nested reacquisition inside get_or_create is safe.
+        with self._cache_lock:
+            # Remove old session from caches (but don't delete from Honcho)
+            old_session = self._cache.pop(key, None)
+            if old_session:
+                self._sessions_cache.pop(old_session.honcho_session_id, None)

-        # Create new session with timestamp suffix
-        timestamp = int(time.time())
-        new_key = f"{key}:{timestamp}"
+            # Create new session with timestamp suffix
+            timestamp = int(time.time())
+            new_key = f"{key}:{timestamp}"

-        # get_or_create will create a fresh session
-        session = self.get_or_create(new_key)
+            # get_or_create will create a fresh session
+            session = self.get_or_create(new_key)

-        # Cache under the original key so callers find it by the expected name
-        self._cache[key] = session
+            # Cache under the original key so callers find it by the expected name
+            self._cache[key] = session

        logger.info("Created new session for %s (honcho: %s)", key, session.honcho_session_id)
        return session
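`new_session` relies on `_cache_lock` being reentrant, because it calls `get_or_create` while already holding the lock. The difference from a plain lock can be shown with standard-library primitives only, using non-blocking acquires so that nothing actually deadlocks:

```python
import threading

# RLock: the owning thread may acquire it again.
rlock = threading.RLock()
with rlock:
    nested_rlock_ok = rlock.acquire(blocking=False)
    if nested_rlock_ok:
        rlock.release()

# Lock: a second acquire from the same thread fails (and would block forever
# if it were a blocking acquire).
plain = threading.Lock()
with plain:
    nested_plain_ok = plain.acquire(blocking=False)
    if nested_plain_ok:
        plain.release()

print(nested_rlock_ok, nested_plain_ok)
```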
@@ -74,18 +74,25 @@ from model_tools import (
    check_toolset_requirements,
 )
 from tools.terminal_tool import cleanup_vm, get_active_env, is_persistent_env
+from tools.terminal_tool import (
+    set_approval_callback as _set_approval_callback,
+    set_sudo_password_callback as _set_sudo_password_callback,
+    _get_approval_callback,
+    _get_sudo_password_callback,
+)
 from tools.tool_result_storage import maybe_persist_tool_result, enforce_turn_budget
 from tools.interrupt import set_interrupt as _set_interrupt
 from tools.browser_tool import cleanup_browser


 # Agent internals extracted to agent/ package for modularity
-from agent.memory_manager import build_memory_context_block, sanitize_context
+from agent.memory_manager import StreamingContextScrubber, build_memory_context_block, sanitize_context
 from agent.retry_utils import jittered_backoff
 from agent.error_classifier import classify_api_error, FailoverReason
 from agent.prompt_builder import (
    DEFAULT_AGENT_IDENTITY, PLATFORM_HINTS,
    MEMORY_GUIDANCE, SESSION_SEARCH_GUIDANCE, SKILLS_GUIDANCE,
+    HERMES_AGENT_HELP_GUIDANCE,
    build_nous_subscription_prompt,
 )
 from agent.model_metadata import (
@@ -892,7 +899,6 @@ class AIAgent:
        checkpoints_enabled: bool = False,
        checkpoint_max_snapshots: int = 50,
        pass_session_id: bool = False,
-        persist_session: bool = True,
    ):
        """
        Initialize the AI Agent.
@@ -964,7 +970,6 @@ class AIAgent:
        self.background_review_callback = None  # Optional sync callback for gateway delivery
        self.skip_context_files = skip_context_files
        self.pass_session_id = pass_session_id
-        self.persist_session = persist_session
        self._credential_pool = credential_pool
        self.log_prefix_chars = log_prefix_chars
        self.log_prefix = f"{log_prefix} " if log_prefix else ""
@@ -1213,6 +1218,10 @@ class AIAgent:
        # Deferred paragraph break flag — set after tool iterations so a
        # single "\n\n" is prepended to the next real text delta.
        self._stream_needs_break = False
+        # Stateful scrubber for <memory-context> spans split across stream
+        # deltas (#5719).  sanitize_context() alone can't survive chunk
+        # boundaries because the block regex needs both tags in one string.
+        self._stream_context_scrubber = StreamingContextScrubber()
        # Visible assistant text already delivered through live token callbacks
        # during the current model response. Used to avoid re-sending the same
        # commentary when the provider later returns it as a completed interim
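To illustrate why a per-chunk regex cannot scrub a `<memory-context>` span that is split across stream deltas, here is a minimal stateful sketch. It is not the real `StreamingContextScrubber` (whose API is not shown in this diff); `TinyStreamScrubber`, `feed`, and `flush` are hypothetical names, and the chunk boundaries are invented.

```python
import re

OPEN, CLOSE = "<memory-context>", "</memory-context>"
BLOCK_RE = re.compile(r"<memory-context>.*?</memory-context>", re.DOTALL)


class TinyStreamScrubber:
    """Hypothetical scrubber: drops text between OPEN and CLOSE across chunks."""

    def __init__(self):
        self._buf = ""        # unprocessed tail that may hold a partial tag
        self._inside = False  # True while between OPEN and CLOSE

    def feed(self, delta: str) -> str:
        self._buf += delta
        out = []
        while True:
            tag = CLOSE if self._inside else OPEN
            idx = self._buf.find(tag)
            if idx != -1:
                if not self._inside:
                    out.append(self._buf[:idx])
                self._buf = self._buf[idx + len(tag):]
                self._inside = not self._inside
                continue
            # No complete tag: hold back the longest tail that could begin one.
            keep = 0
            for n in range(min(len(tag) - 1, len(self._buf)), 0, -1):
                if tag.startswith(self._buf[-n:]):
                    keep = n
                    break
            cut = len(self._buf) - keep
            if not self._inside:
                out.append(self._buf[:cut])
            self._buf = self._buf[cut:]
            return "".join(out)

    def flush(self) -> str:
        """Emit any held-back text at end of stream (nothing if inside a span)."""
        tail = "" if self._inside else self._buf
        self._buf = ""
        return tail


chunks = ["Hello <mem", "ory-context>secret</memory", "-context> world"]

naive = "".join(BLOCK_RE.sub("", c) for c in chunks)  # regex per chunk
scrubber = TinyStreamScrubber()
scrubbed = "".join(scrubber.feed(c) for c in chunks) + scrubber.flush()

print("secret" in naive, "secret" in scrubbed)
```

The per-chunk regex never sees both tags in one string, so the span leaks; the stateful version withholds a possible partial tag until the next delta resolves it.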
@@ -2418,7 +2427,10 @@ class AIAgent:
        if not self.compression_enabled:
            return
        try:
-            from agent.auxiliary_client import get_text_auxiliary_client
+            from agent.auxiliary_client import (
+                _resolve_task_provider_model,
+                get_text_auxiliary_client,
+            )
            from agent.model_metadata import (
                MINIMUM_CONTEXT_LENGTH,
                get_model_context_length,
@@ -2428,6 +2440,14 @@ class AIAgent:
                "compression",
                main_runtime=self._current_main_runtime(),
            )
+            # Best-effort aux provider label for the warning message. The
+            # configured provider may be "auto", in which case we fall back
+            # to the client's base_url hostname so the user can still tell
+            # where the compression model is actually being called.
+            try:
+                _aux_cfg_provider, _, _, _, _ = _resolve_task_provider_model("compression")
+            except Exception:
+                _aux_cfg_provider = ""
            if client is None or not aux_model:
                msg = (
                    "⚠ No auxiliary LLM provider configured — context "
@@ -2494,10 +2514,37 @@ class AIAgent:
                        new_threshold / main_ctx
                    )
                safe_pct = int((aux_context / main_ctx) * 100) if main_ctx else 50
+                # Build human-readable "model (provider)" labels for both
+                # the main model and the compression model so users can
+                # tell at a glance which provider each side is actually
+                # using. When the configured provider is empty or "auto",
+                # fall back to the client's base_url hostname.
+                _main_model = getattr(self, "model", "") or "?"
+                _main_provider = getattr(self, "provider", "") or ""
+                _aux_provider_label = (
+                    _aux_cfg_provider
+                    if _aux_cfg_provider and _aux_cfg_provider != "auto"
+                    else ""
+                )
+                if not _aux_provider_label:
+                    try:
+                        from urllib.parse import urlparse
+                        _aux_provider_label = (
+                            urlparse(aux_base_url).hostname or aux_base_url or "auto"
+                        )
+                    except Exception:
+                        _aux_provider_label = aux_base_url or "auto"
+                _main_label = (
+                    f"{_main_model} ({_main_provider})"
+                    if _main_provider
+                    else _main_model
+                )
+                _aux_label = f"{aux_model} ({_aux_provider_label})"
                msg = (
-                    f"⚠ Compression model ({aux_model}) context is "
-                    f"{aux_context:,} tokens, but the main model's "
-                    f"compression threshold was {old_threshold:,} tokens. "
+                    f"⚠ Compression model {_aux_label} context is "
+                    f"{aux_context:,} tokens, but the main model "
+                    f"{_main_label}'s compression threshold was "
+                    f"{old_threshold:,} tokens. "
                    f"Auto-lowered this session's threshold to "
                    f"{new_threshold:,} tokens so compression can run.\n"
                    f"  To make this permanent, edit config.yaml — either:\n"
@@ -3109,13 +3156,28 @@ class AIAgent:
    )

    _SKILL_REVIEW_PROMPT = (
-        "Review the conversation above and consider saving or updating a skill if appropriate.\n\n"
-        "Focus on: was a non-trivial approach used to complete a task that required trial "
-        "and error, or changing course due to experiential findings along the way, or did "
-        "the user expect or desire a different method or outcome?\n\n"
-        "If a relevant skill already exists, update it with what you learned. "
-        "Otherwise, create a new skill if the approach is reusable.\n"
-        "If nothing is worth saving, just say 'Nothing to save.' and stop."
+        "Review the conversation above and consider whether a skill should be saved or updated.\n\n"
+        "Work in this order — do not skip steps:\n\n"
+        "1. SURVEY the existing skill landscape first. Call skills_list to see what you "
+        "have. If anything looks potentially relevant, skill_view it before deciding. "
+        "You are looking for the CLASS of task that just happened, not the exact task. "
+        "Example: a successful Tauri build is in the class \"desktop app build "
+        "troubleshooting\", not \"fix my specific Tauri error today\".\n\n"
+        "2. THINK CLASS-FIRST. What general pattern of task did the user just complete? "
+        "What conditions will trigger this pattern again? Describe the class in one "
+        "sentence before looking at what to save.\n\n"
+        "3. PREFER GENERALIZING AN EXISTING SKILL over creating a new one. If a skill "
+        "already covers the class — even partially — update it (skill_manage patch) "
+        "with the new insight. Broaden its \"when to use\" trigger if needed.\n\n"
+        "4. ONLY CREATE A NEW SKILL when no existing skill reasonably covers the class. "
+        "When you create one, name and scope it at the class level "
+        "(\"react-i18n-setup\", not \"add-i18n-to-my-dashboard-app\"). The trigger "
+        "section must describe the class of situations, not this one session.\n\n"
+        "5. If you notice two existing skills that overlap, note it in your response "
+        "so a future review can consolidate them. Do not consolidate now unless the "
+        "overlap is obvious and low-risk.\n\n"
+        "Only act when something is genuinely worth saving. "
+        "If nothing stands out, just say 'Nothing to save.' and stop."
    )

    _COMBINED_REVIEW_PROMPT = (
@@ -3125,9 +3187,16 @@ class AIAgent:
        "about how you should behave, their work style, or ways they want you to operate? "
        "If so, save using the memory tool.\n\n"
        "**Skills**: Was a non-trivial approach used to complete a task that required trial "
-        "and error, or changing course due to experiential findings along the way, or did "
-        "the user expect or desire a different method or outcome? If a relevant skill "
-        "already exists, update it. Otherwise, create a new one if the approach is reusable.\n\n"
+        "and error, changing course due to experiential findings, or a different method "
+        "or outcome than the user expected? If so, work in this order:\n"
+        "  a. SURVEY existing skills first (skills_list, then skill_view on candidates).\n"
+        "  b. Identify the CLASS of task, not the specific task "
+        "(\"desktop app build troubleshooting\", not \"fix my Tauri error\").\n"
+        "  c. PREFER UPDATING/GENERALIZING an existing skill that covers the class.\n"
+        "  d. ONLY CREATE A NEW SKILL if no existing one covers the class. Scope at "
+        "the class level, not this one session.\n"
+        "  e. If you notice overlapping skills during the survey, note it so a future "
+        "review can consolidate them.\n\n"
        "Only act if there's something genuinely worth saving. "
        "If nothing stands out, just say 'Nothing to save.' and stop."
    )
@@ -3220,18 +3289,47 @@ class AIAgent:

        def _run_review():
            import contextlib
+            # Install a non-interactive approval callback on this worker
+            # thread so any dangerous-command guard the review agent trips
+            # resolves to "deny" instead of falling back to input() -- which
+            # deadlocks against the parent's prompt_toolkit TUI (#15216).
+            # Same pattern as _subagent_auto_deny in tools/delegate_tool.py.
+            def _bg_review_auto_deny(command, description, **kwargs):
+                logger.warning(
+                    "Background review auto-denied dangerous command: %s (%s)",
+                    command, description,
+                )
+                return "deny"
+            try:
+                _set_approval_callback(_bg_review_auto_deny)
+            except Exception:
+                pass
            review_agent = None
            try:
                with open(os.devnull, "w") as _devnull, \
                     contextlib.redirect_stdout(_devnull), \
                     contextlib.redirect_stderr(_devnull):
+                    # Inherit the parent agent's live runtime (provider, model,
+                    # base_url, api_key, api_mode) so the fork uses the exact
+                    # same credentials the main turn is using.  Without this,
+                    # AIAgent.__init__ re-runs auto-resolution from env vars,
+                    # which fails for OAuth-only providers, session-scoped
+                    # creds, or credential-pool setups where the resolver can't
+                    # reconstruct auth from scratch -- producing the spurious
+                    # "No LLM provider configured" warning at end of turn.
+                    _parent_runtime = self._current_main_runtime()
                    review_agent = AIAgent(
                        model=self.model,
                        max_iterations=8,
                        quiet_mode=True,
                        platform=self.platform,
                        provider=self.provider,
+                        api_mode=_parent_runtime.get("api_mode") or None,
+                        base_url=_parent_runtime.get("base_url") or None,
+                        api_key=_parent_runtime.get("api_key") or None,
+                        credential_pool=getattr(self, "_credential_pool", None),
                        parent_session_id=self.session_id,
+                        enabled_toolsets=["memory", "skills"],
                    )
                    review_agent._memory_write_origin = "background_review"
                    review_agent._memory_write_context = "background_review"
@@ -3271,14 +3369,29 @@ class AIAgent:
                logger.warning("Background memory/skill review failed: %s", e)
                self._emit_auxiliary_failure("background review", e)
            finally:
-                # Close all resources (httpx client, subprocesses, etc.) so
-                # GC doesn't try to clean them up on a dead asyncio event
-                # loop (which produces "Event loop is closed" errors).
+                # Background review agents can initialize memory providers
+                # (for example Hindsight) that own their own network clients.
+                # Explicitly stop those providers before closing the agent so
+                # their aiohttp sessions do not leak until GC/process exit.
+                # Then close all remaining resources (httpx client,
+                # subprocesses, etc.) so GC doesn't try to clean them up on a
+                # dead asyncio event loop (which produces "Event loop is
+                # closed" errors).
                if review_agent is not None:
+                    try:
+                        review_agent.shutdown_memory_provider()
+                    except Exception:
+                        pass
                    try:
                        review_agent.close()
                    except Exception:
                        pass
+                # Clear the approval callback on this bg-review thread so a
+                # recycled thread-id doesn't inherit a stale reference.
+                try:
+                    _set_approval_callback(None)
+                except Exception:
+                    pass

        t = threading.Thread(target=_run_review, daemon=True, name="bg-review")
        t.start()
@@ -3331,10 +3444,7 @@ class AIAgent:
        """Save session state to both JSON log and SQLite on any exit path.

        Ensures conversations are never lost, even on errors or early returns.
-        Skipped when ``persist_session=False`` (ephemeral helper flows).
        """
-        if not self.persist_session:
-            return
        self._apply_persist_user_message_override(messages)
        self._session_messages = messages
        self._save_session_log(messages)
@@ -4459,6 +4569,9 @@ class AIAgent:
            # Fallback to hardcoded identity
            prompt_parts = [DEFAULT_AGENT_IDENTITY]

+        # Pointer to the hermes-agent skill + docs for user questions about Hermes itself.
+        prompt_parts.append(HERMES_AGENT_HELP_GUIDANCE)
+
        # Tool-aware behavioral guidance: only inject when the tools are loaded
        tool_guidance = []
        if "memory" in self.valid_tool_names:
@@ -5187,7 +5300,39 @@ class AIAgent:
            logger.debug("Dead connection check error: %s", exc)
        return False

-    def _create_request_openai_client(self, *, reason: str) -> Any:
+    @staticmethod
+    def _api_kwargs_have_image_parts(api_kwargs: dict) -> bool:
+        """Return True when the outbound request still contains native image parts."""
+        if not isinstance(api_kwargs, dict):
+            return False
+        candidates = []
+        messages = api_kwargs.get("messages")
+        if isinstance(messages, list):
+            candidates.extend(messages)
+        # Responses API payloads use `input`; after conversion, image parts can
+        # still be present there instead of in `messages`.
+        response_input = api_kwargs.get("input")
+        if isinstance(response_input, list):
+            candidates.extend(response_input)
+
+        def _contains_image(value: Any) -> bool:
+            if isinstance(value, dict):
+                ptype = value.get("type")
+                if ptype in {"image_url", "input_image"}:
+                    return True
+                return any(_contains_image(v) for v in value.values())
+            if isinstance(value, list):
+                return any(_contains_image(v) for v in value)
+            return False
+
+        return any(_contains_image(item) for item in candidates)
+
+    def _copilot_headers_for_request(self, *, is_vision: bool) -> dict:
+        from hermes_cli.copilot_auth import copilot_request_headers
+
+        return copilot_request_headers(is_agent_turn=True, is_vision=is_vision)
+
+    def _create_request_openai_client(self, *, reason: str, api_kwargs: Optional[dict] = None) -> Any:
        from unittest.mock import Mock

        primary_client = self._ensure_primary_openai_client(reason=reason)
@@ -5195,6 +5340,11 @@ class AIAgent:
            return primary_client
        with self._openai_client_lock():
            request_kwargs = dict(self._client_kwargs)
+        if (
+            base_url_host_matches(str(request_kwargs.get("base_url", "")), "api.githubcopilot.com")
+            and self._api_kwargs_have_image_parts(api_kwargs or {})
+        ):
+            request_kwargs["default_headers"] = self._copilot_headers_for_request(is_vision=True)
        return self._create_openai_client(request_kwargs, reason=reason, shared=False)

    def _close_request_openai_client(self, client: Any, *, reason: str) -> None:
@@ -5737,7 +5887,10 @@ class AIAgent:
        def _call():
            try:
                if self.api_mode == "codex_responses":
-                    request_client_holder["client"] = self._create_request_openai_client(reason="codex_stream_request")
+                    request_client_holder["client"] = self._create_request_openai_client(
+                        reason="codex_stream_request",
+                        api_kwargs=api_kwargs,
+                    )
                    result["response"] = self._run_codex_stream(
                        api_kwargs,
                        client=request_client_holder["client"],
@@ -5769,7 +5922,10 @@ class AIAgent:
                        raise
                    result["response"] = normalize_converse_response(raw_response)
                else:
-                    request_client_holder["client"] = self._create_request_openai_client(reason="chat_completion_request")
+                    request_client_holder["client"] = self._create_request_openai_client(
+                        reason="chat_completion_request",
+                        api_kwargs=api_kwargs,
+                    )
                    result["response"] = request_client_holder["client"].chat.completions.create(**api_kwargs)
            except Exception as e:
                result["error"] = e
@@ -5867,6 +6023,20 @@ class AIAgent:

    def _reset_stream_delivery_tracking(self) -> None:
        """Reset tracking for text delivered during the current model response."""
+        # Flush any benign partial-tag tail held by the context scrubber so it
+        # reaches the UI before we clear state for the next model call.  If
+        # the scrubber is mid-span, flush() drops the orphaned content.
+        scrubber = getattr(self, "_stream_context_scrubber", None)
+        if scrubber is not None:
+            tail = scrubber.flush()
+            if tail:
+                callbacks = [cb for cb in (self.stream_delta_callback, self._stream_callback) if cb is not None]
+                for cb in callbacks:
+                    try:
+                        cb(tail)
+                    except Exception:
+                        pass
+                self._record_streamed_assistant_text(tail)
        self._current_streamed_assistant_text = ""

    def _record_streamed_assistant_text(self, text: str) -> None:
@@ -5917,6 +6087,28 @@ class AIAgent:
        if getattr(self, "_stream_needs_break", False) and text and text.strip():
            self._stream_needs_break = False
            text = "\n\n" + text
+            prepended_break = True
+        else:
+            prepended_break = False
+        if isinstance(text, str):
+            # Strip <think> blocks first (per-delta is safe for closed pairs; the
+            # unterminated-tag path is handled downstream by stream_consumer).
+            # Then feed through the stateful context scrubber so memory-context
+            # spans split across chunks cannot leak to the UI (#5719).
+            text = self._strip_think_blocks(text or "")
+            scrubber = getattr(self, "_stream_context_scrubber", None)
+            if scrubber is not None:
+                text = scrubber.feed(text)
+            else:
+                # Defensive: legacy callers without the scrubber attribute.
+                text = sanitize_context(text)
+            # Only strip leading newlines on the first delta — mid-stream "\n" is legitimate markdown.
+            if not prepended_break and not getattr(
+                self, "_current_streamed_assistant_text", ""
+            ):
+                text = text.lstrip("\n")
+        if not text:
+            return
        callbacks = [cb for cb in (self.stream_delta_callback, self._stream_callback) if cb is not None]
        delivered = False
        for cb in callbacks:
@@ -6112,7 +6304,8 @@ class AIAgent:
                ),
            }
            request_client_holder["client"] = self._create_request_openai_client(
-                reason="chat_completion_stream_request"
+                reason="chat_completion_stream_request",
+                api_kwargs=stream_kwargs,
            )
            # Reset stale-stream timer so the detector measures from this
            # attempt's start, not a previous attempt's last chunk.
@@ -7244,6 +7437,26 @@ class AIAgent:
        self._anthropic_image_fallback_cache[cache_key] = note
        return note

+    def _model_supports_vision(self) -> bool:
+        """Return True if the active provider+model reports native vision.
+
+        Used to decide whether to strip image content parts from API-bound
+        messages (for non-vision models) or let the provider adapter handle
+        them natively (for vision-capable models).
+        """
+        try:
+            from agent.models_dev import get_model_capabilities
+            provider = (getattr(self, "provider", "") or "").strip()
+            model = (getattr(self, "model", "") or "").strip()
+            if not provider or not model:
+                return False
+            caps = get_model_capabilities(provider, model)
+            if caps is None:
+                return False
+            return bool(caps.supports_vision)
+        except Exception:
+            return False
+
    def _preprocess_anthropic_content(self, content: Any, role: str) -> Any:
        if not self._content_has_image_parts(content):
            return content
@@ -7307,12 +7520,23 @@ class AIAgent:
        return t

    def _prepare_anthropic_messages_for_api(self, api_messages: list) -> list:
+        # Fast exit when no message carries image content at all.
        if not any(
            isinstance(msg, dict) and self._content_has_image_parts(msg.get("content"))
            for msg in api_messages
        ):
            return api_messages

+        # The Anthropic adapter (agent/anthropic_adapter.py:_convert_content_part_to_anthropic)
+        # already translates OpenAI-style image_url/input_image parts into
+        # native Anthropic ``{"type": "image", "source": ...}`` blocks. When
+        # the active model supports vision we let the adapter do its job and
+        # skip this legacy text-fallback preprocessor entirely.
+        if self._model_supports_vision():
+            return api_messages
+
+        # Non-vision Anthropic model (rare today, but keep the fallback for
+        # compat): replace each image part with a vision_analyze text note.
        transformed = copy.deepcopy(api_messages)
        for msg in transformed:
            if not isinstance(msg, dict):
@@ -7323,6 +7547,150 @@ class AIAgent:
            )
        return transformed

+    def _prepare_messages_for_non_vision_model(self, api_messages: list) -> list:
+        """Strip native image parts when the active model lacks vision.
+
+        Runs on the chat.completions / codex_responses paths. Vision-capable
+        models pass through unchanged (provider and any downstream translator
+        handle the image parts natively). Non-vision models get each image
+        replaced by a cached vision_analyze text description so the turn
+        doesn't fail with "model does not support image input".
+        """
+        if not any(
+            isinstance(msg, dict) and self._content_has_image_parts(msg.get("content"))
+            for msg in api_messages
+        ):
+            return api_messages
+
+        if self._model_supports_vision():
+            return api_messages
+
+        transformed = copy.deepcopy(api_messages)
+        for msg in transformed:
+            if not isinstance(msg, dict):
+                continue
+            # Reuse the Anthropic text-fallback preprocessor — the behaviour is
+            # identical (walk content parts, replace images with cached
+            # descriptions, merge back into a single text or structured
+            # content). Naming is historical.
+            msg["content"] = self._preprocess_anthropic_content(
+                msg.get("content"),
+                str(msg.get("role", "user") or "user"),
+            )
+        return transformed
+
+    def _try_shrink_image_parts_in_messages(self, api_messages: list) -> bool:
+        """Re-encode oversized native image parts at a smaller size to recover
+        from image-too-large errors (Anthropic's 5 MB cap; other limits unknown).
+
+        Mutates ``api_messages`` in place. Returns True if any image part was
+        actually replaced, False if there were no image parts to shrink or
+        Pillow couldn't help (caller should surface the original error).
+
+        Strategy: look for ``image_url`` / ``input_image`` parts carrying a
+        ``data:image/...;base64,...`` payload.  For each one whose encoded
+        size exceeds 4 MB (a safe target that slides under Anthropic's 5 MB
+        ceiling with header overhead), write the base64 to a tempfile, call
+        ``vision_tools._resize_image_for_vision`` to produce a smaller data
+        URL, and substitute it in place.
+
+        Non-data-URL images (http/https URLs) are not touched — the provider
+        fetches those itself and the size limit is different.
+        """
+        if not api_messages:
+            return False
+
+        try:
+            from tools.vision_tools import _resize_image_for_vision
+        except Exception as exc:
+            logger.warning("image-shrink recovery: vision_tools unavailable — %s", exc)
+            return False
+
+        # 4 MB target leaves comfortable headroom under Anthropic's 5 MB.
+        # Other providers have not been observed rejecting much larger
+        # images. Shrinking to 4 MB loses quality, but this path only runs
+        # after a confirmed provider rejection, where the alternative is failure.
+        target_bytes = 4 * 1024 * 1024
+        changed_count = 0
+
+        def _shrink_data_url(url: str) -> Optional[str]:
+            """Return a smaller data URL, or None if shrink can't help."""
+            if not isinstance(url, str) or not url.startswith("data:"):
+                return None
+            if len(url) <= target_bytes:
+                # This specific image wasn't the oversized one.
+                return None
+            try:
+                header, _, data = url.partition(",")
+                mime = "image/jpeg"
+                if header.startswith("data:"):
+                    mime_part = header[len("data:"):].split(";", 1)[0].strip()
+                    if mime_part.startswith("image/"):
+                        mime = mime_part
+                import base64 as _b64
+                raw = _b64.b64decode(data)
+                suffix = {
+                    "image/png": ".png", "image/gif": ".gif", "image/webp": ".webp",
+                    "image/jpeg": ".jpg", "image/jpg": ".jpg", "image/bmp": ".bmp",
+                }.get(mime, ".jpg")
+                tmp = tempfile.NamedTemporaryFile(
+                    prefix="hermes_shrink_", suffix=suffix, delete=False,
+                )
+                try:
+                    tmp.write(raw)
+                    tmp.close()
+                    resized = _resize_image_for_vision(
+                        Path(tmp.name),
+                        mime_type=mime,
+                        max_base64_bytes=target_bytes,
+                    )
+                finally:
+                    try:
+                        Path(tmp.name).unlink(missing_ok=True)
+                    except Exception:
+                        pass
+                if not resized or len(resized) >= len(url):
+                    # Shrink didn't help (or made it bigger — corrupt input?).
+                    return None
+                return resized
+            except Exception as exc:
+                logger.warning("image-shrink recovery: re-encode failed — %s", exc)
+                return None
+
+        for msg in api_messages:
+            if not isinstance(msg, dict):
+                continue
+            content = msg.get("content")
+            if not isinstance(content, list):
+                continue
+            for part in content:
+                if not isinstance(part, dict):
+                    continue
+                ptype = part.get("type")
+                if ptype not in {"image_url", "input_image"}:
+                    continue
+                image_value = part.get("image_url")
+                # OpenAI chat.completions: {"image_url": {"url": "data:..."}}
+                # OpenAI Responses: {"image_url": "data:..."}
+                if isinstance(image_value, dict):
+                    url = image_value.get("url", "")
+                    resized = _shrink_data_url(url)
+                    if resized:
+                        image_value["url"] = resized
+                        changed_count += 1
+                elif isinstance(image_value, str):
+                    resized = _shrink_data_url(image_value)
+                    if resized:
+                        part["image_url"] = resized
+                        changed_count += 1
+
+        if changed_count:
+            logger.info(
+                "image-shrink recovery: re-encoded %d image part(s) to fit under %.0f MB",
+                changed_count, target_bytes / (1024 * 1024),
+            )
+        return changed_count > 0
+
    def _anthropic_preserve_dots(self) -> bool:
        """True when using an anthropic-compatible endpoint that preserves dots in model names.
        Alibaba/DashScope keeps dots (e.g. qwen3.5-plus).
@@ -7471,9 +7839,10 @@ class AIAgent:
                )
            )
            is_xai_responses = self.provider == "xai" or self._base_url_hostname == "api.x.ai"
+            _msgs_for_codex = self._prepare_messages_for_non_vision_model(api_messages)
            return _ct.build_kwargs(
                model=self.model,
-                messages=api_messages,
+                messages=_msgs_for_codex,
                tools=self.tools,
                reasoning_config=self.reasoning_config,
                session_id=getattr(self, "session_id", None),
@@ -7552,9 +7921,12 @@ class AIAgent:
        if _ephemeral_out is not None:
            self._ephemeral_max_output_tokens = None

+        # Strip image parts for non-vision models (no-op when vision-capable).
+        _msgs_for_chat = self._prepare_messages_for_non_vision_model(api_messages)
+
        return _ct.build_kwargs(
            model=self.model,
-            messages=api_messages,
+            messages=_msgs_for_chat,
            tools=self.tools,
            timeout=self._resolved_api_call_timeout(),
            max_tokens=self.max_tokens,
@@ -7851,36 +8223,45 @@ class AIAgent:
            api_msg["reasoning_content"] = existing
            return

-        # 2. DeepSeek / Kimi thinking mode: tool-call turns that lack
-        # reasoning_content are "poisoned history" — a prior provider (MiniMax,
-        # etc.) left them empty. DeepSeek returns HTTP 400 if reasoning_content
-        # is absent on replay; inject "" to satisfy the provider's requirement
-        # without forwarding any cross-provider reasoning content.
-        needs_empty_reasoning = (
-            source_msg.get("tool_calls")
-            and (
-                self._needs_kimi_tool_reasoning()
-                or self._needs_deepseek_tool_reasoning()
-            )
+        needs_thinking_pad = (
+            self._needs_kimi_tool_reasoning()
+            or self._needs_deepseek_tool_reasoning()
        )
-        if needs_empty_reasoning:
+
+        # 2. Cross-provider poisoned history (#15748): on DeepSeek/Kimi,
+        # if the source turn has tool_calls AND a 'reasoning' field but no
+        # 'reasoning_content' key, the 'reasoning' text was written by a
+        # prior provider (e.g. MiniMax) — DeepSeek's own _build_assistant_message
+        # always pins reasoning_content="" at creation time for tool-call turns,
+        # so the shape (reasoning set, reasoning_content absent, tool_calls
+        # present) is unreachable from same-provider DeepSeek history. Inject
+        # "" to satisfy the API without leaking another provider's chain of
+        # thought to DeepSeek/Kimi.
+        normalized_reasoning = source_msg.get("reasoning")
+        if (
+            needs_thinking_pad
+            and source_msg.get("tool_calls")
+            and isinstance(normalized_reasoning, str)
+            and normalized_reasoning
+        ):
            api_msg["reasoning_content"] = ""
            return

        # 3. Healthy session: promote 'reasoning' field to 'reasoning_content'
        # for providers that use the internal 'reasoning' key.
-        normalized_reasoning = source_msg.get("reasoning")
+        # This must happen BEFORE the step-4 DeepSeek/Kimi pad so that
+        # genuine reasoning content is not overwritten by the empty-string
+        # fallback (#15812 regression in PR #15478).
        if isinstance(normalized_reasoning, str) and normalized_reasoning:
            api_msg["reasoning_content"] = normalized_reasoning
            return

        # 4. DeepSeek / Kimi thinking mode: all assistant messages need
        # reasoning_content. Inject "" to satisfy the provider's requirement
-        # when no explicit reasoning content is present.
-        if (
-            self._needs_kimi_tool_reasoning()
-            or self._needs_deepseek_tool_reasoning()
-        ):
+        # when no explicit reasoning content is present. Covers both
+        # tool-call turns (already-poisoned history with no reasoning at all)
+        # and plain text turns.
+        if needs_thinking_pad:
            api_msg["reasoning_content"] = ""
            return

@@ -8118,6 +8499,22 @@ class AIAgent:
            except Exception as e:
                logger.warning("Session DB compression split failed — new session will NOT be indexed: %s", e)

+        # Notify the context engine that the session_id rotated because of
+        # compression (not a fresh /new). Plugin engines (e.g. hermes-lcm) use
+        # boundary_reason="compression" to preserve DAG lineage across the
+        # rollover instead of re-initializing fresh per-session state.
+        # See hermes-lcm#68. Built-in ContextCompressor ignores kwargs.
+        try:
+            _old_sid = locals().get("old_session_id")
+            if _old_sid and hasattr(self.context_compressor, "on_session_start"):
+                self.context_compressor.on_session_start(
+                    self.session_id or "",
+                    boundary_reason="compression",
+                    old_session_id=_old_sid,
+                )
+        except Exception as _ce_err:
+            logger.debug("context engine on_session_start (compression): %s", _ce_err)
+
        # Warn on repeated compressions (quality degrades with each pass)
        _cc = self.context_compressor.compression_count
        if _cc >= 2:
@@ -8401,6 +8798,14 @@ class AIAgent:
        self._current_tool = tool_names_str
        self._touch_activity(f"executing {num_tools} tools concurrently: {tool_names_str}")

+        # Capture CLI callbacks from the agent thread so worker threads can
+        # register them locally.  Without this, _get_approval_callback() in
+        # terminal_tool returns None in ThreadPoolExecutor workers, causing
+        # the dangerous-command prompt to fall back to input() — which
+        # deadlocks against prompt_toolkit's raw terminal mode (#13617).
+        _parent_approval_cb = _get_approval_callback()
+        _parent_sudo_cb = _get_sudo_password_callback()
+
        def _run_tool(index, tool_call, function_name, function_args):
            """Worker function executed in a thread."""
            # Register this worker tid so the agent can fan out an interrupt
@@ -8427,6 +8832,18 @@ class AIAgent:
                set_activity_callback(self._touch_activity)
            except Exception:
                pass
+            # Propagate approval/sudo callbacks to this worker thread.
+            # Mirrors cli.py run_agent() pattern (GHSA-qg5c-hvr5-hjgr).
+            if _parent_approval_cb is not None:
+                try:
+                    _set_approval_callback(_parent_approval_cb)
+                except Exception:
+                    pass
+            if _parent_sudo_cb is not None:
+                try:
+                    _set_sudo_password_callback(_parent_sudo_cb)
+                except Exception:
+                    pass
            start = time.time()
            try:
                result = self._invoke_tool(function_name, function_args, effective_task_id, tool_call.id, messages=messages)
@@ -8449,6 +8866,13 @@ class AIAgent:
                _set_interrupt(False, _worker_tid)
            except Exception:
                pass
+            # Clear thread-local callbacks so a recycled worker thread
+            # doesn't hold stale references to a disposed CLI instance.
+            try:
+                _set_approval_callback(None)
+                _set_sudo_password_callback(None)
+            except Exception:
+                pass

        # Start spinner for CLI mode (skip when TUI handles tool progress)
        spinner = None
@@ -9208,16 +9632,6 @@ class AIAgent:
        if isinstance(persist_user_message, str):
            persist_user_message = _sanitize_surrogates(persist_user_message)

-        # Strip leaked <memory-context> blocks from user input.  When Honcho's
-        # saveMessages persists a turn that included injected context, the block
-        # can reappear in the next turn's user message via message history.
-        # Stripping here prevents stale memory tags from leaking into the
-        # conversation and being visible to the user or the model as user text.
-        if isinstance(user_message, str):
-            user_message = sanitize_context(user_message)
-        if isinstance(persist_user_message, str):
-            persist_user_message = sanitize_context(persist_user_message)
-
        # Store stream callback for _interruptible_api_call to pick up
        self._stream_callback = stream_callback
        self._persist_user_message_idx = None
@@ -9296,6 +9710,13 @@ class AIAgent:
        # Track user turns for memory flush and periodic nudge logic
        self._user_turn_count += 1

+        # Reset the streaming context scrubber at the top of each turn so a
+        # hung span from a prior interrupted stream can't taint this turn's
+        # output.
+        scrubber = getattr(self, "_stream_context_scrubber", None)
+        if scrubber is not None:
+            scrubber.reset()
+
        # Preserve the original user message (no nudge injection).
        original_user_message = persist_user_message if persist_user_message is not None else user_message

@@ -9823,6 +10244,7 @@ class AIAgent:
            nous_auth_retry_attempted=False
            copilot_auth_retry_attempted=False
            thinking_sig_retry_attempted = False
+            image_shrink_retry_attempted = False
            has_retried_429 = False
            restart_with_compressed_messages = False
            restart_with_length_continuation = False
@@ -10744,6 +11166,31 @@ class AIAgent:
                    )
                    if recovered_with_pool:
                        continue
+
+                    # Image-too-large recovery: shrink oversized native image
+                    # parts in-place and retry once.  Triggered by Anthropic's
+                    # per-image 5 MB ceiling (400 with "image exceeds 5 MB
+                    # maximum") or any other provider that complains about
+                    # image size.  If shrink fails or a second attempt still
+                    # fails, fall through to normal error handling.
+                    if (
+                        classified.reason == FailoverReason.image_too_large
+                        and not image_shrink_retry_attempted
+                    ):
+                        image_shrink_retry_attempted = True
+                        if self._try_shrink_image_parts_in_messages(api_messages):
+                            self._vprint(
+                                f"{self.log_prefix}📐 Image(s) exceeded provider size limit — "
+                                f"shrank and retrying...",
+                                force=True,
+                            )
+                            continue
+                        else:
+                            logger.info(
+                                "image-shrink recovery: no data-URL image parts found "
+                                "or shrink didn't reduce size; surfacing original error."
+                            )
+
                    if (
                        self.api_mode == "codex_responses"
                        and self.provider == "openai-codex"
@@ -11007,36 +11454,69 @@ class AIAgent:
                                continue

                    # ── Nous Portal: record rate limit & skip retries ─────
-                    # When Nous returns a 429, record the reset time to a
-                    # shared file so ALL sessions (cron, gateway, auxiliary)
-                    # know not to pile on.  Then skip further retries —
-                    # each one burns another RPH request and deepens the
-                    # rate limit hole.  The retry loop's top-of-iteration
-                    # guard will catch this on the next pass and try
-                    # fallback or bail with a clear message.
+                    # When Nous returns a 429 that is a genuine account-
+                    # level rate limit, record the reset time to a shared
+                    # file so ALL sessions (cron, gateway, auxiliary) know
+                    # not to pile on, then skip further retries -- each
+                    # one burns another RPH request and deepens the hole.
+                    # The retry loop's top-of-iteration guard will catch
+                    # this on the next pass and try fallback or bail.
+                    #
+                    # IMPORTANT: Nous Portal multiplexes multiple upstream
+                    # providers (DeepSeek, Kimi, MiMo, Hermes).  A 429 can
+                    # also mean an UPSTREAM provider is out of capacity
+                    # for one specific model -- transient, clears in
+                    # seconds, nothing to do with the caller's quota.
+                    # Tripping the cross-session breaker on that would
+                    # block every Nous model for minutes.  We use
+                    # ``is_genuine_nous_rate_limit`` to tell the two
+                    # apart via the 429's own x-ratelimit-* headers and
+                    # the last-known-good state captured on the previous
+                    # successful response.
                    if (
                        is_rate_limited
                        and self.provider == "nous"
                        and classified.reason == FailoverReason.rate_limit
                        and not recovered_with_pool
                    ):
+                        _genuine_nous_rate_limit = False
                        try:
-                            from agent.nous_rate_guard import record_nous_rate_limit
+                            from agent.nous_rate_guard import (
+                                is_genuine_nous_rate_limit,
+                                record_nous_rate_limit,
+                            )
                            _err_resp = getattr(api_error, "response", None)
                            _err_hdrs = (
                                getattr(_err_resp, "headers", None)
                                if _err_resp else None
                            )
-                            record_nous_rate_limit(
+                            _genuine_nous_rate_limit = is_genuine_nous_rate_limit(
                                headers=_err_hdrs,
-                                error_context=error_context,
+                                last_known_state=self._rate_limit_state,
                            )
+                            if _genuine_nous_rate_limit:
+                                record_nous_rate_limit(
+                                    headers=_err_hdrs,
+                                    error_context=error_context,
+                                )
+                            else:
+                                logging.info(
+                                    "Nous 429 looks like upstream capacity "
+                                    "(no exhausted bucket in headers or "
+                                    "last-known state) -- not tripping "
+                                    "cross-session breaker."
+                                )
                        except Exception:
                            pass
-                        # Skip straight to max_retries — the top-of-loop
-                        # guard will handle fallback or bail cleanly.
-                        retry_count = max_retries
-                        continue
+                        if _genuine_nous_rate_limit:
+                            # Skip straight to max_retries -- the
+                            # top-of-loop guard will handle fallback or
+                            # bail cleanly.
+                            retry_count = max_retries
+                            continue
+                        # Upstream capacity 429: fall through to normal
+                        # retry logic.  A different model (or the same
+                        # model a moment later) will typically succeed.

                    is_payload_too_large = (
                        classified.reason == FailoverReason.payload_too_large
@@ -12268,7 +12748,6 @@ class AIAgent:
                        truncated_response_prefix = ""
                        length_continue_retries = 0
                    
-                    # Strip <think> blocks from user-facing response (keep raw in messages for trajectory)
                    final_response = self._strip_think_blocks(final_response).strip()
                    
                    final_msg = self._build_assistant_message(assistant_message, finish_reason)
@@ -0,0 +1,95 @@
+#!/usr/bin/env python3
+"""Build the Hermes Model Catalog — a centralized JSON manifest of curated models.
+
+This script reads the in-repo hardcoded curated lists (``OPENROUTER_MODELS``,
+``_PROVIDER_MODELS["nous"]``) and writes them to a JSON manifest that the
+Hermes CLI fetches at runtime. Publishing the catalog through the docs site
+lets maintainers update model lists without shipping a Hermes release.
+
+The runtime fetcher falls back to the same in-repo hardcoded lists if the
+manifest is unreachable, so this script is a convenience for keeping the
+manifest in sync — not a source of truth.
+
+Usage::
+
+    python scripts/build_model_catalog.py
+
+Output: ``website/static/api/model-catalog.json``
+
+Live URL (after ``deploy-site.yml`` runs on merge to main):
+``https://hermes-agent.nousresearch.com/docs/api/model-catalog.json``
+"""
+
+from __future__ import annotations
+
+import json
+import os
+import sys
+from datetime import datetime, timezone
+
+REPO_ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
+sys.path.insert(0, REPO_ROOT)
+
+# Ensure HERMES_HOME is set for imports that touch it at module level.
+os.environ.setdefault("HERMES_HOME", os.path.join(os.path.expanduser("~"), ".hermes"))
+
+from hermes_cli.models import OPENROUTER_MODELS, _PROVIDER_MODELS  # noqa: E402
+
+OUTPUT_PATH = os.path.join(REPO_ROOT, "website", "static", "api", "model-catalog.json")
+CATALOG_VERSION = 1
+
+
+def build_catalog() -> dict:
+    return {
+        "version": CATALOG_VERSION,
+        "updated_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
+        "metadata": {
+            "source": "hermes-agent repo",
+            "docs": "https://hermes-agent.nousresearch.com/docs/reference/model-catalog",
+        },
+        "providers": {
+            "openrouter": {
+                "metadata": {
+                    "display_name": "OpenRouter",
+                    "note": (
+                        "Descriptions drive picker badges. Live /api/v1/models "
+                        "filters curated ids by tool-calling support and free pricing."
+                    ),
+                },
+                "models": [
+                    {"id": mid, "description": desc}
+                    for mid, desc in OPENROUTER_MODELS
+                ],
+            },
+            "nous": {
+                "metadata": {
+                    "display_name": "Nous Portal",
+                    "note": (
+                        "Free-tier gating is determined live via Portal pricing "
+                        "(partition_nous_models_by_tier), not this manifest."
+                    ),
+                },
+                "models": [
+                    {"id": mid}
+                    for mid in _PROVIDER_MODELS.get("nous", [])
+                ],
+            },
+        },
+    }
+
+
+def main() -> int:
+    catalog = build_catalog()
+    os.makedirs(os.path.dirname(OUTPUT_PATH), exist_ok=True)
+    with open(OUTPUT_PATH, "w") as fh:
+        json.dump(catalog, fh, indent=2)
+        fh.write("\n")
+
+    print(f"Wrote {OUTPUT_PATH}")
+    for provider, block in catalog["providers"].items():
+        print(f"  {provider}: {len(block['models'])} models")
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
@@ -1055,10 +1055,37 @@ setup_path() {
        return 0
    fi

-    # FHS layout: /usr/local/bin is on PATH for every standard shell, nothing to inject.
+    # FHS layout: /usr/local/bin is normally on PATH for login shells (via
+    # /etc/profile pathmunge), but on RHEL/CentOS/Rocky/Alma 8+ non-login
+    # interactive root shells (su, sudo -s, tmux panes, some web terminals)
+    # only source /etc/bashrc, which does NOT add /usr/local/bin — and
+    # /root/.bash_profile doesn't either.  So verify with `command -v` and
+    # fall back to writing a PATH guard into /root/.bashrc when needed.
    if [ "$ROOT_FHS_LAYOUT" = true ]; then
        export PATH="$command_link_dir:$PATH"
-        log_info "/usr/local/bin is already on PATH for all shells"
+        # Probe a fresh non-login interactive bash the way the user will use it.
+        # `bash -i -c` sources ~/.bashrc but NOT ~/.bash_profile or /etc/profile,
+        # which is the exact scenario where RHEL root loses /usr/local/bin.
+        if env -i HOME="$HOME" TERM="${TERM:-dumb}" bash -i -c 'command -v hermes' \
+                >/dev/null 2>&1; then
+            log_info "/usr/local/bin is already on PATH for all shells"
+            log_success "hermes command ready"
+            return 0
+        fi
+
+        log_info "hermes not on PATH in non-login shells (common on RHEL-family)"
+        PATH_LINE='export PATH="/usr/local/bin:$PATH"'
+        PATH_COMMENT='# Hermes Agent — ensure /usr/local/bin is on PATH (RHEL non-login shells)'
+        for SHELL_CONFIG in "$HOME/.bashrc" "$HOME/.bash_profile"; do
+            [ -f "$SHELL_CONFIG" ] || continue
+            if ! grep -v '^[[:space:]]*#' "$SHELL_CONFIG" 2>/dev/null \
+                    | grep -qE 'PATH=.*(/usr/local/bin|\$command_link_dir)'; then
+                echo "" >> "$SHELL_CONFIG"
+                echo "$PATH_COMMENT" >> "$SHELL_CONFIG"
+                echo "$PATH_LINE" >> "$SHELL_CONFIG"
+                log_success "Added /usr/local/bin to PATH in $SHELL_CONFIG"
+            fi
+        done
        log_success "hermes command ready"
        return 0
    fi
@@ -0,0 +1,614 @@
+#!/usr/bin/env python3
+"""Drive the Hermes TUI under HERMES_DEV_PERF and summarize the pipeline.
+
+Usage:
+  scripts/profile-tui.py [--session SID] [--hold KEY] [--seconds N] [--rate HZ]
+
+Defaults: picks the session with the most messages, holds PageUp for 8s at
+~30 Hz (matching xterm key-repeat), summarizes ~/.hermes/perf.log on exit.
+
+The --tui build must exist (run `npm run build` in ui-tui first). This script
+launches `node dist/entry.js` directly with HERMES_TUI_RESUME set so it
+bypasses the hermes_cli wrapper — we want repeatable timing, not the CLI's
+session-picker flow.
+
+Environment overrides:
+  HERMES_PERF_LOG     (default ~/.hermes/perf.log)
+  HERMES_PERF_NODE    (default node from $PATH)
+  HERMES_TUI_DIR      (default /home/bb/hermes-agent/ui-tui)
+
+Exit code is 0 if the harness ran and parsed results, 2 if the TUI crashed
+or produced no perf data (suggests HERMES_DEV_PERF wiring is broken).
+"""
+
+from __future__ import annotations
+
+import argparse
+import json
+import os
+import pty
+import select
+import signal
+import sqlite3
+import sys
+import time
+from pathlib import Path
+from typing import Any
+
+
+DEFAULT_TUI_DIR = Path(os.environ.get("HERMES_TUI_DIR", "/home/bb/hermes-agent/ui-tui"))
+DEFAULT_LOG = Path(os.environ.get("HERMES_PERF_LOG", str(Path.home() / ".hermes" / "perf.log")))
+DEFAULT_STATE_DB = Path.home() / ".hermes" / "state.db"
+
+# Keystroke escape sequences.  Matches what xterm/VT220 send when the
+# terminal has bracketed-paste disabled and the key-repeat handler fires.
+KEYS = {
+    "page_up": b"\x1b[5~",
+    "page_down": b"\x1b[6~",
+    "wheel_up": b"\x1b[M`!!",      # mouse wheel up (SGR-less) — best-effort
+    "shift_up": b"\x1b[1;2A",
+    "shift_down": b"\x1b[1;2B",
+}
+
+
+def pick_longest_session(db: Path) -> str:
+    conn = sqlite3.connect(db)
+    row = conn.execute(
+        "SELECT id FROM sessions s ORDER BY "
+        "(SELECT COUNT(*) FROM messages m WHERE m.session_id = s.id) DESC LIMIT 1"
+    ).fetchone()
+    if not row:
+        sys.exit(f"no sessions in {db}")
+    return row[0]
+
+
+def drain(fd: int, timeout: float) -> bytes:
+    """Read whatever's available from fd within `timeout`, then return."""
+    chunks = []
+    end = time.monotonic() + timeout
+    while time.monotonic() < end:
+        r, _, _ = select.select([fd], [], [], max(0.0, end - time.monotonic()))
+        if not r:
+            break
+        try:
+            data = os.read(fd, 4096)
+        except OSError:
+            break
+        if not data:
+            break
+        chunks.append(data)
+    return b"".join(chunks)
+
+
+def hold_key(fd: int, seq: bytes, seconds: float, rate_hz: int) -> int:
+    """Write `seq` to fd at ~rate_hz for `seconds`. Returns keystrokes sent."""
+    interval = 1.0 / max(1, rate_hz)
+    end = time.monotonic() + seconds
+    sent = 0
+    while time.monotonic() < end:
+        try:
+            os.write(fd, seq)
+            sent += 1
+        except OSError:
+            break
+        # Drain stdout to keep the PTY buffer flowing; ignore content.
+        drain(fd, 0)
+        time.sleep(interval)
+    return sent
+
+
+def summarize(log: Path, since_ts_ms: int) -> dict[str, Any]:
+    """Parse perf.log, keep only events newer than since_ts_ms, return stats."""
+    react_events: list[dict[str, Any]] = []
+    frame_events: list[dict[str, Any]] = []
+    if not log.exists():
+        return {"error": f"no log at {log}", "react": [], "frame": []}
+    for line in log.read_text().splitlines():
+        line = line.strip()
+        if not line:
+            continue
+        try:
+            row = json.loads(line)
+        except json.JSONDecodeError:
+            continue
+        if int(row.get("ts", 0)) < since_ts_ms:
+            continue
+        src = row.get("src")
+        if src == "react":
+            react_events.append(row)
+        elif src == "frame":
+            frame_events.append(row)
+
+    return {
+        "react": react_events,
+        "frame": frame_events,
+    }
+
+
+def pct(values: list[float], p: float) -> float:
+    if not values:
+        return 0.0
+    s = sorted(values)
+    idx = min(len(s) - 1, int(len(s) * p))
+    return s[idx]
+
+
+def format_report(data: dict[str, Any]) -> str:
+    react = data.get("react") or []
+    frames = data.get("frame") or []
+    out = []
+
+    out.append("═══ React Profiler ═══")
+    if not react:
+        out.append("  (no react events — HERMES_DEV_PERF wired? threshold too high?)")
+    else:
+        by_id: dict[str, list[float]] = {}
+        for r in react:
+            by_id.setdefault(r["id"], []).append(r["actualMs"])
+        out.append(f"  {'pane':<14} {'count':>6} {'p50':>8} {'p95':>8} {'p99':>8} {'max':>8}")
+        for pid, ms in sorted(by_id.items(), key=lambda kv: -pct(kv[1], 0.99)):
+            out.append(
+                f"  {pid:<14} {len(ms):>6} {pct(ms,0.50):>8.2f} {pct(ms,0.95):>8.2f} "
+                f"{pct(ms,0.99):>8.2f} {max(ms):>8.2f}"
+            )
+
+    out.append("")
+    out.append("═══ Ink pipeline ═══")
+    if not frames:
+        out.append("  (no frame events — onFrame wiring broken?)")
+    else:
+        dur = [f["durationMs"] for f in frames]
+        phases_present = any(f.get("phases") for f in frames)
+        out.append(f"  frames captured: {len(frames)}")
+        out.append(
+            f"  durationMs  p50={pct(dur,0.50):.2f}  p95={pct(dur,0.95):.2f}  "
+            f"p99={pct(dur,0.99):.2f}  max={max(dur):.2f}"
+        )
+        # Effective FPS during the run: frames / elapsed seconds.
+        ts = sorted(f["ts"] for f in frames)
+        if len(ts) >= 2:
+            elapsed_s = (ts[-1] - ts[0]) / 1000.0
+            fps = len(frames) / elapsed_s if elapsed_s > 0 else float("inf")
+            out.append(f"  throughput: {len(frames)} frames / {elapsed_s:.2f}s = {fps:.1f} fps")
+
+        if phases_present:
+            fields = ["yoga", "renderer", "diff", "optimize", "write", "commit"]
+            out.append("")
+            out.append(f"  {'phase':<10} {'p50':>8} {'p95':>8} {'p99':>8} {'max':>8}   (ms)")
+            for field in fields:
+                vals = [f["phases"][field] for f in frames if f.get("phases")]
+                if vals:
+                    out.append(
+                        f"  {field:<10} {pct(vals,0.50):>8.2f} {pct(vals,0.95):>8.2f} "
+                        f"{pct(vals,0.99):>8.2f} {max(vals):>8.2f}"
+                    )
+            # Derived: sum of phases vs durationMs (reveals hidden time).
+            sum_ps = [
+                sum(f["phases"][k] for k in fields)
+                for f in frames if f.get("phases")
+            ]
+            if sum_ps:
+                dur_match = [f["durationMs"] for f in frames if f.get("phases")]
+                deltas = [d - s for d, s in zip(dur_match, sum_ps)]
+                out.append(
+                    f"  {'dur-Σphases':<10} {pct(deltas,0.50):>8.2f} {pct(deltas,0.95):>8.2f} "
+                    f"{pct(deltas,0.99):>8.2f} {max(deltas):>8.2f}   (unaccounted-for time)"
+                )
+
+            # Yoga counters
+            visited = [f["phases"]["yogaVisited"] for f in frames if f.get("phases")]
+            measured = [f["phases"]["yogaMeasured"] for f in frames if f.get("phases")]
+            cache_hits = [f["phases"]["yogaCacheHits"] for f in frames if f.get("phases")]
+            live = [f["phases"]["yogaLive"] for f in frames if f.get("phases")]
+            out.append("")
+            out.append("  Yoga counters (per frame):")
+            for name, vals in (
+                ("visited", visited),
+                ("measured", measured),
+                ("cacheHits", cache_hits),
+                ("live", live),
+            ):
+                if vals:
+                    out.append(f"    {name:<11} p50={pct(vals,0.5):.0f}  p99={pct(vals,0.99):.0f}  max={max(vals)}")
+
+            # Patch counts — proxy for "how much changed each frame"
+            patches = [f["phases"]["patches"] for f in frames if f.get("phases")]
+            if patches:
+                out.append(
+                    f"  patches     p50={pct(patches,0.5):.0f}  p99={pct(patches,0.99):.0f}  "
+                    f"max={max(patches)}  total={sum(patches)}"
+                )
+            optimized = [
+                f["phases"].get("optimizedPatches", 0)
+                for f in frames if f.get("phases")
+            ]
+            if any(optimized):
+                out.append(
+                    f"  optimized   p50={pct(optimized,0.5):.0f}  p99={pct(optimized,0.99):.0f}  "
+                    f"max={max(optimized)}  total={sum(optimized)}"
+                    f"  (ratio: {sum(optimized)/max(1,sum(patches)):.2f})"
+                )
+
+            # Write bytes + drain telemetry — the outer-terminal bottleneck gauge.
+            bytes_written = [
+                f["phases"].get("writeBytes", 0)
+                for f in frames if f.get("phases")
+            ]
+            if any(bytes_written):
+                total_b = sum(bytes_written)
+                kb = total_b / 1024
+                out.append(
+                    f"  writeBytes  p50={pct(bytes_written,0.5):.0f}B  p99={pct(bytes_written,0.99):.0f}B  "
+                    f"max={max(bytes_written)}B  total={kb:.1f}KB"
+                )
+            drains = [
+                f["phases"].get("prevFrameDrainMs", 0)
+                for f in frames if f.get("phases")
+            ]
+            if any(d > 0 for d in drains):
+                nonzero = [d for d in drains if d > 0]
+                out.append(
+                    f"  drainMs     p50={pct(nonzero,0.5):.2f}  p95={pct(nonzero,0.95):.2f}  "
+                    f"p99={pct(nonzero,0.99):.2f}  max={max(nonzero):.2f}   (terminal flush latency)"
+                )
+            backpressure = sum(1 for f in frames if f.get("phases", {}).get("backpressure"))
+            if backpressure:
+                out.append(
+                    f"  backpressure: {backpressure}/{len(frames)} frames "
+                    f"({100*backpressure/len(frames):.0f}%)   (Node stdout buffer full — terminal slow)"
+                )
+
+        # Flickers
+        flicker_frames = [f for f in frames if f.get("flickers")]
+        if flicker_frames:
+            out.append("")
+            out.append(f"  ⚠ flickers detected in {len(flicker_frames)} frames")
+            reasons: dict[str, int] = {}
+            for f in flicker_frames:
+                for fl in f["flickers"]:
+                    reasons[fl["reason"]] = reasons.get(fl["reason"], 0) + 1
+            for reason, n in sorted(reasons.items(), key=lambda kv: -kv[1]):
+                out.append(f"    {reason}: {n}")
+
+    return "\n".join(out)
+
+
+def key_metrics(data: dict[str, Any]) -> dict[str, float]:
+    """Flatten the report into a dict of scalar metrics for A/B diffing."""
+    metrics: dict[str, float] = {}
+    frames = data.get("frame") or []
+    react = data.get("react") or []
+
+    if frames:
+        durs = [f["durationMs"] for f in frames]
+        metrics["frames"] = len(frames)
+        metrics["dur_p50"] = pct(durs, 0.50)
+        metrics["dur_p95"] = pct(durs, 0.95)
+        metrics["dur_p99"] = pct(durs, 0.99)
+        metrics["dur_max"] = max(durs)
+
+        ts = sorted(f["ts"] for f in frames)
+        if len(ts) >= 2:
+            elapsed = (ts[-1] - ts[0]) / 1000.0
+            metrics["fps_throughput"] = len(frames) / elapsed if elapsed > 0 else 0.0
+            # Interframe gaps distribution — complementary view to throughput:
+            gaps = [ts[i] - ts[i - 1] for i in range(1, len(ts))]
+            if gaps:
+                metrics["gap_p50_ms"] = pct(gaps, 0.50)
+                metrics["gap_p99_ms"] = pct(gaps, 0.99)
+                metrics["gaps_under_16ms"] = sum(1 for g in gaps if g < 16)
+                metrics["gaps_over_200ms"] = sum(1 for g in gaps if g >= 200)
+
+        for phase in ("renderer", "yoga", "diff", "write"):
+            vals = [f["phases"][phase] for f in frames if f.get("phases")]
+            if vals:
+                metrics[f"{phase}_p99"] = pct(vals, 0.99)
+                metrics[f"{phase}_max"] = max(vals)
+
+        patches = [f["phases"]["patches"] for f in frames if f.get("phases")]
+        if patches:
+            metrics["patches_total"] = sum(patches)
+            metrics["patches_p99"] = pct(patches, 0.99)
+
+        optimized = [
+            f["phases"].get("optimizedPatches", 0) for f in frames if f.get("phases")
+        ]
+        if any(optimized):
+            metrics["optimized_total"] = sum(optimized)
+
+        bytes_list = [
+            f["phases"].get("writeBytes", 0) for f in frames if f.get("phases")
+        ]
+        if any(bytes_list):
+            metrics["writeBytes_total"] = sum(bytes_list)
+
+        drains = [
+            f["phases"].get("prevFrameDrainMs", 0)
+            for f in frames if f.get("phases")
+        ]
+        drain_nonzero = [d for d in drains if d > 0]
+        if drain_nonzero:
+            metrics["drain_p99"] = pct(drain_nonzero, 0.99)
+            metrics["drain_max"] = max(drain_nonzero)
+
+        bp = sum(1 for f in frames if f.get("phases", {}).get("backpressure"))
+        metrics["backpressure_frames"] = bp
+
+    if react:
+        for pid in {e["id"] for e in react}:
+            ms = [e["actualMs"] for e in react if e["id"] == pid]
+            metrics[f"react_{pid}_p99"] = pct(ms, 0.99)
+            metrics[f"react_{pid}_max"] = max(ms)
+
+    return metrics
+
+
+def format_diff(before: dict[str, float], after: dict[str, float]) -> str:
+    """Render a side-by-side A/B comparison table."""
+    keys = sorted(set(before) | set(after))
+    lines = [f"{'metric':<28} {'before':>12} {'after':>12} {'delta':>12}  {'%':>6}"]
+    lines.append("─" * 76)
+    for k in keys:
+        b = before.get(k, 0.0)
+        a = after.get(k, 0.0)
+        d = a - b
+        pct_change = ((a / b) - 1) * 100 if b not in (0, 0.0) else float("inf") if a else 0
+
+        # Flag the direction of change. For p50 / p95 / p99 / _max / _total /
+        # gaps_over / backpressure / drain, LOWER is better.  For fps_ /
+        # gaps_under, HIGHER is better.
+        lower_is_better = any(
+            token in k
+            for token in (
+                "p50",
+                "p95",
+                "p99",
+                "_max",
+                "_total",
+                "gaps_over",
+                "backpressure",
+                "drain",
+            )
+        )
+        higher_is_better = "fps_" in k or "gaps_under" in k
+        mark = ""
+        if d and not (lower_is_better or higher_is_better):
+            mark = ""
+        elif d < 0 and lower_is_better:
+            mark = "↓"
+        elif d > 0 and higher_is_better:
+            mark = "↑"
+        elif d > 0 and lower_is_better:
+            mark = "↑"  # regression
+        elif d < 0 and higher_is_better:
+            mark = "↓"  # regression
+
+        pct_str = "—" if pct_change == float("inf") else f"{pct_change:+6.1f}%"
+        lines.append(
+            f"{k:<28} {b:>12.2f} {a:>12.2f} {d:>+12.2f}  {pct_str} {mark}"
+        )
+
+    return "\n".join(lines)
+
+
+def run_once(args: argparse.Namespace) -> dict[str, Any]:
+    tui_dir = Path(args.tui_dir).resolve()
+    entry = tui_dir / "dist" / "entry.js"
+    if not entry.exists():
+        sys.exit(f"{entry} missing — run `npm run build` in {tui_dir} first")
+
+    sid = args.session or pick_longest_session(DEFAULT_STATE_DB)
+    print(f"• session: {sid}")
+    print(f"• hold: {args.hold} x {args.rate}Hz for {args.seconds}s after {args.warmup}s warmup")
+    print(f"• terminal: {args.cols}x{args.rows}")
+
+    log = Path(args.log)
+    if not args.keep_log and log.exists():
+        log.unlink()
+
+    since_ms = int(time.time() * 1000)
+
+    env = os.environ.copy()
+    env["HERMES_DEV_PERF"] = "1"
+    env["HERMES_DEV_PERF_MS"] = str(args.threshold_ms)
+    env["HERMES_DEV_PERF_LOG"] = str(log)
+    env["HERMES_TUI_RESUME"] = sid
+    env["COLUMNS"] = str(args.cols)
+    env["LINES"] = str(args.rows)
+    env["TERM"] = env.get("TERM", "xterm-256color")
+
+    # Pass through extra flags the TUI wrapper recognizes (e.g. --no-fullscreen).
+    # Stored on args as `extra_flags` list.
+    node = os.environ.get("HERMES_PERF_NODE", "node")
+    node_args = [node, str(entry), *getattr(args, "extra_flags", [])]
+
+    pid, fd = pty.fork()
+    if pid == 0:
+        os.execvpe(node, node_args, env)
+
+    try:
+        import fcntl, struct, termios
+        winsize = struct.pack("HHHH", args.rows, args.cols, 0, 0)
+        fcntl.ioctl(fd, termios.TIOCSWINSZ, winsize)
+
+        print(f"• pid: {pid}  fd: {fd}")
+        print(f"• warmup {args.warmup}s (drain startup output)…")
+        drain(fd, args.warmup)
+
+        print(f"• holding {args.hold}…")
+        sent = hold_key(fd, KEYS[args.hold], args.seconds, args.rate)
+        print(f"  sent {sent} keystrokes")
+
+        drain(fd, 0.5)
+    finally:
+        try:
+            os.kill(pid, signal.SIGTERM)
+            for _ in range(10):
+                pid_done, _ = os.waitpid(pid, os.WNOHANG)
+                if pid_done == pid:
+                    break
+                time.sleep(0.1)
+            else:
+                os.kill(pid, signal.SIGKILL)
+                os.waitpid(pid, 0)
+        except (ProcessLookupError, ChildProcessError):
+            pass
+        try:
+            os.close(fd)
+        except OSError:
+            pass
+
+    time.sleep(0.2)
+    return summarize(log, since_ms)
+
+
+def main() -> int:
+    p = argparse.ArgumentParser()
+    p.add_argument("--session", help="session id to resume (default: longest in db)")
+    p.add_argument("--hold", default="page_up", choices=sorted(KEYS.keys()), help="key to hold")
+    p.add_argument("--seconds", type=float, default=8.0, help="how long to hold the key")
+    p.add_argument("--rate", type=int, default=30, help="keystrokes per second")
+    p.add_argument("--warmup", type=float, default=3.0, help="seconds to wait after launch before input")
+    p.add_argument("--threshold-ms", type=float, default=0.0, help="HERMES_DEV_PERF_MS (0 = capture all)")
+    p.add_argument("--cols", type=int, default=120)
+    p.add_argument("--rows", type=int, default=40)
+    p.add_argument("--keep-log", action="store_true", help="don't wipe perf.log before run")
+    p.add_argument("--tui-dir", default=str(DEFAULT_TUI_DIR))
+    p.add_argument("--log", default=str(DEFAULT_LOG))
+    p.add_argument("--save", metavar="LABEL",
+                   help="save the final metrics as /tmp/perf-<LABEL>.json for later --compare")
+    p.add_argument("--compare", metavar="LABEL",
+                   help="diff against /tmp/perf-<LABEL>.json after running")
+    p.add_argument("--loop", action="store_true",
+                   help="watch for source changes, rebuild, rerun, and diff vs previous run")
+    p.add_argument("--extra-flag", dest="extra_flags", action="append", default=[],
+                   help="pass through to node dist/entry.js (repeatable)")
+    args = p.parse_args()
+
+    if args.loop:
+        return loop_mode(args)
+
+    # Single-shot path.
+    data = run_once(args)
+    print()
+    print(format_report(data))
+
+    metrics = key_metrics(data)
+
+    if args.save:
+        path = Path(f"/tmp/perf-{args.save}.json")
+        path.write_text(json.dumps(metrics, indent=2))
+        print(f"\n• saved: {path}")
+
+    if args.compare:
+        path = Path(f"/tmp/perf-{args.compare}.json")
+        if not path.exists():
+            print(f"\n⚠ no baseline at {path} — run with --save {args.compare} first")
+        else:
+            before = json.loads(path.read_text())
+            print(f"\n═══ A/B diff vs /tmp/perf-{args.compare}.json ═══")
+            print(format_diff(before, metrics))
+
+    if not data["react"] and not data["frame"]:
+        return 2
+    return 0
+
+
+def loop_mode(args: argparse.Namespace) -> int:
+    """Watch source files, rebuild, rerun, print A/B diff against previous run.
+
+    Keeps a rolling 'previous run' baseline in memory so each iteration
+    reports delta vs the last one — visibility into whether the last
+    edit moved the needle.  Press Ctrl+C to stop.
+    """
+    import subprocess
+
+    tui_dir = Path(args.tui_dir).resolve()
+    src_root = tui_dir / "src"
+    pkg_root = tui_dir / "packages" / "hermes-ink" / "src"
+
+    def collect_mtimes() -> dict[str, float]:
+        mtimes: dict[str, float] = {}
+        for root in (src_root, pkg_root):
+            if not root.exists():
+                continue
+            for path in root.rglob("*"):
+                if path.suffix in {".ts", ".tsx"} and "__tests__" not in str(path):
+                    try:
+                        mtimes[str(path)] = path.stat().st_mtime
+                    except OSError:
+                        pass
+        return mtimes
+
+    previous_metrics: dict[str, float] | None = None
+    previous_mtimes = collect_mtimes()
+    iteration = 0
+
+    print(f"• loop mode — watching {src_root} + {pkg_root} for *.ts(x) changes")
+    print("• edit any TS file, the harness rebuilds + reruns automatically")
+    print("• Ctrl+C to stop\n")
+
+    try:
+        while True:
+            iteration += 1
+            print(f"\n{'═' * 76}")
+            print(f"Iteration {iteration}  @ {time.strftime('%H:%M:%S')}")
+            print("═" * 76)
+
+            if iteration > 1:
+                print("• rebuilding…")
+                result = subprocess.run(
+                    ["npm", "run", "build"],
+                    cwd=tui_dir,
+                    capture_output=True,
+                    text=True,
+                )
+                if result.returncode != 0:
+                    print("✗ build failed:")
+                    print(result.stdout[-2000:])
+                    print(result.stderr[-2000:])
+                    print("\n• waiting for source changes to retry…")
+                    previous_mtimes = wait_for_change(previous_mtimes, collect_mtimes)
+                    continue
+                print("✓ build ok")
+
+            data = run_once(args)
+            metrics = key_metrics(data)
+
+            print()
+            print(format_report(data))
+
+            if previous_metrics is not None:
+                print(f"\n═══ A/B diff vs iteration {iteration - 1} ═══")
+                print(format_diff(previous_metrics, metrics))
+
+            previous_metrics = metrics
+
+            print("\n• waiting for source changes…")
+            previous_mtimes = wait_for_change(previous_mtimes, collect_mtimes)
+    except KeyboardInterrupt:
+        print("\n• loop stopped")
+        return 0
+
+
+def wait_for_change(prev: dict[str, float], collect) -> dict[str, float]:
+    """Poll every 1s until a watched file's mtime changes. Debounced 500ms."""
+    while True:
+        time.sleep(1)
+        current = collect()
+
+        changed = [
+            path for path, mtime in current.items() if prev.get(path) != mtime
+        ]
+
+        if changed:
+            print(f"  ↻ {len(changed)} file(s) changed:")
+            for path in changed[:5]:
+                print(f"    {path}")
+            # Debounce — editor save bursts can take ~500ms to settle
+            time.sleep(0.5)
+            return collect()
+
+
+if __name__ == "__main__":
+    sys.exit(main())