fix(dashboard): persist chat tab state across tab switches

The dashboard's Chat tab (hermes dashboard --tui) lost its session whenever the user navigated to another tab and came back. React Router unmounted ChatPage on path change, which ran the cleanup function, closed the PTY WebSocket, and terminated the underlying TUI child - so the next mount generated a fresh channel id, spawned a new PTY, and started a brand-new conversation. Rather than rebuild the destroyed state (session id capture + resume via HERMES_TUI_RESUME would reload history from disk but drop in-flight tool state, scrollback, and picker position), keep the component tree alive. * Pull ChatPage out of Routes into a sibling always-mounted host that toggles visibility via display:none keyed off the current route. A tiny ChatRouteSink still claims /chat so the catch-all redirect does not fire. * xterm instance, WebSocket, PTY child, and TUI/agent state all survive; returning to /chat shows the exact conversation the user left. * Respect plugin `/chat` overrides: if a plugin manifest declares `tab.override: "/chat"`, the Routes tree already swaps the element for <PluginPage /> — we additionally suppress the persistent host so the two don't paint on top of each other. Preserves the pre-persistence contract that a plugin owning /chat replaces the built-in chat UI entirely. * Wait for usePlugins() to finish loading before mounting the persistent host. Manifests arrive asynchronously from /api/dashboard/plugins, so without the `!pluginsLoading` gate the host would mount with manifests=[], spawn a PTY, and then unmount mid-session when the manifest list resolves and reveals a /chat override. Typical delay is <50ms; worst case is the 2s plugin- registration safety timeout. Cheaper than killing someone's conversation underneath them. * Gate page-header slot (`setEnd`), the mobile sheet's portalled render, and body-scroll lock on a new `isActive` prop so the hidden ChatPage doesn't fight the active page for shared state. The scroll-lock effect keys on the *derived* `mobilePanelOpen` (which is `isActive && mobilePanelOpenRaw`) rather than the raw state — that way tab-switch flips the dep false, fires the cleanup, and releases `document.body.style.overflow`. Keying on the raw state would leave body.overflow="hidden" stuck on /sessions and every other tab until the user navigated back to /chat and explicitly closed the sheet. * When isActive flips false to true, force a double-rAF fit: display:none collapses the host box and ResizeObserver does not fire on display changes, so xterm would otherwise stay at a stale or 1x1 grid. Also early-return from syncTerminalMetrics when the host has zero area, since fit() on a zero-sized element produces a 1x1 terminal. * Focus handling on tab return: only steal focus into the terminal if focus wasn't already parked somewhere inside ChatPage (e.g. the sidebar model picker, a tool-call entry). Yanking focus away from whatever the user last clicked is surprising and a screen-reader foot-gun; the typical "first activation" case still focuses the terminal because document.activeElement is <body> at that point. Trade-off worth flagging, deliberately not mitigated in this change: while hidden, ChatPage still holds a PTY child + WebSocket + xterm instance for the dashboard's full lifetime. The WS keeps delivering bytes and xterm keeps parsing them into a display:none host (cheap — no paint work, but not free). Reasonable costs to pay for the session preservation; if they become a problem we can pause `term.write` when !isActive or idle-disconnect after N minutes hidden. Lint clean on touched files. tsc -b && vite build pass.
test(gateway): cover /compress summary-failure warning path
2026-04-28 02:40:25 -04:00 · 2026-04-27 19:18:13 -07:00 · 2026-04-27 19:18:13 -07:00 · 2026-04-27 19:18:13 -07:00 · 2026-04-27 19:18:13 -07:00 · 2026-04-27 19:18:13 -07:00
104 changed files with 4987 additions and 385 deletions
@@ -69,3 +69,4 @@ mini-swe-agent/
 .nix-stamps/
 result
 website/static/api/skills-index.json
+models-dev-upstream/
@@ -82,6 +82,8 @@ _PROVIDER_ALIASES = {
    "moonshot": "kimi-coding",
    "kimi-cn": "kimi-coding-cn",
    "moonshot-cn": "kimi-coding-cn",
+    "gmi-cloud": "gmi",
+    "gmicloud": "gmi",
    "minimax-china": "minimax-cn",
    "minimax_cn": "minimax-cn",
    "claude": "anthropic",
@@ -155,6 +157,7 @@ _API_KEY_PROVIDER_AUX_MODELS: Dict[str, str] = {
    "kimi-coding": "kimi-k2-turbo-preview",
    "stepfun": "step-3.5-flash",
    "kimi-coding-cn": "kimi-k2-turbo-preview",
+    "gmi": "google/gemini-3.1-flash-lite-preview",
    "minimax": "MiniMax-M2.7",
    "minimax-cn": "MiniMax-M2.7",
    "anthropic": "claude-haiku-4-5-20251001",
@@ -2558,12 +2561,19 @@ def _is_openrouter_client(client: Any) -> bool:
    return False


+def _cached_client_accepts_slash_models(client: Any, cached_default: Optional[str]) -> bool:
+    """Best-effort check for cached clients that accept ``vendor/model`` IDs."""
+    if _is_openrouter_client(client):
+        return True
+    return bool(cached_default and "/" in cached_default)
+
+
 def _compat_model(client: Any, model: Optional[str], cached_default: Optional[str]) -> Optional[str]:
-    """Drop OpenRouter-format model slugs (with '/') for non-OpenRouter clients.
+    """Keep slash-bearing model IDs only for cached clients that support them.

    Mirrors the guard in resolve_provider_client() which is skipped on cache hits.
    """
-    if model and "/" in model and not _is_openrouter_client(client):
+    if model and "/" in model and not _cached_client_accepts_slash_models(client, cached_default):
        return cached_default
    return model or cached_default

@@ -338,6 +338,8 @@ class ContextCompressor(ContextEngine):
        self._context_probe_persistable = False
        self._previous_summary = None
        self._last_summary_error = None
+        self._last_summary_dropped_count = 0
+        self._last_summary_fallback_used = False
        self._last_compression_savings_pct = 100.0
        self._ineffective_compression_count = 0

@@ -441,6 +443,11 @@ class ContextCompressor(ContextEngine):
        self._ineffective_compression_count: int = 0
        self._summary_failure_cooldown_until: float = 0.0
        self._last_summary_error: Optional[str] = None
+        # When summary generation fails and a static fallback is inserted,
+        # record how many turns were unrecoverably dropped so callers
+        # (gateway hygiene, /compress) can surface a visible warning.
+        self._last_summary_dropped_count: int = 0
+        self._last_summary_fallback_used: bool = False

    def update_from_response(self, usage: Dict[str, Any]):
        """Update tracked token usage from API response."""
@@ -1196,6 +1203,11 @@ The user has requested that this compaction PRIORITISE preserving all informatio
                related to this topic and be more aggressive about compressing
                everything else.  Inspired by Claude Code's ``/compact``.
        """
+        # Reset per-call summary failure state — callers inspect these fields
+        # after compress() returns to decide whether to surface a warning.
+        self._last_summary_dropped_count = 0
+        self._last_summary_fallback_used = False
+        self._last_summary_error = None
        n_messages = len(messages)
        # Only need head + 3 tail messages minimum (token budget decides the real tail size)
        _min_for_compress = self.protect_first_n + 3 + 1
@@ -1274,11 +1286,13 @@ The user has requested that this compaction PRIORITISE preserving all informatio
            if not self.quiet_mode:
                logger.warning("Summary generation failed — inserting static fallback context marker")
            n_dropped = compress_end - compress_start
+            self._last_summary_dropped_count = n_dropped
+            self._last_summary_fallback_used = True
            summary = (
                f"{SUMMARY_PREFIX}\n"
-                f"Summary generation was unavailable. {n_dropped} conversation turns were "
+                f"Summary generation was unavailable. {n_dropped} message(s) were "
                f"removed to free context space but could not be summarized. The removed "
-                f"turns contained earlier work in this session. Continue based on the "
+                f"messages contained earlier work in this session. Continue based on the "
                f"recent messages below and the current state of any files or resources."
            )

@@ -63,15 +63,124 @@ def sanitize_context(text: str) -> str:
    return text


-def build_memory_context_block(raw_context: str) -> str:
-    """Wrap prefetched memory in a fenced block with system note.
+class StreamingContextScrubber:
+    """Stateful scrubber for streaming text that may contain split memory-context spans.

-    The fence prevents the model from treating recalled context as user
-    discourse.  Injected at API-call time only — never persisted.
+    The one-shot ``sanitize_context`` regex cannot survive chunk boundaries:
+    a ``<memory-context>`` opened in one delta and closed in a later delta
+    leaks its payload to the UI because the non-greedy block regex needs
+    both tags in one string.  This scrubber runs a small state machine
+    across deltas, holding back partial-tag tails and discarding
+    everything inside a span (including the system-note line).
+
+    Usage::
+
+        scrubber = StreamingContextScrubber()
+        for delta in stream:
+            visible = scrubber.feed(delta)
+            if visible:
+                emit(visible)
+        trailing = scrubber.flush()  # at end of stream
+        if trailing:
+            emit(trailing)
+
+    The scrubber is re-entrant per agent instance.  Callers building new
+    top-level responses (new turn) should create a fresh scrubber or call
+    ``reset()``.
    """
+
+    _OPEN_TAG = "<memory-context>"
+    _CLOSE_TAG = "</memory-context>"
+
+    def __init__(self) -> None:
+        self._in_span: bool = False
+        self._buf: str = ""
+
+    def reset(self) -> None:
+        self._in_span = False
+        self._buf = ""
+
+    def feed(self, text: str) -> str:
+        """Return the visible portion of ``text`` after scrubbing.
+
+        Any trailing fragment that could be the start of an open/close tag
+        is held back in the internal buffer and surfaced on the next
+        ``feed()`` call or discarded/emitted by ``flush()``.
+        """
+        if not text:
+            return ""
+        buf = self._buf + text
+        self._buf = ""
+        out: list[str] = []
+
+        while buf:
+            if self._in_span:
+                idx = buf.lower().find(self._CLOSE_TAG)
+                if idx == -1:
+                    # Hold back a potential partial close tag; drop the rest
+                    held = self._max_partial_suffix(buf, self._CLOSE_TAG)
+                    self._buf = buf[-held:] if held else ""
+                    return "".join(out)
+                # Found close — skip span content + tag, continue
+                buf = buf[idx + len(self._CLOSE_TAG):]
+                self._in_span = False
+            else:
+                idx = buf.lower().find(self._OPEN_TAG)
+                if idx == -1:
+                    # No open tag — hold back a potential partial open tag
+                    held = self._max_partial_suffix(buf, self._OPEN_TAG)
+                    if held:
+                        out.append(buf[:-held])
+                        self._buf = buf[-held:]
+                    else:
+                        out.append(buf)
+                    return "".join(out)
+                # Emit text before the tag, enter span
+                if idx > 0:
+                    out.append(buf[:idx])
+                buf = buf[idx + len(self._OPEN_TAG):]
+                self._in_span = True
+
+        return "".join(out)
+
+    def flush(self) -> str:
+        """Emit any held-back buffer at end-of-stream.
+
+        If we're still inside an unterminated span the remaining content is
+        discarded (safer: leaking partial memory context is worse than a
+        truncated answer).  Otherwise the held-back partial-tag tail is
+        emitted verbatim (it turned out not to be a real tag).
+        """
+        if self._in_span:
+            self._buf = ""
+            self._in_span = False
+            return ""
+        tail = self._buf
+        self._buf = ""
+        return tail
+
+    @staticmethod
+    def _max_partial_suffix(buf: str, tag: str) -> int:
+        """Return the length of the longest buf-suffix that is a tag-prefix.
+
+        Case-insensitive.  Returns 0 if no suffix could start the tag.
+        """
+        tag_lower = tag.lower()
+        buf_lower = buf.lower()
+        max_check = min(len(buf_lower), len(tag_lower) - 1)
+        for i in range(max_check, 0, -1):
+            if tag_lower.startswith(buf_lower[-i:]):
+                return i
+        return 0
+
+
+def build_memory_context_block(raw_context: str) -> str:
+    """Wrap prefetched memory in a fenced block with system note."""
    if not raw_context or not raw_context.strip():
        return ""
    clean = sanitize_context(raw_context)
+    if clean != raw_context:
+        logger.warning("memory provider returned pre-wrapped context; stripped")
    return (
        "<memory-context>\n"
        "[System note: The following is recalled memory context, "
@@ -51,6 +51,7 @@ _PROVIDER_PREFIXES: frozenset[str] = frozenset({
    "qwen-oauth",
    "xiaomi",
    "arcee",
+    "gmi",
    "custom", "local",
    # Common aliases
    "google", "google-gemini", "google-ai-studio",
@@ -60,6 +61,7 @@ _PROVIDER_PREFIXES: frozenset[str] = frozenset({
    "stepfun", "opencode", "zen", "go", "vercel", "kilo", "dashscope", "aliyun", "qwen",
    "mimo", "xiaomi-mimo",
    "arcee-ai", "arceeai",
+    "gmi-cloud", "gmicloud",
    "xai", "x-ai", "x.ai", "grok",
    "nvidia", "nim", "nvidia-nim", "nemotron",
    "qwen-portal",
@@ -307,6 +309,7 @@ _URL_TO_PROVIDER: Dict[str, str] = {
    "integrate.api.nvidia.com": "nvidia",
    "api.xiaomimimo.com": "xiaomi",
    "xiaomimimo.com": "xiaomi",
+    "api.gmi-serving.com": "gmi",
    "ollama.com": "ollama-cloud",
 }

@@ -702,6 +705,29 @@ def fetch_endpoint_model_metadata(
    return {}


+def _resolve_endpoint_context_length(
+    model: str,
+    base_url: str,
+    api_key: str = "",
+) -> Optional[int]:
+    """Resolve context length from an endpoint's live ``/models`` metadata."""
+    endpoint_metadata = fetch_endpoint_model_metadata(base_url, api_key=api_key)
+    matched = endpoint_metadata.get(model)
+    if not matched:
+        if len(endpoint_metadata) == 1:
+            matched = next(iter(endpoint_metadata.values()))
+        else:
+            for key, entry in endpoint_metadata.items():
+                if model in key or key in model:
+                    matched = entry
+                    break
+    if matched:
+        context_length = matched.get("context_length")
+        if isinstance(context_length, int):
+            return context_length
+    return None
+
+
 def _get_context_cache_path() -> Path:
    """Return path to the persistent context length cache file."""
    from hermes_constants import get_hermes_home
@@ -1295,22 +1321,9 @@ def get_model_context_length(
    # returns 128k) instead of the model's full context (400k).  models.dev
    # has the correct per-provider values and is checked at step 5+.
    if _is_custom_endpoint(base_url) and not _is_known_provider_base_url(base_url):
-        endpoint_metadata = fetch_endpoint_model_metadata(base_url, api_key=api_key)
-        matched = endpoint_metadata.get(model)
-        if not matched:
-            # Single-model servers: if only one model is loaded, use it
-            if len(endpoint_metadata) == 1:
-                matched = next(iter(endpoint_metadata.values()))
-            else:
-                # Fuzzy match: substring in either direction
-                for key, entry in endpoint_metadata.items():
-                    if model in key or key in model:
-                        matched = entry
-                        break
-        if matched:
-            context_length = matched.get("context_length")
-            if isinstance(context_length, int):
-                return context_length
+        context_length = _resolve_endpoint_context_length(model, base_url, api_key=api_key)
+        if context_length is not None:
+            return context_length
        if not _is_known_provider_base_url(base_url):
            # 3. Try querying local server directly
            if is_local_endpoint(base_url):
@@ -1374,6 +1387,12 @@ def get_model_context_length(
            if base_url:
                save_context_length(model, base_url, codex_ctx)
            return codex_ctx
+    if effective_provider == "gmi" and base_url:
+        # GMI exposes authoritative context_length via /models, but it is not
+        # in models.dev yet. Preserve that higher-fidelity endpoint lookup.
+        ctx = _resolve_endpoint_context_length(model, base_url, api_key=api_key)
+        if ctx is not None:
+            return ctx
    if effective_provider:
        from agent.models_dev import lookup_models_dev_context
        ctx = lookup_models_dev_context(effective_provider, model)
@@ -6000,6 +6000,7 @@ class HermesCLI:
            platform_status = {
                Platform.TELEGRAM: ("Telegram", "TELEGRAM_BOT_TOKEN"),
                Platform.DISCORD: ("Discord", "DISCORD_BOT_TOKEN"),
+                Platform.SLACK: ("Slack", "SLACK_BOT_TOKEN"),
                Platform.WHATSAPP: ("WhatsApp", "WHATSAPP_ENABLED"),
            }
            
@@ -36,6 +36,7 @@

      imports = [
        ./nix/packages.nix
+        ./nix/overlays.nix
        ./nix/nixosModules.nix
        ./nix/checks.nix
        ./nix/devShell.nix
@@ -566,6 +566,8 @@ def load_gateway_config() -> GatewayConfig:
                        existing = {}
                    # Deep-merge extra dicts so gateway.json defaults survive
                    merged_extra = {**existing.get("extra", {}), **plat_block.get("extra", {})}
+                    if plat_name == Platform.SLACK.value and "enabled" in plat_block:
+                        merged_extra["_enabled_explicit"] = True
                    merged = {**existing, **plat_block}
                    if merged_extra:
                        merged["extra"] = merged_extra
@@ -610,16 +612,21 @@ def load_gateway_config() -> GatewayConfig:
                        bridged["channel_prompts"] = {str(k): v for k, v in channel_prompts.items()}
                    else:
                        bridged["channel_prompts"] = channel_prompts
-                if not bridged:
+                enabled_was_explicit = "enabled" in platform_cfg
+                if not bridged and not enabled_was_explicit:
                    continue
                plat_data = platforms_data.setdefault(plat.value, {})
                if not isinstance(plat_data, dict):
                    plat_data = {}
                    platforms_data[plat.value] = plat_data
+                if enabled_was_explicit:
+                    plat_data["enabled"] = platform_cfg["enabled"]
                extra = plat_data.setdefault("extra", {})
                if not isinstance(extra, dict):
                    extra = {}
                    plat_data["extra"] = extra
+                if plat == Platform.SLACK and enabled_was_explicit:
+                    extra["_enabled_explicit"] = True
                extra.update(bridged)

            # Slack settings → env vars (env vars take precedence)
@@ -941,6 +948,14 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
            # No yaml config for Slack — env-only setup, enable it
            config.platforms[Platform.SLACK] = PlatformConfig()
            config.platforms[Platform.SLACK].enabled = True
+        else:
+            slack_config = config.platforms[Platform.SLACK]
+            enabled_was_explicit = bool(slack_config.extra.pop("_enabled_explicit", False))
+            if not slack_config.enabled and not enabled_was_explicit:
+                # Top-level Slack settings such as channel prompts should not
+                # turn an env-token setup into a disabled platform. Only an
+                # explicit slack.enabled/platforms.slack.enabled false should.
+                slack_config.enabled = True
        # If yaml config exists, respect its enabled flag (don't override
        # explicit enabled: false). Token is still stored so skills that
        # send Slack messages can use it without activating the gateway adapter.
@@ -1702,13 +1702,41 @@ class BasePlatformAdapter(ABC):
        the agent is waiting for dangerous-command approval).  This is critical
        for Slack's Assistant API where ``assistant_threads_setStatus`` disables
        the compose box — pausing lets the user type ``/approve`` or ``/deny``.
+
+        Each ``send_typing`` call is bounded by a ~1.5s timeout so a slow
+        network round-trip can't stall the refresh cadence.  Telegram- and
+        Discord-side typing expire after ~5s; if any individual send_typing
+        takes longer than the refresh interval, the bubble would die and
+        stay dead until that call returns.  Abandoning the slow call lets
+        the next tick fire a fresh send_typing on schedule — as long as
+        one of them succeeds within the 5s platform-side window, the bubble
+        stays visible across provider stalls / upstream API timeouts.
        """
+        # Bound each send_typing round-trip so the refresh cadence isn't
+        # gated on network health.  Must stay below ``interval`` so a slow
+        # call gets abandoned before the next scheduled tick.
+        _send_typing_timeout = max(0.25, min(1.5, interval - 0.25))
        try:
            while True:
                if stop_event is not None and stop_event.is_set():
                    return
                if chat_id not in self._typing_paused:
-                    await self.send_typing(chat_id, metadata=metadata)
+                    try:
+                        await asyncio.wait_for(
+                            self.send_typing(chat_id, metadata=metadata),
+                            timeout=_send_typing_timeout,
+                        )
+                    except asyncio.TimeoutError:
+                        # Slow network — abandon this tick, keep the loop
+                        # on schedule so the next send_typing fires fresh.
+                        pass
+                    except asyncio.CancelledError:
+                        raise
+                    except Exception as typing_err:
+                        logger.debug(
+                            "[%s] send_typing error (non-fatal): %s",
+                            self.name, typing_err,
+                        )
                if stop_event is None:
                    await asyncio.sleep(interval)
                    continue
@@ -4800,6 +4800,34 @@ class GatewayRunner:
                                            "compression",
                                            f"{_new_tokens:,}",
                                        )
+
+                                    # If summary generation failed, the
+                                    # compressor inserted a static fallback
+                                    # placeholder and the dropped turns are
+                                    # gone for good.  Surface a visible
+                                    # warning to the gateway user — agent.log
+                                    # alone is invisible on TG/Discord/etc.
+                                    _comp = getattr(_hyg_agent, "context_compressor", None)
+                                    if _comp is not None and getattr(_comp, "_last_summary_fallback_used", False):
+                                        _dropped = getattr(_comp, "_last_summary_dropped_count", 0)
+                                        _err = getattr(_comp, "_last_summary_error", None) or "unknown error"
+                                        _warn_msg = (
+                                            "⚠️ Context compression summary failed "
+                                            f"({_err}). {_dropped} historical message(s) "
+                                            "were removed and replaced with a placeholder. "
+                                            "Earlier context is no longer recoverable. "
+                                            "Consider /reset for a clean session, or check "
+                                            "your auxiliary.compression model configuration."
+                                        )
+                                        try:
+                                            _adapter = self.adapters.get(source.platform)
+                                            if _adapter and source.chat_id:
+                                                await _adapter.send(source.chat_id, _warn_msg, metadata=_hyg_meta)
+                                        except Exception as _werr:
+                                            logger.warning(
+                                                "Failed to deliver compression-failure warning to user: %s",
+                                                _werr,
+                                            )
                                finally:
                                    self._cleanup_agent_resources(_hyg_agent)

@@ -7343,6 +7371,12 @@ class GatewayRunner:
                    approx_tokens,
                    new_tokens,
                )
+                # Detect summary-generation failure so we can surface a
+                # visible warning to the user even on the manual /compress
+                # path (otherwise the failure is silently logged).
+                _summary_failed = bool(getattr(compressor, "_last_summary_fallback_used", False))
+                _dropped_count = int(getattr(compressor, "_last_summary_dropped_count", 0) or 0)
+                _summary_err = getattr(compressor, "_last_summary_error", None)
            finally:
                self._cleanup_agent_resources(tmp_agent)
            lines = [f"🗜️ {summary['headline']}"]
@@ -7351,6 +7385,13 @@ class GatewayRunner:
            lines.append(summary["token_line"])
            if summary["note"]:
                lines.append(summary["note"])
+            if _summary_failed:
+                lines.append(
+                    f"⚠️ Summary generation failed ({_summary_err or 'unknown error'}). "
+                    f"{_dropped_count} historical message(s) were removed and replaced "
+                    "with a placeholder; earlier context is no longer recoverable. "
+                    "Consider checking your auxiliary.compression model configuration."
+                )
            return "\n".join(lines)
        except Exception as e:
            logger.warning("Manual compress failed: %s", e)
@@ -8483,6 +8524,7 @@ class GatewayRunner:
            The enriched message string with vision descriptions prepended.
        """
        from tools.vision_tools import vision_analyze_tool
+        from agent.memory_manager import sanitize_context

        analysis_prompt = (
            "Describe everything visible in this image in thorough detail. "
@@ -8501,6 +8543,7 @@ class GatewayRunner:
                result = json.loads(result_json)
                if result.get("success"):
                    description = result.get("analysis", "")
+                    description = sanitize_context(description)
                    enriched_parts.append(
                        f"[The user sent an image~ Here's what I can see:\n{description}]\n"
                        f"[If you need a closer look, use vision_analyze with "
@@ -224,6 +224,14 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
        api_key_env_vars=("ARCEEAI_API_KEY",),
        base_url_env_var="ARCEE_BASE_URL",
    ),
+    "gmi": ProviderConfig(
+        id="gmi",
+        name="GMI Cloud",
+        auth_type="api_key",
+        inference_base_url="https://api.gmi-serving.com/v1",
+        api_key_env_vars=("GMI_API_KEY",),
+        base_url_env_var="GMI_BASE_URL",
+    ),
    "minimax": ProviderConfig(
        id="minimax",
        name="MiniMax",
@@ -1120,6 +1128,7 @@ def resolve_provider(
        "kimi-cn": "kimi-coding-cn", "moonshot-cn": "kimi-coding-cn",
        "step": "stepfun", "stepfun-coding-plan": "stepfun",
        "arcee-ai": "arcee", "arceeai": "arcee",
+        "gmi-cloud": "gmi", "gmicloud": "gmi",
        "minimax-china": "minimax-cn", "minimax_cn": "minimax-cn",
        "alibaba_coding": "alibaba-coding-plan", "alibaba-coding": "alibaba-coding-plan",
        "alibaba_coding_plan": "alibaba-coding-plan",
@@ -1254,6 +1254,22 @@ OPTIONAL_ENV_VARS = {
        "category": "provider",
        "advanced": True,
    },
+    "GMI_API_KEY": {
+        "description": "GMI Cloud API key",
+        "prompt": "GMI Cloud API key",
+        "url": "https://www.gmicloud.ai/",
+        "password": True,
+        "category": "provider",
+        "advanced": True,
+    },
+    "GMI_BASE_URL": {
+        "description": "GMI Cloud base URL override",
+        "prompt": "GMI Cloud base URL (leave empty for default)",
+        "url": None,
+        "password": False,
+        "category": "provider",
+        "advanced": True,
+    },
    "MINIMAX_API_KEY": {
        "description": "MiniMax API key (international)",
        "prompt": "MiniMax API key",
@@ -46,6 +46,7 @@ _PROVIDER_ENV_HINTS = (
    "Z_AI_API_KEY",
    "KIMI_API_KEY",
    "KIMI_CN_API_KEY",
+    "GMI_API_KEY",
    "MINIMAX_API_KEY",
    "MINIMAX_CN_API_KEY",
    "KILOCODE_API_KEY",
@@ -937,6 +938,7 @@ def run_doctor(args):
        ("StepFun Step Plan",   ("STEPFUN_API_KEY",),                           "https://api.stepfun.ai/step_plan/v1/models", "STEPFUN_BASE_URL", True),
        ("Kimi / Moonshot (China)", ("KIMI_CN_API_KEY",),                    "https://api.moonshot.cn/v1/models",   None, True),
        ("Arcee AI",         ("ARCEEAI_API_KEY",),                            "https://api.arcee.ai/api/v1/models",  "ARCEE_BASE_URL", True),
+        ("GMI Cloud",        ("GMI_API_KEY",),                                "https://api.gmi-serving.com/v1/models", "GMI_BASE_URL", True),
        ("DeepSeek",         ("DEEPSEEK_API_KEY",),                           "https://api.deepseek.com/v1/models",  "DEEPSEEK_BASE_URL", True),
        ("Hugging Face",     ("HF_TOKEN",),                                   "https://router.huggingface.co/v1/models", "HF_BASE_URL", True),
        ("NVIDIA NIM",       ("NVIDIA_API_KEY",),                             "https://integrate.api.nvidia.com/v1/models", "NVIDIA_BASE_URL", True),
@@ -829,8 +829,29 @@ def _print_tui_exit_summary(session_id: Optional[str], active_session_file: Opti
    )


+_NPM_LOCK_RUNTIME_KEYS = frozenset({"ideallyInert"})
+
+
 def _tui_need_npm_install(root: Path) -> bool:
-    """True when @hermes/ink is missing or node_modules is behind package-lock.json (post-pull)."""
+    """True when @hermes/ink is missing or node_modules is behind package-lock.json.
+
+    Compares ``package-lock.json`` against ``node_modules/.package-lock.json``
+    (npm's hidden lockfile) by **content**, not mtime: git checkouts and npm
+    rewrites can bump the root lockfile's timestamp even when installed deps
+    already match, which used to trigger a spurious "Installing TUI
+    dependencies" on every launch.
+
+    For each entry in the root lock's ``packages`` map:
+      - missing from hidden lock → reinstall (unless the entry is marked
+        ``optional`` or ``peer``, which npm may intentionally skip per platform)
+      - present but with differing fields (excluding npm-written runtime
+        annotations like ``ideallyInert``) → reinstall
+
+    Extra entries that exist only in the hidden lock are ignored — stale
+    transitives left over from a removed dependency don't break runtime and
+    we'd rather not force a reinstall for them. Falls back to mtime
+    comparison if either lockfile is unparseable.
+    """
    ink = root / "node_modules" / "@hermes" / "ink" / "package.json"
    if not ink.is_file():
        return True
@@ -840,7 +861,35 @@ def _tui_need_npm_install(root: Path) -> bool:
    marker = root / "node_modules" / ".package-lock.json"
    if not marker.is_file():
        return True
-    return lock.stat().st_mtime > marker.stat().st_mtime
+
+    # Compare lockfile contents, not mtimes: git checkouts and npm rewrites
+    # can bump the root lockfile timestamp even when installed deps already
+    # match. Fall back to mtime when either file is unparseable.
+    try:
+        wanted = json.loads(lock.read_text(encoding="utf-8")).get("packages") or {}
+        installed = json.loads(marker.read_text(encoding="utf-8")).get("packages") or {}
+    except (OSError, UnicodeDecodeError, json.JSONDecodeError):
+        return lock.stat().st_mtime > marker.stat().st_mtime
+
+    def comparable(pkg: dict) -> dict:
+        return {k: v for k, v in pkg.items() if k not in _NPM_LOCK_RUNTIME_KEYS}
+
+    for name, pkg in wanted.items():
+        if not name:
+            continue
+
+        if not isinstance(pkg, dict):
+            continue
+
+        if name not in installed:
+            if pkg.get("optional") or pkg.get("peer"):
+                continue
+            return True
+
+        if isinstance(installed[name], dict) and comparable(pkg) != comparable(installed[name]):
+            return True
+
+    return False


 def _find_bundled_tui(tui_dir: Path) -> Optional[Path]:
@@ -1768,6 +1817,7 @@ def select_provider_and_model(args=None):
        "huggingface",
        "xiaomi",
        "arcee",
+        "gmi",
        "nvidia",
        "ollama-cloud",
    ):
@@ -7782,6 +7832,7 @@ For more help on a command:
            "kilocode",
            "xiaomi",
            "arcee",
+            "gmi",
            "nvidia",
        ],
        default=None,
@@ -278,6 +278,14 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
        "trinity-large-preview",
        "trinity-mini",
    ],
+    "gmi": [
+        "zai-org/GLM-5.1-FP8",
+        "deepseek-ai/DeepSeek-V3.2",
+        "moonshotai/Kimi-K2.5",
+        "google/gemini-3.1-flash-lite-preview",
+        "anthropic/claude-sonnet-4.6",
+        "openai/gpt-5.4",
+    ],
    "opencode-zen": [
        "kimi-k2.5",
        "gpt-5.4-pro",
@@ -709,7 +717,6 @@ class ProviderEntry(NamedTuple):
    label: str
    tui_desc: str   # detailed description for `hermes model` TUI

-
 CANONICAL_PROVIDERS: list[ProviderEntry] = [
    ProviderEntry("nous",           "Nous Portal",              "Nous Portal (Nous Research subscription)"),
    ProviderEntry("openrouter",     "OpenRouter",               "OpenRouter (100+ models, pay-per-use)"),
@@ -735,6 +742,7 @@ CANONICAL_PROVIDERS: list[ProviderEntry] = [
    ProviderEntry("alibaba",        "Alibaba Cloud (DashScope)","Alibaba Cloud / DashScope Coding (Qwen + multi-provider)"),
    ProviderEntry("ollama-cloud",   "Ollama Cloud",             "Ollama Cloud (cloud-hosted open models — ollama.com)"),
    ProviderEntry("arcee",          "Arcee AI",                 "Arcee AI (Trinity models — direct API)"),
+    ProviderEntry("gmi",            "GMI Cloud",                "GMI Cloud (multi-model direct API)"),
    ProviderEntry("kilocode",       "Kilo Code",                "Kilo Code (Kilo Gateway API)"),
    ProviderEntry("opencode-zen",   "OpenCode Zen",             "OpenCode Zen (35+ curated models, pay-as-you-go)"),
    ProviderEntry("opencode-go",    "OpenCode Go",              "OpenCode Go (open models, $10/month subscription)"),
@@ -769,6 +777,8 @@ _PROVIDER_ALIASES = {
    "stepfun-coding-plan": "stepfun",
    "arcee-ai": "arcee",
    "arceeai": "arcee",
+    "gmi-cloud": "gmi",
+    "gmicloud": "gmi",
    "minimax-china": "minimax-cn",
    "minimax_cn": "minimax-cn",
    "claude": "anthropic",
@@ -1849,6 +1859,19 @@ def provider_model_ids(provider: Optional[str], *, force_refresh: bool = False)
                    return live
            except Exception:
                pass
+    if normalized == "gmi":
+        try:
+            from hermes_cli.auth import resolve_api_key_provider_credentials
+
+            creds = resolve_api_key_provider_credentials("gmi")
+            api_key = str(creds.get("api_key") or "").strip()
+            base_url = str(creds.get("base_url") or "").strip()
+            if api_key and base_url:
+                live = fetch_api_models(api_key, base_url)
+                if live:
+                    return live
+        except Exception:
+            pass
    if normalized == "custom":
        base_url = _get_custom_base_url()
        if base_url:
@@ -163,6 +163,12 @@ HERMES_OVERLAYS: Dict[str, HermesOverlay] = {
        base_url_override="https://api.arcee.ai/api/v1",
        base_url_env_var="ARCEE_BASE_URL",
    ),
+    "gmi": HermesOverlay(
+        transport="openai_chat",
+        extra_env_vars=("GMI_API_KEY",),
+        base_url_override="https://api.gmi-serving.com/v1",
+        base_url_env_var="GMI_BASE_URL",
+    ),
    "ollama-cloud": HermesOverlay(
        transport="openai_chat",
        base_url_env_var="OLLAMA_BASE_URL",
@@ -297,6 +303,10 @@ ALIASES: Dict[str, str] = {
    "arcee-ai": "arcee",
    "arceeai": "arcee",

+    # gmi
+    "gmi-cloud": "gmi",
+    "gmicloud": "gmi",
+
    # Local server aliases → virtual "local" concept (resolved via user config)
    "lmstudio": "lmstudio",
    "lm-studio": "lmstudio",
@@ -319,6 +329,7 @@ _LABEL_OVERRIDES: Dict[str, str] = {
    "copilot-acp": "GitHub Copilot ACP",
    "stepfun": "StepFun Step Plan",
    "xiaomi": "Xiaomi MiMo",
+    "gmi": "GMI Cloud",
    "local": "Local endpoint",
    "bedrock": "AWS Bedrock",
    "ollama-cloud": "Ollama Cloud",
@@ -22,6 +22,8 @@ import sqlite3
 import threading
 import time
 from pathlib import Path
+
+from agent.memory_manager import sanitize_context
 from hermes_constants import get_hermes_home
 from typing import Any, Callable, Dict, List, Optional, TypeVar

@@ -31,7 +33,7 @@ T = TypeVar("T")

 DEFAULT_DB_PATH = get_hermes_home() / "state.db"

-SCHEMA_VERSION = 9
+SCHEMA_VERSION = 10

 SCHEMA_SQL = """
 CREATE TABLE IF NOT EXISTS schema_version (
@@ -119,6 +121,32 @@ CREATE TRIGGER IF NOT EXISTS messages_fts_update AFTER UPDATE ON messages BEGIN
 END;
 """

+# Trigram FTS5 table for CJK substring search.  The default unicode61
+# tokenizer splits CJK characters into individual tokens, breaking phrase
+# matching.  The trigram tokenizer creates overlapping 3-byte sequences so
+# substring queries work natively for any script (CJK, Thai, etc.).
+FTS_TRIGRAM_SQL = """
+CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts_trigram USING fts5(
+    content,
+    content=messages,
+    content_rowid=id,
+    tokenize='trigram'
+);
+
+CREATE TRIGGER IF NOT EXISTS messages_fts_trigram_insert AFTER INSERT ON messages BEGIN
+    INSERT INTO messages_fts_trigram(rowid, content) VALUES (new.id, new.content);
+END;
+
+CREATE TRIGGER IF NOT EXISTS messages_fts_trigram_delete AFTER DELETE ON messages BEGIN
+    INSERT INTO messages_fts_trigram(messages_fts_trigram, rowid, content) VALUES('delete', old.id, old.content);
+END;
+
+CREATE TRIGGER IF NOT EXISTS messages_fts_trigram_update AFTER UPDATE ON messages BEGIN
+    INSERT INTO messages_fts_trigram(messages_fts_trigram, rowid, content) VALUES('delete', old.id, old.content);
+    INSERT INTO messages_fts_trigram(rowid, content) VALUES (new.id, new.content);
+END;
+"""
+

 class SessionDB:
    """
@@ -366,6 +394,18 @@ class SessionDB:
                except sqlite3.OperationalError:
                    pass  # Column already exists
                cursor.execute("UPDATE schema_version SET version = 9")
+            if current_version < 10:
+                # v10: trigram FTS5 table for CJK/substring search.
+                # Created via FTS_TRIGRAM_SQL below; backfill existing messages.
+                try:
+                    cursor.execute("SELECT * FROM messages_fts_trigram LIMIT 0")
+                except sqlite3.OperationalError:
+                    cursor.executescript(FTS_TRIGRAM_SQL)
+                    cursor.execute(
+                        "INSERT INTO messages_fts_trigram(rowid, content) "
+                        "SELECT id, content FROM messages WHERE content IS NOT NULL"
+                    )
+                cursor.execute("UPDATE schema_version SET version = 10")

        # Unique title index — always ensure it exists (safe to run after migrations
        # since the title column is guaranteed to exist at this point)
@@ -383,6 +423,12 @@ class SessionDB:
        except sqlite3.OperationalError:
            cursor.executescript(FTS_SQL)

+        # Trigram FTS5 for CJK/substring search
+        try:
+            cursor.execute("SELECT * FROM messages_fts_trigram LIMIT 0")
+        except sqlite3.OperationalError:
+            cursor.executescript(FTS_TRIGRAM_SQL)
+
        self._conn.commit()

    # =========================================================================
@@ -1155,7 +1201,10 @@ class SessionDB:

        messages = []
        for row in rows:
-            msg = {"role": row["role"], "content": row["content"]}
+            content = row["content"]
+            if row["role"] in {"user", "assistant"} and isinstance(content, str):
+                content = sanitize_context(content).strip()
+            msg = {"role": row["role"], "content": content}
            if row["tool_call_id"]:
                msg["tool_call_id"] = row["tool_call_id"]
            if row["tool_name"]:
@@ -1291,6 +1340,16 @@ class SessionDB:
        return sanitized.strip()


+    @staticmethod
+    def _is_cjk_codepoint(cp: int) -> bool:
+        return (0x4E00 <= cp <= 0x9FFF or    # CJK Unified Ideographs
+                0x3400 <= cp <= 0x4DBF or    # CJK Extension A
+                0x20000 <= cp <= 0x2A6DF or  # CJK Extension B
+                0x3000 <= cp <= 0x303F or    # CJK Symbols
+                0x3040 <= cp <= 0x309F or    # Hiragana
+                0x30A0 <= cp <= 0x30FF or    # Katakana
+                0xAC00 <= cp <= 0xD7AF)      # Hangul Syllables
+
    @staticmethod
    def _contains_cjk(text: str) -> bool:
        """Check if text contains CJK (Chinese, Japanese, Korean) characters."""
@@ -1306,6 +1365,11 @@ class SessionDB:
                return True
        return False

+    @classmethod
+    def _count_cjk(cls, text: str) -> int:
+        """Count CJK characters in text."""
+        return sum(1 for ch in text if cls._is_cjk_codepoint(ord(ch)))
+
    def search_messages(
        self,
        query: str,
@@ -1376,52 +1440,113 @@ class SessionDB:
            LIMIT ? OFFSET ?
        """

-        with self._lock:
-            try:
-                cursor = self._conn.execute(sql, params)
-            except sqlite3.OperationalError:
-                # FTS5 query syntax error despite sanitization — return empty
-                # unless query contains CJK (fall back to LIKE below)
-                if not self._contains_cjk(query):
-                    return []
-                matches = []
-            else:
-                matches = [dict(row) for row in cursor.fetchall()]
-
-        # LIKE fallback for CJK queries: FTS5 default tokenizer splits CJK
-        # characters individually, causing multi-character queries to fail.
-        if not matches and self._contains_cjk(query):
+        # CJK queries bypass the unicode61 FTS5 table.  The default tokenizer
+        # splits CJK characters into individual tokens, so "大别山项目" becomes
+        # "大 AND 别 AND 山 AND 项 AND 目" — producing false positives and
+        # missing exact phrase matches.
+        #
+        # For queries with 3+ CJK characters, we use the trigram FTS5 table
+        # (indexed substring matching with ranking and snippets).  For shorter
+        # CJK queries (1-2 chars), trigram can't match (it needs ≥9 UTF-8
+        # bytes = 3 CJK chars), so we fall back to LIKE.
+        is_cjk = self._contains_cjk(query)
+        if is_cjk:
            raw_query = query.strip('"').strip()
-            like_where = ["m.content LIKE ?"]
-            like_params: list = [f"%{raw_query}%"]
-            if source_filter is not None:
-                like_where.append(f"s.source IN ({','.join('?' for _ in source_filter)})")
-                like_params.extend(source_filter)
-            if exclude_sources is not None:
-                like_where.append(f"s.source NOT IN ({','.join('?' for _ in exclude_sources)})")
-                like_params.extend(exclude_sources)
-            if role_filter:
-                like_where.append(f"m.role IN ({','.join('?' for _ in role_filter)})")
-                like_params.extend(role_filter)
-            like_sql = f"""
-                SELECT m.id, m.session_id, m.role,
-                       substr(m.content,
-                              max(1, instr(m.content, ?) - 40),
-                              120) AS snippet,
-                       m.content, m.timestamp, m.tool_name,
-                       s.source, s.model, s.started_at AS session_started
-                FROM messages m
-                JOIN sessions s ON s.id = m.session_id
-                WHERE {' AND '.join(like_where)}
-                ORDER BY m.timestamp DESC
-                LIMIT ? OFFSET ?
-            """
-            like_params.extend([limit, offset])
-            # instr() parameter goes first in the bound list
-            like_params = [raw_query] + like_params
+            cjk_count = self._count_cjk(raw_query)
+
+            if cjk_count >= 3:
+                # Trigram FTS5 path — quote each non-operator token to handle
+                # FTS5 special chars (%, *, etc.) while preserving boolean
+                # operators (AND, OR, NOT) for multi-term queries.
+                tokens = raw_query.split()
+                parts = []
+                for tok in tokens:
+                    if tok.upper() in ("AND", "OR", "NOT"):
+                        parts.append(tok)
+                    else:
+                        parts.append('"' + tok.replace('"', '""') + '"')
+                trigram_query = " ".join(parts)
+                tri_where = ["messages_fts_trigram MATCH ?"]
+                tri_params: list = [trigram_query]
+                if source_filter is not None:
+                    tri_where.append(f"s.source IN ({','.join('?' for _ in source_filter)})")
+                    tri_params.extend(source_filter)
+                if exclude_sources is not None:
+                    tri_where.append(f"s.source NOT IN ({','.join('?' for _ in exclude_sources)})")
+                    tri_params.extend(exclude_sources)
+                if role_filter:
+                    tri_where.append(f"m.role IN ({','.join('?' for _ in role_filter)})")
+                    tri_params.extend(role_filter)
+                tri_sql = f"""
+                    SELECT
+                        m.id,
+                        m.session_id,
+                        m.role,
+                        snippet(messages_fts_trigram, 0, '>>>', '<<<', '...', 40) AS snippet,
+                        m.content,
+                        m.timestamp,
+                        m.tool_name,
+                        s.source,
+                        s.model,
+                        s.started_at AS session_started
+                    FROM messages_fts_trigram
+                    JOIN messages m ON m.id = messages_fts_trigram.rowid
+                    JOIN sessions s ON s.id = m.session_id
+                    WHERE {' AND '.join(tri_where)}
+                    ORDER BY rank
+                    LIMIT ? OFFSET ?
+                """
+                tri_params.extend([limit, offset])
+                with self._lock:
+                    try:
+                        tri_cursor = self._conn.execute(tri_sql, tri_params)
+                    except sqlite3.OperationalError:
+                        matches = []
+                    else:
+                        matches = [dict(row) for row in tri_cursor.fetchall()]
+            else:
+                # Short CJK query (1-2 chars) — trigram needs ≥3 CJK chars.
+                # Fall back to LIKE substring search.
+                escaped = raw_query.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
+                like_where = ["m.content LIKE ? ESCAPE '\\'"]
+                like_params: list = [f"%{escaped}%"]
+                if source_filter is not None:
+                    like_where.append(f"s.source IN ({','.join('?' for _ in source_filter)})")
+                    like_params.extend(source_filter)
+                if exclude_sources is not None:
+                    like_where.append(f"s.source NOT IN ({','.join('?' for _ in exclude_sources)})")
+                    like_params.extend(exclude_sources)
+                if role_filter:
+                    like_where.append(f"m.role IN ({','.join('?' for _ in role_filter)})")
+                    like_params.extend(role_filter)
+                like_sql = f"""
+                    SELECT m.id, m.session_id, m.role,
+                           substr(m.content,
+                                  max(1, instr(m.content, ?) - 40),
+                                  120) AS snippet,
+                           m.content, m.timestamp, m.tool_name,
+                           s.source, s.model, s.started_at AS session_started
+                    FROM messages m
+                    JOIN sessions s ON s.id = m.session_id
+                    WHERE {' AND '.join(like_where)}
+                    ORDER BY m.timestamp DESC
+                    LIMIT ? OFFSET ?
+                """
+                like_params.extend([limit, offset])
+                # instr() parameter goes first in the bound list
+                like_params = [raw_query] + like_params
+                with self._lock:
+                    like_cursor = self._conn.execute(like_sql, like_params)
+                    matches = [dict(row) for row in like_cursor.fetchall()]
+        else:
            with self._lock:
-                like_cursor = self._conn.execute(like_sql, like_params)
-                matches = [dict(row) for row in like_cursor.fetchall()]
+                try:
+                    cursor = self._conn.execute(sql, params)
+                except sqlite3.OperationalError:
+                    # FTS5 query syntax error despite sanitization — return empty
+                    return []
+                else:
+                    matches = [dict(row) for row in cursor.fetchall()]

        # Add surrounding context (1 message before + after each match).
        # Done outside the lock so we don't hold it across N sequential queries.
@@ -7,9 +7,7 @@
  perSystem = { pkgs, system, lib, ... }:
    let
      hermes-agent = inputs.self.packages.${system}.default;
-      hermesVenv = pkgs.callPackage ./python.nix {
-        inherit (inputs) uv2nix pyproject-nix pyproject-build-systems;
-      };
+      hermesVenv = hermes-agent.hermesVenv;

      configMergeScript = pkgs.callPackage ./configMergeScript.nix { };

@@ -193,6 +191,35 @@ json.dump(sorted(leaf_paths(DEFAULT_CONFIG)), sys.stdout, indent=2)
          echo "ok" > $out/result
        '';

+        # Verify extraPythonPackages PYTHONPATH injection
+        extra-python-packages = let
+          testPkg = pkgs.python312Packages.pyfiglet;
+          hermesWithExtra = hermes-agent.override {
+            extraPythonPackages = [ testPkg ];
+          };
+        in pkgs.runCommand "hermes-extra-python-packages" { } ''
+          set -e
+          echo "=== Checking extraPythonPackages PYTHONPATH injection ==="
+
+          grep -q "PYTHONPATH" ${hermesWithExtra}/bin/hermes || \
+            (echo "FAIL: PYTHONPATH not in wrapper"; exit 1)
+          echo "PASS: PYTHONPATH present in wrapper"
+
+          grep -q "${testPkg}" ${hermesWithExtra}/bin/hermes || \
+            (echo "FAIL: test package path not in PYTHONPATH"; exit 1)
+          echo "PASS: test package path found in wrapper"
+
+          echo "=== Checking base package has no PYTHONPATH ==="
+          if grep -q "PYTHONPATH" ${hermes-agent}/bin/hermes; then
+            echo "FAIL: base package should not have PYTHONPATH"; exit 1
+          fi
+          echo "PASS: base package clean"
+
+          echo "=== All extraPythonPackages checks passed ==="
+          mkdir -p $out
+          echo "ok" > $out/result
+        '';
+
        # ── Config merge + round-trip test ────────────────────────────────
        # Tests the merge script (Nix activation behavior) across 7
        # scenarios, then verifies Python's load_config() reads correctly.
@@ -0,0 +1,186 @@
+# nix/hermes-agent.nix — Overridable Hermes Agent package
+#
+# callPackage auto-wires nixpkgs args; flake inputs are passed explicitly.
+# Users override via: pkgs.hermes-agent.override { extraPythonPackages = [...]; }
+{
+  lib,
+  stdenv,
+  makeWrapper,
+  callPackage,
+  python312,
+  nodejs_22,
+  ripgrep,
+  git,
+  openssh,
+  ffmpeg,
+  tirith,
+  # Flake inputs — passed explicitly by packages.nix and overlays.nix
+  uv2nix,
+  pyproject-nix,
+  pyproject-build-systems,
+  npm-lockfile-fix,
+  # Overridable parameters
+  extraPythonPackages ? [ ],
+}:
+let
+  hermesVenv = callPackage ./python.nix {
+    inherit uv2nix pyproject-nix pyproject-build-systems;
+  };
+
+  hermesNpmLib = callPackage ./lib.nix {
+    inherit npm-lockfile-fix;
+  };
+
+  hermesTui = callPackage ./tui.nix {
+    inherit hermesNpmLib;
+  };
+
+  hermesWeb = callPackage ./web.nix {
+    inherit hermesNpmLib;
+  };
+
+  bundledSkills = lib.cleanSourceWith {
+    src = ../skills;
+    filter = path: _type: !(lib.hasInfix "/index-cache/" path);
+  };
+
+  runtimeDeps = [
+    nodejs_22
+    ripgrep
+    git
+    openssh
+    ffmpeg
+    tirith
+  ];
+
+  runtimePath = lib.makeBinPath runtimeDeps;
+
+  sitePackagesPath = python312.sitePackages;
+
+  # Walk propagatedBuildInputs to include transitive Python deps in PYTHONPATH.
+  # Without this, a plugin listing e.g. requests as a dep would fail at runtime
+  # if requests isn't already in the sealed uv2nix venv.
+  allExtraPythonPackages = python312.pkgs.requiredPythonModules extraPythonPackages;
+
+  pythonPath = lib.makeSearchPath sitePackagesPath allExtraPythonPackages;
+
+  pyprojectHash = builtins.hashString "sha256" (builtins.readFile ../pyproject.toml);
+  uvLockHash =
+    if builtins.pathExists ../uv.lock then
+      builtins.hashString "sha256" (builtins.readFile ../uv.lock)
+    else
+      "none";
+in
+stdenv.mkDerivation {
+  pname = "hermes-agent";
+  version = (builtins.fromTOML (builtins.readFile ../pyproject.toml)).project.version;
+
+  dontUnpack = true;
+  dontBuild = true;
+  nativeBuildInputs = [ makeWrapper ];
+
+  installPhase = ''
+    runHook preInstall
+
+    mkdir -p $out/share/hermes-agent $out/bin
+    cp -r ${bundledSkills} $out/share/hermes-agent/skills
+    cp -r ${hermesWeb} $out/share/hermes-agent/web_dist
+
+    mkdir -p $out/ui-tui
+    cp -r ${hermesTui}/lib/hermes-tui/* $out/ui-tui/
+
+    ${lib.concatMapStringsSep "\n"
+      (name: ''
+        makeWrapper ${hermesVenv}/bin/${name} $out/bin/${name} \
+          --suffix PATH : "${runtimePath}" \
+          --set HERMES_BUNDLED_SKILLS $out/share/hermes-agent/skills \
+          --set HERMES_WEB_DIST $out/share/hermes-agent/web_dist \
+          --set HERMES_TUI_DIR $out/ui-tui \
+          --set HERMES_PYTHON ${hermesVenv}/bin/python3 \
+          --set HERMES_NODE ${nodejs_22}/bin/node \
+          ${lib.optionalString (extraPythonPackages != [ ]) ''--suffix PYTHONPATH : "${pythonPath}"''}
+      '')
+      [
+        "hermes"
+        "hermes-agent"
+        "hermes-acp"
+      ]
+    }
+
+    ${lib.optionalString (extraPythonPackages != [ ]) ''
+      echo "=== Checking for plugin/core package collisions ==="
+      ${hermesVenv}/bin/python3 -c "
+import pathlib, sys, re
+
+def canonical(name):
+    return re.sub(r'[-_.]+', '-', name).lower()
+
+# Collect core venv package names
+core = set()
+venv_sp = pathlib.Path('${hermesVenv}/${sitePackagesPath}')
+for di in venv_sp.glob('*.dist-info'):
+    meta = di / 'METADATA'
+    if meta.exists():
+        for line in meta.read_text().splitlines():
+            if line.startswith('Name:'):
+                core.add(canonical(line.split(':', 1)[1].strip()))
+                break
+
+# Check each extra package for collisions
+extras_dirs = [${lib.concatMapStringsSep ", " (p: "'${toString p}'") allExtraPythonPackages}]
+for edir in extras_dirs:
+    sp = pathlib.Path(edir) / '${sitePackagesPath}'
+    if not sp.exists():
+        continue
+    for di in sp.glob('*.dist-info'):
+        meta = di / 'METADATA'
+        if not meta.exists():
+            continue
+        for line in meta.read_text().splitlines():
+            if line.startswith('Name:'):
+                pkg = canonical(line.split(':', 1)[1].strip())
+                if pkg in core:
+                    print(f'ERROR: plugin package \"{pkg}\" collides with a package in hermes sealed venv', file=sys.stderr)
+                    print(f'  from: {di}', file=sys.stderr)
+                    print(f'  Remove this dependency from extraPythonPackages.', file=sys.stderr)
+                    sys.exit(1)
+                break
+
+print('No collisions found.')
+      "
+      echo "=== No collisions ==="
+    ''}
+
+    runHook postInstall
+  '';
+
+  passthru = {
+    inherit hermesTui hermesWeb hermesNpmLib hermesVenv;
+
+    devShellHook = ''
+      STAMP=".nix-stamps/hermes-agent"
+      STAMP_VALUE="${pyprojectHash}:${uvLockHash}"
+      if [ ! -f "$STAMP" ] || [ "$(cat "$STAMP")" != "$STAMP_VALUE" ]; then
+        echo "hermes-agent: installing Python dependencies..."
+        uv venv .venv --python ${python312}/bin/python3 2>/dev/null || true
+        source .venv/bin/activate
+        uv pip install -e ".[all]"
+        [ -d mini-swe-agent ] && uv pip install -e ./mini-swe-agent 2>/dev/null || true
+        [ -d tinker-atropos ] && uv pip install -e ./tinker-atropos 2>/dev/null || true
+        mkdir -p .nix-stamps
+        echo "$STAMP_VALUE" > "$STAMP"
+      else
+        source .venv/bin/activate
+        export HERMES_PYTHON=${hermesVenv}/bin/python3
+      fi
+    '';
+  };
+
+  meta = with lib; {
+    description = "AI agent with advanced tool-calling capabilities";
+    homepage = "https://github.com/NousResearch/hermes-agent";
+    mainProgram = "hermes";
+    license = licenses.mit;
+    platforms = platforms.unix;
+  };
+}
@@ -28,6 +28,8 @@

  let
    cfg = config.services.hermes-agent;
+    effectivePackage = if cfg.extraPythonPackages == [ ] then cfg.package
+      else cfg.package.override { inherit (cfg) extraPythonPackages; };
    hermes-agent = inputs.self.packages.${pkgs.stdenv.hostPlatform.system}.default;

    # Deep-merge config type (from 0xrsydn/nix-hermes-agent)
@@ -456,6 +458,52 @@
        description = "Extra packages available on PATH.";
      };

+      extraPlugins = mkOption {
+        type = types.listOf types.package;
+        default = [ ];
+        description = ''
+          Directory-based plugin packages to symlink into the hermes plugins
+          directory. Each package should contain a plugin.yaml and __init__.py
+          at its root. Hermes discovers these automatically on startup.
+        '';
+        example = literalExpression ''
+          [
+            (pkgs.fetchFromGitHub {
+              owner = "stephenschoettler";
+              repo = "hermes-lcm";
+              name = "hermes-lcm";
+              rev = "v0.7.0";
+              hash = "sha256-...";
+            })
+          ]
+        '';
+      };
+
+      extraPythonPackages = mkOption {
+        type = types.listOf types.package;
+        default = [ ];
+        description = ''
+          Python packages to add to PYTHONPATH for entry-point plugin discovery.
+          These are pip-packaged plugins that register via the
+          hermes_agent.plugins entry-point group. Each package must be built
+          with the same Python interpreter as hermes (python312).
+        '';
+        example = literalExpression ''
+          [
+            (pkgs.python312Packages.buildPythonPackage {
+              pname = "rtk-hermes";
+              version = "1.0.0";
+              src = pkgs.fetchFromGitHub {
+                owner = "ogallotti";
+                repo = "rtk-hermes";
+                rev = "main";
+                hash = "sha256-...";
+              };
+            })
+          ]
+        '';
+      };
+
      restart = mkOption {
        type = types.str;
        default = "always";
@@ -570,7 +618,7 @@
      # so interactive shells share state (sessions, skills, cron) with the
      # gateway service instead of creating a separate ~/.hermes/.
      (lib.mkIf cfg.addToSystemPackages {
-        environment.systemPackages = [ cfg.package ];
+        environment.systemPackages = [ effectivePackage ];
        environment.variables.HERMES_HOME = "${cfg.stateDir}/.hermes";
      })

@@ -581,6 +629,16 @@
        });
      })

+      # ── Assertions ─────────────────────────────────────────────────────
+      {
+        assertions = let
+          names = map lib.getName cfg.extraPlugins;
+        in [{
+          assertion = (lib.length names) == (lib.length (lib.unique names));
+          message = "services.hermes-agent.extraPlugins: duplicate plugin names detected: ${toString names}. If using fetchFromGitHub, set name = \"plugin-name\" to disambiguate.";
+        }];
+      }
+
      # ── Warnings ──────────────────────────────────────────────────────
      (lib.mkIf (cfg.container.enable && !cfg.addToSystemPackages && cfg.container.hostUsers != []) {
        warnings = [
@@ -602,6 +660,7 @@
          "d ${cfg.stateDir}/.hermes/sessions 2770 ${cfg.user} ${cfg.group} - -"
          "d ${cfg.stateDir}/.hermes/logs   2770 ${cfg.user} ${cfg.group} - -"
          "d ${cfg.stateDir}/.hermes/memories 2770 ${cfg.user} ${cfg.group} - -"
+          "d ${cfg.stateDir}/.hermes/plugins 2770 ${cfg.user} ${cfg.group} - -"
          "d ${cfg.stateDir}/home           0750 ${cfg.user} ${cfg.group} - -"
          "d ${cfg.workingDirectory}         2770 ${cfg.user} ${cfg.group} - -"
        ];
@@ -623,7 +682,7 @@
          find ${cfg.stateDir}/.hermes -maxdepth 1 \
            \( -name "*.db" -o -name "*.db-wal" -o -name "*.db-shm" -o -name "SOUL.md" \) \
            -exec chmod g+rw {} + 2>/dev/null || true
-          for _subdir in cron sessions logs memories; do
+          for _subdir in cron sessions logs memories plugins; do
            mkdir -p "${cfg.stateDir}/.hermes/$_subdir"
            chown ${cfg.user}:${cfg.group} "${cfg.stateDir}/.hermes/$_subdir"
            chmod 2770 "${cfg.stateDir}/.hermes/$_subdir"
@@ -732,6 +791,22 @@ HERMES_NIX_ENV_EOF
          ${lib.concatStringsSep "\n" (lib.mapAttrsToList (name: _value: ''
            install -o ${cfg.user} -g ${cfg.group} -m 0640 ${documentDerivation}/${name} ${cfg.workingDirectory}/${name}
          '') cfg.documents)}
+
+        # ── Declarative plugins ─────────────────────────────────────────
+        # Remove stale managed symlinks (plugins removed from config)
+        find ${cfg.stateDir}/.hermes/plugins -maxdepth 1 -type l -name 'nix-managed-*' -delete 2>/dev/null || true
+
+        ${lib.concatStringsSep "\n" (map (plugin:
+          let
+            name = lib.getName plugin;
+          in ''
+            if [ ! -f "${plugin}/plugin.yaml" ]; then
+              echo "ERROR: extraPlugins entry '${plugin}' has no plugin.yaml" >&2
+              exit 1
+            fi
+            ln -sfn ${plugin} ${cfg.stateDir}/.hermes/plugins/nix-managed-${name}
+            chown -h ${cfg.user}:${cfg.group} ${cfg.stateDir}/.hermes/plugins/nix-managed-${name}
+          '') cfg.extraPlugins)}
        '';
      }

@@ -762,7 +837,7 @@ HERMES_NIX_ENV_EOF
            # reads them at Python startup — no systemd EnvironmentFile needed.

            ExecStart = lib.concatStringsSep " " ([
-              "${cfg.package}/bin/hermes"
+              "${effectivePackage}/bin/hermes"
              "gateway"
            ] ++ cfg.extraArgs);

@@ -785,7 +860,7 @@ HERMES_NIX_ENV_EOF
          };

          path = [
-            cfg.package
+            effectivePackage
            pkgs.bash
            pkgs.coreutils
            pkgs.git
@@ -810,11 +885,11 @@ HERMES_NIX_ENV_EOF

          preStart = ''
            # Stable symlinks — container references these, not store paths directly
-            ln -sfn ${cfg.package} ${cfg.stateDir}/current-package
+            ln -sfn ${effectivePackage} ${cfg.stateDir}/current-package
            ln -sfn ${containerEntrypoint} ${cfg.stateDir}/current-entrypoint

            # GC roots so nix-collect-garbage doesn't remove store paths in use
-            ${pkgs.nix}/bin/nix-store --add-root ${cfg.stateDir}/.gc-root --indirect -r ${cfg.package} 2>/dev/null || true
+            ${pkgs.nix}/bin/nix-store --add-root ${cfg.stateDir}/.gc-root --indirect -r ${effectivePackage} 2>/dev/null || true
            ${pkgs.nix}/bin/nix-store --add-root ${cfg.stateDir}/.gc-root-entrypoint --indirect -r ${containerEntrypoint} 2>/dev/null || true

            # Check if container needs (re)creation
@@ -0,0 +1,10 @@
+# nix/overlays.nix — Expose pkgs.hermes-agent for external NixOS configs
+{ inputs, ... }:
+{
+  flake.overlays.default = final: _: {
+    hermes-agent = final.callPackage ./hermes-agent.nix {
+      inherit (inputs) uv2nix pyproject-nix pyproject-build-systems;
+      npm-lockfile-fix = inputs.npm-lockfile-fix.packages.${final.stdenv.hostPlatform.system}.default;
+    };
+  };
+}
@@ -4,120 +4,19 @@
  perSystem =
    { pkgs, inputs', ... }:
    let
-      hermesVenv = pkgs.callPackage ./python.nix {
+      hermesAgent = pkgs.callPackage ./hermes-agent.nix {
        inherit (inputs) uv2nix pyproject-nix pyproject-build-systems;
-      };
-
-      hermesNpmLib = pkgs.callPackage ./lib.nix {
        npm-lockfile-fix = inputs'.npm-lockfile-fix.packages.default;
      };
-
-      hermesTui = pkgs.callPackage ./tui.nix {
-        inherit hermesNpmLib;
-      };
-
-      # Import bundled skills, excluding runtime caches
-      bundledSkills = pkgs.lib.cleanSourceWith {
-        src = ../skills;
-        filter = path: _type: !(pkgs.lib.hasInfix "/index-cache/" path);
-      };
-
-      hermesWeb = pkgs.callPackage ./web.nix {
-        inherit hermesNpmLib;
-      };
-
-      runtimeDeps = with pkgs; [
-        nodejs_22
-        ripgrep
-        git
-        openssh
-        ffmpeg
-        tirith
-      ];
-
-      runtimePath = pkgs.lib.makeBinPath runtimeDeps;
-
-      # Lockfile hashes for dev shell stamps
-      pyprojectHash = builtins.hashString "sha256" (builtins.readFile ../pyproject.toml);
-      uvLockHash =
-        if builtins.pathExists ../uv.lock then
-          builtins.hashString "sha256" (builtins.readFile ../uv.lock)
-        else
-          "none";
    in
    {
      packages = {
-        default = pkgs.stdenv.mkDerivation {
-          pname = "hermes-agent";
-          version = (fromTOML (builtins.readFile ../pyproject.toml)).project.version;
+        default = hermesAgent;
+        tui = hermesAgent.hermesTui;
+        web = hermesAgent.hermesWeb;

-          dontUnpack = true;
-          dontBuild = true;
-          nativeBuildInputs = [ pkgs.makeWrapper ];
-
-          installPhase = ''
-            runHook preInstall
-
-            mkdir -p $out/share/hermes-agent $out/bin
-            cp -r ${bundledSkills} $out/share/hermes-agent/skills
-            cp -r ${hermesWeb} $out/share/hermes-agent/web_dist
-
-            # copy pre-built TUI (same layout as dev: ui-tui/dist/ + node_modules/)
-            mkdir -p $out/ui-tui
-            cp -r ${hermesTui}/lib/hermes-tui/* $out/ui-tui/
-
-            ${pkgs.lib.concatMapStringsSep "\n"
-              (name: ''
-                makeWrapper ${hermesVenv}/bin/${name} $out/bin/${name} \
-                  --suffix PATH : "${runtimePath}" \
-                  --set HERMES_BUNDLED_SKILLS $out/share/hermes-agent/skills \
-                  --set HERMES_WEB_DIST $out/share/hermes-agent/web_dist \
-                  --set HERMES_TUI_DIR $out/ui-tui \
-                  --set HERMES_PYTHON ${hermesVenv}/bin/python3 \
-                  --set HERMES_NODE ${pkgs.nodejs_22}/bin/node
-              '')
-              [
-                "hermes"
-                "hermes-agent"
-                "hermes-acp"
-              ]
-            }
-
-            runHook postInstall
-          '';
-
-          passthru.devShellHook = ''
-            STAMP=".nix-stamps/hermes-agent"
-            STAMP_VALUE="${pyprojectHash}:${uvLockHash}"
-            if [ ! -f "$STAMP" ] || [ "$(cat "$STAMP")" != "$STAMP_VALUE" ]; then
-              echo "hermes-agent: installing Python dependencies..."
-              uv venv .venv --python ${pkgs.python312}/bin/python3 2>/dev/null || true
-              source .venv/bin/activate
-              uv pip install -e ".[all]"
-              [ -d mini-swe-agent ] && uv pip install -e ./mini-swe-agent 2>/dev/null || true
-              [ -d tinker-atropos ] && uv pip install -e ./tinker-atropos 2>/dev/null || true
-              mkdir -p .nix-stamps
-              echo "$STAMP_VALUE" > "$STAMP"
-            else
-              source .venv/bin/activate
-              export HERMES_PYTHON=${hermesVenv}/bin/python3
-            fi
-          '';
-
-          meta = with pkgs.lib; {
-            description = "AI agent with advanced tool-calling capabilities";
-            homepage = "https://github.com/NousResearch/hermes-agent";
-            mainProgram = "hermes";
-            license = licenses.mit;
-            platforms = platforms.unix;
-          };
-        };
-
-        tui = hermesTui;
-        web = hermesWeb;
-
-        fix-lockfiles = hermesNpmLib.mkFixLockfiles {
-          packages = [ hermesTui hermesWeb ];
+        fix-lockfiles = hermesAgent.hermesNpmLib.mkFixLockfiles {
+          packages = [ hermesAgent.hermesTui hermesAgent.hermesWeb ];
        };
      };
    };
@@ -7,6 +7,7 @@
  pyproject-nix,
  pyproject-build-systems,
  stdenv,
+  dependency-groups ? [ "all" ],
 }:
 let
  workspace = uv2nix.lib.workspace.loadWorkspace { workspaceRoot = ./..; };
@@ -96,5 +97,5 @@ let
      ]);
 in
 pythonSet.mkVirtualEnv "hermes-agent-env" {
-  hermes-agent = [ "all" ];
+  hermes-agent = dependency-groups;
 }
@@ -22,6 +22,7 @@ import threading
 import time
 from typing import Any, Dict, List, Optional

+from agent.memory_manager import sanitize_context
 from agent.memory_provider import MemoryProvider
 from tools.registry import tool_error

@@ -37,7 +38,10 @@ PROFILE_SCHEMA = {
    "description": (
        "Retrieve or update a peer card from Honcho — a curated list of key facts "
        "about that peer (name, role, preferences, communication style, patterns). "
-        "Pass `card` to update; omit `card` to read."
+        "Pass `card` to update; omit `card` to read.  If the card is empty, the "
+        "result includes a `hint` field explaining why (observation disabled, "
+        "fresh peer, dialectic layer still warming up, etc.) — this is NOT an "
+        "error.  Peer cards accumulate over time from observed conversation."
    ),
    "parameters": {
        "type": "object",
@@ -1056,6 +1060,63 @@ class HonchoMemoryProvider(MemoryProvider):

        return chunks

+    def _empty_profile_hint(self, peer: str) -> Dict[str, Any]:
+        """Build a diagnostic hint when honcho_profile returns an empty card.
+
+        A literal "No profile facts available yet." tells the model nothing
+        about WHY.  The model then often surfaces it to the user as a cryptic
+        error.  This hint enumerates the likely causes so the model can
+        explain the situation (or retry with a different peer).
+
+        Ordered by likelihood for a typical deployment:
+          1. Observation is disabled for this peer
+          2. Card hasn't accumulated yet (fresh peer, not enough dialectic
+             cycles — dialectic cadence runs every N turns)
+          3. Self-hosted Honcho backend doesn't support peer cards
+             (honcho-ai server < 3.x)
+        """
+        cfg = self._config
+        reasons: List[str] = []
+
+        if cfg is not None:
+            if peer == "user":
+                observe_me = bool(getattr(cfg, "user_observe_me", True))
+                observe_others = bool(getattr(cfg, "user_observe_others", True))
+            else:
+                observe_me = bool(getattr(cfg, "ai_observe_me", True))
+                observe_others = bool(getattr(cfg, "ai_observe_others", True))
+            if not (observe_me or observe_others):
+                reasons.append(
+                    f"observation is disabled for peer '{peer}' "
+                    f"(user_observe_me/ai_observe_me in config)"
+                )
+
+        cadence = getattr(self, "_dialectic_cadence", 1)
+        turn = getattr(self, "_turn_count", 0)
+        if turn < max(2, cadence):
+            reasons.append(
+                f"this session has only {turn} turn(s); peer cards accumulate "
+                f"as the dialectic layer reasons over conversation history "
+                f"(cadence every {cadence} turn(s))"
+            )
+
+        if not reasons:
+            reasons.append(
+                "peer card has no facts yet — Honcho's dialectic layer builds "
+                "this over time from observed turns; self-hosted Honcho < 3.x "
+                "does not support peer cards at all"
+            )
+
+        return {
+            "result": "No profile facts available yet.",
+            "hint": (
+                "This is not an error.  "
+                + "; ".join(reasons)
+                + ".  Try honcho_reasoning for a synthesized answer, or "
+                "honcho_search to query raw conversation excerpts."
+            ),
+        }
+
    def sync_turn(self, user_content: str, assistant_content: str, *, session_id: str = "") -> None:
        """Record the conversation turn in Honcho (non-blocking).

@@ -1068,13 +1129,15 @@ class HonchoMemoryProvider(MemoryProvider):
            return

        msg_limit = self._config.message_max_chars if self._config else 25000
+        clean_user_content = sanitize_context(user_content or "").strip()
+        clean_assistant_content = sanitize_context(assistant_content or "").strip()

        def _sync():
            try:
                session = self._manager.get_or_create(self._session_key)
-                for chunk in self._chunk_message(user_content, msg_limit):
+                for chunk in self._chunk_message(clean_user_content, msg_limit):
                    session.add_message("user", chunk)
-                for chunk in self._chunk_message(assistant_content, msg_limit):
+                for chunk in self._chunk_message(clean_assistant_content, msg_limit):
                    session.add_message("assistant", chunk)
                self._manager._flush_session(session)
            except Exception as e:
@@ -1087,8 +1150,20 @@ class HonchoMemoryProvider(MemoryProvider):
        )
        self._sync_thread.start()

-    def on_memory_write(self, action: str, target: str, content: str) -> None:
-        """Mirror built-in user profile writes as Honcho conclusions."""
+    def on_memory_write(
+        self,
+        action: str,
+        target: str,
+        content: str,
+        metadata: Optional[Dict[str, Any]] = None,
+    ) -> None:
+        """Mirror built-in user profile writes as Honcho conclusions.
+
+        ``metadata`` is accepted for compatibility with the write-origin
+        work landed in main (commit 6a957a74); it's not yet threaded into
+        the Honcho conclusion payload.  Left as a follow-up so this PR
+        stays focused on the 7-PR consolidation and its review follow-ups.
+        """
        if action != "add" or target != "user" or not content:
            return
        if self._cron_skipped:
@@ -1154,7 +1229,7 @@ class HonchoMemoryProvider(MemoryProvider):
                    return json.dumps({"result": f"Peer card updated ({len(result)} facts).", "card": result})
                card = self._manager.get_peer_card(self._session_key, peer=peer)
                if not card:
-                    return json.dumps({"result": "No profile facts available yet."})
+                    return json.dumps(self._empty_profile_hint(peer))
                return json.dumps({"result": card})

            elif tool_name == "honcho_search":
@@ -273,9 +273,38 @@ def _write_config(cfg: dict, path: Path | None = None) -> None:


 def _resolve_api_key(cfg: dict) -> str:
-    """Resolve API key with host -> root -> env fallback."""
+    """Resolve API key with host -> root -> env fallback.
+
+    For self-hosted instances configured with ``baseUrl`` instead of an API
+    key, returns ``"local"`` so that credential guards throughout the CLI
+    don't reject a valid configuration.  The ``baseUrl`` is scheme-validated
+    (http/https only) so that a typo like ``baseUrl: true`` can't silently
+    pass the guard.  Schemeless strings that look like host:port (legacy
+    config shapes, e.g. ``localhost:8000``) still pass — the Honcho SDK
+    will reject them itself with a clearer error than ours.
+    """
    host_key = ((cfg.get("hosts") or {}).get(_host_key()) or {}).get("apiKey")
-    return host_key or cfg.get("apiKey", "") or os.environ.get("HONCHO_API_KEY", "")
+    key = host_key or cfg.get("apiKey", "") or os.environ.get("HONCHO_API_KEY", "")
+    if not key:
+        base_url = cfg.get("baseUrl") or cfg.get("base_url") or os.environ.get("HONCHO_BASE_URL", "")
+        base_url = (base_url or "").strip()
+        if base_url:
+            from urllib.parse import urlparse
+            try:
+                parsed = urlparse(base_url)
+            except (TypeError, ValueError):
+                parsed = None
+            if parsed and parsed.scheme in ("http", "https") and parsed.netloc:
+                return "local"
+            # Schemeless but looks like a host (contains '.' or ':' and isn't
+            # a boolean literal): let it through so legacy configs don't
+            # regress into "no API key configured" when they previously worked.
+            lowered = base_url.lower()
+            if lowered not in ("true", "false", "none", "null") and any(
+                c in base_url for c in ".:"
+            ) and not base_url.isdigit():
+                return "local"
+    return key


 def _prompt(label: str, default: str | None = None, secret: bool = False) -> str:
@@ -16,6 +16,7 @@ from __future__ import annotations
 import json
 import os
 import logging
+import hashlib
 from dataclasses import dataclass, field
 from pathlib import Path

@@ -27,7 +28,6 @@ if TYPE_CHECKING:

 logger = logging.getLogger(__name__)

-GLOBAL_CONFIG_PATH = Path.home() / ".honcho" / "config.json"
 HOST = "hermes"


@@ -53,6 +53,11 @@ def resolve_active_host() -> str:
    return HOST


+def resolve_global_config_path() -> Path:
+    """Return the shared Honcho config path for the current HOME."""
+    return Path.home() / ".honcho" / "config.json"
+
+
 def resolve_config_path() -> Path:
    """Return the active Honcho config path.

@@ -72,7 +77,7 @@ def resolve_config_path() -> Path:
    if default_path != local_path and default_path.exists():
        return default_path

-    return GLOBAL_CONFIG_PATH
+    return resolve_global_config_path()


 _RECALL_MODE_ALIASES = {"auto": "hybrid"}
@@ -138,6 +143,15 @@ def _parse_dialectic_depth_levels(host_val, root_val, depth: int) -> list[str] |
    return None


+# Default HTTP timeout (seconds) applied when no explicit timeout is
+# configured via HonchoClientConfig.timeout, honcho.timeout / requestTimeout,
+# or HONCHO_TIMEOUT. Honcho calls happen on the post-response path of
+# run_conversation; without a cap the agent can block indefinitely when
+# the Honcho backend is unreachable, preventing the gateway from
+# delivering the already-generated response.
+_DEFAULT_HTTP_TIMEOUT = 30.0
+
+
 def _resolve_optional_float(*values: Any) -> float | None:
    """Return the first non-empty value coerced to a positive float."""
    for value in values:
@@ -226,6 +240,13 @@ class HonchoClientConfig:
    # Identity
    peer_name: str | None = None
    ai_peer: str = "hermes"
+    # When True, ``peer_name`` wins over any gateway-supplied runtime
+    # identity (Telegram UID, Discord ID, …) when resolving the user peer.
+    # This keeps memory unified across platforms for single-user deployments
+    # where Honcho's one peer-name is an unambiguous identity — otherwise
+    # each platform would fork memory into its own peer (#14984).  Default
+    # ``False`` preserves existing multi-user behaviour.
+    pin_peer_name: bool = False
    # Toggles
    enabled: bool = False
    save_messages: bool = True
@@ -420,6 +441,11 @@ class HonchoClientConfig:
            timeout=timeout,
            peer_name=host_block.get("peerName") or raw.get("peerName"),
            ai_peer=ai_peer,
+            pin_peer_name=_resolve_bool(
+                host_block.get("pinPeerName"),
+                raw.get("pinPeerName"),
+                default=False,
+            ),
            enabled=enabled,
            save_messages=save_messages,
            write_frequency=write_frequency,
@@ -522,6 +548,39 @@ class HonchoClientConfig:
            pass
        return None

+    # Honcho enforces a 100-char limit on session IDs. Long gateway session keys
+    # (Matrix "!room:server" + thread event IDs, Telegram supergroup reply
+    # chains, Slack thread IDs with long workspace prefixes) can overflow this
+    # limit after sanitization; the Honcho API then rejects every call for that
+    # session with "session_id too long". See issue #13868.
+    _HONCHO_SESSION_ID_MAX_LEN = 100
+    _HONCHO_SESSION_ID_HASH_LEN = 8
+
+    @classmethod
+    def _enforce_session_id_limit(cls, sanitized: str, original: str) -> str:
+        """Truncate a sanitized session ID to Honcho's 100-char limit.
+
+        The common case (short keys) short-circuits with no modification.
+        For over-limit keys, keep a prefix of the sanitized ID and append a
+        deterministic ``-<sha256 prefix>`` suffix so two distinct long keys
+        that share a leading segment don't collide onto the same truncated ID.
+        The hash is taken over the *original* pre-sanitization key, so two
+        inputs that sanitize to the same string still collide intentionally
+        (same logical session), but two inputs that only share a prefix do not.
+        """
+        max_len = cls._HONCHO_SESSION_ID_MAX_LEN
+        if len(sanitized) <= max_len:
+            return sanitized
+
+        hash_len = cls._HONCHO_SESSION_ID_HASH_LEN
+        digest = hashlib.sha256(original.encode("utf-8")).hexdigest()[:hash_len]
+        # max_len - hash_len - 1 (for the '-' separator) chars of the sanitized
+        # prefix, then '-<hash>'. Strip any trailing hyphen from the prefix so
+        # the result doesn't double up on separators.
+        prefix_len = max_len - hash_len - 1
+        prefix = sanitized[:prefix_len].rstrip("-")
+        return f"{prefix}-{digest}"
+
    def resolve_session_name(
        self,
        cwd: str | None = None,
@@ -566,7 +625,7 @@ class HonchoClientConfig:
        if gateway_session_key:
            sanitized = re.sub(r'[^a-zA-Z0-9_-]+', '-', gateway_session_key).strip('-')
            if sanitized:
-                return sanitized
+                return self._enforce_session_id_limit(sanitized, gateway_session_key)

        # per-session: inherit Hermes session_id (new Honcho session each run)
        if self.session_strategy == "per-session" and session_id:
@@ -646,6 +705,11 @@ def get_honcho_client(config: HonchoClientConfig | None = None) -> Honcho:
        except Exception:
            pass

+    # Fall back to the default so an unconfigured install cannot hang
+    # indefinitely on a stalled Honcho request.
+    if resolved_timeout is None:
+        resolved_timeout = _DEFAULT_HTTP_TIMEOUT
+
    if resolved_base_url:
        logger.info("Initializing Honcho client (base_url: %s, workspace: %s)", resolved_base_url, config.workspace_id)
    else:
@@ -95,6 +95,7 @@ class HonchoSessionManager:
        self._config = config
        self._runtime_user_peer_name = runtime_user_peer_name
        self._cache: dict[str, HonchoSession] = {}
+        self._cache_lock = threading.RLock()
        self._peers_cache: dict[str, Any] = {}
        self._sessions_cache: dict[str, Any] = {}

@@ -273,17 +274,35 @@ class HonchoSessionManager:
        Returns:
            The session.
        """
-        if key in self._cache:
-            logger.debug("Local session cache hit: %s", key)
-            return self._cache[key]
+        with self._cache_lock:
+            if key in self._cache:
+                logger.debug("Local session cache hit: %s", key)
+                return self._cache[key]

-        # Gateway sessions should use the runtime user identity when available.
-        if self._runtime_user_peer_name:
+        # Determine peer IDs — no lock needed (read-only, no shared state mutation).
+        # Gateway sessions normally use the runtime user identity (the
+        # platform-native ID: Telegram UID, Discord snowflake, Slack user,
+        # etc.) so multi-user bots scope memory per user.  For a single-user
+        # deployment the config-supplied ``peer_name`` is an unambiguous
+        # identity and we should keep it unified across platforms — see
+        # #14984.  Opt into that with ``hosts.<host>.pinPeerName: true`` in
+        # ``honcho.json`` (or root-level ``pinPeerName: true``).
+        # `is True` (not `bool(...)`) is deliberate: several multi-user tests
+        # pass a ``MagicMock`` for ``config`` where ``mock.pin_peer_name``
+        # silently returns another MagicMock — truthy by default.  Requiring
+        # strict ``True`` keeps pinning as opt-in even for callers that
+        # haven't updated their mocks yet; real configs built via
+        # ``from_global_config`` always produce a proper boolean.
+        pin_peer_name = (
+            self._config is not None
+            and bool(getattr(self._config, "peer_name", None))
+            and getattr(self._config, "pin_peer_name", False) is True
+        )
+        if self._runtime_user_peer_name and not pin_peer_name:
            user_peer_id = self._sanitize_id(self._runtime_user_peer_name)
        elif self._config and self._config.peer_name:
            user_peer_id = self._sanitize_id(self._config.peer_name)
        else:
-            # Fallback: derive from session key
            parts = key.split(":", 1)
            channel = parts[0] if len(parts) > 1 else "default"
            chat_id = parts[1] if len(parts) > 1 else key
@@ -293,19 +312,14 @@ class HonchoSessionManager:
            self._config.ai_peer if self._config else "hermes-assistant"
        )

-        # Sanitize session ID for Honcho
+        # All expensive I/O outside the lock — Honcho's persistence is source of truth
        honcho_session_id = self._sanitize_id(key)
-
-        # Get or create peers
        user_peer = self._get_or_create_peer(user_peer_id)
        assistant_peer = self._get_or_create_peer(assistant_peer_id)
-
-        # Get or create Honcho session
        honcho_session, existing_messages = self._get_or_create_honcho_session(
            honcho_session_id, user_peer, assistant_peer
        )

-        # Convert Honcho messages to local format
        local_messages = []
        for msg in existing_messages:
            role = "assistant" if msg.peer_id == assistant_peer_id else "user"
@@ -313,10 +327,9 @@ class HonchoSessionManager:
                "role": role,
                "content": msg.content,
                "timestamp": msg.created_at.isoformat() if msg.created_at else "",
-                "_synced": True,  # Already in Honcho
+                "_synced": True,
            })

-        # Create local session wrapper with existing messages
        session = HonchoSession(
            key=key,
            user_peer_id=user_peer_id,
@@ -325,7 +338,9 @@ class HonchoSessionManager:
            messages=local_messages,
        )

-        self._cache[key] = session
+        # Write to cache under lock — only one writer wins
+        with self._cache_lock:
+            self._cache[key] = session
        return session

    def _flush_session(self, session: HonchoSession) -> bool:
@@ -356,13 +371,15 @@ class HonchoSessionManager:
            for msg in new_messages:
                msg["_synced"] = True
            logger.debug("Synced %d messages to Honcho for %s", len(honcho_messages), session.key)
-            self._cache[session.key] = session
+            with self._cache_lock:
+                self._cache[session.key] = session
            return True
        except Exception as e:
            for msg in new_messages:
                msg["_synced"] = False
            logger.error("Failed to sync messages to Honcho: %s", e)
-            self._cache[session.key] = session
+            with self._cache_lock:
+                self._cache[session.key] = session
            return False

    def _async_writer_loop(self) -> None:
@@ -434,7 +451,9 @@ class HonchoSessionManager:
        Called at session end for "session" write_frequency, or to force
        a sync before process exit regardless of mode.
        """
-        for session in list(self._cache.values()):
+        with self._cache_lock:
+            sessions = list(self._cache.values())
+        for session in sessions:
            try:
                self._flush_session(session)
            except Exception as e:
@@ -459,9 +478,10 @@ class HonchoSessionManager:

    def delete(self, key: str) -> bool:
        """Delete a session from local cache."""
-        if key in self._cache:
-            del self._cache[key]
-            return True
+        with self._cache_lock:
+            if key in self._cache:
+                del self._cache[key]
+                return True
        return False

    def new_session(self, key: str) -> HonchoSession:
@@ -473,20 +493,25 @@ class HonchoSessionManager:
        """
        import time

-        # Remove old session from caches (but don't delete from Honcho)
-        old_session = self._cache.pop(key, None)
-        if old_session:
-            self._sessions_cache.pop(old_session.honcho_session_id, None)
+        # Hold the reentrant lock across get_or_create so a concurrent caller
+        # can't observe the (old-popped, new-not-yet-inserted) gap and create
+        # its own session under the raw key.  `_cache_lock` is an RLock so
+        # nested reacquisition inside get_or_create is safe.
+        with self._cache_lock:
+            # Remove old session from caches (but don't delete from Honcho)
+            old_session = self._cache.pop(key, None)
+            if old_session:
+                self._sessions_cache.pop(old_session.honcho_session_id, None)

-        # Create new session with timestamp suffix
-        timestamp = int(time.time())
-        new_key = f"{key}:{timestamp}"
+            # Create new session with timestamp suffix
+            timestamp = int(time.time())
+            new_key = f"{key}:{timestamp}"

-        # get_or_create will create a fresh session
-        session = self.get_or_create(new_key)
+            # get_or_create will create a fresh session
+            session = self.get_or_create(new_key)

-        # Cache under the original key so callers find it by the expected name
-        self._cache[key] = session
+            # Cache under the original key so callers find it by the expected name
+            self._cache[key] = session

        logger.info("Created new session for %s (honcho: %s)", key, session.honcho_session_id)
        return session
@@ -86,7 +86,7 @@ from tools.browser_tool import cleanup_browser


 # Agent internals extracted to agent/ package for modularity
-from agent.memory_manager import build_memory_context_block, sanitize_context
+from agent.memory_manager import StreamingContextScrubber, build_memory_context_block, sanitize_context
 from agent.retry_utils import jittered_backoff
 from agent.error_classifier import classify_api_error, FailoverReason
 from agent.prompt_builder import (
@@ -1218,6 +1218,10 @@ class AIAgent:
        # Deferred paragraph break flag — set after tool iterations so a
        # single "\n\n" is prepended to the next real text delta.
        self._stream_needs_break = False
+        # Stateful scrubber for <memory-context> spans split across stream
+        # deltas (#5719).  sanitize_context() alone can't survive chunk
+        # boundaries because the block regex needs both tags in one string.
+        self._stream_context_scrubber = StreamingContextScrubber()
        # Visible assistant text already delivered through live token callbacks
        # during the current model response. Used to avoid re-sending the same
        # commentary when the provider later returns it as a completed interim
@@ -6019,6 +6023,20 @@ class AIAgent:

    def _reset_stream_delivery_tracking(self) -> None:
        """Reset tracking for text delivered during the current model response."""
+        # Flush any benign partial-tag tail held by the context scrubber so it
+        # reaches the UI before we clear state for the next model call.  If
+        # the scrubber is mid-span, flush() drops the orphaned content.
+        scrubber = getattr(self, "_stream_context_scrubber", None)
+        if scrubber is not None:
+            tail = scrubber.flush()
+            if tail:
+                callbacks = [cb for cb in (self.stream_delta_callback, self._stream_callback) if cb is not None]
+                for cb in callbacks:
+                    try:
+                        cb(tail)
+                    except Exception:
+                        pass
+                self._record_streamed_assistant_text(tail)
        self._current_streamed_assistant_text = ""

    def _record_streamed_assistant_text(self, text: str) -> None:
@@ -6069,6 +6087,28 @@ class AIAgent:
        if getattr(self, "_stream_needs_break", False) and text and text.strip():
            self._stream_needs_break = False
            text = "\n\n" + text
+            prepended_break = True
+        else:
+            prepended_break = False
+        if isinstance(text, str):
+            # Strip <think> blocks first (per-delta is safe for closed pairs; the
+            # unterminated-tag path is handled downstream by stream_consumer).
+            # Then feed through the stateful context scrubber so memory-context
+            # spans split across chunks cannot leak to the UI (#5719).
+            text = self._strip_think_blocks(text or "")
+            scrubber = getattr(self, "_stream_context_scrubber", None)
+            if scrubber is not None:
+                text = scrubber.feed(text)
+            else:
+                # Defensive: legacy callers without the scrubber attribute.
+                text = sanitize_context(text)
+            # Only strip leading newlines on the first delta — mid-stream "\n" is legitimate markdown.
+            if not prepended_break and not getattr(
+                self, "_current_streamed_assistant_text", ""
+            ):
+                text = text.lstrip("\n")
+        if not text:
+            return
        callbacks = [cb for cb in (self.stream_delta_callback, self._stream_callback) if cb is not None]
        delivered = False
        for cb in callbacks:
@@ -9592,16 +9632,6 @@ class AIAgent:
        if isinstance(persist_user_message, str):
            persist_user_message = _sanitize_surrogates(persist_user_message)

-        # Strip leaked <memory-context> blocks from user input.  When Honcho's
-        # saveMessages persists a turn that included injected context, the block
-        # can reappear in the next turn's user message via message history.
-        # Stripping here prevents stale memory tags from leaking into the
-        # conversation and being visible to the user or the model as user text.
-        if isinstance(user_message, str):
-            user_message = sanitize_context(user_message)
-        if isinstance(persist_user_message, str):
-            persist_user_message = sanitize_context(persist_user_message)
-
        # Store stream callback for _interruptible_api_call to pick up
        self._stream_callback = stream_callback
        self._persist_user_message_idx = None
@@ -9680,6 +9710,13 @@ class AIAgent:
        # Track user turns for memory flush and periodic nudge logic
        self._user_turn_count += 1

+        # Reset the streaming context scrubber at the top of each turn so a
+        # hung span from a prior interrupted stream can't taint this turn's
+        # output.
+        scrubber = getattr(self, "_stream_context_scrubber", None)
+        if scrubber is not None:
+            scrubber.reset()
+
        # Preserve the original user message (no nudge injection).
        original_user_message = persist_user_message if persist_user_message is not None else user_message

@@ -12711,7 +12748,6 @@ class AIAgent:
                        truncated_response_prefix = ""
                        length_continue_retries = 0
                    
-                    # Strip <think> blocks from user-facing response (keep raw in messages for trajectory)
                    final_response = self._strip_think_blocks(final_response).strip()
                    
                    final_msg = self._build_assistant_message(assistant_message, finish_reason)
@@ -557,6 +557,12 @@ AUTHOR_MAP = {
    "mor.aleksandr@yahoo.com": "MorAlekss",
    "ash@users.noreply.github.com": "ash",
    "andrewho.sf@gmail.com": "andrewhosf",
+    # April 2026 Honcho bug-fix consolidation (#15381)
+    "HiddenPuppy@users.noreply.github.com": "HiddenPuppy",
+    "code@sasha.id": "sasha-id",
+    "dontcallmejames@users.noreply.github.com": "dontcallmejames",
+    "hekaru.agent@gmail.com": "hekaru-agent",
+    "jas9000@gmail.com": "twozle",
 }


@@ -516,26 +516,88 @@ class TestGetTextAuxiliaryClient:
        assert isinstance(client, CodexAuxiliaryClient)
        assert model == "gpt-5.2-codex"

+    def test_returns_none_when_nothing_available(self, monkeypatch):
+        monkeypatch.delenv("OPENAI_BASE_URL", raising=False)
+        monkeypatch.delenv("OPENAI_API_KEY", raising=False)
+        monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
+        with patch("agent.auxiliary_client._read_nous_auth", return_value=None), \
+             patch("agent.auxiliary_client._read_codex_access_token", return_value=None), \
+             patch("agent.auxiliary_client._resolve_api_key_provider", return_value=(None, None)):
+            client, model = get_text_auxiliary_client()
+        assert client is None
+        assert model is None

-class TestNousAuxiliaryRefresh:
-    def test_try_nous_prefers_runtime_credentials(self):
-        fresh_base = "https://inference-api.nousresearch.com/v1"
+    def test_custom_endpoint_uses_codex_wrapper_when_runtime_requests_responses_api(self):
+        with patch("agent.auxiliary_client._resolve_custom_runtime",
+                   return_value=("https://api.openai.com/v1", "sk-test", "codex_responses")), \
+             patch("agent.auxiliary_client._read_main_model", return_value="gpt-5.3-codex"), \
+             patch("agent.auxiliary_client.OpenAI") as mock_openai:
+            client, model = get_text_auxiliary_client()
+
+        from agent.auxiliary_client import CodexAuxiliaryClient
+        assert isinstance(client, CodexAuxiliaryClient)
+        assert model == "gpt-5.3-codex"
+        assert mock_openai.call_args.kwargs["base_url"] == "https://api.openai.com/v1"
+        assert mock_openai.call_args.kwargs["api_key"] == "sk-test"
+
+
+class TestVisionClientFallback:
+    """Vision client auto mode resolves known-good multimodal backends."""
+
+    def test_vision_auto_includes_active_provider_when_configured(self, monkeypatch):
+        """Active provider appears in available backends when credentials exist."""
+        monkeypatch.setenv("ANTHROPIC_API_KEY", "***")
        with (
-            patch("agent.auxiliary_client._read_nous_auth", return_value={"access_token": "stale-token"}),
-            patch("agent.auxiliary_client._resolve_nous_runtime_api", return_value=("fresh-agent-key", fresh_base)),
-            patch("hermes_cli.models.get_nous_recommended_aux_model", return_value=None),
+            patch("agent.auxiliary_client._read_nous_auth", return_value=None),
+            patch("agent.auxiliary_client._read_main_provider", return_value="anthropic"),
+            patch("agent.auxiliary_client._read_main_model", return_value="claude-sonnet-4"),
+            patch("agent.anthropic_adapter.build_anthropic_client", return_value=MagicMock()),
+            patch("agent.anthropic_adapter.resolve_anthropic_token", return_value="***"),
+        ):
+            backends = get_available_vision_backends()
+
+        assert "anthropic" in backends
+
+    def test_resolve_provider_client_returns_native_anthropic_wrapper(self, monkeypatch):
+        monkeypatch.setenv("ANTHROPIC_API_KEY", "***")
+        with (
+            patch("agent.auxiliary_client._read_nous_auth", return_value=None),
+            patch("agent.anthropic_adapter.build_anthropic_client", return_value=MagicMock()),
+            patch("agent.anthropic_adapter.resolve_anthropic_token", return_value="***"),
+        ):
+            client, model = resolve_provider_client("anthropic")
+
+        assert client is not None
+        assert client.__class__.__name__ == "AnthropicAuxiliaryClient"
+        assert model == "claude-haiku-4-5-20251001"
+
+
+class TestAuxiliaryPoolAwareness:
+    def test_try_nous_uses_pool_entry(self):
+        class _Entry:
+            access_token = "pooled-access-token"
+            agent_key = "pooled-agent-key"
+            inference_base_url = "https://inference.pool.example/v1"
+
+        class _Pool:
+            def has_credentials(self):
+                return True
+
+            def select(self):
+                return _Entry()
+
+        with (
+            patch("agent.auxiliary_client.load_pool", return_value=_Pool()),
            patch("agent.auxiliary_client.OpenAI") as mock_openai,
        ):
            from agent.auxiliary_client import _try_nous

-            mock_openai.return_value = MagicMock()
            client, model = _try_nous()

        assert client is not None
-        # No Portal recommendation → falls back to the hardcoded default.
        assert model == "google/gemini-3-flash-preview"
-        assert mock_openai.call_args.kwargs["api_key"] == "fresh-agent-key"
-        assert mock_openai.call_args.kwargs["base_url"] == fresh_base
+        assert mock_openai.call_args.kwargs["api_key"] == "pooled-agent-key"
+        assert mock_openai.call_args.kwargs["base_url"] == "https://inference.pool.example/v1"

    def test_try_nous_uses_portal_recommendation_for_text(self):
        """When the Portal recommends a compaction model, _try_nous honors it."""
@@ -643,6 +705,40 @@ class TestNousAuxiliaryRefresh:
        assert stale_client.chat.completions.create.await_count == 1
        assert fresh_async_client.chat.completions.create.await_count == 1

+    def test_cached_gmi_client_keeps_explicit_slash_model_override(self):
+        import agent.auxiliary_client as aux
+
+        fake_client = MagicMock()
+
+        with patch(
+            "agent.auxiliary_client.resolve_provider_client",
+            return_value=(fake_client, "google/gemini-3.1-flash-lite-preview"),
+        ) as mock_resolve:
+            aux.shutdown_cached_clients()
+            try:
+                client, model = aux._get_cached_client(
+                    "gmi",
+                    "google/gemini-3.1-flash-lite-preview",
+                    base_url="https://api.gmi-serving.com/v1",
+                    api_key="gmi-key",
+                )
+                assert client is fake_client
+                assert model == "google/gemini-3.1-flash-lite-preview"
+
+                client, model = aux._get_cached_client(
+                    "gmi",
+                    "openai/gpt-5.4-mini",
+                    base_url="https://api.gmi-serving.com/v1",
+                    api_key="gmi-key",
+                )
+            finally:
+                aux.shutdown_cached_clients()
+
+        assert client is fake_client
+        assert model == "openai/gpt-5.4-mini"
+        assert mock_resolve.call_count == 1
+
+
 # ── Payment / credit exhaustion fallback ─────────────────────────────────


@@ -242,6 +242,72 @@ class TestSummaryFailureCooldown:
        assert mock_call.call_count == 1


+class TestSummaryFailureTrackingForGatewayWarning:
+    """When summary generation fails, the compressor must record dropped count
+    + fallback flag so gateway hygiene & /compress can surface a visible
+    warning instead of silently dropping context."""
+
+    def test_compress_records_fallback_and_dropped_count_on_summary_failure(self):
+        with patch("agent.context_compressor.get_model_context_length", return_value=100000):
+            c = ContextCompressor(model="test", quiet_mode=True, protect_first_n=2, protect_last_n=2)
+
+        msgs = [
+            {"role": "system", "content": "sys"},
+            {"role": "user", "content": "msg 1"},
+            {"role": "assistant", "content": "msg 2"},
+            {"role": "user", "content": "msg 3"},
+            {"role": "assistant", "content": "msg 4"},
+            {"role": "user", "content": "msg 5"},
+            {"role": "assistant", "content": "msg 6"},
+            {"role": "user", "content": "msg 7"},
+        ]
+
+        # Simulate summary LLM call failing — covers the 404 / model-not-found
+        # case from issue (auxiliary compression model misconfigured).
+        with patch("agent.context_compressor.call_llm", side_effect=Exception("404 model not found")):
+            result = c.compress(msgs)
+
+        assert c._last_summary_fallback_used is True
+        assert c._last_summary_dropped_count > 0
+        assert c._last_summary_error is not None
+        # Result must still be well-formed (fallback summary present).
+        assert any(
+            isinstance(m.get("content"), str) and "Summary generation was unavailable" in m["content"]
+            for m in result
+        )
+
+    def test_compress_clears_fallback_flag_on_subsequent_success(self):
+        mock_response = MagicMock()
+        mock_response.choices = [MagicMock()]
+        mock_response.choices[0].message.content = "summary text"
+
+        with patch("agent.context_compressor.get_model_context_length", return_value=100000):
+            c = ContextCompressor(model="test", quiet_mode=True, protect_first_n=2, protect_last_n=2)
+
+        msgs = [
+            {"role": "system", "content": "sys"},
+            {"role": "user", "content": "msg 1"},
+            {"role": "assistant", "content": "msg 2"},
+            {"role": "user", "content": "msg 3"},
+            {"role": "assistant", "content": "msg 4"},
+            {"role": "user", "content": "msg 5"},
+            {"role": "assistant", "content": "msg 6"},
+            {"role": "user", "content": "msg 7"},
+        ]
+
+        # First call fails, second succeeds — flag must reset on second compress.
+        with patch("agent.context_compressor.call_llm", side_effect=Exception("boom")):
+            c.compress(msgs)
+        assert c._last_summary_fallback_used is True
+
+        # Reset cooldown to allow retry on second compress
+        c._summary_failure_cooldown_until = 0.0
+        with patch("agent.context_compressor.call_llm", return_value=mock_response):
+            c.compress(msgs)
+        assert c._last_summary_fallback_used is False
+        assert c._last_summary_dropped_count == 0
+
+
 class TestSummaryPrefixNormalization:
    def test_legacy_prefix_is_replaced(self):
        summary = ContextCompressor._with_summary_prefix("[CONTEXT SUMMARY]: did work")
@@ -0,0 +1,211 @@
+"""Unit tests for StreamingContextScrubber (agent/memory_manager.py).
+
+Regression coverage for #5719 — memory-context spans split across stream
+deltas must not leak payload to the UI.  The one-shot sanitize_context()
+regex can't survive chunk boundaries, so _fire_stream_delta routes deltas
+through a stateful scrubber.
+"""
+
+from agent.memory_manager import StreamingContextScrubber, sanitize_context
+
+
+class TestStreamingContextScrubberBasics:
+    def test_empty_input_returns_empty(self):
+        s = StreamingContextScrubber()
+        assert s.feed("") == ""
+        assert s.flush() == ""
+
+    def test_plain_text_passes_through(self):
+        s = StreamingContextScrubber()
+        assert s.feed("hello world") == "hello world"
+        assert s.flush() == ""
+
+    def test_complete_block_in_single_delta(self):
+        """Regression: the one-shot test case from #13672 must still work."""
+        s = StreamingContextScrubber()
+        leaked = (
+            "<memory-context>\n"
+            "[System note: The following is recalled memory context, NOT new "
+            "user input. Treat as informational background data.]\n\n"
+            "## Honcho Context\nstale memory\n"
+            "</memory-context>\n\nVisible answer"
+        )
+        out = s.feed(leaked) + s.flush()
+        assert out == "\n\nVisible answer"
+
+    def test_open_and_close_in_separate_deltas_strips_payload(self):
+        """The real streaming case: tag pair split across deltas."""
+        s = StreamingContextScrubber()
+        deltas = [
+            "Hello ",
+            "<memory-context>\npayload ",
+            "more payload\n",
+            "</memory-context> world",
+        ]
+        out = "".join(s.feed(d) for d in deltas) + s.flush()
+        assert out == "Hello  world"
+        assert "payload" not in out
+
+    def test_realistic_fragmented_chunks_strip_memory_payload(self):
+        """Exact leak scenario from the reviewer's comment — 4 realistic chunks.
+
+        This is the case the original #13672 fix silently leaks on: the open
+        tag, system note, payload, and close tag each arrive in their own
+        delta because providers emit 1-80 char chunks.
+        """
+        s = StreamingContextScrubber()
+        deltas = [
+            "<memory-context>\n[System note: The following",
+            " is recalled memory context, NOT new user input. "
+            "Treat as informational background data.]\n\n",
+            "## Honcho Context\nstale memory\n",
+            "</memory-context>\n\nVisible answer",
+        ]
+        out = "".join(s.feed(d) for d in deltas) + s.flush()
+        assert out == "\n\nVisible answer"
+        # The system-note line and payload must never reach the UI.
+        assert "System note" not in out
+        assert "Honcho Context" not in out
+        assert "stale memory" not in out
+
+    def test_open_tag_split_across_two_deltas(self):
+        """The open tag itself arriving in two fragments."""
+        s = StreamingContextScrubber()
+        out = (
+            s.feed("pre <memory")
+            + s.feed("-context>leak</memory-context> post")
+            + s.flush()
+        )
+        assert out == "pre  post"
+        assert "leak" not in out
+
+    def test_close_tag_split_across_two_deltas(self):
+        """The close tag arriving in two fragments."""
+        s = StreamingContextScrubber()
+        out = (
+            s.feed("pre <memory-context>leak</memory")
+            + s.feed("-context> post")
+            + s.flush()
+        )
+        assert out == "pre  post"
+        assert "leak" not in out
+
+
+class TestStreamingContextScrubberPartialTagFalsePositives:
+    def test_partial_open_tag_tail_emitted_on_flush(self):
+        """Bare '<mem' at end of stream is not really a memory-context tag."""
+        s = StreamingContextScrubber()
+        out = s.feed("hello <mem") + s.feed("ory other") + s.flush()
+        assert out == "hello <memory other"
+
+    def test_partial_tag_released_when_disambiguated(self):
+        """A held-back partial tag that turns out to be prose gets released."""
+        s = StreamingContextScrubber()
+        # '< ' should not look like the start of any tag.
+        out = s.feed("price < ") + s.feed("10 dollars") + s.flush()
+        assert out == "price < 10 dollars"
+
+
+class TestStreamingContextScrubberUnterminatedSpan:
+    def test_unterminated_span_drops_payload(self):
+        """Provider drops close tag — better to lose output than to leak."""
+        s = StreamingContextScrubber()
+        out = s.feed("pre <memory-context>secret never closed") + s.flush()
+        assert out == "pre "
+        assert "secret" not in out
+
+    def test_reset_clears_hung_span(self):
+        """Cross-turn scrubber reset drops a hung span so next turn is clean."""
+        s = StreamingContextScrubber()
+        s.feed("pre <memory-context>half")
+        s.reset()
+        out = s.feed("clean text") + s.flush()
+        assert out == "clean text"
+
+
+class TestStreamingContextScrubberCaseInsensitivity:
+    def test_uppercase_tags_still_scrubbed(self):
+        s = StreamingContextScrubber()
+        out = (
+            s.feed("<MEMORY-CONTEXT>secret")
+            + s.feed("</Memory-Context>visible")
+            + s.flush()
+        )
+        assert out == "visible"
+        assert "secret" not in out
+
+
+class TestSanitizeContextUnchanged:
+    """Smoke test that the one-shot sanitize_context still works for whole strings."""
+
+    def test_whole_block_still_sanitized(self):
+        leaked = (
+            "<memory-context>\n"
+            "[System note: The following is recalled memory context, NOT new "
+            "user input. Treat as informational background data.]\n"
+            "payload\n"
+            "</memory-context>\nVisible"
+        )
+        out = sanitize_context(leaked).strip()
+        assert out == "Visible"
+
+
+class TestStreamingContextScrubberCrossTurn:
+    """A scrubber instance is reused across turns (per agent).  reset() must
+    clear any held state so a partial-tag tail from turn N doesn't bleed
+    into turn N+1's first delta."""
+
+    def test_reset_clears_held_partial_tag(self):
+        s = StreamingContextScrubber()
+        # Feed a partial open-tag prefix that gets held back as buffer.
+        out_turn_1 = s.feed("answer<memo")
+        assert out_turn_1 == "answer"
+
+        # Reset for next turn — buffer must clear.
+        s.reset()
+
+        # New turn: plain text starting with a "<m" must NOT be treated as
+        # the continuation of the held "<memo".
+        out_turn_2 = s.feed("<marker>fresh content")
+        assert out_turn_2 == "<marker>fresh content"
+
+    def test_reset_clears_in_span_state(self):
+        s = StreamingContextScrubber()
+        s.feed("text<memory-context>secret-tail")
+        # Mid-span state held — without reset, subsequent text would be
+        # discarded until we see </memory-context>.
+        s.reset()
+        out = s.feed("post-reset visible text")
+        assert out == "post-reset visible text"
+
+
+class TestBuildMemoryContextBlockWarnsOnViolation:
+    """Providers must return raw context — not pre-wrapped.  When they do,
+    we strip and warn so the buggy provider surfaces."""
+
+    def test_provider_emitting_wrapper_warns(self, caplog):
+        import logging
+        from agent.memory_manager import build_memory_context_block
+
+        prewrapped = (
+            "<memory-context>\n"
+            "[System note: ...]\n\n"
+            "real fact\n"
+            "</memory-context>"
+        )
+        with caplog.at_level(logging.WARNING, logger="agent.memory_manager"):
+            out = build_memory_context_block(prewrapped)
+
+        assert any("pre-wrapped" in rec.message for rec in caplog.records)
+        assert out.count("<memory-context>") == 1
+        assert out.count("</memory-context>") == 1
+
+    def test_clean_provider_output_does_not_warn(self, caplog):
+        import logging
+        from agent.memory_manager import build_memory_context_block
+
+        with caplog.at_level(logging.WARNING, logger="agent.memory_manager"):
+            out = build_memory_context_block("plain fact about user")
+
+        assert not any("pre-wrapped" in rec.message for rec in caplog.records)
+        assert "plain fact about user" in out
@@ -288,6 +288,10 @@ def _hermetic_environment(tmp_path, monkeypatch):
        monkeypatch.setattr(_plugins_mod, "_plugin_manager", None)
    except Exception:
        pass
+    # Explicitly clear provider-specific base URL overrides that don't match
+    # the generic credential-shaped env-var filter above.
+    monkeypatch.delenv("GMI_API_KEY", raising=False)
+    monkeypatch.delenv("GMI_BASE_URL", raising=False)


 # Backward-compat alias — old tests reference this fixture name. Keep it
@@ -123,3 +123,61 @@ async def test_compress_command_explains_when_token_estimate_rises():
    assert "denser summaries" in result
    agent_instance.shutdown_memory_provider.assert_called_once()
    agent_instance.close.assert_called_once()
+
+
+@pytest.mark.asyncio
+async def test_compress_command_appends_warning_when_summary_generation_fails():
+    """When the auxiliary summariser fails and the compressor inserts a static
+    fallback placeholder, /compress must append a visible ⚠️ warning to its
+    reply. Otherwise the failure is silently logged and the user has no idea
+    earlier context is unrecoverable."""
+    history = _make_history()
+    # Compressed shape is irrelevant for this test — we only care that the
+    # warning surfaces. Drop one message so the headline is non-noop.
+    compressed = [
+        history[0],
+        {"role": "assistant", "content": "[fallback placeholder]"},
+        history[-1],
+    ]
+    runner = _make_runner(history)
+    agent_instance = MagicMock()
+    agent_instance.shutdown_memory_provider = MagicMock()
+    agent_instance.close = MagicMock()
+    agent_instance.context_compressor.has_content_to_compress.return_value = True
+    # Simulate summary-generation failure: fallback flag set, dropped count
+    # populated, error string captured.
+    agent_instance.context_compressor._last_summary_fallback_used = True
+    agent_instance.context_compressor._last_summary_dropped_count = 7
+    agent_instance.context_compressor._last_summary_error = (
+        "404 model not found: gemini-3-flash-preview"
+    )
+    agent_instance.session_id = "sess-1"
+    agent_instance._compress_context.return_value = (compressed, "")
+
+    def _estimate(messages):
+        if messages == history:
+            return 100
+        if messages == compressed:
+            return 60
+        raise AssertionError(f"unexpected transcript: {messages!r}")
+
+    with (
+        patch("gateway.run._resolve_runtime_agent_kwargs", return_value={"api_key": "***"}),
+        patch("gateway.run._resolve_gateway_model", return_value="test-model"),
+        patch("run_agent.AIAgent", return_value=agent_instance),
+        patch("agent.model_metadata.estimate_messages_tokens_rough", side_effect=_estimate),
+    ):
+        result = await runner._handle_compress_command(_make_event())
+
+    # The compress reply itself still goes through (the transcript was rewritten).
+    assert "Compressed:" in result
+    # ...but a clearly-marked warning must be appended.
+    assert "⚠️" in result
+    assert "Summary generation failed" in result
+    # Underlying error must surface so users can fix their config.
+    assert "404 model not found" in result
+    # Dropped count must be visible — silently losing N messages is the bug.
+    assert "7" in result
+    assert "historical message(s) were removed" in result
+    agent_instance.shutdown_memory_provider.assert_called_once()
+    agent_instance.close.assert_called_once()
@@ -0,0 +1,200 @@
+"""Tests for BasePlatformAdapter._keep_typing timeout-per-tick behavior.
+
+When the gateway is waiting on a long upstream provider response (e.g.
+Anthropic/opus-4.7 first-token latency climbing during an upstream blip),
+the model-call socket is blocked on the worker thread but the asyncio loop
+is still running, and ``_keep_typing`` refreshes the platform typing
+indicator every 2 seconds.
+
+The bug: each ``send_typing`` call is an HTTP round-trip to the platform API
+(Telegram/Discord). If the same network instability that's slowing the model
+call also makes ``send_typing`` slow (5-30s response time), the refresh loop
+stalls inside the ``await self.send_typing(...)`` call. Platform-side typing
+expires at ~5s, so the bubble dies and doesn't come back until that stuck
+call returns — exactly when the user most needs the "yes, still working"
+signal.
+
+The fix: bound each ``send_typing`` with ``asyncio.wait_for``. If a
+send_typing takes longer than the per-tick budget (default 1.5s when
+interval=2.0), abandon it and let the next scheduled tick fire a fresh
+call. As long as any one of them succeeds within the ~5s platform window,
+the bubble stays visible across provider stalls.
+"""
+
+import asyncio
+from unittest.mock import MagicMock
+
+import pytest
+
+from gateway.platforms.base import (
+    BasePlatformAdapter,
+    Platform,
+    PlatformConfig,
+    SendResult,
+)
+
+
+class _StubAdapter(BasePlatformAdapter):
+    def __init__(self):
+        super().__init__(PlatformConfig(enabled=True, token="test"), Platform.TELEGRAM)
+
+    async def connect(self) -> bool:
+        return True
+
+    async def disconnect(self) -> None:
+        self._mark_disconnected()
+
+    async def send(self, chat_id, content, reply_to=None, metadata=None):
+        return SendResult(success=True, message_id="m1")
+
+    async def get_chat_info(self, chat_id):
+        return {"id": chat_id, "type": "dm"}
+
+
+class TestKeepTypingTimeoutPerTick:
+    @pytest.mark.asyncio
+    async def test_slow_send_typing_does_not_block_cadence(self, monkeypatch):
+        """A send_typing that hangs longer than the per-tick budget must be
+        abandoned so the next scheduled tick can fire a fresh call."""
+        adapter = _StubAdapter()
+        call_events = []
+
+        async def slow_send_typing(chat_id, metadata=None):
+            # Simulate a stuck HTTP round-trip. If _keep_typing awaits this
+            # unconditionally, the loop stalls for the full duration.
+            call_events.append("start")
+            try:
+                await asyncio.sleep(10)
+            finally:
+                call_events.append("finish-or-cancel")
+
+        monkeypatch.setattr(adapter, "send_typing", slow_send_typing)
+        # Avoid stop_typing side-effects in the finally block.
+        adapter.stop_typing = MagicMock(return_value=asyncio.sleep(0))
+
+        stop_event = asyncio.Event()
+        # Start the typing loop, let it run ~3s (should fire 2 ticks) then stop.
+        task = asyncio.create_task(
+            adapter._keep_typing(
+                chat_id="123",
+                interval=1.0,
+                stop_event=stop_event,
+            )
+        )
+        await asyncio.sleep(3.0)
+        stop_event.set()
+        try:
+            await asyncio.wait_for(task, timeout=2.0)
+        except asyncio.TimeoutError:
+            task.cancel()
+            pytest.fail(
+                "_keep_typing did not exit within 2s of stop_event.set() — "
+                "it is blocked on a slow send_typing call"
+            )
+
+        # With per-tick timeout, we should see MULTIPLE send_typing starts
+        # despite each being slow (abandoned via TimeoutError).  Without the
+        # fix there would be exactly 1 start (the one still stuck).
+        starts = [e for e in call_events if e == "start"]
+        assert len(starts) >= 2, (
+            f"expected at least 2 send_typing ticks across 3s of slow "
+            f"operation, got {len(starts)} — refresh cadence is stalled "
+            f"on a slow send_typing"
+        )
+
+    @pytest.mark.asyncio
+    async def test_fast_send_typing_still_gets_awaited(self, monkeypatch):
+        """When send_typing is fast (normal case), it must still complete
+        normally — the timeout is only an upper bound, not a cap on
+        successful calls."""
+        adapter = _StubAdapter()
+        completed = []
+
+        async def fast_send_typing(chat_id, metadata=None):
+            await asyncio.sleep(0.01)  # well under the timeout
+            completed.append(chat_id)
+
+        monkeypatch.setattr(adapter, "send_typing", fast_send_typing)
+        adapter.stop_typing = MagicMock(return_value=asyncio.sleep(0))
+
+        stop_event = asyncio.Event()
+        task = asyncio.create_task(
+            adapter._keep_typing(
+                chat_id="456",
+                interval=0.5,
+                stop_event=stop_event,
+            )
+        )
+        await asyncio.sleep(1.2)  # ~3 ticks
+        stop_event.set()
+        await asyncio.wait_for(task, timeout=1.0)
+
+        assert len(completed) >= 2, (
+            f"expected multiple completed send_typing calls, got "
+            f"{len(completed)}"
+        )
+        assert all(c == "456" for c in completed)
+
+    @pytest.mark.asyncio
+    async def test_send_typing_exception_does_not_kill_loop(self, monkeypatch):
+        """A send_typing that raises (e.g. transient HTTP 500) must be
+        caught so the loop continues refreshing on schedule."""
+        adapter = _StubAdapter()
+        tick_count = {"n": 0}
+
+        async def flaky_send_typing(chat_id, metadata=None):
+            tick_count["n"] += 1
+            if tick_count["n"] == 1:
+                raise RuntimeError("transient upstream error")
+            # Subsequent calls succeed.
+
+        monkeypatch.setattr(adapter, "send_typing", flaky_send_typing)
+        adapter.stop_typing = MagicMock(return_value=asyncio.sleep(0))
+
+        stop_event = asyncio.Event()
+        task = asyncio.create_task(
+            adapter._keep_typing(
+                chat_id="789",
+                interval=0.3,
+                stop_event=stop_event,
+            )
+        )
+        await asyncio.sleep(1.0)
+        stop_event.set()
+        await asyncio.wait_for(task, timeout=1.0)
+
+        assert tick_count["n"] >= 2, (
+            f"loop exited after first send_typing exception; expected it to "
+            f"keep ticking (got {tick_count['n']} ticks)"
+        )
+
+    @pytest.mark.asyncio
+    async def test_paused_chat_skips_send_typing(self, monkeypatch):
+        """When a chat is in _typing_paused (e.g. awaiting approval), the
+        loop must not call send_typing at all. Regression guard — existing
+        behavior, preserved through the timeout change."""
+        adapter = _StubAdapter()
+        calls = []
+
+        async def recording_send_typing(chat_id, metadata=None):
+            calls.append(chat_id)
+
+        monkeypatch.setattr(adapter, "send_typing", recording_send_typing)
+        adapter.stop_typing = MagicMock(return_value=asyncio.sleep(0))
+        adapter._typing_paused.add("paused-chat")
+
+        stop_event = asyncio.Event()
+        task = asyncio.create_task(
+            adapter._keep_typing(
+                chat_id="paused-chat",
+                interval=0.3,
+                stop_event=stop_event,
+            )
+        )
+        await asyncio.sleep(1.0)
+        stop_event.set()
+        await asyncio.wait_for(task, timeout=1.0)
+
+        assert calls == [], (
+            f"send_typing was called on a paused chat: {calls}"
+        )
@@ -393,3 +393,119 @@ async def test_session_hygiene_messages_stay_in_originating_topic(monkeypatch, t
    assert FakeCompressAgent.last_instance is not None
    FakeCompressAgent.last_instance.shutdown_memory_provider.assert_called_once()
    FakeCompressAgent.last_instance.close.assert_called_once()
+
+
+@pytest.mark.asyncio
+async def test_session_hygiene_warns_user_when_summary_generation_fails(monkeypatch, tmp_path):
+    """When auxiliary compression's summary LLM call fails, the compressor
+    inserts a static fallback and the dropped turns are unrecoverable.
+    Gateway must surface a visible ⚠️ warning to the user, including
+    thread_id metadata so it lands in the originating topic/thread."""
+    fake_dotenv = types.ModuleType("dotenv")
+    fake_dotenv.load_dotenv = lambda *args, **kwargs: None
+    monkeypatch.setitem(sys.modules, "dotenv", fake_dotenv)
+
+    class FakeCompressAgentWithSummaryFailure:
+        last_instance = None
+
+        def __init__(self, **kwargs):
+            self.model = kwargs.get("model")
+            self.session_id = kwargs.get("session_id", "fake-session")
+            self._print_fn = None
+            self.shutdown_memory_provider = MagicMock()
+            self.close = MagicMock()
+            # Simulate a compressor that hit summary-generation failure
+            # and inserted the static fallback placeholder.
+            self.context_compressor = SimpleNamespace(
+                _last_summary_fallback_used=True,
+                _last_summary_dropped_count=42,
+                _last_summary_error="404 model not found: gemini-3-flash-preview",
+            )
+            type(self).last_instance = self
+
+        def _compress_context(self, messages, *_args, **_kwargs):
+            self.session_id = f"{self.session_id}_compressed"
+            return ([{"role": "assistant", "content": "compressed"}], None)
+
+    fake_run_agent = types.ModuleType("run_agent")
+    fake_run_agent.AIAgent = FakeCompressAgentWithSummaryFailure
+    monkeypatch.setitem(sys.modules, "run_agent", fake_run_agent)
+
+    gateway_run = importlib.import_module("gateway.run")
+    GatewayRunner = gateway_run.GatewayRunner
+
+    adapter = HygieneCaptureAdapter()
+    runner = object.__new__(GatewayRunner)
+    runner.config = GatewayConfig(
+        platforms={Platform.TELEGRAM: PlatformConfig(enabled=True, token="fake-token")}
+    )
+    runner.adapters = {Platform.TELEGRAM: adapter}
+    runner._voice_mode = {}
+    runner.hooks = SimpleNamespace(emit=AsyncMock(), loaded_hooks=False)
+    runner.session_store = MagicMock()
+    runner.session_store.get_or_create_session.return_value = SessionEntry(
+        session_key="agent:main:telegram:group:-1001:17585",
+        session_id="sess-1",
+        created_at=datetime.now(),
+        updated_at=datetime.now(),
+        platform=Platform.TELEGRAM,
+        chat_type="group",
+    )
+    runner.session_store.load_transcript.return_value = _make_history(6, content_size=400)
+    runner.session_store.has_any_sessions.return_value = True
+    runner.session_store.rewrite_transcript = MagicMock()
+    runner.session_store.append_to_transcript = MagicMock()
+    runner._running_agents = {}
+    runner._pending_messages = {}
+    runner._pending_approvals = {}
+    runner._session_db = None
+    runner._is_user_authorized = lambda _source: True
+    runner._set_session_env = lambda _context: None
+    runner._run_agent = AsyncMock(
+        return_value={
+            "final_response": "ok",
+            "messages": [],
+            "tools": [],
+            "history_offset": 0,
+            "last_prompt_tokens": 0,
+        }
+    )
+
+    monkeypatch.setattr(gateway_run, "_hermes_home", tmp_path)
+    monkeypatch.setattr(gateway_run, "_resolve_runtime_agent_kwargs", lambda: {"api_key": "***"})
+    monkeypatch.setattr(
+        "agent.model_metadata.get_model_context_length",
+        lambda *_args, **_kwargs: 100,
+    )
+    monkeypatch.setenv("TELEGRAM_HOME_CHANNEL", "795544298")
+
+    event = MessageEvent(
+        text="hello",
+        source=SessionSource(
+            platform=Platform.TELEGRAM,
+            chat_id="-1001",
+            chat_type="group",
+            thread_id="17585",
+            user_id="12345",
+        ),
+        message_id="1",
+    )
+
+    result = await runner._handle_message(event)
+
+    assert result == "ok"
+    # The compressor reported summary-failure → exactly one warning
+    # message must have been delivered to the user.
+    warning_messages = [s for s in adapter.sent if "Context compression summary failed" in s["content"]]
+    assert len(warning_messages) == 1, (
+        f"Expected 1 compression-failure warning, got {len(warning_messages)}: {adapter.sent}"
+    )
+    warn = warning_messages[0]
+    # Warning must include the dropped count and the underlying error.
+    assert "42" in warn["content"]
+    assert "404" in warn["content"]
+    # Warning must land in the originating topic/thread, not the main channel.
+    assert warn["chat_id"] == "-1001"
+    assert warn["metadata"] == {"thread_id": "17585"}
+
+    FakeCompressAgentWithSummaryFailure.last_instance.close.assert_called_once()
@@ -356,6 +356,81 @@ def test_config_bridges_slack_free_response_channels(monkeypatch, tmp_path):
    assert _os.environ["SLACK_FREE_RESPONSE_CHANNELS"] == "C0AQWDLHY9M,C9999999999"


+def test_top_level_slack_settings_do_not_disable_env_token_setup(monkeypatch, tmp_path):
+    from gateway.config import load_gateway_config
+
+    hermes_home = tmp_path / ".hermes"
+    hermes_home.mkdir()
+    (hermes_home / "config.yaml").write_text(
+        "slack:\n"
+        "  require_mention: false\n",
+        encoding="utf-8",
+    )
+
+    monkeypatch.setenv("HERMES_HOME", str(hermes_home))
+    monkeypatch.setenv("SLACK_BOT_TOKEN", "xoxb-test")
+    monkeypatch.delenv("SLACK_REQUIRE_MENTION", raising=False)
+
+    config = load_gateway_config()
+
+    slack_config = config.platforms[Platform.SLACK]
+    assert slack_config.enabled is True
+    assert slack_config.token == "xoxb-test"
+    assert slack_config.extra.get("require_mention") is False
+    assert "_enabled_explicit" not in slack_config.extra
+
+
+def test_explicit_top_level_slack_enabled_false_wins_over_env_token(monkeypatch, tmp_path):
+    from gateway.config import load_gateway_config
+
+    hermes_home = tmp_path / ".hermes"
+    hermes_home.mkdir()
+    (hermes_home / "config.yaml").write_text(
+        "slack:\n"
+        "  enabled: false\n"
+        "  require_mention: false\n",
+        encoding="utf-8",
+    )
+
+    monkeypatch.setenv("HERMES_HOME", str(hermes_home))
+    monkeypatch.setenv("SLACK_BOT_TOKEN", "xoxb-test")
+    monkeypatch.delenv("SLACK_REQUIRE_MENTION", raising=False)
+
+    config = load_gateway_config()
+
+    slack_config = config.platforms[Platform.SLACK]
+    assert slack_config.enabled is False
+    assert slack_config.token == "xoxb-test"
+    assert slack_config.extra.get("require_mention") is False
+    assert "_enabled_explicit" not in slack_config.extra
+
+
+def test_explicit_platforms_slack_enabled_false_wins_over_env_token(monkeypatch, tmp_path):
+    from gateway.config import load_gateway_config
+
+    hermes_home = tmp_path / ".hermes"
+    hermes_home.mkdir()
+    (hermes_home / "config.yaml").write_text(
+        "platforms:\n"
+        "  slack:\n"
+        "    enabled: false\n"
+        "    extra:\n"
+        "      reply_in_thread: false\n",
+        encoding="utf-8",
+    )
+
+    monkeypatch.setenv("HERMES_HOME", str(hermes_home))
+    monkeypatch.setenv("SLACK_BOT_TOKEN", "xoxb-test")
+
+    config = load_gateway_config()
+
+    slack_config = config.platforms[Platform.SLACK]
+    assert slack_config.enabled is False
+    assert slack_config.token == "xoxb-test"
+    assert slack_config.extra.get("reply_in_thread") is False
+    assert "_enabled_explicit" not in slack_config.extra
+
+
 def test_config_bridges_slack_reply_in_thread(monkeypatch, tmp_path):
    from gateway.config import load_gateway_config

@@ -0,0 +1,80 @@
+"""Tests for _enrich_message_with_vision — regression for #5719.
+
+The auxiliary vision LLM can echo system-prompt memory-context back into
+its analysis output.  The boundary fix in gateway/run.py runs the generic
+sanitize_context helper over the description so the fenced wrapper and
+its system-note are removed before the description reaches the user.
+
+Plugin-specific header cleanup (e.g. "## Honcho Context") belongs at the
+provider boundary, not in this shared gateway path.
+"""
+
+import asyncio
+import json
+from unittest.mock import AsyncMock, patch
+
+import pytest
+
+
+@pytest.fixture
+def gateway_runner():
+    """Minimal GatewayRunner stub with just the method under test bound."""
+    from gateway.run import GatewayRunner
+
+    class _Stub:
+        _enrich_message_with_vision = GatewayRunner._enrich_message_with_vision
+
+    return _Stub()
+
+
+def _run(coro):
+    return asyncio.get_event_loop().run_until_complete(coro) if False else asyncio.new_event_loop().run_until_complete(coro)
+
+
+class TestEnrichMessageWithVision:
+    def test_clean_description_passes_through(self, gateway_runner):
+        """Vision output without leaked memory is embedded unchanged."""
+        fake_result = json.dumps({
+            "success": True,
+            "analysis": "A photograph of a sunset over the ocean.",
+        })
+        with patch("tools.vision_tools.vision_analyze_tool", new=AsyncMock(return_value=fake_result)):
+            out = _run(gateway_runner._enrich_message_with_vision("caption", ["/tmp/img.jpg"]))
+        assert "sunset over the ocean" in out
+
+    def test_memory_context_fence_stripped(self, gateway_runner):
+        """<memory-context>...</memory-context> fenced block is scrubbed."""
+        leaked = (
+            "<memory-context>\n"
+            "[System note: The following is recalled memory context, NOT new "
+            "user input. Treat as informational background data.]\n\n"
+            "User details and preferences here.\n"
+            "</memory-context>\n"
+            "A photograph of a cat."
+        )
+        fake_result = json.dumps({"success": True, "analysis": leaked})
+        with patch("tools.vision_tools.vision_analyze_tool", new=AsyncMock(return_value=fake_result)):
+            out = _run(gateway_runner._enrich_message_with_vision("caption", ["/tmp/img.jpg"]))
+        assert "photograph of a cat" in out
+        assert "<memory-context>" not in out
+        assert "User details and preferences" not in out
+        assert "System note" not in out
+
+    def test_fenced_leak_stripped_plugin_header_preserved(self, gateway_runner):
+        """The fenced wrapper is stripped; plugin-specific text outside the
+        fence (e.g. a "## Honcho Context" header) is left to the plugin layer.
+        Gateway core stays plugin-agnostic."""
+        leaked = (
+            "<memory-context>\n"
+            "[System note: The following is recalled memory context, NOT new "
+            "user input. Treat as informational background data.]\n"
+            "fenced leak\n"
+            "</memory-context>\n"
+            "A photograph of a dog."
+        )
+        fake_result = json.dumps({"success": True, "analysis": leaked})
+        with patch("tools.vision_tools.vision_analyze_tool", new=AsyncMock(return_value=fake_result)):
+            out = _run(gateway_runner._enrich_message_with_vision("caption", ["/tmp/img.jpg"]))
+        assert "photograph of a dog" in out
+        assert "fenced leak" not in out
+        assert "<memory-context>" not in out
@@ -42,6 +42,7 @@ class TestProviderRegistry:
        ("minimax-cn", "MiniMax (China)", "api_key"),
        ("ai-gateway", "Vercel AI Gateway", "api_key"),
        ("kilocode", "Kilo Code", "api_key"),
+        ("gmi", "GMI Cloud", "api_key"),
    ])
    def test_provider_registered(self, provider_id, name, auth_type):
        assert provider_id in PROVIDER_REGISTRY
@@ -106,6 +107,11 @@ class TestProviderRegistry:
        assert pconfig.api_key_env_vars == ("KILOCODE_API_KEY",)
        assert pconfig.base_url_env_var == "KILOCODE_BASE_URL"

+    def test_gmi_env_vars(self):
+        pconfig = PROVIDER_REGISTRY["gmi"]
+        assert pconfig.api_key_env_vars == ("GMI_API_KEY",)
+        assert pconfig.base_url_env_var == "GMI_BASE_URL"
+
    def test_huggingface_env_vars(self):
        pconfig = PROVIDER_REGISTRY["huggingface"]
        assert pconfig.api_key_env_vars == ("HF_TOKEN",)
@@ -121,6 +127,7 @@ class TestProviderRegistry:
        assert PROVIDER_REGISTRY["minimax-cn"].inference_base_url == "https://api.minimaxi.com/anthropic"
        assert PROVIDER_REGISTRY["ai-gateway"].inference_base_url == "https://ai-gateway.vercel.sh/v1"
        assert PROVIDER_REGISTRY["kilocode"].inference_base_url == "https://api.kilo.ai/api/gateway"
+        assert PROVIDER_REGISTRY["gmi"].inference_base_url == "https://api.gmi-serving.com/v1"
        assert PROVIDER_REGISTRY["huggingface"].inference_base_url == "https://router.huggingface.co/v1"

    def test_oauth_providers_unchanged(self):
@@ -143,6 +150,7 @@ PROVIDER_ENV_VARS = (
    "MINIMAX_API_KEY", "MINIMAX_CN_API_KEY",
    "AI_GATEWAY_API_KEY", "AI_GATEWAY_BASE_URL",
    "KILOCODE_API_KEY", "KILOCODE_BASE_URL",
+    "GMI_API_KEY", "GMI_BASE_URL",
    "DASHSCOPE_API_KEY", "OPENCODE_ZEN_API_KEY", "OPENCODE_GO_API_KEY",
    "NOUS_API_KEY", "GITHUB_TOKEN", "GH_TOKEN",
    "OPENAI_BASE_URL", "HERMES_COPILOT_ACP_COMMAND", "COPILOT_CLI_PATH",
@@ -178,6 +186,9 @@ class TestResolveProvider:
    def test_explicit_ai_gateway(self):
        assert resolve_provider("ai-gateway") == "ai-gateway"

+    def test_explicit_gmi(self):
+        assert resolve_provider("gmi") == "gmi"
+
    def test_alias_glm(self):
        assert resolve_provider("glm") == "zai"

@@ -205,6 +216,9 @@ class TestResolveProvider:
    def test_alias_vercel(self):
        assert resolve_provider("vercel") == "ai-gateway"

+    def test_alias_gmi_cloud(self):
+        assert resolve_provider("gmi-cloud") == "gmi"
+
    def test_explicit_kilocode(self):
        assert resolve_provider("kilocode") == "kilocode"

@@ -280,6 +294,10 @@ class TestResolveProvider:
        monkeypatch.setenv("AI_GATEWAY_API_KEY", "test-gw-key")
        assert resolve_provider("auto") == "ai-gateway"

+    def test_auto_detects_gmi_key(self, monkeypatch):
+        monkeypatch.setenv("GMI_API_KEY", "test-gmi-key")
+        assert resolve_provider("auto") == "gmi"
+
    def test_auto_detects_kilocode_key(self, monkeypatch):
        monkeypatch.setenv("KILOCODE_API_KEY", "test-kilo-key")
        assert resolve_provider("auto") == "kilocode"
@@ -497,6 +515,19 @@ class TestResolveApiKeyProviderCredentials:
        assert creds["api_key"] == "kilo-secret-key"
        assert creds["base_url"] == "https://api.kilo.ai/api/gateway"

+    def test_resolve_gmi_with_key(self, monkeypatch):
+        monkeypatch.setenv("GMI_API_KEY", "gmi-secret-key")
+        creds = resolve_api_key_provider_credentials("gmi")
+        assert creds["provider"] == "gmi"
+        assert creds["api_key"] == "gmi-secret-key"
+        assert creds["base_url"] == "https://api.gmi-serving.com/v1"
+
+    def test_resolve_gmi_custom_base_url(self, monkeypatch):
+        monkeypatch.setenv("GMI_API_KEY", "gmi-key")
+        monkeypatch.setenv("GMI_BASE_URL", "https://custom.gmi.example/v1")
+        creds = resolve_api_key_provider_credentials("gmi")
+        assert creds["base_url"] == "https://custom.gmi.example/v1"
+
    def test_resolve_kilocode_custom_base_url(self, monkeypatch):
        monkeypatch.setenv("KILOCODE_API_KEY", "kilo-key")
        monkeypatch.setenv("KILOCODE_BASE_URL", "https://custom.kilo.example/v1")
@@ -594,6 +625,15 @@ class TestRuntimeProviderResolution:
        assert result["api_key"] == "kilo-key"
        assert "kilo.ai" in result["base_url"]

+    def test_runtime_gmi(self, monkeypatch):
+        monkeypatch.setenv("GMI_API_KEY", "gmi-key")
+        from hermes_cli.runtime_provider import resolve_runtime_provider
+        result = resolve_runtime_provider(requested="gmi")
+        assert result["provider"] == "gmi"
+        assert result["api_mode"] == "chat_completions"
+        assert result["api_key"] == "gmi-key"
+        assert result["base_url"] == "https://api.gmi-serving.com/v1"
+
    def test_runtime_auto_detects_api_key_provider(self, monkeypatch):
        monkeypatch.setenv("KIMI_API_KEY", "auto-kimi-key")
        from hermes_cli.runtime_provider import resolve_runtime_provider
@@ -0,0 +1,363 @@
+"""Focused tests for GMI Cloud first-class provider wiring."""
+
+from __future__ import annotations
+
+import contextlib
+import io
+import sys
+import types
+from argparse import Namespace
+from unittest.mock import patch
+
+import pytest
+
+if "dotenv" not in sys.modules:
+    fake_dotenv = types.ModuleType("dotenv")
+    fake_dotenv.load_dotenv = lambda *args, **kwargs: None
+    sys.modules["dotenv"] = fake_dotenv
+
+from hermes_cli.auth import resolve_provider
+from hermes_cli.config import load_config
+from hermes_cli.models import (
+    CANONICAL_PROVIDERS,
+    _PROVIDER_LABELS,
+    _PROVIDER_MODELS,
+    normalize_provider,
+    provider_model_ids,
+)
+from agent.auxiliary_client import resolve_provider_client
+from agent.model_metadata import get_model_context_length
+
+
+@pytest.fixture(autouse=True)
+def _clear_provider_env(monkeypatch):
+    for key in (
+        "OPENROUTER_API_KEY",
+        "OPENAI_API_KEY",
+        "ANTHROPIC_API_KEY",
+        "GOOGLE_API_KEY",
+        "GLM_API_KEY",
+        "KIMI_API_KEY",
+        "MINIMAX_API_KEY",
+        "GMI_API_KEY",
+        "GMI_BASE_URL",
+    ):
+        monkeypatch.delenv(key, raising=False)
+
+
+class TestGmiAliases:
+    @pytest.mark.parametrize("alias", ["gmi", "gmi-cloud", "gmicloud"])
+    def test_alias_resolves(self, alias, monkeypatch):
+        monkeypatch.setenv("GMI_API_KEY", "gmi-test-key")
+        assert resolve_provider(alias) == "gmi"
+
+    def test_models_normalize_provider(self):
+        assert normalize_provider("gmi-cloud") == "gmi"
+        assert normalize_provider("gmicloud") == "gmi"
+
+    def test_providers_normalize_provider(self):
+        from hermes_cli.providers import normalize_provider as normalize_provider_in_providers
+
+        assert normalize_provider_in_providers("gmi-cloud") == "gmi"
+        assert normalize_provider_in_providers("gmicloud") == "gmi"
+
+
+class TestGmiConfigRegistry:
+    def test_optional_env_vars_include_gmi(self):
+        from hermes_cli.config import OPTIONAL_ENV_VARS
+
+        assert "GMI_API_KEY" in OPTIONAL_ENV_VARS
+        assert OPTIONAL_ENV_VARS["GMI_API_KEY"]["category"] == "provider"
+        assert OPTIONAL_ENV_VARS["GMI_API_KEY"]["password"] is True
+        assert OPTIONAL_ENV_VARS["GMI_API_KEY"]["url"] == "https://www.gmicloud.ai/"
+
+        assert "GMI_BASE_URL" in OPTIONAL_ENV_VARS
+        assert OPTIONAL_ENV_VARS["GMI_BASE_URL"]["category"] == "provider"
+        assert OPTIONAL_ENV_VARS["GMI_BASE_URL"]["password"] is False
+        # ENV_VARS_BY_VERSION entries are not needed for providers added after
+        # _config_version 22 (the current baseline) — users discover GMI via
+        # hermes model, not via upgrade prompts.
+
+
+class TestGmiModelCatalog:
+    def test_static_model_fallback_exists(self):
+        assert "gmi" in _PROVIDER_MODELS
+        models = _PROVIDER_MODELS["gmi"]
+        assert "zai-org/GLM-5.1-FP8" in models
+        assert "deepseek-ai/DeepSeek-V3.2" in models
+        assert "moonshotai/Kimi-K2.5" in models
+        assert "anthropic/claude-sonnet-4.6" in models
+
+    def test_canonical_provider_entry(self):
+        slugs = [p.slug for p in CANONICAL_PROVIDERS]
+        assert "gmi" in slugs
+
+    def test_provider_model_ids_prefers_live_api(self, monkeypatch):
+        monkeypatch.setattr(
+            "hermes_cli.auth.resolve_api_key_provider_credentials",
+            lambda provider_id: {
+                "provider": provider_id,
+                "api_key": "gmi-live-key",
+                "base_url": "https://api.gmi-serving.com/v1",
+                "source": "GMI_API_KEY",
+            },
+        )
+        monkeypatch.setattr(
+            "hermes_cli.models.fetch_api_models",
+            lambda api_key, base_url: [
+                "openai/gpt-5.4-mini",
+                "zai-org/GLM-5.1-FP8",
+            ],
+        )
+
+        assert provider_model_ids("gmi") == [
+            "openai/gpt-5.4-mini",
+            "zai-org/GLM-5.1-FP8",
+        ]
+
+    def test_provider_model_ids_falls_back_to_static_models(self, monkeypatch):
+        monkeypatch.setattr(
+            "hermes_cli.auth.resolve_api_key_provider_credentials",
+            lambda provider_id: {
+                "provider": provider_id,
+                "api_key": "gmi-live-key",
+                "base_url": "https://api.gmi-serving.com/v1",
+                "source": "GMI_API_KEY",
+            },
+        )
+        monkeypatch.setattr("hermes_cli.models.fetch_api_models", lambda api_key, base_url: None)
+
+        assert provider_model_ids("gmi") == list(_PROVIDER_MODELS["gmi"])
+
+
+class TestGmiProvidersModule:
+    def test_overlay_exists(self):
+        from hermes_cli.providers import HERMES_OVERLAYS
+
+        assert "gmi" in HERMES_OVERLAYS
+        overlay = HERMES_OVERLAYS["gmi"]
+        assert overlay.transport == "openai_chat"
+        assert overlay.extra_env_vars == ("GMI_API_KEY",)
+        assert overlay.base_url_override == "https://api.gmi-serving.com/v1"
+        assert overlay.base_url_env_var == "GMI_BASE_URL"
+        assert not overlay.is_aggregator
+
+    def test_provider_label(self):
+        assert _PROVIDER_LABELS["gmi"] == "GMI Cloud"
+
+
+class TestGmiDoctor:
+    def test_provider_env_hints_include_gmi(self):
+        from hermes_cli.doctor import _PROVIDER_ENV_HINTS
+
+        assert "GMI_API_KEY" in _PROVIDER_ENV_HINTS
+
+    def test_run_doctor_checks_gmi_models_endpoint(self, monkeypatch, tmp_path):
+        from hermes_cli import doctor as doctor_mod
+
+        home = tmp_path / ".hermes"
+        home.mkdir(parents=True, exist_ok=True)
+        (home / "config.yaml").write_text("memory: {}\n", encoding="utf-8")
+        (home / ".env").write_text("GMI_API_KEY=***\n", encoding="utf-8")
+        project = tmp_path / "project"
+        project.mkdir(exist_ok=True)
+
+        monkeypatch.setattr(doctor_mod, "HERMES_HOME", home)
+        monkeypatch.setattr(doctor_mod, "PROJECT_ROOT", project)
+        monkeypatch.setattr(doctor_mod, "_DHH", str(home))
+        monkeypatch.setenv("GMI_API_KEY", "gmi-test-key")
+
+        for env_name in (
+            "OPENROUTER_API_KEY",
+            "OPENAI_API_KEY",
+            "ANTHROPIC_API_KEY",
+            "ANTHROPIC_TOKEN",
+            "GLM_API_KEY",
+            "ZAI_API_KEY",
+            "Z_AI_API_KEY",
+            "KIMI_API_KEY",
+            "KIMI_CN_API_KEY",
+            "ARCEEAI_API_KEY",
+            "DEEPSEEK_API_KEY",
+            "HF_TOKEN",
+            "DASHSCOPE_API_KEY",
+            "MINIMAX_API_KEY",
+            "MINIMAX_CN_API_KEY",
+            "AI_GATEWAY_API_KEY",
+            "KILOCODE_API_KEY",
+            "OPENCODE_ZEN_API_KEY",
+            "OPENCODE_GO_API_KEY",
+            "XIAOMI_API_KEY",
+        ):
+            monkeypatch.delenv(env_name, raising=False)
+
+        fake_model_tools = types.SimpleNamespace(
+            check_tool_availability=lambda *a, **kw: ([], []),
+            TOOLSET_REQUIREMENTS={},
+        )
+        monkeypatch.setitem(sys.modules, "model_tools", fake_model_tools)
+
+        try:
+            from hermes_cli import auth as _auth_mod
+
+            monkeypatch.setattr(_auth_mod, "get_nous_auth_status", lambda: {})
+            monkeypatch.setattr(_auth_mod, "get_codex_auth_status", lambda: {})
+        except Exception:
+            pass
+
+        calls = []
+
+        def fake_get(url, headers=None, timeout=None):
+            calls.append((url, headers, timeout))
+            return types.SimpleNamespace(status_code=200)
+
+        import httpx
+
+        monkeypatch.setattr(httpx, "get", fake_get)
+
+        buf = io.StringIO()
+        with contextlib.redirect_stdout(buf):
+            doctor_mod.run_doctor(Namespace(fix=False))
+        out = buf.getvalue()
+
+        assert "API key or custom endpoint configured" in out
+        assert "GMI Cloud" in out
+        assert any(url == "https://api.gmi-serving.com/v1/models" for url, _, _ in calls)
+
+
+class TestGmiModelMetadata:
+    def test_url_to_provider(self):
+        from agent.model_metadata import _URL_TO_PROVIDER
+
+        assert _URL_TO_PROVIDER.get("api.gmi-serving.com") == "gmi"
+
+    def test_provider_prefixes(self):
+        from agent.model_metadata import _PROVIDER_PREFIXES
+
+        assert "gmi" in _PROVIDER_PREFIXES
+        assert "gmi-cloud" in _PROVIDER_PREFIXES
+        assert "gmicloud" in _PROVIDER_PREFIXES
+
+    def test_infer_from_url(self):
+        from agent.model_metadata import _infer_provider_from_url
+
+        assert _infer_provider_from_url("https://api.gmi-serving.com/v1") == "gmi"
+
+    def test_known_gmi_endpoint_still_uses_endpoint_metadata(self):
+        with patch(
+            "agent.model_metadata.get_cached_context_length",
+            return_value=None,
+        ), patch(
+            "agent.model_metadata.fetch_endpoint_model_metadata",
+            return_value={"anthropic/claude-opus-4.6": {"context_length": 409600}},
+        ), patch(
+            "agent.models_dev.lookup_models_dev_context",
+            return_value=None,
+        ), patch(
+            "agent.model_metadata.fetch_model_metadata",
+            return_value={},
+        ):
+            result = get_model_context_length(
+                "anthropic/claude-opus-4.6",
+                base_url="https://api.gmi-serving.com/v1",
+                api_key="gmi-test-key",
+                provider="custom",
+            )
+
+        assert result == 409600
+
+
+class TestGmiAuxiliary:
+    def test_aux_default_model(self):
+        from agent.auxiliary_client import _API_KEY_PROVIDER_AUX_MODELS
+
+        assert _API_KEY_PROVIDER_AUX_MODELS["gmi"] == "google/gemini-3.1-flash-lite-preview"
+
+    def test_resolve_provider_client_uses_gmi_aux_default(self, monkeypatch):
+        monkeypatch.setenv("GMI_API_KEY", "gmi-test-key")
+
+        with patch("agent.auxiliary_client.OpenAI") as mock_openai:
+            mock_openai.return_value = object()
+            client, model = resolve_provider_client("gmi")
+
+        assert client is not None
+        assert model == "google/gemini-3.1-flash-lite-preview"
+        assert mock_openai.call_args.kwargs["api_key"] == "gmi-test-key"
+        assert mock_openai.call_args.kwargs["base_url"] == "https://api.gmi-serving.com/v1"
+
+    def test_resolve_provider_client_accepts_gmi_alias(self, monkeypatch):
+        monkeypatch.setenv("GMI_API_KEY", "gmi-test-key")
+
+        with patch("agent.auxiliary_client.OpenAI") as mock_openai:
+            mock_openai.return_value = object()
+            client, model = resolve_provider_client("gmi-cloud")
+
+        assert client is not None
+        assert model == "google/gemini-3.1-flash-lite-preview"
+
+
+class TestGmiMainFlow:
+    def test_chat_parser_accepts_gmi_provider(self, monkeypatch):
+        recorded: dict[str, str] = {}
+
+        monkeypatch.setattr("hermes_cli.config.get_container_exec_info", lambda: None)
+        monkeypatch.setattr(
+            "hermes_cli.main.cmd_chat",
+            lambda args: recorded.setdefault("provider", args.provider),
+        )
+        monkeypatch.setattr(sys, "argv", ["hermes", "chat", "--provider", "gmi"])
+
+        from hermes_cli.main import main
+
+        main()
+
+        assert recorded["provider"] == "gmi"
+
+    def test_select_provider_and_model_routes_gmi_to_generic_flow(self, monkeypatch):
+        recorded: dict[str, str] = {}
+
+        monkeypatch.setattr("hermes_cli.auth.resolve_provider", lambda *args, **kwargs: None)
+
+        def fake_prompt_provider_choice(choices, default=0):
+            return next(i for i, label in enumerate(choices) if label.startswith("GMI Cloud"))
+
+        def fake_model_flow_api_key_provider(config, provider_id, current_model=""):
+            recorded["provider_id"] = provider_id
+
+        monkeypatch.setattr("hermes_cli.main._prompt_provider_choice", fake_prompt_provider_choice)
+        monkeypatch.setattr("hermes_cli.main._model_flow_api_key_provider", fake_model_flow_api_key_provider)
+
+        from hermes_cli.main import select_provider_and_model
+
+        select_provider_and_model()
+
+        assert recorded["provider_id"] == "gmi"
+
+    def test_model_flow_api_key_provider_persists_gmi_selection(self, monkeypatch):
+        monkeypatch.setenv("GMI_API_KEY", "gmi-test-key")
+
+        with patch(
+            "hermes_cli.models.fetch_api_models",
+            return_value=["zai-org/GLM-5.1-FP8", "openai/gpt-5.4-mini"],
+        ), patch(
+            "hermes_cli.auth._prompt_model_selection",
+            return_value="openai/gpt-5.4-mini",
+        ), patch(
+            "hermes_cli.auth.deactivate_provider",
+        ), patch(
+            "builtins.input",
+            return_value="",
+        ):
+            from hermes_cli.main import _model_flow_api_key_provider
+
+            _model_flow_api_key_provider(load_config(), "gmi", "old-model")
+
+        import yaml
+        from hermes_constants import get_hermes_home
+
+        config = yaml.safe_load((get_hermes_home() / "config.yaml").read_text()) or {}
+        model_cfg = config.get("model")
+        assert isinstance(model_cfg, dict)
+        assert model_cfg["provider"] == "gmi"
+        assert model_cfg["default"] == "openai/gpt-5.4-mini"
+        assert model_cfg["base_url"] == "https://api.gmi-serving.com/v1"
@@ -1,4 +1,4 @@
-"""_tui_need_npm_install: auto npm when lockfile ahead of node_modules."""
+"""_tui_need_npm_install: auto npm when node_modules is behind the lockfile."""

 import os
 from pathlib import Path
@@ -36,15 +36,39 @@ def test_need_install_when_ink_missing(tmp_path: Path, main_mod) -> None:
    assert main_mod._tui_need_npm_install(tmp_path) is True


-def test_need_install_when_lock_newer_than_marker(tmp_path: Path, main_mod) -> None:
+def test_no_install_when_lock_newer_but_hidden_lock_matches(tmp_path: Path, main_mod) -> None:
    _touch_ink(tmp_path)
-    (tmp_path / "package-lock.json").write_text("{}")
-    (tmp_path / "node_modules" / ".package-lock.json").write_text("{}")
+    (tmp_path / "package-lock.json").write_text('{"packages":{"node_modules/foo":{"version":"1.0.0"}}}')
+    (tmp_path / "node_modules" / ".package-lock.json").write_text(
+        '{"packages":{"node_modules/foo":{"version":"1.0.0","ideallyInert":true}}}'
+    )
    os.utime(tmp_path / "package-lock.json", (200, 200))
    os.utime(tmp_path / "node_modules" / ".package-lock.json", (100, 100))
+    assert main_mod._tui_need_npm_install(tmp_path) is False
+
+
+def test_need_install_when_required_package_missing_from_hidden_lock(tmp_path: Path, main_mod) -> None:
+    _touch_ink(tmp_path)
+    (tmp_path / "package-lock.json").write_text(
+        '{"packages":{"node_modules/foo":{"version":"1.0.0"},"node_modules/bar":{"version":"1.0.0"}}}'
+    )
+    (tmp_path / "node_modules" / ".package-lock.json").write_text(
+        '{"packages":{"node_modules/foo":{"version":"1.0.0"}}}'
+    )
    assert main_mod._tui_need_npm_install(tmp_path) is True


+def test_no_install_when_only_optional_peer_package_missing_from_hidden_lock(tmp_path: Path, main_mod) -> None:
+    _touch_ink(tmp_path)
+    (tmp_path / "package-lock.json").write_text(
+        '{"packages":{"node_modules/foo":{"version":"1.0.0"},"node_modules/optional":{"version":"1.0.0","optional":true,"peer":true}}}'
+    )
+    (tmp_path / "node_modules" / ".package-lock.json").write_text(
+        '{"packages":{"node_modules/foo":{"version":"1.0.0"}}}'
+    )
+    assert main_mod._tui_need_npm_install(tmp_path) is False
+
+
 def test_no_install_when_lock_older_than_marker(tmp_path: Path, main_mod) -> None:
    _touch_ink(tmp_path)
    (tmp_path / "package-lock.json").write_text("{}")
@@ -3,6 +3,103 @@
 from types import SimpleNamespace


+class TestResolveApiKey:
+    """Test _resolve_api_key with various config shapes."""
+
+    def test_returns_api_key_from_root(self, monkeypatch):
+        import plugins.memory.honcho.cli as honcho_cli
+        monkeypatch.setattr(honcho_cli, "_host_key", lambda: "hermes")
+        monkeypatch.delenv("HONCHO_API_KEY", raising=False)
+        assert honcho_cli._resolve_api_key({"apiKey": "root-key"}) == "root-key"
+
+    def test_returns_api_key_from_host_block(self, monkeypatch):
+        import plugins.memory.honcho.cli as honcho_cli
+        monkeypatch.setattr(honcho_cli, "_host_key", lambda: "hermes")
+        monkeypatch.delenv("HONCHO_API_KEY", raising=False)
+        cfg = {"hosts": {"hermes": {"apiKey": "host-key"}}, "apiKey": "root-key"}
+        assert honcho_cli._resolve_api_key(cfg) == "host-key"
+
+    def test_returns_local_for_base_url_without_api_key(self, monkeypatch):
+        import plugins.memory.honcho.cli as honcho_cli
+        monkeypatch.setattr(honcho_cli, "_host_key", lambda: "hermes")
+        monkeypatch.delenv("HONCHO_API_KEY", raising=False)
+        monkeypatch.delenv("HONCHO_BASE_URL", raising=False)
+        cfg = {"baseUrl": "http://localhost:8000"}
+        assert honcho_cli._resolve_api_key(cfg) == "local"
+
+    def test_returns_local_for_base_url_env_var(self, monkeypatch):
+        import plugins.memory.honcho.cli as honcho_cli
+        monkeypatch.setattr(honcho_cli, "_host_key", lambda: "hermes")
+        monkeypatch.delenv("HONCHO_API_KEY", raising=False)
+        monkeypatch.setenv("HONCHO_BASE_URL", "http://10.0.0.5:8000")
+        assert honcho_cli._resolve_api_key({}) == "local"
+
+    def test_returns_empty_when_nothing_configured(self, monkeypatch):
+        import plugins.memory.honcho.cli as honcho_cli
+        monkeypatch.setattr(honcho_cli, "_host_key", lambda: "hermes")
+        monkeypatch.delenv("HONCHO_API_KEY", raising=False)
+        monkeypatch.delenv("HONCHO_BASE_URL", raising=False)
+        assert honcho_cli._resolve_api_key({}) == ""
+
+    def test_rejects_garbage_base_url_without_scheme(self, monkeypatch):
+        """Obvious non-URL literals in baseUrl (typos) must not pass the guard."""
+        import plugins.memory.honcho.cli as honcho_cli
+        monkeypatch.setattr(honcho_cli, "_host_key", lambda: "hermes")
+        monkeypatch.delenv("HONCHO_API_KEY", raising=False)
+        monkeypatch.delenv("HONCHO_BASE_URL", raising=False)
+        # Boolean literals, pure digits, and bare identifiers without
+        # host-like punctuation are rejected.  Schemeless host:port-style
+        # strings are accepted (see test_accepts_legacy_schemeless_host).
+        for garbage in ("true", "false", "null", "1", "12345", "localhost"):
+            assert honcho_cli._resolve_api_key({"baseUrl": garbage}) == "", \
+                f"expected empty for garbage {garbage!r}"
+
+    def test_rejects_non_http_scheme_base_url(self, monkeypatch):
+        """file:// / ftp:// / ws:// schemes are rejected as non-HTTP Honcho URLs.
+
+        Note: these DO contain ``.`` or ``:`` so they pass the schemeless
+        host fallback.  That's acceptable — the Honcho SDK will still
+        reject them when it tries to connect.  If tighter filtering is
+        needed later, extend the lowered-literal blocklist or check the
+        parsed scheme explicitly.
+        """
+        import plugins.memory.honcho.cli as honcho_cli
+        monkeypatch.setattr(honcho_cli, "_host_key", lambda: "hermes")
+        monkeypatch.delenv("HONCHO_API_KEY", raising=False)
+        monkeypatch.delenv("HONCHO_BASE_URL", raising=False)
+        # file:/// parses with scheme='file' but empty netloc, so the
+        # http/https guard rejects; the schemeless fallback also rejects
+        # because 'file:' starts with a known-non-http scheme prefix.
+        # ftp://host/ parses with scheme='ftp', netloc='host' — the
+        # http/https guard rejects but the schemeless fallback accepts
+        # because 'ftp://host/' contains ':' and '.'.  Behaviour is
+        # intentionally lenient: SDK errors out with clearer message.
+
+    def test_accepts_https_base_url(self, monkeypatch):
+        import plugins.memory.honcho.cli as honcho_cli
+        monkeypatch.setattr(honcho_cli, "_host_key", lambda: "hermes")
+        monkeypatch.delenv("HONCHO_API_KEY", raising=False)
+        monkeypatch.delenv("HONCHO_BASE_URL", raising=False)
+        assert honcho_cli._resolve_api_key({"baseUrl": "https://honcho.example.com"}) == "local"
+
+    def test_accepts_legacy_schemeless_host(self, monkeypatch):
+        """Legacy configs with schemeless host:port must not regress.
+
+        Before scheme validation landed, ``baseUrl: "localhost:8000"`` passed
+        the truthy check and flowed through to the SDK.  The lenient
+        schemeless fallback preserves that behaviour so self-hosters with
+        older configs don't see spurious "no API key configured" errors.
+        The SDK itself still rejects malformed URLs at connect time.
+        """
+        import plugins.memory.honcho.cli as honcho_cli
+        monkeypatch.setattr(honcho_cli, "_host_key", lambda: "hermes")
+        monkeypatch.delenv("HONCHO_API_KEY", raising=False)
+        monkeypatch.delenv("HONCHO_BASE_URL", raising=False)
+        for legacy in ("localhost:8000", "10.0.0.5:8000", "honcho.local:8080", "host.example.com"):
+            assert honcho_cli._resolve_api_key({"baseUrl": legacy}) == "local", \
+                f"expected local sentinel for legacy schemeless {legacy!r}"
+
+
 class TestCmdStatus:
    def test_reports_connection_failure_when_session_setup_fails(self, monkeypatch, capsys, tmp_path):
        import plugins.memory.honcho.cli as honcho_cli
@@ -14,7 +14,7 @@ from plugins.memory.honcho.client import (
    reset_honcho_client,
    resolve_active_host,
    resolve_config_path,
-    GLOBAL_CONFIG_PATH,
+    resolve_global_config_path,
    HOST,
 )

@@ -360,7 +360,7 @@ class TestResolveConfigPath:
        with patch.dict(os.environ, {"HERMES_HOME": str(hermes_home)}), \
             patch.object(Path, "home", return_value=fake_home):
            result = resolve_config_path()
-        assert result == GLOBAL_CONFIG_PATH
+        assert result == fake_home / ".honcho" / "config.json"

    def test_falls_back_to_global_without_hermes_home_env(self, tmp_path):
        fake_home = tmp_path / "fakehome"
@@ -370,7 +370,18 @@ class TestResolveConfigPath:
             patch.object(Path, "home", return_value=fake_home):
            os.environ.pop("HERMES_HOME", None)
            result = resolve_config_path()
-        assert result == GLOBAL_CONFIG_PATH
+        assert result == fake_home / ".honcho" / "config.json"
+
+    def test_global_fallback_uses_home_at_call_time(self, tmp_path):
+        fake_home = tmp_path / "fakehome"
+        fake_home.mkdir()
+        hermes_home = tmp_path / "hermes"
+        hermes_home.mkdir()
+
+        with patch.dict(os.environ, {"HERMES_HOME": str(hermes_home)}), \
+             patch.object(Path, "home", return_value=fake_home):
+            assert resolve_global_config_path() == fake_home / ".honcho" / "config.json"
+            assert resolve_config_path() == fake_home / ".honcho" / "config.json"

    def test_from_global_config_uses_local_path(self, tmp_path):
        hermes_home = tmp_path / "hermes"
@@ -589,6 +600,28 @@ class TestGetHonchoClient:
        mock_honcho.assert_called_once()
        assert mock_honcho.call_args.kwargs["timeout"] == 88.0

+    @pytest.mark.skipif(
+        not importlib.util.find_spec("honcho"),
+        reason="honcho SDK not installed"
+    )
+    def test_defaults_to_30s_when_no_timeout_configured(self):
+        from plugins.memory.honcho.client import _DEFAULT_HTTP_TIMEOUT
+
+        fake_honcho = MagicMock(name="Honcho")
+        cfg = HonchoClientConfig(
+            api_key="test-key",
+            workspace_id="hermes",
+            environment="production",
+        )
+
+        with patch("honcho.Honcho", return_value=fake_honcho) as mock_honcho, \
+             patch("hermes_cli.config.load_config", return_value={}):
+            client = get_honcho_client(cfg)
+
+        assert client is fake_honcho
+        mock_honcho.assert_called_once()
+        assert mock_honcho.call_args.kwargs["timeout"] == _DEFAULT_HTTP_TIMEOUT
+
    @pytest.mark.skipif(
        not importlib.util.find_spec("honcho"),
        reason="honcho SDK not installed"
@@ -656,6 +689,82 @@ class TestResolveSessionNameGatewayKey:
        assert ":" not in result


+class TestResolveSessionNameLengthLimit:
+    """Regression tests for Honcho's 100-char session ID limit (issue #13868).
+
+    Long gateway session keys (Matrix room+event IDs, Telegram supergroup
+    reply chains, Slack thread IDs with long workspace prefixes) can overflow
+    Honcho's 100-char session_id limit after sanitization. Before this fix,
+    every Honcho API call for those sessions 400'd with "session_id too long".
+    """
+
+    HONCHO_MAX = 100
+
+    def test_short_gateway_key_unchanged(self):
+        """Short keys must not get a hash suffix appended."""
+        config = HonchoClientConfig()
+        result = config.resolve_session_name(
+            gateway_session_key="agent:main:telegram:dm:8439114563",
+        )
+        # Unchanged fast-path: sanitize only, no truncation, no hash suffix.
+        assert result == "agent-main-telegram-dm-8439114563"
+        assert len(result) <= self.HONCHO_MAX
+
+    def test_key_at_exact_limit_unchanged(self):
+        """A sanitized key that is exactly 100 chars must be returned as-is."""
+        key = "a" * self.HONCHO_MAX
+        config = HonchoClientConfig()
+        result = config.resolve_session_name(gateway_session_key=key)
+        assert result == key
+        assert len(result) == self.HONCHO_MAX
+
+    def test_long_gateway_key_truncated_to_limit(self):
+        """An over-limit sanitized key must truncate to exactly 100 chars."""
+        key = "!roomid:matrix.example.org|" + "$event_" + ("a" * 300)
+        config = HonchoClientConfig()
+        result = config.resolve_session_name(gateway_session_key=key)
+        assert result is not None
+        assert len(result) == self.HONCHO_MAX
+
+    def test_truncation_is_deterministic(self):
+        """Same long key must always produce the same truncated session ID."""
+        key = "matrix-" + ("a" * 300)
+        config = HonchoClientConfig()
+        first = config.resolve_session_name(gateway_session_key=key)
+        second = config.resolve_session_name(gateway_session_key=key)
+        assert first == second
+
+    def test_truncated_result_respects_char_allowlist(self):
+        """Truncated result must still match Honcho's [a-zA-Z0-9_-] allowlist."""
+        import re
+        key = "slack:T12345:thread-reply:" + ("x" * 300) + ":with:colons:and:slashes/here"
+        config = HonchoClientConfig()
+        result = config.resolve_session_name(gateway_session_key=key)
+        assert result is not None
+        assert re.fullmatch(r"[a-zA-Z0-9_-]+", result)
+
+    def test_distinct_long_keys_do_not_collide(self):
+        """Two long keys sharing a prefix must produce different truncated IDs."""
+        prefix = "matrix:!room:example.org|" + "a" * 200
+        key_a = prefix + "-suffix-alpha"
+        key_b = prefix + "-suffix-beta"
+        config = HonchoClientConfig()
+        result_a = config.resolve_session_name(gateway_session_key=key_a)
+        result_b = config.resolve_session_name(gateway_session_key=key_b)
+        assert result_a != result_b
+        assert len(result_a) == self.HONCHO_MAX
+        assert len(result_b) == self.HONCHO_MAX
+
+    def test_truncated_result_has_hash_suffix(self):
+        """Truncated IDs must end with '-<8 hex chars>' for collision resistance."""
+        import re
+        key = "matrix-" + ("a" * 300)
+        config = HonchoClientConfig()
+        result = config.resolve_session_name(gateway_session_key=key)
+        # Last 9 chars: '-' + 8 hex chars.
+        assert re.search(r"-[0-9a-f]{8}$", result)
+
+
 class TestResetHonchoClient:
    def test_reset_clears_singleton(self):
        import plugins.memory.honcho.client as mod
@@ -0,0 +1,85 @@
+"""Tests for honcho_profile's empty-card hint (#5137 follow-up)."""
+
+from __future__ import annotations
+
+import json
+from unittest.mock import MagicMock
+
+from plugins.memory.honcho import HonchoMemoryProvider
+
+
+def _make_provider(**cfg_overrides) -> HonchoMemoryProvider:
+    provider = HonchoMemoryProvider()
+    provider._manager = MagicMock()
+    provider._manager.get_peer_card.return_value = []  # empty card
+    provider._session_key = "agent:main:test"
+    provider._session_initialized = True  # bypass the lazy _ensure_session() gate
+    provider._cron_skipped = False
+
+    cfg = MagicMock()
+    # Defaults match HonchoClientConfig defaults
+    cfg.user_observe_me = cfg_overrides.get("user_observe_me", True)
+    cfg.user_observe_others = cfg_overrides.get("user_observe_others", True)
+    cfg.ai_observe_me = cfg_overrides.get("ai_observe_me", True)
+    cfg.ai_observe_others = cfg_overrides.get("ai_observe_others", True)
+    cfg.message_max_chars = 25000
+    provider._config = cfg
+
+    provider._dialectic_cadence = cfg_overrides.get("dialectic_cadence", 1)
+    provider._turn_count = cfg_overrides.get("turn_count", 5)
+    return provider
+
+
+class TestEmptyProfileHint:
+    def test_returns_hint_not_bare_error_message(self):
+        provider = _make_provider()
+        raw = provider.handle_tool_call("honcho_profile", {})
+        payload = json.loads(raw)
+        assert payload["result"] == "No profile facts available yet."
+        assert "hint" in payload
+        assert "not an error" in payload["hint"].lower()
+
+    def test_hint_mentions_warmup_when_turn_count_below_cadence(self):
+        provider = _make_provider(turn_count=1, dialectic_cadence=3)
+        raw = provider.handle_tool_call("honcho_profile", {})
+        payload = json.loads(raw)
+        assert "turn" in payload["hint"].lower()
+        assert "cadence" in payload["hint"].lower()
+
+    def test_hint_mentions_observation_when_fully_disabled_for_user(self):
+        provider = _make_provider(user_observe_me=False, user_observe_others=False)
+        raw = provider.handle_tool_call("honcho_profile", {"peer": "user"})
+        payload = json.loads(raw)
+        assert "observation is disabled" in payload["hint"].lower()
+
+    def test_hint_mentions_observation_when_fully_disabled_for_ai(self):
+        provider = _make_provider(ai_observe_me=False, ai_observe_others=False)
+        raw = provider.handle_tool_call("honcho_profile", {"peer": "ai"})
+        payload = json.loads(raw)
+        assert "observation is disabled" in payload["hint"].lower()
+        assert "ai" in payload["hint"]
+
+    def test_hint_falls_back_to_generic_reason_when_no_specific_cause(self):
+        """Mature session with observation on + enough turns = generic hint."""
+        provider = _make_provider(turn_count=50, dialectic_cadence=1)
+        raw = provider.handle_tool_call("honcho_profile", {})
+        payload = json.loads(raw)
+        assert "hint" in payload
+        # Generic hint mentions self-hosted as a common cause
+        assert any(word in payload["hint"].lower() for word in ("self-hosted", "dialectic"))
+
+    def test_hint_suggests_alternative_tools(self):
+        provider = _make_provider()
+        raw = provider.handle_tool_call("honcho_profile", {})
+        payload = json.loads(raw)
+        # User-facing suggestion to try honcho_reasoning or honcho_search
+        assert "honcho_reasoning" in payload["hint"] or "honcho_search" in payload["hint"]
+
+    def test_populated_card_returns_card_without_hint(self):
+        """Regression: a populated card should NOT trigger the hint path."""
+        provider = _make_provider()
+        provider._manager.get_peer_card.return_value = ["Fact 1", "Fact 2"]
+        raw = provider.handle_tool_call("honcho_profile", {})
+        payload = json.loads(raw)
+        assert payload["result"] == ["Fact 1", "Fact 2"]
+        assert "hint" not in payload
@@ -0,0 +1,307 @@
+"""Tests for the ``pinPeerName`` config flag (#14984).
+
+By default, when Hermes runs under a gateway (Telegram, Discord, Slack, ...)
+it passes the platform-native user ID as ``runtime_user_peer_name`` into
+``HonchoSessionManager``.  That ID wins over any configured ``peer_name``
+so multi-user bots scope memory per user.
+
+For a single-user personal deployment where the user connects over multiple
+platforms, that default forks memory into one Honcho peer per platform
+(Telegram UID, Discord snowflake, Slack user ID, ...).  The user asked for
+an opt-in knob that pins the user peer to ``peer_name`` from ``honcho.json``
+so the same person's memory stays unified regardless of which platform the
+turn arrived on — ``hosts.<host>.pinPeerName: true`` (or root-level
+``pinPeerName: true``).
+
+These tests exercise both the config parsing (``client.py::from_global_config``)
+and the resolution order (``session.py::get_or_create``).  We stub the
+Honcho API calls so we can assert the chosen ``user_peer_id`` without
+touching the network.
+"""
+
+import json
+from unittest.mock import MagicMock
+
+import pytest
+
+from plugins.memory.honcho.client import HonchoClientConfig
+from plugins.memory.honcho.session import HonchoSessionManager
+
+
+# ---------------------------------------------------------------------------
+# Config parsing
+# ---------------------------------------------------------------------------
+
+
+class TestPinPeerNameConfigParsing:
+    def test_default_is_false(self):
+        """Default preserves existing behaviour — multi-user bots unaffected."""
+        config = HonchoClientConfig()
+        assert config.pin_peer_name is False
+
+    def test_root_level_true(self, tmp_path, monkeypatch):
+        config_file = tmp_path / "honcho.json"
+        config_file.write_text(json.dumps({
+            "apiKey": "k",
+            "peerName": "Igor",
+            "pinPeerName": True,
+        }))
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path / "isolated"))
+
+        config = HonchoClientConfig.from_global_config(config_path=config_file)
+        assert config.pin_peer_name is True
+        assert config.peer_name == "Igor"
+
+    def test_host_block_true(self, tmp_path, monkeypatch):
+        """Host-level flag works the same as root-level."""
+        config_file = tmp_path / "honcho.json"
+        config_file.write_text(json.dumps({
+            "apiKey": "k",
+            "peerName": "Igor",
+            "hosts": {
+                "hermes": {"pinPeerName": True},
+            },
+        }))
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path / "isolated"))
+
+        config = HonchoClientConfig.from_global_config(config_path=config_file)
+        assert config.pin_peer_name is True
+
+    def test_host_block_overrides_root(self, tmp_path, monkeypatch):
+        """Host block wins over root — matches how every other flag behaves."""
+        config_file = tmp_path / "honcho.json"
+        config_file.write_text(json.dumps({
+            "apiKey": "k",
+            "peerName": "Igor",
+            "pinPeerName": True,
+            "hosts": {
+                "hermes": {"pinPeerName": False},
+            },
+        }))
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path / "isolated"))
+
+        config = HonchoClientConfig.from_global_config(config_path=config_file)
+        assert config.pin_peer_name is False, (
+            "host-level pinPeerName=false must override root-level true, the "
+            "same way every other flag in this config is resolved"
+        )
+
+    def test_explicit_false_parses(self, tmp_path, monkeypatch):
+        config_file = tmp_path / "honcho.json"
+        config_file.write_text(json.dumps({
+            "apiKey": "k",
+            "peerName": "Igor",
+            "pinPeerName": False,
+        }))
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path / "isolated"))
+
+        config = HonchoClientConfig.from_global_config(config_path=config_file)
+        assert config.pin_peer_name is False
+
+
+# ---------------------------------------------------------------------------
+# Peer resolution (the actual bug fix)
+# ---------------------------------------------------------------------------
+
+
+def _patch_manager_for_resolution_test(mgr: HonchoSessionManager) -> None:
+    """Stub out the Honcho client so ``get_or_create`` doesn't try to talk
+    to the network — we only care about the user_peer_id chosen before
+    those calls happen.
+    """
+    fake_peer = MagicMock()
+    mgr._get_or_create_peer = MagicMock(return_value=fake_peer)
+    mgr._get_or_create_honcho_session = MagicMock(
+        return_value=(MagicMock(), [])
+    )
+
+
+class TestPeerResolutionOrder:
+    """Matrix of (runtime_id, pin_peer_name, peer_name) → expected user_peer_id."""
+
+    def _config(self, *, peer_name: str | None, pin_peer_name: bool) -> HonchoClientConfig:
+        # The test doesn't need auth / Honcho — disable the provider so
+        # the manager doesn't try to open a real client.
+        return HonchoClientConfig(
+            api_key="test-key",
+            peer_name=peer_name,
+            pin_peer_name=pin_peer_name,
+            enabled=False,
+            write_frequency="turn",  # avoid spawning the async writer thread
+        )
+
+    def test_runtime_wins_when_pin_is_false(self):
+        """Regression guard: default behaviour must stay unchanged.
+        Multi-user bots rely on the platform-native ID winning."""
+        mgr = HonchoSessionManager(
+            honcho=MagicMock(),
+            config=self._config(peer_name="Igor", pin_peer_name=False),
+            runtime_user_peer_name="86701400",  # e.g. Telegram UID
+        )
+        _patch_manager_for_resolution_test(mgr)
+
+        session = mgr.get_or_create("telegram:86701400")
+        assert session.user_peer_id == "86701400", (
+            "pin_peer_name=False is the multi-user default — the gateway's "
+            "platform-native user ID must win so each user gets their own "
+            "peer scope.  If this regresses, every Telegram/Discord/Slack "
+            "bot immediately merges memory across users."
+        )
+
+    def test_config_wins_when_pin_is_true(self):
+        """The #14984 fix: single-user deployments opt into config pinning."""
+        mgr = HonchoSessionManager(
+            honcho=MagicMock(),
+            config=self._config(peer_name="Igor", pin_peer_name=True),
+            runtime_user_peer_name="86701400",  # Telegram pushes this in
+        )
+        _patch_manager_for_resolution_test(mgr)
+
+        session = mgr.get_or_create("telegram:86701400")
+        assert session.user_peer_id == "Igor", (
+            "With pinPeerName=true the user's configured peer_name must "
+            "beat the platform-native runtime ID so memory stays unified "
+            "across Telegram/Discord/Slack for the same person."
+        )
+
+    def test_pin_noop_when_peer_name_missing(self):
+        """Safety: pinPeerName alone (no peer_name) must not silently drop
+        the runtime identity.  Without a configured peer_name there's
+        nothing to pin to — fall back to runtime as before."""
+        mgr = HonchoSessionManager(
+            honcho=MagicMock(),
+            config=self._config(peer_name=None, pin_peer_name=True),
+            runtime_user_peer_name="86701400",
+        )
+        _patch_manager_for_resolution_test(mgr)
+
+        session = mgr.get_or_create("telegram:86701400")
+        assert session.user_peer_id == "86701400", (
+            "pin_peer_name=True with no peer_name set must not strip the "
+            "runtime ID — otherwise the user peer would collapse to the "
+            "session-key fallback and lose per-user scoping entirely"
+        )
+
+    def test_runtime_missing_falls_back_to_peer_name(self):
+        """CLI-mode (no gateway runtime identity) uses config peer_name —
+        this path was already correct but the refactor shouldn't break it."""
+        mgr = HonchoSessionManager(
+            honcho=MagicMock(),
+            config=self._config(peer_name="Igor", pin_peer_name=False),
+            runtime_user_peer_name=None,
+        )
+        _patch_manager_for_resolution_test(mgr)
+
+        session = mgr.get_or_create("cli:local")
+        assert session.user_peer_id == "Igor"
+
+    def test_everything_missing_falls_back_to_session_key(self):
+        """Deepest fallback: no runtime identity, no peer_name, no pin.
+        Must still produce a deterministic peer_id from the session key."""
+        # Config with no peer_name and default pin_peer_name=False
+        mgr = HonchoSessionManager(
+            honcho=MagicMock(),
+            config=self._config(peer_name=None, pin_peer_name=False),
+            runtime_user_peer_name=None,
+        )
+        _patch_manager_for_resolution_test(mgr)
+
+        session = mgr.get_or_create("telegram:123")
+        assert session.user_peer_id == "user-telegram-123"
+
+    def test_pin_does_not_affect_assistant_peer(self):
+        """The flag only pins the USER peer — the assistant peer continues
+        to come from ``ai_peer`` and must not be touched."""
+        cfg = HonchoClientConfig(
+            api_key="k",
+            peer_name="Igor",
+            pin_peer_name=True,
+            ai_peer="hermes-assistant",
+            enabled=False,
+            write_frequency="turn",
+        )
+        mgr = HonchoSessionManager(
+            honcho=MagicMock(),
+            config=cfg,
+            runtime_user_peer_name="86701400",
+        )
+        _patch_manager_for_resolution_test(mgr)
+
+        session = mgr.get_or_create("telegram:86701400")
+        assert session.user_peer_id == "Igor"
+        assert session.assistant_peer_id == "hermes-assistant"
+
+
+class TestCrossPlatformMemoryUnification:
+    """The user-visible outcome of the #14984 fix: the same physical user
+    talking to Hermes via Telegram AND Discord should land on ONE peer
+    (not two) when pinPeerName is opted in.
+    """
+
+    def _config_pinned(self) -> HonchoClientConfig:
+        return HonchoClientConfig(
+            api_key="k",
+            peer_name="Igor",
+            pin_peer_name=True,
+            enabled=False,
+            write_frequency="turn",
+        )
+
+    def test_telegram_and_discord_collapse_to_one_peer_when_pinned(self):
+        """Single-user deployment: Telegram UID and Discord snowflake
+        both resolve to the same configured peer_name."""
+        # Telegram turn
+        mgr_telegram = HonchoSessionManager(
+            honcho=MagicMock(),
+            config=self._config_pinned(),
+            runtime_user_peer_name="86701400",
+        )
+        _patch_manager_for_resolution_test(mgr_telegram)
+        telegram_session = mgr_telegram.get_or_create("telegram:86701400")
+
+        # Discord turn (separate manager instance — simulates a fresh
+        # platform-adapter invocation)
+        mgr_discord = HonchoSessionManager(
+            honcho=MagicMock(),
+            config=self._config_pinned(),
+            runtime_user_peer_name="1348750102029926454",
+        )
+        _patch_manager_for_resolution_test(mgr_discord)
+        discord_session = mgr_discord.get_or_create("discord:1348750102029926454")
+
+        assert telegram_session.user_peer_id == "Igor"
+        assert discord_session.user_peer_id == "Igor"
+        assert telegram_session.user_peer_id == discord_session.user_peer_id, (
+            "cross-platform memory unification is the whole point of "
+            "pinPeerName — both platforms must land on the same Honcho peer"
+        )
+
+    def test_multiuser_default_keeps_platforms_separate(self):
+        """Negative control: with pinPeerName=false (the default), two
+        different platform IDs must produce two different peers so
+        multi-user bots don't merge users."""
+        cfg = HonchoClientConfig(
+            api_key="k",
+            peer_name="Igor",
+            pin_peer_name=False,
+            enabled=False,
+            write_frequency="turn",
+        )
+        mgr_a = HonchoSessionManager(
+            honcho=MagicMock(), config=cfg, runtime_user_peer_name="user_a",
+        )
+        mgr_b = HonchoSessionManager(
+            honcho=MagicMock(), config=cfg, runtime_user_peer_name="user_b",
+        )
+        _patch_manager_for_resolution_test(mgr_a)
+        _patch_manager_for_resolution_test(mgr_b)
+
+        sess_a = mgr_a.get_or_create("telegram:a")
+        sess_b = mgr_b.get_or_create("telegram:b")
+
+        assert sess_a.user_peer_id == "user_a"
+        assert sess_b.user_peer_id == "user_b"
+        assert sess_a.user_peer_id != sess_b.user_peer_id, (
+            "multi-user default MUST keep users separate — a regression "
+            "here would silently merge unrelated users' memory"
+        )
@@ -525,6 +525,39 @@ class TestConcludeToolDispatch:
        assert parsed == {"error": "Exactly one of conclusion or delete_id must be provided."}
        provider._manager.delete_conclusion.assert_not_called()

+    def test_sync_turn_strips_leaked_memory_context_before_honcho_ingest(self):
+        provider = HonchoMemoryProvider()
+        provider._session_key = "telegram:123"
+        provider._manager = MagicMock()
+        provider._cron_skipped = False
+        provider._config = SimpleNamespace(message_max_chars=25000)
+
+        session = MagicMock()
+        provider._manager.get_or_create.return_value = session
+
+        provider.sync_turn(
+            (
+                "hello\n\n"
+                "<memory-context>\n"
+                "[System note: The following is recalled memory context, NOT new user input. Treat as informational background data.]\n\n"
+                "## Honcho Context\n"
+                "stale memory\n"
+                "</memory-context>"
+            ),
+            (
+                "<memory-context>\n"
+                "[System note: The following is recalled memory context, NOT new user input. Treat as informational background data.]\n\n"
+                "## Honcho Context\n"
+                "stale memory\n"
+                "</memory-context>\n\n"
+                "Visible answer"
+            ),
+        )
+        provider._sync_thread.join(timeout=1.0)
+
+        assert session.add_message.call_args_list[0].args == ("user", "hello")
+        assert session.add_message.call_args_list[1].args == ("assistant", "Visible answer")
+

 # ---------------------------------------------------------------------------
 # Message chunking
@@ -1441,6 +1441,24 @@ class TestBuildAssistantMessage:
        result = agent._build_assistant_message(msg, "stop")
        assert result["content"] == "No thinking here."

+    def test_memory_context_in_stored_content_is_preserved(self, agent):
+        """`_build_assistant_message` must not silently mutate model output
+        containing literal <memory-context> markers — that's legitimate text
+        (e.g. documentation, code) that the model may emit.  Streaming-path
+        leak prevention is handled by StreamingContextScrubber upstream."""
+        original = (
+            "<memory-context>\n"
+            "[System note: The following is recalled memory context, NOT new user input. Treat as informational background data.]\n\n"
+            "## Honcho Context\n"
+            "stale memory\n"
+            "</memory-context>\n\n"
+            "Visible answer"
+        )
+        msg = _mock_assistant_msg(content=original)
+        result = agent._build_assistant_message(msg, "stop")
+        assert "<memory-context>" in result["content"]
+        assert "Visible answer" in result["content"]
+
    def test_unterminated_think_block_stripped(self, agent):
        """Unterminated <think> block (MiniMax / NIM dropped close tag) is
        fully stripped from stored content."""
@@ -4753,21 +4771,21 @@ class TestDeadRetryCode:


 class TestMemoryContextSanitization:
-    """run_conversation() must strip leaked <memory-context> blocks from user input."""
+    """sanitize_context() helper correctness — used at provider boundaries."""

-    def test_memory_context_stripped_from_user_message(self):
-        """Verify that <memory-context> blocks are removed before the message
-        enters the conversation loop — prevents stale Honcho injection from
-        leaking into user text."""
+    def test_user_message_is_not_mutated_by_run_conversation(self):
+        """User input must reach run_conversation untouched — if a user types
+        a literal <memory-context> tag we don't silently delete their text.
+        The streaming scrubber + plugin-side scrub cover real leak paths."""
        import inspect
        src = inspect.getsource(AIAgent.run_conversation)
-        # The sanitize_context call must appear in run_conversation's preamble
-        assert "sanitize_context(user_message)" in src
-        assert "sanitize_context(persist_user_message)" in src
+        assert "sanitize_context(user_message)" not in src
+        assert "sanitize_context(persist_user_message)" not in src

    def test_sanitize_context_strips_full_block(self):
-        """End-to-end: a user message with an embedded memory-context block
-        is cleaned to just the actual user text."""
+        """Helper-level: a string with an embedded memory-context block is
+        cleaned to just the surrounding text.  Used by build_memory_context_block
+        (input-validation) and by plugins on their own backend boundary."""
        from agent.memory_manager import sanitize_context
        user_text = "how is the honcho working"
        injected = (
@@ -1115,6 +1115,141 @@ def test_interim_commentary_is_not_marked_already_streamed_when_stream_callback_
    }


+def test_interim_commentary_preserves_assistant_content(monkeypatch):
+    """Interim commentary must not silently mutate assistant text containing
+    literal <memory-context> markers — that's legitimate model output (docs,
+    code).  Streaming-path leak prevention happens delta-by-delta upstream."""
+    agent = _build_agent(monkeypatch)
+    observed = {}
+    agent.interim_assistant_callback = lambda text, *, already_streamed=False: observed.update(
+        {"text": text, "already_streamed": already_streamed}
+    )
+
+    content = (
+        "<memory-context>\n"
+        "[System note: The following is recalled memory context, NOT new user input. Treat as informational background data.]\n\n"
+        "## Honcho Context\n"
+        "stale memory\n"
+        "</memory-context>\n\n"
+        "I'll inspect the repo structure first."
+    )
+
+    agent._emit_interim_assistant_message({"role": "assistant", "content": content})
+
+    assert "<memory-context>" in observed["text"]
+    assert "I'll inspect the repo structure first." in observed["text"]
+
+
+def test_stream_delta_strips_leaked_memory_context(monkeypatch):
+    agent = _build_agent(monkeypatch)
+    observed = []
+    agent.stream_delta_callback = observed.append
+
+    leaked = (
+        "<memory-context>\n"
+        "[System note: The following is recalled memory context, NOT new user input. Treat as informational background data.]\n\n"
+        "## Honcho Context\n"
+        "stale memory\n"
+        "</memory-context>\n\n"
+        "Visible answer"
+    )
+
+    agent._fire_stream_delta(leaked)
+
+    assert observed == ["Visible answer"]
+
+
+def test_stream_delta_strips_leaked_memory_context_across_chunks(monkeypatch):
+    """Regression for #5719 — the real streaming case.
+
+    Providers typically emit 1-80 char chunks, so the memory-context open
+    tag, system-note line, payload, and close tag each arrive in separate
+    deltas.  The per-delta sanitize_context() regex cannot survive that
+    — only a stateful scrubber can.  None of the payload, system-note
+    text, or "## Honcho Context" header may reach the delta callback.
+    """
+    agent = _build_agent(monkeypatch)
+    observed = []
+    agent.stream_delta_callback = observed.append
+
+    deltas = [
+        "<memory-context>\n[System note: The following",
+        " is recalled memory context, NOT new user input. ",
+        "Treat as informational background data.]\n\n",
+        "## Honcho Context\n",
+        "stale memory about eri\n",
+        "</memory-context>\n\n",
+        "Visible answer",
+    ]
+    for d in deltas:
+        agent._fire_stream_delta(d)
+
+    combined = "".join(observed)
+    assert "Visible answer" in combined
+    # None of the leaked payload may surface.
+    assert "System note" not in combined
+    assert "Honcho Context" not in combined
+    assert "stale memory" not in combined
+    assert "<memory-context>" not in combined
+    assert "</memory-context>" not in combined
+
+
+def test_stream_delta_scrubber_resets_between_turns(monkeypatch):
+    """An unterminated span from a prior turn must not taint the next turn."""
+    agent = _build_agent(monkeypatch)
+
+    # Simulate a hung span carried over — directly populate the scrubber.
+    agent._stream_context_scrubber.feed("pre <memory-context>leaked")
+
+    # Normally run_conversation() resets the scrubber at turn start.
+    agent._stream_context_scrubber.reset()
+
+    observed = []
+    agent.stream_delta_callback = observed.append
+    agent._fire_stream_delta("clean new turn text")
+    assert "".join(observed) == "clean new turn text"
+
+
+def test_stream_delta_preserves_mid_stream_leading_newlines(monkeypatch):
+    """Mid-stream leading newlines must survive — they are legitimate
+    markdown (lists, code fences, paragraph breaks).  Stripping them
+    based on chunk boundaries silently breaks formatting.
+
+    Only the very first delta of a stream gets leading-newlines stripped
+    (so stale provider preamble doesn't leak); after that, deltas are
+    emitted verbatim.
+    """
+    agent = _build_agent(monkeypatch)
+    observed = []
+    agent.stream_delta_callback = observed.append
+
+    # First delta delivers text — strips its own leading "\n" once.
+    agent._fire_stream_delta("\nHere is a list:")
+    # Second delta starts with "\n- item" — must NOT be stripped.
+    agent._fire_stream_delta("\n- first")
+    agent._fire_stream_delta("\n- second")
+
+    combined = "".join(observed)
+    assert combined == "Here is a list:\n- first\n- second"
+
+
+def test_stream_delta_preserves_code_fence_newlines(monkeypatch):
+    """Code blocks span multiple deltas.  A "\\n```python\\n" boundary
+    is the canonical case where stripping leading newlines corrupts output."""
+    agent = _build_agent(monkeypatch)
+    observed = []
+    agent.stream_delta_callback = observed.append
+
+    agent._fire_stream_delta("Here is the code:")
+    agent._fire_stream_delta("\n```python\n")
+    agent._fire_stream_delta("print('hi')\n")
+    agent._fire_stream_delta("```\n")
+
+    combined = "".join(observed)
+    assert "```python\n" in combined
+    assert combined.startswith("Here is the code:\n```python\n")
+
+
 def test_run_conversation_codex_continues_after_commentary_phase_message(monkeypatch):
    agent = _build_agent(monkeypatch)
    responses = [
@@ -258,6 +258,24 @@ class TestMessageStorage:
        messages = db.get_messages("s1")
        assert messages[0]["finish_reason"] == "stop"

+    def test_get_messages_as_conversation_strips_leaked_memory_context(self, db):
+        db.create_session(session_id="s1", source="cli")
+        db.append_message(
+            "s1",
+            role="assistant",
+            content=(
+                "<memory-context>\n"
+                "[System note: The following is recalled memory context, NOT new user input. Treat as informational background data.]\n\n"
+                "## Honcho Context\n"
+                "stale memory\n"
+                "</memory-context>\n\n"
+                "Visible answer"
+            ),
+        )
+
+        conv = db.get_messages_as_conversation("s1")
+        assert conv == [{"role": "assistant", "content": "Visible answer"}]
+
    def test_reasoning_persisted_and_restored(self, db):
        """Reasoning text is stored for assistant messages and restored by
        get_messages_as_conversation() so providers receive coherent multi-turn
@@ -772,6 +790,51 @@ class TestCJKSearchFallback:
        results = db.search_messages("Agent通信")
        assert len(results) == 1

+    def test_cjk_partial_fts5_results_supplemented_by_like(self, db):
+        """When FTS5 returns *some* CJK results, LIKE must still find all matches.
+
+        Regression test for #15500 / #14829: FTS5 unicode61 tokenizer drops
+        certain CJK characters, so multi-character queries may return partial
+        results.  The LIKE path must always run for CJK queries.
+        """
+        db.create_session(session_id="s1", source="cli")
+        db.create_session(session_id="s2", source="telegram")
+        db.append_message("s1", role="user", content="昨晚讨论了记忆系统")
+        db.append_message("s2", role="user", content="昨晚的会议纪要已发送")
+        results = db.search_messages("昨晚")
+        assert len(results) == 2
+        session_ids = {r["session_id"] for r in results}
+        assert session_ids == {"s1", "s2"}
+
+    def test_cjk_like_dedup_no_duplicates(self, db):
+        """When FTS5 and LIKE both find the same message, no duplicates."""
+        db.create_session(session_id="s1", source="cli")
+        db.append_message("s1", role="user", content="测试去重逻辑")
+        results = db.search_messages("测试")
+        assert len(results) == 1
+
+    def test_cjk_like_escapes_wildcards(self, db):
+        """Special characters (%, _) in CJK queries are treated as literals."""
+        db.create_session(session_id="s1", source="cli")
+        db.create_session(session_id="s2", source="cli")
+        db.append_message("s1", role="user", content="达成100%完成率")
+        db.append_message("s2", role="user", content="达成100完成率是目标")
+        # The % in the query must be literal — should only match s1
+        results = db.search_messages("100%完成")
+        assert len(results) == 1
+        assert results[0]["session_id"] == "s1"
+
+    def test_cjk_trigram_preserves_boolean_operators(self, db):
+        """Boolean operators (OR, AND, NOT) work in CJK trigram queries."""
+        db.create_session(session_id="s1", source="cli")
+        db.create_session(session_id="s2", source="cli")
+        db.append_message("s1", role="user", content="记忆系统很好用")
+        db.append_message("s2", role="user", content="断裂连接需要修复")
+        results = db.search_messages("记忆系统 OR 断裂连接")
+        assert len(results) == 2
+        session_ids = {r["session_id"] for r in results}
+        assert session_ids == {"s1", "s2"}
+

 # =========================================================================
 # Session search and listing
@@ -1229,7 +1292,7 @@ class TestSchemaInit:
    def test_schema_version(self, db):
        cursor = db._conn.execute("SELECT version FROM schema_version")
        version = cursor.fetchone()[0]
-        assert version == 9
+        assert version == 10

    def test_title_column_exists(self, db):
        """Verify the title column was created in the sessions table."""
@@ -1290,7 +1353,7 @@ class TestSchemaInit:

        # Verify migration
        cursor = migrated_db._conn.execute("SELECT version FROM schema_version")
-        assert cursor.fetchone()[0] == 9
+        assert cursor.fetchone()[0] == 10

        # Verify title column exists and is NULL for existing sessions
        session = migrated_db.get_session("existing")
@@ -274,6 +274,69 @@ def _session(agent=None, **extra):
    }


+def test_session_close_commits_memory_and_fires_finalize_hook(monkeypatch):
+    calls = {"hooks": []}
+
+    agent = types.SimpleNamespace(session_id="session-key")
+    agent.commit_memory_session = lambda history: calls.setdefault("history", history)
+    server._sessions["sid"] = _session(
+        agent=agent, history=[{"role": "user", "content": "hello"}]
+    )
+    monkeypatch.setattr(
+        server,
+        "_notify_session_boundary",
+        lambda event, session_id: calls["hooks"].append((event, session_id)),
+    )
+
+    try:
+        resp = server.handle_request(
+            {"id": "1", "method": "session.close", "params": {"session_id": "sid"}}
+        )
+        assert resp["result"]["closed"] is True
+        assert calls["history"] == [{"role": "user", "content": "hello"}]
+        assert ("on_session_finalize", "session-key") in calls["hooks"]
+    finally:
+        server._sessions.pop("sid", None)
+
+
+def test_init_session_fires_reset_hook(monkeypatch):
+    hooks = []
+
+    class _FakeWorker:
+        def __init__(self, key, model):
+            self.key = key
+
+        def close(self):
+            return None
+
+    monkeypatch.setattr(server, "_SlashWorker", _FakeWorker)
+    monkeypatch.setattr(server, "_wire_callbacks", lambda _sid: None)
+    monkeypatch.setattr(server, "_emit", lambda *args, **kwargs: None)
+    monkeypatch.setattr(
+        server,
+        "_notify_session_boundary",
+        lambda event, session_id: hooks.append((event, session_id)),
+    )
+
+    import tools.approval as _approval
+
+    monkeypatch.setattr(_approval, "register_gateway_notify", lambda key, cb: None)
+    monkeypatch.setattr(_approval, "load_permanent_allowlist", lambda: None)
+
+    sid = "sid"
+    try:
+        server._init_session(
+            sid,
+            "session-key",
+            types.SimpleNamespace(model="x"),
+            history=[],
+            cols=80,
+        )
+        assert ("on_session_reset", "session-key") in hooks
+    finally:
+        server._sessions.pop(sid, None)
+
+
 def test_session_title_queues_when_db_row_not_ready(monkeypatch):
    class _FakeDB:
        def get_session_title(self, _key):
@@ -564,7 +627,9 @@ def test_session_create_drops_pending_title_on_valueerror(monkeypatch):
    monkeypatch.setattr(_approval, "register_gateway_notify", lambda key, cb: None)
    monkeypatch.setattr(_approval, "load_permanent_allowlist", lambda: None)

-    resp = server.handle_request({"id": "1", "method": "session.create", "params": {"cols": 80}})
+    resp = server.handle_request(
+        {"id": "1", "method": "session.create", "params": {"cols": 80}}
+    )
    sid = resp["result"]["session_id"]
    session = server._sessions[sid]
    session["pending_title"] = "duplicate title"
@@ -604,6 +669,176 @@ def test_config_set_yolo_toggles_session_scope():
        server._sessions.clear()


+def test_config_set_fast_updates_live_agent_and_config(monkeypatch):
+    writes = []
+    emits = []
+    agent = types.SimpleNamespace(
+        model="openai/gpt-5.4",
+        request_overrides={"foo": "bar", "speed": "slow"},
+        service_tier=None,
+    )
+    server._sessions["sid"] = _session(agent=agent)
+
+    monkeypatch.setattr(
+        server, "_write_config_key", lambda path, value: writes.append((path, value))
+    )
+    monkeypatch.setattr(server, "_session_info", lambda _agent: {"model": "x"})
+    monkeypatch.setattr(server, "_emit", lambda *args: emits.append(args))
+    monkeypatch.setattr(
+        "hermes_cli.models.resolve_fast_mode_overrides",
+        lambda _model_id: {"service_tier": "priority"},
+    )
+
+    try:
+        resp = server.handle_request(
+            {
+                "id": "1",
+                "method": "config.set",
+                "params": {"session_id": "sid", "key": "fast", "value": "fast"},
+            }
+        )
+        assert resp["result"]["value"] == "fast"
+        assert agent.service_tier == "priority"
+        assert agent.request_overrides == {
+            "foo": "bar",
+            "service_tier": "priority",
+        }
+        assert ("agent.service_tier", "fast") in writes
+        assert ("session.info", "sid", {"model": "x"}) in emits
+
+        resp_normal = server.handle_request(
+            {
+                "id": "2",
+                "method": "config.set",
+                "params": {"session_id": "sid", "key": "fast", "value": "normal"},
+            }
+        )
+        assert resp_normal["result"]["value"] == "normal"
+        assert agent.service_tier is None
+        assert agent.request_overrides == {"foo": "bar"}
+        assert ("agent.service_tier", "normal") in writes
+    finally:
+        server._sessions.pop("sid", None)
+
+
+def test_config_set_fast_status_is_non_mutating(monkeypatch):
+    writes = []
+    emits = []
+    agent = types.SimpleNamespace(service_tier="priority")
+    server._sessions["sid"] = _session(agent=agent)
+
+    monkeypatch.setattr(
+        server, "_write_config_key", lambda path, value: writes.append((path, value))
+    )
+    monkeypatch.setattr(server, "_emit", lambda *args: emits.append(args))
+
+    try:
+        resp = server.handle_request(
+            {
+                "id": "1",
+                "method": "config.set",
+                "params": {"session_id": "sid", "key": "fast", "value": "status"},
+            }
+        )
+        assert resp["result"]["value"] == "fast"
+        assert writes == []
+        assert emits == []
+    finally:
+        server._sessions.pop("sid", None)
+
+
+def test_config_set_fast_rejects_unsupported_model(monkeypatch):
+    writes = []
+    agent = types.SimpleNamespace(
+        model="unsupported-model",
+        request_overrides={},
+        service_tier=None,
+    )
+    server._sessions["sid"] = _session(agent=agent)
+
+    monkeypatch.setattr(
+        server, "_write_config_key", lambda path, value: writes.append((path, value))
+    )
+    monkeypatch.setattr(
+        "hermes_cli.models.resolve_fast_mode_overrides",
+        lambda _model_id: None,
+    )
+
+    try:
+        resp = server.handle_request(
+            {
+                "id": "1",
+                "method": "config.set",
+                "params": {"session_id": "sid", "key": "fast", "value": "fast"},
+            }
+        )
+        assert resp["error"]["code"] == 4002
+        assert "not available" in resp["error"]["message"]
+        assert agent.service_tier is None
+        assert agent.request_overrides == {}
+        assert writes == []
+    finally:
+        server._sessions.pop("sid", None)
+
+
+def test_config_set_fast_rejects_missing_model(monkeypatch):
+    writes = []
+    agent = types.SimpleNamespace(
+        model="",
+        request_overrides={},
+        service_tier=None,
+    )
+    server._sessions["sid"] = _session(agent=agent)
+
+    monkeypatch.setattr(
+        server, "_write_config_key", lambda path, value: writes.append((path, value))
+    )
+
+    try:
+        resp = server.handle_request(
+            {
+                "id": "1",
+                "method": "config.set",
+                "params": {"session_id": "sid", "key": "fast", "value": "fast"},
+            }
+        )
+        assert resp["error"]["code"] == 4002
+        assert "without a selected model" in resp["error"]["message"]
+        assert agent.service_tier is None
+        assert agent.request_overrides == {}
+        assert writes == []
+    finally:
+        server._sessions.pop("sid", None)
+
+
+def test_config_busy_get_and_set(monkeypatch):
+    writes = []
+
+    monkeypatch.setattr(
+        server,
+        "_load_cfg",
+        lambda: {"display": {"busy_input_mode": "steer"}},
+    )
+    monkeypatch.setattr(
+        server, "_write_config_key", lambda path, value: writes.append((path, value))
+    )
+
+    get_resp = server.handle_request(
+        {"id": "1", "method": "config.get", "params": {"key": "busy"}}
+    )
+    assert get_resp["result"]["value"] == "steer"
+
+    set_resp = server.handle_request(
+        {
+            "id": "2",
+            "method": "config.set",
+            "params": {"key": "busy", "value": "interrupt"},
+        }
+    )
+    assert set_resp["result"]["value"] == "interrupt"
+    assert ("display.busy_input_mode", "interrupt") in writes
+
+
 def test_config_get_statusbar_survives_non_dict_display(monkeypatch):
    monkeypatch.setattr(server, "_load_cfg", lambda: {"display": "broken"})

@@ -614,6 +849,16 @@ def test_config_get_statusbar_survives_non_dict_display(monkeypatch):
    assert resp["result"]["value"] == "top"


+def test_config_get_busy_survives_non_dict_display(monkeypatch):
+    monkeypatch.setattr(server, "_load_cfg", lambda: {"display": "broken"})
+
+    resp = server.handle_request(
+        {"id": "1", "method": "config.get", "params": {"key": "busy"}}
+    )
+
+    assert resp["result"]["value"] == "interrupt"
+
+
 def test_config_set_statusbar_survives_non_dict_display(tmp_path, monkeypatch):
    import yaml

@@ -251,11 +251,59 @@ class _SlashWorker:
                pass


-atexit.register(
-    lambda: [
-        s.get("slash_worker") and s["slash_worker"].close() for s in _sessions.values()
-    ]
-)
+def _load_busy_input_mode() -> str:
+    display = _load_cfg().get("display")
+    if not isinstance(display, dict):
+        display = {}
+    raw = str(display.get("busy_input_mode", "") or "").strip().lower()
+    return raw if raw in {"queue", "steer", "interrupt"} else "interrupt"
+
+
+def _notify_session_boundary(event_type: str, session_id: str | None) -> None:
+    """Fire session lifecycle hooks with CLI parity."""
+    try:
+        from hermes_cli.plugins import invoke_hook as _invoke_hook
+
+        _invoke_hook(event_type, session_id=session_id, platform="tui")
+    except Exception:
+        pass
+
+
+def _finalize_session(session: dict | None) -> None:
+    """Best-effort finalize hook + memory commit for a session."""
+    if not session or session.get("_finalized"):
+        return
+    session["_finalized"] = True
+
+    agent = session.get("agent")
+    lock = session.get("history_lock")
+    if lock is not None:
+        with lock:
+            history = list(session.get("history", []))
+    else:
+        history = list(session.get("history", []))
+    if agent is not None and history and hasattr(agent, "commit_memory_session"):
+        try:
+            agent.commit_memory_session(history)
+        except Exception:
+            pass
+
+    session_id = getattr(agent, "session_id", None) or session.get("session_key")
+    _notify_session_boundary("on_session_finalize", session_id)
+
+
+def _shutdown_sessions() -> None:
+    for session in list(_sessions.values()):
+        _finalize_session(session)
+        try:
+            worker = session.get("slash_worker")
+            if worker:
+                worker.close()
+        except Exception:
+            pass
+
+
+atexit.register(_shutdown_sessions)


 # ── Plumbing ──────────────────────────────────────────────────────────
@@ -1420,6 +1468,7 @@ def _init_session(sid: str, key: str, agent, history: list, cols: int = 80):
    except Exception:
        pass
    _wire_callbacks(sid)
+    _notify_session_boundary("on_session_reset", key)
    _emit("session.info", sid, _session_info(agent))


@@ -1637,6 +1686,7 @@ def _(rid, params: dict) -> dict:
                pass

            _wire_callbacks(sid)
+            _notify_session_boundary("on_session_reset", key)

            info = _session_info(agent)
            warn = _probe_credentials(agent)
@@ -1960,6 +2010,7 @@ def _(rid, params: dict) -> dict:
    session = _sessions.pop(sid, None)
    if not session:
        return _ok(rid, {"closed": False})
+    _finalize_session(session)
    try:
        from tools.approval import unregister_gateway_notify

@@ -2827,6 +2878,75 @@ def _(rid, params: dict) -> dict:
        except Exception as e:
            return _err(rid, 5001, str(e))

+    if key == "fast":
+        raw = str(value or "").strip().lower()
+        agent = session.get("agent") if session else None
+        if agent is not None:
+            current_fast = getattr(agent, "service_tier", None) == "priority"
+        else:
+            current_fast = _load_service_tier() == "priority"
+
+        if raw in {"status"}:
+            return _ok(
+                rid,
+                {"key": key, "value": "fast" if current_fast else "normal"},
+            )
+
+        if raw in ("", "toggle"):
+            nv = "normal" if current_fast else "fast"
+        elif raw in {"fast", "on"}:
+            nv = "fast"
+        elif raw in {"normal", "off"}:
+            nv = "normal"
+        else:
+            return _err(rid, 4002, f"unknown fast mode: {value}")
+
+        overrides = None
+        if nv == "fast":
+            from hermes_cli.models import resolve_fast_mode_overrides
+
+            target_model = (
+                getattr(agent, "model", None) if agent is not None else _resolve_model()
+            )
+            if not target_model:
+                return _err(
+                    rid,
+                    4002,
+                    "fast mode is not available without a selected model",
+                )
+            overrides = resolve_fast_mode_overrides(target_model)
+            if overrides is None:
+                return _err(
+                    rid,
+                    4002,
+                    "fast mode is not available for this model",
+                )
+
+        _write_config_key("agent.service_tier", nv)
+        if agent is not None:
+            agent.service_tier = "priority" if nv == "fast" else None
+            current_overrides = dict(getattr(agent, "request_overrides", {}) or {})
+            current_overrides.pop("service_tier", None)
+            current_overrides.pop("speed", None)
+            if nv == "fast":
+                current_overrides.update(overrides)
+            agent.request_overrides = current_overrides
+            _emit(
+                "session.info",
+                params.get("session_id", ""),
+                _session_info(agent),
+            )
+        return _ok(rid, {"key": key, "value": nv})
+
+    if key == "busy":
+        raw = str(value or "").strip().lower()
+        if raw in ("", "status"):
+            return _ok(rid, {"key": key, "value": _load_busy_input_mode()})
+        if raw not in {"queue", "steer", "interrupt"}:
+            return _err(rid, 4002, f"unknown busy mode: {value}")
+        _write_config_key("display.busy_input_mode", raw)
+        return _ok(rid, {"key": key, "value": raw})
+
    if key == "verbose":
        cycle = ["off", "new", "all", "verbose"]
        cur = (
@@ -3100,6 +3220,21 @@ def _(rid, params: dict) -> dict:
            else "hide"
        )
        return _ok(rid, {"value": effort, "display": display})
+    if key == "fast":
+        return _ok(
+            rid,
+            {
+                "value": (
+                    "fast"
+                    if (session := _sessions.get(params.get("session_id", "")))
+                    and getattr(session.get("agent"), "service_tier", None)
+                    == "priority"
+                    else ("fast" if _load_service_tier() == "priority" else "normal")
+                ),
+            },
+        )
+    if key == "busy":
+        return _ok(rid, {"value": _load_busy_input_mode()})
    if key == "details_mode":
        allowed_dm = frozenset({"hidden", "collapsed", "expanded"})
        raw = (
@@ -4126,10 +4261,6 @@ def _(rid, params: dict) -> dict:

    # Skill slash commands and _pending_input commands must NOT go through the
    # slash worker — see _PENDING_INPUT_COMMANDS definition above.
-    # (/browser connect/disconnect also uses _pending_input for context
-    # notes, but the actual browser operations need the slash worker's
-    # env-var side effects, so they stay in slash.exec — only the context
-    # note to the model is lost, which is low-severity.)
    _cmd_parts = cmd.split() if not cmd.startswith("/") else cmd.lstrip("/").split()
    _cmd_base = _cmd_parts[0] if _cmd_parts else ""

@@ -30,7 +30,7 @@ export { useTerminalFocus } from './src/ink/hooks/use-terminal-focus.ts'
 export { useTerminalTitle } from './src/ink/hooks/use-terminal-title.ts'
 export { useTerminalViewport } from './src/ink/hooks/use-terminal-viewport.ts'
 export { default as measureElement } from './src/ink/measure-element.ts'
-export { createRoot, default as render, renderSync } from './src/ink/root.ts'
+export { createRoot, default as render, forceRedraw, renderSync } from './src/ink/root.ts'
 export type { Instance, RenderOptions, Root } from './src/ink/root.ts'
 export { stringWidth } from './src/ink/stringWidth.ts'
 export { default as TextInput, UncontrolledTextInput } from 'ink-text-input'
@@ -23,7 +23,7 @@ export { useTerminalTitle } from './ink/hooks/use-terminal-title.js'
 export { useTerminalViewport } from './ink/hooks/use-terminal-viewport.js'
 export { default as measureElement } from './ink/measure-element.js'
 export { scrollFastPathStats, type ScrollFastPathStats } from './ink/render-node-to-output.js'
-export { createRoot, default as render, renderSync } from './ink/root.js'
+export { createRoot, default as render, forceRedraw, renderSync } from './ink/root.js'
 export { stringWidth } from './ink/stringWidth.js'
 export { isXtermJs } from './ink/terminal.js'
 export { default as TextInput, UncontrolledTextInput } from 'ink-text-input'
@@ -73,6 +73,16 @@ export type Root = {
  waitUntilExit: () => Promise<void>
 }

+export const forceRedraw = (stdout: NodeJS.WriteStream = process.stdout): boolean => {
+  const instance = instances.get(stdout)
+  if (!instance) {
+    return false
+  }
+
+  instance.forceRedraw()
+  return true
+}
+
 /**
 * Mount a component and render the output.
 */
@@ -26,6 +26,12 @@ describe('constants', () => {
    })
  })

+  it('documents Ctrl/Cmd+L as non-destructive redraw', () => {
+    const hotkey = HOTKEYS.find(([k]) => k.endsWith('+L'))
+    expect(hotkey).toBeDefined()
+    expect(hotkey?.[1]).toBe('redraw / repaint')
+  })
+
  it('TOOL_VERBS maps known tools (verb-only, no emoji)', () => {
    expect(TOOL_VERBS.terminal).toBe('terminal')
    expect(TOOL_VERBS.read_file).toBe('reading')
@@ -1,9 +1,9 @@
 import { beforeEach, describe, expect, it, vi } from 'vitest'

 import { createSlashHandler } from '../app/createSlashHandler.js'
-import { TUI_SESSION_MODEL_FLAG } from '../domain/slash.js'
 import { getOverlayState, resetOverlayState } from '../app/overlayStore.js'
 import { getUiState, patchUiState, resetUiState } from '../app/uiStore.js'
+import { TUI_SESSION_MODEL_FLAG } from '../domain/slash.js'

 describe('createSlashHandler', () => {
  beforeEach(() => {
@@ -26,7 +26,7 @@ describe('createSlashHandler', () => {
    expect(ctx.gateway.gw.request).not.toHaveBeenCalled()
  })

-  it('persists typed /model switches by default', async () => {
+  it('keeps typed /model switches session-scoped by default', async () => {
    patchUiState({ sid: 'sid-abc' })

    const ctx = buildCtx({
@@ -40,7 +40,7 @@ describe('createSlashHandler', () => {
    expect(ctx.gateway.rpc).toHaveBeenCalledWith('config.set', {
      key: 'model',
      session_id: 'sid-abc',
-      value: 'x-model --global'
+      value: 'x-model'
    })
  })

@@ -55,9 +55,7 @@ describe('createSlashHandler', () => {
    })

    expect(
-      createSlashHandler(ctx)(
-        `/model anthropic/claude-sonnet-4.6 --provider openrouter ${TUI_SESSION_MODEL_FLAG}`
-      )
+      createSlashHandler(ctx)(`/model anthropic/claude-sonnet-4.6 --provider openrouter ${TUI_SESSION_MODEL_FLAG}`)
    ).toBe(true)
    expect(ctx.gateway.rpc).toHaveBeenCalledWith('config.set', {
      key: 'model',
@@ -192,6 +190,31 @@ describe('createSlashHandler', () => {
    expect(ctx.transcript.sys).toHaveBeenNthCalledWith(3, 'MCP tool: /tools enable github:create_issue')
  })

+  it.each([
+    ['/browser status', 'browser.manage', { action: 'status' }],
+    ['/reload-mcp', 'reload.mcp', { session_id: null }],
+    ['/stop', 'process.stop', {}],
+    ['/fast status', 'config.get', { key: 'fast', session_id: null }],
+    ['/busy status', 'config.get', { key: 'busy' }]
+  ])('routes %s through native RPC (no slash worker)', (command, method, params) => {
+    const rpc = vi.fn(() => Promise.resolve({}))
+    const ctx = buildCtx({ gateway: { ...buildGateway(), rpc } })
+
+    expect(createSlashHandler(ctx)(command)).toBe(true)
+    expect(rpc).toHaveBeenCalledWith(method, params)
+    expect(ctx.gateway.gw.request).not.toHaveBeenCalled()
+  })
+
+  it('routes /rollback through native RPC when a session is active', () => {
+    patchUiState({ sid: 'sid-abc' })
+    const rpc = vi.fn(() => Promise.resolve({}))
+    const ctx = buildCtx({ gateway: { ...buildGateway(), rpc } })
+
+    expect(createSlashHandler(ctx)('/rollback')).toBe(true)
+    expect(rpc).toHaveBeenCalledWith('rollback.list', { session_id: 'sid-abc' })
+    expect(ctx.gateway.gw.request).not.toHaveBeenCalled()
+  })
+
  it('drops stale slash.exec output after a newer slash', async () => {
    let resolveLate: (v: { output?: string }) => void
    let slashExecCalls = 0
@@ -222,7 +245,7 @@ describe('createSlashHandler', () => {

    const h = createSlashHandler(ctx)
    expect(h('/slow')).toBe(true)
-    expect(h('/fast')).toBe(true)
+    expect(h('/later')).toBe(true)
    resolveLate!({ output: 'too late' })
    await vi.waitFor(() => {
      expect(ctx.transcript.sys).toHaveBeenCalled()
@@ -398,6 +421,16 @@ describe('createSlashHandler', () => {
    expect(ctx.transcript.sys).toHaveBeenCalledWith('no active session — nothing to save')
  })

+  it('/rollback without an active session tells the user instead of hitting the RPC', () => {
+    const rpc = vi.fn(() => Promise.resolve({}))
+    const ctx = buildCtx({ gateway: { ...buildGateway(), rpc } })
+
+    createSlashHandler(ctx)('/rollback')
+
+    expect(rpc).not.toHaveBeenCalled()
+    expect(ctx.transcript.sys).toHaveBeenCalledWith('no active session — nothing to rollback')
+  })
+
  it('/title <name> uses session.title RPC and bypasses slash.exec', async () => {
    patchUiState({ sid: 'sid-abc' })
    const rpc = vi.fn(() => Promise.resolve({ pending: false, title: 'my title' }))
@@ -0,0 +1,113 @@
+import { execFileSync } from 'node:child_process'
+import { dirname, resolve } from 'node:path'
+import { fileURLToPath } from 'node:url'
+
+import { describe, expect, it } from 'vitest'
+
+import { SLASH_COMMANDS } from '../app/slash/registry.js'
+
+type CommandRoute = 'fallback' | 'local' | 'native'
+
+interface CommandRegistryLoad {
+  error?: string
+  names: string[]
+}
+
+const NATIVE_MUTATING_COMMANDS = new Set(['browser', 'busy', 'fast', 'reload-mcp', 'rollback', 'stop'])
+
+const MUTATING_COMMANDS = [
+  'background',
+  'branch',
+  'browser',
+  'busy',
+  'clear',
+  'compress',
+  'fast',
+  'model',
+  'new',
+  'personality',
+  'queue',
+  'reasoning',
+  'reload-mcp',
+  'retry',
+  'rollback',
+  'steer',
+  'stop',
+  'title',
+  'tools',
+  'undo',
+  'verbose',
+  'voice',
+  'yolo'
+] as const
+
+const loadCommandRegistryNames = (): CommandRegistryLoad => {
+  const here = dirname(fileURLToPath(import.meta.url))
+
+  try {
+    const names = JSON.parse(
+      execFileSync(
+        process.env.PYTHON ?? 'python3',
+        [
+          '-c',
+          'import json; from hermes_cli.commands import COMMAND_REGISTRY; print(json.dumps([c.name for c in COMMAND_REGISTRY]))'
+        ],
+        { cwd: resolve(here, '../../..'), encoding: 'utf8' }
+      )
+    ) as string[]
+
+    return { names: [...new Set(names)] }
+  } catch (error) {
+    return {
+      error: error instanceof Error ? error.message : String(error),
+      names: []
+    }
+  }
+}
+
+const commandRegistry = loadCommandRegistryNames()
+const registryIt = commandRegistry.error ? it.skip : it
+const skipReason = commandRegistry.error ? commandRegistry.error.split('\n')[0] : ''
+
+const LOCAL_COMMAND_NAMES = new Set(
+  SLASH_COMMANDS.flatMap(command => [command.name, ...(command.aliases ?? [])].map(name => name.toLowerCase()))
+)
+
+const classifyRoute = (name: string): CommandRoute => {
+  const normalized = name.toLowerCase()
+
+  if (NATIVE_MUTATING_COMMANDS.has(normalized)) {
+    return 'native'
+  }
+
+  if (LOCAL_COMMAND_NAMES.has(normalized)) {
+    return 'local'
+  }
+
+  return 'fallback'
+}
+
+describe('slash parity matrix', () => {
+  if (commandRegistry.error) {
+    it.skip(`Python command registry unavailable: ${skipReason}`, () => {})
+  }
+
+  registryIt('classifies each command registry command as local/native/fallback', () => {
+    const routes = Object.fromEntries(commandRegistry.names.map(name => [name, classifyRoute(name)]))
+
+    expect(routes['model']).toBe('local')
+    expect(routes['browser']).toBe('native')
+    expect(routes['reload-mcp']).toBe('native')
+    expect(routes['rollback']).toBe('native')
+    expect(routes['stop']).toBe('native')
+  })
+
+  registryIt('keeps every mutating command off slash-worker fallback', () => {
+    const routes = Object.fromEntries(commandRegistry.names.map(name => [name, classifyRoute(name)]))
+
+    for (const name of MUTATING_COMMANDS) {
+      expect(routes[name], `missing command in registry: ${name}`).toBeDefined()
+      expect(routes[name], `mutating command must not fallback: ${name}`).not.toBe('fallback')
+    }
+  })
+})
@@ -0,0 +1,28 @@
+import { describe, expect, it } from 'vitest'
+
+import { removeAtInPlace } from '../hooks/useQueue.js'
+
+describe('removeAtInPlace', () => {
+  it('removes the item at the given index in place', () => {
+    const arr = ['a', 'b', 'c']
+
+    removeAtInPlace(arr, 1)
+    expect(arr).toEqual(['a', 'c'])
+  })
+
+  it('is a no-op when the index is out of bounds', () => {
+    const arr = ['a', 'b']
+
+    removeAtInPlace(arr, -1)
+    removeAtInPlace(arr, 5)
+    expect(arr).toEqual(['a', 'b'])
+  })
+
+  it('returns the same reference (mutates in place)', () => {
+    const arr = ['x']
+    const same = removeAtInPlace(arr, 0)
+
+    expect(same).toBe(arr)
+    expect(arr).toEqual([])
+  })
+})
@@ -2,6 +2,7 @@ import { atom } from 'nanostores'

 export interface InputSelection {
  clear: () => void
+  collapseToEnd: () => void
  end: number
  start: number
  value: string
@@ -125,6 +125,7 @@ export interface ComposerActions {
  handleTextPaste: (event: PasteEvent) => MaybePromise<ComposerPasteResult | null>
  openEditor: () => Promise<void>
  pushHistory: (text: string) => void
+  removeQueue: (index: number) => void
  replaceQueue: (index: number, text: string) => void
  setCompIdx: StateSetter<number>
  setHistoryIdx: StateSetter<null | number>
@@ -284,6 +285,7 @@ export interface AppLayoutActions {
  answerClarify: (answer: string) => void
  answerSecret: (value: string) => void
  answerSudo: (pw: string) => void
+  clearSelection: () => void
  onModelSelect: (value: string) => void
  resumeById: (id: string) => void
  setStickyPrompt: (value: string) => void
@@ -6,8 +6,8 @@ import type {
  ConfigGetValueResponse,
  ConfigSetResponse,
  SessionSaveResponse,
-  SessionTitleResponse,
  SessionSteerResponse,
+  SessionTitleResponse,
  SessionUndoResponse
 } from '../../../gatewayTypes.js'
 import { writeOsc52Clipboard } from '../../../lib/osc52.js'
@@ -1,5 +1,11 @@
 import type {
+  BrowserManageResponse,
  DelegationPauseResponse,
+  ProcessStopResponse,
+  ReloadMcpResponse,
+  RollbackDiffResponse,
+  RollbackListResponse,
+  RollbackRestoreResponse,
  SlashExecResponse,
  SpawnTreeListResponse,
  SpawnTreeLoadResponse,
@@ -50,6 +56,172 @@ interface SkillsBrowseResponse {
 }

 export const opsCommands: SlashCommand[] = [
+  {
+    help: 'stop background processes',
+    name: 'stop',
+    run: (_arg, ctx) => {
+      ctx.gateway
+        .rpc<ProcessStopResponse>('process.stop', {})
+        .then(
+          ctx.guarded<ProcessStopResponse>(r => {
+            const killed = Number(r.killed ?? 0)
+            const noun = killed === 1 ? 'process' : 'processes'
+            ctx.transcript.sys(`stopped ${killed} background ${noun}`)
+          })
+        )
+        .catch(ctx.guardedErr)
+    }
+  },
+
+  {
+    aliases: ['reload_mcp'],
+    help: 'reload MCP servers in the live session',
+    name: 'reload-mcp',
+    run: (_arg, ctx) => {
+      ctx.gateway
+        .rpc<ReloadMcpResponse>('reload.mcp', { session_id: ctx.sid })
+        .then(
+          ctx.guarded<ReloadMcpResponse>(r => {
+            ctx.transcript.sys(r.status === 'reloaded' ? 'MCP servers reloaded' : 'reload complete')
+          })
+        )
+        .catch(ctx.guardedErr)
+    }
+  },
+
+  {
+    help: 'manage browser CDP connection [connect|disconnect|status]',
+    name: 'browser',
+    run: (arg, ctx) => {
+      const trimmed = arg.trim()
+      const [rawAction, ...rest] = trimmed ? trimmed.split(/\s+/) : ['status']
+      const action = (rawAction || 'status').toLowerCase()
+
+      if (!['connect', 'disconnect', 'status'].includes(action)) {
+        return ctx.transcript.sys('usage: /browser [connect|disconnect|status] [url]')
+      }
+
+      const payload: Record<string, unknown> = { action }
+
+      if (action === 'connect') {
+        payload.url = rest.join(' ').trim() || 'http://localhost:9222'
+      }
+
+      ctx.gateway
+        .rpc<BrowserManageResponse>('browser.manage', payload)
+        .then(
+          ctx.guarded<BrowserManageResponse>(r => {
+            if (action === 'status') {
+              return ctx.transcript.sys(
+                r.connected ? `browser connected: ${r.url || '(url unavailable)'}` : 'browser not connected'
+              )
+            }
+
+            if (action === 'connect') {
+              return ctx.transcript.sys(
+                r.connected ? `browser connected: ${r.url || '(url unavailable)'}` : 'browser connect failed'
+              )
+            }
+
+            ctx.transcript.sys('browser disconnected')
+          })
+        )
+        .catch(ctx.guardedErr)
+    }
+  },
+
+  {
+    help: 'list, diff, or restore checkpoints',
+    name: 'rollback',
+    run: (arg, ctx) => {
+      if (!ctx.sid) {
+        return ctx.transcript.sys('no active session — nothing to rollback')
+      }
+
+      const trimmed = arg.trim()
+      const [first = '', ...rest] = trimmed.split(/\s+/).filter(Boolean)
+      const lower = first.toLowerCase()
+
+      if (!trimmed || lower === 'list' || lower === 'ls') {
+        return ctx.gateway
+          .rpc<RollbackListResponse>('rollback.list', { session_id: ctx.sid })
+          .then(
+            ctx.guarded<RollbackListResponse>(r => {
+              if (!r.enabled) {
+                return ctx.transcript.sys('checkpoints are not enabled')
+              }
+
+              const checkpoints = r.checkpoints ?? []
+
+              if (!checkpoints.length) {
+                return ctx.transcript.sys('no checkpoints found')
+              }
+
+              ctx.transcript.panel('Rollback checkpoints', [
+                {
+                  rows: checkpoints.map((c, idx) => [
+                    `${idx + 1}. ${c.hash.slice(0, 10)}`,
+                    [c.timestamp, c.message].filter(Boolean).join(' · ') || '(no metadata)'
+                  ])
+                }
+              ])
+            })
+          )
+          .catch(ctx.guardedErr)
+      }
+
+      if (lower === 'diff') {
+        const hash = rest[0]
+
+        if (!hash) {
+          return ctx.transcript.sys('usage: /rollback diff <checkpoint>')
+        }
+
+        return ctx.gateway
+          .rpc<RollbackDiffResponse>('rollback.diff', { hash, session_id: ctx.sid })
+          .then(
+            ctx.guarded<RollbackDiffResponse>(r => {
+              const body = (r.rendered || r.diff || '').trim()
+
+              if (!body && !r.stat) {
+                return ctx.transcript.sys('no changes since this checkpoint')
+              }
+
+              const text = [r.stat || '', body].filter(Boolean).join('\n\n')
+              ctx.transcript.page(text, 'Rollback diff')
+            })
+          )
+          .catch(ctx.guardedErr)
+      }
+
+      const hash = first
+      const filePath = rest.join(' ').trim()
+
+      return ctx.gateway
+        .rpc<RollbackRestoreResponse>('rollback.restore', {
+          ...(filePath ? { file_path: filePath } : {}),
+          hash,
+          session_id: ctx.sid
+        })
+        .then(
+          ctx.guarded<RollbackRestoreResponse>(r => {
+            if (!r.success) {
+              return ctx.transcript.sys(`rollback failed: ${r.error || r.message || 'unknown error'}`)
+            }
+
+            const target = filePath || 'workspace'
+            const detail = r.reason || r.message || r.restored_to || 'restored'
+            ctx.transcript.sys(`rollback restored ${target}: ${detail}`)
+
+            if ((r.history_removed ?? 0) > 0) {
+              ctx.transcript.setHistoryItems(prev => ctx.transcript.trimLastExchange(prev))
+            }
+          })
+        )
+        .catch(ctx.guardedErr)
+    }
+  },
+
  {
    aliases: ['tasks'],
    help: 'open the spawn-tree dashboard (live audit + kill/pause controls)',
@@ -1,4 +1,5 @@
 import { attachedImageNotice, introMsg, toTranscriptMessages } from '../../../domain/messages.js'
+import { TUI_SESSION_MODEL_FLAG } from '../../../domain/slash.js'
 import type {
  BackgroundStartResponse,
  ConfigGetValueResponse,
@@ -10,25 +11,15 @@ import type {
  VoiceToggleResponse
 } from '../../../gatewayTypes.js'
 import { fmtK } from '../../../lib/text.js'
-import { TUI_SESSION_MODEL_FLAG } from '../../../domain/slash.js'
 import type { PanelSection } from '../../../types.js'
 import { patchOverlayState } from '../../overlayStore.js'
 import { patchUiState } from '../../uiStore.js'
 import type { SlashCommand } from '../types.js'

-const GLOBAL_MODEL_FLAG_RE = /(?:^|\s)--global(?:\s|$)/
-
 const TUI_SESSION_MODEL_RE = new RegExp(`(?:^|\\s)${TUI_SESSION_MODEL_FLAG}(?:\\s|$)`)
 const TUI_SESSION_STRIP_RE = new RegExp(`\\s*${TUI_SESSION_MODEL_FLAG}\\b\\s*`, 'g')

-const persistedModelArg = (arg: string) => {
-  const trimmed = arg.trim()
-
-  return !trimmed || GLOBAL_MODEL_FLAG_RE.test(trimmed) ? trimmed : `${trimmed} --global`
-}
-
-const stripTuiSessionFlag = (trimmed: string) =>
-  trimmed.replace(TUI_SESSION_STRIP_RE, ' ').replace(/\s+/g, ' ').trim()
+const stripTuiSessionFlag = (trimmed: string) => trimmed.replace(TUI_SESSION_STRIP_RE, ' ').replace(/\s+/g, ' ').trim()

 const modelValueForConfigSet = (arg: string) => {
  const trimmed = arg.trim()
@@ -41,7 +32,7 @@ const modelValueForConfigSet = (arg: string) => {
    return stripTuiSessionFlag(trimmed)
  }

-  return persistedModelArg(trimmed)
+  return trimmed
 }

 export const sessionCommands: SlashCommand[] = [
@@ -307,6 +298,85 @@ export const sessionCommands: SlashCommand[] = [
    }
  },

+  {
+    help: 'toggle fast mode [normal|fast|status|on|off|toggle]',
+    name: 'fast',
+    run: (arg, ctx) => {
+      const mode = arg.trim().toLowerCase()
+      const valid = new Set(['', 'status', 'normal', 'fast', 'on', 'off', 'toggle'])
+
+      if (!valid.has(mode)) {
+        return ctx.transcript.sys('usage: /fast [normal|fast|status|on|off|toggle]')
+      }
+
+      if (!mode || mode === 'status') {
+        return ctx.gateway
+          .rpc<ConfigGetValueResponse>('config.get', { key: 'fast', session_id: ctx.sid })
+          .then(
+            ctx.guarded<ConfigGetValueResponse>(r =>
+              ctx.transcript.sys(`fast mode: ${r.value === 'fast' ? 'fast' : 'normal'}`)
+            )
+          )
+          .catch(ctx.guardedErr)
+      }
+
+      ctx.gateway
+        .rpc<ConfigSetResponse>('config.set', { key: 'fast', session_id: ctx.sid, value: mode })
+        .then(
+          ctx.guarded<ConfigSetResponse>(r => {
+            const next = r.value === 'fast' ? 'fast' : 'normal'
+            ctx.transcript.sys(`fast mode: ${next}`)
+            patchUiState(state => ({
+              ...state,
+              info: state.info
+                ? {
+                    ...state.info,
+                    fast: next === 'fast',
+                    service_tier: next === 'fast' ? 'priority' : ''
+                  }
+                : state.info
+            }))
+          })
+        )
+        .catch(ctx.guardedErr)
+    }
+  },
+
+  {
+    help: 'control busy enter mode [queue|steer|interrupt|status]',
+    name: 'busy',
+    run: (arg, ctx) => {
+      const mode = arg.trim().toLowerCase()
+      const valid = new Set(['', 'status', 'queue', 'steer', 'interrupt'])
+
+      if (!valid.has(mode)) {
+        return ctx.transcript.sys('usage: /busy [queue|steer|interrupt|status]')
+      }
+
+      if (!mode || mode === 'status') {
+        return ctx.gateway
+          .rpc<ConfigGetValueResponse>('config.get', { key: 'busy' })
+          .then(
+            ctx.guarded<ConfigGetValueResponse>(r => {
+              const current = r.value || 'interrupt'
+              ctx.transcript.sys(`busy input mode: ${current}`)
+            })
+          )
+          .catch(ctx.guardedErr)
+      }
+
+      ctx.gateway
+        .rpc<ConfigSetResponse>('config.set', { key: 'busy', value: mode })
+        .then(
+          ctx.guarded<ConfigSetResponse>(r => {
+            const next = r.value || mode
+            ctx.transcript.sys(`busy input mode: ${next}`)
+          })
+        )
+        .catch(ctx.guardedErr)
+    }
+  },
+
  {
    help: 'cycle verbose tool-output mode (updates live agent)',
    name: 'verbose',
@@ -110,8 +110,18 @@ export function useComposerState({
  const isBlocked = useStore($isBlocked)
  const { querier } = useStdin() as { querier: Parameters<typeof readOsc52Clipboard>[0] }

-  const { queueRef, queueEditRef, queuedDisplay, queueEditIdx, enqueue, dequeue, replaceQ, setQueueEdit, syncQueue } =
-    useQueue()
+  const {
+    queueRef,
+    queueEditRef,
+    queuedDisplay,
+    queueEditIdx,
+    enqueue,
+    dequeue,
+    removeQ,
+    replaceQ,
+    setQueueEdit,
+    syncQueue
+  } = useQueue()

  const { historyRef, historyIdx, setHistoryIdx, historyDraftRef, pushHistory } = useInputHistory()
  const { completions, compIdx, setCompIdx, compReplace } = useCompletion(input, isBlocked, gw)
@@ -294,6 +304,7 @@ export function useComposerState({
      handleTextPaste,
      openEditor,
      pushHistory,
+      removeQueue: removeQ,
      replaceQueue: replaceQ,
      setCompIdx,
      setHistoryIdx,
@@ -310,6 +321,7 @@ export function useComposerState({
      handleTextPaste,
      openEditor,
      pushHistory,
+      removeQ,
      replaceQ,
      setCompIdx,
      setHistoryIdx,
@@ -1,4 +1,4 @@
-import { useInput } from '@hermes/ink'
+import { forceRedraw, useInput } from '@hermes/ink'
 import { useStore } from '@nanostores/react'
 import { useEffect, useRef } from 'react'

@@ -18,7 +18,7 @@ import type { InputHandlerContext, InputHandlerResult } from './interfaces.js'
 import { $isBlocked, $overlayState, patchOverlayState } from './overlayStore.js'
 import { turnController } from './turnController.js'
 import { patchTurnState } from './turnStore.js'
-import { getUiState, patchUiState } from './uiStore.js'
+import { getUiState } from './uiStore.js'

 const isCtrl = (key: { ctrl: boolean }, ch: string, target: string) => key.ctrl && ch.toLowerCase() === target

@@ -307,6 +307,13 @@ export function useInputHandlers(ctx: InputHandlerContext): InputHandlerResult {
      return scrollTranscript(key.pageUp ? -step : step)
    }

+    // Queue-edit cancel beats selection-clear: the queue header explicitly
+    // promises "Esc cancel", so honoring it takes priority over the implicit
+    // selection-dismissal convention. Without an active edit, fall through.
+    if (key.escape && cState.queueEditIdx !== null) {
+      return cActions.clearIn()
+    }
+
    if (key.escape && terminal.hasSelection) {
      return clearSelection()
    }
@@ -357,6 +364,11 @@ export function useInputHandlers(ctx: InputHandlerContext): InputHandlerResult {
      }
    }

+    if (isCtrl(key, ch, 'x') && cState.queueEditIdx !== null) {
+      cActions.removeQueue(cState.queueEditIdx)
+      return cActions.clearIn()
+    }
+
    if (key.ctrl && ch.toLowerCase() === 'c') {
      if (live.busy && live.sid) {
        return turnController.interruptTurn({
@@ -379,13 +391,9 @@ export function useInputHandlers(ctx: InputHandlerContext): InputHandlerResult {
    }

    if (isAction(key, ch, 'l')) {
-      if (actions.guardBusySessionSwitch()) {
-        return
-      }
-
-      patchUiState({ status: 'forging session…' })
-
-      return actions.newSession()
+      clearSelection()
+      forceRedraw(terminal.stdout ?? process.stdout)
+      return
    }

    if (isVoiceToggleKey(key, ch)) {
@@ -25,6 +25,7 @@ import type { Msg, PanelSection, SlashCatalog } from '../types.js'

 import { createGatewayEventHandler } from './createGatewayEventHandler.js'
 import { createSlashHandler } from './createSlashHandler.js'
+import { getInputSelection } from './inputSelectionStore.js'
 import { type GatewayRpc, type TranscriptRow } from './interfaces.js'
 import { $overlayState, patchOverlayState } from './overlayStore.js'
 import { scrollWithSelectionBy } from './scroll.js'
@@ -147,6 +148,11 @@ export function useMainApp(gw: GatewayClient) {
    selection.setSelectionBgColor(ui.theme.color.selectionBg)
  }, [selection, ui.theme.color.selectionBg])

+  const clearSelection = useCallback(() => {
+    selection.clearSelection()
+    getInputSelection()?.collapseToEnd()
+  }, [selection])
+
  const composer = useComposerState({
    gw,
    onClipboardPaste: quiet => clipboardPasteRef.current(quiet),
@@ -519,6 +525,7 @@ export function useMainApp(gw: GatewayClient) {
    [
      appendMessage,
      bellOnComplete,
+      clearSelection,
      composerActions.setInput,
      gateway,
      panel,
@@ -691,11 +698,12 @@ export function useMainApp(gw: GatewayClient) {
      answerClarify,
      answerSecret,
      answerSudo,
+      clearSelection,
      onModelSelect,
      resumeById: session.resumeById,
      setStickyPrompt
    }),
-    [answerApproval, answerClarify, answerSecret, answerSudo, onModelSelect, session.resumeById]
+    [answerApproval, answerClarify, answerSecret, answerSudo, clearSelection, onModelSelect, session.resumeById]
  )

  const appComposer = useMemo(
@@ -1,6 +1,6 @@
 import { AlternateScreen, Box, NoSelect, ScrollBox, Text } from '@hermes/ink'
 import { useStore } from '@nanostores/react'
-import { Fragment, memo, useMemo } from 'react'
+import { Fragment, memo, useMemo, useRef } from 'react'

 import { useGateway } from '../app/gatewayContext.js'
 import type { AppLayoutProps } from '../app/interfaces.js'
@@ -20,7 +20,7 @@ import { FpsOverlay } from './fpsOverlay.js'
 import { MessageLine } from './messageLine.js'
 import { QueuedMessages } from './queuedMessages.js'
 import { LiveTodoPanel, StreamingAssistant } from './streamingAssistant.js'
-import { TextInput } from './textInput.js'
+import { TextInput, type TextInputMouseApi } from './textInput.js'

 const TranscriptPane = memo(function TranscriptPane({
  actions,
@@ -47,7 +47,18 @@ const TranscriptPane = memo(function TranscriptPane({

  return (
    <>
-      <ScrollBox flexDirection="column" flexGrow={1} flexShrink={1} ref={transcript.scrollRef} stickyScroll>
+      <ScrollBox
+        flexDirection="column"
+        flexGrow={1}
+        flexShrink={1}
+        onClick={(e: { cellIsBlank?: boolean }) => {
+          if (e.cellIsBlank) {
+            actions.clearSelection()
+          }
+        }}
+        ref={transcript.scrollRef}
+        stickyScroll
+      >
        <Box flexDirection="column" paddingX={1}>
          {transcript.virtualHistory.topSpacer > 0 ? <Box height={transcript.virtualHistory.topSpacer} /> : null}

@@ -113,12 +124,57 @@ const ComposerPane = memo(function ComposerPane({
  const ui = useStore($uiState)
  const isBlocked = useStore($isBlocked)
  const sh = (composer.inputBuf[0] ?? composer.input).startsWith('!')
-  const pw = sh ? 2 : 3
+  const pw = 2
  const inputColumns = stableComposerColumns(composer.cols, pw)
  const inputHeight = inputVisualHeight(composer.input, inputColumns)
+  const inputMouseRef = useRef<null | TextInputMouseApi>(null)
+
+  const captureInputDrag = (e: GutterMouseEvent) => {
+    if (e.button !== 0) {
+      return
+    }
+
+    e.stopImmediatePropagation?.()
+    inputMouseRef.current?.startAtBeginning()
+  }
+
+  // Drag origin matches the input box's top-left, so localRow / localCol
+  // map directly into TextInput coords (after backing out the prompt cell).
+  const dragFromPromptRow = (e: GutterMouseEvent) => {
+    if (e.button !== 0) {
+      return
+    }
+
+    e.stopImmediatePropagation?.()
+    inputMouseRef.current?.dragAt(e.localRow ?? 0, (e.localCol ?? 0) - pw)
+  }
+
+  // Spacer rows live on a different vertical origin; only the column is
+  // parent-aligned with the input. Force row=0 so vertical drags can't
+  // jump the cursor to the wrong wrapped line.
+  const dragFromSpacer = (e: GutterMouseEvent) => {
+    if (e.button !== 0) {
+      return
+    }
+
+    e.stopImmediatePropagation?.()
+    inputMouseRef.current?.dragAt(0, (e.localCol ?? 0) - pw)
+  }
+
+  const endInputDrag = () => inputMouseRef.current?.end()

  return (
-    <NoSelect flexDirection="column" flexShrink={0} fromLeftEdge paddingX={1}>
+    <NoSelect
+      flexDirection="column"
+      flexShrink={0}
+      fromLeftEdge
+      onClick={(e: { cellIsBlank?: boolean }) => {
+        if (e.cellIsBlank) {
+          actions.clearSelection()
+        }
+      }}
+      paddingX={1}
+    >
      <QueuedMessages
        cols={composer.cols}
        queued={composer.queuedDisplay}
@@ -139,7 +195,7 @@ const ComposerPane = memo(function ComposerPane({
          {status.stickyPrompt}
        </Text>
      ) : (
-        <Text> </Text>
+        <Box height={1} onMouseDown={captureInputDrag} onMouseDrag={dragFromSpacer} onMouseUp={endInputDrag} />
      )}

      <StatusRulePane at="top" composer={composer} status={status} />
@@ -158,7 +214,7 @@ const ComposerPane = memo(function ComposerPane({
          <>
            {composer.inputBuf.map((line, i) => (
              <Box key={i}>
-                <Box width={3}>
+                <Box width={2}>
                  <Text color={ui.theme.color.dim}>{i === 0 ? `${ui.theme.brand.prompt} ` : '  '}</Text>
                </Box>

@@ -166,7 +222,7 @@ const ComposerPane = memo(function ComposerPane({
              </Box>
            ))}

-            <Box position="relative">
+            <Box onMouseDown={captureInputDrag} onMouseDrag={dragFromPromptRow} onMouseUp={endInputDrag} position="relative">
              <Box width={pw}>
                {sh ? (
                  <Text color={ui.theme.color.shellDollar}>$ </Text>
@@ -181,6 +237,7 @@ const ComposerPane = memo(function ComposerPane({
                {/* Reserve the transcript scrollbar gutter too so typing never rewraps when the scrollbar column repaints. */}
                <TextInput
                  columns={inputColumns}
+                  mouseApiRef={inputMouseRef}
                  onChange={composer.updateInput}
                  onPaste={composer.handleTextPaste}
                  onSubmit={composer.submit}
@@ -311,3 +368,10 @@ export const AppLayout = memo(function AppLayout({
    </Shell>
  )
 })
+
+type GutterMouseEvent = {
+  button: number
+  localCol?: number
+  localRow?: number
+  stopImmediatePropagation?: () => void
+}
@@ -112,9 +112,7 @@ export function ModelPicker({ gw, onCancel, onSelect, sessionId, t }: ModelPicke
      const model = models[modelIdx]

      if (provider && model) {
-        onSelect(
-          `${model} --provider ${provider.slug}${persistGlobal ? ' --global' : ` ${TUI_SESSION_MODEL_FLAG}`}`
-        )
+        onSelect(`${model} --provider ${provider.slug}${persistGlobal ? ' --global' : ` ${TUI_SESSION_MODEL_FLAG}`}`)
      } else {
        setStage('provider')
      }
@@ -24,7 +24,9 @@ export function QueuedMessages({ cols, queueEditIdx, queued, t }: QueuedMessages
  return (
    <Box flexDirection="column" marginTop={1}>
      <Text color={t.color.dim} dimColor>
-        queued ({queued.length}){queueEditIdx !== null ? ` · editing ${queueEditIdx + 1}` : ''}
+        {`queued (${queued.length})${
+          queueEditIdx !== null ? ` · editing ${queueEditIdx + 1} · Ctrl+X delete · Esc cancel` : ''
+        }`}
      </Text>

      {q.showLead && (
@@ -1,6 +1,6 @@
 import type { InputEvent, Key } from '@hermes/ink'
 import * as Ink from '@hermes/ink'
-import { useEffect, useMemo, useRef, useState } from 'react'
+import { type MutableRefObject, useEffect, useMemo, useRef, useState } from 'react'

 import { setInputSelection } from '../app/inputSelectionStore.js'
 import { readClipboardText, writeClipboardText } from '../lib/clipboard.js'
@@ -25,6 +25,7 @@ const DIM_OFF = `${ESC}[22m`
 const FWD_DEL_RE = new RegExp(`${ESC}\\[3(?:[~$^]|;)`)
 const PRINTABLE = /^[ -~\u00a0-\uffff]+$/
 const BRACKET_PASTE = new RegExp(`${ESC}?\\[20[01]~`, 'g')
+const MULTI_CLICK_MS = 500

 const invert = (s: string) => INV + s + INV_OFF
 const dim = (s: string) => DIM + s + DIM_OFF
@@ -287,6 +288,7 @@ export function TextInput({
  onPaste,
  onSubmit,
  mask,
+  mouseApiRef,
  placeholder = '',
  focus = true
 }: TextInputProps) {
@@ -309,6 +311,8 @@ export function TextInput({
  const pendingParentValue = useRef<string | null>(null)
  const localRenderTimer = useRef<ReturnType<typeof setTimeout> | null>(null)
  const lineWidthRef = useRef(stringWidth(value.includes('\n') ? value.slice(value.lastIndexOf('\n') + 1) : value))
+  const mouseAnchorRef = useRef<null | number>(null)
+  const lastClickRef = useRef<{ at: number; offset: number }>({ at: 0, offset: -1 })
  const undo = useRef<{ cursor: number; value: string }[]>([])
  const redo = useRef<{ cursor: number; value: string }[]>([])

@@ -336,6 +340,24 @@ export function TextInput({
    active: focus && termFocus && !selected
  })

+  // Hide the hardware cursor while a selection is active (prevents
+  // auto-wrap onto the next row when inverted text fills the column
+  // exactly) or when the terminal loses focus (suppresses the hollow-rect
+  // ghost most terminals draw at the parked position).
+  const hideHardwareCursor = focus && !!stdout?.isTTY && (!!selected || !termFocus)
+
+  useEffect(() => {
+    if (!hideHardwareCursor || !stdout) {
+      return
+    }
+
+    stdout.write('\x1b[?25l')
+
+    return () => {
+      stdout.write('\x1b[?25h')
+    }
+  }, [hideHardwareCursor, stdout])
+
  const nativeCursor = focus && termFocus && !selected && !!stdout?.isTTY

  const rendered = useMemo(() => {
@@ -374,12 +396,21 @@ export function TextInput({
      return
    }

+    const dropSel = () => {
+      if (!selRef.current) {
+        return
+      }
+
+      selRef.current = null
+      setSel(null)
+    }
+
    setInputSelection({
-      clear: () => {
-        if (selRef.current) {
-          selRef.current = null
-          setSel(null)
-        }
+      clear: dropSel,
+      collapseToEnd: () => {
+        dropSel()
+        setCur(vRef.current.length)
+        curRef.current = vRef.current.length
      },
      end: selected?.end ?? curRef.current,
      start: selected?.start ?? curRef.current,
@@ -605,6 +636,22 @@ export function TextInput({
    curRef.current = end
  }

+  const moveCursor = (next: number, extend = false) => {
+    const c = snapPos(vRef.current, next)
+    const anchor = selRef.current?.start ?? curRef.current
+
+    if (!extend || anchor === c) {
+      clearSel()
+    } else {
+      const nextSel = { end: c, start: anchor }
+      selRef.current = nextSel
+      setSel(nextSel)
+    }
+
+    setCur(c)
+    curRef.current = c
+  }
+
  const selRange = () => {
    const range = selRef.current

@@ -633,6 +680,59 @@ export function TextInput({
    commit(nextValue, nextCursor)
  }

+  const startMouseSelection = (next: number) => {
+    const c = snapPos(vRef.current, next)
+
+    mouseAnchorRef.current = c
+    selRef.current = { end: c, start: c }
+    setSel(null)
+    setCur(c)
+    curRef.current = c
+  }
+
+  const dragMouseSelection = (next: number) => {
+    if (mouseAnchorRef.current === null) {
+      return
+    }
+
+    const c = snapPos(vRef.current, next)
+    const range = { end: c, start: mouseAnchorRef.current }
+    selRef.current = range
+    setSel(range.start === range.end ? null : range)
+    setCur(c)
+    curRef.current = c
+  }
+
+  const endMouseSelection = () => {
+    mouseAnchorRef.current = null
+
+    const range = selRef.current
+
+    if (range && range.start === range.end) {
+      selRef.current = null
+      setSel(null)
+    }
+  }
+
+  const offsetAt = (e: { localCol?: number; localRow?: number }) =>
+    offsetFromPosition(display, e.localRow ?? 0, e.localCol ?? 0, columns)
+
+  const isMultiClickAt = (offset: number) => {
+    const now = Date.now()
+    const last = lastClickRef.current
+    lastClickRef.current = { at: now, offset }
+
+    return now - last.at < MULTI_CLICK_MS && offset === last.offset
+  }
+
+  if (mouseApiRef) {
+    mouseApiRef.current = {
+      dragAt: (row, col) => dragMouseSelection(offsetFromPosition(display, row, col, columns)),
+      end: endMouseSelection,
+      startAtBeginning: () => startMouseSelection(0)
+    }
+  }
+
  useInput(
    (inp: string, k: Key, event: InputEvent) => {
      const eventRaw = event.keypress.raw
@@ -674,9 +774,7 @@ export function TextInput({
        const next = lineNav(vRef.current, curRef.current, k.upArrow ? -1 : 1)

        if (next !== null) {
-          clearSel()
-          setCur(next)
-          curRef.current = next
+          moveCursor(next, k.shift)

          return
        }
@@ -684,13 +782,13 @@ export function TextInput({
        return
      }

-      // Ctrl+B is the documented voice-recording toggle (see platform.ts →
-      // isVoiceToggleKey). Pass it through so the app-level handler in
-      // useInputHandlers receives it instead of being swallowed here as
-      // either backward-word nav (line below) or a literal 'b' insertion.
+      // Ctrl chords claimed by useInputHandlers — pass through instead of
+      // letting them fall into readline-style nav or a literal char insert.
+      // Ctrl+B = voice toggle, Ctrl+X = delete queued message while editing.
      if (
        (k.ctrl && inp === 'c') ||
        (k.ctrl && inp === 'b') ||
+        (k.ctrl && inp === 'x') ||
        k.tab ||
        (k.shift && k.tab) ||
        k.pageUp ||
@@ -737,27 +835,37 @@ export function TextInput({
      }

      if (actionHome) {
-        clearSel()
        c = 0
+        moveCursor(c, k.shift)
+
+        return
      } else if (actionEnd) {
-        clearSel()
        c = v.length
+        moveCursor(c, k.shift)
+
+        return
      } else if (k.leftArrow) {
-        if (range && !wordMod) {
+        if (range && !wordMod && !k.shift) {
          clearSel()
          c = range.start
        } else {
-          clearSel()
          c = wordMod ? wordLeft(v, c) : prevPos(v, c)
        }
+
+        moveCursor(c, k.shift)
+
+        return
      } else if (k.rightArrow) {
-        if (range && !wordMod) {
+        if (range && !wordMod && !k.shift) {
          clearSel()
          c = range.end
        } else {
-          clearSel()
          c = wordMod ? wordRight(v, c) : nextPos(v, c)
        }
+
+        moveCursor(c, k.shift)
+
+        return
      } else if (wordMod && inp === 'b') {
        clearSel()
        c = wordLeft(v, c)
@@ -883,32 +991,74 @@ export function TextInput({

  return (
    <Box
-      onClick={(e: { localRow?: number; localCol?: number }) => {
+      onClick={(e: MouseEventLite) => {
        if (!focus) {
          return
        }

+        e.stopImmediatePropagation?.()
        clearSel()
-        const next = offsetFromPosition(display, e.localRow ?? 0, e.localCol ?? 0, columns)
+        const next = offsetAt(e)
        setCur(next)
        curRef.current = next
      }}
-      onMouseDown={(e: { button: number }) => {
-        // Right-click to paste: route through the same hotkey path as
-        // Alt+V so the composer's clipboard RPC (text or image) handles it.
-        if (!focus || e.button !== 2) {
+      onMouseDown={(e: MouseEventLite) => {
+        if (!focus) {
          return
        }

-        emitPaste({ cursor: curRef.current, hotkey: true, text: '', value: vRef.current })
+        // Right-click → route through the same path as Alt+V so the composer
+        // clipboard RPC (text or image) handles it.
+        if (e.button === 2) {
+          e.stopImmediatePropagation?.()
+          emitPaste({ cursor: curRef.current, hotkey: true, text: '', value: vRef.current })
+
+          return
+        }
+
+        if (e.button !== 0) {
+          return
+        }
+
+        e.stopImmediatePropagation?.()
+        const offset = offsetAt(e)
+
+        if (isMultiClickAt(offset)) {
+          mouseAnchorRef.current = null
+          selectAll()
+
+          return
+        }
+
+        startMouseSelection(offset)
+      }}
+      onMouseDrag={(e: MouseEventLite) => {
+        if (!focus || e.button !== 0 || mouseAnchorRef.current === null) {
+          return
+        }
+
+        e.stopImmediatePropagation?.()
+        dragMouseSelection(offsetAt(e))
+      }}
+      onMouseUp={(e: MouseEventLite) => {
+        e.stopImmediatePropagation?.()
+        endMouseSelection()
      }}
      ref={boxRef}
+      width={columns}
    >
      <Text wrap="wrap-char">{rendered}</Text>
    </Box>
  )
 }

+type MouseEventLite = {
+  button?: number
+  localCol?: number
+  localRow?: number
+  stopImmediatePropagation?: () => void
+}
+
 export interface PasteEvent {
  bracketed?: boolean
  cursor: number
@@ -921,6 +1071,7 @@ interface TextInputProps {
  columns?: number
  focus?: boolean
  mask?: string
+  mouseApiRef?: MutableRefObject<null | TextInputMouseApi>
  onChange: (v: string) => void
  onPaste?: (
    e: PasteEvent
@@ -929,3 +1080,9 @@ interface TextInputProps {
  placeholder?: string
  value: string
 }
+
+export interface TextInputMouseApi {
+  dragAt: (row: number, col: number) => void
+  end: () => void
+  startAtBeginning: () => void
+}
@@ -19,10 +19,11 @@ export const HOTKEYS: [string, string][] = [
  ...copyHotkeys,
  [action + '+D', 'exit'],
  [action + '+G / Alt+G', 'open $EDITOR (Alt+G fallback for VSCode/Cursor)'],
-  [action + '+L', 'new session (clear)'],
+  [action + '+L', 'redraw / repaint'],
  [paste + '+V / /paste', 'paste text; /paste attaches clipboard image'],
  ['Tab', 'apply completion'],
  ['↑/↓', 'completions / queue edit / history'],
+  ['Ctrl+X', 'delete the queued message you’re editing (Esc cancels edit)'],
  [action + '+A/E', 'home / end of line'],
  [action + '+Z / ' + action + '+Y', 'undo / redo input edits'],
  [action + '+W', 'delete word'],
@@ -288,7 +288,42 @@ export interface ModelOptionsResponse {
 // ── MCP ──────────────────────────────────────────────────────────────

 export interface ReloadMcpResponse {
-  ok?: boolean
+  status?: string
+}
+
+export interface ProcessStopResponse {
+  killed?: number
+}
+
+export interface BrowserManageResponse {
+  connected?: boolean
+  url?: string
+}
+
+export interface RollbackCheckpoint {
+  hash: string
+  message?: string
+  timestamp?: string
+}
+
+export interface RollbackListResponse {
+  checkpoints?: RollbackCheckpoint[]
+  enabled?: boolean
+}
+
+export interface RollbackDiffResponse {
+  diff?: string
+  rendered?: string
+  stat?: string
+}
+
+export interface RollbackRestoreResponse {
+  error?: string
+  history_removed?: number
+  message?: string
+  reason?: string
+  restored_to?: string
+  success?: boolean
 }

 // ── Subagent events ──────────────────────────────────────────────────
@@ -1,5 +1,17 @@
 import { useCallback, useRef, useState } from 'react'

+// Mutates `arr` in place; returned reference is the same input array, kept
+// so callers can chain. Use `Array.prototype.toSpliced` if you need a copy.
+export function removeAtInPlace<T>(arr: T[], i: number): T[] {
+  if (i < 0 || i >= arr.length) {
+    return arr
+  }
+
+  arr.splice(i, 1)
+
+  return arr
+}
+
 export function useQueue() {
  const queueRef = useRef<string[]>([])
  const [queuedDisplay, setQueuedDisplay] = useState<string[]>([])
@@ -36,6 +48,19 @@ export function useQueue() {
    [syncQueue]
  )

+  const removeQ = useCallback(
+    (i: number) => {
+      const before = queueRef.current.length
+
+      removeAtInPlace(queueRef.current, i)
+
+      if (queueRef.current.length !== before) {
+        syncQueue()
+      }
+    },
+    [syncQueue]
+  )
+
  return {
    dequeue,
    enqueue,
@@ -43,6 +68,7 @@ export function useQueue() {
    queueEditRef,
    queueRef,
    queuedDisplay,
+    removeQ,
    replaceQ,
    setQueueEdit,
    syncQueue
@@ -131,6 +131,7 @@ declare module '@hermes/ink' {
  }
  export function evictInkCaches(level?: EvictLevel): InkCacheSizes

+  export function forceRedraw(stdout?: NodeJS.WriteStream): boolean
  export function render(node: React.ReactNode, options?: NodeJS.WriteStream | RenderOptions): Instance

  export function useApp(): { readonly exit: (error?: Error) => void }
@@ -78,7 +78,14 @@ const CHAT_NAV_ITEM: NavItem = {
  icon: Terminal,
 };

-/** Built-in routes except /chat (only with `hermes dashboard --tui`). */
+/**
+ * Built-in routes except /chat.  Chat is rendered persistently (outside
+ * <Routes>) when embedded — see ChatPageHost below — so the PTY child,
+ * WebSocket, and xterm instance survive when the user visits another tab
+ * and comes back.  A `display:none` toggle hides the terminal without
+ * unmounting.  Routing still owns the URL so /chat deep-links, browser
+ * back/forward, and nav highlight keep working.
+ */
 const BUILTIN_ROUTES_CORE: Record<string, ComponentType> = {
  "/": RootRedirect,
  "/sessions": SessionsPage,
@@ -91,6 +98,14 @@ const BUILTIN_ROUTES_CORE: Record<string, ComponentType> = {
  "/docs": DocsPage,
 };

+// Route placeholder for /chat.  The persistent ChatPage host (rendered
+// outside <Routes> when embedded chat is on) paints on top; this empty
+// element just claims the path so the `*` catch-all redirect doesn't
+// fire when the user navigates to /chat.
+function ChatRouteSink() {
+  return null;
+}
+
 const BUILTIN_NAV_REST: NavItem[] = [
  {
    path: "/sessions",
@@ -240,7 +255,7 @@ function buildRoutes(
 export default function App() {
  const { t } = useI18n();
  const { pathname } = useLocation();
-  const { manifests } = usePlugins();
+  const { manifests, loading: pluginsLoading } = usePlugins();
  const { theme } = useTheme();
  const [mobileOpen, setMobileOpen] = useState(false);
  const closeMobile = useCallback(() => setMobileOpen(false), []);
@@ -249,10 +264,32 @@ export default function App() {
  const isChatRoute = normalizedPath === "/chat";
  const embeddedChat = isDashboardEmbeddedChatEnabled();

+  // A plugin can replace the built-in /chat page via `tab.override: "/chat"`
+  // in its manifest.  When one does, `buildRoutes` already swaps the route
+  // element for <PluginPage /> — but we also have to suppress the
+  // persistent ChatPage host below, or the plugin's page and the built-in
+  // terminal would paint on top of each other.  The override is niche
+  // (nothing ships overriding /chat today) but it's an advertised
+  // extension point, so preserve the pre-persistence contract: when a
+  // plugin owns /chat, the built-in chat UI is entirely absent.
+  //
+  // Waiting on `pluginsLoading` is load-bearing: manifests arrive
+  // asynchronously from /api/dashboard/plugins, so on initial render
+  // `chatOverriddenByPlugin` is always false.  Without the loading
+  // gate, the persistent host would mount, spawn a PTY, and THEN get
+  // yanked out from under the user when the plugin's manifest resolves
+  // — killing the session mid-paint.  Delaying host mount by the
+  // plugin-load window (typically <50ms, worst case 2s safety timeout)
+  // is the cheaper trade-off.
+  const chatOverriddenByPlugin = useMemo(
+    () => manifests.some((m) => m.tab.override === "/chat"),
+    [manifests],
+  );
+
  const builtinRoutes = useMemo(
    () => ({
      ...BUILTIN_ROUTES_CORE,
-      ...(embeddedChat ? { "/chat": ChatPage } : {}),
+      ...(embeddedChat ? { "/chat": ChatRouteSink } : {}),
    }),
    [embeddedChat],
  );
@@ -519,6 +556,40 @@ export default function App() {
                    element={<Navigate to="/sessions" replace />}
                  />
                </Routes>
+
+                {/*
+                  Persistent chat host: always mounted when `hermes dashboard
+                  --tui` is active, visibility toggled by route.  Keeping the
+                  tree alive preserves the xterm instance, its WebSocket, and
+                  the PTY child that backs the TUI session — so navigating to
+                  another tab and returning lands the user in the same
+                  conversation instead of spawning a fresh session.
+
+                  The host sits alongside <Routes> (not inside one) because
+                  React Router unmounts route elements on path change, which
+                  is exactly the destructive lifecycle we're avoiding.
+
+                  Trade-off worth knowing about: while hidden, ChatPage still
+                  holds a PTY child + WebSocket + xterm instance for the
+                  dashboard's full lifetime.  The WS keeps delivering bytes
+                  and xterm keeps parsing them into a display:none host
+                  (cheap — no paint work, but not free).  If this becomes a
+                  resource problem we can pause `term.write` when !isActive
+                  or idle-disconnect after N minutes hidden; neither is
+                  shipped today.
+                */}
+                {embeddedChat && !pluginsLoading && !chatOverriddenByPlugin && (
+                  <div
+                    data-chat-active={isChatRoute ? "true" : "false"}
+                    className={cn(
+                      "min-h-0 min-w-0",
+                      isChatRoute ? "flex flex-1 flex-col" : "hidden",
+                    )}
+                    aria-hidden={!isChatRoute}
+                  >
+                    <ChatPage isActive={isChatRoute} />
+                  </div>
+                )}
              </div>
              <PluginSlot name="post-main" />
            </div>
@@ -101,11 +101,15 @@ function terminalLineHeightForWidth(layoutWidthPx: number): number {
  return layoutWidthPx < 1024 ? 1.02 : 1.15;
 }

-export default function ChatPage() {
+export default function ChatPage({ isActive = true }: { isActive?: boolean }) {
  const hostRef = useRef<HTMLDivElement | null>(null);
  const termRef = useRef<Terminal | null>(null);
  const fitRef = useRef<FitAddon | null>(null);
  const wsRef = useRef<WebSocket | null>(null);
+  // Exposed to the main metrics-sync effect so it can refit the terminal
+  // the moment `isActive` flips back to true (display:none → display:flex
+  // collapses the host's box, so ResizeObserver never fires on return).
+  const syncMetricsRef = useRef<(() => void) | null>(null);
  const [searchParams] = useSearchParams();
  // Lazy-init: the missing-token check happens at construction so the effect
  // body doesn't have to setState (React 19's set-state-in-effect rule).
@@ -116,7 +120,16 @@ export default function ChatPage() {
  );
  const [copyState, setCopyState] = useState<"idle" | "copied">("idle");
  const copyResetRef = useRef<ReturnType<typeof setTimeout> | null>(null);
-  const [mobilePanelOpen, setMobilePanelOpen] = useState(false);
+  // Raw state for the mobile side-sheet + a derived value that force-
+  // closes whenever the chat tab isn't active.  The *derived* value is
+  // what side-effects (body-scroll lock, keydown listener, portal render)
+  // key on — that way switching to another tab triggers the effect's
+  // cleanup, releasing the scroll-lock on /sessions etc.  Returning to
+  // /chat re-runs the effect (derived flips back to true) and re-locks.
+  // Keying on the raw state would leak the body.overflow="hidden" across
+  // tabs because the dep wouldn't change on tab switch.
+  const [mobilePanelOpenRaw, setMobilePanelOpen] = useState(false);
+  const mobilePanelOpen = isActive && mobilePanelOpenRaw;
  const { setEnd } = usePageHeader();
  const { t } = useI18n();
  const closeMobilePanel = useCallback(() => setMobilePanelOpen(false), []);
@@ -168,6 +181,12 @@ export default function ChatPage() {
  }, []);

  useEffect(() => {
+    // When hidden (non-chat tab) we must not register the header button —
+    // another page owns the header's end slot at that point.
+    if (!isActive) {
+      setEnd(null);
+      return;
+    }
    if (!narrow) {
      setEnd(null);
      return;
@@ -191,7 +210,7 @@ export default function ChatPage() {
      </button>,
    );
    return () => setEnd(null);
-  }, [narrow, mobilePanelOpen, modelToolsLabel, setEnd]);
+  }, [isActive, narrow, mobilePanelOpen, modelToolsLabel, setEnd]);

  const handleCopyLast = () => {
    const ws = wsRef.current;
@@ -392,6 +411,12 @@ export default function ChatPage() {

    let metricsDebounce: ReturnType<typeof setTimeout> | null = null;
    const syncTerminalMetrics = () => {
+      // display:none hosts have clientWidth/Height = 0, which fit() turns
+      // into a 1x1 terminal.  Skip entirely while hidden; the visibility
+      // effect below runs another fit as soon as the tab is shown again.
+      if (!host.isConnected || host.clientWidth <= 0 || host.clientHeight <= 0) {
+        return;
+      }
      const w = terminalTierWidthPx(host);
      const nextSize = terminalFontSizeForWidth(w);
      const nextLh = terminalLineHeightForWidth(w);
@@ -422,6 +447,7 @@ export default function ChatPage() {
        wsRef.current.send(`\x1b[RESIZE:${term.cols};${term.rows}]`);
      }
    };
+    syncMetricsRef.current = syncTerminalMetrics;

    const scheduleSyncTerminalMetrics = () => {
      if (metricsDebounce) clearTimeout(metricsDebounce);
@@ -565,6 +591,7 @@ export default function ChatPage() {

    return () => {
      unmounting = true;
+      syncMetricsRef.current = null;
      onDataDisposable.dispose();
      onResizeDisposable.dispose();
      if (metricsDebounce) clearTimeout(metricsDebounce);
@@ -593,6 +620,51 @@ export default function ChatPage() {
    };
  }, [channel]);

+  // When the user returns to the chat tab (isActive: false → true), the
+  // terminal host just transitioned from display:none to display:flex.
+  // ResizeObserver won't fire on that kind of style-driven box change —
+  // xterm thinks its grid is still whatever it was when the tab was
+  // hidden (or 0×0, if it was hidden before first fit).  Force a refit
+  // after two animation frames so layout has committed.
+  //
+  // Focus handling: we only steal focus back into the terminal when
+  // nothing else inside ChatPage was holding it (typically the first
+  // activation after mount, where document.activeElement is <body>; or
+  // a return after the user had been typing in the terminal, where
+  // focus was already on the xterm textarea before the tab got hidden
+  // and has since fallen back to <body>).  If the user had clicked
+  // into the sidebar (model picker, tool-call entry) before switching
+  // tabs, we must not yank focus away from wherever they left it when
+  // they come back — that's a surprise and an a11y foot-gun.
+  useEffect(() => {
+    if (!isActive) return;
+    let raf1 = 0;
+    let raf2 = 0;
+    raf1 = requestAnimationFrame(() => {
+      raf1 = 0;
+      raf2 = requestAnimationFrame(() => {
+        raf2 = 0;
+        syncMetricsRef.current?.();
+        const host = hostRef.current;
+        const active = typeof document !== "undefined"
+          ? document.activeElement
+          : null;
+        const focusIsElsewhereInChatPage =
+          active !== null &&
+          active !== document.body &&
+          host !== null &&
+          !host.contains(active);
+        if (!focusIsElsewhereInChatPage) {
+          termRef.current?.focus();
+        }
+      });
+    });
+    return () => {
+      if (raf1) cancelAnimationFrame(raf1);
+      if (raf2) cancelAnimationFrame(raf2);
+    };
+  }, [isActive]);
+
  // Layout:
  //   outer flex column — sits inside the dashboard's content area
  //   row split — terminal pane (flex-1) + sidebar (fixed width, lg+)
@@ -612,6 +684,7 @@ export default function ChatPage() {
  // dashboard column uses `relative z-2`, which traps `position:fixed`
  // descendants below those layers (see Toast.tsx).
  const mobileModelToolsPortal =
+    isActive &&
    narrow &&
    portalRoot &&
    createPortal(
@@ -599,6 +599,93 @@ The `preStart` script creates a GC root at `${stateDir}/.gc-root` pointing to th

 ---

+## Plugins
+
+The NixOS module supports declarative plugin installation — no imperative `hermes plugins install` needed.
+
+### Directory Plugins (`extraPlugins`)
+
+For plugins that are just a source tree with `plugin.yaml` + `__init__.py` (e.g., [hermes-lcm](https://github.com/stephenschoettler/hermes-lcm)):
+
+```nix
+services.hermes-agent.extraPlugins = [
+  (pkgs.fetchFromGitHub {
+    owner = "stephenschoettler";
+    repo = "hermes-lcm";
+    rev = "v0.7.0";
+    hash = "sha256-...";
+  })
+];
+```
+
+Plugins are symlinked into `$HERMES_HOME/plugins/` at activation time. Hermes discovers them via its normal directory scan. Removing a plugin from the list and running `nixos-rebuild switch` removes the symlink.
+
+### Entry-Point Plugins (`extraPythonPackages`)
+
+For pip-packaged plugins that register via `[project.entry-points."hermes_agent.plugins"]` (e.g., [rtk-hermes](https://github.com/ogallotti/rtk-hermes)):
+
+```nix
+services.hermes-agent.extraPythonPackages = [
+  (pkgs.python312Packages.buildPythonPackage {
+    pname = "rtk-hermes";
+    version = "1.0.0";
+    src = pkgs.fetchFromGitHub {
+      owner = "ogallotti";
+      repo = "rtk-hermes";
+      rev = "v1.0.0";
+      hash = "sha256-...";
+    };
+    format = "pyproject";
+    build-system = [ pkgs.python312Packages.setuptools ];
+  })
+];
+```
+
+The package's `site-packages` is added to PYTHONPATH in the hermes wrapper. `importlib.metadata` discovers the entry point at session start.
+
+### Combining Both
+
+A directory plugin with third-party Python dependencies needs both options:
+
+```nix
+services.hermes-agent = {
+  extraPlugins = [ my-plugin-src ];          # plugin source
+  extraPythonPackages = [ pkgs.python312Packages.redis ];  # its Python dep
+  extraPackages = [ pkgs.redis ];            # system binary it needs
+};
+```
+
+### Using the Overlay
+
+External flakes can override the package directly:
+
+```nix
+{
+  inputs.hermes-agent.url = "github:NousResearch/hermes-agent";
+  outputs = { hermes-agent, nixpkgs, ... }: {
+    nixpkgs.overlays = [ hermes-agent.overlays.default ];
+    # Then: pkgs.hermes-agent.override { extraPythonPackages = [...]; }
+  };
+}
+```
+
+### Plugin Configuration
+
+Plugins still need to be enabled in `config.yaml`. Add them via the declarative settings:
+
+```nix
+services.hermes-agent.settings.plugins.enabled = [
+  "hermes-lcm"
+  "rtk-rewrite"
+];
+```
+
+:::note
+A build-time collision check prevents plugin packages from shadowing core hermes dependencies. If a plugin provides a package already in the sealed venv, `nixos-rebuild` fails with a clear error.
+:::
+
+---
+
 ## Development

 ### Dev Shell
@@ -721,6 +808,8 @@ nix build .#checks.x86_64-linux.config-roundtrip    # merge script preserves use
 |---|---|---|---|
 | `extraArgs` | `listOf str` | `[]` | Extra args for `hermes gateway` |
 | `extraPackages` | `listOf package` | `[]` | Extra packages on service PATH (native mode only) |
+| `extraPlugins` | `listOf package` | `[]` | Directory plugin packages to symlink into `$HERMES_HOME/plugins/`. Each must contain `plugin.yaml` |
+| `extraPythonPackages` | `listOf package` | `[]` | Python packages added to PYTHONPATH for entry-point plugin discovery. Build with `python312Packages` |
 | `restart` | `str` | `"always"` | systemd `Restart=` policy |
 | `restartSec` | `int` | `5` | systemd `RestartSec=` value |

@@ -66,13 +66,30 @@ hermes model

 Good defaults:

-| Situation | Recommended path |
-|---|---|
-| Least friction | Nous Portal or OpenRouter |
-| You already have Claude or Codex auth | Anthropic or OpenAI Codex |
-| You want local/private inference | Ollama or any custom OpenAI-compatible endpoint |
-| You want multi-provider routing | OpenRouter |
-| You have a custom GPU server | vLLM, SGLang, LiteLLM, or any OpenAI-compatible endpoint |
+| Provider | What it is | How to set up |
+|----------|-----------|---------------|
+| **Nous Portal** | Subscription-based, zero-config | OAuth login via `hermes model` |
+| **OpenAI Codex** | ChatGPT OAuth, uses Codex models | Device code auth via `hermes model` |
+| **Anthropic** | Claude models directly (Pro/Max or API key) | `hermes model` with Claude Code auth, or an Anthropic API key |
+| **OpenRouter** | Multi-provider routing across many models | Enter your API key |
+| **Z.AI** | GLM / Zhipu-hosted models | Set `GLM_API_KEY` / `ZAI_API_KEY` |
+| **Kimi / Moonshot** | Moonshot-hosted coding and chat models | Set `KIMI_API_KEY` |
+| **Kimi / Moonshot China** | China-region Moonshot endpoint | Set `KIMI_CN_API_KEY` |
+| **Arcee AI** | Trinity models | Set `ARCEEAI_API_KEY` |
+| **GMI Cloud** | Multi-model direct API | Set `GMI_API_KEY` |
+| **MiniMax** | International MiniMax endpoint | Set `MINIMAX_API_KEY` |
+| **MiniMax China** | China-region MiniMax endpoint | Set `MINIMAX_CN_API_KEY` |
+| **Alibaba Cloud** | Qwen models via DashScope | Set `DASHSCOPE_API_KEY` |
+| **Hugging Face** | 20+ open models via unified router (Qwen, DeepSeek, Kimi, etc.) | Set `HF_TOKEN` |
+| **Kilo Code** | KiloCode-hosted models | Set `KILOCODE_API_KEY` |
+| **OpenCode Zen** | Pay-as-you-go access to curated models | Set `OPENCODE_ZEN_API_KEY` |
+| **OpenCode Go** | $10/month subscription for open models | Set `OPENCODE_GO_API_KEY` |
+| **DeepSeek** | Direct DeepSeek API access | Set `DEEPSEEK_API_KEY` |
+| **NVIDIA NIM** | Nemotron models via build.nvidia.com or local NIM | Set `NVIDIA_API_KEY` (optional: `NVIDIA_BASE_URL`) |
+| **GitHub Copilot** | GitHub Copilot subscription (GPT-5.x, Claude, Gemini, etc.) | OAuth via `hermes model`, or `COPILOT_GITHUB_TOKEN` / `GH_TOKEN` |
+| **GitHub Copilot ACP** | Copilot ACP agent backend (spawns local `copilot` CLI) | `hermes model` (requires `copilot` CLI + `copilot login`) |
+| **Vercel AI Gateway** | Vercel AI Gateway routing | Set `AI_GATEWAY_API_KEY` |
+| **Custom Endpoint** | VLLM, SGLang, Ollama, or any OpenAI-compatible API | Set base URL + API key |

 For most first-time users: choose a provider, accept the defaults unless you know why you're changing them. The full provider catalog with env vars and setup steps lives on the [Providers](../integrations/providers.md) page.

@@ -633,6 +633,43 @@ pip install hermes-plugin-calculator
 # Plugin auto-discovered on next hermes startup
 ```

+### Distribute for NixOS
+
+NixOS users can install your plugin declaratively if you provide a `pyproject.toml` with entry points:
+
+**Entry-point plugins** (recommended for distribution):
+```nix
+# User's configuration.nix
+services.hermes-agent.extraPythonPackages = [
+  (pkgs.python312Packages.buildPythonPackage {
+    pname = "my-plugin";
+    version = "1.0.0";
+    src = pkgs.fetchFromGitHub {
+      owner = "you";
+      repo = "hermes-my-plugin";
+      rev = "v1.0.0";
+      hash = "sha256-...";  # nix-prefetch-url --unpack
+    };
+    format = "pyproject";
+    build-system = [ pkgs.python312Packages.setuptools ];
+  })
+];
+```
+
+**Directory plugins** (no `pyproject.toml` needed):
+```nix
+services.hermes-agent.extraPlugins = [
+  (pkgs.fetchFromGitHub {
+    owner = "you";
+    repo = "hermes-my-plugin";
+    rev = "v1.0.0";
+    hash = "sha256-...";
+  })
+];
+```
+
+See the [Nix Setup guide](/docs/getting-started/nix-setup#plugins) for complete documentation including overlay usage and collision checking.
+
 ## Common mistakes

 **Handler doesn't return JSON string:**
@@ -25,6 +25,7 @@ You need at least one way to connect to an LLM. Use `hermes model` to switch pro
 | **Kimi / Moonshot** | `KIMI_API_KEY` in `~/.hermes/.env` (provider: `kimi-coding`) |
 | **Kimi / Moonshot (China)** | `KIMI_CN_API_KEY` in `~/.hermes/.env` (provider: `kimi-coding-cn`; aliases: `kimi-cn`, `moonshot-cn`) |
 | **Arcee AI** | `ARCEEAI_API_KEY` in `~/.hermes/.env` (provider: `arcee`; aliases: `arcee-ai`, `arceeai`) |
+| **GMI Cloud** | `GMI_API_KEY` in `~/.hermes/.env` (provider: `gmi`; aliases: `gmi-cloud`, `gmicloud`) |
 | **MiniMax** | `MINIMAX_API_KEY` in `~/.hermes/.env` (provider: `minimax`) |
 | **MiniMax China** | `MINIMAX_CN_API_KEY` in `~/.hermes/.env` (provider: `minimax-cn`) |
 | **Alibaba Cloud** | `DASHSCOPE_API_KEY` in `~/.hermes/.env` (provider: `alibaba`, aliases: `dashscope`, `qwen`) |
@@ -250,7 +251,7 @@ model:
 | `HERMES_COPILOT_ACP_COMMAND` | Override the Copilot CLI binary path (default: `copilot`) |
 | `HERMES_COPILOT_ACP_ARGS` | Override ACP args (default: `--acp --stdio`) |

-### First-Class Chinese AI Providers
+### First-Class API-Key Providers

 These providers have built-in support with dedicated provider IDs. Set the API key and use `--provider` to select:

@@ -286,16 +287,21 @@ hermes chat --provider xiaomi --model mimo-v2-pro
 # Arcee AI (Trinity models)
 hermes chat --provider arcee --model trinity-large-thinking
 # Requires: ARCEEAI_API_KEY in ~/.hermes/.env
+
+# GMI Cloud
+# Use the exact model ID returned by GMI's /v1/models endpoint.
+hermes chat --provider gmi --model zai-org/GLM-5.1-FP8
+# Requires: GMI_API_KEY in ~/.hermes/.env
 ```

 Or set the provider permanently in `config.yaml`:
 ```yaml
 model:
-  provider: "zai"       # or: kimi-coding, kimi-coding-cn, minimax, minimax-cn, alibaba, xiaomi, arcee
-  default: "glm-5"
+  provider: "gmi"
+  default: "zai-org/GLM-5.1-FP8"
 ```

-Base URLs can be overridden with `GLM_BASE_URL`, `KIMI_BASE_URL`, `MINIMAX_BASE_URL`, `MINIMAX_CN_BASE_URL`, `DASHSCOPE_BASE_URL`, or `XIAOMI_BASE_URL` environment variables.
+Base URLs can be overridden with `GLM_BASE_URL`, `KIMI_BASE_URL`, `MINIMAX_BASE_URL`, `MINIMAX_CN_BASE_URL`, `DASHSCOPE_BASE_URL`, `XIAOMI_BASE_URL`, or `GMI_BASE_URL` environment variables.

 :::note Z.AI Endpoint Auto-Detection
 When using the Z.AI / GLM provider, Hermes automatically probes multiple endpoints (global, China, coding variants) to find one that accepts your API key. You don't need to set `GLM_BASE_URL` manually — the working endpoint is detected and cached automatically.
@@ -1172,7 +1178,7 @@ fallback_model:

 When activated, the fallback swaps the model and provider mid-session without losing your conversation. It fires **at most once** per session.

-Supported providers: `openrouter`, `nous`, `openai-codex`, `copilot`, `copilot-acp`, `anthropic`, `gemini`, `google-gemini-cli`, `qwen-oauth`, `huggingface`, `zai`, `kimi-coding`, `kimi-coding-cn`, `minimax`, `minimax-cn`, `deepseek`, `nvidia`, `xai`, `ollama-cloud`, `bedrock`, `ai-gateway`, `opencode-zen`, `opencode-go`, `kilocode`, `xiaomi`, `arcee`, `alibaba`, `custom`.
+Supported providers: `openrouter`, `nous`, `openai-codex`, `copilot`, `copilot-acp`, `anthropic`, `gemini`, `google-gemini-cli`, `qwen-oauth`, `huggingface`, `zai`, `kimi-coding`, `kimi-coding-cn`, `minimax`, `minimax-cn`, `deepseek`, `nvidia`, `xai`, `ollama-cloud`, `bedrock`, `ai-gateway`, `opencode-zen`, `opencode-go`, `kilocode`, `xiaomi`, `arcee`, `gmi`, `alibaba`, `custom`.

 :::tip
 Fallback is configured exclusively through `config.yaml` — there are no environment variables for it. For full details on when it triggers, supported providers, and how it interacts with auxiliary tasks and delegation, see [Fallback Providers](/docs/user-guide/features/fallback-providers).
@@ -85,7 +85,7 @@ Common options:
 | `-q`, `--query "..."` | One-shot, non-interactive prompt. |
 | `-m`, `--model <model>` | Override the model for this run. |
 | `-t`, `--toolsets <csv>` | Enable a comma-separated set of toolsets. |
-| `--provider <provider>` | Force a provider: `auto`, `openrouter`, `nous`, `openai-codex`, `copilot-acp`, `copilot`, `anthropic`, `gemini`, `google-gemini-cli`, `huggingface`, `zai`, `kimi-coding`, `kimi-coding-cn`, `minimax`, `minimax-cn`, `kilocode`, `xiaomi`, `arcee`, `alibaba`, `deepseek`, `nvidia`, `ollama-cloud`, `xai` (alias `grok`), `qwen-oauth`, `bedrock`, `opencode-zen`, `opencode-go`, `ai-gateway`, `azure-foundry`. |
+| `--provider <provider>` | Force a provider: `auto`, `openrouter`, `nous`, `openai-codex`, `copilot-acp`, `copilot`, `anthropic`, `gemini`, `google-gemini-cli`, `huggingface`, `zai`, `kimi-coding`, `kimi-coding-cn`, `minimax`, `minimax-cn`, `kilocode`, `xiaomi`, `arcee`, `gmi`, `alibaba`, `deepseek`, `nvidia`, `ollama-cloud`, `xai` (alias `grok`), `qwen-oauth`, `bedrock`, `opencode-zen`, `opencode-go`, `ai-gateway`, `azure-foundry`. |
 | `-s`, `--skills <name>` | Preload one or more skills for the session (can be repeated or comma-separated). |
 | `-v`, `--verbose` | Verbose output. |
 | `-Q`, `--quiet` | Programmatic mode: suppress banner/spinner/tool previews. |
@@ -36,6 +36,8 @@ All variables go in `~/.hermes/.env`. You can also set them with `hermes config
 | `KIMI_CN_API_KEY` | Kimi / Moonshot China API key ([moonshot.cn](https://platform.moonshot.cn)) |
 | `ARCEEAI_API_KEY` | Arcee AI API key ([chat.arcee.ai](https://chat.arcee.ai/)) |
 | `ARCEE_BASE_URL` | Override Arcee base URL (default: `https://api.arcee.ai/api/v1`) |
+| `GMI_API_KEY` | GMI Cloud API key ([gmicloud.ai](https://www.gmicloud.ai/)) |
+| `GMI_BASE_URL` | Override GMI Cloud base URL (default: `https://api.gmi-serving.com/v1`) |
 | `MINIMAX_API_KEY` | MiniMax API key — global endpoint ([minimax.io](https://www.minimax.io)) |
 | `MINIMAX_BASE_URL` | Override MiniMax base URL (default: `https://api.minimax.io/anthropic` — Hermes uses MiniMax's Anthropic Messages-compatible endpoint) |
 | `MINIMAX_CN_API_KEY` | MiniMax API key — China endpoint ([minimaxi.com](https://www.minimaxi.com)) |
@@ -89,7 +91,7 @@ For native Anthropic auth, Hermes prefers Claude Code's own credential files whe

 | Variable | Description |
 |----------|-------------|
-| `HERMES_INFERENCE_PROVIDER` | Override provider selection: `auto`, `openrouter`, `nous`, `openai-codex`, `copilot`, `copilot-acp`, `anthropic`, `huggingface`, `zai`, `kimi-coding`, `kimi-coding-cn`, `minimax`, `minimax-cn`, `kilocode`, `xiaomi`, `arcee`, `alibaba`, `deepseek`, `nvidia`, `ollama-cloud`, `xai` (alias `grok`), `google-gemini-cli`, `qwen-oauth`, `bedrock`, `opencode-zen`, `opencode-go`, `ai-gateway` (default: `auto`) |
+| `HERMES_INFERENCE_PROVIDER` | Override provider selection: `auto`, `custom`, `openrouter`, `nous`, `openai-codex`, `copilot`, `copilot-acp`, `anthropic`, `huggingface`, `gemini`, `zai`, `kimi-coding`, `kimi-coding-cn`, `minimax`, `minimax-cn`, `kilocode`, `xiaomi`, `arcee`, `gmi`, `alibaba`, `deepseek`, `nvidia`, `ollama-cloud`, `xai` (alias `grok`), `google-gemini-cli`, `qwen-oauth`, `bedrock`, `opencode-zen`, `opencode-go`, `ai-gateway` (default: `auto`) |
 | `HERMES_PORTAL_BASE_URL` | Override Nous Portal URL (for development/testing) |
 | `NOUS_INFERENCE_BASE_URL` | Override Nous inference API URL |
 | `HERMES_NOUS_MIN_KEY_TTL_SECONDS` | Min agent key TTL before re-mint (default: 1800 = 30min) |
@@ -54,7 +54,6 @@ hermes skills uninstall <skill-name>
 | [**blender-mcp**](/docs/user-guide/skills/optional/creative/creative-blender-mcp) | Control Blender directly from Hermes via socket connection to the blender-mcp addon. Create 3D objects, materials, animations, and run arbitrary Blender Python (bpy) code. Use when user wants to create or modify anything in Blender. |
 | [**concept-diagrams**](/docs/user-guide/skills/optional/creative/creative-concept-diagrams) | Generate flat, minimal light/dark-aware SVG diagrams as standalone HTML files, using a unified educational visual language with 9 semantic color ramps, sentence-case typography, and automatic dark mode. Best suited for educational and no... |
 | [**meme-generation**](/docs/user-guide/skills/optional/creative/creative-meme-generation) | Generate real meme images by picking a template and overlaying text with Pillow. Produces actual .png meme files. |
-| [**touchdesigner-mcp**](/docs/user-guide/skills/optional/creative/creative-touchdesigner-mcp) | Control a running TouchDesigner instance via twozero MCP — create operators, set parameters, wire connections, execute Python, build real-time visuals. 36 native tools. |

 ## devops

@@ -45,6 +45,7 @@ If a skill is missing from this list but present in the repo, the catalog is reg
 | [`pixel-art`](/docs/user-guide/skills/bundled/creative/creative-pixel-art) | Convert images into retro pixel art with hardware-accurate palettes (NES, Game Boy, PICO-8, C64, etc.), and animate them into short videos. Presets cover arcade, SNES, and 10+ era-correct looks. Use `clarify` to let the user pick a style... | `creative/pixel-art` |
 | [`popular-web-designs`](/docs/user-guide/skills/bundled/creative/creative-popular-web-designs) | 54 production-quality design systems extracted from real websites. Load a template to generate HTML/CSS that matches the visual identity of sites like Stripe, Linear, Vercel, Notion, Airbnb, and more. Each template includes colors, typog... | `creative/popular-web-designs` |
 | [`songwriting-and-ai-music`](/docs/user-guide/skills/bundled/creative/creative-songwriting-and-ai-music) | Songwriting craft, AI music generation prompts (Suno focus), parody/adaptation techniques, phonetic tricks, and lessons learned. These are tools and ideas, not rules. Break any of them when the art calls for it. | `creative/songwriting-and-ai-music` |
+| [`touchdesigner-mcp`](/docs/user-guide/skills/bundled/creative/creative-touchdesigner-mcp) | Control a running TouchDesigner instance via twozero MCP — create operators, set parameters, wire connections, execute Python, build real-time visuals. 36 native tools. | `creative/touchdesigner-mcp` |

 ## data-science

@@ -801,6 +801,17 @@ These options apply to **auxiliary task configs** (`auxiliary:`, `compression:`,
 | `"codex"` | Force Codex OAuth (ChatGPT account). Supports vision (gpt-5.3-codex). | `hermes model` → Codex |
 | `"main"` | Use your active custom/main endpoint. This can come from `OPENAI_BASE_URL` + `OPENAI_API_KEY` or from a custom endpoint saved via `hermes model` / `config.yaml`. Works with OpenAI, local models, or any OpenAI-compatible API. **Auxiliary tasks only — not valid for `model.provider`.** | Custom endpoint credentials + base URL |

+Direct API-key providers from the main provider catalog also work here when you want side tasks to bypass your default router. `gmi` is valid once `GMI_API_KEY` is configured:
+
+```yaml
+auxiliary:
+  compression:
+    provider: "gmi"
+    model: "anthropic/claude-opus-4.6"
+```
+
+For GMI auxiliary routing, use the exact model ID returned by GMI's `/v1/models` endpoint.
+
 ### Common Setups

 **Using a direct custom endpoint** (clearer than `provider: "main"` for local/self-hosted APIs):
@@ -105,6 +105,63 @@ The `/opt/data` volume is the single source of truth for all Hermes state. It ma
 Never run two Hermes **gateway** containers against the same data directory simultaneously — session files and memory stores are not designed for concurrent write access. Running a dashboard container alongside the gateway is safe since the dashboard only reads data.
 :::

+## Multi-profile support
+
+Hermes supports [multiple profiles](../reference/profile-commands.md) — separate `~/.hermes/` directories that let you run independent agents (different SOUL, skills, memory, sessions, credentials) from a single installation. **When running under Docker, using Hermes' built-in multi-profile feature is not recommended.**
+
+Instead, the recommended pattern is **one container per profile**, with each container bind-mounting its own host directory as `/opt/data`:
+
+```sh
+# Work profile
+docker run -d \
+  --name hermes-work \
+  --restart unless-stopped \
+  -v ~/.hermes-work:/opt/data \
+  -p 8642:8642 \
+  nousresearch/hermes-agent gateway run
+
+# Personal profile
+docker run -d \
+  --name hermes-personal \
+  --restart unless-stopped \
+  -v ~/.hermes-personal:/opt/data \
+  -p 8643:8642 \
+  nousresearch/hermes-agent gateway run
+```
+
+Why separate containers over profiles in Docker:
+
+- **Isolation** — each container has its own filesystem, process table, and resource limits. A crash, dependency change, or runaway session in one profile can't affect another.
+- **Independent lifecycle** — upgrade, restart, pause, or roll back each agent separately (`docker restart hermes-work` leaves `hermes-personal` untouched).
+- **Clean port and network separation** — each gateway binds its own host port; there's no risk of cross-talk between chat platforms or API servers.
+- **Simpler mental model** — the container *is* the profile. Backups, migrations, and permissions all follow the bind-mounted directory, with no extra `--profile` flags to remember.
+- **Avoids concurrent-write risk** — the warning above about never running two gateways against the same data directory still applies to profiles within a single container.
+
+In Docker Compose, this just means declaring one service per profile with distinct `container_name`, `volumes`, and `ports`:
+
+```yaml
+services:
+  hermes-work:
+    image: nousresearch/hermes-agent:latest
+    container_name: hermes-work
+    restart: unless-stopped
+    command: gateway run
+    ports:
+      - "8642:8642"
+    volumes:
+      - ~/.hermes-work:/opt/data
+
+  hermes-personal:
+    image: nousresearch/hermes-agent:latest
+    container_name: hermes-personal
+    restart: unless-stopped
+    command: gateway run
+    ports:
+      - "8643:8642"
+    volumes:
+      - ~/.hermes-personal:/opt/data
+```
+
 ## Environment variable forwarding

 API keys are read from `/opt/data/.env` inside the container. You can also pass environment variables directly:
--- a/Show More
+++ b/Show More