Inspired by Claude Code: tighten dangerous-command detection

Port three hardening patches from Claude Code 2.1.113's expanded deny rules to hermes' detect_dangerous_command() pattern list. 1. macOS /private/{etc,var,tmp,home} system paths /etc, /var, /tmp, /home are symlinks to /private/<name> on macOS. A write to /private/etc/sudoers works identically to /etc/sudoers but bypassed the plain /etc/ pattern check. Extracted a shared _SYSTEM_CONFIG_PATH fragment so /etc/ and the /private/ mirror stay in sync across redirect / tee / cp / mv / install / sed -i patterns. 2. killall -9 / -KILL / -SIGKILL / -s KILL / -r <regex> Parallel to the existing pkill -9 pattern. killall -9 against non-hermes processes was previously unprotected, and killall -r can sweep unrelated processes matching a regex. 3. find -execdir rm Same destructive effect as find -exec rm but ran in each match's directory. The previous pattern required a literal '-exec ' so -execdir slipped through. Guarded by 32 new test cases in 4 test classes: - TestMacOSPrivateSystemPaths (11 cases) - TestKillallKillSignals (9 cases) - TestFindExecdir (4 cases) - TestEtcPatternsUnaffectedByRefactor (6 regression guards on the existing /etc/ coverage after the _SYSTEM_CONFIG_PATH refactor) Inspiration: https://github.com/anthropics/claude-code/releases (Claude Code 2.1.113, April 17 2026 - "Enhanced deny rules" and "Dangerous path protection")
Support browser CDP URL from config
2026-04-17 17:10:26 -07:00 · 2026-04-17 16:05:04 -07:00 · 2026-04-17 18:02:37 -05:00 · 2026-04-17 15:53:57 -07:00 · 2026-04-17 17:51:40 -05:00 · 2026-04-17 15:49:14 -07:00
347 changed files with 56243 additions and 1496 deletions
@@ -1 +1,5 @@
+watch_file pyproject.toml uv.lock
+watch_file ui-tui/package-lock.json ui-tui/package.json
+watch_file flake.nix flake.lock nix/devShell.nix nix/tui.nix nix/package.nix nix/python.nix
+
 use flake
@@ -60,5 +60,6 @@ mini-swe-agent/

 # Nix
 .direnv/
+.nix-stamps/
 result
 website/static/api/skills-index.json
@@ -56,6 +56,19 @@ hermes-agent/
 │   ├── run.py            # Main loop, slash commands, message dispatch
 │   ├── session.py        # SessionStore — conversation persistence
 │   └── platforms/        # Adapters: telegram, discord, slack, whatsapp, homeassistant, signal, qqbot
+├── ui-tui/               # Ink (React) terminal UI — `hermes --tui`
+│   ├── src/entry.tsx        # TTY gate + render()
+│   ├── src/app.tsx          # Main state machine and UI
+│   ├── src/gatewayClient.ts # Child process + JSON-RPC bridge
+│   ├── src/app/             # Decomposed app logic (event handler, slash handler, stores, hooks)
+│   ├── src/components/      # Ink components (branding, markdown, prompts, pickers, etc.)
+│   ├── src/hooks/           # useCompletion, useInputHistory, useQueue, useVirtualHistory
+│   └── src/lib/             # Pure helpers (history, osc52, text, rpc, messages)
+├── tui_gateway/          # Python JSON-RPC backend for the TUI
+│   ├── entry.py             # stdio entrypoint
+│   ├── server.py            # RPC handlers and session logic
+│   ├── render.py            # Optional rich/ANSI bridge
+│   └── slash_worker.py      # Persistent HermesCLI subprocess for slash commands
 ├── acp_adapter/          # ACP server (VS Code / Zed / JetBrains integration)
 ├── cron/                 # Scheduler (jobs.py, scheduler.py)
 ├── environments/         # RL training environments (Atropos)
@@ -179,6 +192,59 @@ if canonical == "mycommand":

 ---

+## TUI Architecture (ui-tui + tui_gateway)
+
+The TUI is a full replacement for the classic (prompt_toolkit) CLI, activated via `hermes --tui` or `HERMES_TUI=1`.
+
+### Process Model
+
+```
+hermes --tui
+  └─ Node (Ink)  ──stdio JSON-RPC──  Python (tui_gateway)
+       │                                  └─ AIAgent + tools + sessions
+       └─ renders transcript, composer, prompts, activity
+```
+
+TypeScript owns the screen. Python owns sessions, tools, model calls, and slash command logic.
+
+### Transport
+
+Newline-delimited JSON-RPC over stdio. Requests from Ink, events from Python. See `tui_gateway/server.py` for the full method/event catalog.
+
+### Key Surfaces
+
+| Surface | Ink component | Gateway method |
+|---------|---------------|----------------|
+| Chat streaming | `app.tsx` + `messageLine.tsx` | `prompt.submit` → `message.delta/complete` |
+| Tool activity | `thinking.tsx` | `tool.start/progress/complete` |
+| Approvals | `prompts.tsx` | `approval.respond` ← `approval.request` |
+| Clarify/sudo/secret | `prompts.tsx`, `maskedPrompt.tsx` | `clarify/sudo/secret.respond` |
+| Session picker | `sessionPicker.tsx` | `session.list/resume` |
+| Slash commands | Local handler + fallthrough | `slash.exec` → `_SlashWorker`, `command.dispatch` |
+| Completions | `useCompletion` hook | `complete.slash`, `complete.path` |
+| Theming | `theme.ts` + `branding.tsx` | `gateway.ready` with skin data |
+
+### Slash Command Flow
+
+1. Built-in client commands (`/help`, `/quit`, `/clear`, `/resume`, `/copy`, `/paste`, etc.) handled locally in `app.tsx`
+2. Everything else → `slash.exec` (runs in persistent `_SlashWorker` subprocess) → `command.dispatch` fallback
+
+### Dev Commands
+
+```bash
+cd ui-tui
+npm install       # first time
+npm run dev       # watch mode (rebuilds hermes-ink + tsx --watch)
+npm start         # production
+npm run build     # full build (hermes-ink + tsc)
+npm run type-check # typecheck only (tsc --noEmit)
+npm run lint      # eslint
+npm run fmt       # prettier
+npm test          # vitest
+```
+
+---
+
 ## Adding New Tools

 Requires changes in **2 files**:
@@ -13,7 +13,7 @@

 **The self-improving AI agent built by [Nous Research](https://nousresearch.com).** It's the only agent with a built-in learning loop — it creates skills from experience, improves them during use, nudges itself to persist knowledge, searches its own past conversations, and builds a deepening model of who you are across sessions. Run it on a $5 VPS, a GPU cluster, or serverless infrastructure that costs nearly nothing when idle. It's not tied to your laptop — talk to it from Telegram while it works on a cloud VM.

-Use any model you want — [Nous Portal](https://portal.nousresearch.com), [OpenRouter](https://openrouter.ai) (200+ models), [Xiaomi MiMo](https://platform.xiaomimimo.com), [z.ai/GLM](https://z.ai), [Kimi/Moonshot](https://platform.moonshot.ai), [MiniMax](https://www.minimax.io), [Hugging Face](https://huggingface.co), OpenAI, or your own endpoint. Switch with `hermes model` — no code changes, no lock-in.
+Use any model you want — [Nous Portal](https://portal.nousresearch.com), [OpenRouter](https://openrouter.ai) (200+ models), [NVIDIA NIM](https://build.nvidia.com) (Nemotron), [Xiaomi MiMo](https://platform.xiaomimimo.com), [z.ai/GLM](https://z.ai), [Kimi/Moonshot](https://platform.moonshot.ai), [MiniMax](https://www.minimax.io), [Hugging Face](https://huggingface.co), OpenAI, or your own endpoint. Switch with `hermes model` — no code changes, no lock-in.

 <table>
 <tr><td><b>A real terminal interface</b></td><td>Full TUI with multiline editing, slash-command autocomplete, conversation history, interrupt-and-redirect, and streaming tool output.</td></tr>
@@ -141,11 +141,18 @@ See `hermes claw migrate --help` for all options, or use the `openclaw-migration

 We welcome contributions! See the [Contributing Guide](https://hermes-agent.nousresearch.com/docs/developer-guide/contributing) for development setup, code style, and PR process.

-Quick start for contributors:
+Quick start for contributors — clone and go with `setup-hermes.sh`:

 ```bash
 git clone https://github.com/NousResearch/hermes-agent.git
 cd hermes-agent
+./setup-hermes.sh     # installs uv, creates venv, installs .[all], symlinks ~/.local/bin/hermes
+./hermes              # auto-detects the venv, no need to `source` first
+```
+
+Manual path (equivalent to the above):
+
+```bash
 curl -LsSf https://astral.sh/uv/install.sh | sh
 uv venv venv --python 3.11
 source venv/bin/activate
@@ -49,6 +49,7 @@ def make_tool_progress_cb(
    session_id: str,
    loop: asyncio.AbstractEventLoop,
    tool_call_ids: Dict[str, Deque[str]],
+    tool_call_meta: Dict[str, Dict[str, Any]],
 ) -> Callable:
    """Create a ``tool_progress_callback`` for AIAgent.

@@ -84,6 +85,16 @@ def make_tool_progress_cb(
            tool_call_ids[name] = queue
        queue.append(tc_id)

+        snapshot = None
+        if name in {"write_file", "patch", "skill_manage"}:
+            try:
+                from agent.display import capture_local_edit_snapshot
+
+                snapshot = capture_local_edit_snapshot(name, args)
+            except Exception:
+                logger.debug("Failed to capture ACP edit snapshot for %s", name, exc_info=True)
+        tool_call_meta[tc_id] = {"args": args, "snapshot": snapshot}
+
        update = build_tool_start(tc_id, name, args)
        _send_update(conn, session_id, loop, update)

@@ -119,6 +130,7 @@ def make_step_cb(
    session_id: str,
    loop: asyncio.AbstractEventLoop,
    tool_call_ids: Dict[str, Deque[str]],
+    tool_call_meta: Dict[str, Dict[str, Any]],
 ) -> Callable:
    """Create a ``step_callback`` for AIAgent.

@@ -132,10 +144,12 @@ def make_step_cb(
            for tool_info in prev_tools:
                tool_name = None
                result = None
+                function_args = None

                if isinstance(tool_info, dict):
                    tool_name = tool_info.get("name") or tool_info.get("function_name")
                    result = tool_info.get("result") or tool_info.get("output")
+                    function_args = tool_info.get("arguments") or tool_info.get("args")
                elif isinstance(tool_info, str):
                    tool_name = tool_info

@@ -145,8 +159,13 @@ def make_step_cb(
                    tool_call_ids[tool_name] = queue
                if tool_name and queue:
                    tc_id = queue.popleft()
+                    meta = tool_call_meta.pop(tc_id, {})
                    update = build_tool_complete(
-                        tc_id, tool_name, result=str(result) if result is not None else None
+                        tc_id,
+                        tool_name,
+                        result=str(result) if result is not None else None,
+                        function_args=function_args or meta.get("args"),
+                        snapshot=meta.get("snapshot"),
                    )
                    _send_update(conn, session_id, loop, update)
                    if not queue:
@@ -26,6 +26,7 @@ from acp.schema import (
    McpServerHttp,
    McpServerSse,
    McpServerStdio,
+    ModelInfo,
    NewSessionResponse,
    PromptResponse,
    ResumeSessionResponse,
@@ -36,6 +37,7 @@ from acp.schema import (
    SessionCapabilities,
    SessionForkCapabilities,
    SessionListCapabilities,
+    SessionModelState,
    SessionResumeCapabilities,
    SessionInfo,
    TextContentBlock,
@@ -147,6 +149,98 @@ class HermesACPAgent(acp.Agent):
        self._conn = conn
        logger.info("ACP client connected")

+    @staticmethod
+    def _encode_model_choice(provider: str | None, model: str | None) -> str:
+        """Encode a model selection so ACP clients can keep provider context."""
+        raw_model = str(model or "").strip()
+        if not raw_model:
+            return ""
+        raw_provider = str(provider or "").strip().lower()
+        if not raw_provider:
+            return raw_model
+        return f"{raw_provider}:{raw_model}"
+
+    def _build_model_state(self, state: SessionState) -> SessionModelState | None:
+        """Return the ACP model selector payload for editors like Zed."""
+        model = str(state.model or getattr(state.agent, "model", "") or "").strip()
+        provider = getattr(state.agent, "provider", None) or detect_provider() or "openrouter"
+
+        try:
+            from hermes_cli.models import curated_models_for_provider, normalize_provider, provider_label
+
+            normalized_provider = normalize_provider(provider)
+            provider_name = provider_label(normalized_provider)
+            available_models: list[ModelInfo] = []
+            seen_ids: set[str] = set()
+
+            for model_id, description in curated_models_for_provider(normalized_provider):
+                rendered_model = str(model_id or "").strip()
+                if not rendered_model:
+                    continue
+                choice_id = self._encode_model_choice(normalized_provider, rendered_model)
+                if choice_id in seen_ids:
+                    continue
+                desc_parts = [f"Provider: {provider_name}"]
+                if description:
+                    desc_parts.append(str(description).strip())
+                if rendered_model == model:
+                    desc_parts.append("current")
+                available_models.append(
+                    ModelInfo(
+                        model_id=choice_id,
+                        name=rendered_model,
+                        description=" • ".join(part for part in desc_parts if part),
+                    )
+                )
+                seen_ids.add(choice_id)
+
+            current_model_id = self._encode_model_choice(normalized_provider, model)
+            if current_model_id and current_model_id not in seen_ids:
+                available_models.insert(
+                    0,
+                    ModelInfo(
+                        model_id=current_model_id,
+                        name=model,
+                        description=f"Provider: {provider_name} • current",
+                    ),
+                )
+
+            if available_models:
+                return SessionModelState(
+                    available_models=available_models,
+                    current_model_id=current_model_id or available_models[0].model_id,
+                )
+        except Exception:
+            logger.debug("Could not build ACP model state", exc_info=True)
+
+        if not model:
+            return None
+
+        fallback_choice = self._encode_model_choice(provider, model)
+        return SessionModelState(
+            available_models=[ModelInfo(model_id=fallback_choice, name=model)],
+            current_model_id=fallback_choice,
+        )
+
+    @staticmethod
+    def _resolve_model_selection(raw_model: str, current_provider: str) -> tuple[str, str]:
+        """Resolve ``provider:model`` input into the provider and normalized model id."""
+        target_provider = current_provider
+        new_model = raw_model.strip()
+
+        try:
+            from hermes_cli.models import detect_provider_for_model, parse_model_input
+
+            target_provider, new_model = parse_model_input(new_model, current_provider)
+            if target_provider == current_provider:
+                detected = detect_provider_for_model(new_model, current_provider)
+                if detected:
+                    target_provider, new_model = detected
+        except Exception:
+            logger.debug("Provider detection failed, using model as-is", exc_info=True)
+
+        return target_provider, new_model
+
    async def _register_session_mcp_servers(
        self,
        state: SessionState,
@@ -273,7 +367,10 @@ class HermesACPAgent(acp.Agent):
        await self._register_session_mcp_servers(state, mcp_servers)
        logger.info("New session %s (cwd=%s)", state.session_id, cwd)
        self._schedule_available_commands_update(state.session_id)
-        return NewSessionResponse(session_id=state.session_id)
+        return NewSessionResponse(
+            session_id=state.session_id,
+            models=self._build_model_state(state),
+        )

    async def load_session(
        self,
@@ -289,7 +386,7 @@ class HermesACPAgent(acp.Agent):
        await self._register_session_mcp_servers(state, mcp_servers)
        logger.info("Loaded session %s", session_id)
        self._schedule_available_commands_update(session_id)
-        return LoadSessionResponse()
+        return LoadSessionResponse(models=self._build_model_state(state))

    async def resume_session(
        self,
@@ -305,7 +402,7 @@ class HermesACPAgent(acp.Agent):
        await self._register_session_mcp_servers(state, mcp_servers)
        logger.info("Resumed session %s", state.session_id)
        self._schedule_available_commands_update(state.session_id)
-        return ResumeSessionResponse()
+        return ResumeSessionResponse(models=self._build_model_state(state))

    async def cancel(self, session_id: str, **kwargs: Any) -> None:
        state = self.session_manager.get_session(session_id)
@@ -340,11 +437,20 @@ class HermesACPAgent(acp.Agent):
        cwd: str | None = None,
        **kwargs: Any,
    ) -> ListSessionsResponse:
-        infos = self.session_manager.list_sessions()
-        sessions = [
-            SessionInfo(session_id=s["session_id"], cwd=s["cwd"])
-            for s in infos
-        ]
+        infos = self.session_manager.list_sessions(cwd=cwd)
+        sessions = []
+        for s in infos:
+            updated_at = s.get("updated_at")
+            if updated_at is not None and not isinstance(updated_at, str):
+                updated_at = str(updated_at)
+            sessions.append(
+                SessionInfo(
+                    session_id=s["session_id"],
+                    cwd=s["cwd"],
+                    title=s.get("title"),
+                    updated_at=updated_at,
+                )
+            )
        return ListSessionsResponse(sessions=sessions)

    # ---- Prompt (core) ------------------------------------------------------
@@ -389,12 +495,13 @@ class HermesACPAgent(acp.Agent):
            state.cancel_event.clear()

        tool_call_ids: dict[str, Deque[str]] = defaultdict(deque)
+        tool_call_meta: dict[str, dict[str, Any]] = {}
        previous_approval_cb = None

        if conn:
-            tool_progress_cb = make_tool_progress_cb(conn, session_id, loop, tool_call_ids)
+            tool_progress_cb = make_tool_progress_cb(conn, session_id, loop, tool_call_ids, tool_call_meta)
            thinking_cb = make_thinking_cb(conn, session_id, loop)
-            step_cb = make_step_cb(conn, session_id, loop, tool_call_ids)
+            step_cb = make_step_cb(conn, session_id, loop, tool_call_ids, tool_call_meta)
            message_cb = make_message_cb(conn, session_id, loop)
            approval_cb = make_approval_callback(conn.request_permission, loop, session_id)
        else:
@@ -449,6 +556,19 @@ class HermesACPAgent(acp.Agent):
            self.session_manager.save_session(session_id)

        final_response = result.get("final_response", "")
+        if final_response:
+            try:
+                from agent.title_generator import maybe_auto_title
+
+                maybe_auto_title(
+                    self.session_manager._get_db(),
+                    session_id,
+                    user_text,
+                    final_response,
+                    state.history,
+                )
+            except Exception:
+                logger.debug("Failed to auto-title ACP session %s", session_id, exc_info=True)
        if final_response and conn:
            update = acp.update_agent_message_text(final_response)
            await conn.session_update(session_id, update)
@@ -556,27 +676,15 @@ class HermesACPAgent(acp.Agent):
            provider = getattr(state.agent, "provider", None) or "auto"
            return f"Current model: {model}\nProvider: {provider}"

-        new_model = args.strip()
-        target_provider = None
        current_provider = getattr(state.agent, "provider", None) or "openrouter"
-
-        # Auto-detect provider for the requested model
-        try:
-            from hermes_cli.models import parse_model_input, detect_provider_for_model
-            target_provider, new_model = parse_model_input(new_model, current_provider)
-            if target_provider == current_provider:
-                detected = detect_provider_for_model(new_model, current_provider)
-                if detected:
-                    target_provider, new_model = detected
-        except Exception:
-            logger.debug("Provider detection failed, using model as-is", exc_info=True)
+        target_provider, new_model = self._resolve_model_selection(args, current_provider)

        state.model = new_model
        state.agent = self.session_manager._make_agent(
            session_id=state.session_id,
            cwd=state.cwd,
            model=new_model,
-            requested_provider=target_provider or current_provider,
+            requested_provider=target_provider,
        )
        self.session_manager.save_session(state.session_id)
        provider_label = getattr(state.agent, "provider", None) or target_provider or current_provider
@@ -678,20 +786,30 @@ class HermesACPAgent(acp.Agent):
        """Switch the model for a session (called by ACP protocol)."""
        state = self.session_manager.get_session(session_id)
        if state:
-            state.model = model_id
            current_provider = getattr(state.agent, "provider", None)
-            current_base_url = getattr(state.agent, "base_url", None)
-            current_api_mode = getattr(state.agent, "api_mode", None)
+            requested_provider, resolved_model = self._resolve_model_selection(
+                model_id,
+                current_provider or "openrouter",
+            )
+            state.model = resolved_model
+            provider_changed = bool(current_provider and requested_provider != current_provider)
+            current_base_url = None if provider_changed else getattr(state.agent, "base_url", None)
+            current_api_mode = None if provider_changed else getattr(state.agent, "api_mode", None)
            state.agent = self.session_manager._make_agent(
                session_id=session_id,
                cwd=state.cwd,
-                model=model_id,
-                requested_provider=current_provider,
+                model=resolved_model,
+                requested_provider=requested_provider,
                base_url=current_base_url,
                api_mode=current_api_mode,
            )
            self.session_manager.save_session(session_id)
-            logger.info("Session %s: model switched to %s", session_id, model_id)
+            logger.info(
+                "Session %s: model switched to %s via provider %s",
+                session_id,
+                resolved_model,
+                requested_provider,
+            )
            return SetSessionModelResponse()
        logger.warning("Session %s: model switch requested for missing session", session_id)
        return None
@@ -13,8 +13,12 @@ from hermes_constants import get_hermes_home
 import copy
 import json
 import logging
+import os
+import re
 import sys
+import time
 import uuid
+from datetime import datetime, timezone
 from dataclasses import dataclass, field
 from threading import Lock
 from typing import Any, Dict, List, Optional
@@ -22,6 +26,64 @@ from typing import Any, Dict, List, Optional
 logger = logging.getLogger(__name__)


+def _normalize_cwd_for_compare(cwd: str | None) -> str:
+    raw = str(cwd or ".").strip()
+    if not raw:
+        raw = "."
+    expanded = os.path.expanduser(raw)
+
+    # Normalize Windows drive paths into the equivalent WSL mount form so
+    # ACP history filters match the same workspace across Windows and WSL.
+    match = re.match(r"^([A-Za-z]):[\\/](.*)$", expanded)
+    if match:
+        drive = match.group(1).lower()
+        tail = match.group(2).replace("\\", "/")
+        expanded = f"/mnt/{drive}/{tail}"
+    elif re.match(r"^/mnt/[A-Za-z]/", expanded):
+        expanded = f"/mnt/{expanded[5].lower()}/{expanded[7:]}"
+
+    return os.path.normpath(expanded)
+
+
+def _build_session_title(title: Any, preview: Any, cwd: str | None) -> str:
+    explicit = str(title or "").strip()
+    if explicit:
+        return explicit
+    preview_text = str(preview or "").strip()
+    if preview_text:
+        return preview_text
+    leaf = os.path.basename(str(cwd or "").rstrip("/\\"))
+    return leaf or "New thread"
+
+
+def _format_updated_at(value: Any) -> str | None:
+    if value is None:
+        return None
+    if isinstance(value, str) and value.strip():
+        return value
+    try:
+        return datetime.fromtimestamp(float(value), tz=timezone.utc).isoformat()
+    except Exception:
+        return None
+
+
+def _updated_at_sort_key(value: Any) -> float:
+    if value is None:
+        return float("-inf")
+    if isinstance(value, (int, float)):
+        return float(value)
+    raw = str(value).strip()
+    if not raw:
+        return float("-inf")
+    try:
+        return datetime.fromisoformat(raw.replace("Z", "+00:00")).timestamp()
+    except Exception:
+        try:
+            return float(raw)
+        except Exception:
+            return float("-inf")
+
+
 def _acp_stderr_print(*args, **kwargs) -> None:
    """Best-effort human-readable output sink for ACP stdio sessions.

@@ -162,47 +224,78 @@ class SessionManager:
        logger.info("Forked ACP session %s -> %s", session_id, new_id)
        return state

-    def list_sessions(self) -> List[Dict[str, Any]]:
+    def list_sessions(self, cwd: str | None = None) -> List[Dict[str, Any]]:
        """Return lightweight info dicts for all sessions (memory + database)."""
+        normalized_cwd = _normalize_cwd_for_compare(cwd) if cwd else None
+        db = self._get_db()
+        persisted_rows: dict[str, dict[str, Any]] = {}
+
+        if db is not None:
+            try:
+                for row in db.list_sessions_rich(source="acp", limit=1000):
+                    persisted_rows[str(row["id"])] = dict(row)
+            except Exception:
+                logger.debug("Failed to load ACP sessions from DB", exc_info=True)
+
        # Collect in-memory sessions first.
        with self._lock:
            seen_ids = set(self._sessions.keys())
-            results = [
-                {
-                    "session_id": s.session_id,
-                    "cwd": s.cwd,
-                    "model": s.model,
-                    "history_len": len(s.history),
-                }
-                for s in self._sessions.values()
-            ]
+            results = []
+            for s in self._sessions.values():
+                history_len = len(s.history)
+                if history_len <= 0:
+                    continue
+                if normalized_cwd and _normalize_cwd_for_compare(s.cwd) != normalized_cwd:
+                    continue
+                persisted = persisted_rows.get(s.session_id, {})
+                preview = next(
+                    (
+                        str(msg.get("content") or "").strip()
+                        for msg in s.history
+                        if msg.get("role") == "user" and str(msg.get("content") or "").strip()
+                    ),
+                    persisted.get("preview") or "",
+                )
+                results.append(
+                    {
+                        "session_id": s.session_id,
+                        "cwd": s.cwd,
+                        "model": s.model,
+                        "history_len": history_len,
+                        "title": _build_session_title(persisted.get("title"), preview, s.cwd),
+                        "updated_at": _format_updated_at(
+                            persisted.get("last_active") or persisted.get("started_at") or time.time()
+                        ),
+                    }
+                )

        # Merge any persisted sessions not currently in memory.
-        db = self._get_db()
-        if db is not None:
-            try:
-                rows = db.search_sessions(source="acp", limit=1000)
-                for row in rows:
-                    sid = row["id"]
-                    if sid in seen_ids:
-                        continue
-                    # Extract cwd from model_config JSON.
-                    cwd = "."
-                    mc = row.get("model_config")
-                    if mc:
-                        try:
-                            cwd = json.loads(mc).get("cwd", ".")
-                        except (json.JSONDecodeError, TypeError):
-                            pass
-                    results.append({
-                        "session_id": sid,
-                        "cwd": cwd,
-                        "model": row.get("model") or "",
-                        "history_len": row.get("message_count") or 0,
-                    })
-            except Exception:
-                logger.debug("Failed to list ACP sessions from DB", exc_info=True)
+        for sid, row in persisted_rows.items():
+            if sid in seen_ids:
+                continue
+            message_count = int(row.get("message_count") or 0)
+            if message_count <= 0:
+                continue
+            # Extract cwd from model_config JSON.
+            session_cwd = "."
+            mc = row.get("model_config")
+            if mc:
+                try:
+                    session_cwd = json.loads(mc).get("cwd", ".")
+                except (json.JSONDecodeError, TypeError):
+                    pass
+            if normalized_cwd and _normalize_cwd_for_compare(session_cwd) != normalized_cwd:
+                continue
+            results.append({
+                "session_id": sid,
+                "cwd": session_cwd,
+                "model": row.get("model") or "",
+                "history_len": message_count,
+                "title": _build_session_title(row.get("title"), row.get("preview"), session_cwd),
+                "updated_at": _format_updated_at(row.get("last_active") or row.get("started_at")),
+            })

+        results.sort(key=lambda item: _updated_at_sort_key(item.get("updated_at")), reverse=True)
        return results

    def update_cwd(self, session_id: str, cwd: str) -> Optional[SessionState]:
@@ -2,6 +2,7 @@

 from __future__ import annotations

+import json
 import uuid
 from typing import Any, Dict, List, Optional

@@ -96,6 +97,170 @@ def build_tool_title(tool_name: str, args: Dict[str, Any]) -> str:
    return tool_name


+def _build_patch_mode_content(patch_text: str) -> List[Any]:
+    """Parse V4A patch mode input into ACP diff blocks when possible."""
+    if not patch_text:
+        return [acp.tool_content(acp.text_block(""))]
+
+    try:
+        from tools.patch_parser import OperationType, parse_v4a_patch
+
+        operations, error = parse_v4a_patch(patch_text)
+        if error or not operations:
+            return [acp.tool_content(acp.text_block(patch_text))]
+
+        content: List[Any] = []
+        for op in operations:
+            if op.operation == OperationType.UPDATE:
+                old_chunks: list[str] = []
+                new_chunks: list[str] = []
+                for hunk in op.hunks:
+                    old_lines = [line.content for line in hunk.lines if line.prefix in (" ", "-")]
+                    new_lines = [line.content for line in hunk.lines if line.prefix in (" ", "+")]
+                    if old_lines or new_lines:
+                        old_chunks.append("\n".join(old_lines))
+                        new_chunks.append("\n".join(new_lines))
+
+                old_text = "\n...\n".join(chunk for chunk in old_chunks if chunk)
+                new_text = "\n...\n".join(chunk for chunk in new_chunks if chunk)
+                if old_text or new_text:
+                    content.append(
+                        acp.tool_diff_content(
+                            path=op.file_path,
+                            old_text=old_text or None,
+                            new_text=new_text or "",
+                        )
+                    )
+                continue
+
+            if op.operation == OperationType.ADD:
+                added_lines = [line.content for hunk in op.hunks for line in hunk.lines if line.prefix == "+"]
+                content.append(
+                    acp.tool_diff_content(
+                        path=op.file_path,
+                        new_text="\n".join(added_lines),
+                    )
+                )
+                continue
+
+            if op.operation == OperationType.DELETE:
+                content.append(
+                    acp.tool_diff_content(
+                        path=op.file_path,
+                        old_text=f"Delete file: {op.file_path}",
+                        new_text="",
+                    )
+                )
+                continue
+
+            if op.operation == OperationType.MOVE:
+                content.append(
+                    acp.tool_content(acp.text_block(f"Move file: {op.file_path} -> {op.new_path}"))
+                )
+
+        return content or [acp.tool_content(acp.text_block(patch_text))]
+    except Exception:
+        return [acp.tool_content(acp.text_block(patch_text))]
+
+
+def _strip_diff_prefix(path: str) -> str:
+    raw = str(path or "").strip()
+    if raw.startswith(("a/", "b/")):
+        return raw[2:]
+    return raw
+
+
+def _parse_unified_diff_content(diff_text: str) -> List[Any]:
+    """Convert unified diff text into ACP diff content blocks."""
+    if not diff_text:
+        return []
+
+    content: List[Any] = []
+    current_old_path: Optional[str] = None
+    current_new_path: Optional[str] = None
+    old_lines: list[str] = []
+    new_lines: list[str] = []
+
+    def _flush() -> None:
+        nonlocal current_old_path, current_new_path, old_lines, new_lines
+        if current_old_path is None and current_new_path is None:
+            return
+        path = current_new_path if current_new_path and current_new_path != "/dev/null" else current_old_path
+        if not path or path == "/dev/null":
+            current_old_path = None
+            current_new_path = None
+            old_lines = []
+            new_lines = []
+            return
+        content.append(
+            acp.tool_diff_content(
+                path=_strip_diff_prefix(path),
+                old_text="\n".join(old_lines) if old_lines else None,
+                new_text="\n".join(new_lines),
+            )
+        )
+        current_old_path = None
+        current_new_path = None
+        old_lines = []
+        new_lines = []
+
+    for line in diff_text.splitlines():
+        if line.startswith("--- "):
+            _flush()
+            current_old_path = line[4:].strip()
+            continue
+        if line.startswith("+++ "):
+            current_new_path = line[4:].strip()
+            continue
+        if line.startswith("@@"):
+            continue
+        if current_old_path is None and current_new_path is None:
+            continue
+        if line.startswith("+"):
+            new_lines.append(line[1:])
+        elif line.startswith("-"):
+            old_lines.append(line[1:])
+        elif line.startswith(" "):
+            shared = line[1:]
+            old_lines.append(shared)
+            new_lines.append(shared)
+
+    _flush()
+    return content
+
+
+def _build_tool_complete_content(
+    tool_name: str,
+    result: Optional[str],
+    *,
+    function_args: Optional[Dict[str, Any]] = None,
+    snapshot: Any = None,
+) -> List[Any]:
+    """Build structured ACP completion content, falling back to plain text."""
+    display_result = result or ""
+    if len(display_result) > 5000:
+        display_result = display_result[:4900] + f"\n... ({len(result)} chars total, truncated)"
+
+    if tool_name in {"write_file", "patch", "skill_manage"}:
+        try:
+            from agent.display import extract_edit_diff
+
+            diff_text = extract_edit_diff(
+                tool_name,
+                result,
+                function_args=function_args,
+                snapshot=snapshot,
+            )
+            if isinstance(diff_text, str) and diff_text.strip():
+                diff_content = _parse_unified_diff_content(diff_text)
+                if diff_content:
+                    return diff_content
+        except Exception:
+            pass
+
+    return [acp.tool_content(acp.text_block(display_result))]
+
+
 # ---------------------------------------------------------------------------
 # Build ACP content objects for tool-call events
 # ---------------------------------------------------------------------------
@@ -119,9 +284,8 @@ def build_tool_start(
            new = arguments.get("new_string", "")
            content = [acp.tool_diff_content(path=path, new_text=new, old_text=old)]
        else:
-            # Patch mode — show the patch content as text
            patch_text = arguments.get("patch", "")
-            content = [acp.tool_content(acp.text_block(patch_text))]
+            content = _build_patch_mode_content(patch_text)
        return acp.start_tool_call(
            tool_call_id, title, kind=kind, content=content, locations=locations,
            raw_input=arguments,
@@ -178,16 +342,17 @@ def build_tool_complete(
    tool_call_id: str,
    tool_name: str,
    result: Optional[str] = None,
+    function_args: Optional[Dict[str, Any]] = None,
+    snapshot: Any = None,
 ) -> ToolCallProgress:
    """Create a ToolCallUpdate (progress) event for a completed tool call."""
    kind = get_tool_kind(tool_name)
-
-    # Truncate very large results for the UI
-    display_result = result or ""
-    if len(display_result) > 5000:
-        display_result = display_result[:4900] + f"\n... ({len(result)} chars total, truncated)"
-
-    content = [acp.tool_content(acp.text_block(display_result))]
+    content = _build_tool_complete_content(
+        tool_name,
+        result,
+        function_args=function_args,
+        snapshot=snapshot,
+    )
    return acp.update_tool_call(
        tool_call_id,
        kind=kind,
@@ -94,6 +94,17 @@ def _normalize_aux_provider(provider: Optional[str]) -> str:
        return "custom"
    return _PROVIDER_ALIASES.get(normalized, normalized)

+
+_FIXED_TEMPERATURE_MODELS: Dict[str, float] = {
+    "kimi-for-coding": 0.6,
+}
+
+
+def _fixed_temperature_for_model(model: Optional[str]) -> Optional[float]:
+    """Return a required temperature override for models with strict contracts."""
+    normalized = (model or "").strip().lower()
+    return _FIXED_TEMPERATURE_MODELS.get(normalized)
+
 # Default auxiliary models for direct API-key providers (cheap/fast for side tasks)
 _API_KEY_PROVIDER_AUX_MODELS: Dict[str, str] = {
    "gemini": "gemini-3-flash-preview",
@@ -2293,6 +2304,10 @@ def _build_call_kwargs(
        "timeout": timeout,
    }

+    fixed_temperature = _fixed_temperature_for_model(model)
+    if fixed_temperature is not None:
+        temperature = fixed_temperature
+
    # Opus 4.7+ rejects any non-default temperature/top_p/top_k — silently
    # drop here so auxiliary callers that hardcode temperature (e.g. 0.3 on
    # flush_memories, 0 on structured-JSON extraction) don't 400 the moment
@@ -747,18 +747,149 @@ class GeminiCloudCodeClient:


 def _gemini_http_error(response: httpx.Response) -> CodeAssistError:
+    """Translate an httpx response into a CodeAssistError with rich metadata.
+
+    Parses Google's error envelope (``{"error": {"code", "message", "status",
+    "details": [...]}}``) so the agent's error classifier can reason about
+    the failure — ``status_code`` enables the rate_limit / auth classification
+    paths, and ``response`` lets the main loop honor ``Retry-After`` just
+    like it does for OpenAI SDK exceptions.
+
+    Also lifts a few recognizable Google conditions into human-readable
+    messages so the user sees something better than a 500-char JSON dump:
+
+        MODEL_CAPACITY_EXHAUSTED → "Gemini model capacity exhausted for
+            <model>. This is a Google-side throttle..."
+        RESOURCE_EXHAUSTED w/o reason → quota-style message
+        404 → "Model <name> not found at cloudcode-pa..."
+    """
    status = response.status_code
+
+    # Parse the body once, surviving any weird encodings.
+    body_text = ""
+    body_json: Dict[str, Any] = {}
    try:
-        body = response.text[:500]
+        body_text = response.text
    except Exception:
-        body = ""
-    # Let run_agent's retry logic see auth errors as rotatable via `api_key`
+        body_text = ""
+    if body_text:
+        try:
+            parsed = json.loads(body_text)
+            if isinstance(parsed, dict):
+                body_json = parsed
+        except (ValueError, TypeError):
+            body_json = {}
+
+    # Dig into Google's error envelope.  Shape is:
+    #   {"error": {"code": 429, "message": "...", "status": "RESOURCE_EXHAUSTED",
+    #              "details": [{"@type": ".../ErrorInfo", "reason": "MODEL_CAPACITY_EXHAUSTED",
+    #                           "metadata": {...}},
+    #                          {"@type": ".../RetryInfo", "retryDelay": "30s"}]}}
+    err_obj = body_json.get("error") if isinstance(body_json, dict) else None
+    if not isinstance(err_obj, dict):
+        err_obj = {}
+    err_status = str(err_obj.get("status") or "").strip()
+    err_message = str(err_obj.get("message") or "").strip()
+    err_details_list = err_obj.get("details") if isinstance(err_obj.get("details"), list) else []
+
+    # Extract google.rpc.ErrorInfo reason + metadata.  There may be more
+    # than one ErrorInfo (rare), so we pick the first one with a reason.
+    error_reason = ""
+    error_metadata: Dict[str, Any] = {}
+    retry_delay_seconds: Optional[float] = None
+    for detail in err_details_list:
+        if not isinstance(detail, dict):
+            continue
+        type_url = str(detail.get("@type") or "")
+        if not error_reason and type_url.endswith("/google.rpc.ErrorInfo"):
+            reason = detail.get("reason")
+            if isinstance(reason, str) and reason:
+                error_reason = reason
+            md = detail.get("metadata")
+            if isinstance(md, dict):
+                error_metadata = md
+        elif retry_delay_seconds is None and type_url.endswith("/google.rpc.RetryInfo"):
+            # retryDelay is a google.protobuf.Duration string like "30s" or "1.5s".
+            delay_raw = detail.get("retryDelay")
+            if isinstance(delay_raw, str) and delay_raw.endswith("s"):
+                try:
+                    retry_delay_seconds = float(delay_raw[:-1])
+                except ValueError:
+                    pass
+            elif isinstance(delay_raw, (int, float)):
+                retry_delay_seconds = float(delay_raw)
+
+    # Fall back to the Retry-After header if the body didn't include RetryInfo.
+    if retry_delay_seconds is None:
+        try:
+            header_val = response.headers.get("Retry-After") or response.headers.get("retry-after")
+        except Exception:
+            header_val = None
+        if header_val:
+            try:
+                retry_delay_seconds = float(header_val)
+            except (TypeError, ValueError):
+                retry_delay_seconds = None
+
+    # Classify the error code.  ``code_assist_rate_limited`` stays the default
+    # for 429s; a more specific reason tag helps downstream callers (e.g. tests,
+    # logs) without changing the rate_limit classification path.
    code = f"code_assist_http_{status}"
    if status == 401:
        code = "code_assist_unauthorized"
    elif status == 429:
        code = "code_assist_rate_limited"
+        if error_reason == "MODEL_CAPACITY_EXHAUSTED":
+            code = "code_assist_capacity_exhausted"
+
+    # Build a human-readable message.  Keep the status + a raw-body tail for
+    # debugging, but lead with a friendlier summary when we recognize the
+    # Google signal.
+    model_hint = ""
+    if isinstance(error_metadata, dict):
+        model_hint = str(error_metadata.get("model") or error_metadata.get("modelId") or "").strip()
+
+    if status == 429 and error_reason == "MODEL_CAPACITY_EXHAUSTED":
+        target = model_hint or "this Gemini model"
+        message = (
+            f"Gemini capacity exhausted for {target} (Google-side throttle, "
+            f"not a Hermes issue). Try a different Gemini model or set a "
+            f"fallback_providers entry to a non-Gemini provider."
+        )
+        if retry_delay_seconds is not None:
+            message += f" Google suggests retrying in {retry_delay_seconds:g}s."
+    elif status == 429 and err_status == "RESOURCE_EXHAUSTED":
+        message = (
+            f"Gemini quota exhausted ({err_message or 'RESOURCE_EXHAUSTED'}). "
+            f"Check /gquota for remaining daily requests."
+        )
+        if retry_delay_seconds is not None:
+            message += f" Retry suggested in {retry_delay_seconds:g}s."
+    elif status == 404:
+        # Google returns 404 when a model has been retired or renamed.
+        target = model_hint or (err_message or "model")
+        message = (
+            f"Code Assist 404: {target} is not available at "
+            f"cloudcode-pa.googleapis.com. It may have been renamed or "
+            f"retired. Check hermes_cli/models.py for the current list."
+        )
+    elif err_message:
+        # Generic fallback with the parsed message.
+        message = f"Code Assist HTTP {status} ({err_status or 'error'}): {err_message}"
+    else:
+        # Last-ditch fallback — raw body snippet.
+        message = f"Code Assist returned HTTP {status}: {body_text[:500]}"
+
    return CodeAssistError(
-        f"Code Assist returned HTTP {status}: {body}",
+        message,
        code=code,
+        status_code=status,
+        response=response,
+        retry_after=retry_delay_seconds,
+        details={
+            "status": err_status,
+            "reason": error_reason,
+            "metadata": error_metadata,
+            "message": err_message,
+        },
    )
@@ -68,9 +68,45 @@ _ONBOARDING_POLL_INTERVAL_SECONDS = 5.0


 class CodeAssistError(RuntimeError):
-    def __init__(self, message: str, *, code: str = "code_assist_error") -> None:
+    """Exception raised by the Code Assist (``cloudcode-pa``) integration.
+
+    Carries HTTP status / response / retry-after metadata so the agent's
+    ``error_classifier._extract_status_code`` and the main loop's Retry-After
+    handling (which walks ``error.response.headers``) pick up the right
+    signals.  Without these, 429s from the OAuth path look like opaque
+    ``RuntimeError`` and skip the rate-limit path.
+    """
+
+    def __init__(
+        self,
+        message: str,
+        *,
+        code: str = "code_assist_error",
+        status_code: Optional[int] = None,
+        response: Any = None,
+        retry_after: Optional[float] = None,
+        details: Optional[Dict[str, Any]] = None,
+    ) -> None:
        super().__init__(message)
        self.code = code
+        # ``status_code`` is picked up by ``agent.error_classifier._extract_status_code``
+        # so a 429 from Code Assist classifies as FailoverReason.rate_limit and
+        # triggers the main loop's fallback_providers chain the same way SDK
+        # errors do.
+        self.status_code = status_code
+        # ``response`` is the underlying ``httpx.Response`` (or a shim with a
+        # ``.headers`` mapping and ``.json()`` method).  The main loop reads
+        # ``error.response.headers["Retry-After"]`` to honor Google's retry
+        # hints when the backend throttles us.
+        self.response = response
+        # Parsed ``Retry-After`` seconds (kept separately for convenience —
+        # Google returns retry hints in both the header and the error body's
+        # ``google.rpc.RetryInfo`` details, and we pick whichever we found).
+        self.retry_after = retry_after
+        # Parsed structured error details from the Google error envelope
+        # (e.g. ``{"reason": "MODEL_CAPACITY_EXHAUSTED", "status": "RESOURCE_EXHAUSTED"}``).
+        # Useful for logging and for tests that want to assert on specifics.
+        self.details = details or {}


 class ProjectIdRequiredError(CodeAssistError):
@@ -38,6 +38,7 @@ _PROVIDER_PREFIXES: frozenset[str] = frozenset({
    "mimo", "xiaomi-mimo",
    "arcee-ai", "arceeai",
    "xai", "x-ai", "x.ai", "grok",
+    "nvidia", "nim", "nvidia-nim", "nemotron",
    "qwen-portal",
 })

@@ -124,7 +125,6 @@ DEFAULT_CONTEXT_LENGTHS = {
    "gemini": 1048576,
    # Gemma (open models served via AI Studio)
    "gemma-4-31b": 256000,
-    "gemma-4-26b": 256000,
    "gemma-3": 131072,
    "gemma": 8192,  # fallback for older gemma models
    # DeepSeek
@@ -158,6 +158,8 @@ DEFAULT_CONTEXT_LENGTHS = {
    "grok": 131072,             # catch-all (grok-beta, unknown grok-*)
    # Kimi
    "kimi": 262144,
+    # Nemotron — NVIDIA's open-weights series (128K context across all sizes)
+    "nemotron": 131072,
    # Arcee
    "trinity": 262144,
    # OpenRouter
@@ -240,6 +242,7 @@ _URL_TO_PROVIDER: Dict[str, str] = {
    "api.fireworks.ai": "fireworks",
    "opencode.ai": "opencode-go",
    "api.x.ai": "xai",
+    "integrate.api.nvidia.com": "nvidia",
    "api.xiaomimimo.com": "xiaomi",
    "xiaomimimo.com": "xiaomi",
    "ollama.com": "ollama-cloud",
@@ -24,6 +24,7 @@ model:
  #   "minimax"      - MiniMax global (requires: MINIMAX_API_KEY)
  #   "minimax-cn"   - MiniMax China (requires: MINIMAX_CN_API_KEY)
  #   "huggingface"  - Hugging Face Inference (requires: HF_TOKEN)
+  #   "nvidia"       - NVIDIA NIM / build.nvidia.com (requires: NVIDIA_API_KEY)
  #   "xiaomi"       - Xiaomi MiMo (requires: XIAOMI_API_KEY)
  #   "arcee"        - Arcee AI Trinity models (requires: ARCEEAI_API_KEY)
  #   "ollama-cloud" - Ollama Cloud (requires: OLLAMA_API_KEY — https://ollama.com/settings)
@@ -18,6 +18,8 @@ import os
 import shutil
 import sys
 import json
+import re
+import base64
 import atexit
 import tempfile
 import time
@@ -78,6 +80,42 @@ _project_env = Path(__file__).parent / '.env'
 load_hermes_dotenv(hermes_home=_hermes_home, project_env=_project_env)


+_REASONING_TAGS = (
+    "REASONING_SCRATCHPAD",
+    "think",
+    "reasoning",
+    "THINKING",
+    "thinking",
+)
+
+
+def _strip_reasoning_tags(text: str) -> str:
+    cleaned = text
+    for tag in _REASONING_TAGS:
+        cleaned = re.sub(rf"<{tag}>.*?</{tag}>\s*", "", cleaned, flags=re.DOTALL)
+        cleaned = re.sub(rf"<{tag}>.*$", "", cleaned, flags=re.DOTALL)
+    return cleaned.strip()
+
+
+def _assistant_content_as_text(content: Any) -> str:
+    if content is None:
+        return ""
+    if isinstance(content, str):
+        return content
+    if isinstance(content, list):
+        parts = [
+            str(part.get("text", ""))
+            for part in content
+            if isinstance(part, dict) and part.get("type") == "text"
+        ]
+        return "\n".join(p for p in parts if p)
+    return str(content)
+
+
+def _assistant_copy_text(content: Any) -> str:
+    return _strip_reasoning_tags(_assistant_content_as_text(content))
+
+
 # =============================================================================
 # Configuration Loading
 # =============================================================================
@@ -1172,6 +1210,10 @@ def _resolve_attachment_path(raw_path: str) -> Path | None:
        return None

    expanded = os.path.expandvars(os.path.expanduser(token))
+    if os.name != "nt":
+        normalized = expanded.replace("\\", "/")
+        if len(normalized) >= 3 and normalized[1] == ":" and normalized[2] == "/" and normalized[0].isalpha():
+            expanded = f"/mnt/{normalized[0].lower()}/{normalized[3:]}"
    path = Path(expanded)
    if not path.is_absolute():
        base_dir = Path(os.getenv("TERMINAL_CWD", os.getcwd()))
@@ -1254,10 +1296,12 @@ def _detect_file_drop(user_input: str) -> "dict | None":
        or stripped.startswith("~")
        or stripped.startswith("./")
        or stripped.startswith("../")
+        or (len(stripped) >= 3 and stripped[1] == ":" and stripped[2] in ("\\", "/") and stripped[0].isalpha())
        or stripped.startswith('"/')
        or stripped.startswith('"~')
        or stripped.startswith("'/")
        or stripped.startswith("'~")
+        or (len(stripped) >= 4 and stripped[0] in ("'", '"') and stripped[2] == ":" and stripped[3] in ("\\", "/") and stripped[1].isalpha())
    )
    if not starts_like_path:
        return None
@@ -3125,21 +3169,6 @@ class HermesCLI:
        MAX_ASST_LEN = 200           # truncate assistant text
        MAX_ASST_LINES = 3           # max lines of assistant text

-        def _strip_reasoning(text: str) -> str:
-            """Remove <REASONING_SCRATCHPAD>...</REASONING_SCRATCHPAD> blocks
-            from displayed text (reasoning model internal thoughts)."""
-            import re
-            cleaned = re.sub(
-                r"<REASONING_SCRATCHPAD>.*?</REASONING_SCRATCHPAD>\s*",
-                "", text, flags=re.DOTALL,
-            )
-            # Also strip unclosed reasoning tags at the end
-            cleaned = re.sub(
-                r"<REASONING_SCRATCHPAD>.*$",
-                "", cleaned, flags=re.DOTALL,
-            )
-            return cleaned.strip()
-
        # Collect displayable entries (skip system, tool-result messages)
        entries = []  # list of (role, display_text)
        _last_asst_idx = None       # index of last assistant entry
@@ -3171,7 +3200,7 @@ class HermesCLI:

            elif role == "assistant":
                text = "" if content is None else str(content)
-                text = _strip_reasoning(text)
+                text = _strip_reasoning_tags(text)
                parts = []
                full_parts = []  # un-truncated version
                if text:
@@ -3510,6 +3539,26 @@ class HermesCLI:
        killed = process_registry.kill_all()
        print(f"  ✅ Stopped {killed} process(es).")

+    def _handle_agents_command(self):
+        """Handle /agents — show background processes and agent status."""
+        from tools.process_registry import format_uptime_short, process_registry
+
+        processes = process_registry.list_sessions()
+        running = [p for p in processes if p.get("status") == "running"]
+        finished = [p for p in processes if p.get("status") != "running"]
+
+        _cprint(f"  Running processes: {len(running)}")
+        for p in running:
+            cmd = p.get("command", "")[:80]
+            up = format_uptime_short(p.get("uptime_seconds", 0))
+            _cprint(f"    {p.get('session_id', '?')} · {up} · {cmd}")
+
+        if finished:
+            _cprint(f"  Recently finished: {len(finished)}")
+
+        agent_running = getattr(self, "_agent_running", False)
+        _cprint(f"  Agent: {'running' if agent_running else 'idle'}")
+
    def _handle_paste_command(self):
        """Handle /paste — explicitly check clipboard for an image.

@@ -3535,6 +3584,61 @@ class HermesCLI:
        else:
            _cprint(f"  {_DIM}(._.) No image found in clipboard{_RST}")

+    def _write_osc52_clipboard(self, text: str) -> None:
+        """Copy *text* to terminal clipboard via OSC 52."""
+        payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
+        seq = f"\x1b]52;c;{payload}\x07"
+        out = getattr(self, "_app", None)
+        output = getattr(out, "output", None) if out else None
+        if output and hasattr(output, "write_raw"):
+            output.write_raw(seq)
+            output.flush()
+            return
+        if output and hasattr(output, "write"):
+            output.write(seq)
+            output.flush()
+            return
+        sys.stdout.write(seq)
+        sys.stdout.flush()
+
+    def _handle_copy_command(self, cmd_original: str) -> None:
+        """Handle /copy [number] — copy assistant output to clipboard."""
+        parts = cmd_original.split(maxsplit=1)
+        arg = parts[1].strip() if len(parts) > 1 else ""
+
+        assistant = [m for m in self.conversation_history if m.get("role") == "assistant"]
+        if not assistant:
+            _cprint("  Nothing to copy yet.")
+            return
+
+        if arg:
+            try:
+                idx = int(arg) - 1
+            except ValueError:
+                _cprint("  Usage: /copy [number]")
+                return
+            if idx < 0 or idx >= len(assistant):
+                _cprint(f"  Invalid response number. Use 1-{len(assistant)}.")
+                return
+        else:
+            idx = len(assistant) - 1
+            while idx >= 0 and not _assistant_copy_text(assistant[idx].get("content")):
+                idx -= 1
+            if idx < 0:
+                _cprint("  Nothing to copy in assistant responses yet.")
+                return
+
+        text = _assistant_copy_text(assistant[idx].get("content"))
+        if not text:
+            _cprint("  Nothing to copy in that assistant response.")
+            return
+
+        try:
+            self._write_osc52_clipboard(text)
+            _cprint(f"  Copied assistant response #{idx + 1} to clipboard")
+        except Exception as e:
+            _cprint(f"  Clipboard copy failed: {e}")
+
    def _handle_image_command(self, cmd_original: str):
        """Handle /image <path> — attach a local image file for the next prompt."""
        raw_args = (cmd_original.split(None, 1)[1].strip() if " " in cmd_original else "")
@@ -3671,7 +3775,7 @@ class HermesCLI:
            skin = get_active_skin()
            separator_color = skin.get_color("banner_dim", "#B8860B")
            accent_color = skin.get_color("ui_accent", "#FFBF00")
-            label_color = skin.get_color("ui_label", "#4dd0e1")
+            label_color = skin.get_color("ui_label", "#DAA520")
        except Exception:
            separator_color, accent_color, label_color = "#B8860B", "#FFBF00", "cyan"
        toolsets_info = ""
@@ -5553,6 +5657,8 @@ class HermesCLI:
            self._show_usage()
        elif canonical == "insights":
            self._show_insights(cmd_original)
+        elif canonical == "copy":
+            self._handle_copy_command(cmd_original)
        elif canonical == "debug":
            self._handle_debug_command()
        elif canonical == "paste":
@@ -5596,6 +5702,8 @@ class HermesCLI:
            self._handle_snapshot_command(cmd_original)
        elif canonical == "stop":
            self._handle_stop_command()
+        elif canonical == "agents":
+            self._handle_agents_command()
        elif canonical == "background":
            self._handle_background_command(cmd_original)
        elif canonical == "btw":
@@ -65,7 +65,15 @@ _HOME_TARGET_ENV_VARS = {
    "wecom": "WECOM_HOME_CHANNEL",
    "weixin": "WEIXIN_HOME_CHANNEL",
    "bluebubbles": "BLUEBUBBLES_HOME_CHANNEL",
-    "qqbot": "QQ_HOME_CHANNEL",
+    "qqbot": "QQBOT_HOME_CHANNEL",
+}
+
+# Legacy env var names kept for back-compat.  Each entry is the current
+# primary env var → the previous name.  _get_home_target_chat_id falls
+# back to the legacy name if the primary is unset, so users who set the
+# old name before the rename keep working until they migrate.
+_LEGACY_HOME_TARGET_ENV_VARS = {
+    "QQBOT_HOME_CHANNEL": "QQ_HOME_CHANNEL",
 }

 from cron.jobs import get_due_jobs, mark_job_run, save_job_output, advance_next_run
@@ -100,7 +108,12 @@ def _get_home_target_chat_id(platform_name: str) -> str:
    env_var = _HOME_TARGET_ENV_VARS.get(platform_name.lower())
    if not env_var:
        return ""
-    return os.getenv(env_var, "")
+    value = os.getenv(env_var, "")
+    if not value:
+        legacy = _LEGACY_HOME_TARGET_ENV_VARS.get(env_var)
+        if legacy:
+            value = os.getenv(legacy, "")
+    return value


 def _resolve_single_delivery_target(job: dict, deliver_value: str) -> Optional[dict]:
@@ -0,0 +1,108 @@
+# Ink Gateway TUI Migration — Post-mortem
+
+Planned: 2026-04-01 · Delivered: 2026-04 · Status: shipped, classic (prompt_toolkit) CLI still present
+
+## What Shipped
+
+Three layers, same repo, Python runtime unchanged.
+
+```
+ui-tui (Node/TS)  ──stdio JSON-RPC──▶  tui_gateway (Py)  ──▶  AIAgent (run_agent.py)
+```
+
+### Backend — `tui_gateway/`
+
+```
+tui_gateway/
+├── entry.py          # subprocess entrypoint, stdio read/write loop
+├── server.py         # everything: sessions dict, @method handlers, _emit
+├── render.py         # stream renderer, diff rendering, message rendering
+├── slash_worker.py   # subprocess that runs hermes_cli slash commands
+└── __init__.py
+```
+
+`server.py` owns the full runtime-control surface: session store (`_sessions: dict[str, dict]`), method registry (`@method("…")` decorator), event emitter (`_emit`), agent lifecycle (`_make_agent`, `_init_session`, `_wire_callbacks`), approval/sudo/clarify round-trips, and JSON-RPC dispatch.
+
+Protocol methods (`@method(...)` in `server.py`):
+
+- session: `session.{create, resume, list, close, interrupt, usage, history, compress, branch, title, save, undo}`
+- prompt: `prompt.{submit, background, btw}`
+- tools: `tools.{list, show, configure}`
+- slash: `slash.exec`, `command.{dispatch, resolve}`, `commands.catalog`, `complete.{path, slash}`
+- approvals: `approval.respond`, `sudo.respond`, `clarify.respond`, `secret.respond`
+- config/state: `config.{get, set, show}`, `model.options`, `reload.mcp`
+- ops: `shell.exec`, `cli.exec`, `terminal.resize`, `input.detect_drop`, `clipboard.paste`, `paste.collapse`, `image.attach`, `process.stop`
+- misc: `agents.list`, `skills.manage`, `plugins.list`, `cron.manage`, `insights.get`, `rollback.{list, diff, restore}`, `browser.manage`
+
+Protocol events (`_emit(…)` → handled in `ui-tui/src/app/createGatewayEventHandler.ts`):
+
+- lifecycle: `gateway.{ready, stderr}`, `session.info`, `skin.changed`
+- stream: `message.{start, delta, complete}`, `thinking.delta`, `reasoning.{delta, available}`, `status.update`
+- tools: `tool.{start, progress, complete, generating}`, `subagent.{start, thinking, tool, progress, complete}`
+- interactive: `approval.request`, `sudo.request`, `clarify.request`, `secret.request`
+- async: `background.complete`, `btw.complete`, `error`
+
+### Frontend — `ui-tui/src/`
+
+```
+src/
+├── entry.tsx            # node bootstrap: bootBanner → spawn python → dynamic-import Ink → render(<App/>)
+├── app.tsx              # <GatewayProvider> wraps <AppLayout>
+├── bootBanner.ts        # raw-ANSI banner to stdout in ~2ms, pre-React
+├── gatewayClient.ts     # JSON-RPC client over child_process stdio
+├── gatewayTypes.ts      # typed RPC responses + GatewayEvent union
+├── theme.ts             # DEFAULT_THEME + fromSkin
+│
+├── app/                 # hooks + stores — the orchestration layer
+│   ├── uiStore.ts             # nanostore: sid, info, busy, usage, theme, status…
+│   ├── turnStore.ts           # nanostore: per-turn activity / reasoning / tools
+│   ├── turnController.ts      # imperative singleton for stream-time operations
+│   ├── overlayStore.ts        # nanostore: modal/overlay state
+│   ├── useMainApp.ts          # top-level composition hook
+│   ├── useSessionLifecycle.ts # session.create/resume/close/reset
+│   ├── useSubmission.ts       # shell/slash/prompt dispatch + interpolation
+│   ├── useConfigSync.ts       # config.get + mtime poll
+│   ├── useComposerState.ts    # input buffer, paste snippets, editor mode
+│   ├── useInputHandlers.ts    # key bindings
+│   ├── createGatewayEventHandler.ts  # event-stream dispatcher
+│   ├── createSlashHandler.ts         # slash command router (registry + python fallback)
+│   └── slash/commands/        # core.ts, ops.ts, session.ts — TS-owned slash commands
+│
+├── components/          # AppLayout, AppChrome, AppOverlays, MessageLine, Thinking, Markdown, pickers, prompts, Banner, SessionPanel
+├── config/              # env, limits, timing constants
+├── content/             # charms, faces, fortunes, hotkeys, placeholders, verbs
+├── domain/              # details, messages, paths, roles, slash, usage, viewport
+├── protocol/            # interpolation, paste regex
+├── hooks/               # useCompletion, useInputHistory, useQueue, useVirtualHistory
+└── lib/                 # history, messages, osc52, rpc, text
+```
+
+### CLI entry points — `hermes_cli/main.py`
+
+- `hermes --tui`      → `node dist/entry.js` (auto-builds when `.ts`/`.tsx` newer than `dist/entry.js`)
+- `hermes --tui --dev` → `tsx src/entry.tsx` (skip build)
+- `HERMES_TUI_DIR=…`  → external prebuilt dist (nix, distro packaging)
+
+## Diverged From Original Plan
+
+| Plan | Reality | Why |
+|---|---|---|
+| `tui_gateway/{controller,session_state,events,protocol}.py` | all collapsed into `server.py` | no second consumer ever emerged, keeping one file cheaper than four |
+| `ui-tui/src/main.tsx` | split into `entry.tsx` (bootstrap) + `app.tsx` (shell) | boot banner + early python spawn wanted a pre-React moment |
+| `ui-tui/src/state/store.ts` | three nanostores (`uiStore`, `turnStore`, `overlayStore`) | separate lifetimes: ui persists, turn resets per reply, overlay is modal |
+| `approval.requested` / `sudo.requested` / `clarify.requested` | `*.request` (no `-ed`) | cosmetic |
+| `session.cancel` | dropped | `session.interrupt` covers it |
+| `HERMES_EXPERIMENTAL_TUI=1`, `display.experimental_tui: true`, `/tui on/off/status` | none shipped | `--tui` went from opt-in to first-class without an experimental phase |
+
+## Post-migration Additions (not in original plan)
+
+- **Async `session.create`** — returns sid in ~1ms, agent builds on a background thread, `session.info` broadcasts when ready; `_wait_agent()` gates every agent-touching handler via `_sess`
+- **`bootBanner`** — raw-ANSI logo painted to stdout at T≈2ms, before Ink loads; `<AlternateScreen>` wipes it seamlessly when React mounts
+- **Selection uniform bg** — `theme.color.selectionBg` wired via `useSelection().setSelectionBgColor`; replaces SGR-inverse per-cell swap that fragmented over amber/gold fg
+- **Slash command registry** — TS-owned commands in `app/slash/commands/{core,ops,session}.ts`, everything else falls through to `slash.exec` (python worker)
+- **Turn store + controller split** — imperative singleton (`turnController`) holds refs/timers, nanostore (`turnStore`) holds render-visible state
+
+## What's Still Open
+
+- **Classic CLI not deleted.** `cli.py` still has ~80 `prompt_toolkit` references; classic REPL is still the default when `--tui` is absent. The original plan's "Cut 4 · prompt_toolkit removal later" hasn't happened.
+- **No config-file opt-in.** `HERMES_EXPERIMENTAL_TUI` and `display.experimental_tui` were never built; only the CLI flag exists. Fine for now — if we want "default to TUI", a single line in `main.py` flips it.
@@ -6,6 +6,11 @@
 # All fields are optional — missing values inherit from the default skin.
 # Activate with: /skin <name>  or  display.skin: <name> in config.yaml
 #
+# Keys are marked:
+#   (both)    — applies to both the classic CLI and the TUI
+#   (classic) — classic CLI only (see hermes --tui in user-guide/tui.md)
+#   (tui)     — TUI only
+#
 # See hermes_cli/skin_engine.py for the full schema reference.
 # ============================================================================

@@ -14,43 +19,47 @@ name: example
 description: An example custom skin — copy and modify this template

 # ── Colors ──────────────────────────────────────────────────────────────────
-# Hex color values for Rich markup. These control the CLI's visual palette.
+# Hex color values. These control the visual palette.
 colors:
-  # Banner panel (the startup welcome box)
+  # Banner panel (the startup welcome box) — (both)
  banner_border: "#CD7F32"        # Panel border
  banner_title: "#FFD700"         # Panel title text
  banner_accent: "#FFBF00"        # Section headers (Available Tools, Skills, etc.)
  banner_dim: "#B8860B"           # Dim/muted text (separators, model info)
  banner_text: "#FFF8DC"          # Body text (tool names, skill names)

-  # UI elements
-  ui_accent: "#FFBF00"            # General accent color
+  # UI elements — (both)
+  ui_accent: "#FFBF00"            # General accent (falls back to banner_accent)
  ui_label: "#4dd0e1"             # Labels
  ui_ok: "#4caf50"                # Success indicators
  ui_error: "#ef5350"             # Error indicators
  ui_warn: "#ffa726"              # Warning indicators

  # Input area
-  prompt: "#FFF8DC"               # Prompt text color
-  input_rule: "#CD7F32"           # Horizontal rule around input
+  prompt: "#FFF8DC"               # Prompt text / `❯` glyph color (both)
+  input_rule: "#CD7F32"           # Horizontal rule above input (classic)

-  # Response box
-  response_border: "#FFD700"      # Response box border (ANSI color)
+  # Response box — (classic)
+  response_border: "#FFD700"      # Response box border

-  # Session display
-  session_label: "#DAA520"        # Session label
-  session_border: "#8B8682"       # Session ID dim color
+  # Session display — (both)
+  session_label: "#DAA520"        # "Session: " label
+  session_border: "#8B8682"       # Session ID text

-  # TUI surfaces
-  status_bar_bg: "#1a1a2e"              # Status / usage bar background
-  voice_status_bg: "#1a1a2e"            # Voice-mode badge background
-  completion_menu_bg: "#1a1a2e"         # Completion list background
-  completion_menu_current_bg: "#333355" # Active completion row background
-  completion_menu_meta_bg: "#1a1a2e"    # Completion meta column background
-  completion_menu_meta_current_bg: "#333355"  # Active completion meta background
+  # TUI / CLI surfaces — (classic: status bar, voice badge, completion meta)
+  status_bar_bg: "#1a1a2e"              # Status / usage bar background (classic)
+  voice_status_bg: "#1a1a2e"            # Voice-mode badge background (classic)
+  completion_menu_bg: "#1a1a2e"         # Completion list background (both)
+  completion_menu_current_bg: "#333355" # Active completion row background (both)
+  completion_menu_meta_bg: "#1a1a2e"    # Completion meta column bg (classic)
+  completion_menu_meta_current_bg: "#333355"  # Active meta bg (classic)
+
+  # Drag-to-select background — (tui)
+  selection_bg: "#3a3a55"               # Uniform selection highlight in the TUI

 # ── Spinner ─────────────────────────────────────────────────────────────────
-# Customize the animated spinner shown during API calls and tool execution.
+# (classic) — the TUI uses its own animated indicators; spinner config here
+# is only read by the classic prompt_toolkit CLI.
 spinner:
  # Faces shown while waiting for the API response
  waiting_faces:
@@ -78,17 +87,17 @@ spinner:
  #   - ["⟪▲", "▲⟫"]

 # ── Branding ────────────────────────────────────────────────────────────────
-# Text strings used throughout the CLI interface.
+# Text strings used throughout the interface.
 branding:
-  agent_name: "Hermes Agent"          # Banner title, about display
-  welcome: "Welcome! Type your message or /help for commands."
-  goodbye: "Goodbye! ⚕"              # Exit message
-  response_label: " ⚕ Hermes "       # Response box header label
-  prompt_symbol: "❯ "                 # Input prompt symbol
-  help_header: "(^_^)? Available Commands"  # /help header text
+  agent_name: "Hermes Agent"                  # (both) Banner title, about display
+  welcome: "Welcome! Type your message or /help for commands."  # (both)
+  goodbye: "Goodbye! ⚕"                       # (both) Exit message
+  response_label: " ⚕ Hermes "                # (classic) Response box header label
+  prompt_symbol: "❯ "                          # (both) Input prompt glyph
+  help_header: "(^_^)? Available Commands"     # (both) /help overlay title

 # ── Tool Output ─────────────────────────────────────────────────────────────
-# Character used as the prefix for tool output lines.
+# Character used as the prefix for tool output lines. (both)
 # Default is "┊" (thin dotted vertical line). Some alternatives:
 #   "╎" (light triple dash vertical)
 #   "▏" (left one-eighth block)
@@ -36,6 +36,26 @@
        "type": "github"
      }
    },
+    "npm-lockfile-fix": {
+      "inputs": {
+        "nixpkgs": [
+          "nixpkgs"
+        ]
+      },
+      "locked": {
+        "lastModified": 1775903712,
+        "narHash": "sha256-2GV79U6iVH4gKAPWYrxUReB0S41ty/Y3dBLquU8AlaA=",
+        "owner": "jeslie0",
+        "repo": "npm-lockfile-fix",
+        "rev": "c6093acb0c0548e0f9b8b3d82918823721930fe8",
+        "type": "github"
+      },
+      "original": {
+        "owner": "jeslie0",
+        "repo": "npm-lockfile-fix",
+        "type": "github"
+      }
+    },
    "pyproject-build-systems": {
      "inputs": {
        "nixpkgs": [
@@ -124,6 +144,7 @@
      "inputs": {
        "flake-parts": "flake-parts",
        "nixpkgs": "nixpkgs",
+        "npm-lockfile-fix": "npm-lockfile-fix",
        "pyproject-build-systems": "pyproject-build-systems",
        "pyproject-nix": "pyproject-nix_2",
        "uv2nix": "uv2nix_2"
@@ -19,11 +19,20 @@
      url = "github:pyproject-nix/build-system-pkgs";
      inputs.nixpkgs.follows = "nixpkgs";
    };
+    npm-lockfile-fix = {
+      url = "github:jeslie0/npm-lockfile-fix";
+      inputs.nixpkgs.follows = "nixpkgs";
+    };
  };

-  outputs = inputs:
+  outputs =
+    inputs:
    inputs.flake-parts.lib.mkFlake { inherit inputs; } {
-      systems = [ "x86_64-linux" "aarch64-linux" "aarch64-darwin" ];
+      systems = [
+        "x86_64-linux"
+        "aarch64-linux"
+        "aarch64-darwin"
+      ];

      imports = [
        ./nix/packages.nix
@@ -258,6 +258,13 @@ class GatewayConfig:
    # Streaming configuration
    streaming: StreamingConfig = field(default_factory=StreamingConfig)

+    # Session store pruning: drop SessionEntry records older than this many
+    # days from the in-memory dict and sessions.json.  Keeps the store from
+    # growing unbounded in gateways serving many chats/threads/users over
+    # months.  Pruning is invisible to users — if they resume, they get a
+    # fresh session exactly as if the reset policy had fired.  0 = disabled.
+    session_store_max_age_days: int = 90
+
    def get_connected_platforms(self) -> List[Platform]:
        """Return list of platforms that are enabled and configured."""
        connected = []
@@ -365,6 +372,7 @@ class GatewayConfig:
            "thread_sessions_per_user": self.thread_sessions_per_user,
            "unauthorized_dm_behavior": self.unauthorized_dm_behavior,
            "streaming": self.streaming.to_dict(),
+            "session_store_max_age_days": self.session_store_max_age_days,
        }
    
    @classmethod
@@ -412,6 +420,13 @@ class GatewayConfig:
            "pair",
        )

+        try:
+            session_store_max_age_days = int(data.get("session_store_max_age_days", 90))
+            if session_store_max_age_days < 0:
+                session_store_max_age_days = 0
+        except (TypeError, ValueError):
+            session_store_max_age_days = 90
+
        return cls(
            platforms=platforms,
            default_reset_policy=default_policy,
@@ -426,6 +441,7 @@ class GatewayConfig:
            thread_sessions_per_user=_coerce_bool(thread_sessions_per_user, False),
            unauthorized_dm_behavior=unauthorized_dm_behavior,
            streaming=StreamingConfig.from_dict(data.get("streaming", {})),
+            session_store_max_age_days=session_store_max_age_days,
        )

    def get_unauthorized_dm_behavior(self, platform: Optional[Platform] = None) -> str:
@@ -1213,12 +1229,24 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
        qq_group_allowed = os.getenv("QQ_GROUP_ALLOWED_USERS", "").strip()
        if qq_group_allowed:
            extra["group_allow_from"] = qq_group_allowed
-        qq_home = os.getenv("QQ_HOME_CHANNEL", "").strip()
+        qq_home = os.getenv("QQBOT_HOME_CHANNEL", "").strip()
+        qq_home_name_env = "QQBOT_HOME_CHANNEL_NAME"
+        if not qq_home:
+            # Back-compat: accept the pre-rename name and log a one-time warning.
+            legacy_home = os.getenv("QQ_HOME_CHANNEL", "").strip()
+            if legacy_home:
+                qq_home = legacy_home
+                qq_home_name_env = "QQ_HOME_CHANNEL_NAME"
+                import logging
+                logging.getLogger(__name__).warning(
+                    "QQ_HOME_CHANNEL is deprecated; rename to QQBOT_HOME_CHANNEL "
+                    "in your .env for consistency with the platform key."
+                )
        if qq_home:
            config.platforms[Platform.QQBOT].home_channel = HomeChannel(
                platform=Platform.QQBOT,
                chat_id=qq_home,
-                name=os.getenv("QQ_HOME_CHANNEL_NAME", "Home"),
+                name=os.getenv("QQBOT_HOME_CHANNEL_NAME") or os.getenv(qq_home_name_env, "Home"),
            )

    # Session settings
@@ -1579,7 +1579,20 @@ class BasePlatformAdapter(ABC):
            # session lifecycle and its cleanup races with the running task
            # (see PR #4926).
            cmd = event.get_command()
-            if cmd in ("approve", "deny", "status", "stop", "new", "reset", "background", "restart", "queue", "q"):
+            if cmd in (
+                "approve",
+                "deny",
+                "status",
+                "agents",
+                "tasks",
+                "stop",
+                "new",
+                "reset",
+                "background",
+                "restart",
+                "queue",
+                "q",
+            ):
                logger.debug(
                    "[%s] Command '/%s' bypassing active-session guard for %s",
                    self.name, cmd, session_key,
@@ -0,0 +1,57 @@
+"""
+QQBot platform package.
+
+Re-exports the main adapter symbols from ``adapter.py`` (the original
+``qqbot.py``) so that **all existing import paths remain unchanged**::
+
+    from gateway.platforms.qqbot import QQAdapter          # works
+    from gateway.platforms.qqbot import check_qq_requirements  # works
+
+New modules:
+    - ``constants`` — shared constants (API URLs, timeouts, message types)
+    - ``utils`` — User-Agent builder, config helpers
+    - ``crypto`` — AES-256-GCM key generation and decryption
+    - ``onboard`` — QR-code scan-to-configure flow
+"""
+
+# -- Adapter (original qqbot.py) ------------------------------------------
+from .adapter import (  # noqa: F401
+    QQAdapter,
+    QQCloseError,
+    check_qq_requirements,
+    _coerce_list,
+    _ssrf_redirect_guard,
+)
+
+# -- Onboard (QR-code scan-to-configure) -----------------------------------
+from .onboard import (  # noqa: F401
+    BindStatus,
+    create_bind_task,
+    poll_bind_result,
+    build_connect_url,
+)
+from .crypto import decrypt_secret, generate_bind_key  # noqa: F401
+
+# -- Utils -----------------------------------------------------------------
+from .utils import build_user_agent, get_api_headers, coerce_list  # noqa: F401
+
+__all__ = [
+    # adapter
+    "QQAdapter",
+    "QQCloseError",
+    "check_qq_requirements",
+    "_coerce_list",
+    "_ssrf_redirect_guard",
+    # onboard
+    "BindStatus",
+    "create_bind_task",
+    "poll_bind_result",
+    "build_connect_url",
+    # crypto
+    "decrypt_secret",
+    "generate_bind_key",
+    # utils
+    "build_user_agent",
+    "get_api_headers",
+    "coerce_list",
+]
@@ -0,0 +1,74 @@
+"""QQBot package-level constants shared across adapter, onboard, and other modules."""
+
+from __future__ import annotations
+
+import os
+
+# ---------------------------------------------------------------------------
+# QQBot adapter version — bump on functional changes to the adapter package.
+# ---------------------------------------------------------------------------
+
+QQBOT_VERSION = "1.1.0"
+
+# ---------------------------------------------------------------------------
+# API endpoints
+# ---------------------------------------------------------------------------
+
+# The portal domain is configurable via QQ_API_HOST for corporate proxies
+# or test environments.  Default: q.qq.com (production).
+PORTAL_HOST = os.getenv("QQ_PORTAL_HOST", "q.qq.com")
+
+API_BASE = "https://api.sgroup.qq.com"
+TOKEN_URL = "https://bots.qq.com/app/getAppAccessToken"
+GATEWAY_URL_PATH = "/gateway"
+
+# QR-code onboard endpoints (on the portal host)
+ONBOARD_CREATE_PATH = "/lite/create_bind_task"
+ONBOARD_POLL_PATH = "/lite/poll_bind_result"
+QR_URL_TEMPLATE = (
+    "https://q.qq.com/qqbot/openclaw/connect.html"
+    "?task_id={task_id}&_wv=2&source=hermes"
+)
+
+# ---------------------------------------------------------------------------
+# Timeouts & retry
+# ---------------------------------------------------------------------------
+
+DEFAULT_API_TIMEOUT = 30.0
+FILE_UPLOAD_TIMEOUT = 120.0
+CONNECT_TIMEOUT_SECONDS = 20.0
+
+RECONNECT_BACKOFF = [2, 5, 10, 30, 60]
+MAX_RECONNECT_ATTEMPTS = 100
+RATE_LIMIT_DELAY = 60  # seconds
+QUICK_DISCONNECT_THRESHOLD = 5.0  # seconds
+MAX_QUICK_DISCONNECT_COUNT = 3
+
+ONBOARD_POLL_INTERVAL = 2.0  # seconds between poll_bind_result calls
+ONBOARD_API_TIMEOUT = 10.0
+
+# ---------------------------------------------------------------------------
+# Message limits
+# ---------------------------------------------------------------------------
+
+MAX_MESSAGE_LENGTH = 4000
+DEDUP_WINDOW_SECONDS = 300
+DEDUP_MAX_SIZE = 1000
+
+# ---------------------------------------------------------------------------
+# QQ Bot message types
+# ---------------------------------------------------------------------------
+
+MSG_TYPE_TEXT = 0
+MSG_TYPE_MARKDOWN = 2
+MSG_TYPE_MEDIA = 7
+MSG_TYPE_INPUT_NOTIFY = 6
+
+# ---------------------------------------------------------------------------
+# QQ Bot file media types
+# ---------------------------------------------------------------------------
+
+MEDIA_TYPE_IMAGE = 1
+MEDIA_TYPE_VIDEO = 2
+MEDIA_TYPE_VOICE = 3
+MEDIA_TYPE_FILE = 4
@@ -0,0 +1,45 @@
+"""AES-256-GCM utilities for QQBot scan-to-configure credential decryption."""
+
+from __future__ import annotations
+
+import base64
+import os
+
+
+def generate_bind_key() -> str:
+    """Generate a 256-bit random AES key and return it as base64.
+
+    The key is passed to ``create_bind_task`` so the server can encrypt
+    the bot's *client_secret* before returning it.  Only this CLI holds
+    the key, ensuring the secret never travels in plaintext.
+    """
+    return base64.b64encode(os.urandom(32)).decode()
+
+
+def decrypt_secret(encrypted_base64: str, key_base64: str) -> str:
+    """Decrypt a base64-encoded AES-256-GCM ciphertext.
+
+    Ciphertext layout (after base64-decoding)::
+
+        IV (12 bytes) ‖ ciphertext (N bytes) ‖ AuthTag (16 bytes)
+
+    Args:
+        encrypted_base64: The ``bot_encrypt_secret`` value from
+            ``poll_bind_result``.
+        key_base64: The base64 AES key generated by
+            :func:`generate_bind_key`.
+
+    Returns:
+        The decrypted *client_secret* as a UTF-8 string.
+    """
+    from cryptography.hazmat.primitives.ciphers.aead import AESGCM
+
+    key = base64.b64decode(key_base64)
+    raw = base64.b64decode(encrypted_base64)
+
+    iv = raw[:12]
+    ciphertext_with_tag = raw[12:]  # AESGCM expects ciphertext + tag concatenated
+
+    aesgcm = AESGCM(key)
+    plaintext = aesgcm.decrypt(iv, ciphertext_with_tag, None)
+    return plaintext.decode("utf-8")
@@ -0,0 +1,124 @@
+"""
+QQBot scan-to-configure (QR code onboard) module.
+
+Calls the ``q.qq.com`` ``create_bind_task`` / ``poll_bind_result`` APIs to
+generate a QR-code URL and poll for scan completion.  On success the caller
+receives the bot's *app_id*, *client_secret* (decrypted locally), and the
+scanner's *user_openid* — enough to fully configure the QQBot gateway.
+
+Reference: https://bot.q.qq.com/wiki/develop/api-v2/
+"""
+
+from __future__ import annotations
+
+import logging
+from enum import IntEnum
+from typing import Tuple
+from urllib.parse import quote
+
+from .constants import (
+    ONBOARD_API_TIMEOUT,
+    ONBOARD_CREATE_PATH,
+    ONBOARD_POLL_PATH,
+    PORTAL_HOST,
+    QR_URL_TEMPLATE,
+)
+from .crypto import generate_bind_key
+from .utils import get_api_headers
+
+logger = logging.getLogger(__name__)
+
+
+# ---------------------------------------------------------------------------
+# Bind status
+# ---------------------------------------------------------------------------
+
+
+class BindStatus(IntEnum):
+    """Status codes returned by ``poll_bind_result``."""
+
+    NONE = 0
+    PENDING = 1
+    COMPLETED = 2
+    EXPIRED = 3
+
+
+# ---------------------------------------------------------------------------
+# Public API
+# ---------------------------------------------------------------------------
+
+
+async def create_bind_task(
+    timeout: float = ONBOARD_API_TIMEOUT,
+) -> Tuple[str, str]:
+    """Create a bind task and return *(task_id, aes_key_base64)*.
+
+    The AES key is generated locally and sent to the server so it can
+    encrypt the bot credentials before returning them.
+
+    Raises:
+        RuntimeError: If the API returns a non-zero ``retcode``.
+    """
+    import httpx
+
+    url = f"https://{PORTAL_HOST}{ONBOARD_CREATE_PATH}"
+    key = generate_bind_key()
+
+    async with httpx.AsyncClient(timeout=timeout, follow_redirects=True) as client:
+        resp = await client.post(url, json={"key": key}, headers=get_api_headers())
+        resp.raise_for_status()
+        data = resp.json()
+
+    if data.get("retcode") != 0:
+        raise RuntimeError(data.get("msg", "create_bind_task failed"))
+
+    task_id = data.get("data", {}).get("task_id")
+    if not task_id:
+        raise RuntimeError("create_bind_task: missing task_id in response")
+
+    logger.debug("create_bind_task ok: task_id=%s", task_id)
+    return task_id, key
+
+
+async def poll_bind_result(
+    task_id: str,
+    timeout: float = ONBOARD_API_TIMEOUT,
+) -> Tuple[BindStatus, str, str, str]:
+    """Poll the bind result for *task_id*.
+
+    Returns:
+        A 4-tuple of ``(status, bot_appid, bot_encrypt_secret, user_openid)``.
+
+        * ``bot_encrypt_secret`` is AES-256-GCM encrypted — decrypt it with
+          :func:`~gateway.platforms.qqbot.crypto.decrypt_secret` using the
+          key from :func:`create_bind_task`.
+        * ``user_openid`` is the OpenID of the person who scanned the code
+          (available when ``status == COMPLETED``).
+
+    Raises:
+        RuntimeError: If the API returns a non-zero ``retcode``.
+    """
+    import httpx
+
+    url = f"https://{PORTAL_HOST}{ONBOARD_POLL_PATH}"
+
+    async with httpx.AsyncClient(timeout=timeout, follow_redirects=True) as client:
+        resp = await client.post(url, json={"task_id": task_id}, headers=get_api_headers())
+        resp.raise_for_status()
+        data = resp.json()
+
+    if data.get("retcode") != 0:
+        raise RuntimeError(data.get("msg", "poll_bind_result failed"))
+
+    d = data.get("data", {})
+    return (
+        BindStatus(d.get("status", 0)),
+        str(d.get("bot_appid", "")),
+        d.get("bot_encrypt_secret", ""),
+        d.get("user_openid", ""),
+    )
+
+
+def build_connect_url(task_id: str) -> str:
+    """Build the QR-code target URL for a given *task_id*."""
+    return QR_URL_TEMPLATE.format(task_id=quote(task_id))
@@ -0,0 +1,71 @@
+"""QQBot shared utilities — User-Agent, HTTP helpers, config coercion."""
+
+from __future__ import annotations
+
+import platform
+import sys
+from typing import Any, Dict, List
+
+from .constants import QQBOT_VERSION
+
+
+# ---------------------------------------------------------------------------
+# User-Agent
+# ---------------------------------------------------------------------------
+
+def _get_hermes_version() -> str:
+    """Return the hermes-agent package version, or 'dev' if unavailable."""
+    try:
+        from importlib.metadata import version
+        return version("hermes-agent")
+    except Exception:
+        return "dev"
+
+
+def build_user_agent() -> str:
+    """Build a descriptive User-Agent string.
+
+    Format::
+
+        QQBotAdapter/<qqbot_version> (Python/<py_version>; <os>; Hermes/<hermes_version>)
+
+    Example::
+
+        QQBotAdapter/1.0.0 (Python/3.11.15; darwin; Hermes/0.9.0)
+    """
+    py_version = f"{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}"
+    os_name = platform.system().lower()
+    hermes_version = _get_hermes_version()
+    return f"QQBotAdapter/{QQBOT_VERSION} (Python/{py_version}; {os_name}; Hermes/{hermes_version})"
+
+
+def get_api_headers() -> Dict[str, str]:
+    """Return standard HTTP headers for QQBot API requests.
+
+    Includes ``Content-Type``, ``Accept``, and a dynamic ``User-Agent``.
+    ``q.qq.com`` requires ``Accept: application/json`` — without it,
+    the server returns a JavaScript anti-bot challenge page.
+    """
+    return {
+        "Content-Type": "application/json",
+        "Accept": "application/json",
+        "User-Agent": build_user_agent(),
+    }
+
+
+# ---------------------------------------------------------------------------
+# Config helpers
+# ---------------------------------------------------------------------------
+
+def coerce_list(value: Any) -> List[str]:
+    """Coerce config values into a trimmed string list.
+
+    Accepts comma-separated strings, lists, tuples, sets, or single values.
+    """
+    if value is None:
+        return []
+    if isinstance(value, str):
+        return [item.strip() for item in value.split(",") if item.strip()]
+    if isinstance(value, (list, tuple, set)):
+        return [str(item).strip() for item in value if str(item).strip()]
+    return [str(value).strip()] if str(value).strip() else []
@@ -118,6 +118,84 @@ def _strip_mdv2(text: str) -> str:
    return cleaned


+# ---------------------------------------------------------------------------
+# Markdown table → code block conversion
+# ---------------------------------------------------------------------------
+# Telegram's MarkdownV2 has no table syntax — '|' is just an escaped literal,
+# so pipe tables render as noisy backslash-pipe text with no alignment.
+# Wrapping the table in a fenced code block makes Telegram render it as
+# monospace preformatted text with columns intact.
+
+# Matches a GFM table delimiter row: optional outer pipes, cells containing
+# only dashes (with optional leading/trailing colons for alignment) separated
+# by '|'.  Requires at least one internal '|' so lone '---' horizontal rules
+# are NOT matched.
+_TABLE_SEPARATOR_RE = re.compile(
+    r'^\s*\|?\s*:?-+:?\s*(?:\|\s*:?-+:?\s*){1,}\|?\s*$'
+)
+
+
+def _is_table_row(line: str) -> bool:
+    """Return True if *line* could plausibly be a table data row."""
+    stripped = line.strip()
+    return bool(stripped) and '|' in stripped
+
+
+def _wrap_markdown_tables(text: str) -> str:
+    """Wrap GFM-style pipe tables in ``` fences so Telegram renders them.
+
+    Detected by a row containing '|' immediately followed by a delimiter
+    row matching :data:`_TABLE_SEPARATOR_RE`.  Subsequent pipe-containing
+    non-blank lines are consumed as the table body and included in the
+    wrapped block.  Tables inside existing fenced code blocks are left
+    alone.
+    """
+    if '|' not in text or '-' not in text:
+        return text
+
+    lines = text.split('\n')
+    out: list[str] = []
+    in_fence = False
+    i = 0
+    while i < len(lines):
+        line = lines[i]
+        stripped = line.lstrip()
+
+        # Track existing fenced code blocks — never touch content inside.
+        if stripped.startswith('```'):
+            in_fence = not in_fence
+            out.append(line)
+            i += 1
+            continue
+        if in_fence:
+            out.append(line)
+            i += 1
+            continue
+
+        # Look for a header row (contains '|') immediately followed by a
+        # delimiter row.
+        if (
+            '|' in line
+            and i + 1 < len(lines)
+            and _TABLE_SEPARATOR_RE.match(lines[i + 1])
+        ):
+            table_block = [line, lines[i + 1]]
+            j = i + 2
+            while j < len(lines) and _is_table_row(lines[j]):
+                table_block.append(lines[j])
+                j += 1
+            out.append('```')
+            out.extend(table_block)
+            out.append('```')
+            i = j
+            continue
+
+        out.append(line)
+        i += 1
+
+    return '\n'.join(out)
+
+
 class TelegramAdapter(BasePlatformAdapter):
    """
    Telegram bot adapter.
@@ -1916,6 +1994,12 @@ class TelegramAdapter(BasePlatformAdapter):

        text = content

+        # 0) Pre-wrap GFM-style pipe tables in ``` fences.  Telegram can't
+        #    render tables natively, but fenced code blocks render as
+        #    monospace preformatted text with columns intact.  The wrapped
+        #    tables then flow through step (1) below as protected regions.
+        text = _wrap_markdown_tables(text)
+
        # 1) Protect fenced code blocks (``` ... ```)
        #    Per MarkdownV2 spec, \ and ` inside pre/code must be escaped.
        def _protect_fenced(m):
@@ -2178,6 +2178,30 @@ class GatewayRunner:
                        )
                except Exception as _e:
                    logger.debug("Idle agent sweep failed: %s", _e)
+
+                # Periodically prune stale SessionStore entries.  The
+                # in-memory dict (and sessions.json) would otherwise grow
+                # unbounded in gateways serving many rotating chats /
+                # threads / users over long time windows.  Pruning is
+                # invisible to users — a resumed session just gets a
+                # fresh session_id, exactly as if the reset policy fired.
+                _last_prune_ts = getattr(self, "_last_session_store_prune_ts", 0.0)
+                _prune_interval = 3600.0  # once per hour
+                if time.time() - _last_prune_ts > _prune_interval:
+                    try:
+                        _max_age = int(
+                            getattr(self.config, "session_store_max_age_days", 0) or 0
+                        )
+                        if _max_age > 0:
+                            _pruned = self.session_store.prune_old_entries(_max_age)
+                            if _pruned:
+                                logger.info(
+                                    "SessionStore prune: dropped %d stale entries",
+                                    _pruned,
+                                )
+                    except Exception as _e:
+                        logger.debug("SessionStore prune failed: %s", _e)
+                    self._last_session_store_prune_ts = time.time()
            except Exception as e:
                logger.debug("Session expiry watcher error: %s", e)
            # Sleep in small increments so we can stop quickly
@@ -2384,6 +2408,7 @@ class GatewayRunner:

            self.adapters.clear()
            self._running_agents.clear()
+            self._running_agents_ts.clear()
            self._pending_messages.clear()
            self._pending_approvals.clear()
            if hasattr(self, '_busy_ack_ts'):
@@ -2408,6 +2433,20 @@ class GatewayRunner:
            except Exception:
                pass

+            # Close SQLite session DBs so the WAL write lock is released.
+            # Without this, --replace and similar restart flows leave the
+            # old gateway's connection holding the WAL lock until Python
+            # actually exits — causing 'database is locked' errors when
+            # the new gateway tries to open the same file.
+            for _db_holder in (self, getattr(self, "session_store", None)):
+                _db = getattr(_db_holder, "_db", None) if _db_holder else None
+                if _db is None or not hasattr(_db, "close"):
+                    continue
+                try:
+                    _db.close()
+                except Exception as _e:
+                    logger.debug("SessionDB close error: %s", _e)
+
            from gateway.status import remove_pid_file
            remove_pid_file()

@@ -2906,9 +2945,7 @@ class GatewayRunner:
                    _quick_key[:30], _stale_age, _stale_idle,
                    _raw_stale_timeout, _stale_detail,
                )
-                del self._running_agents[_quick_key]
-                self._running_agents_ts.pop(_quick_key, None)
-                self._busy_ack_ts.pop(_quick_key, None)
+                self._release_running_agent_state(_quick_key)

        if _quick_key in self._running_agents:
            if event.get_command() == "status":
@@ -2936,8 +2973,7 @@ class GatewayRunner:
                if adapter and hasattr(adapter, 'get_pending_message'):
                    adapter.get_pending_message(_quick_key)  # consume and discard
                self._pending_messages.pop(_quick_key, None)
-                if _quick_key in self._running_agents:
-                    del self._running_agents[_quick_key]
+                self._release_running_agent_state(_quick_key)
                logger.info("STOP for session %s — agent interrupted, session lock released", _quick_key[:20])
                return "⚡ Stopped. You can continue this session."

@@ -2959,8 +2995,7 @@ class GatewayRunner:
                self._pending_messages.pop(_quick_key, None)
                # Clean up the running agent entry so the reset handler
                # doesn't think an agent is still active.
-                if _quick_key in self._running_agents:
-                    del self._running_agents[_quick_key]
+                self._release_running_agent_state(_quick_key)
                return await self._handle_reset_command(event)

            # /queue <prompt> — queue without interrupting
@@ -2994,6 +3029,10 @@ class GatewayRunner:
                    return await self._handle_approve_command(event)
                return await self._handle_deny_command(event)

+            # /agents (/tasks alias) should be query-only and never interrupt.
+            if _cmd_def_inner and _cmd_def_inner.name == "agents":
+                return await self._handle_agents_command(event)
+
            # /background must bypass the running-agent guard — it starts a
            # parallel task and must never interrupt the active conversation.
            if _cmd_def_inner and _cmd_def_inner.name == "background":
@@ -3037,8 +3076,7 @@ class GatewayRunner:
                # Agent is being set up but not ready yet.
                if event.get_command() == "stop":
                    # Force-clean the sentinel so the session is unlocked.
-                    if _quick_key in self._running_agents:
-                        del self._running_agents[_quick_key]
+                    self._release_running_agent_state(_quick_key)
                    logger.info("HARD STOP (pending) for session %s — sentinel cleared", _quick_key[:20])
                    return "⚡ Force-stopped. The agent was still starting — session unlocked."
                # Queue the message so it will be picked up after the
@@ -3102,6 +3140,9 @@ class GatewayRunner:
        if canonical == "status":
            return await self._handle_status_command(event)

+        if canonical == "agents":
+            return await self._handle_agents_command(event)
+
        if canonical == "restart":
            return await self._handle_restart_command(event)
        
@@ -3354,8 +3395,13 @@ class GatewayRunner:
            # (exception, command fallthrough, etc.) the sentinel must
            # not linger or the session would be permanently locked out.
            if self._running_agents.get(_quick_key) is _AGENT_PENDING_SENTINEL:
-                del self._running_agents[_quick_key]
-            self._running_agents_ts.pop(_quick_key, None)
+                self._release_running_agent_state(_quick_key)
+            else:
+                # Agent path already cleaned _running_agents; make sure
+                # the paired metadata dicts are gone too.
+                self._running_agents_ts.pop(_quick_key, None)
+                if hasattr(self, "_busy_ack_ts"):
+                    self._busy_ack_ts.pop(_quick_key, None)

    async def _prepare_inbound_message_text(
        self,
@@ -4552,6 +4598,96 @@ class GatewayRunner:
        ])

        return "\n".join(lines)
+
+    async def _handle_agents_command(self, event: MessageEvent) -> str:
+        """Handle /agents command - list active agents and running tasks."""
+        from tools.process_registry import format_uptime_short, process_registry
+
+        now = time.time()
+        current_session_key = self._session_key_for_source(event.source)
+
+        running_agents: dict = getattr(self, "_running_agents", {}) or {}
+        running_started: dict = getattr(self, "_running_agents_ts", {}) or {}
+
+        agent_rows: list[dict] = []
+        for session_key, agent in running_agents.items():
+            started = float(running_started.get(session_key, now))
+            elapsed = max(0, int(now - started))
+            is_pending = agent is _AGENT_PENDING_SENTINEL
+            agent_rows.append(
+                {
+                    "session_key": session_key,
+                    "elapsed": elapsed,
+                    "state": "starting" if is_pending else "running",
+                    "session_id": "" if is_pending else str(getattr(agent, "session_id", "") or ""),
+                    "model": "" if is_pending else str(getattr(agent, "model", "") or ""),
+                }
+            )
+
+        agent_rows.sort(key=lambda row: row["elapsed"], reverse=True)
+
+        running_processes: list[dict] = []
+        try:
+            running_processes = [
+                p for p in process_registry.list_sessions()
+                if p.get("status") == "running"
+            ]
+        except Exception:
+            running_processes = []
+
+        background_tasks = [
+            t for t in (getattr(self, "_background_tasks", set()) or set())
+            if hasattr(t, "done") and not t.done()
+        ]
+
+        lines = [
+            "🤖 **Active Agents & Tasks**",
+            "",
+            f"**Active agents:** {len(agent_rows)}",
+        ]
+
+        if agent_rows:
+            for idx, row in enumerate(agent_rows[:12], 1):
+                current = " · this chat" if row["session_key"] == current_session_key else ""
+                sid = f" · `{row['session_id']}`" if row["session_id"] else ""
+                model = f" · `{row['model']}`" if row["model"] else ""
+                lines.append(
+                    f"{idx}. `{row['session_key']}` · {row['state']} · "
+                    f"{format_uptime_short(row['elapsed'])}{sid}{model}{current}"
+                )
+            if len(agent_rows) > 12:
+                lines.append(f"... and {len(agent_rows) - 12} more")
+
+        lines.extend(
+            [
+                "",
+                f"**Running background processes:** {len(running_processes)}",
+            ]
+        )
+        if running_processes:
+            for proc in running_processes[:12]:
+                cmd = " ".join(str(proc.get("command", "")).split())
+                if len(cmd) > 90:
+                    cmd = cmd[:87] + "..."
+                lines.append(
+                    f"- `{proc.get('session_id', '?')}` · "
+                    f"{format_uptime_short(int(proc.get('uptime_seconds', 0)))} · `{cmd}`"
+                )
+            if len(running_processes) > 12:
+                lines.append(f"... and {len(running_processes) - 12} more")
+
+        lines.extend(
+            [
+                "",
+                f"**Gateway async jobs:** {len(background_tasks)}",
+            ]
+        )
+
+        if not agent_rows and not running_processes and not background_tasks:
+            lines.append("")
+            lines.append("No active agents or running tasks.")
+
+        return "\n".join(lines)
    
    async def _handle_stop_command(self, event: MessageEvent) -> str:
        """Handle /stop command - interrupt a running agent.
@@ -4571,16 +4707,14 @@ class GatewayRunner:
        agent = self._running_agents.get(session_key)
        if agent is _AGENT_PENDING_SENTINEL:
            # Force-clean the sentinel so the session is unlocked.
-            if session_key in self._running_agents:
-                del self._running_agents[session_key]
+            self._release_running_agent_state(session_key)
            logger.info("STOP (pending) for session %s — sentinel cleared", session_key[:20])
            return "⚡ Stopped. The agent hadn't started yet — you can continue this session."
        if agent:
            agent.interrupt("Stop requested")
            # Force-clean the session lock so a truly hung agent doesn't
            # keep it locked forever.
-            if session_key in self._running_agents:
-                del self._running_agents[session_key]
+            self._release_running_agent_state(session_key)
            return "⚡ Stopped. You can continue this session."
        else:
            return "No active task to stop."
@@ -6496,8 +6630,7 @@ class GatewayRunner:
            logger.debug("Memory flush on resume failed: %s", e)

        # Clear any running agent for this session key
-        if session_key in self._running_agents:
-            del self._running_agents[session_key]
+        self._release_running_agent_state(session_key)

        # Switch the session entry to point at the old session
        new_entry = self.session_store.switch_session(session_key, target_id)
@@ -7913,6 +8046,30 @@ class GatewayRunner:
        override = self._session_model_overrides.get(session_key)
        return override is not None and override.get("model") == agent_model

+    def _release_running_agent_state(self, session_key: str) -> None:
+        """Pop ALL per-running-agent state entries for ``session_key``.
+
+        Replaces ad-hoc ``del self._running_agents[key]`` calls scattered
+        across the gateway.  Those sites had drifted: some popped only
+        ``_running_agents``; some also ``_running_agents_ts``; only one
+        path also cleared ``_busy_ack_ts``.  Each missed entry was a
+        small, persistent leak — a (str_key → float) tuple per session
+        per gateway lifetime.
+
+        Use this at every site that ends a running turn, regardless of
+        cause (normal completion, /stop, /reset, /resume, sentinel
+        cleanup, stale-eviction).  Per-session state that PERSISTS
+        across turns (``_session_model_overrides``, ``_voice_mode``,
+        ``_pending_approvals``, ``_update_prompt_pending``) is NOT
+        touched here — those have their own lifecycles.
+        """
+        if not session_key:
+            return
+        self._running_agents.pop(session_key, None)
+        self._running_agents_ts.pop(session_key, None)
+        if hasattr(self, "_busy_ack_ts"):
+            self._busy_ack_ts.pop(session_key, None)
+
    def _evict_cached_agent(self, session_key: str) -> None:
        """Remove a cached agent for a session (called on /new, /model, etc)."""
        _lock = getattr(self, "_agent_cache_lock", None)
@@ -9748,10 +9905,8 @@ class GatewayRunner:
            
            # Clean up tracking
            tracking_task.cancel()
-            if session_key and session_key in self._running_agents:
-                del self._running_agents[session_key]
            if session_key:
-                self._running_agents_ts.pop(session_key, None)
+                self._release_running_agent_state(session_key)
            if self._draining:
                self._update_runtime_status("draining")
            
@@ -802,6 +802,57 @@ class SessionStore:
                return True
        return False

+    def prune_old_entries(self, max_age_days: int) -> int:
+        """Drop SessionEntry records older than max_age_days.
+
+        Pruning is based on ``updated_at`` (last activity), not ``created_at``.
+        A session that's been active within the window is kept regardless of
+        how old it is.  Entries marked ``suspended`` are kept — the user
+        explicitly paused them for later resume.  Entries held by an active
+        process (via has_active_processes_fn) are also kept so long-running
+        background work isn't orphaned.
+
+        Pruning is functionally identical to a natural reset-policy expiry:
+        the transcript in SQLite stays, but the session_key → session_id
+        mapping is dropped and the user starts a fresh session on return.
+
+        ``max_age_days <= 0`` disables pruning; returns 0 immediately.
+        Returns the number of entries removed.
+        """
+        if max_age_days is None or max_age_days <= 0:
+            return 0
+        from datetime import timedelta
+
+        cutoff = _now() - timedelta(days=max_age_days)
+        removed_keys: list[str] = []
+
+        with self._lock:
+            self._ensure_loaded_locked()
+            for key, entry in list(self._entries.items()):
+                if entry.suspended:
+                    continue
+                # Never prune sessions with an active background process
+                # attached — the user may still be waiting on output.
+                if self._has_active_processes_fn is not None:
+                    try:
+                        if self._has_active_processes_fn(entry.session_id):
+                            continue
+                    except Exception:
+                        pass
+                if entry.updated_at < cutoff:
+                    removed_keys.append(key)
+            for key in removed_keys:
+                self._entries.pop(key, None)
+            if removed_keys:
+                self._save()
+
+        if removed_keys:
+            logger.info(
+                "SessionStore pruned %d entries older than %d days",
+                len(removed_keys), max_age_days,
+            )
+        return len(removed_keys)
+
    def suspend_recently_active(self, max_age_seconds: int = 120) -> int:
        """Mark recently-active sessions as suspended.

@@ -233,6 +233,14 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
        api_key_env_vars=("XAI_API_KEY",),
        base_url_env_var="XAI_BASE_URL",
    ),
+    "nvidia": ProviderConfig(
+        id="nvidia",
+        name="NVIDIA NIM",
+        auth_type="api_key",
+        inference_base_url="https://integrate.api.nvidia.com/v1",
+        api_key_env_vars=("NVIDIA_API_KEY",),
+        base_url_env_var="NVIDIA_BASE_URL",
+    ),
    "ai-gateway": ProviderConfig(
        id="ai-gateway",
        name="Vercel AI Gateway",
@@ -7,8 +7,8 @@ CLI tools that ship with the platform (or are commonly installed).

 Platform support:
  macOS   — osascript (always available), pngpaste (if installed)
-  Windows — PowerShell via .NET System.Windows.Forms.Clipboard
-  WSL2    — powershell.exe via .NET System.Windows.Forms.Clipboard
+  Windows — PowerShell via WinForms, Get-Clipboard, file-drop fallback
+  WSL2    — powershell.exe via WinForms, Get-Clipboard, file-drop fallback
  Linux   — wl-paste (Wayland), xclip (X11)
 """

@@ -46,10 +46,11 @@ def has_clipboard_image() -> bool:
        return _macos_has_image()
    if sys.platform == "win32":
        return _windows_has_image()
-    if _is_wsl():
-        return _wsl_has_image()
-    if os.environ.get("WAYLAND_DISPLAY"):
-        return _wayland_has_image()
+    # Match _linux_save fallthrough order: WSL → Wayland → X11
+    if _is_wsl() and _wsl_has_image():
+        return True
+    if os.environ.get("WAYLAND_DISPLAY") and _wayland_has_image():
+        return True
    return _xclip_has_image()


@@ -135,6 +136,114 @@ _PS_EXTRACT_IMAGE = (
    "[System.Convert]::ToBase64String($ms.ToArray())"
 )

+_PS_CHECK_IMAGE_GET_CLIPBOARD = (
+    "try { "
+    "$img = Get-Clipboard -Format Image -ErrorAction Stop;"
+    "if ($null -ne $img) { 'True' } else { 'False' }"
+    "} catch { 'False' }"
+)
+
+_PS_EXTRACT_IMAGE_GET_CLIPBOARD = (
+    "try { "
+    "Add-Type -AssemblyName System.Drawing;"
+    "Add-Type -AssemblyName PresentationCore;"
+    "Add-Type -AssemblyName WindowsBase;"
+    "$img = Get-Clipboard -Format Image -ErrorAction Stop;"
+    "if ($null -eq $img) { exit 1 }"
+    "$ms = New-Object System.IO.MemoryStream;"
+    "if ($img -is [System.Drawing.Image]) {"
+    "$img.Save($ms, [System.Drawing.Imaging.ImageFormat]::Png)"
+    "} elseif ($img -is [System.Windows.Media.Imaging.BitmapSource]) {"
+    "$enc = New-Object System.Windows.Media.Imaging.PngBitmapEncoder;"
+    "$enc.Frames.Add([System.Windows.Media.Imaging.BitmapFrame]::Create($img));"
+    "$enc.Save($ms)"
+    "} else { exit 2 }"
+    "[System.Convert]::ToBase64String($ms.ToArray())"
+    "} catch { exit 1 }"
+)
+
+_FILEDROP_IMAGE_EXTS = "'.png','.jpg','.jpeg','.gif','.webp','.bmp','.tiff','.tif'"
+
+_PS_CHECK_FILEDROP_IMAGE = (
+    "try { "
+    "$files = Get-Clipboard -Format FileDropList -ErrorAction Stop;"
+    f"$exts = @({_FILEDROP_IMAGE_EXTS});"
+    "$hit = $files | Where-Object { $exts -contains ([System.IO.Path]::GetExtension($_).ToLowerInvariant()) } | Select-Object -First 1;"
+    "if ($null -ne $hit) { 'True' } else { 'False' }"
+    "} catch { 'False' }"
+)
+
+_PS_EXTRACT_FILEDROP_IMAGE = (
+    "try { "
+    "$files = Get-Clipboard -Format FileDropList -ErrorAction Stop;"
+    f"$exts = @({_FILEDROP_IMAGE_EXTS});"
+    "$hit = $files | Where-Object { $exts -contains ([System.IO.Path]::GetExtension($_).ToLowerInvariant()) } | Select-Object -First 1;"
+    "if ($null -eq $hit) { exit 1 }"
+    "[System.Convert]::ToBase64String([System.IO.File]::ReadAllBytes($hit))"
+    "} catch { exit 1 }"
+)
+
+_POWERSHELL_HAS_IMAGE_SCRIPTS = (
+    _PS_CHECK_IMAGE,
+    _PS_CHECK_IMAGE_GET_CLIPBOARD,
+    _PS_CHECK_FILEDROP_IMAGE,
+)
+
+_POWERSHELL_EXTRACT_IMAGE_SCRIPTS = (
+    _PS_EXTRACT_IMAGE,
+    _PS_EXTRACT_IMAGE_GET_CLIPBOARD,
+    _PS_EXTRACT_FILEDROP_IMAGE,
+)
+
+
+def _run_powershell(exe: str, script: str, timeout: int) -> subprocess.CompletedProcess:
+    return subprocess.run(
+        [exe, "-NoProfile", "-NonInteractive", "-Command", script],
+        capture_output=True, text=True, timeout=timeout,
+    )
+
+
+def _write_base64_image(dest: Path, b64_data: str) -> bool:
+    image_bytes = base64.b64decode(b64_data, validate=True)
+    dest.write_bytes(image_bytes)
+    return dest.exists() and dest.stat().st_size > 0
+
+
+def _powershell_has_image(exe: str, *, timeout: int, label: str) -> bool:
+    for script in _POWERSHELL_HAS_IMAGE_SCRIPTS:
+        try:
+            r = _run_powershell(exe, script, timeout=timeout)
+            if r.returncode == 0 and "True" in r.stdout:
+                return True
+        except FileNotFoundError:
+            logger.debug("%s not found — clipboard unavailable", exe)
+            return False
+        except Exception as e:
+            logger.debug("%s clipboard image check failed: %s", label, e)
+    return False
+
+
+def _powershell_save_image(exe: str, dest: Path, *, timeout: int, label: str) -> bool:
+    for script in _POWERSHELL_EXTRACT_IMAGE_SCRIPTS:
+        try:
+            r = _run_powershell(exe, script, timeout=timeout)
+            if r.returncode != 0:
+                continue
+
+            b64_data = r.stdout.strip()
+            if not b64_data:
+                continue
+
+            if _write_base64_image(dest, b64_data):
+                return True
+        except FileNotFoundError:
+            logger.debug("%s not found — clipboard unavailable", exe)
+            return False
+        except Exception as e:
+            logger.debug("%s clipboard image extraction failed: %s", label, e)
+            dest.unlink(missing_ok=True)
+    return False
+

 # ── Native Windows ────────────────────────────────────────────────────────

@@ -175,15 +284,7 @@ def _windows_has_image() -> bool:
    ps = _get_ps_exe()
    if ps is None:
        return False
-    try:
-        r = subprocess.run(
-            [ps, "-NoProfile", "-NonInteractive", "-Command", _PS_CHECK_IMAGE],
-            capture_output=True, text=True, timeout=5,
-        )
-        return r.returncode == 0 and "True" in r.stdout
-    except Exception as e:
-        logger.debug("Windows clipboard image check failed: %s", e)
-    return False
+    return _powershell_has_image(ps, timeout=5, label="Windows")


 def _windows_save(dest: Path) -> bool:
@@ -192,26 +293,7 @@ def _windows_save(dest: Path) -> bool:
    if ps is None:
        logger.debug("No PowerShell found — Windows clipboard image paste unavailable")
        return False
-    try:
-        r = subprocess.run(
-            [ps, "-NoProfile", "-NonInteractive", "-Command", _PS_EXTRACT_IMAGE],
-            capture_output=True, text=True, timeout=15,
-        )
-        if r.returncode != 0:
-            return False
-
-        b64_data = r.stdout.strip()
-        if not b64_data:
-            return False
-
-        png_bytes = base64.b64decode(b64_data)
-        dest.write_bytes(png_bytes)
-        return dest.exists() and dest.stat().st_size > 0
-
-    except Exception as e:
-        logger.debug("Windows clipboard image extraction failed: %s", e)
-        dest.unlink(missing_ok=True)
-    return False
+    return _powershell_save_image(ps, dest, timeout=15, label="Windows")


 # ── Linux ────────────────────────────────────────────────────────────────
@@ -235,45 +317,12 @@ def _linux_save(dest: Path) -> bool:

 def _wsl_has_image() -> bool:
    """Check if Windows clipboard has an image (via powershell.exe)."""
-    try:
-        r = subprocess.run(
-            ["powershell.exe", "-NoProfile", "-NonInteractive", "-Command",
-             _PS_CHECK_IMAGE],
-            capture_output=True, text=True, timeout=8,
-        )
-        return r.returncode == 0 and "True" in r.stdout
-    except FileNotFoundError:
-        logger.debug("powershell.exe not found — WSL clipboard unavailable")
-    except Exception as e:
-        logger.debug("WSL clipboard check failed: %s", e)
-    return False
+    return _powershell_has_image("powershell.exe", timeout=8, label="WSL")


 def _wsl_save(dest: Path) -> bool:
    """Extract clipboard image via powershell.exe → base64 → decode to PNG."""
-    try:
-        r = subprocess.run(
-            ["powershell.exe", "-NoProfile", "-NonInteractive", "-Command",
-             _PS_EXTRACT_IMAGE],
-            capture_output=True, text=True, timeout=15,
-        )
-        if r.returncode != 0:
-            return False
-
-        b64_data = r.stdout.strip()
-        if not b64_data:
-            return False
-
-        png_bytes = base64.b64decode(b64_data)
-        dest.write_bytes(png_bytes)
-        return dest.exists() and dest.stat().st_size > 0
-
-    except FileNotFoundError:
-        logger.debug("powershell.exe not found — WSL clipboard unavailable")
-    except Exception as e:
-        logger.debug("WSL clipboard extraction failed: %s", e)
-        dest.unlink(missing_ok=True)
-    return False
+    return _powershell_save_image("powershell.exe", dest, timeout=15, label="WSL")


 # ── Wayland (wl-paste) ──────────────────────────────────────────────────
@@ -87,6 +87,8 @@ COMMAND_REGISTRY: list[CommandDef] = [
               aliases=("bg",), args_hint="<prompt>"),
    CommandDef("btw", "Ephemeral side question using session context (no tools, not persisted)", "Session",
               args_hint="<question>"),
+    CommandDef("agents", "Show active agents and running tasks", "Session",
+               aliases=("tasks",)),
    CommandDef("queue", "Queue a prompt for the next turn (doesn't interrupt)", "Session",
               aliases=("q",), args_hint="<prompt>"),
    CommandDef("status", "Show session info", "Session"),
@@ -99,7 +101,7 @@ COMMAND_REGISTRY: list[CommandDef] = [
    # Configuration
    CommandDef("config", "Show current configuration", "Configuration",
               cli_only=True),
-    CommandDef("model", "Switch model for this session", "Configuration", args_hint="[model] [--global]"),
+    CommandDef("model", "Switch model for this session", "Configuration", args_hint="[model] [--provider name] [--global]"),
    CommandDef("provider", "Show available providers and current provider",
               "Configuration"),
    CommandDef("gquota", "Show Google Gemini Code Assist quota usage", "Info"),
@@ -120,7 +122,7 @@ COMMAND_REGISTRY: list[CommandDef] = [
               args_hint="[normal|fast|status]",
               subcommands=("normal", "fast", "status", "on", "off")),
    CommandDef("skin", "Show or change the display skin/theme", "Configuration",
-               cli_only=True, args_hint="[name]"),
+               args_hint="[name]"),
    CommandDef("voice", "Toggle voice mode", "Configuration",
               args_hint="[on|off|tts|status]", subcommands=("on", "off", "tts", "status")),

@@ -155,7 +157,9 @@ COMMAND_REGISTRY: list[CommandDef] = [
               args_hint="[days]"),
    CommandDef("platforms", "Show gateway/messaging platform status", "Info",
               cli_only=True, aliases=("gateway",)),
-    CommandDef("paste", "Check clipboard for an image and attach it", "Info",
+    CommandDef("copy", "Copy the last assistant response to clipboard", "Info",
+               cli_only=True, args_hint="[number]"),
+    CommandDef("paste", "Attach clipboard image from your clipboard", "Info",
               cli_only=True),
    CommandDef("image", "Attach a local image file for your next prompt", "Info",
               cli_only=True, args_hint="<path>"),
@@ -1044,6 +1048,51 @@ class SlashCommandCompleter(Completer):
                display_meta=f"{fp}  {meta}" if meta else fp,
            )

+    @staticmethod
+    def _skin_completions(sub_text: str, sub_lower: str):
+        """Yield completions for /skin from available skins."""
+        try:
+            from hermes_cli.skin_engine import list_skins
+            for s in list_skins():
+                name = s["name"]
+                if name.startswith(sub_lower) and name != sub_lower:
+                    yield Completion(
+                        name,
+                        start_position=-len(sub_text),
+                        display=name,
+                        display_meta=s.get("description", "") or s.get("source", ""),
+                    )
+        except Exception:
+            pass
+
+    @staticmethod
+    def _personality_completions(sub_text: str, sub_lower: str):
+        """Yield completions for /personality from configured personalities."""
+        try:
+            from hermes_cli.config import load_config
+            personalities = load_config().get("agent", {}).get("personalities", {})
+            if "none".startswith(sub_lower) and "none" != sub_lower:
+                yield Completion(
+                    "none",
+                    start_position=-len(sub_text),
+                    display="none",
+                    display_meta="clear personality overlay",
+                )
+            for name, prompt in personalities.items():
+                if name.startswith(sub_lower) and name != sub_lower:
+                    if isinstance(prompt, dict):
+                        meta = prompt.get("description") or prompt.get("system_prompt", "")[:50]
+                    else:
+                        meta = str(prompt)[:50]
+                    yield Completion(
+                        name,
+                        start_position=-len(sub_text),
+                        display=name,
+                        display_meta=meta,
+                    )
+        except Exception:
+            pass
+
    def _model_completions(self, sub_text: str, sub_lower: str):
        """Yield completions for /model from config aliases + built-in aliases."""
        seen = set()
@@ -1098,10 +1147,17 @@ class SlashCommandCompleter(Completer):
            sub_text = parts[1] if len(parts) > 1 else ""
            sub_lower = sub_text.lower()

-            # Dynamic model alias completions for /model
-            if " " not in sub_text and base_cmd == "/model":
-                yield from self._model_completions(sub_text, sub_lower)
-                return
+            # Dynamic completions for commands with runtime lists
+            if " " not in sub_text:
+                if base_cmd == "/model":
+                    yield from self._model_completions(sub_text, sub_lower)
+                    return
+                if base_cmd == "/skin":
+                    yield from self._skin_completions(sub_text, sub_lower)
+                    return
+                if base_cmd == "/personality":
+                    yield from self._personality_completions(sub_text, sub_lower)
+                    return

            # Static subcommand completions
            if " " not in sub_text and base_cmd in SUBCOMMANDS and self._command_allowed(base_cmd):
@@ -44,7 +44,8 @@ _EXTRA_ENV_KEYS = frozenset({
    "WEIXIN_HOME_CHANNEL", "WEIXIN_HOME_CHANNEL_NAME", "WEIXIN_DM_POLICY", "WEIXIN_GROUP_POLICY",
    "WEIXIN_ALLOWED_USERS", "WEIXIN_GROUP_ALLOWED_USERS", "WEIXIN_ALLOW_ALL_USERS",
    "BLUEBUBBLES_SERVER_URL", "BLUEBUBBLES_PASSWORD",
-    "QQ_APP_ID", "QQ_CLIENT_SECRET", "QQ_HOME_CHANNEL", "QQ_HOME_CHANNEL_NAME",
+    "QQ_APP_ID", "QQ_CLIENT_SECRET", "QQBOT_HOME_CHANNEL", "QQBOT_HOME_CHANNEL_NAME",
+    "QQ_HOME_CHANNEL", "QQ_HOME_CHANNEL_NAME",  # legacy aliases (pre-rename, still read for back-compat)
    "QQ_ALLOWED_USERS", "QQ_GROUP_ALLOWED_USERS", "QQ_ALLOW_ALL_USERS", "QQ_MARKDOWN_SUPPORT",
    "QQ_STT_API_KEY", "QQ_STT_BASE_URL", "QQ_STT_MODEL",
    "TERMINAL_ENV", "TERMINAL_SSH_KEY", "TERMINAL_SSH_PORT",
@@ -417,6 +418,7 @@ DEFAULT_CONFIG = {
        "command_timeout": 30,  # Timeout for browser commands in seconds (screenshot, navigate, etc.)
        "record_sessions": False,  # Auto-record browser sessions as WebM videos
        "allow_private_urls": False,  # Allow navigating to private/internal IPs (localhost, 192.168.x.x, etc.)
+        "cdp_url": "",  # Optional persistent CDP endpoint for attaching to an existing Chromium/Chrome
        "camofox": {
            # When true, Hermes sends a stable profile-scoped userId to Camofox
            # so the server maps it to a persistent Firefox profile automatically.
@@ -861,6 +863,22 @@ OPTIONAL_ENV_VARS = {
        "category": "provider",
        "advanced": True,
    },
+    "NVIDIA_API_KEY": {
+        "description": "NVIDIA NIM API key (build.nvidia.com or local NIM endpoint)",
+        "prompt": "NVIDIA NIM API key",
+        "url": "https://build.nvidia.com/",
+        "password": True,
+        "category": "provider",
+        "advanced": True,
+    },
+    "NVIDIA_BASE_URL": {
+        "description": "NVIDIA NIM base URL override (e.g. http://localhost:8000/v1 for local NIM)",
+        "prompt": "NVIDIA NIM base URL (leave empty for default)",
+        "url": None,
+        "password": False,
+        "category": "provider",
+        "advanced": True,
+    },
    "GLM_API_KEY": {
        "description": "Z.AI / GLM API key (also recognized as ZAI_API_KEY / Z_AI_API_KEY)",
        "prompt": "Z.AI / GLM API key",
@@ -1518,12 +1536,12 @@ OPTIONAL_ENV_VARS = {
        "prompt": "Allow All QQ Users",
        "category": "messaging",
    },
-    "QQ_HOME_CHANNEL": {
+    "QQBOT_HOME_CHANNEL": {
        "description": "Default QQ channel/group for cron delivery and notifications",
        "prompt": "QQ Home Channel",
        "category": "messaging",
    },
-    "QQ_HOME_CHANNEL_NAME": {
+    "QQBOT_HOME_CHANNEL_NAME": {
        "description": "Display name for the QQ home channel",
        "prompt": "QQ Home Channel Name",
        "category": "messaging",
@@ -825,6 +825,7 @@ def run_doctor(args):
        ("Arcee AI",         ("ARCEEAI_API_KEY",),                            "https://api.arcee.ai/api/v1/models",  "ARCEE_BASE_URL", True),
        ("DeepSeek",         ("DEEPSEEK_API_KEY",),                           "https://api.deepseek.com/v1/models",  "DEEPSEEK_BASE_URL", True),
        ("Hugging Face",     ("HF_TOKEN",),                                   "https://router.huggingface.co/v1/models", "HF_BASE_URL", True),
+        ("NVIDIA NIM",       ("NVIDIA_API_KEY",),                             "https://integrate.api.nvidia.com/v1/models", "NVIDIA_BASE_URL", True),
        ("Alibaba/DashScope", ("DASHSCOPE_API_KEY",),                         "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/models", "DASHSCOPE_BASE_URL", True),
        # MiniMax: the /anthropic endpoint doesn't support /models, but the /v1 endpoint does.
        ("MiniMax",          ("MINIMAX_API_KEY",),                            "https://api.minimax.io/v1/models",    "MINIMAX_BASE_URL", True),
@@ -296,6 +296,7 @@ def run_dump(args):
        ("DEEPSEEK_API_KEY", "deepseek"),
        ("DASHSCOPE_API_KEY", "dashscope"),
        ("HF_TOKEN", "huggingface"),
+        ("NVIDIA_API_KEY", "nvidia"),
        ("AI_GATEWAY_API_KEY", "ai_gateway"),
        ("OPENCODE_ZEN_API_KEY", "opencode_zen"),
        ("OPENCODE_GO_API_KEY", "opencode_go"),
@@ -1998,7 +1998,7 @@ _PLATFORMS = [
            {"name": "QQ_ALLOWED_USERS", "prompt": "Allowed user OpenIDs (comma-separated, leave empty for open access)", "password": False,
             "is_allowlist": True,
             "help": "Optional — restrict DM access to specific user OpenIDs."},
-            {"name": "QQ_HOME_CHANNEL", "prompt": "Home channel (user/group OpenID for cron delivery, or empty)", "password": False,
+            {"name": "QQBOT_HOME_CHANNEL", "prompt": "Home channel (user/group OpenID for cron delivery, or empty)", "password": False,
             "help": "OpenID to deliver cron results and notifications to."},
        ],
    },
@@ -2625,6 +2625,215 @@ def _setup_feishu():
        print_info(f"  Bot: {bot_name}")


+def _setup_qqbot():
+    """Interactive setup for QQ Bot — scan-to-configure or manual credentials."""
+    print()
+    print(color("  ─── 🐧 QQ Bot Setup ───", Colors.CYAN))
+
+    existing_app_id = get_env_value("QQ_APP_ID")
+    existing_secret = get_env_value("QQ_CLIENT_SECRET")
+    if existing_app_id and existing_secret:
+        print()
+        print_success("QQ Bot is already configured.")
+        if not prompt_yes_no("  Reconfigure QQ Bot?", False):
+            return
+
+    # ── Choose setup method ──
+    print()
+    method_choices = [
+        "Scan QR code to add bot automatically (recommended)",
+        "Enter existing App ID and App Secret manually",
+    ]
+    method_idx = prompt_choice("  How would you like to set up QQ Bot?", method_choices, 0)
+
+    credentials = None
+    used_qr = False
+
+    if method_idx == 0:
+        # ── QR scan-to-configure ──
+        try:
+            credentials = _qqbot_qr_flow()
+        except KeyboardInterrupt:
+            print()
+            print_warning("  QQ Bot setup cancelled.")
+            return
+        if credentials:
+            used_qr = True
+        if not credentials:
+            print_info("  QR setup did not complete. Continuing with manual input.")
+
+    # ── Manual credential input ──
+    if not credentials:
+        print()
+        print_info("  Go to https://q.qq.com to register a QQ Bot application.")
+        print_info("  Note your App ID and App Secret from the application page.")
+        print()
+        app_id = prompt("  App ID", password=False)
+        if not app_id:
+            print_warning("  Skipped — QQ Bot won't work without an App ID.")
+            return
+        app_secret = prompt("  App Secret", password=True)
+        if not app_secret:
+            print_warning("  Skipped — QQ Bot won't work without an App Secret.")
+            return
+        credentials = {"app_id": app_id.strip(), "client_secret": app_secret.strip(), "user_openid": ""}
+
+    # ── Save core credentials ──
+    save_env_value("QQ_APP_ID", credentials["app_id"])
+    save_env_value("QQ_CLIENT_SECRET", credentials["client_secret"])
+
+    user_openid = credentials.get("user_openid", "")
+
+    # ── DM security policy ──
+    print()
+    access_choices = [
+        "Use DM pairing approval (recommended)",
+        "Allow all direct messages",
+        "Only allow listed user OpenIDs",
+    ]
+    access_idx = prompt_choice("  How should direct messages be authorized?", access_choices, 0)
+    if access_idx == 0:
+        save_env_value("QQ_ALLOW_ALL_USERS", "false")
+        if user_openid:
+            print()
+            if prompt_yes_no(f"  Add yourself ({user_openid}) to the allow list?", True):
+                save_env_value("QQ_ALLOWED_USERS", user_openid)
+                print_success(f"  Allow list set to {user_openid}")
+            else:
+                save_env_value("QQ_ALLOWED_USERS", "")
+        else:
+            save_env_value("QQ_ALLOWED_USERS", "")
+        print_success("  DM pairing enabled.")
+        print_info("  Unknown users can request access; approve with `hermes pairing approve`.")
+    elif access_idx == 1:
+        save_env_value("QQ_ALLOW_ALL_USERS", "true")
+        save_env_value("QQ_ALLOWED_USERS", "")
+        print_warning("  Open DM access enabled for QQ Bot.")
+    else:
+        default_allow = user_openid or ""
+        allowlist = prompt("  Allowed user OpenIDs (comma-separated)", default_allow, password=False).replace(" ", "")
+        save_env_value("QQ_ALLOW_ALL_USERS", "false")
+        save_env_value("QQ_ALLOWED_USERS", allowlist)
+        print_success("  Allowlist saved.")
+
+    # ── Home channel ──
+    if user_openid:
+        print()
+        if prompt_yes_no(f"  Use your QQ user ID ({user_openid}) as the home channel?", True):
+            save_env_value("QQBOT_HOME_CHANNEL", user_openid)
+            print_success(f"  Home channel set to {user_openid}")
+    else:
+        print()
+        home_channel = prompt("  Home channel OpenID (for cron/notifications, or empty)", password=False)
+        if home_channel:
+            save_env_value("QQBOT_HOME_CHANNEL", home_channel.strip())
+            print_success(f"  Home channel set to {home_channel.strip()}")
+
+    print()
+    print_success("🐧 QQ Bot configured!")
+    print_info(f"  App ID: {credentials['app_id']}")
+
+
+def _qqbot_render_qr(url: str) -> bool:
+    """Try to render a QR code in the terminal. Returns True if successful."""
+    try:
+        import qrcode as _qr
+        qr = _qr.QRCode(border=1,error_correction=_qr.constants.ERROR_CORRECT_L)
+        qr.add_data(url)
+        qr.make(fit=True)
+        qr.print_ascii(invert=True)
+        return True
+    except Exception:
+        return False
+
+
+def _qqbot_qr_flow():
+    """Run the QR-code scan-to-configure flow.
+
+    Returns a dict with app_id, client_secret, user_openid on success,
+    or None on failure/cancel.
+    """
+    try:
+        from gateway.platforms.qqbot import (
+            create_bind_task, poll_bind_result, build_connect_url,
+            decrypt_secret, BindStatus,
+        )
+        from gateway.platforms.qqbot.constants import ONBOARD_POLL_INTERVAL
+    except Exception as exc:
+        print_error(f"  QQBot onboard import failed: {exc}")
+        return None
+
+    import asyncio
+    import time
+
+    MAX_REFRESHES = 3
+    refresh_count = 0
+
+    while refresh_count <= MAX_REFRESHES:
+        loop = asyncio.new_event_loop()
+
+        # ── Create bind task ──
+        try:
+            task_id, aes_key = loop.run_until_complete(create_bind_task())
+        except Exception as e:
+            print_warning(f"  Failed to create bind task: {e}")
+            loop.close()
+            return None
+
+        url = build_connect_url(task_id)
+
+        # ── Display QR code + URL ──
+        print()
+        if _qqbot_render_qr(url):
+            print(f"  Scan the QR code above, or open this URL directly:\n  {url}")
+        else:
+            print(f"  Open this URL in QQ on your phone:\n  {url}")
+            print_info("  Tip: pip install qrcode  to show a scannable QR code here")
+
+        # ── Poll loop (silent — keep QR visible at bottom) ──
+        try:
+            while True:
+                try:
+                    status, app_id, encrypted_secret, user_openid = loop.run_until_complete(
+                        poll_bind_result(task_id)
+                    )
+                except Exception:
+                    time.sleep(ONBOARD_POLL_INTERVAL)
+                    continue
+
+                if status == BindStatus.COMPLETED:
+                    client_secret = decrypt_secret(encrypted_secret, aes_key)
+                    print()
+                    print_success(f"  QR scan complete! (App ID: {app_id})")
+                    if user_openid:
+                        print_info(f"  Scanner's OpenID: {user_openid}")
+                    return {
+                        "app_id": app_id,
+                        "client_secret": client_secret,
+                        "user_openid": user_openid,
+                    }
+
+                if status == BindStatus.EXPIRED:
+                    refresh_count += 1
+                    if refresh_count > MAX_REFRESHES:
+                        print()
+                        print_warning(f"  QR code expired {MAX_REFRESHES} times — giving up.")
+                        return None
+                    print()
+                    print_warning(f"  QR code expired, refreshing... ({refresh_count}/{MAX_REFRESHES})")
+                    loop.close()
+                    break  # outer while creates a new task
+
+                time.sleep(ONBOARD_POLL_INTERVAL)
+        except KeyboardInterrupt:
+            loop.close()
+            raise
+        finally:
+            loop.close()
+
+    return None
+
+
 def _setup_signal():
    """Interactive setup for Signal messenger."""
    import shutil
@@ -2806,6 +3015,8 @@ def gateway_setup():
            _setup_dingtalk()
        elif platform["key"] == "feishu":
            _setup_feishu()
+        elif platform["key"] == "qqbot":
+            _setup_qqbot()
        else:
            _setup_standard_platform(platform)

@@ -692,12 +692,12 @@ def switch_model(
            api_key=api_key,
            base_url=base_url,
        )
-    except Exception:
+    except Exception as e:
        validation = {
-            "accepted": True,
-            "persist": True,
+            "accepted": False,
+            "persist": False,
            "recognized": False,
-            "message": None,
+            "message": f"Could not validate `{new_model}`: {e}",
        }

    if not validation.get("accepted"):
@@ -26,7 +26,8 @@ COPILOT_REASONING_EFFORTS_O_SERIES = ["low", "medium", "high"]
 # Fallback OpenRouter snapshot used when the live catalog is unavailable.
 # (model_id, display description shown in menus)
 OPENROUTER_MODELS: list[tuple[str, str]] = [
-    ("anthropic/claude-opus-4.7",       "recommended"),
+    ("moonshotai/kimi-k2.5",            "recommended"),
+    ("anthropic/claude-opus-4.7",       ""),
    ("anthropic/claude-opus-4.6",       ""),
    ("anthropic/claude-sonnet-4.6",     ""),
    ("qwen/qwen3.6-plus",               ""),
@@ -49,7 +50,6 @@ OPENROUTER_MODELS: list[tuple[str, str]] = [
    ("z-ai/glm-5.1",                    ""),
    ("z-ai/glm-5v-turbo",               ""),
    ("z-ai/glm-5-turbo",                ""),
-    ("moonshotai/kimi-k2.5",            ""),
    ("x-ai/grok-4.20",                  ""),
    ("nvidia/nemotron-3-super-120b-a12b",      ""),
    ("nvidia/nemotron-3-super-120b-a12b:free", "free"),
@@ -75,6 +75,7 @@ def _codex_curated_models() -> list[str]:

 _PROVIDER_MODELS: dict[str, list[str]] = {
    "nous": [
+        "moonshotai/kimi-k2.5",
        "xiaomi/mimo-v2-pro",
        "anthropic/claude-opus-4.7",
        "anthropic/claude-opus-4.6",
@@ -96,7 +97,6 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
        "z-ai/glm-5.1",
        "z-ai/glm-5v-turbo",
        "z-ai/glm-5-turbo",
-        "moonshotai/kimi-k2.5",
        "x-ai/grok-4.20-beta",
        "nvidia/nemotron-3-super-120b-a12b",
        "nvidia/nemotron-3-super-120b-a12b:free",
@@ -135,7 +135,6 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
        "gemini-2.5-flash-lite",
        # Gemma open models (also served via AI Studio)
        "gemma-4-31b-it",
-        "gemma-4-26b-it",
    ],
    "google-gemini-cli": [
        "gemini-2.5-pro",
@@ -155,9 +154,23 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
        "grok-4.20-reasoning",
        "grok-4-1-fast-reasoning",
    ],
+    "nvidia": [
+        # NVIDIA flagship reasoning models
+        "nvidia/nemotron-3-super-120b-a12b",
+        "nvidia/nemotron-3-nano-30b-a3b",
+        "nvidia/llama-3.3-nemotron-super-49b-v1.5",
+        # Third-party agentic models hosted on build.nvidia.com
+        # (map to OpenRouter defaults — users get familiar picks on NIM)
+        "qwen/qwen3.5-397b-a17b",
+        "deepseek-ai/deepseek-v3.2",
+        "moonshotai/kimi-k2.5",
+        "minimaxai/minimax-m2.5",
+        "z-ai/glm5",
+        "openai/gpt-oss-120b",
+    ],
    "kimi-coding": [
-        "kimi-for-coding",
        "kimi-k2.5",
+        "kimi-for-coding",
        "kimi-k2-thinking",
        "kimi-k2-thinking-turbo",
        "kimi-k2-turbo-preview",
@@ -212,6 +225,7 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
        "trinity-mini",
    ],
    "opencode-zen": [
+        "kimi-k2.5",
        "gpt-5.4-pro",
        "gpt-5.4",
        "gpt-5.3-codex",
@@ -243,16 +257,15 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
        "glm-5",
        "glm-4.7",
        "glm-4.6",
-        "kimi-k2.5",
        "kimi-k2-thinking",
        "kimi-k2",
        "qwen3-coder",
        "big-pickle",
    ],
    "opencode-go": [
+        "kimi-k2.5",
        "glm-5.1",
        "glm-5",
-        "kimi-k2.5",
        "mimo-v2-pro",
        "mimo-v2-omni",
        "minimax-m2.7",
@@ -285,21 +298,21 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
    # to https://dashscope-intl.aliyuncs.com/compatible-mode/v1 (OpenAI-compat)
    # or https://dashscope-intl.aliyuncs.com/apps/anthropic (Anthropic-compat).
    "alibaba": [
+        "kimi-k2.5",
        "qwen3.5-plus",
        "qwen3-coder-plus",
        "qwen3-coder-next",
        # Third-party models available on coding-intl
        "glm-5",
        "glm-4.7",
-        "kimi-k2.5",
        "MiniMax-M2.5",
    ],
    # Curated HF model list — only agentic models that map to OpenRouter defaults.
    "huggingface": [
+        "moonshotai/Kimi-K2.5",
        "Qwen/Qwen3.5-397B-A17B",
        "Qwen/Qwen3.5-35B-A3B",
        "deepseek-ai/DeepSeek-V3.2",
-        "moonshotai/Kimi-K2.5",
        "MiniMaxAI/MiniMax-M2.5",
        "zai-org/GLM-5",
        "XiaomiMiMo/MiMo-V2-Flash",
@@ -536,6 +549,7 @@ CANONICAL_PROVIDERS: list[ProviderEntry] = [
    ProviderEntry("anthropic",      "Anthropic",                "Anthropic (Claude models — API key or Claude Code)"),
    ProviderEntry("openai-codex",   "OpenAI Codex",             "OpenAI Codex"),
    ProviderEntry("xiaomi",         "Xiaomi MiMo",              "Xiaomi MiMo (MiMo-V2 models — pro, omni, flash)"),
+    ProviderEntry("nvidia",         "NVIDIA NIM",               "NVIDIA NIM (Nemotron models — build.nvidia.com or local NIM)"),
    ProviderEntry("qwen-oauth",     "Qwen OAuth (Portal)",      "Qwen OAuth (reuses local Qwen CLI login)"),
    ProviderEntry("copilot",        "GitHub Copilot",           "GitHub Copilot (uses GITHUB_TOKEN or gh auth token)"),
    ProviderEntry("copilot-acp",    "GitHub Copilot ACP",       "GitHub Copilot ACP (spawns `copilot --acp --stdio`)"),
@@ -618,6 +632,10 @@ _PROVIDER_ALIASES = {
    "grok": "xai",
    "x-ai": "xai",
    "x.ai": "xai",
+    "nim": "nvidia",
+    "nvidia-nim": "nvidia",
+    "build-nvidia": "nvidia",
+    "nemotron": "nvidia",
    "ollama": "custom",  # bare "ollama" = local; use "ollama-cloud" for cloud
    "ollama_cloud": "ollama-cloud",
 }
@@ -2032,8 +2050,8 @@ def validate_requested_model(
                )

            return {
-                "accepted": True,
-                "persist": True,
+                "accepted": False,
+                "persist": False,
                "recognized": False,
                "message": message,
            }
@@ -2046,8 +2064,8 @@ def validate_requested_model(
            message += f"\n  If this server expects `/v1`, try base URL: `{probe.get('suggested_base_url')}`"

        return {
-            "accepted": True,
-            "persist": True,
+            "accepted": False,
+            "persist": False,
            "recognized": False,
            "message": message,
        }
@@ -2081,12 +2099,11 @@ def validate_requested_model(
            if suggestions:
                suggestion_text = "\n  Similar models: " + ", ".join(f"`{s}`" for s in suggestions)
            return {
-                "accepted": True,
-                "persist": True,
+                "accepted": False,
+                "persist": False,
                "recognized": False,
                "message": (
-                    f"Note: `{requested}` was not found in the OpenAI Codex model listing. "
-                    f"It may still work if your account has access to it."
+                    f"Model `{requested}` was not found in the OpenAI Codex model listing."
                    f"{suggestion_text}"
                ),
            }
@@ -2125,16 +2142,15 @@ def validate_requested_model(
            if suggestions:
                suggestion_text = "\n  Similar models: " + ", ".join(f"`{s}`" for s in suggestions)

-            return {
-                "accepted": True,
-                "persist": True,
-                "recognized": False,
-                "message": (
-                    f"Note: `{requested}` was not found in this provider's model listing. "
-                    f"It may still work if your plan supports it."
-                    f"{suggestion_text}"
-                ),
-            }
+        return {
+            "accepted": False,
+            "persist": False,
+            "recognized": False,
+            "message": (
+                f"Model `{requested}` was not found in this provider's model listing."
+                f"{suggestion_text}"
+            ),
+        }

    # api_models is None — couldn't reach API.  Accept and persist,
    # but warn so typos don't silently break things.
@@ -2176,8 +2192,8 @@ def validate_requested_model(

    provider_label = _PROVIDER_LABELS.get(normalized, normalized)
    return {
-        "accepted": True,
-        "persist": True,
+        "accepted": False,
+        "persist": False,
        "recognized": False,
        "message": (
            f"Could not reach the {provider_label} API to validate `{requested}`. "
@@ -137,6 +137,11 @@ HERMES_OVERLAYS: Dict[str, HermesOverlay] = {
        base_url_override="https://api.x.ai/v1",
        base_url_env_var="XAI_BASE_URL",
    ),
+    "nvidia": HermesOverlay(
+        transport="openai_chat",
+        base_url_override="https://integrate.api.nvidia.com/v1",
+        base_url_env_var="NVIDIA_BASE_URL",
+    ),
    "xiaomi": HermesOverlay(
        transport="openai_chat",
        base_url_env_var="XIAOMI_BASE_URL",
@@ -191,6 +196,12 @@ ALIASES: Dict[str, str] = {
    "x.ai": "xai",
    "grok": "xai",

+    # nvidia
+    "nim": "nvidia",
+    "nvidia-nim": "nvidia",
+    "build-nvidia": "nvidia",
+    "nemotron": "nvidia",
+
    # kimi-for-coding (models.dev ID)
    "kimi": "kimi-for-coding",
    "kimi-coding": "kimi-for-coding",
@@ -91,7 +91,7 @@ _DEFAULT_PROVIDER_MODELS = {
    "gemini": [
        "gemini-3.1-pro-preview", "gemini-3-flash-preview", "gemini-3.1-flash-lite-preview",
        "gemini-2.5-pro", "gemini-2.5-flash", "gemini-2.5-flash-lite",
-        "gemma-4-31b-it", "gemma-4-26b-it",
+        "gemma-4-31b-it",
    ],
    "zai": ["glm-5.1", "glm-5", "glm-4.7", "glm-4.5", "glm-4.5-flash"],
    "kimi-coding": ["kimi-k2.5", "kimi-k2-thinking", "kimi-k2-turbo-preview"],
@@ -2005,52 +2005,6 @@ def _setup_wecom_callback():
    _gw_setup()


-def _setup_qqbot():
-    """Configure QQ Bot gateway."""
-    print_header("QQ Bot")
-    existing = get_env_value("QQ_APP_ID")
-    if existing:
-        print_info("QQ Bot: already configured")
-        if not prompt_yes_no("Reconfigure QQ Bot?", False):
-            return
-
-    print_info("Connects Hermes to QQ via the Official QQ Bot API (v2).")
-    print_info("   Requires a QQ Bot application at q.qq.com")
-    print_info("   Reference: https://bot.q.qq.com/wiki/develop/api-v2/")
-    print()
-
-    app_id = prompt("QQ Bot App ID")
-    if not app_id:
-        print_warning("App ID is required — skipping QQ Bot setup")
-        return
-    save_env_value("QQ_APP_ID", app_id.strip())
-
-    client_secret = prompt("QQ Bot App Secret", password=True)
-    if not client_secret:
-        print_warning("App Secret is required — skipping QQ Bot setup")
-        return
-    save_env_value("QQ_CLIENT_SECRET", client_secret)
-    print_success("QQ Bot credentials saved")
-
-    print()
-    print_info("🔒 Security: Restrict who can DM your bot")
-    print_info("   Use QQ user OpenIDs (found in event payloads)")
-    print()
-    allowed_users = prompt("Allowed user OpenIDs (comma-separated, leave empty for open access)")
-    if allowed_users:
-        save_env_value("QQ_ALLOWED_USERS", allowed_users.replace(" ", ""))
-        print_success("QQ Bot allowlist configured")
-    else:
-        print_info("⚠️  No allowlist set — anyone can DM the bot!")
-
-    print()
-    print_info("📬 Home Channel: OpenID for cron job delivery and notifications.")
-    home_channel = prompt("Home channel OpenID (leave empty to set later)")
-    if home_channel:
-        save_env_value("QQ_HOME_CHANNEL", home_channel)
-
-    print()
-    print_success("QQ Bot configured!")


 def _setup_bluebubbles():
@@ -2119,12 +2073,9 @@ def _setup_bluebubbles():


 def _setup_qqbot():
-    """Configure QQ Bot (Official API v2) via standard platform setup."""
-    from hermes_cli.gateway import _PLATFORMS
-    qq_platform = next((p for p in _PLATFORMS if p["key"] == "qqbot"), None)
-    if qq_platform:
-        from hermes_cli.gateway import _setup_standard_platform
-        _setup_standard_platform(qq_platform)
+    """Configure QQ Bot (Official API v2) via gateway setup."""
+    from hermes_cli.gateway import _setup_qqbot as _gateway_setup_qqbot
+    _gateway_setup_qqbot()


 def _setup_webhooks():
@@ -2264,7 +2215,9 @@ def setup_gateway(config: dict):
            missing_home.append("Slack")
        if get_env_value("BLUEBUBBLES_SERVER_URL") and not get_env_value("BLUEBUBBLES_HOME_CHANNEL"):
            missing_home.append("BlueBubbles")
-        if get_env_value("QQ_APP_ID") and not get_env_value("QQ_HOME_CHANNEL"):
+        if get_env_value("QQ_APP_ID") and not (
+            get_env_value("QQBOT_HOME_CHANNEL") or get_env_value("QQ_HOME_CHANNEL")
+        ):
            missing_home.append("QQBot")

        if missing_home:
@@ -515,6 +515,90 @@ def do_inspect(identifier: str, console: Optional[Console] = None) -> None:
    c.print()


+def browse_skills(page: int = 1, page_size: int = 20, source: str = "all") -> dict:
+    """Paginated hub browse for programmatic callers (e.g. TUI gateway).
+
+    Returns ``{"items": [...], "page": int, "total_pages": int, "total": int}``.
+    """
+    from tools.skills_hub import GitHubAuth, create_source_router
+
+    page_size = max(1, min(page_size, 100))
+    _TRUST_RANK = {"builtin": 3, "trusted": 2, "community": 1}
+    _PER_SOURCE_LIMIT = {"official": 100, "skills-sh": 100, "well-known": 25, "github": 100, "clawhub": 50,
+                         "claude-marketplace": 50, "lobehub": 50}
+    auth = GitHubAuth()
+    sources = create_source_router(auth)
+    all_results: list = []
+    for src in sources:
+        sid = src.source_id()
+        if source != "all" and sid != source and sid != "official":
+            continue
+        try:
+            limit = _PER_SOURCE_LIMIT.get(sid, 50)
+            all_results.extend(src.search("", limit=limit))
+        except Exception:
+            continue
+    if not all_results:
+        return {"items": [], "page": 1, "total_pages": 1, "total": 0}
+    seen: dict = {}
+    for r in all_results:
+        rank = _TRUST_RANK.get(r.trust_level, 0)
+        if r.name not in seen or rank > _TRUST_RANK.get(seen[r.name].trust_level, 0):
+            seen[r.name] = r
+    deduped = list(seen.values())
+    deduped.sort(key=lambda r: (-_TRUST_RANK.get(r.trust_level, 0), r.source != "official", r.name.lower()))
+    total = len(deduped)
+    total_pages = max(1, (total + page_size - 1) // page_size)
+    page = max(1, min(page, total_pages))
+    start = (page - 1) * page_size
+    page_items = deduped[start : min(start + page_size, total)]
+    return {
+        "items": [{"name": r.name, "description": r.description, "source": r.source,
+                    "trust": r.trust_level} for r in page_items],
+        "page": page,
+        "total_pages": total_pages,
+        "total": total,
+    }
+
+
+def inspect_skill(identifier: str) -> Optional[dict]:
+    """Skill metadata (+ SKILL.md preview) for programmatic callers."""
+    from tools.skills_hub import GitHubAuth, create_source_router
+
+    class _Q:
+        def print(self, *a, **k):
+            pass
+
+    c = _Q()
+    auth = GitHubAuth()
+    sources = create_source_router(auth)
+    ident = identifier
+    if "/" not in ident:
+        ident = _resolve_short_name(ident, sources, c)
+        if not ident:
+            return None
+    meta, bundle, _ = _resolve_source_meta_and_bundle(ident, sources)
+    if not meta:
+        return None
+    out: dict = {
+        "name": meta.name,
+        "description": meta.description,
+        "source": meta.source,
+        "identifier": meta.identifier,
+        "tags": list(meta.tags) if meta.tags else [],
+    }
+    if bundle and "SKILL.md" in bundle.files:
+        content = bundle.files["SKILL.md"]
+        if isinstance(content, bytes):
+            content = content.decode("utf-8", errors="replace")
+        lines = content.split("\n")
+        preview = "\n".join(lines[:50])
+        if len(lines) > 50:
+            preview += f"\n\n... ({len(lines) - 50} more lines)"
+        out["skill_md_preview"] = preview
+    return out
+
+
 def do_list(source_filter: str = "all", console: Optional[Console] = None) -> None:
    """List installed skills, distinguishing hub, builtin, and local skills."""
    from tools.skills_hub import HubLockFile, ensure_hub_dirs
@@ -23,7 +23,7 @@ All fields are optional. Missing values inherit from the ``default`` skin.
      banner_dim: "#B8860B"               # Dim/muted text (separators, labels)
      banner_text: "#FFF8DC"              # Body text (tool names, skill names)
      ui_accent: "#FFBF00"               # General UI accent
-      ui_label: "#4dd0e1"                # UI labels
+      ui_label: "#DAA520"                # UI labels (warm gold; teal clashed w/ default banner gold)
      ui_ok: "#4caf50"                   # Success indicators
      ui_error: "#ef5350"                # Error indicators
      ui_warn: "#ffa726"                 # Warning indicators
@@ -163,7 +163,7 @@ _BUILTIN_SKINS: Dict[str, Dict[str, Any]] = {
            "banner_dim": "#B8860B",
            "banner_text": "#FFF8DC",
            "ui_accent": "#FFBF00",
-            "ui_label": "#4dd0e1",
+            "ui_label": "#DAA520",
            "ui_ok": "#4caf50",
            "ui_error": "#ef5350",
            "ui_warn": "#ffa726",
@@ -317,7 +317,7 @@ def show_status(args):
        "WeCom Callback": ("WECOM_CALLBACK_CORP_ID", None),
        "Weixin": ("WEIXIN_ACCOUNT_ID", "WEIXIN_HOME_CHANNEL"),
        "BlueBubbles": ("BLUEBUBBLES_SERVER_URL", "BLUEBUBBLES_HOME_CHANNEL"),
-        "QQBot": ("QQ_APP_ID", "QQ_HOME_CHANNEL"),
+        "QQBot": ("QQ_APP_ID", "QQBOT_HOME_CHANNEL"),
    }
    
    for name, (token_var, home_var) in platforms.items():
@@ -327,6 +327,9 @@ def show_status(args):
        home_channel = ""
        if home_var:
            home_channel = os.getenv(home_var, "")
+        # Back-compat: QQBot home channel was renamed from QQ_HOME_CHANNEL to QQBOT_HOME_CHANNEL
+        if not home_channel and home_var == "QQBOT_HOME_CHANNEL":
+            home_channel = os.getenv("QQ_HOME_CHANNEL", "")
        
        status = "configured" if has_token else "not configured"
        if home_channel:
@@ -14,7 +14,8 @@ def get_hermes_home() -> Path:
    Reads HERMES_HOME env var, falls back to ~/.hermes.
    This is the single source of truth — all other copies should import this.
    """
-    return Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
+    val = os.environ.get("HERMES_HOME", "").strip()
+    return Path(val) if val else Path.home() / ".hermes"


 def get_default_hermes_root() -> Path:
@@ -103,6 +103,28 @@ json.dump(sorted(leaf_paths(DEFAULT_CONFIG)), sys.stdout, indent=2)
          echo "ok" > $out/result
        '';

+        # Verify bundled TUI is present and compiled
+        bundled-tui = pkgs.runCommand "hermes-bundled-tui" { } ''
+          set -e
+          echo "=== Checking bundled TUI ==="
+          test -d ${hermes-agent}/ui-tui || (echo "FAIL: ui-tui directory missing"; exit 1)
+          echo "PASS: ui-tui directory exists"
+
+          test -f ${hermes-agent}/ui-tui/dist/entry.js || (echo "FAIL: compiled entry.js missing"; exit 1)
+          echo "PASS: compiled entry.js present"
+
+          test -d ${hermes-agent}/ui-tui/node_modules || (echo "FAIL: node_modules missing"; exit 1)
+          echo "PASS: node_modules present"
+
+          grep -q "HERMES_TUI_DIR" ${hermes-agent}/bin/hermes || \
+            (echo "FAIL: HERMES_TUI_DIR not in wrapper"; exit 1)
+          echo "PASS: HERMES_TUI_DIR set in wrapper"
+
+          echo "=== All bundled TUI checks passed ==="
+          mkdir -p $out
+          echo "ok" > $out/result
+        '';
+
        # Verify HERMES_MANAGED guard works on all mutation commands
        managed-guard = pkgs.runCommand "hermes-managed-guard" { } ''
          set -e
@@ -1,49 +1,26 @@
-# nix/devShell.nix — Fast dev shell with stamp-file optimization
+# nix/devShell.nix — Dev shell that delegates setup to each package
+#
+# Each package in inputsFrom exposes passthru.devShellHook — a bash snippet
+# with stamp-checked setup logic. This file collects and runs them all.
 { inputs, ... }: {
-  perSystem = { pkgs, ... }:
+  perSystem = { pkgs, system, ... }:
    let
-      python = pkgs.python311;
+      hermes-agent = inputs.self.packages.${system}.default;
+      hermes-tui = inputs.self.packages.${system}.tui;
+      packages = [ hermes-agent hermes-tui ];
    in {
      devShells.default = pkgs.mkShell {
+        inputsFrom = packages;
        packages = with pkgs; [
-          python uv nodejs_20 ripgrep git openssh ffmpeg
+          python311 uv nodejs_22 ripgrep git openssh ffmpeg
        ];

-        shellHook = ''
+        shellHook = let
+          hooks = map (p: p.passthru.devShellHook or "") packages;
+          combined = pkgs.lib.concatStringsSep "\n" (builtins.filter (h: h != "") hooks);
+        in ''
          echo "Hermes Agent dev shell"
-
-          # Composite stamp: changes when nix python or uv change
-          STAMP_VALUE="${python}:${pkgs.uv}"
-          STAMP_FILE=".venv/.nix-stamp"
-
-          # Create venv if missing
-          if [ ! -d .venv ]; then
-            echo "Creating Python 3.11 venv..."
-            uv venv .venv --python ${python}/bin/python3
-          fi
-
-          source .venv/bin/activate
-
-          # Only install if stamp is stale or missing
-          if [ ! -f "$STAMP_FILE" ] || [ "$(cat "$STAMP_FILE")" != "$STAMP_VALUE" ]; then
-            echo "Installing Python dependencies..."
-            uv pip install -e ".[all]"
-            if [ -d mini-swe-agent ]; then
-              uv pip install -e ./mini-swe-agent 2>/dev/null || true
-            fi
-            if [ -d tinker-atropos ]; then
-              uv pip install -e ./tinker-atropos 2>/dev/null || true
-            fi
-
-            # Install npm deps
-            if [ -f package.json ] && [ ! -d node_modules ]; then
-              echo "Installing npm dependencies..."
-              npm install
-            fi
-
-            echo "$STAMP_VALUE" > "$STAMP_FILE"
-          fi
-
+          ${combined}
          echo "Ready. Run 'hermes' to start."
        '';
      };
@@ -1,54 +1,108 @@
 # nix/packages.nix — Hermes Agent package built with uv2nix
-{ inputs, ... }: {
-  perSystem = { pkgs, system, ... }:
+{ inputs, ... }:
+{
+  perSystem =
+    { pkgs, inputs', ... }:
    let
      hermesVenv = pkgs.callPackage ./python.nix {
        inherit (inputs) uv2nix pyproject-nix pyproject-build-systems;
      };

+      hermesTui = pkgs.callPackage ./tui.nix {
+        npm-lockfile-fix = inputs'.npm-lockfile-fix.packages.default;
+      };
+
      # Import bundled skills, excluding runtime caches
      bundledSkills = pkgs.lib.cleanSourceWith {
        src = ../skills;
-        filter = path: _type:
-          !(pkgs.lib.hasInfix "/index-cache/" path);
+        filter = path: _type: !(pkgs.lib.hasInfix "/index-cache/" path);
      };

      runtimeDeps = with pkgs; [
-        nodejs_20 ripgrep git openssh ffmpeg tirith
+        nodejs_22
+        ripgrep
+        git
+        openssh
+        ffmpeg
+        tirith
      ];

      runtimePath = pkgs.lib.makeBinPath runtimeDeps;
-    in {
-      packages.default = pkgs.stdenv.mkDerivation {
-        pname = "hermes-agent";
-        version = (builtins.fromTOML (builtins.readFile ../pyproject.toml)).project.version;

-        dontUnpack = true;
-        dontBuild = true;
-        nativeBuildInputs = [ pkgs.makeWrapper ];
+      # Lockfile hashes for dev shell stamps
+      pyprojectHash = builtins.hashString "sha256" (builtins.readFile ../pyproject.toml);
+      uvLockHash =
+        if builtins.pathExists ../uv.lock then
+          builtins.hashString "sha256" (builtins.readFile ../uv.lock)
+        else
+          "none";
+    in
+    {
+      packages = {
+        default = pkgs.stdenv.mkDerivation {
+          pname = "hermes-agent";
+          version = (fromTOML (builtins.readFile ../pyproject.toml)).project.version;

-        installPhase = ''
-          runHook preInstall
+          dontUnpack = true;
+          dontBuild = true;
+          nativeBuildInputs = [ pkgs.makeWrapper ];

-          mkdir -p $out/share/hermes-agent $out/bin
-          cp -r ${bundledSkills} $out/share/hermes-agent/skills
+          installPhase = ''
+            runHook preInstall

-          ${pkgs.lib.concatMapStringsSep "\n" (name: ''
-            makeWrapper ${hermesVenv}/bin/${name} $out/bin/${name} \
-              --suffix PATH : "${runtimePath}" \
-              --set HERMES_BUNDLED_SKILLS $out/share/hermes-agent/skills
-          '') [ "hermes" "hermes-agent" "hermes-acp" ]}
+            mkdir -p $out/share/hermes-agent $out/bin
+            cp -r ${bundledSkills} $out/share/hermes-agent/skills

-          runHook postInstall
-        '';
+            # copy pre-built TUI (same layout as dev: ui-tui/dist/ + node_modules/)
+            mkdir -p $out/ui-tui
+            cp -r ${hermesTui}/lib/hermes-tui/* $out/ui-tui/

-        meta = with pkgs.lib; {
-          description = "AI agent with advanced tool-calling capabilities";
-          homepage = "https://github.com/NousResearch/hermes-agent";
-          mainProgram = "hermes";
-          license = licenses.mit;
-          platforms = platforms.unix;
+            ${pkgs.lib.concatMapStringsSep "\n"
+              (name: ''
+                makeWrapper ${hermesVenv}/bin/${name} $out/bin/${name} \
+                  --suffix PATH : "${runtimePath}" \
+                  --set HERMES_BUNDLED_SKILLS $out/share/hermes-agent/skills \
+                  --set HERMES_TUI_DIR $out/ui-tui \
+                  --set HERMES_PYTHON ${hermesVenv}/bin/python3
+              '')
+              [
+                "hermes"
+                "hermes-agent"
+                "hermes-acp"
+              ]
+            }
+
+            runHook postInstall
+          '';
+
+          passthru.devShellHook = ''
+            STAMP=".nix-stamps/hermes-agent"
+            STAMP_VALUE="${pyprojectHash}:${uvLockHash}"
+            if [ ! -f "$STAMP" ] || [ "$(cat "$STAMP")" != "$STAMP_VALUE" ]; then
+              echo "hermes-agent: installing Python dependencies..."
+              uv venv .venv --python ${pkgs.python311}/bin/python3 2>/dev/null || true
+              source .venv/bin/activate
+              uv pip install -e ".[all]"
+              [ -d mini-swe-agent ] && uv pip install -e ./mini-swe-agent 2>/dev/null || true
+              [ -d tinker-atropos ] && uv pip install -e ./tinker-atropos 2>/dev/null || true
+              mkdir -p .nix-stamps
+              echo "$STAMP_VALUE" > "$STAMP"
+            else
+              source .venv/bin/activate
+              export HERMES_PYTHON=${hermesVenv}/bin/python3
+            fi
+          '';
+
+          meta = with pkgs.lib; {
+            description = "AI agent with advanced tool-calling capabilities";
+            homepage = "https://github.com/NousResearch/hermes-agent";
+            mainProgram = "hermes";
+            license = licenses.mit;
+            platforms = platforms.unix;
+          };
        };
+
+        tui = hermesTui;
      };
    };
 }
@@ -0,0 +1,82 @@
+# nix/tui.nix — Hermes TUI (Ink/React) compiled with tsc and bundled
+{ pkgs, npm-lockfile-fix, ... }:
+let
+  src = ../ui-tui;
+  npmDeps = pkgs.fetchNpmDeps {
+    inherit src;
+    hash = "sha256-zsUPmbC6oMUO10EhS3ptvDjwlfpCSEmrkjyeORw7fac=";
+  };
+
+  packageJson = builtins.fromJSON (builtins.readFile (src + "/package.json"));
+  version = packageJson.version;
+
+  npmLockHash = builtins.hashString "sha256" (builtins.readFile ../ui-tui/package-lock.json);
+in
+pkgs.buildNpmPackage {
+  pname = "hermes-tui";
+  inherit src npmDeps version;
+
+  doCheck = false;
+
+  postPatch = ''
+    # fetchNpmDeps strips the trailing newline; match it so the diff passes
+    sed -i -z 's/\n$//' package-lock.json
+  '';
+
+  installPhase = ''
+    runHook preInstall
+
+    mkdir -p $out/lib/hermes-tui
+
+    cp -r dist $out/lib/hermes-tui/dist
+
+    # runtime node_modules
+    cp -r node_modules $out/lib/hermes-tui/node_modules
+
+    # @hermes/ink is a file: dependency, we need to copy it in fr
+    rm -f $out/lib/hermes-tui/node_modules/@hermes/ink
+    cp -r packages/hermes-ink $out/lib/hermes-tui/node_modules/@hermes/ink
+
+    # package.json needed for "type": "module" resolution
+    cp package.json $out/lib/hermes-tui/
+
+    runHook postInstall
+  '';
+
+  nativeBuildInputs = [
+    (pkgs.writeShellScriptBin "update_tui_lockfile" ''
+      set -euox pipefail
+
+      # get root of repo
+      REPO_ROOT=$(git rev-parse --show-toplevel)
+
+      # cd into ui-tui and reinstall
+      cd "$REPO_ROOT/ui-tui"
+      rm -rf node_modules/
+      npm cache clean --force
+      CI=true npm install # ci env var to suppress annoying unicode install banner lag
+      ${pkgs.lib.getExe npm-lockfile-fix} ./package-lock.json
+
+      NIX_FILE="$REPO_ROOT/nix/tui.nix"
+      # compute the new hash
+      sed -i "s/hash = \"[^\"]*\";/hash = \"\";/" $NIX_FILE
+      NIX_OUTPUT=$(nix build .#tui 2>&1 || true)
+      NEW_HASH=$(echo "$NIX_OUTPUT" | grep 'got:' | awk '{print $2}') 
+      echo got new hash $NEW_HASH
+      sed -i "s|hash = \"[^\"]*\";|hash = \"$NEW_HASH\";|" $NIX_FILE
+      nix build .#tui
+      echo "Updated npm hash in $NIX_FILE to $NEW_HASH"
+    '')
+  ];
+
+  passthru.devShellHook = ''
+    STAMP=".nix-stamps/hermes-tui"
+    STAMP_VALUE="${npmLockHash}"
+    if [ ! -f "$STAMP" ] || [ "$(cat "$STAMP")" != "$STAMP_VALUE" ]; then
+      echo "hermes-tui: installing npm dependencies..."
+      cd ui-tui && CI=true npm install --silent --no-fund --no-audit 2>/dev/null && cd ..
+      mkdir -p .nix-stamps
+      echo "$STAMP_VALUE" > "$STAMP"
+    fi
+  '';
+}
@@ -76,8 +76,8 @@ termux = [
  "hermes-agent[honcho]",
  "hermes-agent[acp]",
 ]
-dingtalk = ["dingtalk-stream>=0.1.0,<1"]
-feishu = ["lark-oapi>=1.5.3,<2"]
+dingtalk = ["dingtalk-stream>=0.1.0,<1", "qrcode>=7.0,<8"]
+feishu = ["lark-oapi>=1.5.3,<2", "qrcode>=7.0,<8"]
 web = ["fastapi>=0.104.0,<1", "uvicorn[standard]>=0.24.0,<1"]
 rl = [
  "atroposlib @ git+https://github.com/NousResearch/atropos.git@c20c85256e5a45ad31edf8b7276e9c5ee1995a30",
@@ -126,7 +126,7 @@ py-modules = ["run_agent", "model_tools", "toolsets", "batch_runner", "trajector
 hermes_cli = ["web_dist/**/*"]

 [tool.setuptools.packages.find]
-include = ["agent", "tools", "tools.*", "hermes_cli", "gateway", "gateway.*", "cron", "acp_adapter", "plugins", "plugins.*"]
+include = ["agent", "tools", "tools.*", "hermes_cli", "gateway", "gateway.*", "tui_gateway", "tui_gateway.*", "cron", "acp_adapter", "plugins", "plugins.*"]

 [tool.pytest.ini_options]
 testpaths = ["tests"]
@@ -353,12 +353,50 @@ def _sanitize_surrogates(text: str) -> str:
    return text


+def _sanitize_structure_surrogates(payload: Any) -> bool:
+    """Replace surrogate code points in nested dict/list payloads in-place.
+
+    Mirror of ``_sanitize_structure_non_ascii`` but for surrogate recovery.
+    Used to scrub nested structured fields (e.g. ``reasoning_details`` — an
+    array of dicts with ``summary``/``text`` strings) that flat per-field
+    checks don't reach.  Returns True if any surrogates were replaced.
+    """
+    found = False
+
+    def _walk(node):
+        nonlocal found
+        if isinstance(node, dict):
+            for key, value in node.items():
+                if isinstance(value, str):
+                    if _SURROGATE_RE.search(value):
+                        node[key] = _SURROGATE_RE.sub('\ufffd', value)
+                        found = True
+                elif isinstance(value, (dict, list)):
+                    _walk(value)
+        elif isinstance(node, list):
+            for idx, value in enumerate(node):
+                if isinstance(value, str):
+                    if _SURROGATE_RE.search(value):
+                        node[idx] = _SURROGATE_RE.sub('\ufffd', value)
+                        found = True
+                elif isinstance(value, (dict, list)):
+                    _walk(value)
+
+    _walk(payload)
+    return found
+
+
 def _sanitize_messages_surrogates(messages: list) -> bool:
    """Sanitize surrogate characters from all string content in a messages list.

    Walks message dicts in-place. Returns True if any surrogates were found
-    and replaced, False otherwise. Covers content/text, name, and tool call
-    metadata/arguments so retries don't fail on a non-content field.
+    and replaced, False otherwise. Covers content/text, name, tool call
+    metadata/arguments, AND any additional string or nested structured fields
+    (``reasoning``, ``reasoning_content``, ``reasoning_details``, etc.) so
+    retries don't fail on a non-content field.  Byte-level reasoning models
+    (xiaomi/mimo, kimi, glm) can emit lone surrogates in reasoning output
+    that flow through to ``api_messages["reasoning_content"]`` on the next
+    turn and crash json.dumps inside the OpenAI SDK.
    """
    found = False
    for msg in messages:
@@ -398,6 +436,21 @@ def _sanitize_messages_surrogates(messages: list) -> bool:
                    if isinstance(fn_args, str) and _SURROGATE_RE.search(fn_args):
                        fn["arguments"] = _SURROGATE_RE.sub('\ufffd', fn_args)
                        found = True
+        # Walk any additional string / nested fields (reasoning,
+        # reasoning_content, reasoning_details, etc.) — surrogates from
+        # byte-level reasoning models (xiaomi/mimo, kimi, glm) can lurk
+        # in these fields and aren't covered by the per-field checks above.
+        # Matches _sanitize_messages_non_ascii's coverage (PR #10537).
+        for key, value in msg.items():
+            if key in {"content", "name", "tool_calls", "role"}:
+                continue
+            if isinstance(value, str):
+                if _SURROGATE_RE.search(value):
+                    msg[key] = _SURROGATE_RE.sub('\ufffd', value)
+                    found = True
+            elif isinstance(value, (dict, list)):
+                if _sanitize_structure_surrogates(value):
+                    found = True
    return found


@@ -5841,6 +5894,7 @@ class AIAgent:
                                    )
                                except Exception:
                                    pass
+                                self._emit_status("🔄 Reconnected — resuming…")
                                continue
                            self._emit_status(
                                "❌ Connection to provider failed after "
@@ -6949,7 +7003,7 @@ class AIAgent:
            # (gateway, batch, quiet) still get reasoning.
            # Any reasoning that wasn't shown during streaming is caught by the
            # CLI post-response display fallback (cli.py _reasoning_shown_this_turn).
-            if not self.stream_delta_callback:
+            if not self.stream_delta_callback and not self._stream_callback:
                try:
                    self.reasoning_callback(reasoning_text)
                except Exception:
@@ -7154,14 +7208,22 @@ class AIAgent:

            # Use auxiliary client for the flush call when available --
            # it's cheaper and avoids Codex Responses API incompatibility.
-            from agent.auxiliary_client import call_llm as _call_llm
+            from agent.auxiliary_client import (
+                call_llm as _call_llm,
+                _fixed_temperature_for_model,
+            )
            _aux_available = True
+            # Use the fixed-temperature override (e.g. kimi-for-coding → 0.6) if
+            # the model has a strict contract; otherwise the historical 0.3 default.
+            _flush_temperature = _fixed_temperature_for_model(self.model)
+            if _flush_temperature is None:
+                _flush_temperature = 0.3
            try:
                response = _call_llm(
                    task="flush_memories",
                    messages=api_messages,
                    tools=[memory_tool_def],
-                    temperature=0.3,
+                    temperature=_flush_temperature,
                    max_tokens=5120,
                    # timeout resolved from auxiliary.flush_memories.timeout config
                )
@@ -7173,7 +7235,7 @@ class AIAgent:
                # No auxiliary client -- use the Codex Responses path directly
                codex_kwargs = self._build_api_kwargs(api_messages)
                codex_kwargs["tools"] = self._responses_tools([memory_tool_def])
-                codex_kwargs["temperature"] = 0.3
+                codex_kwargs["temperature"] = _flush_temperature
                if "max_output_tokens" in codex_kwargs:
                    codex_kwargs["max_output_tokens"] = 5120
                response = self._run_codex_stream(codex_kwargs)
@@ -7192,7 +7254,7 @@ class AIAgent:
                    "model": self.model,
                    "messages": api_messages,
                    "tools": [memory_tool_def],
-                    "temperature": 0.3,
+                    "temperature": _flush_temperature,
                    **self._max_tokens_param(5120),
                }
                from agent.auxiliary_client import _get_task_timeout
@@ -8688,6 +8750,7 @@ class AIAgent:
                                {
                                    "name": tc["function"]["name"],
                                    "result": _results_by_id.get(tc.get("id")),
+                                    "arguments": tc["function"].get("arguments"),
                                }
                                for tc in _m["tool_calls"]
                                if isinstance(tc, dict)
@@ -9302,8 +9365,7 @@ class AIAgent:
                                "and had none left for the actual response.\n\n"
                                "To fix this:\n"
                                "→ Lower reasoning effort: `/thinkon low` or `/thinkon minimal`\n"
-                                "→ Increase the output token limit: "
-                                "set `model.max_tokens` in config.yaml"
+                                "→ Or switch to a larger/non-reasoning model with `/model`"
                            )
                            self._cleanup_task_resources(effective_task_id)
                            self._persist_session(messages, conversation_history)
@@ -9570,13 +9632,51 @@ class AIAgent:
                    if isinstance(api_error, UnicodeEncodeError) and getattr(self, '_unicode_sanitization_passes', 0) < 2:
                        _err_str = str(api_error).lower()
                        _is_ascii_codec = "'ascii'" in _err_str or "ascii" in _err_str
+                        # Detect surrogate errors — utf-8 codec refusing to
+                        # encode U+D800..U+DFFF.  The error text is:
+                        #   "'utf-8' codec can't encode characters in position
+                        #    N-M: surrogates not allowed"
+                        _is_surrogate_error = (
+                            "surrogate" in _err_str
+                            or ("'utf-8'" in _err_str and not _is_ascii_codec)
+                        )
+                        # Sanitize surrogates from both the canonical `messages`
+                        # list AND `api_messages` (the API-copy, which may carry
+                        # `reasoning_content`/`reasoning_details` transformed
+                        # from `reasoning` — fields the canonical list doesn't
+                        # have directly).  Also clean `api_kwargs` if built and
+                        # `prefill_messages` if present.  Mirrors the ASCII
+                        # codec recovery below.
                        _surrogates_found = _sanitize_messages_surrogates(messages)
-                        if _surrogates_found:
+                        if isinstance(api_messages, list):
+                            if _sanitize_messages_surrogates(api_messages):
+                                _surrogates_found = True
+                        if isinstance(api_kwargs, dict):
+                            if _sanitize_structure_surrogates(api_kwargs):
+                                _surrogates_found = True
+                        if isinstance(getattr(self, "prefill_messages", None), list):
+                            if _sanitize_messages_surrogates(self.prefill_messages):
+                                _surrogates_found = True
+                        # Gate the retry on the error type, not on whether we
+                        # found anything — _force_ascii_payload / the extended
+                        # surrogate walker above cover all known paths, but a
+                        # new transformed field could still slip through.  If
+                        # the error was a surrogate encode failure, always let
+                        # the retry run; the proactive sanitizer at line ~8781
+                        # runs again on the next iteration.  Bounded by
+                        # _unicode_sanitization_passes < 2 (outer guard).
+                        if _surrogates_found or _is_surrogate_error:
                            self._unicode_sanitization_passes += 1
-                            self._vprint(
-                                f"{self.log_prefix}⚠️  Stripped invalid surrogate characters from messages. Retrying...",
-                                force=True,
-                            )
+                            if _surrogates_found:
+                                self._vprint(
+                                    f"{self.log_prefix}⚠️  Stripped invalid surrogate characters from messages. Retrying...",
+                                    force=True,
+                                )
+                            else:
+                                self._vprint(
+                                    f"{self.log_prefix}⚠️  Surrogate encoding error — retrying after full-payload sanitization...",
+                                    force=True,
+                                )
                            continue
                        if _is_ascii_codec:
                            self._force_ascii_payload = True
@@ -10344,9 +10444,9 @@ class AIAgent:
                                    pass
                    wait_time = _retry_after if _retry_after else jittered_backoff(retry_count, base_delay=2.0, max_delay=60.0)
                    if is_rate_limited:
-                        self._emit_status(f"⏱️ Rate limit reached. Waiting {wait_time}s before retry (attempt {retry_count + 1}/{max_retries})...")
+                        self._emit_status(f"⏱️ Rate limited. Waiting {wait_time:.1f}s (attempt {retry_count + 1}/{max_retries})...")
                    else:
-                        self._emit_status(f"⏳ Retrying in {wait_time}s (attempt {retry_count}/{max_retries})...")
+                        self._emit_status(f"⏳ Retrying in {wait_time:.1f}s (attempt {retry_count}/{max_retries})...")
                    logger.warning(
                        "Retrying API call in %ss (attempt %s/%s) %s error=%s",
                        wait_time,
@@ -10762,7 +10862,14 @@ class AIAgent:
                        elif self.quiet_mode:
                            clean = self._strip_think_blocks(turn_content).strip()
                            if clean:
-                                self._vprint(f"  ┊ 💬 {clean}")
+                                relayed = False
+                                if (
+                                    self.tool_progress_callback
+                                    and getattr(self, "platform", "") == "tui"
+                                ):
+                                    relayed = True
+                                if not relayed:
+                                    self._vprint(f"  ┊ 💬 {clean}")
                    
                    # Pop thinking-only prefill message(s) before appending
                    # (tool-call path — same rationale as the final-response path).
@@ -721,6 +721,20 @@ function Install-NodeDeps {
        }
    }
    
+    # Install TUI dependencies
+    $tuiDir = "$InstallDir\ui-tui"
+    if (Test-Path "$tuiDir\package.json") {
+        Write-Info "Installing TUI dependencies..."
+        Push-Location $tuiDir
+        try {
+            npm install --silent 2>&1 | Out-Null
+            Write-Success "TUI dependencies installed"
+        } catch {
+            Write-Warn "TUI npm install failed (hermes --tui may not work)"
+        }
+        Pop-Location
+    }
+
    # Install WhatsApp bridge dependencies
    $bridgeDir = "$InstallDir\scripts\whatsapp-bridge"
    if (Test-Path "$bridgeDir\package.json") {
@@ -1194,6 +1194,16 @@ install_node_deps() {
        log_success "Browser engine setup complete"
    fi

+    # Install TUI dependencies
+    if [ -f "$INSTALL_DIR/ui-tui/package.json" ]; then
+        log_info "Installing TUI dependencies..."
+        cd "$INSTALL_DIR/ui-tui"
+        npm install --silent 2>/dev/null || {
+            log_warn "TUI npm install failed (hermes --tui may not work)"
+        }
+        log_success "TUI dependencies installed"
+    fi
+
    # Install WhatsApp bridge dependencies
    if [ -f "$INSTALL_DIR/scripts/whatsapp-bridge/package.json" ]; then
        log_info "Installing WhatsApp bridge dependencies..."
@@ -0,0 +1,238 @@
+#!/usr/bin/env bash
+# ============================================================================
+# scripts/lib/node-bootstrap.sh
+# ----------------------------------------------------------------------------
+# Sourceable helper: ensure Node.js >= MIN_VERSION is available for the TUI
+# (React + Ink), browser tools, and the WhatsApp bridge.
+#
+# Strategy (first hit wins — respects the user's existing tooling):
+#   1. modern `node` already on PATH
+#   2. ~/.hermes/node/ from a prior Hermes-managed install
+#   3. fnm, proto, nvm (in that order) if the user already uses a version manager
+#   4. Termux `pkg`, macOS Homebrew
+#   5. pinned nodejs.org tarball into ~/.hermes/node/ (always works, zero shell rc edits)
+#
+# Usage:
+#   source scripts/lib/node-bootstrap.sh
+#   ensure_node   # returns 0 on success, non-zero on failure
+#   if [ "$HERMES_NODE_AVAILABLE" = true ]; then ...; fi
+#
+# Env inputs (set before sourcing to override defaults):
+#   HERMES_NODE_MIN_VERSION   (default: 20)   — accepted on PATH
+#   HERMES_NODE_TARGET_MAJOR  (default: 22)   — installed when we install
+#   HERMES_HOME               (default: $HOME/.hermes)
+# ============================================================================
+
+HERMES_NODE_MIN_VERSION="${HERMES_NODE_MIN_VERSION:-20}"
+HERMES_NODE_TARGET_MAJOR="${HERMES_NODE_TARGET_MAJOR:-22}"
+HERMES_HOME="${HERMES_HOME:-$HOME/.hermes}"
+HERMES_NODE_AVAILABLE=false
+
+# ---------------------------------------------------------------------------
+# Logging — prefer the host script's log_* helpers when present
+# ---------------------------------------------------------------------------
+
+_nb_log()  { declare -F log_info    >/dev/null 2>&1 && log_info    "$*" || printf '→ %s\n' "$*" >&2; }
+_nb_ok()   { declare -F log_success >/dev/null 2>&1 && log_success "$*" || printf '✓ %s\n' "$*" >&2; }
+_nb_warn() { declare -F log_warn    >/dev/null 2>&1 && log_warn    "$*" || printf '⚠ %s\n' "$*" >&2; }
+
+# ---------------------------------------------------------------------------
+# Platform + version helpers
+# ---------------------------------------------------------------------------
+
+_nb_is_termux() {
+    [ -n "${TERMUX_VERSION:-}" ] || [[ "${PREFIX:-}" == *"com.termux/files/usr"* ]]
+}
+
+_nb_node_major() {
+    local v
+    v=$(node --version 2>/dev/null | sed 's/^v//' | cut -d. -f1)
+    [[ "$v" =~ ^[0-9]+$ ]] && echo "$v" || echo 0
+}
+
+_nb_have_modern_node() {
+    command -v node >/dev/null 2>&1 || return 1
+    [ "$(_nb_node_major)" -ge "$HERMES_NODE_MIN_VERSION" ]
+}
+
+# ---------------------------------------------------------------------------
+# Version-manager paths — respect what the user already uses
+# ---------------------------------------------------------------------------
+
+_nb_try_fnm() {
+    command -v fnm >/dev/null 2>&1 || return 1
+    _nb_log "fnm detected — installing Node $HERMES_NODE_TARGET_MAJOR..."
+    eval "$(fnm env 2>/dev/null)" || true
+    fnm install "$HERMES_NODE_TARGET_MAJOR" >/dev/null 2>&1 || return 1
+    fnm use     "$HERMES_NODE_TARGET_MAJOR" >/dev/null 2>&1 || return 1
+    _nb_have_modern_node || return 1
+    _nb_ok "Node $(node --version) activated via fnm"
+    return 0
+}
+
+_nb_try_proto() {
+    command -v proto >/dev/null 2>&1 || return 1
+    _nb_log "proto detected — installing Node $HERMES_NODE_TARGET_MAJOR..."
+    proto install node "$HERMES_NODE_TARGET_MAJOR" >/dev/null 2>&1 || return 1
+    _nb_have_modern_node || return 1
+    _nb_ok "Node $(node --version) activated via proto"
+    return 0
+}
+
+_nb_try_nvm() {
+    local nvm_sh="${NVM_DIR:-$HOME/.nvm}/nvm.sh"
+    [ -s "$nvm_sh" ] || return 1
+    # shellcheck source=/dev/null
+    \. "$nvm_sh" >/dev/null 2>&1 || return 1
+    _nb_log "nvm detected — installing Node $HERMES_NODE_TARGET_MAJOR..."
+    nvm install "$HERMES_NODE_TARGET_MAJOR" >/dev/null 2>&1 || return 1
+    nvm use     "$HERMES_NODE_TARGET_MAJOR" >/dev/null 2>&1 || return 1
+    _nb_have_modern_node || return 1
+    _nb_ok "Node $(node --version) activated via nvm"
+    return 0
+}
+
+# ---------------------------------------------------------------------------
+# Platform package managers
+# ---------------------------------------------------------------------------
+
+_nb_try_termux_pkg() {
+    _nb_is_termux || return 1
+    _nb_log "Installing Node.js via pkg..."
+    pkg install -y nodejs >/dev/null 2>&1 || return 1
+    _nb_have_modern_node || return 1
+    _nb_ok "Node $(node --version) installed via pkg"
+    return 0
+}
+
+_nb_try_brew() {
+    [ "$(uname -s)" = "Darwin" ] || return 1
+    command -v brew >/dev/null 2>&1 || return 1
+    _nb_log "Installing Node via Homebrew..."
+    brew install "node@${HERMES_NODE_TARGET_MAJOR}" >/dev/null 2>&1 \
+        || brew install node >/dev/null 2>&1 \
+        || return 1
+    brew link --overwrite --force "node@${HERMES_NODE_TARGET_MAJOR}" >/dev/null 2>&1 || true
+    _nb_have_modern_node || return 1
+    _nb_ok "Node $(node --version) installed via Homebrew"
+    return 0
+}
+
+# ---------------------------------------------------------------------------
+# Bundled binary fallback — always works, no shell rc edits
+# ---------------------------------------------------------------------------
+
+_nb_install_bundled_node() {
+    local arch node_arch os_name node_os
+    arch=$(uname -m)
+    case "$arch" in
+        x86_64)        node_arch="x64"    ;;
+        aarch64|arm64) node_arch="arm64"  ;;
+        armv7l)        node_arch="armv7l" ;;
+        *)
+            _nb_warn "Unsupported arch ($arch) — install Node.js manually: https://nodejs.org/"
+            return 1
+            ;;
+    esac
+
+    os_name=$(uname -s)
+    case "$os_name" in
+        Linux*)  node_os="linux"  ;;
+        Darwin*) node_os="darwin" ;;
+        *)
+            _nb_warn "Unsupported OS ($os_name) — install Node.js manually: https://nodejs.org/"
+            return 1
+            ;;
+    esac
+
+    local index_url="https://nodejs.org/dist/latest-v${HERMES_NODE_TARGET_MAJOR}.x/"
+    local tarball
+    tarball=$(curl -fsSL "$index_url" \
+        | grep -oE "node-v${HERMES_NODE_TARGET_MAJOR}\.[0-9]+\.[0-9]+-${node_os}-${node_arch}\.tar\.xz" \
+        | head -1)
+    if [ -z "$tarball" ]; then
+        tarball=$(curl -fsSL "$index_url" \
+            | grep -oE "node-v${HERMES_NODE_TARGET_MAJOR}\.[0-9]+\.[0-9]+-${node_os}-${node_arch}\.tar\.gz" \
+            | head -1)
+    fi
+    if [ -z "$tarball" ]; then
+        _nb_warn "Could not resolve Node $HERMES_NODE_TARGET_MAJOR binary for $node_os-$node_arch"
+        return 1
+    fi
+
+    local tmp
+    tmp=$(mktemp -d)
+    _nb_log "Downloading $tarball..."
+    curl -fsSL "${index_url}${tarball}" -o "$tmp/$tarball" || {
+        _nb_warn "Download failed"; rm -rf "$tmp"; return 1
+    }
+
+    _nb_log "Extracting to $HERMES_HOME/node/..."
+    if [[ "$tarball" == *.tar.xz ]]; then
+        tar xf  "$tmp/$tarball" -C "$tmp" || { rm -rf "$tmp"; return 1; }
+    else
+        tar xzf "$tmp/$tarball" -C "$tmp" || { rm -rf "$tmp"; return 1; }
+    fi
+
+    local extracted
+    extracted=$(find "$tmp" -maxdepth 1 -type d -name 'node-v*' 2>/dev/null | head -1)
+    if [ ! -d "$extracted" ]; then
+        _nb_warn "Extraction produced no node-v* directory"
+        rm -rf "$tmp"
+        return 1
+    fi
+
+    mkdir -p "$HERMES_HOME"
+    rm -rf "$HERMES_HOME/node"
+    mv "$extracted" "$HERMES_HOME/node"
+    rm -rf "$tmp"
+
+    mkdir -p "$HOME/.local/bin"
+    ln -sf "$HERMES_HOME/node/bin/node" "$HOME/.local/bin/node"
+    ln -sf "$HERMES_HOME/node/bin/npm"  "$HOME/.local/bin/npm"
+    ln -sf "$HERMES_HOME/node/bin/npx"  "$HOME/.local/bin/npx"
+    export PATH="$HERMES_HOME/node/bin:$PATH"
+
+    _nb_have_modern_node || return 1
+    _nb_ok "Node $(node --version) installed to $HERMES_HOME/node/"
+    return 0
+}
+
+# ---------------------------------------------------------------------------
+# Public entry point
+# ---------------------------------------------------------------------------
+
+ensure_node() {
+    HERMES_NODE_AVAILABLE=false
+
+    if _nb_have_modern_node; then
+        _nb_ok "Node $(node --version) found"
+        HERMES_NODE_AVAILABLE=true
+        return 0
+    fi
+
+    if [ -x "$HERMES_HOME/node/bin/node" ]; then
+        export PATH="$HERMES_HOME/node/bin:$PATH"
+        if _nb_have_modern_node; then
+            _nb_ok "Node $(node --version) found (Hermes-managed)"
+            HERMES_NODE_AVAILABLE=true
+            return 0
+        fi
+    fi
+
+    # Version managers first — respect the user's existing setup.
+    _nb_try_fnm   && { HERMES_NODE_AVAILABLE=true; return 0; }
+    _nb_try_proto && { HERMES_NODE_AVAILABLE=true; return 0; }
+    _nb_try_nvm   && { HERMES_NODE_AVAILABLE=true; return 0; }
+
+    # Platform package managers.
+    _nb_try_termux_pkg && { HERMES_NODE_AVAILABLE=true; return 0; }
+    _nb_try_brew       && { HERMES_NODE_AVAILABLE=true; return 0; }
+
+    # Last resort: pinned nodejs.org tarball.
+    _nb_install_bundled_node && { HERMES_NODE_AVAILABLE=true; return 0; }
+
+    _nb_warn "Node.js install failed — TUI and browser tools will be unavailable."
+    _nb_warn "Install manually: https://nodejs.org/en/download/  (or: \`brew install node\`, \`fnm install $HERMES_NODE_TARGET_MAJOR\`, etc.)"
+    return 1
+}
@@ -103,6 +103,7 @@ AUTHOR_MAP = {
    "dangtc94@gmail.com": "dieutx",
    "jaisehgal11299@gmail.com": "jaisup",
    "percydikec@gmail.com": "PercyDikec",
+    "noonou7@gmail.com": "HenkDz",
    "dean.kerr@gmail.com": "deankerr",
    "socrates1024@gmail.com": "socrates1024",
    "satelerd@gmail.com": "satelerd",
@@ -255,6 +256,8 @@ AUTHOR_MAP = {
    "anthhub@163.com": "anthhub",
    "shenuu@gmail.com": "shenuu",
    "xiayh17@gmail.com": "xiayh0107",
+    "asurla@nvidia.com": "anniesurla",
+    "limkuan24@gmail.com": "WideLee",
 }


@@ -42,9 +42,10 @@ class TestToolProgressCallback:
    def test_emits_tool_call_start(self, mock_conn, event_loop_fixture):
        """Tool progress should emit a ToolCallStart update."""
        tool_call_ids = {}
+        tool_call_meta = {}
        loop = event_loop_fixture

-        cb = make_tool_progress_cb(mock_conn, "session-1", loop, tool_call_ids)
+        cb = make_tool_progress_cb(mock_conn, "session-1", loop, tool_call_ids, tool_call_meta)

        # Run callback in the event loop context
        with patch("acp_adapter.events.asyncio.run_coroutine_threadsafe") as mock_rcts:
@@ -66,9 +67,10 @@ class TestToolProgressCallback:
    def test_handles_string_args(self, mock_conn, event_loop_fixture):
        """If args is a JSON string, it should be parsed."""
        tool_call_ids = {}
+        tool_call_meta = {}
        loop = event_loop_fixture

-        cb = make_tool_progress_cb(mock_conn, "session-1", loop, tool_call_ids)
+        cb = make_tool_progress_cb(mock_conn, "session-1", loop, tool_call_ids, tool_call_meta)

        with patch("acp_adapter.events.asyncio.run_coroutine_threadsafe") as mock_rcts:
            future = MagicMock(spec=Future)
@@ -82,9 +84,10 @@ class TestToolProgressCallback:
    def test_handles_non_dict_args(self, mock_conn, event_loop_fixture):
        """If args is not a dict, it should be wrapped."""
        tool_call_ids = {}
+        tool_call_meta = {}
        loop = event_loop_fixture

-        cb = make_tool_progress_cb(mock_conn, "session-1", loop, tool_call_ids)
+        cb = make_tool_progress_cb(mock_conn, "session-1", loop, tool_call_ids, tool_call_meta)

        with patch("acp_adapter.events.asyncio.run_coroutine_threadsafe") as mock_rcts:
            future = MagicMock(spec=Future)
@@ -98,10 +101,11 @@ class TestToolProgressCallback:
    def test_duplicate_same_name_tool_calls_use_fifo_ids(self, mock_conn, event_loop_fixture):
        """Multiple same-name tool calls should be tracked independently in order."""
        tool_call_ids = {}
+        tool_call_meta = {}
        loop = event_loop_fixture

-        progress_cb = make_tool_progress_cb(mock_conn, "session-1", loop, tool_call_ids)
-        step_cb = make_step_cb(mock_conn, "session-1", loop, tool_call_ids)
+        progress_cb = make_tool_progress_cb(mock_conn, "session-1", loop, tool_call_ids, tool_call_meta)
+        step_cb = make_step_cb(mock_conn, "session-1", loop, tool_call_ids, tool_call_meta)

        with patch("acp_adapter.events.asyncio.run_coroutine_threadsafe") as mock_rcts:
            future = MagicMock(spec=Future)
@@ -163,7 +167,7 @@ class TestStepCallback:
        tool_call_ids = {"terminal": "tc-abc123"}
        loop = event_loop_fixture

-        cb = make_step_cb(mock_conn, "session-1", loop, tool_call_ids)
+        cb = make_step_cb(mock_conn, "session-1", loop, tool_call_ids, {})

        with patch("acp_adapter.events.asyncio.run_coroutine_threadsafe") as mock_rcts:
            future = MagicMock(spec=Future)
@@ -181,7 +185,7 @@ class TestStepCallback:
        tool_call_ids = {}
        loop = event_loop_fixture

-        cb = make_step_cb(mock_conn, "session-1", loop, tool_call_ids)
+        cb = make_step_cb(mock_conn, "session-1", loop, tool_call_ids, {})

        with patch("acp_adapter.events.asyncio.run_coroutine_threadsafe") as mock_rcts:
            cb(1, [{"name": "unknown_tool", "result": "ok"}])
@@ -193,7 +197,7 @@ class TestStepCallback:
        tool_call_ids = {"read_file": "tc-def456"}
        loop = event_loop_fixture

-        cb = make_step_cb(mock_conn, "session-1", loop, tool_call_ids)
+        cb = make_step_cb(mock_conn, "session-1", loop, tool_call_ids, {})

        with patch("acp_adapter.events.asyncio.run_coroutine_threadsafe") as mock_rcts:
            future = MagicMock(spec=Future)
@@ -212,7 +216,7 @@ class TestStepCallback:
        tool_call_ids = {"terminal": deque(["tc-xyz789"])}
        loop = event_loop_fixture

-        cb = make_step_cb(mock_conn, "session-1", loop, tool_call_ids)
+        cb = make_step_cb(mock_conn, "session-1", loop, tool_call_ids, {})

        with patch("acp_adapter.events.asyncio.run_coroutine_threadsafe") as mock_rcts, \
             patch("acp_adapter.events.build_tool_complete") as mock_btc:
@@ -224,7 +228,7 @@ class TestStepCallback:
            cb(1, [{"name": "terminal", "result": '{"output": "hello"}'}])

        mock_btc.assert_called_once_with(
-            "tc-xyz789", "terminal", result='{"output": "hello"}'
+            "tc-xyz789", "terminal", result='{"output": "hello"}', function_args=None, snapshot=None
        )

    def test_none_result_passed_through(self, mock_conn, event_loop_fixture):
@@ -234,7 +238,7 @@ class TestStepCallback:
        tool_call_ids = {"web_search": deque(["tc-aaa"])}
        loop = event_loop_fixture

-        cb = make_step_cb(mock_conn, "session-1", loop, tool_call_ids)
+        cb = make_step_cb(mock_conn, "session-1", loop, tool_call_ids, {})

        with patch("acp_adapter.events.asyncio.run_coroutine_threadsafe") as mock_rcts, \
             patch("acp_adapter.events.build_tool_complete") as mock_btc:
@@ -244,7 +248,50 @@ class TestStepCallback:

            cb(1, [{"name": "web_search", "result": None}])

-        mock_btc.assert_called_once_with("tc-aaa", "web_search", result=None)
+        mock_btc.assert_called_once_with("tc-aaa", "web_search", result=None, function_args=None, snapshot=None)
+
+    def test_step_callback_passes_arguments_and_snapshot(self, mock_conn, event_loop_fixture):
+        from collections import deque
+
+        tool_call_ids = {"write_file": deque(["tc-write"])}
+        tool_call_meta = {"tc-write": {"args": {"path": "fallback.txt"}, "snapshot": "snap"}}
+        loop = event_loop_fixture
+
+        cb = make_step_cb(mock_conn, "session-1", loop, tool_call_ids, tool_call_meta)
+
+        with patch("acp_adapter.events.asyncio.run_coroutine_threadsafe") as mock_rcts, \
+             patch("acp_adapter.events.build_tool_complete") as mock_btc:
+            future = MagicMock(spec=Future)
+            future.result.return_value = None
+            mock_rcts.return_value = future
+
+            cb(1, [{"name": "write_file", "result": '{"bytes_written": 23}', "arguments": {"path": "diff-test.txt"}}])
+
+        mock_btc.assert_called_once_with(
+            "tc-write",
+            "write_file",
+            result='{"bytes_written": 23}',
+            function_args={"path": "diff-test.txt"},
+            snapshot="snap",
+        )
+
+    def test_tool_progress_captures_snapshot_metadata(self, mock_conn, event_loop_fixture):
+        tool_call_ids = {}
+        tool_call_meta = {}
+        loop = event_loop_fixture
+
+        with patch("acp_adapter.events.make_tool_call_id", return_value="tc-meta"), \
+             patch("acp_adapter.events._send_update") as mock_send, \
+             patch("agent.display.capture_local_edit_snapshot", return_value="snapshot"):
+            cb = make_tool_progress_cb(mock_conn, "session-1", loop, tool_call_ids, tool_call_meta)
+            cb("tool.started", "write_file", None, {"path": "diff-test.txt", "content": "hello"})
+
+        assert list(tool_call_ids["write_file"]) == ["tc-meta"]
+        assert tool_call_meta["tc-meta"] == {
+            "args": {"path": "diff-test.txt", "content": "hello"},
+            "snapshot": "snapshot",
+        }
+        mock_send.assert_called_once()


 # ---------------------------------------------------------------------------
@@ -29,6 +29,7 @@ from acp.schema import (

 from acp_adapter.server import HermesACPAgent
 from acp_adapter.session import SessionManager
+from acp_adapter.tools import build_tool_start


 # ---------------------------------------------------------------------------
@@ -181,6 +182,25 @@ class TestMcpRegistrationE2E:
        assert complete_event.raw_output is not None
        assert "hello" in str(complete_event.raw_output)

+    def test_patch_mode_tool_start_emits_diff_blocks_for_v4a_patch(self):
+        update = build_tool_start(
+            "tc-1",
+            "patch",
+            {
+                "mode": "patch",
+                "patch": "*** Begin Patch\n*** Update File: src/app.py\n@@\n-old line\n+new line\n*** Add File: src/new.py\n+hello\n*** End Patch",
+            },
+        )
+
+        assert len(update.content) == 2
+        assert update.content[0].type == "diff"
+        assert update.content[0].path == "src/app.py"
+        assert update.content[0].old_text == "old line"
+        assert update.content[0].new_text == "new line"
+        assert update.content[1].type == "diff"
+        assert update.content[1].path == "src/new.py"
+        assert update.content[1].new_text == "hello"
+
    @pytest.mark.asyncio
    async def test_prompt_tool_results_paired_by_call_id(self, acp_agent, mock_manager):
        """The ToolCallUpdate's toolCallId must match the ToolCallStart's."""
@@ -20,7 +20,9 @@ from acp.schema import (
    NewSessionResponse,
    PromptResponse,
    ResumeSessionResponse,
+    SessionModelState,
    SetSessionConfigOptionResponse,
+    SetSessionModelResponse,
    SetSessionModeResponse,
    SessionInfo,
    TextContentBlock,
@@ -127,6 +129,25 @@ class TestSessionOps:
        assert state is not None
        assert state.cwd == "/home/user/project"

+    @pytest.mark.asyncio
+    async def test_new_session_returns_model_state(self):
+        manager = SessionManager(
+            agent_factory=lambda: SimpleNamespace(model="gpt-5.4", provider="openai-codex")
+        )
+        acp_agent = HermesACPAgent(session_manager=manager)
+
+        with patch(
+            "hermes_cli.models.curated_models_for_provider",
+            return_value=[("gpt-5.4", "recommended"), ("gpt-5.4-mini", "")],
+        ):
+            resp = await acp_agent.new_session(cwd="/tmp")
+
+        assert isinstance(resp.models, SessionModelState)
+        assert resp.models.current_model_id == "openai-codex:gpt-5.4"
+        assert resp.models.available_models[0].model_id == "openai-codex:gpt-5.4"
+        assert resp.models.available_models[0].description is not None
+        assert "Provider:" in resp.models.available_models[0].description
+
    @pytest.mark.asyncio
    async def test_available_commands_include_help(self, agent):
        help_cmd = next(
@@ -204,6 +225,33 @@ class TestListAndFork:
        assert fork_resp.session_id
        assert fork_resp.session_id != new_resp.session_id

+    @pytest.mark.asyncio
+    async def test_list_sessions_includes_title_and_updated_at(self, agent):
+        with patch.object(
+            agent.session_manager,
+            "list_sessions",
+            return_value=[
+                {
+                    "session_id": "session-1",
+                    "cwd": "/tmp/project",
+                    "title": "Fix Zed session history",
+                    "updated_at": 123.0,
+                }
+            ],
+        ):
+            resp = await agent.list_sessions(cwd="/tmp/project")
+
+        assert isinstance(resp.sessions[0], SessionInfo)
+        assert resp.sessions[0].title == "Fix Zed session history"
+        assert resp.sessions[0].updated_at == "123.0"
+
+    @pytest.mark.asyncio
+    async def test_list_sessions_passes_cwd_filter(self, agent):
+        with patch.object(agent.session_manager, "list_sessions", return_value=[]) as mock_list:
+            await agent.list_sessions(cwd="/mnt/e/Projects/AI/browser-link-3")
+
+        mock_list.assert_called_once_with(cwd="/mnt/e/Projects/AI/browser-link-3")
+
 # ---------------------------------------------------------------------------
 # session configuration / model routing
 # ---------------------------------------------------------------------------
@@ -257,6 +305,53 @@ class TestSessionConfiguration:
        assert result == {}
        assert state.model == "gpt-5.4"

+    @pytest.mark.asyncio
+    async def test_set_session_model_accepts_provider_prefixed_choice(self, tmp_path, monkeypatch):
+        runtime_calls = []
+
+        def fake_resolve_runtime_provider(requested=None, **kwargs):
+            runtime_calls.append(requested)
+            provider = requested or "openrouter"
+            return {
+                "provider": provider,
+                "api_mode": "anthropic_messages" if provider == "anthropic" else "chat_completions",
+                "base_url": f"https://{provider}.example/v1",
+                "api_key": f"{provider}-key",
+                "command": None,
+                "args": [],
+            }
+
+        def fake_agent(**kwargs):
+            return SimpleNamespace(
+                model=kwargs.get("model"),
+                provider=kwargs.get("provider"),
+                base_url=kwargs.get("base_url"),
+                api_mode=kwargs.get("api_mode"),
+            )
+
+        monkeypatch.setattr("hermes_cli.config.load_config", lambda: {
+            "model": {"provider": "openrouter", "default": "openrouter/gpt-5"}
+        })
+        monkeypatch.setattr(
+            "hermes_cli.runtime_provider.resolve_runtime_provider",
+            fake_resolve_runtime_provider,
+        )
+        manager = SessionManager(db=SessionDB(tmp_path / "state.db"))
+
+        with patch("run_agent.AIAgent", side_effect=fake_agent):
+            acp_agent = HermesACPAgent(session_manager=manager)
+            state = manager.create_session(cwd="/tmp")
+            result = await acp_agent.set_session_model(
+                model_id="anthropic:claude-sonnet-4-6",
+                session_id=state.session_id,
+            )
+
+        assert isinstance(result, SetSessionModelResponse)
+        assert state.model == "claude-sonnet-4-6"
+        assert state.agent.provider == "anthropic"
+        assert state.agent.base_url == "https://anthropic.example/v1"
+        assert runtime_calls[-1] == "anthropic"
+

 # ---------------------------------------------------------------------------
 # prompt
@@ -354,6 +449,31 @@ class TestPrompt:
        update = last_call[1].get("update") or last_call[0][1]
        assert update.session_update == "agent_message_chunk"

+    @pytest.mark.asyncio
+    async def test_prompt_auto_titles_session(self, agent):
+        new_resp = await agent.new_session(cwd=".")
+        state = agent.session_manager.get_session(new_resp.session_id)
+        state.agent.run_conversation = MagicMock(return_value={
+            "final_response": "Here is the fix.",
+            "messages": [
+                {"role": "user", "content": "fix the broken ACP history"},
+                {"role": "assistant", "content": "Here is the fix."},
+            ],
+        })
+
+        mock_conn = MagicMock(spec=acp.Client)
+        mock_conn.session_update = AsyncMock()
+        agent._conn = mock_conn
+
+        with patch("agent.title_generator.maybe_auto_title") as mock_title:
+            prompt = [TextContentBlock(type="text", text="fix the broken ACP history")]
+            await agent.prompt(prompt=prompt, session_id=new_resp.session_id)
+
+        mock_title.assert_called_once()
+        assert mock_title.call_args.args[1] == new_resp.session_id
+        assert mock_title.call_args.args[2] == "fix the broken ACP history"
+        assert mock_title.call_args.args[3] == "Here is the fix."
+
    @pytest.mark.asyncio
    async def test_prompt_populates_usage_from_top_level_run_conversation_fields(self, agent):
        """ACP should map top-level token fields into PromptResponse.usage."""
@@ -3,6 +3,7 @@
 import contextlib
 import io
 import json
+import time
 from types import SimpleNamespace
 import pytest
 from unittest.mock import MagicMock, patch
@@ -100,15 +101,23 @@ class TestListAndCleanup:
    def test_list_sessions_returns_created(self, manager):
        s1 = manager.create_session(cwd="/a")
        s2 = manager.create_session(cwd="/b")
+        s1.history.append({"role": "user", "content": "hello from a"})
+        s2.history.append({"role": "user", "content": "hello from b"})
        listing = manager.list_sessions()
        ids = {s["session_id"] for s in listing}
        assert s1.session_id in ids
        assert s2.session_id in ids
        assert len(listing) == 2

+    def test_list_sessions_hides_empty_threads(self, manager):
+        manager.create_session(cwd="/empty")
+        assert manager.list_sessions() == []
+
    def test_cleanup_clears_all(self, manager):
-        manager.create_session()
-        manager.create_session()
+        s1 = manager.create_session()
+        s2 = manager.create_session()
+        s1.history.append({"role": "user", "content": "one"})
+        s2.history.append({"role": "user", "content": "two"})
        assert len(manager.list_sessions()) == 2
        manager.cleanup()
        assert manager.list_sessions() == []
@@ -194,6 +203,8 @@ class TestPersistence:
    def test_list_sessions_includes_db_only(self, manager):
        """Sessions only in DB (not in memory) appear in list_sessions."""
        state = manager.create_session(cwd="/db-only")
+        state.history.append({"role": "user", "content": "database only thread"})
+        manager.save_session(state.session_id)
        sid = state.session_id

        # Drop from memory.
@@ -204,6 +215,53 @@ class TestPersistence:
        ids = {s["session_id"] for s in listing}
        assert sid in ids

+    def test_list_sessions_filters_by_cwd(self, manager):
+        keep = manager.create_session(cwd="/keep")
+        drop = manager.create_session(cwd="/drop")
+        keep.history.append({"role": "user", "content": "keep me"})
+        drop.history.append({"role": "user", "content": "drop me"})
+
+        listing = manager.list_sessions(cwd="/keep")
+        ids = {s["session_id"] for s in listing}
+        assert keep.session_id in ids
+        assert drop.session_id not in ids
+
+    def test_list_sessions_matches_windows_and_wsl_paths(self, manager):
+        state = manager.create_session(cwd="/mnt/e/Projects/AI/browser-link-3")
+        state.history.append({"role": "user", "content": "same project from WSL"})
+
+        listing = manager.list_sessions(cwd=r"E:\Projects\AI\browser-link-3")
+        ids = {s["session_id"] for s in listing}
+        assert state.session_id in ids
+
+    def test_list_sessions_prefers_title_then_preview(self, manager):
+        state = manager.create_session(cwd="/named")
+        state.history.append({"role": "user", "content": "Investigate broken ACP history in Zed"})
+        manager.save_session(state.session_id)
+        db = manager._get_db()
+        db.set_session_title(state.session_id, "Fix Zed ACP history")
+
+        listing = manager.list_sessions(cwd="/named")
+        assert listing[0]["title"] == "Fix Zed ACP history"
+
+        db.set_session_title(state.session_id, "")
+        listing = manager.list_sessions(cwd="/named")
+        assert listing[0]["title"].startswith("Investigate broken ACP history")
+
+    def test_list_sessions_sorted_by_most_recent_activity(self, manager):
+        older = manager.create_session(cwd="/ordered")
+        older.history.append({"role": "user", "content": "older"})
+        manager.save_session(older.session_id)
+        time.sleep(0.02)
+        newer = manager.create_session(cwd="/ordered")
+        newer.history.append({"role": "user", "content": "newer"})
+        manager.save_session(newer.session_id)
+
+        listing = manager.list_sessions(cwd="/ordered")
+        assert [item["session_id"] for item in listing[:2]] == [newer.session_id, older.session_id]
+        assert listing[0]["updated_at"]
+        assert listing[1]["updated_at"]
+
    def test_fork_restores_source_from_db(self, manager):
        """Forking a session that is only in DB should work."""
        original = manager.create_session()
@@ -215,6 +215,46 @@ class TestBuildToolComplete:
        assert len(display_text) < 6000
        assert "truncated" in display_text

+    def test_build_tool_complete_for_patch_uses_diff_blocks(self):
+        """Completed patch calls should keep structured diff content for Zed."""
+        patch_result = (
+            '{"success": true, "diff": "--- a/README.md\\n+++ b/README.md\\n@@ -1 +1,2 @@\\n old line\\n+new line\\n", '
+            '"files_modified": ["README.md"]}'
+        )
+        result = build_tool_complete("tc-p1", "patch", patch_result)
+        assert isinstance(result, ToolCallProgress)
+        assert len(result.content) == 1
+        diff_item = result.content[0]
+        assert isinstance(diff_item, FileEditToolCallContent)
+        assert diff_item.path == "README.md"
+        assert diff_item.old_text == "old line"
+        assert diff_item.new_text == "old line\nnew line"
+
+    def test_build_tool_complete_for_patch_falls_back_to_text_when_no_diff(self):
+        result = build_tool_complete("tc-p2", "patch", '{"success": true}')
+        assert isinstance(result, ToolCallProgress)
+        assert isinstance(result.content[0], ContentToolCallContent)
+
+    def test_build_tool_complete_for_write_file_uses_snapshot_diff(self, tmp_path):
+        target = tmp_path / "diff-test.txt"
+        snapshot = type("Snapshot", (), {"paths": [target], "before": {str(target): None}})()
+        target.write_text("hello from hermes\n", encoding="utf-8")
+
+        result = build_tool_complete(
+            "tc-wf1",
+            "write_file",
+            '{"bytes_written": 18, "dirs_created": false}',
+            function_args={"path": str(target), "content": "hello from hermes\n"},
+            snapshot=snapshot,
+        )
+        assert isinstance(result, ToolCallProgress)
+        assert len(result.content) == 1
+        diff_item = result.content[0]
+        assert isinstance(diff_item, FileEditToolCallContent)
+        assert diff_item.path.endswith("diff-test.txt")
+        assert diff_item.old_text is None
+        assert diff_item.new_text == "hello from hermes"
+

 # ---------------------------------------------------------------------------
 # extract_locations
@@ -696,6 +696,95 @@ class TestIsConnectionError:
        assert _is_connection_error(err) is False


+class TestKimiForCodingTemperature:
+    """kimi-for-coding now requires temperature=0.6 exactly."""
+
+    def test_build_call_kwargs_forces_fixed_temperature(self):
+        from agent.auxiliary_client import _build_call_kwargs
+
+        kwargs = _build_call_kwargs(
+            provider="kimi-coding",
+            model="kimi-for-coding",
+            messages=[{"role": "user", "content": "hello"}],
+            temperature=0.3,
+        )
+
+        assert kwargs["temperature"] == 0.6
+
+    def test_build_call_kwargs_injects_temperature_when_missing(self):
+        from agent.auxiliary_client import _build_call_kwargs
+
+        kwargs = _build_call_kwargs(
+            provider="kimi-coding",
+            model="kimi-for-coding",
+            messages=[{"role": "user", "content": "hello"}],
+            temperature=None,
+        )
+
+        assert kwargs["temperature"] == 0.6
+
+    def test_auto_routed_kimi_for_coding_sync_call_uses_fixed_temperature(self):
+        client = MagicMock()
+        client.base_url = "https://api.kimi.com/coding/v1"
+        response = MagicMock()
+        client.chat.completions.create.return_value = response
+
+        with patch(
+            "agent.auxiliary_client._get_cached_client",
+            return_value=(client, "kimi-for-coding"),
+        ), patch(
+            "agent.auxiliary_client._resolve_task_provider_model",
+            return_value=("auto", "kimi-for-coding", None, None, None),
+        ):
+            result = call_llm(
+                task="session_search",
+                messages=[{"role": "user", "content": "hello"}],
+                temperature=0.1,
+            )
+
+        assert result is response
+        kwargs = client.chat.completions.create.call_args.kwargs
+        assert kwargs["model"] == "kimi-for-coding"
+        assert kwargs["temperature"] == 0.6
+
+    @pytest.mark.asyncio
+    async def test_auto_routed_kimi_for_coding_async_call_uses_fixed_temperature(self):
+        client = MagicMock()
+        client.base_url = "https://api.kimi.com/coding/v1"
+        response = MagicMock()
+        client.chat.completions.create = AsyncMock(return_value=response)
+
+        with patch(
+            "agent.auxiliary_client._get_cached_client",
+            return_value=(client, "kimi-for-coding"),
+        ), patch(
+            "agent.auxiliary_client._resolve_task_provider_model",
+            return_value=("auto", "kimi-for-coding", None, None, None),
+        ):
+            result = await async_call_llm(
+                task="session_search",
+                messages=[{"role": "user", "content": "hello"}],
+                temperature=0.1,
+            )
+
+        assert result is response
+        kwargs = client.chat.completions.create.call_args.kwargs
+        assert kwargs["model"] == "kimi-for-coding"
+        assert kwargs["temperature"] == 0.6
+
+    def test_non_kimi_model_still_preserves_temperature(self):
+        from agent.auxiliary_client import _build_call_kwargs
+
+        kwargs = _build_call_kwargs(
+            provider="kimi-coding",
+            model="kimi-k2.5",
+            messages=[{"role": "user", "content": "hello"}],
+            temperature=0.3,
+        )
+
+        assert kwargs["temperature"] == 0.3
+
+
 # ---------------------------------------------------------------------------
 # async_call_llm payment / connection fallback (#7512 bug 2)
 # ---------------------------------------------------------------------------
@@ -826,6 +826,160 @@ class TestGeminiCloudCodeClient:
        finally:
            client.close()

+
+class TestGeminiHttpErrorParsing:
+    """Regression coverage for _gemini_http_error Google-envelope parsing.
+
+    These are the paths that users actually hit during Google-side throttling
+    (April 2026: gemini-2.5-pro MODEL_CAPACITY_EXHAUSTED, gemma-4-26b-it
+    returning 404).  The error needs to carry status_code + response so the
+    main loop's error_classifier and Retry-After logic work.
+    """
+
+    @staticmethod
+    def _fake_response(status: int, body: dict | str = "", headers=None):
+        """Minimal httpx.Response stand-in (duck-typed for _gemini_http_error)."""
+        class _FakeResponse:
+            def __init__(self):
+                self.status_code = status
+                if isinstance(body, dict):
+                    self.text = json.dumps(body)
+                else:
+                    self.text = body
+                self.headers = headers or {}
+        return _FakeResponse()
+
+    def test_model_capacity_exhausted_produces_friendly_message(self):
+        from agent.gemini_cloudcode_adapter import _gemini_http_error
+
+        body = {
+            "error": {
+                "code": 429,
+                "message": "Resource has been exhausted (e.g. check quota).",
+                "status": "RESOURCE_EXHAUSTED",
+                "details": [
+                    {
+                        "@type": "type.googleapis.com/google.rpc.ErrorInfo",
+                        "reason": "MODEL_CAPACITY_EXHAUSTED",
+                        "domain": "googleapis.com",
+                        "metadata": {"model": "gemini-2.5-pro"},
+                    },
+                    {
+                        "@type": "type.googleapis.com/google.rpc.RetryInfo",
+                        "retryDelay": "30s",
+                    },
+                ],
+            }
+        }
+        err = _gemini_http_error(self._fake_response(429, body))
+        assert err.status_code == 429
+        assert err.code == "code_assist_capacity_exhausted"
+        assert err.retry_after == 30.0
+        assert err.details["reason"] == "MODEL_CAPACITY_EXHAUSTED"
+        # Message must be user-friendly, not a raw JSON dump.
+        message = str(err)
+        assert "gemini-2.5-pro" in message
+        assert "capacity exhausted" in message.lower()
+        assert "30s" in message
+        # response attr is preserved for run_agent's Retry-After header path.
+        assert err.response is not None
+
+    def test_resource_exhausted_without_reason(self):
+        from agent.gemini_cloudcode_adapter import _gemini_http_error
+
+        body = {
+            "error": {
+                "code": 429,
+                "message": "Quota exceeded for requests per minute.",
+                "status": "RESOURCE_EXHAUSTED",
+            }
+        }
+        err = _gemini_http_error(self._fake_response(429, body))
+        assert err.status_code == 429
+        assert err.code == "code_assist_rate_limited"
+        message = str(err)
+        assert "quota" in message.lower()
+
+    def test_404_model_not_found_produces_model_retired_message(self):
+        from agent.gemini_cloudcode_adapter import _gemini_http_error
+
+        body = {
+            "error": {
+                "code": 404,
+                "message": "models/gemma-4-26b-it is not found for API version v1internal",
+                "status": "NOT_FOUND",
+            }
+        }
+        err = _gemini_http_error(self._fake_response(404, body))
+        assert err.status_code == 404
+        message = str(err)
+        assert "not available" in message.lower() or "retired" in message.lower()
+        # Error message should reference the actual model text from Google.
+        assert "gemma-4-26b-it" in message
+
+    def test_unauthorized_preserves_status_code(self):
+        from agent.gemini_cloudcode_adapter import _gemini_http_error
+
+        err = _gemini_http_error(self._fake_response(
+            401, {"error": {"code": 401, "message": "Invalid token", "status": "UNAUTHENTICATED"}},
+        ))
+        assert err.status_code == 401
+        assert err.code == "code_assist_unauthorized"
+
+    def test_retry_after_header_fallback(self):
+        """If the body has no RetryInfo detail, fall back to Retry-After header."""
+        from agent.gemini_cloudcode_adapter import _gemini_http_error
+
+        resp = self._fake_response(
+            429,
+            {"error": {"code": 429, "message": "Rate limited", "status": "RESOURCE_EXHAUSTED"}},
+            headers={"Retry-After": "45"},
+        )
+        err = _gemini_http_error(resp)
+        assert err.retry_after == 45.0
+
+    def test_malformed_body_still_produces_structured_error(self):
+        """Non-JSON body must not swallow status_code — we still want the classifier path."""
+        from agent.gemini_cloudcode_adapter import _gemini_http_error
+
+        err = _gemini_http_error(self._fake_response(500, "<html>internal error</html>"))
+        assert err.status_code == 500
+        # Raw body snippet must still be there for debugging.
+        assert "500" in str(err)
+
+    def test_status_code_flows_through_error_classifier(self):
+        """End-to-end: CodeAssistError from a 429 must classify as rate_limit.
+
+        This is the whole point of adding status_code to CodeAssistError —
+        _extract_status_code must see it and FailoverReason.rate_limit must
+        fire, so the main loop triggers fallback_providers.
+        """
+        from agent.gemini_cloudcode_adapter import _gemini_http_error
+        from agent.error_classifier import classify_api_error, FailoverReason
+
+        body = {
+            "error": {
+                "code": 429,
+                "message": "Resource has been exhausted",
+                "status": "RESOURCE_EXHAUSTED",
+                "details": [
+                    {
+                        "@type": "type.googleapis.com/google.rpc.ErrorInfo",
+                        "reason": "MODEL_CAPACITY_EXHAUSTED",
+                        "metadata": {"model": "gemini-2.5-pro"},
+                    }
+                ],
+            }
+        }
+        err = _gemini_http_error(self._fake_response(429, body))
+
+        classified = classify_api_error(
+            err, provider="google-gemini-cli", model="gemini-2.5-pro",
+        )
+        assert classified.status_code == 429
+        assert classified.reason == FailoverReason.rate_limit
+
+
 # =============================================================================
 # Provider registration
 # =============================================================================
@@ -0,0 +1,71 @@
+"""Tests for CLI /copy command."""
+
+from unittest.mock import MagicMock, patch
+
+from cli import HermesCLI
+
+
+def _make_cli() -> HermesCLI:
+    cli_obj = HermesCLI.__new__(HermesCLI)
+    cli_obj.config = {}
+    cli_obj.console = MagicMock()
+    cli_obj.agent = None
+    cli_obj.conversation_history = []
+    cli_obj.session_id = "sess-copy-test"
+    cli_obj._pending_input = MagicMock()
+    cli_obj._app = None
+    return cli_obj
+
+
+def test_copy_copies_latest_assistant_message():
+    cli_obj = _make_cli()
+    cli_obj.conversation_history = [
+        {"role": "user", "content": "hi"},
+        {"role": "assistant", "content": "first"},
+        {"role": "assistant", "content": "latest"},
+    ]
+
+    with patch.object(cli_obj, "_write_osc52_clipboard") as mock_copy:
+        result = cli_obj.process_command("/copy")
+
+    assert result is True
+    mock_copy.assert_called_once_with("latest")
+
+
+def test_copy_with_index_uses_requested_assistant_message():
+    cli_obj = _make_cli()
+    cli_obj.conversation_history = [
+        {"role": "assistant", "content": "one"},
+        {"role": "assistant", "content": "two"},
+    ]
+
+    with patch.object(cli_obj, "_write_osc52_clipboard") as mock_copy:
+        cli_obj.process_command("/copy 1")
+
+    mock_copy.assert_called_once_with("one")
+
+
+def test_copy_strips_reasoning_blocks_before_copy():
+    cli_obj = _make_cli()
+    cli_obj.conversation_history = [
+        {
+            "role": "assistant",
+            "content": "<REASONING_SCRATCHPAD>internal</REASONING_SCRATCHPAD>\nVisible answer",
+        }
+    ]
+
+    with patch.object(cli_obj, "_write_osc52_clipboard") as mock_copy:
+        cli_obj.process_command("/copy")
+
+    mock_copy.assert_called_once_with("Visible answer")
+
+
+def test_copy_invalid_index_does_not_copy():
+    cli_obj = _make_cli()
+    cli_obj.conversation_history = [{"role": "assistant", "content": "only"}]
+
+    with patch.object(cli_obj, "_write_osc52_clipboard") as mock_copy, patch("cli._cprint") as mock_print:
+        cli_obj.process_command("/copy 99")
+
+    mock_copy.assert_not_called()
+    assert any("Invalid response number" in str(call) for call in mock_print.call_args_list)
@@ -2,7 +2,8 @@

 Surrogates (U+D800..U+DFFF) are invalid in UTF-8 and crash json.dumps()
 inside the OpenAI SDK. They can appear via clipboard paste from rich-text
-editors like Google Docs.
+editors like Google Docs, OR from byte-level reasoning models (xiaomi/mimo,
+kimi, glm) emitting lone halves in reasoning output.
 """
 import json
 import pytest
@@ -11,6 +12,7 @@ from unittest.mock import MagicMock, patch
 from run_agent import (
    _sanitize_surrogates,
    _sanitize_messages_surrogates,
+    _sanitize_structure_surrogates,
    _SURROGATE_RE,
 )

@@ -109,6 +111,186 @@ class TestSanitizeMessagesSurrogates:
        assert "\ufffd" in msgs[0]["content"]


+class TestReasoningFieldSurrogates:
+    """Surrogates in reasoning fields (byte-level reasoning models).
+
+    xiaomi/mimo, kimi, glm and similar byte-level tokenizers can emit lone
+    surrogates in reasoning output. These fields are carried through to the
+    API as `reasoning_content` on assistant messages, and must be sanitized
+    or json.dumps() crashes with 'utf-8' codec can't encode surrogates.
+    """
+
+    def test_reasoning_field_sanitized(self):
+        msgs = [
+            {"role": "assistant", "content": "ok", "reasoning": "thought \udce2 here"},
+        ]
+        assert _sanitize_messages_surrogates(msgs) is True
+        assert "\udce2" not in msgs[0]["reasoning"]
+        assert "\ufffd" in msgs[0]["reasoning"]
+
+    def test_reasoning_content_field_sanitized(self):
+        """api_messages carry `reasoning_content` built from `reasoning`."""
+        msgs = [
+            {"role": "assistant", "content": "ok", "reasoning_content": "thought \udce2 here"},
+        ]
+        assert _sanitize_messages_surrogates(msgs) is True
+        assert "\udce2" not in msgs[0]["reasoning_content"]
+        assert "\ufffd" in msgs[0]["reasoning_content"]
+
+    def test_reasoning_details_nested_sanitized(self):
+        """reasoning_details is a list of dicts with nested string fields."""
+        msgs = [
+            {
+                "role": "assistant",
+                "content": "ok",
+                "reasoning_details": [
+                    {"type": "reasoning.summary", "summary": "summary \udce2 text"},
+                    {"type": "reasoning.text", "text": "chain \udc00 of thought"},
+                ],
+            },
+        ]
+        assert _sanitize_messages_surrogates(msgs) is True
+        assert "\udce2" not in msgs[0]["reasoning_details"][0]["summary"]
+        assert "\ufffd" in msgs[0]["reasoning_details"][0]["summary"]
+        assert "\udc00" not in msgs[0]["reasoning_details"][1]["text"]
+        assert "\ufffd" in msgs[0]["reasoning_details"][1]["text"]
+
+    def test_deeply_nested_reasoning_sanitized(self):
+        """Nested dicts / lists inside extra fields are recursed into."""
+        msgs = [
+            {
+                "role": "assistant",
+                "content": "ok",
+                "reasoning_details": [
+                    {
+                        "type": "reasoning.encrypted",
+                        "content": {
+                            "encrypted_content": "opaque",
+                            "text_parts": ["part1", "part2 \udce2 part"],
+                        },
+                    },
+                ],
+            },
+        ]
+        assert _sanitize_messages_surrogates(msgs) is True
+        assert (
+            msgs[0]["reasoning_details"][0]["content"]["text_parts"][1]
+            == "part2 \ufffd part"
+        )
+
+    def test_reasoning_end_to_end_json_serialization(self):
+        """After sanitization, the full message dict must serialize clean."""
+        msgs = [
+            {
+                "role": "assistant",
+                "content": "answer",
+                "reasoning_content": "reasoning with \udce2 surrogate",
+                "reasoning_details": [
+                    {"summary": "nested \udcb0 surrogate"},
+                ],
+            },
+        ]
+        _sanitize_messages_surrogates(msgs)
+        # Must round-trip through json + utf-8 encoding without error
+        payload = json.dumps(msgs, ensure_ascii=False).encode("utf-8")
+        assert b"\\" not in payload[:0]  # sanity — just ensure we got bytes
+        assert len(payload) > 0
+
+    def test_no_surrogates_returns_false(self):
+        """Clean reasoning fields don't trigger a modification."""
+        msgs = [
+            {
+                "role": "assistant",
+                "content": "ok",
+                "reasoning": "clean thought",
+                "reasoning_content": "also clean",
+                "reasoning_details": [{"summary": "clean summary"}],
+            },
+        ]
+        assert _sanitize_messages_surrogates(msgs) is False
+
+
+class TestSanitizeStructureSurrogates:
+    """Test the _sanitize_structure_surrogates() helper for nested payloads."""
+
+    def test_empty_payload(self):
+        assert _sanitize_structure_surrogates({}) is False
+        assert _sanitize_structure_surrogates([]) is False
+
+    def test_flat_dict(self):
+        payload = {"a": "clean", "b": "dirty \udce2 text"}
+        assert _sanitize_structure_surrogates(payload) is True
+        assert payload["a"] == "clean"
+        assert "\ufffd" in payload["b"]
+
+    def test_flat_list(self):
+        payload = ["clean", "dirty \udce2"]
+        assert _sanitize_structure_surrogates(payload) is True
+        assert payload[0] == "clean"
+        assert "\ufffd" in payload[1]
+
+    def test_nested_dict_in_list(self):
+        payload = [{"x": "dirty \udce2"}, {"x": "clean"}]
+        assert _sanitize_structure_surrogates(payload) is True
+        assert "\ufffd" in payload[0]["x"]
+        assert payload[1]["x"] == "clean"
+
+    def test_deeply_nested(self):
+        payload = {
+            "level1": {
+                "level2": [
+                    {"level3": "deep \udce2 surrogate"},
+                ],
+            },
+        }
+        assert _sanitize_structure_surrogates(payload) is True
+        assert "\ufffd" in payload["level1"]["level2"][0]["level3"]
+
+    def test_clean_payload_returns_false(self):
+        payload = {"a": "clean", "b": [{"c": "also clean"}]}
+        assert _sanitize_structure_surrogates(payload) is False
+
+    def test_non_string_values_ignored(self):
+        payload = {"int": 42, "list": [1, 2, 3], "dict": {"none": None}, "bool": True}
+        assert _sanitize_structure_surrogates(payload) is False
+        # Non-string values survive unchanged
+        assert payload["int"] == 42
+        assert payload["list"] == [1, 2, 3]
+
+
+class TestApiMessagesSurrogateRecovery:
+    """Integration: verify the recovery block sanitizes api_messages.
+
+    The bug this guards against: a surrogate in `reasoning_content` on
+    api_messages (transformed from `reasoning` during build) crashes the
+    OpenAI SDK's json.dumps(), and the recovery block previously only
+    sanitized the canonical `messages` list — not `api_messages` — so the
+    next retry would send the same broken payload and fail 3 times.
+    """
+
+    def test_api_messages_reasoning_content_sanitized(self):
+        """The extended sanitizer catches reasoning_content in api_messages."""
+        api_messages = [
+            {"role": "system", "content": "sys"},
+            {
+                "role": "assistant",
+                "content": "response",
+                "reasoning_content": "thought \udce2 trail",
+                "tool_calls": [
+                    {
+                        "id": "call_1",
+                        "function": {"name": "tool", "arguments": "{}"},
+                    }
+                ],
+            },
+            {"role": "tool", "content": "result", "tool_call_id": "call_1"},
+        ]
+        assert _sanitize_messages_surrogates(api_messages) is True
+        assert "\udce2" not in api_messages[1]["reasoning_content"]
+        # Full payload must now serialize clean
+        json.dumps(api_messages, ensure_ascii=False).encode("utf-8")
+
+
 class TestRunConversationSurrogateSanitization:
    """Integration: verify run_conversation sanitizes user_message."""

@@ -184,6 +184,8 @@ _HERMES_BEHAVIORAL_VARS = frozenset({
    "HERMES_BACKGROUND_NOTIFICATIONS",
    "HERMES_EXEC_ASK",
    "HERMES_HOME_MODE",
+    "BROWSER_CDP_URL",
+    "CAMOFOX_URL",
 })


@@ -229,6 +231,15 @@ def _hermetic_environment(tmp_path, monkeypatch):
    monkeypatch.setenv("LC_ALL", "C.UTF-8")
    monkeypatch.setenv("PYTHONHASHSEED", "0")

+    # 4b. Disable AWS IMDS lookups. Without this, any test that ends up
+    #     calling has_aws_credentials() / resolve_aws_auth_env_var()
+    #     (e.g. provider auto-detect, status command, cron run_job) burns
+    #     ~2s waiting for the metadata service at 169.254.169.254 to time
+    #     out. Tests don't run on EC2 — IMDS is always unreachable here.
+    monkeypatch.setenv("AWS_EC2_METADATA_DISABLED", "true")
+    monkeypatch.setenv("AWS_METADATA_SERVICE_TIMEOUT", "1")
+    monkeypatch.setenv("AWS_METADATA_SERVICE_NUM_ATTEMPTS", "1")
+
    # 5. Reset plugin singleton so tests don't leak plugins from
    #    ~/.hermes/plugins/ (which, per step 3, is now empty — but the
    #    singleton might still be cached from a previous test).
@@ -160,6 +160,30 @@ class TestCommandBypassActiveSession:
        assert sk not in adapter._pending_messages
        assert any("handled:status" in r for r in adapter.sent_responses)

+    @pytest.mark.asyncio
+    async def test_agents_bypasses_guard(self):
+        """/agents must bypass so active-task queries don't interrupt runs."""
+        adapter = _make_adapter()
+        sk = _session_key()
+        adapter._active_sessions[sk] = asyncio.Event()
+
+        await adapter.handle_message(_make_event("/agents"))
+
+        assert sk not in adapter._pending_messages
+        assert any("handled:agents" in r for r in adapter.sent_responses)
+
+    @pytest.mark.asyncio
+    async def test_tasks_alias_bypasses_guard(self):
+        """/tasks alias must bypass active-session guard too."""
+        adapter = _make_adapter()
+        sk = _session_key()
+        adapter._active_sessions[sk] = asyncio.Event()
+
+        await adapter.handle_message(_make_event("/tasks"))
+
+        assert sk not in adapter._pending_messages
+        assert any("handled:tasks" in r for r in adapter.sent_responses)
+
    @pytest.mark.asyncio
    async def test_background_bypasses_guard(self):
        """/background must bypass so it spawns a parallel task, not an interrupt."""
@@ -179,7 +179,7 @@ class TestVoiceAttachmentSSRFProtection:
        from gateway.platforms.qqbot import QQAdapter, _ssrf_redirect_guard

        client = mock.AsyncMock()
-        with mock.patch("gateway.platforms.qqbot.httpx.AsyncClient", return_value=client) as async_client_cls:
+        with mock.patch("gateway.platforms.qqbot.adapter.httpx.AsyncClient", return_value=client) as async_client_cls:
            adapter = QQAdapter(_make_config(app_id="a", client_secret="b"))
            adapter._ensure_token = mock.AsyncMock(side_effect=RuntimeError("stop after client creation"))

@@ -0,0 +1,231 @@
+"""Regression tests for _release_running_agent_state and SessionDB shutdown.
+
+Before this change, running-agent state lived in three dicts that drifted
+out of sync:
+
+  self._running_agents       — AIAgent instance per session key
+  self._running_agents_ts    — start timestamp per session key
+  self._busy_ack_ts          — last busy-ack timestamp per session key
+
+Six cleanup sites did ``del self._running_agents[key]`` without touching
+the other two; one site only popped ``_running_agents`` and
+``_running_agents_ts``; and only the stale-eviction site cleaned all
+three.  Each missed entry was a small persistent leak.
+
+Also: SessionDB connections were never closed on gateway shutdown,
+leaving WAL locks in place until Python actually exited.
+"""
+
+import threading
+from unittest.mock import MagicMock
+
+import pytest
+
+
+def _make_runner():
+    """Bare GatewayRunner wired with just the state the helper touches."""
+    from gateway.run import GatewayRunner
+
+    runner = GatewayRunner.__new__(GatewayRunner)
+    runner._running_agents = {}
+    runner._running_agents_ts = {}
+    runner._busy_ack_ts = {}
+    return runner
+
+
+class TestReleaseRunningAgentStateUnit:
+    def test_pops_all_three_dicts(self):
+        runner = _make_runner()
+        runner._running_agents["k"] = MagicMock()
+        runner._running_agents_ts["k"] = 123.0
+        runner._busy_ack_ts["k"] = 456.0
+
+        runner._release_running_agent_state("k")
+
+        assert "k" not in runner._running_agents
+        assert "k" not in runner._running_agents_ts
+        assert "k" not in runner._busy_ack_ts
+
+    def test_idempotent_on_missing_key(self):
+        """Calling twice (or on an absent key) must not raise."""
+        runner = _make_runner()
+        runner._release_running_agent_state("missing")
+        runner._release_running_agent_state("missing")  # still fine
+
+    def test_noop_on_empty_session_key(self):
+        """Empty string / None key is treated as a no-op."""
+        runner = _make_runner()
+        runner._running_agents[""] = "guard"
+        runner._release_running_agent_state("")
+        # Empty key not processed — guard value survives.
+        assert runner._running_agents[""] == "guard"
+
+    def test_preserves_other_sessions(self):
+        runner = _make_runner()
+        for k in ("a", "b", "c"):
+            runner._running_agents[k] = MagicMock()
+            runner._running_agents_ts[k] = 1.0
+            runner._busy_ack_ts[k] = 1.0
+
+        runner._release_running_agent_state("b")
+
+        assert set(runner._running_agents.keys()) == {"a", "c"}
+        assert set(runner._running_agents_ts.keys()) == {"a", "c"}
+        assert set(runner._busy_ack_ts.keys()) == {"a", "c"}
+
+    def test_handles_missing_busy_ack_attribute(self):
+        """Backward-compatible with older runners lacking _busy_ack_ts."""
+        runner = _make_runner()
+        del runner._busy_ack_ts  # simulate older version
+        runner._running_agents["k"] = MagicMock()
+        runner._running_agents_ts["k"] = 1.0
+
+        runner._release_running_agent_state("k")  # should not raise
+
+        assert "k" not in runner._running_agents
+        assert "k" not in runner._running_agents_ts
+
+    def test_concurrent_release_is_safe(self):
+        """Multiple threads releasing different keys concurrently."""
+        runner = _make_runner()
+        for i in range(50):
+            k = f"s{i}"
+            runner._running_agents[k] = MagicMock()
+            runner._running_agents_ts[k] = float(i)
+            runner._busy_ack_ts[k] = float(i)
+
+        def worker(keys):
+            for k in keys:
+                runner._release_running_agent_state(k)
+
+        threads = [
+            threading.Thread(target=worker, args=([f"s{i}" for i in range(start, 50, 5)],))
+            for start in range(5)
+        ]
+        for t in threads:
+            t.start()
+        for t in threads:
+            t.join(timeout=5)
+            assert not t.is_alive()
+
+        assert runner._running_agents == {}
+        assert runner._running_agents_ts == {}
+        assert runner._busy_ack_ts == {}
+
+
+class TestNoMoreBareDeleteSites:
+    """Regression: all bare `del self._running_agents[key]` sites were
+    converted to use the helper.  If a future contributor reverts one,
+    this test flags it.  Docstrings / comments mentioning the old
+    pattern are allowed.
+    """
+
+    def test_no_bare_del_of_running_agents_in_gateway_run(self):
+        from pathlib import Path
+        import re
+
+        gateway_run = (Path(__file__).parent.parent.parent / "gateway" / "run.py").read_text()
+        # Match `del self._running_agents[...]` that is NOT inside a
+        # triple-quoted docstring.  We scan non-docstring lines only.
+        lines = gateway_run.splitlines()
+
+        in_docstring = False
+        docstring_delim = None
+        offenders = []
+        for idx, line in enumerate(lines, start=1):
+            stripped = line.strip()
+            if not in_docstring:
+                if stripped.startswith('"""') or stripped.startswith("'''"):
+                    delim = stripped[:3]
+                    # single-line docstring?
+                    if stripped.count(delim) >= 2:
+                        continue
+                    in_docstring = True
+                    docstring_delim = delim
+                    continue
+                if re.search(r"\bdel\s+self\._running_agents\[", line):
+                    offenders.append((idx, line.rstrip()))
+            else:
+                if docstring_delim and docstring_delim in stripped:
+                    in_docstring = False
+                    docstring_delim = None
+
+        assert offenders == [], (
+            "Found bare `del self._running_agents[...]` sites in gateway/run.py. "
+            "Use self._release_running_agent_state(session_key) instead so "
+            "_running_agents_ts and _busy_ack_ts are popped in lockstep.\n"
+            + "\n".join(f"  line {n}: {l}" for n, l in offenders)
+        )
+
+
+class TestSessionDbCloseOnShutdown:
+    """_stop_impl should call .close() on both self._session_db and
+    self.session_store._db to release SQLite WAL locks before the new
+    gateway (during --replace restart) tries to open the same file.
+    """
+
+    def test_stop_impl_closes_both_session_dbs(self):
+        """Run the exact shutdown block that closes SessionDBs and verify
+        .close() was called on both holders."""
+        from gateway.run import GatewayRunner
+
+        runner = GatewayRunner.__new__(GatewayRunner)
+
+        runner_db = MagicMock()
+        store_db = MagicMock()
+
+        runner._db = runner_db
+        runner.session_store = MagicMock()
+        runner.session_store._db = store_db
+
+        # Replicate the exact production loop from _stop_impl.
+        for _db_holder in (runner, getattr(runner, "session_store", None)):
+            _db = getattr(_db_holder, "_db", None) if _db_holder else None
+            if _db is None or not hasattr(_db, "close"):
+                continue
+            _db.close()
+
+        runner_db.close.assert_called_once()
+        store_db.close.assert_called_once()
+
+    def test_shutdown_tolerates_missing_session_store(self):
+        """Gateway without a session_store attribute must not crash on shutdown."""
+        from gateway.run import GatewayRunner
+
+        runner = GatewayRunner.__new__(GatewayRunner)
+        runner._db = MagicMock()
+        # Deliberately no session_store attribute.
+
+        for _db_holder in (runner, getattr(runner, "session_store", None)):
+            _db = getattr(_db_holder, "_db", None) if _db_holder else None
+            if _db is None or not hasattr(_db, "close"):
+                continue
+            _db.close()
+
+        runner._db.close.assert_called_once()
+
+    def test_shutdown_tolerates_close_raising(self):
+        """A close() that raises must not prevent subsequent cleanup."""
+        from gateway.run import GatewayRunner
+
+        runner = GatewayRunner.__new__(GatewayRunner)
+        flaky_db = MagicMock()
+        flaky_db.close.side_effect = RuntimeError("simulated lock error")
+        healthy_db = MagicMock()
+
+        runner._db = flaky_db
+        runner.session_store = MagicMock()
+        runner.session_store._db = healthy_db
+
+        # Same pattern as production: try/except around each close().
+        for _db_holder in (runner, getattr(runner, "session_store", None)):
+            _db = getattr(_db_holder, "_db", None) if _db_holder else None
+            if _db is None or not hasattr(_db, "close"):
+                continue
+            try:
+                _db.close()
+            except Exception:
+                pass
+
+        flaky_db.close.assert_called_once()
+        healthy_db.close.assert_called_once()
@@ -0,0 +1,270 @@
+"""Tests for SessionStore.prune_old_entries and the gateway watcher that calls it.
+
+The SessionStore in-memory dict (and its backing sessions.json) grew
+unbounded — every unique (platform, chat_id, thread_id, user_id) tuple
+ever seen was kept forever, regardless of how stale it became.  These
+tests pin the prune behaviour:
+
+  * Entries older than max_age_days (by updated_at) are removed
+  * Entries marked ``suspended`` are preserved (user-paused)
+  * Entries with an active process attached are preserved
+  * max_age_days <= 0 disables pruning entirely
+  * sessions.json is rewritten with the post-prune dict
+  * The ``updated_at`` field — not ``created_at`` — drives the decision
+    (so a long-running-but-still-active session isn't pruned)
+"""
+
+import json
+import threading
+from datetime import datetime, timedelta
+from unittest.mock import patch
+
+import pytest
+
+from gateway.config import GatewayConfig, Platform, SessionResetPolicy
+from gateway.session import SessionEntry, SessionStore
+
+
+def _make_store(tmp_path, max_age_days: int = 90, has_active_processes_fn=None):
+    """Build a SessionStore bypassing SQLite/disk-load side effects."""
+    config = GatewayConfig(
+        default_reset_policy=SessionResetPolicy(mode="none"),
+        session_store_max_age_days=max_age_days,
+    )
+    with patch("gateway.session.SessionStore._ensure_loaded"):
+        store = SessionStore(
+            sessions_dir=tmp_path,
+            config=config,
+            has_active_processes_fn=has_active_processes_fn,
+        )
+    store._db = None
+    store._loaded = True
+    return store
+
+
+def _entry(key: str, age_days: float, *, suspended: bool = False,
+           session_id: str | None = None) -> SessionEntry:
+    now = datetime.now()
+    return SessionEntry(
+        session_key=key,
+        session_id=session_id or f"sid_{key}",
+        created_at=now - timedelta(days=age_days + 30),  # arbitrary older
+        updated_at=now - timedelta(days=age_days),
+        platform=Platform.TELEGRAM,
+        chat_type="dm",
+        suspended=suspended,
+    )
+
+
+class TestPruneBasics:
+    def test_prune_removes_entries_past_max_age(self, tmp_path):
+        store = _make_store(tmp_path)
+        store._entries["old"] = _entry("old", age_days=100)
+        store._entries["fresh"] = _entry("fresh", age_days=5)
+
+        removed = store.prune_old_entries(max_age_days=90)
+
+        assert removed == 1
+        assert "old" not in store._entries
+        assert "fresh" in store._entries
+
+    def test_prune_uses_updated_at_not_created_at(self, tmp_path):
+        """A session created long ago but updated recently must be kept."""
+        store = _make_store(tmp_path)
+        now = datetime.now()
+        entry = SessionEntry(
+            session_key="long-lived",
+            session_id="sid",
+            created_at=now - timedelta(days=365),   # ancient
+            updated_at=now - timedelta(days=3),     # but just chatted
+            platform=Platform.TELEGRAM,
+            chat_type="dm",
+        )
+        store._entries["long-lived"] = entry
+
+        removed = store.prune_old_entries(max_age_days=30)
+
+        assert removed == 0
+        assert "long-lived" in store._entries
+
+    def test_prune_disabled_when_max_age_is_zero(self, tmp_path):
+        store = _make_store(tmp_path, max_age_days=0)
+        for i in range(5):
+            store._entries[f"s{i}"] = _entry(f"s{i}", age_days=365)
+
+        assert store.prune_old_entries(0) == 0
+        assert len(store._entries) == 5
+
+    def test_prune_disabled_when_max_age_is_negative(self, tmp_path):
+        store = _make_store(tmp_path)
+        store._entries["s"] = _entry("s", age_days=365)
+
+        assert store.prune_old_entries(-1) == 0
+        assert "s" in store._entries
+
+    def test_prune_skips_suspended_entries(self, tmp_path):
+        """/stop-suspended sessions must be kept for later resume."""
+        store = _make_store(tmp_path)
+        store._entries["suspended"] = _entry(
+            "suspended", age_days=1000, suspended=True
+        )
+        store._entries["idle"] = _entry("idle", age_days=1000)
+
+        removed = store.prune_old_entries(max_age_days=90)
+
+        assert removed == 1
+        assert "suspended" in store._entries
+        assert "idle" not in store._entries
+
+    def test_prune_skips_entries_with_active_processes(self, tmp_path):
+        """Sessions with active bg processes aren't pruned even if old."""
+        active_session_ids = {"sid_active"}
+
+        def _has_active(session_id: str) -> bool:
+            return session_id in active_session_ids
+
+        store = _make_store(tmp_path, has_active_processes_fn=_has_active)
+        store._entries["active"] = _entry(
+            "active", age_days=1000, session_id="sid_active"
+        )
+        store._entries["idle"] = _entry(
+            "idle", age_days=1000, session_id="sid_idle"
+        )
+
+        removed = store.prune_old_entries(max_age_days=90)
+
+        assert removed == 1
+        assert "active" in store._entries
+        assert "idle" not in store._entries
+
+    def test_prune_does_not_write_disk_when_no_removals(self, tmp_path):
+        """If nothing is evictable, _save() should NOT be called."""
+        store = _make_store(tmp_path)
+        store._entries["fresh1"] = _entry("fresh1", age_days=1)
+        store._entries["fresh2"] = _entry("fresh2", age_days=2)
+
+        save_calls = []
+        store._save = lambda: save_calls.append(1)
+
+        assert store.prune_old_entries(max_age_days=90) == 0
+        assert save_calls == []
+
+    def test_prune_writes_disk_after_removal(self, tmp_path):
+        store = _make_store(tmp_path)
+        store._entries["stale"] = _entry("stale", age_days=500)
+        store._entries["fresh"] = _entry("fresh", age_days=1)
+
+        save_calls = []
+        store._save = lambda: save_calls.append(1)
+
+        store.prune_old_entries(max_age_days=90)
+        assert save_calls == [1]
+
+    def test_prune_is_thread_safe(self, tmp_path):
+        """Prune acquires _lock internally; concurrent update_session is safe."""
+        store = _make_store(tmp_path)
+        for i in range(20):
+            age = 1000 if i % 2 == 0 else 1
+            store._entries[f"s{i}"] = _entry(f"s{i}", age_days=age)
+
+        results = []
+
+        def _pruner():
+            results.append(store.prune_old_entries(max_age_days=90))
+
+        def _reader():
+            # Mimic a concurrent update_session reader iterating under lock.
+            with store._lock:
+                list(store._entries.keys())
+
+        threads = [threading.Thread(target=_pruner)]
+        threads += [threading.Thread(target=_reader) for _ in range(4)]
+        for t in threads:
+            t.start()
+        for t in threads:
+            t.join(timeout=5)
+            assert not t.is_alive()
+
+        # Exactly one pruner ran; removed exactly the 10 stale entries.
+        assert results == [10]
+        assert len(store._entries) == 10
+        for i in range(20):
+            if i % 2 == 1:  # fresh
+                assert f"s{i}" in store._entries
+
+
+class TestPrunePersistsToDisk:
+    def test_prune_rewrites_sessions_json(self, tmp_path):
+        """After prune, sessions.json on disk reflects the new dict."""
+        config = GatewayConfig(
+            default_reset_policy=SessionResetPolicy(mode="none"),
+            session_store_max_age_days=90,
+        )
+        store = SessionStore(sessions_dir=tmp_path, config=config)
+        store._db = None
+        # Force-populate without calling get_or_create to avoid DB side-effects
+        store._entries["stale"] = _entry("stale", age_days=500)
+        store._entries["fresh"] = _entry("fresh", age_days=1)
+        store._loaded = True
+        store._save()
+
+        # Verify pre-prune state on disk.
+        saved_pre = json.loads((tmp_path / "sessions.json").read_text())
+        assert set(saved_pre.keys()) == {"stale", "fresh"}
+
+        # Prune and check disk.
+        store.prune_old_entries(max_age_days=90)
+        saved_post = json.loads((tmp_path / "sessions.json").read_text())
+        assert set(saved_post.keys()) == {"fresh"}
+
+
+class TestGatewayConfigSerialization:
+    def test_session_store_max_age_days_defaults_to_90(self):
+        cfg = GatewayConfig()
+        assert cfg.session_store_max_age_days == 90
+
+    def test_session_store_max_age_days_roundtrips(self):
+        cfg = GatewayConfig(session_store_max_age_days=30)
+        restored = GatewayConfig.from_dict(cfg.to_dict())
+        assert restored.session_store_max_age_days == 30
+
+    def test_session_store_max_age_days_missing_defaults_90(self):
+        """Loading an old config (pre-this-field) falls back to default."""
+        restored = GatewayConfig.from_dict({})
+        assert restored.session_store_max_age_days == 90
+
+    def test_session_store_max_age_days_negative_coerced_to_zero(self):
+        """A negative value (accidental or hostile) becomes 0 (disabled)."""
+        restored = GatewayConfig.from_dict({"session_store_max_age_days": -5})
+        assert restored.session_store_max_age_days == 0
+
+    def test_session_store_max_age_days_bad_type_falls_back(self):
+        """Non-int values fall back to the default, not a crash."""
+        restored = GatewayConfig.from_dict({"session_store_max_age_days": "nope"})
+        assert restored.session_store_max_age_days == 90
+
+
+class TestGatewayWatcherCallsPrune:
+    """The session_expiry_watcher should call prune_old_entries once per hour."""
+
+    def test_prune_gate_fires_on_first_tick(self):
+        """First watcher tick has _last_prune_ts=0, so the gate opens."""
+        import time as _t
+
+        last_ts = 0.0
+        prune_interval = 3600.0
+        now = _t.time()
+
+        # Mirror the production gate check in _session_expiry_watcher.
+        should_prune = (now - last_ts) > prune_interval
+        assert should_prune is True
+
+    def test_prune_gate_suppresses_within_interval(self):
+        import time as _t
+
+        last_ts = _t.time() - 600  # 10 minutes ago
+        prune_interval = 3600.0
+        now = _t.time()
+
+        should_prune = (now - last_ts) > prune_interval
+        assert should_prune is False
@@ -1,6 +1,7 @@
 """Tests for gateway /status behavior and token persistence."""

 from datetime import datetime
+import time
 from types import SimpleNamespace
 from unittest.mock import AsyncMock, MagicMock

@@ -111,6 +112,75 @@ async def test_status_command_includes_session_title_when_present():
    assert "**Title:** My titled session" in result


+@pytest.mark.asyncio
+async def test_agents_command_reports_active_agents_and_processes(monkeypatch):
+    session_key = build_session_key(_make_source())
+    session_entry = SessionEntry(
+        session_key=session_key,
+        session_id="sess-1",
+        created_at=datetime.now(),
+        updated_at=datetime.now(),
+        platform=Platform.TELEGRAM,
+        chat_type="dm",
+        total_tokens=0,
+    )
+    runner = _make_runner(session_entry)
+    running_agent = SimpleNamespace(
+        session_id="sess-running",
+        model="openrouter/test-model",
+        interrupt=MagicMock(),
+        get_activity_summary=lambda: {"seconds_since_activity": 0},
+    )
+    runner._running_agents[session_key] = running_agent
+    runner._running_agents_ts = {session_key: time.time() - 8}
+    runner._background_tasks = set()
+
+    class _FakeRegistry:
+        def list_sessions(self):
+            return [
+                {
+                    "session_id": "proc-1",
+                    "status": "running",
+                    "uptime_seconds": 17,
+                    "command": "sleep 30",
+                }
+            ]
+
+    monkeypatch.setattr("tools.process_registry.process_registry", _FakeRegistry())
+
+    result = await runner._handle_message(_make_event("/agents"))
+
+    assert "**Active agents:** 1" in result
+    assert "**Running background processes:** 1" in result
+    assert "proc-1" in result
+    running_agent.interrupt.assert_not_called()
+
+
+@pytest.mark.asyncio
+async def test_tasks_alias_routes_to_agents_command(monkeypatch):
+    session_entry = SessionEntry(
+        session_key=build_session_key(_make_source()),
+        session_id="sess-1",
+        created_at=datetime.now(),
+        updated_at=datetime.now(),
+        platform=Platform.TELEGRAM,
+        chat_type="dm",
+        total_tokens=0,
+    )
+    runner = _make_runner(session_entry)
+    runner._background_tasks = set()
+
+    class _FakeRegistry:
+        def list_sessions(self):
+            return []
+
+    monkeypatch.setattr("tools.process_registry.process_registry", _FakeRegistry())
+
+    result = await runner._handle_message(_make_event("/tasks"))
+
+    assert "Active Agents & Tasks" in result
+
+
@pytest.mark.asyncio
 async def test_handle_message_persists_agent_token_counts(monkeypatch):
    import gateway.run as gateway_run
@@ -34,7 +34,12 @@ def _ensure_telegram_mock():

 _ensure_telegram_mock()

-from gateway.platforms.telegram import TelegramAdapter, _escape_mdv2, _strip_mdv2  # noqa: E402
+from gateway.platforms.telegram import (  # noqa: E402
+    TelegramAdapter,
+    _escape_mdv2,
+    _strip_mdv2,
+    _wrap_markdown_tables,
+)


 # ---------------------------------------------------------------------------
@@ -535,6 +540,152 @@ class TestStripMdv2:
        assert _strip_mdv2("||hidden text||") == "hidden text"


+# =========================================================================
+# Markdown table auto-wrap
+# =========================================================================
+
+
+class TestWrapMarkdownTables:
+    """_wrap_markdown_tables wraps GFM pipe tables in ``` fences so
+    Telegram renders them as monospace preformatted text instead of the
+    noisy backslash-pipe mess MarkdownV2 produces."""
+
+    def test_basic_table_wrapped(self):
+        text = (
+            "Scores:\n\n"
+            "| Player | Score |\n"
+            "|--------|-------|\n"
+            "| Alice  | 150   |\n"
+            "| Bob    | 120   |\n"
+            "\nEnd."
+        )
+        out = _wrap_markdown_tables(text)
+        # Table is now wrapped in a fence
+        assert "```\n| Player | Score |" in out
+        assert "| Bob    | 120   |\n```" in out
+        # Surrounding prose is preserved
+        assert out.startswith("Scores:")
+        assert out.endswith("End.")
+
+    def test_bare_pipe_table_wrapped(self):
+        """Tables without outer pipes (GFM allows this) are still detected."""
+        text = "head1 | head2\n--- | ---\na | b\nc | d"
+        out = _wrap_markdown_tables(text)
+        assert out.startswith("```\n")
+        assert out.rstrip().endswith("```")
+        assert "head1 | head2" in out
+
+    def test_alignment_separators(self):
+        """Separator rows with :--- / ---: / :---: alignment markers match."""
+        text = (
+            "| Name | Age | City |\n"
+            "|:-----|----:|:----:|\n"
+            "| Ada  |  30 | NYC  |"
+        )
+        out = _wrap_markdown_tables(text)
+        assert out.count("```") == 2
+
+    def test_two_consecutive_tables_wrapped_separately(self):
+        text = (
+            "| A | B |\n"
+            "|---|---|\n"
+            "| 1 | 2 |\n"
+            "\n"
+            "| X | Y |\n"
+            "|---|---|\n"
+            "| 9 | 8 |"
+        )
+        out = _wrap_markdown_tables(text)
+        # Four fences total — one opening + closing per table
+        assert out.count("```") == 4
+
+    def test_plain_text_with_pipes_not_wrapped(self):
+        """A bare pipe in prose must NOT trigger wrapping."""
+        text = "Use the | pipe operator to chain commands."
+        assert _wrap_markdown_tables(text) == text
+
+    def test_horizontal_rule_not_wrapped(self):
+        """A lone '---' horizontal rule must not be mistaken for a separator."""
+        text = "Section A\n\n---\n\nSection B"
+        assert _wrap_markdown_tables(text) == text
+
+    def test_existing_code_block_with_pipes_left_alone(self):
+        """A table already inside a fenced code block must not be re-wrapped."""
+        text = (
+            "```\n"
+            "| a | b |\n"
+            "|---|---|\n"
+            "| 1 | 2 |\n"
+            "```"
+        )
+        assert _wrap_markdown_tables(text) == text
+
+    def test_no_pipe_character_short_circuits(self):
+        text = "Plain **bold** text with no table."
+        assert _wrap_markdown_tables(text) == text
+
+    def test_no_dash_short_circuits(self):
+        text = "a | b\nc | d"  # has pipes but no '-' separator row
+        assert _wrap_markdown_tables(text) == text
+
+    def test_single_column_separator_not_matched(self):
+        """Single-column tables (rare) are not detected — we require at
+        least one internal pipe in the separator row to avoid false
+        positives on formatting rules."""
+        text = "| a |\n| - |\n| b |"
+        assert _wrap_markdown_tables(text) == text
+
+
+class TestFormatMessageTables:
+    """End-to-end: a pipe table passes through format_message with its
+    pipes and dashes left alone inside the fence, not mangled by MarkdownV2
+    escaping."""
+
+    def test_table_rendered_as_code_block(self, adapter):
+        text = (
+            "Data:\n\n"
+            "| Col1 | Col2 |\n"
+            "|------|------|\n"
+            "| A    | B    |\n"
+        )
+        out = adapter.format_message(text)
+        # Pipes inside the fenced block are NOT escaped
+        assert "```\n| Col1 | Col2 |" in out
+        assert "\\|" not in out.split("```")[1]
+        # Dashes in separator not escaped inside fence
+        assert "\\-" not in out.split("```")[1]
+
+    def test_text_after_table_still_formatted(self, adapter):
+        text = (
+            "| A | B |\n"
+            "|---|---|\n"
+            "| 1 | 2 |\n"
+            "\n"
+            "Nice **work** team!"
+        )
+        out = adapter.format_message(text)
+        # MarkdownV2 bold conversion still happens outside the table
+        assert "*work*" in out
+        # Exclamation outside fence is escaped
+        assert "\\!" in out
+
+    def test_multiple_tables_in_single_message(self, adapter):
+        text = (
+            "First:\n"
+            "| A | B |\n"
+            "|---|---|\n"
+            "| 1 | 2 |\n"
+            "\n"
+            "Second:\n"
+            "| X | Y |\n"
+            "|---|---|\n"
+            "| 9 | 8 |\n"
+        )
+        out = adapter.format_message(text)
+        # Two separate fenced blocks in the output
+        assert out.count("```") == 4
+
+
@pytest.mark.asyncio
 async def test_send_escapes_chunk_indicator_for_markdownv2(adapter):
    adapter.MAX_MESSAGE_LENGTH = 80
@@ -33,6 +33,7 @@ class TestProviderRegistry:
        ("huggingface", "Hugging Face", "api_key"),
        ("zai", "Z.AI / GLM", "api_key"),
        ("xai", "xAI", "api_key"),
+        ("nvidia", "NVIDIA NIM", "api_key"),
        ("kimi-coding", "Kimi / Moonshot", "api_key"),
        ("minimax", "MiniMax", "api_key"),
        ("minimax-cn", "MiniMax (China)", "api_key"),
@@ -57,6 +58,12 @@ class TestProviderRegistry:
        assert pconfig.base_url_env_var == "XAI_BASE_URL"
        assert pconfig.inference_base_url == "https://api.x.ai/v1"

+    def test_nvidia_env_vars(self):
+        pconfig = PROVIDER_REGISTRY["nvidia"]
+        assert pconfig.api_key_env_vars == ("NVIDIA_API_KEY",)
+        assert pconfig.base_url_env_var == "NVIDIA_BASE_URL"
+        assert pconfig.inference_base_url == "https://integrate.api.nvidia.com/v1"
+
    def test_copilot_env_vars(self):
        pconfig = PROVIDER_REGISTRY["copilot"]
        assert pconfig.api_key_env_vars == ("COPILOT_GITHUB_TOKEN", "GH_TOKEN", "GITHUB_TOKEN")
@@ -106,6 +106,49 @@ class TestCmdUpdateBranchFallback:
        pull_cmds = [c for c in commands if "pull" in c]
        assert len(pull_cmds) == 0

+    @patch("shutil.which")
+    @patch("subprocess.run")
+    def test_update_refreshes_repo_and_tui_node_dependencies(
+        self, mock_run, mock_which, mock_args
+    ):
+        mock_which.side_effect = {"uv": "/usr/bin/uv", "npm": "/usr/bin/npm"}.get
+        mock_run.side_effect = _make_run_side_effect(
+            branch="main", verify_ok=True, commit_count="1"
+        )
+
+        cmd_update(mock_args)
+
+        npm_calls = [
+            (call.args[0], call.kwargs.get("cwd"))
+            for call in mock_run.call_args_list
+            if call.args and call.args[0][0] == "/usr/bin/npm"
+        ]
+
+        assert npm_calls == [
+            (
+                [
+                    "/usr/bin/npm",
+                    "install",
+                    "--silent",
+                    "--no-fund",
+                    "--no-audit",
+                    "--progress=false",
+                ],
+                PROJECT_ROOT,
+            ),
+            (
+                [
+                    "/usr/bin/npm",
+                    "install",
+                    "--silent",
+                    "--no-fund",
+                    "--no-audit",
+                    "--progress=false",
+                ],
+                PROJECT_ROOT / "ui-tui",
+            ),
+        ]
+
    def test_update_non_interactive_skips_migration_prompt(self, mock_args, capsys):
        """When stdin/stdout aren't TTYs, config migration prompt is skipped."""
        with patch("shutil.which", return_value=None), patch(
@@ -93,6 +93,8 @@ class TestResolveCommand:
    def test_canonical_name_resolves(self):
        assert resolve_command("help").name == "help"
        assert resolve_command("background").name == "background"
+        assert resolve_command("copy").name == "copy"
+        assert resolve_command("agents").name == "agents"

    def test_alias_resolves_to_canonical(self):
        assert resolve_command("bg").name == "background"
@@ -102,6 +104,7 @@ class TestResolveCommand:
        assert resolve_command("gateway").name == "platforms"
        assert resolve_command("set-home").name == "sethome"
        assert resolve_command("reload_mcp").name == "reload-mcp"
+        assert resolve_command("tasks").name == "agents"

    def test_leading_slash_stripped(self):
        assert resolve_command("/help").name == "help"
@@ -178,10 +178,6 @@ class TestGeminiContextLength:
            ctx = get_model_context_length("gemma-4-31b-it", provider="gemini")
        assert ctx == 256000

-    def test_gemma_4_26b_context(self):
-        ctx = get_model_context_length("gemma-4-26b-it", provider="gemini")
-        assert ctx == 256000
-
    def test_gemini_3_context(self):
        ctx = get_model_context_length("gemini-3.1-pro-preview", provider="gemini")
        assert ctx == 1048576
@@ -403,7 +403,8 @@ class TestValidateFormatChecks:

    def test_no_slash_model_rejected_if_not_in_api(self):
        result = _validate("gpt-5.4", api_models=["openai/gpt-5.4"])
-        assert result["accepted"] is True
+        assert result["accepted"] is False
+        assert result["persist"] is False
        assert "not found" in result["message"]


@@ -429,10 +430,10 @@ class TestValidateApiFound:
 # -- validate — API not found ------------------------------------------------

 class TestValidateApiNotFound:
-    def test_model_not_in_api_accepted_with_warning(self):
+    def test_model_not_in_api_rejected_with_guidance(self):
        result = _validate("anthropic/claude-nonexistent")
-        assert result["accepted"] is True
-        assert result["persist"] is True
+        assert result["accepted"] is False
+        assert result["persist"] is False
        assert "not found" in result["message"]

    def test_warning_includes_suggestions(self):
@@ -456,30 +457,29 @@ class TestValidateApiNotFound:
        assert "not found" in result["message"]


-# -- validate — API unreachable — accept and persist everything ----------------
+# -- validate — API unreachable — reject with guidance ----------------

 class TestValidateApiFallback:
-    def test_any_model_accepted_when_api_down(self):
+    def test_any_model_rejected_when_api_down(self):
        result = _validate("anthropic/claude-opus-4.6", api_models=None)
-        assert result["accepted"] is True
-        assert result["persist"] is True
+        assert result["accepted"] is False
+        assert result["persist"] is False

-    def test_unknown_model_also_accepted_when_api_down(self):
-        """No hardcoded catalog gatekeeping — accept, persist, and warn."""
+    def test_unknown_model_also_rejected_when_api_down(self):
        result = _validate("anthropic/claude-next-gen", api_models=None)
-        assert result["accepted"] is True
-        assert result["persist"] is True
+        assert result["accepted"] is False
+        assert result["persist"] is False
        assert "could not reach" in result["message"].lower()

-    def test_zai_model_accepted_when_api_down(self):
+    def test_zai_model_rejected_when_api_down(self):
        result = _validate("glm-5", provider="zai", api_models=None)
-        assert result["accepted"] is True
-        assert result["persist"] is True
+        assert result["accepted"] is False
+        assert result["persist"] is False

-    def test_unknown_provider_accepted_when_api_down(self):
+    def test_unknown_provider_rejected_when_api_down(self):
        result = _validate("some-model", provider="totally-unknown", api_models=None)
-        assert result["accepted"] is True
-        assert result["persist"] is True
+        assert result["accepted"] is False
+        assert result["persist"] is False

    def test_custom_endpoint_warns_with_probed_url_and_v1_hint(self):
        with patch(
@@ -499,8 +499,8 @@ class TestValidateApiFallback:
                base_url="http://localhost:8000",
            )

-        assert result["accepted"] is True
-        assert result["persist"] is True
+        assert result["accepted"] is False
+        assert result["persist"] is False
        assert "http://localhost:8000/v1/models" in result["message"]
        assert "http://localhost:8000/v1" in result["message"]

@@ -15,7 +15,7 @@ def test_opencode_go_appears_when_api_key_set():
    opencode_go = next((p for p in providers if p["slug"] == "opencode-go"), None)
    
    assert opencode_go is not None, "opencode-go should appear when OPENCODE_GO_API_KEY is set"
-    assert opencode_go["models"] == ["glm-5.1", "glm-5", "kimi-k2.5", "mimo-v2-pro", "mimo-v2-omni", "minimax-m2.7", "minimax-m2.5"]
+    assert opencode_go["models"] == ["kimi-k2.5", "glm-5.1", "glm-5", "mimo-v2-pro", "mimo-v2-omni", "minimax-m2.7", "minimax-m2.5"]
    # opencode-go can appear as "built-in" (from PROVIDER_TO_MODELS_DEV when
    # models.dev is reachable) or "hermes" (from HERMES_OVERLAYS fallback when
    # the API is unavailable, e.g. in CI).
@@ -0,0 +1,53 @@
+"""_tui_need_npm_install: auto npm when lockfile ahead of node_modules."""
+
+import os
+from pathlib import Path
+
+import pytest
+
+
+@pytest.fixture
+def main_mod():
+    import hermes_cli.main as m
+
+    return m
+
+
+def _touch_ink(root: Path) -> None:
+    ink = root / "node_modules" / "@hermes" / "ink" / "package.json"
+    ink.parent.mkdir(parents=True, exist_ok=True)
+    ink.write_text("{}")
+
+
+def test_need_install_when_ink_missing(tmp_path: Path, main_mod) -> None:
+    (tmp_path / "package-lock.json").write_text("{}")
+    assert main_mod._tui_need_npm_install(tmp_path) is True
+
+
+def test_need_install_when_lock_newer_than_marker(tmp_path: Path, main_mod) -> None:
+    _touch_ink(tmp_path)
+    (tmp_path / "package-lock.json").write_text("{}")
+    (tmp_path / "node_modules" / ".package-lock.json").write_text("{}")
+    os.utime(tmp_path / "package-lock.json", (200, 200))
+    os.utime(tmp_path / "node_modules" / ".package-lock.json", (100, 100))
+    assert main_mod._tui_need_npm_install(tmp_path) is True
+
+
+def test_no_install_when_lock_older_than_marker(tmp_path: Path, main_mod) -> None:
+    _touch_ink(tmp_path)
+    (tmp_path / "package-lock.json").write_text("{}")
+    (tmp_path / "node_modules" / ".package-lock.json").write_text("{}")
+    os.utime(tmp_path / "package-lock.json", (100, 100))
+    os.utime(tmp_path / "node_modules" / ".package-lock.json", (200, 200))
+    assert main_mod._tui_need_npm_install(tmp_path) is False
+
+
+def test_need_install_when_marker_missing(tmp_path: Path, main_mod) -> None:
+    _touch_ink(tmp_path)
+    (tmp_path / "package-lock.json").write_text("{}")
+    assert main_mod._tui_need_npm_install(tmp_path) is True
+
+
+def test_no_install_without_lockfile_when_ink_present(tmp_path: Path, main_mod) -> None:
+    _touch_ink(tmp_path)
+    assert main_mod._tui_need_npm_install(tmp_path) is False
@@ -0,0 +1,121 @@
+from argparse import Namespace
+import sys
+import types
+
+import pytest
+
+
+def _args(**overrides):
+    base = {
+        "continue_last": None,
+        "resume": None,
+        "tui": True,
+    }
+    base.update(overrides)
+    return Namespace(**base)
+
+
+@pytest.fixture
+def main_mod(monkeypatch):
+    import hermes_cli.main as mod
+
+    monkeypatch.setattr(mod, "_has_any_provider_configured", lambda: True)
+    return mod
+
+
+def test_cmd_chat_tui_continue_uses_latest_tui_session(monkeypatch, main_mod):
+    calls = []
+    captured = {}
+
+    def fake_resolve_last(source="cli"):
+        calls.append(source)
+        return "20260408_235959_a1b2c3" if source == "tui" else None
+
+    def fake_launch(resume_session_id=None, tui_dev=False):
+        captured["resume"] = resume_session_id
+        raise SystemExit(0)
+
+    monkeypatch.setattr(main_mod, "_resolve_last_session", fake_resolve_last)
+    monkeypatch.setattr(main_mod, "_resolve_session_by_name_or_id", lambda val: val)
+    monkeypatch.setattr(main_mod, "_launch_tui", fake_launch)
+
+    with pytest.raises(SystemExit):
+        main_mod.cmd_chat(_args(continue_last=True))
+
+    assert calls == ["tui"]
+    assert captured["resume"] == "20260408_235959_a1b2c3"
+
+
+def test_cmd_chat_tui_continue_falls_back_to_latest_cli_session(monkeypatch, main_mod):
+    calls = []
+    captured = {}
+
+    def fake_resolve_last(source="cli"):
+        calls.append(source)
+        if source == "tui":
+            return None
+        if source == "cli":
+            return "20260408_235959_d4e5f6"
+        return None
+
+    def fake_launch(resume_session_id=None, tui_dev=False):
+        captured["resume"] = resume_session_id
+        raise SystemExit(0)
+
+    monkeypatch.setattr(main_mod, "_resolve_last_session", fake_resolve_last)
+    monkeypatch.setattr(main_mod, "_resolve_session_by_name_or_id", lambda val: val)
+    monkeypatch.setattr(main_mod, "_launch_tui", fake_launch)
+
+    with pytest.raises(SystemExit):
+        main_mod.cmd_chat(_args(continue_last=True))
+
+    assert calls == ["tui", "cli"]
+    assert captured["resume"] == "20260408_235959_d4e5f6"
+
+
+def test_cmd_chat_tui_resume_resolves_title_before_launch(monkeypatch, main_mod):
+    captured = {}
+
+    def fake_launch(resume_session_id=None, tui_dev=False):
+        captured["resume"] = resume_session_id
+        raise SystemExit(0)
+
+    monkeypatch.setattr(main_mod, "_resolve_session_by_name_or_id", lambda val: "20260409_000000_aa11bb")
+    monkeypatch.setattr(main_mod, "_launch_tui", fake_launch)
+
+    with pytest.raises(SystemExit):
+        main_mod.cmd_chat(_args(resume="my t0p session"))
+
+    assert captured["resume"] == "20260409_000000_aa11bb"
+
+
+def test_print_tui_exit_summary_includes_resume_and_token_totals(monkeypatch, capsys):
+    import hermes_cli.main as main_mod
+
+    class _FakeDB:
+        def get_session(self, session_id):
+            assert session_id == "20260409_000001_abc123"
+            return {
+                "message_count": 2,
+                "input_tokens": 10,
+                "output_tokens": 6,
+                "cache_read_tokens": 2,
+                "cache_write_tokens": 2,
+                "reasoning_tokens": 1,
+            }
+
+        def get_session_title(self, _session_id):
+            return "demo title"
+
+        def close(self):
+            return None
+
+    monkeypatch.setitem(sys.modules, "hermes_state", types.SimpleNamespace(SessionDB=lambda: _FakeDB()))
+
+    main_mod._print_tui_exit_summary("20260409_000001_abc123")
+    out = capsys.readouterr().out
+
+    assert "Resume this session with:" in out
+    assert "hermes --tui --resume 20260409_000001_abc123" in out
+    assert 'hermes --tui -c "demo title"' in out
+    assert "Tokens:         21 (in 10, out 6, cache 4, reasoning 1)" in out
@@ -13,9 +13,29 @@ from unittest.mock import patch, MagicMock
 import pytest

 import hermes_cli.gateway as gateway_cli
+import hermes_cli.main as cli_main
 from hermes_cli.main import cmd_update


+# ---------------------------------------------------------------------------
+# Skip the real-time sleeps inside cmd_update's restart-verification path
+# ---------------------------------------------------------------------------
+
+
+@pytest.fixture(autouse=True)
+def _no_restart_verify_sleep(monkeypatch):
+    """hermes_cli/main.py uses time.sleep(3) after systemctl restart to
+    verify the service survived. Tests mock subprocess.run — nothing
+    actually restarts — so the 3s wait is dead time.
+
+    main.py does ``import time as _time`` at both module level (line 167)
+    and inside functions (lines 3281, 4384, 4401). Patching the global
+    ``time.sleep`` affects only the duration of this test.
+    """
+    import time as _real_time
+    monkeypatch.setattr(_real_time, "sleep", lambda *_a, **_k: None)
+
+
 # ---------------------------------------------------------------------------
 # Helpers
 # ---------------------------------------------------------------------------
@@ -31,6 +31,31 @@ def _isolate_env(tmp_path, monkeypatch):
    monkeypatch.delenv("RETAINDB_PROJECT", raising=False)


+@pytest.fixture(autouse=True)
+def _cap_retaindb_sleeps(monkeypatch):
+    """Cap production-code sleeps so background-thread tests run fast.
+
+    The retaindb ``_WriteQueue._flush_row`` does ``time.sleep(2)`` after
+    errors. Across multiple tests that trigger the retry path, that adds
+    up. Cap the module's bound ``time.sleep`` to 0.05s — tests don't care
+    about the exact retry delay, only that it happens. The test file's
+    own ``time.sleep`` stays real since it uses a different reference.
+    """
+    try:
+        from plugins.memory import retaindb as _retaindb
+    except ImportError:
+        return
+
+    real_sleep = _retaindb.time.sleep
+
+    def _capped_sleep(seconds):
+        return real_sleep(min(float(seconds), 0.05))
+
+    import types as _types
+    fake_time = _types.SimpleNamespace(sleep=_capped_sleep, time=_retaindb.time.time)
+    monkeypatch.setattr(_retaindb, "time", fake_time)
+
+
 # We need the repo root on sys.path so the plugin can import agent.memory_provider
 import sys
 _repo_root = str(Path(__file__).resolve().parents[2])
@@ -130,16 +155,18 @@ class TestWriteQueue:
    def test_enqueue_creates_row(self, tmp_path):
        q, client, db_path = self._make_queue(tmp_path)
        q.enqueue("user1", "sess1", [{"role": "user", "content": "hi"}])
-        # Give the writer thread a moment to process
-        time.sleep(1)
+        # shutdown() blocks until the writer thread drains the queue — no need
+        # to pre-sleep (the old 1s sleep was a just-in-case wait, but shutdown
+        # does the right thing).
        q.shutdown()
        # If ingest succeeded, the row should be deleted
        client.ingest_session.assert_called_once()

    def test_enqueue_persists_to_sqlite(self, tmp_path):
        client = MagicMock()
-        # Make ingest hang so the row stays in SQLite
-        client.ingest_session = MagicMock(side_effect=lambda *a, **kw: time.sleep(5))
+        # Make ingest slow so the row is still in SQLite when we peek.
+        # 0.5s is plenty — the test just needs the flush to still be in-flight.
+        client.ingest_session = MagicMock(side_effect=lambda *a, **kw: time.sleep(0.5))
        db_path = tmp_path / "test_queue.db"
        q = _WriteQueue(client, db_path)
        q.enqueue("user1", "sess1", [{"role": "user", "content": "test"}])
@@ -154,8 +181,7 @@ class TestWriteQueue:
    def test_flush_deletes_row_on_success(self, tmp_path):
        q, client, db_path = self._make_queue(tmp_path)
        q.enqueue("user1", "sess1", [{"role": "user", "content": "hi"}])
-        time.sleep(1)
-        q.shutdown()
+        q.shutdown()  # blocks until drain
        # Row should be gone
        conn = sqlite3.connect(str(db_path))
        rows = conn.execute("SELECT COUNT(*) FROM pending").fetchone()[0]
@@ -168,14 +194,20 @@ class TestWriteQueue:
        db_path = tmp_path / "test_queue.db"
        q = _WriteQueue(client, db_path)
        q.enqueue("user1", "sess1", [{"role": "user", "content": "hi"}])
-        time.sleep(3)  # Allow retry + sleep(2) in _flush_row
+        # Poll for the error to be recorded (max 2s), instead of a fixed 3s wait.
+        deadline = time.time() + 2.0
+        last_error = None
+        while time.time() < deadline:
+            conn = sqlite3.connect(str(db_path))
+            row = conn.execute("SELECT last_error FROM pending").fetchone()
+            conn.close()
+            if row and row[0]:
+                last_error = row[0]
+                break
+            time.sleep(0.05)
        q.shutdown()
-        # Row should still exist with error recorded
-        conn = sqlite3.connect(str(db_path))
-        row = conn.execute("SELECT last_error FROM pending").fetchone()
-        conn.close()
-        assert row is not None
-        assert "API down" in row[0]
+        assert last_error is not None
+        assert "API down" in last_error

    def test_thread_local_connection_reuse(self, tmp_path):
        q, _, _ = self._make_queue(tmp_path)
@@ -193,14 +225,27 @@ class TestWriteQueue:
        client1.ingest_session = MagicMock(side_effect=RuntimeError("fail"))
        q1 = _WriteQueue(client1, db_path)
        q1.enqueue("user1", "sess1", [{"role": "user", "content": "lost turn"}])
-        time.sleep(3)
+        # Wait until the error is recorded (poll with short interval).
+        deadline = time.time() + 2.0
+        while time.time() < deadline:
+            conn = sqlite3.connect(str(db_path))
+            row = conn.execute("SELECT last_error FROM pending").fetchone()
+            conn.close()
+            if row and row[0]:
+                break
+            time.sleep(0.05)
        q1.shutdown()

        # Now create a new queue — it should replay the pending rows
        client2 = MagicMock()
        client2.ingest_session = MagicMock(return_value={"status": "ok"})
        q2 = _WriteQueue(client2, db_path)
-        time.sleep(2)
+        # Poll for the replay to happen.
+        deadline = time.time() + 2.0
+        while time.time() < deadline:
+            if client2.ingest_session.called:
+                break
+            time.sleep(0.05)
        q2.shutdown()

        # The replayed row should have been ingested via client2
@@ -0,0 +1,34 @@
+"""Fast-path fixtures shared across tests/run_agent/.
+
+Many tests in this directory exercise the retry/backoff paths in the
+agent loop. Production code uses ``jittered_backoff(base_delay=5.0)``
+with a ``while time.time() < sleep_end`` loop — a single retry test
+spends 5+ seconds of real wall-clock time on backoff waits.
+
+Mocking ``jittered_backoff`` to return 0.0 collapses the while-loop
+to a no-op (``time.time() < time.time() + 0`` is false immediately),
+which handles the most common case without touching ``time.sleep``.
+
+We deliberately DO NOT mock ``time.sleep`` here — some tests
+(test_interrupt_propagation, test_primary_runtime_restore, etc.) use
+the real ``time.sleep`` for threading coordination or assert that it
+was called with specific values. Tests that want to additionally
+fast-path direct ``time.sleep(N)`` calls in production code should
+monkeypatch ``run_agent.time.sleep`` locally (see
+``test_anthropic_error_handling.py`` for the pattern).
+"""
+
+from __future__ import annotations
+
+import pytest
+
+
+@pytest.fixture(autouse=True)
+def _fast_retry_backoff(monkeypatch):
+    """Short-circuit retry backoff for all tests in this directory."""
+    try:
+        import run_agent
+    except ImportError:
+        return
+
+    monkeypatch.setattr(run_agent, "jittered_backoff", lambda *a, **k: 0.0)
@@ -32,6 +32,7 @@ class TestGeneric400Heuristic:
            from run_agent import AIAgent
            a = AIAgent(
                api_key="test-key-12345",
+                base_url="https://openrouter.ai/api/v1",
                quiet_mode=True,
                skip_context_files=True,
                skip_memory=True,
@@ -19,6 +19,24 @@ import pytest

 from agent.context_compressor import SUMMARY_PREFIX
 from run_agent import AIAgent
+import run_agent
+
+
+# ---------------------------------------------------------------------------
+# Fast backoff for compression retry tests
+# ---------------------------------------------------------------------------
+
+
+@pytest.fixture(autouse=True)
+def _no_compression_sleep(monkeypatch):
+    """Short-circuit the 2s time.sleep between compression retries.
+
+    Production code has ``time.sleep(2)`` in multiple places after a 413/context
+    compression, for rate-limit smoothing. Tests assert behavior, not timing.
+    """
+    import time as _time
+    monkeypatch.setattr(_time, "sleep", lambda *_a, **_k: None)
+    monkeypatch.setattr(run_agent, "jittered_backoff", lambda *a, **k: 0.0)


 # ---------------------------------------------------------------------------
@@ -69,6 +87,7 @@ def agent():
    ):
        a = AIAgent(
            api_key="test-key-1234567890",
+            base_url="https://openrouter.ai/api/v1",
            quiet_mode=True,
            skip_context_files=True,
            skip_memory=True,
@@ -29,6 +29,8 @@ class TestFlushDeduplication:
        with patch.dict(os.environ, {"OPENROUTER_API_KEY": "test-key"}):
            from run_agent import AIAgent
            agent = AIAgent(
+                api_key="test-key",
+                base_url="https://openrouter.ai/api/v1",
                model="test/model",
                quiet_mode=True,
                session_db=session_db,
@@ -271,6 +273,8 @@ class TestFlushIdxInit:
        with patch.dict(os.environ, {"OPENROUTER_API_KEY": "test-key"}):
            from run_agent import AIAgent
            agent = AIAgent(
+                api_key="test-key",
+                base_url="https://openrouter.ai/api/v1",
                model="test/model",
                quiet_mode=True,
                skip_context_files=True,
@@ -283,6 +287,8 @@ class TestFlushIdxInit:
        with patch.dict(os.environ, {"OPENROUTER_API_KEY": "test-key"}):
            from run_agent import AIAgent
            agent = AIAgent(
+                api_key="test-key",
+                base_url="https://openrouter.ai/api/v1",
                model="test/model",
                quiet_mode=True,
                skip_context_files=True,
@@ -27,6 +27,39 @@ from gateway.config import Platform
 from gateway.session import SessionSource


+# ---------------------------------------------------------------------------
+# Fast backoff for tests that exercise the retry loop
+# ---------------------------------------------------------------------------
+
+
+@pytest.fixture(autouse=True)
+def _no_backoff_wait(monkeypatch):
+    """Short-circuit retry backoff so tests don't block on real wall-clock waits.
+
+    The production code uses jittered_backoff() with a 5s base delay plus a
+    tight time.sleep(0.2) loop. Without this patch, each 429/500/529 retry
+    test burns ~10s of real time on CI — across six tests that's ~60s for
+    behavior we're not asserting against timing.
+
+    Tests assert retry counts and final results, never wait durations.
+    """
+    import asyncio as _asyncio
+    import time as _time
+
+    monkeypatch.setattr(run_agent, "jittered_backoff", lambda *a, **k: 0.0)
+    monkeypatch.setattr(_time, "sleep", lambda *_a, **_k: None)
+
+    # Also fast-path asyncio.sleep — the gateway's _run_agent path has
+    # several await asyncio.sleep(...) calls that add real wall-clock time.
+    _real_asyncio_sleep = _asyncio.sleep
+
+    async def _fast_sleep(delay=0, *args, **kwargs):
+        # Yield to the event loop but skip the actual delay.
+        await _real_asyncio_sleep(0)
+
+    monkeypatch.setattr(_asyncio, "sleep", _fast_sleep)
+
+
 # ---------------------------------------------------------------------------
 # Helpers
 # ---------------------------------------------------------------------------
@@ -37,6 +37,8 @@ class TestFlushAfterCompression:
        with patch.dict(os.environ, {"OPENROUTER_API_KEY": "test-key"}):
            from run_agent import AIAgent
            agent = AIAgent(
+                api_key="test-key",
+                base_url="https://openrouter.ai/api/v1",
                model="test/model",
                quiet_mode=True,
                session_db=session_db,
@@ -19,6 +19,8 @@ from run_agent import AIAgent
 def test_create_openai_client_does_not_mutate_input_kwargs(mock_openai):
    mock_openai.return_value = MagicMock()
    agent = AIAgent(
+        api_key="test-key",
+        base_url="https://openrouter.ai/api/v1",
        model="test/model",
        quiet_mode=True,
        skip_context_files=True,
@@ -23,6 +23,8 @@ from run_agent import AIAgent

 def _make_agent():
    return AIAgent(
+        api_key="test-key",
+        base_url="https://openrouter.ai/api/v1",
        model="test/model",
        quiet_mode=True,
        skip_context_files=True,
@@ -13,6 +13,24 @@ from unittest.mock import MagicMock, patch, call
 import pytest


+@pytest.fixture(autouse=True)
+def _mock_runtime_provider(monkeypatch):
+    """run_job calls resolve_runtime_provider which can try real network
+    auto-detection (~4s of socket timeouts in hermetic CI). Mock it out
+    since these tests don't care about provider resolution — the agent
+    is mocked too."""
+    import hermes_cli.runtime_provider as rp
+    def _fake_resolve(*args, **kwargs):
+        return {
+            "provider": "openrouter",
+            "api_key": "test-key",
+            "base_url": "https://openrouter.ai/api/v1",
+            "model": "test/model",
+            "api_mode": "chat_completions",
+        }
+    monkeypatch.setattr(rp, "resolve_runtime_provider", _fake_resolve)
+
+
 class TestCronJobCleanup:
    """cron/scheduler.py — end_session + close in the finally block."""

@@ -11,6 +11,16 @@ from unittest.mock import MagicMock, patch
 import pytest

 from run_agent import AIAgent
+import run_agent
+
+
+@pytest.fixture(autouse=True)
+def _no_fallback_wait(monkeypatch):
+    """Short-circuit time.sleep in fallback/recovery paths so tests don't
+    block on the ``min(3 + retry_count, 8)`` wait before a primary retry."""
+    import time as _time
+    monkeypatch.setattr(_time, "sleep", lambda *_a, **_k: None)
+    monkeypatch.setattr(run_agent, "jittered_backoff", lambda *a, **k: 0.0)


 def _make_tool_defs(*names: str) -> list:
@@ -36,6 +46,7 @@ def _make_agent(fallback_model=None):
    ):
        agent = AIAgent(
            api_key="test-key",
+            base_url="https://openrouter.ai/api/v1",
            quiet_mode=True,
            skip_context_files=True,
            skip_memory=True,
@@ -45,6 +45,7 @@ def test_plugin_engine_gets_context_length_on_init():

        agent = AIAgent(
            api_key="test-key-1234567890",
+            base_url="https://openrouter.ai/api/v1",
            quiet_mode=True,
            skip_context_files=True,
            skip_memory=True,
@@ -75,6 +76,7 @@ def test_plugin_engine_update_model_args():
        agent = AIAgent(
            model="openrouter/auto",
            api_key="test-key-1234567890",
+            base_url="https://openrouter.ai/api/v1",
            quiet_mode=True,
            skip_context_files=True,
            skip_memory=True,
@@ -19,6 +19,7 @@ def _make_agent(fallback_model=None):
    ):
        agent = AIAgent(
            api_key="test-key",
+            base_url="https://openrouter.ai/api/v1",
            quiet_mode=True,
            skip_context_files=True,
            skip_memory=True,
@@ -60,6 +60,9 @@ def _make_agent(monkeypatch, provider, api_mode="chat_completions", base_url="ht
    )
    if model:
        kwargs["model"] = model
+    base_url="https://openrouter.ai/api/v1",
+    api_key="test-key",
+    base_url="https://openrouter.ai/api/v1",
    return AIAgent(**kwargs)


@@ -55,6 +55,7 @@ def agent():
    ):
        a = AIAgent(
            api_key="test-key-1234567890",
+            base_url="https://openrouter.ai/api/v1",
            quiet_mode=True,
            skip_context_files=True,
            skip_memory=True,
@@ -76,6 +77,7 @@ def agent_with_memory_tool():
    ):
        a = AIAgent(
            api_key="test-k...7890",
+            base_url="https://openrouter.ai/api/v1",
            quiet_mode=True,
            skip_context_files=True,
            skip_memory=True,
@@ -112,12 +114,14 @@ def test_aiagent_reuses_existing_errors_log_handler():
        ):
            AIAgent(
                api_key="test-k...7890",
+                base_url="https://openrouter.ai/api/v1",
                quiet_mode=True,
                skip_context_files=True,
                skip_memory=True,
            )
            AIAgent(
                api_key="test-k...7890",
+                base_url="https://openrouter.ai/api/v1",
                quiet_mode=True,
                skip_context_files=True,
                skip_memory=True,
@@ -491,6 +495,7 @@ class TestInit:
        ):
            a = AIAgent(
                api_key="test-key-1234567890",
+                base_url="https://openrouter.ai/api/v1",
                model="openai/gpt-4o",
                quiet_mode=True,
                skip_context_files=True,
@@ -542,6 +547,7 @@ class TestInit:
        ):
            a = AIAgent(
                api_key="test-key-1234567890",
+                base_url="https://openrouter.ai/api/v1",
                quiet_mode=True,
                skip_context_files=True,
                skip_memory=True,
@@ -557,6 +563,7 @@ class TestInit:
        ):
            a = AIAgent(
                api_key="test-key-1234567890",
+                base_url="https://openrouter.ai/api/v1",
                quiet_mode=True,
                skip_context_files=True,
                skip_memory=True,
@@ -694,6 +701,7 @@ class TestBuildSystemPrompt:
        ):
            agent = AIAgent(
                api_key="test-k...7890",
+                base_url="https://openrouter.ai/api/v1",
                quiet_mode=True,
                skip_context_files=True,
                skip_memory=True,
@@ -726,6 +734,7 @@ class TestToolUseEnforcementConfig:
            a = AIAgent(
                model=model,
                api_key="test-key-1234567890",
+                base_url="https://openrouter.ai/api/v1",
                quiet_mode=True,
                skip_context_files=True,
                skip_memory=True,
@@ -822,6 +831,7 @@ class TestToolUseEnforcementConfig:
        ):
            a = AIAgent(
                api_key="test-key-1234567890",
+                base_url="https://openrouter.ai/api/v1",
                quiet_mode=True,
                skip_context_files=True,
                skip_memory=True,
@@ -3433,7 +3443,7 @@ class TestAnthropicBaseUrlPassthrough:
        ):
            mock_build.return_value = MagicMock()
            a = AIAgent(
-                api_key="sk-ant-api03-test1234567890",
+                api_key="sk-ant...7890",
                api_mode="anthropic_messages",
                quiet_mode=True,
                skip_context_files=True,
@@ -3457,6 +3467,7 @@ class TestAnthropicCredentialRefresh:
            mock_build.side_effect = [old_client, new_client]
            agent = AIAgent(
                api_key="sk-ant-oat01-stale-token",
+                base_url="https://openrouter.ai/api/v1",
                api_mode="anthropic_messages",
                quiet_mode=True,
                skip_context_files=True,
@@ -3487,6 +3498,7 @@ class TestAnthropicCredentialRefresh:
        ):
            agent = AIAgent(
                api_key="sk-ant-oat01-same-token",
+                base_url="https://openrouter.ai/api/v1",
                api_mode="anthropic_messages",
                quiet_mode=True,
                skip_context_files=True,
@@ -3514,6 +3526,7 @@ class TestAnthropicCredentialRefresh:
        ):
            agent = AIAgent(
                api_key="sk-ant-oat01-current-token",
+                base_url="https://openrouter.ai/api/v1",
                api_mode="anthropic_messages",
                quiet_mode=True,
                skip_context_files=True,
@@ -12,6 +12,15 @@ sys.modules.setdefault("fal_client", types.SimpleNamespace())
 import run_agent


+@pytest.fixture(autouse=True)
+def _no_codex_backoff(monkeypatch):
+    """Short-circuit retry backoff so Codex retry tests don't block on real
+    wall-clock waits (5s jittered_backoff base delay + tight time.sleep loop)."""
+    import time as _time
+    monkeypatch.setattr(run_agent, "jittered_backoff", lambda *a, **k: 0.0)
+    monkeypatch.setattr(_time, "sleep", lambda *_a, **_k: None)
+
+
 def _patch_agent_bootstrap(monkeypatch):
    monkeypatch.setattr(
        run_agent,
@@ -80,6 +80,8 @@ class TestStreamingAccumulator:
        mock_create.return_value = mock_client

        agent = AIAgent(
+            api_key="test-key",
+            base_url="https://openrouter.ai/api/v1",
            model="test/model",
            quiet_mode=True,
            skip_context_files=True,
@@ -120,6 +122,8 @@ class TestStreamingAccumulator:
        mock_create.return_value = mock_client

        agent = AIAgent(
+            api_key="test-key",
+            base_url="https://openrouter.ai/api/v1",
            model="test/model",
            quiet_mode=True,
            skip_context_files=True,
@@ -167,6 +171,8 @@ class TestStreamingAccumulator:
        mock_create.return_value = mock_client

        agent = AIAgent(
+            api_key="test-key",
+            base_url="https://openrouter.ai/api/v1",
            model="test/model",
            quiet_mode=True,
            skip_context_files=True,
@@ -205,6 +211,8 @@ class TestStreamingAccumulator:
        mock_create.return_value = mock_client

        agent = AIAgent(
+            api_key="test-key",
+            base_url="https://openrouter.ai/api/v1",
            model="test/model",
            quiet_mode=True,
            skip_context_files=True,
@@ -245,6 +253,8 @@ class TestStreamingCallbacks:
        mock_create.return_value = mock_client

        agent = AIAgent(
+            api_key="test-key",
+            base_url="https://openrouter.ai/api/v1",
            model="test/model",
            quiet_mode=True,
            skip_context_files=True,
@@ -277,6 +287,8 @@ class TestStreamingCallbacks:
        mock_create.return_value = mock_client

        agent = AIAgent(
+            api_key="test-key",
+            base_url="https://openrouter.ai/api/v1",
            model="test/model",
            quiet_mode=True,
            skip_context_files=True,
@@ -308,6 +320,8 @@ class TestStreamingCallbacks:
        mock_create.return_value = mock_client

        agent = AIAgent(
+            api_key="test-key",
+            base_url="https://openrouter.ai/api/v1",
            model="test/model",
            quiet_mode=True,
            skip_context_files=True,
@@ -346,6 +360,8 @@ class TestStreamingCallbacks:
        mock_create.return_value = mock_client

        agent = AIAgent(
+            api_key="test-key",
+            base_url="https://openrouter.ai/api/v1",
            model="test/model",
            quiet_mode=True,
            skip_context_files=True,
@@ -381,6 +397,8 @@ class TestStreamingCallbacks:
        mock_create.return_value = mock_client

        agent = AIAgent(
+            api_key="test-key",
+            base_url="https://openrouter.ai/api/v1",
            model="test/model",
            quiet_mode=True,
            skip_context_files=True,
@@ -428,6 +446,8 @@ class TestStreamingFallback:
        mock_create.return_value = mock_client

        agent = AIAgent(
+            api_key="test-key",
+            base_url="https://openrouter.ai/api/v1",
            model="test/model",
            quiet_mode=True,
            skip_context_files=True,
@@ -455,6 +475,8 @@ class TestStreamingFallback:
        mock_create.return_value = mock_client

        agent = AIAgent(
+            api_key="test-key",
+            base_url="https://openrouter.ai/api/v1",
            model="test/model",
            quiet_mode=True,
            skip_context_files=True,
@@ -477,6 +499,8 @@ class TestStreamingFallback:
        mock_create.return_value = mock_client

        agent = AIAgent(
+            api_key="test-key",
+            base_url="https://openrouter.ai/api/v1",
            model="test/model",
            quiet_mode=True,
            skip_context_files=True,
@@ -500,6 +524,8 @@ class TestStreamingFallback:
        mock_create.return_value = mock_client

        agent = AIAgent(
+            api_key="test-key",
+            base_url="https://openrouter.ai/api/v1",
            model="test/model",
            quiet_mode=True,
            skip_context_files=True,
@@ -542,6 +568,8 @@ class TestStreamingFallback:
        mock_create.return_value = mock_client

        agent = AIAgent(
+            api_key="test-key",
+            base_url="https://openrouter.ai/api/v1",
            model="test/model",
            quiet_mode=True,
            skip_context_files=True,
@@ -577,6 +605,8 @@ class TestStreamingFallback:
        mock_create.return_value = mock_client

        agent = AIAgent(
+            api_key="test-key",
+            base_url="https://openrouter.ai/api/v1",
            model="test/model",
            quiet_mode=True,
            skip_context_files=True,
@@ -619,6 +649,8 @@ class TestReasoningStreaming:
        mock_create.return_value = mock_client

        agent = AIAgent(
+            api_key="test-key",
+            base_url="https://openrouter.ai/api/v1",
            model="test/model",
            quiet_mode=True,
            skip_context_files=True,
@@ -646,6 +678,8 @@ class TestHasStreamConsumers:
    def test_no_consumers(self):
        from run_agent import AIAgent
        agent = AIAgent(
+            api_key="test-key",
+            base_url="https://openrouter.ai/api/v1",
            model="test/model",
            quiet_mode=True,
            skip_context_files=True,
@@ -656,6 +690,8 @@ class TestHasStreamConsumers:
    def test_delta_callback_set(self):
        from run_agent import AIAgent
        agent = AIAgent(
+            api_key="test-key",
+            base_url="https://openrouter.ai/api/v1",
            model="test/model",
            quiet_mode=True,
            skip_context_files=True,
@@ -667,6 +703,8 @@ class TestHasStreamConsumers:
    def test_stream_callback_set(self):
        from run_agent import AIAgent
        agent = AIAgent(
+            api_key="test-key",
+            base_url="https://openrouter.ai/api/v1",
            model="test/model",
            quiet_mode=True,
            skip_context_files=True,
@@ -688,6 +726,8 @@ class TestCodexStreamCallbacks:
        deltas = []

        agent = AIAgent(
+            api_key="test-key",
+            base_url="https://openrouter.ai/api/v1",
            model="test/model",
            quiet_mode=True,
            skip_context_files=True,
@@ -729,6 +769,8 @@ class TestCodexStreamCallbacks:
        from run_agent import AIAgent

        agent = AIAgent(
+            api_key="test-key",
+            base_url="https://openrouter.ai/api/v1",
            model="test/model",
            quiet_mode=True,
            skip_context_files=True,
@@ -792,6 +834,8 @@ class TestCodexStreamCallbacks:
        )

        agent = AIAgent(
+            api_key="test-key",
+            base_url="https://openrouter.ai/api/v1",
            model="test/model",
            quiet_mode=True,
            skip_context_files=True,
@@ -810,6 +854,8 @@ class TestCodexStreamCallbacks:
        from run_agent import AIAgent

        agent = AIAgent(
+            api_key="test-key",
+            base_url="https://openrouter.ai/api/v1",
            model="test/model",
            quiet_mode=True,
            skip_context_files=True,
@@ -861,6 +907,8 @@ class TestAnthropicStreamCallbacks:
        from run_agent import AIAgent

        agent = AIAgent(
+            api_key="test-key",
+            base_url="https://openrouter.ai/api/v1",
            model="test/model",
            quiet_mode=True,
            skip_context_files=True,
@@ -22,6 +22,7 @@ def _make_agent(session_db, *, platform: str):
    ):
        agent = AIAgent(
            api_key="test-key",
+            base_url="https://openrouter.ai/api/v1",
            quiet_mode=True,
            skip_context_files=True,
            skip_memory=True,
@@ -34,3 +34,21 @@ def test_messaging_extra_includes_qrcode_for_weixin_setup():

    messaging_extra = optional_dependencies["messaging"]
    assert any(dep.startswith("qrcode") for dep in messaging_extra)
+
+
+def test_dingtalk_extra_includes_qrcode_for_qr_auth():
+    """DingTalk's QR-code device-flow auth (hermes_cli/dingtalk_auth.py)
+    needs the qrcode package."""
+    optional_dependencies = _load_optional_dependencies()
+
+    dingtalk_extra = optional_dependencies["dingtalk"]
+    assert any(dep.startswith("qrcode") for dep in dingtalk_extra)
+
+
+def test_feishu_extra_includes_qrcode_for_qr_login():
+    """Feishu's QR login flow (gateway/platforms/feishu.py) needs the
+    qrcode package."""
+    optional_dependencies = _load_optional_dependencies()
+
+    feishu_extra = optional_dependencies["feishu"]
+    assert any(dep.startswith("qrcode") for dep in feishu_extra)
--- a/Show More
+++ b/Show More