fix(state): replace version-gated migrations with declarative column reconciliation

The linear migration chain (if current_version < N: ALTER TABLE ADD COLUMN) broke when commit a7d78d3b inserted a new v7 migration (reasoning_content) and renumbered the old v7 (api_call_count) to v8. Users who updated between Apr 15–22 already had schema_version=7 from the old numbering, so the new v7 block was skipped entirely — reasoning_content was never created, causing 'sqlite3.OperationalError: no such column: reasoning_content' on /continue. Root cause: two independent sources of truth (SCHEMA_SQL for new DBs vs a manually-maintained version chain for existing DBs) that could desync when migrations were reordered. Fix: replace the 8-block version chain with _reconcile_columns(), which on every startup diffs live table columns (PRAGMA table_info) against the columns declared in SCHEMA_SQL and ADDs any that are missing. This follows the Beets/sqlite-utils pattern used across mature SQLite projects — the CREATE TABLE definition becomes the single source of truth for schema. Adding a new column is now a single change: add it to SCHEMA_SQL. No migration block to write, no version number to bump, no possibility of skipped columns from reordering. The schema_version table is retained for future data migrations that cannot be handled declaratively (row transforms), but zero version-gated ADD COLUMN blocks remain.
Merge pull request #14045 from NousResearch/bb/subagent-observability
2026-04-22 23:49:50 +05:30 · 2026-04-22 12:21:25 -05:00 · 2026-04-22 12:10:21 -05:00 · 2026-04-22 12:03:14 -05:00 · 2026-04-22 11:54:19 -05:00 · 2026-04-22 11:32:17 -05:00
79 changed files with 7206 additions and 886 deletions
@@ -1083,6 +1083,31 @@ def convert_messages_to_anthropic(
                    "name": fn.get("name", ""),
                    "input": parsed_args,
                })
+            # Kimi's /coding endpoint (Anthropic protocol) requires assistant
+            # tool-call messages to carry reasoning_content when thinking is
+            # enabled server-side.  Preserve it as a thinking block so Kimi
+            # can validate the message history.  See hermes-agent#13848.
+            #
+            # Accept empty string "" — _copy_reasoning_content_for_api()
+            # injects "" as a tier-3 fallback for Kimi tool-call messages
+            # that had no reasoning.  Kimi requires the field to exist, even
+            # if empty.
+            #
+            # Prepend (not append): Anthropic protocol requires thinking
+            # blocks before text and tool_use blocks.
+            #
+            # Guard: only add when reasoning_details didn't already contribute
+            # thinking blocks.  On native Anthropic, reasoning_details produces
+            # signed thinking blocks — adding another unsigned one from
+            # reasoning_content would create a duplicate (same text) that gets
+            # downgraded to a spurious text block on the last assistant message.
+            reasoning_content = m.get("reasoning_content")
+            _already_has_thinking = any(
+                isinstance(b, dict) and b.get("type") in ("thinking", "redacted_thinking")
+                for b in blocks
+            )
+            if isinstance(reasoning_content, str) and not _already_has_thinking:
+                blocks.insert(0, {"type": "thinking", "thinking": reasoning_content})
            # Anthropic rejects empty assistant content
            effective = blocks or content
            if not effective or effective == "":
@@ -1238,6 +1263,7 @@ def convert_messages_to_anthropic(
    #    cache markers can interfere with signature validation.
    _THINKING_TYPES = frozenset(("thinking", "redacted_thinking"))
    _is_third_party = _is_third_party_anthropic_endpoint(base_url)
+    _is_kimi = _is_kimi_coding_endpoint(base_url)

    last_assistant_idx = None
    for i in range(len(result) - 1, -1, -1):
@@ -1249,7 +1275,25 @@ def convert_messages_to_anthropic(
        if m.get("role") != "assistant" or not isinstance(m.get("content"), list):
            continue

-        if _is_third_party or idx != last_assistant_idx:
+        if _is_kimi:
+            # Kimi's /coding endpoint enables thinking server-side and
+            # requires unsigned thinking blocks on replayed assistant
+            # tool-call messages.  Strip signed Anthropic blocks (Kimi
+            # can't validate signatures) but preserve the unsigned ones
+            # we synthesised from reasoning_content above.
+            new_content = []
+            for b in m["content"]:
+                if not isinstance(b, dict) or b.get("type") not in _THINKING_TYPES:
+                    new_content.append(b)
+                    continue
+                if b.get("signature") or b.get("data"):
+                    # Anthropic-signed block — Kimi can't validate, strip
+                    continue
+                # Unsigned thinking (synthesised from reasoning_content) —
+                # keep it: Kimi needs it for message-history validation.
+                new_content.append(b)
+            m["content"] = new_content or [{"type": "text", "text": "(empty)"}]
+        elif _is_third_party or idx != last_assistant_idx:
            # Third-party endpoint: strip ALL thinking blocks from every
            # assistant message — signatures are Anthropic-proprietary.
            # Direct Anthropic: strip from non-latest assistant messages only.
@@ -134,6 +134,7 @@ _API_KEY_PROVIDER_AUX_MODELS: Dict[str, str] = {
    "gemini": "gemini-3-flash-preview",
    "zai": "glm-4.5-flash",
    "kimi-coding": "kimi-k2-turbo-preview",
+    "stepfun": "step-3.5-flash",
    "kimi-coding-cn": "kimi-k2-turbo-preview",
    "minimax": "MiniMax-M2.7",
    "minimax-cn": "MiniMax-M2.7",
@@ -470,11 +470,16 @@ def _classify_by_status(
                retryable=False,
                should_fallback=True,
            )
-        # Generic 404 — could be model or endpoint
+        # Generic 404 with no "model not found" signal — could be a wrong
+        # endpoint path (common with local llama.cpp / Ollama / vLLM when
+        # the URL is slightly misconfigured), a proxy routing glitch, or
+        # a transient backend issue.  Classifying these as model_not_found
+        # silently falls back to a different provider and tells the model
+        # the model is missing, which is wrong and wastes a turn.  Treat
+        # as unknown so the retry loop surfaces the real error instead.
        return result_fn(
-            FailoverReason.model_not_found,
-            retryable=False,
-            should_fallback=True,
+            FailoverReason.unknown,
+            retryable=True,
        )

    if status_code == 413:
@@ -25,7 +25,7 @@ logger = logging.getLogger(__name__)
 # are preserved so the full model name reaches cache lookups and server queries.
 _PROVIDER_PREFIXES: frozenset[str] = frozenset({
    "openrouter", "nous", "openai-codex", "copilot", "copilot-acp",
-    "gemini", "ollama-cloud", "zai", "kimi-coding", "kimi-coding-cn", "minimax", "minimax-cn", "anthropic", "deepseek",
+    "gemini", "ollama-cloud", "zai", "kimi-coding", "kimi-coding-cn", "stepfun", "minimax", "minimax-cn", "anthropic", "deepseek",
    "opencode-zen", "opencode-go", "ai-gateway", "kilocode", "alibaba",
    "qwen-oauth",
    "xiaomi",
@@ -36,7 +36,7 @@ _PROVIDER_PREFIXES: frozenset[str] = frozenset({
    "glm", "z-ai", "z.ai", "zhipu", "github", "github-copilot",
    "github-models", "kimi", "moonshot", "kimi-cn", "moonshot-cn", "claude", "deep-seek",
    "ollama",
-    "opencode", "zen", "go", "vercel", "kilo", "dashscope", "aliyun", "qwen",
+    "stepfun", "opencode", "zen", "go", "vercel", "kilo", "dashscope", "aliyun", "qwen",
    "mimo", "xiaomi-mimo",
    "arcee-ai", "arceeai",
    "xai", "x-ai", "x.ai", "grok",
@@ -237,6 +237,8 @@ _URL_TO_PROVIDER: Dict[str, str] = {
    "api.moonshot.ai": "kimi-coding",
    "api.moonshot.cn": "kimi-coding-cn",
    "api.kimi.com": "kimi-coding",
+    "api.stepfun.ai": "stepfun",
+    "api.stepfun.com": "stepfun",
    "api.arcee.ai": "arcee",
    "api.minimax": "minimax",
    "dashscope.aliyuncs.com": "alibaba",
@@ -146,6 +146,7 @@ PROVIDER_TO_MODELS_DEV: Dict[str, str] = {
    "openai-codex": "openai",
    "zai": "zai",
    "kimi-coding": "kimi-for-coding",
+    "stepfun": "stepfun",
    "kimi-coding-cn": "kimi-for-coding",
    "minimax": "minimax",
    "minimax-cn": "minimax-cn",
@@ -914,6 +914,32 @@ def _cleanup_worktree(info: Dict[str, str] = None) -> None:
    print(f"\033[32m✓ Worktree cleaned up: {wt_path}\033[0m")


+def _run_state_db_auto_maintenance(session_db) -> None:
+    """Call ``SessionDB.maybe_auto_prune_and_vacuum`` using current config.
+
+    Reads the ``sessions:`` section from config.yaml via
+    :func:`hermes_cli.config.load_config` (the authoritative loader that
+    deep-merges DEFAULT_CONFIG, so unmigrated configs still get default
+    values). Honours ``auto_prune`` / ``retention_days`` /
+    ``vacuum_after_prune`` / ``min_interval_hours``, and delegates to the
+    DB. Never raises — maintenance must never block interactive startup.
+    """
+    if session_db is None:
+        return
+    try:
+        from hermes_cli.config import load_config as _load_full_config
+        cfg = (_load_full_config().get("sessions") or {})
+        if not cfg.get("auto_prune", False):
+            return
+        session_db.maybe_auto_prune_and_vacuum(
+            retention_days=int(cfg.get("retention_days", 90)),
+            min_interval_hours=int(cfg.get("min_interval_hours", 24)),
+            vacuum=bool(cfg.get("vacuum_after_prune", True)),
+        )
+    except Exception as exc:
+        logger.debug("state.db auto-maintenance skipped: %s", exc)
+
+
 def _prune_stale_worktrees(repo_root: str, max_age_hours: int = 24) -> None:
    """Remove stale worktrees and orphaned branches on startup.

@@ -1961,7 +1987,13 @@ class HermesCLI:
            self._session_db = SessionDB()
        except Exception as e:
            logger.warning("Failed to initialize SessionDB — session will NOT be indexed for search: %s", e)
-        
+
+        # Opportunistic state.db maintenance — runs at most once per
+        # min_interval_hours, tracked via state_meta in state.db itself so
+        # it's shared across all Hermes processes for this HERMES_HOME.
+        # Never blocks startup on failure.
+        _run_state_db_auto_maintenance(self._session_db)
+
        # Deferred title: stored in memory until the session is created in the DB
        self._pending_title: Optional[str] = None
        
@@ -616,6 +616,8 @@ def load_gateway_config() -> GatewayConfig:
                    if isinstance(frc, list):
                        frc = ",".join(str(v) for v in frc)
                    os.environ["SLACK_FREE_RESPONSE_CHANNELS"] = str(frc)
+                if "reactions" in slack_cfg and not os.getenv("SLACK_REACTIONS"):
+                    os.environ["SLACK_REACTIONS"] = str(slack_cfg["reactions"]).lower()

            # Discord settings → env vars (env vars take precedence)
            discord_cfg = yaml_cfg.get("discord", {})
@@ -26,9 +26,8 @@ from .adapter import (  # noqa: F401
 # -- Onboard (QR-code scan-to-configure) -----------------------------------
 from .onboard import (  # noqa: F401
    BindStatus,
-    create_bind_task,
-    poll_bind_result,
    build_connect_url,
+    qr_register,
 )
 from .crypto import decrypt_secret, generate_bind_key  # noqa: F401

@@ -44,9 +43,8 @@ __all__ = [
    "_ssrf_redirect_guard",
    # onboard
    "BindStatus",
-    "create_bind_task",
-    "poll_bind_result",
    "build_connect_url",
+    "qr_register",
    # crypto
    "decrypt_secret",
    "generate_bind_key",
@@ -1,6 +1,10 @@
 """
 QQBot scan-to-configure (QR code onboard) module.

+Mirrors the Feishu onboarding pattern: synchronous HTTP + a single public
+entry-point ``qr_register()`` that handles the full flow (create task →
+display QR code → poll → decrypt credentials).
+
 Calls the ``q.qq.com`` ``create_bind_task`` / ``poll_bind_result`` APIs to
 generate a QR-code URL and poll for scan completion.  On success the caller
 receives the bot's *app_id*, *client_secret* (decrypted locally), and the
@@ -12,18 +16,20 @@ Reference: https://bot.q.qq.com/wiki/develop/api-v2/
 from __future__ import annotations

 import logging
+import time
 from enum import IntEnum
-from typing import Tuple
+from typing import Optional, Tuple
 from urllib.parse import quote

 from .constants import (
    ONBOARD_API_TIMEOUT,
    ONBOARD_CREATE_PATH,
+    ONBOARD_POLL_INTERVAL,
    ONBOARD_POLL_PATH,
    PORTAL_HOST,
    QR_URL_TEMPLATE,
 )
-from .crypto import generate_bind_key
+from .crypto import decrypt_secret, generate_bind_key
 from .utils import get_api_headers

 logger = logging.getLogger(__name__)
@@ -35,7 +41,7 @@ logger = logging.getLogger(__name__)


 class BindStatus(IntEnum):
-    """Status codes returned by ``poll_bind_result``."""
+    """Status codes returned by ``_poll_bind_result``."""

    NONE = 0
    PENDING = 1
@@ -44,18 +50,40 @@ class BindStatus(IntEnum):


 # ---------------------------------------------------------------------------
-# Public API
+# QR rendering
+# ---------------------------------------------------------------------------
+
+try:
+    import qrcode as _qrcode_mod
+except (ImportError, TypeError):
+    _qrcode_mod = None  # type: ignore[assignment]
+
+
+def _render_qr(url: str) -> bool:
+    """Try to render a QR code in the terminal. Returns True if successful."""
+    if _qrcode_mod is None:
+        return False
+    try:
+        qr = _qrcode_mod.QRCode(
+            error_correction=_qrcode_mod.constants.ERROR_CORRECT_M,
+            border=2,
+        )
+        qr.add_data(url)
+        qr.make(fit=True)
+        qr.print_ascii(invert=True)
+        return True
+    except Exception:
+        return False
+
+
+# ---------------------------------------------------------------------------
+# Synchronous HTTP helpers (mirrors Feishu _post_registration pattern)
 # ---------------------------------------------------------------------------


-async def create_bind_task(
-    timeout: float = ONBOARD_API_TIMEOUT,
-) -> Tuple[str, str]:
+def _create_bind_task(timeout: float = ONBOARD_API_TIMEOUT) -> Tuple[str, str]:
    """Create a bind task and return *(task_id, aes_key_base64)*.

-    The AES key is generated locally and sent to the server so it can
-    encrypt the bot credentials before returning them.
-
    Raises:
        RuntimeError: If the API returns a non-zero ``retcode``.
    """
@@ -64,8 +92,8 @@ async def create_bind_task(
    url = f"https://{PORTAL_HOST}{ONBOARD_CREATE_PATH}"
    key = generate_bind_key()

-    async with httpx.AsyncClient(timeout=timeout, follow_redirects=True) as client:
-        resp = await client.post(url, json={"key": key}, headers=get_api_headers())
+    with httpx.Client(timeout=timeout, follow_redirects=True) as client:
+        resp = client.post(url, json={"key": key}, headers=get_api_headers())
        resp.raise_for_status()
        data = resp.json()

@@ -80,7 +108,7 @@ async def create_bind_task(
    return task_id, key


-async def poll_bind_result(
+def _poll_bind_result(
    task_id: str,
    timeout: float = ONBOARD_API_TIMEOUT,
 ) -> Tuple[BindStatus, str, str, str]:
@@ -89,12 +117,6 @@ async def poll_bind_result(
    Returns:
        A 4-tuple of ``(status, bot_appid, bot_encrypt_secret, user_openid)``.

-        * ``bot_encrypt_secret`` is AES-256-GCM encrypted — decrypt it with
-          :func:`~gateway.platforms.qqbot.crypto.decrypt_secret` using the
-          key from :func:`create_bind_task`.
-        * ``user_openid`` is the OpenID of the person who scanned the code
-          (available when ``status == COMPLETED``).
-
    Raises:
        RuntimeError: If the API returns a non-zero ``retcode``.
    """
@@ -102,8 +124,8 @@ async def poll_bind_result(

    url = f"https://{PORTAL_HOST}{ONBOARD_POLL_PATH}"

-    async with httpx.AsyncClient(timeout=timeout, follow_redirects=True) as client:
-        resp = await client.post(url, json={"task_id": task_id}, headers=get_api_headers())
+    with httpx.Client(timeout=timeout, follow_redirects=True) as client:
+        resp = client.post(url, json={"task_id": task_id}, headers=get_api_headers())
        resp.raise_for_status()
        data = resp.json()

@@ -122,3 +144,77 @@ async def poll_bind_result(
 def build_connect_url(task_id: str) -> str:
    """Build the QR-code target URL for a given *task_id*."""
    return QR_URL_TEMPLATE.format(task_id=quote(task_id))
+
+
+# ---------------------------------------------------------------------------
+# Public entry-point
+# ---------------------------------------------------------------------------
+
+_MAX_REFRESHES = 3
+
+
+def qr_register(timeout_seconds: int = 600) -> Optional[dict]:
+    """Run the QQBot scan-to-configure QR registration flow.
+
+    Mirrors ``feishu.qr_register()``: handles create → display → poll →
+    decrypt in one call.  Unexpected errors propagate to the caller.
+
+    :returns:
+        ``{"app_id": ..., "client_secret": ..., "user_openid": ...}`` on
+        success, or ``None`` on failure / expiry / cancellation.
+    """
+    deadline = time.monotonic() + timeout_seconds
+
+    for refresh_count in range(_MAX_REFRESHES + 1):
+        # ── Create bind task ──
+        try:
+            task_id, aes_key = _create_bind_task()
+        except Exception as exc:
+            logger.warning("[QQBot onboard] Failed to create bind task: %s", exc)
+            return None
+
+        url = build_connect_url(task_id)
+
+        # ── Display QR code + URL ──
+        print()
+        if _render_qr(url):
+            print(f"  Scan the QR code above, or open this URL directly:\n  {url}")
+        else:
+            print(f"  Open this URL in QQ on your phone:\n  {url}")
+            print("  Tip: pip install qrcode  to display a scannable QR code here")
+        print()
+
+        # ── Poll loop ──
+        while time.monotonic() < deadline:
+            try:
+                status, app_id, encrypted_secret, user_openid = _poll_bind_result(task_id)
+            except Exception:
+                time.sleep(ONBOARD_POLL_INTERVAL)
+                continue
+
+            if status == BindStatus.COMPLETED:
+                client_secret = decrypt_secret(encrypted_secret, aes_key)
+                print()
+                print(f"  QR scan complete! (App ID: {app_id})")
+                if user_openid:
+                    print(f"  Scanner's OpenID: {user_openid}")
+                return {
+                    "app_id": app_id,
+                    "client_secret": client_secret,
+                    "user_openid": user_openid,
+                }
+
+            if status == BindStatus.EXPIRED:
+                if refresh_count >= _MAX_REFRESHES:
+                    logger.warning("[QQBot onboard] QR code expired %d times — giving up", _MAX_REFRESHES)
+                    return None
+                print(f"\n  QR code expired, refreshing... ({refresh_count + 1}/{_MAX_REFRESHES})")
+                break  # next for-loop iteration creates a new task
+
+            time.sleep(ONBOARD_POLL_INTERVAL)
+        else:
+            # deadline reached without completing
+            logger.warning("[QQBot onboard] Poll timed out after %ds", timeout_seconds)
+            return None
+
+    return None
@@ -38,6 +38,7 @@ from gateway.platforms.base import (
    BasePlatformAdapter,
    MessageEvent,
    MessageType,
+    ProcessingOutcome,
    SendResult,
    SUPPORTED_DOCUMENT_TYPES,
    safe_url_for_log,
@@ -113,6 +114,11 @@ class SlackAdapter(BasePlatformAdapter):
        # Cache for _fetch_thread_context results: cache_key → _ThreadContextCache
        self._thread_context_cache: Dict[str, _ThreadContextCache] = {}
        self._THREAD_CACHE_TTL = 60.0
+        # Track message IDs that should get reaction lifecycle (DMs / @mentions).
+        self._reacting_message_ids: set = set()
+        # Track active assistant thread status indicators so stop_typing can
+        # clear them (chat_id → thread_ts).
+        self._active_status_threads: Dict[str, str] = {}

    async def connect(self) -> bool:
        """Connect to Slack via Socket Mode."""
@@ -362,6 +368,7 @@ class SlackAdapter(BasePlatformAdapter):
        if not thread_ts:
            return  # Can only set status in a thread context

+        self._active_status_threads[chat_id] = thread_ts
        try:
            await self._get_client(chat_id).assistant_threads_setStatus(
                channel_id=chat_id,
@@ -373,6 +380,22 @@ class SlackAdapter(BasePlatformAdapter):
            # in an assistant-enabled context. Falls back to reactions.
            logger.debug("[Slack] assistant.threads.setStatus failed: %s", e)

+    async def stop_typing(self, chat_id: str) -> None:
+        """Clear the assistant thread status indicator."""
+        if not self._app:
+            return
+        thread_ts = self._active_status_threads.pop(chat_id, None)
+        if not thread_ts:
+            return
+        try:
+            await self._get_client(chat_id).assistant_threads_setStatus(
+                channel_id=chat_id,
+                thread_ts=thread_ts,
+                status="",
+            )
+        except Exception as e:
+            logger.debug("[Slack] assistant.threads.setStatus clear failed: %s", e)
+
    def _dm_top_level_threads_as_sessions(self) -> bool:
        """Whether top-level Slack DMs get per-message session threads.

@@ -584,6 +607,38 @@ class SlackAdapter(BasePlatformAdapter):
            logger.debug("[Slack] reactions.remove failed (%s): %s", emoji, e)
            return False

+    def _reactions_enabled(self) -> bool:
+        """Check if message reactions are enabled via config/env."""
+        return os.getenv("SLACK_REACTIONS", "true").lower() not in ("false", "0", "no")
+
+    async def on_processing_start(self, event: MessageEvent) -> None:
+        """Add an in-progress reaction when message processing begins."""
+        if not self._reactions_enabled():
+            return
+        ts = getattr(event, "message_id", None)
+        if not ts or ts not in self._reacting_message_ids:
+            return
+        channel_id = getattr(event.source, "chat_id", None)
+        if channel_id:
+            await self._add_reaction(channel_id, ts, "eyes")
+
+    async def on_processing_complete(self, event: MessageEvent, outcome: ProcessingOutcome) -> None:
+        """Swap the in-progress reaction for a final success/failure reaction."""
+        if not self._reactions_enabled():
+            return
+        ts = getattr(event, "message_id", None)
+        if not ts or ts not in self._reacting_message_ids:
+            return
+        self._reacting_message_ids.discard(ts)
+        channel_id = getattr(event.source, "chat_id", None)
+        if not channel_id:
+            return
+        await self._remove_reaction(channel_id, ts, "eyes")
+        if outcome == ProcessingOutcome.SUCCESS:
+            await self._add_reaction(channel_id, ts, "white_check_mark")
+        elif outcome == ProcessingOutcome.FAILURE:
+            await self._add_reaction(channel_id, ts, "x")
+
    # ----- User identity resolution -----

    async def _resolve_user_name(self, user_id: str, chat_id: str = "") -> str:
@@ -1213,17 +1268,12 @@ class SlackAdapter(BasePlatformAdapter):
        # Only react when bot is directly addressed (DM or @mention).
        # In listen-all channels (require_mention=false), reacting to every
        # casual message would be noisy.
-        _should_react = is_dm or is_mentioned
-
+        _should_react = (is_dm or is_mentioned) and self._reactions_enabled()
        if _should_react:
-            await self._add_reaction(channel_id, ts, "eyes")
+            self._reacting_message_ids.add(ts)

        await self.handle_message(msg_event)

-        if _should_react:
-            await self._remove_reaction(channel_id, ts, "eyes")
-            await self._add_reaction(channel_id, ts, "white_check_mark")
-
    # ----- Approval button support (Block Kit) -----

    async def send_exec_approval(
@@ -1464,3 +1464,134 @@ class WeComAdapter(BasePlatformAdapter):
            "name": chat_id,
            "type": "group" if chat_id and chat_id.lower().startswith("group") else "dm",
        }
+
+
+# ------------------------------------------------------------------
+# QR code scan flow for obtaining bot credentials
+# ------------------------------------------------------------------
+
+_QR_GENERATE_URL = "https://work.weixin.qq.com/ai/qc/generate"
+_QR_QUERY_URL = "https://work.weixin.qq.com/ai/qc/query_result"
+_QR_CODE_PAGE = "https://work.weixin.qq.com/ai/qc/gen?source=hermes&scode="
+_QR_POLL_INTERVAL = 3  # seconds
+_QR_POLL_TIMEOUT = 300  # 5 minutes
+
+
+def qr_scan_for_bot_info(
+    *,
+    timeout_seconds: int = _QR_POLL_TIMEOUT,
+) -> Optional[Dict[str, str]]:
+    """Run the WeCom QR scan flow to obtain bot_id and secret.
+
+    Fetches a QR code from WeCom, renders it in the terminal, and polls
+    until the user scans it or the timeout expires.
+
+    Returns ``{"bot_id": ..., "secret": ...}`` on success, ``None`` on
+    failure or timeout.
+
+    Note: the ``work.weixin.qq.com/ai/qc/{generate,query_result}`` endpoints
+    used here are not part of WeCom's public developer API — they back the
+    admin-console web UI's bot-creation flow and may change without notice.
+    The same pattern is used by the feishu/dingtalk QR setup wizards.
+    """
+    try:
+        import urllib.request
+        import urllib.parse
+    except ImportError:  # pragma: no cover
+        logger.error("urllib is required for WeCom QR scan")
+        return None
+
+    generate_url = f"{_QR_GENERATE_URL}?source=hermes"
+
+    # ── Step 1: Fetch QR code ──
+    print("  Connecting to WeCom...", end="", flush=True)
+    try:
+        req = urllib.request.Request(generate_url, headers={"User-Agent": "HermesAgent/1.0"})
+        with urllib.request.urlopen(req, timeout=15) as resp:
+            raw = json.loads(resp.read().decode("utf-8"))
+    except Exception as exc:
+        logger.error("WeCom QR: failed to fetch QR code: %s", exc)
+        print(f" failed: {exc}")
+        return None
+
+    data = raw.get("data") or {}
+    scode = str(data.get("scode") or "").strip()
+    auth_url = str(data.get("auth_url") or "").strip()
+
+    if not scode or not auth_url:
+        logger.error("WeCom QR: unexpected response format: %s", raw)
+        print(" failed: unexpected response format")
+        return None
+
+    print(" done.")
+
+    # ── Step 2: Render QR code in terminal ──
+    print()
+    qr_rendered = False
+    try:
+        import qrcode as _qrcode
+        qr = _qrcode.QRCode()
+        qr.add_data(auth_url)
+        qr.make(fit=True)
+        qr.print_ascii(invert=True)
+        qr_rendered = True
+    except ImportError:
+        pass
+    except Exception:
+        pass
+
+    page_url = f"{_QR_CODE_PAGE}{urllib.parse.quote(scode)}"
+    if qr_rendered:
+        print(f"\n  Scan the QR code above, or open this URL directly:\n  {page_url}")
+    else:
+        print(f"  Open this URL in WeCom on your phone:\n\n  {page_url}\n")
+        print("  Tip: pip install qrcode  to display a scannable QR code here next time")
+    print()
+    print("  Fetching configuration results...", end="", flush=True)
+
+    # ── Step 3: Poll for result ──
+    import time
+    deadline = time.time() + timeout_seconds
+    query_url = f"{_QR_QUERY_URL}?scode={urllib.parse.quote(scode)}"
+    poll_count = 0
+
+    while time.time() < deadline:
+        try:
+            req = urllib.request.Request(query_url, headers={"User-Agent": "HermesAgent/1.0"})
+            with urllib.request.urlopen(req, timeout=10) as resp:
+                result = json.loads(resp.read().decode("utf-8"))
+        except Exception as exc:
+            logger.debug("WeCom QR poll error: %s", exc)
+            time.sleep(_QR_POLL_INTERVAL)
+            continue
+
+        poll_count += 1
+        # Print a dot on every poll so progress is visible within 3s.
+        print(".", end="", flush=True)
+
+        result_data = result.get("data") or {}
+        status = str(result_data.get("status") or "").lower()
+
+        if status == "success":
+            print()  # newline after "Fetching configuration results..." dots
+            bot_info = result_data.get("bot_info") or {}
+            bot_id = str(bot_info.get("botid") or bot_info.get("bot_id") or "").strip()
+            secret = str(bot_info.get("secret") or "").strip()
+            if bot_id and secret:
+                return {"bot_id": bot_id, "secret": secret}
+            logger.warning(
+                "WeCom QR: scan reported success but bot_info missing or incomplete: %s",
+                result_data,
+            )
+            print(
+                "  QR scan reported success but no bot credentials were returned.\n"
+                "  This usually means the bot was not actually created on the WeCom side.\n"
+                "  Falling back to manual credential entry."
+            )
+            return None
+
+        time.sleep(_QR_POLL_INTERVAL)
+
+    print()  # newline after dots
+    print(f"  QR scan timed out ({timeout_seconds // 60} minutes). Please try again.")
+    return None
@@ -710,7 +710,26 @@ class GatewayRunner:
            self._session_db = SessionDB()
        except Exception as e:
            logger.debug("SQLite session store not available: %s", e)
-        
+
+        # Opportunistic state.db maintenance: prune ended sessions older
+        # than sessions.retention_days + optional VACUUM. Tracks last-run
+        # in state_meta so it only actually executes once per
+        # sessions.min_interval_hours.  Gateway is long-lived so blocking
+        # a few seconds once per day is acceptable; failures are logged
+        # but never raised.
+        if self._session_db is not None:
+            try:
+                from hermes_cli.config import load_config as _load_full_config
+                _sess_cfg = (_load_full_config().get("sessions") or {})
+                if _sess_cfg.get("auto_prune", False):
+                    self._session_db.maybe_auto_prune_and_vacuum(
+                        retention_days=int(_sess_cfg.get("retention_days", 90)),
+                        min_interval_hours=int(_sess_cfg.get("min_interval_hours", 24)),
+                        vacuum=bool(_sess_cfg.get("vacuum_after_prune", True)),
+                    )
+            except Exception as exc:
+                logger.debug("state.db auto-maintenance skipped: %s", exc)
+
        # DM pairing store for code-based user authorization
        from gateway.pairing import PairingStore
        self.pairing_store = PairingStore()
@@ -6456,6 +6475,11 @@ class GatewayRunner:
                    session_id=task_id,
                    platform=platform_key,
                    user_id=source.user_id,
+                    user_name=source.user_name,
+                    chat_id=source.chat_id,
+                    chat_name=source.chat_name,
+                    chat_type=source.chat_type,
+                    thread_id=source.thread_id,
                    session_db=self._session_db,
                    fallback_model=self._fallback_model,
                )
@@ -7216,6 +7240,7 @@ class GatewayRunner:
                    tool_calls=msg.get("tool_calls"),
                    tool_call_id=msg.get("tool_call_id"),
                    reasoning=msg.get("reasoning"),
+                    reasoning_content=msg.get("reasoning_content"),
                )
            except Exception:
                pass  # Best-effort copy
@@ -9698,6 +9723,11 @@ class GatewayRunner:
                    session_id=session_id,
                    platform=platform_key,
                    user_id=source.user_id,
+                    user_name=source.user_name,
+                    chat_id=source.chat_id,
+                    chat_name=source.chat_name,
+                    chat_type=source.chat_type,
+                    thread_id=source.thread_id,
                    gateway_session_key=session_key,
                    session_db=self._session_db,
                    fallback_model=self._fallback_model,
@@ -1147,6 +1147,10 @@ class SessionStore:
                    tool_name=message.get("tool_name"),
                    tool_calls=message.get("tool_calls"),
                    tool_call_id=message.get("tool_call_id"),
+                    reasoning=message.get("reasoning") if message.get("role") == "assistant" else None,
+                    reasoning_content=message.get("reasoning_content") if message.get("role") == "assistant" else None,
+                    reasoning_details=message.get("reasoning_details") if message.get("role") == "assistant" else None,
+                    codex_reasoning_items=message.get("codex_reasoning_items") if message.get("role") == "assistant" else None,
                )
            except Exception as e:
                logger.debug("Session DB operation failed: %s", e)
@@ -1176,6 +1180,7 @@ class SessionStore:
                        tool_calls=msg.get("tool_calls"),
                        tool_call_id=msg.get("tool_call_id"),
                        reasoning=msg.get("reasoning") if role == "assistant" else None,
+                        reasoning_content=msg.get("reasoning_content") if role == "assistant" else None,
                        reasoning_details=msg.get("reasoning_details") if role == "assistant" else None,
                        codex_reasoning_items=msg.get("codex_reasoning_items") if role == "assistant" else None,
                    )
@@ -72,6 +72,8 @@ DEFAULT_QWEN_BASE_URL = "https://portal.qwen.ai/v1"
 DEFAULT_GITHUB_MODELS_BASE_URL = "https://api.githubcopilot.com"
 DEFAULT_COPILOT_ACP_BASE_URL = "acp://copilot"
 DEFAULT_OLLAMA_CLOUD_BASE_URL = "https://ollama.com/v1"
+STEPFUN_STEP_PLAN_INTL_BASE_URL = "https://api.stepfun.ai/step_plan/v1"
+STEPFUN_STEP_PLAN_CN_BASE_URL = "https://api.stepfun.com/step_plan/v1"
 CODEX_OAUTH_CLIENT_ID = "app_EMoamEEZ73f0CkXaXp7hrann"
 CODEX_OAUTH_TOKEN_URL = "https://auth.openai.com/oauth/token"
 CODEX_ACCESS_TOKEN_REFRESH_SKEW_SECONDS = 120
@@ -182,6 +184,14 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
        inference_base_url="https://api.moonshot.cn/v1",
        api_key_env_vars=("KIMI_CN_API_KEY",),
    ),
+    "stepfun": ProviderConfig(
+        id="stepfun",
+        name="StepFun Step Plan",
+        auth_type="api_key",
+        inference_base_url=STEPFUN_STEP_PLAN_INTL_BASE_URL,
+        api_key_env_vars=("STEPFUN_API_KEY",),
+        base_url_env_var="STEPFUN_BASE_URL",
+    ),
    "arcee": ProviderConfig(
        id="arcee",
        name="Arcee AI",
@@ -992,6 +1002,7 @@ def resolve_provider(
        "x-ai": "xai", "x.ai": "xai", "grok": "xai",
        "kimi": "kimi-coding", "kimi-for-coding": "kimi-coding", "moonshot": "kimi-coding",
        "kimi-cn": "kimi-coding-cn", "moonshot-cn": "kimi-coding-cn",
+        "step": "stepfun", "stepfun-coding-plan": "stepfun",
        "arcee-ai": "arcee", "arceeai": "arcee",
        "minimax-china": "minimax-cn", "minimax_cn": "minimax-cn",
        "claude": "anthropic", "claude-code": "anthropic",
@@ -893,6 +893,34 @@ DEFAULT_CONFIG = {
        "force_ipv4": False,
    },

+    # Session storage — controls automatic cleanup of ~/.hermes/state.db.
+    # state.db accumulates every session, message, tool call, and FTS5 index
+    # entry forever.  Without auto-pruning, a heavy user (gateway + cron)
+    # reports 384MB+ databases with 68K+ messages, which slows down FTS5
+    # inserts, /resume listing, and insights queries.
+    "sessions": {
+        # When true, prune ended sessions older than retention_days once
+        # per (roughly) min_interval_hours at CLI/gateway/cron startup.
+        # Only touches ended sessions — active sessions are always preserved.
+        # Default false: session history is valuable for search recall, and
+        # silently deleting it could surprise users.  Opt in explicitly.
+        "auto_prune": False,
+        # How many days of ended-session history to keep.  Matches the
+        # default of ``hermes sessions prune``.
+        "retention_days": 90,
+        # VACUUM after a prune that actually deleted rows.  SQLite does not
+        # reclaim disk space on DELETE — freed pages are just reused on
+        # subsequent INSERTs — so without VACUUM the file stays bloated
+        # even after pruning.  VACUUM blocks writes for a few seconds per
+        # 100MB, so it only runs at startup, and only when prune deleted
+        # ≥1 session.
+        "vacuum_after_prune": True,
+        # Minimum hours between auto-maintenance runs (avoids repeating
+        # the sweep on every CLI invocation).  Tracked via state_meta in
+        # state.db itself, so it's shared across all processes.
+        "min_interval_hours": 24,
+    },
+
    # Config schema version - bump this when adding new required fields
    "_config_version": 22,
 }
@@ -1050,6 +1078,22 @@ OPTIONAL_ENV_VARS = {
        "category": "provider",
        "advanced": True,
    },
+    "STEPFUN_API_KEY": {
+        "description": "StepFun Step Plan API key",
+        "prompt": "StepFun Step Plan API key",
+        "url": "https://platform.stepfun.com/",
+        "password": True,
+        "category": "provider",
+        "advanced": True,
+    },
+    "STEPFUN_BASE_URL": {
+        "description": "StepFun Step Plan base URL override",
+        "prompt": "StepFun Step Plan base URL (leave empty for default)",
+        "url": None,
+        "password": False,
+        "category": "provider",
+        "advanced": True,
+    },
    "ARCEEAI_API_KEY": {
        "description": "Arcee AI API key",
        "prompt": "Arcee AI API key",
@@ -2102,6 +2146,7 @@ _KNOWN_ROOT_KEYS = {
    "fallback_providers", "credential_pool_strategies", "toolsets",
    "agent", "terminal", "display", "compression", "delegation",
    "auxiliary", "custom_providers", "context", "memory", "gateway",
+    "sessions",
 }

 # Valid fields inside a custom_providers list entry
@@ -912,6 +912,7 @@ def run_doctor(args):
    _apikey_providers = [
        ("Z.AI / GLM",      ("GLM_API_KEY", "ZAI_API_KEY", "Z_AI_API_KEY"), "https://api.z.ai/api/paas/v4/models", "GLM_BASE_URL", True),
        ("Kimi / Moonshot",  ("KIMI_API_KEY",),                              "https://api.moonshot.ai/v1/models",   "KIMI_BASE_URL", True),
+        ("StepFun Step Plan",   ("STEPFUN_API_KEY",),                           "https://api.stepfun.ai/step_plan/v1/models", "STEPFUN_BASE_URL", True),
        ("Kimi / Moonshot (China)", ("KIMI_CN_API_KEY",),                    "https://api.moonshot.cn/v1/models",   None, True),
        ("Arcee AI",         ("ARCEEAI_API_KEY",),                            "https://api.arcee.ai/api/v1/models",  "ARCEE_BASE_URL", True),
        ("DeepSeek",         ("DEEPSEEK_API_KEY",),                           "https://api.deepseek.com/v1/models",  "DEEPSEEK_BASE_URL", True),
@@ -160,6 +160,8 @@ def load_hermes_dotenv(
    # Fix corrupted .env files before python-dotenv parses them (#8908).
    if user_env.exists():
        _sanitize_env_file_if_needed(user_env)
+    if project_env_path and project_env_path.exists():
+        _sanitize_env_file_if_needed(project_env_path)

    if user_env.exists():
        _load_dotenv_with_fallback(user_env, override=True)
@@ -2639,9 +2639,120 @@ def _setup_dingtalk():


 def _setup_wecom():
-    """Configure WeCom (Enterprise WeChat) via the standard platform setup."""
-    wecom_platform = next(p for p in _PLATFORMS if p["key"] == "wecom")
-    _setup_standard_platform(wecom_platform)
+    """Interactive setup for WeCom — scan QR code or manual credential input."""
+    print()
+    print(color("  ─── 💬 WeCom (Enterprise WeChat) Setup ───", Colors.CYAN))
+
+    existing_bot_id = get_env_value("WECOM_BOT_ID")
+    existing_secret = get_env_value("WECOM_SECRET")
+    if existing_bot_id and existing_secret:
+        print()
+        print_success("WeCom is already configured.")
+        if not prompt_yes_no("  Reconfigure WeCom?", False):
+            return
+
+    # ── Choose setup method ──
+    print()
+    method_choices = [
+        "Scan QR code to obtain Bot ID and Secret automatically (recommended)",
+        "Enter existing Bot ID and Secret manually",
+    ]
+    method_idx = prompt_choice("  How would you like to set up WeCom?", method_choices, 0)
+
+    bot_id = None
+    secret = None
+
+    if method_idx == 0:
+        # ── QR scan flow ──
+        try:
+            from gateway.platforms.wecom import qr_scan_for_bot_info
+        except Exception as exc:
+            print_error(f"  WeCom QR scan import failed: {exc}")
+            qr_scan_for_bot_info = None
+
+        if qr_scan_for_bot_info is not None:
+            try:
+                credentials = qr_scan_for_bot_info()
+            except KeyboardInterrupt:
+                print()
+                print_warning("  WeCom setup cancelled.")
+                return
+            except Exception as exc:
+                print_warning(f"  QR scan failed: {exc}")
+                credentials = None
+            if credentials:
+                bot_id = credentials.get("bot_id", "")
+                secret = credentials.get("secret", "")
+                print_success("  ✔ QR scan successful! Bot ID and Secret obtained.")
+
+        if not bot_id or not secret:
+            print_info("  QR scan did not complete. Continuing with manual input.")
+            bot_id = None
+            secret = None
+
+    # ── Manual credential input ──
+    if not bot_id or not secret:
+        print()
+        print_info("  1. Go to WeCom Application → Workspace → Smart Robot -> Create smart robots")
+        print_info("  2. Select API Mode")
+        print_info("  3. Copy the Bot ID and Secret from the bot's credentials info")
+        print_info("  4. The bot connects via WebSocket — no public endpoint needed")
+        print()
+        bot_id = prompt("  Bot ID", password=False)
+        if not bot_id:
+            print_warning("  Skipped — WeCom won't work without a Bot ID.")
+            return
+        secret = prompt("  Secret", password=True)
+        if not secret:
+            print_warning("  Skipped — WeCom won't work without a Secret.")
+            return
+
+    # ── Save core credentials ──
+    save_env_value("WECOM_BOT_ID", bot_id)
+    save_env_value("WECOM_SECRET", secret)
+
+    # ── Allowed users (deny-by-default security) ──
+    print()
+    print_info("  The gateway DENIES all users by default for security.")
+    print_info("  Enter user IDs to create an allowlist, or leave empty.")
+    allowed = prompt("  Allowed user IDs (comma-separated, or empty)", password=False)
+    if allowed:
+        cleaned = allowed.replace(" ", "")
+        save_env_value("WECOM_ALLOWED_USERS", cleaned)
+        print_success("  Saved — only these users can interact with the bot.")
+    else:
+        print()
+        access_choices = [
+            "Enable open access (anyone can message the bot)",
+            "Use DM pairing (unknown users request access, you approve with 'hermes pairing approve')",
+            "Disable direct messages",
+            "Skip for now (bot will deny all users until configured)",
+        ]
+        access_idx = prompt_choice("  How should unauthorized users be handled?", access_choices, 1)
+        if access_idx == 0:
+            save_env_value("WECOM_DM_POLICY", "open")
+            save_env_value("GATEWAY_ALLOW_ALL_USERS", "true")
+            print_warning("  Open access enabled — anyone can use your bot!")
+        elif access_idx == 1:
+            save_env_value("WECOM_DM_POLICY", "pairing")
+            print_success("  DM pairing mode — users will receive a code to request access.")
+            print_info("  Approve with: hermes pairing approve <platform> <code>")
+        elif access_idx == 2:
+            save_env_value("WECOM_DM_POLICY", "disabled")
+            print_warning("  Direct messages disabled.")
+        else:
+            print_info("  Skipped — configure later with 'hermes gateway setup'")
+
+    # ── Home channel (optional) ──
+    print()
+    print_info("  Chat ID for scheduled results and notifications.")
+    home = prompt("  Home chat ID (optional, for cron/notifications)", password=False)
+    if home:
+        save_env_value("WECOM_HOME_CHANNEL", home)
+        print_success(f"  Home channel set to {home}")
+
+    print()
+    print_success("💬 WeCom configured!")


 def _is_service_installed() -> bool:
@@ -3021,7 +3132,8 @@ def _setup_qqbot():
    if method_idx == 0:
        # ── QR scan-to-configure ──
        try:
-            credentials = _qqbot_qr_flow()
+            from gateway.platforms.qqbot import qr_register
+            credentials = qr_register()
        except KeyboardInterrupt:
            print()
            print_warning("  QQ Bot setup cancelled.")
@@ -3103,106 +3215,6 @@ def _setup_qqbot():
    print_info(f"  App ID: {credentials['app_id']}")


-def _qqbot_render_qr(url: str) -> bool:
-    """Try to render a QR code in the terminal. Returns True if successful."""
-    try:
-        import qrcode as _qr
-        qr = _qr.QRCode(border=1,error_correction=_qr.constants.ERROR_CORRECT_L)
-        qr.add_data(url)
-        qr.make(fit=True)
-        qr.print_ascii(invert=True)
-        return True
-    except Exception:
-        return False
-
-
-def _qqbot_qr_flow():
-    """Run the QR-code scan-to-configure flow.
-
-    Returns a dict with app_id, client_secret, user_openid on success,
-    or None on failure/cancel.
-    """
-    try:
-        from gateway.platforms.qqbot import (
-            create_bind_task, poll_bind_result, build_connect_url,
-            decrypt_secret, BindStatus,
-        )
-        from gateway.platforms.qqbot.constants import ONBOARD_POLL_INTERVAL
-    except Exception as exc:
-        print_error(f"  QQBot onboard import failed: {exc}")
-        return None
-
-    import asyncio
-    import time
-
-    MAX_REFRESHES = 3
-    refresh_count = 0
-
-    while refresh_count <= MAX_REFRESHES:
-        loop = asyncio.new_event_loop()
-
-        # ── Create bind task ──
-        try:
-            task_id, aes_key = loop.run_until_complete(create_bind_task())
-        except Exception as e:
-            print_warning(f"  Failed to create bind task: {e}")
-            loop.close()
-            return None
-
-        url = build_connect_url(task_id)
-
-        # ── Display QR code + URL ──
-        print()
-        if _qqbot_render_qr(url):
-            print(f"  Scan the QR code above, or open this URL directly:\n  {url}")
-        else:
-            print(f"  Open this URL in QQ on your phone:\n  {url}")
-            print_info("  Tip: pip install qrcode  to show a scannable QR code here")
-
-        # ── Poll loop (silent — keep QR visible at bottom) ──
-        try:
-            while True:
-                try:
-                    status, app_id, encrypted_secret, user_openid = loop.run_until_complete(
-                        poll_bind_result(task_id)
-                    )
-                except Exception:
-                    time.sleep(ONBOARD_POLL_INTERVAL)
-                    continue
-
-                if status == BindStatus.COMPLETED:
-                    client_secret = decrypt_secret(encrypted_secret, aes_key)
-                    print()
-                    print_success(f"  QR scan complete! (App ID: {app_id})")
-                    if user_openid:
-                        print_info(f"  Scanner's OpenID: {user_openid}")
-                    return {
-                        "app_id": app_id,
-                        "client_secret": client_secret,
-                        "user_openid": user_openid,
-                    }
-
-                if status == BindStatus.EXPIRED:
-                    refresh_count += 1
-                    if refresh_count > MAX_REFRESHES:
-                        print()
-                        print_warning(f"  QR code expired {MAX_REFRESHES} times — giving up.")
-                        return None
-                    print()
-                    print_warning(f"  QR code expired, refreshing... ({refresh_count}/{MAX_REFRESHES})")
-                    loop.close()
-                    break  # outer while creates a new task
-
-                time.sleep(ONBOARD_POLL_INTERVAL)
-        except KeyboardInterrupt:
-            loop.close()
-            raise
-        finally:
-            loop.close()
-
-    return None
-
-
 def _setup_signal():
    """Interactive setup for Signal messenger."""
    import shutil
@@ -3390,6 +3402,8 @@ def gateway_setup():
            _setup_feishu()
        elif platform["key"] == "qqbot":
            _setup_qqbot()
+        elif platform["key"] == "wecom":
+            _setup_wecom()
        else:
            _setup_standard_platform(platform)

@@ -1566,6 +1566,8 @@ def select_provider_and_model(args=None):
        _model_flow_anthropic(config, current_model)
    elif selected_provider == "kimi-coding":
        _model_flow_kimi(config, current_model)
+    elif selected_provider == "stepfun":
+        _model_flow_stepfun(config, current_model)
    elif selected_provider == "bedrock":
        _model_flow_bedrock(config, current_model)
    elif selected_provider in (
@@ -3462,6 +3464,140 @@ def _model_flow_kimi(config, current_model=""):
        print("No change.")


+def _infer_stepfun_region(base_url: str) -> str:
+    """Infer the current StepFun region from the configured endpoint."""
+    normalized = (base_url or "").strip().lower()
+    if "api.stepfun.com" in normalized:
+        return "china"
+    return "international"
+
+
+def _stepfun_base_url_for_region(region: str) -> str:
+    from hermes_cli.auth import (
+        STEPFUN_STEP_PLAN_CN_BASE_URL,
+        STEPFUN_STEP_PLAN_INTL_BASE_URL,
+    )
+
+    return (
+        STEPFUN_STEP_PLAN_CN_BASE_URL
+        if region == "china"
+        else STEPFUN_STEP_PLAN_INTL_BASE_URL
+    )
+
+
+def _model_flow_stepfun(config, current_model=""):
+    """StepFun Step Plan flow with region-specific endpoints."""
+    from hermes_cli.auth import (
+        PROVIDER_REGISTRY,
+        _prompt_model_selection,
+        _save_model_choice,
+        deactivate_provider,
+    )
+    from hermes_cli.config import get_env_value, save_env_value, load_config, save_config
+    from hermes_cli.models import fetch_api_models
+
+    provider_id = "stepfun"
+    pconfig = PROVIDER_REGISTRY[provider_id]
+    key_env = pconfig.api_key_env_vars[0] if pconfig.api_key_env_vars else ""
+    base_url_env = pconfig.base_url_env_var or ""
+
+    existing_key = ""
+    for ev in pconfig.api_key_env_vars:
+        existing_key = get_env_value(ev) or os.getenv(ev, "")
+        if existing_key:
+            break
+
+    if not existing_key:
+        print(f"No {pconfig.name} API key configured.")
+        if key_env:
+            try:
+                import getpass
+                new_key = getpass.getpass(f"{key_env} (or Enter to cancel): ").strip()
+            except (KeyboardInterrupt, EOFError):
+                print()
+                return
+            if not new_key:
+                print("Cancelled.")
+                return
+            save_env_value(key_env, new_key)
+            existing_key = new_key
+            print("API key saved.")
+            print()
+    else:
+        print(f"  {pconfig.name} API key: {existing_key[:8]}... ✓")
+        print()
+
+    current_base = ""
+    if base_url_env:
+        current_base = get_env_value(base_url_env) or os.getenv(base_url_env, "")
+    if not current_base:
+        model_cfg = config.get("model")
+        if isinstance(model_cfg, dict):
+            current_base = str(model_cfg.get("base_url") or "").strip()
+    current_region = _infer_stepfun_region(current_base or pconfig.inference_base_url)
+
+    region_choices = [
+        ("international", f"International ({_stepfun_base_url_for_region('international')})"),
+        ("china", f"China ({_stepfun_base_url_for_region('china')})"),
+    ]
+    ordered_regions = []
+    for region_key, label in region_choices:
+        if region_key == current_region:
+            ordered_regions.insert(0, (region_key, f"{label}  ← currently active"))
+        else:
+            ordered_regions.append((region_key, label))
+    ordered_regions.append(("cancel", "Cancel"))
+
+    region_idx = _prompt_provider_choice([label for _, label in ordered_regions])
+    if region_idx is None or ordered_regions[region_idx][0] == "cancel":
+        print("No change.")
+        return
+
+    selected_region = ordered_regions[region_idx][0]
+    effective_base = _stepfun_base_url_for_region(selected_region)
+    if base_url_env:
+        save_env_value(base_url_env, effective_base)
+
+    live_models = fetch_api_models(existing_key, effective_base)
+    if live_models:
+        model_list = live_models
+        print(f"  Found {len(model_list)} model(s) from {pconfig.name} API")
+    else:
+        model_list = _PROVIDER_MODELS.get(provider_id, [])
+        if model_list:
+            print(
+                f"  Could not auto-detect models from {pconfig.name} API — "
+                "showing Step Plan fallback catalog."
+            )
+
+    if model_list:
+        selected = _prompt_model_selection(model_list, current_model=current_model)
+    else:
+        try:
+            selected = input("Model name: ").strip()
+        except (KeyboardInterrupt, EOFError):
+            selected = None
+
+    if selected:
+        _save_model_choice(selected)
+
+        cfg = load_config()
+        model = cfg.get("model")
+        if not isinstance(model, dict):
+            model = {"default": model} if model else {}
+            cfg["model"] = model
+        model["provider"] = provider_id
+        model["base_url"] = effective_base
+        model.pop("api_mode", None)
+        save_config(cfg)
+        deactivate_provider()
+
+        config["model"] = dict(model)
+        print(f"Default model set to: {selected} (via {pconfig.name})")
+    else:
+        print("No change.")
+
+
 def _model_flow_bedrock_api_key(config, region, current_model=""):
    """Bedrock API Key mode — uses the OpenAI-compatible bedrock-mantle endpoint.

@@ -6530,6 +6666,7 @@ For more help on a command:
            "zai",
            "kimi-coding",
            "kimi-coding-cn",
+            "stepfun",
            "minimax",
            "minimax-cn",
            "kilocode",
@@ -143,7 +143,7 @@ MODEL_ALIASES: dict[str, ModelIdentity] = {
    # Z.AI / GLM
    "glm":       ModelIdentity("z-ai", "glm"),

-    # StepFun
+    # Step Plan (StepFun)
    "step":      ModelIdentity("stepfun", "step"),

    # Xiaomi
@@ -210,6 +210,10 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
        "kimi-k2-turbo-preview",
        "kimi-k2-0905-preview",
    ],
+    "stepfun": [
+        "step-3.5-flash",
+        "step-3.5-flash-2603",
+    ],
    "moonshot": [
        "kimi-k2.6",
        "kimi-k2.5",
@@ -699,6 +703,7 @@ CANONICAL_PROVIDERS: list[ProviderEntry] = [
    ProviderEntry("zai",            "Z.AI / GLM",               "Z.AI / GLM (Zhipu AI direct API)"),
    ProviderEntry("kimi-coding",    "Kimi / Kimi Coding Plan",  "Kimi Coding Plan (api.kimi.com) & Moonshot API"),
    ProviderEntry("kimi-coding-cn", "Kimi / Moonshot (China)",  "Kimi / Moonshot China (Moonshot CN direct API)"),
+    ProviderEntry("stepfun",        "StepFun Step Plan",       "StepFun Step Plan (agent/coding models via Step Plan API)"),
    ProviderEntry("minimax",        "MiniMax",                  "MiniMax (global direct API)"),
    ProviderEntry("minimax-cn",     "MiniMax (China)",          "MiniMax China (domestic direct API)"),
    ProviderEntry("alibaba",        "Alibaba Cloud (DashScope)","Alibaba Cloud / DashScope Coding (Qwen + multi-provider)"),
@@ -733,6 +738,8 @@ _PROVIDER_ALIASES = {
    "moonshot": "kimi-coding",
    "kimi-cn": "kimi-coding-cn",
    "moonshot-cn": "kimi-coding-cn",
+    "step": "stepfun",
+    "stepfun-coding-plan": "stepfun",
    "arcee-ai": "arcee",
    "arceeai": "arcee",
    "minimax-china": "minimax-cn",
@@ -1613,6 +1620,19 @@ def provider_model_ids(provider: Optional[str], *, force_refresh: bool = False)
                    return live
        except Exception:
            pass
+    if normalized == "stepfun":
+        try:
+            from hermes_cli.auth import resolve_api_key_provider_credentials
+
+            creds = resolve_api_key_provider_credentials("stepfun")
+            api_key = str(creds.get("api_key") or "").strip()
+            base_url = str(creds.get("base_url") or "").strip()
+            if api_key and base_url:
+                live = fetch_api_models(api_key, base_url)
+                if live:
+                    return live
+        except Exception:
+            pass
    if normalized == "anthropic":
        live = _fetch_anthropic_models()
        if live:
@@ -734,6 +734,30 @@ class PluginManager:
                )
                kind = "standalone"

+            # Auto-coerce user-installed memory providers to kind="exclusive"
+            # so they're routed to plugins/memory discovery instead of being
+            # loaded by the general PluginManager (which has no
+            # register_memory_provider on PluginContext). Mirrors the
+            # heuristic in plugins/memory/__init__.py:_is_memory_provider_dir.
+            # Bundled memory providers are already skipped via skip_names.
+            if kind == "standalone" and "kind" not in data:
+                init_file = plugin_dir / "__init__.py"
+                if init_file.exists():
+                    try:
+                        source_text = init_file.read_text(errors="replace")[:8192]
+                        if (
+                            "register_memory_provider" in source_text
+                            or "MemoryProvider" in source_text
+                        ):
+                            kind = "exclusive"
+                            logger.debug(
+                                "Plugin %s: detected memory provider, "
+                                "treating as kind='exclusive'",
+                                key,
+                            )
+                    except Exception:
+                        pass
+
            return PluginManifest(
                name=name,
                version=str(data.get("version", "")),
@@ -94,6 +94,12 @@ HERMES_OVERLAYS: Dict[str, HermesOverlay] = {
        transport="openai_chat",
        base_url_env_var="KIMI_BASE_URL",
    ),
+    "stepfun": HermesOverlay(
+        transport="openai_chat",
+        extra_env_vars=("STEPFUN_API_KEY",),
+        base_url_override="https://api.stepfun.ai/step_plan/v1",
+        base_url_env_var="STEPFUN_BASE_URL",
+    ),
    "minimax": HermesOverlay(
        transport="anthropic_messages",
        base_url_env_var="MINIMAX_BASE_URL",
@@ -210,6 +216,10 @@ ALIASES: Dict[str, str] = {
    "kimi-coding-cn": "kimi-for-coding",
    "moonshot": "kimi-for-coding",

+    # stepfun
+    "step": "stepfun",
+    "stepfun-coding-plan": "stepfun",
+
    # minimax-cn
    "minimax-china": "minimax-cn",
    "minimax_cn": "minimax-cn",
@@ -294,6 +304,7 @@ _LABEL_OVERRIDES: Dict[str, str] = {
    "nous": "Nous Portal",
    "openai-codex": "OpenAI Codex",
    "copilot-acp": "GitHub Copilot ACP",
+    "stepfun": "StepFun Step Plan",
    "xiaomi": "Xiaomi MiMo",
    "local": "Local endpoint",
    "bedrock": "AWS Bedrock",
@@ -96,6 +96,7 @@ _DEFAULT_PROVIDER_MODELS = {
    "zai": ["glm-5.1", "glm-5", "glm-4.7", "glm-4.5", "glm-4.5-flash"],
    "kimi-coding": ["kimi-k2.6", "kimi-k2.5", "kimi-k2-thinking", "kimi-k2-turbo-preview"],
    "kimi-coding-cn": ["kimi-k2.6", "kimi-k2.5", "kimi-k2-thinking", "kimi-k2-turbo-preview"],
+    "stepfun": ["step-3.5-flash", "step-3.5-flash-2603"],
    "arcee": ["trinity-large-thinking", "trinity-large-preview", "trinity-mini"],
    "minimax": ["MiniMax-M2.7", "MiniMax-M2.5", "MiniMax-M2.1", "MiniMax-M2"],
    "minimax-cn": ["MiniMax-M2.7", "MiniMax-M2.5", "MiniMax-M2.1", "MiniMax-M2"],
@@ -804,6 +805,7 @@ def setup_model_provider(config: dict, *, quick: bool = False):
            "zai": "Z.AI / GLM",
            "kimi-coding": "Kimi / Moonshot",
            "kimi-coding-cn": "Kimi / Moonshot (China)",
+            "stepfun": "StepFun Step Plan",
            "minimax": "MiniMax",
            "minimax-cn": "MiniMax CN",
            "anthropic": "Anthropic",
@@ -122,6 +122,7 @@ def show_status(args):
        "OpenAI": "OPENAI_API_KEY",
        "Z.AI/GLM": "GLM_API_KEY",
        "Kimi": "KIMI_API_KEY",
+        "StepFun Step Plan": "STEPFUN_API_KEY",
        "MiniMax": "MINIMAX_API_KEY",
        "MiniMax-CN": "MINIMAX_CN_API_KEY",
        "Firecrawl": "FIRECRAWL_API_KEY",
@@ -252,6 +253,7 @@ def show_status(args):
    apikey_providers = {
        "Z.AI / GLM":       ("GLM_API_KEY", "ZAI_API_KEY", "Z_AI_API_KEY"),
        "Kimi / Moonshot":  ("KIMI_API_KEY",),
+        "StepFun Step Plan": ("STEPFUN_API_KEY",),
        "MiniMax":          ("MINIMAX_API_KEY",),
        "MiniMax (China)":  ("MINIMAX_CN_API_KEY",),
    }
@@ -2189,7 +2189,8 @@ async def get_usage_analytics(days: int = 30):
                   SUM(reasoning_tokens) as reasoning_tokens,
                   COALESCE(SUM(estimated_cost_usd), 0) as estimated_cost,
                   COALESCE(SUM(actual_cost_usd), 0) as actual_cost,
-                   COUNT(*) as sessions
+                   COUNT(*) as sessions,
+                   SUM(COALESCE(api_call_count, 0)) as api_calls
            FROM sessions WHERE started_at > ?
            GROUP BY day ORDER BY day
        """, (cutoff,))
@@ -2200,7 +2201,8 @@ async def get_usage_analytics(days: int = 30):
                   SUM(input_tokens) as input_tokens,
                   SUM(output_tokens) as output_tokens,
                   COALESCE(SUM(estimated_cost_usd), 0) as estimated_cost,
-                   COUNT(*) as sessions
+                   COUNT(*) as sessions,
+                   SUM(COALESCE(api_call_count, 0)) as api_calls
            FROM sessions WHERE started_at > ? AND model IS NOT NULL
            GROUP BY model ORDER BY SUM(input_tokens) + SUM(output_tokens) DESC
        """, (cutoff,))
@@ -2213,7 +2215,8 @@ async def get_usage_analytics(days: int = 30):
                   SUM(reasoning_tokens) as total_reasoning,
                   COALESCE(SUM(estimated_cost_usd), 0) as total_estimated_cost,
                   COALESCE(SUM(actual_cost_usd), 0) as total_actual_cost,
-                   COUNT(*) as total_sessions
+                   COUNT(*) as total_sessions,
+                   SUM(COALESCE(api_call_count, 0)) as total_api_calls
            FROM sessions WHERE started_at > ?
        """, (cutoff,))
        totals = dict(cur3.fetchone())
@@ -31,7 +31,7 @@ T = TypeVar("T")

 DEFAULT_DB_PATH = get_hermes_home() / "state.db"

-SCHEMA_VERSION = 6
+SCHEMA_VERSION = 8

 SCHEMA_SQL = """
 CREATE TABLE IF NOT EXISTS schema_version (
@@ -65,6 +65,7 @@ CREATE TABLE IF NOT EXISTS sessions (
    cost_source TEXT,
    pricing_version TEXT,
    title TEXT,
+    api_call_count INTEGER DEFAULT 0,
    FOREIGN KEY (parent_session_id) REFERENCES sessions(id)
 );

@@ -80,10 +81,16 @@ CREATE TABLE IF NOT EXISTS messages (
    token_count INTEGER,
    finish_reason TEXT,
    reasoning TEXT,
+    reasoning_content TEXT,
    reasoning_details TEXT,
    codex_reasoning_items TEXT
 );

+CREATE TABLE IF NOT EXISTS state_meta (
+    key TEXT PRIMARY KEY,
+    value TEXT
+);
+
 CREATE INDEX IF NOT EXISTS idx_sessions_source ON sessions(source);
 CREATE INDEX IF NOT EXISTS idx_sessions_parent ON sessions(parent_session_id);
 CREATE INDEX IF NOT EXISTS idx_sessions_started ON sessions(started_at DESC);
@@ -249,89 +256,136 @@ class SessionDB:
                self._conn.close()
                self._conn = None

+    @staticmethod
+    def _parse_schema_columns(schema_sql: str) -> Dict[str, Dict[str, str]]:
+        """Extract expected columns per table from SCHEMA_SQL.
+
+        Uses an in-memory SQLite database to parse the SQL — SQLite itself
+        handles all syntax (DEFAULT expressions with commas, inline
+        REFERENCES, CHECK constraints, etc.) so there are zero regex
+        edge cases.  The in-memory DB is opened, the schema DDL is
+        executed, and PRAGMA table_info extracts the column metadata.
+
+        Adding a column to SCHEMA_SQL is all that's needed; the
+        reconciliation loop picks it up automatically.
+        """
+        ref = sqlite3.connect(":memory:")
+        try:
+            ref.executescript(schema_sql)
+            table_columns: Dict[str, Dict[str, str]] = {}
+            for (tbl,) in ref.execute(
+                "SELECT name FROM sqlite_master "
+                "WHERE type='table' AND name NOT LIKE 'sqlite_%'"
+            ).fetchall():
+                cols: Dict[str, str] = {}
+                for row in ref.execute(
+                    f'PRAGMA table_info("{tbl}")'
+                ).fetchall():
+                    # row: (cid, name, type, notnull, dflt_value, pk)
+                    col_name = row[1]
+                    col_type = row[2] or ""
+                    notnull = row[3]
+                    default = row[4]
+                    pk = row[5]
+                    # Reconstruct the type expression for ALTER TABLE ADD COLUMN
+                    parts = [col_type] if col_type else []
+                    if notnull and not pk:
+                        parts.append("NOT NULL")
+                    if default is not None:
+                        parts.append(f"DEFAULT {default}")
+                    cols[col_name] = " ".join(parts)
+                table_columns[tbl] = cols
+            return table_columns
+        finally:
+            ref.close()
+
+    def _reconcile_columns(self, cursor: sqlite3.Cursor) -> None:
+        """Ensure live tables have every column declared in SCHEMA_SQL.
+
+        Follows the Beets/sqlite-utils pattern: the CREATE TABLE definition
+        in SCHEMA_SQL is the single source of truth for the desired schema.
+        On every startup this method diffs the live columns (via PRAGMA
+        table_info) against the declared columns, and ADDs any that are
+        missing.
+
+        This makes column additions a declarative operation — just add
+        the column to SCHEMA_SQL and it appears on the next startup.
+        Version-gated migration blocks are no longer needed for ADD COLUMN.
+        """
+        expected = self._parse_schema_columns(SCHEMA_SQL)
+        for table_name, declared_cols in expected.items():
+            # Get current columns from the live table
+            try:
+                rows = cursor.execute(
+                    f'PRAGMA table_info("{table_name}")'
+                ).fetchall()
+            except sqlite3.OperationalError:
+                continue  # Table doesn't exist yet (shouldn't happen after executescript)
+            live_cols = set()
+            for row in rows:
+                # PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk)
+                name = row[1] if isinstance(row, (tuple, list)) else row["name"]
+                live_cols.add(name)
+
+            for col_name, col_type in declared_cols.items():
+                if col_name not in live_cols:
+                    safe_name = col_name.replace('"', '""')
+                    try:
+                        cursor.execute(
+                            f'ALTER TABLE "{table_name}" ADD COLUMN "{safe_name}" {col_type}'
+                        )
+                    except sqlite3.OperationalError as exc:
+                        # Expected: "duplicate column name" from a race or
+                        # re-run.  Unexpected: "Cannot add a NOT NULL column
+                        # with default value NULL" from a schema mistake.
+                        # Log at DEBUG so it's visible in agent.log.
+                        logger.debug(
+                            "reconcile %s.%s: %s", table_name, col_name, exc,
+                        )
+
    def _init_schema(self):
-        """Create tables and FTS if they don't exist, run migrations."""
+        """Create tables and FTS if they don't exist, reconcile columns.
+
+        Schema management follows the declarative reconciliation pattern
+        (Beets, sqlite-utils): SCHEMA_SQL is the single source of truth.
+        On existing databases, _reconcile_columns() diffs live columns
+        against SCHEMA_SQL and ADDs any missing ones.  This eliminates
+        the version-gated migration chain for column additions, making
+        it impossible for reordered or inserted migrations to skip columns.
+
+        The schema_version table is retained for future data migrations
+        (transforming existing rows) which cannot be handled declaratively.
+        """
        cursor = self._conn.cursor()

        cursor.executescript(SCHEMA_SQL)

-        # Check schema version and run migrations
+        # ── Declarative column reconciliation ──────────────────────────
+        # Diff live tables against SCHEMA_SQL and ADD any missing columns.
+        # This is idempotent and self-healing: even if a version-gated
+        # migration was skipped (e.g. due to version renumbering), the
+        # column gets created here.
+        self._reconcile_columns(cursor)
+
+        # ── Schema version bookkeeping ─────────────────────────────────
+        # Bump to current so future data migrations (if any) can gate on
+        # version.  No version-gated column additions remain.
        cursor.execute("SELECT version FROM schema_version LIMIT 1")
        row = cursor.fetchone()
        if row is None:
-            cursor.execute("INSERT INTO schema_version (version) VALUES (?)", (SCHEMA_VERSION,))
+            cursor.execute(
+                "INSERT INTO schema_version (version) VALUES (?)",
+                (SCHEMA_VERSION,),
+            )
        else:
            current_version = row["version"] if isinstance(row, sqlite3.Row) else row[0]
-            if current_version < 2:
-                # v2: add finish_reason column to messages
-                try:
-                    cursor.execute("ALTER TABLE messages ADD COLUMN finish_reason TEXT")
-                except sqlite3.OperationalError:
-                    pass  # Column already exists
-                cursor.execute("UPDATE schema_version SET version = 2")
-            if current_version < 3:
-                # v3: add title column to sessions
-                try:
-                    cursor.execute("ALTER TABLE sessions ADD COLUMN title TEXT")
-                except sqlite3.OperationalError:
-                    pass  # Column already exists
-                cursor.execute("UPDATE schema_version SET version = 3")
-            if current_version < 4:
-                # v4: add unique index on title (NULLs allowed, only non-NULL must be unique)
-                try:
-                    cursor.execute(
-                        "CREATE UNIQUE INDEX IF NOT EXISTS idx_sessions_title_unique "
-                        "ON sessions(title) WHERE title IS NOT NULL"
-                    )
-                except sqlite3.OperationalError:
-                    pass  # Index already exists
-                cursor.execute("UPDATE schema_version SET version = 4")
-            if current_version < 5:
-                new_columns = [
-                    ("cache_read_tokens", "INTEGER DEFAULT 0"),
-                    ("cache_write_tokens", "INTEGER DEFAULT 0"),
-                    ("reasoning_tokens", "INTEGER DEFAULT 0"),
-                    ("billing_provider", "TEXT"),
-                    ("billing_base_url", "TEXT"),
-                    ("billing_mode", "TEXT"),
-                    ("estimated_cost_usd", "REAL"),
-                    ("actual_cost_usd", "REAL"),
-                    ("cost_status", "TEXT"),
-                    ("cost_source", "TEXT"),
-                    ("pricing_version", "TEXT"),
-                ]
-                for name, column_type in new_columns:
-                    try:
-                        # name and column_type come from the hardcoded tuple above,
-                        # not user input. Double-quote identifier escaping is applied
-                        # as defense-in-depth; SQLite DDL cannot be parameterized.
-                        safe_name = name.replace('"', '""')
-                        cursor.execute(f'ALTER TABLE sessions ADD COLUMN "{safe_name}" {column_type}')
-                    except sqlite3.OperationalError:
-                        pass
-                cursor.execute("UPDATE schema_version SET version = 5")
-            if current_version < 6:
-                # v6: add reasoning columns to messages table — preserves assistant
-                # reasoning text and structured reasoning_details across gateway
-                # session turns.  Without these, reasoning chains are lost on
-                # session reload, breaking multi-turn reasoning continuity for
-                # providers that replay reasoning (OpenRouter, OpenAI, Nous).
-                for col_name, col_type in [
-                    ("reasoning", "TEXT"),
-                    ("reasoning_details", "TEXT"),
-                    ("codex_reasoning_items", "TEXT"),
-                ]:
-                    try:
-                        safe = col_name.replace('"', '""')
-                        cursor.execute(
-                            f'ALTER TABLE messages ADD COLUMN "{safe}" {col_type}'
-                        )
-                    except sqlite3.OperationalError:
-                        pass  # Column already exists
-                cursor.execute("UPDATE schema_version SET version = 6")
+            if current_version < SCHEMA_VERSION:
+                cursor.execute(
+                    "UPDATE schema_version SET version = ?",
+                    (SCHEMA_VERSION,),
+                )

-        # Unique title index — always ensure it exists (safe to run after migrations
-        # since the title column is guaranteed to exist at this point)
+        # Unique title index — always ensure it exists
        try:
            cursor.execute(
                "CREATE UNIQUE INDEX IF NOT EXISTS idx_sessions_title_unique "
@@ -435,6 +489,7 @@ class SessionDB:
        billing_provider: Optional[str] = None,
        billing_base_url: Optional[str] = None,
        billing_mode: Optional[str] = None,
+        api_call_count: int = 0,
        absolute: bool = False,
    ) -> None:
        """Update token counters and backfill model if not already set.
@@ -464,7 +519,8 @@ class SessionDB:
                   billing_provider = COALESCE(billing_provider, ?),
                   billing_base_url = COALESCE(billing_base_url, ?),
                   billing_mode = COALESCE(billing_mode, ?),
-                   model = COALESCE(model, ?)
+                   model = COALESCE(model, ?),
+                   api_call_count = ?
                   WHERE id = ?"""
        else:
            sql = """UPDATE sessions SET
@@ -484,7 +540,8 @@ class SessionDB:
                   billing_provider = COALESCE(billing_provider, ?),
                   billing_base_url = COALESCE(billing_base_url, ?),
                   billing_mode = COALESCE(billing_mode, ?),
-                   model = COALESCE(model, ?)
+                   model = COALESCE(model, ?),
+                   api_call_count = COALESCE(api_call_count, 0) + ?
                   WHERE id = ?"""
        params = (
            input_tokens,
@@ -502,6 +559,7 @@ class SessionDB:
            billing_base_url,
            billing_mode,
            model,
+            api_call_count,
            session_id,
        )
        def _do(conn):
@@ -922,6 +980,7 @@ class SessionDB:
        token_count: int = None,
        finish_reason: str = None,
        reasoning: str = None,
+        reasoning_content: str = None,
        reasoning_details: Any = None,
        codex_reasoning_items: Any = None,
    ) -> int:
@@ -951,8 +1010,8 @@ class SessionDB:
            cursor = conn.execute(
                """INSERT INTO messages (session_id, role, content, tool_call_id,
                   tool_calls, tool_name, timestamp, token_count, finish_reason,
-                   reasoning, reasoning_details, codex_reasoning_items)
-                   VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
+                   reasoning, reasoning_content, reasoning_details, codex_reasoning_items)
+                   VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
                (
                    session_id,
                    role,
@@ -964,6 +1023,7 @@ class SessionDB:
                    token_count,
                    finish_reason,
                    reasoning,
+                    reasoning_content,
                    reasoning_details_json,
                    codex_items_json,
                ),
@@ -1014,7 +1074,7 @@ class SessionDB:
        with self._lock:
            cursor = self._conn.execute(
                "SELECT role, content, tool_call_id, tool_calls, tool_name, "
-                "reasoning, reasoning_details, codex_reasoning_items "
+                "reasoning, reasoning_content, reasoning_details, codex_reasoning_items "
                "FROM messages WHERE session_id = ? ORDER BY timestamp, id",
                (session_id,),
            )
@@ -1038,6 +1098,8 @@ class SessionDB:
            if row["role"] == "assistant":
                if row["reasoning"]:
                    msg["reasoning"] = row["reasoning"]
+                if row["reasoning_content"] is not None:
+                    msg["reasoning_content"] = row["reasoning_content"]
                if row["reasoning_details"]:
                    try:
                        msg["reasoning_details"] = json.loads(row["reasoning_details"])
@@ -1441,3 +1503,116 @@ class SessionDB:
            return len(session_ids)

        return self._execute_write(_do)
+
+    # ── Meta key/value (for scheduler bookkeeping) ──
+
+    def get_meta(self, key: str) -> Optional[str]:
+        """Read a value from the state_meta key/value store."""
+        with self._lock:
+            row = self._conn.execute(
+                "SELECT value FROM state_meta WHERE key = ?", (key,)
+            ).fetchone()
+        if row is None:
+            return None
+        return row["value"] if isinstance(row, sqlite3.Row) else row[0]
+
+    def set_meta(self, key: str, value: str) -> None:
+        """Write a value to the state_meta key/value store."""
+        def _do(conn):
+            conn.execute(
+                "INSERT INTO state_meta (key, value) VALUES (?, ?) "
+                "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
+                (key, value),
+            )
+        self._execute_write(_do)
+
+    # ── Space reclamation ──
+
+    def vacuum(self) -> None:
+        """Run VACUUM to reclaim disk space after large deletes.
+
+        SQLite does not shrink the database file when rows are deleted —
+        freed pages just get reused on the next insert. After a prune that
+        removed hundreds of sessions, the file stays bloated unless we
+        explicitly VACUUM.
+
+        VACUUM rewrites the entire DB, so it's expensive (seconds per
+        100MB) and cannot run inside a transaction. It also acquires an
+        exclusive lock, so callers must ensure no other writers are
+        active. Safe to call at startup before the gateway/CLI starts
+        serving traffic.
+        """
+        # VACUUM cannot be executed inside a transaction.
+        with self._lock:
+            # Best-effort WAL checkpoint first, then VACUUM.
+            try:
+                self._conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")
+            except Exception:
+                pass
+            self._conn.execute("VACUUM")
+
+    def maybe_auto_prune_and_vacuum(
+        self,
+        retention_days: int = 90,
+        min_interval_hours: int = 24,
+        vacuum: bool = True,
+    ) -> Dict[str, Any]:
+        """Idempotent auto-maintenance: prune old sessions + optional VACUUM.
+
+        Records the last run timestamp in state_meta so subsequent calls
+        within ``min_interval_hours`` no-op. Designed to be called once at
+        startup from long-lived entrypoints (CLI, gateway, cron scheduler).
+
+        Never raises. On any failure, logs a warning and returns a dict
+        with ``"error"`` set.
+
+        Returns a dict with keys:
+          - ``"skipped"`` (bool) — true if within min_interval_hours of last run
+          - ``"pruned"`` (int)   — number of sessions deleted
+          - ``"vacuumed"`` (bool) — true if VACUUM ran
+          - ``"error"`` (str, optional) — present only on failure
+        """
+        result: Dict[str, Any] = {"skipped": False, "pruned": 0, "vacuumed": False}
+        try:
+            # Skip if another process/call did maintenance recently.
+            last_raw = self.get_meta("last_auto_prune")
+            now = time.time()
+            if last_raw:
+                try:
+                    last_ts = float(last_raw)
+                    if now - last_ts < min_interval_hours * 3600:
+                        result["skipped"] = True
+                        return result
+                except (TypeError, ValueError):
+                    pass  # corrupt meta; treat as no prior run
+
+            pruned = self.prune_sessions(older_than_days=retention_days)
+            result["pruned"] = pruned
+
+            # Only VACUUM if we actually freed rows — VACUUM on a tight DB
+            # is wasted I/O. Threshold keeps small DBs from paying the cost.
+            if vacuum and pruned > 0:
+                try:
+                    self.vacuum()
+                    result["vacuumed"] = True
+                except Exception as exc:
+                    logger.warning("state.db VACUUM failed: %s", exc)
+
+            # Record the attempt even if pruned == 0, so we don't retry
+            # every startup within the min_interval_hours window.
+            self.set_meta("last_auto_prune", str(now))
+
+            if pruned > 0:
+                logger.info(
+                    "state.db auto-maintenance: pruned %d session(s) older than %d days%s",
+                    pruned,
+                    retention_days,
+                    " + VACUUM" if result["vacuumed"] else "",
+                )
+        except Exception as exc:
+            # Maintenance must never block startup. Log and return error marker.
+            logger.warning("state.db auto-maintenance failed: %s", exc)
+            result["error"] = str(exc)
+
+        return result
+
@@ -0,0 +1,5 @@
+# Web Development
+
+Optional skills for client-side web development workflows — embedding agents, copilots, and AI-native UX patterns into user-facing web apps.
+
+These are distinct from Hermes' own browser automation (Browserbase, Camofox), which operate *on* websites from outside. Web-development skills here help users build *into* their own websites.
@@ -0,0 +1,189 @@
+---
+name: page-agent
+description: Embed alibaba/page-agent into your own web application — a pure-JavaScript in-page GUI agent that ships as a single <script> tag or npm package and lets end-users of your site drive the UI with natural language ("click login, fill username as John"). No Python, no headless browser, no extension required. Use this skill when the user is a web developer who wants to add an AI copilot to their SaaS / admin panel / B2B tool, make a legacy web app accessible via natural language, or evaluate page-agent against a local (Ollama) or cloud (Qwen / OpenAI / OpenRouter) LLM. NOT for server-side browser automation — point those users to Hermes' built-in browser tool instead.
+version: 1.0.0
+author: Hermes Agent
+license: MIT
+metadata:
+  hermes:
+    tags: [web, javascript, agent, browser, gui, alibaba, embed, copilot, saas]
+    category: web-development
+---
+
+# page-agent
+
+alibaba/page-agent (https://github.com/alibaba/page-agent, 17k+ stars, MIT) is an in-page GUI agent written in TypeScript. It lives inside a webpage, reads the DOM as text (no screenshots, no multi-modal LLM), and executes natural-language instructions like "click the login button, then fill username as John" against the current page. Pure client-side — the host site just includes a script and passes an OpenAI-compatible LLM endpoint.
+
+## When to use this skill
+
+Load this skill when a user wants to:
+
+- **Ship an AI copilot inside their own web app** (SaaS, admin panel, B2B tool, ERP, CRM) — "users on my dashboard should be able to type 'create invoice for Acme Corp and email it' instead of clicking through five screens"
+- **Modernize a legacy web app** without rewriting the frontend — page-agent drops on top of existing DOM
+- **Add accessibility via natural language** — voice / screen-reader users drive the UI by describing what they want
+- **Demo or evaluate page-agent** against a local (Ollama) or hosted (Qwen, OpenAI, OpenRouter) LLM
+- **Build interactive training / product demos** — let an AI walk a user through "how to submit an expense report" live in the real UI
+
+## When NOT to use this skill
+
+- User wants **Hermes itself to drive a browser** → use Hermes' built-in browser tool (Browserbase / Camofox). page-agent is the *opposite* direction.
+- User wants **cross-tab automation without embedding** → use Playwright, browser-use, or the page-agent Chrome extension
+- User needs **visual grounding / screenshots** → page-agent is text-DOM only; use a multimodal browser agent instead
+
+## Prerequisites
+
+- Node 22.13+ or 24+, npm 10+ (docs claim 11+ but 10.9 works fine)
+- An OpenAI-compatible LLM endpoint: Qwen (DashScope), OpenAI, Ollama, OpenRouter, or anything speaking `/v1/chat/completions`
+- Browser with devtools (for debugging)
+
+## Path 1 — 30-second demo via CDN (no install)
+
+Fastest way to see it work. Uses alibaba's free testing LLM proxy — **for evaluation only**, subject to their terms.
+
+Add to any HTML page (or paste into the devtools console as a bookmarklet):
+
+```html
+<script src="https://cdn.jsdelivr.net/npm/page-agent@1.8.0/dist/iife/page-agent.demo.js" crossorigin="true"></script>
+```
+
+A panel appears. Type an instruction. Done.
+
+Bookmarklet form (drop into bookmarks bar, click on any page):
+
+```javascript
+javascript:(function(){var s=document.createElement('script');s.src='https://cdn.jsdelivr.net/npm/page-agent@1.8.0/dist/iife/page-agent.demo.js';document.head.appendChild(s);})();
+```
+
+## Path 2 — npm install into your own web app (production use)
+
+Inside an existing web project (React / Vue / Svelte / plain):
+
+```bash
+npm install page-agent
+```
+
+Wire it up with your own LLM endpoint — **never ship the demo CDN to real users**:
+
+```javascript
+import { PageAgent } from 'page-agent'
+
+const agent = new PageAgent({
+    model: 'qwen3.5-plus',
+    baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
+    apiKey: process.env.LLM_API_KEY,   // never hardcode
+    language: 'en-US',
+})
+
+// Show the panel for end users:
+agent.panel.show()
+
+// Or drive it programmatically:
+await agent.execute('Click submit button, then fill username as John')
+```
+
+Provider examples (any OpenAI-compatible endpoint works):
+
+| Provider | `baseURL` | `model` |
+|----------|-----------|---------|
+| Qwen / DashScope | `https://dashscope.aliyuncs.com/compatible-mode/v1` | `qwen3.5-plus` |
+| OpenAI | `https://api.openai.com/v1` | `gpt-4o-mini` |
+| Ollama (local) | `http://localhost:11434/v1` | `qwen3:14b` |
+| OpenRouter | `https://openrouter.ai/api/v1` | `anthropic/claude-sonnet-4.6` |
+
+**Key config fields** (passed to `new PageAgent({...})`):
+
+- `model`, `baseURL`, `apiKey` — LLM connection
+- `language` — UI language (`en-US`, `zh-CN`, etc.)
+- Allowlist and data-masking hooks exist for locking down what the agent can touch — see https://alibaba.github.io/page-agent/ for the full option list
+
+**Security.** Don't put your `apiKey` in client-side code for a real deployment — proxy LLM calls through your backend and point `baseURL` at your proxy. The demo CDN exists because alibaba runs that proxy for evaluation.
+
+## Path 3 — clone the source repo (contributing, or hacking on it)
+
+Use this when the user wants to modify page-agent itself, test it against arbitrary sites via a local IIFE bundle, or develop the browser extension.
+
+```bash
+git clone https://github.com/alibaba/page-agent.git
+cd page-agent
+npm ci              # exact lockfile install (or `npm i` to allow updates)
+```
+
+Create `.env` in the repo root with an LLM endpoint. Example:
+
+```
+LLM_MODEL_NAME=gpt-4o-mini
+LLM_API_KEY=sk-...
+LLM_BASE_URL=https://api.openai.com/v1
+```
+
+Ollama flavor:
+
+```
+LLM_BASE_URL=http://localhost:11434/v1
+LLM_API_KEY=NA
+LLM_MODEL_NAME=qwen3:14b
+```
+
+Common commands:
+
+```bash
+npm start           # docs/website dev server
+npm run build       # build every package
+npm run dev:demo    # serve IIFE bundle at http://localhost:5174/page-agent.demo.js
+npm run dev:ext     # develop the browser extension (WXT + React)
+npm run build:ext   # build the extension
+```
+
+**Test on any website** using the local IIFE bundle. Add this bookmarklet:
+
+```javascript
+javascript:(function(){var s=document.createElement('script');s.src=`http://localhost:5174/page-agent.demo.js?t=${Math.random()}`;s.onload=()=>console.log('PageAgent ready!');document.head.appendChild(s);})();
+```
+
+Then: `npm run dev:demo`, click the bookmarklet on any page, and the local build injects. Auto-rebuilds on save.
+
+**Warning:** your `.env` `LLM_API_KEY` is inlined into the IIFE bundle during dev builds. Don't share the bundle. Don't commit it. Don't paste the URL into Slack. (Verified: grepping the public dev bundle returns the literal values from `.env`.)
+
+## Repo layout (Path 3)
+
+Monorepo with npm workspaces. Key packages:
+
+| Package | Path | Purpose |
+|---------|------|---------|
+| `page-agent` | `packages/page-agent/` | Main entry with UI panel |
+| `@page-agent/core` | `packages/core/` | Core agent logic, no UI |
+| `@page-agent/mcp` | `packages/mcp/` | MCP server (beta) |
+| — | `packages/llms/` | LLM client |
+| — | `packages/page-controller/` | DOM ops + visual feedback |
+| — | `packages/ui/` | Panel + i18n |
+| — | `packages/extension/` | Chrome/Firefox extension |
+| — | `packages/website/` | Docs + landing site |
+
+## Verifying it works
+
+After Path 1 or Path 2:
+1. Open the page in a browser with devtools open
+2. You should see a floating panel. If not, check the console for errors (most common: CORS on the LLM endpoint, wrong `baseURL`, or a bad API key)
+3. Type a simple instruction matching something visible on the page ("click the Login link")
+4. Watch the Network tab — you should see a request to your `baseURL`
+
+After Path 3:
+1. `npm run dev:demo` prints `Accepting connections at http://localhost:5174`
+2. `curl -I http://localhost:5174/page-agent.demo.js` returns `HTTP/1.1 200 OK` with `Content-Type: application/javascript`
+3. Click the bookmarklet on any site; panel appears
+
+## Pitfalls
+
+- **Demo CDN in production** — don't. It's rate-limited, uses alibaba's free proxy, and their terms forbid production use.
+- **API key exposure** — any key passed to `new PageAgent({apiKey: ...})` ships in your JS bundle. Always proxy through your own backend for real deployments.
+- **Non-OpenAI-compatible endpoints** fail silently or with cryptic errors. If your provider needs native Anthropic/Gemini formatting, use an OpenAI-compatibility proxy (LiteLLM, OpenRouter) in front.
+- **CSP blocks** — sites with strict Content-Security-Policy may refuse to load the CDN script or disallow inline eval. In that case, self-host from your origin.
+- **Restart dev server** after editing `.env` in Path 3 — Vite only reads env at startup.
+- **Node version** — the repo declares `^22.13.0 || >=24`. Node 20 will fail `npm ci` with engine errors.
+- **npm 10 vs 11** — docs say npm 11+; npm 10.9 actually works fine.
+
+## Reference
+
+- Repo: https://github.com/alibaba/page-agent
+- Docs: https://alibaba.github.io/page-agent/
+- License: MIT (built on browser-use's DOM processing internals, Copyright 2024 Gregor Zunic)
@@ -84,7 +84,10 @@ Config file: `~/.hermes/hindsight/config.json`
 | `retain_async` | `true` | Process retain asynchronously on the Hindsight server |
 | `retain_every_n_turns` | `1` | Retain every N turns (1 = every turn) |
 | `retain_context` | `conversation between Hermes Agent and the User` | Context label for retained memories |
-| `tags` | — | Tags applied when storing memories |
+| `retain_tags` | — | Default tags applied to retained memories; merged with per-call tool tags |
+| `retain_source` | — | Optional `metadata.source` attached to retained memories |
+| `retain_user_prefix` | `User` | Label used before user turns in auto-retained transcripts |
+| `retain_assistant_prefix` | `Assistant` | Label used before assistant turns in auto-retained transcripts |

 ### Integration

@@ -113,7 +116,7 @@ Available in `hybrid` and `tools` memory modes:

 | Tool | Description |
 |------|-------------|
-| `hindsight_retain` | Store information with auto entity extraction |
+| `hindsight_retain` | Store information with auto entity extraction; supports optional per-call `tags` |
 | `hindsight_recall` | Multi-strategy search (semantic + entity graph) |
 | `hindsight_reflect` | Cross-memory synthesis (LLM-powered) |

@@ -6,11 +6,15 @@ retrieval. Supports cloud (API key) and local modes.
 Original PR #1811 by benfrank241, adapted to MemoryProvider ABC.

 Config via environment variables:
-  HINDSIGHT_API_KEY   — API key for Hindsight Cloud
-  HINDSIGHT_BANK_ID   — memory bank identifier (default: hermes)
-  HINDSIGHT_BUDGET    — recall budget: low/mid/high (default: mid)
-  HINDSIGHT_API_URL   — API endpoint
-  HINDSIGHT_MODE      — cloud or local (default: cloud)
+  HINDSIGHT_API_KEY                — API key for Hindsight Cloud
+  HINDSIGHT_BANK_ID                — memory bank identifier (default: hermes)
+  HINDSIGHT_BUDGET                 — recall budget: low/mid/high (default: mid)
+  HINDSIGHT_API_URL                — API endpoint
+  HINDSIGHT_MODE                   — cloud or local (default: cloud)
+  HINDSIGHT_RETAIN_TAGS            — comma-separated tags attached to retained memories
+  HINDSIGHT_RETAIN_SOURCE          — metadata source value attached to retained memories
+  HINDSIGHT_RETAIN_USER_PREFIX     — label used before user turns in retained transcripts
+  HINDSIGHT_RETAIN_ASSISTANT_PREFIX — label used before assistant turns in retained transcripts

 Or via $HERMES_HOME/hindsight/config.json (profile-scoped), falling back to
 ~/.hindsight/config.json (legacy, shared) for backward compatibility.
@@ -24,7 +28,7 @@ import logging
 import os
 import threading

-from hermes_constants import get_hermes_home
+from datetime import datetime, timezone
 from typing import Any, Dict, List

 from agent.memory_provider import MemoryProvider
@@ -99,6 +103,11 @@ RETAIN_SCHEMA = {
        "properties": {
            "content": {"type": "string", "description": "The information to store."},
            "context": {"type": "string", "description": "Short label (e.g. 'user preference', 'project decision')."},
+            "tags": {
+                "type": "array",
+                "items": {"type": "string"},
+                "description": "Optional per-call tags to merge with configured default retain tags.",
+            },
        },
        "required": ["content"],
    },
@@ -168,6 +177,10 @@ def _load_config() -> dict:
    return {
        "mode": os.environ.get("HINDSIGHT_MODE", "cloud"),
        "apiKey": os.environ.get("HINDSIGHT_API_KEY", ""),
+        "retain_tags": os.environ.get("HINDSIGHT_RETAIN_TAGS", ""),
+        "retain_source": os.environ.get("HINDSIGHT_RETAIN_SOURCE", ""),
+        "retain_user_prefix": os.environ.get("HINDSIGHT_RETAIN_USER_PREFIX", "User"),
+        "retain_assistant_prefix": os.environ.get("HINDSIGHT_RETAIN_ASSISTANT_PREFIX", "Assistant"),
        "banks": {
            "hermes": {
                "bankId": os.environ.get("HINDSIGHT_BANK_ID", "hermes"),
@@ -178,6 +191,48 @@ def _load_config() -> dict:
    }


+def _normalize_retain_tags(value: Any) -> List[str]:
+    """Normalize tag config/tool values to a deduplicated list of strings."""
+    if value is None:
+        return []
+
+    raw_items: list[Any]
+    if isinstance(value, list):
+        raw_items = value
+    elif isinstance(value, str):
+        text = value.strip()
+        if not text:
+            return []
+        if text.startswith("["):
+            try:
+                parsed = json.loads(text)
+            except Exception:
+                parsed = None
+            if isinstance(parsed, list):
+                raw_items = parsed
+            else:
+                raw_items = text.split(",")
+        else:
+            raw_items = text.split(",")
+    else:
+        raw_items = [value]
+
+    normalized = []
+    seen = set()
+    for item in raw_items:
+        tag = str(item).strip()
+        if not tag or tag in seen:
+            continue
+        seen.add(tag)
+        normalized.append(tag)
+    return normalized
+
+
+def _utc_timestamp() -> str:
+    """Return current UTC timestamp in ISO-8601 with milliseconds and Z suffix."""
+    return datetime.now(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")
+
+
 # ---------------------------------------------------------------------------
 # MemoryProvider implementation
 # ---------------------------------------------------------------------------
@@ -195,6 +250,19 @@ class HindsightMemoryProvider(MemoryProvider):
        self._llm_base_url = ""
        self._memory_mode = "hybrid"  # "context", "tools", or "hybrid"
        self._prefetch_method = "recall"  # "recall" or "reflect"
+        self._retain_tags: List[str] = []
+        self._retain_source = ""
+        self._retain_user_prefix = "User"
+        self._retain_assistant_prefix = "Assistant"
+        self._platform = ""
+        self._user_id = ""
+        self._user_name = ""
+        self._chat_id = ""
+        self._chat_name = ""
+        self._chat_type = ""
+        self._thread_id = ""
+        self._agent_identity = ""
+        self._turn_index = 0
        self._client = None
        self._prefetch_result = ""
        self._prefetch_lock = threading.Lock()
@@ -210,6 +278,7 @@ class HindsightMemoryProvider(MemoryProvider):
        # Retain controls
        self._auto_retain = True
        self._retain_every_n_turns = 1
+        self._retain_async = True
        self._retain_context = "conversation between Hermes Agent and the User"
        self._turn_counter = 0
        self._session_turns: list[str] = []  # accumulates ALL turns for the session
@@ -224,7 +293,6 @@ class HindsightMemoryProvider(MemoryProvider):
        # Bank
        self._bank_mission = ""
        self._bank_retain_mission: str | None = None
-        self._retain_async = True

    @property
    def name(self) -> str:
@@ -423,7 +491,10 @@ class HindsightMemoryProvider(MemoryProvider):
            {"key": "recall_budget", "description": "Recall thoroughness", "default": "mid", "choices": ["low", "mid", "high"]},
            {"key": "memory_mode", "description": "Memory integration mode", "default": "hybrid", "choices": ["hybrid", "context", "tools"]},
            {"key": "recall_prefetch_method", "description": "Auto-recall method", "default": "recall", "choices": ["recall", "reflect"]},
-            {"key": "tags", "description": "Tags applied when storing memories (comma-separated)", "default": ""},
+            {"key": "retain_tags", "description": "Default tags applied to retained memories (comma-separated)", "default": ""},
+            {"key": "retain_source", "description": "Metadata source value attached to retained memories", "default": ""},
+            {"key": "retain_user_prefix", "description": "Label used before user turns in retained transcripts", "default": "User"},
+            {"key": "retain_assistant_prefix", "description": "Label used before assistant turns in retained transcripts", "default": "Assistant"},
            {"key": "recall_tags", "description": "Tags to filter when searching memories (comma-separated)", "default": ""},
            {"key": "recall_tags_match", "description": "Tag matching mode for recall", "default": "any", "choices": ["any", "all", "any_strict", "all_strict"]},
            {"key": "auto_recall", "description": "Automatically recall memories before each turn", "default": True},
@@ -467,7 +538,7 @@ class HindsightMemoryProvider(MemoryProvider):
        return self._client

    def initialize(self, session_id: str, **kwargs) -> None:
-        self._session_id = session_id
+        self._session_id = str(session_id or "").strip()

        # Check client version and auto-upgrade if needed
        try:
@@ -496,6 +567,16 @@ class HindsightMemoryProvider(MemoryProvider):
            pass  # packaging not available or other issue — proceed anyway

        self._config = _load_config()
+        self._platform = str(kwargs.get("platform") or "").strip()
+        self._user_id = str(kwargs.get("user_id") or "").strip()
+        self._user_name = str(kwargs.get("user_name") or "").strip()
+        self._chat_id = str(kwargs.get("chat_id") or "").strip()
+        self._chat_name = str(kwargs.get("chat_name") or "").strip()
+        self._chat_type = str(kwargs.get("chat_type") or "").strip()
+        self._thread_id = str(kwargs.get("thread_id") or "").strip()
+        self._agent_identity = str(kwargs.get("agent_identity") or "").strip()
+        self._turn_index = 0
+        self._session_turns = []
        self._mode = self._config.get("mode", "cloud")
        # "local" is a legacy alias for "local_embedded"
        if self._mode == "local":
@@ -513,7 +594,7 @@ class HindsightMemoryProvider(MemoryProvider):
        memory_mode = self._config.get("memory_mode", "hybrid")
        self._memory_mode = memory_mode if memory_mode in ("context", "tools", "hybrid") else "hybrid"

-        prefetch_method = self._config.get("recall_prefetch_method", "recall")
+        prefetch_method = self._config.get("recall_prefetch_method") or self._config.get("prefetch_method", "recall")
        self._prefetch_method = prefetch_method if prefetch_method in ("recall", "reflect") else "recall"

        # Bank options
@@ -521,9 +602,22 @@ class HindsightMemoryProvider(MemoryProvider):
        self._bank_retain_mission = self._config.get("bank_retain_mission") or None

        # Tags
-        self._tags = self._config.get("tags") or None
+        self._retain_tags = _normalize_retain_tags(
+            self._config.get("retain_tags")
+            or os.environ.get("HINDSIGHT_RETAIN_TAGS", "")
+        )
+        self._tags = self._retain_tags or None
        self._recall_tags = self._config.get("recall_tags") or None
        self._recall_tags_match = self._config.get("recall_tags_match", "any")
+        self._retain_source = str(
+            self._config.get("retain_source") or os.environ.get("HINDSIGHT_RETAIN_SOURCE", "")
+        ).strip()
+        self._retain_user_prefix = str(
+            self._config.get("retain_user_prefix") or os.environ.get("HINDSIGHT_RETAIN_USER_PREFIX", "User")
+        ).strip() or "User"
+        self._retain_assistant_prefix = str(
+            self._config.get("retain_assistant_prefix") or os.environ.get("HINDSIGHT_RETAIN_ASSISTANT_PREFIX", "Assistant")
+        ).strip() or "Assistant"

        # Retain controls
        self._auto_retain = self._config.get("auto_retain", True)
@@ -547,11 +641,9 @@ class HindsightMemoryProvider(MemoryProvider):
        logger.info("Hindsight initialized: mode=%s, api_url=%s, bank=%s, budget=%s, memory_mode=%s, prefetch_method=%s, client=%s",
                     self._mode, self._api_url, self._bank_id, self._budget, self._memory_mode, self._prefetch_method, _client_version)
        logger.debug("Hindsight config: auto_retain=%s, auto_recall=%s, retain_every_n=%d, "
-                     "retain_async=%s, retain_context=%s, "
-                     "recall_max_tokens=%d, recall_max_input_chars=%d, tags=%s, recall_tags=%s",
+                     "retain_async=%s, retain_context=%s, recall_max_tokens=%d, recall_max_input_chars=%d, tags=%s, recall_tags=%s",
                     self._auto_retain, self._auto_recall, self._retain_every_n_turns,
-                     self._retain_async, self._retain_context,
-                     self._recall_max_tokens, self._recall_max_input_chars,
+                     self._retain_async, self._retain_context, self._recall_max_tokens, self._recall_max_input_chars,
                     self._tags, self._recall_tags)

        # For local mode, start the embedded daemon in the background so it
@@ -712,6 +804,78 @@ class HindsightMemoryProvider(MemoryProvider):
        self._prefetch_thread = threading.Thread(target=_run, daemon=True, name="hindsight-prefetch")
        self._prefetch_thread.start()

+    def _build_turn_messages(self, user_content: str, assistant_content: str) -> List[Dict[str, str]]:
+        now = datetime.now(timezone.utc).isoformat()
+        return [
+            {
+                "role": "user",
+                "content": f"{self._retain_user_prefix}: {user_content}",
+                "timestamp": now,
+            },
+            {
+                "role": "assistant",
+                "content": f"{self._retain_assistant_prefix}: {assistant_content}",
+                "timestamp": now,
+            },
+        ]
+
+    def _build_metadata(self, *, message_count: int, turn_index: int) -> Dict[str, str]:
+        metadata: Dict[str, str] = {
+            "retained_at": _utc_timestamp(),
+            "message_count": str(message_count),
+            "turn_index": str(turn_index),
+        }
+        if self._retain_source:
+            metadata["source"] = self._retain_source
+        if self._session_id:
+            metadata["session_id"] = self._session_id
+        if self._platform:
+            metadata["platform"] = self._platform
+        if self._user_id:
+            metadata["user_id"] = self._user_id
+        if self._user_name:
+            metadata["user_name"] = self._user_name
+        if self._chat_id:
+            metadata["chat_id"] = self._chat_id
+        if self._chat_name:
+            metadata["chat_name"] = self._chat_name
+        if self._chat_type:
+            metadata["chat_type"] = self._chat_type
+        if self._thread_id:
+            metadata["thread_id"] = self._thread_id
+        if self._agent_identity:
+            metadata["agent_identity"] = self._agent_identity
+        return metadata
+
+    def _build_retain_kwargs(
+        self,
+        content: str,
+        *,
+        context: str | None = None,
+        document_id: str | None = None,
+        metadata: Dict[str, str] | None = None,
+        tags: List[str] | None = None,
+        retain_async: bool | None = None,
+    ) -> Dict[str, Any]:
+        kwargs: Dict[str, Any] = {
+            "bank_id": self._bank_id,
+            "content": content,
+            "metadata": metadata or self._build_metadata(message_count=1, turn_index=self._turn_index),
+        }
+        if context is not None:
+            kwargs["context"] = context
+        if document_id:
+            kwargs["document_id"] = document_id
+        if retain_async is not None:
+            kwargs["retain_async"] = retain_async
+        merged_tags = _normalize_retain_tags(self._retain_tags)
+        for tag in _normalize_retain_tags(tags):
+            if tag not in merged_tags:
+                merged_tags.append(tag)
+        if merged_tags:
+            kwargs["tags"] = merged_tags
+        return kwargs
+
    def sync_turn(self, user_content: str, assistant_content: str, *, session_id: str = "") -> None:
        """Retain conversation turn in background (non-blocking).

@@ -721,19 +885,14 @@ class HindsightMemoryProvider(MemoryProvider):
            logger.debug("sync_turn: skipped (auto_retain disabled)")
            return

-        from datetime import datetime, timezone
-        now = datetime.now(timezone.utc).isoformat()
+        if session_id:
+            self._session_id = str(session_id).strip()

-        messages = [
-            {"role": "user", "content": user_content, "timestamp": now},
-            {"role": "assistant", "content": assistant_content, "timestamp": now},
-        ]
-
-        turn = json.dumps(messages)
+        turn = json.dumps(self._build_turn_messages(user_content, assistant_content))
        self._session_turns.append(turn)
        self._turn_counter += 1
+        self._turn_index = self._turn_counter

-        # Only retain every N turns
        if self._turn_counter % self._retain_every_n_turns != 0:
            logger.debug("sync_turn: buffered turn %d (will retain at turn %d)",
                         self._turn_counter, self._turn_counter + (self._retain_every_n_turns - self._turn_counter % self._retain_every_n_turns))
@@ -741,19 +900,21 @@ class HindsightMemoryProvider(MemoryProvider):

        logger.debug("sync_turn: retaining %d turns, total session content %d chars",
                     len(self._session_turns), sum(len(t) for t in self._session_turns))
-        # Send the ENTIRE session as a single JSON array (document_id deduplicates).
-        # Each element in _session_turns is a JSON string of that turn's messages.
        content = "[" + ",".join(self._session_turns) + "]"

        def _sync():
            try:
                client = self._get_client()
-                item: dict = {
-                    "content": content,
-                    "context": self._retain_context,
-                }
-                if self._tags:
-                    item["tags"] = self._tags
+                item = self._build_retain_kwargs(
+                    content,
+                    context=self._retain_context,
+                    metadata=self._build_metadata(
+                        message_count=len(self._session_turns) * 2,
+                        turn_index=self._turn_index,
+                    ),
+                )
+                item.pop("bank_id", None)
+                item.pop("retain_async", None)
                logger.debug("Hindsight retain: bank=%s, doc=%s, async=%s, content_len=%d, num_turns=%d",
                             self._bank_id, self._session_id, self._retain_async, len(content), len(self._session_turns))
                _run_sync(client.aretain_batch(
@@ -789,11 +950,11 @@ class HindsightMemoryProvider(MemoryProvider):
                return tool_error("Missing required parameter: content")
            context = args.get("context")
            try:
-                retain_kwargs: dict = {
-                    "bank_id": self._bank_id, "content": content, "context": context,
-                }
-                if self._tags:
-                    retain_kwargs["tags"] = self._tags
+                retain_kwargs = self._build_retain_kwargs(
+                    content,
+                    context=context,
+                    tags=args.get("tags"),
+                )
                logger.debug("Tool hindsight_retain: bank=%s, content_len=%d, context=%s",
                             self._bank_id, len(content), context)
                _run_sync(client.aretain(**retain_kwargs))
@@ -126,7 +126,7 @@ py-modules = ["run_agent", "model_tools", "toolsets", "batch_runner", "trajector
 hermes_cli = ["web_dist/**/*"]

 [tool.setuptools.packages.find]
-include = ["agent", "tools", "tools.*", "hermes_cli", "gateway", "gateway.*", "tui_gateway", "tui_gateway.*", "cron", "acp_adapter", "plugins", "plugins.*"]
+include = ["agent", "agent.*", "tools", "tools.*", "hermes_cli", "gateway", "gateway.*", "tui_gateway", "tui_gateway.*", "cron", "acp_adapter", "plugins", "plugins.*"]

 [tool.pytest.ini_options]
 testpaths = ["tests"]
@@ -751,6 +751,11 @@ class AIAgent:
        prefill_messages: List[Dict[str, Any]] = None,
        platform: str = None,
        user_id: str = None,
+        user_name: str = None,
+        chat_id: str = None,
+        chat_name: str = None,
+        chat_type: str = None,
+        thread_id: str = None,
        gateway_session_key: str = None,
        skip_context_files: bool = False,
        skip_memory: bool = False,
@@ -820,6 +825,11 @@ class AIAgent:
        self.ephemeral_system_prompt = ephemeral_system_prompt
        self.platform = platform  # "cli", "telegram", "discord", "whatsapp", etc.
        self._user_id = user_id  # Platform user identifier (gateway sessions)
+        self._user_name = user_name
+        self._chat_id = chat_id
+        self._chat_name = chat_name
+        self._chat_type = chat_type
+        self._thread_id = thread_id
        self._gateway_session_key = gateway_session_key  # Stable per-chat key (e.g. agent:main:telegram:dm:123)
        # Pluggable print function — CLI replaces this with _cprint so that
        # raw ANSI status lines are routed through prompt_toolkit's renderer
@@ -1471,6 +1481,16 @@ class AIAgent:
                        # Thread gateway user identity for per-user memory scoping
                        if self._user_id:
                            _init_kwargs["user_id"] = self._user_id
+                        if self._user_name:
+                            _init_kwargs["user_name"] = self._user_name
+                        if self._chat_id:
+                            _init_kwargs["chat_id"] = self._chat_id
+                        if self._chat_name:
+                            _init_kwargs["chat_name"] = self._chat_name
+                        if self._chat_type:
+                            _init_kwargs["chat_type"] = self._chat_type
+                        if self._thread_id:
+                            _init_kwargs["thread_id"] = self._thread_id
                        # Thread gateway session key for stable per-chat Honcho session isolation
                        if self._gateway_session_key:
                            _init_kwargs["gateway_session_key"] = self._gateway_session_key
@@ -2966,6 +2986,7 @@ class AIAgent:
                    tool_call_id=msg.get("tool_call_id"),
                    finish_reason=msg.get("finish_reason"),
                    reasoning=msg.get("reasoning") if role == "assistant" else None,
+                    reasoning_content=msg.get("reasoning_content") if role == "assistant" else None,
                    reasoning_details=msg.get("reasoning_details") if role == "assistant" else None,
                    codex_reasoning_items=msg.get("codex_reasoning_items") if role == "assistant" else None,
                )
@@ -7003,6 +7024,11 @@ class AIAgent:
            "finish_reason": finish_reason,
        }

+        if hasattr(assistant_message, "reasoning_content"):
+            raw_reasoning_content = getattr(assistant_message, "reasoning_content", None)
+            if raw_reasoning_content is not None:
+                msg["reasoning_content"] = _sanitize_surrogates(raw_reasoning_content)
+
        if hasattr(assistant_message, 'reasoning_details') and assistant_message.reasoning_details:
            # Pass reasoning_details back unmodified so providers (OpenRouter,
            # Anthropic, OpenAI) can maintain reasoning continuity across turns.
@@ -7077,6 +7103,30 @@ class AIAgent:

        return msg

+    def _copy_reasoning_content_for_api(self, source_msg: dict, api_msg: dict) -> None:
+        """Copy provider-facing reasoning fields onto an API replay message."""
+        if source_msg.get("role") != "assistant":
+            return
+
+        explicit_reasoning = source_msg.get("reasoning_content")
+        if isinstance(explicit_reasoning, str):
+            api_msg["reasoning_content"] = explicit_reasoning
+            return
+
+        normalized_reasoning = source_msg.get("reasoning")
+        if isinstance(normalized_reasoning, str) and normalized_reasoning:
+            api_msg["reasoning_content"] = normalized_reasoning
+            return
+
+        kimi_requires_reasoning = (
+            self.provider in {"kimi-coding", "kimi-coding-cn"}
+            or base_url_host_matches(self.base_url, "api.kimi.com")
+            or base_url_host_matches(self.base_url, "moonshot.ai")
+            or base_url_host_matches(self.base_url, "moonshot.cn")
+        )
+        if kimi_requires_reasoning and source_msg.get("tool_calls"):
+            api_msg["reasoning_content"] = ""
+
    @staticmethod
    def _sanitize_tool_calls_for_strict_api(api_msg: dict) -> dict:
        """Strip Codex Responses API fields from tool_calls for strict providers.
@@ -7160,10 +7210,7 @@ class AIAgent:
            api_messages = []
            for msg in messages:
                api_msg = msg.copy()
-                if msg.get("role") == "assistant":
-                    reasoning = msg.get("reasoning")
-                    if reasoning:
-                        api_msg["reasoning_content"] = reasoning
+                self._copy_reasoning_content_for_api(msg, api_msg)
                api_msg.pop("reasoning", None)
                api_msg.pop("finish_reason", None)
                api_msg.pop("_flush_sentinel", None)
@@ -8923,11 +8970,7 @@ class AIAgent:

                # For ALL assistant messages, pass reasoning back to the API
                # This ensures multi-turn reasoning context is preserved
-                if msg.get("role") == "assistant":
-                    reasoning_text = msg.get("reasoning")
-                    if reasoning_text:
-                        # Add reasoning_content for API compatibility (Moonshot AI, Novita, OpenRouter)
-                        api_msg["reasoning_content"] = reasoning_text
+                self._copy_reasoning_content_for_api(msg, api_msg)

                # Remove 'reasoning' field - it's for trajectory storage only
                # We've copied it to 'reasoning_content' for the API above
@@ -9724,6 +9767,7 @@ class AIAgent:
                                    billing_mode="subscription_included"
                                    if cost_result.status == "included" else None,
                                    model=self.model,
+                                    api_call_count=1,
                                )
                            except Exception:
                                pass  # never block the agent loop
@@ -10000,6 +10044,27 @@ class AIAgent:
                        if self._try_refresh_nous_client_credentials(force=True):
                            print(f"{self.log_prefix}🔐 Nous agent key refreshed after 401. Retrying request...")
                            continue
+                        # Credential refresh didn't help — show diagnostic info.
+                        # Most common causes: Portal OAuth expired/revoked,
+                        # account out of credits, or agent key blocked.
+                        from hermes_constants import display_hermes_home as _dhh_fn
+                        _dhh = _dhh_fn()
+                        _body_text = ""
+                        try:
+                            _body = getattr(api_error, "body", None) or getattr(api_error, "response", None)
+                            if _body is not None:
+                                _body_text = str(_body)[:200]
+                        except Exception:
+                            pass
+                        print(f"{self.log_prefix}🔐 Nous 401 — Portal authentication failed.")
+                        if _body_text:
+                            print(f"{self.log_prefix}   Response: {_body_text}")
+                        print(f"{self.log_prefix}   Most likely: Portal OAuth expired, account out of credits, or agent key revoked.")
+                        print(f"{self.log_prefix}   Troubleshooting:")
+                        print(f"{self.log_prefix}     • Re-authenticate: hermes login --provider nous")
+                        print(f"{self.log_prefix}     • Check credits / billing: https://portal.nousresearch.com")
+                        print(f"{self.log_prefix}     • Verify stored credentials: {_dhh}/auth.json")
+                        print(f"{self.log_prefix}     • Switch providers temporarily: /model <model> --provider openrouter")
                    if (
                        self.api_mode == "anthropic_messages"
                        and status_code == 401
@@ -50,7 +50,10 @@ AUTHOR_MAP = {
    "71184274+MassiveMassimo@users.noreply.github.com": "MassiveMassimo",
    "massivemassimo@users.noreply.github.com": "MassiveMassimo",
    "82637225+kshitijk4poor@users.noreply.github.com": "kshitijk4poor",
+    "keifergu@tencent.com": "keifergu",
    "kshitijk4poor@users.noreply.github.com": "kshitijk4poor",
+    "abner.the.foreman@agentmail.to": "Abnertheforeman",
+    "harryykyle1@gmail.com": "hharry11",
    "kshitijk4poor@gmail.com": "kshitijk4poor",
    "16443023+stablegenius49@users.noreply.github.com": "stablegenius49",
    "185121704+stablegenius49@users.noreply.github.com": "stablegenius49",
@@ -92,6 +95,8 @@ AUTHOR_MAP = {
    "135070653+sgaofen@users.noreply.github.com": "sgaofen",
    "nocoo@users.noreply.github.com": "nocoo",
    "30841158+n-WN@users.noreply.github.com": "n-WN",
+    "tsuijinglei@gmail.com": "hiddenpuppy",
+    "jerome@clawwork.ai": "HiddenPuppy",
    "leoyuan0099@gmail.com": "keyuyuan",
    "bxzt2006@163.com": "Only-Code-A",
    "i@troy-y.org": "TroyMitchell911",
@@ -99,6 +104,7 @@ AUTHOR_MAP = {
    "hansnow@users.noreply.github.com": "hansnow",
    "134848055+UNLINEARITY@users.noreply.github.com": "UNLINEARITY",
    "ben.burtenshaw@gmail.com": "burtenshaw",
+    "roopaknijhara@gmail.com": "rnijhara",
    # contributors (manual mapping from git names)
    "ahmedsherif95@gmail.com": "asheriif",
    "liujinkun@bytedance.com": "liujinkun2025",
@@ -298,9 +298,15 @@ class TestClassifyApiError:
        assert result.retryable is False

    def test_404_generic(self):
+        # Generic 404 with no "model not found" signal — common for local
+        # llama.cpp/Ollama/vLLM endpoints with slightly wrong paths.  Treat
+        # as unknown (retryable) so the real error surfaces, rather than
+        # claiming the model is missing and silently falling back.
        e = MockAPIError("Not Found", status_code=404)
        result = classify_api_error(e)
-        assert result.reason == FailoverReason.model_not_found
+        assert result.reason == FailoverReason.unknown
+        assert result.retryable is True
+        assert result.should_fallback is False

    # ── Payload too large ──

@@ -79,6 +79,28 @@ class TestMemoryManagerUserIdThreading:
        assert p._init_kwargs.get("platform") == "telegram"
        assert p._init_session_id == "sess-123"

+    def test_chat_context_forwarded_to_provider(self):
+        mgr = MemoryManager()
+        p = RecordingProvider()
+        mgr.add_provider(p)
+
+        mgr.initialize_all(
+            session_id="sess-chat",
+            platform="discord",
+            user_id="discord_u_7",
+            user_name="fakeusername",
+            chat_id="1485316232612941897",
+            chat_name="fakeassistantname-forums",
+            chat_type="thread",
+            thread_id="1491249007475949698",
+        )
+
+        assert p._init_kwargs.get("user_name") == "fakeusername"
+        assert p._init_kwargs.get("chat_id") == "1485316232612941897"
+        assert p._init_kwargs.get("chat_name") == "fakeassistantname-forums"
+        assert p._init_kwargs.get("chat_type") == "thread"
+        assert p._init_kwargs.get("thread_id") == "1491249007475949698"
+
    def test_no_user_id_when_cli(self):
        """CLI sessions should not have user_id in kwargs."""
        mgr = MemoryManager()
@@ -334,3 +356,4 @@ class TestAIAgentUserIdPropagation:
            agent = object.__new__(AIAgent)
            agent._user_id = None
            assert agent._user_id is None
+
@@ -385,6 +385,7 @@ class TestStripProviderPrefix:
        assert _strip_provider_prefix("local:my-model") == "my-model"
        assert _strip_provider_prefix("openrouter:anthropic/claude-sonnet-4") == "anthropic/claude-sonnet-4"
        assert _strip_provider_prefix("anthropic:claude-sonnet-4") == "claude-sonnet-4"
+        assert _strip_provider_prefix("stepfun:step-3.5-flash") == "step-3.5-flash"

    def test_ollama_model_tag_preserved(self):
        """Ollama model:tag format must NOT be stripped."""
@@ -82,6 +82,7 @@ class TestProviderMapping:
    def test_known_providers_mapped(self):
        assert PROVIDER_TO_MODELS_DEV["anthropic"] == "anthropic"
        assert PROVIDER_TO_MODELS_DEV["copilot"] == "github-copilot"
+        assert PROVIDER_TO_MODELS_DEV["stepfun"] == "stepfun"
        assert PROVIDER_TO_MODELS_DEV["kilocode"] == "kilo"
        assert PROVIDER_TO_MODELS_DEV["ai-gateway"] == "vercel"

@@ -1059,6 +1059,7 @@ class TestRewriteTranscriptPreservesReasoning:
            role="assistant",
            content="The answer is 42.",
            reasoning="I need to think step by step.",
+            reasoning_content="provider scratchpad",
            reasoning_details=[{"type": "summary", "text": "step by step"}],
            codex_reasoning_items=[{"id": "r1", "type": "reasoning"}],
        )
@@ -1066,6 +1067,7 @@ class TestRewriteTranscriptPreservesReasoning:
        # Verify all three were stored
        before = db.get_messages_as_conversation(session_id)
        assert before[0].get("reasoning") == "I need to think step by step."
+        assert before[0].get("reasoning_content") == "provider scratchpad"
        assert before[0].get("reasoning_details") == [{"type": "summary", "text": "step by step"}]
        assert before[0].get("codex_reasoning_items") == [{"id": "r1", "type": "reasoning"}]

@@ -1082,5 +1084,6 @@ class TestRewriteTranscriptPreservesReasoning:
        # Load again — all three reasoning fields must survive
        after = db.get_messages_as_conversation(session_id)
        assert after[0].get("reasoning") == "I need to think step by step."
+        assert after[0].get("reasoning_content") == "provider scratchpad"
        assert after[0].get("reasoning_details") == [{"type": "summary", "text": "step by step"}]
        assert after[0].get("codex_reasoning_items") == [{"id": "r1", "type": "reasoning"}]
@@ -1031,7 +1031,7 @@ class TestReactions:

    @pytest.mark.asyncio
    async def test_reactions_in_message_flow(self, adapter):
-        """Reactions should be added on receipt and swapped on completion."""
+        """Reactions should be bracketed around actual processing via hooks."""
        adapter._app.client.reactions_add = AsyncMock()
        adapter._app.client.reactions_remove = AsyncMock()
        adapter._app.client.users_info = AsyncMock(return_value={
@@ -1047,15 +1047,147 @@ class TestReactions:
        }
        await adapter._handle_slack_message(event)

-        # Should have added 👀, then removed 👀, then added ✅
+        # _handle_slack_message should register the message for reactions
+        assert "1234567890.000001" in adapter._reacting_message_ids
+
+        # Simulate the base class calling on_processing_start
+        from gateway.platforms.base import MessageEvent, MessageType, SessionSource
+        from gateway.config import Platform
+        source = SessionSource(
+            platform=Platform.SLACK,
+            chat_id="C123",
+            chat_type="dm",
+            user_id="U_USER",
+        )
+        msg_event = MessageEvent(
+            text="hello",
+            message_type=MessageType.TEXT,
+            source=source,
+            message_id="1234567890.000001",
+        )
+        await adapter.on_processing_start(msg_event)
+
+        add_calls = adapter._app.client.reactions_add.call_args_list
+        assert len(add_calls) == 1
+        assert add_calls[0].kwargs["name"] == "eyes"
+
+        # Simulate the base class calling on_processing_complete
+        from gateway.platforms.base import ProcessingOutcome
+        await adapter.on_processing_complete(msg_event, ProcessingOutcome.SUCCESS)
+
        add_calls = adapter._app.client.reactions_add.call_args_list
        remove_calls = adapter._app.client.reactions_remove.call_args_list
        assert len(add_calls) == 2
-        assert add_calls[0].kwargs["name"] == "eyes"
        assert add_calls[1].kwargs["name"] == "white_check_mark"
        assert len(remove_calls) == 1
        assert remove_calls[0].kwargs["name"] == "eyes"

+        # Message ID should be cleaned up
+        assert "1234567890.000001" not in adapter._reacting_message_ids
+
+    @pytest.mark.asyncio
+    async def test_reactions_failure_outcome(self, adapter):
+        """Failed processing should add :x: instead of :white_check_mark:."""
+        adapter._app.client.reactions_add = AsyncMock()
+        adapter._app.client.reactions_remove = AsyncMock()
+
+        from gateway.platforms.base import MessageEvent, MessageType, SessionSource, ProcessingOutcome
+        from gateway.config import Platform
+        source = SessionSource(
+            platform=Platform.SLACK,
+            chat_id="C123",
+            chat_type="dm",
+            user_id="U_USER",
+        )
+        adapter._reacting_message_ids.add("1234567890.000002")
+        msg_event = MessageEvent(
+            text="hello",
+            message_type=MessageType.TEXT,
+            source=source,
+            message_id="1234567890.000002",
+        )
+        await adapter.on_processing_complete(msg_event, ProcessingOutcome.FAILURE)
+
+        add_calls = adapter._app.client.reactions_add.call_args_list
+        remove_calls = adapter._app.client.reactions_remove.call_args_list
+        assert len(add_calls) == 1
+        assert add_calls[0].kwargs["name"] == "x"
+        assert len(remove_calls) == 1
+        assert remove_calls[0].kwargs["name"] == "eyes"
+
+    @pytest.mark.asyncio
+    async def test_reactions_skipped_for_non_dm_non_mention(self, adapter):
+        """Non-DM, non-mention messages should not get reactions."""
+        adapter._app.client.reactions_add = AsyncMock()
+        adapter._app.client.reactions_remove = AsyncMock()
+        adapter._app.client.users_info = AsyncMock(return_value={
+            "user": {"profile": {"display_name": "Tyler"}}
+        })
+
+        event = {
+            "text": "hello",
+            "user": "U_USER",
+            "channel": "C123",
+            "channel_type": "channel",
+            "ts": "1234567890.000003",
+        }
+        await adapter._handle_slack_message(event)
+
+        # Should NOT register for reactions when not mentioned in a channel
+        assert "1234567890.000003" not in adapter._reacting_message_ids
+        adapter._app.client.reactions_add.assert_not_called()
+        adapter._app.client.reactions_remove.assert_not_called()
+
+    @pytest.mark.asyncio
+    async def test_reactions_disabled_via_env(self, adapter, monkeypatch):
+        """SLACK_REACTIONS=false should suppress all reaction lifecycle."""
+        monkeypatch.setenv("SLACK_REACTIONS", "false")
+        adapter._app.client.reactions_add = AsyncMock()
+        adapter._app.client.reactions_remove = AsyncMock()
+        adapter._app.client.users_info = AsyncMock(return_value={
+            "user": {"profile": {"display_name": "Tyler"}}
+        })
+
+        event = {
+            "text": "hello",
+            "user": "U_USER",
+            "channel": "C123",
+            "channel_type": "im",
+            "ts": "1234567890.000004",
+        }
+        await adapter._handle_slack_message(event)
+
+        # Should NOT register for reactions when toggle is off
+        assert "1234567890.000004" not in adapter._reacting_message_ids
+
+        # Hooks should also be no-ops when disabled
+        from gateway.platforms.base import MessageEvent, MessageType, SessionSource, ProcessingOutcome
+        from gateway.config import Platform
+        source = SessionSource(
+            platform=Platform.SLACK,
+            chat_id="C123",
+            chat_type="dm",
+            user_id="U_USER",
+        )
+        msg_event = MessageEvent(
+            text="hello",
+            message_type=MessageType.TEXT,
+            source=source,
+            message_id="1234567890.000004",
+        )
+        # Force-add to verify hooks respect the toggle independently
+        adapter._reacting_message_ids.add("1234567890.000004")
+        await adapter.on_processing_start(msg_event)
+        await adapter.on_processing_complete(msg_event, ProcessingOutcome.SUCCESS)
+
+        adapter._app.client.reactions_add.assert_not_called()
+        adapter._app.client.reactions_remove.assert_not_called()
+
+    @pytest.mark.asyncio
+    async def test_reactions_enabled_by_default(self, adapter):
+        """SLACK_REACTIONS defaults to true (matches existing behavior)."""
+        assert adapter._reactions_enabled() is True
+

 # ---------------------------------------------------------------------------
 # TestThreadReplyHandling
@@ -15,6 +15,8 @@ from hermes_cli.auth import (
    get_auth_status,
    AuthError,
    KIMI_CODE_BASE_URL,
+    STEPFUN_STEP_PLAN_INTL_BASE_URL,
+    STEPFUN_STEP_PLAN_CN_BASE_URL,
    _resolve_kimi_base_url,
 )
 from hermes_cli.copilot_auth import _try_gh_cli_token
@@ -35,6 +37,7 @@ class TestProviderRegistry:
        ("xai", "xAI", "api_key"),
        ("nvidia", "NVIDIA NIM", "api_key"),
        ("kimi-coding", "Kimi / Moonshot", "api_key"),
+        ("stepfun", "StepFun Step Plan", "api_key"),
        ("minimax", "MiniMax", "api_key"),
        ("minimax-cn", "MiniMax (China)", "api_key"),
        ("ai-gateway", "Vercel AI Gateway", "api_key"),
@@ -83,6 +86,11 @@ class TestProviderRegistry:
        assert pconfig.api_key_env_vars == ("MINIMAX_API_KEY",)
        assert pconfig.base_url_env_var == "MINIMAX_BASE_URL"

+    def test_stepfun_env_vars(self):
+        pconfig = PROVIDER_REGISTRY["stepfun"]
+        assert pconfig.api_key_env_vars == ("STEPFUN_API_KEY",)
+        assert pconfig.base_url_env_var == "STEPFUN_BASE_URL"
+
    def test_minimax_cn_env_vars(self):
        pconfig = PROVIDER_REGISTRY["minimax-cn"]
        assert pconfig.api_key_env_vars == ("MINIMAX_CN_API_KEY",)
@@ -108,6 +116,7 @@ class TestProviderRegistry:
        assert PROVIDER_REGISTRY["copilot-acp"].inference_base_url == "acp://copilot"
        assert PROVIDER_REGISTRY["zai"].inference_base_url == "https://api.z.ai/api/paas/v4"
        assert PROVIDER_REGISTRY["kimi-coding"].inference_base_url == "https://api.moonshot.ai/v1"
+        assert PROVIDER_REGISTRY["stepfun"].inference_base_url == STEPFUN_STEP_PLAN_INTL_BASE_URL
        assert PROVIDER_REGISTRY["minimax"].inference_base_url == "https://api.minimax.io/anthropic"
        assert PROVIDER_REGISTRY["minimax-cn"].inference_base_url == "https://api.minimaxi.com/anthropic"
        assert PROVIDER_REGISTRY["ai-gateway"].inference_base_url == "https://ai-gateway.vercel.sh/v1"
@@ -130,7 +139,8 @@ PROVIDER_ENV_VARS = (
    "OPENROUTER_API_KEY", "OPENAI_API_KEY", "ANTHROPIC_API_KEY", "ANTHROPIC_TOKEN",
    "CLAUDE_CODE_OAUTH_TOKEN",
    "GLM_API_KEY", "ZAI_API_KEY", "Z_AI_API_KEY",
-    "KIMI_API_KEY", "KIMI_BASE_URL", "MINIMAX_API_KEY", "MINIMAX_CN_API_KEY",
+    "KIMI_API_KEY", "KIMI_BASE_URL", "STEPFUN_API_KEY", "STEPFUN_BASE_URL",
+    "MINIMAX_API_KEY", "MINIMAX_CN_API_KEY",
    "AI_GATEWAY_API_KEY", "AI_GATEWAY_BASE_URL",
    "KILOCODE_API_KEY", "KILOCODE_BASE_URL",
    "DASHSCOPE_API_KEY", "OPENCODE_ZEN_API_KEY", "OPENCODE_GO_API_KEY",
@@ -156,6 +166,9 @@ class TestResolveProvider:
    def test_explicit_kimi_coding(self):
        assert resolve_provider("kimi-coding") == "kimi-coding"

+    def test_explicit_stepfun(self):
+        assert resolve_provider("stepfun") == "stepfun"
+
    def test_explicit_minimax(self):
        assert resolve_provider("minimax") == "minimax"

@@ -180,6 +193,9 @@ class TestResolveProvider:
    def test_alias_moonshot(self):
        assert resolve_provider("moonshot") == "kimi-coding"

+    def test_alias_step(self):
+        assert resolve_provider("step") == "stepfun"
+
    def test_alias_minimax_underscore(self):
        assert resolve_provider("minimax_cn") == "minimax-cn"

@@ -248,6 +264,10 @@ class TestResolveProvider:
        monkeypatch.setenv("KIMI_API_KEY", "test-kimi-key")
        assert resolve_provider("auto") == "kimi-coding"

+    def test_auto_detects_stepfun_key(self, monkeypatch):
+        monkeypatch.setenv("STEPFUN_API_KEY", "test-stepfun-key")
+        assert resolve_provider("auto") == "stepfun"
+
    def test_auto_detects_minimax_key(self, monkeypatch):
        monkeypatch.setenv("MINIMAX_API_KEY", "test-mm-key")
        assert resolve_provider("auto") == "minimax"
@@ -312,6 +332,13 @@ class TestApiKeyProviderStatus:
        status = get_api_key_provider_status("kimi-coding")
        assert status["base_url"] == "https://custom.kimi.example/v1"

+    def test_stepfun_status_uses_configured_base_url(self, monkeypatch):
+        monkeypatch.setenv("STEPFUN_API_KEY", "stepfun-key")
+        monkeypatch.setenv("STEPFUN_BASE_URL", STEPFUN_STEP_PLAN_CN_BASE_URL)
+        status = get_api_key_provider_status("stepfun")
+        assert status["configured"] is True
+        assert status["base_url"] == STEPFUN_STEP_PLAN_CN_BASE_URL
+
    def test_copilot_status_uses_gh_cli_token(self, monkeypatch):
        monkeypatch.setattr("hermes_cli.copilot_auth._try_gh_cli_token", lambda: "gho_gh_cli_token")
        status = get_api_key_provider_status("copilot")
@@ -429,6 +456,19 @@ class TestResolveApiKeyProviderCredentials:
        assert creds["api_key"] == "kimi-secret-key"
        assert creds["base_url"] == "https://api.moonshot.ai/v1"

+    def test_resolve_stepfun_with_key(self, monkeypatch):
+        monkeypatch.setenv("STEPFUN_API_KEY", "stepfun-secret-key")
+        creds = resolve_api_key_provider_credentials("stepfun")
+        assert creds["provider"] == "stepfun"
+        assert creds["api_key"] == "stepfun-secret-key"
+        assert creds["base_url"] == STEPFUN_STEP_PLAN_INTL_BASE_URL
+
+    def test_resolve_stepfun_custom_base_url(self, monkeypatch):
+        monkeypatch.setenv("STEPFUN_API_KEY", "stepfun-secret-key")
+        monkeypatch.setenv("STEPFUN_BASE_URL", STEPFUN_STEP_PLAN_CN_BASE_URL)
+        creds = resolve_api_key_provider_credentials("stepfun")
+        assert creds["base_url"] == STEPFUN_STEP_PLAN_CN_BASE_URL
+
    def test_resolve_minimax_with_key(self, monkeypatch):
        monkeypatch.setenv("MINIMAX_API_KEY", "mm-secret-key")
        creds = resolve_api_key_provider_credentials("minimax")
@@ -519,6 +559,16 @@ class TestRuntimeProviderResolution:
        assert result["api_mode"] == "chat_completions"
        assert result["api_key"] == "kimi-key"

+    def test_runtime_stepfun(self, monkeypatch):
+        monkeypatch.setenv("STEPFUN_API_KEY", "stepfun-key")
+        monkeypatch.setenv("STEPFUN_BASE_URL", STEPFUN_STEP_PLAN_CN_BASE_URL)
+        from hermes_cli.runtime_provider import resolve_runtime_provider
+        result = resolve_runtime_provider(requested="stepfun")
+        assert result["provider"] == "stepfun"
+        assert result["api_mode"] == "chat_completions"
+        assert result["api_key"] == "stepfun-key"
+        assert result["base_url"] == STEPFUN_STEP_PLAN_CN_BASE_URL
+
    def test_runtime_minimax(self, monkeypatch):
        monkeypatch.setenv("MINIMAX_API_KEY", "mm-key")
        from hermes_cli.runtime_provider import resolve_runtime_provider
@@ -33,6 +33,25 @@ def test_project_env_overrides_stale_shell_values_when_user_env_missing(tmp_path
    assert os.getenv("OPENAI_BASE_URL") == "https://project.example/v1"


+def test_project_env_is_sanitized_before_loading(tmp_path, monkeypatch):
+    home = tmp_path / "hermes"
+    project_env = tmp_path / ".env"
+    project_env.write_text(
+        "TELEGRAM_BOT_TOKEN=8356550917:AAGGEkzg06Hrc3Hjb3Sa1jkGVDOdU_lYy2Q"
+        "ANTHROPIC_API_KEY=sk-ant-test123\n",
+        encoding="utf-8",
+    )
+
+    monkeypatch.delenv("TELEGRAM_BOT_TOKEN", raising=False)
+    monkeypatch.delenv("ANTHROPIC_API_KEY", raising=False)
+
+    loaded = load_hermes_dotenv(hermes_home=home, project_env=project_env)
+
+    assert loaded == [project_env]
+    assert os.getenv("TELEGRAM_BOT_TOKEN") == "8356550917:AAGGEkzg06Hrc3Hjb3Sa1jkGVDOdU_lYy2Q"
+    assert os.getenv("ANTHROPIC_API_KEY") == "sk-ant-test123"
+
+
 def test_user_env_takes_precedence_over_project_env(tmp_path, monkeypatch):
    home = tmp_path / "hermes"
    home.mkdir()
@@ -32,6 +32,8 @@ def config_home(tmp_path, monkeypatch):
    monkeypatch.delenv("OPENAI_BASE_URL", raising=False)
    monkeypatch.delenv("OPENAI_API_KEY", raising=False)
    monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
+    monkeypatch.delenv("STEPFUN_API_KEY", raising=False)
+    monkeypatch.delenv("STEPFUN_BASE_URL", raising=False)
    return home


@@ -330,3 +332,33 @@ class TestBaseUrlValidation:

        saved = get_env_value("GLM_BASE_URL") or ""
        assert saved == "", "Empty input should not save a base URL"
+
+    def test_stepfun_provider_saved_with_selected_region(self, config_home, monkeypatch):
+        from hermes_cli.main import _model_flow_stepfun
+        from hermes_cli.config import load_config, get_env_value
+
+        monkeypatch.setenv("STEPFUN_API_KEY", "stepfun-test-key")
+
+        with patch(
+            "hermes_cli.main._prompt_provider_choice",
+            return_value=1,
+        ), patch(
+            "hermes_cli.models.fetch_api_models",
+            return_value=["step-3.5-flash", "step-3-agent-lite"],
+        ), patch(
+            "hermes_cli.auth._prompt_model_selection",
+            return_value="step-3-agent-lite",
+        ), patch(
+            "hermes_cli.auth.deactivate_provider",
+        ):
+            _model_flow_stepfun(load_config(), "old-model")
+
+        import yaml
+
+        config = yaml.safe_load((config_home / "config.yaml").read_text()) or {}
+        model = config.get("model")
+        assert isinstance(model, dict)
+        assert model.get("provider") == "stepfun"
+        assert model.get("default") == "step-3-agent-lite"
+        assert model.get("base_url") == "https://api.stepfun.com/step_plan/v1"
+        assert get_env_value("STEPFUN_BASE_URL") == "https://api.stepfun.com/step_plan/v1"
@@ -63,6 +63,11 @@ class TestParseModelInput:
        assert provider == "zai"
        assert model == "glm-5"

+    def test_stepfun_alias_resolved(self):
+        provider, model = parse_model_input("step:step-3.5-flash", "openrouter")
+        assert provider == "stepfun"
+        assert model == "step-3.5-flash"
+
    def test_no_slash_no_colon_keeps_provider(self):
        provider, model = parse_model_input("gpt-5.4", "openrouter")
        assert provider == "openrouter"
@@ -154,6 +159,7 @@ class TestNormalizeProvider:
        assert normalize_provider("glm") == "zai"
        assert normalize_provider("kimi") == "kimi-coding"
        assert normalize_provider("moonshot") == "kimi-coding"
+        assert normalize_provider("step") == "stepfun"
        assert normalize_provider("github-copilot") == "copilot"

    def test_case_insensitive(self):
@@ -164,6 +170,7 @@ class TestProviderLabel:
    def test_known_labels_and_auto(self):
        assert provider_label("anthropic") == "Anthropic"
        assert provider_label("kimi") == "Kimi / Kimi Coding Plan"
+        assert provider_label("stepfun") == "StepFun Step Plan"
        assert provider_label("copilot") == "GitHub Copilot"
        assert provider_label("copilot-acp") == "GitHub Copilot ACP"
        assert provider_label("auto") == "Auto"
@@ -193,6 +200,16 @@ class TestProviderModelIds:
    def test_zai_returns_glm_models(self):
        assert "glm-5" in provider_model_ids("zai")

+    def test_stepfun_prefers_live_catalog(self):
+        with patch(
+            "hermes_cli.auth.resolve_api_key_provider_credentials",
+            return_value={"api_key": "***", "base_url": "https://api.stepfun.com/step_plan/v1"},
+        ), patch(
+            "hermes_cli.models.fetch_api_models",
+            return_value=["step-3.5-flash", "step-3-agent-lite"],
+        ):
+            assert provider_model_ids("stepfun") == ["step-3.5-flash", "step-3-agent-lite"]
+
    def test_copilot_prefers_live_catalog(self):
        with patch("hermes_cli.auth.resolve_api_key_provider_credentials", return_value={"api_key": "gh-token"}), \
             patch("hermes_cli.models._fetch_github_models", return_value=["gpt-5.4", "claude-sonnet-4.6"]):
@@ -250,6 +250,73 @@ class TestPluginLoading:

        assert "hermes_plugins.ns_plugin" in sys.modules

+    def test_user_memory_plugin_auto_coerced_to_exclusive(self, tmp_path, monkeypatch):
+        """User-installed memory plugins must NOT be loaded by the general
+        PluginManager — they belong to plugins/memory discovery.
+
+        Regression test for the mempalace crash:
+            'PluginContext' object has no attribute 'register_memory_provider'
+
+        A plugin that calls ``ctx.register_memory_provider`` in its
+        ``__init__.py`` should be auto-detected and treated as
+        ``kind: exclusive`` so the general loader records the manifest but
+        does not import/register() it. The real activation happens through
+        ``plugins/memory/__init__.py`` via ``memory.provider`` config.
+        """
+        plugins_dir = tmp_path / "hermes_test" / "plugins"
+        plugin_dir = plugins_dir / "mempalace"
+        plugin_dir.mkdir(parents=True)
+        # No explicit `kind:` — the heuristic should kick in.
+        (plugin_dir / "plugin.yaml").write_text(yaml.dump({"name": "mempalace"}))
+        (plugin_dir / "__init__.py").write_text(
+            "class MemPalaceProvider:\n"
+            "    pass\n"
+            "def register(ctx):\n"
+            "    ctx.register_memory_provider('mempalace', MemPalaceProvider)\n"
+        )
+        # Even if the user explicitly enables it in config, the loader
+        # should still treat it as exclusive and skip general loading.
+        hermes_home = tmp_path / "hermes_test"
+        (hermes_home / "config.yaml").write_text(
+            yaml.safe_dump({"plugins": {"enabled": ["mempalace"]}})
+        )
+        monkeypatch.setenv("HERMES_HOME", str(hermes_home))
+
+        mgr = PluginManager()
+        mgr.discover_and_load()
+
+        assert "mempalace" in mgr._plugins
+        entry = mgr._plugins["mempalace"]
+        assert entry.manifest.kind == "exclusive", (
+            f"Expected auto-coerced kind='exclusive', got {entry.manifest.kind}"
+        )
+        # Not loaded by general manager (no register() call, no AttributeError).
+        assert not entry.enabled
+        assert entry.module is None
+        assert "exclusive" in (entry.error or "").lower()
+
+    def test_explicit_standalone_kind_not_coerced(self, tmp_path, monkeypatch):
+        """If a plugin explicitly declares ``kind: standalone`` in its
+        manifest, the memory-provider heuristic must NOT override it —
+        even if the source happens to mention ``MemoryProvider``.
+        """
+        plugins_dir = tmp_path / "hermes_test" / "plugins"
+        plugin_dir = plugins_dir / "not_memory"
+        plugin_dir.mkdir(parents=True)
+        (plugin_dir / "plugin.yaml").write_text(
+            yaml.dump({"name": "not_memory", "kind": "standalone"})
+        )
+        (plugin_dir / "__init__.py").write_text(
+            "# This plugin inspects MemoryProvider docs but isn't one.\n"
+            "def register(ctx):\n    pass\n"
+        )
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path / "hermes_test"))
+
+        mgr = PluginManager()
+        mgr.discover_and_load()
+
+        assert mgr._plugins["not_memory"].manifest.kind == "standalone"
+

 # ── TestPluginHooks ────────────────────────────────────────────────────────

@@ -706,6 +706,7 @@ class TestNewEndpoints:
        assert "skills" in data
        assert isinstance(data["daily"], list)
        assert "total_sessions" in data["totals"]
+        assert "total_api_calls" in data["totals"]
        assert data["skills"] == {
            "summary": {
                "total_skill_loads": 0,
@@ -6,6 +6,7 @@ turn counting, tags), and schema completeness.
 """

 import json
+import re
 import threading
 from types import SimpleNamespace
 from unittest.mock import AsyncMock, MagicMock, patch
@@ -18,6 +19,7 @@ from plugins.memory.hindsight import (
    REFLECT_SCHEMA,
    RETAIN_SCHEMA,
    _load_config,
+    _normalize_retain_tags,
 )


@@ -32,14 +34,30 @@ def _clean_env(monkeypatch):
    for key in (
        "HINDSIGHT_API_KEY", "HINDSIGHT_API_URL", "HINDSIGHT_BANK_ID",
        "HINDSIGHT_BUDGET", "HINDSIGHT_MODE", "HINDSIGHT_LLM_API_KEY",
+        "HINDSIGHT_RETAIN_TAGS", "HINDSIGHT_RETAIN_SOURCE",
+        "HINDSIGHT_RETAIN_USER_PREFIX", "HINDSIGHT_RETAIN_ASSISTANT_PREFIX",
    ):
        monkeypatch.delenv(key, raising=False)


 def _make_mock_client():
    """Create a mock Hindsight client with async methods."""
+    async def _aretain(
+        bank_id,
+        content,
+        timestamp=None,
+        context=None,
+        document_id=None,
+        metadata=None,
+        entities=None,
+        tags=None,
+        update_mode=None,
+        retain_async=None,
+    ):
+        return SimpleNamespace(ok=True)
+
    client = MagicMock()
-    client.aretain = AsyncMock()
+    client.aretain = AsyncMock(side_effect=_aretain)
    client.arecall = AsyncMock(
        return_value=SimpleNamespace(
            results=[
@@ -56,6 +74,14 @@ def _make_mock_client():
    return client


+class _FakeSessionDB:
+    def __init__(self, messages=None):
+        self._messages = list(messages or [])
+
+    def get_messages_as_conversation(self, session_id):
+        return list(self._messages)
+
+
@pytest.fixture()
 def provider(tmp_path, monkeypatch):
    """Create an initialized HindsightMemoryProvider with a mock client."""
@@ -109,6 +135,18 @@ def provider_with_config(tmp_path, monkeypatch):
    return _make


+def test_normalize_retain_tags_accepts_csv_and_dedupes():
+    assert _normalize_retain_tags("agent:fakeassistantname, source_system:hermes-agent, agent:fakeassistantname") == [
+        "agent:fakeassistantname",
+        "source_system:hermes-agent",
+    ]
+
+
+def test_normalize_retain_tags_accepts_json_array_string():
+    value = json.dumps(["agent:fakeassistantname", "source_system:hermes-agent"])
+    assert _normalize_retain_tags(value) == ["agent:fakeassistantname", "source_system:hermes-agent"]
+
+
 # ---------------------------------------------------------------------------
 # Schema tests
 # ---------------------------------------------------------------------------
@@ -118,6 +156,7 @@ class TestSchemas:
    def test_retain_schema_has_content(self):
        assert RETAIN_SCHEMA["name"] == "hindsight_retain"
        assert "content" in RETAIN_SCHEMA["parameters"]["properties"]
+        assert "tags" in RETAIN_SCHEMA["parameters"]["properties"]
        assert "content" in RETAIN_SCHEMA["parameters"]["required"]

    def test_recall_schema_has_query(self):
@@ -160,7 +199,10 @@ class TestConfig:

    def test_custom_config_values(self, provider_with_config):
        p = provider_with_config(
-            tags=["tag1", "tag2"],
+            retain_tags=["tag1", "tag2"],
+            retain_source="hermes",
+            retain_user_prefix="User (fakeusername)",
+            retain_assistant_prefix="Assistant (fakeassistantname)",
            recall_tags=["recall-tag"],
            recall_tags_match="all",
            auto_retain=False,
@@ -175,6 +217,10 @@ class TestConfig:
            bank_mission="Test agent mission",
        )
        assert p._tags == ["tag1", "tag2"]
+        assert p._retain_tags == ["tag1", "tag2"]
+        assert p._retain_source == "hermes"
+        assert p._retain_user_prefix == "User (fakeusername)"
+        assert p._retain_assistant_prefix == "Assistant (fakeassistantname)"
        assert p._recall_tags == ["recall-tag"]
        assert p._recall_tags_match == "all"
        assert p._auto_retain is False
@@ -222,11 +268,20 @@ class TestToolHandlers:
        assert call_kwargs["content"] == "user likes dark mode"

    def test_retain_with_tags(self, provider_with_config):
-        p = provider_with_config(tags=["pref", "ui"])
+        p = provider_with_config(retain_tags=["pref", "ui"])
        p.handle_tool_call("hindsight_retain", {"content": "likes dark mode"})
        call_kwargs = p._client.aretain.call_args.kwargs
        assert call_kwargs["tags"] == ["pref", "ui"]

+    def test_retain_merges_per_call_tags_with_config_tags(self, provider_with_config):
+        p = provider_with_config(retain_tags=["pref", "ui"])
+        p.handle_tool_call(
+            "hindsight_retain",
+            {"content": "likes dark mode", "tags": ["client:x", "ui"]},
+        )
+        call_kwargs = p._client.aretain.call_args.kwargs
+        assert call_kwargs["tags"] == ["pref", "ui", "client:x"]
+
    def test_retain_without_tags(self, provider):
        provider.handle_tool_call("hindsight_retain", {"content": "hello"})
        call_kwargs = provider._client.aretain.call_args.kwargs
@@ -389,38 +444,58 @@ class TestPrefetch:


 class TestSyncTurn:
-    def _get_retain_kwargs(self, provider):
-        """Helper to get the kwargs from the aretain_batch call."""
-        return provider._client.aretain_batch.call_args.kwargs
+    def test_sync_turn_retains_metadata_rich_turn(self, provider_with_config):
+        p = provider_with_config(
+            retain_tags=["conv", "session1"],
+            retain_source="hermes",
+            retain_user_prefix="User (fakeusername)",
+            retain_assistant_prefix="Assistant (fakeassistantname)",
+        )
+        p.initialize(
+            session_id="session-1",
+            platform="discord",
+            user_id="fakeusername-123",
+            user_name="fakeusername",
+            chat_id="1485316232612941897",
+            chat_name="fakeassistantname-forums",
+            chat_type="thread",
+            thread_id="1491249007475949698",
+            agent_identity="fakeassistantname",
+        )
+        p._client = _make_mock_client()

-    def _get_retain_content(self, provider):
-        """Helper to get the raw content string from the first item."""
-        kwargs = self._get_retain_kwargs(provider)
-        return kwargs["items"][0]["content"]
+        p.sync_turn("hello", "hi there")
+        p._sync_thread.join(timeout=5.0)

-    def _get_retain_messages(self, provider):
-        """Helper to parse the first turn's messages from retained content.
-
-        Content is a JSON array of turns: [[msgs...], [msgs...], ...]
-        For single-turn tests, returns the first turn's messages.
-        """
-        content = self._get_retain_content(provider)
-        turns = json.loads(content)
-        return turns[0] if len(turns) == 1 else turns
-
-    def test_sync_turn_retains(self, provider):
-        provider.sync_turn("hello", "hi there")
-        if provider._sync_thread:
-            provider._sync_thread.join(timeout=5.0)
-        provider._client.aretain_batch.assert_called_once()
-        messages = self._get_retain_messages(provider)
-        assert len(messages) == 2
-        assert messages[0]["role"] == "user"
-        assert messages[0]["content"] == "hello"
-        assert "timestamp" in messages[0]
-        assert messages[1]["role"] == "assistant"
-        assert messages[1]["content"] == "hi there"
-        assert "timestamp" in messages[1]
+        p._client.aretain_batch.assert_called_once()
+        call_kwargs = p._client.aretain_batch.call_args.kwargs
+        assert call_kwargs["bank_id"] == "test-bank"
+        assert call_kwargs["document_id"] == "session-1"
+        assert call_kwargs["retain_async"] is True
+        assert len(call_kwargs["items"]) == 1
+        item = call_kwargs["items"][0]
+        assert item["context"] == "conversation between Hermes Agent and the User"
+        assert item["tags"] == ["conv", "session1"]
+        content = json.loads(item["content"])
+        assert len(content) == 1
+        assert content[0][0]["role"] == "user"
+        assert content[0][0]["content"] == "User (fakeusername): hello"
+        assert content[0][1]["role"] == "assistant"
+        assert content[0][1]["content"] == "Assistant (fakeassistantname): hi there"
+        assert item["metadata"]["source"] == "hermes"
+        assert item["metadata"]["session_id"] == "session-1"
+        assert item["metadata"]["platform"] == "discord"
+        assert item["metadata"]["user_id"] == "fakeusername-123"
+        assert item["metadata"]["user_name"] == "fakeusername"
+        assert item["metadata"]["chat_id"] == "1485316232612941897"
+        assert item["metadata"]["chat_name"] == "fakeassistantname-forums"
+        assert item["metadata"]["chat_type"] == "thread"
+        assert item["metadata"]["thread_id"] == "1491249007475949698"
+        assert item["metadata"]["agent_identity"] == "fakeassistantname"
+        assert item["metadata"]["turn_index"] == "1"
+        assert item["metadata"]["message_count"] == "2"
+        assert re.fullmatch(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?\+00:00", content[0][0]["timestamp"])
+        assert re.fullmatch(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z", item["metadata"]["retained_at"])

    def test_sync_turn_skipped_when_auto_retain_off(self, provider_with_config):
        p = provider_with_config(auto_retain=False)
@@ -428,93 +503,33 @@ class TestSyncTurn:
        assert p._sync_thread is None
        p._client.aretain_batch.assert_not_called()

-    def test_sync_turn_with_tags(self, provider_with_config):
-        p = provider_with_config(tags=["conv", "session1"])
-        p.sync_turn("hello", "hi")
-        if p._sync_thread:
-            p._sync_thread.join(timeout=5.0)
-        item = p._client.aretain_batch.call_args.kwargs["items"][0]
-        assert item["tags"] == ["conv", "session1"]
-
-    def test_sync_turn_uses_aretain_batch(self, provider):
-        """sync_turn should use aretain_batch with retain_async."""
-        provider.sync_turn("hello", "hi")
-        if provider._sync_thread:
-            provider._sync_thread.join(timeout=5.0)
-        provider._client.aretain_batch.assert_called_once()
-        call_kwargs = provider._client.aretain_batch.call_args.kwargs
-        assert call_kwargs["document_id"] == "test-session"
-        assert call_kwargs["retain_async"] is True
-        assert len(call_kwargs["items"]) == 1
-        assert call_kwargs["items"][0]["context"] == "conversation between Hermes Agent and the User"
-
-    def test_sync_turn_custom_context(self, provider_with_config):
-        p = provider_with_config(retain_context="my-agent")
-        p.sync_turn("hello", "hi")
-        if p._sync_thread:
-            p._sync_thread.join(timeout=5.0)
-        item = p._client.aretain_batch.call_args.kwargs["items"][0]
-        assert item["context"] == "my-agent"
-
    def test_sync_turn_every_n_turns(self, provider_with_config):
-        """With retain_every_n_turns=3, only retains on every 3rd turn."""
-        p = provider_with_config(retain_every_n_turns=3)
-
+        p = provider_with_config(retain_every_n_turns=3, retain_async=False)
        p.sync_turn("turn1-user", "turn1-asst")
-        assert p._sync_thread is None  # not retained yet
-
+        assert p._sync_thread is None
        p.sync_turn("turn2-user", "turn2-asst")
-        assert p._sync_thread is None  # not retained yet
-
+        assert p._sync_thread is None
        p.sync_turn("turn3-user", "turn3-asst")
-        assert p._sync_thread is not None  # retained!
        p._sync_thread.join(timeout=5.0)
-
        p._client.aretain_batch.assert_called_once()
-        content = p._client.aretain_batch.call_args.kwargs["items"][0]["content"]
-        # Should contain all 3 turns
-        assert "turn1-user" in content
-        assert "turn2-user" in content
-        assert "turn3-user" in content
-
-    def test_sync_turn_accumulates_full_session(self, provider_with_config):
-        """Each retain sends the ENTIRE session, not just the latest batch."""
-        p = provider_with_config(retain_every_n_turns=2)
-
-        p.sync_turn("turn1-user", "turn1-asst")
-        p.sync_turn("turn2-user", "turn2-asst")
-        if p._sync_thread:
-            p._sync_thread.join(timeout=5.0)
-
-        p._client.aretain_batch.reset_mock()
-
-        p.sync_turn("turn3-user", "turn3-asst")
-        p.sync_turn("turn4-user", "turn4-asst")
-        if p._sync_thread:
-            p._sync_thread.join(timeout=5.0)
-
-        content = p._client.aretain_batch.call_args.kwargs["items"][0]["content"]
-        # Should contain ALL turns from the session
-        assert "turn1-user" in content
-        assert "turn2-user" in content
-        assert "turn3-user" in content
-        assert "turn4-user" in content
-
-    def test_sync_turn_passes_document_id(self, provider):
-        """sync_turn should pass session_id as document_id for dedup."""
-        provider.sync_turn("hello", "hi")
-        if provider._sync_thread:
-            provider._sync_thread.join(timeout=5.0)
-        call_kwargs = provider._client.aretain_batch.call_args.kwargs
+        call_kwargs = p._client.aretain_batch.call_args.kwargs
        assert call_kwargs["document_id"] == "test-session"
+        assert call_kwargs["retain_async"] is False
+        item = call_kwargs["items"][0]
+        content = json.loads(item["content"])
+        assert len(content) == 3
+        assert content[-1][0]["role"] == "user"
+        assert content[-1][0]["content"] == "User: turn3-user"
+        assert content[-1][1]["role"] == "assistant"
+        assert content[-1][1]["content"] == "Assistant: turn3-asst"
+        assert item["metadata"]["turn_index"] == "3"
+        assert item["metadata"]["message_count"] == "6"

    def test_sync_turn_error_does_not_raise(self, provider):
-        """Errors in sync_turn should be swallowed (non-blocking)."""
        provider._client.aretain_batch.side_effect = RuntimeError("network error")
        provider.sync_turn("hello", "hi")
        if provider._sync_thread:
            provider._sync_thread.join(timeout=5.0)
-        # Should not raise


 # ---------------------------------------------------------------------------
@@ -555,10 +570,11 @@ class TestConfigSchema:
            "mode", "api_url", "api_key", "llm_provider", "llm_api_key",
            "llm_model", "bank_id", "bank_mission", "bank_retain_mission",
            "recall_budget", "memory_mode", "recall_prefetch_method",
-            "tags", "recall_tags", "recall_tags_match",
+            "retain_tags", "retain_source",
+            "retain_user_prefix", "retain_assistant_prefix",
+            "recall_tags", "recall_tags_match",
            "auto_recall", "auto_retain",
-            "retain_every_n_turns", "retain_async",
-            "retain_context",
+            "retain_every_n_turns", "retain_async", "retain_context",
            "recall_max_tokens", "recall_max_input_chars",
            "recall_prompt_preamble",
        }
@@ -1216,6 +1216,15 @@ class TestBuildAssistantMessage:
        result = agent._build_assistant_message(msg, "stop")
        assert result["reasoning"] == "thinking"

+    def test_reasoning_content_preserved_separately(self, agent):
+        msg = _mock_assistant_msg(
+            content="answer",
+            reasoning="summary",
+            reasoning_content="provider scratchpad",
+        )
+        result = agent._build_assistant_message(msg, "stop")
+        assert result["reasoning_content"] == "provider scratchpad"
+
    def test_with_tool_calls(self, agent):
        tc = _mock_tool_call(name="web_search", arguments='{"q":"test"}', call_id="c1")
        msg = _mock_assistant_msg(content="", tool_calls=[tc])
@@ -4188,6 +4197,90 @@ class TestPersistUserMessageOverride:
        assert first_db_write["content"] == "Hello there"


+class TestReasoningReplayForStrictProviders:
+    """Assistant replay must preserve provider-native reasoning fields."""
+
+    def _setup_agent(self, agent):
+        agent._cached_system_prompt = "You are helpful."
+        agent._use_prompt_caching = False
+        agent.tool_delay = 0
+        agent.compression_enabled = False
+        agent.save_trajectories = False
+
+    def test_kimi_tool_replay_includes_empty_reasoning_content(self, agent):
+        self._setup_agent(agent)
+        agent.base_url = "https://api.kimi.com/coding/v1"
+        agent._base_url_lower = agent.base_url.lower()
+        agent.provider = "kimi-coding"
+
+        prior_assistant = {
+            "role": "assistant",
+            "content": "",
+            "tool_calls": [
+                {
+                    "id": "c1",
+                    "type": "function",
+                    "function": {"name": "terminal", "arguments": "{\"command\":\"date\"}"},
+                }
+            ],
+        }
+        tool_result = {"role": "tool", "tool_call_id": "c1", "content": "Tue Apr 21"}
+        final_resp = _mock_response(content="done", finish_reason="stop")
+        agent.client.chat.completions.create.return_value = final_resp
+
+        with (
+            patch.object(agent, "_persist_session"),
+            patch.object(agent, "_save_trajectory"),
+            patch.object(agent, "_cleanup_task_resources"),
+        ):
+            result = agent.run_conversation(
+                "next step",
+                conversation_history=[prior_assistant, tool_result],
+            )
+
+        assert result["completed"] is True
+        sent_messages = agent.client.chat.completions.create.call_args.kwargs["messages"]
+        replayed_assistant = next(msg for msg in sent_messages if msg.get("role") == "assistant")
+        assert replayed_assistant["role"] == "assistant"
+        assert replayed_assistant["tool_calls"][0]["function"]["name"] == "terminal"
+        assert "reasoning_content" in replayed_assistant
+        assert replayed_assistant["reasoning_content"] == ""
+
+    def test_explicit_reasoning_content_beats_normalized_reasoning_on_replay(self, agent):
+        self._setup_agent(agent)
+        prior_assistant = {
+            "role": "assistant",
+            "content": "",
+            "tool_calls": [
+                {
+                    "id": "c1",
+                    "type": "function",
+                    "function": {"name": "web_search", "arguments": "{\"q\":\"test\"}"},
+                }
+            ],
+            "reasoning": "summary reasoning",
+            "reasoning_content": "provider-native scratchpad",
+        }
+        tool_result = {"role": "tool", "tool_call_id": "c1", "content": "ok"}
+        final_resp = _mock_response(content="done", finish_reason="stop")
+        agent.client.chat.completions.create.return_value = final_resp
+
+        with (
+            patch.object(agent, "_persist_session"),
+            patch.object(agent, "_save_trajectory"),
+            patch.object(agent, "_cleanup_task_resources"),
+        ):
+            result = agent.run_conversation(
+                "next step",
+                conversation_history=[prior_assistant, tool_result],
+            )
+
+        assert result["completed"] is True
+        sent_messages = agent.client.chat.completions.create.call_args.kwargs["messages"]
+        replayed_assistant = next(msg for msg in sent_messages if msg.get("role") == "assistant")
+        assert replayed_assistant["reasoning_content"] == "provider-native scratchpad"
+
+
 # ---------------------------------------------------------------------------
 # Bugfix: _vprint force=True on error messages during TTS
 # ---------------------------------------------------------------------------
@@ -93,6 +93,27 @@ class TestSessionLifecycle:
        assert session["input_tokens"] == 300
        assert session["output_tokens"] == 150

+    def test_update_token_counts_tracks_api_call_count(self, db):
+        """api_call_count increments with each update_token_counts call."""
+        db.create_session(session_id="s1", source="cli")
+        db.update_token_counts("s1", input_tokens=100, output_tokens=50, api_call_count=1)
+        db.update_token_counts("s1", input_tokens=100, output_tokens=50, api_call_count=1)
+        db.update_token_counts("s1", input_tokens=100, output_tokens=50, api_call_count=1)
+
+        session = db.get_session("s1")
+        assert session["api_call_count"] == 3
+
+    def test_update_token_counts_api_call_count_absolute(self, db):
+        """absolute mode sets api_call_count directly."""
+        db.create_session(session_id="s1", source="cli")
+        db.update_token_counts("s1", input_tokens=100, output_tokens=50, api_call_count=1)
+        db.update_token_counts("s1", input_tokens=300, output_tokens=150,
+                               api_call_count=5, absolute=True)
+
+        session = db.get_session("s1")
+        assert session["api_call_count"] == 5
+        assert session["input_tokens"] == 300
+
    def test_update_token_counts_backfills_model_when_null(self, db):
        db.create_session(session_id="s1", source="telegram")
        db.update_token_counts("s1", input_tokens=10, output_tokens=5, model="openai/gpt-5.4")
@@ -255,6 +276,38 @@ class TestMessageStorage:
        assert msg["reasoning"] == "Thinking about what to say"
        assert msg["reasoning_details"] == details

+    def test_reasoning_content_persisted_and_restored(self, db):
+        """reasoning_content must survive session replay as its own field."""
+        db.create_session(session_id="s1", source="cli")
+        db.append_message(
+            "s1",
+            role="assistant",
+            content="Hello",
+            reasoning="Short summary",
+            reasoning_content="Longer provider-native scratchpad",
+        )
+
+        conv = db.get_messages_as_conversation("s1")
+        assert len(conv) == 1
+        assert conv[0]["reasoning"] == "Short summary"
+        assert conv[0]["reasoning_content"] == "Longer provider-native scratchpad"
+
+    def test_reasoning_content_empty_string_restored_for_assistant(self, db):
+        """Empty reasoning_content still needs to round-trip for strict replays."""
+        db.create_session(session_id="s1", source="cli")
+        db.append_message(
+            "s1",
+            role="assistant",
+            content="",
+            tool_calls=[{"id": "c1", "type": "function", "function": {"name": "date", "arguments": "{}"}}],
+            reasoning_content="",
+        )
+
+        conv = db.get_messages_as_conversation("s1")
+        assert len(conv) == 1
+        assert "reasoning_content" in conv[0]
+        assert conv[0]["reasoning_content"] == ""
+
    def test_reasoning_not_set_for_non_assistant(self, db):
        """reasoning is never leaked onto user or tool messages."""
        db.create_session(session_id="s1", source="telegram")
@@ -1120,7 +1173,7 @@ class TestSchemaInit:
    def test_schema_version(self, db):
        cursor = db._conn.execute("SELECT version FROM schema_version")
        version = cursor.fetchone()[0]
-        assert version == 6
+        assert version == 8

    def test_title_column_exists(self, db):
        """Verify the title column was created in the sessions table."""
@@ -1176,18 +1229,24 @@ class TestSchemaInit:
        conn.commit()
        conn.close()

-        # Open with SessionDB — should migrate to v6
+        # Open with SessionDB — should migrate to v8
        migrated_db = SessionDB(db_path=db_path)

        # Verify migration
        cursor = migrated_db._conn.execute("SELECT version FROM schema_version")
-        assert cursor.fetchone()[0] == 6
+        assert cursor.fetchone()[0] == 8

        # Verify title column exists and is NULL for existing sessions
        session = migrated_db.get_session("existing")
        assert session is not None
        assert session["title"] is None

+        # Verify api_call_count column was added with default 0
+        cursor = migrated_db._conn.execute(
+            "SELECT api_call_count FROM sessions WHERE id = 'existing'"
+        )
+        assert cursor.fetchone()[0] == 0
+
        # Verify we can set title on migrated session
        assert migrated_db.set_session_title("existing", "Migrated Title") is True
        session = migrated_db.get_session("existing")
@@ -1195,6 +1254,144 @@ class TestSchemaInit:

        migrated_db.close()

+    def test_reconciliation_adds_missing_columns(self, tmp_path):
+        """Columns present in SCHEMA_SQL but missing from the live table
+        are added by _reconcile_columns regardless of schema_version.
+
+        Regression test: commit a7d78d3b inserted a new v7 migration
+        (reasoning_content) and renumbered the old v7 (api_call_count)
+        to v8.  Users already at the old v7 had schema_version >= 7,
+        so the new v7 block was skipped and reasoning_content was never
+        created — causing 'no such column' on /continue.
+        """
+        import sqlite3
+
+        db_path = tmp_path / "gap_test.db"
+        conn = sqlite3.connect(str(db_path))
+        # Simulate the old v7 state: api_call_count exists, reasoning_content does NOT
+        conn.executescript("""
+            CREATE TABLE schema_version (version INTEGER NOT NULL);
+            INSERT INTO schema_version (version) VALUES (7);
+
+            CREATE TABLE sessions (
+                id TEXT PRIMARY KEY,
+                source TEXT NOT NULL,
+                user_id TEXT,
+                model TEXT,
+                model_config TEXT,
+                system_prompt TEXT,
+                parent_session_id TEXT,
+                started_at REAL NOT NULL,
+                ended_at REAL,
+                end_reason TEXT,
+                message_count INTEGER DEFAULT 0,
+                tool_call_count INTEGER DEFAULT 0,
+                input_tokens INTEGER DEFAULT 0,
+                output_tokens INTEGER DEFAULT 0,
+                cache_read_tokens INTEGER DEFAULT 0,
+                cache_write_tokens INTEGER DEFAULT 0,
+                reasoning_tokens INTEGER DEFAULT 0,
+                billing_provider TEXT,
+                billing_base_url TEXT,
+                billing_mode TEXT,
+                estimated_cost_usd REAL,
+                actual_cost_usd REAL,
+                cost_status TEXT,
+                cost_source TEXT,
+                pricing_version TEXT,
+                title TEXT,
+                api_call_count INTEGER DEFAULT 0
+            );
+
+            CREATE TABLE messages (
+                id INTEGER PRIMARY KEY AUTOINCREMENT,
+                session_id TEXT NOT NULL,
+                role TEXT NOT NULL,
+                content TEXT,
+                tool_call_id TEXT,
+                tool_calls TEXT,
+                tool_name TEXT,
+                timestamp REAL NOT NULL,
+                token_count INTEGER,
+                finish_reason TEXT,
+                reasoning TEXT,
+                reasoning_details TEXT,
+                codex_reasoning_items TEXT
+            );
+        """)
+        conn.execute(
+            "INSERT INTO sessions (id, source, started_at) VALUES (?, ?, ?)",
+            ("s1", "cli", 1000.0),
+        )
+        conn.execute(
+            "INSERT INTO messages (session_id, role, content, timestamp) "
+            "VALUES (?, ?, ?, ?)",
+            ("s1", "assistant", "hello", 1001.0),
+        )
+        conn.commit()
+        # Verify reasoning_content is absent
+        cols = {r[1] for r in conn.execute("PRAGMA table_info(messages)").fetchall()}
+        assert "reasoning_content" not in cols
+        conn.close()
+
+        # Open with SessionDB — reconciliation should add the missing column
+        migrated_db = SessionDB(db_path=db_path)
+
+        msg_cols = {
+            r[1]
+            for r in migrated_db._conn.execute("PRAGMA table_info(messages)").fetchall()
+        }
+        assert "reasoning_content" in msg_cols
+
+        # The query that used to crash must now work
+        cursor = migrated_db._conn.execute(
+            "SELECT role, content, reasoning, reasoning_content, "
+            "reasoning_details, codex_reasoning_items "
+            "FROM messages WHERE session_id = ?",
+            ("s1",),
+        )
+        row = cursor.fetchone()
+        assert row is not None
+        assert row[0] == "assistant"
+        assert row[3] is None  # reasoning_content NULL for old rows
+
+        migrated_db.close()
+
+    def test_reconciliation_is_idempotent(self, tmp_path):
+        """Opening the same database twice doesn't error or duplicate columns."""
+        db_path = tmp_path / "idempotent.db"
+        db1 = SessionDB(db_path=db_path)
+        cols1 = {r[1] for r in db1._conn.execute("PRAGMA table_info(messages)").fetchall()}
+        db1.close()
+
+        db2 = SessionDB(db_path=db_path)
+        cols2 = {r[1] for r in db2._conn.execute("PRAGMA table_info(messages)").fetchall()}
+        db2.close()
+
+        assert cols1 == cols2
+
+    def test_schema_sql_is_source_of_truth(self, db):
+        """Every column in SCHEMA_SQL exists in the live database.
+
+        This is the architectural invariant: SCHEMA_SQL declares the
+        desired schema, _reconcile_columns ensures it matches reality.
+        """
+        from hermes_state import SCHEMA_SQL
+
+        expected = SessionDB._parse_schema_columns(SCHEMA_SQL)
+        for table_name, declared_cols in expected.items():
+            live_cols = {
+                r[1]
+                for r in db._conn.execute(
+                    f'PRAGMA table_info("{table_name}")'
+                ).fetchall()
+            }
+            for col_name in declared_cols:
+                assert col_name in live_cols, (
+                    f"Column {col_name} declared in SCHEMA_SQL for {table_name} "
+                    f"but missing from live DB. Live columns: {live_cols}"
+                )
+

 class TestTitleUniqueness:
    """Tests for unique title enforcement and title-based lookups."""
@@ -1732,3 +1929,124 @@ class TestConcurrentWriteSafety:
        assert "30" in src, (
            "SQLite timeout should be at least 30s to handle CLI/gateway lock contention"
        )
+
+
+# =========================================================================
+# Auto-maintenance: state_meta + vacuum + maybe_auto_prune_and_vacuum
+# =========================================================================
+
+class TestStateMeta:
+    def test_get_meta_missing_returns_none(self, db):
+        assert db.get_meta("nonexistent") is None
+
+    def test_set_then_get_meta(self, db):
+        db.set_meta("foo", "bar")
+        assert db.get_meta("foo") == "bar"
+
+    def test_set_meta_upsert(self, db):
+        """set_meta overwrites existing value (ON CONFLICT DO UPDATE)."""
+        db.set_meta("key", "v1")
+        db.set_meta("key", "v2")
+        assert db.get_meta("key") == "v2"
+
+
+class TestVacuum:
+    def test_vacuum_runs_without_error(self, db):
+        """VACUUM must succeed on a fresh DB (no rows to reclaim)."""
+        db.create_session(session_id="s1", source="cli")
+        db.append_message(session_id="s1", role="user", content="hi")
+        # Should not raise, even though there's nothing significant to reclaim.
+        db.vacuum()
+
+
+class TestAutoMaintenance:
+    def _make_old_ended(self, db, sid: str, days_old: int = 100):
+        """Create a session that is ended and was started `days_old` days ago."""
+        db.create_session(session_id=sid, source="cli")
+        db.end_session(sid, end_reason="done")
+        db._conn.execute(
+            "UPDATE sessions SET started_at = ? WHERE id = ?",
+            (time.time() - days_old * 86400, sid),
+        )
+        db._conn.commit()
+
+    def test_first_run_prunes_and_vacuums(self, db):
+        self._make_old_ended(db, "old1", days_old=100)
+        self._make_old_ended(db, "old2", days_old=100)
+        db.create_session(session_id="new", source="cli")  # active, must survive
+
+        result = db.maybe_auto_prune_and_vacuum(retention_days=90)
+        assert result["skipped"] is False
+        assert result["pruned"] == 2
+        assert result["vacuumed"] is True
+        assert result.get("error") is None
+        assert db.get_session("old1") is None
+        assert db.get_session("old2") is None
+        assert db.get_session("new") is not None
+
+    def test_second_call_within_interval_skips(self, db):
+        self._make_old_ended(db, "old", days_old=100)
+        first = db.maybe_auto_prune_and_vacuum(
+            retention_days=90, min_interval_hours=24
+        )
+        assert first["skipped"] is False
+        assert first["pruned"] == 1
+
+        # Create another prunable session; a second call within
+        # min_interval_hours should still skip without touching it.
+        self._make_old_ended(db, "old2", days_old=100)
+        second = db.maybe_auto_prune_and_vacuum(
+            retention_days=90, min_interval_hours=24
+        )
+        assert second["skipped"] is True
+        assert second["pruned"] == 0
+        assert db.get_session("old2") is not None  # untouched
+
+    def test_second_call_after_interval_runs_again(self, db):
+        self._make_old_ended(db, "old", days_old=100)
+        db.maybe_auto_prune_and_vacuum(retention_days=90, min_interval_hours=24)
+
+        # Backdate the last-run marker to force another run.
+        db.set_meta("last_auto_prune", str(time.time() - 48 * 3600))
+
+        self._make_old_ended(db, "old2", days_old=100)
+        result = db.maybe_auto_prune_and_vacuum(
+            retention_days=90, min_interval_hours=24
+        )
+        assert result["skipped"] is False
+        assert result["pruned"] == 1
+        assert db.get_session("old2") is None
+
+    def test_no_prunable_sessions_no_vacuum(self, db):
+        """When prune deletes 0 rows, VACUUM is skipped (wasted I/O)."""
+        db.create_session(session_id="fresh", source="cli")  # too recent
+        result = db.maybe_auto_prune_and_vacuum(retention_days=90)
+        assert result["skipped"] is False
+        assert result["pruned"] == 0
+        assert result["vacuumed"] is False
+        # But last-run is still recorded so we don't retry immediately.
+        assert db.get_meta("last_auto_prune") is not None
+
+    def test_vacuum_disabled_via_flag(self, db):
+        self._make_old_ended(db, "old", days_old=100)
+        result = db.maybe_auto_prune_and_vacuum(retention_days=90, vacuum=False)
+        assert result["pruned"] == 1
+        assert result["vacuumed"] is False
+
+    def test_corrupt_last_run_marker_treated_as_no_prior_run(self, db):
+        """A non-numeric marker must not break maintenance."""
+        db.set_meta("last_auto_prune", "not-a-timestamp")
+        self._make_old_ended(db, "old", days_old=100)
+        result = db.maybe_auto_prune_and_vacuum(retention_days=90)
+        assert result["skipped"] is False
+        assert result["pruned"] == 1
+
+    def test_state_meta_survives_vacuum(self, db):
+        """Marker written just before VACUUM must still be readable after."""
+        self._make_old_ended(db, "old", days_old=100)
+        db.maybe_auto_prune_and_vacuum(retention_days=90)
+        marker = db.get_meta("last_auto_prune")
+        assert marker is not None
+        # Should parse as a float timestamp close to now.
+        assert abs(float(marker) - time.time()) < 60
+
@@ -19,6 +19,8 @@ from tools.file_operations import (
    BINARY_EXTENSIONS,
    IMAGE_EXTENSIONS,
    MAX_LINE_LENGTH,
+    normalize_read_pagination,
+    normalize_search_pagination,
 )


@@ -192,6 +194,17 @@ def file_ops(mock_env):


 class TestShellFileOpsHelpers:
+    def test_normalize_read_pagination_clamps_invalid_values(self):
+        assert normalize_read_pagination(offset=0, limit=0) == (1, 1)
+        assert normalize_read_pagination(offset=-10, limit=-5) == (1, 1)
+        assert normalize_read_pagination(offset="bad", limit="bad") == (1, 500)
+        assert normalize_read_pagination(offset=2, limit=999999) == (2, 2000)
+
+    def test_normalize_search_pagination_clamps_invalid_values(self):
+        assert normalize_search_pagination(offset=-10, limit=-5) == (0, 1)
+        assert normalize_search_pagination(offset="bad", limit="bad") == (0, 50)
+        assert normalize_search_pagination(offset=3, limit=0) == (3, 1)
+
    def test_escape_shell_arg_simple(self, file_ops):
        assert file_ops._escape_shell_arg("hello") == "'hello'"

@@ -146,3 +146,61 @@ class TestCheckLintBracePaths:

        assert result.success is False
        assert "SyntaxError" in result.output
+
+
+# =========================================================================
+# Pagination bounds
+# =========================================================================
+
+
+class TestPaginationBounds:
+    """Invalid pagination inputs should not leak into shell commands."""
+
+    def test_read_file_clamps_offset_and_limit_before_building_sed_range(self):
+        env = MagicMock()
+        env.cwd = "/tmp"
+        ops = ShellFileOperations(env)
+        commands = []
+
+        def fake_exec(command, *args, **kwargs):
+            commands.append(command)
+            if command.startswith("wc -c"):
+                return MagicMock(exit_code=0, stdout="12")
+            if command.startswith("head -c"):
+                return MagicMock(exit_code=0, stdout="line1\nline2\n")
+            if command.startswith("sed -n"):
+                return MagicMock(exit_code=0, stdout="line1\n")
+            if command.startswith("wc -l"):
+                return MagicMock(exit_code=0, stdout="2")
+            return MagicMock(exit_code=0, stdout="")
+
+        with patch.object(ops, "_exec", side_effect=fake_exec):
+            result = ops.read_file("notes.txt", offset=0, limit=0)
+
+        assert result.error is None
+        assert "     1|line1" in result.content
+        sed_commands = [cmd for cmd in commands if cmd.startswith("sed -n")]
+        assert sed_commands == ["sed -n '1,1p' 'notes.txt'"]
+
+    def test_search_clamps_offset_and_limit_before_building_head_pipeline(self):
+        env = MagicMock()
+        env.cwd = "/tmp"
+        ops = ShellFileOperations(env)
+        commands = []
+
+        def fake_exec(command, *args, **kwargs):
+            commands.append(command)
+            if command.startswith("test -e"):
+                return MagicMock(exit_code=0, stdout="exists")
+            if command.startswith("rg --files"):
+                return MagicMock(exit_code=0, stdout="a.py\n")
+            return MagicMock(exit_code=0, stdout="")
+
+        with patch.object(ops, "_has_command", side_effect=lambda cmd: cmd == "rg"), \
+             patch.object(ops, "_exec", side_effect=fake_exec):
+            result = ops.search("*.py", target="files", path=".", offset=-4, limit=-2)
+
+        assert result.files == ["a.py"]
+        rg_commands = [cmd for cmd in commands if cmd.startswith("rg --files")]
+        assert rg_commands
+        assert "| head -n 1" in rg_commands[0]
@@ -45,6 +45,19 @@ class TestReadFileHandler:
        read_file_tool("/tmp/big.txt", offset=10, limit=20)
        mock_ops.read_file.assert_called_once_with("/tmp/big.txt", 10, 20)

+    @patch("tools.file_tools._get_file_ops")
+    def test_invalid_offset_and_limit_are_normalized_before_dispatch(self, mock_get):
+        mock_ops = MagicMock()
+        result_obj = MagicMock()
+        result_obj.content = "line1"
+        result_obj.to_dict.return_value = {"content": "line1", "total_lines": 1}
+        mock_ops.read_file.return_value = result_obj
+        mock_get.return_value = mock_ops
+
+        from tools.file_tools import read_file_tool
+        read_file_tool("/tmp/big.txt", offset=0, limit=0)
+        mock_ops.read_file.assert_called_once_with("/tmp/big.txt", 1, 1)
+
    @patch("tools.file_tools._get_file_ops")
    def test_exception_returns_error_json(self, mock_get):
        mock_get.side_effect = RuntimeError("terminal not available")
@@ -191,6 +204,21 @@ class TestSearchHandler:
            limit=10, offset=5, output_mode="count", context=2,
        )

+    @patch("tools.file_tools._get_file_ops")
+    def test_search_normalizes_invalid_pagination_before_dispatch(self, mock_get):
+        mock_ops = MagicMock()
+        result_obj = MagicMock()
+        result_obj.to_dict.return_value = {"files": []}
+        mock_ops.search.return_value = result_obj
+        mock_get.return_value = mock_ops
+
+        from tools.file_tools import search_tool
+        search_tool(pattern="class", target="files", path="/src", limit=-5, offset=-2)
+        mock_ops.search.assert_called_once_with(
+            pattern="class", path="/src", target="files", file_glob=None,
+            limit=1, offset=0, output_mode="content", context=0,
+        )
+
    @patch("tools.file_tools._get_file_ops")
    def test_search_exception_returns_error(self, mock_get):
        mock_get.side_effect = RuntimeError("no terminal")
@@ -271,6 +271,40 @@ LINTERS = {
 MAX_LINES = 2000
 MAX_LINE_LENGTH = 2000
 MAX_FILE_SIZE = 50 * 1024  # 50KB
+DEFAULT_READ_OFFSET = 1
+DEFAULT_READ_LIMIT = 500
+DEFAULT_SEARCH_OFFSET = 0
+DEFAULT_SEARCH_LIMIT = 50
+
+
+def _coerce_int(value: Any, default: int) -> int:
+    """Best-effort integer coercion for tool pagination inputs."""
+    try:
+        return int(value)
+    except (TypeError, ValueError):
+        return default
+
+
+def normalize_read_pagination(offset: Any = DEFAULT_READ_OFFSET,
+                              limit: Any = DEFAULT_READ_LIMIT) -> tuple[int, int]:
+    """Return safe read_file pagination bounds.
+
+    Tool schemas declare minimum/maximum values, but not every caller or
+    provider enforces schemas before dispatch. Clamp here so invalid values
+    cannot leak into sed ranges like ``0,-1p``.
+    """
+    normalized_offset = max(1, _coerce_int(offset, DEFAULT_READ_OFFSET))
+    normalized_limit = _coerce_int(limit, DEFAULT_READ_LIMIT)
+    normalized_limit = max(1, min(normalized_limit, MAX_LINES))
+    return normalized_offset, normalized_limit
+
+
+def normalize_search_pagination(offset: Any = DEFAULT_SEARCH_OFFSET,
+                                limit: Any = DEFAULT_SEARCH_LIMIT) -> tuple[int, int]:
+    """Return safe search pagination bounds for shell head/tail pipelines."""
+    normalized_offset = max(0, _coerce_int(offset, DEFAULT_SEARCH_OFFSET))
+    normalized_limit = max(1, _coerce_int(limit, DEFAULT_SEARCH_LIMIT))
+    return normalized_offset, normalized_limit


 class ShellFileOperations(FileOperations):
@@ -461,8 +495,7 @@ class ShellFileOperations(FileOperations):
        # Expand ~ and other shell paths
        path = self._expand_path(path)
        
-        # Clamp limit
-        limit = min(limit, MAX_LINES)
+        offset, limit = normalize_read_pagination(offset, limit)
        
        # Check if file exists and get size (wc -c is POSIX, works on Linux + macOS)
        stat_cmd = f"wc -c < {self._escape_shell_arg(path)} 2>/dev/null"
@@ -866,6 +899,8 @@ class ShellFileOperations(FileOperations):
        Returns:
            SearchResult with matches or file list
        """
+        offset, limit = normalize_search_pagination(offset, limit)
+
        # Expand ~ and other shell paths
        path = self._expand_path(path)
        
@@ -11,7 +11,11 @@ from typing import Optional

 from agent.file_safety import get_read_block_error
 from tools.binary_extensions import has_binary_extension
-from tools.file_operations import ShellFileOperations
+from tools.file_operations import (
+    ShellFileOperations,
+    normalize_read_pagination,
+    normalize_search_pagination,
+)
 from tools import file_state
 from agent.redact import redact_sensitive_text

@@ -351,6 +355,8 @@ def clear_file_ops_cache(task_id: str = None):
 def read_file_tool(path: str, offset: int = 1, limit: int = 500, task_id: str = "default") -> str:
    """Read a file with pagination and line numbers."""
    try:
+        offset, limit = normalize_read_pagination(offset, limit)
+
        # ── Device path guard ─────────────────────────────────────────
        # Block paths that would hang the process (infinite output,
        # blocking on input).  Pure path check — no I/O.
@@ -762,6 +768,8 @@ def search_tool(pattern: str, target: str = "content", path: str = ".",
                task_id: str = "default") -> str:
    """Search for content or files."""
    try:
+        offset, limit = normalize_search_pagination(offset, limit)
+
        # Track searches to detect *consecutive* repeated search loops.
        # Include pagination args so users can page through truncated
        # results without tripping the repeated-search guard.
@@ -0,0 +1,410 @@
+import { describe, expect, it } from 'vitest'
+
+import {
+  buildSubagentTree,
+  descendantIds,
+  flattenTree,
+  fmtCost,
+  fmtDuration,
+  fmtTokens,
+  formatSummary,
+  hotnessBucket,
+  peakHotness,
+  sparkline,
+  topLevelSubagents,
+  treeTotals,
+  widthByDepth
+} from '../lib/subagentTree.js'
+import type { SubagentProgress } from '../types.js'
+
+const makeItem = (overrides: Partial<SubagentProgress> & Pick<SubagentProgress, 'id' | 'index'>): SubagentProgress => ({
+  depth: 0,
+  goal: overrides.id,
+  notes: [],
+  parentId: null,
+  status: 'running',
+  taskCount: 1,
+  thinking: [],
+  toolCount: 0,
+  tools: [],
+  ...overrides
+})
+
+describe('aggregate: tokens, cost, files, hotness', () => {
+  it('sums tokens and cost across subtree', () => {
+    const items = [
+      makeItem({ costUsd: 0.01, id: 'p', index: 0, inputTokens: 1000, outputTokens: 500 }),
+      makeItem({
+        costUsd: 0.005,
+        depth: 1,
+        id: 'c1',
+        index: 0,
+        inputTokens: 500,
+        outputTokens: 100,
+        parentId: 'p'
+      }),
+      makeItem({
+        costUsd: 0.008,
+        depth: 1,
+        id: 'c2',
+        index: 1,
+        inputTokens: 300,
+        outputTokens: 200,
+        parentId: 'p'
+      })
+    ]
+
+    const tree = buildSubagentTree(items)
+    expect(tree[0]!.aggregate).toMatchObject({
+      costUsd: 0.023,
+      inputTokens: 1800,
+      outputTokens: 800
+    })
+  })
+
+  it('counts files read + written across subtree', () => {
+    const items = [
+      makeItem({ filesRead: ['a.ts', 'b.ts'], id: 'p', index: 0 }),
+      makeItem({ depth: 1, filesWritten: ['c.ts'], id: 'c', index: 0, parentId: 'p' })
+    ]
+
+    const tree = buildSubagentTree(items)
+    expect(tree[0]!.aggregate.filesTouched).toBe(3)
+  })
+
+  it('hotness = totalTools / totalDuration', () => {
+    const items = [
+      makeItem({
+        durationSeconds: 10,
+        id: 'p',
+        index: 0,
+        status: 'completed',
+        toolCount: 20
+      })
+    ]
+
+    const tree = buildSubagentTree(items)
+    expect(tree[0]!.aggregate.hotness).toBeCloseTo(2)
+  })
+
+  it('hotness is zero when duration is zero', () => {
+    const items = [makeItem({ id: 'p', index: 0, toolCount: 10 })]
+    const tree = buildSubagentTree(items)
+    expect(tree[0]!.aggregate.hotness).toBe(0)
+  })
+})
+
+describe('hotnessBucket + peakHotness', () => {
+  it('peakHotness walks subtree', () => {
+    const items = [
+      makeItem({ durationSeconds: 100, id: 'p', index: 0, status: 'completed', toolCount: 1 }),
+      makeItem({
+        depth: 1,
+        durationSeconds: 1,
+        id: 'c',
+        index: 0,
+        parentId: 'p',
+        status: 'completed',
+        toolCount: 5
+      })
+    ]
+
+    const tree = buildSubagentTree(items)
+    expect(peakHotness(tree)).toBeGreaterThan(2)
+  })
+
+  it('hotnessBucket clamps and normalizes', () => {
+    expect(hotnessBucket(0, 10, 4)).toBe(0)
+    expect(hotnessBucket(10, 10, 4)).toBe(3)
+    expect(hotnessBucket(5, 10, 4)).toBe(2)
+    expect(hotnessBucket(100, 10, 4)).toBe(3) // clamped
+    expect(hotnessBucket(5, 0, 4)).toBe(0) // guard against divide-by-zero
+  })
+})
+
+describe('fmtCost + fmtTokens', () => {
+  it('fmtCost handles ranges', () => {
+    expect(fmtCost(0)).toBe('')
+    expect(fmtCost(0.001)).toBe('<$0.01')
+    expect(fmtCost(0.42)).toBe('$0.42')
+    expect(fmtCost(1.23)).toBe('$1.23')
+    expect(fmtCost(12.5)).toBe('$12.5')
+  })
+
+  it('fmtTokens handles ranges', () => {
+    expect(fmtTokens(0)).toBe('0')
+    expect(fmtTokens(542)).toBe('542')
+    expect(fmtTokens(1234)).toBe('1.2k')
+    expect(fmtTokens(45678)).toBe('46k')
+  })
+})
+
+describe('formatSummary with tokens + cost', () => {
+  it('includes token + cost when present', () => {
+    expect(
+      formatSummary({
+        activeCount: 0,
+        costUsd: 0.42,
+        descendantCount: 3,
+        filesTouched: 0,
+        hotness: 0,
+        inputTokens: 8000,
+        maxDepthFromHere: 2,
+        outputTokens: 2000,
+        totalDuration: 30,
+        totalTools: 14
+      })
+    ).toBe('d2 · 3 agents · 14 tools · 30s · 10k tok · $0.42')
+  })
+})
+
+describe('buildSubagentTree', () => {
+  it('returns empty list for empty input', () => {
+    expect(buildSubagentTree([])).toEqual([])
+  })
+
+  it('treats flat list as top-level when no parentId is given', () => {
+    const items = [makeItem({ id: 'a', index: 0 }), makeItem({ id: 'b', index: 1 }), makeItem({ id: 'c', index: 2 })]
+
+    const tree = buildSubagentTree(items)
+    expect(tree).toHaveLength(3)
+    expect(tree.map(n => n.item.id)).toEqual(['a', 'b', 'c'])
+    expect(tree.every(n => n.children.length === 0)).toBe(true)
+  })
+
+  it('nests children under their parent by subagent_id', () => {
+    const items = [
+      makeItem({ id: 'parent', index: 0 }),
+      makeItem({ depth: 1, id: 'child-1', index: 0, parentId: 'parent' }),
+      makeItem({ depth: 1, id: 'child-2', index: 1, parentId: 'parent' })
+    ]
+
+    const tree = buildSubagentTree(items)
+    expect(tree).toHaveLength(1)
+    expect(tree[0]!.children).toHaveLength(2)
+    expect(tree[0]!.children.map(n => n.item.id)).toEqual(['child-1', 'child-2'])
+  })
+
+  it('builds multi-level nesting', () => {
+    const items = [
+      makeItem({ id: 'p', index: 0 }),
+      makeItem({ depth: 1, id: 'c', index: 0, parentId: 'p' }),
+      makeItem({ depth: 2, id: 'gc', index: 0, parentId: 'c' })
+    ]
+
+    const tree = buildSubagentTree(items)
+    expect(tree[0]!.children[0]!.children[0]!.item.id).toBe('gc')
+    expect(tree[0]!.aggregate.maxDepthFromHere).toBe(2)
+    expect(tree[0]!.aggregate.descendantCount).toBe(2)
+  })
+
+  it('promotes orphaned children (missing parent) to top level', () => {
+    const items = [makeItem({ id: 'a', index: 0 }), makeItem({ depth: 1, id: 'orphan', index: 1, parentId: 'ghost' })]
+
+    const tree = buildSubagentTree(items)
+    expect(tree).toHaveLength(2)
+    expect(tree.map(n => n.item.id)).toEqual(['a', 'orphan'])
+  })
+
+  it('stable sort: children ordered by (depth, index) not insert order', () => {
+    const items = [
+      makeItem({ id: 'p', index: 0 }),
+      makeItem({ depth: 1, id: 'c3', index: 2, parentId: 'p' }),
+      makeItem({ depth: 1, id: 'c1', index: 0, parentId: 'p' }),
+      makeItem({ depth: 1, id: 'c2', index: 1, parentId: 'p' })
+    ]
+
+    const tree = buildSubagentTree(items)
+    expect(tree[0]!.children.map(n => n.item.id)).toEqual(['c1', 'c2', 'c3'])
+  })
+})
+
+describe('aggregate', () => {
+  it('sums tool counts and durations across subtree', () => {
+    const items = [
+      makeItem({ durationSeconds: 10, id: 'p', index: 0, status: 'completed', toolCount: 5 }),
+      makeItem({ depth: 1, durationSeconds: 4, id: 'c1', index: 0, parentId: 'p', status: 'completed', toolCount: 3 }),
+      makeItem({ depth: 1, durationSeconds: 2, id: 'c2', index: 1, parentId: 'p', status: 'completed', toolCount: 1 })
+    ]
+
+    const tree = buildSubagentTree(items)
+    expect(tree[0]!.aggregate).toMatchObject({
+      activeCount: 0,
+      descendantCount: 2,
+      totalDuration: 16,
+      totalTools: 9
+    })
+  })
+
+  it('counts queued + running as active', () => {
+    const items = [
+      makeItem({ id: 'p', index: 0, status: 'running' }),
+      makeItem({ depth: 1, id: 'c1', index: 0, parentId: 'p', status: 'queued' }),
+      makeItem({ depth: 1, id: 'c2', index: 1, parentId: 'p', status: 'completed' })
+    ]
+
+    const tree = buildSubagentTree(items)
+    expect(tree[0]!.aggregate.activeCount).toBe(2)
+  })
+})
+
+describe('widthByDepth', () => {
+  it('returns empty array for empty tree', () => {
+    expect(widthByDepth([])).toEqual([])
+  })
+
+  it('tallies nodes at each depth', () => {
+    const items = [
+      makeItem({ id: 'p1', index: 0 }),
+      makeItem({ id: 'p2', index: 1 }),
+      makeItem({ depth: 1, id: 'c1', index: 0, parentId: 'p1' }),
+      makeItem({ depth: 1, id: 'c2', index: 1, parentId: 'p1' }),
+      makeItem({ depth: 1, id: 'c3', index: 0, parentId: 'p2' }),
+      makeItem({ depth: 2, id: 'gc1', index: 0, parentId: 'c1' })
+    ]
+
+    expect(widthByDepth(buildSubagentTree(items))).toEqual([2, 3, 1])
+  })
+})
+
+describe('treeTotals', () => {
+  it('folds a full tree into a single rollup', () => {
+    const items = [
+      makeItem({ id: 'p1', index: 0, toolCount: 5 }),
+      makeItem({ id: 'p2', index: 1, toolCount: 2 }),
+      makeItem({ depth: 1, id: 'c', index: 0, parentId: 'p1', toolCount: 3 })
+    ]
+
+    const totals = treeTotals(buildSubagentTree(items))
+    expect(totals.descendantCount).toBe(3)
+    expect(totals.totalTools).toBe(10)
+    expect(totals.maxDepthFromHere).toBe(2)
+  })
+
+  it('returns zeros for empty tree', () => {
+    expect(treeTotals([])).toEqual({
+      activeCount: 0,
+      costUsd: 0,
+      descendantCount: 0,
+      filesTouched: 0,
+      hotness: 0,
+      inputTokens: 0,
+      maxDepthFromHere: 0,
+      outputTokens: 0,
+      totalDuration: 0,
+      totalTools: 0
+    })
+  })
+})
+
+describe('flattenTree + descendantIds', () => {
+  const items = [
+    makeItem({ id: 'p', index: 0 }),
+    makeItem({ depth: 1, id: 'c1', index: 0, parentId: 'p' }),
+    makeItem({ depth: 2, id: 'gc', index: 0, parentId: 'c1' }),
+    makeItem({ depth: 1, id: 'c2', index: 1, parentId: 'p' })
+  ]
+
+  it('flattens in visit order (depth-first, pre-order)', () => {
+    const tree = buildSubagentTree(items)
+    expect(flattenTree(tree).map(n => n.item.id)).toEqual(['p', 'c1', 'gc', 'c2'])
+  })
+
+  it('collects descendant ids excluding the node itself', () => {
+    const tree = buildSubagentTree(items)
+    expect(descendantIds(tree[0]!)).toEqual(['c1', 'gc', 'c2'])
+  })
+})
+
+describe('sparkline', () => {
+  it('returns empty string for empty input', () => {
+    expect(sparkline([])).toBe('')
+  })
+
+  it('renders zeroes as spaces (not bottom glyph)', () => {
+    expect(sparkline([0, 0])).toBe('  ')
+  })
+
+  it('scales to the max value', () => {
+    const out = sparkline([1, 8])
+    expect(out).toHaveLength(2)
+    expect(out[1]).toBe('█')
+  })
+
+  it('sparse widths render as expected', () => {
+    const out = sparkline([2, 3, 7, 4])
+    expect(out).toHaveLength(4)
+    expect([...out].every(ch => /[\s▁-█]/.test(ch))).toBe(true)
+  })
+})
+
+describe('formatSummary', () => {
+  const emptyTotals = {
+    activeCount: 0,
+    costUsd: 0,
+    descendantCount: 0,
+    filesTouched: 0,
+    hotness: 0,
+    inputTokens: 0,
+    maxDepthFromHere: 0,
+    outputTokens: 0,
+    totalDuration: 0,
+    totalTools: 0
+  }
+
+  it('collapses zero-valued components', () => {
+    expect(formatSummary({ ...emptyTotals, descendantCount: 1 })).toBe('d0 · 1 agent')
+  })
+
+  it('emits rich summary with all pieces', () => {
+    expect(
+      formatSummary({
+        ...emptyTotals,
+        activeCount: 2,
+        descendantCount: 7,
+        maxDepthFromHere: 3,
+        totalDuration: 134,
+        totalTools: 124
+      })
+    ).toBe('d3 · 7 agents · 124 tools · 2m 14s · ⚡2')
+  })
+})
+
+describe('fmtDuration', () => {
+  it('formats under a minute as plain seconds', () => {
+    expect(fmtDuration(0)).toBe('0s')
+    expect(fmtDuration(42)).toBe('42s')
+    expect(fmtDuration(59.4)).toBe('59s')
+  })
+
+  it('formats whole minutes without trailing seconds', () => {
+    expect(fmtDuration(60)).toBe('1m')
+    expect(fmtDuration(180)).toBe('3m')
+  })
+
+  it('mixes minutes and seconds', () => {
+    expect(fmtDuration(134)).toBe('2m 14s')
+    expect(fmtDuration(605)).toBe('10m 5s')
+  })
+})
+
+describe('topLevelSubagents', () => {
+  it('returns items with no parent', () => {
+    const items = [makeItem({ id: 'a', index: 0 }), makeItem({ id: 'b', index: 1 })]
+    expect(topLevelSubagents(items).map(s => s.id)).toEqual(['a', 'b'])
+  })
+
+  it('excludes children whose parent is present', () => {
+    const items = [
+      makeItem({ id: 'p', index: 0 }),
+      makeItem({ depth: 1, id: 'c', index: 0, parentId: 'p' })
+    ]
+
+    expect(topLevelSubagents(items).map(s => s.id)).toEqual(['p'])
+  })
+
+  it('promotes orphans whose parent is missing', () => {
+    const items = [makeItem({ id: 'a', index: 0 }), makeItem({ depth: 1, id: 'orphan', index: 1, parentId: 'ghost' })]
+    expect(topLevelSubagents(items).map(s => s.id)).toEqual(['a', 'orphan'])
+  })
+})
@@ -1,11 +1,13 @@
 import { STREAM_BATCH_MS } from '../config/timing.js'
 import { buildSetupRequiredSections, SETUP_REQUIRED_TITLE } from '../content/setup.js'
-import type { CommandsCatalogResponse, GatewayEvent, GatewaySkin } from '../gatewayTypes.js'
+import type { CommandsCatalogResponse, DelegationStatusResponse, GatewayEvent, GatewaySkin } from '../gatewayTypes.js'
 import { rpcErrorMessage } from '../lib/rpc.js'
+import { topLevelSubagents } from '../lib/subagentTree.js'
 import { formatToolCall, stripAnsi } from '../lib/text.js'
 import { fromSkin } from '../theme.js'
 import type { Msg, SubagentProgress } from '../types.js'

+import { applyDelegationStatus, getDelegationState } from './delegationStore.js'
 import type { GatewayEventHandlerContext } from './interfaces.js'
 import { patchOverlayState } from './overlayStore.js'
 import { turnController } from './turnController.js'
@@ -53,6 +55,55 @@ export function createGatewayEventHandler(ctx: GatewayEventHandlerContext): (ev:
  let pendingThinkingStatus = ''
  let thinkingStatusTimer: null | ReturnType<typeof setTimeout> = null

+  // Inject the disk-save callback into turnController so recordMessageComplete
+  // can fire-and-forget a persist without having to plumb a gateway ref around.
+  turnController.persistSpawnTree = async (subagents, sessionId) => {
+    try {
+      const startedAt = subagents.reduce<number>((min, s) => {
+        if (!s.startedAt) {
+          return min
+        }
+
+        return min === 0 ? s.startedAt : Math.min(min, s.startedAt)
+      }, 0)
+
+      const top = topLevelSubagents(subagents)
+        .map(s => s.goal)
+        .filter(Boolean)
+        .slice(0, 2)
+
+      const label = top.length ? top.join(' · ') : `${subagents.length} subagents`
+
+      await rpc('spawn_tree.save', {
+        finished_at: Date.now() / 1000,
+        label: label.slice(0, 120),
+        session_id: sessionId ?? 'default',
+        started_at: startedAt ? startedAt / 1000 : null,
+        subagents
+      })
+    } catch {
+      // Persistence is best-effort; in-memory history is the authoritative
+      // same-session source.  A write failure doesn't block the turn.
+    }
+  }
+
+  // Refresh delegation caps at most every 5s so the status bar HUD can
+  // render a /warning close to the configured cap without spamming the RPC.
+  let lastDelegationFetchAt = 0
+
+  const refreshDelegationStatus = (force = false) => {
+    const now = Date.now()
+
+    if (!force && now - lastDelegationFetchAt < 5000) {
+      return
+    }
+
+    lastDelegationFetchAt = now
+    rpc<DelegationStatusResponse>('delegation.status', {})
+      .then(r => applyDelegationStatus(r))
+      .catch(() => {})
+  }
+
  const setStatus = (status: string) => {
    pendingThinkingStatus = ''

@@ -85,7 +136,12 @@ export function createGatewayEventHandler(ctx: GatewayEventHandlerContext): (ev:
    }, ms)
  }

-  const keepCompletedElseRunning = (s: SubagentProgress['status']) => (s === 'completed' ? s : 'running')
+  // Terminal statuses are never overwritten by late-arriving live events —
+  // otherwise a stale `subagent.start` / `spawn_requested` can clobber a
+  // `failed` or `interrupted` terminal state (Copilot review #14045).
+  const isTerminalStatus = (s: SubagentProgress['status']) => s === 'completed' || s === 'failed' || s === 'interrupted'
+
+  const keepTerminalElseRunning = (s: SubagentProgress['status']) => (isTerminalStatus(s) ? s : 'running')

  const handleReady = (skin?: GatewaySkin) => {
    if (skin) {
@@ -260,32 +316,28 @@ export function createGatewayEventHandler(ctx: GatewayEventHandlerContext): (ev:
        turnController.recordToolStart(ev.payload.tool_id, ev.payload.name ?? 'tool', ev.payload.context ?? '')

        return
+      case 'tool.complete': {
+        const inlineDiffText =
+          ev.payload.inline_diff && getUiState().inlineDiffs ? stripAnsi(String(ev.payload.inline_diff)).trim() : ''

-      case 'tool.complete':
-        {
-          const inlineDiffText =
-            ev.payload.inline_diff && getUiState().inlineDiffs ? stripAnsi(String(ev.payload.inline_diff)).trim() : ''
-
-          turnController.recordToolComplete(
-            ev.payload.tool_id,
-            ev.payload.name,
-            ev.payload.error,
-            inlineDiffText ? '' : ev.payload.summary
-          )
-
-          if (!inlineDiffText) {
-            return
-          }
-
-          // Keep inline diffs attached to the assistant completion body so
-          // they render in the same message flow, not as a standalone system
-          // artifact that can look out-of-place around tool rows.
-          turnController.queueInlineDiff(inlineDiffText)
+        turnController.recordToolComplete(
+          ev.payload.tool_id,
+          ev.payload.name,
+          ev.payload.error,
+          inlineDiffText ? '' : ev.payload.summary
+        )

+        if (!inlineDiffText) {
          return
        }

+        // Keep inline diffs attached to the assistant completion body so
+        // they render in the same message flow, not as a standalone system
+        // artifact that can look out-of-place around tool rows.
+        turnController.queueInlineDiff(inlineDiffText)
+
        return
+      }

      case 'clarify.request':
        patchOverlayState({
@@ -329,8 +381,23 @@ export function createGatewayEventHandler(ctx: GatewayEventHandlerContext): (ev:

        return

+      case 'subagent.spawn_requested':
+        // Child built but not yet running (waiting on ThreadPoolExecutor slot).
+        // Preserve completed state if a later event races in before this one.
+        turnController.upsertSubagent(ev.payload, c => (isTerminalStatus(c.status) ? {} : { status: 'queued' }))
+
+        // Prime the status-bar HUD: fetch caps (once every 5s) so we can
+        // warn as depth/concurrency approaches the configured ceiling.
+        if (getDelegationState().maxSpawnDepth === null) {
+          refreshDelegationStatus(true)
+        } else {
+          refreshDelegationStatus()
+        }
+
+        return
+
      case 'subagent.start':
-        turnController.upsertSubagent(ev.payload, () => ({ status: 'running' }))
+        turnController.upsertSubagent(ev.payload, c => (isTerminalStatus(c.status) ? {} : { status: 'running' }))

        return
      case 'subagent.thinking': {
@@ -340,10 +407,16 @@ export function createGatewayEventHandler(ctx: GatewayEventHandlerContext): (ev:
          return
        }

-        turnController.upsertSubagent(ev.payload, c => ({
-          status: keepCompletedElseRunning(c.status),
-          thinking: pushThinking(c.thinking, text)
-        }))
+        // Update-only: never resurrect subagents whose spawn_requested/start
+        // we missed or that already flushed via message.complete.
+        turnController.upsertSubagent(
+          ev.payload,
+          c => ({
+            status: keepTerminalElseRunning(c.status),
+            thinking: pushThinking(c.thinking, text)
+          }),
+          { createIfMissing: false }
+        )

        return
      }
@@ -354,10 +427,14 @@ export function createGatewayEventHandler(ctx: GatewayEventHandlerContext): (ev:
          ev.payload.tool_preview ?? ev.payload.text ?? ''
        )

-        turnController.upsertSubagent(ev.payload, c => ({
-          status: keepCompletedElseRunning(c.status),
-          tools: pushTool(c.tools, line)
-        }))
+        turnController.upsertSubagent(
+          ev.payload,
+          c => ({
+            status: keepTerminalElseRunning(c.status),
+            tools: pushTool(c.tools, line)
+          }),
+          { createIfMissing: false }
+        )

        return
      }
@@ -369,20 +446,28 @@ export function createGatewayEventHandler(ctx: GatewayEventHandlerContext): (ev:
          return
        }

-        turnController.upsertSubagent(ev.payload, c => ({
-          notes: pushNote(c.notes, text),
-          status: keepCompletedElseRunning(c.status)
-        }))
+        turnController.upsertSubagent(
+          ev.payload,
+          c => ({
+            notes: pushNote(c.notes, text),
+            status: keepTerminalElseRunning(c.status)
+          }),
+          { createIfMissing: false }
+        )

        return
      }

      case 'subagent.complete':
-        turnController.upsertSubagent(ev.payload, c => ({
-          durationSeconds: ev.payload.duration_seconds ?? c.durationSeconds,
-          status: ev.payload.status ?? 'completed',
-          summary: ev.payload.summary || ev.payload.text || c.summary
-        }))
+        turnController.upsertSubagent(
+          ev.payload,
+          c => ({
+            durationSeconds: ev.payload.duration_seconds ?? c.durationSeconds,
+            status: ev.payload.status ?? 'completed',
+            summary: ev.payload.summary || ev.payload.text || c.summary
+          }),
+          { createIfMissing: false }
+        )

        return

@@ -0,0 +1,77 @@
+import { atom } from 'nanostores'
+
+import type { DelegationStatusResponse } from '../gatewayTypes.js'
+
+export interface DelegationState {
+  // Last known caps from `delegation.status` RPC.  null until fetched.
+  maxConcurrentChildren: null | number
+  maxSpawnDepth: null | number
+  // True when spawning is globally paused (see tools/delegate_tool.py).
+  paused: boolean
+  // Monotonic clock of the last successful status fetch.
+  updatedAt: null | number
+}
+
+const buildState = (): DelegationState => ({
+  maxConcurrentChildren: null,
+  maxSpawnDepth: null,
+  paused: false,
+  updatedAt: null
+})
+
+export const $delegationState = atom<DelegationState>(buildState())
+
+export const getDelegationState = () => $delegationState.get()
+
+export const patchDelegationState = (next: Partial<DelegationState>) =>
+  $delegationState.set({ ...$delegationState.get(), ...next })
+
+export const resetDelegationState = () => $delegationState.set(buildState())
+
+// ── Overlay accordion open-state ──────────────────────────────────────
+//
+// Lifted out of OverlaySection's local useState so collapse choices
+// survive:
+//   - navigating to a different subagent (Detail remounts)
+//   - switching list ↔ detail mode (Detail unmounts in list mode)
+//   - walking history (←/→)
+// Keyed by section title; missing entries fall back to the section's
+// `defaultOpen` prop.
+
+export const $overlaySectionsOpen = atom<Record<string, boolean>>({})
+
+export const toggleOverlaySection = (title: string, defaultOpen: boolean) => {
+  const state = $overlaySectionsOpen.get()
+  const current = title in state ? state[title]! : defaultOpen
+
+  $overlaySectionsOpen.set({ ...state, [title]: !current })
+}
+
+export const getOverlaySectionOpen = (title: string, defaultOpen: boolean): boolean => {
+  const state = $overlaySectionsOpen.get()
+
+  return title in state ? state[title]! : defaultOpen
+}
+
+/** Merge a raw RPC response into the store.  Tolerant of partial/omitted fields. */
+export const applyDelegationStatus = (r: DelegationStatusResponse | null | undefined) => {
+  if (!r) {
+    return
+  }
+
+  const patch: Partial<DelegationState> = { updatedAt: Date.now() }
+
+  if (typeof r.max_spawn_depth === 'number') {
+    patch.maxSpawnDepth = r.max_spawn_depth
+  }
+
+  if (typeof r.max_concurrent_children === 'number') {
+    patch.maxConcurrentChildren = r.max_concurrent_children
+  }
+
+  if (typeof r.paused === 'boolean') {
+    patch.paused = r.paused
+  }
+
+  patchDelegationState(patch)
+}
@@ -53,6 +53,8 @@ export interface GatewayProviderProps {
 }

 export interface OverlayState {
+  agents: boolean
+  agentsInitialHistoryIndex: number
  approval: ApprovalReq | null
  clarify: ClarifyReq | null
  confirm: ConfirmReq | null
@@ -3,6 +3,8 @@ import { atom, computed } from 'nanostores'
 import type { OverlayState } from './interfaces.js'

 const buildOverlayState = (): OverlayState => ({
+  agents: false,
+  agentsInitialHistoryIndex: 0,
  approval: null,
  clarify: null,
  confirm: null,
@@ -18,8 +20,8 @@ export const $overlayState = atom<OverlayState>(buildOverlayState())

 export const $isBlocked = computed(
  $overlayState,
-  ({ approval, clarify, confirm, modelPicker, pager, picker, secret, skillsHub, sudo }) =>
-    Boolean(approval || clarify || confirm || modelPicker || pager || picker || secret || skillsHub || sudo)
+  ({ agents, approval, clarify, confirm, modelPicker, pager, picker, secret, skillsHub, sudo }) =>
+    Boolean(agents || approval || clarify || confirm || modelPicker || pager || picker || secret || skillsHub || sudo)
 )

 export const getOverlayState = () => $overlayState.get()
@@ -27,4 +29,23 @@ export const getOverlayState = () => $overlayState.get()
 export const patchOverlayState = (next: Partial<OverlayState> | ((state: OverlayState) => OverlayState)) =>
  $overlayState.set(typeof next === 'function' ? next($overlayState.get()) : { ...$overlayState.get(), ...next })

+/** Full reset — used by session/turn teardown and tests. */
 export const resetOverlayState = () => $overlayState.set(buildOverlayState())
+
+/**
+ * Soft reset: drop FLOW-scoped overlays (approval / clarify / confirm / sudo
+ * / secret / pager) but PRESERVE user-toggled ones — agents dashboard, model
+ * picker, skills hub, session picker.  Those are opened deliberately and
+ * shouldn't vanish when a turn ends.  Called from turnController.idle() on
+ * every turn completion / interrupt; the old "reset everything" behaviour
+ * silently closed /agents the moment delegation finished.
+ */
+export const resetFlowOverlays = () =>
+  $overlayState.set({
+    ...buildOverlayState(),
+    agents: $overlayState.get().agents,
+    agentsInitialHistoryIndex: $overlayState.get().agentsInitialHistoryIndex,
+    modelPicker: $overlayState.get().modelPicker,
+    picker: $overlayState.get().picker,
+    skillsHub: $overlayState.get().skillsHub
+  })
@@ -1,6 +1,14 @@
-import type { SlashExecResponse, ToolsConfigureResponse } from '../../../gatewayTypes.js'
+import type {
+  DelegationPauseResponse,
+  SlashExecResponse,
+  SpawnTreeListResponse,
+  SpawnTreeLoadResponse,
+  ToolsConfigureResponse
+} from '../../../gatewayTypes.js'
 import type { PanelSection } from '../../../types.js'
+import { applyDelegationStatus, getDelegationState } from '../../delegationStore.js'
 import { patchOverlayState } from '../../overlayStore.js'
+import { getSpawnHistory, pushDiskSnapshot, setDiffPair, type SpawnSnapshot } from '../../spawnHistoryStore.js'
 import type { SlashCommand } from '../types.js'

 interface SkillInfo {
@@ -42,6 +50,163 @@ interface SkillsBrowseResponse {
 }

 export const opsCommands: SlashCommand[] = [
+  {
+    aliases: ['tasks'],
+    help: 'open the spawn-tree dashboard (live audit + kill/pause controls)',
+    name: 'agents',
+    run: (arg, ctx) => {
+      const sub = arg.trim().toLowerCase()
+
+      // Stay compatible with the gateway `/agents [pause|resume|status]` CLI —
+      // explicit subcommands skip the overlay and act directly so scripts and
+      // multi-step flows can drive it without entering interactive mode.
+      if (sub === 'pause' || sub === 'resume' || sub === 'unpause') {
+        const paused = sub === 'pause'
+        ctx.gateway.gw
+          .request<DelegationPauseResponse>('delegation.pause', { paused })
+          .then(r => {
+            applyDelegationStatus({ paused: r?.paused })
+            ctx.transcript.sys(`delegation · ${r?.paused ? 'paused' : 'resumed'}`)
+          })
+          .catch(ctx.guardedErr)
+
+        return
+      }
+
+      if (sub === 'status') {
+        const d = getDelegationState()
+        ctx.transcript.sys(
+          `delegation · ${d.paused ? 'paused' : 'active'} · caps d${d.maxSpawnDepth ?? '?'}/${d.maxConcurrentChildren ?? '?'}`
+        )
+
+        return
+      }
+
+      patchOverlayState({ agents: true, agentsInitialHistoryIndex: 0 })
+    }
+  },
+
+  {
+    help: 'replay a completed spawn tree · `/replay [N|last|list|load <path>]`',
+    name: 'replay',
+    run: (arg, ctx) => {
+      const history = getSpawnHistory()
+      const raw = arg.trim()
+      const lower = raw.toLowerCase()
+
+      // ── Disk-backed listing ─────────────────────────────────────
+      if (lower === 'list' || lower === 'ls') {
+        ctx.gateway
+          .rpc<SpawnTreeListResponse>('spawn_tree.list', {
+            limit: 30,
+            session_id: ctx.sid ?? 'default'
+          })
+          .then(
+            ctx.guarded<SpawnTreeListResponse>(r => {
+              const entries = r.entries ?? []
+
+              if (!entries.length) {
+                return ctx.transcript.sys('no archived spawn trees on disk for this session')
+              }
+
+              const rows: [string, string][] = entries.map(e => {
+                const ts = e.finished_at ? new Date(e.finished_at * 1000).toLocaleString() : '?'
+                const label = e.label || `${e.count} subagents`
+
+                return [`${ts} · ${e.count}×`, `${label}\n  ${e.path}`]
+              })
+
+              ctx.transcript.panel('Archived spawn trees', [{ rows }])
+            })
+          )
+          .catch(ctx.guardedErr)
+
+        return
+      }
+
+      // ── Disk-backed load by path ─────────────────────────────────
+      if (lower.startsWith('load ')) {
+        const path = raw.slice(5).trim()
+
+        if (!path) {
+          return ctx.transcript.sys('usage: /replay load <path>')
+        }
+
+        ctx.gateway
+          .rpc<SpawnTreeLoadResponse>('spawn_tree.load', { path })
+          .then(
+            ctx.guarded<SpawnTreeLoadResponse>(r => {
+              if (!r.subagents?.length) {
+                return ctx.transcript.sys('snapshot empty or unreadable')
+              }
+
+              // Push onto the in-memory history so the overlay picks it up
+              // by index 1 just like any other snapshot.
+              pushDiskSnapshot(r, path)
+              patchOverlayState({ agents: true, agentsInitialHistoryIndex: 1 })
+            })
+          )
+          .catch(ctx.guardedErr)
+
+        return
+      }
+
+      // ── In-memory nav (same-session) ─────────────────────────────
+      if (!history.length) {
+        return ctx.transcript.sys('no completed spawn trees this session · try /replay list')
+      }
+
+      let index = 1
+
+      if (raw && lower !== 'last') {
+        const parsed = parseInt(raw, 10)
+
+        if (Number.isNaN(parsed) || parsed < 1 || parsed > history.length) {
+          return ctx.transcript.sys(`replay: index out of range 1..${history.length} · use /replay list for disk`)
+        }
+
+        index = parsed
+      }
+
+      patchOverlayState({ agents: true, agentsInitialHistoryIndex: index })
+    }
+  },
+
+  {
+    help: 'diff two completed spawn trees · `/replay-diff <baseline> <candidate>` (indexes from /replay list or history N)',
+    name: 'replay-diff',
+    run: (arg, ctx) => {
+      const parts = arg.trim().split(/\s+/).filter(Boolean)
+
+      if (parts.length !== 2) {
+        return ctx.transcript.sys('usage: /replay-diff <a> <b>  (e.g. /replay-diff 1 2 for last two)')
+      }
+
+      const [a, b] = parts
+      const history = getSpawnHistory()
+
+      const resolve = (token: string): null | SpawnSnapshot => {
+        const n = parseInt(token!, 10)
+
+        if (Number.isFinite(n) && n >= 1 && n <= history.length) {
+          return history[n - 1] ?? null
+        }
+
+        return null
+      }
+
+      const baseline = resolve(a!)
+      const candidate = resolve(b!)
+
+      if (!baseline || !candidate) {
+        return ctx.transcript.sys(`replay-diff: could not resolve indices · history has ${history.length} entries`)
+      }
+
+      setDiffPair({ baseline, candidate })
+      patchOverlayState({ agents: true, agentsInitialHistoryIndex: 0 })
+    }
+  },
+
  {
    help: 'browse, inspect, install skills',
    name: 'skills',
@@ -0,0 +1,139 @@
+import { atom } from 'nanostores'
+
+import type { SpawnTreeLoadResponse } from '../gatewayTypes.js'
+import type { SubagentProgress } from '../types.js'
+
+export interface SpawnSnapshot {
+  finishedAt: number
+  fromDisk?: boolean
+  id: string
+  label: string
+  path?: string
+  sessionId: null | string
+  startedAt: number
+  subagents: SubagentProgress[]
+}
+
+export interface SpawnDiffPair {
+  baseline: SpawnSnapshot
+  candidate: SpawnSnapshot
+}
+
+const HISTORY_LIMIT = 10
+
+export const $spawnHistory = atom<SpawnSnapshot[]>([])
+export const $spawnDiff = atom<null | SpawnDiffPair>(null)
+
+export const getSpawnHistory = () => $spawnHistory.get()
+export const getSpawnDiff = () => $spawnDiff.get()
+
+export const clearSpawnHistory = () => $spawnHistory.set([])
+export const clearDiffPair = () => $spawnDiff.set(null)
+export const setDiffPair = (pair: SpawnDiffPair) => $spawnDiff.set(pair)
+
+/**
+ * Commit a finished turn's spawn tree to history.  Keeps the last 10
+ * non-empty snapshots — empty turns (no subagents) are dropped.
+ *
+ * Why in-memory?  The primary investigation loop is "I just ran a fan-out,
+ * it misbehaved, let me look at what happened" — same-session debugging.
+ * Disk persistence across process restarts is a natural extension but
+ * adds RPC surface for a less-common path.
+ */
+export const pushSnapshot = (
+  subagents: readonly SubagentProgress[],
+  meta: { sessionId?: null | string; startedAt?: null | number }
+) => {
+  if (!subagents.length) {
+    return
+  }
+
+  const now = Date.now()
+  const started = meta.startedAt ?? Math.min(...subagents.map(s => s.startedAt ?? now))
+
+  const snap: SpawnSnapshot = {
+    finishedAt: now,
+    id: `snap-${now.toString(36)}`,
+    label: summarizeLabel(subagents),
+    sessionId: meta.sessionId ?? null,
+    startedAt: Number.isFinite(started) ? started : now,
+    subagents: subagents.map(item => ({ ...item }))
+  }
+
+  const next = [snap, ...$spawnHistory.get()].slice(0, HISTORY_LIMIT)
+  $spawnHistory.set(next)
+}
+
+function summarizeLabel(subagents: readonly SubagentProgress[]): string {
+  const top = subagents
+    .filter(s => s.parentId == null || subagents.every(o => o.id !== s.parentId))
+    .slice(0, 2)
+    .map(s => s.goal || 'subagent')
+    .join(' · ')
+
+  return top || `${subagents.length} agent${subagents.length === 1 ? '' : 's'}`
+}
+
+/**
+ * Push a disk-loaded snapshot onto the front of the history stack so the
+ * overlay can pick it up at index 1 via /replay load.  Normalises the
+ * server payload (arbitrary list) into the same SubagentProgress shape
+ * used for live data — defensive against cross-version reads.
+ */
+export const pushDiskSnapshot = (r: SpawnTreeLoadResponse, path: string) => {
+  const raw = Array.isArray(r.subagents) ? r.subagents : []
+  const normalised = raw.map(normaliseSubagent)
+
+  if (!normalised.length) {
+    return
+  }
+
+  const snap: SpawnSnapshot = {
+    finishedAt: (r.finished_at ?? Date.now() / 1000) * 1000,
+    fromDisk: true,
+    id: `disk-${path}`,
+    label: r.label || `${normalised.length} subagents`,
+    path,
+    sessionId: r.session_id ?? null,
+    startedAt: (r.started_at ?? r.finished_at ?? Date.now() / 1000) * 1000,
+    subagents: normalised
+  }
+
+  const next = [snap, ...$spawnHistory.get()].slice(0, HISTORY_LIMIT)
+  $spawnHistory.set(next)
+}
+
+function normaliseSubagent(raw: unknown): SubagentProgress {
+  const o = raw as Record<string, unknown>
+  const s = (v: unknown) => (typeof v === 'string' ? v : undefined)
+  const n = (v: unknown) => (typeof v === 'number' ? v : undefined)
+  const arr = <T>(v: unknown): T[] | undefined => (Array.isArray(v) ? (v as T[]) : undefined)
+
+  return {
+    apiCalls: n(o.apiCalls),
+    costUsd: n(o.costUsd),
+    depth: typeof o.depth === 'number' ? o.depth : 0,
+    durationSeconds: n(o.durationSeconds),
+    filesRead: arr<string>(o.filesRead),
+    filesWritten: arr<string>(o.filesWritten),
+    goal: s(o.goal) ?? 'subagent',
+    id: s(o.id) ?? `sa-${Math.random().toString(36).slice(2, 8)}`,
+    index: typeof o.index === 'number' ? o.index : 0,
+    inputTokens: n(o.inputTokens),
+    iteration: n(o.iteration),
+    model: s(o.model),
+    notes: (arr<string>(o.notes) ?? []).filter(x => typeof x === 'string'),
+    outputTail: arr(o.outputTail) as SubagentProgress['outputTail'],
+    outputTokens: n(o.outputTokens),
+    parentId: s(o.parentId) ?? null,
+    reasoningTokens: n(o.reasoningTokens),
+    startedAt: n(o.startedAt),
+    status: (s(o.status) as SubagentProgress['status']) ?? 'completed',
+    summary: s(o.summary),
+    taskCount: typeof o.taskCount === 'number' ? o.taskCount : 1,
+    thinking: (arr<string>(o.thinking) ?? []).filter(x => typeof x === 'string'),
+    toolCount: typeof o.toolCount === 'number' ? o.toolCount : 0,
+    tools: (arr<string>(o.tools) ?? []).filter(x => typeof x === 'string'),
+    toolsets: arr<string>(o.toolsets)
+  }
+}
@@ -10,8 +10,9 @@ import {
 } from '../lib/text.js'
 import type { ActiveTool, ActivityItem, Msg, SubagentProgress } from '../types.js'

-import { resetOverlayState } from './overlayStore.js'
-import { patchTurnState, resetTurnState } from './turnStore.js'
+import { resetFlowOverlays } from './overlayStore.js'
+import { pushSnapshot } from './spawnHistoryStore.js'
+import { getTurnState, patchTurnState, resetTurnState } from './turnStore.js'
 import { getUiState, patchUiState } from './uiStore.js'

 const INTERRUPT_COOLDOWN_MS = 1500
@@ -41,6 +42,7 @@ class TurnController {
  lastStatusNote = ''
  pendingInlineDiffs: string[] = []
  persistedToolLabels = new Set<string>()
+  persistSpawnTree?: (subagents: SubagentProgress[], sessionId: null | string) => Promise<void>
  protocolWarned = false
  reasoningText = ''
  segmentMessages: Msg[] = []
@@ -90,7 +92,7 @@ class TurnController {
      turnTrail: []
    })
    patchUiState({ busy: false })
-    resetOverlayState()
+    resetFlowOverlays()
  }

  interruptTurn({ appendMessage, gw, sid, sys }: InterruptDeps) {
@@ -189,9 +191,7 @@ class TurnController {
    // leading "┊ review diff" header written by `_emit_inline_diff` for the
    // terminal printer). That header only makes sense as stdout dressing,
    // not inside a markdown ```diff block.
-    const text = diffText
-      .replace(/^\s*┊[^\n]*\n?/, '')
-      .trim()
+    const text = diffText.replace(/^\s*┊[^\n]*\n?/, '').trim()

    if (!text || this.pendingInlineDiffs.includes(text)) {
      return
@@ -249,12 +249,15 @@ class TurnController {
    // markdown fence of its own — otherwise we render two stacked diff
    // blocks for the same edit.
    const assistantAlreadyHasDiff = /```(?:diff|patch)\b/i.test(finalText)
+
    const remainingInlineDiffs = assistantAlreadyHasDiff
      ? []
      : this.pendingInlineDiffs.filter(diff => !finalText.includes(diff))
+
    const inlineDiffBlock = remainingInlineDiffs.length
      ? `\`\`\`diff\n${remainingInlineDiffs.join('\n\n')}\n\`\`\``
      : ''
+
    const mergedText = [finalText, inlineDiffBlock].filter(Boolean).join('\n\n')
    const existingReasoning = this.reasoningText.trim() || String(payload.reasoning ?? '').trim()
    const savedReasoning = [existingReasoning, existingReasoning ? '' : split.reasoning].filter(Boolean).join('\n\n')
@@ -276,6 +279,20 @@ class TurnController {

    const wasInterrupted = this.interrupted

+    // Archive the turn's spawn tree to history BEFORE idle() drops subagents
+    // from turnState.  Lets /replay and the overlay's history nav pull up
+    // finished fan-outs without a round-trip to disk.
+    const finishedSubagents = getTurnState().subagents
+    const sessionId = getUiState().sid
+
+    if (finishedSubagents.length > 0) {
+      pushSnapshot(finishedSubagents, { sessionId, startedAt: null })
+      // Fire-and-forget disk persistence so /replay survives process restarts.
+      // The same snapshot lives in memory via spawnHistoryStore for immediate
+      // recall — disk is the long-term archive.
+      void this.persistSpawnTree?.(finishedSubagents, sessionId)
+    }
+
    this.idle()
    this.clearReasoning()
    this.turnTools = []
@@ -443,33 +460,82 @@ class TurnController {
    patchTurnState({ activity: [], outcome: '', subagents: [], toolTokens: 0, tools: [], turnTrail: [] })
  }

-  upsertSubagent(p: SubagentEventPayload, patch: (current: SubagentProgress) => Partial<SubagentProgress>) {
-    const id = `sa:${p.task_index}:${p.goal || 'subagent'}`
+  upsertSubagent(
+    p: SubagentEventPayload,
+    patch: (current: SubagentProgress) => Partial<SubagentProgress>,
+    opts: { createIfMissing?: boolean } = { createIfMissing: true }
+  ) {
+    // Stable id: prefer the server-issued subagent_id (survives nested
+    // grandchildren + cross-tree joins).  Fall back to the composite key
+    // for older gateways that omit the field — those produce a flat list.
+    const id = p.subagent_id || `sa:${p.task_index}:${p.goal || 'subagent'}`

    patchTurnState(state => {
      const existing = state.subagents.find(item => item.id === id)

+      // Late events (subagent.complete/tool/progress arriving after message.complete
+      // has already fired idle()) would otherwise resurrect a finished
+      // subagent into turn.subagents and block the "finished" title on the
+      // /agents overlay.  When `createIfMissing` is false we drop silently.
+      if (!existing && !opts.createIfMissing) {
+        return state
+      }
+
      const base: SubagentProgress = existing ?? {
+        depth: p.depth ?? 0,
        goal: p.goal,
        id,
        index: p.task_index,
+        model: p.model,
        notes: [],
+        parentId: p.parent_id ?? null,
+        startedAt: Date.now(),
        status: 'running',
        taskCount: p.task_count ?? 1,
        thinking: [],
-        tools: []
+        toolCount: p.tool_count ?? 0,
+        tools: [],
+        toolsets: p.toolsets
      }

+      // Map snake_case payload keys onto camelCase state.  Only overwrite
+      // when the event actually carries the field; `??` preserves prior
+      // values across streaming events that emit partial payloads.
+      const outputTail = p.output_tail
+        ? p.output_tail.map(e => ({
+            isError: Boolean(e.is_error),
+            preview: String(e.preview ?? ''),
+            tool: String(e.tool ?? 'tool')
+          }))
+        : base.outputTail
+
      const next: SubagentProgress = {
        ...base,
+        apiCalls: p.api_calls ?? base.apiCalls,
+        costUsd: p.cost_usd ?? base.costUsd,
+        depth: p.depth ?? base.depth,
+        filesRead: p.files_read ?? base.filesRead,
+        filesWritten: p.files_written ?? base.filesWritten,
        goal: p.goal || base.goal,
+        inputTokens: p.input_tokens ?? base.inputTokens,
+        iteration: p.iteration ?? base.iteration,
+        model: p.model ?? base.model,
+        outputTail,
+        outputTokens: p.output_tokens ?? base.outputTokens,
+        parentId: p.parent_id ?? base.parentId,
+        reasoningTokens: p.reasoning_tokens ?? base.reasoningTokens,
        taskCount: p.task_count ?? base.taskCount,
+        toolCount: p.tool_count ?? base.toolCount,
+        toolsets: p.toolsets ?? base.toolsets,
        ...patch(base)
      }

+      // Stable order: by spawn (depth, parent, index) rather than insert time.
+      // Without it, grandchildren can shuffle relative to siblings when
+      // events arrive out of order under high concurrency.
      const subagents = existing
        ? state.subagents.map(item => (item.id === id ? next : item))
-        : [...state.subagents, next].sort((a, b) => a.index - b.index)
+        : [...state.subagents, next].sort((a, b) => a.depth - b.depth || a.index - b.index)

      return { ...state, subagents }
    })
@@ -74,6 +74,10 @@ export function useInputHandlers(ctx: InputHandlerContext): InputHandlerResult {
    if (overlay.picker) {
      return patchOverlayState({ picker: false })
    }
+
+    if (overlay.agents) {
+      return patchOverlayState({ agents: false })
+    }
  }

  const cycleQueue = (dir: 1 | -1) => {
@@ -180,6 +184,7 @@ export function useInputHandlers(ctx: InputHandlerContext): InputHandlerResult {
        if (isCtrl(key, ch, 'c')) {
          cancelOverlayFromCtrlC()
        }
+
        return
      }

@@ -290,6 +295,7 @@ export function useInputHandlers(ctx: InputHandlerContext): InputHandlerResult {
    if (key.upArrow && !cState.inputBuf.length) {
      const inputSel = getInputSelection()
      const cursor = inputSel && inputSel.start === inputSel.end ? inputSel.start : null
+
      const noLineAbove =
        !cState.input || (cursor !== null && cState.input.lastIndexOf('\n', Math.max(0, cursor - 1)) < 0)

@@ -1,10 +1,14 @@
 import { Box, type ScrollBoxHandle, Text } from '@hermes/ink'
-import { type ReactNode, type RefObject, useCallback, useEffect, useState, useSyncExternalStore } from 'react'
+import { useStore } from '@nanostores/react'
+import { type ReactNode, type RefObject, useCallback, useEffect, useMemo, useState, useSyncExternalStore } from 'react'

+import { $delegationState } from '../app/delegationStore.js'
+import { $turnState } from '../app/turnStore.js'
 import { FACES } from '../content/faces.js'
 import { VERBS } from '../content/verbs.js'
 import { fmtDuration } from '../domain/messages.js'
 import { stickyPromptFromViewport } from '../domain/viewport.js'
+import { buildSubagentTree, treeTotals, widthByDepth } from '../lib/subagentTree.js'
 import { fmtK } from '../lib/text.js'
 import type { Theme } from '../theme.js'
 import type { Msg, Usage } from '../types.js'
@@ -60,6 +64,67 @@ function ctxBar(pct: number | undefined, w = 10) {
  return '█'.repeat(filled) + '░'.repeat(w - filled)
 }

+function SpawnHud({ t }: { t: Theme }) {
+  // Tight HUD that only appears when the session is actually fanning out.
+  // Colour escalates to warn/error as depth or concurrency approaches the cap.
+  const delegation = useStore($delegationState)
+  const turn = useStore($turnState)
+
+  const tree = useMemo(() => buildSubagentTree(turn.subagents), [turn.subagents])
+  const totals = useMemo(() => treeTotals(tree), [tree])
+
+  if (!totals.descendantCount && !delegation.paused) {
+    return null
+  }
+
+  const maxDepth = delegation.maxSpawnDepth
+  const maxConc = delegation.maxConcurrentChildren
+  const depth = Math.max(0, totals.maxDepthFromHere)
+  const active = totals.activeCount
+
+  // `max_concurrent_children` is a per-parent cap, not a global one.
+  // `activeCount` sums every running agent across the tree and would
+  // over-warn for multi-orchestrator runs.  The widest level of the tree
+  // is a closer proxy to "most concurrent spawns that could be hitting a
+  // single parent's slot budget".
+  const widestLevel = widthByDepth(tree).reduce((a, b) => Math.max(a, b), 0)
+  const depthRatio = maxDepth ? depth / maxDepth : 0
+  const concRatio = maxConc ? widestLevel / maxConc : 0
+  const ratio = Math.max(depthRatio, concRatio)
+
+  const color = delegation.paused || ratio >= 1 ? t.color.error : ratio >= 0.66 ? t.color.warn : t.color.dim
+
+  const pieces: string[] = []
+
+  if (delegation.paused) {
+    pieces.push('⏸ paused')
+  }
+
+  if (totals.descendantCount > 0) {
+    const depthLabel = maxDepth ? `${depth}/${maxDepth}` : `${depth}`
+    pieces.push(`d${depthLabel}`)
+
+    if (active > 0) {
+      // Label pairs the widest-level count (drives concRatio above) with
+      // the total active count for context.  `W/cap` triggers the warn,
+      // `+N` is everything else currently running across the tree.
+      const extra = Math.max(0, active - widestLevel)
+      const widthLabel = maxConc ? `${widestLevel}/${maxConc}` : `${widestLevel}`
+      const suffix = extra > 0 ? `+${extra}` : ''
+      pieces.push(`⚡${widthLabel}${suffix}`)
+    }
+  }
+
+  const atCap = depthRatio >= 1 || concRatio >= 1
+
+  return (
+    <Text color={color}>
+      {atCap ? ' │ ⚠ ' : ' │ '}
+      {pieces.join(' ')}
+    </Text>
+  )
+}
+
 function SessionDuration({ startedAt }: { startedAt: number }) {
  const [now, setNow] = useState(() => Date.now())

@@ -145,6 +210,7 @@ export function StatusRule({
              <SessionDuration startedAt={sessionStartedAt} />
            </Text>
          ) : null}
+          <SpawnHud t={t} />
          {voiceLabel ? <Text color={t.color.dim}> │ {voiceLabel}</Text> : null}
          {bgCount > 0 ? <Text color={t.color.dim}> │ {bgCount} bg</Text> : null}
          {showCost && typeof usage.cost_usd === 'number' ? (
@@ -2,13 +2,15 @@ import { AlternateScreen, Box, NoSelect, ScrollBox, Text } from '@hermes/ink'
 import { useStore } from '@nanostores/react'
 import { memo } from 'react'

+import { useGateway } from '../app/gatewayContext.js'
 import type { AppLayoutProgressProps, AppLayoutProps } from '../app/interfaces.js'
-import { $isBlocked } from '../app/overlayStore.js'
+import { $isBlocked, $overlayState, patchOverlayState } from '../app/overlayStore.js'
 import { $uiState } from '../app/uiStore.js'
 import { PLACEHOLDER } from '../content/placeholders.js'
 import type { Theme } from '../theme.js'
 import type { DetailsMode } from '../types.js'

+import { AgentsOverlay } from './agentsOverlay.js'
 import { GoodVibesHeart, StatusRule, StickyPromptTracker, TranscriptScrollbar } from './appChrome.js'
 import { FloatingOverlays, PromptZone } from './appOverlays.js'
 import { Banner, Panel, SessionPanel } from './branding.js'
@@ -256,6 +258,21 @@ const ComposerPane = memo(function ComposerPane({
  )
 })

+const AgentsOverlayPane = memo(function AgentsOverlayPane() {
+  const { gw } = useGateway()
+  const ui = useStore($uiState)
+  const overlay = useStore($overlayState)
+
+  return (
+    <AgentsOverlay
+      gw={gw}
+      initialHistoryIndex={overlay.agentsInitialHistoryIndex}
+      onClose={() => patchOverlayState({ agents: false, agentsInitialHistoryIndex: 0 })}
+      t={ui.theme}
+    />
+  )
+})
+
 export const AppLayout = memo(function AppLayout({
  actions,
  composer,
@@ -264,22 +281,30 @@ export const AppLayout = memo(function AppLayout({
  status,
  transcript
 }: AppLayoutProps) {
+  const overlay = useStore($overlayState)
+
  return (
    <AlternateScreen mouseTracking={mouseTracking}>
      <Box flexDirection="column" flexGrow={1}>
        <Box flexDirection="row" flexGrow={1}>
-          <TranscriptPane actions={actions} composer={composer} progress={progress} transcript={transcript} />
+          {overlay.agents ? (
+            <AgentsOverlayPane />
+          ) : (
+            <TranscriptPane actions={actions} composer={composer} progress={progress} transcript={transcript} />
+          )}
        </Box>

-        <PromptZone
-          cols={composer.cols}
-          onApprovalChoice={actions.answerApproval}
-          onClarifyAnswer={actions.answerClarify}
-          onSecretSubmit={actions.answerSecret}
-          onSudoSubmit={actions.answerSudo}
-        />
+        {!overlay.agents && (
+          <PromptZone
+            cols={composer.cols}
+            onApprovalChoice={actions.answerApproval}
+            onClarifyAnswer={actions.answerClarify}
+            onSecretSubmit={actions.answerSecret}
+            onSudoSubmit={actions.answerSudo}
+          />
+        )}

-        <ComposerPane actions={actions} composer={composer} status={status} />
+        {!overlay.agents && <ComposerPane actions={actions} composer={composer} status={status} />}
      </Box>
    </AlternateScreen>
  )
@@ -615,14 +615,7 @@ export function TextInput({
        return
      }

-      if (
-        (k.ctrl && inp === 'c') ||
-        k.tab ||
-        (k.shift && k.tab) ||
-        k.pageUp ||
-        k.pageDown ||
-        k.escape
-      ) {
+      if ((k.ctrl && inp === 'c') || k.tab || (k.shift && k.tab) || k.pageUp || k.pageDown || k.escape) {
        return
      }

@@ -1,8 +1,19 @@
 import { Box, NoSelect, Text } from '@hermes/ink'
-import { memo, useEffect, useMemo, useState, type ReactNode } from 'react'
+import { memo, type ReactNode, useEffect, useMemo, useState } from 'react'
 import spinners, { type BrailleSpinnerName } from 'unicode-animations'

 import { THINKING_COT_MAX } from '../config/limits.js'
+import {
+  buildSubagentTree,
+  fmtCost,
+  fmtTokens,
+  formatSummary as formatSpawnSummary,
+  hotnessBucket,
+  peakHotness,
+  sparkline,
+  treeTotals,
+  widthByDepth
+} from '../lib/subagentTree.js'
 import {
  compactPreview,
  estimateTokensRough,
@@ -14,7 +25,7 @@ import {
  toolTrailLabel
 } from '../lib/text.js'
 import type { Theme } from '../theme.js'
-import type { ActiveTool, ActivityItem, DetailsMode, SubagentProgress, ThinkingMode } from '../types.js'
+import type { ActiveTool, ActivityItem, DetailsMode, SubagentNode, SubagentProgress, ThinkingMode } from '../types.js'

 const THINK: BrailleSpinnerName[] = ['helix', 'breathe', 'orbit', 'dna', 'waverows', 'snake', 'pulse']
 const TOOL: BrailleSpinnerName[] = ['cascade', 'scan', 'diagswipe', 'fillsweep', 'rain', 'columns', 'sparkle']
@@ -106,6 +117,8 @@ function TreeNode({
  header,
  open,
  rails = [],
+  stemColor,
+  stemDim,
  t
 }: {
  branch: TreeBranch
@@ -113,11 +126,13 @@ function TreeNode({
  header: ReactNode
  open: boolean
  rails?: TreeRails
+  stemColor?: string
+  stemDim?: boolean
  t: Theme
 }) {
  return (
    <Box flexDirection="column">
-      <TreeRow branch={branch} rails={rails} t={t}>
+      <TreeRow branch={branch} rails={rails} stemColor={stemColor} stemDim={stemDim} t={t}>
        {header}
      </TreeRow>
      {open ? children?.(nextTreeRails(rails, branch)) : null}
@@ -239,16 +254,31 @@ function Chevron({
  )
 }

+function heatColor(node: SubagentNode, peak: number, theme: Theme): string | undefined {
+  const palette = [theme.color.bronze, theme.color.amber, theme.color.gold, theme.color.warn, theme.color.error]
+  const idx = hotnessBucket(node.aggregate.hotness, peak, palette.length)
+
+  // Below the median bucket we keep the default dim stem so cool branches
+  // fade into the chrome — only "hot" branches draw the eye.
+  if (idx < 2) {
+    return undefined
+  }
+
+  return palette[idx]
+}
+
 function SubagentAccordion({
  branch,
  expanded,
-  item,
+  node,
+  peak,
  rails = [],
  t
 }: {
  branch: TreeBranch
  expanded: boolean
-  item: SubagentProgress
+  node: SubagentNode
+  peak: number
  rails?: TreeRails
  t: Theme
 }) {
@@ -257,6 +287,7 @@ function SubagentAccordion({
  const [openThinking, setOpenThinking] = useState(expanded)
  const [openTools, setOpenTools] = useState(expanded)
  const [openNotes, setOpenNotes] = useState(expanded)
+  const [openKids, setOpenKids] = useState(expanded)

  useEffect(() => {
    if (!expanded) {
@@ -268,6 +299,7 @@ function SubagentAccordion({
    setOpenThinking(true)
    setOpenTools(true)
    setOpenNotes(true)
+    setOpenKids(true)
  }, [expanded])

  const expandAll = () => {
@@ -276,8 +308,13 @@ function SubagentAccordion({
    setOpenThinking(true)
    setOpenTools(true)
    setOpenNotes(true)
+    setOpenKids(true)
  }

+  const item = node.item
+  const children = node.children
+  const aggregate = node.aggregate
+
  const statusTone: 'dim' | 'error' | 'warn' =
    item.status === 'failed' ? 'error' : item.status === 'interrupted' ? 'warn' : 'dim'

@@ -286,10 +323,60 @@ function SubagentAccordion({
  const title = `${prefix}${open ? goalLabel : compactPreview(goalLabel, 60)}`
  const summary = compactPreview((item.summary || '').replace(/\s+/g, ' ').trim(), 72)

-  const suffix =
-    item.status === 'running'
-      ? 'running'
-      : `${item.status}${item.durationSeconds ? ` · ${fmtElapsed(item.durationSeconds * 1000)}` : ''}`
+  // Suffix packs branch rollup: status · elapsed · per-branch tool/agent/token/cost.
+  // Emphasises the numbers the user can't easily eyeball from a flat list.
+  const statusLabel = item.status === 'queued' ? 'queued' : item.status === 'running' ? 'running' : String(item.status)
+
+  const rollupBits: string[] = [statusLabel]
+
+  if (item.durationSeconds) {
+    rollupBits.push(fmtElapsed(item.durationSeconds * 1000))
+  }
+
+  const localTools = item.toolCount ?? 0
+  const subtreeTools = aggregate.totalTools - localTools
+
+  if (localTools > 0) {
+    rollupBits.push(`${localTools} tool${localTools === 1 ? '' : 's'}`)
+  }
+
+  const localTokens = (item.inputTokens ?? 0) + (item.outputTokens ?? 0)
+
+  if (localTokens > 0) {
+    rollupBits.push(`${fmtTokens(localTokens)} tok`)
+  }
+
+  const localCost = item.costUsd ?? 0
+
+  if (localCost > 0) {
+    rollupBits.push(fmtCost(localCost))
+  }
+
+  const filesLocal = (item.filesWritten?.length ?? 0) + (item.filesRead?.length ?? 0)
+
+  if (filesLocal > 0) {
+    rollupBits.push(`⎘${filesLocal}`)
+  }
+
+  if (children.length > 0) {
+    rollupBits.push(`${aggregate.descendantCount}↓`)
+
+    if (subtreeTools > 0) {
+      rollupBits.push(`+${subtreeTools}t sub`)
+    }
+
+    const subCost = aggregate.costUsd - localCost
+
+    if (subCost >= 0.01) {
+      rollupBits.push(`+${fmtCost(subCost)} sub`)
+    }
+
+    if (aggregate.activeCount > 0 && item.status !== 'running') {
+      rollupBits.push(`⚡${aggregate.activeCount}`)
+    }
+  }
+
+  const suffix = rollupBits.join(' · ')

  const thinkingText = item.thinking.join('\n')
  const hasThinking = Boolean(thinkingText)
@@ -418,6 +505,50 @@ function SubagentAccordion({
    })
  }

+  if (children.length > 0) {
+    // Nested grandchildren — rendered recursively via SubagentAccordion,
+    // sharing the same keybindings / expand semantics as top-level nodes.
+    sections.push({
+      header: (
+        <Chevron
+          count={children.length}
+          onClick={shift => {
+            if (shift) {
+              expandAll()
+            } else {
+              setOpenKids(v => !v)
+            }
+          }}
+          open={showChildren || openKids}
+          suffix={`d${item.depth + 1} · ${aggregate.descendantCount} total`}
+          t={t}
+          title="Spawned"
+        />
+      ),
+      key: 'subagents',
+      open: showChildren || openKids,
+      render: childRails => (
+        <Box flexDirection="column">
+          {children.map((child, i) => (
+            <SubagentAccordion
+              branch={i === children.length - 1 ? 'last' : 'mid'}
+              expanded={expanded || deep}
+              key={child.item.id}
+              node={child}
+              peak={peak}
+              rails={childRails}
+              t={t}
+            />
+          ))}
+        </Box>
+      )
+    })
+  }
+
+  // Heatmap: amber→error gradient on the stem when this branch is "hot"
+  // (high tools/sec) relative to the whole tree's peak.
+  const stem = heatColor(node, peak, t)
+
  return (
    <TreeNode
      branch={branch}
@@ -447,6 +578,8 @@ function SubagentAccordion({
      }
      open={open}
      rails={rails}
+      stemColor={stem}
+      stemDim={stem == null}
      t={t}
    >
      {childRails => (
@@ -598,6 +731,16 @@ export const ToolTrail = memo(function ToolTrail({

  const cot = useMemo(() => thinkingPreview(reasoning, 'full', THINKING_COT_MAX), [reasoning])

+  // Spawn-tree derivations must live above any early return so React's
+  // rules-of-hooks sees a stable call order.  Cheap O(N) builds memoised
+  // by subagent-list identity.
+  const spawnTree = useMemo(() => buildSubagentTree(subagents), [subagents])
+  const spawnPeak = useMemo(() => peakHotness(spawnTree), [spawnTree])
+  const spawnTotals = useMemo(() => treeTotals(spawnTree), [spawnTree])
+  const spawnWidths = useMemo(() => widthByDepth(spawnTree), [spawnTree])
+  const spawnSpark = useMemo(() => sparkline(spawnWidths), [spawnWidths])
+  const spawnSummaryLabel = useMemo(() => formatSpawnSummary(spawnTotals), [spawnTotals])
+
  if (
    !busy &&
    !trail.length &&
@@ -753,12 +896,13 @@ export const ToolTrail = memo(function ToolTrail({

  const renderSubagentList = (rails: boolean[]) => (
    <Box flexDirection="column">
-      {subagents.map((item, index) => (
+      {spawnTree.map((node, index) => (
        <SubagentAccordion
-          branch={index === subagents.length - 1 ? 'last' : 'mid'}
+          branch={index === spawnTree.length - 1 ? 'last' : 'mid'}
          expanded={detailsMode === 'expanded' || deepSubagents}
-          item={item}
-          key={item.id}
+          key={node.item.id}
+          node={node}
+          peak={spawnPeak}
          rails={rails}
          t={t}
        />
@@ -881,10 +1025,14 @@ export const ToolTrail = memo(function ToolTrail({
  }

  if (hasSubagents && !inlineDelegateKey) {
+    // Spark + summary give a one-line read on the branch shape before
+    // opening the subtree.  `/agents` opens the full-screen audit overlay.
+    const suffix = spawnSpark ? `${spawnSummaryLabel}  ${spawnSpark}  (/agents)` : `${spawnSummaryLabel}  (/agents)`
+
    sections.push({
      header: (
        <Chevron
-          count={subagents.length}
+          count={spawnTotals.descendantCount}
          onClick={shift => {
            if (shift) {
              expandAll()
@@ -895,8 +1043,9 @@ export const ToolTrail = memo(function ToolTrail({
            }
          }}
          open={detailsMode === 'expanded' || openSubagents}
+          suffix={suffix}
          t={t}
-          title="Subagents"
+          title="Spawn tree"
        />
      ),
      key: 'subagents',
@@ -280,15 +280,85 @@ export interface ReloadMcpResponse {
 // ── Subagent events ──────────────────────────────────────────────────

 export interface SubagentEventPayload {
+  api_calls?: number
+  cost_usd?: number
+  depth?: number
  duration_seconds?: number
+  files_read?: string[]
+  files_written?: string[]
  goal: string
-  status?: 'completed' | 'failed' | 'interrupted' | 'running'
+  input_tokens?: number
+  iteration?: number
+  model?: string
+  output_tail?: { is_error?: boolean; preview?: string; tool?: string }[]
+  output_tokens?: number
+  parent_id?: null | string
+  reasoning_tokens?: number
+  status?: 'completed' | 'failed' | 'interrupted' | 'queued' | 'running'
+  subagent_id?: string
  summary?: string
  task_count?: number
  task_index: number
  text?: string
+  tool_count?: number
  tool_name?: string
  tool_preview?: string
+  toolsets?: string[]
+}
+
+// ── Delegation control RPCs ──────────────────────────────────────────
+
+export interface DelegationStatusResponse {
+  active?: {
+    depth?: number
+    goal?: string
+    model?: null | string
+    parent_id?: null | string
+    started_at?: number
+    status?: string
+    subagent_id?: string
+    tool_count?: number
+  }[]
+  max_concurrent_children?: number
+  max_spawn_depth?: number
+  paused?: boolean
+}
+
+export interface DelegationPauseResponse {
+  paused?: boolean
+}
+
+export interface SubagentInterruptResponse {
+  found?: boolean
+  subagent_id?: string
+}
+
+// ── Spawn-tree snapshots ─────────────────────────────────────────────
+
+export interface SpawnTreeListEntry {
+  count: number
+  finished_at?: number
+  label?: string
+  path: string
+  session_id?: string
+  started_at?: number | null
+}
+
+export interface SpawnTreeListResponse {
+  entries?: SpawnTreeListEntry[]
+}
+
+export interface SpawnTreeLoadResponse {
+  finished_at?: number
+  label?: string
+  session_id?: string
+  started_at?: null | number
+  subagents?: unknown[]
+}
+
+export interface SpawnTreeSaveResponse {
+  path?: string
+  session_id?: string
 }

 export type GatewayEvent =
@@ -320,6 +390,7 @@ export type GatewayEvent =
  | { payload: { env_var: string; prompt: string; request_id: string }; session_id?: string; type: 'secret.request' }
  | { payload: { task_id: string; text: string }; session_id?: string; type: 'background.complete' }
  | { payload: { text: string }; session_id?: string; type: 'btw.complete' }
+  | { payload: SubagentEventPayload; session_id?: string; type: 'subagent.spawn_requested' }
  | { payload: SubagentEventPayload; session_id?: string; type: 'subagent.start' }
  | { payload: SubagentEventPayload; session_id?: string; type: 'subagent.thinking' }
  | { payload: SubagentEventPayload; session_id?: string; type: 'subagent.tool' }
@@ -0,0 +1,355 @@
+import type { SubagentAggregate, SubagentNode, SubagentProgress } from '../types.js'
+
+const ROOT_KEY = '__root__'
+
+/**
+ * Reconstruct the subagent spawn tree from a flat event-ordered list.
+ *
+ * Grouping is by `parentId`; a missing `parentId` (or one pointing at an
+ * unknown subagent) is treated as a top-level spawn of the current turn.
+ * Children within a parent are sorted by `depth` then `index` — same key
+ * used in `turnController.upsertSubagent`, so render order matches spawn
+ * order regardless of network reordering of gateway events.
+ *
+ * Older gateways omit `parentId`; every subagent is then a top-level node
+ * and the tree renders flat — matching pre-observability behaviour.
+ */
+export function buildSubagentTree(items: readonly SubagentProgress[]): SubagentNode[] {
+  if (!items.length) {
+    return []
+  }
+
+  const byParent = new Map<string, SubagentProgress[]>()
+  const known = new Set<string>()
+
+  for (const item of items) {
+    known.add(item.id)
+  }
+
+  for (const item of items) {
+    const parentKey = item.parentId && known.has(item.parentId) ? item.parentId : ROOT_KEY
+    const bucket = byParent.get(parentKey) ?? []
+    bucket.push(item)
+    byParent.set(parentKey, bucket)
+  }
+
+  for (const bucket of byParent.values()) {
+    bucket.sort((a, b) => a.depth - b.depth || a.index - b.index)
+  }
+
+  const build = (item: SubagentProgress): SubagentNode => {
+    const kids = byParent.get(item.id) ?? []
+    const children = kids.map(build)
+
+    return { aggregate: aggregate(item, children), children, item }
+  }
+
+  return (byParent.get(ROOT_KEY) ?? []).map(build)
+}
+
+/**
+ * Roll up counts for a node's whole subtree.  Kept pure so the live view
+ * and the post-hoc replay can share the same renderer unchanged.
+ *
+ * `hotness` = tools per second across the subtree — a crude proxy for
+ * "how much work is happening in this branch".  Used to colour tree rails
+ * in the overlay / inline view so the eye spots the expensive branch.
+ */
+export function aggregate(item: SubagentProgress, children: readonly SubagentNode[]): SubagentAggregate {
+  let totalTools = item.toolCount ?? 0
+  let totalDuration = item.durationSeconds ?? 0
+  let descendantCount = 0
+  let activeCount = isRunning(item) ? 1 : 0
+  let maxDepthFromHere = 0
+  let inputTokens = item.inputTokens ?? 0
+  let outputTokens = item.outputTokens ?? 0
+  let costUsd = item.costUsd ?? 0
+  let filesTouched = (item.filesRead?.length ?? 0) + (item.filesWritten?.length ?? 0)
+
+  for (const child of children) {
+    totalTools += child.aggregate.totalTools
+    totalDuration += child.aggregate.totalDuration
+    descendantCount += child.aggregate.descendantCount + 1
+    activeCount += child.aggregate.activeCount
+    maxDepthFromHere = Math.max(maxDepthFromHere, child.aggregate.maxDepthFromHere + 1)
+    inputTokens += child.aggregate.inputTokens
+    outputTokens += child.aggregate.outputTokens
+    costUsd += child.aggregate.costUsd
+    filesTouched += child.aggregate.filesTouched
+  }
+
+  const hotness = totalDuration > 0 ? totalTools / totalDuration : 0
+
+  return {
+    activeCount,
+    costUsd,
+    descendantCount,
+    filesTouched,
+    hotness,
+    inputTokens,
+    maxDepthFromHere,
+    outputTokens,
+    totalDuration,
+    totalTools
+  }
+}
+
+/**
+ * Count of subagents at each depth level, indexed by depth (0 = top level).
+ * Drives the inline sparkline (`▁▃▇▅`) and the status-bar HUD.
+ */
+export function widthByDepth(tree: readonly SubagentNode[]): number[] {
+  const widths: number[] = []
+
+  const walk = (nodes: readonly SubagentNode[], depth: number) => {
+    if (!nodes.length) {
+      return
+    }
+
+    widths[depth] = (widths[depth] ?? 0) + nodes.length
+
+    for (const node of nodes) {
+      walk(node.children, depth + 1)
+    }
+  }
+
+  walk(tree, 0)
+
+  return widths
+}
+
+/**
+ * Flat totals across the full tree — feeds the summary chip header.
+ */
+export function treeTotals(tree: readonly SubagentNode[]): SubagentAggregate {
+  let totalTools = 0
+  let totalDuration = 0
+  let descendantCount = 0
+  let activeCount = 0
+  let maxDepthFromHere = 0
+  let inputTokens = 0
+  let outputTokens = 0
+  let costUsd = 0
+  let filesTouched = 0
+
+  for (const node of tree) {
+    totalTools += node.aggregate.totalTools
+    totalDuration += node.aggregate.totalDuration
+    descendantCount += node.aggregate.descendantCount + 1
+    activeCount += node.aggregate.activeCount
+    maxDepthFromHere = Math.max(maxDepthFromHere, node.aggregate.maxDepthFromHere + 1)
+    inputTokens += node.aggregate.inputTokens
+    outputTokens += node.aggregate.outputTokens
+    costUsd += node.aggregate.costUsd
+    filesTouched += node.aggregate.filesTouched
+  }
+
+  const hotness = totalDuration > 0 ? totalTools / totalDuration : 0
+
+  return {
+    activeCount,
+    costUsd,
+    descendantCount,
+    filesTouched,
+    hotness,
+    inputTokens,
+    maxDepthFromHere,
+    outputTokens,
+    totalDuration,
+    totalTools
+  }
+}
+
+/**
+ * Flatten the tree into visit order — useful for keyboard navigation and
+ * for "kill subtree" walks that fire one RPC per descendant.
+ */
+export function flattenTree(tree: readonly SubagentNode[]): SubagentNode[] {
+  const out: SubagentNode[] = []
+
+  const walk = (nodes: readonly SubagentNode[]) => {
+    for (const node of nodes) {
+      out.push(node)
+      walk(node.children)
+    }
+  }
+
+  walk(tree)
+
+  return out
+}
+
+/**
+ * Collect every descendant's id for a given node (excluding the node itself).
+ */
+export function descendantIds(node: SubagentNode): string[] {
+  const ids: string[] = []
+
+  const walk = (children: readonly SubagentNode[]) => {
+    for (const child of children) {
+      ids.push(child.item.id)
+      walk(child.children)
+    }
+  }
+
+  walk(node.children)
+
+  return ids
+}
+
+export function isRunning(item: Pick<SubagentProgress, 'status'>): boolean {
+  return item.status === 'running' || item.status === 'queued'
+}
+
+const SPARK_RAMP = ['▁', '▂', '▃', '▄', '▅', '▆', '▇', '█'] as const
+
+/**
+ * 8-step unicode bar sparkline from a positive-integer array.  Zeroes render
+ * as spaces so a sparse tree doesn't read as equal activity at every depth.
+ */
+export function sparkline(values: readonly number[]): string {
+  if (!values.length) {
+    return ''
+  }
+
+  const max = Math.max(...values)
+
+  if (max <= 0) {
+    return ' '.repeat(values.length)
+  }
+
+  return values
+    .map(v => {
+      if (v <= 0) {
+        return ' '
+      }
+
+      const idx = Math.min(SPARK_RAMP.length - 1, Math.max(0, Math.ceil((v / max) * (SPARK_RAMP.length - 1))))
+
+      return SPARK_RAMP[idx]
+    })
+    .join('')
+}
+
+/**
+ * Format totals into a compact one-line summary: `d2 · 7 agents · 124 tools · 2m 14s`
+ */
+export function formatSummary(totals: SubagentAggregate): string {
+  const pieces = [`d${Math.max(0, totals.maxDepthFromHere)}`]
+  pieces.push(`${totals.descendantCount} agent${totals.descendantCount === 1 ? '' : 's'}`)
+
+  if (totals.totalTools > 0) {
+    pieces.push(`${totals.totalTools} tool${totals.totalTools === 1 ? '' : 's'}`)
+  }
+
+  if (totals.totalDuration > 0) {
+    pieces.push(fmtDuration(totals.totalDuration))
+  }
+
+  const tokens = totals.inputTokens + totals.outputTokens
+
+  if (tokens > 0) {
+    pieces.push(`${fmtTokens(tokens)} tok`)
+  }
+
+  if (totals.costUsd > 0) {
+    pieces.push(fmtCost(totals.costUsd))
+  }
+
+  if (totals.activeCount > 0) {
+    pieces.push(`⚡${totals.activeCount}`)
+  }
+
+  return pieces.join(' · ')
+}
+
+/** Compact dollar amount: `$0.02`, `$1.34`, `$12.4` — never > 5 chars beyond the `$`. */
+export function fmtCost(usd: number): string {
+  if (!Number.isFinite(usd) || usd <= 0) {
+    return ''
+  }
+
+  if (usd < 0.01) {
+    return '<$0.01'
+  }
+
+  if (usd < 10) {
+    return `$${usd.toFixed(2)}`
+  }
+
+  return `$${usd.toFixed(1)}`
+}
+
+/** Compact token count: `12k`, `1.2k`, `542`. */
+export function fmtTokens(n: number): string {
+  if (!Number.isFinite(n) || n <= 0) {
+    return '0'
+  }
+
+  if (n < 1000) {
+    return String(Math.round(n))
+  }
+
+  if (n < 10_000) {
+    return `${(n / 1000).toFixed(1)}k`
+  }
+
+  return `${Math.round(n / 1000)}k`
+}
+
+/**
+ * `Ns` / `Nm` / `Nm Ss` formatter for seconds.  Shared with the agents
+ * overlay so the timeline + list + summary all speak the same dialect.
+ */
+export function fmtDuration(seconds: number): string {
+  if (seconds < 60) {
+    return `${Math.max(0, Math.round(seconds))}s`
+  }
+
+  const m = Math.floor(seconds / 60)
+  const s = Math.round(seconds - m * 60)
+
+  return s === 0 ? `${m}m` : `${m}m ${s}s`
+}
+
+/**
+ * A subagent is top-level if it has no `parentId`, or its parent isn't in
+ * the same snapshot (orphaned by a pruned mid-flight root).  Same rule
+ * `buildSubagentTree` uses — keep call sites consistent across the live
+ * view, disk label, and diff pane.
+ */
+export function topLevelSubagents(items: readonly SubagentProgress[]): SubagentProgress[] {
+  const ids = new Set(items.map(s => s.id))
+
+  return items.filter(s => !s.parentId || !ids.has(s.parentId))
+}
+
+/**
+ * Normalize a node's hotness into a palette index 0..N-1 where N = buckets.
+ * Higher hotness = "hotter" colour. Normalized against the tree's peak hotness
+ * so a uniformly slow tree still shows gradient across its busiest branches.
+ */
+export function hotnessBucket(hotness: number, peakHotness: number, buckets: number): number {
+  if (!Number.isFinite(hotness) || hotness <= 0 || peakHotness <= 0 || buckets <= 1) {
+    return 0
+  }
+
+  const ratio = Math.min(1, hotness / peakHotness)
+
+  return Math.min(buckets - 1, Math.max(0, Math.round(ratio * (buckets - 1))))
+}
+
+export function peakHotness(tree: readonly SubagentNode[]): number {
+  let peak = 0
+
+  const walk = (nodes: readonly SubagentNode[]) => {
+    for (const node of nodes) {
+      peak = Math.max(peak, node.aggregate.hotness)
+      walk(node.children)
+    }
+  }
+
+  walk(tree)
+
+  return peak
+}
@@ -94,7 +94,12 @@ export const DARK_THEME: Theme = {
    amber: '#FFBF00',
    bronze: '#CD7F32',
    cornsilk: '#FFF8DC',
-    dim: '#B8860B',
+    // Bumped from the old `#B8860B` darkgoldenrod (~53% luminance) which
+    // read as barely-visible on dark terminals for long body text.  The
+    // new value sits ~60% luminance — readable without losing the "muted /
+    // secondary" semantic.  Field labels still use `label` (65%) which
+    // stays brighter so hierarchy holds.
+    dim: '#CC9B1F',
    completionBg: '#FFFFFF',
    completionCurrentBg: mix('#FFFFFF', '#FFBF00', 0.25),

@@ -104,8 +109,11 @@ export const DARK_THEME: Theme = {
    warn: '#ffa726',

    prompt: '#FFF8DC',
-    sessionLabel: '#B8860B',
-    sessionBorder: '#B8860B',
+    // sessionLabel/sessionBorder intentionally track the `dim` value — they
+    // are "same role, same colour" by design.  fromSkin's banner_dim fallback
+    // relies on this pairing (#11300).
+    sessionLabel: '#CC9B1F',
+    sessionBorder: '#CC9B1F',

    statusBg: '#1a1a2e',
    statusFg: '#C0C0C0',
@@ -12,16 +12,72 @@ export interface ActivityItem {
 }

 export interface SubagentProgress {
+  apiCalls?: number
+  costUsd?: number
+  depth: number
  durationSeconds?: number
+  filesRead?: string[]
+  filesWritten?: string[]
  goal: string
  id: string
  index: number
+  inputTokens?: number
+  iteration?: number
+  model?: string
  notes: string[]
-  status: 'completed' | 'failed' | 'interrupted' | 'running'
+  outputTail?: SubagentOutputEntry[]
+  outputTokens?: number
+  parentId: null | string
+  reasoningTokens?: number
+  startedAt?: number
+  status: 'completed' | 'failed' | 'interrupted' | 'queued' | 'running'
  summary?: string
  taskCount: number
  thinking: string[]
+  toolCount: number
  tools: string[]
+  toolsets?: string[]
+}
+
+export interface SubagentOutputEntry {
+  isError: boolean
+  preview: string
+  tool: string
+}
+
+export interface SubagentNode {
+  aggregate: SubagentAggregate
+  children: SubagentNode[]
+  item: SubagentProgress
+}
+
+export interface SubagentAggregate {
+  activeCount: number
+  costUsd: number
+  descendantCount: number
+  filesTouched: number
+  hotness: number
+  inputTokens: number
+  maxDepthFromHere: number
+  outputTokens: number
+  totalDuration: number
+  totalTools: number
+}
+
+export interface DelegationStatus {
+  active: {
+    depth?: number
+    goal?: string
+    model?: null | string
+    parent_id?: null | string
+    started_at?: number
+    status?: string
+    subagent_id?: string
+    tool_count?: number
+  }[]
+  max_concurrent_children?: number
+  max_spawn_depth?: number
+  paused: boolean
 }

 export interface ApprovalReq {
@@ -314,6 +314,7 @@ export interface AnalyticsDailyEntry {
  estimated_cost: number;
  actual_cost: number;
  sessions: number;
+  api_calls: number;
 }

 export interface AnalyticsModelEntry {
@@ -322,6 +323,7 @@ export interface AnalyticsModelEntry {
  output_tokens: number;
  estimated_cost: number;
  sessions: number;
+  api_calls: number;
 }

 export interface AnalyticsSkillEntry {
@@ -351,6 +353,7 @@ export interface AnalyticsResponse {
    total_estimated_cost: number;
    total_actual_cost: number;
    total_sessions: number;
+    total_api_calls: number;
  };
  skills: {
    summary: AnalyticsSkillsSummary;
@@ -347,7 +347,7 @@ export default function AnalyticsPage() {
            <SummaryCard
              icon={TrendingUp}
              label={t.analytics.apiCalls}
-              value={String(data.daily.reduce((sum, d) => sum + d.sessions, 0))}
+              value={String(data.totals.total_api_calls ?? data.daily.reduce((sum, d) => sum + d.sessions, 0))}
              sub={t.analytics.acrossModels.replace("{count}", String(data.by_model.length))}
            />
          </div>
@@ -359,7 +359,11 @@ The setup wizard installs dependencies automatically and only installs what's ne
 | `auto_retain` | `true` | Automatically retain conversation turns |
 | `auto_recall` | `true` | Automatically recall memories before each turn |
 | `retain_async` | `true` | Process retain asynchronously on the server |
-| `tags` | — | Tags applied when storing memories |
+| `retain_context` | `conversation between Hermes Agent and the User` | Context label for retained memories |
+| `retain_tags` | — | Default tags applied to retained memories; merged with per-call tool tags |
+| `retain_source` | — | Optional `metadata.source` attached to retained memories |
+| `retain_user_prefix` | `User` | Label used before user turns in auto-retained transcripts |
+| `retain_assistant_prefix` | `Assistant` | Label used before assistant turns in auto-retained transcripts |
 | `recall_tags` | — | Tags to filter on recall |

 See [plugin README](https://github.com/NousResearch/hermes-agent/blob/main/plugins/memory/hindsight/README.md) for the full configuration reference.
@@ -17,24 +17,52 @@ Connect Hermes to [WeCom](https://work.weixin.qq.com/) (企业微信), Tencent's

 ## Setup

-### 1. Create an AI Bot
+### Step 1: Create an AI Bot

-1. Log in to the [WeCom Admin Console](https://work.weixin.qq.com/wework_admin/frame)
-2. Navigate to **Applications** → **Create Application** → **AI Bot**
-3. Configure the bot name and description
-4. Copy the **Bot ID** and **Secret** from the credentials page
-
-### 2. Configure Hermes
-
-Run the interactive setup:
+#### Recommended: Scan-to-Create (one command)

 ```bash
 hermes gateway setup
 ```

-Select **WeCom** and enter your Bot ID and Secret.
+Select **WeCom** and scan the QR code with your WeCom mobile app. Hermes will automatically create a bot application with the correct permissions and save the credentials.

-Or set environment variables in `~/.hermes/.env`:
+The setup wizard will:
+1. Display a QR code in your terminal
+2. Wait for you to scan it with the WeCom mobile app
+3. Automatically retrieve the Bot ID and Secret
+4. Guide you through access control configuration
+
+#### Alternative: Manual Setup
+
+If scan-to-create is not available, the wizard falls back to manual input:
+
+1. Log in to the [WeCom Admin Console](https://work.weixin.qq.com/wework_admin/frame)
+2. Navigate to **Applications** → **Create Application** → **AI Bot**
+3. Configure the bot name and description
+4. Copy the **Bot ID** and **Secret** from the credentials page
+5. Run `hermes gateway setup`, select **WeCom**, and enter the credentials when prompted
+
+:::warning
+Keep the Bot Secret private. Anyone with it can impersonate your bot.
+:::
+
+### Step 2: Configure Hermes
+
+#### Option A: Interactive Setup (Recommended)
+
+```bash
+hermes gateway setup
+```
+
+Select **WeCom** and follow the prompts. The wizard will guide you through:
+- Bot credentials (via QR scan or manual entry)
+- Access control settings (allowlist, pairing mode, or open access)
+- Home channel for notifications
+
+#### Option B: Manual Configuration
+
+Add the following to `~/.hermes/.env`:

 ```bash
 WECOM_BOT_ID=your-bot-id
@@ -47,7 +75,7 @@ WECOM_ALLOWED_USERS=user_id_1,user_id_2
 WECOM_HOME_CHANNEL=chat_id
 ```

-### 3. Start the gateway
+### Step 3: Start the gateway

 ```bash
 hermes gateway
@@ -386,7 +386,21 @@ Key tables in `state.db`:

 - Gateway sessions auto-reset based on the configured reset policy
 - Before reset, the agent saves memories and skills from the expiring session
- Ended sessions remain in the database until pruned
+- Opt-in auto-pruning: when `sessions.auto_prune` is `true`, ended sessions older than `sessions.retention_days` (default 90) are pruned at CLI/gateway startup
+- After a prune that actually removed rows, `state.db` is `VACUUM`ed to reclaim disk space (SQLite does not shrink the file on plain DELETE)
+- Pruning runs at most once per `sessions.min_interval_hours` (default 24); the last-run timestamp is tracked inside `state.db` itself so it's shared across every Hermes process in the same `HERMES_HOME`
+
+Default is **off** — session history is valuable for `session_search` recall, and silently deleting it could surprise users. Enable in `~/.hermes/config.yaml`:
+
+```yaml
+sessions:
+  auto_prune: true          # opt in — default is false
+  retention_days: 90        # keep ended sessions this many days
+  vacuum_after_prune: true  # reclaim disk space after a pruning sweep
+  min_interval_hours: 24    # don't re-run the sweep more often than this
+```
+
+Active sessions are never auto-pruned, regardless of age.

 ### Manual Cleanup

@@ -403,5 +417,5 @@ hermes sessions prune --older-than 30 --yes
 ```

 :::tip
-The database grows slowly (typical: 10-15 MB for hundreds of sessions). Pruning is mainly useful for removing old conversations you no longer need for search recall.
+The database grows slowly (typical: 10-15 MB for hundreds of sessions) and session history powers `session_search` recall across past conversations, so auto-prune ships disabled. Enable it if you're running a heavy gateway/cron workload where `state.db` is meaningfully affecting performance (observed failure mode: 384 MB state.db with ~1000 sessions slowing down FTS5 inserts and `/resume` listing). Use `hermes sessions prune` for one-off cleanup without turning on the automatic sweep.
 :::