feat(kanban): default_assignee fallback + per-profile concurrency cap (#27145 , #21582 ) (#34244 )

Two related dispatcher behaviors that have been missing for a while. ## kanban.default_assignee (#27145) Reporter (@agarzon): dashboard creates a task without an assignee, task parks in 'ready' forever even though the operator's intent ('default') is perfectly clear. The dispatcher already had a 'skipped_unassigned' bucket but no fallback routing — users had to manually type 'default' in the assignee field every time. Behavior: when 'kanban.default_assignee' is set in config.yaml, the dispatcher applies that assignee to any unassigned ready task before deciding whether to spawn. The row is mutated (assignee column + an 'assigned' event with source='kanban.default_assignee' for the audit trail). Empty/whitespace config value = no fallback, preserving the existing skipped_unassigned behavior. Dry-run mode reports what WOULD happen via the new 'auto_assigned_default' bucket on DispatchResult, but does NOT mutate the DB — operators using 'hermes kanban dispatch --dry-run' see the routing decision before committing. ## kanban.max_in_progress_per_profile (#21582) Reporter (@edwardchenchen, @simlu, 4 reactions): fan-out workloads saturate one profile's local model / API quota / browser pool while other profiles sit idle. The existing global 'max_in_progress' caps total workers but doesn't balance across profiles. Behavior: when 'kanban.max_in_progress_per_profile' is set to a positive int, the dispatcher tracks per-assignee running counts (one query at tick start) and refuses to spawn for any assignee already at the cap. Tasks blocked this way go to a new 'skipped_per_profile_capped' bucket on DispatchResult as (task_id, assignee, current_running_count) tuples — NOT an operator-actionable failure, just 'try again next tick when the profile has capacity'. Pre-existing 'running' tasks count against the cap (verified via regression test). The cap respects dry_run mode by incrementing its in-memory counter on each would-be spawn so dry_run reports the same balanced subset that a real tick would. Invalid cap values (0, negative, non-int, None) are treated as 'no cap', preserving the existing behavior. Backward-compatible for installs that don't set the config. ## Surfaces - 'hermes kanban dispatch' CLI now prints 'Auto-assigned to kanban.default_assignee=X: ...' and 'Deferred (X at per-profile cap, N running): ...' lines, plus matching JSON keys in --json output. - Gateway dispatcher logs the configured values at startup ('default_assignee=X', 'max_in_progress_per_profile=N'). - 'kanban.max_in_progress_per_profile' added to DEFAULT_CONFIG with inline docs. ## Validation - tests/hermes_cli/test_kanban_default_assignee.py (6 cases): no-cap baseline, auto-assign + DB mutation, dry-run reports without mutating, whitespace treated as None, explicit assignees untouched, DispatchResult field schema. - tests/hermes_cli/test_kanban_per_profile_cap.py (9 cases including 4 parametrized): no-cap baseline, balanced 2-profile fan-out, pre-existing running counts against cap, invalid cap values (0/-1/'abc'/None), capped tasks dispatched on next tick after running task completes, DispatchResult field schema. - Broader kanban suite: 464/464 pass (was 449 baseline; +15 new regression tests across both features). ## Credit #27145 — Jimmy Johansson reported the dispatcher skipped-unassigned gap; @agarzon scoped the simpler 'honor kanban.default_assignee' fix that matches the existing config knob. #21582 — @edwardchenchen filed the per-profile cap ask after hitting model 429s on fan-out research projects; @simlu confirmed the same pain on local-model setups.
docs(docker): refresh user-guide page for s6-overlay reality
2026-05-28 19:02:55 -07:00 · 2026-05-29 11:55:01 +10:00 · 2026-05-29 11:49:54 +10:00 · 2026-05-29 11:49:54 +10:00 · 2026-05-29 11:49:54 +10:00 · 2026-05-29 11:49:54 +10:00
72 changed files with 6124 additions and 1091 deletions
@@ -0,0 +1,110 @@
+# Hermes Agent v0.15.1 (v2026.5.29)
+
+**Release Date:** May 29, 2026
+**Since v0.15.0:** 28 commits · 21 merged PRs · hotfix release · 9 contributors
+
+> **The Patch Release.** A same-day hotfix for v0.15.0. Headline fix: the dashboard infinite-reload loop that hit anyone running v0.15.0 in loopback mode (Docker, hosted Hermes, fresh installs). A handful of other v0.15.0 follow-ups go along for the ride — kanban worker SIGTERM, `/model` picker unification, `/yolo` session bypass, the full 19,932-entry skills.sh catalog, `.md` media delivery restoration, gateway probe-stepdown safety, web-URL redaction passthrough, kanban worker vision on referenced images, hindsight observation-default. Docker users get an explicit `--insecure` opt-in env var (no more bind-host inference), MCP server bare-command PATH resolution, and arm64 PR-build cache fixes.
+
+---
+
+## ✨ Highlights
+
+- **Dashboard 401 reload loop fixed** — In loopback mode the dashboard's identity probe (`/api/auth/me`) returns 401 by design, but v0.15.0's stale-token reload guard treated every 401 as a rotated session token and full-page-reloaded to pick up a fresh one. Every successful sibling call cleared the one-shot reload guard, so the page reload-looped forever (Firefox: "Navigated to /sessions" storm; Chrome: React re-render storm). Fix adds an `allowUnauthorized` opt-out to `fetchJSON` that skips only the loopback stale-token reload — 401 still throws so `AuthWidget` swallows it, gated-mode `login_url` redirects are unaffected. Closes [#34206](https://github.com/NousResearch/hermes-agent/issues/34206), [#34202](https://github.com/NousResearch/hermes-agent/issues/34202). ([#30698](https://github.com/NousResearch/hermes-agent/pull/30698) — @austinpickett)
+
+- **Docker dashboard `--insecure` is now an explicit env opt-in, never derived from bind host** — Previously the Docker entrypoint inferred `--insecure` when the dashboard bound to a non-loopback host. That conflated "I want LAN access" with "I want to disable the same-origin guard." The fix splits them: bind host is bind host, and disabling the dashboard's loopback auth requires an explicit `HERMES_DASHBOARD_INSECURE=1`. Existing setups that genuinely wanted insecure binding must now set the env var. ([#34188](https://github.com/NousResearch/hermes-agent/pull/34188), [#34204](https://github.com/NousResearch/hermes-agent/pull/34204) — @benbarclay)
+
+- **MCP bare command resolution under Docker** — MCP servers configured with bare commands (`npx`, `npm`, `node`) now resolve against `/usr/local/bin` so they actually launch inside the Docker image where those binaries live. v0.15.0 left these failing silently in containers when the agent's effective PATH didn't include the Node toolchain location. ([#34186](https://github.com/NousResearch/hermes-agent/pull/34186) — @benbarclay)
+
+- **Skills page sidebar / source pills restored** — A stale `useMemo` dependency in the new dashboard skills page collapsed the source pills and category sidebar to "All" only. Fixed; both surfaces now reflect the live catalog state. ([#34194](https://github.com/NousResearch/hermes-agent/pull/34194))
+
+- **Kanban worker can be killed again** — `SIGTERM` on a kanban worker was being absorbed by an intermediate process and the worker stayed running. Closes [#28181](https://github.com/NousResearch/hermes-agent/issues/28181). ([#34045](https://github.com/NousResearch/hermes-agent/pull/34045))
+
+- **Full skills.sh catalog (858 → 19,932 entries)** — The skills hub page was pulling a partial paginated catalog. The fetch now walks the sitemap, so all 19,932 skills.sh entries surface in the picker instead of just the first 858. ([#34025](https://github.com/NousResearch/hermes-agent/pull/34025))
+
+---
+
+## 🐛 Bug Fixes
+
+### Dashboard / Web
+
+- **`/api/auth/me` 401 no longer triggers reload loop** in loopback mode — ([#30698](https://github.com/NousResearch/hermes-agent/pull/30698) — @austinpickett)
+- **Skills page source pills + category sidebar restored** — stale `useMemo` dep ([#34194](https://github.com/NousResearch/hermes-agent/pull/34194))
+
+### Docker
+
+- **`--insecure` is now explicit opt-in via env var**, not derived from bind host ([#34188](https://github.com/NousResearch/hermes-agent/pull/34188) — @benbarclay)
+- **Dashboard test suite repaired** to match the insecure-opt-in fix ([#34204](https://github.com/NousResearch/hermes-agent/pull/34204) — @benbarclay)
+- **arm64 PR builds skip the GHA cache** to avoid cache-thrash on cross-arch builders ([#33704](https://github.com/NousResearch/hermes-agent/pull/33704) — @BROCCOLO1D)
+
+### MCP
+
+- **Bare `npx`/`npm`/`node` resolve against `/usr/local/bin`** for Docker compatibility ([#34186](https://github.com/NousResearch/hermes-agent/pull/34186) — @benbarclay)
+
+### Kanban
+
+- **Worker SIGTERM actually terminates the process** ([#34045](https://github.com/NousResearch/hermes-agent/pull/34045))
+- **Workers receive images referenced in task bodies** for vision-capable models ([#34210](https://github.com/NousResearch/hermes-agent/pull/34210))
+
+### Gateway
+
+- **`.md` files deliver again** — media-delivery validation defaults to denylist-only instead of an overly-narrow allowlist ([#34022](https://github.com/NousResearch/hermes-agent/pull/34022))
+- **Probe stepdown safety** — on a context-overflow without an explicit provider context limit, the agent no longer steps down to a smaller model based on an unknown ceiling (salvage of [#33673](https://github.com/NousResearch/hermes-agent/pull/33673)) ([#33826](https://github.com/NousResearch/hermes-agent/pull/33826))
+
+### CLI
+
+- **`/yolo` mid-session enables the per-session bypass** instead of just toggling the env var (which the running agent had already snapshotted) ([#33931](https://github.com/NousResearch/hermes-agent/pull/33931) — @kshitijk4poor)
+- **`/model` and `hermes model` show the same list**, plus disk cache for picker startup ([#33867](https://github.com/NousResearch/hermes-agent/pull/33867))
+
+### Skills
+
+- **Full skills.sh catalog via sitemap** — 858 → 19,932 entries ([#34025](https://github.com/NousResearch/hermes-agent/pull/34025))
+
+### Redaction
+
+- **Web URLs pass through unchanged** — the redactor was eating query parameters that looked credential-shaped ([#34029](https://github.com/NousResearch/hermes-agent/pull/34029))
+
+---
+
+## ✨ Small Features
+
+- **Hindsight default narrowed to observation-only** for `recall_types` — tool path is also narrowed ([#34079](https://github.com/NousResearch/hermes-agent/pull/34079) — @nicoloboschi, follow-up [#34091](https://github.com/NousResearch/hermes-agent/pull/4df62d239e38bf8c212a595721c9c01e176f6c3a) — @kshitijk4poor)
+- **Memory providers receive completed-turn message context** — salvage of [#28065](https://github.com/NousResearch/hermes-agent/pull/28065) ([#34097](https://github.com/NousResearch/hermes-agent/pull/34097) — @kshitijk4poor, credit to @devwdave)
+
+---
+
+## 📚 Documentation
+
+- **`--no-supervise` / `HERMES_GATEWAY_NO_SUPERVISE` documented** in the reference docs (follow-up to [#33583](https://github.com/NousResearch/hermes-agent/pull/33583)) ([#33751](https://github.com/NousResearch/hermes-agent/pull/33751) — @r266-tech)
+
+---
+
+## 🛠️ Infrastructure
+
+- **Vercel deploy workflow accepts `workflow_dispatch`** so docs deploys can be manually triggered ([#34081](https://github.com/NousResearch/hermes-agent/pull/34081))
+- **`@nous-research/ui` bumped to 0.18.2** (Nix `npmDepsHash` also updated to match) ([#34193](https://github.com/NousResearch/hermes-agent/pull/34193) follow-ups — @austinpickett)
+
+---
+
+## 👥 Contributors
+
+### Core
+- @teknium1
+
+### Community
+- @austinpickett — dashboard 401 reload-loop fix (the headline), `@nous-research/ui` bump, Nix `npmDepsHash` updates
+- @benbarclay — Docker `--insecure` opt-in, MCP bare-command resolution, dashboard test repair
+- @kshitijk4poor — `/yolo` session bypass, completed-turn memory context salvage, hindsight follow-up docs
+- @nicoloboschi — hindsight `recall_types` observation default
+- @BROCCOLO1D — arm64 PR build cache fix
+- @r266-tech — `--no-supervise` reference docs
+- @yangguangjin — probe stepdown safety (salvage of @yanghd's #33673)
+- @devwdave — completed-turn memory context (credited via salvage)
+- @andrewhosf — co-author
+
+### Issue Reporters (the 401 loop)
+- @routesmith ([#34206](https://github.com/NousResearch/hermes-agent/issues/34206))
+- @beeaton ([#34202](https://github.com/NousResearch/hermes-agent/issues/34202))
+
+---
+
+**Full Changelog**: [v2026.5.28...v2026.5.29](https://github.com/NousResearch/hermes-agent/compare/v2026.5.28...v2026.5.29)
@@ -1,7 +1,7 @@
 {
  "id": "hermes-agent",
  "name": "Hermes Agent",
-  "version": "0.15.0",
+  "version": "0.15.1",
  "description": "Self-improving open-source AI agent by Nous Research with ACP editor integration, persistent memory, skills, and rich tool support.",
  "repository": "https://github.com/NousResearch/hermes-agent",
  "website": "https://hermes-agent.nousresearch.com/docs/user-guide/features/acp",
@@ -9,7 +9,7 @@
  "license": "MIT",
  "distribution": {
    "uvx": {
-      "package": "hermes-agent[acp]==0.15.0",
+      "package": "hermes-agent[acp]==0.15.1",
      "args": ["hermes-acp"]
    }
  }
@@ -980,48 +980,6 @@ def _extract_responses_reasoning_text(item: Any) -> str:
    return ""


-def _format_responses_error(error_obj: Any, response_status: str) -> str:
-    """Build a human-readable error string from a Responses ``response.error`` payload.
-
-    The OpenAI Responses API carries failure details under ``response.error``
-    on terminal ``response.failed`` events, in the shape
-    ``{"code": "rate_limit_exceeded", "message": "Slow down", "param": ...}``.
-    Earlier code only surfaced ``message``, which left users staring at bare
-    strings like ``"Slow down"`` while the failure mode (rate limit vs
-    context-length vs internal_error vs model-overloaded) was hidden in
-    ``code``. We now prefix ``code`` when both are present so consumers can
-    distinguish failure modes without parsing the bare message.
-
-    Falls back to ``code`` alone when ``message`` is empty, and to a stable
-    default referencing the response status when no error payload is
-    available at all. Adapted from anomalyco/opencode#28757.
-    """
-    # Pull code and message from either dict or attribute-style payloads.
-    code: Any = None
-    message: Any = None
-    if isinstance(error_obj, dict):
-        code = error_obj.get("code")
-        message = error_obj.get("message")
-    elif error_obj is not None:
-        code = getattr(error_obj, "code", None)
-        message = getattr(error_obj, "message", None)
-
-    code_str = str(code).strip() if isinstance(code, str) else (str(code).strip() if code else "")
-    message_str = str(message).strip() if isinstance(message, str) else (str(message).strip() if message else "")
-
-    if code_str and message_str:
-        return f"{code_str}: {message_str}"
-    if message_str:
-        return message_str
-    if code_str:
-        return code_str
-    if error_obj:
-        # Last-resort: stringify whatever the provider sent so it's at least
-        # visible in logs/UI rather than silently swallowed.
-        return str(error_obj)
-    return f"Responses API returned status '{response_status}'"
-
-
 # ---------------------------------------------------------------------------
 # Full response normalization
 # ---------------------------------------------------------------------------
@@ -1065,7 +1023,10 @@ def _normalize_codex_response(

    if response_status in {"failed", "cancelled"}:
        error_obj = getattr(response, "error", None)
-        error_msg = _format_responses_error(error_obj, response_status)
+        if isinstance(error_obj, dict):
+            error_msg = error_obj.get("message") or str(error_obj)
+        else:
+            error_msg = str(error_obj) if error_obj else f"Responses API returned status '{response_status}'"
        raise RuntimeError(error_msg)

    content_parts: List[str] = []
@@ -37,6 +37,8 @@ from __future__ import annotations
 import base64
 import logging
 import mimetypes
+import os
+import re
 from pathlib import Path
 from typing import Any, Dict, List, Optional, Tuple

@@ -46,6 +48,102 @@ logger = logging.getLogger(__name__)
 _VALID_MODES = frozenset({"auto", "native", "text"})


+# Image extensions used by extract_image_refs(). Kept tight on purpose — we
+# only auto-attach things the model can actually see. Documents/archives are
+# excluded because the gateway's broader extract_local_files() also routes
+# them differently (send_document), and we don't want to attach a PDF as a
+# vision part.
+_IMAGE_EXTS = (
+    ".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp", ".tiff", ".tif", ".heic",
+)
+_IMAGE_EXT_PATTERN = "|".join(e.lstrip(".") for e in _IMAGE_EXTS)
+
+# Absolute / home-relative local image path. Matches the same shape gateway's
+# extract_local_files() uses: anchors to ``~/`` or ``/``, ignores matches inside
+# URLs (the ``(?<![/:\w.])`` lookbehind), and case-insensitive on the extension.
+_LOCAL_IMAGE_PATH_RE = re.compile(
+    r"(?<![/:\w.])(?:~/|/)(?:[\w.\-]+/)*[\w.\-]+\.(?:" + _IMAGE_EXT_PATTERN + r")\b",
+    re.IGNORECASE,
+)
+
+# http(s) URL ending in an image extension (optionally followed by a
+# query string). Case-insensitive on the extension. Strict ``http(s)://``
+# scheme so we don't accidentally grab ``file://`` URLs or other shapes.
+_IMAGE_URL_RE = re.compile(
+    r"https?://[^\s<>\"']+?\.(?:" + _IMAGE_EXT_PATTERN + r")(?:\?[^\s<>\"']*)?",
+    re.IGNORECASE,
+)
+
+
+def extract_image_refs(text: str) -> Tuple[List[str], List[str]]:
+    """Scan free-form text for image references the model should see.
+
+    Returns ``(local_paths, urls)``:
+
+      * ``local_paths`` — absolute (``/``) or home-relative (``~/``) paths
+        whose suffix is an image extension AND whose expanded form exists
+        on disk as a file. Order-preserving, deduplicated.
+      * ``urls`` — ``http(s)://…`` URLs whose path ends in an image
+        extension (a ``?query`` is allowed after the extension).
+        Order-preserving, deduplicated.
+
+    Matches inside fenced code blocks (``` ``` ```) and inline backticks
+    (`` `…` ``) are skipped so that snippets pasted into a task body for
+    reference aren't mistaken for live attachments. This mirrors the
+    behaviour of ``gateway.platforms.base.BaseAdapter.extract_local_files``.
+
+    Local paths are validated against the filesystem; URLs are not
+    (the provider fetches them at request time).
+    """
+    if not isinstance(text, str) or not text:
+        return [], []
+
+    # Build spans covered by fenced code blocks and inline code so we can
+    # ignore references the author embedded purely as example text.
+    code_spans: list[tuple[int, int]] = []
+    for m in re.finditer(r"```[^\n]*\n.*?```", text, re.DOTALL):
+        code_spans.append((m.start(), m.end()))
+    for m in re.finditer(r"`[^`\n]+`", text):
+        code_spans.append((m.start(), m.end()))
+
+    def _in_code(pos: int) -> bool:
+        return any(s <= pos < e for s, e in code_spans)
+
+    local_paths: list[str] = []
+    seen_paths: set[str] = set()
+    for match in _LOCAL_IMAGE_PATH_RE.finditer(text):
+        if _in_code(match.start()):
+            continue
+        raw = match.group(0)
+        expanded = os.path.expanduser(raw)
+        try:
+            if not os.path.isfile(expanded):
+                continue
+        except OSError:
+            # ENAMETOOLONG / EINVAL on pathological inputs — skip rather than crash.
+            continue
+        if expanded in seen_paths:
+            continue
+        seen_paths.add(expanded)
+        local_paths.append(expanded)
+
+    urls: list[str] = []
+    seen_urls: set[str] = set()
+    for match in _IMAGE_URL_RE.finditer(text):
+        if _in_code(match.start()):
+            continue
+        url = match.group(0)
+        # Strip trailing punctuation that's almost certainly prose, not part
+        # of the URL (e.g. "see https://x.com/a.png." or "/a.png)").
+        url = url.rstrip(".,;:!?)]>")
+        if url in seen_urls:
+            continue
+        seen_urls.add(url)
+        urls.append(url)
+
+    return local_paths, urls
+
+
 # Strict YAML/JSON boolean coercion for capability overrides.
 #
 # ``bool("false")`` is True in Python because non-empty strings are truthy, so
@@ -320,20 +418,29 @@ def _file_to_data_url(path: Path) -> Optional[str]:
 def build_native_content_parts(
    user_text: str,
    image_paths: List[str],
+    image_urls: Optional[List[str]] = None,
 ) -> Tuple[List[Dict[str, Any]], List[str]]:
    """Build an OpenAI-style ``content`` list for a user turn.

    Shape:
      [{"type": "text", "text": "...\\n\\n[Image attached at: /local/path]"},
       {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}},
+       {"type": "image_url", "image_url": {"url": "https://example.com/a.png"}},
       ...]

-    The local path of each successfully attached image is appended to the
-    text part as ``[Image attached at: <path>]``. The model still sees the
-    pixels via the ``image_url`` part (full native vision); the path note
-    just gives it a string handle so MCP/skill tools that take an image
-    path or URL argument can be invoked on the same image without an
-    extra round-trip. This parallels the text-mode hint produced by
+    Local paths are read from disk and embedded as base64 ``data:`` URLs.
+    Remote URLs (``http(s)://``) are passed through verbatim — the provider
+    fetches them server-side. The model still sees the pixels either way.
+
+    For each successfully attached image, a hint is appended to the text
+    part:
+
+      * local path → ``[Image attached at: <path>]``
+      * URL        → ``[Image attached: <url>]``
+
+    The hint gives the model a string handle so MCP/skill tools that take
+    an image path or URL argument can be invoked on the same image without
+    an extra round-trip. This parallels the text-mode hint produced by
    ``Runner._enrich_message_with_vision`` (``vision_analyze using image_url:
    <path>``) so behaviour is consistent across both image input modes.

@@ -342,12 +449,14 @@ def build_native_content_parts(
    ceiling), the agent's retry loop transparently shrinks and retries
    once — see ``run_agent._try_shrink_image_parts_in_messages``.

-    Returns (content_parts, skipped_paths). Skipped paths are files that
-    couldn't be read from disk and are NOT advertised in the path hints.
+    Returns (content_parts, skipped). Skipped entries are local paths
+    that couldn't be read from disk; URLs are never skipped (they're
+    not validated here).
    """
    skipped: List[str] = []
    image_parts: List[Dict[str, Any]] = []
    attached_paths: List[str] = []
+    attached_urls: List[str] = []

    for raw_path in image_paths:
        p = Path(raw_path)
@@ -364,16 +473,26 @@ def build_native_content_parts(
        })
        attached_paths.append(str(raw_path))

+    for url in image_urls or []:
+        url = (url or "").strip()
+        if not url:
+            continue
+        image_parts.append({
+            "type": "image_url",
+            "image_url": {"url": url},
+        })
+        attached_urls.append(url)
+
    text = (user_text or "").strip()

    # If at least one image attached, build a single text part that combines
-    # the user's caption (or a neutral default) with one path hint per image.
-    if attached_paths:
+    # the user's caption (or a neutral default) with one hint per image.
+    if attached_paths or attached_urls:
        base_text = text or "What do you see in this image?"
-        path_hints = "\n".join(
-            f"[Image attached at: {p}]" for p in attached_paths
-        )
-        combined_text = f"{base_text}\n\n{path_hints}"
+        hint_lines: List[str] = []
+        hint_lines.extend(f"[Image attached at: {p}]" for p in attached_paths)
+        hint_lines.extend(f"[Image attached: {u}]" for u in attached_urls)
+        combined_text = f"{base_text}\n\n" + "\n".join(hint_lines)
        parts: List[Dict[str, Any]] = [{"type": "text", "text": combined_text}]
        parts.extend(image_parts)
        return parts, skipped
@@ -388,4 +507,5 @@ def build_native_content_parts(
 __all__ = [
    "decide_image_input_mode",
    "build_native_content_parts",
+    "extract_image_refs",
 ]
@@ -31,7 +31,6 @@ import time
 from dataclasses import dataclass, field
 from typing import Any, Callable, Optional

-from agent.codex_responses_adapter import _format_responses_error
 from agent.redact import redact_sensitive_text
 from agent.transports.codex_app_server import (
    CodexAppServerClient,
@@ -582,7 +581,7 @@ class CodexAppServerSession:
                        (note.get("params") or {}).get("turn") or {}
                    ).get("error")
                    if err_obj:
-                        err_msg = _format_responses_error(err_obj, str(turn_status))
+                        err_msg = err_obj.get("message") or str(err_obj)
                        # If the turn failed for an auth/refresh reason,
                        # rewrite the error into a re-auth hint AND mark
                        # the session for retirement.
@@ -576,6 +576,8 @@ def load_cli_config() -> Dict[str, Any]:
        "docker_env": "TERMINAL_DOCKER_ENV",
        "docker_mount_cwd_to_workspace": "TERMINAL_DOCKER_MOUNT_CWD_TO_WORKSPACE",
        "docker_run_as_host_user": "TERMINAL_DOCKER_RUN_AS_HOST_USER",
+        "docker_persist_across_processes": "TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES",
+        "docker_orphan_reaper": "TERMINAL_DOCKER_ORPHAN_REAPER",
        "sandbox_dir": "TERMINAL_SANDBOX_DIR",
        # Persistent shell (non-local backends)
        "persistent_shell": "TERMINAL_PERSISTENT_SHELL",
@@ -15125,13 +15127,50 @@ def main(
    # Handle single query mode
    if query or image:
        query, single_query_images = _collect_query_images(query, image)
+        # Kanban workers spawn with ``hermes chat -q "work kanban task <id>"``;
+        # the actual task description lives in the task body. Mirror the
+        # gateway/CLI behaviour for inbound images by scanning the body for
+        # local image paths and http(s) image URLs and attaching them to the
+        # worker's first turn. Without this, users who paste a screenshot
+        # path or URL into a kanban task body never get it routed to the
+        # model's vision input.
+        single_query_image_urls: list[str] = []
+        _kanban_task_id = os.environ.get("HERMES_KANBAN_TASK", "").strip()
+        if _kanban_task_id:
+            try:
+                from hermes_cli import kanban_db as _kb
+                from agent.image_routing import extract_image_refs as _extract_refs
+
+                _conn = _kb.connect()
+                try:
+                    _task = _kb.get_task(_conn, _kanban_task_id)
+                finally:
+                    try:
+                        _conn.close()
+                    except Exception:
+                        pass
+                _body = getattr(_task, "body", "") if _task is not None else ""
+                if _body:
+                    _kb_paths, _kb_urls = _extract_refs(_body)
+                    if _kb_paths:
+                        # Dedupe against any --image the user already passed.
+                        _seen = {str(p) for p in single_query_images}
+                        for _p in _kb_paths:
+                            if _p not in _seen:
+                                _seen.add(_p)
+                                single_query_images.append(Path(_p))
+                    if _kb_urls:
+                        single_query_image_urls.extend(_kb_urls)
+            except Exception as _exc:
+                # Best-effort enrichment; never block worker startup on it.
+                logger.debug("kanban image-ref extraction failed: %s", _exc)
        if quiet:
            # Quiet mode: suppress banner, spinner, tool previews.
            # Only print the final response and parseable session info.
            cli.tool_progress_mode = "off"
            if cli._ensure_runtime_credentials():
                effective_query: Any = query
-                if single_query_images:
+                if single_query_images or single_query_image_urls:
                    # Honour the same image-routing decision used by the
                    # interactive path. With a vision-capable model (incl.
                    # custom-provider models declared via
@@ -15160,19 +15199,26 @@ def main(
                            _parts, _skipped = _build_parts(
                                query if isinstance(query, str) else "",
                                [str(p) for p in single_query_images],
+                                image_urls=list(single_query_image_urls) or None,
                            )
                            if any(p.get("type") == "image_url" for p in _parts):
                                effective_query = _parts
                            else:
                                # All images unreadable — text fallback.
+                                # ``_preprocess_images_with_vision`` only knows
+                                # about local files; URLs would be lost there,
+                                # so keep the original query text intact when
+                                # only URLs were supplied.
+                                if single_query_images:
+                                    effective_query = cli._preprocess_images_with_vision(
+                                        query, single_query_images, announce=False,
+                                    )
+                        except Exception:
+                            if single_query_images:
                                effective_query = cli._preprocess_images_with_vision(
                                    query, single_query_images, announce=False,
                                )
-                        except Exception:
-                            effective_query = cli._preprocess_images_with_vision(
-                                query, single_query_images, announce=False,
-                            )
-                    else:
+                    elif single_query_images:
                        effective_query = cli._preprocess_images_with_vision(
                            query,
                            single_query_images,
@@ -831,6 +831,8 @@ if _config_path.exists():
                "docker_env": "TERMINAL_DOCKER_ENV",
                "docker_mount_cwd_to_workspace": "TERMINAL_DOCKER_MOUNT_CWD_TO_WORKSPACE",
                "docker_run_as_host_user": "TERMINAL_DOCKER_RUN_AS_HOST_USER",
+                "docker_persist_across_processes": "TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES",
+                "docker_orphan_reaper": "TERMINAL_DOCKER_ORPHAN_REAPER",
                "sandbox_dir": "TERMINAL_SANDBOX_DIR",
                "persistent_shell": "TERMINAL_PERSISTENT_SHELL",
            }
@@ -5418,6 +5420,49 @@ class GatewayRunner:
            )
            stale_timeout_seconds = 0

+        # Read kanban.default_assignee — fallback profile for tasks
+        # created without an explicit assignee (e.g. via the dashboard).
+        # When set, the dispatcher applies it to unassigned ready tasks
+        # instead of skipping them indefinitely (#27145). Empty string
+        # (the schema default) means "no fallback, keep skipping" —
+        # backward-compatible with existing installs.
+        default_assignee = (kanban_cfg.get("default_assignee") or "").strip() or None
+        if default_assignee:
+            logger.info(
+                "kanban dispatcher: default_assignee=%r (unassigned ready tasks "
+                "will route to this profile)",
+                default_assignee,
+            )
+
+        # Read kanban.max_in_progress_per_profile — per-profile concurrency
+        # cap (#21582). When set, no single profile gets more than N
+        # workers running at once, even if the global max_in_progress
+        # would allow it. Prevents one profile's local model / API quota
+        # / browser pool from being overwhelmed by a fan-out.
+        raw_per_profile = kanban_cfg.get("max_in_progress_per_profile", None)
+        max_in_progress_per_profile = None
+        if raw_per_profile is not None:
+            try:
+                max_in_progress_per_profile = int(raw_per_profile)
+            except (TypeError, ValueError):
+                logger.warning(
+                    "kanban dispatcher: invalid kanban.max_in_progress_per_profile=%r; ignoring",
+                    raw_per_profile,
+                )
+                max_in_progress_per_profile = None
+            else:
+                if max_in_progress_per_profile < 1:
+                    logger.warning(
+                        "kanban dispatcher: kanban.max_in_progress_per_profile=%r is below 1; ignoring",
+                        raw_per_profile,
+                    )
+                    max_in_progress_per_profile = None
+                else:
+                    logger.info(
+                        "kanban dispatcher: max_in_progress_per_profile=%d",
+                        max_in_progress_per_profile,
+                    )
+
        # Initial delay so the gateway finishes wiring adapters before the
        # dispatcher spawns workers (those workers may hit gateway notify
        # subscriptions etc.). Matches the notifier watcher's delay.
@@ -5509,6 +5554,8 @@ class GatewayRunner:
                    max_in_progress=max_in_progress,
                    failure_limit=failure_limit,
                    stale_timeout_seconds=stale_timeout_seconds,
+                    default_assignee=default_assignee,
+                    max_in_progress_per_profile=max_in_progress_per_profile,
                )
            except sqlite3.DatabaseError as exc:
                if _is_corrupt_board_db_error(exc):
@@ -14,8 +14,8 @@ Provides subcommands for:
 import os
 import sys

-__version__ = "0.15.0"
-__release_date__ = "2026.5.28"
+__version__ = "0.15.1"
+__release_date__ = "2026.5.29"


 def _ensure_utf8():
@@ -1726,6 +1726,15 @@ DEFAULT_CONFIG = {
        # assignee to any installed profile. When unset, falls back to the
        # default profile. A task never ends up with assignee=None.
        "default_assignee": "",
+        # Per-profile concurrency cap (#21582). When set to a positive int,
+        # no single profile can have more than N workers running at once,
+        # even if the global max_in_progress / max_spawn caps would allow
+        # it. Tasks blocked this way defer to the next dispatcher tick.
+        # Unset (None) means "no per-profile cap" — backward-compatible
+        # with existing installs. Useful for fan-out workflows that would
+        # otherwise saturate one profile's local model / API quota /
+        # browser pool while leaving other profiles idle.
+        "max_in_progress_per_profile": None,
        # When true, the kanban dispatcher auto-runs the decomposer on
        # tasks that land in Triage (every dispatcher tick). When false,
        # decomposition is manual via `hermes kanban decompose <id>` or
@@ -5551,6 +5560,8 @@ def set_config_value(key: str, value: str):
        "terminal.daytona_image": "TERMINAL_DAYTONA_IMAGE",
        "terminal.docker_mount_cwd_to_workspace": "TERMINAL_DOCKER_MOUNT_CWD_TO_WORKSPACE",
        "terminal.docker_run_as_host_user": "TERMINAL_DOCKER_RUN_AS_HOST_USER",
+        "terminal.docker_persist_across_processes": "TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES",
+        "terminal.docker_orphan_reaper": "TERMINAL_DOCKER_ORPHAN_REAPER",
        "terminal.docker_env": "TERMINAL_DOCKER_ENV",
        # terminal.cwd intentionally excluded — CLI resolves at runtime,
        # gateway bridges it in gateway/run.py. Persisting to .env causes
@@ -2087,12 +2087,35 @@ def _cmd_tail(args: argparse.Namespace) -> int:


 def _cmd_dispatch(args: argparse.Namespace) -> int:
+    # Honour kanban.default_assignee as the fallback for unassigned ready
+    # tasks (#27145) and kanban.max_in_progress_per_profile as the
+    # per-profile concurrency cap (#21582). Same semantics as the
+    # gateway dispatch path.
+    try:
+        from hermes_cli.config import load_config
+        _cfg = load_config()
+        _kanban_cfg = _cfg.get("kanban", {}) if isinstance(_cfg, dict) else {}
+        default_assignee = (_kanban_cfg.get("default_assignee") or "").strip() or None
+        _raw_per_profile = _kanban_cfg.get("max_in_progress_per_profile", None)
+        try:
+            max_in_progress_per_profile = (
+                int(_raw_per_profile) if _raw_per_profile is not None else None
+            )
+            if max_in_progress_per_profile is not None and max_in_progress_per_profile < 1:
+                max_in_progress_per_profile = None
+        except (TypeError, ValueError):
+            max_in_progress_per_profile = None
+    except Exception:
+        default_assignee = None
+        max_in_progress_per_profile = None
    with kb.connect_closing() as conn:
        res = kb.dispatch_once(
            conn,
            dry_run=args.dry_run,
            max_spawn=args.max,
            failure_limit=getattr(args, "failure_limit", kb.DEFAULT_SPAWN_FAILURE_LIMIT),
+            default_assignee=default_assignee,
+            max_in_progress_per_profile=max_in_progress_per_profile,
        )
    if getattr(args, "json", False):
        print(json.dumps({
@@ -2108,6 +2131,11 @@ def _cmd_dispatch(args: argparse.Namespace) -> int:
            ],
            "skipped_unassigned": res.skipped_unassigned,
            "skipped_nonspawnable": res.skipped_nonspawnable,
+            "skipped_per_profile_capped": [
+                {"task_id": tid, "assignee": who, "current": current}
+                for (tid, who, current) in res.skipped_per_profile_capped
+            ],
+            "auto_assigned_default": res.auto_assigned_default,
        }, indent=2))
        return 0
    print(f"Reclaimed:    {res.reclaimed}")
@@ -2128,8 +2156,18 @@ def _cmd_dispatch(args: argparse.Namespace) -> int:
    for tid, who, ws in res.spawned:
        tag = " (dry)" if args.dry_run else ""
        print(f"  - {tid}  ->  {who}  @ {ws or '-'}{tag}")
+    if res.auto_assigned_default:
+        print(
+            f"Auto-assigned to kanban.default_assignee={default_assignee!r}: "
+            f"{', '.join(res.auto_assigned_default)}"
+        )
    if res.skipped_unassigned:
        print(f"Skipped (unassigned): {', '.join(res.skipped_unassigned)}")
+    if res.skipped_per_profile_capped:
+        for tid, who, current in res.skipped_per_profile_capped:
+            print(
+                f"Deferred ({who} at per-profile cap, {current} running): {tid}"
+            )
    if res.skipped_nonspawnable:
        print(
            f"Skipped (non-spawnable assignee — terminal lane, OK): "
@@ -4289,6 +4289,12 @@ class DispatchResult:
    skipped_unassigned: list[str] = field(default_factory=list)
    """Ready task ids skipped because they have no assignee at all.
    Operator-actionable — usually a misfiled task waiting for routing."""
+    auto_assigned_default: list[str] = field(default_factory=list)
+    """Task ids that were unassigned in the DB and had
+    ``kanban.default_assignee`` applied this tick before spawning (#27145).
+    Surfaces the auto-assignment to telemetry / CLI / dashboard so the
+    operator can see when the dispatcher is acting on the fallback rule
+    rather than on explicit per-task assignments."""
    skipped_nonspawnable: list[str] = field(default_factory=list)
    """Ready task ids skipped because their assignee names a control-plane
    lane (a Claude Code terminal like ``orion-cc``) rather than a Hermes
@@ -4296,6 +4302,14 @@ class DispatchResult:
    operator-actionable failure. Tracked separately so health telemetry
    can distinguish "real stuck" (nothing spawned but spawnable work
    available) from "correctly idle" (nothing spawnable in the queue)."""
+    skipped_per_profile_capped: list[tuple[str, str, int]] = field(default_factory=list)
+    """Tasks deferred this tick because their assignee is already at
+    ``kanban.max_in_progress_per_profile`` (#21582). Each entry is
+    ``(task_id, assignee, current_running_count)``. NOT an
+    operator-actionable failure — the task will be picked up on a
+    subsequent tick when the assignee has capacity. Separate bucket so
+    telemetry / dashboards can show "this profile is busy" vs
+    "task is genuinely stuck"."""
    crashed: list[str] = field(default_factory=list)
    """Task ids reclaimed because their worker PID disappeared."""
    auto_blocked: list[str] = field(default_factory=list)
@@ -5342,6 +5356,8 @@ def dispatch_once(
    failure_limit: int = DEFAULT_SPAWN_FAILURE_LIMIT,
    stale_timeout_seconds: int = 0,
    board: Optional[str] = None,
+    default_assignee: Optional[str] = None,
+    max_in_progress_per_profile: Optional[int] = None,
 ) -> DispatchResult:
    """Run one dispatcher tick.

@@ -5427,12 +5443,89 @@ def dispatch_once(
        if max_spawn is None or max_spawn > remaining:
            max_spawn = remaining
    spawned = 0
+    # Per-profile concurrency cap (#21582): when set, track how many
+    # workers each assignee already has in flight, and refuse to spawn
+    # when this would push that assignee past the cap. Prevents
+    # fan-out workloads from melting a single profile's local model /
+    # API quota / browser pool while leaving other profiles idle.
+    # Tasks blocked this way go to skipped_per_profile_capped (not
+    # skipped_unassigned — the operator-actionable signal is different:
+    # "this profile is busy, try again later" not "this needs routing").
+    _per_profile_cap = max_in_progress_per_profile if (
+        isinstance(max_in_progress_per_profile, int)
+        and max_in_progress_per_profile > 0
+    ) else None
+    _per_profile_running: dict[str, int] = {}
+    if _per_profile_cap is not None:
+        for prow in conn.execute(
+            "SELECT assignee, COUNT(*) AS n FROM tasks "
+            "WHERE status = 'running' AND assignee IS NOT NULL "
+            "GROUP BY assignee"
+        ):
+            _per_profile_running[prow["assignee"]] = int(prow["n"])
+    # Normalize default_assignee once: empty/whitespace string → None so the
+    # rest of the loop can use ``if default_assignee:`` as a single check.
+    # We also resolve profile_exists once here for the same reason.
+    _default_assignee = (default_assignee or "").strip() or None
+    _default_assignee_resolved = False
+    if _default_assignee:
+        try:
+            from hermes_cli.profiles import profile_exists as _pe
+            _default_assignee_resolved = bool(_pe(_default_assignee))
+        except Exception:
+            # Profiles module not importable (test stubs, exotic envs).
+            # Trust the operator's config and try the assignment; the
+            # downstream profile_exists check on the assigned row will
+            # bucket it as nonspawnable if the profile genuinely isn't
+            # there, with the existing diagnostic.
+            _default_assignee_resolved = True
    for row in ready_rows:
        if max_spawn is not None and running_count + spawned >= max_spawn:
            break
-        if not row["assignee"]:
-            result.skipped_unassigned.append(row["id"])
-            continue
+        row_assignee = row["assignee"]
+        if not row_assignee:
+            # Honour kanban.default_assignee: when the dispatcher hits an
+            # unassigned ready task and an operator-configured fallback
+            # exists, persist the assignment and proceed. This removes the
+            # dashboard footgun where a task created without an assignee
+            # parks in 'ready' forever even though the operator's intent
+            # ("default") was perfectly clear (#27145). Mutating the row
+            # (not just the in-memory view) keeps diagnostics and the
+            # board state consistent: the task is now legitimately owned
+            # by ``kanban.default_assignee``, not "unassigned but secretly
+            # routed".
+            if _default_assignee and _default_assignee_resolved:
+                # Dry-run: show what WOULD happen (auto-assign + spawn) without
+                # mutating the DB. Real run: mutate the row + emit the
+                # 'assigned' event so the board state matches what just happened.
+                if not dry_run:
+                    try:
+                        with write_txn(conn):
+                            conn.execute(
+                                "UPDATE tasks SET assignee = ? WHERE id = ? "
+                                "AND (assignee IS NULL OR assignee = '')",
+                                (_default_assignee, row["id"]),
+                            )
+                            _append_event(
+                                conn, row["id"], "assigned",
+                                {
+                                    "assignee": _default_assignee,
+                                    "source": "kanban.default_assignee",
+                                },
+                            )
+                    except Exception:
+                        _log.debug(
+                            "kanban dispatch: failed to apply default_assignee=%r "
+                            "to task %s",
+                            _default_assignee, row["id"], exc_info=True,
+                        )
+                        result.skipped_unassigned.append(row["id"])
+                        continue
+                row_assignee = _default_assignee
+                result.auto_assigned_default.append(row["id"])
+            else:
+                result.skipped_unassigned.append(row["id"])
+                continue
        # Skip ready tasks whose assignee is not a real Hermes profile.
        # `_default_spawn` invokes ``hermes -p <assignee>`` which fails
        # with "Profile 'X' does not exist" when the assignee names a
@@ -5447,7 +5540,7 @@ def dispatch_once(
            from hermes_cli.profiles import profile_exists  # local import: avoids cycle
        except Exception:
            profile_exists = None  # type: ignore[assignment]
-        if profile_exists is not None and not profile_exists(row["assignee"]):
+        if profile_exists is not None and not profile_exists(row_assignee):
            # Bucket separately from skipped_unassigned: the operator
            # cannot fix this by assigning a profile (the assignee IS the
            # intended owner — a terminal lane). Health telemetry uses
@@ -5456,6 +5549,19 @@ def dispatch_once(
            # of human-pulled work.
            result.skipped_nonspawnable.append(row["id"])
            continue
+        # Per-profile concurrency cap (#21582): even if there's global
+        # headroom, refuse to spawn for an assignee that's already at
+        # its in-flight cap. Prevents one profile's local model / API
+        # quota / browser pool from being overwhelmed by a fan-out
+        # while the global max_in_progress / max_spawn caps still allow
+        # work on OTHER profiles.
+        if _per_profile_cap is not None:
+            current = _per_profile_running.get(row_assignee, 0)
+            if current >= _per_profile_cap:
+                result.skipped_per_profile_capped.append(
+                    (row["id"], row_assignee, current)
+                )
+                continue
        # Respawn guard: refuse to re-spawn when useful work is already
        # in-flight/recent, or when the last failure is a deterministic
        # blocker (quota / auth). The guard defers the spawn this tick so
@@ -5478,7 +5584,15 @@ def dispatch_once(
                    )
            continue
        if dry_run:
-            result.spawned.append((row["id"], row["assignee"], ""))
+            result.spawned.append((row["id"], row_assignee, ""))
+            # Increment per-profile counter even in dry_run so the cap
+            # check sees the would-be spawn on subsequent iterations.
+            # Without this, dry_run reports every task as spawnable and
+            # under-reports the capped subset (#21582).
+            if _per_profile_cap is not None and row_assignee:
+                _per_profile_running[row_assignee] = (
+                    _per_profile_running.get(row_assignee, 0) + 1
+                )
            continue
        claimed = claim_task(conn, row["id"], ttl_seconds=ttl_seconds)
        if claimed is None:
@@ -5521,6 +5635,13 @@ def dispatch_once(
            # complete_task).
            result.spawned.append((claimed.id, claimed.assignee or "", str(workspace)))
            spawned += 1
+            # Track the new in-flight count for this profile so later
+            # iterations in this same tick respect the per-profile cap
+            # (#21582). Subsequent ticks re-query from the DB.
+            if _per_profile_cap is not None and claimed.assignee:
+                _per_profile_running[claimed.assignee] = (
+                    _per_profile_running.get(claimed.assignee, 0) + 1
+                )
        except Exception as exc:
            auto = _record_spawn_failure(
                conn, claimed.id, str(exc),
@@ -4,7 +4,7 @@ let
  src = ../web;
  npmDeps = pkgs.fetchNpmDeps {
    inherit src;
-    hash = "sha256-6qhGuifHVtCeep1SiQdCUxBMr7UGhYpdMTvXhrQu/zA=";
+    hash = "sha256-HV0aISBVjwbGqDj8qQynSxGFrrZDzuYAW3D3lB/x3zo=";
  };

  npm = hermesNpmLib.mkNpmPassthru { folder = "web"; attr = "web"; pname = "hermes-web"; };
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

 [project]
 name = "hermes-agent"
-version = "0.15.0"
+version = "0.15.1"
 description = "The self-improving AI agent — creates skills from experience, improves them during use, and runs anywhere"
 readme = "README.md"
 requires-python = ">=3.11"
@@ -1,11 +1,6 @@
 from types import SimpleNamespace

-import pytest
-
-from agent.codex_responses_adapter import (
-    _format_responses_error,
-    _normalize_codex_response,
-)
+from agent.codex_responses_adapter import _normalize_codex_response


 def test_normalize_codex_response_drops_transient_rs_tmp_reasoning_items():
@@ -66,111 +61,3 @@ def test_normalize_codex_response_treats_summary_only_reasoning_as_incomplete():
    assert assistant_message.content == ""
    assert assistant_message.reasoning == "still thinking"
    assert assistant_message.codex_reasoning_items is None
-
-
-# ---------------------------------------------------------------------------
-# _format_responses_error — adapted from anomalyco/opencode#28757.
-# Provider failures should surface BOTH the code (rate_limit_exceeded /
-# context_length_exceeded / internal_error / server_error) and the message,
-# so consumers can tell rate limits apart from context-length failures and
-# both apart from generic stream drops.
-# ---------------------------------------------------------------------------
-
-
-def test_format_responses_error_combines_code_and_message():
-    err = {"code": "rate_limit_exceeded", "message": "Slow down"}
-    assert _format_responses_error(err, "failed") == "rate_limit_exceeded: Slow down"
-
-
-def test_format_responses_error_message_only():
-    err = {"message": "Upstream model unavailable"}
-    assert _format_responses_error(err, "failed") == "Upstream model unavailable"
-
-
-def test_format_responses_error_code_only_when_message_empty():
-    # Some providers/proxies emit a code with an empty message body. We
-    # used to fall back to ``str(error_obj)`` — a dict dump — which leaked
-    # ``{'code': 'internal_error', 'message': ''}`` into chat output. Now
-    # the bare code is surfaced, which is the meaningful field.
-    err = {"code": "internal_error", "message": ""}
-    assert _format_responses_error(err, "failed") == "internal_error"
-
-
-def test_format_responses_error_code_only_when_message_missing():
-    err = {"code": "server_error"}
-    assert _format_responses_error(err, "failed") == "server_error"
-
-
-def test_format_responses_error_attribute_style_payload():
-    # SDK objects expose ``code``/``message`` as attributes rather than dict
-    # keys. The helper must accept both shapes since the Responses SDK
-    # returns SimpleNamespace-style objects on ``response.failed``.
-    err = SimpleNamespace(code="context_length_exceeded", message="too long")
-    assert _format_responses_error(err, "failed") == "context_length_exceeded: too long"
-
-
-def test_format_responses_error_falls_back_to_status_when_empty():
-    assert (
-        _format_responses_error(None, "failed")
-        == "Responses API returned status 'failed'"
-    )
-    assert (
-        _format_responses_error(None, "cancelled")
-        == "Responses API returned status 'cancelled'"
-    )
-
-
-def test_format_responses_error_stringifies_opaque_payload():
-    # Last-resort: a provider sent something that isn't a dict and has no
-    # code/message attributes. Surface its repr rather than swallow it
-    # silently — at least it's visible in logs.
-    assert _format_responses_error("opaque sentinel", "failed") == "opaque sentinel"
-
-
-def test_format_responses_error_ignores_non_string_code_message():
-    # Defensive: a malformed gateway could send numbers/objects in these
-    # fields. We don't want to crash; we want a best-effort string.
-    err = {"code": 500, "message": None}
-    assert _format_responses_error(err, "failed") == "500"
-
-
-def test_normalize_codex_response_failed_includes_code_in_error():
-    """Regression: response_status == 'failed' should surface the error
-    code, not just the message. Used to leak a bare 'Slow down' string
-    that was indistinguishable from a generic stream truncation."""
-    # ``output`` non-empty so we don't trip the "no output items" guard
-    # before reaching the failed-status branch. Real failed responses
-    # often DO carry a partial message item alongside the error.
-    response = SimpleNamespace(
-        status="failed",
-        output=[
-            SimpleNamespace(
-                type="message",
-                role="assistant",
-                status="incomplete",
-                content=[SimpleNamespace(type="output_text", text="partial")],
-            ),
-        ],
-        error={"code": "rate_limit_exceeded", "message": "Slow down"},
-    )
-    with pytest.raises(RuntimeError, match=r"^rate_limit_exceeded: Slow down$"):
-        _normalize_codex_response(response)
-
-
-def test_normalize_codex_response_failed_with_message_only():
-    """Backwards-compat: a failed response with only a message field
-    (no code) should still surface that message verbatim."""
-    response = SimpleNamespace(
-        status="failed",
-        output=[
-            SimpleNamespace(
-                type="message",
-                role="assistant",
-                status="incomplete",
-                content=[SimpleNamespace(type="output_text", text="partial")],
-            ),
-        ],
-        error={"message": "model error"},
-    )
-    with pytest.raises(RuntimeError, match=r"^model error$"):
-        _normalize_codex_response(response)
@@ -16,6 +16,7 @@ from agent.image_routing import (
    _supports_vision_override,
    build_native_content_parts,
    decide_image_input_mode,
+    extract_image_refs,
 )


@@ -449,3 +450,190 @@ class TestLargeImageHandling:
        assert len(parts) == 2
        assert parts[0]["type"] == "text"
        assert parts[1]["type"] == "image_url"
+
+
+# ─── extract_image_refs ──────────────────────────────────────────────────────
+
+
+class TestExtractImageRefs:
+    """Scan task body / inbound text for image paths and URLs (kanban worker
+    enrichment, issue raised May 2026)."""
+
+    def test_empty_or_none_returns_empty(self):
+        assert extract_image_refs("") == ([], [])
+        assert extract_image_refs(None) == ([], [])  # type: ignore[arg-type]
+
+    def test_finds_absolute_path(self, tmp_path: Path):
+        img = tmp_path / "screenshot.png"
+        img.write_bytes(_png_bytes())
+        body = f"Look at {img} and tell me what's wrong."
+        paths, urls = extract_image_refs(body)
+        assert paths == [str(img)]
+        assert urls == []
+
+    def test_finds_home_relative_path(self, tmp_path: Path, monkeypatch):
+        # Simulate ~/foo.png by pointing HOME at tmp_path and creating the file
+        monkeypatch.setenv("HOME", str(tmp_path))
+        img = tmp_path / "foo.png"
+        img.write_bytes(_png_bytes())
+        paths, urls = extract_image_refs("see ~/foo.png please")
+        assert paths == [str(img)]
+        assert urls == []
+
+    def test_skips_nonexistent_paths(self, tmp_path: Path):
+        # Path-shaped but no file on disk → skipped.
+        body = f"What's at {tmp_path}/never_created.png ?"
+        paths, urls = extract_image_refs(body)
+        assert paths == []
+        assert urls == []
+
+    def test_finds_http_image_url(self):
+        body = "Check out https://example.com/photos/cat.png — cute right?"
+        paths, urls = extract_image_refs(body)
+        assert paths == []
+        assert urls == ["https://example.com/photos/cat.png"]
+
+    def test_finds_https_url_with_query_string(self):
+        body = "Diagram: https://cdn.example.com/img.jpeg?size=large&v=2 here"
+        paths, urls = extract_image_refs(body)
+        assert urls == ["https://cdn.example.com/img.jpeg?size=large&v=2"]
+
+    def test_url_trailing_punctuation_stripped(self):
+        # Prose punctuation right after the URL must not be part of the URL.
+        body = "See https://example.com/a.png."
+        paths, urls = extract_image_refs(body)
+        assert urls == ["https://example.com/a.png"]
+
+    def test_ignores_non_image_urls(self):
+        body = "See https://example.com/page.html and https://x.com/y.pdf"
+        paths, urls = extract_image_refs(body)
+        assert urls == []
+
+    def test_dedupes_paths_and_urls(self, tmp_path: Path):
+        img = tmp_path / "dup.png"
+        img.write_bytes(_png_bytes())
+        body = (
+            f"First {img} then again {img}. "
+            "Also https://example.com/x.png and https://example.com/x.png again."
+        )
+        paths, urls = extract_image_refs(body)
+        assert paths == [str(img)]
+        assert urls == ["https://example.com/x.png"]
+
+    def test_ignores_paths_in_fenced_code_block(self, tmp_path: Path):
+        img = tmp_path / "real.png"
+        img.write_bytes(_png_bytes())
+        body = (
+            "Outside the block, attach this:\n"
+            f"{img}\n"
+            "But not these examples:\n"
+            "```\n"
+            f"some_other_image: /tmp/example.png\n"
+            f"url: https://example.com/example.png\n"
+            "```\n"
+        )
+        paths, urls = extract_image_refs(body)
+        assert paths == [str(img)]
+        assert urls == []
+
+    def test_ignores_paths_in_inline_code(self, tmp_path: Path):
+        img = tmp_path / "real.jpg"
+        img.write_bytes(_png_bytes())
+        body = (
+            f"Attach {img}, but ignore the example "
+            "`https://example.com/skip.png` in backticks."
+        )
+        paths, urls = extract_image_refs(body)
+        assert paths == [str(img)]
+        assert urls == []
+
+    def test_does_not_match_paths_inside_urls(self, tmp_path: Path):
+        # The lookbehind in the regex prevents matching the path-portion of
+        # a URL as a local path. Only the URL should be detected.
+        body = "Just the URL: https://example.com/some/dir/image.png"
+        paths, urls = extract_image_refs(body)
+        assert paths == []
+        assert urls == ["https://example.com/some/dir/image.png"]
+
+    def test_mixed_paths_and_urls(self, tmp_path: Path):
+        img = tmp_path / "local.png"
+        img.write_bytes(_png_bytes())
+        body = (
+            f"Compare local {img} against the design at "
+            "https://example.com/design/v2.png — does it match?"
+        )
+        paths, urls = extract_image_refs(body)
+        assert paths == [str(img)]
+        assert urls == ["https://example.com/design/v2.png"]
+
+    def test_case_insensitive_extension(self, tmp_path: Path):
+        img = tmp_path / "shouty.PNG"
+        img.write_bytes(_png_bytes())
+        body = f"see {img}"
+        paths, urls = extract_image_refs(body)
+        assert paths == [str(img)]
+
+
+# ─── build_native_content_parts with URLs ────────────────────────────────────
+
+
+class TestBuildNativeContentPartsURLs:
+    """URL pass-through support added so kanban task bodies (and other
+    inbound surfaces) can route remote image URLs straight to the model."""
+
+    def test_url_only_no_local_paths(self):
+        parts, skipped = build_native_content_parts(
+            "what is this?",
+            [],
+            image_urls=["https://example.com/diagram.png"],
+        )
+        assert skipped == []
+        assert len(parts) == 2
+        assert parts[0]["type"] == "text"
+        assert "[Image attached: https://example.com/diagram.png]" in parts[0]["text"]
+        assert parts[0]["text"].startswith("what is this?")
+        assert parts[1] == {
+            "type": "image_url",
+            "image_url": {"url": "https://example.com/diagram.png"},
+        }
+
+    def test_mixed_path_and_url(self, tmp_path: Path):
+        img = tmp_path / "local.png"
+        img.write_bytes(_png_bytes())
+        parts, skipped = build_native_content_parts(
+            "compare these",
+            [str(img)],
+            image_urls=["https://example.com/remote.jpg"],
+        )
+        assert skipped == []
+        # 1 text + 2 image parts (local data URL first, then remote URL).
+        image_parts = [p for p in parts if p.get("type") == "image_url"]
+        assert len(image_parts) == 2
+        assert image_parts[0]["image_url"]["url"].startswith("data:image/png;base64,")
+        assert image_parts[1]["image_url"]["url"] == "https://example.com/remote.jpg"
+        text = parts[0]["text"]
+        assert "[Image attached at:" in text
+        assert "[Image attached: https://example.com/remote.jpg]" in text
+
+    def test_empty_url_list_is_no_op(self, tmp_path: Path):
+        img = tmp_path / "x.png"
+        img.write_bytes(_png_bytes())
+        # image_urls=[] should behave the same as not passing it at all.
+        parts_no_urls, _ = build_native_content_parts("hi", [str(img)])
+        parts_empty_urls, _ = build_native_content_parts("hi", [str(img)], image_urls=[])
+        assert parts_no_urls == parts_empty_urls
+
+    def test_blank_url_strings_are_dropped(self):
+        parts, _ = build_native_content_parts(
+            "x", [], image_urls=["", "  ", "https://example.com/a.png"]
+        )
+        image_parts = [p for p in parts if p.get("type") == "image_url"]
+        assert len(image_parts) == 1
+        assert image_parts[0]["image_url"]["url"] == "https://example.com/a.png"
+
+    def test_url_only_inserts_default_prompt_when_text_empty(self):
+        parts, _ = build_native_content_parts(
+            "", [], image_urls=["https://example.com/a.png"]
+        )
+        assert parts[0]["type"] == "text"
+        assert parts[0]["text"].startswith("What do you see in this image?")
@@ -227,6 +227,8 @@ _HERMES_BEHAVIORAL_VARS = frozenset({
    "TERMINAL_CONTAINER_DISK",
    "TERMINAL_CONTAINER_MEMORY",
    "TERMINAL_CONTAINER_PERSISTENT",
+    "TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES",
+    "TERMINAL_DOCKER_ORPHAN_REAPER",
    "TERMINAL_DOCKER_RUN_AS_HOST_USER",
    "BROWSER_CDP_URL",
    "CAMOFOX_URL",
@@ -88,7 +88,15 @@ def test_dashboard_slot_reports_up_when_enabled(
    """Symmetry: with HERMES_DASHBOARD=1, s6-svstat reports the slot as up."""
    subprocess.run(
        ["docker", "run", "-d", "--name", container_name,
-         "-e", "HERMES_DASHBOARD=1", built_image, "sleep", "120"],
+         "-e", "HERMES_DASHBOARD=1",
+         # The default dashboard host is 0.0.0.0, which now engages the
+         # OAuth auth gate. Without a provider registered (no
+         # HERMES_DASHBOARD_OAUTH_CLIENT_ID in this test env), start_server
+         # would fail closed and the slot would never come up. Pin the
+         # explicit insecure opt-in to keep this test focused on the s6
+         # supervision contract, not the auth gate.
+         "-e", "HERMES_DASHBOARD_INSECURE=1",
+         built_image, "sleep", "120"],
        check=True, capture_output=True, timeout=30,
    )
    # uvicorn takes a moment to bind; poll svstat.
@@ -113,7 +121,12 @@ def test_dashboard_opt_in_starts(
    """With HERMES_DASHBOARD=1, a dashboard process should be visible."""
    subprocess.run(
        ["docker", "run", "-d", "--name", container_name,
-         "-e", "HERMES_DASHBOARD=1", built_image, "sleep", "120"],
+         "-e", "HERMES_DASHBOARD=1",
+         # Default bind is 0.0.0.0; pin insecure opt-in so the auth gate
+         # doesn't fail-closed before the process can come up. See
+         # test_dashboard_slot_reports_up_when_enabled for the full rationale.
+         "-e", "HERMES_DASHBOARD_INSECURE=1",
+         built_image, "sleep", "120"],
        check=True, capture_output=True, timeout=30,
    )
    # Poll for the dashboard subprocess to appear — the entrypoint
@@ -132,6 +145,10 @@ def test_dashboard_port_override(
    subprocess.run(
        ["docker", "run", "-d", "--name", container_name,
         "-e", "HERMES_DASHBOARD=1", "-e", "HERMES_DASHBOARD_PORT=9120",
+         # Default bind is 0.0.0.0; pin insecure opt-in so the auth gate
+         # doesn't fail-closed before the port is bound. See
+         # test_dashboard_slot_reports_up_when_enabled for the full rationale.
+         "-e", "HERMES_DASHBOARD_INSECURE=1",
         built_image, "sleep", "120"],
        check=True, capture_output=True, timeout=30,
    )
@@ -161,7 +178,13 @@ def test_dashboard_restarts_after_crash(
    """
    subprocess.run(
        ["docker", "run", "-d", "--name", container_name,
-         "-e", "HERMES_DASHBOARD=1", built_image, "sleep", "120"],
+         "-e", "HERMES_DASHBOARD=1",
+         # Default bind is 0.0.0.0; pin insecure opt-in so the auth gate
+         # doesn't fail-closed before the supervised dashboard can come up.
+         # See test_dashboard_slot_reports_up_when_enabled for the full
+         # rationale.
+         "-e", "HERMES_DASHBOARD_INSECURE=1",
+         built_image, "sleep", "120"],
        check=True, capture_output=True, timeout=30,
    )
    # Wait for the first dashboard to come up.
@@ -214,36 +237,67 @@ def test_dashboard_restarts_after_crash(
 # ---------------------------------------------------------------------------


-def _fetch_api_status(container: str, *, deadline_s: float = 60.0) -> dict:
-    """Poll ``/api/status`` from inside the container via the venv python.
+def _http_probe(
+    container: str,
+    path: str,
+    *,
+    deadline_s: float = 60.0,
+) -> tuple[int, str]:
+    """Poll ``http://127.0.0.1:9119<path>`` from inside the container.

-    The dashboard binds to ``HERMES_DASHBOARD_HOST`` (typically ``0.0.0.0``)
-    so loopback inside the container works. The image doesn't ship
-    ``curl`` but Python's stdlib ``urllib`` is good enough.
+    Returns ``(status_code, body)`` as soon as the dashboard answers any
+    HTTP response — 200, 401, 503, anything. The image doesn't ship
+    ``curl`` but the venv's stdlib ``urllib`` is good enough; we use a
+    proper ``try``/``except`` to intercept ``HTTPError`` because
+    ``urlopen`` raises on 4xx/5xx, and we treat those as legitimate
+    responses (the OAuth gate's 401 IS the success signal for the
+    gate-engaged test).

-    Returns the decoded JSON dict on success; raises AssertionError on
-    timeout.
+    Connection errors (uvicorn still starting, fail-closed exited) keep
+    the poll loop running until ``deadline_s`` elapses.
+
+    The probe Python program is fed over stdin (``python -``) rather
+    than ``python -c`` so we can use proper multi-line syntax with
+    ``try``/``except`` blocks without escaping hell.
+
+    Raises ``AssertionError`` on timeout.
    """
+    py_program = f"""\
+import urllib.request, urllib.error
+req = urllib.request.Request("http://127.0.0.1:9119{path}")
+try:
+    r = urllib.request.urlopen(req, timeout=5)
+    print(r.status)
+    print(r.read().decode(), end="")
+except urllib.error.HTTPError as h:
+    print(h.code)
+    print(h.read().decode(), end="")
+"""
+    # Feed the program over stdin via a heredoc so docker_exec_sh's
+    # single bash string stays clean. The 'PY' delimiter is quoted to
+    # disable shell expansion inside the heredoc body.
    probe = (
-        "/opt/hermes/.venv/bin/python -c "
-        "'import json,urllib.request as u;"
-        "print(u.urlopen(\"http://127.0.0.1:9119/api/status\",timeout=5)"
-        ".read().decode())'"
+        "/opt/hermes/.venv/bin/python - <<'PY'\n"
+        f"{py_program}"
+        "PY"
    )
    end = time.monotonic() + deadline_s
    last_err = ""
    while time.monotonic() < end:
        r = docker_exec_sh(container, probe, timeout=10)
        if r.returncode == 0 and r.stdout.strip():
+            lines = r.stdout.split("\n", 1)
            try:
-                return json.loads(r.stdout)
-            except (ValueError, json.JSONDecodeError) as exc:  # noqa: F841
-                last_err = f"json parse: {exc!r} / stdout={r.stdout!r}"
+                status = int(lines[0].strip())
+                body = lines[1] if len(lines) > 1 else ""
+                return status, body
+            except (ValueError, IndexError) as exc:
+                last_err = f"parse: {exc!r} / stdout={r.stdout!r}"
        else:
            last_err = f"rc={r.returncode} stderr={r.stderr!r}"
        time.sleep(0.5)
    raise AssertionError(
-        f"/api/status never returned valid JSON within {deadline_s}s; "
+        f"Probe of {path} never returned HTTP within {deadline_s}s; "
        f"last error: {last_err}"
    )

@@ -263,6 +317,17 @@ def test_dashboard_oauth_gate_engages_on_non_loopback_bind(
    flipped ``--insecure`` on for any non-loopback bind, which routed
    ``start_server`` straight back into the legacy ``allow_public=True``
    branch and disabled the gate every time.
+
+    We verify two independent observable consequences of the gate being
+    on:
+
+    1. ``/api/auth/providers`` (publicly reachable through the gate so
+       the login page can bootstrap) returns 200 with ``nous`` in the
+       provider list — proves the bundled provider registered.
+    2. ``/api/status`` (a public endpoint under the legacy
+       ``_SESSION_TOKEN`` middleware) returns 401 — proves the OAuth gate
+       runs upstream of the legacy public list and is actively
+       intercepting unauthenticated callers.
    """
    subprocess.run(
        ["docker", "run", "-d", "--name", container_name,
@@ -272,15 +337,27 @@ def test_dashboard_oauth_gate_engages_on_non_loopback_bind(
         built_image, "sleep", "120"],
        check=True, capture_output=True, timeout=30,
    )
-    status = _fetch_api_status(container_name)
-    assert status.get("auth_required") is True, (
-        "OAuth gate must be engaged on 0.0.0.0 bind when a provider is "
-        "registered and HERMES_DASHBOARD_INSECURE is unset. Got: "
-        f"{status!r}"
+
+    # (1) Provider registry visible via the public bootstrap endpoint.
+    status_code, body = _http_probe(container_name, "/api/auth/providers")
+    assert status_code == 200, (
+        f"/api/auth/providers should return 200 when a provider is "
+        f"registered; got {status_code} body={body!r}"
    )
-    assert "nous" in status.get("auth_providers", []), (
+    payload = json.loads(body)
+    provider_names = [p.get("name") for p in payload.get("providers", [])]
+    assert "nous" in provider_names, (
        "Bundled dashboard_auth/nous provider should register when "
-        f"HERMES_DASHBOARD_OAUTH_CLIENT_ID is set. Got: {status!r}"
+        f"HERMES_DASHBOARD_OAUTH_CLIENT_ID is set. Got: {payload!r}"
+    )
+
+    # (2) /api/status is gated by the OAuth middleware → unauthenticated
+    # callers get 401, not the legacy public 200 JSON.
+    status_code, body = _http_probe(container_name, "/api/status")
+    assert status_code == 401, (
+        "OAuth gate must intercept /api/status on 0.0.0.0 bind when a "
+        "provider is registered and HERMES_DASHBOARD_INSECURE is unset. "
+        f"Got: status={status_code} body={body!r}"
    )


@@ -291,6 +368,10 @@ def test_dashboard_insecure_env_var_opts_out_of_gate(
    for operators running on trusted LANs behind a reverse proxy without
    the OAuth contract. Same opt-out shape as the rest of the s6 boolean
    envs (``HERMES_DASHBOARD``, ``HERMES_DASHBOARD_TUI``).
+
+    With the gate off, ``/api/status`` (a public endpoint under the
+    legacy ``_SESSION_TOKEN`` middleware) returns 200 with the
+    ``auth_required: false`` body — proves the gate is bypassed.
    """
    subprocess.run(
        ["docker", "run", "-d", "--name", container_name,
@@ -300,7 +381,12 @@ def test_dashboard_insecure_env_var_opts_out_of_gate(
         built_image, "sleep", "120"],
        check=True, capture_output=True, timeout=30,
    )
-    status = _fetch_api_status(container_name)
+    status_code, body = _http_probe(container_name, "/api/status")
+    assert status_code == 200, (
+        f"/api/status should return 200 with the auth gate disabled; "
+        f"got {status_code} body={body!r}"
+    )
+    status = json.loads(body)
    assert status.get("auth_required") is False, (
        "HERMES_DASHBOARD_INSECURE=1 must disable the auth gate (explicit "
        f"opt-in for trusted-LAN deployments). Got: {status!r}"
@@ -0,0 +1,154 @@
+"""Regression tests for #27145 — kanban.default_assignee for unassigned ready tasks.
+
+When the dispatcher hits an unassigned ready task and ``kanban.default_assignee``
+is set, the dispatcher applies the assignment and spawns. Without the config,
+the task is skipped (existing behavior preserved).
+"""
+from __future__ import annotations
+
+import json
+import os
+import sys
+import tempfile
+
+import pytest
+
+
+@pytest.fixture()
+def isolated_kanban_home(monkeypatch):
+    """Spin up a fresh HERMES_HOME with a clean kanban DB."""
+    test_home = tempfile.mkdtemp(prefix="kanban_default_assignee_test_")
+    monkeypatch.setenv("HERMES_HOME", test_home)
+    # Force-reimport so the fresh HERMES_HOME is picked up.
+    for mod in list(sys.modules.keys()):
+        if mod.startswith("hermes_cli") or mod.startswith("hermes_state") or mod == "hermes_constants":
+            del sys.modules[mod]
+    from hermes_cli import kanban_db
+    yield kanban_db, test_home
+    # Cleanup is best-effort; tempfile dir survives but pytest isolation
+    # gives each test its own monkeypatched HERMES_HOME so no cross-test
+    # contamination.
+
+
+def _fake_spawn(*args, **kwargs):
+    """Stand-in for the real worker spawn — returns a fake PID."""
+    return 12345
+
+
+def test_unassigned_task_skipped_without_default_assignee(isolated_kanban_home):
+    """Baseline: with no default_assignee, an unassigned ready task is
+    skipped via the existing `skipped_unassigned` bucket and the DB row
+    is untouched."""
+    kb, _home = isolated_kanban_home
+    with kb.connect_closing() as conn:
+        kb.create_board(slug="default", name="Test")
+        task_id = kb.create_task(conn, title="t1", assignee=None)
+    with kb.connect_closing() as conn:
+        res = kb.dispatch_once(conn, spawn_fn=_fake_spawn, dry_run=False)
+    assert res.skipped_unassigned == [task_id]
+    assert not res.auto_assigned_default
+    assert not res.spawned
+    with kb.connect_closing() as conn:
+        row = conn.execute("SELECT assignee FROM tasks WHERE id = ?", (task_id,)).fetchone()
+    assert row["assignee"] is None
+
+
+def test_unassigned_task_auto_assigned_with_default_assignee(isolated_kanban_home):
+    """Core #27145 contract: with default_assignee set, an unassigned ready
+    task gets the assignment applied and dispatched on the same tick. The
+    DB row is mutated (assignee column + an 'assigned' event)."""
+    kb, _home = isolated_kanban_home
+    with kb.connect_closing() as conn:
+        kb.create_board(slug="default", name="Test")
+        task_id = kb.create_task(conn, title="t1", assignee=None)
+    with kb.connect_closing() as conn:
+        res = kb.dispatch_once(
+            conn, spawn_fn=_fake_spawn, dry_run=False,
+            default_assignee="default",
+        )
+    assert res.auto_assigned_default == [task_id]
+    assert not res.skipped_unassigned
+    assert len(res.spawned) == 1
+    assert res.spawned[0][0] == task_id
+    assert res.spawned[0][1] == "default"
+
+    with kb.connect_closing() as conn:
+        row = conn.execute("SELECT assignee FROM tasks WHERE id = ?", (task_id,)).fetchone()
+    assert row["assignee"] == "default"
+
+    # 'assigned' event emitted for the audit trail
+    with kb.connect_closing() as conn:
+        evs = list(conn.execute(
+            "SELECT kind, payload FROM task_events WHERE task_id = ? AND kind = 'assigned'",
+            (task_id,),
+        ))
+    assert len(evs) == 1
+    payload = json.loads(evs[0][1])
+    assert payload["assignee"] == "default"
+    assert payload["source"] == "kanban.default_assignee"
+
+
+def test_dry_run_with_default_assignee_reports_without_mutating(isolated_kanban_home):
+    """Dry-run mode: reports what WOULD happen (task in auto_assigned_default,
+    spawn entry) but does NOT mutate the DB. Operators using
+    `hermes kanban dispatch --dry-run` see the routing decision before
+    committing."""
+    kb, _home = isolated_kanban_home
+    with kb.connect_closing() as conn:
+        kb.create_board(slug="default", name="Test")
+        task_id = kb.create_task(conn, title="t1", assignee=None)
+    with kb.connect_closing() as conn:
+        res = kb.dispatch_once(
+            conn, spawn_fn=_fake_spawn, dry_run=True,
+            default_assignee="default",
+        )
+    assert res.auto_assigned_default == [task_id]
+    assert len(res.spawned) == 1
+    with kb.connect_closing() as conn:
+        row = conn.execute("SELECT assignee FROM tasks WHERE id = ?", (task_id,)).fetchone()
+    # DB unchanged — dry_run did not commit the assignment.
+    assert row["assignee"] is None
+
+
+def test_whitespace_default_assignee_treated_as_none(isolated_kanban_home):
+    """Empty / whitespace-only default_assignee values must be treated as
+    'no fallback set' so a misconfigured kanban.default_assignee=' '
+    doesn't surprise operators by silently routing unassigned tasks."""
+    kb, _home = isolated_kanban_home
+    with kb.connect_closing() as conn:
+        kb.create_board(slug="default", name="Test")
+        task_id = kb.create_task(conn, title="t1", assignee=None)
+    with kb.connect_closing() as conn:
+        res = kb.dispatch_once(
+            conn, spawn_fn=_fake_spawn, dry_run=False,
+            default_assignee="   ",
+        )
+    assert task_id in res.skipped_unassigned
+    assert not res.auto_assigned_default
+
+
+def test_explicitly_assigned_task_untouched_by_default_assignee(isolated_kanban_home):
+    """A task with an explicit assignee must NOT be touched by the
+    default_assignee logic — that fallback only applies to genuinely
+    unassigned rows."""
+    kb, _home = isolated_kanban_home
+    with kb.connect_closing() as conn:
+        kb.create_board(slug="default", name="Test")
+        task_id = kb.create_task(conn, title="t1", assignee="default")
+    with kb.connect_closing() as conn:
+        res = kb.dispatch_once(
+            conn, spawn_fn=_fake_spawn, dry_run=False,
+            default_assignee="someother",
+        )
+    assert task_id not in res.auto_assigned_default
+    assert any(s[0] == task_id and s[1] == "default" for s in res.spawned)
+
+
+def test_dispatch_result_has_auto_assigned_default_field():
+    """Schema-level invariant: DispatchResult exposes the
+    auto_assigned_default field so CLI / dashboard / gateway can surface
+    the new routing decisions."""
+    from hermes_cli.kanban_db import DispatchResult
+    r = DispatchResult()
+    assert hasattr(r, "auto_assigned_default")
+    assert r.auto_assigned_default == []
@@ -0,0 +1,167 @@
+"""Regression tests for #21582 — per-profile concurrency cap in dispatcher.
+
+When ``kanban.max_in_progress_per_profile`` is set, no single profile
+gets more than N workers running at once even if the global
+``max_in_progress`` cap would allow it. Prevents one profile's local
+model / API quota / browser pool from being overwhelmed by a fan-out.
+"""
+from __future__ import annotations
+
+import os
+import sys
+import tempfile
+
+import pytest
+
+
+@pytest.fixture()
+def isolated_kanban_home_with_profiles(monkeypatch):
+    """Spin up a fresh HERMES_HOME with kanban DB + alpha/beta profiles."""
+    test_home = tempfile.mkdtemp(prefix="kanban_per_profile_cap_test_")
+    for prof in ("alpha", "beta", "default"):
+        os.makedirs(os.path.join(test_home, "profiles", prof), exist_ok=True)
+    monkeypatch.setenv("HERMES_HOME", test_home)
+    for mod in list(sys.modules.keys()):
+        if mod.startswith("hermes_cli") or mod.startswith("hermes_state") or mod == "hermes_constants":
+            del sys.modules[mod]
+    from hermes_cli import kanban_db
+    yield kanban_db
+
+
+def _fake_spawn(*args, **kwargs):
+    return 12345
+
+
+def test_no_cap_all_tasks_dispatched(isolated_kanban_home_with_profiles):
+    """Baseline: with no per-profile cap, all ready tasks dispatch."""
+    kb = isolated_kanban_home_with_profiles
+    with kb.connect_closing() as conn:
+        kb.create_board(slug="default", name="Test")
+        for i in range(5):
+            kb.create_task(conn, title=f"a{i}", assignee="alpha")
+        for i in range(3):
+            kb.create_task(conn, title=f"b{i}", assignee="beta")
+    with kb.connect_closing() as conn:
+        res = kb.dispatch_once(conn, spawn_fn=_fake_spawn, dry_run=True)
+    assert len(res.spawned) == 8
+    assert not res.skipped_per_profile_capped
+
+
+def test_cap_2_balances_two_profiles(isolated_kanban_home_with_profiles):
+    """With cap=2: 2 alpha + 2 beta dispatched; remaining 3 alpha + 1 beta
+    deferred to skipped_per_profile_capped."""
+    kb = isolated_kanban_home_with_profiles
+    with kb.connect_closing() as conn:
+        kb.create_board(slug="default", name="Test")
+        for i in range(5):
+            kb.create_task(conn, title=f"a{i}", assignee="alpha")
+        for i in range(3):
+            kb.create_task(conn, title=f"b{i}", assignee="beta")
+    with kb.connect_closing() as conn:
+        res = kb.dispatch_once(
+            conn, spawn_fn=_fake_spawn, dry_run=True,
+            max_in_progress_per_profile=2,
+        )
+    spawn_assignees = [s[1] for s in res.spawned]
+    capped_assignees = [c[1] for c in res.skipped_per_profile_capped]
+    assert spawn_assignees.count("alpha") == 2
+    assert spawn_assignees.count("beta") == 2
+    assert capped_assignees.count("alpha") == 3
+    assert capped_assignees.count("beta") == 1
+
+
+def test_pre_existing_running_counts_against_cap(isolated_kanban_home_with_profiles):
+    """A task already in 'running' status when dispatch_once starts counts
+    toward the per-profile cap. With 1 alpha pre-running and cap=1, NO new
+    alpha tasks should spawn; beta is independent so 1 beta spawns."""
+    kb = isolated_kanban_home_with_profiles
+    with kb.connect_closing() as conn:
+        kb.create_board(slug="default", name="Test")
+        running_alpha = kb.create_task(conn, title="running alpha", assignee="alpha")
+        with kb.write_txn(conn):
+            conn.execute(
+                "UPDATE tasks SET status = 'running', claim_lock = 'test:1' WHERE id = ?",
+                (running_alpha,),
+            )
+        for i in range(2):
+            kb.create_task(conn, title=f"a{i}", assignee="alpha")
+        for i in range(2):
+            kb.create_task(conn, title=f"b{i}", assignee="beta")
+    with kb.connect_closing() as conn:
+        res = kb.dispatch_once(
+            conn, spawn_fn=_fake_spawn, dry_run=True,
+            max_in_progress_per_profile=1,
+        )
+    spawn_assignees = [s[1] for s in res.spawned]
+    capped_assignees = [c[1] for c in res.skipped_per_profile_capped]
+    assert spawn_assignees.count("alpha") == 0
+    assert spawn_assignees.count("beta") == 1
+    assert capped_assignees.count("alpha") == 2
+    assert capped_assignees.count("beta") == 1
+
+
+@pytest.mark.parametrize("cap", [0, -1, "abc", None])
+def test_invalid_cap_treated_as_no_cap(isolated_kanban_home_with_profiles, cap):
+    """Cap values that don't represent a positive int should be treated as
+    'no cap' — silently falling through rather than crashing the dispatcher."""
+    kb = isolated_kanban_home_with_profiles
+    with kb.connect_closing() as conn:
+        kb.create_board(slug="default", name="Test")
+        for i in range(3):
+            kb.create_task(conn, title=f"a{i}", assignee="alpha")
+    with kb.connect_closing() as conn:
+        res = kb.dispatch_once(
+            conn, spawn_fn=_fake_spawn, dry_run=True,
+            max_in_progress_per_profile=cap,
+        )
+    assert not res.skipped_per_profile_capped
+    assert len(res.spawned) == 3
+
+
+def test_capped_tasks_dispatched_on_subsequent_tick(isolated_kanban_home_with_profiles):
+    """A task deferred this tick because its profile was at cap should be
+    eligible for dispatch on the next tick (after running tasks complete).
+    This verifies the cap is per-tick state, not a permanent block."""
+    kb = isolated_kanban_home_with_profiles
+    with kb.connect_closing() as conn:
+        kb.create_board(slug="default", name="Test")
+        ids = [kb.create_task(conn, title=f"a{i}", assignee="alpha") for i in range(3)]
+
+    # First tick: cap=1, only 1 alpha dispatched
+    with kb.connect_closing() as conn:
+        res1 = kb.dispatch_once(
+            conn, spawn_fn=_fake_spawn, dry_run=False,
+            max_in_progress_per_profile=1,
+        )
+    assert len(res1.spawned) == 1
+    assert len(res1.skipped_per_profile_capped) == 2
+
+    # Simulate the running task completing — set it back to done so the
+    # 'running' count drops
+    spawned_id = res1.spawned[0][0]
+    with kb.connect_closing() as conn:
+        with kb.write_txn(conn):
+            conn.execute(
+                "UPDATE tasks SET status = 'done', claim_lock = NULL WHERE id = ?",
+                (spawned_id,),
+            )
+
+    # Second tick: 1 more alpha should now dispatch
+    with kb.connect_closing() as conn:
+        res2 = kb.dispatch_once(
+            conn, spawn_fn=_fake_spawn, dry_run=False,
+            max_in_progress_per_profile=1,
+        )
+    assert len(res2.spawned) == 1
+    assert len(res2.skipped_per_profile_capped) == 1
+    assert res2.spawned[0][0] != spawned_id  # different task this time
+
+
+def test_dispatch_result_has_skipped_per_profile_capped_field():
+    """Schema-level invariant: DispatchResult exposes the
+    skipped_per_profile_capped field as a list of
+    (task_id, assignee, current_running) tuples."""
+    from hermes_cli.kanban_db import DispatchResult
+    r = DispatchResult()
+    assert hasattr(r, "skipped_per_profile_capped")
+    assert r.skipped_per_profile_capped == []
@@ -0,0 +1,238 @@
+"""Worker-side image enrichment for kanban tasks.
+
+When a kanban task body contains a local image path or an ``http(s)://``
+image URL, the worker must surface that image to the model on its first
+user turn — matching the CLI/gateway behaviour for inbound images.
+
+The dispatcher spawns the worker as
+``hermes -p <profile> chat -q "work kanban task <id>"``. The task body
+itself never appears in argv; the worker has to read it from the kanban
+DB during startup. These tests cover the round-trip:
+
+  task body  →  kanban_db.get_task  →  extract_image_refs  →
+  build_native_content_parts  →  multimodal user turn
+"""
+from __future__ import annotations
+
+import base64
+from pathlib import Path
+
+import pytest
+
+from hermes_cli import kanban_db as kb
+from agent.image_routing import (
+    build_native_content_parts,
+    extract_image_refs,
+)
+
+
+# Tiny 1×1 transparent PNG used to back any path the tests stick into a
+# task body. extract_image_refs validates the path exists on disk, so the
+# byte content has to be a real readable file (any image bytes will do).
+_PNG = base64.b64decode(
+    "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR4nGNgYGBgAAAABQABpfZFQAAAAABJRU5ErkJggg=="
+)
+
+
+@pytest.fixture
+def kanban_home(tmp_path: Path, monkeypatch):
+    """Isolated HERMES_HOME with a fresh kanban DB for each test."""
+    home = tmp_path / ".hermes"
+    home.mkdir()
+    monkeypatch.setenv("HERMES_HOME", str(home))
+    monkeypatch.setattr(Path, "home", lambda: tmp_path)
+    kb.init_db()
+    return home
+
+
+def _add_task_with_body(body: str, *, title: str = "Look at this") -> str:
+    conn = kb.connect()
+    try:
+        task_id = kb.create_task(
+            conn,
+            title=title,
+            body=body,
+            assignee="worker-a",
+            tenant=None,
+        )
+    finally:
+        conn.close()
+    return task_id
+
+
+def _read_body(task_id: str) -> str:
+    conn = kb.connect()
+    try:
+        task = kb.get_task(conn, task_id)
+        return (task.body if task is not None else "") or ""
+    finally:
+        conn.close()
+
+
+class TestExtractFromTaskBody:
+    """Read a real kanban task body and run it through extract_image_refs."""
+
+    def test_local_path_in_body_round_trips(self, kanban_home, tmp_path):
+        img = tmp_path / "screenshot.png"
+        img.write_bytes(_PNG)
+        tid = _add_task_with_body(
+            f"Please review the screenshot at {img} and confirm "
+            "the alignment is right."
+        )
+
+        body = _read_body(tid)
+        paths, urls = extract_image_refs(body)
+        assert paths == [str(img)]
+        assert urls == []
+
+    def test_url_in_body_round_trips(self, kanban_home):
+        tid = _add_task_with_body(
+            "The design lives at https://example.com/mock/v3.png — "
+            "make the implementation match it."
+        )
+
+        body = _read_body(tid)
+        paths, urls = extract_image_refs(body)
+        assert paths == []
+        assert urls == ["https://example.com/mock/v3.png"]
+
+    def test_mixed_path_and_url_in_body(self, kanban_home, tmp_path):
+        img = tmp_path / "current.png"
+        img.write_bytes(_PNG)
+        tid = _add_task_with_body(
+            f"Compare the current screenshot {img} against the design at "
+            "https://example.com/target.png and write a diff."
+        )
+
+        body = _read_body(tid)
+        paths, urls = extract_image_refs(body)
+        assert paths == [str(img)]
+        assert urls == ["https://example.com/target.png"]
+
+    def test_body_without_images_yields_nothing(self, kanban_home):
+        tid = _add_task_with_body(
+            "Refactor the auth module to use the new session helper."
+        )
+
+        body = _read_body(tid)
+        paths, urls = extract_image_refs(body)
+        assert paths == []
+        assert urls == []
+
+    def test_empty_body_is_safe(self, kanban_home):
+        tid = _add_task_with_body("")
+
+        body = _read_body(tid)
+        paths, urls = extract_image_refs(body)
+        assert paths == []
+        assert urls == []
+
+
+class TestBuildPartsFromTaskBody:
+    """Verify the full pipeline produces a multimodal user turn."""
+
+    def test_local_path_becomes_native_image_part(self, kanban_home, tmp_path):
+        img = tmp_path / "design.png"
+        img.write_bytes(_PNG)
+        tid = _add_task_with_body(f"Check out {img} — what's broken?")
+        body = _read_body(tid)
+        paths, urls = extract_image_refs(body)
+
+        # Mirrors the cli.py wiring: pass the worker's literal -q argument
+        # (the dispatcher uses ``"work kanban task <id>"``) plus the
+        # extracted refs through build_native_content_parts.
+        parts, skipped = build_native_content_parts(
+            f"work kanban task {tid}",
+            paths,
+            image_urls=urls or None,
+        )
+
+        assert skipped == []
+        # text part + one image_url part
+        assert len(parts) == 2
+        assert parts[0]["type"] == "text"
+        assert parts[0]["text"].startswith(f"work kanban task {tid}")
+        assert f"[Image attached at: {img}]" in parts[0]["text"]
+        assert parts[1]["type"] == "image_url"
+        assert parts[1]["image_url"]["url"].startswith("data:image/png;base64,")
+
+    def test_url_becomes_image_url_part(self, kanban_home):
+        tid = _add_task_with_body(
+            "Reference: https://example.com/target.jpg — match it."
+        )
+        body = _read_body(tid)
+        paths, urls = extract_image_refs(body)
+
+        parts, skipped = build_native_content_parts(
+            f"work kanban task {tid}",
+            paths,
+            image_urls=urls or None,
+        )
+
+        assert skipped == []
+        assert len(parts) == 2
+        assert parts[0]["type"] == "text"
+        assert "[Image attached: https://example.com/target.jpg]" in parts[0]["text"]
+        assert parts[1] == {
+            "type": "image_url",
+            "image_url": {"url": "https://example.com/target.jpg"},
+        }
+
+    def test_body_with_both_yields_two_image_parts(self, kanban_home, tmp_path):
+        img = tmp_path / "local.png"
+        img.write_bytes(_PNG)
+        tid = _add_task_with_body(
+            f"Diff {img} vs https://example.com/target.png — explain it."
+        )
+        body = _read_body(tid)
+        paths, urls = extract_image_refs(body)
+
+        parts, skipped = build_native_content_parts(
+            f"work kanban task {tid}",
+            paths,
+            image_urls=urls or None,
+        )
+
+        assert skipped == []
+        image_parts = [p for p in parts if p.get("type") == "image_url"]
+        assert len(image_parts) == 2
+        # Local file is embedded as a data URL; remote URL passes through.
+        assert image_parts[0]["image_url"]["url"].startswith("data:image/png;base64,")
+        assert image_parts[1]["image_url"]["url"] == "https://example.com/target.png"
+
+    def test_body_with_no_images_leaves_query_untouched(self, kanban_home):
+        tid = _add_task_with_body(
+            "Rewrite the README intro paragraph to focus on use cases."
+        )
+        body = _read_body(tid)
+        paths, urls = extract_image_refs(body)
+
+        parts, skipped = build_native_content_parts(
+            f"work kanban task {tid}",
+            paths,
+            image_urls=urls or None,
+        )
+
+        # No images → plain text-only return (single part, no list mutation).
+        assert skipped == []
+        assert len(parts) == 1
+        assert parts[0]["type"] == "text"
+        assert parts[0]["text"] == f"work kanban task {tid}"
+
+    def test_code_block_example_is_not_attached(self, kanban_home, tmp_path):
+        # Only the real image outside the fenced code block should attach.
+        real = tmp_path / "real.png"
+        real.write_bytes(_PNG)
+        tid = _add_task_with_body(
+            f"Real screenshot:\n{real}\n\n"
+            "Example we DON'T want attached:\n"
+            "```\n"
+            "image: /tmp/example_only.png\n"
+            "url: https://example.com/example.png\n"
+            "```\n"
+        )
+        body = _read_body(tid)
+        paths, urls = extract_image_refs(body)
+
+        assert paths == [str(real)]
+        assert urls == []
@@ -203,25 +203,43 @@ def test_auto_mount_replaces_persistent_workspace_bind(monkeypatch, tmp_path):


 def test_non_persistent_cleanup_removes_container(monkeypatch):
-    """When persistent=false, cleanup() must schedule docker stop + rm."""
+    """When persist_across_processes=false, cleanup() must docker stop AND
+    docker rm so containers don't leak across hermes processes.
+
+    Updated for issue #20561: the previous implementation used fire-and-forget
+    ``subprocess.Popen("... &", shell=True)`` which raced with parent exit;
+    the new implementation uses ``subprocess.run`` on a daemon thread with
+    bounded timeouts. See test_cleanup_with_persist_disabled_stops_and_rms
+    for the full behavior contract.
+    """
    monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
-    calls = _mock_subprocess_run(monkeypatch)
+    monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
+    _mock_subprocess_run(monkeypatch)
+    # Run the worker thread synchronously so assertions can observe its work.
+    import threading
+    monkeypatch.setattr(threading, "Thread", _FakeThread)

-    popen_cmds = []
-    monkeypatch.setattr(
-        docker_env.subprocess, "Popen",
-        lambda cmd, **kw: (popen_cmds.append(cmd), type("P", (), {"poll": lambda s: 0, "wait": lambda s, **k: None, "returncode": 0, "stdout": iter([]), "stdin": None})())[1],
+    env = docker_env.DockerEnvironment(
+        image="python:3.11", cwd="/root", timeout=60,
+        task_id="ephemeral-task", persistent_filesystem=False,
+        persist_across_processes=False,
    )
-
-    env = _make_dummy_env(persistent_filesystem=False, task_id="ephemeral-task")
-    assert env._container_id
    container_id = env._container_id
+    assert container_id

+    # Capture cleanup-time docker calls (everything before this was init).
+    cleanup_calls = []
+    real_run = docker_env.subprocess.run
+
+    def _capture(cmd, **kw):
+        cleanup_calls.append((list(cmd) if isinstance(cmd, list) else cmd, kw))
+        return real_run(cmd, **kw)
+
+    monkeypatch.setattr(docker_env.subprocess, "run", _capture)
    env.cleanup()

-    # Should have stop and rm calls via Popen
-    stop_cmds = [c for c in popen_cmds if container_id in str(c) and "stop" in str(c)]
-    assert len(stop_cmds) >= 1, f"cleanup() should schedule docker stop for {container_id}"
+    stops = [c for c in cleanup_calls if isinstance(c[0], list) and c[0][1:2] == ["stop"]]
+    assert stops, f"cleanup() should docker stop {container_id}; got {cleanup_calls}"


 class _FakePopen:
@@ -514,3 +532,839 @@ def test_run_as_host_user_warns_and_skips_when_no_posix_ids(monkeypatch, caplog)
        "does not expose POSIX uid/gid" in rec.getMessage()
        for rec in caplog.records
    ), "expected a warning when POSIX ids are unavailable"
+
+
+# ── Docker labels (issue #20561) ──────────────────────────────────
+
+
+def _run_args_from_calls(calls):
+    """Pull the argv list passed to the first ``docker run`` invocation."""
+    run_calls = [
+        c for c in calls
+        if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "run"
+    ]
+    assert run_calls, "docker run should have been called"
+    return run_calls[0][0]
+
+
+def _labels_in_run_args(run_args):
+    """Return the set of ``key=value`` strings passed via ``--label``."""
+    return {
+        run_args[i + 1]
+        for i, flag in enumerate(run_args[:-1])
+        if flag == "--label"
+    }
+
+
+def test_run_command_tags_hermes_agent_label(monkeypatch):
+    """Every container hermes-agent starts must carry the hermes-agent=1 label
+    so the orphan reaper (and external operators) can identify them with a
+    single ``docker ps --filter label=hermes-agent=1`` call. Regression test
+    for issue #20561 — without the label there is no global sweep target."""
+    monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
+    calls = _mock_subprocess_run(monkeypatch)
+
+    _make_dummy_env(task_id="my-task")
+
+    labels = _labels_in_run_args(_run_args_from_calls(calls))
+    assert "hermes-agent=1" in labels, (
+        f"hermes-agent=1 label missing; got labels: {sorted(labels)}"
+    )
+
+
+def test_run_command_tags_task_and_profile_labels(monkeypatch):
+    """task_id and the active profile name are surfaced as labels so future
+    cross-process reuse logic can filter to a specific (task, profile) pair
+    without parsing container names. Profile resolution uses the helper that
+    returns ``"default"`` for the root Hermes home."""
+    monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
+    monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "research-bot")
+    calls = _mock_subprocess_run(monkeypatch)
+
+    _make_dummy_env(task_id="kanban-42")
+
+    labels = _labels_in_run_args(_run_args_from_calls(calls))
+    assert "hermes-task-id=kanban-42" in labels, (
+        f"hermes-task-id=kanban-42 missing; got: {sorted(labels)}"
+    )
+    assert "hermes-profile=research-bot" in labels, (
+        f"hermes-profile=research-bot missing; got: {sorted(labels)}"
+    )
+
+
+def test_label_sanitizer_rejects_invalid_characters():
+    """Docker label values must be alnum + ``_.-`` and ≤63 chars. Profile or
+    task names containing slashes, colons, or unicode would otherwise emit
+    invalid labels that round-trip badly through ``docker ps --filter``."""
+    assert docker_env._sanitize_label_value("plain-name_1.0") == "plain-name_1.0"
+    assert docker_env._sanitize_label_value("with/slash") == "with_slash"
+    assert docker_env._sanitize_label_value("with:colon") == "with_colon"
+    assert docker_env._sanitize_label_value("emoji-😀-here") == "emoji-_-here"
+    # Empty / non-string inputs must collapse to a queryable token, not "".
+    assert docker_env._sanitize_label_value("") == "unknown"
+    assert docker_env._sanitize_label_value(None) == "unknown"  # type: ignore[arg-type]
+    # >63 chars must truncate, not error.
+    long_value = "x" * 100
+    assert len(docker_env._sanitize_label_value(long_value)) == 63
+
+
+def test_run_command_sanitizes_unsafe_task_id(monkeypatch):
+    """A task_id containing characters Docker rejects in label values must be
+    sanitized before reaching ``docker run --label``; otherwise the daemon
+    refuses the run with an inscrutable error and the agent's first command
+    blows up."""
+    monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
+    calls = _mock_subprocess_run(monkeypatch)
+
+    _make_dummy_env(task_id="task/with:weird*chars")
+
+    labels = _labels_in_run_args(_run_args_from_calls(calls))
+    # Each non-OK character becomes an underscore; the safe chars survive.
+    assert "hermes-task-id=task_with_weird_chars" in labels, (
+        f"sanitized task-id label missing; got: {sorted(labels)}"
+    )
+
+
+def test_labels_attribute_populated_after_init(monkeypatch):
+    """``self._labels`` must be set to the same key/value pairs that went onto
+    docker run, so subsequent reuse / reaper paths can match without re-running
+    the sanitizer or re-importing the profile module."""
+    monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
+    monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
+    _mock_subprocess_run(monkeypatch)
+
+    env = _make_dummy_env(task_id="abc")
+
+    assert env._labels == {
+        "hermes-agent": "1",
+        "hermes-task-id": "abc",
+        "hermes-profile": "default",
+    }
+
+
+# ── Cross-process container reuse (issue #20561) ──────────────────
+
+
+def _mock_subprocess_run_with_reuse(monkeypatch, ps_state: str | None,
+                                     start_succeeds: bool = True):
+    """Reuse-aware subprocess.run mock.
+
+    ``ps_state`` controls what ``docker ps -a --filter ...`` returns:
+      * ``None`` → no match (empty stdout). Forces a fresh ``docker run``.
+      * ``"running"`` / ``"exited"`` / ... → emit ``CID\\tSTATE`` so the reuse
+        path picks it up. ``"running"`` skips ``docker start``; other states
+        trigger ``docker start`` (which can be forced to fail via
+        ``start_succeeds=False``).
+
+    Returns the captured call list so the test can verify which docker
+    commands actually ran.
+    """
+    calls = []
+
+    def _run(cmd, **kwargs):
+        calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
+        if isinstance(cmd, list) and len(cmd) >= 2:
+            sub = cmd[1]
+            if sub == "version":
+                return subprocess.CompletedProcess(cmd, 0, stdout="Docker version", stderr="")
+            if sub == "ps":
+                if ps_state is None:
+                    return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
+                return subprocess.CompletedProcess(
+                    cmd, 0, stdout=f"reused-cid\t{ps_state}\n", stderr="",
+                )
+            if sub == "start":
+                if not start_succeeds:
+                    # Real subprocess.run with check=True raises on non-zero exit;
+                    # mirror that so the production code's except clause fires.
+                    raise subprocess.CalledProcessError(1, cmd, output="", stderr="no such container")
+                return subprocess.CompletedProcess(cmd, 0, stdout="reused-cid\n", stderr="")
+            if sub == "run":
+                return subprocess.CompletedProcess(cmd, 0, stdout="fresh-cid\n", stderr="")
+        return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
+
+    monkeypatch.setattr(docker_env.subprocess, "run", _run)
+    return calls
+
+
+def test_reuse_attaches_to_running_container_without_docker_run(monkeypatch):
+    """When a labeled container is already ``running``, the reuse probe
+    must pick it up and skip ``docker run`` entirely. Regression for the
+    issue #20561 root cause: every Hermes process spawning a new container
+    despite docs claiming "ONE long-lived container shared across sessions"."""
+    monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
+    monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
+    calls = _mock_subprocess_run_with_reuse(monkeypatch, ps_state="running")
+
+    env = _make_dummy_env(task_id="reuse-test")
+
+    # The reuse path must populate _container_id from the ps probe output.
+    assert env._container_id == "reused-cid", (
+        f"expected reused container id, got {env._container_id!r}"
+    )
+    # And it must NOT have run `docker run`.
+    run_invocations = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "run"]
+    assert not run_invocations, (
+        f"docker run should be skipped on reuse, got: {run_invocations}"
+    )
+    # And it must have NOT issued a `docker start` for an already-running container.
+    start_invocations = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "start"]
+    assert not start_invocations, (
+        f"docker start should be skipped when container already running, got: {start_invocations}"
+    )
+
+
+def test_reuse_starts_stopped_container_before_attaching(monkeypatch):
+    """A labeled container in ``exited`` state must be restarted via
+    ``docker start`` before the new Hermes process uses it. Without this
+    step, ``docker exec`` against a stopped container errors out and the
+    first agent command fails opaquely."""
+    monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
+    monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
+    calls = _mock_subprocess_run_with_reuse(monkeypatch, ps_state="exited")
+
+    env = _make_dummy_env(task_id="reuse-stopped")
+
+    assert env._container_id == "reused-cid"
+    start_invocations = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "start"]
+    assert start_invocations, "expected docker start for exited container"
+    run_invocations = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "run"]
+    assert not run_invocations, "should not docker run when reusing an exited container"
+
+
+def test_reuse_falls_back_to_fresh_run_when_start_fails(monkeypatch):
+    """If ``docker start`` on the matched container fails (container was
+    removed between probe and start, daemon paused, etc.), the code must
+    silently fall through to a fresh ``docker run`` rather than leaving the
+    user with a broken environment. Defensive recovery — the probe is best-
+    effort, not authoritative."""
+    monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
+    monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
+    calls = _mock_subprocess_run_with_reuse(
+        monkeypatch, ps_state="exited", start_succeeds=False,
+    )
+
+    env = _make_dummy_env(task_id="reuse-broken-start")
+
+    # docker start should be attempted then fail; code falls through to run.
+    assert env._container_id == "fresh-cid", (
+        f"expected fresh container id after fallback, got {env._container_id!r}"
+    )
+    run_invocations = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "run"]
+    assert run_invocations, "fallback to fresh docker run must happen on start failure"
+
+
+def test_no_reuse_when_persist_across_processes_disabled(monkeypatch):
+    """Opt-out path: ``persist_across_processes=False`` skips the ps probe
+    entirely and always starts a fresh container, matching the pre-fix
+    behavior for users who want hard per-process isolation."""
+    monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
+    monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
+    # ps_state=running would trigger reuse if the probe ran — assert it doesn't.
+    calls = _mock_subprocess_run_with_reuse(monkeypatch, ps_state="running")
+
+    env = docker_env.DockerEnvironment(
+        image="python:3.11", cwd="/root", timeout=60,
+        task_id="no-reuse", persist_across_processes=False,
+    )
+
+    # Must NOT have issued docker ps (the probe is gated by the flag).
+    ps_invocations = [c for c in calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "ps"]
+    assert not ps_invocations, (
+        f"docker ps probe should be skipped when persist_across_processes=False, got: {ps_invocations}"
+    )
+    # Should have started a fresh container.
+    assert env._container_id == "fresh-cid"
+
+
+def test_find_reusable_container_prefers_running_over_stopped(monkeypatch):
+    """When the probe returns multiple matches (shouldn't normally happen,
+    but can after a crash leaves stale duplicates), a ``running`` container
+    is preferred over any stopped one. The duplicate gets reaped later by
+    the orphan reaper; we don't try to be heroic about it here."""
+    monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
+    monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
+
+    def _run(cmd, **kwargs):
+        if isinstance(cmd, list) and len(cmd) >= 2:
+            if cmd[1] == "version":
+                return subprocess.CompletedProcess(cmd, 0, stdout="ok", stderr="")
+            if cmd[1] == "ps":
+                # Two matches: stopped first, running second.
+                return subprocess.CompletedProcess(
+                    cmd, 0,
+                    stdout="stopped-cid\texited\nrunning-cid\trunning\n",
+                    stderr="",
+                )
+        return subprocess.CompletedProcess(cmd, 0, stdout="fresh-cid\n", stderr="")
+
+    monkeypatch.setattr(docker_env.subprocess, "run", _run)
+
+    env = _make_dummy_env(task_id="dup-match")
+    assert env._container_id == "running-cid", (
+        f"running container should win over stopped duplicate, got {env._container_id!r}"
+    )
+
+
+# ── Cleanup correctness (issue #20561) ────────────────────────────
+
+
+class _FakeThread:
+    """Stand-in for threading.Thread that captures target/args and calls
+    target() synchronously when .start() runs, so cleanup behavior is
+    observable without actually backgrounding subprocess calls."""
+
+    def __init__(self, target=None, daemon=None, name=None):
+        self._target = target
+        self.daemon = daemon
+        self.name = name
+        self._done = False
+
+    def start(self):
+        if self._target is not None:
+            self._target()
+        self._done = True
+
+    def is_alive(self):
+        return not self._done
+
+    def join(self, timeout=None):
+        self._done = True
+
+
+def _install_fake_thread(monkeypatch):
+    import threading
+    monkeypatch.setattr(threading, "Thread", _FakeThread)
+
+
+def test_cleanup_with_persist_is_noop_for_container(monkeypatch):
+    """``persist_across_processes=True`` (default) cleanup must NEITHER stop
+    NOR remove the container — the docs promise "ONE long-lived container
+    shared across sessions", and any docker stop would kill background
+    processes inside the container (npm watchers, pytest watchers, etc.).
+
+    Resource reclamation in this mode happens via the orphan reaper on next
+    Hermes startup, not on graceful exit. Issue #20561 — the first iteration
+    of this PR did docker stop here, which Ben caught as contradicting the
+    "ONE long-lived container" semantics."""
+    monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
+    monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
+    _mock_subprocess_run(monkeypatch)
+    _install_fake_thread(monkeypatch)
+
+    env = _make_dummy_env(task_id="cleanup-persist", persistent_filesystem=False)
+    # Default persist_across_processes=True.
+    container_id = env._container_id
+    assert container_id
+
+    cleanup_calls = []
+    real_run = docker_env.subprocess.run
+
+    def _capturing_run(cmd, **kwargs):
+        cleanup_calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
+        return real_run(cmd, **kwargs)
+
+    monkeypatch.setattr(docker_env.subprocess, "run", _capturing_run)
+
+    env.cleanup()
+
+    stops = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "stop"]
+    rms = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "rm"]
+    assert not stops, (
+        f"docker stop must NOT be called when persist_across_processes=True; "
+        f"container has to stay running so background processes survive. "
+        f"Got: {stops}"
+    )
+    assert not rms, (
+        f"docker rm must NOT be called when persist_across_processes=True; "
+        f"reuse would be impossible. Got: {rms}"
+    )
+    # The in-process handle must still be cleared so the next __init__
+    # re-probes via labels (and reuses the still-running container).
+    assert env._container_id is None, (
+        "in-process container_id should be cleared even in no-op cleanup"
+    )
+
+
+def test_cleanup_force_remove_stops_and_rms_even_in_persist_mode(monkeypatch):
+    """``cleanup(force_remove=True)`` must stop AND rm the container even
+    when ``persist_across_processes=True``. This is the explicit-teardown
+    path for ``/reset``, ``cleanup_vm(task_id, force_remove=True)``, and any
+    future caller that wants a guaranteed fresh container.
+
+    Without this kwarg, callers in persist mode would have no way to force a
+    fresh container without also flipping the global config — too coarse for
+    a per-task reset.
+    """
+    monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
+    monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
+    _mock_subprocess_run(monkeypatch)
+    _install_fake_thread(monkeypatch)
+
+    env = _make_dummy_env(task_id="cleanup-force", persistent_filesystem=False)
+    assert env._container_id
+
+    cleanup_calls = []
+    real_run = docker_env.subprocess.run
+
+    def _capturing_run(cmd, **kwargs):
+        cleanup_calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
+        return real_run(cmd, **kwargs)
+
+    monkeypatch.setattr(docker_env.subprocess, "run", _capturing_run)
+
+    env.cleanup(force_remove=True)
+
+    stops = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "stop"]
+    rms = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "rm"]
+    assert stops, f"force_remove must docker stop; got: {cleanup_calls}"
+    assert rms, f"force_remove must docker rm; got: {cleanup_calls}"
+
+
+def test_cleanup_vm_default_honors_persist_mode(monkeypatch):
+    """``cleanup_vm(task_id)`` without ``force_remove=True`` must be a no-op
+    for a persist-mode container.
+
+    Regression for the bug Ben caught after commit 4: ``AIAgent.close()``
+    (which is called from ``tui_gateway/server.py`` on session.close, from
+    ``gateway/run.py`` on per-session teardown, and from per-turn cleanup)
+    calls ``cleanup_vm(task_id)``. If that defaulted to ``force_remove=True``
+    we'd tear down the container on every TUI session close, defeating the
+    "ONE long-lived container shared across sessions" contract.
+    """
+    monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
+    monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
+    _mock_subprocess_run(monkeypatch)
+    _install_fake_thread(monkeypatch)
+
+    from tools import terminal_tool
+
+    env = _make_dummy_env(task_id="session-close-test")
+    container_id = env._container_id
+    terminal_tool._active_environments["session-close-test"] = env
+
+    cleanup_calls = []
+    real_run = docker_env.subprocess.run
+
+    def _capturing_run(cmd, **kwargs):
+        cleanup_calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
+        return real_run(cmd, **kwargs)
+
+    monkeypatch.setattr(docker_env.subprocess, "run", _capturing_run)
+
+    try:
+        terminal_tool.cleanup_vm("session-close-test")
+    finally:
+        terminal_tool._active_environments.pop("session-close-test", None)
+
+    stops = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "stop"]
+    rms = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "rm"]
+    assert not stops, (
+        f"cleanup_vm() default must not docker stop a persist-mode container; "
+        f"got: {stops}"
+    )
+    assert not rms, (
+        f"cleanup_vm() default must not docker rm a persist-mode container; "
+        f"got: {rms}"
+    )
+
+
+def test_cleanup_vm_force_remove_tears_down_persist_container(monkeypatch):
+    """``cleanup_vm(task_id, force_remove=True)`` tears down a persist-mode
+    container — the explicit-teardown path for ``/reset``-style flows.
+
+    Also pins the runtime-signature-inspection plumbing: the kwarg must
+    actually flow through ``cleanup_vm`` into the backend's ``cleanup()``.
+    """
+    monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
+    monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
+    _mock_subprocess_run(monkeypatch)
+    _install_fake_thread(monkeypatch)
+
+    from tools import terminal_tool
+
+    env = _make_dummy_env(task_id="explicit-teardown-test")
+    terminal_tool._active_environments["explicit-teardown-test"] = env
+
+    cleanup_calls = []
+    real_run = docker_env.subprocess.run
+
+    def _capturing_run(cmd, **kwargs):
+        cleanup_calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
+        return real_run(cmd, **kwargs)
+
+    monkeypatch.setattr(docker_env.subprocess, "run", _capturing_run)
+
+    try:
+        terminal_tool.cleanup_vm("explicit-teardown-test", force_remove=True)
+    finally:
+        terminal_tool._active_environments.pop("explicit-teardown-test", None)
+
+    stops = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "stop"]
+    rms = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "rm"]
+    assert stops, f"force_remove must reach docker stop; got: {cleanup_calls}"
+    assert rms, f"force_remove must reach docker rm; got: {cleanup_calls}"
+
+
+def test_cleanup_with_persist_disabled_stops_and_rms(monkeypatch):
+    """``persist_across_processes=False`` cleanup must docker stop AND docker
+    rm so containers don't leak. Crucially, this runs regardless of the
+    ``persistent_filesystem`` setting — the original code only rm'd when
+    ``not self._persistent``, which meant the default-on ``container_persistent:
+    true`` users (the documented happy path) leaked Exited containers forever.
+    Issue #20561 root-cause fix."""
+    monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
+    monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
+    _mock_subprocess_run(monkeypatch)
+    _install_fake_thread(monkeypatch)
+
+    # Note: persistent_filesystem=True (the prior-leak scenario) + the new
+    # cross-process toggle OFF must still result in a clean rm.
+    env = docker_env.DockerEnvironment(
+        image="python:3.11", cwd="/root", timeout=60,
+        task_id="cleanup-no-persist", persistent_filesystem=True,
+        persist_across_processes=False,
+    )
+
+    cleanup_calls = []
+    real_run = docker_env.subprocess.run
+
+    def _capturing_run(cmd, **kwargs):
+        cleanup_calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
+        return real_run(cmd, **kwargs)
+
+    monkeypatch.setattr(docker_env.subprocess, "run", _capturing_run)
+
+    env.cleanup()
+
+    stops = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "stop"]
+    rms = [c for c in cleanup_calls if isinstance(c[0], list) and len(c[0]) >= 2 and c[0][1] == "rm"]
+    assert stops, "expected docker stop"
+    assert rms, (
+        "docker rm MUST run when persist_across_processes=False, even with "
+        "persistent_filesystem=True — that gating was the leak source in #20561."
+    )
+
+
+def test_cleanup_uses_subprocess_run_not_detached_shell(monkeypatch):
+    """The pre-fix code used ``subprocess.Popen("... &", shell=True)`` which
+    raced with parent-process exit and silently dropped cleanup work. The
+    new code must use ``subprocess.run`` with bounded ``timeout=`` so the
+    work actually completes within the process lifetime.
+
+    Asserts cleanup never reaches into shell-mode Popen. Uses
+    ``force_remove=True`` so cleanup actually issues docker calls — the
+    default persist-mode path is now a no-op (commit 4) and would trivially
+    pass this assertion without exercising the docker code at all.
+    """
+    monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
+    monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
+    _mock_subprocess_run(monkeypatch)
+    _install_fake_thread(monkeypatch)
+
+    def _forbidden_popen(*args, **kwargs):
+        raise AssertionError(
+            f"cleanup must not use subprocess.Popen anymore (issue #20561); "
+            f"got args={args} kwargs={kwargs}"
+        )
+
+    monkeypatch.setattr(docker_env.subprocess, "Popen", _forbidden_popen)
+
+    env = _make_dummy_env(task_id="no-popen-cleanup")
+    env.cleanup(force_remove=True)  # must not raise
+
+
+def test_wait_for_cleanup_returns_true_when_no_thread_started():
+    """``wait_for_cleanup`` must be a no-op when ``cleanup`` was never called
+    (or the env has no live cleanup thread) — atexit calls it unconditionally
+    across all active envs, so a False return would falsely flag healthy
+    shutdowns."""
+    env = docker_env.DockerEnvironment.__new__(docker_env.DockerEnvironment)
+    # No _cleanup_thread set — simulates an env that was never cleanup()'d.
+    assert env.wait_for_cleanup(timeout=1.0) is True
+
+
+def test_wait_for_cleanup_after_cleanup_returns_true(monkeypatch):
+    """End-to-end: cleanup() starts a thread, wait_for_cleanup() joins it
+    and reports completion. Atexit relies on this contract to ensure docker
+    stop/rm actually finishes before the Python interpreter exits.
+
+    Uses ``force_remove=True`` so cleanup actually starts a worker thread —
+    the default persist-mode cleanup is a no-op (commit 4) and never spawns
+    a thread, so the trivial "no thread" branch of wait_for_cleanup is
+    already covered by the previous test.
+    """
+    monkeypatch.setattr(docker_env, "find_docker", lambda: "/usr/bin/docker")
+    monkeypatch.setattr(docker_env, "_get_active_profile_name", lambda: "default")
+    _mock_subprocess_run(monkeypatch)
+    _install_fake_thread(monkeypatch)
+
+    env = _make_dummy_env(task_id="wait-test")
+    env.cleanup(force_remove=True)
+    assert env.wait_for_cleanup(timeout=5.0) is True
+
+
+def test_cleanup_on_env_with_no_container_id_does_not_raise(monkeypatch):
+    """A DockerEnvironment whose ``__init__`` failed before the container_id
+    was set (image-pull error, docker daemon down) should still be safe to
+    cleanup() — the post-creation failure path in callers always tries.
+    Without this guard the daemon-down case used to NameError on the cleanup
+    branch."""
+    env = docker_env.DockerEnvironment.__new__(docker_env.DockerEnvironment)
+    env._container_id = None
+    env._persistent = False
+    env._workspace_dir = None
+    env._home_dir = None
+    # No exception expected.
+    env.cleanup()
+
+
+# ── Orphan reaper (issue #20561) ──────────────────────────────────
+
+
+def _now_iso(offset_seconds: int = 0) -> str:
+    """Return an RFC3339 timestamp ``offset_seconds`` in the past."""
+    import datetime
+    t = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(seconds=offset_seconds)
+    # Format like Docker emits — with nanoseconds-style trailing digits.
+    return t.isoformat().replace("+00:00", ".123456789Z")
+
+
+def _reaper_run_mock(monkeypatch, ps_ids: list[str], inspect_responses: dict[str, str],
+                      rm_succeeds: bool = True):
+    """Build a subprocess.run mock for reaper tests.
+
+    * ``ps_ids`` — what ``docker ps -a --filter ... --format '{{.ID}}'`` returns
+    * ``inspect_responses[cid]`` — what ``docker inspect ... FinishedAt`` returns
+      for each cid; ``""`` means "field unset".
+    * ``rm_succeeds`` — whether ``docker rm -f`` returns 0.
+
+    Captures every call so tests can assert which containers were rm'd.
+    """
+    calls = []
+
+    def _run(cmd, **kwargs):
+        calls.append((list(cmd) if isinstance(cmd, list) else cmd, kwargs))
+        if not isinstance(cmd, list) or len(cmd) < 2:
+            return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
+        sub = cmd[1]
+        if sub == "ps":
+            return subprocess.CompletedProcess(
+                cmd, 0, stdout="\n".join(ps_ids) + ("\n" if ps_ids else ""), stderr="",
+            )
+        if sub == "inspect":
+            # cmd is [docker, inspect, --format, '{{.State.FinishedAt}}', cid]
+            cid = cmd[-1]
+            return subprocess.CompletedProcess(
+                cmd, 0, stdout=inspect_responses.get(cid, "") + "\n", stderr="",
+            )
+        if sub == "rm":
+            return subprocess.CompletedProcess(
+                cmd, 0 if rm_succeeds else 1,
+                stdout="", stderr="" if rm_succeeds else "no such container",
+            )
+        return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
+
+    monkeypatch.setattr(docker_env.subprocess, "run", _run)
+    return calls
+
+
+def test_reap_orphan_returns_zero_when_no_matches(monkeypatch):
+    """No labeled containers → no rm calls, returns 0. Establishes the
+    happy-path baseline for the orphan reaper (issue #20561)."""
+    calls = _reaper_run_mock(monkeypatch, ps_ids=[], inspect_responses={})
+
+    removed = docker_env.reap_orphan_containers(
+        max_age_seconds=600, profile_filter="default", docker_exe="/usr/bin/docker",
+    )
+
+    assert removed == 0
+    rms = [c for c in calls if isinstance(c[0], list) and c[0][1:2] == ["rm"]]
+    assert not rms, "no rm calls expected when ps returns empty"
+
+
+def test_reap_orphan_removes_stale_exited_container(monkeypatch):
+    """An Exited container older than max_age_seconds must be removed.
+    This is the core repair path for issue #20561 — without the reaper,
+    SIGKILL'd Hermes processes leak containers permanently."""
+    old = _now_iso(offset_seconds=900)  # 15 minutes ago
+    calls = _reaper_run_mock(
+        monkeypatch, ps_ids=["old-cid"], inspect_responses={"old-cid": old},
+    )
+
+    removed = docker_env.reap_orphan_containers(
+        max_age_seconds=600, profile_filter="default", docker_exe="/usr/bin/docker",
+    )
+
+    assert removed == 1
+    rms = [c for c in calls if isinstance(c[0], list) and c[0][1:2] == ["rm"]]
+    assert len(rms) == 1
+    assert "old-cid" in rms[0][0], f"expected rm of old-cid, got {rms[0][0]}"
+
+
+def test_reap_orphan_spares_recently_exited_container(monkeypatch):
+    """A container exited within max_age_seconds must NOT be reaped — that
+    container belongs to a Hermes process that just finished and may be
+    about to be replaced. Conservative window prevents racing sibling
+    processes."""
+    recent = _now_iso(offset_seconds=60)  # 1 minute ago
+    calls = _reaper_run_mock(
+        monkeypatch, ps_ids=["recent-cid"], inspect_responses={"recent-cid": recent},
+    )
+
+    removed = docker_env.reap_orphan_containers(
+        max_age_seconds=600, profile_filter="default", docker_exe="/usr/bin/docker",
+    )
+
+    assert removed == 0
+    rms = [c for c in calls if isinstance(c[0], list) and c[0][1:2] == ["rm"]]
+    assert not rms, f"recent container must not be reaped, got rm calls: {rms}"
+
+
+def test_reap_orphan_scopes_to_profile_filter_via_label(monkeypatch):
+    """The reaper must pass ``--filter label=hermes-profile=<profile>`` to
+    docker ps so it never sweeps another profile's containers. A research
+    profile must not tear down the default profile's stragglers."""
+    calls = _reaper_run_mock(monkeypatch, ps_ids=[], inspect_responses={})
+
+    docker_env.reap_orphan_containers(
+        max_age_seconds=600, profile_filter="research-bot", docker_exe="/usr/bin/docker",
+    )
+
+    ps_calls = [c for c in calls if isinstance(c[0], list) and c[0][1:2] == ["ps"]]
+    assert ps_calls, "expected at least one docker ps call"
+    flat = " ".join(ps_calls[0][0])
+    assert "label=hermes-profile=research-bot" in flat, (
+        f"profile filter not applied to docker ps; got args: {ps_calls[0][0]}"
+    )
+    assert "label=hermes-agent=1" in flat, (
+        f"hermes-agent label filter must also be applied; got: {ps_calls[0][0]}"
+    )
+    assert "status=exited" in flat, (
+        "must filter to exited containers only — running containers may "
+        "belong to a sibling Hermes process and must NEVER be reaped"
+    )
+
+
+def test_reap_orphan_skips_container_with_unparseable_finished_at(monkeypatch):
+    """If docker inspect returns the zero-value ``0001-01-01T00:00:00Z`` (no
+    FinishedAt yet) or an unparseable timestamp, the reaper must leave the
+    container alone. Defensive — never reap a container whose age we can't
+    determine."""
+    calls = _reaper_run_mock(
+        monkeypatch,
+        ps_ids=["never-finished", "garbage-ts"],
+        inspect_responses={
+            "never-finished": "0001-01-01T00:00:00Z",
+            "garbage-ts": "not-a-timestamp",
+        },
+    )
+
+    removed = docker_env.reap_orphan_containers(
+        max_age_seconds=600, profile_filter="default", docker_exe="/usr/bin/docker",
+    )
+
+    assert removed == 0
+    rms = [c for c in calls if isinstance(c[0], list) and c[0][1:2] == ["rm"]]
+    assert not rms, (
+        f"reaper must NOT remove containers with unparseable FinishedAt; got: {rms}"
+    )
+
+
+def test_reap_orphan_handles_docker_ps_failure_gracefully(monkeypatch):
+    """If docker ps itself fails (daemon down, permission denied), the
+    reaper returns 0 without crashing. The reaper is best-effort plumbing,
+    not a critical path — it must never block container creation."""
+    def _failing_ps(cmd, **kwargs):
+        if isinstance(cmd, list) and len(cmd) >= 2 and cmd[1] == "ps":
+            return subprocess.CompletedProcess(cmd, 1, stdout="", stderr="Cannot connect to daemon")
+        return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
+
+    monkeypatch.setattr(docker_env.subprocess, "run", _failing_ps)
+
+    # Must not raise
+    removed = docker_env.reap_orphan_containers(
+        max_age_seconds=600, profile_filter="default", docker_exe="/usr/bin/docker",
+    )
+    assert removed == 0
+
+
+def test_reap_orphan_continues_after_individual_rm_failure(monkeypatch):
+    """If ``docker rm -f`` fails on one container (already removed by a
+    concurrent process, container locked, etc.), the reaper must log and
+    continue to the next candidate rather than aborting the whole sweep."""
+    old = _now_iso(offset_seconds=900)
+    rm_calls = []
+
+    def _run(cmd, **kwargs):
+        if not isinstance(cmd, list) or len(cmd) < 2:
+            return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
+        sub = cmd[1]
+        if sub == "ps":
+            return subprocess.CompletedProcess(
+                cmd, 0, stdout="cid-a\ncid-b\ncid-c\n", stderr="",
+            )
+        if sub == "inspect":
+            return subprocess.CompletedProcess(cmd, 0, stdout=old + "\n", stderr="")
+        if sub == "rm":
+            rm_calls.append(cmd[-1])
+            # cid-b fails; cid-a and cid-c succeed.
+            if cmd[-1] == "cid-b":
+                return subprocess.CompletedProcess(cmd, 1, stdout="", stderr="no such container")
+            return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
+        return subprocess.CompletedProcess(cmd, 0, stdout="", stderr="")
+
+    monkeypatch.setattr(docker_env.subprocess, "run", _run)
+
+    removed = docker_env.reap_orphan_containers(
+        max_age_seconds=600, profile_filter="default", docker_exe="/usr/bin/docker",
+    )
+
+    # All three were attempted, two succeeded.
+    assert removed == 2
+    assert set(rm_calls) == {"cid-a", "cid-b", "cid-c"}, (
+        f"reaper must attempt all candidates even when one fails; got: {rm_calls}"
+    )
+
+
+def test_container_finished_at_parses_nanosecond_timestamp(monkeypatch):
+    """Docker emits FinishedAt with nanosecond precision (RFC3339 with up to
+    9 fractional digits), but Python's fromisoformat caps at microseconds.
+    The helper must trim the extra digits without raising — otherwise every
+    candidate gets skipped and the reaper does nothing."""
+
+    def _run(cmd, **kwargs):
+        return subprocess.CompletedProcess(
+            cmd, 0,
+            stdout="2026-05-28T13:45:00.123456789Z\n",
+            stderr="",
+        )
+
+    monkeypatch.setattr(docker_env.subprocess, "run", _run)
+
+    result = docker_env._container_finished_at("/usr/bin/docker", "test-cid")
+    assert result is not None, "must parse RFC3339 with nanoseconds"
+    import datetime
+    assert result.tzinfo == datetime.timezone.utc
+    assert result.year == 2026 and result.month == 5 and result.day == 28
+
+
+def test_container_finished_at_returns_none_on_zero_value():
+    """Docker's zero-value ``0001-01-01T00:00:00Z`` (never finished) must
+    map to None so the reaper treats the container as unreapable."""
+    # Direct test of the parsing helper — no subprocess needed since the
+    # check happens after the inspect call returns.
+    import subprocess as _subprocess
+
+    class _MockRun:
+        def __init__(self, stdout):
+            self.returncode = 0
+            self.stdout = stdout
+            self.stderr = ""
+
+    import unittest.mock
+    with unittest.mock.patch.object(
+        docker_env.subprocess, "run", return_value=_MockRun("0001-01-01T00:00:00Z\n"),
+    ):
+        result = docker_env._container_finished_at("/usr/bin/docker", "never-finished")
+    assert result is None
@@ -0,0 +1,139 @@
+"""Integration tests for the docker orphan-reaper wiring in terminal_tool.
+
+The reaper itself is unit-tested in tests/tools/test_docker_environment.py
+under the "Orphan reaper" section. These tests cover the terminal_tool-side
+gates: once-per-process behavior, the disable flag, and the
+``lifetime_seconds`` doubling that determines the reaper's age threshold.
+
+Issue #20561 — without these gates, parallel subagents would each fire the
+reaper on container creation, and the ``terminal.docker_orphan_reaper: false``
+opt-out would silently do nothing.
+"""
+
+import os
+from unittest.mock import patch
+
+import tools.terminal_tool as terminal_tool
+
+
+def _reset_reaper_gate():
+    """Clear the once-per-process flag between tests."""
+    terminal_tool._docker_orphan_reaper_ran = False
+
+
+def test_maybe_reap_runs_once_per_process(monkeypatch):
+    """The reaper sweep must run at most once per Python interpreter.
+    Parallel subagents that each call _create_environment(env_type='docker')
+    would otherwise fire N concurrent docker ps + inspect storms against the
+    daemon and waste 5–10s of startup."""
+    _reset_reaper_gate()
+    call_count = {"reap": 0}
+
+    def _fake_reap(**kwargs):
+        call_count["reap"] += 1
+        return 0
+
+    with patch("tools.environments.docker.reap_orphan_containers", _fake_reap):
+        config = {"docker_orphan_reaper": True}
+        terminal_tool._maybe_reap_docker_orphans(config)
+        terminal_tool._maybe_reap_docker_orphans(config)
+        terminal_tool._maybe_reap_docker_orphans(config)
+
+    assert call_count["reap"] == 1, (
+        f"reaper must run exactly once per process; got {call_count['reap']} calls"
+    )
+
+
+def test_maybe_reap_respects_disable_flag(monkeypatch):
+    """``terminal.docker_orphan_reaper: false`` (via container_config) must
+    skip the sweep entirely — no docker ps, no inspect, no rm. The escape
+    hatch for operators running multiple Hermes processes in the same
+    profile."""
+    _reset_reaper_gate()
+    call_count = {"reap": 0}
+
+    def _fake_reap(**kwargs):
+        call_count["reap"] += 1
+        return 0
+
+    with patch("tools.environments.docker.reap_orphan_containers", _fake_reap):
+        terminal_tool._maybe_reap_docker_orphans({"docker_orphan_reaper": False})
+
+    assert call_count["reap"] == 0, "disabled reaper must not run any docker calls"
+    # The once-per-process gate must NOT be tripped when the reaper is
+    # disabled — that would prevent a subsequent toggle to true from working.
+    assert terminal_tool._docker_orphan_reaper_ran is False
+
+
+def test_maybe_reap_doubles_lifetime_for_max_age(monkeypatch):
+    """The reaper's age threshold is ``2 × lifetime_seconds`` (with a 60s
+    floor). Generous default — gives sibling Hermes processes ample grace
+    to be replaced without their just-exited containers being yanked."""
+    _reset_reaper_gate()
+    captured_args = {}
+
+    def _fake_reap(**kwargs):
+        captured_args.update(kwargs)
+        return 0
+
+    monkeypatch.setenv("TERMINAL_LIFETIME_SECONDS", "300")
+    with patch("tools.environments.docker.reap_orphan_containers", _fake_reap):
+        terminal_tool._maybe_reap_docker_orphans({"docker_orphan_reaper": True})
+
+    assert captured_args.get("max_age_seconds") == 600, (
+        f"expected 2 × 300 = 600, got {captured_args.get('max_age_seconds')}"
+    )
+
+
+def test_maybe_reap_floors_at_60_seconds(monkeypatch):
+    """A user pinning TERMINAL_LIFETIME_SECONDS=0 (or any value <30) would
+    otherwise get an effective age threshold of zero, which would race the
+    user's own just-started container creation. Floor at 60s × 2 = 120s."""
+    _reset_reaper_gate()
+    captured_args = {}
+
+    def _fake_reap(**kwargs):
+        captured_args.update(kwargs)
+        return 0
+
+    monkeypatch.setenv("TERMINAL_LIFETIME_SECONDS", "0")
+    with patch("tools.environments.docker.reap_orphan_containers", _fake_reap):
+        terminal_tool._maybe_reap_docker_orphans({"docker_orphan_reaper": True})
+
+    assert captured_args.get("max_age_seconds") == 120, (
+        f"expected floored 60 × 2 = 120, got {captured_args.get('max_age_seconds')}"
+    )
+
+
+def test_maybe_reap_passes_current_profile_as_filter(monkeypatch):
+    """The reaper must be scoped to the current Hermes profile — a research
+    profile must NEVER reap default's containers. Verifies the
+    profile-filter wiring."""
+    _reset_reaper_gate()
+    captured_args = {}
+
+    def _fake_reap(**kwargs):
+        captured_args.update(kwargs)
+        return 0
+
+    with patch("tools.environments.docker.reap_orphan_containers", _fake_reap), \
+         patch("tools.environments.docker._get_active_profile_name", return_value="research-bot"):
+        terminal_tool._maybe_reap_docker_orphans({"docker_orphan_reaper": True})
+
+    assert captured_args.get("profile_filter") == "research-bot", (
+        f"expected profile_filter='research-bot', got {captured_args.get('profile_filter')!r}"
+    )
+
+
+def test_maybe_reap_swallows_exceptions(monkeypatch):
+    """A reaper crash (docker daemon down, parse error in helper) must NOT
+    block env creation. The reaper is best-effort plumbing, not a critical
+    path; failures get logged at debug level and execution continues."""
+    _reset_reaper_gate()
+
+    def _exploding_reap(**kwargs):
+        raise RuntimeError("docker daemon ate the cat")
+
+    with patch("tools.environments.docker.reap_orphan_containers", _exploding_reap):
+        # Must not raise
+        terminal_tool._maybe_reap_docker_orphans({"docker_orphan_reaper": True})
@@ -34,6 +34,39 @@ def test_resolve_stdio_command_falls_back_to_hermes_node_bin(tmp_path):
    assert env["PATH"].split(os.pathsep)[0] == str(node_bin)


+def test_resolve_stdio_command_falls_back_to_usr_local_bin():
+    """When ``npx`` isn't on the filtered PATH and isn't under ``$HERMES_HOME/node/bin``
+    or ``~/.local/bin``, the resolver should still locate it at ``/usr/local/bin/npx``.
+
+    This is the canonical install location for Node on Linux from-source builds,
+    the upstream ``node:bookworm-slim`` image (which the Hermes Docker image
+    copies ``node + npm + corepack`` from since #4977), and macOS Homebrew on
+    Intel. Without this candidate, MCP servers run with an ``env.PATH`` that
+    omits ``/usr/local/bin`` (common when users hand-author PATH for sandboxing)
+    fail with ENOENT at ``execvp``.
+    """
+    target = os.path.join(os.sep, "usr", "local", "bin", "npx")
+
+    # Pretend ONLY the /usr/local/bin/npx candidate exists and is executable —
+    # the other candidates ($HERMES_HOME/node/bin/npx and ~/.local/bin/npx)
+    # should fail isfile() and the resolver must fall through to /usr/local/bin.
+    def _fake_isfile(path):
+        return path == target
+
+    def _fake_access(path, _mode):
+        return path == target
+
+    with patch("tools.mcp_tool.shutil.which", return_value=None), \
+         patch("tools.mcp_tool.os.path.isfile", side_effect=_fake_isfile), \
+         patch("tools.mcp_tool.os.access", side_effect=_fake_access):
+        command, env = _resolve_stdio_command("npx", {"PATH": "/opt/data/bin:/usr/bin:/bin"})
+
+    assert command == target
+    # /usr/local/bin must be prepended so npx's shebang (`/usr/bin/env node`)
+    # can find node in the same directory.
+    assert env["PATH"].split(os.pathsep)[0] == os.path.dirname(target)
+
+
 def test_resolve_stdio_command_respects_explicit_empty_path():
    seen_paths = []

@@ -224,3 +224,39 @@ def test_docker_env_is_bridged_everywhere():
    assert "docker_env" in _gateway_env_map_keys()
    assert "docker_env" in _save_config_env_sync_keys()
    assert "TERMINAL_DOCKER_ENV" in _terminal_tool_env_var_names()
+
+
+def test_docker_persist_across_processes_is_bridged_everywhere():
+    """Regression pin for the cross-process container reuse toggle.
+
+    ``terminal.docker_persist_across_processes`` (issue #20561) controls
+    whether ``DockerEnvironment.__init__`` probes for and reuses an existing
+    labeled container at startup, and whether ``cleanup()`` removes the
+    container on Hermes exit or just stops it (keeping it for the next
+    process).  Same four-bridge invariant as docker_run_as_host_user /
+    docker_env / docker_mount_cwd_to_workspace — drift between any of the
+    four sites means ``terminal.docker_persist_across_processes: false`` in
+    config.yaml silently does nothing for that entry point, leaving the
+    user unable to opt out of the documented "ONE long-lived container
+    shared across sessions" behavior.
+    """
+    assert "docker_persist_across_processes" in _cli_env_map_keys()
+    assert "docker_persist_across_processes" in _gateway_env_map_keys()
+    assert "docker_persist_across_processes" in _save_config_env_sync_keys()
+    assert "TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES" in _terminal_tool_env_var_names()
+
+
+def test_docker_orphan_reaper_is_bridged_everywhere():
+    """Regression pin for the startup orphan reaper toggle (issue #20561).
+
+    ``terminal.docker_orphan_reaper`` controls whether Hermes sweeps stale
+    Exited containers from prior SIGKILL'd processes at startup.  Same
+    four-site bridge invariant — drift means
+    ``terminal.docker_orphan_reaper: false`` silently does nothing for one
+    entry point, and the reaper either runs when the operator disabled it
+    or fails to run when they enabled it.
+    """
+    assert "docker_orphan_reaper" in _cli_env_map_keys()
+    assert "docker_orphan_reaper" in _gateway_env_map_keys()
+    assert "docker_orphan_reaper" in _save_config_env_sync_keys()
+    assert "TERMINAL_DOCKER_ORPHAN_REAPER" in _terminal_tool_env_var_names()
@@ -44,11 +44,17 @@ def server(hermes_home):
    ):
        mod = importlib.import_module("tui_gateway.server")
        yield mod
+        # Reset module-level session state without re-importing. importlib.reload
+        # would re-register the module's atexit hooks (ThreadPoolExecutor
+        # shutdown, _shutdown_sessions); the duplicates race the stderr
+        # buffer at interpreter shutdown and surface as Fatal Python error:
+        # _enter_buffered_busy. Clearing the per-session dicts gives the
+        # next test a clean slate; _methods is NOT cleared because it's
+        # populated at module import time and re-registration only happens
+        # via reload (which we don't do).
        mod._sessions.clear()
        mod._pending.clear()
        mod._answers.clear()
-        mod._methods.clear()
-        importlib.reload(mod)


@pytest.fixture()
@@ -30,11 +30,17 @@ def server():
        import importlib
        mod = importlib.import_module("tui_gateway.server")
        yield mod
+        # Reset module-level session state without re-importing. importlib.reload
+        # would re-register the module's atexit hooks (ThreadPoolExecutor
+        # shutdown, _shutdown_sessions); the duplicates race the stderr
+        # buffer at interpreter shutdown and surface as Fatal Python error:
+        # _enter_buffered_busy. Clearing the per-session dicts gives the
+        # next test a clean slate; _methods is NOT cleared because it's
+        # populated at module import time and re-registration only happens
+        # via reload (which we don't do).
        mod._sessions.clear()
        mod._pending.clear()
        mod._answers.clear()
-        mod._methods.clear()
-        importlib.reload(mod)


@pytest.fixture()
@@ -34,11 +34,17 @@ def server():

        mod = importlib.import_module("tui_gateway.server")
        yield mod
+        # Reset module-level session state without re-importing. importlib.reload
+        # would re-register the module's atexit hooks (ThreadPoolExecutor
+        # shutdown, _shutdown_sessions); the duplicates race the stderr
+        # buffer at interpreter shutdown and surface as Fatal Python error:
+        # _enter_buffered_busy. Clearing the per-session dicts gives the
+        # next test a clean slate; _methods is NOT cleared because it's
+        # populated at module import time and re-registration only happens
+        # via reload (which we don't do).
        mod._sessions.clear()
        mod._pending.clear()
        mod._answers.clear()
-        mod._methods.clear()
-        importlib.reload(mod)


 def test_init_session_attaches_background_review_callback(server, monkeypatch):
@@ -98,6 +98,167 @@ def _load_hermes_env_vars() -> dict[str, str]:
        return {}


+# Docker label values must match [a-zA-Z0-9_.-] and stay ≤63 chars to round-trip
+# safely through `docker ps --filter label=key=value`. Profile and task names
+# can technically contain other characters; sanitize defensively.
+_LABEL_VALUE_OK_RE = re.compile(r"[^A-Za-z0-9_.-]")
+
+
+def _sanitize_label_value(value: str) -> str:
+    """Coerce *value* into a Docker label-safe form (alnum + ``_.-``, ≤63 chars).
+
+    Empty or all-invalid inputs collapse to ``"unknown"`` so the resulting
+    label is always queryable. Used at container-create time; never round-trip
+    a sanitized value back into application logic.
+    """
+    if not isinstance(value, str) or not value:
+        return "unknown"
+    cleaned = _LABEL_VALUE_OK_RE.sub("_", value)
+    cleaned = cleaned[:63] or "unknown"
+    return cleaned
+
+
+def _get_active_profile_name() -> str:
+    """Return the active Hermes profile name, or ``"default"`` on any error.
+
+    Resolved at container-create time so a single container is permanently
+    tagged with the profile that created it. Profile switches inside the
+    same process don't retroactively relabel running containers.
+    """
+    try:
+        from hermes_cli.profiles import get_active_profile_name
+
+        return get_active_profile_name() or "default"
+    except Exception:
+        return "default"
+
+
+def reap_orphan_containers(
+    *,
+    max_age_seconds: int = 600,
+    profile_filter: str | None = None,
+    docker_exe: str | None = None,
+) -> int:
+    """Remove stale hermes-tagged containers left behind by prior processes.
+
+    Targets containers that match all of:
+
+    * ``label=hermes-agent=1`` (created by this codebase)
+    * ``status=exited`` (running containers are NEVER reaped — they may
+      belong to a sibling Hermes process whose reuse path will pick them
+      up; killing them would crash the sibling mid-command)
+    * (optional) ``label=hermes-profile=<profile_filter>`` (sweep only the
+      caller's profile by default; a hermes process in profile A must not
+      tear down profile B's containers)
+    * ``State.FinishedAt`` older than *max_age_seconds* ago (so a sibling
+      process that just exited and is about to be replaced doesn't get
+      its container yanked out from under it)
+
+    Returns the number of containers removed. Best-effort: any failure
+    (docker daemon unreachable, slow inspect, parse error) is logged at
+    debug level and the function returns whatever it managed before the
+    failure. Safe to call repeatedly; idempotent.
+
+    Issue #20561 — this is the safety net for SIGKILL / OOM / crashed
+    terminal exits that bypass the ``atexit`` cleanup hook. Without it,
+    even with the cleanup-fix in the prior commit, a hard-killed Hermes
+    process leaves its container behind permanently because there's no
+    subsequent Hermes process scheduled to reuse that exact (task, profile)
+    pair.
+    """
+    docker = docker_exe or find_docker() or "docker"
+    filters = ["--filter", "label=hermes-agent=1", "--filter", "status=exited"]
+    if profile_filter:
+        filters.extend(["--filter", f"label=hermes-profile={_sanitize_label_value(profile_filter)}"])
+
+    try:
+        listing = subprocess.run(
+            [docker, "ps", "-a", *filters, "--format", "{{.ID}}"],
+            capture_output=True, text=True, timeout=15, check=False,
+        )
+    except (subprocess.TimeoutExpired, OSError) as e:
+        logger.debug("orphan reaper docker ps failed: %s", e)
+        return 0
+    if listing.returncode != 0:
+        logger.debug(
+            "orphan reaper docker ps returned %d: %s",
+            listing.returncode, listing.stderr.strip(),
+        )
+        return 0
+
+    candidate_ids = [ln.strip() for ln in listing.stdout.splitlines() if ln.strip()]
+    if not candidate_ids:
+        return 0
+
+    # Inspect each candidate to get FinishedAt; reap only those exited
+    # long enough ago.  Doing this per-container (rather than bulk inspect)
+    # keeps the failure blast radius to one container at a time.
+    import datetime
+    now = datetime.datetime.now(datetime.timezone.utc)
+    removed = 0
+    for cid in candidate_ids:
+        finished_at = _container_finished_at(docker, cid)
+        if finished_at is None:
+            # Couldn't determine age — be conservative and leave it alone.
+            continue
+        age = (now - finished_at).total_seconds()
+        if age < max_age_seconds:
+            continue
+        try:
+            result = subprocess.run(
+                [docker, "rm", "-f", cid],
+                capture_output=True, text=True, timeout=30,
+            )
+            if result.returncode == 0:
+                removed += 1
+                logger.info(
+                    "Reaped orphan container %s (exited %d seconds ago)",
+                    cid[:12], int(age),
+                )
+            else:
+                logger.debug(
+                    "docker rm -f %s failed: %s",
+                    cid[:12], result.stderr.strip(),
+                )
+        except (subprocess.TimeoutExpired, OSError) as e:
+            logger.debug("orphan reaper docker rm %s failed: %s", cid[:12], e)
+    return removed
+
+
+def _container_finished_at(docker_exe: str, container_id: str):
+    """Parse ``docker inspect`` FinishedAt for *container_id*.
+
+    Returns a timezone-aware datetime, or ``None`` if the field is missing,
+    unparseable, or the zero-value ``0001-01-01T00:00:00Z`` Docker emits
+    for never-finished containers. ``None`` means "don't reap" — the caller
+    leaves the container alone.
+    """
+    try:
+        result = subprocess.run(
+            [docker_exe, "inspect", "--format", "{{.State.FinishedAt}}", container_id],
+            capture_output=True, text=True, timeout=10, check=False,
+        )
+    except (subprocess.TimeoutExpired, OSError) as e:
+        logger.debug("orphan reaper docker inspect %s failed: %s", container_id[:12], e)
+        return None
+    if result.returncode != 0:
+        return None
+    raw = result.stdout.strip()
+    if not raw or raw.startswith("0001-01-01"):
+        return None
+    # Docker emits RFC3339 with nanoseconds (e.g. "2026-05-28T13:45:00.123456789Z").
+    # Python's fromisoformat handles microseconds but not nanoseconds; trim.
+    import re as _re
+    raw = _re.sub(r"(\.\d{6})\d+", r"\1", raw)
+    raw = raw.replace("Z", "+00:00")
+    try:
+        import datetime
+        return datetime.datetime.fromisoformat(raw)
+    except ValueError as e:
+        logger.debug("could not parse FinishedAt %r for %s: %s", raw, container_id[:12], e)
+        return None
+
+
 def find_docker() -> Optional[str]:
    """Locate the docker (or podman) CLI binary.

@@ -304,15 +465,18 @@ class DockerEnvironment(BaseEnvironment):
        auto_mount_cwd: bool = False,
        run_as_host_user: bool = False,
        extra_args: list = None,
+        persist_across_processes: bool = True,
    ):
        if cwd == "~":
            cwd = "/root"
        super().__init__(cwd=cwd, timeout=timeout)
        self._persistent = persistent_filesystem
+        self._persist_across_processes = persist_across_processes
        self._task_id = task_id
        self._forward_env = _normalize_forward_env_names(forward_env)
        self._env = _normalize_env_dict(env)
        self._container_id: Optional[str] = None
+        self._labels: dict[str, str] = {}
        logger.info(f"DockerEnvironment volumes: {volumes}")
        # Ensure volumes is a list (config.yaml could be malformed)
        if volumes is not None and not isinstance(volumes, list):
@@ -506,25 +670,88 @@ class DockerEnvironment(BaseEnvironment):

        # Start the container directly via `docker run -d`.
        container_name = f"hermes-{uuid.uuid4().hex[:8]}"
-        run_cmd = [
-            self._docker_exe, "run", "-d",
-            "--init",           # tini/catatonit as PID 1 — reaps zombie children
-            "--name", container_name,
-            "-w", cwd,
-            *all_run_args,
-            image,
-            "sleep", "infinity",  # no fixed lifetime — idle reaper handles cleanup
+        # Labels make hermes-created containers identifiable to:
+        #   * the orphan reaper (`hermes-agent=1` for the global sweep filter)
+        #   * future cross-process reuse (`hermes-task-id`, `hermes-profile`)
+        #   * operators running `docker ps --filter label=hermes-agent=1`
+        # Values are limited to the safe character set defined by
+        # _sanitize_label_value(); the active Hermes profile is captured at
+        # container-start time and never changes for the container's lifetime.
+        profile_name = _sanitize_label_value(_get_active_profile_name())
+        task_label = _sanitize_label_value(task_id)
+        label_args = [
+            "--label", "hermes-agent=1",
+            "--label", f"hermes-task-id={task_label}",
+            "--label", f"hermes-profile={profile_name}",
        ]
-        logger.debug(f"Starting container: {' '.join(run_cmd)}")
-        result = subprocess.run(
-            run_cmd,
-            capture_output=True,
-            text=True,
-            timeout=120,  # image pull may take a while
-            check=True,
-        )
-        self._container_id = result.stdout.strip()
-        logger.info(f"Started container {container_name} ({self._container_id[:12]})")
+        self._labels = {
+            "hermes-agent": "1",
+            "hermes-task-id": task_label,
+            "hermes-profile": profile_name,
+        }
+
+        # Cross-process container reuse (issue #20561 — docs claim "ONE long-lived
+        # container shared across sessions").  If a prior Hermes process
+        # already started a container for this (task_id, profile) and it
+        # still exists, attach to it instead of starting a fresh one.  This
+        # restores the documented contract; opt out via
+        # ``terminal.docker_persist_across_processes: false``.
+        #
+        # Reuse matches on labels only — we deliberately do NOT compare image
+        # / mounts / resources.  Operators who need a fresh container after
+        # changing those settings should set ``docker_persist_across_processes:
+        # false`` (or run ``docker rm -f`` against the labeled container) to
+        # force a clean start.
+        reused = False
+        if persist_across_processes:
+            existing = self._find_reusable_container(task_label, profile_name)
+            if existing is not None:
+                container_id, state = existing
+                self._container_id = container_id
+                if state != "running":
+                    try:
+                        subprocess.run(
+                            [self._docker_exe, "start", container_id],
+                            capture_output=True,
+                            text=True,
+                            timeout=30,
+                            check=True,
+                        )
+                    except (subprocess.CalledProcessError, subprocess.TimeoutExpired) as e:
+                        logger.warning(
+                            "Failed to start existing container %s (state=%s): "
+                            "%s — falling back to a fresh container.",
+                            container_id[:12], state, e,
+                        )
+                        self._container_id = None
+                if self._container_id:
+                    logger.info(
+                        "Reusing container %s (task=%s, profile=%s, prior state=%s)",
+                        container_id[:12], task_label, profile_name, state,
+                    )
+                    reused = True
+
+        if not reused:
+            run_cmd = [
+                self._docker_exe, "run", "-d",
+                "--init",           # tini/catatonit as PID 1 — reaps zombie children
+                "--name", container_name,
+                *label_args,
+                "-w", cwd,
+                *all_run_args,
+                image,
+                "sleep", "infinity",  # no fixed lifetime — idle reaper handles cleanup
+            ]
+            logger.debug(f"Starting container: {' '.join(run_cmd)}")
+            result = subprocess.run(
+                run_cmd,
+                capture_output=True,
+                text=True,
+                timeout=120,  # image pull may take a while
+                check=True,
+            )
+            self._container_id = result.stdout.strip()
+            logger.info(f"Started container {container_name} ({self._container_id[:12]})")

        # Build the init-time env forwarding args (used only by init_session
        # to inject host env vars into the snapshot; subsequent commands get
@@ -629,31 +856,191 @@ class DockerEnvironment(BaseEnvironment):
        logger.debug("Docker --storage-opt support: %s", _storage_opt_ok)
        return _storage_opt_ok

-    def cleanup(self):
-        """Stop and remove the container. Bind-mount dirs persist if persistent=True."""
-        if self._container_id:
-            try:
-                # Stop in background so cleanup doesn't block
-                stop_cmd = (
-                    f"(timeout 60 {self._docker_exe} stop {self._container_id} || "
-                    f"{self._docker_exe} rm -f {self._container_id}) >/dev/null 2>&1 &"
-                )
-                subprocess.Popen(stop_cmd, shell=True)
-            except Exception as e:
-                logger.warning("Failed to stop container %s: %s", self._container_id, e)
+    def _find_reusable_container(self, task_label: str, profile_label: str) -> Optional[tuple[str, str]]:
+        """Look for an existing container labeled for this (task, profile).

+        Returns ``(container_id, state)`` on hit, ``None`` on miss / on any
+        failure (including ``docker ps`` itself failing). State is one of the
+        values Docker reports via ``{{.State}}`` — e.g. ``running``, ``exited``,
+        ``created``, ``paused``, ``restarting``, ``dead``. The caller decides
+        whether the state warrants ``docker start`` before reuse.
+
+        Restricted to the docker-stored label set this class creates; never
+        matches containers that happened to be named ``hermes-*`` but were
+        started by some other tool.
+        """
+        try:
+            result = subprocess.run(
+                [
+                    self._docker_exe, "ps", "-a",
+                    "--filter", "label=hermes-agent=1",
+                    "--filter", f"label=hermes-task-id={task_label}",
+                    "--filter", f"label=hermes-profile={profile_label}",
+                    "--format", "{{.ID}}\t{{.State}}",
+                ],
+                capture_output=True,
+                text=True,
+                timeout=10,
+                check=False,
+            )
+        except (subprocess.TimeoutExpired, OSError) as e:
+            logger.debug("docker ps probe failed: %s — will start a fresh container", e)
+            return None
+        if result.returncode != 0:
+            logger.debug(
+                "docker ps probe returned %d: %s — will start a fresh container",
+                result.returncode, result.stderr.strip(),
+            )
+            return None
+        lines = [ln.strip() for ln in result.stdout.splitlines() if ln.strip()]
+        if not lines:
+            return None
+        # Multiple matches are unusual (one (task, profile) should produce one
+        # container) but can happen if a previous Hermes process crashed
+        # mid-cleanup. Prefer a running one if present; otherwise pick the
+        # first listed. Stale duplicates get reaped by the orphan-reaper in a
+        # follow-up commit; we don't try to be heroic about them here.
+        running = None
+        first = None
+        for ln in lines:
+            parts = ln.split("\t", 1)
+            if len(parts) != 2:
+                continue
+            cid, state = parts[0], parts[1].lower()
+            if first is None:
+                first = (cid, state)
+            if state == "running" and running is None:
+                running = (cid, state)
+        return running or first
+
+    def cleanup(self, *, force_remove: bool = False):
+        """Tear down the container according to persist mode and *force_remove*.
+
+        Persist-mode (``persist_across_processes=True``, the default) leaves the
+        container **running** untouched. The docs promise "ONE long-lived
+        container shared across sessions" and stopping it on every Hermes exit
+        breaks that promise:
+
+        * Background processes inside the container (``npm run dev``, watchers,
+          long-running pytest) get killed every time the user runs ``/quit``.
+        * Every reuse requires ``docker start`` + waiting for the container to
+          come back up, adding 1–2s to the first tool call of the new session.
+        * The user-visible difference between "ONE long-lived container" and
+          "a new container that happens to share state" is exactly this:
+          processes survive in the former, die in the latter.
+
+        Resource reclamation for the persist-mode case lives in the
+        ``reap_orphan_containers()`` path (see issue #20561 commit 3): if no
+        Hermes process touches a labeled container for ``2 × lifetime_seconds``
+        it gets ``docker rm -f``'d at the next Hermes startup. That covers the
+        SIGKILL / OOM / abandoned-laptop cases without us needing to stop the
+        container on every graceful exit.
+
+        Opt-out mode (``persist_across_processes=False``) still does
+        ``docker stop`` + ``docker rm -f`` on every cleanup, matching the
+        pre-PR behavior for users who explicitly want per-process isolation.
+
+        ``force_remove=True`` overrides persist mode and always tears the
+        container down (``docker stop`` + ``docker rm -f``). This is the
+        explicit-teardown path for ``/reset``, ``cleanup_vm(task_id)``-driven
+        resets, or any caller that wants a guaranteed fresh container on next
+        ``DockerEnvironment(task_id=...)``. No current caller passes
+        ``force_remove=True``; the parameter is here so the explicit-teardown
+        semantics can be wired up later without changing this method's
+        signature.
+
+        Cleanup runs on a daemon thread with bounded ``subprocess.run`` calls
+        (not the racy ``Popen(... &)`` pattern from before PR #33645). The
+        atexit hook in ``tools/terminal_tool.py`` waits up to 15s for the
+        thread to finish before the interpreter exits, so ``docker stop`` /
+        ``docker rm`` actually completes when we do trigger it.
+        """
+        container_id = self._container_id
+        if not container_id:
+            # Still drop the bind-mount dirs if any were allocated and we're
+            # NOT in persist mode (persist mode preserves them).
            if not self._persistent:
-                # Also schedule removal (stop only leaves it as stopped)
-                try:
-                    subprocess.Popen(
-                        f"sleep 3 && {self._docker_exe} rm -f {self._container_id} >/dev/null 2>&1 &",
-                        shell=True,
-                    )
-                except Exception:
-                    pass
-            self._container_id = None
+                for d in (self._workspace_dir, self._home_dir):
+                    if d:
+                        shutil.rmtree(d, ignore_errors=True)
+            return

-        if not self._persistent:
+        # Decide what to actually do. Three cases:
+        #
+        #   force_remove=True             → stop + rm (explicit teardown)
+        #   persist_across_processes=True → no-op (leave container running)
+        #   persist_across_processes=False → stop + rm (per-process isolation)
+        #
+        # The persist-mode no-op is the issue-#20561 contract: the container
+        # outlives Hermes processes, processes inside it stay alive, and
+        # reuse on next startup is instant.
+        if force_remove:
+            should_stop = True
+            should_remove = True
+        elif self._persist_across_processes:
+            # No-op for the container. Drop the in-process handle so a fresh
+            # __init__ will re-probe via labels (and find the running
+            # container) instead of trying to reuse a stale Python reference.
+            self._container_id = None
+            return
+        else:
+            should_stop = True
+            should_remove = True
+
+        # Capture state needed by the worker before we null out the attrs —
+        # the worker thread can outlive ``self``.
+        docker_exe = self._docker_exe
+        log_id = container_id[:12]
+
+        def _do_cleanup() -> None:
+            if should_stop:
+                try:
+                    subprocess.run(
+                        [docker_exe, "stop", "-t", "10", container_id],
+                        capture_output=True, timeout=30,
+                    )
+                except (subprocess.TimeoutExpired, OSError) as e:
+                    logger.warning("docker stop %s timed out / failed: %s", log_id, e)
+            if should_remove:
+                try:
+                    subprocess.run(
+                        [docker_exe, "rm", "-f", container_id],
+                        capture_output=True, timeout=30,
+                    )
+                except (subprocess.TimeoutExpired, OSError) as e:
+                    logger.warning("docker rm -f %s failed: %s", log_id, e)
+
+        # Daemon thread: doesn't block interpreter exit (atexit returns
+        # promptly), but unlike the old ``Popen(... &)`` shell trick the
+        # Python-level join semantics let the thread actually run to
+        # completion if the interpreter is still alive. atexit registers
+        # ``_atexit_cleanup`` in terminal_tool.py which waits up to ~60s for
+        # outstanding cleanups, so most exits complete the work cleanly.
+        import threading
+        t = threading.Thread(target=_do_cleanup, daemon=True, name=f"hermes-cleanup-{log_id}")
+        t.start()
+        self._cleanup_thread = t
+        self._container_id = None
+
+        # Bind-mount dir teardown only runs when we actually removed the
+        # container (the dirs are the container's filesystem state; keeping
+        # them around with no container would orphan the data on disk).
+        if should_remove and not self._persistent:
            for d in (self._workspace_dir, self._home_dir):
                if d:
                    shutil.rmtree(d, ignore_errors=True)
+
+    def wait_for_cleanup(self, timeout: float = 30.0) -> bool:
+        """Block up to *timeout* seconds for the cleanup worker thread.
+
+        Returns ``True`` if the thread finished (or no thread was started),
+        ``False`` on timeout. The atexit hook in terminal_tool.py calls this
+        on every active environment so docker stop/rm actually completes
+        before the Python process exits — without this, ``hermes /quit``
+        races the interpreter shutdown and leaves stopped containers behind.
+        """
+        thread = getattr(self, "_cleanup_thread", None)
+        if thread is None or not thread.is_alive():
+            return True
+        thread.join(timeout=timeout)
+        return not thread.is_alive()
@@ -422,6 +422,17 @@ def _resolve_stdio_command(command: str, env: dict) -> tuple[str, dict]:
            candidates = [
                os.path.join(hermes_home, "node", "bin", resolved_command),
                os.path.join(os.path.expanduser("~"), ".local", "bin", resolved_command),
+                # /usr/local/bin is the canonical install location for Node on
+                # Linux from-source builds, the upstream node:bookworm-slim
+                # image (which the Hermes Docker image copies node + npm +
+                # corepack from since #4977), and macOS Homebrew on Intel.
+                # Without this candidate, any MCP server configured with an
+                # env.PATH that omits /usr/local/bin (a common pattern when
+                # users hand-author PATH for sandboxing) fails with ENOENT
+                # at execvp, and a naive symlink workaround into the user's
+                # PATH only fails one layer deeper because npx's shebang
+                # re-execs /usr/bin/env node which needs the same directory.
+                os.path.join(os.sep, "usr", "local", "bin", resolved_command),
            ]
            for candidate in candidates:
                if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
@@ -861,6 +861,78 @@ _creation_locks_lock = threading.Lock()  # Protects _creation_locks dict itself
 _cleanup_thread = None
 _cleanup_running = False

+# Once-per-process guard for the docker orphan reaper (issue #20561).
+# Set when _maybe_reap_docker_orphans first runs; concurrent _create_environment
+# calls for parallel subagents won't re-trigger the sweep.
+_docker_orphan_reaper_ran = False
+_docker_orphan_reaper_lock = threading.Lock()
+
+
+def _maybe_reap_docker_orphans(container_config: Dict[str, Any]) -> None:
+    """Run the docker orphan reaper once per process, if enabled.
+
+    Sweeps long-Exited containers labeled ``hermes-agent=1`` for the current
+    profile that match the issue #20561 leak class — containers left behind
+    by Hermes processes that exited without firing ``atexit`` (SIGKILL,
+    OOM, terminal-window-close). The reaper is conservative by default:
+    only Exited containers older than ``2 × lifetime_seconds`` and scoped to
+    the current profile.
+
+    Gates:
+
+    * ``terminal.docker_orphan_reaper: false`` disables it entirely (the
+      operator opted out — usually because they're running multiple
+      Hermes processes in the same profile and don't trust the
+      conservative defaults).
+    * ``_docker_orphan_reaper_ran`` flag — sweep runs once per Python
+      interpreter, not on every subagent / RL-rollout / parallel
+      ``terminal()`` call.
+    """
+    global _docker_orphan_reaper_ran
+    if not container_config.get("docker_orphan_reaper", True):
+        return
+    # Cheap double-checked-locking: read without the lock, take the lock
+    # only on first run, recheck inside.
+    if _docker_orphan_reaper_ran:
+        return
+    with _docker_orphan_reaper_lock:
+        if _docker_orphan_reaper_ran:
+            return
+        _docker_orphan_reaper_ran = True
+
+    # 2 × lifetime_seconds gives sibling Hermes processes a generous grace
+    # window. Floor at 60s so an operator with TERMINAL_LIFETIME_SECONDS=0
+    # doesn't get an instant-reap that races their own setup.
+    # ``container_config`` only carries container_* keys, so read
+    # lifetime_seconds from the env var the rest of the module uses.
+    try:
+        lifetime = int(os.getenv("TERMINAL_LIFETIME_SECONDS", "300"))
+    except (TypeError, ValueError):
+        lifetime = 300
+    lifetime = max(60, lifetime)
+    max_age = lifetime * 2
+
+    try:
+        from tools.environments.docker import (
+            reap_orphan_containers, _get_active_profile_name,
+        )
+    except ImportError:
+        return
+    try:
+        profile = _get_active_profile_name()
+        removed = reap_orphan_containers(
+            max_age_seconds=max_age, profile_filter=profile,
+        )
+        if removed:
+            logger.info(
+                "Docker orphan reaper removed %d stale container(s) for profile %s",
+                removed, profile,
+            )
+    except Exception as e:
+        # Never fail the env-creation path because of a janitor problem.
+        logger.debug("Docker orphan reaper raised: %s", e)
+
+
 # Per-task environment overrides registry.
 # Allows environments (e.g., TerminalBench2Env) to specify a custom Docker/Modal
 # image for a specific task_id BEFORE the agent loop starts. When the terminal or
@@ -1024,6 +1096,22 @@ def _get_env_config() -> Dict[str, Any]:
        "docker_env": _parse_env_var("TERMINAL_DOCKER_ENV", "{}", json.loads, "valid JSON"),
        "docker_run_as_host_user": os.getenv("TERMINAL_DOCKER_RUN_AS_HOST_USER", "false").lower() in {"true", "1", "yes"},
        "docker_extra_args": _parse_env_var("TERMINAL_DOCKER_EXTRA_ARGS", "[]", json.loads, "valid JSON"),
+        # Cross-process container reuse (issue #20561).  The docs claim
+        # "ONE long-lived container shared across sessions" — this toggle
+        # makes that real by probing for a labeled container at startup and
+        # attaching to it instead of always starting a fresh one.  Set to
+        # ``false`` for hard per-process isolation (no reuse, container is
+        # removed on exit).
+        "docker_persist_across_processes": os.getenv(
+            "TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES", "true"
+        ).lower() in {"true", "1", "yes"},
+        # Startup orphan reaper for hermes-tagged containers left behind by
+        # crashed / SIGKILL'd previous processes that bypassed atexit.
+        # Conservative: only sweeps Exited containers older than 2× the
+        # idle-reap window AND scoped to the current profile. Issue #20561.
+        "docker_orphan_reaper": os.getenv(
+            "TERMINAL_DOCKER_ORPHAN_REAPER", "true"
+        ).lower() in {"true", "1", "yes"},
    }


@@ -1072,6 +1160,13 @@ def _create_environment(env_type: str, image: str, cwd: str, timeout: int,
        return _LocalEnvironment(cwd=cwd, timeout=timeout)
    
    elif env_type == "docker":
+        # One-shot orphan reaper: clean up labeled containers left behind by
+        # prior Hermes processes that hit SIGKILL / OOM / a closed terminal
+        # before the atexit cleanup hook could run.  Gated to once per
+        # process so concurrent _create_environment calls (parallel
+        # subagents, RL benchmarks) don't run the reaper N times.
+        # Disable via ``terminal.docker_orphan_reaper: false`` (issue #20561).
+        _maybe_reap_docker_orphans(cc)
        return _DockerEnvironment(
            image=image, cwd=cwd, timeout=timeout,
            cpu=cpu, memory=memory, disk=disk,
@@ -1083,6 +1178,7 @@ def _create_environment(env_type: str, image: str, cwd: str, timeout: int,
            env=docker_env,
            run_as_host_user=cc.get("docker_run_as_host_user", False),
            extra_args=docker_extra_args,
+            persist_across_processes=cc.get("docker_persist_across_processes", True),
        )
    
    elif env_type == "singularity":
@@ -1330,8 +1426,27 @@ def cleanup_all_environments():
    return cleaned


-def cleanup_vm(task_id: str):
-    """Manually clean up a specific environment by task_id."""
+def cleanup_vm(task_id: str, *, force_remove: bool = False):
+    """Manually clean up a specific environment by task_id.
+
+    *force_remove* (default False) is forwarded to backends that accept it
+    — currently only ``DockerEnvironment``. The default of False matches
+    session-lifecycle semantics: this function is called from
+    ``AIAgent.close()`` (TUI session close, gateway session teardown) and the
+    per-turn cleanup branch for non-persistent envs, both of which should
+    honor the user's persist-mode preference. Stopping the container here
+    would defeat the "ONE long-lived container shared across sessions"
+    contract — exactly the bug Ben reported when the container was killed
+    on every TUI session close.
+
+    Pass ``force_remove=True`` for actual user-initiated teardown
+    (e.g. ``/reset``-style flows that haven't been wired yet, or future
+    "destroy my sandbox" commands).
+
+    The idle reaper passes the env through ``env.cleanup()`` directly (not
+    via this function), so persist-mode idle envs are similarly no-op'd —
+    only the orphan reaper at next startup reclaims them.
+    """
    # Remove from tracking dicts while holding the lock, but defer the
    # actual (potentially slow) env.cleanup() call to outside the lock
    # so other tool calls aren't blocked.
@@ -1356,7 +1471,14 @@ def cleanup_vm(task_id: str):

    try:
        if hasattr(env, 'cleanup'):
-            env.cleanup()
+            # Pass force_remove only if the env's cleanup() accepts it
+            # (DockerEnvironment after issue #20561; other backends don't).
+            import inspect
+            sig = inspect.signature(env.cleanup)
+            if "force_remove" in sig.parameters:
+                env.cleanup(force_remove=force_remove)
+            else:
+                env.cleanup()
        elif hasattr(env, 'stop'):
            env.stop()
        elif hasattr(env, 'terminate'):
@@ -1378,7 +1500,23 @@ def _atexit_cleanup():
    if _active_environments:
        count = len(_active_environments)
        logger.info("Shutting down %d remaining sandbox(es)...", count)
+        # Snapshot the env objects BEFORE cleanup_all_environments empties
+        # the dict; we need them to wait on docker cleanup threads after the
+        # registry has been cleared.
+        envs_to_wait = list(_active_environments.values())
        cleanup_all_environments()
+        # Block briefly so docker stop/rm actually completes before the
+        # interpreter exits. Issue #20561 — without this join, the daemon
+        # cleanup threads were getting torn down mid-`docker stop`, leaving
+        # Exited containers piled up on the host.
+        for env in envs_to_wait:
+            wait_fn = getattr(env, "wait_for_cleanup", None)
+            if wait_fn is None:
+                continue
+            try:
+                wait_fn(timeout=15.0)
+            except Exception as e:  # never block shutdown on a bad backend
+                logger.debug("wait_for_cleanup raised on exit: %s", e)

 atexit.register(_atexit_cleanup)

@@ -1746,6 +1884,8 @@ def terminal_tool(
                                "docker_env": config.get("docker_env", {}),
                                "docker_run_as_host_user": config.get("docker_run_as_host_user", False),
                                "docker_extra_args": config.get("docker_extra_args", []),
+                                "docker_persist_across_processes": config.get("docker_persist_across_processes", True),
+                                "docker_orphan_reaper": config.get("docker_orphan_reaper", True),
                            }

                        local_config = None
@@ -1589,7 +1589,7 @@ wheels = [

 [[package]]
 name = "hermes-agent"
-version = "0.15.0"
+version = "0.15.1"
 source = { editable = "." }
 dependencies = [
    { name = "croniter" },
@@ -10,7 +10,7 @@
    "preview": "vite preview"
  },
  "dependencies": {
-    "@nous-research/ui": "0.16.0",
+    "@nous-research/ui": "0.18.2",
    "@observablehq/plot": "^0.6.17",
    "@react-three/fiber": "^9.6.0",
    "@tailwindcss/vite": "^4.2.1",
@@ -50,12 +50,12 @@ import {
 import { Button } from "@nous-research/ui/ui/components/button";
 import { SelectionSwitcher } from "@nous-research/ui/ui/components/selection-switcher";
 import { Spinner } from "@nous-research/ui/ui/components/spinner";
-import { Typography } from "@/components/NouiTypography";
+import { Typography } from "@nous-research/ui/ui/components/typography/index";
 import { cn } from "@/lib/utils";
 import { Backdrop } from "@/components/Backdrop";
 import { SidebarFooter } from "@/components/SidebarFooter";
 import { SidebarStatusStrip, gatewayLine } from "@/components/SidebarStatusStrip";
-import { useBelowBreakpoint } from "@/hooks/useBelowBreakpoint";
+import { useBelowBreakpoint } from "@nous-research/ui/hooks/use-below-breakpoint";
 import { useSidebarStatus } from "@/hooks/useSidebarStatus";
 import { AuthWidget } from "@/components/AuthWidget";
 import { PageHeaderProvider } from "@/contexts/PageHeaderProvider";
@@ -1,7 +1,7 @@
 import { Select, SelectOption } from "@nous-research/ui/ui/components/select";
 import { Switch } from "@nous-research/ui/ui/components/switch";
-import { Input } from "@/components/ui/input";
-import { Label } from "@/components/ui/label";
+import { Input } from "@nous-research/ui/ui/components/input";
+import { Label } from "@nous-research/ui/ui/components/label";

 function FieldHint({ schema, schemaKey }: { schema: Record<string, unknown>; schemaKey: string }) {
  const keyPath = schemaKey.includes(".") ? schemaKey : "";
@@ -1,225 +0,0 @@
-import {
-  type PointerEvent as ReactPointerEvent,
-  type ReactNode,
-  useEffect,
-  useRef,
-  useState,
-} from "react";
-import { createPortal } from "react-dom";
-import { Typography } from "@/components/NouiTypography";
-import { cn, themedBody } from "@/lib/utils";
-
-const CLOSE_DRAG_MIN_PX = 72;
-const CLOSE_DRAG_RATIO = 0.18;
-const SHEET_TRANSITION_MS = 280;
-
-/**
- * Mobile-first picker shell: fixed backdrop + bottom sheet, portaled to `body`
- * so nested overflow/transform in the sidebar cannot clip menus (theme /
- * language switchers). Open/close uses slide + fade; teardown is delayed until
- * the exit animation finishes so animations can complete.
- *
- * Drag the header/handle downward to dismiss (skipped when reduced motion is on).
- */
-export function BottomPickSheet({
-  backdropDismissLabel = "Dismiss",
-  children,
-  onClose,
-  open,
-  title,
-}: BottomPickSheetProps) {
-  const [renderPortal, setRenderPortal] = useState(open);
-  const [entered, setEntered] = useState(false);
-  const [dragOffsetPx, setDragOffsetPx] = useState(0);
-  const [dragActive, setDragActive] = useState(false);
-
-  const closeTimerRef = useRef<ReturnType<typeof setTimeout> | null>(null);
-  const sheetRef = useRef<HTMLDivElement>(null);
-  const dragTrackingRef = useRef(false);
-  const dragStartYRef = useRef(0);
-  const dragOffsetRef = useRef(0);
-
-  const reducedMotion =
-    typeof window !== "undefined" &&
-    window.matchMedia("(prefers-reduced-motion: reduce)").matches;
-
-  const syncDragPx = (next: number) => {
-    dragOffsetRef.current = next;
-    setDragOffsetPx(next);
-  };
-
-  useEffect(() => {
-    if (closeTimerRef.current) {
-      clearTimeout(closeTimerRef.current);
-      closeTimerRef.current = null;
-    }
-
-    const ms = reducedMotion ? 0 : SHEET_TRANSITION_MS;
-
-    let openRafId = 0;
-    let exitRafId = 0;
-
-    if (open) {
-      openRafId = requestAnimationFrame(() => {
-        dragTrackingRef.current = false;
-        dragOffsetRef.current = 0;
-        setDragActive(false);
-        setDragOffsetPx(0);
-        setRenderPortal(true);
-        requestAnimationFrame(() => {
-          requestAnimationFrame(() => setEntered(true));
-        });
-      });
-    } else {
-      exitRafId = requestAnimationFrame(() => {
-        dragTrackingRef.current = false;
-        setDragActive(false);
-        setEntered(false);
-        closeTimerRef.current = window.setTimeout(() => {
-          dragOffsetRef.current = 0;
-          setDragOffsetPx(0);
-          setRenderPortal(false);
-          closeTimerRef.current = null;
-        }, ms);
-      });
-    }
-
-    return () => {
-      cancelAnimationFrame(openRafId);
-      cancelAnimationFrame(exitRafId);
-      if (closeTimerRef.current) {
-        clearTimeout(closeTimerRef.current);
-        closeTimerRef.current = null;
-      }
-    };
-  }, [open, reducedMotion]);
-
-  useEffect(() => {
-    if (!renderPortal) return;
-    const prev = document.body.style.overflow;
-    document.body.style.overflow = "hidden";
-    return () => {
-      document.body.style.overflow = prev;
-    };
-  }, [renderPortal]);
-
-  if (!renderPortal || typeof document === "undefined") return null;
-
-  const durationClass = reducedMotion ? "duration-0" : "duration-[280ms]";
-
-  const draggingVisual = dragActive || dragOffsetPx > 0;
-
-  const onDragPointerDown = (e: ReactPointerEvent<HTMLDivElement>) => {
-    if (reducedMotion || !entered) return;
-    if (e.pointerType === "mouse" && e.button !== 0) return;
-
-    dragTrackingRef.current = true;
-    setDragActive(true);
-    dragStartYRef.current = e.clientY;
-    syncDragPx(0);
-    e.currentTarget.setPointerCapture(e.pointerId);
-  };
-
-  const onDragPointerMove = (e: ReactPointerEvent<HTMLDivElement>) => {
-    if (!dragTrackingRef.current) return;
-    const dy = e.clientY - dragStartYRef.current;
-    const next = Math.max(0, dy);
-    const sheetH = sheetRef.current?.offsetHeight ?? 560;
-    syncDragPx(Math.min(next, sheetH));
-  };
-
-  const endDrag = (e: ReactPointerEvent<HTMLDivElement>) => {
-    if (!dragTrackingRef.current) return;
-    dragTrackingRef.current = false;
-    setDragActive(false);
-    try {
-      e.currentTarget.releasePointerCapture(e.pointerId);
-    } catch {
-      /* already released */
-    }
-
-    const sheetH = sheetRef.current?.offsetHeight ?? 560;
-    const threshold = Math.max(CLOSE_DRAG_MIN_PX, sheetH * CLOSE_DRAG_RATIO);
-    const d = dragOffsetRef.current;
-
-    if (d >= threshold) {
-      onClose();
-      return;
-    }
-    syncDragPx(0);
-  };
-
-  return createPortal(
-    <div className="fixed inset-0 z-[200] flex flex-col justify-end">
-      <button
-        type="button"
-        aria-label={backdropDismissLabel}
-        className={cn(
-          "absolute inset-0 bg-black/55 backdrop-blur-[2px]",
-          "transition-opacity ease-out motion-reduce:transition-none",
-          durationClass,
-          entered ? "opacity-100" : "opacity-0",
-        )}
-        onClick={onClose}
-      />
-
-      <div
-        aria-label={title}
-        aria-modal="true"
-        ref={sheetRef}
-        className={cn(
-          themedBody,
-          "relative flex max-h-[85dvh] min-h-0 flex-col rounded-t-xl border border-current/20",
-          "bg-background-base/98 pb-[max(1rem,env(safe-area-inset-bottom))]",
-          "shadow-[0_-12px_40px_-8px_rgba(0,0,0,0.55)] backdrop-blur-md",
-          "ease-out motion-reduce:transition-none transform-gpu",
-          draggingVisual ? "transition-none" : cn("transition-transform", durationClass),
-          entered ? "translate-y-0" : "translate-y-full",
-        )}
-        role="dialog"
-        style={
-          entered && dragOffsetPx > 0
-            ? { transform: `translateY(${dragOffsetPx}px)` }
-            : undefined
-        }
-      >
-        <div
-          className={cn(
-            "flex shrink-0 flex-col gap-2 border-b border-current/15 px-4 pb-3 pt-2",
-            "touch-none select-none",
-            reducedMotion ? "cursor-default" : "cursor-grab active:cursor-grabbing",
-          )}
-          onPointerCancel={endDrag}
-          onPointerDown={onDragPointerDown}
-          onPointerMove={onDragPointerMove}
-          onPointerUp={endDrag}
-        >
-          <div
-            aria-hidden
-            className="mx-auto h-1 w-10 shrink-0 rounded-full bg-current/20"
-          />
-
-          <Typography
-            mondwest
-            className="text-display text-xs tracking-[0.12em] text-text-tertiary"
-          >
-            {title}
-          </Typography>
-        </div>
-
-        <div className="min-h-0 flex-1 overflow-y-auto overscroll-contain">
-          {children}
-        </div>
-      </div>
-    </div>,
-    document.body,
-  );
-}
-
-interface BottomPickSheetProps {
-  backdropDismissLabel?: string;
-  children: ReactNode;
-  onClose: () => void;
-  open: boolean;
-  title: string;
-}
@@ -25,7 +25,7 @@

 import { Button } from "@nous-research/ui/ui/components/button";
 import { Badge } from "@nous-research/ui/ui/components/badge";
-import { Card } from "@/components/ui/card";
+import { Card } from "@nous-research/ui/ui/components/card";

 import { ModelPickerDialog } from "@/components/ModelPickerDialog";
 import { ToolCall, type ToolEntry } from "@/components/ToolCall";
@@ -1,4 +1,4 @@
-import { ConfirmDialog } from "@/components/ui/confirm-dialog";
+import { ConfirmDialog } from "@nous-research/ui/ui/components/confirm-dialog";
 import { useI18n } from "@/i18n";

 export function DeleteConfirmDialog({
@@ -2,9 +2,9 @@ import { useState, useRef, useEffect } from "react";
 import { createPortal } from "react-dom";
 import { Check } from "lucide-react";
 import { Button } from "@nous-research/ui/ui/components/button";
-import { BottomPickSheet } from "@/components/BottomPickSheet";
-import { Typography } from "@/components/NouiTypography";
-import { useBelowBreakpoint } from "@/hooks/useBelowBreakpoint";
+import { BottomSheet } from "@nous-research/ui/ui/components/bottom-sheet";
+import { Typography } from "@nous-research/ui/ui/components/typography/index";
+import { useBelowBreakpoint } from "@nous-research/ui/hooks/use-below-breakpoint";
 import { useI18n } from "@/i18n/context";
 import { LOCALE_META } from "@/i18n";
 import type { Locale } from "@/i18n";
@@ -87,7 +87,7 @@ export function LanguageSwitcher({ collapsed = false, dropUp = false }: Language
      </Button>

      {useMobileSheet && (
-        <BottomPickSheet
+        <BottomSheet
          backdropDismissLabel={t.common.close}
          onClose={() => setOpen(false)}
          open={open}
@@ -101,7 +101,7 @@ export function LanguageSwitcher({ collapsed = false, dropUp = false }: Language
              setOpen={setOpen}
            />
          </div>
-        </BottomPickSheet>
+        </BottomSheet>
      )}

      {open && !useMobileSheet && (() => {
@@ -2,8 +2,8 @@ import { Button } from "@nous-research/ui/ui/components/button";
 import { Checkbox } from "@nous-research/ui/ui/components/checkbox";
 import { ListItem } from "@nous-research/ui/ui/components/list-item";
 import { Spinner } from "@nous-research/ui/ui/components/spinner";
-import { Label } from "@/components/ui/label";
-import { Input } from "@/components/ui/input";
+import { Input } from "@nous-research/ui/ui/components/input";
+import { Label } from "@nous-research/ui/ui/components/label";
 import type { GatewayClient } from "@/lib/gatewayClient";
 import { Check, Search, X } from "lucide-react";
 import { useEffect, useMemo, useRef, useState } from "react";
@@ -1,63 +0,0 @@
-import { forwardRef, type ElementType, type HTMLAttributes, type ReactNode } from "react";
-import { cn } from "@/lib/utils";
-
-type TypographyProps = HTMLAttributes<HTMLElement> & {
-  as?: ElementType;
-  children?: ReactNode;
-  compressed?: boolean;
-  courier?: boolean;
-  expanded?: boolean;
-  mondwest?: boolean;
-  mono?: boolean;
-  sans?: boolean;
-  variant?: "sm" | "md" | "lg" | "xl";
-};
-
-const variantClasses: Record<NonNullable<TypographyProps["variant"]>, string> = {
-  sm: "leading-[1.4] text-[.9375rem] tracking-[0.1875rem]",
-  md: "text-[2.625rem] leading-[1] tracking-[0.0525rem]",
-  lg: "text-[2.625rem] leading-[1] tracking-[0.0525rem]",
-  xl: "text-[4.5rem] leading-[1] tracking-[0.135rem]",
-};
-
-export const Typography = forwardRef<HTMLElement, TypographyProps>(function Typography(
-  {
-    as: Component = "span",
-    className,
-    compressed,
-    courier,
-    expanded,
-    mondwest,
-    mono,
-    sans,
-    variant,
-    ...props
-  },
-  ref,
-) {
-  const hasFontVariant = compressed || courier || expanded || mondwest || mono || sans;
-
-  return (
-    <Component
-      className={cn(
-        compressed && "font-compressed",
-        courier && "font-courier",
-        expanded && "font-expanded",
-        mondwest && "font-mondwest tracking-[0.1875rem]",
-        mono && "font-mono",
-        (!hasFontVariant || sans) && "font-sans",
-        variant && variantClasses[variant],
-        className,
-      )}
-      ref={ref}
-      {...props}
-    />
-  );
-});
-
-export const H2 = forwardRef<HTMLHeadingElement, Omit<TypographyProps, "as">>(function H2(
-  { className, variant = "lg", ...props },
-  ref,
-) {
-  return <Typography as="h2" className={cn("font-bold", className)} variant={variant} ref={ref} {...props} />;
-});
@@ -3,9 +3,9 @@ import { ExternalLink, X, Check } from "lucide-react";
 import { Button } from "@nous-research/ui/ui/components/button";
 import { CopyButton } from "@nous-research/ui/ui/components/command-block";
 import { Spinner } from "@nous-research/ui/ui/components/spinner";
-import { H2 } from "@/components/NouiTypography";
+import { H2 } from "@nous-research/ui/ui/components/typography/h2";
 import { api, type OAuthProvider, type OAuthStartResponse } from "@/lib/api";
-import { Input } from "@/components/ui/input";
+import { Input } from "@nous-research/ui/ui/components/input";
 import { useI18n } from "@/i18n";
 import { cn, themedBody } from "@/lib/utils";

@@ -16,9 +16,9 @@ import {
  CardDescription,
  CardHeader,
  CardTitle,
-} from "@/components/ui/card";
+} from "@nous-research/ui/ui/components/card";
 import { Badge } from "@nous-research/ui/ui/components/badge";
-import { ConfirmDialog } from "@/components/ui/confirm-dialog";
+import { ConfirmDialog } from "@nous-research/ui/ui/components/confirm-dialog";
 import { OAuthLoginModal } from "@/components/OAuthLoginModal";
 import { useI18n } from "@/i18n";

@@ -2,7 +2,7 @@ import { AlertTriangle, Radio, Wifi, WifiOff } from "lucide-react";
 import type { PlatformStatus } from "@/lib/api";
 import { isoTimeAgo } from "@/lib/utils";
 import { Badge } from "@nous-research/ui/ui/components/badge";
-import { Card, CardContent, CardHeader, CardTitle } from "@/components/ui/card";
+import { Card, CardContent, CardHeader, CardTitle } from "@nous-research/ui/ui/components/card";
 import { useI18n } from "@/i18n";

 export function PlatformsCard({ platforms }: PlatformsCardProps) {
@@ -1,4 +1,4 @@
-import { Typography } from "@/components/NouiTypography";
+import { Typography } from "@nous-research/ui/ui/components/typography/index";
 import type { StatusResponse } from "@/lib/api";
 import { cn } from "@/lib/utils";
 import { useI18n } from "@/i18n";
@@ -3,9 +3,9 @@ import { createPortal } from "react-dom";
 import { Palette, Check } from "lucide-react";
 import { Button } from "@nous-research/ui/ui/components/button";
 import { ListItem } from "@nous-research/ui/ui/components/list-item";
-import { BottomPickSheet } from "@/components/BottomPickSheet";
-import { Typography } from "@/components/NouiTypography";
-import { useBelowBreakpoint } from "@/hooks/useBelowBreakpoint";
+import { BottomSheet } from "@nous-research/ui/ui/components/bottom-sheet";
+import { Typography } from "@nous-research/ui/ui/components/typography/index";
+import { useBelowBreakpoint } from "@nous-research/ui/hooks/use-below-breakpoint";
 import { BUILTIN_THEMES, useTheme } from "@/themes";
 import type { DashboardTheme, ThemeListEntry } from "@/themes";
 import { useI18n } from "@/i18n";
@@ -91,7 +91,7 @@ export function ThemeSwitcher({ collapsed = false, dropUp = false }: ThemeSwitch
      </Button>

      {useMobileSheet && (
-        <BottomPickSheet
+        <BottomSheet
          backdropDismissLabel={t.common.close}
          onClose={close}
          open={open}
@@ -105,7 +105,7 @@ export function ThemeSwitcher({ collapsed = false, dropUp = false }: ThemeSwitch
              themeName={themeName}
            />
          </div>
-        </BottomPickSheet>
+        </BottomSheet>
      )}

      {open && !useMobileSheet && (() => {
@@ -1,40 +0,0 @@
-import { useEffect, useState } from "react";
-import { createPortal } from "react-dom";
-
-export function Toast({ toast }: { toast: { message: string; type: "success" | "error" } | null }) {
-  const [visible, setVisible] = useState(false);
-  const [current, setCurrent] = useState(toast);
-
-  useEffect(() => {
-    if (toast) {
-      setCurrent(toast);
-      setVisible(true);
-    } else {
-      setVisible(false);
-      const timer = setTimeout(() => setCurrent(null), 200);
-      return () => clearTimeout(timer);
-    }
-  }, [toast]);
-
-  if (!current) return null;
-
-  // Portal to document.body so the toast escapes any ancestor stacking context
-  // (e.g. <main> has `relative z-2`, which would trap z-50 below the header's z-40).
-  return createPortal(
-    <div
-      role="status"
-      aria-live="polite"
-      className={`fixed top-16 right-4 z-50 border px-4 py-2.5 font-courier text-xs tracking-wider uppercase backdrop-blur-sm ${
-        current.type === "success"
-          ? "bg-success/15 text-success border-success/30"
-          : "bg-destructive/15 text-destructive border-destructive/30"
-      }`}
-      style={{
-        animation: visible ? "toast-in 200ms ease-out forwards" : "toast-out 200ms ease-in forwards",
-      }}
-    >
-      {current.message}
-    </div>,
-    document.body,
-  );
-}
@@ -1,63 +0,0 @@
-import { cn, themedBody } from "@/lib/utils";
-
-/**
- * Themed card primitive. Themes can restyle every card without touching
- * call sites by setting CSS vars under the `card` component-style bucket:
- *
- *   componentStyles:
- *     card:
- *       clipPath: "polygon(10px 0, 100% 0, 100% calc(100% - 10px), calc(100% - 10px) 100%, 0 100%, 0 10px)"
- *       border: "1px solid var(--color-ring)"
- *       background: "linear-gradient(180deg, var(--color-card) 0%, transparent 100%)"
- *       boxShadow: "0 0 0 1px var(--color-ring) inset, 0 0 24px -8px var(--warm-glow)"
- *
- * All properties are optional — vars that aren't set compute to their
- * CSS initial value, so the default shadcn-y card keeps looking normal
- * for themes that don't override anything.
- */
-const CARD_STYLE: React.CSSProperties = {
-  clipPath: "var(--component-card-clip-path)",
-  borderImage: "var(--component-card-border-image)",
-  background: "var(--component-card-background)",
-  boxShadow: "var(--component-card-box-shadow)",
-};
-
-export function Card({ className, style, ...props }: React.HTMLAttributes<HTMLDivElement>) {
-  return (
-    <div
-      className={cn(
-        "border border-border bg-card/80 text-card-foreground w-full",
-        themedBody,
-        className,
-      )}
-      style={{ ...CARD_STYLE, ...style }}
-      {...props}
-    />
-  );
-}
-
-export function CardHeader({ className, ...props }: React.HTMLAttributes<HTMLDivElement>) {
-  return <div className={cn("flex flex-col gap-1.5 p-4 border-b border-border", className)} {...props} />;
-}
-
-export function CardTitle({ className, ...props }: React.HTMLAttributes<HTMLHeadingElement>) {
-  return (
-    <h3
-      className={cn(
-        "font-mondwest text-display text-sm tracking-[0.12em] text-text-primary",
-        className,
-      )}
-      {...props}
-    />
-  );
-}
-
-export function CardDescription({ className, ...props }: React.HTMLAttributes<HTMLParagraphElement>) {
-  return (
-    <p className={cn("font-mondwest normal-case text-xs text-muted-foreground", className)} {...props} />
-  );
-}
-
-export function CardContent({ className, ...props }: React.HTMLAttributes<HTMLDivElement>) {
-  return <div className={cn("p-4", className)} {...props} />;
-}
@@ -1,137 +0,0 @@
-import { useEffect, useRef } from "react";
-import { createPortal } from "react-dom";
-import { AlertTriangle } from "lucide-react";
-import { Button } from "@nous-research/ui/ui/components/button";
-import { cn, themedBody } from "@/lib/utils";
-
-export function ConfirmDialog({
-  cancelLabel = "Cancel",
-  confirmLabel = "Confirm",
-  description,
-  destructive = false,
-  loading = false,
-  onCancel,
-  onConfirm,
-  open,
-  title,
-}: ConfirmDialogProps) {
-  const dialogRef = useRef<HTMLDivElement>(null);
-
-  // Focus the confirm button when opened; trap ESC to cancel.
-  useEffect(() => {
-    if (!open) return;
-
-    const prevActive = document.activeElement as HTMLElement | null;
-    dialogRef.current
-      ?.querySelector<HTMLButtonElement>("[data-confirm]")
-      ?.focus();
-
-    const onKey = (e: KeyboardEvent) => {
-      if (e.key === "Escape") {
-        e.preventDefault();
-        onCancel();
-      }
-    };
-
-    document.addEventListener("keydown", onKey);
-    const prevOverflow = document.body.style.overflow;
-    document.body.style.overflow = "hidden";
-
-    return () => {
-      document.removeEventListener("keydown", onKey);
-      document.body.style.overflow = prevOverflow;
-      prevActive?.focus?.();
-    };
-  }, [open, onCancel]);
-
-  if (!open) return null;
-
-  return createPortal(
-    <div
-      role="dialog"
-      aria-modal="true"
-      aria-labelledby="confirm-dialog-title"
-      aria-describedby={description ? "confirm-dialog-desc" : undefined}
-      onClick={(e) => {
-        if (e.target === e.currentTarget) onCancel();
-      }}
-      className={cn(
-        "fixed inset-0 z-50 flex items-center justify-center",
-        "bg-black/60 backdrop-blur-sm",
-        "animate-[fade-in_150ms_ease-out]",
-      )}
-    >
-      <div
-        ref={dialogRef}
-        className={cn(
-          themedBody,
-          "relative w-full max-w-md mx-4",
-          "border border-border bg-card shadow-lg",
-          "animate-[dialog-in_180ms_ease-out]",
-        )}
-      >
-        <div className="flex items-start gap-3 p-4 border-b border-border">
-          {destructive && (
-            <div
-              aria-hidden
-              className="mt-0.5 shrink-0 text-destructive"
-            >
-              <AlertTriangle className="h-4 w-4" />
-            </div>
-          )}
-
-          <div className="flex-1 min-w-0 flex flex-col gap-1">
-            <h2
-              id="confirm-dialog-title"
-              className="font-mondwest text-display text-sm font-bold tracking-[0.12em] blend-lighter"
-            >
-              {title}
-            </h2>
-
-            {description && (
-              <p
-                id="confirm-dialog-desc"
-                className="font-mondwest normal-case text-xs text-muted-foreground leading-relaxed"
-              >
-                {description}
-              </p>
-            )}
-          </div>
-        </div>
-
-        <div className="flex items-center justify-end gap-2 p-3">
-          <Button
-            type="button"
-            outlined
-            onClick={onCancel}
-            disabled={loading}
-          >
-            {cancelLabel}
-          </Button>
-          <Button
-            data-confirm
-            type="button"
-            destructive={destructive}
-            onClick={onConfirm}
-            disabled={loading}
-          >
-            {loading ? "…" : confirmLabel}
-          </Button>
-        </div>
-      </div>
-    </div>,
-    document.body,
-  );
-}
-
-interface ConfirmDialogProps {
-  cancelLabel?: string;
-  confirmLabel?: string;
-  description?: string;
-  destructive?: boolean;
-  loading?: boolean;
-  onCancel: () => void;
-  onConfirm: () => void;
-  open: boolean;
-  title: string;
-}
@@ -1,16 +0,0 @@
-import { cn } from "@/lib/utils";
-
-export function Input({ className, ...props }: React.InputHTMLAttributes<HTMLInputElement>) {
-  return (
-    <input
-      className={cn(
-        "flex h-9 w-full border border-border bg-background/40 px-3 py-1 font-courier text-sm transition-colors",
-        "placeholder:text-muted-foreground",
-        "focus-visible:outline-none focus-visible:ring-1 focus-visible:ring-foreground/30 focus-visible:border-foreground/25",
-        "disabled:cursor-not-allowed disabled:opacity-50",
-        className,
-      )}
-      {...props}
-    />
-  );
-}
@@ -1,13 +0,0 @@
-import { cn } from "@/lib/utils";
-
-export function Label({ className, ...props }: React.LabelHTMLAttributes<HTMLLabelElement>) {
-  return (
-    <label
-      className={cn(
-        "font-mondwest text-xs tracking-[0.1em] uppercase leading-none peer-disabled:cursor-not-allowed peer-disabled:opacity-70",
-        className,
-      )}
-      {...props}
-    />
-  );
-}
@@ -1,19 +0,0 @@
-import { cn } from "@/lib/utils";
-
-export function Separator({
-  className,
-  orientation = "horizontal",
-  ...props
-}: React.HTMLAttributes<HTMLDivElement> & { orientation?: "horizontal" | "vertical" }) {
-  return (
-    <div
-      role="separator"
-      className={cn(
-        "shrink-0 bg-border",
-        orientation === "horizontal" ? "h-px w-full" : "h-full w-px",
-        className,
-      )}
-      {...props}
-    />
-  );
-}
@@ -1,7 +1,7 @@
 import { useCallback, useEffect, useState } from "react";
 import { api } from "@/lib/api";
 import type { ActionStatusResponse } from "@/lib/api";
-import { Toast } from "@/components/Toast";
+import { Toast } from "@nous-research/ui/ui/components/toast";
 import { useI18n } from "@/i18n";
 import {
  SystemActionsContext,
@@ -1,19 +0,0 @@
-import { useEffect, useState } from "react";
-
-/** True when viewport width is strictly below `px` (matches Tailwind `min-width: px`). */
-export function useBelowBreakpoint(px: number) {
-  const query = `(max-width: ${px - 1}px)`;
-  const [matches, setMatches] = useState(() =>
-    typeof window !== "undefined" ? window.matchMedia(query).matches : false,
-  );
-
-  useEffect(() => {
-    const mql = window.matchMedia(query);
-    const sync = () => setMatches(mql.matches);
-    sync();
-    mql.addEventListener("change", sync);
-    return () => mql.removeEventListener("change", sync);
-  }, [query]);
-
-  return matches;
-}
@@ -1,41 +0,0 @@
-import { useCallback, useState } from "react";
-
-export function useConfirmDelete<TId>({
-  onDelete,
-}: {
-  onDelete: (id: TId) => Promise<void>;
-}) {
-  const [pendingId, setPendingId] = useState<TId | null>(null);
-  const [isDeleting, setIsDeleting] = useState(false);
-
-  const requestDelete = useCallback((id: TId) => {
-    setPendingId(id);
-  }, []);
-
-  const cancel = useCallback(() => {
-    if (!isDeleting) setPendingId(null);
-  }, [isDeleting]);
-
-  const confirm = useCallback(async () => {
-    if (pendingId === null) return;
-    const id = pendingId;
-    setIsDeleting(true);
-    try {
-      await onDelete(id);
-      setPendingId(null);
-    } catch {
-      // Dialog stays open; caller can surface errors in onDelete before rethrowing
-    } finally {
-      setIsDeleting(false);
-    }
-  }, [pendingId, onDelete]);
-
-  return {
-    cancel,
-    confirm,
-    isDeleting,
-    isOpen: pendingId !== null,
-    pendingId,
-    requestDelete,
-  } as const;
-}
@@ -1,15 +0,0 @@
-import { useCallback, useState } from "react";
-
-export function useToast(duration = 3000) {
-  const [toast, setToast] = useState<{ message: string; type: "success" | "error" } | null>(null);
-
-  const showToast = useCallback(
-    (message: string, type: "success" | "error") => {
-      setToast({ message, type });
-      setTimeout(() => setToast(null), duration);
-    },
-    [duration],
-  );
-
-  return { toast, showToast };
-}
@@ -41,7 +41,11 @@ function setSessionHeader(headers: Headers, token: string): void {
  }
 }

-export async function fetchJSON<T>(url: string, init?: RequestInit): Promise<T> {
+export async function fetchJSON<T>(
+  url: string,
+  init?: RequestInit,
+  options?: FetchJSONOptions,
+): Promise<T> {
  // Inject the session token into all /api/ requests.
  const headers = new Headers(init?.headers);
  const token = window.__HERMES_SESSION_TOKEN__;
@@ -100,7 +104,7 @@ export async function fetchJSON<T>(url: string, init?: RequestInit): Promise<T>
    // that reload once on the first stale-token 401 — gated mode is
    // handled above, so reaching here in gated mode means a real
    // middleware failure that should not reload-loop.
-    if (!window.__HERMES_AUTH_REQUIRED__) {
+    if (!window.__HERMES_AUTH_REQUIRED__ && !options?.allowUnauthorized) {
      let alreadyReloaded = false;
      try {
        alreadyReloaded =
@@ -198,8 +202,19 @@ export const api = {
   * still exists but is never useful there (no Session, no cookie). The
   * AuthWidget component swallows 401s from this call: if the gate isn't
   * engaged, /api/auth/me returns 401 and the widget renders nothing.
+   *
+   * ``allowUnauthorized`` is load-bearing: in loopback mode this endpoint
+   * 401s by design, and fetchJSON's default loopback behaviour treats a
+   * 401 as a rotated session token and full-page-reloads to pick up a
+   * fresh one. Because every *other* dashboard request succeeds (and so
+   * clears the one-shot reload guard), that turns this expected 401 into
+   * an infinite reload loop. Opting out keeps the 401 a plain throw the
+   * widget can catch.
   */
-  getAuthMe: () => fetchJSON<AuthMeResponse>("/api/auth/me"),
+  getAuthMe: () =>
+    fetchJSON<AuthMeResponse>("/api/auth/me", undefined, {
+      allowUnauthorized: true,
+    }),
  logout: () =>
    fetch(`${BASE}/auth/logout`, {
      method: "POST",
@@ -514,6 +529,15 @@ export interface ActionResponse {
  pid: number;
 }

+/** Per-call overrides for {@link fetchJSON}. */
+interface FetchJSONOptions {
+  /** When true, a 401 response is surfaced as a normal thrown error rather
+   *  than triggering the loopback stale-token page reload. Use for probes
+   *  whose 401 is an expected signal (e.g. /api/auth/me in non-gated mode)
+   *  rather than evidence of a rotated session token. */
+  allowUnauthorized?: boolean;
+}
+
 export interface ActionStatusResponse {
  exit_code: number | null;
  lines: string[];
@@ -20,7 +20,7 @@ import { timeAgo } from "@/lib/utils";
 import { Button } from "@nous-research/ui/ui/components/button";
 import { Spinner } from "@nous-research/ui/ui/components/spinner";
 import { Stats } from "@nous-research/ui/ui/components/stats";
-import { Card, CardContent, CardHeader, CardTitle } from "@/components/ui/card";
+import { Card, CardContent, CardHeader, CardTitle } from "@nous-research/ui/ui/components/card";
 import { Badge } from "@nous-research/ui/ui/components/badge";
 import { usePageHeader } from "@/contexts/usePageHeader";
 import { useI18n } from "@/i18n";
@@ -23,7 +23,7 @@ import { WebglAddon } from "@xterm/addon-webgl";
 import { Terminal } from "@xterm/xterm";
 import "@xterm/xterm/css/xterm.css";
 import { Button } from "@nous-research/ui/ui/components/button";
-import { Typography } from "@/components/NouiTypography";
+import { Typography } from "@nous-research/ui/ui/components/typography/index";
 import { HERMES_BASE_PATH, buildWsAuthParam } from "@/lib/api";
 import { cn } from "@/lib/utils";
 import { Copy, PanelRight, X } from "lucide-react";
@@ -38,15 +38,15 @@ import {
 } from "lucide-react";
 import { api } from "@/lib/api";
 import { getNestedValue, setNestedValue } from "@/lib/nested";
-import { useToast } from "@/hooks/useToast";
-import { Toast } from "@/components/Toast";
+import { useToast } from "@nous-research/ui/hooks/use-toast";
+import { Toast } from "@nous-research/ui/ui/components/toast";
 import { AutoField } from "@/components/AutoField";
 import { Button } from "@nous-research/ui/ui/components/button";
 import { ListItem } from "@nous-research/ui/ui/components/list-item";
 import { Spinner } from "@nous-research/ui/ui/components/spinner";
-import { Card, CardContent, CardHeader, CardTitle } from "@/components/ui/card";
-import { ConfirmDialog } from "@/components/ui/confirm-dialog";
-import { Input } from "@/components/ui/input";
+import { Card, CardContent, CardHeader, CardTitle } from "@nous-research/ui/ui/components/card";
+import { ConfirmDialog } from "@nous-research/ui/ui/components/confirm-dialog";
+import { Input } from "@nous-research/ui/ui/components/input";
 import { Badge } from "@nous-research/ui/ui/components/badge";
 import { useI18n } from "@/i18n";
 import { usePageHeader } from "@/contexts/usePageHeader";
@@ -4,17 +4,17 @@ import { Badge } from "@nous-research/ui/ui/components/badge";
 import { Button } from "@nous-research/ui/ui/components/button";
 import { Select, SelectOption } from "@nous-research/ui/ui/components/select";
 import { Spinner } from "@nous-research/ui/ui/components/spinner";
-import { H2 } from "@/components/NouiTypography";
+import { H2 } from "@nous-research/ui/ui/components/typography/h2";
 import { api } from "@/lib/api";
 import type { CronJob, ProfileInfo } from "@/lib/api";
 import { DeleteConfirmDialog } from "@/components/DeleteConfirmDialog";
-import { useToast } from "@/hooks/useToast";
-import { useConfirmDelete } from "@/hooks/useConfirmDelete";
+import { useToast } from "@nous-research/ui/hooks/use-toast";
+import { useConfirmDelete } from "@nous-research/ui/hooks/use-confirm-delete";
 import { useModalBehavior } from "@/hooks/useModalBehavior";
-import { Toast } from "@/components/Toast";
-import { Card, CardContent } from "@/components/ui/card";
-import { Input } from "@/components/ui/input";
-import { Label } from "@/components/ui/label";
+import { Toast } from "@nous-research/ui/ui/components/toast";
+import { Card, CardContent } from "@nous-research/ui/ui/components/card";
+import { Input } from "@nous-research/ui/ui/components/input";
+import { Label } from "@nous-research/ui/ui/components/label";
 import { useI18n } from "@/i18n";
 import { usePageHeader } from "@/contexts/usePageHeader";
 import { PluginSlot } from "@/plugins";
@@ -17,9 +17,9 @@ import {
 import { api } from "@/lib/api";
 import type { EnvVarInfo } from "@/lib/api";
 import { DeleteConfirmDialog } from "@/components/DeleteConfirmDialog";
-import { Toast } from "@/components/Toast";
-import { useConfirmDelete } from "@/hooks/useConfirmDelete";
-import { useToast } from "@/hooks/useToast";
+import { Toast } from "@nous-research/ui/ui/components/toast";
+import { useConfirmDelete } from "@nous-research/ui/hooks/use-confirm-delete";
+import { useToast } from "@nous-research/ui/hooks/use-toast";
 import { OAuthProvidersCard } from "@/components/OAuthProvidersCard";
 import { Button } from "@nous-research/ui/ui/components/button";
 import { ListItem } from "@nous-research/ui/ui/components/list-item";
@@ -30,10 +30,10 @@ import {
  CardDescription,
  CardHeader,
  CardTitle,
-} from "@/components/ui/card";
+} from "@nous-research/ui/ui/components/card";
 import { Badge } from "@nous-research/ui/ui/components/badge";
-import { Input } from "@/components/ui/input";
-import { Label } from "@/components/ui/label";
+import { Input } from "@nous-research/ui/ui/components/input";
+import { Label } from "@nous-research/ui/ui/components/label";
 import { useI18n } from "@/i18n";
 import { usePageHeader } from "@/contexts/usePageHeader";
 import { PluginSlot } from "@/plugins";
@@ -12,8 +12,8 @@ import { Button } from "@nous-research/ui/ui/components/button";
 import { FilterGroup, Segmented } from "@nous-research/ui/ui/components/segmented";
 import { Spinner } from "@nous-research/ui/ui/components/spinner";
 import { Switch } from "@nous-research/ui/ui/components/switch";
-import { Card, CardContent, CardHeader, CardTitle } from "@/components/ui/card";
-import { Label } from "@/components/ui/label";
+import { Card, CardContent, CardHeader, CardTitle } from "@nous-research/ui/ui/components/card";
+import { Label } from "@nous-research/ui/ui/components/label";
 import { useI18n } from "@/i18n";
 import { usePageHeader } from "@/contexts/usePageHeader";
 import { PluginSlot } from "@/plugins";
@@ -24,9 +24,9 @@ import { formatTokenCount } from "@/lib/format";
 import { Button } from "@nous-research/ui/ui/components/button";
 import { Spinner } from "@nous-research/ui/ui/components/spinner";
 import { Stats } from "@nous-research/ui/ui/components/stats";
-import { Card, CardContent, CardHeader, CardTitle } from "@/components/ui/card";
+import { Card, CardContent, CardHeader, CardTitle } from "@nous-research/ui/ui/components/card";
 import { Badge } from "@nous-research/ui/ui/components/badge";
-import { ConfirmDialog } from "@/components/ui/confirm-dialog";
+import { ConfirmDialog } from "@nous-research/ui/ui/components/confirm-dialog";
 import { useModalBehavior } from "@/hooks/useModalBehavior";
 import { usePageHeader } from "@/contexts/usePageHeader";
 import { useI18n } from "@/i18n";
@@ -10,12 +10,12 @@ import { Select, SelectOption } from "@nous-research/ui/ui/components/select";
 import { Switch } from "@nous-research/ui/ui/components/switch";
 import { Spinner } from "@nous-research/ui/ui/components/spinner";
 import { CommandBlock } from "@nous-research/ui/ui/components/command-block";
-import { Card, CardContent, CardHeader, CardTitle } from "@/components/ui/card";
-import { ConfirmDialog } from "@/components/ui/confirm-dialog";
-import { Input } from "@/components/ui/input";
-import { Label } from "@/components/ui/label";
-import { useToast } from "@/hooks/useToast";
-import { Toast } from "@/components/Toast";
+import { Card, CardContent, CardHeader, CardTitle } from "@nous-research/ui/ui/components/card";
+import { ConfirmDialog } from "@nous-research/ui/ui/components/confirm-dialog";
+import { Input } from "@nous-research/ui/ui/components/input";
+import { Label } from "@nous-research/ui/ui/components/label";
+import { useToast } from "@nous-research/ui/hooks/use-toast";
+import { Toast } from "@nous-research/ui/ui/components/toast";
 import { useI18n } from "@/i18n";
 import { PluginSlot } from "@/plugins";
 import { cn } from "@/lib/utils";
@@ -14,19 +14,19 @@ import {
  X,
 } from "lucide-react";
 import spinners from "unicode-animations";
-import { H2 } from "@/components/NouiTypography";
+import { H2 } from "@nous-research/ui/ui/components/typography/h2";
 import { api } from "@/lib/api";
 import type { ProfileInfo } from "@/lib/api";
 import { DeleteConfirmDialog } from "@/components/DeleteConfirmDialog";
-import { useToast } from "@/hooks/useToast";
-import { useConfirmDelete } from "@/hooks/useConfirmDelete";
+import { useToast } from "@nous-research/ui/hooks/use-toast";
+import { useConfirmDelete } from "@nous-research/ui/hooks/use-confirm-delete";
 import { useModalBehavior } from "@/hooks/useModalBehavior";
-import { Toast } from "@/components/Toast";
-import { Card, CardContent } from "@/components/ui/card";
+import { Toast } from "@nous-research/ui/ui/components/toast";
+import { Card, CardContent } from "@nous-research/ui/ui/components/card";
 import { Badge } from "@nous-research/ui/ui/components/badge";
 import { Button } from "@nous-research/ui/ui/components/button";
-import { Input } from "@/components/ui/input";
-import { Label } from "@/components/ui/label";
+import { Input } from "@nous-research/ui/ui/components/input";
+import { Label } from "@nous-research/ui/ui/components/label";
 import { Checkbox } from "@nous-research/ui/ui/components/checkbox";
 import { useI18n } from "@/i18n";
 import { usePageHeader } from "@/contexts/usePageHeader";
@@ -34,18 +34,18 @@ import type {
 import { timeAgo } from "@/lib/utils";
 import { Markdown } from "@/components/Markdown";
 import { PlatformsCard } from "@/components/PlatformsCard";
-import { Toast } from "@/components/Toast";
+import { Toast } from "@nous-research/ui/ui/components/toast";
 import { Button } from "@nous-research/ui/ui/components/button";
 import { ListItem } from "@nous-research/ui/ui/components/list-item";
 import { Segmented } from "@nous-research/ui/ui/components/segmented";
 import { Spinner } from "@nous-research/ui/ui/components/spinner";
 import { Badge } from "@nous-research/ui/ui/components/badge";
-import { Card, CardContent, CardHeader, CardTitle } from "@/components/ui/card";
+import { Card, CardContent, CardHeader, CardTitle } from "@nous-research/ui/ui/components/card";
 import { DeleteConfirmDialog } from "@/components/DeleteConfirmDialog";
-import { useConfirmDelete } from "@/hooks/useConfirmDelete";
-import { Input } from "@/components/ui/input";
+import { useConfirmDelete } from "@nous-research/ui/hooks/use-confirm-delete";
+import { Input } from "@nous-research/ui/ui/components/input";
 import { useSystemActions } from "@/contexts/useSystemActions";
-import { useToast } from "@/hooks/useToast";
+import { useToast } from "@nous-research/ui/hooks/use-toast";
 import { useI18n } from "@/i18n";
 import { usePageHeader } from "@/contexts/usePageHeader";
 import { PluginSlot } from "@/plugins";
@@ -17,16 +17,16 @@ import {
 } from "lucide-react";
 import { api } from "@/lib/api";
 import type { SkillInfo, ToolsetInfo } from "@/lib/api";
-import { useToast } from "@/hooks/useToast";
-import { Toast } from "@/components/Toast";
-import { Card, CardContent, CardHeader, CardTitle } from "@/components/ui/card";
+import { useToast } from "@nous-research/ui/hooks/use-toast";
+import { Toast } from "@nous-research/ui/ui/components/toast";
+import { Card, CardContent, CardHeader, CardTitle } from "@nous-research/ui/ui/components/card";
 import { Badge } from "@nous-research/ui/ui/components/badge";
 import { Button } from "@nous-research/ui/ui/components/button";
 import { ListItem } from "@nous-research/ui/ui/components/list-item";
 import { Spinner } from "@nous-research/ui/ui/components/spinner";
 import { Switch } from "@nous-research/ui/ui/components/switch";
 import { cn } from "@/lib/utils";
-import { Input } from "@/components/ui/input";
+import { Input } from "@nous-research/ui/ui/components/input";
 import { useI18n } from "@/i18n";
 import { usePageHeader } from "@/contexts/usePageHeader";
 import { PluginSlot } from "@/plugins";
@@ -23,10 +23,10 @@ import { Badge } from "@nous-research/ui/ui/components/badge";
 import { Button } from "@nous-research/ui/ui/components/button";
 import { Checkbox } from "@nous-research/ui/ui/components/checkbox";
 import { Select, SelectOption } from "@nous-research/ui/ui/components/select";
-import { Card, CardHeader, CardTitle, CardContent } from "@/components/ui/card";
-import { Input } from "@/components/ui/input";
-import { Label } from "@/components/ui/label";
-import { Separator } from "@/components/ui/separator";
+import { Card, CardHeader, CardTitle, CardContent } from "@nous-research/ui/ui/components/card";
+import { Input } from "@nous-research/ui/ui/components/input";
+import { Label } from "@nous-research/ui/ui/components/label";
+import { Separator } from "@nous-research/ui/ui/components/separator";
 import { Tabs, TabsList, TabsTrigger } from "@nous-research/ui/ui/components/tabs";
 import { useI18n } from "@/i18n";
 import { registerSlot, PluginSlot } from "./slots";
@@ -130,7 +130,7 @@ The agent has the same filesystem access as your user account. Use `hermes tools

 Runs commands inside a Docker container with security hardening (all capabilities dropped, no privilege escalation, PID limits).

-**Single persistent container, not per-command.** Hermes starts ONE long-lived container on first use and routes every terminal, file, and `execute_code` call through `docker exec` into that same container — across sessions, `/new`, `/reset`, and `delegate_task` subagents — for the lifetime of the Hermes process. Working-directory changes, installed packages, and files in `/workspace` carry over from one tool call to the next, just like a local shell. The container is stopped and removed on shutdown. See **Container lifecycle** below for details.
+**Single persistent container, shared across Hermes processes.** Hermes starts ONE long-lived container on first use and routes every terminal, file, and `execute_code` call through `docker exec` into that same container — across sessions, `/new`, `/reset`, and `delegate_task` subagents. Working-directory changes, installed packages, files in `/workspace`, and **background processes** all carry over from one tool call to the next, and from one Hermes process to the next. When you close a TUI session, run `/quit`, or start a new `hermes` invocation, the container keeps running and the next Hermes process reuses it via a labeled lookup. See **Container lifecycle** below for the exact teardown rules.

 ```yaml
 terminal:
@@ -138,8 +138,11 @@ terminal:
  docker_image: "nikolaik/python-nodejs:python3.11-nodejs20"
  docker_mount_cwd_to_workspace: false  # Mount launch dir into /workspace
  docker_run_as_host_user: false   # See "Running container as host user" below
-  docker_forward_env:              # Env vars to forward into container
+  docker_forward_env:              # Host env vars to forward into container
    - "GITHUB_TOKEN"
+  docker_env:                      # Literal env vars to inject (KEY=value)
+    DEBUG: "1"
+    PYTHONUNBUFFERED: "1"
  docker_volumes:                  # Host directory mounts
    - "/home/user/projects:/workspace/projects"
    - "/home/user/data:/data:ro"   # :ro for read-only
@@ -151,14 +154,49 @@ terminal:
  container_cpu: 1                 # CPU cores (0 = unlimited)
  container_memory: 5120           # MB (0 = unlimited)
  container_disk: 51200            # MB (requires overlay2 on XFS+pquota)
-  container_persistent: true       # Persist /workspace and /root across sessions
+  container_persistent: true       # Persist /workspace and /root bind-mount dirs
+
+  # Cross-process container reuse (defaults match the "one long-lived
+  # container shared across sessions" contract — see Container lifecycle).
+  docker_persist_across_processes: true   # Reuse container across Hermes restarts
+  docker_orphan_reaper: true              # Sweep abandoned Exited containers at startup
+
+  # Cross-backend lifecycle settings (apply to docker as well)
+  timeout: 180                     # Per-command timeout in seconds
+  lifetime_seconds: 300            # Idle-reaper window; also feeds 2× orphan-reaper threshold
 ```

+**`docker_env`** vs **`docker_forward_env`**: the former injects literal `KEY=value` pairs you specify in the config (the values live in your `config.yaml` or are passed as a JSON dict via `TERMINAL_DOCKER_ENV='{"DEBUG":"1"}'`). The latter forwards values from your shell or `~/.hermes/.env`, so the actual secret never appears in the config file. Use `docker_forward_env` for tokens and `docker_env` for static knobs the container needs.
+
 **`terminal.docker_extra_args`** (also overridable via `TERMINAL_DOCKER_EXTRA_ARGS='["--gpus=all"]'`) lets you pass arbitrary `docker run` flags that Hermes doesn't surface as first-class keys — `--gpus`, `--network`, `--add-host`, alternative `--security-opt` overrides, etc. Each entry must be a string; the list is appended last to the assembled `docker run` invocation so it can override Hermes' defaults if needed. Use sparingly — flags that conflict with the sandbox hardening (capability drops, `--user`, the workspace bind mount) will silently weaken isolation.

 **Requirements:** Docker Desktop or Docker Engine installed and running. Hermes probes `$PATH` plus common macOS install locations (`/usr/local/bin/docker`, `/opt/homebrew/bin/docker`, Docker Desktop app bundle). Podman is supported out of the box: set `HERMES_DOCKER_BINARY=podman` (or the full path) to force it when both are installed.

-**Container lifecycle:** Hermes reuses a single long-lived container (`docker run -d ... sleep 2h`) for every terminal and file-tool call, across sessions, `/new`, `/reset`, and `delegate_task` subagents, for the lifetime of the Hermes process. Commands run via `docker exec` with a login shell, so working-directory changes, installed packages, and files in `/workspace` all persist from one tool call to the next. The container is stopped and removed on Hermes shutdown (or when the idle-sweep reclaims it).
+#### Container lifecycle
+
+Every Hermes-managed container is tagged with three labels so subsequent processes (and the orphan reaper) can identify it:
+
+- `hermes-agent=1` — marks it as Hermes-managed
+- `hermes-task-id=<sanitized task_id>` — keys the per-task reuse probe
+- `hermes-profile=<sanitized profile name>` — scopes reuse and reaping to the active Hermes profile
+
+On startup, Hermes runs `docker ps --filter label=hermes-task-id=<id> --filter label=hermes-profile=<profile>` and **attaches to the existing container** when it finds one. If the container is `exited` (e.g. after a Docker daemon restart), it's `docker start`'d and reused — filesystem state and any installed packages survive, but in-container background processes do not.
+
+When a Hermes process exits — `/quit`, closing a TUI session, gateway shutdown, even SIGKILL — the cleanup path is a **no-op for the container in default mode**. The container keeps running. The next Hermes process attaches to it in milliseconds via the label probe. This is the behavior the "one long-lived container shared across sessions" contract requires: it's the only way background processes (npm watchers, dev servers, long-running pytest) survive across sessions.
+
+**The container is only torn down (stopped and `docker rm -f`'d) in these cases:**
+
+| Trigger | When it fires |
+|---|---|
+| `docker_persist_across_processes: false` | Explicit per-process isolation. Every `cleanup()` does `stop` + `rm -f`. Matches pre-issue-#20561 behavior. |
+| Idle reaper (`lifetime_seconds`, default 300s) | Only when the env is `persist_across_processes=false`. Persist-mode envs are no-op'd; container survives the idle sweep. |
+| Orphan reaper at next startup | Sweeps **Exited** hermes-labeled containers older than `2 × lifetime_seconds` (default 600s = 10 min), scoped to the current profile. **Running containers are never touched** — sibling-process safety. Set `docker_orphan_reaper: false` to disable. |
+| Direct user action | `docker rm -f`, `docker system prune`, Docker Desktop restart. We don't set `--restart=always`, so a host reboot leaves the container `Exited` (its CoW layer survives and gets reused on next startup, but bg processes are gone). |
+
+Edge cases worth knowing:
+
+- **OOM kill of in-container PID 1** transitions the container to `Exited`. Next reuse will `docker start` it; filesystem state survives, bg processes do not.
+- **Switching profiles** isolates containers from each other — a container labeled `hermes-profile=work` is invisible to a Hermes process running under `hermes-profile=research`. The orphan reaper is profile-scoped too, so cross-profile containers don't get reaped accidentally, but they also won't get cleaned up automatically until you start Hermes again under their original profile.

 Parallel subagents spawned via `delegate_task(tasks=[...])` share this one container — concurrent `cd`, env mutations, and writes to the same path will collide. If a subagent needs an isolated sandbox, it must register a per-task image override via `register_task_env_overrides()`, which RL and benchmark environments (TerminalBench2, HermesSweEnv, etc.) do automatically for their per-task Docker images.

@@ -170,6 +208,29 @@ Parallel subagents spawned via `delegate_task(tasks=[...])` share this one conta

 **Credential forwarding:** Env vars listed in `docker_forward_env` are resolved from your shell environment first, then `~/.hermes/.env`. Skills can also declare `required_environment_variables` which are merged automatically.

+#### Environment variable overrides
+
+Every key under `terminal:` has an env-var override of the form `TERMINAL_<KEY_UPPERCASE>`. The most useful ones for the Docker backend:
+
+| Env var | Maps to | Notes |
+|---|---|---|
+| `TERMINAL_DOCKER_IMAGE` | `docker_image` | Base image |
+| `TERMINAL_DOCKER_FORWARD_ENV` | `docker_forward_env` | JSON array: `'["GITHUB_TOKEN","OPENAI_API_KEY"]'` |
+| `TERMINAL_DOCKER_ENV` | `docker_env` | JSON dict: `'{"DEBUG":"1"}'` |
+| `TERMINAL_DOCKER_VOLUMES` | `docker_volumes` | JSON array of `"host:container[:ro]"` strings |
+| `TERMINAL_DOCKER_EXTRA_ARGS` | `docker_extra_args` | JSON array |
+| `TERMINAL_DOCKER_MOUNT_CWD_TO_WORKSPACE` | `docker_mount_cwd_to_workspace` | `true` / `false` |
+| `TERMINAL_DOCKER_RUN_AS_HOST_USER` | `docker_run_as_host_user` | `true` / `false` |
+| `TERMINAL_DOCKER_PERSIST_ACROSS_PROCESSES` | `docker_persist_across_processes` | `true` / `false` — default `true` |
+| `TERMINAL_DOCKER_ORPHAN_REAPER` | `docker_orphan_reaper` | `true` / `false` — default `true` |
+| `TERMINAL_CONTAINER_CPU` | `container_cpu` | CPU cores |
+| `TERMINAL_CONTAINER_MEMORY` | `container_memory` | MB |
+| `TERMINAL_CONTAINER_DISK` | `container_disk` | MB |
+| `TERMINAL_CONTAINER_PERSISTENT` | `container_persistent` | `true` / `false` — controls the bind-mount workspace dirs, distinct from `docker_persist_across_processes` |
+| `TERMINAL_LIFETIME_SECONDS` | `lifetime_seconds` | Idle reaper window |
+| `TERMINAL_TIMEOUT` | `timeout` | Per-command timeout |
+| `HERMES_DOCKER_BINARY` | _none_ | Force a specific docker/podman binary path |
+
 ### SSH Backend

 Runs commands on a remote server over SSH. Uses ControlMaster for connection reuse (5-minute idle keepalive). Persistent shell is enabled by default — state (cwd, env vars) survives across commands.
@@ -54,12 +54,7 @@ This behavior applies to the s6-based image only. Earlier (tini-based) images st
 :::

 :::note Where gateway logs go
-Inside the s6 image, the supervised gateway's output is tee'd to two destinations:
-
- **`docker logs <container>`** — every line in real time (raw, no extra prefix). This is the same stream you'd get from a foreground gateway, so existing `docker logs --follow` / `--timestamps` / log-shipper integrations work unchanged.
- **`${HERMES_HOME}/logs/gateways/<profile>/current`** (mapped to `~/.hermes/logs/gateways/<profile>/current` on the host via the volume mount) — rotated, with an ISO 8601 timestamp prepended per line. Rotation is 10 archives × 1 MB each, so it can't fill the disk. This is what `hermes logs` reads and what survives container restarts.
-
-The per-profile reconciler keeps a separate audit log at `${HERMES_HOME}/logs/container-boot.log` — one line per profile per container boot, recording whether each gateway was restored to its prior state.
+See the [Where the logs go](#where-the-logs-go) section below for the full routing map (per-profile gateways, dashboard, boot reconciler, container-wide `docker logs`).
 :::

 Note: the API server is gated on `API_SERVER_ENABLED=true`. To expose it beyond `127.0.0.1` inside the container, also set `API_SERVER_HOST=0.0.0.0` and an `API_SERVER_KEY` (minimum 8 characters — generate one with `openssl rand -hex 32`). Example:
@@ -81,7 +76,7 @@ Opening any port on an internet facing machine is a security risk. You should no

 ## Running the dashboard

-The built-in web dashboard runs as an optional side-process inside the same container as the gateway. Set `HERMES_DASHBOARD=1` to run the dashboard on container loopback (`127.0.0.1`) by default:
+The built-in web dashboard runs as a supervised s6-rc service alongside the gateway in the same container. Set `HERMES_DASHBOARD=1` to bring it up:

 ```sh
 docker run -d \
@@ -89,54 +84,38 @@ docker run -d \
  --restart unless-stopped \
  -v ~/.hermes:/opt/data \
  -p 8642:8642 \
+  -p 9119:9119 \
  -e HERMES_DASHBOARD=1 \
  nousresearch/hermes-agent gateway run
 ```

-The entrypoint starts `hermes dashboard` in the background (running as the non-root `hermes` user) before `exec`-ing the main command. Dashboard output is prefixed with `[dashboard]` in `docker logs` so it's easy to separate from gateway logs.
+The dashboard is supervised by s6 — if it crashes, `s6-supervise` restarts it automatically after a short backoff. Dashboard stdout/stderr is forwarded to `docker logs <container>` (no prefix; the gateway's own output now lives in a per-profile s6-log file — see [Where the logs go](#where-the-logs-go) below — so the two streams don't clash).

 | Environment variable | Description | Default |
 |---------------------|-------------|---------|
-| `HERMES_DASHBOARD` | Set to `1` (or `true` / `yes`) to launch the dashboard alongside the main command | *(unset — dashboard not started)* |
-| `HERMES_DASHBOARD_HOST` | Bind address for the dashboard HTTP server | `127.0.0.1` |
+| `HERMES_DASHBOARD` | Set to `1` (or `true` / `yes`) to enable the supervised dashboard service | *(unset — service is registered but stays down)* |
+| `HERMES_DASHBOARD_HOST` | Bind address for the dashboard HTTP server | `0.0.0.0` |
 | `HERMES_DASHBOARD_PORT` | Port for the dashboard HTTP server | `9119` |
 | `HERMES_DASHBOARD_TUI` | Set to `1` to expose the in-browser Chat tab (embedded `hermes --tui` via PTY/WebSocket) | *(unset)* |
 | `HERMES_DASHBOARD_INSECURE` | Set to `1` (or `true` / `yes`) to bind without the OAuth auth gate. Only use on trusted networks behind a reverse proxy without the OAuth contract — the dashboard exposes API keys and session data | *(unset — gate enforced when a `DashboardAuthProvider` is registered)* |

-By default, the dashboard stays on loopback (`127.0.0.1`) to avoid exposing
-the web surface over the network. To publish it intentionally, set
-`HERMES_DASHBOARD_HOST=0.0.0.0`. The dashboard's OAuth auth gate engages
-automatically whenever:
+The dashboard inside the container defaults to binding `0.0.0.0` — without it, the published `-p 9119:9119` port would not be reachable from the host. To restrict the bind to container loopback (for sidecar / reverse-proxy setups), set `HERMES_DASHBOARD_HOST=127.0.0.1`.

-1. The bind host is non-loopback, **and**
+The dashboard's OAuth auth gate engages automatically when both of the following are true:
+
+1. The bind host is non-loopback (e.g. the default `0.0.0.0` inside the container), **and**
 2. A `DashboardAuthProvider` plugin is registered.

-The bundled `dashboard_auth/nous` provider activates whenever
-`HERMES_DASHBOARD_OAUTH_CLIENT_ID` is set (see
-[Web Dashboard → Authentication](features/web-dashboard.md)). With the
-gate engaged, browser callers are redirected to the configured portal's
-OAuth flow before they can reach any protected route.
+The bundled `dashboard_auth/nous` provider activates whenever `HERMES_DASHBOARD_OAUTH_CLIENT_ID` is set (see [Web Dashboard → Authentication](features/web-dashboard.md)). With the gate engaged, browser callers are redirected to the configured portal's OAuth flow before they can reach any protected route.

-If no provider is registered and the bind is non-loopback, the dashboard
-**fails closed at startup** with a specific error pointing at the
-missing env var. To opt out of the gate explicitly — for a trusted-LAN
-deployment behind your own reverse proxy without the OAuth contract —
-set `HERMES_DASHBOARD_INSECURE=1`. This re-enables the legacy "no auth,
-loud warning" mode and is the only path that disables the gate; the bind
-host does not implicitly determine `--insecure` anymore.
+If no provider is registered and the bind is non-loopback, the dashboard **fails closed at startup** with a specific error pointing at the missing env var. To opt out of the gate explicitly — for a trusted-LAN deployment behind your own reverse proxy without the OAuth contract — set `HERMES_DASHBOARD_INSECURE=1`. This is the **only** path that disables the gate; the bind host alone never implies `--insecure` (it used to, but that predated the OAuth gate and silently disabled it on every container-deployed dashboard).

-:::note
-The dashboard runs as a supervised s6 service inside the container. If
-the dashboard process crashes, s6-overlay restarts it automatically
-after a short backoff — you'll see a new PID without needing to
-restart the container. Logs and crash output are visible via
-`docker logs <container>` (s6 forwards service stdout/stderr there).
-
-Running the dashboard as a separate container is not supported: its
-gateway-liveness detection requires a shared PID namespace with the
-gateway process.
+:::warning `HERMES_DASHBOARD_INSECURE=1` exposes API keys
+Opting out of the OAuth gate serves the dashboard's API surface (including model keys and session data) to anyone who can reach the published port. Only enable it when you have your own auth layer in front, or on a trusted LAN you fully control.
 :::

+Running the dashboard as a separate container is not supported: its gateway-liveness detection requires a shared PID namespace with the gateway process.
+
 ## Running interactively (CLI chat)

 To open an interactive chat session against a running data directory:
@@ -179,37 +158,60 @@ Never run two Hermes **gateway** containers against the same data directory simu

 ## Multi-profile support

-Hermes supports [multiple profiles](../reference/profile-commands.md) — separate `~/.hermes/` directories that let you run independent agents (different SOUL, skills, memory, sessions, credentials) from a single installation. **When running under Docker, using Hermes' built-in multi-profile feature is not recommended.**
+Hermes supports [multiple profiles](../reference/profile-commands.md) — separate `~/.hermes/` subdirectories that let you run independent agents (different SOUL, skills, memory, sessions, credentials) from a single installation. **Inside the official Docker image, the s6 supervision tree treats each profile as a first-class supervised service**, so the recommended deployment is **one container hosting all profiles**.

-Instead, the recommended pattern is **one container per profile**, with each container bind-mounting its own host directory as `/opt/data`:
+Each profile created with `hermes profile create <name>` gets:
+
+- A dedicated s6 service slot at `/run/service/gateway-<name>/`, registered dynamically by the runtime — no container rebuild required.
+- Auto-restart on crash, backoff-managed by `s6-supervise`.
+- Per-profile rotated logs at `${HERMES_HOME}/logs/gateways/<name>/current` (10 archives × 1 MB each).
+- State persistence across container restarts: the boot-time reconciler reads `gateway_state.json` from each profile directory and brings the slot back up only for profiles whose last recorded state was `running`. Stopped profiles stay stopped.
+
+The lifecycle commands you'd run on the host work the same way from inside the container:

 ```sh
-# Work profile
-docker run -d \
-  --name hermes-work \
-  --restart unless-stopped \
-  -v ~/.hermes-work:/opt/data \
-  -p 8642:8642 \
-  nousresearch/hermes-agent gateway run
+# Create a profile — registers the gateway-<name> s6 slot.
+docker exec hermes hermes profile create coder

-# Personal profile
-docker run -d \
-  --name hermes-personal \
-  --restart unless-stopped \
-  -v ~/.hermes-personal:/opt/data \
-  -p 8643:8642 \
-  nousresearch/hermes-agent gateway run
+# Start / stop / restart — dispatches s6-svc; the gateway lifecycle survives docker restart.
+docker exec hermes hermes -p coder gateway start
+docker exec hermes hermes -p coder gateway stop
+docker exec hermes hermes -p coder gateway restart
+
+# Status — reports `Manager: s6 (container supervisor)` inside the container.
+docker exec hermes hermes -p coder gateway status
+
+# Remove a profile — tears down the s6 slot too.
+docker exec hermes hermes profile delete coder
 ```

-Why separate containers over profiles in Docker:
+Under the hood, `hermes gateway start/stop/restart` inside the container is intercepted and routed to `s6-svc` against the right service directory; you don't need to learn the s6 commands directly. For raw supervisor state, use `/command/s6-svstat /run/service/gateway-<name>` (note `/command/` is on PATH only for processes spawned by the supervision tree — when calling from `docker exec`, pass the absolute path).

- **Isolation** — each container has its own filesystem, process table, and resource limits. A crash, dependency change, or runaway session in one profile can't affect another.
- **Independent lifecycle** — upgrade, restart, pause, or roll back each agent separately (`docker restart hermes-work` leaves `hermes-personal` untouched).
- **Clean port and network separation** — each gateway binds its own host port; there's no risk of cross-talk between chat platforms or API servers.
- **Simpler mental model** — the container *is* the profile. Backups, migrations, and permissions all follow the bind-mounted directory, with no extra `--profile` flags to remember.
- **Avoids concurrent-write risk** — the warning above about never running two gateways against the same data directory still applies to profiles within a single container.
+### Why one container with many profiles, not many containers

-In Docker Compose, this just means declaring one service per profile with distinct `container_name`, `volumes`, and `ports`:
+Before the s6 migration, "one container per profile" was the recommended pattern because there was no in-container supervisor to manage multiple gateways. With s6 as PID 1, that's no longer necessary, and the single-container layout is simpler in almost every dimension:
+
+| | One container, many profiles | One container per profile |
+|---|---|---|
+| Disk overhead | One image, one bundled venv, one Playwright cache | N images / N caches |
+| Memory overhead | Shared Python interpreter cache, shared node_modules | Duplicated per container |
+| Profile creation | `docker exec ... hermes profile create <name>` (seconds) | New `docker run` invocation + port allocation + bind-mount config |
+| Per-profile crash recovery | `s6-supervise` auto-restart | Docker's `--restart unless-stopped` (slower, kills sibling work) |
+| Logs | Per-profile rotated file via `s6-log`, plus container-boot audit log | `docker logs <name>` per container — no built-in rotation |
+| Backup | One `~/.hermes` directory | N directories to coordinate |
+
+The default profile (`default`) is always registered on first boot, so a fresh container ships with one supervised gateway out of the box. Additional profiles are pure runtime adds.
+
+### When you DO want a separate container
+
+Profile-in-container is the default. Run a separate container per profile only when you have a specific reason:
+
+- **Resource isolation per workload** — e.g. a runaway browser-tool session in profile A shouldn't be able to OOM profile B. Containers give you `--memory` / `--cpus` per profile.
+- **Independent image pinning** — different upstream image tags per workload.
+- **Network segmentation** — distinct Docker networks per profile (e.g. one customer-facing, one internal).
+- **Compliance / blast radius** — distinct credentials never share an OS-level process tree.
+
+In those cases, declare one service per profile with distinct `container_name`, `volumes`, and `ports`:

 ```yaml
 services:
@@ -234,6 +236,24 @@ services:
      - ~/.hermes-personal:/opt/data
 ```

+The warning from [Persistent volumes](#persistent-volumes) still applies: never point two containers at the same `~/.hermes` directory simultaneously. The s6 supervisor inside each container manages its own profile set; cross-container sharing of a data volume corrupts session files and memory stores.
+
+## Where the logs go
+
+The s6 container has four distinct log surfaces, and "why isn't my gateway showing anything in `docker logs`" is a common surprise. Cheatsheet:
+
+| Source | Where it lands | How to read it |
+|---|---|---|
+| **Per-profile gateway** (`hermes gateway run` and per-profile gateways under s6) | Tee'd to two places: `docker logs <container>` (real time, no extra prefix) **and** `${HERMES_HOME}/logs/gateways/<profile>/current` (rotated, ISO-8601 timestamped, 10 archives × 1 MB each) | `docker logs -f hermes` or `tail -F ~/.hermes/logs/gateways/default/current` on the host |
+| **Dashboard** (when `HERMES_DASHBOARD=1`) | `docker logs <container>` (no prefix) | `docker logs -f hermes` — interleaved with gateway lines |
+| **Boot reconciler** (records which profile gateways were restored on each container start) | `${HERMES_HOME}/logs/container-boot.log` (append-only audit log) | `tail -F ~/.hermes/logs/container-boot.log` |
+| **Generic Hermes logs** (`agent.log`, `errors.log`) | `${HERMES_HOME}/logs/` (profile-aware) | `docker exec hermes hermes logs --follow [--level WARNING] [--session <id>]` |
+
+Two practical consequences worth knowing:
+
+- The file copy at `logs/gateways/<profile>/current` is what survives container restarts. `docker logs` only retains output from the current container's lifetime (and is wiped on `docker rm`); the rotated files persist on the bind-mounted volume.
+- The boot reconciler's audit line shape is `<iso-timestamp> profile=<name> prior_state=<state> action=<registered|started>`, so a quick `grep profile=coder ~/.hermes/logs/container-boot.log` reveals when a given profile was last restored and whether s6 auto-started it.
+
 ## Environment variable forwarding

 API keys are read from `/opt/data/.env` inside the container. You can also pass environment variables directly:
@@ -249,7 +269,7 @@ docker run -it --rm \
 Direct `-e` flags override values from `.env`. This is useful for CI/CD or secrets-manager integrations where you don't want keys on disk.

 :::note Looking for Docker as the **terminal backend**?
-This page covers running Hermes itself inside Docker. If you want Hermes to execute the agent's `terminal` / `execute_code` calls inside a Docker sandbox container (one persistent container per Hermes process), that's a separate config block — `terminal.backend: docker` plus `terminal.docker_image`, `terminal.docker_volumes`, `terminal.docker_forward_env`, `terminal.docker_run_as_host_user`, and `terminal.docker_extra_args`. See [Configuration → Docker Backend](configuration.md#docker-backend) for the full set.
+This page covers running Hermes itself inside Docker. If you want Hermes to execute the agent's `terminal` / `execute_code` calls inside a Docker sandbox container (one long-lived container shared across Hermes processes — see issue #20561), that's a separate config block — `terminal.backend: docker` plus `terminal.docker_image`, `terminal.docker_volumes`, `terminal.docker_forward_env`, `terminal.docker_env`, `terminal.docker_run_as_host_user`, `terminal.docker_extra_args`, `terminal.docker_persist_across_processes`, and `terminal.docker_orphan_reaper`. See [Configuration → Docker Backend](configuration.md#docker-backend) for the full set including container-lifecycle rules.
 :::

 ## Docker Compose example
@@ -281,7 +301,7 @@ services:
          cpus: "2.0"
 ```

-Start with `docker compose up -d` and view logs with `docker compose logs -f`. Dashboard output is prefixed with `[dashboard]` so it's easy to filter from gateway logs.
+Start with `docker compose up -d` and view logs with `docker compose logs -f`. The supervised gateway's stdout is also tee'd to `${HERMES_HOME}/logs/gateways/<profile>/current` on the volume — see [Where the logs go](#where-the-logs-go) for the full routing map.

 ## Optional: Linux desktop audio bridge

@@ -415,24 +435,28 @@ The container ENTRYPOINT is now `/init` (s6-overlay), not `/usr/bin/tini`. All f
 Do not override the image entrypoint unless you keep `/init` (or, equivalently, the legacy `docker/entrypoint.sh` shim that forwards to the stage2 hook) in the command chain. s6-overlay's `/init` runs as root so it can chown the volume on first boot, then drops to the `hermes` user via `s6-setuidgid` for every supervised service AND for the main program. Starting `hermes gateway run` as root inside the official image is refused by default because it can leave root-owned files in `/opt/data` and break later dashboard or gateway starts. Set `HERMES_ALLOW_ROOT_GATEWAY=1` only when you intentionally accept that risk.
 :::

-### Per-profile gateway supervision
+### `docker exec` automatically drops to the `hermes` user

-Inside the container, each profile created with `hermes profile create <name>` automatically gets an s6-supervised gateway service registered at `/run/service/gateway-<name>/`. The lifecycle commands you'd run on the host work the same way:
+`docker exec hermes <cmd>` defaults to running as root inside the container, but the image ships a thin shim at `/opt/hermes/bin/hermes` (earliest on PATH) that detects root callers and transparently re-execs through `s6-setuidgid hermes`. So `docker exec hermes login`, `docker exec hermes profile create …`, `docker exec hermes setup`, etc. all write files owned by UID 10000 — i.e. readable by the supervised gateway — with no extra `--user` flag needed. Non-root callers (the supervised processes themselves, `docker exec --user hermes`, kanban subagents inside the container) hit a short-circuit that exec's the venv binary directly, so there's no overhead on the hot paths.
+
+If you specifically need a `docker exec` that retains root semantics (diagnostic sessions, inspecting root-only state, files outside `/opt/data` that root happens to own), opt out per invocation:

 ```sh
-hermes profile create coder            # registers gateway-coder s6 slot
-hermes -p coder gateway start          # s6-svc -u  → supervised gateway
-hermes -p coder gateway stop           # s6-svc -d  → service down
-hermes -p coder gateway restart        # s6-svc -t  → SIGTERM the supervisor
-hermes profile delete coder            # tears down the s6 slot
+docker exec -e HERMES_DOCKER_EXEC_AS_ROOT=1 hermes <cmd>
 ```

+The shim accepts `1` / `true` / `yes` (case-insensitive). Anything else — including typos like `=0` — falls through to the drop, so silent opt-outs aren't possible. If `s6-setuidgid` isn't available (custom builds that stripped s6-overlay), the shim refuses to run as root and exits 126 instead, surfacing the broken privilege model loudly rather than regressing to the historical footgun where `docker exec hermes login` would write `auth.json` as `root:root` and break the supervised gateway's auth on every chat platform message.
+
+### Per-profile gateway supervision
+
+Each profile created with `hermes profile create <name>` automatically gets an s6-supervised gateway service registered at `/run/service/gateway-<name>/`, with state-persistent auto-restart across container restarts. See [Multi-profile support](#multi-profile-support) above for the user-facing workflow and the lifecycle commands.
+
 **Supervision benefits over the pre-s6 image:**

 - Gateway crashes are auto-restarted by `s6-supervise` after a ~1s backoff.
- Dashboard crashes are auto-restarted (set `HERMES_DASHBOARD=1` to start it).
+- Dashboard, when enabled with `HERMES_DASHBOARD=1`, is supervised on the same supervision tree and gets the same auto-restart treatment.
 - `docker restart` preserves running gateways: the cont-init reconciler reads `$HERMES_HOME/profiles/<name>/gateway_state.json` and brings the slot back up if the last recorded state was `running`. Stopped gateways stay stopped.
- Per-profile gateway logs persist under `$HERMES_HOME/logs/gateways/<profile>/current` (rotated by `s6-log`), and the reconciler's actions are appended to `$HERMES_HOME/logs/container-boot.log` per boot.
+- Per-profile gateway logs persist under `$HERMES_HOME/logs/gateways/<profile>/current` (rotated by `s6-log`), and the reconciler's actions are appended to `$HERMES_HOME/logs/container-boot.log` per boot. See [Where the logs go](#where-the-logs-go) for the full routing map.

 `hermes status` inside the container reports `Manager: s6 (container supervisor)`. Use `/command/s6-svstat /run/service/gateway-<name>` for the raw supervisor view (note `/command/` is on PATH for supervision-tree processes only; pass the absolute path when calling from `docker exec`).

@@ -692,6 +716,8 @@ The container's stage2 hook drops privileges to the non-root `hermes` user (UID
 chmod -R 755 ~/.hermes
 ```

+`docker exec hermes <cmd>` automatically drops to UID 10000 too — see [`docker exec` automatically drops to the `hermes` user](#docker-exec-automatically-drops-to-the-hermes-user) for details and the per-invocation opt-out.
+
 ### Browser tools not working

 Playwright needs shared memory. Add `--shm-size=1g` to your Docker run command:
@@ -500,7 +500,7 @@ export default function SkillsDashboard() {
  const sources = useMemo(() => {
    const set = new Set(allSkillsLocal.map((s) => s.source));
    return SOURCE_ORDER.filter((s) => s === "all" || set.has(s));
-  }, []);
+  }, [allSkillsLocal]);

  const categoryEntries = useMemo(() => {
    const pool =
@@ -523,7 +523,7 @@ export default function SkillsDashboard() {
    return Array.from(map.entries())
      .sort((a, b) => b[1].count - a[1].count)
      .map(([key, { label, count }]) => ({ key, label, count }));
-  }, [sourceFilter]);
+  }, [sourceFilter, allSkillsLocal]);

  const filtered = useMemo(() => {
    const q = debouncedSearch.toLowerCase().trim();