fix: follow-up for salvaged PR #8952

- Rename provider_contracts.py -> volcengine_byteplus.py for explicitness
- Consolidate duplicate host-to-provider mappings: provider_for_base_url() now uses the canonical _URL_TO_PROVIDER from model_metadata.py instead of maintaining a separate 20-entry dict
- Add volcengine/byteplus to runtime_provider.py model-dependent base URL resolution (kimi-style special case) so manually-edited configs resolve the coding-plan base URL correctly
- Remove volcengine/byteplus from _API_KEY_PROVIDER_AUX_MODELS: the main-model-first design in _resolve_auto() handles these providers already; entries were dead code in the normal flow
- Add VOLCENGINE_API_KEY and BYTEPLUS_API_KEY to OPTIONAL_ENV_VARS in config.py so they appear in hermes setup
- Update docs: environment-variables.md, fallback-providers.md, configuration.md
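
The consolidated host-to-provider lookup mentioned in the message can be sketched roughly as follows. This is an illustration only, not the code from this commit: the two host entries are copied from the `_URL_TO_PROVIDER` additions in the diff, while the names `_URL_TO_PROVIDER_SKETCH` and `provider_for_base_url_sketch`, the function body, and the subdomain-matching rule are assumptions.

```python
from typing import Optional
from urllib.parse import urlparse

# Two entries copied from the _URL_TO_PROVIDER additions in this commit;
# the real table in model_metadata.py contains many more hosts.
_URL_TO_PROVIDER_SKETCH = {
    "ark.cn-beijing.volces.com": "volcengine",
    "ark.ap-southeast.bytepluses.com": "byteplus",
}


def provider_for_base_url_sketch(base_url: str) -> Optional[str]:
    """Hypothetical lookup: map a base URL's host to a provider id."""
    if not base_url:
        return None
    # urlparse only fills in .hostname when a scheme (or leading //) is present.
    candidate = base_url if "//" in base_url else f"//{base_url}"
    host = (urlparse(candidate).hostname or "").lower()
    if not host:
        return None
    for known_host, provider in _URL_TO_PROVIDER_SKETCH.items():
        # Match the exact host or a subdomain of it (assumed rule).
        if host == known_host or host.endswith("." + known_host):
            return provider
    return None
```

Keeping a single table means a new host only has to be registered once; any caller that needs the reverse mapping reads the same dict instead of carrying its own copy.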
feat(providers): add Volcengine and BytePlus support
2026-04-22 22:42:39 +05:30 · 2026-04-22 22:33:06 +05:30
155 changed files with 5495 additions and 12568 deletions
@@ -243,17 +243,6 @@ npm run fmt       # prettier
 npm test          # vitest
 ```

-### TUI in the Dashboard (`hermes dashboard` → `/chat`)
-
-The dashboard embeds the real `hermes --tui` — **not** a rewrite.  See `hermes_cli/pty_bridge.py` + the `@app.websocket("/api/pty")` endpoint in `hermes_cli/web_server.py`.
-
- Browser loads `web/src/pages/ChatPage.tsx`, which mounts xterm.js's `Terminal` with the WebGL renderer, `@xterm/addon-fit` for container-driven resize, and `@xterm/addon-unicode11` for modern wide-character widths.
- `/api/pty?token=…` upgrades to a WebSocket; auth uses the same ephemeral `_SESSION_TOKEN` as REST, via query param (browsers can't set `Authorization` on WS upgrade).
- The server spawns whatever `hermes --tui` would spawn, through `ptyprocess` (POSIX PTY — WSL works, native Windows does not).
- Frames: raw PTY bytes each direction; resize via `\x1b[RESIZE:<cols>;<rows>]` intercepted on the server and applied with `TIOCSWINSZ`.
-
-**Never add a parallel chat surface in React.** If you catch yourself re-implementing slash popover / model picker / tool cards for the dashboard, stop — the TUI already does those, and anything new you add to Ink will appear in the dashboard automatically.
-
 ---

 ## Adding New Tools
@@ -88,7 +88,7 @@ cp cli-config.yaml.example ~/.hermes/config.yaml
 touch ~/.hermes/.env

 # Add at minimum an LLM provider key:
-echo "OPENROUTER_API_KEY=***" >> ~/.hermes/.env
+echo 'OPENROUTER_API_KEY=sk-or-v1-your-key' >> ~/.hermes/.env
 ```

 ### Run
@@ -12,7 +12,7 @@ ENV PLAYWRIGHT_BROWSERS_PATH=/opt/hermes/.playwright
 # Install system dependencies in one layer, clear APT cache
 RUN apt-get update && \
    apt-get install -y --no-install-recommends \
-        build-essential nodejs npm python3 ripgrep ffmpeg gcc python3-dev libffi-dev procps git openssh-client docker-cli && \
+        build-essential nodejs npm python3 ripgrep ffmpeg gcc python3-dev libffi-dev procps git && \
    rm -rf /var/lib/apt/lists/*

 # Non-root user for runtime; UID can be overridden via HERMES_UID at runtime
@@ -50,6 +50,5 @@ RUN uv venv && \
 # ---------- Runtime ----------
 ENV HERMES_WEB_DIST=/opt/hermes/hermes_cli/web_dist
 ENV HERMES_HOME=/opt/data
-ENV PATH="/opt/data/.local/bin:${PATH}"
 VOLUME [ "/opt/data" ]
 ENTRYPOINT [ "/opt/hermes/docker/entrypoint.sh" ]
@@ -13,7 +13,7 @@

 **The self-improving AI agent built by [Nous Research](https://nousresearch.com).** It's the only agent with a built-in learning loop — it creates skills from experience, improves them during use, nudges itself to persist knowledge, searches its own past conversations, and builds a deepening model of who you are across sessions. Run it on a $5 VPS, a GPU cluster, or serverless infrastructure that costs nearly nothing when idle. It's not tied to your laptop — talk to it from Telegram while it works on a cloud VM.

-Use any model you want — [Nous Portal](https://portal.nousresearch.com), [OpenRouter](https://openrouter.ai) (200+ models), [NVIDIA NIM](https://build.nvidia.com) (Nemotron), [Xiaomi MiMo](https://platform.xiaomimimo.com), [z.ai/GLM](https://z.ai), [Kimi/Moonshot](https://platform.moonshot.ai), [MiniMax](https://www.minimax.io), [Hugging Face](https://huggingface.co), OpenAI, or your own endpoint. Switch with `hermes model` — no code changes, no lock-in.
+Use any model you want — [Nous Portal](https://portal.nousresearch.com), [OpenRouter](https://openrouter.ai) (200+ models), [Volcengine](https://www.volcengine.com/product/ark), [BytePlus](https://www.byteplus.com/en/product/modelark), [NVIDIA NIM](https://build.nvidia.com) (Nemotron), [Xiaomi MiMo](https://platform.xiaomimimo.com), [z.ai/GLM](https://z.ai), [Kimi/Moonshot](https://platform.moonshot.ai), [MiniMax](https://www.minimax.io), [Hugging Face](https://huggingface.co), OpenAI, or your own endpoint. Switch with `hermes model` — no code changes, no lock-in.

 <table>
 <tr><td><b>A real terminal interface</b></td><td>Full TUI with multiline editing, slash-command autocomplete, conversation history, interrupt-and-redirect, and streaming tool output.</td></tr>
@@ -173,6 +173,7 @@ python -m pytest tests/ -q
 - 💬 [Discord](https://discord.gg/NousResearch)
 - 📚 [Skills Hub](https://agentskills.io)
 - 🐛 [Issues](https://github.com/NousResearch/hermes-agent/issues)
+- 💡 [Discussions](https://github.com/NousResearch/hermes-agent/discussions)
 - 🔌 [HermesClaw](https://github.com/AaronWong1999/hermesclaw) — Community WeChat bridge: Run Hermes Agent and OpenClaw on the same WeChat account.

 ---
@@ -117,63 +117,6 @@ def _get_anthropic_max_output(model: str) -> int:
    return best_val


-def _resolve_positive_anthropic_max_tokens(value) -> Optional[int]:
-    """Return ``value`` floored to a positive int, or ``None`` if it is not a
-    finite positive number. Ported from openclaw/openclaw#66664.
-
-    Anthropic's Messages API rejects ``max_tokens`` values that are 0,
-    negative, non-integer, or non-finite with HTTP 400. Python's ``or``
-    idiom (``max_tokens or fallback``) correctly catches ``0`` but lets
-    negative ints and fractional floats (``-1``, ``0.5``) through to the
-    API, producing a user-visible failure instead of a local error.
-    """
-    # Booleans are a subclass of int — exclude explicitly so ``True`` doesn't
-    # silently become 1 and ``False`` doesn't become 0.
-    if isinstance(value, bool):
-        return None
-    if not isinstance(value, (int, float)):
-        return None
-    try:
-        import math
-        if not math.isfinite(value):
-            return None
-    except Exception:
-        return None
-    floored = int(value)  # truncates toward zero for floats
-    return floored if floored > 0 else None
-
-
-def _resolve_anthropic_messages_max_tokens(
-    requested,
-    model: str,
-    context_length: Optional[int] = None,
-) -> int:
-    """Resolve the ``max_tokens`` budget for an Anthropic Messages call.
-
-    Prefers ``requested`` when it is a positive finite number; otherwise
-    falls back to the model's output ceiling. Raises ``ValueError`` if no
-    positive budget can be resolved (should not happen with current model
-    table defaults, but guards against a future regression where
-    ``_get_anthropic_max_output`` could return ``0``).
-
-    Separately, callers apply a context-window clamp — this resolver does
-    not, to keep the positive-value contract independent of endpoint
-    specifics.
-
-    Ported from openclaw/openclaw#66664 (resolveAnthropicMessagesMaxTokens).
-    """
-    resolved = _resolve_positive_anthropic_max_tokens(requested)
-    if resolved is not None:
-        return resolved
-    fallback = _get_anthropic_max_output(model)
-    if fallback > 0:
-        return fallback
-    raise ValueError(
-        f"Anthropic Messages adapter requires a positive max_tokens value for "
-        f"model {model!r}; got {requested!r} and no model default resolved."
-    )
-
-
 def _supports_adaptive_thinking(model: str) -> bool:
    """Return True for Claude 4.6+ models that support adaptive thinking."""
    return any(v in model for v in _ADAPTIVE_THINKING_SUBSTRINGS)
@@ -1448,12 +1391,7 @@ def build_anthropic_kwargs(

    model = normalize_model_name(model, preserve_dots=preserve_dots)
    # effective_max_tokens = output cap for this call (≠ total context window)
-    # Use the resolver helper so non-positive values (negative ints,
-    # fractional floats, NaN, non-numeric) fail locally with a clear error
-    # rather than 400-ing at the Anthropic API. See openclaw/openclaw#66664.
-    effective_max_tokens = _resolve_anthropic_messages_max_tokens(
-        max_tokens, model, context_length=context_length
-    )
+    effective_max_tokens = max_tokens or _get_anthropic_max_output(model)

    # Clamp output cap to fit inside the total context window.
    # Only matters for small custom endpoints where context_length < native
@@ -1666,3 +1604,42 @@ def normalize_anthropic_response(
        ),
        finish_reason,
    )
+
+
+def normalize_anthropic_response_v2(
+    response,
+    strip_tool_prefix: bool = False,
+) -> "NormalizedResponse":
+    """Normalize Anthropic response to NormalizedResponse.
+
+    Wraps the existing normalize_anthropic_response() and maps its output
+    to the shared transport types.  This allows incremental migration —
+    one call site at a time — without changing the original function.
+    """
+    from agent.transports.types import NormalizedResponse, build_tool_call
+
+    assistant_msg, finish_reason = normalize_anthropic_response(response, strip_tool_prefix)
+
+    tool_calls = None
+    if assistant_msg.tool_calls:
+        tool_calls = [
+            build_tool_call(
+                id=tc.id,
+                name=tc.function.name,
+                arguments=tc.function.arguments,
+            )
+            for tc in assistant_msg.tool_calls
+        ]
+
+    provider_data = {}
+    if getattr(assistant_msg, "reasoning_details", None):
+        provider_data["reasoning_details"] = assistant_msg.reasoning_details
+
+    return NormalizedResponse(
+        content=assistant_msg.content,
+        tool_calls=tool_calls,
+        finish_reason=finish_reason,
+        reasoning=getattr(assistant_msg, "reasoning", None),
+        usage=None,  # Anthropic usage is on the raw response, not the normaliser
+        provider_data=provider_data or None,
+    )
@@ -74,6 +74,10 @@ _PROVIDER_ALIASES = {
    "minimax_cn": "minimax-cn",
    "claude": "anthropic",
    "claude-code": "anthropic",
+    "volcengine-coding-plan": "volcengine",
+    "volcengine_coding_plan": "volcengine",
+    "byteplus-coding-plan": "byteplus",
+    "byteplus_coding_plan": "byteplus",
 }


@@ -64,47 +64,6 @@ _CHARS_PER_TOKEN = 4
 _SUMMARY_FAILURE_COOLDOWN_SECONDS = 600


-def _content_text_for_contains(content: Any) -> str:
-    """Return a best-effort text view of message content.
-
-    Used only for substring checks when we need to know whether we've already
-    appended a note to a message. Keeps multimodal lists intact elsewhere.
-    """
-    if content is None:
-        return ""
-    if isinstance(content, str):
-        return content
-    if isinstance(content, list):
-        parts: list[str] = []
-        for item in content:
-            if isinstance(item, str):
-                parts.append(item)
-            elif isinstance(item, dict):
-                text = item.get("text")
-                if isinstance(text, str):
-                    parts.append(text)
-        return "\n".join(part for part in parts if part)
-    return str(content)
-
-
-def _append_text_to_content(content: Any, text: str, *, prepend: bool = False) -> Any:
-    """Append or prepend plain text to message content safely.
-
-    Compression sometimes needs to add a note or merge a summary into an
-    existing message. Message content may be plain text or a multimodal list of
-    blocks, so direct string concatenation is not always safe.
-    """
-    if content is None:
-        return text
-    if isinstance(content, str):
-        return text + content if prepend else content + text
-    if isinstance(content, list):
-        text_block = {"type": "text", "text": text}
-        return [text_block, *content] if prepend else [*content, text_block]
-    rendered = str(content)
-    return text + rendered if prepend else rendered + text
-
-
 def _truncate_tool_call_args_json(args: str, head_chars: int = 200) -> str:
    """Shrink long string values inside a tool-call arguments JSON blob while
    preserving JSON validity.
@@ -848,7 +807,7 @@ The user has requested that this compaction PRIORITISE preserving all informatio
                )
                self.summary_model = ""  # empty = use main model
                self._summary_failure_cooldown_until = 0.0  # no cooldown
-                return self._generate_summary(turns_to_summarize, focus_topic=focus_topic)  # retry immediately
+                return self._generate_summary(turns_to_summarize)  # retry immediately

            # Transient errors (timeout, rate limit, network) — shorter cooldown
            _transient_cooldown = 60
@@ -1185,13 +1144,10 @@ The user has requested that this compaction PRIORITISE preserving all informatio
        for i in range(compress_start):
            msg = messages[i].copy()
            if i == 0 and msg.get("role") == "system":
-                existing = msg.get("content")
+                existing = msg.get("content") or ""
                _compression_note = "[Note: Some earlier conversation turns have been compacted into a handoff summary to preserve context space. The current session state may still reflect earlier work, so build on that summary and state rather than re-doing work.]"
-                if _compression_note not in _content_text_for_contains(existing):
-                    msg["content"] = _append_text_to_content(
-                        existing,
-                        "\n\n" + _compression_note if isinstance(existing, str) and existing else _compression_note,
-                    )
+                if _compression_note not in existing:
+                    msg["content"] = existing + "\n\n" + _compression_note
            compressed.append(msg)

        # If LLM summary failed, insert a static fallback so the model
@@ -1235,15 +1191,12 @@ The user has requested that this compaction PRIORITISE preserving all informatio
        for i in range(compress_end, n_messages):
            msg = messages[i].copy()
            if _merge_summary_into_tail and i == compress_end:
-                merged_prefix = (
+                original = msg.get("content") or ""
+                msg["content"] = (
                    summary
                    + "\n\n--- END OF CONTEXT SUMMARY — "
                    "respond to the message below, not the summary above ---\n\n"
-                )
-                msg["content"] = _append_text_to_content(
-                    msg.get("content"),
-                    merged_prefix,
-                    prepend=True,
+                    + original
                )
                _merge_summary_into_tail = False
            compressed.append(msg)
@@ -220,25 +220,12 @@ _TRANSPORT_ERROR_TYPES = frozenset({
    "ConnectionAbortedError", "BrokenPipeError",
    "TimeoutError", "ReadError",
    "ServerDisconnectedError",
-    # SSL/TLS transport errors — transient mid-stream handshake/record
-    # failures that should retry rather than surface as a stalled session.
-    # ssl.SSLError subclasses OSError (caught by isinstance) but we list
-    # the type names here so provider-wrapped SSL errors (e.g. when the
-    # SDK re-raises without preserving the exception chain) still classify
-    # as transport rather than falling through to the unknown bucket.
-    "SSLError", "SSLZeroReturnError", "SSLWantReadError",
-    "SSLWantWriteError", "SSLEOFError", "SSLSyscallError",
    # OpenAI SDK errors (not subclasses of Python builtins)
    "APIConnectionError",
    "APITimeoutError",
 })

-# Server disconnect patterns (no status code, but transport-level).
-# These are the "ambiguous" patterns — a plain connection close could be
-# transient transport hiccup OR server-side context overflow rejection
-# (common when the API gateway disconnects instead of returning an HTTP
-# error for oversized requests).  A large session + one of these patterns
-# triggers the context-overflow-with-compression recovery path.
+# Server disconnect patterns (no status code, but transport-level)
 _SERVER_DISCONNECT_PATTERNS = [
    "server disconnected",
    "peer closed connection",
@@ -249,40 +236,6 @@ _SERVER_DISCONNECT_PATTERNS = [
    "incomplete chunked read",
 ]

-# SSL/TLS transient failure patterns — intentionally distinct from
-# _SERVER_DISCONNECT_PATTERNS above.
-#
-# An SSL alert mid-stream is almost always a transport-layer hiccup
-# (flaky network, mid-session TLS renegotiation failure, load balancer
-# dropping the connection) — NOT a server-side context overflow signal.
-# So we want the retry path but NOT the compression path; lumping these
-# into _SERVER_DISCONNECT_PATTERNS would trigger unnecessary (and
-# expensive) context compression on any large-session SSL hiccup.
-#
-# The OpenSSL library constructs error codes by prepending a format string
-# to the uppercased alert reason; OpenSSL 3.x changed the separator
-# (e.g. `SSLV3_ALERT_BAD_RECORD_MAC` → `SSL/TLS_ALERT_BAD_RECORD_MAC`),
-# which silently stopped matching anything explicit.  Matching on the
-# stable substrings (`bad record mac`, `ssl alert`, `tls alert`, etc.)
-# survives future OpenSSL format churn without code changes.
-_SSL_TRANSIENT_PATTERNS = [
-    # Space-separated (human-readable form, Python ssl module, most SDKs)
-    "bad record mac",
-    "ssl alert",
-    "tls alert",
-    "ssl handshake failure",
-    "tlsv1 alert",
-    "sslv3 alert",
-    # Underscore-separated (OpenSSL error code tokens, e.g.
-    # `ERR_SSL_SSL/TLS_ALERT_BAD_RECORD_MAC`, `SSLV3_ALERT_BAD_RECORD_MAC`)
-    "bad_record_mac",
-    "ssl_alert",
-    "tls_alert",
-    "tls_alert_internal_error",
-    # Python ssl module prefix, e.g. "[SSL: BAD_RECORD_MAC]"
-    "[ssl:",
-]
-

 # ── Classification pipeline ─────────────────────────────────────────────

@@ -302,10 +255,9 @@ def classify_api_error(
      2. HTTP status code + message-aware refinement
      3. Error code classification (from body)
      4. Message pattern matching (billing vs rate_limit vs context vs auth)
-      5. SSL/TLS transient alert patterns → retry as timeout
+      5. Transport error heuristics
      6. Server disconnect + large session → context overflow
-      7. Transport error heuristics
-      8. Fallback: unknown (retryable with backoff)
+      7. Fallback: unknown (retryable with backoff)

    Args:
        error: The exception from the API call.
@@ -436,18 +388,7 @@ def classify_api_error(
    if classified is not None:
        return classified

-    # ── 5. SSL/TLS transient errors → retry as timeout (not compression) ──
-    # SSL alerts mid-stream are transport hiccups, not server-side context
-    # overflow signals.  Classify before the disconnect check so a large
-    # session doesn't incorrectly trigger context compression when the real
-    # cause is a flaky TLS handshake.  Also matches when the error is
-    # wrapped in a generic exception whose message string carries the SSL
-    # alert text but the type isn't ssl.SSLError (happens with some SDKs
-    # that re-raise without chaining).
-    if any(p in error_msg for p in _SSL_TRANSIENT_PATTERNS):
-        return _result(FailoverReason.timeout, retryable=True)
-
-    # ── 6. Server disconnect + large session → context overflow ─────
+    # ── 5. Server disconnect + large session → context overflow ─────
    # Must come BEFORE generic transport error catch — a disconnect on
    # a large session is more likely context overflow than a transient
    # transport hiccup.  Without this ordering, RemoteProtocolError
@@ -464,12 +405,12 @@ def classify_api_error(
            )
        return _result(FailoverReason.timeout, retryable=True)

-    # ── 7. Transport / timeout heuristics ───────────────────────────
+    # ── 6. Transport / timeout heuristics ───────────────────────────

    if error_type in _TRANSPORT_ERROR_TYPES or isinstance(error, (TimeoutError, ConnectionError, OSError)):
        return _result(FailoverReason.timeout, retryable=True)

-    # ── 8. Fallback: unknown ────────────────────────────────────────
+    # ── 7. Fallback: unknown ────────────────────────────────────────

    return _result(FailoverReason.unknown, retryable=True)

@@ -4,7 +4,6 @@ Pure utility functions with no AIAgent dependency. Used by ContextCompressor
 and run_agent.py for pre-flight context checks.
 """

-import ipaddress
 import logging
 import re
 import time
@@ -15,8 +14,8 @@ from urllib.parse import urlparse
 import requests
 import yaml

+from hermes_cli.volcengine_byteplus import model_context_window
 from utils import base_url_host_matches, base_url_hostname
-
 from hermes_constants import OPENROUTER_MODELS_URL

 logger = logging.getLogger(__name__)
@@ -31,6 +30,10 @@ _PROVIDER_PREFIXES: frozenset[str] = frozenset({
    "qwen-oauth",
    "xiaomi",
    "arcee",
+    "volcengine",
+    "volcengine-coding-plan",
+    "byteplus",
+    "byteplus-coding-plan",
    "custom", "local",
    # Common aliases
    "google", "google-gemini", "google-ai-studio",
@@ -52,13 +55,6 @@ _OLLAMA_TAG_PATTERN = re.compile(
 )


-# Tailscale's CGNAT range (RFC 6598). `ipaddress.is_private` excludes this
-# block, so without an explicit check Ollama reached over Tailscale (e.g.
-# `http://100.77.243.5:11434`) wouldn't be treated as local and its stream
-# read / stale timeouts wouldn't get auto-bumped. Built once at import time.
-_TAILSCALE_CGNAT = ipaddress.IPv4Network("100.64.0.0/10")
-
-
 def _strip_provider_prefix(model: str) -> str:
    """Strip a recognised provider prefix from a model string.

@@ -133,8 +129,6 @@ DEFAULT_CONTEXT_LENGTHS = {
    # Google
    "gemini": 1048576,
    # Gemma (open models served via AI Studio)
-    "gemma-4": 256000,  # Gemma 4 family
-    "gemma4": 256000,  # Ollama-style naming (e.g. gemma4:31b-cloud)
    "gemma-4-31b": 256000,
    "gemma-3": 131072,
    "gemma": 8192,  # fallback for older gemma models
@@ -187,8 +181,6 @@ DEFAULT_CONTEXT_LENGTHS = {
    "mimo-v2-pro": 1000000,
    "mimo-v2-omni": 256000,
    "mimo-v2-flash": 256000,
-    "mimo-v2.5-pro": 1000000,
-    "mimo-v2.5": 1000000,
    "zai-org/GLM-5": 202752,
 }

@@ -203,7 +195,6 @@ _CONTEXT_LENGTH_KEYS = (
    "max_seq_len",
    "n_ctx_train",
    "n_ctx",
-    "ctx_size",
 )

 _MAX_COMPLETION_KEYS = (
@@ -247,7 +238,6 @@ _URL_TO_PROVIDER: Dict[str, str] = {
    "chatgpt.com": "openai",
    "api.anthropic.com": "anthropic",
    "api.z.ai": "zai",
-    "open.bigmodel.cn": "zai",
    "api.moonshot.ai": "kimi-coding",
    "api.moonshot.cn": "kimi-coding-cn",
    "api.kimi.com": "kimi-coding",
@@ -271,6 +261,8 @@ _URL_TO_PROVIDER: Dict[str, str] = {
    "api.xiaomimimo.com": "xiaomi",
    "xiaomimimo.com": "xiaomi",
    "ollama.com": "ollama-cloud",
+    "ark.cn-beijing.volces.com": "volcengine",
+    "ark.ap-southeast.bytepluses.com": "byteplus",
 }


@@ -297,15 +289,7 @@ def _is_known_provider_base_url(base_url: str) -> bool:


 def is_local_endpoint(base_url: str) -> bool:
-    """Return True if base_url points to a local machine.
-
-    Recognises loopback (``localhost``, ``127.0.0.0/8``, ``::1``),
-    container-internal DNS names (``host.docker.internal`` et al.),
-    RFC-1918 private ranges (``10/8``, ``172.16/12``, ``192.168/16``),
-    link-local, and Tailscale CGNAT (``100.64.0.0/10``). Tailscale CGNAT
-    is included so remote-but-trusted Ollama boxes reached over a
-    Tailscale mesh get the same timeout auto-bumps as localhost Ollama.
-    """
+    """Return True if base_url points to a local machine (localhost / RFC-1918 / WSL)."""
    normalized = _normalize_base_url(base_url)
    if not normalized:
        return False
@@ -320,17 +304,14 @@ def is_local_endpoint(base_url: str) -> bool:
    # Docker / Podman / Lima internal DNS names (e.g. host.docker.internal)
    if any(host.endswith(suffix) for suffix in _CONTAINER_LOCAL_SUFFIXES):
        return True
-    # RFC-1918 private ranges, link-local, and Tailscale CGNAT
+    # RFC-1918 private ranges and link-local
+    import ipaddress
    try:
        addr = ipaddress.ip_address(host)
-        if addr.is_private or addr.is_loopback or addr.is_link_local:
-            return True
-        if isinstance(addr, ipaddress.IPv4Address) and addr in _TAILSCALE_CGNAT:
-            return True
+        return addr.is_private or addr.is_loopback or addr.is_link_local
    except ValueError:
        pass
    # Bare IP that looks like a private range (e.g. 172.26.x.x for WSL)
-    # or Tailscale CGNAT (100.64.x.x–100.127.x.x).
    parts = host.split(".")
    if len(parts) == 4:
        try:
@@ -341,8 +322,6 @@ def is_local_endpoint(base_url: str) -> bool:
                return True
            if first == 192 and second == 168:
                return True
-            if first == 100 and 64 <= second <= 127:
-                return True
        except ValueError:
            pass
    return False
@@ -1146,12 +1125,20 @@ def get_model_context_length(
        ctx = _resolve_nous_context_length(model)
        if ctx:
            return ctx
+    if effective_provider in {"volcengine", "byteplus"}:
+        ctx = model_context_window(model)
+        if ctx:
+            return ctx
    if effective_provider:
        from agent.models_dev import lookup_models_dev_context
        ctx = lookup_models_dev_context(effective_provider, model)
        if ctx:
            return ctx

+    ctx = model_context_window(model)
+    if ctx:
+        return ctx
+
    # 6. OpenRouter live API metadata (provider-unaware fallback)
    metadata = fetch_model_metadata()
    if model in metadata:
@@ -435,7 +435,7 @@ def iter_skill_index_files(skills_dir: Path, filename: str):
    Excludes ``.git``, ``.github``, ``.hub`` directories.
    """
    matches = []
-    for root, dirs, files in os.walk(skills_dir, followlinks=True):
+    for root, dirs, files in os.walk(skills_dir):
        dirs[:] = [d for d in dirs if d not in EXCLUDED_SKILL_DIRS]
        if filename in files:
            matches.append(Path(root) / filename)
@@ -78,50 +78,23 @@ class AnthropicTransport(ProviderTransport):
    def normalize_response(self, response: Any, **kwargs) -> NormalizedResponse:
        """Normalize Anthropic response to NormalizedResponse.

-        Calls the adapter's v1 normalize and maps the (SimpleNamespace, finish_reason)
-        tuple to the shared NormalizedResponse type.
+        kwargs:
+            strip_tool_prefix: bool — strip 'mcp_mcp_' prefixes from tool names.
        """
-        from agent.anthropic_adapter import normalize_anthropic_response
-        from agent.transports.types import build_tool_call
+        from agent.anthropic_adapter import normalize_anthropic_response_v2

        strip_tool_prefix = kwargs.get("strip_tool_prefix", False)
-        assistant_msg, finish_reason = normalize_anthropic_response(response, strip_tool_prefix)
-
-        tool_calls = None
-        if assistant_msg.tool_calls:
-            tool_calls = [
-                build_tool_call(id=tc.id, name=tc.function.name, arguments=tc.function.arguments)
-                for tc in assistant_msg.tool_calls
-            ]
-
-        provider_data = {}
-        if getattr(assistant_msg, "reasoning_details", None):
-            provider_data["reasoning_details"] = assistant_msg.reasoning_details
-
-        return NormalizedResponse(
-            content=assistant_msg.content,
-            tool_calls=tool_calls,
-            finish_reason=finish_reason,
-            reasoning=getattr(assistant_msg, "reasoning", None),
-            usage=None,
-            provider_data=provider_data or None,
-        )
+        return normalize_anthropic_response_v2(response, strip_tool_prefix=strip_tool_prefix)

    def validate_response(self, response: Any) -> bool:
-        """Check Anthropic response structure is valid.
-
-        An empty content list is legitimate when ``stop_reason == "end_turn"``
-        — the model's canonical way of signalling "nothing more to add" after
-        a tool turn that already delivered the user-facing text. Treating it
-        as invalid falsely retries a completed response.
-        """
+        """Check Anthropic response structure is valid."""
        if response is None:
            return False
        content_blocks = getattr(response, "content", None)
        if not isinstance(content_blocks, list):
            return False
        if not content_blocks:
-            return getattr(response, "stop_reason", None) == "end_turn"
+            return False
        return True

    def extract_cache_stats(self, response: Any) -> Optional[Dict[str, int]]:
@@ -533,22 +533,10 @@ def normalize_usage(
        prompt_total = _to_int(getattr(response_usage, "prompt_tokens", 0))
        output_tokens = _to_int(getattr(response_usage, "completion_tokens", 0))
        details = getattr(response_usage, "prompt_tokens_details", None)
-        # Primary: OpenAI-style prompt_tokens_details. Fallback: Anthropic-style
-        # top-level fields that some OpenAI-compatible proxies (OpenRouter, Vercel
-        # AI Gateway, Cline) expose when routing Claude models — without this
-        # fallback, cache writes are undercounted as 0 and cache reads can be
-        # missed when the proxy only surfaces them at the top level.
-        # Port of cline/cline#10266.
        cache_read_tokens = _to_int(getattr(details, "cached_tokens", 0) if details else 0)
-        if not cache_read_tokens:
-            cache_read_tokens = _to_int(getattr(response_usage, "cache_read_input_tokens", 0))
        cache_write_tokens = _to_int(
            getattr(details, "cache_write_tokens", 0) if details else 0
        )
-        if not cache_write_tokens:
-            cache_write_tokens = _to_int(
-                getattr(response_usage, "cache_creation_input_tokens", 0)
-            )
        input_tokens = max(0, prompt_total - cache_read_tokens - cache_write_tokens)

    reasoning_tokens = 0
@@ -776,7 +776,6 @@ delegation:
  # max_concurrent_children: 3                # Max parallel child agents (default: 3)
  # max_spawn_depth: 1                        # Tree depth cap (1-3, default: 1 = flat). Raise to 2 or 3 to allow orchestrator children to spawn their own workers.
  # orchestrator_enabled: true                # Kill switch for role="orchestrator" children (default: true).
-  # inherit_mcp_toolsets: true                # When explicit child toolsets are narrowed, also keep the parent's MCP toolsets (default: true). Set false for strict intersection.
  # model: "google/gemini-3-flash-preview"    # Override model for subagents (empty = inherit parent)
  # provider: "openrouter"                    # Override provider for subagents (empty = inherit parent)
  #                                           # Resolves full credentials (base_url, api_key) automatically.
@@ -108,11 +108,6 @@ def _strip_reasoning_tags(text: str) -> str:
    ``<thought>`` (Gemma 4).  Must stay in sync with
    ``run_agent.py::_strip_think_blocks`` and the stream consumer's
    ``_OPEN_THINK_TAGS`` / ``_CLOSE_THINK_TAGS`` tuples.
-
-    Also strips tool-call XML blocks some open models leak into visible
-    content (``<tool_call>``, ``<function_calls>``, Gemma-style
-    ``<function name="…">…</function>``). Ported from
-    openclaw/openclaw#67318.
    """
    cleaned = text
    for tag in _REASONING_TAGS:
@@ -137,31 +132,6 @@ def _strip_reasoning_tags(text: str) -> str:
            cleaned,
            flags=re.IGNORECASE,
        )
-    # Tool-call XML blocks (openclaw/openclaw#67318).
-    for tc_tag in ("tool_call", "tool_calls", "tool_result",
-                   "function_call", "function_calls"):
-        cleaned = re.sub(
-            rf"<{tc_tag}\b[^>]*>.*?</{tc_tag}>\s*",
-            "",
-            cleaned,
-            flags=re.DOTALL | re.IGNORECASE,
-        )
-    # <function name="..."> — boundary + attribute gated to avoid prose FPs.
-    cleaned = re.sub(
-        r'(?:(?<=^)|(?<=[\n\r.!?:]))[ \t]*'
-        r'<function\b[^>]*\bname\s*=[^>]*>'
-        r'(?:(?:(?!</function>).)*)</function>\s*',
-        '',
-        cleaned,
-        flags=re.DOTALL | re.IGNORECASE,
-    )
-    # Stray tool-call close tags.
-    cleaned = re.sub(
-        r'</(?:tool_call|tool_calls|tool_result|function_call|function_calls|function)>\s*',
-        '',
-        cleaned,
-        flags=re.IGNORECASE,
-    )
    return cleaned.strip()


@@ -68,19 +68,4 @@ if [ -d "$INSTALL_DIR/skills" ]; then
    python3 "$INSTALL_DIR/tools/skills_sync.py"
 fi

-# Final exec: two supported invocation patterns.
-#
-#   docker run <image>                 -> exec `hermes` with no args (legacy default)
-#   docker run <image> chat -q "..."   -> exec `hermes chat -q "..."` (legacy wrap)
-#   docker run <image> sleep infinity  -> exec `sleep infinity` directly
-#   docker run <image> bash            -> exec `bash` directly
-#
-# If the first positional arg resolves to an executable on PATH, we assume the
-# caller wants to run it directly (needed by the launcher which runs long-lived
-# `sleep infinity` sandbox containers — see tools/environments/docker.py).
-# Otherwise we treat the args as a hermes subcommand and wrap with `hermes`,
-# preserving the documented `docker run <image> <subcommand>` behavior.
-if [ $# -gt 0 ] && command -v "$1" >/dev/null 2>&1; then
-    exec "$@"
-fi
 exec hermes "$@"
@@ -135,22 +135,9 @@ class HookRegistry:
            except Exception as e:
                print(f"[hooks] Error loading hook {hook_dir.name}: {e}", flush=True)

-    def _resolve_handlers(self, event_type: str) -> List[Callable]:
-        """Return all handlers that should fire for ``event_type``.
-
-        Exact matches fire first, followed by wildcard matches (e.g.
-        ``command:*`` matches ``command:reset``).
-        """
-        handlers = list(self._handlers.get(event_type, []))
-        if ":" in event_type:
-            base = event_type.split(":")[0]
-            wildcard_key = f"{base}:*"
-            handlers.extend(self._handlers.get(wildcard_key, []))
-        return handlers
-
    async def emit(self, event_type: str, context: Optional[Dict[str, Any]] = None) -> None:
        """
-        Fire all handlers registered for an event, discarding return values.
+        Fire all handlers registered for an event.

        Supports wildcard matching: handlers registered for "command:*" will
        fire for any "command:..." event. Handlers registered for a base type
@@ -164,7 +151,16 @@ class HookRegistry:
        if context is None:
            context = {}

-        for fn in self._resolve_handlers(event_type):
+        # Collect handlers: exact match + wildcard match
+        handlers = list(self._handlers.get(event_type, []))
+
+        # Check for wildcard patterns (e.g., "command:*" matches "command:reset")
+        if ":" in event_type:
+            base = event_type.split(":")[0]
+            wildcard_key = f"{base}:*"
+            handlers.extend(self._handlers.get(wildcard_key, []))
+
+        for fn in handlers:
            try:
                result = fn(event_type, context)
                # Support both sync and async handlers
@@ -172,32 +168,3 @@ class HookRegistry:
                    await result
            except Exception as e:
                print(f"[hooks] Error in handler for '{event_type}': {e}", flush=True)
-
-    async def emit_collect(
-        self,
-        event_type: str,
-        context: Optional[Dict[str, Any]] = None,
-    ) -> List[Any]:
-        """Fire handlers and return their non-None return values in order.
-
-        Like :meth:`emit` but captures each handler's return value. Used for
-        decision-style hooks (e.g. ``command:<name>`` policies that want to
-        allow/deny/rewrite the command before normal dispatch).
-
-        Exceptions from individual handlers are logged but do not abort the
-        remaining handlers.
-        """
-        if context is None:
-            context = {}
-
-        results: List[Any] = []
-        for fn in self._resolve_handlers(event_type):
-            try:
-                result = fn(event_type, context)
-                if asyncio.iscoroutine(result):
-                    result = await result
-                if result is not None:
-                    results.append(result)
-            except Exception as e:
-                print(f"[hooks] Error in handler for '{event_type}': {e}", flush=True)
-        return results
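
The exact-plus-wildcard dispatch that `emit` now performs inline can be sketched in isolation. `MiniHookRegistry` and its `register` method are hypothetical stand-ins written for this illustration, not the repository's `HookRegistry`; only the matching and sync/async handling follow the hunk above.

```python
import asyncio
from typing import Any, Callable, Dict, List, Optional


class MiniHookRegistry:
    """Hypothetical, stripped-down stand-in for HookRegistry."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable]] = {}

    def register(self, event_type: str, fn: Callable) -> None:
        # Assumed registration API, for illustration only.
        self._handlers.setdefault(event_type, []).append(fn)

    async def emit(self, event_type: str, context: Optional[Dict[str, Any]] = None) -> None:
        if context is None:
            context = {}
        # Exact matches fire first, then "base:*" wildcard matches.
        handlers = list(self._handlers.get(event_type, []))
        if ":" in event_type:
            base = event_type.split(":")[0]
            handlers.extend(self._handlers.get(f"{base}:*", []))
        for fn in handlers:
            try:
                result = fn(event_type, context)
                # Support both sync and async handlers.
                if asyncio.iscoroutine(result):
                    await result
            except Exception as e:
                print(f"[hooks] Error in handler for '{event_type}': {e}", flush=True)


calls: List[str] = []


def exact_handler(event_type: str, context: Dict[str, Any]) -> None:
    calls.append(f"exact:{event_type}")


async def wildcard_handler(event_type: str, context: Dict[str, Any]) -> None:
    calls.append(f"wildcard:{event_type}")


registry = MiniHookRegistry()
registry.register("command:*", wildcard_handler)  # registered first, fires second
registry.register("command:reset", exact_handler)
asyncio.run(registry.emit("command:reset"))
print(calls)
```

Registration order does not affect firing order across the two groups: the exact-match list is always collected before the wildcard list.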
@@ -2129,42 +2129,10 @@ class DiscordAdapter(BasePlatformAdapter):
        # This ensures new commands added to COMMAND_REGISTRY in
        # hermes_cli/commands.py automatically appear as Discord slash
        # commands without needing a manual entry here.
-        def _build_auto_slash_command(_name: str, _description: str, _args_hint: str = ""):
-            """Build a discord.app_commands.Command that proxies to _run_simple_slash."""
-            discord_name = _name.lower()[:32]
-            desc = (_description or f"Run /{_name}")[:100]
-            has_args = bool(_args_hint)
-
-            if has_args:
-                def _make_args_handler(__name: str, __hint: str):
-                    @discord.app_commands.describe(args=f"Arguments: {__hint}"[:100])
-                    async def _handler(interaction: discord.Interaction, args: str = ""):
-                        await self._run_simple_slash(
-                            interaction, f"/{__name} {args}".strip()
-                        )
-                    _handler.__name__ = f"auto_slash_{__name.replace('-', '_')}"
-                    return _handler
-
-                handler = _make_args_handler(_name, _args_hint)
-            else:
-                def _make_simple_handler(__name: str):
-                    async def _handler(interaction: discord.Interaction):
-                        await self._run_simple_slash(interaction, f"/{__name}")
-                    _handler.__name__ = f"auto_slash_{__name.replace('-', '_')}"
-                    return _handler
-
-                handler = _make_simple_handler(_name)
-
-            return discord.app_commands.Command(
-                name=discord_name,
-                description=desc,
-                callback=handler,
-            )
-
-        already_registered: set[str] = set()
        try:
            from hermes_cli.commands import COMMAND_REGISTRY, _is_gateway_available, _resolve_config_gates

+            already_registered = set()
            try:
                already_registered = {cmd.name for cmd in tree.get_commands()}
            except Exception:
@@ -2179,10 +2147,38 @@ class DiscordAdapter(BasePlatformAdapter):
                discord_name = cmd_def.name.lower()[:32]
                if discord_name in already_registered:
                    continue
-                auto_cmd = _build_auto_slash_command(
-                    cmd_def.name,
-                    cmd_def.description,
-                    cmd_def.args_hint,
+                # Skip aliases that overlap with already-registered names
+                # (aliases for explicitly registered commands are handled above).
+                desc = (cmd_def.description or f"Run /{cmd_def.name}")[:100]
+                has_args = bool(cmd_def.args_hint)
+
+                if has_args:
+                    # Command takes optional arguments — create handler with
+                    # an optional ``args`` string parameter.
+                    def _make_args_handler(_name: str, _hint: str):
+                        @discord.app_commands.describe(args=f"Arguments: {_hint}"[:100])
+                        async def _handler(interaction: discord.Interaction, args: str = ""):
+                            await self._run_simple_slash(
+                                interaction, f"/{_name} {args}".strip()
+                            )
+                        _handler.__name__ = f"auto_slash_{_name.replace('-', '_')}"
+                        return _handler
+
+                    handler = _make_args_handler(cmd_def.name, cmd_def.args_hint)
+                else:
+                    # Parameterless command.
+                    def _make_simple_handler(_name: str):
+                        async def _handler(interaction: discord.Interaction):
+                            await self._run_simple_slash(interaction, f"/{_name}")
+                        _handler.__name__ = f"auto_slash_{_name.replace('-', '_')}"
+                        return _handler
+
+                    handler = _make_simple_handler(cmd_def.name)
+
+                auto_cmd = discord.app_commands.Command(
+                    name=discord_name,
+                    description=desc,
+                    callback=handler,
                )
                try:
                    tree.add_command(auto_cmd)
@@ -2199,35 +2195,6 @@ class DiscordAdapter(BasePlatformAdapter):
        except Exception as e:
            logger.warning("Discord auto-register from COMMAND_REGISTRY failed: %s", e)

-        # ── Plugin-registered slash commands ──
-        # Plugins register via PluginContext.register_command(); we mirror
-        # those into Discord's native slash picker so users get the same
-        # autocomplete UX as for built-in commands. No per-platform plugin
-        # API needed — plugin commands are platform-agnostic.
-        try:
-            from hermes_cli.commands import _iter_plugin_command_entries
-
-            for plugin_name, plugin_desc, plugin_args_hint in _iter_plugin_command_entries():
-                discord_name = plugin_name.lower()[:32]
-                if discord_name in already_registered:
-                    continue
-                auto_cmd = _build_auto_slash_command(
-                    plugin_name,
-                    plugin_desc,
-                    plugin_args_hint,
-                )
-                try:
-                    tree.add_command(auto_cmd)
-                    already_registered.add(discord_name)
-                except Exception:
-                    # Silently skip commands that fail registration (e.g.
-                    # name conflict with a subcommand group).
-                    pass
-        except Exception as e:
-            logger.warning(
-                "Discord auto-register from plugin commands failed: %s", e
-            )
-
        # Register skills under a single /skill command group with category
        # subcommand groups.  This uses 1 top-level slot instead of N,
        # supporting up to 25 categories × 25 skills = 625 skills.
@@ -545,7 +545,6 @@ class EmailAdapter(BasePlatformAdapter):
        caption: Optional[str] = None,
        file_name: Optional[str] = None,
        reply_to: Optional[str] = None,
-        **kwargs,
    ) -> SendResult:
        """Send a file as an email attachment."""
        try:
@@ -14,35 +14,6 @@ Supports:
 - Interactive card button-click events routed as synthetic COMMAND events
 - Webhook anomaly tracking (matches openclaw createWebhookAnomalyTracker)
 - Verification token validation as second auth layer (matches openclaw)
-
-Feishu identity model
---------------------
-Feishu uses three user-ID tiers (official docs:
-https://open.feishu.cn/document/home/user-identity-introduction/introduction):
-
-  open_id  (ou_xxx)  — **App-scoped**.  The same person gets a different
-                        open_id under each Feishu app.  Always available in
-                        event payloads without extra permissions.
-  user_id  (u_xxx)   — **Tenant-scoped**.  Stable within a company but
-                        requires the ``contact:user.employee_id:readonly``
-                        scope.  May not be present.
-  union_id (on_xxx)  — **Developer-scoped**.  Same across all apps owned by
-                        one developer/ISV.  Best cross-app stable ID.
-
-For bots specifically:
-
-  app_id              — The application's canonical credential identifier.
-  bot open_id         — Returned by ``/bot/v3/info``.  This is the bot's own
-                        open_id *within its app context* and is what Feishu
-                        puts in ``mentions[].id.open_id`` when someone
-                        @-mentions the bot.  Used for mention gating only.
-
-In single-bot mode (what Hermes currently supports), open_id works as a
-de-facto unique user identifier since there is only one app context.
-
-Session-key participant isolation prefers ``union_id`` (via user_id_alt)
-over ``open_id`` (via user_id) so that sessions stay stable if the same
-user is seen through different apps in the future.
 """

 from __future__ import annotations
@@ -64,7 +35,7 @@ from dataclasses import dataclass, field
 from datetime import datetime
 from pathlib import Path
 from types import SimpleNamespace
-from typing import Any, Dict, List, Optional, Sequence
+from typing import Any, Dict, List, Optional
 from urllib.error import HTTPError, URLError
 from urllib.parse import urlencode
 from urllib.request import Request, urlopen
@@ -102,9 +73,7 @@ try:
        UpdateMessageRequest,
        UpdateMessageRequestBody,
    )
-    from lark_oapi.core import AccessTokenType, HttpMethod
    from lark_oapi.core.const import FEISHU_DOMAIN, LARK_DOMAIN
-    from lark_oapi.core.model import BaseRequest
    from lark_oapi.event.callback.model.p2_card_action_trigger import (
        CallBackCard,
        P2CardActionTriggerResponse,
@@ -265,8 +234,6 @@ FALLBACK_ATTACHMENT_TEXT = "[Attachment]"
 _PREFERRED_LOCALES = ("zh_cn", "en_us")
 _MARKDOWN_SPECIAL_CHARS_RE = re.compile(r"([\\`*_{}\[\]()#+\-!|>~])")
 _MENTION_PLACEHOLDER_RE = re.compile(r"@_user_\d+")
-_MENTION_BOUNDARY_CHARS = frozenset(" \t\n\r.,;:!?、，。；：！？()[]{}<>\"'`")
-_TRAILING_TERMINAL_PUNCT = frozenset(" \t\n\r.!?。！？")
 _WHITESPACE_RE = re.compile(r"\s+")
 _SUPPORTED_CARD_TEXT_KEYS = (
    "title",
@@ -310,36 +277,12 @@ class FeishuPostMediaRef:
    resource_type: str = "file"


-@dataclass(frozen=True)
-class FeishuMentionRef:
-    name: str = ""
-    open_id: str = ""
-    is_all: bool = False
-    is_self: bool = False
-
-
-@dataclass(frozen=True)
-class _FeishuBotIdentity:
-    open_id: str = ""
-    user_id: str = ""
-    name: str = ""
-
-    def matches(self, *, open_id: str, user_id: str, name: str) -> bool:
-        # Precedence: open_id > user_id > name. IDs are authoritative when both
-        # sides have them; the next tier is only considered when either side
-        # lacks the current one.
-        if open_id and self.open_id:
-            return open_id == self.open_id
-        if user_id and self.user_id:
-            return user_id == self.user_id
-        return bool(self.name) and name == self.name
-
-
 @dataclass(frozen=True)
 class FeishuPostParseResult:
    text_content: str
    image_keys: List[str] = field(default_factory=list)
    media_refs: List[FeishuPostMediaRef] = field(default_factory=list)
+    mentioned_ids: List[str] = field(default_factory=list)


 @dataclass(frozen=True)
@@ -349,14 +292,14 @@ class FeishuNormalizedMessage:
    preferred_message_type: str = "text"
    image_keys: List[str] = field(default_factory=list)
    media_refs: List[FeishuPostMediaRef] = field(default_factory=list)
-    mentions: List[FeishuMentionRef] = field(default_factory=list)
+    mentioned_ids: List[str] = field(default_factory=list)
    relation_kind: str = "plain"
    metadata: Dict[str, Any] = field(default_factory=dict)


 @dataclass(frozen=True)
 class FeishuAdapterSettings:
-    app_id: str  # Canonical bot/app identifier (credential, not from event payloads)
+    app_id: str
    app_secret: str
    domain_name: str
    connection_mode: str
@@ -364,11 +307,7 @@ class FeishuAdapterSettings:
    verification_token: str
    group_policy: str
    allowed_group_users: frozenset[str]
-    # Bot's own open_id (app-scoped) — returned by /bot/v3/info.  Used only for
-    # @mention matching: Feishu puts this value in mentions[].id.open_id when
-    # a user @-mentions the bot in a group chat.
    bot_open_id: str
-    # Bot's user_id (tenant-scoped) — optional, used as fallback mention match.
    bot_user_id: str
    bot_name: str
    dedup_cache_size: int
@@ -566,17 +505,14 @@ def _build_markdown_post_rows(content: str) -> List[List[Dict[str, str]]]:
    return rows or [[{"tag": "md", "text": content}]]


-def parse_feishu_post_payload(
-    payload: Any,
-    *,
-    mentions_map: Optional[Dict[str, FeishuMentionRef]] = None,
-) -> FeishuPostParseResult:
+def parse_feishu_post_payload(payload: Any) -> FeishuPostParseResult:
    resolved = _resolve_post_payload(payload)
    if not resolved:
        return FeishuPostParseResult(text_content=FALLBACK_POST_TEXT)

    image_keys: List[str] = []
    media_refs: List[FeishuPostMediaRef] = []
+    mentioned_ids: List[str] = []
    parts: List[str] = []

    title = _normalize_feishu_text(str(resolved.get("title", "")).strip())
@@ -587,10 +523,7 @@ def parse_feishu_post_payload(
        if not isinstance(row, list):
            continue
        row_text = _normalize_feishu_text(
-            "".join(
-                _render_post_element(item, image_keys, media_refs, mentions_map)
-                for item in row
-            )
+            "".join(_render_post_element(item, image_keys, media_refs, mentioned_ids) for item in row)
        )
        if row_text:
            parts.append(row_text)
@@ -599,6 +532,7 @@ def parse_feishu_post_payload(
        text_content="\n".join(parts).strip() or FALLBACK_POST_TEXT,
        image_keys=image_keys,
        media_refs=media_refs,
+        mentioned_ids=mentioned_ids,
    )


@@ -650,7 +584,7 @@ def _render_post_element(
    element: Any,
    image_keys: List[str],
    media_refs: List[FeishuPostMediaRef],
-    mentions_map: Optional[Dict[str, FeishuMentionRef]] = None,
+    mentioned_ids: List[str],
 ) -> str:
    if isinstance(element, str):
        return element
@@ -668,21 +602,19 @@ def _render_post_element(
        escaped_label = _escape_markdown_text(label)
        return f"[{escaped_label}]({href})" if href else escaped_label
    if tag == "at":
-        # Post <at>.user_id is a placeholder ("@_user_N" or "@_all"); look up
-        # the real ref in mentions_map for the display name.
-        placeholder = str(element.get("user_id", "")).strip()
-        if placeholder == "@_all":
-            # Feishu SDK sometimes omits @_all from the top-level mentions
-            # payload; record it here so the caller's mention list stays complete.
-            if mentions_map is not None and "@_all" not in mentions_map:
-                mentions_map["@_all"] = FeishuMentionRef(is_all=True)
-            return "@all"
-        ref = (mentions_map or {}).get(placeholder)
-        if ref is not None:
-            display_name = ref.name or ref.open_id or "user"
-        else:
-            display_name = str(element.get("user_name", "")).strip() or "user"
-        return f"@{_escape_markdown_text(display_name)}"
+        mentioned_id = (
+            str(element.get("open_id", "")).strip()
+            or str(element.get("user_id", "")).strip()
+        )
+        if mentioned_id and mentioned_id not in mentioned_ids:
+            mentioned_ids.append(mentioned_id)
+        display_name = (
+            str(element.get("user_name", "")).strip()
+            or str(element.get("name", "")).strip()
+            or str(element.get("text", "")).strip()
+            or mentioned_id
+        )
+        return f"@{_escape_markdown_text(display_name)}" if display_name else "@"
    if tag in {"img", "image"}:
        image_key = str(element.get("image_key", "")).strip()
        if image_key and image_key not in image_keys:
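
The restored `at` branch can be exercised on its own with a simplified sketch. `render_at_element` is a hypothetical extraction of that branch, the markdown escaping step is deliberately omitted, and the IDs and names in the sample row are invented.

```python
from typing import Any, Dict, List


def render_at_element(element: Dict[str, Any], mentioned_ids: List[str]) -> str:
    """Hypothetical simplification of the `at` branch (no markdown escaping)."""
    # Prefer open_id, fall back to user_id.
    mentioned_id = (
        str(element.get("open_id", "")).strip()
        or str(element.get("user_id", "")).strip()
    )
    # Record each mentioned ID once, preserving first-seen order.
    if mentioned_id and mentioned_id not in mentioned_ids:
        mentioned_ids.append(mentioned_id)
    # Display name falls through several optional keys, ending with the raw ID.
    display_name = (
        str(element.get("user_name", "")).strip()
        or str(element.get("name", "")).strip()
        or str(element.get("text", "")).strip()
        or mentioned_id
    )
    return f"@{display_name}" if display_name else "@"


ids: List[str] = []
row = [
    {"tag": "at", "open_id": "ou_123", "user_name": "Alice"},
    {"tag": "at", "user_id": "u_456"},               # no name: falls back to the ID
    {"tag": "at", "open_id": "ou_123", "name": "Alice"},  # duplicate ID is not re-added
    {"tag": "at"},                                   # nothing usable: bare "@"
]
rendered = " ".join(render_at_element(item, ids) for item in row)
print(rendered)
print(ids)
```

Because `mentioned_ids` is a shared list mutated by the walker, the caller reads the collected IDs after rendering rather than from a return value.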
@@ -720,7 +652,8 @@ def _render_post_element(

    nested_parts: List[str] = []
    for key in ("text", "title", "content", "children", "elements"):
-        extracted = _render_nested_post(element.get(key), image_keys, media_refs, mentions_map)
+        value = element.get(key)
+        extracted = _render_nested_post(value, image_keys, media_refs, mentioned_ids)
        if extracted:
            nested_parts.append(extracted)
    return " ".join(part for part in nested_parts if part)
@@ -730,7 +663,7 @@ def _render_nested_post(
    value: Any,
    image_keys: List[str],
    media_refs: List[FeishuPostMediaRef],
-    mentions_map: Optional[Dict[str, FeishuMentionRef]] = None,
+    mentioned_ids: List[str],
 ) -> str:
    if isinstance(value, str):
        return _escape_markdown_text(value)
@@ -738,17 +671,17 @@ def _render_nested_post(
        return " ".join(
            part
            for item in value
-            for part in [_render_nested_post(item, image_keys, media_refs, mentions_map)]
+            for part in [_render_nested_post(item, image_keys, media_refs, mentioned_ids)]
            if part
        )
    if isinstance(value, dict):
-        direct = _render_post_element(value, image_keys, media_refs, mentions_map)
+        direct = _render_post_element(value, image_keys, media_refs, mentioned_ids)
        if direct:
            return direct
        return " ".join(
            part
            for item in value.values()
-            for part in [_render_nested_post(item, image_keys, media_refs, mentions_map)]
+            for part in [_render_nested_post(item, image_keys, media_refs, mentioned_ids)]
            if part
        )
    return ""
@@ -759,48 +692,31 @@ def _render_nested_post(
 # ---------------------------------------------------------------------------


-def normalize_feishu_message(
-    *,
-    message_type: str,
-    raw_content: str,
-    mentions: Optional[Sequence[Any]] = None,
-    bot: _FeishuBotIdentity = _FeishuBotIdentity(),
-) -> FeishuNormalizedMessage:
+def normalize_feishu_message(*, message_type: str, raw_content: str) -> FeishuNormalizedMessage:
    normalized_type = str(message_type or "").strip().lower()
    payload = _load_feishu_payload(raw_content)
-    mentions_map = _build_mentions_map(mentions, bot)

    if normalized_type == "text":
-        text = str(payload.get("text", "") or "")
-        # Feishu SDK sometimes omits @_all from the mentions payload even when
-        # the text literal contains it (confirmed via im.v1.message.get).
-        if "@_all" in text and "@_all" not in mentions_map:
-            mentions_map["@_all"] = FeishuMentionRef(is_all=True)
        return FeishuNormalizedMessage(
            raw_type=normalized_type,
-            text_content=_normalize_feishu_text(text, mentions_map),
-            mentions=list(mentions_map.values()),
+            text_content=_normalize_feishu_text(str(payload.get("text", "") or "")),
        )
    if normalized_type == "post":
-        # The walker writes back to mentions_map if it encounters
-        # <at user_id="@_all">, so reading .values() after parsing is enough.
-        parsed_post = parse_feishu_post_payload(payload, mentions_map=mentions_map)
+        parsed_post = parse_feishu_post_payload(payload)
        return FeishuNormalizedMessage(
            raw_type=normalized_type,
            text_content=parsed_post.text_content,
            image_keys=list(parsed_post.image_keys),
            media_refs=list(parsed_post.media_refs),
-            mentions=list(mentions_map.values()),
+            mentioned_ids=list(parsed_post.mentioned_ids),
            relation_kind="post",
        )
-    mention_refs = list(mentions_map.values())
    if normalized_type == "image":
        image_key = str(payload.get("image_key", "") or "").strip()
        alt_text = _normalize_feishu_text(
            str(payload.get("text", "") or "")
            or str(payload.get("alt", "") or "")
-            or FALLBACK_IMAGE_TEXT,
-            mentions_map,
+            or FALLBACK_IMAGE_TEXT
        )
        return FeishuNormalizedMessage(
            raw_type=normalized_type,
@@ -808,7 +724,6 @@ def normalize_feishu_message(
            preferred_message_type="photo",
            image_keys=[image_key] if image_key else [],
            relation_kind="image",
-            mentions=mention_refs,
        )
    if normalized_type in {"file", "audio", "media"}:
        media_ref = _build_media_ref_from_payload(payload, resource_type=normalized_type)
@@ -820,7 +735,6 @@ def normalize_feishu_message(
            media_refs=[media_ref] if media_ref.file_key else [],
            relation_kind=normalized_type,
            metadata={"placeholder_text": placeholder},
-            mentions=mention_refs,
        )
    if normalized_type == "merge_forward":
        return _normalize_merge_forward_message(payload)
@@ -1095,20 +1009,8 @@ def _first_non_empty_text(*values: Any) -> str:
 # ---------------------------------------------------------------------------


-def _normalize_feishu_text(
-    text: str,
-    mentions_map: Optional[Dict[str, FeishuMentionRef]] = None,
-) -> str:
-    def _sub(match: "re.Match[str]") -> str:
-        key = match.group(0)
-        ref = (mentions_map or {}).get(key)
-        if ref is None:
-            return " "
-        name = ref.name or ref.open_id or "user"
-        return f"@{name}"
-
-    cleaned = _MENTION_PLACEHOLDER_RE.sub(_sub, text or "")
-    cleaned = cleaned.replace("@_all", "@all")
+def _normalize_feishu_text(text: str) -> str:
+    cleaned = _MENTION_PLACEHOLDER_RE.sub(" ", text or "")
    cleaned = cleaned.replace("\r\n", "\n").replace("\r", "\n")
    cleaned = "\n".join(_WHITESPACE_RE.sub(" ", line).strip() for line in cleaned.split("\n"))
    cleaned = "\n".join(line for line in cleaned.split("\n") if line)
@@ -1127,117 +1029,6 @@ def _unique_lines(lines: List[str]) -> List[str]:
    return unique


-# ---------------------------------------------------------------------------
-# Mention helpers
-# ---------------------------------------------------------------------------
-
-
-def _extract_mention_ids(mention: Any) -> tuple[str, str]:
-    # Returns (open_id, user_id). im.v1.message.get hands back id as a string
-    # plus id_type discriminator; event payloads hand back a nested UserId
-    # object carrying both fields.
-    mention_id = getattr(mention, "id", None)
-    if isinstance(mention_id, str):
-        id_type = str(getattr(mention, "id_type", "") or "").lower()
-        if id_type == "open_id":
-            return mention_id, ""
-        if id_type == "user_id":
-            return "", mention_id
-        return "", ""
-    if mention_id is None:
-        return "", ""
-    return (
-        str(getattr(mention_id, "open_id", "") or ""),
-        str(getattr(mention_id, "user_id", "") or ""),
-    )
-
-
-def _build_mentions_map(
-    mentions: Optional[Sequence[Any]],
-    bot: _FeishuBotIdentity,
-) -> Dict[str, FeishuMentionRef]:
-    result: Dict[str, FeishuMentionRef] = {}
-    for mention in mentions or []:
-        key = str(getattr(mention, "key", "") or "")
-        if not key:
-            continue
-        if key == "@_all":
-            result[key] = FeishuMentionRef(is_all=True)
-            continue
-        open_id, user_id = _extract_mention_ids(mention)
-        name = str(getattr(mention, "name", "") or "").strip()
-        result[key] = FeishuMentionRef(
-            name=name,
-            open_id=open_id,
-            is_self=bot.matches(open_id=open_id, user_id=user_id, name=name),
-        )
-    return result
-
-
-def _build_mention_hint(mentions: Sequence[FeishuMentionRef]) -> str:
-    parts: List[str] = []
-    seen: set = set()
-    for ref in mentions:
-        if ref.is_self:
-            continue
-        signature = (ref.is_all, ref.open_id, ref.name)
-        if signature in seen:
-            continue
-        seen.add(signature)
-        if ref.is_all:
-            parts.append("@all")
-        elif ref.open_id:
-            parts.append(f"{ref.name or 'unknown'} (open_id={ref.open_id})")
-        else:
-            parts.append(ref.name or "unknown")
-    return f"[Mentioned: {', '.join(parts)}]" if parts else ""
-
-
-def _strip_edge_self_mentions(
-    text: str,
-    mentions: Sequence[FeishuMentionRef],
-) -> str:
-    # Leading: strip consecutive self-mentions unconditionally.
-    # Trailing: strip only when followed by whitespace/terminal punct, so
-    # mid-sentence references ("don't @Bot again") stay intact.
-    # Leading word-boundary prevents @Al from eating @Alice.
-    if not text:
-        return text
-    self_names = [
-        f"@{ref.name or ref.open_id or 'user'}"
-        for ref in mentions
-        if ref.is_self
-    ]
-    if not self_names:
-        return text
-
-    remaining = text.lstrip()
-    while True:
-        for nm in self_names:
-            if not remaining.startswith(nm):
-                continue
-            after = remaining[len(nm):]
-            if after and after[0] not in _MENTION_BOUNDARY_CHARS:
-                continue
-            remaining = after.lstrip()
-            break
-        else:
-            break
-
-    while True:
-        i = len(remaining)
-        while i > 0 and remaining[i - 1] in _TRAILING_TERMINAL_PUNCT:
-            i -= 1
-        body = remaining[:i]
-        tail = remaining[i:]
-        for nm in self_names:
-            if body.endswith(nm):
-                remaining = body[: -len(nm)].rstrip() + tail
-                break
-        else:
-            return remaining
-
-
 def _run_official_feishu_ws_client(ws_client: Any, adapter: Any) -> None:
    """Run the official Lark WS client in its own thread-local event loop."""
    import lark_oapi.ws.client as ws_client_module
@@ -2679,22 +2470,13 @@ class FeishuAdapter(BasePlatformAdapter):
        chat_type: str,
        message_id: str,
    ) -> None:
-        text, inbound_type, media_urls, media_types, mentions = await self._extract_message_content(message)
-
-        if inbound_type == MessageType.TEXT:
-            text = _strip_edge_self_mentions(text, mentions)
-            if text.startswith("/"):
-                inbound_type = MessageType.COMMAND
-
-        # Guard runs post-strip so a pure "@Bot" message (stripped to "") is dropped.
+        text, inbound_type, media_urls, media_types = await self._extract_message_content(message)
        if inbound_type == MessageType.TEXT and not text and not media_urls:
-            logger.debug("[Feishu] Ignoring empty text message id=%s", message_id)
+            logger.debug("[Feishu] Ignoring unsupported or empty message type: %s", getattr(message, "message_type", ""))
            return

-        if inbound_type != MessageType.COMMAND:
-            hint = _build_mention_hint(mentions)
-            if hint:
-                text = f"{hint}\n\n{text}" if text else hint
+        if inbound_type == MessageType.TEXT and text.startswith("/"):
+            inbound_type = MessageType.COMMAND

        reply_to_message_id = (
            getattr(message, "parent_id", None)
@@ -3153,20 +2935,14 @@ class FeishuAdapter(BasePlatformAdapter):
    # Message content extraction and resource download
    # =========================================================================

-    async def _extract_message_content(
-        self, message: Any
-    ) -> tuple[str, MessageType, List[str], List[str], List[FeishuMentionRef]]:
+    async def _extract_message_content(self, message: Any) -> tuple[str, MessageType, List[str], List[str]]:
+        """Extract text and cached media from a normalized Feishu message."""
        raw_content = getattr(message, "content", "") or ""
        raw_type = getattr(message, "message_type", "") or ""
        message_id = str(getattr(message, "message_id", "") or "")
        logger.info("[Feishu] Received raw message type=%s message_id=%s", raw_type, message_id)

-        normalized = normalize_feishu_message(
-            message_type=raw_type,
-            raw_content=raw_content,
-            mentions=getattr(message, "mentions", None),
-            bot=self._bot_identity(),
-        )
+        normalized = normalize_feishu_message(message_type=raw_type, raw_content=raw_content)
        media_urls, media_types = await self._download_feishu_message_resources(
            message_id=message_id,
            normalized=normalized,
@@ -3183,7 +2959,7 @@ class FeishuAdapter(BasePlatformAdapter):
            if injected:
                text = injected

-        return text, inbound_type, media_urls, media_types, list(normalized.mentions)
+        return text, inbound_type, media_urls, media_types

    async def _download_feishu_message_resources(
        self,
@@ -3447,22 +3223,10 @@ class FeishuAdapter(BasePlatformAdapter):
        return "group"

    async def _resolve_sender_profile(self, sender_id: Any) -> Dict[str, Optional[str]]:
-        """Map Feishu's three-tier user IDs onto Hermes' SessionSource fields.
-
-        Preference order for the primary ``user_id`` field:
-          1. user_id  (tenant-scoped, most stable — requires permission scope)
-          2. open_id  (app-scoped, always available — different per bot app)
-
-        ``user_id_alt`` carries the union_id (developer-scoped, stable across
-        all apps by the same developer).  Session-key generation prefers
-        user_id_alt when present, so participant isolation stays stable even
-        if the primary ID is the app-scoped open_id.
-        """
        open_id = getattr(sender_id, "open_id", None) or None
        user_id = getattr(sender_id, "user_id", None) or None
        union_id = getattr(sender_id, "union_id", None) or None
-        # Prefer tenant-scoped user_id; fall back to app-scoped open_id.
-        primary_id = user_id or open_id
+        primary_id = open_id or user_id
        display_name = await self._resolve_sender_name_from_api(primary_id or union_id)
        return {
            "user_id": primary_id,
@@ -3544,31 +3308,15 @@ class FeishuAdapter(BasePlatformAdapter):
            body = getattr(parent, "body", None)
            msg_type = getattr(parent, "msg_type", "") or ""
            raw_content = getattr(body, "content", "") or ""
-            parent_mentions = getattr(parent, "mentions", None) if parent else None
-            text = self._extract_text_from_raw_content(
-                msg_type=msg_type,
-                raw_content=raw_content,
-                mentions=parent_mentions,
-            )
+            text = self._extract_text_from_raw_content(msg_type=msg_type, raw_content=raw_content)
            self._message_text_cache[message_id] = text
            return text
        except Exception:
            logger.warning("[Feishu] Failed to fetch parent message %s", message_id, exc_info=True)
            return None

-    def _extract_text_from_raw_content(
-        self,
-        *,
-        msg_type: str,
-        raw_content: str,
-        mentions: Optional[Sequence[Any]] = None,
-    ) -> Optional[str]:
-        normalized = normalize_feishu_message(
-            message_type=msg_type,
-            raw_content=raw_content,
-            mentions=mentions,
-            bot=self._bot_identity(),
-        )
+    def _extract_text_from_raw_content(self, *, msg_type: str, raw_content: str) -> Optional[str]:
+        normalized = normalize_feishu_message(message_type=msg_type, raw_content=raw_content)
        if normalized.text_content:
            return normalized.text_content
        placeholder = normalized.metadata.get("placeholder_text") if isinstance(normalized.metadata, dict) else None
@@ -3638,10 +3386,10 @@ class FeishuAdapter(BasePlatformAdapter):
        normalized = normalize_feishu_message(
            message_type=getattr(message, "message_type", "") or "",
            raw_content=raw_content,
-            mentions=getattr(message, "mentions", None),
-            bot=self._bot_identity(),
        )
-        return self._post_mentions_bot(normalized.mentions)
+        if normalized.mentioned_ids:
+            return self._post_mentions_bot(normalized.mentioned_ids)
+        return False

    def _is_self_sent_bot_message(self, event: Any) -> bool:
        """Return True only for Feishu events emitted by this Hermes bot."""
@@ -3661,37 +3409,30 @@ class FeishuAdapter(BasePlatformAdapter):
        return False

    def _message_mentions_bot(self, mentions: List[Any]) -> bool:
-        # IDs trump names: when both sides have open_id (or both user_id),
-        # match requires equal IDs. Name fallback only when either side
-        # lacks an ID.
+        """Check whether any mention targets the configured or inferred bot identity."""
        for mention in mentions:
            mention_id = getattr(mention, "id", None)
-            mention_open_id = (getattr(mention_id, "open_id", None) or "").strip()
-            mention_user_id = (getattr(mention_id, "user_id", None) or "").strip()
+            mention_open_id = getattr(mention_id, "open_id", None)
+            mention_user_id = getattr(mention_id, "user_id", None)
            mention_name = (getattr(mention, "name", None) or "").strip()

-            if mention_open_id and self._bot_open_id:
-                if mention_open_id == self._bot_open_id:
-                    return True
-                continue  # IDs differ — not the bot; skip name fallback.
-            if mention_user_id and self._bot_user_id:
-                if mention_user_id == self._bot_user_id:
-                    return True
-                continue
+            if self._bot_open_id and mention_open_id == self._bot_open_id:
+                return True
+            if self._bot_user_id and mention_user_id == self._bot_user_id:
+                return True
            if self._bot_name and mention_name == self._bot_name:
                return True

        return False

-    def _post_mentions_bot(self, mentions: List[FeishuMentionRef]) -> bool:
-        return any(m.is_self for m in mentions)
-
-    def _bot_identity(self) -> _FeishuBotIdentity:
-        return _FeishuBotIdentity(
-            open_id=self._bot_open_id,
-            user_id=self._bot_user_id,
-            name=self._bot_name,
-        )
+    def _post_mentions_bot(self, mentioned_ids: List[str]) -> bool:
+        if not mentioned_ids:
+            return False
+        if self._bot_open_id and self._bot_open_id in mentioned_ids:
+            return True
+        if self._bot_user_id and self._bot_user_id in mentioned_ids:
+            return True
+        return False

    async def _hydrate_bot_identity(self) -> None:
        """Best-effort discovery of bot identity for precise group mention gating
@@ -3716,15 +3457,14 @@ class FeishuAdapter(BasePlatformAdapter):
        # uses via probe_bot().
        if not self._bot_open_id or not self._bot_name:
            try:
-                req = (
-                    BaseRequest.builder()
-                    .http_method(HttpMethod.GET)
-                    .uri("/open-apis/bot/v3/info")
-                    .token_types({AccessTokenType.TENANT})
-                    .build()
+                resp = await asyncio.to_thread(
+                    self._client.request,
+                    method="GET",
+                    url="/open-apis/bot/v3/info",
+                    body=None,
+                    raw_response=True,
                )
-                resp = await asyncio.to_thread(self._client.request, req)
-                content = getattr(getattr(resp, "raw", None), "content", None)
+                content = getattr(resp, "content", None)
                if content:
                    payload = json.loads(content)
                    parsed = _parse_bot_response(payload) or {}
@@ -4472,9 +4212,6 @@ def probe_bot(app_id: str, app_secret: str, domain: str) -> Optional[dict]:

    Uses lark_oapi SDK when available, falls back to raw HTTP otherwise.
    Returns {"bot_name": ..., "bot_open_id": ...} on success, None on failure.
-
-    Note: ``bot_open_id`` here is the bot's app-scoped open_id — the same ID
-    that Feishu puts in @mention payloads.  It is NOT the app_id.
    """
    if FEISHU_AVAILABLE:
        return _probe_bot_sdk(app_id, app_secret, domain)
@@ -4495,12 +4232,12 @@ def _build_onboard_client(app_id: str, app_secret: str, domain: str) -> Any:


 def _parse_bot_response(data: dict) -> Optional[dict]:
-    # /bot/v3/info returns bot.app_name; legacy paths used bot_name — accept both.
+    """Extract bot_name and bot_open_id from a /bot/v3/info response."""
    if data.get("code") != 0:
        return None
    bot = data.get("bot") or data.get("data", {}).get("bot") or {}
    return {
-        "bot_name": bot.get("app_name") or bot.get("bot_name"),
+        "bot_name": bot.get("bot_name"),
        "bot_open_id": bot.get("open_id"),
    }

@@ -4509,18 +4246,13 @@ def _probe_bot_sdk(app_id: str, app_secret: str, domain: str) -> Optional[dict]:
    """Probe bot info using lark_oapi SDK."""
    try:
        client = _build_onboard_client(app_id, app_secret, domain)
-        req = (
-            BaseRequest.builder()
-            .http_method(HttpMethod.GET)
-            .uri("/open-apis/bot/v3/info")
-            .token_types({AccessTokenType.TENANT})
-            .build()
+        resp = client.request(
+            method="GET",
+            url="/open-apis/bot/v3/info",
+            body=None,
+            raw_response=True,
        )
-        resp = client.request(req)
-        content = getattr(getattr(resp, "raw", None), "content", None)
-        if content is None:
-            return None
-        return _parse_bot_response(json.loads(content))
+        return _parse_bot_response(json.loads(resp.content))
    except Exception as exc:
        logger.debug("[Feishu onboard] SDK probe failed: %s", exc)
        return None
@@ -2687,9 +2687,8 @@ class GatewayRunner:
                except Exception as _e:
                    logger.debug("SessionDB close error: %s", _e)

-            from gateway.status import remove_pid_file, release_gateway_runtime_lock
+            from gateway.status import remove_pid_file
            remove_pid_file()
-            release_gateway_runtime_lock()

            # Write a clean-shutdown marker so the next startup knows this
            # wasn't a crash.  suspend_recently_active() only needs to run
@@ -3486,72 +3485,22 @@ class GatewayRunner:

        # Check for commands
        command = event.get_command()
-
-        from hermes_cli.commands import (
-            GATEWAY_KNOWN_COMMANDS,
-            is_gateway_known_command,
-            resolve_command as _resolve_cmd,
-        )
-
-        # Resolve aliases to canonical name so dispatch and hook names
-        # don't depend on the exact alias the user typed.
-        _cmd_def = _resolve_cmd(command) if command else None
-        canonical = _cmd_def.name if _cmd_def else command
-
-        # Fire the ``command:<canonical>`` hook for any recognized slash
-        # command — built-in OR plugin-registered. Handlers can return a
-        # dict with ``{"decision": "deny" | "handled" | "rewrite", ...}``
-        # to intercept dispatch before core handling runs. This replaces
-        # the previous fire-and-forget emit(): return values are now
-        # honored, but handlers that return nothing behave exactly as
-        # before (telemetry-style hooks keep working).
-        if command and is_gateway_known_command(canonical):
-            raw_args = event.get_command_args().strip()
-            hook_ctx = {
+
+        # Emit command:* hook for any recognized slash command.
+        # GATEWAY_KNOWN_COMMANDS is derived from the central COMMAND_REGISTRY
+        # in hermes_cli/commands.py — no hardcoded set to maintain here.
+        from hermes_cli.commands import GATEWAY_KNOWN_COMMANDS, resolve_command as _resolve_cmd
+        if command and command in GATEWAY_KNOWN_COMMANDS:
+            await self.hooks.emit(f"command:{command}", {
                "platform": source.platform.value if source.platform else "",
                "user_id": source.user_id,
-                "command": canonical,
-                "raw_command": command,
-                "args": raw_args,
-                "raw_args": raw_args,
-            }
-            try:
-                hook_results = await self.hooks.emit_collect(
-                    f"command:{canonical}", hook_ctx
-                )
-            except Exception as _hook_err:
-                logger.debug(
-                    "command:%s hook dispatch failed (non-fatal): %s",
-                    canonical, _hook_err,
-                )
-                hook_results = []
+                "command": command,
+                "args": event.get_command_args().strip(),
+            })

-            for hook_result in hook_results:
-                if not isinstance(hook_result, dict):
-                    continue
-                decision = str(hook_result.get("decision", "")).strip().lower()
-                if not decision or decision == "allow":
-                    continue
-                if decision == "deny":
-                    message = hook_result.get("message")
-                    if isinstance(message, str) and message:
-                        return message
-                    return f"Command `/{command}` was blocked by a hook."
-                if decision == "handled":
-                    message = hook_result.get("message")
-                    return message if isinstance(message, str) and message else None
-                if decision == "rewrite":
-                    new_command = str(
-                        hook_result.get("command_name", "")
-                    ).strip().lstrip("/")
-                    if not new_command:
-                        continue
-                    new_args = str(hook_result.get("raw_args", "")).strip()
-                    event.text = f"/{new_command} {new_args}".strip()
-                    command = event.get_command()
-                    _cmd_def = _resolve_cmd(command) if command else None
-                    canonical = _cmd_def.name if _cmd_def else command
-                    break
+        # Resolve aliases to canonical name so dispatch only checks canonicals.
+        _cmd_def = _resolve_cmd(command) if command else None
+        canonical = _cmd_def.name if _cmd_def else command

        if canonical == "new":
            return await self._handle_reset_command(event)
@@ -4971,11 +4920,6 @@ class GatewayRunner:
        # the configured default instead of the previously switched model.
        self._session_model_overrides.pop(session_key, None)

-        # Clear session-scoped dangerous-command approvals and /yolo state.
-        # /new is a conversation-boundary operation — approval state from the
-        # previous conversation must not survive the reset.
-        self._clear_session_boundary_security_state(session_key)
-
        # Fire plugin on_session_finalize hook (session boundary)
        try:
            from hermes_cli.plugins import invoke_hook as _invoke_hook
@@ -5746,6 +5690,7 @@ class GatewayRunner:
        from hermes_cli.models import (
            list_available_providers,
            normalize_provider,
+            provider_for_base_url,
            _PROVIDER_LABELS,
        )

@@ -5774,7 +5719,10 @@ class GatewayRunner:
        # Detect custom endpoint from config base_url
        if current_provider == "openrouter":
            _cfg_base = model_cfg.get("base_url", "") if isinstance(model_cfg, dict) else ""
-            if _cfg_base and "openrouter.ai" not in _cfg_base:
+            inferred_provider = provider_for_base_url(_cfg_base)
+            if inferred_provider:
+                current_provider = inferred_provider
+            elif _cfg_base and "openrouter.ai" not in _cfg_base:
                current_provider = "custom"

        current_label = _PROVIDER_LABELS.get(current_provider, current_provider)
@@ -7222,7 +7170,6 @@ class GatewayRunner:
        new_entry = self.session_store.switch_session(session_key, target_id)
        if not new_entry:
            return "Failed to switch session."
-        self._clear_session_boundary_security_state(session_key)

        # Get the title for confirmation
        title = self._session_db.get_session_title(target_id) or name
@@ -7312,7 +7259,6 @@ class GatewayRunner:
        new_entry = self.session_store.switch_session(session_key, new_session_id)
        if not new_entry:
            return "Branch created but failed to switch to it."
-        self._clear_session_boundary_security_state(session_key)

        # Evict any cached agent for this session
        self._evict_cached_agent(session_key)
@@ -7703,14 +7649,13 @@ class GatewayRunner:
        from hermes_cli.debug import (
            _capture_dump, collect_debug_report,
            upload_to_pastebin, _schedule_auto_delete,
-            _GATEWAY_PRIVACY_NOTICE, _best_effort_sweep_expired_pastes,
+            _GATEWAY_PRIVACY_NOTICE,
        )

        loop = asyncio.get_running_loop()

        # Run blocking I/O (dump capture, log reads, uploads) in a thread.
        def _collect_and_upload():
-            _best_effort_sweep_expired_pastes()
            dump_text = _capture_dump()
            report = collect_debug_report(log_lines=200, dump_text=dump_text)

@@ -8687,29 +8632,6 @@ class GatewayRunner:
        if hasattr(self, "_busy_ack_ts"):
            self._busy_ack_ts.pop(session_key, None)

-    def _clear_session_boundary_security_state(self, session_key: str) -> None:
-        """Clear approval state that must not survive a real conversation switch."""
-        if not session_key:
-            return
-
-        pending_approvals = getattr(self, "_pending_approvals", None)
-        if isinstance(pending_approvals, dict):
-            pending_approvals.pop(session_key, None)
-
-        try:
-            from tools.approval import clear_session as _clear_approval_session
-        except Exception:
-            return
-
-        try:
-            _clear_approval_session(session_key)
-        except Exception as e:
-            logger.debug(
-                "Failed to clear approval state for session boundary %s: %s",
-                session_key,
-                e,
-            )
-
    def _begin_session_run_generation(self, session_key: str) -> int:
        """Claim a fresh run generation token for ``session_key``.

@@ -10876,13 +10798,7 @@ async def start_gateway(config: Optional[GatewayConfig] = None, replace: bool =
    # The PID file is scoped to HERMES_HOME, so future multi-profile
    # setups (each profile using a distinct HERMES_HOME) will naturally
    # allow concurrent instances without tripping this guard.
-    from gateway.status import (
-        acquire_gateway_runtime_lock,
-        get_running_pid,
-        release_gateway_runtime_lock,
-        remove_pid_file,
-        terminate_pid,
-    )
+    from gateway.status import get_running_pid, remove_pid_file, terminate_pid
    existing_pid = get_running_pid()
    if existing_pid is not None and existing_pid != os.getpid():
        if replace:
@@ -11095,21 +11011,14 @@ async def start_gateway(config: Optional[GatewayConfig] = None, replace: bool =
            "Exiting to avoid double-running.", _current_pid
        )
        return False
-    if not acquire_gateway_runtime_lock():
-        logger.error(
-            "Gateway runtime lock is already held by another instance. Exiting."
-        )
-        return False
    try:
        write_pid_file()
    except FileExistsError:
-        release_gateway_runtime_lock()
        logger.error(
            "PID file race lost to another gateway instance. Exiting."
        )
        return False
    atexit.register(remove_pid_file)
-    atexit.register(release_gateway_runtime_lock)

    # Start the gateway
    success = await runner.start()
@@ -80,7 +80,7 @@ class SessionSource:
    user_name: Optional[str] = None
    thread_id: Optional[str] = None  # For forum topics, Discord threads, etc.
    chat_topic: Optional[str] = None  # Channel topic/description (Discord, Slack)
-    user_id_alt: Optional[str] = None  # Platform-specific stable alt ID (Signal UUID, Feishu union_id)
+    user_id_alt: Optional[str] = None  # Signal UUID (alternative to phone number)
    chat_id_alt: Optional[str] = None  # Signal group internal ID
    is_bot: bool = False  # True when the message author is a bot/webhook (Discord)
    
@@ -22,18 +22,11 @@ from pathlib import Path
 from hermes_constants import get_hermes_home
 from typing import Any, Optional

-if sys.platform == "win32":
-    import msvcrt
-else:
-    import fcntl
-
 _GATEWAY_KIND = "hermes-gateway"
 _RUNTIME_STATUS_FILE = "gateway_state.json"
 _LOCKS_DIRNAME = "gateway-locks"
 _IS_WINDOWS = sys.platform == "win32"
 _UNSET = object()
-_GATEWAY_LOCK_FILENAME = "gateway.lock"
-_gateway_lock_handle = None


 def _get_pid_path() -> Path:
@@ -42,14 +35,6 @@ def _get_pid_path() -> Path:
    return home / "gateway.pid"


-def _get_gateway_lock_path(pid_path: Optional[Path] = None) -> Path:
-    """Return the path to the runtime gateway lock file."""
-    if pid_path is not None:
-        return pid_path.with_name(_GATEWAY_LOCK_FILENAME)
-    home = get_hermes_home()
-    return home / _GATEWAY_LOCK_FILENAME
-
-
 def _get_runtime_status_path() -> Path:
    """Return the persisted runtime health/status file path."""
    return _get_pid_path().with_name(_RUNTIME_STATUS_FILE)
@@ -136,7 +121,6 @@ def _looks_like_gateway_process(pid: int) -> bool:
        "hermes_cli.main gateway",
        "hermes_cli/main.py gateway",
        "hermes gateway",
-        "hermes-gateway",
        "gateway/run.py",
    )
    return any(pattern in cmdline for pattern in patterns)
@@ -228,135 +212,16 @@ def _read_pid_record(pid_path: Optional[Path] = None) -> Optional[dict]:
    return None


-def _read_gateway_lock_record(lock_path: Optional[Path] = None) -> Optional[dict[str, Any]]:
-    return _read_pid_record(lock_path or _get_gateway_lock_path())
-
-
-def _pid_from_record(record: Optional[dict[str, Any]]) -> Optional[int]:
-    if not record:
-        return None
-    try:
-        return int(record["pid"])
-    except (KeyError, TypeError, ValueError):
-        return None
-
-
 def _cleanup_invalid_pid_path(pid_path: Path, *, cleanup_stale: bool) -> None:
-    """Delete a stale gateway PID file (and its sibling lock metadata).
-
-    Called from ``get_running_pid()`` after the runtime lock has already been
-    confirmed inactive, so the on-disk metadata is known to belong to a dead
-    process.  Unlike ``remove_pid_file()`` (which defensively refuses to delete
-    a PID file whose ``pid`` field differs from ``os.getpid()`` to protect
-    ``--replace`` handoffs), this path force-unlinks both files so the next
-    startup sees a clean slate.
-    """
    if not cleanup_stale:
        return
    try:
-        pid_path.unlink(missing_ok=True)
+        if pid_path == _get_pid_path():
+            remove_pid_file()
+        else:
+            pid_path.unlink(missing_ok=True)
    except Exception:
        pass
-    try:
-        _get_gateway_lock_path(pid_path).unlink(missing_ok=True)
-    except Exception:
-        pass
-
-
-def _write_gateway_lock_record(handle) -> None:
-    handle.seek(0)
-    handle.truncate()
-    json.dump(_build_pid_record(), handle)
-    handle.flush()
-    try:
-        os.fsync(handle.fileno())
-    except OSError:
-        pass
-
-
-def _try_acquire_file_lock(handle) -> bool:
-    try:
-        if _IS_WINDOWS:
-            handle.seek(0, os.SEEK_END)
-            if handle.tell() == 0:
-                handle.write("\n")
-                handle.flush()
-            handle.seek(0)
-            msvcrt.locking(handle.fileno(), msvcrt.LK_NBLCK, 1)
-        else:
-            fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
-        return True
-    except (BlockingIOError, OSError):
-        return False
-
-
-def _release_file_lock(handle) -> None:
-    try:
-        if _IS_WINDOWS:
-            handle.seek(0)
-            msvcrt.locking(handle.fileno(), msvcrt.LK_UNLCK, 1)
-        else:
-            fcntl.flock(handle.fileno(), fcntl.LOCK_UN)
-    except OSError:
-        pass
-
-
-def acquire_gateway_runtime_lock() -> bool:
-    """Claim the cross-process runtime lock for the gateway.
-
-    Unlike the PID file, the lock is owned by the live process itself. If the
-    process dies abruptly, the OS releases the lock automatically.
-    """
-    global _gateway_lock_handle
-    if _gateway_lock_handle is not None:
-        return True
-
-    path = _get_gateway_lock_path()
-    path.parent.mkdir(parents=True, exist_ok=True)
-    handle = open(path, "a+", encoding="utf-8")
-    if not _try_acquire_file_lock(handle):
-        handle.close()
-        return False
-    _write_gateway_lock_record(handle)
-    _gateway_lock_handle = handle
-    return True
-
-
-def release_gateway_runtime_lock() -> None:
-    """Release the gateway runtime lock when owned by this process."""
-    global _gateway_lock_handle
-    handle = _gateway_lock_handle
-    if handle is None:
-        return
-    _gateway_lock_handle = None
-    _release_file_lock(handle)
-    try:
-        handle.close()
-    except OSError:
-        pass
-
-
-def is_gateway_runtime_lock_active(lock_path: Optional[Path] = None) -> bool:
-    """Return True when some process currently owns the gateway runtime lock."""
-    global _gateway_lock_handle
-    resolved_lock_path = lock_path or _get_gateway_lock_path()
-    if _gateway_lock_handle is not None and resolved_lock_path == _get_gateway_lock_path():
-        return True
-
-    if not resolved_lock_path.exists():
-        return False
-
-    handle = open(resolved_lock_path, "a+", encoding="utf-8")
-    try:
-        if _try_acquire_file_lock(handle):
-            _release_file_lock(handle)
-            return False
-        return True
-    finally:
-        try:
-            handle.close()
-        except OSError:
-            pass


 def write_pid_file() -> None:
@@ -718,42 +583,35 @@ def get_running_pid(
    Cleans up stale PID files automatically.
    """
    resolved_pid_path = pid_path or _get_pid_path()
-    resolved_lock_path = _get_gateway_lock_path(resolved_pid_path)
-    lock_active = is_gateway_runtime_lock_active(resolved_lock_path)
-    if not lock_active:
+    record = _read_pid_record(resolved_pid_path)
+    if not record:
        _cleanup_invalid_pid_path(resolved_pid_path, cleanup_stale=cleanup_stale)
        return None

-    primary_record = _read_pid_record(resolved_pid_path)
-    fallback_record = _read_gateway_lock_record(resolved_lock_path)
+    try:
+        pid = int(record["pid"])
+    except (KeyError, TypeError, ValueError):
+        _cleanup_invalid_pid_path(resolved_pid_path, cleanup_stale=cleanup_stale)
+        return None

-    for record in (primary_record, fallback_record):
-        pid = _pid_from_record(record)
-        if pid is None:
-            continue
+    try:
+        os.kill(pid, 0)  # signal 0 = existence check, no actual signal sent
+    except (ProcessLookupError, PermissionError):
+        _cleanup_invalid_pid_path(resolved_pid_path, cleanup_stale=cleanup_stale)
+        return None

-        try:
-            os.kill(pid, 0)  # signal 0 = existence check, no actual signal sent
-        except ProcessLookupError:
-            continue
-        except PermissionError:
-            # The process exists but belongs to another user/service scope.
-            # With the runtime lock still held, prefer keeping it visible
-            # rather than deleting the PID file as "stale".
-            if _record_looks_like_gateway(record):
-                return pid
-            continue
+    recorded_start = record.get("start_time")
+    current_start = _get_process_start_time(pid)
+    if recorded_start is not None and current_start is not None and current_start != recorded_start:
+        _cleanup_invalid_pid_path(resolved_pid_path, cleanup_stale=cleanup_stale)
+        return None

-        recorded_start = record.get("start_time")
-        current_start = _get_process_start_time(pid)
-        if recorded_start is not None and current_start is not None and current_start != recorded_start:
-            continue
+    if not _looks_like_gateway_process(pid):
+        if not _record_looks_like_gateway(record):
+            _cleanup_invalid_pid_path(resolved_pid_path, cleanup_stale=cleanup_stale)
+            return None

-        if _looks_like_gateway_process(pid) or _record_looks_like_gateway(record):
-            return pid
-
-    _cleanup_invalid_pid_path(resolved_pid_path, cleanup_stale=cleanup_stale)
-    return None
+    return pid


 def is_gateway_running(
@@ -39,6 +39,13 @@ import httpx
 import yaml

 from hermes_cli.config import get_hermes_home, get_config_path, read_raw_config
+from hermes_cli.volcengine_byteplus import (
+    VOLCENGINE_PROVIDER,
+    BYTEPLUS_PROVIDER,
+    VOLCENGINE_STANDARD_BASE_URL,
+    BYTEPLUS_STANDARD_BASE_URL,
+    base_url_for_provider_model,
+)
 from hermes_constants import OPENROUTER_BASE_URL

 logger = logging.getLogger(__name__)
@@ -214,7 +221,6 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
        auth_type="api_key",
        inference_base_url="https://api.anthropic.com",
        api_key_env_vars=("ANTHROPIC_API_KEY", "ANTHROPIC_TOKEN", "CLAUDE_CODE_OAUTH_TOKEN"),
-        base_url_env_var="ANTHROPIC_BASE_URL",
    ),
    "alibaba": ProviderConfig(
        id="alibaba",
@@ -308,6 +314,20 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
        api_key_env_vars=("XIAOMI_API_KEY",),
        base_url_env_var="XIAOMI_BASE_URL",
    ),
+    "volcengine": ProviderConfig(
+        id=VOLCENGINE_PROVIDER,
+        name="Volcengine",
+        auth_type="api_key",
+        inference_base_url=VOLCENGINE_STANDARD_BASE_URL,
+        api_key_env_vars=("VOLCENGINE_API_KEY",),
+    ),
+    "byteplus": ProviderConfig(
+        id=BYTEPLUS_PROVIDER,
+        name="BytePlus",
+        auth_type="api_key",
+        inference_base_url=BYTEPLUS_STANDARD_BASE_URL,
+        api_key_env_vars=("BYTEPLUS_API_KEY",),
+    ),
    "ollama-cloud": ProviderConfig(
        id="ollama-cloud",
        name="Ollama Cloud",
@@ -1016,6 +1036,10 @@ def resolve_provider(
        "hf": "huggingface", "hugging-face": "huggingface", "huggingface-hub": "huggingface",
        "mimo": "xiaomi", "xiaomi-mimo": "xiaomi",
        "aws": "bedrock", "aws-bedrock": "bedrock", "amazon-bedrock": "bedrock", "amazon": "bedrock",
+        "volcengine-coding-plan": "volcengine",
+        "volcengine_coding_plan": "volcengine",
+        "byteplus-coding-plan": "byteplus",
+        "byteplus_coding_plan": "byteplus",
        "go": "opencode-go", "opencode-go-sub": "opencode-go",
        "kilo": "kilocode", "kilo-code": "kilocode", "kilo-gateway": "kilocode",
        # Local server aliases — route through the generic custom provider
@@ -1158,6 +1182,21 @@ def _qwen_cli_auth_path() -> Path:
    return Path.home() / ".qwen" / "oauth_creds.json"


+def _current_model_for_provider(provider_id: str) -> str:
+    """Return the currently configured model when it belongs to the provider."""
+    try:
+        config = read_raw_config()
+    except Exception:
+        return ""
+
+    model_cfg = config.get("model")
+    if isinstance(model_cfg, dict):
+        configured_provider = str(model_cfg.get("provider") or "").strip().lower()
+        if configured_provider == provider_id:
+            return str(model_cfg.get("default") or model_cfg.get("model") or "").strip()
+    return ""
+
+
 def _read_qwen_cli_tokens() -> Dict[str, Any]:
    auth_path = _qwen_cli_auth_path()
    if not auth_path.exists():
@@ -2556,7 +2595,11 @@ def get_api_key_provider_status(provider_id: str) -> Dict[str, Any]:
    if pconfig.base_url_env_var:
        env_url = os.getenv(pconfig.base_url_env_var, "").strip()

-    if provider_id in ("kimi-coding", "kimi-coding-cn"):
+    active_model = _current_model_for_provider(provider_id)
+
+    if provider_id in {VOLCENGINE_PROVIDER, BYTEPLUS_PROVIDER}:
+        base_url = base_url_for_provider_model(provider_id, active_model) or pconfig.inference_base_url
+    elif provider_id in ("kimi-coding", "kimi-coding-cn"):
        base_url = _resolve_kimi_base_url(api_key, pconfig.inference_base_url, env_url)
    elif env_url:
        base_url = env_url
@@ -2651,7 +2694,11 @@ def resolve_api_key_provider_credentials(provider_id: str) -> Dict[str, Any]:
    if pconfig.base_url_env_var:
        env_url = os.getenv(pconfig.base_url_env_var, "").strip()

-    if provider_id in ("kimi-coding", "kimi-coding-cn"):
+    active_model = _current_model_for_provider(provider_id)
+
+    if provider_id in {VOLCENGINE_PROVIDER, BYTEPLUS_PROVIDER}:
+        base_url = base_url_for_provider_model(provider_id, active_model) or pconfig.inference_base_url
+    elif provider_id in ("kimi-coding", "kimi-coding-cn"):
        base_url = _resolve_kimi_base_url(api_key, pconfig.inference_base_url, env_url)
    elif provider_id == "zai":
        base_url = _resolve_zai_base_url(api_key, pconfig.inference_base_url, env_url)
@@ -249,7 +249,7 @@ def _scan_workspace_state(source_dir: Path) -> list[tuple[Path, str]]:
            state_path = child / state_name
            if state_path.exists():
                kind = "directory" if state_path.is_dir() else "file"
-                rel = state_path.relative_to(source_dir).as_posix()
+                rel = state_path.relative_to(source_dir)
                findings.append((state_path, f"Workspace {kind}: {rel}"))

    return findings
@@ -260,26 +260,6 @@ GATEWAY_KNOWN_COMMANDS: frozenset[str] = frozenset(
 )


-def is_gateway_known_command(name: str | None) -> bool:
-    """Return True if ``name`` resolves to a gateway-dispatchable slash command.
-
-    This covers both built-in commands (``GATEWAY_KNOWN_COMMANDS`` derived
-    from ``COMMAND_REGISTRY``) and plugin-registered commands, which are
-    looked up lazily so importing this module never forces plugin
-    discovery. Gateway code uses this to decide whether to emit
-    ``command:<name>`` hooks — plugin commands get the same lifecycle
-    events as built-ins.
-    """
-    if not name:
-        return False
-    if name in GATEWAY_KNOWN_COMMANDS:
-        return True
-    for plugin_name, _description, _args_hint in _iter_plugin_command_entries():
-        if plugin_name == name:
-            return True
-    return False
-
-
 # Commands with explicit Level-2 running-agent handlers in gateway/run.py.
 # Listed here for introspection / tests; semantically a subset of
 # "all resolvable commands" — which is the real bypass set (see
@@ -391,47 +371,12 @@ def gateway_help_lines() -> list[str]:
    return lines


-def _iter_plugin_command_entries() -> list[tuple[str, str, str]]:
-    """Yield (name, description, args_hint) tuples for all plugin slash commands.
-
-    Plugin commands are registered via
-    :func:`hermes_cli.plugins.PluginContext.register_command`. They behave
-    like ``CommandDef`` entries for gateway surfacing: they appear in the
-    Telegram command menu, in Slack's ``/hermes`` subcommand mapping, and
-    (via :func:`gateway.platforms.discord._register_slash_commands`) in
-    Discord's native slash command picker.
-
-    Lookup is lazy so importing this module never forces plugin discovery
-    (which can trigger filesystem scans and environment-dependent
-    behavior).
-    """
-    try:
-        from hermes_cli.plugins import get_plugin_commands
-    except Exception:
-        return []
-    try:
-        commands = get_plugin_commands() or {}
-    except Exception:
-        return []
-    entries: list[tuple[str, str, str]] = []
-    for name, meta in commands.items():
-        if not isinstance(name, str) or not isinstance(meta, dict):
-            continue
-        description = str(meta.get("description") or f"Run /{name}")
-        args_hint = str(meta.get("args_hint") or "").strip()
-        entries.append((name, description, args_hint))
-    return entries
-
-
 def telegram_bot_commands() -> list[tuple[str, str]]:
    """Return (command_name, description) pairs for Telegram setMyCommands.

    Telegram command names cannot contain hyphens, so they are replaced with
    underscores.  Aliases are skipped -- Telegram shows one menu entry per
    canonical command.
-
-    Plugin-registered slash commands are included so plugins get native
-    autocomplete in Telegram without touching core code.
    """
    overrides = _resolve_config_gates()
    result: list[tuple[str, str]] = []
@@ -441,10 +386,6 @@ def telegram_bot_commands() -> list[tuple[str, str]]:
        tg_name = _sanitize_telegram_name(cmd.name)
        if tg_name:
            result.append((tg_name, cmd.description))
-    for name, description, _args_hint in _iter_plugin_command_entries():
-        tg_name = _sanitize_telegram_name(name)
-        if tg_name:
-            result.append((tg_name, description))
    return result


@@ -809,9 +750,6 @@ def slack_subcommand_map() -> dict[str, str]:

    Maps both canonical names and aliases so /hermes bg do stuff works
    the same as /hermes background do stuff.
-
-    Plugin-registered slash commands are included so ``/hermes <plugin-cmd>``
-    routes through the plugin handler.
    """
    overrides = _resolve_config_gates()
    mapping: dict[str, str] = {}
@@ -821,9 +759,6 @@ def slack_subcommand_map() -> dict[str, str]:
        mapping[cmd.name] = f"/{cmd.name}"
        for alias in cmd.aliases:
            mapping[alias] = f"/{alias}"
-    for name, _description, _args_hint in _iter_plugin_command_entries():
-        if name not in mapping:
-            mapping[name] = f"/{name}"
    return mapping


@@ -712,12 +712,6 @@ DEFAULT_CONFIG = {
        "provider": "",    # e.g. "openrouter" (empty = inherit parent provider + credentials)
        "base_url": "",    # direct OpenAI-compatible endpoint for subagents
        "api_key": "",     # API key for delegation.base_url (falls back to OPENAI_API_KEY)
-        # When delegate_task narrows child toolsets explicitly, preserve any
-        # MCP toolsets the parent already has enabled. On by default so
-        # narrowing (e.g. toolsets=["web","browser"]) expresses "I want these
-        # extras" without silently stripping MCP tools the parent already has.
-        # Set to false for strict intersection.
-        "inherit_mcp_toolsets": True,
        "max_iterations": 50,  # per-subagent iteration cap (each subagent gets its own budget,
                               # independent of the parent's max_iterations)
        "reasoning_effort": "",  # reasoning effort for subagents: "xhigh", "high", "medium",
@@ -846,7 +840,6 @@ DEFAULT_CONFIG = {

    # Pre-exec security scanning via tirith
    "security": {
-        "allow_private_urls": False,  # Allow requests to private/internal IPs (for OpenWrt, proxies, VPNs)
        "redact_secrets": True,
        "tirith_enabled": True,
        "tirith_path": "tirith",
@@ -1288,6 +1281,20 @@ OPTIONAL_ENV_VARS = {
        "category": "provider",
        "advanced": True,
    },
+    "VOLCENGINE_API_KEY": {
+        "description": "Volcengine API key for Doubao / Seed models (standard + Coding Plan catalogs)",
+        "prompt": "Volcengine API Key",
+        "url": "https://www.volcengine.com/product/ark",
+        "password": True,
+        "category": "provider",
+    },
+    "BYTEPLUS_API_KEY": {
+        "description": "BytePlus API key for Seed / Dola models (standard + Coding Plan catalogs)",
+        "prompt": "BytePlus API Key",
+        "url": "https://www.byteplus.com/en/product/modelark",
+        "password": True,
+        "category": "provider",
+    },
    "AWS_REGION": {
        "description": "AWS region for Bedrock API calls (e.g. us-east-1, eu-central-1)",
        "prompt": "AWS Region",
@@ -3169,7 +3176,7 @@ def save_config(config: Dict[str, Any]):
    if not sec or sec.get("redact_secrets") is None:
        parts.append(_SECURITY_COMMENT)
    fb = normalized.get("fallback_model", {})
-    if not fb or not isinstance(fb, dict) or not (fb.get("provider") and fb.get("model")):
+    if not fb or not (fb.get("provider") and fb.get("model")):
        parts.append(_FALLBACK_COMMENT)

    atomic_yaml_write(
@@ -13,7 +13,6 @@ import time
 import urllib.error
 import urllib.parse
 import urllib.request
-from dataclasses import dataclass
 from pathlib import Path
 from typing import Optional

@@ -148,14 +147,6 @@ def _sweep_expired_pastes(now: Optional[float] = None) -> tuple[int, int]:
    return (deleted, len(remaining))


-def _best_effort_sweep_expired_pastes() -> None:
-    """Attempt pending-paste cleanup without letting /debug fail offline."""
-    try:
-        _sweep_expired_pastes()
-    except Exception:
-        pass
-
-
 # ---------------------------------------------------------------------------
 # Privacy / delete helpers
 # ---------------------------------------------------------------------------
@@ -323,128 +314,72 @@ def upload_to_pastebin(content: str, expiry_days: int = 7) -> str:
 # Log file reading
 # ---------------------------------------------------------------------------

-
-@dataclass
-class LogSnapshot:
-    """Single-read snapshot of a log file used by debug-share."""
-
-    path: Optional[Path]
-    tail_text: str
-    full_text: Optional[str]
-
-
-def _primary_log_path(log_name: str) -> Optional[Path]:
-    """Where *log_name* would live if present. Doesn't check existence."""
-    from hermes_cli.logs import LOG_FILES
-
-    filename = LOG_FILES.get(log_name)
-    return (get_hermes_home() / "logs" / filename) if filename else None
-
-
 def _resolve_log_path(log_name: str) -> Optional[Path]:
    """Find the log file for *log_name*, falling back to the .1 rotation.

-    Returns the first non-empty candidate (primary, then .1), or None.
-    Callers distinguish 'empty primary' from 'truly missing' via
-    :func:`_primary_log_path`.
+    Returns the path if found, or None.
    """
-    primary = _primary_log_path(log_name)
-    if primary is None:
+    from hermes_cli.logs import LOG_FILES
+
+    filename = LOG_FILES.get(log_name)
+    if not filename:
        return None

+    log_dir = get_hermes_home() / "logs"
+    primary = log_dir / filename
    if primary.exists() and primary.stat().st_size > 0:
        return primary

-    rotated = primary.parent / f"{primary.name}.1"
+    # Fall back to the most recent rotated file (.1).
+    rotated = log_dir / f"{filename}.1"
    if rotated.exists() and rotated.stat().st_size > 0:
        return rotated

    return None


-def _capture_log_snapshot(
-    log_name: str,
-    *,
-    tail_lines: int,
-    max_bytes: int = _MAX_LOG_BYTES,
-) -> LogSnapshot:
-    """Capture a log once and derive summary/full-log views from it.
+def _read_log_tail(log_name: str, num_lines: int) -> str:
+    """Read the last *num_lines* from a log file, or return a placeholder."""
+    from hermes_cli.logs import _read_last_n_lines

-    The report tail and standalone log upload must come from the same file
-    snapshot. Otherwise a rotation/truncate between reads can make the report
-    look newer than the uploaded ``agent.log`` paste.
+    log_path = _resolve_log_path(log_name)
+    if log_path is None:
+        return "(file not found)"
+
+    try:
+        lines = _read_last_n_lines(log_path, num_lines)
+        return "".join(lines).rstrip("\n")
+    except Exception as exc:
+        return f"(error reading: {exc})"
+
+
+def _read_full_log(log_name: str, max_bytes: int = _MAX_LOG_BYTES) -> Optional[str]:
+    """Read a log file for standalone upload.
+
+    Returns the file content (the last *max_bytes* if truncated), or None if the
+    file is missing, empty, or unreadable.
    """
    log_path = _resolve_log_path(log_name)
    if log_path is None:
-        primary = _primary_log_path(log_name)
-        tail = "(file empty)" if primary and primary.exists() else "(file not found)"
-        return LogSnapshot(path=None, tail_text=tail, full_text=None)
+        return None

    try:
        size = log_path.stat().st_size
        if size == 0:
-            # race: file was truncated between _resolve_log_path and stat
-            return LogSnapshot(path=log_path, tail_text="(file empty)", full_text=None)
+            return None

+        if size <= max_bytes:
+            return log_path.read_text(encoding="utf-8", errors="replace")
+
+        # File is larger than max_bytes — read the tail.
        with open(log_path, "rb") as f:
-            if size <= max_bytes:
-                raw = f.read()
-                truncated = False
-            else:
-                # Read from the end until we have enough bytes for the
-                # standalone upload and enough newline context to render the
-                # summary tail from the same snapshot.
-                chunk_size = 8192
-                pos = size
-                chunks: list[bytes] = []
-                total = 0
-                newline_count = 0
-
-                while pos > 0 and (total < max_bytes or newline_count <= tail_lines + 1) and total < max_bytes * 2:
-                    read_size = min(chunk_size, pos)
-                    pos -= read_size
-                    f.seek(pos)
-                    chunk = f.read(read_size)
-                    chunks.insert(0, chunk)
-                    total += len(chunk)
-                    newline_count += chunk.count(b"\n")
-                    chunk_size = min(chunk_size * 2, 65536)
-
-                raw = b"".join(chunks)
-                truncated = pos > 0
-
-        full_raw = raw
-        if truncated and len(full_raw) > max_bytes:
-            cut = len(full_raw) - max_bytes
-            # Check whether the cut lands exactly on a line boundary.  If the
-            # byte just before the cut position is a newline the first retained
-            # byte starts a complete line and we should keep it.  Only drop a
-            # partial first line when we're genuinely mid-line.
-            on_boundary = cut > 0 and full_raw[cut - 1 : cut] == b"\n"
-            full_raw = full_raw[cut:]
-            if not on_boundary and b"\n" in full_raw:
-                full_raw = full_raw.split(b"\n", 1)[1]
-
-        all_text = raw.decode("utf-8", errors="replace")
-        tail_text = "".join(all_text.splitlines(keepends=True)[-tail_lines:]).rstrip("\n")
-
-        full_text = full_raw.decode("utf-8", errors="replace")
-        if truncated:
-            full_text = f"[... truncated — showing last ~{max_bytes // 1024}KB ...]\n{full_text}"
-
-        return LogSnapshot(path=log_path, tail_text=tail_text, full_text=full_text)
-    except Exception as exc:
-        return LogSnapshot(path=log_path, tail_text=f"(error reading: {exc})", full_text=None)
-
-
-def _capture_default_log_snapshots(log_lines: int) -> dict[str, LogSnapshot]:
-    """Capture all logs used by debug-share exactly once."""
-    errors_lines = min(log_lines, 100)
-    return {
-        "agent": _capture_log_snapshot("agent", tail_lines=log_lines),
-        "errors": _capture_log_snapshot("errors", tail_lines=errors_lines),
-        "gateway": _capture_log_snapshot("gateway", tail_lines=errors_lines),
-    }
+            f.seek(size - max_bytes)
+            # Skip partial line at the seek point.
+            f.readline()
+            content = f.read().decode("utf-8", errors="replace")
+        return f"[... truncated — showing last ~{max_bytes // 1024}KB ...]\n{content}"
+    except Exception:
+        return None


 # ---------------------------------------------------------------------------
@@ -470,12 +405,7 @@ def _capture_dump() -> str:
    return capture.getvalue()


-def collect_debug_report(
-    *,
-    log_lines: int = 200,
-    dump_text: str = "",
-    log_snapshots: Optional[dict[str, LogSnapshot]] = None,
-) -> str:
+def collect_debug_report(*, log_lines: int = 200, dump_text: str = "") -> str:
    """Build the summary debug report: system dump + log tails.

    Parameters
@@ -494,22 +424,19 @@ def collect_debug_report(
        dump_text = _capture_dump()
    buf.write(dump_text)

-    if log_snapshots is None:
-        log_snapshots = _capture_default_log_snapshots(log_lines)
-
    # ── Recent log tails (summary only) ──────────────────────────────────
    buf.write("\n\n")
    buf.write(f"--- agent.log (last {log_lines} lines) ---\n")
-    buf.write(log_snapshots["agent"].tail_text)
+    buf.write(_read_log_tail("agent", log_lines))
    buf.write("\n\n")

    errors_lines = min(log_lines, 100)
    buf.write(f"--- errors.log (last {errors_lines} lines) ---\n")
-    buf.write(log_snapshots["errors"].tail_text)
+    buf.write(_read_log_tail("errors", errors_lines))
    buf.write("\n\n")

    buf.write(f"--- gateway.log (last {errors_lines} lines) ---\n")
-    buf.write(log_snapshots["gateway"].tail_text)
+    buf.write(_read_log_tail("gateway", errors_lines))
    buf.write("\n")

    return buf.getvalue()
@@ -521,8 +448,6 @@ def collect_debug_report(

 def run_debug_share(args):
    """Collect debug report + full logs, upload each, print URLs."""
-    _best_effort_sweep_expired_pastes()
-
    log_lines = getattr(args, "lines", 200)
    expiry = getattr(args, "expire", 7)
    local_only = getattr(args, "local", False)
@@ -534,15 +459,10 @@ def run_debug_share(args):

    # Capture dump once — prepended to every paste for context.
    dump_text = _capture_dump()
-    log_snapshots = _capture_default_log_snapshots(log_lines)

-    report = collect_debug_report(
-        log_lines=log_lines,
-        dump_text=dump_text,
-        log_snapshots=log_snapshots,
-    )
-    agent_log = log_snapshots["agent"].full_text
-    gateway_log = log_snapshots["gateway"].full_text
+    report = collect_debug_report(log_lines=log_lines, dump_text=dump_text)
+    agent_log = _read_full_log("agent")
+    gateway_log = _read_full_log("gateway")

    # Prepend dump header to each full log so every paste is self-contained.
    if agent_log:
@@ -333,147 +333,6 @@ def _probe_systemd_service_running(system: bool = False) -> tuple[bool, bool]:
    return selected_system, result.stdout.strip() == "active"


-def _read_systemd_unit_properties(
-    system: bool = False,
-    properties: tuple[str, ...] = (
-        "ActiveState",
-        "SubState",
-        "Result",
-        "ExecMainStatus",
-    ),
-) -> dict[str, str]:
-    """Return selected ``systemctl show`` properties for the gateway unit."""
-    selected_system = _select_systemd_scope(system)
-    try:
-        result = _run_systemctl(
-            [
-                "show",
-                get_service_name(),
-                "--no-pager",
-                "--property",
-                ",".join(properties),
-            ],
-            system=selected_system,
-            capture_output=True,
-            text=True,
-            timeout=10,
-        )
-    except (RuntimeError, subprocess.TimeoutExpired, OSError):
-        return {}
-
-    if result.returncode != 0:
-        return {}
-
-    parsed: dict[str, str] = {}
-    for line in result.stdout.splitlines():
-        if "=" not in line:
-            continue
-        key, value = line.split("=", 1)
-        parsed[key] = value.strip()
-    return parsed
-
-
-def _wait_for_systemd_service_restart(
-    *,
-    system: bool = False,
-    previous_pid: int | None = None,
-    timeout: float = 60.0,
-) -> bool:
-    """Wait for the gateway service to become active after a restart handoff."""
-    import time
-
-    svc = get_service_name()
-    scope_label = _service_scope_label(system).capitalize()
-    deadline = time.time() + timeout
-
-    while time.time() < deadline:
-        props = _read_systemd_unit_properties(system=system)
-        active_state = props.get("ActiveState", "")
-        sub_state = props.get("SubState", "")
-        new_pid = None
-        try:
-            from gateway.status import get_running_pid
-
-            new_pid = get_running_pid()
-        except Exception:
-            new_pid = None
-
-        if active_state == "active":
-            if new_pid and (previous_pid is None or new_pid != previous_pid):
-                print(f"✓ {scope_label} service restarted (PID {new_pid})")
-                return True
-            if previous_pid is None:
-                print(f"✓ {scope_label} service restarted")
-                return True
-
-        if active_state == "activating" and sub_state == "auto-restart":
-            time.sleep(1)
-            continue
-
-        time.sleep(2)
-
-    print(
-        f"⚠ {scope_label} service did not become active within {int(timeout)}s.\n"
-        f"  Check status: {'sudo ' if system else ''}hermes gateway status\n"
-        f"  Check logs:   journalctl {'--user ' if not system else ''}-u {svc} -l --since '2 min ago'"
-    )
-    return False
-
-
-def _recover_pending_systemd_restart(system: bool = False, previous_pid: int | None = None) -> bool:
-    """Recover a planned service restart that is stuck in systemd state."""
-    props = _read_systemd_unit_properties(system=system)
-    if not props:
-        return False
-
-    try:
-        from gateway.status import read_runtime_status
-    except Exception:
-        return False
-
-    runtime_state = read_runtime_status() or {}
-    if not runtime_state.get("restart_requested"):
-        return False
-
-    active_state = props.get("ActiveState", "")
-    sub_state = props.get("SubState", "")
-    exec_main_status = props.get("ExecMainStatus", "")
-    result = props.get("Result", "")
-
-    if active_state == "activating" and sub_state == "auto-restart":
-        print("⏳ Service restart already pending — waiting for systemd relaunch...")
-        return _wait_for_systemd_service_restart(
-            system=system,
-            previous_pid=previous_pid,
-        )
-
-    if active_state == "failed" and (
-        exec_main_status == str(GATEWAY_SERVICE_RESTART_EXIT_CODE)
-        or result == "exit-code"
-    ):
-        svc = get_service_name()
-        scope_label = _service_scope_label(system).capitalize()
-        print(f"↻ Clearing failed state for pending {scope_label.lower()} service restart...")
-        _run_systemctl(
-            ["reset-failed", svc],
-            system=system,
-            check=False,
-            timeout=30,
-        )
-        _run_systemctl(
-            ["start", svc],
-            system=system,
-            check=False,
-            timeout=90,
-        )
-        return _wait_for_systemd_service_restart(
-            system=system,
-            previous_pid=previous_pid,
-        )
-
-    return False
-
-
 def _probe_launchd_service_running() -> bool:
    if not get_launchd_plist_path().exists():
        return False
@@ -611,8 +470,7 @@ def stop_profile_gateway() -> bool:
        except (ProcessLookupError, PermissionError):
            break

-    if get_running_pid() is None:
-        remove_pid_file()
+    remove_pid_file()
    return True


@@ -1647,9 +1505,14 @@ def systemd_restart(system: bool = False):

    pid = get_running_pid()
    if pid is not None and _request_gateway_self_restart(pid):
+        # SIGUSR1 sent — the gateway will drain active agents, exit with
+        # code 75, and systemd will restart it after RestartSec (30s).
+        # Wait for the old process to die and the new one to become active
+        # so the CLI doesn't return while the service is still restarting.
        import time
        scope_label = _service_scope_label(system).capitalize()
        svc = get_service_name()
+        scope_cmd = _systemctl_cmd(system)

        # Phase 1: wait for old process to exit (drain + shutdown)
        print(f"⏳ {scope_label} service draining active work...")
@@ -1663,41 +1526,48 @@ def systemd_restart(system: bool = False):
        else:
            print(f"⚠ Old process (PID {pid}) still alive after 90s")

-        # The gateway exits with code 75 for a planned service restart.
-        # systemd can sit in the RestartSec window or even wedge itself into a
-        # failed/rate-limited state if the operator asks for another restart in
-        # the middle of that handoff. Clear any stale failed state and kick the
-        # unit immediately so `hermes gateway restart` behaves idempotently.
-        _run_systemctl(
-            ["reset-failed", svc],
-            system=system,
-            check=False,
-            timeout=30,
-        )
-        _run_systemctl(
-            ["start", svc],
-            system=system,
-            check=False,
-            timeout=90,
-        )
-        _wait_for_systemd_service_restart(system=system, previous_pid=pid)
-        return
+        # Phase 2: wait for systemd to start the new process
+        print(f"⏳ Waiting for {svc} to restart...")
+        deadline = time.time() + 60
+        while time.time() < deadline:
+            try:
+                result = subprocess.run(
+                    scope_cmd + ["is-active", svc],
+                    capture_output=True, text=True, timeout=5,
+                )
+                if result.stdout.strip() == "active":
+                    # Confirm the active unit is a new process, not the old PID.
+                    new_pid = get_running_pid()
+                    if new_pid and new_pid != pid:
+                        print(f"✓ {scope_label} service restarted (PID {new_pid})")
+                        return
+            except (subprocess.TimeoutExpired, FileNotFoundError):
+                pass
+            time.sleep(2)

-    if _recover_pending_systemd_restart(system=system, previous_pid=pid):
+        # Timed out — check final state
+        try:
+            result = subprocess.run(
+                scope_cmd + ["is-active", svc],
+                capture_output=True, text=True, timeout=5,
+            )
+            if result.stdout.strip() == "active":
+                print(f"✓ {scope_label} service restarted")
+                return
+        except Exception:
+            pass
+        print(
+            f"⚠ {scope_label} service did not become active within 60s.\n"
+            f"  Check status: {'sudo ' if system else ''}hermes gateway status\n"
+            f"  Check logs:   journalctl {'--user ' if not system else ''}-u {svc} --since '2 min ago'"
+        )
        return
-
-    _run_systemctl(
-        ["reset-failed", get_service_name()],
-        system=system,
-        check=False,
-        timeout=30,
-    )
    _run_systemctl(["reload-or-restart", get_service_name()], system=system, check=True, timeout=90)
    print(f"✓ {_service_scope_label(system).capitalize()} service restarted")



-def systemd_status(deep: bool = False, system: bool = False, full: bool = False):
+def systemd_status(deep: bool = False, system: bool = False):
    system = _select_systemd_scope(system)
    unit_path = get_systemd_unit_path(system=system)
    scope_flag = " --system" if system else ""
@@ -1720,12 +1590,8 @@ def systemd_status(deep: bool = False, system: bool = False, full: bool = False)
        print(f"  Run: {'sudo ' if system else ''}hermes gateway restart{scope_flag}  # auto-refreshes the unit")
        print()

-    status_cmd = ["status", get_service_name(), "--no-pager"]
-    if full:
-        status_cmd.append("-l")
-
    _run_systemctl(
-        status_cmd,
+        ["status", get_service_name(), "--no-pager"],
        system=system,
        capture_output=False,
        timeout=10,
@@ -1758,19 +1624,6 @@ def systemd_status(deep: bool = False, system: bool = False, full: bool = False)
        for line in runtime_lines:
            print(f"  {line}")

-    unit_props = _read_systemd_unit_properties(system=system)
-    active_state = unit_props.get("ActiveState", "")
-    sub_state = unit_props.get("SubState", "")
-    exec_main_status = unit_props.get("ExecMainStatus", "")
-    result_code = unit_props.get("Result", "")
-    if active_state == "activating" and sub_state == "auto-restart":
-        print("  ⏳ Restart pending: systemd is waiting to relaunch the gateway")
-    elif active_state == "failed" and exec_main_status == str(GATEWAY_SERVICE_RESTART_EXIT_CODE):
-        print("  ⚠ Planned restart is stuck in systemd failed state (exit 75)")
-        print(f"  Run: systemctl {'--user ' if not system else ''}reset-failed {get_service_name()} && {'sudo ' if system else ''}hermes gateway start{scope_flag}")
-    elif active_state == "failed" and result_code:
-        print(f"  ⚠ Systemd unit result: {result_code}")
-
    if system:
        print("✓ System service starts at boot without requiring systemd linger")
    elif deep:
@@ -1786,10 +1639,7 @@ def systemd_status(deep: bool = False, system: bool = False, full: bool = False)
    if deep:
        print()
        print("Recent logs:")
-        log_cmd = _journalctl_cmd(system) + ["-u", get_service_name(), "-n", "20", "--no-pager"]
-        if full:
-            log_cmd.append("-l")
-        subprocess.run(log_cmd, timeout=10)
+        subprocess.run(_journalctl_cmd(system) + ["-u", get_service_name(), "-n", "20", "--no-pager"], timeout=10)


 # =============================================================================
@@ -3912,13 +3762,12 @@ def gateway_command(args):
    
    elif subcmd == "status":
        deep = getattr(args, 'deep', False)
-        full = getattr(args, 'full', False)
        system = getattr(args, 'system', False)
        snapshot = get_gateway_runtime_snapshot(system=system)
        
        # Check for service first
        if supports_systemd_services() and (get_systemd_unit_path(system=False).exists() or get_systemd_unit_path(system=True).exists()):
-            systemd_status(deep, system=system, full=full)
+            systemd_status(deep, system=system)
            _print_gateway_process_mismatch(snapshot)
        elif is_macos() and get_launchd_plist_path().exists():
            launchd_status(deep)
@@ -1570,6 +1570,8 @@ def select_provider_and_model(args=None):
        _model_flow_stepfun(config, current_model)
    elif selected_provider == "bedrock":
        _model_flow_bedrock(config, current_model)
+    elif selected_provider in ("volcengine", "byteplus"):
+        _model_flow_contract_provider(config, selected_provider, current_model)
    elif selected_provider in (
        "gemini",
        "deepseek",
@@ -1954,7 +1956,7 @@ def _aux_flow_custom_endpoint(task: str, task_cfg: dict) -> None:
    print(f"{display_name}: custom ({short_url})" + (f" · {model}" if model else ""))


-def _prompt_provider_choice(choices, *, default=0):
+def _prompt_provider_choice(choices, *, default=0, title="Select provider:"):
    """Show provider selection menu with curses arrow-key navigation.

    Falls back to a numbered list when curses is unavailable (e.g. piped
@@ -1963,8 +1965,7 @@ def _prompt_provider_choice(choices, *, default=0):
    """
    try:
        from hermes_cli.setup import _curses_prompt_choice
-
-        idx = _curses_prompt_choice("Select provider:", choices, default)
+        idx = _curses_prompt_choice(title, choices, default)
        if idx >= 0:
            print()
            return idx
@@ -1972,7 +1973,7 @@ def _prompt_provider_choice(choices, *, default=0):
        pass

    # Fallback: numbered list
-    print("Select provider:")
+    print(title)
    for i, c in enumerate(choices, 1):
        marker = "→" if i - 1 == default else " "
        print(f"  {marker} {i}. {c}")
@@ -2944,6 +2945,10 @@ def _model_flow_named_custom(config, provider_info):

 # Curated model lists for direct API-key providers — single source in models.py
 from hermes_cli.models import _PROVIDER_MODELS
+from hermes_cli.volcengine_byteplus import (
+    base_url_for_provider_model,
+    provider_models,
+)


 def _current_reasoning_effort(config) -> str:
@@ -4033,6 +4038,70 @@ def _model_flow_api_key_provider(config, provider_id, current_model=""):
        print("No change.")


+def _model_flow_contract_provider(config, provider_id, current_model=""):
+    """Provider flow for Volcengine / BytePlus contract-backed catalogs."""
+    from hermes_cli.auth import (
+        PROVIDER_REGISTRY,
+        _prompt_model_selection,
+        _save_model_choice,
+        deactivate_provider,
+    )
+    from hermes_cli.config import get_env_value, load_config, save_config, save_env_value
+
+    pconfig = PROVIDER_REGISTRY[provider_id]
+    key_env = pconfig.api_key_env_vars[0] if pconfig.api_key_env_vars else ""
+    existing_key = ""
+    for env_var in pconfig.api_key_env_vars:
+        existing_key = get_env_value(env_var) or os.getenv(env_var, "")
+        if existing_key:
+            break
+
+    if not existing_key:
+        print(f"No {pconfig.name} API key configured.")
+        if key_env:
+            try:
+                import getpass
+
+                new_key = getpass.getpass(f"{key_env} (or Enter to cancel): ").strip()
+            except (KeyboardInterrupt, EOFError):
+                print()
+                return
+            if not new_key:
+                print("Cancelled.")
+                return
+            save_env_value(key_env, new_key)
+            print("API key saved.")
+            print()
+    else:
+        print(f"  {pconfig.name} API key: {existing_key[:8]}... ✓")
+        print()
+
+    model_list = provider_models(provider_id)
+    if not model_list:
+        print(f"No curated model catalog found for {pconfig.name}.")
+        return
+
+    selected = _prompt_model_selection(model_list, current_model=current_model)
+    if not selected:
+        print("No change.")
+        return
+
+    _save_model_choice(selected)
+
+    cfg = load_config()
+    model = cfg.get("model")
+    if not isinstance(model, dict):
+        model = {"default": model} if model else {}
+        cfg["model"] = model
+    model["provider"] = provider_id
+    model["base_url"] = base_url_for_provider_model(provider_id, selected)
+    model.pop("api_mode", None)
+    save_config(cfg)
+    deactivate_provider()
+
+    print(f"Default model set to: {selected} (via {pconfig.name})")
+
+
 def _run_anthropic_oauth_flow(save_env_value):
    """Run the Claude OAuth setup-token flow. Returns True if credentials were saved."""
    from agent.anthropic_adapter import (
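Note on `_model_flow_contract_provider` above: the last step rewrites the `model` section of the config, which may be a bare string in older configs. The following is a minimal sketch of that normalization only. `apply_provider_choice` is a hypothetical name, and the sketch skips the key prompt, `_save_model_choice`, and `deactivate_provider` calls that the real flow performs.

```python
def apply_provider_choice(cfg: dict, provider_id: str, base_url: str) -> dict:
    """Mirror the config update at the end of the flow (illustrative only)."""
    model = cfg.get("model")
    if not isinstance(model, dict):
        # A bare string (or missing value) is promoted to a dict first.
        model = {"default": model} if model else {}
        cfg["model"] = model
    model["provider"] = provider_id
    model["base_url"] = base_url
    # The diff removes api_mode unconditionally; presumably so a value left
    # over from a previous provider does not carry over (assumption).
    model.pop("api_mode", None)
    return cfg


if __name__ == "__main__":
    print(apply_provider_choice({"model": "kimi-k2.5"}, "volcengine", "https://example.invalid/v3"))
```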
@@ -6888,12 +6957,6 @@ For more help on a command:
    # gateway status
    gateway_status = gateway_subparsers.add_parser("status", help="Show gateway status")
    gateway_status.add_argument("--deep", action="store_true", help="Deep status check")
-    gateway_status.add_argument(
-        "-l",
-        "--full",
-        action="store_true",
-        help="Show full, untruncated service/log output where supported",
-    )
    gateway_status.add_argument(
        "--system",
        action="store_true",
@@ -97,6 +97,8 @@ _MATCHING_PREFIX_STRIP_PROVIDERS: frozenset[str] = frozenset({
    "xiaomi",
    "arcee",
    "ollama-cloud",
+    "volcengine",
+    "byteplus",
    "custom",
 })

@@ -423,4 +425,3 @@ def normalize_model_for_provider(model_input: str, target_provider: str) -> str:
 # ---------------------------------------------------------------------------
 # Batch / convenience helpers
 # ---------------------------------------------------------------------------
-
@@ -810,10 +810,7 @@ def list_authenticated_providers(
        get_provider_info as _mdev_pinfo,
    )
    from hermes_cli.auth import PROVIDER_REGISTRY
-    from hermes_cli.models import (
-        OPENROUTER_MODELS, _PROVIDER_MODELS,
-        _MODELS_DEV_PREFERRED, _merge_with_models_dev,
-    )
+    from hermes_cli.models import OPENROUTER_MODELS, _PROVIDER_MODELS

    results: List[dict] = []
    seen_slugs: set = set()  # lowercase-normalized to catch case variants (#9545)
@@ -859,13 +856,8 @@ def list_authenticated_providers(
        if not has_creds:
            continue

-        # Use curated list, falling back to models.dev if no curated list.
-        # For preferred providers, merge models.dev entries into the curated
-        # catalog so newly released models (e.g. mimo-v2.5-pro on opencode-go)
-        # show up in the picker without requiring a Hermes release.
+        # Use curated list, falling back to models.dev if no curated list
        model_ids = curated.get(hermes_id, [])
-        if hermes_id in _MODELS_DEV_PREFERRED:
-            model_ids = _merge_with_models_dev(hermes_id, model_ids)
        total = len(model_ids)
        top = model_ids[:max_models]

@@ -969,9 +961,6 @@ def list_authenticated_providers(

        # Use curated list — look up by Hermes slug, fall back to overlay key
        model_ids = curated.get(hermes_slug, []) or curated.get(pid, [])
-        # Merge with models.dev for preferred providers (same rationale as above).
-        if hermes_slug in _MODELS_DEV_PREFERRED:
-            model_ids = _merge_with_models_dev(hermes_slug, model_ids)
        total = len(model_ids)
        top = model_ids[:max_models]

@@ -22,6 +22,12 @@ from hermes_cli import __version__ as _HERMES_VERSION
 # Check (error 1010) don't reject the default ``Python-urllib/*`` signature.
 _HERMES_USER_AGENT = f"hermes-cli/{_HERMES_VERSION}"

+from hermes_cli.volcengine_byteplus import (
+    BYTEPLUS_PROVIDER,
+    VOLCENGINE_PROVIDER,
+    provider_models,
+)
+
 COPILOT_BASE_URL = "https://api.githubcopilot.com"
 COPILOT_MODELS_URL = f"{COPILOT_BASE_URL}/models"
 COPILOT_EDITOR_VERSION = "vscode/1.104.1"
@@ -42,8 +48,7 @@ OPENROUTER_MODELS: list[tuple[str, str]] = [
    ("openrouter/elephant-alpha",       "free"),
    ("openai/gpt-5.4",                  ""),
    ("openai/gpt-5.4-mini",             ""),
-    ("xiaomi/mimo-v2.5-pro",             ""),
-    ("xiaomi/mimo-v2.5",                 ""),
+    ("xiaomi/mimo-v2-pro",               ""),
    ("openai/gpt-5.3-codex",            ""),
    ("google/gemini-3-pro-image-preview", ""),
    ("google/gemini-3-flash-preview",   ""),
@@ -109,8 +114,7 @@ def _codex_curated_models() -> list[str]:
 _PROVIDER_MODELS: dict[str, list[str]] = {
    "nous": [
        "moonshotai/kimi-k2.6",
-        "xiaomi/mimo-v2.5-pro",
-        "xiaomi/mimo-v2.5",
+        "xiaomi/mimo-v2-pro",
        "anthropic/claude-opus-4.7",
        "anthropic/claude-opus-4.6",
        "anthropic/claude-sonnet-4.6",
@@ -358,6 +362,8 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
        "us.meta.llama4-maverick-17b-instruct-v1:0",
        "us.meta.llama4-scout-17b-instruct-v1:0",
    ],
+    VOLCENGINE_PROVIDER: provider_models(VOLCENGINE_PROVIDER),
+    BYTEPLUS_PROVIDER: provider_models(BYTEPLUS_PROVIDER),
 }

 # Vercel AI Gateway: derive the bare-model-id catalog from the curated
@@ -692,6 +698,8 @@ CANONICAL_PROVIDERS: list[ProviderEntry] = [
    ProviderEntry("ai-gateway",     "Vercel AI Gateway",        "Vercel AI Gateway (200+ models, $5 free credit, no markup)"),
    ProviderEntry("anthropic",      "Anthropic",                "Anthropic (Claude models — API key or Claude Code)"),
    ProviderEntry("openai-codex",   "OpenAI Codex",             "OpenAI Codex"),
+    ProviderEntry(VOLCENGINE_PROVIDER, "Volcengine",            "Volcengine (standard + Coding Plan catalogs)"),
+    ProviderEntry(BYTEPLUS_PROVIDER, "BytePlus",                "BytePlus (standard + Coding Plan catalogs)"),
    ProviderEntry("xiaomi",         "Xiaomi MiMo",              "Xiaomi MiMo (MiMo-V2 models — pro, omni, flash)"),
    ProviderEntry("nvidia",         "NVIDIA NIM",               "NVIDIA NIM (Nemotron models — build.nvidia.com or local NIM)"),
    ProviderEntry("qwen-oauth",     "Qwen OAuth (Portal)",      "Qwen OAuth (reuses local Qwen CLI login)"),
@@ -721,7 +729,6 @@ CANONICAL_PROVIDERS: list[ProviderEntry] = [
 _PROVIDER_LABELS = {p.slug: p.label for p in CANONICAL_PROVIDERS}
 _PROVIDER_LABELS["custom"] = "Custom endpoint"  # special case: not a named provider

-
 _PROVIDER_ALIASES = {
    "glm": "zai",
    "z-ai": "zai",
@@ -784,6 +791,10 @@ _PROVIDER_ALIASES = {
    "nemotron": "nvidia",
    "ollama": "custom",  # bare "ollama" = local; use "ollama-cloud" for cloud
    "ollama_cloud": "ollama-cloud",
+    "volcengine-coding-plan": VOLCENGINE_PROVIDER,
+    "volcengine_coding_plan": VOLCENGINE_PROVIDER,
+    "byteplus-coding-plan": BYTEPLUS_PROVIDER,
+    "byteplus_coding_plan": BYTEPLUS_PROVIDER,
 }


@@ -1244,7 +1255,6 @@ def list_available_providers() -> list[dict[str, str]]:
    """
    # Derive display order from canonical list + custom
    provider_order = [p.slug for p in CANONICAL_PROVIDERS] + ["custom"]
-
    # Build reverse alias map
    aliases_for: dict[str, list[str]] = {}
    for alias, canonical in _PROVIDER_ALIASES.items():
@@ -1260,7 +1270,7 @@ def list_available_providers() -> list[dict[str, str]]:
            from hermes_cli.auth import get_auth_status, has_usable_secret
            if pid == "custom":
                custom_base_url = _get_custom_base_url() or ""
-                has_creds = bool(custom_base_url.strip())
+                has_creds = bool(custom_base_url.strip()) and provider_for_base_url(custom_base_url) is None
            elif pid == "openrouter":
                has_creds = has_usable_secret(os.getenv("OPENROUTER_API_KEY", ""))
            else:
@@ -1326,6 +1336,29 @@ def _get_custom_base_url() -> str:
    return ""


+def provider_for_base_url(base_url: str) -> Optional[str]:
+    """Return a known built-in provider for a configured base URL, if any.
+
+    Uses the canonical _URL_TO_PROVIDER mapping from model_metadata; hosts
+    missing from that mapping (and OpenRouter URLs) return None.
+    """
+    normalized = str(base_url or "").strip().rstrip("/")
+    if not normalized or "openrouter.ai" in normalized.lower():
+        return None
+
+    url_lower = normalized.lower()
+
+    # Primary source — shared with context-length resolution
+    from agent.model_metadata import _URL_TO_PROVIDER
+
+    for host, provider_id in _URL_TO_PROVIDER.items():
+        if host in url_lower:
+            canonical = normalize_provider(provider_id)
+            if canonical in _PROVIDER_LABELS and canonical != "custom":
+                return canonical
+    return None
+
+
 def curated_models_for_provider(
    provider: Optional[str],
    *,
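Note on `provider_for_base_url` above: the lookup is a substring match of each known host against the lowercased URL. The sketch below shows the same idea with a small stand-in table. `URL_TO_PROVIDER` and `KNOWN_PROVIDERS` here are hypothetical replacements for `_URL_TO_PROVIDER` and `_PROVIDER_LABELS`, their entries are assumptions built from the base URLs in this commit, and the `normalize_provider` step is omitted.

```python
from typing import Optional

# Hypothetical stand-in for agent.model_metadata._URL_TO_PROVIDER.
URL_TO_PROVIDER = {
    "ark.cn-beijing.volces.com": "volcengine",
    "ark.ap-southeast.bytepluses.com": "byteplus",
}
# Hypothetical stand-in for the slugs present in _PROVIDER_LABELS.
KNOWN_PROVIDERS = {"volcengine", "byteplus"}


def lookup_provider(base_url: str) -> Optional[str]:
    """Return the built-in provider whose host appears in base_url, if any."""
    url_lower = str(base_url or "").strip().rstrip("/").lower()
    if not url_lower or "openrouter.ai" in url_lower:
        return None
    for host, provider_id in URL_TO_PROVIDER.items():
        if host in url_lower and provider_id in KNOWN_PROVIDERS:
            return provider_id
    return None


if __name__ == "__main__":
    print(lookup_provider("https://ark.cn-beijing.volces.com/api/coding/v3/"))
    print(lookup_provider("http://localhost:11434/v1"))
```

In `list_available_providers`, the `custom` entry now counts as configured only when this kind of lookup returns `None`, so a custom base URL that points at a built-in provider is not reported as a custom endpoint.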
@@ -1589,84 +1622,11 @@ def _resolve_copilot_catalog_api_key() -> str:
        return ""


-# Providers where models.dev is treated as authoritative: curated static
-# lists are kept only as an offline fallback and to capture custom additions
-# the registry doesn't publish yet. Adding a provider here causes its
-# curated list to be merged with fresh models.dev entries (fresh first, any
-# curated-only names appended) for both the CLI and the gateway /model picker.
-#
-# DELIBERATELY EXCLUDED:
-#   - "openrouter": curated list is already a hand-picked agentic subset of
-#     OpenRouter's 400+ catalog. Blindly merging would dump everything.
-#   - "nous": curated list and Portal /models endpoint are the source of
-#     truth for the subscription tier.
-# Also excluded: providers that already have dedicated live-endpoint
-# branches below (copilot, anthropic, ai-gateway, ollama-cloud, custom,
-# stepfun, openai-codex) — those paths handle freshness themselves.
-_MODELS_DEV_PREFERRED: frozenset[str] = frozenset({
-    "opencode-go",
-    "opencode-zen",
-    "deepseek",
-    "kilocode",
-    "fireworks",
-    "mistral",
-    "togetherai",
-    "cohere",
-    "perplexity",
-    "groq",
-    "nvidia",
-    "huggingface",
-    "zai",
-    "gemini",
-    "google",
-})
-
-
-def _merge_with_models_dev(provider: str, curated: list[str]) -> list[str]:
-    """Merge curated list with fresh models.dev entries for a preferred provider.
-
-    Returns models.dev entries first (in models.dev order), then any
-    curated-only entries appended. Preserves case for curated fallbacks
-    (e.g. ``MiniMax-M2.7``) while trusting models.dev for newer variants.
-
-    If models.dev is unreachable or returns nothing, the curated list is
-    returned unchanged — this is the offline/CI fallback path.
-    """
-    try:
-        from agent.models_dev import list_agentic_models
-        mdev = list_agentic_models(provider)
-    except Exception:
-        mdev = []
-
-    if not mdev:
-        return list(curated)
-
-    # Case-insensitive dedup while preserving order and curated casing.
-    seen_lower: set[str] = set()
-    merged: list[str] = []
-    for mid in mdev:
-        key = str(mid).lower()
-        if key in seen_lower:
-            continue
-        seen_lower.add(key)
-        merged.append(mid)
-    for mid in curated:
-        key = str(mid).lower()
-        if key in seen_lower:
-            continue
-        seen_lower.add(key)
-        merged.append(mid)
-    return merged
-
-
 def provider_model_ids(provider: Optional[str], *, force_refresh: bool = False) -> list[str]:
    """Return the best known model catalog for a provider.

    Tries live API endpoints for providers that support them (Codex, Nous),
-    falling back to static lists. For providers in ``_MODELS_DEV_PREFERRED``
-    (opencode-go/zen, xiaomi, deepseek, smaller inference providers, etc.),
-    models.dev entries are merged on top of curated so new models released
-    on the platform appear in ``/model`` without a Hermes release.
+    falling back to static lists.
    """
    normalized = normalize_provider(provider)
    if normalized == "openrouter":
@@ -1732,10 +1692,7 @@ def provider_model_ids(provider: Optional[str], *, force_refresh: bool = False)
            live = fetch_api_models(api_key, base_url)
            if live:
                return live
-    curated_static = list(_PROVIDER_MODELS.get(normalized, []))
-    if normalized in _MODELS_DEV_PREFERRED:
-        return _merge_with_models_dev(normalized, curated_static)
-    return curated_static
+    return list(_PROVIDER_MODELS.get(normalized, []))


 def _fetch_anthropic_models(timeout: float = 5.0) -> Optional[list[str]]:
@@ -283,7 +283,6 @@ class PluginContext:
        name: str,
        handler: Callable,
        description: str = "",
-        args_hint: str = "",
    ) -> None:
        """Register a slash command (e.g. ``/lcm``) available in CLI and gateway sessions.

@@ -294,13 +293,6 @@ class PluginContext:
        terminal commands), this registers in-session slash commands that users
        invoke during a conversation.

-        ``args_hint`` is an optional short string (e.g. ``"<file>"`` or
-        ``"dias:7 formato:json"``) used by gateway adapters to surface the
-        command with an argument field — for example Discord's native slash
-        command picker. Plugin commands without ``args_hint`` register as
-        parameterless in Discord and still accept trailing text when invoked
-        as free-form chat.
-
        Names conflicting with built-in commands are rejected with a warning.
        """
        clean = name.lower().strip().lstrip("/").replace(" ", "-")
@@ -328,7 +320,6 @@ class PluginContext:
            "handler": handler,
            "description": description or "Plugin command",
            "plugin": self.manifest.name,
-            "args_hint": (args_hint or "").strip(),
        }
        logger.debug("Plugin %s registered command: /%s", self.manifest.name, clean)

@@ -23,6 +23,12 @@ import logging
 from dataclasses import dataclass
 from typing import Any, Dict, List, Optional, Tuple

+from hermes_cli.volcengine_byteplus import (
+    BYTEPLUS_PROVIDER,
+    BYTEPLUS_STANDARD_BASE_URL,
+    VOLCENGINE_PROVIDER,
+    VOLCENGINE_STANDARD_BASE_URL,
+)
 from utils import base_url_host_matches, base_url_hostname

 logger = logging.getLogger(__name__)
@@ -163,6 +169,16 @@ HERMES_OVERLAYS: Dict[str, HermesOverlay] = {
        transport="openai_chat",
        base_url_env_var="OLLAMA_BASE_URL",
    ),
+    VOLCENGINE_PROVIDER: HermesOverlay(
+        transport="openai_chat",
+        extra_env_vars=("VOLCENGINE_API_KEY",),
+        base_url_override=VOLCENGINE_STANDARD_BASE_URL,
+    ),
+    BYTEPLUS_PROVIDER: HermesOverlay(
+        transport="openai_chat",
+        extra_env_vars=("BYTEPLUS_API_KEY",),
+        base_url_override=BYTEPLUS_STANDARD_BASE_URL,
+    ),
 }


@@ -273,6 +289,10 @@ ALIASES: Dict[str, str] = {
    # xiaomi
    "mimo": "xiaomi",
    "xiaomi-mimo": "xiaomi",
+    "volcengine-coding-plan": VOLCENGINE_PROVIDER,
+    "volcengine_coding_plan": VOLCENGINE_PROVIDER,
+    "byteplus-coding-plan": BYTEPLUS_PROVIDER,
+    "byteplus_coding_plan": BYTEPLUS_PROVIDER,

    # bedrock
    "aws": "bedrock",
@@ -306,6 +326,8 @@ _LABEL_OVERRIDES: Dict[str, str] = {
    "copilot-acp": "GitHub Copilot ACP",
    "stepfun": "StepFun Step Plan",
    "xiaomi": "Xiaomi MiMo",
+    VOLCENGINE_PROVIDER: "Volcengine",
+    BYTEPLUS_PROVIDER: "BytePlus",
    "local": "Local endpoint",
    "bedrock": "AWS Bedrock",
    "ollama-cloud": "Ollama Cloud",
@@ -1,221 +0,0 @@
-"""PTY bridge for `hermes dashboard` chat tab.
-
-Wraps a child process behind a pseudo-terminal so its ANSI output can be
-streamed to a browser-side terminal emulator (xterm.js) and typed
-keystrokes can be fed back in.  The only caller today is the
-``/api/pty`` WebSocket endpoint in ``hermes_cli.web_server``.
-
-Design constraints:
-
-* **POSIX-only.**  Hermes Agent supports Windows exclusively via WSL, which
-  exposes a native POSIX PTY via ``openpty(3)``.  Native Windows Python
-  has no PTY; :class:`PtyUnavailableError` is raised with a user-readable
-  install/platform message so the dashboard can render a banner instead of
-  crashing.
-* **Zero Node dependency on the server side.**  We use :mod:`ptyprocess`,
-  which is a pure-Python wrapper around the OS calls.  The browser talks
-  to the same ``hermes --tui`` binary it would launch from the CLI, so
-  every TUI feature (slash popover, model picker, tool rows, markdown,
-  skin engine, clarify/sudo/approval prompts) ships automatically.
-* **Byte-safe I/O.**  Reads and writes go through the PTY master fd
-  directly — we avoid :class:`ptyprocess.PtyProcessUnicode` because
-  streaming ANSI is inherently byte-oriented and UTF-8 boundaries may land
-  mid-read.
-"""
-
-from __future__ import annotations
-
-import errno
-import fcntl
-import os
-import select
-import signal
-import struct
-import sys
-import termios
-import time
-from typing import Optional, Sequence
-
-try:
-    import ptyprocess  # type: ignore
-    _PTY_AVAILABLE = not sys.platform.startswith("win")
-except ImportError:  # pragma: no cover - dev env without ptyprocess
-    ptyprocess = None  # type: ignore
-    _PTY_AVAILABLE = False
-
-
-__all__ = ["PtyBridge", "PtyUnavailableError"]
-
-
-class PtyUnavailableError(RuntimeError):
-    """Raised when a PTY cannot be created on this platform.
-
-    Today this means native Windows (no ConPTY bindings) or a dev
-    environment missing the ``ptyprocess`` dependency.  The dashboard
-    surfaces the message to the user as a chat-tab banner.
-    """
-
-
-class PtyBridge:
-    """Thin wrapper around ``ptyprocess.PtyProcess`` for byte streaming.
-
-    Not thread-safe.  A single bridge is owned by the WebSocket handler
-    that spawned it; the reader runs in an executor thread while writes
-    happen on the event-loop thread.  Both sides are OK because the
-    kernel PTY is the actual synchronization point — we never call
-    :mod:`ptyprocess` methods concurrently, we only call ``os.read`` and
-    ``os.write`` on the master fd, which is safe.
-    """
-
-    def __init__(self, proc: "ptyprocess.PtyProcess"):  # type: ignore[name-defined]
-        self._proc = proc
-        self._fd: int = proc.fd
-        self._closed = False
-
-    # -- lifecycle --------------------------------------------------------
-
-    @classmethod
-    def is_available(cls) -> bool:
-        """True if a PTY can be spawned on this platform."""
-        return bool(_PTY_AVAILABLE)
-
-    @classmethod
-    def spawn(
-        cls,
-        argv: Sequence[str],
-        *,
-        cwd: Optional[str] = None,
-        env: Optional[dict] = None,
-        cols: int = 80,
-        rows: int = 24,
-    ) -> "PtyBridge":
-        """Spawn ``argv`` behind a new PTY and return a bridge.
-
-        Raises :class:`PtyUnavailableError` if the platform can't host a
-        PTY.  Raises :class:`FileNotFoundError` or :class:`OSError` for
-        ordinary exec failures (missing binary, bad cwd, etc.).
-        """
-        if not _PTY_AVAILABLE:
-            raise PtyUnavailableError(
-                "Pseudo-terminals are unavailable on this platform. "
-                "Hermes Agent supports Windows only via WSL."
-            )
-        # Let caller-supplied env fully override inheritance; if they pass
-        # None we inherit the server's env (same semantics as subprocess).
-        spawn_env = os.environ.copy() if env is None else env
-        proc = ptyprocess.PtyProcess.spawn(  # type: ignore[union-attr]
-            list(argv),
-            cwd=cwd,
-            env=spawn_env,
-            dimensions=(rows, cols),
-        )
-        return cls(proc)
-
-    @property
-    def pid(self) -> int:
-        return int(self._proc.pid)
-
-    def is_alive(self) -> bool:
-        if self._closed:
-            return False
-        try:
-            return bool(self._proc.isalive())
-        except Exception:
-            return False
-
-    # -- I/O --------------------------------------------------------------
-
-    def read(self, timeout: float = 0.2) -> Optional[bytes]:
-        """Read up to 64 KiB of raw bytes from the PTY master.
-
-        Returns:
-            * bytes — zero or more bytes of child output
-            * empty bytes (``b""``) — no data available within ``timeout``
-            * None — child has exited and the master fd is at EOF
-
-        Never blocks longer than ``timeout`` seconds.  Safe to call after
-        :meth:`close`; returns ``None`` in that case.
-        """
-        if self._closed:
-            return None
-        try:
-            readable, _, _ = select.select([self._fd], [], [], timeout)
-        except (OSError, ValueError):
-            return None
-        if not readable:
-            return b""
-        try:
-            data = os.read(self._fd, 65536)
-        except OSError as exc:
-            # EIO on Linux = slave side closed.  EBADF = already closed.
-            if exc.errno in (errno.EIO, errno.EBADF):
-                return None
-            raise
-        if not data:
-            return None
-        return data
-
-    def write(self, data: bytes) -> None:
-        """Write raw bytes to the PTY master (i.e. the child's stdin)."""
-        if self._closed or not data:
-            return
-        # os.write can return a short write under load; loop until drained.
-        view = memoryview(data)
-        while view:
-            try:
-                n = os.write(self._fd, view)
-            except OSError as exc:
-                if exc.errno in (errno.EIO, errno.EBADF, errno.EPIPE):
-                    return
-                raise
-            if n <= 0:
-                return
-            view = view[n:]
-
-    def resize(self, cols: int, rows: int) -> None:
-        """Forward a terminal resize to the child via ``TIOCSWINSZ``."""
-        if self._closed:
-            return
-        # struct winsize: rows, cols, xpixel, ypixel (all unsigned short)
-        winsize = struct.pack("HHHH", max(1, rows), max(1, cols), 0, 0)
-        try:
-            fcntl.ioctl(self._fd, termios.TIOCSWINSZ, winsize)
-        except OSError:
-            pass
-
-    # -- teardown ---------------------------------------------------------
-
-    def close(self) -> None:
-        """Terminate the child (SIGTERM → 0.5s grace → SIGKILL) and close fds.
-
-        Idempotent.  Reaping the child is important so we don't leak
-        zombies across the lifetime of the dashboard process.
-        """
-        if self._closed:
-            return
-        self._closed = True
-
-        # SIGHUP is the conventional "your terminal went away" signal.
-        # We escalate if the child ignores it.
-        for sig in (signal.SIGHUP, signal.SIGTERM, signal.SIGKILL):
-            if not self._proc.isalive():
-                break
-            try:
-                self._proc.kill(sig)
-            except Exception:
-                pass
-            deadline = time.monotonic() + 0.5
-            while self._proc.isalive() and time.monotonic() < deadline:
-                time.sleep(0.02)
-
-        try:
-            self._proc.close(force=True)
-        except Exception:
-            pass
-
-    # Context-manager sugar — handy in tests and ad-hoc scripts.
-    def __enter__(self) -> "PtyBridge":
-        return self
-
-    def __exit__(self, *_exc) -> None:
-        self.close()
@@ -643,7 +643,7 @@ def _resolve_explicit_runtime(

        base_url = explicit_base_url
        if not base_url:
-            if provider in ("kimi-coding", "kimi-coding-cn"):
+            if provider in ("kimi-coding", "kimi-coding-cn", "volcengine", "byteplus"):
                creds = resolve_api_key_provider_credentials(provider)
                base_url = creds.get("base_url", "").rstrip("/")
            else:
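Note on the hunk above: adding `volcengine` and `byteplus` to the kimi-style branch means the base URL comes from the provider's resolved credentials rather than a fixed value, which matters because each provider has a standard and a Coding Plan endpoint. The sketch below is one way such a model-dependent choice could look. It is an assumption, not the implementation of `base_url_for_provider_model` (which is not shown in this part of the diff); the URLs and model IDs are copied from `volcengine_byteplus.py`, and the provider-prefix stripping is a guess.

```python
STANDARD_BASE_URL = "https://ark.cn-beijing.volces.com/api/v3"
CODING_PLAN_BASE_URL = "https://ark.cn-beijing.volces.com/api/coding/v3"

# Copied from VOLCENGINE_CODING_PLAN_MODELS in this commit.
CODING_PLAN_MODELS = (
    "doubao-seed-2.0-code",
    "doubao-seed-2.0-pro",
    "doubao-seed-2.0-lite",
    "doubao-seed-code",
    "minimax-m2.5",
    "glm-4.7",
    "deepseek-v3.2",
    "kimi-k2.5",
)


def sketch_base_url_for_model(model_id: str) -> str:
    """Pick the Coding Plan endpoint for Coding Plan models (hypothetical)."""
    bare = (model_id or "").strip().lower()
    # Assumption: a "volcengine/<model>" reference is reduced to "<model>".
    if "/" in bare:
        bare = bare.split("/", 1)[1]
    if bare in CODING_PLAN_MODELS:
        return CODING_PLAN_BASE_URL
    return STANDARD_BASE_URL


if __name__ == "__main__":
    print(sketch_base_url_for_model("doubao-seed-2.0-code"))
    print(sketch_base_url_for_model("doubao-seed-2-0-pro-260215"))
```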
@@ -30,14 +30,6 @@ All fields are optional. Missing values inherit from the ``default`` skin.
      prompt: "#FFF8DC"                  # Prompt text color
      input_rule: "#CD7F32"              # Input area horizontal rule
      response_border: "#FFD700"         # Response box border (ANSI)
-      status_bar_bg: "#1a1a2e"           # Status bar background
-      status_bar_text: "#C0C0C0"         # Status bar default text
-      status_bar_strong: "#FFD700"       # Status bar highlighted text
-      status_bar_dim: "#8B8682"          # Status bar separators/muted text
-      status_bar_good: "#8FBC8F"         # Healthy context usage
-      status_bar_warn: "#FFD700"         # Warning context usage
-      status_bar_bad: "#FF8C00"          # High context usage
-      status_bar_critical: "#FF6B6B"     # Critical context usage
      session_label: "#DAA520"           # Session label color
      session_border: "#8B8682"          # Session ID dim color
      status_bar_bg: "#1a1a2e"          # TUI status/usage bar background
@@ -178,7 +170,6 @@ _BUILTIN_SKINS: Dict[str, Dict[str, Any]] = {
            "prompt": "#FFF8DC",
            "input_rule": "#CD7F32",
            "response_border": "#FFD700",
-            "status_bar_bg": "#1a1a2e",
            "session_label": "#DAA520",
            "session_border": "#8B8682",
        },
@@ -212,14 +203,6 @@ _BUILTIN_SKINS: Dict[str, Dict[str, Any]] = {
            "prompt": "#F1E6CF",
            "input_rule": "#9F1C1C",
            "response_border": "#C7A96B",
-            "status_bar_bg": "#2A1212",
-            "status_bar_text": "#F1E6CF",
-            "status_bar_strong": "#C7A96B",
-            "status_bar_dim": "#6E584B",
-            "status_bar_good": "#7BC96F",
-            "status_bar_warn": "#C7A96B",
-            "status_bar_bad": "#DD4A3A",
-            "status_bar_critical": "#EF5350",
            "session_label": "#C7A96B",
            "session_border": "#6E584B",
        },
@@ -284,14 +267,6 @@ _BUILTIN_SKINS: Dict[str, Dict[str, Any]] = {
            "prompt": "#c9d1d9",
            "input_rule": "#444444",
            "response_border": "#aaaaaa",
-            "status_bar_bg": "#1F1F1F",
-            "status_bar_text": "#C9D1D9",
-            "status_bar_strong": "#E6EDF3",
-            "status_bar_dim": "#777777",
-            "status_bar_good": "#B5B5B5",
-            "status_bar_warn": "#AAAAAA",
-            "status_bar_bad": "#D0D0D0",
-            "status_bar_critical": "#F0F0F0",
            "session_label": "#888888",
            "session_border": "#555555",
        },
@@ -323,14 +298,6 @@ _BUILTIN_SKINS: Dict[str, Dict[str, Any]] = {
            "prompt": "#c9d1d9",
            "input_rule": "#4169e1",
            "response_border": "#7eb8f6",
-            "status_bar_bg": "#151C2F",
-            "status_bar_text": "#C9D1D9",
-            "status_bar_strong": "#7EB8F6",
-            "status_bar_dim": "#4B5563",
-            "status_bar_good": "#63D0A6",
-            "status_bar_warn": "#E6A855",
-            "status_bar_bad": "#F7A072",
-            "status_bar_critical": "#FF7A7A",
            "session_label": "#7eb8f6",
            "session_border": "#4b5563",
        },
@@ -436,14 +403,6 @@ _BUILTIN_SKINS: Dict[str, Dict[str, Any]] = {
            "prompt": "#EAF7FF",
            "input_rule": "#2A6FB9",
            "response_border": "#5DB8F5",
-            "status_bar_bg": "#0F2440",
-            "status_bar_text": "#EAF7FF",
-            "status_bar_strong": "#A9DFFF",
-            "status_bar_dim": "#496884",
-            "status_bar_good": "#6ED7B0",
-            "status_bar_warn": "#5DB8F5",
-            "status_bar_bad": "#2A6FB9",
-            "status_bar_critical": "#D94F4F",
            "session_label": "#A9DFFF",
            "session_border": "#496884",
        },
@@ -508,14 +467,6 @@ _BUILTIN_SKINS: Dict[str, Dict[str, Any]] = {
            "prompt": "#F5F5F5",
            "input_rule": "#656565",
            "response_border": "#B7B7B7",
-            "status_bar_bg": "#202020",
-            "status_bar_text": "#D3D3D3",
-            "status_bar_strong": "#F5F5F5",
-            "status_bar_dim": "#656565",
-            "status_bar_good": "#B7B7B7",
-            "status_bar_warn": "#D3D3D3",
-            "status_bar_bad": "#E7E7E7",
-            "status_bar_critical": "#F5F5F5",
            "session_label": "#919191",
            "session_border": "#656565",
        },
@@ -581,14 +532,6 @@ _BUILTIN_SKINS: Dict[str, Dict[str, Any]] = {
            "prompt": "#FFF0D4",
            "input_rule": "#C75B1D",
            "response_border": "#F29C38",
-            "status_bar_bg": "#2B160E",
-            "status_bar_text": "#FFF0D4",
-            "status_bar_strong": "#FFD39A",
-            "status_bar_dim": "#6C4724",
-            "status_bar_good": "#6BCB77",
-            "status_bar_warn": "#F29C38",
-            "status_bar_bad": "#E2832B",
-            "status_bar_critical": "#EF5350",
            "session_label": "#FFD39A",
            "session_border": "#6C4724",
        },
@@ -827,13 +770,6 @@ def get_prompt_toolkit_style_overrides() -> Dict[str, str]:
    warn = skin.get_color("ui_warn", "#FF8C00")
    error = skin.get_color("ui_error", "#FF6B6B")
    status_bg = skin.get_color("status_bar_bg", "#1a1a2e")
-    status_text = skin.get_color("status_bar_text", text)
-    status_strong = skin.get_color("status_bar_strong", title)
-    status_dim = skin.get_color("status_bar_dim", dim)
-    status_good = skin.get_color("status_bar_good", skin.get_color("ui_ok", "#8FBC8F"))
-    status_warn = skin.get_color("status_bar_warn", warn)
-    status_bad = skin.get_color("status_bar_bad", skin.get_color("banner_accent", warn))
-    status_critical = skin.get_color("status_bar_critical", error)
    voice_bg = skin.get_color("voice_status_bg", status_bg)
    menu_bg = skin.get_color("completion_menu_bg", "#1a1a2e")
    menu_current_bg = skin.get_color("completion_menu_current_bg", "#333355")
@@ -846,13 +782,13 @@ def get_prompt_toolkit_style_overrides() -> Dict[str, str]:
        "prompt": prompt,
        "prompt-working": f"{dim} italic",
        "hint": f"{dim} italic",
-        "status-bar": f"bg:{status_bg} {status_text}",
-        "status-bar-strong": f"bg:{status_bg} {status_strong} bold",
-        "status-bar-dim": f"bg:{status_bg} {status_dim}",
-        "status-bar-good": f"bg:{status_bg} {status_good} bold",
-        "status-bar-warn": f"bg:{status_bg} {status_warn} bold",
-        "status-bar-bad": f"bg:{status_bg} {status_bad} bold",
-        "status-bar-critical": f"bg:{status_bg} {status_critical} bold",
+        "status-bar": f"bg:{status_bg} {text}",
+        "status-bar-strong": f"bg:{status_bg} {title} bold",
+        "status-bar-dim": f"bg:{status_bg} {dim}",
+        "status-bar-good": f"bg:{status_bg} {skin.get_color('ui_ok', '#8FBC8F')} bold",
+        "status-bar-warn": f"bg:{status_bg} {warn} bold",
+        "status-bar-bad": f"bg:{status_bg} {skin.get_color('banner_accent', warn)} bold",
+        "status-bar-critical": f"bg:{status_bg} {error} bold",
        "input-rule": input_rule,
        "image-badge": f"{label} bold",
        "completion-menu": f"bg:{menu_bg} {text}",
@@ -0,0 +1,134 @@
+"""Source-of-truth contracts for built-in providers without models.dev catalogs."""
+
+from __future__ import annotations
+
+from typing import Dict, List, Tuple
+
+VOLCENGINE_PROVIDER = "volcengine"
+BYTEPLUS_PROVIDER = "byteplus"
+
+VOLCENGINE_STANDARD_BASE_URL = "https://ark.cn-beijing.volces.com/api/v3"
+VOLCENGINE_CODING_PLAN_BASE_URL = "https://ark.cn-beijing.volces.com/api/coding/v3"
+BYTEPLUS_STANDARD_BASE_URL = "https://ark.ap-southeast.bytepluses.com/api/v3"
+BYTEPLUS_CODING_PLAN_BASE_URL = "https://ark.ap-southeast.bytepluses.com/api/coding/v3"
+
+VOLCENGINE_STANDARD_MODELS: Tuple[str, ...] = (
+    "doubao-seed-2-0-pro-260215",
+    "doubao-seed-2-0-lite-260215",
+    "doubao-seed-2-0-mini-260215",
+    "doubao-seed-2-0-code-preview-260215",
+    "kimi-k2-5-260127",
+    "glm-4-7-251222",
+    "deepseek-v3-2-251201",
+)
+
+VOLCENGINE_CODING_PLAN_MODELS: Tuple[str, ...] = (
+    "doubao-seed-2.0-code",
+    "doubao-seed-2.0-pro",
+    "doubao-seed-2.0-lite",
+    "doubao-seed-code",
+    "minimax-m2.5",
+    "glm-4.7",
+    "deepseek-v3.2",
+    "kimi-k2.5",
+)
+
+BYTEPLUS_STANDARD_MODELS: Tuple[str, ...] = (
+    "seed-2-0-pro-260328",
+    "seed-2-0-lite-260228",
+    "seed-2-0-mini-260215",
+    "kimi-k2-5-260127",
+    "glm-4-7-251222",
+)
+
+BYTEPLUS_CODING_PLAN_MODELS: Tuple[str, ...] = (
+    "dola-seed-2.0-pro",
+    "dola-seed-2.0-lite",
+    "bytedance-seed-code",
+    "glm-4.7",
+    "kimi-k2.5",
+    "gpt-oss-120b",
+)
+
+VOLCENGINE_STANDARD_MODEL_REFS: Tuple[str, ...] = tuple(
+    f"{VOLCENGINE_PROVIDER}/{model_id}" for model_id in VOLCENGINE_STANDARD_MODELS
+)
+VOLCENGINE_CODING_PLAN_MODEL_REFS: Tuple[str, ...] = tuple(
+    f"{VOLCENGINE_PROVIDER}-coding-plan/{model_id}" for model_id in VOLCENGINE_CODING_PLAN_MODELS
+)
+BYTEPLUS_STANDARD_MODEL_REFS: Tuple[str, ...] = tuple(
+    f"{BYTEPLUS_PROVIDER}/{model_id}" for model_id in BYTEPLUS_STANDARD_MODELS
+)
+BYTEPLUS_CODING_PLAN_MODEL_REFS: Tuple[str, ...] = tuple(
+    f"{BYTEPLUS_PROVIDER}-coding-plan/{model_id}" for model_id in BYTEPLUS_CODING_PLAN_MODELS
+)
+
+PROVIDER_MODEL_CATALOGS: Dict[str, Tuple[str, ...]] = {
+    VOLCENGINE_PROVIDER: VOLCENGINE_STANDARD_MODEL_REFS + VOLCENGINE_CODING_PLAN_MODEL_REFS,
+    BYTEPLUS_PROVIDER: BYTEPLUS_STANDARD_MODEL_REFS + BYTEPLUS_CODING_PLAN_MODEL_REFS,
+}
+
+MODEL_CONTEXT_WINDOWS: Dict[str, int] = {
+    "doubao-seed-2-0-pro-260215": 256000,
+    "doubao-seed-2-0-lite-260215": 256000,
+    "doubao-seed-2-0-mini-260215": 256000,
+    "doubao-seed-2-0-code-preview-260215": 256000,
+    "kimi-k2-5-260127": 256000,
+    "glm-4-7-251222": 200000,
+    "deepseek-v3-2-251201": 128000,
+    "doubao-seed-2.0-code": 256000,
+    "doubao-seed-2.0-pro": 256000,
+    "doubao-seed-2.0-lite": 256000,
+    "doubao-seed-code": 256000,
+    "minimax-m2.5": 200000,
+    "glm-4.7": 200000,
+    "deepseek-v3.2": 128000,
+    "kimi-k2.5": 256000,
+    "seed-2-0-pro-260328": 256000,
+    "seed-2-0-lite-260228": 256000,
+    "seed-2-0-mini-260215": 256000,
+}
+
+
+def provider_models(provider_id: str) -> List[str]:
+    """Return the full user-facing model catalog for a provider."""
+    return list(PROVIDER_MODEL_CATALOGS.get(provider_id, ()))
+
+
+def _bare_model_name(model_name: str) -> str:
+    value = (model_name or "").strip()
+    if not value:
+        return ""
+    if "/" in value:
+        return value.split("/", 1)[1].strip()
+    return value
+
+
+def is_coding_plan_model(provider_id: str, model_name: str) -> bool:
+    """Return True when a model belongs to the coding-plan catalog."""
+    raw = (model_name or "").strip()
+    bare = _bare_model_name(raw)
+    if provider_id == VOLCENGINE_PROVIDER:
+        return raw in VOLCENGINE_CODING_PLAN_MODEL_REFS or bare in VOLCENGINE_CODING_PLAN_MODELS
+    if provider_id == BYTEPLUS_PROVIDER:
+        return raw in BYTEPLUS_CODING_PLAN_MODEL_REFS or bare in BYTEPLUS_CODING_PLAN_MODELS
+    return False
+
+
+def base_url_for_provider_model(provider_id: str, model_name: str) -> str:
+    """Resolve the source-of-truth base URL for a provider+model pair."""
+    if provider_id == VOLCENGINE_PROVIDER:
+        if is_coding_plan_model(provider_id, model_name):
+            return VOLCENGINE_CODING_PLAN_BASE_URL
+        return VOLCENGINE_STANDARD_BASE_URL
+    if provider_id == BYTEPLUS_PROVIDER:
+        if is_coding_plan_model(provider_id, model_name):
+            return BYTEPLUS_CODING_PLAN_BASE_URL
+        return BYTEPLUS_STANDARD_BASE_URL
+    return ""
+
+
+def model_context_window(model_name: str) -> int | None:
+    """Return a known context window for a model, if specified by the contract."""
+    bare = _bare_model_name(model_name)
+    return MODEL_CONTEXT_WINDOWS.get(bare)
@@ -49,7 +49,7 @@ from hermes_cli.config import (
 from gateway.status import get_running_pid, read_runtime_status

 try:
-    from fastapi import FastAPI, HTTPException, Request, WebSocket, WebSocketDisconnect
+    from fastapi import FastAPI, HTTPException, Request
    from fastapi.middleware.cors import CORSMiddleware
    from fastapi.responses import FileResponse, HTMLResponse, JSONResponse
    from fastapi.staticfiles import StaticFiles
@@ -2242,148 +2242,6 @@ async def get_usage_analytics(days: int = 30):
        db.close()


-# ---------------------------------------------------------------------------
-# /api/pty — PTY-over-WebSocket bridge for the dashboard "Chat" tab.
-#
-# The endpoint spawns the same ``hermes --tui`` binary the CLI uses, behind
-# a POSIX pseudo-terminal, and forwards bytes + resize escapes across a
-# WebSocket.  The browser renders the ANSI through xterm.js (see
-# web/src/pages/ChatPage.tsx).
-#
-# Auth: ``?token=<session_token>`` query param (browsers can't set
-# Authorization on the WS upgrade).  Same ephemeral ``_SESSION_TOKEN`` as
-# REST.  Localhost-only — we defensively reject non-loopback clients even
-# though uvicorn binds to 127.0.0.1.
-# ---------------------------------------------------------------------------
-
-import re
-import asyncio
-
-from hermes_cli.pty_bridge import PtyBridge, PtyUnavailableError
-
-_RESIZE_RE = re.compile(rb"\x1b\[RESIZE:(\d+);(\d+)\]")
-_PTY_READ_CHUNK_TIMEOUT = 0.2
-# Starlette's TestClient reports the peer as "testclient"; treat it as
-# loopback so tests don't need to rewrite request scope.
-_LOOPBACK_HOSTS = frozenset({"127.0.0.1", "::1", "localhost", "testclient"})
-
-
-def _resolve_chat_argv(
-    resume: Optional[str] = None,
-) -> tuple[list[str], Optional[str], Optional[dict]]:
-    """Resolve the argv + cwd + env for the chat PTY.
-
-    Default: whatever ``hermes --tui`` would run.  Tests monkeypatch this
-    function to inject a tiny fake command (``cat``, ``sh -c 'printf …'``)
-    so nothing has to build Node or the TUI bundle.
-
-    Session resume is propagated via the ``HERMES_TUI_RESUME`` env var —
-    matching what ``hermes_cli.main._launch_tui`` does for the CLI path.
-    Appending ``--resume <id>`` to argv doesn't work because ``ui-tui`` does
-    not parse its argv.
-    """
-    from hermes_cli.main import PROJECT_ROOT, _make_tui_argv
-
-    argv, cwd = _make_tui_argv(PROJECT_ROOT / "ui-tui", tui_dev=False)
-    env: Optional[dict] = None
-    if resume:
-        env = os.environ.copy()
-        env["HERMES_TUI_RESUME"] = resume
-    return list(argv), str(cwd) if cwd else None, env
-
-
-@app.websocket("/api/pty")
-async def pty_ws(ws: WebSocket) -> None:
-    # --- auth + loopback check (before accept so we can close cleanly) ---
-    token = ws.query_params.get("token", "")
-    expected = _SESSION_TOKEN
-    if not hmac.compare_digest(token.encode(), expected.encode()):
-        await ws.close(code=4401)
-        return
-
-    client_host = ws.client.host if ws.client else ""
-    if client_host and client_host not in _LOOPBACK_HOSTS:
-        await ws.close(code=4403)
-        return
-
-    await ws.accept()
-
-    # --- spawn PTY ------------------------------------------------------
-    resume = ws.query_params.get("resume") or None
-    try:
-        argv, cwd, env = _resolve_chat_argv(resume=resume)
-    except SystemExit as exc:
-        # _make_tui_argv calls sys.exit(1) when node/npm is missing.
-        await ws.send_text(f"\r\n\x1b[31mChat unavailable: {exc}\x1b[0m\r\n")
-        await ws.close(code=1011)
-        return
-
-
-    try:
-        bridge = PtyBridge.spawn(argv, cwd=cwd, env=env)
-    except PtyUnavailableError as exc:
-        await ws.send_text(f"\r\n\x1b[31mChat unavailable: {exc}\x1b[0m\r\n")
-        await ws.close(code=1011)
-        return
-    except (FileNotFoundError, OSError) as exc:
-        await ws.send_text(f"\r\n\x1b[31mChat failed to start: {exc}\x1b[0m\r\n")
-        await ws.close(code=1011)
-        return
-
-    loop = asyncio.get_running_loop()
-
-    # --- reader task: PTY master → WebSocket ----------------------------
-    async def pump_pty_to_ws() -> None:
-        while True:
-            chunk = await loop.run_in_executor(
-                None, bridge.read, _PTY_READ_CHUNK_TIMEOUT
-            )
-            if chunk is None:  # EOF
-                return
-            if not chunk:  # no data this tick; yield control and retry
-                await asyncio.sleep(0)
-                continue
-            try:
-                await ws.send_bytes(chunk)
-            except Exception:
-                return
-
-    reader_task = asyncio.create_task(pump_pty_to_ws())
-
-    # --- writer loop: WebSocket → PTY master ----------------------------
-    try:
-        while True:
-            msg = await ws.receive()
-            msg_type = msg.get("type")
-            if msg_type == "websocket.disconnect":
-                break
-            raw = msg.get("bytes")
-            if raw is None:
-                text = msg.get("text")
-                raw = text.encode("utf-8") if isinstance(text, str) else b""
-            if not raw:
-                continue
-
-            # Resize escape is consumed locally, never written to the PTY.
-            match = _RESIZE_RE.match(raw)
-            if match and match.end() == len(raw):
-                cols = int(match.group(1))
-                rows = int(match.group(2))
-                bridge.resize(cols=cols, rows=rows)
-                continue
-
-            bridge.write(raw)
-    except WebSocketDisconnect:
-        pass
-    finally:
-        reader_task.cancel()
-        try:
-            await reader_task
-        except (asyncio.CancelledError, Exception):
-            pass
-        bridge.close()
-
-
 def mount_spa(application: FastAPI):
    """Mount the built SPA. Falls back to index.html for client-side routing.

@@ -108,15 +108,9 @@ def _run_async(coro):
    if loop and loop.is_running():
        # Inside an async context (gateway, RL env) — run in a fresh thread.
        import concurrent.futures
-        pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
-        future = pool.submit(asyncio.run, coro)
-        try:
+        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
+            future = pool.submit(asyncio.run, coro)
            return future.result(timeout=300)
-        except concurrent.futures.TimeoutError:
-            future.cancel()
-            raise
-        finally:
-            pool.shutdown(wait=False, cancel_futures=True)

    # If we're on a worker thread (e.g., parallel tool execution in
    # delegate_task), use a per-thread persistent loop.  This avoids
@@ -28,7 +28,7 @@

  let
    cfg = config.services.hermes-agent;
-    hermes-agent = inputs.self.packages.${pkgs.stdenv.hostPlatform.system}.default;
+    hermes-agent = inputs.self.packages.${pkgs.system}.default;

    # Deep-merge config type (from 0xrsydn/nix-hermes-agent)
    deepConfigType = lib.types.mkOptionType {
@@ -16,8 +16,8 @@
  },
  "homepage": "https://github.com/NousResearch/Hermes-Agent#readme",
  "dependencies": {
-    "@askjo/camofox-browser": "^1.5.2",
-    "agent-browser": "^0.26.0"
+    "agent-browser": "^0.13.0",
+    "@askjo/camofox-browser": "^1.5.2"
  },
  "overrides": {
    "lodash": "4.18.1"
@@ -76,6 +76,8 @@ from tools.interrupt import set_interrupt as _set_interrupt
 from tools.browser_tool import cleanup_browser


+from hermes_constants import OPENROUTER_BASE_URL
+
 # Agent internals extracted to agent/ package for modularity
 from agent.memory_manager import build_memory_context_block, sanitize_context
 from agent.retry_utils import jittered_backoff
@@ -96,11 +98,19 @@ from agent.model_metadata import (
 from agent.context_compressor import ContextCompressor
 from agent.subdirectory_hints import SubdirectoryHintTracker
 from agent.prompt_caching import apply_anthropic_cache_control
-from agent.prompt_builder import build_skills_system_prompt, build_context_files_prompt, build_environment_hints, load_soul_md, TOOL_USE_ENFORCEMENT_GUIDANCE, TOOL_USE_ENFORCEMENT_MODELS, GOOGLE_MODEL_OPERATIONAL_GUIDANCE, OPENAI_MODEL_EXECUTION_GUIDANCE
+from agent.prompt_builder import build_skills_system_prompt, build_context_files_prompt, build_environment_hints, load_soul_md, TOOL_USE_ENFORCEMENT_GUIDANCE, TOOL_USE_ENFORCEMENT_MODELS, DEVELOPER_ROLE_MODELS, GOOGLE_MODEL_OPERATIONAL_GUIDANCE, OPENAI_MODEL_EXECUTION_GUIDANCE
 from agent.usage_pricing import estimate_usage_cost, normalize_usage
 from agent.codex_responses_adapter import (
+    _chat_content_to_responses_parts,
+    _chat_messages_to_responses_input as _codex_chat_messages_to_responses_input,
    _derive_responses_function_call_id as _codex_derive_responses_function_call_id,
    _deterministic_call_id as _codex_deterministic_call_id,
+    _extract_responses_message_text as _codex_extract_responses_message_text,
+    _extract_responses_reasoning_text as _codex_extract_responses_reasoning_text,
+    _normalize_codex_response as _codex_normalize_codex_response,
+    _preflight_codex_api_kwargs as _codex_preflight_codex_api_kwargs,
+    _preflight_codex_input_items as _codex_preflight_codex_input_items,
+    _responses_tools as _codex_responses_tools,
    _split_responses_tool_id as _codex_split_responses_tool_id,
    _summarize_user_message_for_log,
 )
@@ -375,8 +385,9 @@ def _sanitize_surrogates(text: str) -> str:
    return text


-# _summarize_user_message_for_log is imported from agent.codex_responses_adapter
-# (see import block above). Remains importable from run_agent for backward compat.
+# _chat_content_to_responses_parts and _summarize_user_message_for_log are
+# imported from agent.codex_responses_adapter (see import block above).
+# They remain importable from run_agent for backward compatibility.


 def _sanitize_structure_surrogates(payload: Any) -> bool:
@@ -871,13 +882,6 @@ class AIAgent:
        else:
            self.api_mode = "chat_completions"

-        # Eagerly warm the transport cache so import errors surface at init,
-        # not mid-conversation.  Also validates the api_mode is registered.
-        try:
-            self._get_transport()
-        except Exception:
-            pass  # Non-fatal — transport may not exist for all modes yet
-
        try:
            from hermes_cli.model_normalize import (
                _AGGREGATOR_PROVIDERS,
@@ -913,10 +917,6 @@ class AIAgent:
            )
        ):
            self.api_mode = "codex_responses"
-            # Invalidate the eager-warmed transport cache — api_mode changed
-            # from chat_completions to codex_responses after the warm at __init__.
-            if hasattr(self, "_transport_cache"):
-                self._transport_cache.clear()

        # Pre-warm OpenRouter model metadata cache in a background thread.
        # fetch_model_metadata() is cached for 1 hour; this avoids a blocking
@@ -1923,9 +1923,6 @@ class AIAgent:
        self.provider = new_provider
        self.base_url = base_url or self.base_url
        self.api_mode = api_mode
-        # Invalidate transport cache — new api_mode may need a different transport
-        if hasattr(self, "_transport_cache"):
-            self._transport_cache.clear()
        if api_key:
            self.api_key = api_key

@@ -2526,20 +2523,6 @@ class AIAgent:
          4. Tag variants: ``<think>``, ``<thinking>``, ``<reasoning>``,
             ``<REASONING_SCRATCHPAD>``, ``<thought>`` (Gemma 4), all
             case-insensitive.
-
-        Additionally strips standalone tool-call XML blocks that some open
-        models (notably Gemma variants on OpenRouter) emit inside assistant
-        content instead of via the structured ``tool_calls`` field:
-          * ``<tool_call>…</tool_call>``
-          * ``<tool_calls>…</tool_calls>``
-          * ``<tool_result>…</tool_result>``
-          * ``<function_call>…</function_call>``
-          * ``<function_calls>…</function_calls>``
-          * ``<function name="…">…</function>`` (Gemma style)
-        Ported from openclaw/openclaw#67318. The ``<function>`` variant is
-        boundary-gated (only strips when the tag sits at start-of-line or
-        after punctuation and carries a ``name="..."`` attribute) so prose
-        mentions like "Use <function> in JavaScript" are preserved.
        """
        if not content:
            return ""
@@ -2551,30 +2534,6 @@ class AIAgent:
        content = re.sub(r'<reasoning>.*?</reasoning>', '', content, flags=re.DOTALL | re.IGNORECASE)
        content = re.sub(r'<REASONING_SCRATCHPAD>.*?</REASONING_SCRATCHPAD>', '', content, flags=re.DOTALL | re.IGNORECASE)
        content = re.sub(r'<thought>.*?</thought>', '', content, flags=re.DOTALL | re.IGNORECASE)
-        # 1b. Tool-call XML blocks (openclaw/openclaw#67318). Handle the
-        #     generic tag names first — they have no attribute gating since
-        #     a literal <tool_call> in prose is already vanishingly rare.
-        for _tc_name in ("tool_call", "tool_calls", "tool_result",
-                          "function_call", "function_calls"):
-            content = re.sub(
-                rf'<{_tc_name}\b[^>]*>.*?</{_tc_name}>',
-                '',
-                content,
-                flags=re.DOTALL | re.IGNORECASE,
-            )
-        # 1c. <function name="...">...</function> — Gemma-style standalone
-        #     tool call. Only strip when the tag sits at a block boundary
-        #     (start of text, after a newline, or after sentence-ending
-        #     punctuation) AND carries a name="..." attribute. This keeps
-        #     prose mentions like "Use <function> to declare" safe.
-        content = re.sub(
-            r'(?:(?<=^)|(?<=[\n\r.!?:]))[ \t]*'
-            r'<function\b[^>]*\bname\s*=[^>]*>'
-            r'(?:(?:(?!</function>).)*)</function>',
-            '',
-            content,
-            flags=re.DOTALL | re.IGNORECASE,
-        )
        # 2. Unterminated reasoning block — open tag at a block boundary
        #    (start of text, or after a newline) with no matching close.
        #    Strip from the tag to end of string.  Fixes #8878 / #9568
@@ -2592,16 +2551,6 @@ class AIAgent:
            content,
            flags=re.IGNORECASE,
        )
-        # 3b. Stray tool-call closers. (We do NOT strip bare <function> or
-        #     unterminated <function name="..."> because a truncated tail
-        #     during streaming may still be valuable to the user; matches
-        #     OpenClaw's intentional asymmetry.)
-        content = re.sub(
-            r'</(?:tool_call|tool_calls|tool_result|function_call|function_calls|function)>\s*',
-            '',
-            content,
-            flags=re.IGNORECASE,
-        )
        return content

    @staticmethod
@@ -4895,7 +4844,7 @@ class AIAgent:
        active_client = client or self._ensure_primary_openai_client(reason="codex_create_stream_fallback")
        fallback_kwargs = dict(api_kwargs)
        fallback_kwargs["stream"] = True
-        fallback_kwargs = self._get_transport().preflight_kwargs(fallback_kwargs, allow_stream=True)
+        fallback_kwargs = self._get_codex_transport().preflight_kwargs(fallback_kwargs, allow_stream=True)
        stream_or_response = active_client.responses.create(**fallback_kwargs)

        # Compatibility shim for mocks or providers that still return a concrete response.
@@ -5250,9 +5199,6 @@ class AIAgent:
                    result["response"] = self._anthropic_messages_create(api_kwargs)
                elif self.api_mode == "bedrock_converse":
                    # Bedrock uses boto3 directly — no OpenAI client needed.
-                    # normalize_converse_response produces an OpenAI-compatible
-                    # SimpleNamespace so the rest of the agent loop can treat
-                    # bedrock responses like chat_completions responses.
                    from agent.bedrock_adapter import (
                        _get_bedrock_runtime_client,
                        normalize_converse_response,
@@ -5880,6 +5826,16 @@ class AIAgent:
                            result["response"] = _call_chat_completions()
                        return  # success
                    except Exception as e:
+                        if deltas_were_sent["yes"]:
+                            # Streaming failed AFTER some tokens were already
+                            # delivered.  Don't retry or fall back — partial
+                            # content already reached the user.
+                            logger.warning(
+                                "Streaming failed after partial delivery, not retrying: %s", e
+                            )
+                            result["error"] = e
+                            return
+
                        _is_timeout = isinstance(
                            e, (_httpx.ReadTimeout, _httpx.ConnectTimeout, _httpx.PoolTimeout)
                        )
@@ -5887,123 +5843,6 @@ class AIAgent:
                            e, (_httpx.ConnectError, _httpx.RemoteProtocolError, ConnectionError)
                        )

-                        # If the stream died AFTER some tokens were delivered:
-                        # normally we don't retry (the user already saw text,
-                        # retrying would duplicate it).  BUT: if a tool call
-                        # was in-flight when the stream died, silently aborting
-                        # discards the tool call entirely.  In that case we
-                        # prefer to retry — the user sees a brief
-                        # "reconnecting" marker + duplicated preamble text,
-                        # which is strictly better than a failed action with
-                        # a "retry manually" message.  Limit this to transient
-                        # connection errors (Clawdbot-style narrow gate): no
-                        # tool has executed yet within this API call, so
-                        # silent retry is safe wrt side-effects.
-                        if deltas_were_sent["yes"]:
-                            _partial_tool_in_flight = bool(
-                                result.get("partial_tool_names")
-                            )
-                            _is_sse_conn_err_preview = False
-                            if not _is_timeout and not _is_conn_err:
-                                from openai import APIError as _APIError
-                                if isinstance(e, _APIError) and not getattr(e, "status_code", None):
-                                    _err_lower_preview = str(e).lower()
-                                    _SSE_PREVIEW_PHRASES = (
-                                        "connection lost",
-                                        "connection reset",
-                                        "connection closed",
-                                        "connection terminated",
-                                        "network error",
-                                        "network connection",
-                                        "terminated",
-                                        "peer closed",
-                                        "broken pipe",
-                                        "upstream connect error",
-                                    )
-                                    _is_sse_conn_err_preview = any(
-                                        phrase in _err_lower_preview
-                                        for phrase in _SSE_PREVIEW_PHRASES
-                                    )
-                            _is_transient = (
-                                _is_timeout or _is_conn_err or _is_sse_conn_err_preview
-                            )
-                            _can_silent_retry = (
-                                _partial_tool_in_flight
-                                and _is_transient
-                                and _stream_attempt < _max_stream_retries
-                            )
-                            if not _can_silent_retry:
-                                # Either no tool call was in-flight (so the
-                                # turn was a pure text response — current
-                                # stub-with-recovered-text behaviour is
-                                # correct), or retries are exhausted, or the
-                                # error isn't transient.  Fall through to the
-                                # stub path.
-                                logger.warning(
-                                    "Streaming failed after partial delivery, not retrying: %s", e
-                                )
-                                result["error"] = e
-                                return
-                            # Tool call was in-flight AND error is transient:
-                            # retry silently.  Clear per-attempt state so the
-                            # next stream starts clean.  Fire a "reconnecting"
-                            # marker so the user sees why the preamble is
-                            # about to be re-streamed.
-                            logger.info(
-                                "Streaming attempt %s/%s died mid tool-call "
-                                "(%s: %s) after user-visible text; retrying "
-                                "silently to avoid losing the action. "
-                                "Preamble will re-stream.",
-                                _stream_attempt + 1,
-                                _max_stream_retries + 1,
-                                type(e).__name__,
-                                e,
-                            )
-                            try:
-                                self._fire_stream_delta(
-                                    "\n\n⚠ Connection dropped mid tool-call; "
-                                    "reconnecting…\n\n"
-                                )
-                            except Exception:
-                                pass
-                            # Reset the streamed-text buffer so the retry's
-                            # fresh preamble doesn't get double-recorded in
-                            # _current_streamed_assistant_text (which would
-                            # pollute the interim-visible-text comparison).
-                            try:
-                                self._reset_stream_delivery_tracking()
-                            except Exception:
-                                pass
-                            # Reset in-memory accumulators so the next
-                            # attempt's chunks don't concat onto the dead
-                            # stream's partial JSON.
-                            result["partial_tool_names"] = []
-                            deltas_were_sent["yes"] = False
-                            first_delta_fired["done"] = False
-                            self._emit_status(
-                                f"⚠️ Connection dropped mid tool-call "
-                                f"({type(e).__name__}). Reconnecting… "
-                                f"(attempt {_stream_attempt + 2}/{_max_stream_retries + 1})"
-                            )
-                            self._touch_activity(
-                                f"stream retry {_stream_attempt + 2}/{_max_stream_retries + 1} "
-                                f"mid tool-call after {type(e).__name__}"
-                            )
-                            stale = request_client_holder.get("client")
-                            if stale is not None:
-                                self._close_request_openai_client(
-                                    stale, reason="stream_mid_tool_retry_cleanup"
-                                )
-                                request_client_holder["client"] = None
-                            try:
-                                self._replace_primary_openai_client(
-                                    reason="stream_mid_tool_retry_pool_cleanup"
-                                )
-                            except Exception:
-                                pass
-                            self._emit_status("🔄 Reconnected — resuming…")
-                            continue
-
                        # SSE error events from proxies (e.g. OpenRouter sends
                        # {"error":{"message":"Network connection lost."}}) are
                        # raised as APIError by the OpenAI SDK.  These are
@@ -6314,10 +6153,6 @@ class AIAgent:
            # falling through to OpenRouter defaults.
            fb_base_url_hint = (fb.get("base_url") or "").strip() or None
            fb_api_key_hint = (fb.get("api_key") or "").strip() or None
-            if not fb_api_key_hint:
-                fb_key_env = (fb.get("key_env") or "").strip()
-                if fb_key_env:
-                    fb_api_key_hint = os.getenv(fb_key_env, "").strip() or None
            # For Ollama Cloud endpoints, pull OLLAMA_API_KEY from env
            # when no explicit key is in the fallback config. Host match
            # (not substring) — see GHSA-76xc-57q6-vm5m.
@@ -6367,8 +6202,6 @@ class AIAgent:
            self.provider = fb_provider
            self.base_url = fb_base_url
            self.api_mode = fb_api_mode
-            if hasattr(self, "_transport_cache"):
-                self._transport_cache.clear()
            self._fallback_activated = True

            # Honor per-provider / per-model request_timeout_seconds for the
@@ -6480,8 +6313,6 @@ class AIAgent:
            self.provider = rt["provider"]
            self.base_url = rt["base_url"]           # setter updates _base_url_lower
            self.api_mode = rt["api_mode"]
-            if hasattr(self, "_transport_cache"):
-                self._transport_cache.clear()
            self.api_key = rt["api_key"]
            self._client_kwargs = dict(rt["client_kwargs"])
            self._use_prompt_caching = rt["use_prompt_caching"]
@@ -6588,8 +6419,6 @@ class AIAgent:
            self.provider = rt["provider"]
            self.base_url = rt["base_url"]
            self.api_mode = rt["api_mode"]
-            if hasattr(self, "_transport_cache"):
-                self._transport_cache.clear()
            self.api_key = rt["api_key"]

            if self.api_mode == "anthropic_messages":
@@ -6748,59 +6577,41 @@ class AIAgent:
            return suffix
        return "[A multimodal message was converted to text for Anthropic compatibility.]"

-    def _get_transport(self, api_mode: str = None):
-        """Return the cached transport for the given (or current) api_mode.
-
-        Lazy-initializes on first call per api_mode. Returns None if no
-        transport is registered for the mode.
-        """
-        mode = api_mode or self.api_mode
-        cache = getattr(self, "_transport_cache", None)
-        if cache is None:
-            cache = {}
-            self._transport_cache = cache
-        t = cache.get(mode)
+    def _get_anthropic_transport(self):
+        """Return the cached AnthropicTransport instance (lazy singleton)."""
+        t = getattr(self, "_anthropic_transport", None)
        if t is None:
            from agent.transports import get_transport
-            t = get_transport(mode)
-            cache[mode] = t
+            t = get_transport("anthropic_messages")
+            self._anthropic_transport = t
        return t

-    @staticmethod
-    def _nr_to_assistant_message(nr):
-        """Convert a NormalizedResponse to the SimpleNamespace shape downstream expects.
+    def _get_codex_transport(self):
+        """Return the cached ResponsesApiTransport instance (lazy singleton)."""
+        t = getattr(self, "_codex_transport", None)
+        if t is None:
+            from agent.transports import get_transport
+            t = get_transport("codex_responses")
+            self._codex_transport = t
+        return t

-        This is the single back-compat shim between the transport layer
-        (NormalizedResponse) and the agent loop (SimpleNamespace with
-        .content, .tool_calls, .reasoning, .reasoning_content,
-        .reasoning_details, .codex_reasoning_items, and per-tool-call
-        .call_id / .response_item_id).
+    def _get_chat_completions_transport(self):
+        """Return the cached ChatCompletionsTransport instance (lazy singleton)."""
+        t = getattr(self, "_chat_completions_transport", None)
+        if t is None:
+            from agent.transports import get_transport
+            t = get_transport("chat_completions")
+            self._chat_completions_transport = t
+        return t

-        TODO: Remove when downstream code reads NormalizedResponse directly.
-        """
-        tc_list = None
-        if nr.tool_calls:
-            tc_list = []
-            for tc in nr.tool_calls:
-                tc_ns = SimpleNamespace(
-                    id=tc.id,
-                    type="function",
-                    function=SimpleNamespace(name=tc.name, arguments=tc.arguments),
-                )
-                if tc.provider_data:
-                    for key in ("call_id", "response_item_id"):
-                        if tc.provider_data.get(key):
-                            setattr(tc_ns, key, tc.provider_data[key])
-                tc_list.append(tc_ns)
-        pd = nr.provider_data or {}
-        return SimpleNamespace(
-            content=nr.content,
-            tool_calls=tc_list or None,
-            reasoning=nr.reasoning,
-            reasoning_content=pd.get("reasoning_content"),
-            reasoning_details=pd.get("reasoning_details"),
-            codex_reasoning_items=pd.get("codex_reasoning_items"),
-        )
+    def _get_bedrock_transport(self):
+        """Return the cached BedrockTransport instance (lazy singleton)."""
+        t = getattr(self, "_bedrock_transport", None)
+        if t is None:
+            from agent.transports import get_transport
+            t = get_transport("bedrock_converse")
+            self._bedrock_transport = t
+        return t

    def _prepare_anthropic_messages_for_api(self, api_messages: list) -> list:
        if not any(
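
Review note: the four getters added in this hunk share one shape (read a private attribute with `getattr`, build on a miss, store, return). Because each api_mode now has its own attribute, nothing needs clearing when `self.api_mode` changes, which appears to be why the `_transport_cache.clear()` calls are dropped in the earlier hunks. The sketch below illustrates that pattern in isolation. `FakeTransport`, `_REGISTRY`, the `Agent` class and this `get_transport` are hypothetical stand-ins, not the real `agent.transports` module, whose behavior may differ.

```python
# Hypothetical stand-ins for illustration only; not the real agent.transports API.
class FakeTransport:
    def __init__(self, mode: str) -> None:
        self.mode = mode


_REGISTRY = {
    "anthropic_messages": lambda: FakeTransport("anthropic_messages"),
    "codex_responses": lambda: FakeTransport("codex_responses"),
}


def get_transport(mode: str):
    """Stand-in registry lookup: build a transport for the mode, or None."""
    factory = _REGISTRY.get(mode)
    return factory() if factory else None


class Agent:
    """Minimal holder showing the per-mode lazy getter shape from the hunk."""

    def _get_anthropic_transport(self):
        # getattr with a default avoids requiring __init__ to pre-declare the slot.
        t = getattr(self, "_anthropic_transport", None)
        if t is None:
            t = get_transport("anthropic_messages")
            self._anthropic_transport = t
        return t

    def _get_codex_transport(self):
        t = getattr(self, "_codex_transport", None)
        if t is None:
            t = get_transport("codex_responses")
            self._codex_transport = t
        return t


agent = Agent()
first = agent._get_anthropic_transport()
second = agent._get_anthropic_transport()
```

In this sketch `first` and `second` are the same object, and the codex getter never sees the anthropic instance. The trade-off against the removed `_get_transport(api_mode)` is one method per mode instead of one dict keyed by mode.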
@@ -6918,7 +6729,7 @@ class AIAgent:
    def _build_api_kwargs(self, api_messages: list) -> dict:
        """Build the keyword arguments dict for the active API mode."""
        if self.api_mode == "anthropic_messages":
-            _transport = self._get_transport()
+            _transport = self._get_anthropic_transport()
            anthropic_messages = self._prepare_anthropic_messages_for_api(api_messages)
            ctx_len = getattr(self, "context_compressor", None)
            ctx_len = ctx_len.context_length if ctx_len else None
@@ -6941,7 +6752,7 @@ class AIAgent:
        # AWS Bedrock native Converse API — bypasses the OpenAI client entirely.
        # The adapter handles message/tool conversion and boto3 calls directly.
        if self.api_mode == "bedrock_converse":
-            _bt = self._get_transport()
+            _bt = self._get_bedrock_transport()
            region = getattr(self, "_bedrock_region", None) or "us-east-1"
            guardrail = getattr(self, "_bedrock_guardrail_config", None)
            return _bt.build_kwargs(
@@ -6954,7 +6765,7 @@ class AIAgent:
            )

        if self.api_mode == "codex_responses":
-            _ct = self._get_transport()
+            _ct = self._get_codex_transport()
            is_github_responses = (
                base_url_host_matches(self.base_url, "models.github.ai")
                or base_url_host_matches(self.base_url, "api.githubcopilot.com")
@@ -6982,7 +6793,7 @@ class AIAgent:
            )

        # ── chat_completions (default) ─────────────────────────────────────
-        _ct = self._get_transport()
+        _ct = self._get_chat_completions_transport()

        # Provider detection flags
        _is_qwen = self._is_qwen_portal()
@@ -7457,7 +7268,7 @@ class AIAgent:
            if not _aux_available and self.api_mode == "codex_responses":
                # No auxiliary client -- use the Codex Responses path directly
                codex_kwargs = self._build_api_kwargs(api_messages)
-                codex_kwargs["tools"] = self._get_transport().convert_tools([memory_tool_def])
+                codex_kwargs["tools"] = self._get_codex_transport().convert_tools([memory_tool_def])
                if _flush_temperature is not None:
                    codex_kwargs["temperature"] = _flush_temperature
                else:
@@ -7467,7 +7278,7 @@ class AIAgent:
                response = self._run_codex_stream(codex_kwargs)
            elif not _aux_available and self.api_mode == "anthropic_messages":
                # Native Anthropic — use the transport for kwargs
-                _tflush = self._get_transport()
+                _tflush = self._get_anthropic_transport()
                ant_kwargs = _tflush.build_kwargs(
                    model=self.model, messages=api_messages,
                    tools=[memory_tool_def], max_tokens=5120,
@@ -7492,7 +7303,7 @@ class AIAgent:
            # Extract tool calls from the response, handling all API formats
            tool_calls = []
            if self.api_mode == "codex_responses" and not _aux_available:
-                _ct_flush = self._get_transport()
+                _ct_flush = self._get_codex_transport()
                _cnr_flush = _ct_flush.normalize_response(response)
                if _cnr_flush and _cnr_flush.tool_calls:
                    tool_calls = [
@@ -7502,7 +7313,7 @@ class AIAgent:
                        ) for tc in _cnr_flush.tool_calls
                    ]
            elif self.api_mode == "anthropic_messages" and not _aux_available:
-                _tfn = self._get_transport()
+                _tfn = self._get_anthropic_transport()
                _flush_nr = _tfn.normalize_response(response, strip_tool_prefix=self._is_anthropic_oauth)
                if _flush_nr and _flush_nr.tool_calls:
                    tool_calls = [
@@ -7512,11 +7323,9 @@ class AIAgent:
                        ) for tc in _flush_nr.tool_calls
                    ]
            elif hasattr(response, "choices") and response.choices:
-                # chat_completions / bedrock — normalize through transport
-                _flush_cc_nr = self._get_transport().normalize_response(response)
-                _flush_msg = self._nr_to_assistant_message(_flush_cc_nr)
-                if _flush_msg.tool_calls:
-                    tool_calls = _flush_msg.tool_calls
+                assistant_message = response.choices[0].message
+                if assistant_message.tool_calls:
+                    tool_calls = assistant_message.tool_calls

            for tc in tool_calls:
                if tc.function.name == "memory":
@@ -8546,7 +8355,7 @@ class AIAgent:
                codex_kwargs = self._build_api_kwargs(api_messages)
                codex_kwargs.pop("tools", None)
                summary_response = self._run_codex_stream(codex_kwargs)
-                _ct_sum = self._get_transport()
+                _ct_sum = self._get_codex_transport()
                _cnr_sum = _ct_sum.normalize_response(summary_response)
                final_response = (_cnr_sum.content or "").strip()
            else:
@@ -8576,7 +8385,7 @@ class AIAgent:
                    summary_kwargs["extra_body"] = summary_extra_body

                if self.api_mode == "anthropic_messages":
-                    _tsum = self._get_transport()
+                    _tsum = self._get_anthropic_transport()
                    _ant_kw = _tsum.build_kwargs(model=self.model, messages=api_messages, tools=None,
                                   max_tokens=self.max_tokens, reasoning_config=self.reasoning_config,
                                   is_oauth=self._is_anthropic_oauth,
@@ -8586,8 +8395,11 @@ class AIAgent:
                    final_response = (_sum_nr.content or "").strip()
                else:
                    summary_response = self._ensure_primary_openai_client(reason="iteration_limit_summary").chat.completions.create(**summary_kwargs)
-                    _sum_cc_nr = self._get_transport().normalize_response(summary_response)
-                    final_response = (_sum_cc_nr.content or "").strip()
+
+                    if summary_response.choices and summary_response.choices[0].message.content:
+                        final_response = summary_response.choices[0].message.content
+                    else:
+                        final_response = ""

            if final_response:
                if "<think>" in final_response:
@@ -8602,11 +8414,11 @@ class AIAgent:
                    codex_kwargs = self._build_api_kwargs(api_messages)
                    codex_kwargs.pop("tools", None)
                    retry_response = self._run_codex_stream(codex_kwargs)
-                    _ct_retry = self._get_transport()
+                    _ct_retry = self._get_codex_transport()
                    _cnr_retry = _ct_retry.normalize_response(retry_response)
                    final_response = (_cnr_retry.content or "").strip()
                elif self.api_mode == "anthropic_messages":
-                    _tretry = self._get_transport()
+                    _tretry = self._get_anthropic_transport()
                    _ant_kw2 = _tretry.build_kwargs(model=self.model, messages=api_messages, tools=None,
                                    is_oauth=self._is_anthropic_oauth,
                                    max_tokens=self.max_tokens, reasoning_config=self.reasoning_config,
@@ -8627,8 +8439,11 @@ class AIAgent:
                        summary_kwargs["extra_body"] = summary_extra_body

                    summary_response = self._ensure_primary_openai_client(reason="iteration_limit_summary_retry").chat.completions.create(**summary_kwargs)
-                    _retry_cc_nr = self._get_transport().normalize_response(summary_response)
-                    final_response = (_retry_cc_nr.content or "").strip()
+
+                    if summary_response.choices and summary_response.choices[0].message.content:
+                        final_response = summary_response.choices[0].message.content
+                    else:
+                        final_response = ""

                if final_response:
                    if "<think>" in final_response:
@@ -9359,7 +9174,7 @@ class AIAgent:
                    if self._force_ascii_payload:
                        _sanitize_structure_non_ascii(api_kwargs)
                    if self.api_mode == "codex_responses":
-                        api_kwargs = self._get_transport().preflight_kwargs(api_kwargs, allow_stream=False)
+                        api_kwargs = self._get_codex_transport().preflight_kwargs(api_kwargs, allow_stream=False)

                    try:
                        from hermes_cli.plugins import invoke_hook as _invoke_hook
@@ -9447,7 +9262,7 @@ class AIAgent:
                    response_invalid = False
                    error_details = []
                    if self.api_mode == "codex_responses":
-                        _ct_v = self._get_transport()
+                        _ct_v = self._get_codex_transport()
                        if not _ct_v.validate_response(response):
                            if response is None:
                                response_invalid = True
@@ -9476,7 +9291,7 @@ class AIAgent:
                                    response_invalid = True
                                    error_details.append("response.output is empty")
                    elif self.api_mode == "anthropic_messages":
-                        _tv = self._get_transport()
+                        _tv = self._get_anthropic_transport()
                        if not _tv.validate_response(response):
                            response_invalid = True
                            if response is None:
@@ -9484,7 +9299,7 @@ class AIAgent:
                            else:
                                error_details.append("response.content invalid (not a non-empty list)")
                    elif self.api_mode == "bedrock_converse":
-                        _btv = self._get_transport()
+                        _btv = self._get_bedrock_transport()
                        if not _btv.validate_response(response):
                            response_invalid = True
                            if response is None:
@@ -9492,7 +9307,7 @@ class AIAgent:
                            else:
                                error_details.append("Bedrock response invalid (no output or choices)")
                    else:
-                        _ctv = self._get_transport()
+                        _ctv = self._get_chat_completions_transport()
                        if not _ctv.validate_response(response):
                            response_invalid = True
                            if response is None:
@@ -9652,18 +9467,15 @@ class AIAgent:
                        else:
                            finish_reason = "stop"
                    elif self.api_mode == "anthropic_messages":
-                        _tfr = self._get_transport()
+                        _tfr = self._get_anthropic_transport()
                        finish_reason = _tfr.map_finish_reason(response.stop_reason)
                    elif self.api_mode == "bedrock_converse":
-                        # Bedrock response already normalized at dispatch — use transport
-                        _bt_fr = self._get_transport()
-                        _bt_fr_nr = _bt_fr.normalize_response(response)
-                        finish_reason = _bt_fr_nr.finish_reason
+                        # Bedrock response is already normalized at dispatch — finish_reason
+                        # is already in OpenAI format via normalize_converse_response()
+                        finish_reason = response.choices[0].finish_reason if hasattr(response, "choices") and response.choices else "stop"
                    else:
-                        _cc_fr = self._get_transport()
-                        _cc_fr_nr = _cc_fr.normalize_response(response)
-                        finish_reason = _cc_fr_nr.finish_reason
-                        assistant_message = self._nr_to_assistant_message(_cc_fr_nr)
+                        finish_reason = response.choices[0].finish_reason
+                        assistant_message = response.choices[0].message
                        if self._should_treat_stop_as_truncated(
                            finish_reason,
                            assistant_message,
@@ -9686,14 +9498,27 @@ class AIAgent:
                        # interim assistant message is byte-identical to what
                        # would have been appended in the non-truncated path.
                        _trunc_msg = None
-                        _trunc_transport = self._get_transport()
-                        if self.api_mode == "anthropic_messages":
-                            _trunc_nr = _trunc_transport.normalize_response(
+                        if self.api_mode in ("chat_completions", "bedrock_converse"):
+                            _trunc_msg = response.choices[0].message if (hasattr(response, "choices") and response.choices) else None
+                        elif self.api_mode == "anthropic_messages":
+                            _trunc_nr = self._get_anthropic_transport().normalize_response(
                                response, strip_tool_prefix=self._is_anthropic_oauth
                            )
-                        else:
-                            _trunc_nr = _trunc_transport.normalize_response(response)
-                        _trunc_msg = self._nr_to_assistant_message(_trunc_nr)
+                            _trunc_msg = SimpleNamespace(
+                                content=_trunc_nr.content,
+                                tool_calls=[
+                                    SimpleNamespace(
+                                        id=tc.id, type="function",
+                                        function=SimpleNamespace(name=tc.name, arguments=tc.arguments),
+                                    ) for tc in (_trunc_nr.tool_calls or [])
+                                ] or None,
+                                reasoning=_trunc_nr.reasoning,
+                                reasoning_content=None,
+                                reasoning_details=(
+                                    _trunc_nr.provider_data.get("reasoning_details")
+                                    if _trunc_nr.provider_data else None
+                                ),
+                            )

                        _trunc_content = getattr(_trunc_msg, "content", None) if _trunc_msg else None
                        _trunc_has_tool_calls = bool(getattr(_trunc_msg, "tool_calls", None)) if _trunc_msg else False
@@ -10924,13 +10749,69 @@ class AIAgent:
                break

            try:
-                _transport = self._get_transport()
-                _normalize_kwargs = {}
-                if self.api_mode == "anthropic_messages":
-                    _normalize_kwargs["strip_tool_prefix"] = self._is_anthropic_oauth
-                _nr = _transport.normalize_response(response, **_normalize_kwargs)
-                assistant_message = self._nr_to_assistant_message(_nr)
-                finish_reason = _nr.finish_reason
+                if self.api_mode == "codex_responses":
+                    _ct = self._get_codex_transport()
+                    _cnr = _ct.normalize_response(response)
+                    # Back-compat shim: downstream expects SimpleNamespace with
+                    # codex-specific fields (.codex_reasoning_items, .reasoning_details,
+                    # and .call_id/.response_item_id on tool calls).
+                    _tc_list = None
+                    if _cnr.tool_calls:
+                        _tc_list = []
+                        for tc in _cnr.tool_calls:
+                            _tc_ns = SimpleNamespace(
+                                id=tc.id, type="function",
+                                function=SimpleNamespace(name=tc.name, arguments=tc.arguments),
+                            )
+                            if tc.provider_data:
+                                if tc.provider_data.get("call_id"):
+                                    _tc_ns.call_id = tc.provider_data["call_id"]
+                                if tc.provider_data.get("response_item_id"):
+                                    _tc_ns.response_item_id = tc.provider_data["response_item_id"]
+                            _tc_list.append(_tc_ns)
+                    assistant_message = SimpleNamespace(
+                        content=_cnr.content,
+                        tool_calls=_tc_list or None,
+                        reasoning=_cnr.reasoning,
+                        reasoning_content=None,
+                        codex_reasoning_items=(
+                            _cnr.provider_data.get("codex_reasoning_items")
+                            if _cnr.provider_data else None
+                        ),
+                        reasoning_details=(
+                            _cnr.provider_data.get("reasoning_details")
+                            if _cnr.provider_data else None
+                        ),
+                    )
+                    finish_reason = _cnr.finish_reason
+                elif self.api_mode == "anthropic_messages":
+                    _transport = self._get_anthropic_transport()
+                    _nr = _transport.normalize_response(
+                        response, strip_tool_prefix=self._is_anthropic_oauth
+                    )
+                    # Back-compat shim: downstream code expects SimpleNamespace with
+                    # .content, .tool_calls, .reasoning, .reasoning_content,
+                    # .reasoning_details attributes.
+                    assistant_message = SimpleNamespace(
+                        content=_nr.content,
+                        tool_calls=[
+                            SimpleNamespace(
+                                id=tc.id,
+                                type="function",
+                                function=SimpleNamespace(name=tc.name, arguments=tc.arguments),
+                            )
+                            for tc in (_nr.tool_calls or [])
+                        ] or None,
+                        reasoning=_nr.reasoning,
+                        reasoning_content=None,
+                        reasoning_details=(
+                            _nr.provider_data.get("reasoning_details")
+                            if _nr.provider_data else None
+                        ),
+                    )
+                    finish_reason = _nr.finish_reason
+                else:
+                    assistant_message = response.choices[0].message
                
                # Normalize content to string — some OpenAI-compatible servers
                # (llama-server, etc.) return content as a dict or list instead
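
Review note: this hunk inlines, per api_mode, the conversion that the removed `_nr_to_assistant_message` helper performed in one place. A minimal sketch of that conversion follows, assuming simplified stand-in dataclasses. The real `NormalizedResponse` and `ToolCall` live in `agent.transports.types` and may carry different fields; `to_assistant_message` is a hypothetical name.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Dict, List, Optional


@dataclass
class ToolCall:  # stand-in, not the real agent.transports.types.ToolCall
    id: str
    name: str
    arguments: str
    provider_data: Optional[Dict[str, Any]] = None


@dataclass
class NormalizedResponse:  # stand-in with only the fields used below
    content: Optional[str]
    tool_calls: Optional[List[ToolCall]] = None
    finish_reason: str = "stop"
    reasoning: Optional[str] = None
    provider_data: Optional[Dict[str, Any]] = None


def to_assistant_message(nr: NormalizedResponse) -> SimpleNamespace:
    """Convert a normalized response into the OpenAI-style message shape."""
    tool_calls = []
    for tc in nr.tool_calls or []:
        ns = SimpleNamespace(
            id=tc.id,
            type="function",
            function=SimpleNamespace(name=tc.name, arguments=tc.arguments),
        )
        # Codex-specific ids are copied only when the provider supplied them.
        for key in ("call_id", "response_item_id"):
            if tc.provider_data and tc.provider_data.get(key):
                setattr(ns, key, tc.provider_data[key])
        tool_calls.append(ns)
    pd = nr.provider_data or {}
    return SimpleNamespace(
        content=nr.content,
        tool_calls=tool_calls or None,  # empty list collapses to None
        reasoning=nr.reasoning,
        reasoning_content=None,
        reasoning_details=pd.get("reasoning_details"),
        codex_reasoning_items=pd.get("codex_reasoning_items"),
    )


msg = to_assistant_message(
    NormalizedResponse(
        content="ok",
        tool_calls=[ToolCall("t1", "memory", "{}", {"call_id": "c1"})],
        finish_reason="tool_calls",
    )
)
```

Here `msg.tool_calls[0].function.name` is `"memory"` and `msg.tool_calls[0].call_id` is `"c1"`. In the diff the `chat_completions` and `bedrock_converse` paths skip this conversion and read `response.choices[0].message` directly.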
@@ -265,7 +265,7 @@ def check_config(groq_key, eleven_key):
    if voice_mode_path.exists():
        try:
            import json
-            modes = json.loads(voice_mode_path.read_text(encoding="utf-8"))
+            modes = json.loads(voice_mode_path.read_text())
            off_count = sum(1 for v in modes.values() if v == "off")
            all_count = sum(1 for v in modes.values() if v == "all")
            check("Voice mode state", True, f"{all_count} on, {off_count} off, {len(modes)} total")
@@ -58,7 +58,6 @@ AUTHOR_MAP = {
    "16443023+stablegenius49@users.noreply.github.com": "stablegenius49",
    "185121704+stablegenius49@users.noreply.github.com": "stablegenius49",
    "101283333+batuhankocyigit@users.noreply.github.com": "batuhankocyigit",
-    "255305877+ismell0992-afk@users.noreply.github.com": "ismell0992-afk",
    "valdi.jorge@gmail.com": "jvcl",
    "francip@gmail.com": "francip",
    "omni@comelse.com": "omnissiah-comelse",
@@ -106,6 +105,7 @@ AUTHOR_MAP = {
    "134848055+UNLINEARITY@users.noreply.github.com": "UNLINEARITY",
    "ben.burtenshaw@gmail.com": "burtenshaw",
    "roopaknijhara@gmail.com": "rnijhara",
+    "Maaannnn@users.noreply.github.com": "Maaannnn",
    # contributors (manual mapping from git names)
    "ahmedsherif95@gmail.com": "asheriif",
    "liujinkun@bytedance.com": "liujinkun2025",
@@ -141,7 +141,6 @@ AUTHOR_MAP = {
    "331214+counterposition@users.noreply.github.com": "counterposition",
    "blspear@gmail.com": "BrennerSpear",
    "akhater@gmail.com": "akhater",
-    "Cos_Admin@PTG-COS.lodluvup4uaudnm3ycd14giyug.xx.internal.cloudapp.net": "akhater",
    "239876380+handsdiff@users.noreply.github.com": "handsdiff",
    "hesapacicam112@gmail.com": "etherman-os",
    "mark.ramsell@rivermounts.com": "mark-ramsell",
@@ -183,7 +182,6 @@ AUTHOR_MAP = {
    "adavyasharma@gmail.com": "adavyas",
    "acaayush1111@gmail.com": "aayushchaudhary",
    "jason@outland.art": "jasonoutland",
-    "73175452+Magaav@users.noreply.github.com": "Magaav",
    "mrflu1918@proton.me": "SPANISHFLU",
    "morganemoss@gmai.com": "mormio",
    "kopjop926@gmail.com": "cesareth",
@@ -288,7 +286,6 @@ AUTHOR_MAP = {
    "srhtsrht17@gmail.com": "Sertug17",
    "stephenschoettler@gmail.com": "stephenschoettler",
    "tanishq231003@gmail.com": "yyovil",
-    "taosiyuan163@153.com": "taosiyuan163",
    "tesseracttars@gmail.com": "tesseracttars-creator",
    "tianliangjay@gmail.com": "xingkongliang",
    "tranquil_flow@protonmail.com": "Tranquil-Flow",
@@ -345,32 +342,6 @@ AUTHOR_MAP = {
    "shalompmc0505@naver.com": "pinion05",
    "105142614+VTRiot@users.noreply.github.com": "VTRiot",
    "vivien000812@gmail.com": "iamagenius00",
-    "89228157+Feranmi10@users.noreply.github.com": "Feranmi10",
-    "simon@gtcl.us": "simon-gtcl",
-    "suzukaze.haduki@gmail.com": "houko",
-    "cliff@cigii.com": "cgarwood82",
-    "anna@oa.ke": "anna-oake",
-    "jaffarkeikei@gmail.com": "jaffarkeikei",
-    "hxp@hxp.plus": "hxp-plus",
-    "3580442280@qq.com": "Tianworld",
-    "wujianxu91@gmail.com": "wujhsu",
-    "zhrh120@gmail.com": "niyoh120",
-    "vrinek@hey.com": "vrinek",
-    "268198004+xandersbell@users.noreply.github.com": "xandersbell",
-    "somme4096@gmail.com": "Somme4096",
-    "brian@tiuxo.com": "brianclemens",
-    "25944632+yudaiyan@users.noreply.github.com": "yudaiyan",
-    "chayton@sina.com": "ycbai",
-    "longsizhuo@gmail.com": "longsizhuo",
-    "chenb19870707@gmail.com": "ms-alan",
-    "276886827+WuTianyi123@users.noreply.github.com": "WuTianyi123",
-    "22549957+li0near@users.noreply.github.com": "li0near",
-    "23434080+sicnuyudidi@users.noreply.github.com": "sicnuyudidi",
-    "haimu0x0@proton.me": "haimu0x",
-    "abdelmajidnidnasser1@gmail.com": "NIDNASSER-Abdelmajid",
-    "projectadmin@wit.id": "projectadmin-dev",
-    "mrigankamondal10@gmail.com": "Dev-Mriganka",
-    "132275809+shushuzn@users.noreply.github.com": "shushuzn",
 }


@@ -8,7 +8,7 @@
      "name": "hermes-whatsapp-bridge",
      "version": "1.0.0",
      "dependencies": {
-        "@whiskeysockets/baileys": "WhiskeySockets/Baileys#01047debd81beb20da7b7779b08edcb06aa03770",
+        "@whiskeysockets/baileys": "WhiskeySockets/Baileys#fix/abprops-abt-fetch",
        "express": "^4.21.0",
        "pino": "^9.0.0",
        "qrcode-terminal": "^0.12.0"
@@ -1659,91 +1659,3 @@ class TestToolChoice:
            tool_choice="search",
        )
        assert kwargs["tool_choice"] == {"type": "tool", "name": "search"}
-
-
-
-# ---------------------------------------------------------------------------
-# max_tokens resolver — openclaw/openclaw#66664 port
-# ---------------------------------------------------------------------------
-
-from agent.anthropic_adapter import (
-    _resolve_positive_anthropic_max_tokens,
-    _resolve_anthropic_messages_max_tokens,
-)
-
-
-class TestResolvePositiveMaxTokens:
-    """Unit tests for the positive-int resolver helper."""
-
-    def test_positive_int_passes_through(self):
-        assert _resolve_positive_anthropic_max_tokens(8192) == 8192
-
-    def test_zero_returns_none(self):
-        assert _resolve_positive_anthropic_max_tokens(0) is None
-
-    def test_negative_int_returns_none(self):
-        assert _resolve_positive_anthropic_max_tokens(-1) is None
-        assert _resolve_positive_anthropic_max_tokens(-500) is None
-
-    def test_fractional_float_floored_and_kept_if_positive(self):
-        # 8192.7 -> 8192, still positive
-        assert _resolve_positive_anthropic_max_tokens(8192.7) == 8192
-
-    def test_small_positive_float_below_one_returns_none(self):
-        # 0.5 floors to 0, which is not positive
-        assert _resolve_positive_anthropic_max_tokens(0.5) is None
-
-    def test_negative_float_returns_none(self):
-        assert _resolve_positive_anthropic_max_tokens(-1.5) is None
-
-    def test_nan_returns_none(self):
-        assert _resolve_positive_anthropic_max_tokens(float("nan")) is None
-
-    def test_infinity_returns_none(self):
-        assert _resolve_positive_anthropic_max_tokens(float("inf")) is None
-        assert _resolve_positive_anthropic_max_tokens(float("-inf")) is None
-
-    def test_bool_true_returns_none(self):
-        # True is an int subclass but semantically never a real max_tokens value
-        assert _resolve_positive_anthropic_max_tokens(True) is None
-        assert _resolve_positive_anthropic_max_tokens(False) is None
-
-    def test_string_returns_none(self):
-        assert _resolve_positive_anthropic_max_tokens("8192") is None
-
-    def test_none_returns_none(self):
-        assert _resolve_positive_anthropic_max_tokens(None) is None
-
-
-class TestResolveMessagesMaxTokens:
-    """Integration tests for the full Messages resolver."""
-
-    def test_positive_requested_wins(self):
-        assert _resolve_anthropic_messages_max_tokens(
-            8192, "claude-opus-4-6"
-        ) == 8192
-
-    def test_zero_falls_back_to_model_default(self):
-        # Should use _get_anthropic_max_output(model), not crash
-        result = _resolve_anthropic_messages_max_tokens(0, "claude-opus-4-6")
-        assert result > 0
-
-    def test_none_falls_back_to_model_default(self):
-        result = _resolve_anthropic_messages_max_tokens(None, "claude-opus-4-6")
-        assert result > 0
-
-    def test_negative_falls_back_to_model_default(self):
-        # Previously leaked -1 to the API; now falls back safely
-        result = _resolve_anthropic_messages_max_tokens(-1, "claude-opus-4-6")
-        assert result > 0
-
-    def test_fractional_positive_floored(self):
-        assert _resolve_anthropic_messages_max_tokens(
-            8192.5, "claude-opus-4-6"
-        ) == 8192
-
-    def test_sub_one_float_falls_back(self):
-        # 0.5 floors to 0 -> not positive -> falls back to model ceiling
-        result = _resolve_anthropic_messages_max_tokens(0.5, "claude-opus-4-6")
-        assert result > 0
-        assert result != 0
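
Review note: the tests removed above pin down a specific contract for the positive-int resolver (bools, strings, None, NaN and infinities are rejected; floats are floored; anything not strictly positive becomes None). For reference, a minimal sketch that satisfies those expectations is shown below. `resolve_positive_max_tokens` is a hypothetical reconstruction from the assertions only, not the actual `_resolve_positive_anthropic_max_tokens` in `agent/anthropic_adapter.py`.

```python
import math
from typing import Any, Optional


def resolve_positive_max_tokens(value: Any) -> Optional[int]:
    """Return a positive int token budget, or None if the input is unusable."""
    # bool is an int subclass, so it must be rejected before the numeric check.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return None
    # NaN and +/-inf cannot be floored to a meaningful budget.
    if isinstance(value, float) and not math.isfinite(value):
        return None
    floored = math.floor(value)
    return floored if floored > 0 else None
```

A caller would then fall back to the model's default ceiling whenever this returns None, which is the behavior the removed `TestResolveMessagesMaxTokens` cases describe.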
@@ -0,0 +1,238 @@
+"""Regression tests: normalize_anthropic_response_v2 vs v1.
+
+Constructs mock Anthropic responses and asserts that the v2 function
+(returning NormalizedResponse) produces identical field values to the
+original v1 function (returning SimpleNamespace + finish_reason).
+"""
+
+import json
+import pytest
+from types import SimpleNamespace
+
+from agent.anthropic_adapter import (
+    normalize_anthropic_response,
+    normalize_anthropic_response_v2,
+)
+from agent.transports.types import NormalizedResponse, ToolCall
+
+
+# ---------------------------------------------------------------------------
+# Helpers to build mock Anthropic SDK responses
+# ---------------------------------------------------------------------------
+
+def _text_block(text: str):
+    return SimpleNamespace(type="text", text=text)
+
+
+def _thinking_block(thinking: str, signature: str = "sig_abc"):
+    return SimpleNamespace(type="thinking", thinking=thinking, signature=signature)
+
+
+def _tool_use_block(id: str, name: str, input: dict):
+    return SimpleNamespace(type="tool_use", id=id, name=name, input=input)
+
+
+def _response(content_blocks, stop_reason="end_turn"):
+    return SimpleNamespace(
+        content=content_blocks,
+        stop_reason=stop_reason,
+        usage=SimpleNamespace(
+            input_tokens=10,
+            output_tokens=5,
+        ),
+    )
+
+
+# ---------------------------------------------------------------------------
+# Tests
+# ---------------------------------------------------------------------------
+
+class TestTextOnly:
+    """Text-only response — no tools, no thinking."""
+
+    def setup_method(self):
+        self.resp = _response([_text_block("Hello world")])
+        self.v1_msg, self.v1_finish = normalize_anthropic_response(self.resp)
+        self.v2 = normalize_anthropic_response_v2(self.resp)
+
+    def test_type(self):
+        assert isinstance(self.v2, NormalizedResponse)
+
+    def test_content_matches(self):
+        assert self.v2.content == self.v1_msg.content
+
+    def test_finish_reason_matches(self):
+        assert self.v2.finish_reason == self.v1_finish
+
+    def test_no_tool_calls(self):
+        assert self.v2.tool_calls is None
+        assert self.v1_msg.tool_calls is None
+
+    def test_no_reasoning(self):
+        assert self.v2.reasoning is None
+        assert self.v1_msg.reasoning is None
+
+
+class TestWithToolCalls:
+    """Response with tool calls."""
+
+    def setup_method(self):
+        self.resp = _response(
+            [
+                _text_block("I'll check that"),
+                _tool_use_block("toolu_abc", "terminal", {"command": "ls"}),
+                _tool_use_block("toolu_def", "read_file", {"path": "/tmp"}),
+            ],
+            stop_reason="tool_use",
+        )
+        self.v1_msg, self.v1_finish = normalize_anthropic_response(self.resp)
+        self.v2 = normalize_anthropic_response_v2(self.resp)
+
+    def test_finish_reason(self):
+        assert self.v2.finish_reason == "tool_calls"
+        assert self.v1_finish == "tool_calls"
+
+    def test_tool_call_count(self):
+        assert len(self.v2.tool_calls) == 2
+        assert len(self.v1_msg.tool_calls) == 2
+
+    def test_tool_call_ids_match(self):
+        for i in range(2):
+            assert self.v2.tool_calls[i].id == self.v1_msg.tool_calls[i].id
+
+    def test_tool_call_names_match(self):
+        assert self.v2.tool_calls[0].name == "terminal"
+        assert self.v2.tool_calls[1].name == "read_file"
+        for i in range(2):
+            assert self.v2.tool_calls[i].name == self.v1_msg.tool_calls[i].function.name
+
+    def test_tool_call_arguments_match(self):
+        for i in range(2):
+            assert self.v2.tool_calls[i].arguments == self.v1_msg.tool_calls[i].function.arguments
+
+    def test_content_preserved(self):
+        assert self.v2.content == self.v1_msg.content
+        assert "check that" in self.v2.content
+
+
+class TestWithThinking:
+    """Response with thinking blocks (Claude 3.7+ extended thinking)."""
+
+    def setup_method(self):
+        self.resp = _response([
+            _thinking_block("Let me think about this carefully..."),
+            _text_block("The answer is 42."),
+        ])
+        self.v1_msg, self.v1_finish = normalize_anthropic_response(self.resp)
+        self.v2 = normalize_anthropic_response_v2(self.resp)
+
+    def test_reasoning_matches(self):
+        assert self.v2.reasoning == self.v1_msg.reasoning
+        assert "think about this" in self.v2.reasoning
+
+    def test_reasoning_details_in_provider_data(self):
+        v1_details = self.v1_msg.reasoning_details
+        v2_details = self.v2.provider_data.get("reasoning_details") if self.v2.provider_data else None
+        assert v1_details is not None
+        assert v2_details is not None
+        assert len(v2_details) == len(v1_details)
+
+    def test_content_excludes_thinking(self):
+        assert self.v2.content == "The answer is 42."
+
+
+class TestMixed:
+    """Response with thinking + text + tool calls."""
+
+    def setup_method(self):
+        self.resp = _response(
+            [
+                _thinking_block("Planning my approach..."),
+                _text_block("I'll run the command"),
+                _tool_use_block("toolu_xyz", "terminal", {"command": "pwd"}),
+            ],
+            stop_reason="tool_use",
+        )
+        self.v1_msg, self.v1_finish = normalize_anthropic_response(self.resp)
+        self.v2 = normalize_anthropic_response_v2(self.resp)
+
+    def test_all_fields_present(self):
+        assert self.v2.content is not None
+        assert self.v2.tool_calls is not None
+        assert self.v2.reasoning is not None
+        assert self.v2.finish_reason == "tool_calls"
+
+    def test_content_matches(self):
+        assert self.v2.content == self.v1_msg.content
+
+    def test_reasoning_matches(self):
+        assert self.v2.reasoning == self.v1_msg.reasoning
+
+    def test_tool_call_matches(self):
+        assert self.v2.tool_calls[0].id == self.v1_msg.tool_calls[0].id
+        assert self.v2.tool_calls[0].name == self.v1_msg.tool_calls[0].function.name
+
+
+class TestStopReasons:
+    """Verify finish_reason mapping matches between v1 and v2."""
+
+    @pytest.mark.parametrize("stop_reason,expected", [
+        ("end_turn", "stop"),
+        ("tool_use", "tool_calls"),
+        ("max_tokens", "length"),
+        ("stop_sequence", "stop"),
+        ("refusal", "content_filter"),
+        ("model_context_window_exceeded", "length"),
+        ("unknown_future_reason", "stop"),
+    ])
+    def test_stop_reason_mapping(self, stop_reason, expected):
+        resp = _response([_text_block("x")], stop_reason=stop_reason)
+        v1_msg, v1_finish = normalize_anthropic_response(resp)
+        v2 = normalize_anthropic_response_v2(resp)
+        assert v2.finish_reason == v1_finish == expected
+
+
+class TestStripToolPrefix:
+    """Verify mcp_ prefix stripping works identically."""
+
+    def test_prefix_stripped(self):
+        resp = _response(
+            [_tool_use_block("toolu_1", "mcp_terminal", {"cmd": "ls"})],
+            stop_reason="tool_use",
+        )
+        v1_msg, _ = normalize_anthropic_response(resp, strip_tool_prefix=True)
+        v2 = normalize_anthropic_response_v2(resp, strip_tool_prefix=True)
+        assert v1_msg.tool_calls[0].function.name == "terminal"
+        assert v2.tool_calls[0].name == "terminal"
+
+    def test_prefix_kept(self):
+        resp = _response(
+            [_tool_use_block("toolu_1", "mcp_terminal", {"cmd": "ls"})],
+            stop_reason="tool_use",
+        )
+        v1_msg, _ = normalize_anthropic_response(resp, strip_tool_prefix=False)
+        v2 = normalize_anthropic_response_v2(resp, strip_tool_prefix=False)
+        assert v1_msg.tool_calls[0].function.name == "mcp_terminal"
+        assert v2.tool_calls[0].name == "mcp_terminal"
+
+
+class TestEdgeCases:
+    """Edge cases: empty content, no blocks, etc."""
+
+    def test_empty_content_blocks(self):
+        resp = _response([])
+        v1_msg, v1_finish = normalize_anthropic_response(resp)
+        v2 = normalize_anthropic_response_v2(resp)
+        assert v2.content == v1_msg.content
+        assert v2.content is None
+
+    def test_no_reasoning_details_means_none_provider_data(self):
+        resp = _response([_text_block("hi")])
+        v2 = normalize_anthropic_response_v2(resp)
+        assert v2.provider_data is None
+
+    def test_v2_returns_dataclass_not_namespace(self):
+        resp = _response([_text_block("hi")])
+        v2 = normalize_anthropic_response_v2(resp)
+        assert isinstance(v2, NormalizedResponse)
+        assert not isinstance(v2, SimpleNamespace)
@@ -782,6 +782,45 @@ def test_resolve_api_key_provider_skips_unconfigured_anthropic(monkeypatch):
 # ---------------------------------------------------------------------------


+class TestModelDefaultElimination:
+    """_resolve_api_key_provider must skip providers without known aux models."""
+
+    def test_unknown_provider_skipped(self):
+        """Providers not in _API_KEY_PROVIDER_AUX_MODELS are skipped, not sent model='default'."""
+        from agent.auxiliary_client import _API_KEY_PROVIDER_AUX_MODELS
+
+        # Verify our known providers have entries
+        assert "gemini" in _API_KEY_PROVIDER_AUX_MODELS
+        assert "kimi-coding" in _API_KEY_PROVIDER_AUX_MODELS
+
+        # A random provider_id not in the dict should return None
+        assert _API_KEY_PROVIDER_AUX_MODELS.get("totally-unknown-provider") is None
+
+    def test_known_provider_gets_real_model(self):
+        """Known providers get a real model name, not 'default'."""
+        from agent.auxiliary_client import _API_KEY_PROVIDER_AUX_MODELS
+
+        for provider_id, model in _API_KEY_PROVIDER_AUX_MODELS.items():
+            assert model != "default", f"{provider_id} should not map to 'default'"
+            assert isinstance(model, str) and model.strip(), \
+                f"{provider_id} should have a non-empty model string"
+
+    def test_volcengine_byteplus_use_main_model_first(self):
+        """Volcengine/BytePlus use main-model-first — no entry in _API_KEY_PROVIDER_AUX_MODELS."""
+        from agent.auxiliary_client import _API_KEY_PROVIDER_AUX_MODELS
+
+        assert "volcengine" not in _API_KEY_PROVIDER_AUX_MODELS
+        assert "byteplus" not in _API_KEY_PROVIDER_AUX_MODELS
+
+
+class TestContractProviderAliases:
+    def test_coding_plan_aliases_normalize_to_canonical_provider(self):
+        from agent.auxiliary_client import _normalize_aux_provider
+
+        assert _normalize_aux_provider("volcengine-coding-plan") == "volcengine"
+        assert _normalize_aux_provider("byteplus-coding-plan") == "byteplus"
+
+
 # ---------------------------------------------------------------------------
 # _try_payment_fallback reason parameter (#7512 bug 3)
 # ---------------------------------------------------------------------------
@@ -253,35 +253,6 @@ class TestSummaryPrefixNormalization:


 class TestCompressWithClient:
-    def test_system_content_list_gets_compression_note_without_crashing(self):
-        mock_response = MagicMock()
-        mock_response.choices = [MagicMock()]
-        mock_response.choices[0].message.content = "summary text"
-
-        with patch("agent.context_compressor.get_model_context_length", return_value=100000):
-            c = ContextCompressor(model="test", quiet_mode=True, protect_first_n=2, protect_last_n=2)
-
-        msgs = [
-            {"role": "system", "content": [{"type": "text", "text": "system prompt"}]},
-            {"role": "user", "content": "msg 1"},
-            {"role": "assistant", "content": "msg 2"},
-            {"role": "user", "content": "msg 3"},
-            {"role": "assistant", "content": "msg 4"},
-            {"role": "user", "content": "msg 5"},
-            {"role": "assistant", "content": "msg 6"},
-            {"role": "user", "content": "msg 7"},
-        ]
-
-        with patch("agent.context_compressor.call_llm", return_value=mock_response):
-            result = c.compress(msgs)
-
-        assert isinstance(result[0]["content"], list)
-        assert any(
-            isinstance(block, dict)
-            and "compacted into a handoff summary" in block.get("text", "")
-            for block in result[0]["content"]
-        )
-
    def test_summarization_path(self):
        mock_client = MagicMock()
        mock_response = MagicMock()
@@ -489,41 +460,6 @@ class TestCompressWithClient:
        assert len(first_tail) == 1
        assert "summary text" in first_tail[0]["content"]

-    def test_double_collision_merges_summary_into_list_tail_content(self):
-        """Structured tail content should accept a merged summary without TypeError."""
-        mock_response = MagicMock()
-        mock_response.choices = [MagicMock()]
-        mock_response.choices[0].message.content = "summary text"
-
-        with patch("agent.context_compressor.get_model_context_length", return_value=100000):
-            c = ContextCompressor(model="test", quiet_mode=True, protect_first_n=3, protect_last_n=3)
-
-        msgs = [
-            {"role": "system", "content": "system prompt"},
-            {"role": "user", "content": "msg 1"},
-            {"role": "assistant", "content": "msg 2"},
-            {"role": "user", "content": "msg 3"},
-            {"role": "assistant", "content": "msg 4"},
-            {"role": "user", "content": "msg 5"},
-            {"role": "user", "content": [{"type": "text", "text": "msg 6"}]},
-            {"role": "assistant", "content": "msg 7"},
-            {"role": "user", "content": "msg 8"},
-        ]
-
-        with patch("agent.context_compressor.call_llm", return_value=mock_response):
-            result = c.compress(msgs)
-
-        merged_tail = next(
-            m for m in result
-            if m.get("role") == "user" and isinstance(m.get("content"), list)
-        )
-        assert isinstance(merged_tail["content"], list)
-        assert "summary text" in merged_tail["content"][0]["text"]
-        assert any(
-            isinstance(block, dict) and block.get("text") == "msg 6"
-            for block in merged_tail["content"]
-        )
-
    def test_double_collision_user_head_assistant_tail(self):
        """Reverse double collision: head ends with 'user', tail starts with 'assistant'.
        summary='assistant' collides with tail, 'user' collides with head → merge."""
@@ -949,94 +949,3 @@ class TestAdversarialEdgeCases:
        e = MockAPIError("server error", status_code=500, body={"message": None})
        result = classify_api_error(e)
        assert result is not None
-
-
-# ── Test: SSL/TLS transient errors ─────────────────────────────────────
-
-class TestSSLTransientPatterns:
-    """SSL/TLS alerts mid-stream should retry as timeout, not unknown, and
-    should NOT trigger context compression even on a large session.
-
-    Motivation: OpenSSL 3.x changed TLS alert error code format
-    (`SSLV3_ALERT_BAD_RECORD_MAC` → `SSL/TLS_ALERT_BAD_RECORD_MAC`),
-    breaking string-exact matching in downstream retry logic.  We match
-    stable substrings instead.
-    """
-
-    def test_bad_record_mac_classifies_as_timeout(self):
-        """OpenSSL 3.x mid-stream bad record mac alert."""
-        e = Exception("[SSL: BAD_RECORD_MAC] sslv3 alert bad record mac (_ssl.c:2580)")
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.timeout
-        assert result.retryable is True
-        assert result.should_compress is False
-
-    def test_openssl_3x_format_classifies_as_timeout(self):
-        """New format `ERR_SSL_SSL/TLS_ALERT_BAD_RECORD_MAC` still matches
-        because we key on both space- and underscore-separated forms of
-        the stable `bad_record_mac` token."""
-        e = Exception("ERR_SSL_SSL/TLS_ALERT_BAD_RECORD_MAC during streaming")
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.timeout
-        assert result.retryable is True
-        assert result.should_compress is False
-
-    def test_tls_alert_internal_error_classifies_as_timeout(self):
-        e = Exception("[SSL: TLSV1_ALERT_INTERNAL_ERROR] tlsv1 alert internal error")
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.timeout
-        assert result.retryable is True
-        assert result.should_compress is False
-
-    def test_ssl_handshake_failure_classifies_as_timeout(self):
-        e = Exception("ssl handshake failure during mid-stream")
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.timeout
-        assert result.retryable is True
-
-    def test_ssl_prefix_classifies_as_timeout(self):
-        """Python's generic '[SSL: XYZ]' prefix from the ssl module."""
-        e = Exception("[SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol")
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.timeout
-        assert result.retryable is True
-
-    def test_ssl_alert_on_large_session_does_not_compress(self):
-        """Critical: SSL alerts on big contexts must NOT trigger context
-        compression — compression is expensive and won't fix a transport
-        hiccup.  This is why _SSL_TRANSIENT_PATTERNS is separate from
-        _SERVER_DISCONNECT_PATTERNS.
-        """
-        e = Exception("[SSL: BAD_RECORD_MAC] sslv3 alert bad record mac")
-        result = classify_api_error(
-            e,
-            approx_tokens=180000,      # 90% of a 200k-context window
-            context_length=200000,
-            num_messages=300,
-        )
-        assert result.reason == FailoverReason.timeout
-        assert result.should_compress is False
-
-    def test_plain_disconnect_on_large_session_still_compresses(self):
-        """Regression guard: the context-overflow-via-disconnect path
-        (non-SSL disconnects on large sessions) must still trigger
-        compression.  Only SSL-specific disconnects skip it.
-        """
-        e = Exception("Server disconnected without sending a response")
-        result = classify_api_error(
-            e,
-            approx_tokens=180000,
-            context_length=200000,
-            num_messages=300,
-        )
-        assert result.reason == FailoverReason.context_overflow
-        assert result.should_compress is True
-
-    def test_real_ssl_error_type_classifies_as_timeout(self):
-        """Real ssl.SSLError instance — the type name alone (not message)
-        should route to the transport bucket."""
-        import ssl
-        e = ssl.SSLError("arbitrary ssl error")
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.timeout
-        assert result.retryable is True
@@ -106,25 +106,3 @@ class TestIsLocalEndpoint:
    ])
    def test_remote_endpoints(self, url):
        assert is_local_endpoint(url) is False
-
-    @pytest.mark.parametrize("url", [
-        "http://100.64.0.0:11434",            # lower bound of CGNAT block
-        "http://100.64.0.1:11434/v1",         # lower bound +1
-        "http://100.77.243.5:11434",          # representative Tailscale host
-        "https://100.100.100.100:443",        # Tailscale MagicDNS anchor
-        "https://100.127.255.254:443",        # upper bound -1
-        "http://100.127.255.255:11434",       # upper bound of CGNAT block
-    ])
-    def test_tailscale_cgnat_is_local(self, url):
-        """Tailscale 100.64.0.0/10 should be treated as local for timeout bumps."""
-        assert is_local_endpoint(url) is True
-
-    @pytest.mark.parametrize("url", [
-        "http://100.63.255.255:11434",        # just below CGNAT block
-        "http://100.128.0.1:11434",           # just above CGNAT block
-        "http://100.200.0.1:11434",           # well outside CGNAT
-        "http://99.64.0.1:11434",             # first octet wrong
-    ])
-    def test_near_but_not_cgnat_is_remote(self, url):
-        """Hosts adjacent to but outside 100.64.0.0/10 must not match."""
-        assert is_local_endpoint(url) is False
@@ -222,6 +222,22 @@ class TestGetModelContextLength:
        mock_fetch.return_value = {}
        assert get_model_context_length("unknown/never-heard-of-this") == CONTEXT_PROBE_TIERS[0]

+    @patch("agent.model_metadata.fetch_model_metadata")
+    def test_volcengine_contract_model_uses_contract_context_length(self, mock_fetch):
+        mock_fetch.return_value = {}
+        assert get_model_context_length(
+            "volcengine/doubao-seed-2-0-pro-260215",
+            provider="volcengine",
+        ) == 256000
+
+    @patch("agent.model_metadata.fetch_model_metadata")
+    def test_byteplus_contract_model_infers_provider_from_url(self, mock_fetch):
+        mock_fetch.return_value = {}
+        assert get_model_context_length(
+            "byteplus-coding-plan/kimi-k2.5",
+            base_url="https://ark.ap-southeast.bytepluses.com/api/coding/v3",
+        ) == 256000
+
    @patch("agent.model_metadata.fetch_model_metadata")
    def test_partial_match_in_defaults(self, mock_fetch):
        mock_fetch.return_value = {}
@@ -39,73 +39,6 @@ def test_normalize_usage_openai_subtracts_cached_prompt_tokens():
    assert normalized.output_tokens == 700


-def test_normalize_usage_openai_reads_top_level_anthropic_cache_fields():
-    """Some OpenAI-compatible proxies (OpenRouter, Vercel AI Gateway, Cline) expose
-    Anthropic-style cache token counts at the top level of the usage object when
-    routing Claude models, instead of nesting them in prompt_tokens_details.
-
-    Regression guard for the bug fixed in cline/cline#10266 — before this fix,
-    the chat-completions branch of normalize_usage() only read
-    prompt_tokens_details.cache_write_tokens and completely missed the
-    cache_creation_input_tokens case, so cache writes showed as 0 and reflected
-    inputTokens were overstated by the cache-write amount.
-    """
-    usage = SimpleNamespace(
-        prompt_tokens=1000,
-        completion_tokens=200,
-        prompt_tokens_details=SimpleNamespace(cached_tokens=500),
-        cache_creation_input_tokens=300,
-    )
-
-    normalized = normalize_usage(usage, provider="openrouter", api_mode="chat_completions")
-
-    # Expected: cache read from prompt_tokens_details.cached_tokens (preferred),
-    # cache write from top-level cache_creation_input_tokens (fallback).
-    assert normalized.cache_read_tokens == 500
-    assert normalized.cache_write_tokens == 300
-    # input_tokens = prompt_total - cache_read - cache_write = 1000 - 500 - 300 = 200
-    assert normalized.input_tokens == 200
-    assert normalized.output_tokens == 200
-
-
-def test_normalize_usage_openai_reads_top_level_cache_read_when_details_missing():
-    """Some proxies expose only top-level Anthropic-style fields with no
-    prompt_tokens_details object. Regression guard for cline/cline#10266.
-    """
-    usage = SimpleNamespace(
-        prompt_tokens=1000,
-        completion_tokens=200,
-        cache_read_input_tokens=500,
-        cache_creation_input_tokens=300,
-    )
-
-    normalized = normalize_usage(usage, provider="openrouter", api_mode="chat_completions")
-
-    assert normalized.cache_read_tokens == 500
-    assert normalized.cache_write_tokens == 300
-    assert normalized.input_tokens == 200
-
-
-def test_normalize_usage_openai_prefers_prompt_tokens_details_over_top_level():
-    """When both prompt_tokens_details and top-level Anthropic fields are
-    present, we prefer the OpenAI-standard nested fields. Top-level Anthropic
-    fields are only a fallback when the nested ones are absent/zero.
-    """
-    usage = SimpleNamespace(
-        prompt_tokens=1000,
-        completion_tokens=200,
-        prompt_tokens_details=SimpleNamespace(cached_tokens=600, cache_write_tokens=150),
-        # Intentionally different values — proving we ignore these when details exist.
-        cache_read_input_tokens=999,
-        cache_creation_input_tokens=999,
-    )
-
-    normalized = normalize_usage(usage, provider="openrouter", api_mode="chat_completions")
-
-    assert normalized.cache_read_tokens == 600
-    assert normalized.cache_write_tokens == 150
-
-
 def test_openrouter_models_api_pricing_is_converted_from_per_token_to_per_million(monkeypatch):
    monkeypatch.setattr(
        "agent.usage_pricing.fetch_model_metadata",
@@ -114,14 +114,6 @@ class TestAnthropicTransport:
        r = SimpleNamespace(content=[])
        assert transport.validate_response(r) is False

-    def test_validate_response_empty_content_with_end_turn_is_valid(self, transport):
-        r = SimpleNamespace(content=[], stop_reason="end_turn")
-        assert transport.validate_response(r) is True
-
-    def test_validate_response_empty_content_with_tool_use_is_invalid(self, transport):
-        r = SimpleNamespace(content=[], stop_reason="tool_use")
-        assert transport.validate_response(r) is False
-
    def test_validate_response_valid(self, transport):
        r = SimpleNamespace(content=[SimpleNamespace(type="text", text="hello")])
        assert transport.validate_response(r) is True
@@ -1,60 +0,0 @@
-"""Tests for the gateway /debug command."""
-
-from unittest.mock import patch
-
-import pytest
-
-from gateway.config import GatewayConfig, Platform
-from gateway.platforms.base import MessageEvent
-from gateway.session import SessionSource
-
-
-def _make_event(text="/debug", platform=Platform.TELEGRAM,
-                user_id="12345", chat_id="67890"):
-    source = SessionSource(
-        platform=platform,
-        user_id=user_id,
-        chat_id=chat_id,
-        user_name="testuser",
-    )
-    return MessageEvent(text=text, source=source)
-
-
-def _make_runner():
-    from gateway.run import GatewayRunner
-
-    runner = object.__new__(GatewayRunner)
-    runner.config = GatewayConfig()
-    runner.adapters = {}
-    return runner
-
-
-class TestHandleDebugCommand:
-    @pytest.mark.asyncio
-    async def test_debug_sweeps_expired_pastes_before_upload(self):
-        runner = _make_runner()
-        event = _make_event()
-
-        with patch("hermes_cli.debug._sweep_expired_pastes", return_value=(0, 0)) as mock_sweep, \
-             patch("hermes_cli.debug._capture_dump", return_value="dump"), \
-             patch("hermes_cli.debug.collect_debug_report", return_value="report"), \
-             patch("hermes_cli.debug.upload_to_pastebin", return_value="https://paste.rs/report"), \
-             patch("hermes_cli.debug._schedule_auto_delete"):
-            result = await runner._handle_debug_command(event)
-
-        mock_sweep.assert_called_once()
-        assert "https://paste.rs/report" in result
-
-    @pytest.mark.asyncio
-    async def test_debug_survives_sweep_failure(self):
-        runner = _make_runner()
-        event = _make_event()
-
-        with patch("hermes_cli.debug._sweep_expired_pastes", side_effect=RuntimeError("offline")), \
-             patch("hermes_cli.debug._capture_dump", return_value="dump"), \
-             patch("hermes_cli.debug.collect_debug_report", return_value="report"), \
-             patch("hermes_cli.debug.upload_to_pastebin", return_value="https://paste.rs/report"), \
-             patch("hermes_cli.debug._schedule_auto_delete"):
-            result = await runner._handle_debug_command(event)
-
-        assert "https://paste.rs/report" in result
@@ -199,89 +199,6 @@ async def test_auto_registered_command_with_args(adapter):
    )


-@pytest.mark.asyncio
-async def test_auto_registers_plugin_commands_for_discord(adapter):
-    """Plugin slash commands should appear as native Discord app commands."""
-    adapter._run_simple_slash = AsyncMock()
-
-    with patch(
-        "hermes_cli.plugins.get_plugin_commands",
-        return_value={
-            "metricas": {
-                "handler": lambda _a: "ok",
-                "description": "Metrics dashboard",
-                "args_hint": "dias:7 formato:json",
-                "plugin": "metrics-plugin",
-            }
-        },
-    ):
-        adapter._register_slash_commands()
-
-    tree_names = set(adapter._client.tree.commands.keys())
-    assert "metricas" in tree_names
-
-    metricas_cmd = adapter._client.tree.commands["metricas"]
-    interaction = SimpleNamespace()
-    await metricas_cmd.callback(interaction, args="dias:7 formato:json")
-    adapter._run_simple_slash.assert_awaited_once_with(
-        interaction, "/metricas dias:7 formato:json"
-    )
-
-
-@pytest.mark.asyncio
-async def test_auto_registered_plugin_command_without_args_hint(adapter):
-    """Plugin commands without args_hint should register as parameterless."""
-    adapter._run_simple_slash = AsyncMock()
-
-    with patch(
-        "hermes_cli.plugins.get_plugin_commands",
-        return_value={
-            "ping": {
-                "handler": lambda _a: "pong",
-                "description": "Ping the plugin",
-                "args_hint": "",
-                "plugin": "ping-plugin",
-            }
-        },
-    ):
-        adapter._register_slash_commands()
-
-    assert "ping" in adapter._client.tree.commands
-    ping_cmd = adapter._client.tree.commands["ping"]
-    interaction = SimpleNamespace()
-    await ping_cmd.callback(interaction)
-    adapter._run_simple_slash.assert_awaited_once_with(interaction, "/ping")
-
-
-@pytest.mark.asyncio
-async def test_plugin_command_name_conflict_skipped(adapter):
-    """A plugin command that collides with a built-in must not override it."""
-    adapter._run_simple_slash = AsyncMock()
-
-    with patch(
-        "hermes_cli.plugins.get_plugin_commands",
-        return_value={
-            "status": {
-                "handler": lambda _a: "plugin-status",
-                "description": "Plugin status",
-                "args_hint": "",
-                "plugin": "shadow-plugin",
-            }
-        },
-    ):
-        adapter._register_slash_commands()
-
-    # Built-ins are registered via @tree.command as plain functions. A
-    # plugin-registered override would install a _FakeCommand instance
-    # (has .callback) via tree.add_command. If the conflict-skip logic
-    # fires, the slot remains a bare function.
-    status_entry = adapter._client.tree.commands["status"]
-    assert callable(status_entry) and not hasattr(status_entry, "callback"), (
-        "plugin registration overrode the built-in /status command — "
-        "the already_registered skip must prevent this"
-    )
-
-
 # ------------------------------------------------------------------
 # _handle_thread_create_slash — success, session dispatch, failure
 # ------------------------------------------------------------------
@@ -220,99 +220,3 @@ class TestEmit:

        await reg.emit("agent:start")  # no context arg
        assert captured[0] == {}
-
-
-class TestEmitCollect:
-    """Tests for emit_collect() — returns handler return values for decision-style hooks."""
-
-    @pytest.mark.asyncio
-    async def test_collects_sync_return_values(self):
-        reg = HookRegistry()
-        reg._handlers["command:status"] = [
-            lambda _e, _c: {"decision": "allow"},
-            lambda _e, _c: {"decision": "deny", "message": "nope"},
-        ]
-
-        results = await reg.emit_collect("command:status", {})
-
-        assert results == [
-            {"decision": "allow"},
-            {"decision": "deny", "message": "nope"},
-        ]
-
-    @pytest.mark.asyncio
-    async def test_collects_async_return_values(self):
-        reg = HookRegistry()
-
-        async def _async_handler(_event_type, _ctx):
-            return {"decision": "handled", "message": "done"}
-
-        reg._handlers["command:ping"] = [_async_handler]
-
-        results = await reg.emit_collect("command:ping", {})
-
-        assert results == [{"decision": "handled", "message": "done"}]
-
-    @pytest.mark.asyncio
-    async def test_drops_none_return_values(self):
-        reg = HookRegistry()
-        reg._handlers["command:x"] = [
-            lambda _e, _c: None,  # fire-and-forget, returns nothing
-            lambda _e, _c: {"decision": "deny"},
-            lambda _e, _c: None,
-        ]
-
-        results = await reg.emit_collect("command:x", {})
-
-        assert results == [{"decision": "deny"}]
-
-    @pytest.mark.asyncio
-    async def test_handler_exception_does_not_abort_chain(self):
-        reg = HookRegistry()
-
-        def _raises(_e, _c):
-            raise ValueError("boom")
-
-        reg._handlers["command:x"] = [
-            _raises,
-            lambda _e, _c: {"decision": "allow"},
-        ]
-
-        results = await reg.emit_collect("command:x", {})
-
-        # First handler's exception is swallowed; second handler's value still collected.
-        assert results == [{"decision": "allow"}]
-
-    @pytest.mark.asyncio
-    async def test_wildcard_match_also_collected(self):
-        reg = HookRegistry()
-        reg._handlers["command:*"] = [lambda _e, _c: {"decision": "allow"}]
-        reg._handlers["command:reset"] = [lambda _e, _c: {"decision": "deny"}]
-
-        results = await reg.emit_collect("command:reset", {})
-
-        # Exact match fires first, then wildcard.
-        assert results == [{"decision": "deny"}, {"decision": "allow"}]
-
-    @pytest.mark.asyncio
-    async def test_no_handlers_returns_empty_list(self):
-        reg = HookRegistry()
-
-        results = await reg.emit_collect("unknown:event", {})
-
-        assert results == []
-
-    @pytest.mark.asyncio
-    async def test_default_context(self):
-        reg = HookRegistry()
-        captured = []
-
-        def _handler(event_type, context):
-            captured.append((event_type, context))
-            return None
-
-        reg._handlers["agent:start"] = [_handler]
-
-        await reg.emit_collect("agent:start")  # no context arg
-
-        assert captured == [("agent:start", {})]
@@ -1,201 +0,0 @@
-"""Regression tests for approval-state cleanup on session boundaries."""
-
-from datetime import datetime
-from unittest.mock import AsyncMock, MagicMock
-
-import pytest
-
-from gateway.config import Platform
-from gateway.platforms.base import MessageEvent
-from gateway.session import SessionEntry, SessionSource, build_session_key
-from tools import approval as approval_mod
-from tools.approval import (
-    approve_session,
-    enable_session_yolo,
-    is_approved,
-    is_session_yolo_enabled,
-)
-
-
-@pytest.fixture(autouse=True)
-def _clear_approval_state():
-    approval_mod._gateway_queues.clear()
-    approval_mod._gateway_notify_cbs.clear()
-    approval_mod._session_approved.clear()
-    approval_mod._session_yolo.clear()
-    approval_mod._permanent_approved.clear()
-    approval_mod._pending.clear()
-    yield
-    approval_mod._gateway_queues.clear()
-    approval_mod._gateway_notify_cbs.clear()
-    approval_mod._session_approved.clear()
-    approval_mod._session_yolo.clear()
-    approval_mod._permanent_approved.clear()
-    approval_mod._pending.clear()
-
-
-def _make_source() -> SessionSource:
-    return SessionSource(
-        platform=Platform.TELEGRAM,
-        user_id="u1",
-        chat_id="c1",
-        user_name="tester",
-        chat_type="dm",
-    )
-
-
-def _make_event(text: str) -> MessageEvent:
-    return MessageEvent(text=text, source=_make_source(), message_id="m1")
-
-
-def _make_entry(session_id: str, source: SessionSource | None = None) -> SessionEntry:
-    source = source or _make_source()
-    return SessionEntry(
-        session_key=build_session_key(source),
-        session_id=session_id,
-        created_at=datetime.now(),
-        updated_at=datetime.now(),
-        origin=source,
-        platform=source.platform,
-        chat_type=source.chat_type,
-    )
-
-
-def _make_resume_runner():
-    from gateway.run import GatewayRunner
-
-    source = _make_source()
-    session_key = build_session_key(source)
-    current_entry = _make_entry("current-session", source)
-    resumed_entry = _make_entry("resumed-session", source)
-
-    runner = object.__new__(GatewayRunner)
-    runner.adapters = {}
-    runner._background_tasks = set()
-    runner._async_flush_memories = AsyncMock()
-    runner._running_agents = {}
-    runner._running_agents_ts = {}
-    runner._busy_ack_ts = {}
-    runner._pending_approvals = {}
-    runner._agent_cache_lock = None
-    runner.session_store = MagicMock()
-    runner.session_store.get_or_create_session.return_value = current_entry
-    runner.session_store.switch_session.return_value = resumed_entry
-    runner.session_store.load_transcript.return_value = []
-    runner._session_db = MagicMock()
-    runner._session_db.resolve_session_by_title.return_value = "resumed-session"
-    runner._session_db.get_session_title.return_value = "Resumed Work"
-    return runner, session_key
-
-
-def _make_branch_runner():
-    from gateway.run import GatewayRunner
-
-    source = _make_source()
-    session_key = build_session_key(source)
-    current_entry = _make_entry("current-session", source)
-    branched_entry = _make_entry("branched-session", source)
-
-    runner = object.__new__(GatewayRunner)
-    runner.adapters = {}
-    runner.config = {}
-    runner._running_agents = {}
-    runner._running_agents_ts = {}
-    runner._busy_ack_ts = {}
-    runner._pending_approvals = {}
-    runner._agent_cache_lock = None
-    runner.session_store = MagicMock()
-    runner.session_store.get_or_create_session.return_value = current_entry
-    runner.session_store.load_transcript.return_value = [
-        {"role": "user", "content": "hello"},
-        {"role": "assistant", "content": "world"},
-    ]
-    runner.session_store.switch_session.return_value = branched_entry
-    runner._session_db = MagicMock()
-    runner._session_db.get_session_title.return_value = "Current Work"
-    runner._session_db.get_next_title_in_lineage.return_value = "Current Work #2"
-    return runner, session_key
-
-
-@pytest.mark.asyncio
-async def test_resume_clears_session_scoped_approval_and_yolo_state():
-    runner, session_key = _make_resume_runner()
-    other_key = "agent:main:telegram:dm:other-chat"
-
-    approve_session(session_key, "recursive delete")
-    approve_session(other_key, "recursive delete")
-    enable_session_yolo(session_key)
-    enable_session_yolo(other_key)
-    runner._pending_approvals[session_key] = {"command": "rm -rf /tmp/demo"}
-    runner._pending_approvals[other_key] = {"command": "rm -rf /tmp/other"}
-
-    result = await runner._handle_resume_command(_make_event("/resume Resumed Work"))
-
-    assert "Resumed session" in result
-    assert is_approved(session_key, "recursive delete") is False
-    assert is_session_yolo_enabled(session_key) is False
-    assert session_key not in runner._pending_approvals
-    assert is_approved(other_key, "recursive delete") is True
-    assert is_session_yolo_enabled(other_key) is True
-    assert other_key in runner._pending_approvals
-
-
-@pytest.mark.asyncio
-async def test_branch_clears_session_scoped_approval_and_yolo_state():
-    runner, session_key = _make_branch_runner()
-    other_key = "agent:main:telegram:dm:other-chat"
-
-    approve_session(session_key, "recursive delete")
-    approve_session(other_key, "recursive delete")
-    enable_session_yolo(session_key)
-    enable_session_yolo(other_key)
-    runner._pending_approvals[session_key] = {"command": "rm -rf /tmp/demo"}
-    runner._pending_approvals[other_key] = {"command": "rm -rf /tmp/other"}
-
-    result = await runner._handle_branch_command(_make_event("/branch"))
-
-    assert "Branched to" in result
-    assert is_approved(session_key, "recursive delete") is False
-    assert is_session_yolo_enabled(session_key) is False
-    assert session_key not in runner._pending_approvals
-    assert is_approved(other_key, "recursive delete") is True
-    assert is_session_yolo_enabled(other_key) is True
-    assert other_key in runner._pending_approvals
-
-
-def test_clear_session_boundary_security_state_is_scoped():
-    """The helper must wipe only the target session's approval/yolo state.
-
-    Also exercises the /new reset path indirectly: /new calls this helper,
-    so if the helper is scoped correctly, /new's clearing is correct too.
-    """
-    from gateway.run import GatewayRunner
-
-    runner = object.__new__(GatewayRunner)
-    runner._pending_approvals = {}
-
-    source = _make_source()
-    session_key = build_session_key(source)
-    other_key = "agent:main:telegram:dm:other-chat"
-
-    approve_session(session_key, "recursive delete")
-    approve_session(other_key, "recursive delete")
-    enable_session_yolo(session_key)
-    enable_session_yolo(other_key)
-    runner._pending_approvals[session_key] = {"command": "rm -rf /tmp/demo"}
-    runner._pending_approvals[other_key] = {"command": "rm -rf /tmp/other"}
-
-    runner._clear_session_boundary_security_state(session_key)
-
-    # Target session cleared
-    assert is_approved(session_key, "recursive delete") is False
-    assert is_session_yolo_enabled(session_key) is False
-    assert session_key not in runner._pending_approvals
-    # Other session untouched
-    assert is_approved(other_key, "recursive delete") is True
-    assert is_session_yolo_enabled(other_key) is True
-    assert other_key in runner._pending_approvals
-
-    # Empty session_key is a no-op
-    runner._clear_session_boundary_security_state("")
-    assert is_approved(other_key, "recursive delete") is True
@@ -65,11 +65,7 @@ class TestGatewayPidState:
        monkeypatch.setattr(status, "_get_process_start_time", lambda pid: 123)
        monkeypatch.setattr(status, "_read_process_cmdline", lambda pid: None)

-        assert status.acquire_gateway_runtime_lock() is True
-        try:
-            assert status.get_running_pid() == os.getpid()
-        finally:
-            status.release_gateway_runtime_lock()
+        assert status.get_running_pid() == os.getpid()

    def test_get_running_pid_accepts_script_style_gateway_cmdline(self, tmp_path, monkeypatch):
        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
@@ -89,11 +85,7 @@ class TestGatewayPidState:
            lambda pid: "/venv/bin/python /repo/hermes_cli/main.py gateway run --replace",
        )

-        assert status.acquire_gateway_runtime_lock() is True
-        try:
-            assert status.get_running_pid() == os.getpid()
-        finally:
-            status.release_gateway_runtime_lock()
+        assert status.get_running_pid() == os.getpid()

    def test_get_running_pid_accepts_explicit_pid_path_without_cleanup(self, tmp_path, monkeypatch):
        other_home = tmp_path / "profile-home"
@@ -110,116 +102,9 @@ class TestGatewayPidState:
        monkeypatch.setattr(status, "_get_process_start_time", lambda pid: 123)
        monkeypatch.setattr(status, "_read_process_cmdline", lambda pid: None)

-        lock_path = other_home / "gateway.lock"
-        lock_path.write_text(json.dumps({
-            "pid": os.getpid(),
-            "kind": "hermes-gateway",
-            "argv": ["python", "-m", "hermes_cli.main", "gateway"],
-            "start_time": 123,
-        }))
-        monkeypatch.setattr(status, "is_gateway_runtime_lock_active", lambda lock_path=None: True)
-
        assert status.get_running_pid(pid_path, cleanup_stale=False) == os.getpid()
        assert pid_path.exists()

-    def test_runtime_lock_claims_and_releases_liveness(self, tmp_path, monkeypatch):
-        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
-
-        assert status.is_gateway_runtime_lock_active() is False
-        assert status.acquire_gateway_runtime_lock() is True
-        assert status.is_gateway_runtime_lock_active() is True
-
-        status.release_gateway_runtime_lock()
-
-        assert status.is_gateway_runtime_lock_active() is False
-
-    def test_get_running_pid_treats_pid_file_as_stale_without_runtime_lock(self, tmp_path, monkeypatch):
-        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
-        pid_path = tmp_path / "gateway.pid"
-        pid_path.write_text(json.dumps({
-            "pid": os.getpid(),
-            "kind": "hermes-gateway",
-            "argv": ["python", "-m", "hermes_cli.main", "gateway"],
-            "start_time": 123,
-        }))
-
-        monkeypatch.setattr(status.os, "kill", lambda pid, sig: None)
-        monkeypatch.setattr(status, "_get_process_start_time", lambda pid: 123)
-        monkeypatch.setattr(status, "_read_process_cmdline", lambda pid: None)
-
-        assert status.get_running_pid() is None
-        assert not pid_path.exists()
-
-    def test_get_running_pid_cleans_stale_metadata_from_dead_foreign_pid(self, tmp_path, monkeypatch):
-        """Stale PID file from a *different* PID (crashed process) must still be cleaned.
-
-        Regression for: ``remove_pid_file()`` defensively refuses to delete a
-        PID file whose pid != ``os.getpid()`` to protect ``--replace``
-        handoffs.  Stale-cleanup must not go through that path or real
-        crashed-process PID files never get removed.
-        """
-        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
-        pid_path = tmp_path / "gateway.pid"
-        lock_path = tmp_path / "gateway.lock"
-
-        # PID that is guaranteed not alive and not our own.
-        dead_foreign_pid = 999999
-        assert dead_foreign_pid != os.getpid()
-
-        pid_path.write_text(json.dumps({
-            "pid": dead_foreign_pid,
-            "kind": "hermes-gateway",
-            "argv": ["python", "-m", "hermes_cli.main", "gateway"],
-            "start_time": 123,
-        }))
-        lock_path.write_text(json.dumps({
-            "pid": dead_foreign_pid,
-            "kind": "hermes-gateway",
-            "argv": ["python", "-m", "hermes_cli.main", "gateway"],
-            "start_time": 123,
-        }))
-
-        # No live lock holder → get_running_pid should clean both files.
-        assert status.get_running_pid() is None
-        assert not pid_path.exists()
-        assert not lock_path.exists()
-
-    def test_get_running_pid_falls_back_to_live_lock_record(self, tmp_path, monkeypatch):
-        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
-        pid_path = tmp_path / "gateway.pid"
-        pid_path.write_text(json.dumps({
-            "pid": 99999,
-            "kind": "hermes-gateway",
-            "argv": ["python", "-m", "hermes_cli.main", "gateway"],
-            "start_time": 123,
-        }))
-
-        monkeypatch.setattr(status, "_get_process_start_time", lambda pid: 123)
-        monkeypatch.setattr(status, "_read_process_cmdline", lambda pid: None)
-        monkeypatch.setattr(
-            status,
-            "_build_pid_record",
-            lambda: {
-                "pid": os.getpid(),
-                "kind": "hermes-gateway",
-                "argv": ["python", "-m", "hermes_cli.main", "gateway"],
-                "start_time": 123,
-            },
-        )
-        assert status.acquire_gateway_runtime_lock() is True
-
-        def fake_kill(pid, sig):
-            if pid == 99999:
-                raise ProcessLookupError
-            return None
-
-        monkeypatch.setattr(status.os, "kill", fake_kill)
-
-        try:
-            assert status.get_running_pid() == os.getpid()
-        finally:
-            status.release_gateway_runtime_lock()
-

 class TestGatewayRuntimeStatus:
    def test_write_runtime_status_overwrites_stale_pid_on_restart(self, tmp_path, monkeypatch):
@@ -41,11 +41,7 @@ def _make_runner():
    adapter.send = AsyncMock()
    runner.adapters = {Platform.TELEGRAM: adapter}
    runner._voice_mode = {}
-    runner.hooks = SimpleNamespace(
-        emit=AsyncMock(),
-        emit_collect=AsyncMock(return_value=[]),
-        loaded_hooks=False,
-    )
+    runner.hooks = SimpleNamespace(emit=AsyncMock(), loaded_hooks=False)

    session_entry = SessionEntry(
        session_key=build_session_key(_make_source()),
@@ -168,206 +164,3 @@ async def test_underscored_alias_for_hyphenated_builtin_not_flagged(monkeypatch)
    # Whatever /reload_mcp returns, it must not be the unknown-command guard.
    if result is not None:
        assert "Unknown command" not in result
-
-
-# ------------------------------------------------------------------
-# command:<name> decision hook — deny / handled / rewrite
-# ------------------------------------------------------------------
-
-@pytest.mark.asyncio
-async def test_command_hook_can_deny_before_dispatch(monkeypatch):
-    """A handler returning {"decision": "deny"} blocks a slash command early."""
-    import gateway.run as gateway_run
-
-    runner = _make_runner()
-    runner._run_agent = AsyncMock(
-        side_effect=AssertionError("denied slash command leaked to the agent")
-    )
-    runner._handle_status_command = AsyncMock(
-        side_effect=AssertionError("denied slash command reached its handler")
-    )
-    runner.hooks.emit_collect = AsyncMock(
-        return_value=[{"decision": "deny", "message": "Blocked by ACL"}]
-    )
-
-    monkeypatch.setattr(
-        gateway_run, "_resolve_runtime_agent_kwargs", lambda: {"api_key": "***"}
-    )
-
-    result = await runner._handle_message(_make_event("/status"))
-
-    assert result == "Blocked by ACL"
-    runner._run_agent.assert_not_called()
-    # The emit_collect call should use the canonical command name.
-    call_args = runner.hooks.emit_collect.await_args
-    assert call_args.args[0] == "command:status"
-
-
-@pytest.mark.asyncio
-async def test_command_hook_deny_without_message_uses_default(monkeypatch):
-    """A deny decision with no message falls back to a generic blocked string."""
-    import gateway.run as gateway_run
-
-    runner = _make_runner()
-    runner._handle_status_command = AsyncMock(
-        side_effect=AssertionError("denied slash command reached its handler")
-    )
-    runner.hooks.emit_collect = AsyncMock(return_value=[{"decision": "deny"}])
-
-    monkeypatch.setattr(
-        gateway_run, "_resolve_runtime_agent_kwargs", lambda: {"api_key": "***"}
-    )
-
-    result = await runner._handle_message(_make_event("/status"))
-
-    assert result is not None
-    assert "blocked" in result.lower()
-
-
-@pytest.mark.asyncio
-async def test_command_hook_can_mark_command_as_handled(monkeypatch):
-    """A handled decision short-circuits dispatch cleanly with a custom reply."""
-    import gateway.run as gateway_run
-
-    runner = _make_runner()
-    runner._handle_status_command = AsyncMock(
-        side_effect=AssertionError("handled slash command reached its handler")
-    )
-    runner.hooks.emit_collect = AsyncMock(
-        return_value=[{"decision": "handled", "message": "Already handled upstream"}]
-    )
-
-    monkeypatch.setattr(
-        gateway_run, "_resolve_runtime_agent_kwargs", lambda: {"api_key": "***"}
-    )
-
-    result = await runner._handle_message(_make_event("/status"))
-
-    assert result == "Already handled upstream"
-
-
-@pytest.mark.asyncio
-async def test_command_hook_allow_decision_is_passthrough(monkeypatch):
-    """A handler returning {"decision": "allow"} must NOT prevent normal dispatch."""
-    import gateway.run as gateway_run
-
-    runner = _make_runner()
-    runner._handle_status_command = AsyncMock(return_value="status: ok")
-    runner.hooks.emit_collect = AsyncMock(
-        return_value=[{"decision": "allow"}]
-    )
-
-    monkeypatch.setattr(
-        gateway_run, "_resolve_runtime_agent_kwargs", lambda: {"api_key": "***"}
-    )
-
-    result = await runner._handle_message(_make_event("/status"))
-
-    assert result == "status: ok"
-    runner._handle_status_command.assert_awaited_once()
-
-
-@pytest.mark.asyncio
-async def test_command_hook_non_dict_return_values_ignored(monkeypatch):
-    """Hook return values that aren't dicts must not break dispatch."""
-    import gateway.run as gateway_run
-
-    runner = _make_runner()
-    runner._handle_status_command = AsyncMock(return_value="status: ok")
-    runner.hooks.emit_collect = AsyncMock(
-        return_value=["some string", 42, None, {}]
-    )
-
-    monkeypatch.setattr(
-        gateway_run, "_resolve_runtime_agent_kwargs", lambda: {"api_key": "***"}
-    )
-
-    result = await runner._handle_message(_make_event("/status"))
-
-    assert result == "status: ok"
-
-
-@pytest.mark.asyncio
-async def test_command_hook_fires_for_plugin_registered_command(monkeypatch):
-    """Plugin-registered slash commands should also trigger command:<name> hooks."""
-    import gateway.run as gateway_run
-
-    runner = _make_runner()
-    runner._run_agent = AsyncMock(
-        side_effect=AssertionError("plugin command leaked to the agent")
-    )
-    runner.hooks.emit_collect = AsyncMock(
-        return_value=[{"decision": "handled", "message": "intercepted"}]
-    )
-
-    monkeypatch.setattr(
-        gateway_run, "_resolve_runtime_agent_kwargs", lambda: {"api_key": "***"}
-    )
-    # Stub plugin command lookup so is_gateway_known_command() recognizes /metricas.
-    from hermes_cli import plugins as _plugins_mod
-
-    monkeypatch.setattr(
-        _plugins_mod,
-        "get_plugin_commands",
-        lambda: {"metricas": {"description": "Metrics", "args_hint": "dias:7"}},
-    )
-
-    result = await runner._handle_message(_make_event("/metricas dias:7"))
-
-    assert result == "intercepted"
-    # Hook event name uses the plugin command as canonical.
-    call_args = runner.hooks.emit_collect.await_args
-    assert call_args.args[0] == "command:metricas"
-    # Args are passed through in both "args" and "raw_args" keys.
-    ctx = call_args.args[1]
-    assert ctx["raw_args"] == "dias:7"
-
-
-@pytest.mark.asyncio
-async def test_command_hook_rewrite_routes_to_plugin(monkeypatch):
-    """A rewrite decision should re-resolve the command and route to the new one."""
-    import gateway.run as gateway_run
-
-    runner = _make_runner()
-    runner._run_agent = AsyncMock(
-        side_effect=AssertionError("rewritten command leaked to the agent")
-    )
-
-    call_log = []
-
-    async def _emit_collect(event_type, ctx):
-        call_log.append(event_type)
-        if event_type == "command:status":
-            return [
-                {
-                    "decision": "rewrite",
-                    "command_name": "metricas",
-                    "raw_args": "dias:7",
-                }
-            ]
-        return []
-
-    runner.hooks.emit_collect = AsyncMock(side_effect=_emit_collect)
-
-    monkeypatch.setattr(
-        gateway_run, "_resolve_runtime_agent_kwargs", lambda: {"api_key": "***"}
-    )
-    from hermes_cli import plugins as _plugins_mod
-
-    monkeypatch.setattr(
-        _plugins_mod,
-        "get_plugin_commands",
-        lambda: {"metricas": {"description": "Metrics", "args_hint": "dias:7"}},
-    )
-    monkeypatch.setattr(
-        _plugins_mod,
-        "get_plugin_command_handler",
-        lambda name: (lambda args: f"metrics {args}") if name == "metricas" else None,
-    )
-
-    result = await runner._handle_message(_make_event("/status"))
-
-    assert result == "metrics dias:7"
-    # First emit_collect fires on the original command; after rewrite the
-    # dispatcher does NOT re-fire for the new command (one decision per turn).
-    assert call_log == ["command:status"]
@@ -42,6 +42,8 @@ class TestProviderRegistry:
        ("minimax-cn", "MiniMax (China)", "api_key"),
        ("ai-gateway", "Vercel AI Gateway", "api_key"),
        ("kilocode", "Kilo Code", "api_key"),
+        ("volcengine", "Volcengine", "api_key"),
+        ("byteplus", "BytePlus", "api_key"),
    ])
    def test_provider_registered(self, provider_id, name, auth_type):
        assert provider_id in PROVIDER_REGISTRY
@@ -111,6 +113,16 @@ class TestProviderRegistry:
        assert pconfig.api_key_env_vars == ("HF_TOKEN",)
        assert pconfig.base_url_env_var == "HF_BASE_URL"

+    def test_volcengine_env_vars(self):
+        pconfig = PROVIDER_REGISTRY["volcengine"]
+        assert pconfig.api_key_env_vars == ("VOLCENGINE_API_KEY",)
+        assert pconfig.base_url_env_var == ""
+
+    def test_byteplus_env_vars(self):
+        pconfig = PROVIDER_REGISTRY["byteplus"]
+        assert pconfig.api_key_env_vars == ("BYTEPLUS_API_KEY",)
+        assert pconfig.base_url_env_var == ""
+
    def test_base_urls(self):
        assert PROVIDER_REGISTRY["copilot"].inference_base_url == "https://api.githubcopilot.com"
        assert PROVIDER_REGISTRY["copilot-acp"].inference_base_url == "acp://copilot"
@@ -122,6 +134,8 @@ class TestProviderRegistry:
        assert PROVIDER_REGISTRY["ai-gateway"].inference_base_url == "https://ai-gateway.vercel.sh/v1"
        assert PROVIDER_REGISTRY["kilocode"].inference_base_url == "https://api.kilo.ai/api/gateway"
        assert PROVIDER_REGISTRY["huggingface"].inference_base_url == "https://router.huggingface.co/v1"
+        assert PROVIDER_REGISTRY["volcengine"].inference_base_url == "https://ark.cn-beijing.volces.com/api/v3"
+        assert PROVIDER_REGISTRY["byteplus"].inference_base_url == "https://ark.ap-southeast.bytepluses.com/api/v3"

    def test_oauth_providers_unchanged(self):
        """Ensure we didn't break the existing OAuth providers."""
@@ -147,6 +161,7 @@ PROVIDER_ENV_VARS = (
    "NOUS_API_KEY", "GITHUB_TOKEN", "GH_TOKEN",
    "OPENAI_BASE_URL", "HERMES_COPILOT_ACP_COMMAND", "COPILOT_CLI_PATH",
    "HERMES_COPILOT_ACP_ARGS", "COPILOT_ACP_BASE_URL",
+    "VOLCENGINE_API_KEY", "BYTEPLUS_API_KEY",
 )


@@ -232,6 +247,14 @@ class TestResolveProvider:
        assert resolve_provider("github-copilot-acp") == "copilot-acp"
        assert resolve_provider("copilot-acp-agent") == "copilot-acp"

+    def test_alias_volcengine_coding_plan(self):
+        assert resolve_provider("volcengine-coding-plan") == "volcengine"
+        assert resolve_provider("volcengine_coding_plan") == "volcengine"
+
+    def test_alias_byteplus_coding_plan(self):
+        assert resolve_provider("byteplus-coding-plan") == "byteplus"
+        assert resolve_provider("byteplus_coding_plan") == "byteplus"
+
    def test_explicit_huggingface(self):
        assert resolve_provider("huggingface") == "huggingface"

@@ -339,6 +362,23 @@ class TestApiKeyProviderStatus:
        assert status["configured"] is True
        assert status["base_url"] == STEPFUN_STEP_PLAN_CN_BASE_URL

+    def test_volcengine_status_uses_coding_plan_base_url(self, monkeypatch):
+        monkeypatch.setenv("VOLCENGINE_API_KEY", "volc-test-key")
+        monkeypatch.setattr(
+            "hermes_cli.auth.read_raw_config",
+            lambda: {
+                "model": {
+                    "provider": "volcengine",
+                    "default": "volcengine-coding-plan/doubao-seed-2.0-code",
+                }
+            },
+        )
+
+        status = get_api_key_provider_status("volcengine")
+
+        assert status["configured"] is True
+        assert status["base_url"] == "https://ark.cn-beijing.volces.com/api/coding/v3"
+
    def test_copilot_status_uses_gh_cli_token(self, monkeypatch):
        monkeypatch.setattr("hermes_cli.copilot_auth._try_gh_cli_token", lambda: "gho_gh_cli_token")
        status = get_api_key_provider_status("copilot")
@@ -394,6 +434,25 @@ class TestResolveApiKeyProviderCredentials:
        assert creds["base_url"] == "https://api.z.ai/api/paas/v4"
        assert creds["source"] == "GLM_API_KEY"

+    def test_resolve_byteplus_with_coding_plan_model_uses_coding_base_url(self, monkeypatch):
+        monkeypatch.setenv("BYTEPLUS_API_KEY", "byteplus-secret-key")
+        monkeypatch.setattr(
+            "hermes_cli.auth.read_raw_config",
+            lambda: {
+                "model": {
+                    "provider": "byteplus",
+                    "default": "byteplus-coding-plan/dola-seed-2.0-pro",
+                }
+            },
+        )
+
+        creds = resolve_api_key_provider_credentials("byteplus")
+
+        assert creds["provider"] == "byteplus"
+        assert creds["api_key"] == "byteplus-secret-key"
+        assert creds["base_url"] == "https://ark.ap-southeast.bytepluses.com/api/coding/v3"
+        assert creds["source"] == "BYTEPLUS_API_KEY"
+
    def test_resolve_copilot_with_github_token(self, monkeypatch):
        monkeypatch.setenv("GITHUB_TOKEN", "gh-env-secret")
        creds = resolve_api_key_provider_credentials("copilot")
@@ -1208,119 +1208,3 @@ class TestDiscordSkillCommandsByCategory:
        assert "axolotl" in names
        assert "vllm" in names
        assert len(uncategorized) == 0
-
-
-# ---------------------------------------------------------------------------
-# Plugin slash command integration
-# ---------------------------------------------------------------------------
-
-class TestPluginCommandEnumeration:
-    """Plugin commands registered via ctx.register_command() must be surfaced
-    by every gateway enumerator (Telegram menu, Slack subcommand map, etc.).
-    """
-
-    def _patch_plugin_commands(self, monkeypatch, commands):
-        """Monkeypatch hermes_cli.plugins.get_plugin_commands() to a fixed dict."""
-        from hermes_cli import plugins as _plugins_mod
-
-        monkeypatch.setattr(
-            _plugins_mod, "get_plugin_commands", lambda: dict(commands)
-        )
-
-    def test_plugin_command_appears_in_telegram_menu(self, monkeypatch):
-        """/metricas registered by a plugin must appear in Telegram BotCommand menu."""
-        self._patch_plugin_commands(monkeypatch, {
-            "metricas": {
-                "handler": lambda _a: "ok",
-                "description": "Metrics dashboard",
-                "args_hint": "dias:7",
-                "plugin": "metrics-plugin",
-            }
-        })
-        names = {name for name, _desc in telegram_bot_commands()}
-        assert "metricas" in names
-
-    def test_plugin_command_appears_in_slack_subcommand_map(self, monkeypatch):
-        """/hermes metricas must route through the Slack subcommand map."""
-        self._patch_plugin_commands(monkeypatch, {
-            "metricas": {
-                "handler": lambda _a: "ok",
-                "description": "Metrics",
-                "args_hint": "",
-                "plugin": "metrics-plugin",
-            }
-        })
-        mapping = slack_subcommand_map()
-        assert mapping.get("metricas") == "/metricas"
-
-    def test_plugin_command_does_not_shadow_builtin_in_slack(self, monkeypatch):
-        """If a plugin registers a name that collides with a built-in, the built-in mapping wins."""
-        self._patch_plugin_commands(monkeypatch, {
-            "status": {
-                "handler": lambda _a: "plugin-status",
-                "description": "Plugin status",
-                "args_hint": "",
-                "plugin": "shadow-plugin",
-            }
-        })
-        mapping = slack_subcommand_map()
-        # Built-in /status must still be present and not overwritten.
-        assert mapping.get("status") == "/status"
-
-    def test_plugin_command_with_hyphens_sanitized_for_telegram(self, monkeypatch):
-        """Plugin names containing hyphens must be underscore-normalized for Telegram."""
-        self._patch_plugin_commands(monkeypatch, {
-            "my-plugin-cmd": {
-                "handler": lambda _a: "ok",
-                "description": "desc",
-                "args_hint": "",
-                "plugin": "p",
-            }
-        })
-        names = {name for name, _desc in telegram_bot_commands()}
-        assert "my_plugin_cmd" in names
-        assert "my-plugin-cmd" not in names
-
-    def test_is_gateway_known_command_recognizes_plugin_commands(self, monkeypatch):
-        """is_gateway_known_command() must return True for plugin commands."""
-        from hermes_cli.commands import is_gateway_known_command
-
-        self._patch_plugin_commands(monkeypatch, {
-            "metricas": {
-                "handler": lambda _a: "ok",
-                "description": "Metrics",
-                "args_hint": "",
-                "plugin": "p",
-            }
-        })
-        assert is_gateway_known_command("metricas") is True
-        assert is_gateway_known_command("definitely-not-registered") is False
-
-    def test_is_gateway_known_command_still_recognizes_builtins(self, monkeypatch):
-        """Built-in commands must remain known even when plugin discovery fails."""
-        from hermes_cli import plugins as _plugins_mod
-        from hermes_cli.commands import is_gateway_known_command
-
-        def _boom():
-            raise RuntimeError("plugin system down")
-
-        monkeypatch.setattr(_plugins_mod, "get_plugin_commands", _boom)
-
-        assert is_gateway_known_command("status") is True
-        assert is_gateway_known_command(None) is False
-        assert is_gateway_known_command("") is False
-
-    def test_plugin_enumerator_handles_missing_plugin_manager(self, monkeypatch):
-        """Enumerators must never raise when plugin discovery raises."""
-        from hermes_cli import plugins as _plugins_mod
-
-        def _boom():
-            raise RuntimeError("plugin system down")
-
-        monkeypatch.setattr(_plugins_mod, "get_plugin_commands", _boom)
-
-        # Both calls should succeed and just return the built-in set.
-        tg_names = {name for name, _desc in telegram_bot_commands()}
-        slack_names = set(slack_subcommand_map())
-        assert "status" in tg_names
-        assert "status" in slack_names
@@ -137,105 +137,50 @@ class TestUploadToPastebin:
 # Log reading
 # ---------------------------------------------------------------------------

-class TestCaptureLogSnapshot:
-    """Test _capture_log_snapshot for log reading and truncation."""
+class TestReadFullLog:
+    """Test _read_full_log for standalone log uploads."""

    def test_reads_small_file(self, hermes_home):
-        from hermes_cli.debug import _capture_log_snapshot
+        from hermes_cli.debug import _read_full_log

-        snap = _capture_log_snapshot("agent", tail_lines=10)
-        assert snap.full_text is not None
-        assert "session started" in snap.full_text
-        assert "session started" in snap.tail_text
+        content = _read_full_log("agent")
+        assert content is not None
+        assert "session started" in content

    def test_returns_none_for_missing(self, tmp_path, monkeypatch):
        home = tmp_path / ".hermes"
        home.mkdir()
        monkeypatch.setenv("HERMES_HOME", str(home))

-        from hermes_cli.debug import _capture_log_snapshot
-        snap = _capture_log_snapshot("agent", tail_lines=10)
-        assert snap.full_text is None
-        assert snap.tail_text == "(file not found)"
+        from hermes_cli.debug import _read_full_log
+        assert _read_full_log("agent") is None

-    def test_empty_primary_reports_file_empty(self, hermes_home):
-        """Empty primary (no .1 fallback) surfaces as '(file empty)', not missing."""
+    def test_returns_none_for_empty(self, hermes_home):
+        # Truncate agent.log to zero bytes; an empty log should read as None
        (hermes_home / "logs" / "agent.log").write_text("")

-        from hermes_cli.debug import _capture_log_snapshot
-        snap = _capture_log_snapshot("agent", tail_lines=10)
-        assert snap.full_text is None
-        assert snap.tail_text == "(file empty)"
-
-    def test_race_truncate_after_resolve_reports_empty(self, hermes_home, monkeypatch):
-        """If the log is truncated between resolve and stat, say 'empty', not 'missing'."""
-        log_path = hermes_home / "logs" / "agent.log"
-        from hermes_cli import debug
-
-        monkeypatch.setattr(debug, "_resolve_log_path", lambda _name: log_path)
-        log_path.write_text("")
-
-        snap = debug._capture_log_snapshot("agent", tail_lines=10)
-        assert snap.path == log_path
-        assert snap.full_text is None
-        assert snap.tail_text == "(file empty)"
+        from hermes_cli.debug import _read_full_log
+        assert _read_full_log("agent") is None

    def test_truncates_large_file(self, hermes_home):
        """Files larger than max_bytes get tail-truncated."""
-        from hermes_cli.debug import _capture_log_snapshot
+        from hermes_cli.debug import _read_full_log

        # Write a file larger than 1KB
        big_content = "x" * 100 + "\n"
        (hermes_home / "logs" / "agent.log").write_text(big_content * 200)

-        snap = _capture_log_snapshot("agent", tail_lines=10, max_bytes=1024)
-        assert snap.full_text is not None
-        assert "truncated" in snap.full_text
-
-    def test_keeps_first_line_when_truncation_on_boundary(self, hermes_home):
-        """When truncation lands on a line boundary, keep the first full line."""
-        from hermes_cli.debug import _capture_log_snapshot
-
-        # File must exceed the initial chunk_size (8192) used by the
-        # backward-reading loop so the truncation path actually fires.
-        line = "A" * 99 + "\n"  # 100 bytes per line
-        num_lines = 200  # 20000 bytes
-        (hermes_home / "logs" / "agent.log").write_text(line * num_lines)
-
-        # max_bytes = 1000 = 100 * 10 → cut at byte 20000 - 1000 = 19000,
-        # and byte 19000 - 1 is '\n'.  Boundary hit → keep all 10 lines.
-        snap = _capture_log_snapshot("agent", tail_lines=5, max_bytes=1000)
-        assert snap.full_text is not None
-        assert "truncated" in snap.full_text
-        raw = snap.full_text.split("\n", 1)[1]
-        kept = [l for l in raw.strip().splitlines() if l.startswith("A")]
-        assert len(kept) == 10
-
-    def test_drops_partial_when_truncation_mid_line(self, hermes_home):
-        """When truncation lands mid-line, drop the partial fragment."""
-        from hermes_cli.debug import _capture_log_snapshot
-
-        line = "A" * 99 + "\n"  # 100 bytes per line
-        num_lines = 200  # 20000 bytes
-        (hermes_home / "logs" / "agent.log").write_text(line * num_lines)
-
-        # max_bytes = 950 doesn't divide evenly into 100 → mid-line cut.
-        snap = _capture_log_snapshot("agent", tail_lines=5, max_bytes=950)
-        assert snap.full_text is not None
-        assert "truncated" in snap.full_text
-        raw = snap.full_text.split("\n", 1)[1]
-        kept = [l for l in raw.strip().splitlines() if l.startswith("A")]
-        # 950 / 100 = 9.5 → 9 complete lines after dropping partial
-        assert len(kept) == 9
+        content = _read_full_log("agent", max_bytes=1024)
+        assert content is not None
+        assert "truncated" in content

    def test_unknown_log_returns_none(self, hermes_home):
-        from hermes_cli.debug import _capture_log_snapshot
-        snap = _capture_log_snapshot("nonexistent", tail_lines=10)
-        assert snap.full_text is None
+        from hermes_cli.debug import _read_full_log
+        assert _read_full_log("nonexistent") is None

    def test_falls_back_to_rotated_file(self, hermes_home):
        """When gateway.log doesn't exist, falls back to gateway.log.1."""
-        from hermes_cli.debug import _capture_log_snapshot
+        from hermes_cli.debug import _read_full_log

        logs_dir = hermes_home / "logs"
        # Remove the primary (if any) and create a .1 rotation
@@ -244,33 +189,33 @@ class TestCaptureLogSnapshot:
            "2026-04-12 10:00:00 INFO gateway.run: rotated content\n"
        )

-        snap = _capture_log_snapshot("gateway", tail_lines=10)
-        assert snap.full_text is not None
-        assert "rotated content" in snap.full_text
+        content = _read_full_log("gateway")
+        assert content is not None
+        assert "rotated content" in content

    def test_prefers_primary_over_rotated(self, hermes_home):
        """Primary log is used when it exists, even if .1 also exists."""
-        from hermes_cli.debug import _capture_log_snapshot
+        from hermes_cli.debug import _read_full_log

        logs_dir = hermes_home / "logs"
        (logs_dir / "gateway.log").write_text("primary content\n")
        (logs_dir / "gateway.log.1").write_text("rotated content\n")

-        snap = _capture_log_snapshot("gateway", tail_lines=10)
-        assert "primary content" in snap.full_text
-        assert "rotated" not in snap.full_text
+        content = _read_full_log("gateway")
+        assert "primary content" in content
+        assert "rotated" not in content

    def test_falls_back_when_primary_empty(self, hermes_home):
        """Empty primary log falls back to .1 rotation."""
-        from hermes_cli.debug import _capture_log_snapshot
+        from hermes_cli.debug import _read_full_log

        logs_dir = hermes_home / "logs"
        (logs_dir / "agent.log").write_text("")
        (logs_dir / "agent.log.1").write_text("rotated agent data\n")

-        snap = _capture_log_snapshot("agent", tail_lines=10)
-        assert snap.full_text is not None
-        assert "rotated agent data" in snap.full_text
+        content = _read_full_log("agent")
+        assert content is not None
+        assert "rotated agent data" in content


 # ---------------------------------------------------------------------------
@@ -338,44 +283,6 @@ class TestCollectDebugReport:
 class TestRunDebugShare:
    """Test the run_debug_share CLI handler."""

-    def test_share_sweeps_expired_pastes(self, hermes_home, capsys):
-        """Slash-command path should sweep old pending deletes before uploading."""
-        from hermes_cli.debug import run_debug_share
-
-        args = MagicMock()
-        args.lines = 50
-        args.expire = 7
-        args.local = False
-
-        with patch("hermes_cli.dump.run_dump"), \
-             patch("hermes_cli.debug._sweep_expired_pastes", return_value=(0, 0)) as mock_sweep, \
-             patch("hermes_cli.debug.upload_to_pastebin",
-                    return_value="https://paste.rs/test"):
-            run_debug_share(args)
-
-        mock_sweep.assert_called_once()
-        assert "Debug report uploaded" in capsys.readouterr().out
-
-    def test_share_survives_sweep_failure(self, hermes_home, capsys):
-        """Expired-paste cleanup is best-effort and must not block sharing."""
-        from hermes_cli.debug import run_debug_share
-
-        args = MagicMock()
-        args.lines = 50
-        args.expire = 7
-        args.local = False
-
-        with patch("hermes_cli.dump.run_dump"), \
-             patch(
-                 "hermes_cli.debug._sweep_expired_pastes",
-                 side_effect=RuntimeError("offline"),
-             ), \
-             patch("hermes_cli.debug.upload_to_pastebin",
-                    return_value="https://paste.rs/test"):
-            run_debug_share(args)
-
-        assert "https://paste.rs/test" in capsys.readouterr().out
-
    def test_local_flag_prints_full_logs(self, hermes_home, capsys):
        """--local prints the report plus full log contents."""
        from hermes_cli.debug import run_debug_share
@@ -433,55 +340,6 @@ class TestRunDebugShare:
        assert "--- hermes dump ---" in gateway_paste
        assert "--- full gateway.log ---" in gateway_paste

-    def test_share_keeps_report_and_full_log_on_same_snapshot(self, hermes_home, capsys):
-        """A mid-run rotation must not make full agent.log older than the report."""
-        from hermes_cli.debug import run_debug_share, collect_debug_report as real_collect_debug_report
-
-        logs_dir = hermes_home / "logs"
-        (logs_dir / "agent.log").write_text(
-            "2026-04-22 12:00:00 INFO agent: newest line\n"
-        )
-        (logs_dir / "agent.log.1").write_text(
-            "2026-04-10 12:00:00 INFO agent: old rotated line\n"
-        )
-
-        args = MagicMock()
-        args.lines = 50
-        args.expire = 7
-        args.local = False
-
-        uploaded_content = []
-
-        def _mock_upload(content, expiry_days=7):
-            uploaded_content.append(content)
-            return f"https://paste.rs/paste{len(uploaded_content)}"
-
-        def _wrapped_collect_debug_report(*, log_lines=200, dump_text="", log_snapshots=None):
-            report = real_collect_debug_report(
-                log_lines=log_lines,
-                dump_text=dump_text,
-                log_snapshots=log_snapshots,
-            )
-            # Simulate the live log rotating after the report is built but
-            # before the old implementation would have re-read agent.log for
-            # standalone upload.
-            (logs_dir / "agent.log").write_text("")
-            (logs_dir / "agent.log.1").write_text(
-                "2026-04-10 12:00:00 INFO agent: old rotated line\n"
-            )
-            return report
-
-        with patch("hermes_cli.dump.run_dump"), \
-             patch("hermes_cli.debug.collect_debug_report", side_effect=_wrapped_collect_debug_report), \
-             patch("hermes_cli.debug.upload_to_pastebin", side_effect=_mock_upload):
-            run_debug_share(args)
-
-        report_paste = uploaded_content[0]
-        agent_paste = uploaded_content[1]
-        assert "2026-04-22 12:00:00 INFO agent: newest line" in report_paste
-        assert "2026-04-22 12:00:00 INFO agent: newest line" in agent_paste
-        assert "old rotated line" not in agent_paste
-
    def test_share_skips_missing_logs(self, tmp_path, monkeypatch, capsys):
        """Only uploads logs that exist."""
        home = tmp_path / ".hermes"
@@ -121,12 +121,6 @@ def test_systemd_status_warns_when_linger_disabled(monkeypatch, tmp_path, capsys
            return SimpleNamespace(returncode=0, stdout="", stderr="")
        if cmd[:3] == ["systemctl", "--user", "is-active"]:
            return SimpleNamespace(returncode=0, stdout="active\n", stderr="")
-        if cmd[:3] == ["systemctl", "--user", "show"]:
-            return SimpleNamespace(
-                returncode=0,
-                stdout="ActiveState=active\nSubState=running\nResult=success\nExecMainStatus=0\n",
-                stderr="",
-            )
        raise AssertionError(f"Unexpected command: {cmd}")

    monkeypatch.setattr(gateway.subprocess, "run", fake_run)
@@ -358,24 +352,3 @@ class TestWaitForGatewayExit:

        assert killed == 2
        assert calls == [(11, True), (22, True)]
-
-
-class TestStopProfileGateway:
-    def test_stop_profile_gateway_keeps_pid_file_when_process_still_running(self, monkeypatch):
-        calls = {"kill": 0, "remove": 0}
-
-        monkeypatch.setattr("gateway.status.get_running_pid", lambda: 12345)
-        monkeypatch.setattr(
-            gateway.os,
-            "kill",
-            lambda pid, sig: calls.__setitem__("kill", calls["kill"] + 1),
-        )
-        monkeypatch.setattr("time.sleep", lambda _: None)
-        monkeypatch.setattr(
-            "gateway.status.remove_pid_file",
-            lambda: calls.__setitem__("remove", calls["remove"] + 1),
-        )
-
-        assert gateway.stop_profile_gateway() is True
-        assert calls["kill"] == 21
-        assert calls["remove"] == 0
@@ -77,10 +77,8 @@ class TestSystemdServiceRefresh:
        gateway_cli.systemd_restart()

        assert unit_path.read_text(encoding="utf-8") == "new unit\n"
-        assert calls[:4] == [
+        assert calls[:2] == [
            ["systemctl", "--user", "daemon-reload"],
-            ["systemctl", "--user", "show", gateway_cli.get_service_name(), "--no-pager", "--property", "ActiveState,SubState,Result,ExecMainStatus"],
-            ["systemctl", "--user", "reset-failed", gateway_cli.get_service_name()],
            ["systemctl", "--user", "reload-or-restart", gateway_cli.get_service_name()],
        ]

@@ -476,21 +474,13 @@ class TestGatewaySystemServiceRouting:
                raise ProcessLookupError()
        monkeypatch.setattr(os, "kill", fake_kill)

-        # Simulate systemctl reset-failed/start followed by an active unit
+        # Simulate systemctl is-active returning "active" with a new PID
        new_pid = [None]
        def fake_subprocess_run(cmd, **kwargs):
-            if "reset-failed" in cmd:
-                calls.append(("reset-failed", cmd))
-                return SimpleNamespace(stdout="", returncode=0)
-            if "start" in cmd:
-                calls.append(("start", cmd))
-                return SimpleNamespace(stdout="", returncode=0)
-            if "show" in cmd:
-                new_pid[0] = 999
-                return SimpleNamespace(
-                    stdout="ActiveState=active\nSubState=running\nResult=success\nExecMainStatus=0\n",
-                    returncode=0,
-                )
+            if "is-active" in cmd:
+                result = SimpleNamespace(stdout="active\n", returncode=0)
+                new_pid[0] = 999  # gateway is now running under a new PID
+                return result
            raise AssertionError(f"Unexpected systemctl call: {cmd}")

        monkeypatch.setattr(gateway_cli.subprocess, "run", fake_subprocess_run)
@@ -504,131 +494,9 @@ class TestGatewaySystemServiceRouting:
        gateway_cli.systemd_restart()

        assert ("self", 654) in calls
-        assert any(call[0] == "reset-failed" for call in calls)
-        assert any(call[0] == "start" for call in calls)
        out = capsys.readouterr().out.lower()
        assert "restarted" in out

-    def test_systemd_restart_recovers_failed_planned_restart(self, monkeypatch, capsys):
-        monkeypatch.setattr(gateway_cli, "_select_systemd_scope", lambda system=False: False)
-        monkeypatch.setattr(gateway_cli, "refresh_systemd_unit_if_needed", lambda system=False: None)
-        monkeypatch.setattr(
-            "gateway.status.read_runtime_status",
-            lambda: {"restart_requested": True, "gateway_state": "stopped"},
-        )
-        monkeypatch.setattr(gateway_cli, "_request_gateway_self_restart", lambda pid: False)
-
-        calls = []
-        started = {"value": False}
-
-        def fake_subprocess_run(cmd, **kwargs):
-            if "show" in cmd:
-                if not started["value"]:
-                    return SimpleNamespace(
-                        stdout=(
-                            "ActiveState=failed\n"
-                            "SubState=failed\n"
-                            "Result=exit-code\n"
-                            f"ExecMainStatus={GATEWAY_SERVICE_RESTART_EXIT_CODE}\n"
-                        ),
-                        returncode=0,
-                    )
-                return SimpleNamespace(
-                    stdout="ActiveState=active\nSubState=running\nResult=success\nExecMainStatus=0\n",
-                    returncode=0,
-                )
-            if "reset-failed" in cmd:
-                calls.append(("reset-failed", cmd))
-                return SimpleNamespace(stdout="", returncode=0)
-            if "start" in cmd:
-                started["value"] = True
-                calls.append(("start", cmd))
-                return SimpleNamespace(stdout="", returncode=0)
-            raise AssertionError(f"Unexpected command: {cmd}")
-
-        monkeypatch.setattr(gateway_cli.subprocess, "run", fake_subprocess_run)
-        monkeypatch.setattr(
-            "gateway.status.get_running_pid",
-            lambda: 999 if started["value"] else None,
-        )
-
-        gateway_cli.systemd_restart()
-
-        assert any(call[0] == "reset-failed" for call in calls)
-        assert any(call[0] == "start" for call in calls)
-        out = capsys.readouterr().out.lower()
-        assert "restarted" in out
-
-    def test_systemd_status_surfaces_planned_restart_failure(self, monkeypatch, capsys):
-        unit = SimpleNamespace(exists=lambda: True)
-        monkeypatch.setattr(gateway_cli, "_select_systemd_scope", lambda system=False: False)
-        monkeypatch.setattr(gateway_cli, "get_systemd_unit_path", lambda system=False: unit)
-        monkeypatch.setattr(gateway_cli, "has_conflicting_systemd_units", lambda: False)
-        monkeypatch.setattr(gateway_cli, "has_legacy_hermes_units", lambda: False)
-        monkeypatch.setattr(gateway_cli, "systemd_unit_is_current", lambda system=False: True)
-        monkeypatch.setattr(gateway_cli, "_runtime_health_lines", lambda: ["⚠ Last shutdown reason: Gateway restart requested"])
-        monkeypatch.setattr(gateway_cli, "get_systemd_linger_status", lambda: (True, ""))
-        monkeypatch.setattr(gateway_cli, "_read_systemd_unit_properties", lambda system=False: {
-            "ActiveState": "failed",
-            "SubState": "failed",
-            "Result": "exit-code",
-            "ExecMainStatus": str(GATEWAY_SERVICE_RESTART_EXIT_CODE),
-        })
-
-        calls = []
-
-        def fake_run_systemctl(args, **kwargs):
-            calls.append(args)
-            if args[:2] == ["status", gateway_cli.get_service_name()]:
-                return SimpleNamespace(returncode=0, stdout="", stderr="")
-            if args[:2] == ["is-active", gateway_cli.get_service_name()]:
-                return SimpleNamespace(returncode=3, stdout="failed\n", stderr="")
-            raise AssertionError(f"Unexpected args: {args}")
-
-        monkeypatch.setattr(gateway_cli, "_run_systemctl", fake_run_systemctl)
-
-        gateway_cli.systemd_status()
-
-        out = capsys.readouterr().out
-        assert "Planned restart is stuck in systemd failed state" in out
-
-    def test_gateway_status_dispatches_full_flag(self, monkeypatch):
-        user_unit = SimpleNamespace(exists=lambda: True)
-        system_unit = SimpleNamespace(exists=lambda: False)
-
-        monkeypatch.setattr(gateway_cli, "supports_systemd_services", lambda: True)
-        monkeypatch.setattr(gateway_cli, "is_termux", lambda: False)
-        monkeypatch.setattr(gateway_cli, "is_macos", lambda: False)
-        monkeypatch.setattr(
-            gateway_cli,
-            "get_systemd_unit_path",
-            lambda system=False: system_unit if system else user_unit,
-        )
-        monkeypatch.setattr(
-            gateway_cli,
-            "get_gateway_runtime_snapshot",
-            lambda system=False: gateway_cli.GatewayRuntimeSnapshot(
-                manager="systemd (user)",
-                service_installed=True,
-                service_running=False,
-                gateway_pids=(),
-                service_scope="user",
-            ),
-        )
-
-        calls = []
-        monkeypatch.setattr(
-            gateway_cli,
-            "systemd_status",
-            lambda deep=False, system=False, full=False: calls.append((deep, system, full)),
-        )
-
-        gateway_cli.gateway_command(
-            SimpleNamespace(gateway_command="status", deep=False, system=False, full=True)
-        )
-
-        assert calls == [(False, False, True)]
-
    def test_gateway_install_passes_system_flags(self, monkeypatch):
        monkeypatch.setattr(gateway_cli, "supports_systemd_services", lambda: True)
        monkeypatch.setattr(gateway_cli, "is_termux", lambda: False)
@@ -679,15 +547,11 @@ class TestGatewaySystemServiceRouting:
        )

        calls = []
-        monkeypatch.setattr(
-            gateway_cli,
-            "systemd_status",
-            lambda deep=False, system=False, full=False: calls.append((deep, system, full)),
-        )
+        monkeypatch.setattr(gateway_cli, "systemd_status", lambda deep=False, system=False: calls.append((deep, system)))

        gateway_cli.gateway_command(SimpleNamespace(gateway_command="status", deep=False, system=False))

-        assert calls == [(False, False, False)]
+        assert calls == [(False, False)]

    def test_gateway_status_reports_manual_process_when_service_is_stopped(self, monkeypatch, capsys):
        user_unit = SimpleNamespace(exists=lambda: True)
@@ -701,11 +565,7 @@ class TestGatewaySystemServiceRouting:
            "get_systemd_unit_path",
            lambda system=False: system_unit if system else user_unit,
        )
-        monkeypatch.setattr(
-            gateway_cli,
-            "systemd_status",
-            lambda deep=False, system=False, full=False: print("service stopped"),
-        )
+        monkeypatch.setattr(gateway_cli, "systemd_status", lambda deep=False, system=False: print("service stopped"))
        monkeypatch.setattr(
            gateway_cli,
            "get_gateway_runtime_snapshot",
@@ -1710,23 +1570,6 @@ class TestMigrateLegacyCommand:

        assert called == {"interactive": False, "dry_run": False}

-
-class TestGatewayStatusParser:
-    def test_gateway_status_subparser_accepts_full_flag(self):
-        import subprocess
-        import sys
-
-        result = subprocess.run(
-            [sys.executable, "-m", "hermes_cli.main", "gateway", "status", "-l", "--help"],
-            cwd=str(gateway_cli.PROJECT_ROOT),
-            capture_output=True,
-            text=True,
-            timeout=15,
-        )
-
-        assert result.returncode == 0
-        assert "unrecognized arguments" not in result.stderr
-
    def test_gateway_command_migrate_legacy_dry_run_passes_through(
        self, monkeypatch
    ):
@@ -179,6 +179,19 @@ class TestIssue6211NativeProviderPrefixNormalization:
        assert normalize_model_for_provider(model, target_provider) == expected


+class TestContractProviderPrefixNormalization:
+    @pytest.mark.parametrize("model,target_provider,expected", [
+        ("volcengine/doubao-seed-2-0-pro-260215", "volcengine", "doubao-seed-2-0-pro-260215"),
+        ("volcengine-coding-plan/doubao-seed-2.0-code", "volcengine", "doubao-seed-2.0-code"),
+        ("byteplus/seed-2-0-pro-260328", "byteplus", "seed-2-0-pro-260328"),
+        ("byteplus-coding-plan/dola-seed-2.0-pro", "byteplus", "dola-seed-2.0-pro"),
+    ])
+    def test_contract_provider_prefixes_strip_to_native_model(
+        self, model, target_provider, expected
+    ):
+        assert normalize_model_for_provider(model, target_provider) == expected
+
+
 # ── detect_vendor ──────────────────────────────────────────────────────

 class TestDetectVendor:
@@ -102,6 +102,31 @@ class TestProviderPersistsAfterModelSave:
        )
        assert model.get("default") == "kimi-k2.5"

+    def test_volcengine_contract_provider_persists_coding_plan_model(self, config_home, monkeypatch):
+        """Volcengine should persist a prefixed coding-plan model and matching base URL."""
+        monkeypatch.setenv("VOLCENGINE_API_KEY", "volc-test-key")
+
+        from hermes_cli.main import _model_flow_contract_provider
+        from hermes_cli.config import load_config
+
+        with patch(
+            "hermes_cli.auth._prompt_model_selection",
+            return_value="volcengine-coding-plan/doubao-seed-2.0-code",
+        ), patch(
+            "hermes_cli.auth.deactivate_provider",
+        ):
+            _model_flow_contract_provider(load_config(), "volcengine", "old-model")
+
+        import yaml
+
+        config = yaml.safe_load((config_home / "config.yaml").read_text()) or {}
+        model = config.get("model")
+        assert isinstance(model, dict), f"model should be dict, got {type(model)}"
+        assert model.get("provider") == "volcengine"
+        assert model.get("default") == "volcengine-coding-plan/doubao-seed-2.0-code"
+        assert model.get("base_url") == "https://ark.cn-beijing.volces.com/api/coding/v3"
+        assert "api_mode" not in model
+
    def test_copilot_provider_saved_when_selected(self, config_home):
        """_model_flow_copilot should persist provider/base_url/model together."""
        from hermes_cli.main import _model_flow_copilot
@@ -6,6 +6,7 @@ from hermes_cli.models import (
    OPENROUTER_MODELS, fetch_openrouter_models, model_ids, detect_provider_for_model,
    is_nous_free_tier, partition_nous_models_by_tier,
    check_nous_free_tier, _FREE_TIER_CACHE_TTL,
+    list_available_providers, provider_for_base_url,
 )
 import hermes_cli.models as _models_mod

@@ -291,6 +292,41 @@ class TestDetectProviderForModel:
        assert result is not None
        assert result[0] not in ("nous",)  # nous has claude models but shouldn't be suggested

+    def test_volcengine_coding_plan_model_detected(self):
+        result = detect_provider_for_model(
+            "volcengine-coding-plan/doubao-seed-2.0-code",
+            "openrouter",
+        )
+        assert result == ("volcengine", "volcengine-coding-plan/doubao-seed-2.0-code")
+
+    def test_byteplus_standard_model_detected(self):
+        result = detect_provider_for_model(
+            "byteplus/seed-2-0-pro-260328",
+            "openrouter",
+        )
+        assert result == ("byteplus", "byteplus/seed-2-0-pro-260328")
+
+
+class TestConfiguredBaseUrlProviderDetection:
+    def test_provider_for_base_url_detects_volcengine(self):
+        assert provider_for_base_url("https://ark.cn-beijing.volces.com/api/v3") == "volcengine"
+
+    def test_provider_for_base_url_detects_byteplus_coding(self):
+        assert provider_for_base_url("https://ark.ap-southeast.bytepluses.com/api/coding/v3") == "byteplus"
+
+    def test_known_builtin_endpoint_is_not_listed_as_custom(self, monkeypatch):
+        monkeypatch.setattr("hermes_cli.models._get_custom_base_url", lambda: "https://ark.cn-beijing.volces.com/api/v3")
+        monkeypatch.setattr(
+            "hermes_cli.auth.get_auth_status",
+            lambda pid: {"configured": pid == "volcengine", "logged_in": pid == "volcengine"},
+        )
+        monkeypatch.setattr("hermes_cli.auth.has_usable_secret", lambda value: False)
+
+        providers = {p["id"]: p for p in list_available_providers()}
+
+        assert providers["volcengine"]["authenticated"] is True
+        assert providers["custom"]["authenticated"] is False
+

 class TestIsNousFreeTier:
    """Tests for is_nous_free_tier — account tier detection."""
@@ -1,124 +0,0 @@
-"""Tests for the models.dev-preferred merge behavior in provider_model_ids
-and list_authenticated_providers.
-
-These guard the contract:
-
-  * For providers in ``_MODELS_DEV_PREFERRED`` (opencode-go, opencode-zen,
-    xiaomi, deepseek, smaller inference providers), both the CLI model
-    picker path (``provider_model_ids``) and the gateway ``/model`` picker
-    path (``list_authenticated_providers``) merge fresh models.dev entries
-    on top of the curated static list.
-  * OpenRouter and Nous Portal are NEVER merged — they keep their curated
-    (OpenRouter) or live-Portal (Nous) semantics.
-  * If models.dev is unreachable (offline / CI), the curated list is the
-    fallback — no crash, no empty list.
-
-Merging is what lets new models (e.g. ``mimo-v2.5-pro`` on opencode-go)
-appear in ``/model`` without a Hermes release.
-"""
-
-import os
-from unittest.mock import patch
-
-import pytest
-
-from hermes_cli.models import (
-    _MODELS_DEV_PREFERRED,
-    _merge_with_models_dev,
-    provider_model_ids,
-)
-
-
-class TestMergeHelper:
-    def test_merge_empty_mdev_returns_curated(self):
-        """When models.dev returns nothing, curated list is preserved verbatim."""
-        with patch("agent.models_dev.list_agentic_models", return_value=[]):
-            out = _merge_with_models_dev("opencode-go", ["mimo-v2-pro", "kimi-k2.6"])
-        assert out == ["mimo-v2-pro", "kimi-k2.6"]
-
-    def test_merge_mdev_raises_returns_curated(self):
-        """Offline / broken models.dev must not break the catalog path."""
-        def boom(_provider):
-            raise RuntimeError("network down")
-
-        with patch("agent.models_dev.list_agentic_models", side_effect=boom):
-            out = _merge_with_models_dev("opencode-go", ["mimo-v2-pro"])
-        assert out == ["mimo-v2-pro"]
-
-    def test_merge_mdev_first_then_curated_extras(self):
-        """models.dev entries come first; curated-only entries are appended."""
-        mdev = ["mimo-v2.5-pro", "mimo-v2-pro", "kimi-k2.6"]
-        curated = ["kimi-k2.6", "kimi-k2.5", "mimo-v2-pro"]  # kimi-k2.5 is curated-only
-        with patch("agent.models_dev.list_agentic_models", return_value=mdev):
-            out = _merge_with_models_dev("opencode-go", curated)
-        # models.dev entries first (in order), then curated-only entries
-        assert out == ["mimo-v2.5-pro", "mimo-v2-pro", "kimi-k2.6", "kimi-k2.5"]
-
-    def test_merge_case_insensitive_dedup(self):
-        """Dedup is case-insensitive but preserves the first occurrence's casing."""
-        mdev = ["MiniMax-M2.7"]
-        curated = ["minimax-m2.7", "minimax-m2.5"]
-        with patch("agent.models_dev.list_agentic_models", return_value=mdev):
-            out = _merge_with_models_dev("minimax", curated)
-        # models.dev casing wins since it came first
-        assert out == ["MiniMax-M2.7", "minimax-m2.5"]
-
-
-class TestProviderModelIdsPreferred:
-    def test_opencode_go_is_preferred(self):
-        assert "opencode-go" in _MODELS_DEV_PREFERRED
-
-    def test_opencode_go_includes_fresh_models_dev_entries(self):
-        """provider_model_ids('opencode-go') adds models.dev entries on top."""
-        mdev = ["mimo-v2.5-pro", "mimo-v2.5", "mimo-v2-pro", "kimi-k2.6"]
-        with patch("agent.models_dev.list_agentic_models", return_value=mdev):
-            out = provider_model_ids("opencode-go")
-        # Fresh models must surface (this is exactly the reported bug fix:
-        # mimo-v2.5-pro should be pickable on opencode-go).
-        assert "mimo-v2.5-pro" in out
-        assert "mimo-v2.5" in out
-        # Curated entries are still present.
-        assert "mimo-v2-pro" in out
-        assert "kimi-k2.6" in out
-
-    def test_opencode_go_offline_falls_back_to_curated(self):
-        """Offline models.dev → curated-only list, no crash."""
-        with patch("agent.models_dev.list_agentic_models", return_value=[]):
-            out = provider_model_ids("opencode-go")
-        # Curated floor (see hermes_cli/models.py _PROVIDER_MODELS["opencode-go"])
-        assert "mimo-v2-pro" in out
-        assert "kimi-k2.6" in out
-
-    def test_opencode_zen_includes_fresh_models(self):
-        """opencode-zen follows the same pattern as opencode-go."""
-        assert "opencode-zen" in _MODELS_DEV_PREFERRED
-        mdev = ["claude-opus-4-7", "kimi-k2.6", "glm-5.1"]
-        with patch("agent.models_dev.list_agentic_models", return_value=mdev):
-            out = provider_model_ids("opencode-zen")
-        assert "claude-opus-4-7" in out
-        assert "kimi-k2.6" in out
-
-
-class TestOpenRouterAndNousUnchanged:
-    """Per Teknium: openrouter and nous are NEVER merged with models.dev."""
-
-    def test_openrouter_not_in_preferred_set(self):
-        assert "openrouter" not in _MODELS_DEV_PREFERRED
-
-    def test_nous_not_in_preferred_set(self):
-        assert "nous" not in _MODELS_DEV_PREFERRED
-
-    def test_openrouter_does_not_call_merge(self):
-        """openrouter takes its own live path — merge helper must NOT run."""
-        with patch(
-            "hermes_cli.models._merge_with_models_dev",
-            side_effect=AssertionError("merge should not be called for openrouter"),
-        ):
-            # Even if model_ids() fails for some other reason, we just care
-            # that the merge path isn't invoked.
-            try:
-                provider_model_ids("openrouter")
-            except AssertionError:
-                raise
-            except Exception:
-                pass  # model_ids() may fail in the hermetic test env — that's fine.
@@ -6,41 +6,16 @@ from unittest.mock import patch
 from hermes_cli.model_switch import list_authenticated_providers


-# Minimum set of models that must be present for opencode-go no matter
-# whether the picker sourced its list from curated-only or curated+models.dev.
-# The curated list in hermes_cli/models.py defines the floor; models.dev only
-# ever adds names on top of it via _merge_with_models_dev.
-_OPENCODE_GO_REQUIRED = {
-    "kimi-k2.6",
-    "kimi-k2.5",
-    "glm-5.1",
-    "glm-5",
-    "mimo-v2-pro",
-    "mimo-v2-omni",
-    "minimax-m2.7",
-    "minimax-m2.5",
-}
-
-
@patch.dict(os.environ, {"OPENCODE_GO_API_KEY": "test-key"}, clear=False)
 def test_opencode_go_appears_when_api_key_set():
    """opencode-go should appear in list_authenticated_providers when OPENCODE_GO_API_KEY is set."""
-    providers = list_authenticated_providers(current_provider="openrouter", max_models=50)
-
+    providers = list_authenticated_providers(current_provider="openrouter")
+    
    # Find opencode-go in results
    opencode_go = next((p for p in providers if p["slug"] == "opencode-go"), None)
-
+    
    assert opencode_go is not None, "opencode-go should appear when OPENCODE_GO_API_KEY is set"
-    # Behavior check: the curated floor must be present. The list may also
-    # include extra models.dev entries (e.g. mimo-v2.5-pro) when the registry
-    # is reachable — that's the whole point of the models.dev-preferred merge
-    # introduced for opencode-go, so don't pin to an exact list here.
-    present = set(opencode_go["models"])
-    missing = _OPENCODE_GO_REQUIRED - present
-    assert not missing, (
-        f"opencode-go picker should include the curated floor; missing: {sorted(missing)}. "
-        f"Got: {opencode_go['models']}"
-    )
+    assert opencode_go["models"] == ["kimi-k2.6", "kimi-k2.5", "glm-5.1", "glm-5", "mimo-v2-pro", "mimo-v2-omni", "minimax-m2.7", "minimax-m2.5"]
    # opencode-go can appear as "built-in" (from PROVIDER_TO_MODELS_DEV when
    # models.dev is reachable) or "hermes" (from HERMES_OVERLAYS fallback when
    # the API is unavailable, e.g. in CI).
@@ -51,10 +26,10 @@ def test_opencode_go_not_appears_when_no_creds():
    """opencode-go should NOT appear when no credentials are set."""
    # Ensure OPENCODE_GO_API_KEY is not set
    env_without_key = {k: v for k, v in os.environ.items() if k != "OPENCODE_GO_API_KEY"}
-
+    
    with patch.dict(os.environ, env_without_key, clear=True):
        providers = list_authenticated_providers(current_provider="openrouter")
-
+        
        # opencode-go should not be in results
        opencode_go = next((p for p in providers if p["slug"] == "opencode-go"), None)
        assert opencode_go is None, "opencode-go should not appear without credentials"
@@ -787,33 +787,6 @@ class TestPluginCommands:
        assert entry["handler"] is handler
        assert entry["description"] == "My custom command"
        assert entry["plugin"] == "test-plugin"
-        # args_hint defaults to empty string when not passed.
-        assert entry["args_hint"] == ""
-
-    def test_register_command_with_args_hint(self):
-        """args_hint is stored and surfaced for gateway-native UI registration."""
-        mgr = PluginManager()
-        manifest = PluginManifest(name="test-plugin", source="user")
-        ctx = PluginContext(manifest, mgr)
-
-        ctx.register_command(
-            "metricas",
-            lambda a: a,
-            description="Metrics dashboard",
-            args_hint="dias:7 formato:json",
-        )
-
-        entry = mgr._plugin_commands["metricas"]
-        assert entry["args_hint"] == "dias:7 formato:json"
-
-    def test_register_command_args_hint_whitespace_trimmed(self):
-        """args_hint leading/trailing whitespace is stripped."""
-        mgr = PluginManager()
-        manifest = PluginManifest(name="test-plugin", source="user")
-        ctx = PluginContext(manifest, mgr)
-
-        ctx.register_command("foo", lambda a: a, args_hint="  <file>  ")
-        assert mgr._plugin_commands["foo"]["args_hint"] == "<file>"

    def test_register_command_normalizes_name(self):
        """Names are lowercased, stripped, and leading slashes removed."""
@@ -1,172 +0,0 @@
-"""Unit tests for hermes_cli.pty_bridge — PTY spawning + byte forwarding.
-
-These tests drive the bridge with minimal POSIX processes (echo, env, sleep,
-printf) to verify it behaves like a PTY you can read/write/resize/close.
-"""
-
-from __future__ import annotations
-
-import os
-import sys
-import time
-
-import pytest
-
-pytest.importorskip("ptyprocess", reason="ptyprocess not installed")
-
-from hermes_cli.pty_bridge import PtyBridge, PtyUnavailableError
-
-
-skip_on_windows = pytest.mark.skipif(
-    sys.platform.startswith("win"), reason="PTY bridge is POSIX-only"
-)
-
-
-def _read_until(bridge: PtyBridge, needle: bytes, timeout: float = 5.0) -> bytes:
-    """Accumulate PTY output until we see `needle` or time out."""
-    deadline = time.monotonic() + timeout
-    buf = bytearray()
-    while time.monotonic() < deadline:
-        chunk = bridge.read(timeout=0.2)
-        if chunk is None:
-            break
-        buf.extend(chunk)
-        if needle in buf:
-            return bytes(buf)
-    return bytes(buf)
-
-
-@skip_on_windows
-class TestPtyBridgeSpawn:
-    def test_is_available_on_posix(self):
-        assert PtyBridge.is_available() is True
-
-    def test_spawn_returns_bridge_with_pid(self):
-        bridge = PtyBridge.spawn(["true"])
-        try:
-            assert bridge.pid > 0
-        finally:
-            bridge.close()
-
-    def test_spawn_raises_on_missing_argv0(self, tmp_path):
-        with pytest.raises((FileNotFoundError, OSError)):
-            PtyBridge.spawn([str(tmp_path / "definitely-not-a-real-binary")])
-
-
-@skip_on_windows
-class TestPtyBridgeIO:
-    def test_reads_child_stdout(self):
-        bridge = PtyBridge.spawn(["/bin/sh", "-c", "printf hermes-ok"])
-        try:
-            output = _read_until(bridge, b"hermes-ok")
-            assert b"hermes-ok" in output
-        finally:
-            bridge.close()
-
-    def test_write_sends_to_child_stdin(self):
-        # `cat` with no args echoes stdin back to stdout.  We write a line and
-        # read it back; close() in the finally block then shuts cat down.
-        bridge = PtyBridge.spawn(["/bin/cat"])
-        try:
-            bridge.write(b"hello-pty\n")
-            output = _read_until(bridge, b"hello-pty")
-            assert b"hello-pty" in output
-        finally:
-            bridge.close()
-
-    def test_read_returns_none_after_child_exits(self):
-        bridge = PtyBridge.spawn(["/bin/sh", "-c", "printf done"])
-        try:
-            _read_until(bridge, b"done")
-            # Give the child a beat to exit cleanly, then drain until EOF.
-            deadline = time.monotonic() + 3.0
-            while bridge.is_alive() and time.monotonic() < deadline:
-                bridge.read(timeout=0.1)
-            # Next reads after exit should return None (EOF), not raise.
-            got_none = False
-            for _ in range(10):
-                if bridge.read(timeout=0.1) is None:
-                    got_none = True
-                    break
-            assert got_none, "PtyBridge.read did not return None after child EOF"
-        finally:
-            bridge.close()
-
-
-@skip_on_windows
-class TestPtyBridgeResize:
-    def test_resize_updates_child_winsize(self):
-        # tput cols/lines report the window size from the TTY ioctl (TIOCGWINSZ).
-        # Spawn a shell, resize, then ask tput for the dimensions.
-        bridge = PtyBridge.spawn(
-            ["/bin/sh", "-c", "sleep 0.1; tput cols; tput lines"],
-            cols=80,
-            rows=24,
-        )
-        try:
-            bridge.resize(cols=123, rows=45)
-            output = _read_until(bridge, b"45", timeout=5.0)
-            # tput prints just the numbers, one per line
-            assert b"123" in output
-            assert b"45" in output
-        finally:
-            bridge.close()
-
-
-@skip_on_windows
-class TestPtyBridgeClose:
-    def test_close_is_idempotent(self):
-        bridge = PtyBridge.spawn(["/bin/sh", "-c", "sleep 30"])
-        bridge.close()
-        bridge.close()  # must not raise
-        assert not bridge.is_alive()
-
-    def test_close_terminates_long_running_child(self):
-        bridge = PtyBridge.spawn(["/bin/sh", "-c", "sleep 30"])
-        pid = bridge.pid
-        bridge.close()
-        # Give the kernel a moment to reap
-        deadline = time.monotonic() + 3.0
-        reaped = False
-        while time.monotonic() < deadline:
-            try:
-                os.kill(pid, 0)
-                time.sleep(0.05)
-            except ProcessLookupError:
-                reaped = True
-                break
-        assert reaped, f"pid {pid} still running after close()"
-
-
-@skip_on_windows
-class TestPtyBridgeEnv:
-    def test_cwd_is_respected(self, tmp_path):
-        bridge = PtyBridge.spawn(
-            ["/bin/sh", "-c", "pwd"],
-            cwd=str(tmp_path),
-        )
-        try:
-            output = _read_until(bridge, str(tmp_path).encode())
-            assert str(tmp_path).encode() in output
-        finally:
-            bridge.close()
-
-    def test_env_is_forwarded(self):
-        bridge = PtyBridge.spawn(
-            ["/bin/sh", "-c", "printf %s \"$HERMES_PTY_TEST\""],
-            env={**os.environ, "HERMES_PTY_TEST": "pty-env-works"},
-        )
-        try:
-            output = _read_until(bridge, b"pty-env-works")
-            assert b"pty-env-works" in output
-        finally:
-            bridge.close()
-
-
-class TestPtyBridgeUnavailable:
-    """Platform fallback semantics — PtyUnavailableError is importable and
-    carries a user-readable message."""
-
-    def test_error_carries_user_message(self):
-        err = PtyUnavailableError("platform not supported")
-        assert "platform" in str(err)
@@ -268,6 +268,7 @@ class TestCliBrandingHelpers:

    def test_prompt_toolkit_style_overrides_cover_tui_classes(self):
        from hermes_cli.skin_engine import set_active_skin, get_prompt_toolkit_style_overrides
+
        set_active_skin("ares")
        overrides = get_prompt_toolkit_style_overrides()
        required = {
@@ -276,13 +277,6 @@ class TestCliBrandingHelpers:
            "prompt",
            "prompt-working",
            "hint",
-            "status-bar",
-            "status-bar-strong",
-            "status-bar-dim",
-            "status-bar-good",
-            "status-bar-warn",
-            "status-bar-bad",
-            "status-bar-critical",
            "input-rule",
            "image-badge",
            "completion-menu",
@@ -331,15 +325,6 @@ class TestCliBrandingHelpers:
        overrides = get_prompt_toolkit_style_overrides()
        assert overrides["prompt"] == skin.get_color("prompt")
        assert overrides["input-rule"] == skin.get_color("input_rule")
-        assert overrides["status-bar"] == (
-            f"bg:{skin.get_color('status_bar_bg')} {skin.get_color('status_bar_text')}"
-        )
-        assert overrides["status-bar-strong"] == (
-            f"bg:{skin.get_color('status_bar_bg')} {skin.get_color('status_bar_strong')} bold"
-        )
-        assert overrides["status-bar-critical"] == (
-            f"bg:{skin.get_color('status_bar_bg')} {skin.get_color('status_bar_critical')} bold"
-        )
        assert overrides["clarify-title"] == f"{skin.get_color('banner_title')} bold"
        assert overrides["sudo-prompt"] == f"{skin.get_color('ui_error')} bold"
        assert overrides["approval-title"] == f"{skin.get_color('ui_warn')} bold"
@@ -1256,186 +1256,3 @@ class TestStatusRemoteGateway:
        assert data["gateway_running"] is True
        assert data["gateway_pid"] is None
        assert data["gateway_state"] == "running"
-
-
-# ---------------------------------------------------------------------------
-# /api/pty WebSocket — terminal bridge for the dashboard "Chat" tab.
-#
-# These tests drive the endpoint with a tiny fake command (typically ``cat``
-# or ``sh -c 'printf …'``) instead of the real ``hermes --tui`` binary.  The
-# endpoint resolves its argv through ``_resolve_chat_argv``, so tests
-# monkeypatch that hook.
-# ---------------------------------------------------------------------------
-
-import sys
-
-
-skip_on_windows = pytest.mark.skipif(
-    sys.platform.startswith("win"), reason="PTY bridge is POSIX-only"
-)
-
-
-@skip_on_windows
-class TestPtyWebSocket:
-    @pytest.fixture(autouse=True)
-    def _setup(self, monkeypatch, _isolate_hermes_home):
-        from starlette.testclient import TestClient
-
-        import hermes_cli.web_server as ws
-
-        # Avoid exec'ing the actual TUI in tests: every test below installs
-        # its own fake argv via ``ws._resolve_chat_argv``.
-        self.ws_module = ws
-        self.token = ws._SESSION_TOKEN
-        self.client = TestClient(ws.app)
-
-    def _url(self, token: str | None = None, **params: str) -> str:
-        tok = token if token is not None else self.token
-        # TestClient.websocket_connect takes the path; it reconstructs the
-        # query string, so we pass it inline.
-        from urllib.parse import urlencode
-
-        q = {"token": tok, **params}
-        return f"/api/pty?{urlencode(q)}"
-
-    def test_rejects_missing_token(self, monkeypatch):
-        monkeypatch.setattr(
-            self.ws_module,
-            "_resolve_chat_argv",
-            lambda resume=None: (["/bin/cat"], None, None),
-        )
-        from starlette.websockets import WebSocketDisconnect
-
-        with pytest.raises(WebSocketDisconnect) as exc:
-            with self.client.websocket_connect("/api/pty"):
-                pass
-        assert exc.value.code == 4401
-
-    def test_rejects_bad_token(self, monkeypatch):
-        monkeypatch.setattr(
-            self.ws_module,
-            "_resolve_chat_argv",
-            lambda resume=None: (["/bin/cat"], None, None),
-        )
-        from starlette.websockets import WebSocketDisconnect
-
-        with pytest.raises(WebSocketDisconnect) as exc:
-            with self.client.websocket_connect(self._url(token="wrong")):
-                pass
-        assert exc.value.code == 4401
-
-    def test_streams_child_stdout_to_client(self, monkeypatch):
-        monkeypatch.setattr(
-            self.ws_module,
-            "_resolve_chat_argv",
-            lambda resume=None: (
-                ["/bin/sh", "-c", "printf hermes-ws-ok"],
-                None,
-                None,
-            ),
-        )
-        with self.client.websocket_connect(self._url()) as conn:
-            # Drain frames until we see the needle or time out.  TestClient's
-            # receive_bytes blocks; loop until we have the signal byte string.
-            buf = b""
-            import time
-
-            deadline = time.monotonic() + 5.0
-            while time.monotonic() < deadline:
-                try:
-                    frame = conn.receive_bytes()
-                except Exception:
-                    break
-                if frame:
-                    buf += frame
-                if b"hermes-ws-ok" in buf:
-                    break
-            assert b"hermes-ws-ok" in buf
-
-    def test_client_input_reaches_child_stdin(self, monkeypatch):
-        # ``cat`` echoes stdin back, so a write → read round-trip proves
-        # the full duplex path.
-        monkeypatch.setattr(
-            self.ws_module,
-            "_resolve_chat_argv",
-            lambda resume=None: (["/bin/cat"], None, None),
-        )
-        with self.client.websocket_connect(self._url()) as conn:
-            conn.send_bytes(b"round-trip-payload\n")
-            buf = b""
-            import time
-
-            deadline = time.monotonic() + 5.0
-            while time.monotonic() < deadline:
-                frame = conn.receive_bytes()
-                if frame:
-                    buf += frame
-                if b"round-trip-payload" in buf:
-                    break
-            assert b"round-trip-payload" in buf
-
-    def test_resize_escape_is_forwarded(self, monkeypatch):
-        # Resize escape gets intercepted and applied via TIOCSWINSZ,
-        # then ``tput cols/lines`` reports the new dimensions back.
-        monkeypatch.setattr(
-            self.ws_module,
-            "_resolve_chat_argv",
-            # sleep gives the test time to push the resize before tput runs
-            lambda resume=None: (
-                ["/bin/sh", "-c", "sleep 0.15; tput cols; tput lines"],
-                None,
-                None,
-            ),
-        )
-        with self.client.websocket_connect(self._url()) as conn:
-            conn.send_text("\x1b[RESIZE:99;41]")
-            buf = b""
-            import time
-
-            deadline = time.monotonic() + 5.0
-            while time.monotonic() < deadline:
-                frame = conn.receive_bytes()
-                if frame:
-                    buf += frame
-                if b"99" in buf and b"41" in buf:
-                    break
-            assert b"99" in buf and b"41" in buf
-
-    def test_unavailable_platform_closes_with_message(self, monkeypatch):
-        from hermes_cli.pty_bridge import PtyUnavailableError
-
-        def _raise(argv, **kwargs):
-            raise PtyUnavailableError("pty missing for tests")
-
-        monkeypatch.setattr(
-            self.ws_module,
-            "_resolve_chat_argv",
-            lambda resume=None: (["/bin/cat"], None, None),
-        )
-        # Patch PtyBridge.spawn at the web_server module's binding.
-        import hermes_cli.web_server as ws_mod
-
-        monkeypatch.setattr(ws_mod.PtyBridge, "spawn", classmethod(lambda cls, *a, **k: _raise(*a, **k)))
-
-        with self.client.websocket_connect(self._url()) as conn:
-            # Expect a final text frame with the error message, then close.
-            msg = conn.receive_text()
-            assert "pty missing" in msg or "unavailable" in msg.lower() or "pty" in msg.lower()
-
-    def test_resume_parameter_is_forwarded_to_argv(self, monkeypatch):
-        captured: dict = {}
-
-        def fake_resolve(resume=None):
-            captured["resume"] = resume
-            return (["/bin/sh", "-c", "printf resume-arg-ok"], None, None)
-
-        monkeypatch.setattr(self.ws_module, "_resolve_chat_argv", fake_resolve)
-
-        with self.client.websocket_connect(self._url(resume="sess-42")) as conn:
-            # Drain briefly so the handler actually invokes the resolver.
-            try:
-                conn.receive_bytes()
-            except Exception:
-                pass
-        assert captured.get("resume") == "sess-42"
-
@@ -155,29 +155,3 @@ class TestFallbackChainAdvancement:
            ]
            assert agent._try_activate_fallback() is True
            assert agent.model == "gpt-4o"
-
-    def test_resolves_key_env_for_fallback_provider(self):
-        fbs = [
-            {
-                "provider": "custom",
-                "model": "fallback-model",
-                "base_url": "https://fallback.example/v1",
-                "key_env": "MY_FALLBACK_KEY",
-            }
-        ]
-        agent = _make_agent(fallback_model=fbs)
-        with (
-            patch.dict("os.environ", {"MY_FALLBACK_KEY": "env-secret"}, clear=False),
-            patch(
-                "agent.auxiliary_client.resolve_provider_client",
-                return_value=(
-                    _mock_client(
-                        base_url="https://fallback.example/v1",
-                        api_key="env-secret",
-                    ),
-                    "fallback-model",
-                ),
-            ) as mock_rpc,
-        ):
-            assert agent._try_activate_fallback() is True
-            assert mock_rpc.call_args.kwargs["explicit_api_key"] == "env-secret"
@@ -372,91 +372,6 @@ class TestStripThinkBlocks:
        assert "mixed" not in result
        assert "final" in result

-    # ─── Tool-call XML block stripping (openclaw/openclaw#67318) ─────────
-    # Some open models (notably Gemma variants via OpenRouter) emit
-    # standalone tool-call XML inside assistant content instead of via the
-    # structured `tool_calls` field. Left unstripped, raw XML leaks to
-    # gateway users (Discord/Telegram/Matrix) and the CLI.
-
-    def test_tool_call_block_stripped(self, agent):
-        text = '<tool_call>{"name": "read_file", "arguments": {"path": "/tmp/x"}}</tool_call> done'
-        result = agent._strip_think_blocks(text)
-        assert "<tool_call>" not in result
-        assert "read_file" not in result
-        assert "done" in result
-
-    def test_function_calls_block_stripped(self, agent):
-        text = '<function_calls>[{"name":"x"}]</function_calls>after'
-        result = agent._strip_think_blocks(text)
-        assert "<function_calls>" not in result
-        assert "after" in result
-
-    def test_gemma_function_name_block_stripped(self, agent):
-        """Gemma-style: <function name="read"><parameter>...</parameter></function>."""
-        text = (
-            'Let me check the file.\n'
-            '<function name="read_file"><parameter name="path">/tmp/x.md</parameter></function>\n'
-            'Here is the result.'
-        )
-        result = agent._strip_think_blocks(text)
-        assert '<function name="read_file">' not in result
-        assert "/tmp/x.md" not in result
-        assert "Let me check the file." in result
-        assert "Here is the result." in result
-
-    def test_gemma_function_multiline_payload_stripped(self, agent):
-        text = (
-            'Reading now.\n'
-            '<function name="read_file">\n'
-            '  <parameter name="path">/etc/passwd</parameter>\n'
-            '</function>\n'
-            'Done.'
-        )
-        result = agent._strip_think_blocks(text)
-        assert "/etc/passwd" not in result
-        assert "Reading now." in result
-        assert "Done." in result
-
-    def test_function_mention_in_prose_preserved(self, agent):
-        """'Use <function> in JavaScript.' — no name attr, not at block boundary
-        in a way that suggests tool call. Must survive."""
-        text = "In JS you can use <function> declarations for hoisting."
-        result = agent._strip_think_blocks(text)
-        # Prose mention has no name="..." attribute -> not stripped
-        assert "declarations for hoisting" in result
-
-    def test_function_with_attr_in_middle_of_sentence_preserved(self, agent):
-        """Docs example: 'Use <function name="x">...</function> in docs.'
-        The sentence-middle position without a preceding punctuation block
-        boundary means it is NOT stripped. Prose context remains."""
-        text = 'You can write <function name="x">y</function> inline.'
-        result = agent._strip_think_blocks(text)
-        # Without a leading block boundary (no punctuation before), leaves intact
-        assert "You can write" in result
-        assert "inline" in result
-
-    def test_stray_function_close_tag_removed(self, agent):
-        text = "answer</function> trailing"
-        result = agent._strip_think_blocks(text)
-        assert "</function>" not in result
-        assert "answer" in result
-        assert "trailing" in result
-
-    def test_dangling_function_open_tag_preserved(self, agent):
-        """A streamed-but-truncated <function name="..."> block with no close
-        is intentionally NOT stripped (OpenClaw's asymmetry). The tail of a
-        streaming reply may still be valuable to the user."""
-        text = 'Checking: <function name="read">'
-        result = agent._strip_think_blocks(text)
-        assert "Checking:" in result
-
-    def test_mixed_reasoning_and_tool_call_both_stripped(self, agent):
-        text = '<think>let me plan</think><tool_call>{"name":"x"}</tool_call>final answer'
-        result = agent._strip_think_blocks(text)
-        assert "let me plan" not in result
-        assert "<tool_call>" not in result
-        assert "final answer" in result
-

 class TestExtractReasoning:
    def test_reasoning_field(self, agent):
@@ -13,8 +13,7 @@ They do NOT boot the full AIAgent — the prologue-fix guarantees are pure
 function contracts at module scope.
 """

-from run_agent import _summarize_user_message_for_log
-from agent.codex_responses_adapter import _chat_content_to_responses_parts
+from run_agent import _chat_content_to_responses_parts, _summarize_user_message_for_log


 class TestSummarizeUserMessageForLog:
@@ -1133,225 +1133,3 @@ class TestPartialToolCallWarning:
            f"Unexpected warning on text-only partial stream: {content!r}"
        )

-
-class TestSilentRetryMidToolCall:
-    """Regression: when the stream dies mid tool-call JSON after text was
-    already delivered, we previously stubbed the turn with a "retry manually"
-    warning.  Now: if the error is a transient connection error AND a tool
-    call was in flight, silently retry the stream (the user sees a brief
-    reconnect marker + duplicated preamble, which is strictly better than
-    a lost action).  If no tool call was in flight, or the error isn't
-    transient, the existing stub-with-warning behaviour is preserved.
-    """
-
-    @patch("run_agent.AIAgent._replace_primary_openai_client")
-    @patch("run_agent.AIAgent._create_request_openai_client")
-    @patch("run_agent.AIAgent._close_request_openai_client")
-    def test_silent_retry_recovers_tool_call(
-        self, mock_close, mock_create, mock_replace,
-    ):
-        """First attempt: text + partial tool-call + connection drop.
-        Second attempt: text + complete tool-call.  Response should contain
-        the recovered tool call; no warning stub should be returned."""
-        from run_agent import AIAgent
-        import httpx as _httpx
-
-        attempts = {"n": 0}
-
-        def _first_stream():
-            yield _make_stream_chunk(content="Let me write the audit: ")
-            yield _make_stream_chunk(tool_calls=[
-                _make_tool_call_delta(index=0, tc_id="call_1", name="write_file"),
-            ])
-            yield _make_stream_chunk(tool_calls=[
-                _make_tool_call_delta(index=0, arguments='{"path": "/tmp/x", '),
-            ])
-            raise _httpx.RemoteProtocolError("peer closed connection")
-
-        def _second_stream():
-            yield _make_stream_chunk(content="Let me write the audit: ")
-            yield _make_stream_chunk(tool_calls=[
-                _make_tool_call_delta(index=0, tc_id="call_1", name="write_file"),
-            ])
-            yield _make_stream_chunk(tool_calls=[
-                _make_tool_call_delta(
-                    index=0, arguments='{"path": "/tmp/x", "content": "hi"}',
-                ),
-            ])
-            yield _make_stream_chunk(finish_reason="tool_calls")
-
-        def _pick_stream(*a, **kw):
-            attempts["n"] += 1
-            return _first_stream() if attempts["n"] == 1 else _second_stream()
-
-        mock_client = MagicMock()
-        mock_client.chat.completions.create.side_effect = _pick_stream
-        mock_create.return_value = mock_client
-
-        agent = AIAgent(
-            api_key="test-key",
-            base_url="https://openrouter.ai/api/v1",
-            model="test/model",
-            quiet_mode=True,
-            skip_context_files=True,
-            skip_memory=True,
-        )
-        agent.api_mode = "chat_completions"
-        agent._interrupt_requested = False
-
-        fired_deltas: list = []
-        agent._fire_stream_delta = lambda text: fired_deltas.append(text)
-
-        import os as _os
-        _prev = _os.environ.get("HERMES_STREAM_RETRIES")
-        _os.environ["HERMES_STREAM_RETRIES"] = "2"
-        try:
-            response = agent._interruptible_streaming_api_call({})
-        finally:
-            if _prev is None:
-                _os.environ.pop("HERMES_STREAM_RETRIES", None)
-            else:
-                _os.environ["HERMES_STREAM_RETRIES"] = _prev
-
-        assert attempts["n"] == 2, (
-            f"Expected silent retry (2 attempts), got {attempts['n']}"
-        )
-        # Response should carry the recovered tool call, not a warning stub.
-        msg = response.choices[0].message
-        tool_calls = getattr(msg, "tool_calls", None)
-        assert tool_calls, (
-            f"Silent retry should recover the tool call, got tool_calls={tool_calls!r} "
-            f"content={getattr(msg, 'content', None)!r}"
-        )
-        _tc0 = tool_calls[0]
-        _name = (
-            _tc0["function"]["name"] if isinstance(_tc0, dict)
-            else _tc0.function.name
-        )
-        assert _name == "write_file"
-        # User saw a reconnect marker between attempts.
-        assert any("reconnecting" in d.lower() for d in fired_deltas), (
-            f"Expected a reconnect marker delta, fired_deltas={fired_deltas}"
-        )
-        # Stub-path warning must NOT appear (this was the whole point).
-        joined = "".join(fired_deltas)
-        assert "Stream stalled" not in joined, (
-            f"Stub-path warning leaked into silent-retry path: {joined!r}"
-        )
-
-    @patch("run_agent.AIAgent._replace_primary_openai_client")
-    @patch("run_agent.AIAgent._create_request_openai_client")
-    @patch("run_agent.AIAgent._close_request_openai_client")
-    def test_silent_retry_exhausted_falls_back_to_stub(
-        self, mock_close, mock_create, mock_replace,
-    ):
-        """When all retry attempts fail with connection errors, fall back
-        to the original stub-with-warning behaviour so the user isn't left
-        with zero signal."""
-        from run_agent import AIAgent
-        import httpx as _httpx
-
-        def _always_fails():
-            yield _make_stream_chunk(content="Let me write the audit: ")
-            yield _make_stream_chunk(tool_calls=[
-                _make_tool_call_delta(index=0, tc_id="call_1", name="write_file"),
-            ])
-            raise _httpx.RemoteProtocolError("peer closed connection")
-
-        mock_client = MagicMock()
-        mock_client.chat.completions.create.side_effect = lambda *a, **kw: _always_fails()
-        mock_create.return_value = mock_client
-
-        agent = AIAgent(
-            api_key="test-key",
-            base_url="https://openrouter.ai/api/v1",
-            model="test/model",
-            quiet_mode=True,
-            skip_context_files=True,
-            skip_memory=True,
-        )
-        agent.api_mode = "chat_completions"
-        agent._interrupt_requested = False
-
-        fired_deltas: list = []
-        agent._fire_stream_delta = lambda text: fired_deltas.append(text)
-
-        import os as _os
-        _prev = _os.environ.get("HERMES_STREAM_RETRIES")
-        _os.environ["HERMES_STREAM_RETRIES"] = "1"
-        try:
-            response = agent._interruptible_streaming_api_call({})
-        finally:
-            if _prev is None:
-                _os.environ.pop("HERMES_STREAM_RETRIES", None)
-            else:
-                _os.environ["HERMES_STREAM_RETRIES"] = _prev
-
-        # After retries exhaust, the stub-with-warning path must engage.
-        content = response.choices[0].message.content or ""
-        assert "Stream stalled mid tool-call" in content, (
-            f"Exhausted-retry fallback dropped the user-visible warning: {content!r}"
-        )
-        assert response.choices[0].message.tool_calls is None
-
-    @patch("run_agent.AIAgent._replace_primary_openai_client")
-    @patch("run_agent.AIAgent._create_request_openai_client")
-    @patch("run_agent.AIAgent._close_request_openai_client")
-    def test_no_silent_retry_for_text_only_stall(
-        self, mock_close, mock_create, mock_replace,
-    ):
-        """Text-only stall (no tool call in flight) must NOT trigger silent
-        retry — that's the case where the user saw the model's text reply
-        and retrying would duplicate it with no benefit."""
-        from run_agent import AIAgent
-        import httpx as _httpx
-
-        attempts = {"n": 0}
-
-        def _text_stall(*a, **kw):
-            attempts["n"] += 1
-
-            def _gen():
-                yield _make_stream_chunk(content="Here's my answer so far")
-                raise _httpx.RemoteProtocolError("peer closed connection")
-            return _gen()
-
-        mock_client = MagicMock()
-        mock_client.chat.completions.create.side_effect = _text_stall
-        mock_create.return_value = mock_client
-
-        agent = AIAgent(
-            api_key="test-key",
-            base_url="https://openrouter.ai/api/v1",
-            model="test/model",
-            quiet_mode=True,
-            skip_context_files=True,
-            skip_memory=True,
-        )
-        agent.api_mode = "chat_completions"
-        agent._interrupt_requested = False
-        agent._current_streamed_assistant_text = "Here's my answer so far"
-
-        import os as _os
-        _prev = _os.environ.get("HERMES_STREAM_RETRIES")
-        _os.environ["HERMES_STREAM_RETRIES"] = "2"
-        try:
-            response = agent._interruptible_streaming_api_call({})
-        finally:
-            if _prev is None:
-                _os.environ.pop("HERMES_STREAM_RETRIES", None)
-            else:
-                _os.environ["HERMES_STREAM_RETRIES"] = _prev
-
-        # Only one attempt: text-only stall short-circuits retry.
-        assert attempts["n"] == 1, (
-            f"Text-only stall should not silent-retry, got {attempts['n']} attempts"
-        )
-        content = response.choices[0].message.content or ""
-        assert content == "Here's my answer so far", (
-            f"Text-only stall regressed: {content!r}"
-        )
-        assert "Stream stalled" not in content, (
-            f"Text-only stall should not emit tool-call warning: {content!r}"
-        )
-
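Review note: the removed streaming tests above save and restore `HERMES_STREAM_RETRIES` by hand in each test body. A minimal sketch of the same save/restore pattern as a reusable context manager follows. The name `temp_env` is hypothetical and does not exist in the repository; pytest's built-in `monkeypatch.setenv`, which other tests in this diff already use, achieves the same effect.

```python
import contextlib
import os


@contextlib.contextmanager
def temp_env(name, value):
    """Temporarily set os.environ[name]; restore the prior state on exit.

    Hypothetical helper for illustration only (not part of the codebase).
    """
    previous = os.environ.get(name)
    os.environ[name] = value
    try:
        yield
    finally:
        if previous is None:
            # The variable was unset before, so remove it again.
            os.environ.pop(name, None)
        else:
            os.environ[name] = previous
```

With such a helper, the body of each test would shrink to a `with temp_env("HERMES_STREAM_RETRIES", "1"):` block around the streaming call.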
@@ -1,69 +0,0 @@
-"""Tests for cli.py::_strip_reasoning_tags — specifically the tool-call
-XML stripping added in openclaw/openclaw#67318 port.
-
-The CLI has its own copy of the stripper because it needs to run on the
-final displayed assistant text (after streaming) without depending on the
-AIAgent instance. It must stay in sync with run_agent.py::_strip_think_blocks
-for tool-call tag coverage."""
-
-import pytest
-
-from cli import _strip_reasoning_tags
-
-
-class TestToolCallStripping:
-    def test_tool_call_block_stripped(self):
-        text = '<tool_call>{"name": "x"}</tool_call>result'
-        result = _strip_reasoning_tags(text)
-        assert "<tool_call>" not in result
-        assert "result" in result
-
-    def test_function_calls_block_stripped(self):
-        text = '<function_calls>[{}]</function_calls>\nanswer'
-        result = _strip_reasoning_tags(text)
-        assert "<function_calls>" not in result
-        assert "answer" in result
-
-    def test_gemma_function_name_block_stripped(self):
-        text = (
-            'Reading.\n'
-            '<function name="r"><parameter name="p">/tmp/x</parameter></function>\n'
-            'Done.'
-        )
-        result = _strip_reasoning_tags(text)
-        assert '<function name="r">' not in result
-        assert "/tmp/x" not in result
-        assert "Reading." in result
-        assert "Done." in result
-
-    def test_prose_mention_of_function_preserved(self):
-        text = "Use <function> declarations in JavaScript."
-        result = _strip_reasoning_tags(text)
-        assert "JavaScript" in result
-
-    def test_reasoning_still_stripped(self):
-        """Regression: make sure existing think-tag stripping still works."""
-        text = "<think>reasoning</think> answer"
-        result = _strip_reasoning_tags(text)
-        assert "reasoning" not in result
-        assert "answer" in result
-
-    def test_mixed_reasoning_and_tool_call(self):
-        text = '<think>plan</think><tool_call>{"x":1}</tool_call>final'
-        result = _strip_reasoning_tags(text)
-        assert "plan" not in result
-        assert "<tool_call>" not in result
-        assert "final" in result
-
-    def test_stray_function_close(self):
-        text = "visible</function> tail"
-        result = _strip_reasoning_tags(text)
-        assert "</function>" not in result
-        assert "visible" in result
-        assert "tail" in result
-
-    def test_empty_string(self):
-        assert _strip_reasoning_tags("") == ""
-
-    def test_plain_text_unchanged(self):
-        assert _strip_reasoning_tags("just text") == "just text"
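Review note: the deleted test file above pins down the observable behaviour of `_strip_reasoning_tags` without showing its body. The sketch below is one way that behaviour could be met with regular expressions. It is an assumption made for illustration, not the implementation in `cli.py`, and the name `strip_tags_sketch` is hypothetical.

```python
import re

# Paired blocks that are removed together with their contents.
_BLOCK_PATTERNS = [
    re.compile(r"<think>.*?</think>", re.DOTALL),
    re.compile(r"<tool_call>.*?</tool_call>", re.DOTALL),
    re.compile(r"<function_calls>.*?</function_calls>", re.DOTALL),
    # Only <function name="..."> blocks are stripped, so a bare <function>
    # mentioned in prose is left alone.
    re.compile(r'<function\s+name="[^"]*">.*?</function>', re.DOTALL),
]
# A closing tag left behind without a matching opener.
_STRAY_CLOSE = re.compile(r"</function>")


def strip_tags_sketch(text):
    """Remove reasoning and tool-call markup; keep the surrounding text."""
    for pattern in _BLOCK_PATTERNS:
        text = pattern.sub("", text)
    return _STRAY_CLOSE.sub("", text)
```

Requiring the `name="..."` attribute is what separates a tool-call block from a prose mention of `<function>`, which is the distinction `test_prose_mention_of_function_preserved` guards.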
@@ -66,9 +66,6 @@ class TestCliSkinPromptIntegration:
        assert style_dict["prompt"] == skin.get_color("prompt")
        assert style_dict["input-rule"] == skin.get_color("input_rule")
        assert style_dict["prompt-working"] == f"{skin.get_color('banner_dim')} italic"
-        assert style_dict["status-bar"] == (
-            f"bg:{skin.get_color('status_bar_bg')} {skin.get_color('status_bar_text')}"
-        )
        assert style_dict["approval-title"] == f"{skin.get_color('ui_warn')} bold"

    def test_apply_tui_skin_style_updates_running_app(self):
@@ -197,68 +197,6 @@ class TestRunAsyncWithRunningLoop:
        )
        assert result == 42

-    @pytest.mark.asyncio
-    async def test_timeout_uses_nonblocking_executor_shutdown(self, monkeypatch):
-        """A timeout in the running-loop branch must not wait for the worker.
-
-        ThreadPoolExecutor's context manager performs shutdown(wait=True).
-        If _run_async relies on that path after future.result(timeout=...)
-        times out, the timeout does not bound wall-clock time because the
-        caller still waits for the stuck coroutine's thread to finish.
-        """
-        import concurrent.futures
-        from model_tools import _run_async
-
-        events = {
-            "cancelled": False,
-            "result_timeout": None,
-            "shutdown_calls": [],
-        }
-
-        class TimeoutFuture:
-            def result(self, timeout=None):
-                events["result_timeout"] = timeout
-                raise concurrent.futures.TimeoutError()
-
-            def cancel(self):
-                events["cancelled"] = True
-                return True
-
-        class FakeExecutor:
-            def __init__(self, *args, **kwargs):
-                pass
-
-            def __enter__(self):
-                return self
-
-            def __exit__(self, exc_type, exc, tb):
-                self.shutdown(wait=True)
-                return False
-
-            def submit(self, fn, *args, **kwargs):
-                if args and hasattr(args[0], "close"):
-                    args[0].close()
-                return TimeoutFuture()
-
-            def shutdown(self, wait=True, cancel_futures=False):
-                events["shutdown_calls"].append((wait, cancel_futures))
-
-        async def _never_finishes():
-            await asyncio.sleep(999)
-
-        monkeypatch.setattr(
-            concurrent.futures,
-            "ThreadPoolExecutor",
-            FakeExecutor,
-        )
-
-        with pytest.raises(concurrent.futures.TimeoutError):
-            _run_async(_never_finishes())
-
-        assert events["result_timeout"] == 300
-        assert events["cancelled"] is True
-        assert events["shutdown_calls"] == [(False, True)]
-

 # ---------------------------------------------------------------------------
 # Integration: full vision_analyze dispatch chain
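Review note: the removed test `test_timeout_uses_nonblocking_executor_shutdown` guards against a subtle issue. Leaving a `with ThreadPoolExecutor(...)` block calls `shutdown(wait=True)`, so a timeout raised by `future.result(timeout=...)` still blocks until the worker finishes. The sketch below shows the non-blocking shape the test expects (`shutdown(wait=False, cancel_futures=True)`). The function name `run_coro_with_timeout` is hypothetical, and this is not the body of `_run_async`. It assumes Python 3.9+ for `cancel_futures`.

```python
import asyncio
import concurrent.futures


def run_coro_with_timeout(coro, timeout):
    """Run a coroutine on a worker thread, bounding the caller's wait.

    Hypothetical sketch: the executor is shut down without waiting, so a
    stuck coroutine cannot hold the caller past the timeout.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(asyncio.run, coro)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        # Best effort: cancel() only succeeds if the task has not started.
        future.cancel()
        raise
    finally:
        # wait=False is the key difference from the context-manager exit path.
        executor.shutdown(wait=False, cancel_futures=True)
```

The worker thread itself keeps running until its coroutine ends. Only the caller is released, which is the property the removed assertion `events["shutdown_calls"] == [(False, True)]` checked.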
@@ -106,23 +106,11 @@ def test_config_set_yolo_toggles_session_scope():

    server._sessions["sid"] = _session()
    try:
-        resp_on = server.handle_request(
-            {
-                "id": "1",
-                "method": "config.set",
-                "params": {"session_id": "sid", "key": "yolo"},
-            }
-        )
+        resp_on = server.handle_request({"id": "1", "method": "config.set", "params": {"session_id": "sid", "key": "yolo"}})
        assert resp_on["result"]["value"] == "1"
        assert is_session_yolo_enabled("session-key") is True

-        resp_off = server.handle_request(
-            {
-                "id": "2",
-                "method": "config.set",
-                "params": {"session_id": "sid", "key": "yolo"},
-            }
-        )
+        resp_off = server.handle_request({"id": "2", "method": "config.set", "params": {"session_id": "sid", "key": "yolo"}})
        assert resp_off["result"]["value"] == "0"
        assert is_session_yolo_enabled("session-key") is False
    finally:
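Review note: most hunks in this file only collapse multi-line request dicts onto one line, and every request has the same `id` / `method` / `params` shape. If line length becomes a concern, a small builder would keep the calls short without the long literals. The helper below is a hypothetical sketch and is not part of the test module.

```python
def rpc_request(method, req_id="1", **params):
    """Build a request dict in the shape passed to handle_request.

    Hypothetical helper: keyword arguments become the "params" mapping.
    """
    return {"id": req_id, "method": method, "params": params}


# Mirrors the yolo toggle request from the hunk above.
toggle = rpc_request("config.set", session_id="sid", key="yolo")
```

A call such as `server.handle_request(rpc_request("config.set", session_id="sid", key="yolo"))` would then fit on one line.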
@@ -130,36 +118,6 @@ def test_config_set_yolo_toggles_session_scope():
        server._sessions.clear()


-def test_config_get_statusbar_survives_non_dict_display(monkeypatch):
-    monkeypatch.setattr(server, "_load_cfg", lambda: {"display": "broken"})
-
-    resp = server.handle_request(
-        {"id": "1", "method": "config.get", "params": {"key": "statusbar"}}
-    )
-
-    assert resp["result"]["value"] == "top"
-
-
-def test_config_set_statusbar_survives_non_dict_display(tmp_path, monkeypatch):
-    import yaml
-
-    cfg_path = tmp_path / "config.yaml"
-    cfg_path.write_text(yaml.safe_dump({"display": "broken"}))
-    monkeypatch.setattr(server, "_hermes_home", tmp_path)
-
-    resp = server.handle_request(
-        {
-            "id": "1",
-            "method": "config.set",
-            "params": {"key": "statusbar", "value": "bottom"},
-        }
-    )
-
-    assert resp["result"]["value"] == "bottom"
-    saved = yaml.safe_load(cfg_path.read_text())
-    assert saved["display"]["tui_statusbar"] == "bottom"
-
-
 def test_enable_gateway_prompts_sets_gateway_env(monkeypatch):
    monkeypatch.delenv("HERMES_EXEC_ASK", raising=False)
    monkeypatch.delenv("HERMES_GATEWAY_SESSION", raising=False)
@@ -186,21 +144,13 @@ def test_config_set_reasoning_updates_live_session_and_agent(tmp_path, monkeypat
    server._sessions["sid"] = _session(agent=agent)

    resp_effort = server.handle_request(
-        {
-            "id": "1",
-            "method": "config.set",
-            "params": {"session_id": "sid", "key": "reasoning", "value": "low"},
-        }
+        {"id": "1", "method": "config.set", "params": {"session_id": "sid", "key": "reasoning", "value": "low"}}
    )
    assert resp_effort["result"]["value"] == "low"
    assert agent.reasoning_config == {"enabled": True, "effort": "low"}

    resp_show = server.handle_request(
-        {
-            "id": "2",
-            "method": "config.set",
-            "params": {"session_id": "sid", "key": "reasoning", "value": "show"},
-        }
+        {"id": "2", "method": "config.set", "params": {"session_id": "sid", "key": "reasoning", "value": "show"}}
    )
    assert resp_show["result"]["value"] == "show"
    assert server._sessions["sid"]["show_reasoning"] is True
@@ -212,11 +162,7 @@ def test_config_set_verbose_updates_session_mode_and_agent(tmp_path, monkeypatch
    server._sessions["sid"] = _session(agent=agent)

    resp = server.handle_request(
-        {
-            "id": "1",
-            "method": "config.set",
-            "params": {"session_id": "sid", "key": "verbose", "value": "cycle"},
-        }
+        {"id": "1", "method": "config.set", "params": {"session_id": "sid", "key": "verbose", "value": "cycle"}}
    )

    assert resp["result"]["value"] == "verbose"
@@ -234,11 +180,7 @@ def test_config_set_model_uses_live_switch_path(monkeypatch):

    monkeypatch.setattr(server, "_apply_model_switch", _fake_apply)
    resp = server.handle_request(
-        {
-            "id": "1",
-            "method": "config.set",
-            "params": {"session_id": "sid", "key": "model", "value": "new/model"},
-        }
+        {"id": "1", "method": "config.set", "params": {"session_id": "sid", "key": "model", "value": "new/model"}}
    )

    assert resp["result"]["value"] == "new/model"
@@ -279,15 +221,7 @@ def test_config_set_model_global_persists(monkeypatch):
    monkeypatch.setattr("hermes_cli.config.save_config", lambda cfg: saved.update(cfg))

    resp = server.handle_request(
-        {
-            "id": "1",
-            "method": "config.set",
-            "params": {
-                "session_id": "sid",
-                "key": "model",
-                "value": "anthropic/claude-sonnet-4.6 --global",
-            },
-        }
+        {"id": "1", "method": "config.set", "params": {"session_id": "sid", "key": "model", "value": "anthropic/claude-sonnet-4.6 --global"}}
    )

    assert resp["result"]["value"] == "anthropic/claude-sonnet-4.6"
@@ -307,7 +241,6 @@ def test_config_set_model_syncs_inference_provider_env(monkeypatch):
    trying openrouter because the env-var-backed resolvers still saw the old
    provider.
    """
-
    class _Agent:
        provider = "openrouter"
        model = "old/model"
@@ -329,39 +262,21 @@ def test_config_set_model_syncs_inference_provider_env(monkeypatch):

    server._sessions["sid"] = _session(agent=_Agent())
    monkeypatch.setenv("HERMES_INFERENCE_PROVIDER", "openrouter")
-    monkeypatch.setattr(
-        "hermes_cli.model_switch.switch_model", lambda **_kwargs: result
-    )
+    monkeypatch.setattr("hermes_cli.model_switch.switch_model", lambda **_kwargs: result)
    monkeypatch.setattr(server, "_restart_slash_worker", lambda session: None)
    monkeypatch.setattr(server, "_emit", lambda *args, **kwargs: None)

    server.handle_request(
-        {
-            "id": "1",
-            "method": "config.set",
-            "params": {
-                "session_id": "sid",
-                "key": "model",
-                "value": "claude-sonnet-4.6 --provider anthropic",
-            },
-        }
+        {"id": "1", "method": "config.set", "params": {"session_id": "sid", "key": "model", "value": "claude-sonnet-4.6 --provider anthropic"}}
    )

    assert os.environ["HERMES_INFERENCE_PROVIDER"] == "anthropic"


 def test_config_set_personality_rejects_unknown_name(monkeypatch):
-    monkeypatch.setattr(
-        server,
-        "_available_personalities",
-        lambda cfg=None: {"helpful": "You are helpful."},
-    )
+    monkeypatch.setattr(server, "_available_personalities", lambda cfg=None: {"helpful": "You are helpful."})
    resp = server.handle_request(
-        {
-            "id": "1",
-            "method": "config.set",
-            "params": {"key": "personality", "value": "bogus"},
-        }
+        {"id": "1", "method": "config.set", "params": {"key": "personality", "value": "bogus"}}
    )

    assert "error" in resp
@@ -369,36 +284,20 @@ def test_config_set_personality_rejects_unknown_name(monkeypatch):


 def test_config_set_personality_resets_history_and_returns_info(monkeypatch):
-    session = _session(
-        agent=types.SimpleNamespace(),
-        history=[{"role": "user", "text": "hi"}],
-        history_version=4,
-    )
+    session = _session(agent=types.SimpleNamespace(), history=[{"role": "user", "text": "hi"}], history_version=4)
    new_agent = types.SimpleNamespace(model="x")
    emits = []

    server._sessions["sid"] = session
-    monkeypatch.setattr(
-        server,
-        "_available_personalities",
-        lambda cfg=None: {"helpful": "You are helpful."},
-    )
-    monkeypatch.setattr(
-        server, "_make_agent", lambda sid, key, session_id=None: new_agent
-    )
-    monkeypatch.setattr(
-        server, "_session_info", lambda agent: {"model": getattr(agent, "model", "?")}
-    )
+    monkeypatch.setattr(server, "_available_personalities", lambda cfg=None: {"helpful": "You are helpful."})
+    monkeypatch.setattr(server, "_make_agent", lambda sid, key, session_id=None: new_agent)
+    monkeypatch.setattr(server, "_session_info", lambda agent: {"model": getattr(agent, "model", "?")})
    monkeypatch.setattr(server, "_restart_slash_worker", lambda session: None)
    monkeypatch.setattr(server, "_emit", lambda *args: emits.append(args))
    monkeypatch.setattr(server, "_write_config_key", lambda path, value: None)

    resp = server.handle_request(
-        {
-            "id": "1",
-            "method": "config.set",
-            "params": {"session_id": "sid", "key": "personality", "value": "helpful"},
-        }
+        {"id": "1", "method": "config.set", "params": {"session_id": "sid", "key": "personality", "value": "helpful"}}
    )

    assert resp["result"]["history_reset"] is True
@@ -412,17 +311,11 @@ def test_session_compress_uses_compress_helper(monkeypatch):
    agent = types.SimpleNamespace()
    server._sessions["sid"] = _session(agent=agent)

-    monkeypatch.setattr(
-        server,
-        "_compress_session_history",
-        lambda session, focus_topic=None: (2, {"total": 42}),
-    )
+    monkeypatch.setattr(server, "_compress_session_history", lambda session, focus_topic=None: (2, {"total": 42}))
    monkeypatch.setattr(server, "_session_info", lambda _agent: {"model": "x"})

    with patch("tui_gateway.server._emit") as emit:
-        resp = server.handle_request(
-            {"id": "1", "method": "session.compress", "params": {"session_id": "sid"}}
-        )
+        resp = server.handle_request({"id": "1", "method": "session.compress", "params": {"session_id": "sid"}})

    assert resp["result"]["removed"] == 2
    assert resp["result"]["usage"]["total"] == 42
@@ -435,14 +328,9 @@ def test_prompt_submit_sets_approval_session_key(monkeypatch):
    captured = {}

    class _Agent:
-        def run_conversation(
-            self, prompt, conversation_history=None, stream_callback=None
-        ):
+        def run_conversation(self, prompt, conversation_history=None, stream_callback=None):
            captured["session_key"] = get_current_session_key(default="")
-            return {
-                "final_response": "ok",
-                "messages": [{"role": "assistant", "content": "ok"}],
-            }
+            return {"final_response": "ok", "messages": [{"role": "assistant", "content": "ok"}]}

    class _ImmediateThread:
        def __init__(self, target=None, daemon=None):
@@ -457,13 +345,7 @@ def test_prompt_submit_sets_approval_session_key(monkeypatch):
    monkeypatch.setattr(server, "make_stream_renderer", lambda cols: None)
    monkeypatch.setattr(server, "render_message", lambda raw, cols: None)

-    resp = server.handle_request(
-        {
-            "id": "1",
-            "method": "prompt.submit",
-            "params": {"session_id": "sid", "text": "ping"},
-        }
-    )
+    resp = server.handle_request({"id": "1", "method": "prompt.submit", "params": {"session_id": "sid", "text": "ping"}})

    assert resp["result"]["status"] == "streaming"
    assert captured["session_key"] == "session-key"
@@ -477,14 +359,9 @@ def test_prompt_submit_expands_context_refs(monkeypatch):
        base_url = ""
        api_key = ""

-        def run_conversation(
-            self, prompt, conversation_history=None, stream_callback=None
-        ):
+        def run_conversation(self, prompt, conversation_history=None, stream_callback=None):
            captured["prompt"] = prompt
-            return {
-                "final_response": "ok",
-                "messages": [{"role": "assistant", "content": "ok"}],
-            }
+            return {"final_response": "ok", "messages": [{"role": "assistant", "content": "ok"}]}

    class _ImmediateThread:
        def __init__(self, target=None, daemon=None):
@@ -494,14 +371,8 @@ def test_prompt_submit_expands_context_refs(monkeypatch):
            self._target()

    fake_ctx = types.ModuleType("agent.context_references")
-    fake_ctx.preprocess_context_references = (
-        lambda message, **kwargs: types.SimpleNamespace(
-            blocked=False,
-            message="expanded prompt",
-            warnings=[],
-            references=[],
-            injected_tokens=0,
-        )
+    fake_ctx.preprocess_context_references = lambda message, **kwargs: types.SimpleNamespace(
+        blocked=False, message="expanded prompt", warnings=[], references=[], injected_tokens=0
    )
    fake_meta = types.ModuleType("agent.model_metadata")
    fake_meta.get_model_context_length = lambda *args, **kwargs: 100000
@@ -514,13 +385,7 @@ def test_prompt_submit_expands_context_refs(monkeypatch):
    monkeypatch.setitem(sys.modules, "agent.context_references", fake_ctx)
    monkeypatch.setitem(sys.modules, "agent.model_metadata", fake_meta)

-    server.handle_request(
-        {
-            "id": "1",
-            "method": "prompt.submit",
-            "params": {"session_id": "sid", "text": "@diff"},
-        }
-    )
+    server.handle_request({"id": "1", "method": "prompt.submit", "params": {"session_id": "sid", "text": "@diff"}})

    assert captured["prompt"] == "expanded prompt"

@@ -539,13 +404,7 @@ def test_image_attach_appends_local_image(monkeypatch):
    server._sessions["sid"] = _session()
    monkeypatch.setitem(sys.modules, "cli", fake_cli)

-    resp = server.handle_request(
-        {
-            "id": "1",
-            "method": "image.attach",
-            "params": {"session_id": "sid", "path": "/tmp/cat.png"},
-        }
-    )
+    resp = server.handle_request({"id": "1", "method": "image.attach", "params": {"session_id": "sid", "path": "/tmp/cat.png"}})

    assert resp["result"]["attached"] is True
    assert resp["result"]["name"] == "cat.png"
@@ -561,21 +420,14 @@ def test_image_attach_accepts_unquoted_screenshot_path_with_spaces(monkeypatch):
        "is_image": True,
        "remainder": "",
    }
-    fake_cli._split_path_input = lambda raw: (
-        "/tmp/Screenshot",
-        "2026-04-21 at 1.04.43 PM.png",
-    )
+    fake_cli._split_path_input = lambda raw: ("/tmp/Screenshot", "2026-04-21 at 1.04.43 PM.png")
    fake_cli._resolve_attachment_path = lambda raw: None

    server._sessions["sid"] = _session()
    monkeypatch.setitem(sys.modules, "cli", fake_cli)

    resp = server.handle_request(
-        {
-            "id": "1",
-            "method": "image.attach",
-            "params": {"session_id": "sid", "path": str(screenshot)},
-        }
+        {"id": "1", "method": "image.attach", "params": {"session_id": "sid", "path": str(screenshot)}}
    )

    assert resp["result"]["attached"] is True
@@ -585,34 +437,20 @@ def test_image_attach_accepts_unquoted_screenshot_path_with_spaces(monkeypatch):


 def test_commands_catalog_surfaces_quick_commands(monkeypatch):
-    monkeypatch.setattr(
-        server,
-        "_load_cfg",
-        lambda: {
-            "quick_commands": {
-                "build": {"type": "exec", "command": "npm run build"},
-                "git": {"type": "alias", "target": "/shell git"},
-                "notes": {
-                    "type": "exec",
-                    "command": "cat NOTES.md",
-                    "description": "Open design notes",
-                },
-            }
-        },
-    )
+    monkeypatch.setattr(server, "_load_cfg", lambda: {"quick_commands": {
+        "build": {"type": "exec", "command": "npm run build"},
+        "git": {"type": "alias", "target": "/shell git"},
+        "notes": {"type": "exec", "command": "cat NOTES.md", "description": "Open design notes"},
+    }})

-    resp = server.handle_request(
-        {"id": "1", "method": "commands.catalog", "params": {}}
-    )
+    resp = server.handle_request({"id": "1", "method": "commands.catalog", "params": {}})

    pairs = dict(resp["result"]["pairs"])
    assert "npm run build" in pairs["/build"]
    assert pairs["/git"].startswith("alias →")
    assert pairs["/notes"] == "Open design notes"

-    user_cat = next(
-        c for c in resp["result"]["categories"] if c["name"] == "User commands"
-    )
+    user_cat = next(c for c in resp["result"]["categories"] if c["name"] == "User commands")
    user_pairs = dict(user_cat["pairs"])
    assert set(user_pairs) == {"/build", "/git", "/notes"}

@@ -621,22 +459,14 @@ def test_commands_catalog_surfaces_quick_commands(monkeypatch):


 def test_command_dispatch_exec_nonzero_surfaces_error(monkeypatch):
-    monkeypatch.setattr(
-        server,
-        "_load_cfg",
-        lambda: {"quick_commands": {"boom": {"type": "exec", "command": "boom"}}},
-    )
+    monkeypatch.setattr(server, "_load_cfg", lambda: {"quick_commands": {"boom": {"type": "exec", "command": "boom"}}})
    monkeypatch.setattr(
        server.subprocess,
        "run",
-        lambda *args, **kwargs: types.SimpleNamespace(
-            returncode=1, stdout="", stderr="failed"
-        ),
+        lambda *args, **kwargs: types.SimpleNamespace(returncode=1, stdout="", stderr="failed"),
    )

-    resp = server.handle_request(
-        {"id": "1", "method": "command.dispatch", "params": {"name": "boom"}}
-    )
+    resp = server.handle_request({"id": "1", "method": "command.dispatch", "params": {"name": "boom"}})

    assert "error" in resp
    assert "failed" in resp["error"]["message"]
@@ -644,22 +474,15 @@ def test_command_dispatch_exec_nonzero_surfaces_error(monkeypatch):

 def test_plugins_list_surfaces_loader_error(monkeypatch):
    with patch("hermes_cli.plugins.get_plugin_manager", side_effect=Exception("boom")):
-        resp = server.handle_request(
-            {"id": "1", "method": "plugins.list", "params": {}}
-        )
+        resp = server.handle_request({"id": "1", "method": "plugins.list", "params": {}})

    assert "error" in resp
    assert "boom" in resp["error"]["message"]


 def test_complete_slash_surfaces_completer_error(monkeypatch):
-    with patch(
-        "hermes_cli.commands.SlashCommandCompleter",
-        side_effect=Exception("no completer"),
-    ):
-        resp = server.handle_request(
-            {"id": "1", "method": "complete.slash", "params": {"text": "/mo"}}
-        )
+    with patch("hermes_cli.commands.SlashCommandCompleter", side_effect=Exception("no completer")):
+        resp = server.handle_request({"id": "1", "method": "complete.slash", "params": {"text": "/mo"}})

    assert "error" in resp
    assert "no completer" in resp["error"]["message"]
@@ -677,11 +500,7 @@ def test_input_detect_drop_attaches_image(monkeypatch):
    monkeypatch.setitem(sys.modules, "cli", fake_cli)

    resp = server.handle_request(
-        {
-            "id": "1",
-            "method": "input.detect_drop",
-            "params": {"session_id": "sid", "text": "/tmp/cat.png"},
-        }
+        {"id": "1", "method": "input.detect_drop", "params": {"session_id": "sid", "text": "/tmp/cat.png"}}
    )

    assert resp["result"]["matched"] is True
@@ -702,9 +521,7 @@ def test_rollback_restore_resolves_number_and_file_path():
            calls["args"] = (cwd, target, file_path)
            return {"success": True, "message": "done"}

-    server._sessions["sid"] = _session(
-        agent=types.SimpleNamespace(_checkpoint_mgr=_Mgr()), history=[]
-    )
+    server._sessions["sid"] = _session(agent=types.SimpleNamespace(_checkpoint_mgr=_Mgr()), history=[])
    resp = server.handle_request(
        {
            "id": "1",
@@ -755,9 +572,7 @@ def test_session_steer_calls_agent_steer_when_agent_supports_it():


 def test_session_steer_rejects_empty_text():
-    server._sessions["sid"] = _session(
-        agent=types.SimpleNamespace(steer=lambda t: True)
-    )
+    server._sessions["sid"] = _session(agent=types.SimpleNamespace(steer=lambda t: True))
    try:
        resp = server.handle_request(
            {
@@ -817,13 +632,10 @@ def test_session_undo_rejects_while_running():
    """Fix for TUI silent-drop #1: /undo must not mutate history
    while the agent is mid-turn — would either clobber the undo or
    cause prompt.submit to silently drop the agent's response."""
-    server._sessions["sid"] = _session(
-        running=True,
-        history=[
-            {"role": "user", "content": "hi"},
-            {"role": "assistant", "content": "hello"},
-        ],
-    )
+    server._sessions["sid"] = _session(running=True, history=[
+        {"role": "user", "content": "hi"},
+        {"role": "assistant", "content": "hello"},
+    ])
    try:
        resp = server.handle_request(
            {"id": "1", "method": "session.undo", "params": {"session_id": "sid"}}
@@ -839,13 +651,10 @@ def test_session_undo_rejects_while_running():

 def test_session_undo_allowed_when_idle():
    """Regression guard: when not running, /undo still works."""
-    server._sessions["sid"] = _session(
-        running=False,
-        history=[
-            {"role": "user", "content": "hi"},
-            {"role": "assistant", "content": "hello"},
-        ],
-    )
+    server._sessions["sid"] = _session(running=False, history=[
+        {"role": "user", "content": "hi"},
+        {"role": "assistant", "content": "hello"},
+    ])
    try:
        resp = server.handle_request(
            {"id": "1", "method": "session.undo", "params": {"session_id": "sid"}}
@@ -874,11 +683,7 @@ def test_rollback_restore_rejects_full_history_while_running(monkeypatch):
    server._sessions["sid"] = _session(running=True)
    try:
        resp = server.handle_request(
-            {
-                "id": "1",
-                "method": "rollback.restore",
-                "params": {"session_id": "sid", "hash": "abc"},
-            }
+            {"id": "1", "method": "rollback.restore", "params": {"session_id": "sid", "hash": "abc"}}
        )
        assert resp.get("error"), "full-history rollback should reject while running"
        assert resp["error"]["code"] == 4009
@@ -896,17 +701,12 @@ def test_prompt_submit_history_version_mismatch_surfaces_warning(monkeypatch):
    session_ref = {"s": None}

    class _RacyAgent:
-        def run_conversation(
-            self, prompt, conversation_history=None, stream_callback=None
-        ):
+        def run_conversation(self, prompt, conversation_history=None, stream_callback=None):
            # Simulate: something external bumped history_version
            # while we were running.
            with session_ref["s"]["history_lock"]:
                session_ref["s"]["history_version"] += 1
-            return {
-                "final_response": "agent reply",
-                "messages": [{"role": "assistant", "content": "agent reply"}],
-            }
+            return {"final_response": "agent reply", "messages": [{"role": "assistant", "content": "agent reply"}]}

    class _ImmediateThread:
        def __init__(self, target=None, daemon=None):
@@ -925,11 +725,7 @@ def test_prompt_submit_history_version_mismatch_surfaces_warning(monkeypatch):
        monkeypatch.setattr(server, "_emit", lambda *a: emits.append(a))

        resp = server.handle_request(
-            {
-                "id": "1",
-                "method": "prompt.submit",
-                "params": {"session_id": "sid", "text": "hi"},
-            }
+            {"id": "1", "method": "prompt.submit", "params": {"session_id": "sid", "text": "hi"}}
        )
        assert resp.get("result"), f"got error: {resp.get('error')}"

@@ -946,25 +742,16 @@ def test_prompt_submit_history_version_mismatch_surfaces_warning(monkeypatch):
            "history_version mismatch — otherwise the UI silently "
            "shows output that was never persisted"
        )
-        assert (
-            "not saved" in payload["warning"].lower()
-            or "changed" in payload["warning"].lower()
-        )
+        assert "not saved" in payload["warning"].lower() or "changed" in payload["warning"].lower()
    finally:
        server._sessions.pop("sid", None)


 def test_prompt_submit_history_version_match_persists_normally(monkeypatch):
    """Regression guard: the backstop does not affect the happy path."""
-
    class _Agent:
-        def run_conversation(
-            self, prompt, conversation_history=None, stream_callback=None
-        ):
-            return {
-                "final_response": "reply",
-                "messages": [{"role": "assistant", "content": "reply"}],
-            }
+        def run_conversation(self, prompt, conversation_history=None, stream_callback=None):
+            return {"final_response": "reply", "messages": [{"role": "assistant", "content": "reply"}]}

    class _ImmediateThread:
        def __init__(self, target=None, daemon=None):
@@ -982,18 +769,12 @@ def test_prompt_submit_history_version_match_persists_normally(monkeypatch):
        monkeypatch.setattr(server, "_emit", lambda *a: emits.append(a))

        resp = server.handle_request(
-            {
-                "id": "1",
-                "method": "prompt.submit",
-                "params": {"session_id": "sid", "text": "hi"},
-            }
+            {"id": "1", "method": "prompt.submit", "params": {"session_id": "sid", "text": "hi"}}
        )
        assert resp.get("result")

        # History was written
-        assert server._sessions["sid"]["history"] == [
-            {"role": "assistant", "content": "reply"}
-        ]
+        assert server._sessions["sid"]["history"] == [{"role": "assistant", "content": "reply"}]
        assert server._sessions["sid"]["history_version"] == 1

        # No warning should be attached
@@ -1037,11 +818,7 @@ def test_interrupt_only_clears_own_session_pending():

        # Interrupt session A.
        resp = server.handle_request(
-            {
-                "id": "1",
-                "method": "session.interrupt",
-                "params": {"session_id": "sid_a"},
-            }
+            {"id": "1", "method": "session.interrupt", "params": {"session_id": "sid_a"}}
        )
        assert resp.get("result"), f"got error: {resp.get('error')}"

@@ -1114,11 +891,8 @@ def test_respond_unpacks_sid_tuple_correctly():
    server._pending["rid-x"] = ("sid_x", ev)
    try:
        resp = server.handle_request(
-            {
-                "id": "1",
-                "method": "clarify.respond",
-                "params": {"request_id": "rid-x", "answer": "the answer"},
-            }
+            {"id": "1", "method": "clarify.respond",
+             "params": {"request_id": "rid-x", "answer": "the answer"}}
        )
        assert resp.get("result")
        assert ev.is_set()
@@ -1128,6 +902,7 @@ def test_respond_unpacks_sid_tuple_correctly():
        server._answers.pop("rid-x", None)


+
 # ---------------------------------------------------------------------------
 # /model switch and other agent-mutating commands must reject while the
 # session is running.  agent.switch_model() mutates self.model, self.provider,
@@ -1150,17 +925,10 @@ def test_config_set_model_rejects_while_running(monkeypatch):

    server._sessions["sid"] = _session(running=True)
    try:
-        resp = server.handle_request(
-            {
-                "id": "1",
-                "method": "config.set",
-                "params": {
-                    "session_id": "sid",
-                    "key": "model",
-                    "value": "anthropic/claude-sonnet-4.6",
-                },
-            }
-        )
+        resp = server.handle_request({
+            "id": "1", "method": "config.set",
+            "params": {"session_id": "sid", "key": "model", "value": "anthropic/claude-sonnet-4.6"},
+        })
        assert resp.get("error")
        assert resp["error"]["code"] == 4009
        assert "session busy" in resp["error"]["message"]
@@ -1184,13 +952,10 @@ def test_config_set_model_allowed_when_idle(monkeypatch):

    server._sessions["sid"] = _session(running=False)
    try:
-        resp = server.handle_request(
-            {
-                "id": "1",
-                "method": "config.set",
-                "params": {"session_id": "sid", "key": "model", "value": "newmodel"},
-            }
-        )
+        resp = server.handle_request({
+            "id": "1", "method": "config.set",
+            "params": {"session_id": "sid", "key": "model", "value": "newmodel"},
+        })
        assert resp.get("result")
        assert resp["result"]["value"] == "newmodel"
        assert seen["called"]
@@ -1228,9 +993,9 @@ def test_mirror_slash_side_effects_rejects_mutating_commands_while_running(monke
        ("/compress", "compress"),
    ]:
        warning = server._mirror_slash_side_effects("sid", session, cmd)
-        assert (
-            "session busy" in warning
-        ), f"{cmd} should have returned busy warning, got: {warning!r}"
+        assert "session busy" in warning, (
+            f"{cmd} should have returned busy warning, got: {warning!r}"
+        )
        assert f"/{expected_name}" in warning

    # None of the mutating side-effect helpers should have fired.
@@ -1303,11 +1068,7 @@ def test_session_create_close_race_does_not_orphan_worker(monkeypatch):
    # Stub everything _build touches
    monkeypatch.setattr(server, "_make_agent", _slow_make_agent)
    monkeypatch.setattr(server, "_SlashWorker", _FakeWorker)
-    monkeypatch.setattr(
-        server,
-        "_get_db",
-        lambda: types.SimpleNamespace(create_session=lambda *a, **kw: None),
-    )
+    monkeypatch.setattr(server, "_get_db", lambda: types.SimpleNamespace(create_session=lambda *a, **kw: None))
    monkeypatch.setattr(server, "_session_info", lambda _a: {"model": "x"})
    monkeypatch.setattr(server, "_probe_credentials", lambda _a: None)
    monkeypatch.setattr(server, "_wire_callbacks", lambda _sid: None)
@@ -1315,36 +1076,25 @@ def test_session_create_close_race_does_not_orphan_worker(monkeypatch):

    # Shim register/unregister to observe leaks
    import tools.approval as _approval
-
-    monkeypatch.setattr(_approval, "register_gateway_notify", lambda key, cb: None)
-    monkeypatch.setattr(
-        _approval,
-        "unregister_gateway_notify",
-        lambda key: unregistered_keys.append(key),
-    )
+    monkeypatch.setattr(_approval, "register_gateway_notify",
+                        lambda key, cb: None)
+    monkeypatch.setattr(_approval, "unregister_gateway_notify",
+                        lambda key: unregistered_keys.append(key))
    monkeypatch.setattr(_approval, "load_permanent_allowlist", lambda: None)

    # Start: session.create spawns _build thread, returns synchronously
-    resp = server.handle_request(
-        {
-            "id": "1",
-            "method": "session.create",
-            "params": {"cols": 80},
-        }
-    )
+    resp = server.handle_request({
+        "id": "1", "method": "session.create", "params": {"cols": 80},
+    })
    assert resp.get("result"), f"got error: {resp.get('error')}"
    sid = resp["result"]["session_id"]

    # Build thread is blocked in _slow_make_agent.  Close the session
    # NOW — this pops _sessions[sid] before _build can install the
    # worker/notify.
-    close_resp = server.handle_request(
-        {
-            "id": "2",
-            "method": "session.close",
-            "params": {"session_id": sid},
-        }
-    )
+    close_resp = server.handle_request({
+        "id": "2", "method": "session.close", "params": {"session_id": sid},
+    })
    assert close_resp.get("result", {}).get("closed") is True

    # At this point session.close saw slash_worker=None (not yet
@@ -1358,12 +1108,11 @@ def test_session_create_close_race_does_not_orphan_worker(monkeypatch):
        if closed_workers:
            break
        import time
-
        time.sleep(0.02)

-    assert (
-        len(closed_workers) == 1
-    ), f"orphan worker was not cleaned up — closed_workers={closed_workers}"
+    assert len(closed_workers) == 1, (
+        f"orphan worker was not cleaned up — closed_workers={closed_workers}"
+    )
    # Notify may be unregistered by both session.close (unconditional)
    # and the orphan-cleanup path; the key guarantee is that the build
    # thread does at least one unregister call (any prior close
@@ -1397,33 +1146,21 @@ def test_session_create_no_race_keeps_worker_alive(monkeypatch):

    monkeypatch.setattr(server, "_make_agent", lambda sid, key: _FakeAgent())
    monkeypatch.setattr(server, "_SlashWorker", _FakeWorker)
-    monkeypatch.setattr(
-        server,
-        "_get_db",
-        lambda: types.SimpleNamespace(create_session=lambda *a, **kw: None),
-    )
+    monkeypatch.setattr(server, "_get_db", lambda: types.SimpleNamespace(create_session=lambda *a, **kw: None))
    monkeypatch.setattr(server, "_session_info", lambda _a: {"model": "x"})
    monkeypatch.setattr(server, "_probe_credentials", lambda _a: None)
    monkeypatch.setattr(server, "_wire_callbacks", lambda _sid: None)
    monkeypatch.setattr(server, "_emit", lambda *a, **kw: None)

    import tools.approval as _approval
-
    monkeypatch.setattr(_approval, "register_gateway_notify", lambda key, cb: None)
-    monkeypatch.setattr(
-        _approval,
-        "unregister_gateway_notify",
-        lambda key: unregistered_keys.append(key),
-    )
+    monkeypatch.setattr(_approval, "unregister_gateway_notify",
+                        lambda key: unregistered_keys.append(key))
    monkeypatch.setattr(_approval, "load_permanent_allowlist", lambda: None)

-    resp = server.handle_request(
-        {
-            "id": "1",
-            "method": "session.create",
-            "params": {"cols": 80},
-        }
-    )
+    resp = server.handle_request({
+        "id": "1", "method": "session.create", "params": {"cols": 80},
+    })
    sid = resp["result"]["session_id"]

    # Wait for the build to finish (ready event inside session dict).
@@ -1432,12 +1169,12 @@ def test_session_create_no_race_keeps_worker_alive(monkeypatch):

    # Build finished without a close race — nothing should have been
    # cleaned up by the orphan check.
-    assert (
-        closed_workers == []
-    ), f"build thread closed its own worker despite no race: {closed_workers}"
-    assert (
-        unregistered_keys == []
-    ), f"build thread unregistered its own notify despite no race: {unregistered_keys}"
+    assert closed_workers == [], (
+        f"build thread closed its own worker despite no race: {closed_workers}"
+    )
+    assert unregistered_keys == [], (
+        f"build thread unregistered its own notify despite no race: {unregistered_keys}"
+    )

    # Session should have the live worker installed.
    assert session.get("slash_worker") is not None
@@ -1446,75 +1183,6 @@ def test_session_create_no_race_keeps_worker_alive(monkeypatch):
    server._sessions.pop(sid, None)


-def test_get_db_degrades_cleanly_when_sessiondb_init_fails(monkeypatch):
-    fake_mod = types.ModuleType("hermes_state")
-
-    class _BrokenSessionDB:
-        def __init__(self):
-            raise RuntimeError("locking protocol")
-
-    fake_mod.SessionDB = _BrokenSessionDB
-    monkeypatch.setitem(sys.modules, "hermes_state", fake_mod)
-    monkeypatch.setattr(server, "_db", None)
-    monkeypatch.setattr(server, "_db_error", None)
-
-    assert server._get_db() is None
-    assert server._db_error == "locking protocol"
-
-
-def test_session_create_continues_when_state_db_is_unavailable(monkeypatch):
-    class _FakeWorker:
-        def __init__(self, key, model):
-            self.key = key
-
-        def close(self):
-            return None
-
-    class _FakeAgent:
-        def __init__(self):
-            self.model = "x"
-            self.provider = "openrouter"
-            self.base_url = ""
-            self.api_key = ""
-
-    emits = []
-
-    monkeypatch.setattr(server, "_make_agent", lambda sid, key: _FakeAgent())
-    monkeypatch.setattr(server, "_SlashWorker", _FakeWorker)
-    monkeypatch.setattr(server, "_get_db", lambda: None)
-    monkeypatch.setattr(server, "_session_info", lambda _a: {"model": "x"})
-    monkeypatch.setattr(server, "_probe_credentials", lambda _a: None)
-    monkeypatch.setattr(server, "_wire_callbacks", lambda _sid: None)
-    monkeypatch.setattr(server, "_emit", lambda *a, **kw: emits.append(a))
-
-    import tools.approval as _approval
-    monkeypatch.setattr(_approval, "register_gateway_notify", lambda key, cb: None)
-    monkeypatch.setattr(_approval, "load_permanent_allowlist", lambda: None)
-
-    resp = server.handle_request(
-        {"id": "1", "method": "session.create", "params": {"cols": 80}}
-    )
-    sid = resp["result"]["session_id"]
-    session = server._sessions[sid]
-    session["agent_ready"].wait(timeout=2.0)
-
-    assert session["agent_error"] is None
-    assert session["agent"] is not None
-    assert not any(args and args[0] == "error" for args in emits)
-
-    server._sessions.pop(sid, None)
-
-
-def test_session_list_returns_clean_error_when_state_db_is_unavailable(monkeypatch):
-    monkeypatch.setattr(server, "_get_db", lambda: None)
-    monkeypatch.setattr(server, "_db_error", "locking protocol")
-
-    resp = server.handle_request({"id": "1", "method": "session.list", "params": {}})
-
-    assert "error" in resp
-    assert "state.db unavailable: locking protocol" in resp["error"]["message"]
-
-
 # --------------------------------------------------------------------------
 # model.options — curated-list parity with `hermes model` and classic /model
 # --------------------------------------------------------------------------
@@ -1058,59 +1058,6 @@ class TestChildCredentialPoolResolution(unittest.TestCase):

            self.assertEqual(mock_child._credential_pool, mock_pool)

-    @patch("tools.delegate_tool._load_config", return_value={})
-    def test_build_child_agent_preserves_mcp_toolsets_by_default(self, mock_cfg):
-        parent = _make_mock_parent()
-        parent.enabled_toolsets = ["web", "browser", "mcp-MiniMax"]
-
-        with patch("run_agent.AIAgent") as MockAgent:
-            mock_child = MagicMock()
-            MockAgent.return_value = mock_child
-
-            _build_child_agent(
-                task_index=0,
-                goal="Test narrowed toolsets",
-                context=None,
-                toolsets=["web", "browser"],
-                model=None,
-                max_iterations=10,
-                parent_agent=parent,
-                task_count=1,
-            )
-
-        self.assertEqual(
-            MockAgent.call_args[1]["enabled_toolsets"],
-            ["web", "browser", "mcp-MiniMax"],
-        )
-
-    @patch(
-        "tools.delegate_tool._load_config",
-        return_value={"inherit_mcp_toolsets": False},
-    )
-    def test_build_child_agent_strict_intersection_when_opted_out(self, mock_cfg):
-        parent = _make_mock_parent()
-        parent.enabled_toolsets = ["web", "browser", "mcp-MiniMax"]
-
-        with patch("run_agent.AIAgent") as MockAgent:
-            mock_child = MagicMock()
-            MockAgent.return_value = mock_child
-
-            _build_child_agent(
-                task_index=0,
-                goal="Test narrowed toolsets",
-                context=None,
-                toolsets=["web", "browser"],
-                model=None,
-                max_iterations=10,
-                parent_agent=parent,
-                task_count=1,
-            )
-
-        self.assertEqual(
-            MockAgent.call_args[1]["enabled_toolsets"],
-            ["web", "browser"],
-        )
-

 class TestChildCredentialLeasing(unittest.TestCase):
    def test_run_single_child_acquires_and_releases_lease(self):
@@ -382,31 +382,3 @@ def test_normalize_env_dict_rejects_complex_values():
        "BAD_DICT": {"nested": True},
    })
    assert result == {"GOOD": "string"}
-
-
-def test_security_args_include_setuid_setgid_for_gosu_drop():
-    """_SECURITY_ARGS must include SETUID and SETGID so the image entrypoint
-    can drop from root to the non-root `hermes` user via gosu.
-
-    Without these caps gosu exits with
-    ``error: failed switching to 'hermes': operation not permitted``
-    and the container exits immediately (exit 1) before running any work.
-
-    `no-new-privileges` is kept, so gosu still cannot escalate back to root
-    after the drop — the drop is a one-way transition performed before the
-    `no_new_privs` bit is enforced on the exec boundary.
-    """
-    args = docker_env._SECURITY_ARGS
-
-    # Flatten to set of added caps for clarity.
-    added = {
-        args[i + 1]
-        for i, flag in enumerate(args[:-1])
-        if flag == "--cap-add"
-    }
-    assert "SETUID" in added, "SETUID cap missing — gosu drop in entrypoint will fail"
-    assert "SETGID" in added, "SETGID cap missing — gosu drop in entrypoint will fail"
-
-    # Sanity: the hardening posture is still in place.
-    assert "--cap-drop" in args and "ALL" in args
-    assert "--security-opt" in args and "no-new-privileges" in args
@@ -1,6 +1,5 @@
 """Tests for _parse_env_var and _get_env_config env-var validation."""

-import importlib
 import json
 from unittest.mock import patch

@@ -85,23 +84,3 @@ class TestParseEnvVar:
        with patch.dict("os.environ", {"TERMINAL_DOCKER_VOLUMES": "not json"}):
            with pytest.raises(ValueError, match="valid JSON"):
                _parse_env_var("TERMINAL_DOCKER_VOLUMES", "[]", json.loads, "valid JSON")
-
-
-class TestImportTimeEnvParsing:
-    """Module-level env parsing should never make terminal_tool unimportable."""
-
-    def test_invalid_foreground_timeout_falls_back_to_default(self):
-        try:
-            with patch.dict("os.environ", {"TERMINAL_MAX_FOREGROUND_TIMEOUT": "5m"}, clear=False):
-                mod = importlib.reload(_tt_mod)
-                assert mod.FOREGROUND_MAX_TIMEOUT == 600
-        finally:
-            importlib.reload(_tt_mod)
-
-    def test_invalid_disk_warning_threshold_falls_back_to_default(self):
-        try:
-            with patch.dict("os.environ", {"TERMINAL_DISK_WARNING_GB": "huge"}, clear=False):
-                mod = importlib.reload(_tt_mod)
-                assert mod.DISK_USAGE_WARNING_THRESHOLD_GB == 500.0
-        finally:
-            importlib.reload(_tt_mod)
@@ -3,12 +3,7 @@
 import socket
 from unittest.mock import patch

-from tools.url_safety import (
-    is_safe_url,
-    _is_blocked_ip,
-    _global_allow_private_urls,
-    _reset_allow_private_cache,
-)
+from tools.url_safety import is_safe_url, _is_blocked_ip

 import ipaddress
 import pytest
@@ -207,189 +202,3 @@ class TestIsBlockedIp:
    def test_allowed_ips(self, ip_str):
        ip = ipaddress.ip_address(ip_str)
        assert _is_blocked_ip(ip) is False, f"{ip_str} should be allowed"
-
-
-class TestGlobalAllowPrivateUrls:
-    """Tests for the security.allow_private_urls config toggle."""
-
-    @pytest.fixture(autouse=True)
-    def _reset_cache(self):
-        """Reset the module-level toggle cache before and after each test."""
-        _reset_allow_private_cache()
-        yield
-        _reset_allow_private_cache()
-
-    def test_default_is_false(self, monkeypatch):
-        """Toggle defaults to False when no env var or config is set."""
-        monkeypatch.delenv("HERMES_ALLOW_PRIVATE_URLS", raising=False)
-        with patch("hermes_cli.config.read_raw_config", side_effect=Exception("no config")):
-            assert _global_allow_private_urls() is False
-
-    def test_env_var_true(self, monkeypatch):
-        """HERMES_ALLOW_PRIVATE_URLS=true enables the toggle."""
-        monkeypatch.setenv("HERMES_ALLOW_PRIVATE_URLS", "true")
-        assert _global_allow_private_urls() is True
-
-    def test_env_var_1(self, monkeypatch):
-        """HERMES_ALLOW_PRIVATE_URLS=1 enables the toggle."""
-        monkeypatch.setenv("HERMES_ALLOW_PRIVATE_URLS", "1")
-        assert _global_allow_private_urls() is True
-
-    def test_env_var_yes(self, monkeypatch):
-        """HERMES_ALLOW_PRIVATE_URLS=yes enables the toggle."""
-        monkeypatch.setenv("HERMES_ALLOW_PRIVATE_URLS", "yes")
-        assert _global_allow_private_urls() is True
-
-    def test_env_var_false(self, monkeypatch):
-        """HERMES_ALLOW_PRIVATE_URLS=false keeps it disabled."""
-        monkeypatch.setenv("HERMES_ALLOW_PRIVATE_URLS", "false")
-        assert _global_allow_private_urls() is False
-
-    def test_config_security_section(self, monkeypatch):
-        """security.allow_private_urls in config enables the toggle."""
-        monkeypatch.delenv("HERMES_ALLOW_PRIVATE_URLS", raising=False)
-        cfg = {"security": {"allow_private_urls": True}}
-        with patch("hermes_cli.config.read_raw_config", return_value=cfg):
-            assert _global_allow_private_urls() is True
-
-    def test_config_browser_fallback(self, monkeypatch):
-        """browser.allow_private_urls works as legacy fallback."""
-        monkeypatch.delenv("HERMES_ALLOW_PRIVATE_URLS", raising=False)
-        cfg = {"browser": {"allow_private_urls": True}}
-        with patch("hermes_cli.config.read_raw_config", return_value=cfg):
-            assert _global_allow_private_urls() is True
-
-    def test_config_security_takes_precedence_over_browser(self, monkeypatch):
-        """security section is checked before browser section."""
-        monkeypatch.delenv("HERMES_ALLOW_PRIVATE_URLS", raising=False)
-        cfg = {"security": {"allow_private_urls": True}, "browser": {"allow_private_urls": False}}
-        with patch("hermes_cli.config.read_raw_config", return_value=cfg):
-            assert _global_allow_private_urls() is True
-
-    def test_env_var_overrides_config(self, monkeypatch):
-        """Env var takes priority over config."""
-        monkeypatch.setenv("HERMES_ALLOW_PRIVATE_URLS", "false")
-        cfg = {"security": {"allow_private_urls": True}}
-        with patch("hermes_cli.config.read_raw_config", return_value=cfg):
-            assert _global_allow_private_urls() is False
-
-    def test_result_is_cached(self, monkeypatch):
-        """Second call uses cached result, doesn't re-read config."""
-        monkeypatch.setenv("HERMES_ALLOW_PRIVATE_URLS", "true")
-        assert _global_allow_private_urls() is True
-        # Change env after first call — should still be True (cached)
-        monkeypatch.setenv("HERMES_ALLOW_PRIVATE_URLS", "false")
-        assert _global_allow_private_urls() is True
-
-
-class TestAllowPrivateUrlsIntegration:
-    """Integration tests: is_safe_url respects the global toggle."""
-
-    @pytest.fixture(autouse=True)
-    def _reset_cache(self):
-        _reset_allow_private_cache()
-        yield
-        _reset_allow_private_cache()
-
-    def test_private_ip_allowed_when_toggle_on(self, monkeypatch):
-        """Private IPs pass is_safe_url when toggle is enabled."""
-        monkeypatch.setenv("HERMES_ALLOW_PRIVATE_URLS", "true")
-        with patch("socket.getaddrinfo", return_value=[
-            (2, 1, 6, "", ("192.168.1.1", 0)),
-        ]):
-            assert is_safe_url("http://router.local") is True
-
-    def test_benchmark_ip_allowed_when_toggle_on(self, monkeypatch):
-        """198.18.x.x (benchmark/OpenWrt proxy range) passes when toggle is on."""
-        monkeypatch.setenv("HERMES_ALLOW_PRIVATE_URLS", "true")
-        with patch("socket.getaddrinfo", return_value=[
-            (2, 1, 6, "", ("198.18.23.183", 0)),
-        ]):
-            assert is_safe_url("https://nousresearch.com") is True
-
-    def test_cgnat_allowed_when_toggle_on(self, monkeypatch):
-        """CGNAT range (100.64.0.0/10) passes when toggle is on."""
-        monkeypatch.setenv("HERMES_ALLOW_PRIVATE_URLS", "true")
-        with patch("socket.getaddrinfo", return_value=[
-            (2, 1, 6, "", ("100.100.100.100", 0)),
-        ]):
-            assert is_safe_url("http://tailscale-peer.example/") is True
-
-    def test_localhost_allowed_when_toggle_on(self, monkeypatch):
-        """Even localhost passes when toggle is on."""
-        monkeypatch.setenv("HERMES_ALLOW_PRIVATE_URLS", "true")
-        with patch("socket.getaddrinfo", return_value=[
-            (2, 1, 6, "", ("127.0.0.1", 0)),
-        ]):
-            assert is_safe_url("http://localhost:8080/api") is True
-
-    # --- Cloud metadata always blocked regardless of toggle ---
-
-    def test_metadata_hostname_blocked_even_with_toggle(self, monkeypatch):
-        """metadata.google.internal is ALWAYS blocked."""
-        monkeypatch.setenv("HERMES_ALLOW_PRIVATE_URLS", "true")
-        assert is_safe_url("http://metadata.google.internal/computeMetadata/v1/") is False
-
-    def test_metadata_goog_blocked_even_with_toggle(self, monkeypatch):
-        """metadata.goog is ALWAYS blocked."""
-        monkeypatch.setenv("HERMES_ALLOW_PRIVATE_URLS", "true")
-        assert is_safe_url("http://metadata.goog/computeMetadata/v1/") is False
-
-    def test_metadata_ip_blocked_even_with_toggle(self, monkeypatch):
-        """169.254.169.254 (AWS/GCP metadata IP) is ALWAYS blocked."""
-        monkeypatch.setenv("HERMES_ALLOW_PRIVATE_URLS", "true")
-        with patch("socket.getaddrinfo", return_value=[
-            (2, 1, 6, "", ("169.254.169.254", 0)),
-        ]):
-            assert is_safe_url("http://169.254.169.254/latest/meta-data/") is False
-
-    def test_metadata_ipv6_blocked_even_with_toggle(self, monkeypatch):
-        """fd00:ec2::254 (AWS IPv6 metadata) is ALWAYS blocked."""
-        monkeypatch.setenv("HERMES_ALLOW_PRIVATE_URLS", "true")
-        with patch("socket.getaddrinfo", return_value=[
-            (10, 1, 6, "", ("fd00:ec2::254", 0, 0, 0)),
-        ]):
-            assert is_safe_url("http://[fd00:ec2::254]/latest/") is False
-
-    def test_ecs_metadata_blocked_even_with_toggle(self, monkeypatch):
-        """169.254.170.2 (AWS ECS task metadata) is ALWAYS blocked."""
-        monkeypatch.setenv("HERMES_ALLOW_PRIVATE_URLS", "true")
-        with patch("socket.getaddrinfo", return_value=[
-            (2, 1, 6, "", ("169.254.170.2", 0)),
-        ]):
-            assert is_safe_url("http://169.254.170.2/v2/credentials") is False
-
-    def test_alibaba_metadata_blocked_even_with_toggle(self, monkeypatch):
-        """100.100.100.200 (Alibaba Cloud metadata) is ALWAYS blocked."""
-        monkeypatch.setenv("HERMES_ALLOW_PRIVATE_URLS", "true")
-        with patch("socket.getaddrinfo", return_value=[
-            (2, 1, 6, "", ("100.100.100.200", 0)),
-        ]):
-            assert is_safe_url("http://100.100.100.200/latest/meta-data/") is False
-
-    def test_azure_wire_server_blocked_even_with_toggle(self, monkeypatch):
-        """169.254.169.253 (Azure IMDS wire server) is ALWAYS blocked."""
-        monkeypatch.setenv("HERMES_ALLOW_PRIVATE_URLS", "true")
-        with patch("socket.getaddrinfo", return_value=[
-            (2, 1, 6, "", ("169.254.169.253", 0)),
-        ]):
-            assert is_safe_url("http://169.254.169.253/") is False
-
-    def test_entire_link_local_blocked_even_with_toggle(self, monkeypatch):
-        """Any 169.254.x.x address is ALWAYS blocked (entire link-local range)."""
-        monkeypatch.setenv("HERMES_ALLOW_PRIVATE_URLS", "true")
-        with patch("socket.getaddrinfo", return_value=[
-            (2, 1, 6, "", ("169.254.42.99", 0)),
-        ]):
-            assert is_safe_url("http://169.254.42.99/anything") is False
-
-    def test_dns_failure_still_blocked_with_toggle(self, monkeypatch):
-        """DNS failures are still blocked even with toggle on."""
-        monkeypatch.setenv("HERMES_ALLOW_PRIVATE_URLS", "true")
-        with patch("socket.getaddrinfo", side_effect=socket.gaierror("fail")):
-            assert is_safe_url("https://nonexistent.example.com") is False
-
-    def test_empty_url_still_blocked_with_toggle(self, monkeypatch):
-        """Empty URLs are still blocked."""
-        monkeypatch.setenv("HERMES_ALLOW_PRIVATE_URLS", "true")
-        assert is_safe_url("") is False
@@ -1,48 +0,0 @@
-"""Regression test for #11884: _make_agent must resolve runtime provider.
-
-Without resolve_runtime_provider(), bare-slug models in config
-(e.g. ``claude-opus-4-6`` with ``model.provider: anthropic``) leave
-provider/base_url/api_key empty in AIAgent, causing HTTP 404.
-"""
-
-from unittest.mock import MagicMock, patch
-
-
-def test_make_agent_passes_resolved_provider():
-    """_make_agent forwards provider/base_url/api_key/api_mode from
-    resolve_runtime_provider to AIAgent."""
-
-    fake_runtime = {
-        "provider": "anthropic",
-        "base_url": "https://api.anthropic.com",
-        "api_key": "sk-test-key",
-        "api_mode": "anthropic_messages",
-        "command": None,
-        "args": None,
-        "credential_pool": None,
-    }
-
-    fake_cfg = {
-        "model": {"default": "claude-opus-4-6", "provider": "anthropic"},
-        "agent": {"system_prompt": "test"},
-    }
-
-    with patch("tui_gateway.server._load_cfg", return_value=fake_cfg), \
-         patch("tui_gateway.server._get_db", return_value=MagicMock()), \
-         patch("tui_gateway.server._load_tool_progress_mode", return_value="compact"), \
-         patch("tui_gateway.server._load_reasoning_config", return_value=None), \
-         patch("tui_gateway.server._load_service_tier", return_value=None), \
-         patch("tui_gateway.server._load_enabled_toolsets", return_value=None), \
-         patch("hermes_cli.runtime_provider.resolve_runtime_provider", return_value=fake_runtime) as mock_resolve, \
-         patch("run_agent.AIAgent") as mock_agent:
-
-        from tui_gateway.server import _make_agent
-        _make_agent("sid-1", "key-1")
-
-        mock_resolve.assert_called_once_with(requested=None)
-
-        call_kwargs = mock_agent.call_args
-        assert call_kwargs.kwargs["provider"] == "anthropic"
-        assert call_kwargs.kwargs["base_url"] == "https://api.anthropic.com"
-        assert call_kwargs.kwargs["api_key"] == "sk-test-key"
-        assert call_kwargs.kwargs["api_mode"] == "anthropic_messages"
@@ -402,7 +402,7 @@ def _browser_cdp_check() -> bool:

 registry.register(
    name="browser_cdp",
-    toolset="browser-cdp",
+    toolset="browser",
    schema=BROWSER_CDP_SCHEMA,
    handler=lambda args, **kw: browser_cdp(
        method=args.get("method", ""),
@@ -1182,15 +1182,6 @@ def _run_browser_command(
        # used during CLI discovery.
        browser_env["PATH"] = _merge_browser_path(browser_env.get("PATH", ""))
        browser_env["AGENT_BROWSER_SOCKET_DIR"] = task_socket_dir
-
-        # Tell the agent-browser daemon to self-terminate after being idle
-        # for our configured inactivity timeout.  This is the daemon-side
-        # counterpart to our Python-side _cleanup_inactive_browser_sessions
-        # — the daemon kills itself and its Chrome children when no CLI
-        # commands arrive within the window.  Added in agent-browser 0.24.
-        if "AGENT_BROWSER_IDLE_TIMEOUT_MS" not in browser_env:
-            idle_ms = str(BROWSER_SESSION_INACTIVITY_TIMEOUT * 1000)
-            browser_env["AGENT_BROWSER_IDLE_TIMEOUT_MS"] = idle_ms
        
        # Use temp files for stdout/stderr instead of pipes.
        # agent-browser starts a background daemon that inherits file
@@ -979,7 +979,6 @@ def execute_code(
        # --- Start UDS server ---
        server_sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server_sock.bind(sock_path)
-        os.chmod(sock_path, 0o600)
        server_sock.listen(1)

        rpc_thread = threading.Thread(
@@ -383,7 +383,7 @@ class BaseEnvironment(ABC):
        quoted_cwd = (
            shlex.quote(cwd) if cwd != "~" and not cwd.startswith("~/") else cwd
        )
-        parts.append(f"builtin cd {quoted_cwd} || exit 126")
+        parts.append(f"cd {quoted_cwd} || exit 126")

        # Run the actual command
        parts.append(f"eval '{escaped}'")
@@ -148,10 +148,6 @@ def find_docker() -> Optional[str]:
 # We drop all capabilities then add back the minimum needed:
 #   DAC_OVERRIDE - root can write to bind-mounted dirs owned by host user
 #   CHOWN/FOWNER - package managers (pip, npm, apt) need to set file ownership
-#   SETUID/SETGID - the image entrypoint drops from root to the 'hermes'
-#       user via `gosu`, which requires these caps. Combined with
-#       `no-new-privileges`, gosu still cannot escalate back to root after
-#       the drop, so the security posture is preserved.
 # Block privilege escalation and limit PIDs.
 # /tmp is size-limited and nosuid but allows exec (needed by pip/npm builds).
 _SECURITY_ARGS = [
@@ -159,8 +155,6 @@ _SECURITY_ARGS = [
    "--cap-add", "DAC_OVERRIDE",
    "--cap-add", "CHOWN",
    "--cap-add", "FOWNER",
-    "--cap-add", "SETUID",
-    "--cap-add", "SETGID",
    "--security-opt", "no-new-privileges",
    "--pids-limit", "256",
    "--tmpfs", "/tmp:rw,nosuid,size=512m",
@@ -349,7 +349,6 @@ class LocalEnvironment(BaseEnvironment):
            stderr=subprocess.STDOUT,
            stdin=subprocess.PIPE if stdin_data is not None else subprocess.DEVNULL,
            preexec_fn=None if _IS_WINDOWS else os.setsid,
-            cwd=self.cwd,
        )

        if stdin_data is not None:
@@ -111,7 +111,7 @@ def load_env() -> Dict[str, str]:
    if not env_path.exists():
        return env_vars

-    with env_path.open(encoding="utf-8") as f:
+    with env_path.open() as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
@@ -72,48 +72,11 @@ from tools.tool_backend_helpers import (
 )


-def _safe_parse_import_env(
-    name: str,
-    default: Any,
-    converter,
-    type_label: str,
-):
-    """Parse module-level numeric env vars without breaking import.
-
-    Terminal tool is imported by CLI, ACP, tests, and tool discovery. A single
-    malformed env var must not make the whole module unloadable at import time.
-    """
-    raw = os.getenv(name)
-    if raw is None or raw == "":
-        return default
-    try:
-        return converter(raw)
-    except (TypeError, ValueError):
-        logger.warning(
-            "Invalid value for %s: %r (expected %s). Falling back to %r.",
-            name,
-            raw,
-            type_label,
-            default,
-        )
-        return default
-
-
 # Hard cap on foreground timeout; override via TERMINAL_MAX_FOREGROUND_TIMEOUT env var.
-FOREGROUND_MAX_TIMEOUT = _safe_parse_import_env(
-    "TERMINAL_MAX_FOREGROUND_TIMEOUT",
-    600,
-    int,
-    "integer",
-)
+FOREGROUND_MAX_TIMEOUT = int(os.getenv("TERMINAL_MAX_FOREGROUND_TIMEOUT", "600"))

 # Disk usage warning threshold (in GB)
-DISK_USAGE_WARNING_THRESHOLD_GB = _safe_parse_import_env(
-    "TERMINAL_DISK_WARNING_GB",
-    500.0,
-    float,
-    "number",
-)
+DISK_USAGE_WARNING_THRESHOLD_GB = float(os.getenv("TERMINAL_DISK_WARNING_GB", "500"))


 def _check_disk_usage_warning():
@@ -1547,8 +1510,6 @@ def terminal_tool(
                                "modal_mode": config.get("modal_mode", "auto"),
                                "docker_volumes": config.get("docker_volumes", []),
                                "docker_mount_cwd_to_workspace": config.get("docker_mount_cwd_to_workspace", False),
-                                "docker_forward_env": config.get("docker_forward_env", []),
-                                "docker_env": config.get("docker_env", {}),
                            }

                        local_config = None
@@ -5,13 +5,6 @@ skill could trick the agent into fetching internal resources like cloud
 metadata endpoints (169.254.169.254), localhost services, or private
 network hosts.

-The check can be globally disabled via ``security.allow_private_urls: true``
-in config.yaml for environments where DNS resolves external domains to
-private/benchmark-range IPs (OpenWrt routers, corporate proxies, VPNs
-that use 198.18.0.0/15 or 100.64.0.0/10).  Even when disabled, cloud
-metadata hostnames (metadata.google.internal, 169.254.169.254) are
-**always** blocked — those are never legitimate agent targets.
-
 Limitations (documented, not fixable at pre-flight level):
  - DNS rebinding (TOCTOU): an attacker-controlled DNS server with TTL=0
    can return a public IP for the check, then a private IP for the actual
@@ -25,35 +18,17 @@ Limitations (documented, not fixable at pre-flight level):

 import ipaddress
 import logging
-import os
 import socket
 from urllib.parse import urlparse

 logger = logging.getLogger(__name__)

 # Hostnames that should always be blocked regardless of IP resolution
-# or any config toggle.  These are cloud metadata endpoints that an
-# attacker could use to steal instance credentials.
 _BLOCKED_HOSTNAMES = frozenset({
    "metadata.google.internal",
    "metadata.goog",
 })

-# IPs and networks that should always be blocked regardless of the
-# allow_private_urls toggle.  These are cloud metadata / credential
-# endpoints — the #1 SSRF target — and the link-local range where
-# they all live.
-_ALWAYS_BLOCKED_IPS = frozenset({
-    ipaddress.ip_address("169.254.169.254"),  # AWS/GCP/Azure/DO/Oracle metadata
-    ipaddress.ip_address("169.254.170.2"),     # AWS ECS task metadata (task IAM creds)
-    ipaddress.ip_address("169.254.169.253"),   # Azure IMDS wire server
-    ipaddress.ip_address("fd00:ec2::254"),     # AWS metadata (IPv6)
-    ipaddress.ip_address("100.100.100.200"),   # Alibaba Cloud metadata
-})
-_ALWAYS_BLOCKED_NETWORKS = (
-    ipaddress.ip_network("169.254.0.0/16"),    # Entire link-local range (no legit agent target)
-)
-
 # Exact HTTPS hostnames allowed to resolve to private/benchmark-space IPs.
 # This is intentionally narrow: QQ media downloads can legitimately resolve
 # to 198.18.0.0/15 behind local proxy/benchmark infrastructure.
@@ -67,67 +42,6 @@ _TRUSTED_PRIVATE_IP_HOSTS = frozenset({
 # VPNs, and some cloud internal networks.
 _CGNAT_NETWORK = ipaddress.ip_network("100.64.0.0/10")

-# ---------------------------------------------------------------------------
-# Global toggle: allow private/internal IP resolution
-# ---------------------------------------------------------------------------
-# Cached after first read so we don't hit the filesystem on every URL check.
-_allow_private_resolved = False
-_cached_allow_private: bool = False
-
-
-def _global_allow_private_urls() -> bool:
-    """Return True when the user has opted out of private-IP blocking.
-
-    Checks (in priority order):
-    1. ``HERMES_ALLOW_PRIVATE_URLS`` env var  (``true``/``1``/``yes``)
-    2. ``security.allow_private_urls`` in config.yaml
-    3. ``browser.allow_private_urls`` in config.yaml  (legacy / backward compat)
-
-    Result is cached for the process lifetime.
-    """
-    global _allow_private_resolved, _cached_allow_private
-    if _allow_private_resolved:
-        return _cached_allow_private
-
-    _allow_private_resolved = True
-    _cached_allow_private = False  # safe default
-
-    # 1. Env var override (highest priority)
-    env_val = os.getenv("HERMES_ALLOW_PRIVATE_URLS", "").strip().lower()
-    if env_val in ("true", "1", "yes"):
-        _cached_allow_private = True
-        return _cached_allow_private
-    if env_val in ("false", "0", "no"):
-        # Explicit false — don't fall through to config
-        return _cached_allow_private
-
-    # 2. Config file
-    try:
-        from hermes_cli.config import read_raw_config
-        cfg = read_raw_config()
-        # security.allow_private_urls (preferred)
-        sec = cfg.get("security", {})
-        if isinstance(sec, dict) and sec.get("allow_private_urls"):
-            _cached_allow_private = True
-            return _cached_allow_private
-        # browser.allow_private_urls (legacy fallback)
-        browser = cfg.get("browser", {})
-        if isinstance(browser, dict) and browser.get("allow_private_urls"):
-            _cached_allow_private = True
-            return _cached_allow_private
-    except Exception:
-        # Config unavailable (e.g. tests, early import) — keep default
-        pass
-
-    return _cached_allow_private
-
-
-def _reset_allow_private_cache() -> None:
-    """Reset the cached toggle — only for tests."""
-    global _allow_private_resolved, _cached_allow_private
-    _allow_private_resolved = False
-    _cached_allow_private = False
-

 def _is_blocked_ip(ip: ipaddress.IPv4Address | ipaddress.IPv6Address) -> bool:
    """Return True if the IP should be blocked for SSRF protection."""
@@ -151,11 +65,6 @@ def is_safe_url(url: str) -> bool:

    Resolves the hostname to an IP and checks against private ranges.
    Fails closed: DNS errors and unexpected exceptions block the request.
-
-    When ``security.allow_private_urls`` is enabled (or the env var
-    ``HERMES_ALLOW_PRIVATE_URLS=true``), private-IP blocking is skipped.
-    Cloud metadata endpoints (169.254.169.254, metadata.google.internal)
-    remain blocked regardless — they are never legitimate agent targets.
    """
    try:
        parsed = urlparse(url)
@@ -164,14 +73,11 @@ def is_safe_url(url: str) -> bool:
        if not hostname:
            return False

-        # Block known internal hostnames — ALWAYS, even with toggle on
+        # Block known internal hostnames
        if hostname in _BLOCKED_HOSTNAMES:
            logger.warning("Blocked request to internal hostname: %s", hostname)
            return False

-        # Check the global toggle AFTER blocking metadata hostnames
-        allow_all_private = _global_allow_private_urls()
-
        allow_private_ip = _allows_private_ip_resolution(hostname, scheme)

        # Try to resolve and check IP
@@ -190,27 +96,14 @@ def is_safe_url(url: str) -> bool:
            except ValueError:
                continue

-            # Always block cloud metadata IPs and link-local, even with toggle on
-            if ip in _ALWAYS_BLOCKED_IPS or any(ip in net for net in _ALWAYS_BLOCKED_NETWORKS):
-                logger.warning(
-                    "Blocked request to cloud metadata address: %s -> %s",
-                    hostname, ip_str,
-                )
-                return False
-
-            if not allow_all_private and not allow_private_ip and _is_blocked_ip(ip):
+            if not allow_private_ip and _is_blocked_ip(ip):
                logger.warning(
                    "Blocked request to private/internal address: %s -> %s",
                    hostname, ip_str,
                )
                return False

-        if allow_all_private:
-            logger.debug(
-                "Allowing private/internal resolution (security.allow_private_urls=true): %s",
-                hostname,
-            )
-        elif allow_private_ip:
+        if allow_private_ip:
            logger.debug(
                "Allowing trusted hostname despite private/internal resolution: %s",
                hostname,
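
Note on the hunks above: with the global toggle and the always-blocked lists removed, the per-IP decision in `is_safe_url` comes down to the single condition `not allow_private_ip and _is_blocked_ip(ip)`. The sketch below illustrates that condition in isolation. It is a hedged approximation: `is_blocked_ip` and `resolved_ips_are_safe` are hypothetical names, the body of the real `_is_blocked_ip` is not shown in this diff, and the set of `ipaddress` properties checked here is an assumption. DNS resolution is left out so the example runs offline.

```python
import ipaddress

# Assumption: mirrors the _CGNAT_NETWORK constant visible in the diff context.
CGNAT_NETWORK = ipaddress.ip_network("100.64.0.0/10")


def is_blocked_ip(ip):
    """Hypothetical stand-in for _is_blocked_ip (real body not shown in the diff)."""
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
        or ip in CGNAT_NETWORK
    )


def resolved_ips_are_safe(ip_strings, allow_private_ip=False):
    """Apply the post-change per-IP check to already-resolved addresses.

    allow_private_ip corresponds to the trusted-hostname exemption
    (_allows_private_ip_resolution in the diff); no global toggle exists.
    """
    for ip_str in ip_strings:
        try:
            ip = ipaddress.ip_address(ip_str)
        except ValueError:
            # Unparseable entries are skipped, as in the diff's loop.
            continue
        if not allow_private_ip and is_blocked_ip(ip):
            return False
    return True
```

Under these assumptions, a metadata address such as 169.254.169.254 is rejected only because it falls in the link-local range, and a hostname granted the trusted exemption skips the IP check entirely.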