fix(gateway): scope /yolo to the active session

fix(config): allow HERMES_HOME_MODE env var to override _secure_dir() permissions (#6993)
Operators running a web server (nginx, caddy) that needs to traverse ~/.hermes/ can now set HERMES_HOME_MODE=0701 (or any octal mode) instead of having _secure_dir() revert their manual chmod on every gateway restart. Default behavior (0o700) is unchanged. Fixes #6991. Contributed by @ygd58.
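
A minimal sketch of how such an override could work, assuming the value is parsed as octal and that invalid or empty values fall back to the default. The names `resolve_home_mode` and `secure_dir` are hypothetical illustrations; this is not the actual `_secure_dir()` implementation from the repository.

```python
import os
from pathlib import Path

DEFAULT_MODE = 0o700  # default stated in the commit message


def resolve_home_mode(env=None) -> int:
    """Hypothetical helper: read HERMES_HOME_MODE as an octal string.

    Falls back to DEFAULT_MODE when the variable is unset, empty,
    not valid octal, or outside the 0..0o777 permission range
    (the fallback policy is an assumption of this sketch).
    """
    env = os.environ if env is None else env
    raw = (env.get("HERMES_HOME_MODE") or "").strip()
    if not raw:
        return DEFAULT_MODE
    try:
        mode = int(raw, 8)  # accepts "0701", "701", and "0o701"
    except ValueError:
        return DEFAULT_MODE
    return mode if 0 <= mode <= 0o777 else DEFAULT_MODE


def secure_dir(path: Path, env=None) -> int:
    """Hypothetical stand-in: create *path* and apply the resolved mode."""
    path.mkdir(parents=True, exist_ok=True)
    mode = resolve_home_mode(env)
    os.chmod(path, mode)
    return mode
```

With `HERMES_HOME_MODE=0701` the directory keeps owner-only read/write while granting others the execute (traverse) bit, which is what a reverse proxy needs to reach a file it is otherwise permitted to read.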
2026-04-10 03:09:02 -07:00 · 2026-04-10 03:00:15 -07:00 · 2026-04-10 03:00:12 -07:00 · 2026-04-10 03:00:12 -07:00 · 2026-04-10 02:59:02 -07:00 · 2026-04-10 02:58:54 -07:00
109 changed files with 7071 additions and 607 deletions
@@ -13,7 +13,8 @@ COPY . /opt/hermes
 WORKDIR /opt/hermes

 # Install Python and Node dependencies in one layer, no cache
-RUN pip install --no-cache-dir -e ".[all]" --break-system-packages && \
+RUN pip install --no-cache-dir uv --break-system-packages && \
+    uv pip install --system --break-system-packages --no-cache -e ".[all]" && \
    npm install --prefer-offline --no-audit && \
    npx playwright install --with-deps chromium --only-shell && \
    cd /opt/hermes/scripts/whatsapp-bridge && \
@@ -33,8 +33,10 @@ Use any model you want — [Nous Portal](https://portal.nousresearch.com), [Open
 curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
 ```

-Works on Linux, macOS, and WSL2. The installer handles everything — Python, Node.js, dependencies, and the `hermes` command. No prerequisites except git.
+Works on Linux, macOS, WSL2, and Android via Termux. The installer handles the platform-specific setup for you.

+> **Android / Termux:** The tested manual path is documented in the [Termux guide](https://hermes-agent.nousresearch.com/docs/getting-started/termux). On Termux, Hermes installs a curated `.[termux]` extra because the full `.[all]` extra currently pulls Android-incompatible voice dependencies.
+>
 > **Windows:** Native Windows is not supported. Please install [WSL2](https://learn.microsoft.com/en-us/windows/wsl/install) and run the command above.

 After installation:
@@ -451,14 +451,13 @@ class HermesACPAgent(acp.Agent):
            await conn.session_update(session_id, update)

        usage = None
-        usage_data = result.get("usage")
-        if usage_data and isinstance(usage_data, dict):
+        if any(result.get(key) is not None for key in ("prompt_tokens", "completion_tokens", "total_tokens")):
            usage = Usage(
-                input_tokens=usage_data.get("prompt_tokens", 0),
-                output_tokens=usage_data.get("completion_tokens", 0),
-                total_tokens=usage_data.get("total_tokens", 0),
-                thought_tokens=usage_data.get("reasoning_tokens"),
-                cached_read_tokens=usage_data.get("cached_tokens"),
+                input_tokens=result.get("prompt_tokens", 0),
+                output_tokens=result.get("completion_tokens", 0),
+                total_tokens=result.get("total_tokens", 0),
+                thought_tokens=result.get("reasoning_tokens"),
+                cached_read_tokens=result.get("cache_read_tokens"),
            )

        stop_reason = "cancelled" if state.cancel_event and state.cancel_event.is_set() else "end_turn"
@@ -74,8 +74,11 @@ def _get_anthropic_max_output(model: str) -> int:
    model IDs (claude-sonnet-4-5-20250929) and variant suffixes (:1m, :fast)
    resolve correctly.  Longest-prefix match wins to avoid e.g. "claude-3-5"
    matching before "claude-3-5-sonnet".
+
+    Normalizes dots to hyphens so that model names like
+    ``anthropic/claude-opus-4.6`` match the ``claude-opus-4-6`` table key.
    """
-    m = model.lower()
+    m = model.lower().replace(".", "-")
    best_key = ""
    best_val = _ANTHROPIC_DEFAULT_OUTPUT_LIMIT
    for key, val in _ANTHROPIC_OUTPUT_LIMITS.items():
@@ -95,6 +98,15 @@ _COMMON_BETAS = [
    "interleaved-thinking-2025-05-14",
    "fine-grained-tool-streaming-2025-05-14",
 ]
+# MiniMax's Anthropic-compatible endpoints fail tool-use requests when
+# the fine-grained tool streaming beta is present.  Omit it so tool calls
+# fall back to the provider's default response path.
+_TOOL_STREAMING_BETA = "fine-grained-tool-streaming-2025-05-14"
+
+# Fast mode beta — enables the ``speed: "fast"`` request parameter for
+# significantly higher output token throughput on Opus 4.6 (~2.5x).
+# See https://platform.claude.com/docs/en/build-with-claude/fast-mode
+_FAST_MODE_BETA = "fast-mode-2026-02-01"

 # Additional beta headers required for OAuth/subscription auth.
 # Matches what Claude Code (and pi-ai / OpenCode) send.
@@ -204,6 +216,19 @@ def _requires_bearer_auth(base_url: str | None) -> bool:
    return normalized.startswith(("https://api.minimax.io/anthropic", "https://api.minimaxi.com/anthropic"))


+def _common_betas_for_base_url(base_url: str | None) -> list[str]:
+    """Return the beta headers that are safe for the configured endpoint.
+
+    MiniMax's Anthropic-compatible endpoints (Bearer-auth) reject requests
+    that include Anthropic's ``fine-grained-tool-streaming`` beta — every
+    tool-use message triggers a connection error.  Strip that beta for
+    Bearer-auth endpoints while keeping all other betas intact.
+    """
+    if _requires_bearer_auth(base_url):
+        return [b for b in _COMMON_BETAS if b != _TOOL_STREAMING_BETA]
+    return _COMMON_BETAS
+
+
 def build_anthropic_client(api_key: str, base_url: str = None):
    """Create an Anthropic client, auto-detecting setup-tokens vs API keys.

@@ -222,6 +247,7 @@ def build_anthropic_client(api_key: str, base_url: str = None):
    }
    if normalized_base_url:
        kwargs["base_url"] = normalized_base_url
+    common_betas = _common_betas_for_base_url(normalized_base_url)

    if _requires_bearer_auth(normalized_base_url):
        # Some Anthropic-compatible providers (e.g. MiniMax) expect the API key in
@@ -231,21 +257,21 @@ def build_anthropic_client(api_key: str, base_url: str = None):
        # not use Anthropic's sk-ant-api prefix and would otherwise be misread as
        # Anthropic OAuth/setup tokens.
        kwargs["auth_token"] = api_key
-        if _COMMON_BETAS:
-            kwargs["default_headers"] = {"anthropic-beta": ",".join(_COMMON_BETAS)}
+        if common_betas:
+            kwargs["default_headers"] = {"anthropic-beta": ",".join(common_betas)}
    elif _is_third_party_anthropic_endpoint(base_url):
        # Third-party proxies (Azure AI Foundry, AWS Bedrock, etc.) use their
        # own API keys with x-api-key auth. Skip OAuth detection — their keys
        # don't follow Anthropic's sk-ant-* prefix convention and would be
        # misclassified as OAuth tokens.
        kwargs["api_key"] = api_key
-        if _COMMON_BETAS:
-            kwargs["default_headers"] = {"anthropic-beta": ",".join(_COMMON_BETAS)}
+        if common_betas:
+            kwargs["default_headers"] = {"anthropic-beta": ",".join(common_betas)}
    elif _is_oauth_token(api_key):
        # OAuth access token / setup-token → Bearer auth + Claude Code identity.
        # Anthropic routes OAuth requests based on user-agent and headers;
        # without Claude Code's fingerprint, requests get intermittent 500s.
-        all_betas = _COMMON_BETAS + _OAUTH_ONLY_BETAS
+        all_betas = common_betas + _OAUTH_ONLY_BETAS
        kwargs["auth_token"] = api_key
        kwargs["default_headers"] = {
            "anthropic-beta": ",".join(all_betas),
@@ -255,8 +281,8 @@ def build_anthropic_client(api_key: str, base_url: str = None):
    else:
        # Regular API key → x-api-key header + common betas
        kwargs["api_key"] = api_key
-        if _COMMON_BETAS:
-            kwargs["default_headers"] = {"anthropic-beta": ",".join(_COMMON_BETAS)}
+        if common_betas:
+            kwargs["default_headers"] = {"anthropic-beta": ",".join(common_betas)}

    return _anthropic_sdk.Anthropic(**kwargs)

@@ -1235,6 +1261,7 @@ def build_anthropic_kwargs(
    preserve_dots: bool = False,
    context_length: Optional[int] = None,
    base_url: str | None = None,
+    fast_mode: bool = False,
 ) -> Dict[str, Any]:
    """Build kwargs for anthropic.messages.create().

@@ -1268,6 +1295,10 @@ def build_anthropic_kwargs(

    When *base_url* points to a third-party Anthropic-compatible endpoint,
    thinking block signatures are stripped (they are Anthropic-proprietary).
+
+    When *fast_mode* is True, adds ``speed: "fast"`` and the fast-mode beta
+    header for ~2.5x faster output throughput on Opus 4.6.  Currently only
+    supported on native Anthropic endpoints (not third-party compatible ones).
    """
    system, anthropic_messages = convert_messages_to_anthropic(messages, base_url=base_url)
    anthropic_tools = convert_tools_to_anthropic(tools) if tools else []
@@ -1366,6 +1397,20 @@ def build_anthropic_kwargs(
                kwargs["temperature"] = 1
                kwargs["max_tokens"] = max(effective_max_tokens, budget + 4096)

+    # ── Fast mode (Opus 4.6 only) ────────────────────────────────────
+    # Adds speed:"fast" + the fast-mode beta header for ~2.5x output speed.
+    # Only for native Anthropic endpoints — third-party providers would
+    # reject the unknown beta header and speed parameter.
+    if fast_mode and not _is_third_party_anthropic_endpoint(base_url):
+        kwargs["speed"] = "fast"
+        # Build extra_headers with ALL applicable betas (the per-request
+        # extra_headers override the client-level anthropic-beta header).
+        betas = list(_common_betas_for_base_url(base_url))
+        if is_oauth:
+            betas.extend(_OAUTH_ONLY_BETAS)
+        betas.append(_FAST_MODE_BETA)
+        kwargs["extra_headers"] = {"anthropic-beta": ",".join(betas)}
+
    return kwargs


@@ -1427,4 +1472,4 @@ def normalize_anthropic_response(
            reasoning_details=reasoning_details or None,
        ),
        finish_reason,
-    )
+    )
@@ -702,7 +702,7 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
            logger.debug("Auxiliary text client: %s (%s) via pool", pconfig.name, model)
            extra = {}
            if "api.kimi.com" in base_url.lower():
-                extra["default_headers"] = {"User-Agent": "KimiCLI/1.3"}
+                extra["default_headers"] = {"User-Agent": "KimiCLI/1.30.0"}
            elif "api.githubcopilot.com" in base_url.lower():
                from hermes_cli.models import copilot_default_headers

@@ -721,7 +721,7 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
        logger.debug("Auxiliary text client: %s (%s)", pconfig.name, model)
        extra = {}
        if "api.kimi.com" in base_url.lower():
-            extra["default_headers"] = {"User-Agent": "KimiCLI/1.3"}
+            extra["default_headers"] = {"User-Agent": "KimiCLI/1.30.0"}
        elif "api.githubcopilot.com" in base_url.lower():
            from hermes_cli.models import copilot_default_headers

@@ -1195,7 +1195,7 @@ def _to_async_client(sync_client, model: str):

        async_kwargs["default_headers"] = copilot_default_headers()
    elif "api.kimi.com" in base_lower:
-        async_kwargs["default_headers"] = {"User-Agent": "KimiCLI/1.3"}
+        async_kwargs["default_headers"] = {"User-Agent": "KimiCLI/1.30.0"}
    return AsyncOpenAI(**async_kwargs), model


@@ -1317,7 +1317,7 @@ def resolve_provider_client(
            final_model = model or _read_main_model() or "gpt-4o-mini"
            extra = {}
            if "api.kimi.com" in custom_base.lower():
-                extra["default_headers"] = {"User-Agent": "KimiCLI/1.3"}
+                extra["default_headers"] = {"User-Agent": "KimiCLI/1.30.0"}
            elif "api.githubcopilot.com" in custom_base.lower():
                from hermes_cli.models import copilot_default_headers
                extra["default_headers"] = copilot_default_headers()
@@ -1400,7 +1400,7 @@ def resolve_provider_client(
        # Provider-specific headers
        headers = {}
        if "api.kimi.com" in base_url.lower():
-            headers["User-Agent"] = "KimiCLI/1.3"
+            headers["User-Agent"] = "KimiCLI/1.30.0"
        elif "api.githubcopilot.com" in base_url.lower():
            from hermes_cli.models import copilot_default_headers

@@ -20,6 +20,7 @@ from hermes_cli.auth import (
    DEFAULT_AGENT_KEY_MIN_TTL_SECONDS,
    KIMI_CODE_BASE_URL,
    PROVIDER_REGISTRY,
+    _auth_store_lock,
    _codex_access_token_is_expiring,
    _decode_jwt_claims,
    _import_codex_cli_tokens,
@@ -27,6 +28,8 @@ from hermes_cli.auth import (
    _load_provider_state,
    _resolve_kimi_base_url,
    _resolve_zai_base_url,
+    _save_auth_store,
+    _save_provider_state,
    read_credential_pool,
    write_credential_pool,
 )
@@ -479,6 +482,67 @@ class CredentialPool:
            logger.debug("Failed to sync from ~/.codex/auth.json: %s", exc)
        return entry

+    def _sync_device_code_entry_to_auth_store(self, entry: PooledCredential) -> None:
+        """Write refreshed pool entry tokens back to auth.json providers.
+
+        After a pool-level refresh, the pool entry has fresh tokens but
+        auth.json's ``providers.<id>`` still holds the pre-refresh state.
+        On the next ``load_pool()``, ``_seed_from_singletons()`` reads that
+        stale state and can overwrite the fresh pool entry — potentially
+        re-seeding a consumed single-use refresh token.
+
+        Applies to any OAuth provider whose singleton lives in auth.json
+        (currently Nous and OpenAI Codex).
+        """
+        if entry.source != "device_code":
+            return
+        try:
+            with _auth_store_lock():
+                auth_store = _load_auth_store()
+                if self.provider == "nous":
+                    state = _load_provider_state(auth_store, "nous")
+                    if state is None:
+                        return
+                    state["access_token"] = entry.access_token
+                    if entry.refresh_token:
+                        state["refresh_token"] = entry.refresh_token
+                    if entry.expires_at:
+                        state["expires_at"] = entry.expires_at
+                    if entry.agent_key:
+                        state["agent_key"] = entry.agent_key
+                    if entry.agent_key_expires_at:
+                        state["agent_key_expires_at"] = entry.agent_key_expires_at
+                    for extra_key in ("obtained_at", "expires_in", "agent_key_id",
+                                      "agent_key_expires_in", "agent_key_reused",
+                                      "agent_key_obtained_at"):
+                        val = entry.extra.get(extra_key)
+                        if val is not None:
+                            state[extra_key] = val
+                    if entry.inference_base_url:
+                        state["inference_base_url"] = entry.inference_base_url
+                    _save_provider_state(auth_store, "nous", state)
+
+                elif self.provider == "openai-codex":
+                    state = _load_provider_state(auth_store, "openai-codex")
+                    if not isinstance(state, dict):
+                        return
+                    tokens = state.get("tokens")
+                    if not isinstance(tokens, dict):
+                        return
+                    tokens["access_token"] = entry.access_token
+                    if entry.refresh_token:
+                        tokens["refresh_token"] = entry.refresh_token
+                    if entry.last_refresh:
+                        state["last_refresh"] = entry.last_refresh
+                    _save_provider_state(auth_store, "openai-codex", state)
+
+                else:
+                    return
+
+                _save_auth_store(auth_store)
+        except Exception as exc:
+            logger.debug("Failed to sync %s pool entry back to auth store: %s", self.provider, exc)
+
    def _refresh_entry(self, entry: PooledCredential, *, force: bool) -> Optional[PooledCredential]:
        if entry.auth_type != AUTH_TYPE_OAUTH or not entry.refresh_token:
            if force:
@@ -513,6 +577,13 @@ class CredentialPool:
                    except Exception as wexc:
                        logger.debug("Failed to write refreshed token to credentials file: %s", wexc)
            elif self.provider == "openai-codex":
+                # Proactively sync from ~/.codex/auth.json before refresh.
+                # The Codex CLI (or another Hermes profile) may have already
+                # consumed our refresh_token.  Syncing first avoids a
+                # "refresh_token_reused" error when the CLI has a newer pair.
+                synced = self._sync_codex_entry_from_cli(entry)
+                if synced is not entry:
+                    entry = synced
                refreshed = auth_mod.refresh_codex_oauth_pure(
                    entry.access_token,
                    entry.refresh_token,
@@ -598,6 +669,37 @@ class CredentialPool:
                    # Credentials file had a valid (non-expired) token — use it directly
                    logger.debug("Credentials file has valid token, using without refresh")
                    return synced
+            # For openai-codex: the refresh_token may have been consumed by
+            # the Codex CLI between our proactive sync and the refresh call.
+            # Re-sync and retry once.
+            if self.provider == "openai-codex":
+                synced = self._sync_codex_entry_from_cli(entry)
+                if synced.refresh_token != entry.refresh_token:
+                    logger.debug("Retrying Codex refresh with synced token from ~/.codex/auth.json")
+                    try:
+                        refreshed = auth_mod.refresh_codex_oauth_pure(
+                            synced.access_token,
+                            synced.refresh_token,
+                        )
+                        updated = replace(
+                            synced,
+                            access_token=refreshed["access_token"],
+                            refresh_token=refreshed["refresh_token"],
+                            last_refresh=refreshed.get("last_refresh"),
+                            last_status=STATUS_OK,
+                            last_status_at=None,
+                            last_error_code=None,
+                        )
+                        self._replace_entry(synced, updated)
+                        self._persist()
+                        self._sync_device_code_entry_to_auth_store(updated)
+                        return updated
+                    except Exception as retry_exc:
+                        logger.debug("Codex retry refresh also failed: %s", retry_exc)
+                elif not self._entry_needs_refresh(synced):
+                    logger.debug("Codex CLI has valid token, using without refresh")
+                    self._sync_device_code_entry_to_auth_store(synced)
+                    return synced
            self._mark_exhausted(entry, None)
            return None

@@ -612,6 +714,10 @@ class CredentialPool:
        )
        self._replace_entry(entry, updated)
        self._persist()
+        # Sync refreshed tokens back to auth.json providers so that
+        # _seed_from_singletons() on the next load_pool() sees fresh state
+        # instead of re-seeding stale/consumed tokens.
+        self._sync_device_code_entry_to_auth_store(updated)
        return updated

    def _entry_needs_refresh(self, entry: PooledCredential) -> bool:
@@ -677,6 +677,27 @@ def _classify_by_message(
            should_compress=True,
        )

+    # Usage-limit patterns need the same disambiguation as 402: some providers
+    # surface "usage limit" errors without an HTTP status code.  A transient
+    # signal ("try again", "resets at", …) means it's a periodic quota, not
+    # billing exhaustion.
+    has_usage_limit = any(p in error_msg for p in _USAGE_LIMIT_PATTERNS)
+    if has_usage_limit:
+        has_transient_signal = any(p in error_msg for p in _USAGE_LIMIT_TRANSIENT_SIGNALS)
+        if has_transient_signal:
+            return result_fn(
+                FailoverReason.rate_limit,
+                retryable=True,
+                should_rotate_credential=True,
+                should_fallback=True,
+            )
+        return result_fn(
+            FailoverReason.billing,
+            retryable=False,
+            should_rotate_credential=True,
+            should_fallback=True,
+        )
+
    # Billing patterns
    if any(p in error_msg for p in _BILLING_PATTERNS):
        return result_fn(
@@ -704,10 +725,14 @@ def _classify_by_message(
        )

    # Auth patterns
+    # Auth errors should NOT be retried directly — the credential is invalid and
+    # retrying with the same key will always fail.  Set retryable=False so the
+    # caller triggers credential rotation (should_rotate_credential=True) or
+    # provider fallback rather than an immediate retry loop.
    if any(p in error_msg for p in _AUTH_PATTERNS):
        return result_fn(
            FailoverReason.auth,
-            retryable=True,
+            retryable=False,
            should_rotate_credential=True,
        )

@@ -120,6 +120,18 @@ def _parse_reasoning_config(effort: str) -> dict | None:
    return result


+def _parse_service_tier_config(raw: str) -> str | None:
+    """Parse a persisted service-tier preference into a Responses API value."""
+    value = str(raw or "").strip().lower()
+    if not value or value in {"normal", "default", "standard", "off", "none"}:
+        return None
+    if value in {"fast", "priority", "on"}:
+        return "priority"
+    logger.warning("Unknown service_tier '%s', ignoring", raw)
+    return None
+
+
+
 def _get_chrome_debug_candidates(system: str) -> list[str]:
    """Return likely browser executables for local CDP auto-launch."""
    candidates: list[str] = []
@@ -239,6 +251,7 @@ def load_cli_config() -> Dict[str, Any]:
            "system_prompt": "",
            "prefill_messages_file": "",
            "reasoning_effort": "",
+            "service_tier": "",
            "personalities": {
                "helpful": "You are a helpful, friendly AI assistant.",
                "concise": "You are a concise assistant. Keep responses brief and to the point.",
@@ -1008,7 +1021,7 @@ def _cprint(text: str):


 # ---------------------------------------------------------------------------
-# File-drop detection — extracted as a pure function for testability.
+# File-drop / local attachment detection — extracted as pure helpers for tests.
 # ---------------------------------------------------------------------------

 _IMAGE_EXTENSIONS = frozenset({
@@ -1017,12 +1030,103 @@ _IMAGE_EXTENSIONS = frozenset({
 })


-def _detect_file_drop(user_input: str) -> "dict | None":
-    """Detect if *user_input* is a dragged/pasted file path, not a slash command.
+from hermes_constants import is_termux as _is_termux_environment

-    When a user drags a file into the terminal, macOS pastes the absolute path
-    (e.g. ``/Users/roland/Desktop/file.png``) which starts with ``/`` and would
-    otherwise be mistaken for a slash command.
+
+def _termux_example_image_path(filename: str = "cat.png") -> str:
+    """Return a realistic example media path for the current Termux setup."""
+    candidates = [
+        os.path.expanduser("~/storage/shared"),
+        "/sdcard",
+        "/storage/emulated/0",
+        "/storage/self/primary",
+    ]
+    for root in candidates:
+        if os.path.isdir(root):
+            return os.path.join(root, "Pictures", filename)
+    return os.path.join("~/storage/shared", "Pictures", filename)
+
+
+def _split_path_input(raw: str) -> tuple[str, str]:
+    """Split a leading file path token from trailing free-form text.
+
+    Supports quoted paths and backslash-escaped spaces so callers can accept
+    inputs like:
+      /tmp/pic.png describe this
+      ~/storage/shared/My\ Photos/cat.png what is this?
+      "/storage/emulated/0/DCIM/Camera/cat 1.png" summarize
+    """
+    raw = str(raw or "").strip()
+    if not raw:
+        return "", ""
+
+    if raw[0] in {'"', "'"}:
+        quote = raw[0]
+        pos = 1
+        while pos < len(raw):
+            ch = raw[pos]
+            if ch == '\\' and pos + 1 < len(raw):
+                pos += 2
+                continue
+            if ch == quote:
+                token = raw[1:pos]
+                remainder = raw[pos + 1 :].strip()
+                return token, remainder
+            pos += 1
+        return raw[1:], ""
+
+    pos = 0
+    while pos < len(raw):
+        ch = raw[pos]
+        if ch == '\\' and pos + 1 < len(raw) and raw[pos + 1] == ' ':
+            pos += 2
+        elif ch == ' ':
+            break
+        else:
+            pos += 1
+
+    token = raw[:pos].replace('\\ ', ' ')
+    remainder = raw[pos:].strip()
+    return token, remainder
+
+
+def _resolve_attachment_path(raw_path: str) -> Path | None:
+    """Resolve a user-supplied local attachment path.
+
+    Accepts quoted or unquoted paths, expands ``~`` and env vars, and resolves
+    relative paths from ``TERMINAL_CWD`` when set (matching terminal tool cwd).
+    Returns ``None`` when the path does not resolve to an existing file.
+    """
+    token = str(raw_path or "").strip()
+    if not token:
+        return None
+
+    if (token.startswith('"') and token.endswith('"')) or (token.startswith("'") and token.endswith("'")):
+        token = token[1:-1].strip()
+    if not token:
+        return None
+
+    expanded = os.path.expandvars(os.path.expanduser(token))
+    path = Path(expanded)
+    if not path.is_absolute():
+        base_dir = Path(os.getenv("TERMINAL_CWD", os.getcwd()))
+        path = base_dir / path
+
+    try:
+        resolved = path.resolve()
+    except Exception:
+        resolved = path
+
+    if not resolved.exists() or not resolved.is_file():
+        return None
+    return resolved
+
+
+def _detect_file_drop(user_input: str) -> "dict | None":
+    """Detect if *user_input* starts with a real local file path.
+
+    This catches dragged/pasted paths before they are mistaken for slash
+    commands, and also supports Termux-friendly paths like ``~/storage/...``.

    Returns a dict on match::

@@ -1034,29 +1138,31 @@ def _detect_file_drop(user_input: str) -> "dict | None":

    Returns ``None`` when the input is not a real file path.
    """
-    if not isinstance(user_input, str) or not user_input.startswith("/"):
+    if not isinstance(user_input, str):
        return None

-    # Walk the string absorbing backslash-escaped spaces ("\ ").
-    raw = user_input
-    pos = 0
-    while pos < len(raw):
-        ch = raw[pos]
-        if ch == '\\' and pos + 1 < len(raw) and raw[pos + 1] == ' ':
-            pos += 2  # skip escaped space
-        elif ch == ' ':
-            break
-        else:
-            pos += 1
-
-    first_token_raw = raw[:pos]
-    first_token = first_token_raw.replace('\\ ', ' ')
-    drop_path = Path(first_token)
-
-    if not drop_path.exists() or not drop_path.is_file():
+    stripped = user_input.strip()
+    if not stripped:
+        return None
+
+    starts_like_path = (
+        stripped.startswith("/")
+        or stripped.startswith("~")
+        or stripped.startswith("./")
+        or stripped.startswith("../")
+        or stripped.startswith('"/')
+        or stripped.startswith('"~')
+        or stripped.startswith("'/")
+        or stripped.startswith("'~")
+    )
+    if not starts_like_path:
+        return None
+
+    first_token, remainder = _split_path_input(stripped)
+    drop_path = _resolve_attachment_path(first_token)
+    if drop_path is None:
        return None

-    remainder = raw[pos:].strip()
    return {
        "path": drop_path,
        "is_image": drop_path.suffix.lower() in _IMAGE_EXTENSIONS,
@@ -1064,6 +1170,74 @@ def _detect_file_drop(user_input: str) -> "dict | None":
    }


+def _format_image_attachment_badges(attached_images: list[Path], image_counter: int, width: int | None = None) -> str:
+    """Format the attached-image badge row for the interactive CLI.
+
+    Narrow terminals such as Termux should get a compact summary that fits on a
+    single row, while wider terminals can show the classic per-image badges.
+    """
+    if not attached_images:
+        return ""
+
+    width = width or shutil.get_terminal_size((80, 24)).columns
+
+    def _trunc(name: str, limit: int) -> str:
+        return name if len(name) <= limit else name[: max(1, limit - 3)] + "..."
+
+    if width < 52:
+        if len(attached_images) == 1:
+            return f"[📎 {_trunc(attached_images[0].name, 20)}]"
+        return f"[📎 {len(attached_images)} images attached]"
+
+    if width < 80:
+        if len(attached_images) == 1:
+            return f"[📎 {_trunc(attached_images[0].name, 32)}]"
+        first = _trunc(attached_images[0].name, 20)
+        extra = len(attached_images) - 1
+        return f"[📎 {first}] [+{extra}]"
+
+    base = image_counter - len(attached_images) + 1
+    return " ".join(
+        f"[📎 Image #{base + i}]"
+        for i in range(len(attached_images))
+    )
+
+
+def _should_auto_attach_clipboard_image_on_paste(pasted_text: str) -> bool:
+    """Auto-attach clipboard images only for image-only paste gestures."""
+    return not pasted_text.strip()
+
+
+def _collect_query_images(query: str | None, image_arg: str | None = None) -> tuple[str, list[Path]]:
+    """Collect local image attachments for single-query CLI flows."""
+    message = query or ""
+    images: list[Path] = []
+
+    if isinstance(message, str):
+        dropped = _detect_file_drop(message)
+        if dropped and dropped.get("is_image"):
+            images.append(dropped["path"])
+            message = dropped["remainder"] or f"[User attached image: {dropped['path'].name}]"
+
+    if image_arg:
+        explicit_path = _resolve_attachment_path(image_arg)
+        if explicit_path is None:
+            raise ValueError(f"Image file not found: {image_arg}")
+        if explicit_path.suffix.lower() not in _IMAGE_EXTENSIONS:
+            raise ValueError(f"Not a supported image file: {explicit_path}")
+        images.append(explicit_path)
+
+    deduped: list[Path] = []
+    seen: set[str] = set()
+    for img in images:
+        key = str(img)
+        if key in seen:
+            continue
+        seen.add(key)
+        deduped.append(img)
+    return message, deduped
+
+
 class ChatConsole:
    """Rich Console adapter for prompt_toolkit's patch_stdout context.

@@ -1478,6 +1652,9 @@ class HermesCLI:
        self.reasoning_config = _parse_reasoning_config(
            CLI_CONFIG["agent"].get("reasoning_effort", "")
        )
+        self.service_tier = _parse_service_tier_config(
+            CLI_CONFIG["agent"].get("service_tier", "")
+        )
        
        # OpenRouter provider routing preferences
        pr = CLI_CONFIG.get("provider_routing", {}) or {}
@@ -1701,15 +1878,70 @@ class HermesCLI:
            width += ch_width
        return "".join(out).rstrip() + ellipsis

+    @staticmethod
+    def _get_tui_terminal_width(default: tuple[int, int] = (80, 24)) -> int:
+        """Return the live prompt_toolkit width, falling back to ``shutil``.
+
+        The TUI layout can be narrower than ``shutil.get_terminal_size()`` reports,
+        especially on Termux/mobile shells, so prefer prompt_toolkit's width whenever
+        an app is active.
+        """
+        try:
+            from prompt_toolkit.application import get_app
+            return get_app().output.get_size().columns
+        except Exception:
+            return shutil.get_terminal_size(default).columns
+
+    def _use_minimal_tui_chrome(self, width: Optional[int] = None) -> bool:
+        """Hide low-value chrome on narrow/mobile terminals to preserve rows."""
+        if width is None:
+            width = self._get_tui_terminal_width()
+        return width < 64
+
+    def _tui_input_rule_height(self, position: str, width: Optional[int] = None) -> int:
+        """Return the visible height for the top/bottom input separator rules."""
+        if position not in {"top", "bottom"}:
+            raise ValueError(f"Unknown input rule position: {position}")
+        if position == "top":
+            return 1
+        return 0 if self._use_minimal_tui_chrome(width=width) else 1
+
+    def _agent_spacer_height(self, width: Optional[int] = None) -> int:
+        """Return the spacer height shown above the status bar while the agent runs."""
+        if not getattr(self, "_agent_running", False):
+            return 0
+        return 0 if self._use_minimal_tui_chrome(width=width) else 1
+
+    def _spinner_widget_height(self, width: Optional[int] = None) -> int:
+        """Return the visible height for the spinner/status text line above the status bar."""
+        if not getattr(self, "_spinner_text", ""):
+            return 0
+        return 0 if self._use_minimal_tui_chrome(width=width) else 1
+
+    def _get_voice_status_fragments(self, width: Optional[int] = None):
+        """Return the voice status bar fragments for the interactive TUI."""
+        width = width or self._get_tui_terminal_width()
+        compact = self._use_minimal_tui_chrome(width=width)
+        if self._voice_recording:
+            if compact:
+                return [("class:voice-status-recording", " ● REC ")]
+            return [("class:voice-status-recording", " ● REC  Ctrl+B to stop ")]
+        if self._voice_processing:
+            if compact:
+                return [("class:voice-status", " ◉ STT ")]
+            return [("class:voice-status", " ◉ Transcribing... ")]
+        if compact:
+            return [("class:voice-status", " 🎤 Ctrl+B ")]
+        tts = " | TTS on" if self._voice_tts else ""
+        cont = " | Continuous" if self._voice_continuous else ""
+        return [("class:voice-status", f" 🎤 Voice mode{tts}{cont}  —  Ctrl+B to record ")]
+
    def _build_status_bar_text(self, width: Optional[int] = None) -> str:
+        """Return a compact one-line session status string for the TUI footer."""
        try:
            snapshot = self._get_status_bar_snapshot()
            if width is None:
-                try:
-                    from prompt_toolkit.application import get_app
-                    width = get_app().output.get_size().columns
-                except Exception:
-                    width = shutil.get_terminal_size((80, 24)).columns
+                width = self._get_tui_terminal_width()
            percent = snapshot["context_percent"]
            percent_label = f"{percent}%" if percent is not None else "--"
            duration_label = snapshot["duration"]
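Note on the hunk above: the new height helpers all key off one width cutoff. The following is a minimal standalone sketch of that decision table, assuming the same 64-column threshold as `_use_minimal_tui_chrome`. The names `use_minimal_chrome` and `layout_rows` are hypothetical and are not part of the patch.

```python
COMPACT_WIDTH_THRESHOLD = 64  # mirrors the `width < 64` check in the diff


def use_minimal_chrome(width: int) -> bool:
    """True when the terminal is narrow enough to drop low-value rows."""
    return width < COMPACT_WIDTH_THRESHOLD


def layout_rows(width: int, agent_running: bool, spinner_text: str) -> dict:
    """Row heights for the optional TUI chrome (hypothetical helper)."""
    compact = use_minimal_chrome(width)
    return {
        # The top rule is always kept for structure.
        "top_rule": 1,
        # The bottom rule, spacer, and spinner line are reclaimed when compact.
        "bottom_rule": 0 if compact else 1,
        "agent_spacer": 1 if (agent_running and not compact) else 0,
        "spinner": 1 if (spinner_text and not compact) else 0,
    }


if __name__ == "__main__":
    print(layout_rows(50, True, "thinking"))
    print(layout_rows(100, True, "thinking"))
```

On a 50-column terminal only the top rule survives, which frees up to three rows for conversation output while the agent is running.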
@@ -1745,11 +1977,7 @@ class HermesCLI:
            # values (especially on SSH) that differ from what prompt_toolkit
            # actually renders, causing the fragments to overflow to a second
            # line and produce duplicated status bar rows over long sessions.
-            try:
-                from prompt_toolkit.application import get_app
-                width = get_app().output.get_size().columns
-            except Exception:
-                width = shutil.get_terminal_size((80, 24)).columns
+            width = self._get_tui_terminal_width()
            duration_label = snapshot["duration"]

            if width < 52:
@@ -2085,17 +2313,59 @@ class HermesCLI:
        # Append to a pre-filter buffer first
        self._stream_prefilt = getattr(self, "_stream_prefilt", "") + text

-        # Check if we're entering a reasoning block
+        # Check if we're entering a reasoning block.
+        # Only match tags that appear at a "block boundary": start of the
+        # stream, after a newline (with optional whitespace), or when nothing
+        # but whitespace has been emitted on the current line.
+        # This prevents false positives when models *mention* tags in prose
+        # like "(/think not producing <think> tags)".
+        #
+        # _stream_last_was_newline tracks whether the last character emitted
+        # (or the start of the stream) is a line boundary.  It's True at
+        # stream start and set True whenever emitted text ends with '\n'.
+        if not hasattr(self, "_stream_last_was_newline"):
+            self._stream_last_was_newline = True  # start of stream = boundary
+
        if not getattr(self, "_in_reasoning_block", False):
            for tag in _OPEN_TAGS:
-                idx = self._stream_prefilt.find(tag)
-                if idx != -1:
-                    # Emit everything before the tag
-                    before = self._stream_prefilt[:idx]
-                    if before:
-                        self._emit_stream_text(before)
-                    self._in_reasoning_block = True
-                    self._stream_prefilt = self._stream_prefilt[idx + len(tag):]
+                search_start = 0
+                while True:
+                    idx = self._stream_prefilt.find(tag, search_start)
+                    if idx == -1:
+                        break
+                    # Check if this is a block boundary position
+                    preceding = self._stream_prefilt[:idx]
+                    if idx == 0:
+                        # At buffer start — only a boundary if we're at
+                        # a line start (stream start or last emit ended
+                        # with newline)
+                        is_block_boundary = getattr(self, "_stream_last_was_newline", True)
+                    else:
+                        # Find last newline in the buffer before the tag
+                        last_nl = preceding.rfind("\n")
+                        if last_nl == -1:
+                            # No newline in buffer — boundary only if
+                            # last emit was a newline AND only whitespace
+                            # has accumulated before the tag
+                            is_block_boundary = (
+                                getattr(self, "_stream_last_was_newline", True)
+                                and preceding.strip() == ""
+                            )
+                        else:
+                            # Text between last newline and tag must be
+                            # whitespace-only
+                            is_block_boundary = preceding[last_nl + 1:].strip() == ""
+                    if is_block_boundary:
+                        # Emit everything before the tag
+                        if preceding:
+                            self._emit_stream_text(preceding)
+                            self._stream_last_was_newline = preceding.endswith("\n")
+                        self._in_reasoning_block = True
+                        self._stream_prefilt = self._stream_prefilt[idx + len(tag):]
+                        break
+                    # Not a block boundary — keep searching after this occurrence
+                    search_start = idx + 1
+                if getattr(self, "_in_reasoning_block", False):
                    break

            # Could also be a partial open tag at the end — hold it back
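Note on the hunk above: the block-boundary rule can be checked in isolation. The following is a rough sketch of the same test as a pure function, assuming the buffer and the "last emitted character was a newline" flag are passed in explicitly. `find_block_boundary_tag` is a hypothetical name and does not appear in the patch.

```python
def find_block_boundary_tag(buffer: str, tag: str, last_was_newline: bool = True) -> int:
    """Return the index of the first `tag` that starts a line, or -1.

    A tag counts as a block boundary only when everything between the
    previous newline (or the start of the buffer) and the tag is whitespace.
    """
    search_start = 0
    while True:
        idx = buffer.find(tag, search_start)
        if idx == -1:
            return -1
        preceding = buffer[:idx]
        last_nl = preceding.rfind("\n")
        if last_nl == -1:
            # No newline buffered before the tag: it starts a line only if
            # the previously emitted text ended one and nothing but
            # whitespace has accumulated since.
            at_boundary = last_was_newline and preceding.strip() == ""
        else:
            # Text between the last newline and the tag must be blank.
            at_boundary = preceding[last_nl + 1:].strip() == ""
        if at_boundary:
            return idx
        # Mentioned in prose; keep scanning after this occurrence.
        search_start = idx + 1


if __name__ == "__main__":
    print(find_block_boundary_tag("<think>plan", "<think>"))
    print(find_block_boundary_tag("(/think not producing <think> tags)", "<think>"))
```

The second call shows the false positive the patch targets: a tag mentioned mid-sentence is skipped, so the surrounding prose is emitted as normal text.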
@@ -2109,6 +2379,7 @@ class HermesCLI:
                            break
                if safe:
                    self._emit_stream_text(safe)
+                    self._stream_last_was_newline = safe.endswith("\n")
                    self._stream_prefilt = self._stream_prefilt[len(safe):]
                return

@@ -2198,6 +2469,14 @@ class HermesCLI:

    def _flush_stream(self) -> None:
        """Emit any remaining partial line from the stream buffer and close the box."""
+        # If we're still inside a "reasoning block" at end-of-stream, it was
+        # a false positive — the model mentioned a tag like <think> in prose
+        # but never closed it.  Recover the buffered content as regular text.
+        if getattr(self, "_in_reasoning_block", False) and getattr(self, "_stream_prefilt", ""):
+            self._in_reasoning_block = False
+            self._emit_stream_text(self._stream_prefilt)
+            self._stream_prefilt = ""
+
        # Close reasoning box if still open (in case no content tokens arrived)
        self._close_reasoning_box()

@@ -2220,6 +2499,7 @@ class HermesCLI:
        self._stream_text_ansi = ""
        self._stream_prefilt = ""
        self._in_reasoning_block = False
+        self._stream_last_was_newline = True
        self._reasoning_box_opened = False
        self._reasoning_buf = ""
        self._reasoning_preview_buf = ""
@@ -2349,8 +2629,9 @@ class HermesCLI:
    def _resolve_turn_agent_config(self, user_message: str) -> dict:
        """Resolve model/runtime overrides for a single user turn."""
        from agent.smart_model_routing import resolve_turn_route
+        from hermes_cli.models import resolve_fast_mode_overrides

-        return resolve_turn_route(
+        route = resolve_turn_route(
            user_message,
            self._smart_model_routing,
            {
@@ -2365,7 +2646,19 @@ class HermesCLI:
            },
        )

-    def _init_agent(self, *, model_override: str = None, runtime_override: dict = None, route_label: str = None) -> bool:
+        service_tier = getattr(self, "service_tier", None)
+        if not service_tier:
+            route["request_overrides"] = None
+            return route
+
+        try:
+            overrides = resolve_fast_mode_overrides(route.get("model"))
+        except Exception:
+            overrides = None
+        route["request_overrides"] = overrides
+        return route
+
+    def _init_agent(self, *, model_override: str = None, runtime_override: dict = None, route_label: str = None, request_overrides: dict | None = None) -> bool:
        """
        Initialize the agent on first use.
        When resuming a session, restores conversation history from SQLite.
@@ -2452,6 +2745,8 @@ class HermesCLI:
                ephemeral_system_prompt=self.system_prompt if self.system_prompt else None,
                prefill_messages=self.prefill_messages or None,
                reasoning_config=self.reasoning_config,
+                service_tier=self.service_tier,
+                request_overrides=request_overrides,
                providers_allowed=self._providers_only,
                providers_ignored=self._providers_ignore,
                providers_order=self._providers_order,
@@ -2946,6 +3241,14 @@ class HermesCLI:
        doesn't fire for image-only clipboard content (e.g., VSCode terminal,
        Windows Terminal with WSL2).
        """
+        if _is_termux_environment():
+            _cprint(
+                f"  {_DIM}Clipboard image paste is not available on Termux — "
+                f"use /image <path> or paste a local image path like "
+                f"{_termux_example_image_path()}{_RST}"
+            )
+            return
+
        from hermes_cli.clipboard import has_clipboard_image
        if has_clipboard_image():
            if self._try_attach_clipboard_image():
@@ -2956,7 +3259,31 @@ class HermesCLI:
        else:
            _cprint(f"  {_DIM}(._.) No image found in clipboard{_RST}")

-    def _preprocess_images_with_vision(self, text: str, images: list) -> str:
+    def _handle_image_command(self, cmd_original: str):
+        """Handle /image <path> — attach a local image file for the next prompt."""
+        raw_args = "".join(cmd_original.split(None, 1)[1:]).strip()
+        if not raw_args:
+            hint = _termux_example_image_path() if _is_termux_environment() else "/path/to/image.png"
+            _cprint(f"  {_DIM}Usage: /image <path>  e.g. /image {hint}{_RST}")
+            return
+
+        path_token, _remainder = _split_path_input(raw_args)
+        image_path = _resolve_attachment_path(path_token)
+        if image_path is None:
+            _cprint(f"  {_DIM}(>_<) File not found: {path_token}{_RST}")
+            return
+        if image_path.suffix.lower() not in _IMAGE_EXTENSIONS:
+            _cprint(f"  {_DIM}(._.) Not a supported image file: {image_path.name}{_RST}")
+            return
+
+        self._attached_images.append(image_path)
+        _cprint(f"  📎 Attached image: {image_path.name}")
+        if _remainder:
+            _cprint(f"  {_DIM}Now type your prompt (or use --image in single-query mode): {_remainder}{_RST}")
+        elif _is_termux_environment():
+            _cprint(f"  {_DIM}Tip: type your next message, or run hermes chat -q --image {_termux_example_image_path(image_path.name)} \"What do you see?\"{_RST}")
+
+    def _preprocess_images_with_vision(self, text: str, images: list, *, announce: bool = True) -> str:
        """Analyze attached images via the vision tool and return enriched text.

        Instead of embedding raw base64 ``image_url`` content parts in the
@@ -2983,7 +3310,8 @@ class HermesCLI:
            if not img_path.exists():
                continue
            size_kb = img_path.stat().st_size // 1024
-            _cprint(f"  {_DIM}👁️  analyzing {img_path.name} ({size_kb}KB)...{_RST}")
+            if announce:
+                _cprint(f"  {_DIM}👁️  analyzing {img_path.name} ({size_kb}KB)...{_RST}")
            try:
                result_json = _asyncio.run(
                    vision_analyze_tool(image_url=str(img_path), user_prompt=analysis_prompt)
@@ -2996,21 +3324,24 @@ class HermesCLI:
                        f"[If you need a closer look, use vision_analyze with "
                        f"image_url: {img_path}]"
                    )
-                    _cprint(f"  {_DIM}✓ image analyzed{_RST}")
+                    if announce:
+                        _cprint(f"  {_DIM}✓ image analyzed{_RST}")
                else:
                    enriched_parts.append(
                        f"[The user attached an image but it couldn't be analyzed. "
                        f"You can try examining it with vision_analyze using "
                        f"image_url: {img_path}]"
                    )
-                    _cprint(f"  {_DIM}⚠ vision analysis failed — path included for retry{_RST}")
+                    if announce:
+                        _cprint(f"  {_DIM}⚠ vision analysis failed — path included for retry{_RST}")
            except Exception as e:
                enriched_parts.append(
                    f"[The user attached an image but analysis failed ({e}). "
                    f"You can try examining it with vision_analyze using "
                    f"image_url: {img_path}]"
                )
-                _cprint(f"  {_DIM}⚠ vision analysis error — path included for retry{_RST}")
+                if announce:
+                    _cprint(f"  {_DIM}⚠ vision analysis error — path included for retry{_RST}")

        # Combine: vision descriptions first, then the user's original text
        user_text = text if isinstance(text, str) and text else ""
@@ -3073,6 +3404,20 @@ class HermesCLI:
            f"{toolsets_info}{provider_info}"
        )
    
+    def _fast_command_available(self) -> bool:
+        try:
+            from hermes_cli.models import model_supports_fast_mode
+        except Exception:
+            return False
+        agent = getattr(self, "agent", None)
+        model = getattr(agent, "model", None) or getattr(self, "model", None)
+        return model_supports_fast_mode(model)
+
+    def _command_available(self, slash_command: str) -> bool:
+        if slash_command == "/fast":
+            return self._fast_command_available()
+        return True
+
    def show_help(self):
        """Display help information with categorized commands."""
        from hermes_cli.commands import COMMANDS_BY_CATEGORY
@@ -3093,6 +3438,8 @@ class HermesCLI:
        for category, commands in COMMANDS_BY_CATEGORY.items():
            _cprint(f"\n  {_BOLD}── {category} ──{_RST}")
            for cmd, desc in commands.items():
+                if not self._command_available(cmd):
+                    continue
                ChatConsole().print(f"    [bold {_accent_hex()}]{cmd:<15}[/] [dim]-[/] {_escape(desc)}")

        if _skill_commands:
@@ -3104,7 +3451,10 @@ class HermesCLI:

        _cprint(f"\n  {_DIM}Tip: Just type your message to chat with Hermes!{_RST}")
        _cprint(f"  {_DIM}Multi-line: Alt+Enter for a new line{_RST}")
-        _cprint(f"  {_DIM}Paste image: Alt+V (or /paste){_RST}\n")
+        if _is_termux_environment():
+            _cprint(f"  {_DIM}Attach image: /image {_termux_example_image_path()} or start your prompt with a local image path{_RST}\n")
+        else:
+            _cprint(f"  {_DIM}Paste image: Alt+V (or /paste){_RST}\n")
    
    def show_tools(self):
        """Display available tools with kawaii ASCII art."""
@@ -4542,6 +4892,8 @@ class HermesCLI:
            self._toggle_yolo()
        elif canonical == "reasoning":
            self._handle_reasoning_command(cmd_original)
+        elif canonical == "fast":
+            self._handle_fast_command(cmd_original)
        elif canonical == "compress":
            self._manual_compress()
        elif canonical == "usage":
@@ -4550,6 +4902,8 @@ class HermesCLI:
            self._show_insights(cmd_original)
        elif canonical == "paste":
            self._handle_paste_command()
+        elif canonical == "image":
+            self._handle_image_command(cmd_original)
        elif canonical == "reload-mcp":
            with self._busy_command(self._slow_command_status(cmd_original)):
                self._reload_mcp()
@@ -4779,6 +5133,8 @@ class HermesCLI:
                    platform="cli",
                    session_db=self._session_db,
                    reasoning_config=self.reasoning_config,
+                    service_tier=self.service_tier,
+                    request_overrides=turn_route.get("request_overrides"),
                    providers_allowed=self._providers_only,
                    providers_ignored=self._providers_ignore,
                    providers_order=self._providers_order,
@@ -4914,6 +5270,8 @@ class HermesCLI:
                    session_id=task_id,
                    platform="cli",
                    reasoning_config=self.reasoning_config,
+                    service_tier=self.service_tier,
+                    request_overrides=turn_route.get("request_overrides"),
                    providers_allowed=self._providers_only,
                    providers_ignored=self._providers_ignore,
                    providers_order=self._providers_order,
@@ -5343,6 +5701,49 @@ class HermesCLI:
        else:
            _cprint(f"  {_GOLD}✓ Reasoning effort set to '{arg}' (session only){_RST}")

+    def _handle_fast_command(self, cmd: str):
+        """Handle /fast — toggle fast mode (OpenAI Priority Processing / Anthropic Fast Mode)."""
+        if not self._fast_command_available():
+            _cprint("  (._.) /fast is only available for models that support fast mode (OpenAI Priority Processing or Anthropic Fast Mode).")
+            return
+
+        # Determine the branding for the current model
+        try:
+            from hermes_cli.models import _is_anthropic_fast_model
+            agent = getattr(self, "agent", None)
+            model = getattr(agent, "model", None) or getattr(self, "model", None)
+            feature_name = "Anthropic Fast Mode" if _is_anthropic_fast_model(model) else "Priority Processing"
+        except Exception:
+            feature_name = "Fast mode"
+
+        parts = cmd.strip().split(maxsplit=1)
+        if len(parts) < 2 or parts[1].strip().lower() == "status":
+            status = "fast" if self.service_tier == "priority" else "normal"
+            _cprint(f"  {_GOLD}{feature_name}: {status}{_RST}")
+            _cprint(f"  {_DIM}Usage: /fast [normal|fast|status]{_RST}")
+            return
+
+        arg = parts[1].strip().lower()
+
+        if arg in {"fast", "on"}:
+            self.service_tier = "priority"
+            saved_value = "fast"
+            label = "FAST"
+        elif arg in {"normal", "off"}:
+            self.service_tier = None
+            saved_value = "normal"
+            label = "NORMAL"
+        else:
+            _cprint(f"  {_DIM}(._.) Unknown argument: {arg}{_RST}")
+            _cprint(f"  {_DIM}Usage: /fast [normal|fast|status]{_RST}")
+            return
+
+        self.agent = None  # Force agent re-init with new service-tier config
+        if save_config_value("agent.service_tier", saved_value):
+            _cprint(f"  {_GOLD}✓ {feature_name} set to {label} (saved to config){_RST}")
+        else:
+            _cprint(f"  {_GOLD}✓ {feature_name} set to {label} (session only){_RST}")
+
    def _on_reasoning(self, reasoning_text: str):
        """Callback for intermediate reasoning display during tool-call loops."""
        if not reasoning_text:
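Note on the hunk above: the argument handling in `_handle_fast_command` reduces to a small mapping from user input to a service tier. The following is a minimal sketch of that mapping, assuming the `"priority"` / `None` tier values shown in the diff. `parse_fast_argument` and its `(action, tier)` return shape are hypothetical and only for illustration.

```python
from typing import Optional, Tuple


def parse_fast_argument(cmd: str, current_tier: Optional[str]) -> Tuple[str, Optional[str]]:
    """Map a raw `/fast ...` command to (action, service_tier).

    action is "status" (report only), "set" (apply the returned tier),
    or "error" (unknown argument, tier unchanged).
    """
    parts = cmd.strip().split(maxsplit=1)
    if len(parts) < 2 or parts[1].strip().lower() == "status":
        return "status", current_tier
    arg = parts[1].strip().lower()
    if arg in {"fast", "on"}:
        return "set", "priority"
    if arg in {"normal", "off"}:
        return "set", None
    return "error", current_tier


if __name__ == "__main__":
    for raw in ("/fast", "/fast on", "/fast normal", "/fast turbo"):
        print(raw, "->", parse_fast_argument(raw, None))
```

Keeping the parsing separate from the side effects (resetting the agent, saving config) would make the accepted spellings easy to unit-test.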
@@ -5743,10 +6144,23 @@ class HermesCLI:
        """Start capturing audio from the microphone."""
        if getattr(self, '_should_exit', False):
            return
-        from tools.voice_mode import AudioRecorder, check_voice_requirements
+        from tools.voice_mode import create_audio_recorder, check_voice_requirements

        reqs = check_voice_requirements()
        if not reqs["audio_available"]:
+            if _is_termux_environment():
+                details = reqs.get("details", "")
+                if "Termux:API Android app is not installed" in details:
+                    raise RuntimeError(
+                        "Termux:API command package detected, but the Android app is missing.\n"
+                        "Install/update the Termux:API Android app, then retry /voice on.\n"
+                        "Fallback: pkg install python-numpy portaudio && python -m pip install sounddevice"
+                    )
+                raise RuntimeError(
+                    "Voice mode requires either Termux:API microphone access or Python audio libraries.\n"
+                    "Option 1: pkg install termux-api and install the Termux:API Android app\n"
+                    "Option 2: pkg install python-numpy portaudio && python -m pip install sounddevice"
+                )
            raise RuntimeError(
                "Voice mode requires sounddevice and numpy.\n"
                "Install with: pip install sounddevice numpy\n"
@@ -5775,7 +6189,7 @@ class HermesCLI:
            pass

        if self._voice_recorder is None:
-            self._voice_recorder = AudioRecorder()
+            self._voice_recorder = create_audio_recorder()

        # Apply config-driven silence params
        self._voice_recorder._silence_threshold = voice_cfg.get("silence_threshold", 200)
@@ -5804,7 +6218,13 @@ class HermesCLI:
            with self._voice_lock:
                self._voice_recording = False
            raise
-        _cprint(f"\n{_GOLD}● Recording...{_RST} {_DIM}(auto-stops on silence | Ctrl+B to stop & exit continuous){_RST}")
+        if getattr(self._voice_recorder, "supports_silence_autostop", True):
+            _recording_hint = "auto-stops on silence | Ctrl+B to stop & exit continuous"
+        elif _is_termux_environment():
+            _recording_hint = "Termux:API capture | Ctrl+B to stop"
+        else:
+            _recording_hint = "Ctrl+B to stop"
+        _cprint(f"\n{_GOLD}● Recording...{_RST} {_DIM}({_recording_hint}){_RST}")

        # Periodically refresh prompt to update audio level indicator
        def _refresh_level():
@@ -5867,6 +6287,9 @@ class HermesCLI:

            if result.get("success") and result.get("transcript", "").strip():
                transcript = result["transcript"].strip()
+                self._attached_images.clear()
+                if hasattr(self, '_app') and self._app:
+                    self._app.invalidate()
                self._pending_input.put(transcript)
                submitted = True
            elif result.get("success"):
@@ -6012,8 +6435,13 @@ class HermesCLI:
            for line in reqs["details"].split("\n"):
                _cprint(f"  {_DIM}{line}{_RST}")
            if reqs["missing_packages"]:
-                _cprint(f"\n  {_BOLD}Install: pip install {' '.join(reqs['missing_packages'])}{_RST}")
-                _cprint(f"  {_DIM}Or: pip install hermes-agent[voice]{_RST}")
+                if _is_termux_environment():
+                    _cprint(f"\n  {_BOLD}Option 1: pkg install termux-api{_RST}")
+                    _cprint(f"  {_DIM}Then install/update the Termux:API Android app for microphone capture{_RST}")
+                    _cprint(f"  {_BOLD}Option 2: pkg install python-numpy portaudio && python -m pip install sounddevice{_RST}")
+                else:
+                    _cprint(f"\n  {_BOLD}Install: pip install {' '.join(reqs['missing_packages'])}{_RST}")
+                    _cprint(f"  {_DIM}Or: pip install hermes-agent[voice]{_RST}")
            return

        with self._voice_lock:
@@ -6477,6 +6905,7 @@ class HermesCLI:
            model_override=turn_route["model"],
            runtime_override=turn_route["runtime"],
            route_label=turn_route["label"],
+            request_overrides=turn_route.get("request_overrides"),
        ):
            return None
        
@@ -6967,27 +7396,39 @@ class HermesCLI:
    def _get_tui_prompt_fragments(self):
        """Return the prompt_toolkit fragments for the current interactive state."""
        symbol, state_suffix = self._get_tui_prompt_symbols()
+        compact = self._use_minimal_tui_chrome(width=self._get_tui_terminal_width())
+
+        def _state_fragment(style: str, icon: str, extra: str = ""):
+            if compact:
+                text = icon
+                if extra:
+                    text = f"{text} {extra.strip()}".rstrip()
+                return [(style, text + " ")]
+            if extra:
+                return [(style, f"{icon} {extra} {state_suffix}")]
+            return [(style, f"{icon} {state_suffix}")]
+
        if self._voice_recording:
            bar = self._audio_level_bar()
-            return [("class:voice-recording", f"● {bar} {state_suffix}")]
+            return _state_fragment("class:voice-recording", "●", bar)
        if self._voice_processing:
-            return [("class:voice-processing", f"◉ {state_suffix}")]
+            return _state_fragment("class:voice-processing", "◉")
        if self._sudo_state:
-            return [("class:sudo-prompt", f"🔐 {state_suffix}")]
+            return _state_fragment("class:sudo-prompt", "🔐")
        if self._secret_state:
-            return [("class:sudo-prompt", f"🔑 {state_suffix}")]
+            return _state_fragment("class:sudo-prompt", "🔑")
        if self._approval_state:
-            return [("class:prompt-working", f"⚠ {state_suffix}")]
+            return _state_fragment("class:prompt-working", "⚠")
        if self._clarify_freetext:
-            return [("class:clarify-selected", f"✎ {state_suffix}")]
+            return _state_fragment("class:clarify-selected", "✎")
        if self._clarify_state:
-            return [("class:prompt-working", f"? {state_suffix}")]
+            return _state_fragment("class:prompt-working", "?")
        if self._command_running:
-            return [("class:prompt-working", f"{self._command_spinner_frame()} {state_suffix}")]
+            return _state_fragment("class:prompt-working", self._command_spinner_frame())
        if self._agent_running:
-            return [("class:prompt-working", f"⚕ {state_suffix}")]
+            return _state_fragment("class:prompt-working", "⚕")
        if self._voice_mode:
-            return [("class:voice-prompt", f"🎤 {state_suffix}")]
+            return _state_fragment("class:voice-prompt", "🎤")
        return [("class:prompt", symbol)]

    def _get_tui_prompt_text(self) -> str:
@@ -7573,8 +8014,9 @@ class HermesCLI:
            """Handle terminal paste — detect clipboard images.

            When the terminal supports bracketed paste, Ctrl+V / Cmd+V
-            triggers this with the pasted text.  We also check the
-            clipboard for an image on every paste event.
+            triggers this with the pasted text. We only auto-attach a
+            clipboard image for image-only/empty paste gestures so text
+            pastes and dictation do not accidentally attach stale images.

            Large pastes (5+ lines) are collapsed to a file reference
            placeholder while preserving any existing user text in the
@@ -7584,7 +8026,7 @@ class HermesCLI:
            # Normalise line endings — Windows \r\n and old Mac \r both become \n
            # so the 5-line collapse threshold and display are consistent.
            pasted_text = pasted_text.replace('\r\n', '\n').replace('\r', '\n')
-            if self._try_attach_clipboard_image():
+            if _should_auto_attach_clipboard_image_on_paste(pasted_text) and self._try_attach_clipboard_image():
                event.app.invalidate()
            if pasted_text:
                line_count = pasted_text.count('\n')
@@ -7647,6 +8089,7 @@ class HermesCLI:

        _completer = SlashCommandCompleter(
            skill_commands_provider=lambda: _skill_commands,
+            command_filter=cli_ref._command_available,
        )
        input_area = TextArea(
            height=Dimension(min=1, max=8, preferred=1),
@@ -7843,9 +8286,9 @@ class HermesCLI:
        def get_hint_height():
            if cli_ref._sudo_state or cli_ref._secret_state or cli_ref._approval_state or cli_ref._clarify_state or cli_ref._command_running:
                return 1
-            # Keep a 1-line spacer while agent runs so output doesn't push
-            # right up against the top rule of the input area
-            return 1 if cli_ref._agent_running else 0
+            # Keep a spacer while the agent runs on roomy terminals, but reclaim
+            # the row on narrow/mobile screens where every line matters.
+            return cli_ref._agent_spacer_height()

        def get_spinner_text():
            txt = cli_ref._spinner_text
@@ -7854,7 +8297,7 @@ class HermesCLI:
            return [('class:hint', f'  {txt}')]

        def get_spinner_height():
-            return 1 if cli_ref._spinner_text else 0
+            return cli_ref._spinner_widget_height()

        spinner_widget = Window(
            content=FormattedTextControl(get_spinner_text),
@@ -8045,18 +8488,17 @@ class HermesCLI:
            filter=Condition(lambda: cli_ref._approval_state is not None),
        )

-        # Horizontal rules above and below the input (bronze, 1 line each).
-        # The bottom rule moves down as the TextArea grows with newlines.
-        # Using char='─' instead of hardcoded repetition so the rule
-        # always spans the full terminal width on any screen size.
+        # Horizontal rules above and below the input.
+        # On narrow/mobile terminals we keep the top separator for structure but
+        # hide the bottom one to recover a full row for conversation content.
        input_rule_top = Window(
            char='─',
-            height=1,
+            height=lambda: cli_ref._tui_input_rule_height("top"),
            style='class:input-rule',
        )
        input_rule_bot = Window(
            char='─',
-            height=1,
+            height=lambda: cli_ref._tui_input_rule_height("bottom"),
            style='class:input-rule',
        )

@@ -8066,10 +8508,9 @@ class HermesCLI:
        def _get_image_bar():
            if not cli_ref._attached_images:
                return []
-            base = cli_ref._image_counter - len(cli_ref._attached_images) + 1
-            badges = " ".join(
-                f"[📎 Image #{base + i}]"
-                for i in range(len(cli_ref._attached_images))
+            badges = _format_image_attachment_badges(
+                cli_ref._attached_images,
+                cli_ref._image_counter,
            )
            return [("class:image-badge", f" {badges} ")]

@@ -8080,13 +8521,7 @@ class HermesCLI:

        # Persistent voice mode status bar (visible only when voice mode is on)
        def _get_voice_status():
-            if cli_ref._voice_recording:
-                return [('class:voice-status-recording', ' ● REC  Ctrl+B to stop ')]
-            if cli_ref._voice_processing:
-                return [('class:voice-status', ' ◉ Transcribing... ')]
-            tts = " | TTS on" if cli_ref._voice_tts else ""
-            cont = " | Continuous" if cli_ref._voice_continuous else ""
-            return [('class:voice-status', f' 🎤 Voice mode{tts}{cont}  —  Ctrl+B to record ')]
+            return cli_ref._get_voice_status_fragments()

        voice_status_bar = ConditionalContainer(
            Window(
@@ -8542,6 +8977,7 @@ class HermesCLI:
 def main(
    query: str = None,
    q: str = None,
+    image: str = None,
    toolsets: str = None,
    skills: str | list[str] | tuple[str, ...] = None,
    model: str = None,
@@ -8567,6 +9003,7 @@ def main(
    Args:
        query: Single query to execute (then exit). Alias: -q
        q: Shorthand for --query
+        image: Optional local image path to attach to a single query
        toolsets: Comma-separated list of toolsets to enable (e.g., "web,terminal")
        skills: Comma-separated or repeated list of skills to preload for the session
        model: Model to use (default: anthropic/claude-opus-4-20250514)
@@ -8587,6 +9024,7 @@ def main(
        python cli.py --toolsets web,terminal    # Use specific toolsets
        python cli.py --skills hermes-agent-dev,github-auth
        python cli.py -q "What is Python?"       # Single query mode
+        python cli.py -q "Describe this" --image ~/storage/shared/Pictures/cat.png
        python cli.py --list-tools               # List tools and exit
        python cli.py --resume 20260225_143052_a1b2c3  # Resume session
        python cli.py -w                         # Start in isolated git worktree
@@ -8709,23 +9147,33 @@ def main(
    atexit.register(_run_cleanup)
    
    # Handle single query mode
-    if query:
+    if query or image:
+        query, single_query_images = _collect_query_images(query, image)
        if quiet:
            # Quiet mode: suppress banner, spinner, tool previews.
            # Only print the final response and parseable session info.
            cli.tool_progress_mode = "off"
            if cli._ensure_runtime_credentials():
-                turn_route = cli._resolve_turn_agent_config(query)
+                effective_query = query
+                if single_query_images:
+                    effective_query = cli._preprocess_images_with_vision(
+                        query,
+                        single_query_images,
+                        announce=False,
+                    )
+                turn_route = cli._resolve_turn_agent_config(effective_query)
                if turn_route["signature"] != cli._active_agent_route_signature:
                    cli.agent = None
                if cli._init_agent(
                    model_override=turn_route["model"],
                    runtime_override=turn_route["runtime"],
                    route_label=turn_route["label"],
+                    request_overrides=turn_route.get("request_overrides"),
                ):
                    cli.agent.quiet_mode = True
+                    cli.agent.suppress_status_output = True
                    result = cli.agent.run_conversation(
-                        user_message=query,
+                        user_message=effective_query,
                        conversation_history=cli.conversation_history,
                    )
                    response = result.get("final_response", "") if isinstance(result, dict) else str(result)
@@ -8740,8 +9188,10 @@ def main(
            sys.exit(1)
        else:
            cli.show_banner()
-            cli.console.print(f"[bold blue]Query:[/] {query}")
-            cli.chat(query)
+            _query_label = query or ("[image attached]" if single_query_images else "")
+            if _query_label:
+                cli.console.print(f"[bold blue]Query:[/] {_query_label}")
+            cli.chat(query, images=single_query_images or None)
            cli._print_exit_summary()
        return
    
@@ -0,0 +1,15 @@
+# Termux / Android dependency constraints for Hermes Agent.
+#
+# Usage:
+#   python -m pip install -e '.[termux]' -c constraints-termux.txt
+#
+# These pins keep the tested Android install path stable when upstream packages
+# move faster than Termux-compatible wheels / sdists.
+
+ipython<10
+jedi>=0.18.1,<0.20
+parso>=0.8.4,<0.9
+stack-data>=0.6,<0.7
+pexpect>4.3,<5
+matplotlib-inline>=0.1.7,<0.2
+asttokens>=2.1,<3
@@ -346,7 +346,42 @@ def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> Option
    return None


-_SCRIPT_TIMEOUT = 120  # seconds
+_DEFAULT_SCRIPT_TIMEOUT = 120  # seconds
+# Backward-compatible module override used by tests and emergency monkeypatches.
+_SCRIPT_TIMEOUT = _DEFAULT_SCRIPT_TIMEOUT
+
+
+def _get_script_timeout() -> int:
+    """Resolve cron pre-run script timeout from module/env/config with a safe default."""
+    if _SCRIPT_TIMEOUT != _DEFAULT_SCRIPT_TIMEOUT:
+        try:
+            timeout = int(float(_SCRIPT_TIMEOUT))
+            if timeout > 0:
+                return timeout
+        except Exception:
+            logger.warning("Invalid patched _SCRIPT_TIMEOUT=%r; using env/config/default", _SCRIPT_TIMEOUT)
+
+    env_value = os.getenv("HERMES_CRON_SCRIPT_TIMEOUT", "").strip()
+    if env_value:
+        try:
+            timeout = int(float(env_value))
+            if timeout > 0:
+                return timeout
+        except Exception:
+            logger.warning("Invalid HERMES_CRON_SCRIPT_TIMEOUT=%r; using config/default", env_value)
+
+    try:
+        cfg = load_config() or {}
+        cron_cfg = cfg.get("cron", {}) if isinstance(cfg, dict) else {}
+        configured = cron_cfg.get("script_timeout_seconds")
+        if configured is not None:
+            timeout = int(float(configured))
+            if timeout > 0:
+                return timeout
+    except Exception as exc:
+        logger.debug("Failed to load cron script timeout from config: %s", exc)
+
+    return _DEFAULT_SCRIPT_TIMEOUT


 def _run_job_script(script_path: str) -> tuple[bool, str]:
@@ -393,12 +428,14 @@ def _run_job_script(script_path: str) -> tuple[bool, str]:
    if not path.is_file():
        return False, f"Script path is not a file: {path}"

+    script_timeout = _get_script_timeout()
+
    try:
        result = subprocess.run(
            [sys.executable, str(path)],
            capture_output=True,
            text=True,
-            timeout=_SCRIPT_TIMEOUT,
+            timeout=script_timeout,
            cwd=str(path.parent),
        )
        stdout = (result.stdout or "").strip()
@@ -422,7 +459,7 @@ def _run_job_script(script_path: str) -> tuple[bool, str]:
        return True, stdout

    except subprocess.TimeoutExpired:
-        return False, f"Script timed out after {_SCRIPT_TIMEOUT}s: {path}"
+        return False, f"Script timed out after {script_timeout}s: {path}"
    except Exception as exc:
        return False, f"Script execution failed: {exc}"
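
Reviewer note: the precedence that `_get_script_timeout()` applies (patched module value, then `HERMES_CRON_SCRIPT_TIMEOUT`, then `cron.script_timeout_seconds`, then the 120 s default) can be sketched as a pure function. This is a minimal illustration, not the code in this change: `resolve_timeout`, `_positive_int`, and the explicit `env` / `config` parameters are hypothetical names, introduced so the order can be exercised without touching the real environment or `load_config()`.

```python
from typing import Any, Mapping, Optional

DEFAULT_TIMEOUT = 120  # seconds; mirrors _DEFAULT_SCRIPT_TIMEOUT in the diff


def _positive_int(value: Any) -> Optional[int]:
    """Parse value like int(float(value)); return None unless it is > 0."""
    try:
        timeout = int(float(value))
    except (TypeError, ValueError, OverflowError):
        return None
    return timeout if timeout > 0 else None


def resolve_timeout(patched: Any, env: Mapping[str, str], config: Any) -> int:
    """Hypothetical stand-in for _get_script_timeout() with explicit inputs."""
    # 1. A module-level override only counts if it differs from the default.
    if patched != DEFAULT_TIMEOUT:
        parsed = _positive_int(patched)
        if parsed is not None:
            return parsed

    # 2. Environment variable; blank or invalid values fall through.
    env_value = env.get("HERMES_CRON_SCRIPT_TIMEOUT", "").strip()
    if env_value:
        parsed = _positive_int(env_value)
        if parsed is not None:
            return parsed

    # 3. Config file value under cron.script_timeout_seconds.
    cron_cfg = config.get("cron", {}) if isinstance(config, dict) else {}
    if isinstance(cron_cfg, dict):
        configured = cron_cfg.get("script_timeout_seconds")
        if configured is not None:
            parsed = _positive_int(configured)
            if parsed is not None:
                return parsed

    # 4. Safe default.
    return DEFAULT_TIMEOUT
```

Under these assumptions, an invalid or non-positive value at one level never aborts resolution; it only defers to the next level, which matches the warning-and-continue behaviour in the hunk above.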

@@ -646,6 +683,24 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
            },
        )

+        fallback_model = _cfg.get("fallback_providers") or _cfg.get("fallback_model") or None
+        credential_pool = None
+        runtime_provider = str(turn_route["runtime"].get("provider") or "").strip().lower()
+        if runtime_provider:
+            try:
+                from agent.credential_pool import load_pool
+                pool = load_pool(runtime_provider)
+                if pool.has_credentials():
+                    credential_pool = pool
+                    logger.info(
+                        "Job '%s': loaded credential pool for provider %s with %d entries",
+                        job_id,
+                        runtime_provider,
+                        len(pool.entries()),
+                    )
+            except Exception as e:
+                logger.debug("Job '%s': failed to load credential pool for %s: %s", job_id, runtime_provider, e)
+
        agent = AIAgent(
            model=turn_route["model"],
            api_key=turn_route["runtime"].get("api_key"),
@@ -657,6 +712,8 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
            max_iterations=max_iterations,
            reasoning_config=reasoning_config,
            prefill_messages=prefill_messages,
+            fallback_model=fallback_model,
+            credential_pool=credential_pool,
            providers_allowed=pr.get("only"),
            providers_ignored=pr.get("ignore"),
            providers_order=pr.get("order"),
@@ -901,6 +901,9 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
                pass
        if api_server_host:
            config.platforms[Platform.API_SERVER].extra["host"] = api_server_host
+        api_server_model_name = os.getenv("API_SERVER_MODEL_NAME", "")
+        if api_server_model_name:
+            config.platforms[Platform.API_SERVER].extra["model_name"] = api_server_model_name

    # Webhook platform
    webhook_enabled = os.getenv("WEBHOOK_ENABLED", "").lower() in ("true", "1", "yes")
@@ -299,6 +299,9 @@ class APIServerAdapter(BasePlatformAdapter):
        self._cors_origins: tuple[str, ...] = self._parse_cors_origins(
            extra.get("cors_origins", os.getenv("API_SERVER_CORS_ORIGINS", "")),
        )
+        self._model_name: str = self._resolve_model_name(
+            extra.get("model_name", os.getenv("API_SERVER_MODEL_NAME", "")),
+        )
        self._app: Optional["web.Application"] = None
        self._runner: Optional["web.AppRunner"] = None
        self._site: Optional["web.TCPSite"] = None
@@ -324,6 +327,26 @@ class APIServerAdapter(BasePlatformAdapter):

        return tuple(str(item).strip() for item in items if str(item).strip())

+    @staticmethod
+    def _resolve_model_name(explicit: str) -> str:
+        """Derive the advertised model name for /v1/models.
+
+        Priority:
+        1. Explicit override (config extra or API_SERVER_MODEL_NAME env var)
+        2. Active profile name (so each profile advertises a distinct model)
+        3. Fallback: "hermes-agent"
+        """
+        if explicit and explicit.strip():
+            return explicit.strip()
+        try:
+            from hermes_cli.profiles import get_active_profile_name
+            profile = get_active_profile_name()
+            if profile and profile not in ("default", "custom"):
+                return profile
+        except Exception:
+            pass
+        return "hermes-agent"
+
    def _cors_headers_for_origin(self, origin: str) -> Optional[Dict[str, str]]:
        """Return CORS headers for an allowed browser origin."""
        if not origin or not self._cors_origins:
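
Reviewer note: the priority order documented in `_resolve_model_name` can be checked in isolation with a small sketch. The function below is hypothetical; it takes the active profile as a parameter instead of importing `hermes_cli.profiles`, so it only illustrates the decision order stated in the docstring above.

```python
from typing import Optional


def resolve_model_name(explicit: str, active_profile: Optional[str]) -> str:
    """Hypothetical mirror of the priority order in _resolve_model_name."""
    # 1. An explicit override wins, after trimming whitespace.
    if explicit and explicit.strip():
        return explicit.strip()
    # 2. A named profile is advertised, except the generic ones.
    if active_profile and active_profile not in ("default", "custom"):
        return active_profile
    # 3. Fallback name.
    return "hermes-agent"
```

For example, `resolve_model_name("", "research")` would advertise `research`, while `resolve_model_name("  ", "default")` falls back to `hermes-agent`.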
@@ -468,12 +491,12 @@ class APIServerAdapter(BasePlatformAdapter):
            "object": "list",
            "data": [
                {
-                    "id": "hermes-agent",
+                    "id": self._model_name,
                    "object": "model",
                    "created": int(time.time()),
                    "owned_by": "hermes",
                    "permission": [],
-                    "root": "hermes-agent",
+                    "root": self._model_name,
                    "parent": None,
                }
            ],
@@ -531,8 +554,26 @@ class APIServerAdapter(BasePlatformAdapter):

        # Allow caller to continue an existing session by passing X-Hermes-Session-Id.
        # When provided, history is loaded from state.db instead of from the request body.
+        #
+        # Security: session continuation exposes conversation history, so it is
+        # only allowed when the API key is configured and the request is
+        # authenticated.  Without this gate, any unauthenticated client could
+        # read arbitrary session history by guessing/enumerating session IDs.
        provided_session_id = request.headers.get("X-Hermes-Session-Id", "").strip()
        if provided_session_id:
+            if not self._api_key:
+                logger.warning(
+                    "Session continuation via X-Hermes-Session-Id rejected: "
+                    "no API key configured.  Set API_SERVER_KEY to enable "
+                    "session continuity."
+                )
+                return web.json_response(
+                    _openai_error(
+                        "Session continuation requires API key authentication. "
+                        "Configure API_SERVER_KEY to enable this feature."
+                    ),
+                    status=403,
+                )
            session_id = provided_session_id
            try:
                db = self._ensure_session_db()
@@ -546,7 +587,7 @@ class APIServerAdapter(BasePlatformAdapter):
            # history already set from request body above

        completion_id = f"chatcmpl-{uuid.uuid4().hex[:29]}"
-        model_name = body.get("model", "hermes-agent")
+        model_name = body.get("model", self._model_name)
        created = int(time.time())

        if stream:
@@ -923,7 +964,7 @@ class APIServerAdapter(BasePlatformAdapter):
            "object": "response",
            "status": "completed",
            "created_at": created_at,
-            "model": body.get("model", "hermes-agent"),
+            "model": body.get("model", self._model_name),
            "output": output_items,
            "usage": {
                "input_tokens": usage.get("input_tokens", 0),
@@ -1652,9 +1693,17 @@ class APIServerAdapter(BasePlatformAdapter):
            await self._site.start()

            self._mark_connected()
+            if not self._api_key:
+                logger.warning(
+                    "[%s] ⚠️  No API key configured (API_SERVER_KEY / platforms.api_server.key). "
+                    "All requests will be accepted without authentication. "
+                    "Set an API key for production deployments to prevent "
+                    "unauthorized access to sessions, responses, and cron jobs.",
+                    self.name,
+                )
            logger.info(
-                "[%s] API server listening on http://%s:%d",
-                self.name, self._host, self._port,
+                "[%s] API server listening on http://%s:%d (model: %s)",
+                self.name, self._host, self._port, self._model_name,
            )
            return True

@@ -422,6 +422,7 @@ class DiscordAdapter(BasePlatformAdapter):

    # Discord message limits
    MAX_MESSAGE_LENGTH = 2000
+    _SPLIT_THRESHOLD = 1900  # near the 2000-char split point

    # Auto-disconnect from voice channel after this many seconds of inactivity
    VOICE_TIMEOUT = 300
@@ -433,6 +434,11 @@ class DiscordAdapter(BasePlatformAdapter):
        self._allowed_user_ids: set = set()  # For button approval authorization
        # Voice channel state (per-guild)
        self._voice_clients: Dict[int, Any] = {}  # guild_id -> VoiceClient
+        # Text batching: merge rapid successive messages (Telegram-style)
+        self._text_batch_delay_seconds = float(os.getenv("HERMES_DISCORD_TEXT_BATCH_DELAY_SECONDS", "0.6"))
+        self._text_batch_split_delay_seconds = float(os.getenv("HERMES_DISCORD_TEXT_BATCH_SPLIT_DELAY_SECONDS", "2.0"))
+        self._pending_text_batches: Dict[str, MessageEvent] = {}
+        self._pending_text_batch_tasks: Dict[str, asyncio.Task] = {}
        self._voice_text_channels: Dict[int, int] = {}  # guild_id -> text_channel_id
        self._voice_timeout_tasks: Dict[int, asyncio.Task] = {}  # guild_id -> timeout task
        # Phase 2: voice listening
@@ -2466,7 +2472,80 @@ class DiscordAdapter(BasePlatformAdapter):
        if thread_id:
            self._track_thread(thread_id)

-        await self.handle_message(event)
+        # Only batch plain text messages — commands, media, etc. dispatch
+        # immediately since they won't be split by the Discord client.
+        if msg_type == MessageType.TEXT and self._text_batch_delay_seconds > 0:
+            self._enqueue_text_event(event)
+        else:
+            await self.handle_message(event)
+
+    # ------------------------------------------------------------------
+    # Text message aggregation (handles Discord client-side splits)
+    # ------------------------------------------------------------------
+
+    def _text_batch_key(self, event: MessageEvent) -> str:
+        """Session-scoped key for text message batching."""
+        from gateway.session import build_session_key
+        return build_session_key(
+            event.source,
+            group_sessions_per_user=self.config.extra.get("group_sessions_per_user", True),
+            thread_sessions_per_user=self.config.extra.get("thread_sessions_per_user", False),
+        )
+
+    def _enqueue_text_event(self, event: MessageEvent) -> None:
+        """Buffer a text event and reset the flush timer.
+
+        When Discord splits a long user message at 2000 chars, the chunks
+        arrive within a few hundred milliseconds.  This merges them into
+        a single event before dispatching.
+        """
+        key = self._text_batch_key(event)
+        existing = self._pending_text_batches.get(key)
+        chunk_len = len(event.text or "")
+        if existing is None:
+            event._last_chunk_len = chunk_len  # type: ignore[attr-defined]
+            self._pending_text_batches[key] = event
+        else:
+            if event.text:
+                existing.text = f"{existing.text}\n{event.text}" if existing.text else event.text
+            existing._last_chunk_len = chunk_len  # type: ignore[attr-defined]
+            if event.media_urls:
+                existing.media_urls.extend(event.media_urls)
+                existing.media_types.extend(event.media_types)
+
+        prior_task = self._pending_text_batch_tasks.get(key)
+        if prior_task and not prior_task.done():
+            prior_task.cancel()
+        self._pending_text_batch_tasks[key] = asyncio.create_task(
+            self._flush_text_batch(key)
+        )
+
+    async def _flush_text_batch(self, key: str) -> None:
+        """Wait for the quiet period then dispatch the aggregated text.
+
+        Uses a longer delay when the latest chunk is near Discord's 2000-char
+        split point, since a continuation chunk is almost certain.
+        """
+        current_task = asyncio.current_task()
+        try:
+            pending = self._pending_text_batches.get(key)
+            last_len = getattr(pending, "_last_chunk_len", 0) if pending else 0
+            if last_len >= self._SPLIT_THRESHOLD:
+                delay = self._text_batch_split_delay_seconds
+            else:
+                delay = self._text_batch_delay_seconds
+            await asyncio.sleep(delay)
+            event = self._pending_text_batches.pop(key, None)
+            if not event:
+                return
+            logger.info(
+                "[Discord] Flushing text batch %s (%d chars)",
+                key, len(event.text or ""),
+            )
+            await self.handle_message(event)
+        finally:
+            if self._pending_text_batch_tasks.get(key) is current_task:
+                self._pending_text_batch_tasks.pop(key, None)


 # ---------------------------------------------------------------------------
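
Reviewer note: the debounce-with-adaptive-delay pattern added to the Discord adapter (and repeated for Matrix, Telegram, Feishu, and WeCom) can be reproduced outside the gateway. The sketch below is an assumption-laden illustration, not the adapter code: `TextBatcher`, `PendingText`, and `demo` are hypothetical names, the session key is a plain string, and the delays and threshold are shrunk so the demo finishes quickly.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict


@dataclass
class PendingText:
    text: str
    last_chunk_len: int  # length of the most recent chunk only


class TextBatcher:
    """Hypothetical stand-alone version of the per-session text batching."""

    def __init__(self, dispatch: Callable[[PendingText], Awaitable[None]],
                 delay: float, split_delay: float, split_threshold: int) -> None:
        self._dispatch = dispatch
        self._delay = delay
        self._split_delay = split_delay
        self._split_threshold = split_threshold
        self._pending: Dict[str, PendingText] = {}
        self._tasks: Dict[str, asyncio.Task] = {}

    def enqueue(self, key: str, text: str) -> None:
        existing = self._pending.get(key)
        if existing is None:
            self._pending[key] = PendingText(text, len(text))
        else:
            existing.text = f"{existing.text}\n{text}" if existing.text else text
            existing.last_chunk_len = len(text)
        # Cancel any pending flush and restart the quiet-period timer.
        prior = self._tasks.get(key)
        if prior and not prior.done():
            prior.cancel()
        self._tasks[key] = asyncio.create_task(self._flush(key))

    async def _flush(self, key: str) -> None:
        current = asyncio.current_task()
        try:
            pending = self._pending.get(key)
            last_len = pending.last_chunk_len if pending else 0
            # Wait longer when the latest chunk looks like a client-side split.
            near_split = last_len >= self._split_threshold
            await asyncio.sleep(self._split_delay if near_split else self._delay)
            event = self._pending.pop(key, None)
            if event is not None:
                await self._dispatch(event)
        finally:
            # Only clear the slot if it still belongs to this task.
            if self._tasks.get(key) is current:
                self._tasks.pop(key, None)


async def demo() -> list:
    delivered = []

    async def dispatch(event: PendingText) -> None:
        delivered.append(event.text)

    batcher = TextBatcher(dispatch, delay=0.05, split_delay=0.3, split_threshold=10)
    batcher.enqueue("session-1", "x" * 10)  # at the threshold: long delay applies
    await asyncio.sleep(0.1)                # past the short delay, inside the long one
    batcher.enqueue("session-1", "tail")    # short chunk: short delay applies
    await asyncio.sleep(0.3)
    return delivered
```

In this demo the first chunk would have been dispatched alone under a fixed 0.05 s delay; with the longer split delay it is still pending when the continuation arrives, so `asyncio.run(demo())` delivers a single merged message.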
@@ -264,6 +264,7 @@ class FeishuAdapterSettings:
    bot_name: str
    dedup_cache_size: int
    text_batch_delay_seconds: float
+    text_batch_split_delay_seconds: float
    text_batch_max_messages: int
    text_batch_max_chars: int
    media_batch_delay_seconds: float
@@ -1014,6 +1015,10 @@ class FeishuAdapter(BasePlatformAdapter):
    """Feishu/Lark bot adapter."""

    MAX_MESSAGE_LENGTH = 8000
+    # Threshold for detecting Feishu client-side message splits.
+    # When a chunk is near the ~4096-char practical limit, a continuation
+    # is almost certain.
+    _SPLIT_THRESHOLD = 4000

    # =========================================================================
    # Lifecycle — init / settings / connect / disconnect
@@ -1105,6 +1110,9 @@ class FeishuAdapter(BasePlatformAdapter):
            text_batch_delay_seconds=float(
                os.getenv("HERMES_FEISHU_TEXT_BATCH_DELAY_SECONDS", str(_DEFAULT_TEXT_BATCH_DELAY_SECONDS))
            ),
+            text_batch_split_delay_seconds=float(
+                os.getenv("HERMES_FEISHU_TEXT_BATCH_SPLIT_DELAY_SECONDS", "2.0")
+            ),
            text_batch_max_messages=max(
                1,
                int(os.getenv("HERMES_FEISHU_TEXT_BATCH_MAX_MESSAGES", str(_DEFAULT_TEXT_BATCH_MAX_MESSAGES))),
@@ -1152,6 +1160,7 @@ class FeishuAdapter(BasePlatformAdapter):
        self._bot_name = settings.bot_name
        self._dedup_cache_size = settings.dedup_cache_size
        self._text_batch_delay_seconds = settings.text_batch_delay_seconds
+        self._text_batch_split_delay_seconds = settings.text_batch_split_delay_seconds
        self._text_batch_max_messages = settings.text_batch_max_messages
        self._text_batch_max_chars = settings.text_batch_max_chars
        self._media_batch_delay_seconds = settings.media_batch_delay_seconds
@@ -2478,8 +2487,10 @@ class FeishuAdapter(BasePlatformAdapter):
    async def _enqueue_text_event(self, event: MessageEvent) -> None:
        """Debounce rapid Feishu text bursts into a single MessageEvent."""
        key = self._text_batch_key(event)
+        chunk_len = len(event.text or "")
        existing = self._pending_text_batches.get(key)
        if existing is None:
+            event._last_chunk_len = chunk_len  # type: ignore[attr-defined]
            self._pending_text_batches[key] = event
            self._pending_text_batch_counts[key] = 1
            self._schedule_text_batch_flush(key)
@@ -2504,6 +2515,7 @@ class FeishuAdapter(BasePlatformAdapter):
            return

        existing.text = next_text
+        existing._last_chunk_len = chunk_len  # type: ignore[attr-defined]
        existing.timestamp = event.timestamp
        if event.message_id:
            existing.message_id = event.message_id
@@ -2530,10 +2542,22 @@ class FeishuAdapter(BasePlatformAdapter):
        task_map[key] = asyncio.create_task(flush_fn(key))

    async def _flush_text_batch(self, key: str) -> None:
-        """Flush a pending text batch after the quiet period."""
+        """Flush a pending text batch after the quiet period.
+
+        Uses a longer delay when the latest chunk is near Feishu's ~4096-char
+        split point, since a continuation chunk is almost certain.
+        """
        current_task = asyncio.current_task()
        try:
-            await asyncio.sleep(self._text_batch_delay_seconds)
+            # Adaptive delay: if the latest chunk is near the split threshold,
+            # a continuation is almost certain — wait longer.
+            pending = self._pending_text_batches.get(key)
+            last_len = getattr(pending, "_last_chunk_len", 0) if pending else 0
+            if last_len >= self._SPLIT_THRESHOLD:
+                delay = self._text_batch_split_delay_seconds
+            else:
+                delay = self._text_batch_delay_seconds
+            await asyncio.sleep(delay)
            await self._flush_text_batch_now(key)
        finally:
            if self._pending_text_batch_tasks.get(key) is current_task:
@@ -120,6 +120,11 @@ def check_matrix_requirements() -> bool:
 class MatrixAdapter(BasePlatformAdapter):
    """Gateway adapter for Matrix (any homeserver)."""

+    # Threshold for detecting Matrix client-side message splits.
+    # When a chunk is near the ~4000-char practical limit, a continuation
+    # is almost certain.
+    _SPLIT_THRESHOLD = 3900
+
    def __init__(self, config: PlatformConfig):
        super().__init__(config, Platform.MATRIX)

@@ -172,6 +177,13 @@ class MatrixAdapter(BasePlatformAdapter):
            "MATRIX_REACTIONS", "true"
        ).lower() not in ("false", "0", "no")

+        # Text batching: merge rapid successive messages (Telegram-style).
+        # Matrix clients split long messages around 4000 chars.
+        self._text_batch_delay_seconds = float(os.getenv("HERMES_MATRIX_TEXT_BATCH_DELAY_SECONDS", "0.6"))
+        self._text_batch_split_delay_seconds = float(os.getenv("HERMES_MATRIX_TEXT_BATCH_SPLIT_DELAY_SECONDS", "2.0"))
+        self._pending_text_batches: Dict[str, MessageEvent] = {}
+        self._pending_text_batch_tasks: Dict[str, asyncio.Task] = {}
+
    def _is_duplicate_event(self, event_id) -> bool:
        """Return True if this event was already processed. Tracks the ID otherwise."""
        if not event_id:
@@ -1088,7 +1100,81 @@ class MatrixAdapter(BasePlatformAdapter):
        # Acknowledge receipt so the room shows as read (fire-and-forget).
        self._background_read_receipt(room.room_id, event.event_id)

-        await self.handle_message(msg_event)
+        # Only batch plain text messages — commands dispatch immediately.
+        if msg_type == MessageType.TEXT and self._text_batch_delay_seconds > 0:
+            self._enqueue_text_event(msg_event)
+        else:
+            await self.handle_message(msg_event)
+
+    # ------------------------------------------------------------------
+    # Text message aggregation (handles Matrix client-side splits)
+    # ------------------------------------------------------------------
+
+    def _text_batch_key(self, event: MessageEvent) -> str:
+        """Session-scoped key for text message batching."""
+        from gateway.session import build_session_key
+        return build_session_key(
+            event.source,
+            group_sessions_per_user=self.config.extra.get("group_sessions_per_user", True),
+            thread_sessions_per_user=self.config.extra.get("thread_sessions_per_user", False),
+        )
+
+    def _enqueue_text_event(self, event: MessageEvent) -> None:
+        """Buffer a text event and reset the flush timer.
+
+        When a Matrix client splits a long message, the chunks arrive within
+        a few hundred milliseconds.  This merges them into a single event
+        before dispatching.
+        """
+        key = self._text_batch_key(event)
+        existing = self._pending_text_batches.get(key)
+        chunk_len = len(event.text or "")
+        if existing is None:
+            event._last_chunk_len = chunk_len  # type: ignore[attr-defined]
+            self._pending_text_batches[key] = event
+        else:
+            if event.text:
+                existing.text = f"{existing.text}\n{event.text}" if existing.text else event.text
+            existing._last_chunk_len = chunk_len  # type: ignore[attr-defined]
+            # Merge any media that might be attached
+            if event.media_urls:
+                existing.media_urls.extend(event.media_urls)
+                existing.media_types.extend(event.media_types)
+
+        # Cancel any pending flush and restart the timer
+        prior_task = self._pending_text_batch_tasks.get(key)
+        if prior_task and not prior_task.done():
+            prior_task.cancel()
+        self._pending_text_batch_tasks[key] = asyncio.create_task(
+            self._flush_text_batch(key)
+        )
+
+    async def _flush_text_batch(self, key: str) -> None:
+        """Wait for the quiet period then dispatch the aggregated text.
+
+        Uses a longer delay when the latest chunk is near Matrix's ~4000-char
+        split point, since a continuation chunk is almost certain.
+        """
+        current_task = asyncio.current_task()
+        try:
+            pending = self._pending_text_batches.get(key)
+            last_len = getattr(pending, "_last_chunk_len", 0) if pending else 0
+            if last_len >= self._SPLIT_THRESHOLD:
+                delay = self._text_batch_split_delay_seconds
+            else:
+                delay = self._text_batch_delay_seconds
+            await asyncio.sleep(delay)
+            event = self._pending_text_batches.pop(key, None)
+            if not event:
+                return
+            logger.info(
+                "[Matrix] Flushing text batch %s (%d chars)",
+                key, len(event.text or ""),
+            )
+            await self.handle_message(event)
+        finally:
+            if self._pending_text_batch_tasks.get(key) is current_task:
+                self._pending_text_batch_tasks.pop(key, None)

    async def _on_room_message_media(self, room: Any, event: Any) -> None:
        """Handle incoming media messages (images, audio, video, files)."""
@@ -121,6 +121,9 @@ class TelegramAdapter(BasePlatformAdapter):
    
    # Telegram message limits
    MAX_MESSAGE_LENGTH = 4096
+    # Threshold for detecting Telegram client-side message splits.
+    # When a chunk is near this limit, a continuation is almost certain.
+    _SPLIT_THRESHOLD = 4000
    MEDIA_GROUP_WAIT_SECONDS = 0.8
    
    def __init__(self, config: PlatformConfig):
@@ -140,6 +143,7 @@ class TelegramAdapter(BasePlatformAdapter):
        # Buffer rapid text messages so Telegram client-side splits of long
        # messages are aggregated into a single MessageEvent.
        self._text_batch_delay_seconds = float(os.getenv("HERMES_TELEGRAM_TEXT_BATCH_DELAY_SECONDS", "0.6"))
+        self._text_batch_split_delay_seconds = float(os.getenv("HERMES_TELEGRAM_TEXT_BATCH_SPLIT_DELAY_SECONDS", "2.0"))
        self._pending_text_batches: Dict[str, MessageEvent] = {}
        self._pending_text_batch_tasks: Dict[str, asyncio.Task] = {}
        self._token_lock_identity: Optional[str] = None
@@ -2160,12 +2164,15 @@ class TelegramAdapter(BasePlatformAdapter):
        """
        key = self._text_batch_key(event)
        existing = self._pending_text_batches.get(key)
+        chunk_len = len(event.text or "")
        if existing is None:
+            event._last_chunk_len = chunk_len  # type: ignore[attr-defined]
            self._pending_text_batches[key] = event
        else:
            # Append text from the follow-up chunk
            if event.text:
                existing.text = f"{existing.text}\n{event.text}" if existing.text else event.text
+            existing._last_chunk_len = chunk_len  # type: ignore[attr-defined]
            # Merge any media that might be attached
            if event.media_urls:
                existing.media_urls.extend(event.media_urls)
@@ -2180,10 +2187,22 @@ class TelegramAdapter(BasePlatformAdapter):
        )

    async def _flush_text_batch(self, key: str) -> None:
-        """Wait for the quiet period then dispatch the aggregated text."""
+        """Wait for the quiet period then dispatch the aggregated text.
+
+        Uses a longer delay when the latest chunk is near Telegram's 4096-char
+        split point, since a continuation chunk is almost certain.
+        """
        current_task = asyncio.current_task()
        try:
-            await asyncio.sleep(self._text_batch_delay_seconds)
+            # Adaptive delay: if the latest chunk is near Telegram's 4096-char
+            # split point, a continuation is almost certain — wait longer.
+            pending = self._pending_text_batches.get(key)
+            last_len = getattr(pending, "_last_chunk_len", 0) if pending else 0
+            if last_len >= self._SPLIT_THRESHOLD:
+                delay = self._text_batch_split_delay_seconds
+            else:
+                delay = self._text_batch_delay_seconds
+            await asyncio.sleep(delay)
            event = self._pending_text_batches.pop(key, None)
            if not event:
                return
@@ -143,6 +143,9 @@ class WeComAdapter(BasePlatformAdapter):
    """WeCom AI Bot adapter backed by a persistent WebSocket connection."""

    MAX_MESSAGE_LENGTH = MAX_MESSAGE_LENGTH
+    # Threshold for detecting WeCom client-side message splits.
+    # When a chunk is near the 4000-char limit, a continuation is almost certain.
+    _SPLIT_THRESHOLD = 3900

    def __init__(self, config: PlatformConfig):
        super().__init__(config, Platform.WECOM)
@@ -172,6 +175,13 @@ class WeComAdapter(BasePlatformAdapter):
        self._seen_messages: Dict[str, float] = {}
        self._reply_req_ids: Dict[str, str] = {}

+        # Text batching: merge rapid successive messages (Telegram-style).
+        # WeCom clients split long messages around 4000 chars.
+        self._text_batch_delay_seconds = float(os.getenv("HERMES_WECOM_TEXT_BATCH_DELAY_SECONDS", "0.6"))
+        self._text_batch_split_delay_seconds = float(os.getenv("HERMES_WECOM_TEXT_BATCH_SPLIT_DELAY_SECONDS", "2.0"))
+        self._pending_text_batches: Dict[str, MessageEvent] = {}
+        self._pending_text_batch_tasks: Dict[str, asyncio.Task] = {}
+
    # ------------------------------------------------------------------
    # Connection lifecycle
    # ------------------------------------------------------------------
@@ -519,7 +529,82 @@ class WeComAdapter(BasePlatformAdapter):
            timestamp=datetime.now(tz=timezone.utc),
        )

-        await self.handle_message(event)
+        # Only batch plain text messages — commands, media, etc. dispatch
+        # immediately since they won't be split by the WeCom client.
+        if message_type == MessageType.TEXT and self._text_batch_delay_seconds > 0:
+            self._enqueue_text_event(event)
+        else:
+            await self.handle_message(event)
+
+    # ------------------------------------------------------------------
+    # Text message aggregation (handles WeCom client-side splits)
+    # ------------------------------------------------------------------
+
+    def _text_batch_key(self, event: MessageEvent) -> str:
+        """Session-scoped key for text message batching."""
+        from gateway.session import build_session_key
+        return build_session_key(
+            event.source,
+            group_sessions_per_user=self.config.extra.get("group_sessions_per_user", True),
+            thread_sessions_per_user=self.config.extra.get("thread_sessions_per_user", False),
+        )
+
+    def _enqueue_text_event(self, event: MessageEvent) -> None:
+        """Buffer a text event and reset the flush timer.
+
+        When WeCom splits a long user message at 4000 chars, the chunks
+        arrive within a few hundred milliseconds.  This merges them into
+        a single event before dispatching.
+        """
+        key = self._text_batch_key(event)
+        existing = self._pending_text_batches.get(key)
+        chunk_len = len(event.text or "")
+        if existing is None:
+            event._last_chunk_len = chunk_len  # type: ignore[attr-defined]
+            self._pending_text_batches[key] = event
+        else:
+            if event.text:
+                existing.text = f"{existing.text}\n{event.text}" if existing.text else event.text
+            existing._last_chunk_len = chunk_len  # type: ignore[attr-defined]
+            # Merge any media that might be attached
+            if event.media_urls:
+                existing.media_urls.extend(event.media_urls)
+                existing.media_types.extend(event.media_types)
+
+        # Cancel any pending flush and restart the timer
+        prior_task = self._pending_text_batch_tasks.get(key)
+        if prior_task and not prior_task.done():
+            prior_task.cancel()
+        self._pending_text_batch_tasks[key] = asyncio.create_task(
+            self._flush_text_batch(key)
+        )
+
+    async def _flush_text_batch(self, key: str) -> None:
+        """Wait for the quiet period then dispatch the aggregated text.
+
+        Uses a longer delay when the latest chunk is near WeCom's 4000-char
+        split point, since a continuation chunk is almost certain.
+        """
+        current_task = asyncio.current_task()
+        try:
+            pending = self._pending_text_batches.get(key)
+            last_len = getattr(pending, "_last_chunk_len", 0) if pending else 0
+            if last_len >= self._SPLIT_THRESHOLD:
+                delay = self._text_batch_split_delay_seconds
+            else:
+                delay = self._text_batch_delay_seconds
+            await asyncio.sleep(delay)
+            event = self._pending_text_batches.pop(key, None)
+            if not event:
+                return
+            logger.info(
+                "[WeCom] Flushing text batch %s (%d chars)",
+                key, len(event.text or ""),
+            )
+            await self.handle_message(event)
+        finally:
+            if self._pending_text_batch_tasks.get(key) is current_task:
+                self._pending_text_batch_tasks.pop(key, None)

    @staticmethod
    def _extract_text(body: Dict[str, Any]) -> Tuple[str, Optional[str]]:
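
The adaptive quiet period introduced in the WeCom hunk above (and mirrored in the Telegram hunk) can be sketched in isolation as follows. This is a minimal illustration, not the adapter's code: the `Batcher` class, its attribute names, and the shortened delays are hypothetical stand-ins for `_pending_text_batches`, `_pending_text_batch_tasks`, and the 0.6 s / 2.0 s defaults.

```python
import asyncio

SPLIT_THRESHOLD = 3900  # mirrors _SPLIT_THRESHOLD in the diff
SHORT_DELAY = 0.05      # shortened stand-in for the 0.6 s default
LONG_DELAY = 0.5        # shortened stand-in for the 2.0 s split delay


class Batcher:
    """Hypothetical, stripped-down stand-in for the adapter's batching state."""

    def __init__(self):
        self.pending = {}     # key -> [merged_text, last_chunk_len]
        self.tasks = {}       # key -> asyncio.Task running _flush
        self.dispatched = []  # stands in for handle_message()

    def enqueue(self, key, text):
        entry = self.pending.get(key)
        if entry is None:
            self.pending[key] = [text, len(text)]
        else:
            entry[0] = f"{entry[0]}\n{text}"
            entry[1] = len(text)  # only the latest chunk decides the delay
        # Cancel any pending flush and restart the timer.
        prior = self.tasks.get(key)
        if prior and not prior.done():
            prior.cancel()
        self.tasks[key] = asyncio.create_task(self._flush(key))

    async def _flush(self, key):
        current = asyncio.current_task()
        try:
            entry = self.pending.get(key)
            last_len = entry[1] if entry else 0
            delay = LONG_DELAY if last_len >= SPLIT_THRESHOLD else SHORT_DELAY
            await asyncio.sleep(delay)
            entry = self.pending.pop(key, None)
            if entry:
                self.dispatched.append(entry[0])
        finally:
            # Only the task that still owns the slot may clear it.
            if self.tasks.get(key) is current:
                self.tasks.pop(key, None)


async def demo():
    b = Batcher()
    b.enqueue("s1", "a" * 3950)  # near the split point, so the long delay applies
    await asyncio.sleep(0.1)     # longer than SHORT_DELAY, shorter than LONG_DELAY
    b.enqueue("s1", "tail")      # continuation arrives and restarts the timer
    await asyncio.sleep(0.2)
    return b.dispatched


result = asyncio.run(demo())
```

With a fixed short delay the first chunk would already have been dispatched on its own by the time the continuation arrived; the longer wait after a near-limit chunk is what keeps the two halves in one event.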
@@ -4919,14 +4919,21 @@ class GatewayRunner:
            return f"🧠 ✓ Reasoning effort set to `{effort}` (this session only)"

    async def _handle_yolo_command(self, event: MessageEvent) -> str:
-        """Handle /yolo — toggle dangerous command approval bypass."""
-        current = bool(os.environ.get("HERMES_YOLO_MODE"))
+        """Handle /yolo — toggle dangerous command approval bypass for this session only."""
+        from tools.approval import (
+            disable_session_yolo,
+            enable_session_yolo,
+            is_session_yolo_enabled,
+        )
+
+        session_key = self._session_key_for_source(event.source)
+        current = is_session_yolo_enabled(session_key)
        if current:
-            os.environ.pop("HERMES_YOLO_MODE", None)
-            return "⚠️ YOLO mode **OFF** — dangerous commands will require approval."
+            disable_session_yolo(session_key)
+            return "⚠️ YOLO mode **OFF** for this session — dangerous commands will require approval."
        else:
-            os.environ["HERMES_YOLO_MODE"] = "1"
-            return "⚡ YOLO mode **ON** — all commands auto-approved. Use with caution."
+            enable_session_yolo(session_key)
+            return "⚡ YOLO mode **ON** for this session — all commands auto-approved. Use with caution."

    async def _handle_verbose_command(self, event: MessageEvent) -> str:
        """Handle /verbose command — cycle tool progress display mode.
@@ -5274,27 +5281,76 @@ class GatewayRunner:
        )

    async def _handle_usage_command(self, event: MessageEvent) -> str:
-        """Handle /usage command -- show token usage for the session's last agent run."""
+        """Handle /usage command -- show token usage for the current session.
+
+        Checks both _running_agents (mid-turn) and _agent_cache (between turns)
+        so that rate limits, cost estimates, and detailed token breakdowns are
+        available whenever the user asks, not only while the agent is running.
+        """
        source = event.source
        session_key = self._session_key_for_source(source)

+        # Try running agent first (mid-turn), then cached agent (between turns)
        agent = self._running_agents.get(session_key)
+        if not agent or agent is _AGENT_PENDING_SENTINEL:
+            _cache_lock = getattr(self, "_agent_cache_lock", None)
+            _cache = getattr(self, "_agent_cache", None)
+            if _cache_lock and _cache is not None:
+                with _cache_lock:
+                    cached = _cache.get(session_key)
+                    if cached:
+                        agent = cached[0]
+
        if agent and hasattr(agent, "session_total_tokens") and agent.session_api_calls > 0:
            lines = []

-            # Rate limits first (when available from provider headers)
+            # Rate limits (when available from provider headers)
            rl_state = agent.get_rate_limit_state()
            if rl_state and rl_state.has_data:
                from agent.rate_limit_tracker import format_rate_limit_compact
                lines.append(f"⏱️ **Rate Limits:** {format_rate_limit_compact(rl_state)}")
                lines.append("")

-            # Session token usage
+            # Session token usage — detailed breakdown matching CLI
+            input_tokens = getattr(agent, "session_input_tokens", 0) or 0
+            output_tokens = getattr(agent, "session_output_tokens", 0) or 0
+            cache_read = getattr(agent, "session_cache_read_tokens", 0) or 0
+            cache_write = getattr(agent, "session_cache_write_tokens", 0) or 0
+
            lines.append("📊 **Session Token Usage**")
-            lines.append(f"Prompt (input): {agent.session_prompt_tokens:,}")
-            lines.append(f"Completion (output): {agent.session_completion_tokens:,}")
+            lines.append(f"Model: `{agent.model}`")
+            lines.append(f"Input tokens: {input_tokens:,}")
+            if cache_read:
+                lines.append(f"Cache read tokens: {cache_read:,}")
+            if cache_write:
+                lines.append(f"Cache write tokens: {cache_write:,}")
+            lines.append(f"Output tokens: {output_tokens:,}")
            lines.append(f"Total: {agent.session_total_tokens:,}")
            lines.append(f"API calls: {agent.session_api_calls}")
+
+            # Cost estimation
+            try:
+                from agent.usage_pricing import CanonicalUsage, estimate_usage_cost
+                cost_result = estimate_usage_cost(
+                    agent.model,
+                    CanonicalUsage(
+                        input_tokens=input_tokens,
+                        output_tokens=output_tokens,
+                        cache_read_tokens=cache_read,
+                        cache_write_tokens=cache_write,
+                    ),
+                    provider=getattr(agent, "provider", None),
+                    base_url=getattr(agent, "base_url", None),
+                )
+                if cost_result.amount_usd is not None:
+                    prefix = "~" if cost_result.status == "estimated" else ""
+                    lines.append(f"Cost: {prefix}${float(cost_result.amount_usd):.4f}")
+                elif cost_result.status == "included":
+                    lines.append("Cost: included")
+            except Exception:
+                pass
+
+            # Context window and compressions
            ctx = agent.context_compressor
            if ctx.last_prompt_tokens:
                pct = min(100, ctx.last_prompt_tokens / ctx.context_length * 100) if ctx.context_length else 0
@@ -5304,7 +5360,7 @@ class GatewayRunner:

            return "\n".join(lines)

-        # No running agent -- check session history for a rough count
+        # No agent at all -- check session history for a rough count
        session_entry = self.session_store.get_or_create_session(source)
        history = self.session_store.load_transcript(session_entry.session_id)
        if history:
@@ -5315,7 +5371,7 @@ class GatewayRunner:
                f"📊 **Session Info**\n"
                f"Messages: {len(msgs)}\n"
                f"Estimated context: ~{approx:,} tokens\n"
-                f"_(Detailed usage available during active conversations)_"
+                f"_(Detailed usage available after the first agent response)_"
            )
        return "No usage data available for this session."

@@ -6283,6 +6339,32 @@ class GatewayRunner:
        )
        return hashlib.sha256(blob.encode()).hexdigest()[:16]

+    def _apply_session_model_override(
+        self, session_key: str, model: str, runtime_kwargs: dict
+    ) -> tuple:
+        """Apply /model session overrides if present, returning (model, runtime_kwargs).
+
+        The gateway /model command stores per-session overrides in
+        ``_session_model_overrides``.  These must take precedence over
+        config.yaml defaults so the switched model is actually used for
+        subsequent messages.  Fields with ``None`` values are skipped so
+        partial overrides don't clobber valid config defaults.
+        """
+        override = self._session_model_overrides.get(session_key)
+        if not override:
+            return model, runtime_kwargs
+        model = override.get("model", model)
+        for key in ("provider", "api_key", "base_url", "api_mode"):
+            val = override.get(key)
+            if val is not None:
+                runtime_kwargs[key] = val
+        return model, runtime_kwargs
+
+    def _is_intentional_model_switch(self, session_key: str, agent_model: str) -> bool:
+        """Return True if *agent_model* matches an active /model session override."""
+        override = self._session_model_overrides.get(session_key)
+        return override is not None and override.get("model") == agent_model
+
    def _evict_cached_agent(self, session_key: str) -> None:
        """Remove a cached agent for a session (called on /new, /model, etc)."""
        _lock = getattr(self, "_agent_cache_lock", None)
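
The precedence rule in `_apply_session_model_override` can be exercised as a standalone function. The sketch below copies the merge logic from the hunk above; the free-function form, the `overrides` argument, and the sample values are illustrative assumptions.

```python
def apply_session_model_override(overrides, session_key, model, runtime_kwargs):
    """Standalone copy of the merge logic, with the override map passed in."""
    override = overrides.get(session_key)
    if not override:
        return model, runtime_kwargs
    model = override.get("model", model)
    for key in ("provider", "api_key", "base_url", "api_mode"):
        val = override.get(key)
        if val is not None:  # skip None so config defaults survive
            runtime_kwargs[key] = val
    return model, runtime_kwargs


# Hypothetical sample data: a partial override that only names model and provider.
overrides = {"s1": {"model": "model-b", "provider": "provider-b", "api_key": None}}
config_kwargs = {"provider": "provider-a", "api_key": "key-a"}

model, kwargs = apply_session_model_override(
    overrides, "s1", "model-a", dict(config_kwargs)
)
other_model, other_kwargs = apply_session_model_override(
    overrides, "s2", "model-a", dict(config_kwargs)
)
```

Note that the `None` check covers only the four runtime keys; `override.get("model", model)` falls back to the config model only when the `"model"` key is absent, not when it is present with a `None` value.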
@@ -6660,6 +6742,11 @@ class GatewayRunner:
                    "tools": [],
                }

+            # /model overrides take precedence over config.yaml defaults.
+            model, runtime_kwargs = self._apply_session_model_override(
+                session_key, model, runtime_kwargs
+            )
+
            pr = self._provider_routing
            reasoning_config = self._load_reasoning_config()
            self._reasoning_config = reasoning_config
@@ -7279,14 +7366,15 @@ class GatewayRunner:
            _agent = agent_holder[0]
            if _agent is not None and hasattr(_agent, 'model'):
                _cfg_model = _resolve_gateway_model()
-                if _agent.model != _cfg_model:
+                if _agent.model != _cfg_model and not self._is_intentional_model_switch(session_key, _agent.model):
                    self._effective_model = _agent.model
                    self._effective_provider = getattr(_agent, 'provider', None)
                    # Fallback activated — evict cached agent so the next
                    # message starts fresh and retries the primary model.
                    self._evict_cached_agent(session_key)
                else:
-                    # Primary model worked — clear any stale fallback state
+                    # Primary model worked (or intentional /model switch)
+                    # — clear any stale fallback state.
                    self._effective_model = None
                    self._effective_provider = None

@@ -2581,7 +2581,7 @@ def _prompt_model_selection(
            custom = input("Enter model name: ").strip()
            return custom if custom else None
        return None
-    except (ImportError, NotImplementedError):
+    except (ImportError, NotImplementedError, OSError, subprocess.SubprocessError):
        pass

    # Fallback: numbered list
@@ -100,6 +100,9 @@ COMMAND_REGISTRY: list[CommandDef] = [
    CommandDef("reasoning", "Manage reasoning effort and display", "Configuration",
               args_hint="[level|show|hide]",
               subcommands=("none", "minimal", "low", "medium", "high", "xhigh", "show", "hide", "on", "off")),
+    CommandDef("fast", "Toggle fast mode — OpenAI Priority Processing / Anthropic Fast Mode (Normal/Fast)", "Configuration",
+               cli_only=True, args_hint="[normal|fast|status]",
+               subcommands=("normal", "fast", "status", "on", "off")),
    CommandDef("skin", "Show or change the display skin/theme", "Configuration",
               cli_only=True, args_hint="[name]"),
    CommandDef("voice", "Toggle voice mode", "Configuration",
@@ -135,6 +138,8 @@ COMMAND_REGISTRY: list[CommandDef] = [
               cli_only=True, aliases=("gateway",)),
    CommandDef("paste", "Check clipboard for an image and attach it", "Info",
               cli_only=True),
+    CommandDef("image", "Attach a local image file for your next prompt", "Info",
+               cli_only=True, args_hint="<path>"),
    CommandDef("update", "Update Hermes Agent to the latest version", "Info",
               gateway_only=True),

@@ -637,8 +642,18 @@ class SlashCommandCompleter(Completer):
    def __init__(
        self,
        skill_commands_provider: Callable[[], Mapping[str, dict[str, Any]]] | None = None,
+        command_filter: Callable[[str], bool] | None = None,
    ) -> None:
        self._skill_commands_provider = skill_commands_provider
+        self._command_filter = command_filter
+
+    def _command_allowed(self, slash_command: str) -> bool:
+        if self._command_filter is None:
+            return True
+        try:
+            return bool(self._command_filter(slash_command))
+        except Exception:
+            return True

    def _iter_skill_commands(self) -> Mapping[str, dict[str, Any]]:
        if self._skill_commands_provider is None:
@@ -916,7 +931,7 @@ class SlashCommandCompleter(Completer):
                return

            # Static subcommand completions
-            if " " not in sub_text and base_cmd in SUBCOMMANDS:
+            if " " not in sub_text and base_cmd in SUBCOMMANDS and self._command_allowed(base_cmd):
                for sub in SUBCOMMANDS[base_cmd]:
                    if sub.startswith(sub_lower) and sub != sub_lower:
                        yield Completion(
@@ -929,6 +944,8 @@ class SlashCommandCompleter(Completer):
        word = text[1:]

        for cmd, desc in COMMANDS.items():
+            if not self._command_allowed(cmd):
+                continue
            cmd_name = cmd[1:]
            if cmd_name.startswith(word):
                yield Completion(
@@ -987,6 +1004,8 @@ class SlashCommandAutoSuggest(AutoSuggest):
            # Still typing the command name: /upd → suggest "ate"
            word = text[1:].lower()
            for cmd in COMMANDS:
+                if self._completer is not None and not self._completer._command_allowed(cmd):
+                    continue
                cmd_name = cmd[1:]  # strip leading /
                if cmd_name.startswith(word) and cmd_name != word:
                    return Suggestion(cmd_name[len(word):])
@@ -997,6 +1016,8 @@ class SlashCommandAutoSuggest(AutoSuggest):
        sub_lower = sub_text.lower()

        # Static subcommands
+        if self._completer is not None and not self._completer._command_allowed(base_cmd):
+            return None
        if base_cmd in SUBCOMMANDS and SUBCOMMANDS[base_cmd]:
            if " " not in sub_text:
                for sub in SUBCOMMANDS[base_cmd]:
@@ -158,16 +158,27 @@ def get_project_root() -> Path:
    return Path(__file__).parent.parent.resolve()

 def _secure_dir(path):
-    """Set directory to owner-only access (0700). No-op on Windows.
+    """Set directory to owner-only access (0700 by default). No-op on Windows.

    Skipped in managed mode — the NixOS module sets group-readable
    permissions (0750) so interactive users in the hermes group can
    share state with the gateway service.
+
+    The mode can be overridden via the HERMES_HOME_MODE environment variable
+    (e.g. HERMES_HOME_MODE=0701) for deployments where a web server (nginx,
+    caddy, etc.) needs to traverse HERMES_HOME to reach a served subdirectory.
+    The execute-only bit on a directory permits cd-through without exposing
+    directory listings.
    """
    if is_managed():
        return
    try:
-        os.chmod(path, 0o700)
+        mode_str = os.environ.get("HERMES_HOME_MODE", "").strip()
+        mode = int(mode_str, 8) if mode_str else 0o700
+    except ValueError:
+        mode = 0o700
+    try:
+        os.chmod(path, mode)
    except (OSError, NotImplementedError):
        pass
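
The mode parsing added to `_secure_dir` above can be checked in isolation. This is a sketch under one assumption: the helper name `resolve_home_mode` is hypothetical, and it takes the raw string as an argument rather than reading `HERMES_HOME_MODE` from the environment.

```python
def resolve_home_mode(raw, default=0o700):
    """Parse an octal mode string, falling back to the default on bad input."""
    raw = (raw or "").strip()
    try:
        return int(raw, 8) if raw else default
    except ValueError:
        return default


traverse_only = resolve_home_mode("0701")  # owner rwx, others execute-only
unset = resolve_home_mode("")              # variable not set
invalid = resolve_home_mode("rwx")         # not octal digits
prefixed = resolve_home_mode("0o750")      # Python-style octal prefix
```

Because `int(..., 8)` accepts an optional `0o` prefix, both `0701` and `0o701` parse to the same mode, while anything that is not valid octal silently falls back to `0o700`.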

@@ -255,6 +266,7 @@ DEFAULT_CONFIG = {
        # tools or receiving API responses.  Only fires when the agent has
        # been completely idle for this duration.  0 = unlimited.
        "gateway_timeout": 1800,
+        "service_tier": "",
        # Tool-use enforcement: injects system prompt guidance that tells the
        # model to actually call tools instead of describing intended actions.
        # Values: "auto" (default — applies to gpt/codex models), true/false
@@ -1216,6 +1228,14 @@ OPTIONAL_ENV_VARS = {
        "category": "messaging",
        "advanced": True,
    },
+    "API_SERVER_MODEL_NAME": {
+        "description": "Model name advertised on /v1/models. Defaults to the profile name (or 'hermes-agent' for the default profile). Useful for multi-user setups with OpenWebUI.",
+        "prompt": "API server model name",
+        "url": None,
+        "password": False,
+        "category": "messaging",
+        "advanced": True,
+    },
    "WEBHOOK_ENABLED": {
        "description": "Enable the webhook platform adapter for receiving events from GitHub, GitLab, etc.",
        "prompt": "Enable webhooks (true/false)",
@@ -285,6 +285,7 @@ def copilot_request_headers(
    headers: dict[str, str] = {
        "Editor-Version": "vscode/1.104.1",
        "User-Agent": "HermesAgent/1.0",
+        "Copilot-Integration-Id": "vscode-chat",
        "Openai-Intent": "conversation-edits",
        "x-initiator": "agent" if is_agent_turn else "user",
    }
@@ -54,6 +54,32 @@ _PROVIDER_ENV_HINTS = (
 )


+from hermes_constants import is_termux as _is_termux
+
+
+def _python_install_cmd() -> str:
+    return "python -m pip install" if _is_termux() else "uv pip install"
+
+
+def _system_package_install_cmd(pkg: str) -> str:
+    if _is_termux():
+        return f"pkg install {pkg}"
+    if sys.platform == "darwin":
+        return f"brew install {pkg}"
+    return f"sudo apt install {pkg}"
+
+
+def _termux_browser_setup_steps(node_installed: bool) -> list[str]:
+    steps: list[str] = []
+    step = 1
+    if not node_installed:
+        steps.append(f"{step}) pkg install nodejs")
+        step += 1
+    steps.append(f"{step}) npm install -g agent-browser")
+    steps.append(f"{step + 1}) agent-browser install")
+    return steps
+
+
 def _has_provider_env_config(content: str) -> bool:
    """Return True when ~/.hermes/.env contains provider auth/base URL settings."""
    return any(key in content for key in _PROVIDER_ENV_HINTS)
@@ -200,7 +226,7 @@ def run_doctor(args):
            check_ok(name)
        except ImportError:
            check_fail(name, "(missing)")
-            issues.append(f"Install {name}: uv pip install {module}")
+            issues.append(f"Install {name}: {_python_install_cmd()} {module}")
    
    for module, name in optional_packages:
        try:
@@ -503,7 +529,7 @@ def run_doctor(args):
        check_ok("ripgrep (rg)", "(faster file search)")
    else:
        check_warn("ripgrep (rg) not found", "(file search uses grep fallback)")
-        check_info("Install for faster search: sudo apt install ripgrep")
+        check_info(f"Install for faster search: {_system_package_install_cmd('ripgrep')}")
    
    # Docker (optional)
    terminal_env = os.getenv("TERMINAL_ENV", "local")
@@ -526,7 +552,10 @@ def run_doctor(args):
        if shutil.which("docker"):
            check_ok("docker", "(optional)")
        else:
-            check_warn("docker not found", "(optional)")
+            if _is_termux():
+                check_info("Docker backend is not available inside Termux (expected on Android)")
+            else:
+                check_warn("docker not found", "(optional)")
    
    # SSH (if using ssh backend)
    if terminal_env == "ssh":
@@ -574,9 +603,23 @@ def run_doctor(args):
        if agent_browser_path.exists():
            check_ok("agent-browser (Node.js)", "(browser automation)")
        else:
-            check_warn("agent-browser not installed", "(run: npm install)")
+            if _is_termux():
+                check_info("agent-browser is not installed (expected in the tested Termux path)")
+                check_info("Install it manually later with: npm install -g agent-browser && agent-browser install")
+                check_info("Termux browser setup:")
+                for step in _termux_browser_setup_steps(node_installed=True):
+                    check_info(step)
+            else:
+                check_warn("agent-browser not installed", "(run: npm install)")
    else:
-        check_warn("Node.js not found", "(optional, needed for browser tools)")
+        if _is_termux():
+            check_info("Node.js not found (browser tools are optional in the tested Termux path)")
+            check_info("Install Node.js on Termux with: pkg install nodejs")
+            check_info("Termux browser setup:")
+            for step in _termux_browser_setup_steps(node_installed=False):
+                check_info(step)
+        else:
+            check_warn("Node.js not found", "(optional, needed for browser tools)")
    
    # npm audit for all Node.js packages
    if shutil.which("npm"):
@@ -709,7 +752,7 @@ def run_doctor(args):
                _url = (_base.rstrip("/") + "/models") if _base else _default_url
                _headers = {"Authorization": f"Bearer {_key}"}
                if "api.kimi.com" in _url.lower():
-                    _headers["User-Agent"] = "KimiCLI/1.0"
+                    _headers["User-Agent"] = "KimiCLI/1.30.0"
                _resp = httpx.get(
                    _url,
                    headers=_headers,
@@ -739,8 +782,9 @@ def run_doctor(args):
                __import__("tinker_atropos")
                check_ok("tinker-atropos", "(RL training backend)")
            except ImportError:
-                check_warn("tinker-atropos found but not installed", "(run: uv pip install -e ./tinker-atropos)")
-                issues.append("Install tinker-atropos: uv pip install -e ./tinker-atropos")
+                install_cmd = f"{_python_install_cmd()} -e ./tinker-atropos"
+                check_warn("tinker-atropos found but not installed", f"(run: {install_cmd})")
+                issues.append(f"Install tinker-atropos: {install_cmd}")
        else:
            check_warn("tinker-atropos requires Python 3.11+", f"(current: {py_version.major}.{py_version.minor})")
    else:
@@ -39,7 +39,7 @@ def _get_service_pids() -> set:
    pids: set = set()

    # --- systemd (Linux): user and system scopes ---
-    if is_linux():
+    if supports_systemd_services():
        for scope_args in [["systemctl", "--user"], ["systemctl"]]:
            try:
                result = subprocess.run(
@@ -225,6 +225,14 @@ def stop_profile_gateway() -> bool:
 def is_linux() -> bool:
    return sys.platform.startswith('linux')

+
+from hermes_constants import is_termux
+
+
+def supports_systemd_services() -> bool:
+    return is_linux() and not is_termux()
+
+
 def is_macos() -> bool:
    return sys.platform == 'darwin'

@@ -477,13 +485,15 @@ def install_linux_gateway_from_setup(force: bool = False) -> tuple[str | None, b


 def get_systemd_linger_status() -> tuple[bool | None, str]:
-    """Return whether systemd user lingering is enabled for the current user.
+    """Return systemd linger status for the current user.

    Returns:
        (True, "") when linger is enabled.
        (False, "") when linger is disabled.
        (None, detail) when the status could not be determined.
    """
+    if is_termux():
+        return None, "not supported in Termux"
    if not is_linux():
        return None, "not supported on this platform"

@@ -766,7 +776,7 @@ def _print_linger_enable_warning(username: str, detail: str | None = None) -> No

 def _ensure_linger_enabled() -> None:
    """Enable linger when possible so the user gateway survives logout."""
-    if not is_linux():
+    if is_termux() or not is_linux():
        return

    import getpass
@@ -1801,7 +1811,7 @@ def _setup_whatsapp():

 def _is_service_installed() -> bool:
    """Check if the gateway is installed as a system service."""
-    if is_linux():
+    if supports_systemd_services():
        return get_systemd_unit_path(system=False).exists() or get_systemd_unit_path(system=True).exists()
    elif is_macos():
        return get_launchd_plist_path().exists()
@@ -1810,7 +1820,7 @@ def _is_service_installed() -> bool:

 def _is_service_running() -> bool:
    """Check if the gateway service is currently running."""
-    if is_linux():
+    if supports_systemd_services():
        user_unit_exists = get_systemd_unit_path(system=False).exists()
        system_unit_exists = get_systemd_unit_path(system=True).exists()

@@ -1983,7 +1993,7 @@ def gateway_setup():
    service_installed = _is_service_installed()
    service_running = _is_service_running()

-    if is_linux() and has_conflicting_systemd_units():
+    if supports_systemd_services() and has_conflicting_systemd_units():
        print_systemd_scope_conflict_warning()
        print()

@@ -1993,7 +2003,7 @@ def gateway_setup():
        print_warning("Gateway service is installed but not running.")
        if prompt_yes_no("  Start it now?", True):
            try:
-                if is_linux():
+                if supports_systemd_services():
                    systemd_start()
                elif is_macos():
                    launchd_start()
@@ -2044,7 +2054,7 @@ def gateway_setup():
        if service_running:
            if prompt_yes_no("  Restart the gateway to pick up changes?", True):
                try:
-                    if is_linux():
+                    if supports_systemd_services():
                        systemd_restart()
                    elif is_macos():
                        launchd_restart()
@@ -2056,7 +2066,7 @@ def gateway_setup():
        elif service_installed:
            if prompt_yes_no("  Start the gateway service?", True):
                try:
-                    if is_linux():
+                    if supports_systemd_services():
                        systemd_start()
                    elif is_macos():
                        launchd_start()
@@ -2064,13 +2074,13 @@ def gateway_setup():
                    print_error(f"  Start failed: {e}")
        else:
            print()
-            if is_linux() or is_macos():
-                platform_name = "systemd" if is_linux() else "launchd"
+            if supports_systemd_services() or is_macos():
+                platform_name = "systemd" if supports_systemd_services() else "launchd"
                if prompt_yes_no(f"  Install the gateway as a {platform_name} service? (runs in background, starts on boot)", True):
                    try:
                        installed_scope = None
                        did_install = False
-                        if is_linux():
+                        if supports_systemd_services():
                            installed_scope, did_install = install_linux_gateway_from_setup(force=False)
                        else:
                            launchd_install(force=False)
@@ -2078,7 +2088,7 @@ def gateway_setup():
                        print()
                        if did_install and prompt_yes_no("  Start the service now?", True):
                            try:
-                                if is_linux():
+                                if supports_systemd_services():
                                    systemd_start(system=installed_scope == "system")
                                else:
                                    launchd_start()
@@ -2089,12 +2099,18 @@ def gateway_setup():
                        print_info("  You can try manually: hermes gateway install")
                else:
                    print_info("  You can install later: hermes gateway install")
-                    if is_linux():
+                    if supports_systemd_services():
                        print_info("  Or as a boot-time service: sudo hermes gateway install --system")
                    print_info("  Or run in foreground:  hermes gateway")
            else:
-                print_info("  Service install not supported on this platform.")
-                print_info("  Run in foreground: hermes gateway")
+                if is_termux():
+                    from hermes_constants import display_hermes_home as _dhh
+                    print_info("  Termux does not use systemd/launchd services.")
+                    print_info("  Run in foreground: hermes gateway")
+                    print_info(f"  Or start it manually in the background (best effort): nohup hermes gateway >{_dhh()}/logs/gateway.log 2>&1 &")
+                else:
+                    print_info("  Service install not supported on this platform.")
+                    print_info("  Run in foreground: hermes gateway")
    else:
        print()
        print_info("No platforms configured. Run 'hermes gateway setup' when ready.")
@@ -2130,7 +2146,11 @@ def gateway_command(args):
        force = getattr(args, 'force', False)
        system = getattr(args, 'system', False)
        run_as_user = getattr(args, 'run_as_user', None)
-        if is_linux():
+        if is_termux():
+            print("Gateway service installation is not supported on Termux.")
+            print("Run manually: hermes gateway")
+            sys.exit(1)
+        if supports_systemd_services():
            systemd_install(force=force, system=system, run_as_user=run_as_user)
        elif is_macos():
            launchd_install(force)
@@ -2144,7 +2164,11 @@ def gateway_command(args):
            managed_error("uninstall gateway service (managed by NixOS)")
            return
        system = getattr(args, 'system', False)
-        if is_linux():
+        if is_termux():
+            print("Gateway service uninstall is not supported on Termux because there is no managed service to remove.")
+            print("Stop manual runs with: hermes gateway stop")
+            sys.exit(1)
+        if supports_systemd_services():
            systemd_uninstall(system=system)
        elif is_macos():
            launchd_uninstall()
@@ -2154,7 +2178,11 @@ def gateway_command(args):
    
    elif subcmd == "start":
        system = getattr(args, 'system', False)
-        if is_linux():
+        if is_termux():
+            print("Gateway service start is not supported on Termux because there is no system service manager.")
+            print("Run manually: hermes gateway")
+            sys.exit(1)
+        if supports_systemd_services():
            systemd_start(system=system)
        elif is_macos():
            launchd_start()
@@ -2169,7 +2197,7 @@ def gateway_command(args):
        if stop_all:
            # --all: kill every gateway process on the machine
            service_available = False
-            if is_linux() and (get_systemd_unit_path(system=False).exists() or get_systemd_unit_path(system=True).exists()):
+            if supports_systemd_services() and (get_systemd_unit_path(system=False).exists() or get_systemd_unit_path(system=True).exists()):
                try:
                    systemd_stop(system=system)
                    service_available = True
@@ -2190,7 +2218,7 @@ def gateway_command(args):
        else:
            # Default: stop only the current profile's gateway
            service_available = False
-            if is_linux() and (get_systemd_unit_path(system=False).exists() or get_systemd_unit_path(system=True).exists()):
+            if supports_systemd_services() and (get_systemd_unit_path(system=False).exists() or get_systemd_unit_path(system=True).exists()):
                try:
                    systemd_stop(system=system)
                    service_available = True
@@ -2218,7 +2246,7 @@ def gateway_command(args):
        system = getattr(args, 'system', False)
        service_configured = False
        
-        if is_linux() and (get_systemd_unit_path(system=False).exists() or get_systemd_unit_path(system=True).exists()):
+        if supports_systemd_services() and (get_systemd_unit_path(system=False).exists() or get_systemd_unit_path(system=True).exists()):
            service_configured = True
            try:
                systemd_restart(system=system)
@@ -2235,7 +2263,7 @@ def gateway_command(args):
        
        if not service_available:
            # systemd/launchd restart failed — check if linger is the issue
-            if is_linux():
+            if supports_systemd_services():
                linger_ok, _detail = get_systemd_linger_status()
                if linger_ok is not True:
                    import getpass
@@ -2272,7 +2300,7 @@ def gateway_command(args):
        system = getattr(args, 'system', False)
        
        # Check for service first
-        if is_linux() and (get_systemd_unit_path(system=False).exists() or get_systemd_unit_path(system=True).exists()):
+        if supports_systemd_services() and (get_systemd_unit_path(system=False).exists() or get_systemd_unit_path(system=True).exists()):
            systemd_status(deep, system=system)
        elif is_macos() and get_launchd_plist_path().exists():
            launchd_status(deep)
@@ -2289,9 +2317,13 @@ def gateway_command(args):
                    for line in runtime_lines:
                        print(f"  {line}")
                print()
-                print("To install as a service:")
-                print("  hermes gateway install")
-                print("  sudo hermes gateway install --system")
+                if is_termux():
+                    print("Termux note:")
+                    print("  Android may stop background jobs when Termux is suspended")
+                else:
+                    print("To install as a service:")
+                    print("  hermes gateway install")
+                    print("  sudo hermes gateway install --system")
            else:
                print("✗ Gateway is not running")
                runtime_lines = _runtime_health_lines()
@@ -2303,5 +2335,8 @@ def gateway_command(args):
                print()
                print("To start:")
                print("  hermes gateway          # Run in foreground")
-                print("  hermes gateway install  # Install as user service")
-                print("  sudo hermes gateway install --system  # Install as boot-time system service")
+                if is_termux():
+                    print("  nohup hermes gateway > ~/.hermes/logs/gateway.log 2>&1 &  # Best-effort background start")
+                else:
+                    print("  hermes gateway install  # Install as user service")
+                    print("  sudo hermes gateway install --system  # Install as boot-time system service")
@@ -646,6 +646,7 @@ def cmd_chat(args):
        "verbose": args.verbose,
        "quiet": getattr(args, "quiet", False),
        "query": args.query,
+        "image": getattr(args, "image", None),
        "resume": getattr(args, "resume", None),
        "worktree": getattr(args, "worktree", False),
        "checkpoints": getattr(args, "checkpoints", False),
@@ -857,7 +858,6 @@ def cmd_whatsapp(args):

 def cmd_setup(args):
    """Interactive setup wizard."""
-    _require_tty("setup")
    from hermes_cli.setup import run_setup_wizard
    run_setup_wizard(args)

@@ -967,10 +967,11 @@ def select_provider_and_model(args=None):
        ("alibaba", "Alibaba Cloud / DashScope Coding (Qwen + multi-provider)"),
    ]

-    # Add user-defined custom providers from config.yaml
-    custom_providers_cfg = config.get("custom_providers") or []
-    _custom_provider_map = {}  # key → {name, base_url, api_key}
-    if isinstance(custom_providers_cfg, list):
+    def _named_custom_provider_map(cfg) -> dict[str, dict[str, str]]:
+        custom_providers_cfg = cfg.get("custom_providers") or []
+        custom_provider_map = {}
+        if not isinstance(custom_providers_cfg, list):
+            return custom_provider_map
        for entry in custom_providers_cfg:
            if not isinstance(entry, dict):
                continue
@@ -979,16 +980,23 @@ def select_provider_and_model(args=None):
            if not name or not base_url:
                continue
            key = "custom:" + name.lower().replace(" ", "-")
-            short_url = base_url.replace("https://", "").replace("http://", "").rstrip("/")
-            saved_model = entry.get("model", "")
-            model_hint = f" — {saved_model}" if saved_model else ""
-            top_providers.append((key, f"{name} ({short_url}){model_hint}"))
-            _custom_provider_map[key] = {
+            custom_provider_map[key] = {
                "name": name,
                "base_url": base_url,
                "api_key": entry.get("api_key", ""),
-                "model": saved_model,
+                "model": entry.get("model", ""),
            }
+        return custom_provider_map
+
+    # Add user-defined custom providers from config.yaml
+    _custom_provider_map = _named_custom_provider_map(config)  # key → {name, base_url, api_key, model}
+    for key, provider_info in _custom_provider_map.items():
+        name = provider_info["name"]
+        base_url = provider_info["base_url"]
+        short_url = base_url.replace("https://", "").replace("http://", "").rstrip("/")
+        saved_model = provider_info.get("model", "")
+        model_hint = f" — {saved_model}" if saved_model else ""
+        top_providers.append((key, f"{name} ({short_url}){model_hint}"))

    top_keys = {k for k, _ in top_providers}
    extended_keys = {k for k, _ in extended_providers}
@@ -1053,8 +1061,15 @@ def select_provider_and_model(args=None):
        _model_flow_copilot(config, current_model)
    elif selected_provider == "custom":
        _model_flow_custom(config)
-    elif selected_provider.startswith("custom:") and selected_provider in _custom_provider_map:
-        _model_flow_named_custom(config, _custom_provider_map[selected_provider])
+    elif selected_provider.startswith("custom:"):
+        provider_info = _named_custom_provider_map(load_config()).get(selected_provider)
+        if provider_info is None:
+            print(
+                "Warning: the selected saved custom provider is no longer available. "
+                "It may have been removed from config.yaml. No change."
+            )
+            return
+        _model_flow_named_custom(config, provider_info)
    elif selected_provider == "remove-custom":
        _remove_custom_provider(config)
    elif selected_provider == "anthropic":
@@ -1127,10 +1142,10 @@ def _model_flow_openrouter(config, current_model=""):
        print()

    from hermes_cli.models import model_ids, get_pricing_for_provider
-    openrouter_models = model_ids()
+    openrouter_models = model_ids(force_refresh=True)

    # Fetch live pricing (non-blocking — returns empty dict on failure)
-    pricing = get_pricing_for_provider("openrouter")
+    pricing = get_pricing_for_provider("openrouter", force_refresh=True)

    selected = _prompt_model_selection(openrouter_models, current_model=current_model, pricing=pricing)
    if selected:
@@ -1658,7 +1673,7 @@ def _remove_custom_provider(config):
        )
        idx = menu.show()
        print()
-    except (ImportError, NotImplementedError):
+    except (ImportError, NotImplementedError, OSError, subprocess.SubprocessError):
        for i, c in enumerate(choices, 1):
            print(f"  {i}. {c}")
        print()
@@ -1739,7 +1754,7 @@ def _model_flow_named_custom(config, provider_info):
                print("Cancelled.")
                return
            model_name = models[idx]
-        except (ImportError, NotImplementedError):
+        except (ImportError, NotImplementedError, OSError, subprocess.SubprocessError):
            for i, m in enumerate(models, 1):
                print(f"  {i}. {m}")
            print(f"  {len(models) + 1}. Cancel")
@@ -1860,7 +1875,7 @@ def _prompt_reasoning_effort_selection(efforts, current_effort=""):
        if idx == len(ordered):
            return "none"
        return None
-    except (ImportError, NotImplementedError):
+    except (ImportError, NotImplementedError, OSError, subprocess.SubprocessError):
        pass

    print("Select reasoning effort:")
@@ -3021,33 +3036,19 @@ def _restore_stashed_changes(
        print("\nYour stashed changes are preserved — nothing is lost.")
        print(f"  Stash ref: {stash_ref}")

-        # Ask before resetting (if interactive)
-        do_reset = True
-        if prompt_user:
-            print("\nReset working tree to clean state so Hermes can run?")
-            print("  (You can re-apply your changes later with: git stash apply)")
-            print("[Y/n] ", end="", flush=True)
-            response = input().strip().lower()
-            if response not in ("", "y", "yes"):
-                do_reset = False
-
-        if do_reset:
-            subprocess.run(
-                git_cmd + ["reset", "--hard", "HEAD"],
-                cwd=cwd,
-                capture_output=True,
-            )
-            print("Working tree reset to clean state.")
-        else:
-            print("Working tree left as-is (may have conflict markers).")
-            print("Resolve conflicts manually, then run: git stash drop")
-
-        print(f"Restore your changes with: git stash apply {stash_ref}")
-        # In non-interactive mode (gateway /update), don't abort — the code
-        # update itself succeeded, only the stash restore had conflicts.
-        # Aborting would report the entire update as failed.
-        if prompt_user:
-            sys.exit(1)
+        # Always reset to clean state — leaving conflict markers in source
+        # files makes hermes completely unrunnable (SyntaxError on import).
+        # The user's changes are safe in the stash for manual recovery.
+        subprocess.run(
+            git_cmd + ["reset", "--hard", "HEAD"],
+            cwd=cwd,
+            capture_output=True,
+        )
+        print("Working tree reset to clean state.")
+        print(f"Restore your changes later with: git stash apply {stash_ref}")
+        # Don't sys.exit — the code update itself succeeded, only the stash
+        # restore had conflicts.  Let cmd_update continue with pip install,
+        # skill sync, and gateway restart.
        return False

    stash_selector = _resolve_stash_selector(git_cmd, cwd, stash_ref)
@@ -3763,7 +3764,7 @@ def cmd_update(args):
        # running gateway needs restarting to pick up the new code.
        try:
            from hermes_cli.gateway import (
-                is_macos, is_linux, _ensure_user_systemd_env,
+                is_macos, supports_systemd_services, _ensure_user_systemd_env,
                find_gateway_pids,
                _get_service_pids,
            )
@@ -3774,7 +3775,7 @@ def cmd_update(args):

            # --- Systemd services (Linux) ---
            # Discover all hermes-gateway* units (default + profiles)
-            if is_linux():
+            if supports_systemd_services():
                try:
                    _ensure_user_systemd_env()
                except Exception:
@@ -4291,6 +4292,10 @@ For more help on a command:
        "-q", "--query",
        help="Single query (non-interactive mode)"
    )
+    chat_parser.add_argument(
+        "--image",
+        help="Optional local image path to attach to a single query"
+    )
    chat_parser.add_argument(
        "-m", "--model",
        help="Model to use (e.g., anthropic/claude-sonnet-4)"
@@ -4481,12 +4486,12 @@ For more help on a command:
        "setup",
        help="Interactive setup wizard",
        description="Configure Hermes Agent with an interactive wizard. "
-                    "Run a specific section: hermes setup model|terminal|gateway|tools|agent"
+                    "Run a specific section: hermes setup model|tts|terminal|gateway|tools|agent"
    )
    setup_parser.add_argument(
        "section",
        nargs="?",
-        choices=["model", "terminal", "gateway", "tools", "agent"],
+        choices=["model", "tts", "terminal", "gateway", "tools", "agent"],
        default=None,
        help="Run a specific setup section instead of the full wizard"
    )
@@ -24,18 +24,19 @@ COPILOT_REASONING_EFFORTS_O_SERIES = ["low", "medium", "high"]
 GITHUB_MODELS_BASE_URL = COPILOT_BASE_URL
 GITHUB_MODELS_CATALOG_URL = COPILOT_MODELS_URL

+# Fallback OpenRouter snapshot used when the live catalog is unavailable.
 # (model_id, display description shown in menus)
 OPENROUTER_MODELS: list[tuple[str, str]] = [
    ("anthropic/claude-opus-4.6",       "recommended"),
    ("anthropic/claude-sonnet-4.6",     ""),
-    ("qwen/qwen3.6-plus:free", "free"),
+    ("qwen/qwen3.6-plus",               ""),
    ("anthropic/claude-sonnet-4.5",     ""),
    ("anthropic/claude-haiku-4.5",      ""),
    ("openai/gpt-5.4",                  ""),
    ("openai/gpt-5.4-mini",             ""),
    ("xiaomi/mimo-v2-pro",               ""),
    ("openai/gpt-5.3-codex",            ""),
-    ("google/gemini-3-pro-preview",     ""),
+    ("google/gemini-3-pro-image-preview", ""),
    ("google/gemini-3-flash-preview",   ""),
    ("google/gemini-3.1-pro-preview",     ""),
    ("google/gemini-3.1-flash-lite-preview",   ""),
@@ -47,7 +48,7 @@ OPENROUTER_MODELS: list[tuple[str, str]] = [
    ("z-ai/glm-5.1",                    ""),
    ("z-ai/glm-5-turbo",                ""),
    ("moonshotai/kimi-k2.5",            ""),
-    ("x-ai/grok-4.20-beta",             ""),
+    ("x-ai/grok-4.20",                  ""),
    ("nvidia/nemotron-3-super-120b-a12b",      ""),
    ("nvidia/nemotron-3-super-120b-a12b:free", "free"),
    ("arcee-ai/trinity-large-preview:free", "free"),
@@ -56,6 +57,8 @@ OPENROUTER_MODELS: list[tuple[str, str]] = [
    ("openai/gpt-5.4-nano",             ""),
 ]

+_openrouter_catalog_cache: list[tuple[str, str]] | None = None
+
 _PROVIDER_MODELS: dict[str, list[str]] = {
    "nous": [
        "anthropic/claude-opus-4.6",
@@ -530,15 +533,79 @@ _PROVIDER_ALIASES = {
 }


-def model_ids() -> list[str]:
+def _openrouter_model_is_free(pricing: Any) -> bool:
+    """Return True when prompt and completion pricing are both zero (missing keys count as zero)."""
+    if not isinstance(pricing, dict):
+        return False
+    try:
+        return float(pricing.get("prompt", "0")) == 0 and float(pricing.get("completion", "0")) == 0
+    except (TypeError, ValueError):
+        return False
+
+
+def fetch_openrouter_models(
+    timeout: float = 8.0,
+    *,
+    force_refresh: bool = False,
+) -> list[tuple[str, str]]:
+    """Return the curated OpenRouter picker list, refreshed from the live catalog when possible."""
+    global _openrouter_catalog_cache
+
+    if _openrouter_catalog_cache is not None and not force_refresh:
+        return list(_openrouter_catalog_cache)
+
+    fallback = list(OPENROUTER_MODELS)
+    preferred_ids = [mid for mid, _ in fallback]
+
+    try:
+        req = urllib.request.Request(
+            "https://openrouter.ai/api/v1/models",
+            headers={"Accept": "application/json"},
+        )
+        with urllib.request.urlopen(req, timeout=timeout) as resp:
+            payload = json.loads(resp.read().decode())
+    except Exception:
+        return list(_openrouter_catalog_cache or fallback)
+
+    live_items = payload.get("data", [])
+    if not isinstance(live_items, list):
+        return list(_openrouter_catalog_cache or fallback)
+
+    live_by_id: dict[str, dict[str, Any]] = {}
+    for item in live_items:
+        if not isinstance(item, dict):
+            continue
+        mid = str(item.get("id") or "").strip()
+        if not mid:
+            continue
+        live_by_id[mid] = item
+
+    curated: list[tuple[str, str]] = []
+    for preferred_id in preferred_ids:
+        live_item = live_by_id.get(preferred_id)
+        if live_item is None:
+            continue
+        desc = "free" if _openrouter_model_is_free(live_item.get("pricing")) else ""
+        curated.append((preferred_id, desc))
+
+    if not curated:
+        return list(_openrouter_catalog_cache or fallback)
+
+    first_id, _ = curated[0]
+    curated[0] = (first_id, "recommended")
+    _openrouter_catalog_cache = curated
+    return list(curated)
+
+
+def model_ids(*, force_refresh: bool = False) -> list[str]:
    """Return just the OpenRouter model-id strings."""
-    return [mid for mid, _ in OPENROUTER_MODELS]
+    return [mid for mid, _ in fetch_openrouter_models(force_refresh=force_refresh)]


-def menu_labels() -> list[str]:
+def menu_labels(*, force_refresh: bool = False) -> list[str]:
    """Return display labels like 'anthropic/claude-opus-4.6 (recommended)'."""
    labels = []
-    for mid, desc in OPENROUTER_MODELS:
+    for mid, desc in fetch_openrouter_models(force_refresh=force_refresh):
        labels.append(f"{mid} ({desc})" if desc else mid)
    return labels

@@ -727,13 +794,14 @@ def _resolve_nous_pricing_credentials() -> tuple[str, str]:
    return ("", "")


-def get_pricing_for_provider(provider: str) -> dict[str, dict[str, str]]:
+def get_pricing_for_provider(provider: str, *, force_refresh: bool = False) -> dict[str, dict[str, str]]:
    """Return live pricing for providers that support it (openrouter, nous)."""
    normalized = normalize_provider(provider)
    if normalized == "openrouter":
        return fetch_models_with_pricing(
            api_key=_resolve_openrouter_api_key(),
            base_url="https://openrouter.ai/api",
+            force_refresh=force_refresh,
        )
    if normalized == "nous":
        api_key, base_url = _resolve_nous_pricing_credentials()
@@ -746,6 +814,7 @@ def get_pricing_for_provider(provider: str) -> dict[str, dict[str, str]]:
            return fetch_models_with_pricing(
                api_key=api_key,
                base_url=stripped,
+                force_refresh=force_refresh,
            )
    return {}

@@ -854,7 +923,11 @@ def _get_custom_base_url() -> str:
    return ""


-def curated_models_for_provider(provider: Optional[str]) -> list[tuple[str, str]]:
+def curated_models_for_provider(
+    provider: Optional[str],
+    *,
+    force_refresh: bool = False,
+) -> list[tuple[str, str]]:
    """Return ``(model_id, description)`` tuples for a provider's model list.

    Tries to fetch the live model list from the provider's API first,
@@ -863,7 +936,7 @@ def curated_models_for_provider(provider: Optional[str]) -> list[tuple[str, str]
    """
    normalized = normalize_provider(provider)
    if normalized == "openrouter":
-        return list(OPENROUTER_MODELS)
+        return fetch_openrouter_models(force_refresh=force_refresh)

    # Try live API first (Codex, Nous, etc. all support /models)
    live = provider_model_ids(normalized)
@@ -982,12 +1055,12 @@ def _find_openrouter_slug(model_name: str) -> Optional[str]:
        return None

    # Exact match (already has provider/ prefix)
-    for mid, _ in OPENROUTER_MODELS:
+    for mid in model_ids():
        if name_lower == mid.lower():
            return mid

    # Try matching just the model part (after the /)
-    for mid, _ in OPENROUTER_MODELS:
+    for mid in model_ids():
        if "/" in mid:
            _, model_part = mid.split("/", 1)
            if name_lower == model_part.lower():
@@ -1017,6 +1090,79 @@ def provider_label(provider: Optional[str]) -> str:
    return _PROVIDER_LABELS.get(normalized, original or "OpenRouter")


+# Models that support OpenAI Priority Processing (service_tier="priority").
+# See https://openai.com/api-priority-processing/ for the canonical list.
+# Only the bare model slug is stored (no vendor prefix).
+_PRIORITY_PROCESSING_MODELS: frozenset[str] = frozenset({
+    "gpt-5.4",
+    "gpt-5.4-mini",
+    "gpt-5.2",
+    "gpt-5.1",
+    "gpt-5",
+    "gpt-5-mini",
+    "gpt-4.1",
+    "gpt-4.1-mini",
+    "gpt-4.1-nano",
+    "gpt-4o",
+    "gpt-4o-mini",
+    "o3",
+    "o4-mini",
+})
+
+# Models that support Anthropic Fast Mode (speed="fast").
+# See https://platform.claude.com/docs/en/build-with-claude/fast-mode
+# Currently only Claude Opus 4.6.  Both hyphen and dot variants are stored
+# to handle native Anthropic (claude-opus-4-6) and OpenRouter (claude-opus-4.6).
+_ANTHROPIC_FAST_MODE_MODELS: frozenset[str] = frozenset({
+    "claude-opus-4-6",
+    "claude-opus-4.6",
+})
+
+
+def _strip_vendor_prefix(model_id: str) -> str:
+    """Strip vendor/ prefix from a model ID (e.g. 'anthropic/claude-opus-4-6' -> 'claude-opus-4-6')."""
+    raw = str(model_id or "").strip().lower()
+    if "/" in raw:
+        raw = raw.split("/", 1)[1]
+    return raw
+
+
+def model_supports_fast_mode(model_id: Optional[str]) -> bool:
+    """Return whether Hermes should expose the /fast toggle for this model."""
+    raw = _strip_vendor_prefix(str(model_id or ""))
+    if raw in _PRIORITY_PROCESSING_MODELS:
+        return True
+    # Anthropic fast mode: drop OpenRouter variant tags (:fast, :beta) before
+    # matching. Date-suffixed IDs (e.g. claude-opus-4-6-20260401) do not match.
+    base = raw.split(":")[0]
+    return base in _ANTHROPIC_FAST_MODE_MODELS
+
+
+def _is_anthropic_fast_model(model_id: Optional[str]) -> bool:
+    """Return True if the model supports Anthropic's fast mode (speed='fast')."""
+    raw = _strip_vendor_prefix(str(model_id or ""))
+    base = raw.split(":")[0]
+    return base in _ANTHROPIC_FAST_MODE_MODELS
+
+
+def resolve_fast_mode_overrides(model_id: Optional[str]) -> dict[str, Any] | None:
+    """Return request_overrides for fast/priority mode, or None if unsupported.
+
+    Returns provider-appropriate overrides:
+    - OpenAI models: ``{"service_tier": "priority"}`` (Priority Processing)
+    - Anthropic models: ``{"speed": "fast"}`` (Anthropic Fast Mode beta)
+
+    The overrides are injected into the API request kwargs by
+    ``_build_api_kwargs`` in run_agent.py — each API path handles its own
+    keys (service_tier for OpenAI/Codex, speed for Anthropic Messages).
+    """
+    if not model_supports_fast_mode(model_id):
+        return None
+    if _is_anthropic_fast_model(model_id):
+        return {"speed": "fast"}
+    return {"service_tier": "priority"}
+
+
 def _resolve_copilot_catalog_api_key() -> str:
    """Best-effort GitHub token for fetching the Copilot model catalog."""
    try:
@@ -1028,7 +1174,7 @@ def _resolve_copilot_catalog_api_key() -> str:
        return ""


-def provider_model_ids(provider: Optional[str]) -> list[str]:
+def provider_model_ids(provider: Optional[str], *, force_refresh: bool = False) -> list[str]:
    """Return the best known model catalog for a provider.

    Tries live API endpoints for providers that support them (Codex, Nous),
@@ -1036,7 +1182,7 @@ def provider_model_ids(provider: Optional[str]) -> list[str]:
    """
    normalized = normalize_provider(provider)
    if normalized == "openrouter":
-        return model_ids()
+        return model_ids(force_refresh=force_refresh)
    if normalized == "openai-codex":
        from hermes_cli.codex_models import get_codex_model_ids

@@ -16,6 +16,7 @@ from hermes_cli.auth import (
    DEFAULT_CODEX_BASE_URL,
    DEFAULT_QWEN_BASE_URL,
    PROVIDER_REGISTRY,
+    _agent_key_is_usable,
    format_auth_error,
    resolve_provider,
    resolve_nous_runtime_credentials,
@@ -644,6 +645,21 @@ def resolve_runtime_provider(
                getattr(entry, "runtime_api_key", None)
                or getattr(entry, "access_token", "")
            )
+        # For Nous, the pool entry's runtime_api_key is the agent_key — a
+        # short-lived inference credential (~30 min TTL).  The pool doesn't
+        # refresh it during selection (that would trigger network calls in
+        # non-runtime contexts like `hermes auth list`).  If the key is
+        # expired, clear pool_api_key so we fall through to
+        # resolve_nous_runtime_credentials() which handles refresh + mint.
+        if provider == "nous" and entry is not None and pool_api_key:
+            min_ttl = max(60, int(os.getenv("HERMES_NOUS_MIN_KEY_TTL_SECONDS", "1800")))
+            nous_state = {
+                "agent_key": getattr(entry, "agent_key", None),
+                "agent_key_expires_at": getattr(entry, "agent_key_expires_at", None),
+            }
+            if not _agent_key_is_usable(nous_state, min_ttl):
+                logger.debug("Nous pool entry agent_key expired/missing, falling through to runtime resolution")
+                pool_api_key = ""
        if entry is not None and pool_api_key:
            return _resolve_runtime_from_pool_entry(
                provider=provider,
@@ -16,6 +16,7 @@ import logging
 import os
 import shutil
 import sys
+import copy
 from pathlib import Path
 from typing import Optional, Dict, Any

@@ -316,6 +317,7 @@ def _setup_provider_model_selection(config, provider_id, current_model, prompt_c

 # Import config helpers
 from hermes_cli.config import (
+    DEFAULT_CONFIG,
    get_hermes_home,
    get_config_path,
    get_env_path,
@@ -921,8 +923,10 @@ def setup_model_provider(config: dict, *, quick: bool = False):
    # changes with stale values (#4172).
    _refreshed = load_config()
    config["model"] = _refreshed.get("model", config.get("model"))
-    if _refreshed.get("custom_providers"):
+    if "custom_providers" in _refreshed:
        config["custom_providers"] = _refreshed["custom_providers"]
+    else:
+        config.pop("custom_providers", None)

    # Derive the selected provider for downstream steps (vision setup).
    selected_provider = None
@@ -1006,8 +1010,6 @@ def setup_model_provider(config: dict, *, quick: bool = False):
                strategy_value = ["fill_first", "round_robin", "random"][strategy_idx]
                _set_credential_pool_strategy(config, selected_provider, strategy_value)
                print_success(f"Saved {selected_provider} rotation strategy: {strategy_value}")
-            else:
-                _set_credential_pool_strategy(config, selected_provider, "fill_first")
        except Exception as exc:
            logger.debug("Could not configure same-provider fallback in setup: %s", exc)

@@ -2844,6 +2846,7 @@ def run_setup_wizard(args):
    Supports full, quick, and section-specific setup:
      hermes setup           — full or quick (auto-detected)
      hermes setup model     — just model/provider
+      hermes setup tts       — just text-to-speech
      hermes setup terminal  — just terminal backend
      hermes setup gateway   — just messaging platforms
      hermes setup tools     — just tool configuration
@@ -2855,6 +2858,11 @@ def run_setup_wizard(args):
        return
    ensure_hermes_home()

+    reset_requested = bool(getattr(args, "reset", False))
+    if reset_requested:
+        save_config(copy.deepcopy(DEFAULT_CONFIG))
+        print_success("Configuration reset to defaults.")
+
    config = load_config()
    hermes_home = get_hermes_home()

@@ -2955,18 +2963,13 @@ def run_setup_wizard(args):
        menu_choices = [
            "Quick Setup - configure missing items only",
            "Full Setup - reconfigure everything",
-            "---",
            "Model & Provider",
            "Terminal Backend",
            "Messaging Platforms (Gateway)",
            "Tools",
            "Agent Settings",
-            "---",
            "Exit",
        ]
-
-        # Separator indices (not selectable, but prompt_choice doesn't filter them,
-        # so we handle them below)
        choice = prompt_choice("What would you like to do?", menu_choices, 0)

        if choice == 0:
@@ -2976,18 +2979,14 @@ def run_setup_wizard(args):
        elif choice == 1:
            # Full setup — fall through to run all sections
            pass
-        elif choice in (2, 8):
-            # Separator — treat as exit
+        elif choice == 7:
            print_info("Exiting. Run 'hermes setup' again when ready.")
            return
-        elif choice == 9:
-            print_info("Exiting. Run 'hermes setup' again when ready.")
-            return
-        elif 3 <= choice <= 7:
+        elif 2 <= choice <= 6:
            # Individual section — map by key, not by position.
            # SETUP_SECTIONS includes TTS but the returning-user menu skips it,
-            # so positional indexing (choice - 3) would dispatch the wrong section.
-            section_key = RETURNING_USER_MENU_SECTION_KEYS[choice - 3]
+            # so positional indexing into SETUP_SECTIONS (choice - 2) would dispatch the wrong section.
+            section_key = RETURNING_USER_MENU_SECTION_KEYS[choice - 2]
            section = next((s for s in SETUP_SECTIONS if s[0] == section_key), None)
            if section:
                _, label, func = section
@@ -79,6 +79,9 @@ def _effective_provider_label() -> str:
    return provider_label(effective)


+from hermes_constants import is_termux as _is_termux
+
+
 def show_status(args):
    """Show status of all Hermes Agent components."""
    show_all = getattr(args, 'all', False)
@@ -325,7 +328,25 @@ def show_status(args):
    print()
    print(color("◆ Gateway Service", Colors.CYAN, Colors.BOLD))
    
-    if sys.platform.startswith('linux'):
+    if _is_termux():
+        try:
+            from hermes_cli.gateway import find_gateway_pids
+            gateway_pids = find_gateway_pids()
+        except Exception:
+            gateway_pids = []
+        is_running = bool(gateway_pids)
+        print(f"  Status:       {check_mark(is_running)} {'running' if is_running else 'stopped'}")
+        print("  Manager:      Termux / manual process")
+        if gateway_pids:
+            rendered = ", ".join(str(pid) for pid in gateway_pids[:3])
+            if len(gateway_pids) > 3:
+                rendered += ", ..."
+            print(f"  PID(s):       {rendered}")
+        else:
+            print("  Start with:   hermes gateway")
+            print("  Note:         Android may stop background jobs when Termux is suspended")
+
+    elif sys.platform.startswith('linux'):
        try:
            from hermes_cli.gateway import get_service_name
            _gw_svc = get_service_name()
@@ -339,7 +360,7 @@ def show_status(args):
                timeout=5
            )
            is_active = result.stdout.strip() == "active"
-        except subprocess.TimeoutExpired:
+        except (FileNotFoundError, subprocess.TimeoutExpired):
            is_active = False
        print(f"  Status:       {check_mark(is_active)} {'running' if is_active else 'stopped'}")
        print("  Manager:      systemd (user)")
@@ -6,6 +6,8 @@ Provides options for:
 - Keep data: Remove code but keep ~/.hermes/ (configs, sessions, logs)
 """

+import os
+import platform
 import shutil
 import subprocess
 from pathlib import Path
@@ -122,6 +124,10 @@ def uninstall_gateway_service():
    
    if platform.system() != "Linux":
        return False
+
+    prefix = os.getenv("PREFIX", "")
+    if os.getenv("TERMUX_VERSION") or "com.termux/files/usr" in prefix:
+        return False
    
    try:
        from hermes_cli.gateway import get_service_name
@@ -93,6 +93,16 @@ def parse_reasoning_effort(effort: str) -> dict | None:
    return None


+def is_termux() -> bool:
+    """Return True when running inside a Termux (Android) environment.
+
+    Checks ``TERMUX_VERSION`` (set by Termux) or the Termux-specific
+    ``PREFIX`` path.  Import-safe — no heavy deps.
+    """
+    prefix = os.getenv("PREFIX", "")
+    return bool(os.getenv("TERMUX_VERSION") or "com.termux/files/usr" in prefix)
+
+
 OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"
 OPENROUTER_MODELS_URL = f"{OPENROUTER_BASE_URL}/models"
 OPENROUTER_CHAT_URL = f"{OPENROUTER_BASE_URL}/chat/completions"
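
The same detection rule can be exercised without touching the real process environment. The sketch below mirrors the logic of `is_termux()` above, but the optional `env` parameter is an assumption added for illustration and is not part of the function in this diff.

```python
import os
from typing import Mapping, Optional


def is_termux(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the environment looks like Termux on Android.

    ``env`` is a hypothetical injection point for tests; when omitted,
    the real process environment is consulted.
    """
    env = os.environ if env is None else env
    prefix = env.get("PREFIX", "")
    return bool(env.get("TERMUX_VERSION") or "com.termux/files/usr" in prefix)
```

Either signal is sufficient on its own, so a shell that has lost `TERMUX_VERSION` but still carries the Termux `PREFIX` is still detected.
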
@@ -63,6 +63,17 @@ homeassistant = ["aiohttp>=3.9.0,<4"]
 sms = ["aiohttp>=3.9.0,<4"]
 acp = ["agent-client-protocol>=0.9.0,<1.0"]
 mistral = ["mistralai>=2.3.0,<3"]
+termux = [
+  # Tested Android / Termux path: keeps the core CLI feature-rich while
+  # avoiding extras that currently depend on non-Android wheels (notably
+  # faster-whisper -> ctranslate2 via the voice extra).
+  "hermes-agent[cron]",
+  "hermes-agent[cli]",
+  "hermes-agent[pty]",
+  "hermes-agent[mcp]",
+  "hermes-agent[honcho]",
+  "hermes-agent[acp]",
+]
 dingtalk = ["dingtalk-stream>=0.1.0,<1"]
 feishu = ["lark-oapi>=1.5.3,<2"]
 rl = [
@@ -500,6 +500,8 @@ class AIAgent:
        status_callback: callable = None,
        max_tokens: int = None,
        reasoning_config: Dict[str, Any] = None,
+        service_tier: str = None,
+        request_overrides: Dict[str, Any] = None,
        prefill_messages: List[Dict[str, Any]] = None,
        platform: str = None,
        user_id: str = None,
@@ -622,6 +624,7 @@ class AIAgent:
        self.tool_progress_callback = tool_progress_callback
        self.tool_start_callback = tool_start_callback
        self.tool_complete_callback = tool_complete_callback
+        self.suppress_status_output = False
        self.thinking_callback = thinking_callback
        self.reasoning_callback = reasoning_callback
        self._reasoning_deltas_fired = False  # Set by _fire_reasoning_delta, reset per API call
@@ -661,6 +664,8 @@ class AIAgent:
        # Model response configuration
        self.max_tokens = max_tokens  # None = use model default
        self.reasoning_config = reasoning_config  # None = use default (medium for OpenRouter)
+        self.service_tier = service_tier
+        self.request_overrides = dict(request_overrides or {})
        self.prefill_messages = prefill_messages or []  # Prefilled conversation turns
        
        # Anthropic prompt caching: auto-enabled for Claude models via OpenRouter.
@@ -789,7 +794,7 @@ class AIAgent:
                    client_kwargs["default_headers"] = copilot_default_headers()
                elif "api.kimi.com" in effective_base.lower():
                    client_kwargs["default_headers"] = {
-                        "User-Agent": "KimiCLI/1.3",
+                        "User-Agent": "KimiCLI/1.30.0",
                    }
                elif "portal.qwen.ai" in effective_base.lower():
                    client_kwargs["default_headers"] = _qwen_portal_headers()
@@ -1460,7 +1465,14 @@ class AIAgent:
        After the main response has been delivered and the remaining tool
        calls are post-response housekeeping (``_mute_post_response``),
        all non-forced output is suppressed.
+
+        ``suppress_status_output`` is a stricter CLI automation mode used by
+        parseable single-query flows such as ``hermes chat -q``. In that mode,
+        all status/diagnostic prints routed through ``_vprint`` are suppressed
+        so stdout stays machine-readable.
        """
+        if getattr(self, "suppress_status_output", False):
+            return
        if not force and getattr(self, "_mute_post_response", False):
            return
        if not force and self._has_stream_consumers() and not self._executing_tools:
@@ -1486,6 +1498,17 @@ class AIAgent:
        except (AttributeError, ValueError, OSError):
            return False

+    def _should_emit_quiet_tool_messages(self) -> bool:
+        """Return True when quiet-mode tool summaries should print directly.
+
+        When the caller provides ``tool_progress_callback`` (for example the CLI
+        TUI or a gateway progress renderer), that callback owns progress display.
+        Emitting quiet-mode summary lines here duplicates progress and leaks tool
+        previews into flows that are expected to stay silent, such as
+        ``hermes chat -q``.
+        """
+        return self.quiet_mode and not self.tool_progress_callback
+
    def _emit_status(self, message: str) -> None:
        """Emit a lifecycle status message to both CLI and gateway channels.

@@ -3324,7 +3347,7 @@ class AIAgent:
        allowed_keys = {
            "model", "instructions", "input", "tools", "store",
            "reasoning", "include", "max_output_tokens", "temperature",
-            "tool_choice", "parallel_tool_calls", "prompt_cache_key",
+            "tool_choice", "parallel_tool_calls", "prompt_cache_key", "service_tier",
        }
        normalized: Dict[str, Any] = {
            "model": model,
@@ -3342,6 +3365,9 @@ class AIAgent:
        include = api_kwargs.get("include")
        if isinstance(include, list):
            normalized["include"] = include
+        service_tier = api_kwargs.get("service_tier")
+        if isinstance(service_tier, str) and service_tier.strip():
+            normalized["service_tier"] = service_tier.strip()

        # Pass through max_output_tokens and temperature
        max_output_tokens = api_kwargs.get("max_output_tokens")
@@ -4155,7 +4181,7 @@ class AIAgent:

            self._client_kwargs["default_headers"] = copilot_default_headers()
        elif "api.kimi.com" in normalized:
-            self._client_kwargs["default_headers"] = {"User-Agent": "KimiCLI/1.3"}
+            self._client_kwargs["default_headers"] = {"User-Agent": "KimiCLI/1.30.0"}
        elif "portal.qwen.ai" in normalized:
            self._client_kwargs["default_headers"] = _qwen_portal_headers()
        else:
@@ -4407,7 +4433,17 @@ class AIAgent:
            """Stream a chat completions response."""
            import httpx as _httpx
            _base_timeout = float(os.getenv("HERMES_API_TIMEOUT", 1800.0))
-            _stream_read_timeout = float(os.getenv("HERMES_STREAM_READ_TIMEOUT", 60.0))
+            _stream_read_timeout = float(os.getenv("HERMES_STREAM_READ_TIMEOUT", 120.0))
+            # Local providers (Ollama, llama.cpp, vLLM) can take minutes for
+            # prefill on large contexts before producing the first token.
+            # Auto-increase the httpx read timeout unless the user explicitly
+            # overrode HERMES_STREAM_READ_TIMEOUT.
+            if _stream_read_timeout == 120.0 and self.base_url and is_local_endpoint(self.base_url):
+                _stream_read_timeout = _base_timeout
+                logger.debug(
+                    "Local provider detected (%s) — stream read timeout raised to %.0fs",
+                    self.base_url, _stream_read_timeout,
+                )
            stream_kwargs = {
                **api_kwargs,
                "stream": True,
@@ -4565,20 +4601,31 @@ class AIAgent:
            # Build mock response matching non-streaming shape
            full_content = "".join(content_parts) or None
            mock_tool_calls = None
+            has_truncated_tool_args = False
            if tool_calls_acc:
                mock_tool_calls = []
                for idx in sorted(tool_calls_acc):
                    tc = tool_calls_acc[idx]
+                    arguments = tc["function"]["arguments"]
+                    if arguments and arguments.strip():
+                        try:
+                            json.loads(arguments)
+                        except json.JSONDecodeError:
+                            has_truncated_tool_args = True
                    mock_tool_calls.append(SimpleNamespace(
                        id=tc["id"],
                        type=tc["type"],
                        extra_content=tc.get("extra_content"),
                        function=SimpleNamespace(
                            name=tc["function"]["name"],
-                            arguments=tc["function"]["arguments"],
+                            arguments=arguments,
                        ),
                    ))

+            effective_finish_reason = finish_reason or "stop"
+            if has_truncated_tool_args:
+                effective_finish_reason = "length"
+
            full_reasoning = "".join(reasoning_parts) or None
            mock_message = SimpleNamespace(
                role=role,
@@ -4589,7 +4636,7 @@ class AIAgent:
            mock_choice = SimpleNamespace(
                index=0,
                message=mock_message,
-                finish_reason=finish_reason or "stop",
+                finish_reason=effective_finish_reason,
            )
            return SimpleNamespace(
                id="stream-" + str(uuid.uuid4()),
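
The truncation check in this hunk reduces to "non-empty tool arguments that fail to parse as JSON imply the stream was cut off". A minimal standalone sketch follows; the two helper names are hypothetical and only restate the logic shown in the diff.

```python
import json
from typing import Iterable, Optional


def has_truncated_tool_args(argument_strings: Iterable[str]) -> bool:
    """Return True if any non-empty argument string is not valid JSON."""
    for arguments in argument_strings:
        if arguments and arguments.strip():
            try:
                json.loads(arguments)
            except json.JSONDecodeError:
                return True
    return False


def effective_finish_reason(
    finish_reason: Optional[str], argument_strings: Iterable[str]
) -> str:
    """Report "length" when tool arguments are incomplete."""
    if has_truncated_tool_args(argument_strings):
        return "length"
    return finish_reason or "stop"
```

Empty argument strings are deliberately not flagged, matching the `if arguments and arguments.strip()` guard in the diff.
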
@@ -5419,6 +5466,7 @@ class AIAgent:
                preserve_dots=self._anthropic_preserve_dots(),
                context_length=ctx_len,
                base_url=getattr(self, "_anthropic_base_url", None),
+                fast_mode=self.request_overrides.get("speed") == "fast",
            )

        if self.api_mode == "codex_responses":
@@ -5434,6 +5482,10 @@ class AIAgent:
                "models.github.ai" in self.base_url.lower()
                or "api.githubcopilot.com" in self.base_url.lower()
            )
+            is_codex_backend = (
+                self.provider == "openai-codex"
+                or "chatgpt.com/backend-api/codex" in self.base_url.lower()
+            )

            # Resolve reasoning effort: config > default (medium)
            reasoning_effort = "medium"
@@ -5471,7 +5523,10 @@ class AIAgent:
            elif not is_github_responses:
                kwargs["include"] = []

-            if self.max_tokens is not None:
+            if self.request_overrides:
+                kwargs.update(self.request_overrides)
+
+            if self.max_tokens is not None and not is_codex_backend:
                kwargs["max_output_tokens"] = self.max_tokens

            return kwargs
@@ -5566,20 +5621,20 @@ class AIAgent:
        if self.max_tokens is not None:
            if not self._is_qwen_portal():
                api_kwargs.update(self._max_tokens_param(self.max_tokens))
-        elif self._is_openrouter_url() and "claude" in (self.model or "").lower():
-            # OpenRouter translates requests to Anthropic's Messages API,
-            # which requires max_tokens as a mandatory field.  When we omit
-            # it, OpenRouter picks a default that can be too low — the model
-            # spends its output budget on thinking and has almost nothing
-            # left for the actual response (especially large tool calls like
-            # write_file).  Sending the model's real output limit ensures
-            # full capacity.  Other providers handle the default fine.
+        elif (self._is_openrouter_url() or "nousresearch" in self._base_url_lower) and "claude" in (self.model or "").lower():
+            # OpenRouter and Nous Portal translate requests to Anthropic's
+            # Messages API, which requires max_tokens as a mandatory field.
+            # When we omit it, the proxy picks a default that can be too
+            # low — the model spends its output budget on thinking and has
+            # almost nothing left for the actual response (especially large
+            # tool calls like write_file).  Sending the model's real output
+            # limit ensures full capacity.
            try:
                from agent.anthropic_adapter import _get_anthropic_max_output
                _model_output_limit = _get_anthropic_max_output(self.model)
                api_kwargs["max_tokens"] = _model_output_limit
            except Exception:
-                pass  # fail open — let OpenRouter pick its default
+                pass  # fail open — let the proxy pick its default

        extra_body = {}

@@ -5642,6 +5697,11 @@ class AIAgent:
        if "x.ai" in self._base_url_lower and hasattr(self, "session_id") and self.session_id:
            api_kwargs["extra_headers"] = {"x-grok-conv-id": self.session_id}

+        # Priority Processing / generic request overrides (e.g. service_tier).
+        # Applied last so overrides win over any defaults set above.
+        if self.request_overrides:
+            api_kwargs.update(self.request_overrides)
+
        return api_kwargs

    def _supports_reasoning_extra_body(self) -> bool:
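
The "applied last so overrides win" rule is plain dictionary update order. The sketch below uses a hypothetical helper, `build_api_kwargs`, and illustrative values to show the effect; it is not the method from this diff.

```python
from typing import Any, Dict, Optional


def build_api_kwargs(
    defaults: Dict[str, Any], request_overrides: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Merge request overrides over defaults without mutating either input."""
    api_kwargs = dict(defaults)
    if request_overrides:
        # Applied last, so an override replaces any default with the same key.
        api_kwargs.update(request_overrides)
    return api_kwargs
```

Because the merge works on a copy, the caller's defaults stay intact across calls, in the same spirit as the `dict(request_overrides or {})` copy made in `__init__`.
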
@@ -6347,7 +6407,7 @@ class AIAgent:

        # Start spinner for CLI mode (skip when TUI handles tool progress)
        spinner = None
-        if self.quiet_mode and not self.tool_progress_callback and self._should_start_quiet_spinner():
+        if self._should_emit_quiet_tool_messages() and self._should_start_quiet_spinner():
            face = random.choice(KawaiiSpinner.KAWAII_WAITING)
            spinner = KawaiiSpinner(f"{face} ⚡ running {num_tools} tools concurrently", spinner_type='dots', print_fn=self._print_fn)
            spinner.start()
@@ -6397,7 +6457,7 @@ class AIAgent:
                    logging.debug(f"Tool result ({len(function_result)} chars): {function_result}")

            # Print cute message per tool
-            if self.quiet_mode:
+            if self._should_emit_quiet_tool_messages():
                cute_msg = _get_cute_tool_message_impl(name, args, tool_duration, result=function_result)
                self._safe_print(f"  {cute_msg}")
            elif not self.quiet_mode:
@@ -6554,7 +6614,7 @@ class AIAgent:
                    store=self._todo_store,
                )
                tool_duration = time.time() - tool_start_time
-                if self.quiet_mode:
+                if self._should_emit_quiet_tool_messages():
                    self._vprint(f"  {_get_cute_tool_message_impl('todo', function_args, tool_duration, result=function_result)}")
            elif function_name == "session_search":
                if not self._session_db:
@@ -6569,7 +6629,7 @@ class AIAgent:
                        current_session_id=self.session_id,
                    )
                tool_duration = time.time() - tool_start_time
-                if self.quiet_mode:
+                if self._should_emit_quiet_tool_messages():
                    self._vprint(f"  {_get_cute_tool_message_impl('session_search', function_args, tool_duration, result=function_result)}")
            elif function_name == "memory":
                target = function_args.get("target", "memory")
@@ -6582,7 +6642,7 @@ class AIAgent:
                    store=self._memory_store,
                )
                tool_duration = time.time() - tool_start_time
-                if self.quiet_mode:
+                if self._should_emit_quiet_tool_messages():
                    self._vprint(f"  {_get_cute_tool_message_impl('memory', function_args, tool_duration, result=function_result)}")
            elif function_name == "clarify":
                from tools.clarify_tool import clarify_tool as _clarify_tool
@@ -6592,7 +6652,7 @@ class AIAgent:
                    callback=self.clarify_callback,
                )
                tool_duration = time.time() - tool_start_time
-                if self.quiet_mode:
+                if self._should_emit_quiet_tool_messages():
                    self._vprint(f"  {_get_cute_tool_message_impl('clarify', function_args, tool_duration, result=function_result)}")
            elif function_name == "delegate_task":
                from tools.delegate_tool import delegate_task as _delegate_task
@@ -6603,7 +6663,7 @@ class AIAgent:
                    goal_preview = (function_args.get("goal") or "")[:30]
                    spinner_label = f"🔀 {goal_preview}" if goal_preview else "🔀 delegating"
                spinner = None
-                if self.quiet_mode and not self.tool_progress_callback and self._should_start_quiet_spinner():
+                if self._should_emit_quiet_tool_messages() and self._should_start_quiet_spinner():
                    face = random.choice(KawaiiSpinner.KAWAII_WAITING)
                    spinner = KawaiiSpinner(f"{face} {spinner_label}", spinner_type='dots', print_fn=self._print_fn)
                    spinner.start()
@@ -6625,13 +6685,13 @@ class AIAgent:
                    cute_msg = _get_cute_tool_message_impl('delegate_task', function_args, tool_duration, result=_delegate_result)
                    if spinner:
                        spinner.stop(cute_msg)
-                    elif self.quiet_mode:
+                    elif self._should_emit_quiet_tool_messages():
                        self._vprint(f"  {cute_msg}")
            elif self._memory_manager and self._memory_manager.has_tool(function_name):
                # Memory provider tools (hindsight_retain, honcho_search, etc.)
                # These are not in the tool registry — route through MemoryManager.
                spinner = None
-                if self.quiet_mode and not self.tool_progress_callback:
+                if self._should_emit_quiet_tool_messages() and self._should_start_quiet_spinner():
                    face = random.choice(KawaiiSpinner.KAWAII_WAITING)
                    emoji = _get_tool_emoji(function_name)
                    preview = _build_tool_preview(function_name, function_args) or function_name
@@ -6649,11 +6709,11 @@ class AIAgent:
                    cute_msg = _get_cute_tool_message_impl(function_name, function_args, tool_duration, result=_mem_result)
                    if spinner:
                        spinner.stop(cute_msg)
-                    elif self.quiet_mode:
+                    elif self._should_emit_quiet_tool_messages():
                        self._vprint(f"  {cute_msg}")
            elif self.quiet_mode:
                spinner = None
-                if not self.tool_progress_callback:
+                if self._should_emit_quiet_tool_messages() and self._should_start_quiet_spinner():
                    face = random.choice(KawaiiSpinner.KAWAII_WAITING)
                    emoji = _get_tool_emoji(function_name)
                    preview = _build_tool_preview(function_name, function_args) or function_name
@@ -6676,7 +6736,7 @@ class AIAgent:
                    cute_msg = _get_cute_tool_message_impl(function_name, function_args, tool_duration, result=_spinner_result)
                    if spinner:
                        spinner.stop(cute_msg)
-                    else:
+                    elif self._should_emit_quiet_tool_messages():
                        self._vprint(f"  {cute_msg}")
            else:
                try:
@@ -7300,6 +7360,7 @@ class AIAgent:
        interrupted = False
        codex_ack_continuations = 0
        length_continue_retries = 0
+        truncated_tool_call_retries = 0
        truncated_response_prefix = ""
        compression_attempts = 0
        _turn_exit_reason = "unknown"  # Diagnostic: why the loop ended
@@ -7768,9 +7829,11 @@ class AIAgent:
                        # retries are pointless.  Detect this early and give a
                        # targeted error instead of wasting 3 API calls.
                        _trunc_content = None
+                        _trunc_has_tool_calls = False
                        if self.api_mode == "chat_completions":
                            _trunc_msg = response.choices[0].message if (hasattr(response, "choices") and response.choices) else None
                            _trunc_content = getattr(_trunc_msg, "content", None) if _trunc_msg else None
+                            _trunc_has_tool_calls = bool(getattr(_trunc_msg, "tool_calls", None)) if _trunc_msg else False
                        elif self.api_mode == "anthropic_messages":
                            # Anthropic response.content is a list of blocks
                            _text_parts = []
@@ -7780,9 +7843,11 @@ class AIAgent:
                            _trunc_content = "\n".join(_text_parts) if _text_parts else None

                        _thinking_exhausted = (
-                            _trunc_content is not None
-                            and not self._has_content_after_think_block(_trunc_content)
-                        ) or _trunc_content is None
+                            not _trunc_has_tool_calls and (
+                                (_trunc_content is not None and not self._has_content_after_think_block(_trunc_content))
+                                or _trunc_content is None
+                            )
+                        )

                        if _thinking_exhausted:
                            _exhaust_error = (
@@ -7858,6 +7923,34 @@ class AIAgent:
                                    "error": "Response remained truncated after 3 continuation attempts",
                                }

+                        if self.api_mode == "chat_completions":
+                            assistant_message = response.choices[0].message
+                            if assistant_message.tool_calls:
+                                if truncated_tool_call_retries < 1:
+                                    truncated_tool_call_retries += 1
+                                    self._vprint(
+                                        f"{self.log_prefix}⚠️  Truncated tool call detected — retrying API call...",
+                                        force=True,
+                                    )
+                                    # Don't append the broken response to messages;
+                                    # just re-run the same API call from the current
+                                    # message state, giving the model another chance.
+                                    continue
+                                self._vprint(
+                                    f"{self.log_prefix}⚠️  Truncated tool call response detected again — refusing to execute incomplete tool arguments.",
+                                    force=True,
+                                )
+                                self._cleanup_task_resources(effective_task_id)
+                                self._persist_session(messages, conversation_history)
+                                return {
+                                    "final_response": None,
+                                    "messages": messages,
+                                    "api_calls": api_call_count,
+                                    "completed": False,
+                                    "partial": True,
+                                    "error": "Response truncated due to output length limit",
+                                }
+
                        # If we have prior messages, roll back to last complete state
                        if len(messages) > 1:
                            self._vprint(f"{self.log_prefix}   ⏪ Rolling back to last complete assistant turn")
@@ -8170,7 +8263,33 @@ class AIAgent:
                        if _err_body_str:
                            self._vprint(f"{self.log_prefix}   📋 Details: {_err_body_str}", force=True)
                    self._vprint(f"{self.log_prefix}   ⏱️  Elapsed: {elapsed_time:.2f}s  Context: {len(api_messages)} msgs, ~{approx_tokens:,} tokens")
-                    
+
+                    # Actionable hint for OpenRouter "no tool endpoints" error.
+                    # This fires regardless of whether fallback succeeds — the
+                    # user needs to know WHY their model failed so they can fix
+                    # their provider routing, not just silently fall back.
+                    if (
+                        self._is_openrouter_url()
+                        and "support tool use" in error_msg
+                    ):
+                        self._vprint(
+                            f"{self.log_prefix}   💡 No OpenRouter providers for {_model} support tool calling with your current settings.",
+                            force=True,
+                        )
+                        if self.providers_allowed:
+                            self._vprint(
+                                f"{self.log_prefix}      Your provider_routing.only restriction is filtering out tool-capable providers.",
+                                force=True,
+                            )
+                            self._vprint(
+                                f"{self.log_prefix}      Try removing the restriction or adding providers that support tools for this model.",
+                                force=True,
+                            )
+                        self._vprint(
+                            f"{self.log_prefix}      Check which providers support tools: https://openrouter.ai/models/{_model}",
+                            force=True,
+                        )
+
                    # Check for interrupt before deciding to retry
                    if self._interrupt_requested:
                        self._vprint(f"{self.log_prefix}⚡ Interrupt detected during error handling, aborting retries.", force=True)
@@ -8226,6 +8345,10 @@ class AIAgent:
                                approx_tokens=approx_tokens,
                                task_id=effective_task_id,
                            )
+                            # Compression created a new session. Clear history so
+                            # _flush_messages_to_session_db writes the compressed
+                            # messages to the new session instead of skipping them.
+                            conversation_history = None
                            if len(messages) < original_len or old_ctx > _reduced_ctx:
                                self._emit_status(
                                    f"🗜️ Context reduced to {_reduced_ctx:,} tokens "
@@ -8283,6 +8406,10 @@ class AIAgent:
                            messages, system_message, approx_tokens=approx_tokens,
                            task_id=effective_task_id,
                        )
+                        # Compression created a new session. Clear history so
+                        # _flush_messages_to_session_db writes the compressed
+                        # messages to the new session instead of skipping them.
+                        conversation_history = None

                        if len(messages) < original_len:
                            self._emit_status(f"🗜️ Compressed {original_len} → {len(messages)} messages, retrying...")
@@ -8401,6 +8528,10 @@ class AIAgent:
                            messages, system_message, approx_tokens=approx_tokens,
                            task_id=effective_task_id,
                        )
+                        # Compression created a new session. Clear history so
+                        # _flush_messages_to_session_db writes the compressed
+                        # messages to the new session instead of skipping them.
+                        conversation_history = None

                        if len(messages) < original_len or new_ctx and new_ctx < old_ctx:
                            if len(messages) < original_len:
@@ -9008,6 +9139,11 @@ class AIAgent:

                    self._execute_tool_calls(assistant_message, messages, effective_task_id, api_call_count)

+                    # Reset per-turn retry counters after successful tool
+                    # execution so a single truncation doesn't poison the
+                    # entire conversation.
+                    truncated_tool_call_retries = 0
+
                    # Signal that a paragraph break is needed before the next
                    # streamed text.  We don't emit it immediately because
                    # multiple consecutive tool iterations would stack up
@@ -2,8 +2,8 @@
 # ============================================================================
 # Hermes Agent Installer
 # ============================================================================
-# Installation script for Linux and macOS.
-# Uses uv for fast Python provisioning and package management.
+# Installation script for Linux, macOS, and Android/Termux.
+# Uses uv for desktop/server installs and Python's stdlib venv + pip on Termux.
 #
 # Usage:
 #   curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
@@ -117,6 +117,36 @@ log_error() {
    echo -e "${RED}✗${NC} $1"
 }

+is_termux() {
+    [ -n "${TERMUX_VERSION:-}" ] || [[ "${PREFIX:-}" == *"com.termux/files/usr"* ]]
+}
+
+get_command_link_dir() {
+    if is_termux && [ -n "${PREFIX:-}" ]; then
+        echo "$PREFIX/bin"
+    else
+        echo "$HOME/.local/bin"
+    fi
+}
+
+get_command_link_display_dir() {
+    if is_termux && [ -n "${PREFIX:-}" ]; then
+        echo '$PREFIX/bin'
+    else
+        echo '~/.local/bin'
+    fi
+}
+
+get_hermes_command_path() {
+    local link_dir
+    link_dir="$(get_command_link_dir)"
+    if [ -x "$link_dir/hermes" ]; then
+        echo "$link_dir/hermes"
+    else
+        echo "hermes"
+    fi
+}
+
 # ============================================================================
 # System detection
 # ============================================================================
@@ -124,12 +154,17 @@ log_error() {
 detect_os() {
    case "$(uname -s)" in
        Linux*)
-            OS="linux"
-            if [ -f /etc/os-release ]; then
-                . /etc/os-release
-                DISTRO="$ID"
+            if is_termux; then
+                OS="android"
+                DISTRO="termux"
            else
-                DISTRO="unknown"
+                OS="linux"
+                if [ -f /etc/os-release ]; then
+                    . /etc/os-release
+                    DISTRO="$ID"
+                else
+                    DISTRO="unknown"
+                fi
            fi
            ;;
        Darwin*)
@@ -158,6 +193,12 @@ detect_os() {
 # ============================================================================

 install_uv() {
+    if [ "$DISTRO" = "termux" ]; then
+        log_info "Termux detected — using Python's stdlib venv + pip instead of uv"
+        UV_CMD=""
+        return 0
+    fi
+
    log_info "Checking for uv package manager..."

    # Check common locations for uv
@@ -209,6 +250,25 @@ install_uv() {
 }

 check_python() {
+    if [ "$DISTRO" = "termux" ]; then
+        log_info "Checking Termux Python..."
+        if command -v python >/dev/null 2>&1; then
+            PYTHON_PATH="$(command -v python)"
+            if "$PYTHON_PATH" -c 'import sys; raise SystemExit(0 if sys.version_info >= (3, 11) else 1)' 2>/dev/null; then
+                PYTHON_FOUND_VERSION=$("$PYTHON_PATH" --version 2>/dev/null)
+                log_success "Python found: $PYTHON_FOUND_VERSION"
+                return 0
+            fi
+        fi
+
+        log_info "Installing Python via pkg..."
+        pkg install -y python >/dev/null
+        PYTHON_PATH="$(command -v python)"
+        PYTHON_FOUND_VERSION=$("$PYTHON_PATH" --version 2>/dev/null)
+        log_success "Python installed: $PYTHON_FOUND_VERSION"
+        return 0
+    fi
+
    log_info "Checking Python $PYTHON_VERSION..."

    # Let uv handle Python — it can download and manage Python versions
@@ -243,6 +303,17 @@ check_git() {
    fi

    log_error "Git not found"
+
+    if [ "$DISTRO" = "termux" ]; then
+        log_info "Installing Git via pkg..."
+        pkg install -y git >/dev/null
+        if command -v git >/dev/null 2>&1; then
+            GIT_VERSION=$(git --version | awk '{print $3}')
+            log_success "Git $GIT_VERSION installed"
+            return 0
+        fi
+    fi
+
    log_info "Please install Git:"

    case "$OS" in
@@ -262,6 +333,9 @@ check_git() {
                    ;;
            esac
            ;;
+        android)
+            log_info "  pkg install git"
+            ;;
        macos)
            log_info "  xcode-select --install"
            log_info "  Or: brew install git"
@@ -290,11 +364,29 @@ check_node() {
        return 0
    fi

-    log_info "Node.js not found — installing Node.js $NODE_VERSION LTS..."
+    if [ "$DISTRO" = "termux" ]; then
+        log_info "Node.js not found — installing Node.js via pkg..."
+    else
+        log_info "Node.js not found — installing Node.js $NODE_VERSION LTS..."
+    fi
    install_node
 }

 install_node() {
+    if [ "$DISTRO" = "termux" ]; then
+        log_info "Installing Node.js via pkg..."
+        if pkg install -y nodejs >/dev/null; then
+            local installed_ver
+            installed_ver=$(node --version 2>/dev/null)
+            log_success "Node.js $installed_ver installed via pkg"
+            HAS_NODE=true
+        else
+            log_warn "Failed to install Node.js via pkg"
+            HAS_NODE=false
+        fi
+        return 0
+    fi
+
    local arch=$(uname -m)
    local node_arch
    case "$arch" in
@@ -413,6 +505,30 @@ install_system_packages() {
        need_ffmpeg=true
    fi

+    # Termux always needs the Android build toolchain for the tested pip path,
+    # even when ripgrep/ffmpeg are already present.
+    if [ "$DISTRO" = "termux" ]; then
+        local termux_pkgs=(clang rust make pkg-config libffi openssl)
+        if [ "$need_ripgrep" = true ]; then
+            termux_pkgs+=("ripgrep")
+        fi
+        if [ "$need_ffmpeg" = true ]; then
+            termux_pkgs+=("ffmpeg")
+        fi
+
+        log_info "Installing Termux packages: ${termux_pkgs[*]}"
+        if pkg install -y "${termux_pkgs[@]}" >/dev/null; then
+            [ "$need_ripgrep" = true ] && HAS_RIPGREP=true && log_success "ripgrep installed"
+            [ "$need_ffmpeg" = true ]  && HAS_FFMPEG=true  && log_success "ffmpeg installed"
+            log_success "Termux build dependencies installed"
+            return 0
+        fi
+
+        log_warn "Could not auto-install all Termux packages"
+        log_info "Install manually: pkg install ${termux_pkgs[*]}"
+        return 0
+    fi
+
    # Nothing to install — done
    if [ "$need_ripgrep" = false ] && [ "$need_ffmpeg" = false ]; then
        return 0
@@ -550,6 +666,9 @@ show_manual_install_hint() {
                *)             log_info "  Use your package manager or visit the project homepage" ;;
            esac
            ;;
+        android)
+            log_info "  pkg install $pkg"
+            ;;
        macos) log_info "  brew install $pkg" ;;
    esac
 }
@@ -646,6 +765,19 @@ setup_venv() {
        return 0
    fi

+    if [ "$DISTRO" = "termux" ]; then
+        log_info "Creating virtual environment with Termux Python..."
+
+        if [ -d "venv" ]; then
+            log_info "Virtual environment already exists, recreating..."
+            rm -rf venv
+        fi
+
+        "$PYTHON_PATH" -m venv venv
+        log_success "Virtual environment ready ($(./venv/bin/python --version 2>/dev/null))"
+        return 0
+    fi
+
    log_info "Creating virtual environment with Python $PYTHON_VERSION..."

    if [ -d "venv" ]; then
@@ -662,6 +794,46 @@ setup_venv() {
 install_deps() {
    log_info "Installing dependencies..."

+    if [ "$DISTRO" = "termux" ]; then
+        if [ "$USE_VENV" = true ]; then
+            export VIRTUAL_ENV="$INSTALL_DIR/venv"
+            PIP_PYTHON="$INSTALL_DIR/venv/bin/python"
+        else
+            PIP_PYTHON="$PYTHON_PATH"
+        fi
+
+        if [ -z "${ANDROID_API_LEVEL:-}" ]; then
+            ANDROID_API_LEVEL="$(getprop ro.build.version.sdk 2>/dev/null || true)"
+            if [ -z "$ANDROID_API_LEVEL" ]; then
+                ANDROID_API_LEVEL=24
+            fi
+            export ANDROID_API_LEVEL
+            log_info "Using ANDROID_API_LEVEL=$ANDROID_API_LEVEL for Android wheel builds"
+        fi
+
+        "$PIP_PYTHON" -m pip install --upgrade pip setuptools wheel >/dev/null
+        if ! "$PIP_PYTHON" -m pip install -e '.[termux]' -c constraints-termux.txt; then
+            log_warn "Termux feature install (.[termux]) failed, trying base install..."
+            if ! "$PIP_PYTHON" -m pip install -e '.' -c constraints-termux.txt; then
+                log_error "Package installation failed on Termux."
+                log_info "Ensure these packages are installed: pkg install clang rust make pkg-config libffi openssl"
+                log_info "Then re-run: cd $INSTALL_DIR && python -m pip install -e '.[termux]' -c constraints-termux.txt"
+                exit 1
+            fi
+        fi
+
+        log_success "Main package installed"
+        log_info "Termux note: browser/WhatsApp tooling is not installed by default; see the Termux guide for optional follow-up steps."
+
+        if [ -d "tinker-atropos" ] && [ -f "tinker-atropos/pyproject.toml" ]; then
+            log_info "tinker-atropos submodule found — skipping install (optional, for RL training)"
+            log_info "  To install later: $PIP_PYTHON -m pip install -e \"./tinker-atropos\""
+        fi
+
+        log_success "All dependencies installed"
+        return 0
+    fi
+
    if [ "$USE_VENV" = true ]; then
        # Tell uv to install into our venv (no need to activate)
        export VIRTUAL_ENV="$INSTALL_DIR/venv"
@@ -743,19 +915,35 @@ setup_path() {
    if [ ! -x "$HERMES_BIN" ]; then
        log_warn "hermes entry point not found at $HERMES_BIN"
        log_info "This usually means the pip install didn't complete successfully."
-        log_info "Try: cd $INSTALL_DIR && uv pip install -e '.[all]'"
+        if [ "$DISTRO" = "termux" ]; then
+            log_info "Try: cd $INSTALL_DIR && python -m pip install -e '.[termux]' -c constraints-termux.txt"
+        else
+            log_info "Try: cd $INSTALL_DIR && uv pip install -e '.[all]'"
+        fi
        return 0
    fi

-    # Create symlink in ~/.local/bin (standard user binary location, usually on PATH)
-    mkdir -p "$HOME/.local/bin"
-    ln -sf "$HERMES_BIN" "$HOME/.local/bin/hermes"
-    log_success "Symlinked hermes → ~/.local/bin/hermes"
+    local command_link_dir
+    local command_link_display_dir
+    command_link_dir="$(get_command_link_dir)"
+    command_link_display_dir="$(get_command_link_display_dir)"
+
+    # Create a user-facing shim for the hermes command.
+    mkdir -p "$command_link_dir"
+    ln -sf "$HERMES_BIN" "$command_link_dir/hermes"
+    log_success "Symlinked hermes → $command_link_display_dir/hermes"
+
+    if [ "$DISTRO" = "termux" ]; then
+        export PATH="$command_link_dir:$PATH"
+        log_info "$command_link_display_dir is the native Termux command path"
+        log_success "hermes command ready"
+        return 0
+    fi

    # Check if ~/.local/bin is on PATH; if not, add it to shell config.
    # Detect the user's actual login shell (not the shell running this script,
    # which is always bash when piped from curl).
-    if ! echo "$PATH" | tr ':' '\n' | grep -q "^$HOME/.local/bin$"; then
+    if ! echo "$PATH" | tr ':' '\n' | grep -q "^$command_link_dir$"; then
        SHELL_CONFIGS=()
        LOGIN_SHELL="$(basename "${SHELL:-/bin/bash}")"
        case "$LOGIN_SHELL" in
@@ -801,7 +989,7 @@ setup_path() {
    fi

    # Export for current session so hermes works immediately
-    export PATH="$HOME/.local/bin:$PATH"
+    export PATH="$command_link_dir:$PATH"

    log_success "hermes command ready"
 }
@@ -878,6 +1066,13 @@ install_node_deps() {
        return 0
    fi

+    if [ "$DISTRO" = "termux" ]; then
+        log_info "Skipping automatic Node/browser dependency setup on Termux"
+        log_info "Browser automation and WhatsApp bridge are not part of the tested Termux install path yet."
+        log_info "If you want to experiment manually later, run: cd $INSTALL_DIR && npm install"
+        return 0
+    fi
+
    if [ -f "$INSTALL_DIR/package.json" ]; then
        log_info "Installing Node.js dependencies (browser tools)..."
        cd "$INSTALL_DIR"
@@ -992,8 +1187,7 @@ maybe_start_gateway() {
            read -p "Pair WhatsApp now? [Y/n] " -n 1 -r
            echo
            if [[ $REPLY =~ ^[Yy]$ ]] || [[ -z $REPLY ]]; then
-                HERMES_CMD="$HOME/.local/bin/hermes"
-                [ ! -x "$HERMES_CMD" ] && HERMES_CMD="hermes"
+                HERMES_CMD="$(get_hermes_command_path)"
                $HERMES_CMD whatsapp || true
            fi
        else
@@ -1007,16 +1201,17 @@ maybe_start_gateway() {
    fi

    echo ""
-    read -p "Would you like to install the gateway as a background service? [Y/n] " -n 1 -r < /dev/tty
+    if [ "$DISTRO" = "termux" ]; then
+        read -p "Would you like to start the gateway in the background? [Y/n] " -n 1 -r < /dev/tty
+    else
+        read -p "Would you like to install the gateway as a background service? [Y/n] " -n 1 -r < /dev/tty
+    fi
    echo

    if [[ $REPLY =~ ^[Yy]$ ]] || [[ -z $REPLY ]]; then
-        HERMES_CMD="$HOME/.local/bin/hermes"
-        if [ ! -x "$HERMES_CMD" ]; then
-            HERMES_CMD="hermes"
-        fi
+        HERMES_CMD="$(get_hermes_command_path)"

-        if command -v systemctl &> /dev/null; then
+        if [ "$DISTRO" != "termux" ] && command -v systemctl &> /dev/null; then
            log_info "Installing systemd service..."
            if $HERMES_CMD gateway install 2>/dev/null; then
                log_success "Gateway service installed"
@@ -1029,12 +1224,19 @@ maybe_start_gateway() {
                log_warn "Systemd install failed. You can start manually: hermes gateway"
            fi
        else
-            log_info "systemd not available — starting gateway in background..."
+            if [ "$DISTRO" = "termux" ]; then
+                log_info "Termux detected — starting gateway in best-effort background mode..."
+            else
+                log_info "systemd not available — starting gateway in background..."
+            fi
            nohup $HERMES_CMD gateway > "$HERMES_HOME/logs/gateway.log" 2>&1 &
            GATEWAY_PID=$!
            log_success "Gateway started (PID $GATEWAY_PID). Logs: ~/.hermes/logs/gateway.log"
            log_info "To stop: kill $GATEWAY_PID"
            log_info "To restart later: hermes gateway"
+            if [ "$DISTRO" = "termux" ]; then
+                log_warn "Android may stop background processes when Termux is suspended or the system reclaims resources."
+            fi
        fi
    else
        log_info "Skipped. Start the gateway later with: hermes gateway"
@@ -1073,24 +1275,33 @@ print_success() {

    echo -e "${CYAN}─────────────────────────────────────────────────────────${NC}"
    echo ""
-    echo -e "${YELLOW}⚡ Reload your shell to use 'hermes' command:${NC}"
-    echo ""
-    LOGIN_SHELL="$(basename "${SHELL:-/bin/bash}")"
-    if [ "$LOGIN_SHELL" = "zsh" ]; then
-        echo "   source ~/.zshrc"
-    elif [ "$LOGIN_SHELL" = "bash" ]; then
-        echo "   source ~/.bashrc"
+    if [ "$DISTRO" = "termux" ]; then
+        echo -e "${YELLOW}⚡ 'hermes' was linked into $(get_command_link_display_dir), which is already on PATH in Termux.${NC}"
+        echo ""
    else
-        echo "   source ~/.bashrc   # or ~/.zshrc"
+        echo -e "${YELLOW}⚡ Reload your shell to use 'hermes' command:${NC}"
+        echo ""
+        LOGIN_SHELL="$(basename "${SHELL:-/bin/bash}")"
+        if [ "$LOGIN_SHELL" = "zsh" ]; then
+            echo "   source ~/.zshrc"
+        elif [ "$LOGIN_SHELL" = "bash" ]; then
+            echo "   source ~/.bashrc"
+        else
+            echo "   source ~/.bashrc   # or ~/.zshrc"
+        fi
+        echo ""
    fi
-    echo ""

    # Show Node.js warning if auto-install failed
    if [ "$HAS_NODE" = false ]; then
        echo -e "${YELLOW}"
        echo "Note: Node.js could not be installed automatically."
        echo "Browser tools need Node.js. Install manually:"
-        echo "  https://nodejs.org/en/download/"
+        if [ "$DISTRO" = "termux" ]; then
+            echo "  pkg install nodejs"
+        else
+            echo "  https://nodejs.org/en/download/"
+        fi
        echo -e "${NC}"
    fi

@@ -1099,7 +1310,11 @@ print_success() {
        echo -e "${YELLOW}"
        echo "Note: ripgrep (rg) was not found. File search will use"
        echo "grep as a fallback. For faster search in large codebases,"
-        echo "install ripgrep: sudo apt install ripgrep (or brew install ripgrep)"
+        if [ "$DISTRO" = "termux" ]; then
+            echo "install ripgrep: pkg install ripgrep"
+        else
+            echo "install ripgrep: sudo apt install ripgrep (or brew install ripgrep)"
+        fi
        echo -e "${NC}"
    fi
 }
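Reviewer note: the link-directory helpers added to install.sh above can be checked outside Termux by faking the environment. This is a minimal sketch, not the installer's code: `is_termux` is rewritten with a portable `case` instead of the diff's `[[ ... ]]` test, and the `HOME`, `PREFIX`, and `TERMUX_VERSION` values are made-up inputs for illustration.

```shell
#!/bin/sh
# Sketch only: same decision logic as the helpers in the diff, written for
# plain POSIX sh. All environment values below are illustrative assumptions.

is_termux() {
    [ -n "${TERMUX_VERSION:-}" ] && return 0
    case "${PREFIX:-}" in
        *com.termux/files/usr*) return 0 ;;
    esac
    return 1
}

get_command_link_dir() {
    if is_termux && [ -n "${PREFIX:-}" ]; then
        echo "$PREFIX/bin"
    else
        echo "$HOME/.local/bin"
    fi
}

# Each case runs in a subshell so the faked variables do not leak.
desktop_dir="$(unset TERMUX_VERSION PREFIX; HOME=/home/demo; get_command_link_dir)"
termux_dir="$(TERMUX_VERSION=0.118; PREFIX=/data/data/com.termux/files/usr; get_command_link_dir)"
no_prefix_dir="$(TERMUX_VERSION=0.118; unset PREFIX; HOME=/home/demo; get_command_link_dir)"

echo "desktop:   $desktop_dir"
echo "termux:    $termux_dir"
echo "no prefix: $no_prefix_dir"
```

The third case shows a fallback worth keeping in mind during review: if `TERMUX_VERSION` is set but `PREFIX` is empty, the helper returns `$HOME/.local/bin` rather than a Termux path.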
@@ -3,17 +3,17 @@
 # Hermes Agent Setup Script
 # ============================================================================
 # Quick setup for developers who cloned the repo manually.
-# Uses uv for fast Python provisioning and package management.
+# Uses uv for desktop/server setup and Python's stdlib venv + pip on Termux.
 #
 # Usage:
 #   ./setup-hermes.sh
 #
 # This script:
-# 1. Installs uv if not present
-# 2. Creates a virtual environment with Python 3.11 via uv
-# 3. Installs all dependencies (main package + submodules)
+# 1. Detects desktop/server vs Android/Termux setup path
+# 2. Creates a Python 3.11 virtual environment
+# 3. Installs the appropriate dependency set for the platform
 # 4. Creates .env from template (if not exists)
-# 5. Symlinks the 'hermes' CLI command into ~/.local/bin
+# 5. Symlinks the 'hermes' CLI command into a user-facing bin dir
 # 6. Runs the setup wizard (optional)
 # ============================================================================

@@ -31,6 +31,26 @@ cd "$SCRIPT_DIR"

 PYTHON_VERSION="3.11"

+is_termux() {
+    [ -n "${TERMUX_VERSION:-}" ] || [[ "${PREFIX:-}" == *"com.termux/files/usr"* ]]
+}
+
+get_command_link_dir() {
+    if is_termux && [ -n "${PREFIX:-}" ]; then
+        echo "$PREFIX/bin"
+    else
+        echo "$HOME/.local/bin"
+    fi
+}
+
+get_command_link_display_dir() {
+    if is_termux && [ -n "${PREFIX:-}" ]; then
+        echo '$PREFIX/bin'
+    else
+        echo '~/.local/bin'
+    fi
+}
+
 echo ""
 echo -e "${CYAN}⚕ Hermes Agent Setup${NC}"
 echo ""
@@ -42,36 +62,40 @@ echo ""
 echo -e "${CYAN}→${NC} Checking for uv..."

 UV_CMD=""
-if command -v uv &> /dev/null; then
-    UV_CMD="uv"
-elif [ -x "$HOME/.local/bin/uv" ]; then
-    UV_CMD="$HOME/.local/bin/uv"
-elif [ -x "$HOME/.cargo/bin/uv" ]; then
-    UV_CMD="$HOME/.cargo/bin/uv"
-fi
-
-if [ -n "$UV_CMD" ]; then
-    UV_VERSION=$($UV_CMD --version 2>/dev/null)
-    echo -e "${GREEN}✓${NC} uv found ($UV_VERSION)"
+if is_termux; then
+    echo -e "${CYAN}→${NC} Termux detected — using Python's stdlib venv + pip instead of uv"
 else
-    echo -e "${CYAN}→${NC} Installing uv..."
-    if curl -LsSf https://astral.sh/uv/install.sh | sh 2>/dev/null; then
-        if [ -x "$HOME/.local/bin/uv" ]; then
-            UV_CMD="$HOME/.local/bin/uv"
-        elif [ -x "$HOME/.cargo/bin/uv" ]; then
-            UV_CMD="$HOME/.cargo/bin/uv"
-        fi
-        
-        if [ -n "$UV_CMD" ]; then
-            UV_VERSION=$($UV_CMD --version 2>/dev/null)
-            echo -e "${GREEN}✓${NC} uv installed ($UV_VERSION)"
+    if command -v uv &> /dev/null; then
+        UV_CMD="uv"
+    elif [ -x "$HOME/.local/bin/uv" ]; then
+        UV_CMD="$HOME/.local/bin/uv"
+    elif [ -x "$HOME/.cargo/bin/uv" ]; then
+        UV_CMD="$HOME/.cargo/bin/uv"
+    fi
+
+    if [ -n "$UV_CMD" ]; then
+        UV_VERSION=$($UV_CMD --version 2>/dev/null)
+        echo -e "${GREEN}✓${NC} uv found ($UV_VERSION)"
+    else
+        echo -e "${CYAN}→${NC} Installing uv..."
+        if curl -LsSf https://astral.sh/uv/install.sh | sh 2>/dev/null; then
+            if [ -x "$HOME/.local/bin/uv" ]; then
+                UV_CMD="$HOME/.local/bin/uv"
+            elif [ -x "$HOME/.cargo/bin/uv" ]; then
+                UV_CMD="$HOME/.cargo/bin/uv"
+            fi
+
+            if [ -n "$UV_CMD" ]; then
+                UV_VERSION=$($UV_CMD --version 2>/dev/null)
+                echo -e "${GREEN}✓${NC} uv installed ($UV_VERSION)"
+            else
+                echo -e "${RED}✗${NC} uv installed but not found. Add ~/.local/bin to PATH and retry."
+                exit 1
+            fi
        else
-            echo -e "${RED}✗${NC} uv installed but not found. Add ~/.local/bin to PATH and retry."
+            echo -e "${RED}✗${NC} Failed to install uv. Visit https://docs.astral.sh/uv/"
            exit 1
        fi
-    else
-        echo -e "${RED}✗${NC} Failed to install uv. Visit https://docs.astral.sh/uv/"
-        exit 1
    fi
 fi

@@ -81,16 +105,34 @@ fi

 echo -e "${CYAN}→${NC} Checking Python $PYTHON_VERSION..."

-if $UV_CMD python find "$PYTHON_VERSION" &> /dev/null; then
-    PYTHON_PATH=$($UV_CMD python find "$PYTHON_VERSION")
-    PYTHON_FOUND_VERSION=$($PYTHON_PATH --version 2>/dev/null)
-    echo -e "${GREEN}✓${NC} $PYTHON_FOUND_VERSION found"
+if is_termux; then
+    if command -v python >/dev/null 2>&1; then
+        PYTHON_PATH="$(command -v python)"
+        if "$PYTHON_PATH" -c 'import sys; raise SystemExit(0 if sys.version_info >= (3, 11) else 1)' 2>/dev/null; then
+            PYTHON_FOUND_VERSION=$("$PYTHON_PATH" --version 2>/dev/null)
+            echo -e "${GREEN}✓${NC} $PYTHON_FOUND_VERSION found"
+        else
+            echo -e "${RED}✗${NC} Termux Python must be 3.11+"
+            echo "    Run: pkg install python"
+            exit 1
+        fi
+    else
+        echo -e "${RED}✗${NC} Python not found in Termux"
+        echo "    Run: pkg install python"
+        exit 1
+    fi
 else
-    echo -e "${CYAN}→${NC} Python $PYTHON_VERSION not found, installing via uv..."
-    $UV_CMD python install "$PYTHON_VERSION"
-    PYTHON_PATH=$($UV_CMD python find "$PYTHON_VERSION")
-    PYTHON_FOUND_VERSION=$($PYTHON_PATH --version 2>/dev/null)
-    echo -e "${GREEN}✓${NC} $PYTHON_FOUND_VERSION installed"
+    if $UV_CMD python find "$PYTHON_VERSION" &> /dev/null; then
+        PYTHON_PATH=$($UV_CMD python find "$PYTHON_VERSION")
+        PYTHON_FOUND_VERSION=$($PYTHON_PATH --version 2>/dev/null)
+        echo -e "${GREEN}✓${NC} $PYTHON_FOUND_VERSION found"
+    else
+        echo -e "${CYAN}→${NC} Python $PYTHON_VERSION not found, installing via uv..."
+        $UV_CMD python install "$PYTHON_VERSION"
+        PYTHON_PATH=$($UV_CMD python find "$PYTHON_VERSION")
+        PYTHON_FOUND_VERSION=$($PYTHON_PATH --version 2>/dev/null)
+        echo -e "${GREEN}✓${NC} $PYTHON_FOUND_VERSION installed"
+    fi
 fi

 # ============================================================================
@@ -104,11 +146,16 @@ if [ -d "venv" ]; then
    rm -rf venv
 fi

-$UV_CMD venv venv --python "$PYTHON_VERSION"
-echo -e "${GREEN}✓${NC} venv created (Python $PYTHON_VERSION)"
+if is_termux; then
+    "$PYTHON_PATH" -m venv venv
+    echo -e "${GREEN}✓${NC} venv created with stdlib venv"
+else
+    $UV_CMD venv venv --python "$PYTHON_VERSION"
+    echo -e "${GREEN}✓${NC} venv created (Python $PYTHON_VERSION)"
+fi

-# Tell uv to install into this venv (no activation needed for uv)
 export VIRTUAL_ENV="$SCRIPT_DIR/venv"
+SETUP_PYTHON="$SCRIPT_DIR/venv/bin/python"

 # ============================================================================
 # Dependencies
@@ -116,19 +163,34 @@ export VIRTUAL_ENV="$SCRIPT_DIR/venv"

 echo -e "${CYAN}→${NC} Installing dependencies..."

-# Prefer uv sync with lockfile (hash-verified installs) when available,
-# fall back to pip install for compatibility or when lockfile is stale.
-if [ -f "uv.lock" ]; then
-    echo -e "${CYAN}→${NC} Using uv.lock for hash-verified installation..."
-    UV_PROJECT_ENVIRONMENT="$SCRIPT_DIR/venv" $UV_CMD sync --all-extras --locked 2>/dev/null && \
-        echo -e "${GREEN}✓${NC} Dependencies installed (lockfile verified)" || {
-        echo -e "${YELLOW}⚠${NC} Lockfile install failed (may be outdated), falling back to pip install..."
+if is_termux; then
+    ANDROID_API_LEVEL="${ANDROID_API_LEVEL:-$(getprop ro.build.version.sdk 2>/dev/null || true)}"; export ANDROID_API_LEVEL="${ANDROID_API_LEVEL:-24}"
+    echo -e "${CYAN}→${NC} Termux detected — installing the tested Android bundle"
+    "$SETUP_PYTHON" -m pip install --upgrade pip setuptools wheel
+    if [ -f "constraints-termux.txt" ]; then
+        "$SETUP_PYTHON" -m pip install -e ".[termux]" -c constraints-termux.txt || {
+            echo -e "${YELLOW}⚠${NC} Termux bundle install failed, falling back to base install..."
+            "$SETUP_PYTHON" -m pip install -e "." -c constraints-termux.txt
+        }
+    else
+        "$SETUP_PYTHON" -m pip install -e ".[termux]" || "$SETUP_PYTHON" -m pip install -e "."
+    fi
+    echo -e "${GREEN}✓${NC} Dependencies installed"
+else
+    # Prefer uv sync with lockfile (hash-verified installs) when available,
+    # fall back to pip install for compatibility or when lockfile is stale.
+    if [ -f "uv.lock" ]; then
+        echo -e "${CYAN}→${NC} Using uv.lock for hash-verified installation..."
+        UV_PROJECT_ENVIRONMENT="$SCRIPT_DIR/venv" $UV_CMD sync --all-extras --locked 2>/dev/null && \
+            echo -e "${GREEN}✓${NC} Dependencies installed (lockfile verified)" || {
+            echo -e "${YELLOW}⚠${NC} Lockfile install failed (may be outdated), falling back to pip install..."
+            $UV_CMD pip install -e ".[all]" || $UV_CMD pip install -e "."
+            echo -e "${GREEN}✓${NC} Dependencies installed"
+        }
+    else
        $UV_CMD pip install -e ".[all]" || $UV_CMD pip install -e "."
        echo -e "${GREEN}✓${NC} Dependencies installed"
-    }
-else
-    $UV_CMD pip install -e ".[all]" || $UV_CMD pip install -e "."
-    echo -e "${GREEN}✓${NC} Dependencies installed"
+    fi
 fi

 # ============================================================================
@@ -138,7 +200,9 @@ fi
 echo -e "${CYAN}→${NC} Installing optional submodules..."

 # tinker-atropos (RL training backend)
-if [ -d "tinker-atropos" ] && [ -f "tinker-atropos/pyproject.toml" ]; then
+if is_termux; then
+    echo -e "${CYAN}→${NC} Skipping tinker-atropos on Termux (not part of the tested Android path)"
+elif [ -d "tinker-atropos" ] && [ -f "tinker-atropos/pyproject.toml" ]; then
    $UV_CMD pip install -e "./tinker-atropos" && \
        echo -e "${GREEN}✓${NC} tinker-atropos installed" || \
        echo -e "${YELLOW}⚠${NC} tinker-atropos install failed (RL tools may not work)"
@@ -160,34 +224,42 @@ else
    echo
    if [[ $REPLY =~ ^[Yy]$ ]] || [[ -z $REPLY ]]; then
        INSTALLED=false
-        
-        # Check if sudo is available
-        if command -v sudo &> /dev/null && sudo -n true 2>/dev/null; then
-            if command -v apt &> /dev/null; then
-                sudo apt install -y ripgrep && INSTALLED=true
-            elif command -v dnf &> /dev/null; then
-                sudo dnf install -y ripgrep && INSTALLED=true
+
+        if is_termux; then
+            pkg install -y ripgrep && INSTALLED=true
+        else
+            # Check if sudo is available
+            if command -v sudo &> /dev/null && sudo -n true 2>/dev/null; then
+                if command -v apt &> /dev/null; then
+                    sudo apt install -y ripgrep && INSTALLED=true
+                elif command -v dnf &> /dev/null; then
+                    sudo dnf install -y ripgrep && INSTALLED=true
+                fi
+            fi
+
+            # Try brew (no sudo needed)
+            if [ "$INSTALLED" = false ] && command -v brew &> /dev/null; then
+                brew install ripgrep && INSTALLED=true
+            fi
+
+            # Try cargo (no sudo needed)
+            if [ "$INSTALLED" = false ] && command -v cargo &> /dev/null; then
+                echo -e "${CYAN}→${NC} Trying cargo install (no sudo required)..."
+                cargo install ripgrep && INSTALLED=true
            fi
        fi
-        
-        # Try brew (no sudo needed)
-        if [ "$INSTALLED" = false ] && command -v brew &> /dev/null; then
-            brew install ripgrep && INSTALLED=true
-        fi
-        
-        # Try cargo (no sudo needed)
-        if [ "$INSTALLED" = false ] && command -v cargo &> /dev/null; then
-            echo -e "${CYAN}→${NC} Trying cargo install (no sudo required)..."
-            cargo install ripgrep && INSTALLED=true
-        fi
-        
+
        if [ "$INSTALLED" = true ]; then
            echo -e "${GREEN}✓${NC} ripgrep installed"
        else
            echo -e "${YELLOW}⚠${NC} Auto-install failed. Install options:"
-            echo "    sudo apt install ripgrep     # Debian/Ubuntu"
-            echo "    brew install ripgrep         # macOS"
-            echo "    cargo install ripgrep        # With Rust (no sudo)"
+            if is_termux; then
+                echo "    pkg install ripgrep          # Termux / Android"
+            else
+                echo "    sudo apt install ripgrep     # Debian/Ubuntu"
+                echo "    brew install ripgrep         # macOS"
+                echo "    cargo install ripgrep        # With Rust (no sudo)"
+            fi
            echo "    https://github.com/BurntSushi/ripgrep#installation"
        fi
    fi
@@ -207,49 +279,56 @@ else
 fi

 # ============================================================================
-# PATH setup — symlink hermes into ~/.local/bin
+# PATH setup — symlink hermes into a user-facing bin dir
 # ============================================================================

 echo -e "${CYAN}→${NC} Setting up hermes command..."

 HERMES_BIN="$SCRIPT_DIR/venv/bin/hermes"
-mkdir -p "$HOME/.local/bin"
-ln -sf "$HERMES_BIN" "$HOME/.local/bin/hermes"
-echo -e "${GREEN}✓${NC} Symlinked hermes → ~/.local/bin/hermes"
+COMMAND_LINK_DIR="$(get_command_link_dir)"
+COMMAND_LINK_DISPLAY_DIR="$(get_command_link_display_dir)"
+mkdir -p "$COMMAND_LINK_DIR"
+ln -sf "$HERMES_BIN" "$COMMAND_LINK_DIR/hermes"
+echo -e "${GREEN}✓${NC} Symlinked hermes → $COMMAND_LINK_DISPLAY_DIR/hermes"

-# Determine the appropriate shell config file
-SHELL_CONFIG=""
-if [[ "$SHELL" == *"zsh"* ]]; then
-    SHELL_CONFIG="$HOME/.zshrc"
-elif [[ "$SHELL" == *"bash"* ]]; then
-    SHELL_CONFIG="$HOME/.bashrc"
-    [ ! -f "$SHELL_CONFIG" ] && SHELL_CONFIG="$HOME/.bash_profile"
+if is_termux; then
+    export PATH="$COMMAND_LINK_DIR:$PATH"
+    echo -e "${GREEN}✓${NC} $COMMAND_LINK_DISPLAY_DIR is already on PATH in Termux"
 else
-    # Fallback to checking existing files
-    if [ -f "$HOME/.zshrc" ]; then
+    # Determine the appropriate shell config file
+    SHELL_CONFIG=""
+    if [[ "$SHELL" == *"zsh"* ]]; then
        SHELL_CONFIG="$HOME/.zshrc"
-    elif [ -f "$HOME/.bashrc" ]; then
+    elif [[ "$SHELL" == *"bash"* ]]; then
        SHELL_CONFIG="$HOME/.bashrc"
-    elif [ -f "$HOME/.bash_profile" ]; then
-        SHELL_CONFIG="$HOME/.bash_profile"
-    fi
-fi
-
-if [ -n "$SHELL_CONFIG" ]; then
-    # Touch the file just in case it doesn't exist yet but was selected
-    touch "$SHELL_CONFIG" 2>/dev/null || true
-    
-    if ! echo "$PATH" | tr ':' '\n' | grep -q "^$HOME/.local/bin$"; then
-        if ! grep -q '\.local/bin' "$SHELL_CONFIG" 2>/dev/null; then
-            echo "" >> "$SHELL_CONFIG"
-            echo "# Hermes Agent — ensure ~/.local/bin is on PATH" >> "$SHELL_CONFIG"
-            echo 'export PATH="$HOME/.local/bin:$PATH"' >> "$SHELL_CONFIG"
-            echo -e "${GREEN}✓${NC} Added ~/.local/bin to PATH in $SHELL_CONFIG"
-        else
-            echo -e "${GREEN}✓${NC} ~/.local/bin already in $SHELL_CONFIG"
-        fi
+        [ ! -f "$SHELL_CONFIG" ] && SHELL_CONFIG="$HOME/.bash_profile"
    else
-        echo -e "${GREEN}✓${NC} ~/.local/bin already on PATH"
+        # Fallback to checking existing files
+        if [ -f "$HOME/.zshrc" ]; then
+            SHELL_CONFIG="$HOME/.zshrc"
+        elif [ -f "$HOME/.bashrc" ]; then
+            SHELL_CONFIG="$HOME/.bashrc"
+        elif [ -f "$HOME/.bash_profile" ]; then
+            SHELL_CONFIG="$HOME/.bash_profile"
+        fi
+    fi
+
+    if [ -n "$SHELL_CONFIG" ]; then
+        # Touch the file just in case it doesn't exist yet but was selected
+        touch "$SHELL_CONFIG" 2>/dev/null || true
+
+        if ! echo "$PATH" | tr ':' '\n' | grep -q "^$HOME/.local/bin$"; then
+            if ! grep -q '\.local/bin' "$SHELL_CONFIG" 2>/dev/null; then
+                echo "" >> "$SHELL_CONFIG"
+                echo "# Hermes Agent — ensure ~/.local/bin is on PATH" >> "$SHELL_CONFIG"
+                echo 'export PATH="$HOME/.local/bin:$PATH"' >> "$SHELL_CONFIG"
+                echo -e "${GREEN}✓${NC} Added ~/.local/bin to PATH in $SHELL_CONFIG"
+            else
+                echo -e "${GREEN}✓${NC} ~/.local/bin already in $SHELL_CONFIG"
+            fi
+        else
+            echo -e "${GREEN}✓${NC} ~/.local/bin already on PATH"
+        fi
    fi
 fi

@@ -281,18 +360,31 @@ echo -e "${GREEN}✓ Setup complete!${NC}"
 echo ""
 echo "Next steps:"
 echo ""
-echo "  1. Reload your shell:"
-echo "     source $SHELL_CONFIG"
-echo ""
-echo "  2. Run the setup wizard to configure API keys:"
-echo "     hermes setup"
-echo ""
-echo "  3. Start chatting:"
-echo "     hermes"
-echo ""
+if is_termux; then
+    echo "  1. Run the setup wizard to configure API keys:"
+    echo "     hermes setup"
+    echo ""
+    echo "  2. Start chatting:"
+    echo "     hermes"
+    echo ""
+else
+    echo "  1. Reload your shell:"
+    echo "     source $SHELL_CONFIG"
+    echo ""
+    echo "  2. Run the setup wizard to configure API keys:"
+    echo "     hermes setup"
+    echo ""
+    echo "  3. Start chatting:"
+    echo "     hermes"
+    echo ""
+fi
 echo "Other commands:"
 echo "  hermes status        # Check configuration"
-echo "  hermes gateway install # Install gateway service (messaging + cron)"
+if is_termux; then
+    echo "  hermes gateway       # Run gateway in foreground"
+else
+    echo "  hermes gateway install # Install gateway service (messaging + cron)"
+fi
 echo "  hermes cron list     # View scheduled jobs"
 echo "  hermes doctor        # Diagnose issues"
 echo ""
@@ -410,6 +410,37 @@ class TestPrompt:
        update = last_call[1].get("update") or last_call[0][1]
        assert update.session_update == "agent_message_chunk"

+    @pytest.mark.asyncio
+    async def test_prompt_populates_usage_from_top_level_run_conversation_fields(self, agent):
+        """ACP should map top-level token fields into PromptResponse.usage."""
+        new_resp = await agent.new_session(cwd=".")
+        state = agent.session_manager.get_session(new_resp.session_id)
+
+        state.agent.run_conversation = MagicMock(return_value={
+            "final_response": "usage attached",
+            "messages": [],
+            "prompt_tokens": 123,
+            "completion_tokens": 45,
+            "total_tokens": 168,
+            "reasoning_tokens": 7,
+            "cache_read_tokens": 11,
+        })
+
+        mock_conn = MagicMock(spec=acp.Client)
+        mock_conn.session_update = AsyncMock()
+        agent._conn = mock_conn
+
+        prompt = [TextContentBlock(type="text", text="show usage")]
+        resp = await agent.prompt(prompt=prompt, session_id=new_resp.session_id)
+
+        assert isinstance(resp, PromptResponse)
+        assert resp.usage is not None
+        assert resp.usage.input_tokens == 123
+        assert resp.usage.output_tokens == 45
+        assert resp.usage.total_tokens == 168
+        assert resp.usage.thought_tokens == 7
+        assert resp.usage.cached_read_tokens == 11
+
    @pytest.mark.asyncio
    async def test_prompt_cancelled_returns_cancelled_stop_reason(self, agent):
        """If cancel is called during prompt, stop_reason should be 'cancelled'."""
@@ -81,6 +81,9 @@ class TestBuildAnthropicClient:
            build_anthropic_client("sk-ant-api03-x", base_url="https://custom.api.com")
            kwargs = mock_sdk.Anthropic.call_args[1]
            assert kwargs["base_url"] == "https://custom.api.com"
+            assert kwargs["default_headers"] == {
+                "anthropic-beta": "interleaved-thinking-2025-05-14,fine-grained-tool-streaming-2025-05-14"
+            }

    def test_minimax_anthropic_endpoint_uses_bearer_auth_for_regular_api_keys(self):
        with patch("agent.anthropic_adapter._anthropic_sdk") as mock_sdk:
@@ -92,7 +95,20 @@ class TestBuildAnthropicClient:
            assert kwargs["auth_token"] == "minimax-secret-123"
            assert "api_key" not in kwargs
            assert kwargs["default_headers"] == {
-                "anthropic-beta": "interleaved-thinking-2025-05-14,fine-grained-tool-streaming-2025-05-14"
+                "anthropic-beta": "interleaved-thinking-2025-05-14"
+            }
+
+    def test_minimax_cn_anthropic_endpoint_omits_tool_streaming_beta(self):
+        with patch("agent.anthropic_adapter._anthropic_sdk") as mock_sdk:
+            build_anthropic_client(
+                "minimax-cn-secret-123",
+                base_url="https://api.minimaxi.com/anthropic",
+            )
+            kwargs = mock_sdk.Anthropic.call_args[1]
+            assert kwargs["auth_token"] == "minimax-cn-secret-123"
+            assert "api_key" not in kwargs
+            assert kwargs["default_headers"] == {
+                "anthropic-beta": "interleaved-thinking-2025-05-14"
            }


@@ -480,6 +480,39 @@ class TestClassifyApiError:
        result = classify_api_error(e)
        assert result.reason == FailoverReason.context_overflow

+    # ── Message-only usage limit disambiguation (no status code) ──
+
+    def test_message_usage_limit_transient_is_rate_limit(self):
+        """'usage limit' + 'try again' with no status code → rate_limit, not billing."""
+        e = Exception("usage limit exceeded, try again in 5 minutes")
+        result = classify_api_error(e)
+        assert result.reason == FailoverReason.rate_limit
+        assert result.retryable is True
+        assert result.should_rotate_credential is True
+        assert result.should_fallback is True
+
+    def test_message_usage_limit_no_retry_signal_is_billing(self):
+        """'usage limit' with no transient signal and no status code → billing."""
+        e = Exception("usage limit reached")
+        result = classify_api_error(e)
+        assert result.reason == FailoverReason.billing
+        assert result.retryable is False
+        assert result.should_rotate_credential is True
+
+    def test_message_quota_with_reset_window_is_rate_limit(self):
+        """'quota' + 'resets at' with no status code → rate_limit."""
+        e = Exception("quota exceeded, resets at midnight UTC")
+        result = classify_api_error(e)
+        assert result.reason == FailoverReason.rate_limit
+        assert result.retryable is True
+
+    def test_message_limit_exceeded_with_wait_is_rate_limit(self):
+        """'limit exceeded' + 'wait' with no status code → rate_limit."""
+        e = Exception("key limit exceeded, please wait before retrying")
+        result = classify_api_error(e)
+        assert result.reason == FailoverReason.rate_limit
+        assert result.retryable is True
+
    # ── Unknown / fallback ──

    def test_generic_exception_is_unknown(self):
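The message-only tests above imply a two-step decision: first recognise a limit-style message, then look for a transient signal before deciding between rate_limit and billing. A minimal sketch of that decision, assuming phrase lists taken only from the test messages; `Reason` and `classify_limit_message` are hypothetical names, not the real `classify_api_error` internals:

```python
from enum import Enum


class Reason(Enum):
    """Hypothetical stand-in for FailoverReason (only the values used here)."""
    rate_limit = "rate_limit"
    billing = "billing"
    unknown = "unknown"


# Assumed phrase lists, derived from the test messages above.
_LIMIT_PHRASES = ("usage limit", "quota", "limit exceeded")
_TRANSIENT_PHRASES = ("try again", "resets at", "wait", "retry")


def classify_limit_message(message: str) -> Reason:
    """Classify an error message that carries no HTTP status code."""
    text = message.lower()
    if not any(phrase in text for phrase in _LIMIT_PHRASES):
        return Reason.unknown
    # A retry hint or reset window suggests the limit will clear on its own.
    if any(phrase in text for phrase in _TRANSIENT_PHRASES):
        return Reason.rate_limit
    # No transient signal: treat the limit as a billing/quota exhaustion.
    return Reason.billing
```

Substring matching keeps the sketch short; the real classifier may well use different phrases or ordering.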
@@ -0,0 +1,70 @@
+"""Tests for local provider stream read timeout auto-detection.
+
+When a local LLM provider is detected (Ollama, llama.cpp, vLLM, etc.),
+the httpx stream read timeout should be automatically increased from the
+default 120s to HERMES_API_TIMEOUT (1800s) to avoid premature connection
+kills during long prefill phases.
+"""
+
+import os
+import pytest
+from unittest.mock import patch
+
+from agent.model_metadata import is_local_endpoint
+
+
+class TestLocalStreamReadTimeout:
+    """Verify stream read timeout auto-detection logic."""
+
+    @pytest.mark.parametrize("base_url", [
+        "http://localhost:11434",
+        "http://127.0.0.1:8080",
+        "http://0.0.0.0:5000",
+        "http://192.168.1.100:8000",
+        "http://10.0.0.5:1234",
+    ])
+    def test_local_endpoint_bumps_read_timeout(self, base_url):
+        """Local endpoint + default timeout -> bumps to base_timeout."""
+        with patch.dict(os.environ, {}, clear=False):
+            os.environ.pop("HERMES_STREAM_READ_TIMEOUT", None)
+            _base_timeout = float(os.getenv("HERMES_API_TIMEOUT", 1800.0))
+            _stream_read_timeout = float(os.getenv("HERMES_STREAM_READ_TIMEOUT", 120.0))
+            if _stream_read_timeout == 120.0 and base_url and is_local_endpoint(base_url):
+                _stream_read_timeout = _base_timeout
+            assert _stream_read_timeout == _base_timeout
+
+    def test_user_override_respected_for_local(self):
+        """User sets HERMES_STREAM_READ_TIMEOUT -> keep their value even for local."""
+        with patch.dict(os.environ, {"HERMES_STREAM_READ_TIMEOUT": "300"}, clear=False):
+            _base_timeout = float(os.getenv("HERMES_API_TIMEOUT", 1800.0))
+            _stream_read_timeout = float(os.getenv("HERMES_STREAM_READ_TIMEOUT", 120.0))
+            base_url = "http://localhost:11434"
+            if _stream_read_timeout == 120.0 and base_url and is_local_endpoint(base_url):
+                _stream_read_timeout = _base_timeout
+            assert _stream_read_timeout == 300.0
+
+    @pytest.mark.parametrize("base_url", [
+        "https://api.openai.com",
+        "https://openrouter.ai/api",
+        "https://api.anthropic.com",
+    ])
+    def test_remote_endpoint_keeps_default(self, base_url):
+        """Remote endpoint -> keep 120s default."""
+        with patch.dict(os.environ, {}, clear=False):
+            os.environ.pop("HERMES_STREAM_READ_TIMEOUT", None)
+            _base_timeout = float(os.getenv("HERMES_API_TIMEOUT", 1800.0))
+            _stream_read_timeout = float(os.getenv("HERMES_STREAM_READ_TIMEOUT", 120.0))
+            if _stream_read_timeout == 120.0 and base_url and is_local_endpoint(base_url):
+                _stream_read_timeout = _base_timeout
+            assert _stream_read_timeout == 120.0
+
+    def test_empty_base_url_keeps_default(self):
+        """No base_url set -> keep 120s default."""
+        with patch.dict(os.environ, {}, clear=False):
+            os.environ.pop("HERMES_STREAM_READ_TIMEOUT", None)
+            _base_timeout = float(os.getenv("HERMES_API_TIMEOUT", 1800.0))
+            _stream_read_timeout = float(os.getenv("HERMES_STREAM_READ_TIMEOUT", 120.0))
+            base_url = ""
+            if _stream_read_timeout == 120.0 and base_url and is_local_endpoint(base_url):
+                _stream_read_timeout = _base_timeout
+            assert _stream_read_timeout == 120.0
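Each test above repeats the same three-line resolution inline. A sketch of that logic as one function, assuming the environment is passed in as a mapping; `looks_local` is a hypothetical, simplified stand-in for `is_local_endpoint` and may not match its real rules:

```python
import ipaddress
from urllib.parse import urlparse

DEFAULT_STREAM_READ_TIMEOUT = 120.0  # default used by the tests above


def looks_local(base_url: str) -> bool:
    """Hypothetical stand-in for is_local_endpoint: loopback/private hosts."""
    host = urlparse(base_url).hostname or ""
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # a DNS name other than localhost
    return addr.is_loopback or addr.is_private or addr.is_unspecified


def resolve_stream_read_timeout(base_url, env) -> float:
    """Bump the read timeout for local endpoints unless the user overrode it."""
    base_timeout = float(env.get("HERMES_API_TIMEOUT", 1800.0))
    read_timeout = float(
        env.get("HERMES_STREAM_READ_TIMEOUT", DEFAULT_STREAM_READ_TIMEOUT)
    )
    if (
        read_timeout == DEFAULT_STREAM_READ_TIMEOUT
        and base_url
        and looks_local(base_url)
    ):
        return base_timeout
    return read_timeout
```

Because the check compares against the default value, an explicit `HERMES_STREAM_READ_TIMEOUT=120` is indistinguishable from "unset" and would also be bumped for a local endpoint; the tests above share that property.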
@@ -1,4 +1,6 @@
-"""Tests for MiniMax provider hardening — context lengths, thinking guard, catalog."""
+"""Tests for MiniMax provider hardening — context lengths, thinking guard, catalog, beta headers."""
+
+from unittest.mock import patch


 class TestMinimaxContextLengths:
@@ -103,3 +105,100 @@ class TestMinimaxModelCatalog:
            models = _PROVIDER_MODELS[provider]
            assert "MiniMax-M2.7-highspeed" not in models
            assert "MiniMax-M2.5-highspeed" not in models
+
+
+class TestMinimaxBetaHeaders:
+    """MiniMax Anthropic-compat endpoints reject fine-grained-tool-streaming beta.
+
+    Verify that build_anthropic_client omits the tool-streaming beta for MiniMax
+    (both global and China domains) while keeping it for native Anthropic and
+    other third-party endpoints.  Covers the fix for #6510 / #6555.
+    """
+
+    _TOOL_BETA = "fine-grained-tool-streaming-2025-05-14"
+    _THINKING_BETA = "interleaved-thinking-2025-05-14"
+
+    # -- helper ----------------------------------------------------------
+
+    def _build_and_get_betas(self, api_key, base_url=None):
+        """Build client, return the anthropic-beta header string."""
+        from agent.anthropic_adapter import build_anthropic_client
+        with patch("agent.anthropic_adapter._anthropic_sdk") as mock_sdk:
+            build_anthropic_client(api_key, base_url=base_url)
+            kwargs = mock_sdk.Anthropic.call_args[1]
+            headers = kwargs.get("default_headers", {})
+            return headers.get("anthropic-beta", "")
+
+    # -- MiniMax global --------------------------------------------------
+
+    def test_minimax_global_omits_tool_streaming(self):
+        betas = self._build_and_get_betas(
+            "mm-key-123", base_url="https://api.minimax.io/anthropic"
+        )
+        assert self._TOOL_BETA not in betas
+        assert self._THINKING_BETA in betas
+
+    def test_minimax_global_trailing_slash(self):
+        betas = self._build_and_get_betas(
+            "mm-key-123", base_url="https://api.minimax.io/anthropic/"
+        )
+        assert self._TOOL_BETA not in betas
+
+    # -- MiniMax China ---------------------------------------------------
+
+    def test_minimax_cn_omits_tool_streaming(self):
+        betas = self._build_and_get_betas(
+            "mm-cn-key-456", base_url="https://api.minimaxi.com/anthropic"
+        )
+        assert self._TOOL_BETA not in betas
+        assert self._THINKING_BETA in betas
+
+    def test_minimax_cn_trailing_slash(self):
+        betas = self._build_and_get_betas(
+            "mm-cn-key-456", base_url="https://api.minimaxi.com/anthropic/"
+        )
+        assert self._TOOL_BETA not in betas
+
+    # -- Non-MiniMax keeps full betas ------------------------------------
+
+    def test_native_anthropic_keeps_tool_streaming(self):
+        betas = self._build_and_get_betas("sk-ant-api03-real-key-here")
+        assert self._TOOL_BETA in betas
+        assert self._THINKING_BETA in betas
+
+    def test_third_party_proxy_keeps_tool_streaming(self):
+        betas = self._build_and_get_betas(
+            "custom-key", base_url="https://my-proxy.example.com/anthropic"
+        )
+        assert self._TOOL_BETA in betas
+
+    def test_custom_base_url_keeps_tool_streaming(self):
+        betas = self._build_and_get_betas(
+            "custom-key", base_url="https://custom.api.com"
+        )
+        assert self._TOOL_BETA in betas
+
+    # -- _common_betas_for_base_url unit tests ---------------------------
+
+    def test_common_betas_none_url(self):
+        from agent.anthropic_adapter import _common_betas_for_base_url, _COMMON_BETAS
+        assert _common_betas_for_base_url(None) == _COMMON_BETAS
+
+    def test_common_betas_empty_url(self):
+        from agent.anthropic_adapter import _common_betas_for_base_url, _COMMON_BETAS
+        assert _common_betas_for_base_url("") == _COMMON_BETAS
+
+    def test_common_betas_minimax_url(self):
+        from agent.anthropic_adapter import _common_betas_for_base_url, _TOOL_STREAMING_BETA
+        betas = _common_betas_for_base_url("https://api.minimax.io/anthropic")
+        assert _TOOL_STREAMING_BETA not in betas
+        assert len(betas) > 0  # still has other betas
+
+    def test_common_betas_minimax_cn_url(self):
+        from agent.anthropic_adapter import _common_betas_for_base_url, _TOOL_STREAMING_BETA
+        betas = _common_betas_for_base_url("https://api.minimaxi.com/anthropic")
+        assert _TOOL_STREAMING_BETA not in betas
+
+    def test_common_betas_regular_url(self):
+        from agent.anthropic_adapter import _common_betas_for_base_url, _COMMON_BETAS
+        assert _common_betas_for_base_url("https://api.anthropic.com") == _COMMON_BETAS
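The behaviour these tests pin down can be sketched as a hostname check that filters one beta out of a shared list. This is an illustration only: matching on an exact hostname set is an assumption, and `betas_for_base_url` / `beta_header` are hypothetical names rather than the adapter's actual `_common_betas_for_base_url`:

```python
from urllib.parse import urlparse

THINKING_BETA = "interleaved-thinking-2025-05-14"
TOOL_STREAMING_BETA = "fine-grained-tool-streaming-2025-05-14"
COMMON_BETAS = [THINKING_BETA, TOOL_STREAMING_BETA]

# Hosts taken from the test URLs above; exact-host matching is an assumption.
MINIMAX_HOSTS = {"api.minimax.io", "api.minimaxi.com"}


def betas_for_base_url(base_url):
    """Return the beta flags to send, dropping tool streaming for MiniMax."""
    if not base_url:
        return list(COMMON_BETAS)
    host = (urlparse(base_url).hostname or "").lower()
    if host in MINIMAX_HOSTS:
        return [beta for beta in COMMON_BETAS if beta != TOOL_STREAMING_BETA]
    return list(COMMON_BETAS)


def beta_header(base_url=None):
    """Build the anthropic-beta header value as a comma-separated string."""
    return {"anthropic-beta": ",".join(betas_for_base_url(base_url))}
```

Parsing the hostname rather than comparing the raw string means a trailing slash or path suffix does not affect the match, which is what the trailing-slash tests exercise.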
@@ -147,6 +147,20 @@ class TestEscapedSpaces:
        assert result["path"] == tmp_image_with_spaces
        assert result["remainder"] == "what is this?"

+    def test_tilde_prefixed_path(self, tmp_path, monkeypatch):
+        home = tmp_path / "home"
+        img = home / "storage" / "shared" / "Pictures" / "cat.png"
+        img.parent.mkdir(parents=True, exist_ok=True)
+        img.write_bytes(b"\x89PNG\r\n\x1a\n")
+        monkeypatch.setenv("HOME", str(home))
+
+        result = _detect_file_drop("~/storage/shared/Pictures/cat.png what is this?")
+
+        assert result is not None
+        assert result["path"] == img
+        assert result["is_image"] is True
+        assert result["remainder"] == "what is this?"
+

 # ---------------------------------------------------------------------------
 # Tests: edge cases
@@ -0,0 +1,109 @@
+from pathlib import Path
+from unittest.mock import patch
+
+from cli import (
+    HermesCLI,
+    _collect_query_images,
+    _format_image_attachment_badges,
+    _termux_example_image_path,
+)
+
+
+def _make_cli():
+    cli_obj = HermesCLI.__new__(HermesCLI)
+    cli_obj._attached_images = []
+    return cli_obj
+
+
+def _make_image(path: Path) -> Path:
+    path.parent.mkdir(parents=True, exist_ok=True)
+    path.write_bytes(b"\x89PNG\r\n\x1a\n")
+    return path
+
+
+class TestImageCommand:
+    def test_handle_image_command_attaches_local_image(self, tmp_path):
+        img = _make_image(tmp_path / "photo.png")
+        cli_obj = _make_cli()
+
+        with patch("cli._cprint"):
+            cli_obj._handle_image_command(f"/image {img}")
+
+        assert cli_obj._attached_images == [img]
+
+    def test_handle_image_command_supports_quoted_path_with_spaces(self, tmp_path):
+        img = _make_image(tmp_path / "my photo.png")
+        cli_obj = _make_cli()
+
+        with patch("cli._cprint"):
+            cli_obj._handle_image_command(f'/image "{img}"')
+
+        assert cli_obj._attached_images == [img]
+
+    def test_handle_image_command_rejects_non_image_file(self, tmp_path):
+        file_path = tmp_path / "notes.txt"
+        file_path.write_text("hello\n", encoding="utf-8")
+        cli_obj = _make_cli()
+
+        with patch("cli._cprint") as mock_print:
+            cli_obj._handle_image_command(f"/image {file_path}")
+
+        assert cli_obj._attached_images == []
+        rendered = " ".join(str(arg) for call in mock_print.call_args_list for arg in call.args)
+        assert "Not a supported image file" in rendered
+
+
+class TestCollectQueryImages:
+    def test_collect_query_images_accepts_explicit_image_arg(self, tmp_path):
+        img = _make_image(tmp_path / "diagram.png")
+
+        message, images = _collect_query_images("describe this", str(img))
+
+        assert message == "describe this"
+        assert images == [img]
+
+    def test_collect_query_images_extracts_leading_path(self, tmp_path):
+        img = _make_image(tmp_path / "camera.png")
+
+        message, images = _collect_query_images(f"{img} what do you see?")
+
+        assert message == "what do you see?"
+        assert images == [img]
+
+    def test_collect_query_images_supports_tilde_paths(self, tmp_path, monkeypatch):
+        home = tmp_path / "home"
+        img = _make_image(home / "storage" / "shared" / "Pictures" / "cat.png")
+        monkeypatch.setenv("HOME", str(home))
+
+        message, images = _collect_query_images("describe this", "~/storage/shared/Pictures/cat.png")
+
+        assert message == "describe this"
+        assert images == [img]
+
+
+class TestTermuxImageHints:
+    def test_termux_example_image_path_prefers_real_shared_storage_root(self, monkeypatch):
+        existing = {"/sdcard", "/storage/emulated/0"}
+        monkeypatch.setattr("cli.os.path.isdir", lambda path: path in existing)
+
+        hint = _termux_example_image_path()
+
+        assert hint == "/sdcard/Pictures/cat.png"
+
+
+class TestImageBadgeFormatting:
+    def test_compact_badges_use_filename_on_narrow_terminals(self, tmp_path):
+        img = _make_image(tmp_path / "Screenshot 2026-04-09 at 11.22.33 AM.png")
+
+        badges = _format_image_attachment_badges([img], image_counter=1, width=40)
+
+        assert badges.startswith("[📎 ")
+        assert "Image #1" not in badges
+
+    def test_compact_badges_summarize_multiple_images(self, tmp_path):
+        img1 = _make_image(tmp_path / "one.png")
+        img2 = _make_image(tmp_path / "two.png")
+
+        badges = _format_image_attachment_badges([img1, img2], image_counter=2, width=45)
+
+        assert badges == "[📎 2 images attached]"
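The tilde-path tests rely on the leading token being expanded before the file check. A rough sketch of that step, assuming a fixed suffix list and no handling of quoted or backslash-escaped spaces (the real `_collect_query_images` and `_detect_file_drop` cover more cases); `split_leading_image` is a hypothetical helper:

```python
import os
from pathlib import Path

# Assumed set of image suffixes, for illustration only.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def split_leading_image(text: str):
    """Split 'PATH rest of message' into (Path, remainder) if PATH is an image.

    Returns (None, text) when the first token is not an existing image file.
    """
    first, _, rest = text.partition(" ")
    candidate = Path(os.path.expanduser(first))  # expands a leading "~"
    if candidate.suffix.lower() in IMAGE_SUFFIXES and candidate.is_file():
        return candidate, rest.strip()
    return None, text
```

On POSIX systems `os.path.expanduser` reads `HOME`, which is why the tests above can redirect it with `monkeypatch.setenv("HOME", ...)`.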
@@ -49,6 +49,25 @@ class TestCliSkinPromptIntegration:
        set_active_skin("ares")
        assert cli._get_tui_prompt_fragments() == [("class:sudo-prompt", "🔑 ❯ ")]

+    def test_narrow_terminals_compact_voice_prompt_fragments(self):
+        cli = _make_cli_stub()
+        cli._voice_mode = True
+
+        with patch.object(HermesCLI, "_get_tui_terminal_width", return_value=50):
+            assert cli._get_tui_prompt_fragments() == [("class:voice-prompt", "🎤 ")]
+
+    def test_narrow_terminals_compact_voice_recording_prompt_fragments(self):
+        cli = _make_cli_stub()
+        cli._voice_recording = True
+        cli._voice_recorder = SimpleNamespace(current_rms=3000)
+
+        with patch.object(HermesCLI, "_get_tui_terminal_width", return_value=50):
+            frags = cli._get_tui_prompt_fragments()
+
+        assert frags[0][0] == "class:voice-recording"
+        assert frags[0][1].startswith("●")
+        assert "❯" not in frags[0][1]
+
    def test_icon_only_skin_symbol_still_visible_in_special_states(self):
        cli = _make_cli_stub()
        cli._secret_state = {"response_queue": object()}
@@ -206,6 +206,59 @@ class TestCLIStatusBar:
        assert "⚕" in text
        assert "claude-sonnet-4-20250514" in text

+    def test_minimal_tui_chrome_threshold(self):
+        cli_obj = _make_cli()
+
+        assert cli_obj._use_minimal_tui_chrome(width=63) is True
+        assert cli_obj._use_minimal_tui_chrome(width=64) is False
+
+    def test_bottom_input_rule_hides_on_narrow_terminals(self):
+        cli_obj = _make_cli()
+
+        assert cli_obj._tui_input_rule_height("top", width=50) == 1
+        assert cli_obj._tui_input_rule_height("bottom", width=50) == 0
+        assert cli_obj._tui_input_rule_height("bottom", width=90) == 1
+
+    def test_agent_spacer_reclaimed_on_narrow_terminals(self):
+        cli_obj = _make_cli()
+        cli_obj._agent_running = True
+
+        assert cli_obj._agent_spacer_height(width=50) == 0
+        assert cli_obj._agent_spacer_height(width=90) == 1
+        cli_obj._agent_running = False
+        assert cli_obj._agent_spacer_height(width=90) == 0
+
+    def test_spinner_line_hidden_on_narrow_terminals(self):
+        cli_obj = _make_cli()
+        cli_obj._spinner_text = "thinking"
+
+        assert cli_obj._spinner_widget_height(width=50) == 0
+        assert cli_obj._spinner_widget_height(width=90) == 1
+        cli_obj._spinner_text = ""
+        assert cli_obj._spinner_widget_height(width=90) == 0
+
+    def test_voice_status_bar_compacts_on_narrow_terminals(self):
+        cli_obj = _make_cli()
+        cli_obj._voice_mode = True
+        cli_obj._voice_recording = False
+        cli_obj._voice_processing = False
+        cli_obj._voice_tts = True
+        cli_obj._voice_continuous = True
+
+        fragments = cli_obj._get_voice_status_fragments(width=50)
+
+        assert fragments == [("class:voice-status", " 🎤 Ctrl+B ")]
+
+    def test_voice_recording_status_bar_compacts_on_narrow_terminals(self):
+        cli_obj = _make_cli()
+        cli_obj._voice_mode = True
+        cli_obj._voice_recording = True
+        cli_obj._voice_processing = False
+
+        fragments = cli_obj._get_voice_status_fragments(width=50)
+
+        assert fragments == [("class:voice-status-recording", " ● REC ")]
+

 class TestCLIUsageReport:
    def test_show_usage_includes_estimated_cost(self, capsys):
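The narrow-terminal tests above all hinge on a single width threshold. A minimal sketch of how the height helpers could share it, assuming the cutoff is 64 columns as the threshold test implies; the function names are hypothetical simplifications of the `HermesCLI` methods:

```python
MINIMAL_CHROME_WIDTH = 64  # assumed: width 63 is minimal, width 64 is not


def use_minimal_chrome(width: int) -> bool:
    """True when the terminal is too narrow for the full TUI chrome."""
    return width < MINIMAL_CHROME_WIDTH


def input_rule_height(position: str, width: int) -> int:
    """The top rule is always drawn; the bottom rule is dropped when narrow."""
    if position == "bottom" and use_minimal_chrome(width):
        return 0
    return 1


def spinner_height(spinner_text: str, width: int) -> int:
    """The spinner line needs both text to show and enough width."""
    if not spinner_text or use_minimal_chrome(width):
        return 0
    return 1
```

Routing every widget through one predicate keeps the cutoff consistent, so changing the threshold cannot leave some widgets compact and others not.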
@@ -0,0 +1,413 @@
+"""Tests for the /fast CLI command and service-tier config handling."""
+
+import unittest
+from types import SimpleNamespace
+from unittest.mock import MagicMock, patch
+
+
+def _import_cli():
+    import hermes_cli.config as config_mod
+
+    if not hasattr(config_mod, "save_env_value_secure"):
+        config_mod.save_env_value_secure = lambda key, value: {
+            "success": True,
+            "stored_as": key,
+            "validated": False,
+        }
+
+    import cli as cli_mod
+
+    return cli_mod
+
+
+class TestParseServiceTierConfig(unittest.TestCase):
+    def _parse(self, raw):
+        cli_mod = _import_cli()
+        return cli_mod._parse_service_tier_config(raw)
+
+    def test_fast_maps_to_priority(self):
+        self.assertEqual(self._parse("fast"), "priority")
+        self.assertEqual(self._parse("priority"), "priority")
+
+    def test_normal_disables_service_tier(self):
+        self.assertIsNone(self._parse("normal"))
+        self.assertIsNone(self._parse("off"))
+        self.assertIsNone(self._parse(""))
+
+
+class TestHandleFastCommand(unittest.TestCase):
+    def _make_cli(self, service_tier=None):
+        return SimpleNamespace(
+            service_tier=service_tier,
+            provider="openai-codex",
+            requested_provider="openai-codex",
+            model="gpt-5.4",
+            _fast_command_available=lambda: True,
+            agent=MagicMock(),
+        )
+
+    def test_no_args_shows_status(self):
+        cli_mod = _import_cli()
+        stub = self._make_cli(service_tier=None)
+        with (
+            patch.object(cli_mod, "_cprint") as mock_cprint,
+            patch.object(cli_mod, "save_config_value") as mock_save,
+        ):
+            cli_mod.HermesCLI._handle_fast_command(stub, "/fast")
+
+        # Bare /fast shows status, does not change config
+        mock_save.assert_not_called()
+        # Should have printed the status line
+        printed = " ".join(str(c) for c in mock_cprint.call_args_list)
+        self.assertIn("normal", printed)
+
+    def test_no_args_shows_fast_when_enabled(self):
+        cli_mod = _import_cli()
+        stub = self._make_cli(service_tier="priority")
+        with (
+            patch.object(cli_mod, "_cprint") as mock_cprint,
+            patch.object(cli_mod, "save_config_value") as mock_save,
+        ):
+            cli_mod.HermesCLI._handle_fast_command(stub, "/fast")
+
+        mock_save.assert_not_called()
+        printed = " ".join(str(c) for c in mock_cprint.call_args_list)
+        self.assertIn("fast", printed)
+
+    def test_normal_argument_clears_service_tier(self):
+        cli_mod = _import_cli()
+        stub = self._make_cli(service_tier="priority")
+        with (
+            patch.object(cli_mod, "_cprint"),
+            patch.object(cli_mod, "save_config_value", return_value=True) as mock_save,
+        ):
+            cli_mod.HermesCLI._handle_fast_command(stub, "/fast normal")
+
+        mock_save.assert_called_once_with("agent.service_tier", "normal")
+        self.assertIsNone(stub.service_tier)
+        self.assertIsNone(stub.agent)
+
+    def test_unsupported_model_does_not_expose_fast(self):
+        cli_mod = _import_cli()
+        stub = SimpleNamespace(
+            service_tier=None,
+            provider="openai-codex",
+            requested_provider="openai-codex",
+            model="gpt-5.3-codex",
+            _fast_command_available=lambda: False,
+            agent=MagicMock(),
+        )
+
+        with (
+            patch.object(cli_mod, "_cprint") as mock_cprint,
+            patch.object(cli_mod, "save_config_value") as mock_save,
+        ):
+            cli_mod.HermesCLI._handle_fast_command(stub, "/fast")
+
+        mock_save.assert_not_called()
+        self.assertTrue(mock_cprint.called)
+
+
+class TestPriorityProcessingModels(unittest.TestCase):
+    """Verify the expanded Priority Processing model registry."""
+
+    def test_all_documented_models_supported(self):
+        from hermes_cli.models import model_supports_fast_mode
+
+        # All models from OpenAI's Priority Processing pricing table
+        supported = [
+            "gpt-5.4", "gpt-5.4-mini", "gpt-5.2",
+            "gpt-5.1", "gpt-5", "gpt-5-mini",
+            "gpt-4.1", "gpt-4.1-mini", "gpt-4.1-nano",
+            "gpt-4o", "gpt-4o-mini",
+            "o3", "o4-mini",
+        ]
+        for model in supported:
+            assert model_supports_fast_mode(model), f"{model} should support fast mode"
+
+    def test_vendor_prefix_stripped(self):
+        from hermes_cli.models import model_supports_fast_mode
+
+        assert model_supports_fast_mode("openai/gpt-5.4") is True
+        assert model_supports_fast_mode("openai/gpt-4.1") is True
+        assert model_supports_fast_mode("openai/o3") is True
+
+    def test_non_priority_models_rejected(self):
+        from hermes_cli.models import model_supports_fast_mode
+
+        assert model_supports_fast_mode("gpt-5.3-codex") is False
+        assert model_supports_fast_mode("claude-sonnet-4") is False
+        assert model_supports_fast_mode("") is False
+        assert model_supports_fast_mode(None) is False
+
+    def test_resolve_overrides_returns_service_tier(self):
+        from hermes_cli.models import resolve_fast_mode_overrides
+
+        result = resolve_fast_mode_overrides("gpt-5.4")
+        assert result == {"service_tier": "priority"}
+
+        result = resolve_fast_mode_overrides("gpt-4.1")
+        assert result == {"service_tier": "priority"}
+
+    def test_resolve_overrides_none_for_unsupported(self):
+        from hermes_cli.models import resolve_fast_mode_overrides
+
+        assert resolve_fast_mode_overrides("gpt-5.3-codex") is None
+        assert resolve_fast_mode_overrides("claude-sonnet-4") is None
+
+
+class TestFastModeRouting(unittest.TestCase):
+    def test_fast_command_exposed_for_model_even_when_provider_is_auto(self):
+        cli_mod = _import_cli()
+        stub = SimpleNamespace(provider="auto", requested_provider="auto", model="gpt-5.4", agent=None)
+
+        assert cli_mod.HermesCLI._fast_command_available(stub) is True
+
+    def test_fast_command_exposed_for_non_codex_models(self):
+        cli_mod = _import_cli()
+        stub = SimpleNamespace(provider="openai", requested_provider="openai", model="gpt-4.1", agent=None)
+        assert cli_mod.HermesCLI._fast_command_available(stub) is True
+
+        stub = SimpleNamespace(provider="openrouter", requested_provider="openrouter", model="o3", agent=None)
+        assert cli_mod.HermesCLI._fast_command_available(stub) is True
+
+    def test_turn_route_injects_overrides_without_provider_switch(self):
+        """Fast mode should add request_overrides but NOT change the provider/runtime."""
+        cli_mod = _import_cli()
+        stub = SimpleNamespace(
+            model="gpt-5.4",
+            api_key="primary-key",
+            base_url="https://openrouter.ai/api/v1",
+            provider="openrouter",
+            api_mode="chat_completions",
+            acp_command=None,
+            acp_args=[],
+            _credential_pool=None,
+            _smart_model_routing={},
+            service_tier="priority",
+        )
+
+        original_runtime = {
+            "api_key": "***",
+            "base_url": "https://openrouter.ai/api/v1",
+            "provider": "openrouter",
+            "api_mode": "chat_completions",
+            "command": None,
+            "args": [],
+            "credential_pool": None,
+        }
+
+        with patch("agent.smart_model_routing.resolve_turn_route", return_value={
+            "model": "gpt-5.4",
+            "runtime": dict(original_runtime),
+            "label": None,
+            "signature": ("gpt-5.4", "openrouter", "https://openrouter.ai/api/v1", "chat_completions", None, ()),
+        }):
+            route = cli_mod.HermesCLI._resolve_turn_agent_config(stub, "hi")
+
+        # Provider should NOT have changed
+        assert route["runtime"]["provider"] == "openrouter"
+        assert route["runtime"]["api_mode"] == "chat_completions"
+        # But request_overrides should be set
+        assert route["request_overrides"] == {"service_tier": "priority"}
+
+    def test_turn_route_keeps_primary_runtime_when_model_has_no_fast_backend(self):
+        cli_mod = _import_cli()
+        stub = SimpleNamespace(
+            model="gpt-5.3-codex",
+            api_key="primary-key",
+            base_url="https://openrouter.ai/api/v1",
+            provider="openrouter",
+            api_mode="chat_completions",
+            acp_command=None,
+            acp_args=[],
+            _credential_pool=None,
+            _smart_model_routing={},
+            service_tier="priority",
+        )
+
+        primary_route = {
+            "model": "gpt-5.3-codex",
+            "runtime": {
+                "api_key": "***",
+                "base_url": "https://openrouter.ai/api/v1",
+                "provider": "openrouter",
+                "api_mode": "chat_completions",
+                "command": None,
+                "args": [],
+                "credential_pool": None,
+            },
+            "label": None,
+            "signature": ("gpt-5.3-codex", "openrouter", "https://openrouter.ai/api/v1", "chat_completions", None, ()),
+        }
+        with patch("agent.smart_model_routing.resolve_turn_route", return_value=primary_route):
+            route = cli_mod.HermesCLI._resolve_turn_agent_config(stub, "hi")
+
+        assert route["runtime"]["provider"] == "openrouter"
+        assert route.get("request_overrides") is None
+
+
+class TestAnthropicFastMode(unittest.TestCase):
+    """Verify Anthropic Fast Mode model support and override resolution."""
+
+    def test_anthropic_opus_supported(self):
+        from hermes_cli.models import model_supports_fast_mode
+
+        # Native Anthropic format (hyphens)
+        assert model_supports_fast_mode("claude-opus-4-6") is True
+        # OpenRouter format (dots)
+        assert model_supports_fast_mode("claude-opus-4.6") is True
+        # With vendor prefix
+        assert model_supports_fast_mode("anthropic/claude-opus-4-6") is True
+        assert model_supports_fast_mode("anthropic/claude-opus-4.6") is True
+
+    def test_anthropic_non_opus_rejected(self):
+        from hermes_cli.models import model_supports_fast_mode
+
+        assert model_supports_fast_mode("claude-sonnet-4-6") is False
+        assert model_supports_fast_mode("claude-sonnet-4.6") is False
+        assert model_supports_fast_mode("claude-haiku-4-5") is False
+        assert model_supports_fast_mode("anthropic/claude-sonnet-4.6") is False
+
+    def test_anthropic_variant_tags_stripped(self):
+        from hermes_cli.models import model_supports_fast_mode
+
+        # OpenRouter variant tags (the suffix after the colon) should be stripped before matching
+        assert model_supports_fast_mode("claude-opus-4.6:fast") is True
+        assert model_supports_fast_mode("claude-opus-4.6:beta") is True
+
+    def test_resolve_overrides_returns_speed_for_anthropic(self):
+        from hermes_cli.models import resolve_fast_mode_overrides
+
+        result = resolve_fast_mode_overrides("claude-opus-4-6")
+        assert result == {"speed": "fast"}
+
+        result = resolve_fast_mode_overrides("anthropic/claude-opus-4.6")
+        assert result == {"speed": "fast"}
+
+    def test_resolve_overrides_returns_service_tier_for_openai(self):
+        """OpenAI models should still get service_tier, not speed."""
+        from hermes_cli.models import resolve_fast_mode_overrides
+
+        result = resolve_fast_mode_overrides("gpt-5.4")
+        assert result == {"service_tier": "priority"}
+
+    def test_is_anthropic_fast_model(self):
+        from hermes_cli.models import _is_anthropic_fast_model
+
+        assert _is_anthropic_fast_model("claude-opus-4-6") is True
+        assert _is_anthropic_fast_model("claude-opus-4.6") is True
+        assert _is_anthropic_fast_model("anthropic/claude-opus-4-6") is True
+        assert _is_anthropic_fast_model("gpt-5.4") is False
+        assert _is_anthropic_fast_model("claude-sonnet-4-6") is False
+
+    def test_fast_command_exposed_for_anthropic_model(self):
+        cli_mod = _import_cli()
+        stub = SimpleNamespace(
+            provider="anthropic", requested_provider="anthropic",
+            model="claude-opus-4-6", agent=None,
+        )
+        assert cli_mod.HermesCLI._fast_command_available(stub) is True
+
+    def test_fast_command_hidden_for_anthropic_sonnet(self):
+        cli_mod = _import_cli()
+        stub = SimpleNamespace(
+            provider="anthropic", requested_provider="anthropic",
+            model="claude-sonnet-4-6", agent=None,
+        )
+        assert cli_mod.HermesCLI._fast_command_available(stub) is False
+
+    def test_turn_route_injects_speed_for_anthropic(self):
+        """Anthropic models should get speed:'fast' override, not service_tier."""
+        cli_mod = _import_cli()
+        stub = SimpleNamespace(
+            model="claude-opus-4-6",
+            api_key="sk-ant-test",
+            base_url="https://api.anthropic.com",
+            provider="anthropic",
+            api_mode="anthropic_messages",
+            acp_command=None,
+            acp_args=[],
+            _credential_pool=None,
+            _smart_model_routing={},
+            service_tier="priority",
+        )
+
+        original_runtime = {
+            "api_key": "***",
+            "base_url": "https://api.anthropic.com",
+            "provider": "anthropic",
+            "api_mode": "anthropic_messages",
+            "command": None,
+            "args": [],
+            "credential_pool": None,
+        }
+
+        with patch("agent.smart_model_routing.resolve_turn_route", return_value={
+            "model": "claude-opus-4-6",
+            "runtime": dict(original_runtime),
+            "label": None,
+            "signature": ("claude-opus-4-6", "anthropic", "https://api.anthropic.com", "anthropic_messages", None, ()),
+        }):
+            route = cli_mod.HermesCLI._resolve_turn_agent_config(stub, "hi")
+
+        assert route["runtime"]["provider"] == "anthropic"
+        assert route["request_overrides"] == {"speed": "fast"}
+
+
+class TestAnthropicFastModeAdapter(unittest.TestCase):
+    """Verify build_anthropic_kwargs handles fast_mode parameter."""
+
+    def test_fast_mode_adds_speed_and_beta(self):
+        from agent.anthropic_adapter import build_anthropic_kwargs, _FAST_MODE_BETA
+
+        kwargs = build_anthropic_kwargs(
+            model="claude-opus-4-6",
+            messages=[{"role": "user", "content": [{"type": "text", "text": "hi"}]}],
+            tools=None,
+            max_tokens=None,
+            reasoning_config=None,
+            fast_mode=True,
+        )
+        assert kwargs.get("speed") == "fast"
+        assert "extra_headers" in kwargs
+        assert _FAST_MODE_BETA in kwargs["extra_headers"].get("anthropic-beta", "")
+
+    def test_fast_mode_off_no_speed(self):
+        from agent.anthropic_adapter import build_anthropic_kwargs
+
+        kwargs = build_anthropic_kwargs(
+            model="claude-opus-4-6",
+            messages=[{"role": "user", "content": [{"type": "text", "text": "hi"}]}],
+            tools=None,
+            max_tokens=None,
+            reasoning_config=None,
+            fast_mode=False,
+        )
+        assert "speed" not in kwargs
+        assert "extra_headers" not in kwargs
+
+    def test_fast_mode_skipped_for_third_party_endpoint(self):
+        from agent.anthropic_adapter import build_anthropic_kwargs
+
+        kwargs = build_anthropic_kwargs(
+            model="claude-opus-4-6",
+            messages=[{"role": "user", "content": [{"type": "text", "text": "hi"}]}],
+            tools=None,
+            max_tokens=None,
+            reasoning_config=None,
+            fast_mode=True,
+            base_url="https://api.minimax.io/anthropic/v1",
+        )
+        # Third-party endpoints should get neither the speed field nor the fast-mode beta header
+        assert "speed" not in kwargs
+        assert "extra_headers" not in kwargs
+
+
+class TestConfigDefault(unittest.TestCase):
+    def test_default_config_has_service_tier(self):
+        from hermes_cli.config import DEFAULT_CONFIG
+
+        agent = DEFAULT_CONFIG.get("agent", {})
+        self.assertIn("service_tier", agent)
+        self.assertEqual(agent["service_tier"], "")
@@ -0,0 +1,138 @@
+"""Tests for _stream_delta's handling of <think> tags in prose vs real reasoning blocks."""
+import sys
+import os
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", ".."))
+
+import pytest
+
+
+def _make_cli_stub():
+    """Create a minimal HermesCLI-like object with stream state."""
+    from cli import HermesCLI
+
+    cli = HermesCLI.__new__(HermesCLI)
+    cli.show_reasoning = False
+    cli._stream_buf = ""
+    cli._stream_started = False
+    cli._stream_box_opened = False
+    cli._stream_prefilt = ""
+    cli._in_reasoning_block = False
+    cli._reasoning_stream_started = False
+    cli._reasoning_box_opened = False
+    cli._reasoning_buf = ""
+    cli._reasoning_preview_buf = ""
+    cli._deferred_content = ""
+    cli._stream_text_ansi = ""
+    cli._stream_needs_break = False
+    cli._emitted = []
+
+    # Mock _emit_stream_text to capture output
+    def mock_emit(text):
+        cli._emitted.append(text)
+    cli._emit_stream_text = mock_emit
+
+    # Mock _stream_reasoning_delta
+    cli._reasoning_emitted = []
+    def mock_reasoning(text):
+        cli._reasoning_emitted.append(text)
+    cli._stream_reasoning_delta = mock_reasoning
+
+    return cli
+
+
+class TestThinkTagInProse:
+    """<think> mentioned in prose should NOT trigger reasoning suppression."""
+
+    def test_think_tag_mid_sentence(self):
+        """'(/think not producing <think> tags)' should pass through."""
+        cli = _make_cli_stub()
+        tokens = [
+            "  1. Fix reasoning mode in eval ",
+            "(/think not producing ",
+            "<think>",
+            " tags — ~2% gap)",
+            "\n  2. Launch production",
+        ]
+        for t in tokens:
+            cli._stream_delta(t)
+        assert not cli._in_reasoning_block, "<think> in prose should not enter reasoning block"
+        full = "".join(cli._emitted)
+        assert "<think>" in full, "The literal <think> tag should be in the emitted text"
+        assert "Launch production" in full
+
+    def test_think_tag_after_text_on_same_line(self):
+        """'some text <think>' should NOT trigger reasoning."""
+        cli = _make_cli_stub()
+        cli._stream_delta("Here is the <think> tag explanation")
+        assert not cli._in_reasoning_block
+        full = "".join(cli._emitted)
+        assert "<think>" in full
+
+    def test_think_tag_in_backticks(self):
+        """'`<think>`' should NOT trigger reasoning."""
+        cli = _make_cli_stub()
+        cli._stream_delta("Use the `<think>` tag for reasoning")
+        assert not cli._in_reasoning_block
+
+
+class TestRealReasoningBlock:
+    """Real <think> tags at block boundaries should still be caught."""
+
+    def test_think_at_start_of_stream(self):
+        """'<think>reasoning</think>answer' should suppress reasoning."""
+        cli = _make_cli_stub()
+        cli._stream_delta("<think>")
+        assert cli._in_reasoning_block
+        cli._stream_delta("I need to analyze this")
+        cli._stream_delta("</think>")
+        assert not cli._in_reasoning_block
+        cli._stream_delta("Here is my answer")
+        full = "".join(cli._emitted)
+        assert "Here is my answer" in full
+        assert "I need to analyze" not in full  # reasoning was suppressed
+
+    def test_think_after_newline(self):
+        """'text\\n<think>' should trigger reasoning block."""
+        cli = _make_cli_stub()
+        cli._stream_delta("Some preamble\n<think>")
+        assert cli._in_reasoning_block
+        full = "".join(cli._emitted)
+        assert "Some preamble" in full
+
+    def test_think_after_newline_with_whitespace(self):
+        """'text\\n  <think>' should trigger reasoning block."""
+        cli = _make_cli_stub()
+        cli._stream_delta("Some preamble\n  <think>")
+        assert cli._in_reasoning_block
+
+    def test_think_with_only_whitespace_before(self):
+        """'   <think>' (whitespace only prefix) should trigger."""
+        cli = _make_cli_stub()
+        cli._stream_delta("   <think>")
+        assert cli._in_reasoning_block
+
+
+class TestFlushRecovery:
+    """_flush_stream should recover content from false-positive reasoning blocks."""
+
+    def test_flush_recovers_buffered_content(self):
+        """If somehow in reasoning block at flush, content is recovered."""
+        cli = _make_cli_stub()
+        # Manually set up a false-positive state
+        cli._in_reasoning_block = True
+        cli._stream_prefilt = " tags — ~2% gap)\n  2. Launch production"
+        cli._stream_box_opened = True
+
+        # Stub _close_reasoning_box with a no-op; box output itself is patched below
+        cli._close_reasoning_box = lambda: None
+
+        # Call flush
+        from unittest.mock import patch
+        import shutil
+        with patch.object(shutil, "get_terminal_size", return_value=os.terminal_size((80, 24))):
+            with patch("cli._cprint"):
+                cli._flush_stream()
+
+        assert not cli._in_reasoning_block
+        full = "".join(cli._emitted)
+        assert "Launch production" in full
@@ -294,6 +294,40 @@ class TestModelsEndpoint:
            assert data["data"][0]["id"] == "hermes-agent"
            assert data["data"][0]["owned_by"] == "hermes"

+    @pytest.mark.asyncio
+    async def test_models_returns_profile_name(self):
+        """When running under a named profile, /v1/models advertises the profile name."""
+        with patch("gateway.platforms.api_server.APIServerAdapter._resolve_model_name", return_value="lucas"):
+            adapter = _make_adapter()
+        app = _create_app(adapter)
+        async with TestClient(TestServer(app)) as cli:
+            resp = await cli.get("/v1/models")
+            assert resp.status == 200
+            data = await resp.json()
+            assert data["data"][0]["id"] == "lucas"
+            assert data["data"][0]["root"] == "lucas"
+
+    @pytest.mark.asyncio
+    async def test_models_returns_explicit_model_name(self):
+        """Explicit model_name in config overrides profile name."""
+        extra = {"model_name": "my-custom-agent"}
+        config = PlatformConfig(enabled=True, extra=extra)
+        adapter = APIServerAdapter(config)
+        assert adapter._model_name == "my-custom-agent"
+
+    def test_resolve_model_name_explicit(self):
+        assert APIServerAdapter._resolve_model_name("my-bot") == "my-bot"
+
+    def test_resolve_model_name_default_profile(self):
+        """Default profile falls back to 'hermes-agent'."""
+        with patch("hermes_cli.profiles.get_active_profile_name", return_value="default"):
+            assert APIServerAdapter._resolve_model_name("") == "hermes-agent"
+
+    def test_resolve_model_name_named_profile(self):
+        """Named profile uses the profile name as model name."""
+        with patch("hermes_cli.profiles.get_active_profile_name", return_value="lucas"):
+            assert APIServerAdapter._resolve_model_name("") == "lucas"
+
    @pytest.mark.asyncio
    async def test_models_requires_auth(self, auth_adapter):
        app = _create_app(auth_adapter)
@@ -81,6 +81,7 @@ def adapter(monkeypatch):
    config = PlatformConfig(enabled=True, token="fake-token")
    adapter = DiscordAdapter(config)
    adapter._client = SimpleNamespace(user=SimpleNamespace(id=999))
+    adapter._text_batch_delay_seconds = 0  # disable batching for tests
    adapter.handle_message = AsyncMock()
    return adapter

@@ -91,6 +91,7 @@ def adapter(monkeypatch):
    config = PlatformConfig(enabled=True, token="fake-token")
    adapter = DiscordAdapter(config)
    adapter._client = SimpleNamespace(user=SimpleNamespace(id=999))
+    adapter._text_batch_delay_seconds = 0  # disable batching for tests
    adapter.handle_message = AsyncMock()
    return adapter

@@ -62,6 +62,7 @@ def adapter():
        fetch_channel=AsyncMock(),
        user=SimpleNamespace(id=99999, name="HermesBot"),
    )
+    adapter._text_batch_delay_seconds = 0  # disable batching for tests
    return adapter


@@ -44,6 +44,7 @@ def _make_adapter(tmp_path=None):
        },
    )
    adapter = MatrixAdapter(config)
+    adapter._text_batch_delay_seconds = 0  # disable batching for tests
    adapter.handle_message = AsyncMock()
    adapter._startup_ts = time.time() - 10  # avoid startup grace filter
    return adapter
@@ -0,0 +1,245 @@
+"""Tests that gateway /model switch persists across messages.
+
+The gateway /model command stores session overrides in
+``_session_model_overrides``.  These must:
+
+1. Be applied in ``run_sync()`` so the next agent uses the switched model.
+2. Not be mistaken for fallback activation (which evicts the cached agent).
+3. Survive across multiple messages until /reset clears them.
+
+Tests exercise the real ``_apply_session_model_override()`` and
+``_is_intentional_model_switch()`` methods on ``GatewayRunner``.
+"""
+
+from datetime import datetime
+from types import SimpleNamespace
+from unittest.mock import AsyncMock, MagicMock
+
+import pytest
+
+from gateway.config import GatewayConfig, Platform, PlatformConfig
+from gateway.session import SessionEntry, SessionSource, build_session_key
+
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+
+def _make_source() -> SessionSource:
+    return SessionSource(
+        platform=Platform.TELEGRAM,
+        user_id="u1",
+        chat_id="c1",
+        user_name="tester",
+        chat_type="dm",
+    )
+
+
+def _make_runner():
+    """Create a minimal GatewayRunner with stubbed internals."""
+    from gateway.run import GatewayRunner
+
+    runner = object.__new__(GatewayRunner)
+    runner.config = GatewayConfig(
+        platforms={Platform.TELEGRAM: PlatformConfig(enabled=True, token="tok")}
+    )
+    adapter = MagicMock()
+    adapter.send = AsyncMock()
+    runner.adapters = {Platform.TELEGRAM: adapter}
+    runner._voice_mode = {}
+    runner.hooks = SimpleNamespace(emit=AsyncMock(), loaded_hooks=False)
+    runner._session_model_overrides = {}
+    runner._pending_model_notes = {}
+    runner._background_tasks = set()
+    runner._running_agents = {}
+    runner._pending_messages = {}
+    runner._pending_approvals = {}
+    runner._session_db = None
+    runner._agent_cache = {}
+    runner._agent_cache_lock = None
+    runner._effective_model = None
+    runner._effective_provider = None
+    runner.session_store = MagicMock()
+    session_key = build_session_key(_make_source())
+    session_entry = SessionEntry(
+        session_key=session_key,
+        session_id="sess-1",
+        created_at=datetime.now(),
+        updated_at=datetime.now(),
+        platform=Platform.TELEGRAM,
+        chat_type="dm",
+    )
+    runner.session_store.get_or_create_session.return_value = session_entry
+    runner.session_store._entries = {session_key: session_entry}
+    return runner
+
+
+# ---------------------------------------------------------------------------
+# Tests: _apply_session_model_override
+# ---------------------------------------------------------------------------
+
+
+class TestApplySessionModelOverride:
+    """Verify _apply_session_model_override replaces config defaults."""
+
+    def test_override_replaces_all_fields(self):
+        runner = _make_runner()
+        sk = build_session_key(_make_source())
+
+        runner._session_model_overrides[sk] = {
+            "model": "gpt-5.4-turbo",
+            "provider": "openrouter",
+            "api_key": "or-key-123",
+            "base_url": "https://openrouter.ai/api/v1",
+            "api_mode": "chat_completions",
+        }
+
+        model, rt = runner._apply_session_model_override(
+            sk,
+            "anthropic/claude-sonnet-4",
+            {"provider": "anthropic", "api_key": "ant-key", "base_url": "https://api.anthropic.com", "api_mode": "anthropic_messages"},
+        )
+
+        assert model == "gpt-5.4-turbo"
+        assert rt["provider"] == "openrouter"
+        assert rt["api_key"] == "or-key-123"
+        assert rt["base_url"] == "https://openrouter.ai/api/v1"
+        assert rt["api_mode"] == "chat_completions"
+
+    def test_no_override_returns_originals(self):
+        runner = _make_runner()
+        sk = build_session_key(_make_source())
+
+        orig_model = "anthropic/claude-sonnet-4"
+        orig_rt = {"provider": "anthropic", "api_key": "key", "base_url": "https://api.anthropic.com", "api_mode": "anthropic_messages"}
+
+        model, rt = runner._apply_session_model_override(sk, orig_model, dict(orig_rt))
+
+        assert model == orig_model
+        assert rt == orig_rt
+
+    def test_none_values_do_not_overwrite(self):
+        """Override with None api_key/base_url should preserve config defaults."""
+        runner = _make_runner()
+        sk = build_session_key(_make_source())
+
+        runner._session_model_overrides[sk] = {
+            "model": "gpt-5.4",
+            "provider": "openai",
+            "api_key": None,
+            "base_url": None,
+            "api_mode": "chat_completions",
+        }
+
+        model, rt = runner._apply_session_model_override(
+            sk,
+            "anthropic/claude-sonnet-4",
+            {"provider": "anthropic", "api_key": "ant-key", "base_url": "https://api.anthropic.com", "api_mode": "anthropic_messages"},
+        )
+
+        assert model == "gpt-5.4"
+        assert rt["provider"] == "openai"
+        assert rt["api_key"] == "ant-key"  # preserved — None didn't overwrite
+        assert rt["base_url"] == "https://api.anthropic.com"  # preserved
+        assert rt["api_mode"] == "chat_completions"  # overwritten (not None)
+
+    def test_empty_string_overwrites(self):
+        """Empty string is not None — it should overwrite the config value."""
+        runner = _make_runner()
+        sk = build_session_key(_make_source())
+
+        runner._session_model_overrides[sk] = {
+            "model": "local-model",
+            "provider": "custom",
+            "api_key": "local-key",
+            "base_url": "",
+            "api_mode": "chat_completions",
+        }
+
+        _, rt = runner._apply_session_model_override(
+            sk,
+            "anthropic/claude-sonnet-4",
+            {"provider": "anthropic", "api_key": "ant-key", "base_url": "https://api.anthropic.com", "api_mode": "anthropic_messages"},
+        )
+
+        assert rt["base_url"] == ""  # empty string overwrites
+
+    def test_different_session_key_not_affected(self):
+        runner = _make_runner()
+        sk = build_session_key(_make_source())
+        other_sk = "other_session"
+
+        runner._session_model_overrides[other_sk] = {
+            "model": "gpt-5.4",
+            "provider": "openai",
+            "api_key": "key",
+            "base_url": "",
+            "api_mode": "chat_completions",
+        }
+
+        model, rt = runner._apply_session_model_override(
+            sk,
+            "anthropic/claude-sonnet-4",
+            {"provider": "anthropic", "api_key": "ant-key", "base_url": "url", "api_mode": "anthropic_messages"},
+        )
+
+        assert model == "anthropic/claude-sonnet-4"  # unchanged — wrong session key
+
+
+# ---------------------------------------------------------------------------
+# Tests: _is_intentional_model_switch
+# ---------------------------------------------------------------------------
+
+
+class TestIsIntentionalModelSwitch:
+    """Verify fallback detection respects intentional /model overrides."""
+
+    def test_matches_override(self):
+        runner = _make_runner()
+        sk = build_session_key(_make_source())
+
+        runner._session_model_overrides[sk] = {
+            "model": "gpt-5.4",
+            "provider": "openai",
+            "api_key": "key",
+            "base_url": "",
+            "api_mode": "chat_completions",
+        }
+
+        assert runner._is_intentional_model_switch(sk, "gpt-5.4") is True
+
+    def test_no_override_returns_false(self):
+        runner = _make_runner()
+        sk = build_session_key(_make_source())
+
+        assert runner._is_intentional_model_switch(sk, "gpt-5.4") is False
+
+    def test_different_model_returns_false(self):
+        """Agent fell back to a different model than the override."""
+        runner = _make_runner()
+        sk = build_session_key(_make_source())
+
+        runner._session_model_overrides[sk] = {
+            "model": "gpt-5.4",
+            "provider": "openai",
+            "api_key": "key",
+            "base_url": "",
+            "api_mode": "chat_completions",
+        }
+
+        assert runner._is_intentional_model_switch(sk, "gpt-5.4-mini") is False
+
+    def test_wrong_session_key(self):
+        runner = _make_runner()
+        sk = build_session_key(_make_source())
+
+        runner._session_model_overrides["other_session"] = {
+            "model": "gpt-5.4",
+            "provider": "openai",
+            "api_key": "key",
+            "base_url": "",
+            "api_mode": "chat_completions",
+        }
+
+        assert runner._is_intentional_model_switch(sk, "gpt-5.4") is False
@@ -0,0 +1,448 @@
+"""Tests for text message batching across all gateway adapters.
+
+When a user sends a long message, the messaging client splits it at the
+platform's character limit.  Each adapter should buffer rapid successive
+text messages from the same session and aggregate them before dispatching.
+
+Covers: Discord, Matrix, WeCom, and the adaptive delay logic for
+Telegram and Feishu.
+"""
+
+import asyncio
+import os
+from unittest.mock import AsyncMock, MagicMock, patch
+
+import pytest
+
+from gateway.config import Platform, PlatformConfig
+from gateway.platforms.base import MessageEvent, MessageType, SessionSource
+
+
+# =====================================================================
+# Helpers
+# =====================================================================
+
+def _make_event(
+    text: str,
+    platform: Platform,
+    chat_id: str = "12345",
+    msg_type: MessageType = MessageType.TEXT,
+) -> MessageEvent:
+    return MessageEvent(
+        text=text,
+        message_type=msg_type,
+        source=SessionSource(platform=platform, chat_id=chat_id, chat_type="dm"),
+    )
+
+
+# =====================================================================
+# Discord text batching
+# =====================================================================
+
+def _make_discord_adapter():
+    """Create a minimal DiscordAdapter for testing text batching."""
+    from gateway.platforms.discord import DiscordAdapter
+
+    config = PlatformConfig(enabled=True, token="test-token")
+    adapter = object.__new__(DiscordAdapter)
+    adapter._platform = Platform.DISCORD
+    adapter.config = config
+    adapter._pending_text_batches = {}
+    adapter._pending_text_batch_tasks = {}
+    adapter._text_batch_delay_seconds = 0.1  # fast for tests
+    adapter._text_batch_split_delay_seconds = 0.3  # fast for tests
+    adapter._active_sessions = {}
+    adapter._pending_messages = {}
+    adapter._message_handler = AsyncMock()
+    adapter.handle_message = AsyncMock()
+    return adapter
+
+
+class TestDiscordTextBatching:
+    @pytest.mark.asyncio
+    async def test_single_message_dispatched_after_delay(self):
+        adapter = _make_discord_adapter()
+        event = _make_event("hello world", Platform.DISCORD)
+
+        adapter._enqueue_text_event(event)
+
+        # Not dispatched yet
+        adapter.handle_message.assert_not_called()
+
+        # Wait for flush
+        await asyncio.sleep(0.2)
+
+        adapter.handle_message.assert_called_once()
+        dispatched = adapter.handle_message.call_args[0][0]
+        assert dispatched.text == "hello world"
+
+    @pytest.mark.asyncio
+    async def test_split_messages_aggregated(self):
+        """Two rapid messages from the same chat should be merged."""
+        adapter = _make_discord_adapter()
+
+        adapter._enqueue_text_event(_make_event("Part one of a long", Platform.DISCORD))
+        await asyncio.sleep(0.02)
+        adapter._enqueue_text_event(_make_event("message that was split.", Platform.DISCORD))
+
+        adapter.handle_message.assert_not_called()
+
+        await asyncio.sleep(0.2)
+
+        adapter.handle_message.assert_called_once()
+        text = adapter.handle_message.call_args[0][0].text
+        assert "Part one" in text
+        assert "split" in text
+
+    @pytest.mark.asyncio
+    async def test_three_way_split_aggregated(self):
+        adapter = _make_discord_adapter()
+
+        adapter._enqueue_text_event(_make_event("chunk 1", Platform.DISCORD))
+        await asyncio.sleep(0.02)
+        adapter._enqueue_text_event(_make_event("chunk 2", Platform.DISCORD))
+        await asyncio.sleep(0.02)
+        adapter._enqueue_text_event(_make_event("chunk 3", Platform.DISCORD))
+
+        await asyncio.sleep(0.2)
+
+        adapter.handle_message.assert_called_once()
+        text = adapter.handle_message.call_args[0][0].text
+        assert "chunk 1" in text
+        assert "chunk 2" in text
+        assert "chunk 3" in text
+
+    @pytest.mark.asyncio
+    async def test_different_chats_not_merged(self):
+        adapter = _make_discord_adapter()
+
+        adapter._enqueue_text_event(_make_event("from A", Platform.DISCORD, chat_id="111"))
+        adapter._enqueue_text_event(_make_event("from B", Platform.DISCORD, chat_id="222"))
+
+        await asyncio.sleep(0.2)
+
+        assert adapter.handle_message.call_count == 2
+
+    @pytest.mark.asyncio
+    async def test_batch_cleans_up_after_flush(self):
+        adapter = _make_discord_adapter()
+
+        adapter._enqueue_text_event(_make_event("test", Platform.DISCORD))
+        await asyncio.sleep(0.2)
+
+        assert len(adapter._pending_text_batches) == 0
+
+    @pytest.mark.asyncio
+    async def test_adaptive_delay_for_near_limit_chunk(self):
+        """Chunks near the 2000-char limit should trigger longer delay."""
+        adapter = _make_discord_adapter()
+        # Simulate a chunk near Discord's 2000-char split point
+        long_text = "x" * 1950
+        adapter._enqueue_text_event(_make_event(long_text, Platform.DISCORD))
+
+        # Past the short delay (0.1s) the batch must NOT have flushed yet, because the split delay (0.3s) applies
+        await asyncio.sleep(0.15)
+        adapter.handle_message.assert_not_called()
+
+        # Once the split delay has elapsed, the batch should have been flushed
+        await asyncio.sleep(0.25)
+        adapter.handle_message.assert_called_once()
+
+
+# =====================================================================
+# Matrix text batching
+# =====================================================================
+
+def _make_matrix_adapter():
+    """Create a minimal MatrixAdapter for testing text batching."""
+    from gateway.platforms.matrix import MatrixAdapter
+
+    config = PlatformConfig(enabled=True, token="test-token")
+    adapter = object.__new__(MatrixAdapter)
+    adapter._platform = Platform.MATRIX
+    adapter.config = config
+    adapter._pending_text_batches = {}
+    adapter._pending_text_batch_tasks = {}
+    adapter._text_batch_delay_seconds = 0.1
+    adapter._text_batch_split_delay_seconds = 0.3
+    adapter._active_sessions = {}
+    adapter._pending_messages = {}
+    adapter._message_handler = AsyncMock()
+    adapter.handle_message = AsyncMock()
+    return adapter
+
+
+class TestMatrixTextBatching:
+    @pytest.mark.asyncio
+    async def test_single_message_dispatched_after_delay(self):
+        adapter = _make_matrix_adapter()
+        event = _make_event("hello world", Platform.MATRIX)
+
+        adapter._enqueue_text_event(event)
+
+        adapter.handle_message.assert_not_called()
+        await asyncio.sleep(0.2)
+
+        adapter.handle_message.assert_called_once()
+        assert adapter.handle_message.call_args[0][0].text == "hello world"
+
+    @pytest.mark.asyncio
+    async def test_split_messages_aggregated(self):
+        adapter = _make_matrix_adapter()
+
+        adapter._enqueue_text_event(_make_event("first part", Platform.MATRIX))
+        await asyncio.sleep(0.02)
+        adapter._enqueue_text_event(_make_event("second part", Platform.MATRIX))
+
+        adapter.handle_message.assert_not_called()
+        await asyncio.sleep(0.2)
+
+        adapter.handle_message.assert_called_once()
+        text = adapter.handle_message.call_args[0][0].text
+        assert "first part" in text
+        assert "second part" in text
+
+    @pytest.mark.asyncio
+    async def test_different_rooms_not_merged(self):
+        adapter = _make_matrix_adapter()
+
+        adapter._enqueue_text_event(_make_event("room A", Platform.MATRIX, chat_id="!aaa:matrix.org"))
+        adapter._enqueue_text_event(_make_event("room B", Platform.MATRIX, chat_id="!bbb:matrix.org"))
+
+        await asyncio.sleep(0.2)
+
+        assert adapter.handle_message.call_count == 2
+
+    @pytest.mark.asyncio
+    async def test_adaptive_delay_for_near_limit_chunk(self):
+        """Chunks near the 4000-char limit should trigger longer delay."""
+        adapter = _make_matrix_adapter()
+        long_text = "x" * 3950
+        adapter._enqueue_text_event(_make_event(long_text, Platform.MATRIX))
+
+        await asyncio.sleep(0.15)
+        adapter.handle_message.assert_not_called()
+
+        await asyncio.sleep(0.25)
+        adapter.handle_message.assert_called_once()
+
+    @pytest.mark.asyncio
+    async def test_batch_cleans_up_after_flush(self):
+        adapter = _make_matrix_adapter()
+        adapter._enqueue_text_event(_make_event("test", Platform.MATRIX))
+        await asyncio.sleep(0.2)
+        assert len(adapter._pending_text_batches) == 0
+
+
+# =====================================================================
+# WeCom text batching
+# =====================================================================
+
+def _make_wecom_adapter():
+    """Create a minimal WeComAdapter for testing text batching."""
+    from gateway.platforms.wecom import WeComAdapter
+
+    config = PlatformConfig(enabled=True, token="test-token")
+    adapter = object.__new__(WeComAdapter)
+    adapter._platform = Platform.WECOM
+    adapter.config = config
+    adapter._pending_text_batches = {}
+    adapter._pending_text_batch_tasks = {}
+    adapter._text_batch_delay_seconds = 0.1
+    adapter._text_batch_split_delay_seconds = 0.3
+    adapter._active_sessions = {}
+    adapter._pending_messages = {}
+    adapter._message_handler = AsyncMock()
+    adapter.handle_message = AsyncMock()
+    return adapter
+
+
+class TestWeComTextBatching:
+    @pytest.mark.asyncio
+    async def test_single_message_dispatched_after_delay(self):
+        adapter = _make_wecom_adapter()
+        event = _make_event("hello world", Platform.WECOM)
+
+        adapter._enqueue_text_event(event)
+
+        adapter.handle_message.assert_not_called()
+        await asyncio.sleep(0.2)
+
+        adapter.handle_message.assert_called_once()
+        assert adapter.handle_message.call_args[0][0].text == "hello world"
+
+    @pytest.mark.asyncio
+    async def test_split_messages_aggregated(self):
+        adapter = _make_wecom_adapter()
+
+        adapter._enqueue_text_event(_make_event("first part", Platform.WECOM))
+        await asyncio.sleep(0.02)
+        adapter._enqueue_text_event(_make_event("second part", Platform.WECOM))
+
+        adapter.handle_message.assert_not_called()
+        await asyncio.sleep(0.2)
+
+        adapter.handle_message.assert_called_once()
+        text = adapter.handle_message.call_args[0][0].text
+        assert "first part" in text
+        assert "second part" in text
+
+    @pytest.mark.asyncio
+    async def test_different_chats_not_merged(self):
+        adapter = _make_wecom_adapter()
+
+        adapter._enqueue_text_event(_make_event("chat A", Platform.WECOM, chat_id="chat_a"))
+        adapter._enqueue_text_event(_make_event("chat B", Platform.WECOM, chat_id="chat_b"))
+
+        await asyncio.sleep(0.2)
+
+        assert adapter.handle_message.call_count == 2
+
+    @pytest.mark.asyncio
+    async def test_adaptive_delay_for_near_limit_chunk(self):
+        """Chunks near the 4000-char limit should trigger a longer delay."""
+        adapter = _make_wecom_adapter()
+        long_text = "x" * 3950
+        adapter._enqueue_text_event(_make_event(long_text, Platform.WECOM))
+
+        await asyncio.sleep(0.15)
+        adapter.handle_message.assert_not_called()
+
+        await asyncio.sleep(0.25)
+        adapter.handle_message.assert_called_once()
+
+    @pytest.mark.asyncio
+    async def test_batch_cleans_up_after_flush(self):
+        adapter = _make_wecom_adapter()
+        adapter._enqueue_text_event(_make_event("test", Platform.WECOM))
+        await asyncio.sleep(0.2)
+        assert len(adapter._pending_text_batches) == 0
+
+
+# =====================================================================
+# Telegram adaptive delay (PR #6891)
+# =====================================================================
+
+def _make_telegram_adapter():
+    """Create a minimal TelegramAdapter for testing adaptive delay."""
+    from gateway.platforms.telegram import TelegramAdapter
+
+    config = PlatformConfig(enabled=True, token="test-token")
+    adapter = object.__new__(TelegramAdapter)
+    adapter._platform = Platform.TELEGRAM
+    adapter.config = config
+    adapter._pending_text_batches = {}
+    adapter._pending_text_batch_tasks = {}
+    adapter._text_batch_delay_seconds = 0.1
+    adapter._text_batch_split_delay_seconds = 0.3
+    adapter._active_sessions = {}
+    adapter._pending_messages = {}
+    adapter._message_handler = AsyncMock()
+    adapter.handle_message = AsyncMock()
+    return adapter
+
+
+class TestTelegramAdaptiveDelay:
+    @pytest.mark.asyncio
+    async def test_short_chunk_uses_normal_delay(self):
+        adapter = _make_telegram_adapter()
+        adapter._enqueue_text_event(_make_event("short msg", Platform.TELEGRAM))
+
+        # Should flush after the normal 0.1s delay
+        await asyncio.sleep(0.15)
+        adapter.handle_message.assert_called_once()
+
+    @pytest.mark.asyncio
+    async def test_near_limit_chunk_uses_split_delay(self):
+        """A chunk near the 4096-char limit should trigger a longer delay."""
+        adapter = _make_telegram_adapter()
+        long_text = "x" * 4050  # near the 4096 limit
+        adapter._enqueue_text_event(_make_event(long_text, Platform.TELEGRAM))
+
+        # After the short delay, should NOT have flushed yet
+        await asyncio.sleep(0.15)
+        adapter.handle_message.assert_not_called()
+
+        # After the split delay, should be flushed
+        await asyncio.sleep(0.25)
+        adapter.handle_message.assert_called_once()
+
+    @pytest.mark.asyncio
+    async def test_split_continuation_merged(self):
+        """A near-limit chunk followed by a short continuation should be merged."""
+        adapter = _make_telegram_adapter()
+
+        adapter._enqueue_text_event(_make_event("x" * 4050, Platform.TELEGRAM))
+        await asyncio.sleep(0.05)
+        adapter._enqueue_text_event(_make_event("continuation text", Platform.TELEGRAM))
+
+        # Short chunk arrived → should use normal delay now
+        await asyncio.sleep(0.15)
+        adapter.handle_message.assert_called_once()
+        text = adapter.handle_message.call_args[0][0].text
+        assert "continuation text" in text
+
+
+# =====================================================================
+# Feishu adaptive delay
+# =====================================================================
+
+def _make_feishu_adapter():
+    """Create a minimal FeishuAdapter for testing adaptive delay."""
+    from gateway.platforms.feishu import FeishuAdapter, FeishuBatchState
+
+    config = PlatformConfig(enabled=True, token="test-token")
+    adapter = object.__new__(FeishuAdapter)
+    adapter._platform = Platform.FEISHU
+    adapter.config = config
+    batch_state = FeishuBatchState()
+    adapter._pending_text_batches = batch_state.events
+    adapter._pending_text_batch_tasks = batch_state.tasks
+    adapter._pending_text_batch_counts = batch_state.counts
+    adapter._text_batch_delay_seconds = 0.1
+    adapter._text_batch_split_delay_seconds = 0.3
+    adapter._text_batch_max_messages = 20
+    adapter._text_batch_max_chars = 50000
+    adapter._active_sessions = {}
+    adapter._pending_messages = {}
+    adapter._message_handler = AsyncMock()
+    adapter._handle_message_with_guards = AsyncMock()
+    return adapter
+
+
+class TestFeishuAdaptiveDelay:
+    @pytest.mark.asyncio
+    async def test_short_chunk_uses_normal_delay(self):
+        adapter = _make_feishu_adapter()
+        event = _make_event("short msg", Platform.FEISHU)
+        await adapter._enqueue_text_event(event)
+
+        await asyncio.sleep(0.15)
+        adapter._handle_message_with_guards.assert_called_once()
+
+    @pytest.mark.asyncio
+    async def test_near_limit_chunk_uses_split_delay(self):
+        """A chunk near the 4096-char limit should trigger a longer delay."""
+        adapter = _make_feishu_adapter()
+        long_text = "x" * 4050
+        event = _make_event(long_text, Platform.FEISHU)
+        await adapter._enqueue_text_event(event)
+
+        await asyncio.sleep(0.15)
+        adapter._handle_message_with_guards.assert_not_called()
+
+        await asyncio.sleep(0.25)
+        adapter._handle_message_with_guards.assert_called_once()
+
+    @pytest.mark.asyncio
+    async def test_split_continuation_merged(self):
+        adapter = _make_feishu_adapter()
+
+        await adapter._enqueue_text_event(_make_event("x" * 4050, Platform.FEISHU))
+        await asyncio.sleep(0.05)
+        await adapter._enqueue_text_event(_make_event("continuation text", Platform.FEISHU))
+
+        await asyncio.sleep(0.15)
+        adapter._handle_message_with_guards.assert_called_once()
+        text = adapter._handle_message_with_guards.call_args[0][0].text
+        assert "continuation text" in text
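Review note: the batching tests above all exercise the same pattern, a per-chat debounce timer whose wait grows when the latest chunk is close to the platform's message-size limit. The sketch below is a minimal, self-contained illustration of that pattern, not the adapters' actual code. `TextBatcher`, `on_flush`, and `near_limit_margin` are hypothetical names, and the "within 100 characters of the limit" cutoff is an assumption chosen only so that a 4050-character chunk counts as near-limit, as in the tests.

```python
import asyncio


class TextBatcher:
    """Illustrative debounce batcher; not the adapters' real implementation."""

    def __init__(self, on_flush, delay=0.1, split_delay=0.3,
                 limit=4096, near_limit_margin=100):
        self._on_flush = on_flush          # async callable(key, merged_text)
        self._delay = delay                # quiet period after a short chunk
        self._split_delay = split_delay    # longer wait after a near-limit chunk
        self._threshold = limit - near_limit_margin  # assumed cutoff
        self.pending = {}                  # key -> list of queued chunks
        self._tasks = {}                   # key -> scheduled flush task

    def enqueue(self, key, text):
        """Queue a chunk and restart the flush timer for its chat."""
        self.pending.setdefault(key, []).append(text)
        previous = self._tasks.pop(key, None)
        if previous is not None:
            previous.cancel()
        # The latest chunk decides the wait: a near-limit chunk suggests the
        # client split a long message and a continuation is likely on its way.
        delay = self._split_delay if len(text) >= self._threshold else self._delay
        self._tasks[key] = asyncio.create_task(self._flush_later(key, delay))

    async def _flush_later(self, key, delay):
        await asyncio.sleep(delay)
        # Remove state before dispatching so nothing lingers after a flush.
        chunks = self.pending.pop(key, [])
        self._tasks.pop(key, None)
        if chunks:
            await self._on_flush(key, "\n".join(chunks))
```

Keying both dictionaries by chat keeps different rooms from merging. Popping the pending entry before dispatch is what a cleanup assertion such as `len(adapter._pending_text_batches) == 0` would observe in this sketch.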
@@ -0,0 +1,177 @@
+"""Tests for gateway /usage command — agent cache lookup and output fields."""
+
+import asyncio
+import threading
+from unittest.mock import MagicMock, patch
+
+import pytest
+
+
+def _make_mock_agent(**overrides):
+    """Create a mock AIAgent with realistic session counters."""
+    agent = MagicMock()
+    defaults = {
+        "model": "anthropic/claude-sonnet-4.6",
+        "provider": "openrouter",
+        "base_url": None,
+        "session_total_tokens": 50_000,
+        "session_api_calls": 5,
+        "session_prompt_tokens": 40_000,
+        "session_completion_tokens": 10_000,
+        "session_input_tokens": 35_000,
+        "session_output_tokens": 10_000,
+        "session_cache_read_tokens": 5_000,
+        "session_cache_write_tokens": 2_000,
+    }
+    defaults.update(overrides)
+    for k, v in defaults.items():
+        setattr(agent, k, v)
+
+    # Rate limit state
+    rl = MagicMock()
+    rl.has_data = True
+    agent.get_rate_limit_state.return_value = rl
+
+    # Context compressor
+    ctx = MagicMock()
+    ctx.last_prompt_tokens = 30_000
+    ctx.context_length = 200_000
+    ctx.compression_count = 1
+    agent.context_compressor = ctx
+
+    return agent
+
+
+def _make_runner(session_key, agent=None, cached_agent=None):
+    """Build a bare GatewayRunner with just the fields _handle_usage_command needs."""
+    from gateway.run import GatewayRunner
+
+    runner = object.__new__(GatewayRunner)
+    runner._running_agents = {}
+    runner._running_agents_ts = {}
+    runner._agent_cache = {}
+    runner._agent_cache_lock = threading.Lock()
+    runner.session_store = MagicMock()
+
+    if agent is not None:
+        runner._running_agents[session_key] = agent
+
+    if cached_agent is not None:
+        runner._agent_cache[session_key] = (cached_agent, "sig")
+
+    # Wire helper
+    runner._session_key_for_source = MagicMock(return_value=session_key)
+
+    return runner
+
+
+SK = "agent:main:telegram:private:12345"
+
+
+class TestUsageCachedAgent:
+    """The main fix: /usage should find agents in _agent_cache between turns."""
+
+    @pytest.mark.asyncio
+    async def test_cached_agent_shows_detailed_usage(self):
+        agent = _make_mock_agent()
+        runner = _make_runner(SK, cached_agent=agent)
+        event = MagicMock()
+
+        with patch("agent.rate_limit_tracker.format_rate_limit_compact", return_value="RPM: 50/60"), \
+             patch("agent.usage_pricing.estimate_usage_cost") as mock_cost:
+            mock_cost.return_value = MagicMock(amount_usd=0.1234, status="estimated")
+            result = await runner._handle_usage_command(event)
+
+        assert "claude-sonnet-4.6" in result
+        assert "35,000" in result  # input tokens
+        assert "10,000" in result  # output tokens
+        assert "5,000" in result   # cache read
+        assert "2,000" in result   # cache write
+        assert "50,000" in result  # total
+        assert "$0.1234" in result
+        assert "30,000" in result  # context
+        assert "Compressions: 1" in result
+
+    @pytest.mark.asyncio
+    async def test_running_agent_preferred_over_cache(self):
+        """When an agent is in both dicts, the running one wins."""
+        running = _make_mock_agent(session_api_calls=10, session_total_tokens=80_000)
+        cached = _make_mock_agent(session_api_calls=5, session_total_tokens=50_000)
+        runner = _make_runner(SK, agent=running, cached_agent=cached)
+        event = MagicMock()
+
+        with patch("agent.rate_limit_tracker.format_rate_limit_compact", return_value="RPM: 50/60"), \
+             patch("agent.usage_pricing.estimate_usage_cost") as mock_cost:
+            mock_cost.return_value = MagicMock(amount_usd=None, status="unknown")
+            result = await runner._handle_usage_command(event)
+
+        assert "80,000" in result   # running agent's total
+        assert "API calls: 10" in result
+
+    @pytest.mark.asyncio
+    async def test_sentinel_skipped_uses_cache(self):
+        """PENDING sentinel in _running_agents should fall through to cache."""
+        from gateway.run import _AGENT_PENDING_SENTINEL
+
+        cached = _make_mock_agent()
+        runner = _make_runner(SK, cached_agent=cached)
+        runner._running_agents[SK] = _AGENT_PENDING_SENTINEL
+        event = MagicMock()
+
+        with patch("agent.rate_limit_tracker.format_rate_limit_compact", return_value="RPM: 50/60"), \
+             patch("agent.usage_pricing.estimate_usage_cost") as mock_cost:
+            mock_cost.return_value = MagicMock(amount_usd=None, status="unknown")
+            result = await runner._handle_usage_command(event)
+
+        assert "claude-sonnet-4.6" in result
+        assert "Session Token Usage" in result
+
+    @pytest.mark.asyncio
+    async def test_no_agent_anywhere_falls_to_history(self):
+        """No running or cached agent → rough estimate from transcript."""
+        runner = _make_runner(SK)
+        event = MagicMock()
+
+        session_entry = MagicMock()
+        session_entry.session_id = "sess123"
+        runner.session_store.get_or_create_session.return_value = session_entry
+        runner.session_store.load_transcript.return_value = [
+            {"role": "user", "content": "hello"},
+            {"role": "assistant", "content": "hi there"},
+        ]
+
+        with patch("agent.model_metadata.estimate_messages_tokens_rough", return_value=500):
+            result = await runner._handle_usage_command(event)
+
+        assert "Session Info" in result
+        assert "Messages: 2" in result
+        assert "~500" in result
+
+    @pytest.mark.asyncio
+    async def test_cache_read_write_hidden_when_zero(self):
+        """Cache token lines should be omitted when zero."""
+        agent = _make_mock_agent(session_cache_read_tokens=0, session_cache_write_tokens=0)
+        runner = _make_runner(SK, cached_agent=agent)
+        event = MagicMock()
+
+        with patch("agent.rate_limit_tracker.format_rate_limit_compact", return_value="RPM: 50/60"), \
+             patch("agent.usage_pricing.estimate_usage_cost") as mock_cost:
+            mock_cost.return_value = MagicMock(amount_usd=None, status="unknown")
+            result = await runner._handle_usage_command(event)
+
+        assert "Cache read" not in result
+        assert "Cache write" not in result
+
+    @pytest.mark.asyncio
+    async def test_cost_included_status(self):
+        """Subscription-included providers show 'included' instead of a dollar amount."""
+        agent = _make_mock_agent(provider="openai-codex")
+        runner = _make_runner(SK, cached_agent=agent)
+        event = MagicMock()
+
+        with patch("agent.rate_limit_tracker.format_rate_limit_compact", return_value="RPM: 50/60"), \
+             patch("agent.usage_pricing.estimate_usage_cost") as mock_cost:
+            mock_cost.return_value = MagicMock(amount_usd=None, status="included")
+            result = await runner._handle_usage_command(event)
+
+        assert "Cost: included" in result
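Review note: the `/usage` tests above pin down a lookup order: a live agent in `_running_agents` wins, the pending sentinel is skipped, and the `_agent_cache` entry is the fallback. The sketch below restates that order as a standalone function. It is an assumption about the shape of the logic, not the body of `_handle_usage_command`. `resolve_agent` and `PENDING` are hypothetical names, with `PENDING` standing in for `_AGENT_PENDING_SENTINEL`.

```python
import threading

# Stand-in for gateway.run._AGENT_PENDING_SENTINEL (assumed to be a unique marker object).
PENDING = object()


def resolve_agent(running_agents, agent_cache, cache_lock, session_key):
    """Return the live agent for a session, else the cached one, else None."""
    agent = running_agents.get(session_key)
    if agent is not None and agent is not PENDING:
        return agent
    with cache_lock:
        entry = agent_cache.get(session_key)
    # Cache entries are (agent, signature) tuples, matching what _make_runner stores.
    return entry[0] if entry else None
```

A `None` result corresponds to the case covered by `test_no_agent_anywhere_falls_to_history`, where the command falls back to a rough estimate from the transcript.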
@@ -508,6 +508,7 @@ class TestInboundMessages:
        from gateway.platforms.wecom import WeComAdapter

        adapter = WeComAdapter(PlatformConfig(enabled=True))
+        adapter._text_batch_delay_seconds = 0  # disable batching for tests
        adapter.handle_message = AsyncMock()
        adapter._extract_media = AsyncMock(return_value=(["/tmp/test.png"], ["image/png"]))

@@ -539,6 +540,7 @@ class TestInboundMessages:
        from gateway.platforms.wecom import WeComAdapter

        adapter = WeComAdapter(PlatformConfig(enabled=True))
+        adapter._text_batch_delay_seconds = 0  # disable batching for tests
        adapter.handle_message = AsyncMock()
        adapter._extract_media = AsyncMock(return_value=([], []))

@@ -0,0 +1,62 @@
+"""Tests for gateway /yolo session scoping."""
+
+import os
+
+import pytest
+
+import gateway.run as gateway_run
+from gateway.config import Platform
+from gateway.platforms.base import MessageEvent
+from gateway.session import SessionSource
+from tools.approval import clear_session, is_session_yolo_enabled
+
+
+@pytest.fixture(autouse=True)
+def _clean_yolo_state(monkeypatch):
+    monkeypatch.delenv("HERMES_YOLO_MODE", raising=False)
+    clear_session("agent:main:telegram:dm:chat-a")
+    clear_session("agent:main:telegram:dm:chat-b")
+    yield
+    monkeypatch.delenv("HERMES_YOLO_MODE", raising=False)
+    clear_session("agent:main:telegram:dm:chat-a")
+    clear_session("agent:main:telegram:dm:chat-b")
+
+
+def _make_runner():
+    runner = object.__new__(gateway_run.GatewayRunner)
+    runner.session_store = None
+    runner.config = None
+    return runner
+
+
+def _make_event(chat_id: str) -> MessageEvent:
+    source = SessionSource(
+        platform=Platform.TELEGRAM,
+        user_id=f"user-{chat_id}",
+        chat_id=chat_id,
+        user_name="tester",
+        chat_type="dm",
+    )
+    return MessageEvent(text="/yolo", source=source)
+
+
+@pytest.mark.asyncio
+async def test_yolo_command_toggles_only_current_session(monkeypatch):
+    runner = _make_runner()
+
+    event_a = _make_event("chat-a")
+    session_a = runner._session_key_for_source(event_a.source)
+    session_b = runner._session_key_for_source(_make_event("chat-b").source)
+
+    result_on = await runner._handle_yolo_command(event_a)
+
+    assert "ON" in result_on
+    assert is_session_yolo_enabled(session_a) is True
+    assert is_session_yolo_enabled(session_b) is False
+    assert os.environ.get("HERMES_YOLO_MODE") is None
+
+    result_off = await runner._handle_yolo_command(event_a)
+
+    assert "OFF" in result_off
+    assert is_session_yolo_enabled(session_a) is False
+    assert os.environ.get("HERMES_YOLO_MODE") is None
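Review note: the test above checks that `/yolo` flips state for one session key only and never sets `HERMES_YOLO_MODE` in the process environment. One way to get that behaviour is a lock-guarded set of session keys, sketched below. This is an illustrative reimplementation, not the code in `tools.approval`. `is_session_yolo_enabled` and `clear_session` reuse the names imported by the test, while `toggle_session_yolo` and the module-level set are hypothetical.

```python
import threading

_lock = threading.Lock()
_yolo_sessions = set()  # session keys with approvals bypassed (hypothetical store)


def toggle_session_yolo(session_key):
    """Flip the flag for one session and return the new state."""
    with _lock:
        if session_key in _yolo_sessions:
            _yolo_sessions.discard(session_key)
            return False
        _yolo_sessions.add(session_key)
        return True


def is_session_yolo_enabled(session_key):
    """Report whether this session, and only this session, is in yolo mode."""
    with _lock:
        return session_key in _yolo_sessions


def clear_session(session_key):
    """Drop any per-session state, e.g. when the session ends."""
    with _lock:
        _yolo_sessions.discard(session_key)
```

Because the state lives in a keyed set rather than an environment variable, enabling it for `chat-a` cannot leak into `chat-b` or into child processes.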
@@ -633,6 +633,7 @@ class TestHasAnyProviderConfigured:
        hermes_home.mkdir()
        monkeypatch.setattr(config_module, "get_env_path", lambda: hermes_home / ".env")
        monkeypatch.setattr(config_module, "get_hermes_home", lambda: hermes_home)
+        monkeypatch.setattr("hermes_cli.copilot_auth.resolve_copilot_token", lambda: ("", ""))
        # Clear all provider env vars so earlier checks don't short-circuit
        _all_vars = {"OPENROUTER_API_KEY", "OPENAI_API_KEY", "ANTHROPIC_API_KEY",
                      "ANTHROPIC_TOKEN", "OPENAI_BASE_URL"}
@@ -727,6 +728,7 @@ class TestHasAnyProviderConfigured:
        monkeypatch.setattr(config_module, "get_env_path", lambda: hermes_home / ".env")
        monkeypatch.setattr(config_module, "get_hermes_home", lambda: hermes_home)
        monkeypatch.setenv("HERMES_HOME", str(hermes_home))
+        monkeypatch.setattr("hermes_cli.copilot_auth.resolve_copilot_token", lambda: ("", ""))
        _all_vars = {"OPENROUTER_API_KEY", "OPENAI_API_KEY", "ANTHROPIC_API_KEY",
                      "ANTHROPIC_TOKEN", "OPENAI_BASE_URL"}
        for pconfig in PROVIDER_REGISTRY.values():
@@ -49,6 +49,30 @@ def test_chat_subcommand_accepts_skills_flag(monkeypatch):
    }


+def test_chat_subcommand_accepts_image_flag(monkeypatch):
+    import hermes_cli.main as main_mod
+
+    captured = {}
+
+    def fake_cmd_chat(args):
+        captured["query"] = args.query
+        captured["image"] = args.image
+
+    monkeypatch.setattr(main_mod, "cmd_chat", fake_cmd_chat)
+    monkeypatch.setattr(
+        sys,
+        "argv",
+        ["hermes", "chat", "-q", "hello", "--image", "~/storage/shared/Pictures/cat.png"],
+    )
+
+    main_mod.main()
+
+    assert captured == {
+        "query": "hello",
+        "image": "~/storage/shared/Pictures/cat.png",
+    }
+
+
 def test_continue_worktree_and_skills_flags_work_together(monkeypatch):
    import hermes_cli.main as main_mod

@@ -446,6 +446,13 @@ class TestSubcommands:
        assert "show" in subs
        assert "hide" in subs

+    def test_fast_has_subcommands(self):
+        assert "/fast" in SUBCOMMANDS
+        subs = SUBCOMMANDS["/fast"]
+        assert "fast" in subs
+        assert "normal" in subs
+        assert "status" in subs
+
    def test_voice_has_subcommands(self):
        assert "/voice" in SUBCOMMANDS
        assert "on" in SUBCOMMANDS["/voice"]
@@ -474,6 +481,20 @@ class TestSubcommandCompletion:
        assert "high" in texts
        assert "show" in texts

+    def test_fast_subcommand_completion_after_space(self):
+        completions = _completions(SlashCommandCompleter(), "/fast ")
+        texts = {c.text for c in completions}
+        assert "fast" in texts
+        assert "normal" in texts
+
+    def test_fast_command_filtered_out_when_unavailable(self):
+        completions = _completions(
+            SlashCommandCompleter(command_filter=lambda cmd: cmd != "/fast"),
+            "/fa",
+        )
+        texts = {c.text for c in completions}
+        assert "fast" not in texts
+
    def test_subcommand_prefix_filters(self):
        """Typing '/reasoning sh' should only show 'show'."""
        completions = _completions(SlashCommandCompleter(), "/reasoning sh")
@@ -527,6 +548,13 @@ class TestGhostText:
        """/reasoning sh → 'ow'"""
        assert _suggestion("/reasoning sh") == "ow"

+    def test_fast_subcommand_suggestion(self):
+        assert _suggestion("/fast f") == "ast"
+
+    def test_fast_subcommand_suggestion_hidden_when_filtered(self):
+        completer = SlashCommandCompleter(command_filter=lambda cmd: cmd != "/fast")
+        assert _suggestion("/fa", completer=completer) is None
+
    def test_no_suggestion_for_non_slash(self):
        assert _suggestion("hello") is None

@@ -14,6 +14,23 @@ from hermes_cli import doctor as doctor_mod
 from hermes_cli.doctor import _has_provider_env_config


+class TestDoctorPlatformHints:
+    def test_termux_package_hint(self, monkeypatch):
+        monkeypatch.setenv("TERMUX_VERSION", "0.118.3")
+        monkeypatch.setenv("PREFIX", "/data/data/com.termux/files/usr")
+        assert doctor._is_termux() is True
+        assert doctor._python_install_cmd() == "python -m pip install"
+        assert doctor._system_package_install_cmd("ripgrep") == "pkg install ripgrep"
+
+    def test_non_termux_package_hint_defaults_to_apt(self, monkeypatch):
+        monkeypatch.delenv("TERMUX_VERSION", raising=False)
+        monkeypatch.setenv("PREFIX", "/usr")
+        monkeypatch.setattr(sys, "platform", "linux")
+        assert doctor._is_termux() is False
+        assert doctor._python_install_cmd() == "uv pip install"
+        assert doctor._system_package_install_cmd("ripgrep") == "sudo apt install ripgrep"
+
+
 class TestProviderEnvDetection:
    def test_detects_openai_api_key(self):
        content = "OPENAI_BASE_URL=http://localhost:1234/v1\nOPENAI_API_KEY=***"
@@ -206,3 +223,72 @@ class TestDoctorMemoryProviderSection:
        out = self._run_doctor_and_capture(monkeypatch, tmp_path, provider="mem0")
        assert "Memory Provider" in out
        assert "Built-in memory active" not in out
+
+
+def test_run_doctor_termux_treats_docker_and_browser_warnings_as_expected(monkeypatch, tmp_path):
+    helper = TestDoctorMemoryProviderSection()
+    monkeypatch.setenv("TERMUX_VERSION", "0.118.3")
+    monkeypatch.setenv("PREFIX", "/data/data/com.termux/files/usr")
+
+    real_which = doctor_mod.shutil.which
+
+    def fake_which(cmd):
+        if cmd in {"docker", "node", "npm"}:
+            return None
+        return real_which(cmd)
+
+    monkeypatch.setattr(doctor_mod.shutil, "which", fake_which)
+
+    out = helper._run_doctor_and_capture(monkeypatch, tmp_path, provider="")
+
+    assert "Docker backend is not available inside Termux" in out
+    assert "Node.js not found (browser tools are optional in the tested Termux path)" in out
+    assert "Install Node.js on Termux with: pkg install nodejs" in out
+    assert "Termux browser setup:" in out
+    assert "1) pkg install nodejs" in out
+    assert "2) npm install -g agent-browser" in out
+    assert "3) agent-browser install" in out
+    assert "docker not found (optional)" not in out
+
+
+def test_run_doctor_termux_does_not_mark_browser_available_without_agent_browser(monkeypatch, tmp_path):
+    home = tmp_path / ".hermes"
+    home.mkdir(parents=True, exist_ok=True)
+    (home / "config.yaml").write_text("memory: {}\n", encoding="utf-8")
+    project = tmp_path / "project"
+    project.mkdir(exist_ok=True)
+
+    monkeypatch.setenv("TERMUX_VERSION", "0.118.3")
+    monkeypatch.setenv("PREFIX", "/data/data/com.termux/files/usr")
+    monkeypatch.setattr(doctor_mod, "HERMES_HOME", home)
+    monkeypatch.setattr(doctor_mod, "PROJECT_ROOT", project)
+    monkeypatch.setattr(doctor_mod, "_DHH", str(home))
+    monkeypatch.setattr(doctor_mod.shutil, "which", lambda cmd: "/data/data/com.termux/files/usr/bin/node" if cmd in {"node", "npm"} else None)
+
+    fake_model_tools = types.SimpleNamespace(
+        check_tool_availability=lambda *a, **kw: (["terminal"], [{"name": "browser", "env_vars": [], "tools": ["browser_navigate"]}]),
+        TOOLSET_REQUIREMENTS={
+            "terminal": {"name": "terminal"},
+            "browser": {"name": "browser"},
+        },
+    )
+    monkeypatch.setitem(sys.modules, "model_tools", fake_model_tools)
+
+    try:
+        from hermes_cli import auth as _auth_mod
+        monkeypatch.setattr(_auth_mod, "get_nous_auth_status", lambda: {})
+        monkeypatch.setattr(_auth_mod, "get_codex_auth_status", lambda: {})
+    except Exception:
+        pass
+
+    import io, contextlib
+    buf = io.StringIO()
+    with contextlib.redirect_stdout(buf):
+        doctor_mod.run_doctor(Namespace(fix=False))
+    out = buf.getvalue()
+
+    assert "✓ browser" not in out
+    assert "browser" in out
+    assert "system dependency not met" in out
+    assert "agent-browser is not installed (expected in the tested Termux path)" in out
+    assert "npm install -g agent-browser && agent-browser install" in out
@@ -10,6 +10,7 @@ import hermes_cli.gateway as gateway
 class TestSystemdLingerStatus:
    def test_reports_enabled(self, monkeypatch):
        monkeypatch.setattr(gateway, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway, "is_termux", lambda: False)
        monkeypatch.setenv("USER", "alice")
        monkeypatch.setattr(
            gateway.subprocess,
@@ -22,6 +23,7 @@ class TestSystemdLingerStatus:

    def test_reports_disabled(self, monkeypatch):
        monkeypatch.setattr(gateway, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway, "is_termux", lambda: False)
        monkeypatch.setenv("USER", "alice")
        monkeypatch.setattr(
            gateway.subprocess,
@@ -32,6 +34,11 @@ class TestSystemdLingerStatus:

        assert gateway.get_systemd_linger_status() == (False, "")

+    def test_reports_termux_as_not_supported(self, monkeypatch):
+        monkeypatch.setattr(gateway, "is_termux", lambda: True)
+
+        assert gateway.get_systemd_linger_status() == (None, "not supported in Termux")
+

 def test_systemd_status_warns_when_linger_disabled(monkeypatch, tmp_path, capsys):
    unit_path = tmp_path / "hermes-gateway.service"
@@ -8,6 +8,7 @@ import hermes_cli.gateway as gateway
 class TestEnsureLingerEnabled:
    def test_linger_already_enabled_via_file(self, monkeypatch, capsys):
        monkeypatch.setattr(gateway, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway, "is_termux", lambda: False)
        monkeypatch.setattr("getpass.getuser", lambda: "testuser")
        monkeypatch.setattr(gateway, "Path", lambda _path: SimpleNamespace(exists=lambda: True))

@@ -22,6 +23,7 @@ class TestEnsureLingerEnabled:

    def test_status_enabled_skips_enable(self, monkeypatch, capsys):
        monkeypatch.setattr(gateway, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway, "is_termux", lambda: False)
        monkeypatch.setattr("getpass.getuser", lambda: "testuser")
        monkeypatch.setattr(gateway, "Path", lambda _path: SimpleNamespace(exists=lambda: False))
        monkeypatch.setattr(gateway, "get_systemd_linger_status", lambda: (True, ""))
@@ -37,6 +39,7 @@ class TestEnsureLingerEnabled:

    def test_loginctl_success_enables_linger(self, monkeypatch, capsys):
        monkeypatch.setattr(gateway, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway, "is_termux", lambda: False)
        monkeypatch.setattr("getpass.getuser", lambda: "testuser")
        monkeypatch.setattr(gateway, "Path", lambda _path: SimpleNamespace(exists=lambda: False))
        monkeypatch.setattr(gateway, "get_systemd_linger_status", lambda: (False, ""))
@@ -59,6 +62,7 @@ class TestEnsureLingerEnabled:

    def test_missing_loginctl_shows_manual_guidance(self, monkeypatch, capsys):
        monkeypatch.setattr(gateway, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway, "is_termux", lambda: False)
        monkeypatch.setattr("getpass.getuser", lambda: "testuser")
        monkeypatch.setattr(gateway, "Path", lambda _path: SimpleNamespace(exists=lambda: False))
        monkeypatch.setattr(gateway, "get_systemd_linger_status", lambda: (None, "loginctl not found"))
@@ -76,6 +80,7 @@ class TestEnsureLingerEnabled:

    def test_loginctl_failure_shows_manual_guidance(self, monkeypatch, capsys):
        monkeypatch.setattr(gateway, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway, "is_termux", lambda: False)
        monkeypatch.setattr("getpass.getuser", lambda: "testuser")
        monkeypatch.setattr(gateway, "Path", lambda _path: SimpleNamespace(exists=lambda: False))
        monkeypatch.setattr(gateway, "get_systemd_linger_status", lambda: (False, ""))
@@ -109,7 +109,8 @@ class TestGatewayStopCleanup:
        unit_path = tmp_path / "hermes-gateway.service"
        unit_path.write_text("unit\n", encoding="utf-8")

-        monkeypatch.setattr(gateway_cli, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway_cli, "supports_systemd_services", lambda: True)
+        monkeypatch.setattr(gateway_cli, "is_termux", lambda: False)
        monkeypatch.setattr(gateway_cli, "is_macos", lambda: False)
        monkeypatch.setattr(gateway_cli, "get_systemd_unit_path", lambda system=False: unit_path)

@@ -134,7 +135,8 @@ class TestGatewayStopCleanup:
        unit_path = tmp_path / "hermes-gateway.service"
        unit_path.write_text("unit\n", encoding="utf-8")

-        monkeypatch.setattr(gateway_cli, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway_cli, "supports_systemd_services", lambda: True)
+        monkeypatch.setattr(gateway_cli, "is_termux", lambda: False)
        monkeypatch.setattr(gateway_cli, "is_macos", lambda: False)
        monkeypatch.setattr(gateway_cli, "get_systemd_unit_path", lambda system=False: unit_path)

@@ -256,7 +258,8 @@ class TestGatewayServiceDetection:
        user_unit = SimpleNamespace(exists=lambda: True)
        system_unit = SimpleNamespace(exists=lambda: True)

-        monkeypatch.setattr(gateway_cli, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway_cli, "supports_systemd_services", lambda: True)
+        monkeypatch.setattr(gateway_cli, "is_termux", lambda: False)
        monkeypatch.setattr(gateway_cli, "is_macos", lambda: False)
        monkeypatch.setattr(
            gateway_cli,
@@ -278,7 +281,8 @@ class TestGatewayServiceDetection:

 class TestGatewaySystemServiceRouting:
    def test_gateway_install_passes_system_flags(self, monkeypatch):
-        monkeypatch.setattr(gateway_cli, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway_cli, "supports_systemd_services", lambda: True)
+        monkeypatch.setattr(gateway_cli, "is_termux", lambda: False)
        monkeypatch.setattr(gateway_cli, "is_macos", lambda: False)

        calls = []
@@ -294,11 +298,30 @@ class TestGatewaySystemServiceRouting:

        assert calls == [(True, True, "alice")]

+    def test_gateway_install_reports_termux_manual_mode(self, monkeypatch, capsys):
+        monkeypatch.setattr(gateway_cli, "is_termux", lambda: True)
+        monkeypatch.setattr(gateway_cli, "supports_systemd_services", lambda: False)
+        monkeypatch.setattr(gateway_cli, "is_macos", lambda: False)
+
+        try:
+            gateway_cli.gateway_command(
+                SimpleNamespace(gateway_command="install", force=False, system=False, run_as_user=None)
+            )
+        except SystemExit as exc:
+            assert exc.code == 1
+        else:
+            raise AssertionError("Expected gateway_command to exit on unsupported Termux service install")
+
+        out = capsys.readouterr().out
+        assert "not supported on Termux" in out
+        assert "Run manually: hermes gateway" in out
+
    def test_gateway_status_prefers_system_service_when_only_system_unit_exists(self, monkeypatch):
        user_unit = SimpleNamespace(exists=lambda: False)
        system_unit = SimpleNamespace(exists=lambda: True)

-        monkeypatch.setattr(gateway_cli, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway_cli, "supports_systemd_services", lambda: True)
+        monkeypatch.setattr(gateway_cli, "is_termux", lambda: False)
        monkeypatch.setattr(gateway_cli, "is_macos", lambda: False)
        monkeypatch.setattr(
            gateway_cli,
@@ -313,6 +336,20 @@ class TestGatewaySystemServiceRouting:

        assert calls == [(False, False)]

+    def test_gateway_status_on_termux_shows_manual_guidance(self, monkeypatch, capsys):
+        monkeypatch.setattr(gateway_cli, "supports_systemd_services", lambda: False)
+        monkeypatch.setattr(gateway_cli, "is_termux", lambda: True)
+        monkeypatch.setattr(gateway_cli, "is_macos", lambda: False)
+        monkeypatch.setattr(gateway_cli, "find_gateway_pids", lambda exclude_pids=None: [])
+        monkeypatch.setattr(gateway_cli, "_runtime_health_lines", lambda: [])
+
+        gateway_cli.gateway_command(SimpleNamespace(gateway_command="status", deep=False, system=False))
+
+        out = capsys.readouterr().out
+        assert "Gateway is not running" in out
+        assert "nohup hermes gateway" in out
+        assert "install as user service" not in out
+
    def test_gateway_restart_does_not_fallback_to_foreground_when_launchd_restart_fails(self, tmp_path, monkeypatch):
        plist_path = tmp_path / "ai.hermes.gateway.plist"
        plist_path.write_text("plist\n", encoding="utf-8")
@@ -513,12 +550,22 @@ class TestGeneratedUnitUsesDetectedVenv:
 class TestGeneratedUnitIncludesLocalBin:
    """~/.local/bin must be in PATH so uvx/pipx tools are discoverable."""

-    def test_user_unit_includes_local_bin_in_path(self):
+    def test_user_unit_includes_local_bin_in_path(self, monkeypatch):
+        home = Path.home()
+        monkeypatch.setattr(
+            gateway_cli,
+            "_build_user_local_paths",
+            lambda home_path, existing: [str(home / ".local" / "bin")],
+        )
        unit = gateway_cli.generate_systemd_unit(system=False)
-        home = str(Path.home())
        assert f"{home}/.local/bin" in unit

-    def test_system_unit_includes_local_bin_in_path(self):
+    def test_system_unit_includes_local_bin_in_path(self, monkeypatch):
+        monkeypatch.setattr(
+            gateway_cli,
+            "_build_user_local_paths",
+            lambda home_path, existing: [str(home_path / ".local" / "bin")],
+        )
        unit = gateway_cli.generate_systemd_unit(system=True)
        # System unit uses the resolved home dir from _system_service_identity
        assert "/.local/bin" in unit
@@ -124,7 +124,14 @@ class TestParseModelInput:

 class TestCuratedModelsForProvider:
    def test_openrouter_returns_curated_list(self):
-        models = curated_models_for_provider("openrouter")
+        with patch(
+            "hermes_cli.models.fetch_openrouter_models",
+            return_value=[
+                ("anthropic/claude-opus-4.6", "recommended"),
+                ("qwen/qwen3.6-plus", ""),
+            ],
+        ):
+            models = curated_models_for_provider("openrouter")
        assert len(models) > 0
        assert any("claude" in m[0] for m in models)

@@ -169,7 +176,14 @@ class TestProviderLabel:

 class TestProviderModelIds:
    def test_openrouter_returns_curated_list(self):
-        ids = provider_model_ids("openrouter")
+        with patch(
+            "hermes_cli.models.fetch_openrouter_models",
+            return_value=[
+                ("anthropic/claude-opus-4.6", "recommended"),
+                ("qwen/qwen3.6-plus", ""),
+            ],
+        ):
+            ids = provider_model_ids("openrouter")
        assert len(ids) > 0
        assert all("/" in mid for mid in ids)

@@ -3,7 +3,7 @@
 from unittest.mock import patch, MagicMock

 from hermes_cli.models import (
-    OPENROUTER_MODELS, menu_labels, model_ids, detect_provider_for_model,
+    OPENROUTER_MODELS, fetch_openrouter_models, menu_labels, model_ids, detect_provider_for_model,
    filter_nous_free_models, _NOUS_ALLOWED_FREE_MODELS,
    is_nous_free_tier, partition_nous_models_by_tier,
    check_nous_free_tier, clear_nous_free_tier_cache,
@@ -11,43 +11,57 @@ from hermes_cli.models import (
 )
 import hermes_cli.models as _models_mod

+LIVE_OPENROUTER_MODELS = [
+    ("anthropic/claude-opus-4.6", "recommended"),
+    ("qwen/qwen3.6-plus", ""),
+    ("nvidia/nemotron-3-super-120b-a12b:free", "free"),
+]
+

 class TestModelIds:
    def test_returns_non_empty_list(self):
-        ids = model_ids()
+        with patch("hermes_cli.models.fetch_openrouter_models", return_value=LIVE_OPENROUTER_MODELS):
+            ids = model_ids()
        assert isinstance(ids, list)
        assert len(ids) > 0

-    def test_ids_match_models_list(self):
-        ids = model_ids()
-        expected = [mid for mid, _ in OPENROUTER_MODELS]
+    def test_ids_match_fetched_catalog(self):
+        with patch("hermes_cli.models.fetch_openrouter_models", return_value=LIVE_OPENROUTER_MODELS):
+            ids = model_ids()
+        expected = [mid for mid, _ in LIVE_OPENROUTER_MODELS]
        assert ids == expected

    def test_all_ids_contain_provider_slash(self):
        """Model IDs should follow the provider/model format."""
-        for mid in model_ids():
-            assert "/" in mid, f"Model ID '{mid}' missing provider/ prefix"
+        with patch("hermes_cli.models.fetch_openrouter_models", return_value=LIVE_OPENROUTER_MODELS):
+            for mid in model_ids():
+                assert "/" in mid, f"Model ID '{mid}' missing provider/ prefix"

    def test_no_duplicate_ids(self):
-        ids = model_ids()
+        with patch("hermes_cli.models.fetch_openrouter_models", return_value=LIVE_OPENROUTER_MODELS):
+            ids = model_ids()
        assert len(ids) == len(set(ids)), "Duplicate model IDs found"


 class TestMenuLabels:
    def test_same_length_as_model_ids(self):
-        assert len(menu_labels()) == len(model_ids())
+        with patch("hermes_cli.models.fetch_openrouter_models", return_value=LIVE_OPENROUTER_MODELS):
+            assert len(menu_labels()) == len(model_ids())

    def test_first_label_marked_recommended(self):
-        labels = menu_labels()
+        with patch("hermes_cli.models.fetch_openrouter_models", return_value=LIVE_OPENROUTER_MODELS):
+            labels = menu_labels()
        assert "recommended" in labels[0].lower()

    def test_each_label_contains_its_model_id(self):
-        for label, mid in zip(menu_labels(), model_ids()):
-            assert mid in label, f"Label '{label}' doesn't contain model ID '{mid}'"
+        with patch("hermes_cli.models.fetch_openrouter_models", return_value=LIVE_OPENROUTER_MODELS):
+            for label, mid in zip(menu_labels(), model_ids()):
+                assert mid in label, f"Label '{label}' doesn't contain model ID '{mid}'"

    def test_non_recommended_labels_have_no_tag(self):
        """Only the first model should have (recommended)."""
-        labels = menu_labels()
+        with patch("hermes_cli.models.fetch_openrouter_models", return_value=LIVE_OPENROUTER_MODELS):
+            labels = menu_labels()
        for label in labels[1:]:
            assert "recommended" not in label.lower(), f"Unexpected 'recommended' in '{label}'"

@@ -65,30 +79,65 @@ class TestOpenRouterModels:
        assert len(OPENROUTER_MODELS) >= 5


+class TestFetchOpenRouterModels:
+    def test_live_fetch_recomputes_free_tags(self, monkeypatch):
+        class _Resp:
+            def __enter__(self):
+                return self
+
+            def __exit__(self, exc_type, exc, tb):
+                return False
+
+            def read(self):
+                return b'{"data":[{"id":"anthropic/claude-opus-4.6","pricing":{"prompt":"0.000015","completion":"0.000075"}},{"id":"qwen/qwen3.6-plus","pricing":{"prompt":"0.000000325","completion":"0.00000195"}},{"id":"nvidia/nemotron-3-super-120b-a12b:free","pricing":{"prompt":"0","completion":"0"}}]}'
+
+        monkeypatch.setattr(_models_mod, "_openrouter_catalog_cache", None)
+        with patch("hermes_cli.models.urllib.request.urlopen", return_value=_Resp()):
+            models = fetch_openrouter_models(force_refresh=True)
+
+        assert models == [
+            ("anthropic/claude-opus-4.6", "recommended"),
+            ("qwen/qwen3.6-plus", ""),
+            ("nvidia/nemotron-3-super-120b-a12b:free", "free"),
+        ]
+
+    def test_falls_back_to_static_snapshot_on_fetch_failure(self, monkeypatch):
+        monkeypatch.setattr(_models_mod, "_openrouter_catalog_cache", None)
+        with patch("hermes_cli.models.urllib.request.urlopen", side_effect=OSError("boom")):
+            models = fetch_openrouter_models(force_refresh=True)
+
+        assert models == OPENROUTER_MODELS
+
+
 class TestFindOpenrouterSlug:
    def test_exact_match(self):
        from hermes_cli.models import _find_openrouter_slug
-        assert _find_openrouter_slug("anthropic/claude-opus-4.6") == "anthropic/claude-opus-4.6"
+        with patch("hermes_cli.models.fetch_openrouter_models", return_value=LIVE_OPENROUTER_MODELS):
+            assert _find_openrouter_slug("anthropic/claude-opus-4.6") == "anthropic/claude-opus-4.6"

    def test_bare_name_match(self):
        from hermes_cli.models import _find_openrouter_slug
-        result = _find_openrouter_slug("claude-opus-4.6")
+        with patch("hermes_cli.models.fetch_openrouter_models", return_value=LIVE_OPENROUTER_MODELS):
+            result = _find_openrouter_slug("claude-opus-4.6")
        assert result == "anthropic/claude-opus-4.6"

    def test_case_insensitive(self):
        from hermes_cli.models import _find_openrouter_slug
-        result = _find_openrouter_slug("Anthropic/Claude-Opus-4.6")
+        with patch("hermes_cli.models.fetch_openrouter_models", return_value=LIVE_OPENROUTER_MODELS):
+            result = _find_openrouter_slug("Anthropic/Claude-Opus-4.6")
        assert result is not None

    def test_unknown_returns_none(self):
        from hermes_cli.models import _find_openrouter_slug
-        assert _find_openrouter_slug("totally-fake-model-xyz") is None
+        with patch("hermes_cli.models.fetch_openrouter_models", return_value=LIVE_OPENROUTER_MODELS):
+            assert _find_openrouter_slug("totally-fake-model-xyz") is None


 class TestDetectProviderForModel:
    def test_anthropic_model_detected(self):
        """claude-opus-4-6 should resolve to anthropic provider."""
-        result = detect_provider_for_model("claude-opus-4-6", "openai-codex")
+        with patch("hermes_cli.models.fetch_openrouter_models", return_value=LIVE_OPENROUTER_MODELS):
+            result = detect_provider_for_model("claude-opus-4-6", "openai-codex")
        assert result is not None
        assert result[0] == "anthropic"

@@ -105,7 +154,8 @@ class TestDetectProviderForModel:

    def test_openrouter_slug_match(self):
        """Models in the OpenRouter catalog should be found."""
-        result = detect_provider_for_model("anthropic/claude-opus-4.6", "openai-codex")
+        with patch("hermes_cli.models.fetch_openrouter_models", return_value=LIVE_OPENROUTER_MODELS):
+            result = detect_provider_for_model("anthropic/claude-opus-4.6", "openai-codex")
        assert result is not None
        assert result[0] == "openrouter"
        assert result[1] == "anthropic/claude-opus-4.6"
@@ -119,18 +169,21 @@ class TestDetectProviderForModel:
        ):
            monkeypatch.delenv(env_var, raising=False)
        """Bare model names should get mapped to full OpenRouter slugs."""
-        result = detect_provider_for_model("claude-opus-4.6", "openai-codex")
+        with patch("hermes_cli.models.fetch_openrouter_models", return_value=LIVE_OPENROUTER_MODELS):
+            result = detect_provider_for_model("claude-opus-4.6", "openai-codex")
        assert result is not None
        # Should find it on OpenRouter with full slug
        assert result[1] == "anthropic/claude-opus-4.6"

    def test_unknown_model_returns_none(self):
        """Completely unknown model names should return None."""
-        assert detect_provider_for_model("nonexistent-model-xyz", "openai-codex") is None
+        with patch("hermes_cli.models.fetch_openrouter_models", return_value=LIVE_OPENROUTER_MODELS):
+            assert detect_provider_for_model("nonexistent-model-xyz", "openai-codex") is None

    def test_aggregator_not_suggested(self):
        """nous/openrouter should never be auto-suggested as target provider."""
-        result = detect_provider_for_model("claude-opus-4-6", "openai-codex")
+        with patch("hermes_cli.models.fetch_openrouter_models", return_value=LIVE_OPENROUTER_MODELS):
+            result = detect_provider_for_model("claude-opus-4-6", "openai-codex")
        assert result is not None
        assert result[0] not in ("nous",)  # nous has claude models but shouldn't be suggested

@@ -142,6 +142,31 @@ def test_setup_custom_providers_synced(tmp_path, monkeypatch):
    assert reloaded.get("custom_providers") == [{"name": "Local", "base_url": "http://localhost:8080/v1"}]


+def test_setup_syncs_custom_provider_removal_from_disk(tmp_path, monkeypatch):
+    """Removing the last custom provider in model setup should persist."""
+    monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+    _clear_provider_env(monkeypatch)
+    _stub_tts(monkeypatch)
+
+    config = load_config()
+    config["custom_providers"] = [{"name": "Local", "base_url": "http://localhost:8080/v1"}]
+    save_config(config)
+
+    def fake_select():
+        cfg = load_config()
+        cfg["model"] = {"provider": "openrouter", "default": "anthropic/claude-opus-4.6"}
+        cfg["custom_providers"] = []
+        save_config(cfg)
+
+    monkeypatch.setattr("hermes_cli.main.select_provider_and_model", fake_select)
+
+    setup_model_provider(config)
+    save_config(config)
+
+    reloaded = load_config()
+    assert reloaded.get("custom_providers") == []
+
+
 def test_setup_cancel_preserves_existing_config(tmp_path, monkeypatch):
    """When the user cancels provider selection, existing config is preserved."""
    monkeypatch.setenv("HERMES_HOME", str(tmp_path))
@@ -201,6 +226,38 @@ def test_setup_keyboard_interrupt_gracefully_handled(tmp_path, monkeypatch):
    setup_model_provider(config)


+def test_select_provider_and_model_warns_if_named_custom_provider_disappears(
+    tmp_path, monkeypatch, capsys
+):
+    """If a saved custom provider is deleted mid-selection, show a warning instead of silently doing nothing."""
+    monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+    _clear_provider_env(monkeypatch)
+
+    cfg = load_config()
+    cfg["custom_providers"] = [{"name": "Local", "base_url": "http://localhost:8080/v1"}]
+    save_config(cfg)
+
+    def fake_prompt_provider_choice(choices, default=0):
+        current = load_config()
+        current["custom_providers"] = []
+        save_config(current)
+        return next(i for i, label in enumerate(choices) if label.startswith("Local (localhost:8080/v1)"))
+
+    monkeypatch.setattr("hermes_cli.auth.resolve_provider", lambda provider: None)
+    monkeypatch.setattr("hermes_cli.main._prompt_provider_choice", fake_prompt_provider_choice)
+    monkeypatch.setattr(
+        "hermes_cli.main._model_flow_named_custom",
+        lambda *args, **kwargs: (_ for _ in ()).throw(AssertionError("named custom flow should not run")),
+    )
+
+    from hermes_cli.main import select_provider_and_model
+
+    select_provider_and_model()
+
+    out = capsys.readouterr().out
+    assert "selected saved custom provider is no longer available" in out
+
+
 def test_codex_setup_uses_runtime_access_token_for_live_model_list(tmp_path, monkeypatch):
    """Codex model list fetching uses the runtime access token."""
    monkeypatch.setenv("HERMES_HOME", str(tmp_path))
@@ -0,0 +1,21 @@
+from pathlib import Path
+import subprocess
+
+
+REPO_ROOT = Path(__file__).resolve().parents[2]
+SETUP_SCRIPT = REPO_ROOT / "setup-hermes.sh"
+
+
+def test_setup_hermes_script_is_valid_shell():
+    result = subprocess.run(["bash", "-n", str(SETUP_SCRIPT)], capture_output=True, text=True)
+    assert result.returncode == 0, result.stderr
+
+
+def test_setup_hermes_script_has_termux_path():
+    content = SETUP_SCRIPT.read_text(encoding="utf-8")
+
+    assert "is_termux()" in content
+    assert ".[termux]" in content
+    assert "constraints-termux.txt" in content
+    assert "$PREFIX/bin" in content
+    assert "Skipping tinker-atropos on Termux" in content
@@ -230,6 +230,39 @@ def test_setup_same_provider_fallback_can_add_another_credential(tmp_path, monke
    assert config.get("credential_pool_strategies", {}).get("openrouter") == "fill_first"


+def test_setup_same_provider_single_credential_keeps_existing_rotation_strategy(tmp_path, monkeypatch):
+    monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+    _clear_provider_env(monkeypatch)
+    save_env_value("OPENROUTER_API_KEY", "or-key")
+
+    _write_model_config("openrouter", "", "anthropic/claude-opus-4.6")
+
+    config = load_config()
+    config["credential_pool_strategies"] = {"openrouter": "round_robin"}
+    save_config(config)
+
+    class _Entry:
+        def __init__(self, label):
+            self.label = label
+
+    class _Pool:
+        def entries(self):
+            return [_Entry("primary")]
+
+    def fake_select():
+        pass
+
+    monkeypatch.setattr("hermes_cli.main.select_provider_and_model", fake_select)
+    _stub_tts(monkeypatch)
+    monkeypatch.setattr("hermes_cli.setup.prompt", lambda *args, **kwargs: "")
+    monkeypatch.setattr("agent.credential_pool.load_pool", lambda provider: _Pool())
+    monkeypatch.setattr("agent.auxiliary_client.get_available_vision_backends", lambda: [])
+
+    setup_model_provider(config)
+
+    assert config.get("credential_pool_strategies", {}).get("openrouter") == "round_robin"
+
+
 def test_setup_pool_step_shows_manual_vs_auto_detected_counts(tmp_path, monkeypatch, capsys):
    monkeypatch.setenv("HERMES_HOME", str(tmp_path))
    _clear_provider_env(monkeypatch)
@@ -4,6 +4,7 @@ from argparse import Namespace
 from unittest.mock import MagicMock, patch

 import pytest
+from hermes_cli.config import DEFAULT_CONFIG, load_config, save_config


 def _make_setup_args(**overrides):
@@ -34,6 +35,36 @@ def _make_chat_args(**overrides):
 class TestNonInteractiveSetup:
    """Verify setup paths exit cleanly in headless/non-interactive environments."""

+    def test_cmd_setup_allows_noninteractive_flag_without_tty(self):
+        """The CLI entrypoint should not block --non-interactive before setup.py handles it."""
+        from hermes_cli.main import cmd_setup
+
+        args = _make_setup_args(non_interactive=True)
+
+        with (
+            patch("hermes_cli.setup.run_setup_wizard") as mock_run_setup,
+            patch("sys.stdin") as mock_stdin,
+        ):
+            mock_stdin.isatty.return_value = False
+            cmd_setup(args)
+
+        mock_run_setup.assert_called_once_with(args)
+
+    def test_cmd_setup_defers_no_tty_handling_to_setup_wizard(self):
+        """Bare `hermes setup` should reach setup.py, which prints headless guidance."""
+        from hermes_cli.main import cmd_setup
+
+        args = _make_setup_args(non_interactive=False)
+
+        with (
+            patch("hermes_cli.setup.run_setup_wizard") as mock_run_setup,
+            patch("sys.stdin") as mock_stdin,
+        ):
+            mock_stdin.isatty.return_value = False
+            cmd_setup(args)
+
+        mock_run_setup.assert_called_once_with(args)
+
    def test_non_interactive_flag_skips_wizard(self, capsys):
        """--non-interactive should print guidance and not enter the wizard."""
        from hermes_cli.setup import run_setup_wizard
@@ -72,6 +103,26 @@ class TestNonInteractiveSetup:
        out = capsys.readouterr().out
        assert "hermes config set model.provider custom" in out

+    def test_reset_flag_rewrites_config_before_noninteractive_exit(self, tmp_path, monkeypatch, capsys):
+        """--reset should rewrite config.yaml even when the wizard cannot run interactively."""
+        from hermes_cli.setup import run_setup_wizard
+
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+        cfg = load_config()
+        cfg["model"] = {"provider": "custom", "base_url": "http://localhost:8080/v1", "default": "llama3"}
+        cfg["agent"]["max_turns"] = 12
+        save_config(cfg)
+
+        args = _make_setup_args(non_interactive=True, reset=True)
+
+        run_setup_wizard(args)
+
+        reloaded = load_config()
+        assert reloaded["model"] == DEFAULT_CONFIG["model"]
+        assert reloaded["agent"]["max_turns"] == DEFAULT_CONFIG["agent"]["max_turns"]
+        out = capsys.readouterr().out
+        assert "Configuration reset to defaults." in out
+
    def test_chat_first_run_headless_skips_setup_prompt(self, capsys):
        """Bare `hermes` should not prompt for input when no provider exists and stdin is headless."""
        from hermes_cli.main import cmd_chat
@@ -117,7 +168,7 @@ class TestNonInteractiveSetup:
                side_effect=lambda key: "sk-test" if key == "OPENROUTER_API_KEY" else "",
            ),
            patch("hermes_cli.auth.get_active_provider", return_value=None),
-            patch.object(setup_mod, "prompt_choice", return_value=4),
+            patch.object(setup_mod, "prompt_choice", return_value=3),
            patch.object(
                setup_mod,
                "SETUP_SECTIONS",
@@ -137,3 +188,59 @@ class TestNonInteractiveSetup:

        terminal_section.assert_called_once_with(config)
        tts_section.assert_not_called()
+
+    def test_returning_user_menu_does_not_show_separator_rows(self, tmp_path):
+        """Returning-user menu should only show selectable actions."""
+        from hermes_cli import setup as setup_mod
+
+        args = _make_setup_args()
+        captured = {}
+
+        def fake_prompt_choice(question, choices, default=0):
+            captured["question"] = question
+            captured["choices"] = list(choices)
+            return len(choices) - 1
+
+        with (
+            patch.object(setup_mod, "ensure_hermes_home"),
+            patch.object(setup_mod, "load_config", return_value={}),
+            patch.object(setup_mod, "get_hermes_home", return_value=tmp_path),
+            patch.object(setup_mod, "is_interactive_stdin", return_value=True),
+            patch.object(
+                setup_mod,
+                "get_env_value",
+                side_effect=lambda key: "sk-test" if key == "OPENROUTER_API_KEY" else "",
+            ),
+            patch("hermes_cli.auth.get_active_provider", return_value=None),
+            patch.object(setup_mod, "prompt_choice", side_effect=fake_prompt_choice),
+        ):
+            setup_mod.run_setup_wizard(args)
+
+        assert captured["question"] == "What would you like to do?"
+        assert "---" not in captured["choices"]
+        assert captured["choices"] == [
+            "Quick Setup - configure missing items only",
+            "Full Setup - reconfigure everything",
+            "Model & Provider",
+            "Terminal Backend",
+            "Messaging Platforms (Gateway)",
+            "Tools",
+            "Agent Settings",
+            "Exit",
+        ]
+
+    def test_main_accepts_tts_setup_section(self, monkeypatch):
+        """`hermes setup tts` should parse and dispatch like other setup sections."""
+        from hermes_cli import main as main_mod
+
+        received = {}
+
+        def fake_cmd_setup(args):
+            received["section"] = args.section
+
+        monkeypatch.setattr(main_mod, "cmd_setup", fake_cmd_setup)
+        monkeypatch.setattr("sys.argv", ["hermes", "setup", "tts"])
+
+        main_mod.main()
+
+        assert received["section"] == "tts"
@@ -12,3 +12,33 @@ def test_show_status_includes_tavily_key(monkeypatch, capsys, tmp_path):
    output = capsys.readouterr().out
    assert "Tavily" in output
    assert "tvly...cdef" in output
+
+
+def test_show_status_termux_gateway_section_skips_systemctl(monkeypatch, capsys, tmp_path):
+    from hermes_cli import status as status_mod
+    import hermes_cli.auth as auth_mod
+    import hermes_cli.gateway as gateway_mod
+
+    monkeypatch.setenv("TERMUX_VERSION", "0.118.3")
+    monkeypatch.setenv("PREFIX", "/data/data/com.termux/files/usr")
+    monkeypatch.setattr(status_mod, "get_env_path", lambda: tmp_path / ".env", raising=False)
+    monkeypatch.setattr(status_mod, "get_hermes_home", lambda: tmp_path, raising=False)
+    monkeypatch.setattr(status_mod, "load_config", lambda: {"model": "gpt-5.4"}, raising=False)
+    monkeypatch.setattr(status_mod, "resolve_requested_provider", lambda requested=None: "openai-codex", raising=False)
+    monkeypatch.setattr(status_mod, "resolve_provider", lambda requested=None, **kwargs: "openai-codex", raising=False)
+    monkeypatch.setattr(status_mod, "provider_label", lambda provider: "OpenAI Codex", raising=False)
+    monkeypatch.setattr(auth_mod, "get_nous_auth_status", lambda: {}, raising=False)
+    monkeypatch.setattr(auth_mod, "get_codex_auth_status", lambda: {}, raising=False)
+    monkeypatch.setattr(gateway_mod, "find_gateway_pids", lambda exclude_pids=None: [], raising=False)
+
+    def _unexpected_systemctl(*args, **kwargs):
+        raise AssertionError("systemctl should not be called in the Termux status view")
+
+    monkeypatch.setattr(status_mod.subprocess, "run", _unexpected_systemctl)
+
+    status_mod.show_status(SimpleNamespace(all=False, deep=False))
+
+    output = capsys.readouterr().out
+    assert "Manager:      Termux / manual process" in output
+    assert "Start with:   hermes gateway" in output
+    assert "systemd (user)" not in output
@@ -0,0 +1,106 @@
+"""Regression tests for numbered fallbacks when TerminalMenu cannot initialize."""
+
+import subprocess
+import sys
+import types
+
+from hermes_cli.config import load_config, save_config
+
+
+class _BrokenTerminalMenu:
+    def __init__(self, *args, **kwargs):
+        raise subprocess.CalledProcessError(2, ["tput", "clear"])
+
+
+def test_prompt_model_selection_falls_back_on_terminalmenu_runtime_error(monkeypatch):
+    from hermes_cli.auth import _prompt_model_selection
+
+    monkeypatch.setitem(
+        sys.modules,
+        "simple_term_menu",
+        types.SimpleNamespace(TerminalMenu=_BrokenTerminalMenu),
+    )
+    responses = iter(["2"])
+    monkeypatch.setattr("builtins.input", lambda _prompt="": next(responses))
+
+    selected = _prompt_model_selection(["model-a", "model-b"])
+
+    assert selected == "model-b"
+
+
+def test_prompt_reasoning_effort_falls_back_on_terminalmenu_runtime_error(monkeypatch):
+    from hermes_cli.main import _prompt_reasoning_effort_selection
+
+    monkeypatch.setitem(
+        sys.modules,
+        "simple_term_menu",
+        types.SimpleNamespace(TerminalMenu=_BrokenTerminalMenu),
+    )
+    responses = iter(["3"])
+    monkeypatch.setattr("builtins.input", lambda _prompt="": next(responses))
+
+    selected = _prompt_reasoning_effort_selection(["low", "medium", "high"], current_effort="")
+
+    assert selected == "high"
+
+
+def test_remove_custom_provider_falls_back_on_terminalmenu_runtime_error(tmp_path, monkeypatch):
+    from hermes_cli.main import _remove_custom_provider
+
+    monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+    monkeypatch.setitem(
+        sys.modules,
+        "simple_term_menu",
+        types.SimpleNamespace(TerminalMenu=_BrokenTerminalMenu),
+    )
+
+    cfg = load_config()
+    cfg["custom_providers"] = [
+        {"name": "Local A", "base_url": "http://localhost:8001/v1"},
+        {"name": "Local B", "base_url": "http://localhost:8002/v1"},
+    ]
+    save_config(cfg)
+
+    responses = iter(["1"])
+    monkeypatch.setattr("builtins.input", lambda _prompt="": next(responses))
+
+    _remove_custom_provider(cfg)
+
+    reloaded = load_config()
+    assert reloaded["custom_providers"] == [
+        {"name": "Local B", "base_url": "http://localhost:8002/v1"},
+    ]
+
+
+def test_named_custom_provider_model_picker_falls_back_on_terminalmenu_runtime_error(tmp_path, monkeypatch):
+    from hermes_cli.main import _model_flow_named_custom
+
+    monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+    monkeypatch.setitem(
+        sys.modules,
+        "simple_term_menu",
+        types.SimpleNamespace(TerminalMenu=_BrokenTerminalMenu),
+    )
+    monkeypatch.setattr("hermes_cli.models.fetch_api_models", lambda *args, **kwargs: ["model-a", "model-b"])
+    monkeypatch.setattr("hermes_cli.auth.deactivate_provider", lambda: None)
+
+    cfg = load_config()
+    save_config(cfg)
+
+    responses = iter(["2"])
+    monkeypatch.setattr("builtins.input", lambda _prompt="": next(responses))
+
+    _model_flow_named_custom(
+        cfg,
+        {
+            "name": "Local",
+            "base_url": "http://localhost:8000/v1",
+            "api_key": "",
+            "model": "",
+        },
+    )
+
+    reloaded = load_config()
+    assert reloaded["model"]["provider"] == "custom"
+    assert reloaded["model"]["base_url"] == "http://localhost:8000/v1"
+    assert reloaded["model"]["default"] == "model-b"
@@ -213,8 +213,12 @@ def test_restore_stashed_changes_keeps_going_when_drop_fails(monkeypatch, tmp_pa
    assert "git stash drop stash@{0}" in out


-def test_restore_stashed_changes_prompts_before_reset_on_conflict(monkeypatch, tmp_path, capsys):
-    """When conflicts occur interactively, user is prompted before reset."""
+def test_restore_stashed_changes_always_resets_on_conflict(monkeypatch, tmp_path, capsys):
+    """Conflicts always auto-reset (no prompt) and return False, even interactively.
+
+    Leaving conflict markers in source files makes hermes unrunnable (SyntaxError).
+    The stash is preserved for manual recovery; cmd_update continues normally.
+    """
    calls = []

    def fake_run(cmd, **kwargs):
@@ -230,45 +234,19 @@ def test_restore_stashed_changes_prompts_before_reset_on_conflict(monkeypatch, t
    monkeypatch.setattr(hermes_main.subprocess, "run", fake_run)
    monkeypatch.setattr("builtins.input", lambda: "y")

-    with pytest.raises(SystemExit, match="1"):
-        hermes_main._restore_stashed_changes(["git"], tmp_path, "abc123", prompt_user=True)
+    result = hermes_main._restore_stashed_changes(["git"], tmp_path, "abc123", prompt_user=True)

+    assert result is False
    out = capsys.readouterr().out
    assert "Conflicted files:" in out
    assert "hermes_cli/main.py" in out
    assert "stashed changes are preserved" in out
-    assert "Reset working tree to clean state" in out
    assert "Working tree reset to clean state" in out
+    assert "git stash apply abc123" in out
    reset_calls = [c for c, _ in calls if c[1:3] == ["reset", "--hard"]]
    assert len(reset_calls) == 1


-def test_restore_stashed_changes_user_declines_reset(monkeypatch, tmp_path, capsys):
-    """When user declines reset, working tree is left as-is."""
-    calls = []
-
-    def fake_run(cmd, **kwargs):
-        calls.append((cmd, kwargs))
-        if cmd[1:3] == ["stash", "apply"]:
-            return SimpleNamespace(stdout="", stderr="conflict\n", returncode=1)
-        if cmd[1:3] == ["diff", "--name-only"]:
-            return SimpleNamespace(stdout="cli.py\n", stderr="", returncode=0)
-        raise AssertionError(f"unexpected command: {cmd}")
-
-    monkeypatch.setattr(hermes_main.subprocess, "run", fake_run)
-    # First input: "y" to restore, second input: "n" to decline reset
-    inputs = iter(["y", "n"])
-    monkeypatch.setattr("builtins.input", lambda: next(inputs))
-
-    with pytest.raises(SystemExit, match="1"):
-        hermes_main._restore_stashed_changes(["git"], tmp_path, "abc123", prompt_user=True)
-
-    out = capsys.readouterr().out
-    assert "left as-is" in out
-    reset_calls = [c for c, _ in calls if c[1:3] == ["reset", "--hard"]]
-    assert len(reset_calls) == 0
-
-
 def test_restore_stashed_changes_auto_resets_non_interactive(monkeypatch, tmp_path, capsys):
    """Non-interactive mode auto-resets without prompting and returns False
    instead of sys.exit(1) so the update can continue (gateway /update path)."""
@@ -368,9 +368,8 @@ class TestCmdUpdateLaunchdRestart:
        monkeypatch.setattr(
            gateway_cli, "is_macos", lambda: False,
        )
-        monkeypatch.setattr(
-            gateway_cli, "is_linux", lambda: True,
-        )
+        monkeypatch.setattr(gateway_cli, "supports_systemd_services", lambda: True)
+        monkeypatch.setattr(gateway_cli, "is_termux", lambda: False)

        mock_run.side_effect = _make_run_side_effect(
            commit_count="3",
@@ -429,7 +428,8 @@ class TestCmdUpdateSystemService:
    ):
        """When user systemd is inactive but a system service exists, restart via system scope."""
        monkeypatch.setattr(gateway_cli, "is_macos", lambda: False)
-        monkeypatch.setattr(gateway_cli, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway_cli, "supports_systemd_services", lambda: True)
+        monkeypatch.setattr(gateway_cli, "is_termux", lambda: False)

        mock_run.side_effect = _make_run_side_effect(
            commit_count="3",
@@ -458,7 +458,8 @@ class TestCmdUpdateSystemService:
    ):
        """When system service restart fails, show the failure message."""
        monkeypatch.setattr(gateway_cli, "is_macos", lambda: False)
-        monkeypatch.setattr(gateway_cli, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway_cli, "supports_systemd_services", lambda: True)
+        monkeypatch.setattr(gateway_cli, "is_termux", lambda: False)

        mock_run.side_effect = _make_run_side_effect(
            commit_count="3",
@@ -480,7 +481,8 @@ class TestCmdUpdateSystemService:
    ):
        """When both user and system services are active, both are restarted."""
        monkeypatch.setattr(gateway_cli, "is_macos", lambda: False)
-        monkeypatch.setattr(gateway_cli, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway_cli, "supports_systemd_services", lambda: True)
+        monkeypatch.setattr(gateway_cli, "is_termux", lambda: False)

        mock_run.side_effect = _make_run_side_effect(
            commit_count="3",
@@ -563,7 +565,8 @@ class TestServicePidExclusion:
    ):
        """After systemd restart, the sweep must exclude the service PID."""
        monkeypatch.setattr(gateway_cli, "is_macos", lambda: False)
-        monkeypatch.setattr(gateway_cli, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway_cli, "supports_systemd_services", lambda: True)
+        monkeypatch.setattr(gateway_cli, "is_termux", lambda: False)

        SERVICE_PID = 55000

@@ -642,7 +645,8 @@ class TestGetServicePids:
    """Unit tests for _get_service_pids()."""

    def test_returns_systemd_main_pid(self, monkeypatch):
-        monkeypatch.setattr(gateway_cli, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway_cli, "supports_systemd_services", lambda: True)
+        monkeypatch.setattr(gateway_cli, "is_termux", lambda: False)
        monkeypatch.setattr(gateway_cli, "is_macos", lambda: False)

        def fake_run(cmd, **kwargs):
@@ -691,7 +695,8 @@ class TestGetServicePids:

    def test_excludes_zero_pid(self, monkeypatch):
        """systemd returns MainPID=0 for stopped services; skip those."""
-        monkeypatch.setattr(gateway_cli, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway_cli, "supports_systemd_services", lambda: True)
+        monkeypatch.setattr(gateway_cli, "is_termux", lambda: False)
        monkeypatch.setattr(gateway_cli, "is_macos", lambda: False)

        def fake_run(cmd, **kwargs):
@@ -172,6 +172,87 @@ class TestHTTP413Compression:
        mock_compress.assert_called_once()
        assert result["completed"] is True

+    def test_413_clears_conversation_history_on_persist(self, agent):
+        """After 413-triggered compression, _persist_session must receive None history.
+
+        Bug: _compress_context() creates a new session and resets _last_flushed_db_idx=0,
+        but if conversation_history still holds the original (pre-compression) list,
+        _flush_messages_to_session_db computes flush_from = max(len(history), 0) which
+        exceeds len(compressed_messages), so messages[flush_from:] is empty and nothing
+        is written to the new session → "Session found but has no messages" on resume.
+        """
+        err_413 = _make_413_error()
+        ok_resp = _mock_response(content="OK", finish_reason="stop")
+        agent.client.chat.completions.create.side_effect = [err_413, ok_resp]
+
+        big_history = [
+            {"role": "user" if i % 2 == 0 else "assistant", "content": f"msg {i}"}
+            for i in range(200)
+        ]
+
+        persist_calls = []
+
+        with (
+            patch.object(agent, "_compress_context") as mock_compress,
+            patch.object(
+                agent, "_persist_session",
+                side_effect=lambda msgs, hist: persist_calls.append(hist),
+            ),
+            patch.object(agent, "_save_trajectory"),
+            patch.object(agent, "_cleanup_task_resources"),
+        ):
+            mock_compress.return_value = (
+                [{"role": "user", "content": "summary"}],
+                "compressed prompt",
+            )
+            agent.run_conversation("hello", conversation_history=big_history)
+
+        assert len(persist_calls) >= 1, "Expected at least one _persist_session call"
+        for hist in persist_calls:
+            assert hist is None, (
+                f"conversation_history should be None after mid-loop compression, "
+                f"got list with {len(hist)} items"
+            )
+
+    def test_context_overflow_clears_conversation_history_on_persist(self, agent):
+        """After context-overflow compression, _persist_session must receive None history."""
+        err_400 = Exception(
+            "Error code: 400 - This endpoint's maximum context length is 128000 tokens. "
+            "However, you requested about 270460 tokens."
+        )
+        err_400.status_code = 400
+        ok_resp = _mock_response(content="OK", finish_reason="stop")
+        agent.client.chat.completions.create.side_effect = [err_400, ok_resp]
+
+        big_history = [
+            {"role": "user" if i % 2 == 0 else "assistant", "content": f"msg {i}"}
+            for i in range(200)
+        ]
+
+        persist_calls = []
+
+        with (
+            patch.object(agent, "_compress_context") as mock_compress,
+            patch.object(
+                agent, "_persist_session",
+                side_effect=lambda msgs, hist: persist_calls.append(hist),
+            ),
+            patch.object(agent, "_save_trajectory"),
+            patch.object(agent, "_cleanup_task_resources"),
+        ):
+            mock_compress.return_value = (
+                [{"role": "user", "content": "summary"}],
+                "compressed prompt",
+            )
+            agent.run_conversation("hello", conversation_history=big_history)
+
+        assert len(persist_calls) >= 1
+        for hist in persist_calls:
+            assert hist is None, (
+                f"conversation_history should be None after context-overflow compression, "
+                f"got list with {len(hist)} items"
+            )
+
    def test_400_context_length_triggers_compression(self, agent):
        """A 400 with 'maximum context length' should trigger compression, not abort as generic 4xx.

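Reviewer note: the docstring of `test_413_clears_conversation_history_on_persist` describes an index problem that is easier to see with concrete numbers. The sketch below is a minimal, hypothetical stand-in for the slice arithmetic it quotes. `messages_to_flush` and its parameters are illustrative names, not the real `_flush_messages_to_session_db` signature.

```python
def messages_to_flush(messages, conversation_history, last_flushed_idx):
    """Return the slice of `messages` that still needs to be written."""
    # Hypothetical helper: mirrors "flush_from = max(len(history), 0)" from the docstring.
    history_len = len(conversation_history) if conversation_history else 0
    flush_from = max(history_len, last_flushed_idx)
    return messages[flush_from:]


stale_history = [{"role": "user", "content": f"msg {i}"} for i in range(200)]
compressed = [{"role": "user", "content": "summary"}]

# Stale 200-item history: the slice starts past the end, so nothing is written.
lost = messages_to_flush(compressed, stale_history, last_flushed_idx=0)

# History cleared to None after compression: the summary is flushed.
kept = messages_to_flush(compressed, None, last_flushed_idx=0)
```

Under these assumptions, passing `None` as the history after compression is what lets the compressed messages reach the new session.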
@@ -225,6 +225,26 @@ class TestDeveloperRoleSwap:
        assert kwargs["messages"][0]["role"] == "developer"


+class TestBuildApiKwargsChatCompletionsServiceTier:
+    """service_tier via request_overrides works on the chat_completions path."""
+
+    def test_includes_service_tier_via_request_overrides(self, monkeypatch):
+        agent = _make_agent(monkeypatch, "openrouter")
+        agent.model = "gpt-4.1"
+        agent.request_overrides = {"service_tier": "priority"}
+        messages = [{"role": "user", "content": "hi"}]
+        kwargs = agent._build_api_kwargs(messages)
+        assert kwargs["service_tier"] == "priority"
+
+    def test_no_service_tier_when_overrides_empty(self, monkeypatch):
+        agent = _make_agent(monkeypatch, "openrouter")
+        agent.model = "gpt-4.1"
+        agent.request_overrides = {}
+        messages = [{"role": "user", "content": "hi"}]
+        kwargs = agent._build_api_kwargs(messages)
+        assert "service_tier" not in kwargs
+
+
 class TestBuildApiKwargsAIGateway:
    def test_uses_chat_completions_format(self, monkeypatch):
        agent = _make_agent(monkeypatch, "ai-gateway", base_url="https://ai-gateway.vercel.sh/v1")
@@ -356,6 +376,25 @@ class TestBuildApiKwargsCodex:
        assert "reasoning" in kwargs
        assert kwargs["reasoning"]["effort"] == "medium"

+    def test_includes_service_tier_via_request_overrides(self, monkeypatch):
+        agent = _make_agent(monkeypatch, "openai-codex", api_mode="codex_responses",
+                            base_url="https://chatgpt.com/backend-api/codex")
+        agent.model = "gpt-5.4"
+        agent.service_tier = "priority"
+        agent.request_overrides = {"service_tier": "priority"}
+        messages = [{"role": "user", "content": "hi"}]
+        kwargs = agent._build_api_kwargs(messages)
+        assert kwargs["service_tier"] == "priority"
+
+    def test_omits_max_output_tokens_for_codex_backend(self, monkeypatch):
+        agent = _make_agent(monkeypatch, "openai-codex", api_mode="codex_responses",
+                            base_url="https://chatgpt.com/backend-api/codex")
+        agent.model = "gpt-5.4"
+        agent.max_tokens = 20
+        messages = [{"role": "user", "content": "hi"}]
+        kwargs = agent._build_api_kwargs(messages)
+        assert "max_output_tokens" not in kwargs
+
    def test_includes_encrypted_content_in_include(self, monkeypatch):
        agent = _make_agent(monkeypatch, "openai-codex", api_mode="codex_responses",
                            base_url="https://chatgpt.com/backend-api/codex")
@@ -5,6 +5,7 @@ pieces. The OpenAI client and tool loading are mocked so no network calls
 are made.
 """

+import io
 import json
 import logging
 import re
@@ -1061,6 +1062,77 @@ class TestExecuteToolCalls:
        assert len(messages[0]["content"]) < 150_000
        assert ("Truncated" in messages[0]["content"] or "<persisted-output>" in messages[0]["content"])

+    def test_quiet_tool_output_suppressed_when_progress_callback_present(self, agent):
+        tc = _mock_tool_call(name="web_search", arguments='{"q":"test"}', call_id="c1")
+        mock_msg = _mock_assistant_msg(content="", tool_calls=[tc])
+        messages = []
+        agent.tool_progress_callback = lambda *args, **kwargs: None
+
+        with patch("run_agent.handle_function_call", return_value="search result"), \
+             patch.object(agent, "_safe_print") as mock_print:
+            agent._execute_tool_calls(mock_msg, messages, "task-1")
+
+        mock_print.assert_not_called()
+        assert len(messages) == 1
+        assert messages[0]["role"] == "tool"
+
+    def test_quiet_tool_output_prints_without_progress_callback(self, agent):
+        tc = _mock_tool_call(name="web_search", arguments='{"q":"test"}', call_id="c1")
+        mock_msg = _mock_assistant_msg(content="", tool_calls=[tc])
+        messages = []
+        agent.tool_progress_callback = None
+
+        with patch("run_agent.handle_function_call", return_value="search result"), \
+             patch.object(agent, "_safe_print") as mock_print:
+            agent._execute_tool_calls(mock_msg, messages, "task-1")
+
+        mock_print.assert_called_once()
+        assert "search" in str(mock_print.call_args.args[0]).lower()
+        assert len(messages) == 1
+        assert messages[0]["role"] == "tool"
+
+    def test_vprint_suppressed_in_parseable_quiet_mode(self, agent):
+        agent.suppress_status_output = True
+
+        with patch.object(agent, "_safe_print") as mock_print:
+            agent._vprint("status line", force=True)
+            agent._vprint("normal line")
+
+        mock_print.assert_not_called()
+
+    def test_run_conversation_suppresses_retry_noise_in_parseable_quiet_mode(self, agent):
+        class _RateLimitError(Exception):
+            status_code = 429
+
+            def __str__(self):
+                return "Error code: 429 - Rate limit exceeded."
+
+        responses = [_RateLimitError(), _mock_response(content="Recovered")]
+
+        def _fake_api_call(api_kwargs):
+            result = responses.pop(0)
+            if isinstance(result, Exception):
+                raise result
+            return result
+
+        agent.suppress_status_output = True
+        agent._interruptible_api_call = _fake_api_call
+        agent._persist_session = lambda *args, **kwargs: None
+        agent._save_trajectory = lambda *args, **kwargs: None
+        agent._save_session_log = lambda *args, **kwargs: None
+
+        captured = io.StringIO()
+        agent._print_fn = lambda *args, **kw: print(*args, file=captured, **kw)
+
+        with patch("run_agent.time.sleep", return_value=None):
+            result = agent.run_conversation("hello")
+
+        assert result["completed"] is True
+        assert result["final_response"] == "Recovered"
+        output = captured.getvalue()
+        assert "API call failed" not in output
+        assert "Rate limit reached" not in output
+

 class TestConcurrentToolExecution:
    """Tests for _execute_tool_calls_concurrent and dispatch logic."""
@@ -1877,6 +1949,68 @@ class TestRunConversation:
        assert result["final_response"] is not None
        assert "Thinking Budget Exhausted" in result["final_response"]

+    def test_length_with_tool_calls_returns_partial_without_executing_tools(self, agent):
+        self._setup_agent(agent)
+        bad_tc = _mock_tool_call(
+            name="write_file",
+            arguments='{"path":"report.md","content":"partial',
+            call_id="c1",
+        )
+        resp = _mock_response(content="", finish_reason="length", tool_calls=[bad_tc])
+        agent.client.chat.completions.create.return_value = resp
+
+        with (
+            patch("run_agent.handle_function_call") as mock_handle_function_call,
+            patch.object(agent, "_persist_session"),
+            patch.object(agent, "_save_trajectory"),
+            patch.object(agent, "_cleanup_task_resources"),
+        ):
+            result = agent.run_conversation("write the report")
+
+        assert result["completed"] is False
+        assert result["partial"] is True
+        assert "truncated due to output length limit" in result["error"]
+        mock_handle_function_call.assert_not_called()
+
+    def test_truncated_tool_call_retries_once_before_refusing(self, agent):
+        """When tool call args are truncated, the agent retries the API call
+        once. If the retry succeeds (valid JSON args), tool execution proceeds."""
+        self._setup_agent(agent)
+        agent.valid_tool_names.add("write_file")
+        bad_tc = _mock_tool_call(
+            name="write_file",
+            arguments='{"path":"report.md","content":"partial',
+            call_id="c1",
+        )
+        truncated_resp = _mock_response(
+            content="", finish_reason="length", tool_calls=[bad_tc],
+        )
+        good_tc = _mock_tool_call(
+            name="write_file",
+            arguments='{"path":"report.md","content":"full content"}',
+            call_id="c2",
+        )
+        good_resp = _mock_response(
+            content="", finish_reason="stop", tool_calls=[good_tc],
+        )
+        with (
+            patch("run_agent.handle_function_call", return_value='{"success":true}') as mock_hfc,
+            patch.object(agent, "_persist_session"),
+            patch.object(agent, "_save_trajectory"),
+            patch.object(agent, "_cleanup_task_resources"),
+        ):
+            # First call: truncated → retry. Second: valid → execute tool.
+            # Third: final text response.
+            final_resp = _mock_response(content="Done!", finish_reason="stop")
+            agent.client.chat.completions.create.side_effect = [
+                truncated_resp, good_resp, final_resp,
+            ]
+            result = agent.run_conversation("write the report")
+
+        # Tool was executed on the retry (good_resp)
+        mock_hfc.assert_called_once()
+        assert result["final_response"] == "Done!"
+

 class TestRetryExhaustion:
    """Regression: retry_count > max_retries was dead code (off-by-one).
@@ -3010,6 +3144,20 @@ class TestStreamingApiCall:
        assert tc[0].function.name == "search"
        assert tc[1].function.name == "read"

+    def test_truncated_tool_call_args_upgrade_finish_reason_to_length(self, agent):
+        chunks = [
+            _make_chunk(tool_calls=[_make_tc_delta(0, "call_1", "write_file", '{"path":"x.txt","content":"hel')]),
+        ]
+        agent.client.chat.completions.create.return_value = iter(chunks)
+
+        resp = agent._interruptible_streaming_api_call({"messages": []})
+
+        tc = resp.choices[0].message.tool_calls
+        assert len(tc) == 1
+        assert tc[0].function.name == "write_file"
+        assert tc[0].function.arguments == '{"path":"x.txt","content":"hel'
+        assert resp.choices[0].finish_reason == "length"
+
    def test_ollama_reused_index_separate_tool_calls(self, agent):
        """Ollama sends every tool call at index 0 with different ids.

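Reviewer note: `test_truncated_tool_call_args_upgrade_finish_reason_to_length` expects a streamed tool call whose arguments stop mid-JSON to be reported with `finish_reason == "length"`. One plausible way to detect that condition is to try parsing each argument string. The helper below is a sketch under that assumption; `effective_finish_reason` is a hypothetical name and the real streaming code may use a different check.

```python
import json


def effective_finish_reason(finish_reason, tool_call_arguments):
    """Upgrade the finish reason to "length" if any argument string is not valid JSON."""
    for raw in tool_call_arguments:
        try:
            json.loads(raw)
        except json.JSONDecodeError:
            # Assumption: unparseable arguments mean the output was cut off.
            return "length"
    return finish_reason


truncated = effective_finish_reason(None, ['{"path":"x.txt","content":"hel'])
complete = effective_finish_reason("stop", ['{"path":"x.txt","content":"hello"}'])
```

Note that this sketch also treats an empty argument string as truncated, which may or may not match how the agent handles tools that take no arguments.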
@@ -648,6 +648,15 @@ def test_preflight_codex_api_kwargs_allows_reasoning_and_temperature(monkeypatch
    assert result["max_output_tokens"] == 4096


+def test_preflight_codex_api_kwargs_allows_service_tier(monkeypatch):
+    agent = _build_agent(monkeypatch)
+    kwargs = _codex_request_kwargs()
+    kwargs["service_tier"] = "priority"
+
+    result = agent._preflight_codex_api_kwargs(kwargs)
+    assert result["service_tier"] == "priority"
+
+
 def test_run_conversation_codex_replay_payload_keeps_call_id(monkeypatch):
    agent = _build_agent(monkeypatch)
    responses = [_codex_tool_call_response(), _codex_message_response("done")]
@@ -13,6 +13,7 @@ from tools.browser_tool import (
    _find_agent_browser,
    _run_browser_command,
    _SANE_PATH,
+    check_browser_requirements,
 )


@@ -149,6 +150,31 @@ class TestFindAgentBrowser:
                _find_agent_browser()


+class TestBrowserRequirements:
+    def test_termux_requires_real_agent_browser_install_not_npx_fallback(self, monkeypatch):
+        monkeypatch.setenv("TERMUX_VERSION", "0.118.3")
+        monkeypatch.setenv("PREFIX", "/data/data/com.termux/files/usr")
+        monkeypatch.setattr("tools.browser_tool._is_camofox_mode", lambda: False)
+        monkeypatch.setattr("tools.browser_tool._get_cloud_provider", lambda: None)
+        monkeypatch.setattr("tools.browser_tool._find_agent_browser", lambda: "npx agent-browser")
+
+        assert check_browser_requirements() is False
+
+
+class TestRunBrowserCommandTermuxFallback:
+    def test_termux_local_mode_rejects_bare_npx_fallback(self, monkeypatch):
+        monkeypatch.setenv("TERMUX_VERSION", "0.118.3")
+        monkeypatch.setenv("PREFIX", "/data/data/com.termux/files/usr")
+        monkeypatch.setattr("tools.browser_tool._find_agent_browser", lambda: "npx agent-browser")
+        monkeypatch.setattr("tools.browser_tool._get_cloud_provider", lambda: None)
+
+        result = _run_browser_command("task-1", "navigate", ["https://example.com"])
+
+        assert result["success"] is False
+        assert "bare npx fallback" in result["error"]
+        assert "agent-browser install" in result["error"]
+
+
 class TestRunBrowserCommandPathConstruction:
    """Verify _run_browser_command() includes Homebrew node dirs in subprocess PATH."""

@@ -35,6 +35,7 @@ from hermes_cli.clipboard import (
    _windows_has_image,
    _convert_to_png,
 )
+from cli import _should_auto_attach_clipboard_image_on_paste

 FAKE_PNG = b"\x89PNG\r\n\x1a\n" + b"\x00" * 100
 FAKE_BMP = b"BM" + b"\x00" * 100
@@ -919,6 +920,48 @@ class TestTryAttachClipboardImage:
        assert path.suffix == ".png"


+class TestAutoAttachClipboardImageOnPaste:
+    def test_skips_auto_attach_for_plain_text_paste(self):
+        assert _should_auto_attach_clipboard_image_on_paste("hello world") is False
+
+    def test_skips_auto_attach_for_whitespace_and_text_paste(self):
+        assert _should_auto_attach_clipboard_image_on_paste("  hello world  ") is False
+
+    def test_allows_auto_attach_for_empty_paste(self):
+        assert _should_auto_attach_clipboard_image_on_paste("") is True
+
+    def test_allows_auto_attach_for_whitespace_only_paste(self):
+        assert _should_auto_attach_clipboard_image_on_paste("   \n\t  ") is True
+
+
+class TestVoiceSubmission:
+    @pytest.fixture
+    def cli(self):
+        from cli import HermesCLI
+        cli_obj = HermesCLI.__new__(HermesCLI)
+        cli_obj._attached_images = [Path("/tmp/stale.png")]
+        cli_obj._pending_input = queue.Queue()
+        cli_obj._voice_lock = MagicMock()
+        cli_obj._voice_processing = True
+        cli_obj._voice_recording = True
+        cli_obj._voice_continuous = False
+        cli_obj._no_speech_count = 0
+        cli_obj._voice_recorder = MagicMock()
+        cli_obj._voice_recorder.stop.return_value = "/tmp/fake.wav"
+        cli_obj._app = None
+        return cli_obj
+
+    def test_voice_transcript_clears_stale_attached_images(self, cli):
+        with patch("tools.voice_mode.play_beep"):
+            with patch("tools.voice_mode.transcribe_recording", return_value={"success": True, "transcript": "hello"}):
+                with patch("os.path.isfile", return_value=False):
+                    with patch("cli._cprint"):
+                        cli._voice_stop_and_transcribe()
+
+        assert cli._attached_images == []
+        assert cli._pending_input.get_nowait() == "hello"
+
+
 # ═════════════════════════════════════════════════════════════════════════
 # Level 4: Queue routing — tuple unpacking in process_loop
 # ═════════════════════════════════════════════════════════════════════════
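Reviewer note: the four cases in `TestAutoAttachClipboardImageOnPaste` are all satisfied by a predicate that attaches a clipboard image only when the pasted text is empty or whitespace. The function below is a minimal sketch of that behavior; `should_auto_attach` is a hypothetical stand-in, and the actual `_should_auto_attach_clipboard_image_on_paste` in `cli` may do more.

```python
def should_auto_attach(pasted_text: str) -> bool:
    """Return True only when the paste carries no visible text."""
    # Any non-whitespace character means the user pasted text, not an image.
    return not pasted_text.strip()
```

Read this way, a plain text paste is never paired with whatever image happens to be sitting on the clipboard.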
@@ -44,6 +44,7 @@ from tools.code_execution_tool import (
    build_execute_code_schema,
    EXECUTE_CODE_SCHEMA,
    _TOOL_DOC_LINES,
+    _execute_remote,
 )


@@ -115,6 +116,48 @@ class TestHermesToolsGeneration(unittest.TestCase):
        self.assertIn("def retry(", src)
        self.assertIn("import json, os, socket, shlex, time", src)

+    def test_file_transport_uses_tempfile_fallback_for_rpc_dir(self):
+        src = generate_hermes_tools_module(["terminal"], transport="file")
+        self.assertIn("import json, os, shlex, tempfile, time", src)
+        self.assertIn("os.path.join(tempfile.gettempdir(), \"hermes_rpc\")", src)
+        self.assertNotIn('os.environ.get("HERMES_RPC_DIR", "/tmp/hermes_rpc")', src)
+
+
+class TestExecuteCodeRemoteTempDir(unittest.TestCase):
+    def test_execute_remote_uses_backend_temp_dir_for_sandbox(self):
+        class FakeEnv:
+            def __init__(self):
+                self.commands = []
+
+            def get_temp_dir(self):
+                return "/data/data/com.termux/files/usr/tmp"
+
+            def execute(self, command, cwd=None, timeout=None):
+                self.commands.append((command, cwd, timeout))
+                if "command -v python3" in command:
+                    return {"output": "OK\n"}
+                if "python3 script.py" in command:
+                    return {"output": "hello\n", "returncode": 0}
+                return {"output": ""}
+
+        env = FakeEnv()
+        fake_thread = MagicMock()
+
+        with patch("tools.code_execution_tool._load_config", return_value={"timeout": 30, "max_tool_calls": 5}), \
+             patch("tools.code_execution_tool._get_or_create_env", return_value=(env, "ssh")), \
+             patch("tools.code_execution_tool._ship_file_to_remote"), \
+             patch("tools.code_execution_tool.threading.Thread", return_value=fake_thread):
+            result = json.loads(_execute_remote("print('hello')", "task-1", ["terminal"]))
+
+        self.assertEqual(result["status"], "success")
+        mkdir_cmd = env.commands[1][0]
+        run_cmd = next(cmd for cmd, _, _ in env.commands if "python3 script.py" in cmd)
+        cleanup_cmd = env.commands[-1][0]
+        self.assertIn("mkdir -p /data/data/com.termux/files/usr/tmp/hermes_exec_", mkdir_cmd)
+        self.assertIn("HERMES_RPC_DIR=/data/data/com.termux/files/usr/tmp/hermes_exec_", run_cmd)
+        self.assertIn("rm -rf /data/data/com.termux/files/usr/tmp/hermes_exec_", cleanup_cmd)
+        self.assertNotIn("mkdir -p /tmp/hermes_exec_", mkdir_cmd)
+

 @unittest.skipIf(sys.platform == "win32", "UDS not available on Windows")
 class TestExecuteCode(unittest.TestCase):
@@ -0,0 +1,51 @@
+from unittest.mock import patch
+
+from tools.environments.local import LocalEnvironment
+
+
+class TestLocalTempDir:
+    def test_uses_os_tmpdir_for_session_artifacts(self, monkeypatch):
+        monkeypatch.setenv("TMPDIR", "/data/data/com.termux/files/usr/tmp")
+        monkeypatch.delenv("TMP", raising=False)
+        monkeypatch.delenv("TEMP", raising=False)
+
+        with patch.object(LocalEnvironment, "init_session", autospec=True, return_value=None):
+            env = LocalEnvironment(cwd=".", timeout=10)
+
+        assert env.get_temp_dir() == "/data/data/com.termux/files/usr/tmp"
+        assert env._snapshot_path == f"/data/data/com.termux/files/usr/tmp/hermes-snap-{env._session_id}.sh"
+        assert env._cwd_file == f"/data/data/com.termux/files/usr/tmp/hermes-cwd-{env._session_id}.txt"
+
+    def test_prefers_backend_env_tmpdir_override(self, monkeypatch):
+        monkeypatch.delenv("TMPDIR", raising=False)
+        monkeypatch.delenv("TMP", raising=False)
+        monkeypatch.delenv("TEMP", raising=False)
+
+        with patch.object(LocalEnvironment, "init_session", autospec=True, return_value=None):
+            env = LocalEnvironment(
+                cwd=".",
+                timeout=10,
+                env={"TMPDIR": "/data/data/com.termux/files/home/.cache/hermes-tmp/"},
+            )
+
+        assert env.get_temp_dir() == "/data/data/com.termux/files/home/.cache/hermes-tmp"
+        assert env._snapshot_path == (
+            f"/data/data/com.termux/files/home/.cache/hermes-tmp/hermes-snap-{env._session_id}.sh"
+        )
+        assert env._cwd_file == (
+            f"/data/data/com.termux/files/home/.cache/hermes-tmp/hermes-cwd-{env._session_id}.txt"
+        )
+
+    def test_falls_back_to_tempfile_when_tmp_missing(self, monkeypatch):
+        monkeypatch.delenv("TMPDIR", raising=False)
+        monkeypatch.delenv("TMP", raising=False)
+        monkeypatch.delenv("TEMP", raising=False)
+
+        with patch("tools.environments.local.os.path.isdir", return_value=False), \
+             patch("tools.environments.local.os.access", return_value=False), \
+             patch("tools.environments.local.tempfile.gettempdir", return_value="/cache/tmp"), \
+             patch.object(LocalEnvironment, "init_session", autospec=True, return_value=None):
+            env = LocalEnvironment(cwd=".", timeout=10)
+            assert env.get_temp_dir() == "/cache/tmp"
+            assert env._snapshot_path == f"/cache/tmp/hermes-snap-{env._session_id}.sh"
+            assert env._cwd_file == f"/cache/tmp/hermes-cwd-{env._session_id}.txt"
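Reviewer note: the three `TestLocalTempDir` cases suggest a lookup order of backend `env` override first, then the process environment, then `tempfile.gettempdir()`, with trailing slashes stripped. The sketch below encodes that order as an assumption. `resolve_temp_dir` is a hypothetical helper, and it omits the directory-existence checks that the third test hints at by patching `os.path.isdir` and `os.access`.

```python
import os
import tempfile


def resolve_temp_dir(env_overrides=None, environ=None):
    """Pick a temp directory: backend override, then process env, then tempfile."""
    environ = os.environ if environ is None else environ
    for source in (env_overrides or {}, environ):
        for name in ("TMPDIR", "TMP", "TEMP"):
            value = source.get(name)
            if value:
                # Normalize "/some/dir/" to "/some/dir", but keep a bare "/".
                return value.rstrip("/") or "/"
    return tempfile.gettempdir()


override = resolve_temp_dir(
    {"TMPDIR": "/data/data/com.termux/files/home/.cache/hermes-tmp/"}, {}
)
from_env = resolve_temp_dir(None, {"TMPDIR": "/data/data/com.termux/files/usr/tmp"})
```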
@@ -135,6 +135,64 @@ class TestReadLog:
        assert "5 lines" in result["showing"]


+# =========================================================================
+# Stdin helpers
+# =========================================================================
+
+class TestStdinHelpers:
+    def test_close_stdin_not_found(self, registry):
+        result = registry.close_stdin("nonexistent")
+        assert result["status"] == "not_found"
+
+    def test_close_stdin_pipe_mode(self, registry):
+        proc = MagicMock()
+        proc.stdin = MagicMock()
+        s = _make_session()
+        s.process = proc
+        registry._running[s.id] = s
+
+        result = registry.close_stdin(s.id)
+
+        proc.stdin.close.assert_called_once()
+        assert result["status"] == "ok"
+
+    def test_close_stdin_pty_mode(self, registry):
+        pty = MagicMock()
+        s = _make_session()
+        s._pty = pty
+        registry._running[s.id] = s
+
+        result = registry.close_stdin(s.id)
+
+        pty.sendeof.assert_called_once()
+        assert result["status"] == "ok"
+
+    def test_close_stdin_allows_eof_driven_process_to_finish(self, registry, tmp_path):
+        session = registry.spawn_local(
+            'python3 -c "import sys; print(sys.stdin.read().strip())"',
+            cwd=str(tmp_path),
+            use_pty=False,
+        )
+
+        try:
+            time.sleep(0.5)
+            assert registry.submit_stdin(session.id, "hello")["status"] == "ok"
+            assert registry.close_stdin(session.id)["status"] == "ok"
+
+            deadline = time.time() + 5
+            while time.time() < deadline:
+                poll = registry.poll(session.id)
+                if poll["status"] == "exited":
+                    assert poll["exit_code"] == 0
+                    assert "hello" in poll["output_preview"]
+                    return
+                time.sleep(0.2)
+
+            pytest.fail("process did not exit after stdin was closed")
+        finally:
+            registry.kill_process(session.id)
+
+
 # =========================================================================
 # List sessions
 # =========================================================================
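Reviewer note: `test_close_stdin_allows_eof_driven_process_to_finish` relies on a general property of pipes: a child that reads stdin until EOF only exits once the writer closes its end. The standalone sketch below shows the same effect with plain `subprocess`, outside the registry. `run_until_eof` is a hypothetical helper and does not use `spawn_local`, `submit_stdin`, or `close_stdin`.

```python
import subprocess
import sys


def run_until_eof(text):
    """Feed `text` to a child that reads stdin to EOF; return (exit_code, output)."""
    proc = subprocess.Popen(
        [sys.executable, "-c", "import sys; print(sys.stdin.read().strip())"],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    proc.stdin.write(text)
    # Closing stdin delivers EOF; without it, sys.stdin.read() would block forever.
    proc.stdin.close()
    output = proc.stdout.read()
    proc.stdout.close()
    exit_code = proc.wait(timeout=5)
    return exit_code, output.strip()
```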
@@ -282,6 +340,67 @@ class TestSpawnEnvSanitization:
        assert f"{_HERMES_PROVIDER_ENV_FORCE_PREFIX}TELEGRAM_BOT_TOKEN" not in env
        assert env["PYTHONUNBUFFERED"] == "1"

+    def test_spawn_via_env_uses_backend_temp_dir_for_artifacts(self, registry):
+        class FakeEnv:
+            def __init__(self):
+                self.commands = []
+
+            def get_temp_dir(self):
+                return "/data/data/com.termux/files/usr/tmp"
+
+            def execute(self, command, timeout=None):
+                self.commands.append((command, timeout))
+                return {"output": "4321\n"}
+
+        env = FakeEnv()
+        fake_thread = MagicMock()
+
+        with patch("tools.process_registry.threading.Thread", return_value=fake_thread), \
+            patch.object(registry, "_write_checkpoint"):
+            session = registry.spawn_via_env(env, "echo hello")
+
+        bg_command = env.commands[0][0]
+        assert session.pid == 4321
+        assert "/data/data/com.termux/files/usr/tmp/hermes_bg_" in bg_command
+        assert ".exit" in bg_command
+        assert "rc=$?;" in bg_command
+        assert " > /tmp/hermes_bg_" not in bg_command
+        assert "cat /tmp/hermes_bg_" not in bg_command
+        fake_thread.start.assert_called_once()
+
+    def test_env_poller_quotes_temp_paths_with_spaces(self, registry):
+        session = _make_session(sid="proc_space")
+        session.exited = False
+
+        class FakeEnv:
+            def __init__(self):
+                self.commands = []
+                self._responses = iter([
+                    {"output": "hello\n"},
+                    {"output": "1\n"},
+                    {"output": "0\n"},
+                ])
+
+            def execute(self, command, timeout=None):
+                self.commands.append((command, timeout))
+                return next(self._responses)
+
+        env = FakeEnv()
+
+        with patch("tools.process_registry.time.sleep", return_value=None), \
+            patch.object(registry, "_move_to_finished"):
+            registry._env_poller_loop(
+                session,
+                env,
+                "/path with spaces/hermes_bg.log",
+                "/path with spaces/hermes_bg.pid",
+                "/path with spaces/hermes_bg.exit",
+            )
+
+        assert env.commands[0][0] == "cat '/path with spaces/hermes_bg.log' 2>/dev/null"
+        assert env.commands[1][0] == "kill -0 \"$(cat '/path with spaces/hermes_bg.pid' 2>/dev/null)\" 2>/dev/null; echo $?"
+        assert env.commands[2][0] == "cat '/path with spaces/hermes_bg.exit' 2>/dev/null"
+

 # =========================================================================
 # Checkpoint
@@ -0,0 +1,187 @@
+"""Tests for foreground timeout cap in terminal_tool.
+
+Ensures that foreground commands with timeout > FOREGROUND_MAX_TIMEOUT
+are rejected with an error suggesting background=true.
+"""
+import json
+import os
+from unittest.mock import patch, MagicMock
+
+
+# ---------------------------------------------------------------------------
+# Shared test config dict — mirrors _get_env_config() return shape.
+# ---------------------------------------------------------------------------
+def _make_env_config(**overrides):
+    """Return a minimal _get_env_config()-shaped dict with optional overrides."""
+    config = {
+        "env_type": "local",
+        "timeout": 180,
+        "cwd": "/tmp",
+        "host_cwd": None,
+        "modal_mode": "auto",
+        "docker_image": "",
+        "singularity_image": "",
+        "modal_image": "",
+        "daytona_image": "",
+    }
+    config.update(overrides)
+    return config
+
+
+class TestForegroundTimeoutCap:
+    """FOREGROUND_MAX_TIMEOUT rejects foreground commands that exceed it."""
+
+    def test_foreground_timeout_rejected_above_max(self):
+        """When model requests timeout > FOREGROUND_MAX_TIMEOUT, return error."""
+        from tools.terminal_tool import terminal_tool, FOREGROUND_MAX_TIMEOUT
+
+        with patch("tools.terminal_tool._get_env_config", return_value=_make_env_config()), \
+             patch("tools.terminal_tool._start_cleanup_thread"):
+
+            result = json.loads(terminal_tool(
+                command="echo hello",
+                timeout=9999,  # Way above max
+            ))
+
+        assert "error" in result
+        assert "9999" in result["error"]
+        assert str(FOREGROUND_MAX_TIMEOUT) in result["error"]
+        assert "background=true" in result["error"]
+
+    def test_foreground_timeout_within_max_executes(self):
+        """When model requests timeout <= FOREGROUND_MAX_TIMEOUT, execute normally."""
+        from tools.terminal_tool import terminal_tool
+
+        with patch("tools.terminal_tool._get_env_config", return_value=_make_env_config()), \
+             patch("tools.terminal_tool._start_cleanup_thread"):
+
+            mock_env = MagicMock()
+            mock_env.execute.return_value = {"output": "done", "returncode": 0}
+
+            with patch("tools.terminal_tool._active_environments", {"default": mock_env}), \
+                 patch("tools.terminal_tool._last_activity", {"default": 0}), \
+                 patch("tools.terminal_tool._check_all_guards", return_value={"approved": True}):
+                result = json.loads(terminal_tool(
+                    command="echo hello",
+                    timeout=300,  # Within max
+                ))
+
+        call_kwargs = mock_env.execute.call_args
+        assert call_kwargs[1]["timeout"] == 300
+        assert "error" not in result or result["error"] is None
+
+    def test_config_default_above_cap_not_rejected(self):
+        """When config default timeout > cap but model passes no timeout, execute normally.
+
+        Only the model's explicit timeout parameter triggers rejection,
+        not the user's configured default.
+        """
+        from tools.terminal_tool import terminal_tool
+
+        # User configured TERMINAL_TIMEOUT=900 in their env
+        with patch("tools.terminal_tool._get_env_config",
+                    return_value=_make_env_config(timeout=900)), \
+             patch("tools.terminal_tool._start_cleanup_thread"):
+
+            mock_env = MagicMock()
+            mock_env.execute.return_value = {"output": "done", "returncode": 0}
+
+            with patch("tools.terminal_tool._active_environments", {"default": mock_env}), \
+                 patch("tools.terminal_tool._last_activity", {"default": 0}), \
+                 patch("tools.terminal_tool._check_all_guards", return_value={"approved": True}):
+                result = json.loads(terminal_tool(command="make build"))
+
+        # Should execute with the config default, NOT be rejected
+        call_kwargs = mock_env.execute.call_args
+        assert call_kwargs[1]["timeout"] == 900
+        assert "error" not in result or result["error"] is None
+
+    def test_background_not_rejected(self):
+        """Background commands should NOT be subject to foreground timeout cap."""
+        from tools.terminal_tool import terminal_tool
+
+        with patch("tools.terminal_tool._get_env_config", return_value=_make_env_config()), \
+             patch("tools.terminal_tool._start_cleanup_thread"):
+
+            mock_env = MagicMock()
+            mock_env.env = {}
+            mock_proc_session = MagicMock()
+            mock_proc_session.id = "test-123"
+            mock_proc_session.pid = 1234
+
+            mock_registry = MagicMock()
+            mock_registry.spawn_local.return_value = mock_proc_session
+
+            with patch("tools.terminal_tool._active_environments", {"default": mock_env}), \
+                 patch("tools.terminal_tool._last_activity", {"default": 0}), \
+                 patch("tools.terminal_tool._check_all_guards", return_value={"approved": True}), \
+                 patch("tools.process_registry.process_registry", mock_registry), \
+                 patch("tools.approval.get_current_session_key", return_value=""):
+                result = json.loads(terminal_tool(
+                    command="python server.py",
+                    background=True,
+                    timeout=9999,
+                ))
+
+        # Background should NOT be rejected
+        assert "error" not in result or result["error"] is None
+
+    def test_default_timeout_not_rejected(self):
+        """Default timeout (180s) should not trigger rejection."""
+        from tools.terminal_tool import terminal_tool, FOREGROUND_MAX_TIMEOUT
+
+        # The 180s config default is below the cap, so no rejection
+        assert 180 < FOREGROUND_MAX_TIMEOUT
+
+        with patch("tools.terminal_tool._get_env_config", return_value=_make_env_config()), \
+             patch("tools.terminal_tool._start_cleanup_thread"):
+
+            mock_env = MagicMock()
+            mock_env.execute.return_value = {"output": "done", "returncode": 0}
+
+            with patch("tools.terminal_tool._active_environments", {"default": mock_env}), \
+                 patch("tools.terminal_tool._last_activity", {"default": 0}), \
+                 patch("tools.terminal_tool._check_all_guards", return_value={"approved": True}):
+                result = json.loads(terminal_tool(command="echo hello"))
+
+        call_kwargs = mock_env.execute.call_args
+        assert call_kwargs[1]["timeout"] == 180
+        assert "error" not in result or result["error"] is None
+
+    def test_exactly_at_max_not_rejected(self):
+        """Timeout exactly at FOREGROUND_MAX_TIMEOUT should execute normally."""
+        from tools.terminal_tool import terminal_tool, FOREGROUND_MAX_TIMEOUT
+
+        with patch("tools.terminal_tool._get_env_config", return_value=_make_env_config()), \
+             patch("tools.terminal_tool._start_cleanup_thread"):
+
+            mock_env = MagicMock()
+            mock_env.execute.return_value = {"output": "done", "returncode": 0}
+
+            with patch("tools.terminal_tool._active_environments", {"default": mock_env}), \
+                 patch("tools.terminal_tool._last_activity", {"default": 0}), \
+                 patch("tools.terminal_tool._check_all_guards", return_value={"approved": True}):
+                result = json.loads(terminal_tool(
+                    command="echo hello",
+                    timeout=FOREGROUND_MAX_TIMEOUT,  # Exactly at limit
+                ))
+
+        call_kwargs = mock_env.execute.call_args
+        assert call_kwargs[1]["timeout"] == FOREGROUND_MAX_TIMEOUT
+        assert "error" not in result or result["error"] is None
+
+
+class TestForegroundMaxTimeoutConstant:
+    """Verify the FOREGROUND_MAX_TIMEOUT constant and schema."""
+
+    def test_default_value_is_600(self):
+        """Default FOREGROUND_MAX_TIMEOUT is 600 when env var is not set."""
+        from tools.terminal_tool import FOREGROUND_MAX_TIMEOUT
+        assert FOREGROUND_MAX_TIMEOUT == 600
+
+    def test_schema_mentions_max(self):
+        """Tool schema description should mention the max timeout."""
+        from tools.terminal_tool import TERMINAL_SCHEMA, FOREGROUND_MAX_TIMEOUT
+        timeout_desc = TERMINAL_SCHEMA["parameters"]["properties"]["timeout"]["description"]
+        assert str(FOREGROUND_MAX_TIMEOUT) in timeout_desc
+        assert "background=true" in timeout_desc
@@ -0,0 +1,91 @@
+import json
+from types import SimpleNamespace
+
+import tools.terminal_tool as terminal_tool_module
+from tools import process_registry as process_registry_module
+
+
+def _base_config(tmp_path):
+    return {
+        "env_type": "local",
+        "docker_image": "",
+        "singularity_image": "",
+        "modal_image": "",
+        "daytona_image": "",
+        "cwd": str(tmp_path),
+        "timeout": 30,
+    }
+
+
+def test_command_requires_pipe_stdin_detects_gh_with_token():
+    assert terminal_tool_module._command_requires_pipe_stdin(
+        "gh auth login --hostname github.com --git-protocol https --with-token"
+    ) is True
+    assert terminal_tool_module._command_requires_pipe_stdin(
+        "gh auth login --web"
+    ) is False
+
+
+def test_terminal_background_disables_pty_for_gh_with_token(monkeypatch, tmp_path):
+    config = _base_config(tmp_path)
+    dummy_env = SimpleNamespace(env={})
+    captured = {}
+
+    def fake_spawn_local(**kwargs):
+        captured.update(kwargs)
+        return SimpleNamespace(id="proc_test", pid=1234, notify_on_complete=False)
+
+    monkeypatch.setattr(terminal_tool_module, "_get_env_config", lambda: config)
+    monkeypatch.setattr(terminal_tool_module, "_start_cleanup_thread", lambda: None)
+    monkeypatch.setattr(terminal_tool_module, "_check_all_guards", lambda *_args, **_kwargs: {"approved": True})
+    monkeypatch.setattr(process_registry_module.process_registry, "spawn_local", fake_spawn_local)
+    monkeypatch.setitem(terminal_tool_module._active_environments, "default", dummy_env)
+    monkeypatch.setitem(terminal_tool_module._last_activity, "default", 0.0)
+
+    try:
+        result = json.loads(
+            terminal_tool_module.terminal_tool(
+                command="gh auth login --hostname github.com --git-protocol https --with-token",
+                background=True,
+                pty=True,
+            )
+        )
+    finally:
+        terminal_tool_module._active_environments.pop("default", None)
+        terminal_tool_module._last_activity.pop("default", None)
+
+    assert captured["use_pty"] is False
+    assert result["session_id"] == "proc_test"
+    assert "PTY disabled" in result["pty_note"]
+
+
+def test_terminal_background_keeps_pty_for_regular_interactive_commands(monkeypatch, tmp_path):
+    config = _base_config(tmp_path)
+    dummy_env = SimpleNamespace(env={})
+    captured = {}
+
+    def fake_spawn_local(**kwargs):
+        captured.update(kwargs)
+        return SimpleNamespace(id="proc_test", pid=1234, notify_on_complete=False)
+
+    monkeypatch.setattr(terminal_tool_module, "_get_env_config", lambda: config)
+    monkeypatch.setattr(terminal_tool_module, "_start_cleanup_thread", lambda: None)
+    monkeypatch.setattr(terminal_tool_module, "_check_all_guards", lambda *_args, **_kwargs: {"approved": True})
+    monkeypatch.setattr(process_registry_module.process_registry, "spawn_local", fake_spawn_local)
+    monkeypatch.setitem(terminal_tool_module._active_environments, "default", dummy_env)
+    monkeypatch.setitem(terminal_tool_module._last_activity, "default", 0.0)
+
+    try:
+        result = json.loads(
+            terminal_tool_module.terminal_tool(
+                command="python3 -c \"print(input())\"",
+                background=True,
+                pty=True,
+            )
+        )
+    finally:
+        terminal_tool_module._active_environments.pop("default", None)
+        terminal_tool_module._last_activity.pop("default", None)
+
+    assert captured["use_pty"] is True
+    assert "pty_note" not in result
@@ -16,6 +16,7 @@ from tools.tool_result_storage import (
    STORAGE_DIR,
    _build_persisted_message,
    _heredoc_marker,
+    _resolve_storage_dir,
    _write_to_sandbox,
    enforce_turn_budget,
    generate_preview,
@@ -115,6 +116,24 @@ class TestWriteToSandbox:
        _write_to_sandbox("content", "/tmp/hermes-results/abc.txt", env)
        assert env.execute.call_args[1]["timeout"] == 30

+    def test_uses_parent_dir_of_remote_path(self):
+        env = MagicMock()
+        env.execute.return_value = {"output": "", "returncode": 0}
+        remote_path = "/data/data/com.termux/files/usr/tmp/hermes-results/abc.txt"
+        _write_to_sandbox("content", remote_path, env)
+        cmd = env.execute.call_args[0][0]
+        assert "mkdir -p /data/data/com.termux/files/usr/tmp/hermes-results" in cmd
+
+
+class TestResolveStorageDir:
+    def test_defaults_to_storage_dir_without_env(self):
+        assert _resolve_storage_dir(None) == STORAGE_DIR
+
+    def test_uses_env_temp_dir_when_available(self):
+        env = MagicMock()
+        env.get_temp_dir.return_value = "/data/data/com.termux/files/usr/tmp"
+        assert _resolve_storage_dir(env) == "/data/data/com.termux/files/usr/tmp/hermes-results"
+

 # ── _build_persisted_message ──────────────────────────────────────────

@@ -341,6 +360,22 @@ class TestMaybePersistToolResult:
        )
        assert "DISTINCTIVE_START_MARKER" in result

+    def test_env_temp_dir_changes_persisted_path(self):
+        env = MagicMock()
+        env.execute.return_value = {"output": "", "returncode": 0}
+        env.get_temp_dir.return_value = "/data/data/com.termux/files/usr/tmp"
+        content = "x" * 60_000
+        result = maybe_persist_tool_result(
+            content=content,
+            tool_name="terminal",
+            tool_use_id="tc_termux",
+            env=env,
+            threshold=30_000,
+        )
+        assert "/data/data/com.termux/files/usr/tmp/hermes-results/tc_termux.txt" in result
+        cmd = env.execute.call_args[0][0]
+        assert "mkdir -p /data/data/com.termux/files/usr/tmp/hermes-results" in cmd
+
    def test_threshold_zero_forces_persist(self):
        env = MagicMock()
        env.execute.return_value = {"output": "", "returncode": 0}
@@ -183,12 +183,77 @@ class TestDetectAudioEnvironment:
        assert result["available"] is False
        assert any("PortAudio" in w for w in result["warnings"])

+    def test_termux_import_error_shows_termux_install_guidance(self, monkeypatch):
+        monkeypatch.setenv("TERMUX_VERSION", "0.118.3")
+        monkeypatch.setenv("PREFIX", "/data/data/com.termux/files/usr")
+        monkeypatch.delenv("SSH_CLIENT", raising=False)
+        monkeypatch.delenv("SSH_TTY", raising=False)
+        monkeypatch.delenv("SSH_CONNECTION", raising=False)
+        monkeypatch.setattr("tools.voice_mode._import_audio", lambda: (_ for _ in ()).throw(ImportError("no audio libs")))
+        monkeypatch.setattr("tools.voice_mode._termux_microphone_command", lambda: None)
+
+        from tools.voice_mode import detect_audio_environment
+        result = detect_audio_environment()
+
+        assert result["available"] is False
+        assert any("pkg install python-numpy portaudio" in w for w in result["warnings"])
+        assert any("python -m pip install sounddevice" in w for w in result["warnings"])
+
+    def test_termux_api_package_without_android_app_blocks_voice(self, monkeypatch):
+        monkeypatch.setenv("TERMUX_VERSION", "0.118.3")
+        monkeypatch.setenv("PREFIX", "/data/data/com.termux/files/usr")
+        monkeypatch.delenv("SSH_CLIENT", raising=False)
+        monkeypatch.delenv("SSH_TTY", raising=False)
+        monkeypatch.delenv("SSH_CONNECTION", raising=False)
+        monkeypatch.setattr("tools.voice_mode._termux_microphone_command", lambda: "/data/data/com.termux/files/usr/bin/termux-microphone-record")
+        monkeypatch.setattr("tools.voice_mode._termux_api_app_installed", lambda: False)
+        monkeypatch.setattr("tools.voice_mode._import_audio", lambda: (_ for _ in ()).throw(ImportError("no audio libs")))
+
+        from tools.voice_mode import detect_audio_environment
+        result = detect_audio_environment()
+
+        assert result["available"] is False
+        assert any("Termux:API Android app is not installed" in w for w in result["warnings"])
+
+
+    def test_termux_api_microphone_allows_voice_without_sounddevice(self, monkeypatch):
+        monkeypatch.setenv("TERMUX_VERSION", "0.118.3")
+        monkeypatch.setenv("PREFIX", "/data/data/com.termux/files/usr")
+        monkeypatch.delenv("SSH_CLIENT", raising=False)
+        monkeypatch.delenv("SSH_TTY", raising=False)
+        monkeypatch.delenv("SSH_CONNECTION", raising=False)
+        monkeypatch.setattr("tools.voice_mode.shutil.which", lambda cmd: "/data/data/com.termux/files/usr/bin/termux-microphone-record" if cmd == "termux-microphone-record" else None)
+        monkeypatch.setattr("tools.voice_mode._termux_api_app_installed", lambda: True)
+        monkeypatch.setattr("tools.voice_mode._import_audio", lambda: (_ for _ in ()).throw(ImportError("no audio libs")))
+
+        from tools.voice_mode import detect_audio_environment
+        result = detect_audio_environment()
+
+        assert result["available"] is True
+        assert any("Termux:API microphone recording available" in n for n in result.get("notices", []))
+        assert result["warnings"] == []
+

 # ============================================================================
 # check_voice_requirements
 # ============================================================================

 class TestCheckVoiceRequirements:
+    def test_termux_api_capture_counts_as_audio_available(self, monkeypatch):
+        monkeypatch.setattr("tools.voice_mode._audio_available", lambda: False)
+        monkeypatch.setattr("tools.voice_mode._termux_microphone_command", lambda: "/data/data/com.termux/files/usr/bin/termux-microphone-record")
+        monkeypatch.setattr("tools.voice_mode._termux_api_app_installed", lambda: True)
+        monkeypatch.setattr("tools.voice_mode.detect_audio_environment", lambda: {"available": True, "warnings": [], "notices": ["Termux:API microphone recording available"]})
+        monkeypatch.setattr("tools.transcription_tools._get_provider", lambda cfg: "openai")
+
+        from tools.voice_mode import check_voice_requirements
+        result = check_voice_requirements()
+
+        assert result["available"] is True
+        assert result["audio_available"] is True
+        assert result["missing_packages"] == []
+        assert "Termux:API microphone" in result["details"]
+
    def test_all_requirements_met(self, monkeypatch):
        monkeypatch.setattr("tools.voice_mode._audio_available", lambda: True)
        monkeypatch.setattr("tools.voice_mode.detect_audio_environment",
@@ -235,8 +300,85 @@ class TestCheckVoiceRequirements:
 # AudioRecorder
 # ============================================================================

-class TestAudioRecorderStart:
-    def test_start_raises_without_audio(self, monkeypatch):
+class TestCreateAudioRecorder:
+    def test_termux_uses_termux_audio_recorder_when_api_present(self, monkeypatch):
+        monkeypatch.setenv("TERMUX_VERSION", "0.118.3")
+        monkeypatch.setenv("PREFIX", "/data/data/com.termux/files/usr")
+        monkeypatch.setattr("tools.voice_mode._termux_microphone_command", lambda: "/data/data/com.termux/files/usr/bin/termux-microphone-record")
+        monkeypatch.setattr("tools.voice_mode._termux_api_app_installed", lambda: True)
+
+        from tools.voice_mode import create_audio_recorder, TermuxAudioRecorder
+        recorder = create_audio_recorder()
+
+        assert isinstance(recorder, TermuxAudioRecorder)
+        assert recorder.supports_silence_autostop is False
+
+    def test_termux_without_android_app_falls_back_to_audio_recorder(self, monkeypatch):
+        monkeypatch.setenv("TERMUX_VERSION", "0.118.3")
+        monkeypatch.setenv("PREFIX", "/data/data/com.termux/files/usr")
+        monkeypatch.setattr("tools.voice_mode._termux_microphone_command", lambda: "/data/data/com.termux/files/usr/bin/termux-microphone-record")
+        monkeypatch.setattr("tools.voice_mode._termux_api_app_installed", lambda: False)
+
+        from tools.voice_mode import create_audio_recorder, AudioRecorder
+        recorder = create_audio_recorder()
+
+        assert isinstance(recorder, AudioRecorder)
+
+
+class TestTermuxAudioRecorder:
+    def test_start_and_stop_use_termux_microphone_commands(self, monkeypatch, temp_voice_dir):
+        command_calls = []
+        output_path = Path(temp_voice_dir) / "recording_20260409_120000.aac"
+
+        def fake_run(cmd, **kwargs):
+            command_calls.append(cmd)
+            if cmd[1] == "-f":
+                Path(cmd[2]).write_bytes(b"aac-bytes")
+            return MagicMock(returncode=0, stdout="", stderr="")
+
+        monkeypatch.setenv("TERMUX_VERSION", "0.118.3")
+        monkeypatch.setenv("PREFIX", "/data/data/com.termux/files/usr")
+        monkeypatch.setattr("tools.voice_mode._termux_microphone_command", lambda: "/data/data/com.termux/files/usr/bin/termux-microphone-record")
+        monkeypatch.setattr("tools.voice_mode._termux_api_app_installed", lambda: True)
+        monkeypatch.setattr("tools.voice_mode.time.strftime", lambda fmt: "20260409_120000")
+        monkeypatch.setattr("tools.voice_mode.subprocess.run", fake_run)
+
+        from tools.voice_mode import TermuxAudioRecorder
+        recorder = TermuxAudioRecorder()
+        recorder.start()
+        recorder._start_time = time.monotonic() - 1.0
+        result = recorder.stop()
+
+        assert result == str(output_path)
+        assert command_calls[0][:2] == ["/data/data/com.termux/files/usr/bin/termux-microphone-record", "-f"]
+        assert command_calls[1] == ["/data/data/com.termux/files/usr/bin/termux-microphone-record", "-q"]
+
+    def test_cancel_removes_partial_termux_recording(self, monkeypatch, temp_voice_dir):
+        output_path = Path(temp_voice_dir) / "recording_20260409_120000.aac"
+
+        def fake_run(cmd, **kwargs):
+            if cmd[1] == "-f":
+                Path(cmd[2]).write_bytes(b"aac-bytes")
+            return MagicMock(returncode=0, stdout="", stderr="")
+
+        monkeypatch.setenv("TERMUX_VERSION", "0.118.3")
+        monkeypatch.setenv("PREFIX", "/data/data/com.termux/files/usr")
+        monkeypatch.setattr("tools.voice_mode._termux_microphone_command", lambda: "/data/data/com.termux/files/usr/bin/termux-microphone-record")
+        monkeypatch.setattr("tools.voice_mode._termux_api_app_installed", lambda: True)
+        monkeypatch.setattr("tools.voice_mode.time.strftime", lambda fmt: "20260409_120000")
+        monkeypatch.setattr("tools.voice_mode.subprocess.run", fake_run)
+
+        from tools.voice_mode import TermuxAudioRecorder
+        recorder = TermuxAudioRecorder()
+        recorder.start()
+        recorder.cancel()
+
+        assert output_path.exists() is False
+        assert recorder.is_recording is False
+
+
+class TestAudioRecorder:
+    def test_start_raises_without_audio_libs(self, monkeypatch):
        def _fail_import():
            raise ImportError("no sounddevice")
        monkeypatch.setattr("tools.voice_mode._import_audio", _fail_import)
@@ -10,6 +10,11 @@ from tools.approval import (
    check_all_command_guards,
    check_dangerous_command,
    detect_dangerous_command,
+    disable_session_yolo,
+    enable_session_yolo,
+    is_session_yolo_enabled,
+    reset_current_session_key,
+    set_current_session_key,
 )


@@ -18,10 +23,14 @@ def _clear_approval_state():
    approval_module._permanent_approved.clear()
    approval_module.clear_session("default")
    approval_module.clear_session("test-session")
+    approval_module.clear_session("session-a")
+    approval_module.clear_session("session-b")
    yield
    approval_module._permanent_approved.clear()
    approval_module.clear_session("default")
    approval_module.clear_session("test-session")
+    approval_module.clear_session("session-a")
+    approval_module.clear_session("session-b")


 class TestYoloMode:
@@ -108,3 +117,67 @@ class TestYoloMode:
        result = check_dangerous_command("rm -rf /", "local",
                                         approval_callback=lambda *a: "deny")
        assert not result["approved"]
+
+    def test_session_scoped_yolo_only_bypasses_current_session(self, monkeypatch):
+        """Gateway /yolo should only bypass approvals for the active session."""
+        monkeypatch.delenv("HERMES_YOLO_MODE", raising=False)
+        monkeypatch.setenv("HERMES_INTERACTIVE", "1")
+
+        enable_session_yolo("session-a")
+        assert is_session_yolo_enabled("session-a") is True
+        assert is_session_yolo_enabled("session-b") is False
+
+        token_a = set_current_session_key("session-a")
+        try:
+            approved = check_dangerous_command("rm -rf /", "local")
+            assert approved["approved"] is True
+        finally:
+            reset_current_session_key(token_a)
+
+        token_b = set_current_session_key("session-b")
+        try:
+            blocked = check_dangerous_command(
+                "rm -rf /",
+                "local",
+                approval_callback=lambda *a: "deny",
+            )
+            assert blocked["approved"] is False
+        finally:
+            reset_current_session_key(token_b)
+
+        disable_session_yolo("session-a")
+        assert is_session_yolo_enabled("session-a") is False
+
+    def test_session_scoped_yolo_bypasses_combined_guard_only_for_current_session(self, monkeypatch):
+        """Combined guard should honor session-scoped YOLO without affecting others."""
+        monkeypatch.delenv("HERMES_YOLO_MODE", raising=False)
+        monkeypatch.setenv("HERMES_INTERACTIVE", "1")
+
+        enable_session_yolo("session-a")
+
+        token_a = set_current_session_key("session-a")
+        try:
+            approved = check_all_command_guards("rm -rf /", "local")
+            assert approved["approved"] is True
+        finally:
+            reset_current_session_key(token_a)
+
+        token_b = set_current_session_key("session-b")
+        try:
+            blocked = check_all_command_guards(
+                "rm -rf /",
+                "local",
+                approval_callback=lambda *a: "deny",
+            )
+            assert blocked["approved"] is False
+        finally:
+            reset_current_session_key(token_b)
+
+    def test_clear_session_removes_session_yolo_state(self):
+        """Session cleanup must remove YOLO bypass state."""
+        enable_session_yolo("session-a")
+        assert is_session_yolo_enabled("session-a") is True
+
+        approval_module.clear_session("session-a")
+
+        assert is_session_yolo_enabled("session-a") is False
@@ -172,6 +172,7 @@ def detect_dangerous_command(command: str) -> tuple:
 _lock = threading.Lock()
 _pending: dict[str, dict] = {}
 _session_approved: dict[str, set] = {}
+_session_yolo: set[str] = set()
 _permanent_approved: set = set()

 # =========================================================================
@@ -287,6 +288,35 @@ def approve_session(session_key: str, pattern_key: str):
        _session_approved.setdefault(session_key, set()).add(pattern_key)


+def enable_session_yolo(session_key: str) -> None:
+    """Enable YOLO bypass for a single session key."""
+    if not session_key:
+        return
+    with _lock:
+        _session_yolo.add(session_key)
+
+
+def disable_session_yolo(session_key: str) -> None:
+    """Disable YOLO bypass for a single session key."""
+    if not session_key:
+        return
+    with _lock:
+        _session_yolo.discard(session_key)
+
+
+def is_session_yolo_enabled(session_key: str) -> bool:
+    """Return True when YOLO bypass is enabled for a specific session."""
+    if not session_key:
+        return False
+    with _lock:
+        return session_key in _session_yolo
+
+
+def is_current_session_yolo_enabled() -> bool:
+    """Return True when the active approval session has YOLO bypass enabled."""
+    return is_session_yolo_enabled(get_current_session_key(default=""))
+
+
 def is_approved(session_key: str, pattern_key: str) -> bool:
    """Check if a pattern is approved (session-scoped or permanent).

@@ -317,6 +347,7 @@ def clear_session(session_key: str):
    """Clear all approvals and pending requests for a session."""
    with _lock:
        _session_approved.pop(session_key, None)
+        _session_yolo.discard(session_key)
        _pending.pop(session_key, None)
        _gateway_notify_cbs.pop(session_key, None)
        # Signal ALL blocked threads so they don't hang forever
@@ -554,8 +585,9 @@ def check_dangerous_command(command: str, env_type: str,
    if env_type in ("docker", "singularity", "modal", "daytona"):
        return {"approved": True, "message": None}

-    # --yolo: bypass all approval prompts
-    if os.getenv("HERMES_YOLO_MODE"):
+    # --yolo: bypass all approval prompts. Gateway /yolo is session-scoped;
+    # CLI --yolo remains process-scoped via the env var for local use.
+    if os.getenv("HERMES_YOLO_MODE") or is_current_session_yolo_enabled():
        return {"approved": True, "message": None}

    is_dangerous, pattern_key, description = detect_dangerous_command(command)
@@ -655,9 +687,10 @@ def check_all_command_guards(command: str, env_type: str,
    if env_type in ("docker", "singularity", "modal", "daytona"):
        return {"approved": True, "message": None}

-    # --yolo or approvals.mode=off: bypass all approval prompts
+    # --yolo or approvals.mode=off: bypass all approval prompts.
+    # Gateway /yolo is session-scoped; CLI --yolo remains process-scoped.
    approval_mode = _get_approval_mode()
-    if os.getenv("HERMES_YOLO_MODE") or approval_mode == "off":
+    if os.getenv("HERMES_YOLO_MODE") or is_current_session_yolo_enabled() or approval_mode == "off":
        return {"approved": True, "message": None}

    is_cli = os.getenv("HERMES_INTERACTIVE")
@@ -285,6 +285,26 @@ def _get_cloud_provider() -> Optional[CloudBrowserProvider]:
    return _cached_cloud_provider


+from hermes_constants import is_termux as _is_termux_environment
+
+
+def _browser_install_hint() -> str:
+    if _is_termux_environment():
+        return "npm install -g agent-browser && agent-browser install"
+    return "npm install -g agent-browser && agent-browser install --with-deps"
+
+
+def _requires_real_termux_browser_install(browser_cmd: str) -> bool:
+    return _is_termux_environment() and _is_local_mode() and browser_cmd.strip() == "npx agent-browser"
+
+
+def _termux_browser_install_error() -> str:
+    return (
+        "Local browser automation on Termux cannot rely on the bare npx fallback. "
+        f"Install agent-browser explicitly first: {_browser_install_hint()}"
+    )
+
+
 def _is_local_mode() -> bool:
    """Return True when the browser tool will use a local browser backend."""
    if _get_cdp_override():
@@ -796,7 +816,8 @@ def _find_agent_browser() -> str:
        return "npx agent-browser"
    
    raise FileNotFoundError(
-        "agent-browser CLI not found. Install it with: npm install -g agent-browser\n"
+        "agent-browser CLI not found. Install it with: "
+        f"{_browser_install_hint()}\n"
        "Or run 'npm install' in the repo root to install locally.\n"
        "Or ensure npx is available in your PATH."
    )
@@ -852,6 +873,11 @@ def _run_browser_command(
    except FileNotFoundError as e:
        logger.warning("agent-browser CLI not found: %s", e)
        return {"success": False, "error": str(e)}
+
+    if _requires_real_termux_browser_install(browser_cmd):
+        error = _termux_browser_install_error()
+        logger.warning("browser command blocked on Termux: %s", error)
+        return {"success": False, "error": error}
    
    from tools.interrupt import is_interrupted
    if is_interrupted():
@@ -2040,10 +2066,17 @@ def check_browser_requirements() -> bool:

    # The agent-browser CLI is always required
    try:
-        _find_agent_browser()
+        browser_cmd = _find_agent_browser()
    except FileNotFoundError:
        return False

+    # On Termux, the bare npx fallback is too fragile to treat as a satisfied
+    # local browser dependency. Require a real install (global or local) so the
+    # browser tool is not advertised as available when it will likely fail on
+    # first use.
+    if _requires_real_termux_browser_install(browser_cmd):
+        return False
+
    # In cloud mode, also require provider credentials
    provider = _get_cloud_provider()
    if provider is not None and not provider.is_configured():
@@ -2073,10 +2106,13 @@ if __name__ == "__main__":
    else:
        print("❌ Missing requirements:")
        try:
-            _find_agent_browser()
+            browser_cmd = _find_agent_browser()
+            if _requires_real_termux_browser_install(browser_cmd):
+                print("   - bare npx fallback found (insufficient on Termux local mode)")
+                print(f"     Install: {_browser_install_hint()}")
        except FileNotFoundError:
            print("   - agent-browser CLI not found")
-            print("     Install: npm install -g agent-browser && agent-browser install --with-deps")
+            print(f"     Install: {_browser_install_hint()}")
        if _cp is not None and not _cp.is_configured():
            print(f"   - {_cp.provider_name()} credentials not configured")
            print("   Tip: set browser.cloud_provider to 'local' to use free local mode instead")
@@ -33,6 +33,7 @@ import json
 import logging
 import os
 import platform
+import shlex
 import signal
 import socket
 import subprocess
@@ -246,9 +247,9 @@ def _call(tool_name, args):

 _FILE_TRANSPORT_HEADER = '''\
 """Auto-generated Hermes tools RPC stubs (file-based transport)."""
-import json, os, shlex, time
+import json, os, shlex, tempfile, time

-_RPC_DIR = os.environ.get("HERMES_RPC_DIR", "/tmp/hermes_rpc")
+_RPC_DIR = os.environ.get("HERMES_RPC_DIR") or os.path.join(tempfile.gettempdir(), "hermes_rpc")
 _seq = 0
 ''' + _COMMON_HELPERS + '''\

@@ -536,13 +537,30 @@ def _ship_file_to_remote(env, remote_path: str, content: str) -> None:
    quotes are fine.
    """
    encoded = base64.b64encode(content.encode("utf-8")).decode("ascii")
+    quoted_remote_path = shlex.quote(remote_path)
    env.execute(
-        f"echo '{encoded}' | base64 -d > {remote_path}",
+        f"echo '{encoded}' | base64 -d > {quoted_remote_path}",
        cwd="/",
        timeout=30,
    )


+def _env_temp_dir(env: Any) -> str:
+    """Return a writable temp dir for env-backed execute_code sandboxes."""
+    get_temp_dir = getattr(env, "get_temp_dir", None)
+    if callable(get_temp_dir):
+        try:
+            temp_dir = get_temp_dir()
+            if isinstance(temp_dir, str) and temp_dir.startswith("/"):
+                return temp_dir.rstrip("/") or "/"
+        except Exception as exc:
+            logger.debug("Could not resolve execute_code env temp dir: %s", exc)
+    candidate = tempfile.gettempdir()
+    if isinstance(candidate, str) and candidate.startswith("/"):
+        return candidate.rstrip("/") or "/"
+    return "/tmp"
+
+
 def _rpc_poll_loop(
    env,
    rpc_dir: str,
@@ -563,11 +581,12 @@ def _rpc_poll_loop(

    poll_interval = 0.1  # 100 ms

+    quoted_rpc_dir = shlex.quote(rpc_dir)
    while not stop_event.is_set():
        try:
            # List pending request files (skip .tmp partials)
            ls_result = env.execute(
-                f"ls -1 {rpc_dir}/req_* 2>/dev/null || true",
+                f"ls -1 {quoted_rpc_dir}/req_* 2>/dev/null || true",
                cwd="/",
                timeout=10,
            )
@@ -589,9 +608,10 @@ def _rpc_poll_loop(

                call_start = time.monotonic()

+                quoted_req_file = shlex.quote(req_file)
                # Read request
                read_result = env.execute(
-                    f"cat {req_file}",
+                    f"cat {quoted_req_file}",
                    cwd="/",
                    timeout=10,
                )
@@ -600,7 +620,7 @@ def _rpc_poll_loop(
                except (json.JSONDecodeError, ValueError):
                    logger.debug("Malformed RPC request in %s", req_file)
                    # Remove bad request to avoid infinite retry
-                    env.execute(f"rm -f {req_file}", cwd="/", timeout=5)
+                    env.execute(f"rm -f {quoted_req_file}", cwd="/", timeout=5)
                    continue

                tool_name = request.get("tool", "")
@@ -608,6 +628,7 @@ def _rpc_poll_loop(
                seq = request.get("seq", 0)
                seq_str = f"{seq:06d}"
                res_file = f"{rpc_dir}/res_{seq_str}"
+                quoted_res_file = shlex.quote(res_file)

                # Enforce allow-list
                if tool_name not in allowed_tools:
@@ -665,14 +686,14 @@ def _rpc_poll_loop(
                    tool_result.encode("utf-8")
                ).decode("ascii")
                env.execute(
-                    f"echo '{encoded_result}' | base64 -d > {res_file}.tmp"
-                    f" && mv {res_file}.tmp {res_file}",
+                    f"echo '{encoded_result}' | base64 -d > {quoted_res_file}.tmp"
+                    f" && mv {quoted_res_file}.tmp {quoted_res_file}",
                    cwd="/",
                    timeout=60,
                )

                # Remove the request file
-                env.execute(f"rm -f {req_file}", cwd="/", timeout=5)
+                env.execute(f"rm -f {quoted_req_file}", cwd="/", timeout=5)

        except Exception as e:
            if not stop_event.is_set():
@@ -707,7 +728,10 @@ def _execute_remote(
    env, env_type = _get_or_create_env(effective_task_id)

    sandbox_id = uuid.uuid4().hex[:12]
-    sandbox_dir = f"/tmp/hermes_exec_{sandbox_id}"
+    temp_dir = _env_temp_dir(env)
+    sandbox_dir = f"{temp_dir}/hermes_exec_{sandbox_id}"
+    quoted_sandbox_dir = shlex.quote(sandbox_dir)
+    quoted_rpc_dir = shlex.quote(f"{sandbox_dir}/rpc")

    tool_call_log: list = []
    tool_call_counter = [0]
@@ -735,7 +759,7 @@ def _execute_remote(

        # Create sandbox directory on remote
        env.execute(
-            f"mkdir -p {sandbox_dir}/rpc", cwd="/", timeout=10,
+            f"mkdir -p {quoted_rpc_dir}", cwd="/", timeout=10,
        )

        # Generate and ship files
@@ -759,7 +783,7 @@ def _execute_remote(

        # Build environment variable prefix for the script
        env_prefix = (
-            f"HERMES_RPC_DIR={sandbox_dir}/rpc "
+            f"HERMES_RPC_DIR={shlex.quote(f'{sandbox_dir}/rpc')} "
            f"PYTHONDONTWRITEBYTECODE=1"
        )
        tz = os.getenv("HERMES_TIMEZONE", "").strip()
@@ -770,7 +794,7 @@ def _execute_remote(
        logger.info("Executing code on %s backend (task %s)...",
                     env_type, effective_task_id[:8])
        script_result = env.execute(
-            f"cd {sandbox_dir} && {env_prefix} python3 script.py",
+            f"cd {quoted_sandbox_dir} && {env_prefix} python3 script.py",
            timeout=timeout,
        )

@@ -807,7 +831,7 @@ def _execute_remote(
        # Clean up remote sandbox dir
        try:
            env.execute(
-                f"rm -rf {sandbox_dir}", cwd="/", timeout=15,
+                f"rm -rf {quoted_sandbox_dir}", cwd="/", timeout=15,
            )
        except Exception:
            logger.debug("Failed to clean up remote sandbox %s", sandbox_dir)
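
Review note: the quoting changes above matter once the temp root is no longer a fixed `/tmp`. A small sketch of the difference, using a made-up sandbox path that contains a space (the path and the helper name `build_cleanup_command` are illustrative, not taken from the codebase):

```python
import shlex


def build_cleanup_command(sandbox_dir: str) -> str:
    """Build an rm command whose path survives shell word splitting."""
    return f"rm -rf {shlex.quote(sandbox_dir)}"


sandbox_dir = "/data/my files/hermes_exec_ab12"  # illustrative path

# Unquoted interpolation: the shell would see two separate path arguments.
unquoted_args = shlex.split(f"rm -rf {sandbox_dir}")

# Quoted interpolation: the path stays a single argument.
quoted_args = shlex.split(build_cleanup_command(sandbox_dir))

# A quoted word followed by a bare suffix is still one shell word, which is
# why the diff can write {quoted_res_file}.tmp.
tmp_args = shlex.split(f"{shlex.quote(sandbox_dir + '/rpc/res_000001')}.tmp")
```

Here `shlex.split` stands in for the shell's own tokenizer so the effect can be inspected without running anything.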
@@ -226,14 +226,24 @@ class BaseEnvironment(ABC):
    # Snapshot creation timeout (override for slow cold-starts).
    _snapshot_timeout: int = 30

+    def get_temp_dir(self) -> str:
+        """Return the backend temp directory used for session artifacts.
+
+        Most sandboxed backends use ``/tmp`` inside the target environment.
+        LocalEnvironment overrides this on platforms like Termux where ``/tmp``
+        may be missing and ``TMPDIR`` is the portable writable location.
+        """
+        return "/tmp"
+
    def __init__(self, cwd: str, timeout: int, env: dict = None):
        self.cwd = cwd
        self.timeout = timeout
        self.env = env or {}

        self._session_id = uuid.uuid4().hex[:12]
-        self._snapshot_path = f"/tmp/hermes-snap-{self._session_id}.sh"
-        self._cwd_file = f"/tmp/hermes-cwd-{self._session_id}.txt"
+        temp_dir = self.get_temp_dir().rstrip("/") or "/"
+        self._snapshot_path = f"{temp_dir}/hermes-snap-{self._session_id}.sh"
+        self._cwd_file = f"{temp_dir}/hermes-cwd-{self._session_id}.txt"
        self._cwd_marker = _cwd_marker(self._session_id)
        self._snapshot_ready = False
        self._last_sync_time: float | None = (
@@ -5,6 +5,7 @@ import platform
 import shutil
 import signal
 import subprocess
+import tempfile

 from tools.environments.base import BaseEnvironment, _pipe_stdin

@@ -209,6 +210,32 @@ class LocalEnvironment(BaseEnvironment):
        super().__init__(cwd=cwd or os.getcwd(), timeout=timeout, env=env)
        self.init_session()

+    def get_temp_dir(self) -> str:
+        """Return a shell-safe writable temp dir for local execution.
+
+        Termux does not provide /tmp by default, but exposes a POSIX TMPDIR.
+        Prefer POSIX-style env vars when available, keep using /tmp on regular
+        Unix systems, and only fall back to tempfile.gettempdir() when it also
+        resolves to a POSIX path.
+
+        Check the environment configured for this backend first so callers can
+        override the temp root explicitly (for example via terminal.env or a
+        custom TMPDIR), then fall back to the host process environment.
+        """
+        for env_var in ("TMPDIR", "TMP", "TEMP"):
+            candidate = self.env.get(env_var) or os.environ.get(env_var)
+            if candidate and candidate.startswith("/"):
+                return candidate.rstrip("/") or "/"
+
+        if os.path.isdir("/tmp") and os.access("/tmp", os.W_OK | os.X_OK):
+            return "/tmp"
+
+        candidate = tempfile.gettempdir()
+        if candidate.startswith("/"):
+            return candidate.rstrip("/") or "/"
+
+        return "/tmp"
+
    def _run_bash(self, cmd_string: str, *, login: bool = False,
                  timeout: int = 120,
                  stdin_data: str | None = None) -> subprocess.Popen:
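
Review note: the lookup order in `LocalEnvironment.get_temp_dir()` can be read as a plain function. The sketch below restates the same precedence outside the class so it can be exercised with plain dicts. `resolve_temp_dir` and its parameters are hypothetical names introduced for illustration.

```python
import os
import tempfile


def resolve_temp_dir(backend_env: dict, host_env: dict) -> str:
    """Pick a POSIX-style temp dir: backend env, then host env, then /tmp."""
    for env_var in ("TMPDIR", "TMP", "TEMP"):
        # The backend's configured env wins over the host process env.
        candidate = backend_env.get(env_var) or host_env.get(env_var)
        if candidate and candidate.startswith("/"):
            # Strip trailing slashes, but keep "/" itself intact.
            return candidate.rstrip("/") or "/"

    if os.path.isdir("/tmp") and os.access("/tmp", os.W_OK | os.X_OK):
        return "/tmp"

    candidate = tempfile.gettempdir()
    if candidate.startswith("/"):
        return candidate.rstrip("/") or "/"
    return "/tmp"
```

On Termux, `TMPDIR` typically points inside the app's private prefix, so the first branch is the one that takes effect there.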
@@ -172,6 +172,19 @@ class ProcessRegistry:

    # ----- Spawn -----

+    @staticmethod
+    def _env_temp_dir(env: Any) -> str:
+        """Return the writable sandbox temp dir for env-backed background tasks."""
+        get_temp_dir = getattr(env, "get_temp_dir", None)
+        if callable(get_temp_dir):
+            try:
+                temp_dir = get_temp_dir()
+                if isinstance(temp_dir, str) and temp_dir.startswith("/"):
+                    return temp_dir.rstrip("/") or "/"
+            except Exception as exc:
+                logger.debug("Could not resolve environment temp dir: %s", exc)
+        return "/tmp"
+
    def spawn_local(
        self,
        command: str,
@@ -316,12 +329,20 @@ class ProcessRegistry:
        )

        # Run the command in the sandbox with output capture
-        log_path = f"/tmp/hermes_bg_{session.id}.log"
-        pid_path = f"/tmp/hermes_bg_{session.id}.pid"
+        temp_dir = self._env_temp_dir(env)
+        log_path = f"{temp_dir}/hermes_bg_{session.id}.log"
+        pid_path = f"{temp_dir}/hermes_bg_{session.id}.pid"
+        exit_path = f"{temp_dir}/hermes_bg_{session.id}.exit"
        quoted_command = shlex.quote(command)
+        quoted_temp_dir = shlex.quote(temp_dir)
+        quoted_log_path = shlex.quote(log_path)
+        quoted_pid_path = shlex.quote(pid_path)
+        quoted_exit_path = shlex.quote(exit_path)
        bg_command = (
-            f"nohup bash -c {quoted_command} > {log_path} 2>&1 & "
-            f"echo $! > {pid_path} && cat {pid_path}"
+            f"mkdir -p {quoted_temp_dir} && "
+            f"( nohup bash -lc {quoted_command} > {quoted_log_path} 2>&1; "
+            f"rc=$?; printf '%s\\n' \"$rc\" > {quoted_exit_path} ) & "
+            f"echo $! > {quoted_pid_path} && cat {quoted_pid_path}"
        )

        try:
@@ -342,7 +363,7 @@ class ProcessRegistry:
            # Start a poller thread that periodically reads the log file
            reader = threading.Thread(
                target=self._env_poller_loop,
-                args=(session, env, log_path, pid_path),
+                args=(session, env, log_path, pid_path, exit_path),
                daemon=True,
                name=f"proc-poller-{session.id}",
            )
@@ -386,14 +407,17 @@ class ProcessRegistry:
        self._move_to_finished(session)

    def _env_poller_loop(
-        self, session: ProcessSession, env: Any, log_path: str, pid_path: str
+        self, session: ProcessSession, env: Any, log_path: str, pid_path: str, exit_path: str
    ):
        """Background thread: poll a sandbox log file for non-local backends."""
+        quoted_log_path = shlex.quote(log_path)
+        quoted_pid_path = shlex.quote(pid_path)
+        quoted_exit_path = shlex.quote(exit_path)
        while not session.exited:
            time.sleep(2)  # Poll every 2 seconds
            try:
                # Read new output from the log file
-                result = env.execute(f"cat {log_path} 2>/dev/null", timeout=10)
+                result = env.execute(f"cat {quoted_log_path} 2>/dev/null", timeout=10)
                new_output = result.get("output", "")
                if new_output:
                    with session._lock:
@@ -403,14 +427,14 @@ class ProcessRegistry:

                # Check if process is still running
                check = env.execute(
-                    f"kill -0 $(cat {pid_path} 2>/dev/null) 2>/dev/null; echo $?",
+                    f"kill -0 \"$(cat {quoted_pid_path} 2>/dev/null)\" 2>/dev/null; echo $?",
                    timeout=5,
                )
                check_output = check.get("output", "").strip()
                if check_output and check_output.splitlines()[-1].strip() != "0":
-                    # Process has exited -- get exit code
+                    # Process has exited -- get exit code captured by the wrapper shell.
                    exit_result = env.execute(
-                        f"wait $(cat {pid_path} 2>/dev/null) 2>/dev/null; echo $?",
+                        f"cat {quoted_exit_path} 2>/dev/null",
                        timeout=5,
                    )
                    exit_str = exit_result.get("output", "").strip()
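
Review note: the shell builtin `wait` can only reap children of the shell that runs it, so a later, separate shell invocation has no way to recover the exit status that way. Writing the status to a file from a wrapper subshell avoids that. The sketch below reproduces the idea locally with `sh` and `subprocess` instead of `env.execute()`. The helper names and file names are illustrative, and `nohup` and the login shell are left out for brevity.

```python
import os
import shlex
import subprocess
import tempfile
import time


def build_bg_command(command: str, log_path: str, pid_path: str, exit_path: str) -> str:
    """Wrap `command` so its exit status is written to `exit_path`."""
    q = shlex.quote
    return (
        f"( sh -c {q(command)} > {q(log_path)} 2>&1; "
        f"rc=$?; printf '%s\\n' \"$rc\" > {q(exit_path)} ) & "
        f"echo $! > {q(pid_path)}"
    )


def run_and_read_exit_code(command: str, timeout: float = 10.0) -> int:
    """Launch the wrapped command, then poll the exit file like the poller does."""
    workdir = tempfile.mkdtemp(prefix="hermes_bg_demo_")
    log_path = os.path.join(workdir, "bg.log")
    pid_path = os.path.join(workdir, "bg.pid")
    exit_path = os.path.join(workdir, "bg.exit")

    launch = build_bg_command(command, log_path, pid_path, exit_path)
    subprocess.run(["sh", "-c", launch], check=True)

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with open(exit_path) as fh:
                text = fh.read().strip()
        except FileNotFoundError:
            text = ""
        if text:
            return int(text)
        time.sleep(0.05)
    raise TimeoutError("exit file was not written in time")
```

Because `$!` refers to the wrapper subshell, the PID stays alive until the exit file has been written, which keeps the liveness check and the exit-code read consistent.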
@@ -700,6 +724,29 @@ class ProcessRegistry:
        """Send data + newline to a running process's stdin (like pressing Enter)."""
        return self.write_stdin(session_id, data + "\n")

+    def close_stdin(self, session_id: str) -> dict:
+        """Close a running process's stdin / send EOF without killing the process."""
+        session = self.get(session_id)
+        if session is None:
+            return {"status": "not_found", "error": f"No process with ID {session_id}"}
+        if session.exited:
+            return {"status": "already_exited", "error": "Process has already finished"}
+
+        if hasattr(session, '_pty') and session._pty:
+            try:
+                session._pty.sendeof()
+                return {"status": "ok", "message": "EOF sent"}
+            except Exception as e:
+                return {"status": "error", "error": str(e)}
+
+        if not session.process or not session.process.stdin:
+            return {"status": "error", "error": "Process stdin not available (non-local backend or stdin closed)"}
+        try:
+            session.process.stdin.close()
+            return {"status": "ok", "message": "stdin closed"}
+        except Exception as e:
+            return {"status": "error", "error": str(e)}
+
    def list_sessions(self, task_id: str = None) -> list:
        """List all running and recently-finished processes."""
        with self._lock:
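
Review note: the new `close` action exists because some programs read stdin until EOF, and a newline alone never ends that read. A self-contained sketch of the non-PTY branch, using a child Python process as a stand-in for such a program (the helper name `run_until_eof` is illustrative):

```python
import subprocess
import sys

# Child program that blocks until stdin reaches EOF, then echoes in upper case.
_CHILD = "import sys; sys.stdout.write(sys.stdin.read().upper())"


def run_until_eof(data: str) -> str:
    """Write data to the child's stdin, close it to send EOF, return output."""
    proc = subprocess.Popen(
        [sys.executable, "-c", _CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    proc.stdin.write(data)
    proc.stdin.close()  # This is the step that 'close' adds; without it the child waits.
    output = proc.stdout.read()
    proc.wait(timeout=10)
    return output
```

With the process tool, the equivalent sequence would be a `write` or `submit` followed by `close` on the same session.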
@@ -915,14 +962,14 @@ PROCESS_SCHEMA = {
        "Actions: 'list' (show all), 'poll' (check status + new output), "
        "'log' (full output with pagination), 'wait' (block until done or timeout), "
        "'kill' (terminate), 'write' (send raw stdin data without newline), "
-        "'submit' (send data + Enter, for answering prompts)."
+        "'submit' (send data + Enter, for answering prompts), 'close' (close stdin/send EOF)."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "action": {
                "type": "string",
-                "enum": ["list", "poll", "log", "wait", "kill", "write", "submit"],
+                "enum": ["list", "poll", "log", "wait", "kill", "write", "submit", "close"],
                "description": "Action to perform on background processes"
            },
            "session_id": {
@@ -962,7 +1009,7 @@ def _handle_process(args, **kw):

    if action == "list":
        return _json.dumps({"processes": process_registry.list_sessions(task_id=task_id)}, ensure_ascii=False)
-    elif action in ("poll", "log", "wait", "kill", "write", "submit"):
+    elif action in ("poll", "log", "wait", "kill", "write", "submit", "close"):
        if not session_id:
            return tool_error(f"session_id is required for {action}")
        if action == "poll":
@@ -978,7 +1025,9 @@ def _handle_process(args, **kw):
            return _json.dumps(process_registry.write_stdin(session_id, str(args.get("data", ""))), ensure_ascii=False)
        elif action == "submit":
            return _json.dumps(process_registry.submit_stdin(session_id, str(args.get("data", ""))), ensure_ascii=False)
-    return tool_error(f"Unknown process action: {action}. Use: list, poll, log, wait, kill, write, submit")
+        elif action == "close":
+            return _json.dumps(process_registry.close_stdin(session_id), ensure_ascii=False)
+    return tool_error(f"Unknown process action: {action}. Use: list, poll, log, wait, kill, write, submit, close")


 registry.register(
@@ -75,6 +75,9 @@ from tools.tool_backend_helpers import (
 )


+# Hard cap (in seconds) on foreground timeout; override via TERMINAL_MAX_FOREGROUND_TIMEOUT env var.
+FOREGROUND_MAX_TIMEOUT = int(os.getenv("TERMINAL_MAX_FOREGROUND_TIMEOUT", "600"))
+
 # Disk usage warning threshold (in GB)
 DISK_USAGE_WARNING_THRESHOLD_GB = float(os.getenv("TERMINAL_DISK_WARNING_GB", "500"))

@@ -1112,6 +1115,21 @@ def _interpret_exit_code(command: str, exit_code: int) -> str | None:
    return None


+def _command_requires_pipe_stdin(command: str) -> bool:
+    """Return True when PTY mode would break stdin-driven commands.
+
+    Some CLIs change behavior when stdin is a TTY. In particular,
+    `gh auth login --with-token` expects the token to arrive via piped stdin and
+    waits for EOF; when we launch it under a PTY, `process.submit()` only sends a
+    newline, so the command appears to hang forever with no visible progress.
+    """
+    normalized = " ".join(command.lower().split())
+    return (
+        normalized.startswith("gh auth login")
+        and "--with-token" in normalized
+    )
+
+
 def terminal_tool(
    command: str,
    background: bool = False,
@@ -1193,6 +1211,17 @@ def terminal_tool(
        default_timeout = config["timeout"]
        effective_timeout = timeout or default_timeout

+        # Reject foreground commands where the model explicitly requests
+        # a timeout above FOREGROUND_MAX_TIMEOUT — nudge it toward background.
+        if not background and timeout and timeout > FOREGROUND_MAX_TIMEOUT:
+            return json.dumps({
+                "error": (
+                    f"Foreground timeout {timeout}s exceeds the maximum of "
+                    f"{FOREGROUND_MAX_TIMEOUT}s. Use background=true with "
+                    f"notify_on_complete=true for long-running commands."
+                ),
+            }, ensure_ascii=False)
+
        # Start cleanup thread
        _start_cleanup_thread()
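
Review note: the foreground timeout guard above can be isolated into a small pure function for testing. This is a sketch under the assumption that only an explicitly requested timeout is checked, as in the diff. `check_foreground_timeout` is a hypothetical name, and the default of 600 mirrors the fallback shown in the hunk.

```python
import json
from typing import Optional

FOREGROUND_MAX_TIMEOUT = 600  # seconds; the diff reads this from an env var


def check_foreground_timeout(
    timeout: Optional[int],
    background: bool,
    max_timeout: int = FOREGROUND_MAX_TIMEOUT,
) -> Optional[str]:
    """Return a JSON error string if the request must be rejected, else None."""
    if not background and timeout and timeout > max_timeout:
        return json.dumps({
            "error": (
                f"Foreground timeout {timeout}s exceeds the maximum of "
                f"{max_timeout}s. Use background=true with "
                f"notify_on_complete=true for long-running commands."
            ),
        }, ensure_ascii=False)
    return None
```

A request with no explicit timeout, or with `background=True`, passes through unchanged, so only over-limit foreground calls see the error.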

@@ -1332,6 +1361,17 @@ def terminal_tool(
                }, ensure_ascii=False)

        # Prepare command for execution
+        pty_disabled_reason = None
+        effective_pty = pty
+        if pty and _command_requires_pipe_stdin(command):
+            effective_pty = False
+            pty_disabled_reason = (
+                "PTY disabled for this command because it expects piped stdin/EOF "
+                "(for example gh auth login --with-token). For local background "
+                "processes, call process(action='close') after writing so it receives "
+                "EOF."
+            )
+
        if background:
            # Spawn a tracked background process via the process registry.
            # For local backends: uses subprocess.Popen with output buffering.
@@ -1349,7 +1389,7 @@ def terminal_tool(
                        task_id=effective_task_id,
                        session_key=session_key,
                        env_vars=env.env if hasattr(env, 'env') else None,
-                        use_pty=pty,
+                        use_pty=effective_pty,
                    )
                else:
                    proc_session = process_registry.spawn_via_env(
@@ -1369,14 +1409,8 @@ def terminal_tool(
                }
                if approval_note:
                    result_data["approval"] = approval_note
-
-                # Transparent timeout clamping note
-                max_timeout = effective_timeout
-                if timeout and timeout > max_timeout:
-                    result_data["timeout_note"] = (
-                        f"Requested timeout {timeout}s was clamped to "
-                        f"configured limit of {max_timeout}s"
-                    )
+                if pty_disabled_reason:
+                    result_data["pty_note"] = pty_disabled_reason

                # Mark for agent notification on completion
                if notify_on_complete and background:
@@ -1705,7 +1739,7 @@ TERMINAL_SCHEMA = {
            },
            "timeout": {
                "type": "integer",
-                "description": "Max seconds to wait (default: 180). Returns INSTANTLY when command finishes — set high for long tasks, you won't wait unnecessarily.",
+                "description": f"Max seconds to wait (default: 180, foreground max: {FOREGROUND_MAX_TIMEOUT}). Returns INSTANTLY when command finishes — set high for long tasks, you won't wait unnecessarily. Foreground timeout above {FOREGROUND_MAX_TIMEOUT}s is rejected; use background=true for longer commands.",
                "minimum": 1
            },
            "workdir": {
@@ -9,9 +9,11 @@ Defense against context-window overflow operates at three levels:
 2. **Per-result persistence** (maybe_persist_tool_result): After a tool
   returns, if its output exceeds the tool's registered threshold
   (registry.get_max_result_size), the full output is written INTO THE
-   SANDBOX at /tmp/hermes-results/{tool_use_id}.txt via env.execute().
-   The in-context content is replaced with a preview + file path reference.
-   The model can read_file to access the full output on any backend.
+   SANDBOX temp dir (for example /tmp/hermes-results/{tool_use_id}.txt on
+   standard Linux, or $TMPDIR/hermes-results/{tool_use_id}.txt on Termux)
+   via env.execute(). The in-context content is replaced with a preview +
+   file path reference. The model can read_file to access the full output
+   on any backend.

 3. **Per-turn aggregate budget** (enforce_turn_budget): After all tool
   results in a single assistant turn are collected, if the total exceeds
@@ -21,6 +23,7 @@ Defense against context-window overflow operates at three levels:
 """

 import logging
+import os
 import uuid

 from tools.budget_config import (
@@ -37,6 +40,22 @@ HEREDOC_MARKER = "HERMES_PERSIST_EOF"
 _BUDGET_TOOL_NAME = "__budget_enforcement__"


+def _resolve_storage_dir(env) -> str:
+    """Return the best temp-backed storage dir for this environment."""
+    if env is not None:
+        get_temp_dir = getattr(env, "get_temp_dir", None)
+        if callable(get_temp_dir):
+            try:
+                temp_dir = get_temp_dir()
+            except Exception as exc:
+                logger.debug("Could not resolve env temp dir: %s", exc)
+            else:
+                if temp_dir:
+                    temp_dir = temp_dir.rstrip("/") or "/"
+                    return f"{temp_dir}/hermes-results"
+    return STORAGE_DIR
+
+
 def generate_preview(content: str, max_chars: int = DEFAULT_PREVIEW_SIZE_CHARS) -> tuple[str, bool]:
    """Truncate at last newline within max_chars. Returns (preview, has_more)."""
    if len(content) <= max_chars:
@@ -58,8 +77,9 @@ def _heredoc_marker(content: str) -> str:
 def _write_to_sandbox(content: str, remote_path: str, env) -> bool:
    """Write content into the sandbox via env.execute(). Returns True on success."""
    marker = _heredoc_marker(content)
+    storage_dir = os.path.dirname(remote_path)
    cmd = (
-        f"mkdir -p {STORAGE_DIR} && cat > {remote_path} << '{marker}'\n"
+        f"mkdir -p {storage_dir} && cat > {remote_path} << '{marker}'\n"
        f"{content}\n"
        f"{marker}"
    )
@@ -125,7 +145,8 @@ def maybe_persist_tool_result(
    if len(content) <= effective_threshold:
        return content

-    remote_path = f"{STORAGE_DIR}/{tool_use_id}.txt"
+    storage_dir = _resolve_storage_dir(env)
+    remote_path = f"{storage_dir}/{tool_use_id}.txt"
    preview, has_more = generate_preview(content, max_chars=config.preview_size)

    if env is not None:
@@ -48,6 +48,47 @@ def _audio_available() -> bool:
        return False


+from hermes_constants import is_termux as _is_termux_environment
+
+
+def _voice_capture_install_hint() -> str:
+    if _is_termux_environment():
+        return "pkg install python-numpy portaudio && python -m pip install sounddevice"
+    return "pip install sounddevice numpy"
+
+
+def _termux_microphone_command() -> Optional[str]:
+    if not _is_termux_environment():
+        return None
+    return shutil.which("termux-microphone-record")
+
+
+def _termux_media_player_command() -> Optional[str]:
+    if not _is_termux_environment():
+        return None
+    return shutil.which("termux-media-player")
+
+
+def _termux_api_app_installed() -> bool:
+    if not _is_termux_environment():
+        return False
+    try:
+        result = subprocess.run(
+            ["pm", "list", "packages", "com.termux.api"],
+            capture_output=True,
+            text=True,
+            timeout=5,
+            check=False,
+        )
+        return "package:com.termux.api" in (result.stdout or "")
+    except Exception:
+        return False
+
+
+def _termux_voice_capture_available() -> bool:
+    return _termux_microphone_command() is not None and _termux_api_app_installed()
+
+
 def detect_audio_environment() -> dict:
    """Detect if the current environment supports audio I/O.

@@ -57,6 +98,9 @@ def detect_audio_environment() -> dict:
    """
    warnings = []   # hard-fail: these block voice mode
    notices = []     # informational: logged but don't block
+    termux_mic_cmd = _termux_microphone_command()
+    termux_app_installed = _termux_api_app_installed()
+    termux_capture = bool(termux_mic_cmd and termux_app_installed)

    # SSH detection
    if any(os.environ.get(v) for v in ('SSH_CLIENT', 'SSH_TTY', 'SSH_CONNECTION')):
@@ -89,23 +133,48 @@ def detect_audio_environment() -> dict:
        try:
            devices = sd.query_devices()
            if not devices:
-                warnings.append("No audio input/output devices detected")
+                if termux_capture:
+                    notices.append("No PortAudio devices detected, but Termux:API microphone capture is available")
+                else:
+                    warnings.append("No audio input/output devices detected")
        except Exception:
            # In WSL with PulseAudio, device queries can fail even though
            # recording/playback works fine. Don't block if PULSE_SERVER is set.
            if os.environ.get('PULSE_SERVER'):
                notices.append("Audio device query failed but PULSE_SERVER is set -- continuing")
+            elif termux_capture:
+                notices.append("PortAudio device query failed, but Termux:API microphone capture is available")
            else:
                warnings.append("Audio subsystem error (PortAudio cannot query devices)")
    except ImportError:
-        warnings.append("Audio libraries not installed (pip install sounddevice numpy)")
+        if termux_capture:
+            notices.append("Termux:API microphone recording available (sounddevice not required)")
+        elif termux_mic_cmd and not termux_app_installed:
+            warnings.append(
+                "Termux:API Android app is not installed. Install/update the Termux:API app to use termux-microphone-record."
+            )
+        else:
+            warnings.append(f"Audio libraries not installed ({_voice_capture_install_hint()})")
    except OSError:
-        warnings.append(
-            "PortAudio system library not found -- install it first:\n"
-            "  Linux:  sudo apt-get install libportaudio2\n"
-            "  macOS:  brew install portaudio\n"
-            "Then retry /voice on."
-        )
+        if termux_capture:
+            notices.append("Termux:API microphone recording available (PortAudio not required)")
+        elif termux_mic_cmd and not termux_app_installed:
+            warnings.append(
+                "Termux:API Android app is not installed. Install/update the Termux:API app to use termux-microphone-record."
+            )
+        elif _is_termux_environment():
+            warnings.append(
+                "PortAudio system library not found -- install it first:\n"
+                "  Termux: pkg install portaudio\n"
+                "Then retry /voice on."
+            )
+        else:
+            warnings.append(
+                "PortAudio system library not found -- install it first:\n"
+                "  Linux:  sudo apt-get install libportaudio2\n"
+                "  macOS:  brew install portaudio\n"
+                "Then retry /voice on."
+            )

    return {
        "available": not warnings,
@@ -174,6 +243,134 @@ def play_beep(frequency: int = 880, duration: float = 0.12, count: int = 1) -> N
        logger.debug("Beep playback failed: %s", e)


+# ============================================================================
+# Termux Audio Recorder
+# ============================================================================
+class TermuxAudioRecorder:
+    """Recorder backend that uses Termux:API microphone capture commands."""
+
+    supports_silence_autostop = False
+
+    def __init__(self) -> None:
+        self._lock = threading.Lock()
+        self._recording = False
+        self._start_time = 0.0
+        self._recording_path: Optional[str] = None
+        self._current_rms = 0
+
+    @property
+    def is_recording(self) -> bool:
+        return self._recording
+
+    @property
+    def elapsed_seconds(self) -> float:
+        if not self._recording:
+            return 0.0
+        return time.monotonic() - self._start_time
+
+    @property
+    def current_rms(self) -> int:
+        return self._current_rms
+
+    def start(self, on_silence_stop=None) -> None:
+        del on_silence_stop  # Termux:API does not expose live silence callbacks.
+        mic_cmd = _termux_microphone_command()
+        if not mic_cmd:
+            raise RuntimeError(
+                "Termux voice capture requires the termux-api package and app.\n"
+                "Install with: pkg install termux-api\n"
+                "Then install/update the Termux:API Android app."
+            )
+        if not _termux_api_app_installed():
+            raise RuntimeError(
+                "Termux voice capture requires the Termux:API Android app.\n"
+                "Install/update the Termux:API app, then retry /voice on."
+            )
+
+        with self._lock:
+            if self._recording:
+                return
+            os.makedirs(_TEMP_DIR, exist_ok=True)
+            timestamp = time.strftime("%Y%m%d_%H%M%S")
+            self._recording_path = os.path.join(_TEMP_DIR, f"recording_{timestamp}.aac")
+
+        command = [
+            mic_cmd,
+            "-f", self._recording_path,
+            "-l", "0",
+            "-e", "aac",
+            "-r", str(SAMPLE_RATE),
+            "-c", str(CHANNELS),
+        ]
+        try:
+            subprocess.run(command, capture_output=True, text=True, timeout=15, check=True)
+        except subprocess.CalledProcessError as e:
+            details = (e.stderr or e.stdout or str(e)).strip()
+            raise RuntimeError(f"Termux microphone start failed: {details}") from e
+        except Exception as e:
+            raise RuntimeError(f"Termux microphone start failed: {e}") from e
+
+        with self._lock:
+            self._start_time = time.monotonic()
+            self._recording = True
+            self._current_rms = 0
+        logger.info("Termux voice recording started")
+
+    def _stop_termux_recording(self) -> None:
+        mic_cmd = _termux_microphone_command()
+        if not mic_cmd:
+            return
+        subprocess.run([mic_cmd, "-q"], capture_output=True, text=True, timeout=15, check=False)
+
+    def stop(self) -> Optional[str]:
+        with self._lock:
+            if not self._recording:
+                return None
+            self._recording = False
+            path = self._recording_path
+            self._recording_path = None
+            started_at = self._start_time
+            self._current_rms = 0
+
+        self._stop_termux_recording()
+        if not path or not os.path.isfile(path):
+            return None
+        if time.monotonic() - started_at < 0.3:
+            try:
+                os.unlink(path)
+            except OSError:
+                pass
+            return None
+        if os.path.getsize(path) <= 0:
+            try:
+                os.unlink(path)
+            except OSError:
+                pass
+            return None
+        logger.info("Termux voice recording stopped: %s", path)
+        return path
+
+    def cancel(self) -> None:
+        with self._lock:
+            path = self._recording_path
+            self._recording = False
+            self._recording_path = None
+            self._current_rms = 0
+        try:
+            self._stop_termux_recording()
+        except Exception:
+            pass
+        if path and os.path.isfile(path):
+            try:
+                os.unlink(path)
+            except OSError:
+                pass
+        logger.info("Termux voice recording cancelled")
+
+    def shutdown(self) -> None:
+        self.cancel()
+
+
 # ============================================================================
 # AudioRecorder
 # ============================================================================
@@ -193,6 +390,8 @@ class AudioRecorder:
    the user is silent for ``silence_duration`` seconds and calls the callback.
    """

+    supports_silence_autostop = True
+
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._stream: Any = None
@@ -526,6 +725,13 @@ class AudioRecorder:
        return wav_path


+def create_audio_recorder() -> AudioRecorder | TermuxAudioRecorder:
+    """Return the best recorder backend for the current environment."""
+    if _termux_voice_capture_available():
+        return TermuxAudioRecorder()
+    return AudioRecorder()
+
+
 # ============================================================================
 # Whisper hallucination filter
 # ============================================================================
@@ -734,7 +940,8 @@ def check_voice_requirements() -> Dict[str, Any]:
    stt_available = stt_enabled and stt_provider != "none"

    missing: List[str] = []
-    has_audio = _audio_available()
+    termux_capture = _termux_voice_capture_available()
+    has_audio = _audio_available() or termux_capture

    if not has_audio:
        missing.extend(["sounddevice", "numpy"])
@@ -745,10 +952,12 @@ def check_voice_requirements() -> Dict[str, Any]:
    available = has_audio and stt_available and env_check["available"]
    details_parts = []

-    if has_audio:
+    if termux_capture:
+        details_parts.append("Audio capture: OK (Termux:API microphone)")
+    elif has_audio:
        details_parts.append("Audio capture: OK")
    else:
-        details_parts.append("Audio capture: MISSING (pip install sounddevice numpy)")
+        details_parts.append(f"Audio capture: MISSING ({_voice_capture_install_hint()})")

    if not stt_enabled:
        details_parts.append("STT provider: DISABLED in config (stt.enabled: false)")
@@ -1772,6 +1772,15 @@ slack = [
 sms = [
    { name = "aiohttp" },
 ]
+termux = [
+    { name = "agent-client-protocol" },
+    { name = "croniter" },
+    { name = "honcho-ai" },
+    { name = "mcp" },
+    { name = "ptyprocess", marker = "sys_platform != 'win32'" },
+    { name = "pywinpty", marker = "sys_platform == 'win32'" },
+    { name = "simple-term-menu" },
+]
 tts-premium = [
    { name = "elevenlabs" },
 ]
@@ -1806,19 +1815,25 @@ requires-dist = [
    { name = "fire", specifier = ">=0.7.1,<1" },
    { name = "firecrawl-py", specifier = ">=4.16.0,<5" },
    { name = "hermes-agent", extras = ["acp"], marker = "extra == 'all'" },
+    { name = "hermes-agent", extras = ["acp"], marker = "extra == 'termux'" },
    { name = "hermes-agent", extras = ["cli"], marker = "extra == 'all'" },
+    { name = "hermes-agent", extras = ["cli"], marker = "extra == 'termux'" },
    { name = "hermes-agent", extras = ["cron"], marker = "extra == 'all'" },
+    { name = "hermes-agent", extras = ["cron"], marker = "extra == 'termux'" },
    { name = "hermes-agent", extras = ["daytona"], marker = "extra == 'all'" },
    { name = "hermes-agent", extras = ["dev"], marker = "extra == 'all'" },
    { name = "hermes-agent", extras = ["dingtalk"], marker = "extra == 'all'" },
    { name = "hermes-agent", extras = ["feishu"], marker = "extra == 'all'" },
    { name = "hermes-agent", extras = ["homeassistant"], marker = "extra == 'all'" },
    { name = "hermes-agent", extras = ["honcho"], marker = "extra == 'all'" },
+    { name = "hermes-agent", extras = ["honcho"], marker = "extra == 'termux'" },
    { name = "hermes-agent", extras = ["mcp"], marker = "extra == 'all'" },
+    { name = "hermes-agent", extras = ["mcp"], marker = "extra == 'termux'" },
    { name = "hermes-agent", extras = ["messaging"], marker = "extra == 'all'" },
    { name = "hermes-agent", extras = ["mistral"], marker = "extra == 'all'" },
    { name = "hermes-agent", extras = ["modal"], marker = "extra == 'all'" },
    { name = "hermes-agent", extras = ["pty"], marker = "extra == 'all'" },
+    { name = "hermes-agent", extras = ["pty"], marker = "extra == 'termux'" },
    { name = "hermes-agent", extras = ["slack"], marker = "extra == 'all'" },
    { name = "hermes-agent", extras = ["sms"], marker = "extra == 'all'" },
    { name = "hermes-agent", extras = ["tts-premium"], marker = "extra == 'all'" },
@@ -1861,7 +1876,7 @@ requires-dist = [
    { name = "wandb", marker = "extra == 'rl'", specifier = ">=0.15.0,<1" },
    { name = "yc-bench", marker = "python_full_version >= '3.12' and extra == 'yc-bench'", git = "https://github.com/collinear-ai/yc-bench.git" },
 ]
-provides-extras = ["modal", "daytona", "dev", "messaging", "cron", "slack", "matrix", "cli", "tts-premium", "voice", "pty", "honcho", "mcp", "homeassistant", "sms", "acp", "mistral", "dingtalk", "feishu", "rl", "yc-bench", "all"]
+provides-extras = ["modal", "daytona", "dev", "messaging", "cron", "slack", "matrix", "cli", "tts-premium", "voice", "pty", "honcho", "mcp", "homeassistant", "sms", "acp", "mistral", "termux", "dingtalk", "feishu", "rl", "yc-bench", "all"]

 [[package]]
 name = "hf-transfer"
@@ -132,6 +132,22 @@ import requests, json
 # Print summary to stdout — agent analyzes and reports
 ```

+The script timeout defaults to 120 seconds. `_get_script_timeout()` resolves the limit by checking the following sources in order and using the first one that applies:
+
+1. **Module-level override** — `_SCRIPT_TIMEOUT` (for tests/monkeypatching). Only used when it differs from the default.
+2. **Environment variable** — `HERMES_CRON_SCRIPT_TIMEOUT`
+3. **Config** — `cron.script_timeout_seconds` in `config.yaml` (read via `load_config()`)
+4. **Default** — 120 seconds
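The resolution order above can be sketched roughly as follows. This is a minimal illustration, not the actual body of `_get_script_timeout()`: the function name `resolve_script_timeout`, its parameters, and the `_positive_float` helper are hypothetical, and the choice to skip unparsable or non-positive values and fall through to the next source is an assumption.

```python
import os
from typing import Any, Mapping, Optional

DEFAULT_SCRIPT_TIMEOUT = 120  # seconds, per the documented default


def _positive_float(value: Any) -> Optional[float]:
    """Hypothetical helper: parse a positive number, or return None."""
    try:
        parsed = float(value)
    except (TypeError, ValueError):
        return None
    return parsed if parsed > 0 else None


def resolve_script_timeout(
    config: Optional[Mapping[str, Any]] = None,
    environ: Optional[Mapping[str, str]] = None,
    module_override: float = DEFAULT_SCRIPT_TIMEOUT,
) -> float:
    """Return the first timeout found: override, env var, config, default."""
    environ = os.environ if environ is None else environ
    config = config or {}

    # 1. Module-level override, honored only when it differs from the default.
    if module_override != DEFAULT_SCRIPT_TIMEOUT:
        return float(module_override)

    # 2. Environment variable (assumption: invalid values are skipped).
    from_env = _positive_float(environ.get("HERMES_CRON_SCRIPT_TIMEOUT"))
    if from_env is not None:
        return from_env

    # 3. cron.script_timeout_seconds from the loaded config mapping.
    from_cfg = _positive_float((config.get("cron") or {}).get("script_timeout_seconds"))
    if from_cfg is not None:
        return from_cfg

    # 4. Built-in default.
    return float(DEFAULT_SCRIPT_TIMEOUT)
```

Passing `config` and `environ` explicitly keeps the sketch testable without touching the real process environment or `config.yaml`.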
+
+### Provider Recovery
+
+`run_job()` passes the user's configured fallback providers and credential pool into the `AIAgent` instance:
+
+- **Fallback providers** — reads `fallback_providers` (list) or `fallback_model` (legacy dict) from `config.yaml`, matching the gateway's `_load_fallback_model()` pattern. Passed as `fallback_model=` to `AIAgent.__init__`, which normalizes both formats into a fallback chain.
+- **Credential pool** — loads via `load_pool(provider)` from `agent.credential_pool` using the resolved runtime provider name. Only passed when the pool has credentials (`pool.has_credentials()`). Enables same-provider key rotation on 429/rate-limit errors.
+
+This mirrors the gateway's behavior — without it, cron agents would fail on rate limits without attempting recovery.
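As a rough sketch of the wiring described in this section, the snippet below assembles the recovery-related keyword arguments from a config mapping and a credential pool. Only `fallback_providers`, `fallback_model`, the `fallback_model=` keyword, and `has_credentials()` come from the text above; the helper `build_recovery_kwargs`, the `StubPool` class, the `credential_pool` keyword name, and the precedence of `fallback_providers` over `fallback_model` are assumptions made for illustration.

```python
from typing import Any, Dict, List, Mapping, Optional


class StubPool:
    """Hypothetical stand-in for the object returned by load_pool(provider)."""

    def __init__(self, keys: List[str]) -> None:
        self._keys = list(keys)

    def has_credentials(self) -> bool:
        return bool(self._keys)


def build_recovery_kwargs(
    config: Mapping[str, Any],
    pool: Optional[StubPool] = None,
) -> Dict[str, Any]:
    """Collect fallback and credential-pool arguments for an agent instance."""
    kwargs: Dict[str, Any] = {}

    # Assumed precedence: the list form first, then the legacy dict form.
    # Either shape is handed over as fallback_model=; the agent normalizes it.
    fallback = config.get("fallback_providers") or config.get("fallback_model")
    if fallback:
        kwargs["fallback_model"] = fallback

    # Pass the pool only when it actually holds credentials.
    # The keyword name "credential_pool" is a placeholder, not confirmed API.
    if pool is not None and pool.has_credentials():
        kwargs["credential_pool"] = pool

    return kwargs
```

An empty pool and a config with no fallback entries yield an empty dict, so the agent would be constructed exactly as it was before this change.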
+
 ## Delivery Model

 Cron job results can be delivered to any supported platform:
@@ -1,7 +1,7 @@
 ---
 sidebar_position: 2
 title: "Installation"
-description: "Install Hermes Agent on Linux, macOS, or WSL2"
+description: "Install Hermes Agent on Linux, macOS, WSL2, or Android via Termux"
 ---

 # Installation
@@ -16,6 +16,23 @@ Get Hermes Agent up and running in under two minutes with the one-line installer
 curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
 ```

+### Android / Termux
+
+Hermes now ships a Termux-aware installer path too:
+
+```bash
+curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
+```
+
+The installer detects Termux automatically and switches to a tested Android flow:
+- uses Termux `pkg` for system dependencies (`git`, `python`, `nodejs`, `ripgrep`, `ffmpeg`, build tools)
+- creates the virtualenv with `python -m venv`
+- exports `ANDROID_API_LEVEL` automatically for Android wheel builds
+- installs a curated `.[termux]` extra with `pip`
+- skips the untested browser / WhatsApp bootstrap by default
+
+If you want the fully explicit path, follow the dedicated [Termux guide](./termux.md).
+
 :::warning Windows
 Native Windows is **not supported**. Please install [WSL2](https://learn.microsoft.com/en-us/windows/wsl/install) and run Hermes Agent from there. The install command above works inside WSL2.
 :::
@@ -125,6 +142,7 @@ uv pip install -e "."
 | `tts-premium` | ElevenLabs premium voices | `uv pip install -e ".[tts-premium]"` |
 | `voice` | CLI microphone input + audio playback | `uv pip install -e ".[voice]"` |
 | `pty` | PTY terminal support | `uv pip install -e ".[pty]"` |
+| `termux` | Tested Android / Termux bundle (`cron`, `cli`, `pty`, `mcp`, `honcho`, `acp`) | `python -m pip install -e ".[termux]" -c constraints-termux.txt` |
 | `honcho` | AI-native memory (Honcho integration) | `uv pip install -e ".[honcho]"` |
 | `mcp` | Model Context Protocol support | `uv pip install -e ".[mcp]"` |
 | `homeassistant` | Home Assistant integration | `uv pip install -e ".[homeassistant]"` |
@@ -134,6 +152,10 @@ uv pip install -e "."

 You can combine extras: `uv pip install -e ".[messaging,cron]"`

+:::tip Termux users
+`.[all]` is not currently available on Android because the `voice` extra pulls `faster-whisper`, which depends on `ctranslate2` wheels that are not published for Android. Use `.[termux]` for the tested mobile install path, then add individual extras only as needed.
+:::
+
 </details>

 ### Step 4: Install Optional Submodules (if needed)
@@ -13,10 +13,14 @@ This guide walks you through installing Hermes Agent, setting up a provider, and
 Run the one-line installer:

 ```bash
-# Linux / macOS / WSL2
+# Linux / macOS / WSL2 / Android (Termux)
 curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
 ```

+:::tip Android / Termux
+If you're installing on a phone, see the dedicated [Termux guide](./termux.md) for the tested manual path, supported extras, and current Android-specific limitations.
+:::
+
 :::tip Windows Users
 Install [WSL2](https://learn.microsoft.com/en-us/windows/wsl/install) first, then run the command above inside your WSL2 terminal.
 :::