fix: clear conversation_history after mid-loop compression to prevent empty sessions

After mid-loop compression (triggered by 413, context_overflow, or Anthropic long-context tier errors), _compress_context() creates a new session in SQLite and resets _last_flushed_db_idx=0. However, conversation_history was not cleared, so _flush_messages_to_session_db() computed: flush_from = max(len(conversation_history=200), _last_flushed_db_idx=0) = 200 messages[200:] → empty (compressed messages < 200) This resulted in zero messages being written to the new session's SQLite store. On resume, the user would see 'Session found but has no messages.' The preflight compression path (line 7311) already had the fix: conversation_history = None This commit adds the same clearing to the three mid-loop compression sites: - Anthropic long-context tier overflow - HTTP 413 payload too large - Generic context_overflow error Reported by Aaryan (Nous community).
fix(gateway): bypass text batching when delay is 0 (#6996 )
2026-04-10 00:05:04 -07:00 · 2026-04-09 23:59:20 -07:00 · 2026-04-09 23:28:25 -07:00 · 2026-04-09 23:25:56 -07:00 · 2026-04-09 23:25:27 -07:00 · 2026-04-09 23:25:27 -07:00
144 changed files with 8004 additions and 495 deletions
@@ -13,7 +13,8 @@ COPY . /opt/hermes
 WORKDIR /opt/hermes

 # Install Python and Node dependencies in one layer, no cache
-RUN pip install --no-cache-dir -e ".[all]" --break-system-packages && \
+RUN pip install --no-cache-dir uv --break-system-packages && \
+    uv pip install --system --break-system-packages --no-cache -e ".[all]" && \
    npm install --prefer-offline --no-audit && \
    npx playwright install --with-deps chromium --only-shell && \
    cd /opt/hermes/scripts/whatsapp-bridge && \
@@ -33,8 +33,10 @@ Use any model you want — [Nous Portal](https://portal.nousresearch.com), [Open
 curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
 ```

-Works on Linux, macOS, and WSL2. The installer handles everything — Python, Node.js, dependencies, and the `hermes` command. No prerequisites except git.
+Works on Linux, macOS, WSL2, and Android via Termux. The installer handles the platform-specific setup for you.

+> **Android / Termux:** The tested manual path is documented in the [Termux guide](https://hermes-agent.nousresearch.com/docs/getting-started/termux). On Termux, Hermes installs a curated `.[termux]` extra because the full `.[all]` extra currently pulls Android-incompatible voice dependencies.
+>
 > **Windows:** Native Windows is not supported. Please install [WSL2](https://learn.microsoft.com/en-us/windows/wsl/install) and run the command above.

 After installation:
@@ -74,8 +74,11 @@ def _get_anthropic_max_output(model: str) -> int:
    model IDs (claude-sonnet-4-5-20250929) and variant suffixes (:1m, :fast)
    resolve correctly.  Longest-prefix match wins to avoid e.g. "claude-3-5"
    matching before "claude-3-5-sonnet".
+
+    Normalizes dots to hyphens so that model names like
+    ``anthropic/claude-opus-4.6`` match the ``claude-opus-4-6`` table key.
    """
-    m = model.lower()
+    m = model.lower().replace(".", "-")
    best_key = ""
    best_val = _ANTHROPIC_DEFAULT_OUTPUT_LIMIT
    for key, val in _ANTHROPIC_OUTPUT_LIMITS.items():
@@ -95,6 +98,10 @@ _COMMON_BETAS = [
    "interleaved-thinking-2025-05-14",
    "fine-grained-tool-streaming-2025-05-14",
 ]
+# MiniMax's Anthropic-compatible endpoints fail tool-use requests when
+# the fine-grained tool streaming beta is present.  Omit it so tool calls
+# fall back to the provider's default response path.
+_TOOL_STREAMING_BETA = "fine-grained-tool-streaming-2025-05-14"

 # Additional beta headers required for OAuth/subscription auth.
 # Matches what Claude Code (and pi-ai / OpenCode) send.
@@ -204,6 +211,19 @@ def _requires_bearer_auth(base_url: str | None) -> bool:
    return normalized.startswith(("https://api.minimax.io/anthropic", "https://api.minimaxi.com/anthropic"))


+def _common_betas_for_base_url(base_url: str | None) -> list[str]:
+    """Return the beta headers that are safe for the configured endpoint.
+
+    MiniMax's Anthropic-compatible endpoints (Bearer-auth) reject requests
+    that include Anthropic's ``fine-grained-tool-streaming`` beta — every
+    tool-use message triggers a connection error.  Strip that beta for
+    Bearer-auth endpoints while keeping all other betas intact.
+    """
+    if _requires_bearer_auth(base_url):
+        return [b for b in _COMMON_BETAS if b != _TOOL_STREAMING_BETA]
+    return _COMMON_BETAS
+
+
 def build_anthropic_client(api_key: str, base_url: str = None):
    """Create an Anthropic client, auto-detecting setup-tokens vs API keys.

@@ -222,6 +242,7 @@ def build_anthropic_client(api_key: str, base_url: str = None):
    }
    if normalized_base_url:
        kwargs["base_url"] = normalized_base_url
+    common_betas = _common_betas_for_base_url(normalized_base_url)

    if _requires_bearer_auth(normalized_base_url):
        # Some Anthropic-compatible providers (e.g. MiniMax) expect the API key in
@@ -231,21 +252,21 @@ def build_anthropic_client(api_key: str, base_url: str = None):
        # not use Anthropic's sk-ant-api prefix and would otherwise be misread as
        # Anthropic OAuth/setup tokens.
        kwargs["auth_token"] = api_key
-        if _COMMON_BETAS:
-            kwargs["default_headers"] = {"anthropic-beta": ",".join(_COMMON_BETAS)}
+        if common_betas:
+            kwargs["default_headers"] = {"anthropic-beta": ",".join(common_betas)}
    elif _is_third_party_anthropic_endpoint(base_url):
        # Third-party proxies (Azure AI Foundry, AWS Bedrock, etc.) use their
        # own API keys with x-api-key auth. Skip OAuth detection — their keys
        # don't follow Anthropic's sk-ant-* prefix convention and would be
        # misclassified as OAuth tokens.
        kwargs["api_key"] = api_key
-        if _COMMON_BETAS:
-            kwargs["default_headers"] = {"anthropic-beta": ",".join(_COMMON_BETAS)}
+        if common_betas:
+            kwargs["default_headers"] = {"anthropic-beta": ",".join(common_betas)}
    elif _is_oauth_token(api_key):
        # OAuth access token / setup-token → Bearer auth + Claude Code identity.
        # Anthropic routes OAuth requests based on user-agent and headers;
        # without Claude Code's fingerprint, requests get intermittent 500s.
-        all_betas = _COMMON_BETAS + _OAUTH_ONLY_BETAS
+        all_betas = common_betas + _OAUTH_ONLY_BETAS
        kwargs["auth_token"] = api_key
        kwargs["default_headers"] = {
            "anthropic-beta": ",".join(all_betas),
@@ -255,8 +276,8 @@ def build_anthropic_client(api_key: str, base_url: str = None):
    else:
        # Regular API key → x-api-key header + common betas
        kwargs["api_key"] = api_key
-        if _COMMON_BETAS:
-            kwargs["default_headers"] = {"anthropic-beta": ",".join(_COMMON_BETAS)}
+        if common_betas:
+            kwargs["default_headers"] = {"anthropic-beta": ",".join(common_betas)}

    return _anthropic_sdk.Anthropic(**kwargs)

@@ -485,6 +506,35 @@ def _prefer_refreshable_claude_code_token(env_token: str, creds: Optional[Dict[s
    return None


+def get_anthropic_token_source(token: Optional[str] = None) -> str:
+    """Best-effort source classification for an Anthropic credential token."""
+    token = (token or "").strip()
+    if not token:
+        return "none"
+
+    env_token = os.getenv("ANTHROPIC_TOKEN", "").strip()
+    if env_token and env_token == token:
+        return "anthropic_token_env"
+
+    cc_env_token = os.getenv("CLAUDE_CODE_OAUTH_TOKEN", "").strip()
+    if cc_env_token and cc_env_token == token:
+        return "claude_code_oauth_token_env"
+
+    creds = read_claude_code_credentials()
+    if creds and creds.get("accessToken") == token:
+        return str(creds.get("source") or "claude_code_credentials")
+
+    managed_key = read_claude_managed_key()
+    if managed_key and managed_key == token:
+        return "claude_json_primary_api_key"
+
+    api_key = os.getenv("ANTHROPIC_API_KEY", "").strip()
+    if api_key and api_key == token:
+        return "anthropic_api_key_env"
+
+    return "unknown"
+
+
 def resolve_anthropic_token() -> Optional[str]:
    """Resolve an Anthropic token from all available sources.

@@ -691,6 +741,21 @@ def run_hermes_oauth_login_pure() -> Optional[Dict[str, Any]]:
    }


+def _save_hermes_oauth_credentials(access_token: str, refresh_token: str, expires_at_ms: int) -> None:
+    """Save OAuth credentials to ~/.hermes/.anthropic_oauth.json."""
+    data = {
+        "accessToken": access_token,
+        "refreshToken": refresh_token,
+        "expiresAt": expires_at_ms,
+    }
+    try:
+        _HERMES_OAUTH_FILE.parent.mkdir(parents=True, exist_ok=True)
+        _HERMES_OAUTH_FILE.write_text(json.dumps(data, indent=2), encoding="utf-8")
+        _HERMES_OAUTH_FILE.chmod(0o600)
+    except (OSError, IOError) as e:
+        logger.debug("Failed to save Hermes OAuth credentials: %s", e)
+
+
 def read_hermes_oauth_credentials() -> Optional[Dict[str, Any]]:
    """Read Hermes-managed OAuth credentials from ~/.hermes/.anthropic_oauth.json."""
    if _HERMES_OAUTH_FILE.exists():
@@ -739,6 +804,39 @@ def _sanitize_tool_id(tool_id: str) -> str:
    return sanitized or "tool_0"


+def _convert_openai_image_part_to_anthropic(part: Dict[str, Any]) -> Optional[Dict[str, Any]]:
+    """Convert an OpenAI-style image block to Anthropic's image source format."""
+    image_data = part.get("image_url", {})
+    url = image_data.get("url", "") if isinstance(image_data, dict) else str(image_data)
+    if not isinstance(url, str) or not url.strip():
+        return None
+    url = url.strip()
+
+    if url.startswith("data:"):
+        header, sep, data = url.partition(",")
+        if sep and ";base64" in header:
+            media_type = header[5:].split(";", 1)[0] or "image/png"
+            return {
+                "type": "image",
+                "source": {
+                    "type": "base64",
+                    "media_type": media_type,
+                    "data": data,
+                },
+            }
+
+    if url.startswith(("http://", "https://")):
+        return {
+            "type": "image",
+            "source": {
+                "type": "url",
+                "url": url,
+            },
+        }
+
+    return None
+
+
 def convert_tools_to_anthropic(tools: List[Dict]) -> List[Dict]:
    """Convert OpenAI tool definitions to Anthropic format."""
    if not tools:
@@ -1350,4 +1448,4 @@ def normalize_anthropic_response(
            reasoning_details=reasoning_details or None,
        ),
        finish_reason,
-    )
+    )
@@ -967,6 +967,40 @@ def _try_anthropic() -> Tuple[Optional[Any], Optional[str]]:
    return AnthropicAuxiliaryClient(real_client, model, token, base_url, is_oauth=is_oauth), model


+def _resolve_forced_provider(forced: str) -> Tuple[Optional[OpenAI], Optional[str]]:
+    """Resolve a specific forced provider.  Returns (None, None) if creds missing."""
+    if forced == "openrouter":
+        client, model = _try_openrouter()
+        if client is None:
+            logger.warning("auxiliary.provider=openrouter but OPENROUTER_API_KEY not set")
+        return client, model
+
+    if forced == "nous":
+        client, model = _try_nous()
+        if client is None:
+            logger.warning("auxiliary.provider=nous but Nous Portal not configured (run: hermes auth)")
+        return client, model
+
+    if forced == "codex":
+        client, model = _try_codex()
+        if client is None:
+            logger.warning("auxiliary.provider=codex but no Codex OAuth token found (run: hermes model)")
+        return client, model
+
+    if forced == "main":
+        # "main" = skip OpenRouter/Nous, use the main chat model's credentials.
+        for try_fn in (_try_custom_endpoint, _try_codex, _resolve_api_key_provider):
+            client, model = try_fn()
+            if client is not None:
+                return client, model
+        logger.warning("auxiliary.provider=main but no main endpoint credentials found")
+        return None, None
+
+    # Unknown provider name — fall through to auto
+    logger.warning("Unknown auxiliary.provider=%r, falling back to auto", forced)
+    return None, None
+
+
 _AUTO_PROVIDER_LABELS = {
    "_try_openrouter": "openrouter",
    "_try_nous": "nous",
@@ -1461,6 +1495,22 @@ def _strict_vision_backend_available(provider: str) -> bool:
    return _resolve_strict_vision_backend(provider)[0] is not None


+def _preferred_main_vision_provider() -> Optional[str]:
+    """Return the selected main provider when it is also a supported vision backend."""
+    try:
+        from hermes_cli.config import load_config
+
+        config = load_config()
+        model_cfg = config.get("model", {})
+        if isinstance(model_cfg, dict):
+            provider = _normalize_vision_provider(model_cfg.get("provider", ""))
+            if provider in _VISION_AUTO_PROVIDER_ORDER:
+                return provider
+    except Exception:
+        pass
+    return None
+
+
 def get_available_vision_backends() -> List[str]:
    """Return the currently available vision backends in auto-selection order.

@@ -1574,6 +1624,18 @@ def resolve_vision_provider_client(
    return requested, client, final_model


+def get_vision_auxiliary_client() -> Tuple[Optional[OpenAI], Optional[str]]:
+    """Return (client, default_model_slug) for vision/multimodal auxiliary tasks."""
+    _, client, final_model = resolve_vision_provider_client(async_mode=False)
+    return client, final_model
+
+
+def get_async_vision_auxiliary_client():
+    """Return (async_client, model_slug) for async vision consumers."""
+    _, client, final_model = resolve_vision_provider_client(async_mode=True)
+    return client, final_model
+
+
 def get_auxiliary_extra_body() -> dict:
    """Return extra_body kwargs for auxiliary API calls.
    
@@ -0,0 +1,114 @@
+"""BuiltinMemoryProvider — wraps MEMORY.md / USER.md as a MemoryProvider.
+
+Always registered as the first provider. Cannot be disabled or removed.
+This is the existing Hermes memory system exposed through the provider
+interface for compatibility with the MemoryManager.
+
+The actual storage logic lives in tools/memory_tool.py (MemoryStore).
+This provider is a thin adapter that delegates to MemoryStore and
+exposes the memory tool schema.
+"""
+
+from __future__ import annotations
+
+import json
+import logging
+from typing import Any, Dict, List
+
+from agent.memory_provider import MemoryProvider
+from tools.registry import tool_error
+
+logger = logging.getLogger(__name__)
+
+
+class BuiltinMemoryProvider(MemoryProvider):
+    """Built-in file-backed memory (MEMORY.md + USER.md).
+
+    Always active, never disabled by other providers. The `memory` tool
+    is handled by run_agent.py's agent-level tool interception (not through
+    the normal registry), so get_tool_schemas() returns an empty list —
+    the memory tool is already wired separately.
+    """
+
+    def __init__(
+        self,
+        memory_store=None,
+        memory_enabled: bool = False,
+        user_profile_enabled: bool = False,
+    ):
+        self._store = memory_store
+        self._memory_enabled = memory_enabled
+        self._user_profile_enabled = user_profile_enabled
+
+    @property
+    def name(self) -> str:
+        return "builtin"
+
+    def is_available(self) -> bool:
+        """Built-in memory is always available."""
+        return True
+
+    def initialize(self, session_id: str, **kwargs) -> None:
+        """Load memory from disk if not already loaded."""
+        if self._store is not None:
+            self._store.load_from_disk()
+
+    def system_prompt_block(self) -> str:
+        """Return MEMORY.md and USER.md content for the system prompt.
+
+        Uses the frozen snapshot captured at load time. This ensures the
+        system prompt stays stable throughout a session (preserving the
+        prompt cache), even though the live entries may change via tool calls.
+        """
+        if not self._store:
+            return ""
+
+        parts = []
+        if self._memory_enabled:
+            mem_block = self._store.format_for_system_prompt("memory")
+            if mem_block:
+                parts.append(mem_block)
+        if self._user_profile_enabled:
+            user_block = self._store.format_for_system_prompt("user")
+            if user_block:
+                parts.append(user_block)
+
+        return "\n\n".join(parts)
+
+    def prefetch(self, query: str, *, session_id: str = "") -> str:
+        """Built-in memory doesn't do query-based recall — it's injected via system_prompt_block."""
+        return ""
+
+    def sync_turn(self, user_content: str, assistant_content: str, *, session_id: str = "") -> None:
+        """Built-in memory doesn't auto-sync turns — writes happen via the memory tool."""
+
+    def get_tool_schemas(self) -> List[Dict[str, Any]]:
+        """Return empty list.
+
+        The `memory` tool is an agent-level intercepted tool, handled
+        specially in run_agent.py before normal tool dispatch. It's not
+        part of the standard tool registry. We don't duplicate it here.
+        """
+        return []
+
+    def handle_tool_call(self, tool_name: str, args: Dict[str, Any], **kwargs) -> str:
+        """Not used — the memory tool is intercepted in run_agent.py."""
+        return tool_error("Built-in memory tool is handled by the agent loop")
+
+    def shutdown(self) -> None:
+        """No cleanup needed — files are saved on every write."""
+
+    # -- Property access for backward compatibility --------------------------
+
+    @property
+    def store(self):
+        """Access the underlying MemoryStore for legacy code paths."""
+        return self._store
+
+    @property
+    def memory_enabled(self) -> bool:
+        return self._memory_enabled
+
+    @property
+    def user_profile_enabled(self) -> bool:
+        return self._user_profile_enabled
@@ -114,6 +114,7 @@ class ContextCompressor:

        self.last_prompt_tokens = 0
        self.last_completion_tokens = 0
+        self.last_total_tokens = 0

        self.summary_model = summary_model_override or ""

@@ -125,12 +126,28 @@ class ContextCompressor:
        """Update tracked token usage from API response."""
        self.last_prompt_tokens = usage.get("prompt_tokens", 0)
        self.last_completion_tokens = usage.get("completion_tokens", 0)
+        self.last_total_tokens = usage.get("total_tokens", 0)

    def should_compress(self, prompt_tokens: int = None) -> bool:
        """Check if context exceeds the compression threshold."""
        tokens = prompt_tokens if prompt_tokens is not None else self.last_prompt_tokens
        return tokens >= self.threshold_tokens

+    def should_compress_preflight(self, messages: List[Dict[str, Any]]) -> bool:
+        """Quick pre-flight check using rough estimate (before API call)."""
+        rough_estimate = estimate_messages_tokens_rough(messages)
+        return rough_estimate >= self.threshold_tokens
+
+    def get_status(self) -> Dict[str, Any]:
+        """Get current compression status for display/logging."""
+        return {
+            "last_prompt_tokens": self.last_prompt_tokens,
+            "threshold_tokens": self.threshold_tokens,
+            "context_length": self.context_length,
+            "usage_percent": min(100, (self.last_prompt_tokens / self.context_length * 100)) if self.context_length else 0,
+            "compression_count": self.compression_count,
+        }
+
    # ------------------------------------------------------------------
    # Tool output pruning (cheap pre-pass, no LLM call)
    # ------------------------------------------------------------------
@@ -20,6 +20,7 @@ from hermes_cli.auth import (
    DEFAULT_AGENT_KEY_MIN_TTL_SECONDS,
    KIMI_CODE_BASE_URL,
    PROVIDER_REGISTRY,
+    _auth_store_lock,
    _codex_access_token_is_expiring,
    _decode_jwt_claims,
    _import_codex_cli_tokens,
@@ -27,6 +28,8 @@ from hermes_cli.auth import (
    _load_provider_state,
    _resolve_kimi_base_url,
    _resolve_zai_base_url,
+    _save_auth_store,
+    _save_provider_state,
    read_credential_pool,
    write_credential_pool,
 )
@@ -479,6 +482,67 @@ class CredentialPool:
            logger.debug("Failed to sync from ~/.codex/auth.json: %s", exc)
        return entry

+    def _sync_device_code_entry_to_auth_store(self, entry: PooledCredential) -> None:
+        """Write refreshed pool entry tokens back to auth.json providers.
+
+        After a pool-level refresh, the pool entry has fresh tokens but
+        auth.json's ``providers.<id>`` still holds the pre-refresh state.
+        On the next ``load_pool()``, ``_seed_from_singletons()`` reads that
+        stale state and can overwrite the fresh pool entry — potentially
+        re-seeding a consumed single-use refresh token.
+
+        Applies to any OAuth provider whose singleton lives in auth.json
+        (currently Nous and OpenAI Codex).
+        """
+        if entry.source != "device_code":
+            return
+        try:
+            with _auth_store_lock():
+                auth_store = _load_auth_store()
+                if self.provider == "nous":
+                    state = _load_provider_state(auth_store, "nous")
+                    if state is None:
+                        return
+                    state["access_token"] = entry.access_token
+                    if entry.refresh_token:
+                        state["refresh_token"] = entry.refresh_token
+                    if entry.expires_at:
+                        state["expires_at"] = entry.expires_at
+                    if entry.agent_key:
+                        state["agent_key"] = entry.agent_key
+                    if entry.agent_key_expires_at:
+                        state["agent_key_expires_at"] = entry.agent_key_expires_at
+                    for extra_key in ("obtained_at", "expires_in", "agent_key_id",
+                                      "agent_key_expires_in", "agent_key_reused",
+                                      "agent_key_obtained_at"):
+                        val = entry.extra.get(extra_key)
+                        if val is not None:
+                            state[extra_key] = val
+                    if entry.inference_base_url:
+                        state["inference_base_url"] = entry.inference_base_url
+                    _save_provider_state(auth_store, "nous", state)
+
+                elif self.provider == "openai-codex":
+                    state = _load_provider_state(auth_store, "openai-codex")
+                    if not isinstance(state, dict):
+                        return
+                    tokens = state.get("tokens")
+                    if not isinstance(tokens, dict):
+                        return
+                    tokens["access_token"] = entry.access_token
+                    if entry.refresh_token:
+                        tokens["refresh_token"] = entry.refresh_token
+                    if entry.last_refresh:
+                        state["last_refresh"] = entry.last_refresh
+                    _save_provider_state(auth_store, "openai-codex", state)
+
+                else:
+                    return
+
+                _save_auth_store(auth_store)
+        except Exception as exc:
+            logger.debug("Failed to sync %s pool entry back to auth store: %s", self.provider, exc)
+
    def _refresh_entry(self, entry: PooledCredential, *, force: bool) -> Optional[PooledCredential]:
        if entry.auth_type != AUTH_TYPE_OAUTH or not entry.refresh_token:
            if force:
@@ -513,6 +577,13 @@ class CredentialPool:
                    except Exception as wexc:
                        logger.debug("Failed to write refreshed token to credentials file: %s", wexc)
            elif self.provider == "openai-codex":
+                # Proactively sync from ~/.codex/auth.json before refresh.
+                # The Codex CLI (or another Hermes profile) may have already
+                # consumed our refresh_token.  Syncing first avoids a
+                # "refresh_token_reused" error when the CLI has a newer pair.
+                synced = self._sync_codex_entry_from_cli(entry)
+                if synced is not entry:
+                    entry = synced
                refreshed = auth_mod.refresh_codex_oauth_pure(
                    entry.access_token,
                    entry.refresh_token,
@@ -598,6 +669,37 @@ class CredentialPool:
                    # Credentials file had a valid (non-expired) token — use it directly
                    logger.debug("Credentials file has valid token, using without refresh")
                    return synced
+            # For openai-codex: the refresh_token may have been consumed by
+            # the Codex CLI between our proactive sync and the refresh call.
+            # Re-sync and retry once.
+            if self.provider == "openai-codex":
+                synced = self._sync_codex_entry_from_cli(entry)
+                if synced.refresh_token != entry.refresh_token:
+                    logger.debug("Retrying Codex refresh with synced token from ~/.codex/auth.json")
+                    try:
+                        refreshed = auth_mod.refresh_codex_oauth_pure(
+                            synced.access_token,
+                            synced.refresh_token,
+                        )
+                        updated = replace(
+                            synced,
+                            access_token=refreshed["access_token"],
+                            refresh_token=refreshed["refresh_token"],
+                            last_refresh=refreshed.get("last_refresh"),
+                            last_status=STATUS_OK,
+                            last_status_at=None,
+                            last_error_code=None,
+                        )
+                        self._replace_entry(synced, updated)
+                        self._persist()
+                        self._sync_device_code_entry_to_auth_store(updated)
+                        return updated
+                    except Exception as retry_exc:
+                        logger.debug("Codex retry refresh also failed: %s", retry_exc)
+                elif not self._entry_needs_refresh(synced):
+                    logger.debug("Codex CLI has valid token, using without refresh")
+                    self._sync_device_code_entry_to_auth_store(synced)
+                    return synced
            self._mark_exhausted(entry, None)
            return None

@@ -612,6 +714,10 @@ class CredentialPool:
        )
        self._replace_entry(entry, updated)
        self._persist()
+        # Sync refreshed tokens back to auth.json providers so that
+        # _seed_from_singletons() on the next load_pool() sees fresh state
+        # instead of re-seeding stale/consumed tokens.
+        self._sync_device_code_entry_to_auth_store(updated)
        return updated

    def _entry_needs_refresh(self, entry: PooledCredential) -> bool:
@@ -633,6 +739,17 @@ class CredentialPool:
            return False
        return False

+    def mark_used(self, entry_id: Optional[str] = None) -> None:
+        """Increment request_count for tracking. Used by least_used strategy."""
+        target_id = entry_id or self._current_id
+        if not target_id:
+            return
+        with self._lock:
+            for idx, entry in enumerate(self._entries):
+                if entry.id == target_id:
+                    self._entries[idx] = replace(entry, request_count=entry.request_count + 1)
+                    return
+
    def select(self) -> Optional[PooledCredential]:
        with self._lock:
            return self._select_unlocked()
@@ -794,6 +911,11 @@ class CredentialPool:
            else:
                self._active_leases[credential_id] = count - 1

+    def active_lease_count(self, credential_id: str) -> int:
+        """Return the number of active leases for a credential."""
+        with self._lock:
+            return self._active_leases.get(credential_id, 0)
+
    def try_refresh_current(self) -> Optional[PooledCredential]:
        with self._lock:
            return self._try_refresh_current_unlocked()
@@ -67,6 +67,26 @@ def _get_skin():
        return None


+def get_skin_faces(key: str, default: list) -> list:
+    """Get spinner face list from active skin, falling back to default."""
+    skin = _get_skin()
+    if skin:
+        faces = skin.get_spinner_list(key)
+        if faces:
+            return faces
+    return default
+
+
+def get_skin_verbs() -> list:
+    """Get thinking verbs from active skin."""
+    skin = _get_skin()
+    if skin:
+        verbs = skin.get_spinner_list("thinking_verbs")
+        if verbs:
+            return verbs
+    return KawaiiSpinner.THINKING_VERBS
+
+
 def get_skin_tool_prefix() -> str:
    """Get tool output prefix character from active skin."""
    skin = _get_skin()
@@ -703,6 +723,46 @@ class KawaiiSpinner:
        return False


+# =========================================================================
+# Kawaii face arrays (used by AIAgent._execute_tool_calls for spinner text)
+# =========================================================================
+
+KAWAII_SEARCH = [
+    "♪(´ε` )", "(｡◕‿◕｡)", "ヾ(＾∇＾)", "(◕ᴗ◕✿)", "( ˘▽˘)っ",
+    "٩(◕‿◕｡)۶", "(✿◠‿◠)", "♪～(´ε｀ )", "(ノ´ヮ`)ノ*:・゚✧", "＼(◎o◎)／",
+]
+KAWAII_READ = [
+    "φ(゜▽゜*)♪", "( ˘▽˘)っ", "(⌐■_■)", "٩(｡•́‿•̀｡)۶", "(◕‿◕✿)",
+    "ヾ(＠⌒ー⌒＠)ノ", "(✧ω✧)", "♪(๑ᴖ◡ᴖ๑)♪", "(≧◡≦)", "( ´ ▽ ` )ノ",
+]
+KAWAII_TERMINAL = [
+    "ヽ(>∀<☆)ノ", "(ノ°∀°)ノ", "٩(^ᴗ^)۶", "ヾ(⌐■_■)ノ♪", "(•̀ᴗ•́)و",
+    "┗(＾0＾)┓", "(｀・ω・´)", "＼(￣▽￣)／", "(ง •̀_•́)ง", "ヽ(´▽`)/",
+]
+KAWAII_BROWSER = [
+    "(ノ°∀°)ノ", "(☞゚ヮ゚)☞", "( ͡° ͜ʖ ͡°)", "┌( ಠ_ಠ)┘", "(⊙_⊙)？",
+    "ヾ(•ω•`)o", "(￣ω￣)", "( ˇωˇ )", "(ᵔᴥᵔ)", "＼(◎o◎)／",
+]
+KAWAII_CREATE = [
+    "✧*。٩(ˊᗜˋ*)و✧", "(ﾉ◕ヮ◕)ﾉ*:・ﾟ✧", "ヽ(>∀<☆)ノ", "٩(♡ε♡)۶", "(◕‿◕)♡",
+    "✿◕ ‿ ◕✿", "(*≧▽≦)", "ヾ(＾-＾)ノ", "(☆▽☆)", "°˖✧◝(⁰▿⁰)◜✧˖°",
+]
+KAWAII_SKILL = [
+    "ヾ(＠⌒ー⌒＠)ノ", "(๑˃ᴗ˂)ﻭ", "٩(◕‿◕｡)۶", "(✿╹◡╹)", "ヽ(・∀・)ノ",
+    "(ノ´ヮ`)ノ*:・ﾟ✧", "♪(๑ᴖ◡ᴖ๑)♪", "(◠‿◠)", "٩(ˊᗜˋ*)و", "(＾▽＾)",
+    "ヾ(＾∇＾)", "(★ω★)/", "٩(｡•́‿•̀｡)۶", "(◕ᴗ◕✿)", "＼(◎o◎)／",
+    "(✧ω✧)", "ヽ(>∀<☆)ノ", "( ˘▽˘)っ", "(≧◡≦) ♡", "ヾ(￣▽￣)",
+]
+KAWAII_THINK = [
+    "(っ°Д°;)っ", "(；′⌒`)", "(・_・ヾ", "( ´_ゝ`)", "(￣ヘ￣)",
+    "(。-`ω´-)", "( ˘︹˘ )", "(¬_¬)", "ヽ(ー_ー )ノ", "(；一_一)",
+]
+KAWAII_GENERIC = [
+    "♪(´ε` )", "(◕‿◕✿)", "ヾ(＾∇＾)", "٩(◕‿◕｡)۶", "(✿◠‿◠)",
+    "(ノ´ヮ`)ノ*:・ﾟ✧", "ヽ(>∀<☆)ノ", "(☆▽☆)", "( ˘▽˘)っ", "(≧◡≦)",
+]
+
+
 # =========================================================================
 # Cute tool message (completion line that replaces the spinner)
 # =========================================================================
@@ -910,6 +970,22 @@ _SKY_BLUE = "\033[38;5;117m"
 _ANSI_RESET = "\033[0m"


+def honcho_session_url(workspace: str, session_name: str) -> str:
+    """Build a Honcho app URL for a session."""
+    from urllib.parse import quote
+    return (
+        f"https://app.honcho.dev/explore"
+        f"?workspace={quote(workspace, safe='')}"
+        f"&view=sessions"
+        f"&session={quote(session_name, safe='')}"
+    )
+
+
+def _osc8_link(url: str, text: str) -> str:
+    """OSC 8 terminal hyperlink (clickable in iTerm2, Ghostty, WezTerm, etc.)."""
+    return f"\033]8;;{url}\033\\{text}\033]8;;\033\\"
+
+
 # =========================================================================
 # Context pressure display (CLI user-facing warnings)
 # =========================================================================
@@ -82,6 +82,16 @@ class ClassifiedError:
    def is_auth(self) -> bool:
        return self.reason in (FailoverReason.auth, FailoverReason.auth_permanent)

+    @property
+    def is_transient(self) -> bool:
+        """Error is expected to resolve on retry (with or without backoff)."""
+        return self.reason in (
+            FailoverReason.rate_limit,
+            FailoverReason.overloaded,
+            FailoverReason.server_error,
+            FailoverReason.timeout,
+            FailoverReason.unknown,
+        )


 # ── Provider-specific patterns ──────────────────────────────────────────
@@ -667,6 +677,27 @@ def _classify_by_message(
            should_compress=True,
        )

+    # Usage-limit patterns need the same disambiguation as 402: some providers
+    # surface "usage limit" errors without an HTTP status code.  A transient
+    # signal ("try again", "resets at", …) means it's a periodic quota, not
+    # billing exhaustion.
+    has_usage_limit = any(p in error_msg for p in _USAGE_LIMIT_PATTERNS)
+    if has_usage_limit:
+        has_transient_signal = any(p in error_msg for p in _USAGE_LIMIT_TRANSIENT_SIGNALS)
+        if has_transient_signal:
+            return result_fn(
+                FailoverReason.rate_limit,
+                retryable=True,
+                should_rotate_credential=True,
+                should_fallback=True,
+            )
+        return result_fn(
+            FailoverReason.billing,
+            retryable=False,
+            should_rotate_credential=True,
+            should_fallback=True,
+        )
+
    # Billing patterns
    if any(p in error_msg for p in _BILLING_PATTERNS):
        return result_fn(
@@ -39,6 +39,15 @@ def _has_known_pricing(model_name: str, provider: str = None, base_url: str = No
    return has_known_pricing(model_name, provider=provider, base_url=base_url)


+def _get_pricing(model_name: str) -> Dict[str, float]:
+    """Look up pricing for a model. Uses fuzzy matching on model name.
+
+    Returns _DEFAULT_PRICING (zero cost) for unknown/custom models —
+    we can't assume costs for self-hosted endpoints, local inference, etc.
+    """
+    return get_pricing(model_name)
+
+
 def _estimate_cost(
    session_or_model: Dict[str, Any] | str,
    input_tokens: int = 0,
@@ -134,6 +134,11 @@ class MemoryManager:
        """All registered providers in order."""
        return list(self._providers)

+    @property
+    def provider_names(self) -> List[str]:
+        """Names of all registered providers."""
+        return [p.name for p in self._providers]
+
    def get_provider(self, name: str) -> Optional[MemoryProvider]:
        """Get a provider by name, or None if not registered."""
        for p in self._providers:
@@ -135,6 +135,9 @@ class ProviderInfo:
    doc: str = ""                   # documentation URL
    model_count: int = 0

+    def has_api_url(self) -> bool:
+        return bool(self.api)
+

 # ---------------------------------------------------------------------------
 # Provider ID mapping: Hermes ↔ models.dev
@@ -631,6 +634,43 @@ def get_provider_info(provider_id: str) -> Optional[ProviderInfo]:
    return _parse_provider_info(mdev_id, raw)


+def list_all_providers() -> Dict[str, ProviderInfo]:
+    """Return all providers from models.dev as {provider_id: ProviderInfo}.
+
+    Returns the full catalog — 109+ providers.  For providers that have
+    a Hermes alias, both the models.dev ID and the Hermes ID are included.
+    """
+    data = fetch_models_dev()
+    result: Dict[str, ProviderInfo] = {}
+
+    for pid, pdata in data.items():
+        if isinstance(pdata, dict):
+            info = _parse_provider_info(pid, pdata)
+            result[pid] = info
+
+    return result
+
+
+def get_providers_for_env_var(env_var: str) -> List[str]:
+    """Reverse lookup: find all providers that use a given env var.
+
+    Useful for auto-detection: "user has ANTHROPIC_API_KEY set, which
+    providers does that enable?"
+
+    Returns list of models.dev provider IDs.
+    """
+    data = fetch_models_dev()
+    matches: List[str] = []
+
+    for pid, pdata in data.items():
+        if isinstance(pdata, dict):
+            env = pdata.get("env", [])
+            if isinstance(env, list) and env_var in env:
+                matches.append(pid)
+
+    return matches
+
+
 # ---------------------------------------------------------------------------
 # Model-level queries (rich ModelInfo)
 # ---------------------------------------------------------------------------
@@ -668,3 +708,74 @@ def get_model_info(
    return None


+def get_model_info_any_provider(model_id: str) -> Optional[ModelInfo]:
+    """Search all providers for a model by ID.
+
+    Useful when you have a full slug like "anthropic/claude-sonnet-4.6" or
+    a bare name and want to find it anywhere.  Checks Hermes-mapped providers
+    first, then falls back to all models.dev providers.
+    """
+    data = fetch_models_dev()
+
+    # Try Hermes-mapped providers first (more likely what the user wants)
+    for hermes_id, mdev_id in PROVIDER_TO_MODELS_DEV.items():
+        pdata = data.get(mdev_id)
+        if not isinstance(pdata, dict):
+            continue
+        models = pdata.get("models", {})
+        if not isinstance(models, dict):
+            continue
+
+        raw = models.get(model_id)
+        if isinstance(raw, dict):
+            return _parse_model_info(model_id, raw, mdev_id)
+
+        # Case-insensitive
+        model_lower = model_id.lower()
+        for mid, mdata in models.items():
+            if mid.lower() == model_lower and isinstance(mdata, dict):
+                return _parse_model_info(mid, mdata, mdev_id)
+
+    # Fall back to ALL providers
+    for pid, pdata in data.items():
+        if pid in _get_reverse_mapping():
+            continue  # already checked
+        if not isinstance(pdata, dict):
+            continue
+        models = pdata.get("models", {})
+        if not isinstance(models, dict):
+            continue
+
+        raw = models.get(model_id)
+        if isinstance(raw, dict):
+            return _parse_model_info(model_id, raw, pid)
+
+    return None
+
+
+def list_provider_model_infos(provider_id: str) -> List[ModelInfo]:
+    """Return all models for a provider as ModelInfo objects.
+
+    Filters out deprecated models by default.
+    """
+    mdev_id = PROVIDER_TO_MODELS_DEV.get(provider_id, provider_id)
+
+    data = fetch_models_dev()
+    pdata = data.get(mdev_id)
+    if not isinstance(pdata, dict):
+        return []
+
+    models = pdata.get("models", {})
+    if not isinstance(models, dict):
+        return []
+
+    result: List[ModelInfo] = []
+    for mid, mdata in models.items():
+        if not isinstance(mdata, dict):
+            continue
+        status = mdata.get("status", "")
+        if status == "deprecated":
+            continue
+        result.append(_parse_model_info(mid, mdata, mdev_id))
+
+    return result
@@ -491,6 +491,17 @@ def _parse_skill_file(skill_file: Path) -> tuple[bool, dict, str]:
        return True, {}, ""


+def _read_skill_conditions(skill_file: Path) -> dict:
+    """Extract conditional activation fields from SKILL.md frontmatter."""
+    try:
+        raw = skill_file.read_text(encoding="utf-8")[:2000]
+        frontmatter, _ = parse_frontmatter(raw)
+        return extract_skill_conditions(frontmatter)
+    except Exception as e:
+        logger.debug("Failed to read skill conditions from %s: %s", skill_file, e)
+        return {}
+
+
 def _skill_should_show(
    conditions: dict,
    available_tools: "set[str] | None",
@@ -595,6 +595,30 @@ def get_pricing(
    }


+def estimate_cost_usd(
+    model: str,
+    input_tokens: int,
+    output_tokens: int,
+    *,
+    provider: Optional[str] = None,
+    base_url: Optional[str] = None,
+    api_key: Optional[str] = None,
+) -> float:
+    """Backward-compatible helper for legacy callers.
+
+    This uses non-cached input/output only. New code should call
+    `estimate_usage_cost()` with canonical usage buckets.
+    """
+    result = estimate_usage_cost(
+        model,
+        CanonicalUsage(input_tokens=input_tokens, output_tokens=output_tokens),
+        provider=provider,
+        base_url=base_url,
+        api_key=api_key,
+    )
+    return float(result.amount_usd or _ZERO)
+
+
 def format_duration_compact(seconds: float) -> str:
    if seconds < 60:
        return f"{seconds:.0f}s"
@@ -120,6 +120,18 @@ def _parse_reasoning_config(effort: str) -> dict | None:
    return result


+def _parse_service_tier_config(raw: str) -> str | None:
+    """Parse a persisted service-tier preference into a Responses API value."""
+    value = str(raw or "").strip().lower()
+    if not value or value in {"normal", "default", "standard", "off", "none"}:
+        return None
+    if value in {"fast", "priority", "on"}:
+        return "priority"
+    logger.warning("Unknown service_tier '%s', ignoring", raw)
+    return None
+
+
+
 def _get_chrome_debug_candidates(system: str) -> list[str]:
    """Return likely browser executables for local CDP auto-launch."""
    candidates: list[str] = []
@@ -239,6 +251,7 @@ def load_cli_config() -> Dict[str, Any]:
            "system_prompt": "",
            "prefill_messages_file": "",
            "reasoning_effort": "",
+            "service_tier": "",
            "personalities": {
                "helpful": "You are a helpful, friendly AI assistant.",
                "concise": "You are a concise assistant. Keep responses brief and to the point.",
@@ -1008,7 +1021,7 @@ def _cprint(text: str):


 # ---------------------------------------------------------------------------
-# File-drop detection — extracted as a pure function for testability.
+# File-drop / local attachment detection — extracted as pure helpers for tests.
 # ---------------------------------------------------------------------------

 _IMAGE_EXTENSIONS = frozenset({
@@ -1017,12 +1030,103 @@ _IMAGE_EXTENSIONS = frozenset({
 })


-def _detect_file_drop(user_input: str) -> "dict | None":
-    """Detect if *user_input* is a dragged/pasted file path, not a slash command.
+from hermes_constants import is_termux as _is_termux_environment

-    When a user drags a file into the terminal, macOS pastes the absolute path
-    (e.g. ``/Users/roland/Desktop/file.png``) which starts with ``/`` and would
-    otherwise be mistaken for a slash command.
+
+def _termux_example_image_path(filename: str = "cat.png") -> str:
+    """Return a realistic example media path for the current Termux setup."""
+    candidates = [
+        os.path.expanduser("~/storage/shared"),
+        "/sdcard",
+        "/storage/emulated/0",
+        "/storage/self/primary",
+    ]
+    for root in candidates:
+        if os.path.isdir(root):
+            return os.path.join(root, "Pictures", filename)
+    return os.path.join("~/storage/shared", "Pictures", filename)
+
+
+def _split_path_input(raw: str) -> tuple[str, str]:
+    """Split a leading file path token from trailing free-form text.
+
+    Supports quoted paths and backslash-escaped spaces so callers can accept
+    inputs like:
+      /tmp/pic.png describe this
+      ~/storage/shared/My\ Photos/cat.png what is this?
+      "/storage/emulated/0/DCIM/Camera/cat 1.png" summarize
+    """
+    raw = str(raw or "").strip()
+    if not raw:
+        return "", ""
+
+    if raw[0] in {'"', "'"}:
+        quote = raw[0]
+        pos = 1
+        while pos < len(raw):
+            ch = raw[pos]
+            if ch == '\\' and pos + 1 < len(raw):
+                pos += 2
+                continue
+            if ch == quote:
+                token = raw[1:pos]
+                remainder = raw[pos + 1 :].strip()
+                return token, remainder
+            pos += 1
+        return raw[1:], ""
+
+    pos = 0
+    while pos < len(raw):
+        ch = raw[pos]
+        if ch == '\\' and pos + 1 < len(raw) and raw[pos + 1] == ' ':
+            pos += 2
+        elif ch == ' ':
+            break
+        else:
+            pos += 1
+
+    token = raw[:pos].replace('\\ ', ' ')
+    remainder = raw[pos:].strip()
+    return token, remainder
+
+
+def _resolve_attachment_path(raw_path: str) -> Path | None:
+    """Resolve a user-supplied local attachment path.
+
+    Accepts quoted or unquoted paths, expands ``~`` and env vars, and resolves
+    relative paths from ``TERMINAL_CWD`` when set (matching terminal tool cwd).
+    Returns ``None`` when the path does not resolve to an existing file.
+    """
+    token = str(raw_path or "").strip()
+    if not token:
+        return None
+
+    if (token.startswith('"') and token.endswith('"')) or (token.startswith("'") and token.endswith("'")):
+        token = token[1:-1].strip()
+    if not token:
+        return None
+
+    expanded = os.path.expandvars(os.path.expanduser(token))
+    path = Path(expanded)
+    if not path.is_absolute():
+        base_dir = Path(os.getenv("TERMINAL_CWD", os.getcwd()))
+        path = base_dir / path
+
+    try:
+        resolved = path.resolve()
+    except Exception:
+        resolved = path
+
+    if not resolved.exists() or not resolved.is_file():
+        return None
+    return resolved
+
+
+def _detect_file_drop(user_input: str) -> "dict | None":
+    """Detect if *user_input* starts with a real local file path.
+
+    This catches dragged/pasted paths before they are mistaken for slash
+    commands, and also supports Termux-friendly paths like ``~/storage/...``.

    Returns a dict on match::

@@ -1034,29 +1138,31 @@ def _detect_file_drop(user_input: str) -> "dict | None":

    Returns ``None`` when the input is not a real file path.
    """
-    if not isinstance(user_input, str) or not user_input.startswith("/"):
+    if not isinstance(user_input, str):
        return None

-    # Walk the string absorbing backslash-escaped spaces ("\ ").
-    raw = user_input
-    pos = 0
-    while pos < len(raw):
-        ch = raw[pos]
-        if ch == '\\' and pos + 1 < len(raw) and raw[pos + 1] == ' ':
-            pos += 2  # skip escaped space
-        elif ch == ' ':
-            break
-        else:
-            pos += 1
-
-    first_token_raw = raw[:pos]
-    first_token = first_token_raw.replace('\\ ', ' ')
-    drop_path = Path(first_token)
-
-    if not drop_path.exists() or not drop_path.is_file():
+    stripped = user_input.strip()
+    if not stripped:
+        return None
+
+    starts_like_path = (
+        stripped.startswith("/")
+        or stripped.startswith("~")
+        or stripped.startswith("./")
+        or stripped.startswith("../")
+        or stripped.startswith('"/')
+        or stripped.startswith('"~')
+        or stripped.startswith("'/")
+        or stripped.startswith("'~")
+    )
+    if not starts_like_path:
+        return None
+
+    first_token, remainder = _split_path_input(stripped)
+    drop_path = _resolve_attachment_path(first_token)
+    if drop_path is None:
        return None

-    remainder = raw[pos:].strip()
    return {
        "path": drop_path,
        "is_image": drop_path.suffix.lower() in _IMAGE_EXTENSIONS,
@@ -1064,6 +1170,69 @@ def _detect_file_drop(user_input: str) -> "dict | None":
    }


+def _format_image_attachment_badges(attached_images: list[Path], image_counter: int, width: int | None = None) -> str:
+    """Format the attached-image badge row for the interactive CLI.
+
+    Narrow terminals such as Termux should get a compact summary that fits on a
+    single row, while wider terminals can show the classic per-image badges.
+    """
+    if not attached_images:
+        return ""
+
+    width = width or shutil.get_terminal_size((80, 24)).columns
+
+    def _trunc(name: str, limit: int) -> str:
+        return name if len(name) <= limit else name[: max(1, limit - 3)] + "..."
+
+    if width < 52:
+        if len(attached_images) == 1:
+            return f"[📎 {_trunc(attached_images[0].name, 20)}]"
+        return f"[📎 {len(attached_images)} images attached]"
+
+    if width < 80:
+        if len(attached_images) == 1:
+            return f"[📎 {_trunc(attached_images[0].name, 32)}]"
+        first = _trunc(attached_images[0].name, 20)
+        extra = len(attached_images) - 1
+        return f"[📎 {first}] [+{extra}]"
+
+    base = image_counter - len(attached_images) + 1
+    return " ".join(
+        f"[📎 Image #{base + i}]"
+        for i in range(len(attached_images))
+    )
+
+
+def _collect_query_images(query: str | None, image_arg: str | None = None) -> tuple[str, list[Path]]:
+    """Collect local image attachments for single-query CLI flows."""
+    message = query or ""
+    images: list[Path] = []
+
+    if isinstance(message, str):
+        dropped = _detect_file_drop(message)
+        if dropped and dropped.get("is_image"):
+            images.append(dropped["path"])
+            message = dropped["remainder"] or f"[User attached image: {dropped['path'].name}]"
+
+    if image_arg:
+        explicit_path = _resolve_attachment_path(image_arg)
+        if explicit_path is None:
+            raise ValueError(f"Image file not found: {image_arg}")
+        if explicit_path.suffix.lower() not in _IMAGE_EXTENSIONS:
+            raise ValueError(f"Not a supported image file: {explicit_path}")
+        images.append(explicit_path)
+
+    deduped: list[Path] = []
+    seen: set[str] = set()
+    for img in images:
+        key = str(img)
+        if key in seen:
+            continue
+        seen.add(key)
+        deduped.append(img)
+    return message, deduped
+
+
 class ChatConsole:
    """Rich Console adapter for prompt_toolkit's patch_stdout context.

@@ -1118,6 +1287,14 @@ HERMES_CADUCEUS = """[#CD7F32]⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣀⡀⠀⣀⣀
 [#B8860B]⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠳⠈⣡⠞⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀[/]
 [#B8860B]⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀[/]"""

+# Compact banner for smaller terminals (fallback)
+# Note: built dynamically by _build_compact_banner() to fit terminal width
+COMPACT_BANNER = """
+[bold #FFD700]╔══════════════════════════════════════════════════════════════╗[/]
+[bold #FFD700]║[/]  [#FFBF00]⚕ NOUS HERMES[/] [dim #B8860B]- AI Agent Framework[/]              [bold #FFD700]║[/]
+[bold #FFD700]║[/]  [#CD7F32]Messenger of the Digital Gods[/]    [dim #B8860B]Nous Research[/]   [bold #FFD700]║[/]
+[bold #FFD700]╚══════════════════════════════════════════════════════════════╝[/]
+"""


 def _build_compact_banner() -> str:
@@ -1363,6 +1540,7 @@ class HermesCLI:
        self._stream_buf = ""        # Partial line buffer for line-buffered rendering
        self._stream_started = False  # True once first delta arrives
        self._stream_box_opened = False  # True once the response box header is printed
+        self._reasoning_stream_started = False  # True once live reasoning starts streaming
        self._reasoning_preview_buf = ""  # Coalesce tiny reasoning chunks for [thinking] output
        self._pending_edit_snapshots = {}
        
@@ -1420,6 +1598,8 @@ class HermesCLI:
            self.api_key = api_key or os.getenv("OPENROUTER_API_KEY") or os.getenv("OPENAI_API_KEY")
        else:
            self.api_key = api_key or os.getenv("OPENAI_API_KEY") or os.getenv("OPENROUTER_API_KEY")
+        self._nous_key_expires_at: Optional[str] = None
+        self._nous_key_source: Optional[str] = None
        # Max turns priority: CLI arg > config file > env var > default
        if max_turns is not None:  # CLI arg was explicitly set
            self.max_turns = max_turns
@@ -1467,6 +1647,9 @@ class HermesCLI:
        self.reasoning_config = _parse_reasoning_config(
            CLI_CONFIG["agent"].get("reasoning_effort", "")
        )
+        self.service_tier = _parse_service_tier_config(
+            CLI_CONFIG["agent"].get("service_tier", "")
+        )
        
        # OpenRouter provider routing preferences
        pr = CLI_CONFIG.get("provider_routing", {}) or {}
@@ -1690,15 +1873,70 @@ class HermesCLI:
            width += ch_width
        return "".join(out).rstrip() + ellipsis

+    @staticmethod
+    def _get_tui_terminal_width(default: tuple[int, int] = (80, 24)) -> int:
+        """Return the live prompt_toolkit width, falling back to ``shutil``.
+
+        The TUI layout can be narrower than ``shutil.get_terminal_size()`` reports,
+        especially on Termux/mobile shells, so prefer prompt_toolkit's width whenever
+        an app is active.
+        """
+        try:
+            from prompt_toolkit.application import get_app
+            return get_app().output.get_size().columns
+        except Exception:
+            return shutil.get_terminal_size(default).columns
+
+    def _use_minimal_tui_chrome(self, width: Optional[int] = None) -> bool:
+        """Hide low-value chrome on narrow/mobile terminals to preserve rows."""
+        if width is None:
+            width = self._get_tui_terminal_width()
+        return width < 64
+
+    def _tui_input_rule_height(self, position: str, width: Optional[int] = None) -> int:
+        """Return the visible height for the top/bottom input separator rules."""
+        if position not in {"top", "bottom"}:
+            raise ValueError(f"Unknown input rule position: {position}")
+        if position == "top":
+            return 1
+        return 0 if self._use_minimal_tui_chrome(width=width) else 1
+
+    def _agent_spacer_height(self, width: Optional[int] = None) -> int:
+        """Return the spacer height shown above the status bar while the agent runs."""
+        if not getattr(self, "_agent_running", False):
+            return 0
+        return 0 if self._use_minimal_tui_chrome(width=width) else 1
+
+    def _spinner_widget_height(self, width: Optional[int] = None) -> int:
+        """Return the visible height for the spinner/status text line above the status bar."""
+        if not getattr(self, "_spinner_text", ""):
+            return 0
+        return 0 if self._use_minimal_tui_chrome(width=width) else 1
+
+    def _get_voice_status_fragments(self, width: Optional[int] = None):
+        """Return the voice status bar fragments for the interactive TUI."""
+        width = width or self._get_tui_terminal_width()
+        compact = self._use_minimal_tui_chrome(width=width)
+        if self._voice_recording:
+            if compact:
+                return [("class:voice-status-recording", " ● REC ")]
+            return [("class:voice-status-recording", " ● REC  Ctrl+B to stop ")]
+        if self._voice_processing:
+            if compact:
+                return [("class:voice-status", " ◉ STT ")]
+            return [("class:voice-status", " ◉ Transcribing... ")]
+        if compact:
+            return [("class:voice-status", " 🎤 Ctrl+B ")]
+        tts = " | TTS on" if self._voice_tts else ""
+        cont = " | Continuous" if self._voice_continuous else ""
+        return [("class:voice-status", f" 🎤 Voice mode{tts}{cont}  —  Ctrl+B to record ")]
+
    def _build_status_bar_text(self, width: Optional[int] = None) -> str:
+        """Return a compact one-line session status string for the TUI footer."""
        try:
            snapshot = self._get_status_bar_snapshot()
            if width is None:
-                try:
-                    from prompt_toolkit.application import get_app
-                    width = get_app().output.get_size().columns
-                except Exception:
-                    width = shutil.get_terminal_size((80, 24)).columns
+                width = self._get_tui_terminal_width()
            percent = snapshot["context_percent"]
            percent_label = f"{percent}%" if percent is not None else "--"
            duration_label = snapshot["duration"]
@@ -1734,11 +1972,7 @@ class HermesCLI:
            # values (especially on SSH) that differ from what prompt_toolkit
            # actually renders, causing the fragments to overflow to a second
            # line and produce duplicated status bar rows over long sessions.
-            try:
-                from prompt_toolkit.application import get_app
-                width = get_app().output.get_size().columns
-            except Exception:
-                width = shutil.get_terminal_size((80, 24)).columns
+            width = self._get_tui_terminal_width()
            duration_label = snapshot["duration"]

            if width < 52:
@@ -1995,6 +2229,7 @@ class HermesCLI:
        """
        if not text:
            return
+        self._reasoning_stream_started = True
        self._reasoning_shown_this_turn = True
        if getattr(self, "_stream_box_opened", False):
            return
@@ -2073,17 +2308,59 @@ class HermesCLI:
        # Append to a pre-filter buffer first
        self._stream_prefilt = getattr(self, "_stream_prefilt", "") + text

-        # Check if we're entering a reasoning block
+        # Check if we're entering a reasoning block.
+        # Only match tags that appear at a "block boundary": start of the
+        # stream, after a newline (with optional whitespace), or when nothing
+        # but whitespace has been emitted on the current line.
+        # This prevents false positives when models *mention* tags in prose
+        # like "(/think not producing <think> tags)".
+        #
+        # _stream_last_was_newline tracks whether the last character emitted
+        # (or the start of the stream) is a line boundary.  It's True at
+        # stream start and set True whenever emitted text ends with '\n'.
+        if not hasattr(self, "_stream_last_was_newline"):
+            self._stream_last_was_newline = True  # start of stream = boundary
+
        if not getattr(self, "_in_reasoning_block", False):
            for tag in _OPEN_TAGS:
-                idx = self._stream_prefilt.find(tag)
-                if idx != -1:
-                    # Emit everything before the tag
-                    before = self._stream_prefilt[:idx]
-                    if before:
-                        self._emit_stream_text(before)
-                    self._in_reasoning_block = True
-                    self._stream_prefilt = self._stream_prefilt[idx + len(tag):]
+                search_start = 0
+                while True:
+                    idx = self._stream_prefilt.find(tag, search_start)
+                    if idx == -1:
+                        break
+                    # Check if this is a block boundary position
+                    preceding = self._stream_prefilt[:idx]
+                    if idx == 0:
+                        # At buffer start — only a boundary if we're at
+                        # a line start (stream start or last emit ended
+                        # with newline)
+                        is_block_boundary = getattr(self, "_stream_last_was_newline", True)
+                    else:
+                        # Find last newline in the buffer before the tag
+                        last_nl = preceding.rfind("\n")
+                        if last_nl == -1:
+                            # No newline in buffer — boundary only if
+                            # last emit was a newline AND only whitespace
+                            # has accumulated before the tag
+                            is_block_boundary = (
+                                getattr(self, "_stream_last_was_newline", True)
+                                and preceding.strip() == ""
+                            )
+                        else:
+                            # Text between last newline and tag must be
+                            # whitespace-only
+                            is_block_boundary = preceding[last_nl + 1:].strip() == ""
+                    if is_block_boundary:
+                        # Emit everything before the tag
+                        if preceding:
+                            self._emit_stream_text(preceding)
+                            self._stream_last_was_newline = preceding.endswith("\n")
+                        self._in_reasoning_block = True
+                        self._stream_prefilt = self._stream_prefilt[idx + len(tag):]
+                        break
+                    # Not a block boundary — keep searching after this occurrence
+                    search_start = idx + 1
+                if getattr(self, "_in_reasoning_block", False):
                    break

            # Could also be a partial open tag at the end — hold it back
@@ -2097,6 +2374,7 @@ class HermesCLI:
                            break
                if safe:
                    self._emit_stream_text(safe)
+                    self._stream_last_was_newline = safe.endswith("\n")
                    self._stream_prefilt = self._stream_prefilt[len(safe):]
                return

@@ -2186,6 +2464,14 @@ class HermesCLI:

    def _flush_stream(self) -> None:
        """Emit any remaining partial line from the stream buffer and close the box."""
+        # If we're still inside a "reasoning block" at end-of-stream, it was
+        # a false positive — the model mentioned a tag like <think> in prose
+        # but never closed it.  Recover the buffered content as regular text.
+        if getattr(self, "_in_reasoning_block", False) and getattr(self, "_stream_prefilt", ""):
+            self._in_reasoning_block = False
+            self._emit_stream_text(self._stream_prefilt)
+            self._stream_prefilt = ""
+
        # Close reasoning box if still open (in case no content tokens arrived)
        self._close_reasoning_box()

@@ -2204,9 +2490,11 @@ class HermesCLI:
        self._stream_buf = ""
        self._stream_started = False
        self._stream_box_opened = False
+        self._reasoning_stream_started = False
        self._stream_text_ansi = ""
        self._stream_prefilt = ""
        self._in_reasoning_block = False
+        self._stream_last_was_newline = True
        self._reasoning_box_opened = False
        self._reasoning_buf = ""
        self._reasoning_preview_buf = ""
@@ -2336,8 +2624,9 @@ class HermesCLI:
    def _resolve_turn_agent_config(self, user_message: str) -> dict:
        """Resolve model/runtime overrides for a single user turn."""
        from agent.smart_model_routing import resolve_turn_route
+        from hermes_cli.models import resolve_fast_mode_overrides

-        return resolve_turn_route(
+        route = resolve_turn_route(
            user_message,
            self._smart_model_routing,
            {
@@ -2352,7 +2641,19 @@ class HermesCLI:
            },
        )

-    def _init_agent(self, *, model_override: str = None, runtime_override: dict = None, route_label: str = None) -> bool:
+        service_tier = getattr(self, "service_tier", None)
+        if not service_tier:
+            route["request_overrides"] = None
+            return route
+
+        try:
+            overrides = resolve_fast_mode_overrides(route.get("model"))
+        except Exception:
+            overrides = None
+        route["request_overrides"] = overrides
+        return route
+
+    def _init_agent(self, *, model_override: str = None, runtime_override: dict = None, route_label: str = None, request_overrides: dict | None = None) -> bool:
        """
        Initialize the agent on first use.
        When resuming a session, restores conversation history from SQLite.
@@ -2439,6 +2740,8 @@ class HermesCLI:
                ephemeral_system_prompt=self.system_prompt if self.system_prompt else None,
                prefill_messages=self.prefill_messages or None,
                reasoning_config=self.reasoning_config,
+                service_tier=self.service_tier,
+                request_overrides=request_overrides,
                providers_allowed=self._providers_only,
                providers_ignored=self._providers_ignore,
                providers_order=self._providers_order,
@@ -2933,6 +3236,14 @@ class HermesCLI:
        doesn't fire for image-only clipboard content (e.g., VSCode terminal,
        Windows Terminal with WSL2).
        """
+        if _is_termux_environment():
+            _cprint(
+                f"  {_DIM}Clipboard image paste is not available on Termux — "
+                f"use /image <path> or paste a local image path like "
+                f"{_termux_example_image_path()}{_RST}"
+            )
+            return
+
        from hermes_cli.clipboard import has_clipboard_image
        if has_clipboard_image():
            if self._try_attach_clipboard_image():
@@ -2943,7 +3254,31 @@ class HermesCLI:
        else:
            _cprint(f"  {_DIM}(._.) No image found in clipboard{_RST}")

-    def _preprocess_images_with_vision(self, text: str, images: list) -> str:
+    def _handle_image_command(self, cmd_original: str):
+        """Handle /image <path> — attach a local image file for the next prompt."""
+        raw_args = (cmd_original.split(None, 1)[1].strip() if " " in cmd_original else "")
+        if not raw_args:
+            hint = _termux_example_image_path() if _is_termux_environment() else "/path/to/image.png"
+            _cprint(f"  {_DIM}Usage: /image <path>  e.g. /image {hint}{_RST}")
+            return
+
+        path_token, _remainder = _split_path_input(raw_args)
+        image_path = _resolve_attachment_path(path_token)
+        if image_path is None:
+            _cprint(f"  {_DIM}(>_<) File not found: {path_token}{_RST}")
+            return
+        if image_path.suffix.lower() not in _IMAGE_EXTENSIONS:
+            _cprint(f"  {_DIM}(._.) Not a supported image file: {image_path.name}{_RST}")
+            return
+
+        self._attached_images.append(image_path)
+        _cprint(f"  📎 Attached image: {image_path.name}")
+        if _remainder:
+            _cprint(f"  {_DIM}Now type your prompt (or use --image in single-query mode): {_remainder}{_RST}")
+        elif _is_termux_environment():
+            _cprint(f"  {_DIM}Tip: type your next message, or run hermes chat -q --image {_termux_example_image_path(image_path.name)} \"What do you see?\"{_RST}")
+
+    def _preprocess_images_with_vision(self, text: str, images: list, *, announce: bool = True) -> str:
        """Analyze attached images via the vision tool and return enriched text.

        Instead of embedding raw base64 ``image_url`` content parts in the
@@ -2970,7 +3305,8 @@ class HermesCLI:
            if not img_path.exists():
                continue
            size_kb = img_path.stat().st_size // 1024
-            _cprint(f"  {_DIM}👁️  analyzing {img_path.name} ({size_kb}KB)...{_RST}")
+            if announce:
+                _cprint(f"  {_DIM}👁️  analyzing {img_path.name} ({size_kb}KB)...{_RST}")
            try:
                result_json = _asyncio.run(
                    vision_analyze_tool(image_url=str(img_path), user_prompt=analysis_prompt)
@@ -2983,21 +3319,24 @@ class HermesCLI:
                        f"[If you need a closer look, use vision_analyze with "
                        f"image_url: {img_path}]"
                    )
-                    _cprint(f"  {_DIM}✓ image analyzed{_RST}")
+                    if announce:
+                        _cprint(f"  {_DIM}✓ image analyzed{_RST}")
                else:
                    enriched_parts.append(
                        f"[The user attached an image but it couldn't be analyzed. "
                        f"You can try examining it with vision_analyze using "
                        f"image_url: {img_path}]"
                    )
-                    _cprint(f"  {_DIM}⚠ vision analysis failed — path included for retry{_RST}")
+                    if announce:
+                        _cprint(f"  {_DIM}⚠ vision analysis failed — path included for retry{_RST}")
            except Exception as e:
                enriched_parts.append(
                    f"[The user attached an image but analysis failed ({e}). "
                    f"You can try examining it with vision_analyze using "
                    f"image_url: {img_path}]"
                )
-                _cprint(f"  {_DIM}⚠ vision analysis error — path included for retry{_RST}")
+                if announce:
+                    _cprint(f"  {_DIM}⚠ vision analysis error — path included for retry{_RST}")

        # Combine: vision descriptions first, then the user's original text
        user_text = text if isinstance(text, str) and text else ""
@@ -3060,6 +3399,20 @@ class HermesCLI:
            f"{toolsets_info}{provider_info}"
        )
    
+    def _fast_command_available(self) -> bool:
+        try:
+            from hermes_cli.models import model_supports_fast_mode
+        except Exception:
+            return False
+        agent = getattr(self, "agent", None)
+        model = getattr(agent, "model", None) or getattr(self, "model", None)
+        return model_supports_fast_mode(model)
+
+    def _command_available(self, slash_command: str) -> bool:
+        if slash_command == "/fast":
+            return self._fast_command_available()
+        return True
+
    def show_help(self):
        """Display help information with categorized commands."""
        from hermes_cli.commands import COMMANDS_BY_CATEGORY
@@ -3080,6 +3433,8 @@ class HermesCLI:
        for category, commands in COMMANDS_BY_CATEGORY.items():
            _cprint(f"\n  {_BOLD}── {category} ──{_RST}")
            for cmd, desc in commands.items():
+                if not self._command_available(cmd):
+                    continue
                ChatConsole().print(f"    [bold {_accent_hex()}]{cmd:<15}[/] [dim]-[/] {_escape(desc)}")

        if _skill_commands:
@@ -3091,7 +3446,10 @@ class HermesCLI:

        _cprint(f"\n  {_DIM}Tip: Just type your message to chat with Hermes!{_RST}")
        _cprint(f"  {_DIM}Multi-line: Alt+Enter for a new line{_RST}")
-        _cprint(f"  {_DIM}Paste image: Alt+V (or /paste){_RST}\n")
+        if _is_termux_environment():
+            _cprint(f"  {_DIM}Attach image: /image {_termux_example_image_path()} or start your prompt with a local image path{_RST}\n")
+        else:
+            _cprint(f"  {_DIM}Paste image: Alt+V (or /paste){_RST}\n")
    
    def show_tools(self):
        """Display available tools with kawaii ASCII art."""
@@ -4529,6 +4887,8 @@ class HermesCLI:
            self._toggle_yolo()
        elif canonical == "reasoning":
            self._handle_reasoning_command(cmd_original)
+        elif canonical == "fast":
+            self._handle_fast_command(cmd_original)
        elif canonical == "compress":
            self._manual_compress()
        elif canonical == "usage":
@@ -4537,6 +4897,8 @@ class HermesCLI:
            self._show_insights(cmd_original)
        elif canonical == "paste":
            self._handle_paste_command()
+        elif canonical == "image":
+            self._handle_image_command(cmd_original)
        elif canonical == "reload-mcp":
            with self._busy_command(self._slow_command_status(cmd_original)):
                self._reload_mcp()
@@ -4766,6 +5128,8 @@ class HermesCLI:
                    platform="cli",
                    session_db=self._session_db,
                    reasoning_config=self.reasoning_config,
+                    service_tier=self.service_tier,
+                    request_overrides=turn_route.get("request_overrides"),
                    providers_allowed=self._providers_only,
                    providers_ignored=self._providers_ignore,
                    providers_order=self._providers_order,
@@ -4901,6 +5265,8 @@ class HermesCLI:
                    session_id=task_id,
                    platform="cli",
                    reasoning_config=self.reasoning_config,
+                    service_tier=self.service_tier,
+                    request_overrides=turn_route.get("request_overrides"),
                    providers_allowed=self._providers_only,
                    providers_ignored=self._providers_ignore,
                    providers_order=self._providers_order,
@@ -5330,6 +5696,40 @@ class HermesCLI:
        else:
            _cprint(f"  {_GOLD}✓ Reasoning effort set to '{arg}' (session only){_RST}")

+    def _handle_fast_command(self, cmd: str):
+        """Handle /fast — toggle OpenAI Priority Processing (service_tier)."""
+        if not self._fast_command_available():
+            _cprint("  (._.) /fast is only available for OpenAI models that support Priority Processing.")
+            return
+
+        parts = cmd.strip().split(maxsplit=1)
+        if len(parts) < 2 or parts[1].strip().lower() == "status":
+            status = "fast" if self.service_tier == "priority" else "normal"
+            _cprint(f"  {_GOLD}Priority Processing: {status}{_RST}")
+            _cprint(f"  {_DIM}Usage: /fast [normal|fast|status]{_RST}")
+            return
+
+        arg = parts[1].strip().lower()
+
+        if arg in {"fast", "on"}:
+            self.service_tier = "priority"
+            saved_value = "fast"
+            label = "FAST"
+        elif arg in {"normal", "off"}:
+            self.service_tier = None
+            saved_value = "normal"
+            label = "NORMAL"
+        else:
+            _cprint(f"  {_DIM}(._.) Unknown argument: {arg}{_RST}")
+            _cprint(f"  {_DIM}Usage: /fast [normal|fast|status]{_RST}")
+            return
+
+        self.agent = None  # Force agent re-init with new service-tier config
+        if save_config_value("agent.service_tier", saved_value):
+            _cprint(f"  {_GOLD}✓ Priority Processing set to {label} (saved to config){_RST}")
+        else:
+            _cprint(f"  {_GOLD}✓ Priority Processing set to {label} (session only){_RST}")
+
    def _on_reasoning(self, reasoning_text: str):
        """Callback for intermediate reasoning display during tool-call loops."""
        if not reasoning_text:
@@ -5357,7 +5757,7 @@ class HermesCLI:
            approx_tokens = estimate_messages_tokens_rough(self.conversation_history)
            print(f"🗜️  Compressing {original_count} messages (~{approx_tokens:,} tokens)...")

-            compressed, _new_system = self.agent._compress_context(
+            compressed, new_system = self.agent._compress_context(
                self.conversation_history,
                self.agent._cached_system_prompt or "",
                approx_tokens=approx_tokens,
@@ -5730,10 +6130,23 @@ class HermesCLI:
        """Start capturing audio from the microphone."""
        if getattr(self, '_should_exit', False):
            return
-        from tools.voice_mode import AudioRecorder, check_voice_requirements
+        from tools.voice_mode import create_audio_recorder, check_voice_requirements

        reqs = check_voice_requirements()
        if not reqs["audio_available"]:
+            if _is_termux_environment():
+                details = reqs.get("details", "")
+                if "Termux:API Android app is not installed" in details:
+                    raise RuntimeError(
+                        "Termux:API command package detected, but the Android app is missing.\n"
+                        "Install/update the Termux:API Android app, then retry /voice on.\n"
+                        "Fallback: pkg install python-numpy portaudio && python -m pip install sounddevice"
+                    )
+                raise RuntimeError(
+                    "Voice mode requires either Termux:API microphone access or Python audio libraries.\n"
+                    "Option 1: pkg install termux-api and install the Termux:API Android app\n"
+                    "Option 2: pkg install python-numpy portaudio && python -m pip install sounddevice"
+                )
            raise RuntimeError(
                "Voice mode requires sounddevice and numpy.\n"
                "Install with: pip install sounddevice numpy\n"
@@ -5762,7 +6175,7 @@ class HermesCLI:
            pass

        if self._voice_recorder is None:
-            self._voice_recorder = AudioRecorder()
+            self._voice_recorder = create_audio_recorder()

        # Apply config-driven silence params
        self._voice_recorder._silence_threshold = voice_cfg.get("silence_threshold", 200)
@@ -5791,7 +6204,13 @@ class HermesCLI:
            with self._voice_lock:
                self._voice_recording = False
            raise
-        _cprint(f"\n{_GOLD}● Recording...{_RST} {_DIM}(auto-stops on silence | Ctrl+B to stop & exit continuous){_RST}")
+        if getattr(self._voice_recorder, "supports_silence_autostop", True):
+            _recording_hint = "auto-stops on silence | Ctrl+B to stop & exit continuous"
+        elif _is_termux_environment():
+            _recording_hint = "Termux:API capture | Ctrl+B to stop"
+        else:
+            _recording_hint = "Ctrl+B to stop"
+        _cprint(f"\n{_GOLD}● Recording...{_RST} {_DIM}({_recording_hint}){_RST}")

        # Periodically refresh prompt to update audio level indicator
        def _refresh_level():
@@ -5999,8 +6418,13 @@ class HermesCLI:
            for line in reqs["details"].split("\n"):
                _cprint(f"  {_DIM}{line}{_RST}")
            if reqs["missing_packages"]:
-                _cprint(f"\n  {_BOLD}Install: pip install {' '.join(reqs['missing_packages'])}{_RST}")
-                _cprint(f"  {_DIM}Or: pip install hermes-agent[voice]{_RST}")
+                if _is_termux_environment():
+                    _cprint(f"\n  {_BOLD}Option 1: pkg install termux-api{_RST}")
+                    _cprint(f"  {_DIM}Then install/update the Termux:API Android app for microphone capture{_RST}")
+                    _cprint(f"  {_BOLD}Option 2: pkg install python-numpy portaudio && python -m pip install sounddevice{_RST}")
+                else:
+                    _cprint(f"\n  {_BOLD}Install: pip install {' '.join(reqs['missing_packages'])}{_RST}")
+                    _cprint(f"  {_DIM}Or: pip install hermes-agent[voice]{_RST}")
            return

        with self._voice_lock:
@@ -6464,6 +6888,7 @@ class HermesCLI:
            model_override=turn_route["model"],
            runtime_override=turn_route["runtime"],
            route_label=turn_route["label"],
+            request_overrides=turn_route.get("request_overrides"),
        ):
            return None
        
@@ -6954,27 +7379,39 @@ class HermesCLI:
    def _get_tui_prompt_fragments(self):
        """Return the prompt_toolkit fragments for the current interactive state."""
        symbol, state_suffix = self._get_tui_prompt_symbols()
+        compact = self._use_minimal_tui_chrome(width=self._get_tui_terminal_width())
+
+        def _state_fragment(style: str, icon: str, extra: str = ""):
+            if compact:
+                text = icon
+                if extra:
+                    text = f"{text} {extra.strip()}".rstrip()
+                return [(style, text + " ")]
+            if extra:
+                return [(style, f"{icon} {extra} {state_suffix}")]
+            return [(style, f"{icon} {state_suffix}")]
+
        if self._voice_recording:
            bar = self._audio_level_bar()
-            return [("class:voice-recording", f"● {bar} {state_suffix}")]
+            return _state_fragment("class:voice-recording", "●", bar)
        if self._voice_processing:
-            return [("class:voice-processing", f"◉ {state_suffix}")]
+            return _state_fragment("class:voice-processing", "◉")
        if self._sudo_state:
-            return [("class:sudo-prompt", f"🔐 {state_suffix}")]
+            return _state_fragment("class:sudo-prompt", "🔐")
        if self._secret_state:
-            return [("class:sudo-prompt", f"🔑 {state_suffix}")]
+            return _state_fragment("class:sudo-prompt", "🔑")
        if self._approval_state:
-            return [("class:prompt-working", f"⚠ {state_suffix}")]
+            return _state_fragment("class:prompt-working", "⚠")
        if self._clarify_freetext:
-            return [("class:clarify-selected", f"✎ {state_suffix}")]
+            return _state_fragment("class:clarify-selected", "✎")
        if self._clarify_state:
-            return [("class:prompt-working", f"? {state_suffix}")]
+            return _state_fragment("class:prompt-working", "?")
        if self._command_running:
-            return [("class:prompt-working", f"{self._command_spinner_frame()} {state_suffix}")]
+            return _state_fragment("class:prompt-working", self._command_spinner_frame())
        if self._agent_running:
-            return [("class:prompt-working", f"⚕ {state_suffix}")]
+            return _state_fragment("class:prompt-working", "⚕")
        if self._voice_mode:
-            return [("class:voice-prompt", f"🎤 {state_suffix}")]
+            return _state_fragment("class:voice-prompt", "🎤")
        return [("class:prompt", symbol)]

    def _get_tui_prompt_text(self) -> str:
@@ -7634,6 +8071,7 @@ class HermesCLI:

        _completer = SlashCommandCompleter(
            skill_commands_provider=lambda: _skill_commands,
+            command_filter=cli_ref._command_available,
        )
        input_area = TextArea(
            height=Dimension(min=1, max=8, preferred=1),
@@ -7830,9 +8268,9 @@ class HermesCLI:
        def get_hint_height():
            if cli_ref._sudo_state or cli_ref._secret_state or cli_ref._approval_state or cli_ref._clarify_state or cli_ref._command_running:
                return 1
-            # Keep a 1-line spacer while agent runs so output doesn't push
-            # right up against the top rule of the input area
-            return 1 if cli_ref._agent_running else 0
+            # Keep a spacer while the agent runs on roomy terminals, but reclaim
+            # the row on narrow/mobile screens where every line matters.
+            return cli_ref._agent_spacer_height()

        def get_spinner_text():
            txt = cli_ref._spinner_text
@@ -7841,7 +8279,7 @@ class HermesCLI:
            return [('class:hint', f'  {txt}')]

        def get_spinner_height():
-            return 1 if cli_ref._spinner_text else 0
+            return cli_ref._spinner_widget_height()

        spinner_widget = Window(
            content=FormattedTextControl(get_spinner_text),
@@ -8032,18 +8470,17 @@ class HermesCLI:
            filter=Condition(lambda: cli_ref._approval_state is not None),
        )

-        # Horizontal rules above and below the input (bronze, 1 line each).
-        # The bottom rule moves down as the TextArea grows with newlines.
-        # Using char='─' instead of hardcoded repetition so the rule
-        # always spans the full terminal width on any screen size.
+        # Horizontal rules above and below the input.
+        # On narrow/mobile terminals we keep the top separator for structure but
+        # hide the bottom one to recover a full row for conversation content.
        input_rule_top = Window(
            char='─',
-            height=1,
+            height=lambda: cli_ref._tui_input_rule_height("top"),
            style='class:input-rule',
        )
        input_rule_bot = Window(
            char='─',
-            height=1,
+            height=lambda: cli_ref._tui_input_rule_height("bottom"),
            style='class:input-rule',
        )

@@ -8053,10 +8490,9 @@ class HermesCLI:
        def _get_image_bar():
            if not cli_ref._attached_images:
                return []
-            base = cli_ref._image_counter - len(cli_ref._attached_images) + 1
-            badges = " ".join(
-                f"[📎 Image #{base + i}]"
-                for i in range(len(cli_ref._attached_images))
+            badges = _format_image_attachment_badges(
+                cli_ref._attached_images,
+                cli_ref._image_counter,
            )
            return [("class:image-badge", f" {badges} ")]

@@ -8067,13 +8503,7 @@ class HermesCLI:

        # Persistent voice mode status bar (visible only when voice mode is on)
        def _get_voice_status():
-            if cli_ref._voice_recording:
-                return [('class:voice-status-recording', ' ● REC  Ctrl+B to stop ')]
-            if cli_ref._voice_processing:
-                return [('class:voice-status', ' ◉ Transcribing... ')]
-            tts = " | TTS on" if cli_ref._voice_tts else ""
-            cont = " | Continuous" if cli_ref._voice_continuous else ""
-            return [('class:voice-status', f' 🎤 Voice mode{tts}{cont}  —  Ctrl+B to record ')]
+            return cli_ref._get_voice_status_fragments()

        voice_status_bar = ConditionalContainer(
            Window(
@@ -8529,6 +8959,7 @@ class HermesCLI:
 def main(
    query: str = None,
    q: str = None,
+    image: str = None,
    toolsets: str = None,
    skills: str | list[str] | tuple[str, ...] = None,
    model: str = None,
@@ -8554,6 +8985,7 @@ def main(
    Args:
        query: Single query to execute (then exit). Alias: -q
        q: Shorthand for --query
+        image: Optional local image path to attach to a single query
        toolsets: Comma-separated list of toolsets to enable (e.g., "web,terminal")
        skills: Comma-separated or repeated list of skills to preload for the session
        model: Model to use (default: anthropic/claude-opus-4-20250514)
@@ -8574,6 +9006,7 @@ def main(
        python cli.py --toolsets web,terminal    # Use specific toolsets
        python cli.py --skills hermes-agent-dev,github-auth
        python cli.py -q "What is Python?"       # Single query mode
+        python cli.py -q "Describe this" --image ~/storage/shared/Pictures/cat.png
        python cli.py --list-tools               # List tools and exit
        python cli.py --resume 20260225_143052_a1b2c3  # Resume session
        python cli.py -w                         # Start in isolated git worktree
@@ -8696,23 +9129,33 @@ def main(
    atexit.register(_run_cleanup)
    
    # Handle single query mode
-    if query:
+    if query or image:
+        query, single_query_images = _collect_query_images(query, image)
        if quiet:
            # Quiet mode: suppress banner, spinner, tool previews.
            # Only print the final response and parseable session info.
            cli.tool_progress_mode = "off"
            if cli._ensure_runtime_credentials():
-                turn_route = cli._resolve_turn_agent_config(query)
+                effective_query = query
+                if single_query_images:
+                    effective_query = cli._preprocess_images_with_vision(
+                        query,
+                        single_query_images,
+                        announce=False,
+                    )
+                turn_route = cli._resolve_turn_agent_config(effective_query)
                if turn_route["signature"] != cli._active_agent_route_signature:
                    cli.agent = None
                if cli._init_agent(
                    model_override=turn_route["model"],
                    runtime_override=turn_route["runtime"],
                    route_label=turn_route["label"],
+                    request_overrides=turn_route.get("request_overrides"),
                ):
                    cli.agent.quiet_mode = True
+                    cli.agent.suppress_status_output = True
                    result = cli.agent.run_conversation(
-                        user_message=query,
+                        user_message=effective_query,
                        conversation_history=cli.conversation_history,
                    )
                    response = result.get("final_response", "") if isinstance(result, dict) else str(result)
@@ -8727,8 +9170,10 @@ def main(
            sys.exit(1)
        else:
            cli.show_banner()
-            cli.console.print(f"[bold blue]Query:[/] {query}")
-            cli.chat(query)
+            _query_label = query or ("[image attached]" if single_query_images else "")
+            if _query_label:
+                cli.console.print(f"[bold blue]Query:[/] {_query_label}")
+            cli.chat(query, images=single_query_images or None)
            cli._print_exit_summary()
        return
    
@@ -0,0 +1,15 @@
+# Termux / Android dependency constraints for Hermes Agent.
+#
+# Usage:
+#   python -m pip install -e '.[termux]' -c constraints-termux.txt
+#
+# These pins keep the tested Android install path stable when upstream packages
+# move faster than Termux-compatible wheels / sdists.
+
+ipython<10
+jedi>=0.18.1,<0.20
+parso>=0.8.4,<0.9
+stack-data>=0.6,<0.7
+pexpect>4.3,<5
+matplotlib-inline>=0.1.7,<0.2
+asttokens>=2.1,<3
@@ -901,6 +901,9 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
                pass
        if api_server_host:
            config.platforms[Platform.API_SERVER].extra["host"] = api_server_host
+        api_server_model_name = os.getenv("API_SERVER_MODEL_NAME", "")
+        if api_server_model_name:
+            config.platforms[Platform.API_SERVER].extra["model_name"] = api_server_model_name

    # Webhook platform
    webhook_enabled = os.getenv("WEBHOOK_ENABLED", "").lower() in ("true", "1", "yes")
@@ -124,6 +124,53 @@ class DeliveryRouter:
        self.adapters = adapters or {}
        self.output_dir = get_hermes_home() / "cron" / "output"
    
+    def resolve_targets(
+        self,
+        deliver: Union[str, List[str]],
+        origin: Optional[SessionSource] = None
+    ) -> List[DeliveryTarget]:
+        """
+        Resolve delivery specification to concrete targets.
+        
+        Args:
+            deliver: Delivery spec - "origin", "telegram", ["local", "discord"], etc.
+            origin: The source where the request originated (for "origin" target)
+        
+        Returns:
+            List of resolved delivery targets
+        """
+        if isinstance(deliver, str):
+            deliver = [deliver]
+        
+        targets = []
+        seen_platforms = set()
+        
+        for target_str in deliver:
+            target = DeliveryTarget.parse(target_str, origin)
+            
+            # Resolve home channel if needed
+            if target.chat_id is None and target.platform != Platform.LOCAL:
+                home = self.config.get_home_channel(target.platform)
+                if home:
+                    target.chat_id = home.chat_id
+                else:
+                    # No home channel configured, skip this platform
+                    continue
+            
+            # Deduplicate
+            key = (target.platform, target.chat_id, target.thread_id)
+            if key not in seen_platforms:
+                seen_platforms.add(key)
+                targets.append(target)
+        
+        # Always include local if configured
+        if self.config.always_log_local:
+            local_key = (Platform.LOCAL, None, None)
+            if local_key not in seen_platforms:
+                targets.append(DeliveryTarget(platform=Platform.LOCAL))
+        
+        return targets
+    
    async def deliver(
        self,
        content: str,
@@ -252,5 +299,19 @@ class DeliveryRouter:
        return await adapter.send(target.chat_id, content, metadata=send_metadata or None)


+def parse_deliver_spec(
+    deliver: Optional[Union[str, List[str]]],
+    origin: Optional[SessionSource] = None,
+    default: str = "origin"
+) -> Union[str, List[str]]:
+    """
+    Normalize a delivery specification.
+    
+    If None or empty, returns the default.
+    """
+    if not deliver:
+        return default
+    return deliver
+


@@ -299,6 +299,9 @@ class APIServerAdapter(BasePlatformAdapter):
        self._cors_origins: tuple[str, ...] = self._parse_cors_origins(
            extra.get("cors_origins", os.getenv("API_SERVER_CORS_ORIGINS", "")),
        )
+        self._model_name: str = self._resolve_model_name(
+            extra.get("model_name", os.getenv("API_SERVER_MODEL_NAME", "")),
+        )
        self._app: Optional["web.Application"] = None
        self._runner: Optional["web.AppRunner"] = None
        self._site: Optional["web.TCPSite"] = None
@@ -324,6 +327,26 @@ class APIServerAdapter(BasePlatformAdapter):

        return tuple(str(item).strip() for item in items if str(item).strip())

+    @staticmethod
+    def _resolve_model_name(explicit: str) -> str:
+        """Derive the advertised model name for /v1/models.
+
+        Priority:
+        1. Explicit override (config extra or API_SERVER_MODEL_NAME env var)
+        2. Active profile name (so each profile advertises a distinct model)
+        3. Fallback: "hermes-agent"
+        """
+        if explicit and explicit.strip():
+            return explicit.strip()
+        try:
+            from hermes_cli.profiles import get_active_profile_name
+            profile = get_active_profile_name()
+            if profile and profile not in ("default", "custom"):
+                return profile
+        except Exception:
+            pass
+        return "hermes-agent"
+
    def _cors_headers_for_origin(self, origin: str) -> Optional[Dict[str, str]]:
        """Return CORS headers for an allowed browser origin."""
        if not origin or not self._cors_origins:
@@ -468,12 +491,12 @@ class APIServerAdapter(BasePlatformAdapter):
            "object": "list",
            "data": [
                {
-                    "id": "hermes-agent",
+                    "id": self._model_name,
                    "object": "model",
                    "created": int(time.time()),
                    "owned_by": "hermes",
                    "permission": [],
-                    "root": "hermes-agent",
+                    "root": self._model_name,
                    "parent": None,
                }
            ],
@@ -546,7 +569,7 @@ class APIServerAdapter(BasePlatformAdapter):
            # history already set from request body above

        completion_id = f"chatcmpl-{uuid.uuid4().hex[:29]}"
-        model_name = body.get("model", "hermes-agent")
+        model_name = body.get("model", self._model_name)
        created = int(time.time())

        if stream:
@@ -923,7 +946,7 @@ class APIServerAdapter(BasePlatformAdapter):
            "object": "response",
            "status": "completed",
            "created_at": created_at,
-            "model": body.get("model", "hermes-agent"),
+            "model": body.get("model", self._model_name),
            "output": output_items,
            "usage": {
                "input_tokens": usage.get("input_tokens", 0),
@@ -1653,8 +1676,8 @@ class APIServerAdapter(BasePlatformAdapter):

            self._mark_connected()
            logger.info(
-                "[%s] API server listening on http://%s:%d",
-                self.name, self._host, self._port,
+                "[%s] API server listening on http://%s:%d (model: %s)",
+                self.name, self._host, self._port, self._model_name,
            )
            return True

@@ -422,6 +422,7 @@ class DiscordAdapter(BasePlatformAdapter):

    # Discord message limits
    MAX_MESSAGE_LENGTH = 2000
+    _SPLIT_THRESHOLD = 1900  # near the 2000-char split point

    # Auto-disconnect from voice channel after this many seconds of inactivity
    VOICE_TIMEOUT = 300
@@ -433,6 +434,11 @@ class DiscordAdapter(BasePlatformAdapter):
        self._allowed_user_ids: set = set()  # For button approval authorization
        # Voice channel state (per-guild)
        self._voice_clients: Dict[int, Any] = {}  # guild_id -> VoiceClient
+        # Text batching: merge rapid successive messages (Telegram-style)
+        self._text_batch_delay_seconds = float(os.getenv("HERMES_DISCORD_TEXT_BATCH_DELAY_SECONDS", "0.6"))
+        self._text_batch_split_delay_seconds = float(os.getenv("HERMES_DISCORD_TEXT_BATCH_SPLIT_DELAY_SECONDS", "2.0"))
+        self._pending_text_batches: Dict[str, MessageEvent] = {}
+        self._pending_text_batch_tasks: Dict[str, asyncio.Task] = {}
        self._voice_text_channels: Dict[int, int] = {}  # guild_id -> text_channel_id
        self._voice_timeout_tasks: Dict[int, asyncio.Task] = {}  # guild_id -> timeout task
        # Phase 2: voice listening
@@ -2466,7 +2472,80 @@ class DiscordAdapter(BasePlatformAdapter):
        if thread_id:
            self._track_thread(thread_id)

-        await self.handle_message(event)
+        # Only batch plain text messages — commands, media, etc. dispatch
+        # immediately since they won't be split by the Discord client.
+        if msg_type == MessageType.TEXT and self._text_batch_delay_seconds > 0:
+            self._enqueue_text_event(event)
+        else:
+            await self.handle_message(event)
+
+    # ------------------------------------------------------------------
+    # Text message aggregation (handles Discord client-side splits)
+    # ------------------------------------------------------------------
+
+    def _text_batch_key(self, event: MessageEvent) -> str:
+        """Session-scoped key for text message batching."""
+        from gateway.session import build_session_key
+        return build_session_key(
+            event.source,
+            group_sessions_per_user=self.config.extra.get("group_sessions_per_user", True),
+            thread_sessions_per_user=self.config.extra.get("thread_sessions_per_user", False),
+        )
+
+    def _enqueue_text_event(self, event: MessageEvent) -> None:
+        """Buffer a text event and reset the flush timer.
+
+        When Discord splits a long user message at 2000 chars, the chunks
+        arrive within a few hundred milliseconds.  This merges them into
+        a single event before dispatching.
+        """
+        key = self._text_batch_key(event)
+        existing = self._pending_text_batches.get(key)
+        chunk_len = len(event.text or "")
+        if existing is None:
+            event._last_chunk_len = chunk_len  # type: ignore[attr-defined]
+            self._pending_text_batches[key] = event
+        else:
+            if event.text:
+                existing.text = f"{existing.text}\n{event.text}" if existing.text else event.text
+            existing._last_chunk_len = chunk_len  # type: ignore[attr-defined]
+            if event.media_urls:
+                existing.media_urls.extend(event.media_urls)
+                existing.media_types.extend(event.media_types)
+
+        prior_task = self._pending_text_batch_tasks.get(key)
+        if prior_task and not prior_task.done():
+            prior_task.cancel()
+        self._pending_text_batch_tasks[key] = asyncio.create_task(
+            self._flush_text_batch(key)
+        )
+
+    async def _flush_text_batch(self, key: str) -> None:
+        """Wait for the quiet period then dispatch the aggregated text.
+
+        Uses a longer delay when the latest chunk is near Discord's 2000-char
+        split point, since a continuation chunk is almost certain.
+        """
+        current_task = asyncio.current_task()
+        try:
+            pending = self._pending_text_batches.get(key)
+            last_len = getattr(pending, "_last_chunk_len", 0) if pending else 0
+            if last_len >= self._SPLIT_THRESHOLD:
+                delay = self._text_batch_split_delay_seconds
+            else:
+                delay = self._text_batch_delay_seconds
+            await asyncio.sleep(delay)
+            event = self._pending_text_batches.pop(key, None)
+            if not event:
+                return
+            logger.info(
+                "[Discord] Flushing text batch %s (%d chars)",
+                key, len(event.text or ""),
+            )
+            await self.handle_message(event)
+        finally:
+            if self._pending_text_batch_tasks.get(key) is current_task:
+                self._pending_text_batch_tasks.pop(key, None)


 # ---------------------------------------------------------------------------
@@ -264,6 +264,7 @@ class FeishuAdapterSettings:
    bot_name: str
    dedup_cache_size: int
    text_batch_delay_seconds: float
+    text_batch_split_delay_seconds: float
    text_batch_max_messages: int
    text_batch_max_chars: int
    media_batch_delay_seconds: float
@@ -1014,6 +1015,10 @@ class FeishuAdapter(BasePlatformAdapter):
    """Feishu/Lark bot adapter."""

    MAX_MESSAGE_LENGTH = 8000
+    # Threshold for detecting Feishu client-side message splits.
+    # When a chunk is near the ~4096-char practical limit, a continuation
+    # is almost certain.
+    _SPLIT_THRESHOLD = 4000

    # =========================================================================
    # Lifecycle — init / settings / connect / disconnect
@@ -1105,6 +1110,9 @@ class FeishuAdapter(BasePlatformAdapter):
            text_batch_delay_seconds=float(
                os.getenv("HERMES_FEISHU_TEXT_BATCH_DELAY_SECONDS", str(_DEFAULT_TEXT_BATCH_DELAY_SECONDS))
            ),
+            text_batch_split_delay_seconds=float(
+                os.getenv("HERMES_FEISHU_TEXT_BATCH_SPLIT_DELAY_SECONDS", "2.0")
+            ),
            text_batch_max_messages=max(
                1,
                int(os.getenv("HERMES_FEISHU_TEXT_BATCH_MAX_MESSAGES", str(_DEFAULT_TEXT_BATCH_MAX_MESSAGES))),
@@ -1152,6 +1160,7 @@ class FeishuAdapter(BasePlatformAdapter):
        self._bot_name = settings.bot_name
        self._dedup_cache_size = settings.dedup_cache_size
        self._text_batch_delay_seconds = settings.text_batch_delay_seconds
+        self._text_batch_split_delay_seconds = settings.text_batch_split_delay_seconds
        self._text_batch_max_messages = settings.text_batch_max_messages
        self._text_batch_max_chars = settings.text_batch_max_chars
        self._media_batch_delay_seconds = settings.media_batch_delay_seconds
@@ -2478,8 +2487,10 @@ class FeishuAdapter(BasePlatformAdapter):
    async def _enqueue_text_event(self, event: MessageEvent) -> None:
        """Debounce rapid Feishu text bursts into a single MessageEvent."""
        key = self._text_batch_key(event)
+        chunk_len = len(event.text or "")
        existing = self._pending_text_batches.get(key)
        if existing is None:
+            event._last_chunk_len = chunk_len  # type: ignore[attr-defined]
            self._pending_text_batches[key] = event
            self._pending_text_batch_counts[key] = 1
            self._schedule_text_batch_flush(key)
@@ -2504,6 +2515,7 @@ class FeishuAdapter(BasePlatformAdapter):
            return

        existing.text = next_text
+        existing._last_chunk_len = chunk_len  # type: ignore[attr-defined]
        existing.timestamp = event.timestamp
        if event.message_id:
            existing.message_id = event.message_id
@@ -2530,10 +2542,22 @@ class FeishuAdapter(BasePlatformAdapter):
        task_map[key] = asyncio.create_task(flush_fn(key))

    async def _flush_text_batch(self, key: str) -> None:
-        """Flush a pending text batch after the quiet period."""
+        """Flush a pending text batch after the quiet period.
+
+        Uses a longer delay when the latest chunk is near Feishu's ~4096-char
+        split point, since a continuation chunk is almost certain.
+        """
        current_task = asyncio.current_task()
        try:
-            await asyncio.sleep(self._text_batch_delay_seconds)
+            # Adaptive delay: if the latest chunk is near the split threshold,
+            # a continuation is almost certain — wait longer.
+            pending = self._pending_text_batches.get(key)
+            last_len = getattr(pending, "_last_chunk_len", 0) if pending else 0
+            if last_len >= self._SPLIT_THRESHOLD:
+                delay = self._text_batch_split_delay_seconds
+            else:
+                delay = self._text_batch_delay_seconds
+            await asyncio.sleep(delay)
            await self._flush_text_batch_now(key)
        finally:
            if self._pending_text_batch_tasks.get(key) is current_task:
@@ -120,6 +120,11 @@ def check_matrix_requirements() -> bool:
 class MatrixAdapter(BasePlatformAdapter):
    """Gateway adapter for Matrix (any homeserver)."""

+    # Threshold for detecting Matrix client-side message splits.
+    # When a chunk is near the ~4000-char practical limit, a continuation
+    # is almost certain.
+    _SPLIT_THRESHOLD = 3900
+
    def __init__(self, config: PlatformConfig):
        super().__init__(config, Platform.MATRIX)

@@ -172,6 +177,13 @@ class MatrixAdapter(BasePlatformAdapter):
            "MATRIX_REACTIONS", "true"
        ).lower() not in ("false", "0", "no")

+        # Text batching: merge rapid successive messages (Telegram-style).
+        # Matrix clients split long messages around 4000 chars.
+        self._text_batch_delay_seconds = float(os.getenv("HERMES_MATRIX_TEXT_BATCH_DELAY_SECONDS", "0.6"))
+        self._text_batch_split_delay_seconds = float(os.getenv("HERMES_MATRIX_TEXT_BATCH_SPLIT_DELAY_SECONDS", "2.0"))
+        self._pending_text_batches: Dict[str, MessageEvent] = {}
+        self._pending_text_batch_tasks: Dict[str, asyncio.Task] = {}
+
    def _is_duplicate_event(self, event_id) -> bool:
        """Return True if this event was already processed. Tracks the ID otherwise."""
        if not event_id:
@@ -1088,7 +1100,81 @@ class MatrixAdapter(BasePlatformAdapter):
        # Acknowledge receipt so the room shows as read (fire-and-forget).
        self._background_read_receipt(room.room_id, event.event_id)

-        await self.handle_message(msg_event)
+        # Only batch plain text messages — commands dispatch immediately.
+        if msg_type == MessageType.TEXT and self._text_batch_delay_seconds > 0:
+            self._enqueue_text_event(msg_event)
+        else:
+            await self.handle_message(msg_event)
+
+    # ------------------------------------------------------------------
+    # Text message aggregation (handles Matrix client-side splits)
+    # ------------------------------------------------------------------
+
+    def _text_batch_key(self, event: MessageEvent) -> str:
+        """Session-scoped key for text message batching."""
+        from gateway.session import build_session_key
+        return build_session_key(
+            event.source,
+            group_sessions_per_user=self.config.extra.get("group_sessions_per_user", True),
+            thread_sessions_per_user=self.config.extra.get("thread_sessions_per_user", False),
+        )
+
+    def _enqueue_text_event(self, event: MessageEvent) -> None:
+        """Buffer a text event and reset the flush timer.
+
+        When a Matrix client splits a long message, the chunks arrive within
+        a few hundred milliseconds.  This merges them into a single event
+        before dispatching.
+        """
+        key = self._text_batch_key(event)
+        existing = self._pending_text_batches.get(key)
+        chunk_len = len(event.text or "")
+        if existing is None:
+            event._last_chunk_len = chunk_len  # type: ignore[attr-defined]
+            self._pending_text_batches[key] = event
+        else:
+            if event.text:
+                existing.text = f"{existing.text}\n{event.text}" if existing.text else event.text
+            existing._last_chunk_len = chunk_len  # type: ignore[attr-defined]
+            # Merge any media that might be attached
+            if event.media_urls:
+                existing.media_urls.extend(event.media_urls)
+                existing.media_types.extend(event.media_types)
+
+        # Cancel any pending flush and restart the timer
+        prior_task = self._pending_text_batch_tasks.get(key)
+        if prior_task and not prior_task.done():
+            prior_task.cancel()
+        self._pending_text_batch_tasks[key] = asyncio.create_task(
+            self._flush_text_batch(key)
+        )
+
+    async def _flush_text_batch(self, key: str) -> None:
+        """Wait for the quiet period then dispatch the aggregated text.
+
+        Uses a longer delay when the latest chunk is near Matrix's ~4000-char
+        split point, since a continuation chunk is almost certain.
+        """
+        current_task = asyncio.current_task()
+        try:
+            pending = self._pending_text_batches.get(key)
+            last_len = getattr(pending, "_last_chunk_len", 0) if pending else 0
+            if last_len >= self._SPLIT_THRESHOLD:
+                delay = self._text_batch_split_delay_seconds
+            else:
+                delay = self._text_batch_delay_seconds
+            await asyncio.sleep(delay)
+            event = self._pending_text_batches.pop(key, None)
+            if not event:
+                return
+            logger.info(
+                "[Matrix] Flushing text batch %s (%d chars)",
+                key, len(event.text or ""),
+            )
+            await self.handle_message(event)
+        finally:
+            if self._pending_text_batch_tasks.get(key) is current_task:
+                self._pending_text_batch_tasks.pop(key, None)

    async def _on_room_message_media(self, room: Any, event: Any) -> None:
        """Handle incoming media messages (images, audio, video, files)."""
@@ -121,6 +121,9 @@ class TelegramAdapter(BasePlatformAdapter):
    
    # Telegram message limits
    MAX_MESSAGE_LENGTH = 4096
+    # Threshold for detecting Telegram client-side message splits.
+    # When a chunk is near this limit, a continuation is almost certain.
+    _SPLIT_THRESHOLD = 4000
    MEDIA_GROUP_WAIT_SECONDS = 0.8
    
    def __init__(self, config: PlatformConfig):
@@ -140,6 +143,7 @@ class TelegramAdapter(BasePlatformAdapter):
        # Buffer rapid text messages so Telegram client-side splits of long
        # messages are aggregated into a single MessageEvent.
        self._text_batch_delay_seconds = float(os.getenv("HERMES_TELEGRAM_TEXT_BATCH_DELAY_SECONDS", "0.6"))
+        self._text_batch_split_delay_seconds = float(os.getenv("HERMES_TELEGRAM_TEXT_BATCH_SPLIT_DELAY_SECONDS", "2.0"))
        self._pending_text_batches: Dict[str, MessageEvent] = {}
        self._pending_text_batch_tasks: Dict[str, asyncio.Task] = {}
        self._token_lock_identity: Optional[str] = None
@@ -2160,12 +2164,15 @@ class TelegramAdapter(BasePlatformAdapter):
        """
        key = self._text_batch_key(event)
        existing = self._pending_text_batches.get(key)
+        chunk_len = len(event.text or "")
        if existing is None:
+            event._last_chunk_len = chunk_len  # type: ignore[attr-defined]
            self._pending_text_batches[key] = event
        else:
            # Append text from the follow-up chunk
            if event.text:
                existing.text = f"{existing.text}\n{event.text}" if existing.text else event.text
+            existing._last_chunk_len = chunk_len  # type: ignore[attr-defined]
            # Merge any media that might be attached
            if event.media_urls:
                existing.media_urls.extend(event.media_urls)
@@ -2180,10 +2187,22 @@ class TelegramAdapter(BasePlatformAdapter):
        )

    async def _flush_text_batch(self, key: str) -> None:
-        """Wait for the quiet period then dispatch the aggregated text."""
+        """Wait for the quiet period then dispatch the aggregated text.
+
+        Uses a longer delay when the latest chunk is near Telegram's 4096-char
+        split point, since a continuation chunk is almost certain.
+        """
        current_task = asyncio.current_task()
        try:
-            await asyncio.sleep(self._text_batch_delay_seconds)
+            # Adaptive delay: if the latest chunk is near Telegram's 4096-char
+            # split point, a continuation is almost certain — wait longer.
+            pending = self._pending_text_batches.get(key)
+            last_len = getattr(pending, "_last_chunk_len", 0) if pending else 0
+            if last_len >= self._SPLIT_THRESHOLD:
+                delay = self._text_batch_split_delay_seconds
+            else:
+                delay = self._text_batch_delay_seconds
+            await asyncio.sleep(delay)
            event = self._pending_text_batches.pop(key, None)
            if not event:
                return
@@ -143,6 +143,9 @@ class WeComAdapter(BasePlatformAdapter):
    """WeCom AI Bot adapter backed by a persistent WebSocket connection."""

    MAX_MESSAGE_LENGTH = MAX_MESSAGE_LENGTH
+    # Threshold for detecting WeCom client-side message splits.
+    # When a chunk is near the 4000-char limit, a continuation is almost certain.
+    _SPLIT_THRESHOLD = 3900

    def __init__(self, config: PlatformConfig):
        super().__init__(config, Platform.WECOM)
@@ -172,6 +175,13 @@ class WeComAdapter(BasePlatformAdapter):
        self._seen_messages: Dict[str, float] = {}
        self._reply_req_ids: Dict[str, str] = {}

+        # Text batching: merge rapid successive messages (Telegram-style).
+        # WeCom clients split long messages around 4000 chars.
+        self._text_batch_delay_seconds = float(os.getenv("HERMES_WECOM_TEXT_BATCH_DELAY_SECONDS", "0.6"))
+        self._text_batch_split_delay_seconds = float(os.getenv("HERMES_WECOM_TEXT_BATCH_SPLIT_DELAY_SECONDS", "2.0"))
+        self._pending_text_batches: Dict[str, MessageEvent] = {}
+        self._pending_text_batch_tasks: Dict[str, asyncio.Task] = {}
+
    # ------------------------------------------------------------------
    # Connection lifecycle
    # ------------------------------------------------------------------
@@ -519,7 +529,82 @@ class WeComAdapter(BasePlatformAdapter):
            timestamp=datetime.now(tz=timezone.utc),
        )

-        await self.handle_message(event)
+        # Only batch plain text messages — commands, media, etc. dispatch
+        # immediately since they won't be split by the WeCom client.
+        if message_type == MessageType.TEXT and self._text_batch_delay_seconds > 0:
+            self._enqueue_text_event(event)
+        else:
+            await self.handle_message(event)
+
+    # ------------------------------------------------------------------
+    # Text message aggregation (handles WeCom client-side splits)
+    # ------------------------------------------------------------------
+
+    def _text_batch_key(self, event: MessageEvent) -> str:
+        """Session-scoped key for text message batching."""
+        from gateway.session import build_session_key
+        return build_session_key(
+            event.source,
+            group_sessions_per_user=self.config.extra.get("group_sessions_per_user", True),
+            thread_sessions_per_user=self.config.extra.get("thread_sessions_per_user", False),
+        )
+
+    def _enqueue_text_event(self, event: MessageEvent) -> None:
+        """Buffer a text event and reset the flush timer.
+
+        When WeCom splits a long user message at 4000 chars, the chunks
+        arrive within a few hundred milliseconds.  This merges them into
+        a single event before dispatching.
+        """
+        key = self._text_batch_key(event)
+        existing = self._pending_text_batches.get(key)
+        chunk_len = len(event.text or "")
+        if existing is None:
+            event._last_chunk_len = chunk_len  # type: ignore[attr-defined]
+            self._pending_text_batches[key] = event
+        else:
+            if event.text:
+                existing.text = f"{existing.text}\n{event.text}" if existing.text else event.text
+            existing._last_chunk_len = chunk_len  # type: ignore[attr-defined]
+            # Merge any media that might be attached
+            if event.media_urls:
+                existing.media_urls.extend(event.media_urls)
+                existing.media_types.extend(event.media_types)
+
+        # Cancel any pending flush and restart the timer
+        prior_task = self._pending_text_batch_tasks.get(key)
+        if prior_task and not prior_task.done():
+            prior_task.cancel()
+        self._pending_text_batch_tasks[key] = asyncio.create_task(
+            self._flush_text_batch(key)
+        )
+
+    async def _flush_text_batch(self, key: str) -> None:
+        """Wait for the quiet period then dispatch the aggregated text.
+
+        Uses a longer delay when the latest chunk is near WeCom's 4000-char
+        split point, since a continuation chunk is almost certain.
+        """
+        current_task = asyncio.current_task()
+        try:
+            pending = self._pending_text_batches.get(key)
+            last_len = getattr(pending, "_last_chunk_len", 0) if pending else 0
+            if last_len >= self._SPLIT_THRESHOLD:
+                delay = self._text_batch_split_delay_seconds
+            else:
+                delay = self._text_batch_delay_seconds
+            await asyncio.sleep(delay)
+            event = self._pending_text_batches.pop(key, None)
+            if not event:
+                return
+            logger.info(
+                "[WeCom] Flushing text batch %s (%d chars)",
+                key, len(event.text or ""),
+            )
+            await self.handle_message(event)
+        finally:
+            if self._pending_text_batch_tasks.get(key) is current_task:
+                self._pending_text_batch_tasks.pop(key, None)

    @staticmethod
    def _extract_text(body: Dict[str, Any]) -> Tuple[str, Optional[str]]:
@@ -514,6 +514,12 @@ class GatewayRunner:
        self._agent_cache: Dict[str, tuple] = {}
        self._agent_cache_lock = _threading.Lock()

+        # Track active fallback model/provider when primary is rate-limited.
+        # Set after an agent run where fallback was activated; cleared when
+        # the primary model succeeds again or the user switches via /model.
+        self._effective_model: Optional[str] = None
+        self._effective_provider: Optional[str] = None
+
        # Per-session model overrides from /model command.
        # Key: session_key, Value: dict with model/provider/api_key/base_url/api_mode
        self._session_model_overrides: Dict[str, Dict[str, str]] = {}
@@ -7274,9 +7280,15 @@ class GatewayRunner:
            if _agent is not None and hasattr(_agent, 'model'):
                _cfg_model = _resolve_gateway_model()
                if _agent.model != _cfg_model:
+                    self._effective_model = _agent.model
+                    self._effective_provider = getattr(_agent, 'provider', None)
                    # Fallback activated — evict cached agent so the next
                    # message starts fresh and retries the primary model.
                    self._evict_cached_agent(session_key)
+                else:
+                    # Primary model worked — clear any stale fallback state
+                    self._effective_model = None
+                    self._effective_provider = None

            # Check if we were interrupted OR have a queued message (/queue).
            result = result_holder[0]
@@ -32,6 +32,9 @@ def _now() -> datetime:
 # PII redaction helpers
 # ---------------------------------------------------------------------------

+_PHONE_RE = re.compile(r"^\+?\d[\d\-\s]{6,}$")
+
+
 def _hash_id(value: str) -> str:
    """Deterministic 12-char hex hash of an identifier."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]
@@ -55,6 +58,10 @@ def _hash_chat_id(value: str) -> str:
    return _hash_id(value)


+def _looks_like_phone(value: str) -> bool:
+    """Return True if *value* looks like a phone number (E.164 or similar)."""
+    return bool(_PHONE_RE.match(value.strip()))
+
 from .config import (
    Platform,
    GatewayConfig,
@@ -137,6 +144,15 @@ class SessionSource:
            chat_id_alt=data.get("chat_id_alt"),
        )
    
+    @classmethod
+    def local_cli(cls) -> "SessionSource":
+        """Create a source representing the local CLI."""
+        return cls(
+            platform=Platform.LOCAL,
+            chat_id="cli",
+            chat_name="CLI terminal",
+            chat_type="dm",
+        )


@dataclass
@@ -494,7 +510,8 @@ class SessionStore:
    """
    
    def __init__(self, sessions_dir: Path, config: GatewayConfig,
-                 has_active_processes_fn=None):
+                 has_active_processes_fn=None,
+                 on_auto_reset=None):
        self.sessions_dir = sessions_dir
        self.config = config
        self._entries: Dict[str, SessionEntry] = {}
@@ -70,6 +70,7 @@ DEFAULT_CODEX_BASE_URL = "https://chatgpt.com/backend-api/codex"
 DEFAULT_QWEN_BASE_URL = "https://portal.qwen.ai/v1"
 DEFAULT_GITHUB_MODELS_BASE_URL = "https://api.githubcopilot.com"
 DEFAULT_COPILOT_ACP_BASE_URL = "acp://copilot"
+DEFAULT_GEMINI_BASE_URL = "https://generativelanguage.googleapis.com/v1beta/openai"
 CODEX_OAUTH_CLIENT_ID = "app_EMoamEEZ73f0CkXaXp7hrann"
 CODEX_OAUTH_TOKEN_URL = "https://auth.openai.com/oauth/token"
 CODEX_ACCESS_TOKEN_REFRESH_SKEW_SECONDS = 120
@@ -2341,6 +2342,33 @@ def resolve_external_process_provider_credentials(provider_id: str) -> Dict[str,
    }


+# =============================================================================
+# External credential detection
+# =============================================================================
+
+def detect_external_credentials() -> List[Dict[str, Any]]:
+    """Scan for credentials from other CLI tools that Hermes can reuse.
+
+    Returns a list of dicts, each with:
+      - provider: str   -- Hermes provider id (e.g. "openai-codex")
+      - path: str       -- filesystem path where creds were found
+      - label: str      -- human-friendly description for the setup UI
+    """
+    found: List[Dict[str, Any]] = []
+
+    # Codex CLI: ~/.codex/auth.json (importable, not shared)
+    cli_tokens = _import_codex_cli_tokens()
+    if cli_tokens:
+        codex_path = Path.home() / ".codex" / "auth.json"
+        found.append({
+            "provider": "openai-codex",
+            "path": str(codex_path),
+            "label": f"Codex CLI credentials found ({codex_path}) — run `hermes auth` to create a separate session",
+        })
+
+    return found
+
+
 # =============================================================================
 # CLI Commands — login / logout
 # =============================================================================
@@ -90,6 +90,12 @@ HERMES_CADUCEUS = """[#CD7F32]⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣀⡀⠀⣀⣀
 [#B8860B]⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠳⠈⣡⠞⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀[/]
 [#B8860B]⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀[/]"""

+COMPACT_BANNER = """
+[bold #FFD700]╔══════════════════════════════════════════════════════════════╗[/]
+[bold #FFD700]║[/]  [#FFBF00]⚕ NOUS HERMES[/] [dim #B8860B]- AI Agent Framework[/]              [bold #FFD700]║[/]
+[bold #FFD700]║[/]  [#CD7F32]Messenger of the Digital Gods[/]    [dim #B8860B]Nous Research[/]   [bold #FFD700]║[/]
+[bold #FFD700]╚══════════════════════════════════════════════════════════════╝[/]
+"""


 # =========================================================================
@@ -0,0 +1,140 @@
+"""Shared curses-based multi-select checklist for Hermes CLI.
+
+Used by both ``hermes tools`` and ``hermes skills`` to present a
+toggleable list of items.  Falls back to a numbered text UI when
+curses is unavailable (Windows without curses, piped stdin, etc.).
+"""
+
+import sys
+from typing import List, Set
+
+from hermes_cli.colors import Colors, color
+
+
+def curses_checklist(
+    title: str,
+    items: List[str],
+    pre_selected: Set[int],
+) -> Set[int]:
+    """Multi-select checklist.  Returns set of **selected** indices.
+
+    Args:
+        title: Header text shown at the top of the checklist.
+        items: Display labels for each row.
+        pre_selected: Indices that start checked.
+
+    Returns:
+        The indices the user confirmed as checked.  On cancel (ESC/q),
+        returns ``pre_selected`` unchanged.
+    """
+    # Safety: return defaults when stdin is not a terminal.
+    if not sys.stdin.isatty():
+        return set(pre_selected)
+
+    try:
+        import curses
+        selected = set(pre_selected)
+        result = [None]
+
+        def _ui(stdscr):
+            curses.curs_set(0)
+            if curses.has_colors():
+                curses.start_color()
+                curses.use_default_colors()
+                curses.init_pair(1, curses.COLOR_GREEN, -1)
+                curses.init_pair(2, curses.COLOR_YELLOW, -1)
+                curses.init_pair(3, 8, -1)  # dim gray
+            cursor = 0
+            scroll_offset = 0
+
+            while True:
+                stdscr.clear()
+                max_y, max_x = stdscr.getmaxyx()
+
+                # Header
+                try:
+                    hattr = curses.A_BOLD | (curses.color_pair(2) if curses.has_colors() else 0)
+                    stdscr.addnstr(0, 0, title, max_x - 1, hattr)
+                    stdscr.addnstr(
+                        1, 0,
+                        "  ↑↓ navigate  SPACE toggle  ENTER confirm  ESC cancel",
+                        max_x - 1, curses.A_DIM,
+                    )
+                except curses.error:
+                    pass
+
+                # Scrollable item list
+                visible_rows = max_y - 3
+                if cursor < scroll_offset:
+                    scroll_offset = cursor
+                elif cursor >= scroll_offset + visible_rows:
+                    scroll_offset = cursor - visible_rows + 1
+
+                for draw_i, i in enumerate(
+                    range(scroll_offset, min(len(items), scroll_offset + visible_rows))
+                ):
+                    y = draw_i + 3
+                    if y >= max_y - 1:
+                        break
+                    check = "✓" if i in selected else " "
+                    arrow = "→" if i == cursor else " "
+                    line = f" {arrow} [{check}] {items[i]}"
+
+                    attr = curses.A_NORMAL
+                    if i == cursor:
+                        attr = curses.A_BOLD
+                        if curses.has_colors():
+                            attr |= curses.color_pair(1)
+                    try:
+                        stdscr.addnstr(y, 0, line, max_x - 1, attr)
+                    except curses.error:
+                        pass
+
+                stdscr.refresh()
+                key = stdscr.getch()
+
+                if key in (curses.KEY_UP, ord("k")):
+                    cursor = (cursor - 1) % len(items)
+                elif key in (curses.KEY_DOWN, ord("j")):
+                    cursor = (cursor + 1) % len(items)
+                elif key == ord(" "):
+                    selected.symmetric_difference_update({cursor})
+                elif key in (curses.KEY_ENTER, 10, 13):
+                    result[0] = set(selected)
+                    return
+                elif key in (27, ord("q")):
+                    result[0] = set(pre_selected)
+                    return
+
+        curses.wrapper(_ui)
+        return result[0] if result[0] is not None else set(pre_selected)
+
+    except Exception:
+        pass  # fall through to numbered fallback
+
+    # ── Numbered text fallback ────────────────────────────────────────────
+    selected = set(pre_selected)
+    print(color(f"\n  {title}", Colors.YELLOW))
+    print(color("  Toggle by number, Enter to confirm.\n", Colors.DIM))
+
+    while True:
+        for i, label in enumerate(items):
+            check = "✓" if i in selected else " "
+            print(f"    {i + 1:3}. [{check}] {label}")
+        print()
+
+        try:
+            raw = input(color("  Number to toggle, 's' to save, 'q' to cancel: ", Colors.DIM)).strip()
+        except (KeyboardInterrupt, EOFError):
+            return set(pre_selected)
+
+        if raw.lower() == "s" or raw == "":
+            return selected
+        if raw.lower() == "q":
+            return set(pre_selected)
+        try:
+            idx = int(raw) - 1
+            if 0 <= idx < len(items):
+                selected.symmetric_difference_update({idx})
+        except ValueError:
+            print(color("  Invalid input", Colors.DIM))
@@ -100,6 +100,9 @@ COMMAND_REGISTRY: list[CommandDef] = [
    CommandDef("reasoning", "Manage reasoning effort and display", "Configuration",
               args_hint="[level|show|hide]",
               subcommands=("none", "minimal", "low", "medium", "high", "xhigh", "show", "hide", "on", "off")),
+    CommandDef("fast", "Toggle OpenAI Priority Processing (Normal/Fast)", "Configuration",
+               cli_only=True, args_hint="[normal|fast|status]",
+               subcommands=("normal", "fast", "status", "on", "off")),
    CommandDef("skin", "Show or change the display skin/theme", "Configuration",
               cli_only=True, args_hint="[name]"),
    CommandDef("voice", "Toggle voice mode", "Configuration",
@@ -135,6 +138,8 @@ COMMAND_REGISTRY: list[CommandDef] = [
               cli_only=True, aliases=("gateway",)),
    CommandDef("paste", "Check clipboard for an image and attach it", "Info",
               cli_only=True),
+    CommandDef("image", "Attach a local image file for your next prompt", "Info",
+               cli_only=True, args_hint="<path>"),
    CommandDef("update", "Update Hermes Agent to the latest version", "Info",
               gateway_only=True),

@@ -169,6 +174,12 @@ def resolve_command(name: str) -> CommandDef | None:
    return _COMMAND_LOOKUP.get(name.lower().lstrip("/"))


+def register_plugin_command(cmd: CommandDef) -> None:
+    """Append a plugin-defined command to the registry and refresh lookups."""
+    COMMAND_REGISTRY.append(cmd)
+    rebuild_lookups()
+
+
 def rebuild_lookups() -> None:
    """Rebuild all derived lookup dicts from the current COMMAND_REGISTRY.

@@ -631,8 +642,18 @@ class SlashCommandCompleter(Completer):
    def __init__(
        self,
        skill_commands_provider: Callable[[], Mapping[str, dict[str, Any]]] | None = None,
+        command_filter: Callable[[str], bool] | None = None,
    ) -> None:
        self._skill_commands_provider = skill_commands_provider
+        self._command_filter = command_filter
+
+    def _command_allowed(self, slash_command: str) -> bool:
+        if self._command_filter is None:
+            return True
+        try:
+            return bool(self._command_filter(slash_command))
+        except Exception:
+            return True

    def _iter_skill_commands(self) -> Mapping[str, dict[str, Any]]:
        if self._skill_commands_provider is None:
@@ -910,7 +931,7 @@ class SlashCommandCompleter(Completer):
                return

            # Static subcommand completions
-            if " " not in sub_text and base_cmd in SUBCOMMANDS:
+            if " " not in sub_text and base_cmd in SUBCOMMANDS and self._command_allowed(base_cmd):
                for sub in SUBCOMMANDS[base_cmd]:
                    if sub.startswith(sub_lower) and sub != sub_lower:
                        yield Completion(
@@ -923,6 +944,8 @@ class SlashCommandCompleter(Completer):
        word = text[1:]

        for cmd, desc in COMMANDS.items():
+            if not self._command_allowed(cmd):
+                continue
            cmd_name = cmd[1:]
            if cmd_name.startswith(word):
                yield Completion(
@@ -981,6 +1004,8 @@ class SlashCommandAutoSuggest(AutoSuggest):
            # Still typing the command name: /upd → suggest "ate"
            word = text[1:].lower()
            for cmd in COMMANDS:
+                if self._completer is not None and not self._completer._command_allowed(cmd):
+                    continue
                cmd_name = cmd[1:]  # strip leading /
                if cmd_name.startswith(word) and cmd_name != word:
                    return Suggestion(cmd_name[len(word):])
@@ -991,6 +1016,8 @@ class SlashCommandAutoSuggest(AutoSuggest):
        sub_lower = sub_text.lower()

        # Static subcommands
+        if self._completer is not None and not self._completer._command_allowed(base_cmd):
+            return None
        if base_cmd in SUBCOMMANDS and SUBCOMMANDS[base_cmd]:
            if " " not in sub_text:
                for sub in SUBCOMMANDS[base_cmd]:
@@ -255,6 +255,7 @@ DEFAULT_CONFIG = {
        # tools or receiving API responses.  Only fires when the agent has
        # been completely idle for this duration.  0 = unlimited.
        "gateway_timeout": 1800,
+        "service_tier": "",
        # Tool-use enforcement: injects system prompt guidance that tells the
        # model to actually call tools instead of describing intended actions.
        # Values: "auto" (default — applies to gpt/codex models), true/false
@@ -1216,6 +1217,14 @@ OPTIONAL_ENV_VARS = {
        "category": "messaging",
        "advanced": True,
    },
+    "API_SERVER_MODEL_NAME": {
+        "description": "Model name advertised on /v1/models. Defaults to the profile name (or 'hermes-agent' for the default profile). Useful for multi-user setups with OpenWebUI.",
+        "prompt": "API server model name",
+        "url": None,
+        "password": False,
+        "category": "messaging",
+        "advanced": True,
+    },
    "WEBHOOK_ENABLED": {
        "description": "Enable the webhook platform adapter for receiving events from GitHub, GitLab, etc.",
        "prompt": "Enable webhooks (true/false)",
@@ -31,6 +31,13 @@ logger = logging.getLogger(__name__)

 # OAuth device code flow constants (same client ID as opencode/Copilot CLI)
 COPILOT_OAUTH_CLIENT_ID = "Ov23li8tweQw6odWQebz"
+COPILOT_DEVICE_CODE_URL = "https://github.com/login/device/code"
+COPILOT_ACCESS_TOKEN_URL = "https://github.com/login/oauth/access_token"
+
+# Copilot API constants
+COPILOT_TOKEN_EXCHANGE_URL = "https://api.github.com/copilot_internal/v2/token"
+COPILOT_API_BASE_URL = "https://api.githubcopilot.com"
+
 # Token type prefixes
 _CLASSIC_PAT_PREFIX = "ghp_"
 _SUPPORTED_PREFIXES = ("gho_", "github_pat_", "ghu_")
@@ -43,6 +50,11 @@ _DEVICE_CODE_POLL_INTERVAL = 5  # seconds
 _DEVICE_CODE_POLL_SAFETY_MARGIN = 3  # seconds


+def is_classic_pat(token: str) -> bool:
+    """Check if a token is a classic PAT (ghp_*), which Copilot doesn't support."""
+    return token.strip().startswith(_CLASSIC_PAT_PREFIX)
+
+
 def validate_copilot_token(token: str) -> tuple[bool, str]:
    """Validate that a token is usable with the Copilot API.

@@ -54,6 +54,32 @@ _PROVIDER_ENV_HINTS = (
 )


+from hermes_constants import is_termux as _is_termux
+
+
+def _python_install_cmd() -> str:
+    return "python -m pip install" if _is_termux() else "uv pip install"
+
+
+def _system_package_install_cmd(pkg: str) -> str:
+    if _is_termux():
+        return f"pkg install {pkg}"
+    if sys.platform == "darwin":
+        return f"brew install {pkg}"
+    return f"sudo apt install {pkg}"
+
+
+def _termux_browser_setup_steps(node_installed: bool) -> list[str]:
+    steps: list[str] = []
+    step = 1
+    if not node_installed:
+        steps.append(f"{step}) pkg install nodejs")
+        step += 1
+    steps.append(f"{step}) npm install -g agent-browser")
+    steps.append(f"{step + 1}) agent-browser install")
+    return steps
+
+
 def _has_provider_env_config(content: str) -> bool:
    """Return True when ~/.hermes/.env contains provider auth/base URL settings."""
    return any(key in content for key in _PROVIDER_ENV_HINTS)
@@ -200,7 +226,7 @@ def run_doctor(args):
            check_ok(name)
        except ImportError:
            check_fail(name, "(missing)")
-            issues.append(f"Install {name}: uv pip install {module}")
+            issues.append(f"Install {name}: {_python_install_cmd()} {module}")
    
    for module, name in optional_packages:
        try:
@@ -503,7 +529,7 @@ def run_doctor(args):
        check_ok("ripgrep (rg)", "(faster file search)")
    else:
        check_warn("ripgrep (rg) not found", "(file search uses grep fallback)")
-        check_info("Install for faster search: sudo apt install ripgrep")
+        check_info(f"Install for faster search: {_system_package_install_cmd('ripgrep')}")
    
    # Docker (optional)
    terminal_env = os.getenv("TERMINAL_ENV", "local")
@@ -526,7 +552,10 @@ def run_doctor(args):
        if shutil.which("docker"):
            check_ok("docker", "(optional)")
        else:
-            check_warn("docker not found", "(optional)")
+            if _is_termux():
+                check_info("Docker backend is not available inside Termux (expected on Android)")
+            else:
+                check_warn("docker not found", "(optional)")
    
    # SSH (if using ssh backend)
    if terminal_env == "ssh":
@@ -574,9 +603,23 @@ def run_doctor(args):
        if agent_browser_path.exists():
            check_ok("agent-browser (Node.js)", "(browser automation)")
        else:
-            check_warn("agent-browser not installed", "(run: npm install)")
+            if _is_termux():
+                check_info("agent-browser is not installed (expected in the tested Termux path)")
+                check_info("Install it manually later with: npm install -g agent-browser && agent-browser install")
+                check_info("Termux browser setup:")
+                for step in _termux_browser_setup_steps(node_installed=True):
+                    check_info(step)
+            else:
+                check_warn("agent-browser not installed", "(run: npm install)")
    else:
-        check_warn("Node.js not found", "(optional, needed for browser tools)")
+        if _is_termux():
+            check_info("Node.js not found (browser tools are optional in the tested Termux path)")
+            check_info("Install Node.js on Termux with: pkg install nodejs")
+            check_info("Termux browser setup:")
+            for step in _termux_browser_setup_steps(node_installed=False):
+                check_info(step)
+        else:
+            check_warn("Node.js not found", "(optional, needed for browser tools)")
    
    # npm audit for all Node.js packages
    if shutil.which("npm"):
@@ -739,8 +782,9 @@ def run_doctor(args):
                __import__("tinker_atropos")
                check_ok("tinker-atropos", "(RL training backend)")
            except ImportError:
-                check_warn("tinker-atropos found but not installed", "(run: uv pip install -e ./tinker-atropos)")
-                issues.append("Install tinker-atropos: uv pip install -e ./tinker-atropos")
+                install_cmd = f"{_python_install_cmd()} -e ./tinker-atropos"
+                check_warn("tinker-atropos found but not installed", f"(run: {install_cmd})")
+                issues.append(f"Install tinker-atropos: {install_cmd}")
        else:
            check_warn("tinker-atropos requires Python 3.11+", f"(current: {py_version.major}.{py_version.minor})")
    else:
@@ -32,6 +32,11 @@ def _get_git_commit(project_root: Path) -> str:
    return "(unknown)"


+def _key_present(name: str) -> str:
+    """Return 'set' or 'not set' for an env var."""
+    return "set" if os.getenv(name) else "not set"
+
+
 def _redact(value: str) -> str:
    """Redact all but first 4 and last 4 chars."""
    if not value:
@@ -39,7 +39,7 @@ def _get_service_pids() -> set:
    pids: set = set()

    # --- systemd (Linux): user and system scopes ---
-    if is_linux():
+    if supports_systemd_services():
        for scope_args in [["systemctl", "--user"], ["systemctl"]]:
            try:
                result = subprocess.run(
@@ -225,6 +225,14 @@ def stop_profile_gateway() -> bool:
 def is_linux() -> bool:
    return sys.platform.startswith('linux')

+
+from hermes_constants import is_termux
+
+
+def supports_systemd_services() -> bool:
+    return is_linux() and not is_termux()
+
+
 def is_macos() -> bool:
    return sys.platform == 'darwin'

@@ -308,6 +316,8 @@ def get_service_name() -> str:
    return f"{_SERVICE_BASE}-{suffix}"


+SERVICE_NAME = _SERVICE_BASE  # backward-compat for external importers; prefer get_service_name()
+

 def get_systemd_unit_path(system: bool = False) -> Path:
    name = get_service_name()
@@ -475,13 +485,15 @@ def install_linux_gateway_from_setup(force: bool = False) -> tuple[str | None, b


 def get_systemd_linger_status() -> tuple[bool | None, str]:
-    """Return whether systemd user lingering is enabled for the current user.
+    """Return systemd linger status for the current user.

    Returns:
        (True, "") when linger is enabled.
        (False, "") when linger is disabled.
        (None, detail) when the status could not be determined.
    """
+    if is_termux():
+        return None, "not supported in Termux"
    if not is_linux():
        return None, "not supported on this platform"

@@ -579,6 +591,17 @@ def get_python_path() -> str:
            return str(venv_python)
    return sys.executable

+def get_hermes_cli_path() -> str:
+    """Get the path to the hermes CLI."""
+    # Check if installed via pip
+    import shutil
+    hermes_bin = shutil.which("hermes")
+    if hermes_bin:
+        return hermes_bin
+    
+    # Fallback to direct module execution
+    return f"{get_python_path()} -m hermes_cli.main"
+

 # =============================================================================
 # Systemd (Linux)
@@ -753,7 +776,7 @@ def _print_linger_enable_warning(username: str, detail: str | None = None) -> No

 def _ensure_linger_enabled() -> None:
    """Enable linger when possible so the user gateway survives logout."""
-    if not is_linux():
+    if is_termux() or not is_linux():
        return

    import getpass
@@ -1788,7 +1811,7 @@ def _setup_whatsapp():

 def _is_service_installed() -> bool:
    """Check if the gateway is installed as a system service."""
-    if is_linux():
+    if supports_systemd_services():
        return get_systemd_unit_path(system=False).exists() or get_systemd_unit_path(system=True).exists()
    elif is_macos():
        return get_launchd_plist_path().exists()
@@ -1797,7 +1820,7 @@ def _is_service_installed() -> bool:

 def _is_service_running() -> bool:
    """Check if the gateway service is currently running."""
-    if is_linux():
+    if supports_systemd_services():
        user_unit_exists = get_systemd_unit_path(system=False).exists()
        system_unit_exists = get_systemd_unit_path(system=True).exists()

@@ -1970,7 +1993,7 @@ def gateway_setup():
    service_installed = _is_service_installed()
    service_running = _is_service_running()

-    if is_linux() and has_conflicting_systemd_units():
+    if supports_systemd_services() and has_conflicting_systemd_units():
        print_systemd_scope_conflict_warning()
        print()

@@ -1980,7 +2003,7 @@ def gateway_setup():
        print_warning("Gateway service is installed but not running.")
        if prompt_yes_no("  Start it now?", True):
            try:
-                if is_linux():
+                if supports_systemd_services():
                    systemd_start()
                elif is_macos():
                    launchd_start()
@@ -2031,7 +2054,7 @@ def gateway_setup():
        if service_running:
            if prompt_yes_no("  Restart the gateway to pick up changes?", True):
                try:
-                    if is_linux():
+                    if supports_systemd_services():
                        systemd_restart()
                    elif is_macos():
                        launchd_restart()
@@ -2043,7 +2066,7 @@ def gateway_setup():
        elif service_installed:
            if prompt_yes_no("  Start the gateway service?", True):
                try:
-                    if is_linux():
+                    if supports_systemd_services():
                        systemd_start()
                    elif is_macos():
                        launchd_start()
@@ -2051,13 +2074,13 @@ def gateway_setup():
                    print_error(f"  Start failed: {e}")
        else:
            print()
-            if is_linux() or is_macos():
-                platform_name = "systemd" if is_linux() else "launchd"
+            if supports_systemd_services() or is_macos():
+                platform_name = "systemd" if supports_systemd_services() else "launchd"
                if prompt_yes_no(f"  Install the gateway as a {platform_name} service? (runs in background, starts on boot)", True):
                    try:
                        installed_scope = None
                        did_install = False
-                        if is_linux():
+                        if supports_systemd_services():
                            installed_scope, did_install = install_linux_gateway_from_setup(force=False)
                        else:
                            launchd_install(force=False)
@@ -2065,7 +2088,7 @@ def gateway_setup():
                        print()
                        if did_install and prompt_yes_no("  Start the service now?", True):
                            try:
-                                if is_linux():
+                                if supports_systemd_services():
                                    systemd_start(system=installed_scope == "system")
                                else:
                                    launchd_start()
@@ -2076,12 +2099,18 @@ def gateway_setup():
                        print_info("  You can try manually: hermes gateway install")
                else:
                    print_info("  You can install later: hermes gateway install")
-                    if is_linux():
+                    if supports_systemd_services():
                        print_info("  Or as a boot-time service: sudo hermes gateway install --system")
                    print_info("  Or run in foreground:  hermes gateway")
            else:
-                print_info("  Service install not supported on this platform.")
-                print_info("  Run in foreground: hermes gateway")
+                if is_termux():
+                    from hermes_constants import display_hermes_home as _dhh
+                    print_info("  Termux does not use systemd/launchd services.")
+                    print_info("  Run in foreground: hermes gateway")
+                    print_info(f"  Or start it manually in the background (best effort): nohup hermes gateway >{_dhh()}/logs/gateway.log 2>&1 &")
+                else:
+                    print_info("  Service install not supported on this platform.")
+                    print_info("  Run in foreground: hermes gateway")
    else:
        print()
        print_info("No platforms configured. Run 'hermes gateway setup' when ready.")
@@ -2117,7 +2146,11 @@ def gateway_command(args):
        force = getattr(args, 'force', False)
        system = getattr(args, 'system', False)
        run_as_user = getattr(args, 'run_as_user', None)
-        if is_linux():
+        if is_termux():
+            print("Gateway service installation is not supported on Termux.")
+            print("Run manually: hermes gateway")
+            sys.exit(1)
+        if supports_systemd_services():
            systemd_install(force=force, system=system, run_as_user=run_as_user)
        elif is_macos():
            launchd_install(force)
@@ -2131,7 +2164,11 @@ def gateway_command(args):
            managed_error("uninstall gateway service (managed by NixOS)")
            return
        system = getattr(args, 'system', False)
-        if is_linux():
+        if is_termux():
+            print("Gateway service uninstall is not supported on Termux because there is no managed service to remove.")
+            print("Stop manual runs with: hermes gateway stop")
+            sys.exit(1)
+        if supports_systemd_services():
            systemd_uninstall(system=system)
        elif is_macos():
            launchd_uninstall()
@@ -2141,7 +2178,11 @@ def gateway_command(args):
    
    elif subcmd == "start":
        system = getattr(args, 'system', False)
-        if is_linux():
+        if is_termux():
+            print("Gateway service start is not supported on Termux because there is no system service manager.")
+            print("Run manually: hermes gateway")
+            sys.exit(1)
+        if supports_systemd_services():
            systemd_start(system=system)
        elif is_macos():
            launchd_start()
@@ -2156,7 +2197,7 @@ def gateway_command(args):
        if stop_all:
            # --all: kill every gateway process on the machine
            service_available = False
-            if is_linux() and (get_systemd_unit_path(system=False).exists() or get_systemd_unit_path(system=True).exists()):
+            if supports_systemd_services() and (get_systemd_unit_path(system=False).exists() or get_systemd_unit_path(system=True).exists()):
                try:
                    systemd_stop(system=system)
                    service_available = True
@@ -2177,7 +2218,7 @@ def gateway_command(args):
        else:
            # Default: stop only the current profile's gateway
            service_available = False
-            if is_linux() and (get_systemd_unit_path(system=False).exists() or get_systemd_unit_path(system=True).exists()):
+            if supports_systemd_services() and (get_systemd_unit_path(system=False).exists() or get_systemd_unit_path(system=True).exists()):
                try:
                    systemd_stop(system=system)
                    service_available = True
@@ -2205,7 +2246,7 @@ def gateway_command(args):
        system = getattr(args, 'system', False)
        service_configured = False
        
-        if is_linux() and (get_systemd_unit_path(system=False).exists() or get_systemd_unit_path(system=True).exists()):
+        if supports_systemd_services() and (get_systemd_unit_path(system=False).exists() or get_systemd_unit_path(system=True).exists()):
            service_configured = True
            try:
                systemd_restart(system=system)
@@ -2222,7 +2263,7 @@ def gateway_command(args):
        
        if not service_available:
            # systemd/launchd restart failed — check if linger is the issue
-            if is_linux():
+            if supports_systemd_services():
                linger_ok, _detail = get_systemd_linger_status()
                if linger_ok is not True:
                    import getpass
@@ -2259,7 +2300,7 @@ def gateway_command(args):
        system = getattr(args, 'system', False)
        
        # Check for service first
-        if is_linux() and (get_systemd_unit_path(system=False).exists() or get_systemd_unit_path(system=True).exists()):
+        if supports_systemd_services() and (get_systemd_unit_path(system=False).exists() or get_systemd_unit_path(system=True).exists()):
            systemd_status(deep, system=system)
        elif is_macos() and get_launchd_plist_path().exists():
            launchd_status(deep)
@@ -2276,9 +2317,13 @@ def gateway_command(args):
                    for line in runtime_lines:
                        print(f"  {line}")
                print()
-                print("To install as a service:")
-                print("  hermes gateway install")
-                print("  sudo hermes gateway install --system")
+                if is_termux():
+                    print("Termux note:")
+                    print("  Android may stop background jobs when Termux is suspended")
+                else:
+                    print("To install as a service:")
+                    print("  hermes gateway install")
+                    print("  sudo hermes gateway install --system")
            else:
                print("✗ Gateway is not running")
                runtime_lines = _runtime_health_lines()
@@ -2290,5 +2335,8 @@ def gateway_command(args):
                print()
                print("To start:")
                print("  hermes gateway          # Run in foreground")
-                print("  hermes gateway install  # Install as user service")
-                print("  sudo hermes gateway install --system  # Install as boot-time system service")
+                if is_termux():
+                    print("  nohup hermes gateway > ~/.hermes/logs/gateway.log 2>&1 &  # Best-effort background start")
+                else:
+                    print("  hermes gateway install  # Install as user service")
+                    print("  sudo hermes gateway install --system  # Install as boot-time system service")
@@ -646,6 +646,7 @@ def cmd_chat(args):
        "verbose": args.verbose,
        "quiet": getattr(args, "quiet", False),
        "query": args.query,
+        "image": getattr(args, "image", None),
        "resume": getattr(args, "resume", None),
        "worktree": getattr(args, "worktree", False),
        "checkpoints": getattr(args, "checkpoints", False),
@@ -3763,7 +3764,7 @@ def cmd_update(args):
        # running gateway needs restarting to pick up the new code.
        try:
            from hermes_cli.gateway import (
-                is_macos, is_linux, _ensure_user_systemd_env,
+                is_macos, supports_systemd_services, _ensure_user_systemd_env,
                find_gateway_pids,
                _get_service_pids,
            )
@@ -3774,7 +3775,7 @@ def cmd_update(args):

            # --- Systemd services (Linux) ---
            # Discover all hermes-gateway* units (default + profiles)
-            if is_linux():
+            if supports_systemd_services():
                try:
                    _ensure_user_systemd_env()
                except Exception:
@@ -4291,6 +4292,10 @@ For more help on a command:
        "-q", "--query",
        help="Single query (non-interactive mode)"
    )
+    chat_parser.add_argument(
+        "--image",
+        help="Optional local image path to attach to a single query"
+    )
    chat_parser.add_argument(
        "-m", "--model",
        help="Model to use (e.g., anthropic/claude-sonnet-4)"
@@ -332,3 +332,31 @@ def normalize_model_for_provider(model_input: str, target_provider: str) -> str:
 # Batch / convenience helpers
 # ---------------------------------------------------------------------------

+def model_display_name(model_id: str) -> str:
+    """Return a short, human-readable display name for a model id.
+
+    Strips the vendor prefix (if any) for a cleaner display in menus
+    and status bars, while preserving dots for readability.
+
+    Examples::
+
+        >>> model_display_name("anthropic/claude-sonnet-4.6")
+        'claude-sonnet-4.6'
+        >>> model_display_name("claude-sonnet-4-6")
+        'claude-sonnet-4-6'
+    """
+    return _strip_vendor_prefix((model_id or "").strip())
+
+
+def is_aggregator_provider(provider: str) -> bool:
+    """Check if a provider is an aggregator that needs vendor/model format."""
+    return (provider or "").strip().lower() in _AGGREGATOR_PROVIDERS
+
+
+def vendor_for_model(model_name: str) -> str:
+    """Return the vendor slug for a model, or ``""`` if unknown.
+
+    Convenience wrapper around :func:`detect_vendor` that never returns
+    ``None``.
+    """
+    return detect_vendor(model_name) or ""
@@ -859,3 +859,74 @@ def list_authenticated_providers(
    return results


+# ---------------------------------------------------------------------------
+# Fuzzy suggestions
+# ---------------------------------------------------------------------------
+
+def suggest_models(raw_input: str, limit: int = 3) -> List[str]:
+    """Return fuzzy model suggestions for a (possibly misspelled) input."""
+    query = raw_input.strip()
+    if not query:
+        return []
+
+    results = search_models_dev(query, limit=limit)
+    suggestions: list[str] = []
+    for r in results:
+        mid = r.get("model_id", "")
+        if mid:
+            suggestions.append(mid)
+
+    return suggestions[:limit]
+
+
+# ---------------------------------------------------------------------------
+# Custom provider switch
+# ---------------------------------------------------------------------------
+
+def switch_to_custom_provider() -> CustomAutoResult:
+    """Handle bare '/model --provider custom' — resolve endpoint and auto-detect model."""
+    from hermes_cli.runtime_provider import (
+        resolve_runtime_provider,
+        _auto_detect_local_model,
+    )
+
+    try:
+        runtime = resolve_runtime_provider(requested="custom")
+    except Exception as e:
+        return CustomAutoResult(
+            success=False,
+            error_message=f"Could not resolve custom endpoint: {e}",
+        )
+
+    cust_base = runtime.get("base_url", "")
+    cust_key = runtime.get("api_key", "")
+
+    if not cust_base or "openrouter.ai" in cust_base:
+        return CustomAutoResult(
+            success=False,
+            error_message=(
+                "No custom endpoint configured. "
+                "Set model.base_url in config.yaml, or set OPENAI_BASE_URL "
+                "in .env, or run: hermes setup -> Custom OpenAI-compatible endpoint"
+            ),
+        )
+
+    detected_model = _auto_detect_local_model(cust_base)
+    if not detected_model:
+        return CustomAutoResult(
+            success=False,
+            base_url=cust_base,
+            api_key=cust_key,
+            error_message=(
+                f"Custom endpoint at {cust_base} is reachable but no single "
+                f"model was auto-detected. Specify the model explicitly: "
+                f"/model <model-name> --provider custom"
+            ),
+        )
+
+    return CustomAutoResult(
+        success=True,
+        model=detected_model,
+        base_url=cust_base,
+        api_key=cust_key,
+    )
@@ -20,6 +20,10 @@ COPILOT_EDITOR_VERSION = "vscode/1.104.1"
 COPILOT_REASONING_EFFORTS_GPT5 = ["minimal", "low", "medium", "high"]
 COPILOT_REASONING_EFFORTS_O_SERIES = ["low", "medium", "high"]

+# Backward-compatible aliases for the earlier GitHub Models-backed Copilot work.
+GITHUB_MODELS_BASE_URL = COPILOT_BASE_URL
+GITHUB_MODELS_CATALOG_URL = COPILOT_MODELS_URL
+
 # (model_id, display description shown in menus)
 OPENROUTER_MODELS: list[tuple[str, str]] = [
    ("anthropic/claude-opus-4.6",       "recommended"),
@@ -412,6 +416,12 @@ _FREE_TIER_CACHE_TTL: int = 180  # seconds (3 minutes)
 _free_tier_cache: tuple[bool, float] | None = None  # (result, timestamp)


+def clear_nous_free_tier_cache() -> None:
+    """Invalidate the cached free-tier result (e.g. after login/logout)."""
+    global _free_tier_cache
+    _free_tier_cache = None
+
+
 def check_nous_free_tier() -> bool:
    """Check if the current Nous Portal user is on a free (unpaid) tier.

@@ -525,6 +535,14 @@ def model_ids() -> list[str]:
    return [mid for mid, _ in OPENROUTER_MODELS]


+def menu_labels() -> list[str]:
+    """Return display labels like 'anthropic/claude-opus-4.6 (recommended)'."""
+    labels = []
+    for mid, desc in OPENROUTER_MODELS:
+        labels.append(f"{mid} ({desc})" if desc else mid)
+    return labels
+
+
 # ---------------------------------------------------------------------------
 # Pricing helpers — fetch live pricing from OpenRouter-compatible /v1/models
 # ---------------------------------------------------------------------------
@@ -557,6 +575,31 @@ def _format_price_per_mtok(per_token_str: str) -> str:
    return f"${per_m:.2f}"


+def format_pricing_label(pricing: dict[str, str] | None) -> str:
+    """Build a compact pricing label like 'in $3 · out $15 · cache $0.30/Mtok'.
+
+    Returns empty string when pricing is unavailable.
+    """
+    if not pricing:
+        return ""
+    prompt_price = pricing.get("prompt", "")
+    completion_price = pricing.get("completion", "")
+    if not prompt_price and not completion_price:
+        return ""
+    inp = _format_price_per_mtok(prompt_price)
+    out = _format_price_per_mtok(completion_price)
+    if inp == "free" and out == "free":
+        return "free"
+    cache_read = pricing.get("input_cache_read", "")
+    cache_str = _format_price_per_mtok(cache_read) if cache_read else ""
+    if inp == out and not cache_str:
+        return f"{inp}/Mtok"
+    parts = [f"in {inp}", f"out {out}"]
+    if cache_str and cache_str != "?" and cache_str != inp:
+        parts.append(f"cache {cache_str}")
+    return " · ".join(parts) + "/Mtok"
+
+
 def format_model_pricing_table(
    models: list[tuple[str, str]],
    pricing_map: dict[str, dict[str, str]],
@@ -974,6 +1017,47 @@ def provider_label(provider: Optional[str]) -> str:
    return _PROVIDER_LABELS.get(normalized, original or "OpenRouter")


+# Models that support OpenAI Priority Processing (service_tier="priority").
+# See https://openai.com/api-priority-processing/ for the canonical list.
+# Only the bare model slug is stored (no vendor prefix).
+_PRIORITY_PROCESSING_MODELS: frozenset[str] = frozenset({
+    "gpt-5.4",
+    "gpt-5.4-mini",
+    "gpt-5.2",
+    "gpt-5.1",
+    "gpt-5",
+    "gpt-5-mini",
+    "gpt-4.1",
+    "gpt-4.1-mini",
+    "gpt-4.1-nano",
+    "gpt-4o",
+    "gpt-4o-mini",
+    "o3",
+    "o4-mini",
+})
+
+
+def model_supports_fast_mode(model_id: Optional[str]) -> bool:
+    """Return whether Hermes should expose the /fast (Priority Processing) toggle."""
+    raw = str(model_id or "").strip().lower()
+    if "/" in raw:
+        raw = raw.split("/", 1)[1]
+    return raw in _PRIORITY_PROCESSING_MODELS
+
+
+def resolve_fast_mode_overrides(model_id: Optional[str]) -> dict[str, Any] | None:
+    """Return request_overrides for Priority Processing, or None if unsupported.
+
+    Unlike the previous ``resolve_fast_mode_runtime``, this does NOT force a
+    provider/backend switch.  The ``service_tier`` parameter is injected into
+    whatever API path the user is already on (Codex Responses, Chat Completions,
+    or OpenRouter passthrough).
+    """
+    if not model_supports_fast_mode(model_id):
+        return None
+    return {"service_tier": "priority"}
+
+
 def _resolve_copilot_catalog_api_key() -> str:
    """Best-effort GitHub token for fetching the Copilot model catalog."""
    try:
@@ -148,6 +148,10 @@ class ProviderDef:
    doc: str = ""
    source: str = ""                      # "models.dev", "hermes", "user-config"

+    @property
+    def is_user_defined(self) -> bool:
+        return self.source == "user-config"
+

 # -- Aliases ------------------------------------------------------------------
 # Maps human-friendly / legacy names to canonical provider IDs.
@@ -258,6 +262,12 @@ def normalize_provider(name: str) -> str:
    return ALIASES.get(key, key)


+def get_overlay(provider_id: str) -> Optional[HermesOverlay]:
+    """Get Hermes overlay for a provider, if one exists."""
+    canonical = normalize_provider(provider_id)
+    return HERMES_OVERLAYS.get(canonical)
+
+
 def get_provider(name: str) -> Optional[ProviderDef]:
    """Look up a provider by id or alias, merging all data sources.

@@ -340,6 +350,37 @@ def get_label(provider_id: str) -> str:
    return canonical


+# For direct import compat, expose as module-level dict
+# Built on demand by get_label() calls
+LABELS: Dict[str, str] = {
+    # Static entries for backward compat — get_label() is the proper API
+    "openrouter": "OpenRouter",
+    "nous": "Nous Portal",
+    "openai-codex": "OpenAI Codex",
+    "copilot-acp": "GitHub Copilot ACP",
+    "github-copilot": "GitHub Copilot",
+    "anthropic": "Anthropic",
+    "zai": "Z.AI / GLM",
+    "kimi-for-coding": "Kimi / Moonshot",
+    "minimax": "MiniMax",
+    "minimax-cn": "MiniMax (China)",
+    "deepseek": "DeepSeek",
+    "alibaba": "Alibaba Cloud (DashScope)",
+    "vercel": "Vercel AI Gateway",
+    "opencode": "OpenCode Zen",
+    "opencode-go": "OpenCode Go",
+    "kilo": "Kilo Gateway",
+    "huggingface": "Hugging Face",
+    "local": "Local endpoint",
+    "custom": "Custom endpoint",
+    # Legacy Hermes IDs (point to same providers)
+    "ai-gateway": "Vercel AI Gateway",
+    "kilocode": "Kilo Gateway",
+    "copilot": "GitHub Copilot",
+    "kimi-coding": "Kimi / Moonshot",
+    "opencode-zen": "OpenCode Zen",
+}
+

 def is_aggregator(provider: str) -> bool:
    """Return True when the provider is a multi-model aggregator."""
@@ -16,6 +16,7 @@ from hermes_cli.auth import (
    DEFAULT_CODEX_BASE_URL,
    DEFAULT_QWEN_BASE_URL,
    PROVIDER_REGISTRY,
+    _agent_key_is_usable,
    format_auth_error,
    resolve_provider,
    resolve_nous_runtime_credentials,
@@ -644,6 +645,21 @@ def resolve_runtime_provider(
                getattr(entry, "runtime_api_key", None)
                or getattr(entry, "access_token", "")
            )
+        # For Nous, the pool entry's runtime_api_key is the agent_key — a
+        # short-lived inference credential (~30 min TTL).  The pool doesn't
+        # refresh it during selection (that would trigger network calls in
+        # non-runtime contexts like `hermes auth list`).  If the key is
+        # expired, clear pool_api_key so we fall through to
+        # resolve_nous_runtime_credentials() which handles refresh + mint.
+        if provider == "nous" and entry is not None and pool_api_key:
+            min_ttl = max(60, int(os.getenv("HERMES_NOUS_MIN_KEY_TTL_SECONDS", "1800")))
+            nous_state = {
+                "agent_key": getattr(entry, "agent_key", None),
+                "agent_key_expires_at": getattr(entry, "agent_key_expires_at", None),
+            }
+            if not _agent_key_is_usable(nous_state, min_ttl):
+                logger.debug("Nous pool entry agent_key expired/missing, falling through to runtime resolution")
+                pool_api_key = ""
        if entry is not None and pool_api_key:
            return _resolve_runtime_from_pool_entry(
                provider=provider,
@@ -172,6 +172,147 @@ def _setup_copilot_reasoning_selection(
        _set_reasoning_effort(config, "none")


+def _setup_provider_model_selection(config, provider_id, current_model, prompt_choice, prompt_fn):
+    """Model selection for API-key providers with live /models detection.
+
+    Tries the provider's /models endpoint first.  Falls back to a
+    hardcoded default list with a warning if the endpoint is unreachable.
+    Always offers a 'Custom model' escape hatch.
+    """
+    from hermes_cli.auth import PROVIDER_REGISTRY, resolve_api_key_provider_credentials
+    from hermes_cli.config import get_env_value
+    from hermes_cli.models import (
+        copilot_model_api_mode,
+        fetch_api_models,
+        fetch_github_model_catalog,
+        normalize_copilot_model_id,
+        normalize_opencode_model_id,
+        opencode_model_api_mode,
+    )
+
+    pconfig = PROVIDER_REGISTRY[provider_id]
+    is_copilot_catalog_provider = provider_id in {"copilot", "copilot-acp"}
+
+    # Resolve API key and base URL for the probe
+    if is_copilot_catalog_provider:
+        api_key = ""
+        if provider_id == "copilot":
+            creds = resolve_api_key_provider_credentials(provider_id)
+            api_key = creds.get("api_key", "")
+            base_url = creds.get("base_url", "") or pconfig.inference_base_url
+        else:
+            try:
+                creds = resolve_api_key_provider_credentials("copilot")
+                api_key = creds.get("api_key", "")
+            except Exception:
+                pass
+            base_url = pconfig.inference_base_url
+        catalog = fetch_github_model_catalog(api_key)
+        current_model = normalize_copilot_model_id(
+            current_model,
+            catalog=catalog,
+            api_key=api_key,
+        ) or current_model
+    else:
+        api_key = ""
+        for ev in pconfig.api_key_env_vars:
+            api_key = get_env_value(ev) or os.getenv(ev, "")
+            if api_key:
+                break
+        base_url_env = pconfig.base_url_env_var or ""
+        base_url = (get_env_value(base_url_env) if base_url_env else "") or pconfig.inference_base_url
+        catalog = None
+
+    # Try live /models endpoint
+    if is_copilot_catalog_provider and catalog:
+        live_models = [item.get("id", "") for item in catalog if item.get("id")]
+    else:
+        live_models = fetch_api_models(api_key, base_url)
+
+    if live_models:
+        provider_models = live_models
+        print_info(f"Found {len(live_models)} model(s) from {pconfig.name} API")
+    else:
+        fallback_provider_id = "copilot" if provider_id == "copilot-acp" else provider_id
+        provider_models = _DEFAULT_PROVIDER_MODELS.get(fallback_provider_id, [])
+        if provider_models:
+            print_warning(
+                f"Could not auto-detect models from {pconfig.name} API — showing defaults.\n"
+                f"    Use \"Custom model\" if the model you expect isn't listed."
+            )
+
+    if provider_id in {"opencode-zen", "opencode-go"}:
+        provider_models = [normalize_opencode_model_id(provider_id, mid) for mid in provider_models]
+        current_model = normalize_opencode_model_id(provider_id, current_model)
+        provider_models = list(dict.fromkeys(mid for mid in provider_models if mid))
+
+    model_choices = list(provider_models)
+    model_choices.append("Custom model")
+    model_choices.append(f"Keep current ({current_model})")
+
+    keep_idx = len(model_choices) - 1
+    model_idx = prompt_choice("Select default model:", model_choices, keep_idx)
+
+    selected_model = current_model
+
+    if model_idx < len(provider_models):
+        selected_model = provider_models[model_idx]
+        if is_copilot_catalog_provider:
+            selected_model = normalize_copilot_model_id(
+                selected_model,
+                catalog=catalog,
+                api_key=api_key,
+            ) or selected_model
+        elif provider_id in {"opencode-zen", "opencode-go"}:
+            selected_model = normalize_opencode_model_id(provider_id, selected_model)
+        _set_default_model(config, selected_model)
+    elif model_idx == len(provider_models):
+        custom = prompt_fn("Enter model name")
+        if custom:
+            if is_copilot_catalog_provider:
+                selected_model = normalize_copilot_model_id(
+                    custom,
+                    catalog=catalog,
+                    api_key=api_key,
+                ) or custom
+            elif provider_id in {"opencode-zen", "opencode-go"}:
+                selected_model = normalize_opencode_model_id(provider_id, custom)
+            else:
+                selected_model = custom
+            _set_default_model(config, selected_model)
+    else:
+        # "Keep current" selected — validate it's compatible with the new
+        # provider.  OpenRouter-formatted names (containing "/") won't work
+        # on direct-API providers and would silently break the gateway.
+        if "/" in (current_model or "") and provider_models:
+            print_warning(
+                f"Current model \"{current_model}\" looks like an OpenRouter model "
+                f"and won't work with {pconfig.name}. "
+                f"Switching to {provider_models[0]}."
+            )
+            selected_model = provider_models[0]
+            _set_default_model(config, provider_models[0])
+
+    if provider_id == "copilot" and selected_model:
+        model_cfg = _model_config_dict(config)
+        model_cfg["api_mode"] = copilot_model_api_mode(
+            selected_model,
+            catalog=catalog,
+            api_key=api_key,
+        )
+        config["model"] = model_cfg
+        _setup_copilot_reasoning_selection(
+            config,
+            selected_model,
+            prompt_choice,
+            catalog=catalog,
+            api_key=api_key,
+        )
+    elif provider_id in {"opencode-zen", "opencode-go"} and selected_model:
+        model_cfg = _model_config_dict(config)
+        model_cfg["api_mode"] = opencode_model_api_mode(provider_id, selected_model)
+        config["model"] = model_cfg
+

 # Import config helpers
 from hermes_cli.config import (
@@ -79,6 +79,9 @@ def _effective_provider_label() -> str:
    return provider_label(effective)


+from hermes_constants import is_termux as _is_termux
+
+
 def show_status(args):
    """Show status of all Hermes Agent components."""
    show_all = getattr(args, 'all', False)
@@ -325,7 +328,25 @@ def show_status(args):
    print()
    print(color("◆ Gateway Service", Colors.CYAN, Colors.BOLD))
    
-    if sys.platform.startswith('linux'):
+    if _is_termux():
+        try:
+            from hermes_cli.gateway import find_gateway_pids
+            gateway_pids = find_gateway_pids()
+        except Exception:
+            gateway_pids = []
+        is_running = bool(gateway_pids)
+        print(f"  Status:       {check_mark(is_running)} {'running' if is_running else 'stopped'}")
+        print("  Manager:      Termux / manual process")
+        if gateway_pids:
+            rendered = ", ".join(str(pid) for pid in gateway_pids[:3])
+            if len(gateway_pids) > 3:
+                rendered += ", ..."
+            print(f"  PID(s):       {rendered}")
+        else:
+            print("  Start with:   hermes gateway")
+            print("  Note:         Android may stop background jobs when Termux is suspended")
+
+    elif sys.platform.startswith('linux'):
        try:
            from hermes_cli.gateway import get_service_name
            _gw_svc = get_service_name()
@@ -339,7 +360,7 @@ def show_status(args):
                timeout=5
            )
            is_active = result.stdout.strip() == "active"
-        except subprocess.TimeoutExpired:
+        except (FileNotFoundError, subprocess.TimeoutExpired):
            is_active = False
        print(f"  Status:       {check_mark(is_active)} {'running' if is_active else 'stopped'}")
        print("  Manager:      systemd (user)")
@@ -122,6 +122,10 @@ def uninstall_gateway_service():
    
    if platform.system() != "Linux":
        return False
+
+    prefix = os.getenv("PREFIX", "")
+    if os.getenv("TERMUX_VERSION") or "com.termux/files/usr" in prefix:
+        return False
    
    try:
        from hermes_cli.gateway import get_service_name
@@ -93,9 +93,23 @@ def parse_reasoning_effort(effort: str) -> dict | None:
    return None


+def is_termux() -> bool:
+    """Return True when running inside a Termux (Android) environment.
+
+    Checks ``TERMUX_VERSION`` (set by Termux) or the Termux-specific
+    ``PREFIX`` path.  Import-safe — no heavy deps.
+    """
+    prefix = os.getenv("PREFIX", "")
+    return bool(os.getenv("TERMUX_VERSION") or "com.termux/files/usr" in prefix)
+
+
 OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"
 OPENROUTER_MODELS_URL = f"{OPENROUTER_BASE_URL}/models"
+OPENROUTER_CHAT_URL = f"{OPENROUTER_BASE_URL}/chat/completions"

 AI_GATEWAY_BASE_URL = "https://ai-gateway.vercel.sh/v1"
+AI_GATEWAY_MODELS_URL = f"{AI_GATEWAY_BASE_URL}/models"
+AI_GATEWAY_CHAT_URL = f"{AI_GATEWAY_BASE_URL}/chat/completions"

 NOUS_API_BASE_URL = "https://inference-api.nousresearch.com/v1"
+NOUS_API_CHAT_URL = f"{NOUS_API_BASE_URL}/chat/completions"
@@ -520,6 +520,72 @@ class SessionDB:
            )
        self._execute_write(_do)

+    def set_token_counts(
+        self,
+        session_id: str,
+        input_tokens: int = 0,
+        output_tokens: int = 0,
+        model: str = None,
+        cache_read_tokens: int = 0,
+        cache_write_tokens: int = 0,
+        reasoning_tokens: int = 0,
+        estimated_cost_usd: Optional[float] = None,
+        actual_cost_usd: Optional[float] = None,
+        cost_status: Optional[str] = None,
+        cost_source: Optional[str] = None,
+        pricing_version: Optional[str] = None,
+        billing_provider: Optional[str] = None,
+        billing_base_url: Optional[str] = None,
+        billing_mode: Optional[str] = None,
+    ) -> None:
+        """Set token counters to absolute values (not increment).
+
+        Use this when the caller provides cumulative totals from a completed
+        conversation run (e.g. the gateway, where the cached agent's
+        session_prompt_tokens already reflects the running total).
+        """
+        def _do(conn):
+            conn.execute(
+                """UPDATE sessions SET
+                   input_tokens = ?,
+                   output_tokens = ?,
+                   cache_read_tokens = ?,
+                   cache_write_tokens = ?,
+                   reasoning_tokens = ?,
+                   estimated_cost_usd = ?,
+                   actual_cost_usd = CASE
+                       WHEN ? IS NULL THEN actual_cost_usd
+                       ELSE ?
+                   END,
+                   cost_status = COALESCE(?, cost_status),
+                   cost_source = COALESCE(?, cost_source),
+                   pricing_version = COALESCE(?, pricing_version),
+                   billing_provider = COALESCE(billing_provider, ?),
+                   billing_base_url = COALESCE(billing_base_url, ?),
+                   billing_mode = COALESCE(billing_mode, ?),
+                   model = COALESCE(model, ?)
+                   WHERE id = ?""",
+                (
+                    input_tokens,
+                    output_tokens,
+                    cache_read_tokens,
+                    cache_write_tokens,
+                    reasoning_tokens,
+                    estimated_cost_usd,
+                    actual_cost_usd,
+                    actual_cost_usd,
+                    cost_status,
+                    cost_source,
+                    pricing_version,
+                    billing_provider,
+                    billing_base_url,
+                    billing_mode,
+                    model,
+                    session_id,
+                ),
+            )
+        self._execute_write(_do)
+
    def get_session(self, session_id: str) -> Optional[Dict[str, Any]]:
        """Get a session by ID."""
        with self._lock:
@@ -89,6 +89,13 @@ def get_timezone() -> Optional[ZoneInfo]:
    return _cached_tz


+def get_timezone_name() -> str:
+    """Return the IANA name of the configured timezone, or empty string."""
+    if not _cache_resolved:
+        get_timezone()  # populates cache
+    return _cached_tz_name or ""
+
+
 def now() -> datetime:
    """
    Return the current time as a timezone-aware datetime.
@@ -103,3 +110,9 @@ def now() -> datetime:
    return datetime.now().astimezone()


+def reset_cache() -> None:
+    """Clear the cached timezone. Used by tests and after config changes."""
+    global _cached_tz, _cached_tz_name, _cache_resolved
+    _cached_tz = None
+    _cached_tz_name = None
+    _cache_resolved = False
@@ -63,6 +63,17 @@ homeassistant = ["aiohttp>=3.9.0,<4"]
 sms = ["aiohttp>=3.9.0,<4"]
 acp = ["agent-client-protocol>=0.9.0,<1.0"]
 mistral = ["mistralai>=2.3.0,<3"]
+termux = [
+  # Tested Android / Termux path: keeps the core CLI feature-rich while
+  # avoiding extras that currently depend on non-Android wheels (notably
+  # faster-whisper -> ctranslate2 via the voice extra).
+  "hermes-agent[cron]",
+  "hermes-agent[cli]",
+  "hermes-agent[pty]",
+  "hermes-agent[mcp]",
+  "hermes-agent[honcho]",
+  "hermes-agent[acp]",
+]
 dingtalk = ["dingtalk-stream>=0.1.0,<1"]
 feishu = ["lark-oapi>=1.5.3,<2"]
 rl = [
@@ -500,6 +500,8 @@ class AIAgent:
        status_callback: callable = None,
        max_tokens: int = None,
        reasoning_config: Dict[str, Any] = None,
+        service_tier: str = None,
+        request_overrides: Dict[str, Any] = None,
        prefill_messages: List[Dict[str, Any]] = None,
        platform: str = None,
        user_id: str = None,
@@ -622,8 +624,10 @@ class AIAgent:
        self.tool_progress_callback = tool_progress_callback
        self.tool_start_callback = tool_start_callback
        self.tool_complete_callback = tool_complete_callback
+        self.suppress_status_output = False
        self.thinking_callback = thinking_callback
        self.reasoning_callback = reasoning_callback
+        self._reasoning_deltas_fired = False  # Set by _fire_reasoning_delta, reset per API call
        self.clarify_callback = clarify_callback
        self.step_callback = step_callback
        self.stream_delta_callback = stream_delta_callback
@@ -660,6 +664,8 @@ class AIAgent:
        # Model response configuration
        self.max_tokens = max_tokens  # None = use model default
        self.reasoning_config = reasoning_config  # None = use default (medium for OpenRouter)
+        self.service_tier = service_tier
+        self.request_overrides = dict(request_overrides or {})
        self.prefill_messages = prefill_messages or []  # Prefilled conversation turns
        
        # Anthropic prompt caching: auto-enabled for Claude models via OpenRouter.
@@ -1298,6 +1304,7 @@ class AIAgent:
        if hasattr(self, "context_compressor") and self.context_compressor:
            self.context_compressor.last_prompt_tokens = 0
            self.context_compressor.last_completion_tokens = 0
+            self.context_compressor.last_total_tokens = 0
            self.context_compressor.compression_count = 0
            self.context_compressor._context_probed = False
            self.context_compressor._context_probe_persistable = False
@@ -1458,7 +1465,14 @@ class AIAgent:
        After the main response has been delivered and the remaining tool
        calls are post-response housekeeping (``_mute_post_response``),
        all non-forced output is suppressed.
+
+        ``suppress_status_output`` is a stricter CLI automation mode used by
+        parseable single-query flows such as ``hermes chat -q``. In that mode,
+        all status/diagnostic prints routed through ``_vprint`` are suppressed
+        so stdout stays machine-readable.
        """
+        if getattr(self, "suppress_status_output", False):
+            return
        if not force and getattr(self, "_mute_post_response", False):
            return
        if not force and self._has_stream_consumers() and not self._executing_tools:
@@ -1484,6 +1498,17 @@ class AIAgent:
        except (AttributeError, ValueError, OSError):
            return False

+    def _should_emit_quiet_tool_messages(self) -> bool:
+        """Return True when quiet-mode tool summaries should print directly.
+
+        When the caller provides ``tool_progress_callback`` (for example the CLI
+        TUI or a gateway progress renderer), that callback owns progress display.
+        Emitting quiet-mode summary lines here duplicates progress and leaks tool
+        previews into flows that are expected to stay silent, such as
+        ``hermes chat -q``.
+        """
+        return self.quiet_mode and not self.tool_progress_callback
+
    def _emit_status(self, message: str) -> None:
        """Emit a lifecycle status message to both CLI and gateway channels.

@@ -3322,7 +3347,7 @@ class AIAgent:
        allowed_keys = {
            "model", "instructions", "input", "tools", "store",
            "reasoning", "include", "max_output_tokens", "temperature",
-            "tool_choice", "parallel_tool_calls", "prompt_cache_key",
+            "tool_choice", "parallel_tool_calls", "prompt_cache_key", "service_tier",
        }
        normalized: Dict[str, Any] = {
            "model": model,
@@ -3340,6 +3365,9 @@ class AIAgent:
        include = api_kwargs.get("include")
        if isinstance(include, list):
            normalized["include"] = include
+        service_tier = api_kwargs.get("service_tier")
+        if isinstance(service_tier, str) and service_tier.strip():
+            normalized["service_tier"] = service_tier.strip()

        # Pass through max_output_tokens and temperature
        max_output_tokens = api_kwargs.get("max_output_tokens")
@@ -3847,6 +3875,7 @@ class AIAgent:
        max_stream_retries = 1
        has_tool_calls = False
        first_delta_fired = False
+        self._reasoning_deltas_fired = False
        # Accumulate streamed text so we can recover if get_final_response()
        # returns empty output (e.g. chatgpt.com backend-api sends
        # response.incomplete instead of response.completed).
@@ -4324,6 +4353,7 @@ class AIAgent:

    def _fire_reasoning_delta(self, text: str) -> None:
        """Fire reasoning callback if registered."""
+        self._reasoning_deltas_fired = True
        cb = self.reasoning_callback
        if cb is not None:
            try:
@@ -4403,7 +4433,17 @@ class AIAgent:
            """Stream a chat completions response."""
            import httpx as _httpx
            _base_timeout = float(os.getenv("HERMES_API_TIMEOUT", 1800.0))
-            _stream_read_timeout = float(os.getenv("HERMES_STREAM_READ_TIMEOUT", 60.0))
+            _stream_read_timeout = float(os.getenv("HERMES_STREAM_READ_TIMEOUT", 120.0))
+            # Local providers (Ollama, llama.cpp, vLLM) can take minutes for
+            # prefill on large contexts before producing the first token.
+            # Auto-increase the httpx read timeout unless the user explicitly
+            # overrode HERMES_STREAM_READ_TIMEOUT.
+            if _stream_read_timeout == 120.0 and self.base_url and is_local_endpoint(self.base_url):
+                _stream_read_timeout = _base_timeout
+                logger.debug(
+                    "Local provider detected (%s) — stream read timeout raised to %.0fs",
+                    self.base_url, _stream_read_timeout,
+                )
            stream_kwargs = {
                **api_kwargs,
                "stream": True,
@@ -4443,6 +4483,10 @@ class AIAgent:
            role = "assistant"
            reasoning_parts: list = []
            usage_obj = None
+            # Reset per-call reasoning tracking so _build_assistant_message
+            # knows whether reasoning was already displayed during streaming.
+            self._reasoning_deltas_fired = False
+
            _first_chunk_seen = False
            for chunk in stream:
                last_chunk_time["t"] = time.time()
@@ -4557,20 +4601,31 @@ class AIAgent:
            # Build mock response matching non-streaming shape
            full_content = "".join(content_parts) or None
            mock_tool_calls = None
+            has_truncated_tool_args = False
            if tool_calls_acc:
                mock_tool_calls = []
                for idx in sorted(tool_calls_acc):
                    tc = tool_calls_acc[idx]
+                    arguments = tc["function"]["arguments"]
+                    if arguments and arguments.strip():
+                        try:
+                            json.loads(arguments)
+                        except json.JSONDecodeError:
+                            has_truncated_tool_args = True
                    mock_tool_calls.append(SimpleNamespace(
                        id=tc["id"],
                        type=tc["type"],
                        extra_content=tc.get("extra_content"),
                        function=SimpleNamespace(
                            name=tc["function"]["name"],
-                            arguments=tc["function"]["arguments"],
+                            arguments=arguments,
                        ),
                    ))

+            effective_finish_reason = finish_reason or "stop"
+            if has_truncated_tool_args:
+                effective_finish_reason = "length"
+
            full_reasoning = "".join(reasoning_parts) or None
            mock_message = SimpleNamespace(
                role=role,
@@ -4581,7 +4636,7 @@ class AIAgent:
            mock_choice = SimpleNamespace(
                index=0,
                message=mock_message,
-                finish_reason=finish_reason or "stop",
+                finish_reason=effective_finish_reason,
            )
            return SimpleNamespace(
                id="stream-" + str(uuid.uuid4()),
@@ -4599,6 +4654,7 @@ class AIAgent:
            works unchanged.
            """
            has_tool_use = False
+            self._reasoning_deltas_fired = False

            # Reset stale-stream timer for this attempt
            last_chunk_time["t"] = time.time()
@@ -5425,6 +5481,10 @@ class AIAgent:
                "models.github.ai" in self.base_url.lower()
                or "api.githubcopilot.com" in self.base_url.lower()
            )
+            is_codex_backend = (
+                self.provider == "openai-codex"
+                or "chatgpt.com/backend-api/codex" in self.base_url.lower()
+            )

            # Resolve reasoning effort: config > default (medium)
            reasoning_effort = "medium"
@@ -5462,7 +5522,10 @@ class AIAgent:
            elif not is_github_responses:
                kwargs["include"] = []

-            if self.max_tokens is not None:
+            if self.request_overrides:
+                kwargs.update(self.request_overrides)
+
+            if self.max_tokens is not None and not is_codex_backend:
                kwargs["max_output_tokens"] = self.max_tokens

            return kwargs
@@ -5557,20 +5620,20 @@ class AIAgent:
        if self.max_tokens is not None:
            if not self._is_qwen_portal():
                api_kwargs.update(self._max_tokens_param(self.max_tokens))
-        elif self._is_openrouter_url() and "claude" in (self.model or "").lower():
-            # OpenRouter translates requests to Anthropic's Messages API,
-            # which requires max_tokens as a mandatory field.  When we omit
-            # it, OpenRouter picks a default that can be too low — the model
-            # spends its output budget on thinking and has almost nothing
-            # left for the actual response (especially large tool calls like
-            # write_file).  Sending the model's real output limit ensures
-            # full capacity.  Other providers handle the default fine.
+        elif (self._is_openrouter_url() or "nousresearch" in self._base_url_lower) and "claude" in (self.model or "").lower():
+            # OpenRouter and Nous Portal translate requests to Anthropic's
+            # Messages API, which requires max_tokens as a mandatory field.
+            # When we omit it, the proxy picks a default that can be too
+            # low — the model spends its output budget on thinking and has
+            # almost nothing left for the actual response (especially large
+            # tool calls like write_file).  Sending the model's real output
+            # limit ensures full capacity.
            try:
                from agent.anthropic_adapter import _get_anthropic_max_output
                _model_output_limit = _get_anthropic_max_output(self.model)
                api_kwargs["max_tokens"] = _model_output_limit
            except Exception:
-                pass  # fail open — let OpenRouter pick its default
+                pass  # fail open — let the proxy pick its default

        extra_body = {}

@@ -5633,6 +5696,11 @@ class AIAgent:
        if "x.ai" in self._base_url_lower and hasattr(self, "session_id") and self.session_id:
            api_kwargs["extra_headers"] = {"x-grok-conv-id": self.session_id}

+        # Priority Processing / generic request overrides (e.g. service_tier).
+        # Applied last so overrides win over any defaults set above.
+        if self.request_overrides:
+            api_kwargs.update(self.request_overrides)
+
        return api_kwargs

    def _supports_reasoning_extra_body(self) -> bool:
@@ -6338,7 +6406,7 @@ class AIAgent:

        # Start spinner for CLI mode (skip when TUI handles tool progress)
        spinner = None
-        if self.quiet_mode and not self.tool_progress_callback and self._should_start_quiet_spinner():
+        if self._should_emit_quiet_tool_messages() and self._should_start_quiet_spinner():
            face = random.choice(KawaiiSpinner.KAWAII_WAITING)
            spinner = KawaiiSpinner(f"{face} ⚡ running {num_tools} tools concurrently", spinner_type='dots', print_fn=self._print_fn)
            spinner.start()
@@ -6388,7 +6456,7 @@ class AIAgent:
                    logging.debug(f"Tool result ({len(function_result)} chars): {function_result}")

            # Print cute message per tool
-            if self.quiet_mode:
+            if self._should_emit_quiet_tool_messages():
                cute_msg = _get_cute_tool_message_impl(name, args, tool_duration, result=function_result)
                self._safe_print(f"  {cute_msg}")
            elif not self.quiet_mode:
@@ -6545,7 +6613,7 @@ class AIAgent:
                    store=self._todo_store,
                )
                tool_duration = time.time() - tool_start_time
-                if self.quiet_mode:
+                if self._should_emit_quiet_tool_messages():
                    self._vprint(f"  {_get_cute_tool_message_impl('todo', function_args, tool_duration, result=function_result)}")
            elif function_name == "session_search":
                if not self._session_db:
@@ -6560,7 +6628,7 @@ class AIAgent:
                        current_session_id=self.session_id,
                    )
                tool_duration = time.time() - tool_start_time
-                if self.quiet_mode:
+                if self._should_emit_quiet_tool_messages():
                    self._vprint(f"  {_get_cute_tool_message_impl('session_search', function_args, tool_duration, result=function_result)}")
            elif function_name == "memory":
                target = function_args.get("target", "memory")
@@ -6573,7 +6641,7 @@ class AIAgent:
                    store=self._memory_store,
                )
                tool_duration = time.time() - tool_start_time
-                if self.quiet_mode:
+                if self._should_emit_quiet_tool_messages():
                    self._vprint(f"  {_get_cute_tool_message_impl('memory', function_args, tool_duration, result=function_result)}")
            elif function_name == "clarify":
                from tools.clarify_tool import clarify_tool as _clarify_tool
@@ -6583,7 +6651,7 @@ class AIAgent:
                    callback=self.clarify_callback,
                )
                tool_duration = time.time() - tool_start_time
-                if self.quiet_mode:
+                if self._should_emit_quiet_tool_messages():
                    self._vprint(f"  {_get_cute_tool_message_impl('clarify', function_args, tool_duration, result=function_result)}")
            elif function_name == "delegate_task":
                from tools.delegate_tool import delegate_task as _delegate_task
@@ -6594,7 +6662,7 @@ class AIAgent:
                    goal_preview = (function_args.get("goal") or "")[:30]
                    spinner_label = f"🔀 {goal_preview}" if goal_preview else "🔀 delegating"
                spinner = None
-                if self.quiet_mode and not self.tool_progress_callback and self._should_start_quiet_spinner():
+                if self._should_emit_quiet_tool_messages() and self._should_start_quiet_spinner():
                    face = random.choice(KawaiiSpinner.KAWAII_WAITING)
                    spinner = KawaiiSpinner(f"{face} {spinner_label}", spinner_type='dots', print_fn=self._print_fn)
                    spinner.start()
@@ -6616,13 +6684,13 @@ class AIAgent:
                    cute_msg = _get_cute_tool_message_impl('delegate_task', function_args, tool_duration, result=_delegate_result)
                    if spinner:
                        spinner.stop(cute_msg)
-                    elif self.quiet_mode:
+                    elif self._should_emit_quiet_tool_messages():
                        self._vprint(f"  {cute_msg}")
            elif self._memory_manager and self._memory_manager.has_tool(function_name):
                # Memory provider tools (hindsight_retain, honcho_search, etc.)
                # These are not in the tool registry — route through MemoryManager.
                spinner = None
-                if self.quiet_mode and not self.tool_progress_callback:
+                if self._should_emit_quiet_tool_messages() and self._should_start_quiet_spinner():
                    face = random.choice(KawaiiSpinner.KAWAII_WAITING)
                    emoji = _get_tool_emoji(function_name)
                    preview = _build_tool_preview(function_name, function_args) or function_name
@@ -6640,11 +6708,11 @@ class AIAgent:
                    cute_msg = _get_cute_tool_message_impl(function_name, function_args, tool_duration, result=_mem_result)
                    if spinner:
                        spinner.stop(cute_msg)
-                    elif self.quiet_mode:
+                    elif self._should_emit_quiet_tool_messages():
                        self._vprint(f"  {cute_msg}")
            elif self.quiet_mode:
                spinner = None
-                if not self.tool_progress_callback:
+                if self._should_emit_quiet_tool_messages() and self._should_start_quiet_spinner():
                    face = random.choice(KawaiiSpinner.KAWAII_WAITING)
                    emoji = _get_tool_emoji(function_name)
                    preview = _build_tool_preview(function_name, function_args) or function_name
@@ -6667,7 +6735,7 @@ class AIAgent:
                    cute_msg = _get_cute_tool_message_impl(function_name, function_args, tool_duration, result=_spinner_result)
                    if spinner:
                        spinner.stop(cute_msg)
-                    else:
+                    elif self._should_emit_quiet_tool_messages():
                        self._vprint(f"  {cute_msg}")
            else:
                try:
@@ -7291,6 +7359,7 @@ class AIAgent:
        interrupted = False
        codex_ack_continuations = 0
        length_continue_retries = 0
+        truncated_tool_call_retries = 0
        truncated_response_prefix = ""
        compression_attempts = 0
        _turn_exit_reason = "unknown"  # Diagnostic: why the loop ended
@@ -7759,9 +7828,11 @@ class AIAgent:
                        # retries are pointless.  Detect this early and give a
                        # targeted error instead of wasting 3 API calls.
                        _trunc_content = None
+                        _trunc_has_tool_calls = False
                        if self.api_mode == "chat_completions":
                            _trunc_msg = response.choices[0].message if (hasattr(response, "choices") and response.choices) else None
                            _trunc_content = getattr(_trunc_msg, "content", None) if _trunc_msg else None
+                            _trunc_has_tool_calls = bool(getattr(_trunc_msg, "tool_calls", None)) if _trunc_msg else False
                        elif self.api_mode == "anthropic_messages":
                            # Anthropic response.content is a list of blocks
                            _text_parts = []
@@ -7771,9 +7842,11 @@ class AIAgent:
                            _trunc_content = "\n".join(_text_parts) if _text_parts else None

                        _thinking_exhausted = (
-                            _trunc_content is not None
-                            and not self._has_content_after_think_block(_trunc_content)
-                        ) or _trunc_content is None
+                            not _trunc_has_tool_calls and (
+                                (_trunc_content is not None and not self._has_content_after_think_block(_trunc_content))
+                                or _trunc_content is None
+                            )
+                        )

                        if _thinking_exhausted:
                            _exhaust_error = (
@@ -7849,6 +7922,34 @@ class AIAgent:
                                    "error": "Response remained truncated after 3 continuation attempts",
                                }

+                        if self.api_mode == "chat_completions":
+                            assistant_message = response.choices[0].message
+                            if assistant_message.tool_calls:
+                                if truncated_tool_call_retries < 1:
+                                    truncated_tool_call_retries += 1
+                                    self._vprint(
+                                        f"{self.log_prefix}⚠️  Truncated tool call detected — retrying API call...",
+                                        force=True,
+                                    )
+                                    # Don't append the broken response to messages;
+                                    # just re-run the same API call from the current
+                                    # message state, giving the model another chance.
+                                    continue
+                                self._vprint(
+                                    f"{self.log_prefix}⚠️  Truncated tool call response detected again — refusing to execute incomplete tool arguments.",
+                                    force=True,
+                                )
+                                self._cleanup_task_resources(effective_task_id)
+                                self._persist_session(messages, conversation_history)
+                                return {
+                                    "final_response": None,
+                                    "messages": messages,
+                                    "api_calls": api_call_count,
+                                    "completed": False,
+                                    "partial": True,
+                                    "error": "Response truncated due to output length limit",
+                                }
+
                        # If we have prior messages, roll back to last complete state
                        if len(messages) > 1:
                            self._vprint(f"{self.log_prefix}   ⏪ Rolling back to last complete assistant turn")
@@ -8161,7 +8262,33 @@ class AIAgent:
                        if _err_body_str:
                            self._vprint(f"{self.log_prefix}   📋 Details: {_err_body_str}", force=True)
                    self._vprint(f"{self.log_prefix}   ⏱️  Elapsed: {elapsed_time:.2f}s  Context: {len(api_messages)} msgs, ~{approx_tokens:,} tokens")
-                    
+
+                    # Actionable hint for OpenRouter "no tool endpoints" error.
+                    # This fires regardless of whether fallback succeeds — the
+                    # user needs to know WHY their model failed so they can fix
+                    # their provider routing, not just silently fall back.
+                    if (
+                        self._is_openrouter_url()
+                        and "support tool use" in error_msg
+                    ):
+                        self._vprint(
+                            f"{self.log_prefix}   💡 No OpenRouter providers for {_model} support tool calling with your current settings.",
+                            force=True,
+                        )
+                        if self.providers_allowed:
+                            self._vprint(
+                                f"{self.log_prefix}      Your provider_routing.only restriction is filtering out tool-capable providers.",
+                                force=True,
+                            )
+                            self._vprint(
+                                f"{self.log_prefix}      Try removing the restriction or adding providers that support tools for this model.",
+                                force=True,
+                            )
+                        self._vprint(
+                            f"{self.log_prefix}      Check which providers support tools: https://openrouter.ai/models/{_model}",
+                            force=True,
+                        )
+
                    # Check for interrupt before deciding to retry
                    if self._interrupt_requested:
                        self._vprint(f"{self.log_prefix}⚡ Interrupt detected during error handling, aborting retries.", force=True)
@@ -8217,6 +8344,10 @@ class AIAgent:
                                approx_tokens=approx_tokens,
                                task_id=effective_task_id,
                            )
+                            # Compression created a new session — clear history
+                            # so _flush_messages_to_session_db writes compressed
+                            # messages to the new session, not skipping them.
+                            conversation_history = None
                            if len(messages) < original_len or old_ctx > _reduced_ctx:
                                self._emit_status(
                                    f"🗜️ Context reduced to {_reduced_ctx:,} tokens "
@@ -8274,6 +8405,10 @@ class AIAgent:
                            messages, system_message, approx_tokens=approx_tokens,
                            task_id=effective_task_id,
                        )
+                        # Compression created a new session — clear history
+                        # so _flush_messages_to_session_db writes compressed
+                        # messages to the new session, not skipping them.
+                        conversation_history = None

                        if len(messages) < original_len:
                            self._emit_status(f"🗜️ Compressed {original_len} → {len(messages)} messages, retrying...")
@@ -8392,6 +8527,10 @@ class AIAgent:
                            messages, system_message, approx_tokens=approx_tokens,
                            task_id=effective_task_id,
                        )
+                        # Compression created a new session — clear history
+                        # so _flush_messages_to_session_db writes compressed
+                        # messages to the new session, not skipping them.
+                        conversation_history = None

                        if len(messages) < original_len or new_ctx and new_ctx < old_ctx:
                            if len(messages) < original_len:
@@ -8999,6 +9138,11 @@ class AIAgent:

                    self._execute_tool_calls(assistant_message, messages, effective_task_id, api_call_count)

+                    # Reset per-turn retry counters after successful tool
+                    # execution so a single truncation doesn't poison the
+                    # entire conversation.
+                    truncated_tool_call_retries = 0
+
                    # Signal that a paragraph break is needed before the next
                    # streamed text.  We don't emit it immediately because
                    # multiple consecutive tool iterations would stack up
@@ -9185,6 +9329,7 @@ class AIAgent:
                    # Reset retry counter/signature on successful content
                    if hasattr(self, '_empty_content_retries'):
                        self._empty_content_retries = 0
+                    self._last_empty_content_signature = None
                    self._thinking_prefill_retries = 0

                    if (
@@ -9256,6 +9401,7 @@ class AIAgent:
                # If an assistant message with tool_calls was already appended,
                # the API expects a role="tool" result for every tool_call_id.
                # Fill in error results for any that weren't answered yet.
+                pending_handled = False
                for idx in range(len(messages) - 1, -1, -1):
                    msg = messages[idx]
                    if not isinstance(msg, dict):
@@ -2,8 +2,8 @@
 # ============================================================================
 # Hermes Agent Installer
 # ============================================================================
-# Installation script for Linux and macOS.
-# Uses uv for fast Python provisioning and package management.
+# Installation script for Linux, macOS, and Android/Termux.
+# Uses uv for desktop/server installs and Python's stdlib venv + pip on Termux.
 #
 # Usage:
 #   curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
@@ -117,6 +117,36 @@ log_error() {
    echo -e "${RED}✗${NC} $1"
 }

+is_termux() {
+    [ -n "${TERMUX_VERSION:-}" ] || [[ "${PREFIX:-}" == *"com.termux/files/usr"* ]]
+}
+
+get_command_link_dir() {
+    if is_termux && [ -n "${PREFIX:-}" ]; then
+        echo "$PREFIX/bin"
+    else
+        echo "$HOME/.local/bin"
+    fi
+}
+
+get_command_link_display_dir() {
+    if is_termux && [ -n "${PREFIX:-}" ]; then
+        echo '$PREFIX/bin'
+    else
+        echo '~/.local/bin'
+    fi
+}
+
+get_hermes_command_path() {
+    local link_dir
+    link_dir="$(get_command_link_dir)"
+    if [ -x "$link_dir/hermes" ]; then
+        echo "$link_dir/hermes"
+    else
+        echo "hermes"
+    fi
+}
+
 # ============================================================================
 # System detection
 # ============================================================================
@@ -124,12 +154,17 @@ log_error() {
 detect_os() {
    case "$(uname -s)" in
        Linux*)
-            OS="linux"
-            if [ -f /etc/os-release ]; then
-                . /etc/os-release
-                DISTRO="$ID"
+            if is_termux; then
+                OS="android"
+                DISTRO="termux"
            else
-                DISTRO="unknown"
+                OS="linux"
+                if [ -f /etc/os-release ]; then
+                    . /etc/os-release
+                    DISTRO="$ID"
+                else
+                    DISTRO="unknown"
+                fi
            fi
            ;;
        Darwin*)
@@ -158,6 +193,12 @@ detect_os() {
 # ============================================================================

 install_uv() {
+    if [ "$DISTRO" = "termux" ]; then
+        log_info "Termux detected — using Python's stdlib venv + pip instead of uv"
+        UV_CMD=""
+        return 0
+    fi
+
    log_info "Checking for uv package manager..."

    # Check common locations for uv
@@ -209,6 +250,25 @@ install_uv() {
 }

 check_python() {
+    if [ "$DISTRO" = "termux" ]; then
+        log_info "Checking Termux Python..."
+        if command -v python >/dev/null 2>&1; then
+            PYTHON_PATH="$(command -v python)"
+            if "$PYTHON_PATH" -c 'import sys; raise SystemExit(0 if sys.version_info >= (3, 11) else 1)' 2>/dev/null; then
+                PYTHON_FOUND_VERSION=$($PYTHON_PATH --version 2>/dev/null)
+                log_success "Python found: $PYTHON_FOUND_VERSION"
+                return 0
+            fi
+        fi
+
+        log_info "Installing Python via pkg..."
+        pkg install -y python >/dev/null
+        PYTHON_PATH="$(command -v python)"
+        PYTHON_FOUND_VERSION=$($PYTHON_PATH --version 2>/dev/null)
+        log_success "Python installed: $PYTHON_FOUND_VERSION"
+        return 0
+    fi
+
    log_info "Checking Python $PYTHON_VERSION..."

    # Let uv handle Python — it can download and manage Python versions
@@ -243,6 +303,17 @@ check_git() {
    fi

    log_error "Git not found"
+
+    if [ "$DISTRO" = "termux" ]; then
+        log_info "Installing Git via pkg..."
+        pkg install -y git >/dev/null
+        if command -v git >/dev/null 2>&1; then
+            GIT_VERSION=$(git --version | awk '{print $3}')
+            log_success "Git $GIT_VERSION installed"
+            return 0
+        fi
+    fi
+
    log_info "Please install Git:"

    case "$OS" in
@@ -262,6 +333,9 @@ check_git() {
                    ;;
            esac
            ;;
+        android)
+            log_info "  pkg install git"
+            ;;
        macos)
            log_info "  xcode-select --install"
            log_info "  Or: brew install git"
@@ -290,11 +364,29 @@ check_node() {
        return 0
    fi

-    log_info "Node.js not found — installing Node.js $NODE_VERSION LTS..."
+    if [ "$DISTRO" = "termux" ]; then
+        log_info "Node.js not found — installing Node.js via pkg..."
+    else
+        log_info "Node.js not found — installing Node.js $NODE_VERSION LTS..."
+    fi
    install_node
 }

 install_node() {
+    if [ "$DISTRO" = "termux" ]; then
+        log_info "Installing Node.js via pkg..."
+        if pkg install -y nodejs >/dev/null; then
+            local installed_ver
+            installed_ver=$(node --version 2>/dev/null)
+            log_success "Node.js $installed_ver installed via pkg"
+            HAS_NODE=true
+        else
+            log_warn "Failed to install Node.js via pkg"
+            HAS_NODE=false
+        fi
+        return 0
+    fi
+
    local arch=$(uname -m)
    local node_arch
    case "$arch" in
@@ -413,6 +505,30 @@ install_system_packages() {
        need_ffmpeg=true
    fi

+    # Termux always needs the Android build toolchain for the tested pip path,
+    # even when ripgrep/ffmpeg are already present.
+    if [ "$DISTRO" = "termux" ]; then
+        local termux_pkgs=(clang rust make pkg-config libffi openssl)
+        if [ "$need_ripgrep" = true ]; then
+            termux_pkgs+=("ripgrep")
+        fi
+        if [ "$need_ffmpeg" = true ]; then
+            termux_pkgs+=("ffmpeg")
+        fi
+
+        log_info "Installing Termux packages: ${termux_pkgs[*]}"
+        if pkg install -y "${termux_pkgs[@]}" >/dev/null; then
+            [ "$need_ripgrep" = true ] && HAS_RIPGREP=true && log_success "ripgrep installed"
+            [ "$need_ffmpeg" = true ]  && HAS_FFMPEG=true  && log_success "ffmpeg installed"
+            log_success "Termux build dependencies installed"
+            return 0
+        fi
+
+        log_warn "Could not auto-install all Termux packages"
+        log_info "Install manually: pkg install ${termux_pkgs[*]}"
+        return 0
+    fi
+
    # Nothing to install — done
    if [ "$need_ripgrep" = false ] && [ "$need_ffmpeg" = false ]; then
        return 0
@@ -550,6 +666,9 @@ show_manual_install_hint() {
                *)             log_info "  Use your package manager or visit the project homepage" ;;
            esac
            ;;
+        android)
+            log_info "  pkg install $pkg"
+            ;;
        macos) log_info "  brew install $pkg" ;;
    esac
 }
@@ -646,6 +765,19 @@ setup_venv() {
        return 0
    fi

+    if [ "$DISTRO" = "termux" ]; then
+        log_info "Creating virtual environment with Termux Python..."
+
+        if [ -d "venv" ]; then
+            log_info "Virtual environment already exists, recreating..."
+            rm -rf venv
+        fi
+
+        "$PYTHON_PATH" -m venv venv
+        log_success "Virtual environment ready ($(./venv/bin/python --version 2>/dev/null))"
+        return 0
+    fi
+
    log_info "Creating virtual environment with Python $PYTHON_VERSION..."

    if [ -d "venv" ]; then
@@ -662,6 +794,46 @@ setup_venv() {
 install_deps() {
    log_info "Installing dependencies..."

+    if [ "$DISTRO" = "termux" ]; then
+        if [ "$USE_VENV" = true ]; then
+            export VIRTUAL_ENV="$INSTALL_DIR/venv"
+            PIP_PYTHON="$INSTALL_DIR/venv/bin/python"
+        else
+            PIP_PYTHON="$PYTHON_PATH"
+        fi
+
+        if [ -z "${ANDROID_API_LEVEL:-}" ]; then
+            ANDROID_API_LEVEL="$(getprop ro.build.version.sdk 2>/dev/null || true)"
+            if [ -z "$ANDROID_API_LEVEL" ]; then
+                ANDROID_API_LEVEL=24
+            fi
+            export ANDROID_API_LEVEL
+            log_info "Using ANDROID_API_LEVEL=$ANDROID_API_LEVEL for Android wheel builds"
+        fi
+
+        "$PIP_PYTHON" -m pip install --upgrade pip setuptools wheel >/dev/null
+        if ! "$PIP_PYTHON" -m pip install -e '.[termux]' -c constraints-termux.txt; then
+            log_warn "Termux feature install (.[termux]) failed, trying base install..."
+            if ! "$PIP_PYTHON" -m pip install -e '.' -c constraints-termux.txt; then
+                log_error "Package installation failed on Termux."
+                log_info "Ensure these packages are installed: pkg install clang rust make pkg-config libffi openssl"
+                log_info "Then re-run: cd $INSTALL_DIR && python -m pip install -e '.[termux]' -c constraints-termux.txt"
+                exit 1
+            fi
+        fi
+
+        log_success "Main package installed"
+        log_info "Termux note: browser/WhatsApp tooling is not installed by default; see the Termux guide for optional follow-up steps."
+
+        if [ -d "tinker-atropos" ] && [ -f "tinker-atropos/pyproject.toml" ]; then
+            log_info "tinker-atropos submodule found — skipping install (optional, for RL training)"
+            log_info "  To install later: $PIP_PYTHON -m pip install -e \"./tinker-atropos\""
+        fi
+
+        log_success "All dependencies installed"
+        return 0
+    fi
+
    if [ "$USE_VENV" = true ]; then
        # Tell uv to install into our venv (no need to activate)
        export VIRTUAL_ENV="$INSTALL_DIR/venv"
@@ -743,19 +915,35 @@ setup_path() {
    if [ ! -x "$HERMES_BIN" ]; then
        log_warn "hermes entry point not found at $HERMES_BIN"
        log_info "This usually means the pip install didn't complete successfully."
-        log_info "Try: cd $INSTALL_DIR && uv pip install -e '.[all]'"
+        if [ "$DISTRO" = "termux" ]; then
+            log_info "Try: cd $INSTALL_DIR && python -m pip install -e '.[termux]' -c constraints-termux.txt"
+        else
+            log_info "Try: cd $INSTALL_DIR && uv pip install -e '.[all]'"
+        fi
        return 0
    fi

-    # Create symlink in ~/.local/bin (standard user binary location, usually on PATH)
-    mkdir -p "$HOME/.local/bin"
-    ln -sf "$HERMES_BIN" "$HOME/.local/bin/hermes"
-    log_success "Symlinked hermes → ~/.local/bin/hermes"
+    local command_link_dir
+    local command_link_display_dir
+    command_link_dir="$(get_command_link_dir)"
+    command_link_display_dir="$(get_command_link_display_dir)"
+
+    # Create a user-facing shim for the hermes command.
+    mkdir -p "$command_link_dir"
+    ln -sf "$HERMES_BIN" "$command_link_dir/hermes"
+    log_success "Symlinked hermes → $command_link_display_dir/hermes"
+
+    if [ "$DISTRO" = "termux" ]; then
+        export PATH="$command_link_dir:$PATH"
+        log_info "$command_link_display_dir is the native Termux command path"
+        log_success "hermes command ready"
+        return 0
+    fi

    # Check if ~/.local/bin is on PATH; if not, add it to shell config.
    # Detect the user's actual login shell (not the shell running this script,
    # which is always bash when piped from curl).
-    if ! echo "$PATH" | tr ':' '\n' | grep -q "^$HOME/.local/bin$"; then
+    if ! echo "$PATH" | tr ':' '\n' | grep -q "^$command_link_dir$"; then
        SHELL_CONFIGS=()
        LOGIN_SHELL="$(basename "${SHELL:-/bin/bash}")"
        case "$LOGIN_SHELL" in
@@ -801,7 +989,7 @@ setup_path() {
    fi

    # Export for current session so hermes works immediately
-    export PATH="$HOME/.local/bin:$PATH"
+    export PATH="$command_link_dir:$PATH"

    log_success "hermes command ready"
 }
@@ -878,6 +1066,13 @@ install_node_deps() {
        return 0
    fi

+    if [ "$DISTRO" = "termux" ]; then
+        log_info "Skipping automatic Node/browser dependency setup on Termux"
+        log_info "Browser automation and WhatsApp bridge are not part of the tested Termux install path yet."
+        log_info "If you want to experiment manually later, run: cd $INSTALL_DIR && npm install"
+        return 0
+    fi
+
    if [ -f "$INSTALL_DIR/package.json" ]; then
        log_info "Installing Node.js dependencies (browser tools)..."
        cd "$INSTALL_DIR"
@@ -992,8 +1187,7 @@ maybe_start_gateway() {
            read -p "Pair WhatsApp now? [Y/n] " -n 1 -r
            echo
            if [[ $REPLY =~ ^[Yy]$ ]] || [[ -z $REPLY ]]; then
-                HERMES_CMD="$HOME/.local/bin/hermes"
-                [ ! -x "$HERMES_CMD" ] && HERMES_CMD="hermes"
+                HERMES_CMD="$(get_hermes_command_path)"
                $HERMES_CMD whatsapp || true
            fi
        else
@@ -1007,16 +1201,17 @@ maybe_start_gateway() {
    fi

    echo ""
-    read -p "Would you like to install the gateway as a background service? [Y/n] " -n 1 -r < /dev/tty
+    if [ "$DISTRO" = "termux" ]; then
+        read -p "Would you like to start the gateway in the background? [Y/n] " -n 1 -r < /dev/tty
+    else
+        read -p "Would you like to install the gateway as a background service? [Y/n] " -n 1 -r < /dev/tty
+    fi
    echo

    if [[ $REPLY =~ ^[Yy]$ ]] || [[ -z $REPLY ]]; then
-        HERMES_CMD="$HOME/.local/bin/hermes"
-        if [ ! -x "$HERMES_CMD" ]; then
-            HERMES_CMD="hermes"
-        fi
+        HERMES_CMD="$(get_hermes_command_path)"

-        if command -v systemctl &> /dev/null; then
+        if [ "$DISTRO" != "termux" ] && command -v systemctl &> /dev/null; then
            log_info "Installing systemd service..."
            if $HERMES_CMD gateway install 2>/dev/null; then
                log_success "Gateway service installed"
@@ -1029,12 +1224,19 @@ maybe_start_gateway() {
                log_warn "Systemd install failed. You can start manually: hermes gateway"
            fi
        else
-            log_info "systemd not available — starting gateway in background..."
+            if [ "$DISTRO" = "termux" ]; then
+                log_info "Termux detected — starting gateway in best-effort background mode..."
+            else
+                log_info "systemd not available — starting gateway in background..."
+            fi
            nohup $HERMES_CMD gateway > "$HERMES_HOME/logs/gateway.log" 2>&1 &
            GATEWAY_PID=$!
            log_success "Gateway started (PID $GATEWAY_PID). Logs: ~/.hermes/logs/gateway.log"
            log_info "To stop: kill $GATEWAY_PID"
            log_info "To restart later: hermes gateway"
+            if [ "$DISTRO" = "termux" ]; then
+                log_warn "Android may stop background processes when Termux is suspended or the system reclaims resources."
+            fi
        fi
    else
        log_info "Skipped. Start the gateway later with: hermes gateway"
@@ -1073,24 +1275,33 @@ print_success() {

    echo -e "${CYAN}─────────────────────────────────────────────────────────${NC}"
    echo ""
-    echo -e "${YELLOW}⚡ Reload your shell to use 'hermes' command:${NC}"
-    echo ""
-    LOGIN_SHELL="$(basename "${SHELL:-/bin/bash}")"
-    if [ "$LOGIN_SHELL" = "zsh" ]; then
-        echo "   source ~/.zshrc"
-    elif [ "$LOGIN_SHELL" = "bash" ]; then
-        echo "   source ~/.bashrc"
+    if [ "$DISTRO" = "termux" ]; then
+        echo -e "${YELLOW}⚡ 'hermes' was linked into $(get_command_link_display_dir), which is already on PATH in Termux.${NC}"
+        echo ""
    else
-        echo "   source ~/.bashrc   # or ~/.zshrc"
+        echo -e "${YELLOW}⚡ Reload your shell to use 'hermes' command:${NC}"
+        echo ""
+        LOGIN_SHELL="$(basename "${SHELL:-/bin/bash}")"
+        if [ "$LOGIN_SHELL" = "zsh" ]; then
+            echo "   source ~/.zshrc"
+        elif [ "$LOGIN_SHELL" = "bash" ]; then
+            echo "   source ~/.bashrc"
+        else
+            echo "   source ~/.bashrc   # or ~/.zshrc"
+        fi
+        echo ""
    fi
-    echo ""

    # Show Node.js warning if auto-install failed
    if [ "$HAS_NODE" = false ]; then
        echo -e "${YELLOW}"
        echo "Note: Node.js could not be installed automatically."
        echo "Browser tools need Node.js. Install manually:"
-        echo "  https://nodejs.org/en/download/"
+        if [ "$DISTRO" = "termux" ]; then
+            echo "  pkg install nodejs"
+        else
+            echo "  https://nodejs.org/en/download/"
+        fi
        echo -e "${NC}"
    fi

@@ -1099,7 +1310,11 @@ print_success() {
        echo -e "${YELLOW}"
        echo "Note: ripgrep (rg) was not found. File search will use"
        echo "grep as a fallback. For faster search in large codebases,"
-        echo "install ripgrep: sudo apt install ripgrep (or brew install ripgrep)"
+        if [ "$DISTRO" = "termux" ]; then
+            echo "install ripgrep: pkg install ripgrep"
+        else
+            echo "install ripgrep: sudo apt install ripgrep (or brew install ripgrep)"
+        fi
        echo -e "${NC}"
    fi
 }
@@ -3,17 +3,17 @@
 # Hermes Agent Setup Script
 # ============================================================================
 # Quick setup for developers who cloned the repo manually.
-# Uses uv for fast Python provisioning and package management.
+# Uses uv for desktop/server setup and Python's stdlib venv + pip on Termux.
 #
 # Usage:
 #   ./setup-hermes.sh
 #
 # This script:
-# 1. Installs uv if not present
-# 2. Creates a virtual environment with Python 3.11 via uv
-# 3. Installs all dependencies (main package + submodules)
+# 1. Detects desktop/server vs Android/Termux setup path
+# 2. Creates a Python 3.11 virtual environment
+# 3. Installs the appropriate dependency set for the platform
 # 4. Creates .env from template (if not exists)
-# 5. Symlinks the 'hermes' CLI command into ~/.local/bin
+# 5. Symlinks the 'hermes' CLI command into a user-facing bin dir
 # 6. Runs the setup wizard (optional)
 # ============================================================================

@@ -31,6 +31,26 @@ cd "$SCRIPT_DIR"

 PYTHON_VERSION="3.11"

+is_termux() {
+    [ -n "${TERMUX_VERSION:-}" ] || [[ "${PREFIX:-}" == *"com.termux/files/usr"* ]]
+}
+
+get_command_link_dir() {
+    if is_termux && [ -n "${PREFIX:-}" ]; then
+        echo "$PREFIX/bin"
+    else
+        echo "$HOME/.local/bin"
+    fi
+}
+
+get_command_link_display_dir() {
+    if is_termux && [ -n "${PREFIX:-}" ]; then
+        echo '$PREFIX/bin'
+    else
+        echo '~/.local/bin'
+    fi
+}
+
 echo ""
 echo -e "${CYAN}⚕ Hermes Agent Setup${NC}"
 echo ""
@@ -42,36 +62,40 @@ echo ""
 echo -e "${CYAN}→${NC} Checking for uv..."

 UV_CMD=""
-if command -v uv &> /dev/null; then
-    UV_CMD="uv"
-elif [ -x "$HOME/.local/bin/uv" ]; then
-    UV_CMD="$HOME/.local/bin/uv"
-elif [ -x "$HOME/.cargo/bin/uv" ]; then
-    UV_CMD="$HOME/.cargo/bin/uv"
-fi
-
-if [ -n "$UV_CMD" ]; then
-    UV_VERSION=$($UV_CMD --version 2>/dev/null)
-    echo -e "${GREEN}✓${NC} uv found ($UV_VERSION)"
+if is_termux; then
+    echo -e "${CYAN}→${NC} Termux detected — using Python's stdlib venv + pip instead of uv"
 else
-    echo -e "${CYAN}→${NC} Installing uv..."
-    if curl -LsSf https://astral.sh/uv/install.sh | sh 2>/dev/null; then
-        if [ -x "$HOME/.local/bin/uv" ]; then
-            UV_CMD="$HOME/.local/bin/uv"
-        elif [ -x "$HOME/.cargo/bin/uv" ]; then
-            UV_CMD="$HOME/.cargo/bin/uv"
-        fi
-        
-        if [ -n "$UV_CMD" ]; then
-            UV_VERSION=$($UV_CMD --version 2>/dev/null)
-            echo -e "${GREEN}✓${NC} uv installed ($UV_VERSION)"
+    if command -v uv &> /dev/null; then
+        UV_CMD="uv"
+    elif [ -x "$HOME/.local/bin/uv" ]; then
+        UV_CMD="$HOME/.local/bin/uv"
+    elif [ -x "$HOME/.cargo/bin/uv" ]; then
+        UV_CMD="$HOME/.cargo/bin/uv"
+    fi
+
+    if [ -n "$UV_CMD" ]; then
+        UV_VERSION=$($UV_CMD --version 2>/dev/null)
+        echo -e "${GREEN}✓${NC} uv found ($UV_VERSION)"
+    else
+        echo -e "${CYAN}→${NC} Installing uv..."
+        if curl -LsSf https://astral.sh/uv/install.sh | sh 2>/dev/null; then
+            if [ -x "$HOME/.local/bin/uv" ]; then
+                UV_CMD="$HOME/.local/bin/uv"
+            elif [ -x "$HOME/.cargo/bin/uv" ]; then
+                UV_CMD="$HOME/.cargo/bin/uv"
+            fi
+
+            if [ -n "$UV_CMD" ]; then
+                UV_VERSION=$($UV_CMD --version 2>/dev/null)
+                echo -e "${GREEN}✓${NC} uv installed ($UV_VERSION)"
+            else
+                echo -e "${RED}✗${NC} uv installed but not found. Add ~/.local/bin to PATH and retry."
+                exit 1
+            fi
        else
-            echo -e "${RED}✗${NC} uv installed but not found. Add ~/.local/bin to PATH and retry."
+            echo -e "${RED}✗${NC} Failed to install uv. Visit https://docs.astral.sh/uv/"
            exit 1
        fi
-    else
-        echo -e "${RED}✗${NC} Failed to install uv. Visit https://docs.astral.sh/uv/"
-        exit 1
    fi
 fi

@@ -81,16 +105,34 @@ fi

 echo -e "${CYAN}→${NC} Checking Python $PYTHON_VERSION..."

-if $UV_CMD python find "$PYTHON_VERSION" &> /dev/null; then
-    PYTHON_PATH=$($UV_CMD python find "$PYTHON_VERSION")
-    PYTHON_FOUND_VERSION=$($PYTHON_PATH --version 2>/dev/null)
-    echo -e "${GREEN}✓${NC} $PYTHON_FOUND_VERSION found"
+if is_termux; then
+    if command -v python >/dev/null 2>&1; then
+        PYTHON_PATH="$(command -v python)"
+        if "$PYTHON_PATH" -c 'import sys; raise SystemExit(0 if sys.version_info >= (3, 11) else 1)' 2>/dev/null; then
+            PYTHON_FOUND_VERSION=$($PYTHON_PATH --version 2>/dev/null)
+            echo -e "${GREEN}✓${NC} $PYTHON_FOUND_VERSION found"
+        else
+            echo -e "${RED}✗${NC} Termux Python must be 3.11+"
+            echo "    Run: pkg install python"
+            exit 1
+        fi
+    else
+        echo -e "${RED}✗${NC} Python not found in Termux"
+        echo "    Run: pkg install python"
+        exit 1
+    fi
 else
-    echo -e "${CYAN}→${NC} Python $PYTHON_VERSION not found, installing via uv..."
-    $UV_CMD python install "$PYTHON_VERSION"
-    PYTHON_PATH=$($UV_CMD python find "$PYTHON_VERSION")
-    PYTHON_FOUND_VERSION=$($PYTHON_PATH --version 2>/dev/null)
-    echo -e "${GREEN}✓${NC} $PYTHON_FOUND_VERSION installed"
+    if $UV_CMD python find "$PYTHON_VERSION" &> /dev/null; then
+        PYTHON_PATH=$($UV_CMD python find "$PYTHON_VERSION")
+        PYTHON_FOUND_VERSION=$($PYTHON_PATH --version 2>/dev/null)
+        echo -e "${GREEN}✓${NC} $PYTHON_FOUND_VERSION found"
+    else
+        echo -e "${CYAN}→${NC} Python $PYTHON_VERSION not found, installing via uv..."
+        $UV_CMD python install "$PYTHON_VERSION"
+        PYTHON_PATH=$($UV_CMD python find "$PYTHON_VERSION")
+        PYTHON_FOUND_VERSION=$($PYTHON_PATH --version 2>/dev/null)
+        echo -e "${GREEN}✓${NC} $PYTHON_FOUND_VERSION installed"
+    fi
 fi

 # ============================================================================
@@ -104,11 +146,16 @@ if [ -d "venv" ]; then
    rm -rf venv
 fi

-$UV_CMD venv venv --python "$PYTHON_VERSION"
-echo -e "${GREEN}✓${NC} venv created (Python $PYTHON_VERSION)"
+if is_termux; then
+    "$PYTHON_PATH" -m venv venv
+    echo -e "${GREEN}✓${NC} venv created with stdlib venv"
+else
+    $UV_CMD venv venv --python "$PYTHON_VERSION"
+    echo -e "${GREEN}✓${NC} venv created (Python $PYTHON_VERSION)"
+fi

-# Tell uv to install into this venv (no activation needed for uv)
 export VIRTUAL_ENV="$SCRIPT_DIR/venv"
+SETUP_PYTHON="$SCRIPT_DIR/venv/bin/python"

 # ============================================================================
 # Dependencies
@@ -116,19 +163,34 @@ export VIRTUAL_ENV="$SCRIPT_DIR/venv"

 echo -e "${CYAN}→${NC} Installing dependencies..."

-# Prefer uv sync with lockfile (hash-verified installs) when available,
-# fall back to pip install for compatibility or when lockfile is stale.
-if [ -f "uv.lock" ]; then
-    echo -e "${CYAN}→${NC} Using uv.lock for hash-verified installation..."
-    UV_PROJECT_ENVIRONMENT="$SCRIPT_DIR/venv" $UV_CMD sync --all-extras --locked 2>/dev/null && \
-        echo -e "${GREEN}✓${NC} Dependencies installed (lockfile verified)" || {
-        echo -e "${YELLOW}⚠${NC} Lockfile install failed (may be outdated), falling back to pip install..."
+if is_termux; then
+    export ANDROID_API_LEVEL="$(getprop ro.build.version.sdk 2>/dev/null || printf '%s' "${ANDROID_API_LEVEL:-}")"
+    echo -e "${CYAN}→${NC} Termux detected — installing the tested Android bundle"
+    "$SETUP_PYTHON" -m pip install --upgrade pip setuptools wheel
+    if [ -f "constraints-termux.txt" ]; then
+        "$SETUP_PYTHON" -m pip install -e ".[termux]" -c constraints-termux.txt || {
+            echo -e "${YELLOW}⚠${NC} Termux bundle install failed, falling back to base install..."
+            "$SETUP_PYTHON" -m pip install -e "." -c constraints-termux.txt
+        }
+    else
+        "$SETUP_PYTHON" -m pip install -e ".[termux]" || "$SETUP_PYTHON" -m pip install -e "."
+    fi
+    echo -e "${GREEN}✓${NC} Dependencies installed"
+else
+    # Prefer uv sync with lockfile (hash-verified installs) when available,
+    # fall back to pip install for compatibility or when lockfile is stale.
+    if [ -f "uv.lock" ]; then
+        echo -e "${CYAN}→${NC} Using uv.lock for hash-verified installation..."
+        UV_PROJECT_ENVIRONMENT="$SCRIPT_DIR/venv" $UV_CMD sync --all-extras --locked 2>/dev/null && \
+            echo -e "${GREEN}✓${NC} Dependencies installed (lockfile verified)" || {
+            echo -e "${YELLOW}⚠${NC} Lockfile install failed (may be outdated), falling back to pip install..."
+            $UV_CMD pip install -e ".[all]" || $UV_CMD pip install -e "."
+            echo -e "${GREEN}✓${NC} Dependencies installed"
+        }
+    else
        $UV_CMD pip install -e ".[all]" || $UV_CMD pip install -e "."
        echo -e "${GREEN}✓${NC} Dependencies installed"
-    }
-else
-    $UV_CMD pip install -e ".[all]" || $UV_CMD pip install -e "."
-    echo -e "${GREEN}✓${NC} Dependencies installed"
+    fi
 fi

 # ============================================================================
@@ -138,7 +200,9 @@ fi
 echo -e "${CYAN}→${NC} Installing optional submodules..."

 # tinker-atropos (RL training backend)
-if [ -d "tinker-atropos" ] && [ -f "tinker-atropos/pyproject.toml" ]; then
+if is_termux; then
+    echo -e "${CYAN}→${NC} Skipping tinker-atropos on Termux (not part of the tested Android path)"
+elif [ -d "tinker-atropos" ] && [ -f "tinker-atropos/pyproject.toml" ]; then
    $UV_CMD pip install -e "./tinker-atropos" && \
        echo -e "${GREEN}✓${NC} tinker-atropos installed" || \
        echo -e "${YELLOW}⚠${NC} tinker-atropos install failed (RL tools may not work)"
@@ -160,34 +224,42 @@ else
    echo
    if [[ $REPLY =~ ^[Yy]$ ]] || [[ -z $REPLY ]]; then
        INSTALLED=false
-        
-        # Check if sudo is available
-        if command -v sudo &> /dev/null && sudo -n true 2>/dev/null; then
-            if command -v apt &> /dev/null; then
-                sudo apt install -y ripgrep && INSTALLED=true
-            elif command -v dnf &> /dev/null; then
-                sudo dnf install -y ripgrep && INSTALLED=true
+
+        if is_termux; then
+            pkg install -y ripgrep && INSTALLED=true
+        else
+            # Check if sudo is available
+            if command -v sudo &> /dev/null && sudo -n true 2>/dev/null; then
+                if command -v apt &> /dev/null; then
+                    sudo apt install -y ripgrep && INSTALLED=true
+                elif command -v dnf &> /dev/null; then
+                    sudo dnf install -y ripgrep && INSTALLED=true
+                fi
+            fi
+
+            # Try brew (no sudo needed)
+            if [ "$INSTALLED" = false ] && command -v brew &> /dev/null; then
+                brew install ripgrep && INSTALLED=true
+            fi
+
+            # Try cargo (no sudo needed)
+            if [ "$INSTALLED" = false ] && command -v cargo &> /dev/null; then
+                echo -e "${CYAN}→${NC} Trying cargo install (no sudo required)..."
+                cargo install ripgrep && INSTALLED=true
            fi
        fi
-        
-        # Try brew (no sudo needed)
-        if [ "$INSTALLED" = false ] && command -v brew &> /dev/null; then
-            brew install ripgrep && INSTALLED=true
-        fi
-        
-        # Try cargo (no sudo needed)
-        if [ "$INSTALLED" = false ] && command -v cargo &> /dev/null; then
-            echo -e "${CYAN}→${NC} Trying cargo install (no sudo required)..."
-            cargo install ripgrep && INSTALLED=true
-        fi
-        
+
        if [ "$INSTALLED" = true ]; then
            echo -e "${GREEN}✓${NC} ripgrep installed"
        else
            echo -e "${YELLOW}⚠${NC} Auto-install failed. Install options:"
-            echo "    sudo apt install ripgrep     # Debian/Ubuntu"
-            echo "    brew install ripgrep         # macOS"
-            echo "    cargo install ripgrep        # With Rust (no sudo)"
+            if is_termux; then
+                echo "    pkg install ripgrep          # Termux / Android"
+            else
+                echo "    sudo apt install ripgrep     # Debian/Ubuntu"
+                echo "    brew install ripgrep         # macOS"
+                echo "    cargo install ripgrep        # With Rust (no sudo)"
+            fi
            echo "    https://github.com/BurntSushi/ripgrep#installation"
        fi
    fi
@@ -207,49 +279,56 @@ else
 fi

 # ============================================================================
-# PATH setup — symlink hermes into ~/.local/bin
+# PATH setup — symlink hermes into a user-facing bin dir
 # ============================================================================

 echo -e "${CYAN}→${NC} Setting up hermes command..."

 HERMES_BIN="$SCRIPT_DIR/venv/bin/hermes"
-mkdir -p "$HOME/.local/bin"
-ln -sf "$HERMES_BIN" "$HOME/.local/bin/hermes"
-echo -e "${GREEN}✓${NC} Symlinked hermes → ~/.local/bin/hermes"
+COMMAND_LINK_DIR="$(get_command_link_dir)"
+COMMAND_LINK_DISPLAY_DIR="$(get_command_link_display_dir)"
+mkdir -p "$COMMAND_LINK_DIR"
+ln -sf "$HERMES_BIN" "$COMMAND_LINK_DIR/hermes"
+echo -e "${GREEN}✓${NC} Symlinked hermes → $COMMAND_LINK_DISPLAY_DIR/hermes"

-# Determine the appropriate shell config file
-SHELL_CONFIG=""
-if [[ "$SHELL" == *"zsh"* ]]; then
-    SHELL_CONFIG="$HOME/.zshrc"
-elif [[ "$SHELL" == *"bash"* ]]; then
-    SHELL_CONFIG="$HOME/.bashrc"
-    [ ! -f "$SHELL_CONFIG" ] && SHELL_CONFIG="$HOME/.bash_profile"
+if is_termux; then
+    export PATH="$COMMAND_LINK_DIR:$PATH"
+    echo -e "${GREEN}✓${NC} $COMMAND_LINK_DISPLAY_DIR is already on PATH in Termux"
 else
-    # Fallback to checking existing files
-    if [ -f "$HOME/.zshrc" ]; then
+    # Determine the appropriate shell config file
+    SHELL_CONFIG=""
+    if [[ "$SHELL" == *"zsh"* ]]; then
        SHELL_CONFIG="$HOME/.zshrc"
-    elif [ -f "$HOME/.bashrc" ]; then
+    elif [[ "$SHELL" == *"bash"* ]]; then
        SHELL_CONFIG="$HOME/.bashrc"
-    elif [ -f "$HOME/.bash_profile" ]; then
-        SHELL_CONFIG="$HOME/.bash_profile"
-    fi
-fi
-
-if [ -n "$SHELL_CONFIG" ]; then
-    # Touch the file just in case it doesn't exist yet but was selected
-    touch "$SHELL_CONFIG" 2>/dev/null || true
-    
-    if ! echo "$PATH" | tr ':' '\n' | grep -q "^$HOME/.local/bin$"; then
-        if ! grep -q '\.local/bin' "$SHELL_CONFIG" 2>/dev/null; then
-            echo "" >> "$SHELL_CONFIG"
-            echo "# Hermes Agent — ensure ~/.local/bin is on PATH" >> "$SHELL_CONFIG"
-            echo 'export PATH="$HOME/.local/bin:$PATH"' >> "$SHELL_CONFIG"
-            echo -e "${GREEN}✓${NC} Added ~/.local/bin to PATH in $SHELL_CONFIG"
-        else
-            echo -e "${GREEN}✓${NC} ~/.local/bin already in $SHELL_CONFIG"
-        fi
+        [ ! -f "$SHELL_CONFIG" ] && SHELL_CONFIG="$HOME/.bash_profile"
    else
-        echo -e "${GREEN}✓${NC} ~/.local/bin already on PATH"
+        # Fallback to checking existing files
+        if [ -f "$HOME/.zshrc" ]; then
+            SHELL_CONFIG="$HOME/.zshrc"
+        elif [ -f "$HOME/.bashrc" ]; then
+            SHELL_CONFIG="$HOME/.bashrc"
+        elif [ -f "$HOME/.bash_profile" ]; then
+            SHELL_CONFIG="$HOME/.bash_profile"
+        fi
+    fi
+
+    if [ -n "$SHELL_CONFIG" ]; then
+        # Touch the file just in case it doesn't exist yet but was selected
+        touch "$SHELL_CONFIG" 2>/dev/null || true
+
+        if ! echo "$PATH" | tr ':' '\n' | grep -q "^$HOME/.local/bin$"; then
+            if ! grep -q '\.local/bin' "$SHELL_CONFIG" 2>/dev/null; then
+                echo "" >> "$SHELL_CONFIG"
+                echo "# Hermes Agent — ensure ~/.local/bin is on PATH" >> "$SHELL_CONFIG"
+                echo 'export PATH="$HOME/.local/bin:$PATH"' >> "$SHELL_CONFIG"
+                echo -e "${GREEN}✓${NC} Added ~/.local/bin to PATH in $SHELL_CONFIG"
+            else
+                echo -e "${GREEN}✓${NC} ~/.local/bin already in $SHELL_CONFIG"
+            fi
+        else
+            echo -e "${GREEN}✓${NC} ~/.local/bin already on PATH"
+        fi
    fi
 fi

@@ -281,18 +360,31 @@ echo -e "${GREEN}✓ Setup complete!${NC}"
 echo ""
 echo "Next steps:"
 echo ""
-echo "  1. Reload your shell:"
-echo "     source $SHELL_CONFIG"
-echo ""
-echo "  2. Run the setup wizard to configure API keys:"
-echo "     hermes setup"
-echo ""
-echo "  3. Start chatting:"
-echo "     hermes"
-echo ""
+if is_termux; then
+    echo "  1. Run the setup wizard to configure API keys:"
+    echo "     hermes setup"
+    echo ""
+    echo "  2. Start chatting:"
+    echo "     hermes"
+    echo ""
+else
+    echo "  1. Reload your shell:"
+    echo "     source $SHELL_CONFIG"
+    echo ""
+    echo "  2. Run the setup wizard to configure API keys:"
+    echo "     hermes setup"
+    echo ""
+    echo "  3. Start chatting:"
+    echo "     hermes"
+    echo ""
+fi
 echo "Other commands:"
 echo "  hermes status        # Check configuration"
-echo "  hermes gateway install # Install gateway service (messaging + cron)"
+if is_termux; then
+    echo "  hermes gateway       # Run gateway in foreground"
+else
+    echo "  hermes gateway install # Install gateway service (messaging + cron)"
+fi
 echo "  hermes cron list     # View scheduled jobs"
 echo "  hermes doctor        # Diagnose issues"
 echo ""
@@ -17,6 +17,7 @@ from agent.anthropic_adapter import (
    build_anthropic_kwargs,
    convert_messages_to_anthropic,
    convert_tools_to_anthropic,
+    get_anthropic_token_source,
    is_claude_code_token_valid,
    normalize_anthropic_response,
    normalize_model_name,
@@ -80,6 +81,9 @@ class TestBuildAnthropicClient:
            build_anthropic_client("sk-ant-api03-x", base_url="https://custom.api.com")
            kwargs = mock_sdk.Anthropic.call_args[1]
            assert kwargs["base_url"] == "https://custom.api.com"
+            assert kwargs["default_headers"] == {
+                "anthropic-beta": "interleaved-thinking-2025-05-14,fine-grained-tool-streaming-2025-05-14"
+            }

    def test_minimax_anthropic_endpoint_uses_bearer_auth_for_regular_api_keys(self):
        with patch("agent.anthropic_adapter._anthropic_sdk") as mock_sdk:
@@ -91,7 +95,20 @@ class TestBuildAnthropicClient:
            assert kwargs["auth_token"] == "minimax-secret-123"
            assert "api_key" not in kwargs
            assert kwargs["default_headers"] == {
-                "anthropic-beta": "interleaved-thinking-2025-05-14,fine-grained-tool-streaming-2025-05-14"
+                "anthropic-beta": "interleaved-thinking-2025-05-14"
+            }
+
+    def test_minimax_cn_anthropic_endpoint_omits_tool_streaming_beta(self):
+        with patch("agent.anthropic_adapter._anthropic_sdk") as mock_sdk:
+            build_anthropic_client(
+                "minimax-cn-secret-123",
+                base_url="https://api.minimaxi.com/anthropic",
+            )
+            kwargs = mock_sdk.Anthropic.call_args[1]
+            assert kwargs["auth_token"] == "minimax-cn-secret-123"
+            assert "api_key" not in kwargs
+            assert kwargs["default_headers"] == {
+                "anthropic-beta": "interleaved-thinking-2025-05-14"
            }


@@ -164,6 +181,15 @@ class TestResolveAnthropicToken:
        monkeypatch.setattr("agent.anthropic_adapter.Path.home", lambda: tmp_path)
        assert resolve_anthropic_token() == "sk-ant-oat01-mytoken"

+    def test_reports_claude_json_primary_key_source(self, monkeypatch, tmp_path):
+        monkeypatch.delenv("ANTHROPIC_API_KEY", raising=False)
+        monkeypatch.delenv("ANTHROPIC_TOKEN", raising=False)
+        monkeypatch.delenv("CLAUDE_CODE_OAUTH_TOKEN", raising=False)
+        (tmp_path / ".claude.json").write_text(json.dumps({"primaryApiKey": "sk-ant-api03-primary"}))
+        monkeypatch.setattr("agent.anthropic_adapter.Path.home", lambda: tmp_path)
+
+        assert get_anthropic_token_source("sk-ant-api03-primary") == "claude_json_primary_api_key"
+
    def test_does_not_resolve_primary_api_key_as_native_anthropic_token(self, monkeypatch, tmp_path):
        monkeypatch.delenv("ANTHROPIC_API_KEY", raising=False)
        monkeypatch.delenv("ANTHROPIC_TOKEN", raising=False)
@@ -9,6 +9,7 @@ import pytest

 from agent.auxiliary_client import (
    get_text_auxiliary_client,
+    get_vision_auxiliary_client,
    get_available_vision_backends,
    resolve_vision_provider_client,
    resolve_provider_client,
@@ -19,6 +20,7 @@ from agent.auxiliary_client import (
    _get_provider_chain,
    _is_payment_error,
    _try_payment_fallback,
+    _resolve_forced_provider,
    _resolve_auto,
 )

@@ -662,6 +664,15 @@ class TestGetTextAuxiliaryClient:
 class TestVisionClientFallback:
    """Vision client auto mode resolves known-good multimodal backends."""

+    def test_vision_returns_none_without_any_credentials(self):
+        with (
+            patch("agent.auxiliary_client._read_nous_auth", return_value=None),
+            patch("agent.auxiliary_client._try_anthropic", return_value=(None, None)),
+        ):
+            client, model = get_vision_auxiliary_client()
+        assert client is None
+        assert model is None
+
    def test_vision_auto_includes_active_provider_when_configured(self, monkeypatch):
        """Active provider appears in available backends when credentials exist."""
        monkeypatch.setenv("ANTHROPIC_API_KEY", "***")
@@ -743,6 +754,21 @@ class TestAuxiliaryPoolAwareness:
        assert call_kwargs["base_url"] == "https://api.githubcopilot.com"
        assert call_kwargs["default_headers"]["Editor-Version"]

+    def test_vision_auto_uses_active_provider_as_fallback(self, monkeypatch):
+        """When no OpenRouter/Nous available, vision auto falls back to active provider."""
+        monkeypatch.setenv("ANTHROPIC_API_KEY", "***")
+        with (
+            patch("agent.auxiliary_client._read_nous_auth", return_value=None),
+            patch("agent.auxiliary_client._read_main_provider", return_value="anthropic"),
+            patch("agent.auxiliary_client._read_main_model", return_value="claude-sonnet-4"),
+            patch("agent.anthropic_adapter.build_anthropic_client", return_value=MagicMock()),
+            patch("agent.anthropic_adapter.resolve_anthropic_token", return_value="***"),
+        ):
+            client, model = get_vision_auxiliary_client()
+
+        assert client is not None
+        assert client.__class__.__name__ == "AnthropicAuxiliaryClient"
+
    def test_vision_auto_prefers_active_provider_over_openrouter(self, monkeypatch):
        """Active provider is tried before OpenRouter in vision auto."""
        monkeypatch.setenv("OPENROUTER_API_KEY", "or-key")
@@ -774,6 +800,43 @@ class TestAuxiliaryPoolAwareness:
        assert client is not None
        assert provider == "custom:local"

+    def test_vision_direct_endpoint_override(self, monkeypatch):
+        monkeypatch.setenv("OPENROUTER_API_KEY", "or-key")
+        monkeypatch.setenv("AUXILIARY_VISION_BASE_URL", "http://localhost:4567/v1")
+        monkeypatch.setenv("AUXILIARY_VISION_API_KEY", "vision-key")
+        monkeypatch.setenv("AUXILIARY_VISION_MODEL", "vision-model")
+        with patch("agent.auxiliary_client.OpenAI") as mock_openai:
+            client, model = get_vision_auxiliary_client()
+        assert model == "vision-model"
+        assert mock_openai.call_args.kwargs["base_url"] == "http://localhost:4567/v1"
+        assert mock_openai.call_args.kwargs["api_key"] == "vision-key"
+
+    def test_vision_direct_endpoint_without_key_uses_placeholder(self, monkeypatch):
+        """Vision endpoint without API key should use 'no-key-required' placeholder."""
+        monkeypatch.setenv("OPENROUTER_API_KEY", "or-key")
+        monkeypatch.setenv("AUXILIARY_VISION_BASE_URL", "http://localhost:4567/v1")
+        monkeypatch.setenv("AUXILIARY_VISION_MODEL", "vision-model")
+        with patch("agent.auxiliary_client.OpenAI") as mock_openai:
+            client, model = get_vision_auxiliary_client()
+        assert client is not None
+        assert model == "vision-model"
+        assert mock_openai.call_args.kwargs["api_key"] == "no-key-required"
+
+    def test_vision_uses_openrouter_when_available(self, monkeypatch):
+        monkeypatch.setenv("OPENROUTER_API_KEY", "or-key")
+        with patch("agent.auxiliary_client.OpenAI") as mock_openai:
+            client, model = get_vision_auxiliary_client()
+        assert model == "google/gemini-3-flash-preview"
+        assert client is not None
+
+    def test_vision_uses_nous_when_available(self, monkeypatch):
+        with patch("agent.auxiliary_client._read_nous_auth") as mock_nous, \
+             patch("agent.auxiliary_client.OpenAI"):
+            mock_nous.return_value = {"access_token": "nous-tok"}
+            client, model = get_vision_auxiliary_client()
+        assert model == "google/gemini-3-flash-preview"
+        assert client is not None
+
    def test_vision_config_google_provider_uses_gemini_credentials(self, monkeypatch):
        config = {
            "auxiliary": {
@@ -799,6 +862,53 @@ class TestAuxiliaryPoolAwareness:
        assert mock_openai.call_args.kwargs["api_key"] == "gemini-key"
        assert mock_openai.call_args.kwargs["base_url"] == "https://generativelanguage.googleapis.com/v1beta/openai"

+    def test_vision_forced_main_uses_custom_endpoint(self, monkeypatch):
+        """When explicitly forced to 'main', vision CAN use custom endpoint."""
+        config = {
+            "model": {
+                "provider": "custom",
+                "base_url": "http://localhost:1234/v1",
+                "default": "my-local-model",
+            }
+        }
+        monkeypatch.setenv("AUXILIARY_VISION_PROVIDER", "main")
+        monkeypatch.setenv("OPENAI_API_KEY", "local-key")
+        monkeypatch.setattr("hermes_cli.config.load_config", lambda: config)
+        monkeypatch.setattr("hermes_cli.runtime_provider.load_config", lambda: config)
+        with patch("agent.auxiliary_client._read_nous_auth", return_value=None), \
+             patch("agent.auxiliary_client.OpenAI") as mock_openai:
+            client, model = get_vision_auxiliary_client()
+        assert client is not None
+        assert model == "my-local-model"
+
+    def test_vision_forced_main_returns_none_without_creds(self, monkeypatch):
+        """Forced main with no credentials still returns None."""
+        monkeypatch.setenv("AUXILIARY_VISION_PROVIDER", "main")
+        monkeypatch.delenv("OPENAI_BASE_URL", raising=False)
+        monkeypatch.delenv("OPENAI_API_KEY", raising=False)
+        # Clear client cache to avoid stale entries from previous tests
+        from agent.auxiliary_client import _client_cache
+        _client_cache.clear()
+        with patch("agent.auxiliary_client._read_nous_auth", return_value=None), \
+             patch("agent.auxiliary_client._read_main_provider", return_value=""), \
+             patch("agent.auxiliary_client._read_main_model", return_value=""), \
+             patch("agent.auxiliary_client._select_pool_entry", return_value=(False, None)), \
+             patch("agent.auxiliary_client._resolve_custom_runtime", return_value=(None, None)), \
+             patch("agent.auxiliary_client._read_codex_access_token", return_value=None), \
+             patch("agent.auxiliary_client._resolve_api_key_provider", return_value=(None, None)):
+            client, model = get_vision_auxiliary_client()
+        assert client is None
+        assert model is None
+
+    def test_vision_forced_codex(self, monkeypatch, codex_auth_dir):
+        """When forced to 'codex', vision uses Codex OAuth."""
+        monkeypatch.setenv("AUXILIARY_VISION_PROVIDER", "codex")
+        with patch("agent.auxiliary_client._read_nous_auth", return_value=None), \
+             patch("agent.auxiliary_client.OpenAI"):
+            client, model = get_vision_auxiliary_client()
+        from agent.auxiliary_client import CodexAuxiliaryClient
+        assert isinstance(client, CodexAuxiliaryClient)
+        assert model == "gpt-5.2-codex"


 class TestGetAuxiliaryProvider:
@@ -838,6 +948,122 @@ class TestGetAuxiliaryProvider:
        assert _get_auxiliary_provider("web_extract") == "main"


+class TestResolveForcedProvider:
+    """Tests for _resolve_forced_provider with explicit provider selection."""
+
+    def test_forced_openrouter(self, monkeypatch):
+        monkeypatch.setenv("OPENROUTER_API_KEY", "or-key")
+        with patch("agent.auxiliary_client.OpenAI") as mock_openai:
+            client, model = _resolve_forced_provider("openrouter")
+        assert model == "google/gemini-3-flash-preview"
+        assert client is not None
+
+    def test_forced_openrouter_no_key(self, monkeypatch):
+        with patch("agent.auxiliary_client._read_nous_auth", return_value=None):
+            client, model = _resolve_forced_provider("openrouter")
+        assert client is None
+        assert model is None
+
+    def test_forced_nous(self, monkeypatch):
+        with patch("agent.auxiliary_client._read_nous_auth") as mock_nous, \
+             patch("agent.auxiliary_client.OpenAI"):
+            mock_nous.return_value = {"access_token": "nous-tok"}
+            client, model = _resolve_forced_provider("nous")
+        assert model == "google/gemini-3-flash-preview"
+        assert client is not None
+
+    def test_forced_nous_not_configured(self, monkeypatch):
+        with patch("agent.auxiliary_client._read_nous_auth", return_value=None):
+            client, model = _resolve_forced_provider("nous")
+        assert client is None
+        assert model is None
+
+    def test_forced_main_uses_custom(self, monkeypatch):
+        config = {
+            "model": {
+                "provider": "custom",
+                "base_url": "http://local:8080/v1",
+                "default": "my-local-model",
+            }
+        }
+        monkeypatch.setenv("OPENAI_API_KEY", "local-key")
+        monkeypatch.setattr("hermes_cli.config.load_config", lambda: config)
+        monkeypatch.setattr("hermes_cli.runtime_provider.load_config", lambda: config)
+        with patch("agent.auxiliary_client._read_nous_auth", return_value=None), \
+             patch("agent.auxiliary_client.OpenAI") as mock_openai:
+            client, model = _resolve_forced_provider("main")
+        assert model == "my-local-model"
+
+    def test_forced_main_uses_config_saved_custom_endpoint(self, monkeypatch):
+        config = {
+            "model": {
+                "provider": "custom",
+                "base_url": "http://local:8080/v1",
+                "default": "my-local-model",
+            }
+        }
+        monkeypatch.setenv("OPENAI_API_KEY", "local-key")
+        monkeypatch.setattr("hermes_cli.config.load_config", lambda: config)
+        monkeypatch.setattr("hermes_cli.runtime_provider.load_config", lambda: config)
+        with patch("agent.auxiliary_client._read_nous_auth", return_value=None), \
+             patch("agent.auxiliary_client._read_codex_access_token", return_value=None), \
+             patch("agent.auxiliary_client._resolve_api_key_provider", return_value=(None, None)), \
+             patch("agent.auxiliary_client.OpenAI") as mock_openai:
+            client, model = _resolve_forced_provider("main")
+        assert client is not None
+        assert model == "my-local-model"
+        call_kwargs = mock_openai.call_args
+        assert call_kwargs.kwargs["base_url"] == "http://local:8080/v1"
+
+    def test_forced_main_skips_openrouter_nous(self, monkeypatch):
+        """Even if OpenRouter key is set, 'main' skips it."""
+        config = {
+            "model": {
+                "provider": "custom",
+                "base_url": "http://local:8080/v1",
+                "default": "my-local-model",
+            }
+        }
+        monkeypatch.setenv("OPENROUTER_API_KEY", "or-key")
+        monkeypatch.setenv("OPENAI_API_KEY", "local-key")
+        monkeypatch.setattr("hermes_cli.config.load_config", lambda: config)
+        monkeypatch.setattr("hermes_cli.runtime_provider.load_config", lambda: config)
+        with patch("agent.auxiliary_client._read_nous_auth", return_value=None), \
+             patch("agent.auxiliary_client.OpenAI") as mock_openai:
+            client, model = _resolve_forced_provider("main")
+        # Should use custom endpoint, not OpenRouter
+        assert model == "my-local-model"
+
+    def test_forced_main_falls_to_codex(self, codex_auth_dir, monkeypatch):
+        with patch("agent.auxiliary_client._read_nous_auth", return_value=None), \
+             patch("agent.auxiliary_client.OpenAI"):
+            client, model = _resolve_forced_provider("main")
+        from agent.auxiliary_client import CodexAuxiliaryClient
+        assert isinstance(client, CodexAuxiliaryClient)
+        assert model == "gpt-5.2-codex"
+
+    def test_forced_codex(self, codex_auth_dir, monkeypatch):
+        with patch("agent.auxiliary_client._read_nous_auth", return_value=None), \
+             patch("agent.auxiliary_client.OpenAI"):
+            client, model = _resolve_forced_provider("codex")
+        from agent.auxiliary_client import CodexAuxiliaryClient
+        assert isinstance(client, CodexAuxiliaryClient)
+        assert model == "gpt-5.2-codex"
+
+    def test_forced_codex_no_token(self, monkeypatch):
+        with patch("agent.auxiliary_client._read_codex_access_token", return_value=None):
+            client, model = _resolve_forced_provider("codex")
+        assert client is None
+        assert model is None
+
+    def test_forced_unknown_returns_none(self, monkeypatch):
+        with patch("agent.auxiliary_client._read_nous_auth", return_value=None), \
+             patch("agent.auxiliary_client._read_codex_access_token", return_value=None):
+            client, model = _resolve_forced_provider("invalid-provider")
+        assert client is None
+        assert model is None
+
+
 class TestTaskSpecificOverrides:
    """Integration tests for per-task provider routing via get_text_auxiliary_client(task=...)."""

@@ -38,6 +38,16 @@ class TestShouldCompress:
        assert compressor.should_compress(prompt_tokens=50000) is False


+class TestShouldCompressPreflight:
+    def test_short_messages(self, compressor):
+        msgs = [{"role": "user", "content": "short"}]
+        assert compressor.should_compress_preflight(msgs) is False
+
+    def test_long_messages(self, compressor):
+        # Each message ~100k chars / 4 = 25k tokens, need >85k threshold
+        msgs = [{"role": "user", "content": "x" * 400000}]
+        assert compressor.should_compress_preflight(msgs) is True
+

 class TestUpdateFromResponse:
    def test_updates_fields(self, compressor):
@@ -48,12 +58,27 @@ class TestUpdateFromResponse:
        })
        assert compressor.last_prompt_tokens == 5000
        assert compressor.last_completion_tokens == 1000
+        assert compressor.last_total_tokens == 6000

    def test_missing_fields_default_zero(self, compressor):
        compressor.update_from_response({})
        assert compressor.last_prompt_tokens == 0


+class TestGetStatus:
+    def test_returns_expected_keys(self, compressor):
+        status = compressor.get_status()
+        assert "last_prompt_tokens" in status
+        assert "threshold_tokens" in status
+        assert "context_length" in status
+        assert "usage_percent" in status
+        assert "compression_count" in status
+
+    def test_usage_percent_calculation(self, compressor):
+        compressor.last_prompt_tokens = 50000
+        status = compressor.get_status()
+        assert status["usage_percent"] == 50.0
+

 class TestCompress:
    def _make_messages(self, n):
@@ -480,6 +480,39 @@ class TestClassifyApiError:
        result = classify_api_error(e)
        assert result.reason == FailoverReason.context_overflow

+    # ── Message-only usage limit disambiguation (no status code) ──
+
+    def test_message_usage_limit_transient_is_rate_limit(self):
+        """'usage limit' + 'try again' with no status code → rate_limit, not billing."""
+        e = Exception("usage limit exceeded, try again in 5 minutes")
+        result = classify_api_error(e)
+        assert result.reason == FailoverReason.rate_limit
+        assert result.retryable is True
+        assert result.should_rotate_credential is True
+        assert result.should_fallback is True
+
+    def test_message_usage_limit_no_retry_signal_is_billing(self):
+        """'usage limit' with no transient signal and no status code → billing."""
+        e = Exception("usage limit reached")
+        result = classify_api_error(e)
+        assert result.reason == FailoverReason.billing
+        assert result.retryable is False
+        assert result.should_rotate_credential is True
+
+    def test_message_quota_with_reset_window_is_rate_limit(self):
+        """'quota' + 'resets at' with no status code → rate_limit."""
+        e = Exception("quota exceeded, resets at midnight UTC")
+        result = classify_api_error(e)
+        assert result.reason == FailoverReason.rate_limit
+        assert result.retryable is True
+
+    def test_message_limit_exceeded_with_wait_is_rate_limit(self):
+        """'limit exceeded' + 'wait' with no status code → rate_limit."""
+        e = Exception("key limit exceeded, please wait before retrying")
+        result = classify_api_error(e)
+        assert result.reason == FailoverReason.rate_limit
+        assert result.retryable is True
+
    # ── Unknown / fallback ──

    def test_generic_exception_is_unknown(self):
@@ -7,6 +7,7 @@ from pathlib import Path
 from hermes_state import SessionDB
 from agent.insights import (
    InsightsEngine,
+    _get_pricing,
    _estimate_cost,
    _format_duration,
    _bar_chart,
@@ -117,6 +118,45 @@ def populated_db(db):
    return db


+# =========================================================================
+# Pricing helpers
+# =========================================================================
+
+class TestPricing:
+    def test_provider_prefix_stripped(self):
+        pricing = _get_pricing("anthropic/claude-sonnet-4-20250514")
+        assert pricing["input"] == 3.00
+        assert pricing["output"] == 15.00
+
+    def test_unknown_models_do_not_use_heuristics(self):
+        pricing = _get_pricing("some-new-opus-model")
+        assert pricing == _DEFAULT_PRICING
+        pricing = _get_pricing("anthropic/claude-haiku-future")
+        assert pricing == _DEFAULT_PRICING
+
+    def test_unknown_model_returns_zero_cost(self):
+        """Unknown/custom models should NOT have fabricated costs."""
+        pricing = _get_pricing("totally-unknown-model-xyz")
+        assert pricing == _DEFAULT_PRICING
+        assert pricing["input"] == 0.0
+        assert pricing["output"] == 0.0
+
+    def test_custom_endpoint_model_zero_cost(self):
+        """Self-hosted models should return zero cost."""
+        for model in ["FP16_Hermes_4.5", "Hermes_4.5_1T_epoch2", "my-local-llama"]:
+            pricing = _get_pricing(model)
+            assert pricing["input"] == 0.0, f"{model} should have zero cost"
+            assert pricing["output"] == 0.0, f"{model} should have zero cost"
+
+    def test_none_model(self):
+        pricing = _get_pricing(None)
+        assert pricing == _DEFAULT_PRICING
+
+    def test_empty_model(self):
+        pricing = _get_pricing("")
+        assert pricing == _DEFAULT_PRICING
+
+
 class TestHasKnownPricing:
    def test_known_commercial_model(self):
        assert _has_known_pricing("gpt-4o", provider="openai") is True
@@ -0,0 +1,70 @@
+"""Tests for local provider stream read timeout auto-detection.
+
+When a local LLM provider is detected (Ollama, llama.cpp, vLLM, etc.),
+the httpx stream read timeout should be automatically increased from the
+default 60s to HERMES_API_TIMEOUT (1800s) to avoid premature connection
+kills during long prefill phases.
+"""
+
+import os
+import pytest
+from unittest.mock import patch
+
+from agent.model_metadata import is_local_endpoint
+
+
+class TestLocalStreamReadTimeout:
+    """Verify stream read timeout auto-detection logic."""
+
+    @pytest.mark.parametrize("base_url", [
+        "http://localhost:11434",
+        "http://127.0.0.1:8080",
+        "http://0.0.0.0:5000",
+        "http://192.168.1.100:8000",
+        "http://10.0.0.5:1234",
+    ])
+    def test_local_endpoint_bumps_read_timeout(self, base_url):
+        """Local endpoint + default timeout -> bumps to base_timeout."""
+        with patch.dict(os.environ, {}, clear=False):
+            os.environ.pop("HERMES_STREAM_READ_TIMEOUT", None)
+            _base_timeout = float(os.getenv("HERMES_API_TIMEOUT", 1800.0))
+            _stream_read_timeout = float(os.getenv("HERMES_STREAM_READ_TIMEOUT", 120.0))
+            if _stream_read_timeout == 120.0 and base_url and is_local_endpoint(base_url):
+                _stream_read_timeout = _base_timeout
+            assert _stream_read_timeout == 1800.0
+
+    def test_user_override_respected_for_local(self):
+        """User sets HERMES_STREAM_READ_TIMEOUT -> keep their value even for local."""
+        with patch.dict(os.environ, {"HERMES_STREAM_READ_TIMEOUT": "300"}, clear=False):
+            _base_timeout = float(os.getenv("HERMES_API_TIMEOUT", 1800.0))
+            _stream_read_timeout = float(os.getenv("HERMES_STREAM_READ_TIMEOUT", 120.0))
+            base_url = "http://localhost:11434"
+            if _stream_read_timeout == 120.0 and base_url and is_local_endpoint(base_url):
+                _stream_read_timeout = _base_timeout
+            assert _stream_read_timeout == 300.0
+
+    @pytest.mark.parametrize("base_url", [
+        "https://api.openai.com",
+        "https://openrouter.ai/api",
+        "https://api.anthropic.com",
+    ])
+    def test_remote_endpoint_keeps_default(self, base_url):
+        """Remote endpoint -> keep 120s default."""
+        with patch.dict(os.environ, {}, clear=False):
+            os.environ.pop("HERMES_STREAM_READ_TIMEOUT", None)
+            _base_timeout = float(os.getenv("HERMES_API_TIMEOUT", 1800.0))
+            _stream_read_timeout = float(os.getenv("HERMES_STREAM_READ_TIMEOUT", 120.0))
+            if _stream_read_timeout == 120.0 and base_url and is_local_endpoint(base_url):
+                _stream_read_timeout = _base_timeout
+            assert _stream_read_timeout == 120.0
+
+    def test_empty_base_url_keeps_default(self):
+        """No base_url set -> keep 120s default."""
+        with patch.dict(os.environ, {}, clear=False):
+            os.environ.pop("HERMES_STREAM_READ_TIMEOUT", None)
+            _base_timeout = float(os.getenv("HERMES_API_TIMEOUT", 1800.0))
+            _stream_read_timeout = float(os.getenv("HERMES_STREAM_READ_TIMEOUT", 120.0))
+            base_url = ""
+            if _stream_read_timeout == 120.0 and base_url and is_local_endpoint(base_url):
+                _stream_read_timeout = _base_timeout
+            assert _stream_read_timeout == 120.0
@@ -0,0 +1,299 @@
+"""End-to-end test: a SQLite-backed memory plugin exercising the full interface.
+
+This proves a real plugin can register as a MemoryProvider and get wired
+into the agent loop via MemoryManager. Uses SQLite + FTS5 (stdlib, no
+external deps, no API keys).
+"""
+
+import json
+import os
+import sqlite3
+import tempfile
+import pytest
+from unittest.mock import patch, MagicMock
+
+from agent.memory_provider import MemoryProvider
+from agent.memory_manager import MemoryManager
+from agent.builtin_memory_provider import BuiltinMemoryProvider
+
+
+# ---------------------------------------------------------------------------
+# SQLite FTS5 memory provider — a real, minimal plugin implementation
+# ---------------------------------------------------------------------------
+
+
+class SQLiteMemoryProvider(MemoryProvider):
+    """Minimal SQLite + FTS5 memory provider for testing.
+
+    Demonstrates the full MemoryProvider interface with a real backend.
+    No external dependencies — just stdlib sqlite3.
+    """
+
+    def __init__(self, db_path: str = ":memory:"):
+        self._db_path = db_path
+        self._conn = None
+
+    @property
+    def name(self) -> str:
+        return "sqlite_memory"
+
+    def is_available(self) -> bool:
+        return True  # SQLite is always available
+
+    def initialize(self, session_id: str, **kwargs) -> None:
+        self._conn = sqlite3.connect(self._db_path)
+        self._conn.execute("PRAGMA journal_mode=WAL")
+        self._conn.execute("""
+            CREATE VIRTUAL TABLE IF NOT EXISTS memories
+            USING fts5(content, context, session_id)
+        """)
+        self._session_id = session_id
+
+    def system_prompt_block(self) -> str:
+        if not self._conn:
+            return ""
+        count = self._conn.execute("SELECT COUNT(*) FROM memories").fetchone()[0]
+        if count == 0:
+            return ""
+        return (
+            f"# SQLite Memory Plugin\n"
+            f"Active. {count} memories stored.\n"
+            f"Use sqlite_recall to search, sqlite_retain to store."
+        )
+
+    def prefetch(self, query: str, *, session_id: str = "") -> str:
+        if not self._conn or not query:
+            return ""
+        # FTS5 search
+        try:
+            rows = self._conn.execute(
+                "SELECT content FROM memories WHERE memories MATCH ? LIMIT 5",
+                (query,)
+            ).fetchall()
+            if not rows:
+                return ""
+            results = [row[0] for row in rows]
+            return "## SQLite Memory\n" + "\n".join(f"- {r}" for r in results)
+        except sqlite3.OperationalError:
+            return ""
+
+    def sync_turn(self, user_content: str, assistant_content: str, *, session_id: str = "") -> None:
+        if not self._conn:
+            return
+        combined = f"User: {user_content}\nAssistant: {assistant_content}"
+        self._conn.execute(
+            "INSERT INTO memories (content, context, session_id) VALUES (?, ?, ?)",
+            (combined, "conversation", self._session_id),
+        )
+        self._conn.commit()
+
+    def get_tool_schemas(self):
+        return [
+            {
+                "name": "sqlite_retain",
+                "description": "Store a fact to SQLite memory.",
+                "parameters": {
+                    "type": "object",
+                    "properties": {
+                        "content": {"type": "string", "description": "What to remember"},
+                        "context": {"type": "string", "description": "Category/context"},
+                    },
+                    "required": ["content"],
+                },
+            },
+            {
+                "name": "sqlite_recall",
+                "description": "Search SQLite memory.",
+                "parameters": {
+                    "type": "object",
+                    "properties": {
+                        "query": {"type": "string", "description": "Search query"},
+                    },
+                    "required": ["query"],
+                },
+            },
+        ]
+
+    def handle_tool_call(self, tool_name: str, args: dict, **kwargs) -> str:
+        if tool_name == "sqlite_retain":
+            content = args.get("content", "")
+            context = args.get("context", "explicit")
+            if not content:
+                return json.dumps({"error": "content is required"})
+            self._conn.execute(
+                "INSERT INTO memories (content, context, session_id) VALUES (?, ?, ?)",
+                (content, context, self._session_id),
+            )
+            self._conn.commit()
+            return json.dumps({"result": "Stored."})
+
+        elif tool_name == "sqlite_recall":
+            query = args.get("query", "")
+            if not query:
+                return json.dumps({"error": "query is required"})
+            try:
+                rows = self._conn.execute(
+                    "SELECT content, context FROM memories WHERE memories MATCH ? LIMIT 10",
+                    (query,)
+                ).fetchall()
+                results = [{"content": r[0], "context": r[1]} for r in rows]
+                return json.dumps({"results": results})
+            except sqlite3.OperationalError:
+                return json.dumps({"results": []})
+
+        return json.dumps({"error": f"Unknown tool: {tool_name}"})
+
+    def on_memory_write(self, action, target, content):
+        """Mirror built-in memory writes to SQLite."""
+        if action == "add" and self._conn:
+            self._conn.execute(
+                "INSERT INTO memories (content, context, session_id) VALUES (?, ?, ?)",
+                (content, f"builtin_{target}", self._session_id),
+            )
+            self._conn.commit()
+
+    def shutdown(self):
+        if self._conn:
+            self._conn.close()
+            self._conn = None
+
+
+# ---------------------------------------------------------------------------
+# End-to-end tests
+# ---------------------------------------------------------------------------
+
+
+class TestSQLiteMemoryPlugin:
+    """Full lifecycle test with the SQLite provider."""
+
+    def test_full_lifecycle(self):
+        """Exercise init → store → recall → sync → prefetch → shutdown."""
+        mgr = MemoryManager()
+        builtin = BuiltinMemoryProvider()
+        sqlite_mem = SQLiteMemoryProvider()
+
+        mgr.add_provider(builtin)
+        mgr.add_provider(sqlite_mem)
+
+        # Initialize
+        mgr.initialize_all(session_id="test-session-1", platform="cli")
+        assert sqlite_mem._conn is not None
+
+        # System prompt — empty at first
+        prompt = mgr.build_system_prompt()
+        assert "SQLite Memory Plugin" not in prompt
+
+        # Store via tool call
+        result = json.loads(mgr.handle_tool_call(
+            "sqlite_retain", {"content": "User prefers dark mode", "context": "preference"}
+        ))
+        assert result["result"] == "Stored."
+
+        # System prompt now shows count
+        prompt = mgr.build_system_prompt()
+        assert "1 memories stored" in prompt
+
+        # Recall via tool call
+        result = json.loads(mgr.handle_tool_call(
+            "sqlite_recall", {"query": "dark mode"}
+        ))
+        assert len(result["results"]) == 1
+        assert "dark mode" in result["results"][0]["content"]
+
+        # Sync a turn (auto-stores conversation)
+        mgr.sync_all("What's my theme?", "You prefer dark mode.")
+        count = sqlite_mem._conn.execute("SELECT COUNT(*) FROM memories").fetchone()[0]
+        assert count == 2  # 1 explicit + 1 synced
+
+        # Prefetch for next turn
+        prefetched = mgr.prefetch_all("dark mode")
+        assert "dark mode" in prefetched
+
+        # Memory bridge — mirroring builtin writes
+        mgr.on_memory_write("add", "user", "Timezone: US Pacific")
+        count = sqlite_mem._conn.execute("SELECT COUNT(*) FROM memories").fetchone()[0]
+        assert count == 3
+
+        # Shutdown
+        mgr.shutdown_all()
+        assert sqlite_mem._conn is None
+
+    def test_tool_routing_with_builtin(self):
+        """Verify builtin + plugin tools coexist without conflict."""
+        mgr = MemoryManager()
+        builtin = BuiltinMemoryProvider()
+        sqlite_mem = SQLiteMemoryProvider()
+        mgr.add_provider(builtin)
+        mgr.add_provider(sqlite_mem)
+        mgr.initialize_all(session_id="test-2")
+
+        # Builtin has no tools
+        assert len(builtin.get_tool_schemas()) == 0
+        # SQLite has 2 tools
+        schemas = mgr.get_all_tool_schemas()
+        names = {s["name"] for s in schemas}
+        assert names == {"sqlite_retain", "sqlite_recall"}
+
+        # Routing works
+        assert mgr.has_tool("sqlite_retain")
+        assert mgr.has_tool("sqlite_recall")
+        assert not mgr.has_tool("memory")  # builtin doesn't register this
+
+    def test_second_external_plugin_rejected(self):
+        """Only one external memory provider is allowed at a time."""
+        mgr = MemoryManager()
+        p1 = SQLiteMemoryProvider()
+        p2 = SQLiteMemoryProvider()
+        # Hack name for p2
+        p2._name_override = "sqlite_memory_2"
+        original_name = p2.__class__.name
+        type(p2).name = property(lambda self: getattr(self, '_name_override', 'sqlite_memory'))
+
+        mgr.add_provider(p1)
+        mgr.add_provider(p2)  # should be rejected
+
+        # Only p1 was accepted
+        assert len(mgr.providers) == 1
+        assert mgr.provider_names == ["sqlite_memory"]
+
+        # Restore class
+        type(p2).name = original_name
+        mgr.shutdown_all()
+
+    def test_provider_failure_isolation(self):
+        """Failing external provider doesn't break builtin."""
+        from agent.builtin_memory_provider import BuiltinMemoryProvider
+
+        mgr = MemoryManager()
+        builtin = BuiltinMemoryProvider()  # name="builtin", always accepted
+        ext = SQLiteMemoryProvider()
+
+        mgr.add_provider(builtin)
+        mgr.add_provider(ext)
+        mgr.initialize_all(session_id="test-4")
+
+        # Break external provider's connection
+        ext._conn.close()
+        ext._conn = None
+
+        # Sync — external fails silently, builtin (no-op sync) succeeds
+        mgr.sync_all("user", "assistant")  # should not raise
+
+        mgr.shutdown_all()
+
+    def test_plugin_registration_flow(self):
+        """Simulate the full plugin load → agent init path."""
+        # Simulate what AIAgent.__init__ does via plugins/memory/ discovery
+        provider = SQLiteMemoryProvider()
+
+        mem_mgr = MemoryManager()
+        mem_mgr.add_provider(BuiltinMemoryProvider())
+        if provider.is_available():
+            mem_mgr.add_provider(provider)
+        mem_mgr.initialize_all(session_id="agent-session")
+
+        assert len(mem_mgr.providers) == 2
+        assert mem_mgr.provider_names == ["builtin", "sqlite_memory"]
+        assert provider._conn is not None  # initialized = connection established
+
+        mem_mgr.shutdown_all()
@@ -6,6 +6,8 @@ from unittest.mock import MagicMock, patch

 from agent.memory_provider import MemoryProvider
 from agent.memory_manager import MemoryManager
+from agent.builtin_memory_provider import BuiltinMemoryProvider
+

 # ---------------------------------------------------------------------------
 # Concrete test provider
@@ -116,7 +118,7 @@ class TestMemoryManager:
    def test_empty_manager(self):
        mgr = MemoryManager()
        assert mgr.providers == []
-        assert [p.name for p in mgr.providers] == []
+        assert mgr.provider_names == []
        assert mgr.get_all_tool_schemas() == []
        assert mgr.build_system_prompt() == ""
        assert mgr.prefetch_all("test") == ""
@@ -126,7 +128,7 @@ class TestMemoryManager:
        p = FakeMemoryProvider("test1")
        mgr.add_provider(p)
        assert len(mgr.providers) == 1
-        assert [p.name for p in mgr.providers] == ["test1"]
+        assert mgr.provider_names == ["test1"]

    def test_get_provider_by_name(self):
        mgr = MemoryManager()
@@ -141,7 +143,7 @@ class TestMemoryManager:
        p2 = FakeMemoryProvider("external")
        mgr.add_provider(p1)
        mgr.add_provider(p2)
-        assert [p.name for p in mgr.providers] == ["builtin", "external"]
+        assert mgr.provider_names == ["builtin", "external"]

    def test_second_external_rejected(self):
        """Only one non-builtin provider is allowed."""
@@ -152,7 +154,7 @@ class TestMemoryManager:
        mgr.add_provider(builtin)
        mgr.add_provider(ext1)
        mgr.add_provider(ext2)  # should be rejected
-        assert [p.name for p in mgr.providers] == ["builtin", "mem0"]
+        assert mgr.provider_names == ["builtin", "mem0"]
        assert len(mgr.providers) == 2

    def test_system_prompt_merges_blocks(self):
@@ -319,6 +321,17 @@ class TestMemoryManager:
        mgr.on_pre_compress([{"role": "user", "content": "old"}])
        assert p.pre_compress_called

+    def test_on_memory_write_skips_builtin(self):
+        """on_memory_write should skip the builtin provider."""
+        mgr = MemoryManager()
+        builtin = BuiltinMemoryProvider()
+        external = FakeMemoryProvider("external")
+        mgr.add_provider(builtin)
+        mgr.add_provider(external)
+
+        mgr.on_memory_write("add", "memory", "test fact")
+        assert external.memory_writes == [("add", "memory", "test fact")]
+
    def test_shutdown_all_reverse_order(self):
        mgr = MemoryManager()
        order = []
@@ -372,6 +385,146 @@ class TestMemoryManager:
        assert result == "works fine"


+# ---------------------------------------------------------------------------
+# BuiltinMemoryProvider tests
+# ---------------------------------------------------------------------------
+
+
+class TestBuiltinMemoryProvider:
+    def test_name(self):
+        p = BuiltinMemoryProvider()
+        assert p.name == "builtin"
+
+    def test_always_available(self):
+        p = BuiltinMemoryProvider()
+        assert p.is_available()
+
+    def test_no_tools(self):
+        """Builtin provider exposes no tools (memory tool is agent-level)."""
+        p = BuiltinMemoryProvider()
+        assert p.get_tool_schemas() == []
+
+    def test_system_prompt_with_store(self):
+        store = MagicMock()
+        store.format_for_system_prompt.side_effect = lambda t: f"BLOCK_{t}" if t == "memory" else f"BLOCK_{t}"
+
+        p = BuiltinMemoryProvider(
+            memory_store=store,
+            memory_enabled=True,
+            user_profile_enabled=True,
+        )
+        block = p.system_prompt_block()
+        assert "BLOCK_memory" in block
+        assert "BLOCK_user" in block
+
+    def test_system_prompt_memory_disabled(self):
+        store = MagicMock()
+        store.format_for_system_prompt.return_value = "content"
+
+        p = BuiltinMemoryProvider(
+            memory_store=store,
+            memory_enabled=False,
+            user_profile_enabled=False,
+        )
+        assert p.system_prompt_block() == ""
+
+    def test_system_prompt_no_store(self):
+        p = BuiltinMemoryProvider(memory_store=None, memory_enabled=True)
+        assert p.system_prompt_block() == ""
+
+    def test_prefetch_returns_empty(self):
+        p = BuiltinMemoryProvider()
+        assert p.prefetch("anything") == ""
+
+    def test_store_property(self):
+        store = MagicMock()
+        p = BuiltinMemoryProvider(memory_store=store)
+        assert p.store is store
+
+    def test_initialize_loads_from_disk(self):
+        store = MagicMock()
+        p = BuiltinMemoryProvider(memory_store=store)
+        p.initialize(session_id="test")
+        store.load_from_disk.assert_called_once()
+
+
+# ---------------------------------------------------------------------------
+# Plugin registration tests
+# ---------------------------------------------------------------------------
+
+
+class TestSingleProviderGating:
+    """Only the configured provider should activate."""
+
+    def test_no_provider_configured_means_builtin_only(self):
+        """When memory.provider is empty, no plugin providers activate."""
+        mgr = MemoryManager()
+        builtin = BuiltinMemoryProvider()
+        mgr.add_provider(builtin)
+
+        # Simulate what run_agent.py does when provider="" 
+        configured = ""
+        available_plugins = [
+            FakeMemoryProvider("holographic"),
+            FakeMemoryProvider("mem0"),
+        ]
+        # With empty config, no plugins should be added
+        if configured:
+            for p in available_plugins:
+                if p.name == configured and p.is_available():
+                    mgr.add_provider(p)
+
+        assert mgr.provider_names == ["builtin"]
+
+    def test_configured_provider_activates(self):
+        """Only the named provider should be added."""
+        mgr = MemoryManager()
+        builtin = BuiltinMemoryProvider()
+        mgr.add_provider(builtin)
+
+        configured = "holographic"
+        p1 = FakeMemoryProvider("holographic")
+        p2 = FakeMemoryProvider("mem0")
+        p3 = FakeMemoryProvider("hindsight")
+
+        for p in [p1, p2, p3]:
+            if p.name == configured and p.is_available():
+                mgr.add_provider(p)
+
+        assert mgr.provider_names == ["builtin", "holographic"]
+        assert p1.initialized is False  # not initialized by the gating logic itself
+
+    def test_unavailable_provider_skipped(self):
+        """If the configured provider is unavailable, it should be skipped."""
+        mgr = MemoryManager()
+        builtin = BuiltinMemoryProvider()
+        mgr.add_provider(builtin)
+
+        configured = "holographic"
+        p1 = FakeMemoryProvider("holographic", available=False)
+
+        for p in [p1]:
+            if p.name == configured and p.is_available():
+                mgr.add_provider(p)
+
+        assert mgr.provider_names == ["builtin"]
+
+    def test_nonexistent_provider_results_in_builtin_only(self):
+        """If the configured name doesn't match any plugin, only builtin remains."""
+        mgr = MemoryManager()
+        builtin = BuiltinMemoryProvider()
+        mgr.add_provider(builtin)
+
+        configured = "nonexistent"
+        plugins = [FakeMemoryProvider("holographic"), FakeMemoryProvider("mem0")]
+
+        for p in plugins:
+            if p.name == configured and p.is_available():
+                mgr.add_provider(p)
+
+        assert mgr.provider_names == ["builtin"]
+
+
 class TestPluginMemoryDiscovery:
    """Memory providers are discovered from plugins/memory/ directory."""

@@ -1,4 +1,6 @@
-"""Tests for MiniMax provider hardening — context lengths, thinking guard, catalog."""
+"""Tests for MiniMax provider hardening — context lengths, thinking guard, catalog, beta headers."""
+
+from unittest.mock import patch


 class TestMinimaxContextLengths:
@@ -103,3 +105,100 @@ class TestMinimaxModelCatalog:
            models = _PROVIDER_MODELS[provider]
            assert "MiniMax-M2.7-highspeed" not in models
            assert "MiniMax-M2.5-highspeed" not in models
+
+
+class TestMinimaxBetaHeaders:
+    """MiniMax Anthropic-compat endpoints reject fine-grained-tool-streaming beta.
+
+    Verify that build_anthropic_client omits the tool-streaming beta for MiniMax
+    (both global and China domains) while keeping it for native Anthropic and
+    other third-party endpoints.  Covers the fix for #6510 / #6555.
+    """
+
+    _TOOL_BETA = "fine-grained-tool-streaming-2025-05-14"
+    _THINKING_BETA = "interleaved-thinking-2025-05-14"
+
+    # -- helper ----------------------------------------------------------
+
+    def _build_and_get_betas(self, api_key, base_url=None):
+        """Build client, return the anthropic-beta header string."""
+        from agent.anthropic_adapter import build_anthropic_client
+        with patch("agent.anthropic_adapter._anthropic_sdk") as mock_sdk:
+            build_anthropic_client(api_key, base_url=base_url)
+            kwargs = mock_sdk.Anthropic.call_args[1]
+            headers = kwargs.get("default_headers", {})
+            return headers.get("anthropic-beta", "")
+
+    # -- MiniMax global --------------------------------------------------
+
+    def test_minimax_global_omits_tool_streaming(self):
+        betas = self._build_and_get_betas(
+            "mm-key-123", base_url="https://api.minimax.io/anthropic"
+        )
+        assert self._TOOL_BETA not in betas
+        assert self._THINKING_BETA in betas
+
+    def test_minimax_global_trailing_slash(self):
+        betas = self._build_and_get_betas(
+            "mm-key-123", base_url="https://api.minimax.io/anthropic/"
+        )
+        assert self._TOOL_BETA not in betas
+
+    # -- MiniMax China ---------------------------------------------------
+
+    def test_minimax_cn_omits_tool_streaming(self):
+        betas = self._build_and_get_betas(
+            "mm-cn-key-456", base_url="https://api.minimaxi.com/anthropic"
+        )
+        assert self._TOOL_BETA not in betas
+        assert self._THINKING_BETA in betas
+
+    def test_minimax_cn_trailing_slash(self):
+        betas = self._build_and_get_betas(
+            "mm-cn-key-456", base_url="https://api.minimaxi.com/anthropic/"
+        )
+        assert self._TOOL_BETA not in betas
+
+    # -- Non-MiniMax keeps full betas ------------------------------------
+
+    def test_native_anthropic_keeps_tool_streaming(self):
+        betas = self._build_and_get_betas("sk-ant-api03-real-key-here")
+        assert self._TOOL_BETA in betas
+        assert self._THINKING_BETA in betas
+
+    def test_third_party_proxy_keeps_tool_streaming(self):
+        betas = self._build_and_get_betas(
+            "custom-key", base_url="https://my-proxy.example.com/anthropic"
+        )
+        assert self._TOOL_BETA in betas
+
+    def test_custom_base_url_keeps_tool_streaming(self):
+        betas = self._build_and_get_betas(
+            "custom-key", base_url="https://custom.api.com"
+        )
+        assert self._TOOL_BETA in betas
+
+    # -- _common_betas_for_base_url unit tests ---------------------------
+
+    def test_common_betas_none_url(self):
+        from agent.anthropic_adapter import _common_betas_for_base_url, _COMMON_BETAS
+        assert _common_betas_for_base_url(None) == _COMMON_BETAS
+
+    def test_common_betas_empty_url(self):
+        from agent.anthropic_adapter import _common_betas_for_base_url, _COMMON_BETAS
+        assert _common_betas_for_base_url("") == _COMMON_BETAS
+
+    def test_common_betas_minimax_url(self):
+        from agent.anthropic_adapter import _common_betas_for_base_url, _TOOL_STREAMING_BETA
+        betas = _common_betas_for_base_url("https://api.minimax.io/anthropic")
+        assert _TOOL_STREAMING_BETA not in betas
+        assert len(betas) > 0  # still has other betas
+
+    def test_common_betas_minimax_cn_url(self):
+        from agent.anthropic_adapter import _common_betas_for_base_url, _TOOL_STREAMING_BETA
+        betas = _common_betas_for_base_url("https://api.minimaxi.com/anthropic")
+        assert _TOOL_STREAMING_BETA not in betas
+
+    def test_common_betas_regular_url(self):
+        from agent.anthropic_adapter import _common_betas_for_base_url, _COMMON_BETAS
+        assert _common_betas_for_base_url("https://api.anthropic.com") == _COMMON_BETAS
@@ -11,6 +11,7 @@ from agent.prompt_builder import (
    _scan_context_content,
    _truncate_content,
    _parse_skill_file,
+    _read_skill_conditions,
    _skill_should_show,
    _find_hermes_md,
    _find_git_root,
@@ -774,6 +775,61 @@ class TestPromptBuilderConstants:
 # Conditional skill activation
 # =========================================================================

+class TestReadSkillConditions:
+    def test_no_conditions_returns_empty_lists(self, tmp_path):
+        skill_file = tmp_path / "SKILL.md"
+        skill_file.write_text("---\nname: test\ndescription: A skill\n---\n")
+        conditions = _read_skill_conditions(skill_file)
+        assert conditions["fallback_for_toolsets"] == []
+        assert conditions["requires_toolsets"] == []
+        assert conditions["fallback_for_tools"] == []
+        assert conditions["requires_tools"] == []
+
+    def test_reads_fallback_for_toolsets(self, tmp_path):
+        skill_file = tmp_path / "SKILL.md"
+        skill_file.write_text(
+            "---\nname: ddg\ndescription: DuckDuckGo\nmetadata:\n  hermes:\n    fallback_for_toolsets: [web]\n---\n"
+        )
+        conditions = _read_skill_conditions(skill_file)
+        assert conditions["fallback_for_toolsets"] == ["web"]
+
+    def test_reads_requires_toolsets(self, tmp_path):
+        skill_file = tmp_path / "SKILL.md"
+        skill_file.write_text(
+            "---\nname: openhue\ndescription: Hue lights\nmetadata:\n  hermes:\n    requires_toolsets: [terminal]\n---\n"
+        )
+        conditions = _read_skill_conditions(skill_file)
+        assert conditions["requires_toolsets"] == ["terminal"]
+
+    def test_reads_multiple_conditions(self, tmp_path):
+        skill_file = tmp_path / "SKILL.md"
+        skill_file.write_text(
+            "---\nname: test\ndescription: Test\nmetadata:\n  hermes:\n    fallback_for_toolsets: [browser]\n    requires_tools: [terminal]\n---\n"
+        )
+        conditions = _read_skill_conditions(skill_file)
+        assert conditions["fallback_for_toolsets"] == ["browser"]
+        assert conditions["requires_tools"] == ["terminal"]
+
+    def test_missing_file_returns_empty(self, tmp_path):
+        conditions = _read_skill_conditions(tmp_path / "missing.md")
+        assert conditions == {}
+
+    def test_logs_condition_read_failures_and_returns_empty(self, tmp_path, monkeypatch, caplog):
+        skill_file = tmp_path / "SKILL.md"
+        skill_file.write_text("---\nname: broken\n---\n")
+
+        def boom(*args, **kwargs):
+            raise OSError("read exploded")
+
+        monkeypatch.setattr(type(skill_file), "read_text", boom)
+        with caplog.at_level(logging.DEBUG, logger="agent.prompt_builder"):
+            conditions = _read_skill_conditions(skill_file)
+
+        assert conditions == {}
+        assert "Failed to read skill conditions" in caplog.text
+        assert str(skill_file) in caplog.text
+
+
 class TestSkillShouldShow:
    def test_no_filter_info_always_shows(self):
        assert _skill_should_show({}, None, None) is True
@@ -147,6 +147,20 @@ class TestEscapedSpaces:
        assert result["path"] == tmp_image_with_spaces
        assert result["remainder"] == "what is this?"

+    def test_tilde_prefixed_path(self, tmp_path, monkeypatch):
+        home = tmp_path / "home"
+        img = home / "storage" / "shared" / "Pictures" / "cat.png"
+        img.parent.mkdir(parents=True, exist_ok=True)
+        img.write_bytes(b"\x89PNG\r\n\x1a\n")
+        monkeypatch.setenv("HOME", str(home))
+
+        result = _detect_file_drop("~/storage/shared/Pictures/cat.png what is this?")
+
+        assert result is not None
+        assert result["path"] == img
+        assert result["is_image"] is True
+        assert result["remainder"] == "what is this?"
+

 # ---------------------------------------------------------------------------
 # Tests: edge cases
@@ -0,0 +1,109 @@
+from pathlib import Path
+from unittest.mock import patch
+
+from cli import (
+    HermesCLI,
+    _collect_query_images,
+    _format_image_attachment_badges,
+    _termux_example_image_path,
+)
+
+
+def _make_cli():
+    cli_obj = HermesCLI.__new__(HermesCLI)
+    cli_obj._attached_images = []
+    return cli_obj
+
+
+def _make_image(path: Path) -> Path:
+    path.parent.mkdir(parents=True, exist_ok=True)
+    path.write_bytes(b"\x89PNG\r\n\x1a\n")
+    return path
+
+
+class TestImageCommand:
+    def test_handle_image_command_attaches_local_image(self, tmp_path):
+        img = _make_image(tmp_path / "photo.png")
+        cli_obj = _make_cli()
+
+        with patch("cli._cprint"):
+            cli_obj._handle_image_command(f"/image {img}")
+
+        assert cli_obj._attached_images == [img]
+
+    def test_handle_image_command_supports_quoted_path_with_spaces(self, tmp_path):
+        img = _make_image(tmp_path / "my photo.png")
+        cli_obj = _make_cli()
+
+        with patch("cli._cprint"):
+            cli_obj._handle_image_command(f'/image "{img}"')
+
+        assert cli_obj._attached_images == [img]
+
+    def test_handle_image_command_rejects_non_image_file(self, tmp_path):
+        file_path = tmp_path / "notes.txt"
+        file_path.write_text("hello\n", encoding="utf-8")
+        cli_obj = _make_cli()
+
+        with patch("cli._cprint") as mock_print:
+            cli_obj._handle_image_command(f"/image {file_path}")
+
+        assert cli_obj._attached_images == []
+        rendered = " ".join(str(arg) for call in mock_print.call_args_list for arg in call.args)
+        assert "Not a supported image file" in rendered
+
+
+class TestCollectQueryImages:
+    def test_collect_query_images_accepts_explicit_image_arg(self, tmp_path):
+        img = _make_image(tmp_path / "diagram.png")
+
+        message, images = _collect_query_images("describe this", str(img))
+
+        assert message == "describe this"
+        assert images == [img]
+
+    def test_collect_query_images_extracts_leading_path(self, tmp_path):
+        img = _make_image(tmp_path / "camera.png")
+
+        message, images = _collect_query_images(f"{img} what do you see?")
+
+        assert message == "what do you see?"
+        assert images == [img]
+
+    def test_collect_query_images_supports_tilde_paths(self, tmp_path, monkeypatch):
+        home = tmp_path / "home"
+        img = _make_image(home / "storage" / "shared" / "Pictures" / "cat.png")
+        monkeypatch.setenv("HOME", str(home))
+
+        message, images = _collect_query_images("describe this", "~/storage/shared/Pictures/cat.png")
+
+        assert message == "describe this"
+        assert images == [img]
+
+
+class TestTermuxImageHints:
+    def test_termux_example_image_path_prefers_real_shared_storage_root(self, monkeypatch):
+        existing = {"/sdcard", "/storage/emulated/0"}
+        monkeypatch.setattr("cli.os.path.isdir", lambda path: path in existing)
+
+        hint = _termux_example_image_path()
+
+        assert hint == "/sdcard/Pictures/cat.png"
+
+
+class TestImageBadgeFormatting:
+    def test_compact_badges_use_filename_on_narrow_terminals(self, tmp_path):
+        img = _make_image(tmp_path / "Screenshot 2026-04-09 at 11.22.33 AM.png")
+
+        badges = _format_image_attachment_badges([img], image_counter=1, width=40)
+
+        assert badges.startswith("[📎 ")
+        assert "Image #1" not in badges
+
+    def test_compact_badges_summarize_multiple_images(self, tmp_path):
+        img1 = _make_image(tmp_path / "one.png")
+        img2 = _make_image(tmp_path / "two.png")
+
+        badges = _format_image_attachment_badges([img1, img2], image_counter=2, width=45)
+
+        assert badges == "[📎 2 images attached]"
@@ -49,6 +49,25 @@ class TestCliSkinPromptIntegration:
        set_active_skin("ares")
        assert cli._get_tui_prompt_fragments() == [("class:sudo-prompt", "🔑 ❯ ")]

+    def test_narrow_terminals_compact_voice_prompt_fragments(self):
+        cli = _make_cli_stub()
+        cli._voice_mode = True
+
+        with patch.object(HermesCLI, "_get_tui_terminal_width", return_value=50):
+            assert cli._get_tui_prompt_fragments() == [("class:voice-prompt", "🎤 ")]
+
+    def test_narrow_terminals_compact_voice_recording_prompt_fragments(self):
+        cli = _make_cli_stub()
+        cli._voice_recording = True
+        cli._voice_recorder = SimpleNamespace(current_rms=3000)
+
+        with patch.object(HermesCLI, "_get_tui_terminal_width", return_value=50):
+            frags = cli._get_tui_prompt_fragments()
+
+        assert frags[0][0] == "class:voice-recording"
+        assert frags[0][1].startswith("●")
+        assert "❯" not in frags[0][1]
+
    def test_icon_only_skin_symbol_still_visible_in_special_states(self):
        cli = _make_cli_stub()
        cli._secret_state = {"response_queue": object()}
@@ -206,6 +206,59 @@ class TestCLIStatusBar:
        assert "⚕" in text
        assert "claude-sonnet-4-20250514" in text

+    def test_minimal_tui_chrome_threshold(self):
+        cli_obj = _make_cli()
+
+        assert cli_obj._use_minimal_tui_chrome(width=63) is True
+        assert cli_obj._use_minimal_tui_chrome(width=64) is False
+
+    def test_bottom_input_rule_hides_on_narrow_terminals(self):
+        cli_obj = _make_cli()
+
+        assert cli_obj._tui_input_rule_height("top", width=50) == 1
+        assert cli_obj._tui_input_rule_height("bottom", width=50) == 0
+        assert cli_obj._tui_input_rule_height("bottom", width=90) == 1
+
+    def test_agent_spacer_reclaimed_on_narrow_terminals(self):
+        cli_obj = _make_cli()
+        cli_obj._agent_running = True
+
+        assert cli_obj._agent_spacer_height(width=50) == 0
+        assert cli_obj._agent_spacer_height(width=90) == 1
+        cli_obj._agent_running = False
+        assert cli_obj._agent_spacer_height(width=90) == 0
+
+    def test_spinner_line_hidden_on_narrow_terminals(self):
+        cli_obj = _make_cli()
+        cli_obj._spinner_text = "thinking"
+
+        assert cli_obj._spinner_widget_height(width=50) == 0
+        assert cli_obj._spinner_widget_height(width=90) == 1
+        cli_obj._spinner_text = ""
+        assert cli_obj._spinner_widget_height(width=90) == 0
+
+    def test_voice_status_bar_compacts_on_narrow_terminals(self):
+        cli_obj = _make_cli()
+        cli_obj._voice_mode = True
+        cli_obj._voice_recording = False
+        cli_obj._voice_processing = False
+        cli_obj._voice_tts = True
+        cli_obj._voice_continuous = True
+
+        fragments = cli_obj._get_voice_status_fragments(width=50)
+
+        assert fragments == [("class:voice-status", " 🎤 Ctrl+B ")]
+
+    def test_voice_recording_status_bar_compacts_on_narrow_terminals(self):
+        cli_obj = _make_cli()
+        cli_obj._voice_mode = True
+        cli_obj._voice_recording = True
+        cli_obj._voice_processing = False
+
+        fragments = cli_obj._get_voice_status_fragments(width=50)
+
+        assert fragments == [("class:voice-status-recording", " ● REC ")]
+

 class TestCLIUsageReport:
    def test_show_usage_includes_estimated_cost(self, capsys):
@@ -0,0 +1,256 @@
+"""Tests for the /fast CLI command and service-tier config handling."""
+
+import unittest
+from types import SimpleNamespace
+from unittest.mock import MagicMock, patch
+
+
+def _import_cli():
+    import hermes_cli.config as config_mod
+
+    if not hasattr(config_mod, "save_env_value_secure"):
+        config_mod.save_env_value_secure = lambda key, value: {
+            "success": True,
+            "stored_as": key,
+            "validated": False,
+        }
+
+    import cli as cli_mod
+
+    return cli_mod
+
+
+class TestParseServiceTierConfig(unittest.TestCase):
+    def _parse(self, raw):
+        cli_mod = _import_cli()
+        return cli_mod._parse_service_tier_config(raw)
+
+    def test_fast_maps_to_priority(self):
+        self.assertEqual(self._parse("fast"), "priority")
+        self.assertEqual(self._parse("priority"), "priority")
+
+    def test_normal_disables_service_tier(self):
+        self.assertIsNone(self._parse("normal"))
+        self.assertIsNone(self._parse("off"))
+        self.assertIsNone(self._parse(""))
+
+
+class TestHandleFastCommand(unittest.TestCase):
+    def _make_cli(self, service_tier=None):
+        return SimpleNamespace(
+            service_tier=service_tier,
+            provider="openai-codex",
+            requested_provider="openai-codex",
+            model="gpt-5.4",
+            _fast_command_available=lambda: True,
+            agent=MagicMock(),
+        )
+
+    def test_no_args_shows_status(self):
+        cli_mod = _import_cli()
+        stub = self._make_cli(service_tier=None)
+        with (
+            patch.object(cli_mod, "_cprint") as mock_cprint,
+            patch.object(cli_mod, "save_config_value") as mock_save,
+        ):
+            cli_mod.HermesCLI._handle_fast_command(stub, "/fast")
+
+        # Bare /fast shows status, does not change config
+        mock_save.assert_not_called()
+        # Should have printed the status line
+        printed = " ".join(str(c) for c in mock_cprint.call_args_list)
+        self.assertIn("normal", printed)
+
+    def test_no_args_shows_fast_when_enabled(self):
+        cli_mod = _import_cli()
+        stub = self._make_cli(service_tier="priority")
+        with (
+            patch.object(cli_mod, "_cprint") as mock_cprint,
+            patch.object(cli_mod, "save_config_value") as mock_save,
+        ):
+            cli_mod.HermesCLI._handle_fast_command(stub, "/fast")
+
+        mock_save.assert_not_called()
+        printed = " ".join(str(c) for c in mock_cprint.call_args_list)
+        self.assertIn("fast", printed)
+
+    def test_normal_argument_clears_service_tier(self):
+        cli_mod = _import_cli()
+        stub = self._make_cli(service_tier="priority")
+        with (
+            patch.object(cli_mod, "_cprint"),
+            patch.object(cli_mod, "save_config_value", return_value=True) as mock_save,
+        ):
+            cli_mod.HermesCLI._handle_fast_command(stub, "/fast normal")
+
+        mock_save.assert_called_once_with("agent.service_tier", "normal")
+        self.assertIsNone(stub.service_tier)
+        self.assertIsNone(stub.agent)
+
+    def test_unsupported_model_does_not_expose_fast(self):
+        cli_mod = _import_cli()
+        stub = SimpleNamespace(
+            service_tier=None,
+            provider="openai-codex",
+            requested_provider="openai-codex",
+            model="gpt-5.3-codex",
+            _fast_command_available=lambda: False,
+            agent=MagicMock(),
+        )
+
+        with (
+            patch.object(cli_mod, "_cprint") as mock_cprint,
+            patch.object(cli_mod, "save_config_value") as mock_save,
+        ):
+            cli_mod.HermesCLI._handle_fast_command(stub, "/fast")
+
+        mock_save.assert_not_called()
+        self.assertTrue(mock_cprint.called)
+
+
+class TestPriorityProcessingModels(unittest.TestCase):
+    """Verify the expanded Priority Processing model registry."""
+
+    def test_all_documented_models_supported(self):
+        from hermes_cli.models import model_supports_fast_mode
+
+        # All models from OpenAI's Priority Processing pricing table
+        supported = [
+            "gpt-5.4", "gpt-5.4-mini", "gpt-5.2",
+            "gpt-5.1", "gpt-5", "gpt-5-mini",
+            "gpt-4.1", "gpt-4.1-mini", "gpt-4.1-nano",
+            "gpt-4o", "gpt-4o-mini",
+            "o3", "o4-mini",
+        ]
+        for model in supported:
+            assert model_supports_fast_mode(model), f"{model} should support fast mode"
+
+    def test_vendor_prefix_stripped(self):
+        from hermes_cli.models import model_supports_fast_mode
+
+        assert model_supports_fast_mode("openai/gpt-5.4") is True
+        assert model_supports_fast_mode("openai/gpt-4.1") is True
+        assert model_supports_fast_mode("openai/o3") is True
+
+    def test_non_priority_models_rejected(self):
+        from hermes_cli.models import model_supports_fast_mode
+
+        assert model_supports_fast_mode("gpt-5.3-codex") is False
+        assert model_supports_fast_mode("claude-sonnet-4") is False
+        assert model_supports_fast_mode("") is False
+        assert model_supports_fast_mode(None) is False
+
+    def test_resolve_overrides_returns_service_tier(self):
+        from hermes_cli.models import resolve_fast_mode_overrides
+
+        result = resolve_fast_mode_overrides("gpt-5.4")
+        assert result == {"service_tier": "priority"}
+
+        result = resolve_fast_mode_overrides("gpt-4.1")
+        assert result == {"service_tier": "priority"}
+
+    def test_resolve_overrides_none_for_unsupported(self):
+        from hermes_cli.models import resolve_fast_mode_overrides
+
+        assert resolve_fast_mode_overrides("gpt-5.3-codex") is None
+        assert resolve_fast_mode_overrides("claude-sonnet-4") is None
+
+
+class TestFastModeRouting(unittest.TestCase):
+    def test_fast_command_exposed_for_model_even_when_provider_is_auto(self):
+        cli_mod = _import_cli()
+        stub = SimpleNamespace(provider="auto", requested_provider="auto", model="gpt-5.4", agent=None)
+
+        assert cli_mod.HermesCLI._fast_command_available(stub) is True
+
+    def test_fast_command_exposed_for_non_codex_models(self):
+        cli_mod = _import_cli()
+        stub = SimpleNamespace(provider="openai", requested_provider="openai", model="gpt-4.1", agent=None)
+        assert cli_mod.HermesCLI._fast_command_available(stub) is True
+
+        stub = SimpleNamespace(provider="openrouter", requested_provider="openrouter", model="o3", agent=None)
+        assert cli_mod.HermesCLI._fast_command_available(stub) is True
+
+    def test_turn_route_injects_overrides_without_provider_switch(self):
+        """Fast mode should add request_overrides but NOT change the provider/runtime."""
+        cli_mod = _import_cli()
+        stub = SimpleNamespace(
+            model="gpt-5.4",
+            api_key="primary-key",
+            base_url="https://openrouter.ai/api/v1",
+            provider="openrouter",
+            api_mode="chat_completions",
+            acp_command=None,
+            acp_args=[],
+            _credential_pool=None,
+            _smart_model_routing={},
+            service_tier="priority",
+        )
+
+        original_runtime = {
+            "api_key": "***",
+            "base_url": "https://openrouter.ai/api/v1",
+            "provider": "openrouter",
+            "api_mode": "chat_completions",
+            "command": None,
+            "args": [],
+            "credential_pool": None,
+        }
+
+        with patch("agent.smart_model_routing.resolve_turn_route", return_value={
+            "model": "gpt-5.4",
+            "runtime": dict(original_runtime),
+            "label": None,
+            "signature": ("gpt-5.4", "openrouter", "https://openrouter.ai/api/v1", "chat_completions", None, ()),
+        }):
+            route = cli_mod.HermesCLI._resolve_turn_agent_config(stub, "hi")
+
+        # Provider should NOT have changed
+        assert route["runtime"]["provider"] == "openrouter"
+        assert route["runtime"]["api_mode"] == "chat_completions"
+        # But request_overrides should be set
+        assert route["request_overrides"] == {"service_tier": "priority"}
+
+    def test_turn_route_keeps_primary_runtime_when_model_has_no_fast_backend(self):
+        cli_mod = _import_cli()
+        stub = SimpleNamespace(
+            model="gpt-5.3-codex",
+            api_key="primary-key",
+            base_url="https://openrouter.ai/api/v1",
+            provider="openrouter",
+            api_mode="chat_completions",
+            acp_command=None,
+            acp_args=[],
+            _credential_pool=None,
+            _smart_model_routing={},
+            service_tier="priority",
+        )
+
+        primary_route = {
+            "model": "gpt-5.3-codex",
+            "runtime": {
+                "api_key": "***",
+                "base_url": "https://openrouter.ai/api/v1",
+                "provider": "openrouter",
+                "api_mode": "chat_completions",
+                "command": None,
+                "args": [],
+                "credential_pool": None,
+            },
+            "label": None,
+            "signature": ("gpt-5.3-codex", "openrouter", "https://openrouter.ai/api/v1", "chat_completions", None, ()),
+        }
+        with patch("agent.smart_model_routing.resolve_turn_route", return_value=primary_route):
+            route = cli_mod.HermesCLI._resolve_turn_agent_config(stub, "hi")
+
+        assert route["runtime"]["provider"] == "openrouter"
+        assert route.get("request_overrides") is None
+
+
+class TestConfigDefault(unittest.TestCase):
+    def test_default_config_has_service_tier(self):
+        from hermes_cli.config import DEFAULT_CONFIG
+
+        agent = DEFAULT_CONFIG.get("agent", {})
+        self.assertIn("service_tier", agent)
+        self.assertEqual(agent["service_tier"], "")
@@ -619,14 +619,17 @@ class TestReasoningDeltasFiredFlag(unittest.TestCase):
        agent = AIAgent.__new__(AIAgent)
        agent.reasoning_callback = None
        agent.stream_delta_callback = None
+        agent._reasoning_deltas_fired = False
        agent.verbose_logging = False
        return agent

-    def test_fire_reasoning_delta_calls_callback(self):
+    def test_fire_reasoning_delta_sets_flag(self):
        agent = self._make_agent()
        captured = []
        agent.reasoning_callback = lambda t: captured.append(t)
+        self.assertFalse(agent._reasoning_deltas_fired)
        agent._fire_reasoning_delta("thinking...")
+        self.assertTrue(agent._reasoning_deltas_fired)
        self.assertEqual(captured, ["thinking..."])

    def test_build_assistant_message_skips_callback_when_already_streamed(self):
@@ -637,7 +640,8 @@ class TestReasoningDeltasFiredFlag(unittest.TestCase):
        agent.reasoning_callback = lambda t: captured.append(t)
        agent.stream_delta_callback = lambda t: None  # streaming is active

-        # Simulate streaming having already fired reasoning
+        # Simulate streaming having fired reasoning
+        agent._reasoning_deltas_fired = True

        msg = SimpleNamespace(
            content="I'll merge that.",
@@ -661,8 +665,9 @@ class TestReasoningDeltasFiredFlag(unittest.TestCase):
        agent.reasoning_callback = lambda t: captured.append(t)
        agent.stream_delta_callback = lambda t: None  # streaming active

-        # Reasoning came through content tags, not reasoning_content deltas.
-        # Callback should not fire since streaming is active.
+        # Even though _reasoning_deltas_fired is False (reasoning came through
+        # content tags, not reasoning_content deltas), callback should not fire
+        agent._reasoning_deltas_fired = False

        msg = SimpleNamespace(
            content="I'll merge that.",
@@ -684,6 +689,7 @@ class TestReasoningDeltasFiredFlag(unittest.TestCase):
        agent.reasoning_callback = lambda t: captured.append(t)
        # No streaming
        agent.stream_delta_callback = None
+        agent._reasoning_deltas_fired = False

        msg = SimpleNamespace(
            content="I'll merge that.",
@@ -0,0 +1,138 @@
+"""Tests for _stream_delta's handling of <think> tags in prose vs real reasoning blocks."""
+import sys
+import os
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", ".."))
+
+import pytest
+
+
+def _make_cli_stub():
+    """Create a minimal HermesCLI-like object with stream state."""
+    from cli import HermesCLI
+
+    cli = HermesCLI.__new__(HermesCLI)
+    cli.show_reasoning = False
+    cli._stream_buf = ""
+    cli._stream_started = False
+    cli._stream_box_opened = False
+    cli._stream_prefilt = ""
+    cli._in_reasoning_block = False
+    cli._reasoning_stream_started = False
+    cli._reasoning_box_opened = False
+    cli._reasoning_buf = ""
+    cli._reasoning_preview_buf = ""
+    cli._deferred_content = ""
+    cli._stream_text_ansi = ""
+    cli._stream_needs_break = False
+    cli._emitted = []
+
+    # Mock _emit_stream_text to capture output
+    def mock_emit(text):
+        cli._emitted.append(text)
+    cli._emit_stream_text = mock_emit
+
+    # Mock _stream_reasoning_delta
+    cli._reasoning_emitted = []
+    def mock_reasoning(text):
+        cli._reasoning_emitted.append(text)
+    cli._stream_reasoning_delta = mock_reasoning
+
+    return cli
+
+
+class TestThinkTagInProse:
+    """<think> mentioned in prose should NOT trigger reasoning suppression."""
+
+    def test_think_tag_mid_sentence(self):
+        """'(/think not producing <think> tags)' should pass through."""
+        cli = _make_cli_stub()
+        tokens = [
+            "  1. Fix reasoning mode in eval ",
+            "(/think not producing ",
+            "<think>",
+            " tags — ~2% gap)",
+            "\n  2. Launch production",
+        ]
+        for t in tokens:
+            cli._stream_delta(t)
+        assert not cli._in_reasoning_block, "<think> in prose should not enter reasoning block"
+        full = "".join(cli._emitted)
+        assert "<think>" in full, "The literal <think> tag should be in the emitted text"
+        assert "Launch production" in full
+
+    def test_think_tag_after_text_on_same_line(self):
+        """'some text <think>' should NOT trigger reasoning."""
+        cli = _make_cli_stub()
+        cli._stream_delta("Here is the <think> tag explanation")
+        assert not cli._in_reasoning_block
+        full = "".join(cli._emitted)
+        assert "<think>" in full
+
+    def test_think_tag_in_backticks(self):
+        """'`<think>`' should NOT trigger reasoning."""
+        cli = _make_cli_stub()
+        cli._stream_delta("Use the `<think>` tag for reasoning")
+        assert not cli._in_reasoning_block
+
+
+class TestRealReasoningBlock:
+    """Real <think> tags at block boundaries should still be caught."""
+
+    def test_think_at_start_of_stream(self):
+        """'<think>reasoning</think>answer' should suppress reasoning."""
+        cli = _make_cli_stub()
+        cli._stream_delta("<think>")
+        assert cli._in_reasoning_block
+        cli._stream_delta("I need to analyze this")
+        cli._stream_delta("</think>")
+        assert not cli._in_reasoning_block
+        cli._stream_delta("Here is my answer")
+        full = "".join(cli._emitted)
+        assert "Here is my answer" in full
+        assert "I need to analyze" not in full  # reasoning was suppressed
+
+    def test_think_after_newline(self):
+        """'text\\n<think>' should trigger reasoning block."""
+        cli = _make_cli_stub()
+        cli._stream_delta("Some preamble\n<think>")
+        assert cli._in_reasoning_block
+        full = "".join(cli._emitted)
+        assert "Some preamble" in full
+
+    def test_think_after_newline_with_whitespace(self):
+        """'text\\n  <think>' should trigger reasoning block."""
+        cli = _make_cli_stub()
+        cli._stream_delta("Some preamble\n  <think>")
+        assert cli._in_reasoning_block
+
+    def test_think_with_only_whitespace_before(self):
+        """'   <think>' (whitespace only prefix) should trigger."""
+        cli = _make_cli_stub()
+        cli._stream_delta("   <think>")
+        assert cli._in_reasoning_block
+
+
+class TestFlushRecovery:
+    """_flush_stream should recover content from false-positive reasoning blocks."""
+
+    def test_flush_recovers_buffered_content(self):
+        """If somehow in reasoning block at flush, content is recovered."""
+        cli = _make_cli_stub()
+        # Manually set up a false-positive state
+        cli._in_reasoning_block = True
+        cli._stream_prefilt = " tags — ~2% gap)\n  2. Launch production"
+        cli._stream_box_opened = True
+
+        # Mock _close_reasoning_box and box closing
+        cli._close_reasoning_box = lambda: None
+
+        # Call flush
+        from unittest.mock import patch
+        import shutil
+        with patch.object(shutil, "get_terminal_size", return_value=os.terminal_size((80, 24))):
+            with patch("cli._cprint"):
+                cli._flush_stream()
+
+        assert not cli._in_reasoning_block
+        full = "".join(cli._emitted)
+        assert "Launch production" in full
@@ -294,6 +294,40 @@ class TestModelsEndpoint:
            assert data["data"][0]["id"] == "hermes-agent"
            assert data["data"][0]["owned_by"] == "hermes"

+    @pytest.mark.asyncio
+    async def test_models_returns_profile_name(self):
+        """When running under a named profile, /v1/models advertises the profile name."""
+        with patch("gateway.platforms.api_server.APIServerAdapter._resolve_model_name", return_value="lucas"):
+            adapter = _make_adapter()
+        app = _create_app(adapter)
+        async with TestClient(TestServer(app)) as cli:
+            resp = await cli.get("/v1/models")
+            assert resp.status == 200
+            data = await resp.json()
+            assert data["data"][0]["id"] == "lucas"
+            assert data["data"][0]["root"] == "lucas"
+
+    @pytest.mark.asyncio
+    async def test_models_returns_explicit_model_name(self):
+        """Explicit model_name in config overrides profile name."""
+        extra = {"model_name": "my-custom-agent"}
+        config = PlatformConfig(enabled=True, extra=extra)
+        adapter = APIServerAdapter(config)
+        assert adapter._model_name == "my-custom-agent"
+
+    def test_resolve_model_name_explicit(self):
+        assert APIServerAdapter._resolve_model_name("my-bot") == "my-bot"
+
+    def test_resolve_model_name_default_profile(self):
+        """Default profile falls back to 'hermes-agent'."""
+        with patch("hermes_cli.profiles.get_active_profile_name", return_value="default"):
+            assert APIServerAdapter._resolve_model_name("") == "hermes-agent"
+
+    def test_resolve_model_name_named_profile(self):
+        """Named profile uses the profile name as model name."""
+        with patch("hermes_cli.profiles.get_active_profile_name", return_value="lucas"):
+            assert APIServerAdapter._resolve_model_name("") == "lucas"
+
    @pytest.mark.asyncio
    async def test_models_requires_auth(self, auth_adapter):
        app = _create_app(auth_adapter)
@@ -141,7 +141,7 @@ class TestBlockingGatewayApproval:
    def test_resolve_single_pops_oldest_fifo(self):
        """resolve_gateway_approval without resolve_all resolves oldest first."""
        from tools.approval import (
-            resolve_gateway_approval,
+            resolve_gateway_approval, pending_approval_count,
            _ApprovalEntry, _gateway_queues,
        )
        session_key = "test-fifo"
@@ -154,7 +154,7 @@ class TestBlockingGatewayApproval:
        assert e1.event.is_set()
        assert e1.result == "once"
        assert not e2.event.is_set()
-        assert len(_gateway_queues[session_key]) == 1
+        assert pending_approval_count(session_key) == 1

    def test_unregister_signals_all_entries(self):
        """unregister_gateway_notify signals all waiting entries to prevent hangs."""
@@ -173,6 +173,35 @@ class TestBlockingGatewayApproval:
        assert e1.event.is_set()
        assert e2.event.is_set()

+    def test_clear_session_signals_all_entries(self):
+        """clear_session should unblock all waiting approval threads."""
+        from tools.approval import (
+            register_gateway_notify, clear_session,
+            _ApprovalEntry, _gateway_queues,
+        )
+        session_key = "test-clear"
+        register_gateway_notify(session_key, lambda d: None)
+
+        e1 = _ApprovalEntry({"command": "cmd1"})
+        e2 = _ApprovalEntry({"command": "cmd2"})
+        _gateway_queues[session_key] = [e1, e2]
+
+        clear_session(session_key)
+        assert e1.event.is_set()
+        assert e2.event.is_set()
+
+    def test_pending_approval_count(self):
+        from tools.approval import (
+            pending_approval_count, _ApprovalEntry, _gateway_queues,
+        )
+        session_key = "test-count"
+        assert pending_approval_count(session_key) == 0
+        _gateway_queues[session_key] = [
+            _ApprovalEntry({"command": "a"}),
+            _ApprovalEntry({"command": "b"}),
+        ]
+        assert pending_approval_count(session_key) == 2
+

 # ------------------------------------------------------------------
 # /approve command
@@ -477,7 +506,7 @@ class TestBlockingApprovalE2E:
        from tools.approval import (
            register_gateway_notify, unregister_gateway_notify,
            resolve_gateway_approval, check_all_command_guards,
-            _gateway_queues,
+            pending_approval_count,
        )

        session_key = "e2e-parallel"
@@ -516,7 +545,7 @@ class TestBlockingApprovalE2E:
            time.sleep(0.05)

        assert len(notified) == 3
-        assert len(_gateway_queues.get(session_key, [])) == 3
+        assert pending_approval_count(session_key) == 3

        # Approve all at once
        count = resolve_gateway_approval(session_key, "session", resolve_all=True)
@@ -1,7 +1,7 @@
 """Tests for the delivery routing module."""

 from gateway.config import Platform, GatewayConfig, PlatformConfig, HomeChannel
-from gateway.delivery import DeliveryRouter, DeliveryTarget
+from gateway.delivery import DeliveryRouter, DeliveryTarget, parse_deliver_spec
 from gateway.session import SessionSource


@@ -41,6 +41,28 @@ class TestParseTargetPlatformChat:
        assert target.platform == Platform.LOCAL


+class TestParseDeliverSpec:
+    def test_none_returns_default(self):
+        result = parse_deliver_spec(None)
+        assert result == "origin"
+
+    def test_empty_string_returns_default(self):
+        result = parse_deliver_spec("")
+        assert result == "origin"
+
+    def test_custom_default(self):
+        result = parse_deliver_spec(None, default="local")
+        assert result == "local"
+
+    def test_passthrough_string(self):
+        result = parse_deliver_spec("telegram")
+        assert result == "telegram"
+
+    def test_passthrough_list(self):
+        result = parse_deliver_spec(["local", "telegram"])
+        assert result == ["local", "telegram"]
+
+
 class TestTargetToStringRoundtrip:
    def test_origin_roundtrip(self):
        origin = SessionSource(platform=Platform.TELEGRAM, chat_id="111", thread_id="42")
@@ -81,6 +81,7 @@ def adapter(monkeypatch):
    config = PlatformConfig(enabled=True, token="fake-token")
    adapter = DiscordAdapter(config)
    adapter._client = SimpleNamespace(user=SimpleNamespace(id=999))
+    adapter._text_batch_delay_seconds = 0  # disable batching for tests
    adapter.handle_message = AsyncMock()
    return adapter

@@ -91,6 +91,7 @@ def adapter(monkeypatch):
    config = PlatformConfig(enabled=True, token="fake-token")
    adapter = DiscordAdapter(config)
    adapter._client = SimpleNamespace(user=SimpleNamespace(id=999))
+    adapter._text_batch_delay_seconds = 0  # disable batching for tests
    adapter.handle_message = AsyncMock()
    return adapter

@@ -62,6 +62,7 @@ def adapter():
        fetch_channel=AsyncMock(),
        user=SimpleNamespace(id=99999, name="HermesBot"),
    )
+    adapter._text_batch_delay_seconds = 0  # disable batching for tests
    return adapter


@@ -44,6 +44,7 @@ def _make_adapter(tmp_path=None):
        },
    )
    adapter = MatrixAdapter(config)
+    adapter._text_batch_delay_seconds = 0  # disable batching for tests
    adapter.handle_message = AsyncMock()
    adapter._startup_ts = time.time() - 10  # avoid startup grace filter
    return adapter
@@ -7,6 +7,7 @@ from gateway.session import (
    _hash_id,
    _hash_sender_id,
    _hash_chat_id,
+    _looks_like_phone,
 )
 from gateway.config import Platform, HomeChannel

@@ -38,6 +39,14 @@ class TestHashHelpers:
        assert len(result) == 12
        assert "12345" not in result

+    def test_looks_like_phone(self):
+        assert _looks_like_phone("+15551234567")
+        assert _looks_like_phone("15551234567")
+        assert _looks_like_phone("+1-555-123-4567")
+        assert not _looks_like_phone("alice")
+        assert not _looks_like_phone("user-123")
+        assert not _looks_like_phone("")
+

 # ---------------------------------------------------------------------------
 # Integration: build_session_context_prompt
@@ -0,0 +1,448 @@
+"""Tests for text message batching across all gateway adapters.
+
+When a user sends a long message, the messaging client splits it at the
+platform's character limit.  Each adapter should buffer rapid successive
+text messages from the same session and aggregate them before dispatching.
+
+Covers: Discord, Matrix, WeCom, and the adaptive delay logic for
+Telegram and Feishu.
+"""
+
+import asyncio
+import os
+from unittest.mock import AsyncMock, MagicMock, patch
+
+import pytest
+
+from gateway.config import Platform, PlatformConfig
+from gateway.platforms.base import MessageEvent, MessageType, SessionSource
+
+
+# =====================================================================
+# Helpers
+# =====================================================================
+
+def _make_event(
+    text: str,
+    platform: Platform,
+    chat_id: str = "12345",
+    msg_type: MessageType = MessageType.TEXT,
+) -> MessageEvent:
+    return MessageEvent(
+        text=text,
+        message_type=msg_type,
+        source=SessionSource(platform=platform, chat_id=chat_id, chat_type="dm"),
+    )
+
+
+# =====================================================================
+# Discord text batching
+# =====================================================================
+
+def _make_discord_adapter():
+    """Create a minimal DiscordAdapter for testing text batching."""
+    from gateway.platforms.discord import DiscordAdapter
+
+    config = PlatformConfig(enabled=True, token="test-token")
+    adapter = object.__new__(DiscordAdapter)
+    adapter._platform = Platform.DISCORD
+    adapter.config = config
+    adapter._pending_text_batches = {}
+    adapter._pending_text_batch_tasks = {}
+    adapter._text_batch_delay_seconds = 0.1  # fast for tests
+    adapter._text_batch_split_delay_seconds = 0.3  # fast for tests
+    adapter._active_sessions = {}
+    adapter._pending_messages = {}
+    adapter._message_handler = AsyncMock()
+    adapter.handle_message = AsyncMock()
+    return adapter
+
+
+class TestDiscordTextBatching:
+    @pytest.mark.asyncio
+    async def test_single_message_dispatched_after_delay(self):
+        adapter = _make_discord_adapter()
+        event = _make_event("hello world", Platform.DISCORD)
+
+        adapter._enqueue_text_event(event)
+
+        # Not dispatched yet
+        adapter.handle_message.assert_not_called()
+
+        # Wait for flush
+        await asyncio.sleep(0.2)
+
+        adapter.handle_message.assert_called_once()
+        dispatched = adapter.handle_message.call_args[0][0]
+        assert dispatched.text == "hello world"
+
+    @pytest.mark.asyncio
+    async def test_split_messages_aggregated(self):
+        """Two rapid messages from the same chat should be merged."""
+        adapter = _make_discord_adapter()
+
+        adapter._enqueue_text_event(_make_event("Part one of a long", Platform.DISCORD))
+        await asyncio.sleep(0.02)
+        adapter._enqueue_text_event(_make_event("message that was split.", Platform.DISCORD))
+
+        adapter.handle_message.assert_not_called()
+
+        await asyncio.sleep(0.2)
+
+        adapter.handle_message.assert_called_once()
+        text = adapter.handle_message.call_args[0][0].text
+        assert "Part one" in text
+        assert "split" in text
+
+    @pytest.mark.asyncio
+    async def test_three_way_split_aggregated(self):
+        adapter = _make_discord_adapter()
+
+        adapter._enqueue_text_event(_make_event("chunk 1", Platform.DISCORD))
+        await asyncio.sleep(0.02)
+        adapter._enqueue_text_event(_make_event("chunk 2", Platform.DISCORD))
+        await asyncio.sleep(0.02)
+        adapter._enqueue_text_event(_make_event("chunk 3", Platform.DISCORD))
+
+        await asyncio.sleep(0.2)
+
+        adapter.handle_message.assert_called_once()
+        text = adapter.handle_message.call_args[0][0].text
+        assert "chunk 1" in text
+        assert "chunk 2" in text
+        assert "chunk 3" in text
+
+    @pytest.mark.asyncio
+    async def test_different_chats_not_merged(self):
+        adapter = _make_discord_adapter()
+
+        adapter._enqueue_text_event(_make_event("from A", Platform.DISCORD, chat_id="111"))
+        adapter._enqueue_text_event(_make_event("from B", Platform.DISCORD, chat_id="222"))
+
+        await asyncio.sleep(0.2)
+
+        assert adapter.handle_message.call_count == 2
+
+    @pytest.mark.asyncio
+    async def test_batch_cleans_up_after_flush(self):
+        adapter = _make_discord_adapter()
+
+        adapter._enqueue_text_event(_make_event("test", Platform.DISCORD))
+        await asyncio.sleep(0.2)
+
+        assert len(adapter._pending_text_batches) == 0
+
+    @pytest.mark.asyncio
+    async def test_adaptive_delay_for_near_limit_chunk(self):
+        """Chunks near the 2000-char limit should trigger longer delay."""
+        adapter = _make_discord_adapter()
+        # Simulate a chunk near Discord's 2000-char split point
+        long_text = "x" * 1950
+        adapter._enqueue_text_event(_make_event(long_text, Platform.DISCORD))
+
+        # After the short delay (0.1s), should NOT have flushed yet (split delay is 0.3s)
+        await asyncio.sleep(0.15)
+        adapter.handle_message.assert_not_called()
+
+        # After the split delay, should be flushed
+        await asyncio.sleep(0.25)
+        adapter.handle_message.assert_called_once()
+
+
+# =====================================================================
+# Matrix text batching
+# =====================================================================
+
+def _make_matrix_adapter():
+    """Create a minimal MatrixAdapter for testing text batching."""
+    from gateway.platforms.matrix import MatrixAdapter
+
+    config = PlatformConfig(enabled=True, token="test-token")
+    adapter = object.__new__(MatrixAdapter)
+    adapter._platform = Platform.MATRIX
+    adapter.config = config
+    adapter._pending_text_batches = {}
+    adapter._pending_text_batch_tasks = {}
+    adapter._text_batch_delay_seconds = 0.1
+    adapter._text_batch_split_delay_seconds = 0.3
+    adapter._active_sessions = {}
+    adapter._pending_messages = {}
+    adapter._message_handler = AsyncMock()
+    adapter.handle_message = AsyncMock()
+    return adapter
+
+
+class TestMatrixTextBatching:
+    @pytest.mark.asyncio
+    async def test_single_message_dispatched_after_delay(self):
+        adapter = _make_matrix_adapter()
+        event = _make_event("hello world", Platform.MATRIX)
+
+        adapter._enqueue_text_event(event)
+
+        adapter.handle_message.assert_not_called()
+        await asyncio.sleep(0.2)
+
+        adapter.handle_message.assert_called_once()
+        assert adapter.handle_message.call_args[0][0].text == "hello world"
+
+    @pytest.mark.asyncio
+    async def test_split_messages_aggregated(self):
+        adapter = _make_matrix_adapter()
+
+        adapter._enqueue_text_event(_make_event("first part", Platform.MATRIX))
+        await asyncio.sleep(0.02)
+        adapter._enqueue_text_event(_make_event("second part", Platform.MATRIX))
+
+        adapter.handle_message.assert_not_called()
+        await asyncio.sleep(0.2)
+
+        adapter.handle_message.assert_called_once()
+        text = adapter.handle_message.call_args[0][0].text
+        assert "first part" in text
+        assert "second part" in text
+
+    @pytest.mark.asyncio
+    async def test_different_rooms_not_merged(self):
+        adapter = _make_matrix_adapter()
+
+        adapter._enqueue_text_event(_make_event("room A", Platform.MATRIX, chat_id="!aaa:matrix.org"))
+        adapter._enqueue_text_event(_make_event("room B", Platform.MATRIX, chat_id="!bbb:matrix.org"))
+
+        await asyncio.sleep(0.2)
+
+        assert adapter.handle_message.call_count == 2
+
+    @pytest.mark.asyncio
+    async def test_adaptive_delay_for_near_limit_chunk(self):
+        """Chunks near the 4000-char limit should trigger longer delay."""
+        adapter = _make_matrix_adapter()
+        long_text = "x" * 3950
+        adapter._enqueue_text_event(_make_event(long_text, Platform.MATRIX))
+
+        await asyncio.sleep(0.15)
+        adapter.handle_message.assert_not_called()
+
+        await asyncio.sleep(0.25)
+        adapter.handle_message.assert_called_once()
+
+    @pytest.mark.asyncio
+    async def test_batch_cleans_up_after_flush(self):
+        adapter = _make_matrix_adapter()
+        adapter._enqueue_text_event(_make_event("test", Platform.MATRIX))
+        await asyncio.sleep(0.2)
+        assert len(adapter._pending_text_batches) == 0
+
+
+# =====================================================================
+# WeCom text batching
+# =====================================================================
+
+def _make_wecom_adapter():
+    """Create a minimal WeComAdapter for testing text batching."""
+    from gateway.platforms.wecom import WeComAdapter
+
+    config = PlatformConfig(enabled=True, token="test-token")
+    adapter = object.__new__(WeComAdapter)
+    adapter._platform = Platform.WECOM
+    adapter.config = config
+    adapter._pending_text_batches = {}
+    adapter._pending_text_batch_tasks = {}
+    adapter._text_batch_delay_seconds = 0.1
+    adapter._text_batch_split_delay_seconds = 0.3
+    adapter._active_sessions = {}
+    adapter._pending_messages = {}
+    adapter._message_handler = AsyncMock()
+    adapter.handle_message = AsyncMock()
+    return adapter
+
+
+class TestWeComTextBatching:
+    @pytest.mark.asyncio
+    async def test_single_message_dispatched_after_delay(self):
+        adapter = _make_wecom_adapter()
+        event = _make_event("hello world", Platform.WECOM)
+
+        adapter._enqueue_text_event(event)
+
+        adapter.handle_message.assert_not_called()
+        await asyncio.sleep(0.2)
+
+        adapter.handle_message.assert_called_once()
+        assert adapter.handle_message.call_args[0][0].text == "hello world"
+
+    @pytest.mark.asyncio
+    async def test_split_messages_aggregated(self):
+        adapter = _make_wecom_adapter()
+
+        adapter._enqueue_text_event(_make_event("first part", Platform.WECOM))
+        await asyncio.sleep(0.02)
+        adapter._enqueue_text_event(_make_event("second part", Platform.WECOM))
+
+        adapter.handle_message.assert_not_called()
+        await asyncio.sleep(0.2)
+
+        adapter.handle_message.assert_called_once()
+        text = adapter.handle_message.call_args[0][0].text
+        assert "first part" in text
+        assert "second part" in text
+
+    @pytest.mark.asyncio
+    async def test_different_chats_not_merged(self):
+        adapter = _make_wecom_adapter()
+
+        adapter._enqueue_text_event(_make_event("chat A", Platform.WECOM, chat_id="chat_a"))
+        adapter._enqueue_text_event(_make_event("chat B", Platform.WECOM, chat_id="chat_b"))
+
+        await asyncio.sleep(0.2)
+
+        assert adapter.handle_message.call_count == 2
+
+    @pytest.mark.asyncio
+    async def test_adaptive_delay_for_near_limit_chunk(self):
+        """Chunks near the 4000-char limit should trigger longer delay."""
+        adapter = _make_wecom_adapter()
+        long_text = "x" * 3950
+        adapter._enqueue_text_event(_make_event(long_text, Platform.WECOM))
+
+        await asyncio.sleep(0.15)
+        adapter.handle_message.assert_not_called()
+
+        await asyncio.sleep(0.25)
+        adapter.handle_message.assert_called_once()
+
+    @pytest.mark.asyncio
+    async def test_batch_cleans_up_after_flush(self):
+        adapter = _make_wecom_adapter()
+        adapter._enqueue_text_event(_make_event("test", Platform.WECOM))
+        await asyncio.sleep(0.2)
+        assert len(adapter._pending_text_batches) == 0
+
+
+# =====================================================================
+# Telegram adaptive delay (PR #6891)
+# =====================================================================
+
+def _make_telegram_adapter():
+    """Create a minimal TelegramAdapter for testing adaptive delay."""
+    from gateway.platforms.telegram import TelegramAdapter
+
+    config = PlatformConfig(enabled=True, token="test-token")
+    adapter = object.__new__(TelegramAdapter)
+    adapter._platform = Platform.TELEGRAM
+    adapter.config = config
+    adapter._pending_text_batches = {}
+    adapter._pending_text_batch_tasks = {}
+    adapter._text_batch_delay_seconds = 0.1
+    adapter._text_batch_split_delay_seconds = 0.3
+    adapter._active_sessions = {}
+    adapter._pending_messages = {}
+    adapter._message_handler = AsyncMock()
+    adapter.handle_message = AsyncMock()
+    return adapter
+
+
+class TestTelegramAdaptiveDelay:
+    @pytest.mark.asyncio
+    async def test_short_chunk_uses_normal_delay(self):
+        adapter = _make_telegram_adapter()
+        adapter._enqueue_text_event(_make_event("short msg", Platform.TELEGRAM))
+
+        # Should flush after the normal 0.1s delay
+        await asyncio.sleep(0.15)
+        adapter.handle_message.assert_called_once()
+
+    @pytest.mark.asyncio
+    async def test_near_limit_chunk_uses_split_delay(self):
+        """A chunk near the 4096-char limit should trigger longer delay."""
+        adapter = _make_telegram_adapter()
+        long_text = "x" * 4050  # near the 4096 limit
+        adapter._enqueue_text_event(_make_event(long_text, Platform.TELEGRAM))
+
+        # After the short delay, should NOT have flushed yet
+        await asyncio.sleep(0.15)
+        adapter.handle_message.assert_not_called()
+
+        # After the split delay, should be flushed
+        await asyncio.sleep(0.25)
+        adapter.handle_message.assert_called_once()
+
+    @pytest.mark.asyncio
+    async def test_split_continuation_merged(self):
+        """Two near-limit chunks should both be merged."""
+        adapter = _make_telegram_adapter()
+
+        adapter._enqueue_text_event(_make_event("x" * 4050, Platform.TELEGRAM))
+        await asyncio.sleep(0.05)
+        adapter._enqueue_text_event(_make_event("continuation text", Platform.TELEGRAM))
+
+        # Short chunk arrived → should use normal delay now
+        await asyncio.sleep(0.15)
+        adapter.handle_message.assert_called_once()
+        text = adapter.handle_message.call_args[0][0].text
+        assert "continuation text" in text
+
+
+# =====================================================================
+# Feishu adaptive delay
+# =====================================================================
+
+def _make_feishu_adapter():
+    """Create a minimal FeishuAdapter for testing adaptive delay."""
+    from gateway.platforms.feishu import FeishuAdapter, FeishuBatchState
+
+    config = PlatformConfig(enabled=True, token="test-token")
+    adapter = object.__new__(FeishuAdapter)
+    adapter._platform = Platform.FEISHU
+    adapter.config = config
+    batch_state = FeishuBatchState()
+    adapter._pending_text_batches = batch_state.events
+    adapter._pending_text_batch_tasks = batch_state.tasks
+    adapter._pending_text_batch_counts = batch_state.counts
+    adapter._text_batch_delay_seconds = 0.1
+    adapter._text_batch_split_delay_seconds = 0.3
+    adapter._text_batch_max_messages = 20
+    adapter._text_batch_max_chars = 50000
+    adapter._active_sessions = {}
+    adapter._pending_messages = {}
+    adapter._message_handler = AsyncMock()
+    adapter._handle_message_with_guards = AsyncMock()
+    return adapter
+
+
+class TestFeishuAdaptiveDelay:
+    @pytest.mark.asyncio
+    async def test_short_chunk_uses_normal_delay(self):
+        adapter = _make_feishu_adapter()
+        event = _make_event("short msg", Platform.FEISHU)
+        await adapter._enqueue_text_event(event)
+
+        await asyncio.sleep(0.15)
+        adapter._handle_message_with_guards.assert_called_once()
+
+    @pytest.mark.asyncio
+    async def test_near_limit_chunk_uses_split_delay(self):
+        """A chunk near the 4096-char limit should trigger longer delay."""
+        adapter = _make_feishu_adapter()
+        long_text = "x" * 4050
+        event = _make_event(long_text, Platform.FEISHU)
+        await adapter._enqueue_text_event(event)
+
+        await asyncio.sleep(0.15)
+        adapter._handle_message_with_guards.assert_not_called()
+
+        await asyncio.sleep(0.25)
+        adapter._handle_message_with_guards.assert_called_once()
+
+    @pytest.mark.asyncio
+    async def test_split_continuation_merged(self):
+        adapter = _make_feishu_adapter()
+
+        await adapter._enqueue_text_event(_make_event("x" * 4050, Platform.FEISHU))
+        await asyncio.sleep(0.05)
+        await adapter._enqueue_text_event(_make_event("continuation text", Platform.FEISHU))
+
+        await asyncio.sleep(0.15)
+        adapter._handle_message_with_guards.assert_called_once()
+        text = adapter._handle_message_with_guards.call_args[0][0].text
+        assert "continuation text" in text
@@ -508,6 +508,7 @@ class TestInboundMessages:
        from gateway.platforms.wecom import WeComAdapter

        adapter = WeComAdapter(PlatformConfig(enabled=True))
+        adapter._text_batch_delay_seconds = 0  # disable batching for tests
        adapter.handle_message = AsyncMock()
        adapter._extract_media = AsyncMock(return_value=(["/tmp/test.png"], ["image/png"]))

@@ -539,6 +540,7 @@ class TestInboundMessages:
        from gateway.platforms.wecom import WeComAdapter

        adapter = WeComAdapter(PlatformConfig(enabled=True))
+        adapter._text_batch_delay_seconds = 0  # disable batching for tests
        adapter.handle_message = AsyncMock()
        adapter._extract_media = AsyncMock(return_value=([], []))

@@ -49,6 +49,30 @@ def test_chat_subcommand_accepts_skills_flag(monkeypatch):
    }


+def test_chat_subcommand_accepts_image_flag(monkeypatch):
+    import hermes_cli.main as main_mod
+
+    captured = {}
+
+    def fake_cmd_chat(args):
+        captured["query"] = args.query
+        captured["image"] = args.image
+
+    monkeypatch.setattr(main_mod, "cmd_chat", fake_cmd_chat)
+    monkeypatch.setattr(
+        sys,
+        "argv",
+        ["hermes", "chat", "-q", "hello", "--image", "~/storage/shared/Pictures/cat.png"],
+    )
+
+    main_mod.main()
+
+    assert captured == {
+        "query": "hello",
+        "image": "~/storage/shared/Pictures/cat.png",
+    }
+
+
 def test_continue_worktree_and_skills_flags_work_together(monkeypatch):
    import hermes_cli.main as main_mod

@@ -446,6 +446,13 @@ class TestSubcommands:
        assert "show" in subs
        assert "hide" in subs

+    def test_fast_has_subcommands(self):
+        assert "/fast" in SUBCOMMANDS
+        subs = SUBCOMMANDS["/fast"]
+        assert "fast" in subs
+        assert "normal" in subs
+        assert "status" in subs
+
    def test_voice_has_subcommands(self):
        assert "/voice" in SUBCOMMANDS
        assert "on" in SUBCOMMANDS["/voice"]
@@ -474,6 +481,20 @@ class TestSubcommandCompletion:
        assert "high" in texts
        assert "show" in texts

+    def test_fast_subcommand_completion_after_space(self):
+        completions = _completions(SlashCommandCompleter(), "/fast ")
+        texts = {c.text for c in completions}
+        assert "fast" in texts
+        assert "normal" in texts
+
+    def test_fast_command_filtered_out_when_unavailable(self):
+        completions = _completions(
+            SlashCommandCompleter(command_filter=lambda cmd: cmd != "/fast"),
+            "/fa",
+        )
+        texts = {c.text for c in completions}
+        assert "fast" not in texts
+
    def test_subcommand_prefix_filters(self):
        """Typing '/reasoning sh' should only show 'show'."""
        completions = _completions(SlashCommandCompleter(), "/reasoning sh")
@@ -527,6 +548,13 @@ class TestGhostText:
        """/reasoning sh → 'ow'"""
        assert _suggestion("/reasoning sh") == "ow"

+    def test_fast_subcommand_suggestion(self):
+        assert _suggestion("/fast f") == "ast"
+
+    def test_fast_subcommand_suggestion_hidden_when_filtered(self):
+        completer = SlashCommandCompleter(command_filter=lambda cmd: cmd != "/fast")
+        assert _suggestion("/fa", completer=completer) is None
+
    def test_no_suggestion_for_non_slash(self):
        assert _suggestion("hello") is None

@@ -35,6 +35,12 @@ class TestTokenValidation:
        valid, msg = validate_copilot_token("")
        assert valid is False

+    def test_is_classic_pat(self):
+        from hermes_cli.copilot_auth import is_classic_pat
+        assert is_classic_pat("ghp_abc123") is True
+        assert is_classic_pat("gho_abc123") is False
+        assert is_classic_pat("github_pat_abc") is False
+        assert is_classic_pat("") is False


 class TestResolveToken:
@@ -14,6 +14,23 @@ from hermes_cli import doctor as doctor_mod
 from hermes_cli.doctor import _has_provider_env_config


+class TestDoctorPlatformHints:
+    def test_termux_package_hint(self, monkeypatch):
+        monkeypatch.setenv("TERMUX_VERSION", "0.118.3")
+        monkeypatch.setenv("PREFIX", "/data/data/com.termux/files/usr")
+        assert doctor._is_termux() is True
+        assert doctor._python_install_cmd() == "python -m pip install"
+        assert doctor._system_package_install_cmd("ripgrep") == "pkg install ripgrep"
+
+    def test_non_termux_package_hint_defaults_to_apt(self, monkeypatch):
+        monkeypatch.delenv("TERMUX_VERSION", raising=False)
+        monkeypatch.setenv("PREFIX", "/usr")
+        monkeypatch.setattr(sys, "platform", "linux")
+        assert doctor._is_termux() is False
+        assert doctor._python_install_cmd() == "uv pip install"
+        assert doctor._system_package_install_cmd("ripgrep") == "sudo apt install ripgrep"
+
+
 class TestProviderEnvDetection:
    def test_detects_openai_api_key(self):
        content = "OPENAI_BASE_URL=http://localhost:1234/v1\nOPENAI_API_KEY=***"
@@ -206,3 +223,72 @@ class TestDoctorMemoryProviderSection:
        out = self._run_doctor_and_capture(monkeypatch, tmp_path, provider="mem0")
        assert "Memory Provider" in out
        assert "Built-in memory active" not in out
+
+
+def test_run_doctor_termux_treats_docker_and_browser_warnings_as_expected(monkeypatch, tmp_path):
+    helper = TestDoctorMemoryProviderSection()
+    monkeypatch.setenv("TERMUX_VERSION", "0.118.3")
+    monkeypatch.setenv("PREFIX", "/data/data/com.termux/files/usr")
+
+    real_which = doctor_mod.shutil.which
+
+    def fake_which(cmd):
+        if cmd in {"docker", "node", "npm"}:
+            return None
+        return real_which(cmd)
+
+    monkeypatch.setattr(doctor_mod.shutil, "which", fake_which)
+
+    out = helper._run_doctor_and_capture(monkeypatch, tmp_path, provider="")
+
+    assert "Docker backend is not available inside Termux" in out
+    assert "Node.js not found (browser tools are optional in the tested Termux path)" in out
+    assert "Install Node.js on Termux with: pkg install nodejs" in out
+    assert "Termux browser setup:" in out
+    assert "1) pkg install nodejs" in out
+    assert "2) npm install -g agent-browser" in out
+    assert "3) agent-browser install" in out
+    assert "docker not found (optional)" not in out
+
+
+def test_run_doctor_termux_does_not_mark_browser_available_without_agent_browser(monkeypatch, tmp_path):
+    home = tmp_path / ".hermes"
+    home.mkdir(parents=True, exist_ok=True)
+    (home / "config.yaml").write_text("memory: {}\n", encoding="utf-8")
+    project = tmp_path / "project"
+    project.mkdir(exist_ok=True)
+
+    monkeypatch.setenv("TERMUX_VERSION", "0.118.3")
+    monkeypatch.setenv("PREFIX", "/data/data/com.termux/files/usr")
+    monkeypatch.setattr(doctor_mod, "HERMES_HOME", home)
+    monkeypatch.setattr(doctor_mod, "PROJECT_ROOT", project)
+    monkeypatch.setattr(doctor_mod, "_DHH", str(home))
+    monkeypatch.setattr(doctor_mod.shutil, "which", lambda cmd: "/data/data/com.termux/files/usr/bin/node" if cmd in {"node", "npm"} else None)
+
+    fake_model_tools = types.SimpleNamespace(
+        check_tool_availability=lambda *a, **kw: (["terminal"], [{"name": "browser", "env_vars": [], "tools": ["browser_navigate"]}]),
+        TOOLSET_REQUIREMENTS={
+            "terminal": {"name": "terminal"},
+            "browser": {"name": "browser"},
+        },
+    )
+    monkeypatch.setitem(sys.modules, "model_tools", fake_model_tools)
+
+    try:
+        from hermes_cli import auth as _auth_mod
+        monkeypatch.setattr(_auth_mod, "get_nous_auth_status", lambda: {})
+        monkeypatch.setattr(_auth_mod, "get_codex_auth_status", lambda: {})
+    except Exception:
+        pass
+
+    import io, contextlib
+    buf = io.StringIO()
+    with contextlib.redirect_stdout(buf):
+        doctor_mod.run_doctor(Namespace(fix=False))
+    out = buf.getvalue()
+
+    assert "✓ browser" not in out
+    assert "browser" in out
+    assert "system dependency not met" in out
+    assert "agent-browser is not installed (expected in the tested Termux path)" in out
+    assert "npm install -g agent-browser && agent-browser install" in out
@@ -0,0 +1,50 @@
+"""Tests for detect_external_credentials() -- Phase 2 credential sync."""
+
+import json
+from pathlib import Path
+from unittest.mock import patch
+
+import pytest
+
+from hermes_cli.auth import detect_external_credentials
+
+
+class TestDetectCodexCLI:
+    def test_detects_valid_codex_auth(self, tmp_path, monkeypatch):
+        codex_dir = tmp_path / ".codex"
+        codex_dir.mkdir()
+        auth = codex_dir / "auth.json"
+        auth.write_text(json.dumps({
+            "tokens": {"access_token": "tok-123", "refresh_token": "ref-456"}
+        }))
+        monkeypatch.setenv("CODEX_HOME", str(codex_dir))
+        result = detect_external_credentials()
+        codex_hits = [c for c in result if c["provider"] == "openai-codex"]
+        assert len(codex_hits) == 1
+        assert "Codex CLI" in codex_hits[0]["label"]
+
+    def test_skips_codex_without_access_token(self, tmp_path, monkeypatch):
+        codex_dir = tmp_path / ".codex"
+        codex_dir.mkdir()
+        (codex_dir / "auth.json").write_text(json.dumps({"tokens": {}}))
+        monkeypatch.setenv("CODEX_HOME", str(codex_dir))
+        result = detect_external_credentials()
+        assert not any(c["provider"] == "openai-codex" for c in result)
+
+    def test_skips_missing_codex_dir(self, tmp_path, monkeypatch):
+        monkeypatch.setenv("CODEX_HOME", str(tmp_path / "nonexistent"))
+        result = detect_external_credentials()
+        assert not any(c["provider"] == "openai-codex" for c in result)
+
+    def test_skips_malformed_codex_auth(self, tmp_path, monkeypatch):
+        codex_dir = tmp_path / ".codex"
+        codex_dir.mkdir()
+        (codex_dir / "auth.json").write_text("{bad json")
+        monkeypatch.setenv("CODEX_HOME", str(codex_dir))
+        result = detect_external_credentials()
+        assert not any(c["provider"] == "openai-codex" for c in result)
+
+    def test_returns_empty_when_nothing_found(self, tmp_path, monkeypatch):
+        monkeypatch.setenv("CODEX_HOME", str(tmp_path / "nonexistent"))
+        result = detect_external_credentials()
+        assert result == []
@@ -10,6 +10,7 @@ import hermes_cli.gateway as gateway
 class TestSystemdLingerStatus:
    def test_reports_enabled(self, monkeypatch):
        monkeypatch.setattr(gateway, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway, "is_termux", lambda: False)
        monkeypatch.setenv("USER", "alice")
        monkeypatch.setattr(
            gateway.subprocess,
@@ -22,6 +23,7 @@ class TestSystemdLingerStatus:

    def test_reports_disabled(self, monkeypatch):
        monkeypatch.setattr(gateway, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway, "is_termux", lambda: False)
        monkeypatch.setenv("USER", "alice")
        monkeypatch.setattr(
            gateway.subprocess,
@@ -32,6 +34,11 @@ class TestSystemdLingerStatus:

        assert gateway.get_systemd_linger_status() == (False, "")

+    def test_reports_termux_as_not_supported(self, monkeypatch):
+        monkeypatch.setattr(gateway, "is_termux", lambda: True)
+
+        assert gateway.get_systemd_linger_status() == (None, "not supported in Termux")
+

 def test_systemd_status_warns_when_linger_disabled(monkeypatch, tmp_path, capsys):
    unit_path = tmp_path / "hermes-gateway.service"
@@ -8,6 +8,7 @@ import hermes_cli.gateway as gateway
 class TestEnsureLingerEnabled:
    def test_linger_already_enabled_via_file(self, monkeypatch, capsys):
        monkeypatch.setattr(gateway, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway, "is_termux", lambda: False)
        monkeypatch.setattr("getpass.getuser", lambda: "testuser")
        monkeypatch.setattr(gateway, "Path", lambda _path: SimpleNamespace(exists=lambda: True))

@@ -22,6 +23,7 @@ class TestEnsureLingerEnabled:

    def test_status_enabled_skips_enable(self, monkeypatch, capsys):
        monkeypatch.setattr(gateway, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway, "is_termux", lambda: False)
        monkeypatch.setattr("getpass.getuser", lambda: "testuser")
        monkeypatch.setattr(gateway, "Path", lambda _path: SimpleNamespace(exists=lambda: False))
        monkeypatch.setattr(gateway, "get_systemd_linger_status", lambda: (True, ""))
@@ -37,6 +39,7 @@ class TestEnsureLingerEnabled:

    def test_loginctl_success_enables_linger(self, monkeypatch, capsys):
        monkeypatch.setattr(gateway, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway, "is_termux", lambda: False)
        monkeypatch.setattr("getpass.getuser", lambda: "testuser")
        monkeypatch.setattr(gateway, "Path", lambda _path: SimpleNamespace(exists=lambda: False))
        monkeypatch.setattr(gateway, "get_systemd_linger_status", lambda: (False, ""))
@@ -59,6 +62,7 @@ class TestEnsureLingerEnabled:

    def test_missing_loginctl_shows_manual_guidance(self, monkeypatch, capsys):
        monkeypatch.setattr(gateway, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway, "is_termux", lambda: False)
        monkeypatch.setattr("getpass.getuser", lambda: "testuser")
        monkeypatch.setattr(gateway, "Path", lambda _path: SimpleNamespace(exists=lambda: False))
        monkeypatch.setattr(gateway, "get_systemd_linger_status", lambda: (None, "loginctl not found"))
@@ -76,6 +80,7 @@ class TestEnsureLingerEnabled:

    def test_loginctl_failure_shows_manual_guidance(self, monkeypatch, capsys):
        monkeypatch.setattr(gateway, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway, "is_termux", lambda: False)
        monkeypatch.setattr("getpass.getuser", lambda: "testuser")
        monkeypatch.setattr(gateway, "Path", lambda _path: SimpleNamespace(exists=lambda: False))
        monkeypatch.setattr(gateway, "get_systemd_linger_status", lambda: (False, ""))
@@ -109,7 +109,8 @@ class TestGatewayStopCleanup:
        unit_path = tmp_path / "hermes-gateway.service"
        unit_path.write_text("unit\n", encoding="utf-8")

-        monkeypatch.setattr(gateway_cli, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway_cli, "supports_systemd_services", lambda: True)
+        monkeypatch.setattr(gateway_cli, "is_termux", lambda: False)
        monkeypatch.setattr(gateway_cli, "is_macos", lambda: False)
        monkeypatch.setattr(gateway_cli, "get_systemd_unit_path", lambda system=False: unit_path)

@@ -134,7 +135,8 @@ class TestGatewayStopCleanup:
        unit_path = tmp_path / "hermes-gateway.service"
        unit_path.write_text("unit\n", encoding="utf-8")

-        monkeypatch.setattr(gateway_cli, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway_cli, "supports_systemd_services", lambda: True)
+        monkeypatch.setattr(gateway_cli, "is_termux", lambda: False)
        monkeypatch.setattr(gateway_cli, "is_macos", lambda: False)
        monkeypatch.setattr(gateway_cli, "get_systemd_unit_path", lambda system=False: unit_path)

@@ -256,7 +258,8 @@ class TestGatewayServiceDetection:
        user_unit = SimpleNamespace(exists=lambda: True)
        system_unit = SimpleNamespace(exists=lambda: True)

-        monkeypatch.setattr(gateway_cli, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway_cli, "supports_systemd_services", lambda: True)
+        monkeypatch.setattr(gateway_cli, "is_termux", lambda: False)
        monkeypatch.setattr(gateway_cli, "is_macos", lambda: False)
        monkeypatch.setattr(
            gateway_cli,
@@ -278,7 +281,8 @@ class TestGatewayServiceDetection:

 class TestGatewaySystemServiceRouting:
    def test_gateway_install_passes_system_flags(self, monkeypatch):
-        monkeypatch.setattr(gateway_cli, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway_cli, "supports_systemd_services", lambda: True)
+        monkeypatch.setattr(gateway_cli, "is_termux", lambda: False)
        monkeypatch.setattr(gateway_cli, "is_macos", lambda: False)

        calls = []
@@ -294,11 +298,30 @@ class TestGatewaySystemServiceRouting:

        assert calls == [(True, True, "alice")]

+    def test_gateway_install_reports_termux_manual_mode(self, monkeypatch, capsys):
+        monkeypatch.setattr(gateway_cli, "is_termux", lambda: True)
+        monkeypatch.setattr(gateway_cli, "supports_systemd_services", lambda: False)
+        monkeypatch.setattr(gateway_cli, "is_macos", lambda: False)
+
+        try:
+            gateway_cli.gateway_command(
+                SimpleNamespace(gateway_command="install", force=False, system=False, run_as_user=None)
+            )
+        except SystemExit as exc:
+            assert exc.code == 1
+        else:
+            raise AssertionError("Expected gateway_command to exit on unsupported Termux service install")
+
+        out = capsys.readouterr().out
+        assert "not supported on Termux" in out
+        assert "Run manually: hermes gateway" in out
+
    def test_gateway_status_prefers_system_service_when_only_system_unit_exists(self, monkeypatch):
        user_unit = SimpleNamespace(exists=lambda: False)
        system_unit = SimpleNamespace(exists=lambda: True)

-        monkeypatch.setattr(gateway_cli, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway_cli, "supports_systemd_services", lambda: True)
+        monkeypatch.setattr(gateway_cli, "is_termux", lambda: False)
        monkeypatch.setattr(gateway_cli, "is_macos", lambda: False)
        monkeypatch.setattr(
            gateway_cli,
@@ -313,6 +336,20 @@ class TestGatewaySystemServiceRouting:

        assert calls == [(False, False)]

+    def test_gateway_status_on_termux_shows_manual_guidance(self, monkeypatch, capsys):
+        monkeypatch.setattr(gateway_cli, "supports_systemd_services", lambda: False)
+        monkeypatch.setattr(gateway_cli, "is_termux", lambda: True)
+        monkeypatch.setattr(gateway_cli, "is_macos", lambda: False)
+        monkeypatch.setattr(gateway_cli, "find_gateway_pids", lambda exclude_pids=None: [])
+        monkeypatch.setattr(gateway_cli, "_runtime_health_lines", lambda: [])
+
+        gateway_cli.gateway_command(SimpleNamespace(gateway_command="status", deep=False, system=False))
+
+        out = capsys.readouterr().out
+        assert "Gateway is not running" in out
+        assert "nohup hermes gateway" in out
+        assert "install as user service" not in out
+
    def test_gateway_restart_does_not_fallback_to_foreground_when_launchd_restart_fails(self, tmp_path, monkeypatch):
        plist_path = tmp_path / "ai.hermes.gateway.plist"
        plist_path.write_text("plist\n", encoding="utf-8")
@@ -513,12 +550,22 @@ class TestGeneratedUnitUsesDetectedVenv:
 class TestGeneratedUnitIncludesLocalBin:
    """~/.local/bin must be in PATH so uvx/pipx tools are discoverable."""

-    def test_user_unit_includes_local_bin_in_path(self):
+    def test_user_unit_includes_local_bin_in_path(self, monkeypatch):
+        home = Path.home()
+        monkeypatch.setattr(
+            gateway_cli,
+            "_build_user_local_paths",
+            lambda home_path, existing: [str(home / ".local" / "bin")],
+        )
        unit = gateway_cli.generate_systemd_unit(system=False)
-        home = str(Path.home())
        assert f"{home}/.local/bin" in unit

-    def test_system_unit_includes_local_bin_in_path(self):
+    def test_system_unit_includes_local_bin_in_path(self, monkeypatch):
+        monkeypatch.setattr(
+            gateway_cli,
+            "_build_user_local_paths",
+            lambda home_path, existing: [str(home_path / ".local" / "bin")],
+        )
        unit = gateway_cli.generate_systemd_unit(system=True)
        # System unit uses the resolved home dir from _system_service_identity
        assert "/.local/bin" in unit
@@ -3,13 +3,15 @@
 from unittest.mock import patch, MagicMock

 from hermes_cli.models import (
-    OPENROUTER_MODELS, model_ids, detect_provider_for_model,
+    OPENROUTER_MODELS, menu_labels, model_ids, detect_provider_for_model,
    filter_nous_free_models, _NOUS_ALLOWED_FREE_MODELS,
    is_nous_free_tier, partition_nous_models_by_tier,
-    check_nous_free_tier, _FREE_TIER_CACHE_TTL,
+    check_nous_free_tier, clear_nous_free_tier_cache,
+    _FREE_TIER_CACHE_TTL,
 )
 import hermes_cli.models as _models_mod

+
 class TestModelIds:
    def test_returns_non_empty_list(self):
        ids = model_ids()
@@ -31,6 +33,25 @@ class TestModelIds:
        assert len(ids) == len(set(ids)), "Duplicate model IDs found"


+class TestMenuLabels:
+    def test_same_length_as_model_ids(self):
+        assert len(menu_labels()) == len(model_ids())
+
+    def test_first_label_marked_recommended(self):
+        labels = menu_labels()
+        assert "recommended" in labels[0].lower()
+
+    def test_each_label_contains_its_model_id(self):
+        for label, mid in zip(menu_labels(), model_ids()):
+            assert mid in label, f"Label '{label}' doesn't contain model ID '{mid}'"
+
+    def test_non_recommended_labels_have_no_tag(self):
+        """Only the first model should have (recommended)."""
+        labels = menu_labels()
+        for label in labels[1:]:
+            assert "recommended" not in label.lower(), f"Unexpected 'recommended' in '{label}'"
+
+
 class TestOpenRouterModels:
    def test_structure_is_list_of_tuples(self):
        for entry in OPENROUTER_MODELS:
@@ -281,10 +302,12 @@ class TestCheckNousFreeTierCache:
    """Tests for the TTL cache on check_nous_free_tier()."""

    def setup_method(self):
-        _models_mod._free_tier_cache = None
+        """Reset cache before each test."""
+        clear_nous_free_tier_cache()

    def teardown_method(self):
-        _models_mod._free_tier_cache = None
+        """Reset cache after each test."""
+        clear_nous_free_tier_cache()

    @patch("hermes_cli.models.fetch_nous_account_tier")
    @patch("hermes_cli.models.is_nous_free_tier", return_value=True)
@@ -298,6 +321,7 @@ class TestCheckNousFreeTierCache:

        assert result1 is True
        assert result2 is True
+        # fetch_nous_account_tier should only be called once (cached on second call)
        assert mock_fetch.call_count == 1

    @patch("hermes_cli.models.fetch_nous_account_tier")
@@ -310,6 +334,7 @@ class TestCheckNousFreeTierCache:
            result1 = check_nous_free_tier()
            assert mock_fetch.call_count == 1

+            # Simulate TTL expiry by backdating the cache timestamp
            cached_result, cached_at = _models_mod._free_tier_cache
            _models_mod._free_tier_cache = (cached_result, cached_at - _FREE_TIER_CACHE_TTL - 1)

@@ -319,6 +344,15 @@ class TestCheckNousFreeTierCache:
        assert result1 is False
        assert result2 is False

+    def test_clear_cache_forces_refresh(self):
+        """clear_nous_free_tier_cache() invalidates the cached result."""
+        # Manually seed the cache
+        import time
+        _models_mod._free_tier_cache = (True, time.monotonic())
+
+        clear_nous_free_tier_cache()
+        assert _models_mod._free_tier_cache is None
+
    def test_cache_ttl_is_short(self):
        """TTL should be short enough to catch upgrades quickly (<=5 min)."""
        assert _FREE_TIER_CACHE_TTL <= 300
@@ -0,0 +1,21 @@
+from pathlib import Path
+import subprocess
+
+
+REPO_ROOT = Path(__file__).resolve().parents[2]
+SETUP_SCRIPT = REPO_ROOT / "setup-hermes.sh"
+
+
+def test_setup_hermes_script_is_valid_shell():
+    result = subprocess.run(["bash", "-n", str(SETUP_SCRIPT)], capture_output=True, text=True)
+    assert result.returncode == 0, result.stderr
+
+
+def test_setup_hermes_script_has_termux_path():
+    content = SETUP_SCRIPT.read_text(encoding="utf-8")
+
+    assert "is_termux()" in content
+    assert ".[termux]" in content
+    assert "constraints-termux.txt" in content
+    assert "$PREFIX/bin" in content
+    assert "Skipping tinker-atropos on Termux" in content
@@ -305,6 +305,7 @@ def test_setup_copilot_acp_skips_same_provider_pool_step(tmp_path, monkeypatch):
    monkeypatch.setattr("hermes_cli.setup.prompt_yes_no", fake_prompt_yes_no)
    monkeypatch.setattr("hermes_cli.setup.prompt", lambda *args, **kwargs: "")
    monkeypatch.setattr("hermes_cli.auth.get_active_provider", lambda: None)
+    monkeypatch.setattr("hermes_cli.auth.detect_external_credentials", lambda: [])
    monkeypatch.setattr("agent.auxiliary_client.get_available_vision_backends", lambda: [])

    setup_model_provider(config)
@@ -0,0 +1,155 @@
+"""Tests for _setup_provider_model_selection and the zai/kimi/minimax branch.
+
+Regression test for the is_coding_plan NameError that crashed setup when
+selecting zai, kimi-coding, minimax, or minimax-cn providers.
+"""
+import pytest
+from unittest.mock import patch, MagicMock
+
+
+@pytest.fixture
+def mock_provider_registry():
+    """Minimal PROVIDER_REGISTRY entries for tested providers."""
+    class FakePConfig:
+        def __init__(self, name, env_vars, base_url_env, inference_url):
+            self.name = name
+            self.api_key_env_vars = env_vars
+            self.base_url_env_var = base_url_env
+            self.inference_base_url = inference_url
+
+    return {
+        "zai": FakePConfig("ZAI", ["ZAI_API_KEY"], "ZAI_BASE_URL", "https://api.zai.example"),
+        "kimi-coding": FakePConfig("Kimi Coding", ["KIMI_API_KEY"], "KIMI_BASE_URL", "https://api.kimi.example"),
+        "minimax": FakePConfig("MiniMax", ["MINIMAX_API_KEY"], "MINIMAX_BASE_URL", "https://api.minimax.example"),
+        "minimax-cn": FakePConfig("MiniMax CN", ["MINIMAX_API_KEY"], "MINIMAX_CN_BASE_URL", "https://api.minimax-cn.example"),
+        "opencode-zen": FakePConfig("OpenCode Zen", ["OPENCODE_ZEN_API_KEY"], "OPENCODE_ZEN_BASE_URL", "https://opencode.ai/zen/v1"),
+        "opencode-go": FakePConfig("OpenCode Go", ["OPENCODE_GO_API_KEY"], "OPENCODE_GO_BASE_URL", "https://opencode.ai/zen/go/v1"),
+    }
+
+
+class TestSetupProviderModelSelection:
+    """Verify _setup_provider_model_selection works for all providers
+    that previously hit the is_coding_plan NameError."""
+
+    @pytest.mark.parametrize("provider_id,expected_defaults", [
+        ("zai", ["glm-5", "glm-4.7", "glm-4.5", "glm-4.5-flash"]),
+        ("kimi-coding", ["kimi-k2.5", "kimi-k2-thinking", "kimi-k2-turbo-preview"]),
+        ("minimax", ["MiniMax-M1", "MiniMax-M1-40k", "MiniMax-M1-80k", "MiniMax-M1-128k", "MiniMax-M1-256k", "MiniMax-M2.5", "MiniMax-M2.7"]),
+        ("minimax-cn", ["MiniMax-M1", "MiniMax-M1-40k", "MiniMax-M1-80k", "MiniMax-M1-128k", "MiniMax-M1-256k", "MiniMax-M2.5", "MiniMax-M2.7"]),
+        ("opencode-zen", ["gpt-5.4", "gpt-5.3-codex", "claude-sonnet-4-6", "gemini-3-flash"]),
+        ("opencode-go", ["glm-5", "kimi-k2.5", "minimax-m2.5", "minimax-m2.7"]),
+    ])
+    @patch("hermes_cli.models.fetch_api_models", return_value=[])
+    @patch("hermes_cli.config.get_env_value", return_value="fake-key")
+    def test_falls_back_to_default_models_without_crashing(
+        self, mock_env, mock_fetch, provider_id, expected_defaults, mock_provider_registry
+    ):
+        """Previously this code path raised NameError: 'is_coding_plan'.
+        Now it delegates to _setup_provider_model_selection which uses
+        _DEFAULT_PROVIDER_MODELS -- no crash, correct model list."""
+        from hermes_cli.setup import _setup_provider_model_selection
+
+        captured_choices = {}
+
+        def fake_prompt_choice(label, choices, default):
+            captured_choices["choices"] = choices
+            # Select "Keep current" (last item)
+            return len(choices) - 1
+
+        with patch("hermes_cli.auth.PROVIDER_REGISTRY", mock_provider_registry):
+            _setup_provider_model_selection(
+                config={"model": {}},
+                provider_id=provider_id,
+                current_model="some-model",
+                prompt_choice=fake_prompt_choice,
+                prompt_fn=lambda _: None,
+            )
+
+        # The offered model list should start with the default models
+        offered = captured_choices["choices"]
+        for model in expected_defaults:
+            assert model in offered, f"{model} not in choices for {provider_id}"
+
+    @patch("hermes_cli.models.fetch_api_models")
+    @patch("hermes_cli.config.get_env_value", return_value="fake-key")
+    def test_live_models_used_when_available(
+        self, mock_env, mock_fetch, mock_provider_registry
+    ):
+        """When fetch_api_models returns results, those are used instead of defaults."""
+        from hermes_cli.setup import _setup_provider_model_selection
+
+        live = ["live-model-1", "live-model-2"]
+        mock_fetch.return_value = live
+
+        captured_choices = {}
+
+        def fake_prompt_choice(label, choices, default):
+            captured_choices["choices"] = choices
+            return len(choices) - 1
+
+        with patch("hermes_cli.auth.PROVIDER_REGISTRY", mock_provider_registry):
+            _setup_provider_model_selection(
+                config={"model": {}},
+                provider_id="zai",
+                current_model="some-model",
+                prompt_choice=fake_prompt_choice,
+                prompt_fn=lambda _: None,
+            )
+
+        offered = captured_choices["choices"]
+        assert "live-model-1" in offered
+        assert "live-model-2" in offered
+
+    @patch("hermes_cli.models.fetch_api_models", return_value=[])
+    @patch("hermes_cli.config.get_env_value", return_value="fake-key")
+    def test_custom_model_selection(
+        self, mock_env, mock_fetch, mock_provider_registry
+    ):
+        """Selecting 'Custom model' lets user type a model name."""
+        from hermes_cli.setup import _setup_provider_model_selection, _DEFAULT_PROVIDER_MODELS
+
+        defaults = _DEFAULT_PROVIDER_MODELS["zai"]
+        custom_model_idx = len(defaults)  # "Custom model" is right after defaults
+
+        config = {"model": {}}
+
+        def fake_prompt_choice(label, choices, default):
+            return custom_model_idx
+
+        with patch("hermes_cli.auth.PROVIDER_REGISTRY", mock_provider_registry):
+            _setup_provider_model_selection(
+                config=config,
+                provider_id="zai",
+                current_model="some-model",
+                prompt_choice=fake_prompt_choice,
+                prompt_fn=lambda _: "my-custom-model",
+            )
+
+        assert config["model"]["default"] == "my-custom-model"
+
+    @patch("hermes_cli.models.fetch_api_models", return_value=["opencode-go/kimi-k2.5", "opencode-go/minimax-m2.7"])
+    @patch("hermes_cli.config.get_env_value", return_value="fake-key")
+    def test_opencode_live_models_are_normalized_for_selection(
+        self, mock_env, mock_fetch, mock_provider_registry
+    ):
+        from hermes_cli.setup import _setup_provider_model_selection
+
+        captured_choices = {}
+
+        def fake_prompt_choice(label, choices, default):
+            captured_choices["choices"] = choices
+            return len(choices) - 1
+
+        with patch("hermes_cli.auth.PROVIDER_REGISTRY", mock_provider_registry):
+            _setup_provider_model_selection(
+                config={"model": {}},
+                provider_id="opencode-go",
+                current_model="opencode-go/kimi-k2.5",
+                prompt_choice=fake_prompt_choice,
+                prompt_fn=lambda _: None,
+            )
+
+        offered = captured_choices["choices"]
+        assert "kimi-k2.5" in offered
+        assert "minimax-m2.7" in offered
+        assert all("opencode-go/" not in choice for choice in offered)
@@ -196,6 +196,31 @@ class TestDisplayIntegration:
        set_active_skin("ares")
        assert get_skin_tool_prefix() == "╎"

+    def test_get_skin_faces_default(self):
+        from agent.display import get_skin_faces, KawaiiSpinner
+        faces = get_skin_faces("waiting_faces", KawaiiSpinner.KAWAII_WAITING)
+        # Default skin has no custom faces, so should return the default list
+        assert faces == KawaiiSpinner.KAWAII_WAITING
+
+    def test_get_skin_faces_ares(self):
+        from hermes_cli.skin_engine import set_active_skin
+        from agent.display import get_skin_faces, KawaiiSpinner
+        set_active_skin("ares")
+        faces = get_skin_faces("waiting_faces", KawaiiSpinner.KAWAII_WAITING)
+        assert "(⚔)" in faces
+
+    def test_get_skin_verbs_default(self):
+        from agent.display import get_skin_verbs, KawaiiSpinner
+        verbs = get_skin_verbs()
+        assert verbs == KawaiiSpinner.THINKING_VERBS
+
+    def test_get_skin_verbs_ares(self):
+        from hermes_cli.skin_engine import set_active_skin
+        from agent.display import get_skin_verbs
+        set_active_skin("ares")
+        verbs = get_skin_verbs()
+        assert "forging" in verbs
+
    def test_tool_message_uses_skin_prefix(self):
        from hermes_cli.skin_engine import set_active_skin
        from agent.display import get_cute_tool_message
@@ -12,3 +12,33 @@ def test_show_status_includes_tavily_key(monkeypatch, capsys, tmp_path):
    output = capsys.readouterr().out
    assert "Tavily" in output
    assert "tvly...cdef" in output
+
+
+def test_show_status_termux_gateway_section_skips_systemctl(monkeypatch, capsys, tmp_path):
+    from hermes_cli import status as status_mod
+    import hermes_cli.auth as auth_mod
+    import hermes_cli.gateway as gateway_mod
+
+    monkeypatch.setenv("TERMUX_VERSION", "0.118.3")
+    monkeypatch.setenv("PREFIX", "/data/data/com.termux/files/usr")
+    monkeypatch.setattr(status_mod, "get_env_path", lambda: tmp_path / ".env", raising=False)
+    monkeypatch.setattr(status_mod, "get_hermes_home", lambda: tmp_path, raising=False)
+    monkeypatch.setattr(status_mod, "load_config", lambda: {"model": "gpt-5.4"}, raising=False)
+    monkeypatch.setattr(status_mod, "resolve_requested_provider", lambda requested=None: "openai-codex", raising=False)
+    monkeypatch.setattr(status_mod, "resolve_provider", lambda requested=None, **kwargs: "openai-codex", raising=False)
+    monkeypatch.setattr(status_mod, "provider_label", lambda provider: "OpenAI Codex", raising=False)
+    monkeypatch.setattr(auth_mod, "get_nous_auth_status", lambda: {}, raising=False)
+    monkeypatch.setattr(auth_mod, "get_codex_auth_status", lambda: {}, raising=False)
+    monkeypatch.setattr(gateway_mod, "find_gateway_pids", lambda exclude_pids=None: [], raising=False)
+
+    def _unexpected_systemctl(*args, **kwargs):
+        raise AssertionError("systemctl should not be called in the Termux status view")
+
+    monkeypatch.setattr(status_mod.subprocess, "run", _unexpected_systemctl)
+
+    status_mod.show_status(SimpleNamespace(all=False, deep=False))
+
+    output = capsys.readouterr().out
+    assert "Manager:      Termux / manual process" in output
+    assert "Start with:   hermes gateway" in output
+    assert "systemd (user)" not in output
@@ -368,9 +368,8 @@ class TestCmdUpdateLaunchdRestart:
        monkeypatch.setattr(
            gateway_cli, "is_macos", lambda: False,
        )
-        monkeypatch.setattr(
-            gateway_cli, "is_linux", lambda: True,
-        )
+        monkeypatch.setattr(gateway_cli, "supports_systemd_services", lambda: True)
+        monkeypatch.setattr(gateway_cli, "is_termux", lambda: False)

        mock_run.side_effect = _make_run_side_effect(
            commit_count="3",
@@ -429,7 +428,8 @@ class TestCmdUpdateSystemService:
    ):
        """When user systemd is inactive but a system service exists, restart via system scope."""
        monkeypatch.setattr(gateway_cli, "is_macos", lambda: False)
-        monkeypatch.setattr(gateway_cli, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway_cli, "supports_systemd_services", lambda: True)
+        monkeypatch.setattr(gateway_cli, "is_termux", lambda: False)

        mock_run.side_effect = _make_run_side_effect(
            commit_count="3",
@@ -458,7 +458,8 @@ class TestCmdUpdateSystemService:
    ):
        """When system service restart fails, show the failure message."""
        monkeypatch.setattr(gateway_cli, "is_macos", lambda: False)
-        monkeypatch.setattr(gateway_cli, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway_cli, "supports_systemd_services", lambda: True)
+        monkeypatch.setattr(gateway_cli, "is_termux", lambda: False)

        mock_run.side_effect = _make_run_side_effect(
            commit_count="3",
@@ -480,7 +481,8 @@ class TestCmdUpdateSystemService:
    ):
        """When both user and system services are active, both are restarted."""
        monkeypatch.setattr(gateway_cli, "is_macos", lambda: False)
-        monkeypatch.setattr(gateway_cli, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway_cli, "supports_systemd_services", lambda: True)
+        monkeypatch.setattr(gateway_cli, "is_termux", lambda: False)

        mock_run.side_effect = _make_run_side_effect(
            commit_count="3",
@@ -563,7 +565,8 @@ class TestServicePidExclusion:
    ):
        """After systemd restart, the sweep must exclude the service PID."""
        monkeypatch.setattr(gateway_cli, "is_macos", lambda: False)
-        monkeypatch.setattr(gateway_cli, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway_cli, "supports_systemd_services", lambda: True)
+        monkeypatch.setattr(gateway_cli, "is_termux", lambda: False)

        SERVICE_PID = 55000

@@ -642,7 +645,8 @@ class TestGetServicePids:
    """Unit tests for _get_service_pids()."""

    def test_returns_systemd_main_pid(self, monkeypatch):
-        monkeypatch.setattr(gateway_cli, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway_cli, "supports_systemd_services", lambda: True)
+        monkeypatch.setattr(gateway_cli, "is_termux", lambda: False)
        monkeypatch.setattr(gateway_cli, "is_macos", lambda: False)

        def fake_run(cmd, **kwargs):
@@ -691,7 +695,8 @@ class TestGetServicePids:

    def test_excludes_zero_pid(self, monkeypatch):
        """systemd returns MainPID=0 for stopped services; skip those."""
-        monkeypatch.setattr(gateway_cli, "is_linux", lambda: True)
+        monkeypatch.setattr(gateway_cli, "supports_systemd_services", lambda: True)
+        monkeypatch.setattr(gateway_cli, "is_termux", lambda: False)
        monkeypatch.setattr(gateway_cli, "is_macos", lambda: False)

        def fake_run(cmd, **kwargs):
@@ -172,6 +172,87 @@ class TestHTTP413Compression:
        mock_compress.assert_called_once()
        assert result["completed"] is True

+    def test_413_clears_conversation_history_on_persist(self, agent):
+        """After 413-triggered compression, _persist_session must receive None history.
+
+        Bug: _compress_context() creates a new session and resets _last_flushed_db_idx=0,
+        but if conversation_history still holds the original (pre-compression) list,
+        _flush_messages_to_session_db computes flush_from = max(len(history), 0) which
+        exceeds len(compressed_messages), so messages[flush_from:] is empty and nothing
+        is written to the new session → "Session found but has no messages" on resume.
+        """
+        err_413 = _make_413_error()
+        ok_resp = _mock_response(content="OK", finish_reason="stop")
+        agent.client.chat.completions.create.side_effect = [err_413, ok_resp]
+
+        big_history = [
+            {"role": "user" if i % 2 == 0 else "assistant", "content": f"msg {i}"}
+            for i in range(200)
+        ]
+
+        persist_calls = []
+
+        with (
+            patch.object(agent, "_compress_context") as mock_compress,
+            patch.object(
+                agent, "_persist_session",
+                side_effect=lambda msgs, hist: persist_calls.append(hist),
+            ),
+            patch.object(agent, "_save_trajectory"),
+            patch.object(agent, "_cleanup_task_resources"),
+        ):
+            mock_compress.return_value = (
+                [{"role": "user", "content": "summary"}],
+                "compressed prompt",
+            )
+            agent.run_conversation("hello", conversation_history=big_history)
+
+        assert len(persist_calls) >= 1, "Expected at least one _persist_session call"
+        for hist in persist_calls:
+            assert hist is None, (
+                f"conversation_history should be None after mid-loop compression, "
+                f"got list with {len(hist)} items"
+            )
+
+    def test_context_overflow_clears_conversation_history_on_persist(self, agent):
+        """After context-overflow compression, _persist_session must receive None history."""
+        err_400 = Exception(
+            "Error code: 400 - This endpoint's maximum context length is 128000 tokens. "
+            "However, you requested about 270460 tokens."
+        )
+        err_400.status_code = 400
+        ok_resp = _mock_response(content="OK", finish_reason="stop")
+        agent.client.chat.completions.create.side_effect = [err_400, ok_resp]
+
+        big_history = [
+            {"role": "user" if i % 2 == 0 else "assistant", "content": f"msg {i}"}
+            for i in range(200)
+        ]
+
+        persist_calls = []
+
+        with (
+            patch.object(agent, "_compress_context") as mock_compress,
+            patch.object(
+                agent, "_persist_session",
+                side_effect=lambda msgs, hist: persist_calls.append(hist),
+            ),
+            patch.object(agent, "_save_trajectory"),
+            patch.object(agent, "_cleanup_task_resources"),
+        ):
+            mock_compress.return_value = (
+                [{"role": "user", "content": "summary"}],
+                "compressed prompt",
+            )
+            agent.run_conversation("hello", conversation_history=big_history)
+
+        assert len(persist_calls) >= 1
+        for hist in persist_calls:
+            assert hist is None, (
+                f"conversation_history should be None after context-overflow compression, "
+                f"got list with {len(hist)} items"
+            )
+
    def test_400_context_length_triggers_compression(self, agent):
        """A 400 with 'maximum context length' should trigger compression, not abort as generic 4xx.

@@ -225,6 +225,26 @@ class TestDeveloperRoleSwap:
        assert kwargs["messages"][0]["role"] == "developer"


+class TestBuildApiKwargsChatCompletionsServiceTier:
+    """service_tier via request_overrides works on the chat_completions path."""
+
+    def test_includes_service_tier_via_request_overrides(self, monkeypatch):
+        agent = _make_agent(monkeypatch, "openrouter")
+        agent.model = "gpt-4.1"
+        agent.request_overrides = {"service_tier": "priority"}
+        messages = [{"role": "user", "content": "hi"}]
+        kwargs = agent._build_api_kwargs(messages)
+        assert kwargs["service_tier"] == "priority"
+
+    def test_no_service_tier_when_overrides_empty(self, monkeypatch):
+        agent = _make_agent(monkeypatch, "openrouter")
+        agent.model = "gpt-4.1"
+        agent.request_overrides = {}
+        messages = [{"role": "user", "content": "hi"}]
+        kwargs = agent._build_api_kwargs(messages)
+        assert "service_tier" not in kwargs
+
+
 class TestBuildApiKwargsAIGateway:
    def test_uses_chat_completions_format(self, monkeypatch):
        agent = _make_agent(monkeypatch, "ai-gateway", base_url="https://ai-gateway.vercel.sh/v1")
@@ -356,6 +376,25 @@ class TestBuildApiKwargsCodex:
        assert "reasoning" in kwargs
        assert kwargs["reasoning"]["effort"] == "medium"

+    def test_includes_service_tier_via_request_overrides(self, monkeypatch):
+        agent = _make_agent(monkeypatch, "openai-codex", api_mode="codex_responses",
+                            base_url="https://chatgpt.com/backend-api/codex")
+        agent.model = "gpt-5.4"
+        agent.service_tier = "priority"
+        agent.request_overrides = {"service_tier": "priority"}
+        messages = [{"role": "user", "content": "hi"}]
+        kwargs = agent._build_api_kwargs(messages)
+        assert kwargs["service_tier"] == "priority"
+
+    def test_omits_max_output_tokens_for_codex_backend(self, monkeypatch):
+        agent = _make_agent(monkeypatch, "openai-codex", api_mode="codex_responses",
+                            base_url="https://chatgpt.com/backend-api/codex")
+        agent.model = "gpt-5.4"
+        agent.max_tokens = 20
+        messages = [{"role": "user", "content": "hi"}]
+        kwargs = agent._build_api_kwargs(messages)
+        assert "max_output_tokens" not in kwargs
+
    def test_includes_encrypted_content_in_include(self, monkeypatch):
        agent = _make_agent(monkeypatch, "openai-codex", api_mode="codex_responses",
                            base_url="https://chatgpt.com/backend-api/codex")
@@ -5,6 +5,7 @@ pieces. The OpenAI client and tool loading are mocked so no network calls
 are made.
 """

+import io
 import json
 import logging
 import re
@@ -1061,6 +1062,77 @@ class TestExecuteToolCalls:
        assert len(messages[0]["content"]) < 150_000
        assert ("Truncated" in messages[0]["content"] or "<persisted-output>" in messages[0]["content"])

+    def test_quiet_tool_output_suppressed_when_progress_callback_present(self, agent):
+        tc = _mock_tool_call(name="web_search", arguments='{"q":"test"}', call_id="c1")
+        mock_msg = _mock_assistant_msg(content="", tool_calls=[tc])
+        messages = []
+        agent.tool_progress_callback = lambda *args, **kwargs: None
+
+        with patch("run_agent.handle_function_call", return_value="search result"), \
+             patch.object(agent, "_safe_print") as mock_print:
+            agent._execute_tool_calls(mock_msg, messages, "task-1")
+
+        mock_print.assert_not_called()
+        assert len(messages) == 1
+        assert messages[0]["role"] == "tool"
+
+    def test_quiet_tool_output_prints_without_progress_callback(self, agent):
+        tc = _mock_tool_call(name="web_search", arguments='{"q":"test"}', call_id="c1")
+        mock_msg = _mock_assistant_msg(content="", tool_calls=[tc])
+        messages = []
+        agent.tool_progress_callback = None
+
+        with patch("run_agent.handle_function_call", return_value="search result"), \
+             patch.object(agent, "_safe_print") as mock_print:
+            agent._execute_tool_calls(mock_msg, messages, "task-1")
+
+        mock_print.assert_called_once()
+        assert "search" in str(mock_print.call_args.args[0]).lower()
+        assert len(messages) == 1
+        assert messages[0]["role"] == "tool"
+
+    def test_vprint_suppressed_in_parseable_quiet_mode(self, agent):
+        agent.suppress_status_output = True
+
+        with patch.object(agent, "_safe_print") as mock_print:
+            agent._vprint("status line", force=True)
+            agent._vprint("normal line")
+
+        mock_print.assert_not_called()
+
+    def test_run_conversation_suppresses_retry_noise_in_parseable_quiet_mode(self, agent):
+        class _RateLimitError(Exception):
+            status_code = 429
+
+            def __str__(self):
+                return "Error code: 429 - Rate limit exceeded."
+
+        responses = [_RateLimitError(), _mock_response(content="Recovered")]
+
+        def _fake_api_call(api_kwargs):
+            result = responses.pop(0)
+            if isinstance(result, Exception):
+                raise result
+            return result
+
+        agent.suppress_status_output = True
+        agent._interruptible_api_call = _fake_api_call
+        agent._persist_session = lambda *args, **kwargs: None
+        agent._save_trajectory = lambda *args, **kwargs: None
+        agent._save_session_log = lambda *args, **kwargs: None
+
+        captured = io.StringIO()
+        agent._print_fn = lambda *args, **kw: print(*args, file=captured, **kw)
+
+        with patch("run_agent.time.sleep", return_value=None):
+            result = agent.run_conversation("hello")
+
+        assert result["completed"] is True
+        assert result["final_response"] == "Recovered"
+        output = captured.getvalue()
+        assert "API call failed" not in output
+        assert "Rate limit reached" not in output
+

 class TestConcurrentToolExecution:
    """Tests for _execute_tool_calls_concurrent and dispatch logic."""
@@ -1877,6 +1949,68 @@ class TestRunConversation:
        assert result["final_response"] is not None
        assert "Thinking Budget Exhausted" in result["final_response"]

+    def test_length_with_tool_calls_returns_partial_without_executing_tools(self, agent):
+        self._setup_agent(agent)
+        bad_tc = _mock_tool_call(
+            name="write_file",
+            arguments='{"path":"report.md","content":"partial',
+            call_id="c1",
+        )
+        resp = _mock_response(content="", finish_reason="length", tool_calls=[bad_tc])
+        agent.client.chat.completions.create.return_value = resp
+
+        with (
+            patch("run_agent.handle_function_call") as mock_handle_function_call,
+            patch.object(agent, "_persist_session"),
+            patch.object(agent, "_save_trajectory"),
+            patch.object(agent, "_cleanup_task_resources"),
+        ):
+            result = agent.run_conversation("write the report")
+
+        assert result["completed"] is False
+        assert result["partial"] is True
+        assert "truncated due to output length limit" in result["error"]
+        mock_handle_function_call.assert_not_called()
+
+    def test_truncated_tool_call_retries_once_before_refusing(self, agent):
+        """When tool call args are truncated, the agent retries the API call
+        once. If the retry succeeds (valid JSON args), tool execution proceeds."""
+        self._setup_agent(agent)
+        agent.valid_tool_names.add("write_file")
+        bad_tc = _mock_tool_call(
+            name="write_file",
+            arguments='{"path":"report.md","content":"partial',
+            call_id="c1",
+        )
+        truncated_resp = _mock_response(
+            content="", finish_reason="length", tool_calls=[bad_tc],
+        )
+        good_tc = _mock_tool_call(
+            name="write_file",
+            arguments='{"path":"report.md","content":"full content"}',
+            call_id="c2",
+        )
+        good_resp = _mock_response(
+            content="", finish_reason="stop", tool_calls=[good_tc],
+        )
+        with (
+            patch("run_agent.handle_function_call", return_value='{"success":true}') as mock_hfc,
+            patch.object(agent, "_persist_session"),
+            patch.object(agent, "_save_trajectory"),
+            patch.object(agent, "_cleanup_task_resources"),
+        ):
+            # First call: truncated → retry. Second: valid → execute tool.
+            # Third: final text response.
+            final_resp = _mock_response(content="Done!", finish_reason="stop")
+            agent.client.chat.completions.create.side_effect = [
+                truncated_resp, good_resp, final_resp,
+            ]
+            result = agent.run_conversation("write the report")
+
+        # Tool was executed on the retry (good_resp)
+        mock_hfc.assert_called_once()
+        assert result["final_response"] == "Done!"
+

 class TestRetryExhaustion:
    """Regression: retry_count > max_retries was dead code (off-by-one).
@@ -3010,6 +3144,20 @@ class TestStreamingApiCall:
        assert tc[0].function.name == "search"
        assert tc[1].function.name == "read"

+    def test_truncated_tool_call_args_upgrade_finish_reason_to_length(self, agent):
+        chunks = [
+            _make_chunk(tool_calls=[_make_tc_delta(0, "call_1", "write_file", '{"path":"x.txt","content":"hel')]),
+        ]
+        agent.client.chat.completions.create.return_value = iter(chunks)
+
+        resp = agent._interruptible_streaming_api_call({"messages": []})
+
+        tc = resp.choices[0].message.tool_calls
+        assert len(tc) == 1
+        assert tc[0].function.name == "write_file"
+        assert tc[0].function.arguments == '{"path":"x.txt","content":"hel'
+        assert resp.choices[0].finish_reason == "length"
+
    def test_ollama_reused_index_separate_tool_calls(self, agent):
        """Ollama sends every tool call at index 0 with different ids.

@@ -648,6 +648,15 @@ def test_preflight_codex_api_kwargs_allows_reasoning_and_temperature(monkeypatch
    assert result["max_output_tokens"] == 4096


+def test_preflight_codex_api_kwargs_allows_service_tier(monkeypatch):
+    agent = _build_agent(monkeypatch)
+    kwargs = _codex_request_kwargs()
+    kwargs["service_tier"] = "priority"
+
+    result = agent._preflight_codex_api_kwargs(kwargs)
+    assert result["service_tier"] == "priority"
+
+
 def test_run_conversation_codex_replay_payload_keeps_call_id(monkeypatch):
    agent = _build_agent(monkeypatch)
    responses = [_codex_tool_call_response(), _codex_message_response("done")]
@@ -20,13 +20,6 @@ from zoneinfo import ZoneInfo
 import hermes_time


-def _reset_hermes_time_cache():
-    """Reset the hermes_time module cache (replacement for removed reset_cache)."""
-    hermes_time._cached_tz = None
-    hermes_time._cached_tz_name = None
-    hermes_time._cache_resolved = False
-
-
 # =========================================================================
 # hermes_time.now() — core helper
 # =========================================================================
@@ -35,10 +28,10 @@ class TestHermesTimeNow:
    """Test the timezone-aware now() helper."""

    def setup_method(self):
-        _reset_hermes_time_cache()
+        hermes_time.reset_cache()

    def teardown_method(self):
-        _reset_hermes_time_cache()
+        hermes_time.reset_cache()
        os.environ.pop("HERMES_TIMEZONE", None)

    def test_valid_timezone_applies(self):
@@ -93,24 +86,24 @@ class TestHermesTimeNow:
    def test_cache_invalidation(self):
        """Changing env var + reset_cache picks up new timezone."""
        os.environ["HERMES_TIMEZONE"] = "UTC"
-        _reset_hermes_time_cache()
+        hermes_time.reset_cache()
        r1 = hermes_time.now()
        assert r1.utcoffset() == timedelta(0)

        os.environ["HERMES_TIMEZONE"] = "Asia/Kolkata"
-        _reset_hermes_time_cache()
+        hermes_time.reset_cache()
        r2 = hermes_time.now()
        assert r2.utcoffset() == timedelta(hours=5, minutes=30)


 class TestGetTimezone:
-    """Test get_timezone()."""
+    """Test get_timezone() and get_timezone_name()."""

    def setup_method(self):
-        _reset_hermes_time_cache()
+        hermes_time.reset_cache()

    def teardown_method(self):
-        _reset_hermes_time_cache()
+        hermes_time.reset_cache()
        os.environ.pop("HERMES_TIMEZONE", None)

    def test_returns_zoneinfo_for_valid(self):
@@ -129,6 +122,9 @@ class TestGetTimezone:
        tz = hermes_time.get_timezone()
        assert tz is None

+    def test_get_timezone_name(self):
+        os.environ["HERMES_TIMEZONE"] = "Asia/Tokyo"
+        assert hermes_time.get_timezone_name() == "Asia/Tokyo"


 # =========================================================================
@@ -209,10 +205,10 @@ class TestCronTimezone:
    """Verify cron paths use timezone-aware now()."""

    def setup_method(self):
-        _reset_hermes_time_cache()
+        hermes_time.reset_cache()

    def teardown_method(self):
-        _reset_hermes_time_cache()
+        hermes_time.reset_cache()
        os.environ.pop("HERMES_TIMEZONE", None)

    def test_parse_schedule_duration_uses_tz_aware_now(self):
@@ -241,7 +237,7 @@ class TestCronTimezone:
        monkeypatch.setattr(jobs_module, "OUTPUT_DIR", tmp_path / "cron" / "output")

        os.environ["HERMES_TIMEZONE"] = "Asia/Kolkata"
-        _reset_hermes_time_cache()
+        hermes_time.reset_cache()

        # Create a job with a NAIVE past timestamp (simulating pre-tz data)
        from cron.jobs import create_job, load_jobs, save_jobs, get_due_jobs
@@ -266,7 +262,7 @@ class TestCronTimezone:
        from cron.jobs import _ensure_aware

        os.environ["HERMES_TIMEZONE"] = "Asia/Kolkata"
-        _reset_hermes_time_cache()
+        hermes_time.reset_cache()

        # Create a naive datetime — will be interpreted as system-local time
        naive_dt = datetime(2026, 3, 11, 12, 0, 0)
@@ -290,7 +286,7 @@ class TestCronTimezone:
        from cron.jobs import _ensure_aware

        os.environ["HERMES_TIMEZONE"] = "Asia/Kolkata"
-        _reset_hermes_time_cache()
+        hermes_time.reset_cache()

        # Create an aware datetime in UTC
        utc_dt = datetime(2026, 3, 11, 15, 0, 0, tzinfo=timezone.utc)
@@ -316,7 +312,7 @@ class TestCronTimezone:
        monkeypatch.setattr(jobs_module, "OUTPUT_DIR", tmp_path / "cron" / "output")

        os.environ["HERMES_TIMEZONE"] = "UTC"
-        _reset_hermes_time_cache()
+        hermes_time.reset_cache()

        from cron.jobs import create_job, load_jobs, save_jobs, get_due_jobs

@@ -347,7 +343,7 @@ class TestCronTimezone:
        # of the naive timestamp exceeds _hermes_now's wall time — this would
        # have caused a false "not due" with the old replace(tzinfo=...) approach.
        os.environ["HERMES_TIMEZONE"] = "Pacific/Midway"  # UTC-11
-        _reset_hermes_time_cache()
+        hermes_time.reset_cache()

        from cron.jobs import create_job, load_jobs, save_jobs, get_due_jobs
        create_job(prompt="Cross-tz job", schedule="every 1h")
@@ -371,7 +367,7 @@ class TestCronTimezone:
        monkeypatch.setattr(jobs_module, "OUTPUT_DIR", tmp_path / "cron" / "output")

        os.environ["HERMES_TIMEZONE"] = "US/Eastern"
-        _reset_hermes_time_cache()
+        hermes_time.reset_cache()

        from cron.jobs import create_job
        job = create_job(prompt="TZ test", schedule="every 2h")
@@ -8,9 +8,12 @@ import tools.approval as approval_module
 from tools.approval import (
    _get_approval_mode,
    approve_session,
+    clear_session,
    detect_dangerous_command,
+    has_pending,
    is_approved,
    load_permanent,
+    pop_pending,
    prompt_dangerous_approval,
    submit_pending,
 )
@@ -110,21 +113,42 @@ class TestSafeCommand:
        assert desc is None


-def _clear_session(key):
-    """Replace for removed clear_session() — directly clear internal state."""
-    approval_module._session_approved.pop(key, None)
-    approval_module._pending.pop(key, None)
+class TestSubmitAndPopPending:
+    def test_submit_and_pop(self):
+        key = "test_session_pending"
+        clear_session(key)
+
+        submit_pending(key, {"command": "rm -rf /", "pattern_key": "rm"})
+        assert has_pending(key) is True
+
+        approval = pop_pending(key)
+        assert approval["command"] == "rm -rf /"
+        assert has_pending(key) is False
+
+    def test_pop_empty_returns_none(self):
+        key = "test_session_empty"
+        clear_session(key)
+        assert pop_pending(key) is None
+        assert has_pending(key) is False


 class TestApproveAndCheckSession:
    def test_session_approval(self):
        key = "test_session_approve"
-        _clear_session(key)
+        clear_session(key)

        assert is_approved(key, "rm") is False
        approve_session(key, "rm")
        assert is_approved(key, "rm") is True

+    def test_clear_session_removes_approvals(self):
+        key = "test_session_clear"
+        approve_session(key, "rm")
+        assert is_approved(key, "rm") is True
+        clear_session(key)
+        assert is_approved(key, "rm") is False
+        assert has_pending(key) is False
+

 class TestSessionKeyContext:
    def test_context_session_key_overrides_process_env(self):
@@ -155,6 +179,49 @@ class TestSessionKeyContext:
        assert "set_current_session_key" in called_names
        assert "reset_current_session_key" in called_names

+    def test_context_keeps_pending_approval_attached_to_originating_session(self):
+        import os
+        import threading
+
+        clear_session("alice")
+        clear_session("bob")
+        pop_pending("alice")
+        pop_pending("bob")
+        approval_module._permanent_approved.clear()
+
+        alice_ready = threading.Event()
+        bob_ready = threading.Event()
+
+        def worker_alice():
+            token = approval_module.set_current_session_key("alice")
+            try:
+                os.environ["HERMES_EXEC_ASK"] = "1"
+                os.environ["HERMES_SESSION_KEY"] = "alice"
+                alice_ready.set()
+                bob_ready.wait()
+                approval_module.check_all_command_guards("rm -rf /tmp/alice-secret", "local")
+            finally:
+                approval_module.reset_current_session_key(token)
+
+        def worker_bob():
+            alice_ready.wait()
+            token = approval_module.set_current_session_key("bob")
+            try:
+                os.environ["HERMES_SESSION_KEY"] = "bob"
+                bob_ready.set()
+            finally:
+                approval_module.reset_current_session_key(token)
+
+        t1 = threading.Thread(target=worker_alice)
+        t2 = threading.Thread(target=worker_bob)
+        t1.start()
+        t2.start()
+        t1.join()
+        t2.join()
+
+        assert pop_pending("alice") is not None
+        assert pop_pending("bob") is None
+

 class TestRmFalsePositiveFix:
    """Regression tests: filenames starting with 'r' must NOT trigger recursive delete."""
@@ -434,13 +501,13 @@ class TestPatternKeyUniqueness:
        _, key_exec, _ = detect_dangerous_command("find . -exec rm {} \\;")
        _, key_delete, _ = detect_dangerous_command("find . -name '*.tmp' -delete")
        session = "test_find_collision"
-        _clear_session(session)
+        clear_session(session)
        approve_session(session, key_exec)
        assert is_approved(session, key_exec) is True
        assert is_approved(session, key_delete) is False, (
            "approving find -exec rm should not auto-approve find -delete"
        )
-        _clear_session(session)
+        clear_session(session)

    def test_legacy_find_key_still_approves_find_exec(self):
        """Old allowlist entry 'find' should keep approving the matching command."""
@@ -19,6 +19,7 @@ from tools.browser_camofox import (
    camofox_type,
    camofox_vision,
    check_camofox_available,
+    cleanup_all_camofox_sessions,
    is_camofox_mode,
 )

@@ -273,3 +274,22 @@ class TestBrowserToolRouting:
        assert check_browser_requirements() is True


+# ---------------------------------------------------------------------------
+# Cleanup helper
+# ---------------------------------------------------------------------------
+
+
+class TestCamofoxCleanup:
+    @patch("tools.browser_camofox.requests.post")
+    @patch("tools.browser_camofox.requests.delete")
+    def test_cleanup_all(self, mock_delete, mock_post, monkeypatch):
+        monkeypatch.setenv("CAMOFOX_URL", "http://localhost:9377")
+        mock_post.return_value = _mock_response(json_data={"tabId": "tab_c", "url": "https://x.com"})
+        camofox_navigate("https://x.com", task_id="t_cleanup")
+
+        mock_delete.return_value = _mock_response(json_data={"ok": True})
+        cleanup_all_camofox_sessions()
+
+        # Session should be gone
+        result = json.loads(camofox_snapshot(task_id="t_cleanup"))
+        assert result["success"] is False
--- a/Show More
+++ b/Show More
Author	SHA1	Message	Date
Teknium	bf754b4611	fix: clear conversation_history after mid-loop compression to prevent empty sessions After mid-loop compression (triggered by 413, context_overflow, or Anthropic long-context tier errors), _compress_context() creates a new session in SQLite and resets _last_flushed_db_idx=0. However, conversation_history was not cleared, so _flush_messages_to_session_db() computed: flush_from = max(len(conversation_history=200), _last_flushed_db_idx=0) = 200 messages[200:] → empty (compressed messages < 200) This resulted in zero messages being written to the new session's SQLite store. On resume, the user would see 'Session found but has no messages.' The preflight compression path (line 7311) already had the fix: conversation_history = None This commit adds the same clearing to the three mid-loop compression sites: - Anthropic long-context tier overflow - HTTP 413 payload too large - Generic context_overflow error Reported by Aaryan (Nous community).	2026-04-10 00:05:04 -07:00
Teknium	13d7ff3420	fix(gateway): bypass text batching when delay is 0 (#6996 ) The text batching feature routes TEXT messages through asyncio.create_task() + asyncio.sleep(delay). Even with delay=0, the task fires asynchronously and won't complete before synchronous test assertions. This broke 33 tests across Discord, Matrix, and WeCom adapters. When _text_batch_delay_seconds is 0 (the test fixture setting), dispatch directly to handle_message() instead of going through the async batching path. This preserves the pre-batching behavior for tests while keeping batching active in production (default delay 0.6s).	2026-04-09 23:59:20 -07:00
Teknium	d5023d36d8	docs: document streaming timeout auto-detection for local LLMs (#6990 ) Add streaming timeout documentation to three pages: - guides/local-llm-on-mac.md: New 'Timeouts' section with table of all three timeouts, their defaults, local auto-adjustments, and env var overrides - reference/faq.md: Tip box in the local models FAQ section - user-guide/configuration.md: 'Streaming Timeouts' subsection under the agent config section Follow-up to #6967.	2026-04-09 23:28:25 -07:00
Sahil	0602ff8f58	fix(docker): use uv for dependency resolution to fix resolution-too-deep error	2026-04-09 23:25:56 -07:00
Teknium	8104f400f8	test: disable text batching in existing adapter tests Set _text_batch_delay_seconds = 0 on test adapter fixtures so messages dispatch immediately (bypassing async batching). This preserves the existing synchronous assertion patterns while the batching logic is tested separately in test_text_batching.py.	2026-04-09 23:25:27 -07:00
Teknium	1ed00496f2	test: add text batching tests for Discord, Matrix, WeCom, Telegram, Feishu 22 tests covering: - Single message dispatch after delay - Split message aggregation (2-way and 3-way) - Different chats/rooms not merged - Adaptive delay for near-limit chunks - State cleanup after flush - Split continuation merging All 5 platform adapters tested.	2026-04-09 23:25:27 -07:00
Teknium	f92a0b8596	fix(feishu): add adaptive batch delay for split long messages Feishu already had text batching with a static 0.6s delay. This adds adaptive delay: waits 2.0s when a chunk is near the ~4096-char split point since a continuation is almost certain. Tracks _last_chunk_len on each queued event to determine the delay. Configurable via HERMES_FEISHU_TEXT_BATCH_SPLIT_DELAY_SECONDS (default 2.0). Ref #6892	2026-04-09 23:25:27 -07:00
Teknium	1723e8e998	fix(wecom): add text batching to merge split long messages Ports the adaptive batching pattern from the Telegram adapter. WeCom clients split messages around 4000 chars. Adaptive delay waits 2.0s when a chunk is near the limit, 0.6s otherwise. Only text messages are batched; commands/media dispatch immediately. Ref #6892	2026-04-09 23:25:27 -07:00
Teknium	07148cac9a	fix(matrix): add text batching to merge split long messages Ports the adaptive batching pattern from the Telegram adapter. Matrix clients split messages around 4000 chars. Adaptive delay waits 2.0s when a chunk is near the limit, 0.6s otherwise. Only text messages are batched; commands dispatch immediately. Ref #6892	2026-04-09 23:25:27 -07:00
Teknium	0fc0c1c83b	fix(discord): add text batching to merge split long messages Cherry-picked from PR #6894 by SHL0MS with fixes: - Only batch TEXT messages; commands/media dispatch immediately - Use build_session_key() for proper session-scoped batch keys - Consistent naming (_text_batch_delay_seconds) - Proper Dict[str, MessageEvent] typing Discord splits at 2000 chars (lowest of all platforms). Adaptive delay waits 2.0s when a chunk is near the limit, 0.6s otherwise.	2026-04-09 23:25:27 -07:00
Teknium	5075717949	fix(telegram): adaptive batch delay for split long messages Cherry-picked from PR #6891 by SHL0MS. When a chunk is near the 4096-char split point, wait 2.0s instead of 0.6s since a continuation is almost certain.	2026-04-09 23:25:27 -07:00
Teknium	f783986f5a	fix: increase stream read timeout default to 120s, auto-raise for local LLMs (#6967 ) Raise the default httpx stream read timeout from 60s to 120s for all providers. Additionally, auto-detect local LLM endpoints (Ollama, llama.cpp, vLLM) and raise the read timeout to HERMES_API_TIMEOUT (1800s) since local models can take minutes for prefill on large contexts before producing the first token. The stale stream timeout already had this local auto-detection pattern; the httpx read timeout was missing it — causing a hard 60s wall that users couldn't find (HERMES_STREAM_READ_TIMEOUT was undocumented). Changes: - Default HERMES_STREAM_READ_TIMEOUT: 60s -> 120s - Auto-detect local endpoints -> raise to 1800s (user override respected) - Document HERMES_STREAM_READ_TIMEOUT and HERMES_STREAM_STALE_TIMEOUT - Add 10 parametrized tests Reported-by: Pavan Srinivas (@pavanandums)	2026-04-09 22:35:30 -07:00
emozilla	bda9aa17cb	fix(streaming): prevent <think> in prose from suppressing response output When the model mentions <think> as literal text in its response (e.g. "(/think not producing <think> tags)"), the streaming display treated it as a reasoning block opener and suppressed everything after it. The response box would close with truncated content and no error — the API response was complete but the display ate it. Root cause: _stream_delta() matched <think> anywhere in the text stream regardless of position. Real reasoning blocks always start at the beginning of a line; mentions in prose appear mid-sentence. Fix: track line position across streaming deltas with a _stream_last_was_newline flag. Only enter reasoning suppression when the tag appears at a block boundary (start of stream, after a newline, or after only whitespace on the current line). Add a _flush_stream() safety net that recovers buffered content if no closing tag is found by end-of-stream. Also fixes three related issues discovered during investigation: - anthropic_adapter: _get_anthropic_max_output() now normalizes dots to hyphens so 'claude-opus-4.6' matches the 'claude-opus-4-6' table key (was returning 32K instead of 128K) - run_agent: send explicit max_tokens for Claude models on Nous Portal, same as OpenRouter — both proxy to Anthropic's API which requires it. Without it the backend defaults to a low limit that truncates responses. - run_agent: reset truncated_tool_call_retries after successful tool execution so a single truncation doesn't poison the entire conversation.	2026-04-09 22:16:36 -07:00
Teknium	8394b5ddd2	feat: expand /fast to all OpenAI Priority Processing models (#6960 ) Previously /fast only supported gpt-5.4 and forced a provider switch to openai-codex. Now supports all 13 models from OpenAI's Priority Processing pricing table (gpt-5.4, gpt-5.4-mini, gpt-5.2, gpt-5.1, gpt-5, gpt-5-mini, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, gpt-4o, gpt-4o-mini, o3, o4-mini). Key changes: - Replaced _FAST_MODE_BACKEND_CONFIG with _PRIORITY_PROCESSING_MODELS frozenset - Removed provider-forcing logic — service_tier is now injected into whatever API path the user is already on (Codex Responses, Chat Completions, or OpenRouter passthrough) - Added request_overrides support to chat_completions path in run_agent.py - Updated messaging from 'Codex inference tier' to 'Priority Processing' - Expanded test coverage for all supported models	2026-04-09 22:06:30 -07:00
g-guthrie	d416a69288	feat: add Codex fast mode toggle (/fast command) Add /fast slash command to toggle OpenAI Codex service_tier between normal and priority ('fast') inference. Only exposed for models registered in _FAST_MODE_BACKEND_CONFIG (currently gpt-5.4). - Registry-based backend config for extensibility - Dynamic command visibility (hidden from help/autocomplete for non-supported models) via command_filter on SlashCommandCompleter - service_tier flows through request_overrides from route resolution - Omit max_output_tokens for Codex backend (rejects it) - Persists to config.yaml under agent.service_tier Salvage cleanup: removed simple_term_menu/input() menu (banned), bare /fast now shows status like /reasoning. Removed redundant override resolution in _build_api_kwargs — single source of truth via request_overrides from route. Co-authored-by: Hermes Agent <hermes@nousresearch.com>	2026-04-09 21:54:32 -07:00
Teknium	4caa635803	fix: add auth.json write-back for Codex retry and valid-token early-return paths The Codex retry block and valid-token short-circuit in _refresh_entry() both return early, bypassing the auth.json sync at the end of the method. This adds _sync_device_code_entry_to_auth_store() calls on both paths so refreshed/synced tokens are written back to auth.json regardless of which code path succeeds.	2026-04-09 21:48:50 -07:00
Ben Barclay	a64d8a83e1	fix: proactive Codex CLI sync before refresh + retry on failure	2026-04-09 21:48:50 -07:00
Ben Barclay	dfde4058cf	fix: sync refreshed OAuth tokens from pool back to auth.json providers	2026-04-09 21:48:50 -07:00
Ben Barclay	13b3ea6484	fix: skip stale Nous pool entry when agent_key is expired	2026-04-09 21:48:50 -07:00
Teknium	b87d00288d	fix: add actionable hint for OpenRouter 'no tool endpoints' error When OpenRouter returns 'No endpoints found that support tool use' (HTTP 404), display a hint explaining that provider routing restrictions may be filtering out tool-capable providers. Links the user directly to the model's OpenRouter page to check which providers support tools. The hint fires in the error display block that runs regardless of whether fallback succeeds — so the user always understands WHY the model failed, not just that it fell back. Reported via Discord: GLM-5.1 on OpenRouter with US-based provider restrictions eliminated all 4 tool-supporting endpoints (DeepInfra, Z.AI, Friendli, Venice), leaving only 7 non-tool providers.	2026-04-09 18:03:09 -07:00
kshitijk4poor	08e2a1a51e	fix(anthropic): omit tool-streaming beta on MiniMax endpoints MiniMax's Anthropic-compatible endpoints reject requests that include the fine-grained-tool-streaming beta header — every tool-use message triggers a connection error (~18s timeout). Regular chat works fine. Add _common_betas_for_base_url() that filters out the tool-streaming beta for Bearer-auth (MiniMax) endpoints while keeping all other betas. All four client-construction branches now use the filtered list. Based on #6528 by @HiddenPuppy. Original cherry-picked from PR #6688 by kshitijk4poor. Fixes #6510, fixes #6555.	2026-04-09 17:53:52 -07:00
Teknium	9634e20e15	feat: API server model name derived from profile name (#6857 ) * feat: API server model name derived from profile name For multi-user setups (e.g. OpenWebUI), each profile's API server now advertises a distinct model name on /v1/models: - Profile 'lucas' -> model ID 'lucas' - Profile 'admin' -> model ID 'admin' - Default profile -> 'hermes-agent' (unchanged) Explicit override via API_SERVER_MODEL_NAME env var or platforms.api_server.model_name config for custom names. Resolves friction where OpenWebUI couldn't distinguish multiple hermes-agent connections all advertising the same model name. * docs: multi-user setup with profiles for API server + Open WebUI - api-server.md: added Multi-User Setup section, API_SERVER_MODEL_NAME to config table, updated /v1/models description - open-webui.md: added Multi-User Setup with Profiles section with step-by-step guide, updated model name references - environment-variables.md: added API_SERVER_MODEL_NAME entry	2026-04-09 17:07:29 -07:00
AIandI0x1	2d0d05a337	fix(agent): detect truncated streaming tool calls before execution When a streaming response is cut mid-tool-call (connection drop, timeout), the accumulated function.arguments is invalid JSON. The mock response builder defaulted finish_reason to 'stop', so the agent loop treated it as a valid completed turn and tried to execute tools with broken args. Fix: validate tool call arguments with json.loads() during mock response reconstruction. If any are invalid JSON, override finish_reason to 'length'. In the main loop's length handler, if tool calls are present, refuse to execute and return partial=True with a clear error instead of silently failing or wasting retries. Also fixes _thinking_exhausted to not short-circuit when tool calls are present — truncated tool calls are not thinking exhaustion. Original cherry-picked from PR #6776 by AIandI0x1. Closes #6638.	2026-04-09 17:03:54 -07:00
Teknium	3b554bf839	fix: test for suppress_status_output should capture stdout, not mock _vprint The test was mocking _vprint entirely, bypassing the suppress guard. Switch to capturing _print_fn output so the real _vprint runs and the guard suppresses retry noise as intended.	2026-04-09 16:24:53 -07:00
Teknium	69a0092c38	fix: deduplicate _is_termux() into hermes_constants.is_termux() Replace 6 identical copies of the Termux detection function across cli.py, browser_tool.py, voice_mode.py, status.py, doctor.py, and gateway.py with a single shared implementation in hermes_constants.py. Each call site imports with its original local name to preserve all existing callers (internal references and test monkeypatches).	2026-04-09 16:24:53 -07:00
adybag14-cyber	c3141429b7	fix(termux): tighten voice setup and mobile chat UX	2026-04-09 16:24:53 -07:00
adybag14-cyber	769ec1ee1a	fix(termux): deepen browser, voice, and tui support	2026-04-09 16:24:53 -07:00
adybag14-cyber	3237733ca5	fix(termux): harden execute_code and mobile browser/audio UX	2026-04-09 16:24:53 -07:00
adybag14-cyber	54d5138a54	fix(termux): harden env-backed background jobs	2026-04-09 16:24:53 -07:00
adybag14-cyber	6dcb3c4774	fix(termux): compact narrow-screen tui chrome	2026-04-09 16:24:53 -07:00
adybag14-cyber	096b3f9f12	fix(termux): add local image chat route	2026-04-09 16:24:53 -07:00
adybag14-cyber	a3aed1bd26	fix(termux): keep quiet chat output parseable	2026-04-09 16:24:53 -07:00
adybag14-cyber	4970705ed3	fix(termux): silence quiet chat tool previews	2026-04-09 16:24:53 -07:00
adybag14-cyber	2194425918	fix(termux): make setup-hermes use android path	2026-04-09 16:24:53 -07:00
adybag14-cyber	3878495972	fix(termux): disable gateway service flows on android	2026-04-09 16:24:53 -07:00
adybag14-cyber	4e40e93b98	fix(termux): improve status and install UX	2026-04-09 16:24:53 -07:00
adybag14-cyber	122925a6f2	fix(termux): honor temp dirs for local temp artifacts	2026-04-09 16:24:53 -07:00
adybag14-cyber	e79cc88985	feat: add tested Termux install path and EOF-aware gh auth	2026-04-09 16:24:53 -07:00
sprmn24	e053433c84	fix(error_classifier): disambiguate usage-limit patterns in _classify_by_message _classify_by_message had no handling for _USAGE_LIMIT_PATTERNS, so messages like 'usage limit exceeded, try again in 5 minutes' arriving without an HTTP status code fell through to FailoverReason.unknown instead of rate_limit. Apply the same billing/rate-limit disambiguation that _classify_402 already uses: USAGE_LIMIT_PATTERNS + transient signal → rate_limit, USAGE_LIMIT_PATTERNS alone → billing. Add 4 tests covering the no-status-code usage-limit path.	2026-04-09 16:24:13 -07:00