feat(nix): container-aware CLI — auto-route hermes chat into managed container

When container.enable = true in the NixOS module, running 'hermes chat' on the host now automatically execs into the managed container via docker/podman exec. This means the interactive CLI runs in the same environment as the gateway service, with access to all container-installed packages and tools.

Implementation:
- NixOS activation script writes .container-mode metadata file to HERMES_HOME with backend, container_name, and hermes_bin path
- File is removed when container mode is disabled (nixos-rebuild switch)
- hermes_cli/config.py: _is_inside_container() detects Docker/Podman indicators (/.dockerenv, /run/.containerenv, cgroup)
- hermes_cli/config.py: get_container_exec_info() reads .container-mode metadata, returns None when already inside a container
- hermes_cli/main.py: _exec_in_container() validates the container is running, then os.execvp() replaces the process with the container exec
- cmd_chat intercepts before normal flow, checks container info, execs

Safety:
- --host flag bypasses container routing (run on host regardless)
- Falls back to host CLI if: container runtime not found, container not running, inspect fails, or any detection error
- Strips --host from forwarded args (not meaningful inside container)
- Already-inside-container detection prevents infinite exec loops

Closes #7380
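
The routing described above can be sketched roughly as follows. This is an illustrative sketch, not the code from hermes_cli: the function names, the key=value layout of .container-mode, the /proc/1/cgroup path, and the cgroup substrings are all assumptions made for the example. It also omits the "container is running" inspect check and only builds the argv rather than exec'ing.

```python
import shutil
from pathlib import Path
from typing import Dict, List, Optional


def is_inside_container(root: Path = Path("/")) -> bool:
    """Return True if Docker/Podman marker files are found under root."""
    if (root / ".dockerenv").exists() or (root / "run" / ".containerenv").exists():
        return True
    try:
        # Assumed cgroup location and substrings; real detection may differ.
        cgroup = (root / "proc" / "1" / "cgroup").read_text()
    except OSError:
        return False
    return any(tag in cgroup for tag in ("docker", "libpod"))


def read_container_mode(hermes_home: Path) -> Optional[Dict[str, str]]:
    """Parse .container-mode; assumed format is one key=value pair per line."""
    try:
        text = (hermes_home / ".container-mode").read_text()
    except OSError:
        return None  # file absent: container mode is disabled
    info: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            info[key.strip()] = value.strip()
    required = ("backend", "container_name", "hermes_bin")
    return info if all(info.get(k) for k in required) else None


def build_exec_argv(info: Dict[str, str], cli_args: List[str]) -> List[str]:
    """Build the docker/podman exec command, dropping the host-only flag."""
    forwarded = [arg for arg in cli_args if arg != "--host"]
    return [info["backend"], "exec", "-it", info["container_name"],
            info["hermes_bin"], *forwarded]


def route_chat(cli_args: List[str], hermes_home: Path,
               root: Path = Path("/")) -> Optional[List[str]]:
    """Return the argv to exec into the container, or None to stay on the host."""
    if "--host" in cli_args or is_inside_container(root):
        return None  # explicit bypass, or already inside (avoids exec loops)
    info = read_container_mode(hermes_home)
    if info is None or shutil.which(info["backend"]) is None:
        return None  # no metadata or no runtime on PATH: fall back to host CLI
    return build_exec_argv(info, cli_args)
```

A caller would then hand a non-None result to os.execvp(argv[0], argv), which replaces the current process, and continue with the normal host flow when the result is None.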
2026-04-11 06:15:44 +05:30
117 changed files with 3235 additions and 8671 deletions
@@ -6,7 +6,7 @@ ENV PYTHONUNBUFFERED=1
 # Install system dependencies in one layer, clear APT cache
 RUN apt-get update && \
    apt-get install -y --no-install-recommends \
-        build-essential nodejs npm python3 python3-pip ripgrep ffmpeg gcc python3-dev libffi-dev procps && \
+        build-essential nodejs npm python3 python3-pip ripgrep ffmpeg gcc python3-dev libffi-dev && \
    rm -rf /var/lib/apt/lists/*

 COPY . /opt/hermes
@@ -60,8 +60,6 @@ _ANTHROPIC_OUTPUT_LIMITS = {
    "claude-3-opus":       4_096,
    "claude-3-sonnet":     4_096,
    "claude-3-haiku":      4_096,
-    # Third-party Anthropic-compatible providers
-    "minimax":            131_072,
 }

 # For any model not in the table, assume the highest current limit.
@@ -163,27 +161,18 @@ def _get_claude_code_version() -> str:


 def _is_oauth_token(key: str) -> bool:
-    """Check if the key is an Anthropic OAuth/setup token.
+    """Check if the key is an OAuth/setup token (not a regular Console API key).

-    Positively identifies Anthropic OAuth tokens by their key format:
-    - ``sk-ant-`` prefix (but NOT ``sk-ant-api``) → setup tokens, managed keys
-    - ``eyJ`` prefix → JWTs from the Anthropic OAuth flow
-
-    Non-Anthropic keys (MiniMax, Alibaba, etc.) don't match either pattern
-    and correctly return False.
+    Regular API keys start with 'sk-ant-api'. Everything else (setup-tokens
+    starting with 'sk-ant-oat', managed keys, JWTs, etc.) needs Bearer auth.
    """
    if not key:
        return False
-    # Regular Anthropic Console API keys — x-api-key auth, never OAuth
+    # Regular Console API keys use x-api-key header
    if key.startswith("sk-ant-api"):
        return False
-    # Anthropic-issued tokens (setup-tokens sk-ant-oat-*, managed keys)
-    if key.startswith("sk-ant-"):
-        return True
-    # JWTs from Anthropic OAuth flow
-    if key.startswith("eyJ"):
-        return True
-    return False
+    # Everything else (setup-tokens, managed keys, JWTs) uses Bearer auth
+    return True


 def _normalize_base_url_text(base_url) -> str:
@@ -1315,10 +1304,9 @@ def build_anthropic_kwargs(
    # Map reasoning_config to Anthropic's thinking parameter.
    # Claude 4.6 models use adaptive thinking + output_config.effort.
    # Older models use manual thinking with budget_tokens.
-    # MiniMax Anthropic-compat endpoints support thinking (manual mode only,
-    # not adaptive).  Haiku does NOT support extended thinking — skip entirely.
+    # Haiku and MiniMax models do NOT support extended thinking — skip entirely.
    if reasoning_config and isinstance(reasoning_config, dict):
-        if reasoning_config.get("enabled") is not False and "haiku" not in model.lower():
+        if reasoning_config.get("enabled") is not False and "haiku" not in model.lower() and "minimax" not in model.lower():
            effort = str(reasoning_config.get("effort", "medium")).lower()
            budget = THINKING_BUDGET.get(effort, 8000)
            if _supports_adaptive_thinking(model):
@@ -59,9 +59,6 @@ from hermes_constants import OPENROUTER_BASE_URL

 logger = logging.getLogger(__name__)

-# Module-level flag: only warn once per process about stale OPENAI_BASE_URL.
-_stale_base_url_warned = False
-
 _PROVIDER_ALIASES = {
    "google": "gemini",
    "google-gemini": "gemini",
@@ -710,9 +707,7 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
            base_url = _to_openai_base_url(
                _pool_runtime_base_url(entry, pconfig.inference_base_url) or pconfig.inference_base_url
            )
-            model = _API_KEY_PROVIDER_AUX_MODELS.get(provider_id)
-            if model is None:
-                continue  # skip provider if we don't know a valid aux model
+            model = _API_KEY_PROVIDER_AUX_MODELS.get(provider_id, "default")
            logger.debug("Auxiliary text client: %s (%s) via pool", pconfig.name, model)
            extra = {}
            if "api.kimi.com" in base_url.lower():
@@ -731,9 +726,7 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
        base_url = _to_openai_base_url(
            str(creds.get("base_url", "")).strip().rstrip("/") or pconfig.inference_base_url
        )
-        model = _API_KEY_PROVIDER_AUX_MODELS.get(provider_id)
-        if model is None:
-            continue  # skip provider if we don't know a valid aux model
+        model = _API_KEY_PROVIDER_AUX_MODELS.get(provider_id, "default")
        logger.debug("Auxiliary text client: %s (%s)", pconfig.name, model)
        extra = {}
        if "api.kimi.com" in base_url.lower():
@@ -1082,12 +1075,11 @@ def _is_connection_error(exc: Exception) -> bool:
 def _try_payment_fallback(
    failed_provider: str,
    task: str = None,
-    reason: str = "payment error",
 ) -> Tuple[Optional[Any], Optional[str], str]:
-    """Try alternative providers after a payment/credit or connection error.
+    """Try alternative providers after a payment/credit error.

    Iterates the standard auto-detection chain, skipping the provider that
-    failed.
+    returned a payment error.

    Returns:
        (client, model, provider_label) or (None, None, "") if no fallback.
@@ -1113,15 +1105,15 @@ def _try_payment_fallback(
        client, model = try_fn()
        if client is not None:
            logger.info(
-                "Auxiliary %s: %s on %s — falling back to %s (%s)",
-                task or "call", reason, failed_provider, label, model or "default",
+                "Auxiliary %s: payment error on %s — falling back to %s (%s)",
+                task or "call", failed_provider, label, model or "default",
            )
            return client, model, label
        tried.append(label)

    logger.warning(
-        "Auxiliary %s: %s on %s and no fallback available (tried: %s)",
-        task or "call", reason, failed_provider, ", ".join(tried),
+        "Auxiliary %s: payment error on %s and no fallback available (tried: %s)",
+        task or "call", failed_provider, ", ".join(tried),
    )
    return None, None, ""

@@ -1136,28 +1128,9 @@ def _resolve_auto() -> Tuple[Optional[OpenAI], Optional[str]]:
         provider they already have credentials for — no OpenRouter key needed.
      2. OpenRouter → Nous → custom → Codex → API-key providers (original chain).
    """
-    global auxiliary_is_nous, _stale_base_url_warned
+    global auxiliary_is_nous
    auxiliary_is_nous = False  # Reset — _try_nous() will set True if it wins

-    # ── Warn once if OPENAI_BASE_URL is set but config.yaml uses a named
-    #    provider (not 'custom').  This catches the common "env poisoning"
-    #    scenario where a user switches providers via `hermes model` but the
-    #    old OPENAI_BASE_URL lingers in ~/.hermes/.env. ──
-    if not _stale_base_url_warned:
-        _env_base = os.getenv("OPENAI_BASE_URL", "").strip()
-        _cfg_provider = _read_main_provider()
-        if (_env_base and _cfg_provider
-                and _cfg_provider != "custom"
-                and not _cfg_provider.startswith("custom:")):
-            logger.warning(
-                "OPENAI_BASE_URL is set (%s) but model.provider is '%s'. "
-                "Auxiliary clients may route to the wrong endpoint. "
-                "Run: hermes model to reconfigure, or remove "
-                "OPENAI_BASE_URL from ~/.hermes/.env",
-                _env_base, _cfg_provider,
-            )
-            _stale_base_url_warned = True
-
    # ── Step 1: non-aggregator main provider → use main model directly ──
    main_provider = _read_main_provider()
    main_model = _read_main_model()
@@ -1244,7 +1217,6 @@ def resolve_provider_client(
    raw_codex: bool = False,
    explicit_base_url: str = None,
    explicit_api_key: str = None,
-    api_mode: str = None,
 ) -> Tuple[Optional[Any], Optional[str]]:
    """Central router: given a provider name and optional model, return a
    configured client with the correct auth, base URL, and API format.
@@ -1268,10 +1240,6 @@ def resolve_provider_client(
            the main agent loop).
        explicit_base_url: Optional direct OpenAI-compatible endpoint.
        explicit_api_key: Optional API key paired with explicit_base_url.
-        api_mode: API mode override.  One of "chat_completions",
-            "codex_responses", or None (auto-detect).  When set to
-            "codex_responses", the client is wrapped in
-            CodexAuxiliaryClient to route through the Responses API.

    Returns:
        (client, resolved_model) or (None, None) if auth is unavailable.
@@ -1279,40 +1247,6 @@ def resolve_provider_client(
    # Normalise aliases
    provider = _normalize_aux_provider(provider)

-    def _needs_codex_wrap(client_obj, base_url_str: str, model_str: str) -> bool:
-        """Decide if a plain OpenAI client should be wrapped for Responses API.
-
-        Returns True when api_mode is explicitly "codex_responses", or when
-        auto-detection (api.openai.com + codex-family model) suggests it.
-        Already-wrapped clients (CodexAuxiliaryClient) are skipped.
-        """
-        if isinstance(client_obj, CodexAuxiliaryClient):
-            return False
-        if raw_codex:
-            return False
-        if api_mode == "codex_responses":
-            return True
-        # Auto-detect: api.openai.com + codex model name pattern
-        if api_mode and api_mode != "codex_responses":
-            return False  # explicit non-codex mode
-        normalized_base = (base_url_str or "").strip().lower()
-        if "api.openai.com" in normalized_base and "openrouter" not in normalized_base:
-            model_lower = (model_str or "").lower()
-            if "codex" in model_lower:
-                return True
-        return False
-
-    def _wrap_if_needed(client_obj, final_model_str: str, base_url_str: str = ""):
-        """Wrap a plain OpenAI client in CodexAuxiliaryClient if Responses API is needed."""
-        if _needs_codex_wrap(client_obj, base_url_str, final_model_str):
-            logger.debug(
-                "resolve_provider_client: wrapping client in CodexAuxiliaryClient "
-                "(api_mode=%s, model=%s, base_url=%s)",
-                api_mode or "auto-detected", final_model_str,
-                base_url_str[:60] if base_url_str else "")
-            return CodexAuxiliaryClient(client_obj, final_model_str)
-        return client_obj
-
    # ── Auto: try all providers in priority order ────────────────────
    if provider == "auto":
        client, resolved = _resolve_auto()
@@ -1402,7 +1336,6 @@ def resolve_provider_client(
                from hermes_cli.models import copilot_default_headers
                extra["default_headers"] = copilot_default_headers()
            client = OpenAI(api_key=custom_key, base_url=custom_base, **extra)
-            client = _wrap_if_needed(client, final_model, custom_base)
            return (_to_async_client(client, final_model) if async_mode
                    else (client, final_model))
        # Try custom first, then codex, then API-key providers
@@ -1411,8 +1344,6 @@ def resolve_provider_client(
            client, default = try_fn()
            if client is not None:
                final_model = _normalize_resolved_model(model or default, provider)
-                _cbase = str(getattr(client, "base_url", "") or "")
-                client = _wrap_if_needed(client, final_model, _cbase)
                return (_to_async_client(client, final_model) if async_mode
                        else (client, final_model))
        logger.warning("resolve_provider_client: custom/main requested "
@@ -1432,7 +1363,6 @@ def resolve_provider_client(
                    provider,
                )
                client = OpenAI(api_key=custom_key, base_url=custom_base)
-                client = _wrap_if_needed(client, final_model, custom_base)
                logger.debug(
                    "resolve_provider_client: named custom provider %r (%s)",
                    provider, final_model)
@@ -1495,28 +1425,6 @@ def resolve_provider_client(

        client = OpenAI(api_key=api_key, base_url=base_url,
                        **({"default_headers": headers} if headers else {}))
-
-        # Copilot GPT-5+ models (except gpt-5-mini) require the Responses
-        # API — they are not accessible via /chat/completions.  Wrap the
-        # plain client in CodexAuxiliaryClient so call_llm() transparently
-        # routes through responses.stream().
-        if provider == "copilot" and final_model and not raw_codex:
-            try:
-                from hermes_cli.models import _should_use_copilot_responses_api
-                if _should_use_copilot_responses_api(final_model):
-                    logger.debug(
-                        "resolve_provider_client: copilot model %s needs "
-                        "Responses API — wrapping with CodexAuxiliaryClient",
-                        final_model)
-                    client = CodexAuxiliaryClient(client, final_model)
-            except ImportError:
-                pass
-
-        # Honor api_mode for any API-key provider (e.g. direct OpenAI with
-        # codex-family models).  The copilot-specific wrapping above handles
-        # copilot; this covers the general case (#6800).
-        client = _wrap_if_needed(client, final_model, base_url)
-
        logger.debug("resolve_provider_client: %s (%s)", provider, final_model)
        return (_to_async_client(client, final_model) if async_mode
                else (client, final_model))
@@ -1549,13 +1457,12 @@ def get_text_auxiliary_client(task: str = "") -> Tuple[Optional[OpenAI], Optiona
    Callers may override the returned model with a per-task env var
    (e.g. CONTEXT_COMPRESSION_MODEL, AUXILIARY_WEB_EXTRACT_MODEL).
    """
-    provider, model, base_url, api_key, api_mode = _resolve_task_provider_model(task or None)
+    provider, model, base_url, api_key = _resolve_task_provider_model(task or None)
    return resolve_provider_client(
        provider,
        model=model,
        explicit_base_url=base_url,
        explicit_api_key=api_key,
-        api_mode=api_mode,
    )


@@ -1566,14 +1473,13 @@ def get_async_text_auxiliary_client(task: str = ""):
    (AsyncCodexAuxiliaryClient, model) which wraps the Responses API.
    Returns (None, None) when no provider is available.
    """
-    provider, model, base_url, api_key, api_mode = _resolve_task_provider_model(task or None)
+    provider, model, base_url, api_key = _resolve_task_provider_model(task or None)
    return resolve_provider_client(
        provider,
        model=model,
        async_mode=True,
        explicit_base_url=base_url,
        explicit_api_key=api_key,
-        api_mode=api_mode,
    )


@@ -1646,7 +1552,7 @@ def resolve_vision_provider_client(
    backends, so users can intentionally force experimental providers. Auto mode
    stays conservative and only tries vision backends known to work today.
    """
-    requested, resolved_model, resolved_base_url, resolved_api_key, resolved_api_mode = _resolve_task_provider_model(
+    requested, resolved_model, resolved_base_url, resolved_api_key = _resolve_task_provider_model(
        "vision", provider, model, base_url, api_key
    )
    requested = _normalize_vision_provider(requested)
@@ -1862,30 +1768,12 @@ def cleanup_stale_async_clients() -> None:
            del _client_cache[key]


-def _is_openrouter_client(client: Any) -> bool:
-    for obj in (client, getattr(client, "_client", None), getattr(client, "client", None)):
-        if obj and "openrouter" in str(getattr(obj, "base_url", "") or "").lower():
-            return True
-    return False
-
-
-def _compat_model(client: Any, model: Optional[str], cached_default: Optional[str]) -> Optional[str]:
-    """Drop OpenRouter-format model slugs (with '/') for non-OpenRouter clients.
-
-    Mirrors the guard in resolve_provider_client() which is skipped on cache hits.
-    """
-    if model and "/" in model and not _is_openrouter_client(client):
-        return cached_default
-    return model or cached_default
-
-
 def _get_cached_client(
    provider: str,
    model: str = None,
    async_mode: bool = False,
    base_url: str = None,
    api_key: str = None,
-    api_mode: str = None,
 ) -> Tuple[Optional[Any], Optional[str]]:
    """Get or create a cached client for the given provider.

@@ -1909,7 +1797,7 @@ def _get_cached_client(
            loop_id = id(current_loop)
        except RuntimeError:
            pass
-    cache_key = (provider, async_mode, base_url or "", api_key or "", api_mode or "", loop_id)
+    cache_key = (provider, async_mode, base_url or "", api_key or "", loop_id)
    with _client_cache_lock:
        if cache_key in _client_cache:
            cached_client, cached_default, cached_loop = _client_cache[cache_key]
@@ -1921,11 +1809,9 @@ def _get_cached_client(
                    _force_close_async_httpx(cached_client)
                    del _client_cache[cache_key]
                else:
-                    effective = _compat_model(cached_client, model, cached_default)
-                    return cached_client, effective
+                    return cached_client, model or cached_default
            else:
-                effective = _compat_model(cached_client, model, cached_default)
-                return cached_client, effective
+                return cached_client, model or cached_default
    # Build outside the lock
    client, default_model = resolve_provider_client(
        provider,
@@ -1933,7 +1819,6 @@ def _get_cached_client(
        async_mode,
        explicit_base_url=base_url,
        explicit_api_key=api_key,
-        api_mode=api_mode,
    )
    if client is not None:
        # For async clients, remember which loop they were created on so we
@@ -1953,7 +1838,7 @@ def _resolve_task_provider_model(
    model: str = None,
    base_url: str = None,
    api_key: str = None,
-) -> Tuple[str, Optional[str], Optional[str], Optional[str], Optional[str]]:
+) -> Tuple[str, Optional[str], Optional[str], Optional[str]]:
    """Determine provider + model for a call.

    Priority:
@@ -1962,17 +1847,15 @@ def _resolve_task_provider_model(
      3. Config file (auxiliary.{task}.* or compression.*)
      4. "auto" (full auto-detection chain)

-    Returns (provider, model, base_url, api_key, api_mode) where model may
-    be None (use provider default). When base_url is set, provider is forced
-    to "custom" and the task uses that direct endpoint. api_mode is one of
-    "chat_completions", "codex_responses", or None (auto-detect).
+    Returns (provider, model, base_url, api_key) where model may be None
+    (use provider default). When base_url is set, provider is forced to
+    "custom" and the task uses that direct endpoint.
    """
    config = {}
    cfg_provider = None
    cfg_model = None
    cfg_base_url = None
    cfg_api_key = None
-    cfg_api_mode = None

    if task:
        try:
@@ -1989,7 +1872,6 @@ def _resolve_task_provider_model(
        cfg_model = str(task_config.get("model", "")).strip() or None
        cfg_base_url = str(task_config.get("base_url", "")).strip() or None
        cfg_api_key = str(task_config.get("api_key", "")).strip() or None
-        cfg_api_mode = str(task_config.get("api_mode", "")).strip() or None

        # Backwards compat: compression section has its own keys.
        # The auxiliary.compression defaults to provider="auto", so treat
@@ -2003,32 +1885,30 @@ def _resolve_task_provider_model(
                cfg_base_url = cfg_base_url or _sbu.strip() or None

    env_model = _get_auxiliary_env_override(task, "MODEL") if task else None
-    env_api_mode = _get_auxiliary_env_override(task, "API_MODE") if task else None
    resolved_model = model or env_model or cfg_model
-    resolved_api_mode = env_api_mode or cfg_api_mode

    if base_url:
-        return "custom", resolved_model, base_url, api_key, resolved_api_mode
+        return "custom", resolved_model, base_url, api_key
    if provider:
-        return provider, resolved_model, base_url, api_key, resolved_api_mode
+        return provider, resolved_model, base_url, api_key

    if task:
        env_base_url = _get_auxiliary_env_override(task, "BASE_URL")
        env_api_key = _get_auxiliary_env_override(task, "API_KEY")
        if env_base_url:
-            return "custom", resolved_model, env_base_url, env_api_key or cfg_api_key, resolved_api_mode
+            return "custom", resolved_model, env_base_url, env_api_key or cfg_api_key

        env_provider = _get_auxiliary_provider(task)
        if env_provider != "auto":
-            return env_provider, resolved_model, None, None, resolved_api_mode
+            return env_provider, resolved_model, None, None

        if cfg_base_url:
-            return "custom", resolved_model, cfg_base_url, cfg_api_key, resolved_api_mode
+            return "custom", resolved_model, cfg_base_url, cfg_api_key
        if cfg_provider and cfg_provider != "auto":
-            return cfg_provider, resolved_model, None, None, resolved_api_mode
-        return "auto", resolved_model, None, None, resolved_api_mode
+            return cfg_provider, resolved_model, None, None
+        return "auto", resolved_model, None, None

-    return "auto", resolved_model, None, None, resolved_api_mode
+    return "auto", resolved_model, None, None


 _DEFAULT_AUX_TIMEOUT = 30.0
@@ -2100,37 +1980,6 @@ def _build_call_kwargs(
    return kwargs


-def _validate_llm_response(response: Any, task: str = None) -> Any:
-    """Validate that an LLM response has the expected .choices[0].message shape.
-
-    Fails fast with a clear error instead of letting malformed payloads
-    propagate to downstream consumers where they crash with misleading
-    AttributeError (e.g. "'str' object has no attribute 'choices'").
-
-    See #7264.
-    """
-    if response is None:
-        raise RuntimeError(
-            f"Auxiliary {task or 'call'}: LLM returned None response"
-        )
-    # Allow SimpleNamespace responses from adapters (CodexAuxiliaryClient,
-    # AnthropicAuxiliaryClient) — they have .choices[0].message.
-    try:
-        choices = response.choices
-        if not choices or not hasattr(choices[0], "message"):
-            raise AttributeError("missing choices[0].message")
-    except (AttributeError, TypeError, IndexError) as exc:
-        response_type = type(response).__name__
-        response_preview = str(response)[:120]
-        raise RuntimeError(
-            f"Auxiliary {task or 'call'}: LLM returned invalid response "
-            f"(type={response_type}): {response_preview!r}. "
-            f"Expected object with .choices[0].message — check provider "
-            f"adapter or custom endpoint compatibility."
-        ) from exc
-    return response
-
-
 def call_llm(
    task: str = None,
    *,
@@ -2169,7 +2018,7 @@ def call_llm(
    Raises:
        RuntimeError: If no provider is configured.
    """
-    resolved_provider, resolved_model, resolved_base_url, resolved_api_key, resolved_api_mode = _resolve_task_provider_model(
+    resolved_provider, resolved_model, resolved_base_url, resolved_api_key = _resolve_task_provider_model(
        task, provider, model, base_url, api_key)

    if task == "vision":
@@ -2202,7 +2051,6 @@ def call_llm(
            resolved_model,
            base_url=resolved_base_url,
            api_key=resolved_api_key,
-            api_mode=resolved_api_mode,
        )
        if client is None:
            # When the user explicitly chose a non-OpenRouter provider but no
@@ -2246,20 +2094,18 @@ def call_llm(

    # Handle max_tokens vs max_completion_tokens retry, then payment fallback.
    try:
-        return _validate_llm_response(
-            client.chat.completions.create(**kwargs), task)
+        return client.chat.completions.create(**kwargs)
    except Exception as first_err:
        err_str = str(first_err)
        if "max_tokens" in err_str or "unsupported_parameter" in err_str:
            kwargs.pop("max_tokens", None)
            kwargs["max_completion_tokens"] = max_tokens
            try:
-                return _validate_llm_response(
-                    client.chat.completions.create(**kwargs), task)
+                return client.chat.completions.create(**kwargs)
            except Exception as retry_err:
-                # If the max_tokens retry also hits a payment or connection
-                # error, fall through to the fallback chain below.
-                if not (_is_payment_error(retry_err) or _is_connection_error(retry_err)):
+                # If the max_tokens retry also hits a payment error,
+                # fall through to the payment fallback below.
+                if not _is_payment_error(retry_err):
                    raise
                first_err = retry_err

@@ -2276,24 +2122,19 @@ def call_llm(
        # and providers the user never configured that got picked up by
        # the auto-detection chain.
        should_fallback = _is_payment_error(first_err) or _is_connection_error(first_err)
-        # Only try alternative providers when the user didn't explicitly
-        # configure this task's provider.  Explicit provider = hard constraint;
-        # auto (the default) = best-effort fallback chain.  (#7559)
-        is_auto = resolved_provider in ("auto", "", None)
-        if should_fallback and is_auto:
+        if should_fallback:
            reason = "payment error" if _is_payment_error(first_err) else "connection error"
            logger.info("Auxiliary %s: %s on %s (%s), trying fallback",
                        task or "call", reason, resolved_provider, first_err)
            fb_client, fb_model, fb_label = _try_payment_fallback(
-                resolved_provider, task, reason=reason)
+                resolved_provider, task)
            if fb_client is not None:
                fb_kwargs = _build_call_kwargs(
                    fb_label, fb_model, messages,
                    temperature=temperature, max_tokens=max_tokens,
                    tools=tools, timeout=effective_timeout,
                    extra_body=extra_body)
-                return _validate_llm_response(
-                    fb_client.chat.completions.create(**fb_kwargs), task)
+                return fb_client.chat.completions.create(**fb_kwargs)
        raise


@@ -2371,7 +2212,7 @@ async def async_call_llm(

    Same as call_llm() but async. See call_llm() for full documentation.
    """
-    resolved_provider, resolved_model, resolved_base_url, resolved_api_key, resolved_api_mode = _resolve_task_provider_model(
+    resolved_provider, resolved_model, resolved_base_url, resolved_api_key = _resolve_task_provider_model(
        task, provider, model, base_url, api_key)

    if task == "vision":
@@ -2405,7 +2246,6 @@ async def async_call_llm(
            async_mode=True,
            base_url=resolved_base_url,
            api_key=resolved_api_key,
-            api_mode=resolved_api_mode,
        )
        if client is None:
            _explicit = (resolved_provider or "").strip().lower()
@@ -2416,9 +2256,11 @@ async def async_call_llm(
                    f"variable, or switch to a different provider with `hermes model`."
                )
            if not resolved_base_url:
-                logger.info("Auxiliary %s: provider %s unavailable, trying auto-detection chain",
-                            task or "call", resolved_provider)
-                client, final_model = _get_cached_client("auto", async_mode=True)
+                logger.warning("Provider %s unavailable, falling back to openrouter",
+                               resolved_provider)
+                client, final_model = _get_cached_client(
+                    "openrouter", resolved_model or _OPENROUTER_MODEL,
+                    async_mode=True)
        if client is None:
            raise RuntimeError(
                f"No LLM provider configured for task={task} provider={resolved_provider}. "
@@ -2433,42 +2275,11 @@ async def async_call_llm(
        base_url=resolved_base_url)

    try:
-        return _validate_llm_response(
-            await client.chat.completions.create(**kwargs), task)
+        return await client.chat.completions.create(**kwargs)
    except Exception as first_err:
        err_str = str(first_err)
        if "max_tokens" in err_str or "unsupported_parameter" in err_str:
            kwargs.pop("max_tokens", None)
            kwargs["max_completion_tokens"] = max_tokens
-            try:
-                return _validate_llm_response(
-                    await client.chat.completions.create(**kwargs), task)
-            except Exception as retry_err:
-                # If the max_tokens retry also hits a payment or connection
-                # error, fall through to the fallback chain below.
-                if not (_is_payment_error(retry_err) or _is_connection_error(retry_err)):
-                    raise
-                first_err = retry_err
-
-        # ── Payment / connection fallback (mirrors sync call_llm) ─────
-        should_fallback = _is_payment_error(first_err) or _is_connection_error(first_err)
-        is_auto = resolved_provider in ("auto", "", None)
-        if should_fallback and is_auto:
-            reason = "payment error" if _is_payment_error(first_err) else "connection error"
-            logger.info("Auxiliary %s (async): %s on %s (%s), trying fallback",
-                        task or "call", reason, resolved_provider, first_err)
-            fb_client, fb_model, fb_label = _try_payment_fallback(
-                resolved_provider, task, reason=reason)
-            if fb_client is not None:
-                fb_kwargs = _build_call_kwargs(
-                    fb_label, fb_model, messages,
-                    temperature=temperature, max_tokens=max_tokens,
-                    tools=tools, timeout=effective_timeout,
-                    extra_body=extra_body)
-                # Convert sync fallback client to async
-                async_fb, async_fb_model = _to_async_client(fb_client, fb_model or "")
-                if async_fb_model and async_fb_model != fb_kwargs.get("model"):
-                    fb_kwargs["model"] = async_fb_model
-                return _validate_llm_response(
-                    await async_fb.chat.completions.create(**fb_kwargs), task)
+            return await client.chat.completions.create(**kwargs)
        raise
@@ -18,7 +18,6 @@ import time
 from typing import Any, Dict, List, Optional

 from agent.auxiliary_client import call_llm
-from agent.context_engine import ContextEngine
 from agent.model_metadata import (
    get_model_context_length,
    estimate_messages_tokens_rough,
@@ -51,8 +50,8 @@ _CHARS_PER_TOKEN = 4
 _SUMMARY_FAILURE_COOLDOWN_SECONDS = 600


-class ContextCompressor(ContextEngine):
-    """Default context engine — compresses conversation context via lossy summarization.
+class ContextCompressor:
+    """Compresses conversation context when approaching the model's context limit.

    Algorithm:
      1. Prune old tool results (cheap, no LLM call)
@@ -62,33 +61,6 @@ class ContextCompressor(ContextEngine):
      5. On subsequent compactions, iteratively update the previous summary
    """

-    @property
-    def name(self) -> str:
-        return "compressor"
-
-    def on_session_reset(self) -> None:
-        """Reset all per-session state for /new or /reset."""
-        super().on_session_reset()
-        self._context_probed = False
-        self._context_probe_persistable = False
-        self._previous_summary = None
-
-    def update_model(
-        self,
-        model: str,
-        context_length: int,
-        base_url: str = "",
-        api_key: str = "",
-        provider: str = "",
-    ) -> None:
-        """Update model info after a model switch or fallback activation."""
-        self.model = model
-        self.base_url = base_url
-        self.api_key = api_key
-        self.provider = provider
-        self.context_length = context_length
-        self.threshold_tokens = int(context_length * self.threshold_percent)
-
    def __init__(
        self,
        model: str,
@@ -1,184 +0,0 @@
-"""Abstract base class for pluggable context engines.
-
-A context engine controls how conversation context is managed when
-approaching the model's token limit. The built-in ContextCompressor
-is the default implementation. Third-party engines (e.g. LCM) can
-replace it via the plugin system or by being placed in the
-``plugins/context_engine/<name>/`` directory.
-
-Selection is config-driven: ``context.engine`` in config.yaml.
-Default is ``"compressor"`` (the built-in). Only one engine is active.
-
-The engine is responsible for:
-  - Deciding when compaction should fire
-  - Performing compaction (summarization, DAG construction, etc.)
-  - Optionally exposing tools the agent can call (e.g. lcm_grep)
-  - Tracking token usage from API responses
-
-Lifecycle:
-  1. Engine is instantiated and registered (plugin register() or default)
-  2. on_session_start() called when a conversation begins
-  3. update_from_response() called after each API response with usage data
-  4. should_compress() checked after each turn
-  5. compress() called when should_compress() returns True
-  6. on_session_end() called at real session boundaries (CLI exit, /reset,
-     gateway session expiry) — NOT per-turn
-"""
-
-from abc import ABC, abstractmethod
-from typing import Any, Dict, List, Optional
-
-
-class ContextEngine(ABC):
-    """Base class all context engines must implement."""
-
-    # -- Identity ----------------------------------------------------------
-
-    @property
-    @abstractmethod
-    def name(self) -> str:
-        """Short identifier (e.g. 'compressor', 'lcm')."""
-
-    # -- Token state (read by run_agent.py for display/logging) ------------
-    #
-    # Engines MUST maintain these. run_agent.py reads them directly.
-
-    last_prompt_tokens: int = 0
-    last_completion_tokens: int = 0
-    last_total_tokens: int = 0
-    threshold_tokens: int = 0
-    context_length: int = 0
-    compression_count: int = 0
-
-    # -- Compaction parameters (read by run_agent.py for preflight) --------
-    #
-    # These control the preflight compression check.  Subclasses may
-    # override via __init__ or property; defaults are sensible for most
-    # engines.
-
-    threshold_percent: float = 0.75
-    protect_first_n: int = 3
-    protect_last_n: int = 6
-
-    # -- Core interface ----------------------------------------------------
-
-    @abstractmethod
-    def update_from_response(self, usage: Dict[str, Any]) -> None:
-        """Update tracked token usage from an API response.
-
-        Called after every LLM call with the usage dict from the response.
-        """
-
-    @abstractmethod
-    def should_compress(self, prompt_tokens: int = None) -> bool:
-        """Return True if compaction should fire this turn."""
-
-    @abstractmethod
-    def compress(
-        self,
-        messages: List[Dict[str, Any]],
-        current_tokens: int = None,
-    ) -> List[Dict[str, Any]]:
-        """Compact the message list and return the new message list.
-
-        This is the main entry point. The engine receives the full message
-        list and returns a (possibly shorter) list that fits within the
-        context budget. The implementation is free to summarize, build a
-        DAG, or do anything else — as long as the returned list is a valid
-        OpenAI-format message sequence.
-        """
-
-    # -- Optional: pre-flight check ----------------------------------------
-
-    def should_compress_preflight(self, messages: List[Dict[str, Any]]) -> bool:
-        """Quick rough check before the API call (no real token count yet).
-
-        Default returns False (skip pre-flight). Override if your engine
-        can do a cheap estimate.
-        """
-        return False
-
-    # -- Optional: session lifecycle ---------------------------------------
-
-    def on_session_start(self, session_id: str, **kwargs) -> None:
-        """Called when a new conversation session begins.
-
-        Use this to load persisted state (DAG, store) for the session.
-        kwargs may include hermes_home, platform, model, etc.
-        """
-
-    def on_session_end(self, session_id: str, messages: List[Dict[str, Any]]) -> None:
-        """Called at real session boundaries (CLI exit, /reset, gateway expiry).
-
-        Use this to flush state, close DB connections, etc.
-        NOT called per-turn — only when the session truly ends.
-        """
-
-    def on_session_reset(self) -> None:
-        """Called on /new or /reset. Reset per-session state.
-
-        Default resets compression_count and token tracking.
-        """
-        self.last_prompt_tokens = 0
-        self.last_completion_tokens = 0
-        self.last_total_tokens = 0
-        self.compression_count = 0
-
-    # -- Optional: tools ---------------------------------------------------
-
-    def get_tool_schemas(self) -> List[Dict[str, Any]]:
-        """Return tool schemas this engine provides to the agent.
-
-        Default returns empty list (no tools). LCM would return schemas
-        for lcm_grep, lcm_describe, lcm_expand here.
-        """
-        return []
-
-    def handle_tool_call(self, name: str, args: Dict[str, Any], **kwargs) -> str:
-        """Handle a tool call from the agent.
-
-        Only called for tool names returned by get_tool_schemas().
-        Must return a JSON string.
-
-        kwargs may include:
-          messages: the current in-memory message list (for live ingestion)
-        """
-        import json
-        return json.dumps({"error": f"Unknown context engine tool: {name}"})
-
-    # -- Optional: status / display ----------------------------------------
-
-    def get_status(self) -> Dict[str, Any]:
-        """Return status dict for display/logging.
-
-        Default returns the standard fields run_agent.py expects.
-        """
-        return {
-            "last_prompt_tokens": self.last_prompt_tokens,
-            "threshold_tokens": self.threshold_tokens,
-            "context_length": self.context_length,
-            "usage_percent": (
-                min(100, self.last_prompt_tokens / self.context_length * 100)
-                if self.context_length else 0
-            ),
-            "compression_count": self.compression_count,
-        }
-
-    # -- Optional: model switch support ------------------------------------
-
-    def update_model(
-        self,
-        model: str,
-        context_length: int,
-        base_url: str = "",
-        api_key: str = "",
-        provider: str = "",
-    ) -> None:
-        """Called when the user switches models or on fallback activation.
-
-        Default updates context_length and recalculates threshold_tokens
-        from threshold_percent. Override if your engine needs more
-        (e.g. recalculate DAG budgets, switch summary models).
-        """
-        self.context_length = context_length
-        self.threshold_tokens = int(context_length * self.threshold_percent)
@@ -21,73 +21,11 @@ _RESET = "\033[0m"
 logger = logging.getLogger(__name__)

 _ANSI_RESET = "\033[0m"
-
-# Diff colors — resolved lazily from the skin engine so they adapt
-# to light/dark themes.  Falls back to sensible defaults on import
-# failure.  We cache after first resolution for performance.
-_diff_colors_cached: dict[str, str] | None = None
-
-
-def _diff_ansi() -> dict[str, str]:
-    """Return ANSI escapes for diff display, resolved from the active skin."""
-    global _diff_colors_cached
-    if _diff_colors_cached is not None:
-        return _diff_colors_cached
-
-    # Defaults that work on dark terminals
-    dim = "\033[38;2;150;150;150m"
-    file_c = "\033[38;2;180;160;255m"
-    hunk = "\033[38;2;120;120;140m"
-    minus = "\033[38;2;255;255;255;48;2;120;20;20m"
-    plus = "\033[38;2;255;255;255;48;2;20;90;20m"
-
-    try:
-        from hermes_cli.skin_engine import get_active_skin
-        skin = get_active_skin()
-
-        def _hex_fg(key: str, fallback_rgb: tuple[int, int, int]) -> str:
-            h = skin.get_color(key, "")
-            if h and len(h) == 7 and h[0] == "#":
-                r, g, b = int(h[1:3], 16), int(h[3:5], 16), int(h[5:7], 16)
-                return f"\033[38;2;{r};{g};{b}m"
-            r, g, b = fallback_rgb
-            return f"\033[38;2;{r};{g};{b}m"
-
-        dim = _hex_fg("banner_dim", (150, 150, 150))
-        file_c = _hex_fg("session_label", (180, 160, 255))
-        hunk = _hex_fg("session_border", (120, 120, 140))
-        # minus/plus use background colors — derive from ui_error/ui_ok
-        err_h = skin.get_color("ui_error", "#ef5350")
-        ok_h = skin.get_color("ui_ok", "#4caf50")
-        if err_h and len(err_h) == 7:
-            er, eg, eb = int(err_h[1:3], 16), int(err_h[3:5], 16), int(err_h[5:7], 16)
-            # Use a dark tinted version as background
-            minus = f"\033[38;2;255;255;255;48;2;{max(er//2,20)};{max(eg//4,10)};{max(eb//4,10)}m"
-        if ok_h and len(ok_h) == 7:
-            or_, og, ob = int(ok_h[1:3], 16), int(ok_h[3:5], 16), int(ok_h[5:7], 16)
-            plus = f"\033[38;2;255;255;255;48;2;{max(or_//4,10)};{max(og//2,20)};{max(ob//4,10)}m"
-    except Exception:
-        pass
-
-    _diff_colors_cached = {
-        "dim": dim, "file": file_c, "hunk": hunk,
-        "minus": minus, "plus": plus,
-    }
-    return _diff_colors_cached
-
-
-def reset_diff_colors() -> None:
-    """Reset cached diff colors (call after /skin switch)."""
-    global _diff_colors_cached
-    _diff_colors_cached = None
-
-
-# Module-level helpers — each call resolves from the active skin lazily.
-def _diff_dim():   return _diff_ansi()["dim"]
-def _diff_file():  return _diff_ansi()["file"]
-def _diff_hunk():  return _diff_ansi()["hunk"]
-def _diff_minus(): return _diff_ansi()["minus"]
-def _diff_plus():  return _diff_ansi()["plus"]
+_ANSI_DIM = "\033[38;2;150;150;150m"
+_ANSI_FILE = "\033[38;2;180;160;255m"
+_ANSI_HUNK = "\033[38;2;120;120;140m"
+_ANSI_MINUS = "\033[38;2;255;255;255;48;2;120;20;20m"
+_ANSI_PLUS = "\033[38;2;255;255;255;48;2;20;90;20m"
 _MAX_INLINE_DIFF_FILES = 6
 _MAX_INLINE_DIFF_LINES = 80

@@ -465,19 +403,19 @@ def _render_inline_unified_diff(diff: str) -> list[str]:
        if raw_line.startswith("+++ "):
            to_file = raw_line[4:].strip()
            if from_file or to_file:
-                rendered.append(f"{_diff_file()}{from_file or 'a/?'} → {to_file or 'b/?'}{_ANSI_RESET}")
+                rendered.append(f"{_ANSI_FILE}{from_file or 'a/?'} → {to_file or 'b/?'}{_ANSI_RESET}")
            continue
        if raw_line.startswith("@@"):
-            rendered.append(f"{_diff_hunk()}{raw_line}{_ANSI_RESET}")
+            rendered.append(f"{_ANSI_HUNK}{raw_line}{_ANSI_RESET}")
            continue
        if raw_line.startswith("-"):
-            rendered.append(f"{_diff_minus()}{raw_line}{_ANSI_RESET}")
+            rendered.append(f"{_ANSI_MINUS}{raw_line}{_ANSI_RESET}")
            continue
        if raw_line.startswith("+"):
-            rendered.append(f"{_diff_plus()}{raw_line}{_ANSI_RESET}")
+            rendered.append(f"{_ANSI_PLUS}{raw_line}{_ANSI_RESET}")
            continue
        if raw_line.startswith(" "):
-            rendered.append(f"{_diff_dim()}{raw_line}{_ANSI_RESET}")
+            rendered.append(f"{_ANSI_DIM}{raw_line}{_ANSI_RESET}")
            continue
        if raw_line:
            rendered.append(raw_line)
@@ -543,7 +481,7 @@ def _summarize_rendered_diff_sections(
        summary = f"… omitted {omitted_lines} diff line(s)"
        if omitted_files:
            summary += f" across {omitted_files} additional file(s)/section(s)"
-        rendered.append(f"{_diff_hunk()}{summary}{_ANSI_RESET}")
+        rendered.append(f"{_ANSI_HUNK}{summary}{_ANSI_RESET}")

    return rendered

@@ -1,49 +0,0 @@
-"""User-facing summaries for manual compression commands."""
-
-from __future__ import annotations
-
-from typing import Any, Sequence
-
-
-def summarize_manual_compression(
-    before_messages: Sequence[dict[str, Any]],
-    after_messages: Sequence[dict[str, Any]],
-    before_tokens: int,
-    after_tokens: int,
-) -> dict[str, Any]:
-    """Return consistent user-facing feedback for manual compression."""
-    before_count = len(before_messages)
-    after_count = len(after_messages)
-    noop = list(after_messages) == list(before_messages)
-
-    if noop:
-        headline = f"No changes from compression: {before_count} messages"
-        if after_tokens == before_tokens:
-            token_line = (
-                f"Rough transcript estimate: ~{before_tokens:,} tokens (unchanged)"
-            )
-        else:
-            token_line = (
-                f"Rough transcript estimate: ~{before_tokens:,} → "
-                f"~{after_tokens:,} tokens"
-            )
-    else:
-        headline = f"Compressed: {before_count} → {after_count} messages"
-        token_line = (
-            f"Rough transcript estimate: ~{before_tokens:,} → "
-            f"~{after_tokens:,} tokens"
-        )
-
-    note = None
-    if not noop and after_count < before_count and after_tokens > before_tokens:
-        note = (
-            "Note: fewer messages can still raise this rough transcript estimate "
-            "when compression rewrites the transcript into denser summaries."
-        )
-
-    return {
-        "noop": noop,
-        "headline": headline,
-        "token_line": token_line,
-        "note": note,
-    }
@@ -115,9 +115,15 @@ DEFAULT_CONTEXT_LENGTHS = {
    "llama": 131072,
    # Qwen
    "qwen": 131072,
-    # MiniMax — official docs: 204,800 context for all models
-    # https://platform.minimax.io/docs/api-reference/text-anthropic-api
-    "minimax": 204800,
+    # MiniMax (lowercase — lookup lowercases model names at line 973)
+    "minimax-m1-256k": 1000000,
+    "minimax-m1-128k": 1000000,
+    "minimax-m1-80k": 1000000,
+    "minimax-m1-40k": 1000000,
+    "minimax-m1": 1000000,
+    "minimax-m2.5": 1048576,
+    "minimax-m2.7": 1048576,
+    "minimax": 1048576,
    # GLM
    "glm": 202752,
    # xAI Grok — xAI /v1/models does not return context_length metadata,
@@ -145,7 +151,7 @@ DEFAULT_CONTEXT_LENGTHS = {
    "deepseek-ai/DeepSeek-V3.2": 65536,
    "moonshotai/Kimi-K2.5": 262144,
    "moonshotai/Kimi-K2-Thinking": 262144,
-    "MiniMaxAI/MiniMax-M2.5": 204800,
+    "MiniMaxAI/MiniMax-M2.5": 1048576,
    "XiaomiMiMo/MiMo-V2-Flash": 32768,
    "mimo-v2-pro": 1048576,
    "mimo-v2-omni": 1048576,
@@ -168,7 +168,7 @@ def _build_skill_message(
            subdir_path = skill_dir / subdir
            if subdir_path.exists():
                for f in sorted(subdir_path.rglob("*")):
-                    if f.is_file() and not f.is_symlink():
+                    if f.is_file():
                        rel = str(f.relative_to(skill_dir))
                        supporting.append(rel)

@@ -480,12 +480,6 @@ agent:
  # Fires once per run when inactivity reaches this threshold (seconds).
  # Set to 0 to disable the warning.
  # gateway_timeout_warning: 900
-
-  # Graceful drain timeout for gateway stop/restart (seconds).
-  # The gateway stops accepting new work, waits for in-flight agents to
-  # finish, then interrupts anything still running after this timeout.
-  # 0 = no drain, interrupt immediately.
-  # restart_drain_timeout: 60
  
  # Enable verbose logging
  verbose: false
@@ -588,7 +582,7 @@ platform_toolsets:
 #   skills_hub   - skill_hub (search/install/manage from online registries — user-driven only)
 #   moa          - mixture_of_agents  (requires OPENROUTER_API_KEY)
 #   todo         - todo (in-memory task planning, no deps)
-#   tts          - text_to_speech  (Edge TTS free, or ELEVENLABS/OPENAI/MINIMAX/MISTRAL key)
+#   tts          - text_to_speech  (Edge TTS free, or ELEVENLABS/OPENAI/MINIMAX key)
 #   cronjob      - cronjob (create/list/update/pause/resume/run/remove scheduled tasks)
 #   rl           - rl_list_environments, rl_start_training, etc. (requires TINKER_API_KEY)
 #
@@ -617,7 +611,7 @@ platform_toolsets:
 #   todo         - Task planning and tracking for multi-step work
 #   memory       - Persistent memory across sessions (personal notes + user profile)
 #   session_search - Search and recall past conversations (FTS5 + Gemini Flash summarization)
-#   tts          - Text-to-speech (Edge TTS free, ElevenLabs, OpenAI, MiniMax, Mistral)
+#   tts          - Text-to-speech (Edge TTS free, ElevenLabs, OpenAI, MiniMax)
 #   cronjob      - Schedule and manage automated tasks (CLI-only)
 #   rl           - RL training tools (Tinker-Atropos)
 #
@@ -987,60 +987,11 @@ def _prune_orphaned_branches(repo_root: str) -> None:
 # - Dim: #B8860B (muted text)

 # ANSI building blocks for conversation display
-_ACCENT_ANSI_DEFAULT = "\033[1;38;2;255;215;0m"  # True-color #FFD700 bold — fallback
+_GOLD = "\033[1;38;2;255;215;0m"  # True-color #FFD700 bold — matches Rich Panel gold
 _BOLD = "\033[1m"
 _DIM = "\033[2m"
 _RST = "\033[0m"

-
-def _hex_to_ansi_bold(hex_color: str) -> str:
-    """Convert a hex color like '#268bd2' to a bold true-color ANSI escape."""
-    try:
-        r = int(hex_color[1:3], 16)
-        g = int(hex_color[3:5], 16)
-        b = int(hex_color[5:7], 16)
-        return f"\033[1;38;2;{r};{g};{b}m"
-    except (ValueError, IndexError):
-        return _ACCENT_ANSI_DEFAULT
-
-
-class _SkinAwareAnsi:
-    """Lazy ANSI escape that resolves from the skin engine on first use.
-
-    Acts as a string in f-strings and concatenation.  Call ``.reset()`` to
-    force re-resolution after a ``/skin`` switch.
-    """
-
-    def __init__(self, skin_key: str, fallback_hex: str = "#FFD700"):
-        self._skin_key = skin_key
-        self._fallback_hex = fallback_hex
-        self._cached: str | None = None
-
-    def __str__(self) -> str:
-        if self._cached is None:
-            try:
-                from hermes_cli.skin_engine import get_active_skin
-                self._cached = _hex_to_ansi_bold(
-                    get_active_skin().get_color(self._skin_key, self._fallback_hex)
-                )
-            except Exception:
-                self._cached = _hex_to_ansi_bold(self._fallback_hex)
-        return self._cached
-
-    def __add__(self, other: str) -> str:
-        return str(self) + other
-
-    def __radd__(self, other: str) -> str:
-        return other + str(self)
-
-    def reset(self) -> None:
-        """Clear cache so the next access re-reads the skin."""
-        self._cached = None
-
-
-_ACCENT = _SkinAwareAnsi("response_border", "#FFD700")
-
-
 def _accent_hex() -> str:
    """Return the active skin accent color for legacy CLI output lines."""
    try:
@@ -2515,7 +2466,7 @@ class HermesCLI:
                self._stream_text_ansi = ""
            w = shutil.get_terminal_size().columns
            fill = w - 2 - len(label)
-            _cprint(f"\n{_ACCENT}╭─{label}{'─' * max(fill - 1, 0)}╮{_RST}")
+            _cprint(f"\n{_GOLD}╭─{label}{'─' * max(fill - 1, 0)}╮{_RST}")

        self._stream_buf += text

@@ -2546,7 +2497,7 @@ class HermesCLI:
        # Close the response box
        if self._stream_box_opened:
            w = shutil.get_terminal_size().columns
-            _cprint(f"{_ACCENT}╰{'─' * (w - 2)}╯{_RST}")
+            _cprint(f"{_GOLD}╰{'─' * (w - 2)}╯{_RST}")

    def _reset_stream_state(self) -> None:
        """Reset streaming state before each agent invocation."""
@@ -2969,17 +2920,15 @@ class HermesCLI:
            title_part = ""
            if session_meta.get("title"):
                title_part = f' "{session_meta["title"]}"'
-            accent_color = _accent_hex()
            self.console.print(
-                f"[{accent_color}]↻ Resumed session [bold]{self.session_id}[/bold]"
+                f"[#DAA520]↻ Resumed session [bold]{self.session_id}[/bold]"
                f"{title_part} "
                f"({msg_count} user message{'s' if msg_count != 1 else ''}, "
                f"{len(restored)} total messages)[/]"
            )
        else:
-            accent_color = _accent_hex()
            self.console.print(
-                f"[{accent_color}]Session {self.session_id} found but has no "
+                f"[#DAA520]Session {self.session_id} found but has no "
                f"messages. Starting fresh.[/]"
            )
            return False
@@ -3448,26 +3397,18 @@ class HermesCLI:
        else:
            api_indicator = "[red bold]●[/]"

-        # Build status line with proper markup — skin-aware colors
-        try:
-            from hermes_cli.skin_engine import get_active_skin
-            skin = get_active_skin()
-            separator_color = skin.get_color("banner_dim", "#B8860B")
-            accent_color = skin.get_color("ui_accent", "#FFBF00")
-            label_color = skin.get_color("ui_label", "#4dd0e1")
-        except Exception:
-            separator_color, accent_color, label_color = "#B8860B", "#FFBF00", "cyan"
+        # Build status line with proper markup
        toolsets_info = ""
        if self.enabled_toolsets and "all" not in self.enabled_toolsets:
-            toolsets_info = f" [dim {separator_color}]·[/] [{label_color}]toolsets: {', '.join(self.enabled_toolsets)}[/]"
+            toolsets_info = f" [dim #B8860B]·[/] [#CD7F32]toolsets: {', '.join(self.enabled_toolsets)}[/]"

-        provider_info = f" [dim {separator_color}]·[/] [dim]provider: {self.provider}[/]"
+        provider_info = f" [dim #B8860B]·[/] [dim]provider: {self.provider}[/]"
        if self._provider_source:
-            provider_info += f" [dim {separator_color}]·[/] [dim]auth: {self._provider_source}[/]"
+            provider_info += f" [dim #B8860B]·[/] [dim]auth: {self._provider_source}[/]"

        self.console.print(
-            f"  {api_indicator} [{accent_color}]{model_short}[/] "
-            f"[dim {separator_color}]·[/] [bold {label_color}]{tool_count} tools[/]"
+            f"  {api_indicator} [#FFBF00]{model_short}[/] "
+            f"[dim #B8860B]·[/] [bold cyan]{tool_count} tools[/]"
            f"{toolsets_info}{provider_info}"
        )

@@ -3658,7 +3599,7 @@ class HermesCLI:
        # TUI event loop (known pitfall).
        verb = "Disabling" if subcommand == "disable" else "Enabling"
        label = ", ".join(names)
-        _cprint(f"{_ACCENT}{verb} {label}...{_RST}")
+        _cprint(f"{_GOLD}{verb} {label}...{_RST}")

        tools_disable_enable_command(
            Namespace(tools_action=subcommand, names=names, platform="cli"))
@@ -5171,17 +5112,17 @@ class HermesCLI:
                    if full_name == typed_base:
                        # Already an exact token — no expansion possible; fall through
                        _cprint(f"\033[1;31mUnknown command: {cmd_lower}{_RST}")
-                        _cprint(f"{_DIM}{_ACCENT}Type /help for available commands{_RST}")
+                        _cprint(f"{_DIM}{_GOLD}Type /help for available commands{_RST}")
                    else:
                        remainder = cmd_original.strip()[len(typed_base):]
                        full_cmd = full_name + remainder
                        return self.process_command(full_cmd)
                elif len(matches) > 1:
-                    _cprint(f"{_ACCENT}Ambiguous command: {cmd_lower}{_RST}")
+                    _cprint(f"{_GOLD}Ambiguous command: {cmd_lower}{_RST}")
                    _cprint(f"{_DIM}Did you mean: {', '.join(sorted(matches))}?{_RST}")
                else:
                    _cprint(f"\033[1;31mUnknown command: {cmd_lower}{_RST}")
-                    _cprint(f"{_DIM}{_ACCENT}Type /help for available commands{_RST}")
+                    _cprint(f"{_DIM}{_GOLD}Type /help for available commands{_RST}")
        
        return True
    
@@ -5719,7 +5660,6 @@ class HermesCLI:
            return

        set_active_skin(new_skin)
-        _ACCENT.reset()  # Re-resolve ANSI color for the new skin
        if save_config_value("display.skin", new_skin):
            print(f"  Skin set to: {new_skin} (saved)")
        else:
@@ -5788,8 +5728,8 @@ class HermesCLI:
            else:
                level = rc.get("effort", "medium")
            display_state = "on ✓" if self.show_reasoning else "off"
-            _cprint(f"  {_ACCENT}Reasoning effort:  {level}{_RST}")
-            _cprint(f"  {_ACCENT}Reasoning display: {display_state}{_RST}")
+            _cprint(f"  {_GOLD}Reasoning effort:  {level}{_RST}")
+            _cprint(f"  {_GOLD}Reasoning display: {display_state}{_RST}")
            _cprint(f"  {_DIM}Usage: /reasoning <none|minimal|low|medium|high|xhigh|show|hide>{_RST}")
            return

@@ -5801,7 +5741,7 @@ class HermesCLI:
            if self.agent:
                self.agent.reasoning_callback = self._current_reasoning_callback()
            save_config_value("display.show_reasoning", True)
-            _cprint(f"  {_ACCENT}✓ Reasoning display: ON (saved){_RST}")
+            _cprint(f"  {_GOLD}✓ Reasoning display: ON (saved){_RST}")
            _cprint(f"  {_DIM}  Model thinking will be shown during and after each response.{_RST}")
            return
        if arg in ("hide", "off"):
@@ -5809,7 +5749,7 @@ class HermesCLI:
            if self.agent:
                self.agent.reasoning_callback = self._current_reasoning_callback()
            save_config_value("display.show_reasoning", False)
-            _cprint(f"  {_ACCENT}✓ Reasoning display: OFF (saved){_RST}")
+            _cprint(f"  {_GOLD}✓ Reasoning display: OFF (saved){_RST}")
            return

        # Effort level change
@@ -5824,9 +5764,9 @@ class HermesCLI:
        self.agent = None  # Force agent re-init with new reasoning config

        if save_config_value("agent.reasoning_effort", arg):
-            _cprint(f"  {_ACCENT}✓ Reasoning effort set to '{arg}' (saved to config){_RST}")
+            _cprint(f"  {_GOLD}✓ Reasoning effort set to '{arg}' (saved to config){_RST}")
        else:
-            _cprint(f"  {_ACCENT}✓ Reasoning effort set to '{arg}' (session only){_RST}")
+            _cprint(f"  {_GOLD}✓ Reasoning effort set to '{arg}' (session only){_RST}")

    def _handle_fast_command(self, cmd: str):
        """Handle /fast — toggle fast mode (OpenAI Priority Processing / Anthropic Fast Mode)."""
@@ -5846,7 +5786,7 @@ class HermesCLI:
        parts = cmd.strip().split(maxsplit=1)
        if len(parts) < 2 or parts[1].strip().lower() == "status":
            status = "fast" if self.service_tier == "priority" else "normal"
-            _cprint(f"  {_ACCENT}{feature_name}: {status}{_RST}")
+            _cprint(f"  {_GOLD}{feature_name}: {status}{_RST}")
            _cprint(f"  {_DIM}Usage: /fast [normal|fast|status]{_RST}")
            return

@@ -5867,9 +5807,9 @@ class HermesCLI:

        self.agent = None  # Force agent re-init with new service-tier config
        if save_config_value("agent.service_tier", saved_value):
-            _cprint(f"  {_ACCENT}✓ {feature_name} set to {label} (saved to config){_RST}")
+            _cprint(f"  {_GOLD}✓ {feature_name} set to {label} (saved to config){_RST}")
        else:
-            _cprint(f"  {_ACCENT}✓ {feature_name} set to {label} (session only){_RST}")
+            _cprint(f"  {_GOLD}✓ {feature_name} set to {label} (session only){_RST}")

    def _on_reasoning(self, reasoning_text: str):
        """Callback for intermediate reasoning display during tool-call loops."""
@@ -5895,29 +5835,21 @@ class HermesCLI:
        original_count = len(self.conversation_history)
        try:
            from agent.model_metadata import estimate_messages_tokens_rough
-            from agent.manual_compression_feedback import summarize_manual_compression
-            original_history = list(self.conversation_history)
-            approx_tokens = estimate_messages_tokens_rough(original_history)
+            approx_tokens = estimate_messages_tokens_rough(self.conversation_history)
            print(f"🗜️  Compressing {original_count} messages (~{approx_tokens:,} tokens)...")

-            compressed, _ = self.agent._compress_context(
-                original_history,
+            compressed, _new_system = self.agent._compress_context(
+                self.conversation_history,
                self.agent._cached_system_prompt or "",
                approx_tokens=approx_tokens,
            )
            self.conversation_history = compressed
+            new_count = len(self.conversation_history)
            new_tokens = estimate_messages_tokens_rough(self.conversation_history)
-            summary = summarize_manual_compression(
-                original_history,
-                self.conversation_history,
-                approx_tokens,
-                new_tokens,
+            print(
+                f"  ✅ Compressed: {original_count} → {new_count} messages "
+                f"(~{approx_tokens:,} → ~{new_tokens:,} tokens)"
            )
-            icon = "🗜️" if summary["noop"] else "✅"
-            print(f"  {icon} {summary['headline']}")
-            print(f"     {summary['token_line']}")
-            if summary["note"]:
-                print(f"     {summary['note']}")

        except Exception as e:
            print(f"  ❌ Compression failed: {e}")
@@ -6369,7 +6301,7 @@ class HermesCLI:
            _recording_hint = "Termux:API capture | Ctrl+B to stop"
        else:
            _recording_hint = "Ctrl+B to stop"
-        _cprint(f"\n{_ACCENT}● Recording...{_RST} {_DIM}({_recording_hint}){_RST}")
+        _cprint(f"\n{_GOLD}● Recording...{_RST} {_DIM}({_recording_hint}){_RST}")

        # Periodically refresh prompt to update audio level indicator
        def _refresh_level():
@@ -6569,14 +6501,14 @@ class HermesCLI:
        # Environment detection -- warn and block in incompatible environments
        env_check = detect_audio_environment()
        if not env_check["available"]:
-            _cprint(f"\n{_ACCENT}Voice mode unavailable in this environment:{_RST}")
+            _cprint(f"\n{_GOLD}Voice mode unavailable in this environment:{_RST}")
            for warning in env_check["warnings"]:
                _cprint(f"  {_DIM}{warning}{_RST}")
            return

        reqs = check_voice_requirements()
        if not reqs["available"]:
-            _cprint(f"\n{_ACCENT}Voice mode requirements not met:{_RST}")
+            _cprint(f"\n{_GOLD}Voice mode requirements not met:{_RST}")
            for line in reqs["details"].split("\n"):
                _cprint(f"  {_DIM}{line}{_RST}")
            if reqs["missing_packages"]:
@@ -6614,7 +6546,7 @@ class HermesCLI:
        except Exception:
            _ptt_key = "c-b"
        _ptt_display = _ptt_key.replace("c-", "Ctrl+").upper()
-        _cprint(f"\n{_ACCENT}Voice mode enabled{tts_status}{_RST}")
+        _cprint(f"\n{_GOLD}Voice mode enabled{tts_status}{_RST}")
        _cprint(f"  {_DIM}{_ptt_display} to start/stop recording{_RST}")
        _cprint(f"  {_DIM}/voice tts  to toggle speech output{_RST}")
        _cprint(f"  {_DIM}/voice off  to disable voice mode{_RST}")
@@ -6666,7 +6598,7 @@ class HermesCLI:
            if not check_tts_requirements():
                _cprint(f"{_DIM}Warning: No TTS provider available. Install edge-tts or set API keys.{_RST}")

-        _cprint(f"{_ACCENT}Voice TTS {status}.{_RST}")
+        _cprint(f"{_GOLD}Voice TTS {status}.{_RST}")

    def _show_voice_status(self):
        """Show current voice mode status."""
@@ -7151,7 +7083,7 @@ class HermesCLI:
                        w = self.console.width
                        label = " ⚕ Hermes "
                        fill = w - 2 - len(label)
-                        _cprint(f"\n{_ACCENT}╭─{label}{'─' * max(fill - 1, 0)}╮{_RST}")
+                        _cprint(f"\n{_GOLD}╭─{label}{'─' * max(fill - 1, 0)}╮{_RST}")
                    _cprint(sentence.rstrip())

                tts_thread = threading.Thread(
@@ -7367,7 +7299,7 @@ class HermesCLI:
                if use_streaming_tts and _streaming_box_opened and not is_error_response:
                    # Text was already printed sentence-by-sentence; just close the box
                    w = shutil.get_terminal_size().columns
-                    _cprint(f"\n{_ACCENT}╰{'─' * (w - 2)}╯{_RST}")
+                    _cprint(f"\n{_GOLD}╰{'─' * (w - 2)}╯{_RST}")
                elif already_streamed:
                    # Response was already streamed token-by-token with box framing;
                    # _flush_stream() already closed the box. Skip Rich Panel.
@@ -442,14 +442,6 @@ def _run_job_script(script_path: str) -> tuple[bool, str]:
        stdout = (result.stdout or "").strip()
        stderr = (result.stderr or "").strip()

-        # Redact secrets from both stdout and stderr before any return path.
-        try:
-            from agent.redact import redact_sensitive_text
-            stdout = redact_sensitive_text(stdout)
-            stderr = redact_sensitive_text(stderr)
-        except Exception:
-            pass
-
        if result.returncode != 0:
            parts = [f"Script exited with code {result.returncode}"]
            if stderr:
@@ -458,6 +450,13 @@ def _run_job_script(script_path: str) -> tuple[bool, str]:
                parts.append(f"stdout:\n{stdout}")
            return False, "\n".join(parts)

+        # Redact any secrets that may appear in script output before
+        # they are injected into the LLM prompt context.
+        try:
+            from agent.redact import redact_sensitive_text
+            stdout = redact_sensitive_text(stdout)
+        except Exception:
+            pass
        return True, stdout

    except subprocess.TimeoutExpired:
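Note on the hunk above: redaction now runs only after the non-zero-exit branch has returned, so the failure message carries stdout and stderr unredacted. The sketch below mirrors that control flow. It is a minimal illustration, not the project's code: `redact` is a hypothetical regex stand-in for `agent.redact.redact_sensitive_text` (whose patterns are not shown in this diff), `summarize` is a hypothetical name, and the `stderr:` label is assumed because that line is elided between hunks.

```python
import re

# Hypothetical stand-in for agent.redact.redact_sensitive_text; the real
# redaction patterns are not part of this diff.
_SECRET_RE = re.compile(r"(?i)\b(api[_-]?key|token|password)=\S+")


def redact(text: str) -> str:
    """Replace the value of key=value style secrets with a placeholder."""
    return _SECRET_RE.sub(lambda m: f"{m.group(1)}=[REDACTED]", text)


def summarize(returncode: int, stdout: str, stderr: str) -> tuple[bool, str]:
    """Mirror the post-change flow: only the success path is redacted."""
    if returncode != 0:
        parts = [f"Script exited with code {returncode}"]
        if stderr:
            parts.append(f"stderr:\n{stderr}")  # assumed label, elided in the diff
        if stdout:
            parts.append(f"stdout:\n{stdout}")
        return False, "\n".join(parts)  # returned before any redaction
    return True, redact(stdout)
```

If failure output can also reach the LLM prompt context, calling the redaction step before the `returncode` check (as the removed lines did) would cover both paths.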
@@ -49,8 +49,6 @@ class HermesToolCallParser(ToolCallParser):
                    continue

                tc_data = json.loads(raw_json)
-                if "name" not in tc_data:
-                    continue
                tool_calls.append(
                    ChatCompletionMessageToolCall(
                        id=f"call_{uuid.uuid4().hex[:8]}",
@@ -89,8 +89,6 @@ class MistralToolCallParser(ToolCallParser):
                        parsed = [parsed]

                    for tc in parsed:
-                        if "name" not in tc:
-                            continue
                        args = tc.get("arguments", {})
                        if isinstance(args, dict):
                            args = json.dumps(args, ensure_ascii=False)
@@ -644,35 +644,15 @@ class APIServerAdapter(BasePlatformAdapter):
                    _stream_q.put(delta)

            def _on_tool_progress(event_type, name, preview, args, **kwargs):
-                """Send tool progress as a separate SSE event.
-
-                Previously, progress markers like ``⏰ list`` were injected
-                directly into ``delta.content``.  OpenAI-compatible frontends
-                (Open WebUI, LobeChat, …) store ``delta.content`` verbatim as
-                the assistant message and send it back on subsequent requests.
-                After enough turns the model learns to *emit* the markers as
-                plain text instead of issuing real tool calls — silently
-                hallucinating tool results.  See #6972.
-
-                The fix: push a tagged tuple ``("__tool_progress__", payload)``
-                onto the stream queue.  The SSE writer emits it as a custom
-                ``event: hermes.tool.progress`` line that compliant frontends
-                can render for UX but will *not* persist into conversation
-                history.  Clients that don't understand the custom event type
-                silently ignore it per the SSE specification.
-                """
+                """Inject tool progress into the SSE stream for Open WebUI."""
                if event_type != "tool.started":
-                    return
+                    return  # Only show tool start events in chat stream
                if name.startswith("_"):
-                    return
+                    return  # Skip internal events (_thinking)
                from agent.display import get_tool_emoji
                emoji = get_tool_emoji(name)
                label = preview or name
-                _stream_q.put(("__tool_progress__", {
-                    "tool": name,
-                    "emoji": emoji,
-                    "label": label,
-                }))
+                _stream_q.put(f"\n`{emoji} {label}`\n")

            # Start agent in background.  agent_ref is a mutable container
            # so the SSE writer can interrupt the agent on client disconnect.
@@ -783,29 +763,6 @@ class APIServerAdapter(BasePlatformAdapter):
            }
            await response.write(f"data: {json.dumps(role_chunk)}\n\n".encode())

-            # Helper — route a queue item to the correct SSE event.
-            async def _emit(item):
-                """Write a single queue item to the SSE stream.
-
-                Plain strings are sent as normal ``delta.content`` chunks.
-                Tagged tuples ``("__tool_progress__", payload)`` are sent
-                as a custom ``event: hermes.tool.progress`` SSE event so
-                frontends can display them without storing the markers in
-                conversation history.  See #6972.
-                """
-                if isinstance(item, tuple) and len(item) == 2 and item[0] == "__tool_progress__":
-                    event_data = json.dumps(item[1])
-                    await response.write(
-                        f"event: hermes.tool.progress\ndata: {event_data}\n\n".encode()
-                    )
-                else:
-                    content_chunk = {
-                        "id": completion_id, "object": "chat.completion.chunk",
-                        "created": created, "model": model,
-                        "choices": [{"index": 0, "delta": {"content": item}, "finish_reason": None}],
-                    }
-                    await response.write(f"data: {json.dumps(content_chunk)}\n\n".encode())
-
            # Stream content chunks as they arrive from the agent
            loop = asyncio.get_event_loop()
            while True:
@@ -819,7 +776,12 @@ class APIServerAdapter(BasePlatformAdapter):
                                delta = stream_q.get_nowait()
                                if delta is None:
                                    break
-                                await _emit(delta)
+                                content_chunk = {
+                                    "id": completion_id, "object": "chat.completion.chunk",
+                                    "created": created, "model": model,
+                                    "choices": [{"index": 0, "delta": {"content": delta}, "finish_reason": None}],
+                                }
+                                await response.write(f"data: {json.dumps(content_chunk)}\n\n".encode())
                            except _q.Empty:
                                break
                        break
@@ -828,7 +790,12 @@ class APIServerAdapter(BasePlatformAdapter):
                if delta is None:  # End of stream sentinel
                    break

-                await _emit(delta)
+                content_chunk = {
+                    "id": completion_id, "object": "chat.completion.chunk",
+                    "created": created, "model": model,
+                    "choices": [{"index": 0, "delta": {"content": delta}, "finish_reason": None}],
+                }
+                await response.write(f"data: {json.dumps(content_chunk)}\n\n".encode())

            # Get usage from completed agent
            usage = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
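Note on the hunks above: with `_emit` removed, the same `chat.completion.chunk` payload is built inline at both write sites, and tool progress travels as ordinary `delta.content` text. The framing can be sketched as a small helper. This is an illustration only: `sse_content_chunk` is a hypothetical name that does not exist in the codebase, and the field layout is copied from the added lines.

```python
import json


def sse_content_chunk(completion_id: str, created: int, model: str, delta: str) -> bytes:
    """Frame one text delta as an OpenAI-style SSE data event."""
    chunk = {
        "id": completion_id, "object": "chat.completion.chunk",
        "created": created, "model": model,
        "choices": [{"index": 0, "delta": {"content": delta}, "finish_reason": None}],
    }
    # An SSE event is one or more "field: value" lines ended by a blank line.
    return f"data: {json.dumps(chunk)}\n\n".encode()


# Tool progress is now just another content delta, formatted as inline code.
progress = sse_content_chunk("chatcmpl-demo", 0, "demo-model", "\n`🔧 list`\n")
```

Because the progress marker is part of `delta.content`, a frontend that stores assistant content verbatim will also store the marker. That is the trade-off the removed docstring described.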
@@ -673,32 +673,6 @@ class SendResult:
    retryable: bool = False  # True for transient connection errors — base will retry automatically


-def merge_pending_message_event(
-    pending_messages: Dict[str, MessageEvent],
-    session_key: str,
-    event: MessageEvent,
-) -> None:
-    """Store or merge a pending event for a session.
-
-    Photo bursts/albums often arrive as multiple near-simultaneous PHOTO
-    events. Merge those into the existing queued event so the next turn sees
-    the whole burst, while non-photo follow-ups still replace the pending
-    event normally.
-    """
-    existing = pending_messages.get(session_key)
-    if (
-        existing
-        and getattr(existing, "message_type", None) == MessageType.PHOTO
-        and event.message_type == MessageType.PHOTO
-    ):
-        existing.media_urls.extend(event.media_urls)
-        existing.media_types.extend(event.media_types)
-        if event.text:
-            existing.text = BasePlatformAdapter._merge_caption(existing.text, event.text)
-        return
-    pending_messages[session_key] = event
-
-
 # Error substrings that indicate a transient *connection* failure worth retrying.
 # "timeout" / "timed out" / "readtimeout" / "writetimeout" are intentionally
 # excluded: a read/write timeout on a non-idempotent call (e.g. send_message)
@@ -753,7 +727,6 @@ class BasePlatformAdapter(ABC):
        # working on a task after --replace or manual restarts.
        self._background_tasks: set[asyncio.Task] = set()
        self._expected_cancelled_tasks: set[asyncio.Task] = set()
-        self._busy_session_handler: Optional[Callable[[MessageEvent, str], Awaitable[bool]]] = None
        # Chats where auto-TTS on voice input is disabled (set by /voice off)
        self._auto_tts_disabled_chats: set = set()
        # Chats where typing indicator is paused (e.g. during approval waits).
@@ -842,10 +815,6 @@ class BasePlatformAdapter(ABC):
        an optional response string.
        """
        self._message_handler = handler
-
-    def set_busy_session_handler(self, handler: Optional[Callable[[MessageEvent, str], Awaitable[bool]]]) -> None:
-        """Set an optional handler for messages arriving during active sessions."""
-        self._busy_session_handler = handler
    
    def set_session_store(self, session_store: Any) -> None:
        """
@@ -1427,7 +1396,7 @@ class BasePlatformAdapter(ABC):
            # session lifecycle and its cleanup races with the running task
            # (see PR #4926).
            cmd = event.get_command()
-            if cmd in ("approve", "deny", "status", "stop", "new", "reset", "background", "restart"):
+            if cmd in ("approve", "deny", "status", "stop", "new", "reset", "background"):
                logger.debug(
                    "[%s] Command '/%s' bypassing active-session guard for %s",
                    self.name, cmd, session_key,
@@ -1446,19 +1415,19 @@ class BasePlatformAdapter(ABC):
                    logger.error("[%s] Command '/%s' dispatch failed: %s", self.name, cmd, e, exc_info=True)
                return

-            if self._busy_session_handler is not None:
-                try:
-                    if await self._busy_session_handler(event, session_key):
-                        return
-                except Exception as e:
-                    logger.error("[%s] Busy-session handler failed: %s", self.name, e, exc_info=True)
-
            # Special case: photo bursts/albums frequently arrive as multiple near-
            # simultaneous messages. Queue them without interrupting the active run,
            # then process them immediately after the current task finishes.
            if event.message_type == MessageType.PHOTO:
                logger.debug("[%s] Queuing photo follow-up for session %s without interrupt", self.name, session_key)
-                merge_pending_message_event(self._pending_messages, session_key, event)
+                existing = self._pending_messages.get(session_key)
+                if existing and existing.message_type == MessageType.PHOTO:
+                    existing.media_urls.extend(event.media_urls)
+                    existing.media_types.extend(event.media_types)
+                    if event.text:
+                        existing.text = self._merge_caption(existing.text, event.text)
+                else:
+                    self._pending_messages[session_key] = event
                return  # Don't interrupt now - will run after current task completes

            # Default behavior for non-photo follow-ups: interrupt the running agent
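Note on the hunk above: the photo-burst merge that used to live in `merge_pending_message_event` is now inlined in the adapter. Its behavior can be sketched in isolation as follows. The `Event` dataclass is a hypothetical stand-in for `MessageEvent`, and `merge_caption` only assumes that captions are joined with a newline, since the real `_merge_caption` is not shown in this diff.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Event:
    """Hypothetical stand-in for MessageEvent with only the fields used here."""
    message_type: str
    media_urls: List[str] = field(default_factory=list)
    media_types: List[str] = field(default_factory=list)
    text: Optional[str] = None


def merge_caption(existing: Optional[str], new: str) -> str:
    """Assumed behavior: append the new caption on its own line."""
    return f"{existing}\n{new}" if existing else new


def queue_photo(pending: Dict[str, Event], session_key: str, event: Event) -> None:
    """Merge a photo into an already-queued photo event, else queue it."""
    existing = pending.get(session_key)
    if existing and existing.message_type == "photo":
        existing.media_urls.extend(event.media_urls)
        existing.media_types.extend(event.media_types)
        if event.text:
            existing.text = merge_caption(existing.text, event.text)
    else:
        pending[session_key] = event
```

Two photos arriving for the same session end up as one queued event carrying both URLs, so the next turn sees the whole burst.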
@@ -1,20 +0,0 @@
-"""Shared gateway restart constants and parsing helpers."""
-
-from hermes_cli.config import DEFAULT_CONFIG
-
-# EX_TEMPFAIL from sysexits.h — used to ask the service manager to restart
-# the gateway after a graceful drain/reload path completes.
-GATEWAY_SERVICE_RESTART_EXIT_CODE = 75
-
-DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT = float(
-    DEFAULT_CONFIG["agent"]["restart_drain_timeout"]
-)
-
-
-def parse_restart_drain_timeout(raw: object) -> float:
-    """Parse a configured drain timeout, falling back to the shared default."""
-    try:
-        value = float(raw) if str(raw or "").strip() else DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
-    except (TypeError, ValueError):
-        return DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
-    return max(0.0, value)
@@ -186,12 +186,6 @@ if _config_path.exists():
                os.environ["HERMES_AGENT_TIMEOUT"] = str(_agent_cfg["gateway_timeout"])
            if "gateway_timeout_warning" in _agent_cfg and "HERMES_AGENT_TIMEOUT_WARNING" not in os.environ:
                os.environ["HERMES_AGENT_TIMEOUT_WARNING"] = str(_agent_cfg["gateway_timeout_warning"])
-            if "restart_drain_timeout" in _agent_cfg and "HERMES_RESTART_DRAIN_TIMEOUT" not in os.environ:
-                os.environ["HERMES_RESTART_DRAIN_TIMEOUT"] = str(_agent_cfg["restart_drain_timeout"])
-        _display_cfg = _cfg.get("display", {})
-        if _display_cfg and isinstance(_display_cfg, dict):
-            if "busy_input_mode" in _display_cfg and "HERMES_GATEWAY_BUSY_INPUT_MODE" not in os.environ:
-                os.environ["HERMES_GATEWAY_BUSY_INPUT_MODE"] = str(_display_cfg["busy_input_mode"])
        # Timezone: bridge config.yaml → HERMES_TIMEZONE env var.
        # HERMES_TIMEZONE from .env takes precedence (already in os.environ).
        _tz_cfg = _cfg.get("timezone", "")
@@ -241,17 +235,7 @@ from gateway.session import (
    build_session_key,
 )
 from gateway.delivery import DeliveryRouter
-from gateway.platforms.base import (
-    BasePlatformAdapter,
-    MessageEvent,
-    MessageType,
-    merge_pending_message_event,
-)
-from gateway.restart import (
-    DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT,
-    GATEWAY_SERVICE_RESTART_EXIT_CODE,
-    parse_restart_drain_timeout,
-)
+from gateway.platforms.base import BasePlatformAdapter, MessageEvent, MessageType


 def _normalize_whatsapp_identifier(value: str) -> str:
@@ -487,16 +471,6 @@ class GatewayRunner:
    # Class-level defaults so partial construction in tests doesn't
    # blow up on attribute access.
    _running_agents_ts: Dict[str, float] = {}
-    _busy_input_mode: str = "interrupt"
-    _restart_drain_timeout: float = DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
-    _exit_code: Optional[int] = None
-    _draining: bool = False
-    _restart_requested: bool = False
-    _restart_task_started: bool = False
-    _restart_detached: bool = False
-    _restart_via_service: bool = False
-    _stop_task: Optional[asyncio.Task] = None
-    _session_model_overrides: Dict[str, Dict[str, str]] = {}
    
    def __init__(self, config: Optional[GatewayConfig] = None):
        self.config = config or load_gateway_config()
@@ -509,8 +483,6 @@ class GatewayRunner:
        self._reasoning_config = self._load_reasoning_config()
        self._service_tier = self._load_service_tier()
        self._show_reasoning = self._load_show_reasoning()
-        self._busy_input_mode = self._load_busy_input_mode()
-        self._restart_drain_timeout = self._load_restart_drain_timeout()
        self._provider_routing = self._load_provider_routing()
        self._fallback_model = self._load_fallback_model()
        self._smart_model_routing = self._load_smart_model_routing()
@@ -527,13 +499,6 @@ class GatewayRunner:
        self._exit_cleanly = False
        self._exit_with_failure = False
        self._exit_reason: Optional[str] = None
-        self._exit_code: Optional[int] = None
-        self._draining = False
-        self._restart_requested = False
-        self._restart_task_started = False
-        self._restart_detached = False
-        self._restart_via_service = False
-        self._stop_task: Optional[asyncio.Task] = None
        
        # Track running agents per session for interrupt support
        # Key: session_key, Value: AIAgent instance
@@ -794,10 +759,6 @@ class GatewayRunner:
    def exit_reason(self) -> Optional[str]:
        return self._exit_reason

-    @property
-    def exit_code(self) -> Optional[int]:
-        return self._exit_code
-
    def _session_key_for_source(self, source: SessionSource) -> str:
        """Resolve the current session key for a source, honoring gateway config when available."""
        if hasattr(self, "session_store") and self.session_store is not None:
@@ -907,30 +868,6 @@ class GatewayRunner:
        self._exit_cleanly = True
        self._exit_reason = reason
        self._shutdown_event.set()
-
-    def _running_agent_count(self) -> int:
-        return len(self._running_agents)
-
-    def _status_action_label(self) -> str:
-        return "restart" if self._restart_requested else "shutdown"
-
-    def _status_action_gerund(self) -> str:
-        return "restarting" if self._restart_requested else "shutting down"
-
-    def _queue_during_drain_enabled(self) -> bool:
-        return self._restart_requested and self._busy_input_mode == "queue"
-
-    def _update_runtime_status(self, gateway_state: Optional[str] = None, exit_reason: Optional[str] = None) -> None:
-        try:
-            from gateway.status import write_runtime_status
-            write_runtime_status(
-                gateway_state=gateway_state,
-                exit_reason=exit_reason,
-                restart_requested=self._restart_requested,
-                active_agents=self._running_agent_count(),
-            )
-        except Exception:
-            pass
    
    @staticmethod
    def _load_prefill_messages() -> List[Dict[str, Any]]:
@@ -1057,48 +994,6 @@ class GatewayRunner:
            pass
        return False

-    @staticmethod
-    def _load_busy_input_mode() -> str:
-        """Load gateway drain-time busy-input behavior from config/env."""
-        mode = os.getenv("HERMES_GATEWAY_BUSY_INPUT_MODE", "").strip().lower()
-        if not mode:
-            try:
-                import yaml as _y
-                cfg_path = _hermes_home / "config.yaml"
-                if cfg_path.exists():
-                    with open(cfg_path, encoding="utf-8") as _f:
-                        cfg = _y.safe_load(_f) or {}
-                    mode = str(cfg.get("display", {}).get("busy_input_mode", "") or "").strip().lower()
-            except Exception:
-                pass
-        return "queue" if mode == "queue" else "interrupt"
-
-    @staticmethod
-    def _load_restart_drain_timeout() -> float:
-        """Load graceful gateway restart/stop drain timeout in seconds."""
-        raw = os.getenv("HERMES_RESTART_DRAIN_TIMEOUT", "").strip()
-        if not raw:
-            try:
-                import yaml as _y
-                cfg_path = _hermes_home / "config.yaml"
-                if cfg_path.exists():
-                    with open(cfg_path, encoding="utf-8") as _f:
-                        cfg = _y.safe_load(_f) or {}
-                    raw = str(cfg.get("agent", {}).get("restart_drain_timeout", "") or "").strip()
-            except Exception:
-                pass
-        value = parse_restart_drain_timeout(raw)
-        if raw and value == DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT:
-            try:
-                float(raw)
-            except (TypeError, ValueError):
-                logger.warning(
-                    "Invalid restart_drain_timeout '%s', using default %.0fs",
-                    raw,
-                    DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT,
-                )
-        return value
-
    @staticmethod
    def _load_background_notifications_mode() -> str:
        """Load background process notification mode from config or env var.
@@ -1183,155 +1078,6 @@ class GatewayRunner:
            pass
        return {}

-    def _snapshot_running_agents(self) -> Dict[str, Any]:
-        return {
-            session_key: agent
-            for session_key, agent in self._running_agents.items()
-            if agent is not _AGENT_PENDING_SENTINEL
-        }
-
-    def _queue_or_replace_pending_event(self, session_key: str, event: MessageEvent) -> None:
-        adapter = self.adapters.get(event.source.platform)
-        if not adapter:
-            return
-        merge_pending_message_event(adapter._pending_messages, session_key, event)
-
-    async def _handle_active_session_busy_message(self, event: MessageEvent, session_key: str) -> bool:
-        if not self._draining:
-            return False
-
-        adapter = self.adapters.get(event.source.platform)
-        if not adapter:
-            return True
-
-        thread_meta = {"thread_id": event.source.thread_id} if event.source.thread_id else None
-        if self._queue_during_drain_enabled():
-            self._queue_or_replace_pending_event(session_key, event)
-            message = f"⏳ Gateway {self._status_action_gerund()} — queued for the next turn after it comes back."
-        else:
-            message = f"⏳ Gateway is {self._status_action_gerund()} and is not accepting another turn right now."
-
-        await adapter._send_with_retry(
-            chat_id=event.source.chat_id,
-            content=message,
-            reply_to=event.message_id,
-            metadata=thread_meta,
-        )
-        return True
-
-    async def _drain_active_agents(self, timeout: float) -> tuple[Dict[str, Any], bool]:
-        snapshot = self._snapshot_running_agents()
-        last_active_count = self._running_agent_count()
-        last_status_at = 0.0
-
-        def _maybe_update_status(force: bool = False) -> None:
-            nonlocal last_active_count, last_status_at
-            now = asyncio.get_running_loop().time()
-            active_count = self._running_agent_count()
-            if force or active_count != last_active_count or (now - last_status_at) >= 1.0:
-                self._update_runtime_status("draining")
-                last_active_count = active_count
-                last_status_at = now
-
-        if not self._running_agents:
-            _maybe_update_status(force=True)
-            return snapshot, False
-
-        _maybe_update_status(force=True)
-        if timeout <= 0:
-            return snapshot, True
-
-        deadline = asyncio.get_running_loop().time() + timeout
-        while self._running_agents and asyncio.get_running_loop().time() < deadline:
-            _maybe_update_status()
-            await asyncio.sleep(0.1)
-        timed_out = bool(self._running_agents)
-        _maybe_update_status(force=True)
-        return snapshot, timed_out
-
-    def _interrupt_running_agents(self, reason: str) -> None:
-        for session_key, agent in list(self._running_agents.items()):
-            if agent is _AGENT_PENDING_SENTINEL:
-                continue
-            try:
-                agent.interrupt(reason)
-                logger.debug("Interrupted running agent for session %s during shutdown", session_key[:20])
-            except Exception as e:
-                logger.debug("Failed interrupting agent during shutdown: %s", e)
-
-    def _finalize_shutdown_agents(self, active_agents: Dict[str, Any]) -> None:
-        for agent in active_agents.values():
-            try:
-                from hermes_cli.plugins import invoke_hook as _invoke_hook
-                _invoke_hook(
-                    "on_session_finalize",
-                    session_id=getattr(agent, "session_id", None),
-                    platform="gateway",
-                )
-            except Exception:
-                pass
-            try:
-                if hasattr(agent, "shutdown_memory_provider"):
-                    agent.shutdown_memory_provider()
-            except Exception:
-                pass
-            # Close tool resources (terminal sandboxes, browser daemons,
-            # background processes, httpx clients) to prevent zombie
-            # process accumulation.
-            try:
-                if hasattr(agent, 'close'):
-                    agent.close()
-            except Exception:
-                pass
-
-    async def _launch_detached_restart_command(self) -> None:
-        import shutil
-        import subprocess
-
-        hermes_cmd = _resolve_hermes_bin()
-        if not hermes_cmd:
-            logger.error("Could not locate hermes binary for detached /restart")
-            return
-
-        current_pid = os.getpid()
-        cmd = " ".join(shlex.quote(part) for part in hermes_cmd)
-        shell_cmd = (
-            f"while kill -0 {current_pid} 2>/dev/null; do sleep 0.2; done; "
-            f"{cmd} gateway restart"
-        )
-        setsid_bin = shutil.which("setsid")
-        if setsid_bin:
-            subprocess.Popen(
-                [setsid_bin, "bash", "-lc", shell_cmd],
-                stdout=subprocess.DEVNULL,
-                stderr=subprocess.DEVNULL,
-                start_new_session=True,
-            )
-        else:
-            subprocess.Popen(
-                ["bash", "-lc", shell_cmd],
-                stdout=subprocess.DEVNULL,
-                stderr=subprocess.DEVNULL,
-                start_new_session=True,
-            )
-
-    def request_restart(self, *, detached: bool = False, via_service: bool = False) -> bool:
-        if self._restart_task_started:
-            return False
-        self._restart_requested = True
-        self._restart_detached = detached
-        self._restart_via_service = via_service
-        self._restart_task_started = True
-
-        async def _run_restart() -> None:
-            await asyncio.sleep(0.05)
-            await self.stop(restart=True, detached_restart=detached, service_restart=via_service)
-
-        task = asyncio.create_task(_run_restart())
-        self._background_tasks.add(task)
-        task.add_done_callback(self._background_tasks.discard)
-        return True
-
    async def start(self) -> bool:
        """
        Start the gateway and all configured platform adapters.
@@ -1419,7 +1165,6 @@ class GatewayRunner:
            adapter.set_message_handler(self._handle_message)
            adapter.set_fatal_error_handler(self._handle_adapter_fatal_error)
            adapter.set_session_store(self.session_store)
-            adapter.set_busy_session_handler(self._handle_active_session_busy_message)
            
            # Try to connect
            logger.info("Connecting to %s...", platform.value)
@@ -1495,7 +1240,11 @@ class GatewayRunner:
        self.delivery_router.adapters = self.adapters
        
        self._running = True
-        self._update_runtime_status("running")
+        try:
+            from gateway.status import write_runtime_status
+            write_runtime_status(gateway_state="running", exit_reason=None)
+        except Exception:
+            pass
        
        # Emit gateway:startup hook
        hook_count = len(self.hooks.loaded_hooks)
@@ -1730,7 +1479,6 @@ class GatewayRunner:
                    adapter.set_message_handler(self._handle_message)
                    adapter.set_fatal_error_handler(self._handle_adapter_fatal_error)
                    adapter.set_session_store(self.session_store)
-                    adapter.set_busy_session_handler(self._handle_active_session_busy_message)

                    success = await adapter.connect()
                    if success:
@@ -1777,108 +1525,90 @@ class GatewayRunner:
                    return
                await asyncio.sleep(1)

-    async def stop(
-        self,
-        *,
-        restart: bool = False,
-        detached_restart: bool = False,
-        service_restart: bool = False,
-    ) -> None:
+    async def stop(self) -> None:
        """Stop the gateway and disconnect all adapters."""
-        if restart:
-            self._restart_requested = True
-            self._restart_detached = detached_restart
-            self._restart_via_service = service_restart
-        if self._stop_task is not None:
-            await self._stop_task
-            return
+        logger.info("Stopping gateway...")
+        self._running = False

-        async def _stop_impl() -> None:
-            logger.info(
-                "Stopping gateway%s...",
-                " for restart" if self._restart_requested else "",
-            )
-            self._running = False
-            self._draining = True
-
-            timeout = self._restart_drain_timeout
-            active_agents, timed_out = await self._drain_active_agents(timeout)
-            if timed_out:
-                logger.warning(
-                    "Gateway drain timed out after %.1fs with %d active agent(s); interrupting remaining work.",
-                    timeout,
-                    self._running_agent_count(),
-                )
-                self._interrupt_running_agents(
-                    "Gateway restarting" if self._restart_requested else "Gateway shutting down"
-                )
-                interrupt_deadline = asyncio.get_running_loop().time() + 5.0
-                while self._running_agents and asyncio.get_running_loop().time() < interrupt_deadline:
-                    self._update_runtime_status("draining")
-                    await asyncio.sleep(0.1)
-
-            if self._restart_requested and self._restart_detached:
-                try:
-                    await self._launch_detached_restart_command()
-                except Exception as e:
-                    logger.error("Failed to launch detached gateway restart: %s", e)
-
-            self._finalize_shutdown_agents(active_agents)
-
-            for platform, adapter in list(self.adapters.items()):
-                try:
-                    await adapter.cancel_background_tasks()
-                except Exception as e:
-                    logger.debug("✗ %s background-task cancel error: %s", platform.value, e)
-                try:
-                    await adapter.disconnect()
-                    logger.info("✓ %s disconnected", platform.value)
-                except Exception as e:
-                    logger.error("✗ %s disconnect error: %s", platform.value, e)
-
-            for _task in list(self._background_tasks):
-                if _task is self._stop_task:
-                    continue
-                _task.cancel()
-            self._background_tasks.clear()
-
-            self.adapters.clear()
-            self._running_agents.clear()
-            self._pending_messages.clear()
-            self._pending_approvals.clear()
-            self._shutdown_event.set()
-
-            # Global cleanup: kill any remaining tool subprocesses not tied
-            # to a specific agent (catch-all for zombie prevention).
+        for session_key, agent in list(self._running_agents.items()):
+            if agent is _AGENT_PENDING_SENTINEL:
+                continue
            try:
-                from tools.process_registry import process_registry
-                process_registry.kill_all()
+                agent.interrupt("Gateway shutting down")
+                logger.debug("Interrupted running agent for session %s during shutdown", session_key[:20])
+            except Exception as e:
+                logger.debug("Failed interrupting agent during shutdown: %s", e)
+            # Fire plugin on_session_finalize hook before memory shutdown
+            try:
+                from hermes_cli.plugins import invoke_hook as _invoke_hook
+                _invoke_hook("on_session_finalize",
+                             session_id=getattr(agent, 'session_id', None),
+                             platform="gateway")
            except Exception:
                pass
+            # Shut down memory provider at actual session boundary
            try:
-                from tools.terminal_tool import cleanup_all_environments
-                cleanup_all_environments()
+                if hasattr(agent, 'shutdown_memory_provider'):
+                    agent.shutdown_memory_provider()
            except Exception:
                pass
+            # Close tool resources (terminal sandboxes, browser daemons,
+            # background processes, httpx clients) to prevent zombie
+            # process accumulation.
            try:
-                from tools.browser_tool import cleanup_all_browsers
-                cleanup_all_browsers()
+                if hasattr(agent, 'close'):
+                    agent.close()
            except Exception:
                pass

-            from gateway.status import remove_pid_file
-            remove_pid_file()
+        for platform, adapter in list(self.adapters.items()):
+            try:
+                await adapter.cancel_background_tasks()
+            except Exception as e:
+                logger.debug("✗ %s background-task cancel error: %s", platform.value, e)
+            try:
+                await adapter.disconnect()
+                logger.info("✓ %s disconnected", platform.value)
+            except Exception as e:
+                logger.error("✗ %s disconnect error: %s", platform.value, e)

-            if self._restart_requested and self._restart_via_service:
-                self._exit_code = GATEWAY_SERVICE_RESTART_EXIT_CODE
-                self._exit_reason = self._exit_reason or "Gateway restart requested"
+        # Cancel any pending background tasks
+        for _task in list(self._background_tasks):
+            _task.cancel()
+        self._background_tasks.clear()

-            self._draining = False
-            self._update_runtime_status("stopped", self._exit_reason)
-            logger.info("Gateway stopped")
+        self.adapters.clear()
+        self._running_agents.clear()
+        self._pending_messages.clear()
+        self._pending_approvals.clear()
+        self._shutdown_event.set()

-        self._stop_task = asyncio.create_task(_stop_impl())
-        await self._stop_task
+        # Global cleanup: kill any remaining tool subprocesses not tied
+        # to a specific agent (catch-all for zombie prevention).
+        try:
+            from tools.process_registry import process_registry
+            process_registry.kill_all()
+        except Exception:
+            pass
+        try:
+            from tools.terminal_tool import cleanup_all_environments
+            cleanup_all_environments()
+        except Exception:
+            pass
+        try:
+            from tools.browser_tool import cleanup_all_browsers
+            cleanup_all_browsers()
+        except Exception:
+            pass
+
+        from gateway.status import remove_pid_file, write_runtime_status
+        remove_pid_file()
+        try:
+            write_runtime_status(gateway_state="stopped", exit_reason=self._exit_reason)
+        except Exception:
+            pass
+        
+        logger.info("Gateway stopped")
    
    async def wait_for_shutdown(self) -> None:
        """Wait for shutdown signal."""
@@ -1994,7 +1724,7 @@ class GatewayRunner:
        elif platform == Platform.MATRIX:
            from gateway.platforms.matrix import MatrixAdapter, check_matrix_requirements
            if not check_matrix_requirements():
-                logger.warning("Matrix: mautrix not installed or credentials not set. Run: pip install 'mautrix[encryption]'")
+                logger.warning("Matrix: matrix-nio not installed or credentials not set. Run: pip install 'matrix-nio[e2e]'")
                return None
            return MatrixAdapter(config)

@@ -2284,9 +2014,6 @@ class GatewayRunner:
            _evt_cmd = event.get_command()
            _cmd_def_inner = _resolve_cmd_inner(_evt_cmd) if _evt_cmd else None

-            if _cmd_def_inner and _cmd_def_inner.name == "restart":
-                return await self._handle_restart_command(event)
-
            # /stop must hard-kill the session when an agent is running.
            # A soft interrupt (agent.interrupt()) doesn't help when the agent
            # is truly hung — the executor thread is blocked and never checks
@@ -2367,7 +2094,18 @@ class GatewayRunner:
                logger.debug("PRIORITY photo follow-up for session %s — queueing without interrupt", _quick_key[:20])
                adapter = self.adapters.get(source.platform)
                if adapter:
-                    merge_pending_message_event(adapter._pending_messages, _quick_key, event)
+                    # Reuse adapter queue semantics so photo bursts merge cleanly.
+                    if _quick_key in adapter._pending_messages:
+                        existing = adapter._pending_messages[_quick_key]
+                        if getattr(existing, "message_type", None) == MessageType.PHOTO:
+                            existing.media_urls.extend(event.media_urls)
+                            existing.media_types.extend(event.media_types)
+                            if event.text:
+                                existing.text = BasePlatformAdapter._merge_caption(existing.text, event.text)
+                        else:
+                            adapter._pending_messages[_quick_key] = event
+                    else:
+                        adapter._pending_messages[_quick_key] = event
                return None

            running_agent = self._running_agents.get(_quick_key)
@@ -2385,14 +2123,6 @@ class GatewayRunner:
                if adapter:
                    adapter._pending_messages[_quick_key] = event
                return None
-            if self._draining:
-                if self._queue_during_drain_enabled():
-                    self._queue_or_replace_pending_event(_quick_key, event)
-                return (
-                    f"⏳ Gateway {self._status_action_gerund()} — queued for the next turn after it comes back."
-                    if self._queue_during_drain_enabled()
-                    else f"⏳ Gateway is {self._status_action_gerund()} and is not accepting another turn right now."
-                )
            logger.debug("PRIORITY interrupt for session %s", _quick_key[:20])
            running_agent.interrupt(event.text)
            if _quick_key in self._pending_messages:
@@ -2434,9 +2164,6 @@ class GatewayRunner:

        if canonical == "status":
            return await self._handle_status_command(event)
-
-        if canonical == "restart":
-            return await self._handle_restart_command(event)
        
        if canonical == "stop":
            return await self._handle_stop_command(event)
@@ -2535,9 +2262,6 @@ class GatewayRunner:
        if canonical == "voice":
            return await self._handle_voice_command(event)

-        if self._draining:
-            return f"⏳ Gateway is {self._status_action_gerund()} and is not accepting new work right now."
-
        # User-defined quick commands (bypass agent loop, no LLM call)
        if command:
            if isinstance(self.config, dict):
@@ -3485,12 +3209,7 @@ class GatewayRunner:
            # post-processing in _process_message_background is skipped
            # when already_sent is True, so media files would never be
            # delivered without this.
-            #
-            # Never skip when the agent failed — the error message is new
-            # content the user hasn't seen (streaming only sent earlier
-            # partial output before the failure).  Without this guard,
-            # users see the agent "stop responding without explanation."
-            if agent_result.get("already_sent") and not agent_result.get("failed"):
+            if agent_result.get("already_sent"):
                if response:
                    _media_adapter = self.adapters.get(source.platform)
                    if _media_adapter:
@@ -3837,21 +3556,7 @@ class GatewayRunner:
            return "⚡ Force-stopped. The session is unlocked — you can send a new message."
        else:
            return "No active task to stop."
-
-    async def _handle_restart_command(self, event: MessageEvent) -> str:
-        """Handle /restart command - drain active work, then restart the gateway."""
-        if self._restart_requested or self._draining:
-            count = self._running_agent_count()
-            if count:
-                return f"⏳ Draining {count} active agent(s) before restart..."
-            return "⏳ Gateway restart already in progress..."
-
-        active_agents = self._running_agent_count()
-        self.request_restart(detached=True, via_service=False)
-        if active_agents:
-            return f"⏳ Draining {active_agents} active agent(s) before restart..."
-        return "♻ Restarting gateway..."
-
+    
    async def _handle_help_command(self, event: MessageEvent) -> str:
        """Handle /help command - list available commands."""
        from hermes_cli.commands import gateway_help_lines
@@ -3974,7 +3679,7 @@ class GatewayRunner:
        # Check for session override
        source = event.source
        session_key = self._session_key_for_source(source)
-        override = self._session_model_overrides.get(session_key, {})
+        override = getattr(self, "_session_model_overrides", {}).get(session_key, {})
        if override:
            current_model = override.get("model", current_model)
            current_provider = override.get("provider", current_provider)
@@ -4056,6 +3761,8 @@ class GatewayRunner:
                            f"via {result.provider_label or result.target_provider}. "
                            f"Adjust your self-identification accordingly.]"
                        )
+                        if not hasattr(_self, "_session_model_overrides"):
+                            _self._session_model_overrides = {}
                        _self._session_model_overrides[_session_key] = {
                            "model": result.new_model,
                            "provider": result.target_provider,
@@ -4169,6 +3876,8 @@ class GatewayRunner:
        )

        # Store session override so next agent creation uses the new model
+        if not hasattr(self, "_session_model_overrides"):
+            self._session_model_overrides = {}
        self._session_model_overrides[session_key] = {
            "model": result.new_model,
            "provider": result.target_provider,
@@ -5487,7 +5196,6 @@ class GatewayRunner:

        try:
            from run_agent import AIAgent
-            from agent.manual_compression_feedback import summarize_manual_compression
            from agent.model_metadata import estimate_messages_tokens_rough

            runtime_kwargs = _resolve_runtime_agent_kwargs()
@@ -5515,13 +5223,6 @@ class GatewayRunner:
            )
            tmp_agent._print_fn = lambda *a, **kw: None

-            compressor = tmp_agent.context_compressor
-            compress_start = compressor.protect_first_n
-            compress_start = compressor._align_boundary_forward(msgs, compress_start)
-            compress_end = compressor._find_tail_cut_by_tokens(msgs, compress_start)
-            if compress_start >= compress_end:
-                return "Nothing to compress yet (the transcript is still all protected context)."
-
            loop = asyncio.get_event_loop()
            compressed, _ = await loop.run_in_executor(
                None,
@@ -5542,17 +5243,13 @@ class GatewayRunner:
            self.session_store.update_session(
                session_entry.session_key, last_prompt_tokens=0
            )
+            new_count = len(compressed)
            new_tokens = estimate_messages_tokens_rough(compressed)
-            summary = summarize_manual_compression(
-                msgs,
-                compressed,
-                approx_tokens,
-                new_tokens,
+
+            return (
+                f"🗜️ Compressed: {original_count} → {new_count} messages\n"
+                f"~{approx_tokens:,} → ~{new_tokens:,} tokens"
            )
-            lines = [f"🗜️ {summary['headline']}", summary["token_line"]]
-            if summary["note"]:
-                lines.append(summary["note"])
-            return "\n".join(lines)
        except Exception as e:
            logger.warning("Manual compress failed: %s", e)
            return f"Compression failed: {e}"
@@ -7666,8 +7363,6 @@ class GatewayRunner:
                await asyncio.sleep(0.05)
            if session_key:
                self._running_agents[session_key] = agent_holder[0]
-                if self._draining:
-                    self._update_runtime_status("draining")
        
        tracking_task = asyncio.create_task(track_agent())
        
@@ -7867,19 +7562,12 @@ class GatewayRunner:
            # Track fallback model state: if the agent switched to a
            # fallback model during this run, persist it so /model shows
            # the actually-active model instead of the config default.
-            # Skip eviction when the run failed — evicting a failed agent
-            # forces MCP reinit on the next message for no benefit (the
-            # same error will recur).  This was the root cause of #7130:
-            # a bad model ID triggered fallback → eviction → recreation →
-            # MCP reinit → same 400 → loop, burning 91% CPU for hours.
            _agent = agent_holder[0]
-            _result_for_fb = result_holder[0]
-            _run_failed = _result_for_fb.get("failed") if _result_for_fb else False
-            if _agent is not None and hasattr(_agent, 'model') and not _run_failed:
+            if _agent is not None and hasattr(_agent, 'model'):
                _cfg_model = _resolve_gateway_model()
                if _agent.model != _cfg_model and not self._is_intentional_model_switch(session_key, _agent.model):
-                    # Fallback activated on a successful run — evict cached
-                    # agent so the next message retries the primary model.
+                    # Fallback activated — evict cached agent so the next
+                    # message starts fresh and retries the primary model.
                    self._evict_cached_agent(session_key)

            # Check if we were interrupted OR have a queued message (/queue).
@@ -7920,14 +7608,6 @@ class GatewayRunner:
                    except Exception:
                        pass

-            if self._draining and pending:
-                logger.info(
-                    "Discarding pending follow-up for session %s during gateway %s",
-                    session_key[:20] if session_key else "?",
-                    self._status_action_label(),
-                )
-                pending = None
-
            if pending:
                logger.debug("Processing pending message: '%s...'", pending[:40])
                
@@ -8004,8 +7684,6 @@ class GatewayRunner:
                del self._running_agents[session_key]
            if session_key:
                self._running_agents_ts.pop(session_key, None)
-            if self._draining:
-                self._update_runtime_status("draining")
            
            # Wait for cancelled tasks
            for task in [progress_task, interrupt_monitor, tracking_task, _notify_task]:
@@ -8017,13 +7695,9 @@ class GatewayRunner:

        # If streaming already delivered the response, mark it so the
        # caller's send() is skipped (avoiding duplicate messages).
-        # BUT: never suppress delivery when the agent failed — the error
-        # message is new content the user hasn't seen, and it must reach
-        # them even if streaming had sent earlier partial output.
        _sc = stream_consumer_holder[0]
        if _sc and _sc.already_sent and isinstance(response, dict):
-            if not response.get("failed"):
-                response["already_sent"] = True
+            response["already_sent"] = True
        
        return response

@@ -8207,21 +7881,13 @@ async def start_gateway(config: Optional[GatewayConfig] = None, replace: bool =
    runner = GatewayRunner(config)
    
    # Set up signal handlers
-    def shutdown_signal_handler():
+    def signal_handler():
        asyncio.create_task(runner.stop())
-
-    def restart_signal_handler():
-        runner.request_restart(detached=False, via_service=True)
    
    loop = asyncio.get_event_loop()
    for sig in (signal.SIGINT, signal.SIGTERM):
        try:
-            loop.add_signal_handler(sig, shutdown_signal_handler)
-        except NotImplementedError:
-            pass
-    if hasattr(signal, "SIGUSR1"):
-        try:
-            loop.add_signal_handler(signal.SIGUSR1, restart_signal_handler)
+            loop.add_signal_handler(sig, signal_handler)
        except NotImplementedError:
            pass
    
@@ -8271,9 +7937,6 @@ async def start_gateway(config: Optional[GatewayConfig] = None, replace: bool =
    except Exception:
        pass

-    if runner.exit_code is not None:
-        raise SystemExit(runner.exit_code)
-
    return True


@@ -158,8 +158,6 @@ def _build_runtime_status_record() -> dict[str, Any]:
    payload.update({
        "gateway_state": "starting",
        "exit_reason": None,
-        "restart_requested": False,
-        "active_agents": 0,
        "platforms": {},
        "updated_at": _utc_now_iso(),
    })
@@ -220,8 +218,6 @@ def write_runtime_status(
    *,
    gateway_state: Optional[str] = None,
    exit_reason: Optional[str] = None,
-    restart_requested: Optional[bool] = None,
-    active_agents: Optional[int] = None,
    platform: Optional[str] = None,
    platform_state: Optional[str] = None,
    error_code: Optional[str] = None,
@@ -240,10 +236,6 @@ def write_runtime_status(
        payload["gateway_state"] = gateway_state
    if exit_reason is not None:
        payload["exit_reason"] = exit_reason
-    if restart_requested is not None:
-        payload["restart_requested"] = bool(restart_requested)
-    if active_agents is not None:
-        payload["active_agents"] = max(0, int(active_agents))

    if platform is not None:
        platform_payload = payload["platforms"].get(platform, {})
@@ -19,10 +19,11 @@ import subprocess
 import sys
 from pathlib import Path

-from hermes_constants import is_wsl as _is_wsl
-
 logger = logging.getLogger(__name__)

+# Cache WSL detection (checked once per process)
+_wsl_detected: bool | None = None
+

 def save_clipboard_image(dest: Path) -> bool:
    """Extract an image from the system clipboard and save it as PNG.
@@ -216,6 +217,19 @@ def _windows_save(dest: Path) -> bool:

 # ── Linux ────────────────────────────────────────────────────────────────

+def _is_wsl() -> bool:
+    """Detect if running inside WSL (1 or 2)."""
+    global _wsl_detected
+    if _wsl_detected is not None:
+        return _wsl_detected
+    try:
+        with open("/proc/version", "r") as f:
+            _wsl_detected = "microsoft" in f.read().lower()
+    except Exception:
+        _wsl_detected = False
+    return _wsl_detected
+
+
 def _linux_save(dest: Path) -> bool:
    """Try clipboard backends in priority order: WSL → Wayland → X11."""
    if _is_wsl():
@@ -140,8 +140,6 @@ COMMAND_REGISTRY: list[CommandDef] = [
    CommandDef("commands", "Browse all commands and skills (paginated)", "Info",
               gateway_only=True, args_hint="[page]"),
    CommandDef("help", "Show available commands", "Info"),
-    CommandDef("restart", "Gracefully restart the gateway after draining active runs", "Session",
-               gateway_only=True),
    CommandDef("usage", "Show token usage and rate limits for the current session", "Info"),
    CommandDef("insights", "Show usage insights and analytics", "Info",
               args_hint="[days]"),
@@ -141,6 +141,68 @@ def managed_error(action: str = "modify configuration"):
    print(format_managed_message(action), file=sys.stderr)


+# =============================================================================
+# Container-aware CLI (NixOS container mode)
+# =============================================================================
+
+def _is_inside_container() -> bool:
+    """Detect if we're already running inside a Docker/Podman container."""
+    # Docker creates /.dockerenv at the container root
+    if os.path.exists("/.dockerenv"):
+        return True
+    # Podman uses /run/.containerenv
+    if os.path.exists("/run/.containerenv"):
+        return True
+    # Best-effort: cgroup v1 paths name the runtime; cgroup v2 often shows only "0::/"
+    try:
+        with open("/proc/1/cgroup", "r") as f:
+            cgroup = f.read()
+            if "docker" in cgroup or "podman" in cgroup or "/lxc/" in cgroup:
+                return True
+    except OSError:  # IOError is an alias of OSError in Python 3
+        pass
+    return False
+
+
+def get_container_exec_info() -> Optional[dict]:
+    """Read container mode metadata from HERMES_HOME/.container-mode.
+
+    Returns a dict with keys: backend, container_name, hermes_bin
+    or None if container mode is inactive, the file is unreadable, or we are inside a container.
+
+    The .container-mode file is written by the NixOS activation script when
+    container.enable = true. It tells the host CLI to exec into the container
+    instead of running locally.
+    """
+    if _is_inside_container():
+        return None
+
+    container_mode_file = get_hermes_home() / ".container-mode"
+    if not container_mode_file.exists():
+        return None
+
+    try:
+        info = {}
+        with open(container_mode_file, "r") as f:
+            for line in f:
+                line = line.strip()
+                if "=" in line and not line.startswith("#"):
+                    key, _, value = line.partition("=")
+                    info[key.strip()] = value.strip()
+
+        backend = info.get("backend", "docker")
+        container_name = info.get("container_name", "hermes-agent")
+        hermes_bin = info.get("hermes_bin", "/data/current-package/bin/hermes")
+
+        return {
+            "backend": backend,
+            "container_name": container_name,
+            "hermes_bin": hermes_bin,
+        }
+    except OSError:  # IOError is an alias of OSError in Python 3
+        return None
+
+
 # =============================================================================
 # Config paths
 # =============================================================================
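
Reviewer note on the hunk above: the `.container-mode` parsing and the exec routing described in the commit message can be exercised without a container runtime. The sketch below mirrors the key=value parsing and defaults from `get_container_exec_info()`. Both `parse_container_mode` and `build_exec_argv` are hypothetical names for illustration, and the `exec -it` argv layout is an assumption, since `_exec_in_container()` is not shown in this part of the diff.

```python
def parse_container_mode(text: str) -> dict:
    """Parse key=value lines, skipping blank lines and '#' comments.

    Mirrors the loop in get_container_exec_info(), but takes the file
    contents as a string so it can be tested without touching disk.
    """
    info = {}
    for raw in text.splitlines():
        line = raw.strip()
        if "=" in line and not line.startswith("#"):
            key, _, value = line.partition("=")
            info[key.strip()] = value.strip()
    # Same defaults as the diff applies when a key is missing.
    return {
        "backend": info.get("backend", "docker"),
        "container_name": info.get("container_name", "hermes-agent"),
        "hermes_bin": info.get("hermes_bin", "/data/current-package/bin/hermes"),
    }


def build_exec_argv(info: dict, cli_args: list) -> list:
    """Hypothetical helper: build the argv for `<backend> exec`.

    Drops --host, which the commit message says is not forwarded
    because it has no meaning inside the container.
    """
    forwarded = [arg for arg in cli_args if arg != "--host"]
    return [info["backend"], "exec", "-it", info["container_name"],
            info["hermes_bin"], *forwarded]


if __name__ == "__main__":
    sample = "# written by activation script\nbackend=podman\ncontainer_name = hermes\n"
    meta = parse_container_mode(sample)
    print(build_exec_argv(meta, ["chat", "--host", "-q"]))
```

Keeping the parser separate from the file read would also make the fallback path (missing keys, comment lines) straightforward to unit-test.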
@@ -269,11 +331,6 @@ DEFAULT_CONFIG = {
        # tools or receiving API responses.  Only fires when the agent has
        # been completely idle for this duration.  0 = unlimited.
        "gateway_timeout": 1800,
-        # Graceful drain timeout for gateway stop/restart (seconds).
-        # The gateway stops accepting new work, waits for running agents
-        # to finish, then interrupts any remaining runs after the timeout.
-        # 0 = no drain, interrupt immediately.
-        "restart_drain_timeout": 60,
        "service_tier": "",
        # Tool-use enforcement: injects system prompt guidance that tells the
        # model to actually call tools instead of describing intended actions.
@@ -458,7 +515,7 @@ DEFAULT_CONFIG = {
    
    # Text-to-speech configuration
    "tts": {
-        "provider": "edge",  # "edge" (free) | "elevenlabs" (premium) | "openai" | "minimax" | "mistral" | "neutts" (local)
+        "provider": "edge",  # "edge" (free) | "elevenlabs" (premium) | "openai" | "neutts" (local)
        "edge": {
            "voice": "en-US-AriaNeural",
            # Popular: AriaNeural, JennyNeural, AndrewNeural, BrianNeural, SoniaNeural
@@ -472,10 +529,6 @@ DEFAULT_CONFIG = {
            "voice": "alloy",
            # Voices: alloy, echo, fable, onyx, nova, shimmer
        },
-        "mistral": {
-            "model": "voxtral-mini-tts-2603",
-            "voice_id": "c69964a6-ab8b-4f8a-9465-ec0925096ec8",  # Paul - Neutral
-        },
        "neutts": {
            "ref_audio": "",  # Path to reference voice audio (empty = bundled default)
            "ref_text": "",   # Path to reference voice transcript (empty = bundled default)
@@ -513,16 +566,6 @@ DEFAULT_CONFIG = {
        "max_ms": 2500,
    },
    
-    # Context engine -- controls how the context window is managed when
-    # approaching the model's token limit.
-    # "compressor" = built-in lossy summarization (default).
-    # Set to a plugin name to activate an alternative engine (e.g. "lcm"
-    # for Lossless Context Management).  The engine must be installed as
-    # a plugin in plugins/context_engine/<name>/ or ~/.hermes/plugins/.
-    "context": {
-        "engine": "compressor",
-    },
-
    # Persistent memory -- bounded curated memory injected into system prompt
    "memory": {
        "memory_enabled": True,
@@ -547,8 +590,6 @@ DEFAULT_CONFIG = {
        "api_key": "",     # API key for delegation.base_url (falls back to OPENAI_API_KEY)
        "max_iterations": 50,  # per-subagent iteration cap (each subagent gets its own budget,
                               # independent of the parent's max_iterations)
-        "reasoning_effort": "",  # reasoning effort for subagents: "xhigh", "high", "medium",
-                                 # "low", "minimal", "none" (empty = inherit parent's level)
    },

    # Ephemeral prefill messages file — JSON list of {role, content} dicts
@@ -1020,13 +1061,6 @@ OPTIONAL_ENV_VARS = {
        "password": True,
        "category": "tool",
    },
-    "MISTRAL_API_KEY": {
-        "description": "Mistral API key for Voxtral TTS and transcription (STT)",
-        "prompt": "Mistral API key",
-        "url": "https://console.mistral.ai/",
-        "password": True,
-        "category": "tool",
-    },
    "GITHUB_TOKEN": {
        "description": "GitHub token for Skills Hub (higher API rate limits, skill publish)",
        "prompt": "GitHub Token",
@@ -1478,7 +1512,7 @@ _KNOWN_ROOT_KEYS = {
    "_config_version", "model", "providers", "fallback_model",
    "fallback_providers", "credential_pool_strategies", "toolsets",
    "agent", "terminal", "display", "compression", "delegation",
-    "auxiliary", "custom_providers", "context", "memory", "gateway",
+    "auxiliary", "custom_providers", "memory", "gateway",
 }

 # Valid fields inside a custom_providers list entry
@@ -2801,10 +2835,6 @@ def set_config_value(key: str, value: str):
        "terminal.timeout": "TERMINAL_TIMEOUT",
        "terminal.sandbox_dir": "TERMINAL_SANDBOX_DIR",
        "terminal.persistent_shell": "TERMINAL_PERSISTENT_SHELL",
-        "terminal.container_cpu": "TERMINAL_CONTAINER_CPU",
-        "terminal.container_memory": "TERMINAL_CONTAINER_MEMORY",
-        "terminal.container_disk": "TERMINAL_CONTAINER_DISK",
-        "terminal.container_persistent": "TERMINAL_CONTAINER_PERSISTENT",
    }
    if key in _config_to_env_sync:
        save_env_value(_config_to_env_sync[key], str(value))
@@ -160,133 +160,6 @@ def curses_checklist(
        return _numbered_fallback(title, items, selected, cancel_returns, status_fn)


-def curses_radiolist(
-    title: str,
-    items: List[str],
-    selected: int = 0,
-    *,
-    cancel_returns: int | None = None,
-) -> int:
-    """Curses single-select radio list. Returns the selected index.
-
-    Args:
-        title: Header line displayed above the list.
-        items: Display labels for each row.
-        selected: Index that starts selected (pre-selected).
-        cancel_returns: Returned on ESC/q. Defaults to the original *selected*.
-    """
-    if cancel_returns is None:
-        cancel_returns = selected
-
-    if not sys.stdin.isatty():
-        return cancel_returns
-
-    try:
-        import curses
-        result_holder: list = [None]
-
-        def _draw(stdscr):
-            curses.curs_set(0)
-            if curses.has_colors():
-                curses.start_color()
-                curses.use_default_colors()
-                curses.init_pair(1, curses.COLOR_GREEN, -1)
-                curses.init_pair(2, curses.COLOR_YELLOW, -1)
-            cursor = selected
-            scroll_offset = 0
-
-            while True:
-                stdscr.clear()
-                max_y, max_x = stdscr.getmaxyx()
-
-                # Header
-                try:
-                    hattr = curses.A_BOLD
-                    if curses.has_colors():
-                        hattr |= curses.color_pair(2)
-                    stdscr.addnstr(0, 0, title, max_x - 1, hattr)
-                    stdscr.addnstr(
-                        1, 0,
-                        "  \u2191\u2193 navigate  ENTER/SPACE select  ESC cancel",
-                        max_x - 1, curses.A_DIM,
-                    )
-                except curses.error:
-                    pass
-
-                # Scrollable item list
-                visible_rows = max_y - 4
-                if cursor < scroll_offset:
-                    scroll_offset = cursor
-                elif cursor >= scroll_offset + visible_rows:
-                    scroll_offset = cursor - visible_rows + 1
-
-                for draw_i, i in enumerate(
-                    range(scroll_offset, min(len(items), scroll_offset + visible_rows))
-                ):
-                    y = draw_i + 3
-                    if y >= max_y - 1:
-                        break
-                    radio = "\u25cf" if i == selected else "\u25cb"
-                    arrow = "\u2192" if i == cursor else " "
-                    line = f" {arrow} ({radio}) {items[i]}"
-                    attr = curses.A_NORMAL
-                    if i == cursor:
-                        attr = curses.A_BOLD
-                        if curses.has_colors():
-                            attr |= curses.color_pair(1)
-                    try:
-                        stdscr.addnstr(y, 0, line, max_x - 1, attr)
-                    except curses.error:
-                        pass
-
-                stdscr.refresh()
-                key = stdscr.getch()
-
-                if key in (curses.KEY_UP, ord("k")):
-                    cursor = (cursor - 1) % len(items)
-                elif key in (curses.KEY_DOWN, ord("j")):
-                    cursor = (cursor + 1) % len(items)
-                elif key in (ord(" "), curses.KEY_ENTER, 10, 13):
-                    result_holder[0] = cursor
-                    return
-                elif key in (27, ord("q")):
-                    result_holder[0] = cancel_returns
-                    return
-
-        curses.wrapper(_draw)
-        flush_stdin()
-        return result_holder[0] if result_holder[0] is not None else cancel_returns
-
-    except Exception:
-        return _radio_numbered_fallback(title, items, selected, cancel_returns)
-
-
-def _radio_numbered_fallback(
-    title: str,
-    items: List[str],
-    selected: int,
-    cancel_returns: int,
-) -> int:
-    """Text-based numbered fallback for radio selection."""
-    print(color(f"\n  {title}", Colors.YELLOW))
-    print(color("  Select by number, Enter to confirm.\n", Colors.DIM))
-
-    for i, label in enumerate(items):
-        marker = color("(\u25cf)", Colors.GREEN) if i == selected else "(\u25cb)"
-        print(f"  {marker} {i + 1:>2}. {label}")
-    print()
-    try:
-        val = input(color(f"  Choice [default {selected + 1}]: ", Colors.DIM)).strip()
-        if not val:
-            return selected
-        idx = int(val) - 1
-        if 0 <= idx < len(items):
-            return idx
-        return selected
-    except (ValueError, KeyboardInterrupt, EOFError):
-        return cancel_returns
-
-
 def _numbered_fallback(
    title: str,
    items: List[str],
@@ -722,9 +722,9 @@ def run_doctor(args):
        ("DeepSeek",         ("DEEPSEEK_API_KEY",),                           "https://api.deepseek.com/v1/models",  "DEEPSEEK_BASE_URL", True),
        ("Hugging Face",     ("HF_TOKEN",),                                   "https://router.huggingface.co/v1/models", "HF_BASE_URL", True),
        ("Alibaba/DashScope", ("DASHSCOPE_API_KEY",),                         "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/models", "DASHSCOPE_BASE_URL", True),
-        # MiniMax: the /anthropic endpoint doesn't support /models, but the /v1 endpoint does.
-        ("MiniMax",          ("MINIMAX_API_KEY",),                            "https://api.minimax.io/v1/models",    "MINIMAX_BASE_URL", True),
-        ("MiniMax (China)",  ("MINIMAX_CN_API_KEY",),                         "https://api.minimaxi.com/v1/models",  "MINIMAX_CN_BASE_URL", True),
+        # MiniMax APIs don't support /models endpoint — https://github.com/NousResearch/hermes-agent/issues/811
+        ("MiniMax",          ("MINIMAX_API_KEY",),                            None,                                  "MINIMAX_BASE_URL", False),
+        ("MiniMax (China)",  ("MINIMAX_CN_API_KEY",),                         None,                                  "MINIMAX_CN_BASE_URL", False),
        ("AI Gateway",       ("AI_GATEWAY_API_KEY",),                          "https://ai-gateway.vercel.sh/v1/models", "AI_GATEWAY_BASE_URL", True),
        ("Kilo Code",        ("KILOCODE_API_KEY",),                            "https://api.kilo.ai/api/gateway/models",  "KILOCODE_BASE_URL", True),
        ("OpenCode Zen",     ("OPENCODE_ZEN_API_KEY",),                        "https://opencode.ai/zen/v1/models",  "OPENCODE_ZEN_BASE_URL", True),
@@ -749,11 +749,6 @@ def run_doctor(args):
                # Auto-detect Kimi Code keys (sk-kimi-) → api.kimi.com
                if not _base and _key.startswith("sk-kimi-"):
                    _base = "https://api.kimi.com/coding/v1"
-                # Anthropic-compat endpoints (/anthropic) don't support /models.
-                # Rewrite to the OpenAI-compat /v1 surface for health checks.
-                if _base and _base.rstrip("/").endswith("/anthropic"):
-                    from agent.auxiliary_client import _to_openai_base_url
-                    _base = _to_openai_base_url(_base)
                _url = (_base.rstrip("/") + "/models") if _base else _default_url
                _headers = {"Authorization": f"Bearer {_key}"}
                if "api.kimi.com" in _url.lower():
@@ -15,19 +15,7 @@ from pathlib import Path
 PROJECT_ROOT = Path(__file__).parent.parent.resolve()

 from gateway.status import terminate_pid
-from gateway.restart import (
-    DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT,
-    GATEWAY_SERVICE_RESTART_EXIT_CODE,
-    parse_restart_drain_timeout,
-)
-from hermes_cli.config import (
-    get_env_value,
-    get_hermes_home,
-    is_managed,
-    managed_error,
-    read_raw_config,
-    save_env_value,
-)
+from hermes_cli.config import get_env_value, get_hermes_home, save_env_value, is_managed, managed_error
 # display_hermes_home is imported lazily at call sites to avoid ImportError
 # when hermes_constants is cached from a pre-update version during `hermes update`.
 from hermes_cli.setup import (
@@ -104,59 +92,6 @@ def _get_service_pids() -> set:
    return pids


-def _get_parent_pid(pid: int) -> int | None:
-    """Return the parent PID for ``pid``, or ``None`` when unavailable."""
-    if pid <= 1:
-        return None
-    try:
-        result = subprocess.run(
-            ["ps", "-o", "ppid=", "-p", str(pid)],
-            capture_output=True,
-            text=True,
-            timeout=5,
-        )
-    except (FileNotFoundError, subprocess.TimeoutExpired):
-        return None
-    if result.returncode != 0:
-        return None
-    raw = result.stdout.strip()
-    if not raw:
-        return None
-    try:
-        parent_pid = int(raw.splitlines()[-1].strip())
-    except ValueError:
-        return None
-    return parent_pid if parent_pid > 0 else None
-
-
-def _is_pid_ancestor_of_current_process(target_pid: int) -> bool:
-    """Return True when ``target_pid`` is this process or one of its ancestors."""
-    if target_pid <= 0:
-        return False
-
-    pid = os.getpid()
-    seen: set[int] = set()
-    while pid and pid not in seen:
-        if pid == target_pid:
-            return True
-        seen.add(pid)
-        pid = _get_parent_pid(pid) or 0
-    return False
-
-
-def _request_gateway_self_restart(pid: int) -> bool:
-    """Ask a running gateway ancestor to restart itself asynchronously."""
-    if not hasattr(signal, "SIGUSR1"):
-        return False
-    if not _is_pid_ancestor_of_current_process(pid):
-        return False
-    try:
-        os.kill(pid, signal.SIGUSR1)
-    except (ProcessLookupError, PermissionError, OSError):
-        return False
-    return True
-
-
 def find_gateway_pids(exclude_pids: set | None = None) -> list:
    """Find PIDs of running gateway processes.

@@ -291,33 +226,11 @@ def is_linux() -> bool:
    return sys.platform.startswith('linux')


-from hermes_constants import is_termux, is_wsl
-
-
-def _wsl_systemd_operational() -> bool:
-    """Check if systemd is actually running as PID 1 on WSL.
-
-    WSL2 with ``systemd=true`` in wsl.conf has working systemd.
-    WSL2 without it (or WSL1) does not — systemctl commands fail.
-    """
-    try:
-        result = subprocess.run(
-            ["systemctl", "is-system-running"],
-            capture_output=True, text=True, timeout=5,
-        )
-        # "running", "degraded", "starting" all mean systemd is PID 1
-        status = result.stdout.strip().lower()
-        return status in ("running", "degraded", "starting", "initializing")
-    except (FileNotFoundError, subprocess.TimeoutExpired, OSError):
-        return False
+from hermes_constants import is_termux


 def supports_systemd_services() -> bool:
-    if not is_linux() or is_termux():
-        return False
-    if is_wsl():
-        return _wsl_systemd_operational()
-    return True
+    return is_linux() and not is_termux()


 def is_macos() -> bool:
@@ -752,7 +665,6 @@ def generate_systemd_unit(system: bool = False, run_as_user: str | None = None)
            path_entries.append(resolved_node_dir)

    common_bin_paths = ["/usr/local/sbin", "/usr/local/bin", "/usr/sbin", "/usr/bin", "/sbin", "/bin"]
-    restart_timeout = max(60, int(_get_restart_drain_timeout() or 0))

    if system:
        username, group_name, home_dir = _system_service_identity(run_as_user)
@@ -791,11 +703,9 @@ Environment="VIRTUAL_ENV={venv_dir}"
 Environment="HERMES_HOME={hermes_home}"
 Restart=on-failure
 RestartSec=30
-RestartForceExitStatus={GATEWAY_SERVICE_RESTART_EXIT_CODE}
 KillMode=mixed
 KillSignal=SIGTERM
-ExecReload=/bin/kill -USR1 $MAINPID
-TimeoutStopSec={restart_timeout}
+TimeoutStopSec=60
 StandardOutput=journal
 StandardError=journal

@@ -823,11 +733,9 @@ Environment="VIRTUAL_ENV={venv_dir}"
 Environment="HERMES_HOME={hermes_home}"
 Restart=on-failure
 RestartSec=30
-RestartForceExitStatus={GATEWAY_SERVICE_RESTART_EXIT_CODE}
 KillMode=mixed
 KillSignal=SIGTERM
-ExecReload=/bin/kill -USR1 $MAINPID
-TimeoutStopSec={restart_timeout}
+TimeoutStopSec=60
 StandardOutput=journal
 StandardError=journal

@@ -930,20 +838,6 @@ def _select_systemd_scope(system: bool = False) -> bool:
    return get_systemd_unit_path(system=True).exists() and not get_systemd_unit_path(system=False).exists()


-def _get_restart_drain_timeout() -> float:
-    """Return the configured gateway restart drain timeout in seconds."""
-    raw = os.getenv("HERMES_RESTART_DRAIN_TIMEOUT", "").strip()
-    if not raw:
-        cfg = read_raw_config()
-        agent_cfg = cfg.get("agent", {}) if isinstance(cfg, dict) else {}
-        raw = str(
-            agent_cfg.get(
-                "restart_drain_timeout", DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
-            )
-        )
-    return parse_restart_drain_timeout(raw)
-
-
 def systemd_install(force: bool = False, system: bool = False, run_as_user: str | None = None):
    if system:
        _require_root_for_system_service("install")
@@ -1029,13 +923,7 @@ def systemd_restart(system: bool = False):
    if system:
        _require_root_for_system_service("restart")
    refresh_systemd_unit_if_needed(system=system)
-    from gateway.status import get_running_pid
-
-    pid = get_running_pid()
-    if pid is not None and _request_gateway_self_restart(pid):
-        print(f"✓ {_service_scope_label(system).capitalize()} service restart requested")
-        return
-    subprocess.run(_systemctl_cmd(system) + ["reload-or-restart", get_service_name()], check=True, timeout=90)
+    subprocess.run(_systemctl_cmd(system) + ["restart", get_service_name()], check=True, timeout=90)
    print(f"✓ {_service_scope_label(system).capitalize()} service restarted")


@@ -1323,7 +1211,7 @@ def launchd_stop():
    _wait_for_gateway_exit(timeout=10.0, force_after=5.0)
    print("✓ Service stopped")

-def _wait_for_gateway_exit(timeout: float = 10.0, force_after: float | None = 5.0) -> bool:
+def _wait_for_gateway_exit(timeout: float = 10.0, force_after: float = 5.0):
    """Wait for the gateway process (by saved PID) to exit.

    Uses the PID from the gateway.pid file — not launchd labels — so this
@@ -1338,21 +1226,21 @@ def _wait_for_gateway_exit(timeout: float = 10.0, force_after: float | None = 5.
    from gateway.status import get_running_pid

    deadline = time.monotonic() + timeout
-    force_deadline = (time.monotonic() + force_after) if force_after is not None else None
+    force_deadline = time.monotonic() + force_after
    force_sent = False

    while time.monotonic() < deadline:
        pid = get_running_pid()
        if pid is None:
-            return True  # Process exited cleanly.
+            return  # Process exited cleanly.

-        if force_after is not None and not force_sent and time.monotonic() >= force_deadline:
+        if not force_sent and time.monotonic() >= force_deadline:
            # Grace period expired — force-kill the specific PID.
            try:
                terminate_pid(pid, force=True)
                print(f"⚠ Gateway PID {pid} did not exit gracefully; sent SIGKILL")
            except (ProcessLookupError, PermissionError, OSError):
-                return True  # Already gone or we can't touch it.
+                return  # Already gone or we can't touch it.
            force_sent = True

        time.sleep(0.3)
@@ -1361,30 +1249,15 @@ def _wait_for_gateway_exit(timeout: float = 10.0, force_after: float | None = 5.
    remaining_pid = get_running_pid()
    if remaining_pid is not None:
        print(f"⚠ Gateway PID {remaining_pid} still running after {timeout}s — restart may fail")
-        return False
-    return True


 def launchd_restart():
    label = get_launchd_label()
    target = f"{_launchd_domain()}/{label}"
-    drain_timeout = _get_restart_drain_timeout()
-    from gateway.status import get_running_pid
-
+    # Use kickstart -k so launchd performs an atomic kill+restart.
+    # A two-step stop/start from inside the gateway's own process tree
+    # would kill the shell before the start command is reached.
    try:
-        pid = get_running_pid()
-        if pid is not None and _request_gateway_self_restart(pid):
-            print("✓ Service restart requested")
-            return
-        if pid is not None:
-            try:
-                terminate_pid(pid, force=False)
-            except (ProcessLookupError, PermissionError, OSError):
-                pid = None
-            if pid is not None:
-                exited = _wait_for_gateway_exit(timeout=drain_timeout, force_after=None)
-                if not exited:
-                    print(f"⚠ Gateway drain timed out after {drain_timeout:.0f}s — forcing launchd restart")
        subprocess.run(["launchctl", "kickstart", "-k", target], check=True, timeout=90)
        print("✓ Service restarted")
    except subprocess.CalledProcessError as e:
@@ -1569,7 +1442,7 @@ _PLATFORMS = [
            "   Or via API: curl -X POST https://your-server/_matrix/client/v3/login \\",
            "     -d '{\"type\":\"m.login.password\",\"user\":\"@bot:server\",\"password\":\"...\"}'",
            "4. Alternatively, provide user ID + password and Hermes will log in directly",
-            "5. For E2EE: set MATRIX_ENCRYPTION=true (requires pip install 'mautrix[encryption]')",
+            "5. For E2EE: set MATRIX_ENCRYPTION=true (requires pip install 'matrix-nio[e2e]')",
            "6. To find your user ID: it's @username:your-server (shown in Element profile)",
        ],
        "vars": [
@@ -1855,8 +1728,6 @@ def _runtime_health_lines() -> list[str]:
    lines: list[str] = []
    gateway_state = state.get("gateway_state")
    exit_reason = state.get("exit_reason")
-    active_agents = state.get("active_agents")
-    restart_requested = state.get("restart_requested")
    platforms = state.get("platforms", {}) or {}

    for platform, pdata in platforms.items():
@@ -1866,10 +1737,6 @@ def _runtime_health_lines() -> list[str]:

    if gateway_state == "startup_failed" and exit_reason:
        lines.append(f"⚠ Last startup issue: {exit_reason}")
-    elif gateway_state == "draining":
-        action = "restart" if restart_requested else "shutdown"
-        count = int(active_agents or 0)
-        lines.append(f"⏳ Gateway draining for {action} ({count} active agent(s))")
    elif gateway_state == "stopped" and exit_reason:
        lines.append(f"⚠ Last shutdown reason: {exit_reason}")

@@ -2377,8 +2244,7 @@ def gateway_setup():
            print()
            if supports_systemd_services() or is_macos():
                platform_name = "systemd" if supports_systemd_services() else "launchd"
-                wsl_note = " (note: services may not survive WSL restarts)" if is_wsl() else ""
-                if prompt_yes_no(f"  Install the gateway as a {platform_name} service?{wsl_note} (runs in background, starts on boot)", True):
+                if prompt_yes_no(f"  Install the gateway as a {platform_name} service? (runs in background, starts on boot)", True):
                    try:
                        installed_scope = None
                        did_install = False
@@ -2403,21 +2269,16 @@ def gateway_setup():
                    print_info("  You can install later: hermes gateway install")
                    if supports_systemd_services():
                        print_info("  Or as a boot-time service: sudo hermes gateway install --system")
-                    print_info("  Or run in foreground:  hermes gateway run")
-            elif is_wsl():
-                print_info("  WSL detected but systemd is not running.")
-                print_info("  Run in foreground: hermes gateway run")
-                print_info("  For persistence:   tmux new -s hermes 'hermes gateway run'")
-                print_info("  To enable systemd: add systemd=true to /etc/wsl.conf, then 'wsl --shutdown'")
+                    print_info("  Or run in foreground:  hermes gateway")
            else:
                if is_termux():
                    from hermes_constants import display_hermes_home as _dhh
                    print_info("  Termux does not use systemd/launchd services.")
-                    print_info("  Run in foreground: hermes gateway run")
-                    print_info(f"  Or start it manually in the background (best effort): nohup hermes gateway run >{_dhh()}/logs/gateway.log 2>&1 &")
+                    print_info("  Run in foreground: hermes gateway")
+                    print_info(f"  Or start it manually in the background (best effort): nohup hermes gateway >{_dhh()}/logs/gateway.log 2>&1 &")
                else:
                    print_info("  Service install not supported on this platform.")
-                    print_info("  Run in foreground: hermes gateway run")
+                    print_info("  Run in foreground: hermes gateway")
    else:
        print()
        print_info("No platforms configured. Run 'hermes gateway setup' when ready.")
@@ -2458,23 +2319,9 @@ def gateway_command(args):
            print("Run manually: hermes gateway")
            sys.exit(1)
        if supports_systemd_services():
-            if is_wsl():
-                print_warning("WSL detected — systemd services may not survive WSL restarts.")
-                print_info("  Consider running in foreground instead: hermes gateway run")
-                print_info("  Or use tmux/screen for persistence: tmux new -s hermes 'hermes gateway run'")
-                print()
            systemd_install(force=force, system=system, run_as_user=run_as_user)
        elif is_macos():
            launchd_install(force)
-        elif is_wsl():
-            print("WSL detected but systemd is not running.")
-            print("Either enable systemd (add systemd=true to /etc/wsl.conf and restart WSL)")
-            print("or run the gateway in foreground mode:")
-            print()
-            print("  hermes gateway run                              # direct foreground")
-            print("  tmux new -s hermes 'hermes gateway run'         # persistent via tmux")
-            print("  nohup hermes gateway run > ~/.hermes/logs/gateway.log 2>&1 &  # background")
-            sys.exit(1)
        else:
            print("Service installation not supported on this platform.")
            print("Run manually: hermes gateway run")
@@ -2507,16 +2354,6 @@ def gateway_command(args):
            systemd_start(system=system)
        elif is_macos():
            launchd_start()
-        elif is_wsl():
-            print("WSL detected but systemd is not available.")
-            print("Run the gateway in foreground mode instead:")
-            print()
-            print("  hermes gateway run                              # direct foreground")
-            print("  tmux new -s hermes 'hermes gateway run'         # persistent via tmux")
-            print("  nohup hermes gateway run > ~/.hermes/logs/gateway.log 2>&1 &  # background")
-            print()
-            print("To enable systemd: add systemd=true to /etc/wsl.conf and run 'wsl --shutdown' from PowerShell.")
-            sys.exit(1)
        else:
            print("Not supported on this platform.")
            sys.exit(1)
@@ -2651,10 +2488,6 @@ def gateway_command(args):
                if is_termux():
                    print("Termux note:")
                    print("  Android may stop background jobs when Termux is suspended")
-                elif is_wsl():
-                    print("WSL note:")
-                    print("  The gateway is running in foreground/manual mode (recommended for WSL).")
-                    print("  Use tmux or screen for persistence across terminal closes.")
                else:
                    print("To install as a service:")
                    print("  hermes gateway install")
@@ -2669,12 +2502,9 @@ def gateway_command(args):
                        print(f"  {line}")
                print()
                print("To start:")
-                print("  hermes gateway run      # Run in foreground")
+                print("  hermes gateway          # Run in foreground")
                if is_termux():
-                    print("  nohup hermes gateway run > ~/.hermes/logs/gateway.log 2>&1 &  # Best-effort background start")
-                elif is_wsl():
-                    print("  tmux new -s hermes 'hermes gateway run'         # persistent via tmux")
-                    print("  nohup hermes gateway run > ~/.hermes/logs/gateway.log 2>&1 &  # background")
+                    print("  nohup hermes gateway > ~/.hermes/logs/gateway.log 2>&1 &  # Best-effort background start")
                else:
                    print("  hermes gateway install  # Install as user service")
                    print("  sudo hermes gateway install --system  # Install as boot-time system service")
@@ -528,6 +528,56 @@ def _resolve_last_cli_session() -> Optional[str]:
    return None


+def _exec_in_container(container_info: dict, cli_args: list):
+    """Replace the current process with a command inside the managed container.
+
+    Uses os.execvp to hand off to docker/podman exec, preserving the TTY
+    so the interactive CLI works seamlessly inside the container.
+
+    Args:
+        container_info: dict with backend, container_name, hermes_bin
+        cli_args: the original CLI arguments (everything after 'hermes')
+    """
+    import shutil
+    import subprocess
+
+    backend = container_info["backend"]
+    container_name = container_info["container_name"]
+    hermes_bin = container_info["hermes_bin"]
+
+    # Find the container runtime on PATH
+    runtime = shutil.which(backend)
+    if not runtime:
+        print(f"Warning: {backend} not found on PATH, falling back to host CLI.",
+              file=sys.stderr)
+        return  # Fall through to normal CLI
+
+    # Check if the container is actually running
+    try:
+        result = subprocess.run(
+            [runtime, "inspect", "--format", "{{.State.Running}}", container_name],
+            capture_output=True, text=True, timeout=5
+        )
+        if result.returncode != 0 or result.stdout.strip().lower() != "true":
+            print(f"Warning: container '{container_name}' is not running, falling back to host CLI.",
+                  file=sys.stderr)
+            return
+    except (subprocess.TimeoutExpired, OSError):
+        return  # Fall through on any error
+
+    # Filter out --host flag from forwarded args (it's not meaningful inside)
+    forwarded_args = [a for a in cli_args if a != "--host"]
+
+    # Build the exec command
+    exec_cmd = [runtime, "exec", "-it", container_name, hermes_bin] + forwarded_args
+
+    print(f"Routing to container '{container_name}' via {backend}...",
+          file=sys.stderr)
+
+    # Replace the current process — this never returns on success
+    os.execvp(runtime, exec_cmd)
+
+
 def _resolve_session_by_name_or_id(name_or_id: str) -> Optional[str]:
    """Resolve a session name (title) or ID to a session ID.

@@ -556,6 +606,21 @@ def _resolve_session_by_name_or_id(name_or_id: str) -> Optional[str]:

 def cmd_chat(args):
    """Run interactive chat CLI."""
+    # ── Container-aware routing ──────────────────────────────────────────
+    # When NixOS container mode is active and we're on the host, exec into
+    # the managed container instead of running locally. --host bypasses this.
+    if not getattr(args, "host", False):
+        try:
+            from hermes_cli.config import get_container_exec_info
+            container_info = get_container_exec_info()
+            if container_info:
+                _exec_in_container(container_info, sys.argv[1:])
+                # os.execvp never returns on success. If _exec_in_container
+                # returns, it declined to route (runtime missing, container not
+                # running, or inspect failed), so fall through to the host CLI.
+        except Exception:
+            pass  # Fall through to normal CLI on any detection error
+
    # Resolve --continue into --resume with the latest CLI session or by name
    continue_val = getattr(args, "continue_last", None)
    if continue_val and not getattr(args, "resume", None):
@@ -1080,42 +1145,6 @@ def select_provider_and_model(args=None):
    elif selected_provider in ("gemini", "zai", "minimax", "minimax-cn", "kilocode", "opencode-zen", "opencode-go", "ai-gateway", "alibaba", "huggingface"):
        _model_flow_api_key_provider(config, selected_provider, current_model)

-    # ── Post-switch cleanup: clear stale OPENAI_BASE_URL ──────────────
-    # When the user switches to a named provider (anything except "custom"),
-    # a leftover OPENAI_BASE_URL in ~/.hermes/.env can poison auxiliary
-    # clients that use provider:auto. Clear it proactively.  (#5161)
-    if selected_provider not in ("custom", "cancel", "remove-custom") \
-            and not selected_provider.startswith("custom:"):
-        _clear_stale_openai_base_url()
-
-
-def _clear_stale_openai_base_url():
-    """Remove OPENAI_BASE_URL from ~/.hermes/.env if the active provider is not 'custom'.
-
-    After a provider switch, a leftover OPENAI_BASE_URL causes auxiliary
-    clients (compression, vision, delegation) with provider:auto to route
-    requests to the old custom endpoint instead of the newly selected
-    provider.  See issue #5161.
-    """
-    from hermes_cli.config import get_env_value, save_env_value, load_config
-
-    cfg = load_config()
-    model_cfg = cfg.get("model", {})
-    if isinstance(model_cfg, dict):
-        provider = (model_cfg.get("provider") or "").strip().lower()
-    else:
-        provider = ""
-
-    if provider == "custom" or not provider:
-        return  # custom provider legitimately uses OPENAI_BASE_URL
-
-    stale_url = get_env_value("OPENAI_BASE_URL")
-    if stale_url:
-        save_env_value("OPENAI_BASE_URL", "")
-        print(f"Cleared stale OPENAI_BASE_URL from .env (was: {stale_url[:40]}...)"
-              if len(stale_url) > 40
-              else f"Cleared stale OPENAI_BASE_URL from .env (was: {stale_url})")
-

 def _prompt_provider_choice(choices, *, default=0):
    """Show provider selection menu with curses arrow-key navigation.
@@ -4422,6 +4451,12 @@ For more help on a command:
        default=None,
        help="Session source tag for filtering (default: cli). Use 'tool' for third-party integrations that should not appear in user session lists."
    )
+    chat_parser.add_argument(
+        "--host",
+        action="store_true",
+        default=False,
+        help="Run on the host even when NixOS container mode is active (bypass container exec)"
+    )
    chat_parser.set_defaults(func=cmd_chat)

    # =========================================================================
@@ -4483,7 +4518,7 @@ For more help on a command:
    gateway_subparsers = gateway_parser.add_subparsers(dest="gateway_command")
    
    # gateway run (default)
-    gateway_run = gateway_subparsers.add_parser("run", help="Run gateway in foreground (recommended for WSL, Docker, Termux)")
+    gateway_run = gateway_subparsers.add_parser("run", help="Run gateway in foreground")
    gateway_run.add_argument("-v", "--verbose", action="count", default=0,
                             help="Increase stderr log verbosity (-v=INFO, -vv=DEBUG)")
    gateway_run.add_argument("-q", "--quiet", action="store_true",
@@ -4492,7 +4527,7 @@ For more help on a command:
                             help="Replace any existing gateway instance (useful for systemd)")
    
    # gateway start
-    gateway_start = gateway_subparsers.add_parser("start", help="Start the installed systemd/launchd background service")
+    gateway_start = gateway_subparsers.add_parser("start", help="Start gateway service")
    gateway_start.add_argument("--system", action="store_true", help="Target the Linux system-level gateway service")
    
    # gateway stop
@@ -4510,7 +4545,7 @@ For more help on a command:
    gateway_status.add_argument("--system", action="store_true", help="Target the Linux system-level gateway service")
    
    # gateway install
-    gateway_install = gateway_subparsers.add_parser("install", help="Install gateway as a systemd/launchd background service")
+    gateway_install = gateway_subparsers.add_parser("install", help="Install gateway as service")
    gateway_install.add_argument("--force", action="store_true", help="Force reinstall")
    gateway_install.add_argument("--system", action="store_true", help="Install as a Linux system-level service (starts at boot)")
    gateway_install.add_argument("--run-as-user", dest="run_as_user", help="User account the Linux system service should run as")
@@ -87,8 +87,6 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
        "openai/gpt-5.4-nano",
    ],
    "openai-codex": [
-        "gpt-5.4",
-        "gpt-5.4-mini",
        "gpt-5.3-codex",
        "gpt-5.2-codex",
        "gpt-5.1-codex-mini",
@@ -159,16 +157,22 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
        "kimi-k2-0905-preview",
    ],
    "minimax": [
-        "MiniMax-M2.7",
+        "MiniMax-M1",
+        "MiniMax-M1-40k",
+        "MiniMax-M1-80k",
+        "MiniMax-M1-128k",
+        "MiniMax-M1-256k",
        "MiniMax-M2.5",
-        "MiniMax-M2.1",
-        "MiniMax-M2",
+        "MiniMax-M2.7",
    ],
    "minimax-cn": [
-        "MiniMax-M2.7",
+        "MiniMax-M1",
+        "MiniMax-M1-40k",
+        "MiniMax-M1-80k",
+        "MiniMax-M1-128k",
+        "MiniMax-M1-256k",
        "MiniMax-M2.5",
-        "MiniMax-M2.1",
-        "MiniMax-M2",
+        "MiniMax-M2.7",
    ],
    "anthropic": [
        "claude-opus-4-6",
@@ -143,7 +143,6 @@ def _tts_label(current_provider: str) -> str:
        "openai": "OpenAI TTS",
        "elevenlabs": "ElevenLabs",
        "edge": "Edge TTS",
-        "mistral": "Mistral Voxtral TTS",
        "neutts": "NeuTTS",
    }
    return mapping.get(current_provider or "edge", current_provider or "Edge TTS")
@@ -310,7 +309,6 @@ def get_nous_subscription_features(
        tts_current_provider in {"edge", "neutts"}
        or (tts_current_provider == "openai" and (managed_tts_available or direct_openai_tts))
        or (tts_current_provider == "elevenlabs" and direct_elevenlabs)
-        or (tts_current_provider == "mistral" and bool(get_env_value("MISTRAL_API_KEY")))
    )
    tts_active = bool(tts_tool_enabled and tts_available)

@@ -201,7 +201,8 @@ class PluginContext:

        The *setup_fn* receives an argparse subparser and should add any
        arguments/sub-subparsers.  If *handler_fn* is provided it is set
-        as the default dispatch function via ``set_defaults(func=...)``."""
+        as the default dispatch function via ``set_defaults(func=...)``.
+        """
        self._manager._cli_commands[name] = {
            "name": name,
            "help": help,
@@ -212,38 +213,6 @@ class PluginContext:
        }
        logger.debug("Plugin %s registered CLI command: %s", self.manifest.name, name)

-    # -- context engine registration -----------------------------------------
-
-    def register_context_engine(self, engine) -> None:
-        """Register a context engine to replace the built-in ContextCompressor.
-
-        Only one context engine plugin is allowed. If a second plugin tries
-        to register one, it is rejected with a warning.
-
-        The engine must be an instance of ``agent.context_engine.ContextEngine``.
-        """
-        if self._manager._context_engine is not None:
-            logger.warning(
-                "Plugin '%s' tried to register a context engine, but one is "
-                "already registered. Only one context engine plugin is allowed.",
-                self.manifest.name,
-            )
-            return
-        # Defer the import to avoid circular deps at module level
-        from agent.context_engine import ContextEngine
-        if not isinstance(engine, ContextEngine):
-            logger.warning(
-                "Plugin '%s' tried to register a context engine that does not "
-                "inherit from ContextEngine. Ignoring.",
-                self.manifest.name,
-            )
-            return
-        self._manager._context_engine = engine
-        logger.info(
-            "Plugin '%s' registered context engine: %s",
-            self.manifest.name, engine.name,
-        )
-
    # -- hook registration --------------------------------------------------

    def register_hook(self, hook_name: str, callback: Callable) -> None:
@@ -276,7 +245,6 @@ class PluginManager:
        self._hooks: Dict[str, List[Callable]] = {}
        self._plugin_tool_names: Set[str] = set()
        self._cli_commands: Dict[str, dict] = {}
-        self._context_engine = None  # Set by a plugin via register_context_engine()
        self._discovered: bool = False
        self._cli_ref = None  # Set by CLI after plugin discovery

@@ -598,11 +566,6 @@ def get_plugin_cli_commands() -> Dict[str, dict]:
    return dict(get_plugin_manager()._cli_commands)


-def get_plugin_context_engine():
-    """Return the plugin-registered context engine, or None."""
-    return get_plugin_manager()._context_engine
-
-
 def get_plugin_toolsets() -> List[tuple]:
    """Return plugin toolsets as ``(key, label, description)`` tuples.

@@ -531,7 +531,7 @@ def cmd_disable(name: str) -> None:

    disabled.add(name)
    _save_disabled_set(disabled)
-    console.print(f"[yellow]\u2298[/yellow] Plugin [bold]{name}[/bold] disabled. Takes effect on next session.")
+    console.print(f"[yellow]⊘[/yellow] Plugin [bold]{name}[/bold] disabled. Takes effect on next session.")


 def cmd_list() -> None:
@@ -594,152 +594,8 @@ def cmd_list() -> None:
    console.print("[dim]Enable/disable:[/dim] hermes plugins enable/disable <name>")


-# ---------------------------------------------------------------------------
-# Provider plugin discovery helpers
-# ---------------------------------------------------------------------------
-
-
-def _discover_memory_providers() -> list[tuple[str, str]]:
-    """Return [(name, description), ...] for available memory providers."""
-    try:
-        from plugins.memory import discover_memory_providers
-        return [(name, desc) for name, desc, _avail in discover_memory_providers()]
-    except Exception:
-        return []
-
-
-def _discover_context_engines() -> list[tuple[str, str]]:
-    """Return [(name, description), ...] for available context engines."""
-    try:
-        from plugins.context_engine import discover_context_engines
-        return [(name, desc) for name, desc, _avail in discover_context_engines()]
-    except Exception:
-        return []
-
-
-def _get_current_memory_provider() -> str:
-    """Return the current memory.provider from config (empty = built-in)."""
-    try:
-        from hermes_cli.config import load_config
-        config = load_config()
-        return config.get("memory", {}).get("provider", "") or ""
-    except Exception:
-        return ""
-
-
-def _get_current_context_engine() -> str:
-    """Return the current context.engine from config."""
-    try:
-        from hermes_cli.config import load_config
-        config = load_config()
-        return config.get("context", {}).get("engine", "compressor") or "compressor"
-    except Exception:
-        return "compressor"
-
-
-def _save_memory_provider(name: str) -> None:
-    """Persist memory.provider to config.yaml."""
-    from hermes_cli.config import load_config, save_config
-    config = load_config()
-    if "memory" not in config:
-        config["memory"] = {}
-    config["memory"]["provider"] = name
-    save_config(config)
-
-
-def _save_context_engine(name: str) -> None:
-    """Persist context.engine to config.yaml."""
-    from hermes_cli.config import load_config, save_config
-    config = load_config()
-    if "context" not in config:
-        config["context"] = {}
-    config["context"]["engine"] = name
-    save_config(config)
-
-
-def _configure_memory_provider() -> bool:
-    """Launch a radio picker for memory providers. Returns True if changed."""
-    from hermes_cli.curses_ui import curses_radiolist
-
-    current = _get_current_memory_provider()
-    providers = _discover_memory_providers()
-
-    # Build items: "built-in" first, then discovered providers
-    items = ["built-in (default)"]
-    names = [""]  # empty string = built-in
-    selected = 0
-
-    for name, desc in providers:
-        names.append(name)
-        label = f"{name} \u2014 {desc}" if desc else name
-        items.append(label)
-        if name == current:
-            selected = len(items) - 1
-
-    # If current provider isn't in discovered list, add it
-    if current and current not in names:
-        names.append(current)
-        items.append(f"{current} (not found)")
-        selected = len(items) - 1
-
-    choice = curses_radiolist(
-        title="Memory Provider (select one)",
-        items=items,
-        selected=selected,
-    )
-
-    new_provider = names[choice]
-    if new_provider != current:
-        _save_memory_provider(new_provider)
-        return True
-    return False
-
-
-def _configure_context_engine() -> bool:
-    """Launch a radio picker for context engines. Returns True if changed."""
-    from hermes_cli.curses_ui import curses_radiolist
-
-    current = _get_current_context_engine()
-    engines = _discover_context_engines()
-
-    # Build items: "compressor" first (built-in), then discovered engines
-    items = ["compressor (default)"]
-    names = ["compressor"]
-    selected = 0
-
-    for name, desc in engines:
-        names.append(name)
-        label = f"{name} \u2014 {desc}" if desc else name
-        items.append(label)
-        if name == current:
-            selected = len(items) - 1
-
-    # If current engine isn't in discovered list and isn't compressor, add it
-    if current != "compressor" and current not in names:
-        names.append(current)
-        items.append(f"{current} (not found)")
-        selected = len(items) - 1
-
-    choice = curses_radiolist(
-        title="Context Engine (select one)",
-        items=items,
-        selected=selected,
-    )
-
-    new_engine = names[choice]
-    if new_engine != current:
-        _save_context_engine(new_engine)
-        return True
-    return False
-
-
-# ---------------------------------------------------------------------------
-# Composite plugins UI
-# ---------------------------------------------------------------------------
-
-
 def cmd_toggle() -> None:
-    """Interactive composite UI — general plugins + provider plugin categories."""
+    """Interactive curses checklist to enable/disable installed plugins."""
    from rich.console import Console

    try:
@@ -750,13 +606,18 @@ def cmd_toggle() -> None:
    console = Console()
    plugins_dir = _plugins_dir()

-    # -- General plugins discovery --
    dirs = sorted(d for d in plugins_dir.iterdir() if d.is_dir())
+    if not dirs:
+        console.print("[dim]No plugins installed.[/dim]")
+        console.print("[dim]Install with:[/dim] hermes plugins install owner/repo")
+        return
+
    disabled = _get_disabled_set()

-    plugin_names = []
-    plugin_labels = []
-    plugin_selected = set()
+    # Build items list: "name — description" for display
+    names = []
+    labels = []
+    selected = set()

    for i, d in enumerate(dirs):
        manifest_file = d / "plugin.yaml"
@@ -772,335 +633,36 @@ def cmd_toggle() -> None:
            except Exception:
                pass

-        plugin_names.append(name)
-        label = f"{name} \u2014 {description}" if description else name
-        plugin_labels.append(label)
+        names.append(name)
+        label = f"{name} — {description}" if description else name
+        labels.append(label)

        if name not in disabled and d.name not in disabled:
-            plugin_selected.add(i)
+            selected.add(i)

-    # -- Provider categories --
-    current_memory = _get_current_memory_provider() or "built-in"
-    current_context = _get_current_context_engine()
-    categories = [
-        ("Memory Provider", current_memory, _configure_memory_provider),
-        ("Context Engine", current_context, _configure_context_engine),
-    ]
+    from hermes_cli.curses_ui import curses_checklist

-    has_plugins = bool(plugin_names)
-    has_categories = bool(categories)
+    result = curses_checklist(
+        title="Plugins — toggle enabled/disabled",
+        items=labels,
+        selected=selected,
+    )

-    if not has_plugins and not has_categories:
-        console.print("[dim]No plugins installed and no provider categories available.[/dim]")
-        console.print("[dim]Install with:[/dim] hermes plugins install owner/repo")
-        return
-
-    # Non-TTY fallback
-    if not sys.stdin.isatty():
-        console.print("[dim]Interactive mode requires a terminal.[/dim]")
-        return
-
-    # Launch the composite curses UI
-    try:
-        import curses
-        _run_composite_ui(curses, plugin_names, plugin_labels, plugin_selected,
-                          disabled, categories, console)
-    except ImportError:
-        _run_composite_fallback(plugin_names, plugin_labels, plugin_selected,
-                                disabled, categories, console)
-
-
-def _run_composite_ui(curses, plugin_names, plugin_labels, plugin_selected,
-                      disabled, categories, console):
-    """Custom curses screen with checkboxes + category action rows."""
-    from hermes_cli.curses_ui import flush_stdin
-
-    chosen = set(plugin_selected)
-    n_plugins = len(plugin_names)
-    # Total rows: plugins + separator + categories
-    # separator is not navigable
-    n_categories = len(categories)
-    total_items = n_plugins + n_categories  # navigable items
-
-    result_holder = {"plugins_changed": False, "providers_changed": False}
-
-    def _draw(stdscr):
-        curses.curs_set(0)
-        if curses.has_colors():
-            curses.start_color()
-            curses.use_default_colors()
-            curses.init_pair(1, curses.COLOR_GREEN, -1)
-            curses.init_pair(2, curses.COLOR_YELLOW, -1)
-            curses.init_pair(3, curses.COLOR_CYAN, -1)
-            curses.init_pair(4, 8, -1)  # dim gray
-        cursor = 0
-        scroll_offset = 0
-
-        while True:
-            stdscr.clear()
-            max_y, max_x = stdscr.getmaxyx()
-
-            # Header
-            try:
-                hattr = curses.A_BOLD
-                if curses.has_colors():
-                    hattr |= curses.color_pair(2)
-                stdscr.addnstr(0, 0, "Plugins", max_x - 1, hattr)
-                stdscr.addnstr(
-                    1, 0,
-                    "  \u2191\u2193 navigate  SPACE toggle  ENTER configure/confirm  ESC done",
-                    max_x - 1, curses.A_DIM,
-                )
-            except curses.error:
-                pass
-
-            # Build display rows
-            # Row layout:
-            #   [plugins section header] (not navigable, skipped in scroll math)
-            #   plugin checkboxes (navigable, indices 0..n_plugins-1)
-            #   [separator] (not navigable)
-            #   [categories section header] (not navigable)
-            #   category action rows (navigable, indices n_plugins..total_items-1)
-
-            visible_rows = max_y - 4
-            if cursor < scroll_offset:
-                scroll_offset = cursor
-            elif cursor >= scroll_offset + visible_rows:
-                scroll_offset = cursor - visible_rows + 1
-
-            y = 3  # start drawing after header
-
-            # Determine which items are visible based on scroll
-            # We need to map logical cursor positions to screen rows
-            # accounting for non-navigable separator/headers
-
-            draw_row = 0  # tracks navigable item index
-
-            # --- General Plugins section ---
-            if n_plugins > 0:
-                # Section header
-                if y < max_y - 1:
-                    try:
-                        sattr = curses.A_BOLD
-                        if curses.has_colors():
-                            sattr |= curses.color_pair(2)
-                        stdscr.addnstr(y, 0, "  General Plugins", max_x - 1, sattr)
-                    except curses.error:
-                        pass
-                    y += 1
-
-                for i in range(n_plugins):
-                    if y >= max_y - 1:
-                        break
-                    check = "\u2713" if i in chosen else " "
-                    arrow = "\u2192" if i == cursor else " "
-                    line = f" {arrow} [{check}] {plugin_labels[i]}"
-                    attr = curses.A_NORMAL
-                    if i == cursor:
-                        attr = curses.A_BOLD
-                        if curses.has_colors():
-                            attr |= curses.color_pair(1)
-                    try:
-                        stdscr.addnstr(y, 0, line, max_x - 1, attr)
-                    except curses.error:
-                        pass
-                    y += 1
-
-            # --- Separator ---
-            if y < max_y - 1:
-                y += 1  # blank line
-
-            # --- Provider Plugins section ---
-            if n_categories > 0 and y < max_y - 1:
-                try:
-                    sattr = curses.A_BOLD
-                    if curses.has_colors():
-                        sattr |= curses.color_pair(2)
-                    stdscr.addnstr(y, 0, "  Provider Plugins", max_x - 1, sattr)
-                except curses.error:
-                    pass
-                y += 1
-
-                for ci, (cat_name, cat_current, _cat_fn) in enumerate(categories):
-                    if y >= max_y - 1:
-                        break
-                    cat_idx = n_plugins + ci
-                    arrow = "\u2192" if cat_idx == cursor else " "
-                    line = f" {arrow}   {cat_name:<24} \u25b8 {cat_current}"
-                    attr = curses.A_NORMAL
-                    if cat_idx == cursor:
-                        attr = curses.A_BOLD
-                        if curses.has_colors():
-                            attr |= curses.color_pair(3)
-                    try:
-                        stdscr.addnstr(y, 0, line, max_x - 1, attr)
-                    except curses.error:
-                        pass
-                    y += 1
-
-            stdscr.refresh()
-            key = stdscr.getch()
-
-            if key in (curses.KEY_UP, ord("k")):
-                if total_items > 0:
-                    cursor = (cursor - 1) % total_items
-            elif key in (curses.KEY_DOWN, ord("j")):
-                if total_items > 0:
-                    cursor = (cursor + 1) % total_items
-            elif key == ord(" "):
-                if cursor < n_plugins:
-                    # Toggle general plugin
-                    chosen.symmetric_difference_update({cursor})
-                else:
-                    # Provider category — launch sub-screen
-                    ci = cursor - n_plugins
-                    if 0 <= ci < n_categories:
-                        curses.endwin()
-                        _cat_name, _cat_cur, cat_fn = categories[ci]
-                        changed = cat_fn()
-                        if changed:
-                            result_holder["providers_changed"] = True
-                            # Refresh current values
-                            categories[ci] = (
-                                _cat_name,
-                                _get_current_memory_provider() or "built-in" if ci == 0
-                                else _get_current_context_engine(),
-                                cat_fn,
-                            )
-                        # Re-enter curses
-                        stdscr = curses.initscr()
-                        curses.noecho()
-                        curses.cbreak()
-                        stdscr.keypad(True)
-                        if curses.has_colors():
-                            curses.start_color()
-                            curses.use_default_colors()
-                            curses.init_pair(1, curses.COLOR_GREEN, -1)
-                            curses.init_pair(2, curses.COLOR_YELLOW, -1)
-                            curses.init_pair(3, curses.COLOR_CYAN, -1)
-                            curses.init_pair(4, 8, -1)
-                        curses.curs_set(0)
-            elif key in (curses.KEY_ENTER, 10, 13):
-                if cursor < n_plugins:
-                    # ENTER on a plugin checkbox — confirm and exit
-                    result_holder["plugins_changed"] = True
-                    return
-                else:
-                    # ENTER on a category — same as SPACE, launch sub-screen
-                    ci = cursor - n_plugins
-                    if 0 <= ci < n_categories:
-                        curses.endwin()
-                        _cat_name, _cat_cur, cat_fn = categories[ci]
-                        changed = cat_fn()
-                        if changed:
-                            result_holder["providers_changed"] = True
-                            categories[ci] = (
-                                _cat_name,
-                                _get_current_memory_provider() or "built-in" if ci == 0
-                                else _get_current_context_engine(),
-                                cat_fn,
-                            )
-                        stdscr = curses.initscr()
-                        curses.noecho()
-                        curses.cbreak()
-                        stdscr.keypad(True)
-                        if curses.has_colors():
-                            curses.start_color()
-                            curses.use_default_colors()
-                            curses.init_pair(1, curses.COLOR_GREEN, -1)
-                            curses.init_pair(2, curses.COLOR_YELLOW, -1)
-                            curses.init_pair(3, curses.COLOR_CYAN, -1)
-                            curses.init_pair(4, 8, -1)
-                        curses.curs_set(0)
-            elif key in (27, ord("q")):
-                # Save plugin changes on exit
-                result_holder["plugins_changed"] = True
-                return
-
-    curses.wrapper(_draw)
-    flush_stdin()
-
-    # Persist general plugin changes
+    # Compute new disabled set from deselected items
    new_disabled = set()
-    for i, name in enumerate(plugin_names):
-        if i not in chosen:
+    for i, name in enumerate(names):
+        if i not in result:
            new_disabled.add(name)

    if new_disabled != disabled:
        _save_disabled_set(new_disabled)
-        enabled_count = len(plugin_names) - len(new_disabled)
+        enabled_count = len(names) - len(new_disabled)
        console.print(
-            f"\n[green]\u2713[/green] General plugins: {enabled_count} enabled, "
-            f"{len(new_disabled)} disabled."
+            f"\n[green]✓[/green] {enabled_count} enabled, {len(new_disabled)} disabled. "
+            f"Takes effect on next session."
        )
-    elif n_plugins > 0:
-        console.print("\n[dim]General plugins unchanged.[/dim]")
-
-    if result_holder["providers_changed"]:
-        new_memory = _get_current_memory_provider() or "built-in"
-        new_context = _get_current_context_engine()
-        console.print(
-            f"[green]\u2713[/green] Memory provider: [bold]{new_memory}[/bold]  "
-            f"Context engine: [bold]{new_context}[/bold]"
-        )
-
-    if n_plugins > 0 or result_holder["providers_changed"]:
-        console.print("[dim]Changes take effect on next session.[/dim]")
-    console.print()
-
-
-def _run_composite_fallback(plugin_names, plugin_labels, plugin_selected,
-                            disabled, categories, console):
-    """Text-based fallback for the composite plugins UI."""
-    from hermes_cli.colors import Colors, color
-
-    print(color("\n  Plugins", Colors.YELLOW))
-
-    # General plugins
-    if plugin_names:
-        chosen = set(plugin_selected)
-        print(color("\n  General Plugins", Colors.YELLOW))
-        print(color("  Toggle by number, Enter to confirm.\n", Colors.DIM))
-
-        while True:
-            for i, label in enumerate(plugin_labels):
-                marker = color("[\u2713]", Colors.GREEN) if i in chosen else "[ ]"
-                print(f"  {marker} {i + 1:>2}. {label}")
-            print()
-            try:
-                val = input(color("  Toggle # (or Enter to confirm): ", Colors.DIM)).strip()
-                if not val:
-                    break
-                idx = int(val) - 1
-                if 0 <= idx < len(plugin_names):
-                    chosen.symmetric_difference_update({idx})
-            except (ValueError, KeyboardInterrupt, EOFError):
-                return
-            print()
-
-        new_disabled = set()
-        for i, name in enumerate(plugin_names):
-            if i not in chosen:
-                new_disabled.add(name)
-        if new_disabled != disabled:
-            _save_disabled_set(new_disabled)
-
-    # Provider categories
-    if categories:
-        print(color("\n  Provider Plugins", Colors.YELLOW))
-        for ci, (cat_name, cat_current, cat_fn) in enumerate(categories):
-            print(f"  {ci + 1}. {cat_name} [{cat_current}]")
-        print()
-        try:
-            val = input(color("  Configure # (or Enter to skip): ", Colors.DIM)).strip()
-            if val:
-                ci = int(val) - 1
-                if 0 <= ci < len(categories):
-                    categories[ci][2]()  # call the configure function
-        except (ValueError, KeyboardInterrupt, EOFError):
-            pass
-
-    print()
+    else:
+        console.print("\n[dim]No changes.[/dim]")


 def plugins_command(args) -> None:
@@ -88,11 +88,11 @@ HERMES_OVERLAYS: Dict[str, HermesOverlay] = {
        base_url_env_var="KIMI_BASE_URL",
    ),
    "minimax": HermesOverlay(
-        transport="anthropic_messages",
+        transport="openai_chat",
        base_url_env_var="MINIMAX_BASE_URL",
    ),
    "minimax-cn": HermesOverlay(
-        transport="anthropic_messages",
+        transport="openai_chat",
        base_url_env_var="MINIMAX_CN_BASE_URL",
    ),
    "deepseek": HermesOverlay(
@@ -106,8 +106,8 @@ _DEFAULT_PROVIDER_MODELS = {
    ],
    "zai": ["glm-5", "glm-4.7", "glm-4.5", "glm-4.5-flash"],
    "kimi-coding": ["kimi-k2.5", "kimi-k2-thinking", "kimi-k2-turbo-preview"],
-    "minimax": ["MiniMax-M2.7", "MiniMax-M2.5", "MiniMax-M2.1", "MiniMax-M2"],
-    "minimax-cn": ["MiniMax-M2.7", "MiniMax-M2.5", "MiniMax-M2.1", "MiniMax-M2"],
+    "minimax": ["MiniMax-M1", "MiniMax-M1-40k", "MiniMax-M1-80k", "MiniMax-M1-128k", "MiniMax-M1-256k", "MiniMax-M2.5", "MiniMax-M2.7"],
+    "minimax-cn": ["MiniMax-M1", "MiniMax-M1-40k", "MiniMax-M1-80k", "MiniMax-M1-128k", "MiniMax-M1-256k", "MiniMax-M2.5", "MiniMax-M2.7"],
    "ai-gateway": ["anthropic/claude-opus-4.6", "anthropic/claude-sonnet-4.6", "openai/gpt-5", "google/gemini-3-flash"],
    "kilocode": ["anthropic/claude-opus-4.6", "anthropic/claude-sonnet-4.6", "openai/gpt-5.4", "google/gemini-3-pro-preview", "google/gemini-3-flash-preview"],
    "opencode-zen": ["gpt-5.4", "gpt-5.3-codex", "claude-sonnet-4-6", "gemini-3-flash", "glm-5", "kimi-k2.5", "minimax-m2.7"],
@@ -557,8 +557,6 @@ def _print_setup_summary(config: dict, hermes_home):
        tool_status.append(("Text-to-Speech (OpenAI)", True, None))
    elif tts_provider == "minimax" and get_env_value("MINIMAX_API_KEY"):
        tool_status.append(("Text-to-Speech (MiniMax)", True, None))
-    elif tts_provider == "mistral" and get_env_value("MISTRAL_API_KEY"):
-        tool_status.append(("Text-to-Speech (Mistral Voxtral)", True, None))
    elif tts_provider == "neutts":
        try:
            import importlib.util
@@ -1046,7 +1044,6 @@ def _setup_tts_provider(config: dict):
        "elevenlabs": "ElevenLabs",
        "openai": "OpenAI TTS",
        "minimax": "MiniMax TTS",
-        "mistral": "Mistral Voxtral TTS",
        "neutts": "NeuTTS",
    }
    current_label = provider_labels.get(current_provider, current_provider)
@@ -1067,11 +1064,10 @@ def _setup_tts_provider(config: dict):
            "ElevenLabs (premium quality, needs API key)",
            "OpenAI TTS (good quality, needs API key)",
            "MiniMax TTS (high quality with voice cloning, needs API key)",
-            "Mistral Voxtral TTS (multilingual, native Opus, needs API key)",
            "NeuTTS (local on-device, free, ~300MB model download)",
        ]
    )
-    providers.extend(["edge", "elevenlabs", "openai", "minimax", "mistral", "neutts"])
+    providers.extend(["edge", "elevenlabs", "openai", "minimax", "neutts"])
    choices.append(f"Keep current ({current_label})")
    keep_current_idx = len(choices) - 1
    idx = prompt_choice("Select TTS provider:", choices, keep_current_idx)
@@ -1149,18 +1145,6 @@ def _setup_tts_provider(config: dict):
                print_warning("No API key provided. Falling back to Edge TTS.")
                selected = "edge"

-    elif selected == "mistral":
-        existing = get_env_value("MISTRAL_API_KEY")
-        if not existing:
-            print()
-            api_key = prompt("Mistral API key for TTS", password=True)
-            if api_key:
-                save_env_value("MISTRAL_API_KEY", api_key)
-                print_success("Mistral TTS API key saved")
-            else:
-                print_warning("No API key provided. Falling back to Edge TTS.")
-                selected = "edge"
-
    # Save the selection
    if "tts" not in config:
        config["tts"] = {}
@@ -1941,9 +1925,9 @@ def _setup_matrix():
            save_env_value("MATRIX_ENCRYPTION", "true")
            print_success("E2EE enabled")

-        matrix_pkg = "mautrix[encryption]" if want_e2ee else "mautrix"
+        matrix_pkg = "matrix-nio[e2e]" if want_e2ee else "matrix-nio"
        try:
-            __import__("mautrix")
+            __import__("nio")
        except ImportError:
            print_info(f"Installing {matrix_pkg}...")
            import subprocess
@@ -2938,33 +2922,19 @@ def run_setup_wizard(args):
    _offer_launch_chat()


-def _resolve_hermes_chat_argv() -> Optional[list[str]]:
-    """Resolve argv for launching ``hermes chat`` in a fresh process."""
-    hermes_bin = shutil.which("hermes")
-    if hermes_bin:
-        return [hermes_bin, "chat"]
-
-    try:
-        if importlib.util.find_spec("hermes_cli") is not None:
-            return [sys.executable, "-m", "hermes_cli.main", "chat"]
-    except Exception:
-        pass
-
-    return None
-
-
 def _offer_launch_chat():
    """Prompt the user to jump straight into chat after setup."""
    print()
-    if not prompt_yes_no("Launch hermes chat now?", True):
-        return
-
-    chat_argv = _resolve_hermes_chat_argv()
-    if not chat_argv:
-        print_info("Could not relaunch Hermes automatically. Run 'hermes chat' manually.")
-        return
-
-    os.execvp(chat_argv[0], chat_argv)
+    if prompt_yes_no("Launch hermes chat now?", True):
+        from hermes_cli.main import cmd_chat
+        from types import SimpleNamespace
+        cmd_chat(SimpleNamespace(
+            query=None, resume=None, continue_last=None, model=None,
+            provider=None, effort=None, skin=None, oneshot=False,
+            quiet=False, verbose=False, toolsets=None, skills=None,
+            yolo=False, source=None, worktree=False, checkpoints=False,
+            pass_session_id=False, max_turns=None,
+        ))


 def _run_first_time_quick_setup(config: dict, hermes_home, is_existing: bool):
@@ -181,14 +181,6 @@ TOOL_CATEGORIES = {
                ],
                "tts_provider": "elevenlabs",
            },
-            {
-                "name": "Mistral (Voxtral TTS)",
-                "tag": "Multilingual, native Opus, needs MISTRAL_API_KEY",
-                "env_vars": [
-                    {"key": "MISTRAL_API_KEY", "prompt": "Mistral API key", "url": "https://console.mistral.ai/"},
-                ],
-                "tts_provider": "mistral",
-            },
        ],
    },
    "web": {
@@ -509,10 +501,6 @@ def _get_platform_tools(
        default_ts = PLATFORMS[platform]["default_toolset"]
        toolset_names = [default_ts]

-    # YAML may parse bare numeric names (e.g. ``12306:``) as int.
-    # Normalise to str so downstream sorted() never mixes types.
-    toolset_names = [str(ts) for ts in toolset_names]
-
    configurable_keys = {ts_key for ts_key, _, _ in CONFIGURABLE_TOOLSETS}

    # If the saved list contains any configurable keys directly, the user
@@ -571,7 +559,7 @@ def _get_platform_tools(
    # Special sentinel: "no_mcp" in the toolset list disables all MCP servers.
    mcp_servers = config.get("mcp_servers") or {}
    enabled_mcp_servers = {
-        str(name)
+        name
        for name, server_cfg in mcp_servers.items()
        if isinstance(server_cfg, dict)
        and _parse_enabled_flag(server_cfg.get("enabled", True), default=True)
@@ -168,27 +168,6 @@ def is_termux() -> bool:
    return bool(os.getenv("TERMUX_VERSION") or "com.termux/files/usr" in prefix)


-_wsl_detected: bool | None = None
-
-
-def is_wsl() -> bool:
-    """Return True when running inside WSL (Windows Subsystem for Linux).
-
-    Checks ``/proc/version`` for the ``microsoft`` marker that both WSL1
-    and WSL2 inject.  Result is cached for the process lifetime.
-    Import-safe — no heavy deps.
-    """
-    global _wsl_detected
-    if _wsl_detected is not None:
-        return _wsl_detected
-    try:
-        with open("/proc/version", "r") as f:
-            _wsl_detected = "microsoft" in f.read().lower()
-    except Exception:
-        _wsl_detected = False
-    return _wsl_detected
-
-
 OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"
 OPENROUTER_MODELS_URL = f"{OPENROUTER_BASE_URL}/models"

@@ -611,6 +611,22 @@
          chown ${cfg.user}:${cfg.group} ${cfg.stateDir}/.hermes/.managed
          chmod 0644 ${cfg.stateDir}/.hermes/.managed

+          # Container mode metadata — tells the host CLI to exec into the
+          # container instead of running locally. Removed when container mode
+          # is disabled so the host CLI falls back to native execution.
+          ${if cfg.container.enable then ''
+            cat > ${cfg.stateDir}/.hermes/.container-mode <<'HERMES_CONTAINER_MODE_EOF'
+# Written by NixOS activation script. Do not edit manually.
+backend=${cfg.container.backend}
+container_name=${containerName}
+hermes_bin=${containerDataDir}/current-package/bin/hermes
+HERMES_CONTAINER_MODE_EOF
+            chown ${cfg.user}:${cfg.group} ${cfg.stateDir}/.hermes/.container-mode
+            chmod 0644 ${cfg.stateDir}/.hermes/.container-mode
+          '' else ''
+            rm -f ${cfg.stateDir}/.hermes/.container-mode
+          ''}
+
          # Seed auth file if provided
          ${lib.optionalString (cfg.authFile != null) ''
            ${if cfg.authFileForceOverwrite then ''
@@ -1,219 +0,0 @@
-"""Context engine plugin discovery.
-
-Scans ``plugins/context_engine/<name>/`` directories for context engine
-plugins.  Each subdirectory must contain ``__init__.py`` with a class
-implementing the ContextEngine ABC.
-
-Context engines are separate from the general plugin system — they live
-in the repo and are always available without user installation.  Only ONE
-can be active at a time, selected via ``context.engine`` in config.yaml.
-The default engine is ``"compressor"`` (the built-in ContextCompressor).
-
-Usage:
-    from plugins.context_engine import discover_context_engines, load_context_engine
-
-    available = discover_context_engines()   # [(name, desc, available), ...]
-    engine = load_context_engine("lcm")      # ContextEngine instance
-"""
-
-from __future__ import annotations
-
-import importlib
-import importlib.util
-import logging
-import sys
-from pathlib import Path
-from typing import List, Optional, Tuple
-
-logger = logging.getLogger(__name__)
-
-_CONTEXT_ENGINE_PLUGINS_DIR = Path(__file__).parent
-
-
-def discover_context_engines() -> List[Tuple[str, str, bool]]:
-    """Scan plugins/context_engine/ for available engines.
-
-    Returns list of (name, description, is_available) tuples.
-    Does NOT import the engines — just reads plugin.yaml for metadata
-    and does a lightweight availability check.
-    """
-    results = []
-    if not _CONTEXT_ENGINE_PLUGINS_DIR.is_dir():
-        return results
-
-    for child in sorted(_CONTEXT_ENGINE_PLUGINS_DIR.iterdir()):
-        if not child.is_dir() or child.name.startswith(("_", ".")):
-            continue
-        init_file = child / "__init__.py"
-        if not init_file.exists():
-            continue
-
-        # Read description from plugin.yaml if available
-        desc = ""
-        yaml_file = child / "plugin.yaml"
-        if yaml_file.exists():
-            try:
-                import yaml
-                with open(yaml_file) as f:
-                    meta = yaml.safe_load(f) or {}
-                desc = meta.get("description", "")
-            except Exception:
-                pass
-
-        # Quick availability check — try loading and calling is_available()
-        available = True
-        try:
-            engine = _load_engine_from_dir(child)
-            if engine is None:
-                available = False
-            elif hasattr(engine, "is_available"):
-                available = engine.is_available()
-        except Exception:
-            available = False
-
-        results.append((child.name, desc, available))
-
-    return results
-
-
-def load_context_engine(name: str) -> Optional["ContextEngine"]:
-    """Load and return a ContextEngine instance by name.
-
-    Returns None if the engine is not found or fails to load.
-    """
-    engine_dir = _CONTEXT_ENGINE_PLUGINS_DIR / name
-    if not engine_dir.is_dir():
-        logger.debug("Context engine '%s' not found in %s", name, _CONTEXT_ENGINE_PLUGINS_DIR)
-        return None
-
-    try:
-        engine = _load_engine_from_dir(engine_dir)
-        if engine:
-            return engine
-        logger.warning("Context engine '%s' loaded but no engine instance found", name)
-        return None
-    except Exception as e:
-        logger.warning("Failed to load context engine '%s': %s", name, e)
-        return None
-
-
-def _load_engine_from_dir(engine_dir: Path) -> Optional["ContextEngine"]:
-    """Import an engine module and extract the ContextEngine instance.
-
-    The module must have either:
-    - A register(ctx) function (plugin-style) — we simulate a ctx
-    - A top-level class that extends ContextEngine — we instantiate it
-    """
-    name = engine_dir.name
-    module_name = f"plugins.context_engine.{name}"
-    init_file = engine_dir / "__init__.py"
-
-    if not init_file.exists():
-        return None
-
-    # Check if already loaded
-    if module_name in sys.modules:
-        mod = sys.modules[module_name]
-    else:
-        # Handle relative imports within the plugin
-        # First ensure the parent packages are registered
-        for parent in ("plugins", "plugins.context_engine"):
-            if parent not in sys.modules:
-                parent_path = Path(__file__).parent
-                if parent == "plugins":
-                    parent_path = parent_path.parent
-                parent_init = parent_path / "__init__.py"
-                if parent_init.exists():
-                    spec = importlib.util.spec_from_file_location(
-                        parent, str(parent_init),
-                        submodule_search_locations=[str(parent_path)]
-                    )
-                    if spec:
-                        parent_mod = importlib.util.module_from_spec(spec)
-                        sys.modules[parent] = parent_mod
-                        try:
-                            spec.loader.exec_module(parent_mod)
-                        except Exception:
-                            pass
-
-        # Now load the engine module
-        spec = importlib.util.spec_from_file_location(
-            module_name, str(init_file),
-            submodule_search_locations=[str(engine_dir)]
-        )
-        if not spec:
-            return None
-
-        mod = importlib.util.module_from_spec(spec)
-        sys.modules[module_name] = mod
-
-        # Register submodules so relative imports work
-        for sub_file in engine_dir.glob("*.py"):
-            if sub_file.name == "__init__.py":
-                continue
-            sub_name = sub_file.stem
-            full_sub_name = f"{module_name}.{sub_name}"
-            if full_sub_name not in sys.modules:
-                sub_spec = importlib.util.spec_from_file_location(
-                    full_sub_name, str(sub_file)
-                )
-                if sub_spec:
-                    sub_mod = importlib.util.module_from_spec(sub_spec)
-                    sys.modules[full_sub_name] = sub_mod
-                    try:
-                        sub_spec.loader.exec_module(sub_mod)
-                    except Exception as e:
-                        logger.debug("Failed to load submodule %s: %s", full_sub_name, e)
-
-        try:
-            spec.loader.exec_module(mod)
-        except Exception as e:
-            logger.debug("Failed to exec_module %s: %s", module_name, e)
-            sys.modules.pop(module_name, None)
-            return None
-
-    # Try register(ctx) pattern first (how plugins are written)
-    if hasattr(mod, "register"):
-        collector = _EngineCollector()
-        try:
-            mod.register(collector)
-            if collector.engine:
-                return collector.engine
-        except Exception as e:
-            logger.debug("register() failed for %s: %s", name, e)
-
-    # Fallback: find a ContextEngine subclass and instantiate it
-    from agent.context_engine import ContextEngine
-    for attr_name in dir(mod):
-        attr = getattr(mod, attr_name, None)
-        if (isinstance(attr, type) and issubclass(attr, ContextEngine)
-                and attr is not ContextEngine):
-            try:
-                return attr()
-            except Exception:
-                pass
-
-    return None
-
-
-class _EngineCollector:
-    """Fake plugin context that captures register_context_engine calls."""
-
-    def __init__(self):
-        self.engine = None
-
-    def register_context_engine(self, engine):
-        self.engine = engine
-
-    # No-op for other registration methods
-    def register_tool(self, *args, **kwargs):
-        pass
-
-    def register_hook(self, *args, **kwargs):
-        pass
-
-    def register_cli_command(self, *args, **kwargs):
-        pass
-
-    def register_memory_provider(self, *args, **kwargs):
-        pass
@@ -218,11 +218,9 @@ class HonchoMemoryProvider(MemoryProvider):
                return

            # Override peer_name with gateway user_id for per-user memory scoping.
-            # Only when no explicit peerName was configured — an explicit peerName
-            # means the user chose their identity; a raw user_id (e.g. Telegram
-            # chat ID) should not silently replace it.
+            # CLI sessions won't have user_id, so the config default is preserved.
            _gw_user_id = kwargs.get("user_id")
-            if _gw_user_id and not cfg.peer_name:
+            if _gw_user_id:
                cfg.peer_name = _gw_user_id

            self._config = cfg
@@ -250,12 +248,6 @@ class HonchoMemoryProvider(MemoryProvider):

            # ----- Port #1957: lazy session init for tools-only mode -----
            if self._recall_mode == "tools":
-                if cfg.init_on_session_start:
-                    # Eager init: create session now so sync_turn() works from turn 1.
-                    # Does NOT enable auto-injection — prefetch() still returns empty.
-                    logger.debug("Honcho tools-only mode — eager session init (initOnSessionStart=true)")
-                    self._do_session_init(cfg, session_id, **kwargs)
-                    return
                # Defer actual session creation until first tool call
                self._lazy_init_kwargs = kwargs
                self._lazy_init_session_id = session_id
@@ -189,11 +189,6 @@ class HonchoClientConfig:
    # "context" — auto-injected context only, Honcho tools removed
    # "tools"   — Honcho tools only, no auto-injected context
    recall_mode: str = "hybrid"
-    # When True and recallMode is "tools", create the Honcho session eagerly
-    # during initialize() instead of deferring to the first tool call.
-    # This ensures sync_turn() can write from the very first turn.
-    # Does NOT enable automatic context injection — only changes init timing.
-    init_on_session_start: bool = False
    # Observation mode: legacy string shorthand ("directional" or "unified").
    # Kept for backward compat; granular per-peer booleans below are preferred.
    observation_mode: str = "directional"
@@ -371,11 +366,6 @@ class HonchoClientConfig:
                or raw.get("recallMode")
                or "hybrid"
            ),
-            init_on_session_start=_resolve_bool(
-                host_block.get("initOnSessionStart"),
-                raw.get("initOnSessionStart"),
-                default=False,
-            ),
            # Migration guard: existing configs without an explicit
            # observationMode keep the old "unified" default so users
            # aren't silently switched to full bidirectional observation.
@@ -43,7 +43,7 @@ dev = ["debugpy>=1.8.0,<2", "pytest>=9.0.2,<10", "pytest-asyncio>=1.3.0,<2", "py
 messaging = ["python-telegram-bot[webhooks]>=22.6,<23", "discord.py[voice]>=2.7.1,<3", "aiohttp>=3.13.3,<4", "slack-bolt>=1.18.0,<2", "slack-sdk>=3.27.0,<4"]
 cron = ["croniter>=6.0.0,<7"]
 slack = ["slack-bolt>=1.18.0,<2", "slack-sdk>=3.27.0,<4"]
-matrix = ["mautrix[encryption]>=0.20,<1", "Markdown>=3.6,<4"]
+matrix = ["matrix-nio[e2e]>=0.24.0,<1", "Markdown>=3.6,<4"]
 cli = ["simple-term-menu>=1.0,<2"]
 tts-premium = ["elevenlabs>=1.0,<2"]
 voice = [
@@ -766,7 +766,7 @@ class AIAgent:
        # conversation prefix. Uses system_and_3 strategy (4 breakpoints).
        is_openrouter = self._is_openrouter_url()
        is_claude = "claude" in self.model.lower()
-        is_native_anthropic = self.api_mode == "anthropic_messages" and self.provider == "anthropic"
+        is_native_anthropic = self.api_mode == "anthropic_messages"
        self._use_prompt_caching = (is_openrouter and is_claude) or is_native_anthropic
        self._cache_ttl = "5m"  # Default 5-minute TTL (1.25x write cost)
        
@@ -1268,88 +1268,20 @@ class AIAgent:
                                        pass
                        break
        
-        # Select context engine: config-driven (like memory providers).
-        # 1. Check config.yaml context.engine setting
-        # 2. Check plugins/context_engine/<name>/ directory (repo-shipped)
-        # 3. Check general plugin system (user-installed plugins)
-        # 4. Fall back to built-in ContextCompressor
-        _selected_engine = None
-        _engine_name = "compressor"  # default
-        try:
-            _ctx_cfg = _agent_cfg.get("context", {}) if isinstance(_agent_cfg, dict) else {}
-            _engine_name = _ctx_cfg.get("engine", "compressor") or "compressor"
-        except Exception:
-            pass
-
-        if _engine_name != "compressor":
-            # Try loading from plugins/context_engine/<name>/
-            try:
-                from plugins.context_engine import load_context_engine
-                _selected_engine = load_context_engine(_engine_name)
-            except Exception as _ce_load_err:
-                logger.debug("Context engine load from plugins/context_engine/: %s", _ce_load_err)
-
-            # Try general plugin system as fallback
-            if _selected_engine is None:
-                try:
-                    from hermes_cli.plugins import get_plugin_context_engine
-                    _candidate = get_plugin_context_engine()
-                    if _candidate and _candidate.name == _engine_name:
-                        _selected_engine = _candidate
-                except Exception:
-                    pass
-
-            if _selected_engine is None:
-                logger.warning(
-                    "Context engine '%s' not found — falling back to built-in compressor",
-                    _engine_name,
-                )
-        # else: config says "compressor" — use built-in, don't auto-activate plugins
-
-        if _selected_engine is not None:
-            self.context_compressor = _selected_engine
-            if not self.quiet_mode:
-                logger.info("Using context engine: %s", _selected_engine.name)
-        else:
-            self.context_compressor = ContextCompressor(
-                model=self.model,
-                threshold_percent=compression_threshold,
-                protect_first_n=3,
-                protect_last_n=compression_protect_last,
-                summary_target_ratio=compression_target_ratio,
-                summary_model_override=compression_summary_model,
-                quiet_mode=self.quiet_mode,
-                base_url=self.base_url,
-                api_key=getattr(self, "api_key", ""),
-                config_context_length=_config_context_length,
-                provider=self.provider,
-            )
+        self.context_compressor = ContextCompressor(
+            model=self.model,
+            threshold_percent=compression_threshold,
+            protect_first_n=3,
+            protect_last_n=compression_protect_last,
+            summary_target_ratio=compression_target_ratio,
+            summary_model_override=compression_summary_model,
+            quiet_mode=self.quiet_mode,
+            base_url=self.base_url,
+            api_key=getattr(self, "api_key", ""),
+            config_context_length=_config_context_length,
+            provider=self.provider,
+        )
        self.compression_enabled = compression_enabled
-
-        # Inject context engine tool schemas (e.g. lcm_grep, lcm_describe, lcm_expand)
-        self._context_engine_tool_names: set = set()
-        if hasattr(self, "context_compressor") and self.context_compressor and self.tools is not None:
-            for _schema in self.context_compressor.get_tool_schemas():
-                _wrapped = {"type": "function", "function": _schema}
-                self.tools.append(_wrapped)
-                _tname = _schema.get("name", "")
-                if _tname:
-                    self.valid_tool_names.add(_tname)
-                    self._context_engine_tool_names.add(_tname)
-
-        # Notify context engine of session start
-        if hasattr(self, "context_compressor") and self.context_compressor:
-            try:
-                self.context_compressor.on_session_start(
-                    self.session_id,
-                    hermes_home=str(get_hermes_home()),
-                    platform=self.platform or "cli",
-                    model=self.model,
-                    context_length=getattr(self.context_compressor, "context_length", 0),
-                )
-            except Exception as _ce_err:
-                logger.debug("Context engine on_session_start: %s", _ce_err)
-
        self._subdirectory_hints = SubdirectoryHintTracker(
            working_dir=os.getenv("TERMINAL_CWD") or None,
        )
@@ -1415,13 +1347,11 @@ class AIAgent:
            "api_key": getattr(self, "api_key", ""),
            "client_kwargs": dict(self._client_kwargs),
            "use_prompt_caching": self._use_prompt_caching,
-            # Context engine state that _try_activate_fallback() overwrites.
-            # Use getattr for model/base_url/api_key/provider since plugin
-            # engines may not have these (they're ContextCompressor-specific).
-            "compressor_model": getattr(_cc, "model", self.model),
-            "compressor_base_url": getattr(_cc, "base_url", self.base_url),
+            # Compressor state that _try_activate_fallback() overwrites
+            "compressor_model": _cc.model,
+            "compressor_base_url": _cc.base_url,
            "compressor_api_key": getattr(_cc, "api_key", ""),
-            "compressor_provider": getattr(_cc, "provider", self.provider),
+            "compressor_provider": _cc.provider,
            "compressor_context_length": _cc.context_length,
            "compressor_threshold_tokens": _cc.threshold_tokens,
        }
@@ -1467,9 +1397,15 @@ class AIAgent:
        # Turn counter (added after reset_session_state was first written — #2635)
        self._user_turn_count = 0

-        # Context engine reset (works for both built-in compressor and plugins)
+        # Context compressor internal counters (if present)
        if hasattr(self, "context_compressor") and self.context_compressor:
-            self.context_compressor.on_session_reset()
+            self.context_compressor.last_prompt_tokens = 0
+            self.context_compressor.last_completion_tokens = 0
+            self.context_compressor.compression_count = 0
+            self.context_compressor._context_probed = False
+            self.context_compressor._context_probe_persistable = False
+            # Iterative summary from previous session must not bleed into new one (#2635)
+            self.context_compressor._previous_summary = None
    
    def switch_model(self, new_model, new_provider, api_key='', base_url='', api_mode=''):
        """Switch the model/provider in-place for a live agent.
@@ -1510,11 +1446,7 @@ class AIAgent:
                resolve_anthropic_token,
                _is_oauth_token,
            )
-            # Only fall back to ANTHROPIC_TOKEN when the provider is actually Anthropic.
-            # Other anthropic_messages providers (MiniMax, Alibaba, etc.) must use their own
-            # API key — falling back would send Anthropic credentials to third-party endpoints.
-            _is_native_anthropic = new_provider == "anthropic"
-            effective_key = (api_key or self.api_key or resolve_anthropic_token() or "") if _is_native_anthropic else (api_key or self.api_key or "")
+            effective_key = api_key or self.api_key or resolve_anthropic_token() or ""
            self.api_key = effective_key
            self._anthropic_api_key = effective_key
            self._anthropic_base_url = base_url or getattr(self, "_anthropic_base_url", None)
@@ -1538,7 +1470,7 @@ class AIAgent:
            )

        # ── Re-evaluate prompt caching ──
-        is_native_anthropic = api_mode == "anthropic_messages" and new_provider == "anthropic"
+        is_native_anthropic = api_mode == "anthropic_messages"
        self._use_prompt_caching = (
            ("openrouter" in (self.base_url or "").lower() and "claude" in new_model.lower())
            or is_native_anthropic
@@ -1554,12 +1486,13 @@ class AIAgent:
                provider=self.provider,
                config_context_length=getattr(self, "_config_context_length", None),
            )
-            self.context_compressor.update_model(
-                model=self.model,
-                context_length=new_context_length,
-                base_url=self.base_url,
-                api_key=getattr(self, "api_key", ""),
-                provider=self.provider,
+            self.context_compressor.model = self.model
+            self.context_compressor.base_url = self.base_url
+            self.context_compressor.api_key = self.api_key
+            self.context_compressor.provider = self.provider
+            self.context_compressor.context_length = new_context_length
+            self.context_compressor.threshold_tokens = int(
+                new_context_length * self.context_compressor.threshold_percent
            )

        # ── Invalidate cached system prompt so it rebuilds next turn ──
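
Review note on the hunk above: replacing `update_model()` with direct attribute writes means every call site must remember to recompute `threshold_tokens` from the new context length. The sketch below shows that invariant in isolation. `CompressorState` and `retarget` are hypothetical stand-ins, not the real `ContextCompressor`, and the 0.5 threshold is an arbitrary example value.

```python
from dataclasses import dataclass


@dataclass
class CompressorState:
    """Hypothetical stand-in holding only the fields the hunk assigns."""
    model: str
    base_url: str
    api_key: str
    provider: str
    context_length: int
    threshold_percent: float
    threshold_tokens: int = 0


def retarget(state: CompressorState, *, model: str, base_url: str,
             api_key: str, provider: str, context_length: int) -> None:
    """Apply the same six assignments the hunk performs inline."""
    state.model = model
    state.base_url = base_url
    state.api_key = api_key
    state.provider = provider
    state.context_length = context_length
    # threshold_tokens is derived, so it must follow context_length.
    state.threshold_tokens = int(context_length * state.threshold_percent)


state = CompressorState(
    model="model-a", base_url="https://example.invalid/v1", api_key="",
    provider="provider-a", context_length=100_000, threshold_percent=0.5,
)
retarget(state, model="model-b", base_url="https://example.invalid/v2",
         api_key="", provider="provider-b", context_length=200_000)
```

With these example values, `state.threshold_tokens` ends up at half of the new context length; a call site that updates `context_length` but skips the last assignment would leave the threshold stale.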
@@ -1575,10 +1508,10 @@ class AIAgent:
            "api_key": getattr(self, "api_key", ""),
            "client_kwargs": dict(self._client_kwargs),
            "use_prompt_caching": self._use_prompt_caching,
-            "compressor_model": getattr(_cc, "model", self.model) if _cc else self.model,
-            "compressor_base_url": getattr(_cc, "base_url", self.base_url) if _cc else self.base_url,
+            "compressor_model": _cc.model if _cc else self.model,
+            "compressor_base_url": _cc.base_url if _cc else self.base_url,
            "compressor_api_key": getattr(_cc, "api_key", "") if _cc else "",
-            "compressor_provider": getattr(_cc, "provider", self.provider) if _cc else self.provider,
+            "compressor_provider": _cc.provider if _cc else self.provider,
            "compressor_context_length": _cc.context_length if _cc else 0,
            "compressor_threshold_tokens": _cc.threshold_tokens if _cc else 0,
        }
@@ -2775,11 +2708,10 @@ class AIAgent:
        }

    def shutdown_memory_provider(self, messages: list = None) -> None:
-        """Shut down the memory provider and context engine — call at actual session boundaries.
+        """Shut down the memory provider — call at actual session boundaries.

        This calls on_session_end() then shutdown_all() on the memory
-        manager, and on_session_end() on the context engine.
-        NOT called per-turn — only at CLI exit, /reset, gateway
+        manager. NOT called per-turn — only at CLI exit, /reset, gateway
        session expiry, etc.
        """
        if self._memory_manager:
@@ -2791,15 +2723,6 @@ class AIAgent:
                self._memory_manager.shutdown_all()
            except Exception:
                pass
-        # Notify context engine of session end (flush DAG, close DBs, etc.)
-        if hasattr(self, "context_compressor") and self.context_compressor:
-            try:
-                self.context_compressor.on_session_end(
-                    self.session_id or "",
-                    messages or [],
-                )
-            except Exception:
-                pass
    
    def close(self) -> None:
        """Release all resources held by this agent instance.
@@ -4429,7 +4352,7 @@ class AIAgent:
            self._anthropic_api_key = runtime_key
            self._anthropic_base_url = runtime_base
            self._anthropic_client = build_anthropic_client(runtime_key, runtime_base)
-            self._is_anthropic_oauth = _is_oauth_token(runtime_key)
+            self._is_anthropic_oauth = _is_oauth_token(runtime_key) if self.provider == "anthropic" else False
            self.api_key = runtime_key
            self.base_url = runtime_base
            return
@@ -5301,7 +5224,7 @@ class AIAgent:
                }

            # Re-evaluate prompt caching for the new provider/model
-            is_native_anthropic = fb_api_mode == "anthropic_messages" and fb_provider == "anthropic"
+            is_native_anthropic = fb_api_mode == "anthropic_messages"
            self._use_prompt_caching = (
                ("openrouter" in fb_base_url.lower() and "claude" in fb_model.lower())
                or is_native_anthropic
@@ -5317,12 +5240,13 @@ class AIAgent:
                    self.model, base_url=self.base_url,
                    api_key=self.api_key, provider=self.provider,
                )
-                self.context_compressor.update_model(
-                    model=self.model,
-                    context_length=fb_context_length,
-                    base_url=self.base_url,
-                    api_key=getattr(self, "api_key", ""),
-                    provider=self.provider,
+                self.context_compressor.model = self.model
+                self.context_compressor.base_url = self.base_url
+                self.context_compressor.api_key = self.api_key
+                self.context_compressor.provider = self.provider
+                self.context_compressor.context_length = fb_context_length
+                self.context_compressor.threshold_tokens = int(
+                    fb_context_length * self.context_compressor.threshold_percent
                )

            self._emit_status(
@@ -5382,15 +5306,14 @@ class AIAgent:
                    shared=True,
                )

-            # ── Restore context engine state ──
+            # ── Restore context compressor state ──
            cc = self.context_compressor
-            cc.update_model(
-                model=rt["compressor_model"],
-                context_length=rt["compressor_context_length"],
-                base_url=rt["compressor_base_url"],
-                api_key=rt["compressor_api_key"],
-                provider=rt["compressor_provider"],
-            )
+            cc.model = rt["compressor_model"]
+            cc.base_url = rt["compressor_base_url"]
+            cc.api_key = rt["compressor_api_key"]
+            cc.provider = rt["compressor_provider"]
+            cc.context_length = rt["compressor_context_length"]
+            cc.threshold_tokens = rt["compressor_threshold_tokens"]

            # ── Reset fallback chain for the new turn ──
            self._fallback_activated = False
@@ -5637,12 +5560,11 @@ class AIAgent:
    def _anthropic_preserve_dots(self) -> bool:
        """True when using an anthropic-compatible endpoint that preserves dots in model names.
        Alibaba/DashScope keeps dots (e.g. qwen3.5-plus).
-        MiniMax keeps dots (e.g. MiniMax-M2.7).
        OpenCode Go keeps dots (e.g. minimax-m2.7)."""
-        if (getattr(self, "provider", "") or "").lower() in {"alibaba", "minimax", "minimax-cn", "opencode-go"}:
+        if (getattr(self, "provider", "") or "").lower() in {"alibaba", "opencode-go"}:
            return True
        base = (getattr(self, "base_url", "") or "").lower()
-        return "dashscope" in base or "aliyuncs" in base or "minimax" in base or "opencode.ai/zen/go" in base
+        return "dashscope" in base or "aliyuncs" in base or "opencode.ai/zen/go" in base

    def _is_qwen_portal(self) -> bool:
        """Return True when the base URL targets Qwen Portal."""
@@ -6956,29 +6878,6 @@ class AIAgent:
                        spinner.stop(cute_msg)
                    elif self._should_emit_quiet_tool_messages():
                        self._vprint(f"  {cute_msg}")
-            elif self._context_engine_tool_names and function_name in self._context_engine_tool_names:
-                # Context engine tools (lcm_grep, lcm_describe, lcm_expand, etc.)
-                spinner = None
-                if self.quiet_mode and not self.tool_progress_callback:
-                    face = random.choice(KawaiiSpinner.KAWAII_WAITING)
-                    emoji = _get_tool_emoji(function_name)
-                    preview = _build_tool_preview(function_name, function_args) or function_name
-                    spinner = KawaiiSpinner(f"{face} {emoji} {preview}", spinner_type='dots', print_fn=self._print_fn)
-                    spinner.start()
-                _ce_result = None
-                try:
-                    function_result = self.context_compressor.handle_tool_call(function_name, function_args, messages=messages)
-                    _ce_result = function_result
-                except Exception as tool_error:
-                    function_result = json.dumps({"error": f"Context engine tool '{function_name}' failed: {tool_error}"})
-                    logger.error("context_engine.handle_tool_call raised for %s: %s", function_name, tool_error, exc_info=True)
-                finally:
-                    tool_duration = time.time() - tool_start_time
-                    cute_msg = _get_cute_tool_message_impl(function_name, function_args, tool_duration, result=_ce_result)
-                    if spinner:
-                        spinner.stop(cute_msg)
-                    elif self.quiet_mode:
-                        self._vprint(f"  {cute_msg}")
            elif self._memory_manager and self._memory_manager.has_tool(function_name):
                # Memory provider tools (hindsight_retain, honcho_search, etc.)
                # These are not in the tool registry — route through MemoryManager.
@@ -7634,7 +7533,6 @@ class AIAgent:
                is_first_turn=(not bool(conversation_history)),
                model=self.model,
                platform=getattr(self, "platform", None) or "",
-                sender_id=getattr(self, "_user_id", None) or "",
            )
            _ctx_parts: list[str] = []
            for r in _pre_results:
@@ -8294,7 +8192,7 @@ class AIAgent:
                        # Cache discovered context length after successful call.
                        # Only persist limits confirmed by the provider (parsed
                        # from the error message), not guessed probe tiers.
-                        if getattr(self.context_compressor, "_context_probed", False):
+                        if self.context_compressor._context_probed:
                            ctx = self.context_compressor.context_length
                            if getattr(self.context_compressor, "_context_probe_persistable", False):
                                save_context_length(self.model, self.base_url, ctx)
@@ -8633,22 +8531,16 @@ class AIAgent:
                        compressor = self.context_compressor
                        old_ctx = compressor.context_length
                        if old_ctx > _reduced_ctx:
-                            compressor.update_model(
-                                model=self.model,
-                                context_length=_reduced_ctx,
-                                base_url=self.base_url,
-                                api_key=getattr(self, "api_key", ""),
-                                provider=self.provider,
+                            compressor.context_length = _reduced_ctx
+                            compressor.threshold_tokens = int(
+                                _reduced_ctx * compressor.threshold_percent
                            )
-                            # Context probing flags — only set on built-in
-                            # compressor (plugin engines manage their own).
-                            if hasattr(compressor, "_context_probed"):
-                                compressor._context_probed = True
-                                # Don't persist — this is a subscription-tier
-                                # limitation, not a model capability.  If the
-                                # user later enables extra usage the 1M limit
-                                # should come back automatically.
-                                compressor._context_probe_persistable = False
+                            compressor._context_probed = True
+                            # Don't persist — this is a subscription-tier
+                            # limitation, not a model capability.  If the user
+                            # later enables extra usage the 1M limit should
+                            # come back automatically.
+                            compressor._context_probe_persistable = False
                            self._vprint(
                                f"{self.log_prefix}⚠️  Anthropic long-context tier "
                                f"requires extra usage — reducing context: "
@@ -8812,25 +8704,17 @@ class AIAgent:
                            new_ctx = get_next_probe_tier(old_ctx)

                        if new_ctx and new_ctx < old_ctx:
-                            compressor.update_model(
-                                model=self.model,
-                                context_length=new_ctx,
-                                base_url=self.base_url,
-                                api_key=getattr(self, "api_key", ""),
-                                provider=self.provider,
+                            compressor.context_length = new_ctx
+                            compressor.threshold_tokens = int(new_ctx * compressor.threshold_percent)
+                            compressor._context_probed = True
+                            # Only persist limits parsed from the provider's
+                            # error message (a real number).  Guessed fallback
+                            # tiers from get_next_probe_tier() should stay
+                            # in-memory only — persisting them pollutes the
+                            # cache with wrong values.
+                            compressor._context_probe_persistable = bool(
+                                parsed_limit and parsed_limit == new_ctx
                            )
-                            # Context probing flags — only set on built-in
-                            # compressor (plugin engines manage their own).
-                            if hasattr(compressor, "_context_probed"):
-                                compressor._context_probed = True
-                                # Only persist limits parsed from the provider's
-                                # error message (a real number).  Guessed fallback
-                                # tiers from get_next_probe_tier() should stay
-                                # in-memory only — persisting them pollutes the
-                                # cache with wrong values.
-                                compressor._context_probe_persistable = bool(
-                                    parsed_limit and parsed_limit == new_ctx
-                                )
                            self._vprint(f"{self.log_prefix}⚠️  Context length exceeded — stepping down: {old_ctx:,} → {new_ctx:,} tokens", force=True)
                        else:
                            self._vprint(f"{self.log_prefix}⚠️  Context length exceeded at minimum tier — attempting compression...", force=True)
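Reviewer sketch of the step-down assignments in the two hunks above, pulled out of the agent loop so the arithmetic is easier to check. This is an illustration under assumptions, not the project's code: `FakeCompressor` and `apply_reduced_context` are hypothetical names, and the default values are invented for the example. Only the four attribute assignments mirror the `+` lines.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeCompressor:
    """Hypothetical stand-in for the built-in compressor (illustration only)."""
    context_length: int = 200_000
    threshold_percent: float = 0.5
    threshold_tokens: int = 100_000
    _context_probed: bool = False
    _context_probe_persistable: bool = False


def apply_reduced_context(
    compressor: FakeCompressor,
    new_ctx: int,
    parsed_limit: Optional[int],
) -> None:
    """Shrink the context window and recompute the compression threshold."""
    compressor.context_length = new_ctx
    # The threshold is a fixed fraction of the (now smaller) context window.
    compressor.threshold_tokens = int(new_ctx * compressor.threshold_percent)
    compressor._context_probed = True
    # Persist only when the limit came from the provider's error message;
    # a guessed fallback tier (parsed_limit is None or differs) stays in memory.
    compressor._context_probe_persistable = bool(
        parsed_limit and parsed_limit == new_ctx
    )
```

Because the new code assigns attributes directly rather than calling `compressor.update_model(...)`, a compressor that does not define these attributes would simply gain them; the removed `hasattr` guard no longer filters that case.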
@@ -9575,8 +9459,7 @@ class AIAgent:
                        fallback = getattr(self, '_last_content_with_tools', None)
                        if fallback:
                            _turn_exit_reason = "fallback_prior_turn_content"
-                            logger.info("Empty follow-up after tool calls — using prior turn content as final response")
-                            self._emit_status("↻ Empty response after tool calls — using earlier content as final answer")
+                            logger.debug("Empty follow-up after tool calls — using prior turn content as final response")
                            self._last_content_with_tools = None
                            self._empty_content_retries = 0
                            for i in range(len(messages) - 1, -1, -1):
@@ -9607,13 +9490,9 @@ class AIAgent:
                        )
                        if _has_structured and self._thinking_prefill_retries < 2:
                            self._thinking_prefill_retries += 1
-                            logger.info(
-                                "Thinking-only response (no visible content) — "
-                                "prefilling to continue (%d/2)",
-                                self._thinking_prefill_retries,
-                            )
-                            self._emit_status(
-                                f"↻ Thinking-only response — prefilling to continue "
+                            self._vprint(
+                                f"{self.log_prefix}↻ Thinking-only response — "
+                                f"prefilling to continue "
                                f"({self._thinking_prefill_retries}/2)"
                            )
                            interim_msg = self._build_assistant_message(
@@ -9629,57 +9508,23 @@ class AIAgent:
                        # Model returned nothing — no content, no
                        # structured reasoning, no tool calls.  Common
                        # with open models (transient provider issues,
-                        # rate limits, sampling flukes).  Retry up to 3
-                        # times before attempting fallback.  Skip when
+                        # rate limits, sampling flukes).  Silently retry
+                        # up to 3 times before giving up.  Skip when
                        # content has inline <think> tags (model chose
                        # to reason, just no visible text).
                        _truly_empty = not final_response.strip()
                        if _truly_empty and not _has_structured and self._empty_content_retries < 3:
                            self._empty_content_retries += 1
-                            logger.warning(
-                                "Empty response (no content or reasoning) — "
-                                "retry %d/3 (model=%s)",
-                                self._empty_content_retries, self.model,
-                            )
-                            self._emit_status(
-                                f"⚠️ Empty response from model — retrying "
-                                f"({self._empty_content_retries}/3)"
+                            self._vprint(
+                                f"{self.log_prefix}↻ Empty response (no content or reasoning) "
+                                f"— retrying ({self._empty_content_retries}/3)",
+                                force=True,
                            )
                            continue

-                        # ── Exhausted retries — try fallback provider ──
-                        # Before giving up with "(empty)", attempt to
-                        # switch to the next provider in the fallback
-                        # chain.  This covers the case where a model
-                        # (e.g. GLM-4.5-Air) consistently returns empty
-                        # due to context degradation or provider issues.
-                        if _truly_empty and self._fallback_chain:
-                            logger.warning(
-                                "Empty response after %d retries — "
-                                "attempting fallback (model=%s, provider=%s)",
-                                self._empty_content_retries, self.model,
-                                self.provider,
-                            )
-                            self._emit_status(
-                                "⚠️ Model returning empty responses — "
-                                "switching to fallback provider..."
-                            )
-                            if self._try_activate_fallback():
-                                self._empty_content_retries = 0
-                                self._emit_status(
-                                    f"↻ Switched to fallback: {self.model} "
-                                    f"({self.provider})"
-                                )
-                                logger.info(
-                                    "Fallback activated after empty responses: "
-                                    "now using %s on %s",
-                                    self.model, self.provider,
-                                )
-                                continue
-
-                        # Exhausted retries and fallback chain (or no
-                        # fallback configured).  Fall through to the
-                        # "(empty)" terminal.
+                        # Exhausted prefill attempts, empty retries, or
+                        # structured reasoning with no content —
+                        # fall through to "(empty)" terminal.
                        _turn_exit_reason = "empty_response_exhausted"
                        reasoning_text = self._extract_reasoning(assistant_message)
                        assistant_msg = self._build_assistant_message(assistant_message, finish_reason)
@@ -9688,28 +9533,9 @@ class AIAgent:

                        if reasoning_text:
                            reasoning_preview = reasoning_text[:500] + "..." if len(reasoning_text) > 500 else reasoning_text
-                            logger.warning(
-                                "Reasoning-only response (no visible content) "
-                                "after exhausting retries and fallback. "
-                                "Reasoning: %s", reasoning_preview,
-                            )
-                            self._emit_status(
-                                "⚠️ Model produced reasoning but no visible "
-                                "response after all retries. Returning empty."
-                            )
+                            self._vprint(f"{self.log_prefix}ℹ️  Reasoning-only response (no visible content). Reasoning: {reasoning_preview}")
                        else:
-                            logger.warning(
-                                "Empty response (no content or reasoning) "
-                                "after %d retries. No fallback available. "
-                                "model=%s provider=%s",
-                                self._empty_content_retries, self.model,
-                                self.provider,
-                            )
-                            self._emit_status(
-                                "❌ Model returned no content after all retries"
-                                + (" and fallback attempts." if self._fallback_chain else
-                                   ". No fallback providers configured.")
-                            )
+                            self._vprint(f"{self.log_prefix}ℹ️  Empty response (no content or reasoning) after 3 retries.")

                        final_response = "(empty)"
                        break
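Reviewer sketch of the empty-response path as it stands after this change: retry up to three times, then fall through to the "(empty)" terminal, with no fallback-provider switch in between. This is a simplified illustration under assumptions: `run_with_empty_retries` and `call_model` are hypothetical names, and the thinking-prefill branch and message bookkeeping are left out.

```python
from typing import Callable, Tuple

MAX_EMPTY_RETRIES = 3  # mirrors the "_empty_content_retries < 3" check above


def run_with_empty_retries(call_model: Callable[[], str]) -> Tuple[str, int]:
    """Call the model until it returns visible text or retries run out.

    Returns the final response and the number of retries that were used.
    """
    retries = 0
    while True:
        response = call_model()
        if response.strip():
            return response, retries
        if retries < MAX_EMPTY_RETRIES:
            retries += 1
            continue
        # Retries exhausted: same terminal value as in the hunk above.
        return "(empty)", retries
```

With a model that always returns an empty string, this makes four calls in total (the first attempt plus three retries) before returning `"(empty)"`.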
@@ -249,12 +249,8 @@ def check_config(groq_key, eleven_key):

            if stt_provider == "groq" and not groq_key:
                warn("STT config says groq but GROQ_API_KEY is missing")
-            if stt_provider == "mistral" and not os.getenv("MISTRAL_API_KEY"):
-                warn("STT config says mistral but MISTRAL_API_KEY is missing")
            if tts_provider == "elevenlabs" and not eleven_key:
                warn("TTS config says elevenlabs but ELEVENLABS_API_KEY is missing")
-            if tts_provider == "mistral" and not os.getenv("MISTRAL_API_KEY"):
-                warn("TTS config says mistral but MISTRAL_API_KEY is missing")
        except Exception as e:
            warn("config.yaml", f"parse error: {e}")
    else:
@@ -203,30 +203,3 @@ For segmented videos (quotes, scenes, chapters), render each as a separate clip
 | `references/inputs.md` | Audio analysis (FFT, bands, beats), video sampling, image conversion, text/lyrics, TTS integration (ElevenLabs, voice assignment, audio mixing) |
 | `references/optimization.md` | Hardware detection, quality profiles, vectorized patterns, parallel rendering, memory management, performance budgets |
 | `references/troubleshooting.md` | NumPy broadcasting traps, blend mode pitfalls, multiprocessing/pickling, brightness diagnostics, ffmpeg issues, font problems, common mistakes |
-
---
-
-## Creative Divergence (use only when user requests experimental/creative/unique output)
-
-If the user asks for creative, experimental, surprising, or unconventional output, select the strategy that best fits and reason through its steps BEFORE generating code.
-
- **Forced Connections** — when the user wants cross-domain inspiration ("make it look organic," "industrial aesthetic")
- **Conceptual Blending** — when the user names two things to combine ("ocean meets music," "space + calligraphy")
- **Oblique Strategies** — when the user is maximally open ("surprise me," "something I've never seen")
-
-### Forced Connections
-1. Pick a domain unrelated to the visual goal (weather systems, microbiology, architecture, fluid dynamics, textile weaving)
-2. List its core visual/structural elements (erosion → gradual reveal; mitosis → splitting duplication; weaving → interlocking patterns)
-3. Map those elements onto ASCII characters and animation patterns
-4. Synthesize — what does "erosion" or "crystallization" look like in a character grid?
-
-### Conceptual Blending
-1. Name two distinct visual/conceptual spaces (e.g., ocean waves + sheet music)
-2. Map correspondences (crests = high notes, troughs = rests, foam = staccato)
-3. Blend selectively — keep the most interesting mappings, discard forced ones
-4. Develop emergent properties that exist only in the blend
-
-### Oblique Strategies
-1. Draw one: "Honor thy error as a hidden intention" / "Use an old idea" / "What would your closest friend do?" / "Emphasize the flaws" / "Turn it upside down" / "Only a part, not the whole" / "Reverse"
-2. Interpret the directive against the current ASCII animation challenge
-3. Apply the lateral insight to the visual design before writing code
@@ -1,147 +0,0 @@
---
-name: ideation
-title: Creative Ideation — Constraint-Driven Project Generation
-description: "Generate project ideas through creative constraints. Use when the user says 'I want to build something', 'give me a project idea', 'I'm bored', 'what should I make', 'inspire me', or any variant of 'I have tools but no direction'. Works for code, art, hardware, writing, tools, and anything that can be made."
-version: 1.0.0
-author: SHL0MS
-license: MIT
-metadata:
-  hermes:
-    tags: [Creative, Ideation, Projects, Brainstorming, Inspiration]
-    category: creative
-    requires_toolsets: []
---
-
-# Creative Ideation
-
-Generate project ideas through creative constraints. Constraint + direction = creativity.
-
-## How It Works
-
-1. **Pick a constraint** from the library below — random, or matched to the user's domain/mood
-2. **Interpret it broadly** — a coding prompt can become a hardware project, an art prompt can become a CLI tool
-3. **Generate 3 concrete project ideas** that satisfy the constraint
-4. **If they pick one, build it** — create the project, write the code, ship it
-
-## The Rule
-
-Every prompt is interpreted as broadly as possible. "Does this include X?" → Yes. The prompts provide direction and mild constraint. Without either, there is no creativity.
-
-## Constraint Library
-
-### For Developers
-
-**Solve your own itch:**
-Build the tool you wished existed this week. Under 50 lines. Ship it today.
-
-**Automate the annoying thing:**
-What's the most tedious part of your workflow? Script it away. Two hours to fix a problem that costs you five minutes a day.
-
-**The CLI tool that should exist:**
-Think of a command you've wished you could type. `git undo-that-thing-i-just-did`. `docker why-is-this-broken`. `npm explain-yourself`. Now build it.
-
-**Nothing new except glue:**
-Make something entirely from existing APIs, libraries, and datasets. The only original contribution is how you connect them.
-
-**Frankenstein week:**
-Take something that does X and make it do Y. A git repo that plays music. A Dockerfile that generates poetry. A cron job that sends compliments.
-
-**Subtract:**
-How much can you remove from a codebase before it breaks? Strip a tool to its minimum viable function. Delete until only the essence remains.
-
-**High concept, low effort:**
-A deep idea, lazily executed. The concept should be brilliant. The implementation should take an afternoon. If it takes longer, you're overthinking it.
-
-### For Makers & Artists
-
-**Blatantly copy something:**
-Pick something you admire — a tool, an artwork, an interface. Recreate it from scratch. The learning is in the gap between your version and theirs.
-
-**One million of something:**
-One million is both a lot and not that much. One million pixels is a 1MB photo. One million API calls is a Tuesday. One million of anything becomes interesting at scale.
-
-**Make something that dies:**
-A website that loses a feature every day. A chatbot that forgets. A countdown to nothing. An exercise in rot, killing, or letting go.
-
-**Do a lot of math:**
-Generative geometry, shader golf, mathematical art, computational origami. Time to re-learn what an arcsin is.
-
-### For Anyone
-
-**Text is the universal interface:**
-Build something where text is the only interface. No buttons, no graphics, just words in and words out. Text can go in and out of almost anything.
-
-**Start at the punchline:**
-Think of something that would be a funny sentence. Work backwards to make it real. "I taught my thermostat to gaslight me" → now build it.
-
-**Hostile UI:**
-Make something intentionally painful to use. A password field that requires 47 conditions. A form where every label lies. A CLI that judges your commands.
-
-**Take two:**
-Remember an old project. Do it again from scratch. No looking at the original. See what changed about how you think.
-
-See `references/full-prompt-library.md` for 30+ additional constraints across communication, scale, philosophy, transformation, and more.
-
-## Matching Constraints to Users
-
-| User says | Pick from |
-|-----------|-----------|
-| "I want to build something" (no direction) | Random — any constraint |
-| "I'm learning [language]" | Blatantly copy something, Automate the annoying thing |
-| "I want something weird" | Hostile UI, Frankenstein week, Start at the punchline |
-| "I want something useful" | Solve your own itch, The CLI that should exist, Automate the annoying thing |
-| "I want something beautiful" | Do a lot of math, One million of something |
-| "I'm burned out" | High concept low effort, Make something that dies |
-| "Weekend project" | Nothing new except glue, Start at the punchline |
-| "I want a challenge" | One million of something, Subtract, Take two |
-
-## Output Format
-
-```
-## Constraint: [Name]
-> [The constraint, one sentence]
-
-### Ideas
-
-1. **[One-line pitch]**
-   [2-3 sentences: what you'd build and why it's interesting]
-   ⏱ [weekend / week / month] • 🔧 [stack]
-
-2. **[One-line pitch]**
-   [2-3 sentences]
-   ⏱ ... • 🔧 ...
-
-3. **[One-line pitch]**
-   [2-3 sentences]
-   ⏱ ... • 🔧 ...
-```
-
-## Example
-
-```
-## Constraint: The CLI tool that should exist
-> Think of a command you've wished you could type. Now build it.
-
-### Ideas
-
-1. **`git whatsup` — show what happened while you were away**
-   Compares your last active commit to HEAD and summarizes what changed,
-   who committed, and what PRs merged. Like a morning standup from your repo.
-   ⏱ weekend • 🔧 Python, GitPython, click
-
-2. **`explain 503` — HTTP status codes for humans**
-   Pipe any status code or error message and get a plain-English explanation
-   with common causes and fixes. Pulls from a curated database, not an LLM.
-   ⏱ weekend • 🔧 Rust or Go, static dataset
-
-3. **`deps why <package>` — why is this in my dependency tree**
-   Traces a transitive dependency back to the direct dependency that pulled
-   it in. Answers "why do I have 47 copies of lodash" in one command.
-   ⏱ weekend • 🔧 Node.js, npm/yarn lockfile parsing
-```
-
-After the user picks one, start building — create the project, write the code, iterate.
-
-## Attribution
-
-Constraint approach inspired by [wttdotm.com/prompts.html](https://wttdotm.com/prompts.html). Adapted and expanded for software development and general-purpose ideation.
@@ -1,110 +0,0 @@
-# Full Prompt Library
-
-Extended constraint library beyond the core set in SKILL.md. Load these when the user wants more variety or a specific category.
-
-## Communication & Connection
-
-**Create a means of distribution:**
-The project works when you can use what you made to give something to somebody else.
-
-**Make a way to communicate:**
-The project works when you can hold a conversation with someone else using what you created. Not chat — something weirder.
-
-**Write a love letter:**
-To a person, a programming language, a game, a place, a tool. On paper, in code, in music, in light. Mail it.
-
-**Mail chess / Asynchronous games:**
-Something turn-based played with no time limit. No requirement to be there at the same time. The game happens in the gaps.
-
-**Twitch plays X:**
-A group of people share control over something. Collective input, emergent behavior.
-
-## Screens & Interfaces
-
-**Something for your desktop:**
-You spend a lot of time there. Spruce it up. A custom clock, a pet that lives in your terminal, a wallpaper that changes based on your git activity.
-
-**One screen, two screen, old screen, new screen:**
-Take something you associate with one screen and put it on a very different one. DOOM on a smart fridge. A spreadsheet on a watch. A terminal in a painting.
-
-**Make a mirror:**
-Something that reflects the viewer back at themselves. A website that shows your browsing history. A CLI that prints your git sins.
-
-## Philosophy & Concept
-
-**Code as koan, koan as code:**
-What is the sound of one hand clapping? A program that answers a question it wasn't asked. A function that returns before it's called.
-
-**The useless tree:**
-Make something useless. Deliberately, completely, beautifully useless. No utility. No purpose. No point. That's the point.
-
-**Artificial stupidity:**
-Make fun of AI by showcasing its faults. Mistrain it. Lie to it. Build the opposite of what AI is supposed to be good at.
-
-**"I use technology in order to hate it properly":**
-Make something inspired by the tension between loving and hating your tools.
-
-**The more things change, the more they stay the same:**
-Reflect on time, difference, and similarity.
-
-## Transformation
-
-**Translate:**
-Take something meant for one audience and make it understandable by another. A research paper as a children's book. An API as a board game. A song as an architecture diagram.
-
-**I mean, I GUESS you could store something that way:**
-The project works when you can save and open something. Store data in DNS caches. Encode a novel in emoji. Write a file system on top of something that isn't a file system.
-
-**I mean, I GUESS those could be pixels:**
-The project works when you can display an image. Render anything visual in a medium that wasn't meant for rendering.
-
-## Identity & Reflection
-
-**Make a self-portrait:**
-Be yourself? Be fake? Be real? In code, in data, in sound, in a directory structure.
-
-**Make a pun:**
-The stupider the better. Physical, digital, linguistic, visual. The project IS the joke.
-
-**Doors, walls, borders, barriers, boundaries:**
-Things that intermediate two places: opening, closing, permeating, excluding, combining.
-
-## Scale & Repetition
-
-**Lists!:**
-Itemizations, taxonomies, exhaustive recountings, iterations. This one. A list of list of lists.
-
-**Did you mean *recursion*?**
-Did you mean recursion?
-
-**Animals:**
-Lions, and tigers, and bears. Crab logic gates. Fish plays the stock market.
-
-**Cats:**
-Where would the internet be without them.
-
-## Starting Points
-
-**An idea that comes from a book:**
-Read something. Make something inspired by it.
-
-**Go to a museum:**
-Project ensues.
-
-**NPC loot:**
-What do you drop when you die? What do you take on your journey? Build the item.
-
-**Mythological objects and entities:**
-Pandora's box, the ocarina of time, the palantir. Build the artifact.
-
-**69:**
-Nice. Make something with the joke being the number 69.
-
-**Office Space printer scene:**
-Capture the same energy. Channel the catharsis of destroying the thing that frustrates you.
-
-**Borges week:**
-Something inspired by the Argentine. The library of babel. The map that is the territory.
-
-**Lights!:**
-LED throwies, light installations, illuminated anything. Make something that glows.
@@ -239,26 +239,3 @@ Always iterate at `-ql`. Only render `-qh` for final output.
 | `references/paper-explainer.md` | Turning research papers into animations — workflow, templates, domain patterns |
 | `references/decorations.md` | SurroundingRectangle, Brace, arrows, DashedLine, Angle, annotation lifecycle |
 | `references/production-quality.md` | Pre-code, pre-render, post-render checklists, spatial layout, color, tempo |
-
---
-
-## Creative Divergence (use only when user requests experimental/creative/unique output)
-
-If the user asks for creative, experimental, or unconventional explanatory approaches, select a strategy and reason through it BEFORE designing the animation.
-
- **SCAMPER** — when the user wants a fresh take on a standard explanation
- **Assumption Reversal** — when the user wants to challenge how something is typically taught
-
-### SCAMPER Transformation
-Take a standard mathematical/technical visualization and transform it:
- **Substitute**: replace the standard visual metaphor (number line → winding path, matrix → city grid)
- **Combine**: merge two explanation approaches (algebraic + geometric simultaneously)
- **Reverse**: derive backward — start from the result and deconstruct to axioms
- **Modify**: exaggerate a parameter to show why it matters (10x the learning rate, 1000x the sample size)
- **Eliminate**: remove all notation — explain purely through animation and spatial relationships
-
-### Assumption Reversal
-1. List what's "standard" about how this topic is visualized (left-to-right, 2D, discrete steps, formal notation)
-2. Pick the most fundamental assumption
-3. Reverse it (right-to-left derivation, 3D embedding of a 2D concept, continuous morphing instead of steps, zero notation)
-4. Explore what the reversal reveals that the standard approach hides
@@ -511,37 +511,3 @@ When building p5.js sketches:
 | `references/export-pipeline.md` | `saveCanvas()`, `saveGif()`, `saveFrames()`, deterministic headless capture, ffmpeg frame-to-video, CCapture.js, SVG export, per-clip architecture, platform export (fxhash), video gotchas |
 | `references/troubleshooting.md` | Performance profiling, per-pixel budgets, common mistakes, browser compatibility, WebGL debugging, font loading issues, pixel density traps, memory leaks, CORS |
 | `templates/viewer.html` | Interactive viewer template: seed navigation (prev/next/random/jump), parameter sliders, download PNG, responsive canvas. Start from this for explorable generative art |
-
---
-
-## Creative Divergence (use only when user requests experimental/creative/unique output)
-
-If the user asks for creative, experimental, surprising, or unconventional output, select the strategy that best fits and reason through its steps BEFORE generating code.
-
- **Conceptual Blending** — when the user names two things to combine or wants hybrid aesthetics
- **SCAMPER** — when the user wants a twist on a known generative art pattern
- **Distance Association** — when the user gives a single concept and wants exploration ("make something about time")
-
-### Conceptual Blending
-1. Name two distinct visual systems (e.g., particle physics + handwriting)
-2. Map correspondences (particles = ink drops, forces = pen pressure, fields = letterforms)
-3. Blend selectively — keep mappings that produce interesting emergent visuals
-4. Code the blend as a unified system, not two systems side-by-side
-
-### SCAMPER Transformation
-Take a known generative pattern (flow field, particle system, L-system, cellular automata) and systematically transform it:
- **Substitute**: replace circles with text characters, lines with gradients
- **Combine**: merge two patterns (flow field + voronoi)
- **Adapt**: apply a 2D pattern to a 3D projection
- **Modify**: exaggerate scale, warp the coordinate space
- **Purpose**: use a physics sim for typography, a sorting algorithm for color
- **Eliminate**: remove the grid, remove color, remove symmetry
- **Reverse**: run the simulation backward, invert the parameter space
-
-### Distance Association
-1. Anchor on the user's concept (e.g., "loneliness")
-2. Generate associations at three distances:
-   - Close (obvious): empty room, single figure, silence
-   - Medium (interesting): one fish in a school swimming the wrong way, a phone with no notifications, the gap between subway cars
-   - Far (abstract): prime numbers, asymptotic curves, the color of 3am
-3. Develop the medium-distance associations — they're specific enough to visualize but unexpected enough to be interesting
@@ -39,13 +39,8 @@ class TestIsOAuthToken:
        assert _is_oauth_token("sk-ant-api03-abcdef1234567890") is False

    def test_managed_key(self):
-        # Managed keys from ~/.claude.json without a recognisable Anthropic
-        # prefix are not positively identified as OAuth.  They enter the system
-        # via diagnostics-only read_claude_managed_key(), not via
-        # resolve_anthropic_token(), so they don't reach the OAuth gate in
-        # practice.  Third-party provider keys (MiniMax, Alibaba) also lack
-        # the sk-ant- prefix and must NOT be treated as OAuth.
-        assert _is_oauth_token("ou1R1z-ft0A-bDeZ9wAA") is False
+        # Managed keys from ~/.claude.json are NOT regular API keys
+        assert _is_oauth_token("ou1R1z-ft0A-bDeZ9wAA") is True

    def test_jwt_token(self):
        # JWTs from OAuth flow
@@ -1,10 +1,9 @@
 """Tests for agent.auxiliary_client resolution chain, provider overrides, and model overrides."""

 import json
-import logging
 import os
 from pathlib import Path
-from unittest.mock import patch, MagicMock, AsyncMock
+from unittest.mock import patch, MagicMock

 import pytest

@@ -15,7 +14,6 @@ from agent.auxiliary_client import (
    resolve_provider_client,
    auxiliary_max_tokens_param,
    call_llm,
-    async_call_llm,
    _read_codex_access_token,
    _get_auxiliary_provider,
    _get_provider_chain,
@@ -758,69 +756,6 @@ class TestAuxiliaryPoolAwareness:
        assert call_kwargs["base_url"] == "https://api.githubcopilot.com"
        assert call_kwargs["default_headers"]["Editor-Version"]

-    def test_copilot_responses_api_model_wrapped_in_codex_client(self, monkeypatch):
-        """Copilot GPT-5+ models (needing Responses API) are wrapped in CodexAuxiliaryClient."""
-        monkeypatch.delenv("GITHUB_TOKEN", raising=False)
-        monkeypatch.delenv("GH_TOKEN", raising=False)
-
-        with (
-            patch(
-                "hermes_cli.auth.resolve_api_key_provider_credentials",
-                return_value={
-                    "provider": "copilot",
-                    "api_key": "test-token",
-                    "base_url": "https://api.githubcopilot.com",
-                    "source": "gh auth token",
-                },
-            ),
-            patch("agent.auxiliary_client.OpenAI"),
-        ):
-            client, model = resolve_provider_client("copilot", model="gpt-5.4-mini")
-
-        from agent.auxiliary_client import CodexAuxiliaryClient
-        assert isinstance(client, CodexAuxiliaryClient)
-        assert model == "gpt-5.4-mini"
-
-    def test_copilot_chat_completions_model_not_wrapped(self, monkeypatch):
-        """Copilot models using Chat Completions are returned as plain OpenAI clients."""
-        monkeypatch.delenv("GITHUB_TOKEN", raising=False)
-        monkeypatch.delenv("GH_TOKEN", raising=False)
-
-        with (
-            patch(
-                "hermes_cli.auth.resolve_api_key_provider_credentials",
-                return_value={
-                    "provider": "copilot",
-                    "api_key": "test-token",
-                    "base_url": "https://api.githubcopilot.com",
-                    "source": "gh auth token",
-                },
-            ),
-            patch("agent.auxiliary_client.OpenAI") as mock_openai,
-        ):
-            client, model = resolve_provider_client("copilot", model="gpt-4.1-mini")
-
-        from agent.auxiliary_client import CodexAuxiliaryClient
-        assert not isinstance(client, CodexAuxiliaryClient)
-        assert model == "gpt-4.1-mini"
-        # Should be the raw mock OpenAI client
-        assert client is mock_openai.return_value
-
-    def test_vision_auto_uses_active_provider_as_fallback(self, monkeypatch):
-        """When no OpenRouter/Nous available, vision auto falls back to active provider."""
-        monkeypatch.setenv("ANTHROPIC_API_KEY", "***")
-        with (
-            patch("agent.auxiliary_client._read_nous_auth", return_value=None),
-            patch("agent.auxiliary_client._read_main_provider", return_value="anthropic"),
-            patch("agent.auxiliary_client._read_main_model", return_value="claude-sonnet-4"),
-            patch("agent.anthropic_adapter.build_anthropic_client", return_value=MagicMock()),
-            patch("agent.anthropic_adapter.resolve_anthropic_token", return_value="***"),
-        ):
-            client, model = get_vision_auxiliary_client()
-
-        assert client is not None
-        assert client.__class__.__name__ == "AnthropicAuxiliaryClient"
-
    def test_vision_auto_prefers_active_provider_over_openrouter(self, monkeypatch):
        """Active provider is tried before OpenRouter in vision auto."""
        monkeypatch.setenv("OPENROUTER_API_KEY", "or-key")
@@ -1124,8 +1059,8 @@ class TestCallLlmPaymentFallback:
        exc.status_code = 402
        return exc

-    def test_402_triggers_fallback_when_auto(self, monkeypatch):
-        """When provider is auto and returns 402, call_llm tries the next one."""
+    def test_402_triggers_fallback(self, monkeypatch):
+        """When the primary provider returns 402, call_llm tries the next one."""
        monkeypatch.setenv("OPENROUTER_API_KEY", "or-key")

        primary_client = MagicMock()
@@ -1138,7 +1073,7 @@ class TestCallLlmPaymentFallback:
        with patch("agent.auxiliary_client._get_cached_client",
                    return_value=(primary_client, "google/gemini-3-flash-preview")), \
             patch("agent.auxiliary_client._resolve_task_provider_model",
-                    return_value=("auto", "google/gemini-3-flash-preview", None, None, None)), \
+                    return_value=("openrouter", "google/gemini-3-flash-preview", None, None)), \
             patch("agent.auxiliary_client._try_payment_fallback",
                    return_value=(fallback_client, "gpt-5.2-codex", "openai-codex")) as mock_fb:
            result = call_llm(
@@ -1147,62 +1082,13 @@ class TestCallLlmPaymentFallback:
            )

        assert result is fallback_response
-        mock_fb.assert_called_once_with("auto", "compression", reason="payment error")
+        mock_fb.assert_called_once_with("openrouter", "compression")
        # Fallback call should use the fallback model
        fb_kwargs = fallback_client.chat.completions.create.call_args.kwargs
        assert fb_kwargs["model"] == "gpt-5.2-codex"

-    def test_402_no_fallback_when_explicit_provider(self, monkeypatch):
-        """When provider is explicitly configured (not auto), 402 should NOT fallback (#7559)."""
-        monkeypatch.setenv("OPENROUTER_API_KEY", "or-key")
-
-        primary_client = MagicMock()
-        primary_client.chat.completions.create.side_effect = self._make_402_error()
-
-        with patch("agent.auxiliary_client._get_cached_client",
-                    return_value=(primary_client, "local-model")), \
-             patch("agent.auxiliary_client._resolve_task_provider_model",
-                    return_value=("custom", "local-model", None, None, None)), \
-             patch("agent.auxiliary_client._try_payment_fallback") as mock_fb:
-            with pytest.raises(Exception, match="insufficient credits"):
-                call_llm(
-                    task="compression",
-                    messages=[{"role": "user", "content": "hello"}],
-                )
-
-        # Fallback should NOT be attempted when provider is explicit
-        mock_fb.assert_not_called()
-
-    def test_connection_error_triggers_fallback_when_auto(self, monkeypatch):
-        """Connection errors also trigger fallback when provider is auto."""
-        monkeypatch.setenv("OPENROUTER_API_KEY", "or-key")
-
-        primary_client = MagicMock()
-        conn_err = Exception("Connection refused")
-        conn_err.status_code = None
-        primary_client.chat.completions.create.side_effect = conn_err
-
-        fallback_client = MagicMock()
-        fallback_response = MagicMock()
-        fallback_client.chat.completions.create.return_value = fallback_response
-
-        with patch("agent.auxiliary_client._get_cached_client",
-                    return_value=(primary_client, "model")), \
-             patch("agent.auxiliary_client._resolve_task_provider_model",
-                    return_value=("auto", "model", None, None, None)), \
-             patch("agent.auxiliary_client._is_connection_error", return_value=True), \
-             patch("agent.auxiliary_client._try_payment_fallback",
-                    return_value=(fallback_client, "fb-model", "nous")) as mock_fb:
-            result = call_llm(
-                task="compression",
-                messages=[{"role": "user", "content": "hello"}],
-            )
-
-        assert result is fallback_response
-        mock_fb.assert_called_once_with("auto", "compression", reason="connection error")
-
    def test_non_payment_error_not_caught(self, monkeypatch):
-        """Non-payment/non-connection errors (500) should NOT trigger fallback."""
+        """Non-payment errors (500, connection, etc.) should NOT trigger fallback."""
        monkeypatch.setenv("OPENROUTER_API_KEY", "or-key")

        primary_client = MagicMock()
@@ -1213,7 +1099,7 @@ class TestCallLlmPaymentFallback:
        with patch("agent.auxiliary_client._get_cached_client",
                    return_value=(primary_client, "google/gemini-3-flash-preview")), \
             patch("agent.auxiliary_client._resolve_task_provider_model",
-                    return_value=("auto", "google/gemini-3-flash-preview", None, None, None)):
+                    return_value=("openrouter", "google/gemini-3-flash-preview", None, None)):
            with pytest.raises(Exception, match="Internal Server Error"):
                call_llm(
                    task="compression",
@@ -1230,7 +1116,7 @@ class TestCallLlmPaymentFallback:
        with patch("agent.auxiliary_client._get_cached_client",
                    return_value=(primary_client, "google/gemini-3-flash-preview")), \
             patch("agent.auxiliary_client._resolve_task_provider_model",
-                    return_value=("auto", "google/gemini-3-flash-preview", None, None, None)), \
+                    return_value=("openrouter", "google/gemini-3-flash-preview", None, None)), \
             patch("agent.auxiliary_client._try_payment_fallback",
                    return_value=(None, None, "")):
            with pytest.raises(Exception, match="insufficient credits"):
@@ -1280,283 +1166,3 @@ def test_resolve_api_key_provider_skips_unconfigured_anthropic(monkeypatch):

    assert "anthropic" not in called, \
        "_try_anthropic() should not be called when anthropic is not explicitly configured"
-
-
-# ---------------------------------------------------------------------------
-# model="default" elimination (#7512)
-# ---------------------------------------------------------------------------
-
-
-class TestModelDefaultElimination:
-    """_resolve_api_key_provider must skip providers without known aux models."""
-
-    def test_unknown_provider_skipped(self, monkeypatch):
-        """Providers not in _API_KEY_PROVIDER_AUX_MODELS are skipped, not sent model='default'."""
-        from agent.auxiliary_client import _API_KEY_PROVIDER_AUX_MODELS
-
-        # Verify our known providers have entries
-        assert "gemini" in _API_KEY_PROVIDER_AUX_MODELS
-        assert "kimi-coding" in _API_KEY_PROVIDER_AUX_MODELS
-
-        # A random provider_id not in the dict should return None
-        assert _API_KEY_PROVIDER_AUX_MODELS.get("totally-unknown-provider") is None
-
-    def test_known_provider_gets_real_model(self):
-        """Known providers get a real model name, not 'default'."""
-        from agent.auxiliary_client import _API_KEY_PROVIDER_AUX_MODELS
-
-        for provider_id, model in _API_KEY_PROVIDER_AUX_MODELS.items():
-            assert model != "default", f"{provider_id} should not map to 'default'"
-            assert isinstance(model, str) and model.strip(), \
-                f"{provider_id} should have a non-empty model string"
-
-
-# ---------------------------------------------------------------------------
-# _try_payment_fallback reason parameter (#7512 bug 3)
-# ---------------------------------------------------------------------------
-
-
-class TestTryPaymentFallbackReason:
-    """_try_payment_fallback uses the reason parameter in log messages."""
-
-    def test_reason_parameter_passed_through(self, monkeypatch):
-        """The reason= parameter is accepted without error."""
-        from agent.auxiliary_client import _try_payment_fallback
-
-        # Mock the provider chain to return nothing
-        monkeypatch.setattr(
-            "agent.auxiliary_client._get_provider_chain",
-            lambda: [],
-        )
-        monkeypatch.setattr(
-            "agent.auxiliary_client._read_main_provider",
-            lambda: "",
-        )
-
-        client, model, label = _try_payment_fallback(
-            "openrouter", task="compression", reason="connection error"
-        )
-        assert client is None
-        assert label == ""
-
-
-# ---------------------------------------------------------------------------
-# _is_connection_error coverage
-# ---------------------------------------------------------------------------
-
-
-class TestIsConnectionError:
-    """Tests for _is_connection_error detection."""
-
-    def test_connection_refused(self):
-        from agent.auxiliary_client import _is_connection_error
-        err = Exception("Connection refused")
-        assert _is_connection_error(err) is True
-
-    def test_timeout(self):
-        from agent.auxiliary_client import _is_connection_error
-        err = Exception("Request timed out.")
-        assert _is_connection_error(err) is True
-
-    def test_dns_failure(self):
-        from agent.auxiliary_client import _is_connection_error
-        err = Exception("Name or service not known")
-        assert _is_connection_error(err) is True
-
-    def test_normal_api_error_not_connection(self):
-        from agent.auxiliary_client import _is_connection_error
-        err = Exception("Bad Request: invalid model")
-        err.status_code = 400
-        assert _is_connection_error(err) is False
-
-    def test_500_not_connection(self):
-        from agent.auxiliary_client import _is_connection_error
-        err = Exception("Internal Server Error")
-        err.status_code = 500
-        assert _is_connection_error(err) is False
-
-
-# ---------------------------------------------------------------------------
-# async_call_llm payment / connection fallback (#7512 bug 2)
-# ---------------------------------------------------------------------------
-
-
-class TestAsyncCallLlmFallback:
-    """async_call_llm mirrors call_llm fallback behavior."""
-
-    def _make_402_error(self, msg="Payment Required: insufficient credits"):
-        exc = Exception(msg)
-        exc.status_code = 402
-        return exc
-
-    @pytest.mark.asyncio
-    async def test_402_triggers_async_fallback_when_auto(self, monkeypatch):
-        """When provider is auto and returns 402, async_call_llm tries fallback."""
-        monkeypatch.setenv("OPENROUTER_API_KEY", "or-key")
-
-        primary_client = MagicMock()
-        primary_client.chat.completions.create = AsyncMock(
-            side_effect=self._make_402_error())
-
-        # Fallback client (sync) returned by _try_payment_fallback
-        fb_sync_client = MagicMock()
-        fb_async_client = MagicMock()
-        fb_response = MagicMock()
-        fb_async_client.chat.completions.create = AsyncMock(return_value=fb_response)
-
-        with patch("agent.auxiliary_client._get_cached_client",
-                    return_value=(primary_client, "google/gemini-3-flash-preview")), \
-             patch("agent.auxiliary_client._resolve_task_provider_model",
-                    return_value=("auto", "google/gemini-3-flash-preview", None, None, None)), \
-             patch("agent.auxiliary_client._try_payment_fallback",
-                    return_value=(fb_sync_client, "gpt-5.2-codex", "openai-codex")) as mock_fb, \
-             patch("agent.auxiliary_client._to_async_client",
-                    return_value=(fb_async_client, "gpt-5.2-codex")):
-            result = await async_call_llm(
-                task="compression",
-                messages=[{"role": "user", "content": "hello"}],
-            )
-
-        assert result is fb_response
-        mock_fb.assert_called_once_with("auto", "compression", reason="payment error")
-
-    @pytest.mark.asyncio
-    async def test_402_no_async_fallback_when_explicit(self, monkeypatch):
-        """When provider is explicit, 402 should NOT trigger async fallback."""
-        monkeypatch.setenv("OPENROUTER_API_KEY", "or-key")
-
-        primary_client = MagicMock()
-        primary_client.chat.completions.create = AsyncMock(
-            side_effect=self._make_402_error())
-
-        with patch("agent.auxiliary_client._get_cached_client",
-                    return_value=(primary_client, "local-model")), \
-             patch("agent.auxiliary_client._resolve_task_provider_model",
-                    return_value=("custom", "local-model", None, None, None)), \
-             patch("agent.auxiliary_client._try_payment_fallback") as mock_fb:
-            with pytest.raises(Exception, match="insufficient credits"):
-                await async_call_llm(
-                    task="compression",
-                    messages=[{"role": "user", "content": "hello"}],
-                )
-
-        mock_fb.assert_not_called()
-
-    @pytest.mark.asyncio
-    async def test_connection_error_triggers_async_fallback(self, monkeypatch):
-        """Connection errors trigger async fallback when provider is auto."""
-        monkeypatch.setenv("OPENROUTER_API_KEY", "or-key")
-
-        primary_client = MagicMock()
-        conn_err = Exception("Connection refused")
-        conn_err.status_code = None
-        primary_client.chat.completions.create = AsyncMock(side_effect=conn_err)
-
-        fb_sync_client = MagicMock()
-        fb_async_client = MagicMock()
-        fb_response = MagicMock()
-        fb_async_client.chat.completions.create = AsyncMock(return_value=fb_response)
-
-        with patch("agent.auxiliary_client._get_cached_client",
-                    return_value=(primary_client, "model")), \
-             patch("agent.auxiliary_client._resolve_task_provider_model",
-                    return_value=("auto", "model", None, None, None)), \
-             patch("agent.auxiliary_client._is_connection_error", return_value=True), \
-             patch("agent.auxiliary_client._try_payment_fallback",
-                    return_value=(fb_sync_client, "fb-model", "nous")) as mock_fb, \
-             patch("agent.auxiliary_client._to_async_client",
-                    return_value=(fb_async_client, "fb-model")):
-            result = await async_call_llm(
-                task="compression",
-                messages=[{"role": "user", "content": "hello"}],
-            )
-
-        assert result is fb_response
-        mock_fb.assert_called_once_with("auto", "compression", reason="connection error")
-class TestStaleBaseUrlWarning:
-    """_resolve_auto() warns when OPENAI_BASE_URL conflicts with config provider (#5161)."""
-
-    def test_warns_when_openai_base_url_set_with_named_provider(self, monkeypatch, caplog):
-        """Warning fires when OPENAI_BASE_URL is set but provider is a named provider."""
-        import agent.auxiliary_client as mod
-        # Reset the module-level flag so the warning fires
-        monkeypatch.setattr(mod, "_stale_base_url_warned", False)
-        monkeypatch.setenv("OPENAI_BASE_URL", "http://localhost:11434/v1")
-        monkeypatch.setenv("OPENROUTER_API_KEY", "sk-or-test")
-
-        with patch("agent.auxiliary_client._read_main_provider", return_value="openrouter"), \
-             patch("agent.auxiliary_client._read_main_model", return_value="google/gemini-flash"), \
-             caplog.at_level(logging.WARNING, logger="agent.auxiliary_client"):
-            _resolve_auto()
-
-        assert any("OPENAI_BASE_URL is set" in rec.message for rec in caplog.records), \
-            "Expected a warning about stale OPENAI_BASE_URL"
-        assert mod._stale_base_url_warned is True
-
-    def test_no_warning_when_provider_is_custom(self, monkeypatch, caplog):
-        """No warning when the provider is 'custom' — OPENAI_BASE_URL is expected."""
-        import agent.auxiliary_client as mod
-        monkeypatch.setattr(mod, "_stale_base_url_warned", False)
-        monkeypatch.setenv("OPENAI_BASE_URL", "http://localhost:11434/v1")
-        monkeypatch.setenv("OPENAI_API_KEY", "test-key")
-
-        with patch("agent.auxiliary_client._read_main_provider", return_value="custom"), \
-             patch("agent.auxiliary_client._read_main_model", return_value="llama3"), \
-             patch("agent.auxiliary_client._resolve_custom_runtime",
-                   return_value=("http://localhost:11434/v1", "test-key", None)), \
-             patch("agent.auxiliary_client.OpenAI") as mock_openai, \
-             caplog.at_level(logging.WARNING, logger="agent.auxiliary_client"):
-            mock_openai.return_value = MagicMock()
-            _resolve_auto()
-
-        assert not any("OPENAI_BASE_URL is set" in rec.message for rec in caplog.records), \
-            "Should NOT warn when provider is 'custom'"
-
-    def test_no_warning_when_provider_is_named_custom(self, monkeypatch, caplog):
-        """No warning when the provider is 'custom:myname' — base_url comes from config."""
-        import agent.auxiliary_client as mod
-        monkeypatch.setattr(mod, "_stale_base_url_warned", False)
-        monkeypatch.setenv("OPENAI_BASE_URL", "http://localhost:11434/v1")
-        monkeypatch.setenv("OPENAI_API_KEY", "test-key")
-
-        with patch("agent.auxiliary_client._read_main_provider", return_value="custom:ollama-local"), \
-             patch("agent.auxiliary_client._read_main_model", return_value="llama3"), \
-             patch("agent.auxiliary_client.resolve_provider_client",
-                   return_value=(MagicMock(), "llama3")), \
-             caplog.at_level(logging.WARNING, logger="agent.auxiliary_client"):
-            _resolve_auto()
-
-        assert not any("OPENAI_BASE_URL is set" in rec.message for rec in caplog.records), \
-            "Should NOT warn when provider is 'custom:*'"
-
-    def test_no_warning_when_openai_base_url_not_set(self, monkeypatch, caplog):
-        """No warning when OPENAI_BASE_URL is absent."""
-        import agent.auxiliary_client as mod
-        monkeypatch.setattr(mod, "_stale_base_url_warned", False)
-        monkeypatch.delenv("OPENAI_BASE_URL", raising=False)
-        monkeypatch.setenv("OPENROUTER_API_KEY", "sk-or-test")
-
-        with patch("agent.auxiliary_client._read_main_provider", return_value="openrouter"), \
-             patch("agent.auxiliary_client._read_main_model", return_value="google/gemini-flash"), \
-             caplog.at_level(logging.WARNING, logger="agent.auxiliary_client"):
-            _resolve_auto()
-
-        assert not any("OPENAI_BASE_URL is set" in rec.message for rec in caplog.records), \
-            "Should NOT warn when OPENAI_BASE_URL is not set"
-
-    def test_warning_only_fires_once(self, monkeypatch, caplog):
-        """Warning is suppressed after the first invocation."""
-        import agent.auxiliary_client as mod
-        monkeypatch.setattr(mod, "_stale_base_url_warned", False)
-        monkeypatch.setenv("OPENAI_BASE_URL", "http://localhost:11434/v1")
-        monkeypatch.setenv("OPENROUTER_API_KEY", "sk-or-test")
-
-        with patch("agent.auxiliary_client._read_main_provider", return_value="openrouter"), \
-             patch("agent.auxiliary_client._read_main_model", return_value="google/gemini-flash"), \
-             caplog.at_level(logging.WARNING, logger="agent.auxiliary_client"):
-            _resolve_auto()
-            caplog.clear()
-            _resolve_auto()
-
-        assert not any("OPENAI_BASE_URL is set" in rec.message for rec in caplog.records), \
-            "Warning should not fire a second time"
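Reviewer note: after this change, the TestCallLlmPaymentFallback tests above expect a fallback only on HTTP 402, for any primary provider, while 500s and connection errors propagate. The sketch below shows that control flow in isolation. It is not the code in agent/auxiliary_client.py: `call_with_payment_fallback` and `fallback_factory` are hypothetical names, and the real `_try_payment_fallback` returns a three-tuple rather than the two-tuple assumed here.

```python
from unittest.mock import MagicMock


def call_with_payment_fallback(primary, fallback_factory, **request):
    """Hypothetical helper: retry on a fallback client only for HTTP 402."""
    try:
        return primary.chat.completions.create(**request)
    except Exception as exc:
        # Anything other than a payment error (500, connection failure, ...)
        # propagates unchanged, matching test_non_payment_error_not_caught.
        if getattr(exc, "status_code", None) != 402:
            raise
        fb_client, fb_model = fallback_factory()
        if fb_client is None:
            # No fallback available: surface the original 402.
            raise
        # Retry the same request against the fallback client and its model.
        return fb_client.chat.completions.create(**{**request, "model": fb_model})


def make_error(message, status_code):
    """Build an exception carrying a status_code, as the tests above do."""
    exc = Exception(message)
    exc.status_code = status_code
    return exc


# Demo with mocks: the primary client raises 402, the fallback answers.
primary = MagicMock()
primary.chat.completions.create.side_effect = make_error(
    "Payment Required: insufficient credits", 402
)
fallback = MagicMock()
fallback.chat.completions.create.return_value = "fallback-response"

result = call_with_payment_fallback(
    primary,
    lambda: (fallback, "gpt-5.2-codex"),
    model="google/gemini-3-flash-preview",
    messages=[{"role": "user", "content": "hello"}],
)
```

The status check is kept on a single attribute so that an exception without `status_code` (a plain connection error, for instance) is treated as a non-payment error and re-raised.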
@@ -1,250 +0,0 @@
-"""Tests for the ContextEngine ABC and plugin slot."""
-
-import json
-import pytest
-from typing import Any, Dict, List
-
-from agent.context_engine import ContextEngine
-from agent.context_compressor import ContextCompressor
-
-
-# ---------------------------------------------------------------------------
-# A minimal concrete engine for testing the ABC
-# ---------------------------------------------------------------------------
-
-class StubEngine(ContextEngine):
-    """Minimal engine that satisfies the ABC without doing real work."""
-
-    def __init__(self, context_length=200000, threshold_pct=0.50):
-        self.context_length = context_length
-        self.threshold_tokens = int(context_length * threshold_pct)
-        self._compress_called = False
-        self._tools_called = []
-
-    @property
-    def name(self) -> str:
-        return "stub"
-
-    def update_from_response(self, usage: Dict[str, Any]) -> None:
-        self.last_prompt_tokens = usage.get("prompt_tokens", 0)
-        self.last_completion_tokens = usage.get("completion_tokens", 0)
-        self.last_total_tokens = usage.get("total_tokens", 0)
-
-    def should_compress(self, prompt_tokens: int = None) -> bool:
-        tokens = prompt_tokens if prompt_tokens is not None else self.last_prompt_tokens
-        return tokens >= self.threshold_tokens
-
-    def compress(self, messages: List[Dict[str, Any]], current_tokens: int = None) -> List[Dict[str, Any]]:
-        self._compress_called = True
-        self.compression_count += 1
-        # Trivial: just return as-is
-        return messages
-
-    def get_tool_schemas(self) -> List[Dict[str, Any]]:
-        return [
-            {
-                "name": "stub_search",
-                "description": "Search the stub engine",
-                "parameters": {"type": "object", "properties": {}},
-            }
-        ]
-
-    def handle_tool_call(self, name: str, args: Dict[str, Any]) -> str:
-        self._tools_called.append(name)
-        return json.dumps({"ok": True, "tool": name})
-
-
-# ---------------------------------------------------------------------------
-# ABC contract tests
-# ---------------------------------------------------------------------------
-
-class TestContextEngineABC:
-    """Verify the ABC enforces the required interface."""
-
-    def test_cannot_instantiate_abc_directly(self):
-        with pytest.raises(TypeError):
-            ContextEngine()
-
-    def test_missing_methods_raises(self):
-        """A subclass missing required methods cannot be instantiated."""
-        class Incomplete(ContextEngine):
-            @property
-            def name(self):
-                return "incomplete"
-        with pytest.raises(TypeError):
-            Incomplete()
-
-    def test_stub_engine_satisfies_abc(self):
-        engine = StubEngine()
-        assert isinstance(engine, ContextEngine)
-        assert engine.name == "stub"
-
-    def test_compressor_is_context_engine(self):
-        c = ContextCompressor(model="test", quiet_mode=True, config_context_length=200000)
-        assert isinstance(c, ContextEngine)
-        assert c.name == "compressor"
-
-
-# ---------------------------------------------------------------------------
-# Default method behavior
-# ---------------------------------------------------------------------------
-
-class TestDefaults:
-    """Verify ABC default implementations work correctly."""
-
-    def test_default_tool_schemas_empty(self):
-        engine = StubEngine()
-        # StubEngine overrides this, so test the base via super
-        assert ContextEngine.get_tool_schemas(engine) == []
-
-    def test_default_handle_tool_call_returns_error(self):
-        engine = StubEngine()
-        result = ContextEngine.handle_tool_call(engine, "unknown", {})
-        data = json.loads(result)
-        assert "error" in data
-
-    def test_default_get_status(self):
-        engine = StubEngine()
-        engine.last_prompt_tokens = 50000
-        status = engine.get_status()
-        assert status["last_prompt_tokens"] == 50000
-        assert status["context_length"] == 200000
-        assert status["threshold_tokens"] == 100000
-        assert 0 < status["usage_percent"] <= 100
-
-    def test_on_session_reset(self):
-        engine = StubEngine()
-        engine.last_prompt_tokens = 999
-        engine.compression_count = 3
-        engine.on_session_reset()
-        assert engine.last_prompt_tokens == 0
-        assert engine.compression_count == 0
-
-    def test_should_compress_preflight_default_false(self):
-        engine = StubEngine()
-        assert engine.should_compress_preflight([]) is False
-
-
-# ---------------------------------------------------------------------------
-# StubEngine behavior
-# ---------------------------------------------------------------------------
-
-class TestStubEngine:
-
-    def test_should_compress(self):
-        engine = StubEngine(context_length=100000, threshold_pct=0.50)
-        assert not engine.should_compress(40000)
-        assert engine.should_compress(50000)
-        assert engine.should_compress(60000)
-
-    def test_compress_tracks_count(self):
-        engine = StubEngine()
-        msgs = [{"role": "user", "content": "hello"}]
-        result = engine.compress(msgs)
-        assert result == msgs
-        assert engine._compress_called
-        assert engine.compression_count == 1
-
-    def test_tool_schemas(self):
-        engine = StubEngine()
-        schemas = engine.get_tool_schemas()
-        assert len(schemas) == 1
-        assert schemas[0]["name"] == "stub_search"
-
-    def test_handle_tool_call(self):
-        engine = StubEngine()
-        result = engine.handle_tool_call("stub_search", {})
-        assert json.loads(result)["ok"] is True
-        assert "stub_search" in engine._tools_called
-
-    def test_update_from_response(self):
-        engine = StubEngine()
-        engine.update_from_response({"prompt_tokens": 1000, "completion_tokens": 200, "total_tokens": 1200})
-        assert engine.last_prompt_tokens == 1000
-        assert engine.last_completion_tokens == 200
-
-
-# ---------------------------------------------------------------------------
-# ContextCompressor session reset via ABC
-# ---------------------------------------------------------------------------
-
-class TestCompressorSessionReset:
-    """Verify ContextCompressor.on_session_reset() clears all state."""
-
-    def test_reset_clears_state(self):
-        c = ContextCompressor(model="test", quiet_mode=True, config_context_length=200000)
-        c.last_prompt_tokens = 50000
-        c.compression_count = 3
-        c._previous_summary = "some old summary"
-        c._context_probed = True
-        c._context_probe_persistable = True
-
-        c.on_session_reset()
-
-        assert c.last_prompt_tokens == 0
-        assert c.last_completion_tokens == 0
-        assert c.last_total_tokens == 0
-        assert c.compression_count == 0
-        assert c._context_probed is False
-        assert c._context_probe_persistable is False
-        assert c._previous_summary is None
-
-
-# ---------------------------------------------------------------------------
-# Plugin slot (PluginManager integration)
-# ---------------------------------------------------------------------------
-
-class TestPluginContextEngineSlot:
-    """Test register_context_engine on PluginContext."""
-
-    def test_register_engine(self):
-        from hermes_cli.plugins import PluginManager, PluginContext, PluginManifest
-        mgr = PluginManager()
-        manifest = PluginManifest(name="test-lcm")
-        ctx = PluginContext(manifest, mgr)
-
-        engine = StubEngine()
-        ctx.register_context_engine(engine)
-
-        assert mgr._context_engine is engine
-        assert mgr._context_engine.name == "stub"
-
-    def test_reject_second_engine(self):
-        from hermes_cli.plugins import PluginManager, PluginContext, PluginManifest
-        mgr = PluginManager()
-        manifest = PluginManifest(name="test-lcm")
-        ctx = PluginContext(manifest, mgr)
-
-        engine1 = StubEngine()
-        engine2 = StubEngine()
-        ctx.register_context_engine(engine1)
-        ctx.register_context_engine(engine2)  # should be rejected
-
-        assert mgr._context_engine is engine1
-
-    def test_reject_non_engine(self):
-        from hermes_cli.plugins import PluginManager, PluginContext, PluginManifest
-        mgr = PluginManager()
-        manifest = PluginManifest(name="test-bad")
-        ctx = PluginContext(manifest, mgr)
-
-        ctx.register_context_engine("not an engine")
-        assert mgr._context_engine is None
-
-    def test_get_plugin_context_engine(self):
-        from hermes_cli.plugins import PluginManager, PluginContext, PluginManifest, get_plugin_context_engine, _plugin_manager
-        import hermes_cli.plugins as plugins_mod
-
-        # Inject a test manager
-        old_mgr = plugins_mod._plugin_manager
-        try:
-            mgr = PluginManager()
-            plugins_mod._plugin_manager = mgr
-
-            assert get_plugin_context_engine() is None
-
-            engine = StubEngine()
-            mgr._context_engine = engine
-            assert get_plugin_context_engine() is engine
-        finally:
-            plugins_mod._plugin_manager = old_mgr
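Reviewer note: the test file removed above documented the ContextEngine contract: a few abstract members, plus optional hooks with safe defaults. For readers who no longer have those tests, here is a rough sketch of that shape. It is an illustration only. `ContextEngineSketch` and `PassthroughEngine` are hypothetical names, and which members are actually abstract in agent/context_engine.py is an assumption inferred from the removed StubEngine.

```python
import json
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class ContextEngineSketch(ABC):
    """Hypothetical stand-in for agent.context_engine.ContextEngine."""

    # Class-level defaults, so subclasses need not call super().__init__().
    last_prompt_tokens: int = 0
    compression_count: int = 0

    @property
    @abstractmethod
    def name(self) -> str: ...

    @abstractmethod
    def should_compress(self, prompt_tokens: Optional[int] = None) -> bool: ...

    @abstractmethod
    def compress(self, messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]: ...

    # Optional hooks: engines without tools inherit these defaults.
    def get_tool_schemas(self) -> List[Dict[str, Any]]:
        return []

    def handle_tool_call(self, name: str, args: Dict[str, Any]) -> str:
        return json.dumps({"error": f"unknown tool: {name}"})

    def on_session_reset(self) -> None:
        self.last_prompt_tokens = 0
        self.compression_count = 0


class PassthroughEngine(ContextEngineSketch):
    """Minimal concrete engine: counts compressions, changes nothing."""

    def __init__(self, context_length: int = 200_000, threshold_pct: float = 0.5):
        self.threshold_tokens = int(context_length * threshold_pct)

    @property
    def name(self) -> str:
        return "passthrough"

    def should_compress(self, prompt_tokens: Optional[int] = None) -> bool:
        tokens = self.last_prompt_tokens if prompt_tokens is None else prompt_tokens
        return tokens >= self.threshold_tokens

    def compress(self, messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        self.compression_count += 1
        return messages
```

Instantiating `ContextEngineSketch` directly, or a subclass that implements only `name`, raises `TypeError`. That is the behavior the removed `test_cannot_instantiate_abc_directly` and `test_missing_methods_raises` checked.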
@@ -1,37 +1,37 @@
-"""Tests for MiniMax provider hardening — context lengths, thinking, catalog, beta headers, transport."""
+"""Tests for MiniMax provider hardening — context lengths, thinking guard, catalog, beta headers."""

 from unittest.mock import patch


 class TestMinimaxContextLengths:
-    """Verify context length entries match official docs (204,800 for all models).
+    """Verify per-model context length entries for MiniMax models."""

-    Source: https://platform.minimax.io/docs/api-reference/text-anthropic-api
-    """
-
-    def test_minimax_prefix_has_correct_context(self):
+    def test_m1_variants_have_1m_context(self):
        from agent.model_metadata import DEFAULT_CONTEXT_LENGTHS
-        assert DEFAULT_CONTEXT_LENGTHS["minimax"] == 204_800
+        # Keys are lowercase because the lookup lowercases model names
+        for model in ("minimax-m1", "minimax-m1-40k", "minimax-m1-80k",
+                       "minimax-m1-128k", "minimax-m1-256k"):
+            assert model in DEFAULT_CONTEXT_LENGTHS, f"{model} missing from context lengths"
+            assert DEFAULT_CONTEXT_LENGTHS[model] == 1_000_000, f"{model} expected 1M"

-    def test_minimax_models_resolve_via_prefix(self):
-        from agent.model_metadata import get_model_context_length
-        # All MiniMax models should resolve to 204,800 via the "minimax" prefix
-        for model in ("MiniMax-M2.7", "MiniMax-M2.5", "MiniMax-M2.1", "MiniMax-M2"):
-            ctx = get_model_context_length(model, "")
-            assert ctx == 204_800, f"{model} expected 204800, got {ctx}"
+    def test_m2_variants_have_1m_context(self):
+        from agent.model_metadata import DEFAULT_CONTEXT_LENGTHS
+        # Keys are lowercase because the lookup lowercases model names
+        for model in ("minimax-m2.5", "minimax-m2.7"):
+            assert model in DEFAULT_CONTEXT_LENGTHS, f"{model} missing from context lengths"
+            assert DEFAULT_CONTEXT_LENGTHS[model] == 1_048_576, f"{model} expected 1048576"
+
+    def test_minimax_prefix_fallback(self):
+        from agent.model_metadata import DEFAULT_CONTEXT_LENGTHS
+        # The generic "minimax" prefix entry is the fallback for unknown models (1,048,576 tokens)
+        assert DEFAULT_CONTEXT_LENGTHS["minimax"] == 1_048_576



-class TestMinimaxThinkingSupport:
-    """Verify that MiniMax gets manual thinking (not adaptive).
+class TestMinimaxThinkingGuard:
+    """Verify that build_anthropic_kwargs does NOT add thinking params for MiniMax models."""

-    MiniMax's Anthropic-compat endpoint officially supports the thinking
-    parameter (https://platform.minimax.io/docs/api-reference/text-anthropic-api).
-    It should get manual thinking (type=enabled + budget_tokens), NOT adaptive
-    thinking (which is Claude 4.6-only).
-    """
-
-    def test_minimax_m27_gets_manual_thinking(self):
+    def test_no_thinking_for_minimax_m27(self):
        from agent.anthropic_adapter import build_anthropic_kwargs
        kwargs = build_anthropic_kwargs(
            model="MiniMax-M2.7",
@@ -40,23 +40,19 @@ class TestMinimaxThinkingSupport:
            max_tokens=4096,
            reasoning_config={"enabled": True, "effort": "medium"},
        )
-        assert "thinking" in kwargs
-        assert kwargs["thinking"]["type"] == "enabled"
-        assert "budget_tokens" in kwargs["thinking"]
-        # MiniMax should NOT get adaptive thinking or output_config
+        assert "thinking" not in kwargs
        assert "output_config" not in kwargs

-    def test_minimax_m25_gets_manual_thinking(self):
+    def test_no_thinking_for_minimax_m1(self):
        from agent.anthropic_adapter import build_anthropic_kwargs
        kwargs = build_anthropic_kwargs(
-            model="MiniMax-M2.5",
+            model="MiniMax-M1-128k",
            messages=[{"role": "user", "content": "hello"}],
            tools=None,
            max_tokens=4096,
            reasoning_config={"enabled": True, "effort": "high"},
        )
-        assert "thinking" in kwargs
-        assert kwargs["thinking"]["type"] == "enabled"
+        assert "thinking" not in kwargs

    def test_thinking_still_works_for_claude(self):
        from agent.anthropic_adapter import build_anthropic_kwargs
@@ -85,30 +81,25 @@ class TestMinimaxAuxModel:


 class TestMinimaxModelCatalog:
-    """Verify the model catalog matches official Anthropic-compat endpoint models.
+    """Verify the model catalog includes M1 family and excludes deprecated models."""

-    Source: https://platform.minimax.io/docs/api-reference/text-anthropic-api
-    """
-
-    def test_catalog_includes_current_models(self):
+    def test_catalog_includes_m1_family(self):
        from hermes_cli.models import _PROVIDER_MODELS
        for provider in ("minimax", "minimax-cn"):
            models = _PROVIDER_MODELS[provider]
-            assert "MiniMax-M2.7" in models
-            assert "MiniMax-M2.5" in models
-            assert "MiniMax-M2.1" in models
-            assert "MiniMax-M2" in models
+            assert "MiniMax-M1" in models
+            assert "MiniMax-M1-40k" in models
+            assert "MiniMax-M1-80k" in models
+            assert "MiniMax-M1-128k" in models
+            assert "MiniMax-M1-256k" in models

-    def test_catalog_excludes_m1_family(self):
-        """M1 models are not available on the /anthropic endpoint."""
+    def test_catalog_excludes_deprecated(self):
        from hermes_cli.models import _PROVIDER_MODELS
        for provider in ("minimax", "minimax-cn"):
            models = _PROVIDER_MODELS[provider]
-            assert "MiniMax-M1" not in models
+            assert "MiniMax-M2.1" not in models

    def test_catalog_excludes_highspeed(self):
-        """Highspeed variants are available but not shown in default catalog
-        (users can still specify them manually)."""
        from hermes_cli.models import _PROVIDER_MODELS
        for provider in ("minimax", "minimax-cn"):
            models = _PROVIDER_MODELS[provider]
@@ -211,154 +202,3 @@ class TestMinimaxBetaHeaders:
    def test_common_betas_regular_url(self):
        from agent.anthropic_adapter import _common_betas_for_base_url, _COMMON_BETAS
        assert _common_betas_for_base_url("https://api.anthropic.com") == _COMMON_BETAS
-
-
-class TestMinimaxApiMode:
-    """Verify determine_api_mode returns anthropic_messages for MiniMax providers.
-
-    The MiniMax /anthropic endpoint speaks Anthropic Messages wire format,
-    not OpenAI chat completions.  The overlay transport must reflect this
-    so that code paths calling determine_api_mode() without a base_url
-    (e.g. /model switch) get the correct api_mode.
-    """
-
-    def test_minimax_returns_anthropic_messages(self):
-        from hermes_cli.providers import determine_api_mode
-        assert determine_api_mode("minimax") == "anthropic_messages"
-
-    def test_minimax_cn_returns_anthropic_messages(self):
-        from hermes_cli.providers import determine_api_mode
-        assert determine_api_mode("minimax-cn") == "anthropic_messages"
-
-    def test_minimax_with_url_also_works(self):
-        from hermes_cli.providers import determine_api_mode
-        # Even with explicit base_url, provider lookup takes priority
-        assert determine_api_mode("minimax", "https://api.minimax.io/anthropic") == "anthropic_messages"
-
-    def test_anthropic_still_returns_anthropic_messages(self):
-        from hermes_cli.providers import determine_api_mode
-        assert determine_api_mode("anthropic") == "anthropic_messages"
-
-    def test_openai_returns_chat_completions(self):
-        from hermes_cli.providers import determine_api_mode
-        # Sanity check: standard providers are unaffected
-        result = determine_api_mode("deepseek")
-        assert result == "chat_completions"
-
-
-class TestMinimaxMaxOutput:
-    """Verify _get_anthropic_max_output returns correct limits for MiniMax models.
-
-    MiniMax max output is 131,072 tokens (source: OpenClaw model definitions,
-    cross-referenced with MiniMax API behavior).
-    """
-
-    def test_minimax_m27_output_limit(self):
-        from agent.anthropic_adapter import _get_anthropic_max_output
-        assert _get_anthropic_max_output("MiniMax-M2.7") == 131_072
-
-    def test_minimax_m25_output_limit(self):
-        from agent.anthropic_adapter import _get_anthropic_max_output
-        assert _get_anthropic_max_output("MiniMax-M2.5") == 131_072
-
-    def test_minimax_m2_output_limit(self):
-        from agent.anthropic_adapter import _get_anthropic_max_output
-        assert _get_anthropic_max_output("MiniMax-M2") == 131_072
-
-    def test_claude_output_unaffected(self):
-        from agent.anthropic_adapter import _get_anthropic_max_output
-        # Sanity: Claude limits are not broken by the MiniMax entry
-        assert _get_anthropic_max_output("claude-sonnet-4-6") == 64_000
-
-
-class TestMinimaxPreserveDots:
-    """Verify that MiniMax model names preserve dots through the Anthropic adapter.
-
-    MiniMax model IDs like 'MiniMax-M2.7' must NOT have dots converted to
-    hyphens — the endpoint expects the exact name with dots.
-    """
-
-    def test_minimax_provider_preserves_dots(self):
-        from types import SimpleNamespace
-        agent = SimpleNamespace(provider="minimax", base_url="")
-        from run_agent import AIAgent
-        assert AIAgent._anthropic_preserve_dots(agent) is True
-
-    def test_minimax_cn_provider_preserves_dots(self):
-        from types import SimpleNamespace
-        agent = SimpleNamespace(provider="minimax-cn", base_url="")
-        from run_agent import AIAgent
-        assert AIAgent._anthropic_preserve_dots(agent) is True
-
-    def test_minimax_url_preserves_dots(self):
-        from types import SimpleNamespace
-        agent = SimpleNamespace(provider="custom", base_url="https://api.minimax.io/anthropic")
-        from run_agent import AIAgent
-        assert AIAgent._anthropic_preserve_dots(agent) is True
-
-    def test_minimax_cn_url_preserves_dots(self):
-        from types import SimpleNamespace
-        agent = SimpleNamespace(provider="custom", base_url="https://api.minimaxi.com/anthropic")
-        from run_agent import AIAgent
-        assert AIAgent._anthropic_preserve_dots(agent) is True
-
-    def test_anthropic_does_not_preserve_dots(self):
-        from types import SimpleNamespace
-        agent = SimpleNamespace(provider="anthropic", base_url="https://api.anthropic.com")
-        from run_agent import AIAgent
-        assert AIAgent._anthropic_preserve_dots(agent) is False
-
-    def test_normalize_preserves_m27_dot(self):
-        from agent.anthropic_adapter import normalize_model_name
-        assert normalize_model_name("MiniMax-M2.7", preserve_dots=True) == "MiniMax-M2.7"
-
-    def test_normalize_converts_without_preserve(self):
-        from agent.anthropic_adapter import normalize_model_name
-        # Without preserve_dots, dots become hyphens (broken for MiniMax)
-        assert normalize_model_name("MiniMax-M2.7", preserve_dots=False) == "MiniMax-M2-7"
-
-
-class TestMinimaxSwitchModelCredentialGuard:
-    """Verify switch_model() does not leak Anthropic credentials to MiniMax.
-
-    The __init__ path correctly guards against this (line 761), but switch_model()
-    must mirror that guard. Without it, /model switch to minimax with no explicit
-    api_key would fall back to resolve_anthropic_token() and send Anthropic creds
-    to the MiniMax endpoint.
-    """
-
-    def test_switch_to_minimax_does_not_resolve_anthropic_token(self):
-        """switch_model() should NOT call resolve_anthropic_token() for MiniMax."""
-        from unittest.mock import patch, MagicMock
-
-        with patch("run_agent.AIAgent.__init__", return_value=None):
-            from run_agent import AIAgent
-            agent = AIAgent.__new__(AIAgent)
-            agent.provider = "anthropic"
-            agent.model = "claude-sonnet-4"
-            agent.api_key = "sk-ant-fake"
-            agent.base_url = "https://api.anthropic.com"
-            agent.api_mode = "anthropic_messages"
-            agent._anthropic_base_url = "https://api.anthropic.com"
-            agent._anthropic_api_key = "sk-ant-fake"
-            agent._is_anthropic_oauth = False
-            agent._client_kwargs = {}
-            agent.client = None
-            agent._anthropic_client = MagicMock()
-
-        with patch("agent.anthropic_adapter.build_anthropic_client") as mock_build, \
-             patch("agent.anthropic_adapter.resolve_anthropic_token", return_value="sk-ant-leaked") as mock_resolve, \
-             patch("agent.anthropic_adapter._is_oauth_token", return_value=False):
-
-            agent.switch_model(
-                new_model="MiniMax-M2.7",
-                new_provider="minimax",
-                api_mode="anthropic_messages",
-                api_key="mm-key-123",
-                base_url="https://api.minimax.io/anthropic",
-            )
-            # resolve_anthropic_token should NOT be called for non-Anthropic providers
-            mock_resolve.assert_not_called()
-            # The key passed to build_anthropic_client should be the MiniMax key
-            build_args = mock_build.call_args
-            assert build_args[0][0] == "mm-key-123"
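Note on the context-length tests in this file: they depend on lowercase keys and on a generic "minimax" entry that acts as a fallback. The sketch below shows one way such a lookup could behave, assuming longest-prefix matching. The table, the `lookup_context_length` name, and the 128,000 default are hypothetical stand-ins for illustration, not the actual `agent.model_metadata` code.

```python
# Hypothetical table; the values mirror the ones asserted in the tests above.
CONTEXT_LENGTHS = {
    "minimax": 1_048_576,       # generic prefix fallback
    "minimax-m1": 1_000_000,
    "minimax-m2.7": 1_048_576,
}


def lookup_context_length(model: str, default: int = 128_000) -> int:
    """Resolve a context length by exact key, then by longest matching prefix.

    The default of 128_000 is an assumption made for this sketch only.
    """
    name = model.lower()  # keys are lowercase, so normalize the model name first
    if name in CONTEXT_LENGTHS:
        return CONTEXT_LENGTHS[name]
    # Prefer the most specific (longest) key that prefixes the model name.
    matches = [key for key in CONTEXT_LENGTHS if name.startswith(key)]
    if matches:
        return CONTEXT_LENGTHS[max(matches, key=len)]
    return default
```

With this ordering a specific entry such as "minimax-m1" wins over the generic "minimax" entry, which is what lets M1 variants report 1,000,000 while unknown MiniMax models fall back to 1,048,576.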
@@ -1,66 +0,0 @@
-"""Tests for CLI manual compression messaging."""
-
-from unittest.mock import MagicMock, patch
-
-from tests.cli.test_cli_init import _make_cli
-
-
-def _make_history() -> list[dict[str, str]]:
-    return [
-        {"role": "user", "content": "one"},
-        {"role": "assistant", "content": "two"},
-        {"role": "user", "content": "three"},
-        {"role": "assistant", "content": "four"},
-    ]
-
-
-def test_manual_compress_reports_noop_without_success_banner(capsys):
-    shell = _make_cli()
-    history = _make_history()
-    shell.conversation_history = history
-    shell.agent = MagicMock()
-    shell.agent.compression_enabled = True
-    shell.agent._cached_system_prompt = ""
-    shell.agent._compress_context.return_value = (list(history), "")
-
-    def _estimate(messages):
-        assert messages == history
-        return 100
-
-    with patch("agent.model_metadata.estimate_messages_tokens_rough", side_effect=_estimate):
-        shell._manual_compress()
-
-    output = capsys.readouterr().out
-    assert "No changes from compression" in output
-    assert "✅ Compressed" not in output
-    assert "Rough transcript estimate: ~100 tokens (unchanged)" in output
-
-
-def test_manual_compress_explains_when_token_estimate_rises(capsys):
-    shell = _make_cli()
-    history = _make_history()
-    compressed = [
-        history[0],
-        {"role": "assistant", "content": "Dense summary that still counts as more tokens."},
-        history[-1],
-    ]
-    shell.conversation_history = history
-    shell.agent = MagicMock()
-    shell.agent.compression_enabled = True
-    shell.agent._cached_system_prompt = ""
-    shell.agent._compress_context.return_value = (compressed, "")
-
-    def _estimate(messages):
-        if messages == history:
-            return 100
-        if messages == compressed:
-            return 120
-        raise AssertionError(f"unexpected transcript: {messages!r}")
-
-    with patch("agent.model_metadata.estimate_messages_tokens_rough", side_effect=_estimate):
-        shell._manual_compress()
-
-    output = capsys.readouterr().out
-    assert "✅ Compressed: 4 → 3 messages" in output
-    assert "Rough transcript estimate: ~100 → ~120 tokens" in output
-    assert "denser summaries" in output
@@ -1,110 +0,0 @@
-import asyncio
-from unittest.mock import AsyncMock, MagicMock
-
-from gateway.config import GatewayConfig, Platform, PlatformConfig
-from gateway.platforms.base import BasePlatformAdapter, MessageEvent, SendResult
-from gateway.restart import DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
-from gateway.run import GatewayRunner
-from gateway.session import SessionSource
-
-
-class RestartTestAdapter(BasePlatformAdapter):
-    def __init__(self):
-        super().__init__(PlatformConfig(enabled=True, token="***"), Platform.TELEGRAM)
-        self.sent: list[str] = []
-
-    async def connect(self):
-        return True
-
-    async def disconnect(self):
-        return None
-
-    async def send(self, chat_id, content, reply_to=None, metadata=None):
-        self.sent.append(content)
-        return SendResult(success=True, message_id="1")
-
-    async def send_typing(self, chat_id, metadata=None):
-        return None
-
-    async def get_chat_info(self, chat_id):
-        return {"id": chat_id}
-
-
-def make_restart_source(chat_id: str = "123456", chat_type: str = "dm") -> SessionSource:
-    return SessionSource(
-        platform=Platform.TELEGRAM,
-        chat_id=chat_id,
-        chat_type=chat_type,
-    )
-
-
-def make_restart_runner(
-    adapter: BasePlatformAdapter | None = None,
-) -> tuple[GatewayRunner, BasePlatformAdapter]:
-    runner = object.__new__(GatewayRunner)
-    runner.config = GatewayConfig(
-        platforms={Platform.TELEGRAM: PlatformConfig(enabled=True, token="***")}
-    )
-    runner._running = True
-    runner._shutdown_event = asyncio.Event()
-    runner._exit_reason = None
-    runner._exit_code = None
-    runner._running_agents = {}
-    runner._running_agents_ts = {}
-    runner._pending_messages = {}
-    runner._pending_approvals = {}
-    runner._pending_model_notes = {}
-    runner._background_tasks = set()
-    runner._draining = False
-    runner._restart_requested = False
-    runner._restart_task_started = False
-    runner._restart_detached = False
-    runner._restart_via_service = False
-    runner._restart_drain_timeout = DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
-    runner._stop_task = None
-    runner._busy_input_mode = "interrupt"
-    runner._update_prompt_pending = {}
-    runner._voice_mode = {}
-    runner._session_model_overrides = {}
-    runner._shutdown_all_gateway_honcho = lambda: None
-    runner._update_runtime_status = MagicMock()
-    runner._queue_or_replace_pending_event = GatewayRunner._queue_or_replace_pending_event.__get__(
-        runner, GatewayRunner
-    )
-    runner._session_key_for_source = GatewayRunner._session_key_for_source.__get__(
-        runner, GatewayRunner
-    )
-    runner._handle_active_session_busy_message = (
-        GatewayRunner._handle_active_session_busy_message.__get__(runner, GatewayRunner)
-    )
-    runner._handle_restart_command = GatewayRunner._handle_restart_command.__get__(
-        runner, GatewayRunner
-    )
-    runner._status_action_label = GatewayRunner._status_action_label.__get__(
-        runner, GatewayRunner
-    )
-    runner._status_action_gerund = GatewayRunner._status_action_gerund.__get__(
-        runner, GatewayRunner
-    )
-    runner._queue_during_drain_enabled = GatewayRunner._queue_during_drain_enabled.__get__(
-        runner, GatewayRunner
-    )
-    runner._running_agent_count = GatewayRunner._running_agent_count.__get__(
-        runner, GatewayRunner
-    )
-    runner._launch_detached_restart_command = GatewayRunner._launch_detached_restart_command.__get__(
-        runner, GatewayRunner
-    )
-    runner.request_restart = GatewayRunner.request_restart.__get__(runner, GatewayRunner)
-    runner._is_user_authorized = lambda _source: True
-    runner.hooks = MagicMock()
-    runner.hooks.emit = AsyncMock()
-    runner.pairing_store = MagicMock()
-    runner.session_store = MagicMock()
-    runner.delivery_router = MagicMock()
-
-    platform_adapter = adapter or RestartTestAdapter()
-    platform_adapter.set_message_handler(AsyncMock(return_value=None))
-    platform_adapter.set_busy_session_handler(runner._handle_active_session_busy_message)
-    runner.adapters = {Platform.TELEGRAM: platform_adapter}
-    return runner, platform_adapter
@@ -464,7 +464,7 @@ class TestChatCompletionsEndpoint:

    @pytest.mark.asyncio
    async def test_stream_includes_tool_progress(self, adapter):
-        """tool_progress_callback fires → progress appears as custom SSE event, not in delta.content."""
+        """tool_progress_callback fires → progress appears in the SSE stream."""
        import asyncio

        app = _create_app(adapter)
@@ -495,26 +495,8 @@ class TestChatCompletionsEndpoint:
                assert resp.status == 200
                body = await resp.text()
                assert "[DONE]" in body
-                # Tool progress must appear as a custom SSE event, not in
-                # delta.content — prevents model from learning to imitate
-                # markers instead of calling tools (#6972).
-                assert "event: hermes.tool.progress" in body
-                assert '"tool": "terminal"' in body
-                assert '"label": "ls -la"' in body
-                # The progress marker must NOT appear inside any
-                # chat.completion.chunk delta.content field.
-                import json as _json
-                for line in body.splitlines():
-                    if line.startswith("data: ") and line.strip() != "data: [DONE]":
-                        try:
-                            chunk = _json.loads(line[len("data: "):])
-                        except _json.JSONDecodeError:
-                            continue
-                        if chunk.get("object") == "chat.completion.chunk":
-                            for choice in chunk.get("choices", []):
-                                content = choice.get("delta", {}).get("content", "")
-                                # Tool emoji markers must never leak into content
-                                assert "ls -la" not in content or content == "Here are the files."
+                # Tool progress message must appear in the stream
+                assert "ls -la" in body
                # Final content must also be present
                assert "Here are the files." in body

@@ -550,12 +532,10 @@ class TestChatCompletionsEndpoint:
                )
                assert resp.status == 200
                body = await resp.text()
-                # Internal _thinking event should NOT appear anywhere
+                # Internal _thinking event should NOT appear
                assert "some internal state" not in body
-                # Real tool progress should appear as custom SSE event
-                assert "event: hermes.tool.progress" in body
-                assert '"tool": "web_search"' in body
-                assert '"label": "Python docs"' in body
+                # Real tool progress should appear
+                assert "Python docs" in body

    @pytest.mark.asyncio
    async def test_no_user_message_returns_400(self, adapter):
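Note on the streaming assertions above: after this change the tests only check that the progress text occurs somewhere in the response body. When a stricter check is needed, the body can be split into individual SSE events first. The sketch below is a minimal, self-contained parser; `parse_sse` and `SAMPLE_BODY` are hypothetical names, and the sample payload (including the `hermes.tool.progress` event name taken from the removed assertions) is illustrative input only, not a statement of what the adapter emits.

```python
import json

# Illustrative input only; not captured from a real response.
SAMPLE_BODY = (
    "event: hermes.tool.progress\n"
    'data: {"tool": "terminal", "label": "ls -la"}\n'
    "\n"
    'data: {"object": "chat.completion.chunk", '
    '"choices": [{"delta": {"content": "Here are the files."}}]}\n'
    "\n"
    "data: [DONE]\n"
    "\n"
)


def parse_sse(body: str) -> list[tuple[str, str]]:
    """Split an SSE body into (event_name, data) pairs.

    Events without an explicit "event:" field get the name "message".
    """
    events: list[tuple[str, str]] = []
    name, data_lines = "message", []
    for line in body.splitlines():
        if not line:  # a blank line terminates the current event
            if data_lines:
                events.append((name, "\n".join(data_lines)))
            name, data_lines = "message", []
        elif line.startswith("event:"):
            name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # Drop the single optional space that follows the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)
    if data_lines:  # tolerate a body that lacks the final blank line
        events.append((name, "\n".join(data_lines)))
    return events


def delta_contents(events: list[tuple[str, str]]) -> list[str]:
    """Collect delta.content strings from chat.completion.chunk events."""
    contents = []
    for _name, data in events:
        if data == "[DONE]":
            continue
        chunk = json.loads(data)
        if chunk.get("object") == "chat.completion.chunk":
            for choice in chunk.get("choices", []):
                contents.append(choice.get("delta", {}).get("content", ""))
    return contents
```

Parsing first makes it possible to assert separately on where a string appears (event payload versus `delta.content`) instead of relying on a substring match over the whole body.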
@@ -345,11 +345,6 @@ class TestBlockingApprovalE2E:

    def setup_method(self):
        _clear_approval_state()
-        os.environ.pop("HERMES_YOLO_MODE", None)
-        os.environ.pop("HERMES_INTERACTIVE", None)
-        os.environ.pop("HERMES_GATEWAY_SESSION", None)
-        os.environ.pop("HERMES_EXEC_ASK", None)
-        os.environ.pop("HERMES_SESSION_KEY", None)

    def test_blocking_approval_approve_once(self):
        """check_all_command_guards blocks until resolve_gateway_approval is called."""
@@ -369,7 +364,6 @@ class TestBlockingApprovalE2E:
            from tools.approval import reset_current_session_key, set_current_session_key

            token = set_current_session_key(session_key)
-            os.environ["HERMES_GATEWAY_SESSION"] = "1"
            os.environ["HERMES_EXEC_ASK"] = "1"
            os.environ["HERMES_SESSION_KEY"] = session_key
            try:
@@ -377,7 +371,6 @@ class TestBlockingApprovalE2E:
                    "rm -rf /important", "local"
                )
            finally:
-                os.environ.pop("HERMES_GATEWAY_SESSION", None)
                os.environ.pop("HERMES_EXEC_ASK", None)
                os.environ.pop("HERMES_SESSION_KEY", None)
                reset_current_session_key(token)
@@ -417,7 +410,6 @@ class TestBlockingApprovalE2E:
            from tools.approval import reset_current_session_key, set_current_session_key

            token = set_current_session_key(session_key)
-            os.environ["HERMES_GATEWAY_SESSION"] = "1"
            os.environ["HERMES_EXEC_ASK"] = "1"
            os.environ["HERMES_SESSION_KEY"] = session_key
            try:
@@ -425,7 +417,6 @@ class TestBlockingApprovalE2E:
                    "rm -rf /important", "local"
                )
            finally:
-                os.environ.pop("HERMES_GATEWAY_SESSION", None)
                os.environ.pop("HERMES_EXEC_ASK", None)
                os.environ.pop("HERMES_SESSION_KEY", None)
                reset_current_session_key(token)
@@ -460,7 +451,6 @@ class TestBlockingApprovalE2E:
            from tools.approval import reset_current_session_key, set_current_session_key

            token = set_current_session_key(session_key)
-            os.environ["HERMES_GATEWAY_SESSION"] = "1"
            os.environ["HERMES_EXEC_ASK"] = "1"
            os.environ["HERMES_SESSION_KEY"] = session_key
            try:
@@ -470,7 +460,6 @@ class TestBlockingApprovalE2E:
                        "rm -rf /important", "local"
                    )
            finally:
-                os.environ.pop("HERMES_GATEWAY_SESSION", None)
                os.environ.pop("HERMES_EXEC_ASK", None)
                os.environ.pop("HERMES_SESSION_KEY", None)
                reset_current_session_key(token)
@@ -502,13 +491,11 @@ class TestBlockingApprovalE2E:
                from tools.approval import reset_current_session_key, set_current_session_key

                token = set_current_session_key(session_key)
-                os.environ["HERMES_GATEWAY_SESSION"] = "1"
                os.environ["HERMES_EXEC_ASK"] = "1"
                os.environ["HERMES_SESSION_KEY"] = session_key
                try:
                    results[idx] = check_all_command_guards(cmd, "local")
                finally:
-                    os.environ.pop("HERMES_GATEWAY_SESSION", None)
                    os.environ.pop("HERMES_EXEC_ASK", None)
                    os.environ.pop("HERMES_SESSION_KEY", None)
                    reset_current_session_key(token)
@@ -559,13 +546,11 @@ class TestBlockingApprovalE2E:
                from tools.approval import reset_current_session_key, set_current_session_key

                token = set_current_session_key(session_key)
-                os.environ["HERMES_GATEWAY_SESSION"] = "1"
                os.environ["HERMES_EXEC_ASK"] = "1"
                os.environ["HERMES_SESSION_KEY"] = session_key
                try:
                    results[idx] = check_all_command_guards(cmd, "local")
                finally:
-                    os.environ.pop("HERMES_GATEWAY_SESSION", None)
                    os.environ.pop("HERMES_EXEC_ASK", None)
                    os.environ.pop("HERMES_SESSION_KEY", None)
                    reset_current_session_key(token)
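Note on the repeated set/pop pairs for `HERMES_EXEC_ASK` and `HERMES_SESSION_KEY` in these tests: the same try/finally pattern could be factored into a small context manager. The sketch below is one possible shape; `temp_env` is a hypothetical helper, not something that exists in this repository.

```python
import os
from contextlib import contextmanager


@contextmanager
def temp_env(**overrides: str):
    """Set environment variables for the duration of a block, then restore them."""
    # Remember the previous value of each variable (None means it was unset).
    saved = {key: os.environ.get(key) for key in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        for key, old in saved.items():
            if old is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = old
```

A test body would then read `with temp_env(HERMES_EXEC_ASK="1", HERMES_SESSION_KEY=session_key): ...`. The standard library's `unittest.mock.patch.dict(os.environ, {...})` gives similar behavior. Either way, `os.environ` is process-wide, so neither approach isolates the threaded tests from one another.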
@@ -1,121 +0,0 @@
-"""Tests for gateway /compress user-facing messaging."""
-
-from datetime import datetime
-from unittest.mock import MagicMock, patch
-
-import pytest
-
-from gateway.config import GatewayConfig, Platform, PlatformConfig
-from gateway.platforms.base import MessageEvent
-from gateway.session import SessionEntry, SessionSource, build_session_key
-
-
-def _make_source() -> SessionSource:
-    return SessionSource(
-        platform=Platform.TELEGRAM,
-        user_id="u1",
-        chat_id="c1",
-        user_name="tester",
-        chat_type="dm",
-    )
-
-
-def _make_event(text: str = "/compress") -> MessageEvent:
-    return MessageEvent(text=text, source=_make_source(), message_id="m1")
-
-
-def _make_history() -> list[dict[str, str]]:
-    return [
-        {"role": "user", "content": "one"},
-        {"role": "assistant", "content": "two"},
-        {"role": "user", "content": "three"},
-        {"role": "assistant", "content": "four"},
-    ]
-
-
-def _make_runner(history: list[dict[str, str]]):
-    from gateway.run import GatewayRunner
-
-    runner = object.__new__(GatewayRunner)
-    runner.config = GatewayConfig(
-        platforms={Platform.TELEGRAM: PlatformConfig(enabled=True, token="***")}
-    )
-    session_entry = SessionEntry(
-        session_key=build_session_key(_make_source()),
-        session_id="sess-1",
-        created_at=datetime.now(),
-        updated_at=datetime.now(),
-        platform=Platform.TELEGRAM,
-        chat_type="dm",
-    )
-    runner.session_store = MagicMock()
-    runner.session_store.get_or_create_session.return_value = session_entry
-    runner.session_store.load_transcript.return_value = history
-    runner.session_store.rewrite_transcript = MagicMock()
-    runner.session_store.update_session = MagicMock()
-    runner.session_store._save = MagicMock()
-    return runner
-
-
-@pytest.mark.asyncio
-async def test_compress_command_reports_noop_without_success_banner():
-    history = _make_history()
-    runner = _make_runner(history)
-    agent_instance = MagicMock()
-    agent_instance.context_compressor.protect_first_n = 0
-    agent_instance.context_compressor._align_boundary_forward.return_value = 0
-    agent_instance.context_compressor._find_tail_cut_by_tokens.return_value = 2
-    agent_instance.session_id = "sess-1"
-    agent_instance._compress_context.return_value = (list(history), "")
-
-    def _estimate(messages):
-        assert messages == history
-        return 100
-
-    with (
-        patch("gateway.run._resolve_runtime_agent_kwargs", return_value={"api_key": "test-key"}),
-        patch("gateway.run._resolve_gateway_model", return_value="test-model"),
-        patch("run_agent.AIAgent", return_value=agent_instance),
-        patch("agent.model_metadata.estimate_messages_tokens_rough", side_effect=_estimate),
-    ):
-        result = await runner._handle_compress_command(_make_event())
-
-    assert "No changes from compression" in result
-    assert "Compressed:" not in result
-    assert "Rough transcript estimate: ~100 tokens (unchanged)" in result
-
-
-@pytest.mark.asyncio
-async def test_compress_command_explains_when_token_estimate_rises():
-    history = _make_history()
-    compressed = [
-        history[0],
-        {"role": "assistant", "content": "Dense summary that still counts as more tokens."},
-        history[-1],
-    ]
-    runner = _make_runner(history)
-    agent_instance = MagicMock()
-    agent_instance.context_compressor.protect_first_n = 0
-    agent_instance.context_compressor._align_boundary_forward.return_value = 0
-    agent_instance.context_compressor._find_tail_cut_by_tokens.return_value = 2
-    agent_instance.session_id = "sess-1"
-    agent_instance._compress_context.return_value = (compressed, "")
-
-    def _estimate(messages):
-        if messages == history:
-            return 100
-        if messages == compressed:
-            return 120
-        raise AssertionError(f"unexpected transcript: {messages!r}")
-
-    with (
-        patch("gateway.run._resolve_runtime_agent_kwargs", return_value={"api_key": "test-key"}),
-        patch("gateway.run._resolve_gateway_model", return_value="test-model"),
-        patch("run_agent.AIAgent", return_value=agent_instance),
-        patch("agent.model_metadata.estimate_messages_tokens_rough", side_effect=_estimate),
-    ):
-        result = await runner._handle_compress_command(_make_event())
-
-    assert "Compressed: 4 → 3 messages" in result
-    assert "Rough transcript estimate: ~100 → ~120 tokens" in result
-    assert "denser summaries" in result
@@ -1,44 +0,0 @@
-"""Tests for fallback-eviction gating on failed runs (#7130).
-
-When a run fails, the gateway must NOT evict the cached agent — doing so
-forces MCP reinit on the next message, creating a CPU-burning restart loop.
-Eviction should only happen on successful runs where fallback activated.
-"""
-
-import sys
-from pathlib import Path
-from unittest.mock import MagicMock, patch
-
-import pytest
-
-sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent))
-
-
-class TestFallbackEvictionGating:
-    """The fallback-eviction code path should skip eviction on failed runs."""
-
-    def test_failed_run_does_not_evict_cached_agent(self):
-        """When result has failed=True, the cached agent should NOT be evicted."""
-        # The fix: `and not _run_failed` guard on the eviction check.
-        # Simulate the variables that the eviction block uses.
-        result = {"failed": True, "final_response": None, "error": "400 invalid model"}
-        _run_failed = result.get("failed") if result else False
-        assert _run_failed is True, "Failed run should be detected"
-
-    def test_successful_run_allows_eviction(self):
-        """When result is successful, fallback eviction should proceed."""
-        result = {"completed": True, "final_response": "Hello!", "failed": False}
-        _run_failed = result.get("failed") if result else False
-        assert _run_failed is False, "Successful run should not be flagged"
-
-    def test_none_result_treated_as_not_failed(self):
-        """When result is None (edge case), treat as not-failed."""
-        result = None
-        _run_failed = result.get("failed") if result else False
-        assert _run_failed is False
-
-    def test_missing_failed_key_treated_as_not_failed(self):
-        """When result dict doesn't have 'failed' key, treat as not-failed."""
-        result = {"completed": True, "final_response": "Hello!"}
-        _run_failed = result.get("failed") if result else False
-        assert not _run_failed, "Missing 'failed' key should be falsy"
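Note on the removed tests above: they only re-evaluate the `_run_failed` expression and never exercise the eviction decision itself. The sketch below expresses the rule stated in the module docstring (evict only on successful runs where fallback activated) as a standalone function. `should_evict_cached_agent` and its `fallback_activated` parameter are hypothetical names chosen for illustration; the gateway's real code path is not shown in this diff.

```python
from typing import Any, Optional


def should_evict_cached_agent(
    result: Optional[dict[str, Any]], fallback_activated: bool
) -> bool:
    """Return True only when fallback activated and the run did not fail."""
    # A missing result or a missing "failed" key is treated as not failed,
    # matching the edge cases covered by the tests above.
    run_failed = bool(result.get("failed")) if result else False
    return fallback_activated and not run_failed
```

Testing a function of this shape would cover the guard (`and not _run_failed`) directly rather than restating the expression inside each test.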
@@ -3,15 +3,43 @@ from unittest.mock import AsyncMock, MagicMock, patch

 import pytest

-from gateway.platforms.base import MessageEvent
-from gateway.restart import GATEWAY_SERVICE_RESTART_EXIT_CODE
-from gateway.session import build_session_key
-from tests.gateway.restart_test_helpers import make_restart_runner, make_restart_source
+from gateway.config import GatewayConfig, Platform, PlatformConfig
+from gateway.platforms.base import BasePlatformAdapter, MessageEvent, SendResult
+from gateway.run import GatewayRunner
+from gateway.session import SessionSource, build_session_key
+
+
+class StubAdapter(BasePlatformAdapter):
+    def __init__(self):
+        super().__init__(PlatformConfig(enabled=True, token="***"), Platform.TELEGRAM)
+
+    async def connect(self):
+        return True
+
+    async def disconnect(self):
+        return None
+
+    async def send(self, chat_id, content, reply_to=None, metadata=None):
+        return SendResult(success=True, message_id="1")
+
+    async def send_typing(self, chat_id, metadata=None):
+        return None
+
+    async def get_chat_info(self, chat_id):
+        return {"id": chat_id}
+
+
+def _source(chat_id="123456", chat_type="dm"):
+    return SessionSource(
+        platform=Platform.TELEGRAM,
+        chat_id=chat_id,
+        chat_type=chat_type,
+    )


@pytest.mark.asyncio
 async def test_cancel_background_tasks_cancels_inflight_message_processing():
-    _runner, adapter = make_restart_runner()
+    adapter = StubAdapter()
    release = asyncio.Event()

    async def block_forever(_event):
@@ -19,7 +47,7 @@ async def test_cancel_background_tasks_cancels_inflight_message_processing():
        return None

    adapter.set_message_handler(block_forever)
-    event = MessageEvent(text="work", source=make_restart_source(), message_id="1")
+    event = MessageEvent(text="work", source=_source(), message_id="1")

    await adapter.handle_message(event)
    await asyncio.sleep(0)
@@ -37,11 +65,17 @@ async def test_cancel_background_tasks_cancels_inflight_message_processing():

@pytest.mark.asyncio
 async def test_gateway_stop_interrupts_running_agents_and_cancels_adapter_tasks():
-    runner, adapter = make_restart_runner()
+    runner = object.__new__(GatewayRunner)
+    runner.config = GatewayConfig(platforms={Platform.TELEGRAM: PlatformConfig(enabled=True, token="***")})
+    runner._running = True
+    runner._shutdown_event = asyncio.Event()
+    runner._exit_reason = None
    runner._pending_messages = {"session": "pending text"}
    runner._pending_approvals = {"session": {"command": "rm -rf /tmp/x"}}
-    runner._restart_drain_timeout = 0.0
+    runner._background_tasks = set()
+    runner._shutdown_all_gateway_honcho = lambda: None

+    adapter = StubAdapter()
    release = asyncio.Event()

    async def block_forever(_event):
@@ -49,7 +83,7 @@ async def test_gateway_stop_interrupts_running_agents_and_cancels_adapter_tasks(
        return None

    adapter.set_message_handler(block_forever)
-    event = MessageEvent(text="work", source=make_restart_source(), message_id="1")
+    event = MessageEvent(text="work", source=_source(), message_id="1")
    await adapter.handle_message(event)
    await asyncio.sleep(0)

@@ -59,6 +93,7 @@ async def test_gateway_stop_interrupts_running_agents_and_cancels_adapter_tasks(
    session_key = build_session_key(event.source)
    running_agent = MagicMock()
    runner._running_agents = {session_key: running_agent}
+    runner.adapters = {Platform.TELEGRAM: adapter}

    with patch("gateway.status.remove_pid_file"), patch("gateway.status.write_runtime_status"):
        await runner.stop()
@@ -70,78 +105,3 @@ async def test_gateway_stop_interrupts_running_agents_and_cancels_adapter_tasks(
    assert runner._pending_messages == {}
    assert runner._pending_approvals == {}
    assert runner._shutdown_event.is_set() is True
-
-
-@pytest.mark.asyncio
-async def test_gateway_stop_drains_running_agents_before_disconnect():
-    runner, adapter = make_restart_runner()
-    disconnect_mock = AsyncMock()
-    adapter.disconnect = disconnect_mock
-
-    running_agent = MagicMock()
-    runner._running_agents = {"session": running_agent}
-
-    async def finish_agent():
-        await asyncio.sleep(0.05)
-        runner._running_agents.clear()
-
-    asyncio.create_task(finish_agent())
-
-    with patch("gateway.status.remove_pid_file"), patch("gateway.status.write_runtime_status"):
-        await runner.stop()
-
-    running_agent.interrupt.assert_not_called()
-    disconnect_mock.assert_awaited_once()
-    assert runner._shutdown_event.is_set() is True
-
-
-@pytest.mark.asyncio
-async def test_gateway_stop_interrupts_after_drain_timeout():
-    runner, adapter = make_restart_runner()
-    runner._restart_drain_timeout = 0.05
-
-    disconnect_mock = AsyncMock()
-    adapter.disconnect = disconnect_mock
-
-    running_agent = MagicMock()
-    runner._running_agents = {"session": running_agent}
-
-    with patch("gateway.status.remove_pid_file"), patch("gateway.status.write_runtime_status"):
-        await runner.stop()
-
-    running_agent.interrupt.assert_called_once_with("Gateway shutting down")
-    disconnect_mock.assert_awaited_once()
-    assert runner._shutdown_event.is_set() is True
-
-
-@pytest.mark.asyncio
-async def test_gateway_stop_service_restart_sets_named_exit_code():
-    runner, adapter = make_restart_runner()
-    adapter.disconnect = AsyncMock()
-
-    with patch("gateway.status.remove_pid_file"), patch("gateway.status.write_runtime_status"):
-        await runner.stop(restart=True, service_restart=True)
-
-    assert runner._exit_code == GATEWAY_SERVICE_RESTART_EXIT_CODE
-
-
-@pytest.mark.asyncio
-async def test_drain_active_agents_throttles_status_updates():
-    runner, _adapter = make_restart_runner()
-    runner._update_runtime_status = MagicMock()
-
-    runner._running_agents = {"a": MagicMock(), "b": MagicMock()}
-
-    async def finish_agents():
-        await asyncio.sleep(0.12)
-        runner._running_agents.pop("a")
-        await asyncio.sleep(0.12)
-        runner._running_agents.clear()
-
-    task = asyncio.create_task(finish_agents())
-    await runner._drain_active_agents(1.0)
-    await task
-
-    # Start, one count-change update, and final update. Allow one extra update
-    # if the loop observes the zero-agent state before exiting.
-    assert 3 <= runner._update_runtime_status.call_count <= 4
@@ -11,10 +11,24 @@ import pytest
 from gateway.config import PlatformConfig


-# The matrix adapter module is importable without mautrix installed
-# (module-level imports use try/except with stubs).  No need for
-# module-level mock installation — tests that call adapter methods
-# needing real mautrix APIs mock them individually.
+def _ensure_nio_mock():
+    """Install a mock nio module when matrix-nio isn't available."""
+    if "nio" in sys.modules and hasattr(sys.modules["nio"], "__file__"):
+        return
+    nio_mod = MagicMock()
+    nio_mod.MegolmEvent = type("MegolmEvent", (), {})
+    nio_mod.RoomMessageText = type("RoomMessageText", (), {})
+    nio_mod.RoomMessageImage = type("RoomMessageImage", (), {})
+    nio_mod.RoomMessageAudio = type("RoomMessageAudio", (), {})
+    nio_mod.RoomMessageVideo = type("RoomMessageVideo", (), {})
+    nio_mod.RoomMessageFile = type("RoomMessageFile", (), {})
+    nio_mod.DownloadResponse = type("DownloadResponse", (), {})
+    nio_mod.MemoryDownloadResponse = type("MemoryDownloadResponse", (), {})
+    nio_mod.InviteMemberEvent = type("InviteMemberEvent", (), {})
+    sys.modules.setdefault("nio", nio_mod)
+
+
+_ensure_nio_mock()


 def _make_adapter(tmp_path=None):
@@ -36,25 +50,24 @@ def _make_adapter(tmp_path=None):
    return adapter


-def _set_dm(adapter, room_id="!room1:example.org", is_dm=True):
-    """Mark a room as DM (or not) in the adapter's cache."""
-    adapter._dm_rooms[room_id] = is_dm
+def _make_room(room_id="!room1:example.org", member_count=5, is_dm=False):
+    """Create a fake Matrix room."""
+    room = SimpleNamespace(
+        room_id=room_id,
+        member_count=member_count,
+        users={},
+    )
+    return room


 def _make_event(
    body,
    sender="@alice:example.org",
    event_id="$evt1",
-    room_id="!room1:example.org",
    formatted_body=None,
    thread_id=None,
 ):
-    """Create a fake room message event.
-
-    The mautrix adapter reads ``event.room_id``, ``event.sender``,
-    ``event.event_id``, ``event.timestamp``, and ``event.content``
-    (a dict with ``msgtype``, ``body``, etc.).
-    """
+    """Create a fake RoomMessageText event."""
    content = {"body": body, "msgtype": "m.text"}
    if formatted_body:
        content["formatted_body"] = formatted_body
@@ -70,9 +83,9 @@ def _make_event(
    return SimpleNamespace(
        sender=sender,
        event_id=event_id,
-        room_id=room_id,
-        timestamp=int(time.time() * 1000),
-        content=content,
+        server_timestamp=int(time.time() * 1000),
+        body=body,
+        source={"content": content},
    )


@@ -139,9 +152,10 @@ async def test_require_mention_default_ignores_unmentioned(monkeypatch):
    monkeypatch.delenv("MATRIX_AUTO_THREAD", raising=False)

    adapter = _make_adapter()
+    room = _make_room()
    event = _make_event("hello everyone")

-    await adapter._on_room_message(event)
+    await adapter._on_room_message(room, event)
    adapter.handle_message.assert_not_awaited()


@@ -153,9 +167,10 @@ async def test_require_mention_default_processes_mentioned(monkeypatch):
    monkeypatch.setenv("MATRIX_AUTO_THREAD", "false")

    adapter = _make_adapter()
+    room = _make_room()
    event = _make_event("@hermes:example.org help me")

-    await adapter._on_room_message(event)
+    await adapter._on_room_message(room, event)
    adapter.handle_message.assert_awaited_once()
    msg = adapter.handle_message.await_args.args[0]
    assert msg.text == "help me"
@@ -169,10 +184,11 @@ async def test_require_mention_html_pill(monkeypatch):
    monkeypatch.setenv("MATRIX_AUTO_THREAD", "false")

    adapter = _make_adapter()
+    room = _make_room()
    formatted = '<a href="https://matrix.to/#/@hermes:example.org">Hermes</a> help'
    event = _make_event("Hermes help", formatted_body=formatted)

-    await adapter._on_room_message(event)
+    await adapter._on_room_message(room, event)
    adapter.handle_message.assert_awaited_once()


@@ -184,11 +200,11 @@ async def test_require_mention_dm_always_responds(monkeypatch):
    monkeypatch.setenv("MATRIX_AUTO_THREAD", "false")

    adapter = _make_adapter()
-    # Mark the room as a DM via the adapter's cache.
-    _set_dm(adapter)
+    # member_count=2 triggers DM detection
+    room = _make_room(member_count=2)
    event = _make_event("hello without mention")

-    await adapter._on_room_message(event)
+    await adapter._on_room_message(room, event)
    adapter.handle_message.assert_awaited_once()


@@ -200,10 +216,10 @@ async def test_dm_strips_mention(monkeypatch):
    monkeypatch.setenv("MATRIX_AUTO_THREAD", "false")

    adapter = _make_adapter()
-    _set_dm(adapter)
+    room = _make_room(member_count=2)
    event = _make_event("@hermes:example.org help me")

-    await adapter._on_room_message(event)
+    await adapter._on_room_message(room, event)
    adapter.handle_message.assert_awaited_once()
    msg = adapter.handle_message.await_args.args[0]
    assert msg.text == "help me"
@@ -217,9 +233,10 @@ async def test_bare_mention_passes_empty_string(monkeypatch):
    monkeypatch.setenv("MATRIX_AUTO_THREAD", "false")

    adapter = _make_adapter()
+    room = _make_room()
    event = _make_event("@hermes:example.org")

-    await adapter._on_room_message(event)
+    await adapter._on_room_message(room, event)
    adapter.handle_message.assert_awaited_once()
    msg = adapter.handle_message.await_args.args[0]
    assert msg.text == ""
@@ -233,9 +250,10 @@ async def test_require_mention_free_response_room(monkeypatch):
    monkeypatch.setenv("MATRIX_AUTO_THREAD", "false")

    adapter = _make_adapter()
-    event = _make_event("hello without mention", room_id="!room1:example.org")
+    room = _make_room(room_id="!room1:example.org")
+    event = _make_event("hello without mention")

-    await adapter._on_room_message(event)
+    await adapter._on_room_message(room, event)
    adapter.handle_message.assert_awaited_once()


@@ -249,9 +267,10 @@ async def test_require_mention_bot_participated_thread(monkeypatch):
    adapter = _make_adapter()
    adapter._bot_participated_threads.add("$thread1")

+    room = _make_room()
    event = _make_event("hello without mention", thread_id="$thread1")

-    await adapter._on_room_message(event)
+    await adapter._on_room_message(room, event)
    adapter.handle_message.assert_awaited_once()


@@ -263,9 +282,10 @@ async def test_require_mention_disabled(monkeypatch):
    monkeypatch.setenv("MATRIX_AUTO_THREAD", "false")

    adapter = _make_adapter()
+    room = _make_room()
    event = _make_event("hello without mention")

-    await adapter._on_room_message(event)
+    await adapter._on_room_message(room, event)
    adapter.handle_message.assert_awaited_once()
    msg = adapter.handle_message.await_args.args[0]
    assert msg.text == "hello without mention"
@@ -283,9 +303,10 @@ async def test_auto_thread_default_creates_thread(monkeypatch):
    monkeypatch.delenv("MATRIX_AUTO_THREAD", raising=False)

    adapter = _make_adapter()
+    room = _make_room()
    event = _make_event("hello", event_id="$msg1")

-    await adapter._on_room_message(event)
+    await adapter._on_room_message(room, event)
    adapter.handle_message.assert_awaited_once()
    msg = adapter.handle_message.await_args.args[0]
    assert msg.source.thread_id == "$msg1"
@@ -299,9 +320,10 @@ async def test_auto_thread_preserves_existing_thread(monkeypatch):

    adapter = _make_adapter()
    adapter._bot_participated_threads.add("$thread_root")
+    room = _make_room()
    event = _make_event("reply in thread", thread_id="$thread_root")

-    await adapter._on_room_message(event)
+    await adapter._on_room_message(room, event)
    adapter.handle_message.assert_awaited_once()
    msg = adapter.handle_message.await_args.args[0]
    assert msg.source.thread_id == "$thread_root"
@@ -314,10 +336,10 @@ async def test_auto_thread_skips_dm(monkeypatch):
    monkeypatch.delenv("MATRIX_AUTO_THREAD", raising=False)

    adapter = _make_adapter()
-    _set_dm(adapter)
+    room = _make_room(member_count=2)
    event = _make_event("hello dm", event_id="$dm1")

-    await adapter._on_room_message(event)
+    await adapter._on_room_message(room, event)
    adapter.handle_message.assert_awaited_once()
    msg = adapter.handle_message.await_args.args[0]
    assert msg.source.thread_id is None
@@ -330,9 +352,10 @@ async def test_auto_thread_disabled(monkeypatch):
    monkeypatch.setenv("MATRIX_AUTO_THREAD", "false")

    adapter = _make_adapter()
+    room = _make_room()
    event = _make_event("hello", event_id="$msg1")

-    await adapter._on_room_message(event)
+    await adapter._on_room_message(room, event)
    adapter.handle_message.assert_awaited_once()
    msg = adapter.handle_message.await_args.args[0]
    assert msg.source.thread_id is None
@@ -345,10 +368,11 @@ async def test_auto_thread_tracks_participation(monkeypatch):
    monkeypatch.delenv("MATRIX_AUTO_THREAD", raising=False)

    adapter = _make_adapter()
+    room = _make_room()
    event = _make_event("hello", event_id="$msg1")

    with patch.object(adapter, "_save_participated_threads"):
-        await adapter._on_room_message(event)
+        await adapter._on_room_message(room, event)

    assert "$msg1" in adapter._bot_participated_threads

@@ -361,9 +385,8 @@ async def test_auto_thread_tracks_participation(monkeypatch):
 class TestThreadPersistence:
    def test_empty_state_file(self, tmp_path, monkeypatch):
        """No state file → empty set."""
-        from gateway.platforms.matrix import MatrixAdapter
        monkeypatch.setattr(
-            MatrixAdapter, "_thread_state_path",
+            "gateway.platforms.matrix.MatrixAdapter._thread_state_path",
            staticmethod(lambda: tmp_path / "matrix_threads.json"),
        )
        adapter = _make_adapter()
@@ -372,10 +395,9 @@ class TestThreadPersistence:

    def test_track_thread_persists(self, tmp_path, monkeypatch):
        """_track_thread writes to disk."""
-        from gateway.platforms.matrix import MatrixAdapter
        state_path = tmp_path / "matrix_threads.json"
        monkeypatch.setattr(
-            MatrixAdapter, "_thread_state_path",
+            "gateway.platforms.matrix.MatrixAdapter._thread_state_path",
            staticmethod(lambda: state_path),
        )
        adapter = _make_adapter()
@@ -386,11 +408,10 @@ class TestThreadPersistence:

    def test_threads_survive_reload(self, tmp_path, monkeypatch):
        """Persisted threads are loaded by a new adapter instance."""
-        from gateway.platforms.matrix import MatrixAdapter
        state_path = tmp_path / "matrix_threads.json"
        state_path.write_text(json.dumps(["$t1", "$t2"]))
        monkeypatch.setattr(
-            MatrixAdapter, "_thread_state_path",
+            "gateway.platforms.matrix.MatrixAdapter._thread_state_path",
            staticmethod(lambda: state_path),
        )
        adapter = _make_adapter()
@@ -399,10 +420,9 @@ class TestThreadPersistence:

    def test_cap_max_tracked_threads(self, tmp_path, monkeypatch):
        """Thread set is trimmed to _MAX_TRACKED_THREADS."""
-        from gateway.platforms.matrix import MatrixAdapter
        state_path = tmp_path / "matrix_threads.json"
        monkeypatch.setattr(
-            MatrixAdapter, "_thread_state_path",
+            "gateway.platforms.matrix.MatrixAdapter._thread_state_path",
            staticmethod(lambda: state_path),
        )
        adapter = _make_adapter()
@@ -428,10 +448,10 @@ async def test_dm_mention_thread_disabled_by_default(monkeypatch):
    monkeypatch.setenv("MATRIX_AUTO_THREAD", "false")

    adapter = _make_adapter()
-    _set_dm(adapter)
+    room = _make_room(member_count=2)
    event = _make_event("@hermes:example.org help me", event_id="$dm1")

-    await adapter._on_room_message(event)
+    await adapter._on_room_message(room, event)
    adapter.handle_message.assert_awaited_once()
    msg = adapter.handle_message.await_args.args[0]
    assert msg.source.thread_id is None
@@ -444,11 +464,11 @@ async def test_dm_mention_thread_creates_thread(monkeypatch):
    monkeypatch.setenv("MATRIX_AUTO_THREAD", "false")

    adapter = _make_adapter()
-    _set_dm(adapter)
+    room = _make_room(member_count=2)
    event = _make_event("@hermes:example.org help me", event_id="$dm1")

    with patch.object(adapter, "_save_participated_threads"):
-        await adapter._on_room_message(event)
+        await adapter._on_room_message(room, event)

    adapter.handle_message.assert_awaited_once()
    msg = adapter.handle_message.await_args.args[0]
@@ -463,10 +483,10 @@ async def test_dm_mention_thread_no_mention_no_thread(monkeypatch):
    monkeypatch.setenv("MATRIX_AUTO_THREAD", "false")

    adapter = _make_adapter()
-    _set_dm(adapter)
+    room = _make_room(member_count=2)
    event = _make_event("hello without mention", event_id="$dm1")

-    await adapter._on_room_message(event)
+    await adapter._on_room_message(room, event)
    adapter.handle_message.assert_awaited_once()
    msg = adapter.handle_message.await_args.args[0]
    assert msg.source.thread_id is None
@@ -479,11 +499,11 @@ async def test_dm_mention_thread_preserves_existing_thread(monkeypatch):
    monkeypatch.setenv("MATRIX_AUTO_THREAD", "false")

    adapter = _make_adapter()
-    _set_dm(adapter)
    adapter._bot_participated_threads.add("$existing_thread")
+    room = _make_room(member_count=2)
    event = _make_event("@hermes:example.org help me", thread_id="$existing_thread")

-    await adapter._on_room_message(event)
+    await adapter._on_room_message(room, event)
    adapter.handle_message.assert_awaited_once()
    msg = adapter.handle_message.await_args.args[0]
    assert msg.source.thread_id == "$existing_thread"
@@ -496,11 +516,11 @@ async def test_dm_mention_thread_tracks_participation(monkeypatch):
    monkeypatch.setenv("MATRIX_AUTO_THREAD", "false")

    adapter = _make_adapter()
-    _set_dm(adapter)
+    room = _make_room(member_count=2)
    event = _make_event("@hermes:example.org help", event_id="$dm1")

    with patch.object(adapter, "_save_participated_threads"):
-        await adapter._on_room_message(event)
+        await adapter._on_room_message(room, event)

    assert "$dm1" in adapter._bot_participated_threads

@@ -1,23 +1,18 @@
-"""Tests for Matrix voice message support (MSC3245).
-
-Updated for the mautrix-python SDK (no more matrix-nio / nio imports).
-"""
+"""Tests for Matrix voice message support (MSC3245)."""
 import io
-import os
-import tempfile
 import types
-from types import SimpleNamespace

 import pytest
 from unittest.mock import AsyncMock, MagicMock, patch

-# Try importing mautrix; skip entire file if not available.
+# Try importing real nio; skip entire file if not available.
+# A MagicMock in sys.modules (from another test) is not the real package.
 try:
-    import mautrix as _mautrix_probe
-    if not isinstance(_mautrix_probe, types.ModuleType) or not hasattr(_mautrix_probe, "__file__"):
-        pytest.skip("mautrix in sys.modules is a mock, not the real package", allow_module_level=True)
+    import nio as _nio_probe
+    if not isinstance(_nio_probe, types.ModuleType) or not hasattr(_nio_probe, "__file__"):
+        pytest.skip("nio in sys.modules is a mock, not the real package", allow_module_level=True)
 except ImportError:
-    pytest.skip("mautrix not installed", allow_module_level=True)
+    pytest.skip("matrix-nio not installed", allow_module_level=True)

 from gateway.platforms.base import MessageType

@@ -30,7 +25,7 @@ def _make_adapter():
    """Create a MatrixAdapter with mocked config."""
    from gateway.platforms.matrix import MatrixAdapter
    from gateway.config import PlatformConfig
-
+    
    config = PlatformConfig(
        enabled=True,
        token="***",
@@ -43,26 +38,32 @@ def _make_adapter():
    return adapter


+def _make_room(room_id: str = "!test:example.org", member_count: int = 2):
+    """Create a mock Matrix room."""
+    room = MagicMock()
+    room.room_id = room_id
+    room.member_count = member_count
+    return room
+
+
 def _make_audio_event(
    event_id: str = "$audio_event",
    sender: str = "@alice:example.org",
-    room_id: str = "!test:example.org",
    body: str = "Voice message",
    url: str = "mxc://example.org/abc123",
    is_voice: bool = False,
    mimetype: str = "audio/ogg",
-    timestamp: int = 9999999999000,  # ms
+    timestamp: float = 9999999999000,  # ms
 ):
    """
-    Create a mock mautrix room message event.
-
-    In mautrix, the handler receives a single event object with attributes
-    ``room_id``, ``sender``, ``event_id``, ``timestamp``, and ``content``
-    (a dict-like or serializable object).
-
+    Create a mock RoomMessageAudio event that passes isinstance checks.
+    
    Args:
-        is_voice: If True, adds org.matrix.msc3245.voice field to content.
+        is_voice: If True, adds org.matrix.msc3245.voice field to content
    """
+    import nio
+    
+    # Build the source dict that nio events expose via .source
    content = {
        "msgtype": "m.audio",
        "body": body,
@@ -71,35 +72,39 @@ def _make_audio_event(
            "mimetype": mimetype,
        },
    }
-
+    
    if is_voice:
        content["org.matrix.msc3245.voice"] = {}
-
-    event = SimpleNamespace(
-        event_id=event_id,
-        sender=sender,
-        room_id=room_id,
-        timestamp=timestamp,
-        content=content,
-    )
+    
+    # Build a mock that stands in for a nio RoomMessageAudio event.
+    # MagicMock(spec=...) makes the mock pass isinstance checks against that class.
+    event = MagicMock(spec=nio.RoomMessageAudio)
+    event.event_id = event_id
+    event.sender = sender
+    event.body = body
+    event.url = url
+    event.server_timestamp = timestamp
+    event.source = {
+        "type": "m.room.message",
+        "content": content,
+    }
+    # For MIME type extraction - needs to be a dict
+    event.content = content
+    
    return event


-def _make_state_store(member_count: int = 2):
-    """Create a mock state store with get_members/get_member support."""
-    store = MagicMock()
-    # get_members returns a list of member user IDs
-    members = [MagicMock() for _ in range(member_count)]
-    store.get_members = AsyncMock(return_value=members)
-    # get_member returns a single member info object
-    member = MagicMock()
-    member.displayname = "Alice"
-    store.get_member = AsyncMock(return_value=member)
-    return store
+def _make_download_response(body: bytes = b"fake audio data"):
+    """Create a mock nio.MemoryDownloadResponse."""
+    import nio
+    resp = MagicMock()
+    resp.body = body
+    resp.__class__ = nio.MemoryDownloadResponse
+    return resp


 # ---------------------------------------------------------------------------
-# Tests: MSC3245 Voice Detection
+# Tests: MSC3245 Voice Detection (RED -> GREEN)
 # ---------------------------------------------------------------------------

 class TestMatrixVoiceMessageDetection:
@@ -113,28 +118,27 @@ class TestMatrixVoiceMessageDetection:
        self.adapter._message_handler = AsyncMock()
        # Mock _mxc_to_http to return a fake HTTP URL
        self.adapter._mxc_to_http = lambda url: f"https://matrix.example.org/_matrix/media/v3/download/{url[6:]}"
-        # Mock client for authenticated download — download_media returns bytes directly
+        # Mock client for authenticated download
        self.adapter._client = MagicMock()
-        self.adapter._client.download_media = AsyncMock(return_value=b"fake audio data")
-        # State store for DM detection
-        self.adapter._client.state_store = _make_state_store()
+        self.adapter._client.download = AsyncMock(return_value=_make_download_response())

    @pytest.mark.asyncio
    async def test_voice_message_has_type_voice(self):
        """Voice messages (with MSC3245 field) should be MessageType.VOICE."""
+        room = _make_room()
        event = _make_audio_event(is_voice=True)
-
+        
        # Capture the MessageEvent passed to handle_message
        captured_event = None
-
+        
        async def capture(msg_event):
            nonlocal captured_event
            captured_event = msg_event
-
+        
        self.adapter.handle_message = capture
-
-        await self.adapter._on_room_message(event)
-
+        
+        await self.adapter._on_room_message_media(room, event)
+        
        assert captured_event is not None, "No event was captured"
        assert captured_event.message_type == MessageType.VOICE, \
            f"Expected MessageType.VOICE, got {captured_event.message_type}"
@@ -142,43 +146,44 @@ class TestMatrixVoiceMessageDetection:
    @pytest.mark.asyncio
    async def test_voice_message_has_local_path(self):
        """Voice messages should have a local cached path in media_urls."""
+        room = _make_room()
        event = _make_audio_event(is_voice=True)
-
+        
        captured_event = None
-
+        
        async def capture(msg_event):
            nonlocal captured_event
            captured_event = msg_event
-
+        
        self.adapter.handle_message = capture
-
-        await self.adapter._on_room_message(event)
-
+        
+        await self.adapter._on_room_message_media(room, event)
+        
        assert captured_event is not None
        assert captured_event.media_urls is not None
        assert len(captured_event.media_urls) > 0
        # Should be a local path, not an HTTP URL
        assert not captured_event.media_urls[0].startswith("http"), \
            f"media_urls should contain local path, got {captured_event.media_urls[0]}"
-        # download_media is called with a ContentURI wrapping the mxc URL
-        self.adapter._client.download_media.assert_awaited_once()
+        self.adapter._client.download.assert_awaited_once_with(mxc=event.url)
        assert captured_event.media_types == ["audio/ogg"]

    @pytest.mark.asyncio
    async def test_audio_without_msc3245_stays_audio_type(self):
        """Regular audio uploads (no MSC3245 field) should remain MessageType.AUDIO."""
+        room = _make_room()
        event = _make_audio_event(is_voice=False)  # NOT a voice message
-
+        
        captured_event = None
-
+        
        async def capture(msg_event):
            nonlocal captured_event
            captured_event = msg_event
-
+        
        self.adapter.handle_message = capture
-
-        await self.adapter._on_room_message(event)
-
+        
+        await self.adapter._on_room_message_media(room, event)
+        
        assert captured_event is not None
        assert captured_event.message_type == MessageType.AUDIO, \
            f"Expected MessageType.AUDIO for non-voice, got {captured_event.message_type}"
@@ -186,24 +191,25 @@ class TestMatrixVoiceMessageDetection:
    @pytest.mark.asyncio
    async def test_regular_audio_has_http_url(self):
        """Regular audio uploads should keep HTTP URL (not cached locally)."""
+        room = _make_room()
        event = _make_audio_event(is_voice=False)
-
+        
        captured_event = None
-
+        
        async def capture(msg_event):
            nonlocal captured_event
            captured_event = msg_event
-
+        
        self.adapter.handle_message = capture
-
-        await self.adapter._on_room_message(event)
-
+        
+        await self.adapter._on_room_message_media(room, event)
+        
        assert captured_event is not None
        assert captured_event.media_urls is not None
        # Should be HTTP URL, not local path
        assert captured_event.media_urls[0].startswith("http"), \
            f"Non-voice audio should have HTTP URL, got {captured_event.media_urls[0]}"
-        self.adapter._client.download_media.assert_not_awaited()
+        self.adapter._client.download.assert_not_awaited()
        assert captured_event.media_types == ["audio/ogg"]


@@ -218,26 +224,29 @@ class TestMatrixVoiceCacheFallback:
        self.adapter._message_handler = AsyncMock()
        self.adapter._mxc_to_http = lambda url: f"https://matrix.example.org/_matrix/media/v3/download/{url[6:]}"
        self.adapter._client = MagicMock()
-        self.adapter._client.state_store = _make_state_store()

    @pytest.mark.asyncio
    async def test_voice_cache_failure_falls_back_to_http_url(self):
-        """If caching fails (download returns None), voice message should still be delivered with HTTP URL."""
+        """If caching fails, voice message should still be delivered with HTTP URL."""
+        room = _make_room()
        event = _make_audio_event(is_voice=True)
-
-        # download_media returns None on failure
-        self.adapter._client.download_media = AsyncMock(return_value=None)
-
+        
+        # Make download fail
+        import nio
+        error_resp = MagicMock()
+        error_resp.__class__ = nio.DownloadError
+        self.adapter._client.download = AsyncMock(return_value=error_resp)
+        
        captured_event = None
-
+        
        async def capture(msg_event):
            nonlocal captured_event
            captured_event = msg_event
-
+        
        self.adapter.handle_message = capture
-
-        await self.adapter._on_room_message(event)
-
+        
+        await self.adapter._on_room_message_media(room, event)
+        
        assert captured_event is not None
        assert captured_event.media_urls is not None
        # Should fall back to HTTP URL
@@ -247,9 +256,10 @@ class TestMatrixVoiceCacheFallback:
    @pytest.mark.asyncio
    async def test_voice_cache_exception_falls_back_to_http_url(self):
        """Unexpected download exceptions should also fall back to HTTP URL."""
+        room = _make_room()
        event = _make_audio_event(is_voice=True)

-        self.adapter._client.download_media = AsyncMock(side_effect=RuntimeError("boom"))
+        self.adapter._client.download = AsyncMock(side_effect=RuntimeError("boom"))

        captured_event = None

@@ -259,7 +269,7 @@ class TestMatrixVoiceCacheFallback:

        self.adapter.handle_message = capture

-        await self.adapter._on_room_message(event)
+        await self.adapter._on_room_message_media(room, event)

        assert captured_event is not None
        assert captured_event.media_urls is not None
@@ -268,7 +278,7 @@ class TestMatrixVoiceCacheFallback:


 # ---------------------------------------------------------------------------
-# Tests: send_voice includes MSC3245 field
+# Tests: send_voice includes MSC3245 field (RED -> GREEN)
 # ---------------------------------------------------------------------------

 class TestMatrixSendVoiceMSC3245:
@@ -277,52 +287,62 @@ class TestMatrixSendVoiceMSC3245:
    def setup_method(self):
        self.adapter = _make_adapter()
        self.adapter._user_id = "@bot:example.org"
-        # Mock client — upload_media returns a ContentURI string
+        # Mock client with successful upload
        self.adapter._client = MagicMock()
        self.upload_call = None

-        async def mock_upload_media(data, mime_type=None, filename=None, **kwargs):
-            self.upload_call = {"data": data, "mime_type": mime_type, "filename": filename}
-            return "mxc://example.org/uploaded"
+        async def mock_upload(*args, **kwargs):
+            self.upload_call = (args, kwargs)
+            import nio
+            resp = MagicMock()
+            resp.content_uri = "mxc://example.org/uploaded"
+            resp.__class__ = nio.UploadResponse
+            return resp, None

-        self.adapter._client.upload_media = mock_upload_media
+        self.adapter._client.upload = mock_upload

    @pytest.mark.asyncio
-    @patch("mimetypes.guess_type", return_value=("audio/ogg", None))
-    async def test_send_voice_includes_msc3245_field(self, _mock_guess):
+    async def test_send_voice_includes_msc3245_field(self):
        """send_voice should include org.matrix.msc3245.voice in message content."""
+        import io
+        import os
+        import tempfile
        # Create a temp audio file
        with tempfile.NamedTemporaryFile(suffix=".ogg", delete=False) as f:
            f.write(b"fake audio data")
            temp_path = f.name
-
+        
        try:
-            # Capture the message content sent via send_message_event
+            # Capture the message content sent to room_send
            sent_content = None
-
-            async def mock_send_message_event(room_id, event_type, content):
+            
+            async def mock_room_send(room_id, event_type, content):
                nonlocal sent_content
                sent_content = content
-                # send_message_event returns an EventID string
-                return "$sent_event"
-
-            self.adapter._client.send_message_event = mock_send_message_event
-
+                resp = MagicMock()
+                resp.event_id = "$sent_event"
+                import nio
+                resp.__class__ = nio.RoomSendResponse
+                return resp
+            
+            self.adapter._client.room_send = mock_room_send
+            
            await self.adapter.send_voice(
                chat_id="!room:example.org",
                audio_path=temp_path,
                caption="Test voice",
            )
-
+            
            assert sent_content is not None, "No message was sent"
            assert "org.matrix.msc3245.voice" in sent_content, \
                f"MSC3245 voice field missing from content: {sent_content.keys()}"
            assert sent_content["msgtype"] == "m.audio"
            assert sent_content["info"]["mimetype"] == "audio/ogg"
-            assert self.upload_call is not None, "Expected upload_media() to be called"
-            assert isinstance(self.upload_call["data"], bytes)
-            assert self.upload_call["mime_type"] == "audio/ogg"
-            assert self.upload_call["filename"].endswith(".ogg")
+            assert self.upload_call is not None, "Expected upload() to be called"
+            args, kwargs = self.upload_call
+            assert isinstance(args[0], io.BytesIO)
+            assert kwargs["content_type"] == "audio/ogg"
+            assert kwargs["filename"].endswith(".ogg")

        finally:
            os.unlink(temp_path)
@@ -1,160 +0,0 @@
-import asyncio
-import shutil
-import subprocess
-from unittest.mock import AsyncMock, MagicMock
-
-import pytest
-
-import gateway.run as gateway_run
-from gateway.platforms.base import MessageEvent, MessageType
-from gateway.restart import DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
-from gateway.session import build_session_key
-from tests.gateway.restart_test_helpers import make_restart_runner, make_restart_source
-
-
-@pytest.mark.asyncio
-async def test_restart_command_while_busy_requests_drain_without_interrupt():
-    runner, _adapter = make_restart_runner()
-    runner.request_restart = MagicMock(return_value=True)
-    event = MessageEvent(
-        text="/restart",
-        message_type=MessageType.TEXT,
-        source=make_restart_source(),
-        message_id="m1",
-    )
-    session_key = build_session_key(event.source)
-    running_agent = MagicMock()
-    runner._running_agents[session_key] = running_agent
-
-    result = await runner._handle_message(event)
-
-    assert result == "⏳ Draining 1 active agent(s) before restart..."
-    running_agent.interrupt.assert_not_called()
-    runner.request_restart.assert_called_once_with(detached=True, via_service=False)
-
-
-@pytest.mark.asyncio
-async def test_drain_queue_mode_queues_follow_up_without_interrupt():
-    runner, adapter = make_restart_runner()
-    runner._draining = True
-    runner._restart_requested = True
-    runner._busy_input_mode = "queue"
-
-    event = MessageEvent(
-        text="follow up",
-        message_type=MessageType.TEXT,
-        source=make_restart_source(),
-        message_id="m2",
-    )
-    session_key = build_session_key(event.source)
-    adapter._active_sessions[session_key] = asyncio.Event()
-
-    await adapter.handle_message(event)
-
-    assert session_key in adapter._pending_messages
-    assert adapter._pending_messages[session_key].text == "follow up"
-    assert not adapter._active_sessions[session_key].is_set()
-    assert any("queued for the next turn" in message for message in adapter.sent)
-
-
-@pytest.mark.asyncio
-async def test_draining_rejects_new_session_messages():
-    runner, _adapter = make_restart_runner()
-    runner._draining = True
-    runner._restart_requested = True
-
-    event = MessageEvent(
-        text="hello",
-        message_type=MessageType.TEXT,
-        source=make_restart_source("fresh"),
-        message_id="m3",
-    )
-
-    result = await runner._handle_message(event)
-
-    assert result == "⏳ Gateway is restarting and is not accepting new work right now."
-
-
-def test_load_busy_input_mode_prefers_env_then_config_then_default(tmp_path, monkeypatch):
-    monkeypatch.setattr(gateway_run, "_hermes_home", tmp_path)
-    monkeypatch.delenv("HERMES_GATEWAY_BUSY_INPUT_MODE", raising=False)
-
-    assert gateway_run.GatewayRunner._load_busy_input_mode() == "interrupt"
-
-    (tmp_path / "config.yaml").write_text(
-        "display:\n  busy_input_mode: queue\n", encoding="utf-8"
-    )
-    assert gateway_run.GatewayRunner._load_busy_input_mode() == "queue"
-
-    monkeypatch.setenv("HERMES_GATEWAY_BUSY_INPUT_MODE", "interrupt")
-    assert gateway_run.GatewayRunner._load_busy_input_mode() == "interrupt"
-
-
-def test_load_restart_drain_timeout_prefers_env_then_config_then_default(
-    tmp_path, monkeypatch, caplog
-):
-    monkeypatch.setattr(gateway_run, "_hermes_home", tmp_path)
-    monkeypatch.delenv("HERMES_RESTART_DRAIN_TIMEOUT", raising=False)
-
-    assert (
-        gateway_run.GatewayRunner._load_restart_drain_timeout()
-        == DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
-    )
-
-    (tmp_path / "config.yaml").write_text(
-        "agent:\n  restart_drain_timeout: 12\n", encoding="utf-8"
-    )
-    assert gateway_run.GatewayRunner._load_restart_drain_timeout() == 12.0
-
-    monkeypatch.setenv("HERMES_RESTART_DRAIN_TIMEOUT", "7")
-    assert gateway_run.GatewayRunner._load_restart_drain_timeout() == 7.0
-
-    monkeypatch.setenv("HERMES_RESTART_DRAIN_TIMEOUT", "invalid")
-    assert (
-        gateway_run.GatewayRunner._load_restart_drain_timeout()
-        == DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
-    )
-    assert "Invalid restart_drain_timeout" in caplog.text
-
-
-@pytest.mark.asyncio
-async def test_request_restart_is_idempotent():
-    runner, _adapter = make_restart_runner()
-    runner.stop = AsyncMock()
-
-    assert runner.request_restart(detached=True, via_service=False) is True
-    first_task = next(iter(runner._background_tasks))
-    assert runner.request_restart(detached=True, via_service=False) is False
-
-    await first_task
-
-    runner.stop.assert_awaited_once_with(
-        restart=True, detached_restart=True, service_restart=False
-    )
-
-
-@pytest.mark.asyncio
-async def test_launch_detached_restart_command_uses_setsid(monkeypatch):
-    runner, _adapter = make_restart_runner()
-    popen_calls = []
-
-    monkeypatch.setattr(gateway_run, "_resolve_hermes_bin", lambda: ["/usr/bin/hermes"])
-    monkeypatch.setattr(gateway_run.os, "getpid", lambda: 321)
-    monkeypatch.setattr(shutil, "which", lambda cmd: "/usr/bin/setsid" if cmd == "setsid" else None)
-
-    def fake_popen(cmd, **kwargs):
-        popen_calls.append((cmd, kwargs))
-        return MagicMock()
-
-    monkeypatch.setattr(subprocess, "Popen", fake_popen)
-
-    await runner._launch_detached_restart_command()
-
-    assert len(popen_calls) == 1
-    cmd, kwargs = popen_calls[0]
-    assert cmd[:2] == ["/usr/bin/setsid", "bash"]
-    assert "gateway restart" in cmd[-1]
-    assert "kill -0 321" in cmd[-1]
-    assert kwargs["start_new_session"] is True
-    assert kwargs["stdout"] is subprocess.DEVNULL
-    assert kwargs["stderr"] is subprocess.DEVNULL
@@ -127,16 +127,6 @@ async def test_shutdown_fires_finalize_for_active_agents(mock_invoke_hook):
    runner._shutdown_event = MagicMock()
    runner.adapters = {}
    runner._exit_reason = "test"
-    runner._exit_code = None
-    runner._draining = False
-    runner._restart_requested = False
-    runner._restart_task_started = False
-    runner._restart_detached = False
-    runner._restart_via_service = False
-    runner._restart_drain_timeout = 0.0
-    runner._stop_task = None
-    runner._running_agents_ts = {}
-    runner._update_runtime_status = MagicMock()

    agent1 = MagicMock()
    agent1.session_id = "sess-a"
@@ -41,15 +41,6 @@ def _make_runner():
    runner._pending_approvals = {}
    runner._voice_mode = {}
    runner._background_tasks = set()
-    runner._draining = False
-    runner._restart_requested = False
-    runner._restart_task_started = False
-    runner._restart_detached = False
-    runner._restart_via_service = False
-    runner._restart_drain_timeout = 0.0
-    runner._stop_task = None
-    runner._exit_code = None
-    runner._update_runtime_status = MagicMock()
    runner._is_user_authorized = lambda _source: True
    runner.hooks = MagicMock()
    runner.hooks.emit = AsyncMock()
@@ -1,75 +0,0 @@
-"""Tests for _clear_stale_openai_base_url() cleanup after provider switch (#5161)."""
-
-from __future__ import annotations
-
-from unittest.mock import patch
-
-from hermes_cli.config import load_config, save_config, save_env_value, get_env_value
-
-
-def _write_provider(provider: str, model: str = "test-model"):
-    """Helper: write a provider + model to config.yaml."""
-    cfg = load_config()
-    model_cfg = cfg.get("model", {})
-    if not isinstance(model_cfg, dict):
-        model_cfg = {}
-    model_cfg["provider"] = provider
-    model_cfg["default"] = model
-    cfg["model"] = model_cfg
-    save_config(cfg)
-
-
-class TestClearStaleOpenaiBaseUrl:
-    """_clear_stale_openai_base_url() removes OPENAI_BASE_URL when provider is not custom."""
-
-    def test_clears_when_provider_is_named(self, monkeypatch):
-        """OPENAI_BASE_URL is cleared when config provider is a named provider."""
-        from hermes_cli.main import _clear_stale_openai_base_url
-
-        _write_provider("openrouter")
-        save_env_value("OPENAI_BASE_URL", "http://localhost:11434/v1")
-
-        _clear_stale_openai_base_url()
-
-        result = get_env_value("OPENAI_BASE_URL")
-        assert not result, f"Expected OPENAI_BASE_URL to be cleared, got: {result!r}"
-
-    def test_preserves_when_provider_is_custom(self, monkeypatch):
-        """OPENAI_BASE_URL is NOT cleared when config provider is 'custom'."""
-        from hermes_cli.main import _clear_stale_openai_base_url
-
-        _write_provider("custom")
-        save_env_value("OPENAI_BASE_URL", "http://localhost:11434/v1")
-
-        _clear_stale_openai_base_url()
-
-        result = get_env_value("OPENAI_BASE_URL")
-        assert result == "http://localhost:11434/v1", \
-            f"Expected OPENAI_BASE_URL to be preserved, got: {result!r}"
-
-    def test_noop_when_no_openai_base_url(self, monkeypatch):
-        """No error when OPENAI_BASE_URL is not set."""
-        from hermes_cli.main import _clear_stale_openai_base_url
-
-        _write_provider("openrouter")
-        # Ensure it's not set
-        save_env_value("OPENAI_BASE_URL", "")
-        monkeypatch.delenv("OPENAI_BASE_URL", raising=False)
-
-        # Should not raise
-        _clear_stale_openai_base_url()
-
-    def test_noop_when_provider_empty(self, monkeypatch):
-        """No cleanup when provider is not set in config."""
-        from hermes_cli.main import _clear_stale_openai_base_url
-
-        cfg = load_config()
-        cfg.pop("model", None)
-        save_config(cfg)
-        save_env_value("OPENAI_BASE_URL", "http://localhost:11434/v1")
-
-        _clear_stale_openai_base_url()
-
-        result = get_env_value("OPENAI_BASE_URL")
-        assert result == "http://localhost:11434/v1", \
-            "Should not clear when provider is not configured"
@@ -0,0 +1,275 @@
+"""Tests for container-aware CLI routing (NixOS container mode).
+
+When container.enable = true in the NixOS module, the activation script
+writes a .container-mode metadata file. The host CLI detects this and
+execs into the container instead of running locally.
+"""
+import os
+from pathlib import Path
+from types import SimpleNamespace
+from unittest.mock import MagicMock, patch
+
+import pytest
+
+from hermes_cli.config import (
+    _is_inside_container,
+    get_container_exec_info,
+)
+
+
+# =============================================================================
+# _is_inside_container
+# =============================================================================
+
+
+def test_is_inside_container_dockerenv():
+    """Detects /.dockerenv marker file."""
+    with patch("os.path.exists") as mock_exists:
+        mock_exists.side_effect = lambda p: p == "/.dockerenv"
+        assert _is_inside_container() is True
+
+
+def test_is_inside_container_containerenv():
+    """Detects Podman's /run/.containerenv marker."""
+    with patch("os.path.exists") as mock_exists:
+        mock_exists.side_effect = lambda p: p == "/run/.containerenv"
+        assert _is_inside_container() is True
+
+
+def test_is_inside_container_cgroup_docker():
+    """Detects 'docker' in /proc/1/cgroup."""
+    with patch("os.path.exists", return_value=False), \
+         patch("builtins.open") as mock_open:
+        mock_open.return_value.__enter__ = lambda s: s
+        mock_open.return_value.__exit__ = MagicMock(return_value=False)
+        mock_open.return_value.read = MagicMock(
+            return_value="12:memory:/docker/abc123\n"
+        )
+        assert _is_inside_container() is True
+
+
+def test_is_inside_container_false_on_host():
+    """Returns False when none of the container indicators are present."""
+    with patch("os.path.exists", return_value=False), \
+         patch("builtins.open", side_effect=OSError("no such file")):
+        assert _is_inside_container() is False
+
+
+# =============================================================================
+# get_container_exec_info
+# =============================================================================
+
+
+@pytest.fixture
+def container_env(tmp_path, monkeypatch):
+    """Set up a fake HERMES_HOME with .container-mode file."""
+    hermes_home = tmp_path / ".hermes"
+    hermes_home.mkdir()
+    monkeypatch.setenv("HERMES_HOME", str(hermes_home))
+
+    container_mode = hermes_home / ".container-mode"
+    container_mode.write_text(
+        "# Written by NixOS activation script. Do not edit manually.\n"
+        "backend=podman\n"
+        "container_name=hermes-agent\n"
+        "hermes_bin=/data/current-package/bin/hermes\n"
+    )
+    return hermes_home
+
+
+def test_get_container_exec_info_returns_metadata(container_env):
+    """Reads .container-mode and returns backend/name/bin."""
+    with patch("hermes_cli.config._is_inside_container", return_value=False):
+        info = get_container_exec_info()
+
+    assert info is not None
+    assert info["backend"] == "podman"
+    assert info["container_name"] == "hermes-agent"
+    assert info["hermes_bin"] == "/data/current-package/bin/hermes"
+
+
+def test_get_container_exec_info_none_inside_container(container_env):
+    """Returns None when we're already inside a container."""
+    with patch("hermes_cli.config._is_inside_container", return_value=True):
+        info = get_container_exec_info()
+
+    assert info is None
+
+
+def test_get_container_exec_info_none_without_file(tmp_path, monkeypatch):
+    """Returns None when .container-mode doesn't exist (native mode)."""
+    hermes_home = tmp_path / ".hermes"
+    hermes_home.mkdir()
+    monkeypatch.setenv("HERMES_HOME", str(hermes_home))
+
+    with patch("hermes_cli.config._is_inside_container", return_value=False):
+        info = get_container_exec_info()
+
+    assert info is None
+
+
+def test_get_container_exec_info_defaults():
+    """Falls back to defaults for missing keys."""
+    import tempfile
+
+    with tempfile.TemporaryDirectory() as tmpdir:
+        hermes_home = Path(tmpdir) / ".hermes"
+        hermes_home.mkdir()
+        (hermes_home / ".container-mode").write_text(
+            "# minimal file with no keys\n"
+        )
+
+        with patch("hermes_cli.config._is_inside_container", return_value=False), \
+             patch("hermes_cli.config.get_hermes_home", return_value=hermes_home):
+            info = get_container_exec_info()
+
+        assert info is not None
+        assert info["backend"] == "docker"
+        assert info["container_name"] == "hermes-agent"
+        assert info["hermes_bin"] == "/data/current-package/bin/hermes"
+
+
+def test_get_container_exec_info_docker_backend(container_env):
+    """Correctly reads docker backend."""
+    (container_env / ".container-mode").write_text(
+        "backend=docker\n"
+        "container_name=hermes-custom\n"
+        "hermes_bin=/opt/hermes/bin/hermes\n"
+    )
+
+    with patch("hermes_cli.config._is_inside_container", return_value=False):
+        info = get_container_exec_info()
+
+    assert info["backend"] == "docker"
+    assert info["container_name"] == "hermes-custom"
+    assert info["hermes_bin"] == "/opt/hermes/bin/hermes"
+
+
+# =============================================================================
+# _exec_in_container
+# =============================================================================
+
+
+def test_exec_in_container_calls_execvp():
+    """Verifies os.execvp is called with the correct command."""
+    from hermes_cli.main import _exec_in_container
+
+    container_info = {
+        "backend": "podman",
+        "container_name": "hermes-agent",
+        "hermes_bin": "/data/current-package/bin/hermes",
+    }
+
+    with patch("shutil.which", return_value="/usr/bin/podman"), \
+         patch("subprocess.run") as mock_run, \
+         patch("os.execvp") as mock_exec:
+        # Simulate running container
+        mock_result = MagicMock()
+        mock_result.returncode = 0
+        mock_result.stdout = "true\n"
+        mock_run.return_value = mock_result
+
+        _exec_in_container(container_info, ["chat", "-m", "claude-sonnet-4"])
+
+        mock_exec.assert_called_once_with(
+            "/usr/bin/podman",
+            ["/usr/bin/podman", "exec", "-it", "hermes-agent",
+             "/data/current-package/bin/hermes", "chat", "-m", "claude-sonnet-4"]
+        )
+
+
+def test_exec_in_container_strips_host_flag():
+    """The --host flag is not forwarded into the container."""
+    from hermes_cli.main import _exec_in_container
+
+    container_info = {
+        "backend": "podman",
+        "container_name": "hermes-agent",
+        "hermes_bin": "/data/current-package/bin/hermes",
+    }
+
+    with patch("shutil.which", return_value="/usr/bin/podman"), \
+         patch("subprocess.run") as mock_run, \
+         patch("os.execvp") as mock_exec:
+        mock_result = MagicMock()
+        mock_result.returncode = 0
+        mock_result.stdout = "true\n"
+        mock_run.return_value = mock_result
+
+        _exec_in_container(container_info, ["chat", "--host", "-q", "hello"])
+
+        # --host should be stripped
+        exec_args = mock_exec.call_args[0][1]
+        assert "--host" not in exec_args
+        assert "-q" in exec_args
+        assert "hello" in exec_args
+
+
+def test_exec_in_container_fallback_no_runtime(capsys):
+    """Falls back gracefully when container runtime is not found."""
+    from hermes_cli.main import _exec_in_container
+
+    container_info = {
+        "backend": "podman",
+        "container_name": "hermes-agent",
+        "hermes_bin": "/data/current-package/bin/hermes",
+    }
+
+    with patch("shutil.which", return_value=None), \
+         patch("os.execvp") as mock_exec:
+        _exec_in_container(container_info, ["chat"])
+
+        # Should NOT call execvp — graceful fallback
+        mock_exec.assert_not_called()
+
+    captured = capsys.readouterr()
+    assert "not found on PATH" in captured.err
+
+
+def test_exec_in_container_fallback_container_not_running(capsys):
+    """Falls back when container exists but is not running."""
+    from hermes_cli.main import _exec_in_container
+
+    container_info = {
+        "backend": "docker",
+        "container_name": "hermes-agent",
+        "hermes_bin": "/data/current-package/bin/hermes",
+    }
+
+    with patch("shutil.which", return_value="/usr/bin/docker"), \
+         patch("subprocess.run") as mock_run, \
+         patch("os.execvp") as mock_exec:
+        mock_result = MagicMock()
+        mock_result.returncode = 0
+        mock_result.stdout = "false\n"
+        mock_run.return_value = mock_result
+
+        _exec_in_container(container_info, ["chat"])
+
+        mock_exec.assert_not_called()
+
+    captured = capsys.readouterr()
+    assert "not running" in captured.err
+
+
+def test_exec_in_container_fallback_inspect_fails():
+    """Falls back when docker inspect fails entirely."""
+    from hermes_cli.main import _exec_in_container
+
+    container_info = {
+        "backend": "docker",
+        "container_name": "hermes-agent",
+        "hermes_bin": "/data/current-package/bin/hermes",
+    }
+
+    with patch("shutil.which", return_value="/usr/bin/docker"), \
+         patch("subprocess.run") as mock_run, \
+         patch("os.execvp") as mock_exec:
+        mock_result = MagicMock()
+        mock_result.returncode = 1
+        mock_result.stdout = ""
+        mock_run.return_value = mock_result
+
+        _exec_in_container(container_info, ["chat"])
+
+        mock_exec.assert_not_called()
@@ -5,10 +5,6 @@ from pathlib import Path
 from types import SimpleNamespace

 import hermes_cli.gateway as gateway_cli
-from gateway.restart import (
-    DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT,
-    GATEWAY_SERVICE_RESTART_EXIT_CODE,
-)


 class TestSystemdServiceRefresh:
@@ -78,7 +74,7 @@ class TestSystemdServiceRefresh:
        assert unit_path.read_text(encoding="utf-8") == "new unit\n"
        assert calls[:2] == [
            ["systemctl", "--user", "daemon-reload"],
-            ["systemctl", "--user", "reload-or-restart", gateway_cli.get_service_name()],
+            ["systemctl", "--user", "restart", gateway_cli.get_service_name()],
        ]


@@ -88,8 +84,6 @@ class TestGeneratedSystemdUnits:

        assert "ExecStart=" in unit
        assert "ExecStop=" not in unit
-        assert "ExecReload=/bin/kill -USR1 $MAINPID" in unit
-        assert f"RestartForceExitStatus={GATEWAY_SERVICE_RESTART_EXIT_CODE}" in unit
        assert "TimeoutStopSec=60" in unit

    def test_user_unit_includes_resolved_node_directory_in_path(self, monkeypatch):
@@ -104,8 +98,6 @@ class TestGeneratedSystemdUnits:

        assert "ExecStart=" in unit
        assert "ExecStop=" not in unit
-        assert "ExecReload=/bin/kill -USR1 $MAINPID" in unit
-        assert f"RestartForceExitStatus={GATEWAY_SERVICE_RESTART_EXIT_CODE}" in unit
        assert "TimeoutStopSec=60" in unit
        assert "WantedBy=multi-user.target" in unit

@@ -165,31 +157,6 @@ class TestGatewayStopCleanup:


 class TestLaunchdServiceRecovery:
-    def test_get_restart_drain_timeout_prefers_env_then_config_then_default(self, monkeypatch):
-        monkeypatch.delenv("HERMES_RESTART_DRAIN_TIMEOUT", raising=False)
-        monkeypatch.setattr(gateway_cli, "read_raw_config", lambda: {})
-
-        assert (
-            gateway_cli._get_restart_drain_timeout()
-            == DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
-        )
-
-        monkeypatch.setattr(
-            gateway_cli,
-            "read_raw_config",
-            lambda: {"agent": {"restart_drain_timeout": 14}},
-        )
-        assert gateway_cli._get_restart_drain_timeout() == 14.0
-
-        monkeypatch.setenv("HERMES_RESTART_DRAIN_TIMEOUT", "9")
-        assert gateway_cli._get_restart_drain_timeout() == 9.0
-
-        monkeypatch.setenv("HERMES_RESTART_DRAIN_TIMEOUT", "invalid")
-        assert (
-            gateway_cli._get_restart_drain_timeout()
-            == DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
-        )
-
    def test_launchd_install_repairs_outdated_plist_without_force(self, tmp_path, monkeypatch):
        plist_path = tmp_path / "ai.hermes.gateway.plist"
        plist_path.write_text("<plist>old content</plist>", encoding="utf-8")
@@ -267,55 +234,6 @@ class TestLaunchdServiceRecovery:
            ["launchctl", "kickstart", target],
        ]

-    def test_launchd_restart_drains_running_gateway_before_kickstart(self, monkeypatch):
-        calls = []
-        target = f"{gateway_cli._launchd_domain()}/{gateway_cli.get_launchd_label()}"
-
-        monkeypatch.setattr(gateway_cli, "_get_restart_drain_timeout", lambda: 12.0)
-        monkeypatch.setattr(gateway_cli, "_request_gateway_self_restart", lambda pid: False)
-        monkeypatch.setattr(gateway_cli, "_wait_for_gateway_exit", lambda timeout, force_after=None: True)
-        monkeypatch.setattr(gateway_cli, "terminate_pid", lambda pid, force=False: calls.append(("term", pid, force)))
-        monkeypatch.setattr(
-            "gateway.status.get_running_pid",
-            lambda: 321,
-        )
-
-        def fake_run(cmd, check=False, **kwargs):
-            calls.append(cmd)
-            return SimpleNamespace(returncode=0, stdout="", stderr="")
-
-        monkeypatch.setattr(gateway_cli.subprocess, "run", fake_run)
-
-        gateway_cli.launchd_restart()
-
-        assert calls == [
-            ("term", 321, False),
-            ["launchctl", "kickstart", "-k", target],
-        ]
-
-    def test_launchd_restart_self_requests_graceful_restart_without_kickstart(self, monkeypatch, capsys):
-        calls = []
-
-        monkeypatch.setattr(
-            "gateway.status.get_running_pid",
-            lambda: 321,
-        )
-        monkeypatch.setattr(
-            gateway_cli,
-            "_request_gateway_self_restart",
-            lambda pid: calls.append(("self", pid)) or True,
-        )
-        monkeypatch.setattr(
-            gateway_cli.subprocess,
-            "run",
-            lambda *args, **kwargs: (_ for _ in ()).throw(AssertionError("launchctl should not run")),
-        )
-
-        gateway_cli.launchd_restart()
-
-        assert calls == [("self", 321)]
-        assert "restart requested" in capsys.readouterr().out.lower()
-
    def test_launchd_stop_uses_bootout_not_kill(self, monkeypatch):
        """launchd_stop must bootout the service so KeepAlive doesn't respawn it."""
        label = gateway_cli.get_launchd_label()
@@ -419,31 +337,6 @@ class TestGatewayServiceDetection:


 class TestGatewaySystemServiceRouting:
-    def test_systemd_restart_self_requests_graceful_restart_without_reload_or_restart(self, monkeypatch, capsys):
-        calls = []
-
-        monkeypatch.setattr(gateway_cli, "_select_systemd_scope", lambda system=False: False)
-        monkeypatch.setattr(gateway_cli, "refresh_systemd_unit_if_needed", lambda system=False: calls.append(("refresh", system)))
-        monkeypatch.setattr(
-            "gateway.status.get_running_pid",
-            lambda: 654,
-        )
-        monkeypatch.setattr(
-            gateway_cli,
-            "_request_gateway_self_restart",
-            lambda pid: calls.append(("self", pid)) or True,
-        )
-        monkeypatch.setattr(
-            gateway_cli.subprocess,
-            "run",
-            lambda *args, **kwargs: (_ for _ in ()).throw(AssertionError("systemctl should not run")),
-        )
-
-        gateway_cli.systemd_restart()
-
-        assert calls == [("refresh", False), ("self", 654)]
-        assert "restart requested" in capsys.readouterr().out.lower()
-
    def test_gateway_install_passes_system_flags(self, monkeypatch):
        monkeypatch.setattr(gateway_cli, "supports_systemd_services", lambda: True)
        monkeypatch.setattr(gateway_cli, "is_termux", lambda: False)
@@ -1,279 +0,0 @@
-"""Tests for WSL detection and WSL-aware gateway behavior."""
-
-import io
-import subprocess
-import sys
-from types import SimpleNamespace
-from unittest.mock import patch, MagicMock, mock_open
-
-import pytest
-
-import hermes_cli.gateway as gateway
-import hermes_constants
-
-
-# =============================================================================
-# is_wsl() in hermes_constants
-# =============================================================================
-
-class TestIsWsl:
-    """Test the shared is_wsl() utility."""
-
-    def setup_method(self):
-        # Reset cached value between tests
-        hermes_constants._wsl_detected = None
-
-    def test_detects_wsl2(self):
-        fake_content = (
-            "Linux version 5.15.146.1-microsoft-standard-WSL2 "
-            "(gcc (GCC) 11.2.0) #1 SMP Thu Jan 11 04:09:03 UTC 2024\n"
-        )
-        with patch("builtins.open", mock_open(read_data=fake_content)):
-            assert hermes_constants.is_wsl() is True
-
-    def test_detects_wsl1(self):
-        fake_content = (
-            "Linux version 4.4.0-19041-Microsoft "
-            "(Microsoft@Microsoft.com) (gcc version 5.4.0) #1\n"
-        )
-        with patch("builtins.open", mock_open(read_data=fake_content)):
-            assert hermes_constants.is_wsl() is True
-
-    def test_native_linux(self):
-        fake_content = (
-            "Linux version 6.5.0-44-generic (buildd@lcy02-amd64-015) "
-            "(x86_64-linux-gnu-gcc-12 (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0) #44\n"
-        )
-        with patch("builtins.open", mock_open(read_data=fake_content)):
-            assert hermes_constants.is_wsl() is False
-
-    def test_no_proc_version(self):
-        with patch("builtins.open", side_effect=FileNotFoundError):
-            assert hermes_constants.is_wsl() is False
-
-    def test_result_is_cached(self):
-        """After first detection, subsequent calls return the cached value."""
-        hermes_constants._wsl_detected = True
-        # Even with open raising, cached value is returned
-        with patch("builtins.open", side_effect=FileNotFoundError):
-            assert hermes_constants.is_wsl() is True
-
-
-# =============================================================================
-# _wsl_systemd_operational() in gateway
-# =============================================================================
-
-class TestWslSystemdOperational:
-    """Test the WSL systemd check."""
-
-    def test_running(self, monkeypatch):
-        monkeypatch.setattr(
-            gateway.subprocess, "run",
-            lambda *a, **kw: SimpleNamespace(
-                returncode=0, stdout="running\n", stderr=""
-            ),
-        )
-        assert gateway._wsl_systemd_operational() is True
-
-    def test_degraded(self, monkeypatch):
-        monkeypatch.setattr(
-            gateway.subprocess, "run",
-            lambda *a, **kw: SimpleNamespace(
-                returncode=1, stdout="degraded\n", stderr=""
-            ),
-        )
-        assert gateway._wsl_systemd_operational() is True
-
-    def test_starting(self, monkeypatch):
-        monkeypatch.setattr(
-            gateway.subprocess, "run",
-            lambda *a, **kw: SimpleNamespace(
-                returncode=1, stdout="starting\n", stderr=""
-            ),
-        )
-        assert gateway._wsl_systemd_operational() is True
-
-    def test_offline_no_systemd(self, monkeypatch):
-        monkeypatch.setattr(
-            gateway.subprocess, "run",
-            lambda *a, **kw: SimpleNamespace(
-                returncode=1, stdout="offline\n", stderr=""
-            ),
-        )
-        assert gateway._wsl_systemd_operational() is False
-
-    def test_systemctl_not_found(self, monkeypatch):
-        monkeypatch.setattr(
-            gateway.subprocess, "run",
-            MagicMock(side_effect=FileNotFoundError),
-        )
-        assert gateway._wsl_systemd_operational() is False
-
-    def test_timeout(self, monkeypatch):
-        monkeypatch.setattr(
-            gateway.subprocess, "run",
-            MagicMock(side_effect=subprocess.TimeoutExpired("systemctl", 5)),
-        )
-        assert gateway._wsl_systemd_operational() is False
-
-
-# =============================================================================
-# supports_systemd_services() WSL integration
-# =============================================================================
-
-class TestSupportsSystemdServicesWSL:
-    """Test that supports_systemd_services() handles WSL correctly."""
-
-    def test_wsl_with_systemd(self, monkeypatch):
-        """WSL + working systemd → True."""
-        monkeypatch.setattr(gateway, "is_linux", lambda: True)
-        monkeypatch.setattr(gateway, "is_termux", lambda: False)
-        monkeypatch.setattr(gateway, "is_wsl", lambda: True)
-        monkeypatch.setattr(gateway, "_wsl_systemd_operational", lambda: True)
-        assert gateway.supports_systemd_services() is True
-
-    def test_wsl_without_systemd(self, monkeypatch):
-        """WSL + no systemd → False."""
-        monkeypatch.setattr(gateway, "is_linux", lambda: True)
-        monkeypatch.setattr(gateway, "is_termux", lambda: False)
-        monkeypatch.setattr(gateway, "is_wsl", lambda: True)
-        monkeypatch.setattr(gateway, "_wsl_systemd_operational", lambda: False)
-        assert gateway.supports_systemd_services() is False
-
-    def test_native_linux(self, monkeypatch):
-        """Native Linux (not WSL) → True without checking systemd."""
-        monkeypatch.setattr(gateway, "is_linux", lambda: True)
-        monkeypatch.setattr(gateway, "is_termux", lambda: False)
-        monkeypatch.setattr(gateway, "is_wsl", lambda: False)
-        assert gateway.supports_systemd_services() is True
-
-    def test_termux_still_excluded(self, monkeypatch):
-        """Termux → False regardless of WSL status."""
-        monkeypatch.setattr(gateway, "is_linux", lambda: True)
-        monkeypatch.setattr(gateway, "is_termux", lambda: True)
-        assert gateway.supports_systemd_services() is False
-
-
-# =============================================================================
-# WSL messaging in gateway commands
-# =============================================================================
-
-class TestGatewayCommandWSLMessages:
-    """Test that WSL users see appropriate guidance."""
-
-    def test_install_wsl_no_systemd(self, monkeypatch, capsys):
-        """hermes gateway install on WSL without systemd shows guidance."""
-        monkeypatch.setattr(gateway, "is_linux", lambda: True)
-        monkeypatch.setattr(gateway, "is_termux", lambda: False)
-        monkeypatch.setattr(gateway, "is_wsl", lambda: True)
-        monkeypatch.setattr(gateway, "supports_systemd_services", lambda: False)
-        monkeypatch.setattr(gateway, "is_macos", lambda: False)
-        monkeypatch.setattr(gateway, "is_managed", lambda: False)
-
-        args = SimpleNamespace(
-            gateway_command="install", force=False, system=False,
-            run_as_user=None,
-        )
-        with pytest.raises(SystemExit) as exc_info:
-            gateway.gateway_command(args)
-        assert exc_info.value.code == 1
-
-        out = capsys.readouterr().out
-        assert "WSL detected" in out
-        assert "systemd is not running" in out
-        assert "hermes gateway run" in out
-        assert "tmux" in out
-
-    def test_start_wsl_no_systemd(self, monkeypatch, capsys):
-        """hermes gateway start on WSL without systemd shows guidance."""
-        monkeypatch.setattr(gateway, "is_linux", lambda: True)
-        monkeypatch.setattr(gateway, "is_termux", lambda: False)
-        monkeypatch.setattr(gateway, "is_wsl", lambda: True)
-        monkeypatch.setattr(gateway, "supports_systemd_services", lambda: False)
-        monkeypatch.setattr(gateway, "is_macos", lambda: False)
-
-        args = SimpleNamespace(gateway_command="start", system=False)
-        with pytest.raises(SystemExit) as exc_info:
-            gateway.gateway_command(args)
-        assert exc_info.value.code == 1
-
-        out = capsys.readouterr().out
-        assert "WSL detected" in out
-        assert "hermes gateway run" in out
-        assert "wsl.conf" in out
-
-    def test_install_wsl_with_systemd_warns(self, monkeypatch, capsys):
-        """hermes gateway install on WSL with systemd shows warning but proceeds."""
-        monkeypatch.setattr(gateway, "is_linux", lambda: True)
-        monkeypatch.setattr(gateway, "is_termux", lambda: False)
-        monkeypatch.setattr(gateway, "is_wsl", lambda: True)
-        monkeypatch.setattr(gateway, "supports_systemd_services", lambda: True)
-        monkeypatch.setattr(gateway, "is_macos", lambda: False)
-        monkeypatch.setattr(gateway, "is_managed", lambda: False)
-
-        # Mock systemd_install to capture call
-        install_called = []
-        monkeypatch.setattr(
-            gateway, "systemd_install",
-            lambda **kwargs: install_called.append(kwargs),
-        )
-
-        args = SimpleNamespace(
-            gateway_command="install", force=False, system=False,
-            run_as_user=None,
-        )
-        gateway.gateway_command(args)
-
-        out = capsys.readouterr().out
-        assert "WSL detected" in out
-        assert "may not survive WSL restarts" in out
-        assert len(install_called) == 1  # install still proceeded
-
-    def test_status_wsl_running_manual(self, monkeypatch, capsys):
-        """hermes gateway status on WSL with manual process shows WSL note."""
-        monkeypatch.setattr(gateway, "supports_systemd_services", lambda: False)
-        monkeypatch.setattr(gateway, "is_macos", lambda: False)
-        monkeypatch.setattr(gateway, "is_termux", lambda: False)
-        monkeypatch.setattr(gateway, "is_wsl", lambda: True)
-        monkeypatch.setattr(gateway, "find_gateway_pids", lambda: [12345])
-        monkeypatch.setattr(gateway, "_runtime_health_lines", lambda: [])
-        # Stub out the systemd unit path check
-        monkeypatch.setattr(
-            gateway, "get_systemd_unit_path",
-            lambda system=False: SimpleNamespace(exists=lambda: False),
-        )
-        monkeypatch.setattr(
-            gateway, "get_launchd_plist_path",
-            lambda: SimpleNamespace(exists=lambda: False),
-        )
-
-        args = SimpleNamespace(gateway_command="status", deep=False, system=False)
-        gateway.gateway_command(args)
-
-        out = capsys.readouterr().out
-        assert "WSL note" in out
-        assert "tmux or screen" in out
-
-    def test_status_wsl_not_running(self, monkeypatch, capsys):
-        """hermes gateway status on WSL with no process shows WSL start advice."""
-        monkeypatch.setattr(gateway, "supports_systemd_services", lambda: False)
-        monkeypatch.setattr(gateway, "is_macos", lambda: False)
-        monkeypatch.setattr(gateway, "is_termux", lambda: False)
-        monkeypatch.setattr(gateway, "is_wsl", lambda: True)
-        monkeypatch.setattr(gateway, "find_gateway_pids", lambda: [])
-        monkeypatch.setattr(gateway, "_runtime_health_lines", lambda: [])
-        monkeypatch.setattr(
-            gateway, "get_systemd_unit_path",
-            lambda system=False: SimpleNamespace(exists=lambda: False),
-        )
-        monkeypatch.setattr(
-            gateway, "get_launchd_plist_path",
-            lambda: SimpleNamespace(exists=lambda: False),
-        )
-
-        args = SimpleNamespace(gateway_command="status", deep=False, system=False)
-        gateway.gateway_command(args)
-
-        out = capsys.readouterr().out
-        assert "hermes gateway run" in out
-        assert "tmux" in out
@@ -555,103 +555,3 @@ class TestPromptPluginEnvVars:

        # Should not crash, and not save anything
        mock_save.assert_not_called()
-
-
-# ── curses_radiolist ─────────────────────────────────────────────────────
-
-
-class TestCursesRadiolist:
-    """Test the curses_radiolist function (non-TTY fallback path)."""
-
-    def test_non_tty_returns_default(self):
-        from hermes_cli.curses_ui import curses_radiolist
-        with patch("sys.stdin") as mock_stdin:
-            mock_stdin.isatty.return_value = False
-            result = curses_radiolist("Pick one", ["a", "b", "c"], selected=1)
-            assert result == 1
-
-    def test_non_tty_returns_cancel_value(self):
-        from hermes_cli.curses_ui import curses_radiolist
-        with patch("sys.stdin") as mock_stdin:
-            mock_stdin.isatty.return_value = False
-            result = curses_radiolist("Pick", ["x", "y"], selected=0, cancel_returns=1)
-            assert result == 1
-
-
-# ── Provider discovery helpers ───────────────────────────────────────────
-
-
-class TestProviderDiscovery:
-    """Test provider plugin discovery and config helpers."""
-
-    def test_get_current_memory_provider_default(self, tmp_path, monkeypatch):
-        """Empty config returns empty string."""
-        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
-        config_file = tmp_path / "config.yaml"
-        config_file.write_text("memory:\n  provider: ''\n")
-        from hermes_cli.plugins_cmd import _get_current_memory_provider
-        result = _get_current_memory_provider()
-        assert result == ""
-
-    def test_get_current_context_engine_default(self, tmp_path, monkeypatch):
-        """Default config returns 'compressor'."""
-        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
-        config_file = tmp_path / "config.yaml"
-        config_file.write_text("context:\n  engine: compressor\n")
-        from hermes_cli.plugins_cmd import _get_current_context_engine
-        result = _get_current_context_engine()
-        assert result == "compressor"
-
-    def test_save_memory_provider(self, tmp_path, monkeypatch):
-        """Saving a memory provider persists to config.yaml."""
-        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
-        config_file = tmp_path / "config.yaml"
-        config_file.write_text("memory:\n  provider: ''\n")
-        from hermes_cli.plugins_cmd import _save_memory_provider
-        _save_memory_provider("honcho")
-        content = yaml.safe_load(config_file.read_text())
-        assert content["memory"]["provider"] == "honcho"
-
-    def test_save_context_engine(self, tmp_path, monkeypatch):
-        """Saving a context engine persists to config.yaml."""
-        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
-        config_file = tmp_path / "config.yaml"
-        config_file.write_text("context:\n  engine: compressor\n")
-        from hermes_cli.plugins_cmd import _save_context_engine
-        _save_context_engine("lcm")
-        content = yaml.safe_load(config_file.read_text())
-        assert content["context"]["engine"] == "lcm"
-
-    def test_discover_memory_providers_empty(self):
-        """Discovery returns empty list when import fails."""
-        with patch("plugins.memory.discover_memory_providers",
-                    side_effect=ImportError("no module")):
-            from hermes_cli.plugins_cmd import _discover_memory_providers
-            result = _discover_memory_providers()
-            assert result == []
-
-    def test_discover_context_engines_empty(self):
-        """Discovery returns empty list when import fails."""
-        with patch("plugins.context_engine.discover_context_engines",
-                    side_effect=ImportError("no module")):
-            from hermes_cli.plugins_cmd import _discover_context_engines
-            result = _discover_context_engines()
-            assert result == []
-
-
-# ── Auto-activation fix ──────────────────────────────────────────────────
-
-
-class TestNoAutoActivation:
-    """Verify that plugin engines don't auto-activate when config says 'compressor'."""
-
-    def test_compressor_default_ignores_plugin(self):
-        """When context.engine is 'compressor', a plugin-registered engine should NOT
-        be used — only explicit config triggers plugin engines."""
-        # This tests the run_agent.py logic indirectly by checking that the
-        # code path for default config doesn't call get_plugin_context_engine.
-        import run_agent as ra_module
-        source = open(ra_module.__file__).read()
-        # The old code had: "Even with default config, check if a plugin registered one"
-        # The fix removes this. Verify it's gone.
-        assert "Even with default config, check if a plugin registered one" not in source
@@ -4,8 +4,6 @@ import json
 import sys
 import types

-import pytest
-
 from hermes_cli.auth import get_active_provider
 from hermes_cli.config import load_config, save_config
 from hermes_cli.setup import setup_model_provider
@@ -364,52 +362,3 @@ def test_modal_setup_persists_direct_mode_when_user_chooses_their_own_account(tm

    assert config["terminal"]["backend"] == "modal"
    assert config["terminal"]["modal_mode"] == "direct"
-
-
-def test_resolve_hermes_chat_argv_prefers_which(monkeypatch):
-    from hermes_cli import setup as setup_mod
-
-    monkeypatch.setattr(setup_mod.shutil, "which", lambda name: "/usr/local/bin/hermes" if name == "hermes" else None)
-
-    assert setup_mod._resolve_hermes_chat_argv() == ["/usr/local/bin/hermes", "chat"]
-
-
-def test_resolve_hermes_chat_argv_falls_back_to_module(monkeypatch):
-    from hermes_cli import setup as setup_mod
-
-    monkeypatch.setattr(setup_mod.shutil, "which", lambda _name: None)
-    monkeypatch.setattr(setup_mod.importlib.util, "find_spec", lambda name: object() if name == "hermes_cli" else None)
-
-    assert setup_mod._resolve_hermes_chat_argv() == [sys.executable, "-m", "hermes_cli.main", "chat"]
-
-
-def test_offer_launch_chat_execs_fresh_process(monkeypatch):
-    from hermes_cli import setup as setup_mod
-
-    monkeypatch.setattr(setup_mod, "prompt_yes_no", lambda *_args, **_kwargs: True)
-    monkeypatch.setattr(setup_mod, "_resolve_hermes_chat_argv", lambda: ["/usr/local/bin/hermes", "chat"])
-
-    exec_calls = []
-
-    def fake_execvp(path, argv):
-        exec_calls.append((path, argv))
-        raise SystemExit(0)
-
-    monkeypatch.setattr(setup_mod.os, "execvp", fake_execvp)
-
-    with pytest.raises(SystemExit):
-        setup_mod._offer_launch_chat()
-
-    assert exec_calls == [("/usr/local/bin/hermes", ["/usr/local/bin/hermes", "chat"])]
-
-
-def test_offer_launch_chat_manual_fallback_when_unresolvable(monkeypatch, capsys):
-    from hermes_cli import setup as setup_mod
-
-    monkeypatch.setattr(setup_mod, "prompt_yes_no", lambda *_args, **_kwargs: True)
-    monkeypatch.setattr(setup_mod, "_resolve_hermes_chat_argv", lambda: None)
-
-    setup_mod._offer_launch_chat()
-
-    captured = capsys.readouterr()
-    assert "Run 'hermes chat' manually" in captured.out
@@ -22,7 +22,7 @@ def _parse_setup_imports():
 class TestSetupShutilImport:
    def test_shutil_imported_at_module_level(self):
        """shutil must be imported at module level so setup_gateway can use it
-        for the mautrix auto-install path."""
+        for the matrix-nio auto-install path (line ~2126)."""
        names = _parse_setup_imports()
        assert "shutil" in names, (
            "shutil is not imported at the top of hermes_cli/setup.py. "
@@ -428,31 +428,3 @@ class TestPlatformToolsetConsistency:
                f"Platform {platform!r} in tools_config but missing from "
                f"skills_config PLATFORMS"
            )
-
-
-def test_numeric_mcp_server_name_does_not_crash_sorted():
-    """YAML parses bare numeric keys (e.g. ``12306:``) as int.
-
-    _get_platform_tools must normalise them to str so that sorted()
-    on the returned set never raises TypeError on mixed int/str.
-
-    Regression test for https://github.com/NousResearch/hermes-agent/issues/6901
-    """
-    config = {
-        "platform_toolsets": {"cli": ["web", 12306]},
-        "mcp_servers": {
-            12306: {"url": "https://example.com/mcp"},
-            "normal-server": {"url": "https://example.com/mcp2"},
-        },
-    }
-
-    enabled = _get_platform_tools(config, "cli")
-
-    # All names must be str — no int leaking through
-    assert all(isinstance(name, str) for name in enabled), (
-        f"Non-string toolset names found: {enabled}"
-    )
-    assert "12306" in enabled
-
-    # sorted() must not raise TypeError
-    sorted(enabled)
@@ -500,48 +500,6 @@ class TestObservationModeMigration:
        assert cfg.ai_observe_others is True


-class TestInitOnSessionStart:
-    """Tests for the initOnSessionStart config field."""
-
-    def test_default_is_false(self):
-        config = HonchoClientConfig()
-        assert config.init_on_session_start is False
-
-    def test_root_level_true(self, tmp_path):
-        cfg_file = tmp_path / "config.json"
-        cfg_file.write_text(json.dumps({
-            "apiKey": "k",
-            "initOnSessionStart": True,
-        }))
-        cfg = HonchoClientConfig.from_global_config(config_path=cfg_file)
-        assert cfg.init_on_session_start is True
-
-    def test_host_block_overrides_root(self, tmp_path):
-        cfg_file = tmp_path / "config.json"
-        cfg_file.write_text(json.dumps({
-            "apiKey": "k",
-            "initOnSessionStart": True,
-            "hosts": {"hermes": {"initOnSessionStart": False}},
-        }))
-        cfg = HonchoClientConfig.from_global_config(config_path=cfg_file)
-        assert cfg.init_on_session_start is False
-
-    def test_host_block_true_overrides_root_absent(self, tmp_path):
-        cfg_file = tmp_path / "config.json"
-        cfg_file.write_text(json.dumps({
-            "apiKey": "k",
-            "hosts": {"hermes": {"initOnSessionStart": True}},
-        }))
-        cfg = HonchoClientConfig.from_global_config(config_path=cfg_file)
-        assert cfg.init_on_session_start is True
-
-    def test_absent_everywhere_defaults_false(self, tmp_path):
-        cfg_file = tmp_path / "config.json"
-        cfg_file.write_text(json.dumps({"apiKey": "k"}))
-        cfg = HonchoClientConfig.from_global_config(config_path=cfg_file)
-        assert cfg.init_on_session_start is False
-
-
 class TestResetHonchoClient:
    def test_reset_clears_singleton(self):
        import plugins.memory.honcho.client as mod
@@ -275,97 +275,6 @@ class TestPeerLookupHelpers:
 # ---------------------------------------------------------------------------


-# ---------------------------------------------------------------------------
-# Provider init behavior: lazy vs eager in tools mode
-# ---------------------------------------------------------------------------
-
-
-class TestToolsModeInitBehavior:
-    """Verify initOnSessionStart controls session init timing in tools mode."""
-
-    def _make_provider_with_config(self, recall_mode="tools", init_on_session_start=False,
-                                    peer_name=None, user_id=None):
-        """Create a HonchoMemoryProvider with mocked config and dependencies."""
-        from plugins.memory.honcho.client import HonchoClientConfig
-
-        cfg = HonchoClientConfig(
-            api_key="test-key",
-            enabled=True,
-            recall_mode=recall_mode,
-            init_on_session_start=init_on_session_start,
-            peer_name=peer_name,
-        )
-
-        provider = HonchoMemoryProvider()
-
-        # Patch the config loading and session init to avoid real Honcho calls
-        from unittest.mock import patch, MagicMock
-
-        mock_manager = MagicMock()
-        mock_session = MagicMock()
-        mock_session.messages = []
-        mock_manager.get_or_create.return_value = mock_session
-
-        init_kwargs = {}
-        if user_id:
-            init_kwargs["user_id"] = user_id
-
-        with patch("plugins.memory.honcho.client.HonchoClientConfig.from_global_config", return_value=cfg), \
-             patch("plugins.memory.honcho.client.get_honcho_client", return_value=MagicMock()), \
-             patch("plugins.memory.honcho.session.HonchoSessionManager", return_value=mock_manager), \
-             patch("hermes_constants.get_hermes_home", return_value=MagicMock()):
-            provider.initialize(session_id="test-session-001", **init_kwargs)
-
-        return provider, cfg
-
-    def test_tools_lazy_default(self):
-        """tools + initOnSessionStart=false → session NOT initialized after initialize()."""
-        provider, _ = self._make_provider_with_config(
-            recall_mode="tools", init_on_session_start=False,
-        )
-        assert provider._session_initialized is False
-        assert provider._manager is None
-        assert provider._lazy_init_kwargs is not None
-
-    def test_tools_eager_init(self):
-        """tools + initOnSessionStart=true → session IS initialized after initialize()."""
-        provider, _ = self._make_provider_with_config(
-            recall_mode="tools", init_on_session_start=True,
-        )
-        assert provider._session_initialized is True
-        assert provider._manager is not None
-
-    def test_tools_eager_prefetch_still_empty(self):
-        """tools mode with eager init still returns empty from prefetch() (no auto-injection)."""
-        provider, _ = self._make_provider_with_config(
-            recall_mode="tools", init_on_session_start=True,
-        )
-        assert provider.prefetch("test query") == ""
-
-    def test_tools_lazy_prefetch_empty(self):
-        """tools mode with lazy init also returns empty from prefetch()."""
-        provider, _ = self._make_provider_with_config(
-            recall_mode="tools", init_on_session_start=False,
-        )
-        assert provider.prefetch("test query") == ""
-
-    def test_explicit_peer_name_not_overridden_by_user_id(self):
-        """Explicit peerName in config must not be replaced by gateway user_id."""
-        _, cfg = self._make_provider_with_config(
-            recall_mode="tools", init_on_session_start=True,
-            peer_name="Kathie", user_id="8439114563",
-        )
-        assert cfg.peer_name == "Kathie"
-
-    def test_user_id_used_when_no_peer_name(self):
-        """Gateway user_id is used as peer_name when no explicit peerName configured."""
-        _, cfg = self._make_provider_with_config(
-            recall_mode="tools", init_on_session_start=True,
-            peer_name=None, user_id="8439114563",
-        )
-        assert cfg.peer_name == "8439114563"
-
-
 class TestChunkMessage:
    def test_short_message_single_chunk(self):
        result = HonchoMemoryProvider._chunk_message("hello world", 100)
@@ -1823,111 +1823,6 @@ class TestRunConversation:
        assert result["final_response"] == "Here is the actual answer."
        assert result["api_calls"] == 2  # 1 original + 1 nudge retry

-    def test_empty_response_triggers_fallback_provider(self, agent):
-        """After 3 empty retries, fallback provider is activated and produces content."""
-        self._setup_agent(agent)
-        agent.base_url = "http://127.0.0.1:1234/v1"
-        # Configure a fallback chain
-        agent._fallback_chain = [{"provider": "openrouter", "model": "anthropic/claude-sonnet-4"}]
-        agent._fallback_index = 0
-        agent._fallback_activated = False
-
-        empty_resp = _mock_response(content=None, finish_reason="stop")
-        content_resp = _mock_response(content="Fallback answer.", finish_reason="stop")
-        # 4 empty (1 orig + 3 retries), then fallback model answers
-        agent.client.chat.completions.create.side_effect = [
-            empty_resp, empty_resp, empty_resp, empty_resp, content_resp,
-        ]
-
-        fallback_called = {"called": False}
-
-        def _mock_fallback():
-            fallback_called["called"] = True
-            # Simulate what _try_activate_fallback does: just advance the
-            # index and set the flag (the client is already mocked).
-            agent._fallback_index = 1
-            agent._fallback_activated = True
-            agent.model = "anthropic/claude-sonnet-4"
-            agent.provider = "openrouter"
-            return True
-
-        with (
-            patch.object(agent, "_persist_session"),
-            patch.object(agent, "_save_trajectory"),
-            patch.object(agent, "_cleanup_task_resources"),
-            patch.object(agent, "_try_activate_fallback", side_effect=_mock_fallback),
-        ):
-            result = agent.run_conversation("answer me")
-        assert fallback_called["called"], "Fallback should have been triggered"
-        assert result["completed"] is True
-        assert result["final_response"] == "Fallback answer."
-
-    def test_empty_response_fallback_also_empty_returns_empty(self, agent):
-        """If fallback also returns empty, final response is (empty)."""
-        self._setup_agent(agent)
-        agent.base_url = "http://127.0.0.1:1234/v1"
-        agent._fallback_chain = [{"provider": "openrouter", "model": "anthropic/claude-sonnet-4"}]
-        agent._fallback_index = 0
-        agent._fallback_activated = False
-
-        empty_resp = _mock_response(content=None, finish_reason="stop")
-        # 4 empty from primary (1 + 3 retries), fallback activated,
-        # then 4 more empty from fallback (1 + 3 retries), no more fallbacks
-        agent.client.chat.completions.create.side_effect = [
-            empty_resp, empty_resp, empty_resp, empty_resp,  # primary exhausted
-            empty_resp, empty_resp, empty_resp, empty_resp,  # fallback exhausted
-        ]
-
-        def _mock_fallback():
-            if agent._fallback_index >= len(agent._fallback_chain):
-                return False
-            agent._fallback_index += 1
-            agent._fallback_activated = True
-            agent.model = "anthropic/claude-sonnet-4"
-            agent.provider = "openrouter"
-            return True
-
-        with (
-            patch.object(agent, "_persist_session"),
-            patch.object(agent, "_save_trajectory"),
-            patch.object(agent, "_cleanup_task_resources"),
-            patch.object(agent, "_try_activate_fallback", side_effect=_mock_fallback),
-        ):
-            result = agent.run_conversation("answer me")
-        assert result["completed"] is True
-        assert result["final_response"] == "(empty)"
-
-    def test_empty_response_emits_status_for_gateway(self, agent):
-        """_emit_status is called during empty retries so gateway users see feedback."""
-        self._setup_agent(agent)
-        agent.base_url = "http://127.0.0.1:1234/v1"
-
-        empty_resp = _mock_response(content=None, finish_reason="stop")
-        # 4 empty: 1 original + 3 retries, all empty, no fallback
-        agent.client.chat.completions.create.side_effect = [
-            empty_resp, empty_resp, empty_resp, empty_resp,
-        ]
-
-        status_messages = []
-
-        def _capture_status(msg):
-            status_messages.append(msg)
-
-        with (
-            patch.object(agent, "_persist_session"),
-            patch.object(agent, "_save_trajectory"),
-            patch.object(agent, "_cleanup_task_resources"),
-            patch.object(agent, "_emit_status", side_effect=_capture_status),
-        ):
-            result = agent.run_conversation("answer me")
-
-        assert result["final_response"] == "(empty)"
-        # Should have emitted retry statuses (3 retries) + final failure
-        retry_msgs = [m for m in status_messages if "retrying" in m.lower()]
-        assert len(retry_msgs) == 3, f"Expected 3 retry status messages, got {len(retry_msgs)}: {status_messages}"
-        failure_msgs = [m for m in status_messages if "no content" in m.lower() or "no fallback" in m.lower()]
-        assert len(failure_msgs) >= 1, f"Expected at least 1 failure status, got: {status_messages}"
-
    def test_nous_401_refreshes_after_remint_and_retries(self, agent):
        self._setup_agent(agent)
        agent.provider = "nous"
@@ -12,10 +12,10 @@ def _load_optional_dependencies():


 def test_matrix_extra_linux_only_in_all():
-    """mautrix[encryption] depends on python-olm which is upstream-broken on
-    modern macOS (archived libolm, C++ errors with Clang 21+).  The [matrix]
-    extra is included in [all] but gated to Linux via a platform marker so
-    that ``hermes update`` doesn't fail on macOS."""
+    """matrix-nio[e2e] depends on python-olm which is upstream-broken on modern
+    macOS (archived libolm, C++ errors with Clang 21+).  The [matrix] extra is
+    included in [all] but gated to Linux via a platform marker so that
+    ``hermes update`` doesn't fail on macOS."""
    optional_dependencies = _load_optional_dependencies()

    assert "matrix" in optional_dependencies
@@ -156,8 +156,6 @@ class TestSessionKeyContext:
        assert "reset_current_session_key" in called_names


-
-
 class TestRmFalsePositiveFix:
    """Regression tests: filenames starting with 'r' must NOT trigger recursive delete."""

@@ -1,176 +0,0 @@
-"""Unit tests for tools/budget_config.py.
-
-Covers default values, resolve_threshold() priority chain
-(pinned > tool_overrides > registry > default), immutability,
-and the PINNED_THRESHOLDS escape-hatch for read_file.
-"""
-
-import dataclasses
-import math
-from unittest.mock import patch
-
-import pytest
-
-from tools.budget_config import (
-    DEFAULT_BUDGET,
-    DEFAULT_PREVIEW_SIZE_CHARS,
-    DEFAULT_RESULT_SIZE_CHARS,
-    DEFAULT_TURN_BUDGET_CHARS,
-    PINNED_THRESHOLDS,
-    BudgetConfig,
-)
-
-
-# ---------------------------------------------------------------------------
-# Module-level constants
-# ---------------------------------------------------------------------------
-
-
-class TestModuleConstants:
-    """Verify documented default values haven't drifted."""
-
-    def test_default_result_size(self):
-        assert DEFAULT_RESULT_SIZE_CHARS == 100_000
-
-    def test_default_turn_budget(self):
-        assert DEFAULT_TURN_BUDGET_CHARS == 200_000
-
-    def test_default_preview_size(self):
-        assert DEFAULT_PREVIEW_SIZE_CHARS == 1_500
-
-
-class TestPinnedThresholds:
-    """PINNED_THRESHOLDS – tools whose values must never be overridden."""
-
-    def test_read_file_is_inf(self):
-        assert PINNED_THRESHOLDS["read_file"] == float("inf")
-        assert math.isinf(PINNED_THRESHOLDS["read_file"])
-
-    def test_pinned_is_not_empty(self):
-        assert len(PINNED_THRESHOLDS) >= 1
-
-
-# ---------------------------------------------------------------------------
-# BudgetConfig defaults
-# ---------------------------------------------------------------------------
-
-
-class TestBudgetConfigDefaults:
-    """BudgetConfig() should match the module-level defaults exactly."""
-
-    def test_default_result_size(self):
-        cfg = BudgetConfig()
-        assert cfg.default_result_size == DEFAULT_RESULT_SIZE_CHARS
-
-    def test_default_turn_budget(self):
-        cfg = BudgetConfig()
-        assert cfg.turn_budget == DEFAULT_TURN_BUDGET_CHARS
-
-    def test_default_preview_size(self):
-        cfg = BudgetConfig()
-        assert cfg.preview_size == DEFAULT_PREVIEW_SIZE_CHARS
-
-    def test_default_tool_overrides_empty(self):
-        cfg = BudgetConfig()
-        assert cfg.tool_overrides == {}
-
-    def test_default_budget_singleton_matches(self):
-        """DEFAULT_BUDGET should equal a freshly constructed BudgetConfig."""
-        assert DEFAULT_BUDGET == BudgetConfig()
-
-
-# ---------------------------------------------------------------------------
-# Immutability (frozen=True)
-# ---------------------------------------------------------------------------
-
-
-class TestBudgetConfigFrozen:
-    """Frozen dataclass must reject attribute mutation."""
-
-    def test_cannot_set_default_result_size(self):
-        cfg = BudgetConfig()
-        with pytest.raises(dataclasses.FrozenInstanceError):
-            cfg.default_result_size = 999
-
-    def test_cannot_set_turn_budget(self):
-        cfg = BudgetConfig()
-        with pytest.raises(dataclasses.FrozenInstanceError):
-            cfg.turn_budget = 999
-
-    def test_cannot_set_preview_size(self):
-        cfg = BudgetConfig()
-        with pytest.raises(dataclasses.FrozenInstanceError):
-            cfg.preview_size = 999
-
-    def test_cannot_set_tool_overrides(self):
-        cfg = BudgetConfig()
-        with pytest.raises(dataclasses.FrozenInstanceError):
-            cfg.tool_overrides = {"foo": 1}
-
-
-# ---------------------------------------------------------------------------
-# Custom construction
-# ---------------------------------------------------------------------------
-
-
-class TestBudgetConfigCustom:
-    """BudgetConfig can be created with non-default values."""
-
-    def test_custom_values(self):
-        cfg = BudgetConfig(
-            default_result_size=50_000,
-            turn_budget=100_000,
-            preview_size=500,
-            tool_overrides={"my_tool": 42},
-        )
-        assert cfg.default_result_size == 50_000
-        assert cfg.turn_budget == 100_000
-        assert cfg.preview_size == 500
-        assert cfg.tool_overrides == {"my_tool": 42}
-
-
-# ---------------------------------------------------------------------------
-# resolve_threshold() priority chain
-# ---------------------------------------------------------------------------
-
-
-class TestResolveThreshold:
-    """Priority: pinned > tool_overrides > registry > default."""
-
-    def test_pinned_wins_over_override(self):
-        """Even if tool_overrides contains read_file, pinned value wins."""
-        cfg = BudgetConfig(tool_overrides={"read_file": 1})
-        result = cfg.resolve_threshold("read_file")
-        assert result == float("inf")
-
-    def test_tool_override_wins_over_default(self):
-        """tool_overrides should be returned before falling back to registry."""
-        cfg = BudgetConfig(tool_overrides={"my_tool": 42})
-        result = cfg.resolve_threshold("my_tool")
-        assert result == 42
-
-    @patch("tools.registry.registry")
-    def test_falls_back_to_registry(self, mock_registry):
-        """When not pinned and not in overrides, delegate to registry."""
-        mock_registry.get_max_result_size.return_value = 77_777
-        cfg = BudgetConfig()
-        result = cfg.resolve_threshold("some_tool")
-        mock_registry.get_max_result_size.assert_called_once_with(
-            "some_tool", default=DEFAULT_RESULT_SIZE_CHARS
-        )
-        assert result == 77_777
-
-    @patch("tools.registry.registry")
-    def test_registry_receives_custom_default(self, mock_registry):
-        """Custom default_result_size flows through to registry call."""
-        mock_registry.get_max_result_size.return_value = 50_000
-        cfg = BudgetConfig(default_result_size=50_000)
-        cfg.resolve_threshold("unknown_tool")
-        mock_registry.get_max_result_size.assert_called_once_with(
-            "unknown_tool", default=50_000
-        )
-
-    def test_pinned_read_file_returns_inf(self):
-        """Canonical case: read_file must always return inf."""
-        cfg = BudgetConfig()
-        assert cfg.resolve_threshold("read_file") == float("inf")
@@ -205,9 +205,9 @@ class TestMacosOsascript:

 class TestIsWsl:
    def setup_method(self):
-        # _is_wsl is now hermes_constants.is_wsl — reset its cache
-        import hermes_constants
-        hermes_constants._wsl_detected = None
+        # Reset cached value before each test
+        import hermes_cli.clipboard as cb
+        cb._wsl_detected = None

    def test_wsl2_detected(self):
        content = "Linux version 5.15.0 (microsoft-standard-WSL2)"
@@ -229,7 +229,6 @@ class TestIsWsl:
            assert _is_wsl() is False

    def test_result_is_cached(self):
-        import hermes_constants
        content = "Linux version 5.15.0 (microsoft-standard-WSL2)"
        with patch("builtins.open", mock_open(read_data=content)) as m:
            assert _is_wsl() is True
@@ -1210,73 +1210,5 @@ class TestDelegateHeartbeat(unittest.TestCase):
            f"Heartbeat should include last_activity_desc: {touch_calls}")


-class TestDelegationReasoningEffort(unittest.TestCase):
-    """Tests for delegation.reasoning_effort config override."""
-
-    @patch("tools.delegate_tool._load_config")
-    @patch("run_agent.AIAgent")
-    def test_inherits_parent_reasoning_when_no_override(self, MockAgent, mock_cfg):
-        """With no delegation.reasoning_effort, child inherits parent's config."""
-        mock_cfg.return_value = {"max_iterations": 50, "reasoning_effort": ""}
-        MockAgent.return_value = MagicMock()
-        parent = _make_mock_parent()
-        parent.reasoning_config = {"enabled": True, "effort": "xhigh"}
-
-        _build_child_agent(
-            task_index=0, goal="test", context=None, toolsets=None,
-            model=None, max_iterations=50, parent_agent=parent,
-        )
-        call_kwargs = MockAgent.call_args[1]
-        self.assertEqual(call_kwargs["reasoning_config"], {"enabled": True, "effort": "xhigh"})
-
-    @patch("tools.delegate_tool._load_config")
-    @patch("run_agent.AIAgent")
-    def test_override_reasoning_effort_from_config(self, MockAgent, mock_cfg):
-        """delegation.reasoning_effort overrides the parent's level."""
-        mock_cfg.return_value = {"max_iterations": 50, "reasoning_effort": "low"}
-        MockAgent.return_value = MagicMock()
-        parent = _make_mock_parent()
-        parent.reasoning_config = {"enabled": True, "effort": "xhigh"}
-
-        _build_child_agent(
-            task_index=0, goal="test", context=None, toolsets=None,
-            model=None, max_iterations=50, parent_agent=parent,
-        )
-        call_kwargs = MockAgent.call_args[1]
-        self.assertEqual(call_kwargs["reasoning_config"], {"enabled": True, "effort": "low"})
-
-    @patch("tools.delegate_tool._load_config")
-    @patch("run_agent.AIAgent")
-    def test_override_reasoning_effort_none_disables(self, MockAgent, mock_cfg):
-        """delegation.reasoning_effort: 'none' disables thinking for subagents."""
-        mock_cfg.return_value = {"max_iterations": 50, "reasoning_effort": "none"}
-        MockAgent.return_value = MagicMock()
-        parent = _make_mock_parent()
-        parent.reasoning_config = {"enabled": True, "effort": "high"}
-
-        _build_child_agent(
-            task_index=0, goal="test", context=None, toolsets=None,
-            model=None, max_iterations=50, parent_agent=parent,
-        )
-        call_kwargs = MockAgent.call_args[1]
-        self.assertEqual(call_kwargs["reasoning_config"], {"enabled": False})
-
-    @patch("tools.delegate_tool._load_config")
-    @patch("run_agent.AIAgent")
-    def test_invalid_reasoning_effort_falls_back_to_parent(self, MockAgent, mock_cfg):
-        """Invalid delegation.reasoning_effort falls back to parent's config."""
-        mock_cfg.return_value = {"max_iterations": 50, "reasoning_effort": "banana"}
-        MockAgent.return_value = MagicMock()
-        parent = _make_mock_parent()
-        parent.reasoning_config = {"enabled": True, "effort": "medium"}
-
-        _build_child_agent(
-            task_index=0, goal="test", context=None, toolsets=None,
-            model=None, max_iterations=50, parent_agent=parent,
-        )
-        call_kwargs = MockAgent.call_args[1]
-        self.assertEqual(call_kwargs["reasoning_config"], {"enabled": True, "effort": "medium"})
-
-
 if __name__ == "__main__":
    unittest.main()
@@ -1,148 +0,0 @@
-"""Tests for edge cases in tools/file_operations.py.
-
-Covers:
-- ``_is_likely_binary()`` content-analysis branch (dead-code removal regression guard)
-- ``_check_lint()`` robustness against file paths containing curly braces
-"""
-
-import pytest
-from unittest.mock import MagicMock, patch
-
-from tools.file_operations import ShellFileOperations
-
-
-# =========================================================================
-# _is_likely_binary edge cases
-# =========================================================================
-
-
-class TestIsLikelyBinary:
-    """Verify content-analysis logic after dead-code removal."""
-
-    @pytest.fixture()
-    def ops(self):
-        return ShellFileOperations.__new__(ShellFileOperations)
-
-    def test_binary_extension_returns_true(self, ops):
-        """Known binary extensions should short-circuit without content analysis."""
-        assert ops._is_likely_binary("image.png") is True
-        assert ops._is_likely_binary("archive.tar.gz", content_sample="hello") is True
-
-    def test_text_content_returns_false(self, ops):
-        """Normal printable text should not be classified as binary."""
-        sample = "Hello, world!\nThis is a normal text file.\n"
-        assert ops._is_likely_binary("unknown.xyz", content_sample=sample) is False
-
-    def test_binary_content_returns_true(self, ops):
-        """Content with >30% non-printable characters should be classified as binary."""
-        # 500 NUL bytes + 500 printable = 50% non-printable → binary
-        # Use .xyz extension (not in BINARY_EXTENSIONS) to ensure content analysis runs
-        sample = "\x00" * 500 + "a" * 500
-        assert ops._is_likely_binary("data.xyz", content_sample=sample) is True
-
-    def test_no_content_sample_returns_false(self, ops):
-        """When no content sample is provided and extension is unknown → not binary."""
-        assert ops._is_likely_binary("mystery_file") is False
-
-    def test_none_content_sample_returns_false(self, ops):
-        """Explicit ``None`` content_sample should behave the same as missing."""
-        assert ops._is_likely_binary("mystery_file", content_sample=None) is False
-
-    def test_empty_string_content_sample_returns_false(self, ops):
-        """Empty string is falsy, so content analysis should be skipped → not binary."""
-        assert ops._is_likely_binary("mystery_file", content_sample="") is False
-
-    def test_threshold_boundary(self, ops):
-        """Exactly 30% non-printable should NOT trigger binary classification (> 0.30, not >=)."""
-        # 300 NUL bytes + 700 printable = 30.0% → should be False (uses strict >)
-        sample = "\x00" * 300 + "a" * 700
-        assert ops._is_likely_binary("data.xyz", content_sample=sample) is False
-
-    def test_just_above_threshold(self, ops):
-        """301/1000 = 30.1% non-printable → should be binary."""
-        sample = "\x00" * 301 + "a" * 699
-        assert ops._is_likely_binary("data.xyz", content_sample=sample) is True
-
-    def test_tabs_and_newlines_excluded(self, ops):
-        """Tabs, carriage returns, and newlines should not count as non-printable."""
-        sample = "\t" * 400 + "\n" * 300 + "\r" * 200 + "a" * 100
-        assert ops._is_likely_binary("file.txt", content_sample=sample) is False
-
-    def test_content_sample_longer_than_1000(self, ops):
-        """Only the first 1000 characters should be analysed."""
-        # First 1000 chars: 200 NUL + 800 printable = 20% → not binary
-        # Remaining 1000 chars: all NUL → ignored by [:1000] slice
-        sample = "\x00" * 200 + "a" * 800 + "\x00" * 1000
-        assert ops._is_likely_binary("file.xyz", content_sample=sample) is False
-
-
-# =========================================================================
-# _check_lint edge cases
-# =========================================================================
-
-
-class TestCheckLintBracePaths:
-    """Verify _check_lint handles file paths with curly braces safely."""
-
-    @pytest.fixture()
-    def ops(self):
-        obj = ShellFileOperations.__new__(ShellFileOperations)
-        obj._command_cache = {}
-        return obj
-
-    def test_normal_path(self, ops):
-        """Normal path without braces should work as before."""
-        with patch.object(ops, "_has_command", return_value=True), \
-             patch.object(ops, "_exec") as mock_exec:
-            mock_exec.return_value = MagicMock(exit_code=0, stdout="")
-            result = ops._check_lint("/tmp/test_file.py")
-
-        assert result.success is True
-        # Verify the command was built correctly
-        cmd_arg = mock_exec.call_args[0][0]
-        assert "'/tmp/test_file.py'" in cmd_arg
-
-    def test_path_with_curly_braces(self, ops):
-        """Path containing ``{`` and ``}`` must not raise KeyError/ValueError."""
-        with patch.object(ops, "_has_command", return_value=True), \
-             patch.object(ops, "_exec") as mock_exec:
-            mock_exec.return_value = MagicMock(exit_code=0, stdout="")
-            # This would raise KeyError with .format() but works with .replace()
-            result = ops._check_lint("/tmp/{test}_file.py")
-
-        assert result.success is True
-        cmd_arg = mock_exec.call_args[0][0]
-        assert "{test}" in cmd_arg
-
-    def test_path_with_nested_braces(self, ops):
-        """Path with complex brace patterns like ``{{var}}`` should be safe."""
-        with patch.object(ops, "_has_command", return_value=True), \
-             patch.object(ops, "_exec") as mock_exec:
-            mock_exec.return_value = MagicMock(exit_code=0, stdout="")
-            result = ops._check_lint("/tmp/{{var}}.py")
-
-        assert result.success is True
-
-    def test_unsupported_extension_skipped(self, ops):
-        """Extensions without a linter should return a skipped result."""
-        result = ops._check_lint("/tmp/file.unknown_ext")
-        assert result.skipped is True
-
-    def test_missing_linter_skipped(self, ops):
-        """When the linter binary is not installed, skip gracefully."""
-        with patch.object(ops, "_has_command", return_value=False):
-            result = ops._check_lint("/tmp/test.py")
-        assert result.skipped is True
-
-    def test_lint_failure_returns_output(self, ops):
-        """When the linter exits non-zero, result should capture output."""
-        with patch.object(ops, "_has_command", return_value=True), \
-             patch.object(ops, "_exec") as mock_exec:
-            mock_exec.return_value = MagicMock(
-                exit_code=1,
-                stdout="SyntaxError: invalid syntax",
-            )
-            result = ops._check_lint("/tmp/bad.py")
-
-        assert result.success is False
-        assert "SyntaxError" in result.output
@@ -255,57 +255,3 @@ class TestEdgeCases:

        mgr.sync(force=True)
        upload.assert_not_called()  # _file_mtime_key returns None, skipped
-
-
-class TestBulkUpload:
-    """Tests for the optional bulk_upload_fn callback."""
-
-    def test_bulk_upload_used_when_provided(self, tmp_files):
-        """When bulk_upload_fn is set, it's called instead of per-file upload_fn."""
-        upload = MagicMock()
-        bulk_upload = MagicMock()
-        mgr = FileSyncManager(
-            get_files_fn=_make_get_files(tmp_files),
-            upload_fn=upload,
-            delete_fn=MagicMock(),
-            bulk_upload_fn=bulk_upload,
-        )
-
-        mgr.sync(force=True)
-        upload.assert_not_called()
-        bulk_upload.assert_called_once()
-        # All 3 files passed as a list of (host, remote) tuples
-        files_arg = bulk_upload.call_args[0][0]
-        assert len(files_arg) == 3
-
-    def test_fallback_to_upload_fn_when_no_bulk(self, tmp_files):
-        """Without bulk_upload_fn, per-file upload_fn is used (backwards compat)."""
-        upload = MagicMock()
-        mgr = FileSyncManager(
-            get_files_fn=_make_get_files(tmp_files),
-            upload_fn=upload,
-            delete_fn=MagicMock(),
-            bulk_upload_fn=None,
-        )
-
-        mgr.sync(force=True)
-        assert upload.call_count == 3
-
-    def test_bulk_upload_rollback_on_failure(self, tmp_files):
-        """Bulk upload failure rolls back synced state so next sync retries."""
-        bulk_upload = MagicMock(side_effect=RuntimeError("upload failed"))
-        mgr = FileSyncManager(
-            get_files_fn=_make_get_files(tmp_files),
-            upload_fn=MagicMock(),
-            delete_fn=MagicMock(),
-            bulk_upload_fn=bulk_upload,
-        )
-
-        mgr.sync(force=True)  # fails, should rollback
-
-        # State rolled back: next sync should retry all files
-        bulk_upload.side_effect = None
-        bulk_upload.reset_mock()
-        mgr.sync(force=True)
-        bulk_upload.assert_called_once()
-        assert len(bulk_upload.call_args[0][0]) == 3
@@ -215,7 +215,6 @@ def test_openai_tts_uses_managed_audio_gateway_when_direct_key_absent(monkeypatc
    _install_fake_tools_package()
    _install_fake_openai_module(captured)
    monkeypatch.delenv("VOICE_TOOLS_OPENAI_KEY", raising=False)
-    monkeypatch.delenv("OPENAI_API_KEY", raising=False)
    monkeypatch.setenv("TOOL_GATEWAY_DOMAIN", "nousresearch.com")
    monkeypatch.setenv("TOOL_GATEWAY_USER_TOKEN", "nous-token")

@@ -257,7 +256,6 @@ def test_transcription_uses_model_specific_response_formats(monkeypatch, tmp_pat
    monkeypatch.setenv("HERMES_HOME", str(tmp_path))
    (tmp_path / "config.yaml").write_text("stt:\n  provider: openai\n")
    monkeypatch.delenv("VOICE_TOOLS_OPENAI_KEY", raising=False)
-    monkeypatch.delenv("OPENAI_API_KEY", raising=False)
    monkeypatch.setenv("TOOL_GATEWAY_DOMAIN", "nousresearch.com")
    monkeypatch.setenv("TOOL_GATEWAY_USER_TOKEN", "nous-token")

@@ -6,7 +6,6 @@ from unittest.mock import patch
 from tools.skills_sync import (
    _get_bundled_dir,
    _read_manifest,
-    _read_skill_name,
    _write_manifest,
    _discover_bundled_skills,
    _compute_relative_dest,
@@ -133,37 +132,6 @@ class TestDiscoverBundledSkills:
        assert skills == []


-class TestReadSkillName:
-    def test_reads_name_from_frontmatter(self, tmp_path):
-        skill_md = tmp_path / "SKILL.md"
-        skill_md.write_text("---\nname: audiocraft-audio-generation\n---\n# Skill")
-        assert _read_skill_name(skill_md, "audiocraft") == "audiocraft-audio-generation"
-
-    def test_falls_back_to_dir_name_without_frontmatter(self, tmp_path):
-        skill_md = tmp_path / "SKILL.md"
-        skill_md.write_text("# Just a heading\nNo frontmatter here")
-        assert _read_skill_name(skill_md, "my-skill") == "my-skill"
-
-    def test_falls_back_when_name_field_empty(self, tmp_path):
-        skill_md = tmp_path / "SKILL.md"
-        skill_md.write_text("---\nname:\n---\n")
-        assert _read_skill_name(skill_md, "fallback") == "fallback"
-
-    def test_handles_quoted_name(self, tmp_path):
-        skill_md = tmp_path / "SKILL.md"
-        skill_md.write_text('---\nname: "serving-llms-vllm"\n---\n')
-        assert _read_skill_name(skill_md, "vllm") == "serving-llms-vllm"
-
-    def test_discover_uses_frontmatter_name(self, tmp_path):
-        skill_dir = tmp_path / "category" / "audiocraft"
-        skill_dir.mkdir(parents=True)
-        (skill_dir / "SKILL.md").write_text(
-            "---\nname: audiocraft-audio-generation\n---\n# Skill"
-        )
-        skills = _discover_bundled_skills(tmp_path)
-        assert skills[0][0] == "audiocraft-audio-generation"
-
-
 class TestComputeRelativeDest:
    def test_preserves_category_structure(self):
        bundled = Path("/repo/skills")
@@ -1,287 +0,0 @@
-"""Unit tests for tools/tool_backend_helpers.py.
-
-Tests cover:
-- managed_nous_tools_enabled() feature flag
-- normalize_browser_cloud_provider() coercion
-- coerce_modal_mode() / normalize_modal_mode() validation
-- has_direct_modal_credentials() detection
-- resolve_modal_backend_state() backend selection matrix
-- resolve_openai_audio_api_key() priority chain
-"""
-
-from __future__ import annotations
-
-from pathlib import Path
-from unittest.mock import patch
-
-import pytest
-
-from tools.tool_backend_helpers import (
-    coerce_modal_mode,
-    has_direct_modal_credentials,
-    managed_nous_tools_enabled,
-    normalize_browser_cloud_provider,
-    normalize_modal_mode,
-    resolve_modal_backend_state,
-    resolve_openai_audio_api_key,
-)
-
-
-# ---------------------------------------------------------------------------
-# managed_nous_tools_enabled
-# ---------------------------------------------------------------------------
-class TestManagedNousToolsEnabled:
-    """Feature flag driven by HERMES_ENABLE_NOUS_MANAGED_TOOLS."""
-
-    def test_disabled_by_default(self, monkeypatch):
-        monkeypatch.delenv("HERMES_ENABLE_NOUS_MANAGED_TOOLS", raising=False)
-        assert managed_nous_tools_enabled() is False
-
-    @pytest.mark.parametrize("val", ["1", "true", "True", "yes"])
-    def test_enabled_when_truthy(self, monkeypatch, val):
-        monkeypatch.setenv("HERMES_ENABLE_NOUS_MANAGED_TOOLS", val)
-        assert managed_nous_tools_enabled() is True
-
-    @pytest.mark.parametrize("val", ["0", "false", "no", ""])
-    def test_disabled_when_falsy(self, monkeypatch, val):
-        monkeypatch.setenv("HERMES_ENABLE_NOUS_MANAGED_TOOLS", val)
-        assert managed_nous_tools_enabled() is False
-
-
-# ---------------------------------------------------------------------------
-# normalize_browser_cloud_provider
-# ---------------------------------------------------------------------------
-class TestNormalizeBrowserCloudProvider:
-    """Coerce arbitrary input to a lowercase browser provider key."""
-
-    def test_none_returns_default(self):
-        assert normalize_browser_cloud_provider(None) == "local"
-
-    def test_empty_string_returns_default(self):
-        assert normalize_browser_cloud_provider("") == "local"
-
-    def test_whitespace_only_returns_default(self):
-        assert normalize_browser_cloud_provider("   ") == "local"
-
-    def test_known_provider_normalized(self):
-        assert normalize_browser_cloud_provider("BrowserBase") == "browserbase"
-
-    def test_strips_whitespace(self):
-        assert normalize_browser_cloud_provider("  Local  ") == "local"
-
-    def test_integer_coerced(self):
-        result = normalize_browser_cloud_provider(42)
-        assert isinstance(result, str)
-        assert result == "42"
-
-
-# ---------------------------------------------------------------------------
-# coerce_modal_mode / normalize_modal_mode
-# ---------------------------------------------------------------------------
-class TestCoerceModalMode:
-    """Validate and coerce the requested modal execution mode."""
-
-    @pytest.mark.parametrize("value", ["auto", "direct", "managed"])
-    def test_valid_modes_passthrough(self, value):
-        assert coerce_modal_mode(value) == value
-
-    def test_none_returns_auto(self):
-        assert coerce_modal_mode(None) == "auto"
-
-    def test_empty_string_returns_auto(self):
-        assert coerce_modal_mode("") == "auto"
-
-    def test_whitespace_only_returns_auto(self):
-        assert coerce_modal_mode("   ") == "auto"
-
-    def test_uppercase_normalized(self):
-        assert coerce_modal_mode("DIRECT") == "direct"
-
-    def test_mixed_case_normalized(self):
-        assert coerce_modal_mode("Managed") == "managed"
-
-    def test_invalid_mode_falls_back_to_auto(self):
-        assert coerce_modal_mode("invalid") == "auto"
-        assert coerce_modal_mode("cloud") == "auto"
-
-    def test_strips_whitespace(self):
-        assert coerce_modal_mode("  managed  ") == "managed"
-
-
-class TestNormalizeModalMode:
-    """normalize_modal_mode is an alias for coerce_modal_mode."""
-
-    def test_delegates_to_coerce(self):
-        assert normalize_modal_mode("direct") == coerce_modal_mode("direct")
-        assert normalize_modal_mode(None) == coerce_modal_mode(None)
-        assert normalize_modal_mode("bogus") == coerce_modal_mode("bogus")
-
-
-# ---------------------------------------------------------------------------
-# has_direct_modal_credentials
-# ---------------------------------------------------------------------------
-class TestHasDirectModalCredentials:
-    """Detect Modal credentials via env vars or config file."""
-
-    def test_no_env_no_file(self, monkeypatch, tmp_path):
-        monkeypatch.delenv("MODAL_TOKEN_ID", raising=False)
-        monkeypatch.delenv("MODAL_TOKEN_SECRET", raising=False)
-        with patch.object(Path, "home", return_value=tmp_path):
-            assert has_direct_modal_credentials() is False
-
-    def test_both_env_vars_set(self, monkeypatch, tmp_path):
-        monkeypatch.setenv("MODAL_TOKEN_ID", "id-123")
-        monkeypatch.setenv("MODAL_TOKEN_SECRET", "sec-456")
-        with patch.object(Path, "home", return_value=tmp_path):
-            assert has_direct_modal_credentials() is True
-
-    def test_only_token_id_not_enough(self, monkeypatch, tmp_path):
-        monkeypatch.setenv("MODAL_TOKEN_ID", "id-123")
-        monkeypatch.delenv("MODAL_TOKEN_SECRET", raising=False)
-        with patch.object(Path, "home", return_value=tmp_path):
-            assert has_direct_modal_credentials() is False
-
-    def test_only_token_secret_not_enough(self, monkeypatch, tmp_path):
-        monkeypatch.delenv("MODAL_TOKEN_ID", raising=False)
-        monkeypatch.setenv("MODAL_TOKEN_SECRET", "sec-456")
-        with patch.object(Path, "home", return_value=tmp_path):
-            assert has_direct_modal_credentials() is False
-
-    def test_config_file_present(self, monkeypatch, tmp_path):
-        monkeypatch.delenv("MODAL_TOKEN_ID", raising=False)
-        monkeypatch.delenv("MODAL_TOKEN_SECRET", raising=False)
-        (tmp_path / ".modal.toml").touch()
-        with patch.object(Path, "home", return_value=tmp_path):
-            assert has_direct_modal_credentials() is True
-
-    def test_env_vars_take_priority_over_file(self, monkeypatch, tmp_path):
-        monkeypatch.setenv("MODAL_TOKEN_ID", "id-123")
-        monkeypatch.setenv("MODAL_TOKEN_SECRET", "sec-456")
-        (tmp_path / ".modal.toml").touch()
-        with patch.object(Path, "home", return_value=tmp_path):
-            assert has_direct_modal_credentials() is True
-
-
-# ---------------------------------------------------------------------------
-# resolve_modal_backend_state
-# ---------------------------------------------------------------------------
-class TestResolveModalBackendState:
-    """Full matrix of direct vs managed Modal backend selection."""
-
-    @staticmethod
-    def _resolve(monkeypatch, mode, *, has_direct, managed_ready, nous_enabled=False):
-        """Helper to call resolve_modal_backend_state with feature flag control."""
-        if nous_enabled:
-            monkeypatch.setenv("HERMES_ENABLE_NOUS_MANAGED_TOOLS", "1")
-        else:
-            monkeypatch.setenv("HERMES_ENABLE_NOUS_MANAGED_TOOLS", "")
-        return resolve_modal_backend_state(
-            mode, has_direct=has_direct, managed_ready=managed_ready
-        )
-
-    # --- auto mode ---
-
-    def test_auto_prefers_managed_when_available(self, monkeypatch):
-        result = self._resolve(monkeypatch, "auto", has_direct=True, managed_ready=True, nous_enabled=True)
-        assert result["selected_backend"] == "managed"
-
-    def test_auto_falls_back_to_direct(self, monkeypatch):
-        result = self._resolve(monkeypatch, "auto", has_direct=True, managed_ready=False, nous_enabled=True)
-        assert result["selected_backend"] == "direct"
-
-    def test_auto_no_backends_available(self, monkeypatch):
-        result = self._resolve(monkeypatch, "auto", has_direct=False, managed_ready=False)
-        assert result["selected_backend"] is None
-
-    def test_auto_managed_ready_but_nous_disabled(self, monkeypatch):
-        result = self._resolve(monkeypatch, "auto", has_direct=True, managed_ready=True, nous_enabled=False)
-        assert result["selected_backend"] == "direct"
-
-    def test_auto_nothing_when_only_managed_and_nous_disabled(self, monkeypatch):
-        result = self._resolve(monkeypatch, "auto", has_direct=False, managed_ready=True, nous_enabled=False)
-        assert result["selected_backend"] is None
-
-    # --- direct mode ---
-
-    def test_direct_selects_direct_when_available(self, monkeypatch):
-        result = self._resolve(monkeypatch, "direct", has_direct=True, managed_ready=True, nous_enabled=True)
-        assert result["selected_backend"] == "direct"
-
-    def test_direct_none_when_no_credentials(self, monkeypatch):
-        result = self._resolve(monkeypatch, "direct", has_direct=False, managed_ready=True, nous_enabled=True)
-        assert result["selected_backend"] is None
-
-    # --- managed mode ---
-
-    def test_managed_selects_managed_when_ready_and_enabled(self, monkeypatch):
-        result = self._resolve(monkeypatch, "managed", has_direct=True, managed_ready=True, nous_enabled=True)
-        assert result["selected_backend"] == "managed"
-
-    def test_managed_none_when_not_ready(self, monkeypatch):
-        result = self._resolve(monkeypatch, "managed", has_direct=True, managed_ready=False, nous_enabled=True)
-        assert result["selected_backend"] is None
-
-    def test_managed_blocked_when_nous_disabled(self, monkeypatch):
-        result = self._resolve(monkeypatch, "managed", has_direct=True, managed_ready=True, nous_enabled=False)
-        assert result["selected_backend"] is None
-        assert result["managed_mode_blocked"] is True
-
-    # --- return structure ---
-
-    def test_return_dict_keys(self, monkeypatch):
-        result = self._resolve(monkeypatch, "auto", has_direct=True, managed_ready=False)
-        expected_keys = {
-            "requested_mode",
-            "mode",
-            "has_direct",
-            "managed_ready",
-            "managed_mode_blocked",
-            "selected_backend",
-        }
-        assert set(result.keys()) == expected_keys
-
-    def test_passthrough_flags(self, monkeypatch):
-        result = self._resolve(monkeypatch, "direct", has_direct=True, managed_ready=False)
-        assert result["requested_mode"] == "direct"
-        assert result["mode"] == "direct"
-        assert result["has_direct"] is True
-        assert result["managed_ready"] is False
-
-    # --- invalid mode falls back to auto ---
-
-    def test_invalid_mode_treated_as_auto(self, monkeypatch):
-        result = self._resolve(monkeypatch, "bogus", has_direct=True, managed_ready=False)
-        assert result["requested_mode"] == "auto"
-        assert result["mode"] == "auto"
-
-
-# ---------------------------------------------------------------------------
-# resolve_openai_audio_api_key
-# ---------------------------------------------------------------------------
-class TestResolveOpenaiAudioApiKey:
-    """Priority: VOICE_TOOLS_OPENAI_KEY > OPENAI_API_KEY."""
-
-    def test_voice_key_preferred(self, monkeypatch):
-        monkeypatch.setenv("VOICE_TOOLS_OPENAI_KEY", "voice-key")
-        monkeypatch.setenv("OPENAI_API_KEY", "general-key")
-        assert resolve_openai_audio_api_key() == "voice-key"
-
-    def test_falls_back_to_openai_key(self, monkeypatch):
-        monkeypatch.delenv("VOICE_TOOLS_OPENAI_KEY", raising=False)
-        monkeypatch.setenv("OPENAI_API_KEY", "general-key")
-        assert resolve_openai_audio_api_key() == "general-key"
-
-    def test_empty_voice_key_falls_back(self, monkeypatch):
-        monkeypatch.setenv("VOICE_TOOLS_OPENAI_KEY", "")
-        monkeypatch.setenv("OPENAI_API_KEY", "general-key")
-        assert resolve_openai_audio_api_key() == "general-key"
-
-    def test_no_keys_returns_empty(self, monkeypatch):
-        monkeypatch.delenv("VOICE_TOOLS_OPENAI_KEY", raising=False)
-        monkeypatch.delenv("OPENAI_API_KEY", raising=False)
-        assert resolve_openai_audio_api_key() == ""
-
-    def test_strips_whitespace(self, monkeypatch):
-        monkeypatch.setenv("VOICE_TOOLS_OPENAI_KEY", "  voice-key  ")
-        monkeypatch.delenv("OPENAI_API_KEY", raising=False)
-        assert resolve_openai_audio_api_key() == "voice-key"
@@ -1,245 +0,0 @@
-"""Tests for the Mistral (Voxtral) TTS provider in tools/tts_tool.py."""
-
-import base64
-from unittest.mock import MagicMock, patch
-
-import pytest
-
-
-@pytest.fixture(autouse=True)
-def clean_env(monkeypatch):
-    for key in ("MISTRAL_API_KEY", "HERMES_SESSION_PLATFORM"):
-        monkeypatch.delenv(key, raising=False)
-
-
-@pytest.fixture
-def mock_mistral_module():
-    mock_client = MagicMock()
-    mock_client.__enter__ = MagicMock(return_value=mock_client)
-    mock_client.__exit__ = MagicMock(return_value=False)
-    mock_mistral_cls = MagicMock(return_value=mock_client)
-    fake_module = MagicMock()
-    fake_module.Mistral = mock_mistral_cls
-    with patch.dict("sys.modules", {"mistralai": fake_module, "mistralai.client": fake_module}):
-        yield mock_client
-
-
-class TestGenerateMistralTts:
-    def test_missing_api_key_raises_value_error(self, tmp_path, mock_mistral_module):
-        from tools.tts_tool import _generate_mistral_tts
-
-        output_path = str(tmp_path / "test.mp3")
-        with pytest.raises(ValueError, match="MISTRAL_API_KEY"):
-            _generate_mistral_tts("Hello", output_path, {})
-
-    def test_successful_generation(self, tmp_path, mock_mistral_module, monkeypatch):
-        from tools.tts_tool import _generate_mistral_tts
-
-        monkeypatch.setenv("MISTRAL_API_KEY", "test-key")
-        audio_content = b"fake-audio-bytes"
-        mock_mistral_module.audio.speech.complete.return_value = MagicMock(
-            audio_data=base64.b64encode(audio_content).decode()
-        )
-
-        output_path = str(tmp_path / "test.mp3")
-        result = _generate_mistral_tts("Hello world", output_path, {})
-
-        assert result == output_path
-        assert (tmp_path / "test.mp3").read_bytes() == audio_content
-        mock_mistral_module.audio.speech.complete.assert_called_once()
-        mock_mistral_module.__exit__.assert_called_once()
-        call_kwargs = mock_mistral_module.audio.speech.complete.call_args[1]
-        assert call_kwargs["input"] == "Hello world"
-        assert call_kwargs["response_format"] == "mp3"
-
-    @pytest.mark.parametrize(
-        "extension, expected_format",
-        [(".ogg", "opus"), (".wav", "wav"), (".flac", "flac"), (".mp3", "mp3")],
-    )
-    def test_response_format_from_extension(
-        self, tmp_path, mock_mistral_module, monkeypatch, extension, expected_format
-    ):
-        from tools.tts_tool import _generate_mistral_tts
-
-        monkeypatch.setenv("MISTRAL_API_KEY", "test-key")
-        mock_mistral_module.audio.speech.complete.return_value = MagicMock(
-            audio_data=base64.b64encode(b"data").decode()
-        )
-
-        output_path = str(tmp_path / f"test{extension}")
-        _generate_mistral_tts("Hi", output_path, {})
-
-        call_kwargs = mock_mistral_module.audio.speech.complete.call_args[1]
-        assert call_kwargs["response_format"] == expected_format
-
-    def test_voice_id_passed_when_configured(
-        self, tmp_path, mock_mistral_module, monkeypatch
-    ):
-        from tools.tts_tool import _generate_mistral_tts
-
-        monkeypatch.setenv("MISTRAL_API_KEY", "test-key")
-        mock_mistral_module.audio.speech.complete.return_value = MagicMock(
-            audio_data=base64.b64encode(b"data").decode()
-        )
-
-        config = {"mistral": {"voice_id": "my-voice-uuid"}}
-        _generate_mistral_tts("Hi", str(tmp_path / "test.mp3"), config)
-
-        call_kwargs = mock_mistral_module.audio.speech.complete.call_args[1]
-        assert call_kwargs["voice_id"] == "my-voice-uuid"
-
-    def test_default_voice_id_when_absent(
-        self, tmp_path, mock_mistral_module, monkeypatch
-    ):
-        from tools.tts_tool import DEFAULT_MISTRAL_TTS_VOICE_ID, _generate_mistral_tts
-
-        monkeypatch.setenv("MISTRAL_API_KEY", "test-key")
-        mock_mistral_module.audio.speech.complete.return_value = MagicMock(
-            audio_data=base64.b64encode(b"data").decode()
-        )
-
-        _generate_mistral_tts("Hi", str(tmp_path / "test.mp3"), {})
-
-        call_kwargs = mock_mistral_module.audio.speech.complete.call_args[1]
-        assert call_kwargs["voice_id"] == DEFAULT_MISTRAL_TTS_VOICE_ID
-
-    def test_default_voice_id_when_empty_string(
-        self, tmp_path, mock_mistral_module, monkeypatch
-    ):
-        from tools.tts_tool import DEFAULT_MISTRAL_TTS_VOICE_ID, _generate_mistral_tts
-
-        monkeypatch.setenv("MISTRAL_API_KEY", "test-key")
-        mock_mistral_module.audio.speech.complete.return_value = MagicMock(
-            audio_data=base64.b64encode(b"data").decode()
-        )
-
-        config = {"mistral": {"voice_id": ""}}
-        _generate_mistral_tts("Hi", str(tmp_path / "test.mp3"), config)
-
-        call_kwargs = mock_mistral_module.audio.speech.complete.call_args[1]
-        assert call_kwargs["voice_id"] == DEFAULT_MISTRAL_TTS_VOICE_ID
-
-    def test_api_error_sanitized(self, tmp_path, mock_mistral_module, monkeypatch):
-        from tools.tts_tool import _generate_mistral_tts
-
-        monkeypatch.setenv("MISTRAL_API_KEY", "test-key")
-        mock_mistral_module.audio.speech.complete.side_effect = RuntimeError(
-            "secret-key-in-error"
-        )
-
-        with pytest.raises(RuntimeError, match="RuntimeError") as exc_info:
-            _generate_mistral_tts("Hello", str(tmp_path / "test.mp3"), {})
-        assert "secret-key-in-error" not in str(exc_info.value)
-
-    def test_default_model_used(self, tmp_path, mock_mistral_module, monkeypatch):
-        from tools.tts_tool import DEFAULT_MISTRAL_TTS_MODEL, _generate_mistral_tts
-
-        monkeypatch.setenv("MISTRAL_API_KEY", "test-key")
-        mock_mistral_module.audio.speech.complete.return_value = MagicMock(
-            audio_data=base64.b64encode(b"data").decode()
-        )
-
-        _generate_mistral_tts("Hi", str(tmp_path / "test.mp3"), {})
-
-        call_kwargs = mock_mistral_module.audio.speech.complete.call_args[1]
-        assert call_kwargs["model"] == DEFAULT_MISTRAL_TTS_MODEL
-
-    def test_model_from_config_overrides_default(
-        self, tmp_path, mock_mistral_module, monkeypatch
-    ):
-        from tools.tts_tool import _generate_mistral_tts
-
-        monkeypatch.setenv("MISTRAL_API_KEY", "test-key")
-        mock_mistral_module.audio.speech.complete.return_value = MagicMock(
-            audio_data=base64.b64encode(b"data").decode()
-        )
-
-        config = {"mistral": {"model": "voxtral-large-tts-9999"}}
-        _generate_mistral_tts("Hi", str(tmp_path / "test.mp3"), config)
-
-        call_kwargs = mock_mistral_module.audio.speech.complete.call_args[1]
-        assert call_kwargs["model"] == "voxtral-large-tts-9999"
-
-
-class TestTtsDispatcherMistral:
-    def test_dispatcher_routes_to_mistral(
-        self, tmp_path, mock_mistral_module, monkeypatch
-    ):
-        import json
-
-        from tools.tts_tool import text_to_speech_tool
-
-        monkeypatch.setenv("MISTRAL_API_KEY", "test-key")
-        mock_mistral_module.audio.speech.complete.return_value = MagicMock(
-            audio_data=base64.b64encode(b"audio").decode()
-        )
-
-        output_path = str(tmp_path / "out.mp3")
-        with patch("tools.tts_tool._load_tts_config", return_value={"provider": "mistral"}):
-            result = json.loads(text_to_speech_tool("Hello", output_path=output_path))
-
-        assert result["success"] is True
-        assert result["provider"] == "mistral"
-        mock_mistral_module.audio.speech.complete.assert_called_once()
-
-    def test_dispatcher_returns_error_when_sdk_not_installed(self, tmp_path, monkeypatch):
-        import json
-
-        from tools.tts_tool import text_to_speech_tool
-
-        monkeypatch.setenv("MISTRAL_API_KEY", "test-key")
-        with patch(
-            "tools.tts_tool._import_mistral_client", side_effect=ImportError("no module")
-        ), patch("tools.tts_tool._load_tts_config", return_value={"provider": "mistral"}):
-            result = json.loads(
-                text_to_speech_tool("Hello", output_path=str(tmp_path / "out.mp3"))
-            )
-
-        assert result["success"] is False
-        assert "mistralai" in result["error"]
-
-
-class TestCheckTtsRequirementsMistral:
-    def test_mistral_sdk_and_key_returns_true(self, mock_mistral_module, monkeypatch):
-        from tools.tts_tool import check_tts_requirements
-
-        monkeypatch.setenv("MISTRAL_API_KEY", "test-key")
-        with patch("tools.tts_tool._import_edge_tts", side_effect=ImportError), \
-             patch("tools.tts_tool._import_elevenlabs", side_effect=ImportError), \
-             patch("tools.tts_tool._import_openai_client", side_effect=ImportError), \
-             patch("tools.tts_tool._check_neutts_available", return_value=False):
-            assert check_tts_requirements() is True
-
-    def test_mistral_key_missing_returns_false(self, mock_mistral_module):
-        from tools.tts_tool import check_tts_requirements
-
-        with patch("tools.tts_tool._import_edge_tts", side_effect=ImportError), \
-             patch("tools.tts_tool._import_elevenlabs", side_effect=ImportError), \
-             patch("tools.tts_tool._import_openai_client", side_effect=ImportError), \
-             patch("tools.tts_tool._check_neutts_available", return_value=False):
-            assert check_tts_requirements() is False
-
-
-class TestMistralTtsOpus:
-    def test_telegram_produces_ogg_and_voice_compatible(
-        self, tmp_path, mock_mistral_module, monkeypatch
-    ):
-        import json
-
-        from tools.tts_tool import text_to_speech_tool
-
-        monkeypatch.setenv("MISTRAL_API_KEY", "test-key")
-        monkeypatch.setenv("HERMES_SESSION_PLATFORM", "telegram")
-        mock_mistral_module.audio.speech.complete.return_value = MagicMock(
-            audio_data=base64.b64encode(b"opus-audio").decode()
-        )
-
-        with patch("tools.tts_tool._load_tts_config", return_value={"provider": "mistral"}):
-            result = json.loads(text_to_speech_tool("Hello"))
-
-        assert result["success"] is True
-        assert result["file_path"].endswith(".ogg")
-        assert result["voice_compatible"] is True
-        assert "[[audio_as_voice]]" in result["media_tag"]
-        call_kwargs = mock_mistral_module.audio.speech.complete.call_args[1]
-        assert call_kwargs["response_format"] == "opus"
@@ -414,7 +414,6 @@ class TestVisionSafetyGuards:

        class FakeResponse:
            url = "https://blocked.test/final.png"
-            headers = {"content-length": "24"}
            content = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16

            def raise_for_status(self):
@@ -534,133 +533,6 @@ class TestTildeExpansion:
        assert data["success"] is False


-# ---------------------------------------------------------------------------
-# file:// URI support
-# ---------------------------------------------------------------------------
-
-
-class TestFileUriSupport:
-    """Verify that file:// URIs resolve as local file paths."""
-
-    @pytest.mark.asyncio
-    async def test_file_uri_resolved_as_local_path(self, tmp_path):
-        """file:///absolute/path should be treated as a local file."""
-        img = tmp_path / "photo.png"
-        img.write_bytes(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8)
-
-        mock_response = MagicMock()
-        mock_choice = MagicMock()
-        mock_choice.message.content = "A test image"
-        mock_response.choices = [mock_choice]
-
-        with (
-            patch(
-                "tools.vision_tools._image_to_base64_data_url",
-                return_value="data:image/png;base64,abc",
-            ),
-            patch(
-                "tools.vision_tools.async_call_llm",
-                new_callable=AsyncMock,
-                return_value=mock_response,
-            ),
-        ):
-            result = await vision_analyze_tool(
-                f"file://{img}", "describe this", "test/model"
-            )
-            data = json.loads(result)
-            assert data["success"] is True
-
-    @pytest.mark.asyncio
-    async def test_file_uri_nonexistent_gives_error(self, tmp_path):
-        """file:// pointing to a missing file should fail gracefully."""
-        result = await vision_analyze_tool(
-            f"file://{tmp_path}/nonexistent.png", "describe this", "test/model"
-        )
-        data = json.loads(result)
-        assert data["success"] is False
-
-
-# ---------------------------------------------------------------------------
-# Base64 size pre-flight check
-# ---------------------------------------------------------------------------
-
-
-class TestBase64SizeLimit:
-    """Verify that oversized images are rejected before hitting the API."""
-
-    @pytest.mark.asyncio
-    async def test_oversized_image_rejected_before_api_call(self, tmp_path):
-        """Images exceeding 5 MB base64 should fail with a clear size error."""
-        img = tmp_path / "huge.png"
-        img.write_bytes(b"\x89PNG\r\n\x1a\n" + b"\x00" * (4 * 1024 * 1024))
-
-        with patch("tools.vision_tools.async_call_llm", new_callable=AsyncMock) as mock_llm:
-            result = json.loads(await vision_analyze_tool(str(img), "describe this"))
-
-        assert result["success"] is False
-        assert "too large" in result["error"].lower()
-        mock_llm.assert_not_awaited()
-
-    @pytest.mark.asyncio
-    async def test_small_image_not_rejected(self, tmp_path):
-        """Images well under the limit should pass the size check."""
-        img = tmp_path / "small.png"
-        img.write_bytes(b"\x89PNG\r\n\x1a\n" + b"\x00" * 64)
-
-        mock_response = MagicMock()
-        mock_choice = MagicMock()
-        mock_choice.message.content = "Small image"
-        mock_response.choices = [mock_choice]
-
-        with (
-            patch(
-                "tools.vision_tools.async_call_llm",
-                new_callable=AsyncMock,
-                return_value=mock_response,
-            ),
-        ):
-            result = json.loads(await vision_analyze_tool(str(img), "describe this", "test/model"))
-
-        assert result["success"] is True
-
-
-# ---------------------------------------------------------------------------
-# Error classification for 400 responses
-# ---------------------------------------------------------------------------
-
-
-class TestErrorClassification:
-    """Verify that API 400 errors produce actionable guidance."""
-
-    @pytest.mark.asyncio
-    async def test_invalid_request_error_gives_image_guidance(self, tmp_path):
-        """An invalid_request_error from the API should mention image size/format."""
-        img = tmp_path / "test.png"
-        img.write_bytes(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8)
-
-        api_error = Exception(
-            "Error code: 400 - {'type': 'error', 'error': "
-            "{'type': 'invalid_request_error', 'message': 'Invalid request data'}}"
-        )
-
-        with (
-            patch(
-                "tools.vision_tools._image_to_base64_data_url",
-                return_value="data:image/png;base64,abc",
-            ),
-            patch(
-                "tools.vision_tools.async_call_llm",
-                new_callable=AsyncMock,
-                side_effect=api_error,
-            ),
-        ):
-            result = json.loads(await vision_analyze_tool(str(img), "describe", "test/model"))
-
-        assert result["success"] is False
-        assert "rejected the image" in result["analysis"].lower()
-        assert "smaller" in result["analysis"].lower()
-
-
 class TestVisionRegistration:
    def test_vision_analyze_registered(self):
        from tools.registry import registry
@@ -312,25 +312,6 @@ def _build_child_agent(
    effective_acp_command = override_acp_command or getattr(parent_agent, "acp_command", None)
    effective_acp_args = list(override_acp_args if override_acp_args is not None else (getattr(parent_agent, "acp_args", []) or []))

-    # Resolve reasoning config: delegation override > parent inherit
-    parent_reasoning = getattr(parent_agent, "reasoning_config", None)
-    child_reasoning = parent_reasoning
-    try:
-        delegation_cfg = _load_config()
-        delegation_effort = str(delegation_cfg.get("reasoning_effort") or "").strip()
-        if delegation_effort:
-            from hermes_constants import parse_reasoning_effort
-            parsed = parse_reasoning_effort(delegation_effort)
-            if parsed is not None:
-                child_reasoning = parsed
-            else:
-                logger.warning(
-                    "Unknown delegation.reasoning_effort '%s', inheriting parent level",
-                    delegation_effort,
-                )
-    except Exception as exc:
-        logger.debug("Could not load delegation reasoning_effort: %s", exc)
-
    child = AIAgent(
        base_url=effective_base_url,
        api_key=effective_api_key,
@@ -341,7 +322,7 @@ def _build_child_agent(
        acp_args=effective_acp_args,
        max_iterations=max_iterations,
        max_tokens=getattr(parent_agent, "max_tokens", None),
-        reasoning_config=child_reasoning,
+        reasoning_config=getattr(parent_agent, "reasoning_config", None),
        prefill_messages=getattr(parent_agent, "prefill_messages", None),
        enabled_toolsets=child_toolsets,
        quiet_mode=True,
@@ -9,6 +9,7 @@ import logging
 import math
 import shlex
 import threading
+import warnings
 from pathlib import Path

 from tools.environments.base import (
@@ -62,9 +63,10 @@ class DaytonaEnvironment(BaseEnvironment):
        memory_gib = max(1, math.ceil(memory / 1024))
        disk_gib = max(1, math.ceil(disk / 1024))
        if disk_gib > 10:
-            logger.warning(
-                "Daytona: requested disk (%dGB) exceeds platform limit (10GB). "
-                "Capping to 10GB.", disk_gib,
+            warnings.warn(
+                f"Daytona: requested disk ({disk_gib}GB) exceeds platform limit (10GB). "
+                "Capping to 10GB.",
+                stacklevel=2,
            )
            disk_gib = 10
        resources = Resources(cpu=cpu, memory=memory_gib, disk=disk_gib)
@@ -127,7 +129,6 @@ class DaytonaEnvironment(BaseEnvironment):
            get_files_fn=lambda: iter_sync_files(f"{self._remote_home}/.hermes"),
            upload_fn=self._daytona_upload,
            delete_fn=self._daytona_delete,
-            bulk_upload_fn=self._daytona_bulk_upload,
        )
        self._sync_manager.sync(force=True)
        self.init_session()
@@ -138,30 +139,6 @@ class DaytonaEnvironment(BaseEnvironment):
        self._sandbox.process.exec(f"mkdir -p {parent}")
        self._sandbox.fs.upload_file(host_path, remote_path)

-    def _daytona_bulk_upload(self, files: list[tuple[str, str]]) -> None:
-        """Upload many files in a single HTTP call via Daytona SDK.
-
-        Uses ``sandbox.fs.upload_files()`` which batches all files into one
-        multipart POST, avoiding per-file TLS/HTTP overhead (~580 files
-        goes from ~5 min to <2 s).
-        """
-        from daytona.common.filesystem import FileUpload
-
-        if not files:
-            return
-
-        # Pre-create all unique parent directories in one shell call
-        parents = sorted({str(Path(remote).parent) for _, remote in files})
-        if parents:
-            mkdir_cmd = "mkdir -p " + " ".join(shlex.quote(p) for p in parents)
-            self._sandbox.process.exec(mkdir_cmd)
-
-        uploads = [
-            FileUpload(source=host_path, destination=remote_path)
-            for host_path, remote_path in files
-        ]
-        self._sandbox.fs.upload_files(uploads)
-
    def _daytona_delete(self, remote_paths: list[str]) -> None:
        """Batch-delete remote files via SDK exec."""
        self._sandbox.process.exec(quoted_rm_command(remote_paths))
@@ -21,7 +21,6 @@ _FORCE_SYNC_ENV = "HERMES_FORCE_FILE_SYNC"

 # Transport callbacks provided by each backend
 UploadFn = Callable[[str, str], None]  # (host_path, remote_path) -> raises on failure
-BulkUploadFn = Callable[[list[tuple[str, str]]], None]  # [(host_path, remote_path), ...] -> raises on failure
 DeleteFn = Callable[[list[str]], None]  # (remote_paths) -> raises on failure
 GetFilesFn = Callable[[], list[tuple[str, str]]]  # () -> [(host_path, remote_path), ...]

@@ -77,11 +76,9 @@ class FileSyncManager:
        upload_fn: UploadFn,
        delete_fn: DeleteFn,
        sync_interval: float = _SYNC_INTERVAL_SECONDS,
-        bulk_upload_fn: BulkUploadFn | None = None,
    ):
        self._get_files_fn = get_files_fn
        self._upload_fn = upload_fn
-        self._bulk_upload_fn = bulk_upload_fn
        self._delete_fn = delete_fn
        self._synced_files: dict[str, tuple[float, int]] = {}  # remote_path -> (mtime, size)
        self._last_sync_time: float = 0.0  # monotonic; 0 ensures first sync runs
@@ -132,13 +129,9 @@ class FileSyncManager:
            logger.debug("file_sync: deleting %d stale remote file(s)", len(to_delete))

        try:
-            if to_upload and self._bulk_upload_fn is not None:
-                self._bulk_upload_fn(to_upload)
-                logger.debug("file_sync: bulk-uploaded %d file(s)", len(to_upload))
-            else:
-                for host_path, remote_path in to_upload:
-                    self._upload_fn(host_path, remote_path)
-                    logger.debug("file_sync: uploaded %s -> %s", host_path, remote_path)
+            for host_path, remote_path in to_upload:
+                self._upload_fn(host_path, remote_path)
+                logger.debug("file_sync: uploaded %s -> %s", host_path, remote_path)

            if to_delete:
                self._delete_fn(to_delete)
@@ -386,7 +386,9 @@ class ShellFileOperations(FileOperations):
        
        # Content analysis: >30% non-printable chars = binary
        if content_sample:
-            non_printable = sum(1 for c in content_sample[:1000]
+            if not content_sample:
+                return False
+            non_printable = sum(1 for c in content_sample[:1000] 
                               if ord(c) < 32 and c not in '\n\r\t')
            return non_printable / min(len(content_sample), 1000) > 0.30
        
@@ -808,7 +810,7 @@ class ShellFileOperations(FileOperations):
            return LintResult(skipped=True, message=f"{base_cmd} not available")
        
        # Run linter
-        cmd = linter_cmd.replace("{file}", self._escape_shell_arg(path))
+        cmd = linter_cmd.format(file=self._escape_shell_arg(path))
        result = self._exec(cmd, timeout=30)
        
        return LintResult(
@@ -396,15 +396,15 @@ class ProcessRegistry:
                        session.output_buffer = session.output_buffer[-session.max_output_chars:]
        except Exception as e:
            logger.debug("Process stdout reader ended: %s", e)
-        finally:
-            # Always reap the child to prevent zombie processes.
-            try:
-                session.process.wait(timeout=5)
-            except Exception as e:
-                logger.debug("Process wait timed out or failed: %s", e)
-            session.exited = True
-            session.exit_code = session.process.returncode
-            self._move_to_finished(session)
+
+        # Process exited
+        try:
+            session.process.wait(timeout=5)
+        except Exception as e:
+            logger.debug("Process wait timed out or failed: %s", e)
+        session.exited = True
+        session.exit_code = session.process.returncode
+        self._move_to_finished(session)

    def _env_poller_loop(
        self, session: ProcessSession, env: Any, log_path: str, pid_path: str, exit_path: str
@@ -109,27 +109,6 @@ def _write_manifest(entries: Dict[str, str]):
        logger.debug("Failed to write skills manifest %s: %s", MANIFEST_FILE, e, exc_info=True)


-def _read_skill_name(skill_md: Path, fallback: str) -> str:
-    """Read the name field from SKILL.md YAML frontmatter, falling back to *fallback*."""
-    try:
-        content = skill_md.read_text(encoding="utf-8", errors="replace")[:4000]
-    except OSError:
-        return fallback
-    in_frontmatter = False
-    for line in content.split("\n"):
-        stripped = line.strip()
-        if stripped == "---":
-            if in_frontmatter:
-                break
-            in_frontmatter = True
-            continue
-        if in_frontmatter and stripped.startswith("name:"):
-            value = stripped.split(":", 1)[1].strip().strip("\"'")
-            if value:
-                return value
-    return fallback
-
-
 def _discover_bundled_skills(bundled_dir: Path) -> List[Tuple[str, Path]]:
    """
    Find all SKILL.md files in the bundled directory.
@@ -144,7 +123,7 @@ def _discover_bundled_skills(bundled_dir: Path) -> List[Tuple[str, Path]]:
        if "/.git/" in path_str or "/.github/" in path_str or "/.hub/" in path_str:
            continue
        skill_dir = skill_md.parent
-        skill_name = _read_skill_name(skill_md, skill_dir.name)
+        skill_name = skill_dir.name
        skills.append((skill_name, skill_dir))

    return skills
@@ -2,12 +2,11 @@
 """
 Text-to-Speech Tool Module

-Supports six TTS providers:
+Supports five TTS providers:
 - Edge TTS (default, free, no API key): Microsoft Edge neural voices
 - ElevenLabs (premium): High-quality voices, needs ELEVENLABS_API_KEY
 - OpenAI TTS: Good quality, needs OPENAI_API_KEY
 - MiniMax TTS: High-quality with voice cloning, needs MINIMAX_API_KEY
- Mistral (Voxtral TTS): Multilingual, native Opus, needs MISTRAL_API_KEY
 - NeuTTS (local, free, no API key): On-device TTS via neutts_cli, needs neutts installed

 Output formats:
@@ -24,7 +23,6 @@ Usage:
 """

 import asyncio
-import base64
 import datetime
 import json
 import logging
@@ -64,11 +62,6 @@ def _import_openai_client():
    from openai import OpenAI as OpenAIClient
    return OpenAIClient

-def _import_mistral_client():
-    """Lazy import Mistral client. Returns the class or raises ImportError."""
-    from mistralai.client import Mistral
-    return Mistral
-
 def _import_sounddevice():
    """Lazy import sounddevice. Returns the module or raises ImportError/OSError."""
    import sounddevice as sd
@@ -89,8 +82,6 @@ DEFAULT_OPENAI_BASE_URL = "https://api.openai.com/v1"
 DEFAULT_MINIMAX_MODEL = "speech-2.8-hd"
 DEFAULT_MINIMAX_VOICE_ID = "English_Graceful_Lady"
 DEFAULT_MINIMAX_BASE_URL = "https://api.minimax.io/v1/t2a_v2"
-DEFAULT_MISTRAL_TTS_MODEL = "voxtral-mini-tts-2603"
-DEFAULT_MISTRAL_TTS_VOICE_ID = "c69964a6-ab8b-4f8a-9465-ec0925096ec8"  # Paul - Neutral

 def _get_default_output_dir() -> str:
    from hermes_constants import get_hermes_dir
@@ -374,55 +365,6 @@ def _generate_minimax_tts(text: str, output_path: str, tts_config: Dict[str, Any
    return output_path


-# ===========================================================================
-# Provider: Mistral (Voxtral TTS)
-# ===========================================================================
-def _generate_mistral_tts(text: str, output_path: str, tts_config: Dict[str, Any]) -> str:
-    """Generate audio using Mistral Voxtral TTS API.
-
-    The API returns base64-encoded audio; this function decodes it
-    and writes the raw bytes to *output_path*.
-    Supports native Opus output for Telegram voice bubbles.
-    """
-    api_key = os.getenv("MISTRAL_API_KEY", "")
-    if not api_key:
-        raise ValueError("MISTRAL_API_KEY not set. Get one at https://console.mistral.ai/")
-
-    mi_config = tts_config.get("mistral", {})
-    model = mi_config.get("model", DEFAULT_MISTRAL_TTS_MODEL)
-    voice_id = mi_config.get("voice_id") or DEFAULT_MISTRAL_TTS_VOICE_ID
-
-    if output_path.endswith(".ogg"):
-        response_format = "opus"
-    elif output_path.endswith(".wav"):
-        response_format = "wav"
-    elif output_path.endswith(".flac"):
-        response_format = "flac"
-    else:
-        response_format = "mp3"
-
-    Mistral = _import_mistral_client()
-    try:
-        with Mistral(api_key=api_key) as client:
-            response = client.audio.speech.complete(
-                model=model,
-                input=text,
-                voice_id=voice_id,
-                response_format=response_format,
-            )
-            audio_bytes = base64.b64decode(response.audio_data)
-    except ValueError:
-        raise
-    except Exception as e:
-        logger.error("Mistral TTS failed: %s", e, exc_info=True)
-        raise RuntimeError(f"Mistral TTS failed: {type(e).__name__}") from e
-
-    with open(output_path, "wb") as f:
-        f.write(audio_bytes)
-
-    return output_path
-
-
 # ===========================================================================
 # NeuTTS (local, on-device TTS via neutts_cli)
 # ===========================================================================
@@ -551,7 +493,7 @@ def text_to_speech_tool(
        out_dir.mkdir(parents=True, exist_ok=True)
        # Use .ogg for Telegram with providers that support native Opus output,
        # otherwise fall back to .mp3 (Edge TTS will attempt ffmpeg conversion later).
-        if want_opus and provider in ("openai", "elevenlabs", "mistral"):
+        if want_opus and provider in ("openai", "elevenlabs"):
            file_path = out_dir / f"tts_{timestamp}.ogg"
        else:
            file_path = out_dir / f"tts_{timestamp}.mp3"
@@ -588,18 +530,6 @@ def text_to_speech_tool(
            logger.info("Generating speech with MiniMax TTS...")
            _generate_minimax_tts(text, file_str, tts_config)

-        elif provider == "mistral":
-            try:
-                _import_mistral_client()
-            except ImportError:
-                return json.dumps({
-                    "success": False,
-                    "error": "Mistral provider selected but 'mistralai' package not installed. "
-                             "Run: pip install 'hermes-agent[mistral]'"
-                }, ensure_ascii=False)
-            logger.info("Generating speech with Mistral Voxtral TTS...")
-            _generate_mistral_tts(text, file_str, tts_config)
-
        elif provider == "neutts":
            if not _check_neutts_available():
                return json.dumps({
@@ -654,7 +584,8 @@ def text_to_speech_tool(
            if opus_path:
                file_str = opus_path
                voice_compatible = True
-        elif provider in ("elevenlabs", "openai", "mistral"):
+        elif provider in ("elevenlabs", "openai"):
+            # These providers can output Opus natively if the path ends in .ogg
            voice_compatible = file_str.endswith(".ogg")

        file_size = os.path.getsize(file_str)
@@ -722,12 +653,6 @@ def check_tts_requirements() -> bool:
        pass
    if os.getenv("MINIMAX_API_KEY"):
        return True
-    try:
-        _import_mistral_client()
-        if os.getenv("MISTRAL_API_KEY"):
-            return True
-    except ImportError:
-        pass
    if _check_neutts_available():
        return True
    return False
@@ -67,10 +67,6 @@ def _resolve_download_timeout() -> float:

 _VISION_DOWNLOAD_TIMEOUT = _resolve_download_timeout()

-# Hard cap on downloaded image file size (50 MB). Prevents OOM from
-# attacker-hosted multi-gigabyte files or decompression bombs.
-_VISION_MAX_DOWNLOAD_BYTES = 50 * 1024 * 1024
-

 def _validate_image_url(url: str) -> bool:
    """
@@ -185,25 +181,13 @@ async def _download_image(image_url: str, destination: Path, max_retries: int =
                )
                response.raise_for_status()

-                # Reject overly large images early via Content-Length header.
-                cl = response.headers.get("content-length")
-                if cl and int(cl) > _VISION_MAX_DOWNLOAD_BYTES:
-                    raise ValueError(
-                        f"Image too large ({int(cl)} bytes, max {_VISION_MAX_DOWNLOAD_BYTES})"
-                    )
-
                final_url = str(response.url)
                blocked = check_website_access(final_url)
                if blocked:
                    raise PermissionError(blocked["message"])
                
-                # Save the image content (double-check actual size)
-                body = response.content
-                if len(body) > _VISION_MAX_DOWNLOAD_BYTES:
-                    raise ValueError(
-                        f"Image too large ({len(body)} bytes, max {_VISION_MAX_DOWNLOAD_BYTES})"
-                    )
-                destination.write_bytes(body)
+                # Save the image content
+                destination.write_bytes(response.content)
            
            return destination
        except Exception as e:
@@ -342,11 +326,7 @@ async def vision_analyze_tool(
        logger.info("User prompt: %s", user_prompt[:100])
        
        # Determine if this is a local file path or a remote URL
-        # Strip file:// scheme so file URIs resolve as local paths.
-        resolved_url = image_url
-        if resolved_url.startswith("file://"):
-            resolved_url = resolved_url[len("file://"):]
-        local_path = Path(os.path.expanduser(resolved_url))
+        local_path = Path(os.path.expanduser(image_url))
        if local_path.is_file():
            # Local file path (e.g. from platform image cache) -- skip download
            logger.info("Using local image file: %s", image_url)
@@ -382,19 +362,7 @@ async def vision_analyze_tool(
        # Calculate size in KB for better readability
        data_size_kb = len(image_data_url) / 1024
        logger.info("Image converted to base64 (%.1f KB)", data_size_kb)
-
-        # Pre-flight size check: most vision APIs cap base64 payloads at 5 MB.
-        # Reject early with a clear message instead of a cryptic provider 400.
-        _MAX_BASE64_BYTES = 5 * 1024 * 1024  # 5 MB
-        # The data URL includes the header (e.g. "data:image/jpeg;base64,") which
-        # is negligible, but measure the full string to be safe.
-        if len(image_data_url) > _MAX_BASE64_BYTES:
-            raise ValueError(
-                f"Image too large for vision API: base64 payload is "
-                f"{len(image_data_url) / (1024 * 1024):.1f} MB (limit 5 MB). "
-                f"Resize or compress the image and try again."
-            )
-
+        
        debug_call_data["image_size_bytes"] = image_size_bytes
        
        # Use the prompt as provided (model_tools.py now handles full description formatting)
@@ -487,21 +455,14 @@ async def vision_analyze_tool(
                f"API provider account and try again. Error: {e}"
            )
        elif any(hint in err_str for hint in (
-            "does not support", "not support image",
-            "content_policy", "multimodal",
+            "does not support", "not support image", "invalid_request",
+            "content_policy", "image_url", "multimodal",
            "unrecognized request argument", "image input",
        )):
            analysis = (
                f"{model} does not support vision or our request was not "
                f"accepted by the server. Error: {e}"
            )
-        elif "invalid_request" in err_str or "image_url" in err_str:
-            analysis = (
-                "The vision API rejected the image. This can happen when the "
-                "image is too large, in an unsupported format, or corrupted. "
-                "Try a smaller JPEG/PNG (under 3.5 MB) and retry. "
-                f"Error: {e}"
-            )
        else:
            analysis = (
                "There was a problem with the request and the image could not "
@@ -375,9 +375,8 @@ class TrajectoryCompressor:
                    f"Missing API key. Set {self.config.api_key_env} "
                    f"environment variable.")
            from openai import OpenAI
-            from agent.auxiliary_client import _to_openai_base_url
            self.client = OpenAI(
-                api_key=api_key, base_url=_to_openai_base_url(self.config.base_url))
+                api_key=api_key, base_url=self.config.base_url)
            # AsyncOpenAI is created lazily in _get_async_client() so it
            # binds to the current event loop — avoids "Event loop is closed"
            # when process_directory() is called multiple times (each call
@@ -396,11 +395,10 @@ class TrajectoryCompressor:
        avoiding "Event loop is closed" errors on repeated calls.
        """
        from openai import AsyncOpenAI
-        from agent.auxiliary_client import _to_openai_base_url
        # Always create a fresh client so it binds to the running loop.
        self.async_client = AsyncOpenAI(
            api_key=self._async_client_api_key,
-            base_url=_to_openai_base_url(self.config.base_url),
+            base_url=self.config.base_url,
        )
        return self.async_client

@@ -152,6 +152,19 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/1a/99/84ba7273339d0f3dfa57901b846489d2e5c2cd731470167757f1935fffbd/aiohttp_retry-2.9.1-py3-none-any.whl", hash = "sha256:66d2759d1921838256a05a3f80ad7e724936f083e35be5abb5e16eed6be6dc54", size = 9981, upload-time = "2024-11-06T10:44:52.917Z" },
 ]

+[[package]]
+name = "aiohttp-socks"
+version = "0.11.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "aiohttp" },
+    { name = "python-socks" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/1f/cc/e5bbd54f76bd56291522251e47267b645dac76327b2657ade9545e30522c/aiohttp_socks-0.11.0.tar.gz", hash = "sha256:0afe51638527c79077e4bd6e57052c87c4824233d6e20bb061c53766421b10f0", size = 11196, upload-time = "2025-12-09T13:35:52.564Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/bf/7d/4b633d709b8901d59444d2e512b93e72fe62d2b492a040097c3f7ba017bb/aiohttp_socks-0.11.0-py3-none-any.whl", hash = "sha256:9aacce57c931b8fbf8f6d333cf3cafe4c35b971b35430309e167a35a8aab9ec1", size = 10556, upload-time = "2025-12-09T13:35:50.18Z" },
+]
+
 [[package]]
 name = "aiosignal"
 version = "1.4.0"
@@ -240,6 +253,12 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/38/0e/27be9fdef66e72d64c0cdc3cc2823101b80585f8119b5c112c2e8f5f7dab/anyio-4.12.1-py3-none-any.whl", hash = "sha256:d405828884fc140aa80a3c667b8beed277f1dfedec42ba031bd6ac3db606ab6c", size = 113592, upload-time = "2026-01-06T11:45:19.497Z" },
 ]

+[[package]]
+name = "atomicwrites"
+version = "1.4.1"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/87/c6/53da25344e3e3a9c01095a89f16dbcda021c609ddb42dd6d7c0528236fb2/atomicwrites-1.4.1.tar.gz", hash = "sha256:81b2c9071a49367a7f770170e5eec8cb66567cfbbc8c73d20ce5ca4a8d71cf11", size = 14227, upload-time = "2022-07-08T18:31:40.459Z" }
+
 [[package]]
 name = "atroposlib"
 version = "0.4.0"
@@ -357,15 +376,6 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/41/0a/0896b829a39b5669a2d811e1a79598de661693685cd62b31f11d0c18e65b/av-17.0.0-cp314-cp314t-win_arm64.whl", hash = "sha256:dba98603fc4665b4f750de86fbaf6c0cfaece970671a9b529e0e3d1711e8367e", size = 22071058, upload-time = "2026-03-14T14:38:43.663Z" },
 ]

-[[package]]
-name = "base58"
-version = "2.1.1"
-source = { registry = "https://pypi.org/simple" }
-sdist = { url = "https://files.pythonhosted.org/packages/7f/45/8ae61209bb9015f516102fa559a2914178da1d5868428bd86a1b4421141d/base58-2.1.1.tar.gz", hash = "sha256:c5d0cb3f5b6e81e8e35da5754388ddcc6d0d14b6c6a132cb93d69ed580a7278c", size = 6528, upload-time = "2021-10-30T22:12:17.858Z" }
-wheels = [
-    { url = "https://files.pythonhosted.org/packages/4a/45/ec96b29162a402fc4c1c5512d114d7b3787b9d1c2ec241d9568b4816ee23/base58-2.1.1-py3-none-any.whl", hash = "sha256:11a36f4d3ce51dfc1043f3218591ac4eb1ceb172919cebe05b52a5bcc8d245c2", size = 5621, upload-time = "2021-10-30T22:12:16.658Z" },
-]
-
 [[package]]
 name = "blinker"
 version = "1.9.0"
@@ -1682,7 +1692,7 @@ all = [
    { name = "honcho-ai" },
    { name = "lark-oapi" },
    { name = "markdown", marker = "sys_platform == 'linux'" },
-    { name = "mautrix", extra = ["encryption"], marker = "sys_platform == 'linux'" },
+    { name = "matrix-nio", extra = ["e2e"], marker = "sys_platform == 'linux'" },
    { name = "mcp" },
    { name = "mistralai" },
    { name = "modal" },
@@ -1728,7 +1738,7 @@ honcho = [
 ]
 matrix = [
    { name = "markdown" },
-    { name = "mautrix", extra = ["encryption"] },
+    { name = "matrix-nio", extra = ["e2e"] },
 ]
 mcp = [
    { name = "mcp" },
@@ -1836,7 +1846,7 @@ requires-dist = [
    { name = "jinja2", specifier = ">=3.1.5,<4" },
    { name = "lark-oapi", marker = "extra == 'feishu'", specifier = ">=1.5.3,<2" },
    { name = "markdown", marker = "extra == 'matrix'", specifier = ">=3.6,<4" },
-    { name = "mautrix", extras = ["encryption"], marker = "extra == 'matrix'", specifier = ">=0.20,<1" },
+    { name = "matrix-nio", extras = ["e2e"], marker = "extra == 'matrix'", specifier = ">=0.24.0,<1" },
    { name = "mcp", marker = "extra == 'dev'", specifier = ">=1.2.0,<2" },
    { name = "mcp", marker = "extra == 'mcp'", specifier = ">=1.2.0,<2" },
    { name = "mistralai", marker = "extra == 'mistral'", specifier = ">=2.3.0,<3" },
@@ -2591,25 +2601,30 @@ wheels = [
 ]

 [[package]]
-name = "mautrix"
-version = "0.21.0"
+name = "matrix-nio"
+version = "0.25.2"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
+    { name = "aiofiles" },
    { name = "aiohttp" },
-    { name = "attrs" },
-    { name = "yarl" },
+    { name = "aiohttp-socks" },
+    { name = "h11" },
+    { name = "h2" },
+    { name = "jsonschema" },
+    { name = "pycryptodome" },
+    { name = "unpaddedbase64" },
 ]
-sdist = { url = "https://files.pythonhosted.org/packages/74/a7/8d6d0589e211ecf3a72ce4b28cc32c857c4043d1a6963d63ac9f726af653/mautrix-0.21.0.tar.gz", hash = "sha256:a14e0582e114cb241f282f9e717014608f36c03f1dc59afcd71b4e81780ffe2e", size = 254726, upload-time = "2025-11-17T13:53:09.996Z" }
+sdist = { url = "https://files.pythonhosted.org/packages/33/50/c20129fd6f0e1aad3510feefd3229427fc8163a111f3911ed834e414116b/matrix_nio-0.25.2.tar.gz", hash = "sha256:8ef8180c374e12368e5c83a692abfb3bab8d71efcd17c5560b5c40c9b6f2f600", size = 155480, upload-time = "2024-10-04T07:51:41.62Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/8c/d6/d4b3ae380dacdc9fb07bc3eb7dd17f43b8a7ce391465a184d1094acb66c1/mautrix-0.21.0-py3-none-any.whl", hash = "sha256:1cba30d69f46351918a3b8bc4e5657465cac8470d42ddd2287a742653cab7194", size = 334131, upload-time = "2025-11-17T13:53:08.117Z" },
+    { url = "https://files.pythonhosted.org/packages/7b/0f/8b958d46e23ed4f69d2cffd63b46bb097a1155524e2e7f5c4279c8691c4a/matrix_nio-0.25.2-py3-none-any.whl", hash = "sha256:9c2880004b0e475db874456c0f79b7dd2b6285073a7663bcaca29e0754a67495", size = 181982, upload-time = "2024-10-04T07:51:39.451Z" },
 ]

 [package.optional-dependencies]
-encryption = [
-    { name = "base58" },
-    { name = "pycryptodome" },
+e2e = [
+    { name = "atomicwrites" },
+    { name = "cachetools" },
+    { name = "peewee" },
    { name = "python-olm" },
-    { name = "unpaddedbase64" },
 ]

 [[package]]
@@ -3322,6 +3337,15 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/a0/3e/2218fa29637781b8e7ac35a928108ff2614ddd40879389d3af2caa725af5/parallel_web-0.4.2-py3-none-any.whl", hash = "sha256:aa3a4a9aecc08972c5ce9303271d4917903373dff4dd277d9a3e30f9cff53346", size = 144012, upload-time = "2026-03-09T22:24:33.979Z" },
 ]

+[[package]]
+name = "peewee"
+version = "3.19.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/88/b0/79462b42e89764998756e0557f2b58a15610a5b4512fbbcccae58fba7237/peewee-3.19.0.tar.gz", hash = "sha256:f88292a6f0d7b906cb26bca9c8599b8f4d8920ebd36124400d0cbaaaf915511f", size = 974035, upload-time = "2026-01-07T17:24:59.597Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/1a/41/19c65578ef9a54b3083253c68a607f099642747168fe00f3a2bceb7c3a34/peewee-3.19.0-py3-none-any.whl", hash = "sha256:de220b94766e6008c466e00ce4ba5299b9a832117d9eb36d45d0062f3cfd7417", size = 411885, upload-time = "2026-01-07T17:24:58.33Z" },
+]
+
 [[package]]
 name = "pillow"
 version = "12.1.1"
@@ -3984,6 +4008,15 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/79/93/f6729f10149305262194774d6c8b438c0b084740cf239f48ab97b4df02fa/python_olm-3.2.16-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:10a5e68a2f4b5a2bfa5fdb5dbfa22396a551730df6c4a572235acaa96e997d3f", size = 297000, upload-time = "2023-11-28T19:25:31.045Z" },
 ]

+[[package]]
+name = "python-socks"
+version = "2.8.1"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/36/0b/cd77011c1bc01b76404f7aba07fca18aca02a19c7626e329b40201217624/python_socks-2.8.1.tar.gz", hash = "sha256:698daa9616d46dddaffe65b87db222f2902177a2d2b2c0b9a9361df607ab3687", size = 38909, upload-time = "2026-02-16T05:24:00.745Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/15/fe/9a58cb6eec633ff6afae150ca53c16f8cc8b65862ccb3d088051efdfceb7/python_socks-2.8.1-py3-none-any.whl", hash = "sha256:28232739c4988064e725cdbcd15be194743dd23f1c910f784163365b9d7be035", size = 55087, upload-time = "2026-02-16T05:23:59.147Z" },
+]
+
 [[package]]
 name = "python-telegram-bot"
 version = "22.6"
@@ -226,8 +226,7 @@ After each turn:
 |------|---------|
 | `run_agent.py` | AIAgent class — the complete agent loop (~9,200 lines) |
 | `agent/prompt_builder.py` | System prompt assembly from memory, skills, context files, personality |
-| `agent/context_engine.py` | ContextEngine ABC — pluggable context management |
-| `agent/context_compressor.py` | Default engine — lossy summarization algorithm |
+| `agent/context_compressor.py` | Conversation compression algorithm |
 | `agent/prompt_caching.py` | Anthropic prompt caching markers and cache metrics |
 | `agent/auxiliary_client.py` | Auxiliary LLM client for side tasks (vision, summarization) |
 | `model_tools.py` | Tool schema collection, `handle_function_call()` dispatch |
@@ -16,7 +16,7 @@ This page is the top-level map of Hermes Agent internals. Use it to orient yours
 │                                                                      │
 │  CLI (cli.py)    Gateway (gateway/run.py)    ACP (acp_adapter/)     │
 │  Batch Runner    API Server                  Python Library          │
-└──────────┬──────────────┬───────────────────────┬───────────────────┘
+└──────────┬──────────────┬───────────────────────┬────────────────────┘
           │              │                       │
           ▼              ▼                       ▼
 ┌─────────────────────────────────────────────────────────────────────┐
@@ -62,8 +62,7 @@ hermes-agent/
 │
 ├── agent/                    # Agent internals
 │   ├── prompt_builder.py     # System prompt assembly
-│   ├── context_engine.py     # ContextEngine ABC (pluggable)
-│   ├── context_compressor.py # Default engine — lossy summarization
+│   ├── context_compressor.py # Conversation compression algorithm
 │   ├── prompt_caching.py     # Anthropic prompt caching
 │   ├── auxiliary_client.py   # Auxiliary LLM for side tasks (vision, summarization)
 │   ├── model_metadata.py     # Model context lengths, token estimation
@@ -124,7 +123,6 @@ hermes-agent/
 ├── acp_adapter/              # ACP server (VS Code / Zed / JetBrains)
 ├── cron/                     # Scheduler (jobs.py, scheduler.py)
 ├── plugins/memory/           # Memory provider plugins
-├── plugins/context_engine/   # Context engine plugins
 ├── environments/             # RL training environments (Atropos)
 ├── skills/                   # Bundled skills (always available)
 ├── optional-skills/          # Official optional skills (install explicitly)
@@ -229,7 +227,7 @@ Long-running process with 14 platform adapters, unified session routing, user au

 ### Plugin System

-Three discovery sources: `~/.hermes/plugins/` (user), `.hermes/plugins/` (project), and pip entry points. Plugins register tools, hooks, and CLI commands through a context API. Two specialized plugin types exist: memory providers (`plugins/memory/`) and context engines (`plugins/context_engine/`). Both are single-select — only one of each can be active at a time, configured via `hermes plugins` or `config.yaml`.
+Three discovery sources: `~/.hermes/plugins/` (user), `.hermes/plugins/` (project), and pip entry points. Plugins register tools, hooks, and CLI commands through a context API. Memory providers are a specialized plugin type under `plugins/memory/`.

 → [Plugin Guide](/docs/guides/build-a-hermes-plugin), [Memory Provider Plugin](./memory-provider-plugin.md)

@@ -3,37 +3,10 @@
 Hermes Agent uses a dual compression system and Anthropic prompt caching to
 manage context window usage efficiently across long conversations.

-Source files: `agent/context_engine.py` (ABC), `agent/context_compressor.py` (default engine),
-`agent/prompt_caching.py`, `gateway/run.py` (session hygiene), `run_agent.py` (search for `_compress_context`)
+Source files: `agent/context_compressor.py`, `agent/prompt_caching.py`,
+`gateway/run.py` (session hygiene), `run_agent.py` (search for `_compress_context`)


-## Pluggable Context Engine
-
-Context management is built on the `ContextEngine` ABC (`agent/context_engine.py`). The built-in `ContextCompressor` is the default implementation, but plugins can replace it with alternative engines (e.g., Lossless Context Management).
-
-```yaml
-context:
-  engine: "compressor"    # default — built-in lossy summarization
-  engine: "lcm"           # example — plugin providing lossless context
-```
-
-The engine is responsible for:
-- Deciding when compaction should fire (`should_compress()`)
-- Performing compaction (`compress()`)
-- Optionally exposing tools the agent can call (e.g., `lcm_grep`)
-- Tracking token usage from API responses
-
-Selection is config-driven via `context.engine` in `config.yaml`. The resolution order:
-1. Check `plugins/context_engine/<name>/` directory
-2. Check general plugin system (`register_context_engine()`)
-3. Fall back to built-in `ContextCompressor`
-
-Plugin engines are **never auto-activated** — the user must explicitly set `context.engine` to the plugin's name. The default `"compressor"` always uses the built-in.
-
-Configure via `hermes plugins` → Provider Plugins → Context Engine, or edit `config.yaml` directly.
-
-For building a context engine plugin, see [Context Engine Plugins](/docs/developer-guide/context-engine-plugin).
-
 ## Dual Compression System

 Hermes has two separate compression layers that operate independently:
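For reviewers weighing what the docs change above drops: the removed "Pluggable Context Engine" section described a three-step resolution order for `context.engine`. The sketch below illustrates that order only and is not the project's implementation. The function name `resolve_engine_source`, its parameters, and the returned labels are hypothetical; `registered` stands in for whatever `register_context_engine()` populated.

```python
import tempfile
from pathlib import Path
from typing import Dict


def resolve_engine_source(name: str, plugin_root: Path, registered: Dict[str, object]) -> str:
    """Report where an engine would be loaded from (hypothetical helper).

    Mirrors the order in the removed docs:
      1. a plugins/context_engine/<name>/ directory
      2. the general plugin registry
      3. the built-in compressor as the fallback
    """
    # The default name always maps to the built-in; plugins are never auto-activated.
    if name == "compressor":
        return "builtin"
    # Step 1: a dedicated plugin directory named after the engine.
    if (plugin_root / name).is_dir():
        return "directory"
    # Step 2: an engine registered through the general plugin system.
    if name in registered:
        return "registry"
    # Step 3: unknown names fall back to the built-in compressor.
    return "builtin"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "lcm").mkdir()
        for engine in ("compressor", "lcm", "custom", "missing"):
            print(engine, "->", resolve_engine_source(engine, root, {"custom": object()}))
```

With this change applied, only the final branch remains relevant: the docs now describe `agent/context_compressor.py` as the sole compression algorithm.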