feat: add fast-path setup for Nous account

Adds a Nous-account-specific fast setup flow and auto-launches into chat if the gateway isn't set up.
Change: always run setup when launched with no config.
2026-04-24 00:07:23 -04:00 · 2026-04-24 00:06:48 -04:00
486 changed files with 5149 additions and 77885 deletions
@@ -53,9 +53,6 @@ jobs:
      - name: Extract skill metadata for dashboard
        run: python3 website/scripts/extract-skills.py

-      - name: Regenerate per-skill docs pages + catalogs
-        run: python3 website/scripts/generate-skill-docs.py
-
      - name: Build skills index (if not already present)
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
@@ -36,9 +36,6 @@ jobs:
      - name: Extract skill metadata for dashboard
        run: python3 website/scripts/extract-skills.py

-      - name: Regenerate per-skill docs pages + catalogs
-        run: python3 website/scripts/generate-skill-docs.py
-
      - name: Lint docs diagrams
        run: npm run lint:diagrams
        working-directory: website
@@ -240,19 +240,6 @@ npm run fmt       # prettier
 npm test          # vitest
 ```

-### TUI in the Dashboard (`hermes dashboard` → `/chat`)
-
-The dashboard embeds the real `hermes --tui` — **not** a rewrite.  See `hermes_cli/pty_bridge.py` + the `@app.websocket("/api/pty")` endpoint in `hermes_cli/web_server.py`.
-
- Browser loads `web/src/pages/ChatPage.tsx`, which mounts xterm.js's `Terminal` with the WebGL renderer, `@xterm/addon-fit` for container-driven resize, and `@xterm/addon-unicode11` for modern wide-character widths.
- `/api/pty?token=…` upgrades to a WebSocket; auth uses the same ephemeral `_SESSION_TOKEN` as REST, via query param (browsers can't set `Authorization` on WS upgrade).
- The server spawns whatever `hermes --tui` would spawn, through `ptyprocess` (POSIX PTY — WSL works, native Windows does not).
- Frames: raw PTY bytes each direction; resize via `\x1b[RESIZE:<cols>;<rows>]` intercepted on the server and applied with `TIOCSWINSZ`.
-
-**Do not re-implement the primary chat experience in React.** The main transcript, composer/input flow (including slash-command behavior), and PTY-backed terminal belong to the embedded `hermes --tui` — anything new you add to Ink shows up in the dashboard automatically. If you find yourself rebuilding the transcript or composer for the dashboard, stop and extend Ink instead.
-
-**Structured React UI around the TUI is allowed when it is not a second chat surface.** Sidebar widgets, inspectors, summaries, status panels, and similar supporting views (e.g. `ChatSidebar`, `ModelPickerDialog`, `ToolCall`) are fine when they complement the embedded TUI rather than replacing the transcript / composer / terminal. Keep their state independent of the PTY child's session and surface their failures non-destructively so the terminal pane keeps working unimpaired.
-
 ---

 ## Adding New Tools
@@ -10,11 +10,9 @@ ENV PYTHONUNBUFFERED=1
 ENV PLAYWRIGHT_BROWSERS_PATH=/opt/hermes/.playwright

 # Install system dependencies in one layer, clear APT cache
-# tini reaps orphaned zombie processes (MCP stdio subprocesses, git, bun, etc.)
-# that would otherwise accumulate when hermes runs as PID 1. See #15012.
 RUN apt-get update && \
    apt-get install -y --no-install-recommends \
-        build-essential nodejs npm python3 ripgrep ffmpeg gcc python3-dev libffi-dev procps git openssh-client docker-cli tini && \
+        build-essential nodejs npm python3 ripgrep ffmpeg gcc python3-dev libffi-dev procps git openssh-client docker-cli && \
    rm -rf /var/lib/apt/lists/*

 # Non-root user for runtime; UID can be overridden via HERMES_UID at runtime
@@ -43,15 +41,9 @@ COPY --chown=hermes:hermes . .
 # Build web dashboard (Vite outputs to hermes_cli/web_dist/)
 RUN cd web && npm run build

-# ---------- Permissions ----------
-# Make install dir world-readable so any HERMES_UID can read it at runtime.
-# The venv needs to be traversable too.
-USER root
-RUN chmod -R a+rX /opt/hermes
-# Start as root so the entrypoint can usermod/groupmod + gosu.
-# If HERMES_UID is unset, the entrypoint drops to the default hermes user (10000).
-
 # ---------- Python virtualenv ----------
+RUN chown hermes:hermes /opt/hermes
+USER hermes
 RUN uv venv && \
    uv pip install --no-cache-dir -e ".[all]"

@@ -60,4 +52,4 @@ ENV HERMES_WEB_DIST=/opt/hermes/hermes_cli/web_dist
 ENV HERMES_HOME=/opt/data
 ENV PATH="/opt/data/.local/bin:${PATH}"
 VOLUME [ "/opt/data" ]
-ENTRYPOINT [ "/usr/bin/tini", "-g", "--", "/opt/hermes/docker/entrypoint.sh" ]
+ENTRYPOINT [ "/opt/hermes/docker/entrypoint.sh" ]
@@ -60,7 +60,7 @@ from acp_adapter.events import (
    make_tool_progress_cb,
 )
 from acp_adapter.permissions import make_approval_callback
-from acp_adapter.session import SessionManager, SessionState, _expand_acp_enabled_toolsets
+from acp_adapter.session import SessionManager, SessionState

 logger = logging.getLogger(__name__)

@@ -287,11 +287,7 @@ class HermesACPAgent(acp.Agent):
        try:
            from model_tools import get_tool_definitions

-            enabled_toolsets = _expand_acp_enabled_toolsets(
-                getattr(state.agent, "enabled_toolsets", None) or ["hermes-acp"],
-                mcp_server_names=[server.name for server in mcp_servers],
-            )
-            state.agent.enabled_toolsets = enabled_toolsets
+            enabled_toolsets = getattr(state.agent, "enabled_toolsets", None) or ["hermes-acp"]
            disabled_toolsets = getattr(state.agent, "disabled_toolsets", None)
            state.agent.tools = get_tool_definitions(
                enabled_toolsets=enabled_toolsets,
@@ -758,9 +754,7 @@ class HermesACPAgent(acp.Agent):
    def _cmd_tools(self, args: str, state: SessionState) -> str:
        try:
            from model_tools import get_tool_definitions
-            toolsets = _expand_acp_enabled_toolsets(
-                getattr(state.agent, "enabled_toolsets", None) or ["hermes-acp"]
-            )
+            toolsets = getattr(state.agent, "enabled_toolsets", None) or ["hermes-acp"]
            tools = get_tool_definitions(enabled_toolsets=toolsets, quiet_mode=True)
            if not tools:
                return "No tools available."
@@ -106,24 +106,6 @@ def _register_task_cwd(task_id: str, cwd: str) -> None:
        logger.debug("Failed to register ACP task cwd override", exc_info=True)


-def _expand_acp_enabled_toolsets(
-    toolsets: List[str] | None = None,
-    mcp_server_names: List[str] | None = None,
-) -> List[str]:
-    """Return ACP toolsets plus explicit MCP server toolsets for this session."""
-    expanded: List[str] = []
-    for name in list(toolsets or ["hermes-acp"]):
-        if name and name not in expanded:
-            expanded.append(name)
-
-    for server_name in list(mcp_server_names or []):
-        toolset_name = f"mcp-{server_name}"
-        if server_name and toolset_name not in expanded:
-            expanded.append(toolset_name)
-
-    return expanded
-
-
 def _clear_task_cwd(task_id: str) -> None:
    """Remove task-specific cwd overrides for an ACP session."""
    if not task_id:
@@ -555,18 +537,9 @@ class SessionManager:
        elif isinstance(model_cfg, str) and model_cfg.strip():
            default_model = model_cfg.strip()

-        configured_mcp_servers = [
-            name
-            for name, cfg in (config.get("mcp_servers") or {}).items()
-            if not isinstance(cfg, dict) or cfg.get("enabled", True) is not False
-        ]
-
        kwargs = {
            "platform": "acp",
-            "enabled_toolsets": _expand_acp_enabled_toolsets(
-                ["hermes-acp"],
-                mcp_server_names=configured_mcp_servers,
-            ),
+            "enabled_toolsets": ["hermes-acp"],
            "quiet_mode": True,
            "session_id": session_id,
            "model": model or default_model,
@@ -14,8 +14,6 @@ import copy
 import json
 import logging
 import os
-import platform
-import subprocess
 from pathlib import Path

 from hermes_constants import get_hermes_home
@@ -279,9 +277,8 @@ def _is_oauth_token(key: str) -> bool:
    Positively identifies Anthropic OAuth tokens by their key format:
    - ``sk-ant-`` prefix (but NOT ``sk-ant-api``) → setup tokens, managed keys
    - ``eyJ`` prefix → JWTs from the Anthropic OAuth flow
-    - ``cc-`` prefix → Claude Code OAuth access tokens (from CLAUDE_CODE_OAUTH_TOKEN)

-    Non-Anthropic keys (MiniMax, Alibaba, etc.) don't match any pattern
+    Non-Anthropic keys (MiniMax, Alibaba, etc.) don't match either pattern
    and correctly return False.
    """
    if not key:
@@ -295,9 +292,6 @@ def _is_oauth_token(key: str) -> bool:
    # JWTs from Anthropic OAuth flow
    if key.startswith("eyJ"):
        return True
-    # Claude Code OAuth access tokens (opaque, from CLAUDE_CODE_OAUTH_TOKEN)
-    if key.startswith("cc-"):
-        return True
    return False
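
Review note: with the `cc-` branch removed, the classifier appears to recognise only the two patterns that remain in the docstring. The sketch below is a reconstruction, not the file's actual body: the `sk-ant-` / `sk-ant-api` checks are inferred from the docstring because those lines fall outside this hunk, and the name `is_oauth_token_sketch` is hypothetical.

```python
def is_oauth_token_sketch(key: str) -> bool:
    """Hypothetical reconstruction of the post-change token classifier."""
    if not key:
        return False
    # Assumption (from the docstring): regular API keys are not OAuth tokens.
    if key.startswith("sk-ant-api"):
        return False
    # Assumption (from the docstring): other sk-ant- keys are setup/managed tokens.
    if key.startswith("sk-ant-"):
        return True
    # Visible in the hunk: JWTs from the Anthropic OAuth flow.
    if key.startswith("eyJ"):
        return True
    # No "cc-" branch any more, so such tokens fall through to False.
    return False


cc_token_result = is_oauth_token_sketch("cc-example-token")
jwt_result = is_oauth_token_sketch("eyJhbGciOiJIUzI1NiJ9")
setup_token_result = is_oauth_token_sketch("sk-ant-oat01-example")
api_key_result = is_oauth_token_sketch("sk-ant-api03-example")
```

If this reading is right, a `cc-`-prefixed value from `CLAUDE_CODE_OAUTH_TOKEN` would now be handled as a regular API key rather than as an OAuth token.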


@@ -467,72 +461,8 @@ def build_anthropic_bedrock_client(region: str):
    )


-def _read_claude_code_credentials_from_keychain() -> Optional[Dict[str, Any]]:
-    """Read Claude Code OAuth credentials from the macOS Keychain.
-
-    Claude Code >=2.1.114 stores credentials in the macOS Keychain under the
-    service name "Claude Code-credentials" rather than (or in addition to)
-    the JSON file at ~/.claude/.credentials.json.
-
-    The password field contains a JSON string with the same claudeAiOauth
-    structure as the JSON file.
-
-    Returns dict with {accessToken, refreshToken?, expiresAt?} or None.
-    """
-    import platform
-    import subprocess
-
-    if platform.system() != "Darwin":
-        return None
-
-    try:
-        # Read the "Claude Code-credentials" generic password entry
-        result = subprocess.run(
-            ["security", "find-generic-password",
-             "-s", "Claude Code-credentials",
-             "-w"],
-            capture_output=True,
-            text=True,
-            timeout=5,
-        )
-    except (OSError, subprocess.TimeoutExpired):
-        logger.debug("Keychain: security command not available or timed out")
-        return None
-
-    if result.returncode != 0:
-        logger.debug("Keychain: no entry found for 'Claude Code-credentials'")
-        return None
-
-    raw = result.stdout.strip()
-    if not raw:
-        return None
-
-    try:
-        data = json.loads(raw)
-    except json.JSONDecodeError:
-        logger.debug("Keychain: credentials payload is not valid JSON")
-        return None
-
-    oauth_data = data.get("claudeAiOauth")
-    if oauth_data and isinstance(oauth_data, dict):
-        access_token = oauth_data.get("accessToken", "")
-        if access_token:
-            return {
-                "accessToken": access_token,
-                "refreshToken": oauth_data.get("refreshToken", ""),
-                "expiresAt": oauth_data.get("expiresAt", 0),
-                "source": "macos_keychain",
-            }
-
-    return None
-
-
 def read_claude_code_credentials() -> Optional[Dict[str, Any]]:
-    """Read refreshable Claude Code OAuth credentials.
-
-    Checks two sources in order:
-      1. macOS Keychain (Darwin only) — "Claude Code-credentials" entry
-      2. ~/.claude/.credentials.json file
+    """Read refreshable Claude Code OAuth credentials from ~/.claude/.credentials.json.

    This intentionally excludes ~/.claude.json primaryApiKey. Opencode's
    subscription flow is OAuth/setup-token based with refreshable credentials,
@@ -541,12 +471,6 @@ def read_claude_code_credentials() -> Optional[Dict[str, Any]]:

    Returns dict with {accessToken, refreshToken?, expiresAt?} or None.
    """
-    # Try macOS Keychain first (covers Claude Code >=2.1.114)
-    kc_creds = _read_claude_code_credentials_from_keychain()
-    if kc_creds:
-        return kc_creds
-
-    # Fall back to JSON file
    cred_path = Path.home() / ".claude" / ".credentials.json"
    if cred_path.exists():
        try:
@@ -717,9 +641,7 @@ def _write_claude_code_credentials(
        existing["claudeAiOauth"] = oauth_data

        cred_path.parent.mkdir(parents=True, exist_ok=True)
-        _tmp_cred = cred_path.with_suffix(".tmp")
-        _tmp_cred.write_text(json.dumps(existing, indent=2), encoding="utf-8")
-        _tmp_cred.replace(cred_path)
+        cred_path.write_text(json.dumps(existing, indent=2), encoding="utf-8")
        # Restrict permissions (credentials file)
        cred_path.chmod(0o600)
    except (OSError, IOError) as e:
@@ -986,26 +908,6 @@ def read_hermes_oauth_credentials() -> Optional[Dict[str, Any]]:
 # ---------------------------------------------------------------------------


-def _is_bedrock_model_id(model: str) -> bool:
-    """Detect AWS Bedrock model IDs that use dots as namespace separators.
-
-    Bedrock model IDs come in two forms:
-    - Bare:    ``anthropic.claude-opus-4-7``
-    - Regional (inference profiles): ``us.anthropic.claude-sonnet-4-5-v1:0``
-
-    In both cases the dots separate namespace components, not version
-    numbers, and must be preserved verbatim for the Bedrock API.
-    """
-    lower = model.lower()
-    # Regional inference-profile prefixes
-    if any(lower.startswith(p) for p in ("global.", "us.", "eu.", "ap.", "jp.")):
-        return True
-    # Bare Bedrock model IDs: provider.model-family
-    if lower.startswith("anthropic."):
-        return True
-    return False
-
-
 def normalize_model_name(model: str, preserve_dots: bool = False) -> str:
    """Normalize a model name for the Anthropic API.

@@ -1013,19 +915,11 @@ def normalize_model_name(model: str, preserve_dots: bool = False) -> str:
    - Converts dots to hyphens in version numbers (OpenRouter uses dots,
      Anthropic uses hyphens: claude-opus-4.6 → claude-opus-4-6), unless
      preserve_dots is True (e.g. for Alibaba/DashScope: qwen3.5-plus).
-    - Preserves Bedrock model IDs (``anthropic.claude-opus-4-7``) and
-      regional inference profiles (``us.anthropic.claude-*``) whose dots
-      are namespace separators, not version separators.
    """
    lower = model.lower()
    if lower.startswith("anthropic/"):
        model = model[len("anthropic/"):]
    if not preserve_dots:
-        # Bedrock model IDs use dots as namespace separators
-        # (e.g. "anthropic.claude-opus-4-7", "us.anthropic.claude-*").
-        # These must not be converted to hyphens.  See issue #12295.
-        if _is_bedrock_model_id(model):
-            return model
        # OpenRouter uses dots for version separators (claude-opus-4.6),
        # Anthropic uses hyphens (claude-opus-4-6). Convert dots to hyphens.
        model = model.replace(".", "-")
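
Review note: with the `_is_bedrock_model_id` guard removed, the part of `normalize_model_name` visible in this hunk reduces to the sketch below. It reproduces only the lines shown here; the rest of the function is not visible, so the final `return model` is an assumption. It suggests that Bedrock-style IDs now keep their dots only when the caller passes `preserve_dots=True`.

```python
def normalize_model_name(model: str, preserve_dots: bool = False) -> str:
    """Reduced sketch of the post-change logic visible in this hunk."""
    lower = model.lower()
    if lower.startswith("anthropic/"):
        model = model[len("anthropic/"):]
    if not preserve_dots:
        # Dots become hyphens unconditionally; there is no Bedrock special case.
        model = model.replace(".", "-")
    # Assumption: the unseen remainder of the function returns the name as-is.
    return model


openrouter_style = normalize_model_name("anthropic/claude-opus-4.6")
dashscope_style = normalize_model_name("qwen3.5-plus", preserve_dots=True)
bedrock_default = normalize_model_name("anthropic.claude-opus-4-7")
bedrock_preserved = normalize_model_name(
    "anthropic.claude-opus-4-7", preserve_dots=True
)
```

Under this sketch, a Bedrock ID passed with the default arguments comes back as `anthropic-claude-opus-4-7`, so Bedrock call sites may need to pass `preserve_dots=True` explicitly.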
@@ -1704,3 +1598,4 @@ def build_anthropic_kwargs(
    return kwargs


+
@@ -74,12 +74,6 @@ _PROVIDER_ALIASES = {
    "minimax_cn": "minimax-cn",
    "claude": "anthropic",
    "claude-code": "anthropic",
-    "github": "copilot",
-    "github-copilot": "copilot",
-    "github-model": "copilot",
-    "github-models": "copilot",
-    "github-copilot-acp": "copilot-acp",
-    "copilot-acp-agent": "copilot-acp",
 }


@@ -95,11 +89,10 @@ def _normalize_aux_provider(provider: Optional[str]) -> str:
    if normalized == "main":
        # Resolve to the user's actual main provider so named custom providers
        # and non-aggregator providers (DeepSeek, Alibaba, etc.) work correctly.
-        main_prov = (_read_main_provider() or "").strip().lower()
+        main_prov = _read_main_provider()
        if main_prov and main_prov not in ("auto", "main", ""):
-            normalized = main_prov
-        else:
-            return "custom"
+            return main_prov
+        return "custom"
    return _PROVIDER_ALIASES.get(normalized, normalized)
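
Review note: after this change the `main` branch returns the configured main provider directly, without lowercasing it or passing it through `_PROVIDER_ALIASES`. The sketch below illustrates that under stated assumptions: the way `normalized` is derived is a guess (those lines are outside the hunk), the `main_provider` argument stands in for `_read_main_provider()`, and the alias table is only the subset visible in this diff.

```python
from typing import Optional

# Subset of the alias table that is visible in this diff.
_PROVIDER_ALIASES = {
    "minimax_cn": "minimax-cn",
    "claude": "anthropic",
    "claude-code": "anthropic",
}


def normalize_aux_provider_sketch(provider: Optional[str], main_provider: str) -> str:
    """Hypothetical stand-in for the post-change _normalize_aux_provider."""
    # Assumption: the real function lowercases and strips the input first.
    normalized = (provider or "auto").strip().lower()
    if normalized == "main":
        main_prov = main_provider  # stands in for _read_main_provider()
        if main_prov and main_prov not in ("auto", "main", ""):
            # Returned verbatim: no lowercasing and no alias lookup.
            return main_prov
        return "custom"
    return _PROVIDER_ALIASES.get(normalized, normalized)


direct_alias = normalize_aux_provider_sketch("claude", "")
main_plain = normalize_aux_provider_sketch("main", "deepseek")
main_unset = normalize_aux_provider_sketch("main", "auto")
main_aliased_name = normalize_aux_provider_sketch("main", "Claude-Code")
```

If the assumptions hold, a main provider configured as an alias or in mixed case (the last call) would reach later lookups unnormalized, which may be worth confirming against `_read_main_provider()`.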


@@ -1349,111 +1342,6 @@ def _is_auth_error(exc: Exception) -> bool:
    return "error code: 401" in err_lower or "authenticationerror" in type(exc).__name__.lower()


-def _is_unsupported_parameter_error(exc: Exception, param: str) -> bool:
-    """Detect provider 400s for an unsupported request parameter.
-
-    Different OpenAI-compatible endpoints phrase the same class of error a few
-    ways: ``Unsupported parameter: X``, ``unsupported_parameter`` with a
-    ``param`` field, ``X is not supported``, ``unknown parameter: X``,
-    ``unrecognized request argument: X``.  We match on both the parameter
-    name and a generic "unsupported/unknown/unrecognized parameter" marker so
-    call sites can reactively retry without the offending key instead of
-    surfacing a noisy auxiliary failure.
-
-    Generalizes the temperature-specific detector that originally shipped
-    with PR #15621 so the same retry strategy can cover ``max_tokens``,
-    ``seed``, ``top_p``, and any future quirk. Credit @nicholasrae (PR #15416)
-    for the generalization pattern.
-    """
-    param_lower = (param or "").lower()
-    if not param_lower:
-        return False
-    err_lower = str(exc).lower()
-    if param_lower not in err_lower:
-        return False
-    return any(marker in err_lower for marker in (
-        "unsupported parameter",
-        "unsupported_parameter",
-        "not supported",
-        "does not support",
-        "unknown parameter",
-        "unrecognized request argument",
-        "unrecognized parameter",
-        "invalid parameter",
-    ))
-
-
-def _is_unsupported_temperature_error(exc: Exception) -> bool:
-    """Back-compat wrapper: detect API errors where the model rejects ``temperature``.
-
-    Delegates to :func:`_is_unsupported_parameter_error`; kept as a separate
-    public symbol because existing tests and call sites import it by name.
-    """
-    return _is_unsupported_parameter_error(exc, "temperature")
-
-
-def _evict_cached_clients(provider: str) -> None:
-    """Drop cached auxiliary clients for a provider so fresh creds are used."""
-    normalized = _normalize_aux_provider(provider)
-    with _client_cache_lock:
-        stale_keys = [
-            key for key in _client_cache
-            if _normalize_aux_provider(str(key[0])) == normalized
-        ]
-        for key in stale_keys:
-            client = _client_cache.get(key, (None, None, None))[0]
-            if client is not None:
-                _force_close_async_httpx(client)
-                try:
-                    close_fn = getattr(client, "close", None)
-                    if callable(close_fn):
-                        close_fn()
-                except Exception:
-                    pass
-            _client_cache.pop(key, None)
-
-
-def _refresh_provider_credentials(provider: str) -> bool:
-    """Refresh short-lived credentials for OAuth-backed auxiliary providers."""
-    normalized = _normalize_aux_provider(provider)
-    try:
-        if normalized == "openai-codex":
-            from hermes_cli.auth import resolve_codex_runtime_credentials
-
-            creds = resolve_codex_runtime_credentials(force_refresh=True)
-            if not str(creds.get("api_key", "") or "").strip():
-                return False
-            _evict_cached_clients(normalized)
-            return True
-        if normalized == "nous":
-            from hermes_cli.auth import resolve_nous_runtime_credentials
-
-            creds = resolve_nous_runtime_credentials(
-                min_key_ttl_seconds=max(60, int(os.getenv("HERMES_NOUS_MIN_KEY_TTL_SECONDS", "1800"))),
-                timeout_seconds=float(os.getenv("HERMES_NOUS_TIMEOUT_SECONDS", "15")),
-                force_mint=True,
-            )
-            if not str(creds.get("api_key", "") or "").strip():
-                return False
-            _evict_cached_clients(normalized)
-            return True
-        if normalized == "anthropic":
-            from agent.anthropic_adapter import read_claude_code_credentials, _refresh_oauth_token, resolve_anthropic_token
-
-            creds = read_claude_code_credentials()
-            token = _refresh_oauth_token(creds) if isinstance(creds, dict) and creds.get("refreshToken") else None
-            if not str(token or "").strip():
-                token = resolve_anthropic_token()
-            if not str(token or "").strip():
-                return False
-            _evict_cached_clients(normalized)
-            return True
-    except Exception as exc:
-        logger.debug("Auxiliary provider credential refresh failed for %s: %s", normalized, exc)
-        return False
-    return False
-
-
 def _try_payment_fallback(
    failed_provider: str,
    task: str = None,
@@ -1848,7 +1736,7 @@ def resolve_provider_client(
                       "but no endpoint credentials found")
        return None, None

-    # ── Named custom providers (config.yaml providers dict / custom_providers list) ───
+    # ── Named custom providers (config.yaml custom_providers list) ───
    try:
        from hermes_cli.runtime_provider import _get_named_custom_provider
        custom_entry = _get_named_custom_provider(provider)
@@ -1859,51 +1747,16 @@ def resolve_provider_client(
            if not custom_key and custom_key_env:
                custom_key = os.getenv(custom_key_env, "").strip()
            custom_key = custom_key or "no-key-required"
-            # An explicit per-task api_mode override (from _resolve_task_provider_model)
-            # wins; otherwise fall back to what the provider entry declared.
-            entry_api_mode = (api_mode or custom_entry.get("api_mode") or "").strip()
            if custom_base:
                final_model = _normalize_resolved_model(
                    model or custom_entry.get("model") or _read_main_model() or "gpt-4o-mini",
                    provider,
                )
-                logger.debug(
-                    "resolve_provider_client: named custom provider %r (%s, api_mode=%s)",
-                    provider, final_model, entry_api_mode or "chat_completions")
-                # anthropic_messages: route through the Anthropic Messages API
-                # via AnthropicAuxiliaryClient. Mirrors the anonymous-custom
-                # branch in _try_custom_endpoint(). See #15033.
-                if entry_api_mode == "anthropic_messages":
-                    try:
-                        from agent.anthropic_adapter import build_anthropic_client
-                        real_client = build_anthropic_client(custom_key, custom_base)
-                    except ImportError:
-                        logger.warning(
-                            "Named custom provider %r declares api_mode="
-                            "anthropic_messages but the anthropic SDK is not "
-                            "installed — falling back to OpenAI-wire.",
-                            provider,
-                        )
-                        client = OpenAI(api_key=custom_key, base_url=custom_base)
-                        return (_to_async_client(client, final_model) if async_mode
-                                else (client, final_model))
-                    sync_anthropic = AnthropicAuxiliaryClient(
-                        real_client, final_model, custom_key, custom_base, is_oauth=False,
-                    )
-                    if async_mode:
-                        return AsyncAnthropicAuxiliaryClient(sync_anthropic), final_model
-                    return sync_anthropic, final_model
                client = OpenAI(api_key=custom_key, base_url=custom_base)
-                # codex_responses or inherited auto-detect (via _wrap_if_needed).
-                # _wrap_if_needed reads the closed-over `api_mode` (the task-level
-                # override). Named-provider entry api_mode=codex_responses also
-                # flows through here.
-                if entry_api_mode == "codex_responses" and not isinstance(
-                    client, CodexAuxiliaryClient
-                ):
-                    client = CodexAuxiliaryClient(client, final_model)
-                else:
-                    client = _wrap_if_needed(client, final_model, custom_base)
+                client = _wrap_if_needed(client, final_model, custom_base)
+                logger.debug(
+                    "resolve_provider_client: named custom provider %r (%s)",
+                    provider, final_model)
                return (_to_async_client(client, final_model) if async_mode
                        else (client, final_model))
            logger.warning(
@@ -2036,39 +1889,6 @@ def resolve_provider_client(
                       "directly supported", provider)
        return None, None

-    elif pconfig.auth_type == "aws_sdk":
-        # AWS SDK providers (Bedrock) — use the Anthropic Bedrock client via
-        # boto3's credential chain (IAM roles, SSO, env vars, instance metadata).
-        try:
-            from agent.bedrock_adapter import has_aws_credentials, resolve_bedrock_region
-            from agent.anthropic_adapter import build_anthropic_bedrock_client
-        except ImportError:
-            logger.warning("resolve_provider_client: bedrock requested but "
-                           "boto3 or anthropic SDK not installed")
-            return None, None
-
-        if not has_aws_credentials():
-            logger.debug("resolve_provider_client: bedrock requested but "
-                         "no AWS credentials found")
-            return None, None
-
-        region = resolve_bedrock_region()
-        default_model = "anthropic.claude-haiku-4-5-20251001-v1:0"
-        final_model = _normalize_resolved_model(model or default_model, provider)
-        try:
-            real_client = build_anthropic_bedrock_client(region)
-        except ImportError as exc:
-            logger.warning("resolve_provider_client: cannot create Bedrock "
-                           "client: %s", exc)
-            return None, None
-        client = AnthropicAuxiliaryClient(
-            real_client, final_model, api_key="aws-sdk",
-            base_url=f"https://bedrock-runtime.{region}.amazonaws.com",
-        )
-        logger.debug("resolve_provider_client: bedrock (%s, %s)", final_model, region)
-        return (_to_async_client(client, final_model) if async_mode
-                else (client, final_model))
-
    elif pconfig.auth_type in ("oauth_device_code", "oauth_external"):
        # OAuth providers — route through their specific try functions
        if provider == "nous":
@@ -2995,45 +2815,13 @@ def call_llm(
    if _is_anthropic_compat_endpoint(resolved_provider, _client_base):
        kwargs["messages"] = _convert_openai_images_to_anthropic(kwargs["messages"])

-    # Handle unsupported temperature, max_tokens vs max_completion_tokens retry,
-    # then payment fallback.
+    # Handle max_tokens vs max_completion_tokens retry, then payment fallback.
    try:
        return _validate_llm_response(
            client.chat.completions.create(**kwargs), task)
    except Exception as first_err:
-        if "temperature" in kwargs and _is_unsupported_temperature_error(first_err):
-            retry_kwargs = dict(kwargs)
-            retry_kwargs.pop("temperature", None)
-            logger.info(
-                "Auxiliary %s: provider rejected temperature; retrying once without it",
-                task or "call",
-            )
-            try:
-                return _validate_llm_response(
-                    client.chat.completions.create(**retry_kwargs), task)
-            except Exception as retry_err:
-                retry_err_str = str(retry_err)
-                # If retry still fails, fall through to the max_tokens /
-                # payment / auth chains below using the temperature-stripped
-                # kwargs.  Re-raise only if the retry hit something those
-                # chains won't handle.
-                if not (
-                    _is_payment_error(retry_err)
-                    or _is_connection_error(retry_err)
-                    or _is_auth_error(retry_err)
-                    or "max_tokens" in retry_err_str
-                    or "unsupported_parameter" in retry_err_str
-                ):
-                    raise
-                first_err = retry_err
-                kwargs = retry_kwargs
-
        err_str = str(first_err)
-        if max_tokens is not None and (
-            "max_tokens" in err_str
-            or "unsupported_parameter" in err_str
-            or _is_unsupported_parameter_error(first_err, "max_tokens")
-        ):
+        if "max_tokens" in err_str or "unsupported_parameter" in err_str:
            kwargs.pop("max_tokens", None)
            kwargs["max_completion_tokens"] = max_tokens
            try:
@@ -3069,49 +2857,6 @@ def call_llm(
                return _validate_llm_response(
                    refreshed_client.chat.completions.create(**kwargs), task)

-        # ── Auth refresh retry ───────────────────────────────────────
-        if (_is_auth_error(first_err)
-                and resolved_provider not in ("auto", "", None)
-                and not client_is_nous):
-            if _refresh_provider_credentials(resolved_provider):
-                logger.info(
-                    "Auxiliary %s: refreshed %s credentials after auth error, retrying",
-                    task or "call", resolved_provider,
-                )
-                retry_client, retry_model = (
-                    resolve_vision_provider_client(
-                        provider=resolved_provider,
-                        model=final_model,
-                        async_mode=False,
-                    )[1:]
-                    if task == "vision"
-                    else _get_cached_client(
-                        resolved_provider,
-                        resolved_model,
-                        base_url=resolved_base_url,
-                        api_key=resolved_api_key,
-                        api_mode=resolved_api_mode,
-                        main_runtime=main_runtime,
-                    )
-                )
-                if retry_client is not None:
-                    retry_kwargs = _build_call_kwargs(
-                        resolved_provider,
-                        retry_model or final_model,
-                        messages,
-                        temperature=temperature,
-                        max_tokens=max_tokens,
-                        tools=tools,
-                        timeout=effective_timeout,
-                        extra_body=effective_extra_body,
-                        base_url=resolved_base_url,
-                    )
-                    _retry_base = str(getattr(retry_client, "base_url", "") or "")
-                    if _is_anthropic_compat_endpoint(resolved_provider, _retry_base):
-                        retry_kwargs["messages"] = _convert_openai_images_to_anthropic(retry_kwargs["messages"])
-                    return _validate_llm_response(
-                        retry_client.chat.completions.create(**retry_kwargs), task)
-
        # ── Payment / credit exhaustion fallback ──────────────────────
        # When the resolved provider returns 402 or a credit-related error,
        # try alternative providers instead of giving up.  This handles the
@@ -3296,35 +3041,8 @@ async def async_call_llm(
        return _validate_llm_response(
            await client.chat.completions.create(**kwargs), task)
    except Exception as first_err:
-        if "temperature" in kwargs and _is_unsupported_temperature_error(first_err):
-            retry_kwargs = dict(kwargs)
-            retry_kwargs.pop("temperature", None)
-            logger.info(
-                "Auxiliary %s (async): provider rejected temperature; retrying once without it",
-                task or "call",
-            )
-            try:
-                return _validate_llm_response(
-                    await client.chat.completions.create(**retry_kwargs), task)
-            except Exception as retry_err:
-                retry_err_str = str(retry_err)
-                if not (
-                    _is_payment_error(retry_err)
-                    or _is_connection_error(retry_err)
-                    or _is_auth_error(retry_err)
-                    or "max_tokens" in retry_err_str
-                    or "unsupported_parameter" in retry_err_str
-                ):
-                    raise
-                first_err = retry_err
-                kwargs = retry_kwargs
-
        err_str = str(first_err)
-        if max_tokens is not None and (
-            "max_tokens" in err_str
-            or "unsupported_parameter" in err_str
-            or _is_unsupported_parameter_error(first_err, "max_tokens")
-        ):
+        if "max_tokens" in err_str or "unsupported_parameter" in err_str:
            kwargs.pop("max_tokens", None)
            kwargs["max_completion_tokens"] = max_tokens
            try:
@@ -3359,48 +3077,6 @@ async def async_call_llm(
                return _validate_llm_response(
                    await refreshed_client.chat.completions.create(**kwargs), task)

-        # ── Auth refresh retry (mirrors sync call_llm) ───────────────
-        if (_is_auth_error(first_err)
-                and resolved_provider not in ("auto", "", None)
-                and not client_is_nous):
-            if _refresh_provider_credentials(resolved_provider):
-                logger.info(
-                    "Auxiliary %s (async): refreshed %s credentials after auth error, retrying",
-                    task or "call", resolved_provider,
-                )
-                if task == "vision":
-                    _, retry_client, retry_model = resolve_vision_provider_client(
-                        provider=resolved_provider,
-                        model=final_model,
-                        async_mode=True,
-                    )
-                else:
-                    retry_client, retry_model = _get_cached_client(
-                        resolved_provider,
-                        resolved_model,
-                        async_mode=True,
-                        base_url=resolved_base_url,
-                        api_key=resolved_api_key,
-                        api_mode=resolved_api_mode,
-                    )
-                if retry_client is not None:
-                    retry_kwargs = _build_call_kwargs(
-                        resolved_provider,
-                        retry_model or final_model,
-                        messages,
-                        temperature=temperature,
-                        max_tokens=max_tokens,
-                        tools=tools,
-                        timeout=effective_timeout,
-                        extra_body=effective_extra_body,
-                        base_url=resolved_base_url,
-                    )
-                    _retry_base = str(getattr(retry_client, "base_url", "") or "")
-                    if _is_anthropic_compat_endpoint(resolved_provider, _retry_base):
-                        retry_kwargs["messages"] = _convert_openai_images_to_anthropic(retry_kwargs["messages"])
-                    return _validate_llm_response(
-                        await retry_client.chat.completions.create(**retry_kwargs), task)
-
        # ── Payment / connection fallback (mirrors sync call_llm) ─────
        should_fallback = _is_payment_error(first_err) or _is_connection_error(first_err)
        is_auto = resolved_provider in ("auto", "", None)
@@ -87,114 +87,6 @@ def reset_client_cache():
    _bedrock_control_client_cache.clear()


-def invalidate_runtime_client(region: str) -> bool:
-    """Evict the cached ``bedrock-runtime`` client for a single region.
-
-    Per-region counterpart to :func:`reset_client_cache`. Used by the converse
-    call wrappers to discard clients whose underlying HTTP connection has
-    gone stale, so the next call allocates a fresh client (with a fresh
-    connection pool) instead of reusing a dead socket.
-
-    Returns True if a cached entry was evicted, False if the region was not
-    cached.
-    """
-    existed = region in _bedrock_runtime_client_cache
-    _bedrock_runtime_client_cache.pop(region, None)
-    return existed
-
-
-# ---------------------------------------------------------------------------
-# Stale-connection detection
-# ---------------------------------------------------------------------------
-#
-# boto3 caches its HTTPS connection pool inside the client object. When a
-# pooled connection is killed out from under us (NAT timeout, VPN flap,
-# server-side TCP RST, proxy idle cull, etc.), the next use surfaces as
-# one of a handful of low-level exceptions — most commonly
-# ``botocore.exceptions.ConnectionClosedError`` or
-# ``urllib3.exceptions.ProtocolError``. urllib3 also trips an internal
-# ``assert`` in a couple of paths (connection pool state checks, chunked
-# response readers) which bubbles up as a bare ``AssertionError`` with an
-# empty ``str(exc)``.
-#
-# In all of these cases the client is the problem, not the request: retrying
-# with the same cached client reproduces the failure until the process
-# restarts. The fix is to evict the region's cached client so the next
-# attempt builds a new one.
-
-_STALE_LIB_MODULE_PREFIXES = (
-    "urllib3.",
-    "botocore.",
-    "boto3.",
-)
-
-
-def _traceback_frames_modules(exc: BaseException):
-    """Yield ``__name__``-style module strings for each frame in exc's traceback."""
-    tb = getattr(exc, "__traceback__", None)
-    while tb is not None:
-        frame = tb.tb_frame
-        module = frame.f_globals.get("__name__", "")
-        yield module or ""
-        tb = tb.tb_next
-
-
-def is_stale_connection_error(exc: BaseException) -> bool:
-    """Return True if ``exc`` indicates a dead/stale Bedrock HTTP connection.
-
-    Matches:
-      * ``botocore.exceptions.ConnectionError`` and subclasses
-        (``ConnectionClosedError``, ``EndpointConnectionError``,
-        ``ReadTimeoutError``, ``ConnectTimeoutError``).
-      * ``urllib3.exceptions.ProtocolError`` / ``NewConnectionError`` /
-        ``ConnectionError`` (best-effort import — urllib3 is a transitive
-        dependency of botocore so it is always available in practice).
-      * Bare ``AssertionError`` raised from a frame inside urllib3, botocore,
-        or boto3. These are internal-invariant failures (typically triggered
-        by corrupted connection-pool state after a dropped socket) and are
-        recoverable by swapping the client.
-
-    Non-library ``AssertionError``s (from application code or tests) are
-    intentionally not matched — only library-internal asserts signal stale
-    connection state.
-    """
-    # botocore: the canonical signal — HTTPClientError is the umbrella for
-    # ConnectionClosedError, ReadTimeoutError, EndpointConnectionError,
-    # ConnectTimeoutError, and ProxyConnectionError. ConnectionError covers
-    # the same family via a different branch of the hierarchy.
-    try:
-        from botocore.exceptions import (
-            ConnectionError as BotoConnectionError,
-            HTTPClientError,
-        )
-        botocore_errors: tuple = (BotoConnectionError, HTTPClientError)
-    except ImportError:  # pragma: no cover — botocore always present with boto3
-        botocore_errors = ()
-    if botocore_errors and isinstance(exc, botocore_errors):
-        return True
-
-    # urllib3: low-level transport failures
-    try:
-        from urllib3.exceptions import (
-            ProtocolError,
-            NewConnectionError,
-            ConnectionError as Urllib3ConnectionError,
-        )
-        urllib3_errors = (ProtocolError, NewConnectionError, Urllib3ConnectionError)
-    except ImportError:  # pragma: no cover
-        urllib3_errors = ()
-    if urllib3_errors and isinstance(exc, urllib3_errors):
-        return True
-
-    # Library-internal AssertionError (urllib3 / botocore / boto3)
-    if isinstance(exc, AssertionError):
-        for module in _traceback_frames_modules(exc):
-            if any(module.startswith(prefix) for prefix in _STALE_LIB_MODULE_PREFIXES):
-                return True
-
-    return False
-
-
 # ---------------------------------------------------------------------------
 # AWS credential detection
 # ---------------------------------------------------------------------------
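
Review note on the hunk above: the removed `is_stale_connection_error` treated a bare `AssertionError` as a stale-connection signal only when some frame in its traceback belonged to urllib3, botocore, or boto3. A minimal sketch of that traceback check follows, assuming nothing about the rest of the adapter. The names `traceback_modules`, `is_library_assertion`, and `raise_from` are illustrative and do not appear in the repository.

```python
# Sketch only: reproduces the module-prefix check on AssertionError
# tracebacks described in the removed code. Helper names are hypothetical.
LIB_PREFIXES = ("urllib3.", "botocore.", "boto3.")


def traceback_modules(exc):
    """Yield the module name of every frame in exc's traceback."""
    tb = exc.__traceback__
    while tb is not None:
        yield tb.tb_frame.f_globals.get("__name__", "") or ""
        tb = tb.tb_next


def is_library_assertion(exc):
    """True only for an AssertionError raised from inside a listed library."""
    if not isinstance(exc, AssertionError):
        return False
    return any(name.startswith(LIB_PREFIXES) for name in traceback_modules(exc))


def raise_from(module_name):
    """Test helper: raise AssertionError from a frame that claims module_name."""
    namespace = {"__name__": module_name}
    exec("def boom():\n    raise AssertionError()\n", namespace)
    try:
        namespace["boom"]()
    except AssertionError as exc:
        return exc
```

With this check removed, a library-internal assert propagates like any other exception and the cached client for the region stays in place.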
@@ -895,17 +787,7 @@ def call_converse(
        guardrail_config=guardrail_config,
    )

-    try:
-        response = client.converse(**kwargs)
-    except Exception as exc:
-        if is_stale_connection_error(exc):
-            logger.warning(
-                "bedrock: stale-connection error on converse(region=%s, model=%s): "
-                "%s — evicting cached client so the next call reconnects.",
-                region, model, type(exc).__name__,
-            )
-            invalidate_runtime_client(region)
-        raise
+    response = client.converse(**kwargs)
    return normalize_converse_response(response)


@@ -937,17 +819,7 @@ def call_converse_stream(
        guardrail_config=guardrail_config,
    )

-    try:
-        response = client.converse_stream(**kwargs)
-    except Exception as exc:
-        if is_stale_connection_error(exc):
-            logger.warning(
-                "bedrock: stale-connection error on converse_stream(region=%s, "
-                "model=%s): %s — evicting cached client so the next call reconnects.",
-                region, model, type(exc).__name__,
-            )
-            invalidate_runtime_client(region)
-        raise
+    response = client.converse_stream(**kwargs)
    return normalize_converse_stream_events(response)


@@ -23,23 +23,6 @@ from agent.prompt_builder import DEFAULT_AGENT_IDENTITY
 logger = logging.getLogger(__name__)


-# Matches Codex/Harmony tool-call serialization that occasionally leaks into
-# assistant-message content when the model fails to emit a structured
-# ``function_call`` item.  Accepts the common forms:
-#
-#   to=functions.exec_command
-#   assistant to=functions.exec_command
-#   <|channel|>commentary to=functions.exec_command
-#
-# ``to=functions.<name>`` is the stable marker — the optional ``assistant`` or
-# Harmony channel prefix varies by degeneration mode.  Case-insensitive to
-# cover lowercase/uppercase ``assistant`` variants.
-_TOOL_CALL_LEAK_PATTERN = re.compile(
-    r"(?:^|[\s>|])to=functions\.[A-Za-z_][\w.]*",
-    re.IGNORECASE,
-)
-
-
 # ---------------------------------------------------------------------------
 # Multimodal content helpers
 # ---------------------------------------------------------------------------
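
Review note on the hunk above: the removed `_TOOL_CALL_LEAK_PATTERN` keyed on the `to=functions.<name>` marker, with an optional prefix. The sketch below copies the pattern verbatim from the removed lines and wraps it in a hypothetical helper, `looks_like_leaked_tool_call`, to show which inputs it would have flagged. The helper name and the sample strings are illustrative only.

```python
import re

# Pattern copied from the removed definition in this diff.
TOOL_CALL_LEAK = re.compile(
    r"(?:^|[\s>|])to=functions\.[A-Za-z_][\w.]*",
    re.IGNORECASE,
)


def looks_like_leaked_tool_call(text, tool_calls):
    """Hypothetical helper: flag text that carries the marker when no
    structured tool calls were produced."""
    if not text or tool_calls:
        return False
    return TOOL_CALL_LEAK.search(text) is not None
```

The leading alternation `(?:^|[\s>|])` keeps the marker from matching inside a longer word such as `auto=functions.x`.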
@@ -804,37 +787,6 @@ def _normalize_codex_response(response: Any) -> tuple[Any, str]:
        if isinstance(out_text, str):
            final_text = out_text.strip()

-    # ── Tool-call leak recovery ──────────────────────────────────
-    # gpt-5.x on the Codex Responses API sometimes degenerates and emits
-    # what should be a structured `function_call` item as plain assistant
-    # text using the Harmony/Codex serialization (``to=functions.foo
-    # {json}`` or ``assistant to=functions.foo {json}``). The model
-    # intended to call a tool, but the intent never made it into
-    # ``response.output`` as a ``function_call`` item, so ``tool_calls``
-    # is empty here. If we pass this through, the parent sees a
-    # confident-looking summary with no audit trail (empty ``tool_trace``)
-    # and no tools actually ran — the Taiwan-embassy-email incident.
-    #
-    # Detection: leaked tokens always contain ``to=functions.<name>`` and
-    # the assistant message has no real tool calls. Treat it as incomplete
-    # so the existing Codex-incomplete continuation path (3 retries,
-    # handled in run_agent.py) gets a chance to re-elicit a proper
-    # ``function_call`` item. The existing loop already handles message
-    # append, dedup, and retry budget.
-    leaked_tool_call_text = False
-    if final_text and not tool_calls and _TOOL_CALL_LEAK_PATTERN.search(final_text):
-        leaked_tool_call_text = True
-        logger.warning(
-            "Codex response contains leaked tool-call text in assistant content "
-            "(no structured function_call items). Treating as incomplete so the "
-            "continuation path can re-elicit a proper tool call. Leaked snippet: %r",
-            final_text[:300],
-        )
-        # Clear the text so downstream code doesn't surface the garbage as
-        # a summary. The encrypted reasoning items (if any) are preserved
-        # so the model keeps its chain-of-thought on the retry.
-        final_text = ""
-
    assistant_message = SimpleNamespace(
        content=final_text,
        tool_calls=tool_calls,
@@ -846,8 +798,6 @@ def _normalize_codex_response(response: Any) -> tuple[Any, str]:

    if tool_calls:
        finish_reason = "tool_calls"
-    elif leaked_tool_call_text:
-        finish_reason = "incomplete"
    elif has_incomplete_items or (saw_commentary_phase and not saw_final_answer_phase):
        finish_reason = "incomplete"
    elif reasoning_items_raw and not final_text:
@@ -294,7 +294,6 @@ class ContextCompressor(ContextEngine):
        self._context_probed = False
        self._context_probe_persistable = False
        self._previous_summary = None
-        self._last_summary_error = None
        self._last_compression_savings_pct = 100.0
        self._ineffective_compression_count = 0

@@ -318,13 +317,6 @@ class ContextCompressor(ContextEngine):
            int(context_length * self.threshold_percent),
            MINIMUM_CONTEXT_LENGTH,
        )
-        # Recalculate token budgets for the new context length so the
-        # compressor stays calibrated after a model switch (e.g. 200K → 32K).
-        target_tokens = int(self.threshold_tokens * self.summary_target_ratio)
-        self.tail_token_budget = target_tokens
-        self.max_summary_tokens = min(
-            int(context_length * 0.05), _SUMMARY_TOKENS_CEILING,
-        )

    def __init__(
        self,
@@ -397,7 +389,6 @@ class ContextCompressor(ContextEngine):
        self._last_compression_savings_pct: float = 100.0
        self._ineffective_compression_count: int = 0
        self._summary_failure_cooldown_until: float = 0.0
-        self._last_summary_error: Optional[str] = None

    def update_from_response(self, usage: Dict[str, Any]):
        """Update tracked token usage from API response."""
@@ -821,12 +812,10 @@ The user has requested that this compaction PRIORITISE preserving all informatio
            self._previous_summary = summary
            self._summary_failure_cooldown_until = 0.0
            self._summary_model_fallen_back = False
-            self._last_summary_error = None
            return self._with_summary_prefix(summary)
        except RuntimeError:
            # No provider configured — long cooldown, unlikely to self-resolve
            self._summary_failure_cooldown_until = time.monotonic() + _SUMMARY_FAILURE_COOLDOWN_SECONDS
-            self._last_summary_error = "no auxiliary LLM provider configured"
            logging.warning("Context compression: no provider available for "
                            "summary. Middle turns will be dropped without summary "
                            "for %d seconds.",
@@ -864,10 +853,6 @@ The user has requested that this compaction PRIORITISE preserving all informatio
            # Transient errors (timeout, rate limit, network) — shorter cooldown
            _transient_cooldown = 60
            self._summary_failure_cooldown_until = time.monotonic() + _transient_cooldown
-            err_text = str(e).strip() or e.__class__.__name__
-            if len(err_text) > 220:
-                err_text = err_text[:217].rstrip() + "..."
-            self._last_summary_error = err_text
            logging.warning(
                "Failed to generate context summary: %s. "
                "Further summary attempts paused for %d seconds.",
@@ -1114,21 +1099,6 @@ The user has requested that this compaction PRIORITISE preserving all informatio

        return max(cut_idx, head_end + 1)

-    # ------------------------------------------------------------------
-    # ContextEngine: manual /compress preflight
-    # ------------------------------------------------------------------
-
-    def has_content_to_compress(self, messages: List[Dict[str, Any]]) -> bool:
-        """Return True if there is a non-empty middle region to compact.
-
-        Overrides the ABC default so the gateway ``/compress`` guard can
-        skip the LLM call when the transcript is still entirely inside
-        the protected head/tail.
-        """
-        compress_start = self._align_boundary_forward(messages, self.protect_first_n)
-        compress_end = self._find_tail_cut_by_tokens(messages, compress_start)
-        return compress_start < compress_end
-
    # ------------------------------------------------------------------
    # Main compression entry point
    # ------------------------------------------------------------------
@@ -78,7 +78,6 @@ class ContextEngine(ABC):
        self,
        messages: List[Dict[str, Any]],
        current_tokens: int = None,
-        focus_topic: str = None,
    ) -> List[Dict[str, Any]]:
        """Compact the message list and return the new message list.

@@ -87,12 +86,6 @@ class ContextEngine(ABC):
        context budget. The implementation is free to summarize, build a
        DAG, or do anything else — as long as the returned list is a valid
        OpenAI-format message sequence.
-
-        Args:
-            focus_topic: Optional topic string from manual ``/compress <focus>``.
-                Engines that support guided compression should prioritise
-                preserving information related to this topic.  Engines that
-                don't support it may simply ignore this argument.
        """

    # -- Optional: pre-flight check ----------------------------------------
@@ -105,21 +98,6 @@ class ContextEngine(ABC):
        """
        return False

-    # -- Optional: manual /compress preflight ------------------------------
-
-    def has_content_to_compress(self, messages: List[Dict[str, Any]]) -> bool:
-        """Quick check: is there anything in ``messages`` that can be compacted?
-
-        Used by the gateway ``/compress`` command as a preflight guard —
-        returning False lets the gateway report "nothing to compress yet"
-        without making an LLM call.
-
-        Default returns True (always attempt).  Engines with a cheap way
-        to introspect their own head/tail boundaries should override this
-        to return False when the transcript is still entirely protected.
-        """
-        return True
-
    # -- Optional: session lifecycle ---------------------------------------

    def on_session_start(self, session_id: str, **kwargs) -> None:
@@ -46,47 +46,6 @@ def _resolve_args() -> list[str]:
    return shlex.split(raw)


-def _resolve_home_dir() -> str:
-    """Return a stable HOME for child ACP processes."""
-
-    try:
-        from hermes_constants import get_subprocess_home
-
-        profile_home = get_subprocess_home()
-        if profile_home:
-            return profile_home
-    except Exception:
-        pass
-
-    home = os.environ.get("HOME", "").strip()
-    if home:
-        return home
-
-    expanded = os.path.expanduser("~")
-    if expanded and expanded != "~":
-        return expanded
-
-    try:
-        import pwd
-
-        resolved = pwd.getpwuid(os.getuid()).pw_dir.strip()
-        if resolved:
-            return resolved
-    except Exception:
-        pass
-
-    # Last resort: /tmp (writable on any POSIX system). Avoids crashing the
-    # subprocess with no HOME; callers can set HERMES_HOME explicitly if they
-    # need a different writable dir.
-    return "/tmp"
-
-
-def _build_subprocess_env() -> dict[str, str]:
-    env = os.environ.copy()
-    env["HOME"] = _resolve_home_dir()
-    return env
-
-
 def _jsonrpc_error(message_id: Any, code: int, message: str) -> dict[str, Any]:
    return {
        "jsonrpc": "2.0",
@@ -423,7 +382,6 @@ class CopilotACPClient:
                text=True,
                bufsize=1,
                cwd=self._acp_cwd,
-                env=_build_subprocess_env(),
            )
        except FileNotFoundError as exc:
            raise RuntimeError(
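
Review note on the two hunks above: the removed `_resolve_home_dir` walked a fallback chain (profile home, `$HOME`, `~` expansion, the passwd entry, then `/tmp`) and `_build_subprocess_env` injected the result into the child environment. A reduced sketch of the same idea follows. It deliberately omits the `hermes_constants.get_subprocess_home` and `pwd` steps, and the names `resolve_home` and `build_env` plus the injectable `expanduser` parameter are assumptions made so the example can run anywhere.

```python
import os


def resolve_home(env, expanduser=os.path.expanduser):
    """Return a usable HOME: explicit value, then ~ expansion, then /tmp."""
    home = env.get("HOME", "").strip()
    if home:
        return home
    expanded = expanduser("~")
    if expanded and expanded != "~":
        return expanded
    # Last resort, mirroring the removed code's /tmp fallback.
    return "/tmp"


def build_env(base_env, expanduser=os.path.expanduser):
    """Copy base_env and guarantee that HOME is set in the copy."""
    env = dict(base_env)
    env["HOME"] = resolve_home(base_env, expanduser)
    return env
```

After this change the subprocess is started without an explicit `env=`, so it inherits the parent environment unchanged, including a missing or empty `HOME`.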
@@ -455,61 +455,6 @@ class CredentialPool:
            logger.debug("Failed to sync from credentials file: %s", exc)
        return entry

-    def _sync_nous_entry_from_auth_store(self, entry: PooledCredential) -> PooledCredential:
-        """Sync a Nous pool entry from auth.json if tokens differ.
-
-        Nous OAuth refresh tokens are single-use.  When another process
-        (e.g. a concurrent cron) refreshes the token via
-        ``resolve_nous_runtime_credentials``, it writes fresh tokens to
-        auth.json under ``_auth_store_lock``.  The pool entry's tokens
-        become stale.  This method detects that and adopts the newer pair,
-        avoiding a "refresh token reuse" revocation on the Nous Portal.
-        """
-        if self.provider != "nous" or entry.source != "device_code":
-            return entry
-        try:
-            with _auth_store_lock():
-                auth_store = _load_auth_store()
-                state = _load_provider_state(auth_store, "nous")
-            if not state:
-                return entry
-            store_refresh = state.get("refresh_token", "")
-            store_access = state.get("access_token", "")
-            if store_refresh and store_refresh != entry.refresh_token:
-                logger.debug(
-                    "Pool entry %s: syncing tokens from auth.json (Nous refresh token changed)",
-                    entry.id,
-                )
-                field_updates: Dict[str, Any] = {
-                    "access_token": store_access,
-                    "refresh_token": store_refresh,
-                    "last_status": None,
-                    "last_status_at": None,
-                    "last_error_code": None,
-                }
-                if state.get("expires_at"):
-                    field_updates["expires_at"] = state["expires_at"]
-                if state.get("agent_key"):
-                    field_updates["agent_key"] = state["agent_key"]
-                if state.get("agent_key_expires_at"):
-                    field_updates["agent_key_expires_at"] = state["agent_key_expires_at"]
-                if state.get("inference_base_url"):
-                    field_updates["inference_base_url"] = state["inference_base_url"]
-                extra_updates = dict(entry.extra)
-                for extra_key in ("obtained_at", "expires_in", "agent_key_id",
-                                  "agent_key_expires_in", "agent_key_reused",
-                                  "agent_key_obtained_at"):
-                    val = state.get(extra_key)
-                    if val is not None:
-                        extra_updates[extra_key] = val
-                updated = replace(entry, extra=extra_updates, **field_updates)
-                self._replace_entry(entry, updated)
-                self._persist()
-                return updated
-        except Exception as exc:
-            logger.debug("Failed to sync Nous entry from auth.json: %s", exc)
-        return entry
-
    def _sync_device_code_entry_to_auth_store(self, entry: PooledCredential) -> None:
        """Write refreshed pool entry tokens back to auth.json providers.

@@ -616,9 +561,6 @@ class CredentialPool:
                    last_refresh=refreshed.get("last_refresh"),
                )
            elif self.provider == "nous":
-                synced = self._sync_nous_entry_from_auth_store(entry)
-                if synced is not entry:
-                    entry = synced
                nous_state = {
                    "access_token": entry.access_token,
                    "refresh_token": entry.refresh_token,
@@ -693,26 +635,6 @@ class CredentialPool:
                    # Credentials file had a valid (non-expired) token — use it directly
                    logger.debug("Credentials file has valid token, using without refresh")
                    return synced
-            # For nous: another process may have consumed the refresh token
-            # between our proactive sync and the HTTP call.  Re-sync from
-            # auth.json and adopt the fresh tokens if available.
-            if self.provider == "nous":
-                synced = self._sync_nous_entry_from_auth_store(entry)
-                if synced.refresh_token != entry.refresh_token:
-                    logger.debug("Nous refresh failed but auth.json has newer tokens — adopting")
-                    updated = replace(
-                        synced,
-                        last_status=STATUS_OK,
-                        last_status_at=None,
-                        last_error_code=None,
-                        last_error_reason=None,
-                        last_error_message=None,
-                        last_error_reset_at=None,
-                    )
-                    self._replace_entry(synced, updated)
-                    self._persist()
-                    self._sync_device_code_entry_to_auth_store(updated)
-                    return updated
            self._mark_exhausted(entry, None)
            return None

@@ -776,17 +698,6 @@ class CredentialPool:
                if synced is not entry:
                    entry = synced
                    cleared_any = True
-            # For nous entries, sync from auth.json before status checks.
-            # Another process may have successfully refreshed via
-            # resolve_nous_runtime_credentials(), making this entry's
-            # exhausted status stale.
-            if (self.provider == "nous"
-                    and entry.source == "device_code"
-                    and entry.last_status == STATUS_EXHAUSTED):
-                synced = self._sync_nous_entry_from_auth_store(entry)
-                if synced is not entry:
-                    entry = synced
-                    cleared_any = True
            if entry.last_status == STATUS_EXHAUSTED:
                exhausted_until = _exhausted_until(entry)
                if exhausted_until is not None and now < exhausted_until:
@@ -828,11 +739,8 @@ class CredentialPool:

        if self._strategy == STRATEGY_LEAST_USED and len(available) > 1:
            entry = min(available, key=lambda e: e.request_count)
-            # Increment usage counter so subsequent selections distribute load
-            updated = replace(entry, request_count=entry.request_count + 1)
-            self._replace_entry(entry, updated)
            self._current_id = entry.id
-            return updated
+            return entry

        if self._strategy == STRATEGY_ROUND_ROBIN and len(available) > 1:
            entry = available[0]
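
Review note on the least-used hunk above: the removed lines incremented `request_count` at selection time. The toy model below, which uses a hypothetical `Entry` dataclass rather than the real `PooledCredential`, shows what selection looks like with and without that increment. It assumes the counter is not incremented anywhere else in the request path, which this excerpt does not show.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Entry:
    # Hypothetical stand-in for a pooled credential.
    id: str
    request_count: int = 0


def pick_least_used(entries, bump):
    """Pick the entry with the lowest request_count; optionally bump it."""
    idx = min(range(len(entries)), key=lambda i: entries[i].request_count)
    chosen = entries[idx]
    if bump:
        chosen = replace(chosen, request_count=chosen.request_count + 1)
        entries[idx] = chosen
    return chosen


def simulate(bump, rounds=4):
    """Return the ids selected over several rounds from a two-entry pool."""
    pool = [Entry("a"), Entry("b")]
    return [pick_least_used(pool, bump).id for _ in range(rounds)]
```

Under these assumptions `simulate(True)` alternates between the two entries, while `simulate(False)` keeps returning the first one because the counts never diverge.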
@@ -1148,18 +1056,6 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
                    "inference_base_url": state.get("inference_base_url"),
                    "agent_key": state.get("agent_key"),
                    "agent_key_expires_at": state.get("agent_key_expires_at"),
-                    # Carry the mint/refresh timestamps into the pool so
-                    # freshness-sensitive consumers (self-heal hooks, pool
-                    # pruning by age) can distinguish just-minted credentials
-                    # from stale ones.  Without these, fresh device_code
-                    # entries get obtained_at=None and look older than they
-                    # are (#15099).
-                    "obtained_at": state.get("obtained_at"),
-                    "expires_in": state.get("expires_in"),
-                    "agent_key_id": state.get("agent_key_id"),
-                    "agent_key_expires_in": state.get("agent_key_expires_in"),
-                    "agent_key_reused": state.get("agent_key_reused"),
-                    "agent_key_obtained_at": state.get("agent_key_obtained_at"),
                    "tls": state.get("tls") if isinstance(state.get("tls"), dict) else None,
                    "label": seeded_label,
                },
@@ -1170,10 +1066,9 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
        # env vars (COPILOT_GITHUB_TOKEN / GH_TOKEN).  They don't live in
        # the auth store or credential pool, so we resolve them here.
        try:
-            from hermes_cli.copilot_auth import resolve_copilot_token, get_copilot_api_token
+            from hermes_cli.copilot_auth import resolve_copilot_token
            token, source = resolve_copilot_token()
            if token:
-                api_token = get_copilot_api_token(token)
                source_name = "gh_cli" if "gh" in source.lower() else f"env:{source}"
                if not _is_suppressed(provider, source_name):
                    active_sources.add(source_name)
@@ -1185,7 +1080,7 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
                        {
                            "source": source_name,
                            "auth_type": AUTH_TYPE_API_KEY,
-                            "access_token": api_token,
+                            "access_token": token,
                            "base_url": pconfig.inference_base_url if pconfig else "",
                            "label": source,
                        },
@@ -45,7 +45,6 @@ class FailoverReason(enum.Enum):

    # Model
    model_not_found = "model_not_found"  # 404 or invalid model — fallback to different model
-    provider_policy_blocked = "provider_policy_blocked"  # Aggregator (e.g. OpenRouter) blocked the only endpoint due to account data/privacy policy

    # Request format
    format_error = "format_error"        # 400 bad request — abort or strip + retry
@@ -195,29 +194,6 @@ _MODEL_NOT_FOUND_PATTERNS = [
    "unsupported model",
 ]

-# OpenRouter aggregator policy-block patterns.
-#
-# When a user's OpenRouter account privacy setting (or a per-request
-# `provider.data_collection: deny` preference) excludes the only endpoint
-# serving a model, OpenRouter returns 404 with a *specific* message that is
-# distinct from "model not found":
-#
-#   "No endpoints available matching your guardrail restrictions and
-#    data policy. Configure: https://openrouter.ai/settings/privacy"
-#
-# We classify this as `provider_policy_blocked` rather than
-# `model_not_found` because:
-#   - The model *exists* — model_not_found is misleading in logs
-#   - Provider fallback won't help: the account-level setting applies to
-#     every call on the same OpenRouter account
-#   - The error body already contains the fix URL, so the user gets
-#     actionable guidance without us rewriting the message
-_PROVIDER_POLICY_BLOCKED_PATTERNS = [
-    "no endpoints available matching your guardrail",
-    "no endpoints available matching your data policy",
-    "no endpoints found matching your data policy",
-]
-
 # Auth patterns (non-status-code signals)
 _AUTH_PATTERNS = [
    "invalid api key",
@@ -343,11 +319,6 @@ def classify_api_error(
    """
    status_code = _extract_status_code(error)
    error_type = type(error).__name__
-    # Copilot/GitHub Models RateLimitError may not set .status_code; force 429
-    # so downstream rate-limit handling (classifier reason, pool rotation,
-    # fallback gating) fires correctly instead of misclassifying as generic.
-    if status_code is None and error_type == "RateLimitError":
-        status_code = 429
    body = _extract_error_body(error)
    error_code = _extract_error_code(body)

@@ -552,17 +523,6 @@ def _classify_by_status(
        return _classify_402(error_msg, result_fn)

    if status_code == 404:
-        # OpenRouter policy-block 404 — distinct from "model not found".
-        # The model exists; the user's account privacy setting excludes the
-        # only endpoint serving it. Falling back to another provider won't
-        # help (same account setting applies).  The error body already
-        # contains the fix URL, so just surface it.
-        if any(p in error_msg for p in _PROVIDER_POLICY_BLOCKED_PATTERNS):
-            return result_fn(
-                FailoverReason.provider_policy_blocked,
-                retryable=False,
-                should_fallback=False,
-            )
        if any(p in error_msg for p in _MODEL_NOT_FOUND_PATTERNS):
            return result_fn(
                FailoverReason.model_not_found,
@@ -680,12 +640,6 @@ def _classify_400(
        )

    # Some providers return model-not-found as 400 instead of 404 (e.g. OpenRouter).
-    if any(p in error_msg for p in _PROVIDER_POLICY_BLOCKED_PATTERNS):
-        return result_fn(
-            FailoverReason.provider_policy_blocked,
-            retryable=False,
-            should_fallback=False,
-        )
    if any(p in error_msg for p in _MODEL_NOT_FOUND_PATTERNS):
        return result_fn(
            FailoverReason.model_not_found,
@@ -858,15 +812,6 @@ def _classify_by_message(
            should_fallback=True,
        )

-    # Provider policy-block (aggregator-side guardrail) — check before
-    # model_not_found so we don't mis-label as a missing model.
-    if any(p in error_msg for p in _PROVIDER_POLICY_BLOCKED_PATTERNS):
-        return result_fn(
-            FailoverReason.provider_policy_blocked,
-            retryable=False,
-            should_fallback=False,
-        )
-
    # Model not found patterns
    if any(p in error_msg for p in _MODEL_NOT_FOUND_PATTERNS):
        return result_fn(
@@ -44,97 +44,6 @@ def is_native_gemini_base_url(base_url: str) -> bool:
    return not normalized.endswith("/openai")


-def probe_gemini_tier(
-    api_key: str,
-    base_url: str = DEFAULT_GEMINI_BASE_URL,
-    *,
-    model: str = "gemini-2.5-flash",
-    timeout: float = 10.0,
-) -> str:
-    """Probe a Google AI Studio API key and return its tier.
-
-    Returns one of:
-
-    - ``"free"``    -- key is on the free tier (unusable with Hermes)
-    - ``"paid"``    -- key is on a paid tier
-    - ``"unknown"`` -- probe failed; callers should proceed without blocking.
-    """
-    key = (api_key or "").strip()
-    if not key:
-        return "unknown"
-
-    normalized_base = str(base_url or DEFAULT_GEMINI_BASE_URL).strip().rstrip("/")
-    if not normalized_base:
-        normalized_base = DEFAULT_GEMINI_BASE_URL
-    if normalized_base.lower().endswith("/openai"):
-        normalized_base = normalized_base[: -len("/openai")]
-
-    url = f"{normalized_base}/models/{model}:generateContent"
-    payload = {
-        "contents": [{"role": "user", "parts": [{"text": "hi"}]}],
-        "generationConfig": {"maxOutputTokens": 1},
-    }
-
-    try:
-        with httpx.Client(timeout=timeout) as client:
-            resp = client.post(
-                url,
-                params={"key": key},
-                json=payload,
-                headers={"Content-Type": "application/json"},
-            )
-    except Exception as exc:
-        logger.debug("probe_gemini_tier: network error: %s", exc)
-        return "unknown"
-
-    headers_lower = {k.lower(): v for k, v in resp.headers.items()}
-    rpd_header = headers_lower.get("x-ratelimit-limit-requests-per-day")
-    if rpd_header:
-        try:
-            rpd_val = int(rpd_header)
-        except (TypeError, ValueError):
-            rpd_val = None
-        # Published free-tier daily caps (Dec 2025):
-        #   gemini-2.5-pro: 100, gemini-2.5-flash: 250, flash-lite: 1000
-        # Tier 1 starts at ~1500+ for Flash. We treat <= 1000 as free.
-        if rpd_val is not None and rpd_val <= 1000:
-            return "free"
-        if rpd_val is not None and rpd_val > 1000:
-            return "paid"
-
-    if resp.status_code == 429:
-        body_text = ""
-        try:
-            body_text = resp.text or ""
-        except Exception:
-            body_text = ""
-        if "free_tier" in body_text.lower():
-            return "free"
-        return "paid"
-
-    if 200 <= resp.status_code < 300:
-        return "paid"
-
-    return "unknown"
-
-
-def is_free_tier_quota_error(error_message: str) -> bool:
-    """Return True when a Gemini 429 message indicates free-tier exhaustion."""
-    if not error_message:
-        return False
-    return "free_tier" in error_message.lower()
-
-
-_FREE_TIER_GUIDANCE = (
-    "\n\nYour Google API key is on the free tier (<= 250 requests/day for "
-    "gemini-2.5-flash). Hermes typically makes 3-10 API calls per user turn, "
-    "so the free tier is exhausted in a handful of messages and cannot sustain "
-    "an agent session. Enable billing on your Google Cloud project and "
-    "regenerate the key in a billing-enabled project: "
-    "https://aistudio.google.com/apikey"
-)
-
-
 class GeminiAPIError(Exception):
    """Error shape compatible with Hermes retry/error classification."""

@@ -741,12 +650,6 @@ def gemini_http_error(response: httpx.Response) -> GeminiAPIError:
    else:
        message = f"Gemini returned HTTP {status}: {body_text[:500]}"

-    # Free-tier quota exhaustion -> append actionable guidance so users who
-    # bypassed the setup wizard (direct GOOGLE_API_KEY in .env) still learn
-    # that the free tier cannot sustain an agent session.
-    if status == 429 and is_free_tier_quota_error(err_message or body_text):
-        message = message + _FREE_TIER_GUIDANCE
-
    return GeminiAPIError(
        message,
        code=code,
@@ -801,13 +704,6 @@ class GeminiNativeClient:
        http_client: Optional[httpx.Client] = None,
        **_: Any,
    ) -> None:
-        if not (api_key or "").strip():
-            raise RuntimeError(
-                "Gemini native client requires an API key, but none was provided. "
-                "Set GOOGLE_API_KEY or GEMINI_API_KEY in your environment / ~/.hermes/.env "
-                "(get one at https://aistudio.google.com/app/apikey), or run `hermes setup` "
-                "to configure the Google provider."
-            )
        self.api_key = api_key
        normalized_base = (base_url or DEFAULT_GEMINI_BASE_URL).rstrip("/")
        if normalized_base.endswith("/openai"):
@@ -73,20 +73,6 @@ def sanitize_gemini_schema(schema: Any) -> Dict[str, Any]:
            ]
            continue
        cleaned[key] = value
-
-    # Gemini's Schema validator requires every ``enum`` entry to be a string,
-    # even when the parent ``type`` is ``integer`` / ``number`` / ``boolean``.
-    # OpenAI / OpenRouter / Anthropic accept typed enums (e.g. Discord's
-    # ``auto_archive_duration: {type: integer, enum: [60, 1440, 4320, 10080]}``),
-    # so we only drop the ``enum`` when it would collide with Gemini's rule.
-    # Keeping ``type: integer`` plus the human-readable description gives the
-    # model enough guidance; the tool handler still validates the value.
-    enum_val = cleaned.get("enum")
-    type_val = cleaned.get("type")
-    if isinstance(enum_val, list) and type_val in {"integer", "number", "boolean"}:
-        if any(not isinstance(item, str) for item in enum_val):
-            cleaned.pop("enum", None)
-
    return cleaned


@@ -31,7 +31,6 @@ from __future__ import annotations
 import json
 import logging
 import re
-import inspect
 from typing import Any, Dict, List, Optional

 from agent.memory_provider import MemoryProvider
@@ -313,39 +312,7 @@ class MemoryManager:
                )
        return "\n\n".join(parts)

-    @staticmethod
-    def _provider_memory_write_metadata_mode(provider: MemoryProvider) -> str:
-        """Return how to pass metadata to a provider's memory-write hook."""
-        try:
-            signature = inspect.signature(provider.on_memory_write)
-        except (TypeError, ValueError):
-            return "keyword"
-
-        params = list(signature.parameters.values())
-        if any(p.kind == inspect.Parameter.VAR_KEYWORD for p in params):
-            return "keyword"
-        if "metadata" in signature.parameters:
-            return "keyword"
-
-        accepted = [
-            p for p in params
-            if p.kind in (
-                inspect.Parameter.POSITIONAL_ONLY,
-                inspect.Parameter.POSITIONAL_OR_KEYWORD,
-                inspect.Parameter.KEYWORD_ONLY,
-            )
-        ]
-        if len(accepted) >= 4:
-            return "positional"
-        return "legacy"
-
-    def on_memory_write(
-        self,
-        action: str,
-        target: str,
-        content: str,
-        metadata: Optional[Dict[str, Any]] = None,
-    ) -> None:
+    def on_memory_write(self, action: str, target: str, content: str) -> None:
        """Notify external providers when the built-in memory tool writes.

        Skips the builtin provider itself (it's the source of the write).
@@ -354,15 +321,7 @@ class MemoryManager:
            if provider.name == "builtin":
                continue
            try:
-                metadata_mode = self._provider_memory_write_metadata_mode(provider)
-                if metadata_mode == "keyword":
-                    provider.on_memory_write(
-                        action, target, content, metadata=dict(metadata or {})
-                    )
-                elif metadata_mode == "positional":
-                    provider.on_memory_write(action, target, content, dict(metadata or {}))
-                else:
-                    provider.on_memory_write(action, target, content)
+                provider.on_memory_write(action, target, content)
            except Exception as e:
                logger.debug(
                    "Memory provider '%s' on_memory_write failed: %s",
@@ -26,7 +26,7 @@ Optional hooks (override to opt in):
  on_turn_start(turn, message, **kwargs) — per-turn tick with runtime context
  on_session_end(messages)               — end-of-session extraction
  on_pre_compress(messages) -> str       — extract before context compression
-  on_memory_write(action, target, content, metadata=None) — mirror built-in memory writes
+  on_memory_write(action, target, content) — mirror built-in memory writes
  on_delegation(task, result, **kwargs)  — parent-side observation of subagent work
 """

@@ -34,7 +34,7 @@ from __future__ import annotations

 import logging
 from abc import ABC, abstractmethod
-from typing import Any, Dict, List, Optional
+from typing import Any, Dict, List

 logger = logging.getLogger(__name__)

@@ -220,21 +220,12 @@ class MemoryProvider(ABC):
          should all have ``env_var`` set and this method stays no-op).
        """

-    def on_memory_write(
-        self,
-        action: str,
-        target: str,
-        content: str,
-        metadata: Optional[Dict[str, Any]] = None,
-    ) -> None:
+    def on_memory_write(self, action: str, target: str, content: str) -> None:
        """Called when the built-in memory tool writes an entry.

        action: 'add', 'replace', or 'remove'
        target: 'memory' or 'user'
        content: the entry content
-        metadata: structured provenance for the write, when available. Common
-          keys include ``write_origin``, ``execution_context``, ``session_id``,
-          ``parent_session_id``, ``platform``, and ``tool_name``.

        Use to mirror built-in memory writes to your backend.
        """
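Review note: with `metadata` dropped from the hook, a provider override only needs the three positional arguments, and the manager no longer inspects signatures before dispatching. The sketch below is illustrative only. `MemoryProviderStub`, `MirrorProvider`, and `notify_providers` are hypothetical stand-ins, not names from this repository; the real `MemoryProvider` base class has abstract methods that are omitted here.

```python
from typing import List, Tuple


class MemoryProviderStub:
    """Stand-in for the real MemoryProvider base class (assumption)."""

    name = "stub"

    def on_memory_write(self, action: str, target: str, content: str) -> None:
        """Default hook is a no-op with the three-argument signature."""


class MirrorProvider(MemoryProviderStub):
    """Hypothetical provider that records built-in memory writes."""

    name = "mirror"

    def __init__(self) -> None:
        self.writes: List[Tuple[str, str, str]] = []

    def on_memory_write(self, action: str, target: str, content: str) -> None:
        # action: 'add', 'replace', or 'remove'; target: 'memory' or 'user'
        self.writes.append((action, target, content))


def notify_providers(providers, action: str, target: str, content: str) -> None:
    """Simplified dispatch loop: skip the builtin provider, isolate failures."""
    for provider in providers:
        if provider.name == "builtin":
            continue
        try:
            provider.on_memory_write(action, target, content)
        except Exception:
            pass  # the manager in this diff logs at debug level instead


mirror = MirrorProvider()
notify_providers([mirror], "add", "user", "prefers dark mode")
```

Providers that still declare a fourth `metadata` parameter with a default value would keep working under this call shape, since the manager passes exactly three arguments.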
@@ -6,7 +6,6 @@ and run_agent.py for pre-flight context checks.

 import ipaddress
 import logging
-import os
 import re
 import time
 from pathlib import Path
@@ -22,25 +21,6 @@ from hermes_constants import OPENROUTER_MODELS_URL

 logger = logging.getLogger(__name__)

-
-def _resolve_requests_verify() -> bool | str:
-    """Resolve SSL verify setting for `requests` calls from env vars.
-
-    The `requests` library only honours REQUESTS_CA_BUNDLE / CURL_CA_BUNDLE
-    by default. Hermes also honours HERMES_CA_BUNDLE (its own convention)
-    and SSL_CERT_FILE (used by the stdlib `ssl` module and by httpx), so
-    that a single env var can cover both `requests` and `httpx` callsites
-    inside the same process.
-
-    Returns either a filesystem path to a CA bundle, or True to defer to
-    the requests default (certifi).
-    """
-    for env_var in ("HERMES_CA_BUNDLE", "REQUESTS_CA_BUNDLE", "SSL_CERT_FILE"):
-        val = os.getenv(env_var)
-        if val and os.path.isfile(val):
-            return val
-    return True
-
 # Provider names that can appear as a "provider:" prefix before a model ID.
 # Only these are stripped — Ollama-style "model:tag" colons (e.g. "qwen3.5:27b")
 # are preserved so the full model name reaches cache lookups and server queries.
@@ -143,9 +123,8 @@ DEFAULT_CONTEXT_LENGTHS = {
    "claude": 200000,
    # OpenAI — GPT-5 family (most have 400k; specific overrides first)
    # Source: https://developers.openai.com/api/docs/models
-    # GPT-5.5 (launched Apr 23 2026). 400k is the fallback for providers we
-    # can't probe live. ChatGPT Codex OAuth actually caps lower (272k as of
-    # Apr 2026) and is resolved via _resolve_codex_oauth_context_length().
+    # GPT-5.5 (launched Apr 23 2026). Verified via live ChatGPT codex/models
+    # endpoint: bare slug `gpt-5.5`, no -pro/-mini variants. 400k context on Codex.
    "gpt-5.5": 400000,
    "gpt-5.4-nano": 400000,           # 400k (not 1.05M like full 5.4)
    "gpt-5.4-mini": 400000,           # 400k (not 1.05M like full 5.4)
@@ -515,7 +494,7 @@ def fetch_model_metadata(force_refresh: bool = False) -> Dict[str, Dict[str, Any
        return _model_metadata_cache

    try:
-        response = requests.get(OPENROUTER_MODELS_URL, timeout=10, verify=_resolve_requests_verify())
+        response = requests.get(OPENROUTER_MODELS_URL, timeout=10)
        response.raise_for_status()
        data = response.json()

@@ -582,7 +561,6 @@ def fetch_endpoint_model_metadata(
                    server_url.rstrip("/") + "/api/v1/models",
                    headers=headers,
                    timeout=10,
-                    verify=_resolve_requests_verify(),
                )
                response.raise_for_status()
                payload = response.json()
@@ -631,7 +609,7 @@ def fetch_endpoint_model_metadata(
    for candidate in candidates:
        url = candidate.rstrip("/") + "/models"
        try:
-            response = requests.get(url, headers=headers, timeout=10, verify=_resolve_requests_verify())
+            response = requests.get(url, headers=headers, timeout=10)
            response.raise_for_status()
            payload = response.json()
            cache: Dict[str, Dict[str, Any]] = {}
@@ -662,10 +640,9 @@ def fetch_endpoint_model_metadata(
                try:
                    # Try /v1/props first (current llama.cpp); fall back to /props for older builds
                    base = candidate.rstrip("/").replace("/v1", "")
-                    _verify = _resolve_requests_verify()
-                    props_resp = requests.get(base + "/v1/props", headers=headers, timeout=5, verify=_verify)
+                    props_resp = requests.get(base + "/v1/props", headers=headers, timeout=5)
                    if not props_resp.ok:
-                        props_resp = requests.get(base + "/props", headers=headers, timeout=5, verify=_verify)
+                        props_resp = requests.get(base + "/props", headers=headers, timeout=5)
                    if props_resp.ok:
                        props = props_resp.json()
                        gen_settings = props.get("default_generation_settings", {})
@@ -737,22 +714,6 @@ def get_cached_context_length(model: str, base_url: str) -> Optional[int]:
    return cache.get(key)


-def _invalidate_cached_context_length(model: str, base_url: str) -> None:
-    """Drop a stale cache entry so it gets re-resolved on the next lookup."""
-    key = f"{model}@{base_url}"
-    cache = _load_context_cache()
-    if key not in cache:
-        return
-    del cache[key]
-    path = _get_context_cache_path()
-    try:
-        path.parent.mkdir(parents=True, exist_ok=True)
-        with open(path, "w") as f:
-            yaml.dump({"context_lengths": cache}, f, default_flow_style=False)
-    except Exception as e:
-        logger.debug("Failed to invalidate context length cache entry %s: %s", key, e)
-
-
 def get_next_probe_tier(current_length: int) -> Optional[int]:
    """Return the next lower probe tier, or None if already at minimum."""
    for tier in CONTEXT_PROBE_TIERS:
@@ -1030,7 +991,7 @@ def _query_anthropic_context_length(model: str, base_url: str, api_key: str) ->
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
        }
-        resp = requests.get(url, headers=headers, timeout=10, verify=_resolve_requests_verify())
+        resp = requests.get(url, headers=headers, timeout=10)
        if resp.status_code != 200:
            return None
        data = resp.json()
@@ -1044,116 +1005,6 @@ def _query_anthropic_context_length(model: str, base_url: str, api_key: str) ->
    return None


-# Known ChatGPT Codex OAuth context windows (observed via live
-# chatgpt.com/backend-api/codex/models probe, Apr 2026). These are the
-# `context_window` values, which are what Codex actually enforces — the
-# direct OpenAI API has larger limits for the same slugs, but Codex OAuth
-# caps lower (e.g. gpt-5.5 is 1.05M on the API, 272K on Codex).
-#
-# Used as a fallback when the live probe fails (no token, network error).
-# Longest keys first so substring match picks the most specific entry.
-_CODEX_OAUTH_CONTEXT_FALLBACK: Dict[str, int] = {
-    "gpt-5.1-codex-max": 272_000,
-    "gpt-5.1-codex-mini": 272_000,
-    "gpt-5.3-codex": 272_000,
-    "gpt-5.2-codex": 272_000,
-    "gpt-5.4-mini": 272_000,
-    "gpt-5.5": 272_000,
-    "gpt-5.4": 272_000,
-    "gpt-5.2": 272_000,
-    "gpt-5": 272_000,
-}
-
-
-_codex_oauth_context_cache: Dict[str, int] = {}
-_codex_oauth_context_cache_time: float = 0.0
-_CODEX_OAUTH_CONTEXT_CACHE_TTL = 3600  # 1 hour
-
-
-def _fetch_codex_oauth_context_lengths(access_token: str) -> Dict[str, int]:
-    """Probe the ChatGPT Codex /models endpoint for per-slug context windows.
-
-    Codex OAuth imposes its own context limits that differ from the direct
-    OpenAI API (e.g. gpt-5.5 is 1.05M on the API, 272K on Codex). The
-    `context_window` field in each model entry is the authoritative source.
-
-    Returns a ``{slug: context_window}`` dict. Empty on failure.
-    """
-    global _codex_oauth_context_cache, _codex_oauth_context_cache_time
-    now = time.time()
-    if (
-        _codex_oauth_context_cache
-        and now - _codex_oauth_context_cache_time < _CODEX_OAUTH_CONTEXT_CACHE_TTL
-    ):
-        return _codex_oauth_context_cache
-
-    try:
-        resp = requests.get(
-            "https://chatgpt.com/backend-api/codex/models?client_version=1.0.0",
-            headers={"Authorization": f"Bearer {access_token}"},
-            timeout=10,
-            verify=_resolve_requests_verify(),
-        )
-        if resp.status_code != 200:
-            logger.debug(
-                "Codex /models probe returned HTTP %s; falling back to hardcoded defaults",
-                resp.status_code,
-            )
-            return {}
-        data = resp.json()
-    except Exception as exc:
-        logger.debug("Codex /models probe failed: %s", exc)
-        return {}
-
-    entries = data.get("models", []) if isinstance(data, dict) else []
-    result: Dict[str, int] = {}
-    for item in entries:
-        if not isinstance(item, dict):
-            continue
-        slug = item.get("slug")
-        ctx = item.get("context_window")
-        if isinstance(slug, str) and isinstance(ctx, int) and ctx > 0:
-            result[slug.strip()] = ctx
-
-    if result:
-        _codex_oauth_context_cache = result
-        _codex_oauth_context_cache_time = now
-    return result
-
-
-def _resolve_codex_oauth_context_length(
-    model: str, access_token: str = ""
-) -> Optional[int]:
-    """Resolve a Codex OAuth model's real context window.
-
-    Prefers a live probe of chatgpt.com/backend-api/codex/models (when we
-    have a bearer token), then falls back to ``_CODEX_OAUTH_CONTEXT_FALLBACK``.
-    """
-    model_bare = _strip_provider_prefix(model).strip()
-    if not model_bare:
-        return None
-
-    if access_token:
-        live = _fetch_codex_oauth_context_lengths(access_token)
-        if model_bare in live:
-            return live[model_bare]
-        # Case-insensitive match in case casing drifts
-        model_lower = model_bare.lower()
-        for slug, ctx in live.items():
-            if slug.lower() == model_lower:
-                return ctx
-
-    # Fallback: longest-key-first substring match over hardcoded defaults.
-    model_lower = model_bare.lower()
-    for slug, ctx in sorted(
-        _CODEX_OAUTH_CONTEXT_FALLBACK.items(), key=lambda x: len(x[0]), reverse=True
-    ):
-        if slug in model_lower:
-            return ctx
-
-    return None
-
-
 def _resolve_nous_context_length(model: str) -> Optional[int]:
    """Resolve Nous Portal model context length via OpenRouter metadata.

@@ -1199,7 +1050,6 @@ def get_model_context_length(
    Resolution order:
    0. Explicit config override (model.context_length or custom_providers per-model)
    1. Persistent cache (previously discovered via probing)
-    1b. AWS Bedrock static table (must precede custom-endpoint probe)
    2. Active endpoint metadata (/models for explicit custom endpoints)
    3. Local server query (for local endpoints)
    4. Anthropic /v1/models API (API-key users only, not OAuth)
@@ -1222,41 +1072,7 @@ def get_model_context_length(
    if base_url:
        cached = get_cached_context_length(model, base_url)
        if cached is not None:
-            # Invalidate stale Codex OAuth cache entries: pre-PR #14935 builds
-            # resolved gpt-5.x to the direct-API value (e.g. 1.05M) via
-            # models.dev and persisted it. Codex OAuth caps at 272K for every
-            # slug, so any cached Codex entry at or above 400K is a leftover
-            # from the old resolution path. Drop it and fall through to the
-            # live /models probe in step 5 below.
-            if provider == "openai-codex" and cached >= 400_000:
-                logger.info(
-                    "Dropping stale Codex cache entry %s@%s -> %s (pre-fix value); "
-                    "re-resolving via live /models probe",
-                    model, base_url, f"{cached:,}",
-                )
-                _invalidate_cached_context_length(model, base_url)
-            else:
-                return cached
-
-    # 1b. AWS Bedrock — use static context length table.
-    # Bedrock's ListFoundationModels API doesn't expose context window sizes,
-    # so we maintain a curated table in bedrock_adapter.py that reflects
-    # AWS-imposed limits (e.g. 200K for Claude models vs 1M on the native
-    # Anthropic API).  This must run BEFORE the custom-endpoint probe at
-    # step 2 — bedrock-runtime.<region>.amazonaws.com is not in
-    # _URL_TO_PROVIDER, so it would otherwise be treated as a custom endpoint,
-    # fail the /models probe (Bedrock doesn't expose that shape), and fall
-    # back to the 128K default before reaching the original step 4b branch.
-    if provider == "bedrock" or (
-        base_url
-        and base_url_hostname(base_url).startswith("bedrock-runtime.")
-        and base_url_host_matches(base_url, "amazonaws.com")
-    ):
-        try:
-            from agent.bedrock_adapter import get_bedrock_context_length
-            return get_bedrock_context_length(model)
-        except ImportError:
-            pass  # boto3 not installed — fall through to generic resolution
+            return cached

    # 2. Active endpoint metadata for truly custom/unknown endpoints.
    # Known providers (Copilot, OpenAI, Anthropic, etc.) skip this — their
@@ -1303,7 +1119,19 @@ def get_model_context_length(
        if ctx:
            return ctx

-    # 4b. (Bedrock handled earlier at step 1b — before custom-endpoint probe.)
+    # 4b. AWS Bedrock — use static context length table.
+    # Bedrock's ListFoundationModels doesn't expose context window sizes,
+    # so we maintain a curated table in bedrock_adapter.py.
+    if provider == "bedrock" or (
+        base_url
+        and base_url_hostname(base_url).startswith("bedrock-runtime.")
+        and base_url_host_matches(base_url, "amazonaws.com")
+    ):
+        try:
+            from agent.bedrock_adapter import get_bedrock_context_length
+            return get_bedrock_context_length(model)
+        except ImportError:
+            pass  # boto3 not installed — fall through to generic resolution

    # 5. Provider-aware lookups (before generic OpenRouter cache)
    # These are provider-specific and take priority over the generic OR cache,
@@ -1317,32 +1145,10 @@ def get_model_context_length(
            if inferred:
                effective_provider = inferred

-    # 5a. Copilot live /models API — max_prompt_tokens from the user's account.
-    # This catches account-specific models (e.g. claude-opus-4.6-1m) that
-    # don't exist in models.dev. For models that ARE in models.dev, this
-    # returns the provider-enforced limit which is what users can actually use.
-    if effective_provider in ("copilot", "copilot-acp", "github-copilot"):
-        try:
-            from hermes_cli.models import get_copilot_model_context
-            ctx = get_copilot_model_context(model, api_key=api_key)
-            if ctx:
-                return ctx
-        except Exception:
-            pass  # Fall through to models.dev
-
    if effective_provider == "nous":
        ctx = _resolve_nous_context_length(model)
        if ctx:
            return ctx
-    if effective_provider == "openai-codex":
-        # Codex OAuth enforces lower context limits than the direct OpenAI
-        # API for the same slug (e.g. gpt-5.5 is 1.05M on the API but 272K
-        # on Codex). Authoritative source is Codex's own /models endpoint.
-        codex_ctx = _resolve_codex_oauth_context_length(model, access_token=api_key or "")
-        if codex_ctx:
-            if base_url:
-                save_context_length(model, base_url, codex_ctx)
-            return codex_ctx
    if effective_provider:
        from agent.models_dev import lookup_models_dev_context
        ctx = lookup_models_dev_context(effective_provider, model)
@@ -1,29 +1,154 @@
-"""Shared slash command helpers for skills.
+"""Shared slash command helpers for skills and built-in prompt-style modes.

 Shared between CLI (cli.py) and gateway (gateway/run.py) so both surfaces
-can invoke skills via /skill-name commands.
+can invoke skills via /skill-name commands and prompt-only built-ins like
+/plan.
 """

 import json
 import logging
 import re
+import subprocess
+from datetime import datetime
 from pathlib import Path
 from typing import Any, Dict, Optional

 from hermes_constants import display_hermes_home
-from agent.skill_preprocessing import (
-    expand_inline_shell as _expand_inline_shell,
-    load_skills_config as _load_skills_config,
-    substitute_template_vars as _substitute_template_vars,
-)

 logger = logging.getLogger(__name__)

 _skill_commands: Dict[str, Dict[str, Any]] = {}
+_PLAN_SLUG_RE = re.compile(r"[^a-z0-9]+")
 # Patterns for sanitizing skill names into clean hyphen-separated slugs.
 _SKILL_INVALID_CHARS = re.compile(r"[^a-z0-9-]")
 _SKILL_MULTI_HYPHEN = re.compile(r"-{2,}")

+# Matches ${HERMES_SKILL_DIR} / ${HERMES_SESSION_ID} tokens in SKILL.md.
+# Tokens that don't resolve (e.g. ${HERMES_SESSION_ID} with no session) are
+# left as-is so the user can debug them.
+_SKILL_TEMPLATE_RE = re.compile(r"\$\{(HERMES_SKILL_DIR|HERMES_SESSION_ID)\}")
+
+# Matches inline shell snippets like:  !`date +%Y-%m-%d`
+# Non-greedy, single-line only — no newlines inside the backticks.
+_INLINE_SHELL_RE = re.compile(r"!`([^`\n]+)`")
+
+# Cap inline-shell output so a runaway command can't blow out the context.
+_INLINE_SHELL_MAX_OUTPUT = 4000
+
+
+def _load_skills_config() -> dict:
+    """Load the ``skills`` section of config.yaml (best-effort)."""
+    try:
+        from hermes_cli.config import load_config
+
+        cfg = load_config() or {}
+        skills_cfg = cfg.get("skills")
+        if isinstance(skills_cfg, dict):
+            return skills_cfg
+    except Exception:
+        logger.debug("Could not read skills config", exc_info=True)
+    return {}
+
+
+def _substitute_template_vars(
+    content: str,
+    skill_dir: Path | None,
+    session_id: str | None,
+) -> str:
+    """Replace ${HERMES_SKILL_DIR} / ${HERMES_SESSION_ID} in skill content.
+
+    Only substitutes tokens for which a concrete value is available —
+    unresolved tokens are left in place so the author can spot them.
+    """
+    if not content:
+        return content
+
+    skill_dir_str = str(skill_dir) if skill_dir else None
+
+    def _replace(match: re.Match) -> str:
+        token = match.group(1)
+        if token == "HERMES_SKILL_DIR" and skill_dir_str:
+            return skill_dir_str
+        if token == "HERMES_SESSION_ID" and session_id:
+            return str(session_id)
+        return match.group(0)
+
+    return _SKILL_TEMPLATE_RE.sub(_replace, content)
+
+
+def _run_inline_shell(command: str, cwd: Path | None, timeout: int) -> str:
+    """Execute a single inline-shell snippet and return its stdout (trimmed).
+
+    Failures return a short ``[inline-shell error: ...]`` marker instead of
+    raising, so one bad snippet can't wreck the whole skill message.
+    """
+    try:
+        completed = subprocess.run(
+            ["bash", "-c", command],
+            cwd=str(cwd) if cwd else None,
+            capture_output=True,
+            text=True,
+            timeout=max(1, int(timeout)),
+            check=False,
+        )
+    except subprocess.TimeoutExpired:
+        return f"[inline-shell timeout after {timeout}s: {command}]"
+    except FileNotFoundError:
+        return "[inline-shell error: bash not found]"
+    except Exception as exc:
+        return f"[inline-shell error: {exc}]"
+
+    output = (completed.stdout or "").rstrip("\n")
+    if not output and completed.stderr:
+        output = completed.stderr.rstrip("\n")
+    if len(output) > _INLINE_SHELL_MAX_OUTPUT:
+        output = output[:_INLINE_SHELL_MAX_OUTPUT] + "…[truncated]"
+    return output
+
+
+def _expand_inline_shell(
+    content: str,
+    skill_dir: Path | None,
+    timeout: int,
+) -> str:
+    """Replace every !`cmd` snippet in ``content`` with its stdout.
+
+    Runs each snippet with the skill directory as CWD so relative paths in
+    the snippet work the way the author expects.
+    """
+    if "!`" not in content:
+        return content
+
+    def _replace(match: re.Match) -> str:
+        cmd = match.group(1).strip()
+        if not cmd:
+            return ""
+        return _run_inline_shell(cmd, skill_dir, timeout)
+
+    return _INLINE_SHELL_RE.sub(_replace, content)
+
+
+def build_plan_path(
+    user_instruction: str = "",
+    *,
+    now: datetime | None = None,
+) -> Path:
+    """Return the default workspace-relative markdown path for a /plan invocation.
+
+    Relative paths are intentional: file tools are task/backend-aware and resolve
+    them against the active working directory for local, docker, ssh, modal,
+    daytona, and similar terminal backends. That keeps the plan with the active
+    workspace instead of the Hermes host's global home directory.
+    """
+    slug_source = next(iter((user_instruction or "").strip().splitlines()), "")
+    slug = _PLAN_SLUG_RE.sub("-", slug_source.lower()).strip("-")
+    if slug:
+        slug = "-".join(part for part in slug.split("-")[:8] if part)[:48].strip("-")
+    slug = slug or "conversation-plan"
+    timestamp = (now or datetime.now()).strftime("%Y-%m-%d_%H%M%S")
+    return Path(".hermes") / "plans" / f"{timestamp}-{slug}.md"
+
+
 def _load_skill_payload(skill_identifier: str, task_id: str | None = None) -> tuple[dict[str, Any], Path | None, str] | None:
    """Load a skill by name/path and return (loaded_payload, skill_dir, display_name)."""
    raw_identifier = (skill_identifier or "").strip()
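Review note: the slug rules in `build_plan_path` are easier to check with concrete input. The sketch below re-implements the same steps under a hypothetical name, `plan_path_sketch`, so it runs on its own. It is not the repository function, and it guards whitespace-only input explicitly by checking for an empty line list before indexing.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Optional

_PLAN_SLUG_RE = re.compile(r"[^a-z0-9]+")


def plan_path_sketch(user_instruction: str = "", *, now: Optional[datetime] = None) -> Path:
    """Illustrative copy of the slug logic; the function name is hypothetical."""
    lines = (user_instruction or "").strip().splitlines()
    first_line = lines[0] if lines else ""
    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = _PLAN_SLUG_RE.sub("-", first_line.lower()).strip("-")
    if slug:
        # Keep at most 8 words and 48 characters.
        slug = "-".join(part for part in slug.split("-")[:8] if part)[:48].strip("-")
    slug = slug or "conversation-plan"
    timestamp = (now or datetime.now()).strftime("%Y-%m-%d_%H%M%S")
    return Path(".hermes") / "plans" / f"{timestamp}-{slug}.md"


fixed = datetime(2026, 4, 24, 0, 7, 23)
named = plan_path_sketch("Refactor the auth module!\nThen add tests", now=fixed)
unnamed = plan_path_sketch("   ", now=fixed)
```

Only the first line of the instruction contributes to the slug, and the result stays relative so the active terminal backend can resolve it against its own working directory.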
@@ -42,9 +167,7 @@ def _load_skill_payload(skill_identifier: str, task_id: str | None = None) -> tu
        else:
            normalized = raw_identifier.lstrip("/")

-        loaded_skill = json.loads(
-            skill_view(normalized, task_id=task_id, preprocess=False)
-        )
+        loaded_skill = json.loads(skill_view(normalized, task_id=task_id))
    except Exception:
        return None

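A minimal standalone sketch of how `build_plan_path` derives a slugged, timestamped path. The real `_PLAN_SLUG_RE` is defined outside this hunk, so the pattern below is an assumption chosen for illustration. The sketch takes the first line with `next(iter(...), "")` so that whitespace-only input falls back to the default slug.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Optional

# Assumption: stand-in for the real _PLAN_SLUG_RE, which is not shown in this diff.
_PLAN_SLUG_RE = re.compile(r"[^a-z0-9]+")


def build_plan_path(user_instruction: str = "", *, now: Optional[datetime] = None) -> Path:
    """Return a workspace-relative plan path such as .hermes/plans/<timestamp>-<slug>.md."""
    # Only the first line of the instruction feeds the slug.
    first_line = next(iter((user_instruction or "").strip().splitlines()), "")
    slug = _PLAN_SLUG_RE.sub("-", first_line.lower()).strip("-")
    if slug:
        # Keep at most 8 words and 48 characters.
        slug = "-".join(part for part in slug.split("-")[:8] if part)[:48].strip("-")
    slug = slug or "conversation-plan"
    timestamp = (now or datetime.now()).strftime("%Y-%m-%d_%H%M%S")
    return Path(".hermes") / "plans" / f"{timestamp}-{slug}.md"


fixed = datetime(2026, 4, 24, 0, 7, 23)
print(build_plan_path("Refactor the auth flow!\nmore detail", now=fixed).as_posix())
print(build_plan_path("   ", now=fixed).as_posix())
```

Passing `now` explicitly keeps the result deterministic, which is convenient in tests.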
@@ -1,131 +0,0 @@
-"""Shared SKILL.md preprocessing helpers."""
-
-import logging
-import re
-import subprocess
-from pathlib import Path
-
-logger = logging.getLogger(__name__)
-
-# Matches ${HERMES_SKILL_DIR} / ${HERMES_SESSION_ID} tokens in SKILL.md.
-# Tokens that don't resolve (e.g. ${HERMES_SESSION_ID} with no session) are
-# left as-is so the user can debug them.
-_SKILL_TEMPLATE_RE = re.compile(r"\$\{(HERMES_SKILL_DIR|HERMES_SESSION_ID)\}")
-
-# Matches inline shell snippets like:  !`date +%Y-%m-%d`
-# Non-greedy, single-line only -- no newlines inside the backticks.
-_INLINE_SHELL_RE = re.compile(r"!`([^`\n]+)`")
-
-# Cap inline-shell output so a runaway command can't blow out the context.
-_INLINE_SHELL_MAX_OUTPUT = 4000
-
-
-def load_skills_config() -> dict:
-    """Load the ``skills`` section of config.yaml (best-effort)."""
-    try:
-        from hermes_cli.config import load_config
-
-        cfg = load_config() or {}
-        skills_cfg = cfg.get("skills")
-        if isinstance(skills_cfg, dict):
-            return skills_cfg
-    except Exception:
-        logger.debug("Could not read skills config", exc_info=True)
-    return {}
-
-
-def substitute_template_vars(
-    content: str,
-    skill_dir: Path | None,
-    session_id: str | None,
-) -> str:
-    """Replace ${HERMES_SKILL_DIR} / ${HERMES_SESSION_ID} in skill content.
-
-    Only substitutes tokens for which a concrete value is available --
-    unresolved tokens are left in place so the author can spot them.
-    """
-    if not content:
-        return content
-
-    skill_dir_str = str(skill_dir) if skill_dir else None
-
-    def _replace(match: re.Match) -> str:
-        token = match.group(1)
-        if token == "HERMES_SKILL_DIR" and skill_dir_str:
-            return skill_dir_str
-        if token == "HERMES_SESSION_ID" and session_id:
-            return str(session_id)
-        return match.group(0)
-
-    return _SKILL_TEMPLATE_RE.sub(_replace, content)
-
-
-def run_inline_shell(command: str, cwd: Path | None, timeout: int) -> str:
-    """Execute a single inline-shell snippet and return its stdout (trimmed).
-
-    Failures return a short ``[inline-shell error: ...]`` marker instead of
-    raising, so one bad snippet can't wreck the whole skill message.
-    """
-    try:
-        completed = subprocess.run(
-            ["bash", "-c", command],
-            cwd=str(cwd) if cwd else None,
-            capture_output=True,
-            text=True,
-            timeout=max(1, int(timeout)),
-            check=False,
-        )
-    except subprocess.TimeoutExpired:
-        return f"[inline-shell timeout after {timeout}s: {command}]"
-    except FileNotFoundError:
-        return "[inline-shell error: bash not found]"
-    except Exception as exc:
-        return f"[inline-shell error: {exc}]"
-
-    output = (completed.stdout or "").rstrip("\n")
-    if not output and completed.stderr:
-        output = completed.stderr.rstrip("\n")
-    if len(output) > _INLINE_SHELL_MAX_OUTPUT:
-        output = output[:_INLINE_SHELL_MAX_OUTPUT] + "...[truncated]"
-    return output
-
-
-def expand_inline_shell(
-    content: str,
-    skill_dir: Path | None,
-    timeout: int,
-) -> str:
-    """Replace every !`cmd` snippet in ``content`` with its stdout.
-
-    Runs each snippet with the skill directory as CWD so relative paths in
-    the snippet work the way the author expects.
-    """
-    if "!`" not in content:
-        return content
-
-    def _replace(match: re.Match) -> str:
-        cmd = match.group(1).strip()
-        if not cmd:
-            return ""
-        return run_inline_shell(cmd, skill_dir, timeout)
-
-    return _INLINE_SHELL_RE.sub(_replace, content)
-
-
-def preprocess_skill_content(
-    content: str,
-    skill_dir: Path | None,
-    session_id: str | None = None,
-    skills_cfg: dict | None = None,
-) -> str:
-    """Apply configured SKILL.md template and inline-shell preprocessing."""
-    if not content:
-        return content
-
-    cfg = skills_cfg if isinstance(skills_cfg, dict) else load_skills_config()
-    if cfg.get("template_vars", True):
-        content = substitute_template_vars(content, skill_dir, session_id)
-    if cfg.get("inline_shell", False):
-        timeout = int(cfg.get("inline_shell_timeout", 10) or 10)
-        content = expand_inline_shell(content, skill_dir, timeout)
-    return content
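The inline-shell expansion in this file can be illustrated without running any subprocess. The sketch below reuses the `_INLINE_SHELL_RE` pattern from the diff but takes the runner as a parameter; `fake_run` and the `run` argument are hypothetical stand-ins for `run_inline_shell`, not part of the original code.

```python
import re
from typing import Callable

# Same pattern as in the diff: !`cmd`, single line, no nested backticks.
_INLINE_SHELL_RE = re.compile(r"!`([^`\n]+)`")


def expand_inline_shell(content: str, run: Callable[[str], str]) -> str:
    """Replace every !`cmd` snippet with whatever ``run(cmd)`` returns."""
    if "!`" not in content:
        return content  # fast path: nothing to expand

    def _replace(match: "re.Match[str]") -> str:
        cmd = match.group(1).strip()
        # A snippet containing only whitespace expands to nothing.
        return run(cmd) if cmd else ""

    return _INLINE_SHELL_RE.sub(_replace, content)


def fake_run(cmd: str) -> str:
    # Hypothetical runner used here instead of bash, to keep the sketch side-effect free.
    return f"<{cmd}>"


print(expand_inline_shell("Today is !`date +%Y-%m-%d`.", fake_run))
```

Injecting the runner keeps the regex logic testable on its own; the original passes the skill directory and timeout to a real `bash -c` call instead.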
@@ -951,9 +951,13 @@ class BatchRunner:
                    root_logger.setLevel(original_level)
        
        # Aggregate all batch statistics and update checkpoint
+        all_completed_prompts = list(completed_prompts_set)
        total_reasoning_stats = {"total_assistant_turns": 0, "turns_with_reasoning": 0, "turns_without_reasoning": 0}
-
+        
        for batch_result in results:
+            # Add newly completed prompts
+            all_completed_prompts.extend(batch_result.get("completed_prompts", []))
+            
            # Aggregate tool stats
            for tool_name, stats in batch_result.get("tool_stats", {}).items():
                if tool_name not in total_tool_stats:
@@ -973,7 +977,7 @@ class BatchRunner:
        
        # Save final checkpoint (best-effort; incremental writes already happened)
        try:
-            checkpoint_data["completed_prompts"] = sorted(completed_prompts_set)
+            checkpoint_data["completed_prompts"] = all_completed_prompts
            self._save_checkpoint(checkpoint_data, lock=checkpoint_lock)
        except Exception as ckpt_err:
             print(f"⚠️  Warning: Failed to save final checkpoint: {ckpt_err}")
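One thing worth checking in this hunk: a list built from the existing set and then extended per batch can contain duplicates if a batch reports a prompt that was already recorded. Whether that can happen depends on code outside this diff, so the sketch below is only an illustration under that assumption; `merge_completed` is a hypothetical helper, not a function from the repository.

```python
from typing import Any, Dict, Iterable, List


def merge_completed(already_done: Iterable[int], batch_results: List[Dict[str, Any]]) -> List[int]:
    """Union previously completed prompt indices with those reported by each batch."""
    merged = set(already_done)
    for batch_result in batch_results:
        # Batches without the key contribute nothing.
        merged.update(batch_result.get("completed_prompts", []))
    # Sorting gives a stable, duplicate-free checkpoint payload.
    return sorted(merged)


results = [{"completed_prompts": [1, 2]}, {"tool_stats": {}}, {"completed_prompts": [2, 3]}]
print(merge_completed({0, 1}, results))
```

A set-based merge keeps the checkpoint idempotent across resumes, at the cost of losing completion order.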
@@ -326,16 +326,6 @@ compression:
  # To pin a specific model/provider for compression summaries, use the
  # auxiliary section below (auxiliary.compression.provider / model).

-# =============================================================================
-# Anthropic prompt caching TTL
-# =============================================================================
-# When prompt caching is active (Claude via OpenRouter or native Anthropic),
-# Anthropic supports two TTL tiers for cached prefixes: "5m" (default) and
-# "1h". Other values are ignored and "5m" is used.
-#
-prompt_caching:
-  cache_ttl: "5m" # use "1h" for long sessions with pauses between turns
-
 # =============================================================================
 # Auxiliary Models (Advanced — Experimental)
 # =============================================================================
@@ -790,16 +780,9 @@ code_execution:
 # Supports single tasks and batch mode (default 3 parallel, configurable).
 delegation:
  max_iterations: 50                          # Max tool-calling turns per child (default: 50)
-  # max_concurrent_children: 3                # Max parallel child agents per batch (default: 3, floor: 1, no ceiling).
-                                              # WARNING: values above 10 multiply API cost linearly.
-  # max_spawn_depth: 1                        # Delegation tree depth cap (range: 1-3, default: 1 = flat).
-                                              # Raise to 2 to allow workers to spawn their own subagents.
-                                              # Requires role="orchestrator" on intermediate agents.
+  # max_concurrent_children: 3                # Max parallel child agents (default: 3)
+  # max_spawn_depth: 1                        # Tree depth cap (1-3, default: 1 = flat). Raise to 2 or 3 to allow orchestrator children to spawn their own workers.
  # orchestrator_enabled: true                # Kill switch for role="orchestrator" children (default: true).
-  # subagent_auto_approve: false              # When a subagent hits a dangerous-command approval prompt, auto-deny (default: false)
-                                              # or auto-approve "once" (true) instead of blocking on stdin.
-                                              # The parent TUI owns stdin, so blocking would deadlock; non-interactive resolution is required.
-                                              # Both choices emit a logger.warning audit line. Flip to true only for cron/batch pipelines.
  # inherit_mcp_toolsets: true                # When explicit child toolsets are narrowed, also keep the parent's MCP toolsets (default: true). Set false for strict intersection.
  # model: "google/gemini-3-flash-preview"    # Override model for subagents (empty = inherit parent)
  # provider: "openrouter"                    # Override provider for subagents (empty = inherit parent)
@@ -1688,6 +1688,7 @@ def _looks_like_slash_command(text: str) -> bool:
 from agent.skill_commands import (
    scan_skill_commands,
    build_skill_invocation_message,
+    build_plan_path,
    build_preloaded_skills_prompt,
 )

@@ -3083,8 +3084,6 @@ class HermesCLI:
            format_runtime_provider_error,
        )

-        _primary_exc = None
-        runtime = None
        try:
            runtime = resolve_runtime_provider(
                requested=self.requested_provider,
@@ -3092,34 +3091,7 @@ class HermesCLI:
                explicit_base_url=self._explicit_base_url,
            )
        except Exception as exc:
-            _primary_exc = exc
-
-        # Primary provider auth failed — try fallback providers before giving up.
-        if runtime is None and _primary_exc is not None:
-            from hermes_cli.auth import AuthError
-            if isinstance(_primary_exc, AuthError):
-                _fb_chain = self._fallback_model if isinstance(self._fallback_model, list) else []
-                for _fb in _fb_chain:
-                    _fb_provider = (_fb.get("provider") or "").strip().lower()
-                    _fb_model = (_fb.get("model") or "").strip()
-                    if not _fb_provider or not _fb_model:
-                        continue
-                    try:
-                        runtime = resolve_runtime_provider(requested=_fb_provider)
-                        logger.warning(
-                            "Primary provider auth failed (%s). Falling through to fallback: %s/%s",
-                            _primary_exc, _fb_provider, _fb_model,
-                        )
-                        _cprint(f"⚠️  Primary auth failed — switching to fallback: {_fb_provider} / {_fb_model}")
-                        self.requested_provider = _fb_provider
-                        self.model = _fb_model
-                        _primary_exc = None
-                        break
-                    except Exception:
-                        continue
-
-        if runtime is None:
-            message = format_runtime_provider_error(_primary_exc) if _primary_exc else "Provider resolution failed."
+            message = format_runtime_provider_error(exc)
            ChatConsole().print(f"[bold red]{message}[/]")
            return False

@@ -3176,14 +3148,7 @@ class HermesCLI:
        # the configured model (e.g. "qwen3.6-plus"), causing 400 errors.
        runtime_model = runtime.get("model")
        if runtime_model and isinstance(runtime_model, str):
-            # Only use runtime model if: model is unset, or model equals provider name
-            should_use_runtime_model = (
-                not self.model or  # No model configured yet
-                self.model == self.provider or  # Model is the provider slug
-                self.model == runtime.get("name")  # Model matches provider display name
-            )
-            if should_use_runtime_model:
-                self.model = runtime_model
+            self.model = runtime_model

        # If model is still empty (e.g. user ran `hermes auth add openai-codex`
        # without `hermes model`), fall back to the provider's first catalog
@@ -3289,23 +3254,6 @@ class HermesCLI:
                _cprint(f"\033[1;31mSession not found: {self.session_id}{_RST}")
                _cprint(f"{_DIM}Use a session ID from a previous CLI run (hermes sessions list).{_RST}")
                return False
-            # If the requested session is the (empty) head of a compression
-            # chain, walk to the descendant that actually holds the messages.
-            # See #15000 and SessionDB.resolve_resume_session_id.
-            try:
-                resolved_id = self._session_db.resolve_resume_session_id(self.session_id)
-            except Exception:
-                resolved_id = self.session_id
-            if resolved_id and resolved_id != self.session_id:
-                ChatConsole().print(
-                    f"[{_DIM}]Session {_escape(self.session_id)} was compressed into "
-                    f"{_escape(resolved_id)}; resuming the descendant with your "
-                    f"transcript.[/]"
-                )
-                self.session_id = resolved_id
-                resolved_meta = self._session_db.get_session(self.session_id)
-                if resolved_meta:
-                    session_meta = resolved_meta
            restored = self._session_db.get_messages_as_conversation(self.session_id)
            if restored:
                restored = [m for m in restored if m.get("role") != "session_meta"]
@@ -3524,22 +3472,6 @@ class HermesCLI:
            )
            return False

-        # If the requested session is the (empty) head of a compression chain,
-        # walk to the descendant that actually holds the messages. See #15000.
-        try:
-            resolved_id = self._session_db.resolve_resume_session_id(self.session_id)
-        except Exception:
-            resolved_id = self.session_id
-        if resolved_id and resolved_id != self.session_id:
-            self._console_print(
-                f"[dim]Session {self.session_id} was compressed into "
-                f"{resolved_id}; resuming the descendant with your transcript.[/]"
-            )
-            self.session_id = resolved_id
-            resolved_meta = self._session_db.get_session(self.session_id)
-            if resolved_meta:
-                session_meta = resolved_meta
-
        restored = self._session_db.get_messages_as_conversation(self.session_id)
        if restored:
            restored = [m for m in restored if m.get("role") != "session_meta"]
@@ -4754,22 +4686,6 @@ class HermesCLI:
            _cprint("  Use /history or `hermes sessions list` to see available sessions.")
            return

-        # If the target is the empty head of a compression chain, redirect to
-        # the descendant that actually holds the transcript. See #15000.
-        try:
-            resolved_id = self._session_db.resolve_resume_session_id(target_id)
-        except Exception:
-            resolved_id = target_id
-        if resolved_id and resolved_id != target_id:
-            _cprint(
-                f"  Session {target_id} was compressed into {resolved_id}; "
-                f"resuming the descendant with your transcript."
-            )
-            target_id = resolved_id
-            resolved_meta = self._session_db.get_session(target_id)
-            if resolved_meta:
-                session_meta = resolved_meta
-
        if target_id == self.session_id:
            _cprint("  Already on that session.")
            return
@@ -5381,26 +5297,29 @@ class HermesCLI:
        _cprint(f"  ✓ Model switched: {result.new_model}")
        _cprint(f"    Provider: {provider_label}")

-        # Context: always resolve via the provider-aware chain so Codex OAuth,
-        # Copilot, and Nous-enforced caps win over the raw models.dev entry
-        # (e.g. gpt-5.5 is 1.05M on openai but 272K on Codex OAuth).
+        # Rich metadata from models.dev
        mi = result.model_info
-        from hermes_cli.model_switch import resolve_display_context_length
-        ctx = resolve_display_context_length(
-            result.new_model,
-            result.target_provider,
-            base_url=result.base_url or self.base_url or "",
-            api_key=result.api_key or self.api_key or "",
-            model_info=mi,
-        )
-        if ctx:
-            _cprint(f"    Context: {ctx:,} tokens")
        if mi:
+            if mi.context_window:
+                _cprint(f"    Context: {mi.context_window:,} tokens")
            if mi.max_output:
                _cprint(f"    Max output: {mi.max_output:,} tokens")
            if mi.has_cost_data():
                _cprint(f"    Cost: {mi.format_cost()}")
            _cprint(f"    Capabilities: {mi.format_capabilities()}")
+        else:
+            # Fallback to old context length lookup
+            try:
+                from agent.model_metadata import get_model_context_length
+                ctx = get_model_context_length(
+                    result.new_model,
+                    base_url=result.base_url or self.base_url,
+                    api_key=result.api_key or self.api_key,
+                    provider=result.target_provider,
+                )
+                _cprint(f"    Context: {ctx:,} tokens")
+            except Exception:
+                pass

        # Cache notice
        cache_enabled = (
@@ -5459,6 +5378,79 @@ class HermesCLI:
        except Exception:
            return False

+    def _show_model_and_providers(self):
+        """Show current model + provider and list all authenticated providers.
+
+        Authenticated providers are listed with their curated models (and
+        pricing where available); unconfigured providers are named at the end.
+        """
+        from hermes_cli.models import (
+            curated_models_for_provider, list_available_providers,
+            normalize_provider, _PROVIDER_LABELS,
+            get_pricing_for_provider, format_model_pricing_table,
+        )
+        from hermes_cli.auth import resolve_provider as _resolve_provider
+
+        # Resolve current provider
+        raw_provider = normalize_provider(self.provider)
+        if raw_provider == "auto":
+            try:
+                current = _resolve_provider(
+                    self.requested_provider,
+                    explicit_api_key=self._explicit_api_key,
+                    explicit_base_url=self._explicit_base_url,
+                )
+            except Exception:
+                current = "openrouter"
+        else:
+            current = raw_provider
+        current_label = _PROVIDER_LABELS.get(current, current)
+
+        print(f"\n  Current: {self.model} via {current_label}")
+        print()
+
+        # Show all authenticated providers with their models
+        providers = list_available_providers()
+        authed = [p for p in providers if p["authenticated"]]
+        unauthed = [p for p in providers if not p["authenticated"]]
+
+        if authed:
+            print("  Authenticated providers & models:")
+            for p in authed:
+                is_active = p["id"] == current
+                marker = " ← active" if is_active else ""
+                print(f"    [{p['id']}]{marker}")
+                curated = curated_models_for_provider(p["id"])
+                # Fetch pricing for providers that support it (openrouter, nous)
+                pricing_map = get_pricing_for_provider(p["id"]) if p["id"] in ("openrouter", "nous") else {}
+                if curated and pricing_map:
+                    cur_model = self.model if is_active else ""
+                    for line in format_model_pricing_table(curated, pricing_map, current_model=cur_model):
+                        print(line)
+                elif curated:
+                    for mid, desc in curated:
+                        current_marker = " ← current" if (is_active and mid == self.model) else ""
+                        print(f"      {mid}{current_marker}")
+                elif p["id"] == "custom":
+                    from hermes_cli.models import _get_custom_base_url
+                    custom_url = _get_custom_base_url()
+                    if custom_url:
+                        print(f"      endpoint: {custom_url}")
+                    if is_active:
+                        print(f"      model: {self.model} ← current")
+                    print("      (use hermes model to change)")
+                else:
+                    print("      (use hermes model to change)")
+                print()
+
+        if unauthed:
+            names = ", ".join(p["label"] for p in unauthed)
+            print(f"  Not configured: {names}")
+            print("  Run: hermes setup")
+            print()
+
+        print("  To change model or provider, use: hermes model")
+
    def _output_console(self):
        """Use prompt_toolkit-safe Rich rendering once the TUI is live."""
        if getattr(self, "_app", None):
@@ -6034,12 +6026,16 @@ class HermesCLI:
            self._handle_resume_command(cmd_original)
        elif canonical == "model":
            self._handle_model_switch(cmd_original)
+        elif canonical == "provider":
+            self._show_model_and_providers()
        elif canonical == "gquota":
            self._handle_gquota_command(cmd_original)

        elif canonical == "personality":
            # Use original case (handler lowercases the personality name itself)
            self._handle_personality_command(cmd_original)
+        elif canonical == "plan":
+            self._handle_plan_command(cmd_original)
        elif canonical == "retry":
            retry_msg = self.retry_last()
            if retry_msg and hasattr(self, '_pending_input'):
@@ -6169,8 +6165,6 @@ class HermesCLI:
            self._handle_skin_command(cmd_original)
        elif canonical == "voice":
            self._handle_voice_command(cmd_original)
-        elif canonical == "busy":
-            self._handle_busy_command(cmd_original)
        else:
            # Check for user-defined quick commands (bypass agent loop, no LLM call)
            base_cmd = cmd_lower.split()[0]
@@ -6276,6 +6270,32 @@ class HermesCLI:
        
        return True
    
+    def _handle_plan_command(self, cmd: str):
+        """Handle /plan [request] — load the bundled plan skill."""
+        parts = cmd.strip().split(maxsplit=1)
+        user_instruction = parts[1].strip() if len(parts) > 1 else ""
+
+        plan_path = build_plan_path(user_instruction)
+        msg = build_skill_invocation_message(
+            "/plan",
+            user_instruction,
+            task_id=self.session_id,
+            runtime_note=(
+                "Save the markdown plan with write_file to this exact relative path "
+                f"inside the active workspace/backend cwd: {plan_path}"
+            ),
+        )
+
+        if not msg:
+            ChatConsole().print("[bold red]Failed to load the bundled /plan skill[/]")
+            return
+
+        _cprint(f"  📝 Plan mode queued via skill. Markdown plan target: {plan_path}")
+        if hasattr(self, '_pending_input'):
+            self._pending_input.put(msg)
+        else:
+            ChatConsole().print("[bold red]Plan mode unavailable: input queue not initialized[/]")
+    
    def _handle_background_command(self, cmd: str):
        """Handle /background <prompt> — run a prompt in a separate background session.

@@ -6665,13 +6685,6 @@ class HermesCLI:
                print(f"   ⚠ Port {_port} is not reachable at {cdp_url}")

            os.environ["BROWSER_CDP_URL"] = cdp_url
-            # Eagerly start the CDP supervisor so pending_dialogs + frame_tree
-            # show up in the next browser_snapshot.  No-op if already started.
-            try:
-                from tools.browser_tool import _ensure_cdp_supervisor  # type: ignore[import-not-found]
-                _ensure_cdp_supervisor("default")
-            except Exception:
-                pass
            print()
            print("🌐 Browser connected to live Chrome via CDP")
            print(f"   Endpoint: {cdp_url}")
@@ -6693,8 +6706,7 @@ class HermesCLI:
            if current:
                os.environ.pop("BROWSER_CDP_URL", None)
                try:
-                    from tools.browser_tool import cleanup_all_browsers, _stop_cdp_supervisor
-                    _stop_cdp_supervisor("default")
+                    from tools.browser_tool import cleanup_all_browsers
                    cleanup_all_browsers()
                except Exception:
                    pass
@@ -6907,36 +6919,6 @@ class HermesCLI:
        else:
            _cprint(f"  {_ACCENT}✓ Reasoning effort set to '{arg}' (session only){_RST}")

-    def _handle_busy_command(self, cmd: str):
-        """Handle /busy — control what Enter does while Hermes is working.
-
-        Usage:
-            /busy               Show current busy input mode
-            /busy status        Show current busy input mode
-            /busy queue         Queue input for the next turn instead of interrupting
-            /busy interrupt     Interrupt the current run on Enter (default)
-        """
-        parts = cmd.strip().split(maxsplit=1)
-        if len(parts) < 2 or parts[1].strip().lower() == "status":
-            _cprint(f"  {_ACCENT}Busy input mode: {self.busy_input_mode}{_RST}")
-            _cprint(f"  {_DIM}Enter while busy: {'queues for next turn' if self.busy_input_mode == 'queue' else 'interrupts current run'}{_RST}")
-            _cprint(f"  {_DIM}Usage: /busy [queue|interrupt|status]{_RST}")
-            return
-
-        arg = parts[1].strip().lower()
-        if arg not in {"queue", "interrupt"}:
-            _cprint(f"  {_DIM}(._.) Unknown argument: {arg}{_RST}")
-            _cprint(f"  {_DIM}Usage: /busy [queue|interrupt|status]{_RST}")
-            return
-
-        self.busy_input_mode = arg
-        if save_config_value("display.busy_input_mode", arg):
-            behavior = "Enter will queue follow-up input while Hermes is busy." if arg == "queue" else "Enter will interrupt the current run while Hermes is busy."
-            _cprint(f"  {_ACCENT}✓ Busy input mode set to '{arg}' (saved to config){_RST}")
-            _cprint(f"  {_DIM}{behavior}{_RST}")
-        else:
-            _cprint(f"  {_ACCENT}✓ Busy input mode set to '{arg}' (session only){_RST}")
-
    def _handle_fast_command(self, cmd: str):
        """Handle /fast — toggle fast mode (OpenAI Priority Processing / Anthropic Fast Mode)."""
        if not self._fast_command_available():
@@ -7015,52 +6997,51 @@ class HermesCLI:
                focus_topic = parts[1].strip()

        original_count = len(self.conversation_history)
-        with self._busy_command("Compressing context..."):
-            try:
-                from agent.model_metadata import estimate_messages_tokens_rough
-                from agent.manual_compression_feedback import summarize_manual_compression
-                original_history = list(self.conversation_history)
-                approx_tokens = estimate_messages_tokens_rough(original_history)
-                if focus_topic:
-                    print(f"🗜️  Compressing {original_count} messages (~{approx_tokens:,} tokens), "
-                          f"focus: \"{focus_topic}\"...")
-                else:
-                    print(f"🗜️  Compressing {original_count} messages (~{approx_tokens:,} tokens)...")
+        try:
+            from agent.model_metadata import estimate_messages_tokens_rough
+            from agent.manual_compression_feedback import summarize_manual_compression
+            original_history = list(self.conversation_history)
+            approx_tokens = estimate_messages_tokens_rough(original_history)
+            if focus_topic:
+                print(f"🗜️  Compressing {original_count} messages (~{approx_tokens:,} tokens), "
+                      f"focus: \"{focus_topic}\"...")
+            else:
+                print(f"🗜️  Compressing {original_count} messages (~{approx_tokens:,} tokens)...")

-                compressed, _ = self.agent._compress_context(
-                    original_history,
-                    self.agent._cached_system_prompt or "",
-                    approx_tokens=approx_tokens,
-                    focus_topic=focus_topic or None,
-                )
-                self.conversation_history = compressed
-                # _compress_context ends the old session and creates a new child
-                # session on the agent (run_agent.py::_compress_context). Sync the
-                # CLI's session_id so /status, /resume, exit summary, and title
-                # generation all point at the live continuation session, not the
-                # ended parent. Without this, subsequent end_session() calls target
-                # the already-closed parent and the child is orphaned.
-                if (
-                    getattr(self.agent, "session_id", None)
-                    and self.agent.session_id != self.session_id
-                ):
-                    self.session_id = self.agent.session_id
-                    self._pending_title = None
-                new_tokens = estimate_messages_tokens_rough(self.conversation_history)
-                summary = summarize_manual_compression(
-                    original_history,
-                    self.conversation_history,
-                    approx_tokens,
-                    new_tokens,
-                )
-                icon = "🗜️" if summary["noop"] else "✅"
-                print(f"  {icon} {summary['headline']}")
-                print(f"     {summary['token_line']}")
-                if summary["note"]:
-                    print(f"     {summary['note']}")
+            compressed, _ = self.agent._compress_context(
+                original_history,
+                self.agent._cached_system_prompt or "",
+                approx_tokens=approx_tokens,
+                focus_topic=focus_topic or None,
+            )
+            self.conversation_history = compressed
+            # _compress_context ends the old session and creates a new child
+            # session on the agent (run_agent.py::_compress_context). Sync the
+            # CLI's session_id so /status, /resume, exit summary, and title
+            # generation all point at the live continuation session, not the
+            # ended parent. Without this, subsequent end_session() calls target
+            # the already-closed parent and the child is orphaned.
+            if (
+                getattr(self.agent, "session_id", None)
+                and self.agent.session_id != self.session_id
+            ):
+                self.session_id = self.agent.session_id
+                self._pending_title = None
+            new_tokens = estimate_messages_tokens_rough(self.conversation_history)
+            summary = summarize_manual_compression(
+                original_history,
+                self.conversation_history,
+                approx_tokens,
+                new_tokens,
+            )
+            icon = "🗜️" if summary["noop"] else "✅"
+            print(f"  {icon} {summary['headline']}")
+            print(f"     {summary['token_line']}")
+            if summary["note"]:
+                print(f"     {summary['note']}")

-            except Exception as e:
-                print(f"  ❌ Compression failed: {e}")
+        except Exception as e:
+            print(f"  ❌ Compression failed: {e}")

    def _handle_debug_command(self):
        """Handle /debug — upload debug report + logs and print paste URLs."""
@@ -9562,20 +9543,9 @@ class HermesCLI:
        
        @kb.add('c-d')
        def handle_ctrl_d(event):
-            """Ctrl+D: delete char under cursor (standard readline behaviour).
-            Only exit when the input is empty — same as bash/zsh. Pending
-            attached images count as input and block the EOF-exit so the
-            user doesn't lose them silently.
-            """
-            buf = event.app.current_buffer
-            if buf.text:
-                buf.delete()
-            elif self._attached_images:
-                # Empty text but pending attachments — no-op, don't exit.
-                return
-            else:
-                self._should_exit = True
-                event.app.exit()
+            """Handle Ctrl+D - exit."""
+            self._should_exit = True
+            event.app.exit()

        _modal_prompt_active = Condition(
            lambda: bool(self._secret_state or self._sudo_state)
@@ -16,7 +16,7 @@ import uuid
 from datetime import datetime, timedelta
 from pathlib import Path
 from hermes_constants import get_hermes_home
-from typing import Optional, Dict, List, Any, Union
+from typing import Optional, Dict, List, Any

 logger = logging.getLogger(__name__)

@@ -371,39 +371,6 @@ def save_jobs(jobs: List[Dict[str, Any]]):
        raise


-def _normalize_workdir(workdir: Optional[str]) -> Optional[str]:
-    """Normalize and validate a cron job workdir.
-
-    Rules:
-      - Empty / None → None (feature off, preserves old behaviour).
-      - ``~`` is expanded.  Relative paths are rejected — cron jobs run detached
-        from any shell cwd, so relative paths have no stable meaning.
-      - The path must exist and be a directory at create/update time.  We do
-        NOT re-check at run time (a user might briefly unmount the dir; the
-        scheduler will just fall back to old behaviour with a logged warning).
-
-    Returns the absolute path string, or None when disabled.
-    Raises ValueError on invalid input.
-    """
-    if workdir is None:
-        return None
-    raw = str(workdir).strip()
-    if not raw:
-        return None
-    expanded = Path(raw).expanduser()
-    if not expanded.is_absolute():
-        raise ValueError(
-            f"Cron workdir must be an absolute path (got {raw!r}). "
-            f"Cron jobs run detached from any shell cwd, so relative paths are ambiguous."
-        )
-    resolved = expanded.resolve()
-    if not resolved.exists():
-        raise ValueError(f"Cron workdir does not exist: {resolved}")
-    if not resolved.is_dir():
-        raise ValueError(f"Cron workdir is not a directory: {resolved}")
-    return str(resolved)
-
-
 def create_job(
    prompt: str,
    schedule: str,
@@ -417,9 +384,7 @@ def create_job(
    provider: Optional[str] = None,
    base_url: Optional[str] = None,
    script: Optional[str] = None,
-    context_from: Optional[Union[str, List[str]]] = None,
    enabled_toolsets: Optional[List[str]] = None,
-    workdir: Optional[str] = None,
 ) -> Dict[str, Any]:
    """
    Create a new cron job.
@@ -439,18 +404,9 @@ def create_job(
        script: Optional path to a Python script whose stdout is injected into the
                prompt each run.  The script runs before the agent turn, and its output
                is prepended as context.  Useful for data collection / change detection.
-        context_from: Optional job ID (or list of job IDs) whose most recent output
-                      is injected into the prompt as context before each run.
-                      Useful for chaining cron jobs: job A finds data, job B processes it.
        enabled_toolsets: Optional list of toolset names to restrict the agent to.
                          When set, only tools from these toolsets are loaded, reducing
                          token overhead. When omitted, all default tools are loaded.
-        workdir: Optional absolute path.  When set, the job runs as if launched
-                from that directory: AGENTS.md / CLAUDE.md / .cursorrules from
-                that directory are injected into the system prompt, and the
-                terminal/file/code_exec tools use it as their working directory
-                (via TERMINAL_CWD).  When unset, the old behaviour is preserved
-                (no context files injected, tools use the scheduler's cwd).

    Returns:
        The created job dict
@@ -483,15 +439,6 @@ def create_job(
    normalized_script = normalized_script or None
    normalized_toolsets = [str(t).strip() for t in enabled_toolsets if str(t).strip()] if enabled_toolsets else None
    normalized_toolsets = normalized_toolsets or None
-    normalized_workdir = _normalize_workdir(workdir)
-
-    # Normalize context_from: accept str or list of str, store as list or None
-    if isinstance(context_from, str):
-        context_from = [context_from.strip()] if context_from.strip() else None
-    elif isinstance(context_from, list):
-        context_from = [str(j).strip() for j in context_from if str(j).strip()] or None
-    else:
-        context_from = None

    label_source = (prompt or (normalized_skills[0] if normalized_skills else None)) or "cron job"
    job = {
@@ -504,7 +451,6 @@ def create_job(
        "provider": normalized_provider,
        "base_url": normalized_base_url,
        "script": normalized_script,
-        "context_from": context_from,
        "schedule": parsed_schedule,
        "schedule_display": parsed_schedule.get("display", schedule),
        "repeat": {
@@ -525,7 +471,6 @@ def create_job(
        "deliver": deliver,
        "origin": origin,  # Tracks where job was created for "origin" delivery
        "enabled_toolsets": normalized_toolsets,
-        "workdir": normalized_workdir,
    }

    jobs = load_jobs()
@@ -559,15 +504,6 @@ def update_job(job_id: str, updates: Dict[str, Any]) -> Optional[Dict[str, Any]]
        if job["id"] != job_id:
            continue

-        # Validate / normalize workdir if present in updates.  Empty string or
-        # None both mean "clear the field" (restore old behaviour).
-        if "workdir" in updates:
-            _wd = updates["workdir"]
-            if _wd in (None, "", False):
-                updates["workdir"] = None
-            else:
-                updates["workdir"] = _normalize_workdir(_wd)
-
        updated = _apply_skill_fields({**job, **updates})
        schedule_changed = "schedule" in updates

@@ -671,47 +671,6 @@ def _build_job_prompt(job: dict, prerun_script: Optional[tuple] = None) -> str:
                f"{prompt}"
            )

-    # Inject output from referenced cron jobs as context.
-    context_from = job.get("context_from")
-    if context_from:
-        from cron.jobs import OUTPUT_DIR
-        if isinstance(context_from, str):
-            context_from = [context_from]
-        for source_job_id in context_from:
-            # Guard against path traversal — valid job IDs are 12-char hex strings
-            if not source_job_id or not all(c in "0123456789abcdef" for c in source_job_id):
-                logger.warning("context_from: skipping invalid job_id %r", source_job_id)
-                continue
-            try:
-                job_output_dir = OUTPUT_DIR / source_job_id
-                if not job_output_dir.exists():
-                    continue  # silent skip — no output yet
-                output_files = sorted(
-                    job_output_dir.glob("*.md"),
-                    key=lambda f: f.stat().st_mtime,
-                    reverse=True,
-                )
-                if not output_files:
-                    continue  # silent skip — no output yet
-                latest_output = output_files[0].read_text(encoding="utf-8").strip()
-                # Truncate to 8K characters to avoid prompt bloat
-                _MAX_CONTEXT_CHARS = 8000
-                if len(latest_output) > _MAX_CONTEXT_CHARS:
-                    latest_output = latest_output[:_MAX_CONTEXT_CHARS] + "\n\n[... output truncated ...]"
-                if latest_output:
-                    prompt = (
-                        f"## Output from job '{source_job_id}'\n"
-                        "The following is the most recent output from a preceding "
-                        "cron job. Use it as context for your analysis.\n\n"
-                        f"```\n{latest_output}\n```\n\n"
-                        f"{prompt}"
-                    )
-                else:
-                    continue  # silent skip — empty output
-            except (OSError, PermissionError) as e:
-                logger.warning("context_from: failed to read output for job %r: %s", source_job_id, e)
-                # silent skip — do not pollute the prompt with error messages
-
    # Always prepend cron execution guidance so the agent knows how
    # delivery works and can suppress delivery when appropriate.
    cron_hint = (
@@ -836,30 +795,6 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
        chat_name=origin.get("chat_name", "") if origin else "",
    )

-    # Per-job working directory.  When set (and validated at create/update
-    # time), we point TERMINAL_CWD at it so:
-    #   - build_context_files_prompt() picks up AGENTS.md / CLAUDE.md /
-    #     .cursorrules from the job's project dir, AND
-    #   - the terminal, file, and code-exec tools run commands from there.
-    #
-    # tick() serializes workdir-jobs outside the parallel pool, so mutating
-    # os.environ["TERMINAL_CWD"] here is safe for those jobs.  For workdir-less
-    # jobs we leave TERMINAL_CWD untouched — preserves the original behaviour
-    # (skip_context_files=True, tools use whatever cwd the scheduler has).
-    _job_workdir = (job.get("workdir") or "").strip() or None
-    if _job_workdir and not Path(_job_workdir).is_dir():
-        # Directory was removed between create-time validation and now.  Log
-        # and drop back to old behaviour rather than crashing the job.
-        logger.warning(
-            "Job '%s': configured workdir %r no longer exists — running without it",
-            job_id, _job_workdir,
-        )
-        _job_workdir = None
-    _prior_terminal_cwd = os.environ.get("TERMINAL_CWD", "_UNSET_")
-    if _job_workdir:
-        os.environ["TERMINAL_CWD"] = _job_workdir
-        logger.info("Job '%s': using workdir %s", job_id, _job_workdir)
-
    try:
        # Re-read .env and config.yaml fresh every run so provider/key
        # changes take effect without a gateway restart.
@@ -936,7 +871,6 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
            resolve_runtime_provider,
            format_runtime_provider_error,
        )
-        from hermes_cli.auth import AuthError
        try:
            runtime_kwargs = {
                "requested": job.get("provider") or os.getenv("HERMES_INFERENCE_PROVIDER"),
@@ -944,28 +878,6 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
            if job.get("base_url"):
                runtime_kwargs["explicit_base_url"] = job.get("base_url")
            runtime = resolve_runtime_provider(**runtime_kwargs)
-        except AuthError as auth_exc:
-            # Primary provider auth failed — try fallback chain before giving up.
-            logger.warning("Job '%s': primary auth failed (%s), trying fallback", job_id, auth_exc)
-            fb = _cfg.get("fallback_providers") or _cfg.get("fallback_model")
-            fb_list = (fb if isinstance(fb, list) else [fb]) if fb else []
-            runtime = None
-            for entry in fb_list:
-                if not isinstance(entry, dict):
-                    continue
-                try:
-                    fb_kwargs = {"requested": entry.get("provider")}
-                    if entry.get("base_url"):
-                        fb_kwargs["explicit_base_url"] = entry["base_url"]
-                    if entry.get("api_key"):
-                        fb_kwargs["explicit_api_key"] = entry["api_key"]
-                    runtime = resolve_runtime_provider(**fb_kwargs)
-                    logger.info("Job '%s': fallback resolved to %s", job_id, runtime.get("provider"))
-                    break
-                except Exception as fb_exc:
-                    logger.debug("Job '%s': fallback %s failed: %s", job_id, entry.get("provider"), fb_exc)
-            if runtime is None:
-                raise RuntimeError(format_runtime_provider_error(auth_exc)) from auth_exc
        except Exception as exc:
            message = format_runtime_provider_error(exc)
            raise RuntimeError(message) from exc
@@ -1008,10 +920,7 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
            enabled_toolsets=_resolve_cron_enabled_toolsets(job, _cfg),
            disabled_toolsets=["cronjob", "messaging", "clarify"],
            quiet_mode=True,
-            # When a workdir is configured, inject AGENTS.md / CLAUDE.md /
-            # .cursorrules from that directory; otherwise preserve the old
-            # behaviour (don't inject SOUL.md/AGENTS.md from the scheduler cwd).
-            skip_context_files=not bool(_job_workdir),
+            skip_context_files=True,  # Don't inject SOUL.md/AGENTS.md from scheduler cwd
            skip_memory=True,  # Cron system prompts would corrupt user representations
            platform="cron",
            session_id=_cron_session_id,
@@ -1150,14 +1059,6 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
        return False, output, "", error_msg

    finally:
-        # Restore TERMINAL_CWD to whatever it was before this job ran.  We
-        # only ever mutate it when the job has a workdir; see the setup block
-        # at the top of run_job for the serialization guarantee.
-        if _job_workdir:
-            if _prior_terminal_cwd == "_UNSET_":
-                os.environ.pop("TERMINAL_CWD", None)
-            else:
-                os.environ["TERMINAL_CWD"] = _prior_terminal_cwd
        # Clean up ContextVar session/delivery state for this job.
        clear_session_vars(_ctx_tokens)
        if _session_db:
@@ -1285,28 +1186,14 @@ def tick(verbose: bool = True, adapters=None, loop=None) -> int:
                mark_job_run(job["id"], False, str(e))
                return False

-        # Partition due jobs: those with a per-job workdir mutate
-        # os.environ["TERMINAL_CWD"] inside run_job, which is process-global —
-        # so they MUST run sequentially to avoid corrupting each other.  Jobs
-        # without a workdir leave env untouched and stay parallel-safe.
-        workdir_jobs = [j for j in due_jobs if (j.get("workdir") or "").strip()]
-        parallel_jobs = [j for j in due_jobs if not (j.get("workdir") or "").strip()]
-
-        _results: list = []
-
-        # Sequential pass for workdir jobs.
-        for job in workdir_jobs:
-            _ctx = contextvars.copy_context()
-            _results.append(_ctx.run(_process_job, job))
-
-        # Parallel pass for the rest — same behaviour as before.
-        if parallel_jobs:
-            with concurrent.futures.ThreadPoolExecutor(max_workers=_max_workers) as _tick_pool:
-                _futures = []
-                for job in parallel_jobs:
-                    _ctx = contextvars.copy_context()
-                    _futures.append(_tick_pool.submit(_ctx.run, _process_job, job))
-                _results.extend(f.result() for f in _futures)
+        # Run all due jobs concurrently, each in its own ContextVar copy
+        # so session/delivery state stays isolated per-thread.
+        with concurrent.futures.ThreadPoolExecutor(max_workers=_max_workers) as _tick_pool:
+            _futures = []
+            for job in due_jobs:
+                _ctx = contextvars.copy_context()
+                _futures.append(_tick_pool.submit(_ctx.run, _process_job, job))
+            _results = [f.result() for f in _futures]

        return sum(_results)
    finally:
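The restored `tick()` body above submits every due job to one thread pool and wraps each call in its own `contextvars.copy_context()`. A minimal sketch of that pattern follows. `current_job`, `process_job`, and `run_due_jobs` are hypothetical stand-ins for illustration, not names from the repository.

```python
import concurrent.futures
import contextvars

# Hypothetical ContextVar standing in for the per-job session/delivery state.
current_job = contextvars.ContextVar("current_job", default=None)


def process_job(job: dict) -> bool:
    # set() only affects the context copy this call runs in, so other
    # jobs and the submitting thread never observe this value.
    current_job.set(job["id"])
    return current_job.get() == job["id"]


def run_due_jobs(due_jobs: list, max_workers: int = 4) -> int:
    # One pool for all due jobs; each job runs inside its own context copy.
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = []
        for job in due_jobs:
            ctx = contextvars.copy_context()
            futures.append(pool.submit(ctx.run, process_job, job))
        results = [f.result() for f in futures]
    # True counts as 1, so this is the number of successful jobs.
    return sum(results)
```

Because each job runs in a copy, the submitting thread's `current_job` stays at its default after the pool drains. Process-global state such as `os.environ` is not covered by this isolation.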
@@ -1,52 +0,0 @@
-#
-# docker-compose.yml for Hermes Agent
-#
-# Usage:
-#   HERMES_UID=$(id -u) HERMES_GID=$(id -g) docker compose up -d
-#
-# Set HERMES_UID / HERMES_GID to the host user that owns ~/.hermes so
-# files created inside the container stay readable/writable on the host.
-# The entrypoint remaps the internal `hermes` user to these values via
-# usermod/groupmod + gosu.
-#
-# Security notes:
-#   - The dashboard service binds to 127.0.0.1 by default. It stores API
-#     keys; exposing it on LAN without auth is unsafe. If you want remote
-#     access, use an SSH tunnel or put it behind a reverse proxy that
-#     adds authentication — do NOT pass --insecure --host 0.0.0.0.
-#   - The gateway's API server is off unless you uncomment API_SERVER_KEY
-#     and API_SERVER_HOST. See docs/user-guide/api-server.md before doing
-#     this on an internet-facing host.
-#
-services:
-  gateway:
-    build: .
-    image: hermes-agent
-    container_name: hermes
-    restart: unless-stopped
-    network_mode: host
-    volumes:
-      - ~/.hermes:/opt/data
-    environment:
-      - HERMES_UID=${HERMES_UID:-10000}
-      - HERMES_GID=${HERMES_GID:-10000}
-      # To expose the OpenAI-compatible API server beyond localhost,
-      # uncomment BOTH lines (API_SERVER_KEY is mandatory for auth):
-      # - API_SERVER_HOST=0.0.0.0
-      # - API_SERVER_KEY=${API_SERVER_KEY}
-    command: ["gateway", "run"]
-
-  dashboard:
-    image: hermes-agent
-    container_name: hermes-dashboard
-    restart: unless-stopped
-    network_mode: host
-    depends_on:
-      - gateway
-    volumes:
-      - ~/.hermes:/opt/data
-    environment:
-      - HERMES_UID=${HERMES_UID:-10000}
-      - HERMES_GID=${HERMES_GID:-10000}
-    # Localhost-only. For remote access, tunnel via `ssh -L 9119:localhost:9119`.
-    command: ["dashboard", "--host", "127.0.0.1", "--no-open"]
@@ -22,18 +22,9 @@ if [ "$(id -u)" = "0" ]; then
        groupmod -o -g "$HERMES_GID" hermes 2>/dev/null || true
    fi

-    # Fix ownership of the data volume. When HERMES_UID remaps the hermes user,
-    # files created by previous runs (under the old UID) become inaccessible.
-    # Always chown -R when UID was remapped; otherwise only if top-level is wrong.
    actual_hermes_uid=$(id -u hermes)
-    needs_chown=false
-    if [ -n "$HERMES_UID" ] && [ "$HERMES_UID" != "10000" ]; then
-        needs_chown=true
-    elif [ "$(stat -c %u "$HERMES_HOME" 2>/dev/null)" != "$actual_hermes_uid" ]; then
-        needs_chown=true
-    fi
-    if [ "$needs_chown" = true ]; then
-        echo "Fixing ownership of $HERMES_HOME to hermes ($actual_hermes_uid)"
+    if [ "$(stat -c %u "$HERMES_HOME" 2>/dev/null)" != "$actual_hermes_uid" ]; then
+        echo "$HERMES_HOME is not owned by $actual_hermes_uid, fixing"
        # In rootless Podman the container's "root" is mapped to an unprivileged
        # host UID — chown will fail.  That's fine: the volume is already owned
        # by the mapped user on the host side.
@@ -135,7 +135,7 @@ class SessionResetPolicy:
            mode=mode if mode is not None else "both",
            at_hour=at_hour if at_hour is not None else 4,
            idle_minutes=idle_minutes if idle_minutes is not None else 1440,
-            notify=_coerce_bool(notify, True),
+            notify=notify if notify is not None else True,
            notify_exclude_platforms=tuple(exclude) if exclude is not None else ("api_server", "webhook"),
        )

@@ -178,7 +178,7 @@ class PlatformConfig:
            home_channel = HomeChannel.from_dict(data["home_channel"])
        
        return cls(
-            enabled=_coerce_bool(data.get("enabled"), False),
+            enabled=data.get("enabled", False),
            token=data.get("token"),
            api_key=data.get("api_key"),
            home_channel=home_channel,
@@ -435,7 +435,7 @@ class GatewayConfig:
            reset_triggers=data.get("reset_triggers", ["/new", "/reset"]),
            quick_commands=quick_commands,
            sessions_dir=sessions_dir,
-            always_log_local=_coerce_bool(data.get("always_log_local"), True),
+            always_log_local=data.get("always_log_local", True),
            stt_enabled=_coerce_bool(stt_enabled, True),
            group_sessions_per_user=_coerce_bool(group_sessions_per_user, True),
            thread_sessions_per_user=_coerce_bool(thread_sessions_per_user, False),
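The `notify` default above now uses a plain `is not None` check, and `enabled` / `always_log_local` use `dict.get` with a default. A small sketch of what those two idioms return for a few inputs is below. `default_if_none` is a hypothetical helper written only for this illustration, and the behaviour of `_coerce_bool` is not shown in this diff, so no comparison with it is implied.

```python
from typing import Any


def default_if_none(value: Any, default: Any) -> Any:
    # Mirrors `x if x is not None else default`: only None is replaced.
    return value if value is not None else default


# Only a missing value falls back to the default.
fallback = default_if_none(None, True)
# An explicit False is preserved.
explicit_false = default_if_none(False, True)
# A string is passed through unchanged; no boolean parsing happens here.
passthrough = default_if_none("false", True)
# dict.get only uses its default when the key is absent, not when it is None.
present_none = {"enabled": None}.get("enabled", False)
```

Under these idioms a config value such as the string `"false"` reaches the dataclass as a non-empty string, which is truthy in a boolean context.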
@@ -687,11 +687,6 @@ def load_gateway_config() -> GatewayConfig:
                    os.environ["TELEGRAM_REACTIONS"] = str(telegram_cfg["reactions"]).lower()
                if "proxy_url" in telegram_cfg and not os.getenv("TELEGRAM_PROXY"):
                    os.environ["TELEGRAM_PROXY"] = str(telegram_cfg["proxy_url"]).strip()
-                if "group_allowed_chats" in telegram_cfg and not os.getenv("TELEGRAM_GROUP_ALLOWED_USERS"):
-                    gac = telegram_cfg["group_allowed_chats"]
-                    if isinstance(gac, list):
-                        gac = ",".join(str(v) for v in gac)
-                    os.environ["TELEGRAM_GROUP_ALLOWED_USERS"] = str(gac)
                if "disable_link_previews" in telegram_cfg:
                    plat_data = platforms_data.setdefault(Platform.TELEGRAM.value, {})
                    if not isinstance(plat_data, dict):
@@ -1204,12 +1204,10 @@ class APIServerAdapter(BasePlatformAdapter):

        If the client disconnects mid-stream, ``agent.interrupt()`` is
        called so the agent stops issuing upstream LLM calls, then the
-        asyncio task is cancelled.  When ``store=True`` an initial
-        ``in_progress`` snapshot is persisted immediately after
-        ``response.created`` and disconnects update it to an
-        ``incomplete`` snapshot so GET /v1/responses/{id} and
-        ``previous_response_id`` chaining still have something to
-        recover from.
+        asyncio task is cancelled.  When ``store=True`` the full response
+        is persisted to the ResponseStore in a ``finally`` block so GET
+        /v1/responses/{id} and ``previous_response_id`` chaining work the
+        same as the batch path.
        """
        import queue as _q

@@ -1271,60 +1269,6 @@ class APIServerAdapter(BasePlatformAdapter):
        final_response_text = ""
        agent_error: Optional[str] = None
        usage: Dict[str, int] = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
-        terminal_snapshot_persisted = False
-
-        def _persist_response_snapshot(
-            response_env: Dict[str, Any],
-            *,
-            conversation_history_snapshot: Optional[List[Dict[str, Any]]] = None,
-        ) -> None:
-            if not store:
-                return
-            if conversation_history_snapshot is None:
-                conversation_history_snapshot = list(conversation_history)
-                conversation_history_snapshot.append({"role": "user", "content": user_message})
-            self._response_store.put(response_id, {
-                "response": response_env,
-                "conversation_history": conversation_history_snapshot,
-                "instructions": instructions,
-                "session_id": session_id,
-            })
-            if conversation:
-                self._response_store.set_conversation(conversation, response_id)
-
-        def _persist_incomplete_if_needed() -> None:
-            """Persist an ``incomplete`` snapshot if no terminal one was written.
-
-            Called from both the client-disconnect (``ConnectionResetError``)
-            and server-cancellation (``asyncio.CancelledError``) paths so
-            GET /v1/responses/{id} and ``previous_response_id`` chaining keep
-            working after abrupt stream termination.
-            """
-            if not store or terminal_snapshot_persisted:
-                return
-            incomplete_text = "".join(final_text_parts) or final_response_text
-            incomplete_items: List[Dict[str, Any]] = list(emitted_items)
-            if incomplete_text:
-                incomplete_items.append({
-                    "type": "message",
-                    "role": "assistant",
-                    "content": [{"type": "output_text", "text": incomplete_text}],
-                })
-            incomplete_env = _envelope("incomplete")
-            incomplete_env["output"] = incomplete_items
-            incomplete_env["usage"] = {
-                "input_tokens": usage.get("input_tokens", 0),
-                "output_tokens": usage.get("output_tokens", 0),
-                "total_tokens": usage.get("total_tokens", 0),
-            }
-            incomplete_history = list(conversation_history)
-            incomplete_history.append({"role": "user", "content": user_message})
-            if incomplete_text:
-                incomplete_history.append({"role": "assistant", "content": incomplete_text})
-            _persist_response_snapshot(
-                incomplete_env,
-                conversation_history_snapshot=incomplete_history,
-            )

        try:
            # response.created — initial envelope, status=in_progress
@@ -1334,7 +1278,6 @@ class APIServerAdapter(BasePlatformAdapter):
                "type": "response.created",
                "response": created_env,
            })
-            _persist_response_snapshot(created_env)
            last_activity = time.monotonic()

            async def _open_message_item() -> None:
@@ -1591,18 +1534,6 @@ class APIServerAdapter(BasePlatformAdapter):
                    "output_tokens": usage.get("output_tokens", 0),
                    "total_tokens": usage.get("total_tokens", 0),
                }
-                _failed_history = list(conversation_history)
-                _failed_history.append({"role": "user", "content": user_message})
-                if final_response_text or agent_error:
-                    _failed_history.append({
-                        "role": "assistant",
-                        "content": final_response_text or agent_error,
-                    })
-                _persist_response_snapshot(
-                    failed_env,
-                    conversation_history_snapshot=_failed_history,
-                )
-                terminal_snapshot_persisted = True
                await _write_event("response.failed", {
                    "type": "response.failed",
                    "response": failed_env,
@@ -1615,24 +1546,30 @@ class APIServerAdapter(BasePlatformAdapter):
                    "output_tokens": usage.get("output_tokens", 0),
                    "total_tokens": usage.get("total_tokens", 0),
                }
-                full_history = list(conversation_history)
-                full_history.append({"role": "user", "content": user_message})
-                if isinstance(result, dict) and result.get("messages"):
-                    full_history.extend(result["messages"])
-                else:
-                    full_history.append({"role": "assistant", "content": final_response_text})
-                _persist_response_snapshot(
-                    completed_env,
-                    conversation_history_snapshot=full_history,
-                )
-                terminal_snapshot_persisted = True
                await _write_event("response.completed", {
                    "type": "response.completed",
                    "response": completed_env,
                })

+                # Persist for future chaining / GET retrieval, mirroring
+                # the batch path behavior.
+                if store:
+                    full_history = list(conversation_history)
+                    full_history.append({"role": "user", "content": user_message})
+                    if isinstance(result, dict) and result.get("messages"):
+                        full_history.extend(result["messages"])
+                    else:
+                        full_history.append({"role": "assistant", "content": final_response_text})
+                    self._response_store.put(response_id, {
+                        "response": completed_env,
+                        "conversation_history": full_history,
+                        "instructions": instructions,
+                        "session_id": session_id,
+                    })
+                    if conversation:
+                        self._response_store.set_conversation(conversation, response_id)
+
        except (ConnectionResetError, ConnectionAbortedError, BrokenPipeError, OSError):
-            _persist_incomplete_if_needed()
            # Client disconnected — interrupt the agent so it stops
            # making upstream LLM calls, then cancel the task.
            agent = agent_ref[0] if agent_ref else None
@@ -1648,22 +1585,6 @@ class APIServerAdapter(BasePlatformAdapter):
                except (asyncio.CancelledError, Exception):
                    pass
            logger.info("SSE client disconnected; interrupted agent task %s", response_id)
-        except asyncio.CancelledError:
-            # Server-side cancellation (e.g. shutdown, request timeout) —
-            # persist an incomplete snapshot so GET /v1/responses/{id} and
-            # previous_response_id chaining still work, then re-raise so the
-            # runtime's cancellation semantics are respected.
-            _persist_incomplete_if_needed()
-            agent = agent_ref[0] if agent_ref else None
-            if agent is not None:
-                try:
-                    agent.interrupt("SSE task cancelled")
-                except Exception:
-                    pass
-            if not agent_task.done():
-                agent_task.cancel()
-            logger.info("SSE task cancelled; persisted incomplete snapshot for %s", response_id)
-            raise

        return response

@@ -148,102 +148,7 @@ def _detect_macos_system_proxy() -> str | None:
    return None


-def _split_host_port(value: str) -> tuple[str, int | None]:
-    raw = str(value or "").strip()
-    if not raw:
-        return "", None
-    if "://" in raw:
-        parsed = urlsplit(raw)
-        return (parsed.hostname or "").lower().rstrip("."), parsed.port
-    if raw.startswith("[") and "]" in raw:
-        host, _, rest = raw[1:].partition("]")
-        port = None
-        if rest.startswith(":") and rest[1:].isdigit():
-            port = int(rest[1:])
-        return host.lower().rstrip("."), port
-    if raw.count(":") == 1:
-        host, _, maybe_port = raw.rpartition(":")
-        if maybe_port.isdigit():
-            return host.lower().rstrip("."), int(maybe_port)
-    return raw.lower().strip("[]").rstrip("."), None
-
-
-def _no_proxy_entries() -> list[str]:
-    entries: list[str] = []
-    for key in ("NO_PROXY", "no_proxy"):
-        raw = os.environ.get(key, "")
-        entries.extend(part.strip() for part in raw.split(",") if part.strip())
-    return entries
-
-
-def _no_proxy_entry_matches(entry: str, host: str, port: int | None = None) -> bool:
-    token = str(entry or "").strip().lower()
-    if not token:
-        return False
-    if token == "*":
-        return True
-
-    token_host, token_port = _split_host_port(token)
-    if token_port is not None and port is not None and token_port != port:
-        return False
-    if token_port is not None and port is None:
-        return False
-    if not token_host:
-        return False
-
-    try:
-        network = ipaddress.ip_network(token_host, strict=False)
-        try:
-            return ipaddress.ip_address(host) in network
-        except ValueError:
-            return False
-    except ValueError:
-        pass
-
-    try:
-        token_ip = ipaddress.ip_address(token_host)
-        try:
-            return ipaddress.ip_address(host) == token_ip
-        except ValueError:
-            return False
-    except ValueError:
-        pass
-
-    if token_host.startswith("*."):
-        suffix = token_host[1:]
-        return host.endswith(suffix)
-    if token_host.startswith("."):
-        return host == token_host[1:] or host.endswith(token_host)
-    return host == token_host or host.endswith(f".{token_host}")
-
-
-def should_bypass_proxy(target_hosts: str | list[str] | tuple[str, ...] | set[str] | None) -> bool:
-    """Return True when NO_PROXY/no_proxy matches at least one target host.
-
-    Supports exact hosts, domain suffixes, wildcard suffixes, IP literals,
-    CIDR ranges, optional host:port entries, and ``*``.
-    """
-    entries = _no_proxy_entries()
-    if not entries or not target_hosts:
-        return False
-    if isinstance(target_hosts, str):
-        candidates = [target_hosts]
-    else:
-        candidates = list(target_hosts)
-    for candidate in candidates:
-        host, port = _split_host_port(str(candidate))
-        if not host:
-            continue
-        if any(_no_proxy_entry_matches(entry, host, port) for entry in entries):
-            return True
-    return False
-
-
-def resolve_proxy_url(
-    platform_env_var: str | None = None,
-    *,
-    target_hosts: str | list[str] | tuple[str, ...] | set[str] | None = None,
-) -> str | None:
+def resolve_proxy_url(platform_env_var: str | None = None) -> str | None:
    """Return a proxy URL from env vars, or macOS system proxy.

    Check order:
@@ -251,26 +156,18 @@ def resolve_proxy_url(
      1. HTTPS_PROXY / HTTP_PROXY / ALL_PROXY (and lowercase variants)
      2. macOS system proxy via ``scutil --proxy`` (auto-detect)

-    Returns *None* if no proxy is found, or if NO_PROXY/no_proxy matches one
-    of ``target_hosts``.
+    Returns *None* if no proxy is found.
    """
    if platform_env_var:
        value = (os.environ.get(platform_env_var) or "").strip()
        if value:
-            if should_bypass_proxy(target_hosts):
-                return None
            return normalize_proxy_url(value)
    for key in ("HTTPS_PROXY", "HTTP_PROXY", "ALL_PROXY",
                "https_proxy", "http_proxy", "all_proxy"):
        value = (os.environ.get(key) or "").strip()
        if value:
-            if should_bypass_proxy(target_hosts):
-                return None
            return normalize_proxy_url(value)
-    detected = normalize_proxy_url(_detect_macos_system_proxy())
-    if detected and should_bypass_proxy(target_hosts):
-        return None
-    return detected
+    return normalize_proxy_url(_detect_macos_system_proxy())


 def proxy_kwargs_for_bot(proxy_url: str | None) -> dict:
@@ -2543,9 +2440,6 @@ class BasePlatformAdapter(ABC):
        user_id_alt: Optional[str] = None,
        chat_id_alt: Optional[str] = None,
        is_bot: bool = False,
-        guild_id: Optional[str] = None,
-        parent_chat_id: Optional[str] = None,
-        message_id: Optional[str] = None,
    ) -> SessionSource:
        """Helper to build a SessionSource for this platform."""
        # Normalize empty topic to None
@@ -2563,9 +2457,6 @@ class BasePlatformAdapter(ABC):
            user_id_alt=user_id_alt,
            chat_id_alt=chat_id_alt,
            is_bot=is_bot,
-            guild_id=str(guild_id) if guild_id else None,
-            parent_chat_id=str(parent_chat_id) if parent_chat_id else None,
-            message_id=str(message_id) if message_id else None,
        )
    
    @abstractmethod
@@ -99,7 +99,6 @@ def _normalize_server_url(raw: str) -> str:

 class BlueBubblesAdapter(BasePlatformAdapter):
    platform = Platform.BLUEBUBBLES
-    SUPPORTS_MESSAGE_EDITING = False
    MAX_MESSAGE_LENGTH = MAX_TEXT_LENGTH

    def __init__(self, config: PlatformConfig):
@@ -392,13 +391,6 @@ class BlueBubblesAdapter(BasePlatformAdapter):
    # Text sending
    # ------------------------------------------------------------------

-    @staticmethod
-    def truncate_message(content: str, max_length: int = MAX_TEXT_LENGTH) -> List[str]:
-        # Use the base splitter but skip pagination indicators — iMessage
-        # bubbles flow naturally without "(1/3)" suffixes.
-        chunks = BasePlatformAdapter.truncate_message(content, max_length)
-        return [re.sub(r"\s*\(\d+/\d+\)$", "", c) for c in chunks]
-
    async def send(
        self,
        chat_id: str,
@@ -406,19 +398,10 @@ class BlueBubblesAdapter(BasePlatformAdapter):
        reply_to: Optional[str] = None,
        metadata: Optional[Dict[str, Any]] = None,
    ) -> SendResult:
-        text = self.format_message(content)
+        text = strip_markdown(content or "")
        if not text:
            return SendResult(success=False, error="BlueBubbles send requires text")
-        # Split on paragraph breaks first (double newlines) so each thought
-        # becomes its own iMessage bubble, then truncate any that are still
-        # too long.
-        paragraphs = [p.strip() for p in re.split(r'\n\s*\n', text) if p.strip()]
-        chunks: List[str] = []
-        for para in (paragraphs or [text]):
-            if len(para) <= self.MAX_MESSAGE_LENGTH:
-                chunks.append(para)
-            else:
-                chunks.extend(self.truncate_message(para, max_length=self.MAX_MESSAGE_LENGTH))
+        chunks = self.truncate_message(text, max_length=self.MAX_MESSAGE_LENGTH)
        last = SendResult(success=True)
        for chunk in chunks:
            guid = await self._resolve_chat_guid(chat_id)
@@ -2246,6 +2246,10 @@ class DiscordAdapter(BasePlatformAdapter):
        async def slash_usage(interaction: discord.Interaction):
            await self._run_simple_slash(interaction, "/usage")

+        @tree.command(name="provider", description="Show available providers")
+        async def slash_provider(interaction: discord.Interaction):
+            await self._run_simple_slash(interaction, "/provider")
+
        @tree.command(name="help", description="Show available commands")
        async def slash_help(interaction: discord.Interaction):
            await self._run_simple_slash(interaction, "/help")
@@ -2715,12 +2719,7 @@ class DiscordAdapter(BasePlatformAdapter):
        return os.getenv("DISCORD_REQUIRE_MENTION", "true").lower() not in ("false", "0", "no", "off")

    def _discord_free_response_channels(self) -> set:
-        """Return Discord channel IDs where no bot mention is required.
-
-        A single ``"*"`` entry (either from a list or a comma-separated
-        string) is preserved in the returned set so callers can short-circuit
-        on wildcard membership, consistent with ``allowed_channels``.
-        """
+        """Return Discord channel IDs where no bot mention is required."""
        raw = self.config.extra.get("free_response_channels")
        if raw is None:
            raw = os.getenv("DISCORD_FREE_RESPONSE_CHANNELS", "")
@@ -3213,14 +3212,14 @@ class DiscordAdapter(BasePlatformAdapter):
            allowed_channels_raw = os.getenv("DISCORD_ALLOWED_CHANNELS", "")
            if allowed_channels_raw:
                allowed_channels = {ch.strip() for ch in allowed_channels_raw.split(",") if ch.strip()}
-                if "*" not in allowed_channels and not (channel_ids & allowed_channels):
+                if not (channel_ids & allowed_channels):
                    logger.debug("[%s] Ignoring message in non-allowed channel: %s", self.name, channel_ids)
                    return

            # Check ignored channels - never respond even when mentioned
            ignored_channels_raw = os.getenv("DISCORD_IGNORED_CHANNELS", "")
            ignored_channels = {ch.strip() for ch in ignored_channels_raw.split(",") if ch.strip()}
-            if "*" in ignored_channels or (channel_ids & ignored_channels):
+            if channel_ids & ignored_channels:
                logger.debug("[%s] Ignoring message in ignored channel: %s", self.name, channel_ids)
                return

@@ -3234,11 +3233,7 @@ class DiscordAdapter(BasePlatformAdapter):
            voice_linked_ids = {str(ch_id) for ch_id in self._voice_text_channels.values()}
            current_channel_id = str(message.channel.id)
            is_voice_linked_channel = current_channel_id in voice_linked_ids
-            is_free_channel = (
-                "*" in free_channels
-                or bool(channel_ids & free_channels)
-                or is_voice_linked_channel
-            )
+            is_free_channel = bool(channel_ids & free_channels) or is_voice_linked_channel

            # Skip the mention check if the message is in a thread where
            # the bot has previously participated (auto-created or replied in).
@@ -3261,7 +3256,6 @@ class DiscordAdapter(BasePlatformAdapter):
            if auto_thread and not skip_thread and not is_voice_linked_channel and not is_reply_message:
                thread = await self._auto_create_thread(message)
                if thread:
-                    parent_channel_id = str(message.channel.id)
                    is_thread = True
                    thread_id = str(thread.id)
                    auto_threaded_channel = thread
@@ -3321,9 +3315,6 @@ class DiscordAdapter(BasePlatformAdapter):
            thread_id=thread_id,
            chat_topic=chat_topic,
            is_bot=getattr(message.author, "bot", False),
-            guild_id=str(message.guild.id) if message.guild else None,
-            parent_chat_id=parent_channel_id,
-            message_id=str(message.id),
        )

        # Build media URLs -- download image attachments to local cache so the
@@ -3875,15 +3866,6 @@ if DISCORD_AVAILABLE:

            self.resolved = True
            model_id = interaction.data["values"][0]
-            self.clear_items()
-            await interaction.response.edit_message(
-                embed=discord.Embed(
-                    title="⚙ Switching Model",
-                    description=f"Switching to `{model_id}`...",
-                    color=discord.Color.blue(),
-                ),
-                view=None,
-            )

            try:
                result_text = await self.on_model_selected(
@@ -3894,13 +3876,14 @@ if DISCORD_AVAILABLE:
            except Exception as exc:
                result_text = f"Error switching model: {exc}"

-            await interaction.edit_original_response(
+            self.clear_items()
+            await interaction.response.edit_message(
                embed=discord.Embed(
                    title="⚙ Model Switched",
                    description=result_text,
                    color=discord.Color.green(),
                ),
-                view=None,
+                view=self,
            )

        async def _on_back(self, interaction: discord.Interaction):
@@ -532,20 +532,6 @@ class MatrixAdapter(BasePlatformAdapter):
                )
                await crypto_store.open()

-                # Bind the store to the runtime device_id before any
-                # put_account() runs. PgCryptoStore defaults _device_id
-                # to "" and its crypto_account UPSERT never updates the
-                # device_id column on conflict — so once put_account
-                # writes blank, it stays blank forever. That breaks
-                # every downstream device-scoped olm operation: peer
-                # to-device ciphertext can't find our identity key and
-                # no megolm sessions ever land. Setting _device_id here
-                # (in-memory; the on-disk row may not exist yet) makes
-                # the first put_account write the correct value.
-                # DeviceID is a NewType(str) so plain str works at runtime.
-                if client.device_id:
-                    await crypto_store.put_device_id(client.device_id)
-
                crypto_state = _CryptoStateStore(state_store, self._joined_rooms)
                olm = OlmMachine(client, crypto_store, crypto_state)

@@ -703,6 +703,7 @@ class TelegramAdapter(BasePlatformAdapter):
                "write_timeout": _env_float("HERMES_TELEGRAM_HTTP_WRITE_TIMEOUT", 20.0),
            }

+            proxy_url = resolve_proxy_url("TELEGRAM_PROXY")
            disable_fallback = (os.getenv("HERMES_TELEGRAM_DISABLE_FALLBACK_IPS", "").strip().lower() in ("1", "true", "yes", "on"))
            fallback_ips = self._fallback_ips()
            if not fallback_ips:
@@ -713,8 +714,6 @@ class TelegramAdapter(BasePlatformAdapter):
                    ", ".join(fallback_ips),
                )

-            proxy_targets = ["api.telegram.org", *fallback_ips]
-            proxy_url = resolve_proxy_url("TELEGRAM_PROXY", target_hosts=proxy_targets)
            if fallback_ips and not proxy_url and not disable_fallback:
                logger.info(
                    "[%s] Telegram fallback IPs active: %s",
@@ -43,10 +43,10 @@ _DOH_PROVIDERS: list[dict] = [
 _SEED_FALLBACK_IPS: list[str] = ["149.154.167.220"]


-def _resolve_proxy_url(target_hosts=None) -> str | None:
+def _resolve_proxy_url() -> str | None:
    # Delegate to shared implementation (env vars + macOS system proxy detection)
    from gateway.platforms.base import resolve_proxy_url
-    return resolve_proxy_url("TELEGRAM_PROXY", target_hosts=target_hosts)
+    return resolve_proxy_url("TELEGRAM_PROXY")


 class TelegramFallbackTransport(httpx.AsyncBaseTransport):
@@ -60,7 +60,7 @@ class TelegramFallbackTransport(httpx.AsyncBaseTransport):

    def __init__(self, fallback_ips: Iterable[str], **transport_kwargs):
        self._fallback_ips = [ip for ip in dict.fromkeys(_normalize_fallback_ips(fallback_ips))]
-        proxy_url = _resolve_proxy_url(target_hosts=[_TELEGRAM_API_HOST, *self._fallback_ips])
+        proxy_url = _resolve_proxy_url()
        if proxy_url and "proxy" not in transport_kwargs:
            transport_kwargs["proxy"] = proxy_url
        self._primary = httpx.AsyncHTTPTransport(**transport_kwargs)
@@ -14,7 +14,6 @@ Usage:
 """

 import asyncio
-import dataclasses
 import json
 import logging
 import os
@@ -298,16 +297,50 @@ from gateway.restart import (
 )


-from gateway.whatsapp_identity import (
-    canonical_whatsapp_identifier as _canonical_whatsapp_identifier,  # noqa: F401
-    expand_whatsapp_aliases as _expand_whatsapp_auth_aliases,
-    normalize_whatsapp_identifier as _normalize_whatsapp_identifier,
-)
+def _normalize_whatsapp_identifier(value: str) -> str:
+    """Strip WhatsApp JID/LID syntax down to its stable numeric identifier."""
+    return (
+        str(value or "")
+        .strip()
+        .replace("+", "", 1)
+        .split(":", 1)[0]
+        .split("@", 1)[0]
+    )


+def _expand_whatsapp_auth_aliases(identifier: str) -> set:
+    """Resolve WhatsApp phone/LID aliases using bridge session mapping files."""
+    normalized = _normalize_whatsapp_identifier(identifier)
+    if not normalized:
+        return set()
+
+    session_dir = _hermes_home / "whatsapp" / "session"
+    resolved = set()
+    queue = [normalized]
+
+    while queue:
+        current = queue.pop(0)
+        if not current or current in resolved:
+            continue
+
+        resolved.add(current)
+        for suffix in ("", "_reverse"):
+            mapping_path = session_dir / f"lid-mapping-{current}{suffix}.json"
+            if not mapping_path.exists():
+                continue
+            try:
+                mapped = _normalize_whatsapp_identifier(
+                    json.loads(mapping_path.read_text(encoding="utf-8"))
+                )
+            except Exception:
+                continue
+            if mapped and mapped not in resolved:
+                queue.append(mapped)
+
+    return resolved
+
 logger = logging.getLogger(__name__)

-
 # Sentinel placed into _running_agents immediately when a session starts
 # processing, *before* any await.  Prevents a second message for the same
 # session from bypassing the "already running" guard during the async gap
@@ -316,30 +349,16 @@ _AGENT_PENDING_SENTINEL = object()


 def _resolve_runtime_agent_kwargs() -> dict:
-    """Resolve provider credentials for gateway-created AIAgent instances.
-
-    If the primary provider fails with an authentication error, attempt to
-    resolve credentials using the fallback provider chain from config.yaml
-    before giving up.
-    """
+    """Resolve provider credentials for gateway-created AIAgent instances."""
    from hermes_cli.runtime_provider import (
        resolve_runtime_provider,
        format_runtime_provider_error,
    )
-    from hermes_cli.auth import AuthError

    try:
        runtime = resolve_runtime_provider(
            requested=os.getenv("HERMES_INFERENCE_PROVIDER"),
        )
-    except AuthError as auth_exc:
-        # Primary provider auth failed (expired token, revoked key, etc.).
-        # Try the fallback provider chain before raising.
-        logger.warning("Primary provider auth failed: %s — trying fallback", auth_exc)
-        fb_config = _try_resolve_fallback_provider()
-        if fb_config is not None:
-            return fb_config
-        raise RuntimeError(format_runtime_provider_error(auth_exc)) from auth_exc
    except Exception as exc:
        raise RuntimeError(format_runtime_provider_error(exc)) from exc

@@ -354,48 +373,6 @@ def _resolve_runtime_agent_kwargs() -> dict:
    }


-def _try_resolve_fallback_provider() -> dict | None:
-    """Attempt to resolve credentials from the fallback_model/fallback_providers config."""
-    from hermes_cli.runtime_provider import resolve_runtime_provider
-    try:
-        import yaml as _y
-        cfg_path = _hermes_home / "config.yaml"
-        if not cfg_path.exists():
-            return None
-        with open(cfg_path, encoding="utf-8") as _f:
-            cfg = _y.safe_load(_f) or {}
-        fb = cfg.get("fallback_providers") or cfg.get("fallback_model")
-        if not fb:
-            return None
-        # Normalize to list
-        fb_list = fb if isinstance(fb, list) else [fb]
-        for entry in fb_list:
-            if not isinstance(entry, dict):
-                continue
-            try:
-                runtime = resolve_runtime_provider(
-                    requested=entry.get("provider"),
-                    explicit_base_url=entry.get("base_url"),
-                    explicit_api_key=entry.get("api_key"),
-                )
-                logger.info("Fallback provider resolved: %s", runtime.get("provider"))
-                return {
-                    "api_key": runtime.get("api_key"),
-                    "base_url": runtime.get("base_url"),
-                    "provider": runtime.get("provider"),
-                    "api_mode": runtime.get("api_mode"),
-                    "command": runtime.get("command"),
-                    "args": list(runtime.get("args") or []),
-                    "credential_pool": runtime.get("credential_pool"),
-                }
-            except Exception as fb_exc:
-                logger.debug("Fallback entry %s failed: %s", entry.get("provider"), fb_exc)
-                continue
-    except Exception:
-        pass
-    return None
-
-
 def _build_media_placeholder(event) -> str:
    """Build a text placeholder for media-only events so they aren't dropped.

@@ -2332,17 +2309,6 @@ class GatewayRunner:
                for key, entry in _expired_entries:
                    try:
                        await self._async_flush_memories(entry.session_id, key)
-                        try:
-                            from hermes_cli.plugins import invoke_hook as _invoke_hook
-                            _parts = key.split(":")
-                            _platform = _parts[2] if len(_parts) > 2 else ""
-                            _invoke_hook(
-                                "on_session_finalize",
-                                session_id=entry.session_id,
-                                platform=_platform,
-                            )
-                        except Exception:
-                            pass
                        # Shut down memory provider and close tool resources
                        # on the cached agent.  Idle agents live in
                        # _agent_cache (not _running_agents), so look there.
@@ -3003,7 +2969,6 @@ class GatewayRunner:
            Platform.QQBOT: "QQ_ALLOWED_USERS",
        }
        platform_group_env_map = {
-            Platform.TELEGRAM: "TELEGRAM_GROUP_ALLOWED_USERS",
            Platform.QQBOT: "QQ_GROUP_ALLOWED_USERS",
        }
        platform_allow_all_map = {
@@ -3060,7 +3025,7 @@ class GatewayRunner:
        # Check platform-specific and global allowlists
        platform_allowlist = os.getenv(platform_env_map.get(source.platform, ""), "").strip()
        group_allowlist = ""
-        if source.chat_type in {"group", "forum"}:
+        if source.chat_type == "group":
            group_allowlist = os.getenv(platform_group_env_map.get(source.platform, ""), "").strip()
        global_allowlist = os.getenv("GATEWAY_ALLOWED_USERS", "").strip()

@@ -3069,7 +3034,7 @@ class GatewayRunner:
            return os.getenv("GATEWAY_ALLOW_ALL_USERS", "").lower() in ("true", "1", "yes")

        # Some platforms authorize group traffic by chat ID rather than sender ID.
-        if group_allowlist and source.chat_type in {"group", "forum"} and source.chat_id:
+        if group_allowlist and source.chat_type == "group" and source.chat_id:
            allowed_group_ids = {
                chat_id.strip() for chat_id in group_allowlist.split(",") if chat_id.strip()
            }
@@ -3180,50 +3145,7 @@ class GatewayRunner:

        # Internal events (e.g. background-process completion notifications)
        # are system-generated and must skip user authorization.
-        is_internal = bool(getattr(event, "internal", False))
-
-        # Fire pre_gateway_dispatch plugin hook for user-originated messages.
-        # Plugins receive the MessageEvent and may return a dict influencing flow:
-        #   {"action": "skip",    "reason": ...}    -> drop (no reply, plugin handled)
-        #   {"action": "rewrite", "text":  ...}     -> replace event.text, continue
-        #   {"action": "allow"}   /   None          -> normal dispatch
-        # Hook runs BEFORE auth so plugins can handle unauthorized senders
-        # (e.g. customer handover ingest) without triggering the pairing flow.
-        if not is_internal:
-            try:
-                from hermes_cli.plugins import invoke_hook as _invoke_hook
-                _hook_results = _invoke_hook(
-                    "pre_gateway_dispatch",
-                    event=event,
-                    gateway=self,
-                    session_store=self.session_store,
-                )
-            except Exception as _hook_exc:
-                logger.warning("pre_gateway_dispatch invocation failed: %s", _hook_exc)
-                _hook_results = []
-
-            for _result in _hook_results:
-                if not isinstance(_result, dict):
-                    continue
-                _action = _result.get("action")
-                if _action == "skip":
-                    logger.info(
-                        "pre_gateway_dispatch skip: reason=%s platform=%s chat=%s",
-                        _result.get("reason"),
-                        source.platform.value if source.platform else "unknown",
-                        source.chat_id or "unknown",
-                    )
-                    return None
-                if _action == "rewrite":
-                    _new_text = _result.get("text")
-                    if isinstance(_new_text, str):
-                        event = dataclasses.replace(event, text=_new_text)
-                        source = event.source
-                    break
-                if _action == "allow":
-                    break
-
-        if is_internal:
+        if getattr(event, "internal", False):
            pass
        elif source.user_id is None:
            # Messages with no user identity (Telegram service messages,
@@ -3520,7 +3442,7 @@ class GatewayRunner:
            # running-agent guard. Reject gracefully rather than falling
            # through to interrupt + discard. Without this, commands
            # like /model, /reasoning, /voice, /insights, /title,
-            # /resume, /retry, /undo, /compress, /usage,
+            # /resume, /retry, /undo, /compress, /usage, /provider,
            # /reload-mcp, /sethome, /reset (all registered as Discord
            # slash commands) would interrupt the agent AND get
            # silently discarded by the slash-command safety net,
@@ -3591,10 +3513,6 @@ class GatewayRunner:
                    if self._queue_during_drain_enabled()
                    else f"⏳ Gateway is {self._status_action_gerund()} and is not accepting another turn right now."
                )
-            if self._busy_input_mode == "queue":
-                logger.debug("PRIORITY queue follow-up for session %s", _quick_key[:20])
-                self._queue_or_replace_pending_event(_quick_key, event)
-                return None
            logger.debug("PRIORITY interrupt for session %s", _quick_key[:20])
            running_agent.interrupt(event.text)
            if _quick_key in self._pending_messages:
@@ -3711,9 +3629,34 @@ class GatewayRunner:
        if canonical == "model":
            return await self._handle_model_command(event)

+        if canonical == "provider":
+            return await self._handle_provider_command(event)
+        
        if canonical == "personality":
            return await self._handle_personality_command(event)

+        if canonical == "plan":
+            try:
+                from agent.skill_commands import build_plan_path, build_skill_invocation_message
+
+                user_instruction = event.get_command_args().strip()
+                plan_path = build_plan_path(user_instruction)
+                event.text = build_skill_invocation_message(
+                    "/plan",
+                    user_instruction,
+                    task_id=_quick_key,
+                    runtime_note=(
+                        "Save the markdown plan with write_file to this exact relative path "
+                        f"inside the active workspace/backend cwd: {plan_path}"
+                    ),
+                )
+                if not event.text:
+                    return "Failed to load the bundled /plan skill."
+                canonical = None
+            except Exception as e:
+                logger.exception("Failed to prepare /plan command")
+                return f"Failed to enter plan mode: {e}"
+        
        if canonical == "retry":
            return await self._handle_retry_command(event)
        
@@ -5659,17 +5602,9 @@ class GatewayRunner:
                        lines = [f"Model switched to `{result.new_model}`"]
                        lines.append(f"Provider: {plabel}")
                        mi = result.model_info
-                        from hermes_cli.model_switch import resolve_display_context_length
-                        ctx = resolve_display_context_length(
-                            result.new_model,
-                            result.target_provider,
-                            base_url=result.base_url or current_base_url or "",
-                            api_key=result.api_key or current_api_key or "",
-                            model_info=mi,
-                        )
-                        if ctx:
-                            lines.append(f"Context: {ctx:,} tokens")
                        if mi:
+                            if mi.context_window:
+                                lines.append(f"Context: {mi.context_window:,} tokens")
                            if mi.max_output:
                                lines.append(f"Max output: {mi.max_output:,} tokens")
                            if mi.has_cost_data():
@@ -5803,25 +5738,28 @@ class GatewayRunner:
        lines = [f"Model switched to `{result.new_model}`"]
        lines.append(f"Provider: {provider_label}")

-        # Context: always resolve via the provider-aware chain so Codex OAuth,
-        # Copilot, and Nous-enforced caps win over the raw models.dev entry.
+        # Rich metadata from models.dev
        mi = result.model_info
-        from hermes_cli.model_switch import resolve_display_context_length
-        ctx = resolve_display_context_length(
-            result.new_model,
-            result.target_provider,
-            base_url=result.base_url or current_base_url or "",
-            api_key=result.api_key or current_api_key or "",
-            model_info=mi,
-        )
-        if ctx:
-            lines.append(f"Context: {ctx:,} tokens")
        if mi:
+            if mi.context_window:
+                lines.append(f"Context: {mi.context_window:,} tokens")
            if mi.max_output:
                lines.append(f"Max output: {mi.max_output:,} tokens")
            if mi.has_cost_data():
                lines.append(f"Cost: {mi.format_cost()}")
            lines.append(f"Capabilities: {mi.format_capabilities()}")
+        else:
+            try:
+                from agent.model_metadata import get_model_context_length
+                ctx = get_model_context_length(
+                    result.new_model,
+                    base_url=result.base_url or current_base_url,
+                    api_key=result.api_key or current_api_key,
+                    provider=result.target_provider,
+                )
+                lines.append(f"Context: {ctx:,} tokens")
+            except Exception:
+                pass

        # Cache notice
        cache_enabled = (
@@ -5841,6 +5779,63 @@ class GatewayRunner:

        return "\n".join(lines)

+    async def _handle_provider_command(self, event: MessageEvent) -> str:
+        """Handle /provider command - show available providers."""
+        import yaml
+        from hermes_cli.models import (
+            list_available_providers,
+            normalize_provider,
+            _PROVIDER_LABELS,
+        )
+
+        # Resolve current provider from config
+        current_provider = "openrouter"
+        model_cfg = {}
+        config_path = _hermes_home / 'config.yaml'
+        try:
+            if config_path.exists():
+                with open(config_path, encoding="utf-8") as f:
+                    cfg = yaml.safe_load(f) or {}
+                model_cfg = cfg.get("model", {})
+                if isinstance(model_cfg, dict):
+                    current_provider = model_cfg.get("provider", current_provider)
+        except Exception:
+            pass
+
+        current_provider = normalize_provider(current_provider)
+        if current_provider == "auto":
+            try:
+                from hermes_cli.auth import resolve_provider as _resolve_provider
+                current_provider = _resolve_provider(current_provider)
+            except Exception:
+                current_provider = "openrouter"
+
+        # Detect custom endpoint from config base_url
+        if current_provider == "openrouter":
+            _cfg_base = model_cfg.get("base_url", "") if isinstance(model_cfg, dict) else ""
+            if _cfg_base and "openrouter.ai" not in _cfg_base:
+                current_provider = "custom"
+
+        current_label = _PROVIDER_LABELS.get(current_provider, current_provider)
+
+        lines = [
+            f"🔌 **Current provider:** {current_label} (`{current_provider}`)",
+            "",
+            "**Available providers:**",
+        ]
+
+        providers = list_available_providers()
+        for p in providers:
+            marker = " ← active" if p["id"] == current_provider else ""
+            auth = "✅" if p["authenticated"] else "❌"
+            aliases = f"  _(also: {', '.join(p['aliases'])})_" if p["aliases"] else ""
+            lines.append(f"{auth} `{p['id']}` — {p['label']}{aliases}{marker}")
+
+        lines.append("")
+        lines.append("Switch: `/model provider:model-name`")
+        lines.append("Setup: `hermes setup`")
+        return "\n".join(lines)
+    
    async def _handle_personality_command(self, event: MessageEvent) -> str:
        """Handle /personality command - list or set a personality."""
        import yaml
@@ -7107,7 +7102,10 @@ class GatewayRunner:
                tmp_agent._print_fn = lambda *a, **kw: None

                compressor = tmp_agent.context_compressor
-                if not compressor.has_content_to_compress(msgs):
+                compress_start = compressor.protect_first_n
+                compress_start = compressor._align_boundary_forward(msgs, compress_start)
+                compress_end = compressor._find_tail_cut_by_tokens(msgs, compress_start)
+                if compress_start >= compress_end:
                    return "Nothing to compress yet (the transcript is still all protected context)."

                loop = asyncio.get_running_loop()
@@ -7233,19 +7231,13 @@ class GatewayRunner:
                logger.debug("Failed to list titled sessions: %s", e)
                return f"Could not list sessions: {e}"

-        # Resolve the name to a session ID.
+        # Resolve the name to a session ID
        target_id = self._session_db.resolve_session_by_title(name)
        if not target_id:
            return (
                f"No session found matching '**{name}**'.\n"
                "Use `/resume` with no arguments to see available sessions."
            )
-        # Compression creates child continuations that hold the live transcript.
-        # Follow that chain so gateway /resume matches CLI behavior (#15000).
-        try:
-            target_id = self._session_db.resolve_resume_session_id(target_id)
-        except Exception as e:
-            logger.debug("Failed to resolve resume continuation for %s: %s", target_id, e)

        # Check if already on that session
        current_entry = self.session_store.get_or_create_session(source)
@@ -60,10 +60,6 @@ from .config import (
    SessionResetPolicy,  # noqa: F401 — re-exported via gateway/__init__.py
    HomeChannel,
 )
-from .whatsapp_identity import (
-    canonical_whatsapp_identifier,
-    normalize_whatsapp_identifier,
-)


@dataclass
@@ -87,9 +83,6 @@ class SessionSource:
    user_id_alt: Optional[str] = None  # Platform-specific stable alt ID (Signal UUID, Feishu union_id)
    chat_id_alt: Optional[str] = None  # Signal group internal ID
    is_bot: bool = False  # True when the message author is a bot/webhook (Discord)
-    guild_id: Optional[str] = None  # Discord guild / Slack workspace / Matrix server scope
-    parent_chat_id: Optional[str] = None  # Parent channel when chat_id refers to a thread
-    message_id: Optional[str] = None  # ID of the triggering message (for pin/reply/react)
    
    @property
    def description(self) -> str:
@@ -127,14 +120,8 @@ class SessionSource:
            d["user_id_alt"] = self.user_id_alt
        if self.chat_id_alt:
            d["chat_id_alt"] = self.chat_id_alt
-        if self.guild_id:
-            d["guild_id"] = self.guild_id
-        if self.parent_chat_id:
-            d["parent_chat_id"] = self.parent_chat_id
-        if self.message_id:
-            d["message_id"] = self.message_id
        return d
-
+    
    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "SessionSource":
        return cls(
@@ -148,9 +135,6 @@ class SessionSource:
            chat_topic=data.get("chat_topic"),
            user_id_alt=data.get("user_id_alt"),
            chat_id_alt=data.get("chat_id_alt"),
-            guild_id=data.get("guild_id"),
-            parent_chat_id=data.get("parent_chat_id"),
-            message_id=data.get("message_id"),
        )
    

@@ -202,31 +186,6 @@ that requires raw IDs).  Discord is excluded because mentions use ``<@user_id>``
 and the LLM needs the real ID to tag users."""


-def _discord_tools_loaded() -> bool:
-    """True iff the agent will actually have Discord tools this session.
-
-    Two conditions must hold:
-      1. The `discord` or `discord_admin` toolset is enabled for the
-         Discord platform via `hermes tools` (opt-in, default OFF).
-      2. `DISCORD_BOT_TOKEN` is set — the tool's `check_fn` gates on it
-         at registry time, so the toolset being enabled in config is not
-         enough if the token isn't configured.
-
-    Returns False (safe default — keeps the stale-API disclaimer) on any
-    error so a bad config can't silently promise tools the agent lacks.
-    """
-    if not (os.environ.get("DISCORD_BOT_TOKEN") or "").strip():
-        return False
-    try:
-        from hermes_cli.config import load_config
-        from hermes_cli.tools_config import _get_platform_tools
-        cfg = load_config()
-        enabled = _get_platform_tools(cfg, "discord", include_default_mcp_servers=False)
-        return "discord" in enabled or "discord_admin" in enabled
-    except Exception:
-        return False
-
-
 def build_session_context_prompt(
    context: SessionContext,
    *,
@@ -314,44 +273,13 @@ def build_session_context_prompt(
            "that you can only read messages sent directly to you and respond."
        )
    elif context.source.platform == Platform.DISCORD:
-        # Inject the Discord IDs block only when the agent actually has
-        # Discord tools loaded this session — i.e. the user opted into
-        # `discord` / `discord_admin` via `hermes tools` AND the bot
-        # token is configured.  Otherwise keep the stale-API disclaimer
-        # honest so we never promise tools the agent lacks.
-        if _discord_tools_loaded():
-            src = context.source
-            id_lines = ["", "**Discord IDs (for the `discord` / `discord_admin` tools):**"]
-            if src.guild_id:
-                id_lines.append(f"  - Guild: `{src.guild_id}`")
-            if src.thread_id and src.parent_chat_id:
-                id_lines.append(f"  - Parent channel: `{src.parent_chat_id}`")
-                id_lines.append(f"  - Thread: `{src.thread_id}` (use as `channel_id` for fetch_messages etc.)")
-            else:
-                id_lines.append(f"  - Channel: `{src.chat_id}`")
-            if src.message_id:
-                id_lines.append(f"  - Triggering message: `{src.message_id}`")
-            lines.extend(id_lines)
-        else:
-            lines.append("")
-            lines.append(
-                "**Platform notes:** You are running inside Discord. "
-                "You do NOT have access to Discord-specific APIs — you cannot search "
-                "channel history, pin messages, manage roles, or list server members. "
-                "Do not promise to perform these actions. If the user asks, explain "
-                "that you can only read messages sent directly to you and respond."
-            )
-    elif context.source.platform == Platform.BLUEBUBBLES:
        lines.append("")
        lines.append(
-            "**Platform notes:** You are responding via iMessage. "
-            "Keep responses short and conversational — think texts, not essays. "
-            "Structure longer replies as separate short thoughts, each separated "
-            "by a blank line (double newline). Each block between blank lines "
-            "will be delivered as its own iMessage bubble, so write accordingly: "
-            "one idea per bubble, 1–3 sentences each. "
-            "If the user needs a detailed answer, give the short version first "
-            "and offer to elaborate."
+            "**Platform notes:** You are running inside Discord. "
+            "You do NOT have access to Discord-specific APIs — you cannot search "
+            "channel history, pin messages, manage roles, or list server members. "
+            "Do not promise to perform these actions. If the user asks, explain "
+            "that you can only read messages sent directly to you and respond."
        )

    # Connected platforms
@@ -590,24 +518,15 @@ def build_session_key(
    """
    platform = source.platform.value
    if source.chat_type == "dm":
-        dm_chat_id = source.chat_id
-        if source.platform == Platform.WHATSAPP:
-            dm_chat_id = canonical_whatsapp_identifier(source.chat_id)
-
-        if dm_chat_id:
+        if source.chat_id:
            if source.thread_id:
-                return f"agent:main:{platform}:dm:{dm_chat_id}:{source.thread_id}"
-            return f"agent:main:{platform}:dm:{dm_chat_id}"
+                return f"agent:main:{platform}:dm:{source.chat_id}:{source.thread_id}"
+            return f"agent:main:{platform}:dm:{source.chat_id}"
        if source.thread_id:
            return f"agent:main:{platform}:dm:{source.thread_id}"
        return f"agent:main:{platform}:dm"

    participant_id = source.user_id_alt or source.user_id
-    if participant_id and source.platform == Platform.WHATSAPP:
-        # Same JID/LID-flip bug as the DM case: without canonicalisation, a
-        # single group member gets two isolated per-user sessions when the
-        # bridge reshuffles alias forms.
-        participant_id = canonical_whatsapp_identifier(str(participant_id)) or participant_id
    key_parts = ["agent:main", platform, source.chat_type]

    if source.chat_id:
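
Review note: with the WhatsApp canonicalisation removed, the DM branch of `build_session_key` composes the key from the raw `chat_id`. The sketch below mirrors that branch as shown in the hunk above. The helper name `dm_session_key` and the platform string `"whatsapp"` are illustrative assumptions, not names taken from the codebase.

```python
from typing import Optional


def dm_session_key(platform: str, chat_id: Optional[str], thread_id: Optional[str] = None) -> str:
    """Compose a DM session key from the raw chat_id (hypothetical helper).

    Mirrors the four cases in the hunk: chat_id + thread_id, chat_id only,
    thread_id only, and neither.
    """
    parts = ["agent:main", platform, "dm"]
    if chat_id:
        parts.append(chat_id)
    if thread_id:
        parts.append(thread_id)
    return ":".join(parts)


# Two alias forms of the same sender now yield two distinct keys,
# because the identifier is no longer collapsed before keying.
phone_key = dm_session_key("whatsapp", "15551234567@s.whatsapp.net")
lid_key = dm_session_key("whatsapp", "999999999999999@lid")
```

If the bridge alternates between the phone JID and the LID for one person, `phone_key` and `lid_key` differ, so that person would land in two separate sessions. Whether that trade-off is intended is worth confirming with the author.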
@@ -1,135 +0,0 @@
-"""Shared helpers for canonicalising WhatsApp sender identity.
-
-WhatsApp's bridge can surface the same human under two different JID shapes
-within a single conversation:
-
- LID form: ``999999999999999@lid``
- Phone form: ``15551234567@s.whatsapp.net``
-
-Both the authorisation path (:mod:`gateway.run`) and the session-key path
-(:mod:`gateway.session`) need to collapse these aliases to a single stable
-identity. This module is the single source of truth for that resolution so
-the two paths can never drift apart.
-
-Public helpers:
-
- :func:`normalize_whatsapp_identifier` — strip JID/LID/device/plus syntax
-  down to the bare numeric identifier.
- :func:`canonical_whatsapp_identifier` — walk the bridge's
-  ``lid-mapping-*.json`` files and return a stable canonical identity
-  across phone/LID variants.
- :func:`expand_whatsapp_aliases` — return the full alias set for an
-  identifier. Used by authorisation code that needs to match any known
-  form of a sender against an allow-list.
-
-Plugins that need per-sender behaviour on WhatsApp (role-based routing,
-per-contact authorisation, policy gating in a gateway hook) should use
-``canonical_whatsapp_identifier`` so their bookkeeping lines up with
-Hermes' own session keys.
-"""
-
-from __future__ import annotations
-
-import json
-from typing import Set
-
-from hermes_constants import get_hermes_home
-
-
-def normalize_whatsapp_identifier(value: str) -> str:
-    """Strip WhatsApp JID/LID syntax down to its stable numeric identifier.
-
-    Accepts any of the identifier shapes the WhatsApp bridge may emit:
-    ``"60123456789@s.whatsapp.net"``, ``"60123456789:47@s.whatsapp.net"``,
-    ``"60123456789@lid"``, or a bare ``"+601****6789"`` / ``"60123456789"``.
-    Returns just the numeric identifier (``"60123456789"``) suitable for
-    equality comparisons.
-
-    Useful for plugins that want to match sender IDs against
-    user-supplied config (phone numbers in ``config.yaml``) without
-    worrying about which variant the bridge happens to deliver.
-    """
-    return (
-        str(value or "")
-        .strip()
-        .replace("+", "", 1)
-        .split(":", 1)[0]
-        .split("@", 1)[0]
-    )
-
-
-def expand_whatsapp_aliases(identifier: str) -> Set[str]:
-    """Resolve WhatsApp phone/LID aliases via bridge session mapping files.
-
-    Returns the set of all identifiers transitively reachable through the
-    bridge's ``$HERMES_HOME/whatsapp/session/lid-mapping-*.json`` files,
-    starting from ``identifier``. The result always includes the
-    normalized input itself, so callers can safely ``in`` check against
-    the return value without a separate fallback branch.
-
-    Returns an empty set if ``identifier`` normalizes to empty.
-    """
-    normalized = normalize_whatsapp_identifier(identifier)
-    if not normalized:
-        return set()
-
-    session_dir = get_hermes_home() / "whatsapp" / "session"
-    resolved: Set[str] = set()
-    queue = [normalized]
-
-    while queue:
-        current = queue.pop(0)
-        if not current or current in resolved:
-            continue
-
-        resolved.add(current)
-        for suffix in ("", "_reverse"):
-            mapping_path = session_dir / f"lid-mapping-{current}{suffix}.json"
-            if not mapping_path.exists():
-                continue
-            try:
-                mapped = normalize_whatsapp_identifier(
-                    json.loads(mapping_path.read_text(encoding="utf-8"))
-                )
-            except Exception:
-                continue
-            if mapped and mapped not in resolved:
-                queue.append(mapped)
-
-    return resolved
-
-
-def canonical_whatsapp_identifier(identifier: str) -> str:
-    """Return a stable WhatsApp sender identity across phone-JID/LID variants.
-
-    WhatsApp may surface the same person under either a phone-format JID
-    (``60123456789@s.whatsapp.net``) or a LID (``1234567890@lid``). This
-    applies to a DM ``chat_id`` *and* to the ``participant_id`` of a
-    member inside a group chat — both represent a user identity, and the
-    bridge may flip between the two for the same human.
-
-    This helper reads the bridge's ``whatsapp/session/lid-mapping-*.json``
-    files, walks the mapping transitively, and picks the shortest
-    (numeric-preferred) alias as the canonical identity.
-    :func:`gateway.session.build_session_key` uses this for both WhatsApp
-    DM chat_ids and WhatsApp group participant_ids, so callers get the
-    same session-key identity Hermes itself uses.
-
-    Plugins that need per-sender behaviour (role-based routing,
-    authorisation, per-contact policy) should use this so their
-    bookkeeping lines up with Hermes' session bookkeeping even when
-    the bridge reshuffles aliases.
-
-    Returns an empty string if ``identifier`` normalizes to empty. If no
-    mapping files exist yet (fresh bridge install), returns the
-    normalized input unchanged.
-    """
-    normalized = normalize_whatsapp_identifier(identifier)
-    if not normalized:
-        return ""
-
-    # expand_whatsapp_aliases always includes `normalized` itself in the
-    # returned set, so the min() below degrades gracefully to `normalized`
-    # when no lid-mapping files are present.
-    aliases = expand_whatsapp_aliases(normalized)
-    return min(aliases, key=lambda candidate: (len(candidate), candidate))
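
Review note: for readers checking what behaviour this deletion removes, here is a minimal sketch of the alias resolution in the deleted module. It assumes an in-memory `mapping` dict in place of the bridge's `lid-mapping-*.json` files, and the function names are shortened for illustration. The normalisation chain and the shortest-alias rule follow the deleted code.

```python
from typing import Dict, Set


def normalize(value: str) -> str:
    """Strip plus sign, device suffix, and JID/LID domain from an identifier."""
    return (
        str(value or "")
        .strip()
        .replace("+", "", 1)
        .split(":", 1)[0]
        .split("@", 1)[0]
    )


def expand_aliases(identifier: str, mapping: Dict[str, str]) -> Set[str]:
    """Walk the mapping transitively, starting from the normalized input.

    `mapping` is an assumption standing in for the on-disk mapping files.
    """
    start = normalize(identifier)
    if not start:
        return set()
    resolved: Set[str] = set()
    queue = [start]
    while queue:
        current = queue.pop(0)
        if not current or current in resolved:
            continue
        resolved.add(current)
        mapped = normalize(mapping.get(current, ""))
        if mapped and mapped not in resolved:
            queue.append(mapped)
    return resolved


def canonical(identifier: str, mapping: Dict[str, str]) -> str:
    """Pick the shortest alias, breaking ties lexicographically."""
    aliases = expand_aliases(identifier, mapping)
    if not aliases:
        return ""
    return min(aliases, key=lambda candidate: (len(candidate), candidate))


# Both directions are listed, as the forward and reverse mapping files would be.
mapping = {"999999999999999": "15551234567", "15551234567": "999999999999999"}
from_lid = canonical("999999999999999@lid", mapping)
from_phone = canonical("15551234567:47@s.whatsapp.net", mapping)
```

Both `from_lid` and `from_phone` resolve to the same identity here, which is the property the session-key and authorisation paths relied on before this change.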
@@ -110,40 +110,18 @@ def _display_source(source: str) -> str:
    return source.split(":", 1)[1] if source.startswith("manual:") else source


-def _classify_exhausted_status(entry) -> tuple[str, bool]:
-    code = getattr(entry, "last_error_code", None)
-    reason = str(getattr(entry, "last_error_reason", "") or "").strip().lower()
-    message = str(getattr(entry, "last_error_message", "") or "").strip().lower()
-
-    if code == 429 or any(token in reason for token in ("rate_limit", "usage_limit", "quota", "exhausted")) or any(
-        token in message for token in ("rate limit", "usage limit", "quota", "too many requests")
-    ):
-        return "rate-limited", True
-
-    if code in {401, 403} or any(token in reason for token in ("invalid_token", "invalid_grant", "unauthorized", "forbidden", "auth")) or any(
-        token in message for token in ("unauthorized", "forbidden", "expired", "revoked", "invalid token", "authentication")
-    ):
-        return "auth failed", False
-
-    return "exhausted", True
-
-
-
 def _format_exhausted_status(entry) -> str:
    if entry.last_status != STATUS_EXHAUSTED:
        return ""
-    label, show_retry_window = _classify_exhausted_status(entry)
    reason = getattr(entry, "last_error_reason", None)
    reason_text = f" {reason}" if isinstance(reason, str) and reason.strip() else ""
    code = f" ({entry.last_error_code})" if entry.last_error_code else ""
-    if not show_retry_window:
-        return f" {label}{reason_text}{code} (re-auth may be required)"
    exhausted_until = _exhausted_until(entry)
    if exhausted_until is None:
-        return f" {label}{reason_text}{code}"
+        return f" exhausted{reason_text}{code}"
    remaining = max(0, int(math.ceil(exhausted_until - time.time())))
    if remaining <= 0:
-        return f" {label}{reason_text}{code} (ready to retry)"
+        return f" exhausted{reason_text}{code} (ready to retry)"
    minutes, seconds = divmod(remaining, 60)
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
@@ -155,7 +133,7 @@ def _format_exhausted_status(entry) -> str:
        wait = f"{minutes}m {seconds}s"
    else:
        wait = f"{seconds}s"
-    return f" {label}{reason_text}{code} ({wait} left)"
+    return f" exhausted{reason_text}{code} ({wait} left)"


 def auth_add_command(args) -> None:
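
Review note: the retry-window text in `_format_exhausted_status` is built with a `divmod` chain. The sketch below isolates that arithmetic. The minutes and seconds branches follow the hunk, while the day and hour formats are assumptions because those lines fall outside the visible diff. `format_wait` is a hypothetical name.

```python
def format_wait(remaining: int) -> str:
    """Render a remaining-seconds count as a short human-readable wait."""
    minutes, seconds = divmod(remaining, 60)
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    if days:
        return f"{days}d {hours}h"      # assumed format, not shown in the diff
    if hours:
        return f"{hours}h {minutes}m"   # assumed format, not shown in the diff
    if minutes:
        return f"{minutes}m {seconds}s"
    return f"{seconds}s"


short_wait = format_wait(75)
long_wait = format_wait(90000)
```

With the label now hard-coded to `exhausted`, this wait string is the only part of the status that still varies with the entry's timing.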
@@ -408,44 +386,6 @@ def auth_reset_command(args) -> None:
    print(f"Reset status on {count} {provider} credentials")


-def auth_status_command(args) -> None:
-    provider = _normalize_provider(getattr(args, "provider", "") or "")
-    if not provider:
-        raise SystemExit("Provider is required. Example: `hermes auth status spotify`.")
-    status = auth_mod.get_auth_status(provider)
-    if not status.get("logged_in"):
-        reason = status.get("error")
-        if reason:
-            print(f"{provider}: logged out ({reason})")
-        else:
-            print(f"{provider}: logged out")
-        return
-
-    print(f"{provider}: logged in")
-    for key in ("auth_type", "client_id", "redirect_uri", "scope", "expires_at", "api_base_url"):
-        value = status.get(key)
-        if value:
-            print(f"  {key}: {value}")
-
-
-def auth_logout_command(args) -> None:
-    auth_mod.logout_command(SimpleNamespace(provider=getattr(args, "provider", None)))
-
-
-def auth_spotify_command(args) -> None:
-    action = str(getattr(args, "spotify_action", "") or "login").strip().lower()
-    if action in {"", "login"}:
-        auth_mod.login_spotify_command(args)
-        return
-    if action == "status":
-        auth_status_command(SimpleNamespace(provider="spotify"))
-        return
-    if action == "logout":
-        auth_logout_command(SimpleNamespace(provider="spotify"))
-        return
-    raise SystemExit(f"Unknown Spotify auth action: {action}")
-
-
 def _interactive_auth() -> None:
    """Interactive credential pool management when `hermes auth` is called bare."""
    # Show current pool status first
@@ -643,14 +583,5 @@ def auth_command(args) -> None:
    if action == "reset":
        auth_reset_command(args)
        return
-    if action == "status":
-        auth_status_command(args)
-        return
-    if action == "logout":
-        auth_logout_command(args)
-        return
-    if action == "spotify":
-        auth_spotify_command(args)
-        return
    # No subcommand — launch interactive mode
    _interactive_auth()
@@ -238,52 +238,6 @@ def get_git_banner_state(repo_dir: Optional[Path] = None) -> Optional[dict]:
    return {"upstream": upstream, "local": local, "ahead": max(ahead, 0)}


-_RELEASE_URL_BASE = "https://github.com/NousResearch/hermes-agent/releases/tag"
-_latest_release_cache: Optional[tuple] = None  # (tag, url) once resolved
-
-
-def get_latest_release_tag(repo_dir: Optional[Path] = None) -> Optional[tuple]:
-    """Return ``(tag, release_url)`` for the latest git tag, or None.
-
-    Local-only — runs ``git describe --tags --abbrev=0`` against the
-    Hermes checkout. Cached per-process. Release URL always points at the
-    canonical NousResearch/hermes-agent repo (forks don't get a link).
-    """
-    global _latest_release_cache
-    if _latest_release_cache is not None:
-        return _latest_release_cache or None
-
-    repo_dir = repo_dir or _resolve_repo_dir()
-    if repo_dir is None:
-        _latest_release_cache = ()  # falsy sentinel — skip future lookups
-        return None
-
-    try:
-        result = subprocess.run(
-            ["git", "describe", "--tags", "--abbrev=0"],
-            capture_output=True,
-            text=True,
-            timeout=3,
-            cwd=str(repo_dir),
-        )
-    except Exception:
-        _latest_release_cache = ()
-        return None
-
-    if result.returncode != 0:
-        _latest_release_cache = ()
-        return None
-
-    tag = (result.stdout or "").strip()
-    if not tag:
-        _latest_release_cache = ()
-        return None
-
-    url = f"{_RELEASE_URL_BASE}/{tag}"
-    _latest_release_cache = (tag, url)
-    return _latest_release_cache
-
-
 def format_banner_version_label() -> str:
    """Return the version label shown in the startup banner title."""
    base = f"Hermes Agent v{VERSION} ({RELEASE_DATE})"
@@ -565,16 +519,9 @@ def build_welcome_banner(console: Console, model: str, cwd: str,
    agent_name = _skin_branding("agent_name", "Hermes Agent")
    title_color = _skin_color("banner_title", "#FFD700")
    border_color = _skin_color("banner_border", "#CD7F32")
-    version_label = format_banner_version_label()
-    release_info = get_latest_release_tag()
-    if release_info:
-        _tag, _url = release_info
-        title_markup = f"[bold {title_color}][link={_url}]{version_label}[/link][/]"
-    else:
-        title_markup = f"[bold {title_color}]{version_label}[/]"
    outer_panel = Panel(
        layout_table,
-        title=title_markup,
+        title=f"[bold {title_color}]{format_banner_version_label()}[/]",
        border_style=border_color,
        padding=(0, 2),
    )
@@ -77,7 +77,7 @@ COMMAND_REGISTRY: list[CommandDef] = [
    CommandDef("rollback", "List or restore filesystem checkpoints", "Session",
               args_hint="[number]"),
    CommandDef("snapshot", "Create or restore state snapshots of Hermes config/state", "Session",
-               cli_only=True, aliases=("snap",), args_hint="[create|restore <id>|prune]"),
+               aliases=("snap",), args_hint="[create|restore <id>|prune]"),
    CommandDef("stop", "Kill all running background processes", "Session"),
    CommandDef("approve", "Approve a pending dangerous command", "Session",
               gateway_only=True, args_hint="[session|always]"),
@@ -104,8 +104,9 @@ COMMAND_REGISTRY: list[CommandDef] = [
    CommandDef("config", "Show current configuration", "Configuration",
               cli_only=True),
    CommandDef("model", "Switch model for this session", "Configuration", args_hint="[model] [--provider name] [--global]"),
-    CommandDef("gquota", "Show Google Gemini Code Assist quota usage", "Info",
-               cli_only=True),
+    CommandDef("provider", "Show available providers and current provider",
+               "Configuration"),
+    CommandDef("gquota", "Show Google Gemini Code Assist quota usage", "Info"),

    CommandDef("personality", "Set a predefined personality", "Configuration",
               args_hint="[name]"),
@@ -123,12 +124,9 @@ COMMAND_REGISTRY: list[CommandDef] = [
               args_hint="[normal|fast|status]",
               subcommands=("normal", "fast", "status", "on", "off")),
    CommandDef("skin", "Show or change the display skin/theme", "Configuration",
-               cli_only=True, args_hint="[name]"),
+               args_hint="[name]"),
    CommandDef("voice", "Toggle voice mode", "Configuration",
               args_hint="[on|off|tts|status]", subcommands=("on", "off", "tts", "status")),
-    CommandDef("busy", "Control what Enter does while Hermes is working", "Configuration",
-               cli_only=True, args_hint="[queue|interrupt|status]",
-               subcommands=("queue", "interrupt", "status")),

    # Tools & Skills
    CommandDef("tools", "Manage tools: /tools [list|disable|enable] [name...]", "Tools & Skills",
@@ -141,8 +139,7 @@ COMMAND_REGISTRY: list[CommandDef] = [
    CommandDef("cron", "Manage scheduled tasks", "Tools & Skills",
               cli_only=True, args_hint="[subcommand]",
               subcommands=("list", "add", "create", "edit", "pause", "resume", "run", "remove")),
-    CommandDef("reload", "Reload .env variables into the running session", "Tools & Skills",
-               cli_only=True),
+    CommandDef("reload", "Reload .env variables into the running session", "Tools & Skills"),
    CommandDef("reload-mcp", "Reload MCP servers from config", "Tools & Skills",
               aliases=("reload_mcp",)),
    CommandDef("browser", "Connect browser tools to your live Chrome via CDP", "Tools & Skills",
@@ -320,7 +317,7 @@ def should_bypass_active_session(command_name: str | None) -> bool:
    safety net in gateway.run discards any command text that reaches
    the pending queue — which meant a mid-run /model (or /reasoning,
    /voice, /insights, /title, /resume, /retry, /undo, /compress,
-    /usage, /reload-mcp, /sethome, /reset) would silently
+    /usage, /provider, /reload-mcp, /sethome, /reset) would silently
    interrupt the agent AND get discarded, producing a zero-char
    response. See issue #5057 / PRs #6252, #10370, #4665.

@@ -466,12 +466,6 @@ DEFAULT_CONFIG = {
        "record_sessions": False,  # Auto-record browser sessions as WebM videos
        "allow_private_urls": False,  # Allow navigating to private/internal IPs (localhost, 192.168.x.x, etc.)
        "cdp_url": "",  # Optional persistent CDP endpoint for attaching to an existing Chromium/Chrome
-        # CDP supervisor — dialog + frame detection via a persistent WebSocket.
-        # Active only when a CDP-capable backend is attached (Browserbase or
-        # local Chrome via /browser connect). See
-        # website/docs/developer-guide/browser-supervisor.md.
-        "dialog_policy": "must_respond",  # must_respond | auto_dismiss | auto_accept
-        "dialog_timeout_s": 300,  # Safety auto-dismiss after N seconds under must_respond
        "camofox": {
            # When true, Hermes sends a stable profile-scoped userId to Camofox
            # so the server maps it to a persistent Firefox profile automatically.
@@ -492,27 +486,7 @@ DEFAULT_CONFIG = {
    # exceed this are rejected with guidance to use offset+limit.
    # 100K chars ≈ 25–35K tokens across typical tokenisers.
    "file_read_max_chars": 100_000,
-
-    # Tool-output truncation thresholds. When terminal output or a
-    # single read_file page exceeds these limits, Hermes truncates the
-    # payload sent to the model (keeping head + tail for terminal,
-    # enforcing pagination for read_file). Tuning these trades context
-    # footprint against how much raw output the model can see in one
-    # shot. Ported from anomalyco/opencode PR #23770.
-    #
-    # - max_bytes:       terminal_tool output cap, in chars
-    #                    (default 50_000 ≈ 12-15K tokens).
-    # - max_lines:       read_file pagination cap — the maximum `limit`
-    #                    a single read_file call can request before
-    #                    being clamped (default 2000).
-    # - max_line_length: per-line cap applied when read_file emits a
-    #                    line-numbered view (default 2000 chars).
-    "tool_output": {
-        "max_bytes": 50_000,
-        "max_lines": 2000,
-        "max_line_length": 2000,
-    },
-
+    
    "compression": {
        "enabled": True,
        "threshold": 0.50,            # compress when context usage exceeds this ratio
@@ -521,12 +495,6 @@ DEFAULT_CONFIG = {

    },

-    # Anthropic prompt caching (Claude via OpenRouter or native Anthropic API).
-    # cache_ttl must be "5m" or "1h" (Anthropic-supported tiers); other values are ignored.
-    "prompt_caching": {
-        "cache_ttl": "5m",
-    },
-
    # AWS Bedrock provider configuration.
    # Only used when model.provider is "bedrock".
    "bedrock": {
@@ -783,15 +751,6 @@ DEFAULT_CONFIG = {
        # warning log if out of range.
        "max_spawn_depth": 1,        # depth cap (1 = flat [default], 2 = orchestrator→leaf, 3 = three-level)
        "orchestrator_enabled": True,  # kill switch for role="orchestrator"
-        # When a subagent hits a dangerous-command approval prompt, the parent's
-        # prompt_toolkit TUI owns stdin — a thread-local input() call from the
-        # subagent worker would deadlock the parent UI. To avoid the deadlock,
-        # subagent threads ALWAYS resolve approvals non-interactively:
-        #   false (default) → auto-deny with a logger.warning audit line (safe)
-        #   true             → auto-approve "once" with a logger.warning audit line
-        # Flip to true only if you trust delegated work to run dangerous cmds
-        # without human review (cron pipelines, batch automation, etc.).
-        "subagent_auto_approve": False,
    },

    # Ephemeral prefill messages file — JSON list of {role, content} dicts
@@ -848,7 +807,7 @@ DEFAULT_CONFIG = {
        "auto_thread": True,           # Auto-create threads on @mention in channels (like Slack)
        "reactions": True,             # Add 👀/✅/❌ reactions to messages during processing
        "channel_prompts": {},         # Per-channel ephemeral system prompts (forum parents apply to child threads)
-        # discord / discord_admin tools: restrict which actions the agent may call.
+        # discord_server tool: restrict which actions the agent may call.
        # Default (empty) = all actions allowed (subject to bot privileged intents).
        # Accepts comma-separated string ("list_guilds,list_channels,fetch_messages")
        # or YAML list. Unknown names are dropped with a warning at load time.
@@ -275,99 +275,6 @@ def copilot_device_code_login(
    return None


-# ─── Copilot Token Exchange ────────────────────────────────────────────────
-
-# Module-level cache for exchanged Copilot API tokens.
-# Maps raw_token_fingerprint -> (api_token, expires_at_epoch).
-_jwt_cache: dict[str, tuple[str, float]] = {}
-_JWT_REFRESH_MARGIN_SECONDS = 120  # refresh 2 min before expiry
-
-# Token exchange endpoint and headers (matching VS Code / Copilot CLI)
-_TOKEN_EXCHANGE_URL = "https://api.github.com/copilot_internal/v2/token"
-_EDITOR_VERSION = "vscode/1.104.1"
-_EXCHANGE_USER_AGENT = "GitHubCopilotChat/0.26.7"
-
-
-def _token_fingerprint(raw_token: str) -> str:
-    """Short fingerprint of a raw token for cache keying (avoids storing full token)."""
-    import hashlib
-    return hashlib.sha256(raw_token.encode()).hexdigest()[:16]
-
-
-def exchange_copilot_token(raw_token: str, *, timeout: float = 10.0) -> tuple[str, float]:
-    """Exchange a raw GitHub token for a short-lived Copilot API token.
-
-    Calls ``GET https://api.github.com/copilot_internal/v2/token`` with
-    the raw GitHub token and returns ``(api_token, expires_at)``.
-
-    The returned token is a semicolon-separated string (not a standard JWT)
-    used as ``Authorization: Bearer <token>`` for Copilot API requests.
-
-    Results are cached in-process and reused until close to expiry.
-    Raises ``ValueError`` on failure.
-    """
-    import urllib.request
-
-    fp = _token_fingerprint(raw_token)
-
-    # Check cache first
-    cached = _jwt_cache.get(fp)
-    if cached:
-        api_token, expires_at = cached
-        if time.time() < expires_at - _JWT_REFRESH_MARGIN_SECONDS:
-            return api_token, expires_at
-
-    req = urllib.request.Request(
-        _TOKEN_EXCHANGE_URL,
-        method="GET",
-        headers={
-            "Authorization": f"token {raw_token}",
-            "User-Agent": _EXCHANGE_USER_AGENT,
-            "Accept": "application/json",
-            "Editor-Version": _EDITOR_VERSION,
-        },
-    )
-
-    try:
-        with urllib.request.urlopen(req, timeout=timeout) as resp:
-            data = json.loads(resp.read().decode())
-    except Exception as exc:
-        raise ValueError(f"Copilot token exchange failed: {exc}") from exc
-
-    api_token = data.get("token", "")
-    expires_at = data.get("expires_at", 0)
-    if not api_token:
-        raise ValueError("Copilot token exchange returned empty token")
-
-    # Convert expires_at to float if needed
-    expires_at = float(expires_at) if expires_at else time.time() + 1800
-
-    _jwt_cache[fp] = (api_token, expires_at)
-    logger.debug(
-        "Copilot token exchanged, expires_at=%s",
-        expires_at,
-    )
-    return api_token, expires_at
-
-
-def get_copilot_api_token(raw_token: str) -> str:
-    """Exchange a raw GitHub token for a Copilot API token, with fallback.
-
-    Convenience wrapper: returns the exchanged token on success, or the
-    raw token unchanged if the exchange fails (e.g. network error, unsupported
-    account type). This preserves existing behaviour for accounts that don't
-    need exchange while enabling access to internal-only models for those that do.
-    """
-    if not raw_token:
-        return raw_token
-    try:
-        api_token, _ = exchange_copilot_token(raw_token)
-        return api_token
-    except Exception as exc:
-        logger.debug("Copilot token exchange failed, using raw token: %s", exc)
-        return raw_token
-
-
 # ─── Copilot API Headers ───────────────────────────────────────────────────

 def copilot_request_headers(
@@ -93,9 +93,6 @@ def cron_list(show_all: bool = False):
        script = job.get("script")
        if script:
            print(f"    Script:    {script}")
-        workdir = job.get("workdir")
-        if workdir:
-            print(f"    Workdir:   {workdir}")

        # Execution history
        last_status = job.get("last_status")
@@ -171,7 +168,6 @@ def cron_create(args):
        skill=getattr(args, "skill", None),
        skills=_normalize_skills(getattr(args, "skill", None), getattr(args, "skills", None)),
        script=getattr(args, "script", None),
-        workdir=getattr(args, "workdir", None),
    )
    if not result.get("success"):
        print(color(f"Failed to create job: {result.get('error', 'unknown error')}", Colors.RED))
@@ -184,8 +180,6 @@ def cron_create(args):
    job_data = result.get("job", {})
    if job_data.get("script"):
        print(f"  Script: {job_data['script']}")
-    if job_data.get("workdir"):
-        print(f"  Workdir: {job_data['workdir']}")
    print(f"  Next run: {result['next_run_at']}")
    return 0

@@ -224,7 +218,6 @@ def cron_edit(args):
        repeat=getattr(args, "repeat", None),
        skills=final_skills,
        script=getattr(args, "script", None),
-        workdir=getattr(args, "workdir", None),
    )
    if not result.get("success"):
        print(color(f"Failed to update job: {result.get('error', 'unknown error')}", Colors.RED))
@@ -240,8 +233,6 @@ def cron_edit(args):
        print("  Skills: none")
    if updated.get("script"):
        print(f"  Script: {updated['script']}")
-    if updated.get("workdir"):
-        print(f"  Workdir: {updated['workdir']}")
    return 0


@@ -29,7 +29,6 @@ if _env_path.exists():
 load_dotenv(PROJECT_ROOT / ".env", override=False, encoding="utf-8")

 from hermes_cli.colors import Colors, color
-from hermes_cli.models import _HERMES_USER_AGENT
 from hermes_constants import OPENROUTER_MODELS_URL
 from utils import base_url_host_matches

@@ -296,33 +295,16 @@ def run_doctor(args):
            except Exception:
                pass
            try:
-                from hermes_cli.config import get_compatible_custom_providers as _compatible_custom_providers
-                from hermes_cli.providers import resolve_provider_full as _resolve_provider_full
+                from hermes_cli.auth import resolve_provider as _resolve_provider
            except Exception:
-                _compatible_custom_providers = None
-                _resolve_provider_full = None
-
-            custom_providers = []
-            if _compatible_custom_providers is not None:
-                try:
-                    custom_providers = _compatible_custom_providers(cfg)
-                except Exception:
-                    custom_providers = []
-
-            user_providers = cfg.get("providers")
-            if isinstance(user_providers, dict):
-                known_providers.update(str(name).strip().lower() for name in user_providers if str(name).strip())
-            for entry in custom_providers:
-                if not isinstance(entry, dict):
-                    continue
-                name = str(entry.get("name") or "").strip()
-                if name:
-                    known_providers.add("custom:" + name.lower().replace(" ", "-"))
+                _resolve_provider = None

            canonical_provider = provider
-            if provider and _resolve_provider_full is not None and provider != "auto":
-                provider_def = _resolve_provider_full(provider, user_providers, custom_providers)
-                canonical_provider = provider_def.id if provider_def is not None else None
+            if provider and _resolve_provider is not None and provider != "auto":
+                try:
+                    canonical_provider = _resolve_provider(provider)
+                except Exception:
+                    canonical_provider = None

            if provider and provider != "auto":
                if canonical_provider is None or (known_providers and canonical_provider not in known_providers):
@@ -975,10 +957,7 @@ def run_doctor(args):
                if base_url_host_matches(_base, "api.kimi.com") and _base.rstrip("/").endswith("/coding"):
                    _base = _base.rstrip("/") + "/v1"
                _url = (_base.rstrip("/") + "/models") if _base else _default_url
-                _headers = {
-                    "Authorization": f"Bearer {_key}",
-                    "User-Agent": _HERMES_USER_AGENT,
-                }
+                _headers = {"Authorization": f"Bearer {_key}"}
                if base_url_host_matches(_base, "api.kimi.com"):
                    _headers["User-Agent"] = "claude-code/0.1.0"
                _resp = httpx.get(
@@ -267,8 +267,6 @@ def run_dump(args):
        ("ANTHROPIC_API_KEY", "anthropic"),
        ("ANTHROPIC_TOKEN", "anthropic_token"),
        ("NOUS_API_KEY", "nous"),
-        ("GOOGLE_API_KEY", "google/gemini"),
-        ("GEMINI_API_KEY", "gemini"),
        ("GLM_API_KEY", "glm/zai"),
        ("ZAI_API_KEY", "zai"),
        ("KIMI_API_KEY", "kimi"),
@@ -166,27 +166,6 @@ from hermes_cli.env_loader import load_hermes_dotenv

 load_hermes_dotenv(project_env=PROJECT_ROOT / ".env")

-# Bridge security.redact_secrets from config.yaml → HERMES_REDACT_SECRETS env
-# var BEFORE hermes_logging imports agent.redact (which snapshots the flag at
-# module-import time). Without this, config.yaml's toggle is ignored because
-# the setup_logging() call below imports agent.redact, which reads the env var
-# exactly once. Env var in .env still wins — this is config.yaml fallback only.
-try:
-    if "HERMES_REDACT_SECRETS" not in os.environ:
-        import yaml as _yaml_early
-        _cfg_path = get_hermes_home() / "config.yaml"
-        if _cfg_path.exists():
-            with open(_cfg_path, encoding="utf-8") as _f:
-                _early_sec_cfg = (_yaml_early.safe_load(_f) or {}).get("security", {})
-            if isinstance(_early_sec_cfg, dict):
-                _early_redact = _early_sec_cfg.get("redact_secrets")
-                if _early_redact is not None:
-                    os.environ["HERMES_REDACT_SECRETS"] = str(_early_redact).lower()
-            del _early_sec_cfg
-        del _cfg_path
-except Exception:
-    pass  # best-effort — redaction stays at default (enabled) on config errors
-
 # Initialize centralized file logging early — all `hermes` subcommands
 # (chat, setup, gateway, config, etc.) write to agent.log + errors.log.
 try:
@@ -1106,9 +1085,6 @@ def cmd_chat(args):
        print(
            "It looks like Hermes isn't configured yet -- no API keys or providers found."
        )
-        print()
-        print("  Run:  hermes setup")
-        print()

        from hermes_cli.setup import (
            is_interactive_stdin,
@@ -1121,16 +1097,8 @@ def cmd_chat(args):
            )
            sys.exit(1)

-        try:
-            reply = input("Run setup now? [Y/n] ").strip().lower()
-        except (EOFError, KeyboardInterrupt):
-            reply = "n"
-        if reply in ("", "y", "yes"):
-            cmd_setup(args)
-            return
-        print()
-        print("You can run 'hermes setup' at any time to configure.")
-        sys.exit(1)
+        cmd_setup(args)
+        return

    # Start update check in background (runs while other init happens)
    try:
@@ -1450,7 +1418,6 @@ def select_provider_and_model(args=None):
        load_config,
        get_env_value,
    )
-    from hermes_cli.providers import resolve_provider_full

    config = load_config()
    current_model = config.get("model")
@@ -1468,30 +1435,14 @@ def select_provider_and_model(args=None):
    effective_provider = (
        config_provider or os.getenv("HERMES_INFERENCE_PROVIDER") or "auto"
    )
-    compatible_custom_providers = get_compatible_custom_providers(config)
-    active = None
-    if effective_provider != "auto":
-        active_def = resolve_provider_full(
-            effective_provider,
-            config.get("providers"),
-            compatible_custom_providers,
-        )
-        if active_def is not None:
-            active = active_def.id
-        else:
-            warning = (
-                f"Unknown provider '{effective_provider}'. Check 'hermes model' for "
-                "available providers, or run 'hermes doctor' to diagnose config "
-                "issues."
-            )
-            print(f"Warning: {warning} Falling back to auto provider detection.")
-    if active is None:
+    try:
+        active = resolve_provider(effective_provider)
+    except AuthError as exc:
+        warning = format_auth_error(exc)
+        print(f"Warning: {warning} Falling back to auto provider detection.")
        try:
            active = resolve_provider("auto")
-        except AuthError as exc:
-            if effective_provider == "auto":
-                warning = format_auth_error(exc)
-                print(f"Warning: {warning} Falling back to auto provider detection.")
+        except AuthError:
            active = None  # no provider yet; default to first in list

    # Detect custom endpoint
@@ -2173,7 +2124,7 @@ def _model_flow_nous(config, current_model="", args=None):
        resolve_nous_runtime_credentials,
        AuthError,
        format_auth_error,
-        _login_nous,
+        login_nous,
        PROVIDER_REGISTRY,
    )
    from hermes_cli.config import (
@@ -2186,8 +2137,6 @@ def _model_flow_nous(config, current_model="", args=None):

    state = get_provider_auth_state("nous")
    if not state or not state.get("access_token"):
-        print("Not logged into Nous Portal. Starting login...")
-        print()
        try:
            mock_args = argparse.Namespace(
                portal_url=getattr(args, "portal_url", None),
@@ -2199,7 +2148,7 @@ def _model_flow_nous(config, current_model="", args=None):
                ca_bundle=getattr(args, "ca_bundle", None),
                insecure=bool(getattr(args, "insecure", False)),
            )
-            _login_nous(mock_args, PROVIDER_REGISTRY["nous"])
+            login_nous(mock_args, PROVIDER_REGISTRY["nous"])
            # Offer Tool Gateway enablement for paid subscribers
            try:
                _refreshed = load_config() or {}
@@ -2250,7 +2199,7 @@ def _model_flow_nous(config, current_model="", args=None):
                    ca_bundle=None,
                    insecure=False,
                )
-                _login_nous(mock_args, PROVIDER_REGISTRY["nous"])
+                login_nous(mock_args, PROVIDER_REGISTRY["nous"])
            except Exception as login_exc:
                print(f"Re-login failed: {login_exc}")
            return
@@ -2349,41 +2298,7 @@ def _model_flow_openai_codex(config, current_model=""):
    from hermes_cli.codex_models import get_codex_model_ids

    status = get_codex_auth_status()
-    if status.get("logged_in"):
-        print("  OpenAI Codex credentials: ✓")
-        print()
-        print("    1. Use existing credentials")
-        print("    2. Reauthenticate (new OAuth login)")
-        print("    3. Cancel")
-        print()
-        try:
-            choice = input("  Choice [1/2/3]: ").strip()
-        except (KeyboardInterrupt, EOFError):
-            choice = "1"
-
-        if choice == "2":
-            print("Starting a fresh OpenAI Codex login...")
-            print()
-            try:
-                mock_args = argparse.Namespace()
-                _login_openai_codex(
-                    mock_args,
-                    PROVIDER_REGISTRY["openai-codex"],
-                    force_new_login=True,
-                )
-            except SystemExit:
-                print("Login cancelled or failed.")
-                return
-            except Exception as exc:
-                print(f"Login failed: {exc}")
-                return
-            status = get_codex_auth_status()
-            if not status.get("logged_in"):
-                print("Login failed.")
-                return
-        elif choice == "3":
-            return
-    else:
+    if not status.get("logged_in"):
        print("Not logged into OpenAI Codex. Starting login...")
        print()
        try:
@@ -2900,16 +2815,11 @@ def _model_flow_named_custom(config, provider_info):

    name = provider_info["name"]
    base_url = provider_info["base_url"]
-    api_mode = provider_info.get("api_mode", "")
    api_key = provider_info.get("api_key", "")
    key_env = provider_info.get("key_env", "")
    saved_model = provider_info.get("model", "")
    provider_key = (provider_info.get("provider_key") or "").strip()

-    # Resolve key from env var if api_key not set directly
-    if not api_key and key_env:
-        api_key = os.environ.get(key_env, "")
-
    print(f"  Provider: {name}")
    print(f"  URL:      {base_url}")
    if saved_model:
@@ -2917,10 +2827,7 @@ def _model_flow_named_custom(config, provider_info):
    print()

    print("Fetching available models...")
-    models = fetch_api_models(
-        api_key, base_url, timeout=8.0,
-        api_mode=api_mode or None,
-    )
+    models = fetch_api_models(api_key, base_url, timeout=8.0)

    if models:
        default_idx = 0
@@ -4010,71 +3917,12 @@ def _model_flow_api_key_provider(config, provider_id, current_model=""):
                print("Cancelled.")
                return
            save_env_value(key_env, new_key)
-            existing_key = new_key
            print("API key saved.")
            print()
    else:
        print(f"  {pconfig.name} API key: {existing_key[:8]}... ✓")
        print()

-    # Gemini free-tier gate: free-tier daily quotas (<= 250 RPD for Flash)
-    # are exhausted in a handful of agent turns, so refuse to wire up the
-    # provider with a free-tier key. Probe is best-effort; network or auth
-    # errors fall through without blocking.
-    if provider_id == "gemini" and existing_key:
-        try:
-            from agent.gemini_native_adapter import probe_gemini_tier
-        except Exception:
-            probe_gemini_tier = None
-        if probe_gemini_tier is not None:
-            print("  Checking Gemini API tier...")
-            probe_base = (
-                (get_env_value(base_url_env) if base_url_env else "")
-                or os.getenv(base_url_env or "", "")
-                or pconfig.inference_base_url
-            )
-            tier = probe_gemini_tier(existing_key, probe_base)
-            if tier == "free":
-                print()
-                print(
-                    "❌ This Google API key is on the free tier "
-                    "(<= 250 requests/day for gemini-2.5-flash)."
-                )
-                print(
-                    "   Hermes typically makes 3-10 API calls per user turn "
-                    "(tool iterations + auxiliary tasks),"
-                )
-                print(
-                    "   so the free tier is exhausted after a handful of "
-                    "messages and cannot sustain"
-                )
-                print("   an agent session.")
-                print()
-                print(
-                    "   To use Gemini with Hermes, enable billing on your "
-                    "Google Cloud project and regenerate"
-                )
-                print(
-                    "   the key in a billing-enabled project: "
-                    "https://aistudio.google.com/apikey"
-                )
-                print()
-                print(
-                    "   Alternatives with workable free usage: DeepSeek, "
-                    "OpenRouter (free models), Groq, Nous."
-                )
-                print()
-                print("Not saving Gemini as the default provider.")
-                return
-            if tier == "paid":
-                print("  Tier check: paid ✓")
-            else:
-                # "unknown" -- network issue, auth problem, unexpected response.
-                # Don't block; the runtime 429 handler will surface free-tier
-                # guidance if the key turns out to be free tier.
-                print("  Tier check: could not verify (proceeding anyway).")
-            print()
-
    # Optional base URL override
    current_base = ""
    if base_url_env:
@@ -4316,8 +4164,6 @@ def _model_flow_anthropic(config, current_model=""):
        from agent.anthropic_adapter import (
            read_claude_code_credentials,
            is_claude_code_token_valid,
-            _is_oauth_token,
-            _resolve_claude_code_token_from_credentials,
        )

        cc_creds = read_claude_code_credentials()
@@ -4326,14 +4172,7 @@ def _model_flow_anthropic(config, current_model=""):
    except Exception:
        pass

-    # Stale-OAuth guard: if the only existing cred is an expired OAuth token
-    # (no valid cc_creds to fall back on), treat it as missing so the re-auth
-    # path is offered instead of silently accepting a broken token.
-    existing_is_stale_oauth = False
-    if existing_key and _is_oauth_token(existing_key) and not cc_available:
-        existing_is_stale_oauth = True
-
-    has_creds = (bool(existing_key) and not existing_is_stale_oauth) or cc_available
+    has_creds = bool(existing_key) or cc_available
    needs_auth = not has_creds

    if has_creds:
@@ -6046,31 +5885,6 @@ def _cmd_update_impl(args, gateway_mode: bool):
            )
            import signal as _signal

-            def _wait_for_service_active(
-                scope_cmd_: list, svc_name_: str, timeout: float = 10.0,
-            ) -> bool:
-                """Poll ``systemctl is-active`` until the unit reports active.
-
-                systemd's Stopped -> Started transition after a graceful exit
-                (or a hard restart) is not instantaneous; a one-shot check
-                races that window and falsely reports the unit as down.
-                Poll every 0.5s up to ``timeout`` seconds before giving up.
-                """
-                deadline = _time.monotonic() + max(timeout, 0.5)
-                while True:
-                    try:
-                        _verify = subprocess.run(
-                            scope_cmd_ + ["is-active", svc_name_],
-                            capture_output=True, text=True, timeout=5,
-                        )
-                        if _verify.stdout.strip() == "active":
-                            return True
-                    except (FileNotFoundError, subprocess.TimeoutExpired):
-                        pass
-                    if _time.monotonic() >= deadline:
-                        return False
-                    _time.sleep(0.5)
-
            # Drain budget for graceful SIGUSR1 restarts.  The gateway drains
            # for up to ``agent.restart_drain_timeout`` (default 60s) before
            # exiting with code 75; we wait slightly longer so the drain
@@ -6177,14 +5991,14 @@ def _cmd_update_impl(args, gateway_mode: bool):

                            if _graceful_ok:
                                # Gateway exited 75; systemd should relaunch
-                                # via Restart=on-failure.  Poll is-active for
-                                # up to ~10s because the unit's Stopped ->
-                                # Started transition can take a few seconds
-                                # after the old PID exits, and a one-shot
-                                # check races that window.
-                                if _wait_for_service_active(
-                                    scope_cmd, svc_name, timeout=10.0,
-                                ):
+                                # via Restart=on-failure.  Verify the new
+                                # process came up.
+                                _time.sleep(3)
+                                verify = subprocess.run(
+                                    scope_cmd + ["is-active", svc_name],
+                                    capture_output=True, text=True, timeout=5,
+                                )
+                                if verify.stdout.strip() == "active":
                                    restarted_services.append(svc_name)
                                    continue
                                # Process exited but wasn't respawned (older
@@ -6210,9 +6024,14 @@ def _cmd_update_impl(args, gateway_mode: bool):
                                # Verify the service actually survived the
                                # restart.  systemctl restart returns 0 even
                                # if the new process crashes immediately.
-                                if _wait_for_service_active(
-                                    scope_cmd, svc_name, timeout=10.0,
-                                ):
+                                _time.sleep(3)
+                                verify = subprocess.run(
+                                    scope_cmd + ["is-active", svc_name],
+                                    capture_output=True,
+                                    text=True,
+                                    timeout=5,
+                                )
+                                if verify.stdout.strip() == "active":
                                    restarted_services.append(svc_name)
                                else:
                                    # Retry once — transient startup failures
@@ -6227,9 +6046,14 @@ def _cmd_update_impl(args, gateway_mode: bool):
                                        text=True,
                                        timeout=15,
                                    )
-                                    if _wait_for_service_active(
-                                        scope_cmd, svc_name, timeout=10.0,
-                                    ):
+                                    _time.sleep(3)
+                                    verify2 = subprocess.run(
+                                        scope_cmd + ["is-active", svc_name],
+                                        capture_output=True,
+                                        text=True,
+                                        timeout=5,
+                                    )
+                                    if verify2.stdout.strip() == "active":
                                        restarted_services.append(svc_name)
                                        print(f"  ✓ {svc_name} recovered on retry")
                                    else:
@@ -6730,15 +6554,9 @@ def cmd_dashboard(args):
    try:
        import fastapi  # noqa: F401
        import uvicorn  # noqa: F401
-    except ImportError as e:
-        print("Web UI dependencies not installed (need fastapi + uvicorn).")
-        print(
-            f"Re-install the package into this interpreter so metadata updates apply:\n"
-            f"  cd {PROJECT_ROOT}\n"
-            f"  {sys.executable} -m pip install -e .\n"
-            "If `pip` is missing in this venv, use:  uv pip install -e ."
-        )
-        print(f"Import error: {e}")
+    except ImportError:
+        print("Web UI dependencies not installed.")
+        print(f"Install them with:  {sys.executable} -m pip install 'fastapi' 'uvicorn[standard]'")
        sys.exit(1)

    if "HERMES_WEB_DIST" not in os.environ:
@@ -6747,13 +6565,11 @@ def cmd_dashboard(args):

    from hermes_cli.web_server import start_server

-    embedded_chat = args.tui or os.environ.get("HERMES_DASHBOARD_TUI") == "1"
    start_server(
        host=args.host,
        port=args.port,
        open_browser=not args.no_open,
        allow_public=getattr(args, "insecure", False),
-        embedded_chat=embedded_chat,
    )


@@ -7356,7 +7172,7 @@ For more help on a command:
    )
    logout_parser.add_argument(
        "--provider",
-        choices=["nous", "openai-codex", "spotify"],
+        choices=["nous", "openai-codex"],
        default=None,
        help="Provider to log out from (default: active provider)",
    )
@@ -7413,17 +7229,6 @@ For more help on a command:
        "reset", help="Clear exhaustion status for all credentials for a provider"
    )
    auth_reset.add_argument("provider", help="Provider id")
-    auth_status = auth_subparsers.add_parser("status", help="Show auth status for a provider")
-    auth_status.add_argument("provider", help="Provider id")
-    auth_logout = auth_subparsers.add_parser("logout", help="Log out a provider and clear stored auth state")
-    auth_logout.add_argument("provider", help="Provider id")
-    auth_spotify = auth_subparsers.add_parser("spotify", help="Authenticate Hermes with Spotify via PKCE")
-    auth_spotify.add_argument("spotify_action", nargs="?", choices=["login", "status", "logout"], default="login")
-    auth_spotify.add_argument("--client-id", help="Spotify app client_id (or set HERMES_SPOTIFY_CLIENT_ID)")
-    auth_spotify.add_argument("--redirect-uri", help="Allow-listed localhost redirect URI for your Spotify app")
-    auth_spotify.add_argument("--scope", help="Override requested Spotify scopes")
-    auth_spotify.add_argument("--no-browser", action="store_true", help="Do not attempt to open the browser automatically")
-    auth_spotify.add_argument("--timeout", type=float, help="Callback/token exchange timeout in seconds")
    auth_parser.set_defaults(func=cmd_auth)

    # =========================================================================
@@ -7480,10 +7285,6 @@ For more help on a command:
        "--script",
        help="Path to a Python script whose stdout is injected into the prompt each run",
    )
-    cron_create.add_argument(
-        "--workdir",
-        help="Absolute path for the job to run from. Injects AGENTS.md / CLAUDE.md / .cursorrules from that directory and uses it as the cwd for terminal/file/code_exec tools. Omit to preserve old behaviour (no project context files).",
-    )

    # cron edit
    cron_edit = cron_subparsers.add_parser(
@@ -7522,10 +7323,6 @@ For more help on a command:
        "--script",
        help="Path to a Python script whose stdout is injected into the prompt each run. Pass empty string to clear.",
    )
-    cron_edit.add_argument(
-        "--workdir",
-        help="Absolute path for the job to run from (injects AGENTS.md etc. and sets terminal cwd). Pass empty string to clear.",
-    )

    # lifecycle actions
    cron_pause = cron_subparsers.add_parser("pause", help="Pause a scheduled job")
@@ -8939,14 +8736,6 @@ Examples:
        action="store_true",
        help="Allow binding to non-localhost (DANGEROUS: exposes API keys on the network)",
    )
-    dashboard_parser.add_argument(
-        "--tui",
-        action="store_true",
-        help=(
-            "Expose the in-browser Chat tab (embedded `hermes --tui` via PTY/WebSocket). "
-            "Alternatively set HERMES_DASHBOARD_TUI=1."
-        ),
-    )
    dashboard_parser.set_defaults(func=cmd_dashboard)

    # =========================================================================
@@ -12,12 +12,8 @@ Different LLM providers expect model identifiers in different formats:
  model IDs, but Claude still uses hyphenated native names like
  ``claude-sonnet-4-6``.
 - **OpenCode Go** preserves dots in model names: ``minimax-m2.7``.
-- **DeepSeek** accepts ``deepseek-chat`` (V3), ``deepseek-reasoner``
-  (R1-family), and the first-class V-series IDs (``deepseek-v4-pro``,
-  ``deepseek-v4-flash``, and any future ``deepseek-v<N>-*``).  Older
-  Hermes revisions folded every non-reasoner input into
-  ``deepseek-chat``, which on aggregators routes to V3 — so a user
-  picking V4 Pro was silently downgraded.
+- **DeepSeek** only accepts two model identifiers:
+  ``deepseek-chat`` and ``deepseek-reasoner``.
 - **Custom** and remaining providers pass the name through as-is.

 This module centralises that translation so callers can simply write::
@@ -29,7 +25,6 @@ Inspired by Clawdbot's ``normalizeAnthropicModelId`` pattern.

 from __future__ import annotations

-import re
 from typing import Optional

 # ---------------------------------------------------------------------------
@@ -105,15 +100,6 @@ _MATCHING_PREFIX_STRIP_PROVIDERS: frozenset[str] = frozenset({
    "custom",
 })

-# Providers whose APIs require lowercase model IDs.  Xiaomi's
-# ``api.xiaomimimo.com`` rejects mixed-case names like ``MiMo-V2.5-Pro``
-# that users might copy from marketing docs — it only accepts
-# ``mimo-v2.5-pro``.  After stripping a matching provider prefix, these
-# providers also get ``.lower()`` applied.
-_LOWERCASE_MODEL_PROVIDERS: frozenset[str] = frozenset({
-    "xiaomi",
-})
-
 # ---------------------------------------------------------------------------
 # DeepSeek special handling
 # ---------------------------------------------------------------------------
@@ -129,30 +115,17 @@ _DEEPSEEK_REASONER_KEYWORDS: frozenset[str] = frozenset({
 })

 _DEEPSEEK_CANONICAL_MODELS: frozenset[str] = frozenset({
-    "deepseek-chat",       # V3 on DeepSeek direct and most aggregators
-    "deepseek-reasoner",   # R1-family reasoning model
-    "deepseek-v4-pro",     # V4 Pro — first-class model ID
-    "deepseek-v4-flash",   # V4 Flash — first-class model ID
+    "deepseek-chat",
+    "deepseek-reasoner",
 })

-# First-class V-series IDs (``deepseek-v4-pro``, ``deepseek-v4-flash``,
-# future ``deepseek-v5-*``, dated variants like ``deepseek-v4-flash-20260423``).
-# Verified empirically 2026-04-24: DeepSeek's Chat Completions API returns
-# ``provider: DeepSeek`` / ``model: deepseek-v4-flash-20260423`` when called
-# with ``model=deepseek/deepseek-v4-flash``, so these names are not aliases
-# of ``deepseek-chat`` and must not be folded into it.
-_DEEPSEEK_V_SERIES_RE = re.compile(r"^deepseek-v\d+([-.].+)?$")
-

 def _normalize_for_deepseek(model_name: str) -> str:
-    """Map a model input to a DeepSeek-accepted identifier.
+    """Map any model input to one of DeepSeek's two accepted identifiers.

    Rules:
-    - Already a known canonical (``deepseek-chat``/``deepseek-reasoner``/
-      ``deepseek-v4-pro``/``deepseek-v4-flash``) -> pass through.
-    - Matches the V-series pattern ``deepseek-v<digit>...`` -> pass through
-      (covers future ``deepseek-v5-*`` and dated variants without a release).
-    - Contains a reasoner keyword (r1, think, reasoning, cot, reasoner)
+    - Already ``deepseek-chat`` or ``deepseek-reasoner`` -> pass through.
+    - Contains any reasoner keyword (r1, think, reasoning, cot, reasoner)
      -> ``deepseek-reasoner``.
    - Everything else -> ``deepseek-chat``.

@@ -160,17 +133,13 @@ def _normalize_for_deepseek(model_name: str) -> str:
        model_name: The bare model name (vendor prefix already stripped).

    Returns:
-        A DeepSeek-accepted model identifier.
+        One of ``"deepseek-chat"`` or ``"deepseek-reasoner"``.
    """
    bare = _strip_vendor_prefix(model_name).lower()

    if bare in _DEEPSEEK_CANONICAL_MODELS:
        return bare

-    # V-series first-class IDs (v4-pro, v4-flash, future v5-*, dated variants)
-    if _DEEPSEEK_V_SERIES_RE.match(bare):
-        return bare
-
    # Check for reasoner-like keywords anywhere in the name
    for keyword in _DEEPSEEK_REASONER_KEYWORDS:
        if keyword in bare:
@@ -378,9 +347,6 @@ def normalize_model_for_provider(model_input: str, target_provider: str) -> str:

        >>> normalize_model_for_provider("claude-sonnet-4.6", "zai")
        'claude-sonnet-4.6'
-
-        >>> normalize_model_for_provider("MiMo-V2.5-Pro", "xiaomi")
-        'mimo-v2.5-pro'
    """
    name = (model_input or "").strip()
    if not name:
@@ -444,12 +410,7 @@ def normalize_model_for_provider(model_input: str, target_provider: str) -> str:

    # --- Direct providers: repair matching provider prefixes only ---
    if provider in _MATCHING_PREFIX_STRIP_PROVIDERS:
-        result = _strip_matching_provider_prefix(name, provider)
-        # Some providers require lowercase model IDs (e.g. Xiaomi's API
-        # rejects "MiMo-V2.5-Pro" but accepts "mimo-v2.5-pro").
-        if provider in _LOWERCASE_MODEL_PROVIDERS:
-            result = result.lower()
-        return result
+        return _strip_matching_provider_prefix(name, provider)

    # --- Authoritative native providers: preserve user-facing slugs as-is ---
    if provider in _AUTHORITATIVE_NATIVE_PROVIDERS:
@@ -527,42 +527,6 @@ def _resolve_alias_fallback(
    return None


-def resolve_display_context_length(
-    model: str,
-    provider: str,
-    base_url: str = "",
-    api_key: str = "",
-    model_info: Optional[ModelInfo] = None,
-) -> Optional[int]:
-    """Resolve the context length to show in /model output.
-
-    models.dev reports per-vendor context (e.g. gpt-5.5 = 1.05M on openai)
-    but provider-enforced limits can be lower (e.g. Codex OAuth caps the
-    same slug at 272k). The authoritative source is
-    ``agent.model_metadata.get_model_context_length`` which already knows
-    about Codex OAuth, Copilot, Nous, and falls back to models.dev for the
-    rest.
-
-    Prefer the provider-aware value; fall back to ``model_info.context_window``
-    only if the resolver returns nothing.
-    """
-    try:
-        from agent.model_metadata import get_model_context_length
-        ctx = get_model_context_length(
-            model,
-            base_url=base_url or "",
-            api_key=api_key or "",
-            provider=provider or None,
-        )
-        if ctx:
-            return int(ctx)
-    except Exception:
-        pass
-    if model_info is not None and model_info.context_window:
-        return int(model_info.context_window)
-    return None
-
-
 # ---------------------------------------------------------------------------
 # Core model-switching pipeline
 # ---------------------------------------------------------------------------
@@ -807,10 +771,7 @@ def switch_model(

    if provider_changed or explicit_provider:
        try:
-            runtime = resolve_runtime_provider(
-                requested=target_provider,
-                target_model=new_model,
-            )
+            runtime = resolve_runtime_provider(requested=target_provider)
            api_key = runtime.get("api_key", "")
            base_url = runtime.get("base_url", "")
            api_mode = runtime.get("api_mode", "")
@@ -827,10 +788,7 @@ def switch_model(
            )
    else:
        try:
-            runtime = resolve_runtime_provider(
-                requested=current_provider,
-                target_model=new_model,
-            )
+            runtime = resolve_runtime_provider(requested=current_provider)
            api_key = runtime.get("api_key", "")
            base_url = runtime.get("base_url", "")
            api_mode = runtime.get("api_mode", "")
@@ -857,7 +815,6 @@ def switch_model(
            target_provider,
            api_key=api_key,
            base_url=base_url,
-            api_mode=api_mode or None,
        )
    except Exception as e:
        validation = {
@@ -979,7 +936,7 @@ def list_authenticated_providers(
    from hermes_cli.auth import PROVIDER_REGISTRY
    from hermes_cli.models import (
        OPENROUTER_MODELS, _PROVIDER_MODELS,
-        _MODELS_DEV_PREFERRED, _merge_with_models_dev, provider_model_ids,
+        _MODELS_DEV_PREFERRED, _merge_with_models_dev,
    )

    results: List[dict] = []
@@ -1027,14 +984,6 @@ def list_authenticated_providers(

        # Check if any env var is set
        has_creds = any(os.environ.get(ev) for ev in env_vars)
-        if not has_creds:
-            try:
-                from hermes_cli.auth import _load_auth_store
-                store = _load_auth_store()
-                if store and hermes_id in store.get("credential_pool", {}):
-                    has_creds = True
-            except Exception:
-                pass
        if not has_creds:
            continue

@@ -1146,14 +1095,11 @@ def list_authenticated_providers(
        if not has_creds:
            continue

-        if hermes_slug in {"copilot", "copilot-acp"}:
-            model_ids = provider_model_ids(hermes_slug)
-        else:
-            # Use curated list — look up by Hermes slug, fall back to overlay key
-            model_ids = curated.get(hermes_slug, []) or curated.get(pid, [])
-            # Merge with models.dev for preferred providers (same rationale as above).
-            if hermes_slug in _MODELS_DEV_PREFERRED:
-                model_ids = _merge_with_models_dev(hermes_slug, model_ids)
+        # Use curated list — look up by Hermes slug, fall back to overlay key
+        model_ids = curated.get(hermes_slug, []) or curated.get(pid, [])
+        # Merge with models.dev for preferred providers (same rationale as above).
+        if hermes_slug in _MODELS_DEV_PREFERRED:
+            model_ids = _merge_with_models_dev(hermes_slug, model_ids)
        total = len(model_ids)
        top = model_ids[:max_models]

@@ -1276,15 +1222,6 @@ def list_authenticated_providers(
                    if m and m not in models_list:
                        models_list.append(m)

-            # Official OpenAI API rows in providers: often have base_url but no
-            # explicit models: dict — avoid a misleading zero count in /model.
-            if not models_list:
-                url_lower = str(api_url).strip().lower()
-                if "api.openai.com" in url_lower:
-                    fb = curated.get("openai") or []
-                    if fb:
-                        models_list = list(fb)
-
            # Try to probe /v1/models if URL is set (but don't block on it)
            # For now just show what we know from config
            results.append({
@@ -33,8 +33,6 @@ COPILOT_REASONING_EFFORTS_O_SERIES = ["low", "medium", "high"]
 # (model_id, display description shown in menus)
 OPENROUTER_MODELS: list[tuple[str, str]] = [
    ("moonshotai/kimi-k2.6",            "recommended"),
-    ("deepseek/deepseek-v4-pro",        ""),
-    ("deepseek/deepseek-v4-flash",      ""),
    ("anthropic/claude-opus-4.7",       ""),
    ("anthropic/claude-opus-4.6",       ""),
    ("anthropic/claude-sonnet-4.6",     ""),
@@ -42,7 +40,7 @@ OPENROUTER_MODELS: list[tuple[str, str]] = [
    ("anthropic/claude-sonnet-4.5",     ""),
    ("anthropic/claude-haiku-4.5",      ""),
    ("openrouter/elephant-alpha",       "free"),
-    ("openai/gpt-5.5",                  ""),
+    ("openai/gpt-5.4",                  ""),
    ("openai/gpt-5.4-mini",             ""),
    ("xiaomi/mimo-v2.5-pro",             ""),
    ("xiaomi/mimo-v2.5",                 ""),
@@ -65,7 +63,7 @@ OPENROUTER_MODELS: list[tuple[str, str]] = [
    ("nvidia/nemotron-3-super-120b-a12b:free", "free"),
    ("arcee-ai/trinity-large-preview:free", "free"),
    ("arcee-ai/trinity-large-thinking",  ""),
-    ("openai/gpt-5.5-pro",              ""),
+    ("openai/gpt-5.4-pro",              ""),
    ("openai/gpt-5.4-nano",             ""),
 ]

@@ -111,8 +109,6 @@ def _codex_curated_models() -> list[str]:
 _PROVIDER_MODELS: dict[str, list[str]] = {
    "nous": [
        "moonshotai/kimi-k2.6",
-        "deepseek/deepseek-v4-pro",
-        "deepseek/deepseek-v4-flash",
        "xiaomi/mimo-v2.5-pro",
        "xiaomi/mimo-v2.5",
        "anthropic/claude-opus-4.7",
@@ -120,7 +116,7 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
        "anthropic/claude-sonnet-4.6",
        "anthropic/claude-sonnet-4.5",
        "anthropic/claude-haiku-4.5",
-        "openai/gpt-5.5",
+        "openai/gpt-5.4",
        "openai/gpt-5.4-mini",
        "openai/gpt-5.3-codex",
        "google/gemini-3-pro-preview",
@@ -139,21 +135,9 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
        "x-ai/grok-4.20-beta",
        "nvidia/nemotron-3-super-120b-a12b",
        "arcee-ai/trinity-large-thinking",
-        "openai/gpt-5.5-pro",
+        "openai/gpt-5.4-pro",
        "openai/gpt-5.4-nano",
    ],
-    # Native OpenAI Chat Completions (api.openai.com). Used by /model counts and
-    # provider_model_ids fallback when /v1/models is unavailable.
-    "openai": [
-        "gpt-5.4",
-        "gpt-5.4-mini",
-        "gpt-5-mini",
-        "gpt-5.3-codex",
-        "gpt-5.2-codex",
-        "gpt-4.1",
-        "gpt-4o",
-        "gpt-4o-mini",
-    ],
    "openai-codex": _codex_curated_models(),
    "copilot-acp": [
        "copilot-acp",
@@ -167,13 +151,10 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
        "gpt-4.1",
        "gpt-4o",
        "gpt-4o-mini",
+        "claude-opus-4.6",
        "claude-sonnet-4.6",
-        "claude-sonnet-4",
        "claude-sonnet-4.5",
        "claude-haiku-4.5",
-        "gemini-3.1-pro-preview",
-        "gemini-3-pro-preview",
-        "gemini-3-flash-preview",
        "gemini-2.5-pro",
        "grok-code-fast-1",
    ],
@@ -265,8 +246,6 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
        "claude-haiku-4-5-20251001",
    ],
    "deepseek": [
-        "deepseek-v4-pro",
-        "deepseek-v4-flash",
        "deepseek-chat",
        "deepseek-reasoner",
    ],
@@ -697,7 +676,7 @@ def get_nous_recommended_aux_model(
 # ---------------------------------------------------------------------------
 # Canonical provider list — single source of truth for provider identity.
 # Every code path that lists, displays, or iterates providers derives from
-# this list:  hermes model, /model, list_authenticated_providers.
+# this list:  hermes model, /model, /provider, list_authenticated_providers.
 #
 # Fields:
 #   slug        — internal provider ID (used in config.yaml, --provider flag)
@@ -1125,10 +1104,7 @@ def fetch_models_with_pricing(
        return _pricing_cache[cache_key]

    url = cache_key.rstrip("/") + "/v1/models"
-    headers: dict[str, str] = {
-        "Accept": "application/json",
-        "User-Agent": _HERMES_USER_AGENT,
-    }
+    headers: dict[str, str] = {"Accept": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"

@@ -1760,17 +1736,6 @@ def provider_model_ids(provider: Optional[str], *, force_refresh: bool = False)
        live = fetch_ollama_cloud_models(force_refresh=force_refresh)
        if live:
            return live
-    if normalized == "openai":
-        api_key = os.getenv("OPENAI_API_KEY", "").strip()
-        if api_key:
-            base_raw = os.getenv("OPENAI_BASE_URL", "").strip().rstrip("/")
-            base = base_raw or "https://api.openai.com/v1"
-            try:
-                live = fetch_api_models(api_key, base)
-                if live:
-                    return live
-            except Exception:
-                pass
    if normalized == "custom":
        base_url = _get_custom_base_url()
        if base_url:
@@ -1925,51 +1890,6 @@ def fetch_github_model_catalog(
    return None


-# ─── Copilot catalog context-window helpers ─────────────────────────────────
-
-# Module-level cache: {model_id: max_prompt_tokens}
-_copilot_context_cache: dict[str, int] = {}
-_copilot_context_cache_time: float = 0.0
-_COPILOT_CONTEXT_CACHE_TTL = 3600  # 1 hour
-
-
-def get_copilot_model_context(model_id: str, api_key: Optional[str] = None) -> Optional[int]:
-    """Look up max_prompt_tokens for a Copilot model from the live /models API.
-
-    Results are cached in-process for 1 hour to avoid repeated API calls.
-    Returns the token limit or None if not found.
-    """
-    global _copilot_context_cache, _copilot_context_cache_time
-
-    # Serve from cache if fresh
-    if _copilot_context_cache and (time.time() - _copilot_context_cache_time < _COPILOT_CONTEXT_CACHE_TTL):
-        if model_id in _copilot_context_cache:
-            return _copilot_context_cache[model_id]
-        # Cache is fresh but model not in it — don't re-fetch
-        return None
-
-    # Fetch and populate cache
-    catalog = fetch_github_model_catalog(api_key=api_key)
-    if not catalog:
-        return None
-
-    cache: dict[str, int] = {}
-    for item in catalog:
-        mid = str(item.get("id") or "").strip()
-        if not mid:
-            continue
-        caps = item.get("capabilities") or {}
-        limits = caps.get("limits") or {}
-        max_prompt = limits.get("max_prompt_tokens")
-        if isinstance(max_prompt, int) and max_prompt > 0:
-            cache[mid] = max_prompt
-
-    _copilot_context_cache = cache
-    _copilot_context_cache_time = time.time()
-
-    return cache.get(model_id)
-
-
 def _is_github_models_base_url(base_url: Optional[str]) -> bool:
    normalized = (base_url or "").strip().rstrip("/").lower()
    return (
@@ -2003,7 +1923,6 @@ _COPILOT_MODEL_ALIASES = {
    "openai/o4-mini": "gpt-5-mini",
    "anthropic/claude-opus-4.6": "claude-opus-4.6",
    "anthropic/claude-sonnet-4.6": "claude-sonnet-4.6",
-    "anthropic/claude-sonnet-4": "claude-sonnet-4",
    "anthropic/claude-sonnet-4.5": "claude-sonnet-4.5",
    "anthropic/claude-haiku-4.5": "claude-haiku-4.5",
    # Dash-notation fallbacks: Hermes' default Claude IDs elsewhere use
@@ -2013,12 +1932,10 @@ _COPILOT_MODEL_ALIASES = {
    # "model_not_supported".  See issue #6879.
    "claude-opus-4-6": "claude-opus-4.6",
    "claude-sonnet-4-6": "claude-sonnet-4.6",
-    "claude-sonnet-4-0": "claude-sonnet-4",
    "claude-sonnet-4-5": "claude-sonnet-4.5",
    "claude-haiku-4-5": "claude-haiku-4.5",
    "anthropic/claude-opus-4-6": "claude-opus-4.6",
    "anthropic/claude-sonnet-4-6": "claude-sonnet-4.6",
-    "anthropic/claude-sonnet-4-0": "claude-sonnet-4",
    "anthropic/claude-sonnet-4-5": "claude-sonnet-4.5",
    "anthropic/claude-haiku-4-5": "claude-haiku-4.5",
 }
@@ -2243,15 +2160,8 @@ def probe_api_models(
    api_key: Optional[str],
    base_url: Optional[str],
    timeout: float = 5.0,
-    api_mode: Optional[str] = None,
 ) -> dict[str, Any]:
-    """Probe a ``/models`` endpoint with light URL heuristics.
-
-    For ``anthropic_messages`` mode, uses ``x-api-key`` and
-    ``anthropic-version`` headers (Anthropic's native auth) instead of
-    ``Authorization: Bearer``.  The response shape (``data[].id``) is
-    identical, so the same parser works for both.
-    """
+    """Probe an OpenAI-compatible ``/models`` endpoint with light URL heuristics."""
    normalized = (base_url or "").strip().rstrip("/")
    if not normalized:
        return {
@@ -2283,10 +2193,7 @@ def probe_api_models(

    tried: list[str] = []
    headers: dict[str, str] = {"User-Agent": _HERMES_USER_AGENT}
-    if api_key and api_mode == "anthropic_messages":
-        headers["x-api-key"] = api_key
-        headers["anthropic-version"] = "2023-06-01"
-    elif api_key:
+    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    if normalized.startswith(COPILOT_BASE_URL):
        headers.update(copilot_default_headers())
@@ -2328,10 +2235,7 @@ def _fetch_ai_gateway_models(timeout: float = 5.0) -> Optional[list[str]]:
        base_url = AI_GATEWAY_BASE_URL

    url = base_url.rstrip("/") + "/models"
-    headers: dict[str, str] = {
-        "Authorization": f"Bearer {api_key}",
-        "User-Agent": _HERMES_USER_AGENT,
-    }
+    headers: dict[str, str] = {"Authorization": f"Bearer {api_key}"}
    req = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
@@ -2351,14 +2255,13 @@ def fetch_api_models(
    api_key: Optional[str],
    base_url: Optional[str],
    timeout: float = 5.0,
-    api_mode: Optional[str] = None,
 ) -> Optional[list[str]]:
    """Fetch the list of available model IDs from the provider's ``/models`` endpoint.

    Returns a list of model ID strings, or ``None`` if the endpoint could not
    be reached (network error, timeout, auth failure, etc.).
    """
-    return probe_api_models(api_key, base_url, timeout=timeout, api_mode=api_mode).get("models")
+    return probe_api_models(api_key, base_url, timeout=timeout).get("models")


 # ---------------------------------------------------------------------------
@@ -2486,7 +2389,6 @@ def validate_requested_model(
    *,
    api_key: Optional[str] = None,
    base_url: Optional[str] = None,
-    api_mode: Optional[str] = None,
 ) -> dict[str, Any]:
    """
    Validate a ``/model`` value for the active provider.
@@ -2528,11 +2430,7 @@ def validate_requested_model(
        }

    if normalized == "custom":
-        # Try probing with correct auth for the api_mode.
-        if api_mode == "anthropic_messages":
-            probe = probe_api_models(api_key, base_url, api_mode=api_mode)
-        else:
-            probe = probe_api_models(api_key, base_url)
+        probe = probe_api_models(api_key, base_url)
        api_models = probe.get("models")
        if api_models is not None:
            if requested_for_lookup in set(api_models):
@@ -2581,17 +2479,12 @@ def validate_requested_model(
            f"Note: could not reach this custom endpoint's model listing at `{probe.get('probed_url')}`. "
            f"Hermes will still save `{requested}`, but the endpoint should expose `/models` for verification."
        )
-        if api_mode == "anthropic_messages":
-            message += (
-                "\n  Many Anthropic-compatible proxies do not implement the Models API "
-                "(GET /v1/models).  The model name has been accepted without verification."
-            )
        if probe.get("suggested_base_url"):
            message += f"\n  If this server expects `/v1`, try base URL: `{probe.get('suggested_base_url')}`"

        return {
-            "accepted": api_mode == "anthropic_messages",
-            "persist": True,
+            "accepted": False,
+            "persist": False,
            "recognized": False,
            "message": message,
        }
@@ -2679,100 +2572,10 @@ def validate_requested_model(
                ),
            }

-    # Native Anthropic provider: /v1/models requires x-api-key (or Bearer for
-    # OAuth) plus anthropic-version headers.  The generic OpenAI-style probe
-    # below uses plain Bearer auth and 401s against Anthropic, so dispatch to
-    # the native fetcher which handles both API keys and Claude-Code OAuth
-    # tokens.  (The api_mode=="anthropic_messages" branch below handles the
-    # Messages-API transport case separately.)
-    if normalized == "anthropic":
-        anthropic_models = _fetch_anthropic_models()
-        if anthropic_models is not None:
-            if requested_for_lookup in set(anthropic_models):
-                return {
-                    "accepted": True,
-                    "persist": True,
-                    "recognized": True,
-                    "message": None,
-                }
-            auto = get_close_matches(requested_for_lookup, anthropic_models, n=1, cutoff=0.9)
-            if auto:
-                return {
-                    "accepted": True,
-                    "persist": True,
-                    "recognized": True,
-                    "corrected_model": auto[0],
-                    "message": f"Auto-corrected `{requested}` → `{auto[0]}`",
-                }
-            suggestions = get_close_matches(requested, anthropic_models, n=3, cutoff=0.5)
-            suggestion_text = ""
-            if suggestions:
-                suggestion_text = "\n  Similar models: " + ", ".join(f"`{s}`" for s in suggestions)
-            # Accept anyway — Anthropic sometimes gates newer/preview models
-            # (e.g. snapshot IDs, early-access releases) behind accounts
-            # even though they aren't listed on /v1/models.
-            return {
-                "accepted": True,
-                "persist": True,
-                "recognized": False,
-                "message": (
-                    f"Note: `{requested}` was not found in Anthropic's /v1/models listing. "
-                    f"It may still work if you have early-access or snapshot IDs."
-                    f"{suggestion_text}"
-                ),
-            }
-        # _fetch_anthropic_models returned None — no token resolvable or
-        # network failure.  Fall through to the generic warning below.
-
-    # Anthropic Messages API: many proxies don't implement /v1/models.
-    # Try probing with correct auth; if it fails, accept with a warning.
-    if api_mode == "anthropic_messages":
-        api_models = fetch_api_models(api_key, base_url, api_mode=api_mode)
-        if api_models is not None:
-            if requested_for_lookup in set(api_models):
-                return {
-                    "accepted": True,
-                    "persist": True,
-                    "recognized": True,
-                    "message": None,
-                }
-            auto = get_close_matches(requested_for_lookup, api_models, n=1, cutoff=0.9)
-            if auto:
-                return {
-                    "accepted": True,
-                    "persist": True,
-                    "recognized": True,
-                    "corrected_model": auto[0],
-                    "message": f"Auto-corrected `{requested}` → `{auto[0]}`",
-                }
-        # Probe failed or model not found — accept anyway (proxy likely
-        # doesn't implement the Anthropic Models API).
-        return {
-            "accepted": True,
-            "persist": True,
-            "recognized": False,
-            "message": (
-                f"Note: could not verify `{requested}` against this endpoint's "
-                f"model listing.  Many Anthropic-compatible proxies do not "
-                f"implement GET /v1/models.  The model name has been accepted "
-                f"without verification."
-            ),
-        }
-
    # Probe the live API to check if the model actually exists
    api_models = fetch_api_models(api_key, base_url)

    if api_models is not None:
-        # Gemini's OpenAI-compat /v1beta/openai/models endpoint returns IDs
-        # prefixed with "models/" (e.g. "models/gemini-2.5-flash") — native
-        # Gemini-API convention.  Our curated list and user input both use
-        # the bare ID, so a direct set-membership check drops every known
-        # Gemini model.  Strip the prefix before comparison.  See #12532.
-        if normalized == "gemini":
-            api_models = [
-                m[len("models/"):] if isinstance(m, str) and m.startswith("models/") else m
-                for m in api_models
-            ]
        if requested_for_lookup in set(api_models):
            # API confirmed the model exists
            return {
@@ -71,14 +71,6 @@ VALID_HOOKS: Set[str] = {
    "on_session_finalize",
    "on_session_reset",
    "subagent_stop",
-    # Gateway pre-dispatch hook. Fired once per incoming MessageEvent
-    # after the internal-event guard but BEFORE auth/pairing and agent
-    # dispatch. Plugins may return a dict to influence flow:
-    #   {"action": "skip",    "reason": "..."}  -> drop message (no reply)
-    #   {"action": "rewrite", "text": "..."}    -> replace event.text, continue
-    #   {"action": "allow"}  /  None             -> normal dispatch
-    # Kwargs: event: MessageEvent, gateway: GatewayRunner, session_store.
-    "pre_gateway_dispatch",
 }

 ENTRY_POINTS_GROUP = "hermes_agent.plugins"
@@ -116,10 +116,6 @@ HERMES_OVERLAYS: Dict[str, HermesOverlay] = {
        transport="openai_chat",
        base_url_env_var="DASHSCOPE_BASE_URL",
    ),
-    "alibaba-coding-plan": HermesOverlay(
-        transport="openai_chat",
-        base_url_env_var="ALIBABA_CODING_PLAN_BASE_URL",
-    ),
    "vercel": HermesOverlay(
        transport="openai_chat",
        is_aggregator=True,
@@ -263,9 +259,6 @@ ALIASES: Dict[str, str] = {
    "aliyun": "alibaba",
    "qwen": "alibaba",
    "alibaba-cloud": "alibaba",
-    "alibaba_coding": "alibaba-coding-plan",
-    "alibaba-coding": "alibaba-coding-plan",
-    "alibaba_coding_plan": "alibaba-coding-plan",

    # google-gemini-cli (OAuth + Code Assist)
    "gemini-cli": "google-gemini-cli",
@@ -1,229 +0,0 @@
-"""PTY bridge for `hermes dashboard` chat tab.
-
-Wraps a child process behind a pseudo-terminal so its ANSI output can be
-streamed to a browser-side terminal emulator (xterm.js) and typed
-keystrokes can be fed back in.  The only caller today is the
-``/api/pty`` WebSocket endpoint in ``hermes_cli.web_server``.
-
-Design constraints:
-
-* **POSIX-only.**  Hermes Agent supports Windows exclusively via WSL, which
-  exposes a native POSIX PTY via ``openpty(3)``.  Native Windows Python
-  has no PTY; :class:`PtyUnavailableError` is raised with a user-readable
-  install/platform message so the dashboard can render a banner instead of
-  crashing.
-* **Zero Node dependency on the server side.**  We use :mod:`ptyprocess`,
-  which is a pure-Python wrapper around the OS calls.  The browser talks
-  to the same ``hermes --tui`` binary it would launch from the CLI, so
-  every TUI feature (slash popover, model picker, tool rows, markdown,
-  skin engine, clarify/sudo/approval prompts) ships automatically.
-* **Byte-safe I/O.**  Reads and writes go through the PTY master fd
-  directly — we avoid :class:`ptyprocess.PtyProcessUnicode` because
-  streaming ANSI is inherently byte-oriented and UTF-8 boundaries may land
-  mid-read.
-"""
-
-from __future__ import annotations
-
-import errno
-import fcntl
-import os
-import select
-import signal
-import struct
-import sys
-import termios
-import time
-from typing import Optional, Sequence
-
-try:
-    import ptyprocess  # type: ignore
-    _PTY_AVAILABLE = not sys.platform.startswith("win")
-except ImportError:  # pragma: no cover - dev env without ptyprocess
-    ptyprocess = None  # type: ignore
-    _PTY_AVAILABLE = False
-
-
-__all__ = ["PtyBridge", "PtyUnavailableError"]
-
-
-class PtyUnavailableError(RuntimeError):
-    """Raised when a PTY cannot be created on this platform.
-
-    Today this means native Windows (no ConPTY bindings) or a dev
-    environment missing the ``ptyprocess`` dependency.  The dashboard
-    surfaces the message to the user as a chat-tab banner.
-    """
-
-
-class PtyBridge:
-    """Thin wrapper around ``ptyprocess.PtyProcess`` for byte streaming.
-
-    Not thread-safe.  A single bridge is owned by the WebSocket handler
-    that spawned it; the reader runs in an executor thread while writes
-    happen on the event-loop thread.  Both sides are OK because the
-    kernel PTY is the actual synchronization point — we never call
-    :mod:`ptyprocess` methods concurrently, we only call ``os.read`` and
-    ``os.write`` on the master fd, which is safe.
-    """
-
-    def __init__(self, proc: "ptyprocess.PtyProcess"):  # type: ignore[name-defined]
-        self._proc = proc
-        self._fd: int = proc.fd
-        self._closed = False
-
-    # -- lifecycle --------------------------------------------------------
-
-    @classmethod
-    def is_available(cls) -> bool:
-        """True if a PTY can be spawned on this platform."""
-        return bool(_PTY_AVAILABLE)
-
-    @classmethod
-    def spawn(
-        cls,
-        argv: Sequence[str],
-        *,
-        cwd: Optional[str] = None,
-        env: Optional[dict] = None,
-        cols: int = 80,
-        rows: int = 24,
-    ) -> "PtyBridge":
-        """Spawn ``argv`` behind a new PTY and return a bridge.
-
-        Raises :class:`PtyUnavailableError` if the platform can't host a
-        PTY.  Raises :class:`FileNotFoundError` or :class:`OSError` for
-        ordinary exec failures (missing binary, bad cwd, etc.).
-        """
-        if not _PTY_AVAILABLE:
-            if sys.platform.startswith("win"):
-                raise PtyUnavailableError(
-                    "Pseudo-terminals are unavailable on this platform. "
-                    "Hermes Agent supports Windows only via WSL."
-                )
-            if ptyprocess is None:
-                raise PtyUnavailableError(
-                    "The `ptyprocess` package is missing. "
-                    "Install with: pip install ptyprocess "
-                    "(or pip install -e '.[pty]')."
-                )
-            raise PtyUnavailableError("Pseudo-terminals are unavailable.")
-        # Let caller-supplied env fully override inheritance; if they pass
-        # None we inherit the server's env (same semantics as subprocess).
-        spawn_env = os.environ.copy() if env is None else env
-        proc = ptyprocess.PtyProcess.spawn(  # type: ignore[union-attr]
-            list(argv),
-            cwd=cwd,
-            env=spawn_env,
-            dimensions=(rows, cols),
-        )
-        return cls(proc)
-
-    @property
-    def pid(self) -> int:
-        return int(self._proc.pid)
-
-    def is_alive(self) -> bool:
-        if self._closed:
-            return False
-        try:
-            return bool(self._proc.isalive())
-        except Exception:
-            return False
-
-    # -- I/O --------------------------------------------------------------
-
-    def read(self, timeout: float = 0.2) -> Optional[bytes]:
-        """Read up to 64 KiB of raw bytes from the PTY master.
-
-        Returns:
-            * bytes — one or more bytes of child output
-            * empty bytes (``b""``) — no data available within ``timeout``
-            * None — child has exited and the master fd is at EOF
-
-        Never blocks longer than ``timeout`` seconds.  Safe to call after
-        :meth:`close`; returns ``None`` in that case.
-        """
-        if self._closed:
-            return None
-        try:
-            readable, _, _ = select.select([self._fd], [], [], timeout)
-        except (OSError, ValueError):
-            return None
-        if not readable:
-            return b""
-        try:
-            data = os.read(self._fd, 65536)
-        except OSError as exc:
-            # EIO on Linux = slave side closed.  EBADF = already closed.
-            if exc.errno in (errno.EIO, errno.EBADF):
-                return None
-            raise
-        if not data:
-            return None
-        return data
-
-    def write(self, data: bytes) -> None:
-        """Write raw bytes to the PTY master (i.e. the child's stdin)."""
-        if self._closed or not data:
-            return
-        # os.write can return a short write under load; loop until drained.
-        view = memoryview(data)
-        while view:
-            try:
-                n = os.write(self._fd, view)
-            except OSError as exc:
-                if exc.errno in (errno.EIO, errno.EBADF, errno.EPIPE):
-                    return
-                raise
-            if n <= 0:
-                return
-            view = view[n:]
-
-    def resize(self, cols: int, rows: int) -> None:
-        """Forward a terminal resize to the child via ``TIOCSWINSZ``."""
-        if self._closed:
-            return
-        # struct winsize: rows, cols, xpixel, ypixel (all unsigned short)
-        winsize = struct.pack("HHHH", max(1, rows), max(1, cols), 0, 0)
-        try:
-            fcntl.ioctl(self._fd, termios.TIOCSWINSZ, winsize)
-        except OSError:
-            pass
-
-    # -- teardown ---------------------------------------------------------
-
-    def close(self) -> None:
-        """Terminate the child (SIGHUP → SIGTERM → SIGKILL, 0.5s grace each) and close fds.
-
-        Idempotent.  Reaping the child is important so we don't leak
-        zombies across the lifetime of the dashboard process.
-        """
-        if self._closed:
-            return
-        self._closed = True
-
-        # SIGHUP is the conventional "your terminal went away" signal.
-        # We escalate if the child ignores it.
-        for sig in (signal.SIGHUP, signal.SIGTERM, signal.SIGKILL):
-            if not self._proc.isalive():
-                break
-            try:
-                self._proc.kill(sig)
-            except Exception:
-                pass
-            deadline = time.monotonic() + 0.5
-            while self._proc.isalive() and time.monotonic() < deadline:
-                time.sleep(0.02)
-
-        try:
-            self._proc.close(force=True)
-        except Exception:
-            pass
-
-    # Context-manager sugar — handy in tests and ad-hoc scripts.
-    def __enter__(self) -> "PtyBridge":
-        return self
-
-    def __exit__(self, *_exc) -> None:
-        self.close()
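
The removed `write()` method above drains short writes by looping over a `memoryview`. A minimal standalone sketch of that pattern follows, assuming a plain file descriptor rather than the PTY master; `write_all` is a hypothetical helper name, not part of the removed module, and it omits the `errno` handling the original had.

```python
import os


def write_all(fd: int, data: bytes) -> int:
    """Write all of ``data`` to ``fd``, retrying after short writes.

    Returns the number of bytes actually written.
    """
    view = memoryview(data)  # slicing a memoryview does not copy the payload
    written = 0
    while view:
        n = os.write(fd, view)  # may write fewer bytes than requested
        if n <= 0:
            break
        written += n
        view = view[n:]  # keep only the bytes not yet written
    return written
```

Slicing the `memoryview` rather than the `bytes` object avoids re-copying the remaining payload on every iteration, which matters when large pastes are forwarded to the child.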
@@ -36,29 +36,6 @@ def _normalize_custom_provider_name(value: str) -> str:
    return value.strip().lower().replace(" ", "-")


-def _loopback_hostname(host: str) -> bool:
-    h = (host or "").lower().rstrip(".")
-    return h in {"localhost", "127.0.0.1", "::1", "0.0.0.0"}
-
-
-def _config_base_url_trustworthy_for_bare_custom(cfg_base_url: str, cfg_provider: str) -> bool:
-    """Decide whether ``model.base_url`` may back bare ``custom`` runtime resolution.
-
-    GitHub #14676: the model picker can select Custom while ``model.provider`` still reflects a
-    previous provider. Reject non-loopback URLs unless the YAML provider is already ``custom``,
-    so a stale OpenRouter/Z.ai base_url cannot hijack local ``custom`` sessions.
-    """
-    cfg_provider_norm = (cfg_provider or "").strip().lower()
-    bu = (cfg_base_url or "").strip()
-    if not bu:
-        return False
-    if cfg_provider_norm == "custom":
-        return True
-    if base_url_host_matches(bu, "openrouter.ai"):
-        return False
-    return _loopback_hostname(base_url_hostname(bu))
-
-
 def _detect_api_mode_for_url(base_url: str) -> Optional[str]:
    """Auto-detect api_mode from the resolved base URL.

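
The removed `_loopback_hostname` check above can be sketched on its own as follows. This is an illustration under assumptions: it uses the standard library's `urlsplit` in place of the repo's `base_url_hostname` helper (whose exact behavior is not shown in this diff), and `is_loopback_url` is a hypothetical name.

```python
from urllib.parse import urlsplit

# Same host set as the removed helper; 0.0.0.0 is treated as local here too.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1", "0.0.0.0"}


def is_loopback_url(base_url: str) -> bool:
    """Return True if the URL's host is one of the loopback names above."""
    # urlsplit().hostname lowercases the host and strips IPv6 brackets.
    host = (urlsplit(base_url or "").hostname or "").lower().rstrip(".")
    return host in LOOPBACK_HOSTS
```

A URL without a scheme (for example `localhost:11434`) has no parsed hostname under `urlsplit`, so this sketch reports it as non-loopback.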
@@ -183,16 +160,8 @@ def _resolve_runtime_from_pool_entry(
    requested_provider: str,
    model_cfg: Optional[Dict[str, Any]] = None,
    pool: Optional[CredentialPool] = None,
-    target_model: Optional[str] = None,
 ) -> Dict[str, Any]:
    model_cfg = model_cfg or _get_model_config()
-    # When the caller is resolving for a specific target model (e.g. a /model
-    # mid-session switch), prefer that over the persisted model.default. This
-    # prevents api_mode being computed from a stale config default that no
-    # longer matches the model actually being used — the bug that caused
-    # opencode-zen /v1 to be stripped for chat_completions requests when
-    # config.default was still a Claude model.
-    effective_model = (target_model or model_cfg.get("default") or "")
    base_url = (getattr(entry, "runtime_base_url", None) or getattr(entry, "base_url", None) or "").rstrip("/")
    api_key = getattr(entry, "runtime_api_key", None) or getattr(entry, "access_token", "")
    api_mode = "chat_completions"
@@ -238,7 +207,7 @@ def _resolve_runtime_from_pool_entry(
            api_mode = configured_mode
        elif provider in ("opencode-zen", "opencode-go"):
            from hermes_cli.models import opencode_model_api_mode
-            api_mode = opencode_model_api_mode(provider, effective_model)
+            api_mode = opencode_model_api_mode(provider, model_cfg.get("default", ""))
        else:
            # Auto-detect Anthropic-compatible endpoints (/anthropic suffix,
            # Kimi /coding, api.openai.com → codex_responses, api.x.ai →
@@ -354,16 +323,12 @@ def _get_named_custom_provider(requested_provider: str) -> Optional[Dict[str, An
                # Found match by provider key
                base_url = entry.get("api") or entry.get("url") or entry.get("base_url") or ""
                if base_url:
-                    result = {
+                    return {
                        "name": entry.get("name", ep_name),
                        "base_url": base_url.strip(),
                        "api_key": resolved_api_key,
                        "model": entry.get("default_model", ""),
                    }
-                    api_mode = _parse_api_mode(entry.get("api_mode"))
-                    if api_mode:
-                        result["api_mode"] = api_mode
-                    return result
            # Also check the 'name' field if present
            display_name = entry.get("name", "")
            if display_name:
@@ -372,16 +337,12 @@ def _get_named_custom_provider(requested_provider: str) -> Optional[Dict[str, An
                    # Found match by display name
                    base_url = entry.get("api") or entry.get("url") or entry.get("base_url") or ""
                    if base_url:
-                        result = {
+                        return {
                            "name": display_name,
                            "base_url": base_url.strip(),
                            "api_key": resolved_api_key,
                            "model": entry.get("default_model", ""),
                        }
-                        api_mode = _parse_api_mode(entry.get("api_mode"))
-                        if api_mode:
-                            result["api_mode"] = api_mode
-                        return result

    # Fall back to custom_providers: list (legacy format)
    custom_providers = config.get("custom_providers")
@@ -503,7 +464,6 @@ def _resolve_openrouter_runtime(
    cfg_provider = cfg_provider.strip().lower()

    env_openrouter_base_url = os.getenv("OPENROUTER_BASE_URL", "").strip()
-    env_custom_base_url = os.getenv("CUSTOM_BASE_URL", "").strip()

    # Use config base_url when available and the provider context matches.
    # OPENAI_BASE_URL env var is no longer consulted — config.yaml is
@@ -513,14 +473,11 @@ def _resolve_openrouter_runtime(
        if requested_norm == "auto":
            if not cfg_provider or cfg_provider == "auto":
                use_config_base_url = True
-        elif requested_norm == "custom" and _config_base_url_trustworthy_for_bare_custom(
-            cfg_base_url, cfg_provider
-        ):
+        elif requested_norm == "custom" and cfg_provider == "custom":
            use_config_base_url = True

    base_url = (
        (explicit_base_url or "").strip()
-        or env_custom_base_url
        or (cfg_base_url.strip() if use_config_base_url else "")
        or env_openrouter_base_url
        or OPENROUTER_BASE_URL
@@ -732,18 +689,8 @@ def resolve_runtime_provider(
    requested: Optional[str] = None,
    explicit_api_key: Optional[str] = None,
    explicit_base_url: Optional[str] = None,
-    target_model: Optional[str] = None,
 ) -> Dict[str, Any]:
-    """Resolve runtime provider credentials for agent execution.
-
-    target_model: Optional override for model_cfg.get("default") when
-    computing provider-specific api_mode (e.g. OpenCode Zen/Go where different
-    models route through different API surfaces). Callers performing an
-    explicit mid-session model switch should pass the new model here so
-    api_mode is derived from the model they are switching TO, not the stale
-    persisted default. Other callers can leave it None to preserve existing
-    behavior (api_mode derived from config).
-    """
+    """Resolve runtime provider credentials for agent execution."""
    requested_provider = resolve_requested_provider(requested)

    custom_runtime = _resolve_named_custom_runtime(
@@ -825,7 +772,6 @@ def resolve_runtime_provider(
                requested_provider=requested_provider,
                model_cfg=model_cfg,
                pool=pool,
-                target_model=target_model,
            )

    if provider == "nous":
@@ -1044,11 +990,7 @@ def resolve_runtime_provider(
                api_mode = configured_mode
            elif provider in ("opencode-zen", "opencode-go"):
                from hermes_cli.models import opencode_model_api_mode
-                # Prefer the target_model from the caller (explicit mid-session
-                # switch) over the stale model.default; see _resolve_runtime_from_pool_entry
-                # for the same rationale.
-                _effective = target_model or model_cfg.get("default", "")
-                api_mode = opencode_model_api_mode(provider, _effective)
+                api_mode = opencode_model_api_mode(provider, model_cfg.get("default", ""))
            else:
                # Auto-detect Anthropic-compatible endpoints by URL convention
                # (e.g. https://api.minimax.io/anthropic, https://dashscope.../anthropic)
@@ -18,9 +18,10 @@ import shutil
 import sys
 import copy
 from pathlib import Path
-from typing import Optional, Dict, Any
+from typing import Literal, Optional, Dict, Any

 from hermes_cli.nous_subscription import get_nous_subscription_features
+from hermes_cli.main import _model_flow_nous
 from tools.tool_backend_helpers import managed_nous_tools_enabled
 from utils import base_url_hostname
 from hermes_constants import get_optional_skills_dir
@@ -500,15 +501,6 @@ def _print_setup_summary(config: dict, hermes_home):
    if get_env_value("HASS_TOKEN"):
        tool_status.append(("Smart Home (Home Assistant)", True, None))

-    # Spotify (OAuth via hermes auth spotify — check auth.json, not env vars)
-    try:
-        from hermes_cli.auth import get_provider_auth_state
-        _spotify_state = get_provider_auth_state("spotify") or {}
-        if _spotify_state.get("access_token") or _spotify_state.get("refresh_token"):
-            tool_status.append(("Spotify (PKCE OAuth)", True, None))
-    except Exception:
-        pass
-
    # Skills Hub
    if get_env_value("GITHUB_TOKEN"):
        tool_status.append(("Skills Hub (GitHub)", True, None))
@@ -664,7 +656,7 @@ def _prompt_container_resources(config: dict):



-def setup_model_provider(config: dict, *, quick: bool = False):
+def setup_model_provider(config: dict, *, quick: bool | Literal["nous_portal"] = False):
    """Configure the inference provider and default model.

    Delegates to ``cmd_model()`` (the same flow used by ``hermes model``)
@@ -686,7 +678,11 @@ def setup_model_provider(config: dict, *, quick: bool = False):
    # credential prompting, model selection, and config persistence.
    from hermes_cli.main import select_provider_and_model
    try:
-        select_provider_and_model()
+        if quick == "nous_portal":
+            config = load_config()
+            _model_flow_nous(config)
+        else:
+            select_provider_and_model()
    except (SystemExit, KeyboardInterrupt):
        print()
        print_info("Provider setup skipped.")
@@ -3039,11 +3035,15 @@ def run_setup_wizard(args):
            config = load_config()

        setup_mode = prompt_choice("How would you like to set up Hermes?", [
-            "Quick setup — provider, model & messaging (recommended)",
+            "Nous Account setup — model & messaging (recommended)",
+            "Quick setup — provider, model & messaging",
            "Full setup — configure everything",
        ], 0)

        if setup_mode == 0:
+            _run_first_time_quick_setup(config, hermes_home, is_existing, nous_quick=True)
+            return
+        if setup_mode == 1:
            _run_first_time_quick_setup(config, hermes_home, is_existing)
            return

@@ -3104,7 +3104,7 @@ def _resolve_hermes_chat_argv() -> Optional[list[str]]:
    return None


-def _offer_launch_chat():
+def _offer_launch_chat(auto_launch: bool = False):
    """Prompt the user to jump straight into chat after setup."""
    print()
    if not prompt_yes_no("Launch hermes chat now?", True):
@@ -3118,7 +3118,7 @@ def _offer_launch_chat():
    os.execvp(chat_argv[0], chat_argv)


-def _run_first_time_quick_setup(config: dict, hermes_home, is_existing: bool):
+def _run_first_time_quick_setup(config: dict, hermes_home, is_existing: bool, nous_quick: bool = False):
    """Streamlined first-time setup: provider + model only.

    Applies sensible defaults for TTS (Edge), terminal (local), agent
@@ -3126,7 +3126,7 @@ def _run_first_time_quick_setup(config: dict, hermes_home, is_existing: bool):
    ``hermes setup <section>``.
    """
    # Step 1: Model & Provider (essential — skips rotation/vision/TTS)
-    setup_model_provider(config, quick=True)
+    setup_model_provider(config, quick="nous_portal" if nous_quick else True)

    # Step 2: Apply defaults for everything else
    _apply_default_agent_settings(config)
@@ -3159,7 +3159,9 @@ def _run_first_time_quick_setup(config: dict, hermes_home, is_existing: bool):

    _print_setup_summary(config, hermes_home)

-    _offer_launch_chat()
+    # If the user hasn't set up the gateway, assume they want to launch chat.
+    force_launch_chat = gateway_choice == 0
+    _offer_launch_chat(force_launch_chat)


 def _run_quick_setup(config: dict, hermes_home):
@@ -164,26 +164,19 @@ def show_status(args):
        qwen_status = {}

    nous_logged_in = bool(nous_status.get("logged_in"))
-    nous_error = nous_status.get("error")
-    nous_label = "logged in" if nous_logged_in else "not logged in (run: hermes auth add nous --type oauth)"
    print(
        f"  {'Nous Portal':<12}  {check_mark(nous_logged_in)} "
-        f"{nous_label}"
+        f"{'logged in' if nous_logged_in else 'not logged in (run: hermes model)'}"
    )
-    portal_url = nous_status.get("portal_base_url") or "(unknown)"
-    access_exp = _format_iso_timestamp(nous_status.get("access_expires_at"))
-    key_exp = _format_iso_timestamp(nous_status.get("agent_key_expires_at"))
-    refresh_label = "yes" if nous_status.get("has_refresh_token") else "no"
-    if nous_logged_in or portal_url != "(unknown)" or nous_error:
+    if nous_logged_in:
+        portal_url = nous_status.get("portal_base_url") or "(unknown)"
+        access_exp = _format_iso_timestamp(nous_status.get("access_expires_at"))
+        key_exp = _format_iso_timestamp(nous_status.get("agent_key_expires_at"))
+        refresh_label = "yes" if nous_status.get("has_refresh_token") else "no"
        print(f"    Portal URL: {portal_url}")
-    if nous_logged_in or nous_status.get("access_expires_at"):
        print(f"    Access exp: {access_exp}")
-    if nous_logged_in or nous_status.get("agent_key_expires_at"):
        print(f"    Key exp:    {key_exp}")
-    if nous_logged_in or nous_status.get("has_refresh_token"):
        print(f"    Refresh:    {refresh_label}")
-    if nous_error and not nous_logged_in:
-        print(f"    Error:      {nous_error}")

    codex_logged_in = bool(codex_status.get("logged_in"))
    print(
@@ -127,7 +127,7 @@ TIPS = [

    # --- Tools & Capabilities ---
    "execute_code runs Python scripts that call Hermes tools programmatically — results stay out of context.",
-    "delegate_task spawns up to 3 concurrent sub-agents by default (delegation.max_concurrent_children) with isolated contexts for parallel work.",
+    "delegate_task spawns up to 3 concurrent sub-agents by default (configurable via delegation.max_concurrent_children) with isolated contexts for parallel work.",
    "web_extract works on PDF URLs — pass any PDF link and it converts to markdown.",
    "search_files is ripgrep-backed and faster than grep — use it instead of terminal grep.",
    "patch uses 9 fuzzy matching strategies so minor whitespace differences won't break edits.",
@@ -67,59 +67,25 @@ CONFIGURABLE_TOOLSETS = [
    ("messaging",       "📨 Cross-Platform Messaging",  "send_message"),
    ("rl",              "🧪 RL Training",               "Tinker-Atropos training tools"),
    ("homeassistant",    "🏠 Home Assistant",           "smart home device control"),
-    ("spotify",          "🎵 Spotify",                  "playback, search, playlists, library"),
-    ("discord",         "💬 Discord (read/participate)", "fetch messages, search members, create thread"),
-    ("discord_admin",   "🛡️  Discord Server Admin",    "list channels/roles, pin, assign roles"),
 ]

 # Toolsets that are OFF by default for new installs.
 # They're still in _HERMES_CORE_TOOLS (available at runtime if enabled),
 # but the setup checklist won't pre-select them for first-time users.
-_DEFAULT_OFF_TOOLSETS = {"moa", "homeassistant", "rl", "spotify", "discord", "discord_admin"}
-
-# Platform-scoped toolsets: only appear in the `hermes tools` checklist for
-# these platforms, and only resolve/save for these platforms.  A toolset
-# absent from this map is available on every platform (current behaviour).
-#
-# Use this for tools whose APIs only make sense on one platform (Discord
-# server admin, Slack workspace admin, etc.).  Keeps every other platform's
-# checklist from filling up with irrelevant toggles.
-_TOOLSET_PLATFORM_RESTRICTIONS: Dict[str, Set[str]] = {
-    "discord": {"discord"},
-    "discord_admin": {"discord"},
-}
-
-
-def _toolset_allowed_for_platform(ts_key: str, platform: str) -> bool:
-    """Return True if ``ts_key`` is configurable on ``platform``.
-
-    Toolsets without a restriction entry are allowed everywhere (the default).
-    """
-    allowed = _TOOLSET_PLATFORM_RESTRICTIONS.get(ts_key)
-    return allowed is None or platform in allowed
+_DEFAULT_OFF_TOOLSETS = {"moa", "homeassistant", "rl"}


 def _get_effective_configurable_toolsets():
    """Return CONFIGURABLE_TOOLSETS + any plugin-provided toolsets.

    Plugin toolsets are appended at the end so they appear after the
-    built-in toolsets in the TUI checklist. A plugin whose toolset key
-    already appears in ``CONFIGURABLE_TOOLSETS`` is skipped — bundled
-    plugins (e.g. ``plugins/spotify``) share their toolset key with the
-    built-in entry, and we want the built-in label/description to win.
-    Without the dedupe, ``hermes tools`` → "reconfigure existing" would
-    list the same toolset twice.
+    built-in toolsets in the TUI checklist.
    """
    result = list(CONFIGURABLE_TOOLSETS)
-    seen = {ts_key for ts_key, _, _ in result}
    try:
        from hermes_cli.plugins import discover_plugins, get_plugin_toolsets
        discover_plugins()  # idempotent — ensures plugins are loaded
-        for entry in get_plugin_toolsets():
-            if entry[0] in seen:
-                continue
-            seen.add(entry[0])
-            result.append(entry)
+        result.extend(get_plugin_toolsets())
    except Exception:
        pass
    return result
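
The dedupe that this hunk removes (built-in entries win, plugin entries with a duplicate key are skipped) can be sketched in isolation. `merge_toolsets` and the sample tuples are hypothetical, chosen only to illustrate the `(key, label, description)` shape used by `CONFIGURABLE_TOOLSETS`.

```python
from typing import Iterable, List, Tuple

Toolset = Tuple[str, str, str]  # (key, label, description)


def merge_toolsets(builtin: Iterable[Toolset], plugin: Iterable[Toolset]) -> List[Toolset]:
    """Append plugin toolsets after the built-ins, skipping duplicate keys."""
    result = list(builtin)
    seen = {key for key, _, _ in result}
    for entry in plugin:
        if entry[0] in seen:
            continue  # built-in (or earlier plugin) entry wins
        seen.add(entry[0])
        result.append(entry)
    return result
```

With the plain `result.extend(...)` that replaces this logic, a plugin that reuses a built-in key would be listed twice.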
@@ -395,18 +361,6 @@ TOOL_CATEGORIES = {
            },
        ],
    },
-    "spotify": {
-        "name": "Spotify",
-        "icon": "🎵",
-        "providers": [
-            {
-                "name": "Spotify Web API",
-                "tag": "PKCE OAuth — opens the setup wizard",
-                "env_vars": [],
-                "post_setup": "spotify",
-            },
-        ],
-    },
    "rl": {
        "name": "RL Training",
        "icon": "🧪",
@@ -507,35 +461,6 @@ def _run_post_setup(post_setup_key: str):
            _print_warning("    kittentts install timed out (>5min)")
            _print_info(f"    Run manually: python -m pip install -U '{wheel_url}' soundfile")

-    elif post_setup_key == "spotify":
-        # Run the full `hermes auth spotify` flow — if the user has no
-        # client_id yet, this drops them into the interactive wizard
-        # (opens the Spotify dashboard, prompts for client_id, persists
-        # to ~/.hermes/.env), then continues straight into PKCE. If they
-        # already have an app, it skips the wizard and just does OAuth.
-        from types import SimpleNamespace
-        try:
-            from hermes_cli.auth import login_spotify_command
-        except Exception as exc:
-            _print_warning(f"    Could not load Spotify auth: {exc}")
-            _print_info("    Run manually: hermes auth spotify")
-            return
-        _print_info("    Starting Spotify login...")
-        try:
-            login_spotify_command(SimpleNamespace(
-                client_id=None, redirect_uri=None, scope=None,
-                no_browser=False, timeout=None,
-            ))
-            _print_success("    Spotify authenticated")
-        except SystemExit as exc:
-            # User aborted the wizard, or OAuth failed — don't fail the
-            # toolset enable; they can retry with `hermes auth spotify`.
-            _print_warning(f"    Spotify login did not complete: {exc}")
-            _print_info("    Run later: hermes auth spotify")
-        except Exception as exc:
-            _print_warning(f"    Spotify login failed: {exc}")
-            _print_info("    Run manually: hermes auth spotify")
-
    elif post_setup_key == "rl_training":
        try:
            __import__("tinker_atropos")
@@ -624,7 +549,7 @@ def _get_platform_tools(
    include_default_mcp_servers: bool = True,
 ) -> Set[str]:
    """Resolve which individual toolset names are enabled for a platform."""
-    from toolsets import resolve_toolset, TOOLSETS
+    from toolsets import resolve_toolset

    platform_toolsets = config.get("platform_toolsets") or {}
    toolset_names = platform_toolsets.get(platform)
@@ -638,8 +563,6 @@ def _get_platform_tools(
    toolset_names = [str(ts) for ts in toolset_names]

    configurable_keys = {ts_key for ts_key, _, _ in CONFIGURABLE_TOOLSETS}
-    plugin_ts_keys = _get_plugin_toolset_keys()
-    platform_default_keys = {p["default_toolset"] for p in PLATFORMS.values()}

    # If the saved list contains any configurable keys directly, the user
    # has explicitly configured this platform — use direct membership.
@@ -649,10 +572,7 @@ def _get_platform_tools(
    has_explicit_config = any(ts in configurable_keys for ts in toolset_names)

    if has_explicit_config:
-        enabled_toolsets = {
-            ts for ts in toolset_names
-            if ts in configurable_keys and _toolset_allowed_for_platform(ts, platform)
-        }
+        enabled_toolsets = {ts for ts in toolset_names if ts in configurable_keys}
    else:
        # No explicit config — fall back to resolving composite toolset names
        # (e.g. "hermes-cli") to individual tool names and reverse-mapping.
@@ -662,59 +582,19 @@ def _get_platform_tools(

        enabled_toolsets = set()
        for ts_key, _, _ in CONFIGURABLE_TOOLSETS:
-            if not _toolset_allowed_for_platform(ts_key, platform):
-                continue
            ts_tools = set(resolve_toolset(ts_key))
            if ts_tools and ts_tools.issubset(all_tool_names):
                enabled_toolsets.add(ts_key)
-
        default_off = set(_DEFAULT_OFF_TOOLSETS)
-        # Legacy safety: if the platform's own name matches a default-off
-        # toolset (e.g. `homeassistant` platform + `homeassistant` toolset),
-        # keep that toolset enabled on first install.  Skip this dodge for
-        # platform-restricted toolsets — those are always opt-in even on
-        # their own platform (e.g. `discord` + `discord` should stay OFF).
-        if platform in default_off and platform not in _TOOLSET_PLATFORM_RESTRICTIONS:
+        if platform in default_off:
            default_off.remove(platform)
        enabled_toolsets -= default_off

-    # Recover non-configurable platform toolsets (e.g. discord, feishu_doc,
-    # feishu_drive).  These are part of the platform's default composite but
-    # absent from CONFIGURABLE_TOOLSETS, so they can't appear in the TUI
-    # checklist or in a user-saved config.  Must run in BOTH branches —
-    # otherwise saving via `hermes tools` (which flips has_explicit_config
-    # to True) silently drops them.
-    platform_tool_universe = set(resolve_toolset(PLATFORMS[platform]["default_toolset"]))
-    configurable_tool_universe = set()
-    for ck in configurable_keys:
-        configurable_tool_universe.update(resolve_toolset(ck))
-    claimed = set()
-    for ts_key in enabled_toolsets:
-        claimed.update(resolve_toolset(ts_key))
-    skip = configurable_keys | plugin_ts_keys | platform_default_keys
-    skip |= {k for k in TOOLSETS if k.startswith("hermes-")}
-    skip |= set(_DEFAULT_OFF_TOOLSETS) - {platform}
-    for ts_key, ts_def in TOOLSETS.items():
-        if ts_key in skip:
-            continue
-        if ts_def.get("includes"):
-            continue
-        ts_tools = set(resolve_toolset(ts_key))
-        if not ts_tools or not ts_tools.issubset(platform_tool_universe):
-            continue
-        if ts_tools.issubset(configurable_tool_universe):
-            continue
-        if not ts_tools.issubset(claimed):
-            enabled_toolsets.add(ts_key)
-            claimed.update(ts_tools)
-
-    # Plugin toolsets: enabled by default unless explicitly disabled, or
-    # unless the toolset is in _DEFAULT_OFF_TOOLSETS (e.g. spotify —
-    # shipped as a bundled plugin but user must opt in via `hermes tools`
-    # so we don't ship 7 Spotify tool schemas to users who don't use it).
+    # Plugin toolsets: enabled by default unless explicitly disabled.
    # A plugin toolset is "known" for a platform once `hermes tools`
    # has been saved for that platform (tracked via known_plugin_toolsets).
    # Unknown plugins default to enabled; known-but-absent = disabled.
+    plugin_ts_keys = _get_plugin_toolset_keys()
    if plugin_ts_keys:
        known_map = config.get("known_plugin_toolsets", {})
        known_for_platform = set(known_map.get(platform, []))
@@ -722,9 +602,6 @@ def _get_platform_tools(
            if pts in toolset_names:
                # Explicitly listed in config — enabled
                enabled_toolsets.add(pts)
-            elif pts in _DEFAULT_OFF_TOOLSETS:
-                # Opt-in plugin toolset — stay off until user picks it
-                continue
            elif pts not in known_for_platform:
                # New plugin not yet seen by hermes tools — default enabled
                enabled_toolsets.add(pts)
@@ -732,6 +609,7 @@ def _get_platform_tools(

    # Preserve any explicit non-configurable toolset entries (for example,
    # custom toolsets or MCP server names saved in platform_toolsets).
+    platform_default_keys = {p["default_toolset"] for p in PLATFORMS.values()}
    explicit_passthrough = {
        ts
        for ts in toolset_names
@@ -777,14 +655,6 @@ def _save_platform_tools(config: dict, platform: str, enabled_toolset_keys: Set[
    """
    config.setdefault("platform_toolsets", {})

-    # Drop platform-scoped toolsets that don't apply here.  Prevents the
-    # "Configure all platforms" checklist (or a hand-edited config.yaml)
-    # from turning on, say, the `discord` toolset for Telegram.
-    enabled_toolset_keys = {
-        ts for ts in enabled_toolset_keys
-        if _toolset_allowed_for_platform(ts, platform)
-    }
-
    # Get the set of all configurable toolset keys (built-in + plugin)
    configurable_keys = {ts_key for ts_key, _, _ in CONFIGURABLE_TOOLSETS}
    plugin_keys = _get_plugin_toolset_keys()
@@ -799,7 +669,6 @@ def _save_platform_tools(config: dict, platform: str, enabled_toolset_keys: Set[
    existing_toolsets = config.get("platform_toolsets", {}).get(platform, [])
    if not isinstance(existing_toolsets, list):
        existing_toolsets = []
-    existing_toolsets = [str(ts) for ts in existing_toolsets]

    # Preserve any entries that are NOT configurable toolsets and NOT platform
    # defaults (i.e. only MCP server names should be preserved)
@@ -807,11 +676,6 @@ def _save_platform_tools(config: dict, platform: str, enabled_toolset_keys: Set[
        entry for entry in existing_toolsets
        if entry not in configurable_keys and entry not in platform_default_keys
    }
-    # Opening `hermes tools` is the user's opt-in to reconfigure tools, so treat
-    # saving from the picker as consent to clear the "no_mcp" sentinel. The
-    # picker has no checkbox for no_mcp, so without this users who once set it
-    # by hand could never re-enable MCP servers through the UI.
-    preserved_entries.discard("no_mcp")

    # Merge preserved entries with new enabled toolsets
    config["platform_toolsets"][platform] = sorted(enabled_toolset_keys | preserved_entries)
@@ -919,7 +783,7 @@ def _estimate_tool_tokens() -> Dict[str, int]:
    return _tool_token_cache


-def _prompt_toolset_checklist(platform_label: str, enabled: Set[str], platform: str = "cli") -> Set[str]:
+def _prompt_toolset_checklist(platform_label: str, enabled: Set[str]) -> Set[str]:
    """Multi-select checklist of toolsets. Returns set of selected toolset keys."""
    from hermes_cli.curses_ui import curses_checklist
    from toolsets import resolve_toolset
@@ -927,12 +791,7 @@ def _prompt_toolset_checklist(platform_label: str, enabled: Set[str], platform:
    # Pre-compute per-tool token counts (cached after first call).
    tool_tokens = _estimate_tool_tokens()

-    effective_all = _get_effective_configurable_toolsets()
-    # Drop platform-scoped toolsets that don't apply to this platform.
-    effective = [
-        (k, l, d) for (k, l, d) in effective_all
-        if _toolset_allowed_for_platform(k, platform)
-    ]
+    effective = _get_effective_configurable_toolsets()

    labels = []
    for ts_key, ts_label, ts_desc in effective:
@@ -1846,7 +1705,7 @@ def tools_command(args=None, first_install: bool = False, config: dict = None):
            checklist_preselected = current_enabled - _DEFAULT_OFF_TOOLSETS

            # Show checklist
-            new_enabled = _prompt_toolset_checklist(pinfo["label"], checklist_preselected, pkey)
+            new_enabled = _prompt_toolset_checklist(pinfo["label"], checklist_preselected)

            added = new_enabled - current_enabled
            removed = current_enabled - new_enabled
@@ -2202,11 +2061,7 @@ def _apply_mcp_change(config: dict, targets: List[str], action: str) -> Set[str]

 def _print_tools_list(enabled_toolsets: set, mcp_servers: dict, platform: str = "cli"):
    """Print a summary of enabled/disabled toolsets and MCP tool filters."""
-    effective_all = _get_effective_configurable_toolsets()
-    effective = [
-        (k, l, d) for (k, l, d) in effective_all
-        if _toolset_allowed_for_platform(k, platform)
-    ]
+    effective = _get_effective_configurable_toolsets()
    builtin_keys = {ts_key for ts_key, _, _ in CONFIGURABLE_TOOLSETS}

    print(f"Built-in toolsets ({platform}):")
@@ -2272,20 +2127,6 @@ def tools_disable_enable_command(args):
            _print_error(f"Unknown toolset '{name}'")
        toolset_targets = [t for t in toolset_targets if t in valid_toolsets]

-    # Reject platform-scoped toolsets on platforms that don't allow them.
-    restricted_targets = [
-        t for t in toolset_targets
-        if not _toolset_allowed_for_platform(t, platform)
-    ]
-    if restricted_targets:
-        for name in restricted_targets:
-            allowed = sorted(_TOOLSET_PLATFORM_RESTRICTIONS.get(name) or set())
-            _print_error(
-                f"Toolset '{name}' is not available on platform '{platform}' "
-                f"(only: {', '.join(allowed)})"
-            )
-        toolset_targets = [t for t in toolset_targets if t not in restricted_targets]
-
    if toolset_targets:
        _apply_toolset_change(config, platform, toolset_targets, action)

@@ -49,7 +49,7 @@ from hermes_cli.config import (
 from gateway.status import get_running_pid, read_runtime_status

 try:
-    from fastapi import FastAPI, HTTPException, Request, WebSocket, WebSocketDisconnect
+    from fastapi import FastAPI, HTTPException, Request
    from fastapi.middleware.cors import CORSMiddleware
    from fastapi.responses import FileResponse, HTMLResponse, JSONResponse
    from fastapi.staticfiles import StaticFiles
@@ -73,10 +73,6 @@ app = FastAPI(title="Hermes Agent", version=__version__)
 _SESSION_TOKEN = secrets.token_urlsafe(32)
 _SESSION_HEADER_NAME = "X-Hermes-Session-Token"

-# In-browser Chat tab (/chat, /api/pty, …).  Off unless ``hermes dashboard --tui``
-# or HERMES_DASHBOARD_TUI=1.  Set from :func:`start_server`.
-_DASHBOARD_EMBEDDED_CHAT_ENABLED = False
-
 # Simple rate limiter for the reveal endpoint
 _reveal_timestamps: List[float] = []
 _REVEAL_MAX_PER_WINDOW = 5
@@ -287,7 +283,7 @@ _SCHEMA_OVERRIDES: Dict[str, Dict[str, Any]] = {
    "display.busy_input_mode": {
        "type": "select",
        "description": "Input behavior while agent is running",
-        "options": ["interrupt", "queue"],
+        "options": ["queue", "interrupt", "block"],
    },
    "memory.provider": {
        "type": "select",
@@ -1533,30 +1529,26 @@ def _submit_anthropic_pkce(session_id: str, code_input: str) -> Dict[str, Any]:
        with urllib.request.urlopen(req, timeout=20) as resp:
            result = json.loads(resp.read().decode())
    except Exception as e:
-        with _oauth_sessions_lock:
-            sess["status"] = "error"
-            sess["error_message"] = f"Token exchange failed: {e}"
+        sess["status"] = "error"
+        sess["error_message"] = f"Token exchange failed: {e}"
        return {"ok": False, "status": "error", "message": sess["error_message"]}

    access_token = result.get("access_token", "")
    refresh_token = result.get("refresh_token", "")
    expires_in = int(result.get("expires_in") or 3600)
    if not access_token:
-        with _oauth_sessions_lock:
-            sess["status"] = "error"
-            sess["error_message"] = "No access token returned"
+        sess["status"] = "error"
+        sess["error_message"] = "No access token returned"
        return {"ok": False, "status": "error", "message": sess["error_message"]}

    expires_at_ms = int(time.time() * 1000) + (expires_in * 1000)
    try:
        _save_anthropic_oauth_creds(access_token, refresh_token, expires_at_ms)
    except Exception as e:
-        with _oauth_sessions_lock:
-            sess["status"] = "error"
-            sess["error_message"] = f"Save failed: {e}"
+        sess["status"] = "error"
+        sess["error_message"] = f"Save failed: {e}"
        return {"ok": False, "status": "error", "message": sess["error_message"]}
-    with _oauth_sessions_lock:
-        sess["status"] = "approved"
+    sess["status"] = "approved"
    _log.info("oauth/pkce: anthropic login completed (session=%s)", session_id)
    return {"ok": True, "status": "approved"}

@@ -2271,329 +2263,6 @@ async def get_usage_analytics(days: int = 30):
        db.close()


-# ---------------------------------------------------------------------------
-# /api/pty — PTY-over-WebSocket bridge for the dashboard "Chat" tab.
-#
-# The endpoint spawns the same ``hermes --tui`` binary the CLI uses, behind
-# a POSIX pseudo-terminal, and forwards bytes + resize escapes across a
-# WebSocket.  The browser renders the ANSI through xterm.js (see
-# web/src/pages/ChatPage.tsx).
-#
-# Auth: ``?token=<session_token>`` query param (browsers can't set
-# Authorization on the WS upgrade).  Same ephemeral ``_SESSION_TOKEN`` as
-# REST.  Localhost-only — we defensively reject non-loopback clients even
-# though uvicorn binds to 127.0.0.1.
-# ---------------------------------------------------------------------------
-
-import re
-import asyncio
-
-from hermes_cli.pty_bridge import PtyBridge, PtyUnavailableError
-
-_RESIZE_RE = re.compile(rb"\x1b\[RESIZE:(\d+);(\d+)\]")
-_PTY_READ_CHUNK_TIMEOUT = 0.2
-_VALID_CHANNEL_RE = re.compile(r"^[A-Za-z0-9._-]{1,128}$")
-# Starlette's TestClient reports the peer as "testclient"; treat it as
-# loopback so tests don't need to rewrite request scope.
-_LOOPBACK_HOSTS = frozenset({"127.0.0.1", "::1", "localhost", "testclient"})
-
-# Per-channel subscriber registry used by /api/pub (PTY-side gateway → dashboard)
-# and /api/events (dashboard → browser sidebar).  Keyed by an opaque channel id
-# the chat tab generates on mount; entries auto-evict when the last subscriber
-# drops AND the publisher has disconnected.
-_event_channels: dict[str, set] = {}
-_event_lock = asyncio.Lock()
-
-
-def _resolve_chat_argv(
-    resume: Optional[str] = None,
-    sidecar_url: Optional[str] = None,
-) -> tuple[list[str], Optional[str], Optional[dict]]:
-    """Resolve the argv + cwd + env for the chat PTY.
-
-    Default: whatever ``hermes --tui`` would run.  Tests monkeypatch this
-    function to inject a tiny fake command (``cat``, ``sh -c 'printf …'``)
-    so nothing has to build Node or the TUI bundle.
-
-    Session resume is propagated via the ``HERMES_TUI_RESUME`` env var —
-    matching what ``hermes_cli.main._launch_tui`` does for the CLI path.
-    Appending ``--resume <id>`` to argv doesn't work because ``ui-tui`` does
-    not parse its argv.
-
-    `sidecar_url` (when set) is forwarded as ``HERMES_TUI_SIDECAR_URL`` so
-    the spawned ``tui_gateway.entry`` can mirror dispatcher emits to the
-    dashboard's ``/api/pub`` endpoint (see :func:`pub_ws`).
-    """
-    from hermes_cli.main import PROJECT_ROOT, _make_tui_argv
-
-    argv, cwd = _make_tui_argv(PROJECT_ROOT / "ui-tui", tui_dev=False)
-    env: Optional[dict] = None
-
-    if resume or sidecar_url:
-        env = os.environ.copy()
-
-        if resume:
-            env["HERMES_TUI_RESUME"] = resume
-
-        if sidecar_url:
-            env["HERMES_TUI_SIDECAR_URL"] = sidecar_url
-
-    return list(argv), str(cwd) if cwd else None, env
-
-
-def _build_sidecar_url(channel: str) -> Optional[str]:
-    """ws:// URL the PTY child should publish events to, or None when unbound."""
-    host = getattr(app.state, "bound_host", None)
-    port = getattr(app.state, "bound_port", None)
-
-    if not host or not port:
-        return None
-
-    netloc = f"[{host}]:{port}" if ":" in host and not host.startswith("[") else f"{host}:{port}"
-    qs = urllib.parse.urlencode({"token": _SESSION_TOKEN, "channel": channel})
-
-    return f"ws://{netloc}/api/pub?{qs}"
-
-
-async def _broadcast_event(channel: str, payload: str) -> None:
-    """Fan out one publisher frame to every subscriber on `channel`."""
-    async with _event_lock:
-        subs = list(_event_channels.get(channel, ()))
-
-    for sub in subs:
-        try:
-            await sub.send_text(payload)
-        except Exception:
-            # Subscriber went away mid-send; the /api/events finally clause
-            # will remove it from the registry on its next iteration.
-            pass
-
-
-def _channel_or_close_code(ws: WebSocket) -> Optional[str]:
-    """Return the channel id from the query string or None if invalid."""
-    channel = ws.query_params.get("channel", "")
-
-    return channel if _VALID_CHANNEL_RE.match(channel) else None
-
-
-@app.websocket("/api/pty")
-async def pty_ws(ws: WebSocket) -> None:
-    if not _DASHBOARD_EMBEDDED_CHAT_ENABLED:
-        await ws.close(code=4403)
-        return
-
-    # --- auth + loopback check (before accept so we can close cleanly) ---
-    token = ws.query_params.get("token", "")
-    expected = _SESSION_TOKEN
-    if not hmac.compare_digest(token.encode(), expected.encode()):
-        await ws.close(code=4401)
-        return
-
-    client_host = ws.client.host if ws.client else ""
-    if client_host and client_host not in _LOOPBACK_HOSTS:
-        await ws.close(code=4403)
-        return
-
-    await ws.accept()
-
-    # --- spawn PTY ------------------------------------------------------
-    resume = ws.query_params.get("resume") or None
-    channel = _channel_or_close_code(ws)
-    sidecar_url = _build_sidecar_url(channel) if channel else None
-
-    try:
-        argv, cwd, env = _resolve_chat_argv(resume=resume, sidecar_url=sidecar_url)
-    except SystemExit as exc:
-        # _make_tui_argv calls sys.exit(1) when node/npm is missing.
-        await ws.send_text(f"\r\n\x1b[31mChat unavailable: {exc}\x1b[0m\r\n")
-        await ws.close(code=1011)
-        return
-
-
-    try:
-        bridge = PtyBridge.spawn(argv, cwd=cwd, env=env)
-    except PtyUnavailableError as exc:
-        await ws.send_text(f"\r\n\x1b[31mChat unavailable: {exc}\x1b[0m\r\n")
-        await ws.close(code=1011)
-        return
-    except (FileNotFoundError, OSError) as exc:
-        await ws.send_text(f"\r\n\x1b[31mChat failed to start: {exc}\x1b[0m\r\n")
-        await ws.close(code=1011)
-        return
-
-    loop = asyncio.get_running_loop()
-
-    # --- reader task: PTY master → WebSocket ----------------------------
-    async def pump_pty_to_ws() -> None:
-        while True:
-            chunk = await loop.run_in_executor(
-                None, bridge.read, _PTY_READ_CHUNK_TIMEOUT
-            )
-            if chunk is None:  # EOF
-                return
-            if not chunk:  # no data this tick; yield control and retry
-                await asyncio.sleep(0)
-                continue
-            try:
-                await ws.send_bytes(chunk)
-            except Exception:
-                return
-
-    reader_task = asyncio.create_task(pump_pty_to_ws())
-
-    # --- writer loop: WebSocket → PTY master ----------------------------
-    try:
-        while True:
-            msg = await ws.receive()
-            msg_type = msg.get("type")
-            if msg_type == "websocket.disconnect":
-                break
-            raw = msg.get("bytes")
-            if raw is None:
-                text = msg.get("text")
-                raw = text.encode("utf-8") if isinstance(text, str) else b""
-            if not raw:
-                continue
-
-            # Resize escape is consumed locally, never written to the PTY.
-            match = _RESIZE_RE.match(raw)
-            if match and match.end() == len(raw):
-                cols = int(match.group(1))
-                rows = int(match.group(2))
-                bridge.resize(cols=cols, rows=rows)
-                continue
-
-            bridge.write(raw)
-    except WebSocketDisconnect:
-        pass
-    finally:
-        reader_task.cancel()
-        try:
-            await reader_task
-        except (asyncio.CancelledError, Exception):
-            pass
-        bridge.close()
-
-
-# ---------------------------------------------------------------------------
-# /api/ws — JSON-RPC WebSocket sidecar for the dashboard "Chat" tab.
-#
-# Drives the same `tui_gateway.dispatch` surface Ink uses over stdio, so the
-# dashboard can render structured metadata (model badge, tool-call sidebar,
-# slash launcher, session info) alongside the xterm.js terminal that PTY
-# already paints. Both transports bind to the same session id when one is
-# active, so a tool.start emitted by the agent fans out to both sinks.
-# ---------------------------------------------------------------------------
-
-
-@app.websocket("/api/ws")
-async def gateway_ws(ws: WebSocket) -> None:
-    if not _DASHBOARD_EMBEDDED_CHAT_ENABLED:
-        await ws.close(code=4403)
-        return
-
-    token = ws.query_params.get("token", "")
-    if not hmac.compare_digest(token.encode(), _SESSION_TOKEN.encode()):
-        await ws.close(code=4401)
-        return
-
-    client_host = ws.client.host if ws.client else ""
-    if client_host and client_host not in _LOOPBACK_HOSTS:
-        await ws.close(code=4403)
-        return
-
-    from tui_gateway.ws import handle_ws
-
-    await handle_ws(ws)
-
-
-# ---------------------------------------------------------------------------
-# /api/pub + /api/events — chat-tab event broadcast.
-#
-# The PTY-side ``tui_gateway.entry`` opens /api/pub at startup (driven by
-# HERMES_TUI_SIDECAR_URL set in /api/pty's PTY env) and writes every
-# dispatcher emit through it.  The dashboard fans those frames out to any
-# subscriber that opened /api/events on the same channel id.  This is what
-# gives the React sidebar its tool-call feed without breaking the PTY
-# child's stdio handshake with Ink.
-# ---------------------------------------------------------------------------
-
-
-@app.websocket("/api/pub")
-async def pub_ws(ws: WebSocket) -> None:
-    if not _DASHBOARD_EMBEDDED_CHAT_ENABLED:
-        await ws.close(code=4403)
-        return
-
-    token = ws.query_params.get("token", "")
-    if not hmac.compare_digest(token.encode(), _SESSION_TOKEN.encode()):
-        await ws.close(code=4401)
-        return
-
-    client_host = ws.client.host if ws.client else ""
-    if client_host and client_host not in _LOOPBACK_HOSTS:
-        await ws.close(code=4403)
-        return
-
-    channel = _channel_or_close_code(ws)
-    if not channel:
-        await ws.close(code=4400)
-        return
-
-    await ws.accept()
-
-    try:
-        while True:
-            await _broadcast_event(channel, await ws.receive_text())
-    except WebSocketDisconnect:
-        pass
-
-
-@app.websocket("/api/events")
-async def events_ws(ws: WebSocket) -> None:
-    if not _DASHBOARD_EMBEDDED_CHAT_ENABLED:
-        await ws.close(code=4403)
-        return
-
-    token = ws.query_params.get("token", "")
-    if not hmac.compare_digest(token.encode(), _SESSION_TOKEN.encode()):
-        await ws.close(code=4401)
-        return
-
-    client_host = ws.client.host if ws.client else ""
-    if client_host and client_host not in _LOOPBACK_HOSTS:
-        await ws.close(code=4403)
-        return
-
-    channel = _channel_or_close_code(ws)
-    if not channel:
-        await ws.close(code=4400)
-        return
-
-    await ws.accept()
-
-    async with _event_lock:
-        _event_channels.setdefault(channel, set()).add(ws)
-
-    try:
-        while True:
-            # Subscribers don't speak — the receive() just blocks until
-            # disconnect so the connection stays open as long as the
-            # browser holds it.
-            await ws.receive_text()
-    except WebSocketDisconnect:
-        pass
-    finally:
-        async with _event_lock:
-            subs = _event_channels.get(channel)
-
-            if subs is not None:
-                subs.discard(ws)
-
-                if not subs:
-                    _event_channels.pop(channel, None)
-
-
 def mount_spa(application: FastAPI):
    """Mount the built SPA. Falls back to index.html for client-side routing.

@@ -2615,10 +2284,8 @@ def mount_spa(application: FastAPI):
    def _serve_index():
        """Return index.html with the session token injected."""
        html = _index_path.read_text()
-        chat_js = "true" if _DASHBOARD_EMBEDDED_CHAT_ENABLED else "false"
        token_script = (
-            f'<script>window.__HERMES_SESSION_TOKEN__="{_SESSION_TOKEN}";'
-            f"window.__HERMES_DASHBOARD_EMBEDDED_CHAT__={chat_js};</script>"
+            f'<script>window.__HERMES_SESSION_TOKEN__="{_SESSION_TOKEN}";</script>'
        )
        html = html.replace("</head>", f"{token_script}</head>", 1)
        return HTMLResponse(
@@ -3131,15 +2798,10 @@ def start_server(
    port: int = 9119,
    open_browser: bool = True,
    allow_public: bool = False,
-    *,
-    embedded_chat: bool = False,
 ):
    """Start the web UI server."""
    import uvicorn

-    global _DASHBOARD_EMBEDDED_CHAT_ENABLED
-    _DASHBOARD_EMBEDDED_CHAT_ENABLED = embedded_chat
-
    _LOCALHOST = ("127.0.0.1", "localhost", "::1")
    if host not in _LOCALHOST and not allow_public:
        raise SystemExit(
@@ -3155,10 +2817,7 @@ def start_server(

    # Record the bound host so host_header_middleware can validate incoming
    # Host headers against it. Defends against DNS rebinding (GHSA-ppp5-vxwm-4cf7).
-    # bound_port is also stashed so /api/pty can build the back-WS URL the
-    # PTY child uses to publish events to the dashboard sidebar.
    app.state.bound_host = host
-    app.state.bound_port = port

    if open_browser:
        import webbrowser
@@ -1039,71 +1039,6 @@ class SessionDB:
            result.append(msg)
        return result

-    def resolve_resume_session_id(self, session_id: str) -> str:
-        """Redirect a resume target to the descendant session that holds the messages.
-
-        Context compression ends the current session and forks a new child session
-        (linked via ``parent_session_id``). The flush cursor is reset, so the
-        child is where new messages actually land — the parent ends up with
-        ``message_count = 0`` rows unless messages had already been flushed to
-        it before compression. See #15000.
-
-        This helper walks ``parent_session_id`` forward from ``session_id`` and
-        returns the first descendant in the chain that has at least one message
-        row. If the original session already has messages, or no descendant
-        has any, the original ``session_id`` is returned unchanged.
-
-        The chain is always walked via the child whose ``started_at`` is
-        latest; that matches the single-chain shape that compression creates.
-        A depth cap (32) guards against accidental loops in malformed data.
-        """
-        if not session_id:
-            return session_id
-
-        with self._lock:
-            # If this session already has messages, nothing to redirect.
-            try:
-                row = self._conn.execute(
-                    "SELECT 1 FROM messages WHERE session_id = ? LIMIT 1",
-                    (session_id,),
-                ).fetchone()
-            except Exception:
-                return session_id
-            if row is not None:
-                return session_id
-
-            # Walk descendants: at each step, pick the most-recently-started
-            # child session; stop once we find one with messages.
-            current = session_id
-            seen = {current}
-            for _ in range(32):
-                try:
-                    child_row = self._conn.execute(
-                        "SELECT id FROM sessions "
-                        "WHERE parent_session_id = ? "
-                        "ORDER BY started_at DESC, id DESC LIMIT 1",
-                        (current,),
-                    ).fetchone()
-                except Exception:
-                    return session_id
-                if child_row is None:
-                    return session_id
-                child_id = child_row["id"] if hasattr(child_row, "keys") else child_row[0]
-                if not child_id or child_id in seen:
-                    return session_id
-                seen.add(child_id)
-                try:
-                    msg_row = self._conn.execute(
-                        "SELECT 1 FROM messages WHERE session_id = ? LIMIT 1",
-                        (child_id,),
-                    ).fetchone()
-                except Exception:
-                    return session_id
-                if msg_row is not None:
-                    return child_id
-                current = child_id
-        return session_id
-
    def get_messages_as_conversation(self, session_id: str) -> List[Dict[str, Any]]:
        """
        Load messages in the OpenAI conversation format (role + content dicts).
@@ -288,34 +288,30 @@ def get_tool_definitions(
                filtered_tools[i] = {"type": "function", "function": dynamic_schema}
                break

-    # Rebuild discord / discord_admin schemas based on the bot's privileged
-    # intents (detected from GET /applications/@me) and the user's action
-    # allowlist in config.  Hides actions the bot's intents don't support so
-    # the model never attempts them, and annotates fetch_messages when the
+    # Rebuild discord_server schema based on the bot's privileged intents
+    # (detected from GET /applications/@me) and the user's action allowlist
+    # in config.  Hides actions the bot's intents don't support so the
+    # model never attempts them, and annotates fetch_messages when the
    # MESSAGE_CONTENT intent is missing.
-    _discord_schema_fns = {
-        "discord": "get_dynamic_schema_core",
-        "discord_admin": "get_dynamic_schema_admin",
-    }
-    for discord_tool_name in _discord_schema_fns:
-        if discord_tool_name in available_tool_names:
-            try:
-                from tools import discord_tool as _dt
-                schema_fn = getattr(_dt, _discord_schema_fns[discord_tool_name])
-                dynamic = schema_fn()
-            except Exception:
-                dynamic = None
-            if dynamic is None:
-                filtered_tools = [
-                    t for t in filtered_tools
-                    if t.get("function", {}).get("name") != discord_tool_name
-                ]
-                available_tool_names.discard(discord_tool_name)
-            else:
-                for i, td in enumerate(filtered_tools):
-                    if td.get("function", {}).get("name") == discord_tool_name:
-                        filtered_tools[i] = {"type": "function", "function": dynamic}
-                        break
+    if "discord_server" in available_tool_names:
+        try:
+            from tools.discord_tool import get_dynamic_schema
+            dynamic = get_dynamic_schema()
+        except Exception:  # pragma: no cover — defensive, fall back to static
+            dynamic = None
+        if dynamic is None:
+            # Tool filtered out entirely (empty allowlist or detection disabled
+            # the only remaining actions).  Drop it from the schema list.
+            filtered_tools = [
+                t for t in filtered_tools
+                if t.get("function", {}).get("name") != "discord_server"
+            ]
+            available_tool_names.discard("discord_server")
+        else:
+            for i, td in enumerate(filtered_tools):
+                if td.get("function", {}).get("name") == "discord_server":
+                    filtered_tools[i] = {"type": "function", "function": dynamic}
+                    break

    # Strip web tool cross-references from browser_navigate description when
    # web_search / web_extract are not available.  The static schema says
@@ -347,18 +343,6 @@ def get_tool_definitions(
    global _last_resolved_tool_names
    _last_resolved_tool_names = [t["function"]["name"] for t in filtered_tools]

-    # Sanitize schemas for broad backend compatibility. llama.cpp's
-    # json-schema-to-grammar converter (used by its OAI server to build
-    # GBNF tool-call parsers) rejects some shapes that cloud providers
-    # silently accept — bare "type": "object" with no properties,
-    # string-valued schema nodes from malformed MCP servers, etc. This
-    # is a no-op for schemas that are already well-formed.
-    try:
-        from tools.schema_sanitizer import sanitize_tool_schemas
-        filtered_tools = sanitize_tool_schemas(filtered_tools)
-    except Exception as e:  # pragma: no cover — defensive
-        logger.warning("Schema sanitization skipped: %s", e)
-
    return filtered_tools


@@ -468,9 +452,9 @@ def _coerce_number(value: str, integer_only: bool = False):
        f = float(value)
    except (ValueError, OverflowError):
        return value
-    # Guard against inf/nan — not JSON-serializable, keep original string
+    # Guard against inf/nan before int() conversion
    if f != f or f == float("inf") or f == float("-inf"):
-        return value
+        return f
    # If it looks like an integer (no fractional part), return int
    if f == int(f):
        return int(f)
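
The hunk above changes what the inf/nan guard returns: the parsed float instead of the original string. A minimal standalone sketch of that guard follows. `coerce_number_sketch` and `is_strict_json` are hypothetical names, and the final `integer_only` branch is an assumption, since the diff does not show the end of `_coerce_number`.

```python
import json
import math


def coerce_number_sketch(value: str, integer_only: bool = False):
    """Hypothetical stand-in for the guard shown in the hunk above."""
    try:
        f = float(value)
    except (ValueError, OverflowError):
        return value
    # nan != nan, so `f != f` detects nan; math.isfinite covers nan and +/-inf.
    if not math.isfinite(f):
        return f  # behavior after this change; previously the string was kept
    if f == int(f):
        return int(f)
    # Assumption: the real function's tail is not visible in the diff.
    return value if integer_only else f


def is_strict_json(obj) -> bool:
    """True when obj serializes without the non-standard Infinity/NaN tokens."""
    try:
        json.dumps(obj, allow_nan=False)
        return True
    except ValueError:
        return False
```

The guard has to run before `int(f)` because `int(float("inf"))` raises `OverflowError` and `int(float("nan"))` raises `ValueError`. With the new return value, `is_strict_json(coerce_number_sketch("inf"))` is `False`, which is the trade-off the removed comment described.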
@@ -156,7 +156,7 @@
      for entry in "''${ENTRIES[@]}"; do
        IFS=":" read -r ATTR FOLDER NIX_FILE <<< "$entry"
        echo "==> .#$ATTR ($FOLDER -> $NIX_FILE)"
-        OUTPUT=$(nix build ".#$ATTR.npmDeps" --no-link --rebuild --print-build-logs 2>&1)
+        OUTPUT=$(nix build ".#$ATTR.npmDeps" --no-link --print-build-logs 2>&1)
        STATUS=$?
        if [ "$STATUS" -eq 0 ]; then
          echo "    ok"
@@ -4,7 +4,7 @@ let
  src = ../web;
  npmDeps = pkgs.fetchNpmDeps {
    inherit src;
-    hash = "sha256-4Z8KQ69QhO83X6zff+5urWBv6MME686MhTTMdwSl65o=";
+    hash = "sha256-TS/vrCHbdvXkPcAPxImKzAd2pdDCrKlgYZkXBMQ+TEg=";
  };

  npm = hermesNpmLib.mkNpmPassthru { folder = "web"; attr = "web"; pname = "hermes-web"; };
@@ -91,29 +91,4 @@

  // Register this plugin — the dashboard picks it up automatically.
  window.__HERMES_PLUGINS__.register("example", ExamplePage);
-
-  // ─────────────────────────────────────────────────────────────────────
-  // Page-scoped slot demo: inject a small banner at the top of /sessions.
-  //
-  // Built-in pages expose named slots (<page>:top, <page>:bottom) that
-  // plugins can populate without overriding the whole route. The
-  // manifest lists the slots we use in its `slots` array so the shell
-  // knows to render <PluginSlot name="sessions:top" /> there.
-  // ─────────────────────────────────────────────────────────────────────
-  function SessionsTopBanner() {
-    return React.createElement(Card, {
-      className: "border-dashed",
-    },
-      React.createElement(CardContent, { className: "flex items-center gap-3 py-2" },
-        React.createElement(Badge, { variant: "outline" }, "Example"),
-        React.createElement("span", {
-          className: "text-xs text-muted-foreground",
-        }, "This banner was injected into the Sessions page by the example plugin via the ",
-          React.createElement("code", { className: "font-courier" }, "sessions:top"),
-          " slot."),
-      ),
-    );
-  }
-
-  window.__HERMES_PLUGINS__.registerSlot("example", "sessions:top", SessionsTopBanner);
 })();
@@ -8,7 +8,6 @@
    "path": "/example",
    "position": "after:skills"
  },
-  "slots": ["sessions:top"],
  "entry": "dist/index.js",
  "api": "plugin_api.py"
 }
@@ -59,8 +59,7 @@ Config file: `~/.hermes/hindsight/config.json`

 | Key | Default | Description |
 |-----|---------|-------------|
-| `bank_id` | `hermes` | Memory bank name (static fallback used when `bank_id_template` is unset or resolves empty) |
-| `bank_id_template` | — | Optional template to derive the bank name dynamically. Placeholders: `{profile}`, `{workspace}`, `{platform}`, `{user}`, `{session}`. Example: `hermes-{profile}` isolates memory per active Hermes profile. Empty placeholders collapse cleanly (e.g. `hermes-{user}` with no user becomes `hermes`). |
+| `bank_id` | `hermes` | Memory bank name |
 | `bank_mission` | — | Reflect mission (identity/framing for reflect reasoning). Applied via Banks API. |
 | `bank_retain_mission` | — | Retain mission (steers what gets extracted). Applied via Banks API. |
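
The keys in the table above are read from `~/.hermes/hindsight/config.json`. A small sketch of writing such a file, assuming it is a flat JSON object keyed by these names. The `write_hindsight_config` helper and the mission strings are illustrative and not taken from the project.

```python
import json
import tempfile
from pathlib import Path


def write_hindsight_config(path: Path, **overrides: str) -> dict:
    """Hypothetical helper: merge overrides into the config file at path."""
    config = {"bank_id": "hermes"}  # documented default
    if path.exists():
        config.update(json.loads(path.read_text(encoding="utf-8")))
    config.update(overrides)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Demo against a temporary directory rather than the real ~/.hermes.
demo_path = Path(tempfile.mkdtemp()) / "hindsight" / "config.json"
written = write_hindsight_config(
    demo_path,
    bank_mission="Act as a careful engineering assistant.",
    bank_retain_mission="Keep decisions and their rationale.",
)
```

Existing keys in the file are preserved and only the named overrides change, so the sketch can be rerun without losing other settings.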

@@ -3,8 +3,6 @@
 Long-term memory with knowledge graph, entity resolution, and multi-strategy
 retrieval. Supports cloud (API key) and local modes.

-Configurable timeout via HINDSIGHT_TIMEOUT env var or config.json.
-
 Original PR #1811 by benfrank241, adapted to MemoryProvider ABC.

 Config via environment variables:
@@ -13,7 +11,6 @@ Config via environment variables:
  HINDSIGHT_BUDGET                 — recall budget: low/mid/high (default: mid)
  HINDSIGHT_API_URL                — API endpoint
  HINDSIGHT_MODE                   — cloud or local (default: cloud)
-  HINDSIGHT_TIMEOUT                — API request timeout in seconds (default: 120)
  HINDSIGHT_RETAIN_TAGS            — comma-separated tags attached to retained memories
  HINDSIGHT_RETAIN_SOURCE          — metadata source value attached to retained memories
  HINDSIGHT_RETAIN_USER_PREFIX     — label used before user turns in retained transcripts
@@ -26,7 +23,6 @@ Or via $HERMES_HOME/hindsight/config.json (profile-scoped), falling back to
 from __future__ import annotations

 import asyncio
-import importlib
 import json
 import logging
 import os
@@ -44,7 +40,6 @@ logger = logging.getLogger(__name__)
 _DEFAULT_API_URL = "https://api.hindsight.vectorize.io"
 _DEFAULT_LOCAL_URL = "http://localhost:8888"
 _MIN_CLIENT_VERSION = "0.4.22"
-_DEFAULT_TIMEOUT = 120  # seconds — cloud API can take 30-40s per request
 _VALID_BUDGETS = {"low", "mid", "high"}
 _PROVIDER_DEFAULT_MODELS = {
    "openai": "gpt-4o-mini",
@@ -59,22 +54,6 @@ _PROVIDER_DEFAULT_MODELS = {
 }


-def _check_local_runtime() -> tuple[bool, str | None]:
-    """Return whether local embedded Hindsight imports cleanly.
-
-    On older CPUs, importing the local Hindsight stack can raise a runtime
-    error from NumPy before the daemon starts. Treat that as "unavailable"
-    so Hermes can degrade gracefully instead of repeatedly trying to start
-    a broken local memory backend.
-    """
-    try:
-        importlib.import_module("hindsight")
-        importlib.import_module("hindsight_embed.daemon_embed_manager")
-        return True, None
-    except Exception as exc:
-        return False, str(exc)
-
-
 # ---------------------------------------------------------------------------
 # Dedicated event loop for Hindsight async calls (one per process, reused).
 # Avoids creating ephemeral loops that leak aiohttp sessions.
@@ -102,18 +81,13 @@ def _get_loop() -> asyncio.AbstractEventLoop:
        return _loop


-def _run_sync(coro, timeout: float = _DEFAULT_TIMEOUT):
+def _run_sync(coro, timeout: float = 120.0):
    """Schedule *coro* on the shared loop and block until done."""
    loop = _get_loop()
    future = asyncio.run_coroutine_threadsafe(coro, loop)
    return future.result(timeout=timeout)


-# ---------------------------------------------------------------------------
-# Backward-compatible alias — instances use self._run_sync() instead.
-# ---------------------------------------------------------------------------
-
-
 # ---------------------------------------------------------------------------
 # Tool schemas
 # ---------------------------------------------------------------------------
@@ -259,126 +233,6 @@ def _utc_timestamp() -> str:
    return datetime.now(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")


-def _embedded_profile_name(config: dict[str, Any]) -> str:
-    """Return the Hindsight embedded profile name for this Hermes config."""
-    profile = config.get("profile", "hermes")
-    return str(profile or "hermes")
-
-
-def _load_simple_env(path) -> dict[str, str]:
-    """Parse a simple KEY=VALUE env file, ignoring comments and blank lines."""
-    if not path.exists():
-        return {}
-
-    values: dict[str, str] = {}
-    for line in path.read_text(encoding="utf-8").splitlines():
-        if not line or line.startswith("#") or "=" not in line:
-            continue
-        key, value = line.split("=", 1)
-        values[key.strip()] = value.strip()
-    return values
-
-
-def _build_embedded_profile_env(config: dict[str, Any], *, llm_api_key: str | None = None) -> dict[str, str]:
-    """Build the profile-scoped env file that standalone hindsight-embed consumes."""
-    current_key = llm_api_key
-    if current_key is None:
-        current_key = (
-            config.get("llmApiKey")
-            or config.get("llm_api_key")
-            or os.environ.get("HINDSIGHT_LLM_API_KEY", "")
-        )
-
-    current_provider = config.get("llm_provider", "")
-    current_model = config.get("llm_model", "")
-    current_base_url = config.get("llm_base_url") or os.environ.get("HINDSIGHT_API_LLM_BASE_URL", "")
-
-    # The embedded daemon expects OpenAI wire format for these providers.
-    daemon_provider = "openai" if current_provider in ("openai_compatible", "openrouter") else current_provider
-
-    env_values = {
-        "HINDSIGHT_API_LLM_PROVIDER": str(daemon_provider),
-        "HINDSIGHT_API_LLM_API_KEY": str(current_key or ""),
-        "HINDSIGHT_API_LLM_MODEL": str(current_model),
-        "HINDSIGHT_API_LOG_LEVEL": "info",
-    }
-    if current_base_url:
-        env_values["HINDSIGHT_API_LLM_BASE_URL"] = str(current_base_url)
-    return env_values
-
-
-def _embedded_profile_env_path(config: dict[str, Any]):
-    from pathlib import Path
-
-    return Path.home() / ".hindsight" / "profiles" / f"{_embedded_profile_name(config)}.env"
-
-
-def _materialize_embedded_profile_env(config: dict[str, Any], *, llm_api_key: str | None = None):
-    """Write the profile-scoped env file that standalone hindsight-embed uses."""
-    profile_env = _embedded_profile_env_path(config)
-    profile_env.parent.mkdir(parents=True, exist_ok=True)
-    env_values = _build_embedded_profile_env(config, llm_api_key=llm_api_key)
-    profile_env.write_text(
-        "".join(f"{key}={value}\n" for key, value in env_values.items()),
-        encoding="utf-8",
-    )
-    return profile_env
-
-def _sanitize_bank_segment(value: str) -> str:
-    """Sanitize a bank_id_template placeholder value.
-
-    Bank IDs should be safe for URL paths and filesystem use. Replaces any
-    character that isn't alphanumeric, dash, or underscore with a dash, and
-    collapses runs of dashes.
-    """
-    if not value:
-        return ""
-    out = []
-    prev_dash = False
-    for ch in str(value):
-        if ch.isalnum() or ch == "-" or ch == "_":
-            out.append(ch)
-            prev_dash = False
-        else:
-            if not prev_dash:
-                out.append("-")
-                prev_dash = True
-    return "".join(out).strip("-_")
-
-
-def _resolve_bank_id_template(template: str, fallback: str, **placeholders: str) -> str:
-    """Resolve a bank_id template string with the given placeholders.
-
-    Supported placeholders (each is sanitized before substitution):
-      {profile}   — active Hermes profile name (from agent_identity)
-      {workspace} — Hermes workspace name (from agent_workspace)
-      {platform}  — "cli", "telegram", "discord", etc.
-      {user}      — platform user id (gateway sessions)
-      {session}   — current session id
-
-    Missing/empty placeholders are rendered as the empty string and then
-    collapsed — e.g. ``hermes-{user}`` with no user becomes ``hermes``.
-
-    If the template is empty, resolution falls back to *fallback*.
-    Returns the sanitized bank id.
-    """
-    if not template:
-        return fallback
-    sanitized = {k: _sanitize_bank_segment(v) for k, v in placeholders.items()}
-    try:
-        rendered = template.format(**sanitized)
-    except (KeyError, IndexError) as exc:
-        logger.warning("Invalid bank_id_template %r: %s — using fallback %r",
-                       template, exc, fallback)
-        return fallback
-    while "--" in rendered:
-        rendered = rendered.replace("--", "-")
-    while "__" in rendered:
-        rendered = rendered.replace("__", "_")
-    rendered = rendered.strip("-_")
-    return rendered or fallback
-
-
 # ---------------------------------------------------------------------------
 # MemoryProvider implementation
 # ---------------------------------------------------------------------------
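Reviewer note: for anyone weighing the removal of `bank_id_template`, here is a minimal standalone sketch of what the deleted helpers did, condensed from the removed lines above. The names `sanitize_segment` and `resolve_bank_id` are shortened stand-ins, not the module's own names, and the sample inputs are illustrative only.

```python
import logging

logger = logging.getLogger(__name__)


def sanitize_segment(value: str) -> str:
    """Keep alphanumerics, '-' and '_'; replace each run of other chars with one dash."""
    out = []
    prev_dash = False
    for ch in str(value or ""):
        if ch.isalnum() or ch in "-_":
            out.append(ch)
            prev_dash = False
        elif not prev_dash:
            out.append("-")
            prev_dash = True
    return "".join(out).strip("-_")


def resolve_bank_id(template: str, fallback: str, **placeholders: str) -> str:
    """Render the template with sanitized placeholders, or return the fallback."""
    if not template:
        return fallback
    safe = {key: sanitize_segment(val) for key, val in placeholders.items()}
    try:
        rendered = template.format(**safe)
    except (KeyError, IndexError) as exc:
        # Unknown placeholder in the template: warn and use the static bank id.
        logger.warning("Invalid template %r: %s; using %r", template, exc, fallback)
        return fallback
    # Empty placeholders can leave doubled separators behind; collapse them.
    while "--" in rendered:
        rendered = rendered.replace("--", "-")
    while "__" in rendered:
        rendered = rendered.replace("__", "_")
    return rendered.strip("-_") or fallback


if __name__ == "__main__":
    print(resolve_bank_id("hermes-{profile}", "hermes", profile="my profile!"))
    print(resolve_bank_id("hermes-{user}", "hermes", user=""))
    print(resolve_bank_id("hermes-{missing}", "hermes"))
```

With this change every session resolves to the static `bank_id` (default `hermes`), so configurations that relied on per-profile or per-user banks would share a single bank after upgrading.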
@@ -408,17 +262,13 @@ class HindsightMemoryProvider(MemoryProvider):
        self._chat_type = ""
        self._thread_id = ""
        self._agent_identity = ""
-        self._agent_workspace = ""
        self._turn_index = 0
        self._client = None
-        self._timeout = _DEFAULT_TIMEOUT
        self._prefetch_result = ""
        self._prefetch_lock = threading.Lock()
        self._prefetch_thread = None
        self._sync_thread = None
        self._session_id = ""
-        self._parent_session_id = ""
-        self._document_id = ""

        # Tags
        self._tags: list[str] | None = None
@@ -443,7 +293,6 @@ class HindsightMemoryProvider(MemoryProvider):
        # Bank
        self._bank_mission = ""
        self._bank_retain_mission: str | None = None
-        self._bank_id_template = ""

    @property
    def name(self) -> str:
@@ -453,16 +302,9 @@ class HindsightMemoryProvider(MemoryProvider):
        try:
            cfg = _load_config()
            mode = cfg.get("mode", "cloud")
-            if mode in ("local", "local_embedded"):
-                available, _ = _check_local_runtime()
-                return available
-            if mode == "local_external":
+            if mode in ("local", "local_embedded", "local_external"):
                return True
-            has_key = bool(
-                cfg.get("apiKey")
-                or cfg.get("api_key")
-                or os.environ.get("HINDSIGHT_API_KEY", "")
-            )
+            has_key = bool(cfg.get("apiKey") or os.environ.get("HINDSIGHT_API_KEY", ""))
            has_url = bool(cfg.get("api_url") or os.environ.get("HINDSIGHT_API_URL", ""))
            return has_key or has_url
        except Exception:
@@ -521,7 +363,7 @@ class HindsightMemoryProvider(MemoryProvider):
        else:
            deps_to_install = [cloud_dep]

-        print("\n  Checking dependencies...")
+        print(f"\n  Checking dependencies...")
        uv_path = shutil.which("uv")
        if not uv_path:
            print("  ⚠ uv not found — install it: curl -LsSf https://astral.sh/uv/install.sh | sh")
@@ -532,14 +374,14 @@ class HindsightMemoryProvider(MemoryProvider):
                    [uv_path, "pip", "install", "--python", sys.executable, "--quiet", "--upgrade"] + deps_to_install,
                    check=True, timeout=120, capture_output=True,
                )
-                print("  ✓ Dependencies up to date")
+                print(f"  ✓ Dependencies up to date")
            except Exception as e:
                print(f"  ⚠ Install failed: {e}")
                print(f"  Run manually: uv pip install --python {sys.executable} {' '.join(deps_to_install)}")

        # Step 3: Mode-specific config
        if mode == "cloud":
-            print("\n  Get your API key at https://ui.hindsight.vectorize.io\n")
+            print(f"\n  Get your API key at https://ui.hindsight.vectorize.io\n")
            existing_key = os.environ.get("HINDSIGHT_API_KEY", "")
            if existing_key:
                masked = f"...{existing_key[-4:]}" if len(existing_key) > 4 else "set"
@@ -592,19 +434,13 @@ class HindsightMemoryProvider(MemoryProvider):
            sys.stdout.write("  LLM API key: ")
            sys.stdout.flush()
            llm_key = getpass.getpass(prompt="") if sys.stdin.isatty() else sys.stdin.readline().strip()
-            # Always write explicitly (including empty) so the provider sees ""
-            # rather than a missing variable.  The daemon reads from .env at
-            # startup and fails when HINDSIGHT_LLM_API_KEY is unset.
-            env_writes["HINDSIGHT_LLM_API_KEY"] = llm_key
+            if llm_key:
+                env_writes["HINDSIGHT_LLM_API_KEY"] = llm_key

        # Step 4: Save everything
        provider_config["bank_id"] = "hermes"
        provider_config["recall_budget"] = "mid"
-        # Read existing timeout from config if present, otherwise use default
-        existing_timeout = self._config.get("timeout") if self._config else None
-        timeout_val = existing_timeout if existing_timeout else _DEFAULT_TIMEOUT
-        provider_config["timeout"] = timeout_val
-        env_writes["HINDSIGHT_TIMEOUT"] = str(timeout_val)
+        bank_id = "hermes"
        config["memory"]["provider"] = "hindsight"
        save_config(config)

@@ -630,32 +466,10 @@ class HindsightMemoryProvider(MemoryProvider):
                    new_lines.append(f"{k}={v}")
            env_path.write_text("\n".join(new_lines) + "\n")

-        if mode == "local_embedded":
-            materialized_config = dict(provider_config)
-            config_path = Path(hermes_home) / "hindsight" / "config.json"
-            try:
-                materialized_config = json.loads(config_path.read_text(encoding="utf-8"))
-            except Exception:
-                pass
-
-            llm_api_key = env_writes.get("HINDSIGHT_LLM_API_KEY", "")
-            if not llm_api_key:
-                llm_api_key = _load_simple_env(Path(hermes_home) / ".env").get("HINDSIGHT_LLM_API_KEY", "")
-            if not llm_api_key:
-                llm_api_key = _load_simple_env(_embedded_profile_env_path(materialized_config)).get(
-                    "HINDSIGHT_API_LLM_API_KEY",
-                    "",
-                )
-
-            _materialize_embedded_profile_env(
-                materialized_config,
-                llm_api_key=llm_api_key or None,
-            )
-
        print(f"\n  ✓ Hindsight memory configured ({mode} mode)")
        if env_writes:
-            print("  API keys saved to .env")
-        print("\n  Start a new session to activate.\n")
+            print(f"  API keys saved to .env")
+        print(f"\n  Start a new session to activate.\n")

    def get_config_schema(self):
        return [
@@ -671,8 +485,7 @@ class HindsightMemoryProvider(MemoryProvider):
            {"key": "llm_base_url", "description": "Endpoint URL (e.g. http://192.168.1.10:8080/v1)", "default": "", "when": {"mode": "local_embedded", "llm_provider": "openai_compatible"}},
            {"key": "llm_api_key", "description": "LLM API key (optional for openai_compatible)", "secret": True, "env_var": "HINDSIGHT_LLM_API_KEY", "when": {"mode": "local_embedded"}},
            {"key": "llm_model", "description": "LLM model", "default": "gpt-4o-mini", "default_from": {"field": "llm_provider", "map": _PROVIDER_DEFAULT_MODELS}, "when": {"mode": "local_embedded"}},
-            {"key": "bank_id", "description": "Memory bank name (static fallback when bank_id_template is unset)", "default": "hermes"},
-            {"key": "bank_id_template", "description": "Optional template to derive bank_id dynamically. Placeholders: {profile}, {workspace}, {platform}, {user}, {session}. Example: hermes-{profile}", "default": ""},
+            {"key": "bank_id", "description": "Memory bank name", "default": "hermes"},
            {"key": "bank_mission", "description": "Mission/purpose description for the memory bank"},
            {"key": "bank_retain_mission", "description": "Custom extraction prompt for memory retention"},
            {"key": "recall_budget", "description": "Recall thoroughness", "default": "mid", "choices": ["low", "mid", "high"]},
@@ -692,19 +505,12 @@ class HindsightMemoryProvider(MemoryProvider):
            {"key": "recall_max_tokens", "description": "Maximum tokens for recall results", "default": 4096},
            {"key": "recall_max_input_chars", "description": "Maximum input query length for auto-recall", "default": 800},
            {"key": "recall_prompt_preamble", "description": "Custom preamble for recalled memories in context"},
-            {"key": "timeout", "description": "API request timeout in seconds", "default": _DEFAULT_TIMEOUT},
        ]

    def _get_client(self):
        """Return the cached Hindsight client (created once, reused)."""
        if self._client is None:
            if self._mode == "local_embedded":
-                available, reason = _check_local_runtime()
-                if not available:
-                    raise RuntimeError(
-                        "Hindsight local runtime is unavailable"
-                        + (f": {reason}" if reason else "")
-                    )
                from hindsight import HindsightEmbedded
                HindsightEmbedded.__del__ = lambda self: None
                llm_provider = self._config.get("llm_provider", "")
@@ -723,30 +529,16 @@ class HindsightMemoryProvider(MemoryProvider):
                self._client = HindsightEmbedded(**kwargs)
            else:
                from hindsight_client import Hindsight
-                timeout = self._timeout or _DEFAULT_TIMEOUT
-                kwargs = {"base_url": self._api_url, "timeout": float(timeout)}
+                kwargs = {"base_url": self._api_url, "timeout": 30.0}
                if self._api_key:
                    kwargs["api_key"] = self._api_key
-                logger.debug("Creating Hindsight cloud client (url=%s, has_key=%s, timeout=%s)",
-                             self._api_url, bool(self._api_key), kwargs["timeout"])
+                logger.debug("Creating Hindsight cloud client (url=%s, has_key=%s)",
+                             self._api_url, bool(self._api_key))
                self._client = Hindsight(**kwargs)
        return self._client

-    def _run_sync(self, coro):
-        """Schedule *coro* on the shared loop using the configured timeout."""
-        return _run_sync(coro, timeout=self._timeout)
-
    def initialize(self, session_id: str, **kwargs) -> None:
        self._session_id = str(session_id or "").strip()
-        self._parent_session_id = str(kwargs.get("parent_session_id", "") or "").strip()
-
-        # Each process lifecycle gets its own document_id. Reusing session_id
-        # alone caused overwrites on /resume — the reloaded session starts
-        # with an empty _session_turns, so the next retain would replace the
-        # previously stored content. session_id stays in tags so processes
-        # for the same session remain filterable together.
-        start_ts = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
-        self._document_id = f"{self._session_id}-{start_ts}"

        # Check client version and auto-upgrade if needed
        try:
@@ -756,9 +548,7 @@ class HindsightMemoryProvider(MemoryProvider):
            if Version(installed) < Version(_MIN_CLIENT_VERSION):
                logger.warning("hindsight-client %s is outdated (need >=%s), attempting upgrade...",
                               installed, _MIN_CLIENT_VERSION)
-                import shutil
-                import subprocess
-                import sys
+                import shutil, subprocess, sys
                uv_path = shutil.which("uv")
                if uv_path:
                    try:
@@ -785,41 +575,19 @@ class HindsightMemoryProvider(MemoryProvider):
        self._chat_type = str(kwargs.get("chat_type") or "").strip()
        self._thread_id = str(kwargs.get("thread_id") or "").strip()
        self._agent_identity = str(kwargs.get("agent_identity") or "").strip()
-        self._agent_workspace = str(kwargs.get("agent_workspace") or "").strip()
        self._turn_index = 0
        self._session_turns = []
        self._mode = self._config.get("mode", "cloud")
-        # Read timeout from config or env var, fall back to default
-        self._timeout = self._config.get("timeout") or int(os.environ.get("HINDSIGHT_TIMEOUT", str(_DEFAULT_TIMEOUT)))
        # "local" is a legacy alias for "local_embedded"
        if self._mode == "local":
            self._mode = "local_embedded"
-        if self._mode == "local_embedded":
-            available, reason = _check_local_runtime()
-            if not available:
-                logger.warning(
-                    "Hindsight local mode disabled because its runtime could not be imported: %s",
-                    reason,
-                )
-                self._mode = "disabled"
-                return
        self._api_key = self._config.get("apiKey") or self._config.get("api_key") or os.environ.get("HINDSIGHT_API_KEY", "")
        default_url = _DEFAULT_LOCAL_URL if self._mode in ("local_embedded", "local_external") else _DEFAULT_API_URL
        self._api_url = self._config.get("api_url") or os.environ.get("HINDSIGHT_API_URL", default_url)
        self._llm_base_url = self._config.get("llm_base_url", "")

        banks = self._config.get("banks", {}).get("hermes", {})
-        static_bank_id = self._config.get("bank_id") or banks.get("bankId", "hermes")
-        self._bank_id_template = self._config.get("bank_id_template", "") or ""
-        self._bank_id = _resolve_bank_id_template(
-            self._bank_id_template,
-            fallback=static_bank_id,
-            profile=self._agent_identity,
-            workspace=self._agent_workspace,
-            platform=self._platform,
-            user=self._user_id,
-            session=self._session_id,
-        )
+        self._bank_id = self._config.get("bank_id") or banks.get("bankId", "hermes")
        budget = self._config.get("recall_budget") or self._config.get("budget") or banks.get("budget", "mid")
        self._budget = budget if budget in _VALID_BUDGETS else "mid"

@@ -872,10 +640,6 @@ class HindsightMemoryProvider(MemoryProvider):
            pass
        logger.info("Hindsight initialized: mode=%s, api_url=%s, bank=%s, budget=%s, memory_mode=%s, prefetch_method=%s, client=%s",
                     self._mode, self._api_url, self._bank_id, self._budget, self._memory_mode, self._prefetch_method, _client_version)
-        if self._bank_id_template:
-            logger.debug("Hindsight bank resolved from template %r: profile=%s workspace=%s platform=%s user=%s -> bank=%s",
-                         self._bank_id_template, self._agent_identity, self._agent_workspace,
-                         self._platform, self._user_id, self._bank_id)
        logger.debug("Hindsight config: auto_retain=%s, auto_recall=%s, retain_every_n=%d, "
                     "retain_async=%s, retain_context=%s, recall_max_tokens=%d, recall_max_input_chars=%d, tags=%s, recall_tags=%s",
                     self._auto_retain, self._auto_recall, self._retain_every_n_turns,
@@ -905,13 +669,42 @@ class HindsightMemoryProvider(MemoryProvider):
                    # Update the profile .env to match our current config so
                    # the daemon always starts with the right settings.
                    # If the config changed and the daemon is running, stop it.
-                    profile_env = _embedded_profile_env_path(self._config)
-                    expected_env = _build_embedded_profile_env(self._config)
-                    saved = _load_simple_env(profile_env)
-                    config_changed = saved != expected_env
+                    from pathlib import Path as _Path
+                    profile_env = _Path.home() / ".hindsight" / "profiles" / f"{profile}.env"
+                    current_key = self._config.get("llm_api_key") or os.environ.get("HINDSIGHT_LLM_API_KEY", "")
+                    current_provider = self._config.get("llm_provider", "")
+                    current_model = self._config.get("llm_model", "")
+                    current_base_url = self._config.get("llm_base_url") or os.environ.get("HINDSIGHT_API_LLM_BASE_URL", "")
+                    # Map openai_compatible/openrouter → openai for the daemon (OpenAI wire format)
+                    daemon_provider = "openai" if current_provider in ("openai_compatible", "openrouter") else current_provider
+
+                    # Read saved profile config
+                    saved = {}
+                    if profile_env.exists():
+                        for line in profile_env.read_text().splitlines():
+                            if "=" in line and not line.startswith("#"):
+                                k, v = line.split("=", 1)
+                                saved[k.strip()] = v.strip()
+
+                    config_changed = (
+                        saved.get("HINDSIGHT_API_LLM_PROVIDER") != daemon_provider or
+                        saved.get("HINDSIGHT_API_LLM_MODEL") != current_model or
+                        saved.get("HINDSIGHT_API_LLM_API_KEY") != current_key or
+                        saved.get("HINDSIGHT_API_LLM_BASE_URL", "") != current_base_url
+                    )

                    if config_changed:
-                        profile_env = _materialize_embedded_profile_env(self._config)
+                        # Write updated profile .env
+                        profile_env.parent.mkdir(parents=True, exist_ok=True)
+                        env_lines = (
+                            f"HINDSIGHT_API_LLM_PROVIDER={daemon_provider}\n"
+                            f"HINDSIGHT_API_LLM_API_KEY={current_key}\n"
+                            f"HINDSIGHT_API_LLM_MODEL={current_model}\n"
+                            f"HINDSIGHT_API_LOG_LEVEL=info\n"
+                        )
+                        if current_base_url:
+                            env_lines += f"HINDSIGHT_API_LLM_BASE_URL={current_base_url}\n"
+                        profile_env.write_text(env_lines)
                        if client._manager.is_running(profile):
                            with open(log_path, "a") as f:
                                f.write("\n=== Config changed, restarting daemon ===\n")
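Reviewer note: the hunk above inlines the profile `.env` comparison that the removed `_load_simple_env` / `_build_embedded_profile_env` helpers used to cover. As a rough sketch of that read-compare-rewrite step, assuming a plain `KEY=VALUE` file format, the logic can be expressed as below. `load_simple_env` and `sync_profile_env` are hypothetical names written for illustration; the real `_load_simple_env` body is not shown in this diff.

```python
from pathlib import Path


def load_simple_env(path: Path) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and '#' comments."""
    values: dict[str, str] = {}
    if not path.exists():
        return values
    for line in path.read_text(encoding="utf-8").splitlines():
        if "=" in line and not line.lstrip().startswith("#"):
            key, value = line.split("=", 1)
            values[key.strip()] = value.strip()
    return values


def sync_profile_env(path: Path, expected: dict[str, str]) -> bool:
    """Rewrite the file only when it differs from *expected*.

    Returns True when the file was rewritten, which is the signal the
    caller would use to restart a running daemon.
    """
    if load_simple_env(path) == expected:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(
        "".join(f"{key}={value}\n" for key, value in expected.items()),
        encoding="utf-8",
    )
    return True
```

Comparing whole dictionaries, as the removed code did, also catches stale keys such as a leftover `HINDSIGHT_API_LLM_BASE_URL`. The field-by-field comparison added in this hunk checks only the four listed keys.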
@@ -984,7 +777,7 @@ class HindsightMemoryProvider(MemoryProvider):
                client = self._get_client()
                if self._prefetch_method == "reflect":
                    logger.debug("Prefetch: calling reflect (bank=%s, query_len=%d)", self._bank_id, len(query))
-                    resp = self._run_sync(client.areflect(bank_id=self._bank_id, query=query, budget=self._budget))
+                    resp = _run_sync(client.areflect(bank_id=self._bank_id, query=query, budget=self._budget))
                    text = resp.text or ""
                else:
                    recall_kwargs: dict = {
@@ -998,7 +791,7 @@ class HindsightMemoryProvider(MemoryProvider):
                        recall_kwargs["types"] = self._recall_types
                    logger.debug("Prefetch: calling recall (bank=%s, query_len=%d, budget=%s)",
                                 self._bank_id, len(query), self._budget)
-                    resp = self._run_sync(client.arecall(**recall_kwargs))
+                    resp = _run_sync(client.arecall(**recall_kwargs))
                    num_results = len(resp.results) if resp.results else 0
                    logger.debug("Prefetch: recall returned %d results", num_results)
                    text = "\n".join(f"- {r.text}" for r in resp.results if r.text) if resp.results else ""
@@ -1095,7 +888,7 @@ class HindsightMemoryProvider(MemoryProvider):
        if session_id:
            self._session_id = str(session_id).strip()

-        turn = json.dumps(self._build_turn_messages(user_content, assistant_content), ensure_ascii=False)
+        turn = json.dumps(self._build_turn_messages(user_content, assistant_content))
        self._session_turns.append(turn)
        self._turn_counter += 1
        self._turn_index = self._turn_counter
@@ -1109,12 +902,6 @@ class HindsightMemoryProvider(MemoryProvider):
                     len(self._session_turns), sum(len(t) for t in self._session_turns))
        content = "[" + ",".join(self._session_turns) + "]"

-        lineage_tags: list[str] = []
-        if self._session_id:
-            lineage_tags.append(f"session:{self._session_id}")
-        if self._parent_session_id:
-            lineage_tags.append(f"parent:{self._parent_session_id}")
-
        def _sync():
            try:
                client = self._get_client()
@@ -1125,16 +912,15 @@ class HindsightMemoryProvider(MemoryProvider):
                        message_count=len(self._session_turns) * 2,
                        turn_index=self._turn_index,
                    ),
-                    tags=lineage_tags or None,
                )
                item.pop("bank_id", None)
                item.pop("retain_async", None)
                logger.debug("Hindsight retain: bank=%s, doc=%s, async=%s, content_len=%d, num_turns=%d",
-                             self._bank_id, self._document_id, self._retain_async, len(content), len(self._session_turns))
-                self._run_sync(client.aretain_batch(
+                             self._bank_id, self._session_id, self._retain_async, len(content), len(self._session_turns))
+                _run_sync(client.aretain_batch(
                    bank_id=self._bank_id,
                    items=[item],
-                    document_id=self._document_id,
+                    document_id=self._session_id,
                    retain_async=self._retain_async,
                ))
                logger.debug("Hindsight retain succeeded")
@@ -1171,7 +957,7 @@ class HindsightMemoryProvider(MemoryProvider):
                )
                logger.debug("Tool hindsight_retain: bank=%s, content_len=%d, context=%s",
                             self._bank_id, len(content), context)
-                self._run_sync(client.aretain(**retain_kwargs))
+                _run_sync(client.aretain(**retain_kwargs))
                logger.debug("Tool hindsight_retain: success")
                return json.dumps({"result": "Memory stored successfully."})
            except Exception as e:
@@ -1194,7 +980,7 @@ class HindsightMemoryProvider(MemoryProvider):
                    recall_kwargs["types"] = self._recall_types
                logger.debug("Tool hindsight_recall: bank=%s, query_len=%d, budget=%s",
                             self._bank_id, len(query), self._budget)
-                resp = self._run_sync(client.arecall(**recall_kwargs))
+                resp = _run_sync(client.arecall(**recall_kwargs))
                num_results = len(resp.results) if resp.results else 0
                logger.debug("Tool hindsight_recall: %d results", num_results)
                if not resp.results:
@@ -1212,7 +998,7 @@ class HindsightMemoryProvider(MemoryProvider):
            try:
                logger.debug("Tool hindsight_reflect: bank=%s, query_len=%d, budget=%s",
                             self._bank_id, len(query), self._budget)
-                resp = self._run_sync(client.areflect(
+                resp = _run_sync(client.areflect(
                    bank_id=self._bank_id, query=query, budget=self._budget
                ))
                logger.debug("Tool hindsight_reflect: response_len=%d", len(resp.text or ""))
@@ -1225,6 +1011,7 @@ class HindsightMemoryProvider(MemoryProvider):

    def shutdown(self) -> None:
        logger.debug("Hindsight shutdown: waiting for background threads")
+        global _loop, _loop_thread
        for t in (self._prefetch_thread, self._sync_thread):
            if t and t.is_alive():
                t.join(timeout=5.0)
@@ -1239,21 +1026,17 @@ class HindsightMemoryProvider(MemoryProvider):
                    except RuntimeError:
                        pass
                else:
-                    self._run_sync(self._client.aclose())
+                    _run_sync(self._client.aclose())
            except Exception:
                pass
            self._client = None
-        # The module-global background event loop (_loop / _loop_thread)
-        # is intentionally NOT stopped here. It is shared across every
-        # HindsightMemoryProvider instance in the process — the plugin
-        # loader creates a new provider per AIAgent, and the gateway
-        # creates one AIAgent per concurrent chat session. Stopping the
-        # loop from one provider's shutdown() strands the aiohttp
-        # ClientSession + TCPConnector owned by every sibling provider
-        # on a dead loop, which surfaces as the "Unclosed client session"
-        # / "Unclosed connector" warnings reported in #11923. The loop
-        # runs on a daemon thread and is reclaimed on process exit;
-        # per-session cleanup happens via self._client.aclose() above.
+        # Stop the background event loop so no tasks are pending at exit
+        if _loop is not None and _loop.is_running():
+            _loop.call_soon_threadsafe(_loop.stop)
+            if _loop_thread is not None:
+                _loop_thread.join(timeout=5.0)
+            _loop = None
+            _loop_thread = None


 def register(ctx) -> None:
@@ -1,66 +0,0 @@
-"""Spotify integration plugin — bundled, auto-loaded.
-
-Registers 7 tools (playback, devices, queue, search, playlists, albums,
-library) into the ``spotify`` toolset. Each tool's handler is gated by
-``_check_spotify_available()`` — when the user has not run ``hermes auth
-spotify``, the tools remain registered (so they appear in ``hermes
-tools``) but the runtime check prevents dispatch.
-
-Why a plugin instead of a top-level ``tools/`` file?
-
- ``plugins/`` is where third-party service integrations live (see
-  ``plugins/image_gen/`` for the backend-provider pattern, ``plugins/
-  disk-cleanup/`` for the standalone pattern). ``tools/`` is reserved
-  for foundational capabilities (terminal, read_file, web_search, etc.).
- Mirroring the image_gen plugin layout (``plugins/<category>/<backend>/``
-  for categories, flat ``plugins/<name>/`` for standalones) makes new
-  service integrations a pattern contributors can copy.
- Bundled + ``kind: backend`` auto-loads on startup just like image_gen
-  backends — no user opt-in needed, no ``plugins.enabled`` config.
-
-The Spotify auth flow (``hermes auth spotify``), CLI plumbing, and docs
-are unchanged. This move is purely structural.
-"""
-
-from __future__ import annotations
-
-from plugins.spotify.tools import (
-    SPOTIFY_ALBUMS_SCHEMA,
-    SPOTIFY_DEVICES_SCHEMA,
-    SPOTIFY_LIBRARY_SCHEMA,
-    SPOTIFY_PLAYBACK_SCHEMA,
-    SPOTIFY_PLAYLISTS_SCHEMA,
-    SPOTIFY_QUEUE_SCHEMA,
-    SPOTIFY_SEARCH_SCHEMA,
-    _check_spotify_available,
-    _handle_spotify_albums,
-    _handle_spotify_devices,
-    _handle_spotify_library,
-    _handle_spotify_playback,
-    _handle_spotify_playlists,
-    _handle_spotify_queue,
-    _handle_spotify_search,
-)
-
-_TOOLS = (
-    ("spotify_playback",  SPOTIFY_PLAYBACK_SCHEMA,  _handle_spotify_playback,  "🎵"),
-    ("spotify_devices",   SPOTIFY_DEVICES_SCHEMA,   _handle_spotify_devices,   "🔈"),
-    ("spotify_queue",     SPOTIFY_QUEUE_SCHEMA,     _handle_spotify_queue,     "📻"),
-    ("spotify_search",    SPOTIFY_SEARCH_SCHEMA,    _handle_spotify_search,    "🔎"),
-    ("spotify_playlists", SPOTIFY_PLAYLISTS_SCHEMA, _handle_spotify_playlists, "📚"),
-    ("spotify_albums",    SPOTIFY_ALBUMS_SCHEMA,    _handle_spotify_albums,    "💿"),
-    ("spotify_library",   SPOTIFY_LIBRARY_SCHEMA,   _handle_spotify_library,   "❤️"),
-)
-
-
-def register(ctx) -> None:
-    """Register all Spotify tools. Called once by the plugin loader."""
-    for name, schema, handler, emoji in _TOOLS:
-        ctx.register_tool(
-            name=name,
-            toolset="spotify",
-            schema=schema,
-            handler=handler,
-            check_fn=_check_spotify_available,
-            emoji=emoji,
-        )
@@ -1,435 +0,0 @@
-"""Thin Spotify Web API helper used by Hermes native tools."""
-
-from __future__ import annotations
-
-import json
-from typing import Any, Dict, Iterable, Optional
-from urllib.parse import urlparse
-
-import httpx
-
-from hermes_cli.auth import (
-    AuthError,
-    resolve_spotify_runtime_credentials,
-)
-
-
-class SpotifyError(RuntimeError):
-    """Base Spotify tool error."""
-
-
-class SpotifyAuthRequiredError(SpotifyError):
-    """Raised when the user needs to authenticate with Spotify first."""
-
-
-class SpotifyAPIError(SpotifyError):
-    """Structured Spotify API failure."""
-
-    def __init__(
-        self,
-        message: str,
-        *,
-        status_code: Optional[int] = None,
-        response_body: Optional[str] = None,
-    ) -> None:
-        super().__init__(message)
-        self.status_code = status_code
-        self.response_body = response_body
-        self.path = None
-
-
-class SpotifyClient:
-    def __init__(self) -> None:
-        self._runtime = self._resolve_runtime(refresh_if_expiring=True)
-
-    def _resolve_runtime(self, *, force_refresh: bool = False, refresh_if_expiring: bool = True) -> Dict[str, Any]:
-        try:
-            return resolve_spotify_runtime_credentials(
-                force_refresh=force_refresh,
-                refresh_if_expiring=refresh_if_expiring,
-            )
-        except AuthError as exc:
-            raise SpotifyAuthRequiredError(str(exc)) from exc
-
-    @property
-    def base_url(self) -> str:
-        return str(self._runtime.get("base_url") or "").rstrip("/")
-
-    def _headers(self) -> Dict[str, str]:
-        return {
-            "Authorization": f"Bearer {self._runtime['access_token']}",
-            "Content-Type": "application/json",
-        }
-
-    def request(
-        self,
-        method: str,
-        path: str,
-        *,
-        params: Optional[Dict[str, Any]] = None,
-        json_body: Optional[Dict[str, Any]] = None,
-        allow_retry_on_401: bool = True,
-        empty_response: Optional[Dict[str, Any]] = None,
-    ) -> Any:
-        url = f"{self.base_url}{path}"
-        response = httpx.request(
-            method,
-            url,
-            headers=self._headers(),
-            params=_strip_none(params),
-            json=_strip_none(json_body) if json_body is not None else None,
-            timeout=30.0,
-        )
-        if response.status_code == 401 and allow_retry_on_401:
-            self._runtime = self._resolve_runtime(force_refresh=True, refresh_if_expiring=True)
-            return self.request(
-                method,
-                path,
-                params=params,
-                json_body=json_body,
-                allow_retry_on_401=False,
-            )
-        if response.status_code >= 400:
-            self._raise_api_error(response, method=method, path=path)
-        if response.status_code == 204 or not response.content:
-            return empty_response or {"success": True, "status_code": response.status_code, "empty": True}
-        if "application/json" in response.headers.get("content-type", ""):
-            return response.json()
-        return {"success": True, "text": response.text}
-
-    def _raise_api_error(self, response: httpx.Response, *, method: str, path: str) -> None:
-        detail = response.text.strip()
-        message = _friendly_spotify_error_message(
-            status_code=response.status_code,
-            detail=_extract_spotify_error_detail(response, fallback=detail),
-            method=method,
-            path=path,
-            retry_after=response.headers.get("Retry-After"),
-        )
-        error = SpotifyAPIError(message, status_code=response.status_code, response_body=detail)
-        error.path = path
-        raise error
-
-    def get_devices(self) -> Any:
-        return self.request("GET", "/me/player/devices")
-
-    def transfer_playback(self, *, device_id: str, play: bool = False) -> Any:
-        return self.request("PUT", "/me/player", json_body={
-            "device_ids": [device_id],
-            "play": play,
-        })
-
-    def get_playback_state(self, *, market: Optional[str] = None) -> Any:
-        return self.request(
-            "GET",
-            "/me/player",
-            params={"market": market},
-            empty_response={
-                "status_code": 204,
-                "empty": True,
-                "message": "No active Spotify playback session was found. Open Spotify on a device and start playback, or transfer playback to an available device.",
-            },
-        )
-
-    def get_currently_playing(self, *, market: Optional[str] = None) -> Any:
-        return self.request(
-            "GET",
-            "/me/player/currently-playing",
-            params={"market": market},
-            empty_response={
-                "status_code": 204,
-                "empty": True,
-                "message": "Spotify is not currently playing anything. Start playback in Spotify and try again.",
-            },
-        )
-
-    def start_playback(
-        self,
-        *,
-        device_id: Optional[str] = None,
-        context_uri: Optional[str] = None,
-        uris: Optional[list[str]] = None,
-        offset: Optional[Dict[str, Any]] = None,
-        position_ms: Optional[int] = None,
-    ) -> Any:
-        return self.request(
-            "PUT",
-            "/me/player/play",
-            params={"device_id": device_id},
-            json_body={
-                "context_uri": context_uri,
-                "uris": uris,
-                "offset": offset,
-                "position_ms": position_ms,
-            },
-        )
-
-    def pause_playback(self, *, device_id: Optional[str] = None) -> Any:
-        return self.request("PUT", "/me/player/pause", params={"device_id": device_id})
-
-    def skip_next(self, *, device_id: Optional[str] = None) -> Any:
-        return self.request("POST", "/me/player/next", params={"device_id": device_id})
-
-    def skip_previous(self, *, device_id: Optional[str] = None) -> Any:
-        return self.request("POST", "/me/player/previous", params={"device_id": device_id})
-
-    def seek(self, *, position_ms: int, device_id: Optional[str] = None) -> Any:
-        return self.request("PUT", "/me/player/seek", params={
-            "position_ms": position_ms,
-            "device_id": device_id,
-        })
-
-    def set_repeat(self, *, state: str, device_id: Optional[str] = None) -> Any:
-        return self.request("PUT", "/me/player/repeat", params={"state": state, "device_id": device_id})
-
-    def set_shuffle(self, *, state: bool, device_id: Optional[str] = None) -> Any:
-        return self.request("PUT", "/me/player/shuffle", params={"state": str(bool(state)).lower(), "device_id": device_id})
-
-    def set_volume(self, *, volume_percent: int, device_id: Optional[str] = None) -> Any:
-        return self.request("PUT", "/me/player/volume", params={
-            "volume_percent": volume_percent,
-            "device_id": device_id,
-        })
-
-    def get_queue(self) -> Any:
-        return self.request("GET", "/me/player/queue")
-
-    def add_to_queue(self, *, uri: str, device_id: Optional[str] = None) -> Any:
-        return self.request("POST", "/me/player/queue", params={"uri": uri, "device_id": device_id})
-
-    def search(
-        self,
-        *,
-        query: str,
-        search_types: list[str],
-        limit: int = 10,
-        offset: int = 0,
-        market: Optional[str] = None,
-        include_external: Optional[str] = None,
-    ) -> Any:
-        return self.request("GET", "/search", params={
-            "q": query,
-            "type": ",".join(search_types),
-            "limit": limit,
-            "offset": offset,
-            "market": market,
-            "include_external": include_external,
-        })
-
-    def get_my_playlists(self, *, limit: int = 20, offset: int = 0) -> Any:
-        return self.request("GET", "/me/playlists", params={"limit": limit, "offset": offset})
-
-    def get_playlist(self, *, playlist_id: str, market: Optional[str] = None) -> Any:
-        return self.request("GET", f"/playlists/{playlist_id}", params={"market": market})
-
-    def create_playlist(
-        self,
-        *,
-        name: str,
-        public: bool = False,
-        collaborative: bool = False,
-        description: Optional[str] = None,
-    ) -> Any:
-        return self.request("POST", "/me/playlists", json_body={
-            "name": name,
-            "public": public,
-            "collaborative": collaborative,
-            "description": description,
-        })
-
-    def add_playlist_items(
-        self,
-        *,
-        playlist_id: str,
-        uris: list[str],
-        position: Optional[int] = None,
-    ) -> Any:
-        return self.request("POST", f"/playlists/{playlist_id}/items", json_body={
-            "uris": uris,
-            "position": position,
-        })
-
-    def remove_playlist_items(
-        self,
-        *,
-        playlist_id: str,
-        uris: list[str],
-        snapshot_id: Optional[str] = None,
-    ) -> Any:
-        return self.request("DELETE", f"/playlists/{playlist_id}/items", json_body={
-            "items": [{"uri": uri} for uri in uris],
-            "snapshot_id": snapshot_id,
-        })
-
-    def update_playlist_details(
-        self,
-        *,
-        playlist_id: str,
-        name: Optional[str] = None,
-        public: Optional[bool] = None,
-        collaborative: Optional[bool] = None,
-        description: Optional[str] = None,
-    ) -> Any:
-        return self.request("PUT", f"/playlists/{playlist_id}", json_body={
-            "name": name,
-            "public": public,
-            "collaborative": collaborative,
-            "description": description,
-        })
-
-    def get_album(self, *, album_id: str, market: Optional[str] = None) -> Any:
-        return self.request("GET", f"/albums/{album_id}", params={"market": market})
-
-    def get_album_tracks(self, *, album_id: str, limit: int = 20, offset: int = 0, market: Optional[str] = None) -> Any:
-        return self.request("GET", f"/albums/{album_id}/tracks", params={
-            "limit": limit,
-            "offset": offset,
-            "market": market,
-        })
-
-    def get_saved_tracks(self, *, limit: int = 20, offset: int = 0, market: Optional[str] = None) -> Any:
-        return self.request("GET", "/me/tracks", params={"limit": limit, "offset": offset, "market": market})
-
-    def save_library_items(self, *, uris: list[str]) -> Any:
-        return self.request("PUT", "/me/library", params={"uris": ",".join(uris)})
-
-    def library_contains(self, *, uris: list[str]) -> Any:
-        return self.request("GET", "/me/library/contains", params={"uris": ",".join(uris)})
-
-    def get_saved_albums(self, *, limit: int = 20, offset: int = 0, market: Optional[str] = None) -> Any:
-        return self.request("GET", "/me/albums", params={"limit": limit, "offset": offset, "market": market})
-
-    def remove_saved_tracks(self, *, track_ids: list[str]) -> Any:
-        uris = [f"spotify:track:{track_id}" for track_id in track_ids]
-        return self.request("DELETE", "/me/library", params={"uris": ",".join(uris)})
-
-    def remove_saved_albums(self, *, album_ids: list[str]) -> Any:
-        uris = [f"spotify:album:{album_id}" for album_id in album_ids]
-        return self.request("DELETE", "/me/library", params={"uris": ",".join(uris)})
-
-    def get_recently_played(
-        self,
-        *,
-        limit: int = 20,
-        after: Optional[int] = None,
-        before: Optional[int] = None,
-    ) -> Any:
-        return self.request("GET", "/me/player/recently-played", params={
-            "limit": limit,
-            "after": after,
-            "before": before,
-        })
-
-
-def _extract_spotify_error_detail(response: httpx.Response, *, fallback: str) -> str:
-    detail = fallback
-    try:
-        payload = response.json()
-        if isinstance(payload, dict):
-            error_obj = payload.get("error")
-            if isinstance(error_obj, dict):
-                detail = str(error_obj.get("message") or detail)
-            elif isinstance(error_obj, str):
-                detail = error_obj
-    except Exception:
-        pass
-    return detail.strip()
-
-
-def _friendly_spotify_error_message(
-    *,
-    status_code: int,
-    detail: str,
-    method: str,
-    path: str,
-    retry_after: Optional[str],
-) -> str:
-    normalized_detail = detail.lower()
-    is_playback_path = path.startswith("/me/player")
-
-    if status_code == 401:
-        return "Spotify authentication failed or expired. Run `hermes auth spotify` again."
-
-    if status_code == 403:
-        if is_playback_path:
-            return (
-                "Spotify rejected this playback request. Playback control usually requires a Spotify Premium account "
-                "and an active Spotify Connect device."
-            )
-        if "scope" in normalized_detail or "permission" in normalized_detail:
-            return "Spotify rejected the request because the current auth scope is insufficient. Re-run `hermes auth spotify` to refresh permissions."
-        return "Spotify rejected the request. The account may not have permission for this action."
-
-    if status_code == 404:
-        if is_playback_path:
-            return "Spotify could not find an active playback device or player session for this request."
-        return "Spotify resource not found."
-
-    if status_code == 429:
-        message = "Spotify rate limit exceeded."
-        if retry_after:
-            message += f" Retry after {retry_after} seconds."
-        return message
-
-    if detail:
-        return detail
-    return f"Spotify API request failed with status {status_code}."
-
-
-def _strip_none(payload: Optional[Dict[str, Any]]) -> Dict[str, Any]:
-    if not payload:
-        return {}
-    return {key: value for key, value in payload.items() if value is not None}
-
-
-def normalize_spotify_id(value: str, expected_type: Optional[str] = None) -> str:
-    cleaned = (value or "").strip()
-    if not cleaned:
-        raise SpotifyError("Spotify id/uri/url is required.")
-    if cleaned.startswith("spotify:"):
-        parts = cleaned.split(":")
-        if len(parts) >= 3:
-            item_type = parts[1]
-            if expected_type and item_type != expected_type:
-                raise SpotifyError(f"Expected a Spotify {expected_type}, got {item_type}.")
-            return parts[2]
-    if "open.spotify.com" in cleaned:
-        parsed = urlparse(cleaned)
-        path_parts = [part for part in parsed.path.split("/") if part]
-        if len(path_parts) >= 2:
-            item_type, item_id = path_parts[0], path_parts[1]
-            if expected_type and item_type != expected_type:
-                raise SpotifyError(f"Expected a Spotify {expected_type}, got {item_type}.")
-            return item_id
-    return cleaned
-
-
-def normalize_spotify_uri(value: str, expected_type: Optional[str] = None) -> str:
-    cleaned = (value or "").strip()
-    if not cleaned:
-        raise SpotifyError("Spotify URI/url/id is required.")
-    if cleaned.startswith("spotify:"):
-        if expected_type:
-            parts = cleaned.split(":")
-            if len(parts) >= 3 and parts[1] != expected_type:
-                raise SpotifyError(f"Expected a Spotify {expected_type}, got {parts[1]}.")
-        return cleaned
-    item_id = normalize_spotify_id(cleaned, expected_type)
-    if expected_type:
-        return f"spotify:{expected_type}:{item_id}"
-    return cleaned
-
-
-def normalize_spotify_uris(values: Iterable[str], expected_type: Optional[str] = None) -> list[str]:
-    uris: list[str] = []
-    for value in values:
-        uri = normalize_spotify_uri(str(value), expected_type)
-        if uri not in uris:
-            uris.append(uri)
-    if not uris:
-        raise SpotifyError("At least one Spotify item is required.")
-    return uris
-
-
-def compact_json(data: Any) -> str:
-    return json.dumps(data, ensure_ascii=False)
@@ -1,13 +0,0 @@
-name: spotify
-version: 1.0.0
-description: "Native Spotify integration — 7 tools (playback, devices, queue, search, playlists, albums, library) using Spotify Web API + PKCE OAuth. Auth via `hermes auth spotify`. Tools gate on `providers.spotify` in ~/.hermes/auth.json."
-author: NousResearch
-kind: backend
-provides_tools:
-  - spotify_playback
-  - spotify_devices
-  - spotify_queue
-  - spotify_search
-  - spotify_playlists
-  - spotify_albums
-  - spotify_library
@@ -1,454 +0,0 @@
-"""Native Spotify tools for Hermes (registered via plugins/spotify)."""
-
-from __future__ import annotations
-
-from typing import Any, Dict, List
-
-from hermes_cli.auth import get_auth_status
-from plugins.spotify.client import (
-    SpotifyAPIError,
-    SpotifyAuthRequiredError,
-    SpotifyClient,
-    SpotifyError,
-    normalize_spotify_id,
-    normalize_spotify_uri,
-    normalize_spotify_uris,
-)
-from tools.registry import tool_error, tool_result
-
-
-def _check_spotify_available() -> bool:
-    try:
-        return bool(get_auth_status("spotify").get("logged_in"))
-    except Exception:
-        return False
-
-
-def _spotify_client() -> SpotifyClient:
-    return SpotifyClient()
-
-
-def _spotify_tool_error(exc: Exception) -> str:
-    if isinstance(exc, (SpotifyError, SpotifyAuthRequiredError)):
-        return tool_error(str(exc))
-    if isinstance(exc, SpotifyAPIError):
-        return tool_error(str(exc), status_code=exc.status_code)
-    return tool_error(f"Spotify tool failed: {type(exc).__name__}: {exc}")
-
-
-def _coerce_limit(raw: Any, *, default: int = 20, minimum: int = 1, maximum: int = 50) -> int:
-    try:
-        value = int(raw)
-    except Exception:
-        value = default
-    return max(minimum, min(maximum, value))
-
-
-def _coerce_bool(raw: Any, default: bool = False) -> bool:
-    if isinstance(raw, bool):
-        return raw
-    if isinstance(raw, str):
-        cleaned = raw.strip().lower()
-        if cleaned in {"1", "true", "yes", "on"}:
-            return True
-        if cleaned in {"0", "false", "no", "off"}:
-            return False
-    return default
-
-
-def _as_list(raw: Any) -> List[str]:
-    if raw is None:
-        return []
-    if isinstance(raw, list):
-        return [str(item).strip() for item in raw if str(item).strip()]
-    return [str(raw).strip()] if str(raw).strip() else []
-
-
-def _describe_empty_playback(payload: Any, *, action: str) -> dict | None:
-    if not isinstance(payload, dict) or not payload.get("empty"):
-        return None
-    if action == "get_currently_playing":
-        return {
-            "success": True,
-            "action": action,
-            "is_playing": False,
-            "status_code": payload.get("status_code", 204),
-            "message": payload.get("message") or "Spotify is not currently playing anything.",
-        }
-    if action == "get_state":
-        return {
-            "success": True,
-            "action": action,
-            "has_active_device": False,
-            "status_code": payload.get("status_code", 204),
-            "message": payload.get("message") or "No active Spotify playback session was found.",
-        }
-    return None
-
-
-def _handle_spotify_playback(args: dict, **kw) -> str:
-    action = str(args.get("action") or "get_state").strip().lower()
-    client = _spotify_client()
-    try:
-        if action == "get_state":
-            payload = client.get_playback_state(market=args.get("market"))
-            empty_result = _describe_empty_playback(payload, action=action)
-            return tool_result(empty_result or payload)
-        if action == "get_currently_playing":
-            payload = client.get_currently_playing(market=args.get("market"))
-            empty_result = _describe_empty_playback(payload, action=action)
-            return tool_result(empty_result or payload)
-        if action == "play":
-            offset = args.get("offset")
-            if isinstance(offset, dict):
-                payload_offset = {k: v for k, v in offset.items() if v is not None}
-            else:
-                payload_offset = None
-            uris = normalize_spotify_uris(_as_list(args.get("uris")), "track") if args.get("uris") else None
-            context_uri = None
-            if args.get("context_uri"):
-                raw_context = str(args.get("context_uri"))
-                context_type = None
-                if raw_context.startswith("spotify:album:") or "/album/" in raw_context:
-                    context_type = "album"
-                elif raw_context.startswith("spotify:playlist:") or "/playlist/" in raw_context:
-                    context_type = "playlist"
-                elif raw_context.startswith("spotify:artist:") or "/artist/" in raw_context:
-                    context_type = "artist"
-                context_uri = normalize_spotify_uri(raw_context, context_type)
-            result = client.start_playback(
-                device_id=args.get("device_id"),
-                context_uri=context_uri,
-                uris=uris,
-                offset=payload_offset,
-                position_ms=args.get("position_ms"),
-            )
-            return tool_result({"success": True, "action": action, "result": result})
-        if action == "pause":
-            result = client.pause_playback(device_id=args.get("device_id"))
-            return tool_result({"success": True, "action": action, "result": result})
-        if action == "next":
-            result = client.skip_next(device_id=args.get("device_id"))
-            return tool_result({"success": True, "action": action, "result": result})
-        if action == "previous":
-            result = client.skip_previous(device_id=args.get("device_id"))
-            return tool_result({"success": True, "action": action, "result": result})
-        if action == "seek":
-            if args.get("position_ms") is None:
-                return tool_error("position_ms is required for action='seek'")
-            result = client.seek(position_ms=int(args["position_ms"]), device_id=args.get("device_id"))
-            return tool_result({"success": True, "action": action, "result": result})
-        if action == "set_repeat":
-            state = str(args.get("state") or "").strip().lower()
-            if state not in {"track", "context", "off"}:
-                return tool_error("state must be one of: track, context, off")
-            result = client.set_repeat(state=state, device_id=args.get("device_id"))
-            return tool_result({"success": True, "action": action, "result": result})
-        if action == "set_shuffle":
-            result = client.set_shuffle(state=_coerce_bool(args.get("state")), device_id=args.get("device_id"))
-            return tool_result({"success": True, "action": action, "result": result})
-        if action == "set_volume":
-            if args.get("volume_percent") is None:
-                return tool_error("volume_percent is required for action='set_volume'")
-            result = client.set_volume(volume_percent=max(0, min(100, int(args["volume_percent"]))), device_id=args.get("device_id"))
-            return tool_result({"success": True, "action": action, "result": result})
-        if action == "recently_played":
-            after = args.get("after")
-            before = args.get("before")
-            if after and before:
-                return tool_error("Provide only one of 'after' or 'before'")
-            return tool_result(client.get_recently_played(
-                limit=_coerce_limit(args.get("limit"), default=20),
-                after=int(after) if after is not None else None,
-                before=int(before) if before is not None else None,
-            ))
-        return tool_error(f"Unknown spotify_playback action: {action}")
-    except Exception as exc:
-        return _spotify_tool_error(exc)
-
-
-def _handle_spotify_devices(args: dict, **kw) -> str:
-    action = str(args.get("action") or "list").strip().lower()
-    client = _spotify_client()
-    try:
-        if action == "list":
-            return tool_result(client.get_devices())
-        if action == "transfer":
-            device_id = str(args.get("device_id") or "").strip()
-            if not device_id:
-                return tool_error("device_id is required for action='transfer'")
-            result = client.transfer_playback(device_id=device_id, play=_coerce_bool(args.get("play")))
-            return tool_result({"success": True, "action": action, "result": result})
-        return tool_error(f"Unknown spotify_devices action: {action}")
-    except Exception as exc:
-        return _spotify_tool_error(exc)
-
-
-def _handle_spotify_queue(args: dict, **kw) -> str:
-    action = str(args.get("action") or "get").strip().lower()
-    client = _spotify_client()
-    try:
-        if action == "get":
-            return tool_result(client.get_queue())
-        if action == "add":
-            uri = normalize_spotify_uri(str(args.get("uri") or ""), None)
-            result = client.add_to_queue(uri=uri, device_id=args.get("device_id"))
-            return tool_result({"success": True, "action": action, "uri": uri, "result": result})
-        return tool_error(f"Unknown spotify_queue action: {action}")
-    except Exception as exc:
-        return _spotify_tool_error(exc)
-
-
-def _handle_spotify_search(args: dict, **kw) -> str:
-    client = _spotify_client()
-    query = str(args.get("query") or "").strip()
-    if not query:
-        return tool_error("query is required")
-    raw_types = _as_list(args.get("types") or args.get("type") or ["track"])
-    search_types = [value.lower() for value in raw_types if value.lower() in {"album", "artist", "playlist", "track", "show", "episode", "audiobook"}]
-    if not search_types:
-        return tool_error("types must contain one or more of: album, artist, playlist, track, show, episode, audiobook")
-    try:
-        return tool_result(client.search(
-            query=query,
-            search_types=search_types,
-            limit=_coerce_limit(args.get("limit"), default=10),
-            offset=max(0, int(args.get("offset") or 0)),
-            market=args.get("market"),
-            include_external=args.get("include_external"),
-        ))
-    except Exception as exc:
-        return _spotify_tool_error(exc)
-
-
-def _handle_spotify_playlists(args: dict, **kw) -> str:
-    action = str(args.get("action") or "list").strip().lower()
-    client = _spotify_client()
-    try:
-        if action == "list":
-            return tool_result(client.get_my_playlists(
-                limit=_coerce_limit(args.get("limit"), default=20),
-                offset=max(0, int(args.get("offset") or 0)),
-            ))
-        if action == "get":
-            playlist_id = normalize_spotify_id(str(args.get("playlist_id") or ""), "playlist")
-            return tool_result(client.get_playlist(playlist_id=playlist_id, market=args.get("market")))
-        if action == "create":
-            name = str(args.get("name") or "").strip()
-            if not name:
-                return tool_error("name is required for action='create'")
-            return tool_result(client.create_playlist(
-                name=name,
-                public=_coerce_bool(args.get("public")),
-                collaborative=_coerce_bool(args.get("collaborative")),
-                description=args.get("description"),
-            ))
-        if action == "add_items":
-            playlist_id = normalize_spotify_id(str(args.get("playlist_id") or ""), "playlist")
-            uris = normalize_spotify_uris(_as_list(args.get("uris")))
-            return tool_result(client.add_playlist_items(
-                playlist_id=playlist_id,
-                uris=uris,
-                position=args.get("position"),
-            ))
-        if action == "remove_items":
-            playlist_id = normalize_spotify_id(str(args.get("playlist_id") or ""), "playlist")
-            uris = normalize_spotify_uris(_as_list(args.get("uris")))
-            return tool_result(client.remove_playlist_items(
-                playlist_id=playlist_id,
-                uris=uris,
-                snapshot_id=args.get("snapshot_id"),
-            ))
-        if action == "update_details":
-            playlist_id = normalize_spotify_id(str(args.get("playlist_id") or ""), "playlist")
-            return tool_result(client.update_playlist_details(
-                playlist_id=playlist_id,
-                name=args.get("name"),
-                public=args.get("public"),
-                collaborative=args.get("collaborative"),
-                description=args.get("description"),
-            ))
-        return tool_error(f"Unknown spotify_playlists action: {action}")
-    except Exception as exc:
-        return _spotify_tool_error(exc)
-
-
-def _handle_spotify_albums(args: dict, **kw) -> str:
-    action = str(args.get("action") or "get").strip().lower()
-    client = _spotify_client()
-    try:
-        album_id = normalize_spotify_id(str(args.get("album_id") or args.get("id") or ""), "album")
-        if action == "get":
-            return tool_result(client.get_album(album_id=album_id, market=args.get("market")))
-        if action == "tracks":
-            return tool_result(client.get_album_tracks(
-                album_id=album_id,
-                limit=_coerce_limit(args.get("limit"), default=20),
-                offset=max(0, int(args.get("offset") or 0)),
-                market=args.get("market"),
-            ))
-        return tool_error(f"Unknown spotify_albums action: {action}")
-    except Exception as exc:
-        return _spotify_tool_error(exc)
-
-
-def _handle_spotify_library(args: dict, **kw) -> str:
-    """Unified handler for saved tracks + saved albums (formerly two tools)."""
-    kind = str(args.get("kind") or "").strip().lower()
-    if kind not in {"tracks", "albums"}:
-        return tool_error("kind must be one of: tracks, albums")
-    action = str(args.get("action") or "list").strip().lower()
-    item_type = "track" if kind == "tracks" else "album"
-    client = _spotify_client()
-    try:
-        if action == "list":
-            limit = _coerce_limit(args.get("limit"), default=20)
-            offset = max(0, int(args.get("offset") or 0))
-            market = args.get("market")
-            if kind == "tracks":
-                return tool_result(client.get_saved_tracks(limit=limit, offset=offset, market=market))
-            return tool_result(client.get_saved_albums(limit=limit, offset=offset, market=market))
-        if action == "save":
-            uris = normalize_spotify_uris(_as_list(args.get("uris") or args.get("items")), item_type)
-            return tool_result(client.save_library_items(uris=uris))
-        if action == "remove":
-            ids = [normalize_spotify_id(item, item_type) for item in _as_list(args.get("ids") or args.get("items"))]
-            if not ids:
-                return tool_error("ids/items is required for action='remove'")
-            if kind == "tracks":
-                return tool_result(client.remove_saved_tracks(track_ids=ids))
-            return tool_result(client.remove_saved_albums(album_ids=ids))
-        return tool_error(f"Unknown spotify_library action: {action}")
-    except Exception as exc:
-        return _spotify_tool_error(exc)
-
-
-COMMON_STRING = {"type": "string"}
-
-SPOTIFY_PLAYBACK_SCHEMA = {
-    "name": "spotify_playback",
-    "description": "Control Spotify playback, inspect the active playback state, or fetch recently played tracks.",
-    "parameters": {
-        "type": "object",
-        "properties": {
-            "action": {"type": "string", "enum": ["get_state", "get_currently_playing", "play", "pause", "next", "previous", "seek", "set_repeat", "set_shuffle", "set_volume", "recently_played"]},
-            "device_id": COMMON_STRING,
-            "market": COMMON_STRING,
-            "context_uri": COMMON_STRING,
-            "uris": {"type": "array", "items": COMMON_STRING},
-            "offset": {"type": "object"},
-            "position_ms": {"type": "integer"},
-            "state": {"description": "For set_repeat use track/context/off. For set_shuffle use boolean-like true/false.", "oneOf": [{"type": "string"}, {"type": "boolean"}]},
-            "volume_percent": {"type": "integer"},
-            "limit": {"type": "integer", "description": "For recently_played: number of tracks (max 50)"},
-            "after": {"type": "integer", "description": "For recently_played: Unix ms cursor (after this timestamp)"},
-            "before": {"type": "integer", "description": "For recently_played: Unix ms cursor (before this timestamp)"},
-        },
-        "required": ["action"],
-    },
-}
-
-SPOTIFY_DEVICES_SCHEMA = {
-    "name": "spotify_devices",
-    "description": "List Spotify Connect devices or transfer playback to a different device.",
-    "parameters": {
-        "type": "object",
-        "properties": {
-            "action": {"type": "string", "enum": ["list", "transfer"]},
-            "device_id": COMMON_STRING,
-            "play": {"type": "boolean"},
-        },
-        "required": ["action"],
-    },
-}
-
-SPOTIFY_QUEUE_SCHEMA = {
-    "name": "spotify_queue",
-    "description": "Inspect the user's Spotify queue or add an item to it.",
-    "parameters": {
-        "type": "object",
-        "properties": {
-            "action": {"type": "string", "enum": ["get", "add"]},
-            "uri": COMMON_STRING,
-            "device_id": COMMON_STRING,
-        },
-        "required": ["action"],
-    },
-}
-
-SPOTIFY_SEARCH_SCHEMA = {
-    "name": "spotify_search",
-    "description": "Search the Spotify catalog for tracks, albums, artists, playlists, shows, or episodes.",
-    "parameters": {
-        "type": "object",
-        "properties": {
-            "query": COMMON_STRING,
-            "types": {"type": "array", "items": COMMON_STRING},
-            "type": COMMON_STRING,
-            "limit": {"type": "integer"},
-            "offset": {"type": "integer"},
-            "market": COMMON_STRING,
-            "include_external": COMMON_STRING,
-        },
-        "required": ["query"],
-    },
-}
-
-SPOTIFY_PLAYLISTS_SCHEMA = {
-    "name": "spotify_playlists",
-    "description": "List, inspect, create, update, and modify Spotify playlists.",
-    "parameters": {
-        "type": "object",
-        "properties": {
-            "action": {"type": "string", "enum": ["list", "get", "create", "add_items", "remove_items", "update_details"]},
-            "playlist_id": COMMON_STRING,
-            "market": COMMON_STRING,
-            "limit": {"type": "integer"},
-            "offset": {"type": "integer"},
-            "name": COMMON_STRING,
-            "description": COMMON_STRING,
-            "public": {"type": "boolean"},
-            "collaborative": {"type": "boolean"},
-            "uris": {"type": "array", "items": COMMON_STRING},
-            "position": {"type": "integer"},
-            "snapshot_id": COMMON_STRING,
-        },
-        "required": ["action"],
-    },
-}
-
-SPOTIFY_ALBUMS_SCHEMA = {
-    "name": "spotify_albums",
-    "description": "Fetch Spotify album metadata or album tracks.",
-    "parameters": {
-        "type": "object",
-        "properties": {
-            "action": {"type": "string", "enum": ["get", "tracks"]},
-            "album_id": COMMON_STRING,
-            "id": COMMON_STRING,
-            "market": COMMON_STRING,
-            "limit": {"type": "integer"},
-            "offset": {"type": "integer"},
-        },
-        "required": ["action"],
-    },
-}
-
-SPOTIFY_LIBRARY_SCHEMA = {
-    "name": "spotify_library",
-    "description": "List, save, or remove the user's saved Spotify tracks or albums. Use `kind` to select which.",
-    "parameters": {
-        "type": "object",
-        "properties": {
-            "kind": {"type": "string", "enum": ["tracks", "albums"], "description": "Which library to operate on"},
-            "action": {"type": "string", "enum": ["list", "save", "remove"]},
-            "limit": {"type": "integer"},
-            "offset": {"type": "integer"},
-            "market": COMMON_STRING,
-            "uris": {"type": "array", "items": COMMON_STRING},
-            "ids": {"type": "array", "items": COMMON_STRING},
-            "items": {"type": "array", "items": COMMON_STRING},
-        },
-        "required": ["kind", "action"],
-    },
-}
@@ -78,16 +78,6 @@ termux = [
 ]
 dingtalk = ["dingtalk-stream>=0.20,<1", "alibabacloud-dingtalk>=2.0.0", "qrcode>=7.0,<8"]
 feishu = ["lark-oapi>=1.5.3,<2", "qrcode>=7.0,<8"]
-google = [
-  # Required by the google-workspace skill (Gmail, Calendar, Drive, Contacts,
-  # Sheets, Docs).  Declared here so packagers (Nix, Homebrew) ship them with
-  # the [all] extra and users don't hit runtime `pip install` paths that fail
-  # in environments without pip (e.g. Nix-managed Python).
-  "google-api-python-client>=2.100,<3",
-  "google-auth-oauthlib>=1.0,<2",
-  "google-auth-httplib2>=0.2,<1",
-]
-# `hermes dashboard` (localhost SPA + API).  Not in core to keep the default install lean.
 web = ["fastapi>=0.104.0,<1", "uvicorn[standard]>=0.24.0,<1"]
 rl = [
  "atroposlib @ git+https://github.com/NousResearch/atropos.git@c20c85256e5a45ad31edf8b7276e9c5ee1995a30",
@@ -119,7 +109,6 @@ all = [
  "hermes-agent[voice]",
  "hermes-agent[dingtalk]",
  "hermes-agent[feishu]",
-  "hermes-agent[google]",
  "hermes-agent[mistral]",
  "hermes-agent[bedrock]",
  "hermes-agent[web]",
@@ -29,25 +29,10 @@ BOLD='\033[1m'
 REPO_URL_SSH="git@github.com:NousResearch/hermes-agent.git"
 REPO_URL_HTTPS="https://github.com/NousResearch/hermes-agent.git"
 HERMES_HOME="${HERMES_HOME:-$HOME/.hermes}"
-# INSTALL_DIR is resolved AFTER arg parsing and OS detection so we can pick an
-# FHS-style layout for root installs.  Track whether the user gave us an
-# explicit directory — if so we never override it.
-if [ -n "${HERMES_INSTALL_DIR:-}" ]; then
-    INSTALL_DIR="$HERMES_INSTALL_DIR"
-    INSTALL_DIR_EXPLICIT=true
-else
-    INSTALL_DIR=""
-    INSTALL_DIR_EXPLICIT=false
-fi
+INSTALL_DIR="${HERMES_INSTALL_DIR:-$HERMES_HOME/hermes-agent}"
 PYTHON_VERSION="3.11"
 NODE_VERSION="22"

-# FHS-style root install layout (set by resolve_install_layout when applicable):
-#   code at /usr/local/lib/hermes-agent, command at /usr/local/bin/hermes,
-#   data still at /root/.hermes (HERMES_HOME).  Matches Claude Code / Codex CLI
-#   and keeps Docker bind-mounted /root/ volumes lean.
-ROOT_FHS_LAYOUT=false
-
 # Options
 USE_VENV=true
 RUN_SETUP=true
@@ -79,7 +64,6 @@ while [[ $# -gt 0 ]]; do
            ;;
        --dir)
            INSTALL_DIR="$2"
-            INSTALL_DIR_EXPLICIT=true
            shift 2
            ;;
        --hermes-home)
@@ -95,20 +79,9 @@ while [[ $# -gt 0 ]]; do
            echo "  --no-venv      Don't create virtual environment"
            echo "  --skip-setup   Skip interactive setup wizard"
            echo "  --branch NAME  Git branch to install (default: main)"
-            echo "  --dir PATH     Installation directory"
-            echo "                   default (non-root):  ~/.hermes/hermes-agent"
-            echo "                   default (root, Linux): /usr/local/lib/hermes-agent"
+            echo "  --dir PATH     Installation directory (default: ~/.hermes/hermes-agent)"
            echo "  --hermes-home PATH  Data directory (default: ~/.hermes, or \$HERMES_HOME)"
            echo "  -h, --help     Show this help"
-            echo ""
-            echo "Notes:"
-            echo "  When running as root on Linux, Hermes installs the code under"
-            echo "  /usr/local/lib/hermes-agent and links the command into"
-            echo "  /usr/local/bin/hermes (FHS layout — matches Claude Code / Codex CLI)."
-            echo "  Data, config, sessions, and logs still live in \$HERMES_HOME"
-            echo "  (default /root/.hermes).  This keeps Docker bind-mounted volumes"
-            echo "  small and ensures the command is on PATH for all shells."
-            echo "  Existing installs at \$HERMES_HOME/hermes-agent are preserved in-place."
            exit 0
            ;;
        *)
@@ -190,60 +163,9 @@ is_termux() {
    [ -n "${TERMUX_VERSION:-}" ] || [[ "${PREFIX:-}" == *"com.termux/files/usr"* ]]
 }

-# Decide where the repo checkout + venv live, and where the `hermes` command
-# symlink goes.  Called after detect_os so $OS/$DISTRO are known.
-#
-# Defaults:
-#   - Non-root, any OS:       INSTALL_DIR = $HERMES_HOME/hermes-agent
-#                             command link in $HOME/.local/bin
-#   - Termux (any uid):       INSTALL_DIR = $HERMES_HOME/hermes-agent
-#                             command link in $PREFIX/bin (already on PATH)
-#   - Root on Linux (new):    INSTALL_DIR = /usr/local/lib/hermes-agent
-#                             command link in /usr/local/bin
-#                             (unless a legacy install already exists at
-#                              $HERMES_HOME/hermes-agent — then preserve it)
-#
-# Always no-op when the user set --dir or $HERMES_INSTALL_DIR.
-resolve_install_layout() {
-    if [ "$INSTALL_DIR_EXPLICIT" = true ]; then
-        log_info "Install directory: $INSTALL_DIR (explicit)"
-        return 0
-    fi
-
-    # Termux: package manager manages /data/data/..., keep code in HERMES_HOME.
-    if is_termux; then
-        INSTALL_DIR="$HERMES_HOME/hermes-agent"
-        return 0
-    fi
-
-    # Root on Linux: prefer FHS layout unless a legacy install already exists.
-    # macOS root installs keep the legacy layout because /usr/local/ on macOS
-    # is Homebrew territory and we don't want to fight that.
-    if [ "$OS" = "linux" ] && [ "$(id -u)" -eq 0 ]; then
-        if [ -d "$HERMES_HOME/hermes-agent/.git" ]; then
-            INSTALL_DIR="$HERMES_HOME/hermes-agent"
-            log_info "Existing install detected at $INSTALL_DIR — keeping legacy layout"
-            log_info "  (new root installs use /usr/local/lib/hermes-agent)"
-            return 0
-        fi
-        INSTALL_DIR="/usr/local/lib/hermes-agent"
-        ROOT_FHS_LAYOUT=true
-        log_info "Root install on Linux — using FHS layout"
-        log_info "  Code:    $INSTALL_DIR"
-        log_info "  Command: /usr/local/bin/hermes"
-        log_info "  Data:    $HERMES_HOME (unchanged)"
-        return 0
-    fi
-
-    # Default: non-root, non-Termux → legacy user-scoped layout.
-    INSTALL_DIR="$HERMES_HOME/hermes-agent"
-}
-
 get_command_link_dir() {
    if is_termux && [ -n "${PREFIX:-}" ]; then
        echo "$PREFIX/bin"
-    elif [ "$ROOT_FHS_LAYOUT" = true ]; then
-        echo "/usr/local/bin"
    else
        echo "$HOME/.local/bin"
    fi
@@ -252,8 +174,6 @@ get_command_link_dir() {
 get_command_link_display_dir() {
    if is_termux && [ -n "${PREFIX:-}" ]; then
        echo '$PREFIX/bin'
-    elif [ "$ROOT_FHS_LAYOUT" = true ]; then
-        echo '/usr/local/bin'
    else
        echo '~/.local/bin'
    fi
@@ -1055,14 +975,6 @@ setup_path() {
        return 0
    fi

-    # FHS layout: /usr/local/bin is on PATH for every standard shell, nothing to inject.
-    if [ "$ROOT_FHS_LAYOUT" = true ]; then
-        export PATH="$command_link_dir:$PATH"
-        log_info "/usr/local/bin is already on PATH for all shells"
-        log_success "hermes command ready"
-        return 0
-    fi
-
    # Check if ~/.local/bin is on PATH; if not, add it to shell config.
    # Detect the user's actual login shell (not the shell running this script,
    # which is always bash when piped from curl).
@@ -1427,12 +1339,12 @@ print_success() {
    echo ""

    # Show file locations
-    echo -e "${CYAN}${BOLD}📁 Your files:${NC}"
+    echo -e "${CYAN}${BOLD}📁 Your files (all in ~/.hermes/):${NC}"
    echo ""
-    echo -e "   ${YELLOW}Config:${NC}    $HERMES_HOME/config.yaml"
-    echo -e "   ${YELLOW}API Keys:${NC}  $HERMES_HOME/.env"
-    echo -e "   ${YELLOW}Data:${NC}      $HERMES_HOME/cron/, sessions/, logs/"
-    echo -e "   ${YELLOW}Code:${NC}      $INSTALL_DIR"
+    echo -e "   ${YELLOW}Config:${NC}    ~/.hermes/config.yaml"
+    echo -e "   ${YELLOW}API Keys:${NC}  ~/.hermes/.env"
+    echo -e "   ${YELLOW}Data:${NC}      ~/.hermes/cron/, sessions/, logs/"
+    echo -e "   ${YELLOW}Code:${NC}      ~/.hermes/hermes-agent/"
    echo ""

    echo -e "${CYAN}─────────────────────────────────────────────────────────${NC}"
@@ -1452,9 +1364,6 @@ print_success() {
    if [ "$DISTRO" = "termux" ]; then
        echo -e "${YELLOW}⚡ 'hermes' was linked into $(get_command_link_display_dir), which is already on PATH in Termux.${NC}"
        echo ""
-    elif [ "$ROOT_FHS_LAYOUT" = true ]; then
-        echo -e "${YELLOW}⚡ 'hermes' was linked into /usr/local/bin and is ready to use — no shell reload needed.${NC}"
-        echo ""
    else
        echo -e "${YELLOW}⚡ Reload your shell to use 'hermes' command:${NC}"
        echo ""
@@ -1506,7 +1415,6 @@ main() {
    print_banner

    detect_os
-    resolve_install_layout
    install_uv
    check_python
    check_git
@@ -44,13 +44,9 @@ AUTHOR_MAP = {
    "teknium@nousresearch.com": "teknium1",
    "127238744+teknium1@users.noreply.github.com": "teknium1",
    "343873859@qq.com": "DrStrangerUJN",
-    "uzmpsk.dilekakbas@gmail.com": "dlkakbs",
    "jefferson@heimdallstrategy.com": "Mind-Dragon",
    "130918800+devorun@users.noreply.github.com": "devorun",
    "maks.mir@yahoo.com": "say8hi",
-    "web3blind@users.noreply.github.com": "web3blind",
-    "julia@alexland.us": "alexg0bot",
-    "1060770+benjaminsehl@users.noreply.github.com": "benjaminsehl",
    # contributors (from noreply pattern)
    "david.vv@icloud.com": "davidvv",
    "wangqiang@wangqiangdeMac-mini.local": "xiaoqiang243",
@@ -62,19 +58,13 @@ AUTHOR_MAP = {
    "keifergu@tencent.com": "keifergu",
    "kshitijk4poor@users.noreply.github.com": "kshitijk4poor",
    "abner.the.foreman@agentmail.to": "Abnertheforeman",
-    "thomasgeorgevii09@gmail.com": "tochukwuada",
    "harryykyle1@gmail.com": "hharry11",
    "kshitijk4poor@gmail.com": "kshitijk4poor",
-    "keira.voss94@gmail.com": "keiravoss94",
    "16443023+stablegenius49@users.noreply.github.com": "stablegenius49",
-    "simbamax99@gmail.com": "simbam99",
    "185121704+stablegenius49@users.noreply.github.com": "stablegenius49",
    "101283333+batuhankocyigit@users.noreply.github.com": "batuhankocyigit",
    "255305877+ismell0992-afk@users.noreply.github.com": "ismell0992-afk",
-    "cyprian@ironin.pl": "iRonin",
    "valdi.jorge@gmail.com": "jvcl",
-    "q19dcp@gmail.com": "aj-nt",
-    "ebukau84@gmail.com": "UgwujaGeorge",
    "francip@gmail.com": "francip",
    "omni@comelse.com": "omnissiah-comelse",
    "oussama.redcode@gmail.com": "mavrickdeveloper",
@@ -87,12 +77,10 @@ AUTHOR_MAP = {
    "77628552+raulvidis@users.noreply.github.com": "raulvidis",
    "145567217+Aum08Desai@users.noreply.github.com": "Aum08Desai",
    "256820943+kshitij-eliza@users.noreply.github.com": "kshitij-eliza",
-    "jiechengwu@pony.ai": "Jason2031",
    "44278268+shitcoinsherpa@users.noreply.github.com": "shitcoinsherpa",
    "104278804+Sertug17@users.noreply.github.com": "Sertug17",
    "112503481+caentzminger@users.noreply.github.com": "caentzminger",
    "258577966+voidborne-d@users.noreply.github.com": "voidborne-d",
-    "xydarcher@uestc.edu.cn": "Readon",
    "sir_even@icloud.com": "sirEven",
    "36056348+sirEven@users.noreply.github.com": "sirEven",
    "70424851+insecurejezza@users.noreply.github.com": "insecurejezza",
@@ -115,7 +103,6 @@ AUTHOR_MAP = {
    "30841158+n-WN@users.noreply.github.com": "n-WN",
    "tsuijinglei@gmail.com": "hiddenpuppy",
    "jerome@clawwork.ai": "HiddenPuppy",
-    "jerome.benoit@sap.com": "jerome-benoit",
    "wysie@users.noreply.github.com": "Wysie",
    "leoyuan0099@gmail.com": "keyuyuan",
    "bxzt2006@163.com": "Only-Code-A",
@@ -181,38 +168,6 @@ AUTHOR_MAP = {
    "seanalt555@gmail.com": "Salt-555",
    "satelerd@gmail.com": "satelerd",
    "dan@danlynn.com": "danklynn",
-    "mattmaximo@hotmail.com": "MattMaximo",
-    "149063006+j3ffffff@users.noreply.github.com": "j3ffffff",
-    "A-FdL-Prog@users.noreply.github.com": "A-FdL-Prog",
-    "l0hde@users.noreply.github.com": "l0hde",
-    "difujia@users.noreply.github.com": "difujia",
-    "vominh1919@gmail.com": "vominh1919",
-    "yue.gu2023@gmail.com": "YueLich",
-    "51783311+andyylin@users.noreply.github.com": "andyylin",
-    "me@jakubkrcmar.cz": "jakubkrcmar",
-    "prasadus92@gmail.com": "prasadus92",
-    "michael@make.software": "mssteuer",
-    "der@konsi.org": "konsisumer",
-    "abogale2@gmail.com": "amanuel2",
-    "alexazzjjtt@163.com": "alexzhu0",
-    "pub_forgreatagent@antgroup.com": "AntAISecurityLab",
-    "252620095+briandevans@users.noreply.github.com": "briandevans",
-    "danielrpike9@gmail.com": "Bartok9",
-    "skozyuk@cruxexperts.com": "CruxExperts",
-    "154585401+LeonSGP43@users.noreply.github.com": "LeonSGP43",
-    "mgparkprint@gmail.com": "vlwkaos",
-    "tranquil_flow@protonmail.com": "Tranquil-Flow",
-    "wangshengyang2004@163.com": "Wangshengyang2004",
-    "hasan.ali13381@gmail.com": "H-Ali13381",
-    "xienb@proton.me": "XieNBi",
-    "139681654+maymuneth@users.noreply.github.com": "maymuneth",
-    "zengwei@nightq.cn": "nightq",
-    "1434494126@qq.com": "5park1e",
-    "158153005+5park1e@users.noreply.github.com": "5park1e",
-    "innocarpe@gmail.com": "innocarpe",
-    "noreply@ked.com": "qike-ms",
-    "andrekurait@gmail.com": "AndreKurait",
-    "bsgdigital@users.noreply.github.com": "bsgdigital",
    "numman.ali@gmail.com": "nummanali",
    "rohithsaimidigudla@gmail.com": "whitehatjr1001",
    "0xNyk@users.noreply.github.com": "0xNyk",
@@ -231,11 +186,6 @@ AUTHOR_MAP = {
    "bryan@intertwinesys.com": "bryanyoung",
    "christo.mitov@gmail.com": "christomitov",
    "hermes@nousresearch.com": "NousResearch",
-    "reginaldasr@gmail.com": "ReginaldasR",
-    "ntconguit@gmail.com": "0xharryriddle",
-    "agent@wildcat.local": "ericnicolaides",
-    "georgex8001@gmail.com": "georgex8001",
-    "stefan@dimagents.ai": "dimitrovi",
    "hermes@noushq.ai": "benbarclay",
    "chinmingcock@gmail.com": "ChimingLiu",
    "openclaw@sparklab.ai": "openclaw",
@@ -384,9 +334,6 @@ AUTHOR_MAP = {
    "brian@bde.io": "briandevans",
    "hubin_ll@qq.com": "LLQWQ",
    "memosr_email@gmail.com": "memosr",
-    "jperlow@gmail.com": "perlowja",
-    "tangyuanjc@JCdeAIfenshendeMac-mini.local": "tangyuanjc",
-    "harryplusplus@gmail.com": "harryplusplus",
    "anthhub@163.com": "anthhub",
    "shenuu@gmail.com": "shenuu",
    "xiayh17@gmail.com": "xiayh0107",
@@ -490,12 +437,6 @@ AUTHOR_MAP = {
    "topcheer@me.com": "topcheer",
    "walli@tencent.com": "walli",
    "zhuofengwang@tencent.com": "Zhuofeng-Wang",
-    # April 2026 salvage-PR batch (#14920, #14986, #14966)
-    "mrunmayeerane17@gmail.com": "mrunmayee17",
-    "69489633+camaragon@users.noreply.github.com": "camaragon",
-    "shamork@outlook.com": "shamork",
-    # April 2026 Discord Copilot /model salvage (#15030)
-    "cshong2017@outlook.com": "Nicecsh",
    # no-github-match — keep as display names
    "clio-agent@sisyphuslabs.ai": "Sisyphus",
    "marco@rutimka.de": "Marco Rutsch",
@@ -503,9 +444,6 @@ AUTHOR_MAP = {
    "zhangxicen@example.com": "zhangxicen",
    "codex@openai.invalid": "teknium1",
    "screenmachine@gmail.com": "teknium1",
-    "chenzeshi@live.com": "chen1749144759",
-    "mor.aleksandr@yahoo.com": "MorAlekss",
-    "ash@users.noreply.github.com": "ash",
 }


@@ -248,6 +248,7 @@ Type these during an interactive chat session.
 ```
 /config              Show config (CLI)
 /model [name]        Show or change model
+/provider            Show provider info
 /personality [name]  Set personality
 /reasoning [level]   Set reasoning (none|minimal|low|medium|high|xhigh|show|hide)
 /verbose             Cycle: off → new → all → verbose
@@ -1,196 +0,0 @@
----
-name: design-md
-description: Author, validate, diff, and export DESIGN.md files — Google's open-source format spec that gives coding agents a persistent, structured understanding of a design system (tokens + rationale in one file). Use when building a design system, porting style rules between projects, generating UI with consistent brand, or auditing accessibility/contrast.
-version: 1.0.0
-author: Hermes Agent
-license: MIT
-metadata:
-  hermes:
-    tags: [design, design-system, tokens, ui, accessibility, wcag, tailwind, dtcg, google]
-    related_skills: [popular-web-designs, excalidraw, architecture-diagram]
----
-
-# DESIGN.md Skill
-
-DESIGN.md is Google's open spec (Apache-2.0, `google-labs-code/design.md`) for
-describing a visual identity to coding agents. One file combines:
-
-- **YAML front matter** — machine-readable design tokens (normative values)
-- **Markdown body** — human-readable rationale, organized into canonical sections
-
-Tokens give exact values. Prose tells agents *why* those values exist and how to
-apply them. The CLI (`npx @google/design.md`) lints structure + WCAG contrast,
-diffs versions for regressions, and exports to Tailwind or W3C DTCG JSON.
-
-## When to use this skill
-
-- User asks for a DESIGN.md file, design tokens, or a design system spec
-- User wants consistent UI/brand across multiple projects or tools
-- User pastes an existing DESIGN.md and asks to lint, diff, export, or extend it
-- User asks to port a style guide into a format agents can consume
-- User wants contrast / WCAG accessibility validation on their color palette
-
-For purely visual inspiration or layout examples, use `popular-web-designs`
-instead. This skill is for the *formal spec file* itself.
-
-## File anatomy
-
-```md
----
-version: alpha
-name: Heritage
-description: Architectural minimalism meets journalistic gravitas.
-colors:
-  primary: "#1A1C1E"
-  secondary: "#6C7278"
-  tertiary: "#B8422E"
-  neutral: "#F7F5F2"
-typography:
-  h1:
-    fontFamily: Public Sans
-    fontSize: 3rem
-    fontWeight: 700
-    lineHeight: 1.1
-    letterSpacing: "-0.02em"
-  body-md:
-    fontFamily: Public Sans
-    fontSize: 1rem
-rounded:
-  sm: 4px
-  md: 8px
-  lg: 16px
-spacing:
-  sm: 8px
-  md: 16px
-  lg: 24px
-components:
-  button-primary:
-    backgroundColor: "{colors.tertiary}"
-    textColor: "#FFFFFF"
-    rounded: "{rounded.sm}"
-    padding: 12px
-  button-primary-hover:
-    backgroundColor: "{colors.primary}"
----
-
-## Overview
-
-Architectural Minimalism meets Journalistic Gravitas...
-
-## Colors
-
-- **Primary (#1A1C1E):** Deep ink for headlines and core text.
-- **Tertiary (#B8422E):** "Boston Clay" — the sole driver for interaction.
-
-## Typography
-
-Public Sans for everything except small all-caps labels...
-
-## Components
-
-`button-primary` is the only high-emphasis action on a page...
-```
-
-## Token types
-
-| Type | Format | Example |
-|------|--------|---------|
-| Color | `#` + hex (sRGB) | `"#1A1C1E"` |
-| Dimension | number + unit (`px`, `em`, `rem`) | `48px`, `-0.02em` |
-| Token reference | `{path.to.token}` | `{colors.primary}` |
-| Typography | object with `fontFamily`, `fontSize`, `fontWeight`, `lineHeight`, `letterSpacing`, `fontFeature`, `fontVariation` | see above |
-
-Component property whitelist: `backgroundColor`, `textColor`, `typography`,
-`rounded`, `padding`, `size`, `height`, `width`. Variants (hover, active,
-pressed) are **separate component entries** with related key names
-(`button-primary-hover`), not nested.
-
-## Canonical section order
-
-Sections are optional, but present ones MUST appear in this order. Duplicate
-headings reject the file.
-
-1. Overview (alias: Brand & Style)
-2. Colors
-3. Typography
-4. Layout (alias: Layout & Spacing)
-5. Elevation & Depth (alias: Elevation)
-6. Shapes
-7. Components
-8. Do's and Don'ts
-
-Unknown sections are preserved, not errored. Unknown token names are accepted
-if the value type is valid. Unknown component properties produce a warning.
-
-## Workflow: authoring a new DESIGN.md
-
-1. **Ask the user** (or infer) the brand tone, accent color, and typography
-   direction. If they provided a site, image, or vibe, translate it to the
-   token shape above.
-2. **Write `DESIGN.md`** in their project root using `write_file`. Always
-   include `name:` and `colors:`; other sections optional but encouraged.
-3. **Use token references** (`{colors.primary}`) in the `components:` section
-   instead of re-typing hex values. Keeps the palette single-source.
-4. **Lint it** (see below). Fix any broken references or WCAG failures
-   before returning.
-5. **If the user has an existing project**, also write Tailwind or DTCG
-   exports next to the file (`tailwind.theme.json`, `tokens.json`).
-
-## Workflow: lint / diff / export
-
-The CLI is `@google/design.md` (Node). Use `npx` — no global install needed.
-
-```bash
-# Validate structure + token references + WCAG contrast
-npx -y @google/design.md lint DESIGN.md
-
-# Compare two versions, fail on regression (exit 1 = regression)
-npx -y @google/design.md diff DESIGN.md DESIGN-v2.md
-
-# Export to Tailwind theme JSON
-npx -y @google/design.md export --format tailwind DESIGN.md > tailwind.theme.json
-
-# Export to W3C DTCG (Design Tokens Format Module) JSON
-npx -y @google/design.md export --format dtcg DESIGN.md > tokens.json
-
-# Print the spec itself — useful when injecting into an agent prompt
-npx -y @google/design.md spec --rules-only --format json
-```
-
-All commands accept `-` for stdin. `lint` returns exit 1 on errors. Use the
-`--format json` flag and parse the output if you need to report findings
-structurally.
-
-### Lint rule reference (what the 7 rules catch)
-
-- `broken-ref` (error) — `{colors.missing}` points at a non-existent token
-- `duplicate-section` (error) — same `## Heading` appears twice
-- `invalid-color`, `invalid-dimension`, `invalid-typography` (error)
-- `wcag-contrast` (warning/info) — component `textColor` vs `backgroundColor`
-  ratio against WCAG AA (4.5:1) and AAA (7:1)
-- `unknown-component-property` (warning) — outside the whitelist above
-
-When the user cares about accessibility, call this out explicitly in your
-summary — WCAG findings are the most load-bearing reason to use the CLI.
-
-## Pitfalls
-
-- **Don't nest component variants.** `button-primary.hover` is wrong;
-  `button-primary-hover` as a sibling key is right.
-- **Hex colors must be quoted strings.** YAML will otherwise choke on `#` or
-  truncate values like `#1A1C1E` oddly.
-- **Negative dimensions need quotes too.** `letterSpacing: -0.02em` parses as
-  a YAML flow — write `letterSpacing: "-0.02em"`.
-- **Section order is enforced.** If the user gives you prose in a random order,
-  reorder it to match the canonical list before saving.
-- **`version: alpha` is the current spec version** (as of Apr 2026). The spec
-  is marked alpha — watch for breaking changes.
-- **Token references resolve by dotted path.** `{colors.primary}` works;
-  `{primary}` does not.
-
-## Spec source of truth
-
-- Repo: https://github.com/google-labs-code/design.md (Apache-2.0)
-- CLI: `@google/design.md` on npm
-- License of generated DESIGN.md files: whatever the user's project uses;
-  the spec itself is Apache-2.0.
@@ -1,99 +0,0 @@
----
-version: alpha
-name: MyBrand
-description: One-sentence description of the visual identity.
-colors:
-  primary: "#0F172A"
-  secondary: "#64748B"
-  tertiary: "#2563EB"
-  neutral: "#F8FAFC"
-  on-primary: "#FFFFFF"
-  on-tertiary: "#FFFFFF"
-typography:
-  h1:
-    fontFamily: Inter
-    fontSize: 3rem
-    fontWeight: 700
-    lineHeight: 1.1
-    letterSpacing: "-0.02em"
-  h2:
-    fontFamily: Inter
-    fontSize: 2rem
-    fontWeight: 600
-    lineHeight: 1.2
-  body-md:
-    fontFamily: Inter
-    fontSize: 1rem
-    lineHeight: 1.5
-  label-caps:
-    fontFamily: Inter
-    fontSize: 0.75rem
-    fontWeight: 600
-    letterSpacing: "0.08em"
-rounded:
-  sm: 4px
-  md: 8px
-  lg: 16px
-  full: 9999px
-spacing:
-  xs: 4px
-  sm: 8px
-  md: 16px
-  lg: 24px
-  xl: 48px
-components:
-  button-primary:
-    backgroundColor: "{colors.tertiary}"
-    textColor: "{colors.on-tertiary}"
-    rounded: "{rounded.sm}"
-    padding: 12px
-  button-primary-hover:
-    backgroundColor: "{colors.primary}"
-    textColor: "{colors.on-primary}"
-  card:
-    backgroundColor: "{colors.neutral}"
-    textColor: "{colors.primary}"
-    rounded: "{rounded.md}"
-    padding: 24px
----
-
-## Overview
-
-Describe the voice and feel of the brand in one or two paragraphs. What mood
-does it evoke? What emotional response should a user have on first impression?
-
-## Colors
-
-- **Primary ({colors.primary}):** Core text, headlines, high-emphasis surfaces.
-- **Secondary ({colors.secondary}):** Supporting text, borders, metadata.
-- **Tertiary ({colors.tertiary}):** Interaction driver — buttons, links,
-  selected states. Use sparingly to preserve its signal.
-- **Neutral ({colors.neutral}):** Page background and surface fills.
-
-## Typography
-
-Inter for everything. Weight and size carry hierarchy, not font family. Tight
-letter-spacing on display sizes; default tracking on body.
-
-## Layout
-
-Spacing scale is a 4px baseline. Use `md` (16px) for intra-component gaps,
-`lg` (24px) for inter-component gaps, `xl` (48px) for section breaks.
-
-## Shapes
-
-Rounded corners are modest — `sm` on interactive elements, `md` on cards.
-`full` is reserved for avatars and pill badges.
-
-## Components
-
-- `button-primary` is the only high-emphasis action per screen.
-- `card` is the default surface for grouped content. No shadow by default.
-
-## Do's and Don'ts
-
-- **Do** use token references (`{colors.primary}`) instead of literal hex in
-  component definitions.
-- **Don't** introduce colors outside the palette — extend the palette first.
-- **Don't** nest component variants. `button-primary-hover` is a sibling,
-  not a child.
@@ -1,134 +0,0 @@
----
-name: spotify
-description: Control Spotify — play music, search the catalog, manage playlists and library, inspect devices and playback state. Loads when the user asks to play/pause/queue music, search tracks/albums/artists, manage playlists, or check what's playing. Assumes the Hermes Spotify toolset is enabled and `hermes auth spotify` has been run.
-version: 1.0.0
-author: Hermes Agent
-license: MIT
-prerequisites:
-  tools: [spotify_playback, spotify_devices, spotify_queue, spotify_search, spotify_playlists, spotify_albums, spotify_library]
-metadata:
-  hermes:
-    tags: [spotify, music, playback, playlists, media]
-    related_skills: [gif-search]
----
-
-# Spotify
-
-Control the user's Spotify account via the Hermes Spotify toolset (7 tools). Setup guide: https://hermes-agent.nousresearch.com/docs/user-guide/features/spotify
-
-## When to use this skill
-
-The user says something like "play X", "pause", "skip", "queue up X", "what's playing", "search for X", "add to my X playlist", "make a playlist", "save this to my library", etc.
-
-## The 7 tools
-
-- `spotify_playback` — play, pause, next, previous, seek, set_repeat, set_shuffle, set_volume, get_state, get_currently_playing, recently_played
-- `spotify_devices` — list, transfer
-- `spotify_queue` — get, add
-- `spotify_search` — search the catalog
-- `spotify_playlists` — list, get, create, add_items, remove_items, update_details
-- `spotify_albums` — get, tracks
-- `spotify_library` — list/save/remove with `kind: "tracks"|"albums"`
-
-Playback-mutating actions require Spotify Premium; search/library/playlist ops work on Free.
-
-## Canonical patterns (minimize tool calls)
-
-### "Play <artist/track/album>"
-One search, then play by URI. Do NOT loop through search results describing them unless the user asked for options.
-
-```
-spotify_search({"query": "miles davis kind of blue", "types": ["album"], "limit": 1})
-→ got album URI spotify:album:1weenld61qoidwYuZ1GESA
-spotify_playback({"action": "play", "context_uri": "spotify:album:1weenld61qoidwYuZ1GESA"})
-```
-
-For "play some <artist>" (no specific song), prefer `types: ["artist"]` and play the artist context URI — Spotify handles smart shuffle. If the user says "the song" or "that track", search `types: ["track"]` and pass `uris: [track_uri]` to play.
-
-### "What's playing?" / "What am I listening to?"
-Single call — don't chain get_state after get_currently_playing.
-
-```
-spotify_playback({"action": "get_currently_playing"})
-```
-
-If it returns 204/empty (`is_playing: false`), tell the user nothing is playing. Don't retry.
-
-### "Pause" / "Skip" / "Volume 50"
-Direct action, no preflight inspection needed.
-
-```
-spotify_playback({"action": "pause"})
-spotify_playback({"action": "next"})
-spotify_playback({"action": "set_volume", "volume_percent": 50})
-```
-
-### "Add to my <playlist name> playlist"
-1. `spotify_playlists list` to find the playlist ID by name
-2. Get the track URI (from currently playing, or search)
-3. `spotify_playlists add_items` with the playlist_id and URIs
-
-```
-spotify_playlists({"action": "list"})
-→ found "Late Night Jazz" = 37i9dQZF1DX4wta20PHgwo
-spotify_playback({"action": "get_currently_playing"})
-→ current track uri = spotify:track:0DiWol3AO6WpXZgp0goxAV
-spotify_playlists({"action": "add_items",
-                   "playlist_id": "37i9dQZF1DX4wta20PHgwo",
-                   "uris": ["spotify:track:0DiWol3AO6WpXZgp0goxAV"]})
-```
-
-### "Create a playlist called X and add the last 3 songs I played"
-```
-spotify_playback({"action": "recently_played", "limit": 3})
-spotify_playlists({"action": "create", "name": "Focus 2026"})
-→ got playlist_id back in response
-spotify_playlists({"action": "add_items", "playlist_id": <id>, "uris": [<3 uris>]})
-```
-
-### "Save / unsave / is this saved?"
-Use `spotify_library` with the right `kind`.
-
-```
-spotify_library({"kind": "tracks", "action": "save", "uris": ["spotify:track:..."]})
-spotify_library({"kind": "albums", "action": "list", "limit": 50})
-```
-
-### "Transfer playback to my <device>"
-```
-spotify_devices({"action": "list"})
-→ pick the device_id by matching name/type
-spotify_devices({"action": "transfer", "device_id": "<id>", "play": true})
-```
-
-## Critical failure modes
-
-**`403 Forbidden — No active device found`** on any playback action means Spotify isn't running anywhere. Tell the user: "Open Spotify on your phone/desktop/web player first, start any track for a second, then retry." Don't retry the tool call blindly — it will fail the same way. You can call `spotify_devices list` to confirm; an empty list means no active device.
-
-**`403 Forbidden — Premium required`** means the user is on Free and tried to mutate playback. Don't retry; tell them this action needs Premium. Reads still work (search, playlists, library, get_state).
-
-**`204 No Content` on `get_currently_playing`** is NOT an error — it means nothing is playing. The tool returns `is_playing: false`. Just report that to the user.
-
-**`429 Too Many Requests`** = rate limit. Wait and retry once. If it keeps happening, you're looping — stop.
-
-**`401 Unauthorized` after a retry** — refresh token revoked. Tell the user to run `hermes auth spotify` again.
-
-## URI and ID formats
-
-Spotify uses three interchangeable ID formats. The tools accept all three and normalize:
-
-- URI: `spotify:track:0DiWol3AO6WpXZgp0goxAV` (preferred)
-- URL: `https://open.spotify.com/track/0DiWol3AO6WpXZgp0goxAV`
-- Bare ID: `0DiWol3AO6WpXZgp0goxAV`
-
-When in doubt, use full URIs. Search results return URIs in the `uri` field — pass those directly.
-
-Entity types: `track`, `album`, `artist`, `playlist`, `show`, `episode`. Use the right type for the action — `spotify_playback.play` with a `context_uri` expects album/playlist/artist; `uris` expects an array of track URIs.
-
-## What NOT to do
-
-- **Don't call `get_state` before every action.** Spotify accepts play/pause/skip without preflight. Only inspect state when the user asked "what's playing" or you need to reason about device/track.
-- **Don't describe search results unless asked.** If the user said "play X", search, grab the top URI, play it. They'll hear it's wrong if it's wrong.
-- **Don't retry on `403 Premium required` or `403 No active device`.** Those are permanent until user action.
-- **Don't use `spotify_search` to find a playlist by name** — that searches the public Spotify catalog. User playlists come from `spotify_playlists list`.
-- **Don't mix `kind: "tracks"` with album URIs** in `spotify_library` (or vice versa). The tool normalizes IDs but the API endpoint differs.
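Review note on the removed "URI and ID formats" section: the text says the tools accept a URI, an open.spotify.com URL, or a bare ID and normalize them. The sketch below is one way such normalization could look. It is not taken from the Hermes tools; the function name `to_spotify_uri`, the `default_type` parameter, and the validation rules are assumptions made for illustration.

```python
import re
from urllib.parse import urlparse

# Entity types listed in the removed skill doc.
ENTITY_TYPES = {"track", "album", "artist", "playlist", "show", "episode"}


def to_spotify_uri(value: str, default_type: str = "track") -> str:
    """Normalize a URI, an open.spotify.com URL, or a bare ID to a URI.

    Hypothetical helper: a bare ID carries no entity type, so the caller
    has to supply one via ``default_type``.
    """
    value = value.strip()

    # Already a URI: check the shape and return it unchanged.
    if value.startswith("spotify:"):
        parts = value.split(":")
        if len(parts) == 3 and parts[1] in ENTITY_TYPES and parts[2]:
            return value
        raise ValueError(f"unrecognized Spotify URI: {value!r}")

    # Web URL: the last two path segments are the type and the ID.
    # urlparse keeps any query string (such as ?si=...) out of .path.
    if value.startswith(("http://", "https://")):
        parsed = urlparse(value)
        segments = [s for s in parsed.path.split("/") if s]
        if (
            parsed.netloc == "open.spotify.com"
            and len(segments) >= 2
            and segments[-2] in ENTITY_TYPES
        ):
            return f"spotify:{segments[-2]}:{segments[-1]}"
        raise ValueError(f"unrecognized Spotify URL: {value!r}")

    # Bare ID: alphanumeric only; the type comes from the caller.
    if re.fullmatch(r"[A-Za-z0-9]+", value):
        if default_type not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {default_type!r}")
        return f"spotify:{default_type}:{value}"

    raise ValueError(f"cannot interpret {value!r} as a Spotify identifier")
```

All three forms from the removed doc map to the same URI under this sketch, which is why the doc could recommend passing the `uri` field from search results straight through.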
@@ -134,7 +134,6 @@ masks = processor.image_processor.post_process_masks(

 ### Model architecture

-<!-- ascii-guard-ignore -->
 ```
 SAM Architecture:
 ┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
@@ -145,7 +144,6 @@ SAM Architecture:
   Image Embeddings      Prompt Embeddings         Masks + IoU
   (computed once)       (per prompt)             predictions
 ```
-<!-- ascii-guard-ignore-end -->

 ### Model variants

@@ -1,42 +0,0 @@
-"""Resolve HERMES_HOME for standalone skill scripts.
-
-Skill scripts may run outside the Hermes process (e.g. system Python,
-nix env, CI) where ``hermes_constants`` is not importable.  This module
-provides the same ``get_hermes_home()`` and ``display_hermes_home()``
-contracts as ``hermes_constants`` without requiring it on ``sys.path``.
-
-When ``hermes_constants`` IS available it is used directly so that any
-future enhancements (profile resolution, Docker detection, etc.) are
-picked up automatically.  The fallback path replicates the core logic
-from ``hermes_constants.py`` using only the stdlib.
-
-All scripts under ``google-workspace/scripts/`` should import from here
-instead of duplicating the ``HERMES_HOME = Path(os.getenv(...))`` pattern.
-"""
-
-from __future__ import annotations
-
-import os
-from pathlib import Path
-
-try:
-    from hermes_constants import display_hermes_home as display_hermes_home
-    from hermes_constants import get_hermes_home as get_hermes_home
-except (ModuleNotFoundError, ImportError):
-
-    def get_hermes_home() -> Path:
-        """Return the Hermes home directory (default: ~/.hermes).
-
-        Mirrors ``hermes_constants.get_hermes_home()``."""
-        val = os.environ.get("HERMES_HOME", "").strip()
-        return Path(val) if val else Path.home() / ".hermes"
-
-    def display_hermes_home() -> str:
-        """Return a user-friendly ``~/``-shortened display string.
-
-        Mirrors ``hermes_constants.display_hermes_home()``."""
-        home = get_hermes_home()
-        try:
-            return "~/" + str(home.relative_to(Path.home()))
-        except ValueError:
-            return str(home)
@@ -31,14 +31,7 @@ from datetime import datetime, timedelta, timezone
 from email.mime.text import MIMEText
 from pathlib import Path

-# Ensure sibling modules (_hermes_home) are importable when run standalone.
-_SCRIPTS_DIR = str(Path(__file__).resolve().parent)
-if _SCRIPTS_DIR not in sys.path:
-    sys.path.insert(0, _SCRIPTS_DIR)
-
-from _hermes_home import get_hermes_home
-
-HERMES_HOME = get_hermes_home()
+HERMES_HOME = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
 TOKEN_PATH = HERMES_HOME / "google_token.json"
 CLIENT_SECRET_PATH = HERMES_HOME / "google_client_secret.json"
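Review note: the inline `Path(os.getenv("HERMES_HOME", ...))` pattern restored in this hunk does not behave identically to the removed `_hermes_home.get_hermes_home()` fallback when `HERMES_HOME` is set but empty. The sketch below reproduces both expressions against a plain dict so the difference can be checked without touching the real environment; the function names `home_inline` and `home_stripped` are hypothetical and exist only for this comparison.

```python
from pathlib import Path


def home_inline(env: dict) -> Path:
    # Same shape as the restored one-liner: the default applies only
    # when the variable is absent, not when it is empty.
    return Path(env.get("HERMES_HOME", Path.home() / ".hermes"))


def home_stripped(env: dict) -> Path:
    # Same shape as the removed fallback: empty or whitespace-only
    # values are treated as unset.
    val = env.get("HERMES_HOME", "").strip()
    return Path(val) if val else Path.home() / ".hermes"


default = Path.home() / ".hermes"

# Unset: both resolve to the default.
assert home_inline({}) == default
assert home_stripped({}) == default

# Set but empty: the inline form resolves to the current directory.
assert home_inline({"HERMES_HOME": ""}) == Path(".")
assert home_stripped({"HERMES_HOME": ""}) == default
```

If an empty `HERMES_HOME` can occur in practice (for example, from a shell profile that exports it unconditionally), token files would then be looked up relative to the working directory under the inline form.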

@@ -10,12 +10,9 @@ import sys
 from datetime import datetime, timezone
 from pathlib import Path

-# Ensure sibling modules (_hermes_home) are importable when run standalone.
-_SCRIPTS_DIR = str(Path(__file__).resolve().parent)
-if _SCRIPTS_DIR not in sys.path:
-    sys.path.insert(0, _SCRIPTS_DIR)

-from _hermes_home import get_hermes_home
+def get_hermes_home() -> Path:
+    return Path(os.environ.get("HERMES_HOME", Path.home() / ".hermes"))


 def get_token_path() -> Path:
@@ -21,8 +21,6 @@ Agent workflow:
  6. Run --check to verify. Done.
 """

-from __future__ import annotations  # allow PEP 604 `X | None` on Python 3.9+
-
 import argparse
 import json
 import os
@@ -30,12 +28,13 @@ import subprocess
 import sys
 from pathlib import Path

-# Ensure sibling modules (_hermes_home) are importable when run standalone.
-_SCRIPTS_DIR = str(Path(__file__).resolve().parent)
-if _SCRIPTS_DIR not in sys.path:
-    sys.path.insert(0, _SCRIPTS_DIR)
-
-from _hermes_home import display_hermes_home, get_hermes_home
+try:
+    from hermes_constants import display_hermes_home, get_hermes_home
+except ModuleNotFoundError:
+    HERMES_AGENT_ROOT = Path(__file__).resolve().parents[4]
+    if HERMES_AGENT_ROOT.exists():
+        sys.path.insert(0, str(HERMES_AGENT_ROOT))
+    from hermes_constants import display_hermes_home, get_hermes_home

 HERMES_HOME = get_hermes_home()
 TOKEN_PATH = HERMES_HOME / "google_token.json"
@@ -112,11 +111,7 @@ def install_deps():
        return True
    except subprocess.CalledProcessError as e:
        print(f"ERROR: Failed to install dependencies: {e}")
-        print(
-            "On environments without pip (e.g. Nix), install the optional extra instead:"
-        )
-        print("  pip install 'hermes-agent[google]'")
-        print(f"Or manually: {sys.executable} -m pip install {' '.join(REQUIRED_PACKAGES)}")
+        print(f"Try manually: {sys.executable} -m pip install {' '.join(REQUIRED_PACKAGES)}")
        return False


@@ -22,7 +22,6 @@ End-to-end pipeline for producing publication-ready ML/AI research papers target

 This is **not a linear pipeline** — it is an iterative loop. Results trigger new experiments. Reviews trigger new analysis. The agent must handle these feedback loops.

-<!-- ascii-guard-ignore -->
 ```
 ┌─────────────────────────────────────────────────────────────┐
 │                    RESEARCH PAPER PIPELINE                  │
@@ -42,7 +41,6 @@ This is **not a linear pipeline** — it is an iterative loop. Results trigger n
 │                                                             │
 └─────────────────────────────────────────────────────────────┘
 ```
-<!-- ascii-guard-ignore-end -->

 ---

@@ -904,15 +904,9 @@ class TestRegisterSessionMcpServers:
        ]

        with patch("tools.mcp_tool.register_mcp_servers", return_value=["mcp_srv_search"]), \
-             patch("model_tools.get_tool_definitions", return_value=fake_tools) as mock_defs:
+             patch("model_tools.get_tool_definitions", return_value=fake_tools):
            await agent._register_session_mcp_servers(state, [server])

-        mock_defs.assert_called_once_with(
-            enabled_toolsets=["hermes-acp", "mcp-srv"],
-            disabled_toolsets=None,
-            quiet_mode=True,
-        )
-        assert state.agent.enabled_toolsets == ["hermes-acp", "mcp-srv"]
        assert state.agent.tools == fake_tools
        assert state.agent.valid_tool_names == {"mcp_srv_search", "terminal"}
        # _invalidate_system_prompt should have been called
@@ -138,43 +138,6 @@ class TestListAndCleanup:
 class TestPersistence:
    """Verify that sessions are persisted to SessionDB and can be restored."""

-    def test_create_session_includes_registered_mcp_toolsets(self, tmp_path, monkeypatch):
-        captured = {}
-
-        def fake_resolve_runtime_provider(requested=None, **kwargs):
-            return {
-                "provider": "openrouter",
-                "api_mode": "chat_completions",
-                "base_url": "https://openrouter.example/v1",
-                "api_key": "***",
-                "command": None,
-                "args": [],
-            }
-
-        def fake_agent(**kwargs):
-            captured.update(kwargs)
-            return SimpleNamespace(model=kwargs.get("model"), enabled_toolsets=kwargs.get("enabled_toolsets"))
-
-        monkeypatch.setattr("hermes_cli.config.load_config", lambda: {
-            "model": {"provider": "openrouter", "default": "test-model"},
-            "mcp_servers": {
-                "olympus": {"command": "python", "enabled": True},
-                "exa": {"url": "https://exa.ai/mcp"},
-                "disabled": {"command": "python", "enabled": False},
-            },
-        })
-        monkeypatch.setattr(
-            "hermes_cli.runtime_provider.resolve_runtime_provider",
-            fake_resolve_runtime_provider,
-        )
-        db = SessionDB(tmp_path / "state.db")
-
-        with patch("run_agent.AIAgent", side_effect=fake_agent):
-            manager = SessionManager(db=db)
-            manager.create_session(cwd="/work")
-
-        assert captured["enabled_toolsets"] == ["hermes-acp", "mcp-olympus", "mcp-exa"]
-
    def test_create_session_writes_to_db(self, manager):
        state = manager.create_session(cwd="/project")
        db = manager._get_db()
@@ -1,165 +0,0 @@
-"""Tests for Bug #12905 fixes in agent/anthropic_adapter.py — macOS Keychain support."""
-
-import json
-import platform
-from unittest.mock import patch, MagicMock
-
-import pytest
-
-from agent.anthropic_adapter import (
-    _read_claude_code_credentials_from_keychain,
-    read_claude_code_credentials,
-)
-
-
-class TestReadClaudeCodeCredentialsFromKeychain:
-    """Bug 4: macOS Keychain support for Claude Code >=2.1.114."""
-
-    def test_returns_none_on_linux(self):
-        """Keychain reading is Darwin-only; must return None on other platforms."""
-        with patch("agent.anthropic_adapter.platform.system", return_value="Linux"):
-            assert _read_claude_code_credentials_from_keychain() is None
-
-    def test_returns_none_on_windows(self):
-        with patch("agent.anthropic_adapter.platform.system", return_value="Windows"):
-            assert _read_claude_code_credentials_from_keychain() is None
-
-    def test_returns_none_when_security_command_not_found(self):
-        """OSError from missing security binary must be handled gracefully."""
-        with patch("agent.anthropic_adapter.platform.system", return_value="Darwin"), \
-             patch("agent.anthropic_adapter.subprocess.run",
-                   side_effect=OSError("security not found")):
-            assert _read_claude_code_credentials_from_keychain() is None
-
-    def test_returns_none_on_nonzero_exit_code(self):
-        """security returns non-zero when the Keychain entry doesn't exist."""
-        with patch("agent.anthropic_adapter.platform.system", return_value="Darwin"), \
-             patch("agent.anthropic_adapter.subprocess.run") as mock_run:
-            mock_run.return_value = MagicMock(returncode=1, stdout="", stderr="")
-            assert _read_claude_code_credentials_from_keychain() is None
-
-    def test_returns_none_for_empty_stdout(self):
-        with patch("agent.anthropic_adapter.platform.system", return_value="Darwin"), \
-             patch("agent.anthropic_adapter.subprocess.run") as mock_run:
-            mock_run.return_value = MagicMock(returncode=0, stdout="", stderr="")
-            assert _read_claude_code_credentials_from_keychain() is None
-
-    def test_returns_none_for_non_json_payload(self):
-        with patch("agent.anthropic_adapter.platform.system", return_value="Darwin"), \
-             patch("agent.anthropic_adapter.subprocess.run") as mock_run:
-            mock_run.return_value = MagicMock(returncode=0, stdout="not valid json", stderr="")
-            assert _read_claude_code_credentials_from_keychain() is None
-
-    def test_returns_none_when_password_field_is_missing_claude_ai_oauth(self):
-        with patch("agent.anthropic_adapter.platform.system", return_value="Darwin"), \
-             patch("agent.anthropic_adapter.subprocess.run") as mock_run:
-            mock_run.return_value = MagicMock(
-                returncode=0,
-                stdout=json.dumps({"someOtherService": {"accessToken": "tok"}}),
-                stderr="",
-            )
-            assert _read_claude_code_credentials_from_keychain() is None
-
-    def test_returns_none_when_access_token_is_empty(self):
-        with patch("agent.anthropic_adapter.platform.system", return_value="Darwin"), \
-             patch("agent.anthropic_adapter.subprocess.run") as mock_run:
-            mock_run.return_value = MagicMock(
-                returncode=0,
-                stdout=json.dumps({"claudeAiOauth": {"accessToken": "", "refreshToken": "x"}}),
-                stderr="",
-            )
-            assert _read_claude_code_credentials_from_keychain() is None
-
-    def test_parses_valid_keychain_entry(self):
-        with patch("agent.anthropic_adapter.platform.system", return_value="Darwin"), \
-             patch("agent.anthropic_adapter.subprocess.run") as mock_run:
-            mock_run.return_value = MagicMock(
-                returncode=0,
-                stdout=json.dumps({
-                    "claudeAiOauth": {
-                        "accessToken": "kc-access-token-abc",
-                        "refreshToken": "kc-refresh-token-xyz",
-                        "expiresAt": 9999999999999,
-                    }
-                }),
-                stderr="",
-            )
-            creds = _read_claude_code_credentials_from_keychain()
-            assert creds is not None
-            assert creds["accessToken"] == "kc-access-token-abc"
-            assert creds["refreshToken"] == "kc-refresh-token-xyz"
-            assert creds["expiresAt"] == 9999999999999
-            assert creds["source"] == "macos_keychain"
-
-
-class TestReadClaudeCodeCredentialsPriority:
-    """Bug 4: Keychain must be checked before the JSON file."""
-
-    def test_keychain_takes_priority_over_json_file(self, tmp_path, monkeypatch):
-        """When both Keychain and JSON file have credentials, Keychain wins."""
-        # Set up JSON file with "older" token
-        json_cred_file = tmp_path / ".claude" / ".credentials.json"
-        json_cred_file.parent.mkdir(parents=True)
-        json_cred_file.write_text(json.dumps({
-            "claudeAiOauth": {
-                "accessToken": "json-token",
-                "refreshToken": "json-refresh",
-                "expiresAt": 9999999999999,
-            }
-        }))
-        monkeypatch.setattr("agent.anthropic_adapter.Path.home", lambda: tmp_path)
-
-        # Mock Keychain to return a "newer" token
-        with patch("agent.anthropic_adapter.platform.system", return_value="Darwin"), \
-             patch("agent.anthropic_adapter.subprocess.run") as mock_run:
-            mock_run.return_value = MagicMock(
-                returncode=0,
-                stdout=json.dumps({
-                    "claudeAiOauth": {
-                        "accessToken": "keychain-token",
-                        "refreshToken": "keychain-refresh",
-                        "expiresAt": 9999999999999,
-                    }
-                }),
-                stderr="",
-            )
-            creds = read_claude_code_credentials()
-
-        # Keychain token should be returned, not JSON file token
-        assert creds is not None
-        assert creds["accessToken"] == "keychain-token"
-        assert creds["source"] == "macos_keychain"
-
-    def test_falls_back_to_json_when_keychain_returns_none(self, tmp_path, monkeypatch):
-        """When Keychain has no entry, JSON file is used as fallback."""
-        json_cred_file = tmp_path / ".claude" / ".credentials.json"
-        json_cred_file.parent.mkdir(parents=True)
-        json_cred_file.write_text(json.dumps({
-            "claudeAiOauth": {
-                "accessToken": "json-fallback-token",
-                "refreshToken": "json-refresh",
-                "expiresAt": 9999999999999,
-            }
-        }))
-        monkeypatch.setattr("agent.anthropic_adapter.Path.home", lambda: tmp_path)
-
-        with patch("agent.anthropic_adapter.platform.system", return_value="Darwin"), \
-             patch("agent.anthropic_adapter.subprocess.run") as mock_run:
-            # Simulate Keychain entry not found
-            mock_run.return_value = MagicMock(returncode=1, stdout="", stderr="")
-            creds = read_claude_code_credentials()
-
-        assert creds is not None
-        assert creds["accessToken"] == "json-fallback-token"
-        assert creds["source"] == "claude_code_credentials_file"
-
-    def test_returns_none_when_neither_keychain_nor_json_has_creds(self, tmp_path, monkeypatch):
-        """No credentials anywhere — must return None cleanly."""
-        monkeypatch.setattr("agent.anthropic_adapter.Path.home", lambda: tmp_path)
-
-        with patch("agent.anthropic_adapter.platform.system", return_value="Darwin"), \
-             patch("agent.anthropic_adapter.subprocess.run") as mock_run:
-            mock_run.return_value = MagicMock(returncode=1, stdout="", stderr="")
-            creds = read_claude_code_credentials()
-
-        assert creds is None
@@ -19,7 +19,6 @@ from agent.auxiliary_client import (
    _read_codex_access_token,
    _get_provider_chain,
    _is_payment_error,
-    _normalize_aux_provider,
    _try_payment_fallback,
    _resolve_auto,
 )
@@ -55,17 +54,6 @@ def codex_auth_dir(tmp_path, monkeypatch):
    return codex_dir


-class TestNormalizeAuxProvider:
-    def test_maps_github_copilot_aliases(self):
-        assert _normalize_aux_provider("github") == "copilot"
-        assert _normalize_aux_provider("github-copilot") == "copilot"
-        assert _normalize_aux_provider("github-models") == "copilot"
-
-    def test_maps_github_copilot_acp_aliases(self):
-        assert _normalize_aux_provider("github-copilot-acp") == "copilot-acp"
-        assert _normalize_aux_provider("copilot-acp-agent") == "copilot-acp"
-
-
 class TestReadCodexAccessToken:
    def test_valid_auth_store(self, tmp_path, monkeypatch):
        hermes_home = tmp_path / "hermes"
@@ -1215,201 +1203,3 @@ class TestAnthropicCompatImageConversion:
        }]
        result = _convert_openai_images_to_anthropic(messages)
        assert result[0]["content"][0]["source"]["media_type"] == "image/jpeg"
-
-
-class _AuxAuth401(Exception):
-    status_code = 401
-
-    def __init__(self, message="Provided authentication token is expired"):
-        super().__init__(message)
-
-
-class _DummyResponse:
-    def __init__(self, text="ok"):
-        self.choices = [MagicMock(message=MagicMock(content=text))]
-
-
-class _FailingThenSuccessCompletions:
-    def __init__(self):
-        self.calls = 0
-
-    def create(self, **kwargs):
-        self.calls += 1
-        if self.calls == 1:
-            raise _AuxAuth401()
-        return _DummyResponse("sync-ok")
-
-
-class _AsyncFailingThenSuccessCompletions:
-    def __init__(self):
-        self.calls = 0
-
-    async def create(self, **kwargs):
-        self.calls += 1
-        if self.calls == 1:
-            raise _AuxAuth401()
-        return _DummyResponse("async-ok")
-
-
-class TestAuxiliaryAuthRefreshRetry:
-    def test_call_llm_refreshes_codex_on_401_for_vision(self):
-        failing_client = MagicMock()
-        failing_client.base_url = "https://chatgpt.com/backend-api/codex"
-        failing_client.chat.completions = _FailingThenSuccessCompletions()
-
-        fresh_client = MagicMock()
-        fresh_client.base_url = "https://chatgpt.com/backend-api/codex"
-        fresh_client.chat.completions.create.return_value = _DummyResponse("fresh-sync")
-
-        with (
-            patch(
-                "agent.auxiliary_client.resolve_vision_provider_client",
-                side_effect=[("openai-codex", failing_client, "gpt-5.2-codex"), ("openai-codex", fresh_client, "gpt-5.2-codex")],
-            ),
-            patch("agent.auxiliary_client._refresh_provider_credentials", return_value=True) as mock_refresh,
-        ):
-            resp = call_llm(
-                task="vision",
-                provider="openai-codex",
-                model="gpt-5.2-codex",
-                messages=[{"role": "user", "content": "hi"}],
-            )
-
-        assert resp.choices[0].message.content == "fresh-sync"
-        mock_refresh.assert_called_once_with("openai-codex")
-
-    def test_call_llm_refreshes_codex_on_401_for_non_vision(self):
-        stale_client = MagicMock()
-        stale_client.base_url = "https://chatgpt.com/backend-api/codex"
-        stale_client.chat.completions.create.side_effect = _AuxAuth401("stale codex token")
-
-        fresh_client = MagicMock()
-        fresh_client.base_url = "https://chatgpt.com/backend-api/codex"
-        fresh_client.chat.completions.create.return_value = _DummyResponse("fresh-non-vision")
-
-        with (
-            patch("agent.auxiliary_client._resolve_task_provider_model", return_value=("openai-codex", "gpt-5.2-codex", None, None, None)),
-            patch("agent.auxiliary_client._get_cached_client", side_effect=[(stale_client, "gpt-5.2-codex"), (fresh_client, "gpt-5.2-codex")]),
-            patch("agent.auxiliary_client._refresh_provider_credentials", return_value=True) as mock_refresh,
-        ):
-            resp = call_llm(
-                task="compression",
-                provider="openai-codex",
-                model="gpt-5.2-codex",
-                messages=[{"role": "user", "content": "hi"}],
-            )
-
-        assert resp.choices[0].message.content == "fresh-non-vision"
-        mock_refresh.assert_called_once_with("openai-codex")
-        assert stale_client.chat.completions.create.call_count == 1
-        assert fresh_client.chat.completions.create.call_count == 1
-
-    def test_call_llm_refreshes_anthropic_on_401_for_non_vision(self):
-        stale_client = MagicMock()
-        stale_client.base_url = "https://api.anthropic.com"
-        stale_client.chat.completions.create.side_effect = _AuxAuth401("anthropic token expired")
-
-        fresh_client = MagicMock()
-        fresh_client.base_url = "https://api.anthropic.com"
-        fresh_client.chat.completions.create.return_value = _DummyResponse("fresh-anthropic")
-
-        with (
-            patch("agent.auxiliary_client._resolve_task_provider_model", return_value=("anthropic", "claude-haiku-4-5-20251001", None, None, None)),
-            patch("agent.auxiliary_client._get_cached_client", side_effect=[(stale_client, "claude-haiku-4-5-20251001"), (fresh_client, "claude-haiku-4-5-20251001")]),
-            patch("agent.auxiliary_client._refresh_provider_credentials", return_value=True) as mock_refresh,
-        ):
-            resp = call_llm(
-                task="compression",
-                provider="anthropic",
-                model="claude-haiku-4-5-20251001",
-                messages=[{"role": "user", "content": "hi"}],
-            )
-
-        assert resp.choices[0].message.content == "fresh-anthropic"
-        mock_refresh.assert_called_once_with("anthropic")
-        assert stale_client.chat.completions.create.call_count == 1
-        assert fresh_client.chat.completions.create.call_count == 1
-
-    @pytest.mark.asyncio
-    async def test_async_call_llm_refreshes_codex_on_401_for_vision(self):
-        failing_client = MagicMock()
-        failing_client.base_url = "https://chatgpt.com/backend-api/codex"
-        failing_client.chat.completions = _AsyncFailingThenSuccessCompletions()
-
-        fresh_client = MagicMock()
-        fresh_client.base_url = "https://chatgpt.com/backend-api/codex"
-        fresh_client.chat.completions.create = AsyncMock(return_value=_DummyResponse("fresh-async"))
-
-        with (
-            patch(
-                "agent.auxiliary_client.resolve_vision_provider_client",
-                side_effect=[("openai-codex", failing_client, "gpt-5.2-codex"), ("openai-codex", fresh_client, "gpt-5.2-codex")],
-            ),
-            patch("agent.auxiliary_client._refresh_provider_credentials", return_value=True) as mock_refresh,
-        ):
-            resp = await async_call_llm(
-                task="vision",
-                provider="openai-codex",
-                model="gpt-5.2-codex",
-                messages=[{"role": "user", "content": "hi"}],
-            )
-
-        assert resp.choices[0].message.content == "fresh-async"
-        mock_refresh.assert_called_once_with("openai-codex")
-
-    def test_refresh_provider_credentials_force_refreshes_anthropic_oauth_and_evicts_cache(self, monkeypatch):
-        stale_client = MagicMock()
-        cache_key = ("anthropic", False, None, None, None)
-
-        monkeypatch.setenv("ANTHROPIC_TOKEN", "")
-        monkeypatch.setenv("CLAUDE_CODE_OAUTH_TOKEN", "")
-        monkeypatch.setenv("ANTHROPIC_API_KEY", "")
-
-        with (
-            patch("agent.auxiliary_client._client_cache", {cache_key: (stale_client, "claude-haiku-4-5-20251001", None)}),
-            patch("agent.anthropic_adapter.read_claude_code_credentials", return_value={
-                "accessToken": "expired-token",
-                "refreshToken": "refresh-token",
-                "expiresAt": 0,
-            }),
-            patch("agent.anthropic_adapter.refresh_anthropic_oauth_pure", return_value={
-                "access_token": "fresh-token",
-                "refresh_token": "refresh-token-2",
-                "expires_at_ms": 9999999999999,
-            }) as mock_refresh_oauth,
-            patch("agent.anthropic_adapter._write_claude_code_credentials") as mock_write,
-        ):
-            from agent.auxiliary_client import _refresh_provider_credentials
-
-            assert _refresh_provider_credentials("anthropic") is True
-
-        mock_refresh_oauth.assert_called_once_with("refresh-token", use_json=False)
-        mock_write.assert_called_once_with("fresh-token", "refresh-token-2", 9999999999999)
-        stale_client.close.assert_called_once()
-
-    @pytest.mark.asyncio
-    async def test_async_call_llm_refreshes_anthropic_on_401_for_non_vision(self):
-        stale_client = MagicMock()
-        stale_client.base_url = "https://api.anthropic.com"
-        stale_client.chat.completions.create = AsyncMock(side_effect=_AuxAuth401("anthropic token expired"))
-
-        fresh_client = MagicMock()
-        fresh_client.base_url = "https://api.anthropic.com"
-        fresh_client.chat.completions.create = AsyncMock(return_value=_DummyResponse("fresh-async-anthropic"))
-
-        with (
-            patch("agent.auxiliary_client._resolve_task_provider_model", return_value=("anthropic", "claude-haiku-4-5-20251001", None, None, None)),
-            patch("agent.auxiliary_client._get_cached_client", side_effect=[(stale_client, "claude-haiku-4-5-20251001"), (fresh_client, "claude-haiku-4-5-20251001")]),
-            patch("agent.auxiliary_client._refresh_provider_credentials", return_value=True) as mock_refresh,
-        ):
-            resp = await async_call_llm(
-                task="compression",
-                provider="anthropic",
-                model="claude-haiku-4-5-20251001",
-                messages=[{"role": "user", "content": "hi"}],
-            )
-
-        assert resp.choices[0].message.content == "fresh-async-anthropic"
-        mock_refresh.assert_called_once_with("anthropic")
-        assert stale_client.chat.completions.create.await_count == 1
-        assert fresh_client.chat.completions.create.await_count == 1
@@ -100,26 +100,6 @@ class TestResolveProviderClientMainAlias:
        assert client is not None
        assert "beans.local" in str(client.base_url)

-    def test_main_resolves_github_copilot_alias(self, tmp_path):
-        _write_config(tmp_path, {
-            "model": {"default": "gpt-5.4", "provider": "github-copilot"},
-        })
-        with (
-            patch("hermes_cli.auth.resolve_api_key_provider_credentials", return_value={
-                "api_key": "ghu_test_token",
-                "base_url": "https://api.githubcopilot.com",
-            }),
-            patch("agent.auxiliary_client.OpenAI") as mock_openai,
-        ):
-            mock_openai.return_value = MagicMock()
-            from agent.auxiliary_client import resolve_provider_client
-
-            client, model = resolve_provider_client("main", "gpt-5.4")
-
-        assert client is not None
-        assert model == "gpt-5.4"
-        assert mock_openai.called
-

 class TestResolveProviderClientNamedCustom:
    """resolve_provider_client should resolve named custom providers directly."""
@@ -272,158 +252,3 @@ class TestVisionPathApiMode:
        mock_gcc.assert_called_once()
        _, kwargs = mock_gcc.call_args
        assert kwargs.get("api_mode") == "chat_completions"
-
-
-class TestProvidersDictApiModeAnthropicMessages:
-    """Regression guard for #15033.
-
-    Named providers declared under the ``providers:`` dict with
-    ``api_mode: anthropic_messages`` must route auxiliary calls through
-    the Anthropic Messages API (via AnthropicAuxiliaryClient), not
-    through an OpenAI chat-completions client.
-
-    The bug had two halves: the providers-dict branch of
-    ``_get_named_custom_provider`` dropped the ``api_mode`` field, and
-    ``resolve_provider_client``'s named-custom branch never read it.
-    """
-
-    def test_providers_dict_propagates_api_mode(self, tmp_path, monkeypatch):
-        monkeypatch.setenv("MYRELAY_API_KEY", "sk-test")
-        _write_config(tmp_path, {
-            "providers": {
-                "myrelay": {
-                    "name": "myrelay",
-                    "base_url": "https://example-relay.test/anthropic",
-                    "key_env": "MYRELAY_API_KEY",
-                    "api_mode": "anthropic_messages",
-                    "default_model": "claude-opus-4-7",
-                },
-            },
-        })
-        from hermes_cli.runtime_provider import _get_named_custom_provider
-        entry = _get_named_custom_provider("myrelay")
-        assert entry is not None
-        assert entry.get("api_mode") == "anthropic_messages"
-        assert entry.get("base_url") == "https://example-relay.test/anthropic"
-        assert entry.get("api_key") == "sk-test"
-
-    def test_providers_dict_invalid_api_mode_is_dropped(self, tmp_path):
-        _write_config(tmp_path, {
-            "providers": {
-                "weird": {
-                    "name": "weird",
-                    "base_url": "https://example.test",
-                    "api_mode": "bogus_nonsense",
-                    "default_model": "x",
-                },
-            },
-        })
-        from hermes_cli.runtime_provider import _get_named_custom_provider
-        entry = _get_named_custom_provider("weird")
-        assert entry is not None
-        assert "api_mode" not in entry
-
-    def test_providers_dict_without_api_mode_is_unchanged(self, tmp_path):
-        _write_config(tmp_path, {
-            "providers": {
-                "localchat": {
-                    "name": "localchat",
-                    "base_url": "http://127.0.0.1:1234/v1",
-                    "api_key": "local-key",
-                    "default_model": "llama-3",
-                },
-            },
-        })
-        from hermes_cli.runtime_provider import _get_named_custom_provider
-        entry = _get_named_custom_provider("localchat")
-        assert entry is not None
-        assert "api_mode" not in entry
-
-    def test_resolve_provider_client_returns_anthropic_client(self, tmp_path, monkeypatch):
-        """Named custom provider with api_mode=anthropic_messages must
-        route through AnthropicAuxiliaryClient."""
-        monkeypatch.setenv("MYRELAY_API_KEY", "sk-test")
-        _write_config(tmp_path, {
-            "providers": {
-                "myrelay": {
-                    "name": "myrelay",
-                    "base_url": "https://example-relay.test/anthropic",
-                    "key_env": "MYRELAY_API_KEY",
-                    "api_mode": "anthropic_messages",
-                    "default_model": "claude-opus-4-7",
-                },
-            },
-        })
-        from agent.auxiliary_client import (
-            resolve_provider_client,
-            AnthropicAuxiliaryClient,
-            AsyncAnthropicAuxiliaryClient,
-        )
-        sync_client, sync_model = resolve_provider_client("myrelay", async_mode=False)
-        assert isinstance(sync_client, AnthropicAuxiliaryClient), (
-            f"expected AnthropicAuxiliaryClient, got {type(sync_client).__name__}"
-        )
-        assert sync_model == "claude-opus-4-7"
-
-        async_client, async_model = resolve_provider_client("myrelay", async_mode=True)
-        assert isinstance(async_client, AsyncAnthropicAuxiliaryClient), (
-            f"expected AsyncAnthropicAuxiliaryClient, got {type(async_client).__name__}"
-        )
-        assert async_model == "claude-opus-4-7"
-
-    def test_aux_task_override_routes_named_provider_to_anthropic(self, tmp_path, monkeypatch):
-        """The full chain: auxiliary.<task>.provider: myrelay with
-        api_mode anthropic_messages must produce an Anthropic client."""
-        monkeypatch.setenv("MYRELAY_API_KEY", "sk-test")
-        _write_config(tmp_path, {
-            "providers": {
-                "myrelay": {
-                    "name": "myrelay",
-                    "base_url": "https://example-relay.test/anthropic",
-                    "key_env": "MYRELAY_API_KEY",
-                    "api_mode": "anthropic_messages",
-                    "default_model": "claude-opus-4-7",
-                },
-            },
-            "auxiliary": {
-                "flush_memories": {
-                    "provider": "myrelay",
-                    "model": "claude-sonnet-4.6",
-                },
-            },
-            "model": {"provider": "openrouter", "default": "anthropic/claude-sonnet-4.6"},
-        })
-        from agent.auxiliary_client import (
-            get_async_text_auxiliary_client,
-            get_text_auxiliary_client,
-            AnthropicAuxiliaryClient,
-            AsyncAnthropicAuxiliaryClient,
-        )
-        async_client, async_model = get_async_text_auxiliary_client("flush_memories")
-        assert isinstance(async_client, AsyncAnthropicAuxiliaryClient)
-        assert async_model == "claude-sonnet-4.6"
-
-        sync_client, sync_model = get_text_auxiliary_client("flush_memories")
-        assert isinstance(sync_client, AnthropicAuxiliaryClient)
-        assert sync_model == "claude-sonnet-4.6"
-
-    def test_provider_without_api_mode_still_uses_openai(self, tmp_path):
-        """Named providers that don't declare api_mode should still go
-        through the plain OpenAI-wire path (no regression)."""
-        _write_config(tmp_path, {
-            "providers": {
-                "localchat": {
-                    "name": "localchat",
-                    "base_url": "http://127.0.0.1:1234/v1",
-                    "api_key": "local-key",
-                    "default_model": "llama-3",
-                },
-            },
-        })
-        from agent.auxiliary_client import resolve_provider_client
-        from openai import OpenAI, AsyncOpenAI
-        sync_client, _ = resolve_provider_client("localchat", async_mode=False)
-        # sync returns the raw OpenAI client
-        assert isinstance(sync_client, OpenAI)
-        async_client, _ = resolve_provider_client("localchat", async_mode=True)
-        assert isinstance(async_client, AsyncOpenAI)
@@ -1230,210 +1230,3 @@ class TestEmptyTextBlockFix:
        from agent.bedrock_adapter import _convert_content_to_converse
        blocks = _convert_content_to_converse("Hello")
        assert blocks[0]["text"] == "Hello"
-
-
-# ---------------------------------------------------------------------------
-# Stale-connection detection and per-region client invalidation
-# ---------------------------------------------------------------------------
-
-class TestInvalidateRuntimeClient:
-    """Per-region eviction used to discard dead/stale bedrock-runtime clients."""
-
-    def test_evicts_only_the_target_region(self):
-        from agent.bedrock_adapter import (
-            _bedrock_runtime_client_cache,
-            invalidate_runtime_client,
-            reset_client_cache,
-        )
-        reset_client_cache()
-        _bedrock_runtime_client_cache["us-east-1"] = "dead-client"
-        _bedrock_runtime_client_cache["us-west-2"] = "live-client"
-
-        evicted = invalidate_runtime_client("us-east-1")
-
-        assert evicted is True
-        assert "us-east-1" not in _bedrock_runtime_client_cache
-        assert _bedrock_runtime_client_cache["us-west-2"] == "live-client"
-
-    def test_returns_false_when_region_not_cached(self):
-        from agent.bedrock_adapter import invalidate_runtime_client, reset_client_cache
-        reset_client_cache()
-        assert invalidate_runtime_client("eu-west-1") is False
-
-
-class TestIsStaleConnectionError:
-    """Classifier that decides whether an exception warrants client eviction."""
-
-    def test_detects_botocore_connection_closed_error(self):
-        from agent.bedrock_adapter import is_stale_connection_error
-        from botocore.exceptions import ConnectionClosedError
-        exc = ConnectionClosedError(endpoint_url="https://bedrock.example")
-        assert is_stale_connection_error(exc) is True
-
-    def test_detects_botocore_endpoint_connection_error(self):
-        from agent.bedrock_adapter import is_stale_connection_error
-        from botocore.exceptions import EndpointConnectionError
-        exc = EndpointConnectionError(endpoint_url="https://bedrock.example")
-        assert is_stale_connection_error(exc) is True
-
-    def test_detects_botocore_read_timeout(self):
-        from agent.bedrock_adapter import is_stale_connection_error
-        from botocore.exceptions import ReadTimeoutError
-        exc = ReadTimeoutError(endpoint_url="https://bedrock.example")
-        assert is_stale_connection_error(exc) is True
-
-    def test_detects_urllib3_protocol_error(self):
-        from agent.bedrock_adapter import is_stale_connection_error
-        from urllib3.exceptions import ProtocolError
-        exc = ProtocolError("Connection broken")
-        assert is_stale_connection_error(exc) is True
-
-    def test_detects_library_internal_assertion_error(self):
-        """A bare AssertionError raised from inside urllib3/botocore signals
-        a corrupted connection-pool invariant and should trigger eviction."""
-        from agent.bedrock_adapter import is_stale_connection_error
-
-        # Fabricate an AssertionError whose traceback's last frame belongs
-        # to a module named "urllib3.connectionpool". We do this by exec'ing
-        # a tiny `assert False` under a fake globals dict — the resulting
-        # frame's ``f_globals["__name__"]`` is what the classifier inspects.
-        fake_globals = {"__name__": "urllib3.connectionpool"}
-        try:
-            exec("def _boom():\n    assert False\n_boom()", fake_globals)
-        except AssertionError as exc:
-            assert is_stale_connection_error(exc) is True
-        else:
-            pytest.fail("AssertionError not raised")
-
-    def test_detects_botocore_internal_assertion_error(self):
-        """Same as above but for a frame inside the botocore namespace."""
-        from agent.bedrock_adapter import is_stale_connection_error
-        fake_globals = {"__name__": "botocore.httpsession"}
-        try:
-            exec("def _boom():\n    assert False\n_boom()", fake_globals)
-        except AssertionError as exc:
-            assert is_stale_connection_error(exc) is True
-        else:
-            pytest.fail("AssertionError not raised")
-
-    def test_ignores_application_assertion_error(self):
-        """AssertionError from application code (not urllib3/botocore) should
-        NOT be classified as stale — those are real test/code bugs."""
-        from agent.bedrock_adapter import is_stale_connection_error
-        try:
-            assert False, "test-only"  # noqa: B011
-        except AssertionError as exc:
-            assert is_stale_connection_error(exc) is False
-
-    def test_ignores_unrelated_exceptions(self):
-        from agent.bedrock_adapter import is_stale_connection_error
-        assert is_stale_connection_error(ValueError("bad input")) is False
-        assert is_stale_connection_error(KeyError("missing")) is False
-
-
-class TestCallConverseInvalidatesOnStaleError:
-    """call_converse / call_converse_stream evict the cached client when the
-    boto3 call raises a stale-connection error — so the next invocation
-    reconnects instead of reusing the dead socket."""
-
-    def test_converse_evicts_client_on_stale_error(self):
-        from agent.bedrock_adapter import (
-            _bedrock_runtime_client_cache,
-            call_converse,
-            reset_client_cache,
-        )
-        from botocore.exceptions import ConnectionClosedError
-
-        reset_client_cache()
-        dead_client = MagicMock()
-        dead_client.converse.side_effect = ConnectionClosedError(
-            endpoint_url="https://bedrock.example",
-        )
-        _bedrock_runtime_client_cache["us-east-1"] = dead_client
-
-        with pytest.raises(ConnectionClosedError):
-            call_converse(
-                region="us-east-1",
-                model="anthropic.claude-3-sonnet-20240229-v1:0",
-                messages=[{"role": "user", "content": "hi"}],
-            )
-
-        assert "us-east-1" not in _bedrock_runtime_client_cache, (
-            "stale client should have been evicted so the retry reconnects"
-        )
-
-    def test_converse_stream_evicts_client_on_stale_error(self):
-        from agent.bedrock_adapter import (
-            _bedrock_runtime_client_cache,
-            call_converse_stream,
-            reset_client_cache,
-        )
-        from botocore.exceptions import ConnectionClosedError
-
-        reset_client_cache()
-        dead_client = MagicMock()
-        dead_client.converse_stream.side_effect = ConnectionClosedError(
-            endpoint_url="https://bedrock.example",
-        )
-        _bedrock_runtime_client_cache["us-east-1"] = dead_client
-
-        with pytest.raises(ConnectionClosedError):
-            call_converse_stream(
-                region="us-east-1",
-                model="anthropic.claude-3-sonnet-20240229-v1:0",
-                messages=[{"role": "user", "content": "hi"}],
-            )
-
-        assert "us-east-1" not in _bedrock_runtime_client_cache
-
-    def test_converse_does_not_evict_on_non_stale_error(self):
-        """Non-stale errors (e.g. ValidationException) leave the client cache alone."""
-        from agent.bedrock_adapter import (
-            _bedrock_runtime_client_cache,
-            call_converse,
-            reset_client_cache,
-        )
-        from botocore.exceptions import ClientError
-
-        reset_client_cache()
-        live_client = MagicMock()
-        live_client.converse.side_effect = ClientError(
-            error_response={"Error": {"Code": "ValidationException", "Message": "bad"}},
-            operation_name="Converse",
-        )
-        _bedrock_runtime_client_cache["us-east-1"] = live_client
-
-        with pytest.raises(ClientError):
-            call_converse(
-                region="us-east-1",
-                model="anthropic.claude-3-sonnet-20240229-v1:0",
-                messages=[{"role": "user", "content": "hi"}],
-            )
-
-        assert _bedrock_runtime_client_cache.get("us-east-1") is live_client, (
-            "validation errors do not indicate a dead connection — keep the client"
-        )
-
-    def test_converse_leaves_successful_client_in_cache(self):
-        from agent.bedrock_adapter import (
-            _bedrock_runtime_client_cache,
-            call_converse,
-            reset_client_cache,
-        )
-
-        reset_client_cache()
-        live_client = MagicMock()
-        live_client.converse.return_value = {
-            "output": {"message": {"role": "assistant", "content": [{"text": "hi"}]}},
-            "stopReason": "end_turn",
-            "usage": {"inputTokens": 1, "outputTokens": 1, "totalTokens": 2},
-        }
-        _bedrock_runtime_client_cache["us-east-1"] = live_client
-
-        call_converse(
-            region="us-east-1",
-            model="anthropic.claude-3-sonnet-20240229-v1:0",
-            messages=[{"role": "user", "content": "hi"}],
-        )
-
-        assert _bedrock_runtime_client_cache.get("us-east-1") is live_client
@@ -376,15 +376,17 @@ class TestBedrockModelNameNormalization:
            "apac.anthropic.claude-haiku-4-5", preserve_dots=True
        ) == "apac.anthropic.claude-haiku-4-5"

-    def test_bedrock_prefix_preserved_without_preserve_dots(self):
-        """Bedrock inference profile IDs are auto-detected by prefix and
-        always returned unmangled -- ``preserve_dots`` is irrelevant for
-        these IDs because the dots are namespace separators, not version
-        separators.  Regression for #12295."""
+    def test_preserve_false_mangles_as_documented(self):
+        """Canary: with ``preserve_dots=False`` the function still
+        produces the broken all-hyphen form — this is the shape that
+        Bedrock rejected and that the fix avoids.  Keeping this test
+        locks in the existing behaviour of ``normalize_model_name`` so a
+        future refactor doesn't accidentally decouple the knob from its
+        effect."""
        from agent.anthropic_adapter import normalize_model_name
        assert normalize_model_name(
            "global.anthropic.claude-opus-4-7", preserve_dots=False
-        ) == "global.anthropic.claude-opus-4-7"
+        ) == "global-anthropic-claude-opus-4-7"

    def test_bare_foundation_model_id_preserved(self):
        """Non-inference-profile Bedrock IDs
@@ -420,11 +422,12 @@ class TestBedrockBuildAnthropicKwargsEndToEnd:
            f"{kwargs['model']!r}"
        )

-    def test_bedrock_model_preserved_without_preserve_dots(self):
-        """Bedrock inference profile IDs survive ``build_anthropic_kwargs``
-        even without ``preserve_dots=True`` -- the prefix auto-detection
-        in ``normalize_model_name`` is the load-bearing piece.
-        Regression for #12295."""
+    def test_bedrock_model_mangled_without_preserve_dots(self):
+        """Inverse canary: without the flag, ``build_anthropic_kwargs``
+        still produces the broken form — so the fix in
+        ``_anthropic_preserve_dots`` is the load-bearing piece that
+        wires ``preserve_dots=True`` through to this builder for the
+        Bedrock case."""
        from agent.anthropic_adapter import build_anthropic_kwargs
        kwargs = build_anthropic_kwargs(
            model="global.anthropic.claude-opus-4-7",
@@ -434,157 +437,4 @@ class TestBedrockBuildAnthropicKwargsEndToEnd:
            reasoning_config=None,
            preserve_dots=False,
        )
-        assert kwargs["model"] == "global.anthropic.claude-opus-4-7"
-
-
-class TestBedrockModelIdDetection:
-    """Tests for ``_is_bedrock_model_id`` and the auto-detection that
-    makes ``normalize_model_name`` preserve dots for Bedrock IDs
-    regardless of ``preserve_dots``.  Regression for #12295."""
-
-    def test_bare_bedrock_id_detected(self):
-        from agent.anthropic_adapter import _is_bedrock_model_id
-        assert _is_bedrock_model_id("anthropic.claude-opus-4-7") is True
-
-    def test_regional_us_prefix_detected(self):
-        from agent.anthropic_adapter import _is_bedrock_model_id
-        assert _is_bedrock_model_id("us.anthropic.claude-sonnet-4-5-v1:0") is True
-
-    def test_regional_global_prefix_detected(self):
-        from agent.anthropic_adapter import _is_bedrock_model_id
-        assert _is_bedrock_model_id("global.anthropic.claude-opus-4-7") is True
-
-    def test_regional_eu_prefix_detected(self):
-        from agent.anthropic_adapter import _is_bedrock_model_id
-        assert _is_bedrock_model_id("eu.anthropic.claude-sonnet-4-6") is True
-
-    def test_openrouter_format_not_detected(self):
-        from agent.anthropic_adapter import _is_bedrock_model_id
-        assert _is_bedrock_model_id("claude-opus-4.6") is False
-
-    def test_bare_claude_not_detected(self):
-        from agent.anthropic_adapter import _is_bedrock_model_id
-        assert _is_bedrock_model_id("claude-opus-4-7") is False
-
-    def test_bare_bedrock_id_preserved_without_flag(self):
-        """The primary bug from #12295: ``anthropic.claude-opus-4-7``
-        sent to bedrock-mantle via auxiliary clients that don't pass
-        ``preserve_dots=True``."""
-        from agent.anthropic_adapter import normalize_model_name
-        assert normalize_model_name(
-            "anthropic.claude-opus-4-7", preserve_dots=False
-        ) == "anthropic.claude-opus-4-7"
-
-    def test_openrouter_dots_still_converted(self):
-        """Non-Bedrock dotted model names must still be converted."""
-        from agent.anthropic_adapter import normalize_model_name
-        assert normalize_model_name("claude-opus-4.6") == "claude-opus-4-6"
-
-    def test_bare_bedrock_id_survives_build_kwargs(self):
-        """End-to-end: bare Bedrock ID through ``build_anthropic_kwargs``
-        without ``preserve_dots=True`` -- the auxiliary client path."""
-        from agent.anthropic_adapter import build_anthropic_kwargs
-        kwargs = build_anthropic_kwargs(
-            model="anthropic.claude-opus-4-7",
-            messages=[{"role": "user", "content": "hi"}],
-            tools=None,
-            max_tokens=1024,
-            reasoning_config=None,
-            preserve_dots=False,
-        )
-        assert kwargs["model"] == "anthropic.claude-opus-4-7"
-
-
-# ---------------------------------------------------------------------------
-# auxiliary_client Bedrock resolution — fix for #13919
-# ---------------------------------------------------------------------------
-# Before the fix, resolve_provider_client("bedrock", ...) fell through to the
-# "unhandled auth_type" warning and returned (None, None), breaking all
-# auxiliary tasks (compression, memory, summarization) for Bedrock users.
-
-
-class TestAuxiliaryClientBedrockResolution:
-    """Verify resolve_provider_client handles Bedrock's aws_sdk auth type."""
-
-    def test_bedrock_returns_client_with_credentials(self, monkeypatch):
-        """With valid AWS credentials, Bedrock should return a usable client."""
-        monkeypatch.setenv("AWS_ACCESS_KEY_ID", "AKIAIOSFODNN7EXAMPLE")
-        monkeypatch.setenv("AWS_SECRET_ACCESS_KEY", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
-        monkeypatch.setenv("AWS_REGION", "us-west-2")
-
-        mock_anthropic_bedrock = MagicMock()
-        with patch("agent.anthropic_adapter.build_anthropic_bedrock_client",
-                   return_value=mock_anthropic_bedrock):
-            from agent.auxiliary_client import resolve_provider_client, AnthropicAuxiliaryClient
-            client, model = resolve_provider_client("bedrock", None)
-
-        assert client is not None, (
-            "resolve_provider_client('bedrock') returned None — "
-            "aws_sdk auth type is not handled"
-        )
-        assert isinstance(client, AnthropicAuxiliaryClient)
-        assert model is not None
-        assert client.api_key == "aws-sdk"
-        assert "us-west-2" in client.base_url
-
-    def test_bedrock_returns_none_without_credentials(self, monkeypatch):
-        """Without AWS credentials, Bedrock should return (None, None) gracefully."""
-        with patch("agent.bedrock_adapter.has_aws_credentials", return_value=False):
-            from agent.auxiliary_client import resolve_provider_client
-            client, model = resolve_provider_client("bedrock", None)
-
-        assert client is None
-        assert model is None
-
-    def test_bedrock_uses_configured_region(self, monkeypatch):
-        """Bedrock client base_url should reflect AWS_REGION."""
-        monkeypatch.setenv("AWS_ACCESS_KEY_ID", "AKIAIOSFODNN7EXAMPLE")
-        monkeypatch.setenv("AWS_SECRET_ACCESS_KEY", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
-        monkeypatch.setenv("AWS_REGION", "eu-central-1")
-
-        with patch("agent.anthropic_adapter.build_anthropic_bedrock_client",
-                   return_value=MagicMock()):
-            from agent.auxiliary_client import resolve_provider_client
-            client, _ = resolve_provider_client("bedrock", None)
-
-        assert client is not None
-        assert "eu-central-1" in client.base_url
-
-    def test_bedrock_respects_explicit_model(self, monkeypatch):
-        """When caller passes an explicit model, it should be used."""
-        monkeypatch.setenv("AWS_ACCESS_KEY_ID", "AKIAIOSFODNN7EXAMPLE")
-        monkeypatch.setenv("AWS_SECRET_ACCESS_KEY", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
-
-        with patch("agent.anthropic_adapter.build_anthropic_bedrock_client",
-                   return_value=MagicMock()):
-            from agent.auxiliary_client import resolve_provider_client
-            _, model = resolve_provider_client(
-                "bedrock", "us.anthropic.claude-sonnet-4-5-20250929-v1:0"
-            )
-
-        assert "claude-sonnet" in model
-
-    def test_bedrock_async_mode(self, monkeypatch):
-        """Async mode should return an AsyncAnthropicAuxiliaryClient."""
-        monkeypatch.setenv("AWS_ACCESS_KEY_ID", "AKIAIOSFODNN7EXAMPLE")
-        monkeypatch.setenv("AWS_SECRET_ACCESS_KEY", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
-
-        with patch("agent.anthropic_adapter.build_anthropic_bedrock_client",
-                   return_value=MagicMock()):
-            from agent.auxiliary_client import resolve_provider_client, AsyncAnthropicAuxiliaryClient
-            client, model = resolve_provider_client("bedrock", None, async_mode=True)
-
-        assert client is not None
-        assert isinstance(client, AsyncAnthropicAuxiliaryClient)
-
-    def test_bedrock_default_model_is_haiku(self, monkeypatch):
-        """Default auxiliary model for Bedrock should be Haiku (fast, cheap)."""
-        monkeypatch.setenv("AWS_ACCESS_KEY_ID", "AKIAIOSFODNN7EXAMPLE")
-        monkeypatch.setenv("AWS_SECRET_ACCESS_KEY", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
-
-        with patch("agent.anthropic_adapter.build_anthropic_bedrock_client",
-                   return_value=MagicMock()):
-            from agent.auxiliary_client import resolve_provider_client
-            _, model = resolve_provider_client("bedrock", None)
-
-        assert "haiku" in model.lower()
+        assert kwargs["model"] == "global-anthropic-claude-opus-4-7"
@@ -847,32 +847,6 @@ class TestTokenBudgetTailProtection:
        assert isinstance(pruned, int)


-class TestUpdateModelBudgets:
-    """Regression: update_model() must recalculate token budgets."""
-
-    def test_tail_budget_recalculated(self):
-        """tail_token_budget must change after switching to a different context length."""
-        from unittest.mock import patch
-        with patch("agent.context_compressor.get_model_context_length", return_value=200_000):
-            comp = ContextCompressor("model-a", threshold_percent=0.50, quiet_mode=True)
-        old_tail = comp.tail_token_budget
-        old_max_summary = comp.max_summary_tokens
-
-        comp.update_model("model-b", context_length=32_000)
-        assert comp.tail_token_budget != old_tail, "tail_token_budget should change"
-        assert comp.tail_token_budget < old_tail, "smaller context → smaller budget"
-        assert comp.max_summary_tokens != old_max_summary, "max_summary_tokens should change"
-
-    def test_budgets_proportional(self):
-        """Budgets should be proportional to context_length after update."""
-        from unittest.mock import patch
-        with patch("agent.context_compressor.get_model_context_length", return_value=100_000):
-            comp = ContextCompressor("model-a", threshold_percent=0.50, quiet_mode=True)
-        comp.update_model("model-b", context_length=10_000)
-        assert comp.tail_token_budget == int(comp.threshold_tokens * comp.summary_target_ratio)
-        assert comp.max_summary_tokens == min(int(10_000 * 0.05), 4000)
-
-
 class TestTruncateToolCallArgsJson:
    """Regression tests for #11762.

@@ -144,60 +144,3 @@ class CopilotACPClientSafetyTests(unittest.TestCase):

 if __name__ == "__main__":
    unittest.main()
-
-
-# ── HOME env propagation tests (from PR #11285) ─────────────────────
-
-from unittest.mock import patch as _patch
-import pytest
-
-
-def _make_home_client(tmp_path):
-    return CopilotACPClient(
-        api_key="copilot-acp",
-        base_url="acp://copilot",
-        acp_command="copilot",
-        acp_args=["--acp", "--stdio"],
-        acp_cwd=str(tmp_path),
-    )
-
-
-def _fake_popen_capture(captured):
-    def _fake(cmd, **kwargs):
-        captured["cmd"] = cmd
-        captured["kwargs"] = kwargs
-        raise FileNotFoundError("copilot not found")
-    return _fake
-
-
-def test_run_prompt_prefers_profile_home_when_available(monkeypatch, tmp_path):
-    hermes_home = tmp_path / "hermes"
-    profile_home = hermes_home / "home"
-    profile_home.mkdir(parents=True)
-
-    monkeypatch.delenv("HOME", raising=False)
-    monkeypatch.setenv("HERMES_HOME", str(hermes_home))
-
-    captured = {}
-    client = _make_home_client(tmp_path)
-
-    with _patch("agent.copilot_acp_client.subprocess.Popen", side_effect=_fake_popen_capture(captured)):
-        with pytest.raises(RuntimeError, match="Could not start Copilot ACP command"):
-            client._run_prompt("hello", timeout_seconds=1)
-
-    assert captured["kwargs"]["env"]["HOME"] == str(profile_home)
-
-
-def test_run_prompt_passes_home_when_parent_env_is_clean(monkeypatch, tmp_path):
-    monkeypatch.delenv("HOME", raising=False)
-    monkeypatch.delenv("HERMES_HOME", raising=False)
-
-    captured = {}
-    client = _make_home_client(tmp_path)
-
-    with _patch("agent.copilot_acp_client.subprocess.Popen", side_effect=_fake_popen_capture(captured)):
-        with pytest.raises(RuntimeError, match="Could not start Copilot ACP command"):
-            client._run_prompt("hello", timeout_seconds=1)
-
-    assert "env" in captured["kwargs"]
-    assert captured["kwargs"]["env"]["HOME"]
@@ -1102,271 +1102,3 @@ def test_load_pool_does_not_seed_qwen_oauth_when_no_token(tmp_path, monkeypatch)

    assert not pool.has_credentials()
    assert pool.entries() == []
-
-
-def test_nous_seed_from_singletons_preserves_obtained_at_timestamps(tmp_path, monkeypatch):
-    """Regression test for #15099 secondary issue.
-
-    When ``_seed_from_singletons`` materialises a device_code pool entry from
-    the ``providers.nous`` singleton, it must carry the mint/refresh
-    timestamps (``obtained_at``, ``agent_key_obtained_at``, ``expires_in``,
-    etc.) into the pool entry.  Without them, freshness-sensitive consumers
-    (self-heal hooks, pool pruning by age) treat just-minted credentials as
-    older than they actually are and evict them.
-    """
-    monkeypatch.setenv("HERMES_HOME", str(tmp_path / "hermes"))
-    _write_auth_store(
-        tmp_path,
-        {
-            "version": 1,
-            "providers": {
-                "nous": {
-                    "access_token": "at_XXXXXXXX",
-                    "refresh_token": "rt_YYYYYYYY",
-                    "client_id": "hermes-cli",
-                    "portal_base_url": "https://portal.nousresearch.com",
-                    "inference_base_url": "https://inference.nousresearch.com/v1",
-                    "token_type": "Bearer",
-                    "scope": "openid profile",
-                    "obtained_at": "2026-04-24T10:00:00+00:00",
-                    "expires_at": "2026-04-24T11:00:00+00:00",
-                    "expires_in": 3600,
-                    "agent_key": "sk-nous-AAAA",
-                    "agent_key_id": "ak_123",
-                    "agent_key_expires_at": "2026-04-25T10:00:00+00:00",
-                    "agent_key_expires_in": 86400,
-                    "agent_key_reused": False,
-                    "agent_key_obtained_at": "2026-04-24T10:00:05+00:00",
-                    "tls": {"insecure": False, "ca_bundle": None},
-                },
-            },
-        },
-    )
-
-    from agent.credential_pool import load_pool
-
-    pool = load_pool("nous")
-    entries = pool.entries()
-
-    device_entries = [e for e in entries if e.source == "device_code"]
-    assert len(device_entries) == 1, f"expected single device_code entry; got {len(device_entries)}"
-    e = device_entries[0]
-
-    # Direct dataclass fields — must survive the singleton → pool copy.
-    assert e.access_token == "at_XXXXXXXX"
-    assert e.refresh_token == "rt_YYYYYYYY"
-    assert e.expires_at == "2026-04-24T11:00:00+00:00"
-    assert e.agent_key == "sk-nous-AAAA"
-    assert e.agent_key_expires_at == "2026-04-25T10:00:00+00:00"
-
-    # Extra fields — this is what regressed.  These must be carried through
-    # via ``extra`` dict or __getattr__, NOT silently dropped.
-    assert e.obtained_at == "2026-04-24T10:00:00+00:00", (
-        f"obtained_at was dropped during seed; got {e.obtained_at!r}. This breaks "
-        f"downstream pool-freshness consumers (#15099)."
-    )
-    assert e.agent_key_obtained_at == "2026-04-24T10:00:05+00:00"
-    assert e.expires_in == 3600
-    assert e.agent_key_id == "ak_123"
-    assert e.agent_key_expires_in == 86400
-    assert e.agent_key_reused is False
-
-
-class TestLeastUsedStrategy:
-    """Regression: least_used strategy must increment request_count on select."""
-
-    def test_request_count_increments(self):
-        """Each select() call should increment the chosen entry's request_count."""
-        from unittest.mock import patch as _patch
-        from agent.credential_pool import CredentialPool, PooledCredential, STRATEGY_LEAST_USED
-
-        entries = [
-            PooledCredential(provider="test", id="a", label="a", auth_type="api_key",
-                             source="a", access_token="tok-a", priority=0, request_count=0),
-            PooledCredential(provider="test", id="b", label="b", auth_type="api_key",
-                             source="b", access_token="tok-b", priority=1, request_count=0),
-        ]
-        with _patch("agent.credential_pool.get_pool_strategy", return_value=STRATEGY_LEAST_USED):
-            pool = CredentialPool("test", entries)
-
-        # First select should pick entry with lowest count (both 0 → first)
-        e1 = pool.select()
-        assert e1 is not None
-        count_after_first = e1.request_count
-        assert count_after_first == 1, f"Expected 1 after first select, got {count_after_first}"
-
-        # Second select should pick the OTHER entry (now has lower count)
-        e2 = pool.select()
-        assert e2 is not None
-        assert e2.id != e1.id or e2.request_count == 2, (
-            "least_used should alternate or increment"
-        )
-
-
-# ── PR #10160 salvage: Nous OAuth cross-process sync tests ─────────────────
-
-def test_sync_nous_entry_from_auth_store_adopts_newer_tokens(tmp_path, monkeypatch):
-    """When auth.json has a newer refresh token, the pool entry should adopt it."""
-    monkeypatch.setenv("HERMES_HOME", str(tmp_path / "hermes"))
-    _write_auth_store(
-        tmp_path,
-        {
-            "version": 1,
-            "active_provider": "nous",
-            "providers": {
-                "nous": {
-                    "portal_base_url": "https://portal.example.com",
-                    "inference_base_url": "https://inference.example.com/v1",
-                    "client_id": "hermes-cli",
-                    "token_type": "Bearer",
-                    "scope": "inference:mint_agent_key",
-                    "access_token": "access-OLD",
-                    "refresh_token": "refresh-OLD",
-                    "expires_at": "2026-03-24T12:00:00+00:00",
-                    "agent_key": "agent-key-OLD",
-                    "agent_key_expires_at": "2026-03-24T13:30:00+00:00",
-                }
-            },
-        },
-    )
-
-    from agent.credential_pool import load_pool
-
-    pool = load_pool("nous")
-    entry = pool.select()
-    assert entry is not None
-    assert entry.refresh_token == "refresh-OLD"
-
-    # Simulate another process refreshing the token in auth.json
-    _write_auth_store(
-        tmp_path,
-        {
-            "version": 1,
-            "active_provider": "nous",
-            "providers": {
-                "nous": {
-                    "portal_base_url": "https://portal.example.com",
-                    "inference_base_url": "https://inference.example.com/v1",
-                    "client_id": "hermes-cli",
-                    "token_type": "Bearer",
-                    "scope": "inference:mint_agent_key",
-                    "access_token": "access-NEW",
-                    "refresh_token": "refresh-NEW",
-                    "expires_at": "2026-03-24T12:30:00+00:00",
-                    "agent_key": "agent-key-NEW",
-                    "agent_key_expires_at": "2026-03-24T14:00:00+00:00",
-                }
-            },
-        },
-    )
-
-    synced = pool._sync_nous_entry_from_auth_store(entry)
-    assert synced is not entry
-    assert synced.access_token == "access-NEW"
-    assert synced.refresh_token == "refresh-NEW"
-    assert synced.agent_key == "agent-key-NEW"
-    assert synced.agent_key_expires_at == "2026-03-24T14:00:00+00:00"
-
-def test_sync_nous_entry_noop_when_tokens_match(tmp_path, monkeypatch):
-    """When auth.json has the same refresh token, sync should be a no-op."""
-    monkeypatch.setenv("HERMES_HOME", str(tmp_path / "hermes"))
-    _write_auth_store(
-        tmp_path,
-        {
-            "version": 1,
-            "active_provider": "nous",
-            "providers": {
-                "nous": {
-                    "portal_base_url": "https://portal.example.com",
-                    "inference_base_url": "https://inference.example.com/v1",
-                    "client_id": "hermes-cli",
-                    "token_type": "Bearer",
-                    "scope": "inference:mint_agent_key",
-                    "access_token": "access-token",
-                    "refresh_token": "refresh-token",
-                    "expires_at": "2026-03-24T12:00:00+00:00",
-                    "agent_key": "agent-key",
-                    "agent_key_expires_at": "2026-03-24T13:30:00+00:00",
-                }
-            },
-        },
-    )
-
-    from agent.credential_pool import load_pool
-
-    pool = load_pool("nous")
-    entry = pool.select()
-    assert entry is not None
-
-    synced = pool._sync_nous_entry_from_auth_store(entry)
-    assert synced is entry
-
-def test_nous_exhausted_entry_recovers_via_auth_store_sync(tmp_path, monkeypatch):
-    """An exhausted Nous entry should recover when auth.json has newer tokens."""
-    monkeypatch.setenv("HERMES_HOME", str(tmp_path / "hermes"))
-    from agent.credential_pool import load_pool, STATUS_EXHAUSTED
-    from dataclasses import replace as dc_replace
-
-    _write_auth_store(
-        tmp_path,
-        {
-            "version": 1,
-            "active_provider": "nous",
-            "providers": {
-                "nous": {
-                    "portal_base_url": "https://portal.example.com",
-                    "inference_base_url": "https://inference.example.com/v1",
-                    "client_id": "hermes-cli",
-                    "token_type": "Bearer",
-                    "scope": "inference:mint_agent_key",
-                    "access_token": "access-OLD",
-                    "refresh_token": "refresh-OLD",
-                    "expires_at": "2026-03-24T12:00:00+00:00",
-                    "agent_key": "agent-key",
-                    "agent_key_expires_at": "2026-03-24T13:30:00+00:00",
-                }
-            },
-        },
-    )
-
-    pool = load_pool("nous")
-    entry = pool.select()
-    assert entry is not None
-
-    # Mark entry as exhausted (simulating a failed refresh)
-    exhausted = dc_replace(
-        entry,
-        last_status=STATUS_EXHAUSTED,
-        last_status_at=time.time(),
-        last_error_code=401,
-    )
-    pool._replace_entry(entry, exhausted)
-    pool._persist()
-
-    # Simulate another process having successfully refreshed
-    _write_auth_store(
-        tmp_path,
-        {
-            "version": 1,
-            "active_provider": "nous",
-            "providers": {
-                "nous": {
-                    "portal_base_url": "https://portal.example.com",
-                    "inference_base_url": "https://inference.example.com/v1",
-                    "client_id": "hermes-cli",
-                    "token_type": "Bearer",
-                    "scope": "inference:mint_agent_key",
-                    "access_token": "access-FRESH",
-                    "refresh_token": "refresh-FRESH",
-                    "expires_at": "2026-03-24T12:30:00+00:00",
-                    "agent_key": "agent-key-FRESH",
-                    "agent_key_expires_at": "2026-03-24T14:00:00+00:00",
-                }
-            },
-        },
-    )
-
-    available = pool._available_entries(clear_expired=True)
-    assert len(available) == 1
-    assert available[0].refresh_token == "refresh-FRESH"
-    assert available[0].last_status is None
@@ -56,7 +56,6 @@ class TestFailoverReason:
            "overloaded", "server_error", "timeout",
            "context_overflow", "payload_too_large",
            "model_not_found", "format_error",
-            "provider_policy_blocked",
            "thinking_signature", "long_context_tier", "unknown",
        }
        actual = {r.value for r in FailoverReason}
@@ -309,59 +308,6 @@ class TestClassifyApiError:
        assert result.retryable is True
        assert result.should_fallback is False

-    # ── Provider policy-block (OpenRouter privacy/guardrail) ──
-
-    def test_404_openrouter_policy_blocked(self):
-        # Real OpenRouter error when the user's account privacy setting
-        # excludes the only endpoint serving a model (e.g. DeepSeek V4 Pro
-        # which is hosted only by DeepSeek, and their endpoint may log
-        # inputs).  Must NOT classify as model_not_found — the model
-        # exists, falling back won't help (same account setting applies),
-        # and the error body already tells the user where to fix it.
-        e = MockAPIError(
-            "No endpoints available matching your guardrail restrictions "
-            "and data policy. Configure: https://openrouter.ai/settings/privacy",
-            status_code=404,
-        )
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.provider_policy_blocked
-        assert result.retryable is False
-        assert result.should_fallback is False
-
-    def test_400_openrouter_policy_blocked(self):
-        # Defense-in-depth: if OpenRouter ever returns this as 400 instead
-        # of 404, still classify it distinctly rather than as format_error
-        # or model_not_found.
-        e = MockAPIError(
-            "No endpoints available matching your data policy",
-            status_code=400,
-        )
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.provider_policy_blocked
-        assert result.retryable is False
-        assert result.should_fallback is False
-
-    def test_message_only_openrouter_policy_blocked(self):
-        # No status code — classifier should still catch the fingerprint
-        # via the message-pattern fallback.
-        e = Exception(
-            "No endpoints available matching your guardrail restrictions "
-            "and data policy"
-        )
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.provider_policy_blocked
-
-    def test_404_model_not_found_still_works(self):
-        # Regression guard: the new policy-block check must not swallow
-        # genuine model_not_found 404s.
-        e = MockAPIError(
-            "openrouter/nonexistent-model is not a valid model ID",
-            status_code=404,
-        )
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.model_not_found
-        assert result.should_fallback is True
-
    # ── Payload too large ──

    def test_413_payload_too_large(self):
@@ -1094,37 +1040,3 @@ class TestSSLTransientPatterns:
        result = classify_api_error(e)
        assert result.reason == FailoverReason.timeout
        assert result.retryable is True
-
-# ── Test: RateLimitError without status_code (Copilot/GitHub Models) ──────────
-
-class TestRateLimitErrorWithoutStatusCode:
-    """Regression tests for the Copilot/GitHub Models edge case where the
-    OpenAI SDK raises RateLimitError but does not populate .status_code."""
-
-    def _make_rate_limit_error(self, status_code=None):
-        """Create an exception whose class name is 'RateLimitError' with
-        an optionally missing status_code, mirroring the OpenAI SDK shape."""
-        cls = type("RateLimitError", (Exception,), {})
-        e = cls("You have exceeded your rate limit.")
-        e.status_code = status_code  # None simulates the Copilot case
-        return e
-
-    def test_rate_limit_error_without_status_code_classified_as_rate_limit(self):
-        """RateLimitError with status_code=None must classify as rate_limit."""
-        e = self._make_rate_limit_error(status_code=None)
-        result = classify_api_error(e, provider="copilot", model="gpt-4o")
-        assert result.reason == FailoverReason.rate_limit
-
-    def test_rate_limit_error_with_status_code_429_classified_as_rate_limit(self):
-        """RateLimitError that does set status_code=429 still classifies correctly."""
-        e = self._make_rate_limit_error(status_code=429)
-        result = classify_api_error(e, provider="copilot", model="gpt-4o")
-        assert result.reason == FailoverReason.rate_limit
-
-    def test_other_error_without_status_code_not_forced_to_rate_limit(self):
-        """A non-RateLimitError with missing status_code must NOT be forced to 429."""
-        cls = type("APIError", (Exception,), {})
-        e = cls("something went wrong")
-        e.status_code = None
-        result = classify_api_error(e, provider="copilot", model="gpt-4o")
-        assert result.reason != FailoverReason.rate_limit
@@ -1,166 +0,0 @@
-"""Tests for Gemini free-tier detection and blocking."""
-from __future__ import annotations
-
-from unittest.mock import MagicMock, patch
-
-import pytest
-
-from agent.gemini_native_adapter import (
-    gemini_http_error,
-    is_free_tier_quota_error,
-    probe_gemini_tier,
-)
-
-
-def _mock_response(status: int, headers: dict | None = None, text: str = "") -> MagicMock:
-    resp = MagicMock()
-    resp.status_code = status
-    resp.headers = headers or {}
-    resp.text = text
-    return resp
-
-
-def _run_probe(resp: MagicMock) -> str:
-    with patch("agent.gemini_native_adapter.httpx.Client") as MC:
-        inst = MagicMock()
-        inst.post.return_value = resp
-        MC.return_value.__enter__.return_value = inst
-        return probe_gemini_tier("fake-key")
-
-
-class TestProbeGeminiTier:
-    """Verify the tier probe classifies keys correctly."""
-
-    def test_free_tier_via_rpd_header_flash(self):
-        # gemini-2.5-flash free tier: 250 RPD
-        resp = _mock_response(200, {"x-ratelimit-limit-requests-per-day": "250"}, "{}")
-        assert _run_probe(resp) == "free"
-
-    def test_free_tier_via_rpd_header_pro(self):
-        # gemini-2.5-pro free tier: 100 RPD
-        resp = _mock_response(200, {"x-ratelimit-limit-requests-per-day": "100"}, "{}")
-        assert _run_probe(resp) == "free"
-
-    def test_free_tier_via_rpd_header_flash_lite(self):
-        # flash-lite free tier: 1000 RPD (our upper bound)
-        resp = _mock_response(200, {"x-ratelimit-limit-requests-per-day": "1000"}, "{}")
-        assert _run_probe(resp) == "free"
-
-    def test_paid_tier_via_rpd_header(self):
-        # Tier 1 starts at 1500+ RPD
-        resp = _mock_response(200, {"x-ratelimit-limit-requests-per-day": "1500"}, "{}")
-        assert _run_probe(resp) == "paid"
-
-    def test_free_tier_via_429_body(self):
-        body = (
-            '{"error":{"code":429,"message":"Quota exceeded for metric: '
-            'generativelanguage.googleapis.com/generate_content_free_tier_requests, '
-            'limit: 20"}}'
-        )
-        resp = _mock_response(429, {}, body)
-        assert _run_probe(resp) == "free"
-
-    def test_paid_429_has_no_free_tier_marker(self):
-        body = '{"error":{"code":429,"message":"rate limited"}}'
-        resp = _mock_response(429, {}, body)
-        assert _run_probe(resp) == "paid"
-
-    def test_successful_200_without_rpd_header_is_paid(self):
-        resp = _mock_response(200, {}, '{"candidates":[]}')
-        assert _run_probe(resp) == "paid"
-
-    def test_401_returns_unknown(self):
-        resp = _mock_response(401, {}, '{"error":{"code":401}}')
-        assert _run_probe(resp) == "unknown"
-
-    def test_404_returns_unknown(self):
-        resp = _mock_response(404, {}, '{"error":{"code":404}}')
-        assert _run_probe(resp) == "unknown"
-
-    def test_network_error_returns_unknown(self):
-        with patch(
-            "agent.gemini_native_adapter.httpx.Client",
-            side_effect=Exception("dns failure"),
-        ):
-            assert probe_gemini_tier("fake-key") == "unknown"
-
-    def test_empty_key_returns_unknown(self):
-        assert probe_gemini_tier("") == "unknown"
-        assert probe_gemini_tier("   ") == "unknown"
-        assert probe_gemini_tier(None) == "unknown"  # type: ignore[arg-type]
-
-    def test_malformed_rpd_header_falls_through(self):
-        # Non-integer header value shouldn't crash; 200 with no usable header -> paid.
-        resp = _mock_response(200, {"x-ratelimit-limit-requests-per-day": "abc"}, "{}")
-        assert _run_probe(resp) == "paid"
-
-    def test_openai_compat_suffix_stripped(self):
-        """Base URLs ending in /openai get normalized to the native endpoint."""
-        resp = _mock_response(200, {"x-ratelimit-limit-requests-per-day": "1500"}, "{}")
-        with patch("agent.gemini_native_adapter.httpx.Client") as MC:
-            inst = MagicMock()
-            inst.post.return_value = resp
-            MC.return_value.__enter__.return_value = inst
-            probe_gemini_tier(
-                "fake",
-                "https://generativelanguage.googleapis.com/v1beta/openai",
-            )
-            # Verify the post URL does NOT contain /openai
-            called_url = inst.post.call_args[0][0]
-            assert "/openai/" not in called_url
-            assert called_url.endswith(":generateContent")
-
-
-class TestIsFreeTierQuotaError:
-    def test_detects_free_tier_marker(self):
-        assert is_free_tier_quota_error(
-            "Quota exceeded for metric: generate_content_free_tier_requests"
-        )
-
-    def test_case_insensitive(self):
-        assert is_free_tier_quota_error("QUOTA: FREE_TIER_REQUESTS")
-
-    def test_no_free_tier_marker(self):
-        assert not is_free_tier_quota_error("rate limited")
-
-    def test_empty_string(self):
-        assert not is_free_tier_quota_error("")
-
-    def test_none(self):
-        assert not is_free_tier_quota_error(None)  # type: ignore[arg-type]
-
-
-class TestGeminiHttpErrorFreeTierGuidance:
-    """gemini_http_error should append free-tier guidance for free-tier 429s."""
-
-    class _FakeResp:
-        def __init__(self, status: int, text: str):
-            self.status_code = status
-            self.headers: dict = {}
-            self.text = text
-
-    def test_free_tier_429_appends_guidance(self):
-        body = (
-            '{"error":{"code":429,"message":"Quota exceeded for metric: '
-            "generativelanguage.googleapis.com/generate_content_free_tier_requests, "
-            'limit: 20","status":"RESOURCE_EXHAUSTED"}}'
-        )
-        err = gemini_http_error(self._FakeResp(429, body))
-        msg = str(err)
-        assert "free tier" in msg.lower()
-        assert "aistudio.google.com/apikey" in msg
-
-    def test_paid_429_has_no_billing_url(self):
-        body = '{"error":{"code":429,"message":"Rate limited","status":"RESOURCE_EXHAUSTED"}}'
-        err = gemini_http_error(self._FakeResp(429, body))
-        assert "aistudio.google.com/apikey" not in str(err)
-
-    def test_non_429_has_no_billing_url(self):
-        body = '{"error":{"code":400,"message":"bad request","status":"INVALID_ARGUMENT"}}'
-        err = gemini_http_error(self._FakeResp(400, body))
-        assert "aistudio.google.com/apikey" not in str(err)
-
-    def test_401_has_no_billing_url(self):
-        body = '{"error":{"code":401,"message":"API key invalid","status":"UNAUTHENTICATED"}}'
-        err = gemini_http_error(self._FakeResp(401, body))
-        assert "aistudio.google.com/apikey" not in str(err)
Author	SHA1	Message	Date
Ari Lotter	2f230b5ad9	feat: add fast-path setup for nous account adds a nous account specific fast flow & autolaunches into chat if gateway isn't set up	2026-04-24 00:07:23 -04:00
Ari Lotter	bdc9b07c9d	change: always run setup on no-config run there's instructions on how to exit & do it manually, no point in asking	2026-04-24 00:06:48 -04:00