Removing old patches

Removing old files
Updating with trainer config pieces
2026-03-30 10:06:08 -07:00 · 2026-03-30 09:58:05 -07:00 · 2026-03-30 09:46:24 -07:00 · 2026-03-30 09:46:24 -07:00 · 2026-03-30 09:46:24 -07:00 · 2026-03-30 09:46:24 -07:00
255 changed files with 4817 additions and 20198 deletions
@@ -10,6 +10,4 @@ node_modules
 .github

 # Environment files
-.env
-
-*.md
+.env
@@ -7,19 +7,18 @@
 # OpenRouter provides access to many models through one API
 # All LLM calls go through OpenRouter - no direct provider keys needed
 # Get your key at: https://openrouter.ai/keys
-# OPENROUTER_API_KEY=
+OPENROUTER_API_KEY=

-# Default model is configured in ~/.hermes/config.yaml (model.default).
-# Use 'hermes model' or 'hermes setup' to change it.
-# LLM_MODEL is no longer read from .env — this line is kept for reference only.
-# LLM_MODEL=anthropic/claude-opus-4.6
+# Default model to use (OpenRouter format: provider/model)
+# Examples: anthropic/claude-opus-4.6, openai/gpt-4o, google/gemini-3-flash-preview, zhipuai/glm-4-plus
+LLM_MODEL=anthropic/claude-opus-4.6

 # =============================================================================
 # LLM PROVIDER (z.ai / GLM)
 # =============================================================================
 # z.ai provides access to ZhipuAI GLM models (GLM-4-Plus, etc.)
 # Get your key at: https://z.ai or https://open.bigmodel.cn
-# GLM_API_KEY=
+GLM_API_KEY=
 # GLM_BASE_URL=https://api.z.ai/api/paas/v4  # Override default base URL

 # =============================================================================
@@ -29,7 +28,7 @@
 # Get your key at: https://platform.kimi.ai (Kimi Code console)
 # Keys prefixed sk-kimi- use the Kimi Code API (api.kimi.com) by default.
 # Legacy keys from platform.moonshot.ai need KIMI_BASE_URL override below.
-# KIMI_API_KEY=
+KIMI_API_KEY=
 # KIMI_BASE_URL=https://api.kimi.com/coding/v1  # Default for sk-kimi- keys
 # KIMI_BASE_URL=https://api.moonshot.ai/v1      # For legacy Moonshot keys
 # KIMI_BASE_URL=https://api.moonshot.cn/v1       # For Moonshot China keys
@@ -39,11 +38,11 @@
 # =============================================================================
 # MiniMax provides access to MiniMax models (global endpoint)
 # Get your key at: https://www.minimax.io
-# MINIMAX_API_KEY=
+MINIMAX_API_KEY=
 # MINIMAX_BASE_URL=https://api.minimax.io/v1  # Override default base URL

 # MiniMax China endpoint (for users in mainland China)
-# MINIMAX_CN_API_KEY=
+MINIMAX_CN_API_KEY=
 # MINIMAX_CN_BASE_URL=https://api.minimaxi.com/v1  # Override default base URL

 # =============================================================================
@@ -51,7 +50,7 @@
 # =============================================================================
 # OpenCode Zen provides curated, tested models (GPT, Claude, Gemini, MiniMax, GLM, Kimi)
 # Pay-as-you-go pricing. Get your key at: https://opencode.ai/auth
-# OPENCODE_ZEN_API_KEY=
+OPENCODE_ZEN_API_KEY=
 # OPENCODE_ZEN_BASE_URL=https://opencode.ai/zen/v1  # Override default base URL

 # =============================================================================
@@ -59,7 +58,7 @@
 # =============================================================================
 # OpenCode Go provides access to open models (GLM-5, Kimi K2.5, MiniMax M2.5)
 # $10/month subscription. Get your key at: https://opencode.ai/auth
-# OPENCODE_GO_API_KEY=
+OPENCODE_GO_API_KEY=

 # =============================================================================
 # LLM PROVIDER (Hugging Face Inference Providers)
@@ -68,7 +67,7 @@
 # Free tier included ($0.10/month), no markup on provider rates.
 # Get your token at: https://huggingface.co/settings/tokens
 # Required permission: "Make calls to Inference Providers"
-# HF_TOKEN=
+HF_TOKEN=
 # OPENCODE_GO_BASE_URL=https://opencode.ai/zen/go/v1  # Override default base URL

 # =============================================================================
@@ -77,26 +76,26 @@

 # Exa API Key - AI-native web search and contents
 # Get at: https://exa.ai
-# EXA_API_KEY=
+EXA_API_KEY=

 # Parallel API Key - AI-native web search and extract
 # Get at: https://parallel.ai
-# PARALLEL_API_KEY=
+PARALLEL_API_KEY=

 # Firecrawl API Key - Web search, extract, and crawl
 # Get at: https://firecrawl.dev/
-# FIRECRAWL_API_KEY=
+FIRECRAWL_API_KEY=


 # FAL.ai API Key - Image generation
 # Get at: https://fal.ai/
-# FAL_KEY=
+FAL_KEY=

 # Honcho - Cross-session AI-native user modeling (optional)
 # Builds a persistent understanding of the user across sessions and tools.
 # Get at: https://app.honcho.dev
 # Also requires ~/.honcho/config.json with enabled=true (see README).
-# HONCHO_API_KEY=
+HONCHO_API_KEY=

 # =============================================================================
 # TERMINAL TOOL CONFIGURATION
@@ -182,10 +181,10 @@ TERMINAL_LIFETIME_SECONDS=300

 # Browserbase API Key - Cloud browser execution
 # Get at: https://browserbase.com/
-# BROWSERBASE_API_KEY=
+BROWSERBASE_API_KEY=

 # Browserbase Project ID - From your Browserbase dashboard
-# BROWSERBASE_PROJECT_ID=
+BROWSERBASE_PROJECT_ID=

 # Enable residential proxies for better CAPTCHA solving (default: true)
 # Routes traffic through residential IPs, significantly improves success rate
@@ -217,7 +216,7 @@ BROWSER_INACTIVITY_TIMEOUT=120
 # Uses OpenAI's API directly (not via OpenRouter).
 # Named VOICE_TOOLS_OPENAI_KEY to avoid interference with OpenRouter.
 # Get at: https://platform.openai.com/api-keys
-# VOICE_TOOLS_OPENAI_KEY=
+VOICE_TOOLS_OPENAI_KEY=

 # =============================================================================
 # SLACK INTEGRATION
@@ -232,21 +231,6 @@ BROWSER_INACTIVITY_TIMEOUT=120
 # Slack allowed users (comma-separated Slack user IDs)
 # SLACK_ALLOWED_USERS=

-# =============================================================================
-# TELEGRAM INTEGRATION
-# =============================================================================
-# Telegram Bot Token - From @BotFather (https://t.me/BotFather)
-# TELEGRAM_BOT_TOKEN=
-# TELEGRAM_ALLOWED_USERS=                  # Comma-separated user IDs
-# TELEGRAM_HOME_CHANNEL=                   # Default chat for cron delivery
-# TELEGRAM_HOME_CHANNEL_NAME=              # Display name for home channel
-
-# Webhook mode (optional — for cloud deployments like Fly.io/Railway)
-# Default is long polling. Setting TELEGRAM_WEBHOOK_URL switches to webhook mode.
-# TELEGRAM_WEBHOOK_URL=https://my-app.fly.dev/telegram
-# TELEGRAM_WEBHOOK_PORT=8443
-# TELEGRAM_WEBHOOK_SECRET=                 # Recommended for production
-
 # WhatsApp (built-in Baileys bridge — run `hermes whatsapp` to pair)
 # WHATSAPP_ENABLED=false
 # WHATSAPP_ALLOWED_USERS=15551234567
@@ -303,11 +287,11 @@ IMAGE_TOOLS_DEBUG=false

 # Tinker API Key - RL training service
 # Get at: https://tinker-console.thinkingmachines.ai/keys
-# TINKER_API_KEY=
+TINKER_API_KEY=

 # Weights & Biases API Key - Experiment tracking and metrics
 # Get at: https://wandb.ai/authorize
-# WANDB_API_KEY=
+WANDB_API_KEY=

 # RL API Server URL (default: http://localhost:8080)
 # Change if running the rl-server on a different host/port
@@ -19,8 +19,6 @@ concurrency:

 jobs:
  build-and-deploy:
-    # Only run on the upstream repository, not on forks
-    if: github.repository == 'NousResearch/hermes-agent'
    runs-on: ubuntu-latest
    environment:
      name: github-pages
@@ -5,8 +5,6 @@ on:
    branches: [main]
  pull_request:
    branches: [main]
-  release:
-    types: [published]

 concurrency:
  group: docker-${{ github.ref }}
@@ -14,8 +12,6 @@ concurrency:

 jobs:
  build-and-push:
-    # Only run on the upstream repository, not on forks
-    if: github.repository == 'NousResearch/hermes-agent'
    runs-on: ubuntu-latest
    timeout-minutes: 30
    steps:
@@ -45,13 +41,13 @@ jobs:
            nousresearch/hermes-agent:test --help

      - name: Log in to Docker Hub
-        if: github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release'
+        if: github.event_name == 'push' && github.ref == 'refs/heads/main'
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

-      - name: Push image (main branch)
+      - name: Push image
        if: github.event_name == 'push' && github.ref == 'refs/heads/main'
        uses: docker/build-push-action@v6
        with:
@@ -63,17 +59,3 @@ jobs:
            nousresearch/hermes-agent:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
-
-      - name: Push image (release)
-        if: github.event_name == 'release'
-        uses: docker/build-push-action@v6
-        with:
-          context: .
-          file: Dockerfile
-          push: true
-          tags: |
-            nousresearch/hermes-agent:latest
-            nousresearch/hermes-agent:${{ github.event.release.tag_name }}
-            nousresearch/hermes-agent:${{ github.sha }}
-          cache-from: type=gha
-          cache-to: type=gha,mode=max
@@ -1,25 +1,20 @@
 FROM debian:13.4

-# Install system dependencies in one layer, clear APT cache
-RUN apt-get update && \
-    apt-get install -y --no-install-recommends \
-        build-essential nodejs npm python3 python3-pip ripgrep ffmpeg gcc python3-dev libffi-dev && \
-    rm -rf /var/lib/apt/lists/*
+RUN apt-get update
+RUN apt-get install -y nodejs npm python3 python3-pip ripgrep ffmpeg gcc python3-dev libffi-dev

 COPY . /opt/hermes
 WORKDIR /opt/hermes

-# Install Python and Node dependencies in one layer, no cache
-RUN pip install --no-cache-dir -e ".[all]" --break-system-packages && \
-    npm install --prefer-offline --no-audit && \
-    npx playwright install --with-deps chromium --only-shell && \
-    cd /opt/hermes/scripts/whatsapp-bridge && \
-    npm install --prefer-offline --no-audit && \
-    npm cache clean --force
+RUN pip install -e ".[all]" --break-system-packages
+RUN npm install
+RUN npx playwright install --with-deps chromium
+WORKDIR /opt/hermes/scripts/whatsapp-bridge
+RUN npm install

 WORKDIR /opt/hermes
 RUN chmod +x /opt/hermes/docker/entrypoint.sh

 ENV HERMES_HOME=/opt/data
 VOLUME [ "/opt/data" ]
-ENTRYPOINT [ "/opt/hermes/docker/entrypoint.sh" ]
+ENTRYPOINT [ "/opt/hermes/docker/entrypoint.sh" ]
@@ -1,4 +0,0 @@
-graft skills
-graft optional-skills
-global-exclude __pycache__
-global-exclude *.py[cod]
@@ -162,36 +162,6 @@ def _is_oauth_token(key: str) -> bool:
    return True


-def _is_third_party_anthropic_endpoint(base_url: str | None) -> bool:
-    """Return True for non-Anthropic endpoints using the Anthropic Messages API.
-
-    Third-party proxies (Azure AI Foundry, AWS Bedrock, self-hosted) authenticate
-    with their own API keys via x-api-key, not Anthropic OAuth tokens. OAuth
-    detection should be skipped for these endpoints.
-    """
-    if not base_url:
-        return False  # No base_url = direct Anthropic API
-    normalized = base_url.rstrip("/").lower()
-    if "anthropic.com" in normalized:
-        return False  # Direct Anthropic API — OAuth applies
-    return True  # Any other endpoint is a third-party proxy
-
-
-def _requires_bearer_auth(base_url: str | None) -> bool:
-    """Return True for Anthropic-compatible providers that require Bearer auth.
-
-    Some third-party /anthropic endpoints implement Anthropic's Messages API but
-    require Authorization: Bearer instead of Anthropic's native x-api-key header.
-    MiniMax's global and China Anthropic-compatible endpoints follow this pattern.
-    """
-    if not base_url:
-        return False
-    normalized = base_url.rstrip("/").lower()
-    return normalized.startswith("https://api.minimax.io/anthropic") or normalized.startswith(
-        "https://api.minimaxi.com/anthropic"
-    )
-
-
 def build_anthropic_client(api_key: str, base_url: str = None):
    """Create an Anthropic client, auto-detecting setup-tokens vs API keys.

@@ -210,25 +180,7 @@ def build_anthropic_client(api_key: str, base_url: str = None):
    if base_url:
        kwargs["base_url"] = base_url

-    if _requires_bearer_auth(base_url):
-        # Some Anthropic-compatible providers (e.g. MiniMax) expect the API key in
-        # Authorization: Bearer even for regular API keys. Route those endpoints
-        # through auth_token so the SDK sends Bearer auth instead of x-api-key.
-        # Check this before OAuth token shape detection because MiniMax secrets do
-        # not use Anthropic's sk-ant-api prefix and would otherwise be misread as
-        # Anthropic OAuth/setup tokens.
-        kwargs["auth_token"] = api_key
-        if _COMMON_BETAS:
-            kwargs["default_headers"] = {"anthropic-beta": ",".join(_COMMON_BETAS)}
-    elif _is_third_party_anthropic_endpoint(base_url):
-        # Third-party proxies (Azure AI Foundry, AWS Bedrock, etc.) use their
-        # own API keys with x-api-key auth. Skip OAuth detection — their keys
-        # don't follow Anthropic's sk-ant-* prefix convention and would be
-        # misclassified as OAuth tokens.
-        kwargs["api_key"] = api_key
-        if _COMMON_BETAS:
-            kwargs["default_headers"] = {"anthropic-beta": ",".join(_COMMON_BETAS)}
-    elif _is_oauth_token(api_key):
+    if _is_oauth_token(api_key):
        # OAuth access token / setup-token → Bearer auth + Claude Code identity.
        # Anthropic routes OAuth requests based on user-agent and headers;
        # without Claude Code's fingerprint, requests get intermittent 500s.
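Review note: the hunk above collapses a three-way auth decision into a single `_is_oauth_token` check. The removed ordering can be sketched as a pure function, which makes it easier to see which endpoints change behaviour. This is a minimal sketch, not the project's code: `pick_auth_mode` and `BEARER_PREFIXES` are hypothetical names, the OAuth check is passed in as a callable rather than reimplemented, and the final plain-API-key branch is an assumption because that branch is not visible in this hunk.

```python
from typing import Callable, Optional

# URL prefixes copied from the removed _requires_bearer_auth helper.
BEARER_PREFIXES = (
    "https://api.minimax.io/anthropic",
    "https://api.minimaxi.com/anthropic",
)


def pick_auth_mode(
    api_key: str,
    base_url: Optional[str],
    is_oauth_token: Callable[[str], bool],
) -> str:
    """Return "bearer", "x-api-key", or "oauth" using the pre-change ordering."""
    normalized = (base_url or "").rstrip("/").lower()
    # 1. Anthropic-compatible hosts that expect Authorization: Bearer.
    if normalized.startswith(BEARER_PREFIXES):
        return "bearer"
    # 2. Any other non-Anthropic host: skip OAuth detection entirely.
    if normalized and "anthropic.com" not in normalized:
        return "x-api-key"
    # 3. Direct Anthropic API: token shape decides.
    #    (Assumed fallback: plain keys use x-api-key.)
    return "oauth" if is_oauth_token(api_key) else "x-api-key"
```

With the change applied, only step 3 remains, so a key sent to a third-party base URL is classified purely by its shape.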
@@ -307,105 +259,71 @@ def is_claude_code_token_valid(creds: Dict[str, Any]) -> bool:
    return now_ms < (expires_at - 60_000)


-def refresh_anthropic_oauth_pure(refresh_token: str, *, use_json: bool = False) -> Dict[str, Any]:
-    """Refresh an Anthropic OAuth token without mutating local credential files."""
+def _refresh_oauth_token(creds: Dict[str, Any]) -> Optional[str]:
+    """Attempt to refresh an expired Claude Code OAuth token.
+
+    Uses the same token endpoint and client_id as Claude Code / OpenCode.
+    Only works for credentials that have a refresh token (from claude /login
+    or claude setup-token with OAuth flow).
+
+    Tries the new platform.claude.com endpoint first (Claude Code >=2.1.81),
+    then falls back to console.anthropic.com for older tokens.
+
+    Returns the new access token, or None if refresh fails.
+    """
    import time
-    import urllib.parse
    import urllib.request

-    if not refresh_token:
-        raise ValueError("refresh_token is required")
-
-    client_id = "9d1c250a-e61b-44d9-88ed-5944d1962f5e"
-    if use_json:
-        data = json.dumps({
-            "grant_type": "refresh_token",
-            "refresh_token": refresh_token,
-            "client_id": client_id,
-        }).encode()
-        content_type = "application/json"
-    else:
-        data = urllib.parse.urlencode({
-            "grant_type": "refresh_token",
-            "refresh_token": refresh_token,
-            "client_id": client_id,
-        }).encode()
-        content_type = "application/x-www-form-urlencoded"
-
-    token_endpoints = [
-        "https://platform.claude.com/v1/oauth/token",
-        "https://console.anthropic.com/v1/oauth/token",
-    ]
-    last_error = None
-    for endpoint in token_endpoints:
-        req = urllib.request.Request(
-            endpoint,
-            data=data,
-            headers={
-                "Content-Type": content_type,
-                "User-Agent": f"claude-cli/{_get_claude_code_version()} (external, cli)",
-            },
-            method="POST",
-        )
-        try:
-            with urllib.request.urlopen(req, timeout=10) as resp:
-                result = json.loads(resp.read().decode())
-        except Exception as exc:
-            last_error = exc
-            logger.debug("Anthropic token refresh failed at %s: %s", endpoint, exc)
-            continue
-
-        access_token = result.get("access_token", "")
-        if not access_token:
-            raise ValueError("Anthropic refresh response was missing access_token")
-        next_refresh = result.get("refresh_token", refresh_token)
-        expires_in = result.get("expires_in", 3600)
-        return {
-            "access_token": access_token,
-            "refresh_token": next_refresh,
-            "expires_at_ms": int(time.time() * 1000) + (expires_in * 1000),
-        }
-
-    if last_error is not None:
-        raise last_error
-    raise ValueError("Anthropic token refresh failed")
-
-
-def _refresh_oauth_token(creds: Dict[str, Any]) -> Optional[str]:
-    """Attempt to refresh an expired Claude Code OAuth token."""
    refresh_token = creds.get("refreshToken", "")
    if not refresh_token:
        logger.debug("No refresh token available — cannot refresh")
        return None

-    try:
-        refreshed = refresh_anthropic_oauth_pure(refresh_token, use_json=False)
-        _write_claude_code_credentials(
-            refreshed["access_token"],
-            refreshed["refresh_token"],
-            refreshed["expires_at_ms"],
+    # Client ID used by Claude Code's OAuth flow
+    CLIENT_ID = "9d1c250a-e61b-44d9-88ed-5944d1962f5e"
+
+    # Anthropic migrated OAuth from console.anthropic.com to platform.claude.com
+    # (Claude Code v2.1.81+). Try new endpoint first, fall back to old.
+    token_endpoints = [
+        "https://platform.claude.com/v1/oauth/token",
+        "https://console.anthropic.com/v1/oauth/token",
+    ]
+
+    payload = json.dumps({
+        "grant_type": "refresh_token",
+        "refresh_token": refresh_token,
+        "client_id": CLIENT_ID,
+    }).encode()
+
+    headers = {
+        "Content-Type": "application/json",
+        "User-Agent": f"claude-cli/{_get_claude_code_version()} (external, cli)",
+    }
+
+    for endpoint in token_endpoints:
+        req = urllib.request.Request(
+            endpoint, data=payload, headers=headers, method="POST",
        )
-        logger.debug("Successfully refreshed Claude Code OAuth token")
-        return refreshed["access_token"]
-    except Exception as e:
-        logger.debug("Failed to refresh Claude Code token: %s", e)
-        return None
+        try:
+            with urllib.request.urlopen(req, timeout=10) as resp:
+                result = json.loads(resp.read().decode())
+                new_access = result.get("access_token", "")
+                new_refresh = result.get("refresh_token", refresh_token)
+                expires_in = result.get("expires_in", 3600)
+
+                if new_access:
+                    new_expires_ms = int(time.time() * 1000) + (expires_in * 1000)
+                    _write_claude_code_credentials(new_access, new_refresh, new_expires_ms)
+                    logger.debug("Refreshed Claude Code OAuth token via %s", endpoint)
+                    return new_access
+        except Exception as e:
+            logger.debug("Token refresh failed at %s: %s", endpoint, e)
+
+    return None


-def _write_claude_code_credentials(
-    access_token: str,
-    refresh_token: str,
-    expires_at_ms: int,
-    *,
-    scopes: Optional[list] = None,
-) -> None:
-    """Write refreshed credentials back to ~/.claude/.credentials.json.
-
-    The optional *scopes* list (e.g. ``["user:inference", "user:profile", ...]``)
-    is persisted so that Claude Code's own auth check recognises the credential
-    as valid.  Claude Code >=2.1.81 gates on the presence of ``"user:inference"``
-    in the stored scopes before it will use the token.
-    """
+def _write_claude_code_credentials(access_token: str, refresh_token: str, expires_at_ms: int) -> None:
+    """Write refreshed credentials back to ~/.claude/.credentials.json."""
    cred_path = Path.home() / ".claude" / ".credentials.json"
    try:
        # Read existing file to preserve other fields
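Review note: the rewritten `_refresh_oauth_token` tries each token endpoint in order and returns `None` when all fail, instead of raising. The sketch below isolates that fallback loop so it can be exercised without network access. It assumes a hypothetical `post(url, body)` transport hook and a placeholder client ID, and `refresh_with_fallback` is an illustrative name; none of these are part of the diff.

```python
import json
import time
from typing import Callable, Dict, Optional, Sequence

# Endpoint order as it appears in the diff: new host first, legacy host second.
TOKEN_ENDPOINTS = (
    "https://platform.claude.com/v1/oauth/token",
    "https://console.anthropic.com/v1/oauth/token",
)


def refresh_with_fallback(
    refresh_token: str,
    post: Callable[[str, bytes], bytes],
    endpoints: Sequence[str] = TOKEN_ENDPOINTS,
    client_id: str = "example-client-id",  # placeholder, not the real client ID
) -> Optional[Dict[str, object]]:
    """Try each endpoint in order; return token data from the first success.

    `post` is a hypothetical hook taking (url, body) and returning the raw
    response bytes, injected so the loop can be tested offline.
    """
    if not refresh_token:
        return None
    body = json.dumps({
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,
    }).encode()
    for endpoint in endpoints:
        try:
            result = json.loads(post(endpoint, body).decode())
        except Exception:
            continue  # network or parse failure: try the next endpoint
        access = result.get("access_token", "")
        if not access:
            continue  # as in the diff, an empty token also moves on
        expires_in = result.get("expires_in", 3600)
        return {
            "access_token": access,
            # Keep the old refresh token when the server does not rotate it.
            "refresh_token": result.get("refresh_token", refresh_token),
            "expires_at_ms": int(time.time() * 1000) + expires_in * 1000,
            "endpoint": endpoint,
        }
    return None
```

One behavioural difference from the removed `refresh_anthropic_oauth_pure` is visible here: a response without `access_token` previously raised `ValueError`, whereas the new loop simply continues to the next endpoint.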
@@ -413,19 +331,11 @@ def _write_claude_code_credentials(
        if cred_path.exists():
            existing = json.loads(cred_path.read_text(encoding="utf-8"))

-        oauth_data: Dict[str, Any] = {
+        existing["claudeAiOauth"] = {
            "accessToken": access_token,
            "refreshToken": refresh_token,
            "expiresAt": expires_at_ms,
        }
-        if scopes is not None:
-            oauth_data["scopes"] = scopes
-        elif "claudeAiOauth" in existing and "scopes" in existing["claudeAiOauth"]:
-            # Preserve previously-stored scopes when the refresh response
-            # does not include a scope field.
-            oauth_data["scopes"] = existing["claudeAiOauth"]["scopes"]
-
-        existing["claudeAiOauth"] = oauth_data

        cred_path.parent.mkdir(parents=True, exist_ok=True)
        cred_path.write_text(json.dumps(existing, indent=2), encoding="utf-8")
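Review note: after this hunk, the `claudeAiOauth` object is replaced wholesale, so a previously stored `scopes` list is no longer carried over. The removed docstring stated that Claude Code >=2.1.81 checks for `"user:inference"` in the stored scopes, so this may matter for credentials shared with that tool. The sketch below reproduces the read-merge-write pattern against a temporary file to make the effect concrete; `write_oauth_entry` is a hypothetical stand-in, not the project's function.

```python
import json
import tempfile
from pathlib import Path


def write_oauth_entry(cred_path: Path, access_token: str,
                      refresh_token: str, expires_at_ms: int) -> dict:
    """Read-merge-write in the style of the post-change function.

    Top-level keys other than "claudeAiOauth" survive; the "claudeAiOauth"
    object itself is replaced, including any nested "scopes".
    """
    existing = {}
    if cred_path.exists():
        existing = json.loads(cred_path.read_text(encoding="utf-8"))
    existing["claudeAiOauth"] = {
        "accessToken": access_token,
        "refreshToken": refresh_token,
        "expiresAt": expires_at_ms,
    }
    cred_path.parent.mkdir(parents=True, exist_ok=True)
    cred_path.write_text(json.dumps(existing, indent=2), encoding="utf-8")
    return existing


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / ".credentials.json"
        path.write_text(json.dumps({
            "otherTool": {"keep": True},
            "claudeAiOauth": {"accessToken": "old", "scopes": ["user:inference"]},
        }), encoding="utf-8")
        merged = write_oauth_entry(path, "new", "refresh", 123)
        print("otherTool" in merged)                # sibling key is preserved
        print("scopes" in merged["claudeAiOauth"])  # nested scopes are dropped
```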
@@ -585,208 +495,10 @@ def run_oauth_setup_token() -> Optional[str]:
    return None


-# ── Hermes-native PKCE OAuth flow ────────────────────────────────────────
-# Mirrors the flow used by Claude Code, pi-ai, and OpenCode.
-# Stores credentials in ~/.hermes/.anthropic_oauth.json (our own file).
-
-_OAUTH_CLIENT_ID = "9d1c250a-e61b-44d9-88ed-5944d1962f5e"
-_OAUTH_TOKEN_URL = "https://console.anthropic.com/v1/oauth/token"
-_OAUTH_REDIRECT_URI = "https://console.anthropic.com/oauth/code/callback"
-_OAUTH_SCOPES = "org:create_api_key user:profile user:inference"
-_HERMES_OAUTH_FILE = get_hermes_home() / ".anthropic_oauth.json"


-def _generate_pkce() -> tuple:
-    """Generate PKCE code_verifier and code_challenge (S256)."""
-    import base64
-    import hashlib
-    import secrets
-
-    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
-    challenge = base64.urlsafe_b64encode(
-        hashlib.sha256(verifier.encode()).digest()
-    ).rstrip(b"=").decode()
-    return verifier, challenge


-def run_hermes_oauth_login_pure() -> Optional[Dict[str, Any]]:
-    """Run Hermes-native OAuth PKCE flow and return credential state."""
-    import time
-    import webbrowser
-
-    verifier, challenge = _generate_pkce()
-
-    params = {
-        "code": "true",
-        "client_id": _OAUTH_CLIENT_ID,
-        "response_type": "code",
-        "redirect_uri": _OAUTH_REDIRECT_URI,
-        "scope": _OAUTH_SCOPES,
-        "code_challenge": challenge,
-        "code_challenge_method": "S256",
-        "state": verifier,
-    }
-    from urllib.parse import urlencode
-
-    auth_url = f"https://claude.ai/oauth/authorize?{urlencode(params)}"
-
-    print()
-    print("Authorize Hermes with your Claude Pro/Max subscription.")
-    print()
-    print("╭─ Claude Pro/Max Authorization ────────────────────╮")
-    print("│                                                   │")
-    print("│  Open this link in your browser:                  │")
-    print("╰───────────────────────────────────────────────────╯")
-    print()
-    print(f"  {auth_url}")
-    print()
-
-    try:
-        webbrowser.open(auth_url)
-        print("  (Browser opened automatically)")
-    except Exception:
-        pass
-
-    print()
-    print("After authorizing, you'll see a code. Paste it below.")
-    print()
-    try:
-        auth_code = input("Authorization code: ").strip()
-    except (KeyboardInterrupt, EOFError):
-        return None
-
-    if not auth_code:
-        print("No code entered.")
-        return None
-
-    splits = auth_code.split("#")
-    code = splits[0]
-    state = splits[1] if len(splits) > 1 else ""
-
-    try:
-        import urllib.request
-
-        exchange_data = json.dumps({
-            "grant_type": "authorization_code",
-            "client_id": _OAUTH_CLIENT_ID,
-            "code": code,
-            "state": state,
-            "redirect_uri": _OAUTH_REDIRECT_URI,
-            "code_verifier": verifier,
-        }).encode()
-
-        req = urllib.request.Request(
-            _OAUTH_TOKEN_URL,
-            data=exchange_data,
-            headers={
-                "Content-Type": "application/json",
-                "User-Agent": f"claude-cli/{_get_claude_code_version()} (external, cli)",
-            },
-            method="POST",
-        )
-
-        with urllib.request.urlopen(req, timeout=15) as resp:
-            result = json.loads(resp.read().decode())
-    except Exception as e:
-        print(f"Token exchange failed: {e}")
-        return None
-
-    access_token = result.get("access_token", "")
-    refresh_token = result.get("refresh_token", "")
-    expires_in = result.get("expires_in", 3600)
-
-    if not access_token:
-        print("No access token in response.")
-        return None
-
-    expires_at_ms = int(time.time() * 1000) + (expires_in * 1000)
-    return {
-        "access_token": access_token,
-        "refresh_token": refresh_token,
-        "expires_at_ms": expires_at_ms,
-    }
-
-
-def run_hermes_oauth_login() -> Optional[str]:
-    """Run Hermes-native OAuth PKCE flow for Claude Pro/Max subscription.
-
-    Opens a browser to claude.ai for authorization, prompts for the code,
-    exchanges it for tokens, and stores them in ~/.hermes/.anthropic_oauth.json.
-
-    Returns the access token on success, None on failure.
-    """
-    result = run_hermes_oauth_login_pure()
-    if not result:
-        return None
-
-    access_token = result["access_token"]
-    refresh_token = result["refresh_token"]
-    expires_at_ms = result["expires_at_ms"]
-
-    _save_hermes_oauth_credentials(access_token, refresh_token, expires_at_ms)
-    _write_claude_code_credentials(access_token, refresh_token, expires_at_ms)
-
-    print("Authentication successful!")
-    return access_token
-
-
-def _save_hermes_oauth_credentials(access_token: str, refresh_token: str, expires_at_ms: int) -> None:
-    """Save OAuth credentials to ~/.hermes/.anthropic_oauth.json."""
-    data = {
-        "accessToken": access_token,
-        "refreshToken": refresh_token,
-        "expiresAt": expires_at_ms,
-    }
-    try:
-        _HERMES_OAUTH_FILE.parent.mkdir(parents=True, exist_ok=True)
-        _HERMES_OAUTH_FILE.write_text(json.dumps(data, indent=2), encoding="utf-8")
-        _HERMES_OAUTH_FILE.chmod(0o600)
-    except (OSError, IOError) as e:
-        logger.debug("Failed to save Hermes OAuth credentials: %s", e)
-
-
-def read_hermes_oauth_credentials() -> Optional[Dict[str, Any]]:
-    """Read Hermes-managed OAuth credentials from ~/.hermes/.anthropic_oauth.json."""
-    if _HERMES_OAUTH_FILE.exists():
-        try:
-            data = json.loads(_HERMES_OAUTH_FILE.read_text(encoding="utf-8"))
-            if data.get("accessToken"):
-                return data
-        except (json.JSONDecodeError, OSError, IOError) as e:
-            logger.debug("Failed to read Hermes OAuth credentials: %s", e)
-    return None
-
-
-def refresh_hermes_oauth_token() -> Optional[str]:
-    """Refresh the Hermes-managed OAuth token using the stored refresh token.
-
-    Returns the new access token, or None if refresh fails.
-    """
-    creds = read_hermes_oauth_credentials()
-    if not creds or not creds.get("refreshToken"):
-        return None
-
-    try:
-        refreshed = refresh_anthropic_oauth_pure(
-            creds["refreshToken"],
-            use_json=True,
-        )
-        _save_hermes_oauth_credentials(
-            refreshed["access_token"],
-            refreshed["refresh_token"],
-            refreshed["expires_at_ms"],
-        )
-        _write_claude_code_credentials(
-            refreshed["access_token"],
-            refreshed["refresh_token"],
-            refreshed["expires_at_ms"],
-        )
-        logger.debug("Successfully refreshed Hermes OAuth token")
-        return refreshed["access_token"]
-    except Exception as e:
-        logger.debug("Failed to refresh Hermes OAuth token: %s", e)
-
-    return None


 # ---------------------------------------------------------------------------
@@ -1319,4 +1031,4 @@ def normalize_anthropic_response(
            reasoning_details=None,
        ),
        finish_reason,
-    )
+    )
@@ -7,7 +7,7 @@ the best available backend without duplicating fallback logic.
 Resolution order for text tasks (auto mode):
  1. OpenRouter  (OPENROUTER_API_KEY)
  2. Nous Portal (~/.hermes/auth.json active provider)
-  3. Custom endpoint (config.yaml model.base_url + OPENAI_API_KEY)
+  3. Custom endpoint (OPENAI_BASE_URL + OPENAI_API_KEY)
  4. Codex OAuth (Responses API via chatgpt.com with gpt-5.3-codex,
     wrapped to look like a chat.completions client)
  5. Native Anthropic
@@ -47,7 +47,6 @@ from typing import Any, Dict, List, Optional, Tuple

 from openai import OpenAI

-from agent.credential_pool import load_pool
 from hermes_cli.config import get_hermes_home
 from hermes_constants import OPENROUTER_BASE_URL

@@ -97,45 +96,6 @@ _CODEX_AUX_MODEL = "gpt-5.2-codex"
 _CODEX_AUX_BASE_URL = "https://chatgpt.com/backend-api/codex"


-def _select_pool_entry(provider: str) -> Tuple[bool, Optional[Any]]:
-    """Return (pool_exists_for_provider, selected_entry)."""
-    try:
-        pool = load_pool(provider)
-    except Exception as exc:
-        logger.debug("Auxiliary client: could not load pool for %s: %s", provider, exc)
-        return False, None
-    if not pool or not pool.has_credentials():
-        return False, None
-    try:
-        return True, pool.select()
-    except Exception as exc:
-        logger.debug("Auxiliary client: could not select pool entry for %s: %s", provider, exc)
-        return True, None
-
-
-def _pool_runtime_api_key(entry: Any) -> str:
-    if entry is None:
-        return ""
-    # Use the PooledCredential.runtime_api_key property which handles
-    # provider-specific fallback (e.g. agent_key for nous).
-    key = getattr(entry, "runtime_api_key", None) or getattr(entry, "access_token", "")
-    return str(key or "").strip()
-
-
-def _pool_runtime_base_url(entry: Any, fallback: str = "") -> str:
-    if entry is None:
-        return str(fallback or "").strip().rstrip("/")
-    # runtime_base_url handles provider-specific logic (e.g. nous prefers inference_base_url).
-    # Fall back through inference_base_url and base_url for non-PooledCredential entries.
-    url = (
-        getattr(entry, "runtime_base_url", None)
-        or getattr(entry, "inference_base_url", None)
-        or getattr(entry, "base_url", None)
-        or fallback
-    )
-    return str(url or "").strip().rstrip("/")
-
-
 # ── Codex Responses → chat.completions adapter ─────────────────────────────
 # All auxiliary consumers call client.chat.completions.create(**kwargs) and
 # read response.choices[0].message.content. This adapter translates those
@@ -479,22 +439,6 @@ def _read_nous_auth() -> Optional[dict]:
    Returns the provider state dict if Nous is active with tokens,
    otherwise None.
    """
-    pool_present, entry = _select_pool_entry("nous")
-    if pool_present:
-        if entry is None:
-            return None
-        return {
-            "access_token": getattr(entry, "access_token", ""),
-            "refresh_token": getattr(entry, "refresh_token", None),
-            "agent_key": getattr(entry, "agent_key", None),
-            "inference_base_url": _pool_runtime_base_url(entry, _NOUS_DEFAULT_BASE_URL),
-            "portal_base_url": getattr(entry, "portal_base_url", None),
-            "client_id": getattr(entry, "client_id", None),
-            "scope": getattr(entry, "scope", None),
-            "token_type": getattr(entry, "token_type", "Bearer"),
-            "source": "pool",
-        }
-
    try:
        if not _AUTH_JSON_PATH.is_file():
            return None
@@ -523,11 +467,6 @@ def _nous_base_url() -> str:

 def _read_codex_access_token() -> Optional[str]:
    """Read a valid, non-expired Codex OAuth access token from Hermes auth store."""
-    pool_present, entry = _select_pool_entry("openai-codex")
-    if pool_present:
-        token = _pool_runtime_api_key(entry)
-        return token or None
-
    try:
        from hermes_cli.auth import _read_codex_tokens
        data = _read_codex_tokens()
@@ -574,24 +513,6 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
        if provider_id == "anthropic":
            return _try_anthropic()

-        pool_present, entry = _select_pool_entry(provider_id)
-        if pool_present:
-            api_key = _pool_runtime_api_key(entry)
-            if not api_key:
-                continue
-
-            base_url = _pool_runtime_base_url(entry, pconfig.inference_base_url) or pconfig.inference_base_url
-            model = _API_KEY_PROVIDER_AUX_MODELS.get(provider_id, "default")
-            logger.debug("Auxiliary text client: %s (%s) via pool", pconfig.name, model)
-            extra = {}
-            if "api.kimi.com" in base_url.lower():
-                extra["default_headers"] = {"User-Agent": "KimiCLI/1.0"}
-            elif "api.githubcopilot.com" in base_url.lower():
-                from hermes_cli.models import copilot_default_headers
-
-                extra["default_headers"] = copilot_default_headers()
-            return OpenAI(api_key=api_key, base_url=base_url, **extra), model
-
        creds = resolve_api_key_provider_credentials(provider_id)
        api_key = str(creds.get("api_key", "")).strip()
        if not api_key:
@@ -641,16 +562,6 @@ def _get_auxiliary_env_override(task: str, suffix: str) -> Optional[str]:


 def _try_openrouter() -> Tuple[Optional[OpenAI], Optional[str]]:
-    pool_present, entry = _select_pool_entry("openrouter")
-    if pool_present:
-        or_key = _pool_runtime_api_key(entry)
-        if not or_key:
-            return None, None
-        base_url = _pool_runtime_base_url(entry, OPENROUTER_BASE_URL) or OPENROUTER_BASE_URL
-        logger.debug("Auxiliary client: OpenRouter via pool")
-        return OpenAI(api_key=or_key, base_url=base_url,
-                       default_headers=_OR_HEADERS), _OPENROUTER_MODEL
-
    or_key = os.getenv("OPENROUTER_API_KEY")
    if not or_key:
        return None, None
@@ -666,22 +577,22 @@ def _try_nous() -> Tuple[Optional[OpenAI], Optional[str]]:
    global auxiliary_is_nous
    auxiliary_is_nous = True
    logger.debug("Auxiliary client: Nous Portal")
-    model = "gemini-3-flash" if nous.get("source") == "pool" else _NOUS_MODEL
    return (
-        OpenAI(
-            api_key=_nous_api_key(nous),
-            base_url=str(nous.get("inference_base_url") or _nous_base_url()).rstrip("/"),
-        ),
-        model,
+        OpenAI(api_key=_nous_api_key(nous), base_url=_nous_base_url()),
+        _NOUS_MODEL,
    )


 def _read_main_model() -> str:
-    """Read the user's configured main model from config.yaml.
+    """Read the user's configured main model from config/env.

-    config.yaml model.default is the single source of truth for the active
-    model. Environment variables are no longer consulted.
+    Falls back through OPENAI_MODEL → HERMES_MODEL → LLM_MODEL → config.yaml model.default
+    so the auxiliary client can use the same model as the main agent when no
+    dedicated auxiliary model is available.
    """
+    from_env = os.getenv("OPENAI_MODEL") or os.getenv("HERMES_MODEL") or os.getenv("LLM_MODEL")
+    if from_env:
+        return from_env.strip()
    try:
        from hermes_cli.config import load_config
        cfg = load_config()
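
Note on the hunk above: the restored environment lookup in `_read_main_model` checks `OPENAI_MODEL` first, then `HERMES_MODEL`, then `LLM_MODEL`, and only then falls through to config.yaml. A minimal sketch of that precedence follows. `resolve_main_model`, `MODEL_ENV_VARS`, and the `config_default` callable are hypothetical names introduced for illustration; the config lookup itself is not reproduced because the hunk is truncated before it.

```python
from typing import Callable, Mapping, Optional

# Order mirrors the os.getenv(...) chain in the hunk above.
MODEL_ENV_VARS = ("OPENAI_MODEL", "HERMES_MODEL", "LLM_MODEL")


def resolve_main_model(
    env: Mapping[str, str],
    config_default: Callable[[], Optional[str]],
) -> str:
    """Return the first non-empty env value, else the config default, else ''.

    `config_default` stands in for reading model.default from config.yaml;
    it is an assumption, not the project's actual loader.
    """
    for name in MODEL_ENV_VARS:
        value = env.get(name)
        if value:  # empty strings are skipped, as with `a or b or c`
            return value.strip()
    return (config_default() or "").strip()
```

Passing `os.environ` as `env` would reproduce the lookup order of the diff, while a plain dict makes the precedence easy to exercise in isolation.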
@@ -748,19 +659,11 @@ def _try_custom_endpoint() -> Tuple[Optional[OpenAI], Optional[str]]:


 def _try_codex() -> Tuple[Optional[Any], Optional[str]]:
-    pool_present, entry = _select_pool_entry("openai-codex")
-    if pool_present:
-        codex_token = _pool_runtime_api_key(entry)
-        if not codex_token:
-            return None, None
-        base_url = _pool_runtime_base_url(entry, _CODEX_AUX_BASE_URL) or _CODEX_AUX_BASE_URL
-    else:
-        codex_token = _read_codex_access_token()
-        if not codex_token:
-            return None, None
-        base_url = _CODEX_AUX_BASE_URL
+    codex_token = _read_codex_access_token()
+    if not codex_token:
+        return None, None
    logger.debug("Auxiliary client: Codex OAuth (%s via Responses API)", _CODEX_AUX_MODEL)
-    real_client = OpenAI(api_key=codex_token, base_url=base_url)
+    real_client = OpenAI(api_key=codex_token, base_url=_CODEX_AUX_BASE_URL)
    return CodexAuxiliaryClient(real_client, _CODEX_AUX_MODEL), _CODEX_AUX_MODEL


@@ -770,21 +673,14 @@ def _try_anthropic() -> Tuple[Optional[Any], Optional[str]]:
    except ImportError:
        return None, None

-    pool_present, entry = _select_pool_entry("anthropic")
-    if pool_present:
-        if entry is None:
-            return None, None
-        token = _pool_runtime_api_key(entry)
-    else:
-        entry = None
-        token = resolve_anthropic_token()
+    token = resolve_anthropic_token()
    if not token:
        return None, None

    # Allow base URL override from config.yaml model.base_url, but only
    # when the configured provider is anthropic — otherwise a non-Anthropic
    # base_url (e.g. Codex endpoint) would leak into Anthropic requests.
-    base_url = _pool_runtime_base_url(entry, _ANTHROPIC_DEFAULT_BASE_URL) if pool_present else _ANTHROPIC_DEFAULT_BASE_URL
+    base_url = _ANTHROPIC_DEFAULT_BASE_URL
    try:
        from hermes_cli.config import load_config
        cfg = load_config()
@@ -17,7 +17,7 @@ REFERENCE_PATTERN = re.compile(
    r"(?<![\w/])@(?:(?P<simple>diff|staged)\b|(?P<kind>file|folder|git|url):(?P<value>\S+))"
 )
 TRAILING_PUNCTUATION = ",.;!?"
-_SENSITIVE_HOME_DIRS = (".ssh", ".aws", ".gnupg", ".kube", ".docker", ".azure", ".config/gh")
+_SENSITIVE_HOME_DIRS = (".ssh", ".aws", ".gnupg", ".kube")
 _SENSITIVE_HERMES_DIRS = (Path("skills") / ".hub",)
 _SENSITIVE_HOME_FILES = (
    Path(".ssh") / "authorized_keys",
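
Note on the hunk above: `REFERENCE_PATTERN` and `TRAILING_PUNCTUATION` are copied verbatim from the context lines. The sketch below shows what the pattern matches and rejects. `find_references` is a hypothetical helper, and stripping trailing punctuation from the captured value is an assumption about how `TRAILING_PUNCTUATION` is used, since the consuming code is not part of this diff.

```python
import re
from typing import List, Optional, Tuple

REFERENCE_PATTERN = re.compile(
    r"(?<![\w/])@(?:(?P<simple>diff|staged)\b|(?P<kind>file|folder|git|url):(?P<value>\S+))"
)
TRAILING_PUNCTUATION = ",.;!?"


def find_references(text: str) -> List[Tuple[str, Optional[str]]]:
    """Return (kind, value) pairs; value is None for bare @diff / @staged."""
    refs: List[Tuple[str, Optional[str]]] = []
    for match in REFERENCE_PATTERN.finditer(text):
        if match.group("simple"):
            refs.append((match.group("simple"), None))
        else:
            # Assumed cleanup: drop sentence punctuation glued to the value.
            value = match.group("value").rstrip(TRAILING_PUNCTUATION)
            refs.append((match.group("kind"), value))
    return refs
```

The negative lookbehind `(?<![\w/])` means an `@` that directly follows a word character or a slash, as in an email address or a path segment, is not treated as a reference.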
@@ -1,848 +0,0 @@
-"""Persistent multi-credential pool for same-provider failover."""
-
-from __future__ import annotations
-
-import logging
-import random
-import threading
-import time
-import uuid
-import os
-from dataclasses import dataclass, fields, replace
-from typing import Any, Dict, List, Optional, Set, Tuple
-
-from hermes_constants import OPENROUTER_BASE_URL
-import hermes_cli.auth as auth_mod
-from hermes_cli.auth import (
-    ACCESS_TOKEN_REFRESH_SKEW_SECONDS,
-    CODEX_ACCESS_TOKEN_REFRESH_SKEW_SECONDS,
-    DEFAULT_AGENT_KEY_MIN_TTL_SECONDS,
-    PROVIDER_REGISTRY,
-    _agent_key_is_usable,
-    _codex_access_token_is_expiring,
-    _decode_jwt_claims,
-    _is_expiring,
-    _load_auth_store,
-    _load_provider_state,
-    read_credential_pool,
-    write_credential_pool,
-)
-
-logger = logging.getLogger(__name__)
-
-
-def _load_config_safe() -> Optional[dict]:
-    """Load config.yaml, returning None on any error."""
-    try:
-        from hermes_cli.config import load_config
-
-        return load_config()
-    except Exception:
-        return None
-
-
-# --- Status and type constants ---
-
-STATUS_OK = "ok"
-STATUS_EXHAUSTED = "exhausted"
-
-AUTH_TYPE_OAUTH = "oauth"
-AUTH_TYPE_API_KEY = "api_key"
-
-SOURCE_MANUAL = "manual"
-
-STRATEGY_FILL_FIRST = "fill_first"
-STRATEGY_ROUND_ROBIN = "round_robin"
-STRATEGY_RANDOM = "random"
-STRATEGY_LEAST_USED = "least_used"
-SUPPORTED_POOL_STRATEGIES = {
-    STRATEGY_FILL_FIRST,
-    STRATEGY_ROUND_ROBIN,
-    STRATEGY_RANDOM,
-    STRATEGY_LEAST_USED,
-}
-
-# Cooldown before retrying an exhausted credential.
-# 429 (rate-limited) cools down faster since quotas reset frequently.
-# 402 (billing/quota) and other codes use a longer default.
-EXHAUSTED_TTL_429_SECONDS = 60 * 60          # 1 hour
-EXHAUSTED_TTL_DEFAULT_SECONDS = 24 * 60 * 60 # 24 hours
-
-# Pool key prefix for custom OpenAI-compatible endpoints.
-# Custom endpoints all share provider='custom' but are keyed by their
-# custom_providers name: 'custom:<normalized_name>'.
-CUSTOM_POOL_PREFIX = "custom:"
-
-
-# Fields that are only round-tripped through JSON — never used for logic as attributes.
-_EXTRA_KEYS = frozenset({
-    "token_type", "scope", "client_id", "portal_base_url", "obtained_at",
-    "expires_in", "agent_key_id", "agent_key_expires_in", "agent_key_reused",
-    "agent_key_obtained_at", "tls",
-})
-
-
-@dataclass
-class PooledCredential:
-    provider: str
-    id: str
-    label: str
-    auth_type: str
-    priority: int
-    source: str
-    access_token: str
-    refresh_token: Optional[str] = None
-    last_status: Optional[str] = None
-    last_status_at: Optional[float] = None
-    last_error_code: Optional[int] = None
-    base_url: Optional[str] = None
-    expires_at: Optional[str] = None
-    expires_at_ms: Optional[int] = None
-    last_refresh: Optional[str] = None
-    inference_base_url: Optional[str] = None
-    agent_key: Optional[str] = None
-    agent_key_expires_at: Optional[str] = None
-    request_count: int = 0
-    extra: Dict[str, Any] = None  # type: ignore[assignment]
-
-    def __post_init__(self):
-        if self.extra is None:
-            self.extra = {}
-
-    def __getattr__(self, name: str):
-        if name in _EXTRA_KEYS:
-            return self.extra.get(name)
-        raise AttributeError(f"'{type(self).__name__}' object has no attribute {name!r}")
-
-    @classmethod
-    def from_dict(cls, provider: str, payload: Dict[str, Any]) -> "PooledCredential":
-        field_names = {f.name for f in fields(cls) if f.name != "provider"}
-        data = {k: payload.get(k) for k in field_names if k in payload}
-        extra = {k: payload[k] for k in _EXTRA_KEYS if k in payload and payload[k] is not None}
-        data["extra"] = extra
-        data.setdefault("id", uuid.uuid4().hex[:6])
-        data.setdefault("label", payload.get("source", provider))
-        data.setdefault("auth_type", AUTH_TYPE_API_KEY)
-        data.setdefault("priority", 0)
-        data.setdefault("source", SOURCE_MANUAL)
-        data.setdefault("access_token", "")
-        return cls(provider=provider, **data)
-
-    def to_dict(self) -> Dict[str, Any]:
-        _ALWAYS_EMIT = {"last_status", "last_status_at", "last_error_code"}
-        result: Dict[str, Any] = {}
-        for field_def in fields(self):
-            if field_def.name in ("provider", "extra"):
-                continue
-            value = getattr(self, field_def.name)
-            if value is not None or field_def.name in _ALWAYS_EMIT:
-                result[field_def.name] = value
-        for k, v in self.extra.items():
-            if v is not None:
-                result[k] = v
-        return result
-
-    @property
-    def runtime_api_key(self) -> str:
-        if self.provider == "nous":
-            return str(self.agent_key or self.access_token or "")
-        return str(self.access_token or "")
-
-    @property
-    def runtime_base_url(self) -> Optional[str]:
-        if self.provider == "nous":
-            return self.inference_base_url or self.base_url
-        return self.base_url
-
-
-def label_from_token(token: str, fallback: str) -> str:
-    claims = _decode_jwt_claims(token)
-    for key in ("email", "preferred_username", "upn"):
-        value = claims.get(key)
-        if isinstance(value, str) and value.strip():
-            return value.strip()
-    return fallback
-
-
-def _next_priority(entries: List[PooledCredential]) -> int:
-    return max((entry.priority for entry in entries), default=-1) + 1
-
-
-def _is_manual_source(source: str) -> bool:
-    normalized = (source or "").strip().lower()
-    return normalized == SOURCE_MANUAL or normalized.startswith(f"{SOURCE_MANUAL}:")
-
-
-def _exhausted_ttl(error_code: Optional[int]) -> int:
-    """Return cooldown seconds based on the HTTP status that caused exhaustion."""
-    if error_code == 429:
-        return EXHAUSTED_TTL_429_SECONDS
-    return EXHAUSTED_TTL_DEFAULT_SECONDS
-
-
-def _normalize_custom_pool_name(name: str) -> str:
-    """Normalize a custom provider name for use as a pool key suffix."""
-    return name.strip().lower().replace(" ", "-")
-
-
-def _iter_custom_providers(config: Optional[dict] = None):
-    """Yield (normalized_name, entry_dict) for each valid custom_providers entry."""
-    if config is None:
-        config = _load_config_safe()
-    if config is None:
-        return
-    custom_providers = config.get("custom_providers")
-    if not isinstance(custom_providers, list):
-        return
-    for entry in custom_providers:
-        if not isinstance(entry, dict):
-            continue
-        name = entry.get("name")
-        if not isinstance(name, str):
-            continue
-        yield _normalize_custom_pool_name(name), entry
-
-
-def get_custom_provider_pool_key(base_url: str) -> Optional[str]:
-    """Look up the custom_providers list in config.yaml and return 'custom:<name>' for a matching base_url.
-
-    Returns None if no match is found.
-    """
-    if not base_url:
-        return None
-    normalized_url = base_url.strip().rstrip("/")
-    for norm_name, entry in _iter_custom_providers():
-        entry_url = str(entry.get("base_url") or "").strip().rstrip("/")
-        if entry_url and entry_url == normalized_url:
-            return f"{CUSTOM_POOL_PREFIX}{norm_name}"
-    return None
-
-
-def list_custom_pool_providers() -> List[str]:
-    """Return all 'custom:*' pool keys that have entries in auth.json."""
-    pool_data = read_credential_pool(None)
-    return sorted(
-        key for key in pool_data
-        if key.startswith(CUSTOM_POOL_PREFIX)
-        and isinstance(pool_data.get(key), list)
-        and pool_data[key]
-    )
-
-
-def _get_custom_provider_config(pool_key: str) -> Optional[Dict[str, Any]]:
-    """Return the custom_providers config entry matching a pool key like 'custom:together.ai'."""
-    if not pool_key.startswith(CUSTOM_POOL_PREFIX):
-        return None
-    suffix = pool_key[len(CUSTOM_POOL_PREFIX):]
-    for norm_name, entry in _iter_custom_providers():
-        if norm_name == suffix:
-            return entry
-    return None
-
-
-def get_pool_strategy(provider: str) -> str:
-    """Return the configured selection strategy for a provider."""
-    config = _load_config_safe()
-    if config is None:
-        return STRATEGY_FILL_FIRST
-
-    strategies = config.get("credential_pool_strategies")
-    if not isinstance(strategies, dict):
-        return STRATEGY_FILL_FIRST
-
-    strategy = str(strategies.get(provider, "") or "").strip().lower()
-    if strategy in SUPPORTED_POOL_STRATEGIES:
-        return strategy
-    return STRATEGY_FILL_FIRST
-
-
-class CredentialPool:
-    def __init__(self, provider: str, entries: List[PooledCredential]):
-        self.provider = provider
-        self._entries = sorted(entries, key=lambda entry: entry.priority)
-        self._current_id: Optional[str] = None
-        self._strategy = get_pool_strategy(provider)
-        self._lock = threading.Lock()
-
-    def has_credentials(self) -> bool:
-        return bool(self._entries)
-
-    def has_available(self) -> bool:
-        """True if at least one entry is not currently in exhaustion cooldown."""
-        return bool(self._available_entries())
-
-    def entries(self) -> List[PooledCredential]:
-        return list(self._entries)
-
-    def current(self) -> Optional[PooledCredential]:
-        if not self._current_id:
-            return None
-        return next((entry for entry in self._entries if entry.id == self._current_id), None)
-
-    def _replace_entry(self, old: PooledCredential, new: PooledCredential) -> None:
-        """Swap an entry in-place by id, preserving sort order."""
-        for idx, entry in enumerate(self._entries):
-            if entry.id == old.id:
-                self._entries[idx] = new
-                return
-
-    def _persist(self) -> None:
-        write_credential_pool(
-            self.provider,
-            [entry.to_dict() for entry in self._entries],
-        )
-
-    def _mark_exhausted(self, entry: PooledCredential, status_code: Optional[int]) -> PooledCredential:
-        updated = replace(
-            entry,
-            last_status=STATUS_EXHAUSTED,
-            last_status_at=time.time(),
-            last_error_code=status_code,
-        )
-        self._replace_entry(entry, updated)
-        self._persist()
-        return updated
-
-    def _refresh_entry(self, entry: PooledCredential, *, force: bool) -> Optional[PooledCredential]:
-        if entry.auth_type != AUTH_TYPE_OAUTH or not entry.refresh_token:
-            if force:
-                self._mark_exhausted(entry, None)
-            return None
-
-        try:
-            if self.provider == "anthropic":
-                from agent.anthropic_adapter import refresh_anthropic_oauth_pure
-
-                refreshed = refresh_anthropic_oauth_pure(
-                    entry.refresh_token,
-                    use_json=entry.source.endswith("hermes_pkce"),
-                )
-                updated = replace(
-                    entry,
-                    access_token=refreshed["access_token"],
-                    refresh_token=refreshed["refresh_token"],
-                    expires_at_ms=refreshed["expires_at_ms"],
-                )
-            elif self.provider == "openai-codex":
-                refreshed = auth_mod.refresh_codex_oauth_pure(
-                    entry.access_token,
-                    entry.refresh_token,
-                )
-                updated = replace(
-                    entry,
-                    access_token=refreshed["access_token"],
-                    refresh_token=refreshed["refresh_token"],
-                    last_refresh=refreshed.get("last_refresh"),
-                )
-            elif self.provider == "nous":
-                nous_state = {
-                    "access_token": entry.access_token,
-                    "refresh_token": entry.refresh_token,
-                    "client_id": entry.client_id,
-                    "portal_base_url": entry.portal_base_url,
-                    "inference_base_url": entry.inference_base_url,
-                    "token_type": entry.token_type,
-                    "scope": entry.scope,
-                    "obtained_at": entry.obtained_at,
-                    "expires_at": entry.expires_at,
-                    "agent_key": entry.agent_key,
-                    "agent_key_expires_at": entry.agent_key_expires_at,
-                    "tls": entry.tls,
-                }
-                refreshed = auth_mod.refresh_nous_oauth_from_state(
-                    nous_state,
-                    min_key_ttl_seconds=DEFAULT_AGENT_KEY_MIN_TTL_SECONDS,
-                    force_refresh=force,
-                    force_mint=force,
-                )
-                # Apply returned fields: dataclass fields via replace, extras via dict update
-                field_updates = {}
-                extra_updates = dict(entry.extra)
-                _field_names = {f.name for f in fields(entry)}
-                for k, v in refreshed.items():
-                    if k in _field_names:
-                        field_updates[k] = v
-                    elif k in _EXTRA_KEYS:
-                        extra_updates[k] = v
-                updated = replace(entry, extra=extra_updates, **field_updates)
-            else:
-                return entry
-        except Exception as exc:
-            logger.debug("Credential refresh failed for %s/%s: %s", self.provider, entry.id, exc)
-            self._mark_exhausted(entry, None)
-            return None
-
-        updated = replace(updated, last_status=STATUS_OK, last_status_at=None, last_error_code=None)
-        self._replace_entry(entry, updated)
-        self._persist()
-        return updated
-
-    def _entry_needs_refresh(self, entry: PooledCredential) -> bool:
-        if entry.auth_type != AUTH_TYPE_OAUTH:
-            return False
-        if self.provider == "anthropic":
-            if entry.expires_at_ms is None:
-                return False
-            return int(entry.expires_at_ms) <= int(time.time() * 1000) + 120_000
-        if self.provider == "openai-codex":
-            return _codex_access_token_is_expiring(
-                entry.access_token,
-                CODEX_ACCESS_TOKEN_REFRESH_SKEW_SECONDS,
-            )
-        if self.provider == "nous":
-            # Nous refresh/mint can require network access and should happen when
-            # runtime credentials are actually resolved, not merely when the pool
-            # is enumerated for listing, migration, or selection.
-            return False
-        return False
-
-    def mark_used(self, entry_id: Optional[str] = None) -> None:
-        """Increment request_count for tracking. Used by least_used strategy."""
-        target_id = entry_id or self._current_id
-        if not target_id:
-            return
-        with self._lock:
-            for idx, entry in enumerate(self._entries):
-                if entry.id == target_id:
-                    self._entries[idx] = replace(entry, request_count=entry.request_count + 1)
-                    return
-
-    def select(self) -> Optional[PooledCredential]:
-        with self._lock:
-            return self._select_unlocked()
-
-    def _available_entries(self, *, clear_expired: bool = False, refresh: bool = False) -> List[PooledCredential]:
-        """Return entries not currently in exhaustion cooldown.
-
-        When *clear_expired* is True, entries whose cooldown has elapsed are
-        reset to STATUS_OK and persisted.  When *refresh* is True, entries
-        that need a token refresh are refreshed (skipped on failure).
-        """
-        now = time.time()
-        cleared_any = False
-        available: List[PooledCredential] = []
-        for entry in self._entries:
-            if entry.last_status == STATUS_EXHAUSTED:
-                ttl = _exhausted_ttl(entry.last_error_code)
-                if entry.last_status_at and now - entry.last_status_at < ttl:
-                    continue
-                if clear_expired:
-                    cleared = replace(entry, last_status=STATUS_OK, last_status_at=None, last_error_code=None)
-                    self._replace_entry(entry, cleared)
-                    entry = cleared
-                    cleared_any = True
-            if refresh and self._entry_needs_refresh(entry):
-                refreshed = self._refresh_entry(entry, force=False)
-                if refreshed is None:
-                    continue
-                entry = refreshed
-            available.append(entry)
-        if cleared_any:
-            self._persist()
-        return available
-
-    def _select_unlocked(self) -> Optional[PooledCredential]:
-        available = self._available_entries(clear_expired=True, refresh=True)
-        if not available:
-            self._current_id = None
-            return None
-
-        if self._strategy == STRATEGY_RANDOM:
-            entry = random.choice(available)
-            self._current_id = entry.id
-            return entry
-
-        if self._strategy == STRATEGY_LEAST_USED and len(available) > 1:
-            entry = min(available, key=lambda e: e.request_count)
-            self._current_id = entry.id
-            return entry
-
-        if self._strategy == STRATEGY_ROUND_ROBIN and len(available) > 1:
-            entry = available[0]
-            rotated = [candidate for candidate in self._entries if candidate.id != entry.id]
-            rotated.append(replace(entry, priority=len(self._entries) - 1))
-            self._entries = [replace(candidate, priority=idx) for idx, candidate in enumerate(rotated)]
-            self._persist()
-            self._current_id = entry.id
-            return self.current() or entry
-
-        entry = available[0]
-        self._current_id = entry.id
-        return entry
-
-    def peek(self) -> Optional[PooledCredential]:
-        current = self.current()
-        if current is not None:
-            return current
-        available = self._available_entries()
-        return available[0] if available else None
-
-    def mark_exhausted_and_rotate(self, *, status_code: Optional[int]) -> Optional[PooledCredential]:
-        with self._lock:
-            entry = self.current() or self._select_unlocked()
-            if entry is None:
-                return None
-            self._mark_exhausted(entry, status_code)
-            self._current_id = None
-            return self._select_unlocked()
-
-    def try_refresh_current(self) -> Optional[PooledCredential]:
-        with self._lock:
-            return self._try_refresh_current_unlocked()
-
-    def _try_refresh_current_unlocked(self) -> Optional[PooledCredential]:
-        entry = self.current()
-        if entry is None:
-            return None
-        refreshed = self._refresh_entry(entry, force=True)
-        if refreshed is not None:
-            self._current_id = refreshed.id
-        return refreshed
-
-    def reset_statuses(self) -> int:
-        count = 0
-        new_entries = []
-        for entry in self._entries:
-            if entry.last_status or entry.last_status_at or entry.last_error_code:
-                new_entries.append(replace(entry, last_status=None, last_status_at=None, last_error_code=None))
-                count += 1
-            else:
-                new_entries.append(entry)
-        if count:
-            self._entries = new_entries
-            self._persist()
-        return count
-
-    def remove_index(self, index: int) -> Optional[PooledCredential]:
-        if index < 1 or index > len(self._entries):
-            return None
-        removed = self._entries.pop(index - 1)
-        self._entries = [
-            replace(entry, priority=new_priority)
-            for new_priority, entry in enumerate(self._entries)
-        ]
-        self._persist()
-        if self._current_id == removed.id:
-            self._current_id = None
-        return removed
-
-    def add_entry(self, entry: PooledCredential) -> PooledCredential:
-        entry = replace(entry, priority=_next_priority(self._entries))
-        self._entries.append(entry)
-        self._persist()
-        return entry
-
-
-def _upsert_entry(entries: List[PooledCredential], provider: str, source: str, payload: Dict[str, Any]) -> bool:
-    existing_idx = None
-    for idx, entry in enumerate(entries):
-        if entry.source == source:
-            existing_idx = idx
-            break
-
-    if existing_idx is None:
-        payload.setdefault("id", uuid.uuid4().hex[:6])
-        payload.setdefault("priority", _next_priority(entries))
-        payload.setdefault("label", payload.get("label") or source)
-        entries.append(PooledCredential.from_dict(provider, payload))
-        return True
-
-    existing = entries[existing_idx]
-    field_updates = {}
-    extra_updates = {}
-    _field_names = {f.name for f in fields(existing)}
-    for key, value in payload.items():
-        if key in {"id", "priority"} or value is None:
-            continue
-        if key == "label" and existing.label:
-            continue
-        if key in _field_names:
-            if getattr(existing, key) != value:
-                field_updates[key] = value
-        elif key in _EXTRA_KEYS:
-            if existing.extra.get(key) != value:
-                extra_updates[key] = value
-    if field_updates or extra_updates:
-        if extra_updates:
-            field_updates["extra"] = {**existing.extra, **extra_updates}
-        entries[existing_idx] = replace(existing, **field_updates)
-        return True
-    return False
-
-
-def _normalize_pool_priorities(provider: str, entries: List[PooledCredential]) -> bool:
-    if provider != "anthropic":
-        return False
-
-    source_rank = {
-        "env:ANTHROPIC_TOKEN": 0,
-        "env:CLAUDE_CODE_OAUTH_TOKEN": 1,
-        "hermes_pkce": 2,
-        "claude_code": 3,
-        "env:ANTHROPIC_API_KEY": 4,
-    }
-    manual_entries = sorted(
-        (entry for entry in entries if _is_manual_source(entry.source)),
-        key=lambda entry: entry.priority,
-    )
-    seeded_entries = sorted(
-        (entry for entry in entries if not _is_manual_source(entry.source)),
-        key=lambda entry: (
-            source_rank.get(entry.source, len(source_rank)),
-            entry.priority,
-            entry.label,
-        ),
-    )
-
-    ordered = [*manual_entries, *seeded_entries]
-    id_to_idx = {entry.id: idx for idx, entry in enumerate(entries)}
-    changed = False
-    for new_priority, entry in enumerate(ordered):
-        if entry.priority != new_priority:
-            entries[id_to_idx[entry.id]] = replace(entry, priority=new_priority)
-            changed = True
-    return changed
-
-
-def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tuple[bool, Set[str]]:
-    changed = False
-    active_sources: Set[str] = set()
-    auth_store = _load_auth_store()
-
-    if provider == "anthropic":
-        from agent.anthropic_adapter import read_claude_code_credentials, read_hermes_oauth_credentials
-
-        for source_name, creds in (
-            ("hermes_pkce", read_hermes_oauth_credentials()),
-            ("claude_code", read_claude_code_credentials()),
-        ):
-            if creds and creds.get("accessToken"):
-                active_sources.add(source_name)
-                changed |= _upsert_entry(
-                    entries,
-                    provider,
-                    source_name,
-                    {
-                        "source": source_name,
-                        "auth_type": AUTH_TYPE_OAUTH,
-                        "access_token": creds.get("accessToken", ""),
-                        "refresh_token": creds.get("refreshToken"),
-                        "expires_at_ms": creds.get("expiresAt"),
-                        "label": label_from_token(creds.get("accessToken", ""), source_name),
-                    },
-                )
-
-    elif provider == "nous":
-        state = _load_provider_state(auth_store, "nous")
-        if state:
-            active_sources.add("device_code")
-            changed |= _upsert_entry(
-                entries,
-                provider,
-                "device_code",
-                {
-                    "source": "device_code",
-                    "auth_type": AUTH_TYPE_OAUTH,
-                    "access_token": state.get("access_token", ""),
-                    "refresh_token": state.get("refresh_token"),
-                    "expires_at": state.get("expires_at"),
-                    "token_type": state.get("token_type"),
-                    "scope": state.get("scope"),
-                    "client_id": state.get("client_id"),
-                    "portal_base_url": state.get("portal_base_url"),
-                    "inference_base_url": state.get("inference_base_url"),
-                    "agent_key": state.get("agent_key"),
-                    "agent_key_expires_at": state.get("agent_key_expires_at"),
-                    "tls": state.get("tls") if isinstance(state.get("tls"), dict) else None,
-                    "label": label_from_token(state.get("access_token", ""), "device_code"),
-                },
-            )
-
-    elif provider == "openai-codex":
-        state = _load_provider_state(auth_store, "openai-codex")
-        tokens = state.get("tokens") if isinstance(state, dict) else None
-        if isinstance(tokens, dict) and tokens.get("access_token"):
-            active_sources.add("device_code")
-            changed |= _upsert_entry(
-                entries,
-                provider,
-                "device_code",
-                {
-                    "source": "device_code",
-                    "auth_type": AUTH_TYPE_OAUTH,
-                    "access_token": tokens.get("access_token", ""),
-                    "refresh_token": tokens.get("refresh_token"),
-                    "base_url": "https://chatgpt.com/backend-api/codex",
-                    "last_refresh": state.get("last_refresh"),
-                    "label": label_from_token(tokens.get("access_token", ""), "device_code"),
-                },
-            )
-
-    return changed, active_sources
-
-
-def _seed_from_env(provider: str, entries: List[PooledCredential]) -> Tuple[bool, Set[str]]:
-    changed = False
-    active_sources: Set[str] = set()
-    if provider == "openrouter":
-        token = os.getenv("OPENROUTER_API_KEY", "").strip()
-        if token:
-            source = "env:OPENROUTER_API_KEY"
-            active_sources.add(source)
-            changed |= _upsert_entry(
-                entries,
-                provider,
-                source,
-                {
-                    "source": source,
-                    "auth_type": AUTH_TYPE_API_KEY,
-                    "access_token": token,
-                    "base_url": OPENROUTER_BASE_URL,
-                    "label": "OPENROUTER_API_KEY",
-                },
-            )
-        return changed, active_sources
-
-    pconfig = PROVIDER_REGISTRY.get(provider)
-    if not pconfig or pconfig.auth_type != AUTH_TYPE_API_KEY:
-        return changed, active_sources
-
-    env_url = ""
-    if pconfig.base_url_env_var:
-        env_url = os.getenv(pconfig.base_url_env_var, "").strip().rstrip("/")
-
-    env_vars = list(pconfig.api_key_env_vars)
-    if provider == "anthropic":
-        env_vars = [
-            "ANTHROPIC_TOKEN",
-            "CLAUDE_CODE_OAUTH_TOKEN",
-            "ANTHROPIC_API_KEY",
-        ]
-
-    for env_var in env_vars:
-        token = os.getenv(env_var, "").strip()
-        if not token:
-            continue
-        source = f"env:{env_var}"
-        active_sources.add(source)
-        auth_type = AUTH_TYPE_OAUTH if provider == "anthropic" and not token.startswith("sk-ant-api") else AUTH_TYPE_API_KEY
-        base_url = env_url or pconfig.inference_base_url
-        changed |= _upsert_entry(
-            entries,
-            provider,
-            source,
-            {
-                "source": source,
-                "auth_type": auth_type,
-                "access_token": token,
-                "base_url": base_url,
-                "label": env_var,
-            },
-        )
-    return changed, active_sources
-
-
-def _prune_stale_seeded_entries(entries: List[PooledCredential], active_sources: Set[str]) -> bool:
-    retained = [
-        entry
-        for entry in entries
-        if _is_manual_source(entry.source)
-        or entry.source in active_sources
-        or not (
-            entry.source.startswith("env:")
-            or entry.source in {"claude_code", "hermes_pkce"}
-        )
-    ]
-    if len(retained) == len(entries):
-        return False
-    entries[:] = retained
-    return True
-
-
-def _seed_custom_pool(pool_key: str, entries: List[PooledCredential]) -> Tuple[bool, Set[str]]:
-    """Seed a custom endpoint pool from custom_providers config and model config."""
-    changed = False
-    active_sources: Set[str] = set()
-
-    # Seed from the custom_providers config entry's api_key field
-    cp_config = _get_custom_provider_config(pool_key)
-    if cp_config:
-        api_key = str(cp_config.get("api_key") or "").strip()
-        base_url = str(cp_config.get("base_url") or "").strip().rstrip("/")
-        name = str(cp_config.get("name") or "").strip()
-        if api_key:
-            source = f"config:{name}"
-            active_sources.add(source)
-            changed |= _upsert_entry(
-                entries,
-                pool_key,
-                source,
-                {
-                    "source": source,
-                    "auth_type": AUTH_TYPE_API_KEY,
-                    "access_token": api_key,
-                    "base_url": base_url,
-                    "label": name or source,
-                },
-            )
-
-    # Seed from model.api_key if model.provider=='custom' and model.base_url matches
-    try:
-        config = _load_config_safe()
-        model_cfg = config.get("model") if config else None
-        if isinstance(model_cfg, dict):
-            model_provider = str(model_cfg.get("provider") or "").strip().lower()
-            model_base_url = str(model_cfg.get("base_url") or "").strip().rstrip("/")
-            model_api_key = ""
-            for k in ("api_key", "api"):
-                v = model_cfg.get(k)
-                if isinstance(v, str) and v.strip():
-                    model_api_key = v.strip()
-                    break
-            if model_provider == "custom" and model_base_url and model_api_key:
-                # Check if this model's base_url matches our custom provider
-                matched_key = get_custom_provider_pool_key(model_base_url)
-                if matched_key == pool_key:
-                    source = "model_config"
-                    active_sources.add(source)
-                    changed |= _upsert_entry(
-                        entries,
-                        pool_key,
-                        source,
-                        {
-                            "source": source,
-                            "auth_type": AUTH_TYPE_API_KEY,
-                            "access_token": model_api_key,
-                            "base_url": model_base_url,
-                            "label": "model_config",
-                        },
-                    )
-    except Exception:
-        pass
-
-    return changed, active_sources
-
-
-def load_pool(provider: str) -> CredentialPool:
-    provider = (provider or "").strip().lower()
-    raw_entries = read_credential_pool(provider)
-    entries = [PooledCredential.from_dict(provider, payload) for payload in raw_entries]
-
-    if provider.startswith(CUSTOM_POOL_PREFIX):
-        # Custom endpoint pool — seed from custom_providers config and model config
-        custom_changed, custom_sources = _seed_custom_pool(provider, entries)
-        changed = custom_changed
-        changed |= _prune_stale_seeded_entries(entries, custom_sources)
-    else:
-        singleton_changed, singleton_sources = _seed_from_singletons(provider, entries)
-        env_changed, env_sources = _seed_from_env(provider, entries)
-        changed = singleton_changed or env_changed
-        changed |= _prune_stale_seeded_entries(entries, singleton_sources | env_sources)
-        changed |= _normalize_pool_priorities(provider, entries)
-
-    if changed:
-        write_credential_pool(
-            provider,
-            [entry.to_dict() for entry in sorted(entries, key=lambda item: item.priority)],
-        )
-    return CredentialPool(provider, entries)
@@ -10,9 +10,6 @@ import os
 import sys
 import threading
 import time
-from dataclasses import dataclass, field
-from difflib import unified_diff
-from pathlib import Path

 # ANSI escape codes for coloring tool failure indicators
 _RED = "\033[31m"
@@ -20,22 +17,6 @@ _RESET = "\033[0m"

 logger = logging.getLogger(__name__)

-_ANSI_RESET = "\033[0m"
-_ANSI_DIM = "\033[38;2;150;150;150m"
-_ANSI_FILE = "\033[38;2;180;160;255m"
-_ANSI_HUNK = "\033[38;2;120;120;140m"
-_ANSI_MINUS = "\033[38;2;255;255;255;48;2;120;20;20m"
-_ANSI_PLUS = "\033[38;2;255;255;255;48;2;20;90;20m"
-_MAX_INLINE_DIFF_FILES = 6
-_MAX_INLINE_DIFF_LINES = 80
-
-
-@dataclass
-class LocalEditSnapshot:
-    """Pre-tool filesystem snapshot used to render diffs locally after writes."""
-    paths: list[Path] = field(default_factory=list)
-    before: dict[str, str | None] = field(default_factory=dict)
-
 # =========================================================================
 # Configurable tool preview length (0 = no limit)
 # Set once at startup by CLI or gateway from display.tool_preview_length config.
@@ -237,300 +218,6 @@ def build_tool_preview(tool_name: str, args: dict, max_len: int | None = None) -
    return preview


-# =========================================================================
-# Inline diff previews for write actions
-# =========================================================================
-
-def _resolved_path(path: str) -> Path:
-    """Resolve a possibly-relative filesystem path against the current cwd."""
-    candidate = Path(os.path.expanduser(path))
-    if candidate.is_absolute():
-        return candidate
-    return Path.cwd() / candidate
-
-
-def _snapshot_text(path: Path) -> str | None:
-    """Return UTF-8 file content, or None for missing/unreadable files."""
-    try:
-        return path.read_text(encoding="utf-8")
-    except (FileNotFoundError, IsADirectoryError, UnicodeDecodeError, OSError):
-        return None
-
-
-def _display_diff_path(path: Path) -> str:
-    """Prefer cwd-relative paths in diffs when available."""
-    try:
-        return str(path.resolve().relative_to(Path.cwd().resolve()))
-    except Exception:
-        return str(path)
-
-
-def _resolve_skill_manage_paths(args: dict) -> list[Path]:
-    """Resolve skill_manage write targets to filesystem paths."""
-    action = args.get("action")
-    name = args.get("name")
-    if not action or not name:
-        return []
-
-    from tools.skill_manager_tool import _find_skill, _resolve_skill_dir
-
-    if action == "create":
-        skill_dir = _resolve_skill_dir(name, args.get("category"))
-        return [skill_dir / "SKILL.md"]
-
-    existing = _find_skill(name)
-    if not existing:
-        return []
-
-    skill_dir = Path(existing["path"])
-    if action in {"edit", "patch"}:
-        file_path = args.get("file_path")
-        return [skill_dir / file_path] if file_path else [skill_dir / "SKILL.md"]
-    if action in {"write_file", "remove_file"}:
-        file_path = args.get("file_path")
-        return [skill_dir / file_path] if file_path else []
-    if action == "delete":
-        files = [path for path in sorted(skill_dir.rglob("*")) if path.is_file()]
-        return files
-    return []
-
-
-def _resolve_local_edit_paths(tool_name: str, function_args: dict | None) -> list[Path]:
-    """Resolve local filesystem targets for write-capable tools."""
-    if not isinstance(function_args, dict):
-        return []
-
-    if tool_name == "write_file":
-        path = function_args.get("path")
-        return [_resolved_path(path)] if path else []
-
-    if tool_name == "patch":
-        path = function_args.get("path")
-        return [_resolved_path(path)] if path else []
-
-    if tool_name == "skill_manage":
-        return _resolve_skill_manage_paths(function_args)
-
-    return []
-
-
-def capture_local_edit_snapshot(tool_name: str, function_args: dict | None) -> LocalEditSnapshot | None:
-    """Capture before-state for local write previews."""
-    paths = _resolve_local_edit_paths(tool_name, function_args)
-    if not paths:
-        return None
-
-    snapshot = LocalEditSnapshot(paths=paths)
-    for path in paths:
-        snapshot.before[str(path)] = _snapshot_text(path)
-    return snapshot
-
-
-def _result_succeeded(result: str | None) -> bool:
-    """Conservatively detect whether a tool result represents success."""
-    if not result:
-        return False
-    try:
-        data = json.loads(result)
-    except (json.JSONDecodeError, TypeError):
-        return False
-    if not isinstance(data, dict):
-        return False
-    if data.get("error"):
-        return False
-    if "success" in data:
-        return bool(data.get("success"))
-    return True
-
-
-def _diff_from_snapshot(snapshot: LocalEditSnapshot | None) -> str | None:
-    """Generate unified diff text from a stored before-state and current files."""
-    if not snapshot:
-        return None
-
-    chunks: list[str] = []
-    for path in snapshot.paths:
-        before = snapshot.before.get(str(path))
-        after = _snapshot_text(path)
-        if before == after:
-            continue
-
-        display_path = _display_diff_path(path)
-        diff = "".join(
-            unified_diff(
-                [] if before is None else before.splitlines(keepends=True),
-                [] if after is None else after.splitlines(keepends=True),
-                fromfile=f"a/{display_path}",
-                tofile=f"b/{display_path}",
-            )
-        )
-        if diff:
-            chunks.append(diff)
-
-    if not chunks:
-        return None
-    return "".join(chunk if chunk.endswith("\n") else chunk + "\n" for chunk in chunks)
-
-
-def extract_edit_diff(
-    tool_name: str,
-    result: str | None,
-    *,
-    function_args: dict | None = None,
-    snapshot: LocalEditSnapshot | None = None,
-) -> str | None:
-    """Extract a unified diff from a file-edit tool result."""
-    if tool_name == "patch" and result:
-        try:
-            data = json.loads(result)
-        except (json.JSONDecodeError, TypeError):
-            data = None
-        if isinstance(data, dict):
-            diff = data.get("diff")
-            if isinstance(diff, str) and diff.strip():
-                return diff
-
-    if tool_name not in {"write_file", "patch", "skill_manage"}:
-        return None
-    if not _result_succeeded(result):
-        return None
-    return _diff_from_snapshot(snapshot)
-
-
-def _emit_inline_diff(diff_text: str, print_fn) -> bool:
-    """Emit rendered diff text through the CLI's prompt_toolkit-safe printer."""
-    if print_fn is None or not diff_text:
-        return False
-    try:
-        print_fn("  ┊ review diff")
-        for line in diff_text.rstrip("\n").splitlines():
-            print_fn(line)
-        return True
-    except Exception:
-        return False
-
-
-def _render_inline_unified_diff(diff: str) -> list[str]:
-    """Render unified diff lines in Hermes' inline transcript style."""
-    rendered: list[str] = []
-    from_file = None
-    to_file = None
-
-    for raw_line in diff.splitlines():
-        if raw_line.startswith("--- "):
-            from_file = raw_line[4:].strip()
-            continue
-        if raw_line.startswith("+++ "):
-            to_file = raw_line[4:].strip()
-            if from_file or to_file:
-                rendered.append(f"{_ANSI_FILE}{from_file or 'a/?'} → {to_file or 'b/?'}{_ANSI_RESET}")
-            continue
-        if raw_line.startswith("@@"):
-            rendered.append(f"{_ANSI_HUNK}{raw_line}{_ANSI_RESET}")
-            continue
-        if raw_line.startswith("-"):
-            rendered.append(f"{_ANSI_MINUS}{raw_line}{_ANSI_RESET}")
-            continue
-        if raw_line.startswith("+"):
-            rendered.append(f"{_ANSI_PLUS}{raw_line}{_ANSI_RESET}")
-            continue
-        if raw_line.startswith(" "):
-            rendered.append(f"{_ANSI_DIM}{raw_line}{_ANSI_RESET}")
-            continue
-        if raw_line:
-            rendered.append(raw_line)
-
-    return rendered
-
-
-def _split_unified_diff_sections(diff: str) -> list[str]:
-    """Split a unified diff into per-file sections."""
-    sections: list[list[str]] = []
-    current: list[str] = []
-
-    for line in diff.splitlines():
-        if line.startswith("--- ") and current:
-            sections.append(current)
-            current = [line]
-            continue
-        current.append(line)
-
-    if current:
-        sections.append(current)
-
-    return ["\n".join(section) for section in sections if section]
-
-
-def _summarize_rendered_diff_sections(
-    diff: str,
-    *,
-    max_files: int = _MAX_INLINE_DIFF_FILES,
-    max_lines: int = _MAX_INLINE_DIFF_LINES,
-) -> list[str]:
-    """Render diff sections while capping file count and total line count."""
-    sections = _split_unified_diff_sections(diff)
-    rendered: list[str] = []
-    omitted_files = 0
-    omitted_lines = 0
-
-    for idx, section in enumerate(sections):
-        if idx >= max_files:
-            omitted_files += 1
-            omitted_lines += len(_render_inline_unified_diff(section))
-            continue
-
-        section_lines = _render_inline_unified_diff(section)
-        remaining_budget = max_lines - len(rendered)
-        if remaining_budget <= 0:
-            omitted_lines += len(section_lines)
-            omitted_files += 1
-            continue
-
-        if len(section_lines) <= remaining_budget:
-            rendered.extend(section_lines)
-            continue
-
-        rendered.extend(section_lines[:remaining_budget])
-        omitted_lines += len(section_lines) - remaining_budget
-        omitted_files += 1 + max(0, len(sections) - idx - 1)
-        for leftover in sections[idx + 1:]:
-            omitted_lines += len(_render_inline_unified_diff(leftover))
-        break
-
-    if omitted_files or omitted_lines:
-        summary = f"… omitted {omitted_lines} diff line(s)"
-        if omitted_files:
-            summary += f" across {omitted_files} additional file(s)/section(s)"
-        rendered.append(f"{_ANSI_HUNK}{summary}{_ANSI_RESET}")
-
-    return rendered
-
-
-def render_edit_diff_with_delta(
-    tool_name: str,
-    result: str | None,
-    *,
-    function_args: dict | None = None,
-    snapshot: LocalEditSnapshot | None = None,
-    print_fn=None,
-) -> bool:
-    """Render an edit diff inline without taking over the terminal UI."""
-    diff = extract_edit_diff(
-        tool_name,
-        result,
-        function_args=function_args,
-        snapshot=snapshot,
-    )
-    if not diff:
-        return False
-    try:
-        rendered_lines = _summarize_rendered_diff_sections(diff)
-    except Exception as exc:
-        logger.debug("Could not render inline diff: %s", exc)
-        return False
-    return _emit_inline_diff("\n".join(rendered_lines), print_fn)
-
-
 # =========================================================================
 # KawaiiSpinner
 # =========================================================================
@@ -644,9 +644,6 @@ class InsightsEngine:
        lines.append(f"  Sessions:          {o['total_sessions']:<12}  Messages:        {o['total_messages']:,}")
        lines.append(f"  Tool calls:        {o['total_tool_calls']:<12,}  User messages:   {o['user_messages']:,}")
        lines.append(f"  Input tokens:      {o['total_input_tokens']:<12,}  Output tokens:   {o['total_output_tokens']:,}")
-        cache_total = o.get("total_cache_read_tokens", 0) + o.get("total_cache_write_tokens", 0)
-        if cache_total > 0:
-            lines.append(f"  Cache read:        {o['total_cache_read_tokens']:<12,}  Cache write:     {o['total_cache_write_tokens']:,}")
        cost_str = f"${o['estimated_cost']:.2f}"
        if o.get("models_without_pricing"):
            cost_str += " *"
@@ -749,11 +746,7 @@ class InsightsEngine:

        # Overview
        lines.append(f"**Sessions:** {o['total_sessions']} | **Messages:** {o['total_messages']:,} | **Tool calls:** {o['total_tool_calls']:,}")
-        cache_total = o.get("total_cache_read_tokens", 0) + o.get("total_cache_write_tokens", 0)
-        if cache_total > 0:
-            lines.append(f"**Tokens:** {o['total_tokens']:,} (in: {o['total_input_tokens']:,} / out: {o['total_output_tokens']:,} / cache: {cache_total:,})")
-        else:
-            lines.append(f"**Tokens:** {o['total_tokens']:,} (in: {o['total_input_tokens']:,} / out: {o['total_output_tokens']:,})")
+        lines.append(f"**Tokens:** {o['total_tokens']:,} (in: {o['total_input_tokens']:,} / out: {o['total_output_tokens']:,})")
        cost_note = ""
        if o.get("models_without_pricing"):
            cost_note = " _(excludes custom/self-hosted models)_"
@@ -176,7 +176,6 @@ _URL_TO_PROVIDER: Dict[str, str] = {
    "api.deepseek.com": "deepseek",
    "api.githubcopilot.com": "copilot",
    "models.github.ai": "copilot",
-    "api.fireworks.ai": "fireworks",
 }


@@ -43,7 +43,6 @@ PROVIDER_TO_MODELS_DEV: Dict[str, str] = {
    "opencode-zen": "opencode",
    "opencode-go": "opencode-go",
    "kilocode": "kilo",
-    "fireworks": "fireworks-ai",
 }


@@ -13,19 +13,11 @@ import re

 logger = logging.getLogger(__name__)

-# Snapshot at import time so runtime env mutations (e.g. LLM-generated
-# `export HERMES_REDACT_SECRETS=false`) cannot disable redaction mid-session.
-_REDACT_ENABLED = os.getenv("HERMES_REDACT_SECRETS", "").lower() not in ("0", "false", "no", "off")
-
 # Known API key prefixes -- match the prefix + contiguous token chars
 _PREFIX_PATTERNS = [
    r"sk-[A-Za-z0-9_-]{10,}",           # OpenAI / OpenRouter / Anthropic (sk-ant-*)
    r"ghp_[A-Za-z0-9]{10,}",            # GitHub PAT (classic)
    r"github_pat_[A-Za-z0-9_]{10,}",    # GitHub PAT (fine-grained)
-    r"gho_[A-Za-z0-9]{10,}",            # GitHub OAuth access token
-    r"ghu_[A-Za-z0-9]{10,}",            # GitHub user-to-server token
-    r"ghs_[A-Za-z0-9]{10,}",            # GitHub server-to-server token
-    r"ghr_[A-Za-z0-9]{10,}",            # GitHub refresh token
    r"xox[baprs]-[A-Za-z0-9-]{10,}",    # Slack tokens
    r"AIza[A-Za-z0-9_-]{30,}",          # Google API keys
    r"pplx-[A-Za-z0-9]{10,}",           # Perplexity
@@ -117,7 +109,7 @@ def redact_sensitive_text(text: str) -> str:
        text = str(text)
    if not text:
        return text
-    if not _REDACT_ENABLED:
+    if os.getenv("HERMES_REDACT_SECRETS", "").lower() in ("0", "false", "no", "off"):
        return text

    # Known prefixes (sk-, ghp_, etc.)
@@ -127,7 +127,6 @@ def resolve_turn_route(user_message: str, routing_config: Optional[Dict[str, Any
                "api_mode": primary.get("api_mode"),
                "command": primary.get("command"),
                "args": list(primary.get("args") or []),
-                "credential_pool": primary.get("credential_pool"),
            },
            "label": None,
            "signature": (
@@ -163,7 +162,6 @@ def resolve_turn_route(user_message: str, routing_config: Optional[Dict[str, Any
                "api_mode": primary.get("api_mode"),
                "command": primary.get("command"),
                "args": list(primary.get("args") or []),
-                "credential_pool": primary.get("credential_pool"),
            },
            "label": None,
            "signature": (
@@ -263,20 +263,17 @@ def load_cli_config() -> Dict[str, Any]:
                    # Old format: model is a dict with default/base_url
                    defaults["model"].update(file_config["model"])

-            # Legacy root-level provider/base_url fallback.
-            # Some users (or old code) put provider: / base_url: at the
-            # config root instead of inside the model: section.  These are
-            # only used as a FALLBACK when model.provider / model.base_url
-            # is not already set — never as an override.  The canonical
-            # location is model.provider (written by `hermes model`).
-            if not defaults["model"].get("provider"):
-                root_provider = file_config.get("provider")
-                if root_provider:
-                    defaults["model"]["provider"] = root_provider
-            if not defaults["model"].get("base_url"):
-                root_base_url = file_config.get("base_url")
-                if root_base_url:
-                    defaults["model"]["base_url"] = root_base_url
+            # Root-level provider and base_url override model config.
+            # Users may write:
+            #   model: kimi-k2.5:cloud
+            #   provider: custom
+            #   base_url: http://localhost:11434/v1
+            # These root-level keys must be merged into defaults["model"] so
+            # they are picked up by CLI provider resolution.
+            if "provider" in file_config and file_config["provider"]:
+                defaults["model"]["provider"] = file_config["provider"]
+            if "base_url" in file_config and file_config["base_url"]:
+                defaults["model"]["base_url"] = file_config["base_url"]
            
            # Deep merge file_config into defaults.
            # First: merge keys that exist in both (deep-merge dicts, overwrite scalars)
@@ -994,10 +991,9 @@ def save_config_value(key_path: str, value: any) -> bool:
            current = current[key]
        current[keys[-1]] = value
        
-        # Save back atomically — write to temp file + fsync + os.replace
-        # so an interrupt never leaves config.yaml truncated or empty.
-        from utils import atomic_yaml_write
-        atomic_yaml_write(config_path, config)
+        # Save back
+        with open(config_path, 'w') as f:
+            yaml.dump(config, f, default_flow_style=False, sort_keys=False)
        
        # Enforce owner-only permissions on config files (contain API keys)
        try:
@@ -1077,16 +1073,12 @@ class HermesCLI:
        # streaming: stream tokens to the terminal as they arrive (display.streaming in config.yaml)
        self.streaming_enabled = CLI_CONFIG["display"].get("streaming", False)

-        # Inline diff previews for write actions (display.inline_diffs in config.yaml)
-        self._inline_diffs_enabled = CLI_CONFIG["display"].get("inline_diffs", True)
-
        # Streaming display state
        self._stream_buf = ""        # Partial line buffer for line-buffered rendering
        self._stream_started = False  # True once first delta arrives
        self._stream_box_opened = False  # True once the response box header is printed
        self._reasoning_stream_started = False  # True once live reasoning starts streaming
        self._reasoning_preview_buf = ""  # Coalesce tiny reasoning chunks for [thinking] output
-        self._pending_edit_snapshots = {}
        
        # Configuration - priority: CLI args > env vars > config file
        # Model comes from: CLI arg or config.yaml (single source of truth).
@@ -1132,9 +1124,9 @@ class HermesCLI:
        self.acp_args: list[str] = []
        self.base_url = (
            base_url
-            or CLI_CONFIG["model"].get("base_url", "")
-            or os.getenv("OPENROUTER_BASE_URL", "")
-        ) or None
+            or os.getenv("OPENAI_BASE_URL")
+            or os.getenv("OPENROUTER_BASE_URL", CLI_CONFIG["model"]["base_url"])
+        )
        # Match key to resolved base_url: OpenRouter URL → prefer OPENROUTER_API_KEY,
        # custom endpoint → prefer OPENAI_API_KEY (issue #560).
        # Note: _ensure_runtime_credentials() re-resolves this before first use.
@@ -1963,7 +1955,6 @@ class HermesCLI:
        resolved_api_mode = runtime.get("api_mode", self.api_mode)
        resolved_acp_command = runtime.get("command")
        resolved_acp_args = list(runtime.get("args") or [])
-        resolved_credential_pool = runtime.get("credential_pool")
        if not isinstance(api_key, str) or not api_key:
            # Custom / local endpoints (llama.cpp, ollama, vLLM, etc.) often
            # don't require authentication.  When a base_url IS configured but
@@ -1996,7 +1987,6 @@ class HermesCLI:
        self.api_mode = resolved_api_mode
        self.acp_command = resolved_acp_command
        self.acp_args = resolved_acp_args
-        self._credential_pool = resolved_credential_pool
        self._provider_source = runtime.get("source")
        self.api_key = api_key
        self.base_url = base_url
@@ -2028,7 +2018,6 @@ class HermesCLI:
                "api_mode": self.api_mode,
                "command": self.acp_command,
                "args": list(self.acp_args or []),
-                "credential_pool": getattr(self, "_credential_pool", None),
            },
        )

@@ -2099,7 +2088,6 @@ class HermesCLI:
                "api_mode": self.api_mode,
                "command": self.acp_command,
                "args": list(self.acp_args or []),
-                "credential_pool": getattr(self, "_credential_pool", None),
            }
            effective_model = model_override or self.model
            self.agent = AIAgent(
@@ -2110,7 +2098,6 @@ class HermesCLI:
                api_mode=runtime.get("api_mode"),
                acp_command=runtime.get("command"),
                acp_args=runtime.get("args"),
-                credential_pool=runtime.get("credential_pool"),
                max_iterations=self.max_turns,
                enabled_toolsets=self.enabled_toolsets,
                verbose_logging=self.verbose,
@@ -2136,8 +2123,6 @@ class HermesCLI:
                checkpoint_max_snapshots=self.checkpoint_max_snapshots,
                pass_session_id=self.pass_session_id,
                tool_progress_callback=self._on_tool_progress,
-                tool_start_callback=self._on_tool_start if self._inline_diffs_enabled else None,
-                tool_complete_callback=self._on_tool_complete if self._inline_diffs_enabled else None,
                stream_delta_callback=self._stream_delta if self.streaming_enabled else None,
                tool_gen_callback=self._on_tool_gen_start if self.streaming_enabled else None,
            )
@@ -2169,12 +2154,6 @@ class HermesCLI:
    def show_banner(self):
        """Display the welcome banner in Claude Code style."""
        self.console.clear()
-
-        # Get context length for display before branching so it remains
-        # available to the low-context warning logic in compact mode too.
-        ctx_len = None
-        if hasattr(self, 'agent') and self.agent and hasattr(self.agent, 'context_compressor'):
-            ctx_len = self.agent.context_compressor.context_length
        
        # Auto-compact for narrow terminals — the full banner with caduceus
        # + tool list needs ~80 columns minimum to render without wrapping.
@@ -2191,6 +2170,11 @@ class HermesCLI:
            # Get terminal working directory (where commands will execute)
            cwd = os.getenv("TERMINAL_CWD", os.getcwd())
            
+            # Get context length for display
+            ctx_len = None
+            if hasattr(self, 'agent') and self.agent and hasattr(self.agent, 'context_compressor'):
+                ctx_len = self.agent.context_compressor.context_length
+            
            # Build and display the banner
            build_welcome_banner(
                console=self.console,
@@ -2204,31 +2188,7 @@ class HermesCLI:
        
        # Show tool availability warnings if any tools are disabled
        self._show_tool_availability_warnings()
-
-        # Warn about very low context lengths (common with local servers)
-        if ctx_len and ctx_len <= 8192:
-            self.console.print()
-            self.console.print(
-                f"[yellow]⚠️  Context length is only {ctx_len:,} tokens — "
-                f"this is likely too low for agent use with tools.[/]"
-            )
-            self.console.print(
-                "[dim]   Hermes needs 16k–32k minimum. Tool schemas + system prompt alone use ~4k–8k.[/]"
-            )
-            base_url = getattr(self, "base_url", "") or ""
-            if "11434" in base_url or "ollama" in base_url.lower():
-                self.console.print(
-                    "[dim]   Ollama fix: OLLAMA_CONTEXT_LENGTH=32768 ollama serve[/]"
-                )
-            elif "1234" in base_url:
-                self.console.print(
-                    "[dim]   LM Studio fix: Set context length in model settings → reload model[/]"
-                )
-            else:
-                self.console.print(
-                    "[dim]   Fix: Set model.context_length in config.yaml, or increase your server's context setting[/]"
-                )
-
+        
        self.console.print()

    def _preload_resumed_session(self) -> bool:
@@ -2877,28 +2837,6 @@ class HermesCLI:
        print("  Example: python cli.py --toolsets web,terminal")
        print()
    
-    def _handle_profile_command(self):
-        """Display active profile name and home directory."""
-        from hermes_constants import get_hermes_home, display_hermes_home
-
-        home = get_hermes_home()
-        display = display_hermes_home()
-
-        profiles_parent = Path.home() / ".hermes" / "profiles"
-        try:
-            rel = home.relative_to(profiles_parent)
-            profile_name = str(rel).split("/")[0]
-        except ValueError:
-            profile_name = None
-
-        print()
-        if profile_name:
-            print(f"  Profile: {profile_name}")
-        else:
-            print("  Profile: default")
-        print(f"  Home:    {display}")
-        print()
-
    def show_config(self):
        """Display current configuration with kawaii ASCII art."""
        # Get terminal config from environment (which was set from cli-config.yaml)
@@ -3279,7 +3217,7 @@ class HermesCLI:
                        print(f"      {mid}{current_marker}")
                elif p["id"] == "custom":
                    from hermes_cli.models import _get_custom_base_url
-                    custom_url = _get_custom_base_url()
+                    custom_url = _get_custom_base_url() or os.getenv("OPENAI_BASE_URL", "")
                    if custom_url:
                        print(f"      endpoint: {custom_url}")
                    if is_active:
@@ -3741,8 +3679,6 @@ class HermesCLI:
            return False
        elif canonical == "help":
            self.show_help()
-        elif canonical == "profile":
-            self._handle_profile_command()
        elif canonical == "tools":
            self._handle_tools_command(cmd_original)
        elif canonical == "toolsets":
@@ -3900,8 +3836,6 @@ class HermesCLI:
            self.console.print(f"  Status bar {state}")
        elif canonical == "verbose":
            self._toggle_verbose()
-        elif canonical == "yolo":
-            self._toggle_yolo()
        elif canonical == "reasoning":
            self._handle_reasoning_command(cmd_original)
        elif canonical == "compress":
@@ -3944,8 +3878,6 @@ class HermesCLI:
            self._handle_stop_command()
        elif canonical == "background":
            self._handle_background_command(cmd_original)
-        elif canonical == "btw":
-            self._handle_btw_command(cmd_original)
        elif canonical == "queue":
            # Extract prompt after "/queue " or "/q "
            parts = cmd_original.split(None, 1)
@@ -4232,121 +4164,6 @@ class HermesCLI:
        self._background_tasks[task_id] = thread
        thread.start()

-    def _handle_btw_command(self, cmd: str):
-        """Handle /btw <question> — ephemeral side question using session context.
-
-        Snapshots the current conversation history, spawns a no-tools agent in
-        a background thread, and prints the answer without persisting anything
-        to the main session.
-        """
-        parts = cmd.strip().split(maxsplit=1)
-        if len(parts) < 2 or not parts[1].strip():
-            _cprint("  Usage: /btw <question>")
-            _cprint("  Example: /btw what module owns session title sanitization?")
-            _cprint("  Answers using session context. No tools, not persisted.")
-            return
-
-        question = parts[1].strip()
-        task_id = f"btw_{datetime.now().strftime('%H%M%S')}_{uuid.uuid4().hex[:6]}"
-
-        if not self._ensure_runtime_credentials():
-            _cprint("  (>_<) Cannot start /btw: no valid credentials.")
-            return
-
-        turn_route = self._resolve_turn_agent_config(question)
-        history_snapshot = list(self.conversation_history)
-
-        preview = question[:60] + ("..." if len(question) > 60 else "")
-        _cprint(f'  💬 /btw: "{preview}"')
-
-        def run_btw():
-            try:
-                btw_agent = AIAgent(
-                    model=turn_route["model"],
-                    api_key=turn_route["runtime"].get("api_key"),
-                    base_url=turn_route["runtime"].get("base_url"),
-                    provider=turn_route["runtime"].get("provider"),
-                    api_mode=turn_route["runtime"].get("api_mode"),
-                    acp_command=turn_route["runtime"].get("command"),
-                    acp_args=turn_route["runtime"].get("args"),
-                    max_iterations=8,
-                    enabled_toolsets=[],
-                    quiet_mode=True,
-                    verbose_logging=False,
-                    session_id=task_id,
-                    platform="cli",
-                    reasoning_config=self.reasoning_config,
-                    providers_allowed=self._providers_only,
-                    providers_ignored=self._providers_ignore,
-                    providers_order=self._providers_order,
-                    provider_sort=self._provider_sort,
-                    provider_require_parameters=self._provider_require_params,
-                    provider_data_collection=self._provider_data_collection,
-                    fallback_model=self._fallback_model,
-                    session_db=None,
-                    skip_memory=True,
-                    skip_context_files=True,
-                    persist_session=False,
-                )
-
-                btw_prompt = (
-                    "[Ephemeral /btw side question. Answer using the conversation "
-                    "context. No tools available. Be direct and concise.]\n\n"
-                    + question
-                )
-                result = btw_agent.run_conversation(
-                    user_message=btw_prompt,
-                    conversation_history=history_snapshot,
-                    task_id=task_id,
-                    sync_honcho=False,
-                )
-
-                response = (result.get("final_response") or "") if result else ""
-                if not response and result and result.get("error"):
-                    response = f"Error: {result['error']}"
-
-                # TUI refresh before printing
-                if self._app:
-                    self._app.invalidate()
-                    time.sleep(0.05)
-                print()
-
-                if response:
-                    try:
-                        from hermes_cli.skin_engine import get_active_skin
-                        _skin = get_active_skin()
-                        _resp_color = _skin.get_color("response_border", "#4F6D4A")
-                    except Exception:
-                        _resp_color = "#4F6D4A"
-
-                    ChatConsole().print(Panel(
-                        _rich_text_from_ansi(response),
-                        title=f"[{_resp_color} bold]⚕ /btw[/]",
-                        title_align="left",
-                        border_style=_resp_color,
-                        box=rich_box.HORIZONTALS,
-                        padding=(1, 2),
-                    ))
-                else:
-                    _cprint("  💬 /btw: (no response)")
-
-                if self.bell_on_complete:
-                    sys.stdout.write("\a")
-                    sys.stdout.flush()
-
-            except Exception as e:
-                if self._app:
-                    self._app.invalidate()
-                    time.sleep(0.05)
-                print()
-                _cprint(f"  ❌ /btw failed: {e}")
-            finally:
-                if self._app:
-                    self._invalidate(min_interval=0)
-
-        thread = threading.Thread(target=run_btw, daemon=True, name=f"btw-{task_id}")
-        thread.start()
-
    @staticmethod
    def _try_launch_chrome_debug(port: int, system: str) -> bool:
        """Try to launch Chrome/Chromium with remote debugging enabled.
@@ -4617,17 +4434,6 @@ class HermesCLI:
        }
        _cprint(labels.get(self.tool_progress_mode, ""))

-    def _toggle_yolo(self):
-        """Toggle YOLO mode — skip all dangerous command approval prompts."""
-        import os
-        current = bool(os.environ.get("HERMES_YOLO_MODE"))
-        if current:
-            os.environ.pop("HERMES_YOLO_MODE", None)
-            self.console.print("  ⚠ YOLO mode [bold red]OFF[/] — dangerous commands will require approval.")
-        else:
-            os.environ["HERMES_YOLO_MODE"] = "1"
-            self.console.print("  ⚡ YOLO mode [bold green]ON[/] — all commands auto-approved. Use with caution.")
-
    def _handle_reasoning_command(self, cmd: str):
        """Handle /reasoning — manage effort level and display toggle.

@@ -5040,33 +4846,6 @@ class HermesCLI:
        except Exception:
            pass

-    def _on_tool_start(self, tool_call_id: str, function_name: str, function_args: dict):
-        """Capture local before-state for write-capable tools."""
-        try:
-            from agent.display import capture_local_edit_snapshot
-
-            snapshot = capture_local_edit_snapshot(function_name, function_args)
-            if snapshot is not None:
-                self._pending_edit_snapshots[tool_call_id] = snapshot
-        except Exception:
-            logger.debug("Edit snapshot capture failed for %s", function_name, exc_info=True)
-
-    def _on_tool_complete(self, tool_call_id: str, function_name: str, function_args: dict, function_result: str):
-        """Render file edits with inline diff after write-capable tools complete."""
-        snapshot = self._pending_edit_snapshots.pop(tool_call_id, None)
-        try:
-            from agent.display import render_edit_diff_with_delta
-
-            render_edit_diff_with_delta(
-                function_name,
-                function_result,
-                function_args=function_args,
-                snapshot=snapshot,
-                print_fn=_cprint,
-            )
-        except Exception:
-            logger.debug("Edit diff preview failed for %s", function_name, exc_info=True)
-
    # ====================================================================
    # Voice mode methods
    # ====================================================================
@@ -5781,8 +5560,6 @@ class HermesCLI:
            self.agent = None

        # Initialize agent if needed
-        if self.agent is None:
-            _cprint(f"{_DIM}Initializing agent...{_RST}")
        if not self._init_agent(
            model_override=turn_route["model"],
            runtime_override=turn_route["runtime"],
@@ -6378,17 +6155,6 @@ class HermesCLI:

    def run(self):
        """Run the interactive CLI loop with persistent input at bottom."""
-        # Push the entire TUI to the bottom of the terminal so the banner,
-        # responses, and prompt all appear pinned to the bottom — empty
-        # space stays above, not below.  This prints enough blank lines to
-        # scroll the cursor to the last row before any content is rendered.
-        try:
-            _term_lines = shutil.get_terminal_size().lines
-            if _term_lines > 2:
-                print("\n" * (_term_lines - 1), end="", flush=True)
-        except Exception:
-            pass
-
        self.show_banner()

        # One-line Honcho session indicator (TTY-only, not captured by agent).
@@ -7614,7 +7380,6 @@ class HermesCLI:
                    finally:
                        self._agent_running = False
                        self._spinner_text = ""
-
                        app.invalidate()  # Refresh status line

                        # Continuous voice: auto-restart recording after agent responds.
@@ -7643,20 +7408,6 @@ class HermesCLI:
        # Register atexit cleanup so resources are freed even on unexpected exit
        atexit.register(_run_cleanup)
        
-        # Register signal handlers for graceful shutdown on SSH disconnect / SIGTERM
-        def _signal_handler(signum, frame):
-            """Handle SIGHUP/SIGTERM by triggering graceful cleanup."""
-            logger.debug("Received signal %s, triggering graceful shutdown", signum)
-            raise KeyboardInterrupt()
-        
-        try:
-            import signal as _signal
-            _signal.signal(_signal.SIGTERM, _signal_handler)
-            if hasattr(_signal, 'SIGHUP'):
-                _signal.signal(_signal.SIGHUP, _signal_handler)
-        except Exception:
-            pass  # Signal handlers may fail in restricted environments
-        
        # Install a custom asyncio exception handler that suppresses the
        # "Event loop is closed" RuntimeError from httpx transport cleanup.
        # This is defense-in-depth — the primary fix is neuter_async_httpx_del
@@ -7680,7 +7431,7 @@ class HermesCLI:
                except Exception:
                    pass
                app.run()
-        except (EOFError, KeyboardInterrupt, BrokenPipeError):
+        except (EOFError, KeyboardInterrupt):
            pass
        finally:
            self._should_exit = True
@@ -7719,23 +7470,6 @@ class HermesCLI:
                    self._session_db.end_session(self.agent.session_id, "cli_close")
                except (Exception, KeyboardInterrupt) as e:
                    logger.debug("Could not close session in DB: %s", e)
-            # Plugin hook: on_session_end — safety net for interrupted exits.
-            # run_conversation() already fires this per-turn on normal completion,
-            # so only fire here if the agent was mid-turn (_agent_running) when
-            # the exit occurred, meaning run_conversation's hook didn't fire.
-            if self.agent and getattr(self, '_agent_running', False):
-                try:
-                    from hermes_cli.plugins import invoke_hook as _invoke_hook
-                    _invoke_hook(
-                        "on_session_end",
-                        session_id=self.agent.session_id,
-                        completed=False,
-                        interrupted=True,
-                        model=getattr(self.agent, 'model', None),
-                        platform=getattr(self.agent, 'platform', None) or "cli",
-                    )
-                except Exception:
-                    pass
            _run_cleanup()
            self._print_exit_summary()

@@ -13,6 +13,7 @@ Core layers:
 Concrete environments:
    - terminal_test_env/: Simple file-creation tasks for testing the stack
    - hermes_swe_env/: SWE-bench style tasks with Modal sandboxes
+    - endless_terminals/: Terminal tasks from HuggingFace dataset with Apptainer containers

 Benchmarks (eval-only):
    - benchmarks/terminalbench_2/: Terminal-Bench 2.0 evaluation
@@ -1,324 +0,0 @@
-"""
-HermesAgent for tau2-bench evaluation.
-
-Implements the tau2 HalfDuplexAgent interface using litellm with OpenRouter,
-matching the inference path used across the rest of the Hermes Agent codebase.
-
-Usage:
-    python environments/benchmarks/taubench/run_eval.py \\
-        --model anthropic/claude-sonnet-4-5 \\
-        --base-url openrouter \\
-        --env retail
-"""
-
-import json
-import os
-import sys
-from pathlib import Path
-from typing import Optional
-
-import litellm
-from pydantic import BaseModel
-
-_repo_root = Path(__file__).resolve().parent.parent.parent.parent
-if str(_repo_root) not in sys.path:
-    sys.path.insert(0, str(_repo_root))
-
-from environments.tool_call_parsers import get_parser
-
-from tau2.agent.base_agent import HalfDuplexAgent, ValidAgentInputMessage
-from tau2.data_model.message import (
-    AssistantMessage,
-    Message,
-    MultiToolMessage,
-    SystemMessage,
-    ToolCall,
-    ToolMessage,
-    UserMessage,
-)
-from tau2.environment.tool import Tool
-
-
-class HermesAgentState(BaseModel):
-    system_messages: list[SystemMessage]
-    messages: list
-
-
-class HermesAgent(HalfDuplexAgent[HermesAgentState]):
-    """
-    tau2 HalfDuplexAgent backed by litellm, using OpenRouter (or any
-    OpenAI-compatible endpoint).
-
-    Registered as "hermes_agent" in the tau2 registry by run_eval.py.
-    """
-
-    SYSTEM_PROMPT = (
-        "You are a customer service agent that helps the user according to the "
-        "<policy> provided below.\n"
-        "In each turn you can either:\n"
-        "- Send a message to the user.\n"
-        "- Make a tool call.\n"
-        "You cannot do both at the same time.\n\n"
-        "Try to be helpful and always follow the policy. "
-        "Always make sure you generate valid JSON only.\n\n"
-        "<policy>\n{domain_policy}\n</policy>"
-    )
-
-    # System prompt variant for qwen3_coder tool format — tools are embedded
-    # directly in the system prompt as <tools> XML instead of passed via the
-    # OpenAI tools= parameter.
-    SYSTEM_PROMPT_QWEN3_CODER = (
-        "You are a customer service agent that helps the user according to the "
-        "<policy> provided below.\n"
-        "In each turn you can either:\n"
-        "- Send a message to the user.\n"
-        "- Make a tool call.\n"
-        "You cannot do both at the same time.\n\n"
-        "Try to be helpful and always follow the policy. "
-        "Always make sure you generate valid JSON only.\n\n"
-        "You may call one or more functions to assist with the user query.\n\n"
-        "You are provided with function signatures within <tools></tools> XML tags:\n"
-        "<tools>\n{tools_json}\n</tools>\n\n"
-        "<policy>\n{domain_policy}\n</policy>"
-    )
-
-    def __init__(
-        self,
-        tools: list[Tool],
-        domain_policy: str,
-        model: str,
-        base_url: Optional[str] = None,
-        api_key: Optional[str] = None,
-        temperature: float = 0.0,
-        max_tokens: Optional[int] = None,
-        top_p: Optional[float] = None,
-        thinking: bool = False,
-        tool_parser: Optional[str] = None,
-    ):
-        super().__init__(tools=tools, domain_policy=domain_policy)
-        self.model = model
-        self.base_url = base_url
-        self.api_key = api_key
-        self.temperature = temperature
-        self.max_tokens = max_tokens
-        self.top_p = top_p
-        self.thinking = thinking
-        self.tool_parser = tool_parser
-        self._parser = get_parser(tool_parser) if tool_parser else None
-
-        # OpenRouter requires specific headers; pass them via litellm extra_headers
-        self._extra_headers: dict = {}
-        if base_url and "openrouter" in base_url.lower():
-            self._extra_headers = {
-                "HTTP-Referer": "https://hermes-agent.nousresearch.com",
-                "X-Title": "Hermes Agent",
-            }
-
-    @property
-    def system_prompt(self) -> str:
-        if self.tool_parser == "qwen3_coder" and self.tools:
-            tools_json = json.dumps(
-                [t.openai_schema for t in self.tools], indent=2, ensure_ascii=False
-            )
-            return self.SYSTEM_PROMPT_QWEN3_CODER.format(
-                tools_json=tools_json,
-                domain_policy=self.domain_policy,
-            )
-        return self.SYSTEM_PROMPT.format(domain_policy=self.domain_policy)
-
-    def get_init_state(
-        self, message_history: Optional[list[Message]] = None
-    ) -> HermesAgentState:
-        return HermesAgentState(
-            system_messages=[SystemMessage(role="system", content=self.system_prompt)],
-            messages=list(message_history or []),
-        )
-
-    def generate_next_message(
-        self, message: ValidAgentInputMessage, state: HermesAgentState
-    ) -> tuple[AssistantMessage, HermesAgentState]:
-        # Append incoming message(s) to history
-        if isinstance(message, MultiToolMessage):
-            state.messages.extend(message.tool_messages)
-        else:
-            state.messages.append(message)
-
-        # Build litellm-compatible message list
-        all_messages = state.system_messages + state.messages
-        lm_messages = [_to_litellm_message(m) for m in all_messages]
-
-        kwargs = dict(
-            model=self.model,
-            messages=lm_messages,
-            temperature=self.temperature,
-        )
-        if self.tools:
-            kwargs["tools"] = [t.openai_schema for t in self.tools]
-        if self.max_tokens is not None:
-            kwargs["max_tokens"] = self.max_tokens
-        if self.top_p is not None:
-            kwargs["top_p"] = self.top_p
-        # Enable thinking/reasoning mode. OpenRouter exposes this as
-        # `include_reasoning` for nemotron (per supported_parameters in the
-        # model metadata). Pass via extra_body to bypass litellm filtering.
-        if self.thinking:
-            kwargs["extra_body"] = {"include_reasoning": True}
-        # Only pass base_url when model doesn't already have a provider prefix
-        # (litellm uses either the prefix OR base_url, not both)
-        if self.base_url and not self.model.startswith("openrouter/"):
-            kwargs["base_url"] = self.base_url
-        if self.api_key:
-            kwargs["api_key"] = self.api_key
-        if self._extra_headers:
-            kwargs["extra_headers"] = self._extra_headers
-
-        response = litellm.completion(**kwargs)
-        assistant_msg = _litellm_response_to_assistant_message(response, parser=self._parser)
-
-        state.messages.append(assistant_msg)
-        return assistant_msg, state
-
-
-# ---------------------------------------------------------------------------
-# Conversion helpers
-# ---------------------------------------------------------------------------
-
-
-def _to_litellm_message(msg) -> dict:
-    """Convert a tau2 message object to a litellm-compatible dict."""
-    if isinstance(msg, SystemMessage):
-        return {"role": "system", "content": msg.content or ""}
-
-    if isinstance(msg, UserMessage):
-        if msg.tool_calls:
-            # User tool calls (tau2 v2 feature — user has tools too)
-            return {
-                "role": "user",
-                "content": msg.content or "",
-                "tool_calls": [_tool_call_to_dict(tc) for tc in msg.tool_calls],
-            }
-        return {"role": "user", "content": msg.content or ""}
-
-    if isinstance(msg, AssistantMessage):
-        d: dict = {"role": "assistant", "content": msg.content or ""}
-        if msg.tool_calls:
-            d["tool_calls"] = [_tool_call_to_dict(tc) for tc in msg.tool_calls]
-        return d
-
-    if isinstance(msg, ToolMessage):
-        return {
-            "role": "tool",
-            "tool_call_id": msg.id,
-            "content": msg.content or "",
-        }
-
-    # Fallback
-    return {"role": getattr(msg, "role", "user"), "content": str(getattr(msg, "content", ""))}
-
-
-def _tool_call_to_dict(tc: ToolCall) -> dict:
-    import json
-    return {
-        "id": tc.id or "call_0",
-        "type": "function",
-        "function": {
-            "name": tc.name,
-            "arguments": json.dumps(tc.arguments),
-        },
-    }
-
-
-def _litellm_response_to_assistant_message(response, parser=None) -> AssistantMessage:
-    """Convert a litellm ModelResponse to a tau2 AssistantMessage."""
-    import json
-
-    choice = response.choices[0]
-    msg = choice.message
-
-    content = msg.content or ""
-    tool_calls_raw = getattr(msg, "tool_calls", None)
-
-    tau2_tool_calls: Optional[list[ToolCall]] = None
-
-    if parser and content:
-        # Use the custom tool parser (e.g. qwen3_coder) to extract tool calls
-        # from the raw text response.
-        parsed_content, parsed_tool_calls = parser.parse(content)
-        if parsed_tool_calls:
-            content = parsed_content or ""
-            tau2_tool_calls = []
-            for tc in parsed_tool_calls:
-                try:
-                    arguments = json.loads(tc.function.arguments or "{}")
-                except json.JSONDecodeError:
-                    arguments = {}
-                tau2_tool_calls.append(
-                    ToolCall(
-                        id=tc.id or "call_0",
-                        name=tc.function.name,
-                        arguments=arguments,
-                        requestor="assistant",
-                    )
-                )
-    elif tool_calls_raw:
-        tau2_tool_calls = []
-        for tc in tool_calls_raw:
-            if hasattr(tc, "function"):
-                name = tc.function.name
-                try:
-                    arguments = json.loads(tc.function.arguments or "{}")
-                except json.JSONDecodeError:
-                    arguments = {}
-                tau2_tool_calls.append(
-                    ToolCall(
-                        id=tc.id or "call_0",
-                        name=name,
-                        arguments=arguments,
-                        requestor="assistant",
-                    )
-                )
-
-    cost = None
-    try:
-        cost = litellm.completion_cost(response)
-    except Exception:
-        pass
-
-    usage = None
-    if hasattr(response, "usage") and response.usage:
-        usage = dict(response.usage)
-
-    return AssistantMessage(
-        role="assistant",
-        content=content if not tau2_tool_calls else None,
-        tool_calls=tau2_tool_calls,
-        cost=cost,
-        usage=usage,
-    )
-
-
-def create_hermes_agent(tools: list[Tool], domain_policy: str, **kwargs) -> HermesAgent:
-    """
-    Factory function registered with the tau2 registry.
-
-    Expected kwargs:
-        model (str): litellm model string
-        base_url (str): API base URL (optional)
-        api_key (str): API key (optional)
-        temperature (float): sampling temperature (default 0.0)
-        top_p (float): nucleus sampling (optional)
-        max_tokens (int): max tokens (optional)
-        thinking (bool): enable reasoning/thinking mode (default False)
-    """
-    return HermesAgent(
-        tools=tools,
-        domain_policy=domain_policy,
-        model=kwargs["model"],
-        base_url=kwargs.get("base_url"),
-        api_key=kwargs.get("api_key"),
-        temperature=kwargs.get("temperature", 0.0),
-        top_p=kwargs.get("top_p"),
-        max_tokens=kwargs.get("max_tokens"),
-        thinking=kwargs.get("thinking", False),
-        tool_parser=kwargs.get("tool_parser"),
-    )
@@ -1,288 +0,0 @@
-"""
-tau2-bench evaluation runner for Hermes Agent.
-
-Runs the tau2-bench retail, airline, telecom, or banking_knowledge evaluation
-using HermesAgent backed by litellm — the same inference path used across the
-rest of the Hermes Agent codebase.
-
-Usage:
-    # Against OpenRouter (auto-detects OPENROUTER_API_KEY)
-    python environments/benchmarks/taubench/run_eval.py \\
-        --model openrouter/anthropic/claude-sonnet-4-5 \\
-        --base-url openrouter \\
-        --env retail
-
-    # Against OpenAI directly
-    python environments/benchmarks/taubench/run_eval.py \\
-        --model gpt-4o \\
-        --env retail
-
-    # Local vLLM
-    python environments/benchmarks/taubench/run_eval.py \\
-        --model openai/NousResearch/Hermes-3-Llama-3.1-70B \\
-        --base-url http://localhost:8000/v1 \\
-        --env retail \\
-        --num-trials 3
-
-    # Specific tasks only
-    python environments/benchmarks/taubench/run_eval.py \\
-        --model openrouter/anthropic/claude-sonnet-4-5 \\
-        --base-url openrouter \\
-        --env retail \\
-        --task-ids task_1 task_2 task_5
-
-Results are saved to results/tau2bench/ as JSON.
-
-Dependencies (requires Python 3.12+):
-    pip install "tau2 @ git+https://github.com/sierra-research/tau2-bench.git"
-    # or: pip install -e ".[tau2bench]"
-"""
-
-import argparse
-import logging
-import os
-import sys
-from pathlib import Path
-from typing import Optional
-
-_repo_root = Path(__file__).resolve().parent.parent.parent.parent
-if str(_repo_root) not in sys.path:
-    sys.path.insert(0, str(_repo_root))
-
-from tau2.data_model.simulation import Results, TextRunConfig
-from tau2.evaluator.evaluator import EvaluationType
-from tau2.registry import registry
-from tau2.runner.batch import run_tasks
-from tau2.runner.helpers import get_tasks
-
-from environments.benchmarks.taubench.hermes_agent import create_hermes_agent
-
-logging.basicConfig(
-    level=logging.INFO, format="%(asctime)s %(levelname)s %(name)s: %(message)s"
-)
-logger = logging.getLogger(__name__)
-
-OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"
-AGENT_NAME = "hermes_agent"
-
-
-def _register_agent(
-    model: str,
-    base_url: Optional[str],
-    api_key: Optional[str],
-    temperature: float,
-    top_p: Optional[float],
-    max_tokens: Optional[int],
-    thinking: bool,
-    tool_parser: Optional[str],
-) -> None:
-    """Register the HermesAgent factory with the tau2 registry (idempotent)."""
-    if registry.get_agent_factory(AGENT_NAME) is not None:
-        return
-
-    def factory(tools, domain_policy, **kwargs):
-        return create_hermes_agent(
-            tools=tools,
-            domain_policy=domain_policy,
-            model=model,
-            base_url=base_url,
-            api_key=api_key,
-            temperature=temperature,
-            top_p=top_p,
-            max_tokens=max_tokens,
-            thinking=thinking,
-            tool_parser=tool_parser,
-        )
-
-    registry.register_agent_factory(factory=factory, name=AGENT_NAME)
-    logger.info("Registered agent factory: %s (model=%s, thinking=%s, tool_parser=%s)", AGENT_NAME, model, thinking, tool_parser)
-
-
-def run_eval(
-    model: str,
-    base_url: Optional[str],
-    api_key: Optional[str],
-    user_model: str,
-    env_name: str,
-    task_split: Optional[str],
-    num_trials: int,
-    max_concurrency: int,
-    max_steps: int,
-    temperature: float,
-    top_p: Optional[float],
-    max_tokens: Optional[int],
-    thinking: bool,
-    tool_parser: Optional[str],
-    task_ids: Optional[list],
-    start_index: int,
-    end_index: int,
-    log_dir: str,
-    seed: int,
-) -> Results:
-    # Resolve OpenRouter shorthand
-    if base_url and base_url.strip().lower() == "openrouter":
-        base_url = OPENROUTER_BASE_URL
-
-    is_openrouter = base_url and "openrouter" in base_url.lower()
-
-    # litellm requires the "openrouter/" prefix to route correctly
-    if is_openrouter and not model.startswith("openrouter/"):
-        model = f"openrouter/{model}"
-    if is_openrouter and not user_model.startswith("openrouter/"):
-        user_model = f"openrouter/{user_model}"
-
-    # Resolve API key
-    if is_openrouter:
-        api_key = api_key or os.environ.get("OPENROUTER_API_KEY") or os.environ.get("OPENAI_API_KEY")
-        # litellm reads OPENAI_API_KEY for base_url overrides; set it so the
-        # user simulator's generate() call also authenticates correctly.
-        if api_key and not os.environ.get("OPENAI_API_KEY"):
-            os.environ["OPENAI_API_KEY"] = api_key
-    else:
-        api_key = api_key or os.environ.get("OPENAI_API_KEY")
-
-    _register_agent(
-        model=model,
-        base_url=base_url,
-        api_key=api_key,
-        temperature=temperature,
-        top_p=top_p,
-        max_tokens=max_tokens,
-        thinking=thinking,
-        tool_parser=tool_parser,
-    )
-
-    # Load tasks — task_ids in tau2 are strings like "task_1"
-    tasks = get_tasks(
-        task_set_name=env_name,
-        task_split_name=task_split,
-        task_ids=[str(i) for i in task_ids] if task_ids else None,
-    )
-
-    if not task_ids and (end_index != -1 or start_index != 0):
-        end = end_index if end_index != -1 else len(tasks)
-        tasks = tasks[start_index:end]
-
-    logger.info(
-        "Running tau2-%s eval: %d tasks, %d trial(s), concurrency=%d",
-        env_name, len(tasks), num_trials, max_concurrency,
-    )
-
-    save_path = Path(log_dir) / f"tau2-{env_name}-{model.split('/')[-1]}.json"
-    save_path.parent.mkdir(parents=True, exist_ok=True)
-
-    # Pass api_key/base_url to user sim via llm_args so tau2's generate() authenticates.
-    # When using OpenRouter for the user sim, mirror the agent's key + endpoint.
-    user_llm_args: dict = {}
-    if is_openrouter and api_key:
-        user_llm_args["api_key"] = api_key
-        user_llm_args["base_url"] = base_url
-
-    config = TextRunConfig(
-        domain=env_name,
-        agent=AGENT_NAME,
-        user="user_simulator",
-        llm_agent=model,
-        llm_args_agent={},
-        llm_user=user_model,
-        llm_args_user=user_llm_args,
-        num_trials=num_trials,
-        max_steps=max_steps,
-        max_concurrency=max_concurrency,
-        seed=seed,
-    )
-
-    results = run_tasks(
-        config,
-        tasks,
-        save_path=save_path,
-        console_display=True,
-        # ALL: respects each task's reward_basis. NL assertions are skipped
-        # gracefully (scored as pass) rather than raising an error, so tasks
-        # are evaluated only on their actual basis components (DB, ACTION, etc.)
-        evaluation_type=EvaluationType.ALL,
-    )
-
-    logger.info("Results saved to %s", save_path)
-    return results
-
-
-def main():
-    parser = argparse.ArgumentParser(
-        description="Run tau2-bench evaluation with Hermes Agent (requires Python 3.12+)",
-        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
-    )
-    parser.add_argument(
-        "--model", required=True,
-        help="litellm model string, e.g. 'openrouter/anthropic/claude-sonnet-4-5' or 'gpt-4o'",
-    )
-    parser.add_argument(
-        "--base-url", default=None,
-        help="API base URL. Use 'openrouter' as shorthand for https://openrouter.ai/api/v1.",
-    )
-    parser.add_argument("--api-key", default=None, help="API key (falls back to OPENROUTER_API_KEY / OPENAI_API_KEY)")
-    parser.add_argument("--temperature", type=float, default=1.0,
-                        help="Sampling temperature. NVIDIA used 1.0 for nemotron-super.")
-    parser.add_argument("--top-p", type=float, default=0.95,
-                        help="Nucleus sampling. NVIDIA used 0.95 for nemotron-super.")
-    parser.add_argument("--max-tokens", type=int, default=None)
-    parser.add_argument("--thinking", action="store_true", default=False,
-                        help="Enable reasoning/thinking mode (use_reasoning=true). "
-                             "Required to match NVIDIA's reported nemotron-super scores.")
-    parser.add_argument("--tool-parser", default=None,
-                        help="Tool call parser to use (e.g. 'qwen3_coder'). When set, tools are "
-                             "embedded in the system prompt as <tools> XML and responses are parsed "
-                             "from raw text instead of using OpenAI function calling format.")
-    parser.add_argument(
-        "--user-model", default="qwen/qwen3-235b-a22b-2507:nitro",
-        help="litellm model string for the tau2 user simulator. "
-             "Defaults to qwen/qwen3-235b-a22b-2507:nitro (instruct, non-thinking) to match NVIDIA's eval setup. "
-             "When using --base-url openrouter the openrouter/ prefix is added automatically.",
-    )
-    parser.add_argument(
-        "--env", default="retail",
-        choices=["retail", "airline", "telecom", "banking_knowledge", "mock"],
-    )
-    parser.add_argument(
-        "--task-split", default=None,
-        help="Task split name (e.g. 'base'). Defaults to the domain default.",
-    )
-    parser.add_argument("--num-trials", type=int, default=1)
-    parser.add_argument("--max-concurrency", type=int, default=8)
-    parser.add_argument("--max-steps", type=int, default=50)
-    parser.add_argument(
-        "--task-ids", nargs="*", default=None,
-        help="Specific task IDs to run (tau2 task IDs are strings like 'task_1')",
-    )
-    parser.add_argument("--start-index", type=int, default=0)
-    parser.add_argument("--end-index", type=int, default=-1)
-    parser.add_argument("--seed", type=int, default=10)
-    parser.add_argument("--log-dir", default="results/tau2bench")
-
-    args = parser.parse_args()
-
-    run_eval(
-        model=args.model,
-        base_url=args.base_url,
-        api_key=args.api_key,
-        user_model=args.user_model,
-        env_name=args.env,
-        task_split=args.task_split,
-        num_trials=args.num_trials,
-        max_concurrency=args.max_concurrency,
-        max_steps=args.max_steps,
-        temperature=args.temperature,
-        top_p=args.top_p,
-        max_tokens=args.max_tokens,
-        thinking=args.thinking,
-        tool_parser=args.tool_parser,
-        task_ids=args.task_ids,
-        start_index=args.start_index,
-        end_index=args.end_index,
-        log_dir=args.log_dir,
-        seed=args.seed,
-    )
-
-
-if __name__ == "__main__":
-    main()
@@ -0,0 +1,5 @@
+"""Endless Terminals Environment - Terminal task training from HuggingFace dataset."""
+
+from .endless_terminals_env import EndlessTerminalsEnv, EndlessTerminalsEnvConfig
+
+__all__ = ["EndlessTerminalsEnv", "EndlessTerminalsEnvConfig"]
@@ -0,0 +1,91 @@
+# Endless Terminals - Qwen3-4B-Instruct-2507
+# Single config for both trainer (launch_training.py) and env (endless_terminals_env.py serve)
+#
+# Usage:
+#   Terminal 1: run-api
+#   Terminal 2: cd tinker-atropos && python launch_training.py --config ../environments/endless_terminals/tinker_qwen.yaml
+#   Terminal 3: python environments/endless_terminals/endless_terminals_env.py serve --config environments/endless_terminals/tinker_qwen.yaml
+
+env:
+  # Toolsets
+  enabled_toolsets: ["terminal", "file"]
+
+  # Model / tokenizer
+  tokenizer_name: "Qwen/Qwen3-4B-Instruct-2507"
+
+  # Agent configuration
+  max_agent_turns: 16
+  max_token_length: 2048
+  agent_temperature: 0.6
+  extra_body:
+    chat_template_kwargs:
+      enable_thinking: false
+  tool_call_parser: "hermes"
+
+  # Terminal backend
+  terminal_backend: "docker"
+
+  # Dataset settings
+  use_dataset: true
+  dataset_name: "obiwan96/endless-terminals"
+  dataset_split: "train"
+  dataset_cache_dir: "~/.cache/huggingface/datasets"
+  tasks_base_dir: "/Users/samherring/Desktop/Projects/Hermes-Agent/endless-terminals"
+
+  # Test execution
+  test_timeout_s: 180
+  default_docker_image: "ubuntu:22.04"
+  max_concurrent_containers: 16
+
+  # Training configuration
+  group_size: 16
+  batch_size: 64          # 4 groups × 16 rollouts per step
+  total_steps: 500
+  steps_per_eval: 5
+  min_items_sent_before_logging: 1
+  ensure_scores_are_not_same: true
+  max_num_workers: 2048
+  worker_timeout: 3600
+  inference_weight: 1.0
+  eval_limit_ratio: 0.1
+  rollout_server_url: "http://localhost:8000"
+
+  # Evaluation configuration
+  num_eval_tasks: 20
+  eval_split_ratio: 0.1
+
+  # Logging
+  use_wandb: true
+  wandb_name: "endless-terminals-qwen3-4b"
+
+  # System prompt
+  system_prompt: >
+    You are a skilled Linux system administrator and programmer.
+    You have access to a terminal and file tools to complete system administration
+    and programming tasks. Use the tools effectively to solve the given task,
+    and verify your solution works correctly before finishing.
+    Keep each command short and focused — break complex tasks into multiple steps
+    rather than writing long one-liners.
+
+tinker:
+  lora_rank: 32
+  learning_rate: 0.0000005
+  max_token_trainer_length: 32768
+  checkpoint_dir: "./temp/"
+  save_checkpoint_interval: 50
+  wandb_project: "endless-terminals"
+  wandb_group: null
+  wandb_run_name: "qwen3-4b"
+  tool_call_parser: "hermes"
+
+openai:
+  - model_name: "Qwen/Qwen3-4B-Instruct-2507"
+    base_url: "http://localhost:8001/v1"
+    api_key: "x"
+    weight: 1.0
+    num_requests_for_eval: 64
+    timeout: 600
+    server_type: "sglang"
+
+slurm: false
+testing: false
@@ -298,7 +298,6 @@ class HermesAgentBaseEnv(BaseEnv):
            return False

        server = self.server.servers[0]
-        # If the server is an OpenAI server (not VLLM/SGLang), use direct mode
        from atroposlib.envs.server_handling.openai_server import OpenAIServer
        return not isinstance(server, OpenAIServer)

@@ -48,7 +48,13 @@ class HermesToolCallParser(ToolCallParser):
                if not raw_json.strip():
                    continue

-                tc_data = json.loads(raw_json)
+                try:
+                    tc_data = json.loads(raw_json)
+                except json.JSONDecodeError:
+                    # Fix invalid backslash escapes from shell commands in JSON strings
+                    # e.g. \s \w \d → \\s \\w \\d (valid escapes such as \n and \" are left untouched)
+                    fixed = re.sub(r'\\([^"\\/bfnrtu0-9\n])', r'\\\\\1', raw_json)
+                    tc_data = json.loads(fixed)
                tool_calls.append(
                    ChatCompletionMessageToolCall(
                        id=f"call_{uuid.uuid4().hex[:8]}",
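Review note on the escape-repair fallback in the hunk above: a minimal, standalone sketch of the same retry logic, for checking which inputs it repairs. The helper name `loads_with_escape_repair` and the module-level `_INVALID_ESCAPE` constant are hypothetical and exist only for this illustration; the pattern and replacement string are copied from the added lines.

```python
import json
import re

# Same pattern as in the hunk: a backslash followed by a character that
# does not start a valid JSON escape. Digits and a literal newline are
# also excluded by the character class.
_INVALID_ESCAPE = re.compile(r'\\([^"\\/bfnrtu0-9\n])')


def loads_with_escape_repair(raw_json: str):
    """Parse JSON, retrying once with invalid backslash escapes doubled."""
    try:
        return json.loads(raw_json)
    except json.JSONDecodeError:
        # Turn e.g. \s into \\s so it decodes to a literal backslash + "s".
        fixed = _INVALID_ESCAPE.sub(r'\\\\\1', raw_json)
        return json.loads(fixed)


if __name__ == "__main__":
    raw = r'{"command": "grep -E \"\d+\s\w\" file"}'
    print(loads_with_escape_repair(raw)["command"])
```

One limitation worth knowing about: when a payload mixes an already-escaped backslash (such as `\\s`) with an invalid escape (such as `\d`), the pattern also matches the second backslash of `\\s`, so the retry still raises `json.JSONDecodeError`. Callers probably want to catch that second failure.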
@@ -27,16 +27,9 @@ def _coerce_bool(value: Any, default: bool = True) -> bool:
        return default
    if isinstance(value, bool):
        return value
-    if isinstance(value, int):
-        return value != 0
    if isinstance(value, str):
-        lowered = value.strip().lower()
-        if lowered in ("true", "1", "yes", "on"):
-            return True
-        if lowered in ("false", "0", "no", "off"):
-            return False
-        return default
-    return default
+        return value.strip().lower() in ("true", "1", "yes", "on")
+    return bool(value)


 def _normalize_unauthorized_dm_behavior(value: Any, default: str = "pair") -> str:
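Review note on the `_coerce_bool` simplification in the hunk above: the sketch below reproduces the new branch order to show how the result changes for unrecognised input. The function name `coerce_bool` is hypothetical, and the opening `if value is None` check is an assumption, since that line falls outside the hunk.

```python
from typing import Any


def coerce_bool(value: Any, default: bool = True) -> bool:
    # Assumed guard: the hunk only shows the `return default` line.
    if value is None:
        return default
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        # Any string outside this set is now False, not `default`.
        return value.strip().lower() in ("true", "1", "yes", "on")
    # Non-string, non-bool values fall back to Python truthiness.
    return bool(value)


if __name__ == "__main__":
    for sample in (None, "yes", " ON ", "maybe", "", 0, 2, []):
        print(repr(sample), "->", coerce_bool(sample))
```

With the removed code, an unrecognised string such as `"maybe"` or an empty string returned `default`, as did non-integer objects such as lists. After this change those inputs resolve to `False` or to their truthiness, so a misspelled config value can silently disable a setting that defaults to `True`.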
@@ -550,8 +543,6 @@ def load_gateway_config() -> GatewayConfig:
                    os.environ["DISCORD_FREE_RESPONSE_CHANNELS"] = str(frc)
                if "auto_thread" in discord_cfg and not os.getenv("DISCORD_AUTO_THREAD"):
                    os.environ["DISCORD_AUTO_THREAD"] = str(discord_cfg["auto_thread"]).lower()
-                if "reactions" in discord_cfg and not os.getenv("DISCORD_REACTIONS"):
-                    os.environ["DISCORD_REACTIONS"] = str(discord_cfg["reactions"]).lower()

            # Telegram settings → env vars (env vars take precedence)
            telegram_cfg = yaml_cfg.get("telegram", {})
@@ -70,15 +70,12 @@ class DeliveryTarget:
        if target == "local":
            return cls(platform=Platform.LOCAL)
        
-        # Check for platform:chat_id or platform:chat_id:thread_id format
+        # Check for platform:chat_id format
        if ":" in target:
-            parts = target.split(":", 2)
-            platform_str = parts[0]
-            chat_id = parts[1] if len(parts) > 1 else None
-            thread_id = parts[2] if len(parts) > 2 else None
+            platform_str, chat_id = target.split(":", 1)
            try:
                platform = Platform(platform_str)
-                return cls(platform=platform, chat_id=chat_id, thread_id=thread_id, is_explicit=True)
+                return cls(platform=platform, chat_id=chat_id, is_explicit=True)
            except ValueError:
                # Unknown platform, treat as local
                return cls(platform=Platform.LOCAL)
@@ -97,8 +94,6 @@ class DeliveryTarget:
            return "origin"
        if self.platform == Platform.LOCAL:
            return "local"
-        if self.chat_id and self.thread_id:
-            return f"{self.platform.value}:{self.chat_id}:{self.thread_id}"
        if self.chat_id:
            return f"{self.platform.value}:{self.chat_id}"
        return self.platform.value
@@ -380,7 +380,6 @@ class APIServerAdapter(BasePlatformAdapter):
        ephemeral_system_prompt: Optional[str] = None,
        session_id: Optional[str] = None,
        stream_delta_callback=None,
-        tool_progress_callback=None,
    ) -> Any:
        """
        Create an AIAgent instance using the gateway's runtime config.
@@ -413,7 +412,6 @@ class APIServerAdapter(BasePlatformAdapter):
            session_id=session_id,
            platform="api_server",
            stream_delta_callback=stream_delta_callback,
-            tool_progress_callback=tool_progress_callback,
        )
        return agent

@@ -516,15 +514,6 @@ class APIServerAdapter(BasePlatformAdapter):
                if delta is not None:
                    _stream_q.put(delta)

-            def _on_tool_progress(name, preview, args):
-                """Inject tool progress into the SSE stream for Open WebUI."""
-                if name.startswith("_"):
-                    return  # Skip internal events (_thinking)
-                from agent.display import get_tool_emoji
-                emoji = get_tool_emoji(name)
-                label = preview or name
-                _stream_q.put(f"\n`{emoji} {label}`\n")
-
            # Start agent in background.  agent_ref is a mutable container
            # so the SSE writer can interrupt the agent on client disconnect.
            agent_ref = [None]
@@ -534,7 +523,6 @@ class APIServerAdapter(BasePlatformAdapter):
                ephemeral_system_prompt=system_prompt,
                session_id=session_id,
                stream_delta_callback=_on_delta,
-                tool_progress_callback=_on_tool_progress,
                agent_ref=agent_ref,
            ))

@@ -1206,7 +1194,6 @@ class APIServerAdapter(BasePlatformAdapter):
        ephemeral_system_prompt: Optional[str] = None,
        session_id: Optional[str] = None,
        stream_delta_callback=None,
-        tool_progress_callback=None,
        agent_ref: Optional[list] = None,
    ) -> tuple:
        """
@@ -1227,7 +1214,6 @@ class APIServerAdapter(BasePlatformAdapter):
                ephemeral_system_prompt=ephemeral_system_prompt,
                session_id=session_id,
                stream_delta_callback=stream_delta_callback,
-                tool_progress_callback=tool_progress_callback,
            )
            if agent_ref is not None:
                agent_ref[0] = agent
@@ -408,7 +408,7 @@ class VoiceReceiver:
 class DiscordAdapter(BasePlatformAdapter):
    """
    Discord bot adapter.
-
+    
    Handles:
    - Receiving messages from servers and DMs
    - Sending responses with Discord markdown
@@ -418,10 +418,10 @@ class DiscordAdapter(BasePlatformAdapter):
    - Auto-threading for long conversations
    - Reaction-based feedback
    """
-
+    
    # Discord message limits
    MAX_MESSAGE_LENGTH = 2000
-
+    
    # Auto-disconnect from voice channel after this many seconds of inactivity
    VOICE_TIMEOUT = 300

@@ -449,7 +449,7 @@ class DiscordAdapter(BasePlatformAdapter):
        self._bot_task: Optional[asyncio.Task] = None
        # Cap to prevent unbounded growth (Discord threads get archived).
        self._MAX_TRACKED_THREADS = 500
-
+    
    async def connect(self) -> bool:
        """Connect to Discord and start receiving events."""
        if not DISCORD_AVAILABLE:
@@ -480,11 +480,11 @@ class DiscordAdapter(BasePlatformAdapter):
                    logger.warning("Opus codec found at %s but failed to load", opus_path)
            if not discord.opus.is_loaded():
                logger.warning("Opus codec not found — voice channel playback disabled")
-
+        
        if not self.config.token:
            logger.error("[%s] No bot token configured", self.name)
            return False
-
+        
        try:
            # Acquire scoped lock to prevent duplicate bot token usage
            from gateway.status import acquire_scoped_lock
@@ -504,13 +504,13 @@ class DiscordAdapter(BasePlatformAdapter):
            intents.guild_messages = True
            intents.members = True
            intents.voice_states = True
-
+            
            # Create bot
            self._client = commands.Bot(
                command_prefix="!",  # Not really used, we handle raw messages
                intents=intents,
            )
-
+            
            # Parse allowed user entries (may contain usernames or IDs)
            allowed_env = os.getenv("DISCORD_ALLOWED_USERS", "")
            if allowed_env:
@@ -518,17 +518,17 @@ class DiscordAdapter(BasePlatformAdapter):
                    _clean_discord_id(uid) for uid in allowed_env.split(",")
                    if uid.strip()
                }
-
+            
            adapter_self = self  # capture for closure
-
+            
            # Register event handlers
            @self._client.event
            async def on_ready():
                logger.info("[%s] Connected as %s", adapter_self.name, adapter_self._client.user)
-
+                
                # Resolve any usernames in the allowed list to numeric IDs
                await adapter_self._resolve_allowed_usernames()
-
+                
                # Sync slash commands with Discord
                try:
                    synced = await adapter_self._client.tree.sync()
@@ -536,22 +536,18 @@ class DiscordAdapter(BasePlatformAdapter):
                except Exception as e:  # pragma: no cover - defensive logging
                    logger.warning("[%s] Slash command sync failed: %s", adapter_self.name, e, exc_info=True)
                adapter_self._ready_event.set()
-
+            
            @self._client.event
            async def on_message(message: DiscordMessage):
                # Always ignore our own messages
                if message.author == self._client.user:
                    return
-
+                
                # Ignore Discord system messages (thread renames, pins, member joins, etc.)
                # Allow both default and reply types — replies have a distinct MessageType.
                if message.type not in (discord.MessageType.default, discord.MessageType.reply):
                    return
-
-                # Check if the message author is in the allowed user list
-                if not self._is_allowed_user(str(message.author.id)):
-                    return
-
+                
                # Bot message filtering (DISCORD_ALLOW_BOTS):
                #   "none"     — ignore all other bots (default)
                #   "mentions" — accept bot messages only when they @mention us
@@ -564,7 +560,7 @@ class DiscordAdapter(BasePlatformAdapter):
                        if not self._client.user or self._client.user not in message.mentions:
                            return
                    # "all" falls through to handle_message
-
+                
                # If the message @mentions other users but NOT the bot, the
                # sender is talking to someone else — stay silent.  Only
                # applies in server channels; in DMs the user is always
@@ -618,23 +614,23 @@ class DiscordAdapter(BasePlatformAdapter):

            # Register slash commands
            self._register_slash_commands()
-
+            
            # Start the bot in background
            self._bot_task = asyncio.create_task(self._client.start(self.config.token))
-
+            
            # Wait for ready
            await asyncio.wait_for(self._ready_event.wait(), timeout=30)
-
+            
            self._running = True
            return True
-
+            
        except asyncio.TimeoutError:
            logger.error("[%s] Timeout waiting for connection to Discord", self.name, exc_info=True)
            return False
        except Exception as e:  # pragma: no cover - defensive logging
            logger.error("[%s] Failed to connect to Discord: %s", self.name, e, exc_info=True)
            return False
-
+    
    async def disconnect(self) -> None:
        """Disconnect from Discord."""
        # Clean up all active voice connections before closing the client
@@ -687,27 +683,19 @@ class DiscordAdapter(BasePlatformAdapter):
            logger.debug("[%s] remove_reaction failed (%s): %s", self.name, emoji, e)
            return False

-    def _reactions_enabled(self) -> bool:
-        """Check if message reactions are enabled via config/env."""
-        return os.getenv("DISCORD_REACTIONS", "true").lower() not in ("false", "0", "no")
-
    async def on_processing_start(self, event: MessageEvent) -> None:
        """Add an in-progress reaction for normal Discord message events."""
-        if not self._reactions_enabled():
-            return
        message = event.raw_message
        if hasattr(message, "add_reaction"):
            await self._add_reaction(message, "👀")

    async def on_processing_complete(self, event: MessageEvent, success: bool) -> None:
        """Swap the in-progress reaction for a final success/failure reaction."""
-        if not self._reactions_enabled():
-            return
        message = event.raw_message
        if hasattr(message, "add_reaction"):
            await self._remove_reaction(message, "👀")
            await self._add_reaction(message, "✅" if success else "❌")
-
+    
    async def send(
        self,
        chat_id: str,
@@ -724,24 +712,24 @@ class DiscordAdapter(BasePlatformAdapter):
            channel = self._client.get_channel(int(chat_id))
            if not channel:
                channel = await self._client.fetch_channel(int(chat_id))
-
+            
            if not channel:
                return SendResult(success=False, error=f"Channel {chat_id} not found")
-
+            
            # Format and split message if needed
            formatted = self.format_message(content)
            chunks = self.truncate_message(formatted, self.MAX_MESSAGE_LENGTH)
-
+            
            message_ids = []
            reference = None
-
+            
            if reply_to:
                try:
                    ref_msg = await channel.fetch_message(int(reply_to))
                    reference = ref_msg
                except Exception as e:
                    logger.debug("Could not fetch reply-to message: %s", e)
-
+            
            for i, chunk in enumerate(chunks):
                chunk_reference = reference if i == 0 else None
                try:
@@ -768,13 +756,13 @@ class DiscordAdapter(BasePlatformAdapter):
                    else:
                        raise
                message_ids.append(str(msg.id))
-
+            
            return SendResult(
                success=True,
                message_id=message_ids[0] if message_ids else None,
                raw_response={"message_ids": message_ids}
            )
-
+            
        except Exception as e:  # pragma: no cover - defensive logging
            logger.error("[%s] Failed to send Discord message: %s", self.name, e, exc_info=True)
            return SendResult(success=False, error=str(e))
@@ -1246,25 +1234,25 @@ class DiscordAdapter(BasePlatformAdapter):
        """Send an image natively as a Discord file attachment."""
        if not self._client:
            return SendResult(success=False, error="Not connected")
-
+        
        try:
            import aiohttp
-
+            
            channel = self._client.get_channel(int(chat_id))
            if not channel:
                channel = await self._client.fetch_channel(int(chat_id))
            if not channel:
                return SendResult(success=False, error=f"Channel {chat_id} not found")
-
+            
            # Download the image and send as a Discord file attachment
            # (Discord renders attachments inline, unlike plain URLs)
            async with aiohttp.ClientSession() as session:
                async with session.get(image_url, timeout=aiohttp.ClientTimeout(total=30)) as resp:
                    if resp.status != 200:
                        raise Exception(f"Failed to download image: HTTP {resp.status}")
-
+                    
                    image_data = await resp.read()
-
+                    
                    # Determine filename from URL or content type
                    content_type = resp.headers.get("content-type", "image/png")
                    ext = "png"
@@ -1274,16 +1262,16 @@ class DiscordAdapter(BasePlatformAdapter):
                        ext = "gif"
                    elif "webp" in content_type:
                        ext = "webp"
-
+                    
                    import io
                    file = discord.File(io.BytesIO(image_data), filename=f"image.{ext}")
-
+                    
                    msg = await channel.send(
                        content=caption if caption else None,
                        file=file,
                    )
                    return SendResult(success=True, message_id=str(msg.id))
-
+        
        except ImportError:
            logger.warning(
                "[%s] aiohttp not installed, falling back to URL. Run: pip install aiohttp",
@@ -1334,7 +1322,7 @@ class DiscordAdapter(BasePlatformAdapter):
        except Exception as e:  # pragma: no cover - defensive logging
            logger.error("[%s] Failed to send document, falling back to base adapter: %s", self.name, e, exc_info=True)
            return await super().send_document(chat_id, file_path, caption, file_name, reply_to, metadata=metadata)
-
+    
    async def send_typing(self, chat_id: str, metadata=None) -> None:
        """Start a persistent typing indicator for a channel.

@@ -1378,20 +1366,20 @@ class DiscordAdapter(BasePlatformAdapter):
                await task
            except (asyncio.CancelledError, Exception):
                pass
-
+    
    async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
        """Get information about a Discord channel."""
        if not self._client:
            return {"name": "Unknown", "type": "dm"}
-
+        
        try:
            channel = self._client.get_channel(int(chat_id))
            if not channel:
                channel = await self._client.fetch_channel(int(chat_id))
-
+            
            if not channel:
                return {"name": str(chat_id), "type": "dm"}
-
+            
            # Determine channel type
            if isinstance(channel, discord.DMChannel):
                chat_type = "dm"
@@ -1407,7 +1395,7 @@ class DiscordAdapter(BasePlatformAdapter):
            else:
                chat_type = "channel"
                name = getattr(channel, "name", str(chat_id))
-
+            
            return {
                "name": name,
                "type": chat_type,
@@ -1417,7 +1405,7 @@ class DiscordAdapter(BasePlatformAdapter):
        except Exception as e:  # pragma: no cover - defensive logging
            logger.error("[%s] Failed to get chat info for %s: %s", self.name, chat_id, e, exc_info=True)
            return {"name": str(chat_id), "type": "dm", "error": str(e)}
-
+    
    async def _resolve_allowed_usernames(self) -> None:
        """
        Resolve non-numeric entries in DISCORD_ALLOWED_USERS to Discord user IDs.
@@ -1485,7 +1473,7 @@ class DiscordAdapter(BasePlatformAdapter):
    def format_message(self, content: str) -> str:
        """
        Format message for Discord.
-
+        
        Discord uses its own markdown variant.
        """
        # Discord markdown is fairly standard, no special escaping needed
@@ -1651,7 +1639,7 @@ class DiscordAdapter(BasePlatformAdapter):
            chat_name = interaction.channel.name
            if hasattr(interaction.channel, "guild") and interaction.channel.guild:
                chat_name = f"{interaction.channel.guild.name} / #{chat_name}"
-
+        
        # Get channel topic (if available)
        chat_topic = getattr(interaction.channel, "topic", None)

@@ -2055,7 +2043,7 @@ class DiscordAdapter(BasePlatformAdapter):
                        if doc_ext in SUPPORTED_DOCUMENT_TYPES:
                            msg_type = MessageType.DOCUMENT
                    break
-
+        
        # When auto-threading kicked in, route responses to the new thread
        effective_channel = auto_threaded_channel or message.channel

@@ -2074,7 +2062,7 @@ class DiscordAdapter(BasePlatformAdapter):

        # Get channel topic (if available - TextChannels have topics, DMs/threads don't)
        chat_topic = getattr(message.channel, "topic", None)
-
+        
        # Build source
        source = self.build_source(
            chat_id=str(effective_channel.id),
@@ -2085,7 +2073,7 @@ class DiscordAdapter(BasePlatformAdapter):
            thread_id=thread_id,
            chat_topic=chat_topic,
        )
-
+        
        # Build media URLs -- download image attachments to local cache so the
        # vision tool can access them reliably (Discord CDN URLs can expire).
        media_urls = []
@@ -2179,7 +2167,7 @@ class DiscordAdapter(BasePlatformAdapter):
                                "[Discord] Failed to cache document %s: %s",
                                att.filename, e, exc_info=True,
                            )
-
+        
        event_text = message.content
        if pending_text_injection:
            event_text = f"{pending_text_injection}\n\n{event_text}" if event_text else pending_text_injection
@@ -49,14 +49,6 @@ _STORE_DIR = _get_hermes_dir("platforms/matrix/store", "matrix/store")
 # Grace period: ignore messages older than this many seconds before startup.
 _STARTUP_GRACE_SECONDS = 5

-# E2EE key export file for persistence across restarts.
-_KEY_EXPORT_FILE = _STORE_DIR / "exported_keys.txt"
-_KEY_EXPORT_PASSPHRASE = "hermes-matrix-e2ee-keys"
-
-# Pending undecrypted events: cap and TTL for retry buffer.
-_MAX_PENDING_EVENTS = 100
-_PENDING_EVENT_TTL = 300  # seconds — stop retrying after 5 min
-

 def check_matrix_requirements() -> bool:
    """Return True if the Matrix adapter can be used."""
@@ -119,10 +111,6 @@ class MatrixAdapter(BasePlatformAdapter):
        self._processed_events: deque = deque(maxlen=1000)
        self._processed_events_set: set = set()

-        # Buffer for undecrypted events pending key receipt.
-        # Each entry: (room, event, timestamp)
-        self._pending_megolm: list = []
-
    def _is_duplicate_event(self, event_id) -> bool:
        """Return True if this event was already processed. Tracks the ID otherwise."""
        if not event_id:
@@ -244,16 +232,6 @@ class MatrixAdapter(BasePlatformAdapter):
                logger.info("Matrix: E2EE crypto initialized")
            except Exception as exc:
                logger.warning("Matrix: crypto init issue: %s", exc)
-
-            # Import previously exported Megolm keys (survives restarts).
-            if _KEY_EXPORT_FILE.exists():
-                try:
-                    await client.import_keys(
-                        str(_KEY_EXPORT_FILE), _KEY_EXPORT_PASSPHRASE,
-                    )
-                    logger.info("Matrix: imported Megolm keys from backup")
-                except Exception as exc:
-                    logger.debug("Matrix: could not import keys: %s", exc)
        elif self._encryption:
            logger.warning(
                "Matrix: E2EE requested but crypto store is not loaded; "
@@ -308,18 +286,6 @@ class MatrixAdapter(BasePlatformAdapter):
            except (asyncio.CancelledError, Exception):
                pass

-        # Export Megolm keys before closing so the next restart can decrypt
-        # events that used sessions from this run.
-        if self._client and self._encryption and getattr(self._client, "olm", None):
-            try:
-                _STORE_DIR.mkdir(parents=True, exist_ok=True)
-                await self._client.export_keys(
-                    str(_KEY_EXPORT_FILE), _KEY_EXPORT_PASSPHRASE,
-                )
-                logger.info("Matrix: exported Megolm keys for next restart")
-            except Exception as exc:
-                logger.debug("Matrix: could not export keys on disconnect: %s", exc)
-
        if self._client:
            await self._client.close()
            self._client = None
@@ -699,22 +665,17 @@ class MatrixAdapter(BasePlatformAdapter):
        Hermes uses a custom sync loop instead of matrix-nio's sync_forever(),
        so we need to explicitly drive the key management work that sync_forever()
        normally handles for encrypted rooms.
-
-        Also auto-trusts all devices (so senders share session keys with us)
-        and retries decryption for any buffered MegolmEvents.
        """
        client = self._client
        if not client or not self._encryption or not getattr(client, "olm", None):
            return

-        did_query_keys = client.should_query_keys
-
        tasks = [asyncio.create_task(client.send_to_device_messages())]

        if client.should_upload_keys:
            tasks.append(asyncio.create_task(client.keys_upload()))

-        if did_query_keys:
+        if client.should_query_keys:
            tasks.append(asyncio.create_task(client.keys_query()))

        if client.should_claim_keys:
@@ -730,111 +691,6 @@ class MatrixAdapter(BasePlatformAdapter):
            except Exception as exc:
                logger.warning("Matrix: E2EE maintenance task failed: %s", exc)

-        # After key queries, auto-trust all devices so senders share keys with
-        # us.  For a bot this is the right default — we want to decrypt
-        # everything, not enforce manual verification.
-        if did_query_keys:
-            self._auto_trust_devices()
-
-        # Retry any buffered undecrypted events now that new keys may have
-        # arrived (from key requests, key queries, or to-device forwarding).
-        if self._pending_megolm:
-            await self._retry_pending_decryptions()
-
-    def _auto_trust_devices(self) -> None:
-        """Trust/verify all unverified devices we know about.
-
-        When other clients see our device as verified, they proactively share
-        Megolm session keys with us.  Without this, many clients will refuse
-        to include an unverified device in key distributions.
-        """
-        client = self._client
-        if not client:
-            return
-
-        device_store = getattr(client, "device_store", None)
-        if not device_store:
-            return
-
-        own_device = getattr(client, "device_id", None)
-        trusted_count = 0
-
-        try:
-            # DeviceStore.__iter__ yields OlmDevice objects directly.
-            for device in device_store:
-                if getattr(device, "device_id", None) == own_device:
-                    continue
-                if not getattr(device, "verified", False):
-                    client.verify_device(device)
-                    trusted_count += 1
-        except Exception as exc:
-            logger.debug("Matrix: auto-trust error: %s", exc)
-
-        if trusted_count:
-            logger.info("Matrix: auto-trusted %d new device(s)", trusted_count)
-
-    async def _retry_pending_decryptions(self) -> None:
-        """Retry decrypting buffered MegolmEvents after new keys arrive."""
-        import nio
-
-        client = self._client
-        if not client or not self._pending_megolm:
-            return
-
-        now = time.time()
-        still_pending: list = []
-
-        for room, event, ts in self._pending_megolm:
-            # Drop events that have aged past the TTL.
-            if now - ts > _PENDING_EVENT_TTL:
-                logger.debug(
-                    "Matrix: dropping expired pending event %s (age %.0fs)",
-                    getattr(event, "event_id", "?"), now - ts,
-                )
-                continue
-
-            try:
-                decrypted = client.decrypt_event(event)
-            except Exception:
-                # Still missing the key — keep in buffer.
-                still_pending.append((room, event, ts))
-                continue
-
-            if isinstance(decrypted, nio.MegolmEvent):
-                # decrypt_event returned the same undecryptable event.
-                still_pending.append((room, event, ts))
-                continue
-
-            logger.info(
-                "Matrix: decrypted buffered event %s (%s)",
-                getattr(event, "event_id", "?"),
-                type(decrypted).__name__,
-            )
-
-            # Route to the appropriate handler based on decrypted type.
-            try:
-                if isinstance(decrypted, nio.RoomMessageText):
-                    await self._on_room_message(room, decrypted)
-                elif isinstance(
-                    decrypted,
-                    (nio.RoomMessageImage, nio.RoomMessageAudio,
-                     nio.RoomMessageVideo, nio.RoomMessageFile),
-                ):
-                    await self._on_room_message_media(room, decrypted)
-                else:
-                    logger.debug(
-                        "Matrix: decrypted event %s has unhandled type %s",
-                        getattr(event, "event_id", "?"),
-                        type(decrypted).__name__,
-                    )
-            except Exception as exc:
-                logger.warning(
-                    "Matrix: error processing decrypted event %s: %s",
-                    getattr(event, "event_id", "?"), exc,
-                )
-
-        self._pending_megolm = still_pending
-
    # ------------------------------------------------------------------
    # Event callbacks
    # ------------------------------------------------------------------
@@ -856,29 +712,13 @@ class MatrixAdapter(BasePlatformAdapter):
        if event_ts and event_ts < self._startup_ts - _STARTUP_GRACE_SECONDS:
            return

-        # Handle undecryptable MegolmEvents: request the missing session key
-        # and buffer the event for retry once the key arrives.
+        # Handle undecryptable MegolmEvents: log a warning and skip the event.
        if isinstance(event, nio.MegolmEvent):
+            # Failed to decrypt.
            logger.warning(
-                "Matrix: could not decrypt event %s in %s — requesting key",
+                "Matrix: could not decrypt event %s in %s",
                event.event_id, room.room_id,
            )
-
-            # Ask other devices in the room to forward the session key.
-            try:
-                resp = await self._client.request_room_key(event)
-                if hasattr(resp, "event_id") or not isinstance(resp, Exception):
-                    logger.debug(
-                        "Matrix: room key request sent for session %s",
-                        getattr(event, "session_id", "?"),
-                    )
-            except Exception as exc:
-                logger.debug("Matrix: room key request failed: %s", exc)
-
-            # Buffer for retry on next maintenance cycle.
-            self._pending_megolm.append((room, event, time.time()))
-            if len(self._pending_megolm) > _MAX_PENDING_EVENTS:
-                self._pending_megolm = self._pending_megolm[-_MAX_PENDING_EVENTS:]
            return

        # Skip edits (m.replace relation).
@@ -622,19 +622,10 @@ class TelegramAdapter(BasePlatformAdapter):
            # gateway command there automatically adds it to the Telegram menu.
            try:
                from telegram import BotCommand
-                from hermes_cli.commands import telegram_menu_commands
-                # Telegram allows up to 100 commands but has an undocumented
-                # payload size limit.  Skill descriptions are truncated to 40
-                # chars in telegram_menu_commands() to fit 100 commands safely.
-                menu_commands, hidden_count = telegram_menu_commands(max_commands=100)
+                from hermes_cli.commands import telegram_bot_commands
                await self._bot.set_my_commands([
-                    BotCommand(name, desc) for name, desc in menu_commands
+                    BotCommand(name, desc) for name, desc in telegram_bot_commands()
                ])
-                if hidden_count:
-                    logger.info(
-                        "[%s] Telegram menu: %d commands registered, %d hidden (over 100 limit). Use /commands for full list.",
-                        self.name, len(menu_commands), hidden_count,
-                    )
            except Exception as e:
                logger.warning(
                    "[%s] Could not register Telegram command menu: %s",
@@ -742,10 +733,6 @@ class TelegramAdapter(BasePlatformAdapter):
        if not self._bot:
            return SendResult(success=False, error="Not connected")
        
-        # Skip whitespace-only text to prevent Telegram 400 empty-text errors.
-        if not content or not content.strip():
-            return SendResult(success=True, message_id=None)
-        
        try:
            # Format and split message if needed
            formatted = self.format_message(content)
@@ -135,9 +135,6 @@ def _normalize_fallback_ips(values: Iterable[str]) -> list[str]:
        if addr.version != 4:
            logger.warning("Ignoring non-IPv4 Telegram fallback IP: %s", raw)
            continue
-        if addr.is_private or addr.is_loopback or addr.is_link_local or addr.is_unspecified:
-            logger.warning("Ignoring private/internal Telegram fallback IP: %s", raw)
-            continue
        normalized.append(str(addr))
    return normalized

@@ -24,7 +24,6 @@ import signal
 import tempfile
 import threading
 import time
-import uuid
 from logging.handlers import RotatingFileHandler
 from pathlib import Path
 from datetime import datetime
@@ -299,54 +298,9 @@ def _resolve_runtime_agent_kwargs() -> dict:
        "api_mode": runtime.get("api_mode"),
        "command": runtime.get("command"),
        "args": list(runtime.get("args") or []),
-        "credential_pool": runtime.get("credential_pool"),
    }


-def _check_unavailable_skill(command_name: str) -> str | None:
-    """Check if a command matches a known-but-inactive skill.
-
-    Returns a helpful message if the skill exists but is disabled or only
-    available as an optional install. Returns None if no match found.
-    """
-    # Normalize: command uses hyphens, skill names may use hyphens or underscores
-    normalized = command_name.lower().replace("_", "-")
-    try:
-        from tools.skills_tool import SKILLS_DIR, _get_disabled_skill_names
-        disabled = _get_disabled_skill_names()
-
-        # Check disabled built-in skills
-        for skill_md in SKILLS_DIR.rglob("SKILL.md"):
-            if any(part in ('.git', '.github', '.hub') for part in skill_md.parts):
-                continue
-            name = skill_md.parent.name.lower().replace("_", "-")
-            if name == normalized and name in disabled:
-                return (
-                    f"The **{command_name}** skill is installed but disabled.\n"
-                    f"Enable it with: `hermes skills config`"
-                )
-
-        # Check optional skills (shipped with repo but not installed)
-        from hermes_constants import get_hermes_home, get_optional_skills_dir
-        repo_root = Path(__file__).resolve().parent.parent
-        optional_dir = get_optional_skills_dir(repo_root / "optional-skills")
-        if optional_dir.exists():
-            for skill_md in optional_dir.rglob("SKILL.md"):
-                name = skill_md.parent.name.lower().replace("_", "-")
-                if name == normalized:
-                    # Build install path: official/<category>/<name>
-                    rel = skill_md.parent.relative_to(optional_dir)
-                    parts = list(rel.parts)
-                    install_path = f"official/{'/'.join(parts)}"
-                    return (
-                        f"The **{command_name}** skill is available but not installed.\n"
-                        f"Install it with: `hermes skills install {install_path}`"
-                    )
-    except Exception:
-        pass
-    return None
-
-
 def _platform_config_key(platform: "Platform") -> str:
    """Map a Platform enum to its config.yaml key (LOCAL→"cli", rest→enum value)."""
    return "cli" if platform == Platform.LOCAL else platform.value
@@ -366,19 +320,20 @@ def _load_gateway_config() -> dict:


 def _resolve_gateway_model(config: dict | None = None) -> str:
-    """Read model from config.yaml — single source of truth.
+    """Read model from env/config — mirrors the resolution in _run_agent_sync.

    Without this, temporary AIAgent instances (memory flush, /compress) fall
    back to the hardcoded default which fails when the active provider is
    openai-codex.
    """
+    model = os.getenv("HERMES_MODEL") or os.getenv("LLM_MODEL") or ""
    cfg = config if config is not None else _load_gateway_config()
    model_cfg = cfg.get("model", {})
    if isinstance(model_cfg, str):
-        return model_cfg
+        model = model_cfg
    elif isinstance(model_cfg, dict):
-        return model_cfg.get("default") or model_cfg.get("model") or ""
-    return ""
+        model = model_cfg.get("default") or model_cfg.get("model") or model
+    return model


 def _resolve_hermes_bin() -> Optional[list[str]]:
@@ -477,8 +432,6 @@ class GatewayRunner:
        self._honcho_managers: Dict[str, Any] = {}
        self._honcho_configs: Dict[str, Any] = {}

-
-
        # Ensure tirith security scanner is available (downloads if needed)
        try:
            from tools.tirith_security import ensure_installed
@@ -789,7 +742,6 @@ class GatewayRunner:
            "api_mode": runtime_kwargs.get("api_mode"),
            "command": runtime_kwargs.get("command"),
            "args": list(runtime_kwargs.get("args") or []),
-            "credential_pool": runtime_kwargs.get("credential_pool"),
        }
        return resolve_turn_route(user_message, getattr(self, "_smart_model_routing", {}), primary)

@@ -1652,11 +1604,6 @@ class GatewayRunner:
        if global_allowlist:
            allowed_ids.update(uid.strip() for uid in global_allowlist.split(",") if uid.strip())

-        # "*" in any allowlist means allow everyone (consistent with
-        # SIGNAL_GROUP_ALLOWED_USERS precedent)
-        if "*" in allowed_ids:
-            return True
-
        check_ids = {user_id}
        if "@" in user_id:
            check_ids.add(user_id.split("@")[0])
@@ -1704,11 +1651,6 @@ class GatewayRunner:
            # In DMs: offer pairing code. In groups: silently ignore.
            if source.chat_type == "dm" and self._get_unauthorized_dm_behavior(source.platform) == "pair":
                platform_name = source.platform.value if source.platform else "unknown"
-                # Rate-limit ALL pairing responses (code or rejection) to
-                # prevent spamming the user with repeated messages when
-                # multiple DMs arrive in quick succession.
-                if self.pairing_store._is_rate_limited(platform_name, source.user_id):
-                    return None
                code = self.pairing_store.generate_code(
                    platform_name, source.user_id, source.user_name or ""
                )
@@ -1730,8 +1672,6 @@ class GatewayRunner:
                            "Too many pairing requests right now~ "
                            "Please try again later!"
                        )
-                    # Record rate limit so subsequent messages are silently ignored
-                    self.pairing_store._record_rate_limit(platform_name, source.user_id)
            return None
        
        # PRIORITY handling when an agent is already running for this session.
@@ -1877,13 +1817,7 @@ class GatewayRunner:
        
        if canonical == "help":
            return await self._handle_help_command(event)
-
-        if canonical == "commands":
-            return await self._handle_commands_command(event)
        
-        if canonical == "profile":
-            return await self._handle_profile_command(event)
-
        if canonical == "status":
            return await self._handle_status_command(event)
        
@@ -1896,9 +1830,6 @@ class GatewayRunner:
        if canonical == "verbose":
            return await self._handle_verbose_command(event)

-        if canonical == "yolo":
-            return await self._handle_yolo_command(event)
-
        if canonical == "provider":
            return await self._handle_provider_command(event)
        
@@ -1969,9 +1900,6 @@ class GatewayRunner:
        if canonical == "background":
            return await self._handle_background_command(event)

-        if canonical == "btw":
-            return await self._handle_btw_command(event)
-
        if canonical == "voice":
            return await self._handle_voice_command(event)

@@ -2046,12 +1974,6 @@ class GatewayRunner:
                    if msg:
                        event.text = msg
                        # Fall through to normal message processing with skill content
-                else:
-                    # Not an active skill — check if it's a known-but-disabled or
-                    # uninstalled skill and give actionable guidance.
-                    _unavail_msg = _check_unavailable_skill(command)
-                    if _unavail_msg:
-                        return _unavail_msg
            except Exception as e:
                logger.debug("Skill command check failed (non-fatal): %s", e)
        
@@ -2289,29 +2211,6 @@ class GatewayRunner:
                        _hyg_api_key = _hyg_runtime.get("api_key")
                    except Exception:
                        pass
-
-                # Check custom_providers per-model context_length
-                # (same fallback as run_agent.py lines 1171-1189).
-                # Must run after runtime resolution so _hyg_base_url is set.
-                if _hyg_config_context_length is None and _hyg_base_url:
-                    try:
-                        _hyg_custom_providers = _hyg_data.get("custom_providers")
-                        if isinstance(_hyg_custom_providers, list):
-                            for _cp in _hyg_custom_providers:
-                                if not isinstance(_cp, dict):
-                                    continue
-                                _cp_url = (_cp.get("base_url") or "").rstrip("/")
-                                if _cp_url and _cp_url == _hyg_base_url.rstrip("/"):
-                                    _cp_models = _cp.get("models", {})
-                                    if isinstance(_cp_models, dict):
-                                        _cp_model_cfg = _cp_models.get(_hyg_model, {})
-                                        if isinstance(_cp_model_cfg, dict):
-                                            _cp_ctx = _cp_model_cfg.get("context_length")
-                                            if _cp_ctx is not None:
-                                                _hyg_config_context_length = int(_cp_ctx)
-                                    break
-                    except (TypeError, ValueError):
-                        pass
            except Exception:
                pass

@@ -2359,7 +2258,18 @@ class GatewayRunner:
                        f"{_compress_token_threshold:,}",
                    )

+                    _hyg_adapter = self.adapters.get(source.platform)
                    _hyg_meta = {"thread_id": source.thread_id} if source.thread_id else None
+                    if _hyg_adapter:
+                        try:
+                            await _hyg_adapter.send(
+                                source.chat_id,
+                                f"🗜️ Session is large ({_msg_count} messages, "
+                                f"~{_approx_tokens:,} tokens). Auto-compressing...",
+                                metadata=_hyg_meta,
+                            )
+                        except Exception:
+                            pass

                    try:
                        from run_agent import AIAgent
@@ -2420,17 +2330,62 @@ class GatewayRunner:
                                    f"{_approx_tokens:,}", f"{_new_tokens:,}",
                                )

+                                if _hyg_adapter:
+                                    try:
+                                        await _hyg_adapter.send(
+                                            source.chat_id,
+                                            f"🗜️ Compressed: {_msg_count} → "
+                                            f"{_new_count} messages, "
+                                            f"~{_approx_tokens:,} → "
+                                            f"~{_new_tokens:,} tokens",
+                                            metadata=_hyg_meta,
+                                        )
+                                    except Exception:
+                                        pass
+
+                                # Still too large after compression — warn user
                                if _new_tokens >= _warn_token_threshold:
                                    logger.warning(
                                        "Session hygiene: still ~%s tokens after "
-                                        "compression",
+                                        "compression — suggesting /reset",
                                        f"{_new_tokens:,}",
                                    )
+                                    if _hyg_adapter:
+                                        try:
+                                            await _hyg_adapter.send(
+                                                source.chat_id,
+                                                "⚠️ Session is still very large "
+                                                "after compression "
+                                                f"(~{_new_tokens:,} tokens). "
+                                                "Consider using /reset to start "
+                                                "fresh if you experience issues.",
+                                                metadata=_hyg_meta,
+                                            )
+                                        except Exception:
+                                            pass

                    except Exception as e:
                        logger.warning(
                            "Session hygiene auto-compress failed: %s", e
                        )
+                        # Compression failed and session is dangerously large
+                        if _approx_tokens >= _warn_token_threshold:
+                            _hyg_adapter = self.adapters.get(source.platform)
+                            _hyg_meta = {"thread_id": source.thread_id} if source.thread_id else None
+                            if _hyg_adapter:
+                                try:
+                                    await _hyg_adapter.send(
+                                        source.chat_id,
+                                        "⚠️ Session is very large "
+                                        f"({_msg_count} messages, "
+                                        f"~{_approx_tokens:,} tokens) and "
+                                        "auto-compression failed. Consider "
+                                        "using /compress or /reset to avoid "
+                                        "issues.",
+                                        metadata=_hyg_meta,
+                                    )
+                                except Exception:
+                                    pass

        # First-message onboarding -- only on the very first interaction ever
        if not history and not self.session_store.has_any_sessions():
@@ -2769,7 +2724,7 @@ class GatewayRunner:
                    {
                        "role": "session_meta",
                        "tools": tool_defs or [],
-                        "model": _resolve_gateway_model(),
+                        "model": os.getenv("HERMES_MODEL", ""),
                        "platform": source.platform.value if source.platform else "",
                        "timestamp": ts,
                    }
@@ -3044,36 +2999,6 @@ class GatewayRunner:
            return f"{header}\n\n{session_info}"
        return header
    
-    async def _handle_profile_command(self, event: MessageEvent) -> str:
-        """Handle /profile — show active profile name and home directory."""
-        from hermes_constants import get_hermes_home, display_hermes_home
-        from pathlib import Path
-
-        home = get_hermes_home()
-        display = display_hermes_home()
-
-        # Detect profile name from HERMES_HOME path
-        # Profile paths look like: ~/.hermes/profiles/<name>
-        profiles_parent = Path.home() / ".hermes" / "profiles"
-        try:
-            rel = home.relative_to(profiles_parent)
-            profile_name = str(rel).split("/")[0]
-        except ValueError:
-            profile_name = None
-
-        if profile_name:
-            lines = [
-                f"👤 **Profile:** `{profile_name}`",
-                f"📂 **Home:** `{display}`",
-            ]
-        else:
-            lines = [
-                "👤 **Profile:** default",
-                f"📂 **Home:** `{display}`",
-            ]
-
-        return "\n".join(lines)
-
    async def _handle_status_command(self, event: MessageEvent) -> str:
        """Handle /status command."""
        source = event.source
@@ -3140,68 +3065,11 @@ class GatewayRunner:
            from agent.skill_commands import get_skill_commands
            skill_cmds = get_skill_commands()
            if skill_cmds:
-                lines.append(f"\n⚡ **Skill Commands** ({len(skill_cmds)} active):")
-                # Show first 10, then point to /commands for the rest
-                sorted_cmds = sorted(skill_cmds)
-                for cmd in sorted_cmds[:10]:
-                    lines.append(f"`{cmd}` — {skill_cmds[cmd]['description']}")
-                if len(sorted_cmds) > 10:
-                    lines.append(f"\n... and {len(sorted_cmds) - 10} more. Use `/commands` for the full paginated list.")
-        except Exception:
-            pass
-        return "\n".join(lines)
-
-    async def _handle_commands_command(self, event: MessageEvent) -> str:
-        """Handle /commands [page] - paginated list of all commands and skills."""
-        from hermes_cli.commands import gateway_help_lines
-
-        raw_args = event.get_command_args().strip()
-        if raw_args:
-            try:
-                requested_page = int(raw_args)
-            except ValueError:
-                return "Usage: `/commands [page]`"
-        else:
-            requested_page = 1
-
-        # Build combined entry list: built-in commands + skill commands
-        entries = list(gateway_help_lines())
-        try:
-            from agent.skill_commands import get_skill_commands
-            skill_cmds = get_skill_commands()
-            if skill_cmds:
-                entries.append("")
-                entries.append("⚡ **Skill Commands**:")
+                lines.append(f"\n⚡ **Skill Commands** ({len(skill_cmds)} installed):")
                for cmd in sorted(skill_cmds):
-                    desc = skill_cmds[cmd].get("description", "").strip() or "Skill command"
-                    entries.append(f"`{cmd}` — {desc}")
+                    lines.append(f"`{cmd}` — {skill_cmds[cmd]['description']}")
        except Exception:
            pass
-
-        if not entries:
-            return "No commands available."
-
-        from gateway.config import Platform
-        page_size = 15 if event.source.platform == Platform.TELEGRAM else 20
-        total_pages = max(1, (len(entries) + page_size - 1) // page_size)
-        page = max(1, min(requested_page, total_pages))
-        start = (page - 1) * page_size
-        page_entries = entries[start:start + page_size]
-
-        lines = [
-            f"📚 **Commands** ({len(entries)} total, page {page}/{total_pages})",
-            "",
-            *page_entries,
-        ]
-        if total_pages > 1:
-            nav_parts = []
-            if page > 1:
-                nav_parts.append(f"`/commands {page - 1}` ← prev")
-            if page < total_pages:
-                nav_parts.append(f"next → `/commands {page + 1}`")
-            lines.extend(["", " | ".join(nav_parts)])
-        if page != requested_page:
-            lines.append(f"_(Requested page {requested_page} was out of range, showing page {page}.)_")
        return "\n".join(lines)
    
    async def _handle_provider_command(self, event: MessageEvent) -> str:
@@ -3234,11 +3102,9 @@ class GatewayRunner:
            except Exception:
                current_provider = "openrouter"

-        # Detect custom endpoint from config base_url
-        if current_provider == "openrouter":
-            _cfg_base = model_cfg.get("base_url", "") if isinstance(model_cfg, dict) else ""
-            if _cfg_base and "openrouter.ai" not in _cfg_base:
-                current_provider = "custom"
+        # Detect custom endpoint
+        if current_provider == "openrouter" and os.getenv("OPENAI_BASE_URL", "").strip():
+            current_provider = "custom"

        current_label = _PROVIDER_LABELS.get(current_provider, current_provider)

@@ -4050,167 +3916,6 @@ class GatewayRunner:
            except Exception:
                pass

-    async def _handle_btw_command(self, event: MessageEvent) -> str:
-        """Handle /btw <question> — ephemeral side question in the same chat."""
-        question = event.get_command_args().strip()
-        if not question:
-            return (
-                "Usage: /btw <question>\n"
-                "Example: /btw what module owns session title sanitization?\n\n"
-                "Answers using session context. No tools, not persisted."
-            )
-
-        source = event.source
-        session_key = self._session_key_for_source(source)
-
-        # Guard: one /btw at a time per session
-        existing = getattr(self, "_active_btw_tasks", {}).get(session_key)
-        if existing and not existing.done():
-            return "A /btw is already running for this chat. Wait for it to finish."
-
-        if not hasattr(self, "_active_btw_tasks"):
-            self._active_btw_tasks: dict = {}
-
-        import uuid as _uuid
-        task_id = f"btw_{datetime.now().strftime('%H%M%S')}_{_uuid.uuid4().hex[:6]}"
-        _task = asyncio.create_task(self._run_btw_task(question, source, session_key, task_id))
-        self._background_tasks.add(_task)
-        self._active_btw_tasks[session_key] = _task
-
-        def _cleanup(task):
-            self._background_tasks.discard(task)
-            if self._active_btw_tasks.get(session_key) is task:
-                self._active_btw_tasks.pop(session_key, None)
-
-        _task.add_done_callback(_cleanup)
-
-        preview = question[:60] + ("..." if len(question) > 60 else "")
-        return f'💬 /btw: "{preview}"\nReply will appear here shortly.'
-
-    async def _run_btw_task(
-        self, question: str, source, session_key: str, task_id: str,
-    ) -> None:
-        """Execute an ephemeral /btw side question and deliver the answer."""
-        from run_agent import AIAgent
-
-        adapter = self.adapters.get(source.platform)
-        if not adapter:
-            logger.warning("No adapter for platform %s in /btw task %s", source.platform, task_id)
-            return
-
-        _thread_meta = {"thread_id": source.thread_id} if source.thread_id else None
-
-        try:
-            runtime_kwargs = _resolve_runtime_agent_kwargs()
-            if not runtime_kwargs.get("api_key"):
-                await adapter.send(
-                    source.chat_id,
-                    "❌ /btw failed: no provider credentials configured.",
-                    metadata=_thread_meta,
-                )
-                return
-
-            user_config = _load_gateway_config()
-            model = _resolve_gateway_model(user_config)
-            platform_key = _platform_config_key(source.platform)
-            reasoning_config = self._load_reasoning_config()
-            turn_route = self._resolve_turn_agent_config(question, model, runtime_kwargs)
-            pr = self._provider_routing
-
-            # Snapshot history from running agent or stored transcript
-            running_agent = self._running_agents.get(session_key)
-            if running_agent and running_agent is not _AGENT_PENDING_SENTINEL:
-                history_snapshot = list(getattr(running_agent, "_session_messages", []) or [])
-            else:
-                session_entry = self.session_store.get_or_create_session(source)
-                history_snapshot = self.session_store.load_transcript(session_entry.session_id)
-
-            btw_prompt = (
-                "[Ephemeral /btw side question. Answer using the conversation "
-                "context. No tools available. Be direct and concise.]\n\n"
-                + question
-            )
-
-            def run_sync():
-                agent = AIAgent(
-                    model=turn_route["model"],
-                    **turn_route["runtime"],
-                    max_iterations=8,
-                    quiet_mode=True,
-                    verbose_logging=False,
-                    enabled_toolsets=[],
-                    reasoning_config=reasoning_config,
-                    providers_allowed=pr.get("only"),
-                    providers_ignored=pr.get("ignore"),
-                    providers_order=pr.get("order"),
-                    provider_sort=pr.get("sort"),
-                    provider_require_parameters=pr.get("require_parameters", False),
-                    provider_data_collection=pr.get("data_collection"),
-                    session_id=task_id,
-                    platform=platform_key,
-                    session_db=None,
-                    fallback_model=self._fallback_model,
-                    skip_memory=True,
-                    skip_context_files=True,
-                    persist_session=False,
-                )
-                return agent.run_conversation(
-                    user_message=btw_prompt,
-                    conversation_history=history_snapshot,
-                    task_id=task_id,
-                    sync_honcho=False,
-                )
-
-            loop = asyncio.get_event_loop()
-            result = await loop.run_in_executor(None, run_sync)
-
-            response = (result.get("final_response") or "") if result else ""
-            if not response and result and result.get("error"):
-                response = f"Error: {result['error']}"
-            if not response:
-                response = "(No response generated)"
-
-            media_files, response = adapter.extract_media(response)
-            images, text_content = adapter.extract_images(response)
-            preview = question[:60] + ("..." if len(question) > 60 else "")
-            header = f'💬 /btw: "{preview}"\n\n'
-
-            if text_content:
-                await adapter.send(
-                    chat_id=source.chat_id,
-                    content=header + text_content,
-                    metadata=_thread_meta,
-                )
-            elif not images and not media_files:
-                await adapter.send(
-                    chat_id=source.chat_id,
-                    content=header + "(No response generated)",
-                    metadata=_thread_meta,
-                )
-
-            for image_url, alt_text in (images or []):
-                try:
-                    await adapter.send_image(chat_id=source.chat_id, image_url=image_url, caption=alt_text)
-                except Exception:
-                    pass
-
-            for media_path in (media_files or []):
-                try:
-                    await adapter.send_file(chat_id=source.chat_id, file_path=media_path)
-                except Exception:
-                    pass
-
-        except Exception as e:
-            logger.exception("/btw task %s failed", task_id)
-            try:
-                await adapter.send(
-                    chat_id=source.chat_id,
-                    content=f"❌ /btw failed: {e}",
-                    metadata=_thread_meta,
-                )
-            except Exception:
-                pass
-
    async def _handle_reasoning_command(self, event: MessageEvent) -> str:
        """Handle /reasoning command — manage reasoning effort and display toggle.

@@ -4294,16 +3999,6 @@ class GatewayRunner:
        else:
            return f"🧠 ✓ Reasoning effort set to `{effort}` (this session only)"

-    async def _handle_yolo_command(self, event: MessageEvent) -> str:
-        """Handle /yolo — toggle dangerous command approval bypass."""
-        current = bool(os.environ.get("HERMES_YOLO_MODE"))
-        if current:
-            os.environ.pop("HERMES_YOLO_MODE", None)
-            return "⚠️ YOLO mode **OFF** — dangerous commands will require approval."
-        else:
-            os.environ["HERMES_YOLO_MODE"] = "1"
-            return "⚡ YOLO mode **ON** — all commands auto-approved. Use with caution."
-
    async def _handle_verbose_command(self, event: MessageEvent) -> str:
        """Handle /verbose command — cycle tool progress display mode.

@@ -4721,13 +4416,9 @@ class GatewayRunner:

    _APPROVAL_TIMEOUT_SECONDS = 300  # 5 minutes

-    async def _handle_approve_command(self, event: MessageEvent) -> Optional[str]:
+    async def _handle_approve_command(self, event: MessageEvent) -> str:
        """Handle /approve command — execute a pending dangerous command.

-        After execution, re-invokes the agent with the command result so it
-        can continue its multi-step task (fixes the "dead agent" bug where
-        the agent loop exited on approval_required and never resumed).
-
        Usage:
            /approve          — approve and execute the pending command
            /approve session  — approve and remember for this session
@@ -4776,57 +4467,8 @@ class GatewayRunner:

        logger.info("User approved dangerous command via /approve: %s...%s", cmd[:60], scope_msg)
        from tools.terminal_tool import terminal_tool
-        result = await asyncio.to_thread(terminal_tool, command=cmd, force=True)
-
-        # Send immediate feedback so the user sees the command output right away
-        immediate_msg = f"✅ Command approved and executed{scope_msg}.\n\n```\n{result[:3500]}\n```"
-        adapter = self.adapters.get(source.platform)
-        if adapter:
-            try:
-                await adapter.send(source.chat_id, immediate_msg)
-            except Exception as e:
-                logger.warning("Failed to send approval feedback: %s", e)
-
-        # Re-invoke the agent with the command result so it can continue its task.
-        # The agent's conversation history (persisted in SQLite) already contains
-        # the tool call that returned approval_required — the continuation message
-        # provides the actual execution output so the agent can pick up where it
-        # left off.
-        continuation_text = (
-            f"[System: The user approved the previously blocked command and it has been executed.\n"
-            f"Command: {cmd}\n"
-            f"<command_output>\n{result[:3500]}\n</command_output>\n\n"
-            f"Continue with the task you were working on.]"
-        )
-
-        synthetic_event = MessageEvent(
-            text=continuation_text,
-            source=source,
-            message_id=f"approve-continuation-{uuid.uuid4().hex}",
-        )
-
-        async def _continue_agent():
-            try:
-                response = await self._handle_message(synthetic_event)
-                if response and adapter:
-                    await adapter.send(source.chat_id, response)
-            except Exception as e:
-                logger.error("Failed to continue agent after /approve: %s", e)
-                if adapter:
-                    try:
-                        await adapter.send(
-                            source.chat_id,
-                            f"⚠️ Failed to resume agent after approval: {e}"
-                        )
-                    except Exception:
-                        pass
-
-        _task = asyncio.create_task(_continue_agent())
-        self._background_tasks.add(_task)
-        _task.add_done_callback(self._background_tasks.discard)
-        # Return None — we already sent the immediate feedback and the agent
-        # continuation is running in the background.
-        return None
+        result = terminal_tool(command=cmd, force=True)
+        return f"✅ Command approved and executed{scope_msg}.\n\n```\n{result[:3500]}\n```"
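Review note: the removed line ran `terminal_tool` through `asyncio.to_thread`, while the replacement calls it directly inside the coroutine. The sketch below shows what offloading buys, assuming the tool call is blocking. `slow_tool` and `heartbeat` are hypothetical stand-ins for illustration, not names from this codebase.

```python
import asyncio
import time


def slow_tool(command: str) -> str:
    """Hypothetical stand-in for a blocking tool call."""
    time.sleep(0.2)
    return f"ran: {command}"


async def heartbeat(ticks: list) -> None:
    """Record a few ticks; this only progresses while the event loop is free."""
    for _ in range(3):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.05)


async def run_offloaded(command: str):
    """Run the blocking call in a worker thread so other tasks keep running."""
    ticks: list = []
    hb = asyncio.create_task(heartbeat(ticks))
    # asyncio.to_thread forwards keyword arguments to the target function.
    result = await asyncio.to_thread(slow_tool, command=command)
    await hb
    return result, len(ticks)
```

Calling `slow_tool(command)` directly inside the coroutine would still return the same string, but the heartbeat task could not run until the call finished.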

    async def _handle_deny_command(self, event: MessageEvent) -> str:
        """Handle /deny command — reject a pending dangerous command."""
@@ -4843,8 +4485,8 @@ class GatewayRunner:
    async def _handle_update_command(self, event: MessageEvent) -> str:
        """Handle /update command — update Hermes Agent to the latest version.

-        Spawns ``hermes update`` in a detached session (via ``setsid``) so it
-        survives the gateway restart that ``hermes update`` may trigger. Marker
+        Spawns ``hermes update`` in a separate systemd scope so it survives the
+        gateway restart that ``hermes update`` may trigger at the end. Marker
        files are written so either the current gateway process or the next one
        can notify the user when the update finishes.
        """
@@ -4852,10 +4494,6 @@ class GatewayRunner:
        import shutil
        import subprocess
        from datetime import datetime
-        from hermes_cli.config import is_managed, format_managed_message
-
-        if is_managed():
-            return f"✗ {format_managed_message('update Hermes Agent')}"

        project_root = Path(__file__).parent.parent.resolve()
        git_dir = project_root / '.git'
@@ -4884,28 +4522,28 @@ class GatewayRunner:
        pending_path.write_text(json.dumps(pending))
        exit_code_path.unlink(missing_ok=True)

-        # Spawn `hermes update` detached so it survives gateway restart.
-        # Use setsid for portable session detach (works under system services
-        # where systemd-run --user fails due to missing D-Bus session).
+        # Spawn `hermes update` in a separate cgroup so it survives gateway
+        # restart. systemd-run --user --scope creates a transient scope unit.
        hermes_cmd_str = " ".join(shlex.quote(part) for part in hermes_cmd)
        update_cmd = (
            f"{hermes_cmd_str} update > {shlex.quote(str(output_path))} 2>&1; "
            f"status=$?; printf '%s' \"$status\" > {shlex.quote(str(exit_code_path))}"
        )
        try:
-            setsid_bin = shutil.which("setsid")
-            if setsid_bin:
-                # Preferred: setsid creates a new session, fully detached
+            systemd_run = shutil.which("systemd-run")
+            if systemd_run:
                subprocess.Popen(
-                    [setsid_bin, "bash", "-c", update_cmd],
+                    [systemd_run, "--user", "--scope",
+                     "--unit=hermes-update", "--",
+                     "bash", "-c", update_cmd],
                    stdout=subprocess.DEVNULL,
                    stderr=subprocess.DEVNULL,
                    start_new_session=True,
                )
            else:
-                # Fallback: start_new_session=True calls os.setsid() in child
+                # Fallback: best-effort detach with start_new_session
                subprocess.Popen(
-                    ["bash", "-c", update_cmd],
+                    ["bash", "-c", f"nohup {update_cmd} &"],
                    stdout=subprocess.DEVNULL,
                    stderr=subprocess.DEVNULL,
                    start_new_session=True,
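Review note: the hunk above builds a shell snippet with `shlex.quote` and then picks a launcher depending on whether `systemd-run` is on the PATH. A minimal sketch of that selection logic as two pure functions follows. The function names are hypothetical, and the fallback branch here omits the `nohup` wrapper used in the diff for brevity.

```python
import shlex
from typing import List, Optional


def build_update_command(hermes_cmd: List[str], output_path: str, exit_code_path: str) -> str:
    """Shell snippet that runs the update, captures output, and records the exit status."""
    cmd = " ".join(shlex.quote(part) for part in hermes_cmd)
    return (
        f"{cmd} update > {shlex.quote(output_path)} 2>&1; "
        f"status=$?; printf '%s' \"$status\" > {shlex.quote(exit_code_path)}"
    )


def build_launch_argv(update_cmd: str, systemd_run: Optional[str]) -> List[str]:
    """Prefer a transient systemd user scope; otherwise fall back to plain bash."""
    if systemd_run:
        return [systemd_run, "--user", "--scope", "--unit=hermes-update", "--",
                "bash", "-c", update_cmd]
    return ["bash", "-c", update_cmd]
```

In use, the second argument would come from `shutil.which("systemd-run")`, and the resulting list would be passed to `subprocess.Popen` with `start_new_session=True`, as in the hunk.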
@@ -5896,9 +5534,7 @@ class GatewayRunner:
            # If so, update the session store entry so the NEXT message loads
            # the compressed transcript, not the stale pre-compression one.
            agent = agent_holder[0]
-            _session_was_split = False
            if agent and session_key and hasattr(agent, 'session_id') and agent.session_id != session_id:
-                _session_was_split = True
                logger.info(
                    "Session split detected: %s → %s (compression)",
                    session_id, agent.session_id,
@@ -5910,13 +5546,6 @@ class GatewayRunner:

            effective_session_id = getattr(agent, 'session_id', session_id) if agent else session_id

-            # When compression created a new session, the messages list was
-            # shortened.  Using the original history offset would produce an
-            # empty new_messages slice, causing the gateway to write only a
-            # user/assistant pair — losing the compressed summary and tail.
-            # Reset to 0 so the gateway writes ALL compressed messages.
-            _effective_history_offset = 0 if _session_was_split else len(agent_history)
-
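Review note: the removed comment explains that after compression shortens the message list, slicing with the original history length yields an empty `new_messages` slice. A small sketch of that effect, assuming the gateway takes new messages as `messages[history_offset:]`; the helper names are hypothetical.

```python
from typing import Dict, List

Message = Dict[str, str]


def new_messages(all_messages: List[Message], history_offset: int) -> List[Message]:
    """Return the messages that would be treated as new for this turn."""
    return all_messages[history_offset:]


def effective_offset(history_len: int, session_was_split: bool) -> int:
    """Offset rule from the removed code: restart at 0 after a session split."""
    return 0 if session_was_split else history_len
```

With a 10-message history compressed down to 4 messages, `new_messages(compressed, 10)` is empty, while `new_messages(compressed, effective_offset(10, True))` returns all 4. The added line in this diff always uses `len(agent_history)`, which corresponds to the first case.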
            # Auto-generate session title after first exchange (non-blocking)
            if final_response and self._session_db:
                try:
@@ -5938,7 +5567,7 @@ class GatewayRunner:
                "messages": result_holder[0].get("messages", []) if result_holder[0] else [],
                "api_calls": result_holder[0].get("api_calls", 0) if result_holder[0] else 0,
                "tools": tools_holder[0] or [],
-                "history_offset": _effective_history_offset,
+                "history_offset": len(agent_history),
                "last_prompt_tokens": _last_prompt_toks,
                "input_tokens": _input_toks,
                "output_tokens": _output_toks,
@@ -160,7 +160,7 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
        id="alibaba",
        name="Alibaba Cloud (DashScope)",
        auth_type="api_key",
-        inference_base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
+        inference_base_url="https://coding-intl.dashscope.aliyuncs.com/v1",
        api_key_env_vars=("DASHSCOPE_API_KEY",),
        base_url_env_var="DASHSCOPE_BASE_URL",
    ),
@@ -545,11 +545,7 @@ def _load_auth_store(auth_file: Optional[Path] = None) -> Dict[str, Any]:
    except Exception:
        return {"version": AUTH_STORE_VERSION, "providers": {}}

-    if isinstance(raw, dict) and (
-        isinstance(raw.get("providers"), dict)
-        or isinstance(raw.get("credential_pool"), dict)
-    ):
-        raw.setdefault("providers", {})
+    if isinstance(raw, dict) and isinstance(raw.get("providers"), dict):
        return raw

    # Migrate from PR's "systems" format if present
@@ -617,30 +613,6 @@ def _save_provider_state(auth_store: Dict[str, Any], provider_id: str, state: Di
    auth_store["active_provider"] = provider_id


-def read_credential_pool(provider_id: Optional[str] = None) -> Dict[str, Any]:
-    """Return the persisted credential pool, or one provider slice."""
-    auth_store = _load_auth_store()
-    pool = auth_store.get("credential_pool")
-    if not isinstance(pool, dict):
-        pool = {}
-    if provider_id is None:
-        return dict(pool)
-    provider_entries = pool.get(provider_id)
-    return list(provider_entries) if isinstance(provider_entries, list) else []
-
-
-def write_credential_pool(provider_id: str, entries: List[Dict[str, Any]]) -> Path:
-    """Persist one provider's credential pool under auth.json."""
-    with _auth_store_lock():
-        auth_store = _load_auth_store()
-        pool = auth_store.get("credential_pool")
-        if not isinstance(pool, dict):
-            pool = {}
-            auth_store["credential_pool"] = pool
-        pool[provider_id] = list(entries)
-        return _save_auth_store(auth_store)
-
-
 def get_provider_auth_state(provider_id: str) -> Optional[Dict[str, Any]]:
    """Return persisted auth state for a provider, or None."""
    auth_store = _load_auth_store()
@@ -666,25 +638,10 @@ def clear_provider_auth(provider_id: Optional[str] = None) -> bool:
            return False

        providers = auth_store.get("providers", {})
-        if not isinstance(providers, dict):
-            providers = {}
-            auth_store["providers"] = providers
-
-        pool = auth_store.get("credential_pool")
-        if not isinstance(pool, dict):
-            pool = {}
-            auth_store["credential_pool"] = pool
-
-        cleared = False
-        if target in providers:
-            del providers[target]
-            cleared = True
-        if target in pool:
-            del pool[target]
-            cleared = True
-
-        if not cleared:
+        if target not in providers:
            return False
+
+        del providers[target]
        if auth_store.get("active_provider") == target:
            auth_store["active_provider"] = None
        _save_auth_store(auth_store)
@@ -941,14 +898,15 @@ def _save_codex_tokens(tokens: Dict[str, str], last_refresh: str = None) -> None
        _save_auth_store(auth_store)


-def refresh_codex_oauth_pure(
-    access_token: str,
-    refresh_token: str,
-    *,
-    timeout_seconds: float = 20.0,
-) -> Dict[str, Any]:
-    """Refresh Codex OAuth tokens without mutating Hermes auth state."""
-    del access_token  # Access token is only used by callers to decide whether to refresh.
+def _refresh_codex_auth_tokens(
+    tokens: Dict[str, str],
+    timeout_seconds: float,
+) -> Dict[str, str]:
+    """Refresh Codex access token using the refresh token.
+    
+    Saves the new tokens to Hermes auth store automatically.
+    """
+    refresh_token = tokens.get("refresh_token")
    if not isinstance(refresh_token, str) or not refresh_token.strip():
        raise AuthError(
            "Codex auth is missing refresh_token. Run `hermes login` to re-authenticate.",
@@ -1003,8 +961,8 @@ def refresh_codex_oauth_pure(
            relogin_required=True,
        ) from exc

-    refreshed_access = refresh_payload.get("access_token")
-    if not isinstance(refreshed_access, str) or not refreshed_access.strip():
+    access_token = refresh_payload.get("access_token")
+    if not isinstance(access_token, str) or not access_token.strip():
        raise AuthError(
            "Codex token refresh response was missing access_token.",
            provider="openai-codex",
@@ -1012,33 +970,11 @@ def refresh_codex_oauth_pure(
            relogin_required=True,
        )

-    updated = {
-        "access_token": refreshed_access.strip(),
-        "refresh_token": refresh_token.strip(),
-        "last_refresh": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
-    }
+    updated_tokens = dict(tokens)
+    updated_tokens["access_token"] = access_token.strip()
    next_refresh = refresh_payload.get("refresh_token")
    if isinstance(next_refresh, str) and next_refresh.strip():
-        updated["refresh_token"] = next_refresh.strip()
-    return updated
-
-
-def _refresh_codex_auth_tokens(
-    tokens: Dict[str, str],
-    timeout_seconds: float,
-) -> Dict[str, str]:
-    """Refresh Codex access token using the refresh token.
-    
-    Saves the new tokens to Hermes auth store automatically.
-    """
-    refreshed = refresh_codex_oauth_pure(
-        str(tokens.get("access_token", "") or ""),
-        str(tokens.get("refresh_token", "") or ""),
-        timeout_seconds=timeout_seconds,
-    )
-    updated_tokens = dict(tokens)
-    updated_tokens["access_token"] = refreshed["access_token"]
-    updated_tokens["refresh_token"] = refreshed["refresh_token"]
+        updated_tokens["refresh_token"] = next_refresh.strip()

    _save_codex_tokens(updated_tokens)
    return updated_tokens
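Review note: after this change the merge of refreshed tokens lives inline in `_refresh_codex_auth_tokens`. The rule it applies can be sketched as a pure function: always replace the access token, and rotate the refresh token only when the response supplies a non-empty one. The function name is hypothetical, and `ValueError` stands in for the `AuthError` raised in the real code.

```python
from typing import Any, Dict


def merge_refreshed_tokens(tokens: Dict[str, str], refresh_payload: Dict[str, Any]) -> Dict[str, str]:
    """Return a copy of `tokens` updated from a refresh response."""
    access_token = refresh_payload.get("access_token")
    if not isinstance(access_token, str) or not access_token.strip():
        # Stand-in for the AuthError raised in the diff.
        raise ValueError("refresh response was missing access_token")

    updated = dict(tokens)  # copy, so the caller's dict is not mutated
    updated["access_token"] = access_token.strip()

    # Keep the old refresh token unless the server rotated it.
    next_refresh = refresh_payload.get("refresh_token")
    if isinstance(next_refresh, str) and next_refresh.strip():
        updated["refresh_token"] = next_refresh.strip()
    return updated
```

Keeping this step free of I/O is what the removed `refresh_codex_oauth_pure` provided; the sketch only illustrates the merge, not the HTTP refresh or the save to the auth store.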
@@ -1377,122 +1313,6 @@ def _agent_key_is_usable(state: Dict[str, Any], min_ttl_seconds: int) -> bool:
    return not _is_expiring(state.get("agent_key_expires_at"), min_ttl_seconds)


-def refresh_nous_oauth_pure(
-    access_token: str,
-    refresh_token: str,
-    client_id: str,
-    portal_base_url: str,
-    inference_base_url: str,
-    *,
-    token_type: str = "Bearer",
-    scope: str = DEFAULT_NOUS_SCOPE,
-    obtained_at: Optional[str] = None,
-    expires_at: Optional[str] = None,
-    agent_key: Optional[str] = None,
-    agent_key_expires_at: Optional[str] = None,
-    min_key_ttl_seconds: int = DEFAULT_AGENT_KEY_MIN_TTL_SECONDS,
-    timeout_seconds: float = 15.0,
-    insecure: Optional[bool] = None,
-    ca_bundle: Optional[str] = None,
-    force_refresh: bool = False,
-    force_mint: bool = False,
-) -> Dict[str, Any]:
-    """Refresh Nous OAuth state without mutating auth.json."""
-    state: Dict[str, Any] = {
-        "access_token": access_token,
-        "refresh_token": refresh_token,
-        "client_id": client_id or DEFAULT_NOUS_CLIENT_ID,
-        "portal_base_url": (portal_base_url or DEFAULT_NOUS_PORTAL_URL).rstrip("/"),
-        "inference_base_url": (inference_base_url or DEFAULT_NOUS_INFERENCE_URL).rstrip("/"),
-        "token_type": token_type or "Bearer",
-        "scope": scope or DEFAULT_NOUS_SCOPE,
-        "obtained_at": obtained_at,
-        "expires_at": expires_at,
-        "agent_key": agent_key,
-        "agent_key_expires_at": agent_key_expires_at,
-        "tls": {
-            "insecure": bool(insecure),
-            "ca_bundle": ca_bundle,
-        },
-    }
-    verify = _resolve_verify(insecure=insecure, ca_bundle=ca_bundle, auth_state=state)
-    timeout = httpx.Timeout(timeout_seconds if timeout_seconds else 15.0)
-
-    with httpx.Client(timeout=timeout, headers={"Accept": "application/json"}, verify=verify) as client:
-        if force_refresh or _is_expiring(state.get("expires_at"), ACCESS_TOKEN_REFRESH_SKEW_SECONDS):
-            refreshed = _refresh_access_token(
-                client=client,
-                portal_base_url=state["portal_base_url"],
-                client_id=state["client_id"],
-                refresh_token=state["refresh_token"],
-            )
-            now = datetime.now(timezone.utc)
-            access_ttl = _coerce_ttl_seconds(refreshed.get("expires_in"))
-            state["access_token"] = refreshed["access_token"]
-            state["refresh_token"] = refreshed.get("refresh_token") or state["refresh_token"]
-            state["token_type"] = refreshed.get("token_type") or state.get("token_type") or "Bearer"
-            state["scope"] = refreshed.get("scope") or state.get("scope")
-            refreshed_url = _optional_base_url(refreshed.get("inference_base_url"))
-            if refreshed_url:
-                state["inference_base_url"] = refreshed_url
-            state["obtained_at"] = now.isoformat()
-            state["expires_in"] = access_ttl
-            state["expires_at"] = datetime.fromtimestamp(
-                now.timestamp() + access_ttl, tz=timezone.utc
-            ).isoformat()
-
-        if force_mint or not _agent_key_is_usable(state, max(60, int(min_key_ttl_seconds))):
-            mint_payload = _mint_agent_key(
-                client=client,
-                portal_base_url=state["portal_base_url"],
-                access_token=state["access_token"],
-                min_ttl_seconds=min_key_ttl_seconds,
-            )
-            now = datetime.now(timezone.utc)
-            state["agent_key"] = mint_payload.get("api_key")
-            state["agent_key_id"] = mint_payload.get("key_id")
-            state["agent_key_expires_at"] = mint_payload.get("expires_at")
-            state["agent_key_expires_in"] = mint_payload.get("expires_in")
-            state["agent_key_reused"] = bool(mint_payload.get("reused", False))
-            state["agent_key_obtained_at"] = now.isoformat()
-            minted_url = _optional_base_url(mint_payload.get("inference_base_url"))
-            if minted_url:
-                state["inference_base_url"] = minted_url
-
-    return state
-
-
-def refresh_nous_oauth_from_state(
-    state: Dict[str, Any],
-    *,
-    min_key_ttl_seconds: int = DEFAULT_AGENT_KEY_MIN_TTL_SECONDS,
-    timeout_seconds: float = 15.0,
-    force_refresh: bool = False,
-    force_mint: bool = False,
-) -> Dict[str, Any]:
-    """Refresh Nous OAuth from a state dict. Thin wrapper around refresh_nous_oauth_pure."""
-    tls = state.get("tls") or {}
-    return refresh_nous_oauth_pure(
-        state.get("access_token", ""),
-        state.get("refresh_token", ""),
-        state.get("client_id", "hermes-cli"),
-        state.get("portal_base_url", DEFAULT_NOUS_PORTAL_URL),
-        state.get("inference_base_url", DEFAULT_NOUS_INFERENCE_URL),
-        token_type=state.get("token_type", "Bearer"),
-        scope=state.get("scope", DEFAULT_NOUS_SCOPE),
-        obtained_at=state.get("obtained_at"),
-        expires_at=state.get("expires_at"),
-        agent_key=state.get("agent_key"),
-        agent_key_expires_at=state.get("agent_key_expires_at"),
-        min_key_ttl_seconds=min_key_ttl_seconds,
-        timeout_seconds=timeout_seconds,
-        insecure=tls.get("insecure"),
-        ca_bundle=tls.get("ca_bundle"),
-        force_refresh=force_refresh,
-        force_mint=force_mint,
-    )
-
-
 def resolve_nous_runtime_credentials(
    *,
    min_key_ttl_seconds: int = DEFAULT_AGENT_KEY_MIN_TTL_SECONDS,
@@ -2360,36 +2180,34 @@ def _codex_device_code_login() -> Dict[str, Any]:
    }


-def _nous_device_code_login(
-    *,
-    portal_base_url: Optional[str] = None,
-    inference_base_url: Optional[str] = None,
-    client_id: Optional[str] = None,
-    scope: Optional[str] = None,
-    open_browser: bool = True,
-    timeout_seconds: float = 15.0,
-    insecure: bool = False,
-    ca_bundle: Optional[str] = None,
-    min_key_ttl_seconds: int = 5 * 60,
-) -> Dict[str, Any]:
-    """Run the Nous device-code flow and return full OAuth state without persisting."""
-    pconfig = PROVIDER_REGISTRY["nous"]
+def _login_nous(args, pconfig: ProviderConfig) -> None:
+    """Nous Portal device authorization flow."""
    portal_base_url = (
-        portal_base_url
+        getattr(args, "portal_url", None)
        or os.getenv("HERMES_PORTAL_BASE_URL")
        or os.getenv("NOUS_PORTAL_BASE_URL")
        or pconfig.portal_base_url
    ).rstrip("/")
    requested_inference_url = (
-        inference_base_url
+        getattr(args, "inference_url", None)
        or os.getenv("NOUS_INFERENCE_BASE_URL")
        or pconfig.inference_base_url
    ).rstrip("/")
-    client_id = client_id or pconfig.client_id
-    scope = scope or pconfig.scope
+    client_id = getattr(args, "client_id", None) or pconfig.client_id
+    scope = getattr(args, "scope", None) or pconfig.scope
+    open_browser = not getattr(args, "no_browser", False)
+    timeout_seconds = getattr(args, "timeout", None) or 15.0
    timeout = httpx.Timeout(timeout_seconds)
+
+    insecure = bool(getattr(args, "insecure", False))
+    ca_bundle = (
+        getattr(args, "ca_bundle", None)
+        or os.getenv("HERMES_CA_BUNDLE")
+        or os.getenv("SSL_CERT_FILE")
+    )
    verify: bool | str = False if insecure else (ca_bundle if ca_bundle else True)

+    # Skip browser open in SSH sessions
    if _is_remote_session():
        open_browser = False

@@ -2400,109 +2218,74 @@ def _nous_device_code_login(
    elif ca_bundle:
        print(f"TLS verification: custom CA bundle ({ca_bundle})")

-    with httpx.Client(timeout=timeout, headers={"Accept": "application/json"}, verify=verify) as client:
-        device_data = _request_device_code(
-            client=client,
-            portal_base_url=portal_base_url,
-            client_id=client_id,
-            scope=scope,
-        )
-
-        verification_url = str(device_data["verification_uri_complete"])
-        user_code = str(device_data["user_code"])
-        expires_in = int(device_data["expires_in"])
-        interval = int(device_data["interval"])
-
-        print()
-        print("To continue:")
-        print(f"  1. Open: {verification_url}")
-        print(f"  2. If prompted, enter code: {user_code}")
-
-        if open_browser:
-            opened = webbrowser.open(verification_url)
-            if opened:
-                print("  (Opened browser for verification)")
-            else:
-                print("  Could not open browser automatically — use the URL above.")
-
-        effective_interval = max(1, min(interval, DEVICE_AUTH_POLL_INTERVAL_CAP_SECONDS))
-        print(f"Waiting for approval (polling every {effective_interval}s)...")
-
-        token_data = _poll_for_token(
-            client=client,
-            portal_base_url=portal_base_url,
-            client_id=client_id,
-            device_code=str(device_data["device_code"]),
-            expires_in=expires_in,
-            poll_interval=interval,
-        )
-
-    now = datetime.now(timezone.utc)
-    token_expires_in = _coerce_ttl_seconds(token_data.get("expires_in", 0))
-    expires_at = now.timestamp() + token_expires_in
-    resolved_inference_url = (
-        _optional_base_url(token_data.get("inference_base_url"))
-        or requested_inference_url
-    )
-    if resolved_inference_url != requested_inference_url:
-        print(f"Using portal-provided inference URL: {resolved_inference_url}")
-
-    auth_state = {
-        "portal_base_url": portal_base_url,
-        "inference_base_url": resolved_inference_url,
-        "client_id": client_id,
-        "scope": token_data.get("scope") or scope,
-        "token_type": token_data.get("token_type", "Bearer"),
-        "access_token": token_data["access_token"],
-        "refresh_token": token_data.get("refresh_token"),
-        "obtained_at": now.isoformat(),
-        "expires_at": datetime.fromtimestamp(expires_at, tz=timezone.utc).isoformat(),
-        "expires_in": token_expires_in,
-        "tls": {
-            "insecure": verify is False,
-            "ca_bundle": verify if isinstance(verify, str) else None,
-        },
-        "agent_key": None,
-        "agent_key_id": None,
-        "agent_key_expires_at": None,
-        "agent_key_expires_in": None,
-        "agent_key_reused": None,
-        "agent_key_obtained_at": None,
-    }
-    return refresh_nous_oauth_from_state(
-        auth_state,
-        min_key_ttl_seconds=min_key_ttl_seconds,
-        timeout_seconds=timeout_seconds,
-        force_refresh=False,
-        force_mint=True,
-    )
-
-
-def _login_nous(args, pconfig: ProviderConfig) -> None:
-    """Nous Portal device authorization flow."""
-    timeout_seconds = getattr(args, "timeout", None) or 15.0
-    insecure = bool(getattr(args, "insecure", False))
-    ca_bundle = (
-        getattr(args, "ca_bundle", None)
-        or os.getenv("HERMES_CA_BUNDLE")
-        or os.getenv("SSL_CERT_FILE")
-    )
-
    try:
-        auth_state = _nous_device_code_login(
-            portal_base_url=getattr(args, "portal_url", None) or pconfig.portal_base_url,
-            inference_base_url=getattr(args, "inference_url", None) or pconfig.inference_base_url,
-            client_id=getattr(args, "client_id", None) or pconfig.client_id,
-            scope=getattr(args, "scope", None) or pconfig.scope,
-            open_browser=not getattr(args, "no_browser", False),
-            timeout_seconds=timeout_seconds,
-            insecure=insecure,
-            ca_bundle=ca_bundle,
-            min_key_ttl_seconds=5 * 60,
-        )
-        inference_base_url = auth_state["inference_base_url"]
-        verify: bool | str = False if insecure else (ca_bundle if ca_bundle else True)
+        with httpx.Client(timeout=timeout, headers={"Accept": "application/json"}, verify=verify) as client:
+            device_data = _request_device_code(
+                client=client, portal_base_url=portal_base_url,
+                client_id=client_id, scope=scope,
+            )

+            verification_url = str(device_data["verification_uri_complete"])
+            user_code = str(device_data["user_code"])
+            expires_in = int(device_data["expires_in"])
+            interval = int(device_data["interval"])
+
+            print()
+            print("To continue:")
+            print(f"  1. Open: {verification_url}")
+            print(f"  2. If prompted, enter code: {user_code}")
+
+            if open_browser:
+                opened = webbrowser.open(verification_url)
+                if opened:
+                    print("  (Opened browser for verification)")
+                else:
+                    print("  Could not open browser automatically — use the URL above.")
+
+            effective_interval = max(1, min(interval, DEVICE_AUTH_POLL_INTERVAL_CAP_SECONDS))
+            print(f"Waiting for approval (polling every {effective_interval}s)...")
+
+            token_data = _poll_for_token(
+                client=client, portal_base_url=portal_base_url,
+                client_id=client_id, device_code=str(device_data["device_code"]),
+                expires_in=expires_in, poll_interval=interval,
+            )
+
+        # Process token response
+        now = datetime.now(timezone.utc)
+        token_expires_in = _coerce_ttl_seconds(token_data.get("expires_in", 0))
+        expires_at = now.timestamp() + token_expires_in
+        inference_base_url = (
+            _optional_base_url(token_data.get("inference_base_url"))
+            or requested_inference_url
+        )
+        if inference_base_url != requested_inference_url:
+            print(f"Using portal-provided inference URL: {inference_base_url}")
+
+        auth_state = {
+            "portal_base_url": portal_base_url,
+            "inference_base_url": inference_base_url,
+            "client_id": client_id,
+            "scope": token_data.get("scope") or scope,
+            "token_type": token_data.get("token_type", "Bearer"),
+            "access_token": token_data["access_token"],
+            "refresh_token": token_data.get("refresh_token"),
+            "obtained_at": now.isoformat(),
+            "expires_at": datetime.fromtimestamp(expires_at, tz=timezone.utc).isoformat(),
+            "expires_in": token_expires_in,
+            "tls": {
+                "insecure": verify is False,
+                "ca_bundle": verify if isinstance(verify, str) else None,
+            },
+            "agent_key": None,
+            "agent_key_id": None,
+            "agent_key_expires_at": None,
+            "agent_key_expires_in": None,
+            "agent_key_reused": None,
+            "agent_key_obtained_at": None,
+        }
+
+        # Save auth state
        with _auth_store_lock():
            auth_store = _load_auth_store()
            _save_provider_state(auth_store, "nous", auth_state)
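Review note: the hunk above clamps the portal-suggested polling interval and converts the token TTL into an absolute expiry timestamp. The sketch below isolates those two computations. The cap value and both function names are assumptions for illustration; the real `DEVICE_AUTH_POLL_INTERVAL_CAP_SECONDS` is defined outside this diff, and `_coerce_ttl_seconds` is not reproduced here.

```python
from datetime import datetime, timezone

# Hypothetical value; the real constant is defined elsewhere and not shown in this diff.
POLL_INTERVAL_CAP_SECONDS = 5


def clamp_poll_interval(interval: int, cap: int = POLL_INTERVAL_CAP_SECONDS) -> int:
    """Clamp a server-suggested interval to the range [1, cap] seconds."""
    return max(1, min(interval, cap))


def expiry_iso(now: datetime, ttl_seconds: int) -> str:
    """Return the absolute expiry as an ISO 8601 UTC string, mirroring the hunk."""
    expires_at = now.timestamp() + ttl_seconds
    return datetime.fromtimestamp(expires_at, tz=timezone.utc).isoformat()
```

Clamping on the lower bound protects against a portal that returns `0` or a negative interval, while the upper bound keeps the CLI responsive once the user approves in the browser.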
@@ -2514,14 +2297,18 @@ def _login_nous(args, pconfig: ProviderConfig) -> None:
        print(f"  Auth state: {saved_to}")
        print(f"  Config updated: {config_path} (model.provider=nous)")

+        # Mint an initial agent key and list available models
        try:
-            runtime_key = auth_state.get("agent_key") or auth_state.get("access_token")
+            runtime_creds = resolve_nous_runtime_credentials(
+                min_key_ttl_seconds=5 * 60,
+                timeout_seconds=timeout_seconds,
+                insecure=insecure, ca_bundle=ca_bundle,
+            )
+            runtime_key = runtime_creds.get("api_key")
+            runtime_base_url = runtime_creds.get("base_url") or inference_base_url
            if not isinstance(runtime_key, str) or not runtime_key:
-                raise AuthError(
-                    "No runtime API key available to fetch models",
-                    provider="nous",
-                    code="invalid_token",
-                )
+                raise AuthError("No runtime API key available to fetch models",
+                                provider="nous", code="invalid_token")

            # Use curated model list (same as OpenRouter defaults) instead
            # of the full /models dump which returns hundreds of models.
@@ -1,470 +0,0 @@
-"""Credential-pool auth subcommands."""
-
-from __future__ import annotations
-
-from getpass import getpass
-import math
-import time
-from types import SimpleNamespace
-import uuid
-
-from agent.credential_pool import (
-    AUTH_TYPE_API_KEY,
-    AUTH_TYPE_OAUTH,
-    CUSTOM_POOL_PREFIX,
-    SOURCE_MANUAL,
-    STATUS_EXHAUSTED,
-    STRATEGY_FILL_FIRST,
-    STRATEGY_ROUND_ROBIN,
-    STRATEGY_RANDOM,
-    STRATEGY_LEAST_USED,
-    SUPPORTED_POOL_STRATEGIES,
-    PooledCredential,
-    _normalize_custom_pool_name,
-    get_pool_strategy,
-    label_from_token,
-    list_custom_pool_providers,
-    load_pool,
-    _exhausted_ttl,
-)
-import hermes_cli.auth as auth_mod
-from hermes_cli.auth import PROVIDER_REGISTRY
-from hermes_constants import OPENROUTER_BASE_URL
-
-
-# Providers that support OAuth login in addition to API keys.
-_OAUTH_CAPABLE_PROVIDERS = {"anthropic", "nous", "openai-codex"}
-
-
-def _get_custom_provider_names() -> list:
-    """Return list of (display_name, pool_key) tuples for custom_providers in config."""
-    try:
-        from hermes_cli.config import load_config
-
-        config = load_config()
-    except Exception:
-        return []
-    custom_providers = config.get("custom_providers")
-    if not isinstance(custom_providers, list):
-        return []
-    result = []
-    for entry in custom_providers:
-        if not isinstance(entry, dict):
-            continue
-        name = entry.get("name")
-        if not isinstance(name, str) or not name.strip():
-            continue
-        pool_key = f"{CUSTOM_POOL_PREFIX}{_normalize_custom_pool_name(name)}"
-        result.append((name.strip(), pool_key))
-    return result
-
-
-def _resolve_custom_provider_input(raw: str) -> str | None:
-    """If raw input matches a custom_providers entry name (case-insensitive), return its pool key."""
-    normalized = (raw or "").strip().lower().replace(" ", "-")
-    if not normalized:
-        return None
-    # Direct match on 'custom:name' format
-    if normalized.startswith(CUSTOM_POOL_PREFIX):
-        return normalized
-    for display_name, pool_key in _get_custom_provider_names():
-        if _normalize_custom_pool_name(display_name) == normalized:
-            return pool_key
-    return None
-
-
-def _normalize_provider(provider: str) -> str:
-    normalized = (provider or "").strip().lower()
-    if normalized in {"or", "open-router"}:
-        return "openrouter"
-    # Check if it matches a custom provider name
-    custom_key = _resolve_custom_provider_input(normalized)
-    if custom_key:
-        return custom_key
-    return normalized
-
-
-def _provider_base_url(provider: str) -> str:
-    if provider == "openrouter":
-        return OPENROUTER_BASE_URL
-    if provider.startswith(CUSTOM_POOL_PREFIX):
-        from agent.credential_pool import _get_custom_provider_config
-
-        cp_config = _get_custom_provider_config(provider)
-        if cp_config:
-            return str(cp_config.get("base_url") or "").strip()
-        return ""
-    pconfig = PROVIDER_REGISTRY.get(provider)
-    return pconfig.inference_base_url if pconfig else ""
-
-
-def _oauth_default_label(provider: str, count: int) -> str:
-    return f"{provider}-oauth-{count}"
-
-
-def _api_key_default_label(count: int) -> str:
-    return f"api-key-{count}"
-
-
-def _display_source(source: str) -> str:
-    return source.split(":", 1)[1] if source.startswith("manual:") else source
-
-
-def _format_exhausted_status(entry) -> str:
-    if entry.last_status != STATUS_EXHAUSTED:
-        return ""
-    code = f" ({entry.last_error_code})" if entry.last_error_code else ""
-    if not entry.last_status_at:
-        return f" exhausted{code}"
-    remaining = max(0, int(math.ceil((entry.last_status_at + _exhausted_ttl(entry.last_error_code)) - time.time())))
-    if remaining <= 0:
-        return f" exhausted{code} (ready to retry)"
-    minutes, seconds = divmod(remaining, 60)
-    hours, minutes = divmod(minutes, 60)
-    if hours:
-        wait = f"{hours}h {minutes}m"
-    elif minutes:
-        wait = f"{minutes}m {seconds}s"
-    else:
-        wait = f"{seconds}s"
-    return f" exhausted{code} ({wait} left)"
-
-
-def auth_add_command(args) -> None:
-    provider = _normalize_provider(getattr(args, "provider", ""))
-    if provider not in PROVIDER_REGISTRY and provider != "openrouter" and not provider.startswith(CUSTOM_POOL_PREFIX):
-        raise SystemExit(f"Unknown provider: {provider}")
-
-    requested_type = str(getattr(args, "auth_type", "") or "").strip().lower()
-    if requested_type in {AUTH_TYPE_API_KEY, "api-key"}:
-        requested_type = AUTH_TYPE_API_KEY
-    if not requested_type:
-        if provider.startswith(CUSTOM_POOL_PREFIX):
-            requested_type = AUTH_TYPE_API_KEY
-        else:
-            requested_type = AUTH_TYPE_OAUTH if provider in {"anthropic", "nous", "openai-codex"} else AUTH_TYPE_API_KEY
-
-    pool = load_pool(provider)
-
-    if requested_type == AUTH_TYPE_API_KEY:
-        token = (getattr(args, "api_key", None) or "").strip()
-        if not token:
-            token = getpass("Paste your API key: ").strip()
-        if not token:
-            raise SystemExit("No API key provided.")
-        default_label = _api_key_default_label(len(pool.entries()) + 1)
-        label = (getattr(args, "label", None) or "").strip()
-        if not label:
-            label = input(f"Label (optional, default: {default_label}): ").strip() or default_label
-        entry = PooledCredential(
-            provider=provider,
-            id=uuid.uuid4().hex[:6],
-            label=label,
-            auth_type=AUTH_TYPE_API_KEY,
-            priority=0,
-            source=SOURCE_MANUAL,
-            access_token=token,
-            base_url=_provider_base_url(provider),
-        )
-        pool.add_entry(entry)
-        print(f'Added {provider} credential #{len(pool.entries())}: "{label}"')
-        return
-
-    if provider == "anthropic":
-        from agent import anthropic_adapter as anthropic_mod
-
-        creds = anthropic_mod.run_hermes_oauth_login_pure()
-        if not creds:
-            raise SystemExit("Anthropic OAuth login did not return credentials.")
-        label = (getattr(args, "label", None) or "").strip() or label_from_token(
-            creds["access_token"],
-            _oauth_default_label(provider, len(pool.entries()) + 1),
-        )
-        entry = PooledCredential(
-            provider=provider,
-            id=uuid.uuid4().hex[:6],
-            label=label,
-            auth_type=AUTH_TYPE_OAUTH,
-            priority=0,
-            source=f"{SOURCE_MANUAL}:hermes_pkce",
-            access_token=creds["access_token"],
-            refresh_token=creds.get("refresh_token"),
-            expires_at_ms=creds.get("expires_at_ms"),
-            base_url=_provider_base_url(provider),
-        )
-        pool.add_entry(entry)
-        print(f'Added {provider} OAuth credential #{len(pool.entries())}: "{entry.label}"')
-        return
-
-    if provider == "nous":
-        creds = auth_mod._nous_device_code_login(
-            portal_base_url=getattr(args, "portal_url", None),
-            inference_base_url=getattr(args, "inference_url", None),
-            client_id=getattr(args, "client_id", None),
-            scope=getattr(args, "scope", None),
-            open_browser=not getattr(args, "no_browser", False),
-            timeout_seconds=getattr(args, "timeout", None) or 15.0,
-            insecure=bool(getattr(args, "insecure", False)),
-            ca_bundle=getattr(args, "ca_bundle", None),
-            min_key_ttl_seconds=max(60, int(getattr(args, "min_key_ttl_seconds", 5 * 60))),
-        )
-        label = (getattr(args, "label", None) or "").strip() or label_from_token(
-            creds.get("access_token", ""),
-            _oauth_default_label(provider, len(pool.entries()) + 1),
-        )
-        entry = PooledCredential.from_dict(provider, {
-            **creds,
-            "label": label,
-            "auth_type": AUTH_TYPE_OAUTH,
-            "source": f"{SOURCE_MANUAL}:device_code",
-            "base_url": creds.get("inference_base_url"),
-        })
-        pool.add_entry(entry)
-        print(f'Added {provider} OAuth credential #{len(pool.entries())}: "{entry.label}"')
-        return
-
-    if provider == "openai-codex":
-        creds = auth_mod._codex_device_code_login()
-        label = (getattr(args, "label", None) or "").strip() or label_from_token(
-            creds["tokens"]["access_token"],
-            _oauth_default_label(provider, len(pool.entries()) + 1),
-        )
-        entry = PooledCredential(
-            provider=provider,
-            id=uuid.uuid4().hex[:6],
-            label=label,
-            auth_type=AUTH_TYPE_OAUTH,
-            priority=0,
-            source=f"{SOURCE_MANUAL}:device_code",
-            access_token=creds["tokens"]["access_token"],
-            refresh_token=creds["tokens"].get("refresh_token"),
-            base_url=creds.get("base_url"),
-            last_refresh=creds.get("last_refresh"),
-        )
-        pool.add_entry(entry)
-        print(f'Added {provider} OAuth credential #{len(pool.entries())}: "{entry.label}"')
-        return
-
-    raise SystemExit(f"`hermes auth add {provider}` is not implemented for auth type {requested_type} yet.")
-
-
-def auth_list_command(args) -> None:
-    provider_filter = _normalize_provider(getattr(args, "provider", "") or "")
-    if provider_filter:
-        providers = [provider_filter]
-    else:
-        providers = sorted({
-            *PROVIDER_REGISTRY.keys(),
-            "openrouter",
-            *list_custom_pool_providers(),
-        })
-    for provider in providers:
-        pool = load_pool(provider)
-        entries = pool.entries()
-        if not entries:
-            continue
-        current = pool.peek()
-        print(f"{provider} ({len(entries)} credentials):")
-        for idx, entry in enumerate(entries, start=1):
-            marker = "  "
-            if current is not None and entry.id == current.id:
-                marker = "← "
-            status = _format_exhausted_status(entry)
-            source = _display_source(entry.source)
-            print(f"  #{idx}  {entry.label:<20} {entry.auth_type:<7} {source}{status} {marker}".rstrip())
-        print()
-
-
-def auth_remove_command(args) -> None:
-    provider = _normalize_provider(getattr(args, "provider", ""))
-    index = int(getattr(args, "index"))
-    pool = load_pool(provider)
-    removed = pool.remove_index(index)
-    if removed is None:
-        raise SystemExit(f"No credential #{index} for provider {provider}.")
-    print(f"Removed {provider} credential #{index} ({removed.label})")
-
-
-def auth_reset_command(args) -> None:
-    provider = _normalize_provider(getattr(args, "provider", ""))
-    pool = load_pool(provider)
-    count = pool.reset_statuses()
-    print(f"Reset status on {count} {provider} credentials")
-
-
-def _interactive_auth() -> None:
-    """Interactive credential pool management when `hermes auth` is called bare."""
-    # Show current pool status first
-    print("Credential Pool Status")
-    print("=" * 50)
-
-    auth_list_command(SimpleNamespace(provider=None))
-    print()
-
-    # Main menu
-    choices = [
-        "Add a credential",
-        "Remove a credential",
-        "Reset cooldowns for a provider",
-        "Set rotation strategy for a provider",
-        "Exit",
-    ]
-    print("What would you like to do?")
-    for i, choice in enumerate(choices, 1):
-        print(f"  {i}. {choice}")
-
-    try:
-        raw = input("\nChoice: ").strip()
-    except (EOFError, KeyboardInterrupt):
-        return
-
-    if not raw or raw == str(len(choices)):
-        return
-
-    if raw == "1":
-        _interactive_add()
-    elif raw == "2":
-        _interactive_remove()
-    elif raw == "3":
-        _interactive_reset()
-    elif raw == "4":
-        _interactive_strategy()
-
-
-def _pick_provider(prompt: str = "Provider") -> str:
-    """Prompt for a provider name with auto-complete hints."""
-    known = sorted(set(list(PROVIDER_REGISTRY.keys()) + ["openrouter"]))
-    custom_names = _get_custom_provider_names()
-    if custom_names:
-        custom_display = [name for name, _key in custom_names]
-        print(f"\nKnown providers: {', '.join(known)}")
-        print(f"Custom endpoints: {', '.join(custom_display)}")
-    else:
-        print(f"\nKnown providers: {', '.join(known)}")
-    try:
-        raw = input(f"{prompt}: ").strip()
-    except (EOFError, KeyboardInterrupt):
-        raise SystemExit()
-    return _normalize_provider(raw)
-
-
-def _interactive_add() -> None:
-    provider = _pick_provider("Provider to add credential for")
-    if provider not in PROVIDER_REGISTRY and provider != "openrouter" and not provider.startswith(CUSTOM_POOL_PREFIX):
-        raise SystemExit(f"Unknown provider: {provider}")
-
-    # For OAuth-capable providers, ask which type
-    if provider in _OAUTH_CAPABLE_PROVIDERS:
-        print(f"\n{provider} supports both API keys and OAuth login.")
-        print("  1. API key (paste a key from the provider dashboard)")
-        print("  2. OAuth login (authenticate via browser)")
-        try:
-            type_choice = input("Type [1/2]: ").strip()
-        except (EOFError, KeyboardInterrupt):
-            return
-        if type_choice == "2":
-            auth_type = "oauth"
-        else:
-            auth_type = "api_key"
-    else:
-        auth_type = "api_key"
-
-    auth_add_command(SimpleNamespace(
-        provider=provider, auth_type=auth_type, label=None, api_key=None,
-        portal_url=None, inference_url=None, client_id=None, scope=None,
-        no_browser=False, timeout=None, insecure=False, ca_bundle=None,
-    ))
-
-
-def _interactive_remove() -> None:
-    provider = _pick_provider("Provider to remove credential from")
-    pool = load_pool(provider)
-    if not pool.has_credentials():
-        print(f"No credentials for {provider}.")
-        return
-
-    # Show entries with indices
-    for i, e in enumerate(pool.entries(), 1):
-        exhausted = _format_exhausted_status(e)
-        print(f"  #{i}  {e.label:25s} {e.auth_type:10s} {e.source}{exhausted}")
-
-    try:
-        raw = input("Remove # (or blank to cancel): ").strip()
-    except (EOFError, KeyboardInterrupt):
-        return
-    if not raw:
-        return
-
-    try:
-        index = int(raw)
-    except ValueError:
-        print("Invalid number.")
-        return
-
-    auth_remove_command(SimpleNamespace(provider=provider, index=index))
-
-
-def _interactive_reset() -> None:
-    provider = _pick_provider("Provider to reset cooldowns for")
-
-    auth_reset_command(SimpleNamespace(provider=provider))
-
-
-def _interactive_strategy() -> None:
-    provider = _pick_provider("Provider to set strategy for")
-    current = get_pool_strategy(provider)
-    strategies = [STRATEGY_FILL_FIRST, STRATEGY_ROUND_ROBIN, STRATEGY_LEAST_USED, STRATEGY_RANDOM]
-
-    print(f"\nCurrent strategy for {provider}: {current}")
-    print()
-    descriptions = {
-        STRATEGY_FILL_FIRST: "Use first key until exhausted, then next",
-        STRATEGY_ROUND_ROBIN: "Cycle through keys evenly",
-        STRATEGY_LEAST_USED: "Always pick the least-used key",
-        STRATEGY_RANDOM: "Random selection",
-    }
-    for i, s in enumerate(strategies, 1):
-        marker = " ←" if s == current else ""
-        print(f"  {i}. {s:15s} — {descriptions.get(s, '')}{marker}")
-
-    try:
-        raw = input("\nStrategy [1-4]: ").strip()
-    except (EOFError, KeyboardInterrupt):
-        return
-    if not raw:
-        return
-
-    try:
-        idx = int(raw) - 1
-        strategy = strategies[idx]
-    except (ValueError, IndexError):
-        print("Invalid choice.")
-        return
-
-    from hermes_cli.config import load_config, save_config
-    cfg = load_config()
-    pool_strategies = cfg.get("credential_pool_strategies") or {}
-    if not isinstance(pool_strategies, dict):
-        pool_strategies = {}
-    pool_strategies[provider] = strategy
-    cfg["credential_pool_strategies"] = pool_strategies
-    save_config(cfg)
-    print(f"Set {provider} strategy to: {strategy}")
-
-
-def auth_command(args) -> None:
-    action = getattr(args, "auth_action", "")
-    if action == "add":
-        auth_add_command(args)
-        return
-    if action == "list":
-        auth_list_command(args)
-        return
-    if action == "remove":
-        auth_remove_command(args)
-        return
-    if action == "reset":
-        auth_reset_command(args)
-        return
-    # No subcommand — launch interactive mode
-    _interactive_auth()
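Review note: among the helpers deleted with this file, `_format_exhausted_status` turned a cooldown deadline into a short human-readable wait string. For anyone who needs to reproduce that output elsewhere, the sketch below shows the same `divmod` breakdown in isolation. The function name `format_wait` and its single-argument signature are assumptions; the removed helper took a pool entry and read the TTL from `_exhausted_ttl`.

```python
import math


def format_wait(remaining_seconds: float) -> str:
    """Format a remaining cooldown the way the removed helper did."""
    # Round up so a fraction of a second still counts as time left.
    remaining = max(0, int(math.ceil(remaining_seconds)))
    if remaining <= 0:
        return "ready to retry"
    minutes, seconds = divmod(remaining, 60)
    hours, minutes = divmod(minutes, 60)
    if hours:
        return f"{hours}h {minutes}m left"
    if minutes:
        return f"{minutes}m {seconds}s left"
    return f"{seconds}s left"
```

Seconds are dropped once the wait reaches an hour, which matches the removed code.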
@@ -432,11 +432,10 @@ def build_welcome_banner(console: Console, model: str, cwd: str,
    try:
        behind = get_update_result(timeout=0.5)
        if behind and behind > 0:
-            from hermes_cli.config import recommended_update_command
            commits_word = "commit" if behind == 1 else "commits"
            right_lines.append(
                f"[bold yellow]⚠ {behind} {commits_word} behind[/]"
-                f"[dim yellow] — run [bold]{recommended_update_command()}[/bold] to update[/]"
+                f"[dim yellow] — run [bold]hermes update[/bold] to update[/]"
            )
    except Exception:
        pass  # Never break the banner over an update check
@@ -4,19 +4,14 @@ Usage:
    hermes claw migrate              # Interactive migration from ~/.openclaw
    hermes claw migrate --dry-run    # Preview what would be migrated
    hermes claw migrate --preset full --overwrite  # Full migration, overwrite conflicts
-    hermes claw cleanup              # Archive leftover OpenClaw directories
-    hermes claw cleanup --dry-run    # Preview what would be archived
 """

 import importlib.util
 import logging
-import shutil
 import sys
-from datetime import datetime
 from pathlib import Path

 from hermes_cli.config import get_hermes_home, get_config_path, load_config, save_config
-from hermes_constants import get_optional_skills_dir
 from hermes_cli.setup import (
    Colors,
    color,
@@ -24,7 +19,6 @@ from hermes_cli.setup import (
    print_info,
    print_success,
    print_error,
-    print_warning,
    prompt_yes_no,
 )

@@ -33,7 +27,8 @@ logger = logging.getLogger(__name__)
 PROJECT_ROOT = Path(__file__).parent.parent.resolve()

 _OPENCLAW_SCRIPT = (
-    get_optional_skills_dir(PROJECT_ROOT / "optional-skills")
+    PROJECT_ROOT
+    / "optional-skills"
    / "migration"
    / "openclaw-migration"
    / "scripts"
@@ -50,18 +45,6 @@ _OPENCLAW_SCRIPT_INSTALLED = (
    / "openclaw_to_hermes.py"
 )

-# Known OpenClaw directory names (current + legacy)
-_OPENCLAW_DIR_NAMES = (".openclaw", ".clawdbot", ".moldbot")
-
-# State files commonly found in OpenClaw workspace directories that cause
-# confusion after migration (the agent discovers them and writes to them)
-_WORKSPACE_STATE_GLOBS = (
-    "*/todo.json",
-    "*/sessions/*",
-    "*/memory/*.json",
-    "*/logs/*",
-)
-

 def _find_migration_script() -> Path | None:
    """Find the openclaw_to_hermes.py script in known locations."""
@@ -88,88 +71,19 @@ def _load_migration_module(script_path: Path):
    return mod


-def _find_openclaw_dirs() -> list[Path]:
-    """Find all OpenClaw directories on disk."""
-    found = []
-    for name in _OPENCLAW_DIR_NAMES:
-        candidate = Path.home() / name
-        if candidate.is_dir():
-            found.append(candidate)
-    return found
-
-
-def _scan_workspace_state(source_dir: Path) -> list[tuple[Path, str]]:
-    """Scan an OpenClaw directory for workspace state files that cause confusion.
-
-    Returns a list of (path, description) tuples.
-    """
-    findings: list[tuple[Path, str]] = []
-
-    # Direct state files in the root
-    for name in ("todo.json", "sessions", "logs"):
-        candidate = source_dir / name
-        if candidate.exists():
-            kind = "directory" if candidate.is_dir() else "file"
-            findings.append((candidate, f"Root {kind}: {name}"))
-
-    # State files inside workspace directories
-    for child in sorted(source_dir.iterdir()):
-        if not child.is_dir() or child.name.startswith("."):
-            continue
-        # Check for workspace-like subdirectories
-        for state_name in ("todo.json", "sessions", "logs", "memory"):
-            state_path = child / state_name
-            if state_path.exists():
-                kind = "directory" if state_path.is_dir() else "file"
-                rel = state_path.relative_to(source_dir)
-                findings.append((state_path, f"Workspace {kind}: {rel}"))
-
-    return findings
-
-
-def _archive_directory(source_dir: Path, dry_run: bool = False) -> Path:
-    """Rename an OpenClaw directory to .pre-migration.
-
-    Returns the archive path.
-    """
-    timestamp = datetime.now().strftime("%Y%m%d")
-    archive_name = f"{source_dir.name}.pre-migration"
-    archive_path = source_dir.parent / archive_name
-
-    # If archive already exists, add timestamp
-    if archive_path.exists():
-        archive_name = f"{source_dir.name}.pre-migration-{timestamp}"
-        archive_path = source_dir.parent / archive_name
-
-    # If still exists (multiple runs same day), add counter
-    counter = 2
-    while archive_path.exists():
-        archive_name = f"{source_dir.name}.pre-migration-{timestamp}-{counter}"
-        archive_path = source_dir.parent / archive_name
-        counter += 1
-
-    if not dry_run:
-        source_dir.rename(archive_path)
-
-    return archive_path
-
-
 def claw_command(args):
    """Route hermes claw subcommands."""
    action = getattr(args, "claw_action", None)

    if action == "migrate":
        _cmd_migrate(args)
-    elif action in ("cleanup", "clean"):
-        _cmd_cleanup(args)
    else:
-        print("Usage: hermes claw <command> [options]")
+        print("Usage: hermes claw migrate [options]")
        print()
        print("Commands:")
        print("  migrate          Migrate settings from OpenClaw to Hermes")
-        print("  cleanup          Archive leftover OpenClaw directories after migration")
        print()
-        print("Run 'hermes claw <command> --help' for options.")
+        print("Run 'hermes claw migrate --help' for migration options.")


 def _cmd_migrate(args):
@@ -296,168 +210,6 @@ def _cmd_migrate(args):
    # Print results
    _print_migration_report(report, dry_run)

-    # After successful non-dry-run migration, offer to archive the source directory
-    if not dry_run and report.get("summary", {}).get("migrated", 0) > 0:
-        _offer_source_archival(source_dir, getattr(args, "yes", False))
-
-
-def _offer_source_archival(source_dir: Path, auto_yes: bool = False):
-    """After migration, offer to rename the source directory to prevent state fragmentation.
-
-    OpenClaw workspace directories contain state files (todo.json, sessions, etc.)
-    that the agent may discover and write to, causing confusion. Renaming the
-    directory prevents this.
-    """
-    if not source_dir.is_dir():
-        return
-
-    # Scan for state files that could cause problems
-    state_files = _scan_workspace_state(source_dir)
-
-    print()
-    print_header("Post-Migration Cleanup")
-    print_info("The OpenClaw directory still exists and contains workspace state files")
-    print_info("that can confuse the agent (todo lists, sessions, logs).")
-    if state_files:
-        print()
-        print(color("  Found state files:", Colors.YELLOW))
-        # Show up to 10 most relevant findings
-        for path, desc in state_files[:10]:
-            print(f"      {desc}")
-        if len(state_files) > 10:
-            print(f"      ... and {len(state_files) - 10} more")
-    print()
-    print_info(f"Recommend: rename {source_dir.name}/ to {source_dir.name}.pre-migration/")
-    print_info("This prevents the agent from discovering old workspace directories.")
-    print_info("You can always rename it back if needed.")
-    print()
-
-    if auto_yes or prompt_yes_no(f"Archive {source_dir} now?", default=True):
-        try:
-            archive_path = _archive_directory(source_dir)
-            print_success(f"Archived: {source_dir} → {archive_path}")
-            print_info("The original directory has been renamed, not deleted.")
-            print_info(f"To undo: mv {archive_path} {source_dir}")
-        except OSError as e:
-            print_error(f"Could not archive: {e}")
-            print_info(f"You can do it manually: mv {source_dir} {source_dir}.pre-migration")
-    else:
-        print_info("Skipped. You can archive later with: hermes claw cleanup")
-
-
-def _cmd_cleanup(args):
-    """Archive leftover OpenClaw directories after migration.
-
-    Scans for OpenClaw directories that still exist after migration and offers
-    to rename them to .pre-migration to prevent state fragmentation.
-    """
-    dry_run = getattr(args, "dry_run", False)
-    auto_yes = getattr(args, "yes", False)
-    explicit_source = getattr(args, "source", None)
-
-    print()
-    print(
-        color(
-            "┌─────────────────────────────────────────────────────────┐",
-            Colors.MAGENTA,
-        )
-    )
-    print(
-        color(
-            "│          ⚕ Hermes — OpenClaw Cleanup                   │",
-            Colors.MAGENTA,
-        )
-    )
-    print(
-        color(
-            "└─────────────────────────────────────────────────────────┘",
-            Colors.MAGENTA,
-        )
-    )
-
-    # Find OpenClaw directories
-    if explicit_source:
-        dirs_to_check = [Path(explicit_source)]
-    else:
-        dirs_to_check = _find_openclaw_dirs()
-
-    if not dirs_to_check:
-        print()
-        print_success("No OpenClaw directories found. Nothing to clean up.")
-        return
-
-    total_archived = 0
-
-    for source_dir in dirs_to_check:
-        print()
-        print_header(f"Found: {source_dir}")
-
-        # Scan for state files
-        state_files = _scan_workspace_state(source_dir)
-
-        # Show directory stats
-        try:
-            workspace_dirs = [
-                d for d in source_dir.iterdir()
-                if d.is_dir() and not d.name.startswith(".")
-                and any((d / name).exists() for name in ("todo.json", "SOUL.md", "MEMORY.md", "USER.md"))
-            ]
-        except OSError:
-            workspace_dirs = []
-
-        if workspace_dirs:
-            print_info(f"Workspace directories: {len(workspace_dirs)}")
-            for ws in workspace_dirs[:5]:
-                items = []
-                if (ws / "todo.json").exists():
-                    items.append("todo.json")
-                if (ws / "sessions").is_dir():
-                    items.append("sessions/")
-                if (ws / "SOUL.md").exists():
-                    items.append("SOUL.md")
-                if (ws / "MEMORY.md").exists():
-                    items.append("MEMORY.md")
-                detail = ", ".join(items) if items else "empty"
-                print(f"      {ws.name}/  ({detail})")
-            if len(workspace_dirs) > 5:
-                print(f"      ... and {len(workspace_dirs) - 5} more")
-
-        if state_files:
-            print()
-            print(color(f"  {len(state_files)} state file(s) that could cause confusion:", Colors.YELLOW))
-            for path, desc in state_files[:8]:
-                print(f"      {desc}")
-            if len(state_files) > 8:
-                print(f"      ... and {len(state_files) - 8} more")
-
-        print()
-
-        if dry_run:
-            archive_path = _archive_directory(source_dir, dry_run=True)
-            print_info(f"Would archive: {source_dir} → {archive_path}")
-        else:
-            if auto_yes or prompt_yes_no(f"Archive {source_dir}?", default=True):
-                try:
-                    archive_path = _archive_directory(source_dir)
-                    print_success(f"Archived: {source_dir} → {archive_path}")
-                    total_archived += 1
-                except OSError as e:
-                    print_error(f"Could not archive: {e}")
-                    print_info(f"Try manually: mv {source_dir} {source_dir}.pre-migration")
-            else:
-                print_info("Skipped.")
-
-    # Summary
-    print()
-    if dry_run:
-        print_info(f"Dry run complete. {len(dirs_to_check)} directory(ies) would be archived.")
-        print_info("Run without --dry-run to archive them.")
-    elif total_archived:
-        print_success(f"Cleaned up {total_archived} OpenClaw directory(ies).")
-        print_info("Directories were renamed, not deleted. You can undo by renaming them back.")
-    else:
-        print_info("No directories were archived.")
-

 def _print_migration_report(report: dict, dry_run: bool):
    """Print a formatted migration report."""
@@ -1,24 +1,8 @@
 """Shared ANSI color utilities for Hermes CLI modules."""

-import os
 import sys


-def should_use_color() -> bool:
-    """Return True when colored output is appropriate.
-
-    Respects the NO_COLOR environment variable (https://no-color.org/)
-    and TERM=dumb, in addition to the existing TTY check.
-    """
-    if os.environ.get("NO_COLOR") is not None:
-        return False
-    if os.environ.get("TERM") == "dumb":
-        return False
-    if not sys.stdout.isatty():
-        return False
-    return True
-
-
 class Colors:
    RESET = "\033[0m"
    BOLD = "\033[1m"
@@ -32,7 +16,7 @@ class Colors:


 def color(text: str, *codes) -> str:
-    """Apply color codes to text (only when color output is appropriate)."""
-    if not should_use_color():
+    """Apply color codes to text (only when output is a TTY)."""
+    if not sys.stdout.isatty():
        return text
    return "".join(codes) + text + Colors.RESET
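
Note on the hunk above: after this change `color()` only checks `sys.stdout.isatty()`, so the `NO_COLOR` and `TERM=dumb` handling from the removed `should_use_color()` no longer applies. A minimal standalone sketch of that removed check follows. The `env` and `stream` parameters are hypothetical additions for testability and are not part of the original helper.

```python
import os
import sys
from typing import Mapping, Optional, TextIO


def should_use_color(
    env: Optional[Mapping[str, str]] = None,
    stream: Optional[TextIO] = None,
) -> bool:
    """Return True when ANSI color output looks appropriate.

    Mirrors the removed helper; `env` and `stream` are illustrative
    parameters (assumptions) so the check can run without a real terminal.
    """
    env = os.environ if env is None else env
    stream = sys.stdout if stream is None else stream
    # Like the removed code, any set value of NO_COLOR (even empty) disables color.
    if env.get("NO_COLOR") is not None:
        return False
    # Dumb terminals do not interpret escape sequences.
    if env.get("TERM") == "dumb":
        return False
    # Fall back to the TTY check that the post-change color() still performs.
    isatty = getattr(stream, "isatty", None)
    return bool(isatty and isatty())
```

With this shape, `color()` could call `should_use_color()` with no arguments and behave as the removed lines did.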
@@ -67,13 +67,10 @@ COMMAND_REGISTRY: list[CommandDef] = [
               gateway_only=True),
    CommandDef("background", "Run a prompt in the background", "Session",
               aliases=("bg",), args_hint="<prompt>"),
-    CommandDef("btw", "Ephemeral side question using session context (no tools, not persisted)", "Session",
-               args_hint="<question>"),
    CommandDef("queue", "Queue a prompt for the next turn (doesn't interrupt)", "Session",
               aliases=("q",), args_hint="<prompt>"),
    CommandDef("status", "Show session info", "Session",
               gateway_only=True),
-    CommandDef("profile", "Show active profile name and home directory", "Info"),
    CommandDef("sethome", "Set this chat as the home channel", "Session",
               gateway_only=True, aliases=("set-home",)),
    CommandDef("resume", "Resume a previously-named session", "Session",
@@ -93,8 +90,6 @@ COMMAND_REGISTRY: list[CommandDef] = [
    CommandDef("verbose", "Cycle tool progress display: off -> new -> all -> verbose",
               "Configuration", cli_only=True,
               gateway_config_gate="display.tool_progress_command"),
-    CommandDef("yolo", "Toggle YOLO mode (skip all dangerous command approvals)",
-               "Configuration"),
    CommandDef("reasoning", "Manage reasoning effort and display", "Configuration",
               args_hint="[level|show|hide]",
               subcommands=("none", "low", "minimal", "medium", "high", "xhigh", "show", "hide", "on", "off")),
@@ -123,8 +118,6 @@ COMMAND_REGISTRY: list[CommandDef] = [
               "Tools & Skills", cli_only=True),

    # Info
-    CommandDef("commands", "Browse all commands and skills (paginated)", "Info",
-               gateway_only=True, args_hint="[page]"),
    CommandDef("help", "Show available commands", "Info"),
    CommandDef("usage", "Show token usage for the current session", "Info"),
    CommandDef("insights", "Show usage insights and analytics", "Info",
@@ -368,117 +361,6 @@ def telegram_bot_commands() -> list[tuple[str, str]]:
    return result


-_TG_NAME_LIMIT = 32
-
-
-def _clamp_telegram_names(
-    entries: list[tuple[str, str]],
-    reserved: set[str],
-) -> list[tuple[str, str]]:
-    """Enforce Telegram's 32-char command name limit with collision avoidance.
-
-    Names exceeding 32 chars are truncated.  If truncation creates a duplicate
-    (against *reserved* names or earlier entries in the same batch), the name is
-    shortened to 31 chars and a digit ``0``-``9`` is appended to differentiate.
-    If all 10 digit slots are taken the entry is silently dropped.
-    """
-    used: set[str] = set(reserved)
-    result: list[tuple[str, str]] = []
-    for name, desc in entries:
-        if len(name) > _TG_NAME_LIMIT:
-            candidate = name[:_TG_NAME_LIMIT]
-            if candidate in used:
-                prefix = name[:_TG_NAME_LIMIT - 1]
-                for digit in range(10):
-                    candidate = f"{prefix}{digit}"
-                    if candidate not in used:
-                        break
-                else:
-                    # All 10 digit slots exhausted — skip entry
-                    continue
-            name = candidate
-        if name in used:
-            continue
-        used.add(name)
-        result.append((name, desc))
-    return result
-
-
-def telegram_menu_commands(max_commands: int = 100) -> tuple[list[tuple[str, str]], int]:
-    """Return Telegram menu commands capped to the Bot API limit.
-
-    Priority order (higher priority = never bumped by overflow):
-      1. Core CommandDef commands (always included)
-      2. Plugin slash commands (take precedence over skills)
-      3. Built-in skill commands (fill remaining slots, alphabetical)
-
-    Skills are the only tier that gets trimmed when the cap is hit.
-    User-installed hub skills are excluded — accessible via /skills.
-
-    Returns:
-        (menu_commands, hidden_count) where hidden_count is the number of
-        skill commands omitted due to the cap.
-    """
-    core_commands = list(telegram_bot_commands())
-    # Reserve core names so plugin/skill truncation can't collide with them
-    reserved_names = {n for n, _ in core_commands}
-    all_commands = list(core_commands)
-
-    # Plugin slash commands get priority over skills
-    plugin_entries: list[tuple[str, str]] = []
-    try:
-        from hermes_cli.plugins import get_plugin_manager
-        pm = get_plugin_manager()
-        plugin_cmds = getattr(pm, "_plugin_commands", {})
-        for cmd_name in sorted(plugin_cmds):
-            tg_name = cmd_name.replace("-", "_")
-            desc = "Plugin command"
-            if len(desc) > 40:
-                desc = desc[:37] + "..."
-            plugin_entries.append((tg_name, desc))
-    except Exception:
-        pass
-
-    # Clamp plugin names to 32 chars with collision avoidance
-    plugin_entries = _clamp_telegram_names(plugin_entries, reserved_names)
-    reserved_names.update(n for n, _ in plugin_entries)
-    all_commands.extend(plugin_entries)
-
-    # Remaining slots go to built-in skill commands (not hub-installed).
-    skill_entries: list[tuple[str, str]] = []
-    try:
-        from agent.skill_commands import get_skill_commands
-        from tools.skills_tool import SKILLS_DIR
-        _skills_dir = str(SKILLS_DIR.resolve())
-        _hub_dir = str((SKILLS_DIR / ".hub").resolve())
-        skill_cmds = get_skill_commands()
-        for cmd_key in sorted(skill_cmds):
-            info = skill_cmds[cmd_key]
-            skill_path = info.get("skill_md_path", "")
-            if not skill_path.startswith(_skills_dir):
-                continue
-            if skill_path.startswith(_hub_dir):
-                continue
-            name = cmd_key.lstrip("/").replace("-", "_")
-            desc = info.get("description", "")
-            # Keep descriptions short — setMyCommands has an undocumented
-            # total payload limit.  40 chars fits 100 commands safely.
-            if len(desc) > 40:
-                desc = desc[:37] + "..."
-            skill_entries.append((name, desc))
-    except Exception:
-        pass
-
-    # Clamp skill names to 32 chars with collision avoidance
-    skill_entries = _clamp_telegram_names(skill_entries, reserved_names)
-
-    # Skills fill remaining slots — they're the only tier that gets trimmed
-    remaining_slots = max(0, max_commands - len(all_commands))
-    hidden_count = max(0, len(skill_entries) - remaining_slots)
-    all_commands.extend(skill_entries[:remaining_slots])
-    return all_commands[:max_commands], hidden_count
-
-
 def slack_subcommand_map() -> dict[str, str]:
    """Return subcommand -> /command mapping for Slack /hermes handler.

@@ -52,86 +52,26 @@ from hermes_cli.default_soul import DEFAULT_SOUL_MD
 # Managed mode (NixOS declarative config)
 # =============================================================================

-_MANAGED_TRUE_VALUES = ("true", "1", "yes")
-_MANAGED_SYSTEM_NAMES = {
-    "brew": "Homebrew",
-    "homebrew": "Homebrew",
-    "nix": "NixOS",
-    "nixos": "NixOS",
-}
-
-
-def get_managed_system() -> Optional[str]:
-    """Return the package manager owning this install, if any."""
-    raw = os.getenv("HERMES_MANAGED", "").strip()
-    if raw:
-        normalized = raw.lower()
-        if normalized in _MANAGED_TRUE_VALUES:
-            return "NixOS"
-        return _MANAGED_SYSTEM_NAMES.get(normalized, raw)
-
-    managed_marker = get_hermes_home() / ".managed"
-    if managed_marker.exists():
-        return "NixOS"
-    return None
-
-
 def is_managed() -> bool:
-    """Check if Hermes is running in package-manager-managed mode.
+    """Check if hermes is running in Nix-managed mode.

    Two signals: the HERMES_MANAGED env var (set by the systemd service),
    or a .managed marker file in HERMES_HOME (set by the NixOS activation
    script, so interactive shells also see it).
    """
-    return get_managed_system() is not None
-
-
-def get_managed_update_command() -> Optional[str]:
-    """Return the preferred upgrade command for a managed install."""
-    managed_system = get_managed_system()
-    if managed_system == "Homebrew":
-        return "brew upgrade hermes-agent"
-    if managed_system == "NixOS":
-        return "sudo nixos-rebuild switch"
-    return None
-
-
-def recommended_update_command() -> str:
-    """Return the best update command for the current installation."""
-    return get_managed_update_command() or "hermes update"
-
-
-def format_managed_message(action: str = "modify this Hermes installation") -> str:
-    """Build a user-facing error for managed installs."""
-    managed_system = get_managed_system() or "a package manager"
-    raw = os.getenv("HERMES_MANAGED", "").strip().lower()
-
-    if managed_system == "NixOS":
-        env_hint = "true" if raw in _MANAGED_TRUE_VALUES else raw or "true"
-        return (
-            f"Cannot {action}: this Hermes installation is managed by NixOS "
-            f"(HERMES_MANAGED={env_hint}).\n"
-            "Edit services.hermes-agent.settings in your configuration.nix and run:\n"
-            "  sudo nixos-rebuild switch"
-        )
-
-    if managed_system == "Homebrew":
-        env_hint = raw or "homebrew"
-        return (
-            f"Cannot {action}: this Hermes installation is managed by Homebrew "
-            f"(HERMES_MANAGED={env_hint}).\n"
-            "Use:\n"
-            "  brew upgrade hermes-agent"
-        )
-
-    return (
-        f"Cannot {action}: this Hermes installation is managed by {managed_system}.\n"
-        "Use your package manager to upgrade or reinstall Hermes."
-    )
+    if os.getenv("HERMES_MANAGED", "").lower() in ("true", "1", "yes"):
+        return True
+    managed_marker = get_hermes_home() / ".managed"
+    return managed_marker.exists()

 def managed_error(action: str = "modify configuration"):
    """Print user-friendly error for managed mode."""
-    print(format_managed_message(action), file=sys.stderr)
+    print(
+        f"Cannot {action}: configuration is managed by NixOS (HERMES_MANAGED=true).\n"
+        "Edit services.hermes-agent.settings in your configuration.nix and run:\n"
+        "  sudo nixos-rebuild switch",
+        file=sys.stderr,
+    )


 # =============================================================================
@@ -198,7 +138,6 @@ def ensure_hermes_home():
 DEFAULT_CONFIG = {
    "model": "anthropic/claude-opus-4.6",
    "fallback_providers": [],
-    "credential_pool_strategies": {},
    "toolsets": ["hermes-cli"],
    "agent": {
        "max_turns": 90,
@@ -246,14 +185,6 @@ DEFAULT_CONFIG = {
        "inactivity_timeout": 120,
        "command_timeout": 30,  # Timeout for browser commands in seconds (screenshot, navigate, etc.)
        "record_sessions": False,  # Auto-record browser sessions as WebM videos
-        "allow_private_urls": False,  # Allow navigating to private/internal IPs (localhost, 192.168.x.x, etc.)
-        "camofox": {
-            # When true, Hermes sends a stable profile-scoped userId to Camofox
-            # so the server can map it to a persistent browser profile directory.
-            # Requires Camofox server to be configured with CAMOFOX_PROFILE_DIR.
-            # When false (default), each session gets a random userId (ephemeral).
-            "managed_persistence": False,
-        },
    },

    # Filesystem checkpoints — automatic snapshots before destructive file ops.
@@ -263,11 +194,6 @@ DEFAULT_CONFIG = {
        "enabled": True,
        "max_snapshots": 50,  # Max checkpoints to keep per directory
    },
-
-    # Maximum characters returned by a single read_file call.  Reads that
-    # exceed this are rejected with guidance to use offset+limit.
-    # 100K chars ≈ 25–35K tokens across typical tokenisers.
-    "file_read_max_chars": 100_000,
    
    "compression": {
        "enabled": True,
@@ -359,7 +285,6 @@ DEFAULT_CONFIG = {
        "bell_on_complete": False,
        "show_reasoning": False,
        "streaming": False,
-        "inline_diffs": True,     # Show inline diff previews for write actions (write_file, patch, skill_manage)
        "show_cost": False,       # Show $ cost in the status bar (off by default)
        "skin": "default",
        "tool_progress_command": False,  # Enable /verbose command in messaging gateway
@@ -467,7 +392,6 @@ DEFAULT_CONFIG = {
        "require_mention": True,       # Require @mention to respond in server channels
        "free_response_channels": "",  # Comma-separated channel IDs where bot responds without mention
        "auto_thread": True,           # Auto-create threads on @mention in channels (like Slack)
-        "reactions": True,             # Add 👀/✅/❌ reactions to messages during processing
    },

    # WhatsApp platform settings (gateway mode)
@@ -517,7 +441,7 @@ DEFAULT_CONFIG = {
    },

    # Config schema version - bump this when adding new required fields
-    "_config_version": 11,
+    "_config_version": 10,
 }

 # =============================================================================
@@ -782,14 +706,6 @@ OPTIONAL_ENV_VARS = {
        "password": True,
        "category": "tool",
    },
-    "CAMOFOX_URL": {
-        "description": "Camofox browser server URL for local anti-detection browsing (e.g. http://localhost:9377)",
-        "prompt": "Camofox server URL",
-        "url": "https://github.com/jo-inc/camofox-browser",
-        "tools": ["browser_navigate", "browser_click"],
-        "password": False,
-        "category": "tool",
-    },
    "FAL_KEY": {
        "description": "FAL API key for image generation",
        "prompt": "FAL API key",
@@ -1381,36 +1297,6 @@ def _expand_env_vars(obj):
    return obj


-def _normalize_root_model_keys(config: Dict[str, Any]) -> Dict[str, Any]:
-    """Move stale root-level provider/base_url into model section.
-
-    Some users (or older code) placed ``provider:`` and ``base_url:`` at the
-    config root instead of inside ``model:``.  These root-level keys are only
-    used as a fallback when the corresponding ``model.*`` key is empty — they
-    never override an existing ``model.provider`` or ``model.base_url``.
-    After migration the root-level keys are removed so they can't cause
-    confusion on subsequent loads.
-    """
-    # Only act if there are root-level keys to migrate
-    has_root = any(config.get(k) for k in ("provider", "base_url"))
-    if not has_root:
-        return config
-
-    config = dict(config)
-    model = config.get("model")
-    if not isinstance(model, dict):
-        model = {"default": model} if model else {}
-        config["model"] = model
-
-    for key in ("provider", "base_url"):
-        root_val = config.get(key)
-        if root_val and not model.get(key):
-            model[key] = root_val
-        config.pop(key, None)
-
-    return config
-
-
 def _normalize_max_turns_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Normalize legacy root-level max_turns into agent.max_turns."""
    config = dict(config)
@@ -1452,7 +1338,7 @@ def load_config() -> Dict[str, Any]:
        except Exception as e:
            print(f"Warning: Failed to load config: {e}")
    
-    return _expand_env_vars(_normalize_root_model_keys(_normalize_max_turns_config(config)))
+    return _expand_env_vars(_normalize_max_turns_config(config))


 _SECURITY_COMMENT = """
@@ -1559,7 +1445,7 @@ def save_config(config: Dict[str, Any]):

    ensure_hermes_home()
    config_path = get_config_path()
-    normalized = _normalize_root_model_keys(_normalize_max_turns_config(config))
+    normalized = _normalize_max_turns_config(config)

    # Build optional commented-out sections for features that are off by
    # default or only relevant when explicitly configured.
@@ -2083,7 +1969,7 @@ def config_command(args):
    elif subcmd == "set":
        key = getattr(args, 'key', None)
        value = getattr(args, 'value', None)
-        if not key or value is None:
+        if not key or not value:
            print("Usage: hermes config set <key> <value>")
            print()
            print("Examples:")
@@ -56,7 +56,7 @@ def cron_list(show_all: bool = False):
    print()

    for job in jobs:
-        job_id = job.get("id", "?")
+        job_id = job.get("id", "?")[:8]
        name = job.get("name", "(unnamed)")
        schedule = job.get("schedule_display", job.get("schedule", {}).get("value", "?"))
        state = job.get("state", "scheduled" if job.get("enabled", True) else "paused")
@@ -406,11 +406,8 @@ def run_doctor(args):
    if terminal_env == "docker":
        if shutil.which("docker"):
            # Check if docker daemon is running
-            try:
-                result = subprocess.run(["docker", "info"], capture_output=True, timeout=10)
-            except subprocess.TimeoutExpired:
-                result = None
-            if result is not None and result.returncode == 0:
+            result = subprocess.run(["docker", "info"], capture_output=True)
+            if result.returncode == 0:
                check_ok("docker", "(daemon running)")
            else:
                check_fail("docker daemon not running")
@@ -429,16 +426,12 @@ def run_doctor(args):
        ssh_host = os.getenv("TERMINAL_SSH_HOST")
        if ssh_host:
            # Try to connect
-            try:
-                result = subprocess.run(
-                    ["ssh", "-o", "ConnectTimeout=5", "-o", "BatchMode=yes", ssh_host, "echo ok"],
-                    capture_output=True,
-                    text=True,
-                    timeout=15
-                )
-            except subprocess.TimeoutExpired:
-                result = None
-            if result is not None and result.returncode == 0:
+            result = subprocess.run(
+                ["ssh", "-o", "ConnectTimeout=5", "-o", "BatchMode=yes", ssh_host, "echo ok"],
+                capture_output=True,
+                text=True
+            )
+            if result.returncode == 0:
                check_ok(f"SSH connection to {ssh_host}")
            else:
                check_fail(f"SSH connection to {ssh_host}")
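
Note on the two hunks above: the `+` side calls `subprocess.run` without `timeout=`, so a hung `docker info` or `ssh` would block `run_doctor` indefinitely, whereas the removed lines treated a timeout as a failed check. A minimal sketch of that pattern as one helper follows. The name `run_probe` is hypothetical and does not exist in the codebase.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_probe(cmd: Sequence[str], timeout: float) -> Optional[subprocess.CompletedProcess]:
    """Run a diagnostic command, returning None if it exceeds `timeout` seconds.

    Hypothetical helper: it folds the removed try/except blocks into one place.
    """
    try:
        return subprocess.run(list(cmd), capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left running.
        return None


if __name__ == "__main__":
    # Stand-in command; the doctor checks would pass ["docker", "info"] or the ssh argv.
    result = run_probe([sys.executable, "-c", "print('ok')"], timeout=10)
    healthy = result is not None and result.returncode == 0
    print("check passed" if healthy else "check failed")
```

A caller would keep the `result is not None and result.returncode == 0` test from the removed lines, so a timeout and a non-zero exit are reported the same way.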
@@ -463,32 +463,6 @@ def _build_user_local_paths(home: Path, path_entries: list[str]) -> list[str]:
    return [p for p in candidates if p not in path_entries and Path(p).exists()]


-def _hermes_home_for_target_user(target_home_dir: str) -> str:
-    """Remap the current HERMES_HOME to the equivalent under a target user's home.
-
-    When installing a system service via sudo, get_hermes_home() resolves to
-    root's home.  This translates it to the target user's equivalent path:
-      /root/.hermes                    → /home/alice/.hermes
-      /root/.hermes/profiles/coder     → /home/alice/.hermes/profiles/coder
-      /opt/custom-hermes               → /opt/custom-hermes  (kept as-is)
-    """
-    current_hermes = get_hermes_home().resolve()
-    current_default = (Path.home() / ".hermes").resolve()
-    target_default = Path(target_home_dir) / ".hermes"
-
-    # Default ~/.hermes → remap to target user's default
-    if current_hermes == current_default:
-        return str(target_default)
-
-    # Profile or subdir of ~/.hermes → preserve the relative structure
-    try:
-        relative = current_hermes.relative_to(current_default)
-        return str(target_default / relative)
-    except ValueError:
-        # Completely custom path (not under ~/.hermes) — keep as-is
-        return str(current_hermes)
-
-
 def generate_systemd_unit(system: bool = False, run_as_user: str | None = None) -> str:
    python_path = get_python_path()
    working_dir = str(PROJECT_ROOT)
@@ -504,11 +478,12 @@ def generate_systemd_unit(system: bool = False, run_as_user: str | None = None)
        if resolved_node_dir not in path_entries:
            path_entries.append(resolved_node_dir)

+    hermes_home = str(get_hermes_home().resolve())
+
    common_bin_paths = ["/usr/local/sbin", "/usr/local/bin", "/usr/sbin", "/usr/bin", "/sbin", "/bin"]

    if system:
        username, group_name, home_dir = _system_service_identity(run_as_user)
-        hermes_home = _hermes_home_for_target_user(home_dir)
        path_entries.extend(_build_user_local_paths(Path(home_dir), path_entries))
        path_entries.extend(common_bin_paths)
        sane_path = ":".join(path_entries)
@@ -543,7 +518,6 @@ StandardError=journal
 WantedBy=multi-user.target
 """

-    hermes_home = str(get_hermes_home().resolve())
    path_entries.extend(_build_user_local_paths(Path.home(), path_entries))
    path_entries.extend(common_bin_paths)
    sane_path = ":".join(path_entries)
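
Note on the hunks above: with `_hermes_home_for_target_user` removed, a system unit generated under sudo uses whatever `get_hermes_home()` resolves to for the invoking user. The remapping described in the removed docstring can be sketched as a pure function, as below. The name `remap_hermes_home` and its explicit string arguments are hypothetical; the original resolved real paths via `get_hermes_home()` and `Path.home()`.

```python
from pathlib import PurePosixPath


def remap_hermes_home(current_hermes: str, current_home: str, target_home: str) -> str:
    """Translate a HERMES_HOME under one user's home to another user's home.

    Illustrative only: works on path strings and does not touch the filesystem.
    """
    current = PurePosixPath(current_hermes)
    current_default = PurePosixPath(current_home) / ".hermes"
    target_default = PurePosixPath(target_home) / ".hermes"
    try:
        # Covers both ~/.hermes itself and any profile or subdirectory below it.
        relative = current.relative_to(current_default)
    except ValueError:
        # A custom location outside ~/.hermes is kept as-is.
        return str(current)
    return str(target_default / relative)


if __name__ == "__main__":
    for path in ("/root/.hermes", "/root/.hermes/profiles/coder", "/opt/custom-hermes"):
        print(path, "->", remap_hermes_home(path, "/root", "/home/alice"))
```

The three inputs in the usage loop are the cases listed in the removed docstring.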
@@ -173,25 +173,9 @@ def _relative_time(ts) -> str:

 def _has_any_provider_configured() -> bool:
    """Check if at least one inference provider is usable."""
-    from hermes_cli.config import get_env_path, get_hermes_home, load_config
+    from hermes_cli.config import get_env_path, get_hermes_home
    from hermes_cli.auth import get_auth_status

-    # Determine whether Hermes itself has been explicitly configured (model
-    # in config that isn't the hardcoded default). Used below to gate external
-    # tool credentials (Claude Code, Codex CLI) that shouldn't silently skip
-    # the setup wizard on a fresh install.
-    from hermes_cli.config import DEFAULT_CONFIG
-    _DEFAULT_MODEL = DEFAULT_CONFIG.get("model", "")
-    cfg = load_config()
-    model_cfg = cfg.get("model")
-    if isinstance(model_cfg, dict):
-        _model_name = (model_cfg.get("default") or "").strip()
-    elif isinstance(model_cfg, str):
-        _model_name = model_cfg.strip()
-    else:
-        _model_name = ""
-    _has_hermes_config = _model_name and _model_name != _DEFAULT_MODEL
-
    # Check env vars (may be set by .env or shell).
    # OPENAI_BASE_URL alone counts — local models (vLLM, llama.cpp, etc.)
    # often don't require an API key.
@@ -246,28 +230,16 @@ def _has_any_provider_configured() -> bool:
            pass


-    # Check config.yaml — if model is a dict with an explicit provider set,
-    # the user has gone through setup (fresh installs have model as a plain
-    # string).  Also covers custom endpoints that store api_key/base_url in
-    # config rather than .env.
-    if isinstance(model_cfg, dict):
-        cfg_provider = (model_cfg.get("provider") or "").strip()
-        cfg_base_url = (model_cfg.get("base_url") or "").strip()
-        cfg_api_key = (model_cfg.get("api_key") or "").strip()
-        if cfg_provider or cfg_base_url or cfg_api_key:
-            return True
-
    # Check for Claude Code OAuth credentials (~/.claude/.credentials.json)
-    # Only count these if Hermes has been explicitly configured — Claude Code
-    # being installed doesn't mean the user wants Hermes to use their tokens.
-    if _has_hermes_config:
-        try:
-            from agent.anthropic_adapter import read_claude_code_credentials, is_claude_code_token_valid
-            creds = read_claude_code_credentials()
-            if creds and (is_claude_code_token_valid(creds) or creds.get("refreshToken")):
-                return True
-        except Exception:
-            pass
+    # These are used by resolve_anthropic_token() at runtime but were missing
+    # from this startup gate check.
+    try:
+        from agent.anthropic_adapter import read_claude_code_credentials, is_claude_code_token_valid
+        creds = read_claude_code_credentials()
+        if creds and (is_claude_code_token_valid(creds) or creds.get("refreshToken")):
+            return True
+    except Exception:
+        pass

    return False

@@ -643,7 +615,6 @@ def cmd_chat(args):
        "worktree": getattr(args, "worktree", False),
        "checkpoints": getattr(args, "checkpoints", False),
        "pass_session_id": getattr(args, "pass_session_id", False),
-        "max_turns": getattr(args, "max_turns", None),
    }
    # Filter out None values
    kwargs = {k: v for k, v in kwargs.items() if v is not None}
@@ -858,17 +829,6 @@ def cmd_setup(args):
 def cmd_model(args):
    """Select default model — starts with provider selection, then model picker."""
    _require_tty("model")
-    select_provider_and_model()
-
-
-def select_provider_and_model():
-    """Core provider selection + model picking logic.
-
-    Shared by ``cmd_model`` (``hermes model``) and the setup wizard
-    (``setup_model_provider`` in setup.py).  Handles the full flow:
-    provider picker, credential prompting, model selection, and config
-    persistence.
-    """
    from hermes_cli.auth import (
        resolve_provider, AuthError, format_auth_error,
    )
@@ -898,10 +858,7 @@ def select_provider_and_model():
    except AuthError as exc:
        warning = format_auth_error(exc)
        print(f"Warning: {warning} Falling back to auto provider detection.")
-        try:
-            active = resolve_provider("auto")
-        except AuthError:
-            active = "openrouter"  # no provider yet; show full picker
+        active = resolve_provider("auto")

    # Detect custom endpoint
    if active == "openrouter" and get_env_value("OPENAI_BASE_URL"):
@@ -1093,6 +1050,10 @@ def _model_flow_openrouter(config, current_model=""):

    selected = _prompt_model_selection(openrouter_models, current_model=current_model)
    if selected:
+        # Clear any custom endpoint and set provider to openrouter
+        if get_env_value("OPENAI_BASE_URL"):
+            save_env_value("OPENAI_BASE_URL", "")
+            save_env_value("OPENAI_API_KEY", "")
        _save_model_choice(selected)

        # Update config provider and deactivate any OAuth provider
@@ -1182,6 +1143,10 @@ def _model_flow_nous(config, current_model=""):
        # Reactivate Nous as the provider and update config
        inference_url = creds.get("base_url", "")
        _update_config_for_provider("nous", inference_url)
+        # Clear any custom endpoint that might conflict
+        if get_env_value("OPENAI_BASE_URL"):
+            save_env_value("OPENAI_BASE_URL", "")
+            save_env_value("OPENAI_API_KEY", "")
        print(f"Default model set to: {selected} (via Nous Portal)")
    else:
        print("No change.")
@@ -1226,6 +1191,10 @@ def _model_flow_openai_codex(config, current_model=""):
    if selected:
        _save_model_choice(selected)
        _update_config_for_provider("openai-codex", DEFAULT_CODEX_BASE_URL)
+        # Clear custom endpoint env vars that would otherwise override Codex.
+        if get_env_value("OPENAI_BASE_URL"):
+            save_env_value("OPENAI_BASE_URL", "")
+            save_env_value("OPENAI_API_KEY", "")
        print(f"Default model set to: {selected} (via OpenAI Codex)")
    else:
        print("No change.")
@@ -1254,10 +1223,22 @@ def _model_flow_custom(config):
    try:
        base_url = input(f"API base URL [{current_url or 'e.g. https://api.example.com/v1'}]: ").strip()
        api_key = input(f"API key [{current_key[:8] + '...' if current_key else 'optional'}]: ").strip()
+        model_name = input("Model name (e.g. gpt-4, llama-3-70b): ").strip()
+        context_length_str = input("Context length in tokens [leave blank for auto-detect]: ").strip()
    except (KeyboardInterrupt, EOFError):
        print("\nCancelled.")
        return

+    context_length = None
+    if context_length_str:
+        try:
+            context_length = int(context_length_str.replace(",", "").replace("k", "000").replace("K", "000"))
+            if context_length <= 0:
+                context_length = None
+        except ValueError:
+            print(f"Invalid context length: {context_length_str} — will auto-detect.")
+            context_length = None
+
    if not base_url and not current_url:
        print("No URL provided. Cancelled.")
        return
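
Note on the hunk above: the context-length parsing relies on chained `str.replace` calls, which accept inputs such as `128k` or `32,768` but not fractional shorthand such as `1.5k`. A minimal sketch that isolates the same logic follows. The function name `parse_context_length` is hypothetical, and it returns `None` where the original prints a message and falls back to auto-detect.

```python
from typing import Optional


def parse_context_length(raw: str) -> Optional[int]:
    """Parse a user-entered context length; None means auto-detect.

    Uses the same normalization as the hunk above: drop commas and
    expand a k/K suffix to three zeros.
    """
    text = raw.strip()
    if not text:
        return None
    normalized = text.replace(",", "").replace("k", "000").replace("K", "000")
    try:
        value = int(normalized)
    except ValueError:
        # e.g. "1.5k" becomes "1.5000", which int() rejects.
        return None
    # Zero and negative values are treated as "not provided".
    return value if value > 0 else None


if __name__ == "__main__":
    for sample in ("128k", "32,768", "200K", "1.5k", "-5", ""):
        print(repr(sample), "->", parse_context_length(sample))
```

Keeping the parsing in one function would also let both the pre-probe and post-probe prompt orderings share it.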
@@ -1294,43 +1275,10 @@ def _model_flow_custom(config):
        if probe.get("suggested_base_url"):
            print(f"  If this server expects /v1, try base URL: {probe['suggested_base_url']}")

-    # Select model — use probe results when available, fall back to manual input
-    model_name = ""
-    detected_models = probe.get("models") or []
-    try:
-        if len(detected_models) == 1:
-            print(f"  Detected model: {detected_models[0]}")
-            confirm = input("  Use this model? [Y/n]: ").strip().lower()
-            if confirm in ("", "y", "yes"):
-                model_name = detected_models[0]
-            else:
-                model_name = input("Model name (e.g. gpt-4, llama-3-70b): ").strip()
-        elif len(detected_models) > 1:
-            print("  Available models:")
-            for i, m in enumerate(detected_models, 1):
-                print(f"    {i}. {m}")
-            pick = input(f"  Select model [1-{len(detected_models)}] or type name: ").strip()
-            if pick.isdigit() and 1 <= int(pick) <= len(detected_models):
-                model_name = detected_models[int(pick) - 1]
-            elif pick:
-                model_name = pick
-        else:
-            model_name = input("Model name (e.g. gpt-4, llama-3-70b): ").strip()
-
-        context_length_str = input("Context length in tokens [leave blank for auto-detect]: ").strip()
-    except (KeyboardInterrupt, EOFError):
-        print("\nCancelled.")
-        return
-
-    context_length = None
-    if context_length_str:
-        try:
-            context_length = int(context_length_str.replace(",", "").replace("k", "000").replace("K", "000"))
-            if context_length <= 0:
-                context_length = None
-        except ValueError:
-            print(f"Invalid context length: {context_length_str} — will auto-detect.")
-            context_length = None
+    if base_url:
+        save_env_value("OPENAI_BASE_URL", effective_url)
+    if api_key:
+        save_env_value("OPENAI_API_KEY", api_key)

    if model_name:
        _save_model_choice(model_name)
@@ -1343,33 +1291,14 @@ def _model_flow_custom(config):
            cfg["model"] = model
        model["provider"] = "custom"
        model["base_url"] = effective_url
-        if effective_key:
-            model["api_key"] = effective_key
        model.pop("api_mode", None)  # let runtime auto-detect from URL
        save_config(cfg)
        deactivate_provider()

-        # Sync the caller's config dict so the setup wizard's final
-        # save_config(config) preserves our model settings.  Without
-        # this, the wizard overwrites model.provider/base_url with
-        # the stale values from its own config dict (#4172).
-        config["model"] = dict(model)
-
        print(f"Default model set to: {model_name} (via {effective_url})")
    else:
        if base_url or api_key:
            deactivate_provider()
-        # Even without a model name, persist the custom endpoint on the
-        # caller's config dict so the setup wizard doesn't lose it.
-        _caller_model = config.get("model")
-        if not isinstance(_caller_model, dict):
-            _caller_model = {"default": _caller_model} if _caller_model else {}
-        _caller_model["provider"] = "custom"
-        _caller_model["base_url"] = effective_url
-        if effective_key:
-            _caller_model["api_key"] = effective_key
-        _caller_model.pop("api_mode", None)
-        config["model"] = _caller_model
        print("Endpoint saved. Use `/model` in chat or `hermes model` to set a model.")

    # Auto-save to custom_providers so it appears in the menu next time
@@ -1510,6 +1439,9 @@ def _model_flow_named_custom(config, provider_info):

    # If a model is saved, just activate immediately — no probing needed
    if saved_model:
+        save_env_value("OPENAI_BASE_URL", base_url)
+        if api_key:
+            save_env_value("OPENAI_API_KEY", api_key)
        _save_model_choice(saved_model)

        cfg = load_config()
@@ -1519,8 +1451,6 @@ def _model_flow_named_custom(config, provider_info):
            cfg["model"] = model
        model["provider"] = "custom"
        model["base_url"] = base_url
-        if api_key:
-            model["api_key"] = api_key
        save_config(cfg)
        deactivate_provider()

@@ -1583,6 +1513,9 @@ def _model_flow_named_custom(config, provider_info):
            return

    # Activate and save the model to the custom_providers entry
+    save_env_value("OPENAI_BASE_URL", base_url)
+    if api_key:
+        save_env_value("OPENAI_API_KEY", api_key)
    _save_model_choice(model_name)

    cfg = load_config()
@@ -1592,8 +1525,6 @@ def _model_flow_named_custom(config, provider_info):
        cfg["model"] = model
    model["provider"] = "custom"
    model["base_url"] = base_url
-    if api_key:
-        model["api_key"] = api_key
    save_config(cfg)
    deactivate_provider()

@@ -1646,15 +1577,11 @@ _PROVIDER_MODELS = {
        "kimi-k2-0905-preview",
    ],
    "minimax": [
-        "MiniMax-M2.7",
-        "MiniMax-M2.7-highspeed",
        "MiniMax-M2.5",
        "MiniMax-M2.5-highspeed",
        "MiniMax-M2.1",
    ],
    "minimax-cn": [
-        "MiniMax-M2.7",
-        "MiniMax-M2.7-highspeed",
        "MiniMax-M2.5",
        "MiniMax-M2.5-highspeed",
        "MiniMax-M2.1",
@@ -1902,6 +1829,11 @@ def _model_flow_copilot(config, current_model=""):
            catalog=catalog,
            api_key=api_key,
        ) or selected
+        # Clear stale custom-endpoint overrides so the Copilot provider wins cleanly.
+        if get_env_value("OPENAI_BASE_URL"):
+            save_env_value("OPENAI_BASE_URL", "")
+            save_env_value("OPENAI_API_KEY", "")
+
        initial_cfg = load_config()
        current_effort = _current_reasoning_effort(initial_cfg)
        reasoning_efforts = github_model_reasoning_efforts(
@@ -2126,6 +2058,11 @@ def _model_flow_kimi(config, current_model=""):
            selected = None

    if selected:
+        # Clear custom endpoint if set (avoid confusion)
+        if get_env_value("OPENAI_BASE_URL"):
+            save_env_value("OPENAI_BASE_URL", "")
+            save_env_value("OPENAI_API_KEY", "")
+
        _save_model_choice(selected)

        # Update config with provider and base URL
@@ -2228,6 +2165,11 @@ def _model_flow_api_key_provider(config, provider_id, current_model=""):
            selected = None

    if selected:
+        # Clear custom endpoint if set (avoid confusion)
+        if get_env_value("OPENAI_BASE_URL"):
+            save_env_value("OPENAI_BASE_URL", "")
+            save_env_value("OPENAI_API_KEY", "")
+
        _save_model_choice(selected)

        # Update config with provider and base URL
@@ -2439,6 +2381,11 @@ def _model_flow_anthropic(config, current_model=""):
            selected = None

    if selected:
+        # Clear custom endpoint if set
+        if get_env_value("OPENAI_BASE_URL"):
+            save_env_value("OPENAI_BASE_URL", "")
+            save_env_value("OPENAI_API_KEY", "")
+
        _save_model_choice(selected)

        # Update config with provider — clear base_url since
@@ -2472,12 +2419,6 @@ def cmd_logout(args):
    logout_command(args)


-def cmd_auth(args):
-    """Manage pooled credentials."""
-    from hermes_cli.auth_commands import auth_command
-    auth_command(args)
-
-
 def cmd_status(args):
    """Show status of all components."""
    from hermes_cli.status import show_status
@@ -2526,14 +2467,10 @@ def cmd_version(args):
    # Show update status (synchronous — acceptable since user asked for version info)
    try:
        from hermes_cli.banner import check_for_updates
-        from hermes_cli.config import recommended_update_command
        behind = check_for_updates()
        if behind and behind > 0:
            commits_word = "commit" if behind == 1 else "commits"
-            print(
-                f"Update available: {behind} {commits_word} behind — "
-                f"run '{recommended_update_command()}'"
-            )
+            print(f"Update available: {behind} {commits_word} behind — run 'hermes update'")
        elif behind == 0:
            print("Up to date")
    except Exception:
@@ -2884,11 +2821,6 @@ def _invalidate_update_cache():
 def cmd_update(args):
    """Update Hermes Agent to the latest version."""
    import shutil
-    from hermes_cli.config import is_managed, managed_error
-
-    if is_managed():
-        managed_error("update Hermes Agent")
-        return
    
    print("⚕ Updating Hermes Agent...")
    print()
@@ -3224,7 +3156,6 @@ def cmd_update(args):
            _gw_service_name = get_service_name()
            existing_pid = get_running_pid()
            has_systemd_service = False
-            has_system_service = False
            has_launchd_service = False

            try:
@@ -3237,19 +3168,6 @@ def cmd_update(args):
            except (FileNotFoundError, subprocess.TimeoutExpired):
                pass

-            # Also check for a system-level service (hermes gateway install --system).
-            # This covers gateways running under system systemd where --user
-            # fails due to missing D-Bus session.
-            if not has_systemd_service and is_linux():
-                try:
-                    check = subprocess.run(
-                        ["systemctl", "is-active", _gw_service_name],
-                        capture_output=True, text=True, timeout=5,
-                    )
-                    has_system_service = check.stdout.strip() == "active"
-                except (FileNotFoundError, subprocess.TimeoutExpired):
-                    pass
-
            # Check for macOS launchd service
            if is_macos():
                try:
@@ -3264,7 +3182,7 @@ def cmd_update(args):
                except (FileNotFoundError, subprocess.TimeoutExpired):
                    pass

-            if existing_pid or has_systemd_service or has_system_service or has_launchd_service:
+            if existing_pid or has_systemd_service or has_launchd_service:
                print()

                # When a service manager is handling the gateway, let it
@@ -3305,21 +3223,6 @@ def cmd_update(args):
                                print("    hermes gateway restart")
                            else:
                                print("  Try manually: hermes gateway restart")
-                elif has_system_service:
-                    # System-level service (hermes gateway install --system).
-                    # No D-Bus session needed — systemctl without --user talks
-                    # directly to the system manager over /run/systemd/private.
-                    print("→ Restarting system gateway service...")
-                    restart = subprocess.run(
-                        ["systemctl", "restart", _gw_service_name],
-                        capture_output=True, text=True, timeout=15,
-                    )
-                    if restart.returncode == 0:
-                        print("✓ Gateway restarted (system service).")
-                    else:
-                        print(f"⚠ Gateway restart failed: {restart.stderr.strip()}")
-                        print("  System services may require root.  Try:")
-                        print(f"    sudo systemctl restart {_gw_service_name}")
                elif has_launchd_service:
                    # Refresh the plist first (picks up --replace and other
                    # changes from the update we just pulled).
@@ -3383,7 +3286,7 @@ def _coalesce_session_name_args(argv: list) -> list:
    or a known top-level subcommand.
    """
    _SUBCOMMANDS = {
-        "chat", "model", "gateway", "setup", "whatsapp", "login", "logout", "auth",
+        "chat", "model", "gateway", "setup", "whatsapp", "login", "logout",
        "status", "cron", "doctor", "config", "pairing", "skills", "tools",
        "mcp", "sessions", "insights", "version", "update", "uninstall",
        "profile",
@@ -3672,10 +3575,6 @@ Examples:
    hermes --resume <session_id>  Resume a specific session by ID
    hermes setup                  Run setup wizard
    hermes logout                 Clear stored authentication
-    hermes auth add <provider>    Add a pooled credential
-    hermes auth list              List pooled credentials
-    hermes auth remove <p> <n>    Remove pooled credential by index
-    hermes auth reset <provider>  Clear exhaustion status for a provider
    hermes model                  Select default model
    hermes config                 View configuration
    hermes config edit            Edit config in $EDITOR
@@ -3809,13 +3708,6 @@ For more help on a command:
        default=False,
        help="Enable filesystem checkpoints before destructive file operations (use /rollback to restore)"
    )
-    chat_parser.add_argument(
-        "--max-turns",
-        type=int,
-        default=None,
-        metavar="N",
-        help="Maximum tool-calling iterations per conversation turn (default: 90, or agent.max_turns in config)"
-    )
    chat_parser.add_argument(
        "--yolo",
        action="store_true",
@@ -4001,33 +3893,6 @@ For more help on a command:
    )
    logout_parser.set_defaults(func=cmd_logout)

-    auth_parser = subparsers.add_parser(
-        "auth",
-        help="Manage pooled provider credentials",
-    )
-    auth_subparsers = auth_parser.add_subparsers(dest="auth_action")
-    auth_add = auth_subparsers.add_parser("add", help="Add a pooled credential")
-    auth_add.add_argument("provider", help="Provider id (for example: anthropic, openai-codex, openrouter)")
-    auth_add.add_argument("--type", dest="auth_type", choices=["oauth", "api-key", "api_key"], help="Credential type to add")
-    auth_add.add_argument("--label", help="Optional display label")
-    auth_add.add_argument("--api-key", help="API key value (otherwise prompted securely)")
-    auth_add.add_argument("--portal-url", help="Nous portal base URL")
-    auth_add.add_argument("--inference-url", help="Nous inference base URL")
-    auth_add.add_argument("--client-id", help="OAuth client id")
-    auth_add.add_argument("--scope", help="OAuth scope override")
-    auth_add.add_argument("--no-browser", action="store_true", help="Do not auto-open a browser for OAuth login")
-    auth_add.add_argument("--timeout", type=float, help="OAuth/network timeout in seconds")
-    auth_add.add_argument("--insecure", action="store_true", help="Disable TLS verification for OAuth login")
-    auth_add.add_argument("--ca-bundle", help="Custom CA bundle for OAuth login")
-    auth_list = auth_subparsers.add_parser("list", help="List pooled credentials")
-    auth_list.add_argument("provider", nargs="?", help="Optional provider filter")
-    auth_remove = auth_subparsers.add_parser("remove", help="Remove a pooled credential by index")
-    auth_remove.add_argument("provider", help="Provider id")
-    auth_remove.add_argument("index", type=int, help="1-based credential index")
-    auth_reset = auth_subparsers.add_parser("reset", help="Clear exhaustion status for all credentials for a provider")
-    auth_reset.add_argument("provider", help="Provider id")
-    auth_parser.set_defaults(func=cmd_auth)
-
    # =========================================================================
    # status command
    # =========================================================================
@@ -4838,28 +4703,6 @@ For more help on a command:
        help="Skip confirmation prompts"
    )

-    # claw cleanup
-    claw_cleanup = claw_subparsers.add_parser(
-        "cleanup",
-        aliases=["clean"],
-        help="Archive leftover OpenClaw directories after migration",
-        description="Scan for and archive leftover OpenClaw directories to prevent state fragmentation"
-    )
-    claw_cleanup.add_argument(
-        "--source",
-        help="Path to a specific OpenClaw directory to clean up"
-    )
-    claw_cleanup.add_argument(
-        "--dry-run",
-        action="store_true",
-        help="Preview what would be archived without making changes"
-    )
-    claw_cleanup.add_argument(
-        "--yes", "-y",
-        action="store_true",
-        help="Skip confirmation prompts"
-    )
-
    def cmd_claw(args):
        from hermes_cli.claw import claw_command
        claw_command(args)
@@ -27,8 +27,6 @@ GITHUB_MODELS_CATALOG_URL = COPILOT_MODELS_URL
 # (model_id, display description shown in menus)
 OPENROUTER_MODELS: list[tuple[str, str]] = [
    ("anthropic/claude-opus-4.6",       "recommended"),
-    ("anthropic/claude-sonnet-4.6",     ""),
-    ("qwen/qwen3.6-plus-preview:free", "free"),
    ("anthropic/claude-sonnet-4.5",     ""),
    ("anthropic/claude-haiku-4.5",      ""),
    ("openai/gpt-5.4",                  ""),
@@ -58,8 +56,6 @@ OPENROUTER_MODELS: list[tuple[str, str]] = [
 _PROVIDER_MODELS: dict[str, list[str]] = {
    "nous": [
        "anthropic/claude-opus-4.6",
-        "anthropic/claude-sonnet-4.6",
-        "qwen/qwen3.6-plus-preview:free",
        "anthropic/claude-sonnet-4.5",
        "anthropic/claude-haiku-4.5",
        "openai/gpt-5.4",
@@ -193,7 +189,7 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
    "opencode-go": [
        "glm-5",
        "kimi-k2.5",
-        "minimax-m2.7",
+        "minimax-m2.5",
    ],
    "ai-gateway": [
        "anthropic/claude-opus-4.6",
@@ -351,7 +347,7 @@ def list_available_providers() -> list[dict[str, str]]:
        try:
            from hermes_cli.auth import get_auth_status, has_usable_secret
            if pid == "custom":
-                custom_base_url = _get_custom_base_url() or ""
+                custom_base_url = _get_custom_base_url() or os.getenv("OPENAI_BASE_URL", "")
                has_creds = bool(custom_base_url.strip())
            elif pid == "openrouter":
                has_creds = has_usable_secret(os.getenv("OPENROUTER_API_KEY", ""))
@@ -265,11 +265,10 @@ def cmd_install(identifier: str, force: bool = False) -> None:
                )
                sys.exit(1)
            if mv_int > _SUPPORTED_MANIFEST_VERSION:
-                from hermes_cli.config import recommended_update_command
                console.print(
                    f"[red]Error:[/red] Plugin '{plugin_name}' requires manifest_version "
                    f"{mv}, but this installer only supports up to {_SUPPORTED_MANIFEST_VERSION}.\n"
-                    f"Run [bold]{recommended_update_command()}[/bold] to get a newer installer."
+                    f"Run [bold]hermes update[/bold] to get a newer installer."
                )
                sys.exit(1)

@@ -27,7 +27,7 @@ import stat
 import subprocess
 import sys
 from dataclasses import dataclass, field
-from pathlib import Path, PurePosixPath, PureWindowsPath
+from pathlib import Path
 from typing import List, Optional

 _PROFILE_ID_RE = re.compile(r"^[a-z0-9][a-z0-9_-]{0,63}$")
@@ -58,32 +58,6 @@ _CLONE_ALL_STRIP = [
    "processes.json",
 ]

-# Directories/files to exclude when exporting the default (~/.hermes) profile.
-# The default profile contains infrastructure (repo checkout, worktrees, DBs,
-# caches, binaries) that named profiles don't have.  We exclude those so the
-# export is a portable, reasonable-size archive of actual profile data.
-_DEFAULT_EXPORT_EXCLUDE_ROOT = frozenset({
-    # Infrastructure
-    "hermes-agent",         # repo checkout (multi-GB)
-    ".worktrees",           # git worktrees
-    "profiles",             # other profiles — never recursive-export
-    "bin",                  # installed binaries (tirith, etc.)
-    "node_modules",         # npm packages
-    # Databases & runtime state
-    "state.db", "state.db-shm", "state.db-wal",
-    "hermes_state.db",
-    "response_store.db", "response_store.db-shm", "response_store.db-wal",
-    "gateway.pid", "gateway_state.json", "processes.json",
-    "auth.lock", "active_profile", ".update_check",
-    "errors.log",
-    ".hermes_history",
-    # Caches (regenerated on use)
-    "image_cache", "audio_cache", "document_cache",
-    "browser_screenshots", "checkpoints",
-    "sandboxes",
-    "logs",                 # gateway logs
-})
-
 # Names that cannot be used as profile aliases
 _RESERVED_NAMES = frozenset({
    "hermes", "default", "test", "tmp", "root", "sudo",
@@ -267,7 +241,7 @@ def _read_config_model(profile_dir: Path) -> tuple:
        if isinstance(model_cfg, str):
            return model_cfg, None
        if isinstance(model_cfg, dict):
-            return model_cfg.get("default") or model_cfg.get("model"), model_cfg.get("provider")
+            return model_cfg.get("model"), model_cfg.get("provider")
        return None, None
    except Exception:
        return None, None
@@ -711,37 +685,11 @@ def get_active_profile_name() -> str:
 # Export / Import
 # ---------------------------------------------------------------------------

-def _default_export_ignore(root_dir: Path):
-    """Return an *ignore* callable for :func:`shutil.copytree`.
-
-    At the root level it excludes everything in ``_DEFAULT_EXPORT_EXCLUDE_ROOT``.
-    At all levels it excludes ``__pycache__``, sockets, and temp files.
-    """
-
-    def _ignore(directory: str, contents: list) -> set:
-        ignored: set = set()
-        for entry in contents:
-            # Universal exclusions (any depth)
-            if entry == "__pycache__" or entry.endswith((".sock", ".tmp")):
-                ignored.add(entry)
-            # npm lockfiles can appear at root
-            elif entry in ("package.json", "package-lock.json"):
-                ignored.add(entry)
-        # Root-level exclusions
-        if Path(directory) == root_dir:
-            ignored.update(c for c in contents if c in _DEFAULT_EXPORT_EXCLUDE_ROOT)
-        return ignored
-
-    return _ignore
-
-
 def export_profile(name: str, output_path: str) -> Path:
    """Export a profile to a tar.gz archive.

    Returns the output file path.
    """
-    import tempfile
-
    validate_profile_name(name)
    profile_dir = get_profile_dir(name)
    if not profile_dir.is_dir():
@@ -750,77 +698,10 @@ def export_profile(name: str, output_path: str) -> Path:
    output = Path(output_path)
    # shutil.make_archive wants the base name without extension
    base = str(output).removesuffix(".tar.gz").removesuffix(".tgz")
-
-    if name == "default":
-        # The default profile IS ~/.hermes itself — its parent is ~/ and its
-        # directory name is ".hermes", not "default".  We stage a clean copy
-        # under a temp dir so the archive contains ``default/...``.
-        with tempfile.TemporaryDirectory() as tmpdir:
-            staged = Path(tmpdir) / "default"
-            shutil.copytree(
-                profile_dir,
-                staged,
-                ignore=_default_export_ignore(profile_dir),
-            )
-            result = shutil.make_archive(base, "gztar", tmpdir, "default")
-            return Path(result)
-
    result = shutil.make_archive(base, "gztar", str(profile_dir.parent), name)
    return Path(result)


-def _normalize_profile_archive_parts(member_name: str) -> List[str]:
-    """Return safe path parts for a profile archive member."""
-    normalized_name = member_name.replace("\\", "/")
-    posix_path = PurePosixPath(normalized_name)
-    windows_path = PureWindowsPath(member_name)
-
-    if (
-        not normalized_name
-        or posix_path.is_absolute()
-        or windows_path.is_absolute()
-        or windows_path.drive
-    ):
-        raise ValueError(f"Unsafe archive member path: {member_name}")
-
-    parts = [part for part in posix_path.parts if part not in ("", ".")]
-    if not parts or any(part == ".." for part in parts):
-        raise ValueError(f"Unsafe archive member path: {member_name}")
-    return parts
-
-
-def _safe_extract_profile_archive(archive: Path, destination: Path) -> None:
-    """Extract a profile archive without allowing path escapes or links."""
-    import tarfile
-
-    with tarfile.open(archive, "r:gz") as tf:
-        for member in tf.getmembers():
-            parts = _normalize_profile_archive_parts(member.name)
-            target = destination.joinpath(*parts)
-
-            if member.isdir():
-                target.mkdir(parents=True, exist_ok=True)
-                continue
-
-            if not member.isfile():
-                raise ValueError(
-                    f"Unsupported archive member type: {member.name}"
-                )
-
-            target.parent.mkdir(parents=True, exist_ok=True)
-            extracted = tf.extractfile(member)
-            if extracted is None:
-                raise ValueError(f"Cannot read archive member: {member.name}")
-
-            with extracted, open(target, "wb") as dst:
-                shutil.copyfileobj(extracted, dst)
-
-            try:
-                os.chmod(target, member.mode & 0o777)
-            except OSError:
-                pass
-
-
 def import_profile(archive_path: str, name: Optional[str] = None) -> Path:
    """Import a profile from a tar.gz archive.

@@ -835,18 +716,9 @@ def import_profile(archive_path: str, name: Optional[str] = None) -> Path:

    # Peek at the archive to find the top-level directory name
    with tarfile.open(archive, "r:gz") as tf:
-        top_dirs = {
-            parts[0]
-            for member in tf.getmembers()
-            for parts in [_normalize_profile_archive_parts(member.name)]
-            if len(parts) > 1 or member.isdir()
-        }
+        top_dirs = {m.name.split("/")[0] for m in tf.getmembers() if "/" in m.name}
        if not top_dirs:
-            top_dirs = {
-                _normalize_profile_archive_parts(member.name)[0]
-                for member in tf.getmembers()
-                if member.isdir()
-            }
+            top_dirs = {m.name for m in tf.getmembers() if m.isdir()}

    inferred_name = name or (top_dirs.pop() if len(top_dirs) == 1 else None)
    if not inferred_name:
@@ -855,15 +727,6 @@ def import_profile(archive_path: str, name: Optional[str] = None) -> Path:
            "Specify it explicitly: hermes profile import <archive> --name <name>"
        )

-    # Archives exported from the default profile have "default/" as top-level
-    # dir.  Importing as "default" would target ~/.hermes itself — disallow
-    # that and guide the user toward a named profile.
-    if inferred_name == "default":
-        raise ValueError(
-            "Cannot import as 'default' — that is the built-in root profile (~/.hermes). "
-            "Specify a different name: hermes profile import <archive> --name <name>"
-        )
-
    validate_profile_name(inferred_name)
    profile_dir = get_profile_dir(inferred_name)
    if profile_dir.exists():
@@ -872,7 +735,7 @@ def import_profile(archive_path: str, name: Optional[str] = None) -> Path:
    profiles_root = _get_profiles_root()
    profiles_root.mkdir(parents=True, exist_ok=True)

-    _safe_extract_profile_archive(archive, profiles_root)
+    shutil.unpack_archive(str(archive), str(profiles_root))

    # If the archive extracted under a different name, rename
    extracted = profiles_root / (top_dirs.pop() if top_dirs else inferred_name)
@@ -6,10 +6,8 @@ import os
 from typing import Any, Dict, Optional

 from hermes_cli import auth as auth_mod
-from agent.credential_pool import CredentialPool, PooledCredential, get_custom_provider_pool_key, load_pool
 from hermes_cli.auth import (
    AuthError,
-    DEFAULT_CODEX_BASE_URL,
    PROVIDER_REGISTRY,
    format_auth_error,
    resolve_provider,
@@ -111,50 +109,6 @@ def _parse_api_mode(raw: Any) -> Optional[str]:
    return None


-def _resolve_runtime_from_pool_entry(
-    *,
-    provider: str,
-    entry: PooledCredential,
-    requested_provider: str,
-    model_cfg: Optional[Dict[str, Any]] = None,
-    pool: Optional[CredentialPool] = None,
-) -> Dict[str, Any]:
-    model_cfg = model_cfg or _get_model_config()
-    base_url = (getattr(entry, "runtime_base_url", None) or getattr(entry, "base_url", None) or "").rstrip("/")
-    api_key = getattr(entry, "runtime_api_key", None) or getattr(entry, "access_token", "")
-    api_mode = "chat_completions"
-    if provider == "openai-codex":
-        api_mode = "codex_responses"
-        base_url = base_url or DEFAULT_CODEX_BASE_URL
-    elif provider == "anthropic":
-        api_mode = "anthropic_messages"
-        cfg_provider = str(model_cfg.get("provider") or "").strip().lower()
-        cfg_base_url = ""
-        if cfg_provider == "anthropic":
-            cfg_base_url = str(model_cfg.get("base_url") or "").strip().rstrip("/")
-        base_url = cfg_base_url or base_url or "https://api.anthropic.com"
-    elif provider == "nous":
-        api_mode = "chat_completions"
-    elif provider == "copilot":
-        api_mode = _copilot_runtime_api_mode(model_cfg, getattr(entry, "runtime_api_key", ""))
-    else:
-        configured_mode = _parse_api_mode(model_cfg.get("api_mode"))
-        if configured_mode:
-            api_mode = configured_mode
-        elif base_url.rstrip("/").endswith("/anthropic"):
-            api_mode = "anthropic_messages"
-
-    return {
-        "provider": provider,
-        "api_mode": api_mode,
-        "base_url": base_url,
-        "api_key": api_key,
-        "source": getattr(entry, "source", "pool"),
-        "credential_pool": pool,
-        "requested_provider": requested_provider,
-    }
-
-
 def resolve_requested_provider(requested: Optional[str] = None) -> str:
    """Resolve provider request from explicit arg, config, then env."""
    if requested and requested.strip():
@@ -174,37 +128,6 @@ def resolve_requested_provider(requested: Optional[str] = None) -> str:
    return "auto"


-def _try_resolve_from_custom_pool(
-    base_url: str,
-    provider_label: str,
-    api_mode_override: Optional[str] = None,
-) -> Optional[Dict[str, Any]]:
-    """Check if a credential pool exists for a custom endpoint and return a runtime dict if so."""
-    pool_key = get_custom_provider_pool_key(base_url)
-    if not pool_key:
-        return None
-    try:
-        pool = load_pool(pool_key)
-        if not pool.has_credentials():
-            return None
-        entry = pool.select()
-        if entry is None:
-            return None
-        pool_api_key = getattr(entry, "runtime_api_key", None) or getattr(entry, "access_token", "")
-        if not pool_api_key:
-            return None
-        return {
-            "provider": provider_label,
-            "api_mode": api_mode_override or _detect_api_mode_for_url(base_url) or "chat_completions",
-            "base_url": base_url,
-            "api_key": pool_api_key,
-            "source": f"pool:{pool_key}",
-            "credential_pool": pool,
-        }
-    except Exception:
-        return None
-
-
 def _get_named_custom_provider(requested_provider: str) -> Optional[Dict[str, Any]]:
    requested_norm = _normalize_custom_provider_name(requested_provider or "")
    if not requested_norm or requested_norm == "custom":
@@ -269,11 +192,6 @@ def _resolve_named_custom_runtime(
    if not base_url:
        return None

-    # Check if a credential pool exists for this custom endpoint
-    pool_result = _try_resolve_from_custom_pool(base_url, "custom", custom_provider.get("api_mode"))
-    if pool_result:
-        return pool_result
-
    api_key_candidates = [
        (explicit_api_key or "").strip(),
        str(custom_provider.get("api_key", "") or "").strip(),
@@ -311,22 +229,28 @@ def _resolve_openrouter_runtime(
    requested_norm = (requested_provider or "").strip().lower()
    cfg_provider = cfg_provider.strip().lower()

+    env_openai_base_url = os.getenv("OPENAI_BASE_URL", "").strip()
    env_openrouter_base_url = os.getenv("OPENROUTER_BASE_URL", "").strip()

-    # Use config base_url when available and the provider context matches.
-    # OPENAI_BASE_URL env var is no longer consulted — config.yaml is
-    # the single source of truth for endpoint URLs.
    use_config_base_url = False
    if cfg_base_url.strip() and not explicit_base_url:
        if requested_norm == "auto":
-            if not cfg_provider or cfg_provider == "auto":
+            if (not cfg_provider or cfg_provider == "auto") and not env_openai_base_url:
                use_config_base_url = True
        elif requested_norm == "custom" and cfg_provider == "custom":
+            # provider: custom — use base_url from config (Fixes #1760).
            use_config_base_url = True

+    # When the user explicitly requested the openrouter provider, skip
+    # OPENAI_BASE_URL — it typically points to a custom / non-OpenRouter
+    # endpoint and would prevent switching back to OpenRouter (#874).
+    skip_openai_base = requested_norm == "openrouter"
+
+    # For custom, prefer config base_url over env so config.yaml is honored (#1760).
    base_url = (
        (explicit_base_url or "").strip()
        or (cfg_base_url.strip() if use_config_base_url else "")
+        or ("" if skip_openai_base else env_openai_base_url)
        or env_openrouter_base_url
        or OPENROUTER_BASE_URL
    ).rstrip("/")
@@ -363,15 +287,6 @@ def _resolve_openrouter_runtime(
    # Also provide a placeholder API key for local servers that don't require
    # authentication — the OpenAI SDK requires a non-empty api_key string.
    effective_provider = "custom" if requested_norm == "custom" else "openrouter"
-
-    # For custom endpoints, check if a credential pool exists
-    if effective_provider == "custom" and base_url:
-        pool_result = _try_resolve_from_custom_pool(
-            base_url, effective_provider, _parse_api_mode(model_cfg.get("api_mode")),
-        )
-        if pool_result:
-            return pool_result
-
    if effective_provider == "custom" and not api_key and not _is_openrouter_url:
        api_key = "no-key-required"

@@ -386,134 +301,6 @@ def _resolve_openrouter_runtime(
    }


-def _resolve_explicit_runtime(
-    *,
-    provider: str,
-    requested_provider: str,
-    model_cfg: Dict[str, Any],
-    explicit_api_key: Optional[str] = None,
-    explicit_base_url: Optional[str] = None,
-) -> Optional[Dict[str, Any]]:
-    explicit_api_key = str(explicit_api_key or "").strip()
-    explicit_base_url = str(explicit_base_url or "").strip().rstrip("/")
-    if not explicit_api_key and not explicit_base_url:
-        return None
-
-    if provider == "anthropic":
-        cfg_provider = str(model_cfg.get("provider") or "").strip().lower()
-        cfg_base_url = ""
-        if cfg_provider == "anthropic":
-            cfg_base_url = str(model_cfg.get("base_url") or "").strip().rstrip("/")
-        base_url = explicit_base_url or cfg_base_url or "https://api.anthropic.com"
-        api_key = explicit_api_key
-        if not api_key:
-            from agent.anthropic_adapter import resolve_anthropic_token
-
-            api_key = resolve_anthropic_token()
-            if not api_key:
-                raise AuthError(
-                    "No Anthropic credentials found. Set ANTHROPIC_TOKEN or ANTHROPIC_API_KEY, "
-                    "run 'claude setup-token', or authenticate with 'claude /login'."
-                )
-        return {
-            "provider": "anthropic",
-            "api_mode": "anthropic_messages",
-            "base_url": base_url,
-            "api_key": api_key,
-            "source": "explicit",
-            "requested_provider": requested_provider,
-        }
-
-    if provider == "openai-codex":
-        base_url = explicit_base_url or DEFAULT_CODEX_BASE_URL
-        api_key = explicit_api_key
-        last_refresh = None
-        if not api_key:
-            creds = resolve_codex_runtime_credentials()
-            api_key = creds.get("api_key", "")
-            last_refresh = creds.get("last_refresh")
-            if not explicit_base_url:
-                base_url = creds.get("base_url", "").rstrip("/") or base_url
-        return {
-            "provider": "openai-codex",
-            "api_mode": "codex_responses",
-            "base_url": base_url,
-            "api_key": api_key,
-            "source": "explicit",
-            "last_refresh": last_refresh,
-            "requested_provider": requested_provider,
-        }
-
-    if provider == "nous":
-        state = auth_mod.get_provider_auth_state("nous") or {}
-        base_url = (
-            explicit_base_url
-            or str(state.get("inference_base_url") or auth_mod.DEFAULT_NOUS_INFERENCE_URL).strip().rstrip("/")
-        )
-        api_key = explicit_api_key or str(state.get("agent_key") or state.get("access_token") or "").strip()
-        expires_at = state.get("agent_key_expires_at") or state.get("expires_at")
-        if not api_key:
-            creds = resolve_nous_runtime_credentials(
-                min_key_ttl_seconds=max(60, int(os.getenv("HERMES_NOUS_MIN_KEY_TTL_SECONDS", "1800"))),
-                timeout_seconds=float(os.getenv("HERMES_NOUS_TIMEOUT_SECONDS", "15")),
-            )
-            api_key = creds.get("api_key", "")
-            expires_at = creds.get("expires_at")
-            if not explicit_base_url:
-                base_url = creds.get("base_url", "").rstrip("/") or base_url
-        return {
-            "provider": "nous",
-            "api_mode": "chat_completions",
-            "base_url": base_url,
-            "api_key": api_key,
-            "source": "explicit",
-            "expires_at": expires_at,
-            "requested_provider": requested_provider,
-        }
-
-    pconfig = PROVIDER_REGISTRY.get(provider)
-    if pconfig and pconfig.auth_type == "api_key":
-        env_url = ""
-        if pconfig.base_url_env_var:
-            env_url = os.getenv(pconfig.base_url_env_var, "").strip().rstrip("/")
-
-        base_url = explicit_base_url
-        if not base_url:
-            if provider == "kimi-coding":
-                creds = resolve_api_key_provider_credentials(provider)
-                base_url = creds.get("base_url", "").rstrip("/")
-            else:
-                base_url = env_url or pconfig.inference_base_url
-
-        api_key = explicit_api_key
-        if not api_key:
-            creds = resolve_api_key_provider_credentials(provider)
-            api_key = creds.get("api_key", "")
-            if not base_url:
-                base_url = creds.get("base_url", "").rstrip("/")
-
-        api_mode = "chat_completions"
-        if provider == "copilot":
-            api_mode = _copilot_runtime_api_mode(model_cfg, api_key)
-        else:
-            configured_mode = _parse_api_mode(model_cfg.get("api_mode"))
-            if configured_mode:
-                api_mode = configured_mode
-            elif base_url.rstrip("/").endswith("/anthropic"):
-                api_mode = "anthropic_messages"
-
-        return {
-            "provider": provider,
-            "api_mode": api_mode,
-            "base_url": base_url.rstrip("/"),
-            "api_key": api_key,
-            "source": "explicit",
-            "requested_provider": requested_provider,
-        }
-
-    return None
-
-
 def resolve_runtime_provider(
    *,
    requested: Optional[str] = None,
@@ -537,57 +324,6 @@ def resolve_runtime_provider(
        explicit_api_key=explicit_api_key,
        explicit_base_url=explicit_base_url,
    )
-    model_cfg = _get_model_config()
-    explicit_runtime = _resolve_explicit_runtime(
-        provider=provider,
-        requested_provider=requested_provider,
-        model_cfg=model_cfg,
-        explicit_api_key=explicit_api_key,
-        explicit_base_url=explicit_base_url,
-    )
-    if explicit_runtime:
-        return explicit_runtime
-
-    should_use_pool = provider != "openrouter"
-    if provider == "openrouter":
-        cfg_provider = str(model_cfg.get("provider") or "").strip().lower()
-        cfg_base_url = str(model_cfg.get("base_url") or "").strip()
-        env_openai_base_url = os.getenv("OPENAI_BASE_URL", "").strip()
-        env_openrouter_base_url = os.getenv("OPENROUTER_BASE_URL", "").strip()
-        has_custom_endpoint = bool(
-            explicit_base_url
-            or env_openai_base_url
-            or env_openrouter_base_url
-        )
-        if cfg_base_url and cfg_provider in {"auto", "custom"}:
-            has_custom_endpoint = True
-        has_runtime_override = bool(explicit_api_key or explicit_base_url)
-        should_use_pool = (
-            requested_provider in {"openrouter", "auto"}
-            and not has_custom_endpoint
-            and not has_runtime_override
-        )
-
-    try:
-        pool = load_pool(provider) if should_use_pool else None
-    except Exception:
-        pool = None
-    if pool and pool.has_credentials():
-        entry = pool.select()
-        pool_api_key = ""
-        if entry is not None:
-            pool_api_key = (
-                getattr(entry, "runtime_api_key", None)
-                or getattr(entry, "access_token", "")
-            )
-        if entry is not None and pool_api_key:
-            return _resolve_runtime_from_pool_entry(
-                provider=provider,
-                entry=entry,
-                requested_provider=requested_provider,
-                model_cfg=model_cfg,
-                pool=pool,
-            )

    if provider == "nous":
        creds = resolve_nous_runtime_credentials(
@@ -641,6 +377,7 @@ def resolve_runtime_provider(
        # Allow base URL override from config.yaml model.base_url, but only
        # when the configured provider is anthropic — otherwise a non-Anthropic
        # base_url (e.g. Codex endpoint) would leak into Anthropic requests.
+        model_cfg = _get_model_config()
        cfg_provider = str(model_cfg.get("provider") or "").strip().lower()
        cfg_base_url = ""
        if cfg_provider == "anthropic":
@@ -659,6 +396,7 @@ def resolve_runtime_provider(
    pconfig = PROVIDER_REGISTRY.get(provider)
    if pconfig and pconfig.auth_type == "api_key":
        creds = resolve_api_key_provider_credentials(provider)
+        model_cfg = _get_model_config()
        base_url = creds.get("base_url", "").rstrip("/")
        api_mode = "chat_completions"
        if provider == "copilot":
@@ -285,31 +285,23 @@ def show_status(args):
            _gw_svc = get_service_name()
        except Exception:
            _gw_svc = "hermes-gateway"
-        try:
-            result = subprocess.run(
-                ["systemctl", "--user", "is-active", _gw_svc],
-                capture_output=True,
-                text=True,
-                timeout=5
-            )
-            is_active = result.stdout.strip() == "active"
-        except subprocess.TimeoutExpired:
-            is_active = False
+        result = subprocess.run(
+            ["systemctl", "--user", "is-active", _gw_svc],
+            capture_output=True,
+            text=True
+        )
+        is_active = result.stdout.strip() == "active"
        print(f"  Status:       {check_mark(is_active)} {'running' if is_active else 'stopped'}")
        print("  Manager:      systemd (user)")
        
    elif sys.platform == 'darwin':
        from hermes_cli.gateway import get_launchd_label
-        try:
-            result = subprocess.run(
-                ["launchctl", "list", get_launchd_label()],
-                capture_output=True,
-                text=True,
-                timeout=5
-            )
-            is_loaded = result.returncode == 0
-        except subprocess.TimeoutExpired:
-            is_loaded = False
+        result = subprocess.run(
+            ["launchctl", "list", get_launchd_label()],
+            capture_output=True,
+            text=True
+        )
+        is_loaded = result.returncode == 0
        print(f"  Status:       {check_mark(is_loaded)} {'loaded' if is_loaded else 'not loaded'}")
        print("  Manager:      launchd")
    else:
@@ -273,16 +273,6 @@ TOOL_CATEGORIES = {
                "browser_provider": "browser-use",
                "post_setup": "browserbase",
            },
-            {
-                "name": "Camofox",
-                "tag": "Local anti-detection browser (Firefox/Camoufox)",
-                "env_vars": [
-                    {"key": "CAMOFOX_URL", "prompt": "Camofox server URL", "default": "http://localhost:9377",
-                     "url": "https://github.com/jo-inc/camofox-browser"},
-                ],
-                "browser_provider": "camofox",
-                "post_setup": "camofox",
-            },
        ],
    },
    "homeassistant": {
@@ -347,28 +337,6 @@ def _run_post_setup(post_setup_key: str):
        elif not node_modules.exists():
            _print_warning("    Node.js not found - browser tools require: npm install (in hermes-agent directory)")

-    elif post_setup_key == "camofox":
-        camofox_dir = PROJECT_ROOT / "node_modules" / "@askjo" / "camoufox-browser"
-        if not camofox_dir.exists() and shutil.which("npm"):
-            _print_info("    Installing Camofox browser server...")
-            import subprocess
-            result = subprocess.run(
-                ["npm", "install", "--silent"],
-                capture_output=True, text=True, cwd=str(PROJECT_ROOT)
-            )
-            if result.returncode == 0:
-                _print_success("    Camofox installed")
-            else:
-                _print_warning("    npm install failed - run manually: npm install")
-        if camofox_dir.exists():
-            _print_info("    Start the Camofox server:")
-            _print_info("      npx @askjo/camoufox-browser")
-            _print_info("    First run downloads the Camoufox engine (~300MB)")
-            _print_info("    Or use Docker: docker run -p 9377:9377 -e CAMOFOX_PORT=9377 jo-inc/camofox-browser")
-        elif not shutil.which("npm"):
-            _print_warning("    Node.js not found. Install Camofox via Docker:")
-            _print_info("      docker run -p 9377:9377 -e CAMOFOX_PORT=9377 jo-inc/camofox-browser")
-
    elif post_setup_key == "rl_training":
        try:
            __import__("tinker_atropos")
@@ -597,9 +565,7 @@ def _toolset_has_keys(ts_key: str) -> bool:
    if cat:
        for provider in cat.get("providers", []):
            env_vars = provider.get("env_vars", [])
-            if not env_vars:
-                return True  # No-key provider (e.g. Local Browser, Edge TTS)
-            if all(get_env_value(e["key"]) for e in env_vars):
+            if env_vars and all(get_env_value(e["key"]) for e in env_vars):
                return True
        return False

@@ -983,13 +949,8 @@ def _configure_simple_requirements(ts_key: str):
            key_label = "    OPENAI_API_KEY" if "api.openai.com" in base_url.lower() else "    API key"
            api_key = _prompt(key_label, password=True)
            if api_key and api_key.strip():
+                save_env_value("OPENAI_BASE_URL", base_url)
                save_env_value("OPENAI_API_KEY", api_key.strip())
-                # Save vision base URL to config (not .env — only secrets go there)
-                from hermes_cli.config import load_config, save_config
-                _cfg = load_config()
-                _aux = _cfg.setdefault("auxiliary", {}).setdefault("vision", {})
-                _aux["base_url"] = base_url
-                save_config(_cfg)
                if "api.openai.com" in base_url.lower():
                    save_env_value("AUXILIARY_VISION_MODEL", "gpt-4o-mini")
                _print_success("    Saved")
@@ -17,20 +17,6 @@ def get_hermes_home() -> Path:
    return Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))


-def get_optional_skills_dir(default: Path | None = None) -> Path:
-    """Return the optional-skills directory, honoring package-manager wrappers.
-
-    Packaged installs may ship ``optional-skills`` outside the Python package
-    tree and expose it via ``HERMES_OPTIONAL_SKILLS``.
-    """
-    override = os.getenv("HERMES_OPTIONAL_SKILLS", "").strip()
-    if override:
-        return Path(override)
-    if default is not None:
-        return default
-    return get_hermes_home() / "optional-skills"
-
-
 def get_hermes_dir(new_subpath: str, old_name: str) -> Path:
    """Resolve a Hermes subdirectory with backward compatibility.

@@ -10,27 +10,16 @@ import os
 import sys
 from pathlib import Path

-from hermes_constants import get_hermes_home
 from honcho_integration.client import resolve_config_path, GLOBAL_CONFIG_PATH

 HOST = "hermes"


 def _config_path() -> Path:
-    """Return the active Honcho config path for reading (instance-local or global)."""
+    """Return the active Honcho config path (instance-local or global)."""
    return resolve_config_path()


-def _local_config_path() -> Path:
-    """Return the instance-local Honcho config path for writing.
-
-    Always returns $HERMES_HOME/honcho.json so each profile/instance gets
-    its own config file.  The global ~/.honcho/config.json is only used as
-    a read fallback (via resolve_config_path) for cross-app interop.
-    """
-    return get_hermes_home() / "honcho.json"
-
-
 def _read_config() -> dict:
    path = _config_path()
    if path.exists():
@@ -42,7 +31,7 @@ def _read_config() -> dict:


 def _write_config(cfg: dict, path: Path | None = None) -> None:
-    path = path or _local_config_path()
+    path = path or _config_path()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(
        json.dumps(cfg, indent=2, ensure_ascii=False) + "\n",
@@ -106,13 +95,13 @@ def cmd_setup(args) -> None:
    """Interactive Honcho setup wizard."""
    cfg = _read_config()

-    write_path = _local_config_path()
-    read_path = _config_path()
+    active_path = _config_path()
    print("\nHoncho memory setup\n" + "─" * 40)
    print("  Honcho gives Hermes persistent cross-session memory.")
-    print(f"  Config: {write_path}")
-    if read_path != write_path and read_path.exists():
-        print(f"  (seeding from existing config at {read_path})")
+    if active_path != GLOBAL_CONFIG_PATH:
+        print(f"  Instance config: {active_path}")
+    else:
+        print("  Config is shared with other hosts at ~/.honcho/config.json")
    print()

    if not _ensure_sdk_installed():
@@ -200,7 +189,7 @@ def cmd_setup(args) -> None:
    hermes_host.setdefault("saveMessages", True)

    _write_config(cfg)
-    print(f"\n  Config written to {write_path}")
+    print(f"\n  Config written to {active_path}")

    # Test connection
    print("  Testing connection... ", end="", flush=True)
@@ -248,7 +237,6 @@ def cmd_status(args) -> None:
    cfg = _read_config()

    active_path = _config_path()
-    write_path = _local_config_path()

    if not cfg:
        print(f"  No Honcho config found at {active_path}")
@@ -271,8 +259,6 @@ def cmd_status(args) -> None:
    print(f"  Workspace:      {hcfg.workspace_id}")
    print(f"  Host:           {hcfg.host}")
    print(f"  Config path:    {active_path}")
-    if write_path != active_path:
-        print(f"  Write path:     {write_path}  (instance-local)")
    print(f"  AI peer:        {hcfg.ai_peer}")
    print(f"  User peer:      {hcfg.peer_name or 'not set'}")
    print(f"  Session key:    {hcfg.resolve_session_name()}")
@@ -252,7 +252,7 @@ def get_tool_definitions(
    # Determine which tool names the caller wants
    tools_to_include: set = set()

-    if enabled_toolsets is not None:
+    if enabled_toolsets:
        for toolset_name in enabled_toolsets:
            if validate_toolset(toolset_name):
                resolved = resolve_toolset(toolset_name)
@@ -2455,24 +2455,9 @@ class Migrator:
            notes.append("")

        notes.extend([
-            "## IMPORTANT: Archive the OpenClaw Directory",
-            "",
-            "After migration, your OpenClaw directory still exists on disk with workspace",
-            "state files (todo.json, sessions, logs). If the Hermes agent discovers these",
-            "directories, it may read/write to them instead of the Hermes state, causing",
-            "confusion (e.g., cron jobs reading a different todo list than interactive sessions).",
-            "",
-            "**Strongly recommended:** Run `hermes claw cleanup` to rename the OpenClaw",
-            "directory to `.openclaw.pre-migration`. This prevents the agent from finding it.",
-            "The directory is renamed, not deleted — you can undo this at any time.",
-            "",
-            "If you skip this step and notice the agent getting confused about workspaces",
-            "or todo lists, run `hermes claw cleanup` to fix it.",
-            "",
            "## Hermes-Specific Setup",
            "",
            "After migration, you may want to:",
-            "- Run `hermes claw cleanup` to archive the OpenClaw directory (prevents state confusion)",
            "- Run `hermes setup` to configure any remaining settings",
            "- Run `hermes mcp list` to verify MCP servers were imported correctly",
            "- Run `hermes cron` to recreate scheduled tasks (see archive/cron-config.json)",
@@ -16,8 +16,7 @@
  },
  "homepage": "https://github.com/NousResearch/Hermes-Agent#readme",
  "dependencies": {
-    "agent-browser": "^0.13.0",
-    "@askjo/camoufox-browser": "^1.0.0"
+    "agent-browser": "^0.13.0"
  },
  "engines": {
    "node": ">=18.0.0"
@@ -1,14 +0,0 @@
-Homebrew packaging notes for Hermes Agent.
-
-Use `packaging/homebrew/hermes-agent.rb` as a tap or `homebrew-core` starting point.
-
-Key choices:
- Stable builds should target the semver-named sdist asset attached to each GitHub release, not the CalVer tag tarball.
- `faster-whisper` now lives in the `voice` extra, which keeps wheel-only transitive dependencies out of the base Homebrew formula.
- The wrapper exports `HERMES_BUNDLED_SKILLS`, `HERMES_OPTIONAL_SKILLS`, and `HERMES_MANAGED=homebrew` so packaged installs keep runtime assets and defer upgrades to Homebrew.
-
-Typical update flow:
-1. Bump the formula `url`, `version`, and `sha256`.
-2. Refresh Python resources with `brew update-python-resources --print-only hermes-agent`.
-3. Keep `ignore_packages: %w[certifi cryptography pydantic]`.
-4. Verify `brew audit --new --strict hermes-agent` and `brew test hermes-agent`.
@@ -1,48 +0,0 @@
-class HermesAgent < Formula
-  include Language::Python::Virtualenv
-
-  desc "Self-improving AI agent that creates skills from experience"
-  homepage "https://hermes-agent.nousresearch.com"
-  # Stable source should point at the semver-named sdist asset attached by
-  # scripts/release.py, not the CalVer tag tarball.
-  url "https://github.com/NousResearch/hermes-agent/releases/download/v2026.3.30/hermes_agent-0.6.0.tar.gz"
-  sha256 "<replace-with-release-asset-sha256>"
-  license "MIT"
-
-  depends_on "certifi" => :no_linkage
-  depends_on "cryptography" => :no_linkage
-  depends_on "libyaml"
-  depends_on "python@3.14"
-
-  pypi_packages ignore_packages: %w[certifi cryptography pydantic]
-
-  # Refresh resource stanzas after bumping the source url/version:
-  #   brew update-python-resources --print-only hermes-agent
-
-  def install
-    venv = virtualenv_create(libexec, "python3.14")
-    venv.pip_install resources
-    venv.pip_install buildpath
-
-    pkgshare.install "skills", "optional-skills"
-
-    %w[hermes hermes-agent hermes-acp].each do |exe|
-      next unless (libexec/"bin"/exe).exist?
-
-      (bin/exe).write_env_script(
-        libexec/"bin"/exe,
-        HERMES_BUNDLED_SKILLS: pkgshare/"skills",
-        HERMES_OPTIONAL_SKILLS: pkgshare/"optional-skills",
-        HERMES_MANAGED: "homebrew"
-      )
-    end
-  end
-
-  test do
-    assert_match "Hermes Agent v#{version}", shell_output("#{bin}/hermes version")
-
-    managed = shell_output("#{bin}/hermes update 2>&1")
-    assert_match "managed by Homebrew", managed
-    assert_match "brew upgrade hermes-agent", managed
-  end
-end
@@ -32,6 +32,7 @@ dependencies = [
  "fal-client>=0.13.1,<1",
  # Text-to-speech (Edge TTS is free, no API key needed)
  "edge-tts>=7.2.7,<8",
+  "faster-whisper>=1.0.0,<2",
  # Skills Hub (GitHub App JWT auth — optional, only needed for bot identity)
  "PyJWT[crypto]>=2.12.0,<3",  # CVE-2026-32597
 ]
@@ -46,13 +47,7 @@ slack = ["slack-bolt>=1.18.0,<2", "slack-sdk>=3.27.0,<4"]
 matrix = ["matrix-nio[e2e]>=0.24.0,<1"]
 cli = ["simple-term-menu>=1.0,<2"]
 tts-premium = ["elevenlabs>=1.0,<2"]
-voice = [
-  # Local STT pulls in wheel-only transitive deps (ctranslate2, onnxruntime),
-  # so keep it out of the base install for source-build packagers like Homebrew.
-  "faster-whisper>=1.0.0,<2",
-  "sounddevice>=0.4.6,<1",
-  "numpy>=1.24.0,<3",
-]
+voice = ["sounddevice>=0.4.6,<1", "numpy>=1.24.0,<3"]
 pty = [
  "ptyprocess>=0.7.0,<1; sys_platform != 'win32'",
  "pywinpty>=2.0.0,<3; sys_platform == 'win32'",
@@ -72,8 +67,6 @@ rl = [
  "wandb>=0.15.0,<1",
 ]
 yc-bench = ["yc-bench @ git+https://github.com/collinear-ai/yc-bench.git ; python_version >= '3.12'"]
-taubench = ["tau-bench @ git+https://github.com/sierra-research/tau-bench.git"]
-tau2bench = ["tau2 @ git+https://github.com/sierra-research/tau2-bench.git"]
 all = [
  "hermes-agent[modal]",
  "hermes-agent[daytona]",
@@ -320,12 +320,8 @@ def _extract_parallel_scope_path(tool_name: str, function_args: dict) -> Path |
    if not isinstance(raw_path, str) or not raw_path.strip():
        return None

-    expanded = Path(raw_path).expanduser()
-    if expanded.is_absolute():
-        return Path(os.path.abspath(str(expanded)))
-
    # Avoid resolve(); the file may not exist yet.
-    return Path(os.path.abspath(str(Path.cwd() / expanded)))
+    return Path(raw_path).expanduser()


 def _paths_overlap(left: Path, right: Path) -> bool:
@@ -490,8 +486,6 @@ class AIAgent:
        provider_data_collection: str = None,
        session_id: str = None,
        tool_progress_callback: callable = None,
-        tool_start_callback: callable = None,
-        tool_complete_callback: callable = None,
        thinking_callback: callable = None,
        reasoning_callback: callable = None,
        clarify_callback: callable = None,
@@ -511,11 +505,9 @@ class AIAgent:
        honcho_config=None,
        iteration_budget: "IterationBudget" = None,
        fallback_model: Dict[str, Any] = None,
-        credential_pool=None,
        checkpoints_enabled: bool = False,
        checkpoint_max_snapshots: int = 50,
        pass_session_id: bool = False,
-        persist_session: bool = True,
    ):
        """
        Initialize the AI Agent.
@@ -581,8 +573,6 @@ class AIAgent:
        self.background_review_callback = None  # Optional sync callback for gateway delivery
        self.skip_context_files = skip_context_files
        self.pass_session_id = pass_session_id
-        self.persist_session = persist_session
-        self._credential_pool = credential_pool
        self.log_prefix_chars = log_prefix_chars
        self.log_prefix = f"{log_prefix} " if log_prefix else ""
        # Store effective base URL for feature detection (prompt caching, reasoning, etc.)
@@ -626,8 +616,6 @@ class AIAgent:
            ).start()

        self.tool_progress_callback = tool_progress_callback
-        self.tool_start_callback = tool_start_callback
-        self.tool_complete_callback = tool_complete_callback
        self.thinking_callback = thinking_callback
        self.reasoning_callback = reasoning_callback
        self._reasoning_deltas_fired = False  # Set by _fire_reasoning_delta, reset per API call
@@ -1397,7 +1385,6 @@ class AIAgent:
        content = re.sub(r'<thinking>.*?</thinking>', '', content, flags=re.DOTALL | re.IGNORECASE)
        content = re.sub(r'<reasoning>.*?</reasoning>', '', content, flags=re.DOTALL)
        content = re.sub(r'<REASONING_SCRATCHPAD>.*?</REASONING_SCRATCHPAD>', '', content, flags=re.DOTALL)
-        content = re.sub(r'</?(?:think|thinking|reasoning|REASONING_SCRATCHPAD)>\s*', '', content, flags=re.IGNORECASE)
        return content

    def _looks_like_codex_intermediate_ack(
@@ -1713,10 +1700,7 @@ class AIAgent:
        """Save session state to both JSON log and SQLite on any exit path.

        Ensures conversations are never lost, even on errors or early returns.
-        Skipped when ``persist_session=False`` (ephemeral helper flows).
        """
-        if not self.persist_session:
-            return
        self._apply_persist_user_message_override(messages)
        self._session_messages = messages
        self._save_session_log(messages)
@@ -3246,10 +3230,9 @@ class AIAgent:
            "model": model,
            "instructions": instructions,
            "input": normalized_input,
+            "tools": normalized_tools,
            "store": False,
        }
-        if normalized_tools is not None:
-            normalized["tools"] = normalized_tools

        # Pass through reasoning config
        reasoning = api_kwargs.get("reasoning")
@@ -3494,33 +3477,14 @@ class AIAgent:

    @staticmethod
    def _is_openai_client_closed(client: Any) -> bool:
-        """Check if an OpenAI client is closed.
-
-        Handles both property and method forms of is_closed:
-        - httpx.Client.is_closed is a bool property
-        - openai.OpenAI.is_closed is a method returning bool
-
-        Prior bug: getattr(client, "is_closed", False) returned the bound method,
-        which is always truthy, causing unnecessary client recreation on every call.
-        """
        from unittest.mock import Mock

        if isinstance(client, Mock):
            return False
-
-        is_closed_attr = getattr(client, "is_closed", None)
-        if is_closed_attr is not None:
-            # Handle method (openai SDK) vs property (httpx)
-            if callable(is_closed_attr):
-                if is_closed_attr():
-                    return True
-            elif bool(is_closed_attr):
-                return True
-
+        if bool(getattr(client, "is_closed", False)):
+            return True
        http_client = getattr(client, "_client", None)
-        if http_client is not None:
-            return bool(getattr(http_client, "is_closed", False))
-        return False
+        return bool(getattr(http_client, "is_closed", False))

    def _create_openai_client(self, client_kwargs: dict, *, reason: str, shared: bool) -> Any:
        if self.provider == "copilot-acp" or str(client_kwargs.get("base_url", "")).startswith("acp://copilot"):
@@ -3611,8 +3575,6 @@ class AIAgent:

    def _run_codex_stream(self, api_kwargs: dict, client: Any = None, on_first_delta: callable = None):
        """Execute one streaming Responses API request and return the final response."""
-        import httpx as _httpx
-
        active_client = client or self._ensure_primary_openai_client(reason="codex_stream_direct")
        max_stream_retries = 1
        has_tool_calls = False
@@ -3646,22 +3608,6 @@ class AIAgent:
                            if reasoning_text:
                                self._fire_reasoning_delta(reasoning_text)
                    return stream.get_final_response()
-            except (_httpx.RemoteProtocolError, _httpx.ReadTimeout, _httpx.ConnectError, ConnectionError) as exc:
-                if attempt < max_stream_retries:
-                    logger.debug(
-                        "Codex Responses stream transport failed (attempt %s/%s); retrying. %s error=%s",
-                        attempt + 1,
-                        max_stream_retries + 1,
-                        self._client_log_context(),
-                        exc,
-                    )
-                    continue
-                logger.debug(
-                    "Codex Responses stream transport failed; falling back to create(stream=True). %s error=%s",
-                    self._client_log_context(),
-                    exc,
-                )
-                return self._run_codex_create_stream_fallback(api_kwargs, client=active_client)
            except RuntimeError as exc:
                err_text = str(exc)
                missing_completed = "response.completed" in err_text
@@ -3824,100 +3770,6 @@ class AIAgent:
        self._is_anthropic_oauth = _is_oauth_token(new_token)
        return True

-    def _apply_client_headers_for_base_url(self, base_url: str) -> None:
-        from agent.auxiliary_client import _OR_HEADERS
-
-        normalized = (base_url or "").lower()
-        if "openrouter" in normalized:
-            self._client_kwargs["default_headers"] = dict(_OR_HEADERS)
-        elif "api.githubcopilot.com" in normalized:
-            from hermes_cli.models import copilot_default_headers
-
-            self._client_kwargs["default_headers"] = copilot_default_headers()
-        elif "api.kimi.com" in normalized:
-            self._client_kwargs["default_headers"] = {"User-Agent": "KimiCLI/1.3"}
-        else:
-            self._client_kwargs.pop("default_headers", None)
-
-    def _swap_credential(self, entry) -> None:
-        runtime_key = getattr(entry, "runtime_api_key", None) or getattr(entry, "access_token", "")
-        runtime_base = getattr(entry, "runtime_base_url", None) or getattr(entry, "base_url", None) or self.base_url
-
-        if self.api_mode == "anthropic_messages":
-            from agent.anthropic_adapter import build_anthropic_client, _is_oauth_token
-
-            try:
-                self._anthropic_client.close()
-            except Exception:
-                pass
-
-            self._anthropic_api_key = runtime_key
-            self._anthropic_base_url = runtime_base
-            self._anthropic_client = build_anthropic_client(runtime_key, runtime_base)
-            self._is_anthropic_oauth = _is_oauth_token(runtime_key) if self.provider == "anthropic" else False
-            self.api_key = runtime_key
-            self.base_url = runtime_base
-            return
-
-        self.api_key = runtime_key
-        self.base_url = runtime_base.rstrip("/") if isinstance(runtime_base, str) else runtime_base
-        self._client_kwargs["api_key"] = self.api_key
-        self._client_kwargs["base_url"] = self.base_url
-        self._apply_client_headers_for_base_url(self.base_url)
-        self._replace_primary_openai_client(reason="credential_rotation")
-
-    def _recover_with_credential_pool(
-        self,
-        *,
-        status_code: Optional[int],
-        has_retried_429: bool,
-    ) -> tuple[bool, bool]:
-        """Attempt credential recovery via pool rotation.
-
-        Returns (recovered, has_retried_429).
-        On 429: first occurrence retries same credential (sets flag True).
-                second consecutive 429 rotates to next credential (resets flag).
-        On 402: immediately rotates (billing exhaustion won't resolve with retry).
-        On 401: attempts token refresh before rotating.
-        """
-        pool = self._credential_pool
-        if pool is None or status_code is None:
-            return False, has_retried_429
-
-        if status_code == 402:
-            next_entry = pool.mark_exhausted_and_rotate(status_code=402)
-            if next_entry is not None:
-                logger.info(f"Credential 402 (billing) — rotated to pool entry {getattr(next_entry, 'id', '?')}")
-                self._swap_credential(next_entry)
-                return True, False
-            return False, has_retried_429
-
-        if status_code == 429:
-            if not has_retried_429:
-                return False, True
-            next_entry = pool.mark_exhausted_and_rotate(status_code=429)
-            if next_entry is not None:
-                logger.info(f"Credential 429 (rate limit) — rotated to pool entry {getattr(next_entry, 'id', '?')}")
-                self._swap_credential(next_entry)
-                return True, False
-            return False, True
-
-        if status_code == 401:
-            refreshed = pool.try_refresh_current()
-            if refreshed is not None:
-                logger.info(f"Credential 401 — refreshed pool entry {getattr(refreshed, 'id', '?')}")
-                self._swap_credential(refreshed)
-                return True, has_retried_429
-            # Refresh failed — rotate to next credential instead of giving up.
-            # The failed entry is already marked exhausted by try_refresh_current().
-            next_entry = pool.mark_exhausted_and_rotate(status_code=401)
-            if next_entry is not None:
-                logger.info(f"Credential 401 (refresh failed) — rotated to pool entry {getattr(next_entry, 'id', '?')}")
-                self._swap_credential(next_entry)
-                return True, False
-
-        return False, has_retried_429
-
    def _anthropic_messages_create(self, api_kwargs: dict):
        if self.api_mode == "anthropic_messages":
            self._try_refresh_anthropic_client_credentials()
@@ -5369,8 +5221,11 @@ class AIAgent:
            except Exception as e:
                logger.warning("Session DB compression split failed — new session will NOT be indexed: %s", e)

-        # Update token estimate after compaction so pressure calculations
-        # use the post-compression count, not the stale pre-compression one.
+        # Reset context pressure warning and token estimate — usage drops
+        # after compaction.  Without this, the stale last_prompt_tokens from
+        # the previous API call causes the pressure calculation to stay at
+        # >1000% and spam warnings / re-trigger compression in a loop.
+        self._context_pressure_warned = False
        _compressed_est = (
            estimate_tokens_rough(new_system_prompt)
            + estimate_messages_tokens_rough(compressed)
@@ -5378,25 +5233,6 @@ class AIAgent:
        self.context_compressor.last_prompt_tokens = _compressed_est
        self.context_compressor.last_completion_tokens = 0

-        # Only reset the pressure warning if compression actually brought
-        # us below the warning level (85% of threshold).  When compression
-        # can't reduce enough (e.g. threshold is very low, or system prompt
-        # alone exceeds the warning level), keep the flag set to prevent
-        # spamming the user with repeated warnings every loop iteration.
-        if self.context_compressor.threshold_tokens > 0:
-            _post_progress = _compressed_est / self.context_compressor.threshold_tokens
-            if _post_progress < 0.85:
-                self._context_pressure_warned = False
-
-        # Clear the file-read dedup cache.  After compression the original
-        # read content is summarised away — if the model re-reads the same
-        # file it needs the full content, not a "file unchanged" stub.
-        try:
-            from tools.file_tools import reset_file_dedup
-            reset_file_dedup(task_id)
-        except Exception:
-            pass
-
        return compressed, new_system_prompt

    def _execute_tool_calls(self, assistant_message, messages: list, effective_task_id: str, api_call_count: int = 0) -> None:
@@ -5561,7 +5397,7 @@ class AIAgent:
                    args_preview = args_str[:self.log_prefix_chars] + "..." if len(args_str) > self.log_prefix_chars else args_str
                    print(f"  📞 Tool {i}: {name}({list(args.keys())}) - {args_preview}")

-        for tc, name, args in parsed_calls:
+        for _, name, args in parsed_calls:
            if self.tool_progress_callback:
                try:
                    preview = _build_tool_preview(name, args)
@@ -5569,13 +5405,6 @@ class AIAgent:
                except Exception as cb_err:
                    logging.debug(f"Tool progress callback error: {cb_err}")

-        for tc, name, args in parsed_calls:
-            if self.tool_start_callback:
-                try:
-                    self.tool_start_callback(tc.id, name, args)
-                except Exception as cb_err:
-                    logging.debug(f"Tool start callback error: {cb_err}")
-
        # ── Concurrent execution ─────────────────────────────────────────
        # Each slot holds (function_name, function_args, function_result, duration, error_flag)
        results = [None] * num_tools
@@ -5646,12 +5475,6 @@ class AIAgent:
                    response_preview = function_result[:self.log_prefix_chars] + "..." if len(function_result) > self.log_prefix_chars else function_result
                    print(f"  ✅ Tool {i+1} completed in {tool_duration:.2f}s - {response_preview}")

-            if self.tool_complete_callback:
-                try:
-                    self.tool_complete_callback(tc.id, name, args, function_result)
-                except Exception as cb_err:
-                    logging.debug(f"Tool complete callback error: {cb_err}")
-
            # Truncate oversized results
            MAX_TOOL_RESULT_CHARS = 100_000
            if len(function_result) > MAX_TOOL_RESULT_CHARS:
@@ -5740,12 +5563,6 @@ class AIAgent:
                except Exception as cb_err:
                    logging.debug(f"Tool progress callback error: {cb_err}")

-            if self.tool_start_callback:
-                try:
-                    self.tool_start_callback(tool_call.id, function_name, function_args)
-                except Exception as cb_err:
-                    logging.debug(f"Tool start callback error: {cb_err}")
-
            # Checkpoint: snapshot working dir before file-mutating tools
            if function_name in ("write_file", "patch") and self._checkpoint_mgr.enabled:
                try:
@@ -5910,12 +5727,6 @@ class AIAgent:
                logging.debug(f"Tool {function_name} completed in {tool_duration:.2f}s")
                logging.debug(f"Tool result ({len(function_result)} chars): {function_result}")

-            if self.tool_complete_callback:
-                try:
-                    self.tool_complete_callback(tool_call.id, function_name, function_args, function_result)
-                except Exception as cb_err:
-                    logging.debug(f"Tool complete callback error: {cb_err}")
-
            # Guard against tools returning absurdly large content that would
            # blow up the context window. 100K chars ≈ 25K tokens — generous
            # enough for any reasonable tool output but prevents catastrophic
@@ -6432,12 +6243,6 @@ class AIAgent:
                    )
                    if len(messages) >= _orig_len:
                        break  # Cannot compress further
-                    # Compression created a new session — clear the history
-                    # reference so _flush_messages_to_session_db writes ALL
-                    # compressed messages to the new session's SQLite, not
-                    # skipping them because conversation_history is still the
-                    # pre-compression length.
-                    conversation_history = None
                    # Re-estimate after compression
                    _preflight_tokens = estimate_request_tokens_rough(
                        messages,
@@ -6637,7 +6442,6 @@ class AIAgent:
            codex_auth_retry_attempted = False
            anthropic_auth_retry_attempted = False
            nous_auth_retry_attempted = False
-            has_retried_429 = False
            restart_with_compressed_messages = False
            restart_with_length_continuation = False

@@ -7073,7 +6877,6 @@ class AIAgent:
                            if not self.quiet_mode:
                                self._vprint(f"{self.log_prefix}   💾 Cache: {cached:,}/{prompt:,} tokens ({hit_pct:.0f}% hit, {written:,} written)")
                    
-                    has_retried_429 = False  # Reset on success
                    break  # Success, exit retry loop

                except InterruptedError:
@@ -7116,12 +6919,6 @@ class AIAgent:
                        # prompt or prefill.  Fall through to normal error path.

                    status_code = getattr(api_error, "status_code", None)
-                    recovered_with_pool, has_retried_429 = self._recover_with_credential_pool(
-                        status_code=status_code,
-                        has_retried_429=has_retried_429,
-                    )
-                    if recovered_with_pool:
-                        continue
                    if (
                        self.api_mode == "codex_responses"
                        and self.provider == "openai-codex"
@@ -7230,17 +7027,10 @@ class AIAgent:
                        or "quota" in error_msg
                    )
                    if is_rate_limited and self._fallback_index < len(self._fallback_chain):
-                        # Don't eagerly fallback if credential pool rotation may
-                        # still recover.  The pool's retry-then-rotate cycle needs
-                        # at least one more attempt to fire — jumping to a fallback
-                        # provider here short-circuits it.
-                        pool = self._credential_pool
-                        pool_may_recover = pool is not None and pool.has_available()
-                        if not pool_may_recover:
-                            self._emit_status("⚠️ Rate limited — switching to fallback provider...")
-                            if self._try_activate_fallback():
-                                retry_count = 0
-                                continue
+                        self._emit_status("⚠️ Rate limited — switching to fallback provider...")
+                        if self._try_activate_fallback():
+                            retry_count = 0
+                            continue

                    is_payload_too_large = (
                        status_code == 413
@@ -7253,7 +7043,6 @@ class AIAgent:
                        compression_attempts += 1
                        if compression_attempts > max_compression_attempts:
                            self._vprint(f"{self.log_prefix}❌ Max compression attempts ({max_compression_attempts}) reached for payload-too-large error.", force=True)
-                            self._vprint(f"{self.log_prefix}   💡 Try /new to start a fresh conversation, or /compress to retry compression.", force=True)
                            logging.error(f"{self.log_prefix}413 compression failed after {max_compression_attempts} attempts.")
                            self._persist_session(messages, conversation_history)
                            return {
@@ -7278,7 +7067,6 @@ class AIAgent:
                            break
                        else:
                            self._vprint(f"{self.log_prefix}❌ Payload too large and cannot compress further.", force=True)
-                            self._vprint(f"{self.log_prefix}   💡 Try /new to start a fresh conversation, or /compress to retry compression.", force=True)
                            logging.error(f"{self.log_prefix}413 payload too large. Cannot compress further.")
                            self._persist_session(messages, conversation_history)
                            return {
@@ -7355,7 +7143,6 @@ class AIAgent:
                        compression_attempts += 1
                        if compression_attempts > max_compression_attempts:
                            self._vprint(f"{self.log_prefix}❌ Max compression attempts ({max_compression_attempts}) reached.", force=True)
-                            self._vprint(f"{self.log_prefix}   💡 Try /new to start a fresh conversation, or /compress to retry compression.", force=True)
                            logging.error(f"{self.log_prefix}Context compression failed after {max_compression_attempts} attempts.")
                            self._persist_session(messages, conversation_history)
                            return {
@@ -7382,7 +7169,7 @@ class AIAgent:
                        else:
                            # Can't compress further and already at minimum tier
                            self._vprint(f"{self.log_prefix}❌ Context length exceeded and cannot compress further.", force=True)
-                            self._vprint(f"{self.log_prefix}   💡 The conversation has accumulated too much content. Try /new to start fresh, or /compress to manually trigger compression.", force=True)
+                            self._vprint(f"{self.log_prefix}   💡 The conversation has accumulated too much content.", force=True)
                            logging.error(f"{self.log_prefix}Context length exceeded: {approx_tokens:,} tokens. Cannot compress further.")
                            self._persist_session(messages, conversation_history)
                            return {
@@ -7971,10 +7758,6 @@ class AIAgent:
                            approx_tokens=self.context_compressor.last_prompt_tokens,
                            task_id=effective_task_id,
                        )
-                        # Compression created a new session — clear history so
-                        # _flush_messages_to_session_db writes compressed messages
-                        # to the new session (see preflight compression comment).
-                        conversation_history = None
                    
                    # Save session log incrementally (so progress is visible even if interrupted)
                    self._session_messages = messages
@@ -94,7 +94,7 @@ print_banner() {
    echo ""
    echo -e "${MAGENTA}${BOLD}"
    echo "┌─────────────────────────────────────────────────────────┐"
-    echo "│             ⚕ Hermes Agent Installer                    │"
+    echo "│             ⚕ Hermes Agent Installer                   │"
    echo "├─────────────────────────────────────────────────────────┤"
    echo "│  An open source AI agent by Nous Research.              │"
    echo "└─────────────────────────────────────────────────────────┘"
@@ -699,19 +699,14 @@ install_deps() {

    # Install the main package in editable mode with all extras.
    # Try [all] first, fall back to base install if extras have issues.
-    ALL_INSTALL_LOG=$(mktemp)
-    if ! $UV_CMD pip install -e ".[all]" 2>"$ALL_INSTALL_LOG"; then
+    if ! $UV_CMD pip install -e ".[all]" 2>/dev/null; then
        log_warn "Full install (.[all]) failed, trying base install..."
-        log_info "Reason: $(tail -5 "$ALL_INSTALL_LOG" | head -3)"
-        rm -f "$ALL_INSTALL_LOG"
        if ! $UV_CMD pip install -e "."; then
            log_error "Package installation failed."
            log_info "Check that build tools are installed: sudo apt install build-essential python3-dev"
            log_info "Then re-run: cd $INSTALL_DIR && uv pip install -e '.[all]'"
            exit 1
        fi
-    else
-        rm -f "$ALL_INSTALL_LOG"
    fi

    log_success "Main package installed"
@@ -1075,14 +1070,7 @@ print_success() {
    echo ""
    echo -e "${YELLOW}⚡ Reload your shell to use 'hermes' command:${NC}"
    echo ""
-    LOGIN_SHELL="$(basename "${SHELL:-/bin/bash}")"
-    if [ "$LOGIN_SHELL" = "zsh" ]; then
-        echo "   source ~/.zshrc"
-    elif [ "$LOGIN_SHELL" = "bash" ]; then
-        echo "   source ~/.bashrc"
-    else
-        echo "   source ~/.bashrc   # or ~/.zshrc"
-    fi
+    echo "   source ~/.bashrc   # or ~/.zshrc"
    echo ""

    # Show Node.js warning if auto-install failed
@@ -24,7 +24,6 @@ import argparse
 import json
 import os
 import re
-import shutil
 import subprocess
 import sys
 from collections import defaultdict
@@ -129,16 +128,6 @@ def git(*args, cwd=None):
    return result.stdout.strip()


-def git_result(*args, cwd=None):
-    """Run a git command and return the full CompletedProcess."""
-    return subprocess.run(
-        ["git"] + list(args),
-        capture_output=True,
-        text=True,
-        cwd=cwd or str(REPO_ROOT),
-    )
-
-
 def get_last_tag():
    """Get the most recent CalVer tag."""
    tags = git("tag", "--list", "v20*", "--sort=-v:refname")
@@ -147,18 +136,6 @@ def get_last_tag():
    return None


-def next_available_tag(base_tag: str) -> tuple[str, str]:
-    """Return a tag/calver pair, suffixing same-day releases when needed."""
-    if not git("tag", "--list", base_tag):
-        return base_tag, base_tag.removeprefix("v")
-
-    suffix = 2
-    while git("tag", "--list", f"{base_tag}.{suffix}"):
-        suffix += 1
-    tag_name = f"{base_tag}.{suffix}"
-    return tag_name, tag_name.removeprefix("v")
-
-
 def get_current_version():
    """Read current semver from __init__.py."""
    content = VERSION_FILE.read_text()
@@ -215,41 +192,6 @@ def update_version_files(semver: str, calver_date: str):
    PYPROJECT_FILE.write_text(pyproject)


-def build_release_artifacts(semver: str) -> list[Path]:
-    """Build sdist/wheel artifacts for the current release.
-
-    Returns the artifact paths when the local environment has ``python -m build``
-    available. If build tooling is missing or the build fails, returns an empty
-    list and lets the release proceed without attached Python artifacts.
-    """
-    dist_dir = REPO_ROOT / "dist"
-    shutil.rmtree(dist_dir, ignore_errors=True)
-
-    result = subprocess.run(
-        [sys.executable, "-m", "build", "--sdist", "--wheel"],
-        cwd=str(REPO_ROOT),
-        capture_output=True,
-        text=True,
-    )
-    if result.returncode != 0:
-        print("  ⚠ Could not build Python release artifacts.")
-        stderr = result.stderr.strip()
-        stdout = result.stdout.strip()
-        if stderr:
-            print(f"    {stderr.splitlines()[-1]}")
-        elif stdout:
-            print(f"    {stdout.splitlines()[-1]}")
-        print("    Install the 'build' package to attach semver-named sdist/wheel assets.")
-        return []
-
-    artifacts = sorted(p for p in dist_dir.iterdir() if p.is_file())
-    matching = [p for p in artifacts if semver in p.name]
-    if not matching:
-        print("  ⚠ Built artifacts did not match the expected release version.")
-        return []
-    return matching
-
-
 def resolve_author(name: str, email: str) -> str:
    """Resolve a git author to a GitHub @mention."""
    # Try email lookup first
@@ -482,10 +424,18 @@ def main():
        now = datetime.now()
        calver_date = f"{now.year}.{now.month}.{now.day}"

-    base_tag = f"v{calver_date}"
-    tag_name, calver_date = next_available_tag(base_tag)
-    if tag_name != base_tag:
-        print(f"Note: Tag {base_tag} already exists, using {tag_name}")
+    tag_name = f"v{calver_date}"
+
+    # Check for existing tag with same date
+    existing = git("tag", "--list", tag_name)
+    if existing and not args.publish:
+        # Append a suffix for same-day releases
+        suffix = 2
+        while git("tag", "--list", f"{tag_name}.{suffix}"):
+            suffix += 1
+        tag_name = f"{tag_name}.{suffix}"
+        calver_date = f"{calver_date}.{suffix}"
+        print(f"Note: Tag {tag_name.rsplit('.', 1)[0]} already exists, using {tag_name}")

    # Determine semver
    current_version = get_current_version()
@@ -544,83 +494,41 @@ def main():
            print(f"  ✓ Updated version files to v{new_version} ({calver_date})")

            # Commit version bump
-            add_result = git_result("add", str(VERSION_FILE), str(PYPROJECT_FILE))
-            if add_result.returncode != 0:
-                print(f"  ✗ Failed to stage version files: {add_result.stderr.strip()}")
-                return
-
-            commit_result = git_result(
-                "commit", "-m", f"chore: bump version to v{new_version} ({calver_date})"
-            )
-            if commit_result.returncode != 0:
-                print(f"  ✗ Failed to commit version bump: {commit_result.stderr.strip()}")
-                return
+            git("add", str(VERSION_FILE), str(PYPROJECT_FILE))
+            git("commit", "-m", f"chore: bump version to v{new_version} ({calver_date})")
            print(f"  ✓ Committed version bump")

        # Create annotated tag
-        tag_result = git_result(
-            "tag", "-a", tag_name, "-m",
-            f"Hermes Agent v{new_version} ({calver_date})\n\nWeekly release"
-        )
-        if tag_result.returncode != 0:
-            print(f"  ✗ Failed to create tag {tag_name}: {tag_result.stderr.strip()}")
-            return
+        git("tag", "-a", tag_name, "-m",
+            f"Hermes Agent v{new_version} ({calver_date})\n\nWeekly release")
        print(f"  ✓ Created tag {tag_name}")

        # Push
-        push_result = git_result("push", "origin", "HEAD", "--tags")
-        if push_result.returncode == 0:
-            print(f"  ✓ Pushed to origin")
-        else:
-            print(f"  ✗ Failed to push to origin: {push_result.stderr.strip()}")
-            print("    Continue manually after fixing access:")
-            print("    git push origin HEAD --tags")
-
-        # Build semver-named Python artifacts so downstream packagers
-        # (e.g. Homebrew) can target them without relying on CalVer tag names.
-        artifacts = build_release_artifacts(new_version)
-        if artifacts:
-            print("  ✓ Built release artifacts:")
-            for artifact in artifacts:
-                print(f"    - {artifact.relative_to(REPO_ROOT)}")
+        git("push", "origin", "HEAD", "--tags")
+        print(f"  ✓ Pushed to origin")

        # Create GitHub release
        changelog_file = REPO_ROOT / ".release_notes.md"
        changelog_file.write_text(changelog)

-        gh_cmd = [
-            "gh", "release", "create", tag_name,
-            "--title", f"Hermes Agent v{new_version} ({calver_date})",
-            "--notes-file", str(changelog_file),
-        ]
-        gh_cmd.extend(str(path) for path in artifacts)
+        result = subprocess.run(
+            ["gh", "release", "create", tag_name,
+             "--title", f"Hermes Agent v{new_version} ({calver_date})",
+             "--notes-file", str(changelog_file)],
+            capture_output=True, text=True,
+            cwd=str(REPO_ROOT),
+        )

-        gh_bin = shutil.which("gh")
-        if gh_bin:
-            result = subprocess.run(
-                gh_cmd,
-                capture_output=True, text=True,
-                cwd=str(REPO_ROOT),
-            )
-        else:
-            result = None
+        changelog_file.unlink(missing_ok=True)

-        if result and result.returncode == 0:
-            changelog_file.unlink(missing_ok=True)
+        if result.returncode == 0:
            print(f"  ✓ GitHub release created: {result.stdout.strip()}")
-            print(f"\n  🎉 Release v{new_version} ({tag_name}) published!")
        else:
-            if result is None:
-                print("  ✗ GitHub release skipped: `gh` CLI not found.")
-            else:
-                print(f"  ✗ GitHub release failed: {result.stderr.strip()}")
-            print(f"    Release notes kept at: {changelog_file}")
-            print(f"    Tag was created locally. Create the release manually:")
-            print(
-                f"    gh release create {tag_name} --title 'Hermes Agent v{new_version} ({calver_date})' "
-                f"--notes-file .release_notes.md {' '.join(str(path) for path in artifacts)}"
-            )
-            print(f"\n  ✓ Release artifacts prepared for manual publish: v{new_version} ({tag_name})")
+            print(f"  ✗ GitHub release failed: {result.stderr}")
+            print(f"    Tag was created. Create the release manually:")
+            print(f"    gh release create {tag_name} --title 'Hermes Agent v{new_version} ({calver_date})'")
+
+        print(f"\n  🎉 Release v{new_version} ({tag_name}) published!")
    else:
        print(f"\n{'='*60}")
        print(f"  Dry run complete. To publish, add --publish")
@@ -68,11 +68,6 @@ export function matchesAllowedUser(senderId, allowedUsers, sessionDir) {
    return true;
  }

-  // "*" means allow everyone (consistent with SIGNAL_GROUP_ALLOWED_USERS)
-  if (allowedUsers.has('*')) {
-    return true;
-  }
-
  const aliases = expandWhatsAppIdentifiers(senderId, sessionDir);
  for (const alias of aliases) {
    if (allowedUsers.has(alias)) {
@@ -45,15 +45,3 @@ test('matchesAllowedUser accepts mapped lid sender when allowlist only contains
    rmSync(sessionDir, { recursive: true, force: true });
  }
 });
-
-test('matchesAllowedUser treats * as allow-all wildcard', () => {
-  const sessionDir = mkdtempSync(path.join(os.tmpdir(), 'hermes-wa-allowlist-'));
-
-  try {
-    const allowedUsers = parseAllowedUsers('*');
-    assert.equal(matchesAllowedUser('19175395595@s.whatsapp.net', allowedUsers, sessionDir), true);
-    assert.equal(matchesAllowedUser('267383306489914@lid', allowedUsers, sessionDir), true);
-  } finally {
-    rmSync(sessionDir, { recursive: true, force: true });
-  }
-});
@@ -1,655 +1,203 @@
 ---
-name: hermes-agent
-description: Complete guide to using and extending Hermes Agent — CLI usage, setup, configuration, spawning additional agents, gateway platforms, skills, voice, tools, profiles, and a concise contributor reference. Load this skill when helping users configure Hermes, troubleshoot issues, spawn agent instances, or make code contributions.
-version: 2.0.0
-author: Hermes Agent + Teknium
+name: hermes-agent-spawning
+description: Spawn additional Hermes Agent instances as autonomous subprocesses for independent long-running tasks. Supports non-interactive one-shot mode (-q) and interactive PTY mode for multi-turn collaboration. Different from delegate_task — this runs a full separate hermes process.
+version: 1.1.0
+author: Hermes Agent
 license: MIT
 metadata:
  hermes:
-    tags: [hermes, setup, configuration, multi-agent, spawning, cli, gateway, development]
+    tags: [Agent, Hermes, Multi-Agent, Orchestration, Subprocess, Interactive]
    homepage: https://github.com/NousResearch/hermes-agent
-    related_skills: [claude-code, codex, opencode]
+    related_skills: [claude-code, codex]
 ---

-# Hermes Agent
+# Spawning Hermes Agent Instances

-Hermes Agent is an open-source AI agent framework by Nous Research that runs in your terminal, messaging platforms, and IDEs. It belongs to the same category as Claude Code (Anthropic), Codex (OpenAI), and OpenClaw — autonomous coding and task-execution agents that use tool calling to interact with your system. Hermes works with any LLM provider (OpenRouter, Anthropic, OpenAI, DeepSeek, local models, and 15+ others) and runs on Linux, macOS, and WSL.
+Run additional Hermes Agent processes as autonomous subprocesses. Unlike `delegate_task` (which spawns lightweight subagents sharing the same process), this launches fully independent `hermes` CLI processes with their own sessions, tools, and terminal environments.

-What makes Hermes different:
+## When to Use This vs delegate_task

- **Self-improving through skills** — Hermes learns from experience by saving reusable procedures as skills. When it solves a complex problem, discovers a workflow, or gets corrected, it can persist that knowledge as a skill document that loads into future sessions. Skills accumulate over time, making the agent better at your specific tasks and environment.
- **Persistent memory across sessions** — remembers who you are, your preferences, environment details, and lessons learned. Pluggable memory backends (built-in, Honcho, Mem0, and more) let you choose how memory works.
- **Multi-platform gateway** — the same agent runs on Telegram, Discord, Slack, WhatsApp, Signal, Matrix, Email, and 8+ other platforms with full tool access, not just chat.
- **Provider-agnostic** — swap models and providers mid-workflow without changing anything else. Credential pools rotate across multiple API keys automatically.
- **Profiles** — run multiple independent Hermes instances with isolated configs, sessions, skills, and memory.
- **Extensible** — plugins, MCP servers, custom tools, webhook triggers, cron scheduling, and the full Python ecosystem.
+| Feature | `delegate_task` | Spawning `hermes` process |
+|---------|-----------------|--------------------------|
+| Context isolation | Separate conversation, shared process | Fully independent process |
+| Tool access | Subset of parent's tools | Full tool access (all toolsets) |
+| Session persistence | Ephemeral (no DB entry) | Full session logging + DB |
+| Duration | Minutes (bounded by parent's loop) | Hours/days (runs independently) |
+| Monitoring | Parent waits for result | Background process, monitor via `process` tool |
+| Interactive | No | Yes (PTY mode supports back-and-forth) |
+| Use case | Quick parallel subtasks | Long autonomous missions, interactive collaboration |

-People use Hermes for software development, research, system administration, data analysis, content creation, home automation, and anything else that benefits from an AI agent with persistent context and full system access.
+## Prerequisites

-**This skill helps you work with Hermes Agent effectively** — setting it up, configuring features, spawning additional agent instances, troubleshooting issues, finding the right commands and settings, and understanding how the system works when you need to extend or contribute to it.
+- `hermes` CLI installed and on PATH
+- API key configured in `~/.hermes/.env`

-**Docs:** https://hermes-agent.nousresearch.com/docs/
+### Installation

-## Quick Start
+Requires an interactive shell (the installer runs a setup wizard):

-```bash
-# Install
+```
 curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
-
-# Interactive chat (default)
-hermes
-
-# Single query
-hermes chat -q "What is the capital of France?"
-
-# Setup wizard
-hermes setup
-
-# Change model/provider
-hermes model
-
-# Check health
-hermes doctor
 ```

---
+This installs uv and Python 3.11, clones the repo, sets up the venv, and launches an interactive setup wizard to configure your API provider and model. See the [GitHub repo](https://github.com/NousResearch/hermes-agent) for details.

-## CLI Reference
+## Resuming Previous Sessions

-### Global Flags
+Resume a prior CLI session instead of starting fresh. Useful for continuing long tasks across process restarts:

 ```
-hermes [flags] [command]
+# Resume the most recent CLI session
+terminal(command="hermes --continue", background=true, pty=true)

-  --version, -V             Show version
-  --resume, -r SESSION      Resume session by ID or title
-  --continue, -c [NAME]     Resume by name, or most recent session
-  --worktree, -w            Isolated git worktree mode (parallel agents)
-  --skills, -s SKILL        Preload skills (comma-separate or repeat)
-  --profile, -p NAME        Use a named profile
-  --yolo                    Skip dangerous command approval
-  --pass-session-id         Include session ID in system prompt
+# Resume a specific session by ID (shown on exit)
+terminal(command="hermes --resume 20260225_143052_a1b2c3", background=true, pty=true)
 ```

-No subcommand defaults to `chat`.
+The full conversation history (messages, tool calls, responses) is restored from SQLite. The agent sees everything from the previous session.

-### Chat
+## Mode 1: One-Shot Query (-q flag)
+
+Run a single query non-interactively. The agent completes the task and exits:

 ```
-hermes chat [flags]
-  -q, --query TEXT          Single query, non-interactive
-  -m, --model MODEL         Model (e.g. anthropic/claude-sonnet-4)
-  -t, --toolsets LIST       Comma-separated toolsets
-  --provider PROVIDER       Force provider (openrouter, anthropic, nous, etc.)
-  -v, --verbose             Verbose output
-  -Q, --quiet               Suppress banner, spinner, tool previews
-  --checkpoints             Enable filesystem checkpoints (/rollback)
-  --source TAG              Session source tag (default: cli)
+terminal(command="hermes chat -q 'Research the latest GRPO training papers and write a summary to ~/research/grpo.md'", timeout=300)
 ```

-### Configuration
-
+Run in the background for long tasks:
 ```
-hermes setup [section]      Interactive wizard (model|terminal|gateway|tools|agent)
-hermes model                Interactive model/provider picker
-hermes config               View current config
-hermes config edit          Open config.yaml in $EDITOR
-hermes config set KEY VAL   Set a config value
-hermes config path          Print config.yaml path
-hermes config env-path      Print .env path
-hermes config check         Check for missing/outdated config
-hermes config migrate       Update config with new options
-hermes login [--provider P] OAuth login (nous, openai-codex)
-hermes logout               Clear stored auth
-hermes doctor [--fix]       Check dependencies and config
-hermes status [--all]       Show component status
-```
-
-### Tools & Skills
-
-```
-hermes tools                Interactive tool enable/disable (curses UI)
-hermes tools list           Show all tools and status
-hermes tools enable NAME    Enable a toolset
-hermes tools disable NAME   Disable a toolset
-
-hermes skills list          List installed skills
-hermes skills search QUERY  Search the skills hub
-hermes skills install ID    Install a skill
-hermes skills inspect ID    Preview without installing
-hermes skills config        Enable/disable skills per platform
-hermes skills check         Check for updates
-hermes skills update        Update outdated skills
-hermes skills uninstall N   Remove a hub skill
-hermes skills publish PATH  Publish to registry
-hermes skills browse        Browse all available skills
-hermes skills tap add REPO  Add a GitHub repo as skill source
-```
-
-### MCP Servers
-
-```
-hermes mcp serve            Run Hermes as an MCP server
-hermes mcp add NAME         Add an MCP server (--url or --command)
-hermes mcp remove NAME      Remove an MCP server
-hermes mcp list             List configured servers
-hermes mcp test NAME        Test connection
-hermes mcp configure NAME   Toggle tool selection
-```
-
-### Gateway (Messaging Platforms)
-
-```
-hermes gateway run          Start gateway foreground
-hermes gateway install      Install as background service
-hermes gateway start/stop   Control the service
-hermes gateway restart      Restart the service
-hermes gateway status       Check status
-hermes gateway setup        Configure platforms
-```
-
-Supported platforms: Telegram, Discord, Slack, WhatsApp, Signal, Email, SMS, Matrix, Mattermost, Home Assistant, DingTalk, Feishu, WeCom, API Server, Webhooks, Open WebUI.
-
-Platform docs: https://hermes-agent.nousresearch.com/docs/user-guide/messaging/
-
-### Sessions
-
-```
-hermes sessions list        List recent sessions
-hermes sessions browse      Interactive picker
-hermes sessions export OUT  Export to JSONL
-hermes sessions rename ID T Rename a session
-hermes sessions delete ID   Delete a session
-hermes sessions prune       Clean up old sessions (--older-than N days)
-hermes sessions stats       Session store statistics
-```
-
-### Cron Jobs
-
-```
-hermes cron list            List jobs (--all for disabled)
-hermes cron create SCHED    Create: '30m', 'every 2h', '0 9 * * *'
-hermes cron edit ID         Edit schedule, prompt, delivery
-hermes cron pause/resume ID Control job state
-hermes cron run ID          Trigger on next tick
-hermes cron remove ID       Delete a job
-hermes cron status          Scheduler status
-```
-
-### Webhooks
-
-```
-hermes webhook subscribe N  Create route at /webhooks/<name>
-hermes webhook list         List subscriptions
-hermes webhook remove NAME  Remove a subscription
-hermes webhook test NAME    Send a test POST
-```
-
-### Profiles
-
-```
-hermes profile list         List all profiles
-hermes profile create NAME  Create (--clone, --clone-all, --clone-from)
-hermes profile use NAME     Set sticky default
-hermes profile delete NAME  Delete a profile
-hermes profile show NAME    Show details
-hermes profile alias NAME   Manage wrapper scripts
-hermes profile rename A B   Rename a profile
-hermes profile export NAME  Export to tar.gz
-hermes profile import FILE  Import from archive
-```
-
-### Credential Pools
-
-```
-hermes auth add             Interactive credential wizard
-hermes auth list [PROVIDER] List pooled credentials
-hermes auth remove P INDEX  Remove by provider + index
-hermes auth reset PROVIDER  Clear exhaustion status
-```
-
-### Other
-
-```
-hermes insights [--days N]  Usage analytics
-hermes update               Update to latest version
-hermes pairing list/approve/revoke  DM authorization
-hermes plugins list/install/remove  Plugin management
-hermes honcho setup/status  Honcho memory integration
-hermes memory setup/status/off  Memory provider config
-hermes completion bash|zsh  Shell completions
-hermes acp                  ACP server (IDE integration)
-hermes claw migrate         Migrate from OpenClaw
-hermes uninstall            Uninstall Hermes
-```
-
---
-
-## Slash Commands (In-Session)
-
-Type these during an interactive chat session.
-
-### Session Control
-```
-/new (/reset)        Fresh session
-/clear               Clear screen + new session (CLI)
-/retry               Resend last message
-/undo                Remove last exchange
-/title [name]        Name the session
-/compress            Manually compress context
-/stop                Kill background processes
-/rollback [N]        Restore filesystem checkpoint
-/background <prompt> Run prompt in background
-/queue <prompt>      Queue for next turn
-/resume [name]       Resume a named session
-```
-
-### Configuration
-```
-/config              Show config (CLI)
-/model [name]        Show or change model
-/provider            Show provider info
-/prompt [text]       View/set system prompt (CLI)
-/personality [name]  Set personality
-/reasoning [level]   Set reasoning (none|low|medium|high|xhigh|show|hide)
-/verbose             Cycle: off → new → all → verbose
-/voice [on|off|tts]  Voice mode
-/yolo                Toggle approval bypass
-/skin [name]         Change theme (CLI)
-/statusbar           Toggle status bar (CLI)
-```
-
-### Tools & Skills
-```
-/tools               Manage tools (CLI)
-/toolsets            List toolsets (CLI)
-/skills              Search/install skills (CLI)
-/skill <name>        Load a skill into session
-/cron                Manage cron jobs (CLI)
-/reload-mcp          Reload MCP servers
-/plugins             List plugins (CLI)
-```
-
-### Info
-```
-/help                Show commands
-/commands [page]     Browse all commands (gateway)
-/usage               Token usage
-/insights [days]     Usage analytics
-/status              Session info (gateway)
-/profile             Active profile info
-```
-
-### Exit
-```
-/quit (/exit, /q)    Exit CLI
-```
-
---
-
-## Key Paths & Config
-
-```
-~/.hermes/config.yaml       Main configuration
-~/.hermes/.env              API keys and secrets
-~/.hermes/skills/           Installed skills
-~/.hermes/sessions/         Session transcripts
-~/.hermes/logs/             Gateway and error logs
-~/.hermes/auth.json         OAuth tokens and credential pools
-~/.hermes/hermes-agent/     Source code (if git-installed)
-```
-
-Profiles use `~/.hermes/profiles/<name>/` with the same layout.
-
-### Config Sections
-
-Edit with `hermes config edit` or `hermes config set section.key value`.
-
-| Section | Key options |
-|---------|-------------|
-| `model` | `default`, `provider`, `base_url`, `api_key`, `context_length` |
-| `agent` | `max_turns` (90), `tool_use_enforcement` |
-| `terminal` | `backend` (local/docker/ssh/modal), `cwd`, `timeout` (180) |
-| `compression` | `enabled`, `threshold` (0.50), `target_ratio` (0.20) |
-| `display` | `skin`, `tool_progress`, `show_reasoning`, `show_cost` |
-| `stt` | `enabled`, `provider` (local/groq/openai) |
-| `tts` | `provider` (edge/elevenlabs/openai/kokoro/fish) |
-| `memory` | `memory_enabled`, `user_profile_enabled`, `provider` |
-| `security` | `tirith_enabled`, `website_blocklist` |
-| `delegation` | `model`, `provider`, `max_iterations` (50) |
-| `smart_model_routing` | `enabled`, `cheap_model` |
-| `checkpoints` | `enabled`, `max_snapshots` (50) |
-
-Full config reference: https://hermes-agent.nousresearch.com/docs/user-guide/configuration
-
-### Providers
-
-18 providers supported. Set via `hermes model` or `hermes setup`.
-
-| Provider | Auth | Key env var |
-|----------|------|-------------|
-| OpenRouter | API key | `OPENROUTER_API_KEY` |
-| Anthropic | API key | `ANTHROPIC_API_KEY` |
-| Nous Portal | OAuth | `hermes login --provider nous` |
-| OpenAI Codex | OAuth | `hermes login --provider openai-codex` |
-| GitHub Copilot | Token | `COPILOT_GITHUB_TOKEN` |
-| DeepSeek | API key | `DEEPSEEK_API_KEY` |
-| Hugging Face | Token | `HF_TOKEN` |
-| Z.AI / GLM | API key | `GLM_API_KEY` |
-| MiniMax | API key | `MINIMAX_API_KEY` |
-| Kimi / Moonshot | API key | `KIMI_API_KEY` |
-| Alibaba / DashScope | API key | `DASHSCOPE_API_KEY` |
-| Kilo Code | API key | `KILOCODE_API_KEY` |
-| Custom endpoint | Config | `model.base_url` + `model.api_key` in config.yaml |
-
-Plus: AI Gateway, OpenCode Zen, OpenCode Go, MiniMax CN, GitHub Copilot ACP.
-
-Full provider docs: https://hermes-agent.nousresearch.com/docs/integrations/providers
-
-### Toolsets
-
-Enable/disable via `hermes tools` (interactive) or `hermes tools enable/disable NAME`.
-
-| Toolset | What it provides |
-|---------|-----------------|
-| `web` | Web search and content extraction |
-| `browser` | Browser automation (Browserbase, Camofox, or local Chromium) |
-| `terminal` | Shell commands and process management |
-| `file` | File read/write/search/patch |
-| `code_execution` | Sandboxed Python execution |
-| `vision` | Image analysis |
-| `image_gen` | AI image generation |
-| `tts` | Text-to-speech |
-| `skills` | Skill browsing and management |
-| `memory` | Persistent cross-session memory |
-| `session_search` | Search past conversations |
-| `delegation` | Subagent task delegation |
-| `cronjob` | Scheduled task management |
-| `clarify` | Ask user clarifying questions |
-| `moa` | Mixture of Agents (off by default) |
-| `homeassistant` | Smart home control (off by default) |
-
-Tool changes take effect on `/reset` (new session). They do NOT apply mid-conversation to preserve prompt caching.
-
---
-
-## Voice & Transcription
-
-### STT (Voice → Text)
-
-Voice messages from messaging platforms are auto-transcribed.
-
-Provider priority (auto-detected):
-1. **Local faster-whisper** — free, no API key: `pip install faster-whisper`
-2. **Groq Whisper** — free tier: set `GROQ_API_KEY`
-3. **OpenAI Whisper** — paid: set `VOICE_TOOLS_OPENAI_KEY`
-
-Config:
-```yaml
-stt:
-  enabled: true
-  provider: local        # local, groq, openai
-  local:
-    model: base          # tiny, base, small, medium, large-v3
-```
-
-### TTS (Text → Voice)
-
-| Provider | Env var | Free? |
-|----------|---------|-------|
-| Edge TTS | None | Yes (default) |
-| ElevenLabs | `ELEVENLABS_API_KEY` | Free tier |
-| OpenAI | `VOICE_TOOLS_OPENAI_KEY` | Paid |
-| Kokoro (local) | None | Free |
-| Fish Audio | `FISH_AUDIO_API_KEY` | Free tier |
-
-Voice commands: `/voice on` (voice-to-voice), `/voice tts` (always voice), `/voice off`.
-
---
-
-## Spawning Additional Hermes Instances
-
-Run additional Hermes processes as fully independent subprocesses — separate sessions, tools, and environments.
-
-### When to Use This vs delegate_task
-
-| | `delegate_task` | Spawning `hermes` process |
-|-|-----------------|--------------------------|
-| Isolation | Separate conversation, shared process | Fully independent process |
-| Duration | Minutes (bounded by parent loop) | Hours/days |
-| Tool access | Subset of parent's tools | Full tool access |
-| Interactive | No | Yes (PTY mode) |
-| Use case | Quick parallel subtasks | Long autonomous missions |
-
-### One-Shot Mode
-
-```
-terminal(command="hermes chat -q 'Research GRPO papers and write summary to ~/research/grpo.md'", timeout=300)
-
-# Background for long tasks:
 terminal(command="hermes chat -q 'Set up CI/CD for ~/myapp'", background=true)
+# Returns a session_id; monitor it with the process tool
 ```

-### Interactive PTY Mode (via tmux)
+## Mode 2: Interactive PTY Session

-Hermes uses prompt_toolkit, which requires a real terminal. Use tmux for interactive spawning:
+Launch a full interactive Hermes session with PTY for back-and-forth collaboration. You can send messages, review its work, give feedback, and steer it.
+
+Note: Hermes uses prompt_toolkit for its CLI UI, which needs a real terminal. Through a PTY the UI renders because ptyprocess provides one, and input sent via `submit` arrives as keystrokes. If submitted text shows up in the prompt but is never sent, see Known Issues below for the tmux workaround. The output log will contain ANSI escape sequences from the UI rendering, so focus on the text content, not the formatting.

 ```
-# Start
-terminal(command="tmux new-session -d -s agent1 -x 120 -y 40 'hermes'", timeout=10)
+# Start interactive hermes in background with PTY
+terminal(command="hermes", workdir="~/project", background=true, pty=true)
+# Returns session_id

-# Wait for startup, then send a message
-terminal(command="sleep 8 && tmux send-keys -t agent1 'Build a FastAPI auth service' Enter", timeout=15)
+# Send it a task
+process(action="submit", session_id="<id>", data="Set up a Python project with FastAPI, add auth endpoints, and write tests")

-# Read output
-terminal(command="sleep 20 && tmux capture-pane -t agent1 -p", timeout=5)
+# Wait for it to work, then check progress
+process(action="log", session_id="<id>")

-# Send follow-up
-terminal(command="tmux send-keys -t agent1 'Add rate limiting middleware' Enter", timeout=5)
+# Give feedback on what it produced
+process(action="submit", session_id="<id>", data="The tests look good but add edge cases for invalid tokens")

-# Exit
-terminal(command="tmux send-keys -t agent1 '/exit' Enter && sleep 2 && tmux kill-session -t agent1", timeout=10)
+# Check its response
+process(action="log", session_id="<id>")
+
+# Ask it to iterate
+process(action="submit", session_id="<id>", data="Now add rate limiting middleware")
+
+# When done, exit the session
+process(action="submit", session_id="<id>", data="/exit")
 ```

-### Multi-Agent Coordination
+### Interactive Collaboration Patterns

+**Code review loop** — spawn hermes, send code for review, iterate on feedback:
+```
+terminal(command="hermes", workdir="~/project", background=true, pty=true)
+process(action="submit", session_id="<id>", data="Review the changes in src/auth.py and suggest improvements")
+# ... read its review ...
+process(action="submit", session_id="<id>", data="Good points. Go ahead and implement suggestions 1 and 3")
+# ... it makes changes ...
+process(action="submit", session_id="<id>", data="Run the tests to make sure nothing broke")
+```
+
+**Research with steering** — start broad, narrow down based on findings:
+```
+terminal(command="hermes", background=true, pty=true)
+process(action="submit", session_id="<id>", data="Search for the latest papers on KV cache compression techniques")
+# ... read its findings ...
+process(action="submit", session_id="<id>", data="The MQA approach looks promising. Dig deeper into that one and compare with GQA")
+# ... more detailed research ...
+process(action="submit", session_id="<id>", data="Write up everything you found to ~/research/kv-cache-compression.md")
+```
+
+**Multi-agent coordination** — spawn two agents working on related tasks, pass context between them:
 ```
 # Agent A: backend
-terminal(command="tmux new-session -d -s backend -x 120 -y 40 'hermes -w'", timeout=10)
-terminal(command="sleep 8 && tmux send-keys -t backend 'Build REST API for user management' Enter", timeout=15)
+terminal(command="hermes", workdir="~/project/backend", background=true, pty=true)
+process(action="submit", session_id="<agent-a>", data="Build a REST API for user management with CRUD endpoints")

 # Agent B: frontend
-terminal(command="tmux new-session -d -s frontend -x 120 -y 40 'hermes -w'", timeout=10)
-terminal(command="sleep 8 && tmux send-keys -t frontend 'Build React dashboard for user management' Enter", timeout=15)
+terminal(command="hermes", workdir="~/project/frontend", background=true, pty=true)
+process(action="submit", session_id="<agent-b>", data="Build a React dashboard that will connect to a REST API at localhost:8000/api/users")

-# Check progress, relay context between them
-terminal(command="tmux capture-pane -t backend -p | tail -30", timeout=5)
-terminal(command="tmux send-keys -t frontend 'Here is the API schema from the backend agent: ...' Enter", timeout=5)
+# Check Agent A's progress, relay API schema to Agent B
+process(action="log", session_id="<agent-a>")
+process(action="submit", session_id="<agent-b>", data="Here's the API schema Agent A built: GET /api/users, POST /api/users, etc. Update your fetch calls to match.")
 ```

-### Session Resume
+## Parallel Non-Interactive Instances
+
+Spawn multiple independent agents for unrelated tasks:

 ```
-# Resume most recent session
-terminal(command="tmux new-session -d -s resumed 'hermes --continue'", timeout=10)
-
-# Resume specific session
-terminal(command="tmux new-session -d -s resumed 'hermes --resume 20260225_143052_a1b2c3'", timeout=10)
+terminal(command="hermes chat -q 'Research competitor landing pages and write a report to ~/research/competitors.md'", background=true)
+terminal(command="hermes chat -q 'Audit security of ~/myapp and write findings to ~/myapp/SECURITY_AUDIT.md'", background=true)
+process(action="list")
 ```

-### Tips
-
- **Prefer `delegate_task` for quick subtasks** — less overhead than spawning a full process
- **Use `-w` (worktree mode)** when spawning agents that edit code — prevents git conflicts
- **Set timeouts** for one-shot mode — complex tasks can take 5-10 minutes
- **Use `hermes chat -q` for fire-and-forget** — no PTY needed
- **Use tmux for interactive sessions** — raw PTY mode has `\r` vs `\n` issues with prompt_toolkit
- **For scheduled tasks**, use the `cronjob` tool instead of spawning — handles delivery and retry
-
---
-
-## Troubleshooting
-
-### Voice not working
-1. Check `stt.enabled: true` in config.yaml
-2. Verify provider: `pip install faster-whisper` or set API key
-3. Restart gateway: `/restart`
-
-### Tool not available
-1. `hermes tools` — check if toolset is enabled for your platform
-2. Some tools need env vars (check `.env`)
-3. `/reset` after enabling tools
-
-### Model/provider issues
-1. `hermes doctor` — check config and dependencies
-2. `hermes login` — re-authenticate OAuth providers
-3. Check `.env` has the right API key
-
-### Changes not taking effect
- **Tools/skills:** `/reset` starts a new session with updated toolset
- **Config changes:** `/restart` reloads gateway config
- **Code changes:** Restart the CLI or gateway process
-
-### Skills not showing
-1. `hermes skills list` — verify installed
-2. `hermes skills config` — check platform enablement
-3. Load explicitly: `/skill name` or `hermes -s name`
-
-### Gateway issues
-Check logs first:
-```bash
-grep -i "failed to send\|error" ~/.hermes/logs/gateway.log | tail -20
-```
-
---
-
-## Where to Find Things
-
-| Looking for... | Location |
-|----------------|----------|
-| Config options | `hermes config edit` or [Configuration docs](https://hermes-agent.nousresearch.com/docs/user-guide/configuration) |
-| Available tools | `hermes tools list` or [Tools reference](https://hermes-agent.nousresearch.com/docs/reference/tools-reference) |
-| Slash commands | `/help` in session or [Slash commands reference](https://hermes-agent.nousresearch.com/docs/reference/slash-commands) |
-| Skills catalog | `hermes skills browse` or [Skills catalog](https://hermes-agent.nousresearch.com/docs/reference/skills-catalog) |
-| Provider setup | `hermes model` or [Providers guide](https://hermes-agent.nousresearch.com/docs/integrations/providers) |
-| Platform setup | `hermes gateway setup` or [Messaging docs](https://hermes-agent.nousresearch.com/docs/user-guide/messaging/) |
-| MCP servers | `hermes mcp list` or [MCP guide](https://hermes-agent.nousresearch.com/docs/user-guide/features/mcp) |
-| Profiles | `hermes profile list` or [Profiles docs](https://hermes-agent.nousresearch.com/docs/user-guide/profiles) |
-| Cron jobs | `hermes cron list` or [Cron docs](https://hermes-agent.nousresearch.com/docs/user-guide/features/cron) |
-| Memory | `hermes memory status` or [Memory docs](https://hermes-agent.nousresearch.com/docs/user-guide/features/memory) |
-| Env variables | `hermes config env-path` or [Env vars reference](https://hermes-agent.nousresearch.com/docs/reference/environment-variables) |
-| CLI commands | `hermes --help` or [CLI reference](https://hermes-agent.nousresearch.com/docs/reference/cli-commands) |
-| Gateway logs | `~/.hermes/logs/gateway.log` |
-| Session files | `~/.hermes/sessions/` or `hermes sessions browse` |
-| Source code | `~/.hermes/hermes-agent/` |
-
---
-
-## Contributor Quick Reference
-
-For occasional contributors and PR authors. Full developer docs: https://hermes-agent.nousresearch.com/docs/developer-guide/
-
-### Project Layout
+## With Custom Model

 ```
-hermes-agent/
-├── run_agent.py          # AIAgent — core conversation loop
-├── model_tools.py        # Tool discovery and dispatch
-├── toolsets.py           # Toolset definitions
-├── cli.py                # Interactive CLI (HermesCLI)
-├── hermes_state.py       # SQLite session store
-├── agent/                # Prompt builder, compression, display, adapters
-├── hermes_cli/           # CLI subcommands, config, setup, commands
-│   ├── commands.py       # Slash command registry (CommandDef)
-│   ├── config.py         # DEFAULT_CONFIG, env var definitions
-│   └── main.py           # CLI entry point and argparse
-├── tools/                # One file per tool
-│   └── registry.py       # Central tool registry
-├── gateway/              # Messaging gateway
-│   └── platforms/        # Platform adapters (telegram, discord, etc.)
-├── cron/                 # Job scheduler
-├── tests/                # ~3000 pytest tests
-└── website/              # Docusaurus docs site
+terminal(command="hermes chat -q 'Summarize this codebase' --model google/gemini-2.5-pro", workdir="~/project", background=true)
 ```

-Config: `~/.hermes/config.yaml` (settings), `~/.hermes/.env` (API keys).
+## Gateway Cron Integration

-### Adding a Tool (3 files)
+For scheduled autonomous tasks, use the unified `cronjob` tool instead of spawning processes — cron jobs handle delivery, retry, and persistence automatically.

-**1. Create `tools/your_tool.py`:**
-```python
-import json, os
-from tools.registry import registry
+## Key Differences Between Modes

-def check_requirements() -> bool:
-    return bool(os.getenv("EXAMPLE_API_KEY"))
+| | `-q` (one-shot) | Interactive (PTY) | `--continue` / `--resume` |
+|---|---|---|---|
+| User interaction | None | Full back-and-forth | Full back-and-forth |
+| PTY required | No | Yes (`pty=true`) | Yes (`pty=true`) |
+| Multi-turn | Single query | Unlimited turns | Continues previous turns |
+| Best for | Fire-and-forget tasks | Iterative work, steering | Picking up where you left off |
+| Exit | Automatic after completion | Send `/exit` or kill | Send `/exit` or kill |

-def example_tool(param: str, task_id: str = None) -> str:
-    return json.dumps({"success": True, "data": "..."})
+## Known Issues

-registry.register(
-    name="example_tool",
-    toolset="example",
-    schema={"name": "example_tool", "description": "...", "parameters": {...}},
-    handler=lambda args, **kw: example_tool(
-        param=args.get("param", ""), task_id=kw.get("task_id")),
-    check_fn=check_requirements,
-    requires_env=["EXAMPLE_API_KEY"],
-)
-```
-
-**2. Add import** in `model_tools.py` → `_discover_tools()` list.
-
-**3. Add to `toolsets.py`** → `_HERMES_CORE_TOOLS` list.
-
-All handlers must return JSON strings. Use `get_hermes_home()` for paths, never hardcode `~/.hermes`.
-
-### Adding a Slash Command
-
-1. Add `CommandDef` to `COMMAND_REGISTRY` in `hermes_cli/commands.py`
-2. Add handler in `cli.py` → `process_command()`
-3. (Optional) Add gateway handler in `gateway/run.py`
-
-All consumers (help text, autocomplete, Telegram menu, Slack mapping) derive from the central registry automatically.
-
-### Agent Loop (High Level)
+- **Interactive PTY + prompt_toolkit**: The `submit` action sends `\n` (line feed) but prompt_toolkit in raw mode expects `\r` (carriage return) for Enter. Text appears in the prompt but never submits. **Workaround**: Use **tmux** instead of raw PTY mode. tmux's `send-keys Enter` sends the correct `\r`:

 ```
-run_conversation():
-  1. Build system prompt
-  2. Loop while iterations < max:
-     a. Call LLM (OpenAI-format messages + tool schemas)
-     b. If tool_calls → dispatch each via handle_function_call() → append results → continue
-     c. If text response → return
-  3. Context compression triggers automatically near token limit
+# Start hermes inside tmux
+tmux new-session -d -s hermes-session -x 120 -y 40 "hermes"
+sleep 10  # Wait for banner/startup
+
+# Send messages
+tmux send-keys -t hermes-session "your message here" Enter
+
+# Read output
+sleep 15  # Wait for LLM response
+tmux capture-pane -t hermes-session -p
+
+# Multi-turn: just send more messages and capture again
+tmux send-keys -t hermes-session "follow-up message" Enter
+
+# Exit when done
+tmux send-keys -t hermes-session "/exit" Enter
+tmux kill-session -t hermes-session
 ```

-### Testing
+## Rules

-```bash
-source venv/bin/activate  # or .venv/bin/activate
-python -m pytest tests/ -o 'addopts=' -q   # Full suite
-python -m pytest tests/tools/ -q            # Specific area
-```
-
- Tests auto-redirect `HERMES_HOME` to temp dirs — never touch real `~/.hermes/`
- Run full suite before pushing any change
- Use `-o 'addopts='` to clear any baked-in pytest flags
-
-### Commit Conventions
-
-```
-type: concise subject line
-
-Optional body.
-```
-
-Types: `fix:`, `feat:`, `refactor:`, `docs:`, `chore:`
-
-### Key Rules
-
- **Never break prompt caching** — don't change context, tools, or system prompt mid-conversation
- **Message role alternation** — never two assistant or two user messages in a row
- Use `get_hermes_home()` from `hermes_constants` for all paths (profile-safe)
- Config values go in `config.yaml`, secrets go in `.env`
- New tools need a `check_fn` so they only appear when requirements are met
+1. **Use `-q` for autonomous tasks** — agent works independently and exits
+2. **Use `pty=true` for interactive sessions** — required for the full CLI UI
+3. **Use `submit` not `write`** (`submit` appends a newline, `write` doesn't; see Known Issues for how prompt_toolkit treats that newline)
+4. **Read logs before sending more** — check what the agent produced before giving next instruction
+5. **Set timeouts for `-q` mode** — complex tasks may take 5-10 minutes
+6. **Prefer `delegate_task` for quick subtasks** — spawning a full process has more overhead
+7. **Each instance is independent** — they don't share conversation context with the parent
+8. **Check results** — after completion, read the output files or logs the agent produced
@@ -744,149 +744,3 @@ class PixelBlendStack:
            result = blend_canvas(result, canvas, mode, opacity)
        return result
 ```
-
-## Text Backdrop (Readability Mask)
-
-When placing readable text over busy multi-grid ASCII backgrounds, the text will blend into the background and become illegible. **Always apply a dark backdrop behind text regions.**
-
-The technique: compute the bounding box of all text glyphs, create a gaussian-blurred dark mask covering that area with padding, and multiply the background by `(1 - mask * darkness)` before rendering text on top.
-
-```python
-from scipy.ndimage import gaussian_filter
-
-def apply_text_backdrop(canvas, glyphs, padding=80, darkness=0.75):
-    """Darken the background behind text for readability.
-    
-    Call AFTER rendering background, BEFORE rendering text.
-    
-    Args:
-        canvas: (VH, VW, 3) uint8 background
-        glyphs: list of {"x": float, "y": float, ...} glyph positions
-        padding: pixel padding around text bounding box
-        darkness: 0.0 = no darkening, 1.0 = fully black
-    Returns:
-        darkened canvas (uint8)
-    """
-    if not glyphs:
-        return canvas
-    xs = [g['x'] for g in glyphs]
-    ys = [g['y'] for g in glyphs]
-    x0 = max(0, int(min(xs)) - padding)
-    y0 = max(0, int(min(ys)) - padding)
-    x1 = min(VW, int(max(xs)) + padding + 50)   # extra for char width
-    y1 = min(VH, int(max(ys)) + padding + 60)   # extra for char height
-    
-    # Soft dark mask with gaussian blur for feathered edges
-    mask = np.zeros((VH, VW), dtype=np.float32)
-    mask[y0:y1, x0:x1] = 1.0
-    mask = gaussian_filter(mask, sigma=padding * 0.6)
-    
-    factor = 1.0 - mask * darkness
-    return (canvas.astype(np.float32) * factor[:, :, np.newaxis]).astype(np.uint8)
-```
-
-### Usage in render pipeline
-
-Insert between background rendering and text rendering:
-
-```python
-# 1. Render background (multi-grid ASCII effects)
-bg = render_background(cfg, t)
-
-# 2. Darken behind text region
-bg = apply_text_backdrop(bg, frame_glyphs, padding=80, darkness=0.75)
-
-# 3. Render text on top (now readable against dark backdrop)
-bg = text_renderer.render(bg, frame_glyphs, color=(255, 255, 255))
-```
-
-Combine with **reverse vignette** (see shaders.md) for scenes where text is always centered — the reverse vignette provides a persistent center-dark zone, while the backdrop handles per-frame glyph positions.
-
-## External Layout Oracle Pattern
-
-For text-heavy videos where text needs to dynamically reflow around obstacles (shapes, icons, other text), use an external layout engine to pre-compute glyph positions and feed them into the Python renderer via JSON.
-
-### Architecture
-
-```
-Layout Engine (browser/Node.js)  →  layouts.json  →  Python ASCII Renderer
-         ↑                                                    ↑
-   Computes per-frame                               Reads glyph positions,
-   glyph (x,y) positions                            renders as ASCII chars
-   with obstacle-aware reflow                        with full effect pipeline
-```
-
-### JSON interchange format
-
-```json
-{
-  "meta": {
-    "canvas_width": 1080, "canvas_height": 1080,
-    "fps": 24, "total_frames": 1248,
-    "fonts": {
-      "body": {"charW": 12.04, "charH": 24, "fontSize": 20},
-      "hero": {"charW": 24.08, "charH": 48, "fontSize": 40}
-    }
-  },
-  "scenes": [
-    {
-      "id": "scene_name",
-      "start_frame": 0, "end_frame": 96,
-      "frames": {
-        "0": {
-          "glyphs": [
-            {"char": "H", "x": 287.1, "y": 400.0, "alpha": 1.0},
-            {"char": "e", "x": 311.2, "y": 400.0, "alpha": 1.0}
-          ],
-          "obstacles": [
-            {"type": "circle", "cx": 540, "cy": 540, "r": 80},
-            {"type": "rect", "x": 300, "y": 500, "w": 120, "h": 80}
-          ]
-        }
-      }
-    }
-  ]
-}
-```
-
-### When to use
-
- Text that dynamically reflows around moving objects
- Per-glyph animation (reveal, scatter, physics)
- Variable typography that needs precise measurement
- Any case where Python's Pillow text layout is insufficient
-
-### When NOT to use
-
- Static centered text (just use PIL `draw.text()` directly)
- Text that only fades in/out without spatial animation
- Simple typewriter effects (handle in Python with a character counter)
-
-### Running the oracle
-
-Use Playwright to run the layout engine in a headless browser:
-
-```javascript
-// extract.mjs
-import { chromium } from 'playwright';
-const browser = await chromium.launch({ headless: true });
-const page = await browser.newPage();
-await page.goto(`file://${oraclePath}`);
-await page.waitForFunction(() => window.__ORACLE_DONE__ === true, null, { timeout: 60000 });
-const result = await page.evaluate(() => window.__ORACLE_RESULT__);
-writeFileSync('layouts.json', JSON.stringify(result));
-await browser.close();
-```
-
-### Consuming in Python
-
-```python
-# In the renderer, map pixel positions to the canvas:
-for glyph in frame_data['glyphs']:
-    char, px, py = glyph['char'], glyph['x'], glyph['y']
-    alpha = glyph.get('alpha', 1.0)
-    # Render using PIL draw.text() at exact pixel position
-    draw.text((px, py), char, fill=(int(255*alpha),)*3, font=font)
-```
-
-Obstacles from the JSON can also be rendered as glowing ASCII shapes (circles, rectangles) to visualize the reflow zones.
@@ -834,39 +834,6 @@ def sh_vignette(c, s=0.22):
    return np.clip(c * _vig_cache[k][:,:,None], 0, 255).astype(np.uint8)
 ```

-#### Reverse Vignette
-
-Inverted vignette: darkens the **center** and leaves edges bright. Useful when text is centered over busy backgrounds — creates a natural dark zone for readability without a hard-edged box.
-
-Combine with `apply_text_backdrop()` (see composition.md) for per-frame glyph-aware darkening.
-
-```python
-_rvignette_cache = {}
-
-def sh_reverse_vignette(c, strength=0.5):
-    """Center darkening, edge brightening. Cached."""
-    k = ('rv', c.shape[0], c.shape[1], round(strength, 2))
-    if k not in _rvignette_cache:
-        h, w = c.shape[:2]
-        Y = np.linspace(-1, 1, h)[:, None]
-        X = np.linspace(-1, 1, w)[None, :]
-        d = np.sqrt(X**2 + Y**2)
-        # Invert: bright at edges, dark at center
-        mask = np.clip(1.0 - (1.0 - d * 0.7) * strength, 0.2, 1.0)
-        _rvignette_cache[k] = mask[:, :, np.newaxis].astype(np.float32)
-    return np.clip(c.astype(np.float32) * _rvignette_cache[k], 0, 255).astype(np.uint8)
-```
-
-| Param | Default | Effect |
-|-------|---------|--------|
-| `strength` | 0.5 | 0 = no effect, 1.0 = center nearly black |
-
-Add to ShaderChain dispatch:
-```python
-elif name == "reverse_vignette":
-    return sh_reverse_vignette(canvas, kwargs.get("strength", 0.5))
-```
-
 #### Contrast
 ```python
 def sh_contrast(c, factor=1.3):
@@ -14,8 +14,6 @@
 | Random dark holes in output | Font missing Unicode glyphs | Validate palettes at init |
 | Audio-visual desync | Frame timing accumulation | Use integer frame counter, compute t fresh each frame |
 | Single-color flat output | Hue field shape mismatch | Ensure h,s,v arrays all (rows,cols) before hsv2rgb |
-| Text unreadable over busy bg | No contrast between text and background | Use `apply_text_backdrop()` (composition.md) + `reverse_vignette` shader (shaders.md) |
-| Text garbled/mirrored | Kaleidoscope or mirror shader applied to text scene | **Never apply kaleidoscope, mirror_h/v/quad/diag to scenes with readable text** — radial folding destroys legibility. Apply these only to background layers or text-free scenes |

 Common bugs, gotchas, and platform-specific issues encountered during ASCII video development.

@@ -0,0 +1,300 @@
+---
+name: hermes-agent-setup
+description: Help users configure Hermes Agent — CLI usage, setup wizard, model/provider selection, tools, skills, voice/STT/TTS, gateway, and troubleshooting. Use when someone asks to enable features, configure settings, or needs help with Hermes itself.
+version: 1.1.0
+author: Hermes Agent
+tags: [setup, configuration, tools, stt, tts, voice, hermes, cli, skills]
+---
+
+# Hermes Agent Setup & Configuration
+
+Use this skill when a user asks about configuring Hermes, enabling features, setting up voice, managing tools/skills, or troubleshooting.
+
+## Key Paths
+
+- Config: `~/.hermes/config.yaml`
+- API keys: `~/.hermes/.env`
+- Skills: `~/.hermes/skills/`
+- Hermes install: `~/.hermes/hermes-agent/`
+- Venv: `~/.hermes/hermes-agent/venv/`
+
+## CLI Overview
+
+Hermes is used via the `hermes` command (or `python -m hermes_cli.main` from the repo).
+
+### Core commands:
+
+```
+hermes                          Interactive chat (default)
+hermes chat -q "question"       Single query, then exit
+hermes chat -m MODEL            Chat with a specific model
+hermes -c                       Resume most recent session
+hermes -c "project name"        Resume session by name
+hermes --resume SESSION_ID      Resume by exact ID
+hermes -w                       Isolated git worktree mode
+hermes -s skill1,skill2         Preload skills for the session
+hermes --yolo                   Skip dangerous command approval
+```
+
+### Configuration & setup:
+
+```
+hermes setup                    Interactive setup wizard (provider, API keys, model)
+hermes model                    Interactive model/provider selection
+hermes config                   View current configuration
+hermes config edit              Open config.yaml in $EDITOR
+hermes config set KEY VALUE     Set a config value directly
+hermes login                    Authenticate with a provider
+hermes logout                   Clear stored auth
+hermes doctor                   Check configuration and dependencies
+```
+
+### Tools & skills:
+
+```
+hermes tools                    Interactive tool enable/disable per platform
+hermes skills list              List installed skills
+hermes skills search QUERY      Search the skills hub
+hermes skills install NAME      Install a skill from the hub
+hermes skills config            Enable/disable skills per platform
+```
+
+### Gateway (messaging platforms):
+
+```
+hermes gateway run              Start the messaging gateway
+hermes gateway install          Install gateway as background service
+hermes gateway status           Check gateway status
+```
+
+### Session management:
+
+```
+hermes sessions list            List past sessions
+hermes sessions browse          Interactive session picker
+hermes sessions rename ID TITLE Rename a session
+hermes sessions export ID       Export session as markdown
+hermes sessions prune           Clean up old sessions
+```
+
+### Other:
+
+```
+hermes status                   Show status of all components
+hermes cron list                List cron jobs
+hermes insights                 Usage analytics
+hermes update                   Update to latest version
+hermes pairing                  Manage DM authorization codes
+```
+
+## Setup Wizard (`hermes setup`)
+
+The interactive setup wizard walks through:
+1. **Provider selection** — OpenRouter, Anthropic, OpenAI, Google, DeepSeek, and many more
+2. **API key entry** — keys are stored in the env file (`~/.hermes/.env`)
+3. **Model selection** — picks from available models for the chosen provider
+4. **Basic settings** — reasoning effort, tool preferences
+
+Run it from a terminal:
+```bash
+cd ~/.hermes/hermes-agent
+source venv/bin/activate
+python -m hermes_cli.main setup
+```
+
+To change just the model/provider later: `hermes model`
+
+## Skills Configuration (`hermes skills`)
+
+Skills are reusable instruction sets that extend what Hermes can do.
+
+### Managing skills:
+
+```bash
+hermes skills list              # Show installed skills
+hermes skills search "docker"   # Search the hub
+hermes skills install NAME      # Install from hub
+hermes skills config            # Enable/disable per platform
+```
+
+### Per-platform skill control:
+
+`hermes skills config` opens an interactive UI where you can enable or disable specific skills for each platform (cli, telegram, discord, etc.). Disabled skills won't appear in the agent's available skills list for that platform.
+
+### Loading skills in a session:
+
+- CLI: `hermes -s skill-name` or `hermes -s skill1,skill2`
+- Chat: `/skill skill-name`
+- Gateway: type `/skill skill-name` in any chat
+
+## Voice Messages (STT)
+
+Voice messages from Telegram/Discord/WhatsApp/Slack/Signal are auto-transcribed when an STT provider is available.
+
+### Provider priority (auto-detected):
+1. **Local faster-whisper** — free, no API key, runs on CPU/GPU
+2. **Groq Whisper** — free tier, needs GROQ_API_KEY
+3. **OpenAI Whisper** — paid, needs VOICE_TOOLS_OPENAI_KEY
+
+### Setup local STT (recommended):
+
+```bash
+cd ~/.hermes/hermes-agent
+source venv/bin/activate
+pip install faster-whisper
+```
+
+Add to config.yaml under the `stt:` section:
+```yaml
+stt:
+  enabled: true
+  provider: local
+  local:
+    model: base  # Options: tiny, base, small, medium, large-v3
+```
+
+Model downloads automatically on first use (~150 MB for base).
+
+### Setup Groq STT (free cloud):
+
+1. Get a free key from https://console.groq.com
+2. Add `GROQ_API_KEY` to the env file (`~/.hermes/.env`)
+3. Set `provider: groq` under the `stt:` section of config.yaml
+
+### Verify STT:
+
+After config changes, restart the gateway (send /restart in chat, or restart `hermes gateway run`). Then send a voice message.
+
+## Voice Replies (TTS)
+
+Hermes can reply with voice when users send voice messages.
+
+### TTS providers (set API key in env file):
+
+| Provider | Env var | Free? |
+|----------|---------|-------|
+| ElevenLabs | ELEVENLABS_API_KEY | Free tier |
+| OpenAI | VOICE_TOOLS_OPENAI_KEY | Paid |
+| Kokoro (local) | None needed | Free |
+| Fish Audio | FISH_AUDIO_API_KEY | Free tier |
+
+### Voice commands (in any chat):
+- `/voice on` — voice reply to voice messages only
+- `/voice tts` — voice reply to all messages
+- `/voice off` — text only (default)
+
+## Enabling/Disabling Tools (`hermes tools`)
+
+### Interactive tool config:
+
+```bash
+cd ~/.hermes/hermes-agent
+source venv/bin/activate
+python -m hermes_cli.main tools
+```
+
+This opens a curses UI to enable/disable toolsets per platform (cli, telegram, discord, slack, etc.).
+
+### After changing tools:
+
+Use `/reset` in the chat to start a fresh session with the new toolset. Tool changes do NOT take effect mid-conversation (this preserves prompt caching and avoids cost spikes).
+
+### Common toolsets:
+
+| Toolset | What it provides |
+|---------|-----------------|
+| terminal | Shell command execution |
+| file | File read/write/search/patch |
+| web | Web search and extraction |
+| browser | Browser automation (needs Browserbase) |
+| image_gen | AI image generation |
+| mcp | MCP server connections |
+| voice | Text-to-speech output |
+| cronjob | Scheduled tasks |
+
+## Installing Dependencies
+
+Some tools need extra packages:
+
+```bash
+cd ~/.hermes/hermes-agent && source venv/bin/activate
+
+pip install faster-whisper    # Local STT (voice transcription)
+pip install browserbase       # Browser automation
+pip install mcp               # MCP server connections
+```
+
+## Config File Reference
+
+The main config file is `~/.hermes/config.yaml`. Key sections:
+
+```yaml
+# Model and provider
+model:
+  default: anthropic/claude-opus-4.6
+  provider: openrouter
+
+# Agent behavior
+agent:
+  max_turns: 90
+  reasoning_effort: high    # xhigh, high, medium, low, minimal, none
+
+# Voice
+stt:
+  enabled: true
+  provider: local           # local, groq, openai
+tts:
+  provider: elevenlabs      # elevenlabs, openai, kokoro, fish
+
+# Display
+display:
+  skin: default             # default, ares, mono, slate
+  tool_progress: full       # full, compact, off
+  background_process_notifications: all  # all, result, error, off
+```
+
+Edit with `hermes config edit` or `hermes config set KEY VALUE`.
+
+## Gateway Commands (Messaging Platforms)
+
+| Command | What it does |
+|---------|-------------|
+| /reset or /new | Fresh session (picks up new tool config) |
+| /help | Show all commands |
+| /model [name] | Show or change model |
+| /compact | Compress conversation to save context |
+| /voice [mode] | Configure voice replies |
+| /reasoning [effort] | Set reasoning level |
+| /sethome | Set home channel for cron/notifications |
+| /restart | Restart the gateway (picks up config changes) |
+| /status | Show session info |
+| /retry | Retry last message |
+| /undo | Remove last exchange |
+| /personality [name] | Set agent personality |
+| /skill [name] | Load a skill |
+
+## Troubleshooting
+
+### Voice messages not working
+1. Check that `stt.enabled` is `true` in config.yaml
+2. Check a provider is available (faster-whisper installed, or API key set)
+3. Restart gateway after config changes (/restart)
+
+### Tool not available
+1. Run `hermes tools` to check if the toolset is enabled for your platform
+2. Some tools need env vars — check the env file
+3. Use /reset after enabling tools
+
+### Model/provider issues
+1. Run `hermes doctor` to check configuration
+2. Run `hermes login` to re-authenticate
+3. Check the env file has the right API key
+
+### Changes not taking effect
+- Gateway: /reset for tool changes, /restart for config changes
+- CLI: start a new session
+
+### Skills not showing up
+1. Check `hermes skills list` shows the skill
+2. Check `hermes skills config` has it enabled for your platform
+3. Load explicitly with `/skill name` or `hermes -s name`
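
Note on the STT provider priority documented in the skill above (local faster-whisper, then Groq, then OpenAI): the selection order can be sketched in Python as follows. This is an illustration under assumptions, not Hermes's actual detection code. `pick_stt_provider` and its parameters are hypothetical names, and availability of the local provider is approximated by checking whether the `faster_whisper` module is importable.

```python
import importlib.util
import os


def pick_stt_provider(env=None, local_available=None):
    """Return 'local', 'groq', 'openai', or None, following the documented priority.

    Hypothetical helper: the name, signature, and detection logic are assumptions.
    """
    env = os.environ if env is None else env
    if local_available is None:
        # Assume local STT is usable when the faster_whisper module can be found.
        local_available = importlib.util.find_spec("faster_whisper") is not None
    if local_available:
        return "local"   # 1. Local faster-whisper: no API key needed
    if env.get("GROQ_API_KEY"):
        return "groq"    # 2. Groq Whisper: needs GROQ_API_KEY
    if env.get("VOICE_TOOLS_OPENAI_KEY"):
        return "openai"  # 3. OpenAI Whisper: needs VOICE_TOOLS_OPENAI_KEY
    return None          # No provider available: voice messages are not transcribed


if __name__ == "__main__":
    print(pick_stt_provider())
```

Passing `env` and `local_available` explicitly keeps the function easy to exercise without touching the real environment, which is also a quick way to reason through the "Voice messages not working" checklist.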
Author	SHA1	Message	Date
Sam Herring	e3123be445	Removing old patches	2026-03-30 10:06:08 -07:00
Sam Herring	e46d5b2c13	Removing old files	2026-03-30 09:58:05 -07:00
Sam Herring	34cc666105	Updating with trainer config pieces	2026-03-30 09:46:24 -07:00
Sam Herring	d6832260f9	Fixing eval steps to be a set number of tasks	2026-03-30 09:46:24 -07:00
Sam Herring	d2652e980f	Adding random jitter for agent temp to add variance into rollouts	2026-03-30 09:46:24 -07:00
Sam Herring	89cea9fd2d	Test basic Atropos trainer	2026-03-30 09:46:24 -07:00
Sam Herring	143e72c145	Updating endless terminals env with silenced warnings	2026-03-30 09:46:24 -07:00
Sam Herring	51305b3f3d	Tool call changes	2026-03-30 09:46:24 -07:00
Sam Herring	570e52b342	Monkey patching chat template kwargs	2026-03-30 09:46:24 -07:00
Sam Herring	d6e874491d	Env changes for tool use	2026-03-30 09:46:24 -07:00
Sam Herring	dd3812dffe	Adding tool call parser default	2026-03-30 09:46:24 -07:00
Sam Herring	6e17630bac	Eval splits for holdout sets	2026-03-30 09:46:24 -07:00
Sam Herring	53b710b13f	Changing return type to be ScoredDataGroup to account for multiple trajectories	2026-03-30 09:46:24 -07:00
Sam Herring	5b1e8059cb	Added task sppecific metris and evals	2026-03-30 09:46:24 -07:00
Sam Herring	ff16a33cdd	Wandb changes	2026-03-30 09:46:24 -07:00
Sam Herring	7cfb9eb1f6	Updating config	2026-03-30 09:46:24 -07:00
Sam Herring	c7b15f8ce1	Adding config init method	2026-03-30 09:46:24 -07:00
Sam Herring	7602c462ee	Updating path vars and dataset loading	2026-03-30 09:46:24 -07:00
Sam Herring	e38c24363c	Updating to use hermes-agent backend and parse container definition out of provided .sif files	2026-03-30 09:46:24 -07:00
Sam Herring	d768b244a5	Adding endless terminal environment after rebase:	2026-03-30 09:46:24 -07:00