fix: head+tail truncation for execute_code stdout (inspired by openclaw context-pruning)

Previously, _drain() captured only the first MAX_STDOUT_BYTES (50KB) of stdout and silently dropped everything after it, so scripts that print() their final results at the end lost those results. _drain() now uses a two-buffer approach: 40% of the budget goes to the head and 60% to the tail, which is kept as a rolling window. This matches the pattern already used in terminal_tool.py (lines 1042-1051) but gives the tail more space, since execute_code scripts typically print their final results last. Inspired by openclaw's softTrim context-pruning (headChars/tailChars).
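
A minimal sketch of the head + rolling-tail idea described above, not the actual _drain() code. The class name HeadTailBuffer, its feed()/render() methods, the omission marker text, and the exact value 50_000 for "50KB" are all assumptions made for illustration; only the 40%/60% split comes from this commit message.

```python
# Hypothetical illustration of 40% head + 60% rolling tail truncation.
MAX_STDOUT_BYTES = 50_000                    # assumed byte value for "50KB"
HEAD_BYTES = int(MAX_STDOUT_BYTES * 0.4)     # 40% of the budget for the head
TAIL_BYTES = MAX_STDOUT_BYTES - HEAD_BYTES   # remaining 60% for the tail


class HeadTailBuffer:
    """Keep the first head_max bytes and the last tail_max bytes of a stream."""

    def __init__(self, head_max=HEAD_BYTES, tail_max=TAIL_BYTES):
        self.head_max = head_max
        self.tail_max = tail_max
        self.head = bytearray()
        self.tail = bytearray()
        self.total = 0  # total bytes seen, including dropped ones

    def feed(self, chunk: bytes) -> None:
        self.total += len(chunk)
        # Fill the head first; it never changes once full.
        room = self.head_max - len(self.head)
        if room > 0:
            self.head += chunk[:room]
            chunk = chunk[room:]
        # Everything else goes into the tail, trimmed from the front
        # so that it always holds the most recent tail_max bytes.
        if chunk:
            self.tail += chunk
            excess = len(self.tail) - self.tail_max
            if excess > 0:
                del self.tail[:excess]

    def render(self) -> bytes:
        omitted = self.total - len(self.head) - len(self.tail)
        if omitted <= 0:
            # Nothing was dropped, so head + tail is the full output.
            return bytes(self.head + self.tail)
        marker = f"\n... [{omitted} bytes omitted] ...\n".encode()
        return bytes(self.head) + marker + bytes(self.tail)
```

Feeding the reader's chunks through feed() keeps memory bounded by head_max + tail_max regardless of how much the script prints, and render() reports how many bytes were dropped from the middle.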
2026-03-09 02:15:48 -07:00
2235 changed files with 65346 additions and 628061 deletions
--- a/.dockerignore
+++ b/.dockerignore
@@ -1,16 +0,0 @@
-# Git
-.git
-.gitignore
-.gitmodules
-
-# Dependencies
-node_modules
-.venv
-
-# CI/CD
-.github
-
-# Environment files
-.env
-
-*.md
--- a/.env.example
+++ b/.env.example
@@ -7,38 +7,18 @@
 # OpenRouter provides access to many models through one API
 # All LLM calls go through OpenRouter - no direct provider keys needed
 # Get your key at: https://openrouter.ai/keys
-# OPENROUTER_API_KEY=
+OPENROUTER_API_KEY=

-# Default model is configured in ~/.hermes/config.yaml (model.default).
-# Use 'hermes model' or 'hermes setup' to change it.
-# LLM_MODEL is no longer read from .env — this line is kept for reference only.
-# LLM_MODEL=anthropic/claude-opus-4.6
-
-# =============================================================================
-# LLM PROVIDER (Google AI Studio / Gemini)
-# =============================================================================
-# Native Gemini API via Google's OpenAI-compatible endpoint.
-# Get your key at: https://aistudio.google.com/app/apikey
-# GOOGLE_API_KEY=your_google_ai_studio_key_here
-# GEMINI_API_KEY=your_gemini_key_here  # alias for GOOGLE_API_KEY
-# Optional base URL override (default: Google's OpenAI-compatible endpoint)
-# GEMINI_BASE_URL=https://generativelanguage.googleapis.com/v1beta/openai
-
-# =============================================================================
-# LLM PROVIDER (Ollama Cloud)
-# =============================================================================
-# Cloud-hosted open models via Ollama's OpenAI-compatible endpoint.
-# Get your key at: https://ollama.com/settings
-# OLLAMA_API_KEY=your_ollama_key_here
-# Optional base URL override (default: https://ollama.com/v1)
-# OLLAMA_BASE_URL=https://ollama.com/v1
+# Default model to use (OpenRouter format: provider/model)
+# Examples: anthropic/claude-opus-4.6, openai/gpt-4o, google/gemini-3-flash-preview, zhipuai/glm-4-plus
+LLM_MODEL=anthropic/claude-opus-4.6

 # =============================================================================
 # LLM PROVIDER (z.ai / GLM)
 # =============================================================================
 # z.ai provides access to ZhipuAI GLM models (GLM-4-Plus, etc.)
 # Get your key at: https://z.ai or https://open.bigmodel.cn
-# GLM_API_KEY=
+GLM_API_KEY=
 # GLM_BASE_URL=https://api.z.ai/api/paas/v4  # Override default base URL

 # =============================================================================
@@ -48,103 +28,43 @@
 # Get your key at: https://platform.kimi.ai (Kimi Code console)
 # Keys prefixed sk-kimi- use the Kimi Code API (api.kimi.com) by default.
 # Legacy keys from platform.moonshot.ai need KIMI_BASE_URL override below.
-# KIMI_API_KEY=
+KIMI_API_KEY=
 # KIMI_BASE_URL=https://api.kimi.com/coding/v1  # Default for sk-kimi- keys
 # KIMI_BASE_URL=https://api.moonshot.ai/v1      # For legacy Moonshot keys
 # KIMI_BASE_URL=https://api.moonshot.cn/v1       # For Moonshot China keys
-# KIMI_CN_API_KEY=                               # Dedicated Moonshot China key
-
-# =============================================================================
-# LLM PROVIDER (Arcee AI)
-# =============================================================================
-# Arcee AI provides access to Trinity models (trinity-mini, trinity-large-*)
-# Get an Arcee key at: https://chat.arcee.ai/
-# ARCEEAI_API_KEY=
-# ARCEE_BASE_URL=                                 # Override default base URL

 # =============================================================================
 # LLM PROVIDER (MiniMax)
 # =============================================================================
 # MiniMax provides access to MiniMax models (global endpoint)
 # Get your key at: https://www.minimax.io
-# MINIMAX_API_KEY=
+MINIMAX_API_KEY=
 # MINIMAX_BASE_URL=https://api.minimax.io/v1  # Override default base URL

 # MiniMax China endpoint (for users in mainland China)
-# MINIMAX_CN_API_KEY=
+MINIMAX_CN_API_KEY=
 # MINIMAX_CN_BASE_URL=https://api.minimaxi.com/v1  # Override default base URL

-# =============================================================================
-# LLM PROVIDER (OpenCode Zen)
-# =============================================================================
-# OpenCode Zen provides curated, tested models (GPT, Claude, Gemini, MiniMax, GLM, Kimi)
-# Pay-as-you-go pricing. Get your key at: https://opencode.ai/auth
-# OPENCODE_ZEN_API_KEY=
-# OPENCODE_ZEN_BASE_URL=https://opencode.ai/zen/v1  # Override default base URL
-
-# =============================================================================
-# LLM PROVIDER (OpenCode Go)
-# =============================================================================
-# OpenCode Go provides access to open models (GLM-5, Kimi K2.5, MiniMax M2.5)
-# $10/month subscription. Get your key at: https://opencode.ai/auth
-# OPENCODE_GO_API_KEY=
-
-# =============================================================================
-# LLM PROVIDER (Hugging Face Inference Providers)
-# =============================================================================
-# Hugging Face routes to 20+ open models via unified OpenAI-compatible endpoint.
-# Free tier included ($0.10/month), no markup on provider rates.
-# Get your token at: https://huggingface.co/settings/tokens
-# Required permission: "Make calls to Inference Providers"
-# HF_TOKEN=
-# OPENCODE_GO_BASE_URL=https://opencode.ai/zen/go/v1  # Override default base URL
-
-# =============================================================================
-# LLM PROVIDER (Qwen OAuth)
-# =============================================================================
-# Qwen OAuth reuses your local Qwen CLI login (qwen auth qwen-oauth).
-# No API key needed — credentials come from ~/.qwen/oauth_creds.json.
-# Optional base URL override:
-# HERMES_QWEN_BASE_URL=https://portal.qwen.ai/v1
-
-# =============================================================================
-# LLM PROVIDER (Xiaomi MiMo)
-# =============================================================================
-# Xiaomi MiMo models (mimo-v2-pro, mimo-v2-omni, mimo-v2-flash).
-# Get your key at: https://platform.xiaomimimo.com
-# XIAOMI_API_KEY=your_key_here
-# Optional base URL override:
-# XIAOMI_BASE_URL=https://api.xiaomimimo.com/v1
-
 # =============================================================================
 # TOOL API KEYS
 # =============================================================================

-# Exa API Key - AI-native web search and contents
-# Get at: https://exa.ai
-# EXA_API_KEY=
-
-# Parallel API Key - AI-native web search and extract
-# Get at: https://parallel.ai
-# PARALLEL_API_KEY=
-
 # Firecrawl API Key - Web search, extract, and crawl
 # Get at: https://firecrawl.dev/
-# FIRECRAWL_API_KEY=
-
+FIRECRAWL_API_KEY=

 # FAL.ai API Key - Image generation
 # Get at: https://fal.ai/
-# FAL_KEY=
+FAL_KEY=

 # Honcho - Cross-session AI-native user modeling (optional)
 # Builds a persistent understanding of the user across sessions and tools.
 # Get at: https://app.honcho.dev
 # Also requires ~/.honcho/config.json with enabled=true (see README).
-# HONCHO_API_KEY=
+HONCHO_API_KEY=

 # =============================================================================
-# TERMINAL TOOL CONFIGURATION
+# TERMINAL TOOL CONFIGURATION (mini-swe-agent backend)
 # =============================================================================
 # Backend type: "local", "singularity", "docker", "modal", or "ssh"
 # Terminal backend is configured in ~/.hermes/config.yaml (terminal.backend).
@@ -154,10 +74,6 @@
 # Only override here if you need to force a backend without touching config.yaml:
 # TERMINAL_ENV=local

-# Override the container runtime binary (e.g. to use Podman instead of Docker).
-# Useful on systems where Docker's storage driver is broken or unavailable.
-# HERMES_DOCKER_BINARY=/usr/local/bin/podman
-
 # Container images (for singularity/docker/modal backends)
 # TERMINAL_DOCKER_IMAGE=nikolaik/python-nodejs:python3.11-nodejs20
 # TERMINAL_SINGULARITY_IMAGE=docker://nikolaik/python-nodejs:python3.11-nodejs20
@@ -231,10 +147,10 @@ TERMINAL_LIFETIME_SECONDS=300

 # Browserbase API Key - Cloud browser execution
 # Get at: https://browserbase.com/
-# BROWSERBASE_API_KEY=
+BROWSERBASE_API_KEY=

 # Browserbase Project ID - From your Browserbase dashboard
-# BROWSERBASE_PROJECT_ID=
+BROWSERBASE_PROJECT_ID=

 # Enable residential proxies for better CAPTCHA solving (default: true)
 # Routes traffic through residential IPs, significantly improves success rate
@@ -266,7 +182,7 @@ BROWSER_INACTIVITY_TIMEOUT=120
 # Uses OpenAI's API directly (not via OpenRouter).
 # Named VOICE_TOOLS_OPENAI_KEY to avoid interference with OpenRouter.
 # Get at: https://platform.openai.com/api-keys
-# VOICE_TOOLS_OPENAI_KEY=
+VOICE_TOOLS_OPENAI_KEY=

 # =============================================================================
 # SLACK INTEGRATION
@@ -281,37 +197,10 @@ BROWSER_INACTIVITY_TIMEOUT=120
 # Slack allowed users (comma-separated Slack user IDs)
 # SLACK_ALLOWED_USERS=

-# =============================================================================
-# TELEGRAM INTEGRATION
-# =============================================================================
-# Telegram Bot Token - From @BotFather (https://t.me/BotFather)
-# TELEGRAM_BOT_TOKEN=
-# TELEGRAM_ALLOWED_USERS=                  # Comma-separated user IDs
-# TELEGRAM_HOME_CHANNEL=                   # Default chat for cron delivery
-# TELEGRAM_HOME_CHANNEL_NAME=              # Display name for home channel
-
-# Webhook mode (optional — for cloud deployments like Fly.io/Railway)
-# Default is long polling. Setting TELEGRAM_WEBHOOK_URL switches to webhook mode.
-# TELEGRAM_WEBHOOK_URL=https://my-app.fly.dev/telegram
-# TELEGRAM_WEBHOOK_PORT=8443
-# TELEGRAM_WEBHOOK_SECRET=                 # Recommended for production
-
 # WhatsApp (built-in Baileys bridge — run `hermes whatsapp` to pair)
 # WHATSAPP_ENABLED=false
 # WHATSAPP_ALLOWED_USERS=15551234567

-# Email (IMAP/SMTP — send and receive emails as Hermes)
-# For Gmail: enable 2FA → create App Password at https://myaccount.google.com/apppasswords
-# EMAIL_ADDRESS=hermes@gmail.com
-# EMAIL_PASSWORD=xxxx xxxx xxxx xxxx
-# EMAIL_IMAP_HOST=imap.gmail.com
-# EMAIL_IMAP_PORT=993
-# EMAIL_SMTP_HOST=smtp.gmail.com
-# EMAIL_SMTP_PORT=587
-# EMAIL_POLL_INTERVAL=15
-# EMAIL_ALLOWED_USERS=your@email.com
-# EMAIL_HOME_ADDRESS=your@email.com
-
 # Gateway-wide: allow ALL users without an allowlist (default: false = deny)
 # Only set to true if you intentionally want open access.
 # GATEWAY_ALLOW_ALL_USERS=false
@@ -352,11 +241,11 @@ IMAGE_TOOLS_DEBUG=false

 # Tinker API Key - RL training service
 # Get at: https://tinker-console.thinkingmachines.ai/keys
-# TINKER_API_KEY=
+TINKER_API_KEY=

 # Weights & Biases API Key - Experiment tracking and metrics
 # Get at: https://wandb.ai/authorize
-# WANDB_API_KEY=
+WANDB_API_KEY=

 # RL API Server URL (default: http://localhost:8080)
 # Change if running the rl-server on a different host/port
@@ -374,27 +263,3 @@ IMAGE_TOOLS_DEBUG=false
 # GITHUB_APP_ID=
 # GITHUB_APP_PRIVATE_KEY_PATH=
 # GITHUB_APP_INSTALLATION_ID=
-
-# Groq API key (free tier — used for Whisper STT in voice mode)
-# GROQ_API_KEY=
-
-# =============================================================================
-# STT PROVIDER SELECTION
-# =============================================================================
-# Default STT provider is "local" (faster-whisper) — runs on your machine, no API key needed.
-# Install with: pip install faster-whisper
-# Model downloads automatically on first use (~150 MB for "base").
-# To use cloud providers instead, set GROQ_API_KEY or VOICE_TOOLS_OPENAI_KEY above.
-# Provider priority: local > groq > openai
-# Configure in config.yaml: stt.provider: local | groq | openai
-
-# =============================================================================
-# STT ADVANCED OVERRIDES (optional)
-# =============================================================================
-# Override default STT models per provider (normally set via stt.model in config.yaml)
-# STT_GROQ_MODEL=whisper-large-v3-turbo
-# STT_OPENAI_MODEL=whisper-1
-
-# Override STT provider endpoints (for proxies or self-hosted instances)
-# GROQ_BASE_URL=https://api.groq.com/openai/v1
-# STT_OPENAI_BASE_URL=https://api.openai.com/v1
--- a/.envrc
+++ b/.envrc
@@ -1,5 +0,0 @@
-watch_file pyproject.toml uv.lock
-watch_file ui-tui/package-lock.json ui-tui/package.json
-watch_file flake.nix flake.lock nix/devShell.nix nix/tui.nix nix/package.nix nix/python.nix
-
-use flake
--- a/.git-blame-ignore-revs
+++ b/.git-blame-ignore-revs
@@ -1,5 +0,0 @@
-# hermes_agent package restructure (PR 1/3)
-# Commit 2: pure git mv — all source files into hermes_agent/
-65ca3ba93b3fa7fd2b15af5b62d54020061f3672
-# Commit 3: rewrite all imports for hermes_agent package
-4b16341975a1217588054f567d0f76dc5a3cc481
--- a/.gitattributes
+++ b/.gitattributes
@@ -1,2 +0,0 @@
-# Auto-generated files — collapse diffs and exclude from language stats
-web/package-lock.json linguist-generated=true
--- a/.github/ISSUE_TEMPLATE/bug_report.yml
+++ b/.github/ISSUE_TEMPLATE/bug_report.yml
@@ -11,7 +11,6 @@ body:
        **Before submitting**, please:
        - [ ] Search [existing issues](https://github.com/NousResearch/hermes-agent/issues) to avoid duplicates
        - [ ] Update to the latest version (`hermes update`) and confirm the bug still exists
-        - [ ] Run `hermes debug share` and paste the links below (see Debug Report section)

  - type: textarea
    id: description
@@ -83,25 +82,6 @@ body:
        - Slack
        - WhatsApp

-  - type: textarea
-    id: debug-report
-    attributes:
-      label: Debug Report
-      description: |
-        Run `hermes debug share` from your terminal and paste the links it prints here.
-        This uploads your system info, config, and recent logs to a paste service automatically.
-
-        If you're in an interactive chat session, you can also use the `/debug` slash command — it does the same thing.
-
-        If the upload fails, run `hermes debug share --local` and paste the output directly.
-      placeholder: |
-        Report   https://paste.rs/abc123
-        agent.log   https://paste.rs/def456
-        gateway.log   https://paste.rs/ghi789
-      render: shell
-    validations:
-      required: true
-
  - type: input
    id: os
    attributes:
@@ -117,6 +97,8 @@ body:
      label: Python Version
      description: Output of `python --version`
      placeholder: "3.11.9"
+    validations:
+      required: true

  - type: input
    id: hermes-version
@@ -124,14 +106,14 @@ body:
      label: Hermes Version
      description: Output of `hermes version`
      placeholder: "2.1.0"
+    validations:
+      required: true

  - type: textarea
    id: logs
    attributes:
-      label: Additional Logs / Traceback (optional)
-      description: |
-        The debug report above covers most logs. Use this field for any extra error output, 
-        tracebacks, or screenshots not captured by `hermes debug share`.
+      label: Relevant Logs / Traceback
+      description: Paste any error output, traceback, or log messages. This will be auto-formatted as code.
      render: shell

  - type: textarea
--- a/.github/ISSUE_TEMPLATE/feature_request.yml
+++ b/.github/ISSUE_TEMPLATE/feature_request.yml
@@ -71,15 +71,3 @@ body:
      label: Contribution
      options:
        - label: I'd like to implement this myself and submit a PR
-
-  - type: textarea
-    id: debug-report
-    attributes:
-      label: Debug Report (optional)
-      description: |
-        If this feature request is related to a problem you're experiencing, run `hermes debug share` and paste the links here.
-        In an interactive chat session, you can use `/debug` instead.
-        This helps us understand your environment and any related logs.
-      placeholder: |
-        Report   https://paste.rs/abc123
-      render: shell
--- a/.github/ISSUE_TEMPLATE/setup_help.yml
+++ b/.github/ISSUE_TEMPLATE/setup_help.yml
@@ -9,8 +9,7 @@ body:
        Sorry you're having trouble! Please fill out the details below so we can help.

        **Quick checks first:**
-        - Run `hermes debug share` and paste the links in the Debug Report section below
-        - If you're in a chat session, you can use `/debug` instead — it does the same thing
+        - Run `hermes doctor` and include the output below
        - Try `hermes update` to get the latest version
        - Check the [README troubleshooting section](https://github.com/NousResearch/hermes-agent#troubleshooting)
        - For general questions, consider the [Nous Research Discord](https://discord.gg/NousResearch) for faster help
@@ -75,21 +74,10 @@ body:
      placeholder: "2.1.0"

  - type: textarea
-    id: debug-report
+    id: doctor-output
    attributes:
-      label: Debug Report
-      description: |
-        Run `hermes debug share` from your terminal and paste the links it prints here.
-        This uploads your system info, config, and recent logs to a paste service automatically.
-
-        If you're in an interactive chat session, you can also use the `/debug` slash command — it does the same thing.
-
-        If the upload fails or install didn't get that far, run `hermes debug share --local` and paste the output directly.
-        If even that doesn't work, run `hermes doctor` and paste that output instead.
-      placeholder: |
-        Report   https://paste.rs/abc123
-        agent.log   https://paste.rs/def456
-        gateway.log   https://paste.rs/ghi789
+      label: Output of `hermes doctor`
+      description: Run `hermes doctor` and paste the full output. This will be auto-formatted.
      render: shell

  - type: textarea
--- a/.github/actions/nix-setup/action.yml
+++ b/.github/actions/nix-setup/action.yml
@@ -1,8 +0,0 @@
-name: 'Setup Nix'
-description: 'Install Nix with DeterminateSystems and enable magic-nix-cache'
-
-runs:
-  using: composite
-  steps:
-    - uses: DeterminateSystems/nix-installer-action@ef8a148080ab6020fd15196c2084a2eea5ff2d25 # v22
-    - uses: DeterminateSystems/magic-nix-cache-action@565684385bcd71bad329742eefe8d12f2e765b39 # v13
--- a/.github/workflows/contributor-check.yml
+++ b/.github/workflows/contributor-check.yml
@@ -1,73 +0,0 @@
-name: Contributor Attribution Check
-
-on:
-  pull_request:
-    branches: [main]
-    paths:
-      # Only run when code files change (not docs-only PRs)
-      - '*.py'
-      - '**/*.py'
-      - '.github/workflows/contributor-check.yml'
-
-permissions:
-  contents: read
-
-jobs:
-  check-attribution:
-    runs-on: ubuntu-latest
-    steps:
-      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
-        with:
-          fetch-depth: 0  # Full history needed for git log
-
-      - name: Check for unmapped contributor emails
-        run: |
-          # Get the merge base between this PR and main
-          MERGE_BASE=$(git merge-base origin/main HEAD)
-
-          # Find any new author emails in this PR's commits
-          NEW_EMAILS=$(git log ${MERGE_BASE}..HEAD --format='%ae' --no-merges | sort -u)
-
-          if [ -z "$NEW_EMAILS" ]; then
-            echo "No new commits to check."
-            exit 0
-          fi
-
-          # Check each email against AUTHOR_MAP in release.py
-          MISSING=""
-          while IFS= read -r email; do
-            # Skip teknium and bot emails
-            case "$email" in
-              *teknium*|*noreply@github.com*|*dependabot*|*github-actions*|*anthropic.com*|*cursor.com*)
-                continue ;;
-            esac
-
-            # Check if email is in AUTHOR_MAP (either as a key or matches noreply pattern)
-            if echo "$email" | grep -qP '\+.*@users\.noreply\.github\.com'; then
-              continue  # GitHub noreply emails auto-resolve
-            fi
-
-            if ! grep -qF "\"${email}\"" scripts/release.py 2>/dev/null; then
-              AUTHOR=$(git log --author="$email" --format='%an' -1)
-              MISSING="${MISSING}\n  ${email} (${AUTHOR})"
-            fi
-          done <<< "$NEW_EMAILS"
-
-          if [ -n "$MISSING" ]; then
-            echo ""
-            echo "⚠️  New contributor email(s) not in AUTHOR_MAP:"
-            echo -e "$MISSING"
-            echo ""
-            echo "Please add mappings to scripts/release.py AUTHOR_MAP:"
-            echo -e "$MISSING" | while read -r line; do
-              email=$(echo "$line" | sed 's/^ *//' | cut -d' ' -f1)
-              [ -z "$email" ] && continue
-              echo "    \"${email}\": \"<github-username>\","
-            done
-            echo ""
-            echo "To find the GitHub username for an email:"
-            echo "  gh api 'search/users?q=EMAIL+in:email' --jq '.items[0].login'"
-            exit 1
-          else
-            echo "✅ All contributor emails are mapped in AUTHOR_MAP."
-          fi
--- a/.github/workflows/deploy-site.yml
+++ b/.github/workflows/deploy-site.yml
@@ -1,14 +1,11 @@
 name: Deploy Site

 on:
-  release:
-    types: [published]
  push:
    branches: [main]
    paths:
      - 'website/**'
-      - 'skills/**'
-      - 'optional-skills/**'
+      - 'landingpage/**'
      - '.github/workflows/deploy-site.yml'
  workflow_dispatch:

@@ -21,46 +18,20 @@ concurrency:
  cancel-in-progress: false

 jobs:
-  deploy-vercel:
-    if: github.event_name == 'release'
-    runs-on: ubuntu-latest
-    steps:
-      - name: Trigger Vercel Deploy
-        run: curl -X POST "${{ secrets.VERCEL_DEPLOY_HOOK }}"
-
-  deploy-docs:
-    if: github.repository == 'NousResearch/hermes-agent'
+  build-and-deploy:
    runs-on: ubuntu-latest
    environment:
      name: github-pages
      url: ${{ steps.deploy.outputs.page_url }}
    steps:
-      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
+      - uses: actions/checkout@v4

-      - uses: actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020  # v4
+      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
          cache-dependency-path: website/package-lock.json

-      - uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065  # v5
-        with:
-          python-version: '3.11'
-
-      - name: Install PyYAML for skill extraction
-        run: pip install pyyaml==6.0.2 httpx==0.28.1
-
-      - name: Extract skill metadata for dashboard
-        run: python3 website/scripts/extract-skills.py
-
-      - name: Build skills index (if not already present)
-        env:
-          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-        run: |
-          if [ ! -f website/static/api/skills-index.json ]; then
-            python3 scripts/build_skills_index.py || echo "Skills index build failed (non-fatal)"
-          fi
-
      - name: Install dependencies
        run: npm ci
        working-directory: website
@@ -72,13 +43,18 @@ jobs:
      - name: Stage deployment
        run: |
          mkdir -p _site/docs
+          # Landing page at root
+          cp -r landingpage/* _site/
+          # Docusaurus at /docs/
          cp -r website/build/* _site/docs/
+          # CNAME so GitHub Pages keeps the custom domain between deploys
+          echo "hermes-agent.nousresearch.com" > _site/CNAME

      - name: Upload artifact
-        uses: actions/upload-pages-artifact@56afc609e74202658d3ffba0e8f6dda462b719fa  # v3
+        uses: actions/upload-pages-artifact@v3
        with:
          path: _site

      - name: Deploy to GitHub Pages
        id: deploy
-        uses: actions/deploy-pages@d6db90164ac5ed86f2b6aed7e0febac5b3c0c03e  # v4
+        uses: actions/deploy-pages@v4
--- a/.github/workflows/docker-publish.yml
+++ b/.github/workflows/docker-publish.yml
@@ -1,99 +0,0 @@
-name: Docker Build and Publish
-
-on:
-  push:
-    branches: [main]
-    paths:
-      - '**/*.py'
-      - 'pyproject.toml'
-      - 'uv.lock'
-      - 'Dockerfile'
-      - 'docker/**'
-      - '.github/workflows/docker-publish.yml'
-  release:
-    types: [published]
-
-permissions:
-  contents: read
-
-concurrency:
-  group: docker-${{ github.ref }}
-  cancel-in-progress: true
-
-jobs:
-  build-and-push:
-    # Only run on the upstream repository, not on forks
-    if: github.repository == 'NousResearch/hermes-agent'
-    runs-on: ubuntu-latest
-    timeout-minutes: 60
-    steps:
-      - name: Checkout code
-        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
-        with:
-          submodules: recursive
-
-      - name: Set up QEMU
-        uses: docker/setup-qemu-action@c7c53464625b32c7a7e944ae62b3e17d2b600130  # v3
-
-      - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f  # v3
-
-      # Build amd64 only so we can `load` the image for smoke testing.
-      # `load: true` cannot export a multi-arch manifest to the local daemon.
-      # The multi-arch build follows on push to main / release.
-      - name: Build image (amd64, smoke test)
-        uses: docker/build-push-action@10e90e3645eae34f1e60eeb005ba3a3d33f178e8  # v6
-        with:
-          context: .
-          file: Dockerfile
-          load: true
-          platforms: linux/amd64
-          tags: nousresearch/hermes-agent:test
-          cache-from: type=gha
-          cache-to: type=gha,mode=max
-
-      - name: Test image starts
-        run: |
-          # The image runs as the hermes user (UID 10000).  GitHub Actions
-          # creates /tmp/hermes-test root-owned by default, which hermes
-          # can't write to — chown it to match the in-container UID before
-          # bind-mounting.  Real users doing `docker run -v ~/.hermes:...`
-          # with their own UID hit the same issue and have their own
-          # remediations (HERMES_UID env var, or chown locally).
-          mkdir -p /tmp/hermes-test
-          sudo chown -R 10000:10000 /tmp/hermes-test
-          docker run --rm \
-            -v /tmp/hermes-test:/opt/data \
-            --entrypoint /opt/hermes/docker/entrypoint.sh \
-            nousresearch/hermes-agent:test --help
-
-      - name: Log in to Docker Hub
-        if: github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release'
-        uses: docker/login-action@c94ce9fb468520275223c153574b00df6fe4bcc9  # v3
-        with:
-          username: ${{ secrets.DOCKERHUB_USERNAME }}
-          password: ${{ secrets.DOCKERHUB_TOKEN }}
-
-      - name: Push multi-arch image (main branch)
-        if: github.event_name == 'push' && github.ref == 'refs/heads/main'
-        uses: docker/build-push-action@10e90e3645eae34f1e60eeb005ba3a3d33f178e8  # v6
-        with:
-          context: .
-          file: Dockerfile
-          push: true
-          platforms: linux/amd64,linux/arm64
-          tags: nousresearch/hermes-agent:latest
-          cache-from: type=gha
-          cache-to: type=gha,mode=max
-
-      - name: Push multi-arch image (release)
-        if: github.event_name == 'release'
-        uses: docker/build-push-action@10e90e3645eae34f1e60eeb005ba3a3d33f178e8  # v6
-        with:
-          context: .
-          file: Dockerfile
-          push: true
-          platforms: linux/amd64,linux/arm64
-          tags: nousresearch/hermes-agent:${{ github.event.release.tag_name }}
-          cache-from: type=gha
-          cache-to: type=gha,mode=max
--- a/.github/workflows/docs-site-checks.yml
+++ b/.github/workflows/docs-site-checks.yml
@@ -1,45 +0,0 @@
-name: Docs Site Checks
-
-on:
-  pull_request:
-    paths:
-      - 'website/**'
-      - '.github/workflows/docs-site-checks.yml'
-  workflow_dispatch:
-
-permissions:
-  contents: read
-
-jobs:
-  docs-site-checks:
-    runs-on: ubuntu-latest
-    steps:
-      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
-
-      - uses: actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020  # v4
-        with:
-          node-version: 20
-          cache: npm
-          cache-dependency-path: website/package-lock.json
-
-      - name: Install website dependencies
-        run: npm ci
-        working-directory: website
-
-      - uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065  # v5
-        with:
-          python-version: '3.11'
-
-      - name: Install ascii-guard
-        run: python -m pip install ascii-guard==2.3.0 pyyaml==6.0.3
-
-      - name: Extract skill metadata for dashboard
-        run: python3 website/scripts/extract-skills.py
-
-      - name: Lint docs diagrams
-        run: npm run lint:diagrams
-        working-directory: website
-
-      - name: Build Docusaurus
-        run: npm run build
-        working-directory: website
--- a/.github/workflows/nix-lockfile-check.yml
+++ b/.github/workflows/nix-lockfile-check.yml
@@ -1,68 +0,0 @@
-name: Nix Lockfile Check
-
-on:
-  pull_request:
-  workflow_dispatch:
-
-permissions:
-  contents: read
-  pull-requests: write
-
-concurrency:
-  group: nix-lockfile-check-${{ github.ref }}
-  cancel-in-progress: true
-
-jobs:
-  check:
-    runs-on: ubuntu-latest
-    timeout-minutes: 20
-    steps:
-      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
-
-      - uses: ./.github/actions/nix-setup
-
-      - name: Resolve head SHA
-        id: sha
-        shell: bash
-        run: |
-          FULL="${{ github.event.pull_request.head.sha || github.sha }}"
-          echo "full=$FULL" >> "$GITHUB_OUTPUT"
-          echo "short=${FULL:0:7}" >> "$GITHUB_OUTPUT"
-
-      - name: Check lockfile hashes
-        id: check
-        continue-on-error: true
-        env:
-          LINK_SHA: ${{ steps.sha.outputs.full }}
-        run: nix run .#fix-lockfiles -- --check
-
-      - name: Post sticky PR comment (stale)
-        if: steps.check.outputs.stale == 'true' && github.event_name == 'pull_request'
-        uses: marocchino/sticky-pull-request-comment@52423e01640425a022ef5fd42c6fb5f633a02728  # v2.9.1
-        with:
-          header: nix-lockfile-check
-          message: |
-            ### ⚠️ npm lockfile hash out of date
-
-            Checked against commit [`${{ steps.sha.outputs.short }}`](${{ github.server_url }}/${{ github.repository }}/commit/${{ steps.sha.outputs.full }}) (PR head at check time).
-
-            The `hash = "sha256-..."` line in these nix files no longer matches the committed `package-lock.json`:
-
-            ${{ steps.check.outputs.report }}
-
-            #### Apply the fix
-
-            - [ ] **Apply lockfile fix** — tick to push a commit with the correct hashes to this PR branch
-            - Or [run the Nix Lockfile Fix workflow](${{ github.server_url }}/${{ github.repository }}/actions/workflows/nix-lockfile-fix.yml) manually (pass PR `#${{ github.event.pull_request.number }}`)
-            - Or locally: `nix run .#fix-lockfiles -- --apply` and commit the diff
-
-      - name: Clear sticky PR comment (resolved)
-        if: steps.check.outputs.stale == 'false' && github.event_name == 'pull_request'
-        uses: marocchino/sticky-pull-request-comment@52423e01640425a022ef5fd42c6fb5f633a02728  # v2.9.1
-        with:
-          header: nix-lockfile-check
-          delete: true
-
-      - name: Fail if stale
-        if: steps.check.outputs.stale == 'true'
-        run: exit 1
--- a/.github/workflows/nix-lockfile-fix.yml
+++ b/.github/workflows/nix-lockfile-fix.yml
@@ -1,149 +0,0 @@
-name: Nix Lockfile Fix
-
-on:
-  workflow_dispatch:
-    inputs:
-      pr_number:
-        description: 'PR number to fix (leave empty to run on the selected branch)'
-        required: false
-        type: string
-  issue_comment:
-    types: [edited]
-
-permissions:
-  contents: write
-  pull-requests: write
-
-concurrency:
-  group: nix-lockfile-fix-${{ github.event.issue.number || github.event.inputs.pr_number || github.ref }}
-  cancel-in-progress: false
-
-jobs:
-  fix:
-    # Run on manual dispatch OR when a task-list checkbox in the sticky
-    # lockfile-check comment flips from `[ ]` to `[x]`.
-    if: |
-      github.event_name == 'workflow_dispatch' ||
-      (github.event_name == 'issue_comment'
-       && github.event.issue.pull_request != null
-       && contains(github.event.comment.body, '[x] **Apply lockfile fix**')
-       && !contains(github.event.changes.body.from, '[x] **Apply lockfile fix**'))
-    runs-on: ubuntu-latest
-    timeout-minutes: 25
-    steps:
-      - name: Authorize & resolve PR
-        id: resolve
-        uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea  # v7.0.1
-        with:
-          script: |
-            // 1. Verify the actor has write access — applies to both checkbox
-            //    clicks and manual dispatch.
-            const { data: perm } =
-              await github.rest.repos.getCollaboratorPermissionLevel({
-                owner: context.repo.owner,
-                repo: context.repo.repo,
-                username: context.actor,
-              });
-            if (!['admin', 'write', 'maintain'].includes(perm.permission)) {
-              core.setFailed(
-                `${context.actor} lacks write access (has: ${perm.permission})`
-              );
-              return;
-            }
-
-            // 2. Resolve which ref to check out.
-            let prNumber = '';
-            if (context.eventName === 'issue_comment') {
-              prNumber = String(context.payload.issue.number);
-            } else if (context.eventName === 'workflow_dispatch') {
-              prNumber = context.payload.inputs.pr_number || '';
-            }
-
-            if (!prNumber) {
-              core.setOutput('ref', context.ref.replace(/^refs\/heads\//, ''));
-              core.setOutput('repo', context.repo.repo);
-              core.setOutput('owner', context.repo.owner);
-              core.setOutput('pr', '');
-              return;
-            }
-
-            const { data: pr } = await github.rest.pulls.get({
-              owner: context.repo.owner,
-              repo: context.repo.repo,
-              pull_number: Number(prNumber),
-            });
-            core.setOutput('ref', pr.head.ref);
-            core.setOutput('repo', pr.head.repo.name);
-            core.setOutput('owner', pr.head.repo.owner.login);
-            core.setOutput('pr', String(pr.number));
-
-      # Wipe the sticky lockfile-check comment to a "running" state as soon
-      # as the job is authorized, so the user sees their click was picked up
-      # before the ~minute of nix build work.
-      - name: Mark sticky as running
-        if: steps.resolve.outputs.pr != ''
-        uses: marocchino/sticky-pull-request-comment@52423e01640425a022ef5fd42c6fb5f633a02728  # v2.9.1
-        with:
-          header: nix-lockfile-check
-          number: ${{ steps.resolve.outputs.pr }}
-          message: |
-            ### 🔄 Applying lockfile fix…
-
-            Triggered by @${{ github.actor }} — [workflow run](${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}).
-
-      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
-        with:
-          repository: ${{ steps.resolve.outputs.owner }}/${{ steps.resolve.outputs.repo }}
-          ref: ${{ steps.resolve.outputs.ref }}
-          token: ${{ secrets.GITHUB_TOKEN }}
-          fetch-depth: 0
-
-      - uses: ./.github/actions/nix-setup
-
-      - name: Apply lockfile hashes
-        id: apply
-        run: nix run .#fix-lockfiles -- --apply
-
-      - name: Commit & push
-        if: steps.apply.outputs.changed == 'true'
-        shell: bash
-        run: |
-          set -euo pipefail
-          git config user.name 'github-actions[bot]'
-          git config user.email '41898282+github-actions[bot]@users.noreply.github.com'
-          git add nix/tui.nix nix/web.nix
-          git commit -m "fix(nix): refresh npm lockfile hashes"
-          git push
-
-      - name: Update sticky (applied)
-        if: steps.apply.outputs.changed == 'true' && steps.resolve.outputs.pr != ''
-        uses: marocchino/sticky-pull-request-comment@52423e01640425a022ef5fd42c6fb5f633a02728  # v2.9.1
-        with:
-          header: nix-lockfile-check
-          number: ${{ steps.resolve.outputs.pr }}
-          message: |
-            ### ✅ Lockfile fix applied
-
-            Pushed a commit refreshing the npm lockfile hashes — [workflow run](${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}).
-
-      - name: Update sticky (already current)
-        if: steps.apply.outputs.changed == 'false' && steps.resolve.outputs.pr != ''
-        uses: marocchino/sticky-pull-request-comment@52423e01640425a022ef5fd42c6fb5f633a02728  # v2.9.1
-        with:
-          header: nix-lockfile-check
-          number: ${{ steps.resolve.outputs.pr }}
-          message: |
-            ### ✅ Lockfile hashes already current
-
-            Nothing to commit — [workflow run](${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}).
-
-      - name: Update sticky (failed)
-        if: failure() && steps.resolve.outputs.pr != ''
-        uses: marocchino/sticky-pull-request-comment@52423e01640425a022ef5fd42c6fb5f633a02728  # v2.9.1
-        with:
-          header: nix-lockfile-check
-          number: ${{ steps.resolve.outputs.pr }}
-          message: |
-            ### ❌ Lockfile fix failed
-
-            See the [workflow run](${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}) for logs.
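
Note on the removed `nix-lockfile-fix.yml` trigger: the job's `if:` condition ran on an `issue_comment` edit only when the ticked checkbox text was present in the new comment body and absent from the previous one, so re-editing an already ticked comment did not re-run the fix. A minimal Python sketch of that transition check follows. The function name `checkbox_just_ticked` is hypothetical and not part of the repository. Both sides are lowercased on the assumption that GitHub's `contains()` expression compares strings case-insensitively.

```python
# Marker text copied from the removed workflow's `if:` expression.
MARKER = "[x] **Apply lockfile fix**"


def checkbox_just_ticked(new_body: str, old_body: str) -> bool:
    """Return True only when the ticked marker appears in the edited
    comment body and was not already present before the edit."""
    marker = MARKER.lower()
    return marker in new_body.lower() and marker not in old_body.lower()
```

The second clause is what makes the trigger edge-sensitive: without it, any later edit to a comment that still contains the ticked box would start another run.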
--- a/.github/workflows/nix.yml
+++ b/.github/workflows/nix.yml
@@ -1,33 +0,0 @@
-name: Nix
-
-on:
-  push:
-    branches: [main]
-  pull_request:
-
-permissions:
-  contents: read
-
-concurrency:
-  group: nix-${{ github.ref }}
-  cancel-in-progress: true
-
-jobs:
-  nix:
-    strategy:
-      matrix:
-        os: [ubuntu-latest, macos-latest]
-    runs-on: ${{ matrix.os }}
-    timeout-minutes: 30
-    steps:
-      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
-      - uses: ./.github/actions/nix-setup
-      - name: Check flake
-        if: runner.os == 'Linux'
-        run: nix flake check --print-build-logs
-      - name: Build package
-        if: runner.os == 'Linux'
-        run: nix build --print-build-logs
-      - name: Evaluate flake (macOS)
-        if: runner.os == 'macOS'
-        run: nix flake show --json > /dev/null
--- a/.github/workflows/skills-index.yml
+++ b/.github/workflows/skills-index.yml
@@ -1,101 +0,0 @@
-name: Build Skills Index
-
-on:
-  schedule:
-    # Run twice daily: 6 AM and 6 PM UTC
-    - cron: '0 6,18 * * *'
-  workflow_dispatch:  # Manual trigger
-  push:
-    branches: [main]
-    paths:
-      - 'scripts/build_skills_index.py'
-      - '.github/workflows/skills-index.yml'
-
-permissions:
-  contents: read
-
-jobs:
-  build-index:
-    # Only run on the upstream repository, not on forks
-    if: github.repository == 'NousResearch/hermes-agent'
-    runs-on: ubuntu-latest
-    steps:
-      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
-
-      - uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065  # v5
-        with:
-          python-version: '3.11'
-
-      - name: Install dependencies
-        run: pip install httpx==0.28.1 pyyaml==6.0.2
-
-      - name: Build skills index
-        env:
-          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-        run: python scripts/build_skills_index.py
-
-      - name: Upload index artifact
-        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02  # v4
-        with:
-          name: skills-index
-          path: website/static/api/skills-index.json
-          retention-days: 7
-
-  deploy-with-index:
-    needs: build-index
-    runs-on: ubuntu-latest
-    permissions:
-      pages: write
-      id-token: write
-    environment:
-      name: github-pages
-      url: ${{ steps.deploy.outputs.page_url }}
-    # Only deploy on schedule or manual trigger (not on every push to the script)
-    if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
-    steps:
-      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
-
-      - uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093  # v4
-        with:
-          name: skills-index
-          path: website/static/api/
-
-      - uses: actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020  # v4
-        with:
-          node-version: 20
-          cache: npm
-          cache-dependency-path: website/package-lock.json
-
-      - uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065  # v5
-        with:
-          python-version: '3.11'
-
-      - name: Install PyYAML for skill extraction
-        run: pip install pyyaml==6.0.2
-
-      - name: Extract skill metadata for dashboard
-        run: python3 website/scripts/extract-skills.py
-
-      - name: Install dependencies
-        run: npm ci
-        working-directory: website
-
-      - name: Build Docusaurus
-        run: npm run build
-        working-directory: website
-
-      - name: Stage deployment
-        run: |
-          mkdir -p _site/docs
-          cp -r landingpage/* _site/
-          cp -r website/build/* _site/docs/
-          echo "hermes-agent.nousresearch.com" > _site/CNAME
-
-      - name: Upload artifact
-        uses: actions/upload-pages-artifact@56afc609e74202658d3ffba0e8f6dda462b719fa  # v3
-        with:
-          path: _site
-
-      - name: Deploy to GitHub Pages
-        id: deploy
-        uses: actions/deploy-pages@d6db90164ac5ed86f2b6aed7e0febac5b3c0c03e  # v4
--- a/.github/workflows/supply-chain-audit.yml
+++ b/.github/workflows/supply-chain-audit.yml
@@ -1,139 +0,0 @@
-name: Supply Chain Audit
-
-on:
-  pull_request:
-    types: [opened, synchronize, reopened]
-    paths:
-      - '**/*.py'
-      - '**/*.pth'
-      - '**/setup.py'
-      - '**/setup.cfg'
-      - '**/sitecustomize.py'
-      - '**/usercustomize.py'
-      - '**/__init__.pth'
-
-permissions:
-  pull-requests: write
-  contents: read
-
-# Narrow, high-signal scanner. Only fires on critical indicators of supply
-# chain attacks (e.g. the litellm-style payloads). Low-signal heuristics
-# (plain base64, plain exec/eval, dependency/Dockerfile/workflow edits,
-# Actions version unpinning, outbound POST/PUT) were intentionally
-# removed — they fired on nearly every PR and trained reviewers to ignore
-# the scanner. Keep this file's checks ruthlessly narrow: if you find
-# yourself adding WARNING-tier patterns here again, make a separate
-# advisory-only workflow instead.
-
-jobs:
-  scan:
-    name: Scan PR for critical supply chain risks
-    runs-on: ubuntu-latest
-    steps:
-      - name: Checkout
-        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
-        with:
-          fetch-depth: 0
-
-      - name: Scan diff for critical patterns
-        id: scan
-        env:
-          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-        run: |
-          set -euo pipefail
-
-          BASE="${{ github.event.pull_request.base.sha }}"
-          HEAD="${{ github.event.pull_request.head.sha }}"
-
-          # Added lines only, excluding lockfiles.
-          DIFF=$(git diff "$BASE".."$HEAD" -- . ':!uv.lock' ':!*.lock' ':!package-lock.json' ':!yarn.lock' || true)
-
-          FINDINGS=""
-
-          # --- .pth files (auto-execute on Python startup) ---
-          # The exact mechanism used in the litellm supply chain attack:
-          # https://github.com/BerriAI/litellm/issues/24512
-          PTH_FILES=$(git diff --name-only "$BASE".."$HEAD" | grep '\.pth$' || true)
-          if [ -n "$PTH_FILES" ]; then
-            FINDINGS="${FINDINGS}
-          ### 🚨 CRITICAL: .pth file added or modified
-          Python \`.pth\` files in \`site-packages/\` execute automatically when the interpreter starts — no import required.
-
-          **Files:**
-          \`\`\`
-          ${PTH_FILES}
-          \`\`\`
-          "
-          fi
-
-          # --- base64 decode + exec/eval on the same line (the litellm attack pattern) ---
-          B64_EXEC_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -iE 'base64\.(b64decode|decodebytes|urlsafe_b64decode)' | grep -iE 'exec\(|eval\(' | head -10 || true)
-          if [ -n "$B64_EXEC_HITS" ]; then
-            FINDINGS="${FINDINGS}
-          ### 🚨 CRITICAL: base64 decode + exec/eval combo
-          Base64-decoded strings passed directly to exec/eval — the signature of hidden credential-stealing payloads.
-
-          **Matches:**
-          \`\`\`
-          ${B64_EXEC_HITS}
-          \`\`\`
-          "
-          fi
-
-          # --- subprocess with encoded/obfuscated command argument ---
-          PROC_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -E 'subprocess\.(Popen|call|run)\s*\(' | grep -iE 'base64|\\x[0-9a-f]{2}|chr\(' | head -10 || true)
-          if [ -n "$PROC_HITS" ]; then
-            FINDINGS="${FINDINGS}
-          ### 🚨 CRITICAL: subprocess with encoded/obfuscated command
-          Subprocess calls whose command strings are base64- or hex-encoded are a strong indicator of payload execution.
-
-          **Matches:**
-          \`\`\`
-          ${PROC_HITS}
-          \`\`\`
-          "
-          fi
-
-          # --- Install-hook files (setup.py/sitecustomize/usercustomize/__init__.pth) ---
-          # These execute during pip install or interpreter startup.
-          SETUP_HITS=$(git diff --name-only "$BASE".."$HEAD" | grep -E '(^|/)(setup\.py|setup\.cfg|sitecustomize\.py|usercustomize\.py|__init__\.pth)$' || true)
-          if [ -n "$SETUP_HITS" ]; then
-            FINDINGS="${FINDINGS}
-          ### 🚨 CRITICAL: Install-hook file added or modified
-          These files can execute code during package installation or interpreter startup.
-
-          **Files:**
-          \`\`\`
-          ${SETUP_HITS}
-          \`\`\`
-          "
-          fi
-
-          if [ -n "$FINDINGS" ]; then
-            echo "found=true" >> "$GITHUB_OUTPUT"
-            echo "$FINDINGS" > /tmp/findings.md
-          else
-            echo "found=false" >> "$GITHUB_OUTPUT"
-          fi
-
-      - name: Post critical finding comment
-        if: steps.scan.outputs.found == 'true'
-        env:
-          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-        run: |
-          BODY="## 🚨 CRITICAL Supply Chain Risk Detected
-
-          This PR contains a pattern that has been used in real supply chain attacks. A maintainer must review the flagged code carefully before merging.
-
-          $(cat /tmp/findings.md)
-
-          ---
-          *Scanner only fires on high-signal indicators: .pth files, base64+exec/eval combos, subprocess with encoded commands, or install-hook files. Low-signal warnings were removed intentionally — if you're seeing this comment, the finding is worth inspecting.*"
-
-          gh pr comment "${{ github.event.pull_request.number }}" --body "$BODY" || echo "::warning::Could not post PR comment (expected for fork PRs — GITHUB_TOKEN is read-only)"
-
-      - name: Fail on critical findings
-        if: steps.scan.outputs.found == 'true'
-        run: |
-          echo "::error::CRITICAL supply chain risk patterns detected in this PR. See the PR comment for details."
-          exit 1
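
Note on the removed `supply-chain-audit.yml` scan: the "base64 decode + exec/eval" check was a chain of `grep` filters over the PR diff. It kept lines starting with `+`, then lines matching a base64 decode call, then lines that also contain `exec(` or `eval(`, capped at 10 hits. The sketch below restates that pipeline in Python for readability. It is an illustration under assumptions, not code from the repository: the name `b64_exec_hits` is hypothetical, and unlike `grep -n` it does not prefix line numbers.

```python
import re

# Same patterns as the removed workflow's grep -iE filters.
B64_DECODE = re.compile(
    r"base64\.(b64decode|decodebytes|urlsafe_b64decode)", re.IGNORECASE
)
EXEC_EVAL = re.compile(r"exec\(|eval\(", re.IGNORECASE)


def b64_exec_hits(diff_text: str, limit: int = 10) -> list[str]:
    """Return added diff lines that both decode base64 and call exec/eval."""
    hits = []
    for line in diff_text.splitlines():
        # Like grep '^\+', this also matches '+++ b/file' headers;
        # they are harmless because they never match both patterns.
        if not line.startswith("+"):
            continue
        if B64_DECODE.search(line) and EXEC_EVAL.search(line):
            hits.append(line)
            if len(hits) >= limit:  # mirrors `head -10`
                break
    return hits
```

Requiring both patterns on the same added line is the narrowing step the workflow comments describe: a plain `base64.b64decode(...)` or a plain `eval(...)` alone does not produce a finding.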
--- a/.github/workflows/tests.yml
+++ b/.github/workflows/tests.yml
@@ -3,17 +3,8 @@ name: Tests
 on:
  push:
    branches: [main]
-    paths-ignore:
-      - '**/*.md'
-      - 'docs/**'
  pull_request:
    branches: [main]
-    paths-ignore:
-      - '**/*.md'
-      - 'docs/**'
-
-permissions:
-  contents: read

 # Cancel in-progress runs for the same PR/branch
 concurrency:
@@ -23,16 +14,13 @@ concurrency:
 jobs:
  test:
    runs-on: ubuntu-latest
-    timeout-minutes: 20
+    timeout-minutes: 10
    steps:
      - name: Checkout code
-        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
-
-      - name: Install system dependencies
-        run: sudo apt-get update && sudo apt-get install -y ripgrep
+        uses: actions/checkout@v4

      - name: Install uv
-        uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86  # v5
+        uses: astral-sh/setup-uv@v5

      - name: Set up Python 3.11
        run: uv python install 3.11
@@ -46,37 +34,9 @@ jobs:
      - name: Run tests
        run: |
          source .venv/bin/activate
-          python -m pytest tests/ -q --ignore=tests/integration --ignore=tests/e2e --tb=short -n auto
+          python -m pytest tests/ -q --ignore=tests/integration --tb=short
        env:
          # Ensure tests don't accidentally call real APIs
          OPENROUTER_API_KEY: ""
          OPENAI_API_KEY: ""
          NOUS_API_KEY: ""
-
-  e2e:
-    runs-on: ubuntu-latest
-    timeout-minutes: 10
-    steps:
-      - name: Checkout code
-        uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5  # v4
-
-      - name: Install uv
-        uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86  # v5
-
-      - name: Set up Python 3.11
-        run: uv python install 3.11
-
-      - name: Install dependencies
-        run: |
-          uv venv .venv --python 3.11
-          source .venv/bin/activate
-          uv pip install -e ".[all,dev]"
-
-      - name: Run e2e tests
-        run: |
-          source .venv/bin/activate
-          python -m pytest tests/e2e/ -v --tb=short
-        env:
-          OPENROUTER_API_KEY: ""
-          OPENAI_API_KEY: ""
-          NOUS_API_KEY: ""
--- a/.gitignore
+++ b/.gitignore
@@ -1,70 +1,51 @@
-/venv/
-/_pycache/
-*.pyc*
-__pycache__/
-.venv/
-.vscode/
-.env
-.env.local
-.env.development.local
-.env.test.local
-.env.production.local
-.env.development
-.env.test
-export*
-__pycache__/model_tools.cpython-310.pyc
-__pycache__/web_tools.cpython-310.pyc
-logs/
-data/
-.pytest_cache/
-tmp/
-temp_vision_images/
-hermes-*/*
-examples/
-tests/quick_test_dataset.jsonl
-tests/sample_dataset.jsonl
-run_datagen_kimik2-thinking.sh
-run_datagen_megascience_glm4-6.sh
-run_datagen_sonnet.sh
-source-data/*
-run_datagen_megascience_glm4-6.sh
-data/*
-node_modules/
-browser-use/
-agent-browser/
-# Private keys
-*.ppk
-*.pem
-privvy*
-images/
-__pycache__/
-hermes_agent.egg-info/
-wandb/
-testlogs
-
-# CLI config (may contain sensitive SSH paths)
-cli-config.yaml
-
-# Skills Hub state (lives in ~/.hermes/skills/.hub/ at runtime, but just in case)
-skills/.hub/
+/venv/
+/_pycache/
+*.pyc*
+__pycache__/
+.venv/
+.vscode/
+.env
+.env.local
+.env.development.local
+.env.test.local
+.env.production.local
+.env.development
+.env.test
+export*
+__pycache__/model_tools.cpython-310.pyc
+__pycache__/web_tools.cpython-310.pyc
+logs/
+data/
+.pytest_cache/
+tmp/
+temp_vision_images/
+hermes-*/*
+examples/
+tests/quick_test_dataset.jsonl
+tests/sample_dataset.jsonl
+run_datagen_kimik2-thinking.sh
+run_datagen_megascience_glm4-6.sh
+run_datagen_sonnet.sh
+source-data/*
+run_datagen_megascience_glm4-6.sh
+data/*
+node_modules/
+browser-use/
+agent-browser/
+# Private keys
+*.ppk
+*.pem
+privvy*
+images/
+__pycache__/
+hermes_agent.egg-info/
+wandb/
+testlogs
+
+# CLI config (may contain sensitive SSH paths)
+cli-config.yaml
+
+# Skills Hub state (lives in ~/.hermes/skills/.hub/ at runtime, but just in case)
+skills/.hub/
 ignored/
 .worktrees/
-environments/benchmarks/evals/
-
-# Web UI build output
-hermes_cli/web_dist/
-
-# Web UI assets — synced from @nous-research/ui at build time via
-# `npm run sync-assets` (see web/package.json).
-web/public/fonts/
-web/public/ds-assets/
-
-# Release script temp files
-.release_notes.md
-mini-swe-agent/
-
-# Nix
-.direnv/
-.nix-stamps/
-result
-website/static/api/skills-index.json
--- a/.gitmodules
+++ b/.gitmodules
@@ -1,3 +1,6 @@
+[submodule "mini-swe-agent"]
+	path = mini-swe-agent
+	url = https://github.com/SWE-agent/mini-swe-agent
 [submodule "tinker-atropos"]
 	path = tinker-atropos
 	url = https://github.com/nousresearch/tinker-atropos
--- a/.mailmap
+++ b/.mailmap
@@ -1,108 +0,0 @@
-# .mailmap — canonical author mapping for git shortlog / git log / GitHub
-# Format: Canonical Name <canonical@email> <commit@email>
-# See: https://git-scm.com/docs/gitmailmap
-#
-# This maps commit emails to GitHub noreply addresses so that:
-# 1. `git shortlog -sn` shows deduplicated contributor counts
-# 2. GitHub's contributor graph can attribute commits correctly
-# 3. Contributors with personal/work emails get proper credit
-#
-# When adding entries: use the contributor's GitHub noreply email as canonical
-# so GitHub can link commits to their profile.
-
-# === Teknium (multiple emails) ===
-Teknium <127238744+teknium1@users.noreply.github.com> <teknium1@gmail.com>
-Teknium <127238744+teknium1@users.noreply.github.com> <teknium@nousresearch.com>
-
-# === Contributors — personal/work emails mapped to GitHub noreply ===
-# Format: Canonical Name <GH-noreply> <commit-email>
-
-# Verified via GH API email search
-luyao618 <364939526@qq.com> <364939526@qq.com>
-ethernet8023 <arilotter@gmail.com> <arilotter@gmail.com>
-nicoloboschi <boschi1997@gmail.com> <boschi1997@gmail.com>
-cherifya <chef.ya@gmail.com> <chef.ya@gmail.com>
-BongSuCHOI <chlqhdtn98@gmail.com> <chlqhdtn98@gmail.com>
-dsocolobsky <dsocolobsky@gmail.com> <dsocolobsky@gmail.com>
-pefontana <fontana.pedro93@gmail.com> <fontana.pedro93@gmail.com>
-Helmi <frank@helmschrott.de> <frank@helmschrott.de>
-hata1234 <hata1234@gmail.com> <hata1234@gmail.com>
-
-# Verified via PR investigation / salvage PR bodies
-DeployFaith <agents@kylefrench.dev> <agents@kylefrench.dev>
-flobo3 <floptopbot33@gmail.com> <floptopbot33@gmail.com>
-gaixianggeng <gaixg94@gmail.com> <gaixg94@gmail.com>
-KUSH42 <xush@xush.org> <xush@xush.org>
-konsisumer <der@konsi.org> <der@konsi.org>
-WorldInnovationsDepartment <vorvul.danylo@gmail.com> <vorvul.danylo@gmail.com>
-m0n5t3r <iacobs@m0n5t3r.info> <iacobs@m0n5t3r.info>
-sprmn24 <oncuevtv@gmail.com> <oncuevtv@gmail.com>
-fancydirty <fancydirty@gmail.com> <fancydirty@gmail.com>
-fxfitz <francis.x.fitzpatrick@gmail.com> <francis.x.fitzpatrick@gmail.com>
-limars874 <limars874@gmail.com> <limars874@gmail.com>
-AaronWong1999 <aaronwong1999@icloud.com> <aaronwong1999@icloud.com>
-dippwho <dipp.who@gmail.com> <dipp.who@gmail.com>
-duerzy <duerzy@gmail.com> <duerzy@gmail.com>
-geoffwellman <geoff.wellman@gmail.com> <geoff.wellman@gmail.com>
-hcshen0111 <shenhaocheng19990111@gmail.com> <shenhaocheng19990111@gmail.com>
-jamesarch <han.shan@live.cn> <han.shan@live.cn>
-stephenschoettler <stephenschoettler@gmail.com> <stephenschoettler@gmail.com>
-Tranquil-Flow <tranquil_flow@protonmail.com> <tranquil_flow@protonmail.com>
-Dusk1e <yusufalweshdemir@gmail.com> <yusufalweshdemir@gmail.com>
-Awsh1 <ysfalweshcan@gmail.com> <ysfalweshcan@gmail.com>
-WAXLYY <ysfwaxlycan@gmail.com> <ysfwaxlycan@gmail.com>
-donrhmexe <don.rhm@gmail.com> <don.rhm@gmail.com>
-hqhq1025 <1506751656@qq.com> <1506751656@qq.com>
-BlackishGreen33 <s5460703@gmail.com> <s5460703@gmail.com>
-tomqiaozc <zqiao@microsoft.com> <zqiao@microsoft.com>
-MagicRay1217 <mingjwan@microsoft.com> <mingjwan@microsoft.com>
-aaronagent <1115117931@qq.com> <1115117931@qq.com>
-YoungYang963 <young@YoungdeMacBook-Pro.local> <young@YoungdeMacBook-Pro.local>
-LongOddCode <haolong@microsoft.com> <haolong@microsoft.com>
-Cafexss <coffeemjj@gmail.com> <coffeemjj@gmail.com>
-Cygra <sjtuwbh@gmail.com> <sjtuwbh@gmail.com>
-DomGrieco <dgrieco@redhat.com> <dgrieco@redhat.com>
-
-# Duplicate email mapping (same person, multiple emails)
-Sertug17 <104278804+Sertug17@users.noreply.github.com> <srhtsrht17@gmail.com>
-yyovil <birdiegyal@gmail.com> <tanishq231003@gmail.com>
-DomGrieco <dgrieco@redhat.com> <dgrieco@redhat.com>
-dsocolobsky <dsocolobsky@gmail.com> <dylan.socolobsky@lambdaclass.com>
-olafthiele <programming@olafthiele.com> <olafthiele@gmail.com>
-
-# Verified via git display name matching GH contributor username
-cokemine <aptx4561@gmail.com> <aptx4561@gmail.com>
-dalianmao000 <dalianmao0107@gmail.com> <dalianmao0107@gmail.com>
-emozilla <emozilla@nousresearch.com> <emozilla@nousresearch.com>
-jjovalle99 <juan.ovalle@mistral.ai> <juan.ovalle@mistral.ai>
-kagura-agent <kagura.chen28@gmail.com> <kagura.chen28@gmail.com>
-spniyant <niyant@spicefi.xyz> <niyant@spicefi.xyz>
-olafthiele <programming@olafthiele.com> <programming@olafthiele.com>
-r266-tech <r2668940489@gmail.com> <r2668940489@gmail.com>
-xingkongliang <tianliangjay@gmail.com> <tianliangjay@gmail.com>
-win4r <win4r@outlook.com> <win4r@outlook.com>
-zhouboli <zhouboli@gmail.com> <zhouboli@gmail.com>
-yongtenglei <yongtenglei@gmail.com> <yongtenglei@gmail.com>
-
-# Nous Research team
-benbarclay <ben@nousresearch.com> <ben@nousresearch.com>
-jquesnelle <jonny@nousresearch.com> <jonny@nousresearch.com>
-
-# GH contributor list verified
-spideystreet <dhicham.pro@gmail.com> <dhicham.pro@gmail.com>
-dorukardahan <dorukardahan@hotmail.com> <dorukardahan@hotmail.com>
-MustafaKara7 <karamusti912@gmail.com> <karamusti912@gmail.com>
-Hmbown <hmbown@gmail.com> <hmbown@gmail.com>
-kamil-gwozdz <kamil@gwozdz.me> <kamil@gwozdz.me>
-kira-ariaki <kira@ariaki.me> <kira@ariaki.me>
-knopki <knopki@duck.com> <knopki@duck.com>
-Unayung <unayung@gmail.com> <unayung@gmail.com>
-SeeYangZhi <yangzhi.see@gmail.com> <yangzhi.see@gmail.com>
-Julientalbot <julien.talbot@ergonomia.re> <julien.talbot@ergonomia.re>
-lesterli <lisicheng168@gmail.com> <lisicheng168@gmail.com>
-JiayuuWang <jiayuw794@gmail.com> <jiayuw794@gmail.com>
-tesseracttars-creator <tesseracttars@gmail.com> <tesseracttars@gmail.com>
-xinbenlv <zzn+pa@zzn.im> <zzn+pa@zzn.im>
-SaulJWu <saul.jj.wu@gmail.com> <saul.jj.wu@gmail.com>
-angelos <angelos@oikos.lan.home.malaiwah.com> <angelos@oikos.lan.home.malaiwah.com>
-MestreY0d4-Uninter <241404605+MestreY0d4-Uninter@users.noreply.github.com> <MestreY0d4-Uninter@users.noreply.github.com>
--- a/.plans/openai-api-server.md
+++ b/.plans/openai-api-server.md
@@ -1,291 +0,0 @@
-# OpenAI-Compatible API Server for Hermes Agent
-
-## Motivation
-
-Every major chat frontend (Open WebUI 126k★, LobeChat 73k★, LibreChat 34k★,
-AnythingLLM 56k★, NextChat 87k★, ChatBox 39k★, Jan 26k★, HF Chat-UI 8k★,
-big-AGI 7k★) connects to backends via the OpenAI-compatible REST API with
-SSE streaming. By exposing this endpoint, hermes-agent becomes instantly
-usable as a backend for all of them — no custom adapters needed.
-
-## What It Enables
-
-```
-┌──────────────────┐
-│  Open WebUI      │──┐
-│  LobeChat        │  │    POST /v1/chat/completions
-│  LibreChat       │  ├──► Authorization: Bearer <key>     ┌─────────────────┐
-│  AnythingLLM     │  │    {"messages": [...]}             │  hermes-agent   │
-│  NextChat        │  │                                    │  gateway        │
-│  Any OAI client  │──┘    ◄── SSE streaming response      │  (API server)   │
-└──────────────────┘                                        └─────────────────┘
-```
-
-A user would:
-1. Set `API_SERVER_ENABLED=true` in `~/.hermes/.env`
-2. Run `hermes gateway` (API server starts alongside Telegram/Discord/etc.)
-3. Point Open WebUI (or any frontend) at `http://localhost:8642/v1`
-4. Chat with hermes-agent through any OpenAI-compatible UI
-
-## Endpoints
-
-| Method | Path | Purpose |
-|--------|------|---------|
-| POST | `/v1/chat/completions` | Chat with the agent (streaming + non-streaming) |
-| GET | `/v1/models` | List available "models" (returns hermes-agent as a model) |
-| GET | `/health` | Health check |
-
-## Architecture
-
-### Option A: Gateway Platform Adapter (recommended)
-
-Create `gateway/platforms/api_server.py` as a new platform adapter that
-extends `BasePlatformAdapter`. This is the cleanest approach because:
-
-- Reuses all gateway infrastructure (session management, auth, context building)
-- Runs in the same async loop as other adapters
-- Gets message handling, interrupt support, and session persistence for free
-- Follows the established pattern (like Telegram, Discord, etc.)
-- Uses `aiohttp.web` (already a dependency) for the HTTP server
-
-The adapter would start an `aiohttp.web.Application` server in `connect()`
-and route incoming HTTP requests through the standard `handle_message()` pipeline.
-
-### Option B: Standalone Component
-
-A separate HTTP server class in `gateway/api_server.py` that creates its own
-AIAgent instances directly. Simpler but duplicates session/auth logic.
-
-**Recommendation: Option A** — fits the existing architecture, less code to
-maintain, gets all gateway features for free.
-
-## Request/Response Format
-
-### Chat Completions (non-streaming)
-
-```
-POST /v1/chat/completions
-Authorization: Bearer hermes-api-key-here
-Content-Type: application/json
-
-{
-  "model": "hermes-agent",
-  "messages": [
-    {"role": "system", "content": "You are a helpful assistant."},
-    {"role": "user", "content": "What files are in the current directory?"}
-  ],
-  "stream": false,
-  "temperature": 0.7
-}
-```
-
-Response:
-```json
-{
-  "id": "chatcmpl-abc123",
-  "object": "chat.completion",
-  "created": 1710000000,
-  "model": "hermes-agent",
-  "choices": [{
-    "index": 0,
-    "message": {
-      "role": "assistant",
-      "content": "Here are the files in the current directory:\n..."
-    },
-    "finish_reason": "stop"
-  }],
-  "usage": {
-    "prompt_tokens": 50,
-    "completion_tokens": 200,
-    "total_tokens": 250
-  }
-}
-```
-
-### Chat Completions (streaming)
-
-Same request with `"stream": true`. Response is SSE:
-
-```
-data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
-
-data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Here "},"finish_reason":null}]}
-
-data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"are "},"finish_reason":null}]}
-
-data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
-
-data: [DONE]
-```
-
-### Models List
-
-```
-GET /v1/models
-Authorization: Bearer hermes-api-key-here
-```
-
-Response:
-```json
-{
-  "object": "list",
-  "data": [{
-    "id": "hermes-agent",
-    "object": "model",
-    "created": 1710000000,
-    "owned_by": "hermes-agent"
-  }]
-}
-```
-
-## Key Design Decisions
-
-### 1. Session Management
-
-The OpenAI API is stateless — each request includes the full conversation.
-But hermes-agent sessions have persistent state (memory, skills, tool context).
-
-**Approach: Hybrid**
-- Default: Stateless. Each request is independent. The `messages` array IS
-  the conversation. No session persistence between requests.
-- Opt-in persistent sessions via `X-Session-ID` header. When provided, the
-  server maintains session state across requests (conversation history,
-  memory context, tool state). This enables richer agent behavior.
-- The session ID also enables interrupt support — a subsequent request with
-  the same session ID while one is running triggers an interrupt.
-
-### 2. Streaming
-
-The agent's `run_conversation()` is synchronous and returns the full response.
-For real SSE streaming, we need to emit chunks as they're generated.
-
-**Phase 1 (MVP):** Run agent in a thread, return the complete response as
-a single SSE chunk + `[DONE]`. This works with all frontends — they just see
-a fast single-chunk response. Not true streaming but functional.
-
-**Phase 2:** Add a response callback to AIAgent that emits text chunks as the
-LLM generates them. The API server captures these via a queue and streams them
-as SSE events. This gives real token-by-token streaming.
-
-**Phase 3:** Stream tool execution progress too — emit tool call/result events
-as the agent works, giving frontends visibility into what the agent is doing.
-
-### 3. Tool Transparency
-
-Two modes:
-- **Opaque (default):** Frontends see only the final response. Tool calls
-  happen server-side and are invisible. Best for general-purpose UIs.
-- **Transparent (opt-in via header):** Tool calls are emitted as OpenAI-format
-  tool_call/tool_result messages in the stream. Useful for agent-aware frontends.
-
-### 4. Authentication
-
-- Bearer token via `Authorization: Bearer <key>` header
-- Token configured via `API_SERVER_KEY` env var
-- Optional: allow unauthenticated local-only access (127.0.0.1 bind)
-- Follows the same pattern as other platform adapters
-
-### 5. Model Mapping
-
-Frontends send `"model": "hermes-agent"` (or any other name). The actual LLM model
-used is configured server-side in config.yaml. The API server maps any
-requested model name to the configured hermes-agent model.
-
-Optionally, allow model passthrough: if the frontend sends
-`"model": "anthropic/claude-sonnet-4"`, the agent uses that model. Controlled
-by a config flag.
-
-## Configuration
-
-```yaml
-# In config.yaml
-api_server:
-  enabled: true
-  port: 8642
-  host: "127.0.0.1"        # localhost only by default
-  key: "your-secret-key"   # or via API_SERVER_KEY env var
-  allow_model_override: false  # let clients choose the model
-  max_concurrent: 5         # max simultaneous requests
-```
-
-Environment variables:
-```bash
-API_SERVER_ENABLED=true
-API_SERVER_PORT=8642
-API_SERVER_HOST=127.0.0.1
-API_SERVER_KEY=your-secret-key
-```
-
-## Implementation Plan
-
-### Phase 1: MVP (non-streaming) — PR
-
-1. `gateway/platforms/api_server.py` — new adapter
-   - aiohttp.web server with endpoints:
-     - `POST /v1/chat/completions` — Chat Completions API (universal compat)
-     - `POST /v1/responses` — Responses API (server-side state, tool preservation)
-     - `GET /v1/models` — list available models
-     - `GET /health` — health check
-   - Bearer token auth middleware
-   - Non-streaming responses (run agent, return full result)
-   - Chat Completions: stateless, messages array is the conversation
-   - Responses API: server-side conversation storage via previous_response_id
-     - Store full internal conversation (including tool calls) keyed by response ID
-     - On subsequent requests, reconstruct full context from stored chain
-   - Frontend system prompt layered on top of hermes-agent's core prompt
-
-2. `gateway/config.py` — add `Platform.API_SERVER` enum + config
-
-3. `gateway/run.py` — register adapter in `_create_adapter()`
-
-4. Tests in `tests/gateway/test_api_server.py`
-
-### Phase 2: SSE Streaming
-
-1. Add response streaming to both endpoints
-   - Chat Completions: `choices[0].delta.content` SSE format
-   - Responses API: semantic events (response.output_text.delta, etc.)
-   - Run agent in thread, collect output via callback queue
-   - Handle client disconnect (cancel agent)
-
-2. Add `stream_callback` parameter to `AIAgent.run_conversation()`
-
-### Phase 3: Enhanced Features
-
-1. Tool call transparency mode (opt-in)
-2. Model passthrough/override
-3. Concurrent request limiting
-4. Usage tracking / rate limiting
-5. CORS headers for browser-based frontends
-6. GET /v1/responses/{id} — retrieve stored response
-7. DELETE /v1/responses/{id} — delete stored response
-
-## Files Changed
-
-| File | Change |
-|------|--------|
-| `gateway/platforms/api_server.py` | NEW — main adapter (~300 lines) |
-| `gateway/config.py` | Add Platform.API_SERVER + config (~20 lines) |
-| `gateway/run.py` | Register adapter in _create_adapter() (~10 lines) |
-| `tests/gateway/test_api_server.py` | NEW — tests (~200 lines) |
-| `cli-config.yaml.example` | Add api_server section |
-| `README.md` | Mention API server in platform list |
-
-## Compatibility Matrix
-
-Once implemented, hermes-agent works as a drop-in backend for:
-
-| Frontend | Stars | How to Connect |
-|----------|-------|---------------|
-| Open WebUI | 126k | Settings → Connections → Add OpenAI API, URL: `http://localhost:8642/v1` |
-| NextChat | 87k | BASE_URL env var |
-| LobeChat | 73k | Custom provider endpoint |
-| AnythingLLM | 56k | LLM Provider → Generic OpenAI |
-| Oobabooga | 42k | Already a backend, not a frontend |
-| ChatBox | 39k | API Host setting |
-| LibreChat | 34k | librechat.yaml custom endpoint |
-| Chatbot UI | 29k | Custom API endpoint |
-| Jan | 26k | Remote model config |
-| AionUI | 18k | Custom API endpoint |
-| HF Chat-UI | 8k | OPENAI_BASE_URL env var |
-| big-AGI | 7k | Custom endpoint |
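
The streaming wire format shown in the "Chat Completions (streaming)" section of the removed plan can be consumed with a few lines of client code. The following is a minimal sketch, not part of the plan: `collect_sse_text` is a hypothetical helper name, and it assumes the caller has already split the HTTP response body into text lines.

```python
import json
from typing import Iterable


def collect_sse_text(lines: Iterable[str]) -> str:
    """Join the `delta.content` pieces of a Chat Completions SSE stream.

    Hypothetical helper for illustration; assumes each element of `lines`
    is one line of the response body (blank separator lines are allowed).
    """
    parts = []
    for raw in lines:
        line = raw.strip()
        # Only `data:` lines carry payloads; skip blanks and other fields.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        # The literal sentinel `[DONE]` ends the stream and is not JSON.
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:  # role-only and empty deltas carry no text
                parts.append(content)
    return "".join(parts)
```

Fed the four example chunks from that section, the role-only and finish chunks contribute nothing and only the two content deltas are concatenated.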
--- a/.plans/streaming-support.md
+++ b/.plans/streaming-support.md
@@ -1,705 +0,0 @@
-# Streaming LLM Response Support for Hermes Agent
-
-## Overview
-
-Add token-by-token streaming of LLM responses across all platforms. When enabled,
-users see the response typing out live instead of waiting for the full generation.
-Streaming is opt-in via config, defaults to off, and all existing non-streaming
-code paths remain intact as the default.
-
-## Design Principles
-
-1. **Feature-flagged**: `streaming.enabled: true` in config.yaml. Off by default.
-   When off, all existing code paths are unchanged — zero risk to current behavior.
-2. **Callback-based**: A simple `stream_callback(text_delta: str)` function injected
-   into AIAgent. The agent doesn't know or care what the consumer does with tokens.
-3. **Graceful degradation**: If the provider doesn't support streaming, or streaming
-   fails for any reason, silently fall back to the non-streaming path.
-4. **Platform-agnostic core**: The streaming mechanism in AIAgent works the same
-   regardless of whether the consumer is CLI, Telegram, Discord, or the API server.
-
----
-
-## Architecture
-
-```
-                              stream_callback(delta)
-                                    │
-  ┌─────────────┐    ┌─────────────▼──────────────┐
-  │  LLM API    │    │      queue.Queue()          │
-  │  (stream)   │───►│  thread-safe bridge between │
-  │             │    │  agent thread & consumer    │
-  └─────────────┘    └─────────────┬──────────────┘
-                                   │
-                    ┌──────────────┼──────────────┐
-                    │              │              │
-              ┌─────▼─────┐ ┌─────▼─────┐ ┌─────▼─────┐
-              │    CLI     │ │  Gateway  │ │ API Server│
-              │ print to   │ │ edit msg  │ │ SSE event │
-              │ terminal   │ │ on Tg/Dc  │ │ to client │
-              └───────────┘ └───────────┘ └───────────┘
-```
-
-The agent runs in a thread. The callback puts tokens into a thread-safe queue.
-Each consumer reads the queue in its own context (async task, main thread, etc.).
-
----
-
-## Configuration
-
-### config.yaml
-
-```yaml
-streaming:
-  enabled: false          # Master switch. Default off.
-  # Per-platform overrides (optional):
-  # cli: true             # Override for CLI only
-  # telegram: true        # Override for Telegram only
-  # discord: false        # Keep Discord non-streaming
-  # api_server: true      # Override for API server
-```
-
-### Environment variables
-
-```
-HERMES_STREAMING_ENABLED=true    # Master switch via env
-```
-
-### How the flag is read
-
-- **CLI**: `load_cli_config()` reads `streaming.enabled`, sets env var. AIAgent
-  checks at init time.
-- **Gateway**: `_run_agent()` reads config, decides whether to pass
-  `stream_callback` to the AIAgent constructor.
-- **API server**: For Chat Completions `stream=true` requests, always uses streaming
-  regardless of config (the client is explicitly requesting it). For non-stream
-  requests, uses config.
-
-### Precedence
-
-1. API server: client's `stream` field overrides everything
-2. Per-platform config override (e.g., `streaming.telegram: true`)
-3. Master `streaming.enabled` flag
-4. Default: off
-
----
-
-## Implementation Plan
-
-### Phase 1: Core streaming infrastructure in AIAgent
-
-**File: run_agent.py**
-
-#### 1a. Add stream_callback parameter to __init__ (~5 lines)
-
-```python
-# Requires: from typing import Callable, Optional
-def __init__(self, ..., stream_callback: Optional[Callable[[Optional[str]], None]] = None, ...):
-    self.stream_callback = stream_callback
-```
-
-No other init changes. The callback is optional — when None, everything
-works exactly as before.
-
-#### 1b. Add _run_streaming_chat_completion() method (~65 lines)
-
-New method for Chat Completions API streaming:
-
-```python
-def _run_streaming_chat_completion(self, api_kwargs: dict):
-    """Stream a chat completion, emitting text tokens via stream_callback.
-    
-    Returns a fake response object compatible with the non-streaming code path.
-    Falls back to non-streaming on any error.
-    """
-    stream_kwargs = dict(api_kwargs)
-    stream_kwargs["stream"] = True
-    stream_kwargs["stream_options"] = {"include_usage": True}
-    
-    accumulated_content = []
-    accumulated_tool_calls = {}  # index -> {id, name, arguments}
-    final_usage = None
-    
-    try:
-        stream = self.client.chat.completions.create(**stream_kwargs)
-        
-        for chunk in stream:
-            if not chunk.choices:
-                # Usage-only chunk (final)
-                if chunk.usage:
-                    final_usage = chunk.usage
-                continue
-            
-            delta = chunk.choices[0].delta
-            
-            # Text content — emit via callback
-            if delta.content:
-                accumulated_content.append(delta.content)
-                if self.stream_callback:
-                    try:
-                        self.stream_callback(delta.content)
-                    except Exception:
-                        pass
-            
-            # Tool call deltas — accumulate silently
-            if delta.tool_calls:
-                for tc_delta in delta.tool_calls:
-                    idx = tc_delta.index
-                    if idx not in accumulated_tool_calls:
-                        accumulated_tool_calls[idx] = {
-                            "id": tc_delta.id or "",
-                            "name": "", "arguments": ""
-                        }
-                    if tc_delta.function:
-                        if tc_delta.function.name:
-                            accumulated_tool_calls[idx]["name"] = tc_delta.function.name
-                        if tc_delta.function.arguments:
-                            accumulated_tool_calls[idx]["arguments"] += tc_delta.function.arguments
-        
-        # Build fake response compatible with existing code
-        tool_calls = []
-        for idx in sorted(accumulated_tool_calls):
-            tc = accumulated_tool_calls[idx]
-            if tc["name"]:
-                tool_calls.append(SimpleNamespace(
-                    id=tc["id"], type="function",
-                    function=SimpleNamespace(name=tc["name"], arguments=tc["arguments"]),
-                ))
-        
-        return SimpleNamespace(
-            choices=[SimpleNamespace(
-                message=SimpleNamespace(
-                    content="".join(accumulated_content) or "",
-                    tool_calls=tool_calls or None,
-                    role="assistant",
-                ),
-                finish_reason="tool_calls" if tool_calls else "stop",
-            )],
-            usage=final_usage,
-            model=self.model,
-        )
-    
-    except Exception as e:
-        logger.debug("Streaming failed, falling back to non-streaming: %s", e)
-        return self.client.chat.completions.create(**api_kwargs)
-```
-
-#### 1c. Modify _run_codex_stream() for Responses API (~10 lines)
-
-The method already iterates the stream. Add callback emission:
-
-```python
-def _run_codex_stream(self, api_kwargs: dict):
-    with self.client.responses.stream(**api_kwargs) as stream:
-        for event in stream:
-            # Emit text deltas if streaming callback is set
-            if self.stream_callback and hasattr(event, 'type'):
-                if event.type == 'response.output_text.delta':
-                    try:
-                        self.stream_callback(event.delta)
-                    except Exception:
-                        pass
-        return stream.get_final_response()
-```
-
-#### 1d. Modify _interruptible_api_call() (~5 lines)
-
-Add the streaming branch:
-
-```python
-def _call():
-    try:
-        if self.api_mode == "codex_responses":
-            result["response"] = self._run_codex_stream(api_kwargs)
-        elif self.stream_callback is not None:
-            result["response"] = self._run_streaming_chat_completion(api_kwargs)
-        else:
-            result["response"] = self.client.chat.completions.create(**api_kwargs)
-    except Exception as e:
-        result["error"] = e
-```
-
-#### 1e. Signal end-of-stream to consumers (~5 lines)
-
-After the turn's final API call returns (the one with no further tool calls), signal
-the callback that streaming is done so consumers can finalize (remove cursor, close SSE, etc.):
-
-```python
-# In run_conversation(), after the final (non-tool-call) _interruptible_api_call returns:
-if self.stream_callback:
-    try:
-        self.stream_callback(None)  # None = end of stream signal
-    except Exception:
-        pass
-```
-
-Consumers check: `if delta is None: finalize()`
-
-**Tests for Phase 1:** (~150 lines)
-- Test _run_streaming_chat_completion with mocked stream
-- Test fallback to non-streaming on error
-- Test tool_call accumulation during streaming
-- Test stream_callback receives correct deltas
-- Test None signal at end of stream
-- Test streaming disabled when callback is None
-
----
-
-### Phase 2: Gateway consumers (Telegram, Discord, etc.)
-
-**File: gateway/run.py**
-
-#### 2a. Read streaming config (~15 lines)
-
-In `_run_agent()`, before creating the AIAgent:
-
-```python
-# Read streaming config
-_streaming_enabled = False
-try:
-    # Check per-platform override first
-    platform_key = source.platform.value if source.platform else ""
-    _stream_cfg = {}  # loaded from config.yaml streaming section
-    if _stream_cfg.get(platform_key) is not None:
-        _streaming_enabled = bool(_stream_cfg[platform_key])
-    else:
-        _streaming_enabled = bool(_stream_cfg.get("enabled", False))
-except Exception:
-    pass
-# Env var override
-if os.getenv("HERMES_STREAMING_ENABLED", "").lower() in ("true", "1", "yes"):
-    _streaming_enabled = True
-```
-
-#### 2b. Set up queue + callback (~15 lines)
-
-```python
-_stream_q = None
-_stream_done = None
-_stream_msg_id = [None]  # mutable ref for the async task
-
-if _streaming_enabled:
-    import queue as _q
-    _stream_q = _q.Queue()
-    _stream_done = threading.Event()
-    
-    def _on_token(delta):
-        if delta is None:
-            _stream_done.set()
-        else:
-            _stream_q.put(delta)
-```
-
-Pass `stream_callback=_on_token` to the AIAgent constructor.
-
-#### 2c. Telegram/Discord stream preview task (~50 lines)
-
-```python
-async def stream_preview():
-    """Progressively edit a message with streaming tokens."""
-    if not _stream_q:
-        return
-    adapter = self.adapters.get(source.platform)
-    if not adapter:
-        return
-    
-    accumulated = []
-    token_count = 0
-    last_edit = 0.0
-    MIN_TOKENS = 20          # Don't show until enough context
-    EDIT_INTERVAL = 1.5      # Respect Telegram rate limits
-    
-    try:
-        while not _stream_done.is_set():
-            try:
-                chunk = _stream_q.get_nowait()  # non-blocking: keep the event loop free
-            except queue.Empty:
-                await asyncio.sleep(0.1)
-                continue
-            accumulated.append(chunk)
-            token_count += 1
-            
-            now = time.monotonic()
-            if token_count >= MIN_TOKENS and (now - last_edit) >= EDIT_INTERVAL:
-                preview = "".join(accumulated) + " ▌"
-                if _stream_msg_id[0] is None:
-                    r = await adapter.send(
-                        chat_id=source.chat_id,
-                        content=preview,
-                        metadata=_thread_metadata,
-                    )
-                    if r.success and r.message_id:
-                        _stream_msg_id[0] = r.message_id
-                else:
-                    await adapter.edit_message(
-                        chat_id=source.chat_id,
-                        message_id=_stream_msg_id[0],
-                        content=preview,
-                    )
-                last_edit = now
-        
-        # Drain remaining tokens
-        while not _stream_q.empty():
-            accumulated.append(_stream_q.get_nowait())
-        
-        # Final edit — remove cursor, show complete text
-        if _stream_msg_id[0] and accumulated:
-            await adapter.edit_message(
-                chat_id=source.chat_id,
-                message_id=_stream_msg_id[0],
-                content="".join(accumulated),
-            )
-    
-    except asyncio.CancelledError:
-        # Clean up on cancel
-        if _stream_msg_id[0] and accumulated:
-            try:
-                await adapter.edit_message(
-                    chat_id=source.chat_id,
-                    message_id=_stream_msg_id[0],
-                    content="".join(accumulated),
-                )
-            except Exception:
-                pass
-    except Exception as e:
-        logger.debug("stream_preview error: %s", e)
-```
-
-#### 2d. Skip final send if already streamed (~10 lines)
-
-In `_process_message_background()` (base.py), after getting the response,
-if streaming was active and `_stream_msg_id[0]` is set, the final response
-was already delivered via progressive edits. Skip the normal `self.send()`
-call to avoid duplicating the message.
-
-This is the most delicate integration point — we need to communicate from
-the gateway's `_run_agent` back to the base adapter's response sender that
-the response was already delivered. Options:
-
-- **Option A**: Return a special marker in the result dict:
-  `result["_streamed_msg_id"] = _stream_msg_id[0]`
-  The base adapter checks this and skips `send()`.
-
-- **Option B**: Edit the already-sent message with the final response
-  (which may differ slightly from accumulated tokens due to think-block
-  stripping, etc.) and don't send a new one.
-
-- **Option C**: The stream preview task handles the FULL final response
-  (including any post-processing), and the handler returns None to skip
-  the normal send path.
-
-Recommended: **Option A** — cleanest separation. The result dict already
-carries metadata; adding one more field is low-risk.
-
-**Platform-specific considerations:**
-
-| Platform | Edit support | Rate limits | Streaming approach |
-|----------|-------------|-------------|-------------------|
-| Telegram | ✅ edit_message_text | ~20 edits/min | Edit every 1.5s |
-| Discord | ✅ message.edit | 5 edits/5s per message | Edit every 1.2s |
-| Slack | ✅ chat.update | Tier 3 (~50/min) | Edit every 1.5s |
-| WhatsApp | ❌ no edit support | N/A | Skip streaming, use normal path |
-| HomeAssistant | ❌ no edit | N/A | Skip streaming |
-| API Server | ✅ SSE native | No limit | Real SSE events |
-
-WhatsApp and HomeAssistant fall back to non-streaming automatically because
-they don't support message editing.
-
-**Tests for Phase 2:** (~100 lines)
-- Test stream_preview sends/edits correctly
-- Test skip-final-send when streaming delivered
-- Test WhatsApp/HA graceful fallback
-- Test streaming disabled per-platform config
-- Test thread_id metadata forwarded in stream messages
-
----
-
-### Phase 3: CLI streaming
-
-**File: cli.py**
-
-#### 3a. Set up callback in the CLI chat loop (~20 lines)
-
-In `_chat_once()` or wherever the agent is invoked:
-
-```python
-if streaming_enabled:
-    _stream_q = queue.Queue()
-    _stream_done = threading.Event()
-    
-    def _cli_stream_callback(delta):
-        if delta is None:
-            _stream_done.set()
-        else:
-            _stream_q.put(delta)
-    
-    agent.stream_callback = _cli_stream_callback
-```
-
-#### 3b. Token display thread/task (~30 lines)
-
-Start a thread that reads the queue and prints tokens:
-
-```python
-def _stream_display():
-    """Print tokens to terminal as they arrive."""
-    first_token = True
-    while not _stream_done.is_set():
-        try:
-            delta = _stream_q.get(timeout=0.1)
-        except queue.Empty:
-            continue
-        if first_token:
-            # Print response box top border
-            _cprint(f"\n{top}")
-            first_token = False
-        sys.stdout.write(delta)
-        sys.stdout.flush()
-    # Drain remaining
-    while not _stream_q.empty():
-        sys.stdout.write(_stream_q.get_nowait())
-    sys.stdout.flush()
-    # Print bottom border
-    _cprint(f"\n\n{bot}")
-```
-
-**Integration challenge: prompt_toolkit**
-
-The CLI uses prompt_toolkit which controls the terminal. Writing directly
-to stdout while prompt_toolkit is active can cause display corruption.
-The existing KawaiiSpinner already solves this by using prompt_toolkit's
-`patch_stdout` context. The streaming display would need to do the same.
-
-Alternative: use `_cprint()` for each token chunk (routes through
-prompt_toolkit's renderer). But this might be slow for individual tokens.
-
-Recommended approach: accumulate tokens in small batches (e.g., every 50ms)
-and `_cprint()` the batch. This balances display responsiveness with
-prompt_toolkit compatibility.
-
-**Tests for Phase 3:** (~50 lines)
-- Test CLI streaming callback setup
-- Test response box borders with streaming
-- Test fallback when streaming disabled
-
----
-
-### Phase 4: API Server real streaming
-
-**File: gateway/platforms/api_server.py**
-
-Replace the pseudo-streaming `_write_sse_chat_completion()` with real
-token-by-token SSE when the agent supports it.
-
-#### 4a. Wire streaming callback for stream=true requests (~20 lines)
-
-```python
-if stream:
-    _stream_q = queue.Queue()
-    
-    def _api_stream_callback(delta):
-        _stream_q.put(delta)  # None = done
-    
-    # Pass callback to _run_agent
-    result, usage = await self._run_agent(
-        ..., stream_callback=_api_stream_callback,
-    )
-```
-
-#### 4b. Real SSE writer (~40 lines)
-
-```python
-async def _write_real_sse(self, request, completion_id, model, stream_q):
-    response = web.StreamResponse(
-        headers={"Content-Type": "text/event-stream", "Cache-Control": "no-cache"},
-    )
-    await response.prepare(request)
-    
-    # Role chunk
-    await response.write(...)
-    
-    # Stream content chunks as they arrive
-    while True:
-        try:
-            delta = await asyncio.get_running_loop().run_in_executor(
-                None, lambda: stream_q.get(timeout=0.1)
-            )
-        except queue.Empty:
-            continue
-        
-        if delta is None:  # End of stream
-            break
-        
-        chunk = {"id": completion_id, "object": "chat.completion.chunk", ...
-                 "choices": [{"delta": {"content": delta}, ...}]}
-        await response.write(f"data: {json.dumps(chunk)}\n\n".encode())
-    
-    # Finish + [DONE]
-    await response.write(...)
-    await response.write(b"data: [DONE]\n\n")
-    return response
-```
-
-**Challenge: concurrent execution**
-
-The agent runs in a thread executor. SSE writing happens in the async event
-loop. The queue bridges them. But `_run_agent()` currently awaits the full
-result before returning. For real streaming, we need to start the agent in
-the background and stream tokens while it runs:
-
-```python
-# Start agent in background
-agent_task = asyncio.create_task(self._run_agent_async(...))
-
-# Stream tokens while agent runs
-await self._write_real_sse(request, ..., stream_q)
-
-# Agent is done by now (stream_q received None)
-result, usage = await agent_task
-```
-
-This requires splitting `_run_agent` into an async version that doesn't
-block waiting for the result, or running it in a separate task.
-
-**Responses API SSE format:**
-
-For `/v1/responses` with `stream=true`, the SSE events are different:
-
-```
-event: response.output_text.delta
-data: {"type":"response.output_text.delta","delta":"Hello"}
-
-event: response.completed  
-data: {"type":"response.completed","response":{...}}
-```
-
-This needs a separate SSE writer that emits Responses API format events.
-
-**Tests for Phase 4:** (~80 lines)
-- Test real SSE streaming with mocked agent
-- Test SSE event format (Chat Completions vs Responses)
-- Test client disconnect during streaming
-- Test fallback to pseudo-streaming when callback not available
-
----
-
-## Integration Issues & Edge Cases
-
-### 1. Tool calls during streaming
-
-When the model returns tool calls instead of text, no text tokens are emitted.
-The stream_callback is simply never called with text. After tools execute, the
-next API call may produce the final text response — streaming picks up again.
-
-The stream preview task needs to handle this: if no tokens arrive during a
-tool-call round, don't send/edit any message. The tool progress messages
-continue working as before.
-
-### 2. Duplicate messages
-
-The biggest risk: the agent sends the final response normally (via the
-existing send path) AND the stream preview already showed it. The user
-sees the response twice.
-
-Prevention: when streaming is active and tokens were delivered, the final
-response send must be suppressed. The `result["_streamed_msg_id"]` marker
-tells the base adapter to skip its normal send.
-
-### 3. Response post-processing
-
-The final response may differ from the accumulated streamed tokens:
-- Think block stripping (`<think>...</think>` removed)
-- Trailing whitespace cleanup
-- Tool result media tag appending
-
-The stream preview shows raw tokens. The final edit should use the
-post-processed version. This means the final edit (removing the cursor)
-should use the post-processed `final_response`, not just the accumulated
-stream text.
-
-### 4. Context compression during streaming
-
-If the agent triggers context compression mid-conversation, the streaming
-tokens from BEFORE compression are from a different context than those
-after. This isn't a problem in practice — compression happens between
-API calls, not during streaming.
-
-### 5. Interrupt during streaming
-
-User sends a new message while streaming → interrupt. The stream is killed
-(HTTP connection closed), accumulated tokens are shown as-is (no cursor),
-and the interrupt message is processed normally. This is already handled by
-`_interruptible_api_call` closing the client.
-
-### 6. Multi-model / fallback
-
-If the primary model fails and the agent falls back to a different model,
-streaming state resets. The fallback call may or may not support streaming.
-The graceful fallback in `_run_streaming_chat_completion` handles this.
-
-### 7. Rate limiting on edits
-
-Telegram: ~20 edits/minute (~1 every 3 seconds to be safe)
-Discord: 5 edits per 5 seconds per message
-Slack: ~50 API calls/minute
-
-The 1.5s edit interval fits Discord and Slack, but at ~40 edits/minute it exceeds Telegram's ~20/minute, so
-Telegram likely needs ~3s. If an edit returns a 429 rate limit error, skip that cycle and retry on the next.
-
----
-
-## Files Changed Summary
-
-| File | Phase | Changes |
-|------|-------|---------|
-| `run_agent.py` | 1 | +stream_callback param, +_run_streaming_chat_completion(), modify _run_codex_stream(), modify _interruptible_api_call() |
-| `gateway/run.py` | 2 | +streaming config reader, +queue/callback setup, +stream_preview task, +skip-final-send logic |
-| `gateway/platforms/base.py` | 2 | +check for _streamed_msg_id in response handler |
-| `cli.py` | 3 | +streaming setup, +token display, +response box integration |
-| `gateway/platforms/api_server.py` | 4 | +real SSE writer, +streaming callback wiring |
-| `hermes_cli/config.py` | 1 | +streaming config defaults |
-| `cli-config.yaml.example` | 1 | +streaming section |
-| `tests/test_streaming.py` | 1-4 | NEW — ~380 lines of tests |
-
-**Total new code**: ~500 lines across all phases
-**Total test code**: ~380 lines
-
----
-
-## Rollout Plan
-
-1. **Phase 1** (core): Merge to main. Streaming disabled by default.
-   Zero impact on existing behavior. Can be tested with env var.
-
-2. **Phase 2** (gateway): Merge to main. Test on Telegram manually.
-   Enable per-platform: `streaming.telegram: true` in config.
-
-3. **Phase 3** (CLI): Merge to main. Test in terminal.
-   Enable: `streaming.cli: true` or `streaming.enabled: true`.
-
-4. **Phase 4** (API server): Merge to main. Test with Open WebUI.
-   Auto-enabled when client sends `stream: true`.
-
-Each phase is independently mergeable and testable. Streaming stays
-off by default throughout. Once all phases are stable, consider
-changing the default to enabled.
-
----
-
-## Config Reference (final state)
-
-```yaml
-# config.yaml
-streaming:
-  enabled: false          # Master switch (default: off)
-  cli: true               # Per-platform override
-  telegram: true
-  discord: true
-  slack: true
-  api_server: true        # API server always streams when client requests it
-  edit_interval: 1.5      # Seconds between message edits (default: 1.5)
-  min_tokens: 20          # Tokens before first display (default: 20)
-```
-
-```bash
-# Environment variable override
-HERMES_STREAMING_ENABLED=true
-```
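
The precedence rules in the removed streaming plan (client `stream` field, then per-platform override, then the master switch, then off) can be expressed as one small function. This is a sketch under stated assumptions rather than code from the plan: `resolve_streaming` and its parameters are hypothetical names, the `streaming` section is assumed to be already loaded into a dict, and the `HERMES_STREAMING_ENABLED` env var is left out because the plan's precedence list does not say where it ranks.

```python
from typing import Any, Mapping, Optional


def resolve_streaming(
    streaming_cfg: Mapping[str, Any],
    platform: str,
    client_stream: Optional[bool] = None,
) -> bool:
    """Decide whether to stream, following the plan's precedence list.

    Hypothetical helper: `streaming_cfg` is the parsed `streaming:` section
    of config.yaml, `platform` is a key such as "cli" or "telegram".
    """
    # 1. API server: an explicit `stream: true` from the client always wins.
    #    (Assumption: non-stream requests fall through to config, as the
    #    "How the flag is read" section describes.)
    if platform == "api_server" and client_stream:
        return True
    # 2. Per-platform override, only if it is an actual boolean. This also
    #    ignores numeric keys such as `edit_interval` or `min_tokens`.
    override = streaming_cfg.get(platform)
    if isinstance(override, bool):
        return override
    # 3. Master switch, 4. default off.
    return bool(streaming_cfg.get("enabled", False))
```

With the "final state" config above, `resolve_streaming(cfg, "telegram")` is true through the per-platform override even though `enabled` is false, while a platform with no entry, such as WhatsApp, falls through to the master switch and stays off.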
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -5,66 +5,53 @@ Instructions for AI coding assistants and developers working on the hermes-agent
 ## Development Environment

 ```bash
-source venv/bin/activate  # ALWAYS activate before running Python
+source .venv/bin/activate  # ALWAYS activate before running Python
 ```

 ## Project Structure

 ```
 hermes-agent/
-├── hermes_agent/             # Single installable package
-│   ├── agent/                # Core conversation loop and agent internals
-│   │   ├── loop.py               # AIAgent class — core conversation loop
-│   │   ├── prompt_builder.py     # System prompt assembly
-│   │   ├── context/              # Context management (engine, compressor, references)
-│   │   ├── memory/               # Memory management (manager, provider)
-│   │   ├── image_gen/            # Image generation (provider, registry)
-│   │   ├── display.py            # KawaiiSpinner, tool preview formatting
-│   │   ├── skill_commands.py     # Skill slash commands (shared CLI/gateway)
-│   │   └── trajectory.py         # Trajectory saving helpers
-│   ├── providers/            # LLM provider adapters and transports
-│   │   ├── anthropic_adapter.py  # Anthropic adapter
-│   │   ├── anthropic_transport.py # Anthropic transport
-│   │   ├── metadata.py           # Model context lengths, token estimation
-│   │   ├── auxiliary.py           # Auxiliary LLM client (vision, summarization)
-│   │   ├── caching.py            # Anthropic prompt caching
-│   │   └── credential_pool.py    # Credential management
-│   ├── tools/                # Tool implementations
-│   │   ├── dispatch.py           # Tool orchestration, discover_builtin_tools()
-│   │   ├── toolsets.py           # Toolset definitions
-│   │   ├── registry.py           # Central tool registry
-│   │   ├── terminal.py           # Terminal orchestration
-│   │   ├── browser/              # Browser tools (tool, cdp, camofox, providers/)
-│   │   ├── mcp/                  # MCP client and server
-│   │   ├── skills/               # Skill management (manager, tool, hub, guard, sync)
-│   │   ├── media/                # Voice, TTS, transcription, image gen
-│   │   ├── files/                # File operations (tools, operations, state)
-│   │   └── security/             # Path security, URL safety, approval
-│   ├── backends/             # Terminal backends (local, docker, ssh, modal, daytona, singularity)
-│   ├── cli/                  # CLI subcommands and setup
-│   │   ├── main.py               # Entry point — all `hermes` subcommands
-│   │   ├── repl.py               # HermesCLI class — interactive CLI orchestrator
-│   │   ├── config.py             # DEFAULT_CONFIG, OPTIONAL_ENV_VARS, migration
-│   │   ├── commands.py           # Slash command definitions
-│   │   ├── auth/                 # Provider credential resolution
-│   │   ├── models/               # Model catalog, provider lists, switching
-│   │   └── ui/                   # Banner, colors, skin engine, callbacks, tips
-│   ├── gateway/              # Messaging platform gateway
-│   │   ├── run.py                # Main loop, slash commands, message dispatch
-│   │   ├── session.py            # SessionStore — conversation persistence
-│   │   └── platforms/            # Adapters: telegram, discord, slack, whatsapp, etc.
-│   ├── acp/                  # ACP server (VS Code / Zed / JetBrains integration)
-│   ├── cron/                 # Scheduler (jobs.py, scheduler.py)
-│   ├── plugins/              # Plugin system (memory providers, context engines)
-│   ├── constants.py          # Shared constants
-│   ├── state.py              # SessionDB — SQLite session store
-│   ├── logging.py            # Logging configuration
-│   └── utils.py              # Shared utilities
-├── tui_gateway/          # Python JSON-RPC backend for the TUI
-├── ui-tui/               # Ink (React) terminal UI — `hermes --tui`
+├── run_agent.py          # AIAgent class — core conversation loop
+├── model_tools.py        # Tool orchestration, _discover_tools(), handle_function_call()
+├── toolsets.py           # Toolset definitions, _HERMES_CORE_TOOLS list
+├── cli.py                # HermesCLI class — interactive CLI orchestrator
+├── hermes_state.py       # SessionDB — SQLite session store (FTS5 search)
+├── agent/                # Agent internals
+│   ├── prompt_builder.py     # System prompt assembly
+│   ├── context_compressor.py # Auto context compression
+│   ├── prompt_caching.py     # Anthropic prompt caching
+│   ├── auxiliary_client.py   # Auxiliary LLM client (vision, summarization)
+│   ├── model_metadata.py     # Model context lengths, token estimation
+│   ├── display.py            # KawaiiSpinner, tool preview formatting
+│   ├── skill_commands.py     # Skill slash commands (shared CLI/gateway)
+│   └── trajectory.py         # Trajectory saving helpers
+├── hermes_cli/           # CLI subcommands and setup
+│   ├── main.py           # Entry point — all `hermes` subcommands
+│   ├── config.py         # DEFAULT_CONFIG, OPTIONAL_ENV_VARS, migration
+│   ├── commands.py       # Slash command definitions + SlashCommandCompleter
+│   ├── callbacks.py      # Terminal callbacks (clarify, sudo, approval)
+│   └── setup.py          # Interactive setup wizard
+├── tools/                # Tool implementations (one file per tool)
+│   ├── registry.py       # Central tool registry (schemas, handlers, dispatch)
+│   ├── approval.py       # Dangerous command detection
+│   ├── terminal_tool.py  # Terminal orchestration
+│   ├── process_registry.py # Background process management
+│   ├── file_tools.py     # File read/write/search/patch
+│   ├── web_tools.py      # Firecrawl search/extract
+│   ├── browser_tool.py   # Browserbase browser automation
+│   ├── code_execution_tool.py # execute_code sandbox
+│   ├── delegate_tool.py  # Subagent delegation
+│   ├── mcp_tool.py       # MCP client (~1050 lines)
+│   └── environments/     # Terminal backends (local, docker, ssh, modal, daytona, singularity)
+├── gateway/              # Messaging platform gateway
+│   ├── run.py            # Main loop, slash commands, message dispatch
+│   ├── session.py        # SessionStore — conversation persistence
+│   └── platforms/        # Adapters: telegram, discord, slack, whatsapp, homeassistant, signal
+├── cron/                 # Scheduler (jobs.py, scheduler.py)
 ├── environments/         # RL training environments (Atropos)
-├── tests/                # Pytest suite
-└── web/                  # Vite + React web dashboard
+├── tests/                # Pytest suite (~2500+ tests)
+└── batch_runner.py       # Parallel batch processing
 ```

 **User config:** `~/.hermes/config.yaml` (settings), `~/.hermes/.env` (API keys)
@@ -72,18 +59,18 @@ hermes-agent/
 ## File Dependency Chain

 ```
-hermes_agent/tools/registry.py  (no deps — imported by all tool files)
+tools/registry.py  (no deps — imported by all tool files)
       ↑
-hermes_agent/tools/*.py  (each calls registry.register() at import time)
+tools/*.py  (each calls registry.register() at import time)
       ↑
-hermes_agent/tools/dispatch.py  (imports registry + triggers tool discovery)
+model_tools.py  (imports tools/registry + triggers tool discovery)
       ↑
-hermes_agent/agent/loop.py, hermes_agent/cli/repl.py, environments/
+run_agent.py, cli.py, batch_runner.py, environments/
 ```

 ---

-## AIAgent Class (hermes_agent/agent/loop.py)
+## AIAgent Class (run_agent.py)

 ```python
 class AIAgent:
@@ -129,116 +116,25 @@ Messages follow OpenAI format: `{"role": "system/user/assistant/tool", ...}`. Re

 ---

-## CLI Architecture (hermes_agent/cli/repl.py)
+## CLI Architecture (cli.py)

 - **Rich** for banner/panels, **prompt_toolkit** for input with autocomplete
-- **KawaiiSpinner** (`hermes_agent/agent/display.py`) — animated faces during API calls, `┊` activity feed for tool results
-- `load_cli_config()` in repl.py merges hardcoded defaults + user config YAML
-- **Skin engine** (`hermes_agent/cli/ui/skin_engine.py`) — data-driven CLI theming; initialized from `display.skin` config key at startup; skins customize banner colors, spinner faces/verbs/wings, tool prefix, response box, branding text
-- `process_command()` is a method on `HermesCLI` — dispatches on canonical command name resolved via `resolve_command()` from the central registry
-- Skill slash commands: `hermes_agent/agent/skill_commands.py` scans `~/.hermes/skills/`, injects as **user message** (not system prompt) to preserve prompt caching
+- **KawaiiSpinner** (`agent/display.py`) — animated faces during API calls, `┊` activity feed for tool results
+- `load_cli_config()` in cli.py merges hardcoded defaults + user config YAML
+- `process_command()` is a method on `HermesCLI` (not in commands.py)
+- Skill slash commands: `agent/skill_commands.py` scans `~/.hermes/skills/`, injects as **user message** (not system prompt) to preserve prompt caching

-### Slash Command Registry (`hermes_cli/commands.py`)
+### Adding CLI Commands

-All slash commands are defined in a central `COMMAND_REGISTRY` list of `CommandDef` objects. Every downstream consumer derives from this registry automatically:
-
-- **CLI** — `process_command()` resolves aliases via `resolve_command()`, dispatches on canonical name
-- **Gateway** — `GATEWAY_KNOWN_COMMANDS` frozenset for hook emission, `resolve_command()` for dispatch
-- **Gateway help** — `gateway_help_lines()` generates `/help` output
-- **Telegram** — `telegram_bot_commands()` generates the BotCommand menu
-- **Slack** — `slack_subcommand_map()` generates `/hermes` subcommand routing
-- **Autocomplete** — `COMMANDS` flat dict feeds `SlashCommandCompleter`
-- **CLI help** — `COMMANDS_BY_CATEGORY` dict feeds `show_help()`
-
-### Adding a Slash Command
-
-1. Add a `CommandDef` entry to `COMMAND_REGISTRY` in `hermes_cli/commands.py`:
-```python
-CommandDef("mycommand", "Description of what it does", "Session",
-           aliases=("mc",), args_hint="[arg]"),
-```
-2. Add handler in `HermesCLI.process_command()` in `cli.py`:
-```python
-elif canonical == "mycommand":
-    self._handle_mycommand(cmd_original)
-```
-3. If the command is available in the gateway, add a handler in `gateway/run.py`:
-```python
-if canonical == "mycommand":
-    return await self._handle_mycommand(event)
-```
-4. For persistent settings, use `save_config_value()` in `cli.py`
-
-**CommandDef fields:**
-- `name` — canonical name without slash (e.g. `"background"`)
-- `description` — human-readable description
-- `category` — one of `"Session"`, `"Configuration"`, `"Tools & Skills"`, `"Info"`, `"Exit"`
-- `aliases` — tuple of alternative names (e.g. `("bg",)`)
-- `args_hint` — argument placeholder shown in help (e.g. `"<prompt>"`, `"[name]"`)
-- `cli_only` — only available in the interactive CLI
-- `gateway_only` — only available in messaging platforms
-- `gateway_config_gate` — config dotpath (e.g. `"display.tool_progress_command"`); when set on a `cli_only` command, the command becomes available in the gateway if the config value is truthy. `GATEWAY_KNOWN_COMMANDS` always includes config-gated commands so the gateway can dispatch them; help/menus only show them when the gate is open.
-
-**Adding an alias** requires only adding it to the `aliases` tuple on the existing `CommandDef`. No other file changes needed — dispatch, help text, Telegram menu, Slack mapping, and autocomplete all update automatically.
-
----
-
-## TUI Architecture (ui-tui + tui_gateway)
-
-The TUI is a full replacement for the classic (prompt_toolkit) CLI, activated via `hermes --tui` or `HERMES_TUI=1`.
-
-### Process Model
-
-```
-hermes --tui
-  └─ Node (Ink)  ──stdio JSON-RPC──  Python (tui_gateway)
-       │                                  └─ AIAgent + tools + sessions
-       └─ renders transcript, composer, prompts, activity
-```
-
-TypeScript owns the screen. Python owns sessions, tools, model calls, and slash command logic.
-
-### Transport
-
-Newline-delimited JSON-RPC over stdio. Requests from Ink, events from Python. See `tui_gateway/server.py` for the full method/event catalog.
-
-### Key Surfaces
-
-| Surface | Ink component | Gateway method |
-|---------|---------------|----------------|
-| Chat streaming | `app.tsx` + `messageLine.tsx` | `prompt.submit` → `message.delta/complete` |
-| Tool activity | `thinking.tsx` | `tool.start/progress/complete` |
-| Approvals | `prompts.tsx` | `approval.respond` ← `approval.request` |
-| Clarify/sudo/secret | `prompts.tsx`, `maskedPrompt.tsx` | `clarify/sudo/secret.respond` |
-| Session picker | `sessionPicker.tsx` | `session.list/resume` |
-| Slash commands | Local handler + fallthrough | `slash.exec` → `_SlashWorker`, `command.dispatch` |
-| Completions | `useCompletion` hook | `complete.slash`, `complete.path` |
-| Theming | `theme.ts` + `branding.tsx` | `gateway.ready` with skin data |
-
-### Slash Command Flow
-
-1. Built-in client commands (`/help`, `/quit`, `/clear`, `/resume`, `/copy`, `/paste`, etc.) handled locally in `app.tsx`
-2. Everything else → `slash.exec` (runs in persistent `_SlashWorker` subprocess) → `command.dispatch` fallback
-
-### Dev Commands
-
-```bash
-cd ui-tui
-npm install       # first time
-npm run dev       # watch mode (rebuilds hermes-ink + tsx --watch)
-npm start         # production
-npm run build     # full build (hermes-ink + tsc)
-npm run type-check # typecheck only (tsc --noEmit)
-npm run lint      # eslint
-npm run fmt       # prettier
-npm test          # vitest
-```
+1. Add to `COMMANDS` dict in `hermes_cli/commands.py`
+2. Add handler in `HermesCLI.process_command()` in `cli.py`
+3. For persistent settings, use `save_config_value()` in `cli.py`

 ---

 ## Adding New Tools

-Requires changes in **2 files**:
+Requires changes in **3 files**:

 **1. Create `tools/your_tool.py`:**
 ```python
@@ -261,16 +157,12 @@ registry.register(
 )
 ```

-**2. Add to `toolsets.py`** — either `_HERMES_CORE_TOOLS` (all platforms) or a new toolset.
+**2. Add import** in `model_tools.py` `_discover_tools()` list.

-Auto-discovery: any `hermes_agent/tools/*.py` file with a top-level `registry.register()` call is imported automatically — no manual import list to maintain.
+**3. Add to `toolsets.py`** — either `_HERMES_CORE_TOOLS` (all platforms) or a new toolset.

 The registry handles schema collection, dispatch, availability checking, and error wrapping. All handlers MUST return a JSON string.

-**Path references in tool schemas**: If the schema description mentions file paths (e.g. default output directories), use `display_hermes_home()` to make them profile-aware. The schema is generated at import time, which is after `_apply_profile_override()` sets `HERMES_HOME`.
-
-**State files**: If a tool stores persistent state (caches, logs, checkpoints), use `get_hermes_home()` for the base directory — never `Path.home() / ".hermes"`. This ensures each profile gets its own state.
-
 **Agent-level tools** (todo, memory): intercepted by `run_agent.py` before `handle_function_call()`. See `todo_tool.py` for the pattern.

 ---
@@ -303,96 +195,8 @@ The registry handles schema collection, dispatch, availability checking, and err

 ---

-## Skin/Theme System
-
-The skin engine (`hermes_cli/skin_engine.py`) provides data-driven CLI visual customization. Skins are **pure data** — no code changes needed to add a new skin.
-
-### Architecture
-
-```
-hermes_cli/skin_engine.py    # SkinConfig dataclass, built-in skins, YAML loader
-~/.hermes/skins/*.yaml       # User-installed custom skins (drop-in)
-```
-
-- `init_skin_from_config()` — called at CLI startup, reads `display.skin` from config
-- `get_active_skin()` — returns cached `SkinConfig` for the current skin
-- `set_active_skin(name)` — switches skin at runtime (used by `/skin` command)
-- `load_skin(name)` — loads from user skins first, then built-ins, then falls back to default
-- Missing skin values inherit from the `default` skin automatically
-
-### What skins customize
-
-| Element | Skin Key | Used By |
-|---------|----------|---------|
-| Banner panel border | `colors.banner_border` | `banner.py` |
-| Banner panel title | `colors.banner_title` | `banner.py` |
-| Banner section headers | `colors.banner_accent` | `banner.py` |
-| Banner dim text | `colors.banner_dim` | `banner.py` |
-| Banner body text | `colors.banner_text` | `banner.py` |
-| Response box border | `colors.response_border` | `cli.py` |
-| Spinner faces (waiting) | `spinner.waiting_faces` | `display.py` |
-| Spinner faces (thinking) | `spinner.thinking_faces` | `display.py` |
-| Spinner verbs | `spinner.thinking_verbs` | `display.py` |
-| Spinner wings (optional) | `spinner.wings` | `display.py` |
-| Tool output prefix | `tool_prefix` | `display.py` |
-| Per-tool emojis | `tool_emojis` | `display.py` → `get_tool_emoji()` |
-| Agent name | `branding.agent_name` | `banner.py`, `cli.py` |
-| Welcome message | `branding.welcome` | `cli.py` |
-| Response box label | `branding.response_label` | `cli.py` |
-| Prompt symbol | `branding.prompt_symbol` | `cli.py` |
-
-### Built-in skins
-
-- `default` — Classic Hermes gold/kawaii (the current look)
-- `ares` — Crimson/bronze war-god theme with custom spinner wings
-- `mono` — Clean grayscale monochrome
-- `slate` — Cool blue developer-focused theme
-
-### Adding a built-in skin
-
-Add to `_BUILTIN_SKINS` dict in `hermes_cli/skin_engine.py`:
-
-```python
-"mytheme": {
-    "name": "mytheme",
-    "description": "Short description",
-    "colors": { ... },
-    "spinner": { ... },
-    "branding": { ... },
-    "tool_prefix": "┊",
-},
-```
-
-### User skins (YAML)
-
-Users create `~/.hermes/skins/<name>.yaml`:
-
-```yaml
-name: cyberpunk
-description: Neon-soaked terminal theme
-
-colors:
-  banner_border: "#FF00FF"
-  banner_title: "#00FFFF"
-  banner_accent: "#FF1493"
-
-spinner:
-  thinking_verbs: ["jacking in", "decrypting", "uploading"]
-  wings:
-    - ["⟨⚡", "⚡⟩"]
-
-branding:
-  agent_name: "Cyber Agent"
-  response_label: " ⚡ Cyber "
-
-tool_prefix: "▏"
-```
-
-Activate with `/skin cyberpunk` or `display.skin: cyberpunk` in config.yaml.
-
----
-
 ## Important Policies
+
 ### Prompt Caching Must Not Break

 Hermes-Agent ensures caching remains valid throughout a conversation. **Do NOT implement changes that would:**
@@ -406,203 +210,33 @@ Cache-breaking forces dramatically higher costs. The ONLY time we alter context
 - **CLI**: Uses current directory (`.` → `os.getcwd()`)
 - **Messaging**: Uses `MESSAGING_CWD` env var (default: home directory)

-### Background Process Notifications (Gateway)
-
-When `terminal(background=true, notify_on_complete=true)` is used, the gateway runs a watcher that
-detects process completion and triggers a new agent turn. Control verbosity of background process
-messages with `display.background_process_notifications`
-in config.yaml (or `HERMES_BACKGROUND_NOTIFICATIONS` env var):
-
-- `all` — running-output updates + final message (default)
-- `result` — only the final completion message
-- `error` — only the final message when exit code != 0
-- `off` — no watcher messages at all
-
 ---

-## Profiles: Multi-Instance Support
-
-Hermes supports **profiles** — multiple fully isolated instances, each with its own
-`HERMES_HOME` directory (config, API keys, memory, sessions, skills, gateway, etc.).
-
-The core mechanism: `_apply_profile_override()` in `hermes_cli/main.py` sets
-`HERMES_HOME` before any module imports. All 119+ references to `get_hermes_home()`
-automatically scope to the active profile.
-
-### Rules for profile-safe code
-
-1. **Use `get_hermes_home()` for all HERMES_HOME paths.** Import from `hermes_constants`.
-   NEVER hardcode `~/.hermes` or `Path.home() / ".hermes"` in code that reads/writes state.
-   ```python
-   # GOOD
-   from hermes_constants import get_hermes_home
-   config_path = get_hermes_home() / "config.yaml"
-
-   # BAD — breaks profiles
-   config_path = Path.home() / ".hermes" / "config.yaml"
-   ```
-
-2. **Use `display_hermes_home()` for user-facing messages.** Import from `hermes_constants`.
-   This returns `~/.hermes` for default or `~/.hermes/profiles/<name>` for profiles.
-   ```python
-   # GOOD
-   from hermes_constants import display_hermes_home
-   print(f"Config saved to {display_hermes_home()}/config.yaml")
-
-   # BAD — shows wrong path for profiles
-   print("Config saved to ~/.hermes/config.yaml")
-   ```
-
-3. **Module-level constants are fine** — they cache `get_hermes_home()` at import time,
-   which is AFTER `_apply_profile_override()` sets the env var. Just use `get_hermes_home()`,
-   not `Path.home() / ".hermes"`.
-
-4. **Tests that mock `Path.home()` must also set `HERMES_HOME`** — since code now uses
-   `get_hermes_home()` (reads env var), not `Path.home() / ".hermes"`:
-   ```python
-   with patch.object(Path, "home", return_value=tmp_path), \
-        patch.dict(os.environ, {"HERMES_HOME": str(tmp_path / ".hermes")}):
-       ...
-   ```
-
-5. **Gateway platform adapters should use token locks** — if the adapter connects with
-   a unique credential (bot token, API key), call `acquire_scoped_lock()` from
-   `gateway.status` in the `connect()`/`start()` method and `release_scoped_lock()` in
-   `disconnect()`/`stop()`. This prevents two profiles from using the same credential.
-   See `gateway/platforms/telegram.py` for the canonical pattern.
-
-6. **Profile operations are HOME-anchored, not HERMES_HOME-anchored** — `_get_profiles_root()`
-   returns `Path.home() / ".hermes" / "profiles"`, NOT `get_hermes_home() / "profiles"`.
-   This is intentional — it lets `hermes -p coder profile list` see all profiles regardless
-   of which one is active.
-
 ## Known Pitfalls

-### DO NOT hardcode `~/.hermes` paths
-Use `get_hermes_home()` from `hermes_constants` for code paths. Use `display_hermes_home()`
-for user-facing print/log messages. Hardcoding `~/.hermes` breaks profiles — each profile
-has its own `HERMES_HOME` directory. This was the source of 5 bugs fixed in PR #3575.
-
 ### DO NOT use `simple_term_menu` for interactive menus
 Rendering bugs in tmux/iTerm2 — ghosting on scroll. Use `curses` (stdlib) instead. See `hermes_cli/tools_config.py` for the pattern.

 ### DO NOT use `\033[K` (ANSI erase-to-EOL) in spinner/display code
 Leaks as literal `?[K` text under `prompt_toolkit`'s `patch_stdout`. Use space-padding: `f"\r{line}{' ' * pad}"`.

-### `_last_resolved_tool_names` is a process-global in `hermes_agent/tools/dispatch.py`
-`_run_single_child()` in `delegate_tool.py` saves and restores this global around subagent execution. If you add new code that reads this global, be aware it may be temporarily stale during child agent runs.
-
-### DO NOT hardcode cross-tool references in schema descriptions
-Tool schema descriptions must not mention tools from other toolsets by name (e.g., `browser_navigate` saying "prefer web_search"). Those tools may be unavailable (missing API keys, disabled toolset), causing the model to hallucinate calls to non-existent tools. If a cross-reference is needed, add it dynamically in `get_tool_definitions()` in `hermes_agent/tools/dispatch.py` — see the `browser_navigate` / `execute_code` post-processing blocks for the pattern.
+### `_last_resolved_tool_names` is a process-global in `model_tools.py`
+When subagents overwrite this global, `execute_code` calls after delegation may fail with missing tool imports. Known bug.

 ### Tests must not write to `~/.hermes/`
 The `_isolate_hermes_home` autouse fixture in `tests/conftest.py` redirects `HERMES_HOME` to a temp dir. Never hardcode `~/.hermes/` paths in tests.

-**Profile tests**: When testing profile features, also mock `Path.home()` so that
-`_get_profiles_root()` and `_get_default_hermes_home()` resolve within the temp dir.
-Use the pattern from `tests/hermes_cli/test_profiles.py`:
-```python
-@pytest.fixture
-def profile_env(tmp_path, monkeypatch):
-    home = tmp_path / ".hermes"
-    home.mkdir()
-    monkeypatch.setattr(Path, "home", lambda: tmp_path)
-    monkeypatch.setenv("HERMES_HOME", str(home))
-    return home
-```
-
 ---

 ## Testing

-**ALWAYS use `scripts/run_tests.sh`** — do not call `pytest` directly. The script enforces
-hermetic environment parity with CI (unset credential vars, TZ=UTC, LANG=C.UTF-8,
-4 xdist workers matching GHA ubuntu-latest). Direct `pytest` on a 16+ core
-developer machine with API keys set diverges from CI in ways that have caused
-multiple "works locally, fails in CI" incidents (and the reverse).
-
 ```bash
-scripts/run_tests.sh                                  # full suite, CI-parity
-scripts/run_tests.sh tests/gateway/                   # one directory
-scripts/run_tests.sh tests/agent/test_foo.py::test_x  # one test
-scripts/run_tests.sh -v --tb=long                     # pass-through pytest flags
+source .venv/bin/activate
+python -m pytest tests/ -q          # Full suite (~2500 tests, ~2 min)
+python -m pytest tests/test_model_tools.py -q   # Toolset resolution
+python -m pytest tests/test_cli_init.py -q       # CLI config loading
+python -m pytest tests/gateway/ -q               # Gateway tests
+python -m pytest tests/tools/ -q                 # Tool-level tests
 ```

-### Why the wrapper (and why the old "just call pytest" doesn't work)
-
-Five real sources of local-vs-CI drift the script closes:
-
-| | Without wrapper | With wrapper |
-|---|---|---|
-| Provider API keys | Whatever is in your env (auto-detects pool) | All `*_API_KEY`/`*_TOKEN`/etc. unset |
-| HOME / `~/.hermes/` | Your real config+auth.json | Temp dir per test |
-| Timezone | Local TZ (PDT etc.) | UTC |
-| Locale | Whatever is set | C.UTF-8 |
-| xdist workers | `-n auto` = all cores (20+ on a workstation) | `-n 4` matching CI |
-
-`tests/conftest.py` also enforces points 1-4 as an autouse fixture so ANY pytest
-invocation (including IDE integrations) gets hermetic behavior — but the wrapper
-is belt-and-suspenders.
-
-### Running without the wrapper (only if you must)
-
-If you can't use the wrapper (e.g. on Windows or inside an IDE that shells
-pytest directly), at minimum activate the venv and pass `-n 4`:
-
-```bash
-source venv/bin/activate
-python -m pytest tests/ -q -n 4
-```
-
-Worker count above 4 will surface test-ordering flakes that CI never sees.
-
 Always run the full suite before pushing changes.
-
-### Don't write change-detector tests
-
-A test is a **change-detector** if it fails whenever data that is **expected
-to change** gets updated — model catalogs, config version numbers,
-enumeration counts, hardcoded lists of provider models. These tests add no
-behavioral coverage; they just guarantee that routine source updates break
-CI and cost engineering time to "fix."
-
-**Do not write:**
-
-```python
-# catalog snapshot — breaks every model release
-assert "gemini-2.5-pro" in _PROVIDER_MODELS["gemini"]
-assert "MiniMax-M2.7" in models
-
-# config version literal — breaks every schema bump
-assert DEFAULT_CONFIG["_config_version"] == 21
-
-# enumeration count — breaks every time a skill/provider is added
-assert len(_PROVIDER_MODELS["huggingface"]) == 8
-```
-
-**Do write:**
-
-```python
-# behavior: does the catalog plumbing work at all?
-assert "gemini" in _PROVIDER_MODELS
-assert len(_PROVIDER_MODELS["gemini"]) >= 1
-
-# behavior: does migration bump the user's version to current latest?
-assert raw["_config_version"] == DEFAULT_CONFIG["_config_version"]
-
-# invariant: no plan-only model leaks into the legacy list
-assert not (set(moonshot_models) & coding_plan_only_models)
-
-# invariant: every model in the catalog has a context-length entry
-for m in _PROVIDER_MODELS["huggingface"]:
-    assert m.lower() in DEFAULT_CONTEXT_LENGTHS_LOWER
-```
-
-The rule: if the test reads like a snapshot of current data, delete it. If
-it reads like a contract about how two pieces of data must relate, keep it.
-When a PR adds a new provider/model and you want a test, make the test
-assert the relationship (e.g. "catalog entries all have context lengths"),
-not the specific names.
-
-Reviewers should reject new change-detector tests; authors should convert
-them into invariants before re-requesting review.
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -72,9 +72,8 @@ export VIRTUAL_ENV="$(pwd)/venv"

 # Install with all extras (messaging, cron, CLI menus, dev tools)
 uv pip install -e ".[all,dev]"
-
-# Optional: RL training submodule
-# git submodule update --init tinker-atropos && uv pip install -e "./tinker-atropos"
+uv pip install -e "./mini-swe-agent"
+uv pip install -e "./tinker-atropos"

 # Optional: browser tools
 npm install
@@ -137,18 +136,17 @@ hermes-agent/
 │   ├── auth.py                   # Provider resolution, OAuth, Nous Portal
 │   ├── models.py                 # OpenRouter model selection lists
 │   ├── banner.py                 # Welcome banner, ASCII art
-│   ├── commands.py               # Central slash command registry (CommandDef), autocomplete, gateway helpers
+│   ├── commands.py               # Slash command definitions + autocomplete
 │   ├── callbacks.py              # Interactive callbacks (clarify, sudo, approval)
 │   ├── doctor.py                 # Diagnostics
-│   ├── skills_hub.py             # Skills Hub CLI + /skills slash command
-│   └── skin_engine.py            # Skin/theme engine — data-driven CLI visual customization
+│   └── skills_hub.py             # Skills Hub CLI + /skills slash command
 │
 ├── tools/                    # Tool implementations (self-registering)
 │   ├── registry.py               # Central tool registry (schemas, handlers, dispatch)
 │   ├── approval.py               # Dangerous command detection + per-session approval
 │   ├── terminal_tool.py          # Terminal orchestration (sudo, env lifecycle, backends)
 │   ├── file_operations.py        # read_file, write_file, search, patch, etc.
-│   ├── web_tools.py              # web_search, web_extract (Parallel/Firecrawl + Gemini summarization)
+│   ├── web_tools.py              # web_search, web_extract (Firecrawl + Gemini summarization)
 │   ├── vision_tools.py           # Image analysis via multimodal models
 │   ├── delegate_tool.py          # Subagent spawning and parallel task execution
 │   ├── code_execution_tool.py    # Sandboxed Python with RPC tool access
@@ -330,20 +328,10 @@ license: MIT
 platforms: [macos, linux]          # Optional — restrict to specific OS platforms
                                   #   Valid: macos, linux, windows
                                   #   Omit to load on all platforms (default)
-required_environment_variables:    # Optional — secure setup-on-load metadata
-  - name: MY_API_KEY
-    prompt: API key
-    help: Where to get it
-    required_for: full functionality
-prerequisites:                     # Optional legacy runtime requirements
-  env_vars: [MY_API_KEY]           #   Backward-compatible alias for required env vars
-  commands: [curl, jq]             #   Advisory only; does not hide the skill
 metadata:
  hermes:
    tags: [Category, Subcategory, Keywords]
    related_skills: [other-skill-name]
-    fallback_for_toolsets: [web]       # Optional — show only when toolset is unavailable
-    requires_toolsets: [terminal]      # Optional — show only when toolset is available
 ---

 # Skill Title
@@ -378,82 +366,6 @@ platforms: [windows]          # Windows only

 If the field is omitted or empty, the skill loads on all platforms (backward compatible). See `skills/apple/` for examples of macOS-only skills.

-### Conditional skill activation
-
-Skills can declare conditions that control when they appear in the system prompt, based on which tools and toolsets are available in the current session. This is primarily used for **fallback skills** — alternatives that should only be shown when a primary tool is unavailable.
-
-Four fields are supported under `metadata.hermes`:
-
-```yaml
-metadata:
-  hermes:
-    fallback_for_toolsets: [web]      # Show ONLY when these toolsets are unavailable
-    requires_toolsets: [terminal]     # Show ONLY when these toolsets are available
-    fallback_for_tools: [web_search]  # Show ONLY when these specific tools are unavailable
-    requires_tools: [terminal]        # Show ONLY when these specific tools are available
-```
-
-**Semantics:**
-- `fallback_for_*`: The skill is a backup. It is **hidden** when the listed tools/toolsets are available, and **shown** when they are unavailable. Use this for free alternatives to premium tools.
-- `requires_*`: The skill needs certain tools to function. It is **hidden** when the listed tools/toolsets are unavailable. Use this for skills that depend on specific capabilities (e.g., a skill that only makes sense with terminal access).
-- If both are specified, both conditions must be satisfied for the skill to appear.
-- If neither is specified, the skill is always shown (backward compatible).
-
-**Examples:**
-
-```yaml
-# DuckDuckGo search — shown when Firecrawl (web toolset) is unavailable
-metadata:
-  hermes:
-    fallback_for_toolsets: [web]
-
-# Smart home skill — only useful when terminal is available
-metadata:
-  hermes:
-    requires_toolsets: [terminal]
-
-# Local browser fallback — shown when Browserbase is unavailable
-metadata:
-  hermes:
-    fallback_for_toolsets: [browser]
-```
-
-The filtering happens at prompt build time in `agent/prompt_builder.py`. The `build_skills_system_prompt()` function receives the set of available tools and toolsets from the agent and uses `_skill_should_show()` to evaluate each skill's conditions.
-
-### Skill setup metadata
-
-Skills can declare secure setup-on-load metadata via the `required_environment_variables` frontmatter field. Missing values do not hide the skill from discovery; they trigger a CLI-only secure prompt when the skill is actually loaded.
-
-```yaml
-required_environment_variables:
-  - name: TENOR_API_KEY
-    prompt: Tenor API key
-    help: Get a key from https://developers.google.com/tenor
-    required_for: full functionality
-```
-
-The user may skip setup and keep loading the skill. Hermes only exposes metadata (`stored_as`, `skipped`, `validated`) to the model — never the secret value.
-
-Legacy `prerequisites.env_vars` remains supported and is normalized into the new representation.
-
-```yaml
-prerequisites:
-  env_vars: [TENOR_API_KEY]       # Legacy alias for required_environment_variables
-  commands: [curl, jq]            # Advisory CLI checks
-```
-
-Gateway and messaging sessions never collect secrets in-band; they instruct the user to run `hermes setup` or update `~/.hermes/.env` locally.
-
-**When to declare required environment variables:**
- The skill uses an API key or token that should be collected securely at load time
- The skill can still be useful if the user skips setup, but may degrade gracefully
-
-**When to declare command prerequisites:**
- The skill relies on a CLI tool that may not be installed (e.g., `himalaya`, `openhue`, `ddgs`)
- Treat command checks as guidance, not discovery-time hiding
-
-See `skills/gifs/gif-search/` and `skills/email/himalaya/` for examples.
-
 ### Skill guidelines

 - **No external dependencies unless absolutely necessary.** Prefer stdlib Python, curl, and existing Hermes tools (`web_extract`, `terminal`, `read_file`).
@@ -463,56 +375,6 @@ See `skills/gifs/gif-search/` and `skills/email/himalaya/` for examples.

 ---

-## Adding a Skin / Theme
-
-Hermes uses a data-driven skin system — no code changes needed to add a new skin.
-
-**Option A: User skin (YAML file)**
-
-Create `~/.hermes/skins/<name>.yaml`:
-
-```yaml
-name: mytheme
-description: Short description of the theme
-
-colors:
-  banner_border: "#HEX"     # Panel border color
-  banner_title: "#HEX"      # Panel title color
-  banner_accent: "#HEX"     # Section header color
-  banner_dim: "#HEX"        # Muted/dim text color
-  banner_text: "#HEX"       # Body text color
-  response_border: "#HEX"   # Response box border
-
-spinner:
-  waiting_faces: ["(⚔)", "(⛨)"]
-  thinking_faces: ["(⚔)", "(⌁)"]
-  thinking_verbs: ["forging", "plotting"]
-  wings:                     # Optional left/right decorations
-    - ["⟪⚔", "⚔⟫"]
-
-branding:
-  agent_name: "My Agent"
-  welcome: "Welcome message"
-  response_label: " ⚔ Agent "
-  prompt_symbol: "⚔ ❯ "
-
-tool_prefix: "╎"             # Tool output line prefix
-```
-
-All fields are optional — missing values inherit from the default skin.
-
-**Option B: Built-in skin**
-
-Add to `_BUILTIN_SKINS` dict in `hermes_cli/skin_engine.py`. Use the same schema as above but as a Python dict. Built-in skins ship with the package and are always available.
-
-**Activating:**
- CLI: `/skin mytheme` or set `display.skin: mytheme` in config.yaml
- Config: `display: { skin: mytheme }`
-
-See `hermes_cli/skin_engine.py` for the full schema and existing skins as examples.
-
---
-
 ## Cross-Platform Compatibility

 Hermes runs on Linux, macOS, and Windows. When writing code that touches the OS:
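
Reviewer note on the removed "Adding a Skin / Theme" section in the hunk above: its statement that all fields are optional and that missing values inherit from the default skin can be pictured as a recursive dictionary merge. The sketch below only illustrates that idea. `DEFAULT_SKIN`, its values, and `resolve_skin` are hypothetical names chosen for the example and are not taken from `hermes_cli/skin_engine.py`, whose actual merge logic may differ.

```python
from copy import deepcopy

# Hypothetical default skin, for illustration only. The real defaults live in
# hermes_cli/skin_engine.py and are not reproduced here.
DEFAULT_SKIN = {
    "name": "default",
    "colors": {"banner_border": "#FFD700", "banner_text": "#FFFFFF"},
    "branding": {"agent_name": "Hermes", "prompt_symbol": "> "},
    "tool_prefix": "|",
}


def resolve_skin(user_skin, default=DEFAULT_SKIN):
    """Return a copy of `default` with `user_skin` values layered on top.

    Nested sections (colors, branding, ...) are merged key by key, so a user
    skin that sets a single color keeps every other default color.
    """
    merged = deepcopy(default)
    for key, value in user_skin.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sections: merge them recursively.
            merged[key] = resolve_skin(value, merged[key])
        else:
            # Scalars and lists from the user skin replace the default.
            merged[key] = value
    return merged


# A user skin that overrides one color and nothing else.
user_skin = {"name": "mytheme", "colors": {"banner_border": "#112233"}}
skin = resolve_skin(user_skin)
print(skin["colors"]["banner_border"])  # value from the user skin
print(skin["colors"]["banner_text"])    # value inherited from the default
```

The merge works on a deep copy so that the shared default is never mutated, which matters if several skins are resolved in the same process.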
--- a/54
+++ b/54
@@ -1,54 +0,0 @@
-FROM ghcr.io/astral-sh/uv:0.11.6-python3.13-trixie@sha256:b3c543b6c4f23a5f2df22866bd7857e5d304b67a564f4feab6ac22044dde719b AS uv_source
-FROM tianon/gosu:1.19-trixie@sha256:3b176695959c71e123eb390d427efc665eeb561b1540e82679c15e992006b8b9 AS gosu_source
-FROM debian:13.4
-
-# Disable Python stdout buffering to ensure logs are printed immediately
-ENV PYTHONUNBUFFERED=1
-
-# Store Playwright browsers outside the volume mount so the build-time
-# install survives the /opt/data volume overlay at runtime.
-ENV PLAYWRIGHT_BROWSERS_PATH=/opt/hermes/.playwright
-
-# Install system dependencies in one layer, clear APT cache
-RUN apt-get update && \
-    apt-get install -y --no-install-recommends \
-        build-essential nodejs npm python3 ripgrep ffmpeg gcc python3-dev libffi-dev procps git && \
-    rm -rf /var/lib/apt/lists/*
-
-# Non-root user for runtime; UID can be overridden via HERMES_UID at runtime
-RUN useradd -u 10000 -m -d /opt/data hermes
-
-COPY --chmod=0755 --from=gosu_source /gosu /usr/local/bin/
-COPY --chmod=0755 --from=uv_source /usr/local/bin/uv /usr/local/bin/uvx /usr/local/bin/
-
-WORKDIR /opt/hermes
-
-# ---------- Layer-cached dependency install ----------
-# Copy only package manifests first so npm install + Playwright are cached
-# unless the lockfiles themselves change.
-COPY package.json package-lock.json ./
-COPY web/package.json web/package-lock.json web/
-
-RUN npm install --prefer-offline --no-audit && \
-    npx playwright install --with-deps chromium --only-shell && \
-    (cd web && npm install --prefer-offline --no-audit) && \
-    npm cache clean --force
-
-# ---------- Source code ----------
-# .dockerignore excludes node_modules, so the installs above survive.
-COPY --chown=hermes:hermes . .
-
-# Build web dashboard (Vite outputs to hermes_agent/cli/web_dist/)
-RUN cd web && npm run build
-
-# ---------- Python virtualenv ----------
-RUN chown hermes:hermes /opt/hermes
-USER hermes
-RUN uv venv && \
-    uv pip install --no-cache-dir -e ".[all]"
-
-# ---------- Runtime ----------
-ENV HERMES_WEB_DIST=/opt/hermes/hermes_agent/cli/web_dist
-ENV HERMES_HOME=/opt/data
-VOLUME [ "/opt/data" ]
-ENTRYPOINT [ "/opt/hermes/docker/entrypoint.sh" ]
--- a/MANIFEST.in
+++ b/MANIFEST.in
@@ -1,5 +0,0 @@
-graft hermes_agent
-graft skills
-graft optional-skills
-global-exclude __pycache__
-global-exclude *.py[cod]
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@
  <img src="assets/banner.png" alt="Hermes Agent" width="100%">
 </p>

-# Hermes Agent ☤
+# Hermes Agent ⚕

 <p align="center">
  <a href="https://hermes-agent.nousresearch.com/docs/"><img src="https://img.shields.io/badge/Docs-hermes--agent.nousresearch.com-FFD700?style=for-the-badge" alt="Documentation"></a>
@@ -13,7 +13,7 @@

 **The self-improving AI agent built by [Nous Research](https://nousresearch.com).** It's the only agent with a built-in learning loop — it creates skills from experience, improves them during use, nudges itself to persist knowledge, searches its own past conversations, and builds a deepening model of who you are across sessions. Run it on a $5 VPS, a GPU cluster, or serverless infrastructure that costs nearly nothing when idle. It's not tied to your laptop — talk to it from Telegram while it works on a cloud VM.

-Use any model you want — [Nous Portal](https://portal.nousresearch.com), [OpenRouter](https://openrouter.ai) (200+ models), [NVIDIA NIM](https://build.nvidia.com) (Nemotron), [Xiaomi MiMo](https://platform.xiaomimimo.com), [z.ai/GLM](https://z.ai), [Kimi/Moonshot](https://platform.moonshot.ai), [MiniMax](https://www.minimax.io), [Hugging Face](https://huggingface.co), OpenAI, or your own endpoint. Switch with `hermes model` — no code changes, no lock-in.
+Use any model you want — [Nous Portal](https://portal.nousresearch.com), [OpenRouter](https://openrouter.ai) (200+ models), [z.ai/GLM](https://z.ai), [Kimi/Moonshot](https://platform.moonshot.ai), [MiniMax](https://www.minimax.io), OpenAI, or your own endpoint. Switch with `hermes model` — no code changes, no lock-in.

 <table>
 <tr><td><b>A real terminal interface</b></td><td>Full TUI with multiline editing, slash-command autocomplete, conversation history, interrupt-and-redirect, and streaming tool output.</td></tr>
@@ -33,16 +33,15 @@ Use any model you want — [Nous Portal](https://portal.nousresearch.com), [Open
 curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
 ```

-Works on Linux, macOS, WSL2, and Android via Termux. The installer handles the platform-specific setup for you.
+Works on Linux, macOS, and WSL2. The installer handles everything — Python, Node.js, dependencies, and the `hermes` command. No prerequisites except git.

-> **Android / Termux:** The tested manual path is documented in the [Termux guide](https://hermes-agent.nousresearch.com/docs/getting-started/termux). On Termux, Hermes installs a curated `.[termux]` extra because the full `.[all]` extra currently pulls Android-incompatible voice dependencies.
->
 > **Windows:** Native Windows is not supported. Please install [WSL2](https://learn.microsoft.com/en-us/windows/wsl/install) and run the command above.

 After installation:

 ```bash
 source ~/.bashrc    # reload shell (or: source ~/.zshrc)
+hermes setup        # configure your LLM provider
 hermes              # start chatting!
 ```

@@ -52,36 +51,15 @@ hermes              # start chatting!

 ```bash
 hermes              # Interactive CLI — start a conversation
-hermes model        # Choose your LLM provider and model
-hermes tools        # Configure which tools are enabled
-hermes config set   # Set individual config values
+hermes model        # Switch provider or model
+hermes setup        # Re-run the setup wizard
 hermes gateway      # Start the messaging gateway (Telegram, Discord, etc.)
-hermes setup        # Run the full setup wizard (configures everything at once)
-hermes claw migrate # Migrate from OpenClaw (if coming from OpenClaw)
 hermes update       # Update to the latest version
 hermes doctor       # Diagnose any issues
 ```

 📖 **[Full documentation →](https://hermes-agent.nousresearch.com/docs/)**

-## CLI vs Messaging Quick Reference
-
-Hermes has two entry points: start the terminal UI with `hermes`, or run the gateway and talk to it from Telegram, Discord, Slack, WhatsApp, Signal, or Email. Once you're in a conversation, many slash commands are shared across both interfaces.
-
-| Action | CLI | Messaging platforms |
-|---------|-----|---------------------|
-| Start chatting | `hermes` | Run `hermes gateway setup` + `hermes gateway start`, then send the bot a message |
-| Start fresh conversation | `/new` or `/reset` | `/new` or `/reset` |
-| Change model | `/model [provider:model]` | `/model [provider:model]` |
-| Set a personality | `/personality [name]` | `/personality [name]` |
-| Retry or undo the last turn | `/retry`, `/undo` | `/retry`, `/undo` |
-| Compress context / check usage | `/compress`, `/usage`, `/insights [--days N]` | `/compress`, `/usage`, `/insights [days]` |
-| Browse skills | `/skills` or `/<skill-name>` | `/skills` or `/<skill-name>` |
-| Interrupt current work | `Ctrl+C` or send a new message | `/stop` or send a new message |
-| Platform-specific status | `/platforms` | `/status`, `/sethome` |
-
-For the full command lists, see the [CLI guide](https://hermes-agent.nousresearch.com/docs/user-guide/cli) and the [Messaging Gateway guide](https://hermes-agent.nousresearch.com/docs/user-guide/messaging).
-
 ---

 ## Documentation
@@ -108,64 +86,23 @@ All documentation lives at **[hermes-agent.nousresearch.com/docs](https://hermes

 ---

-## Migrating from OpenClaw
-
-If you're coming from OpenClaw, Hermes can automatically import your settings, memories, skills, and API keys.
-
-**During first-time setup:** The setup wizard (`hermes setup`) automatically detects `~/.openclaw` and offers to migrate before configuration begins.
-
-**Anytime after install:**
-
-```bash
-hermes claw migrate              # Interactive migration (full preset)
-hermes claw migrate --dry-run    # Preview what would be migrated
-hermes claw migrate --preset user-data   # Migrate without secrets
-hermes claw migrate --overwrite  # Overwrite existing conflicts
-```
-
-What gets imported:
- **SOUL.md** — persona file
- **Memories** — MEMORY.md and USER.md entries
- **Skills** — user-created skills → `~/.hermes/skills/openclaw-imports/`
- **Command allowlist** — approval patterns
- **Messaging settings** — platform configs, allowed users, working directory
- **API keys** — allowlisted secrets (Telegram, OpenRouter, OpenAI, Anthropic, ElevenLabs)
- **TTS assets** — workspace audio files
- **Workspace instructions** — AGENTS.md (with `--workspace-target`)
-
-See `hermes claw migrate --help` for all options, or use the `openclaw-migration` skill for an interactive agent-guided migration with dry-run previews.
-
---
-
 ## Contributing

 We welcome contributions! See the [Contributing Guide](https://hermes-agent.nousresearch.com/docs/developer-guide/contributing) for development setup, code style, and PR process.

-Quick start for contributors — clone and go with `setup-hermes.sh`:
+Quick start for contributors:

 ```bash
-git clone https://github.com/NousResearch/hermes-agent.git
+git clone --recurse-submodules https://github.com/NousResearch/hermes-agent.git
 cd hermes-agent
-./setup-hermes.sh     # installs uv, creates venv, installs .[all], symlinks ~/.local/bin/hermes
-./hermes              # auto-detects the venv, no need to `source` first
-```
-
-Manual path (equivalent to the above):
-
-```bash
 curl -LsSf https://astral.sh/uv/install.sh | sh
-uv venv venv --python 3.11
-source venv/bin/activate
+uv venv .venv --python 3.11
+source .venv/bin/activate
 uv pip install -e ".[all,dev]"
+uv pip install -e "./mini-swe-agent"
 python -m pytest tests/ -q
 ```

-> **RL Training (optional):** To work on the RL/Tinker-Atropos integration:
-> ```bash
-> git submodule update --init tinker-atropos
-> uv pip install -e "./tinker-atropos"
-> ```
-
 ---

 ## Community
@@ -174,7 +111,6 @@ python -m pytest tests/ -q
 - 📚 [Skills Hub](https://agentskills.io)
 - 🐛 [Issues](https://github.com/NousResearch/hermes-agent/issues)
 - 💡 [Discussions](https://github.com/NousResearch/hermes-agent/discussions)
- 🔌 [HermesClaw](https://github.com/AaronWong1999/hermesclaw) — Community WeChat bridge: Run Hermes Agent and OpenClaw on the same WeChat account.

 ---

--- a/RELEASE_v0.10.0.md
+++ b/RELEASE_v0.10.0.md
@@ -1,27 +0,0 @@
-# Hermes Agent v0.10.0 (v2026.4.16)
-
-**Release Date:** April 16, 2026
-
-> The Tool Gateway release — paid Nous Portal subscribers can now use web search, image generation, text-to-speech, and browser automation through their existing subscription with zero additional API keys.
-
---
-
-## ✨ Highlights
-
- **Nous Tool Gateway** — Paid [Nous Portal](https://portal.nousresearch.com) subscribers now get automatic access to **web search** (Firecrawl), **image generation** (FAL / FLUX 2 Pro), **text-to-speech** (OpenAI TTS), and **browser automation** (Browser Use) through their existing subscription. No separate API keys needed — just run `hermes model`, select Nous Portal, and pick which tools to enable. Per-tool opt-in via `use_gateway` config, full integration with `hermes tools` and `hermes status`, and the runtime correctly prefers the gateway even when direct API keys exist. Replaces the old hidden `HERMES_ENABLE_NOUS_MANAGED_TOOLS` env var with clean subscription-based detection. ([#11206](https://github.com/NousResearch/hermes-agent/pull/11206), based on work by @jquesnelle; docs: [#11208](https://github.com/NousResearch/hermes-agent/pull/11208))
-
---
-
-## 🐛 Bug Fixes & Improvements
-
-This release includes 180+ commits with numerous bug fixes, platform improvements, and reliability enhancements across the agent core, gateway, CLI, and tool system. Full details will be published in the v0.11.0 changelog.
-
---
-
-## 👥 Contributors
-
- **@jquesnelle** (emozilla) — Original Tool Gateway implementation ([#10799](https://github.com/NousResearch/hermes-agent/pull/10799)), salvaged and shipped in this release
-
---
-
-**Full Changelog**: [v2026.4.13...v2026.4.16](https://github.com/NousResearch/hermes-agent/compare/v2026.4.13...v2026.4.16)
--- a/RELEASE_v0.2.0.md
+++ b/RELEASE_v0.2.0.md
@@ -1,383 +0,0 @@
-# Hermes Agent v0.2.0 (v2026.3.12)
-
-**Release Date:** March 12, 2026
-
-> First tagged release since v0.1.0 (the initial pre-public foundation). In just over two weeks, Hermes Agent went from a small internal project to a full-featured AI agent platform — thanks to an explosion of community contributions. This release covers **216 merged pull requests** from **63 contributors**, resolving **119 issues**.
-
---
-
-## ✨ Highlights
-
- **Multi-Platform Messaging Gateway** — Telegram, Discord, Slack, WhatsApp, Signal, Email (IMAP/SMTP), and Home Assistant platforms with unified session management, media attachments, and per-platform tool configuration.
-
- **MCP (Model Context Protocol) Client** — Native MCP support with stdio and HTTP transports, reconnection, resource/prompt discovery, and sampling (server-initiated LLM requests). ([#291](https://github.com/NousResearch/hermes-agent/pull/291) — @0xbyt4, [#301](https://github.com/NousResearch/hermes-agent/pull/301), [#753](https://github.com/NousResearch/hermes-agent/pull/753))
-
- **Skills Ecosystem** — 70+ bundled and optional skills across 15+ categories with a Skills Hub for community discovery, per-platform enable/disable, conditional activation based on tool availability, and prerequisite validation. ([#743](https://github.com/NousResearch/hermes-agent/pull/743) — @teyrebaz33, [#785](https://github.com/NousResearch/hermes-agent/pull/785) — @teyrebaz33)
-
- **Centralized Provider Router** — Unified `call_llm()`/`async_call_llm()` API replaces scattered provider logic across vision, summarization, compression, and trajectory saving. All auxiliary consumers route through a single code path with automatic credential resolution. ([#1003](https://github.com/NousResearch/hermes-agent/pull/1003))
-
- **ACP Server** — VS Code, Zed, and JetBrains editor integration via the Agent Communication Protocol standard. ([#949](https://github.com/NousResearch/hermes-agent/pull/949))
-
- **CLI Skin/Theme Engine** — Data-driven visual customization: banners, spinners, colors, branding. 7 built-in skins + custom YAML skins.
-
- **Git Worktree Isolation** — `hermes -w` launches isolated agent sessions in git worktrees for safe parallel work on the same repo. ([#654](https://github.com/NousResearch/hermes-agent/pull/654))
-
- **Filesystem Checkpoints & Rollback** — Automatic snapshots before destructive operations with `/rollback` to restore. ([#824](https://github.com/NousResearch/hermes-agent/pull/824))
-
- **3,289 Tests** — From near-zero test coverage to a comprehensive test suite covering agent, gateway, tools, cron, and CLI.
-
---
-
-## 🏗️ Core Agent & Architecture
-
-### Provider & Model Support
- Centralized provider router with `resolve_provider_client()` + `call_llm()` API ([#1003](https://github.com/NousResearch/hermes-agent/pull/1003))
- Nous Portal as first-class provider in setup ([#644](https://github.com/NousResearch/hermes-agent/issues/644))
- OpenAI Codex (Responses API) with ChatGPT subscription support ([#43](https://github.com/NousResearch/hermes-agent/pull/43)) — @grp06
- Codex OAuth vision support + multimodal content adapter
- Validate `/model` against live API instead of hardcoded lists
- Self-hosted Firecrawl support ([#460](https://github.com/NousResearch/hermes-agent/pull/460)) — @caentzminger
- Kimi Code API support ([#635](https://github.com/NousResearch/hermes-agent/pull/635)) — @christomitov
- MiniMax model ID update ([#473](https://github.com/NousResearch/hermes-agent/pull/473)) — @tars90percent
- OpenRouter provider routing configuration (provider_preferences)
- Nous credential refresh on 401 errors ([#571](https://github.com/NousResearch/hermes-agent/pull/571), [#269](https://github.com/NousResearch/hermes-agent/pull/269)) — @rewbs
- z.ai/GLM, Kimi/Moonshot, MiniMax, Azure OpenAI as first-class providers
- Unified `/model` and `/provider` into single view
-
-### Agent Loop & Conversation
- Simple fallback model for provider resilience ([#740](https://github.com/NousResearch/hermes-agent/pull/740))
- Shared iteration budget across parent + subagent delegation
- Iteration budget pressure via tool result injection
- Configurable subagent provider/model with full credential resolution
- Handle 413 payload-too-large via compression instead of aborting ([#153](https://github.com/NousResearch/hermes-agent/pull/153)) — @tekelala
- Retry with rebuilt payload after compression ([#616](https://github.com/NousResearch/hermes-agent/pull/616)) — @tripledoublev
- Auto-compress pathologically large gateway sessions ([#628](https://github.com/NousResearch/hermes-agent/issues/628))
- Tool call repair middleware — auto-lowercase and invalid tool handler
- Reasoning effort configuration and `/reasoning` command ([#921](https://github.com/NousResearch/hermes-agent/pull/921))
- Detect and block file re-read/search loops after context compression ([#705](https://github.com/NousResearch/hermes-agent/pull/705)) — @0xbyt4
-
-### Session & Memory
- Session naming with unique titles, auto-lineage, rich listing, and resume by name ([#720](https://github.com/NousResearch/hermes-agent/pull/720))
- Interactive session browser with search filtering ([#733](https://github.com/NousResearch/hermes-agent/pull/733))
- Display previous messages when resuming a session ([#734](https://github.com/NousResearch/hermes-agent/pull/734))
- Honcho AI-native cross-session user modeling ([#38](https://github.com/NousResearch/hermes-agent/pull/38)) — @erosika
- Proactive async memory flush on session expiry
- Smart context length probing with persistent caching + banner display
- `/resume` command for switching to named sessions in gateway
- Session reset policy for messaging platforms
-
---
-
-## 📱 Messaging Platforms (Gateway)
-
-### Telegram
- Native file attachments: send_document + send_video
- Document file processing for PDF, text, and Office files — @tekelala
- Forum topic session isolation ([#766](https://github.com/NousResearch/hermes-agent/pull/766)) — @spanishflu-est1918
- Browser screenshot sharing via MEDIA: protocol ([#657](https://github.com/NousResearch/hermes-agent/pull/657))
- Location support for find-nearby skill
- TTS voice message accumulation fix ([#176](https://github.com/NousResearch/hermes-agent/pull/176)) — @Bartok9
- Improved error handling and logging ([#763](https://github.com/NousResearch/hermes-agent/pull/763)) — @aydnOktay
- Italic regex newline fix + 43 format tests ([#204](https://github.com/NousResearch/hermes-agent/pull/204)) — @0xbyt4
-
-### Discord
- Channel topic included in session context ([#248](https://github.com/NousResearch/hermes-agent/pull/248)) — @Bartok9
- DISCORD_ALLOW_BOTS config for bot message filtering ([#758](https://github.com/NousResearch/hermes-agent/pull/758))
- Document and video support ([#784](https://github.com/NousResearch/hermes-agent/pull/784))
- Improved error handling and logging ([#761](https://github.com/NousResearch/hermes-agent/pull/761)) — @aydnOktay
-
-### Slack
- App_mention 404 fix + document/video support ([#784](https://github.com/NousResearch/hermes-agent/pull/784))
- Structured logging replacing print statements — @aydnOktay
-
-### WhatsApp
- Native media sending — images, videos, documents ([#292](https://github.com/NousResearch/hermes-agent/pull/292)) — @satelerd
- Multi-user session isolation ([#75](https://github.com/NousResearch/hermes-agent/pull/75)) — @satelerd
- Cross-platform port cleanup replacing Linux-only fuser ([#433](https://github.com/NousResearch/hermes-agent/pull/433)) — @Farukest
- DM interrupt key mismatch fix ([#350](https://github.com/NousResearch/hermes-agent/pull/350)) — @Farukest
-
-### Signal
- Full Signal messenger gateway via signal-cli-rest-api ([#405](https://github.com/NousResearch/hermes-agent/issues/405))
- Media URL support in message events ([#871](https://github.com/NousResearch/hermes-agent/pull/871))
-
-### Email (IMAP/SMTP)
- New email gateway platform — @0xbyt4
-
-### Home Assistant
- REST tools + WebSocket gateway integration ([#184](https://github.com/NousResearch/hermes-agent/pull/184)) — @0xbyt4
- Service discovery and enhanced setup
- Toolset mapping fix ([#538](https://github.com/NousResearch/hermes-agent/pull/538)) — @Himess
-
-### Gateway Core
- Expose subagent tool calls and thinking to users ([#186](https://github.com/NousResearch/hermes-agent/pull/186)) — @cutepawss
- Configurable background process watcher notifications ([#840](https://github.com/NousResearch/hermes-agent/pull/840))
- `edit_message()` for Telegram/Discord/Slack with fallback
- `/compress`, `/usage`, `/update` slash commands
- Eliminated 3x SQLite message duplication in gateway sessions ([#873](https://github.com/NousResearch/hermes-agent/pull/873))
- Stabilize system prompt across gateway turns for cache hits ([#754](https://github.com/NousResearch/hermes-agent/pull/754))
- MCP server shutdown on gateway exit ([#796](https://github.com/NousResearch/hermes-agent/pull/796)) — @0xbyt4
- Pass session_db to AIAgent, fixing session_search error ([#108](https://github.com/NousResearch/hermes-agent/pull/108)) — @Bartok9
- Persist transcript changes in /retry, /undo; fix /reset attribute ([#217](https://github.com/NousResearch/hermes-agent/pull/217)) — @Farukest
- UTF-8 encoding fix preventing Windows crashes ([#369](https://github.com/NousResearch/hermes-agent/pull/369)) — @ch3ronsa
-
---
-
-## 🖥️ CLI & User Experience
-
-### Interactive CLI
- Data-driven skin/theme engine — 7 built-in skins (default, ares, mono, slate, poseidon, sisyphus, charizard) + custom YAML skins
- `/personality` command with custom personality + disable support ([#773](https://github.com/NousResearch/hermes-agent/pull/773)) — @teyrebaz33
- User-defined quick commands that bypass the agent loop ([#746](https://github.com/NousResearch/hermes-agent/pull/746)) — @teyrebaz33
- `/reasoning` command for effort level and display toggle ([#921](https://github.com/NousResearch/hermes-agent/pull/921))
- `/verbose` slash command to toggle debug at runtime ([#94](https://github.com/NousResearch/hermes-agent/pull/94)) — @cesareth
- `/insights` command — usage analytics, cost estimation & activity patterns ([#552](https://github.com/NousResearch/hermes-agent/pull/552))
- `/background` command for managing background processes
- `/help` formatting with command categories
- Bell-on-complete — terminal bell when agent finishes ([#738](https://github.com/NousResearch/hermes-agent/pull/738))
- Up/down arrow history navigation
- Clipboard image paste (Alt+V / Ctrl+V)
- Loading indicators for slow slash commands ([#882](https://github.com/NousResearch/hermes-agent/pull/882))
- Spinner flickering fix under patch_stdout ([#91](https://github.com/NousResearch/hermes-agent/pull/91)) — @0xbyt4
- `--quiet/-Q` flag for programmatic single-query mode
- `--fuck-it-ship-it` flag to bypass all approval prompts ([#724](https://github.com/NousResearch/hermes-agent/pull/724)) — @dmahan93
- Tools summary flag ([#767](https://github.com/NousResearch/hermes-agent/pull/767)) — @luisv-1
- Terminal blinking fix on SSH ([#284](https://github.com/NousResearch/hermes-agent/pull/284)) — @ygd58
- Multi-line paste detection fix ([#84](https://github.com/NousResearch/hermes-agent/pull/84)) — @0xbyt4
-
-### Setup & Configuration
- Modular setup wizard with section subcommands and tool-first UX
- Container resource configuration prompts
- Backend validation for required binaries
- Config migration system (currently v7)
- API keys properly routed to .env instead of config.yaml ([#469](https://github.com/NousResearch/hermes-agent/pull/469)) — @ygd58
- Atomic write for .env to prevent API key loss on crash ([#954](https://github.com/NousResearch/hermes-agent/pull/954))
- `hermes tools` — per-platform tool enable/disable with curses UI
- `hermes doctor` for health checks across all configured providers
- `hermes update` with auto-restart for gateway service
- Show update-available notice in CLI banner
- Multiple named custom providers
- Shell config detection improvement for PATH setup ([#317](https://github.com/NousResearch/hermes-agent/pull/317)) — @mehmetkr-31
- Consistent HERMES_HOME and .env path resolution ([#51](https://github.com/NousResearch/hermes-agent/pull/51), [#48](https://github.com/NousResearch/hermes-agent/pull/48)) — @deankerr
- Docker backend fix on macOS + subagent auth for Nous Portal ([#46](https://github.com/NousResearch/hermes-agent/pull/46)) — @rsavitt
-
---
-
-## 🔧 Tool System
-
-### MCP (Model Context Protocol)
- Native MCP client with stdio + HTTP transports ([#291](https://github.com/NousResearch/hermes-agent/pull/291) — @0xbyt4, [#301](https://github.com/NousResearch/hermes-agent/pull/301))
- Sampling support — server-initiated LLM requests ([#753](https://github.com/NousResearch/hermes-agent/pull/753))
- Resource and prompt discovery
- Automatic reconnection and security hardening
- Banner integration, `/reload-mcp` command
- `hermes tools` UI integration
-
-### Browser
- Local browser backend — zero-cost headless Chromium (no Browserbase needed)
- Console/errors tool, annotated screenshots, auto-recording, dogfood QA skill ([#745](https://github.com/NousResearch/hermes-agent/pull/745))
- Screenshot sharing via MEDIA: on all messaging platforms ([#657](https://github.com/NousResearch/hermes-agent/pull/657))
-
-### Terminal & Execution
- `execute_code` sandbox with json_parse, shell_quote, retry helpers
- Docker: custom volume mounts ([#158](https://github.com/NousResearch/hermes-agent/pull/158)) — @Indelwin
- Daytona cloud sandbox backend ([#451](https://github.com/NousResearch/hermes-agent/pull/451)) — @rovle
- SSH backend fix ([#59](https://github.com/NousResearch/hermes-agent/pull/59)) — @deankerr
- Shell noise filtering and login shell execution for environment consistency
- Head+tail truncation for execute_code stdout overflow
- Configurable background process notification modes
-
-### File Operations
- Filesystem checkpoints and `/rollback` command ([#824](https://github.com/NousResearch/hermes-agent/pull/824))
- Structured tool result hints (next-action guidance) for patch and search_files ([#722](https://github.com/NousResearch/hermes-agent/issues/722))
- Docker volumes passed to sandbox container config ([#687](https://github.com/NousResearch/hermes-agent/pull/687)) — @manuelschipper
-
---
-
-## 🧩 Skills Ecosystem
-
-### Skills System
- Per-platform skill enable/disable ([#743](https://github.com/NousResearch/hermes-agent/pull/743)) — @teyrebaz33
- Conditional skill activation based on tool availability ([#785](https://github.com/NousResearch/hermes-agent/pull/785)) — @teyrebaz33
- Skill prerequisites — hide skills with unmet dependencies ([#659](https://github.com/NousResearch/hermes-agent/pull/659)) — @kshitijk4poor
- Optional skills — shipped but not activated by default
- `hermes skills browse` — paginated hub browsing
- Skills sub-category organization
- Platform-conditional skill loading
- Atomic skill file writes ([#551](https://github.com/NousResearch/hermes-agent/pull/551)) — @aydnOktay
- Skills sync data loss prevention ([#563](https://github.com/NousResearch/hermes-agent/pull/563)) — @0xbyt4
- Dynamic skill slash commands for CLI and gateway
-
-### New Skills (selected)
- **ASCII Art** — pyfiglet (571 fonts), cowsay, image-to-ascii ([#209](https://github.com/NousResearch/hermes-agent/pull/209)) — @0xbyt4
- **ASCII Video** — Full production pipeline ([#854](https://github.com/NousResearch/hermes-agent/pull/854)) — @SHL0MS
- **DuckDuckGo Search** — Firecrawl fallback ([#267](https://github.com/NousResearch/hermes-agent/pull/267)) — @gamedevCloudy; DDGS API expansion ([#598](https://github.com/NousResearch/hermes-agent/pull/598)) — @areu01or00
- **Solana Blockchain** — Wallet balances, USD pricing, token names ([#212](https://github.com/NousResearch/hermes-agent/pull/212)) — @gizdusum
- **AgentMail** — Agent-owned email inboxes ([#330](https://github.com/NousResearch/hermes-agent/pull/330)) — @teyrebaz33
- **Polymarket** — Prediction market data (read-only) ([#629](https://github.com/NousResearch/hermes-agent/pull/629))
- **OpenClaw Migration** — Official migration tool ([#570](https://github.com/NousResearch/hermes-agent/pull/570)) — @unmodeled-tyler
- **Domain Intelligence** — Passive recon: subdomains, SSL, WHOIS, DNS ([#136](https://github.com/NousResearch/hermes-agent/pull/136)) — @FurkanL0
- **Superpowers** — Software development skills ([#137](https://github.com/NousResearch/hermes-agent/pull/137)) — @kaos35
- **Hermes-Atropos** — RL environment development skill ([#815](https://github.com/NousResearch/hermes-agent/pull/815))
- Plus: arXiv search, OCR/documents, Excalidraw diagrams, YouTube transcripts, GIF search, Pokémon player, Minecraft modpack server, OpenHue (Philips Hue), Google Workspace, Notion, PowerPoint, Obsidian, find-nearby, and 40+ MLOps skills
-
---
-
-## 🔒 Security & Reliability
-
-### Security Hardening
- Path traversal fix in skill_view — prevented reading arbitrary files ([#220](https://github.com/NousResearch/hermes-agent/issues/220)) — @Farukest
- Shell injection prevention in sudo password piping ([#65](https://github.com/NousResearch/hermes-agent/pull/65)) — @leonsgithub
- Dangerous command detection: multiline bypass fix ([#233](https://github.com/NousResearch/hermes-agent/pull/233)) — @Farukest; tee/process substitution patterns ([#280](https://github.com/NousResearch/hermes-agent/pull/280)) — @dogiladeveloper
- Symlink boundary check fix in skills_guard ([#386](https://github.com/NousResearch/hermes-agent/pull/386)) — @Farukest
- Symlink bypass fix in write deny list on macOS ([#61](https://github.com/NousResearch/hermes-agent/pull/61)) — @0xbyt4
- Multi-word prompt injection bypass prevention ([#192](https://github.com/NousResearch/hermes-agent/pull/192)) — @0xbyt4
- Cron prompt injection scanner bypass fix ([#63](https://github.com/NousResearch/hermes-agent/pull/63)) — @0xbyt4
- Enforce 0600/0700 file permissions on sensitive files ([#757](https://github.com/NousResearch/hermes-agent/pull/757))
- .env file permissions restricted to owner-only ([#529](https://github.com/NousResearch/hermes-agent/pull/529)) — @Himess
- `--force` flag properly blocked from overriding dangerous verdicts ([#388](https://github.com/NousResearch/hermes-agent/pull/388)) — @Farukest
- FTS5 query sanitization + DB connection leak fix ([#565](https://github.com/NousResearch/hermes-agent/pull/565)) — @0xbyt4
- Expand secret redaction patterns + config toggle to disable
- In-memory permanent allowlist to prevent data leak ([#600](https://github.com/NousResearch/hermes-agent/pull/600)) — @alireza78a
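
As an illustration of the "Enforce 0600/0700 file permissions" item above, here is a minimal sketch of owner-only permissions on POSIX systems. The helper name `restrict_permissions` is hypothetical and is not taken from the linked PRs; the actual change may cover different paths or handle Windows separately.

```python
import os
import stat


def restrict_permissions(path: str) -> int:
    """Apply owner-only permissions and return the resulting mode bits.

    Hypothetical helper: directories get 0700, regular files get 0600.
    """
    mode = 0o700 if os.path.isdir(path) else 0o600
    os.chmod(path, mode)
    # Read the mode back so callers can verify what was applied.
    return stat.S_IMODE(os.stat(path).st_mode)
```

On Windows, `os.chmod` only toggles the read-only flag, so a sketch like this would need a platform guard there.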
-
-### Atomic Writes (data loss prevention)
- sessions.json ([#611](https://github.com/NousResearch/hermes-agent/pull/611)) — @alireza78a
- Cron jobs ([#146](https://github.com/NousResearch/hermes-agent/pull/146)) — @alireza78a
- .env config ([#954](https://github.com/NousResearch/hermes-agent/pull/954))
- Process checkpoints ([#298](https://github.com/NousResearch/hermes-agent/pull/298)) — @aydnOktay
- Batch runner ([#297](https://github.com/NousResearch/hermes-agent/pull/297)) — @aydnOktay
- Skill files ([#551](https://github.com/NousResearch/hermes-agent/pull/551)) — @aydnOktay
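
The atomic-write items above share one general pattern: write to a temporary file in the same directory, then rename it over the target. The sketch below shows that pattern under stated assumptions; `atomic_write_text` is a hypothetical name, and the linked PRs may implement the idea differently.

```python
import os
import tempfile


def atomic_write_text(path: str, data: str, encoding: str = "utf-8") -> None:
    """Write data so readers see either the old file or the new one, never a partial write."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives next to the target so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".part")
    try:
        with os.fdopen(fd, "w", encoding=encoding) as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push bytes to disk before the rename
        # Atomic on POSIX; on Windows os.replace also overwrites the destination.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the partial temp file; the original target is left untouched.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

A crash before `os.replace` leaves the previous file intact, which is the data-loss scenario these changes aim to prevent.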
-
-### Reliability
- Guard all print() against OSError for systemd/headless environments ([#963](https://github.com/NousResearch/hermes-agent/pull/963))
- Reset all retry counters at start of run_conversation ([#607](https://github.com/NousResearch/hermes-agent/pull/607)) — @0xbyt4
- Return deny on approval callback timeout instead of None ([#603](https://github.com/NousResearch/hermes-agent/pull/603)) — @0xbyt4
- Fix None message content crashes across codebase ([#277](https://github.com/NousResearch/hermes-agent/pull/277))
- Fix context overrun crash with local LLM backends ([#403](https://github.com/NousResearch/hermes-agent/pull/403)) — @ch3ronsa
- Prevent `_flush_sentinel` from leaking to external APIs ([#227](https://github.com/NousResearch/hermes-agent/pull/227)) — @Farukest
- Prevent conversation_history mutation in callers ([#229](https://github.com/NousResearch/hermes-agent/pull/229)) — @Farukest
- Fix systemd restart loop ([#614](https://github.com/NousResearch/hermes-agent/pull/614)) — @voidborne-d
- Close file handles and sockets to prevent fd leaks ([#568](https://github.com/NousResearch/hermes-agent/pull/568) — @alireza78a, [#296](https://github.com/NousResearch/hermes-agent/pull/296) — @alireza78a, [#709](https://github.com/NousResearch/hermes-agent/pull/709) — @memosr)
- Prevent data loss in clipboard PNG conversion ([#602](https://github.com/NousResearch/hermes-agent/pull/602)) — @0xbyt4
- Eliminate shell noise from terminal output ([#293](https://github.com/NousResearch/hermes-agent/pull/293)) — @0xbyt4
- Timezone-aware now() for prompt, cron, and execute_code ([#309](https://github.com/NousResearch/hermes-agent/pull/309)) — @areu01or00
-
-### Windows Compatibility
- Guard POSIX-only process functions ([#219](https://github.com/NousResearch/hermes-agent/pull/219)) — @Farukest
- Windows native support via Git Bash + ZIP-based update fallback
- pywinpty for PTY support ([#457](https://github.com/NousResearch/hermes-agent/pull/457)) — @shitcoinsherpa
- Explicit UTF-8 encoding on all config/data file I/O ([#458](https://github.com/NousResearch/hermes-agent/pull/458)) — @shitcoinsherpa
- Windows-compatible path handling ([#354](https://github.com/NousResearch/hermes-agent/pull/354), [#390](https://github.com/NousResearch/hermes-agent/pull/390)) — @Farukest
- Regex-based search output parsing for drive-letter paths ([#533](https://github.com/NousResearch/hermes-agent/pull/533)) — @Himess
- Auth store file lock for Windows ([#455](https://github.com/NousResearch/hermes-agent/pull/455)) — @shitcoinsherpa
-
---
-
-## 🐛 Notable Bug Fixes
-
- Fix DeepSeek V3 tool call parser silently dropping multi-line JSON arguments ([#444](https://github.com/NousResearch/hermes-agent/pull/444)) — @PercyDikec
- Fix gateway transcript losing 1 message per turn due to offset mismatch ([#395](https://github.com/NousResearch/hermes-agent/pull/395)) — @PercyDikec
- Fix /retry command silently discarding the agent's final response ([#441](https://github.com/NousResearch/hermes-agent/pull/441)) — @PercyDikec
- Fix max-iterations retry returning empty string after think-block stripping ([#438](https://github.com/NousResearch/hermes-agent/pull/438)) — @PercyDikec
- Fix max-iterations retry using hardcoded max_tokens ([#436](https://github.com/NousResearch/hermes-agent/pull/436)) — @Farukest
- Fix Codex status dict key mismatch ([#448](https://github.com/NousResearch/hermes-agent/pull/448)) and visibility filter ([#446](https://github.com/NousResearch/hermes-agent/pull/446)) — @PercyDikec
- Strip \<think\> blocks from final user-facing responses ([#174](https://github.com/NousResearch/hermes-agent/pull/174)) — @Bartok9
- Fix \<think\> block regex stripping visible content when model discusses tags literally ([#786](https://github.com/NousResearch/hermes-agent/issues/786))
- Fix Mistral 422 errors from leftover finish_reason in assistant messages ([#253](https://github.com/NousResearch/hermes-agent/pull/253)) — @Sertug17
- Fix OPENROUTER_API_KEY resolution order across all code paths ([#295](https://github.com/NousResearch/hermes-agent/pull/295)) — @0xbyt4
- Fix OPENAI_BASE_URL API key priority ([#420](https://github.com/NousResearch/hermes-agent/pull/420)) — @manuelschipper
- Fix Anthropic "prompt is too long" 400 error not detected as context length error ([#813](https://github.com/NousResearch/hermes-agent/issues/813))
- Fix SQLite session transcript accumulating duplicate messages — 3-4x token inflation ([#860](https://github.com/NousResearch/hermes-agent/issues/860))
- Fix setup wizard skipping API key prompts on first install ([#748](https://github.com/NousResearch/hermes-agent/pull/748))
- Fix setup wizard showing OpenRouter model list for Nous Portal ([#575](https://github.com/NousResearch/hermes-agent/pull/575)) — @PercyDikec
- Fix provider selection not persisting when switching via hermes model ([#881](https://github.com/NousResearch/hermes-agent/pull/881))
- Fix Docker backend failing when docker not in PATH on macOS ([#889](https://github.com/NousResearch/hermes-agent/pull/889))
- Fix ClawHub Skills Hub adapter for API endpoint changes ([#286](https://github.com/NousResearch/hermes-agent/pull/286)) — @BP602
- Fix Honcho auto-enable when API key is present ([#243](https://github.com/NousResearch/hermes-agent/pull/243)) — @Bartok9
- Fix duplicate 'skills' subparser crash on Python 3.11+ ([#898](https://github.com/NousResearch/hermes-agent/issues/898))
- Fix memory tool entry parsing when content contains section sign ([#162](https://github.com/NousResearch/hermes-agent/pull/162)) — @aydnOktay
- Fix piped install silently aborting when interactive prompts fail ([#72](https://github.com/NousResearch/hermes-agent/pull/72)) — @cutepawss
- Fix false positives in recursive delete detection ([#68](https://github.com/NousResearch/hermes-agent/pull/68)) — @cutepawss
- Fix Ruff lint warnings across codebase ([#608](https://github.com/NousResearch/hermes-agent/pull/608)) — @JackTheGit
- Fix Anthropic native base URL fail-fast ([#173](https://github.com/NousResearch/hermes-agent/pull/173)) — @adavyas
- Fix install.sh creating ~/.hermes before moving Node.js directory ([#53](https://github.com/NousResearch/hermes-agent/pull/53)) — @JoshuaMart
- Fix SystemExit traceback during atexit cleanup on Ctrl+C ([#55](https://github.com/NousResearch/hermes-agent/pull/55)) — @bierlingm
- Restore missing MIT license file ([#620](https://github.com/NousResearch/hermes-agent/pull/620)) — @stablegenius49
-
---
-
-## 🧪 Testing
-
- **3,289 tests** across agent, gateway, tools, cron, and CLI
- Parallelized test suite with pytest-xdist ([#802](https://github.com/NousResearch/hermes-agent/pull/802)) — @OutThisLife
- Unit tests batch 1: 8 core modules ([#60](https://github.com/NousResearch/hermes-agent/pull/60)) — @0xbyt4
- Unit tests batch 2: 8 more modules ([#62](https://github.com/NousResearch/hermes-agent/pull/62)) — @0xbyt4
- Unit tests batch 3: 8 untested modules ([#191](https://github.com/NousResearch/hermes-agent/pull/191)) — @0xbyt4
- Unit tests batch 4: 5 security/logic-critical modules ([#193](https://github.com/NousResearch/hermes-agent/pull/193)) — @0xbyt4
- AIAgent (run_agent.py) unit tests ([#67](https://github.com/NousResearch/hermes-agent/pull/67)) — @0xbyt4
- Trajectory compressor tests ([#203](https://github.com/NousResearch/hermes-agent/pull/203)) — @0xbyt4
- Clarify tool tests ([#121](https://github.com/NousResearch/hermes-agent/pull/121)) — @Bartok9
- Telegram format tests — 43 tests for italic/bold/code rendering ([#204](https://github.com/NousResearch/hermes-agent/pull/204)) — @0xbyt4
- Vision tools type hints + 42 tests ([#792](https://github.com/NousResearch/hermes-agent/pull/792))
- Compressor tool-call boundary regression tests ([#648](https://github.com/NousResearch/hermes-agent/pull/648)) — @intertwine
- Test structure reorganization ([#34](https://github.com/NousResearch/hermes-agent/pull/34)) — @0xbyt4
- Shell noise elimination + fix 36 test failures ([#293](https://github.com/NousResearch/hermes-agent/pull/293)) — @0xbyt4
-
---
-
-## 🔬 RL & Evaluation Environments
-
- WebResearchEnv — Multi-step web research RL environment ([#434](https://github.com/NousResearch/hermes-agent/pull/434)) — @jackx707
- Modal sandbox concurrency limits to avoid deadlocks ([#621](https://github.com/NousResearch/hermes-agent/pull/621)) — @voteblake
- Hermes-atropos-environments bundled skill ([#815](https://github.com/NousResearch/hermes-agent/pull/815))
- Local vLLM instance support for evaluation — @dmahan93
- YC-Bench long-horizon agent benchmark environment
- OpenThoughts-TBLite evaluation environment and scripts
-
---
-
-## 📚 Documentation
-
- Full documentation website (Docusaurus) with 37+ pages
- Comprehensive platform setup guides for Telegram, Discord, Slack, WhatsApp, Signal, Email
- AGENTS.md — development guide for AI coding assistants
- CONTRIBUTING.md ([#117](https://github.com/NousResearch/hermes-agent/pull/117)) — @Bartok9
- Slash commands reference ([#142](https://github.com/NousResearch/hermes-agent/pull/142)) — @Bartok9
- Comprehensive AGENTS.md accuracy audit ([#732](https://github.com/NousResearch/hermes-agent/pull/732))
- Skin/theme system documentation
- MCP documentation and examples
- Docs accuracy audit — 35+ corrections
- Documentation typo fixes ([#825](https://github.com/NousResearch/hermes-agent/pull/825), [#439](https://github.com/NousResearch/hermes-agent/pull/439)) — @JackTheGit
- CLI config precedence and terminology standardization ([#166](https://github.com/NousResearch/hermes-agent/pull/166), [#167](https://github.com/NousResearch/hermes-agent/pull/167), [#168](https://github.com/NousResearch/hermes-agent/pull/168)) — @Jr-kenny
- Telegram token regex documentation ([#713](https://github.com/NousResearch/hermes-agent/pull/713)) — @VolodymyrBg
-
---
-
-## 👥 Contributors
-
-Thank you to the 63 contributors who made this release possible! In just over two weeks, the Hermes Agent community came together to ship an extraordinary amount of work.
-
-### Core
- **@teknium1** — 43 PRs: Project lead, core architecture, provider router, sessions, skills, CLI, documentation
-
-### Top Community Contributors
- **@0xbyt4** — 40 PRs: MCP client, Home Assistant, security fixes (symlink, prompt injection, cron), extensive test coverage (6 batches), ascii-art skill, shell noise elimination, skills sync, Telegram formatting, and dozens more
- **@Farukest** — 16 PRs: Security hardening (path traversal, dangerous command detection, symlink boundary), Windows compatibility (POSIX guards, path handling), WhatsApp fixes, max-iterations retry, gateway fixes
- **@aydnOktay** — 11 PRs: Atomic writes (process checkpoints, batch runner, skill files), error handling improvements across Telegram, Discord, code execution, transcription, TTS, and skills
- **@Bartok9** — 9 PRs: CONTRIBUTING.md, slash commands reference, Discord channel topics, think-block stripping, TTS fix, Honcho fix, session count fix, clarify tests
- **@PercyDikec** — 7 PRs: DeepSeek V3 parser fix, /retry response discard, gateway transcript offset, Codex status/visibility, max-iterations retry, setup wizard fix
- **@teyrebaz33** — 5 PRs: Skills enable/disable system, quick commands, personality customization, conditional skill activation
- **@alireza78a** — 5 PRs: Atomic writes (cron, sessions), fd leak prevention, security allowlist, code execution socket cleanup
- **@shitcoinsherpa** — 3 PRs: Windows support (pywinpty, UTF-8 encoding, auth store lock)
- **@Himess** — 3 PRs: Cron/HomeAssistant/Daytona fix, Windows drive-letter parsing, .env permissions
- **@satelerd** — 2 PRs: WhatsApp native media, multi-user session isolation
- **@rovle** — 1 PR: Daytona cloud sandbox backend (4 commits)
- **@erosika** — 1 PR: Honcho AI-native memory integration
- **@dmahan93** — 1 PR: --fuck-it-ship-it flag + RL environment work
- **@SHL0MS** — 1 PR: ASCII video skill
-
-### All Contributors
-@0xbyt4, @BP602, @Bartok9, @Farukest, @FurkanL0, @Himess, @Indelwin, @JackTheGit, @JoshuaMart, @Jr-kenny, @OutThisLife, @PercyDikec, @SHL0MS, @Sertug17, @VencentSoliman, @VolodymyrBg, @adavyas, @alireza78a, @areu01or00, @aydnOktay, @batuhankocyigit, @bierlingm, @caentzminger, @cesareth, @ch3ronsa, @christomitov, @cutepawss, @deankerr, @dmahan93, @dogiladeveloper, @dragonkhoi, @erosika, @gamedevCloudy, @gizdusum, @grp06, @intertwine, @jackx707, @jdblackstar, @johnh4098, @kaos35, @kshitijk4poor, @leonsgithub, @luisv-1, @manuelschipper, @mehmetkr-31, @memosr, @PeterFile, @rewbs, @rovle, @rsavitt, @satelerd, @spanishflu-est1918, @stablegenius49, @tars90percent, @tekelala, @teknium1, @teyrebaz33, @tripledoublev, @unmodeled-tyler, @voidborne-d, @voteblake, @ygd58
-
---
-
-**Full Changelog**: [v0.1.0...v2026.3.12](https://github.com/NousResearch/hermes-agent/compare/v0.1.0...v2026.3.12)
--- a/RELEASE_v0.3.0.md
+++ b/RELEASE_v0.3.0.md
@@ -1,377 +0,0 @@
-# Hermes Agent v0.3.0 (v2026.3.17)
-
-**Release Date:** March 17, 2026
-
-> The streaming, plugins, and provider release — unified real-time token delivery, first-class plugin architecture, rebuilt provider system with Vercel AI Gateway, native Anthropic provider, smart approvals, live Chrome CDP browser connect, ACP IDE integration, Honcho memory, voice mode, persistent shell, and 50+ bug fixes across every platform.
-
---
-
-## ✨ Highlights
-
- **Unified Streaming Infrastructure** — Real-time token-by-token delivery in CLI and all gateway platforms. Responses stream as they're generated instead of arriving as a block. ([#1538](https://github.com/NousResearch/hermes-agent/pull/1538))
-
- **First-Class Plugin Architecture** — Drop Python files into `~/.hermes/plugins/` to extend Hermes with custom tools, commands, and hooks. No forking required. ([#1544](https://github.com/NousResearch/hermes-agent/pull/1544), [#1555](https://github.com/NousResearch/hermes-agent/pull/1555))
-
- **Native Anthropic Provider** — Direct Anthropic API calls with Claude Code credential auto-discovery, OAuth PKCE flows, and native prompt caching. No OpenRouter middleman needed. ([#1097](https://github.com/NousResearch/hermes-agent/pull/1097))
-
- **Smart Approvals + /stop Command** — Codex-inspired approval system that learns which commands are safe and remembers your preferences. `/stop` kills the current agent run immediately. ([#1543](https://github.com/NousResearch/hermes-agent/pull/1543))
-
- **Honcho Memory Integration** — Async memory writes, configurable recall modes, session title integration, and multi-user isolation in gateway mode. By @erosika. ([#736](https://github.com/NousResearch/hermes-agent/pull/736))
-
- **Voice Mode** — Push-to-talk in CLI, voice notes in Telegram/Discord, Discord voice channel support, and local Whisper transcription via faster-whisper. ([#1299](https://github.com/NousResearch/hermes-agent/pull/1299), [#1185](https://github.com/NousResearch/hermes-agent/pull/1185), [#1429](https://github.com/NousResearch/hermes-agent/pull/1429))
-
- **Concurrent Tool Execution** — Multiple independent tool calls now run in parallel via ThreadPoolExecutor, significantly reducing latency for multi-tool turns. ([#1152](https://github.com/NousResearch/hermes-agent/pull/1152))
-
- **PII Redaction** — When `privacy.redact_pii` is enabled, personally identifiable information is automatically scrubbed before sending context to LLM providers. ([#1542](https://github.com/NousResearch/hermes-agent/pull/1542))
-
- **`/browser connect` via CDP** — Attach browser tools to a live Chrome instance through Chrome DevTools Protocol. Debug, inspect, and interact with pages you already have open. ([#1549](https://github.com/NousResearch/hermes-agent/pull/1549))
-
- **Vercel AI Gateway Provider** — Route Hermes through Vercel's AI Gateway for access to their model catalog and infrastructure. ([#1628](https://github.com/NousResearch/hermes-agent/pull/1628))
-
- **Centralized Provider Router** — Rebuilt provider system with `call_llm` API, unified `/model` command, auto-detect provider on model switch, and direct endpoint overrides for auxiliary/delegation clients. ([#1003](https://github.com/NousResearch/hermes-agent/pull/1003), [#1506](https://github.com/NousResearch/hermes-agent/pull/1506), [#1375](https://github.com/NousResearch/hermes-agent/pull/1375))
-
- **ACP Server (IDE Integration)** — VS Code, Zed, and JetBrains can now connect to Hermes as an agent backend, with full slash command support. ([#1254](https://github.com/NousResearch/hermes-agent/pull/1254), [#1532](https://github.com/NousResearch/hermes-agent/pull/1532))
-
- **Persistent Shell Mode** — Local and SSH terminal backends can maintain shell state across tool calls — cd, env vars, and aliases persist. By @alt-glitch. ([#1067](https://github.com/NousResearch/hermes-agent/pull/1067), [#1483](https://github.com/NousResearch/hermes-agent/pull/1483))
-
- **Agentic On-Policy Distillation (OPD)** — New RL training environment for distilling agent policies, expanding the Atropos training ecosystem. ([#1149](https://github.com/NousResearch/hermes-agent/pull/1149))
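
The "Concurrent Tool Execution" highlight above names `ThreadPoolExecutor` as the mechanism. A minimal sketch of that idea follows; `run_tool_calls` and its argument shape are assumptions made for illustration, not the agent's real dispatch code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Tuple


def run_tool_calls(
    calls: List[Tuple[Callable[..., Any], Dict[str, Any]]],
    max_workers: int = 4,
) -> List[Any]:
    """Run independent tool calls in parallel; results keep the input order."""
    if not calls:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(func, **kwargs) for func, kwargs in calls]
        results: List[Any] = []
        for future in futures:
            try:
                results.append(future.result())
            except Exception as exc:
                # Return the error in place so one failing tool does not sink the rest.
                results.append(exc)
        return results
```

Collecting results in submission order keeps each result aligned with the tool call that produced it, even when calls finish out of order.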
-
---
-
-## 🏗️ Core Agent & Architecture
-
-### Provider & Model Support
- **Centralized provider router** with `call_llm` API and unified `/model` command — switch models and providers seamlessly ([#1003](https://github.com/NousResearch/hermes-agent/pull/1003))
- **Vercel AI Gateway** provider support ([#1628](https://github.com/NousResearch/hermes-agent/pull/1628))
- **Auto-detect provider** when switching models via `/model` ([#1506](https://github.com/NousResearch/hermes-agent/pull/1506))
- **Direct endpoint overrides** for auxiliary and delegation clients — point vision/subagent calls at specific endpoints ([#1375](https://github.com/NousResearch/hermes-agent/pull/1375))
- **Native Anthropic auxiliary vision** — use Claude's native vision API instead of routing through OpenAI-compatible endpoints ([#1377](https://github.com/NousResearch/hermes-agent/pull/1377))
- Anthropic OAuth flow improvements — auto-run `claude setup-token`, reauthentication, PKCE state persistence, identity fingerprinting ([#1132](https://github.com/NousResearch/hermes-agent/pull/1132), [#1360](https://github.com/NousResearch/hermes-agent/pull/1360), [#1396](https://github.com/NousResearch/hermes-agent/pull/1396), [#1597](https://github.com/NousResearch/hermes-agent/pull/1597))
- Fix adaptive thinking without `budget_tokens` for Claude 4.6 models — by @ASRagab ([#1128](https://github.com/NousResearch/hermes-agent/pull/1128))
- Fix Anthropic cache markers through adapter — by @brandtcormorant ([#1216](https://github.com/NousResearch/hermes-agent/pull/1216))
- Retry Anthropic 429/529 errors and surface details to users — by @0xbyt4 ([#1585](https://github.com/NousResearch/hermes-agent/pull/1585))
- Fix Anthropic adapter max_tokens, fallback crash, proxy base_url — by @0xbyt4 ([#1121](https://github.com/NousResearch/hermes-agent/pull/1121))
- Fix DeepSeek V3 parser dropping multiple parallel tool calls — by @mr-emmett-one ([#1365](https://github.com/NousResearch/hermes-agent/pull/1365), [#1300](https://github.com/NousResearch/hermes-agent/pull/1300))
- Accept unlisted models with warning instead of rejecting ([#1047](https://github.com/NousResearch/hermes-agent/pull/1047), [#1102](https://github.com/NousResearch/hermes-agent/pull/1102))
- Skip reasoning params for unsupported OpenRouter models ([#1485](https://github.com/NousResearch/hermes-agent/pull/1485))
- MiniMax Anthropic API compatibility fix ([#1623](https://github.com/NousResearch/hermes-agent/pull/1623))
- Custom endpoint `/models` verification and `/v1` base URL suggestion ([#1480](https://github.com/NousResearch/hermes-agent/pull/1480))
- Resolve delegation providers from `custom_providers` config ([#1328](https://github.com/NousResearch/hermes-agent/pull/1328))
- Kimi model additions and User-Agent fix ([#1039](https://github.com/NousResearch/hermes-agent/pull/1039))
- Strip `call_id`/`response_item_id` for Mistral compatibility ([#1058](https://github.com/NousResearch/hermes-agent/pull/1058))
-
-### Agent Loop & Conversation
- **Anthropic Context Editing API** support ([#1147](https://github.com/NousResearch/hermes-agent/pull/1147))
- Improved context compaction handoff summaries — compressor now preserves more actionable state ([#1273](https://github.com/NousResearch/hermes-agent/pull/1273))
- Sync session_id after mid-run context compression ([#1160](https://github.com/NousResearch/hermes-agent/pull/1160))
- Session hygiene threshold tuned to 50% for more proactive compression ([#1096](https://github.com/NousResearch/hermes-agent/pull/1096), [#1161](https://github.com/NousResearch/hermes-agent/pull/1161))
- Include session ID in system prompt via `--pass-session-id` flag ([#1040](https://github.com/NousResearch/hermes-agent/pull/1040))
- Prevent closed OpenAI client reuse across retries ([#1391](https://github.com/NousResearch/hermes-agent/pull/1391))
- Sanitize chat payloads and provider precedence ([#1253](https://github.com/NousResearch/hermes-agent/pull/1253))
- Handle dict tool call arguments from Codex and local backends ([#1393](https://github.com/NousResearch/hermes-agent/pull/1393), [#1440](https://github.com/NousResearch/hermes-agent/pull/1440))
-
-### Memory & Sessions
- **Improve memory prioritization** — user preferences and corrections weighted above procedural knowledge ([#1548](https://github.com/NousResearch/hermes-agent/pull/1548))
- Tighter memory and session recall guidance in system prompts ([#1329](https://github.com/NousResearch/hermes-agent/pull/1329))
- Persist CLI token counts to session DB for `/insights` ([#1498](https://github.com/NousResearch/hermes-agent/pull/1498))
- Keep Honcho recall out of the cached system prefix ([#1201](https://github.com/NousResearch/hermes-agent/pull/1201))
- Correct `seed_ai_identity` to use `session.add_messages()` ([#1475](https://github.com/NousResearch/hermes-agent/pull/1475))
- Isolate Honcho session routing for multi-user gateway ([#1500](https://github.com/NousResearch/hermes-agent/pull/1500))
-
---
-
-## 📱 Messaging Platforms (Gateway)
-
-### Gateway Core
- **System gateway service mode** — run as a system-level systemd service, not just user-level ([#1371](https://github.com/NousResearch/hermes-agent/pull/1371))
- **Gateway install scope prompts** — choose user vs system scope during setup ([#1374](https://github.com/NousResearch/hermes-agent/pull/1374))
- **Reasoning hot reload** — change reasoning settings without restarting the gateway ([#1275](https://github.com/NousResearch/hermes-agent/pull/1275))
- Default group sessions to per-user isolation — no more shared state across users in group chats ([#1495](https://github.com/NousResearch/hermes-agent/pull/1495), [#1417](https://github.com/NousResearch/hermes-agent/pull/1417))
- Harden gateway restart recovery ([#1310](https://github.com/NousResearch/hermes-agent/pull/1310))
- Cancel active runs during shutdown ([#1427](https://github.com/NousResearch/hermes-agent/pull/1427))
- SSL certificate auto-detection for NixOS and non-standard systems ([#1494](https://github.com/NousResearch/hermes-agent/pull/1494))
- Auto-detect D-Bus session bus for `systemctl --user` on headless servers ([#1601](https://github.com/NousResearch/hermes-agent/pull/1601))
- Auto-enable systemd linger during gateway install on headless servers ([#1334](https://github.com/NousResearch/hermes-agent/pull/1334))
- Fall back to module entrypoint when `hermes` is not on PATH ([#1355](https://github.com/NousResearch/hermes-agent/pull/1355))
- Fix dual gateways on macOS launchd after `hermes update` ([#1567](https://github.com/NousResearch/hermes-agent/pull/1567))
- Remove recursive ExecStop from systemd units ([#1530](https://github.com/NousResearch/hermes-agent/pull/1530))
- Prevent logging handler accumulation in gateway mode ([#1251](https://github.com/NousResearch/hermes-agent/pull/1251))
- Restart on retryable startup failures — by @jplew ([#1517](https://github.com/NousResearch/hermes-agent/pull/1517))
- Backfill model on gateway sessions after agent runs ([#1306](https://github.com/NousResearch/hermes-agent/pull/1306))
- PID-based gateway kill and deferred config write ([#1499](https://github.com/NousResearch/hermes-agent/pull/1499))
-
-### Telegram
- Buffer media groups to prevent self-interruption from photo bursts ([#1341](https://github.com/NousResearch/hermes-agent/pull/1341), [#1422](https://github.com/NousResearch/hermes-agent/pull/1422))
- Retry on transient TLS failures during connect and send ([#1535](https://github.com/NousResearch/hermes-agent/pull/1535))
- Harden polling conflict handling ([#1339](https://github.com/NousResearch/hermes-agent/pull/1339))
- Escape chunk indicators and inline code in MarkdownV2 ([#1478](https://github.com/NousResearch/hermes-agent/pull/1478), [#1626](https://github.com/NousResearch/hermes-agent/pull/1626))
- Check updater/app state before disconnect ([#1389](https://github.com/NousResearch/hermes-agent/pull/1389))
-
-### Discord
- `/thread` command with `auto_thread` config and media metadata fixes ([#1178](https://github.com/NousResearch/hermes-agent/pull/1178))
- Auto-thread on @mention, skip mention text in bot threads ([#1438](https://github.com/NousResearch/hermes-agent/pull/1438))
- Retry without reply reference for system messages ([#1385](https://github.com/NousResearch/hermes-agent/pull/1385))
- Preserve native document and video attachment support ([#1392](https://github.com/NousResearch/hermes-agent/pull/1392))
- Defer discord adapter annotations to avoid optional import crashes ([#1314](https://github.com/NousResearch/hermes-agent/pull/1314))
-
-### Slack
- Thread handling overhaul — progress messages, responses, and session isolation all respect threads ([#1103](https://github.com/NousResearch/hermes-agent/pull/1103))
- Formatting, reactions, user resolution, and command improvements ([#1106](https://github.com/NousResearch/hermes-agent/pull/1106))
- Fix MAX_MESSAGE_LENGTH 3900 → 39000 ([#1117](https://github.com/NousResearch/hermes-agent/pull/1117))
- File upload fallback preserves thread context — by @0xbyt4 ([#1122](https://github.com/NousResearch/hermes-agent/pull/1122))
- Improve setup guidance ([#1387](https://github.com/NousResearch/hermes-agent/pull/1387))
-
-### Email
- Fix IMAP UID tracking and SMTP TLS verification ([#1305](https://github.com/NousResearch/hermes-agent/pull/1305))
- Add `skip_attachments` option via config.yaml ([#1536](https://github.com/NousResearch/hermes-agent/pull/1536))
-
-### Home Assistant
- Event filtering closed by default ([#1169](https://github.com/NousResearch/hermes-agent/pull/1169))
-
---
-
-## 🖥️ CLI & User Experience
-
-### Interactive CLI
- **Persistent CLI status bar** — always-visible model, provider, and token counts ([#1522](https://github.com/NousResearch/hermes-agent/pull/1522))
- **File path autocomplete** in the input prompt ([#1545](https://github.com/NousResearch/hermes-agent/pull/1545))
- **`/plan` command** — generate implementation plans from specs ([#1372](https://github.com/NousResearch/hermes-agent/pull/1372), [#1381](https://github.com/NousResearch/hermes-agent/pull/1381))
- **Major `/rollback` improvements** — richer checkpoint history, clearer UX ([#1505](https://github.com/NousResearch/hermes-agent/pull/1505))
- **Preload CLI skills on launch** — skills are ready before the first prompt ([#1359](https://github.com/NousResearch/hermes-agent/pull/1359))
- **Centralized slash command registry** — all commands defined once, consumed everywhere ([#1603](https://github.com/NousResearch/hermes-agent/pull/1603))
- `/bg` alias for `/background` ([#1590](https://github.com/NousResearch/hermes-agent/pull/1590))
- Prefix matching for slash commands — `/mod` resolves to `/model` ([#1320](https://github.com/NousResearch/hermes-agent/pull/1320))
- `/new`, `/reset`, `/clear` now start genuinely fresh sessions ([#1237](https://github.com/NousResearch/hermes-agent/pull/1237))
- Accept session ID prefixes for session actions ([#1425](https://github.com/NousResearch/hermes-agent/pull/1425))
- TUI prompt and accent output now respect active skin ([#1282](https://github.com/NousResearch/hermes-agent/pull/1282))
- Centralize tool emoji metadata in registry + skin integration ([#1484](https://github.com/NousResearch/hermes-agent/pull/1484))
- "View full command" option added to dangerous command approval — by @teknium1 based on design by community ([#887](https://github.com/NousResearch/hermes-agent/pull/887))
- Non-blocking startup update check and banner deduplication ([#1386](https://github.com/NousResearch/hermes-agent/pull/1386))
- `/reasoning` command output ordering and inline think extraction fixes ([#1031](https://github.com/NousResearch/hermes-agent/pull/1031))
- Verbose mode shows full untruncated output ([#1472](https://github.com/NousResearch/hermes-agent/pull/1472))
- Fix `/status` to report live state and tokens ([#1476](https://github.com/NousResearch/hermes-agent/pull/1476))
- Seed a default global SOUL.md ([#1311](https://github.com/NousResearch/hermes-agent/pull/1311))
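
The prefix-matching item above states that `/mod` resolves to `/model`. One way such a resolver could work is sketched here. The command list and the rule that ambiguous prefixes resolve to nothing are assumptions; the notes do not say how the real registry handles ambiguity.

```python
from typing import Optional, Sequence

# Hypothetical command set, for illustration only.
COMMANDS = ("/model", "/new", "/reset", "/retry", "/rollback", "/status")


def resolve_command(typed: str, commands: Sequence[str] = COMMANDS) -> Optional[str]:
    """Resolve a typed slash command by exact match, then by unique prefix."""
    if typed in commands:
        return typed  # an exact match always wins
    matches = [name for name in commands if name.startswith(typed)]
    # Assumption: an ambiguous or unknown prefix resolves to nothing.
    return matches[0] if len(matches) == 1 else None
```

With this command set, `/mod` has one candidate, while `/re` matches both `/reset` and `/retry` and is therefore rejected.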
-
-### Setup & Configuration
- **OpenClaw migration** during first-time setup — by @kshitijk4poor ([#981](https://github.com/NousResearch/hermes-agent/pull/981))
- `hermes claw migrate` command + migration docs ([#1059](https://github.com/NousResearch/hermes-agent/pull/1059))
- Smart vision setup that respects the user's chosen provider ([#1323](https://github.com/NousResearch/hermes-agent/pull/1323))
- Handle headless setup flows end-to-end ([#1274](https://github.com/NousResearch/hermes-agent/pull/1274))
- Prefer curses over `simple_term_menu` in setup.py ([#1487](https://github.com/NousResearch/hermes-agent/pull/1487))
- Show effective model and provider in `/status` ([#1284](https://github.com/NousResearch/hermes-agent/pull/1284))
- Config set examples use placeholder syntax ([#1322](https://github.com/NousResearch/hermes-agent/pull/1322))
- Reload .env over stale shell overrides ([#1434](https://github.com/NousResearch/hermes-agent/pull/1434))
- Fix `is_coding_plan` `NameError` crash — by @0xbyt4 ([#1123](https://github.com/NousResearch/hermes-agent/pull/1123))
- Add missing packages to setuptools config — by @alt-glitch ([#912](https://github.com/NousResearch/hermes-agent/pull/912))
- Installer: clarify why sudo is needed at every prompt ([#1602](https://github.com/NousResearch/hermes-agent/pull/1602))
-
---
-
-## 🔧 Tool System
-
-### Terminal & Execution
- **Persistent shell mode** for local and SSH backends — maintain shell state across tool calls — by @alt-glitch ([#1067](https://github.com/NousResearch/hermes-agent/pull/1067), [#1483](https://github.com/NousResearch/hermes-agent/pull/1483))
- **Tirith pre-exec command scanning** — security layer that analyzes commands before execution ([#1256](https://github.com/NousResearch/hermes-agent/pull/1256))
- Strip Hermes provider env vars from all subprocess environments ([#1157](https://github.com/NousResearch/hermes-agent/pull/1157), [#1172](https://github.com/NousResearch/hermes-agent/pull/1172), [#1399](https://github.com/NousResearch/hermes-agent/pull/1399), [#1419](https://github.com/NousResearch/hermes-agent/pull/1419)) — initial fix by @eren-karakus0
- SSH preflight check ([#1486](https://github.com/NousResearch/hermes-agent/pull/1486))
- Docker backend: make cwd workspace mount explicit opt-in ([#1534](https://github.com/NousResearch/hermes-agent/pull/1534))
- Add project root to PYTHONPATH in execute_code sandbox ([#1383](https://github.com/NousResearch/hermes-agent/pull/1383))
- Eliminate execute_code progress spam on gateway platforms ([#1098](https://github.com/NousResearch/hermes-agent/pull/1098))
- Clearer docker backend preflight errors ([#1276](https://github.com/NousResearch/hermes-agent/pull/1276))
-
-### Browser
- **`/browser connect`** — attach browser tools to a live Chrome instance via CDP ([#1549](https://github.com/NousResearch/hermes-agent/pull/1549))
- Improve browser cleanup, local browser PATH setup, and screenshot recovery ([#1333](https://github.com/NousResearch/hermes-agent/pull/1333))
-
-### MCP
- **Selective tool loading** with utility policies — filter which MCP tools are available ([#1302](https://github.com/NousResearch/hermes-agent/pull/1302))
- Auto-reload MCP tools when `mcp_servers` config changes without restart ([#1474](https://github.com/NousResearch/hermes-agent/pull/1474))
- Resolve npx stdio connection failures ([#1291](https://github.com/NousResearch/hermes-agent/pull/1291))
- Preserve MCP toolsets when saving platform tool config ([#1421](https://github.com/NousResearch/hermes-agent/pull/1421))
-
-### Vision
- Unify vision backend gating ([#1367](https://github.com/NousResearch/hermes-agent/pull/1367))
- Surface actual error reason instead of generic message ([#1338](https://github.com/NousResearch/hermes-agent/pull/1338))
- Make Claude image handling work end-to-end ([#1408](https://github.com/NousResearch/hermes-agent/pull/1408))
-
-### Cron
- **Compress cron management into one tool** — single `cronjob` tool replaces multiple commands ([#1343](https://github.com/NousResearch/hermes-agent/pull/1343))
- Suppress duplicate cron sends to auto-delivery targets ([#1357](https://github.com/NousResearch/hermes-agent/pull/1357))
- Persist cron sessions to SQLite ([#1255](https://github.com/NousResearch/hermes-agent/pull/1255))
- Per-job runtime overrides (provider, model, base_url) ([#1398](https://github.com/NousResearch/hermes-agent/pull/1398))
- Atomic write in `save_job_output` to prevent data loss on crash ([#1173](https://github.com/NousResearch/hermes-agent/pull/1173))
- Preserve thread context for `deliver=origin` ([#1437](https://github.com/NousResearch/hermes-agent/pull/1437))
-
-### Patch Tool
- Avoid corrupting pipe chars in V4A patch apply ([#1286](https://github.com/NousResearch/hermes-agent/pull/1286))
- Permissive `block_anchor` thresholds and unicode normalization ([#1539](https://github.com/NousResearch/hermes-agent/pull/1539))
-
-### Delegation
- Add observability metadata to subagent results (model, tokens, duration, tool trace) ([#1175](https://github.com/NousResearch/hermes-agent/pull/1175))
-
---
-
-## 🧩 Skills Ecosystem
-
-### Skills System
- **Integrate skills.sh** as a hub source alongside ClawHub ([#1303](https://github.com/NousResearch/hermes-agent/pull/1303))
- Secure skill env setup on load ([#1153](https://github.com/NousResearch/hermes-agent/pull/1153))
- Honor policy table for dangerous verdicts ([#1330](https://github.com/NousResearch/hermes-agent/pull/1330))
- Harden ClawHub skill search exact matches ([#1400](https://github.com/NousResearch/hermes-agent/pull/1400))
- Fix ClawHub skill install — use `/download` ZIP endpoint ([#1060](https://github.com/NousResearch/hermes-agent/pull/1060))
- Avoid mislabeling local skills as builtin — by @arceus77-7 ([#862](https://github.com/NousResearch/hermes-agent/pull/862))
-
-### New Skills
- **Linear** project management ([#1230](https://github.com/NousResearch/hermes-agent/pull/1230))
- **X/Twitter** via x-cli ([#1285](https://github.com/NousResearch/hermes-agent/pull/1285))
- **Telephony** — Twilio, SMS, and AI calls ([#1289](https://github.com/NousResearch/hermes-agent/pull/1289))
- **1Password** — by @arceus77-7 ([#883](https://github.com/NousResearch/hermes-agent/pull/883), [#1179](https://github.com/NousResearch/hermes-agent/pull/1179))
- **NeuroSkill BCI** integration ([#1135](https://github.com/NousResearch/hermes-agent/pull/1135))
- **Blender MCP** for 3D modeling ([#1531](https://github.com/NousResearch/hermes-agent/pull/1531))
- **OSS Security Forensics** ([#1482](https://github.com/NousResearch/hermes-agent/pull/1482))
- **Parallel CLI** research skill ([#1301](https://github.com/NousResearch/hermes-agent/pull/1301))
- **OpenCode** CLI skill ([#1174](https://github.com/NousResearch/hermes-agent/pull/1174))
- **ASCII Video** skill refactored — by @SHL0MS ([#1213](https://github.com/NousResearch/hermes-agent/pull/1213), [#1598](https://github.com/NousResearch/hermes-agent/pull/1598))
-
---
-
-## 🎙️ Voice Mode
-
- Voice mode foundation — push-to-talk CLI, Telegram/Discord voice notes ([#1299](https://github.com/NousResearch/hermes-agent/pull/1299))
- Free local Whisper transcription via faster-whisper ([#1185](https://github.com/NousResearch/hermes-agent/pull/1185))
- Discord voice channel reliability fixes ([#1429](https://github.com/NousResearch/hermes-agent/pull/1429))
- Restore local STT fallback for gateway voice notes ([#1490](https://github.com/NousResearch/hermes-agent/pull/1490))
- Honor `stt.enabled: false` across gateway transcription ([#1394](https://github.com/NousResearch/hermes-agent/pull/1394))
- Fix bogus incapability message on Telegram voice notes (Issue [#1033](https://github.com/NousResearch/hermes-agent/issues/1033))
-
---
-
-## 🔌 ACP (IDE Integration)
-
- Restore ACP server implementation ([#1254](https://github.com/NousResearch/hermes-agent/pull/1254))
- Support slash commands in ACP adapter ([#1532](https://github.com/NousResearch/hermes-agent/pull/1532))
-
---
-
-## 🧪 RL Training
-
- **Agentic On-Policy Distillation (OPD)** environment — new RL training environment for agent policy distillation ([#1149](https://github.com/NousResearch/hermes-agent/pull/1149))
- Make tinker-atropos RL training fully optional ([#1062](https://github.com/NousResearch/hermes-agent/pull/1062))
-
---
-
-## 🔒 Security & Reliability
-
-### Security Hardening
- **Tirith pre-exec command scanning** — static analysis of terminal commands before execution ([#1256](https://github.com/NousResearch/hermes-agent/pull/1256))
- **PII redaction** when `privacy.redact_pii` is enabled ([#1542](https://github.com/NousResearch/hermes-agent/pull/1542))
- Strip Hermes provider/gateway/tool env vars from all subprocess environments ([#1157](https://github.com/NousResearch/hermes-agent/pull/1157), [#1172](https://github.com/NousResearch/hermes-agent/pull/1172), [#1399](https://github.com/NousResearch/hermes-agent/pull/1399), [#1419](https://github.com/NousResearch/hermes-agent/pull/1419))
- Docker cwd workspace mount now explicit opt-in — never auto-mount host directories ([#1534](https://github.com/NousResearch/hermes-agent/pull/1534))
- Escape parens and braces in fork bomb regex pattern ([#1397](https://github.com/NousResearch/hermes-agent/pull/1397))
- Harden `.worktreeinclude` path containment ([#1388](https://github.com/NousResearch/hermes-agent/pull/1388))
- Use description as `pattern_key` to prevent approval collisions ([#1395](https://github.com/NousResearch/hermes-agent/pull/1395))
-
-### Reliability
- Guard init-time stdio writes ([#1271](https://github.com/NousResearch/hermes-agent/pull/1271))
- Session log writes reuse shared atomic JSON helper ([#1280](https://github.com/NousResearch/hermes-agent/pull/1280))
- Atomic temp cleanup protected on interrupts ([#1401](https://github.com/NousResearch/hermes-agent/pull/1401))
-
---
-
-## 🐛 Notable Bug Fixes
-
- **`/status` always showing 0 tokens** — now reports live state (Issue [#1465](https://github.com/NousResearch/hermes-agent/issues/1465), [#1476](https://github.com/NousResearch/hermes-agent/pull/1476))
- **Custom model endpoints not working** — restored config-saved endpoint resolution (Issue [#1460](https://github.com/NousResearch/hermes-agent/issues/1460), [#1373](https://github.com/NousResearch/hermes-agent/pull/1373))
- **MCP tools not visible until restart** — auto-reload on config change (Issue [#1036](https://github.com/NousResearch/hermes-agent/issues/1036), [#1474](https://github.com/NousResearch/hermes-agent/pull/1474))
- **`hermes tools` removing MCP tools** — preserve MCP toolsets when saving (Issue [#1247](https://github.com/NousResearch/hermes-agent/issues/1247), [#1421](https://github.com/NousResearch/hermes-agent/pull/1421))
- **Terminal subprocesses inheriting `OPENAI_BASE_URL`** breaking external tools (Issue [#1002](https://github.com/NousResearch/hermes-agent/issues/1002), [#1399](https://github.com/NousResearch/hermes-agent/pull/1399))
- **Background process lost on gateway restart** — improved recovery (Issue [#1144](https://github.com/NousResearch/hermes-agent/issues/1144))
- **Cron jobs not persisting state** — now stored in SQLite (Issue [#1416](https://github.com/NousResearch/hermes-agent/issues/1416), [#1255](https://github.com/NousResearch/hermes-agent/pull/1255))
- **Cronjob `deliver: origin` not preserving thread context** (Issue [#1219](https://github.com/NousResearch/hermes-agent/issues/1219), [#1437](https://github.com/NousResearch/hermes-agent/pull/1437))
- **Gateway systemd service failing to auto-restart** when browser processes orphaned (Issue [#1617](https://github.com/NousResearch/hermes-agent/issues/1617))
- **`/background` completion report cut off in Telegram** (Issue [#1443](https://github.com/NousResearch/hermes-agent/issues/1443))
- **Model switching not taking effect** (Issue [#1244](https://github.com/NousResearch/hermes-agent/issues/1244), [#1183](https://github.com/NousResearch/hermes-agent/pull/1183))
- **`hermes doctor` reporting cronjob as unavailable** (Issue [#878](https://github.com/NousResearch/hermes-agent/issues/878), [#1180](https://github.com/NousResearch/hermes-agent/pull/1180))
- **WhatsApp bridge messages not received** from mobile (Issue [#1142](https://github.com/NousResearch/hermes-agent/issues/1142))
- **Setup wizard hanging on headless SSH** (Issue [#905](https://github.com/NousResearch/hermes-agent/issues/905), [#1274](https://github.com/NousResearch/hermes-agent/pull/1274))
- **Log handler accumulation** degrading gateway performance (Issue [#990](https://github.com/NousResearch/hermes-agent/issues/990), [#1251](https://github.com/NousResearch/hermes-agent/pull/1251))
- **Gateway NULL model in DB** (Issue [#987](https://github.com/NousResearch/hermes-agent/issues/987), [#1306](https://github.com/NousResearch/hermes-agent/pull/1306))
- **Strict endpoints rejecting replayed tool_calls** (Issue [#893](https://github.com/NousResearch/hermes-agent/issues/893))
- **Remaining hardcoded `~/.hermes` paths** — all now respect `HERMES_HOME` (Issue [#892](https://github.com/NousResearch/hermes-agent/issues/892), [#1233](https://github.com/NousResearch/hermes-agent/pull/1233))
- **Delegate tool not working with custom inference providers** (Issue [#1011](https://github.com/NousResearch/hermes-agent/issues/1011), [#1328](https://github.com/NousResearch/hermes-agent/pull/1328))
- **Skills Guard blocking official skills** (Issue [#1006](https://github.com/NousResearch/hermes-agent/issues/1006), [#1330](https://github.com/NousResearch/hermes-agent/pull/1330))
- **Setup writing provider before model selection** (Issue [#1182](https://github.com/NousResearch/hermes-agent/issues/1182))
- **`GatewayConfig.get()` AttributeError** crashing all message handling (Issue [#1158](https://github.com/NousResearch/hermes-agent/issues/1158), [#1287](https://github.com/NousResearch/hermes-agent/pull/1287))
- **`/update` hard-failing with "command not found"** (Issue [#1049](https://github.com/NousResearch/hermes-agent/issues/1049))
- **Image analysis failing silently** (Issue [#1034](https://github.com/NousResearch/hermes-agent/issues/1034), [#1338](https://github.com/NousResearch/hermes-agent/pull/1338))
- **API `BadRequestError` caused by `'dict' object has no attribute 'strip'`** (Issue [#1071](https://github.com/NousResearch/hermes-agent/issues/1071))
- **Slash commands requiring exact full name** — now uses prefix matching (Issue [#928](https://github.com/NousResearch/hermes-agent/issues/928), [#1320](https://github.com/NousResearch/hermes-agent/pull/1320))
- **Gateway stops responding when terminal is closed on headless** (Issue [#1005](https://github.com/NousResearch/hermes-agent/issues/1005))
-
---
-
-## 🧪 Testing
-
- Cover empty cached Anthropic tool-call turns ([#1222](https://github.com/NousResearch/hermes-agent/pull/1222))
- Fix stale CI assumptions in parser and quick-command coverage ([#1236](https://github.com/NousResearch/hermes-agent/pull/1236))
- Fix gateway async tests without implicit event loop ([#1278](https://github.com/NousResearch/hermes-agent/pull/1278))
- Make gateway async tests xdist-safe ([#1281](https://github.com/NousResearch/hermes-agent/pull/1281))
- Cross-timezone naive timestamp regression for cron ([#1319](https://github.com/NousResearch/hermes-agent/pull/1319))
- Isolate codex provider tests from local env ([#1335](https://github.com/NousResearch/hermes-agent/pull/1335))
- Lock retry replacement semantics ([#1379](https://github.com/NousResearch/hermes-agent/pull/1379))
- Improve error logging in session search tool — by @aydnOktay ([#1533](https://github.com/NousResearch/hermes-agent/pull/1533))
-
---
-
-## 📚 Documentation
-
- Comprehensive SOUL.md guide ([#1315](https://github.com/NousResearch/hermes-agent/pull/1315))
- Voice mode documentation ([#1316](https://github.com/NousResearch/hermes-agent/pull/1316), [#1362](https://github.com/NousResearch/hermes-agent/pull/1362))
- Provider contribution guide ([#1361](https://github.com/NousResearch/hermes-agent/pull/1361))
- ACP and internal systems implementation guides ([#1259](https://github.com/NousResearch/hermes-agent/pull/1259))
- Expand Docusaurus coverage across CLI, tools, skills, and skins ([#1232](https://github.com/NousResearch/hermes-agent/pull/1232))
- Terminal backend and Windows troubleshooting ([#1297](https://github.com/NousResearch/hermes-agent/pull/1297))
- Skills hub reference section ([#1317](https://github.com/NousResearch/hermes-agent/pull/1317))
- Checkpoint, /rollback, and git worktrees guide ([#1493](https://github.com/NousResearch/hermes-agent/pull/1493), [#1524](https://github.com/NousResearch/hermes-agent/pull/1524))
- CLI status bar and /usage reference ([#1523](https://github.com/NousResearch/hermes-agent/pull/1523))
- Fallback providers + /background command docs ([#1430](https://github.com/NousResearch/hermes-agent/pull/1430))
- Gateway service scopes docs ([#1378](https://github.com/NousResearch/hermes-agent/pull/1378))
- Slack thread reply behavior docs ([#1407](https://github.com/NousResearch/hermes-agent/pull/1407))
- Redesigned landing page with Nous blue palette — by @austinpickett ([#974](https://github.com/NousResearch/hermes-agent/pull/974))
- Fix several documentation typos — by @JackTheGit ([#953](https://github.com/NousResearch/hermes-agent/pull/953))
- Stabilize website diagrams ([#1405](https://github.com/NousResearch/hermes-agent/pull/1405))
- CLI vs messaging quick reference in README ([#1491](https://github.com/NousResearch/hermes-agent/pull/1491))
- Add search to Docusaurus ([#1053](https://github.com/NousResearch/hermes-agent/pull/1053))
- Home Assistant integration docs ([#1170](https://github.com/NousResearch/hermes-agent/pull/1170))
-
---
-
-## 👥 Contributors
-
-### Core
- **@teknium1** — 220+ PRs spanning every area of the codebase
-
-### Top Community Contributors
-
- **@0xbyt4** (4 PRs) — Anthropic adapter fixes (max_tokens, fallback crash, 429/529 retry), Slack file upload thread context, setup NameError fix
- **@erosika** (1 PR) — Honcho memory integration: async writes, memory modes, session title integration
- **@SHL0MS** (2 PRs) — ASCII video skill design patterns and refactoring
- **@alt-glitch** (2 PRs) — Persistent shell mode for local/SSH backends, setuptools packaging fix
- **@arceus77-7** (2 PRs) — 1Password skill, fix skills list mislabeling
- **@kshitijk4poor** (1 PR) — OpenClaw migration during setup wizard
- **@ASRagab** (1 PR) — Fix adaptive thinking for Claude 4.6 models
- **@eren-karakus0** (1 PR) — Strip Hermes provider env vars from subprocess environment
- **@mr-emmett-one** (1 PR) — Fix DeepSeek V3 parser multi-tool call support
- **@jplew** (1 PR) — Gateway restart on retryable startup failures
- **@brandtcormorant** (1 PR) — Fix Anthropic cache control for empty text blocks
- **@aydnOktay** (1 PR) — Improve error logging in session search tool
- **@austinpickett** (1 PR) — Landing page redesign with Nous blue palette
- **@JackTheGit** (1 PR) — Documentation typo fixes
-
-### All Contributors
-
-@0xbyt4, @alt-glitch, @arceus77-7, @ASRagab, @austinpickett, @aydnOktay, @brandtcormorant, @eren-karakus0, @erosika, @JackTheGit, @jplew, @kshitijk4poor, @mr-emmett-one, @SHL0MS, @teknium1
-
---
-
-**Full Changelog**: [v2026.3.12...v2026.3.17](https://github.com/NousResearch/hermes-agent/compare/v2026.3.12...v2026.3.17)
--- a/RELEASE_v0.4.0.md
+++ b/RELEASE_v0.4.0.md
@@ -1,400 +0,0 @@
-# Hermes Agent v0.4.0 (v2026.3.23)
-
-**Release Date:** March 23, 2026
-
-> The platform expansion release — OpenAI-compatible API server, 6 new messaging adapters, 4 new inference providers, MCP server management with OAuth 2.1, @ context references, gateway prompt caching, streaming enabled by default, and a sweeping reliability pass with 200+ bug fixes.
-
---
-
-## ✨ Highlights
-
- **OpenAI-compatible API server** — Expose Hermes as an `/v1/chat/completions` endpoint with a new `/api/jobs` REST API for cron job management, hardened with input limits, field whitelists, SQLite-backed response persistence, and CORS origin protection ([#1756](https://github.com/NousResearch/hermes-agent/pull/1756), [#2450](https://github.com/NousResearch/hermes-agent/pull/2450), [#2456](https://github.com/NousResearch/hermes-agent/pull/2456), [#2451](https://github.com/NousResearch/hermes-agent/pull/2451), [#2472](https://github.com/NousResearch/hermes-agent/pull/2472))
-
- **6 new messaging platform adapters** — Signal, DingTalk, SMS (Twilio), Mattermost, Matrix, and Webhook adapters join Telegram, Discord, and WhatsApp. Gateway auto-reconnects failed platforms with exponential backoff ([#2206](https://github.com/NousResearch/hermes-agent/pull/2206), [#1685](https://github.com/NousResearch/hermes-agent/pull/1685), [#1688](https://github.com/NousResearch/hermes-agent/pull/1688), [#1683](https://github.com/NousResearch/hermes-agent/pull/1683), [#2166](https://github.com/NousResearch/hermes-agent/pull/2166), [#2584](https://github.com/NousResearch/hermes-agent/pull/2584))
-
- **@ context references** — Claude Code-style `@file` and `@url` context injection with tab completions in the CLI ([#2343](https://github.com/NousResearch/hermes-agent/pull/2343), [#2482](https://github.com/NousResearch/hermes-agent/pull/2482))
-
- **4 new inference providers** — GitHub Copilot (OAuth + token validation), Alibaba Cloud / DashScope, Kilo Code, and OpenCode Zen/Go ([#1924](https://github.com/NousResearch/hermes-agent/pull/1924), [#1879](https://github.com/NousResearch/hermes-agent/pull/1879) by @mchzimm, [#1673](https://github.com/NousResearch/hermes-agent/pull/1673), [#1666](https://github.com/NousResearch/hermes-agent/pull/1666), [#1650](https://github.com/NousResearch/hermes-agent/pull/1650))
-
- **MCP server management CLI** — `hermes mcp` commands for installing, configuring, and authenticating MCP servers with full OAuth 2.1 PKCE flow ([#2465](https://github.com/NousResearch/hermes-agent/pull/2465))
-
- **Gateway prompt caching** — Cache AIAgent instances per session, preserving Anthropic prompt cache across turns for dramatic cost reduction on long conversations ([#2282](https://github.com/NousResearch/hermes-agent/pull/2282), [#2284](https://github.com/NousResearch/hermes-agent/pull/2284), [#2361](https://github.com/NousResearch/hermes-agent/pull/2361))
-
- **Context compression overhaul** — Structured summaries with iterative updates, token-budget tail protection, configurable summary endpoint, and fallback model support ([#2323](https://github.com/NousResearch/hermes-agent/pull/2323), [#1727](https://github.com/NousResearch/hermes-agent/pull/1727), [#2224](https://github.com/NousResearch/hermes-agent/pull/2224))
-
- **Streaming enabled by default** — CLI streaming on by default with proper spinner/tool progress display during streaming mode, plus extensive linebreak and concatenation fixes ([#2340](https://github.com/NousResearch/hermes-agent/pull/2340), [#2161](https://github.com/NousResearch/hermes-agent/pull/2161), [#2258](https://github.com/NousResearch/hermes-agent/pull/2258))
-
---
-
-## 🖥️ CLI & User Experience
-
-### New Commands & Interactions
- **@ context completions** — Tab-completable `@file`/`@url` references that inject file content or web pages into the conversation ([#2482](https://github.com/NousResearch/hermes-agent/pull/2482), [#2343](https://github.com/NousResearch/hermes-agent/pull/2343))
- **`/statusbar`** — Toggle a persistent config bar showing model + provider info in the prompt ([#2240](https://github.com/NousResearch/hermes-agent/pull/2240), [#1917](https://github.com/NousResearch/hermes-agent/pull/1917))
- **`/queue`** — Queue prompts for the agent without interrupting the current run ([#2191](https://github.com/NousResearch/hermes-agent/pull/2191), [#2469](https://github.com/NousResearch/hermes-agent/pull/2469))
- **`/permission`** — Switch approval mode dynamically during a session ([#2207](https://github.com/NousResearch/hermes-agent/pull/2207))
- **`/browser`** — Interactive browser sessions from the CLI ([#2273](https://github.com/NousResearch/hermes-agent/pull/2273), [#1814](https://github.com/NousResearch/hermes-agent/pull/1814))
- **`/cost`** — Live pricing and usage tracking in gateway mode ([#2180](https://github.com/NousResearch/hermes-agent/pull/2180))
- **`/approve` and `/deny`** — Replaced bare text approval in gateway with explicit commands ([#2002](https://github.com/NousResearch/hermes-agent/pull/2002))
-
-### Streaming & Display
- Streaming enabled by default in CLI ([#2340](https://github.com/NousResearch/hermes-agent/pull/2340))
- Show spinners and tool progress during streaming mode ([#2161](https://github.com/NousResearch/hermes-agent/pull/2161))
- Show reasoning/thinking blocks when `show_reasoning` enabled ([#2118](https://github.com/NousResearch/hermes-agent/pull/2118))
- Context pressure warnings for CLI and gateway ([#2159](https://github.com/NousResearch/hermes-agent/pull/2159))
- Fix: streaming chunks concatenated without whitespace ([#2258](https://github.com/NousResearch/hermes-agent/pull/2258))
- Fix: iteration boundary linebreak prevents stream concatenation ([#2413](https://github.com/NousResearch/hermes-agent/pull/2413))
- Fix: defer streaming linebreak to prevent blank line stacking ([#2473](https://github.com/NousResearch/hermes-agent/pull/2473))
- Fix: suppress spinner animation in non-TTY environments ([#2216](https://github.com/NousResearch/hermes-agent/pull/2216))
- Fix: display provider and endpoint in API error messages ([#2266](https://github.com/NousResearch/hermes-agent/pull/2266))
- Fix: resolve garbled ANSI escape codes in status printouts ([#2448](https://github.com/NousResearch/hermes-agent/pull/2448))
- Fix: update gold ANSI color to true-color format ([#2246](https://github.com/NousResearch/hermes-agent/pull/2246))
- Fix: normalize toolset labels and use skin colors in banner ([#1912](https://github.com/NousResearch/hermes-agent/pull/1912))
-
-### CLI Polish
- Fix: prevent 'Press ENTER to continue...' on exit ([#2555](https://github.com/NousResearch/hermes-agent/pull/2555))
- Fix: flush stdout during agent loop to prevent macOS display freeze ([#1654](https://github.com/NousResearch/hermes-agent/pull/1654))
- Fix: show human-readable error when `hermes setup` hits permissions error ([#2196](https://github.com/NousResearch/hermes-agent/pull/2196))
- Fix: `/stop` command crash + UnboundLocalError in streaming media delivery ([#2463](https://github.com/NousResearch/hermes-agent/pull/2463))
- Fix: allow custom/local endpoints without API key ([#2556](https://github.com/NousResearch/hermes-agent/pull/2556))
- Fix: Kitty keyboard protocol Shift+Enter for Ghostty/WezTerm (attempted + reverted due to prompt_toolkit crash) ([#2345](https://github.com/NousResearch/hermes-agent/pull/2345), [#2349](https://github.com/NousResearch/hermes-agent/pull/2349))
-
-### Configuration
- **`${ENV_VAR}` substitution** in config.yaml ([#2684](https://github.com/NousResearch/hermes-agent/pull/2684))
- **Real-time config reload** — config.yaml changes apply without restart ([#2210](https://github.com/NousResearch/hermes-agent/pull/2210))
- **`custom_models.yaml`** for user-managed model additions ([#2214](https://github.com/NousResearch/hermes-agent/pull/2214))
- **Priority-based context file selection** + CLAUDE.md support ([#2301](https://github.com/NousResearch/hermes-agent/pull/2301))
- **Merge nested YAML sections** instead of replacing on config update ([#2213](https://github.com/NousResearch/hermes-agent/pull/2213))
- Fix: config.yaml provider key overrides env var silently ([#2272](https://github.com/NousResearch/hermes-agent/pull/2272))
- Fix: log warning instead of silently swallowing config.yaml errors ([#2683](https://github.com/NousResearch/hermes-agent/pull/2683))
- Fix: disabled toolsets re-enable themselves after `hermes tools` ([#2268](https://github.com/NousResearch/hermes-agent/pull/2268))
- Fix: platform default toolsets silently override tool deselection ([#2624](https://github.com/NousResearch/hermes-agent/pull/2624))
- Fix: honor bare YAML `approvals.mode: off` ([#2620](https://github.com/NousResearch/hermes-agent/pull/2620))
- Fix: `hermes update` uses `.[all]` extras with fallback ([#1728](https://github.com/NousResearch/hermes-agent/pull/1728))
- Fix: `hermes update` prompts before resetting working tree on stash conflicts ([#2390](https://github.com/NousResearch/hermes-agent/pull/2390))
- Fix: use `git pull --rebase` in update/install to avoid divergent branch error ([#2274](https://github.com/NousResearch/hermes-agent/pull/2274))
- Fix: add zprofile fallback and create zshrc on fresh macOS installs ([#2320](https://github.com/NousResearch/hermes-agent/pull/2320))
- Fix: remove `ANTHROPIC_BASE_URL` env var to avoid collisions ([#1675](https://github.com/NousResearch/hermes-agent/pull/1675))
- Fix: don't ask IMAP password if already in keyring or env ([#2212](https://github.com/NousResearch/hermes-agent/pull/2212))
- Fix: OpenCode Zen/Go show OpenRouter models instead of their own ([#2277](https://github.com/NousResearch/hermes-agent/pull/2277))
-
---
-
-## 🏗️ Core Agent & Architecture
-
-### New Providers
- **GitHub Copilot** — Full OAuth auth, API routing, token validation, and 400k context ([#1924](https://github.com/NousResearch/hermes-agent/pull/1924), [#1896](https://github.com/NousResearch/hermes-agent/pull/1896), [#1879](https://github.com/NousResearch/hermes-agent/pull/1879) by @mchzimm, [#2507](https://github.com/NousResearch/hermes-agent/pull/2507))
- **Alibaba Cloud / DashScope** — Full integration with DashScope v1 runtime, model dot preservation, and 401 auth fixes ([#1673](https://github.com/NousResearch/hermes-agent/pull/1673), [#2332](https://github.com/NousResearch/hermes-agent/pull/2332), [#2459](https://github.com/NousResearch/hermes-agent/pull/2459))
- **Kilo Code** — First-class inference provider ([#1666](https://github.com/NousResearch/hermes-agent/pull/1666))
- **OpenCode Zen and OpenCode Go** — New provider backends ([#1650](https://github.com/NousResearch/hermes-agent/pull/1650), [#2393](https://github.com/NousResearch/hermes-agent/pull/2393) by @0xbyt4)
- **NeuTTS** — Local TTS provider backend with built-in setup flow, replacing the old optional skill ([#1657](https://github.com/NousResearch/hermes-agent/pull/1657), [#1664](https://github.com/NousResearch/hermes-agent/pull/1664))
-
-### Provider Improvements
- **Eager fallback** to backup model on rate-limit errors ([#1730](https://github.com/NousResearch/hermes-agent/pull/1730))
- **Endpoint metadata** for custom model context and pricing; query local servers for actual context window size ([#1906](https://github.com/NousResearch/hermes-agent/pull/1906), [#2091](https://github.com/NousResearch/hermes-agent/pull/2091) by @dusterbloom)
- **Context length detection overhaul** — models.dev integration, provider-aware resolution, fuzzy matching for custom endpoints, `/v1/props` for llama.cpp ([#2158](https://github.com/NousResearch/hermes-agent/pull/2158), [#2051](https://github.com/NousResearch/hermes-agent/pull/2051), [#2403](https://github.com/NousResearch/hermes-agent/pull/2403))
- **Model catalog updates** — gpt-5.4-mini, gpt-5.4-nano, healer-alpha, haiku-4.5, minimax-m2.7, claude 4.6 at 1M context ([#1913](https://github.com/NousResearch/hermes-agent/pull/1913), [#1915](https://github.com/NousResearch/hermes-agent/pull/1915), [#1900](https://github.com/NousResearch/hermes-agent/pull/1900), [#2155](https://github.com/NousResearch/hermes-agent/pull/2155), [#2474](https://github.com/NousResearch/hermes-agent/pull/2474))
- **Custom endpoint improvements** — `model.base_url` in config.yaml, `api_mode` override for responses API, allow endpoints without API key, fail fast on missing keys ([#2330](https://github.com/NousResearch/hermes-agent/pull/2330), [#1651](https://github.com/NousResearch/hermes-agent/pull/1651), [#2556](https://github.com/NousResearch/hermes-agent/pull/2556), [#2445](https://github.com/NousResearch/hermes-agent/pull/2445), [#1994](https://github.com/NousResearch/hermes-agent/pull/1994), [#1998](https://github.com/NousResearch/hermes-agent/pull/1998))
- Inject model and provider into system prompt ([#1929](https://github.com/NousResearch/hermes-agent/pull/1929))
- Tie `api_mode` to provider config instead of env var ([#1656](https://github.com/NousResearch/hermes-agent/pull/1656))
- Fix: prevent Anthropic token leaking to third-party `anthropic_messages` providers ([#2389](https://github.com/NousResearch/hermes-agent/pull/2389))
- Fix: prevent Anthropic fallback from inheriting non-Anthropic `base_url` ([#2388](https://github.com/NousResearch/hermes-agent/pull/2388))
- Fix: `auxiliary_is_nous` flag never resets — leaked Nous tags to other providers ([#1713](https://github.com/NousResearch/hermes-agent/pull/1713))
- Fix: Anthropic `tool_choice 'none'` still allowed tool calls ([#1714](https://github.com/NousResearch/hermes-agent/pull/1714))
- Fix: Mistral parser nested JSON fallback extraction ([#2335](https://github.com/NousResearch/hermes-agent/pull/2335))
- Fix: MiniMax 401 auth resolved by defaulting to `anthropic_messages` ([#2103](https://github.com/NousResearch/hermes-agent/pull/2103))
- Fix: case-insensitive model family matching ([#2350](https://github.com/NousResearch/hermes-agent/pull/2350))
- Fix: ignore placeholder provider keys in activation checks ([#2358](https://github.com/NousResearch/hermes-agent/pull/2358))
- Fix: preserve Ollama model:tag colons in context length detection ([#2149](https://github.com/NousResearch/hermes-agent/pull/2149))
- Fix: recognize Claude Code OAuth credentials in startup gate ([#1663](https://github.com/NousResearch/hermes-agent/pull/1663))
- Fix: detect Claude Code version dynamically for OAuth user-agent ([#1670](https://github.com/NousResearch/hermes-agent/pull/1670))
- Fix: OAuth flag stale after refresh/fallback ([#1890](https://github.com/NousResearch/hermes-agent/pull/1890))
- Fix: auxiliary client skips expired Codex JWT ([#2397](https://github.com/NousResearch/hermes-agent/pull/2397))
-
-### Agent Loop
- **Gateway prompt caching** — Cache AIAgent per session, keep assistant turns, fix session restore ([#2282](https://github.com/NousResearch/hermes-agent/pull/2282), [#2284](https://github.com/NousResearch/hermes-agent/pull/2284), [#2361](https://github.com/NousResearch/hermes-agent/pull/2361))
- **Context compression overhaul** — Structured summaries, iterative updates, token-budget tail protection, configurable `summary_base_url` ([#2323](https://github.com/NousResearch/hermes-agent/pull/2323), [#1727](https://github.com/NousResearch/hermes-agent/pull/1727), [#2224](https://github.com/NousResearch/hermes-agent/pull/2224))
- **Pre-call sanitization and post-call tool guardrails** ([#1732](https://github.com/NousResearch/hermes-agent/pull/1732))
- **Auto-recover** from provider-rejected `tool_choice` by retrying without ([#2174](https://github.com/NousResearch/hermes-agent/pull/2174))
- **Background memory/skill review** replaces inline nudges ([#2235](https://github.com/NousResearch/hermes-agent/pull/2235))
- **SOUL.md as primary agent identity** instead of hardcoded default ([#1922](https://github.com/NousResearch/hermes-agent/pull/1922))
- Fix: prevent silent tool result loss during context compression ([#1993](https://github.com/NousResearch/hermes-agent/pull/1993))
- Fix: handle empty/null function arguments in tool call recovery ([#2163](https://github.com/NousResearch/hermes-agent/pull/2163))
- Fix: handle API refusal responses gracefully instead of crashing ([#2156](https://github.com/NousResearch/hermes-agent/pull/2156))
- Fix: prevent stuck agent loop on malformed tool calls ([#2114](https://github.com/NousResearch/hermes-agent/pull/2114))
- Fix: return JSON parse error to model instead of dispatching with empty args ([#2342](https://github.com/NousResearch/hermes-agent/pull/2342))
- Fix: consecutive assistant message merge drops content on mixed types ([#1703](https://github.com/NousResearch/hermes-agent/pull/1703))
- Fix: message role alternation violations in JSON recovery and error handler ([#1722](https://github.com/NousResearch/hermes-agent/pull/1722))
- Fix: `compression_attempts` resets each iteration — allowed unlimited compressions ([#1723](https://github.com/NousResearch/hermes-agent/pull/1723))
- Fix: `length_continue_retries` never resets — later truncations got fewer retries ([#1717](https://github.com/NousResearch/hermes-agent/pull/1717))
- Fix: compressor summary role violated consecutive-role constraint ([#1720](https://github.com/NousResearch/hermes-agent/pull/1720), [#1743](https://github.com/NousResearch/hermes-agent/pull/1743))
- Fix: remove hardcoded `gemini-3-flash-preview` as default summary model ([#2464](https://github.com/NousResearch/hermes-agent/pull/2464))
- Fix: correctly handle empty tool results ([#2201](https://github.com/NousResearch/hermes-agent/pull/2201))
- Fix: crash on None entry in `tool_calls` list ([#2209](https://github.com/NousResearch/hermes-agent/pull/2209) by @0xbyt4, [#2316](https://github.com/NousResearch/hermes-agent/pull/2316))
- Fix: per-thread persistent event loops in worker threads ([#2214](https://github.com/NousResearch/hermes-agent/pull/2214) by @jquesnelle)
- Fix: prevent 'event loop already running' when async tools run in parallel ([#2207](https://github.com/NousResearch/hermes-agent/pull/2207))
- Fix: strip ANSI at the source — clean terminal output before it reaches the model ([#2115](https://github.com/NousResearch/hermes-agent/pull/2115))
- Fix: skip top-level `cache_control` on role:tool for OpenRouter ([#2391](https://github.com/NousResearch/hermes-agent/pull/2391))
- Fix: delegate tool — save parent tool names before child construction mutates global ([#2083](https://github.com/NousResearch/hermes-agent/pull/2083) by @ygd58, [#1894](https://github.com/NousResearch/hermes-agent/pull/1894))
- Fix: only strip last assistant message if empty string ([#2326](https://github.com/NousResearch/hermes-agent/pull/2326))
-
-### Session & Memory
- **Session search** and management slash commands ([#2198](https://github.com/NousResearch/hermes-agent/pull/2198))
- **Auto session titles** and `.hermes.md` project config ([#1712](https://github.com/NousResearch/hermes-agent/pull/1712))
- Fix: concurrent memory writes silently drop entries — added file locking ([#1726](https://github.com/NousResearch/hermes-agent/pull/1726))
- Fix: search all sources by default in `session_search` ([#1892](https://github.com/NousResearch/hermes-agent/pull/1892))
- Fix: handle hyphenated FTS5 queries and preserve quoted literals ([#1776](https://github.com/NousResearch/hermes-agent/pull/1776))
- Fix: skip corrupt lines in `load_transcript` instead of crashing ([#1744](https://github.com/NousResearch/hermes-agent/pull/1744))
- Fix: normalize session keys to prevent case-sensitive duplicates ([#2157](https://github.com/NousResearch/hermes-agent/pull/2157))
- Fix: prevent `session_search` crash when no sessions exist ([#2194](https://github.com/NousResearch/hermes-agent/pull/2194))
- Fix: reset token counters on new session for accurate usage display ([#2101](https://github.com/NousResearch/hermes-agent/pull/2101) by @InB4DevOps)
- Fix: prevent stale memory overwrites by flush agent ([#2687](https://github.com/NousResearch/hermes-agent/pull/2687))
- Fix: remove synthetic error message injection, fix session resume after repeated failures ([#2303](https://github.com/NousResearch/hermes-agent/pull/2303))
- Fix: quiet mode with `--resume` now passes conversation_history ([#2357](https://github.com/NousResearch/hermes-agent/pull/2357))
- Fix: unify resume logic in batch mode ([#2331](https://github.com/NousResearch/hermes-agent/pull/2331))
-
-### Honcho Memory
- Honcho config fixes and @ context reference integration ([#2343](https://github.com/NousResearch/hermes-agent/pull/2343))
- Self-hosted / Docker configuration documentation ([#2475](https://github.com/NousResearch/hermes-agent/pull/2475))
-
---
-
-## 📱 Messaging Platforms (Gateway)
-
-### New Platform Adapters
- **Signal Messenger** — Full adapter with attachment handling, group message filtering, and Note to Self echo-back protection ([#2206](https://github.com/NousResearch/hermes-agent/pull/2206), [#2400](https://github.com/NousResearch/hermes-agent/pull/2400), [#2297](https://github.com/NousResearch/hermes-agent/pull/2297), [#2156](https://github.com/NousResearch/hermes-agent/pull/2156))
- **DingTalk** — Adapter with gateway wiring and setup docs ([#1685](https://github.com/NousResearch/hermes-agent/pull/1685), [#1690](https://github.com/NousResearch/hermes-agent/pull/1690), [#1692](https://github.com/NousResearch/hermes-agent/pull/1692))
- **SMS (Twilio)** ([#1688](https://github.com/NousResearch/hermes-agent/pull/1688))
- **Mattermost** — With @-mention-only channel filter ([#1683](https://github.com/NousResearch/hermes-agent/pull/1683), [#2443](https://github.com/NousResearch/hermes-agent/pull/2443))
- **Matrix** — With vision support and image caching ([#1683](https://github.com/NousResearch/hermes-agent/pull/1683), [#2520](https://github.com/NousResearch/hermes-agent/pull/2520))
- **Webhook** — Platform adapter for external event triggers ([#2166](https://github.com/NousResearch/hermes-agent/pull/2166))
- **OpenAI-compatible API server** — `/v1/chat/completions` endpoint with `/api/jobs` cron management ([#1756](https://github.com/NousResearch/hermes-agent/pull/1756), [#2450](https://github.com/NousResearch/hermes-agent/pull/2450), [#2456](https://github.com/NousResearch/hermes-agent/pull/2456))
-
-### Telegram Improvements
- MarkdownV2 support — strikethrough, spoiler, blockquotes, escape parentheses/braces/backslashes/backticks ([#2199](https://github.com/NousResearch/hermes-agent/pull/2199), [#2200](https://github.com/NousResearch/hermes-agent/pull/2200) by @llbn, [#2386](https://github.com/NousResearch/hermes-agent/pull/2386))
- Auto-detect HTML tags and use `parse_mode=HTML` ([#1709](https://github.com/NousResearch/hermes-agent/pull/1709))
- Telegram group vision support + thread-based sessions ([#2153](https://github.com/NousResearch/hermes-agent/pull/2153))
- Auto-reconnect polling after network interruption ([#2517](https://github.com/NousResearch/hermes-agent/pull/2517))
- Aggregate split text messages before dispatching ([#1674](https://github.com/NousResearch/hermes-agent/pull/1674))
- Fix: streaming config bridge, not-modified, flood control ([#1782](https://github.com/NousResearch/hermes-agent/pull/1782), [#1783](https://github.com/NousResearch/hermes-agent/pull/1783))
- Fix: edited_message event crashes ([#2074](https://github.com/NousResearch/hermes-agent/pull/2074))
- Fix: retry 409 polling conflicts before giving up ([#2312](https://github.com/NousResearch/hermes-agent/pull/2312))
- Fix: topic delivery via `platform:chat_id:thread_id` format ([#2455](https://github.com/NousResearch/hermes-agent/pull/2455))
-
-### Discord Improvements
- Document caching and text-file injection ([#2503](https://github.com/NousResearch/hermes-agent/pull/2503))
- Persistent typing indicator for DMs ([#2468](https://github.com/NousResearch/hermes-agent/pull/2468))
- Discord DM vision — inline images + attachment analysis ([#2186](https://github.com/NousResearch/hermes-agent/pull/2186))
- Persist thread participation across gateway restarts ([#1661](https://github.com/NousResearch/hermes-agent/pull/1661))
- Fix: gateway crash on non-ASCII guild names ([#2302](https://github.com/NousResearch/hermes-agent/pull/2302))
- Fix: thread permission errors ([#2073](https://github.com/NousResearch/hermes-agent/pull/2073))
- Fix: slash event routing in threads ([#2460](https://github.com/NousResearch/hermes-agent/pull/2460))
- Fix: remove buggy follow-up messages + `/ask` command ([#1836](https://github.com/NousResearch/hermes-agent/pull/1836))
- Fix: graceful WebSocket reconnection ([#2127](https://github.com/NousResearch/hermes-agent/pull/2127))
- Fix: voice channel TTS when streaming enabled ([#2322](https://github.com/NousResearch/hermes-agent/pull/2322))
-
-### WhatsApp & Other Adapters
- WhatsApp: outbound `send_message` routing ([#1769](https://github.com/NousResearch/hermes-agent/pull/1769) by @sai-samarth), LID format self-chat ([#1667](https://github.com/NousResearch/hermes-agent/pull/1667)), `reply_prefix` config fix ([#1923](https://github.com/NousResearch/hermes-agent/pull/1923)), restart on bridge child exit ([#2334](https://github.com/NousResearch/hermes-agent/pull/2334)), image/bridge improvements ([#2181](https://github.com/NousResearch/hermes-agent/pull/2181))
- Matrix: correct `reply_to_message_id` parameter ([#1895](https://github.com/NousResearch/hermes-agent/pull/1895)), bare media types fix ([#1736](https://github.com/NousResearch/hermes-agent/pull/1736))
- Mattermost: MIME types for media attachments ([#2329](https://github.com/NousResearch/hermes-agent/pull/2329))
-
-### Gateway Core
- **Auto-reconnect** failed platforms with exponential backoff ([#2584](https://github.com/NousResearch/hermes-agent/pull/2584))
- **Notify users when session auto-resets** ([#2519](https://github.com/NousResearch/hermes-agent/pull/2519))
- **Reply-to message context** for out-of-session replies ([#1662](https://github.com/NousResearch/hermes-agent/pull/1662))
- **Ignore unauthorized DMs** config option ([#1919](https://github.com/NousResearch/hermes-agent/pull/1919))
- Fix: `/reset` in thread-mode resets global session instead of thread ([#2254](https://github.com/NousResearch/hermes-agent/pull/2254))
- Fix: deliver MEDIA: files after streaming responses ([#2382](https://github.com/NousResearch/hermes-agent/pull/2382))
- Fix: cap interrupt recursion depth to prevent resource exhaustion ([#1659](https://github.com/NousResearch/hermes-agent/pull/1659))
- Fix: detect stopped processes and release stale locks on `--replace` ([#2406](https://github.com/NousResearch/hermes-agent/pull/2406), [#1908](https://github.com/NousResearch/hermes-agent/pull/1908))
- Fix: PID-based wait with force-kill for gateway restart ([#1902](https://github.com/NousResearch/hermes-agent/pull/1902))
- Fix: prevent `--replace` mode from killing the caller process ([#2185](https://github.com/NousResearch/hermes-agent/pull/2185))
- Fix: `/model` shows active fallback model instead of config default ([#1660](https://github.com/NousResearch/hermes-agent/pull/1660))
- Fix: `/title` command fails when session doesn't exist in SQLite yet ([#2379](https://github.com/NousResearch/hermes-agent/pull/2379) by @ten-jampa)
- Fix: process `/queue`'d messages after agent completion ([#2469](https://github.com/NousResearch/hermes-agent/pull/2469))
- Fix: strip orphaned `tool_results` + let `/reset` bypass running agent ([#2180](https://github.com/NousResearch/hermes-agent/pull/2180))
- Fix: prevent agents from starting gateway outside systemd management ([#2617](https://github.com/NousResearch/hermes-agent/pull/2617))
- Fix: prevent systemd restart storm on gateway connection failure ([#2327](https://github.com/NousResearch/hermes-agent/pull/2327))
- Fix: include resolved node path in systemd unit ([#1767](https://github.com/NousResearch/hermes-agent/pull/1767) by @sai-samarth)
- Fix: send error details to user in gateway outer exception handler ([#1966](https://github.com/NousResearch/hermes-agent/pull/1966))
- Fix: improve error handling for 429 usage limits and 500 context overflow ([#1839](https://github.com/NousResearch/hermes-agent/pull/1839))
- Fix: add all missing platform allowlist env vars to startup warning check ([#2628](https://github.com/NousResearch/hermes-agent/pull/2628))
- Fix: media delivery fails for file paths containing spaces ([#2621](https://github.com/NousResearch/hermes-agent/pull/2621))
- Fix: duplicate session-key collision in multi-platform gateway ([#2171](https://github.com/NousResearch/hermes-agent/pull/2171))
- Fix: Matrix and Mattermost never report as connected ([#1711](https://github.com/NousResearch/hermes-agent/pull/1711))
- Fix: PII redaction config never read — missing yaml import ([#1701](https://github.com/NousResearch/hermes-agent/pull/1701))
- Fix: NameError on skill slash commands ([#1697](https://github.com/NousResearch/hermes-agent/pull/1697))
- Fix: persist watcher metadata in checkpoint for crash recovery ([#1706](https://github.com/NousResearch/hermes-agent/pull/1706))
- Fix: pass `message_thread_id` in `send_image_file`, `send_document`, `send_video` ([#2339](https://github.com/NousResearch/hermes-agent/pull/2339))
- Fix: media-group aggregation on rapid successive photo messages ([#2160](https://github.com/NousResearch/hermes-agent/pull/2160))
-
---
-
-## 🔧 Tool System
-
-### MCP Enhancements
- **MCP server management CLI** + OAuth 2.1 PKCE auth ([#2465](https://github.com/NousResearch/hermes-agent/pull/2465))
- **Expose MCP servers as standalone toolsets** ([#1907](https://github.com/NousResearch/hermes-agent/pull/1907))
- **Interactive MCP tool configuration** in `hermes tools` ([#1694](https://github.com/NousResearch/hermes-agent/pull/1694))
- Fix: MCP-OAuth port mismatch, path traversal, and shared handler state ([#2552](https://github.com/NousResearch/hermes-agent/pull/2552))
- Fix: preserve MCP tool registrations across session resets ([#2124](https://github.com/NousResearch/hermes-agent/pull/2124))
- Fix: concurrent file access crash + duplicate MCP registration ([#2154](https://github.com/NousResearch/hermes-agent/pull/2154))
- Fix: normalise MCP schemas + expand session list columns ([#2102](https://github.com/NousResearch/hermes-agent/pull/2102))
- Fix: `tool_choice` `mcp_` prefix handling ([#1775](https://github.com/NousResearch/hermes-agent/pull/1775))
-
-### Web Tool Backends
- **Tavily** as web search/extract/crawl backend ([#1731](https://github.com/NousResearch/hermes-agent/pull/1731))
- **Parallel** as alternative web search/extract backend ([#1696](https://github.com/NousResearch/hermes-agent/pull/1696))
- **Configurable web backend** — Firecrawl/BeautifulSoup/Playwright selection ([#2256](https://github.com/NousResearch/hermes-agent/pull/2256))
- Fix: whitespace-only env vars bypass web backend detection ([#2341](https://github.com/NousResearch/hermes-agent/pull/2341))
-
-### New Tools
- **IMAP email** reading and sending ([#2173](https://github.com/NousResearch/hermes-agent/pull/2173))
- **STT (speech-to-text)** tool using Whisper API ([#2072](https://github.com/NousResearch/hermes-agent/pull/2072))
- **Route-aware pricing estimates** ([#1695](https://github.com/NousResearch/hermes-agent/pull/1695))
-
-### Tool Improvements
- TTS: `base_url` support for OpenAI TTS provider ([#2064](https://github.com/NousResearch/hermes-agent/pull/2064) by @hanai)
- Vision: configurable timeout, tilde expansion in file paths, DM vision with multi-image and base64 fallback ([#2480](https://github.com/NousResearch/hermes-agent/pull/2480), [#2585](https://github.com/NousResearch/hermes-agent/pull/2585), [#2211](https://github.com/NousResearch/hermes-agent/pull/2211))
- Browser: race condition fix in session creation ([#1721](https://github.com/NousResearch/hermes-agent/pull/1721)), fix for TypeError on unexpected LLM params ([#1735](https://github.com/NousResearch/hermes-agent/pull/1735))
- File tools: strip ANSI escape codes from write_file and patch content ([#2532](https://github.com/NousResearch/hermes-agent/pull/2532)), include pagination args in repeated search key ([#1824](https://github.com/NousResearch/hermes-agent/pull/1824) by @cutepawss), improve fuzzy matching accuracy + position calculation refactor ([#2096](https://github.com/NousResearch/hermes-agent/pull/2096), [#1681](https://github.com/NousResearch/hermes-agent/pull/1681))
- Code execution: resource leak and double socket close fix ([#2381](https://github.com/NousResearch/hermes-agent/pull/2381))
- Delegate: thread safety for concurrent subagent delegation ([#1672](https://github.com/NousResearch/hermes-agent/pull/1672)), preserve parent agent's tool list after delegation ([#1778](https://github.com/NousResearch/hermes-agent/pull/1778))
- Fix: make concurrent tool batching path-aware for file mutations ([#1914](https://github.com/NousResearch/hermes-agent/pull/1914))
- Fix: chunk long messages in `send_message_tool` before platform dispatch ([#1646](https://github.com/NousResearch/hermes-agent/pull/1646))
- Fix: add missing 'messaging' toolset ([#1718](https://github.com/NousResearch/hermes-agent/pull/1718))
- Fix: prevent unavailable tool names from leaking into model schemas ([#2072](https://github.com/NousResearch/hermes-agent/pull/2072))
- Fix: pass visited set by reference to prevent diamond dependency duplication ([#2311](https://github.com/NousResearch/hermes-agent/pull/2311))
- Fix: Daytona sandbox lookup migrated from `find_one` to `get/list` ([#2063](https://github.com/NousResearch/hermes-agent/pull/2063) by @rovle)
-
---
-
-## 🧩 Skills Ecosystem
-
-### Skills System Improvements
- **Agent-created skills** — Caution-level findings allowed, dangerous skills ask instead of block ([#1840](https://github.com/NousResearch/hermes-agent/pull/1840), [#2446](https://github.com/NousResearch/hermes-agent/pull/2446))
- **`--yes` flag** to bypass confirmation in `/skills install` and uninstall ([#1647](https://github.com/NousResearch/hermes-agent/pull/1647))
- **Disabled skills respected** across banner, system prompt, and slash commands ([#1897](https://github.com/NousResearch/hermes-agent/pull/1897))
- Fix: skills custom_tools import crash + sandbox file_tools integration ([#2239](https://github.com/NousResearch/hermes-agent/pull/2239))
- Fix: agent-created skills with pip requirements crash on install ([#2145](https://github.com/NousResearch/hermes-agent/pull/2145))
- Fix: race condition in `Skills.__init__` when `hub.yaml` missing ([#2242](https://github.com/NousResearch/hermes-agent/pull/2242))
- Fix: validate skill metadata before install and block duplicates ([#2241](https://github.com/NousResearch/hermes-agent/pull/2241))
- Fix: skills hub inspect/resolve — 4 bugs in inspect, redirects, discovery, tap list ([#2447](https://github.com/NousResearch/hermes-agent/pull/2447))
- Fix: agent-created skills keep working after session reset ([#2121](https://github.com/NousResearch/hermes-agent/pull/2121))
-
-### New Skills
- **OCR-and-documents** — PDF/DOCX/XLS/PPTX/image OCR with optional GPU ([#2236](https://github.com/NousResearch/hermes-agent/pull/2236), [#2461](https://github.com/NousResearch/hermes-agent/pull/2461))
- **Huggingface-hub** bundled skill ([#1921](https://github.com/NousResearch/hermes-agent/pull/1921))
- **Sherlock OSINT** username search ([#1671](https://github.com/NousResearch/hermes-agent/pull/1671))
- **Meme-generation** — Image generator with Pillow ([#2344](https://github.com/NousResearch/hermes-agent/pull/2344))
- **Bioinformatics** gateway skill — index to 400+ bio skills ([#2387](https://github.com/NousResearch/hermes-agent/pull/2387))
- **Inference.sh** skill (terminal-based) ([#1686](https://github.com/NousResearch/hermes-agent/pull/1686))
- **Base blockchain** optional skill ([#1643](https://github.com/NousResearch/hermes-agent/pull/1643))
- **3D-model-viewer** optional skill ([#2226](https://github.com/NousResearch/hermes-agent/pull/2226))
- **FastMCP** optional skill ([#2113](https://github.com/NousResearch/hermes-agent/pull/2113))
- **Hermes-agent-setup** skill ([#1905](https://github.com/NousResearch/hermes-agent/pull/1905))
-
---
-
-## 🔌 Plugin System Enhancements
-
- **TUI extension hooks** — Build custom CLIs on top of Hermes ([#2333](https://github.com/NousResearch/hermes-agent/pull/2333))
- **`hermes plugins install/remove/list`** commands ([#2337](https://github.com/NousResearch/hermes-agent/pull/2337))
- **Slash command registration** for plugins ([#2359](https://github.com/NousResearch/hermes-agent/pull/2359))
- **`session:end` lifecycle event** hook ([#1725](https://github.com/NousResearch/hermes-agent/pull/1725))
- Fix: require opt-in for project plugin discovery ([#2215](https://github.com/NousResearch/hermes-agent/pull/2215))
-
---
-
-## 🔒 Security & Reliability
-
-### Security
- **SSRF protection** for vision_tools and web_tools ([#2679](https://github.com/NousResearch/hermes-agent/pull/2679))
- **Shell injection prevention** in `_expand_path` via `~user` path suffix ([#2685](https://github.com/NousResearch/hermes-agent/pull/2685))
- **Block untrusted browser-origin** API server access ([#2451](https://github.com/NousResearch/hermes-agent/pull/2451))
- **Block sandbox backend creds** from subprocess env ([#1658](https://github.com/NousResearch/hermes-agent/pull/1658))
- **Block @ references** from reading secrets outside workspace ([#2601](https://github.com/NousResearch/hermes-agent/pull/2601) by @Gutslabs)
- **Malicious code pattern pre-exec scanner** for terminal_tool ([#2245](https://github.com/NousResearch/hermes-agent/pull/2245))
- **Harden terminal safety** and sandbox file writes ([#1653](https://github.com/NousResearch/hermes-agent/pull/1653))
- **PKCE verifier leak** fix + OAuth refresh Content-Type ([#1775](https://github.com/NousResearch/hermes-agent/pull/1775))
- **Eliminate SQL string formatting** in `execute()` calls ([#2061](https://github.com/NousResearch/hermes-agent/pull/2061) by @dusterbloom)
- **Harden jobs API** — input limits, field whitelist, startup check ([#2456](https://github.com/NousResearch/hermes-agent/pull/2456))
-
-### Reliability
- Thread locks on 4 SessionDB methods ([#1704](https://github.com/NousResearch/hermes-agent/pull/1704))
- File locking for concurrent memory writes ([#1726](https://github.com/NousResearch/hermes-agent/pull/1726))
- Handle OpenRouter errors gracefully ([#2112](https://github.com/NousResearch/hermes-agent/pull/2112))
- Guard print() calls against OSError ([#1668](https://github.com/NousResearch/hermes-agent/pull/1668))
- Safely handle non-string inputs in redacting formatter ([#2392](https://github.com/NousResearch/hermes-agent/pull/2392), [#1700](https://github.com/NousResearch/hermes-agent/pull/1700))
- ACP: preserve session provider on model switch, persist sessions to disk ([#2380](https://github.com/NousResearch/hermes-agent/pull/2380), [#2071](https://github.com/NousResearch/hermes-agent/pull/2071))
- API server: persist ResponseStore to SQLite across restarts ([#2472](https://github.com/NousResearch/hermes-agent/pull/2472))
- Fix: `fetch_nous_models` always raised TypeError from positional args ([#1699](https://github.com/NousResearch/hermes-agent/pull/1699))
- Fix: resolve merge conflict markers in cli.py breaking startup ([#2347](https://github.com/NousResearch/hermes-agent/pull/2347))
- Fix: `minisweagent_path.py` missing from wheel ([#2098](https://github.com/NousResearch/hermes-agent/pull/2098) by @JiwaniZakir)
-
-### Cron System
- **`[SILENT]` response** — cron agents can suppress delivery ([#1833](https://github.com/NousResearch/hermes-agent/pull/1833))
- **Scale missed-job grace window** with schedule frequency ([#2449](https://github.com/NousResearch/hermes-agent/pull/2449))
- **Recover recent one-shot jobs** ([#1918](https://github.com/NousResearch/hermes-agent/pull/1918))
- Fix: normalize `repeat<=0` to None — jobs deleted after first run when LLM passes -1 ([#2612](https://github.com/NousResearch/hermes-agent/pull/2612) by @Mibayy)
- Fix: Matrix added to scheduler delivery platform_map ([#2167](https://github.com/NousResearch/hermes-agent/pull/2167) by @buntingszn)
- Fix: naive ISO timestamps without timezone — jobs fire at wrong time ([#1729](https://github.com/NousResearch/hermes-agent/pull/1729))
- Fix: `get_due_jobs` reads `jobs.json` twice — race condition ([#1716](https://github.com/NousResearch/hermes-agent/pull/1716))
- Fix: silent jobs return empty response for delivery skip ([#2442](https://github.com/NousResearch/hermes-agent/pull/2442))
- Fix: stop injecting cron outputs into gateway session history ([#2313](https://github.com/NousResearch/hermes-agent/pull/2313))
- Fix: close abandoned coroutine when `asyncio.run()` raises RuntimeError ([#2317](https://github.com/NousResearch/hermes-agent/pull/2317))
-
---
-
-## 🧪 Testing
-
- Resolve all consistently failing tests ([#2488](https://github.com/NousResearch/hermes-agent/pull/2488))
- Replace `FakePath` with `monkeypatch` for Python 3.12 compat ([#2444](https://github.com/NousResearch/hermes-agent/pull/2444))
- Align Hermes setup and full-suite expectations ([#1710](https://github.com/NousResearch/hermes-agent/pull/1710))
-
---
-
-## 📚 Documentation
-
- Comprehensive docs update for recent features ([#1693](https://github.com/NousResearch/hermes-agent/pull/1693), [#2183](https://github.com/NousResearch/hermes-agent/pull/2183))
- Alibaba Cloud and DingTalk setup guides ([#1687](https://github.com/NousResearch/hermes-agent/pull/1687), [#1692](https://github.com/NousResearch/hermes-agent/pull/1692))
- Detailed skills documentation ([#2244](https://github.com/NousResearch/hermes-agent/pull/2244))
- Honcho self-hosted / Docker configuration ([#2475](https://github.com/NousResearch/hermes-agent/pull/2475))
- Context length detection FAQ and quickstart references ([#2179](https://github.com/NousResearch/hermes-agent/pull/2179))
- Fix docs inconsistencies across reference and user guides ([#1995](https://github.com/NousResearch/hermes-agent/pull/1995))
- Fix MCP install commands — use uv, not bare pip ([#1909](https://github.com/NousResearch/hermes-agent/pull/1909))
- Replace ASCII diagrams with Mermaid/lists ([#2402](https://github.com/NousResearch/hermes-agent/pull/2402))
- Gemini OAuth provider implementation plan ([#2467](https://github.com/NousResearch/hermes-agent/pull/2467))
- Discord Server Members Intent marked as required ([#2330](https://github.com/NousResearch/hermes-agent/pull/2330))
- Fix MDX build error in api-server.md ([#1787](https://github.com/NousResearch/hermes-agent/pull/1787))
- Align venv path to match installer ([#2114](https://github.com/NousResearch/hermes-agent/pull/2114))
- New skills added to hub index ([#2281](https://github.com/NousResearch/hermes-agent/pull/2281))
-
---
-
-## 👥 Contributors
-
-### Core
- **@teknium1** (Teknium) — 280 PRs
-
-### Community Contributors
- **@mchzimm** (to_the_max) — GitHub Copilot provider integration ([#1879](https://github.com/NousResearch/hermes-agent/pull/1879))
- **@jquesnelle** (Jeffrey Quesnelle) — Per-thread persistent event loops fix ([#2214](https://github.com/NousResearch/hermes-agent/pull/2214))
- **@llbn** (lbn) — Telegram MarkdownV2 strikethrough, spoiler, blockquotes, and escape fixes ([#2199](https://github.com/NousResearch/hermes-agent/pull/2199), [#2200](https://github.com/NousResearch/hermes-agent/pull/2200))
- **@dusterbloom** — SQL injection prevention + local server context window querying ([#2061](https://github.com/NousResearch/hermes-agent/pull/2061), [#2091](https://github.com/NousResearch/hermes-agent/pull/2091))
- **@0xbyt4** — Anthropic tool_calls None guard + OpenCode-Go provider config fix ([#2209](https://github.com/NousResearch/hermes-agent/pull/2209), [#2393](https://github.com/NousResearch/hermes-agent/pull/2393))
- **@sai-samarth** (Saisamarth) — WhatsApp send_message routing + systemd node path ([#1769](https://github.com/NousResearch/hermes-agent/pull/1769), [#1767](https://github.com/NousResearch/hermes-agent/pull/1767))
- **@Gutslabs** (Guts) — Block @ references from reading secrets ([#2601](https://github.com/NousResearch/hermes-agent/pull/2601))
- **@Mibayy** (Mibay) — Cron job repeat normalization ([#2612](https://github.com/NousResearch/hermes-agent/pull/2612))
- **@ten-jampa** (Tenzin Jampa) — Gateway /title command fix ([#2379](https://github.com/NousResearch/hermes-agent/pull/2379))
- **@cutepawss** (lila) — File tools search pagination fix ([#1824](https://github.com/NousResearch/hermes-agent/pull/1824))
- **@hanai** (Hanai) — OpenAI TTS base_url support ([#2064](https://github.com/NousResearch/hermes-agent/pull/2064))
- **@rovle** (Lovre Pešut) — Daytona sandbox API migration ([#2063](https://github.com/NousResearch/hermes-agent/pull/2063))
- **@buntingszn** (bunting szn) — Matrix cron delivery support ([#2167](https://github.com/NousResearch/hermes-agent/pull/2167))
- **@InB4DevOps** — Token counter reset on new session ([#2101](https://github.com/NousResearch/hermes-agent/pull/2101))
- **@JiwaniZakir** (Zakir Jiwani) — Missing file in wheel fix ([#2098](https://github.com/NousResearch/hermes-agent/pull/2098))
- **@ygd58** (buray) — Delegate tool parent tool names fix ([#2083](https://github.com/NousResearch/hermes-agent/pull/2083))
-
---
-
-**Full Changelog**: [v2026.3.17...v2026.3.23](https://github.com/NousResearch/hermes-agent/compare/v2026.3.17...v2026.3.23)
--- a/RELEASE_v0.5.0.md
+++ b/RELEASE_v0.5.0.md
@@ -1,348 +0,0 @@
-# Hermes Agent v0.5.0 (v2026.3.28)
-
-**Release Date:** March 28, 2026
-
-> The hardening release — Hugging Face provider, /model command overhaul, Telegram Private Chat Topics, native Modal SDK, plugin lifecycle hooks, tool-use enforcement for GPT models, Nix flake, 50+ security and reliability fixes, and a comprehensive supply chain audit.
-
---
-
-## ✨ Highlights
-
- **Nous Portal now supports 400+ models** — The Nous Research inference portal has expanded dramatically, giving Hermes Agent users access to over 400 models through a single provider endpoint
-
- **Hugging Face as a first-class inference provider** — Full integration with HF Inference API including curated agentic model picker that maps to OpenRouter analogues, live `/models` endpoint probe, and setup wizard flow ([#3419](https://github.com/NousResearch/hermes-agent/pull/3419), [#3440](https://github.com/NousResearch/hermes-agent/pull/3440))
-
- **Telegram Private Chat Topics** — Project-based conversations with functional skill binding per topic, enabling isolated workflows within a single Telegram chat ([#3163](https://github.com/NousResearch/hermes-agent/pull/3163))
-
- **Native Modal SDK backend** — Replaced swe-rex dependency with native Modal SDK (`Sandbox.create.aio` + `exec.aio`), eliminating tunnels and simplifying the Modal terminal backend ([#3538](https://github.com/NousResearch/hermes-agent/pull/3538))
-
- **Plugin lifecycle hooks activated** — `pre_llm_call`, `post_llm_call`, `on_session_start`, and `on_session_end` hooks now fire in the agent loop and CLI/gateway, completing the plugin hook system ([#3542](https://github.com/NousResearch/hermes-agent/pull/3542))
-
- **Improved OpenAI Model Reliability** — Added `GPT_TOOL_USE_GUIDANCE` to prevent GPT models from describing intended actions instead of making tool calls, plus automatic stripping of stale budget warnings from conversation history that caused models to avoid tools across turns ([#3528](https://github.com/NousResearch/hermes-agent/pull/3528))
-
- **Nix flake** — Full uv2nix build, NixOS module with persistent container mode, auto-generated config keys from Python source, and suffix PATHs for agent-friendliness ([#20](https://github.com/NousResearch/hermes-agent/pull/20), [#3274](https://github.com/NousResearch/hermes-agent/pull/3274), [#3061](https://github.com/NousResearch/hermes-agent/pull/3061)) by @alt-glitch
-
- **Supply chain hardening** — Removed compromised `litellm` dependency, pinned all dependency version ranges, regenerated `uv.lock` with hashes, added CI workflow scanning PRs for supply chain attack patterns, and bumped deps to fix CVEs ([#2796](https://github.com/NousResearch/hermes-agent/pull/2796), [#2810](https://github.com/NousResearch/hermes-agent/pull/2810), [#2812](https://github.com/NousResearch/hermes-agent/pull/2812), [#2816](https://github.com/NousResearch/hermes-agent/pull/2816), [#3073](https://github.com/NousResearch/hermes-agent/pull/3073))
-
- **Anthropic output limits fix** — Replaced hardcoded 16K `max_tokens` with per-model native output limits (128K for Opus 4.6, 64K for Sonnet 4.6), fixing "Response truncated" and thinking-budget exhaustion on direct Anthropic API ([#3426](https://github.com/NousResearch/hermes-agent/pull/3426), [#3444](https://github.com/NousResearch/hermes-agent/pull/3444))
-
---
-
-## 🏗️ Core Agent & Architecture
-
-### New Provider: Hugging Face
- First-class Hugging Face Inference API integration with auth, setup wizard, and model picker ([#3419](https://github.com/NousResearch/hermes-agent/pull/3419))
- Curated model list mapping OpenRouter agentic defaults to HF equivalents — providers with 8+ curated models skip live `/models` probe for speed ([#3440](https://github.com/NousResearch/hermes-agent/pull/3440))
- Added glm-5-turbo to Z.AI provider model list ([#3095](https://github.com/NousResearch/hermes-agent/pull/3095))
-
-### Provider & Model Improvements
- `/model` command overhaul — extracted shared `switch_model()` pipeline for CLI and gateway, custom endpoint support, provider-aware routing ([#2795](https://github.com/NousResearch/hermes-agent/pull/2795), [#2799](https://github.com/NousResearch/hermes-agent/pull/2799))
- Removed `/model` slash command from CLI and gateway in favor of `hermes model` subcommand ([#3080](https://github.com/NousResearch/hermes-agent/pull/3080))
- Preserve `custom` provider instead of silently remapping to `openrouter` ([#2792](https://github.com/NousResearch/hermes-agent/pull/2792))
- Read root-level `provider` and `base_url` from config.yaml into model config ([#3112](https://github.com/NousResearch/hermes-agent/pull/3112))
- Align Nous Portal model slugs with OpenRouter naming ([#3253](https://github.com/NousResearch/hermes-agent/pull/3253))
- Fix Alibaba provider default endpoint and model list ([#3484](https://github.com/NousResearch/hermes-agent/pull/3484))
- Allow MiniMax users to override `/v1` → `/anthropic` auto-correction ([#3553](https://github.com/NousResearch/hermes-agent/pull/3553))
- Migrate OAuth token refresh to `platform.claude.com` with fallback ([#3246](https://github.com/NousResearch/hermes-agent/pull/3246))
-
-### Agent Loop & Conversation
- **Improved OpenAI model reliability** — `GPT_TOOL_USE_GUIDANCE` prevents GPT models from describing actions instead of calling tools + automatic budget warning stripping from history ([#3528](https://github.com/NousResearch/hermes-agent/pull/3528))
- **Surface lifecycle events** — All retry, fallback, and compression events now surface to the user as formatted messages ([#3153](https://github.com/NousResearch/hermes-agent/pull/3153))
- **Anthropic output limits** — Per-model native output limits instead of hardcoded 16K `max_tokens` ([#3426](https://github.com/NousResearch/hermes-agent/pull/3426))
- **Thinking-budget exhaustion detection** — Skip useless continuation retries when model uses all output tokens on reasoning ([#3444](https://github.com/NousResearch/hermes-agent/pull/3444))
- Always prefer streaming for API calls to prevent hung subagents ([#3120](https://github.com/NousResearch/hermes-agent/pull/3120))
- Restore safe non-streaming fallback after stream failures ([#3020](https://github.com/NousResearch/hermes-agent/pull/3020))
- Give subagents independent iteration budgets ([#3004](https://github.com/NousResearch/hermes-agent/pull/3004))
- Update `api_key` in `_try_activate_fallback` for subagent auth ([#3103](https://github.com/NousResearch/hermes-agent/pull/3103))
- Graceful return on max retries instead of crashing thread ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Count compression restarts toward retry limit ([#3070](https://github.com/NousResearch/hermes-agent/pull/3070))
- Include tool tokens in preflight estimate, guard context probe persistence ([#3164](https://github.com/NousResearch/hermes-agent/pull/3164))
- Update context compressor limits after fallback activation ([#3305](https://github.com/NousResearch/hermes-agent/pull/3305))
- Validate empty user messages to prevent Anthropic API 400 errors ([#3322](https://github.com/NousResearch/hermes-agent/pull/3322))
- GLM reasoning-only and max-length handling ([#3010](https://github.com/NousResearch/hermes-agent/pull/3010))
- Increase API timeout default from 900s to 1800s for slow-thinking models ([#3431](https://github.com/NousResearch/hermes-agent/pull/3431))
- Send `max_tokens` for Claude/OpenRouter + retry SSE connection errors ([#3497](https://github.com/NousResearch/hermes-agent/pull/3497))
- Prevent AsyncOpenAI/httpx cross-loop deadlock in gateway mode ([#2701](https://github.com/NousResearch/hermes-agent/pull/2701)) by @ctlst
-
-### Streaming & Reasoning
- **Persist reasoning across gateway session turns** with new schema v6 columns (`reasoning`, `reasoning_details`, `codex_reasoning_items`) ([#2974](https://github.com/NousResearch/hermes-agent/pull/2974))
- Detect and kill stale SSE connections ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Fix stale stream detector race causing spurious `RemoteProtocolError` ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Skip duplicate callback for `<think>`-extracted reasoning during streaming ([#3116](https://github.com/NousResearch/hermes-agent/pull/3116))
- Preserve reasoning fields in `rewrite_transcript` ([#3311](https://github.com/NousResearch/hermes-agent/pull/3311))
- Preserve Gemini thought signatures in streamed tool calls ([#2997](https://github.com/NousResearch/hermes-agent/pull/2997))
- Ensure first delta is fired during reasoning updates ([untagged commit](https://github.com/NousResearch/hermes-agent))
-
-### Session & Memory
- **Session search recent sessions mode** — Omit query to browse recent sessions with titles, previews, and timestamps ([#2533](https://github.com/NousResearch/hermes-agent/pull/2533))
- **Session config surfacing** on `/new`, `/reset`, and auto-reset ([#3321](https://github.com/NousResearch/hermes-agent/pull/3321))
- **Third-party session isolation** — `--source` flag for isolating sessions by origin ([#3255](https://github.com/NousResearch/hermes-agent/pull/3255))
- Add `/resume` CLI handler, session log truncation guard, `reopen_session` API ([#3315](https://github.com/NousResearch/hermes-agent/pull/3315))
- Clear compressor summary and turn counter on `/clear` and `/new` ([#3102](https://github.com/NousResearch/hermes-agent/pull/3102))
- Surface silent SessionDB failures that cause session data loss ([#2999](https://github.com/NousResearch/hermes-agent/pull/2999))
- Session search fallback preview on summarization failure ([#3478](https://github.com/NousResearch/hermes-agent/pull/3478))
- Prevent stale memory overwrites by flush agent ([#2687](https://github.com/NousResearch/hermes-agent/pull/2687))
-
-### Context Compression
- Replace dead `summary_target_tokens` with ratio-based scaling ([#2554](https://github.com/NousResearch/hermes-agent/pull/2554))
- Expose `compression.target_ratio`, `protect_last_n`, and `threshold` in `DEFAULT_CONFIG` ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Restore sane defaults and cap summary at 12K tokens ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Preserve transcript on `/compress` and hygiene compression ([#3556](https://github.com/NousResearch/hermes-agent/pull/3556))
- Update context pressure warnings and token estimates after compaction ([untagged commit](https://github.com/NousResearch/hermes-agent))
-
-### Architecture & Dependencies
- **Remove mini-swe-agent dependency** — Inline Docker and Modal backends directly ([#2804](https://github.com/NousResearch/hermes-agent/pull/2804))
- **Replace swe-rex with native Modal SDK** for Modal backend ([#3538](https://github.com/NousResearch/hermes-agent/pull/3538))
- **Plugin lifecycle hooks** — `pre_llm_call`, `post_llm_call`, `on_session_start`, `on_session_end` now fire in the agent loop ([#3542](https://github.com/NousResearch/hermes-agent/pull/3542))
- Fix plugin toolsets invisible in `hermes tools` and standalone processes ([#3457](https://github.com/NousResearch/hermes-agent/pull/3457))
- Consolidate `get_hermes_home()` and `parse_reasoning_effort()` ([#3062](https://github.com/NousResearch/hermes-agent/pull/3062))
- Remove unused Hermes-native PKCE OAuth flow ([#3107](https://github.com/NousResearch/hermes-agent/pull/3107))
- Remove ~100 unused imports across 55 files ([#3016](https://github.com/NousResearch/hermes-agent/pull/3016))
- Fix 154 f-strings, simplify getattr/URL patterns, remove dead code ([#3119](https://github.com/NousResearch/hermes-agent/pull/3119))
-
---
-
-## 📱 Messaging Platforms (Gateway)
-
-### Telegram
- **Private Chat Topics** — Project-based conversations with functional skill binding per topic, enabling isolated workflows within a single Telegram chat ([#3163](https://github.com/NousResearch/hermes-agent/pull/3163))
- **Auto-discover fallback IPs via DNS-over-HTTPS** when `api.telegram.org` is unreachable ([#3376](https://github.com/NousResearch/hermes-agent/pull/3376))
- **Configurable reply threading mode** ([#2907](https://github.com/NousResearch/hermes-agent/pull/2907))
- Fall back to no `thread_id` on "Message thread not found" BadRequest ([#3390](https://github.com/NousResearch/hermes-agent/pull/3390))
- Self-reschedule reconnect when `start_polling` fails after 502 ([#3268](https://github.com/NousResearch/hermes-agent/pull/3268))
-
-### Discord
- Stop phantom typing indicator after agent turn completes ([#3003](https://github.com/NousResearch/hermes-agent/pull/3003))
-
-### Slack
- Send tool call progress messages to correct Slack thread ([#3063](https://github.com/NousResearch/hermes-agent/pull/3063))
- Scope progress thread fallback to Slack only ([#3488](https://github.com/NousResearch/hermes-agent/pull/3488))
-
-### WhatsApp
- Download documents, audio, and video media from messages ([#2978](https://github.com/NousResearch/hermes-agent/pull/2978))
-
-### Matrix
- Add missing Matrix entry in `PLATFORMS` dict ([#3473](https://github.com/NousResearch/hermes-agent/pull/3473))
- Harden e2ee access-token handling ([#3562](https://github.com/NousResearch/hermes-agent/pull/3562))
- Add backoff for `SyncError` in sync loop ([#3280](https://github.com/NousResearch/hermes-agent/pull/3280))
-
-### Signal
- Track SSE keepalive comments as connection activity ([#3316](https://github.com/NousResearch/hermes-agent/pull/3316))
-
-### Email
- Prevent unbounded growth of `_seen_uids` in EmailAdapter ([#3490](https://github.com/NousResearch/hermes-agent/pull/3490))
-
-### Gateway Core
- **Config-gated `/verbose` command** for messaging platforms — toggle tool output verbosity from chat ([#3262](https://github.com/NousResearch/hermes-agent/pull/3262))
- **Background review notifications** delivered to user chat ([#3293](https://github.com/NousResearch/hermes-agent/pull/3293))
- **Retry transient send failures** and notify user on exhaustion ([#3288](https://github.com/NousResearch/hermes-agent/pull/3288))
- Recover from hung agents — `/stop` hard-kills session lock ([#3104](https://github.com/NousResearch/hermes-agent/pull/3104))
- Thread-safe `SessionStore` — protect `_entries` with `threading.Lock` ([#3052](https://github.com/NousResearch/hermes-agent/pull/3052))
- Fix gateway token double-counting with cached agents — use absolute set instead of increment ([#3306](https://github.com/NousResearch/hermes-agent/pull/3306), [#3317](https://github.com/NousResearch/hermes-agent/pull/3317))
- Fingerprint full auth token in agent cache signature ([#3247](https://github.com/NousResearch/hermes-agent/pull/3247))
- Silence background agent terminal output ([#3297](https://github.com/NousResearch/hermes-agent/pull/3297))
- Include per-platform `ALLOW_ALL` and `SIGNAL_GROUP` in startup allowlist check ([#3313](https://github.com/NousResearch/hermes-agent/pull/3313))
- Include user-local bin paths in systemd unit PATH ([#3527](https://github.com/NousResearch/hermes-agent/pull/3527))
- Track background task references in `GatewayRunner` ([#3254](https://github.com/NousResearch/hermes-agent/pull/3254))
- Add request timeouts to HA, Email, Mattermost, SMS adapters ([#3258](https://github.com/NousResearch/hermes-agent/pull/3258))
- Add media download retry to Mattermost, Slack, and base cache ([#3323](https://github.com/NousResearch/hermes-agent/pull/3323))
- Detect virtualenv path instead of hardcoding `venv/` ([#2797](https://github.com/NousResearch/hermes-agent/pull/2797))
- Use `TERMINAL_CWD` for context file discovery, not process cwd ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Stop loading hermes repo AGENTS.md into gateway sessions (~10k wasted tokens) ([#2891](https://github.com/NousResearch/hermes-agent/pull/2891))
-
---
-
-## 🖥️ CLI & User Experience
-
-### Interactive CLI
- **Configurable busy input mode** + fix `/queue` always working ([#3298](https://github.com/NousResearch/hermes-agent/pull/3298))
- **Preserve user input on multiline paste** ([#3065](https://github.com/NousResearch/hermes-agent/pull/3065))
- **Tool generation callback** — streaming "preparing terminal…" updates during tool argument generation ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Show tool progress for substantive tools, not just "preparing" ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Buffer reasoning preview chunks and fix duplicate display ([#3013](https://github.com/NousResearch/hermes-agent/pull/3013))
- Prevent reasoning box from rendering 3x during tool-calling loops ([#3405](https://github.com/NousResearch/hermes-agent/pull/3405))
- Eliminate "Event loop is closed" / "Press ENTER to continue" during idle — three-layer fix with `neuter_async_httpx_del()`, custom exception handler, and stale client cleanup ([#3398](https://github.com/NousResearch/hermes-agent/pull/3398))
- Fix status bar showing 26K instead of 260K for token counts with trailing zeros ([#3024](https://github.com/NousResearch/hermes-agent/pull/3024))
- Fix status bar duplication and degradation during long sessions ([#3291](https://github.com/NousResearch/hermes-agent/pull/3291))
- Refresh TUI before background task output to prevent status bar overlap ([#3048](https://github.com/NousResearch/hermes-agent/pull/3048))
- Suppress KawaiiSpinner animation under `patch_stdout` ([#2994](https://github.com/NousResearch/hermes-agent/pull/2994))
- Skip KawaiiSpinner when TUI handles tool progress ([#2973](https://github.com/NousResearch/hermes-agent/pull/2973))
- Guard `isatty()` against closed streams via `_is_tty` property ([#3056](https://github.com/NousResearch/hermes-agent/pull/3056))
- Ensure single closure of streaming boxes during tool generation ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Cap context pressure percentage at 100% in display ([#3480](https://github.com/NousResearch/hermes-agent/pull/3480))
- Clean up HTML error messages in CLI display ([#3069](https://github.com/NousResearch/hermes-agent/pull/3069))
- Show HTTP status code and 400 body in API error output ([#3096](https://github.com/NousResearch/hermes-agent/pull/3096))
- Extract useful info from HTML error pages, dump debug on max retries ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Prevent TypeError on startup when `base_url` is None ([#3068](https://github.com/NousResearch/hermes-agent/pull/3068))
- Prevent update crash in non-TTY environments ([#3094](https://github.com/NousResearch/hermes-agent/pull/3094))
- Handle EOFError in sessions delete/prune confirmation prompts ([#3101](https://github.com/NousResearch/hermes-agent/pull/3101))
- Catch KeyboardInterrupt during `flush_memories` on exit and in exit cleanup handlers ([#3025](https://github.com/NousResearch/hermes-agent/pull/3025), [#3257](https://github.com/NousResearch/hermes-agent/pull/3257))
- Guard `.strip()` against None values from YAML config ([#3552](https://github.com/NousResearch/hermes-agent/pull/3552))
- Guard `config.get()` against YAML null values to prevent AttributeError ([#3377](https://github.com/NousResearch/hermes-agent/pull/3377))
- Store asyncio task references to prevent GC mid-execution ([#3267](https://github.com/NousResearch/hermes-agent/pull/3267))
-
-### Setup & Configuration
- Use explicit key mapping for returning-user menu dispatch instead of positional index ([#3083](https://github.com/NousResearch/hermes-agent/pull/3083))
- Use `sys.executable` for pip in update commands to fix PEP 668 ([#3099](https://github.com/NousResearch/hermes-agent/pull/3099))
- Harden `hermes update` against diverged history, non-main branches, and gateway edge cases ([#3492](https://github.com/NousResearch/hermes-agent/pull/3492))
- OpenClaw migration overwrites defaults and setup wizard skips imported sections — fixed ([#3282](https://github.com/NousResearch/hermes-agent/pull/3282))
- Stop recursive AGENTS.md walk, load top-level only ([#3110](https://github.com/NousResearch/hermes-agent/pull/3110))
- Add macOS Homebrew paths to browser and terminal PATH resolution ([#2713](https://github.com/NousResearch/hermes-agent/pull/2713))
- YAML boolean handling for `tool_progress` config ([#3300](https://github.com/NousResearch/hermes-agent/pull/3300))
- Reset default SOUL.md to baseline identity text ([#3159](https://github.com/NousResearch/hermes-agent/pull/3159))
- Reject relative cwd paths for container terminal backends ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Add explicit `hermes-api-server` toolset for API server platform ([#3304](https://github.com/NousResearch/hermes-agent/pull/3304))
- Reorder setup wizard providers — OpenRouter first ([untagged commit](https://github.com/NousResearch/hermes-agent))
-
---
-
-## 🔧 Tool System
-
-### API Server
- **Idempotency-Key support**, body size limit, and OpenAI error envelope ([#2903](https://github.com/NousResearch/hermes-agent/pull/2903))
- Allow Idempotency-Key in CORS headers ([#3530](https://github.com/NousResearch/hermes-agent/pull/3530))
- Cancel orphaned agent + true interrupt on SSE disconnect ([#3427](https://github.com/NousResearch/hermes-agent/pull/3427))
- Fix streaming breaks when agent makes tool calls ([#2985](https://github.com/NousResearch/hermes-agent/pull/2985))
-
-### Terminal & File Operations
- Handle addition-only hunks in V4A patch parser ([#3325](https://github.com/NousResearch/hermes-agent/pull/3325))
- Exponential backoff for persistent shell polling ([#2996](https://github.com/NousResearch/hermes-agent/pull/2996))
- Add timeout to subprocess calls in `context_references` ([#3469](https://github.com/NousResearch/hermes-agent/pull/3469))
-
-### Browser & Vision
- Handle 402 insufficient credits error in vision tool ([#2802](https://github.com/NousResearch/hermes-agent/pull/2802))
- Fix `browser_vision` ignoring `auxiliary.vision.timeout` config ([#2901](https://github.com/NousResearch/hermes-agent/pull/2901))
- Make browser command timeout configurable via config.yaml ([#2801](https://github.com/NousResearch/hermes-agent/pull/2801))
-
-### MCP
- MCP toolset resolution for runtime and config ([#3252](https://github.com/NousResearch/hermes-agent/pull/3252))
- Add MCP tool name collision protection ([#3077](https://github.com/NousResearch/hermes-agent/pull/3077))
-
-### Auxiliary LLM
- Guard aux LLM calls against None content + reasoning fallback + retry ([#3449](https://github.com/NousResearch/hermes-agent/pull/3449))
- Catch ImportError from `build_anthropic_client` in vision auto-detection ([#3312](https://github.com/NousResearch/hermes-agent/pull/3312))
-
-### Other Tools
- Add request timeouts to `send_message_tool` HTTP calls ([#3162](https://github.com/NousResearch/hermes-agent/pull/3162)) by @memosr
- Auto-repair `jobs.json` with invalid control characters ([#3537](https://github.com/NousResearch/hermes-agent/pull/3537))
- Enable fine-grained tool streaming for Claude/OpenRouter ([#3497](https://github.com/NousResearch/hermes-agent/pull/3497))
-
---
-
-## 🧩 Skills Ecosystem
-
-### Skills System
- **Env var passthrough** for skills and user config — skills can declare environment variables to pass through ([#2807](https://github.com/NousResearch/hermes-agent/pull/2807))
- Cache skills prompt with shared `skill_utils` module for faster TTFT ([#3421](https://github.com/NousResearch/hermes-agent/pull/3421))
- Avoid redundant file re-read for skill conditions ([#2992](https://github.com/NousResearch/hermes-agent/pull/2992))
- Use Git Trees API to prevent silent subdirectory loss during install ([#2995](https://github.com/NousResearch/hermes-agent/pull/2995))
- Fix skills-sh install for deeply nested repo structures ([#2980](https://github.com/NousResearch/hermes-agent/pull/2980))
- Handle null metadata in skill frontmatter ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Preserve trust for skills-sh identifiers + reduce resolution churn ([#3251](https://github.com/NousResearch/hermes-agent/pull/3251))
- Agent-created skills were incorrectly treated as untrusted community content — fixed ([untagged commit](https://github.com/NousResearch/hermes-agent))
-
-### New Skills
- **G0DM0D3 godmode jailbreaking skill** + docs ([#3157](https://github.com/NousResearch/hermes-agent/pull/3157))
- **Docker management skill** added to optional-skills ([#3060](https://github.com/NousResearch/hermes-agent/pull/3060))
- **OpenClaw migration v2** — 17 new modules, terminal recap for migrating from OpenClaw to Hermes ([#2906](https://github.com/NousResearch/hermes-agent/pull/2906))
-
---
-
-## 🔒 Security & Reliability
-
-### Security Hardening
- **SSRF protection** added to `browser_navigate` ([#3058](https://github.com/NousResearch/hermes-agent/pull/3058))
- **SSRF protection** added to `vision_tools` and `web_tools` (hardened) ([#2679](https://github.com/NousResearch/hermes-agent/pull/2679))
- **Restrict subagent toolsets** to parent's enabled set ([#3269](https://github.com/NousResearch/hermes-agent/pull/3269))
- **Prevent zip-slip path traversal** in self-update ([#3250](https://github.com/NousResearch/hermes-agent/pull/3250))
- **Prevent shell injection** in `_expand_path` via `~user` path suffix ([#2685](https://github.com/NousResearch/hermes-agent/pull/2685))
- **Normalize input** before dangerous command detection ([#3260](https://github.com/NousResearch/hermes-agent/pull/3260))
- Make tirith block verdicts approvable instead of hard-blocking ([#3428](https://github.com/NousResearch/hermes-agent/pull/3428))
- Remove compromised `litellm`/`typer`/`platformdirs` from deps ([#2796](https://github.com/NousResearch/hermes-agent/pull/2796))
- Pin all dependency version ranges ([#2810](https://github.com/NousResearch/hermes-agent/pull/2810))
- Regenerate `uv.lock` with hashes, use lockfile in setup ([#2812](https://github.com/NousResearch/hermes-agent/pull/2812))
- Bump dependencies to fix CVEs + regenerate `uv.lock` ([#3073](https://github.com/NousResearch/hermes-agent/pull/3073))
- Supply chain audit CI workflow for PR scanning ([#2816](https://github.com/NousResearch/hermes-agent/pull/2816))
-
-### Reliability
- **SQLite WAL write-lock contention** causing 15-20s TUI freeze — fixed ([#3385](https://github.com/NousResearch/hermes-agent/pull/3385))
- **SQLite concurrency hardening** + session transcript integrity ([#3249](https://github.com/NousResearch/hermes-agent/pull/3249))
- Prevent recurring cron job re-fire on gateway crash/restart loop ([#3396](https://github.com/NousResearch/hermes-agent/pull/3396))
- Mark cron session as ended after job completes ([#2998](https://github.com/NousResearch/hermes-agent/pull/2998))
-
---
-
-## ⚡ Performance
-
- **TTFT startup optimizations** — salvaged easy-win startup improvements ([#3395](https://github.com/NousResearch/hermes-agent/pull/3395))
- Cache skills prompt with shared `skill_utils` module ([#3421](https://github.com/NousResearch/hermes-agent/pull/3421))
- Avoid redundant file re-read for skill conditions in prompt builder ([#2992](https://github.com/NousResearch/hermes-agent/pull/2992))
-
---
-
-## 🐛 Notable Bug Fixes
-
- Fix gateway token double-counting with cached agents ([#3306](https://github.com/NousResearch/hermes-agent/pull/3306), [#3317](https://github.com/NousResearch/hermes-agent/pull/3317))
- Fix "Event loop is closed" / "Press ENTER to continue" during idle sessions ([#3398](https://github.com/NousResearch/hermes-agent/pull/3398))
- Fix reasoning box rendering 3x during tool-calling loops ([#3405](https://github.com/NousResearch/hermes-agent/pull/3405))
- Fix status bar showing 26K instead of 260K for token counts ([#3024](https://github.com/NousResearch/hermes-agent/pull/3024))
- Fix `/queue` always working regardless of config ([#3298](https://github.com/NousResearch/hermes-agent/pull/3298))
- Fix phantom Discord typing indicator after agent turn ([#3003](https://github.com/NousResearch/hermes-agent/pull/3003))
- Fix Slack progress messages appearing in wrong thread ([#3063](https://github.com/NousResearch/hermes-agent/pull/3063))
- Fix WhatsApp media downloads (documents, audio, video) ([#2978](https://github.com/NousResearch/hermes-agent/pull/2978))
- Fix Telegram "Message thread not found" killing progress messages ([#3390](https://github.com/NousResearch/hermes-agent/pull/3390))
- Fix OpenClaw migration overwriting defaults ([#3282](https://github.com/NousResearch/hermes-agent/pull/3282))
- Fix returning-user setup menu dispatching wrong section ([#3083](https://github.com/NousResearch/hermes-agent/pull/3083))
- Fix `hermes update` PEP 668 "externally-managed-environment" error ([#3099](https://github.com/NousResearch/hermes-agent/pull/3099))
- Fix subagents hitting `max_iterations` prematurely via shared budget ([#3004](https://github.com/NousResearch/hermes-agent/pull/3004))
- Fix YAML boolean handling for `tool_progress` config ([#3300](https://github.com/NousResearch/hermes-agent/pull/3300))
- Fix `config.get()` crashes on YAML null values ([#3377](https://github.com/NousResearch/hermes-agent/pull/3377))
- Fix `.strip()` crash on None values from YAML config ([#3552](https://github.com/NousResearch/hermes-agent/pull/3552))
- Fix hung agents on gateway — `/stop` now hard-kills session lock ([#3104](https://github.com/NousResearch/hermes-agent/pull/3104))
- Fix `_custom` provider silently remapped to `openrouter` ([#2792](https://github.com/NousResearch/hermes-agent/pull/2792))
- Fix Matrix missing from `PLATFORMS` dict ([#3473](https://github.com/NousResearch/hermes-agent/pull/3473))
- Fix Email adapter unbounded `_seen_uids` growth ([#3490](https://github.com/NousResearch/hermes-agent/pull/3490))
-
---
-
-## 🧪 Testing
-
- Pin `agent-client-protocol` < 0.9 to handle breaking upstream release ([#3320](https://github.com/NousResearch/hermes-agent/pull/3320))
- Catch anthropic ImportError in vision auto-detection tests ([#3312](https://github.com/NousResearch/hermes-agent/pull/3312))
- Update retry-exhaust test for new graceful return behavior ([#3320](https://github.com/NousResearch/hermes-agent/pull/3320))
- Add regression tests for null metadata frontmatter ([untagged commit](https://github.com/NousResearch/hermes-agent))
-
---
-
-## 📚 Documentation
-
- Update all docs for `/model` command overhaul and custom provider support ([#2800](https://github.com/NousResearch/hermes-agent/pull/2800))
- Fix stale and incorrect documentation across 18 files ([#2805](https://github.com/NousResearch/hermes-agent/pull/2805))
- Document 9 previously undocumented features ([#2814](https://github.com/NousResearch/hermes-agent/pull/2814))
- Add missing skills, CLI commands, and messaging env vars to docs ([#2809](https://github.com/NousResearch/hermes-agent/pull/2809))
- Fix api-server response storage documentation — SQLite, not in-memory ([#2819](https://github.com/NousResearch/hermes-agent/pull/2819))
- Quote pip install extras to fix zsh glob errors ([#2815](https://github.com/NousResearch/hermes-agent/pull/2815))
- Unify hooks documentation — add plugin hooks to hooks page, add `session:end` event ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Clarify two-mode behavior in `session_search` schema description ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Fix Discord Public Bot setting for Discord-provided invite link ([#3519](https://github.com/NousResearch/hermes-agent/pull/3519)) by @mehmoodosman
- Revise v0.4.0 changelog — fix feature attribution, reorder sections ([untagged commit](https://github.com/NousResearch/hermes-agent))
-
---
-
-## 👥 Contributors
-
-### Core
- **@teknium1** — 157 PRs covering the full scope of this release
-
-### Community Contributors
- **@alt-glitch** (Siddharth Balyan) — 2 PRs: Nix flake with uv2nix build, NixOS module, and persistent container mode ([#20](https://github.com/NousResearch/hermes-agent/pull/20)); auto-generated config keys and suffix PATHs for Nix builds ([#3061](https://github.com/NousResearch/hermes-agent/pull/3061), [#3274](https://github.com/NousResearch/hermes-agent/pull/3274))
- **@ctlst** — 1 PR: Prevent AsyncOpenAI/httpx cross-loop deadlock in gateway mode ([#2701](https://github.com/NousResearch/hermes-agent/pull/2701))
- **@memosr** (memosr.eth) — 1 PR: Add request timeouts to `send_message_tool` HTTP calls ([#3162](https://github.com/NousResearch/hermes-agent/pull/3162))
- **@mehmoodosman** (Osman Mehmood) — 1 PR: Fix Discord docs for Public Bot setting ([#3519](https://github.com/NousResearch/hermes-agent/pull/3519))
-
-### All Contributors
-@alt-glitch, @ctlst, @mehmoodosman, @memosr, @teknium1
-
---
-
-**Full Changelog**: [v2026.3.23...v2026.3.28](https://github.com/NousResearch/hermes-agent/compare/v2026.3.23...v2026.3.28)
--- a/RELEASE_v0.6.0.md
+++ b/RELEASE_v0.6.0.md
@@ -1,249 +0,0 @@
-# Hermes Agent v0.6.0 (v2026.3.30)
-
-**Release Date:** March 30, 2026
-
-> The multi-instance release — Profiles for running isolated agent instances, MCP server mode, Docker container, fallback provider chains, two new messaging platforms (Feishu/Lark and WeCom), Telegram webhook mode, Slack multi-workspace OAuth, 95 PRs and 16 resolved issues in 2 days.
-
---
-
-## ✨ Highlights
-
- **Profiles — Multi-Instance Hermes** — Run multiple isolated Hermes instances from the same installation. Each profile gets its own config, memory, sessions, skills, and gateway service. Create with `hermes profile create`, switch with `hermes -p <name>`, export/import for sharing. Full token-lock isolation prevents two profiles from using the same bot credential. ([#3681](https://github.com/NousResearch/hermes-agent/pull/3681))
-
- **MCP Server Mode** — Expose Hermes conversations and sessions to any MCP-compatible client (Claude Desktop, Cursor, VS Code, etc.) via `hermes mcp serve`. Browse conversations, read messages, search across sessions, and manage attachments — all through the Model Context Protocol. Supports both stdio and Streamable HTTP transports. ([#3795](https://github.com/NousResearch/hermes-agent/pull/3795))
-
- **Docker Container** — Official Dockerfile for running Hermes Agent in a container. Supports both CLI and gateway modes with volume-mounted config. ([#3668](https://github.com/NousResearch/hermes-agent/pull/3668), closes [#850](https://github.com/NousResearch/hermes-agent/issues/850))
-
- **Ordered Fallback Provider Chain** — Configure multiple inference providers with automatic failover. When your primary provider returns errors or is unreachable, Hermes automatically tries the next provider in the chain. Configure via `fallback_providers` in config.yaml. ([#3813](https://github.com/NousResearch/hermes-agent/pull/3813), closes [#1734](https://github.com/NousResearch/hermes-agent/issues/1734))
-
- **Feishu/Lark Platform Support** — Full gateway adapter for Feishu (飞书) and Lark with event subscriptions, message cards, group chat, image/file attachments, and interactive card callbacks. ([#3799](https://github.com/NousResearch/hermes-agent/pull/3799), [#3817](https://github.com/NousResearch/hermes-agent/pull/3817), closes [#1788](https://github.com/NousResearch/hermes-agent/issues/1788))
-
- **WeCom (Enterprise WeChat) Platform Support** — New gateway adapter for WeCom (企业微信) with text/image/voice messages, group chats, and callback verification. ([#3847](https://github.com/NousResearch/hermes-agent/pull/3847))
-
- **Slack Multi-Workspace OAuth** — Connect a single Hermes gateway to multiple Slack workspaces via OAuth token file. Each workspace gets its own bot token, resolved dynamically per incoming event. ([#3903](https://github.com/NousResearch/hermes-agent/pull/3903))
-
- **Telegram Webhook Mode & Group Controls** — Run the Telegram adapter in webhook mode as an alternative to polling — faster response times and better for production deployments behind a reverse proxy. New group mention gating controls when the bot responds: always, only when @mentioned, or via regex triggers. ([#3880](https://github.com/NousResearch/hermes-agent/pull/3880), [#3870](https://github.com/NousResearch/hermes-agent/pull/3870))
-
- **Exa Search Backend** — Add Exa as an alternative web search and content extraction backend alongside Firecrawl and DuckDuckGo. Set `EXA_API_KEY` and configure as preferred backend. ([#3648](https://github.com/NousResearch/hermes-agent/pull/3648))
-
- **Skills & Credentials on Remote Backends** — Mount skill directories and credential files into Modal and Docker containers, so remote terminal sessions have access to the same skills and secrets as local execution. ([#3890](https://github.com/NousResearch/hermes-agent/pull/3890), [#3671](https://github.com/NousResearch/hermes-agent/pull/3671), closes [#3665](https://github.com/NousResearch/hermes-agent/issues/3665), [#3433](https://github.com/NousResearch/hermes-agent/issues/3433))
-
---
-
-## 🏗️ Core Agent & Architecture
-
-### Provider & Model Support
- **Ordered fallback provider chain** — automatic failover across multiple configured providers ([#3813](https://github.com/NousResearch/hermes-agent/pull/3813))
- **Fix api_mode on provider switch** — switching providers via `hermes model` now correctly clears stale `api_mode` instead of hardcoding `chat_completions`, fixing 404s for providers with Anthropic-compatible endpoints ([#3726](https://github.com/NousResearch/hermes-agent/pull/3726), [#3857](https://github.com/NousResearch/hermes-agent/pull/3857), closes [#3685](https://github.com/NousResearch/hermes-agent/issues/3685))
- **Stop silent OpenRouter fallback** — when no provider is configured, Hermes now raises a clear error instead of silently routing to OpenRouter ([#3807](https://github.com/NousResearch/hermes-agent/pull/3807), [#3862](https://github.com/NousResearch/hermes-agent/pull/3862))
- **Gemini 3.1 preview models** — added to OpenRouter and Nous Portal catalogs ([#3803](https://github.com/NousResearch/hermes-agent/pull/3803), closes [#3753](https://github.com/NousResearch/hermes-agent/issues/3753))
- **Gemini direct API context length** — full context length resolution for direct Google AI endpoints ([#3876](https://github.com/NousResearch/hermes-agent/pull/3876))
- **gpt-5.4-mini** added to Codex fallback catalog ([#3855](https://github.com/NousResearch/hermes-agent/pull/3855))
- **Curated model lists preferred** over live API probe when the probe returns fewer models ([#3856](https://github.com/NousResearch/hermes-agent/pull/3856), [#3867](https://github.com/NousResearch/hermes-agent/pull/3867))
- **User-friendly 429 rate limit messages** with Retry-After countdown ([#3809](https://github.com/NousResearch/hermes-agent/pull/3809))
- **Auxiliary client placeholder key** for local servers without auth requirements ([#3842](https://github.com/NousResearch/hermes-agent/pull/3842))
- **INFO-level logging** for auxiliary provider resolution ([#3866](https://github.com/NousResearch/hermes-agent/pull/3866))
-
-### Agent Loop & Conversation
- **Subagent status reporting** — reports `completed` status when summary exists instead of generic failure ([#3829](https://github.com/NousResearch/hermes-agent/pull/3829))
- **Session log file updated during compression** — prevents stale file references after context compression ([#3835](https://github.com/NousResearch/hermes-agent/pull/3835))
- **Omit empty tools param** — sends no `tools` parameter when empty instead of `None`, fixing compatibility with strict providers ([#3820](https://github.com/NousResearch/hermes-agent/pull/3820))
-
-### Profiles & Multi-Instance
- **Profiles system** — `hermes profile create/list/switch/delete/export/import/rename`. Each profile gets isolated HERMES_HOME, gateway service, CLI wrapper. Token locks prevent credential collisions. Tab completion for profile names. ([#3681](https://github.com/NousResearch/hermes-agent/pull/3681))
- **Profile-aware display paths** — all user-facing `~/.hermes` paths replaced with `display_hermes_home()` to show the correct profile directory ([#3623](https://github.com/NousResearch/hermes-agent/pull/3623))
- **Lazy display_hermes_home imports** — prevents `ImportError` during `hermes update` when modules cache stale bytecode ([#3776](https://github.com/NousResearch/hermes-agent/pull/3776))
- **HERMES_HOME for protected paths** — `.env` write-deny path now respects HERMES_HOME instead of hardcoded `~/.hermes` ([#3840](https://github.com/NousResearch/hermes-agent/pull/3840))
-
---
-
-## 📱 Messaging Platforms (Gateway)
-
-### New Platforms
- **Feishu/Lark** — Full adapter with event subscriptions, message cards, group chat, image/file attachments, interactive card callbacks ([#3799](https://github.com/NousResearch/hermes-agent/pull/3799), [#3817](https://github.com/NousResearch/hermes-agent/pull/3817))
- **WeCom (Enterprise WeChat)** — Text/image/voice messages, group chats, callback verification ([#3847](https://github.com/NousResearch/hermes-agent/pull/3847))
-
-### Telegram
- **Webhook mode** — run as webhook endpoint instead of polling for production deployments ([#3880](https://github.com/NousResearch/hermes-agent/pull/3880))
- **Group mention gating & regex triggers** — configurable bot response behavior in groups: always, @mention-only, or regex-matched ([#3870](https://github.com/NousResearch/hermes-agent/pull/3870))
- **Gracefully handle deleted reply targets** — no more crashes when the message being replied to was deleted ([#3858](https://github.com/NousResearch/hermes-agent/pull/3858), closes [#3229](https://github.com/NousResearch/hermes-agent/issues/3229))
-
-### Discord
- **Message processing reactions** — adds a reaction emoji while processing and removes it when done, giving visual feedback in channels ([#3871](https://github.com/NousResearch/hermes-agent/pull/3871))
- **DISCORD_IGNORE_NO_MENTION** — skip messages that @mention other users/bots but not Hermes ([#3640](https://github.com/NousResearch/hermes-agent/pull/3640))
- **Clean up deferred "thinking..."** — properly removes the "thinking..." indicator after slash commands complete ([#3674](https://github.com/NousResearch/hermes-agent/pull/3674), closes [#3595](https://github.com/NousResearch/hermes-agent/issues/3595))
-
-### Slack
- **Multi-workspace OAuth** — connect to multiple Slack workspaces from a single gateway via OAuth token file ([#3903](https://github.com/NousResearch/hermes-agent/pull/3903))
-
-### WhatsApp
- **Persistent aiohttp session** — reuse HTTP sessions across requests instead of creating new ones per message ([#3818](https://github.com/NousResearch/hermes-agent/pull/3818))
- **LID↔phone alias resolution** — correctly match Linked ID and phone number formats in allowlists ([#3830](https://github.com/NousResearch/hermes-agent/pull/3830))
- **Skip reply prefix in bot mode** — cleaner message formatting when running as a WhatsApp bot ([#3931](https://github.com/NousResearch/hermes-agent/pull/3931))
-
-### Matrix
- **Native voice messages via MSC3245** — send voice messages as proper Matrix voice events instead of file attachments ([#3877](https://github.com/NousResearch/hermes-agent/pull/3877))
-
-### Mattermost
- **Configurable mention behavior** — respond to messages without requiring @mention ([#3664](https://github.com/NousResearch/hermes-agent/pull/3664))
-
-### Signal
- **URL-encode phone numbers** and correct attachment RPC parameter — fixes delivery failures with certain phone number formats ([#3670](https://github.com/NousResearch/hermes-agent/pull/3670)) — @kshitijk4poor
-
-### Email
- **Close SMTP/IMAP connections on failure** — prevents connection leaks during error scenarios ([#3804](https://github.com/NousResearch/hermes-agent/pull/3804))
-
-### Gateway Core
- **Atomic config writes** — use atomic file writes for config.yaml to prevent data loss during crashes ([#3800](https://github.com/NousResearch/hermes-agent/pull/3800))
- **Home channel env overrides** — apply environment variable overrides for home channels consistently ([#3796](https://github.com/NousResearch/hermes-agent/pull/3796), [#3808](https://github.com/NousResearch/hermes-agent/pull/3808))
- **Replace print() with logger** — BasePlatformAdapter now uses proper logging instead of print statements ([#3669](https://github.com/NousResearch/hermes-agent/pull/3669))
- **Cron delivery labels** — resolve human-friendly delivery labels via channel directory ([#3860](https://github.com/NousResearch/hermes-agent/pull/3860), closes [#1945](https://github.com/NousResearch/hermes-agent/issues/1945))
- **Cron [SILENT] tightening** — prevent agents from prefixing reports with [SILENT] to suppress delivery ([#3901](https://github.com/NousResearch/hermes-agent/pull/3901))
- **Background task media delivery** and vision download timeout fixes ([#3919](https://github.com/NousResearch/hermes-agent/pull/3919))
- **Boot-md hook** — example built-in hook to run a BOOT.md file on gateway startup ([#3733](https://github.com/NousResearch/hermes-agent/pull/3733))
-
---
-
-## 🖥️ CLI & User Experience
-
-### Interactive CLI
- **Configurable tool preview length** — show full file paths by default instead of truncating at 40 chars ([#3841](https://github.com/NousResearch/hermes-agent/pull/3841))
- **Tool token context display** — `hermes tools` checklist now shows estimated token cost per toolset ([#3805](https://github.com/NousResearch/hermes-agent/pull/3805))
- **/bg spinner TUI fix** — route background task spinner through the TUI widget to prevent status bar collision ([#3643](https://github.com/NousResearch/hermes-agent/pull/3643))
- **Prevent status bar wrapping** into duplicate rows ([#3883](https://github.com/NousResearch/hermes-agent/pull/3883)) — @kshitijk4poor
- **Handle closed stdout ValueError** in safe print paths — fixes crashes when stdout is closed during gateway thread shutdown ([#3843](https://github.com/NousResearch/hermes-agent/pull/3843), closes [#3534](https://github.com/NousResearch/hermes-agent/issues/3534))
- **Remove input() from /tools disable** — eliminates freeze in terminal when disabling tools ([#3918](https://github.com/NousResearch/hermes-agent/pull/3918))
- **TTY guard for interactive CLI commands** — prevent CPU spin when launched without a terminal ([#3933](https://github.com/NousResearch/hermes-agent/pull/3933))
- **Argparse entrypoint** — use argparse in the top-level launcher for cleaner error handling ([#3874](https://github.com/NousResearch/hermes-agent/pull/3874))
- **Lazy-initialized tools show yellow** in banner instead of red, reducing false alarms about "missing" tools ([#3822](https://github.com/NousResearch/hermes-agent/pull/3822))
- **Honcho tools shown in banner** when configured ([#3810](https://github.com/NousResearch/hermes-agent/pull/3810))
-
-### Setup & Configuration
- **Auto-install matrix-nio** during `hermes setup` when Matrix is selected ([#3802](https://github.com/NousResearch/hermes-agent/pull/3802), [#3873](https://github.com/NousResearch/hermes-agent/pull/3873))
- **Session export stdout support** — export sessions to stdout with `-` for piping ([#3641](https://github.com/NousResearch/hermes-agent/pull/3641), closes [#3609](https://github.com/NousResearch/hermes-agent/issues/3609))
- **Configurable approval timeouts** — set how long dangerous command approval prompts wait before auto-denying ([#3886](https://github.com/NousResearch/hermes-agent/pull/3886), closes [#3765](https://github.com/NousResearch/hermes-agent/issues/3765))
- **Clear __pycache__ during update** — prevents stale bytecode ImportError after `hermes update` ([#3819](https://github.com/NousResearch/hermes-agent/pull/3819))
-
---
-
-## 🔧 Tool System
-
-### MCP
- **MCP Server Mode** — `hermes mcp serve` exposes conversations, sessions, and attachments to MCP clients via stdio or Streamable HTTP ([#3795](https://github.com/NousResearch/hermes-agent/pull/3795))
- **Dynamic tool discovery** — respond to `notifications/tools/list_changed` events to pick up new tools from MCP servers without reconnecting ([#3812](https://github.com/NousResearch/hermes-agent/pull/3812))
- **Non-deprecated HTTP transport** — switched from `sse_client` to `streamable_http_client` ([#3646](https://github.com/NousResearch/hermes-agent/pull/3646))
-
-### Web Tools
- **Exa search backend** — alternative to Firecrawl and DuckDuckGo for web search and extraction ([#3648](https://github.com/NousResearch/hermes-agent/pull/3648))
-
-### Browser
- **Guard against None LLM responses** in browser snapshot and vision tools ([#3642](https://github.com/NousResearch/hermes-agent/pull/3642))
-
-### Terminal & Remote Backends
- **Mount skill directories** into Modal and Docker containers ([#3890](https://github.com/NousResearch/hermes-agent/pull/3890))
- **Mount credential files** into remote backends with mtime+size caching ([#3671](https://github.com/NousResearch/hermes-agent/pull/3671))
- **Preserve partial output** when commands time out instead of losing everything ([#3868](https://github.com/NousResearch/hermes-agent/pull/3868))
- **Stop marking persisted env vars as missing** on remote backends ([#3650](https://github.com/NousResearch/hermes-agent/pull/3650))
-
-### Audio
- **.aac format support** in transcription tool ([#3865](https://github.com/NousResearch/hermes-agent/pull/3865), closes [#1963](https://github.com/NousResearch/hermes-agent/issues/1963))
- **Audio download retry** — retry logic for `cache_audio_from_url` matching the existing image download pattern ([#3401](https://github.com/NousResearch/hermes-agent/pull/3401)) — @binhnt92
-
-### Vision
- **Reject non-image files** and enforce website-only policy for vision analysis ([#3845](https://github.com/NousResearch/hermes-agent/pull/3845))
-
-### Tool Schema
- **Ensure name field** always present in tool definitions, fixing `KeyError: 'name'` crashes ([#3811](https://github.com/NousResearch/hermes-agent/pull/3811), closes [#3729](https://github.com/NousResearch/hermes-agent/issues/3729))
-
-### ACP (Editor Integration)
- **Complete session management surface** for VS Code/Zed/JetBrains clients — proper task lifecycle, cancel support, session persistence ([#3675](https://github.com/NousResearch/hermes-agent/pull/3675))
-
---
-
-## 🧩 Skills & Plugins
-
-### Skills System
- **External skill directories** — configure additional skill directories via `skills.external_dirs` in config.yaml ([#3678](https://github.com/NousResearch/hermes-agent/pull/3678))
- **Category path traversal blocked** — prevents `../` attacks in skill category names ([#3844](https://github.com/NousResearch/hermes-agent/pull/3844))
- **parallel-cli moved to optional-skills** — reduces default skill footprint ([#3673](https://github.com/NousResearch/hermes-agent/pull/3673)) — @kshitijk4poor
-
-### New Skills
- **memento-flashcards** — spaced repetition flashcard system ([#3827](https://github.com/NousResearch/hermes-agent/pull/3827))
- **songwriting-and-ai-music** — songwriting craft and AI music generation prompts ([#3834](https://github.com/NousResearch/hermes-agent/pull/3834))
- **SiYuan Note** — integration with SiYuan note-taking app ([#3742](https://github.com/NousResearch/hermes-agent/pull/3742))
- **Scrapling** — web scraping skill using Scrapling library ([#3742](https://github.com/NousResearch/hermes-agent/pull/3742))
- **one-three-one-rule** — communication framework skill ([#3797](https://github.com/NousResearch/hermes-agent/pull/3797))
-
-### Plugin System
- **Plugin enable/disable commands** — `hermes plugins enable/disable <name>` for managing plugin state without removing them ([#3747](https://github.com/NousResearch/hermes-agent/pull/3747))
- **Plugin message injection** — plugins can now inject messages into the conversation stream on behalf of the user via `ctx.inject_message()` ([#3778](https://github.com/NousResearch/hermes-agent/pull/3778)) — @winglian
- **Honcho self-hosted support** — allow local Honcho instances without requiring an API key ([#3644](https://github.com/NousResearch/hermes-agent/pull/3644))
-
---
-
-## 🔒 Security & Reliability
-
-### Security Hardening
- **Hardened dangerous command detection** — expanded pattern matching for risky shell commands and added file tool path guards for sensitive locations (`/etc/`, `/boot/`, docker.sock) ([#3872](https://github.com/NousResearch/hermes-agent/pull/3872))
- **Sensitive path write checks** in approval system — catch writes to system config files through file tools, not just terminal ([#3859](https://github.com/NousResearch/hermes-agent/pull/3859))
- **Secret redaction expansion** — now covers ElevenLabs, Tavily, and Exa API keys ([#3920](https://github.com/NousResearch/hermes-agent/pull/3920))
- **Vision file rejection** — reject non-image files passed to vision analysis to prevent information disclosure ([#3845](https://github.com/NousResearch/hermes-agent/pull/3845))
- **Category path traversal blocking** — prevent directory traversal in skill category names ([#3844](https://github.com/NousResearch/hermes-agent/pull/3844))
-
-### Reliability
- **Atomic config.yaml writes** — prevent data loss during gateway crashes ([#3800](https://github.com/NousResearch/hermes-agent/pull/3800))
- **Clear __pycache__ on update** — prevent stale bytecode from causing ImportError after updates ([#3819](https://github.com/NousResearch/hermes-agent/pull/3819))
- **Lazy imports for update safety** — prevent ImportError chains during `hermes update` when modules reference new functions ([#3776](https://github.com/NousResearch/hermes-agent/pull/3776))
- **Restore terminalbench2 from patch corruption** — recovered file damaged by patch tool's secret redaction ([#3801](https://github.com/NousResearch/hermes-agent/pull/3801))
- **Terminal timeout preserves partial output** — no more lost command output on timeout ([#3868](https://github.com/NousResearch/hermes-agent/pull/3868))
-
---
-
-## 🐛 Notable Bug Fixes
-
- **OpenClaw migration model config overwrite** — migration no longer overwrites model config dict with a string ([#3924](https://github.com/NousResearch/hermes-agent/pull/3924)) — @0xbyt4
- **OpenClaw migration expanded** — covers full data footprint including sessions, cron, memory ([#3869](https://github.com/NousResearch/hermes-agent/pull/3869))
- **Telegram deleted reply targets** — gracefully handle replies to deleted messages instead of crashing ([#3858](https://github.com/NousResearch/hermes-agent/pull/3858))
- **Discord "thinking..." persistence** — properly cleans up deferred response indicators ([#3674](https://github.com/NousResearch/hermes-agent/pull/3674))
- **WhatsApp LID↔phone aliases** — fixes allowlist matching failures with Linked ID format ([#3830](https://github.com/NousResearch/hermes-agent/pull/3830))
- **Signal URL-encoded phone numbers** — fixes delivery failures with certain formats ([#3670](https://github.com/NousResearch/hermes-agent/pull/3670))
- **Email connection leaks** — properly close SMTP/IMAP connections on error ([#3804](https://github.com/NousResearch/hermes-agent/pull/3804))
- **_safe_print ValueError** — no more gateway thread crashes on closed stdout ([#3843](https://github.com/NousResearch/hermes-agent/pull/3843))
- **Tool schema KeyError 'name'** — ensure name field always present in tool definitions ([#3811](https://github.com/NousResearch/hermes-agent/pull/3811))
- **api_mode stale on provider switch** — correctly clear when switching providers via `hermes model` ([#3857](https://github.com/NousResearch/hermes-agent/pull/3857))
-
---
-
-## 🧪 Testing
-
- Resolved 10+ CI failures across hooks, tiktoken, plugins, and skill tests ([#3848](https://github.com/NousResearch/hermes-agent/pull/3848), [#3721](https://github.com/NousResearch/hermes-agent/pull/3721), [#3936](https://github.com/NousResearch/hermes-agent/pull/3936))
-
---
-
-## 📚 Documentation
-
- **Comprehensive OpenClaw migration guide** — step-by-step guide for migrating from OpenClaw/Claw3D to Hermes Agent ([#3864](https://github.com/NousResearch/hermes-agent/pull/3864), [#3900](https://github.com/NousResearch/hermes-agent/pull/3900))
- **Credential file passthrough docs** — document how to forward credential files and env vars to remote backends ([#3677](https://github.com/NousResearch/hermes-agent/pull/3677))
- **DuckDuckGo requirements clarified** — note runtime dependency on duckduckgo-search package ([#3680](https://github.com/NousResearch/hermes-agent/pull/3680))
- **Skills catalog updated** — added red-teaming category and optional skills listing ([#3745](https://github.com/NousResearch/hermes-agent/pull/3745))
- **Feishu docs MDX fix** — escape angle-bracket URLs that break Docusaurus build ([#3902](https://github.com/NousResearch/hermes-agent/pull/3902))
-
---
-
-## 👥 Contributors
-
-### Core
- **@teknium1** — 90 PRs across all subsystems
-
-### Community Contributors
- **@kshitijk4poor** — 3 PRs: Signal phone number fix ([#3670](https://github.com/NousResearch/hermes-agent/pull/3670)), parallel-cli to optional-skills ([#3673](https://github.com/NousResearch/hermes-agent/pull/3673)), status bar wrapping fix ([#3883](https://github.com/NousResearch/hermes-agent/pull/3883))
- **@winglian** — 1 PR: Plugin message injection interface ([#3778](https://github.com/NousResearch/hermes-agent/pull/3778))
- **@binhnt92** — 1 PR: Audio download retry logic ([#3401](https://github.com/NousResearch/hermes-agent/pull/3401))
- **@0xbyt4** — 1 PR: OpenClaw migration model config fix ([#3924](https://github.com/NousResearch/hermes-agent/pull/3924))
-
-### Issues Resolved from Community
-@Material-Scientist ([#850](https://github.com/NousResearch/hermes-agent/issues/850)), @hanxu98121 ([#1734](https://github.com/NousResearch/hermes-agent/issues/1734)), @penwyp ([#1788](https://github.com/NousResearch/hermes-agent/issues/1788)), @dan-and ([#1945](https://github.com/NousResearch/hermes-agent/issues/1945)), @AdrianScott ([#1963](https://github.com/NousResearch/hermes-agent/issues/1963)), @clawdbot47 ([#3229](https://github.com/NousResearch/hermes-agent/issues/3229)), @alanfwilliams ([#3404](https://github.com/NousResearch/hermes-agent/issues/3404)), @kentimsit ([#3433](https://github.com/NousResearch/hermes-agent/issues/3433)), @hayka-pacha ([#3534](https://github.com/NousResearch/hermes-agent/issues/3534)), @primmer ([#3595](https://github.com/NousResearch/hermes-agent/issues/3595)), @dagelf ([#3609](https://github.com/NousResearch/hermes-agent/issues/3609)), @HenkDz ([#3685](https://github.com/NousResearch/hermes-agent/issues/3685)), @tmdgusya ([#3729](https://github.com/NousResearch/hermes-agent/issues/3729)), @TypQxQ ([#3753](https://github.com/NousResearch/hermes-agent/issues/3753)), @acsezen ([#3765](https://github.com/NousResearch/hermes-agent/issues/3765))
-
---
-
-**Full Changelog**: [v2026.3.28...v2026.3.30](https://github.com/NousResearch/hermes-agent/compare/v2026.3.28...v2026.3.30)
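
The v0.6.0 notes removed above mention atomic writes for config.yaml (#3800) but do not show the technique. The sketch below illustrates the common write-to-a-temporary-file-then-rename pattern, on the assumption that something similar is meant. The helper name `atomic_write_text` is hypothetical and the code is not taken from the Hermes source.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(path, text, encoding="utf-8"):
    """Hypothetical helper: replace `path` so readers see the old or the new file, never a partial one."""
    target = Path(path)
    # Create the temp file in the same directory so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=str(target.parent), prefix=target.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding=encoding) as handle:
            handle.write(text)
            handle.flush()
            # Push the bytes to disk before the rename makes them visible.
            os.fsync(handle.fileno())
        # os.replace overwrites the destination in a single step.
        os.replace(tmp_name, target)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()
    demo_path = os.path.join(demo_dir, "config.yaml")
    atomic_write_text(demo_path, "model:\n  default: example\n")
    print(Path(demo_path).read_text(encoding="utf-8"))
```

If the process dies mid-write, only the temporary file is left incomplete and the existing config stays intact. One side effect worth knowing: `tempfile.mkstemp` creates the file with owner-only permissions, so the replaced file inherits those rather than the original file's mode.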
--- a/RELEASE_v0.7.0.md
+++ b/RELEASE_v0.7.0.md
@@ -1,290 +0,0 @@
-# Hermes Agent v0.7.0 (v2026.4.3)
-
-**Release Date:** April 3, 2026
-
-> The resilience release — pluggable memory providers, credential pool rotation, Camofox anti-detection browser, inline diff previews, gateway hardening across race conditions and approval routing, and deep security fixes across 168 PRs and 46 resolved issues.
-
---
-
-## ✨ Highlights
-
- **Pluggable Memory Provider Interface** — Memory is now an extensible plugin system. Third-party memory backends (Honcho, vector stores, custom DBs) implement a simple provider ABC and register via the plugin system. Built-in memory is the default provider. Honcho integration restored to full parity as the reference plugin with profile-scoped host/peer resolution. ([#4623](https://github.com/NousResearch/hermes-agent/pull/4623), [#4616](https://github.com/NousResearch/hermes-agent/pull/4616), [#4355](https://github.com/NousResearch/hermes-agent/pull/4355))
-
- **Same-Provider Credential Pools** — Configure multiple API keys for the same provider with automatic rotation. Thread-safe `least_used` strategy distributes load across keys, and 401 failures trigger automatic rotation to the next credential. Set up via the setup wizard or `credential_pool` config. ([#4188](https://github.com/NousResearch/hermes-agent/pull/4188), [#4300](https://github.com/NousResearch/hermes-agent/pull/4300), [#4361](https://github.com/NousResearch/hermes-agent/pull/4361))
-
- **Camofox Anti-Detection Browser Backend** — New local browser backend using Camoufox for stealth browsing. Persistent sessions with VNC URL discovery for visual debugging, configurable SSRF bypass for local backends, auto-install via `hermes tools`. ([#4008](https://github.com/NousResearch/hermes-agent/pull/4008), [#4419](https://github.com/NousResearch/hermes-agent/pull/4419), [#4292](https://github.com/NousResearch/hermes-agent/pull/4292))
-
- **Inline Diff Previews** — File write and patch operations now show inline diffs in the tool activity feed, giving you visual confirmation of what changed before the agent moves on. ([#4411](https://github.com/NousResearch/hermes-agent/pull/4411), [#4423](https://github.com/NousResearch/hermes-agent/pull/4423))
-
- **API Server Session Continuity & Tool Streaming** — The API server (Open WebUI integration) now streams tool progress events in real-time and supports `X-Hermes-Session-Id` headers for persistent sessions across requests. Sessions persist to the shared SessionDB. ([#4092](https://github.com/NousResearch/hermes-agent/pull/4092), [#4478](https://github.com/NousResearch/hermes-agent/pull/4478), [#4802](https://github.com/NousResearch/hermes-agent/pull/4802))
-
- **ACP: Client-Provided MCP Servers** — Editor integrations (VS Code, Zed, JetBrains) can now register their own MCP servers, which Hermes picks up as additional agent tools. Your editor's MCP ecosystem flows directly into the agent. ([#4705](https://github.com/NousResearch/hermes-agent/pull/4705))
-
- **Gateway Hardening** — Major stability pass across race conditions, photo media delivery, flood control, stuck sessions, approval routing, and compression death spirals. The gateway is substantially more reliable in production. ([#4727](https://github.com/NousResearch/hermes-agent/pull/4727), [#4750](https://github.com/NousResearch/hermes-agent/pull/4750), [#4798](https://github.com/NousResearch/hermes-agent/pull/4798), [#4557](https://github.com/NousResearch/hermes-agent/pull/4557))
-
- **Security: Secret Exfiltration Blocking** — Browser URLs and LLM responses are now scanned for secret patterns, blocking exfiltration attempts via URL encoding, base64, or prompt injection. Credential directory protections expanded to `.docker`, `.azure`, `.config/gh`. `execute_code` sandbox output is redacted. ([#4483](https://github.com/NousResearch/hermes-agent/pull/4483), [#4360](https://github.com/NousResearch/hermes-agent/pull/4360), [#4305](https://github.com/NousResearch/hermes-agent/pull/4305), [#4327](https://github.com/NousResearch/hermes-agent/pull/4327))
-
---
-
-## 🏗️ Core Agent & Architecture
-
-### Provider & Model Support
- **Same-provider credential pools** — configure multiple API keys with automatic `least_used` rotation and 401 failover ([#4188](https://github.com/NousResearch/hermes-agent/pull/4188), [#4300](https://github.com/NousResearch/hermes-agent/pull/4300))
- **Credential pool preserved through smart routing** — pool state survives fallback provider switches and defers eager fallback on 429 ([#4361](https://github.com/NousResearch/hermes-agent/pull/4361))
- **Per-turn primary runtime restoration** — after fallback provider use, the agent automatically restores the primary provider on the next turn with transport recovery ([#4624](https://github.com/NousResearch/hermes-agent/pull/4624))
- **`developer` role for GPT-5 and Codex models** — uses OpenAI's recommended system message role for newer models ([#4498](https://github.com/NousResearch/hermes-agent/pull/4498))
- **Google model operational guidance** — Gemini and Gemma models get provider-specific prompting guidance ([#4641](https://github.com/NousResearch/hermes-agent/pull/4641))
- **Anthropic long-context tier 429 handling** — automatically reduces context to 200k when hitting tier limits ([#4747](https://github.com/NousResearch/hermes-agent/pull/4747))
- **URL-based auth for third-party Anthropic endpoints** + CI test fixes ([#4148](https://github.com/NousResearch/hermes-agent/pull/4148))
- **Bearer auth for MiniMax Anthropic endpoints** ([#4028](https://github.com/NousResearch/hermes-agent/pull/4028))
- **Fireworks context length detection** ([#4158](https://github.com/NousResearch/hermes-agent/pull/4158))
- **Standard DashScope international endpoint** for Alibaba provider ([#4133](https://github.com/NousResearch/hermes-agent/pull/4133), closes [#3912](https://github.com/NousResearch/hermes-agent/issues/3912))
- **Custom provider `context_length`** honored in hygiene compression ([#4085](https://github.com/NousResearch/hermes-agent/pull/4085))
- **Non-sk-ant keys** treated as regular API keys, not OAuth tokens ([#4093](https://github.com/NousResearch/hermes-agent/pull/4093))
- **Claude-sonnet-4.6** added to OpenRouter and Nous model lists ([#4157](https://github.com/NousResearch/hermes-agent/pull/4157))
- **Qwen 3.6 Plus Preview** added to model lists ([#4376](https://github.com/NousResearch/hermes-agent/pull/4376))
- **MiniMax M2.7** added to hermes model picker and OpenCode ([#4208](https://github.com/NousResearch/hermes-agent/pull/4208))
- **Auto-detect models from server probe** in custom endpoint setup ([#4218](https://github.com/NousResearch/hermes-agent/pull/4218))
- **Config.yaml single source of truth** for endpoint URLs — no more env var vs config.yaml conflicts ([#4165](https://github.com/NousResearch/hermes-agent/pull/4165))
- **Setup wizard no longer overwrites** custom endpoint config ([#4180](https://github.com/NousResearch/hermes-agent/pull/4180), closes [#4172](https://github.com/NousResearch/hermes-agent/issues/4172))
- **Unified setup wizard provider selection** with `hermes model` — single code path for both flows ([#4200](https://github.com/NousResearch/hermes-agent/pull/4200))
- **Root-level provider config** no longer overrides `model.provider` ([#4329](https://github.com/NousResearch/hermes-agent/pull/4329))
- **Rate-limit pairing rejection messages** to prevent spam ([#4081](https://github.com/NousResearch/hermes-agent/pull/4081))
-
-### Agent Loop & Conversation
- **Preserve Anthropic thinking block signatures** across tool-use turns ([#4626](https://github.com/NousResearch/hermes-agent/pull/4626))
- **Classify think-only empty responses** before retrying — prevents infinite retry loops on models that produce thinking blocks without content ([#4645](https://github.com/NousResearch/hermes-agent/pull/4645))
- **Prevent compression death spiral** from API disconnects — stops the loop where compression triggers, fails, and compresses again ([#4750](https://github.com/NousResearch/hermes-agent/pull/4750), closes [#2153](https://github.com/NousResearch/hermes-agent/issues/2153))
- **Persist compressed context** to gateway session after mid-run compression ([#4095](https://github.com/NousResearch/hermes-agent/pull/4095))
- **Context-exceeded error messages** now include actionable guidance ([#4155](https://github.com/NousResearch/hermes-agent/pull/4155), closes [#4061](https://github.com/NousResearch/hermes-agent/issues/4061))
- **Strip orphaned think/reasoning tags** from user-facing responses ([#4311](https://github.com/NousResearch/hermes-agent/pull/4311), closes [#4285](https://github.com/NousResearch/hermes-agent/issues/4285))
- **Harden Codex responses preflight** and stream error handling ([#4313](https://github.com/NousResearch/hermes-agent/pull/4313))
- **Deterministic call_id fallbacks** instead of random UUIDs for prompt cache consistency ([#3991](https://github.com/NousResearch/hermes-agent/pull/3991))
- **Context pressure warning spam** prevented after compression ([#4012](https://github.com/NousResearch/hermes-agent/pull/4012))
- **AsyncOpenAI created lazily** in trajectory compressor to avoid closed event loop errors ([#4013](https://github.com/NousResearch/hermes-agent/pull/4013))
-
-### Memory & Sessions
- **Pluggable memory provider interface** — ABC-based plugin system for custom memory backends with profile isolation ([#4623](https://github.com/NousResearch/hermes-agent/pull/4623))
- **Honcho full integration parity** restored as reference memory provider plugin ([#4355](https://github.com/NousResearch/hermes-agent/pull/4355)) — @erosika
- **Honcho profile-scoped** host and peer resolution ([#4616](https://github.com/NousResearch/hermes-agent/pull/4616))
- **Memory flush state persisted** to prevent redundant re-flushes on gateway restart ([#4481](https://github.com/NousResearch/hermes-agent/pull/4481))
- **Memory provider tools** routed through sequential execution path ([#4803](https://github.com/NousResearch/hermes-agent/pull/4803))
- **Honcho config** written to instance-local path for profile isolation ([#4037](https://github.com/NousResearch/hermes-agent/pull/4037))
- **API server sessions** persist to shared SessionDB ([#4802](https://github.com/NousResearch/hermes-agent/pull/4802))
- **Token usage persisted** for non-CLI sessions ([#4627](https://github.com/NousResearch/hermes-agent/pull/4627))
- **Quote dotted terms in FTS5 queries** — fixes session search for terms containing dots ([#4549](https://github.com/NousResearch/hermes-agent/pull/4549))
-
---
-
-## 📱 Messaging Platforms (Gateway)
-
-### Gateway Core
- **Race condition fixes** — photo media loss, flood control, stuck sessions, and STT config issues resolved in one hardening pass ([#4727](https://github.com/NousResearch/hermes-agent/pull/4727))
- **Approval routing through running-agent guard** — `/approve` and `/deny` now route correctly when the agent is blocked waiting for approval instead of being swallowed as interrupts ([#4798](https://github.com/NousResearch/hermes-agent/pull/4798), [#4557](https://github.com/NousResearch/hermes-agent/pull/4557), closes [#4542](https://github.com/NousResearch/hermes-agent/issues/4542))
- **Resume agent after /approve** — tool result is no longer lost when executing blocked commands ([#4418](https://github.com/NousResearch/hermes-agent/pull/4418))
- **DM thread sessions seeded** with parent transcript to preserve context ([#4559](https://github.com/NousResearch/hermes-agent/pull/4559))
- **Skill-aware slash commands** — gateway dynamically registers installed skills as slash commands with paginated `/commands` list and Telegram 100-command cap ([#3934](https://github.com/NousResearch/hermes-agent/pull/3934), [#4005](https://github.com/NousResearch/hermes-agent/pull/4005), [#4006](https://github.com/NousResearch/hermes-agent/pull/4006), [#4010](https://github.com/NousResearch/hermes-agent/pull/4010), [#4023](https://github.com/NousResearch/hermes-agent/pull/4023))
- **Per-platform disabled skills** respected in Telegram menu and gateway dispatch ([#4799](https://github.com/NousResearch/hermes-agent/pull/4799))
- **Remove user-facing compression warnings** — cleaner message flow ([#4139](https://github.com/NousResearch/hermes-agent/pull/4139))
- **`-v/-q` flags wired to stderr logging** for gateway service ([#4474](https://github.com/NousResearch/hermes-agent/pull/4474))
- **HERMES_HOME remapped** to target user in system service unit ([#4456](https://github.com/NousResearch/hermes-agent/pull/4456))
- **Honor default for invalid bool-like config values** ([#4029](https://github.com/NousResearch/hermes-agent/pull/4029))
- **setsid instead of systemd-run** for `/update` command to avoid systemd permission issues ([#4104](https://github.com/NousResearch/hermes-agent/pull/4104), closes [#4017](https://github.com/NousResearch/hermes-agent/issues/4017))
- **'Initializing agent...'** shown on first message for better UX ([#4086](https://github.com/NousResearch/hermes-agent/pull/4086))
- **Allow running gateway service as root** for LXC/container environments ([#4732](https://github.com/NousResearch/hermes-agent/pull/4732))
-
-### Telegram
- **32-char limit on command names** with collision avoidance ([#4211](https://github.com/NousResearch/hermes-agent/pull/4211))
- **Priority order enforced** in menu — core > plugins > skills ([#4023](https://github.com/NousResearch/hermes-agent/pull/4023))
- **Capped at 50 commands** — API rejects above ~60 ([#4006](https://github.com/NousResearch/hermes-agent/pull/4006))
- **Skip empty/whitespace text** to prevent 400 errors ([#4388](https://github.com/NousResearch/hermes-agent/pull/4388))
- **E2E gateway tests** added ([#4497](https://github.com/NousResearch/hermes-agent/pull/4497)) — @pefontana
-
-### Discord
- **Button-based approval UI** — register `/approve` and `/deny` slash commands with interactive button prompts ([#4800](https://github.com/NousResearch/hermes-agent/pull/4800))
- **Configurable reactions** — `discord.reactions` config option to disable message processing reactions ([#4199](https://github.com/NousResearch/hermes-agent/pull/4199))
- **Skip reactions and auto-threading** for unauthorized users ([#4387](https://github.com/NousResearch/hermes-agent/pull/4387))
-
-### Slack
- **Reply in thread** — `slack.reply_in_thread` config option for threaded responses ([#4643](https://github.com/NousResearch/hermes-agent/pull/4643), closes [#2662](https://github.com/NousResearch/hermes-agent/issues/2662))
-
-### WhatsApp
- **Enforce require_mention in group chats** ([#4730](https://github.com/NousResearch/hermes-agent/pull/4730))
-
-### Webhook
- **Platform support fixes** — skip home channel prompt, disable tool progress for webhook adapters ([#4660](https://github.com/NousResearch/hermes-agent/pull/4660))
-
-### Matrix
- **E2EE decryption hardening** — request missing keys, auto-trust devices, retry buffered events ([#4083](https://github.com/NousResearch/hermes-agent/pull/4083))
-
---
-
-## 🖥️ CLI & User Experience
-
-### New Slash Commands
- **`/yolo`** — toggle dangerous command approvals on/off for the session ([#3990](https://github.com/NousResearch/hermes-agent/pull/3990))
- **`/btw`** — ephemeral side questions that don't affect the main conversation context ([#4161](https://github.com/NousResearch/hermes-agent/pull/4161))
- **`/profile`** — show active profile info without leaving the chat session ([#4027](https://github.com/NousResearch/hermes-agent/pull/4027))
-
-### Interactive CLI
- **Inline diff previews** for write and patch operations in the tool activity feed ([#4411](https://github.com/NousResearch/hermes-agent/pull/4411), [#4423](https://github.com/NousResearch/hermes-agent/pull/4423))
- **TUI pinned to bottom** on startup — no more large blank spaces between response and input ([#4412](https://github.com/NousResearch/hermes-agent/pull/4412), [#4359](https://github.com/NousResearch/hermes-agent/pull/4359), closes [#4398](https://github.com/NousResearch/hermes-agent/issues/4398), [#4421](https://github.com/NousResearch/hermes-agent/issues/4421))
- **`/history` and `/resume`** now surface recent sessions directly instead of requiring search ([#4728](https://github.com/NousResearch/hermes-agent/pull/4728))
- **Cache tokens shown** in `/insights` overview so the total adds up ([#4428](https://github.com/NousResearch/hermes-agent/pull/4428))
- **`--max-turns` CLI flag** for `hermes chat` to limit agent iterations ([#4314](https://github.com/NousResearch/hermes-agent/pull/4314))
- **Detect dragged file paths** instead of treating them as slash commands ([#4533](https://github.com/NousResearch/hermes-agent/pull/4533)) — @rolme
- **Allow empty strings and falsy values** in `config set` ([#4310](https://github.com/NousResearch/hermes-agent/pull/4310), closes [#4277](https://github.com/NousResearch/hermes-agent/issues/4277))
- **Voice mode in WSL** when PulseAudio bridge is configured ([#4317](https://github.com/NousResearch/hermes-agent/pull/4317))
- **Respect `NO_COLOR` env var** and `TERM=dumb` for accessibility ([#4079](https://github.com/NousResearch/hermes-agent/pull/4079), closes [#4066](https://github.com/NousResearch/hermes-agent/issues/4066)) — @SHL0MS
- **Correct shell reload instruction** for macOS/zsh users ([#4025](https://github.com/NousResearch/hermes-agent/pull/4025))
- **Zero exit code** on successful quiet mode queries ([#4613](https://github.com/NousResearch/hermes-agent/pull/4613), closes [#4601](https://github.com/NousResearch/hermes-agent/issues/4601)) — @devorun
- **on_session_end hook fires** on interrupted exits ([#4159](https://github.com/NousResearch/hermes-agent/pull/4159))
- **Profile list display** reads `model.default` key correctly ([#4160](https://github.com/NousResearch/hermes-agent/pull/4160))
- **Browser and TTS** shown in reconfigure menu ([#4041](https://github.com/NousResearch/hermes-agent/pull/4041))
- **Web backend priority** detection simplified ([#4036](https://github.com/NousResearch/hermes-agent/pull/4036))
-
-### Setup & Configuration
- **Allowed_users preserved** during setup and quiet unconfigured provider warnings ([#4551](https://github.com/NousResearch/hermes-agent/pull/4551)) — @kshitijk4poor
- **Save API key to model config** for custom endpoints ([#4202](https://github.com/NousResearch/hermes-agent/pull/4202), closes [#4182](https://github.com/NousResearch/hermes-agent/issues/4182))
- **Claude Code credentials gated** behind explicit Hermes config in wizard trigger ([#4210](https://github.com/NousResearch/hermes-agent/pull/4210))
- **Atomic writes in save_config_value** to prevent config loss on interrupt ([#4298](https://github.com/NousResearch/hermes-agent/pull/4298), [#4320](https://github.com/NousResearch/hermes-agent/pull/4320))
- **Scopes field written** to Claude Code credentials on token refresh ([#4126](https://github.com/NousResearch/hermes-agent/pull/4126))
-
-### Update System
- **Fork detection and upstream sync** in `hermes update` ([#4744](https://github.com/NousResearch/hermes-agent/pull/4744))
- **Preserve working optional extras** when one extra fails during update ([#4550](https://github.com/NousResearch/hermes-agent/pull/4550))
- **Handle conflicted git index** during hermes update ([#4735](https://github.com/NousResearch/hermes-agent/pull/4735))
- **Avoid launchd restart race** on macOS ([#4736](https://github.com/NousResearch/hermes-agent/pull/4736))
- **Missing subprocess.run() timeouts** added to doctor and status commands ([#4009](https://github.com/NousResearch/hermes-agent/pull/4009))
-
---
-
-## 🔧 Tool System
-
-### Browser
- **Camofox anti-detection browser backend** — local stealth browsing with auto-install via `hermes tools` ([#4008](https://github.com/NousResearch/hermes-agent/pull/4008))
- **Persistent Camofox sessions** with VNC URL discovery for visual debugging ([#4419](https://github.com/NousResearch/hermes-agent/pull/4419))
- **Skip SSRF check for local backends** (Camofox, headless Chromium) ([#4292](https://github.com/NousResearch/hermes-agent/pull/4292))
- **Configurable SSRF check** via `browser.allow_private_urls` ([#4198](https://github.com/NousResearch/hermes-agent/pull/4198)) — @nils010485
- **CAMOFOX_PORT=9377** added to Docker commands ([#4340](https://github.com/NousResearch/hermes-agent/pull/4340))
-
-### File Operations
- **Inline diff previews** on write and patch actions ([#4411](https://github.com/NousResearch/hermes-agent/pull/4411), [#4423](https://github.com/NousResearch/hermes-agent/pull/4423))
- **Stale file detection** on write and patch — warns when the file was modified externally since the last read ([#4345](https://github.com/NousResearch/hermes-agent/pull/4345))
- **Staleness timestamp refreshed** after writes ([#4390](https://github.com/NousResearch/hermes-agent/pull/4390))
- **Size guard, dedup, and device blocking** on read_file ([#4315](https://github.com/NousResearch/hermes-agent/pull/4315))
-
-### MCP
- **Stability fix pack** — reload timeout, shutdown cleanup, event loop handler, OAuth non-blocking ([#4757](https://github.com/NousResearch/hermes-agent/pull/4757), closes [#4462](https://github.com/NousResearch/hermes-agent/issues/4462), [#2537](https://github.com/NousResearch/hermes-agent/issues/2537))
-
-### ACP (Editor Integration)
- **Client-provided MCP servers** registered as agent tools — editors pass their MCP servers to Hermes ([#4705](https://github.com/NousResearch/hermes-agent/pull/4705))
-
-### Skills System
- **Size limits for agent writes** and **fuzzy matching for skill patch** — prevents oversized skill writes and improves edit reliability ([#4414](https://github.com/NousResearch/hermes-agent/pull/4414))
- **Validate hub bundle paths** before install — blocks path traversal in skill bundles ([#3986](https://github.com/NousResearch/hermes-agent/pull/3986))
- **Unified hermes-agent and hermes-agent-setup** into a single skill ([#4332](https://github.com/NousResearch/hermes-agent/pull/4332))
- **Skill metadata type check** in extract_skill_conditions ([#4479](https://github.com/NousResearch/hermes-agent/pull/4479))
-
-### New/Updated Skills
- **research-paper-writing** — full end-to-end research pipeline (replaced ml-paper-writing) ([#4654](https://github.com/NousResearch/hermes-agent/pull/4654)) — @SHL0MS
- **ascii-video** — text readability techniques and external layout oracle ([#4054](https://github.com/NousResearch/hermes-agent/pull/4054)) — @SHL0MS
- **youtube-transcript** updated for youtube-transcript-api v1.x ([#4455](https://github.com/NousResearch/hermes-agent/pull/4455)) — @el-analista
- **Skills browse and search page** added to documentation site ([#4500](https://github.com/NousResearch/hermes-agent/pull/4500)) — @IAvecilla
-
---
-
-## 🔒 Security & Reliability
-
-### Security Hardening
- **Block secret exfiltration** via browser URLs and LLM responses — scans for secret patterns in URL encoding, base64, and prompt injection vectors ([#4483](https://github.com/NousResearch/hermes-agent/pull/4483))
- **Redact secrets from execute_code sandbox output** ([#4360](https://github.com/NousResearch/hermes-agent/pull/4360))
- **Protect `.docker`, `.azure`, `.config/gh` credential directories** from read/write via file tools and terminal ([#4305](https://github.com/NousResearch/hermes-agent/pull/4305), [#4327](https://github.com/NousResearch/hermes-agent/pull/4327)) — @memosr
- **GitHub OAuth token patterns** added to redaction + snapshot redact flag ([#4295](https://github.com/NousResearch/hermes-agent/pull/4295))
- **Reject private and loopback IPs** in Telegram DoH fallback ([#4129](https://github.com/NousResearch/hermes-agent/pull/4129))
- **Reject path traversal** in credential file registration ([#4316](https://github.com/NousResearch/hermes-agent/pull/4316))
- **Validate tar archive member paths** on profile import — blocks zip-slip attacks ([#4318](https://github.com/NousResearch/hermes-agent/pull/4318))
- **Exclude auth.json and .env** from profile exports ([#4475](https://github.com/NousResearch/hermes-agent/pull/4475))
-
-### Reliability
- **Prevent compression death spiral** from API disconnects ([#4750](https://github.com/NousResearch/hermes-agent/pull/4750), closes [#2153](https://github.com/NousResearch/hermes-agent/issues/2153))
- **Handle `is_closed` as method** in OpenAI SDK — prevents false positive client closure detection ([#4416](https://github.com/NousResearch/hermes-agent/pull/4416), closes [#4377](https://github.com/NousResearch/hermes-agent/issues/4377))
- **Exclude matrix from [all] extras** — python-olm is broken upstream, so excluding it prevents install failures ([#4615](https://github.com/NousResearch/hermes-agent/pull/4615), closes [#4178](https://github.com/NousResearch/hermes-agent/issues/4178))
- **OpenCode model routing** repaired ([#4508](https://github.com/NousResearch/hermes-agent/pull/4508))
- **Docker container image** optimized ([#4034](https://github.com/NousResearch/hermes-agent/pull/4034)) — @bcross
-
-### Windows & Cross-Platform
- **Voice mode in WSL** with PulseAudio bridge ([#4317](https://github.com/NousResearch/hermes-agent/pull/4317))
- **Homebrew packaging** preparation ([#4099](https://github.com/NousResearch/hermes-agent/pull/4099))
- **CI fork conditionals** to prevent workflow failures on forks ([#4107](https://github.com/NousResearch/hermes-agent/pull/4107))
-
---
-
-## 🐛 Notable Bug Fixes
-
- **Gateway approval did not block the agent thread** — approval now blocks the agent thread like the CLI does, preventing tool result loss ([#4557](https://github.com/NousResearch/hermes-agent/pull/4557), closes [#4542](https://github.com/NousResearch/hermes-agent/issues/4542))
- **Compression death spiral** from API disconnects — detected and halted instead of looping ([#4750](https://github.com/NousResearch/hermes-agent/pull/4750), closes [#2153](https://github.com/NousResearch/hermes-agent/issues/2153))
- **Anthropic thinking blocks lost** across tool-use turns ([#4626](https://github.com/NousResearch/hermes-agent/pull/4626))
- **Profile model config ignored** with `-p` flag — model.model now promoted to model.default correctly ([#4160](https://github.com/NousResearch/hermes-agent/pull/4160), closes [#4486](https://github.com/NousResearch/hermes-agent/issues/4486))
- **CLI blank space** between response and input area ([#4412](https://github.com/NousResearch/hermes-agent/pull/4412), [#4359](https://github.com/NousResearch/hermes-agent/pull/4359), closes [#4398](https://github.com/NousResearch/hermes-agent/issues/4398))
- **Dragged file paths** treated as slash commands instead of file references ([#4533](https://github.com/NousResearch/hermes-agent/pull/4533)) — @rolme
- **Orphaned `</think>` tags** leaking into user-facing responses ([#4311](https://github.com/NousResearch/hermes-agent/pull/4311), closes [#4285](https://github.com/NousResearch/hermes-agent/issues/4285))
- **OpenAI SDK `is_closed`** is a method not property — false positive client closure ([#4416](https://github.com/NousResearch/hermes-agent/pull/4416), closes [#4377](https://github.com/NousResearch/hermes-agent/issues/4377))
- **MCP OAuth server** could block Hermes startup instead of degrading gracefully ([#4757](https://github.com/NousResearch/hermes-agent/pull/4757), closes [#4462](https://github.com/NousResearch/hermes-agent/issues/4462))
- **MCP event loop closed** on shutdown with HTTP servers ([#4757](https://github.com/NousResearch/hermes-agent/pull/4757), closes [#2537](https://github.com/NousResearch/hermes-agent/issues/2537))
- **Alibaba provider** hardcoded to wrong endpoint ([#4133](https://github.com/NousResearch/hermes-agent/pull/4133), closes [#3912](https://github.com/NousResearch/hermes-agent/issues/3912))
- **Slack reply_in_thread** missing config option ([#4643](https://github.com/NousResearch/hermes-agent/pull/4643), closes [#2662](https://github.com/NousResearch/hermes-agent/issues/2662))
- **Quiet mode exit code** — successful `-q` queries no longer exit nonzero ([#4613](https://github.com/NousResearch/hermes-agent/pull/4613), closes [#4601](https://github.com/NousResearch/hermes-agent/issues/4601))
- **Mobile sidebar** shows only close button due to backdrop-filter issue in docs site ([#4207](https://github.com/NousResearch/hermes-agent/pull/4207)) — @xsmyile
- **Config restore reverted** by stale-branch squash merge — `_config_version` fixed ([#4440](https://github.com/NousResearch/hermes-agent/pull/4440))
-
---
-
-## 🧪 Testing
-
- **Telegram gateway E2E tests** — full integration test suite for the Telegram adapter ([#4497](https://github.com/NousResearch/hermes-agent/pull/4497)) — @pefontana
- **11 real test failures fixed** plus sys.modules cascade poisoner resolved ([#4570](https://github.com/NousResearch/hermes-agent/pull/4570))
- **7 CI failures resolved** across hooks, plugins, and skill tests ([#3936](https://github.com/NousResearch/hermes-agent/pull/3936))
- **Codex 401 refresh tests** updated for CI compatibility ([#4166](https://github.com/NousResearch/hermes-agent/pull/4166))
- **Stale OPENAI_BASE_URL test** fixed ([#4217](https://github.com/NousResearch/hermes-agent/pull/4217))
-
---
-
-## 📚 Documentation
-
- **Comprehensive documentation audit** — 9 HIGH and 20+ MEDIUM gaps fixed across 21 files ([#4087](https://github.com/NousResearch/hermes-agent/pull/4087))
- **Site navigation restructured** — features and platforms promoted to top-level ([#4116](https://github.com/NousResearch/hermes-agent/pull/4116))
- **Tool progress streaming** documented for API server and Open WebUI ([#4138](https://github.com/NousResearch/hermes-agent/pull/4138))
- **Telegram webhook mode** documentation ([#4089](https://github.com/NousResearch/hermes-agent/pull/4089))
- **Local LLM provider guides** — comprehensive setup guides with context length warnings ([#4294](https://github.com/NousResearch/hermes-agent/pull/4294))
- **WhatsApp allowlist behavior** clarified with `WHATSAPP_ALLOW_ALL_USERS` documentation ([#4293](https://github.com/NousResearch/hermes-agent/pull/4293))
- **Slack configuration options** — new config section in Slack docs ([#4644](https://github.com/NousResearch/hermes-agent/pull/4644))
- **Terminal backends section** expanded + docs build fixes ([#4016](https://github.com/NousResearch/hermes-agent/pull/4016))
- **Adding-providers guide** updated for unified setup flow ([#4201](https://github.com/NousResearch/hermes-agent/pull/4201))
- **ACP Zed config** fixed ([#4743](https://github.com/NousResearch/hermes-agent/pull/4743))
- **Community FAQ** entries for common workflows and troubleshooting ([#4797](https://github.com/NousResearch/hermes-agent/pull/4797))
- **Skills browse and search page** on docs site ([#4500](https://github.com/NousResearch/hermes-agent/pull/4500)) — @IAvecilla
-
---
-
-## 👥 Contributors
-
-### Core
- **@teknium1** — 135 commits across all subsystems
-
-### Top Community Contributors
- **@kshitijk4poor** — 13 commits: preserve allowed_users during setup ([#4551](https://github.com/NousResearch/hermes-agent/pull/4551)), and various fixes
- **@erosika** — 12 commits: Honcho full integration parity restored as memory provider plugin ([#4355](https://github.com/NousResearch/hermes-agent/pull/4355))
- **@pefontana** — 9 commits: Telegram gateway E2E test suite ([#4497](https://github.com/NousResearch/hermes-agent/pull/4497))
- **@bcross** — 5 commits: Docker container image optimization ([#4034](https://github.com/NousResearch/hermes-agent/pull/4034))
- **@SHL0MS** — 4 commits: NO_COLOR/TERM=dumb support ([#4079](https://github.com/NousResearch/hermes-agent/pull/4079)), ascii-video skill updates ([#4054](https://github.com/NousResearch/hermes-agent/pull/4054)), research-paper-writing skill ([#4654](https://github.com/NousResearch/hermes-agent/pull/4654))
-
-### All Contributors
-@0xbyt4, @arasovic, @Bartok9, @bcross, @binhnt92, @camden-lowrance, @curtitoo, @Dakota, @Dave Tist, @Dean Kerr, @devorun, @dieutx, @Dilee, @el-analista, @erosika, @Gutslabs, @IAvecilla, @Jack, @Johannnnn506, @kshitijk4poor, @Laura Batalha, @Leegenux, @Lume, @MacroAnarchy, @maymuneth, @memosr, @NexVeridian, @Nick, @nils010485, @pefontana, @Penov, @rolme, @SHL0MS, @txchen, @xsmyile
-
-### Issues Resolved from Community
-@acsezen ([#2537](https://github.com/NousResearch/hermes-agent/issues/2537)), @arasovic ([#4285](https://github.com/NousResearch/hermes-agent/issues/4285)), @camden-lowrance ([#4462](https://github.com/NousResearch/hermes-agent/issues/4462)), @devorun ([#4601](https://github.com/NousResearch/hermes-agent/issues/4601)), @eloklam ([#4486](https://github.com/NousResearch/hermes-agent/issues/4486)), @HenkDz ([#3719](https://github.com/NousResearch/hermes-agent/issues/3719)), @hypotyposis ([#2153](https://github.com/NousResearch/hermes-agent/issues/2153)), @kazamak ([#4178](https://github.com/NousResearch/hermes-agent/issues/4178)), @lstep ([#4366](https://github.com/NousResearch/hermes-agent/issues/4366)), @Mark-Lok ([#4542](https://github.com/NousResearch/hermes-agent/issues/4542)), @NoJster ([#4421](https://github.com/NousResearch/hermes-agent/issues/4421)), @patp ([#2662](https://github.com/NousResearch/hermes-agent/issues/2662)), @pr0n ([#4601](https://github.com/NousResearch/hermes-agent/issues/4601)), @saulmc ([#4377](https://github.com/NousResearch/hermes-agent/issues/4377)), @SHL0MS ([#4060](https://github.com/NousResearch/hermes-agent/issues/4060), [#4061](https://github.com/NousResearch/hermes-agent/issues/4061), [#4066](https://github.com/NousResearch/hermes-agent/issues/4066), [#4172](https://github.com/NousResearch/hermes-agent/issues/4172), [#4277](https://github.com/NousResearch/hermes-agent/issues/4277)), @Z-Mackintosh ([#4398](https://github.com/NousResearch/hermes-agent/issues/4398))
-
---
-
-**Full Changelog**: [v2026.3.30...v2026.4.3](https://github.com/NousResearch/hermes-agent/compare/v2026.3.30...v2026.4.3)
--- a/RELEASE_v0.8.0.md
+++ b/RELEASE_v0.8.0.md
@@ -1,346 +0,0 @@
-# Hermes Agent v0.8.0 (v2026.4.8)
-
-**Release Date:** April 8, 2026
-
-> The intelligence release — background task auto-notifications, free MiMo v2 Pro on Nous Portal, live model switching across all platforms, self-optimized GPT/Codex guidance, native Google AI Studio, smart inactivity timeouts, approval buttons, MCP OAuth 2.1, and 209 merged PRs with 82 resolved issues.
-
---
-
-## ✨ Highlights
-
- **Background Process Auto-Notifications (`notify_on_complete`)** — Background tasks can now automatically notify the agent when they finish. Start a long-running process (AI model training, test suites, deployments, builds) and the agent gets notified on completion — no polling needed. The agent can keep working on other things and pick up results when they land. ([#5779](https://github.com/NousResearch/hermes-agent/pull/5779))
-
- **Free Xiaomi MiMo v2 Pro on Nous Portal** — Nous Portal now supports the free-tier Xiaomi MiMo v2 Pro model for auxiliary tasks (compression, vision, summarization), with free-tier model gating and pricing display in model selection. ([#6018](https://github.com/NousResearch/hermes-agent/pull/6018), [#5880](https://github.com/NousResearch/hermes-agent/pull/5880))
-
- **Live Model Switching (`/model` Command)** — Switch models and providers mid-session from CLI, Telegram, Discord, Slack, or any gateway platform. Aggregator-aware resolution keeps you on OpenRouter/Nous when possible, with automatic cross-provider fallback when needed. Interactive model pickers on Telegram and Discord with inline buttons. ([#5181](https://github.com/NousResearch/hermes-agent/pull/5181), [#5742](https://github.com/NousResearch/hermes-agent/pull/5742))
-
- **Self-Optimized GPT/Codex Tool-Use Guidance** — The agent diagnosed and patched 5 failure modes in GPT and Codex tool calling through automated behavioral benchmarking, dramatically improving reliability on OpenAI models. Includes execution discipline guidance and thinking-only prefill continuation for structured reasoning. ([#6120](https://github.com/NousResearch/hermes-agent/pull/6120), [#5414](https://github.com/NousResearch/hermes-agent/pull/5414), [#5931](https://github.com/NousResearch/hermes-agent/pull/5931))
-
- **Google AI Studio (Gemini) Native Provider** — Direct access to Gemini models through Google's AI Studio API. Includes automatic models.dev registry integration for real-time context length detection across any provider. ([#5577](https://github.com/NousResearch/hermes-agent/pull/5577))
-
- **Inactivity-Based Agent Timeouts** — Gateway and cron timeouts now track actual tool activity instead of wall-clock time. Long-running tasks that are actively working will never be killed — only truly idle agents time out. ([#5389](https://github.com/NousResearch/hermes-agent/pull/5389), [#5440](https://github.com/NousResearch/hermes-agent/pull/5440))
-
- **Approval Buttons on Slack & Telegram** — Dangerous command approval via native platform buttons instead of typing `/approve`. Slack gets thread context preservation; Telegram gets emoji reactions for approval status. ([#5890](https://github.com/NousResearch/hermes-agent/pull/5890), [#5975](https://github.com/NousResearch/hermes-agent/pull/5975))
-
- **MCP OAuth 2.1 PKCE + OSV Malware Scanning** — Full standards-compliant OAuth for MCP server authentication, plus automatic malware scanning of MCP extension packages via the OSV vulnerability database. ([#5420](https://github.com/NousResearch/hermes-agent/pull/5420), [#5305](https://github.com/NousResearch/hermes-agent/pull/5305))
-
- **Centralized Logging & Config Validation** — Structured logging to `~/.hermes/logs/` (agent.log + errors.log) with the `hermes logs` command for tailing and filtering. Config structure validation catches malformed YAML at startup before it causes cryptic failures. ([#5430](https://github.com/NousResearch/hermes-agent/pull/5430), [#5426](https://github.com/NousResearch/hermes-agent/pull/5426))
-
- **Plugin System Expansion** — Plugins can now register CLI subcommands, receive request-scoped API hooks with correlation IDs, prompt for required env vars during install, and hook into session lifecycle events (finalize/reset). ([#5295](https://github.com/NousResearch/hermes-agent/pull/5295), [#5427](https://github.com/NousResearch/hermes-agent/pull/5427), [#5470](https://github.com/NousResearch/hermes-agent/pull/5470), [#6129](https://github.com/NousResearch/hermes-agent/pull/6129))
-
- **Matrix Tier 1 & Platform Hardening** — Matrix gets reactions, read receipts, rich formatting, and room management. Discord adds channel controls and ignored channels. Signal gets full MEDIA: tag delivery. Mattermost gets file attachments. Comprehensive reliability fixes across all platforms. ([#5275](https://github.com/NousResearch/hermes-agent/pull/5275), [#5975](https://github.com/NousResearch/hermes-agent/pull/5975), [#5602](https://github.com/NousResearch/hermes-agent/pull/5602))
-
- **Security Hardening Pass** — Consolidated SSRF protections, timing attack mitigations, tar traversal prevention, credential leakage guards, cron path traversal hardening, and cross-session isolation. Terminal workdir sanitization across all backends. ([#5944](https://github.com/NousResearch/hermes-agent/pull/5944), [#5613](https://github.com/NousResearch/hermes-agent/pull/5613), [#5629](https://github.com/NousResearch/hermes-agent/pull/5629))
-
---
-
-## 🏗️ Core Agent & Architecture
-
-### Provider & Model Support
- **Native Google AI Studio (Gemini) provider** with models.dev integration for automatic context length detection ([#5577](https://github.com/NousResearch/hermes-agent/pull/5577))
- **`/model` command — full provider+model system overhaul** — live switching across CLI and all gateway platforms with aggregator-aware resolution ([#5181](https://github.com/NousResearch/hermes-agent/pull/5181))
- **Interactive model picker for Telegram and Discord** — inline button-based model selection ([#5742](https://github.com/NousResearch/hermes-agent/pull/5742))
- **Nous Portal free-tier model gating** with pricing display in model selection ([#5880](https://github.com/NousResearch/hermes-agent/pull/5880))
- **Model pricing display** for OpenRouter and Nous Portal providers ([#5416](https://github.com/NousResearch/hermes-agent/pull/5416))
- **xAI (Grok) prompt caching** via `x-grok-conv-id` header ([#5604](https://github.com/NousResearch/hermes-agent/pull/5604))
- **Grok added to tool-use enforcement models** for direct xAI usage ([#5595](https://github.com/NousResearch/hermes-agent/pull/5595))
- **MiniMax TTS provider** (speech-2.8) ([#4963](https://github.com/NousResearch/hermes-agent/pull/4963))
- **Non-agentic model warning** — warns users when loading Hermes LLM models not designed for tool use ([#5378](https://github.com/NousResearch/hermes-agent/pull/5378))
- **Ollama Cloud auth, /model switch persistence**, and alias tab completion ([#5269](https://github.com/NousResearch/hermes-agent/pull/5269))
- **Preserve dots in OpenCode Go model names** (minimax-m2.7, glm-4.5, kimi-k2.5) ([#5597](https://github.com/NousResearch/hermes-agent/pull/5597))
- **MiniMax models 404 fix** — strip /v1 from Anthropic base URL for OpenCode Go ([#4918](https://github.com/NousResearch/hermes-agent/pull/4918))
- **Provider credential reset windows** honored in pooled failover ([#5188](https://github.com/NousResearch/hermes-agent/pull/5188))
- **OAuth token sync** between credential pool and credentials file ([#4981](https://github.com/NousResearch/hermes-agent/pull/4981))
- **Stale OAuth credentials** no longer block OpenRouter users on auto-detect ([#5746](https://github.com/NousResearch/hermes-agent/pull/5746))
- **Codex OAuth credential pool disconnect** + expired token import fix ([#5681](https://github.com/NousResearch/hermes-agent/pull/5681))
- **Codex pool entry sync** from `~/.codex/auth.json` on exhaustion — @GratefulDave ([#5610](https://github.com/NousResearch/hermes-agent/pull/5610))
- **Auxiliary client payment fallback** — retry with next provider on 402 ([#5599](https://github.com/NousResearch/hermes-agent/pull/5599))
- **Auxiliary client resolves named custom providers** and 'main' alias ([#5978](https://github.com/NousResearch/hermes-agent/pull/5978))
- **Use mimo-v2-pro** for non-vision auxiliary tasks on Nous free tier ([#6018](https://github.com/NousResearch/hermes-agent/pull/6018))
- **Vision auto-detection** tries main provider first ([#6041](https://github.com/NousResearch/hermes-agent/pull/6041))
- **Provider re-ordering and Quick Install** — @austinpickett ([#4664](https://github.com/NousResearch/hermes-agent/pull/4664))
- **Nous OAuth access_token** no longer used as inference API key — @SHL0MS ([#5564](https://github.com/NousResearch/hermes-agent/pull/5564))
- **HERMES_PORTAL_BASE_URL env var** respected during Nous login — @benbarclay ([#5745](https://github.com/NousResearch/hermes-agent/pull/5745))
- **Env var overrides** for Nous portal/inference URLs ([#5419](https://github.com/NousResearch/hermes-agent/pull/5419))
- **Z.AI endpoint auto-detect** via probe and cache ([#5763](https://github.com/NousResearch/hermes-agent/pull/5763))
- **MiniMax context lengths, model catalog, thinking guard, aux model, and config base_url** corrections ([#6082](https://github.com/NousResearch/hermes-agent/pull/6082))
- **Community provider/model resolution fixes** — salvaged 4 community PRs + MiniMax aux URL ([#5983](https://github.com/NousResearch/hermes-agent/pull/5983))
-
-### Agent Loop & Conversation
- **Self-optimized GPT/Codex tool-use guidance** via automated behavioral benchmarking — agent self-diagnosed and patched 5 failure modes ([#6120](https://github.com/NousResearch/hermes-agent/pull/6120))
- **GPT/Codex execution discipline guidance** in system prompts ([#5414](https://github.com/NousResearch/hermes-agent/pull/5414))
- **Thinking-only prefill continuation** for structured reasoning responses ([#5931](https://github.com/NousResearch/hermes-agent/pull/5931))
- **Accept reasoning-only responses** without retries — set content to "(empty)" instead of infinite retry ([#5278](https://github.com/NousResearch/hermes-agent/pull/5278))
- **Jittered retry backoff** — exponential backoff with jitter for API retries ([#6048](https://github.com/NousResearch/hermes-agent/pull/6048))
- **Smart thinking block signature management** — preserve and manage Anthropic thinking signatures across turns ([#6112](https://github.com/NousResearch/hermes-agent/pull/6112))
- **Coerce tool call arguments** to match JSON Schema types — fixes models that send strings instead of numbers/booleans ([#5265](https://github.com/NousResearch/hermes-agent/pull/5265))
- **Save oversized tool results to file** instead of destructive truncation ([#5210](https://github.com/NousResearch/hermes-agent/pull/5210))
- **Sandbox-aware tool result persistence** ([#6085](https://github.com/NousResearch/hermes-agent/pull/6085))
- **Streaming fallback** improved after edit failures ([#6110](https://github.com/NousResearch/hermes-agent/pull/6110))
- **Codex empty-output gaps** covered in fallback + normalizer + auxiliary client ([#5724](https://github.com/NousResearch/hermes-agent/pull/5724), [#5730](https://github.com/NousResearch/hermes-agent/pull/5730), [#5734](https://github.com/NousResearch/hermes-agent/pull/5734))
- **Codex stream output backfill** from output_item.done events ([#5689](https://github.com/NousResearch/hermes-agent/pull/5689))
- **Stream consumer creates new message** after tool boundaries ([#5739](https://github.com/NousResearch/hermes-agent/pull/5739))
- **Codex validation aligned** with normalization for empty stream output ([#5940](https://github.com/NousResearch/hermes-agent/pull/5940))
- **Bridge tool-calls** in copilot-acp adapter ([#5460](https://github.com/NousResearch/hermes-agent/pull/5460))
- **Filter transcript-only roles** from chat-completions payload ([#4880](https://github.com/NousResearch/hermes-agent/pull/4880))
- **Context compaction failures fixed** on temperature-restricted models — @MadKangYu ([#5608](https://github.com/NousResearch/hermes-agent/pull/5608))
- **Sanitize tool_calls for all strict APIs** (Fireworks, Mistral, etc.) — @lumethegreat ([#5183](https://github.com/NousResearch/hermes-agent/pull/5183))
-
-### Memory & Sessions
- **Supermemory memory provider** — new memory plugin with multi-container, search_mode, identity template, and env var override ([#5737](https://github.com/NousResearch/hermes-agent/pull/5737), [#5933](https://github.com/NousResearch/hermes-agent/pull/5933))
- **Shared thread sessions** by default — multi-user thread support across gateway platforms ([#5391](https://github.com/NousResearch/hermes-agent/pull/5391))
- **Subagent sessions linked to parent** and hidden from session list ([#5309](https://github.com/NousResearch/hermes-agent/pull/5309))
- **Profile-scoped memory isolation** and clone support ([#4845](https://github.com/NousResearch/hermes-agent/pull/4845))
- **Thread gateway user_id to memory plugins** for per-user scoping ([#5895](https://github.com/NousResearch/hermes-agent/pull/5895))
- **Honcho plugin drift overhaul** + plugin CLI registration system ([#5295](https://github.com/NousResearch/hermes-agent/pull/5295))
- **Honcho holographic prompt and trust score** rendering preserved ([#4872](https://github.com/NousResearch/hermes-agent/pull/4872))
- **Honcho doctor fix** — use recall_mode instead of memory_mode — @techguysimon ([#5645](https://github.com/NousResearch/hermes-agent/pull/5645))
- **RetainDB** — API routes, write queue, dialectic, agent model, file tools fixes ([#5461](https://github.com/NousResearch/hermes-agent/pull/5461))
- **Hindsight memory plugin overhaul** + memory setup wizard fixes ([#5094](https://github.com/NousResearch/hermes-agent/pull/5094))
- **mem0 API v2 compat**, prefetch context fencing, secret redaction ([#5423](https://github.com/NousResearch/hermes-agent/pull/5423))
- **mem0 env vars merged** with mem0.json instead of either/or ([#4939](https://github.com/NousResearch/hermes-agent/pull/4939))
- **Clean user message** used for all memory provider operations ([#4940](https://github.com/NousResearch/hermes-agent/pull/4940))
- **Silent memory flush failure** on /new and /resume fixed — @ryanautomated ([#5640](https://github.com/NousResearch/hermes-agent/pull/5640))
- **OpenViking atexit safety net** for session commit ([#5664](https://github.com/NousResearch/hermes-agent/pull/5664))
- **OpenViking tenant-scoping headers** for multi-tenant servers ([#4936](https://github.com/NousResearch/hermes-agent/pull/4936))
- **ByteRover brv query** runs synchronously before LLM call ([#4831](https://github.com/NousResearch/hermes-agent/pull/4831))
-
---
-
-## 📱 Messaging Platforms (Gateway)
-
-### Gateway Core
- **Inactivity-based agent timeout** — replaces wall-clock timeout with smart activity tracking; long-running active tasks never killed ([#5389](https://github.com/NousResearch/hermes-agent/pull/5389))
- **Approval buttons for Slack & Telegram** + Slack thread context preservation ([#5890](https://github.com/NousResearch/hermes-agent/pull/5890))
- **Live-stream /update output** + forward interactive prompts to user ([#5180](https://github.com/NousResearch/hermes-agent/pull/5180))
- **Infinite timeout support** + periodic notifications + actionable error messages ([#4959](https://github.com/NousResearch/hermes-agent/pull/4959))
- **Duplicate message prevention** — gateway dedup + partial stream guard ([#4878](https://github.com/NousResearch/hermes-agent/pull/4878))
- **Webhook delivery_info persistence** + full session id in /status ([#5942](https://github.com/NousResearch/hermes-agent/pull/5942))
- **Tool preview truncation** respects tool_preview_length in all/new progress modes ([#5937](https://github.com/NousResearch/hermes-agent/pull/5937))
- **Short preview truncation** restored for all/new tool progress modes ([#4935](https://github.com/NousResearch/hermes-agent/pull/4935))
- **Update-pending state** written atomically to prevent corruption ([#4923](https://github.com/NousResearch/hermes-agent/pull/4923))
- **Approval session key isolated** per turn ([#4884](https://github.com/NousResearch/hermes-agent/pull/4884))
- **Active-session guard bypass** for /approve, /deny, /stop, /new ([#4926](https://github.com/NousResearch/hermes-agent/pull/4926), [#5765](https://github.com/NousResearch/hermes-agent/pull/5765))
- **Typing indicator paused** during approval waits ([#5893](https://github.com/NousResearch/hermes-agent/pull/5893))
- **Caption check** uses exact line-by-line match instead of substring (all platforms) ([#5939](https://github.com/NousResearch/hermes-agent/pull/5939))
- **MEDIA: tags stripped** from streamed gateway messages ([#5152](https://github.com/NousResearch/hermes-agent/pull/5152))
- **MEDIA: tags extracted** from cron delivery before sending ([#5598](https://github.com/NousResearch/hermes-agent/pull/5598))
- **Profile-aware service units** + voice transcription cleanup ([#5972](https://github.com/NousResearch/hermes-agent/pull/5972))
- **Thread-safe PairingStore** with atomic writes — @CharlieKerfoot ([#5656](https://github.com/NousResearch/hermes-agent/pull/5656))
- **Sanitize media URLs** in base platform logs — @WAXLYY ([#5631](https://github.com/NousResearch/hermes-agent/pull/5631))
- **Reduce Telegram fallback IP activation log noise** — @MadKangYu ([#5615](https://github.com/NousResearch/hermes-agent/pull/5615))
- **Cron static method wrappers** to prevent self-binding ([#5299](https://github.com/NousResearch/hermes-agent/pull/5299))
- **Stale 'hermes login' replaced** with 'hermes auth' + credential removal re-seeding fix ([#5670](https://github.com/NousResearch/hermes-agent/pull/5670))
-
-### Telegram
- **Group topics skill binding** for supergroup forum topics ([#4886](https://github.com/NousResearch/hermes-agent/pull/4886))
- **Emoji reactions** for approval status and notifications ([#5975](https://github.com/NousResearch/hermes-agent/pull/5975))
- **Duplicate message delivery prevented** on send timeout ([#5153](https://github.com/NousResearch/hermes-agent/pull/5153))
- **Command names sanitized** to strip invalid characters ([#5596](https://github.com/NousResearch/hermes-agent/pull/5596))
- **Per-platform disabled skills** respected in Telegram menu and gateway dispatch ([#4799](https://github.com/NousResearch/hermes-agent/pull/4799))
- **/approve and /deny** routed through running-agent guard ([#4798](https://github.com/NousResearch/hermes-agent/pull/4798))
-
-### Discord
- **Channel controls** — ignored_channels and no_thread_channels config options ([#5975](https://github.com/NousResearch/hermes-agent/pull/5975))
- **Skills registered as native slash commands** via shared gateway logic ([#5603](https://github.com/NousResearch/hermes-agent/pull/5603))
- **/approve, /deny, /queue, /background, /btw** registered as native slash commands ([#4800](https://github.com/NousResearch/hermes-agent/pull/4800), [#5477](https://github.com/NousResearch/hermes-agent/pull/5477))
- **Unnecessary members intent** removed on startup + token lock leak fix ([#5302](https://github.com/NousResearch/hermes-agent/pull/5302))
-
-### Slack
- **Thread engagement** — auto-respond in bot-started and mentioned threads ([#5897](https://github.com/NousResearch/hermes-agent/pull/5897))
- **mrkdwn in edit_message** + thread replies without @mentions ([#5733](https://github.com/NousResearch/hermes-agent/pull/5733))
-
-### Matrix
- **Tier 1 feature parity** — reactions, read receipts, rich formatting, room management ([#5275](https://github.com/NousResearch/hermes-agent/pull/5275))
- **MATRIX_REQUIRE_MENTION and MATRIX_AUTO_THREAD** support ([#5106](https://github.com/NousResearch/hermes-agent/pull/5106))
- **Comprehensive reliability** — encrypted media, auth recovery, cron E2EE, Synapse compat ([#5271](https://github.com/NousResearch/hermes-agent/pull/5271))
- **CJK input, E2EE, and reconnect** fixes ([#5665](https://github.com/NousResearch/hermes-agent/pull/5665))
-
-### Signal
- **Full MEDIA: tag delivery** — send_image_file, send_voice, and send_video implemented ([#5602](https://github.com/NousResearch/hermes-agent/pull/5602))
-
-### Mattermost
- **File attachments** — set message type to DOCUMENT when post has file attachments — @nericervin ([#5609](https://github.com/NousResearch/hermes-agent/pull/5609))
-
-### Feishu
- **Interactive card approval buttons** ([#6043](https://github.com/NousResearch/hermes-agent/pull/6043))
- **Reconnect and ACL** fixes ([#5665](https://github.com/NousResearch/hermes-agent/pull/5665))
-
-### Webhooks
- **`{__raw__}` template token** and thread_id passthrough for forum topics ([#5662](https://github.com/NousResearch/hermes-agent/pull/5662))
-
---
-
-## 🖥️ CLI & User Experience
-
-### Interactive CLI
- **Defer response content** until reasoning block completes ([#5773](https://github.com/NousResearch/hermes-agent/pull/5773))
- **Ghost status-bar lines cleared** on terminal resize ([#4960](https://github.com/NousResearch/hermes-agent/pull/4960))
- **Normalize \r\n and \r line endings** in pasted text ([#4849](https://github.com/NousResearch/hermes-agent/pull/4849))
- **ChatConsole errors, curses scroll, skin-aware banner, and git state banner** fixes ([#5974](https://github.com/NousResearch/hermes-agent/pull/5974))
- **Native Windows image paste** support ([#5917](https://github.com/NousResearch/hermes-agent/pull/5917))
- **--yolo and other flags** no longer silently dropped when placed before 'chat' subcommand ([#5145](https://github.com/NousResearch/hermes-agent/pull/5145))
-
-### Setup & Configuration
- **Config structure validation** — detect malformed YAML at startup with actionable error messages ([#5426](https://github.com/NousResearch/hermes-agent/pull/5426))
- **Centralized logging** to `~/.hermes/logs/` — agent.log (INFO+), errors.log (WARNING+) with `hermes logs` command ([#5430](https://github.com/NousResearch/hermes-agent/pull/5430))
- **Docs links added** to setup wizard sections ([#5283](https://github.com/NousResearch/hermes-agent/pull/5283))
- **Doctor diagnostics** — sync provider checks, config migration, WAL and mem0 diagnostics ([#5077](https://github.com/NousResearch/hermes-agent/pull/5077))
- **Timeout debug logging** and user-facing diagnostics improved ([#5370](https://github.com/NousResearch/hermes-agent/pull/5370))
- **Reasoning effort unified** to config.yaml only ([#6118](https://github.com/NousResearch/hermes-agent/pull/6118))
- **Permanent command allowlist** loaded on startup ([#5076](https://github.com/NousResearch/hermes-agent/pull/5076))
- **`hermes auth remove`** now clears env-seeded credentials permanently ([#5285](https://github.com/NousResearch/hermes-agent/pull/5285))
- **Bundled skills synced to all profiles** during update ([#5795](https://github.com/NousResearch/hermes-agent/pull/5795))
- **`hermes update` no longer kills** freshly-restarted gateway service ([#5448](https://github.com/NousResearch/hermes-agent/pull/5448))
- **`subprocess.run()` timeouts** added to all gateway CLI commands ([#5424](https://github.com/NousResearch/hermes-agent/pull/5424))
- **Actionable error message** when Codex refresh token is reused — @tymrtn ([#5612](https://github.com/NousResearch/hermes-agent/pull/5612))
- **Google-workspace skill scripts** can now run directly — @xinbenlv ([#5624](https://github.com/NousResearch/hermes-agent/pull/5624))
-
-### Cron System
- **Inactivity-based cron timeout** — replaces wall-clock; active tasks run indefinitely ([#5440](https://github.com/NousResearch/hermes-agent/pull/5440))
- **Pre-run script injection** for data collection and change detection ([#5082](https://github.com/NousResearch/hermes-agent/pull/5082))
- **Delivery failure tracking** in job status ([#6042](https://github.com/NousResearch/hermes-agent/pull/6042))
- **Delivery guidance** in cron prompts — stops send_message thrashing ([#5444](https://github.com/NousResearch/hermes-agent/pull/5444))
- **MEDIA files delivered** as native platform attachments ([#5921](https://github.com/NousResearch/hermes-agent/pull/5921))
- **[SILENT] suppression** works anywhere in response — @auspic7 ([#5654](https://github.com/NousResearch/hermes-agent/pull/5654))
- **Cron path traversal** hardening ([#5147](https://github.com/NousResearch/hermes-agent/pull/5147))
-
---
-
-## 🔧 Tool System
-
-### Terminal & Execution
- **`execute_code` on remote backends** — code execution now works on Docker, SSH, Modal, and other remote terminal backends ([#5088](https://github.com/NousResearch/hermes-agent/pull/5088))
- **Exit code context** for common CLI tools in terminal results — helps agent understand what went wrong ([#5144](https://github.com/NousResearch/hermes-agent/pull/5144))
- **Progressive subdirectory hint discovery** — agent learns project structure as it navigates ([#5291](https://github.com/NousResearch/hermes-agent/pull/5291))
- **notify_on_complete for background processes** — get notified when long-running tasks finish ([#5779](https://github.com/NousResearch/hermes-agent/pull/5779))
- **Docker env config** — explicit container environment variables via docker_env config ([#4738](https://github.com/NousResearch/hermes-agent/pull/4738))
- **Approval metadata included** in terminal tool results ([#5141](https://github.com/NousResearch/hermes-agent/pull/5141))
- **Workdir parameter sanitized** in terminal tool across all backends ([#5629](https://github.com/NousResearch/hermes-agent/pull/5629))
- **Detached process crash recovery** state corrected ([#6101](https://github.com/NousResearch/hermes-agent/pull/6101))
- **Agent-browser paths with spaces** preserved — @Vasanthdev2004 ([#6077](https://github.com/NousResearch/hermes-agent/pull/6077))
- **Portable base64 encoding** for image reading on macOS — @CharlieKerfoot ([#5657](https://github.com/NousResearch/hermes-agent/pull/5657))
-
-### Browser
- **Switch managed browser provider** from Browserbase to Browser Use — @benbarclay ([#5750](https://github.com/NousResearch/hermes-agent/pull/5750))
- **Firecrawl cloud browser** provider — @alt-glitch ([#5628](https://github.com/NousResearch/hermes-agent/pull/5628))
- **JS evaluation** via browser_console expression parameter ([#5303](https://github.com/NousResearch/hermes-agent/pull/5303))
- **Windows browser** fixes ([#5665](https://github.com/NousResearch/hermes-agent/pull/5665))
-
-### MCP
- **MCP OAuth 2.1 PKCE** — full standards-compliant OAuth client support ([#5420](https://github.com/NousResearch/hermes-agent/pull/5420))
- **OSV malware check** for MCP extension packages ([#5305](https://github.com/NousResearch/hermes-agent/pull/5305))
- **Prefer structuredContent over text** + no_mcp sentinel ([#5979](https://github.com/NousResearch/hermes-agent/pull/5979))
- **Unknown toolsets warning suppressed** for MCP server names ([#5279](https://github.com/NousResearch/hermes-agent/pull/5279))
-
-### Web & Files
- **.zip document support** + auto-mount cache dirs into remote backends ([#4846](https://github.com/NousResearch/hermes-agent/pull/4846))
- **Redact query secrets** in send_message errors — @WAXLYY ([#5650](https://github.com/NousResearch/hermes-agent/pull/5650))
-
-### Delegation
- **Credential pool sharing** + workspace path hints for subagents ([#5748](https://github.com/NousResearch/hermes-agent/pull/5748))
-
-### ACP (VS Code / Zed / JetBrains)
- **Aggregate ACP improvements** — auth compat, protocol fixes, command ads, delegation, SSE events ([#5292](https://github.com/NousResearch/hermes-agent/pull/5292))
-
---
-
-## 🧩 Skills Ecosystem
-
-### Skills System
- **Skill config interface** — skills can declare required config.yaml settings, prompted during setup, injected at load time ([#5635](https://github.com/NousResearch/hermes-agent/pull/5635))
- **Plugin CLI registration system** — plugins register their own CLI subcommands without touching main.py ([#5295](https://github.com/NousResearch/hermes-agent/pull/5295))
- **Request-scoped API hooks** with tool call correlation IDs for plugins ([#5427](https://github.com/NousResearch/hermes-agent/pull/5427))
- **Session lifecycle hooks** — on_session_finalize and on_session_reset for CLI + gateway ([#6129](https://github.com/NousResearch/hermes-agent/pull/6129))
- **Prompt for required env vars** during plugin install — @kshitijk4poor ([#5470](https://github.com/NousResearch/hermes-agent/pull/5470))
- **Plugin name validation** — reject names that resolve to plugins root ([#5368](https://github.com/NousResearch/hermes-agent/pull/5368))
- **pre_llm_call plugin context** moved to user message to preserve prompt cache ([#5146](https://github.com/NousResearch/hermes-agent/pull/5146))
-
-### New & Updated Skills
- **popular-web-designs** — 54 production website design systems ([#5194](https://github.com/NousResearch/hermes-agent/pull/5194))
- **p5js creative coding** — @SHL0MS ([#5600](https://github.com/NousResearch/hermes-agent/pull/5600))
- **manim-video** — mathematical and technical animations — @SHL0MS ([#4930](https://github.com/NousResearch/hermes-agent/pull/4930))
- **llm-wiki** — Karpathy's LLM Wiki skill ([#5635](https://github.com/NousResearch/hermes-agent/pull/5635))
- **gitnexus-explorer** — codebase indexing and knowledge serving ([#5208](https://github.com/NousResearch/hermes-agent/pull/5208))
- **research-paper-writing** — AI-Scientist & GPT-Researcher patterns — @SHL0MS ([#5421](https://github.com/NousResearch/hermes-agent/pull/5421))
- **blogwatcher** updated to JulienTant's fork ([#5759](https://github.com/NousResearch/hermes-agent/pull/5759))
- **claude-code skill** comprehensive rewrite v2.0 + v2.2 ([#5155](https://github.com/NousResearch/hermes-agent/pull/5155), [#5158](https://github.com/NousResearch/hermes-agent/pull/5158))
- **Code verification skills** consolidated into one ([#4854](https://github.com/NousResearch/hermes-agent/pull/4854))
- **Manim CE reference docs** expanded — geometry, animations, LaTeX — @leotrs ([#5791](https://github.com/NousResearch/hermes-agent/pull/5791))
- **Manim-video references** — design thinking, updaters, paper explainer, decorations, production quality — @SHL0MS ([#5588](https://github.com/NousResearch/hermes-agent/pull/5588), [#5408](https://github.com/NousResearch/hermes-agent/pull/5408))
-
---
-
-## 🔒 Security & Reliability
-
-### Security Hardening
- **Consolidated security** — SSRF protections, timing attack mitigations, tar traversal prevention, credential leakage guards ([#5944](https://github.com/NousResearch/hermes-agent/pull/5944))
- **Cross-session isolation** + cron path traversal hardening ([#5613](https://github.com/NousResearch/hermes-agent/pull/5613))
- **Workdir parameter sanitized** in terminal tool across all backends ([#5629](https://github.com/NousResearch/hermes-agent/pull/5629))
- **Approval 'once' session escalation** prevented + cron delivery platform validation ([#5280](https://github.com/NousResearch/hermes-agent/pull/5280))
- **Profile-scoped Google Workspace OAuth tokens** protected ([#4910](https://github.com/NousResearch/hermes-agent/pull/4910))
-
-### Reliability
- **Aggressive worktree and branch cleanup** to prevent accumulation ([#6134](https://github.com/NousResearch/hermes-agent/pull/6134))
- **O(n²) catastrophic backtracking** in redact regex fixed — 100x improvement on large outputs ([#4962](https://github.com/NousResearch/hermes-agent/pull/4962))
- **Runtime stability fixes** across core, web, delegate, and browser tools ([#4843](https://github.com/NousResearch/hermes-agent/pull/4843))
- **API server streaming fix** + conversation history support ([#5977](https://github.com/NousResearch/hermes-agent/pull/5977))
- **OpenViking API endpoint paths** and response parsing corrected ([#5078](https://github.com/NousResearch/hermes-agent/pull/5078))
-
---
-
-## 🐛 Notable Bug Fixes
-
- **9 community bugfixes salvaged** — gateway, cron, deps, macOS launchd in one batch ([#5288](https://github.com/NousResearch/hermes-agent/pull/5288))
- **Batch core bug fixes** — model config, session reset, alias fallback, launchctl, delegation, atomic writes ([#5630](https://github.com/NousResearch/hermes-agent/pull/5630))
- **Batch gateway/platform fixes** — matrix E2EE, CJK input, Windows browser, Feishu reconnect + ACL ([#5665](https://github.com/NousResearch/hermes-agent/pull/5665))
- **Stale test skips removed**, regex backtracking, file search bug, and test flakiness ([#4969](https://github.com/NousResearch/hermes-agent/pull/4969))
- **Nix flake** — read version, regen uv.lock, add hermes_logging — @alt-glitch ([#5651](https://github.com/NousResearch/hermes-agent/pull/5651))
- **Lowercase variable redaction** regression tests ([#5185](https://github.com/NousResearch/hermes-agent/pull/5185))
-
---
-
-## 🧪 Testing
-
- **57 failing CI tests repaired** across 14 files ([#5823](https://github.com/NousResearch/hermes-agent/pull/5823))
- **Test suite re-architecture** + CI failure fixes — @alt-glitch ([#5946](https://github.com/NousResearch/hermes-agent/pull/5946))
- **Codebase-wide lint cleanup** — unused imports, dead code, and inefficient patterns ([#5821](https://github.com/NousResearch/hermes-agent/pull/5821))
- **browser_close tool removed** — auto-cleanup handles it ([#5792](https://github.com/NousResearch/hermes-agent/pull/5792))
-
---
-
-## 📚 Documentation
-
- **Comprehensive documentation audit** — fix stale info, expand thin pages, add depth ([#5393](https://github.com/NousResearch/hermes-agent/pull/5393))
- **40+ discrepancies fixed** between documentation and codebase ([#5818](https://github.com/NousResearch/hermes-agent/pull/5818))
- **13 features documented** from last week's PRs ([#5815](https://github.com/NousResearch/hermes-agent/pull/5815))
- **Guides section overhaul** — fix existing + add 3 new tutorials ([#5735](https://github.com/NousResearch/hermes-agent/pull/5735))
- **Salvaged 4 docs PRs** — docker setup, post-update validation, local LLM guide, signal-cli install ([#5727](https://github.com/NousResearch/hermes-agent/pull/5727))
- **Discord configuration reference** ([#5386](https://github.com/NousResearch/hermes-agent/pull/5386))
- **Community FAQ entries** for common workflows and troubleshooting ([#4797](https://github.com/NousResearch/hermes-agent/pull/4797))
- **WSL2 networking guide** for local model servers ([#5616](https://github.com/NousResearch/hermes-agent/pull/5616))
- **Honcho CLI reference** + plugin CLI registration docs ([#5308](https://github.com/NousResearch/hermes-agent/pull/5308))
- **Obsidian Headless setup** for servers in llm-wiki ([#5660](https://github.com/NousResearch/hermes-agent/pull/5660))
- **Hermes Mod visual skin editor** added to skins page ([#6095](https://github.com/NousResearch/hermes-agent/pull/6095))
-
---
-
-## 👥 Contributors
-
-### Core
- **@teknium1** — 179 PRs
-
-### Top Community Contributors
- **@SHL0MS** (7 PRs) — p5js creative coding skill, manim-video skill + 5 reference expansions, research-paper-writing, Nous OAuth fix, manim font fix
- **@alt-glitch** (3 PRs) — Firecrawl cloud browser provider, test re-architecture + CI fixes, Nix flake fixes
- **@benbarclay** (2 PRs) — Browser Use managed provider switch, Nous portal base URL fix
- **@CharlieKerfoot** (2 PRs) — macOS portable base64 encoding, thread-safe PairingStore
- **@WAXLYY** (2 PRs) — send_message secret redaction, gateway media URL sanitization
- **@MadKangYu** (2 PRs) — Telegram log noise reduction, context compaction fix for temperature-restricted models
-
-### All Contributors
-@alt-glitch, @austinpickett, @auspic7, @benbarclay, @CharlieKerfoot, @GratefulDave, @kshitijk4poor, @leotrs, @lumethegreat, @MadKangYu, @nericervin, @ryanautomated, @SHL0MS, @techguysimon, @tymrtn, @Vasanthdev2004, @WAXLYY, @xinbenlv
-
---
-
-**Full Changelog**: [v2026.4.3...v2026.4.8](https://github.com/NousResearch/hermes-agent/compare/v2026.4.3...v2026.4.8)
--- a/RELEASE_v0.9.0.md
+++ b/RELEASE_v0.9.0.md
@@ -1,329 +0,0 @@
-# Hermes Agent v0.9.0 (v2026.4.13)
-
-**Release Date:** April 13, 2026
-**Since v0.8.0:** 487 commits · 269 merged PRs · 167 resolved issues · 493 files changed · 63,281 insertions · 24 contributors
-
-> The everywhere release — Hermes goes mobile with Termux/Android, adds iMessage and WeChat, ships Fast Mode for OpenAI and Anthropic, introduces background process monitoring, launches a local web dashboard for managing your agent, and delivers the deepest security hardening pass yet across 16 supported platforms.
-
---
-
-## ✨ Highlights
-
- **Local Web Dashboard** — A new browser-based dashboard for managing your Hermes Agent locally. Configure settings, monitor sessions, browse skills, and manage your gateway — all from a clean web interface without touching config files or the terminal. The easiest way to get started with Hermes.
-
- **Fast Mode (`/fast`)** — Priority processing for OpenAI and Anthropic models. Toggle `/fast` to route through priority queues for significantly lower latency on supported models (GPT-5.4, Codex, Claude). Expands across all OpenAI Priority Processing models and Anthropic's fast tier. ([#6875](https://github.com/NousResearch/hermes-agent/pull/6875), [#6960](https://github.com/NousResearch/hermes-agent/pull/6960), [#7037](https://github.com/NousResearch/hermes-agent/pull/7037))
-
- **iMessage via BlueBubbles** — Full iMessage integration through BlueBubbles, bringing Hermes to Apple's messaging ecosystem. Auto-webhook registration, setup wizard integration, and crash resilience. ([#6437](https://github.com/NousResearch/hermes-agent/pull/6437), [#6460](https://github.com/NousResearch/hermes-agent/pull/6460), [#6494](https://github.com/NousResearch/hermes-agent/pull/6494))
-
- **WeChat (Weixin) & WeCom Callback Mode** — Native WeChat support via iLink Bot API and a new WeCom callback-mode adapter for self-built enterprise apps. Streaming cursor, media uploads, markdown link handling, and atomic state persistence. Hermes now covers the Chinese messaging ecosystem end-to-end. ([#7166](https://github.com/NousResearch/hermes-agent/pull/7166), [#7943](https://github.com/NousResearch/hermes-agent/pull/7943))
-
- **Termux / Android Support** — Run Hermes natively on Android via Termux. Adapted install paths, TUI optimizations for mobile screens, voice backend support, and the `/image` command work on-device. ([#6834](https://github.com/NousResearch/hermes-agent/pull/6834))
-
- **Background Process Monitoring (`watch_patterns`)** — Set patterns to watch for in background process output and get notified in real-time when they match. Monitor for errors, wait for specific events ("listening on port"), or watch build logs — all without polling. ([#7635](https://github.com/NousResearch/hermes-agent/pull/7635))
-
- **Native xAI & Xiaomi MiMo Providers** — First-class provider support for xAI (Grok) and Xiaomi MiMo, with direct API access, model catalogs, and setup wizard integration. Plus Qwen OAuth with portal request support. ([#7372](https://github.com/NousResearch/hermes-agent/pull/7372), [#7855](https://github.com/NousResearch/hermes-agent/pull/7855))
-
- **Pluggable Context Engine** — Context management is now a pluggable slot via `hermes plugins`. Swap in custom context engines that control what the agent sees each turn — filtering, summarization, or domain-specific context injection. ([#7464](https://github.com/NousResearch/hermes-agent/pull/7464))
-
- **Unified Proxy Support** — SOCKS proxy, `DISCORD_PROXY`, and system proxy auto-detection across all gateway platforms. Hermes behind corporate firewalls just works. ([#6814](https://github.com/NousResearch/hermes-agent/pull/6814))
-
- **Comprehensive Security Hardening** — Path traversal protection in checkpoint manager, shell injection neutralization in sandbox writes, SSRF redirect guards in Slack image uploads, Twilio webhook signature validation (SMS RCE fix), API server auth enforcement, git argument injection prevention, and approval button authorization. ([#7933](https://github.com/NousResearch/hermes-agent/pull/7933), [#7944](https://github.com/NousResearch/hermes-agent/pull/7944), [#7940](https://github.com/NousResearch/hermes-agent/pull/7940), [#7151](https://github.com/NousResearch/hermes-agent/pull/7151), [#7156](https://github.com/NousResearch/hermes-agent/pull/7156))
-
- **`hermes backup` & `hermes import`** — Full backup and restore of your Hermes configuration, sessions, skills, and memory. Migrate between machines or create snapshots before major changes. ([#7997](https://github.com/NousResearch/hermes-agent/pull/7997))
-
- **16 Supported Platforms** — With BlueBubbles (iMessage) and WeChat joining Telegram, Discord, Slack, WhatsApp, Signal, Matrix, Email, SMS, DingTalk, Feishu, WeCom, Mattermost, Home Assistant, and Webhooks, Hermes now runs on 16 messaging platforms out of the box.
-
- **`/debug` & `hermes debug share`** — New debugging toolkit: `/debug` slash command across all platforms for quick diagnostics, plus `hermes debug share` to upload a full debug report to a pastebin for easy sharing when troubleshooting. ([#8681](https://github.com/NousResearch/hermes-agent/pull/8681))
-
---
-
-## 🏗️ Core Agent & Architecture
-
-### Provider & Model Support
- **Native xAI (Grok) provider** with direct API access and model catalog ([#7372](https://github.com/NousResearch/hermes-agent/pull/7372))
- **Xiaomi MiMo as first-class provider** — setup wizard, model catalog, empty response recovery ([#7855](https://github.com/NousResearch/hermes-agent/pull/7855))
- **Qwen OAuth provider** with portal request support ([#6282](https://github.com/NousResearch/hermes-agent/pull/6282))
- **Fast Mode** — `/fast` toggle for OpenAI Priority Processing + Anthropic fast tier ([#6875](https://github.com/NousResearch/hermes-agent/pull/6875), [#6960](https://github.com/NousResearch/hermes-agent/pull/6960), [#7037](https://github.com/NousResearch/hermes-agent/pull/7037))
- **Structured API error classification** for smart failover decisions ([#6514](https://github.com/NousResearch/hermes-agent/pull/6514))
- **Rate limit header capture** shown in `/usage` ([#6541](https://github.com/NousResearch/hermes-agent/pull/6541))
- **API server model name** derived from profile name ([#6857](https://github.com/NousResearch/hermes-agent/pull/6857))
- **Custom providers** now included in `/model` listings and resolution ([#7088](https://github.com/NousResearch/hermes-agent/pull/7088))
- **Fallback provider activation** on repeated empty responses with user-visible status ([#7505](https://github.com/NousResearch/hermes-agent/pull/7505))
- **OpenRouter variant tags** (`:free`, `:extended`, `:fast`) preserved during model switch ([#6383](https://github.com/NousResearch/hermes-agent/pull/6383))
- **Credential exhaustion TTL** reduced from 24 hours to 1 hour ([#6504](https://github.com/NousResearch/hermes-agent/pull/6504))
- **OAuth credential lifecycle** hardening — stale pool keys, auth.json sync, Codex CLI race fixes ([#6874](https://github.com/NousResearch/hermes-agent/pull/6874))
- Empty response recovery for reasoning models (MiMo, Qwen, GLM) ([#8609](https://github.com/NousResearch/hermes-agent/pull/8609))
- MiniMax context lengths, thinking guard, endpoint corrections ([#6082](https://github.com/NousResearch/hermes-agent/pull/6082), [#7126](https://github.com/NousResearch/hermes-agent/pull/7126))
- Z.AI endpoint auto-detect via probe and cache ([#5763](https://github.com/NousResearch/hermes-agent/pull/5763))
-
-### Agent Loop & Conversation
- **Pluggable context engine slot** via `hermes plugins` ([#7464](https://github.com/NousResearch/hermes-agent/pull/7464))
- **Background process monitoring** — `watch_patterns` for real-time output alerts ([#7635](https://github.com/NousResearch/hermes-agent/pull/7635))
- **Improved context compression** — higher limits, tool tracking, degradation warnings, token-budget tail protection ([#6395](https://github.com/NousResearch/hermes-agent/pull/6395), [#6453](https://github.com/NousResearch/hermes-agent/pull/6453))
- **`/compress <focus>`** — guided compression with a focus topic ([#8017](https://github.com/NousResearch/hermes-agent/pull/8017))
- **Tiered context pressure warnings** with gateway dedup ([#6411](https://github.com/NousResearch/hermes-agent/pull/6411))
- **Staged inactivity warning** before timeout escalation ([#6387](https://github.com/NousResearch/hermes-agent/pull/6387))
- **Prevent agent from stopping mid-task** — compression floor, budget overhaul, activity tracking ([#7983](https://github.com/NousResearch/hermes-agent/pull/7983))
- **Propagate child activity to parent** during `delegate_task` ([#7295](https://github.com/NousResearch/hermes-agent/pull/7295))
- **Truncated streaming tool call detection** before execution ([#6847](https://github.com/NousResearch/hermes-agent/pull/6847))
- Empty response retry (3 attempts with nudge) ([#6488](https://github.com/NousResearch/hermes-agent/pull/6488))
- Adaptive streaming backoff + cursor strip to prevent message truncation ([#7683](https://github.com/NousResearch/hermes-agent/pull/7683))
- Compression uses live session model instead of stale persisted config ([#8258](https://github.com/NousResearch/hermes-agent/pull/8258))
- Strip `<thought>` tags from Gemma 4 responses ([#8562](https://github.com/NousResearch/hermes-agent/pull/8562))
- Prevent `<think>` in prose from suppressing response output ([#6968](https://github.com/NousResearch/hermes-agent/pull/6968))
- Add turn-exit diagnostic logging to the agent loop ([#6549](https://github.com/NousResearch/hermes-agent/pull/6549))
- Scope tool interrupt signal per-thread to prevent cross-session leaks ([#7930](https://github.com/NousResearch/hermes-agent/pull/7930))
-
-### Memory & Sessions
- **Hindsight memory plugin** — feature parity, setup wizard, config improvements — @nicoloboschi ([#6428](https://github.com/NousResearch/hermes-agent/pull/6428))
- **Honcho** — opt-in `initOnSessionStart` for tools mode — @Kathie-yu ([#6995](https://github.com/NousResearch/hermes-agent/pull/6995))
- Orphan children instead of cascade-deleting in prune/delete ([#6513](https://github.com/NousResearch/hermes-agent/pull/6513))
- Doctor command only checks the active memory provider ([#6285](https://github.com/NousResearch/hermes-agent/pull/6285))
-
---
-
-## 📱 Messaging Platforms (Gateway)
-
-### New Platforms
- **BlueBubbles (iMessage)** — full adapter with auto-webhook registration, setup wizard, and crash resilience ([#6437](https://github.com/NousResearch/hermes-agent/pull/6437), [#6460](https://github.com/NousResearch/hermes-agent/pull/6460), [#6494](https://github.com/NousResearch/hermes-agent/pull/6494), [#7107](https://github.com/NousResearch/hermes-agent/pull/7107))
- **Weixin (WeChat)** — native support via iLink Bot API with streaming, media uploads, markdown links ([#7166](https://github.com/NousResearch/hermes-agent/pull/7166), [#8665](https://github.com/NousResearch/hermes-agent/pull/8665))
- **WeCom Callback Mode** — self-built enterprise app adapter with atomic state persistence ([#7943](https://github.com/NousResearch/hermes-agent/pull/7943), [#7928](https://github.com/NousResearch/hermes-agent/pull/7928))
-
-### Discord
- **Allowed channels whitelist** config — @jarvis-phw ([#7044](https://github.com/NousResearch/hermes-agent/pull/7044))
- **Forum channel topic inheritance** in thread sessions — @hermes-agent-dhabibi ([#6377](https://github.com/NousResearch/hermes-agent/pull/6377))
- **DISCORD_REPLY_TO_MODE** setting ([#6333](https://github.com/NousResearch/hermes-agent/pull/6333))
- Accept `.log` attachments, raise document size limit — @kira-ariaki ([#6467](https://github.com/NousResearch/hermes-agent/pull/6467))
- Decouple readiness from slash sync ([#8016](https://github.com/NousResearch/hermes-agent/pull/8016))
-
-### Slack
- **Consolidated Slack improvements** — 7 community PRs salvaged into one ([#6809](https://github.com/NousResearch/hermes-agent/pull/6809))
- Handle assistant thread lifecycle events ([#6433](https://github.com/NousResearch/hermes-agent/pull/6433))
-
-### Matrix
- **Migrated from matrix-nio to mautrix-python** ([#7518](https://github.com/NousResearch/hermes-agent/pull/7518))
- SQLite crypto store replacing pickle (fixes E2EE decryption) — @alt-glitch ([#7981](https://github.com/NousResearch/hermes-agent/pull/7981))
- Cross-signing recovery key verification for E2EE migration ([#8282](https://github.com/NousResearch/hermes-agent/pull/8282))
- DM mention threads + group chat events for Feishu ([#7423](https://github.com/NousResearch/hermes-agent/pull/7423))
-
-### Gateway Core
- **Unified proxy support** — SOCKS, DISCORD_PROXY, multi-platform with macOS auto-detection ([#6814](https://github.com/NousResearch/hermes-agent/pull/6814))
- **Inbound text batching** for Discord, Matrix, WeCom + adaptive delay ([#6979](https://github.com/NousResearch/hermes-agent/pull/6979))
- **Surface natural mid-turn assistant messages** in chat platforms ([#7978](https://github.com/NousResearch/hermes-agent/pull/7978))
- **WSL-aware gateway** with smart systemd detection ([#7510](https://github.com/NousResearch/hermes-agent/pull/7510))
- **All missing platforms added to setup wizard** ([#7949](https://github.com/NousResearch/hermes-agent/pull/7949))
- **Per-platform `tool_progress` overrides** ([#6348](https://github.com/NousResearch/hermes-agent/pull/6348))
- **Configurable 'still working' notification interval** ([#8572](https://github.com/NousResearch/hermes-agent/pull/8572))
- `/model` switch persists across messages ([#7081](https://github.com/NousResearch/hermes-agent/pull/7081))
- `/usage` shows rate limits, cost, and token details between turns ([#7038](https://github.com/NousResearch/hermes-agent/pull/7038))
- Drain in-flight work before restart ([#7503](https://github.com/NousResearch/hermes-agent/pull/7503))
- Don't evict cached agent on failed runs — prevents MCP restart loop ([#7539](https://github.com/NousResearch/hermes-agent/pull/7539))
- Replace `os.environ` session state with `contextvars` ([#7454](https://github.com/NousResearch/hermes-agent/pull/7454))
- Derive channel directory platforms from enum instead of hardcoded list ([#7450](https://github.com/NousResearch/hermes-agent/pull/7450))
- Validate image downloads before caching (cross-platform) ([#7125](https://github.com/NousResearch/hermes-agent/pull/7125))
- Cross-platform webhook delivery for all platforms ([#7095](https://github.com/NousResearch/hermes-agent/pull/7095))
- Cron Discord thread_id delivery support ([#7106](https://github.com/NousResearch/hermes-agent/pull/7106))
- Feishu QR-based bot onboarding ([#8570](https://github.com/NousResearch/hermes-agent/pull/8570))
- Gateway status scoped to active profile ([#7951](https://github.com/NousResearch/hermes-agent/pull/7951))
- Prevent background process notifications from triggering false pairing requests ([#6434](https://github.com/NousResearch/hermes-agent/pull/6434))
-
---
-
-## 🖥️ CLI & User Experience
-
-### Interactive CLI
- **Termux / Android support** — adapted install paths, TUI, voice, `/image` ([#6834](https://github.com/NousResearch/hermes-agent/pull/6834))
- **Native `/model` picker modal** for provider → model selection ([#8003](https://github.com/NousResearch/hermes-agent/pull/8003))
- **Live per-tool elapsed timer** restored in TUI spinner ([#7359](https://github.com/NousResearch/hermes-agent/pull/7359))
- **Stacked tool progress scrollback** in TUI ([#8201](https://github.com/NousResearch/hermes-agent/pull/8201))
- **Random tips on new session start** (CLI + gateway, 279 tips) ([#8225](https://github.com/NousResearch/hermes-agent/pull/8225), [#8237](https://github.com/NousResearch/hermes-agent/pull/8237))
- **`hermes dump`** — copy-pasteable setup summary for debugging ([#6550](https://github.com/NousResearch/hermes-agent/pull/6550))
- **`hermes backup` / `hermes import`** — full config backup and restore ([#7997](https://github.com/NousResearch/hermes-agent/pull/7997))
- **WSL environment hint** in system prompt ([#8285](https://github.com/NousResearch/hermes-agent/pull/8285))
- **Profile creation UX** — seed SOUL.md + credential warning ([#8553](https://github.com/NousResearch/hermes-agent/pull/8553))
- Shell-aware sudo detection, empty password support ([#6517](https://github.com/NousResearch/hermes-agent/pull/6517))
- Flush stdin after curses/terminal menus to prevent escape sequence leakage ([#7167](https://github.com/NousResearch/hermes-agent/pull/7167))
- Handle broken stdin in prompt_toolkit startup ([#8560](https://github.com/NousResearch/hermes-agent/pull/8560))
-
-### Setup & Configuration
- **Per-platform display verbosity** configuration ([#8006](https://github.com/NousResearch/hermes-agent/pull/8006))
- **Component-separated logging** with session context and filtering ([#7991](https://github.com/NousResearch/hermes-agent/pull/7991))
- **`network.force_ipv4`** config to fix IPv6 timeout issues ([#8196](https://github.com/NousResearch/hermes-agent/pull/8196))
- **Standardize message whitespace and JSON formatting** ([#7988](https://github.com/NousResearch/hermes-agent/pull/7988))
- **Rebrand OpenClaw → Hermes** during migration ([#8210](https://github.com/NousResearch/hermes-agent/pull/8210))
- `config.yaml` takes priority over env vars for auxiliary settings ([#7889](https://github.com/NousResearch/hermes-agent/pull/7889))
- Harden setup provider flows + live OpenRouter catalog refresh ([#7078](https://github.com/NousResearch/hermes-agent/pull/7078))
- Normalize reasoning effort ordering across all surfaces ([#6804](https://github.com/NousResearch/hermes-agent/pull/6804))
- Remove dead `LLM_MODEL` env var + migration to clear stale entries ([#6543](https://github.com/NousResearch/hermes-agent/pull/6543))
- Remove `/prompt` slash command — prefix expansion footgun ([#6752](https://github.com/NousResearch/hermes-agent/pull/6752))
- `HERMES_HOME_MODE` env var to override permissions — @ygd58 ([#6993](https://github.com/NousResearch/hermes-agent/pull/6993))
- Fall back to default model when model config is empty ([#8303](https://github.com/NousResearch/hermes-agent/pull/8303))
- Warn when compression model context is too small ([#7894](https://github.com/NousResearch/hermes-agent/pull/7894))
-
---
-
-## 🔧 Tool System
-
-### Environments & Execution
- **Unified spawn-per-call execution layer** for environments ([#6343](https://github.com/NousResearch/hermes-agent/pull/6343))
- **Unified file sync** with mtime tracking, deletion, and transactional state ([#7087](https://github.com/NousResearch/hermes-agent/pull/7087))
- **Persistent sandbox envs** survive between turns ([#6412](https://github.com/NousResearch/hermes-agent/pull/6412))
- **Bulk file sync** via tar pipe for SSH/Modal backends — @alt-glitch ([#8014](https://github.com/NousResearch/hermes-agent/pull/8014))
- **Daytona** — bulk upload, config bridge, silent disk cap ([#7538](https://github.com/NousResearch/hermes-agent/pull/7538))
- Foreground timeout cap to prevent session deadlocks ([#7082](https://github.com/NousResearch/hermes-agent/pull/7082))
- Guard invalid command values ([#6417](https://github.com/NousResearch/hermes-agent/pull/6417))
-
-### MCP
- **`hermes mcp add --env` and `--preset`** support ([#7970](https://github.com/NousResearch/hermes-agent/pull/7970))
- Combine `content` and `structuredContent` when both present ([#7118](https://github.com/NousResearch/hermes-agent/pull/7118))
- MCP tool name deconfliction fixes ([#7654](https://github.com/NousResearch/hermes-agent/pull/7654))
-
-### Browser
- Browser hardening — dead code removal, caching, scroll perf, security, thread safety ([#7354](https://github.com/NousResearch/hermes-agent/pull/7354))
- `/browser connect` auto-launch uses dedicated Chrome profile dir ([#6821](https://github.com/NousResearch/hermes-agent/pull/6821))
- Reap orphaned browser sessions on startup ([#7931](https://github.com/NousResearch/hermes-agent/pull/7931))
-
-### Voice & Vision
- **Voxtral TTS provider** (Mistral AI) ([#7653](https://github.com/NousResearch/hermes-agent/pull/7653))
- **TTS speed support** for Edge TTS, OpenAI TTS, MiniMax ([#8666](https://github.com/NousResearch/hermes-agent/pull/8666))
- **Vision auto-resize** for oversized images, raise limit to 20 MB, retry-on-failure ([#7883](https://github.com/NousResearch/hermes-agent/pull/7883), [#7902](https://github.com/NousResearch/hermes-agent/pull/7902))
- STT provider-model mismatch fix (whisper-1 vs faster-whisper) ([#7113](https://github.com/NousResearch/hermes-agent/pull/7113))
-
-### Other Tools
- **`hermes dump`** command for setup summary ([#6550](https://github.com/NousResearch/hermes-agent/pull/6550))
- TODO store enforces ID uniqueness during replace operations ([#7986](https://github.com/NousResearch/hermes-agent/pull/7986))
- List all available toolsets in `delegate_task` schema description ([#8231](https://github.com/NousResearch/hermes-agent/pull/8231))
- API server: tool progress as custom SSE event to prevent model corruption ([#7500](https://github.com/NousResearch/hermes-agent/pull/7500))
- API server: share one Docker container across all conversations ([#7127](https://github.com/NousResearch/hermes-agent/pull/7127))
-
---
-
-## 🧩 Skills Ecosystem
-
- **Centralized skills index + tree cache** — eliminates rate-limit failures on install ([#8575](https://github.com/NousResearch/hermes-agent/pull/8575))
- **More aggressive skill loading instructions** in system prompt (v3) ([#8209](https://github.com/NousResearch/hermes-agent/pull/8209), [#8286](https://github.com/NousResearch/hermes-agent/pull/8286))
- **Google Workspace skill** migrated to GWS CLI backend ([#6788](https://github.com/NousResearch/hermes-agent/pull/6788))
- **Creative divergence strategies** skill — @SHL0MS ([#6882](https://github.com/NousResearch/hermes-agent/pull/6882))
- **Creative ideation** — constraint-driven project generation — @SHL0MS ([#7555](https://github.com/NousResearch/hermes-agent/pull/7555))
- Parallelize skills browse/search to prevent hanging ([#7301](https://github.com/NousResearch/hermes-agent/pull/7301))
- Read name from SKILL.md frontmatter in skills_sync ([#7623](https://github.com/NousResearch/hermes-agent/pull/7623))
-
---
-
-## 🔒 Security & Reliability
-
-### Security Hardening
- **Twilio webhook signature validation** — SMS RCE fix ([#7933](https://github.com/NousResearch/hermes-agent/pull/7933))
- **Shell injection neutralization** in `_write_to_sandbox` via path quoting ([#7940](https://github.com/NousResearch/hermes-agent/pull/7940))
- **Git argument injection** and path traversal prevention in checkpoint manager ([#7944](https://github.com/NousResearch/hermes-agent/pull/7944))
- **SSRF redirect bypass** in Slack image uploads + base.py cache helpers ([#7151](https://github.com/NousResearch/hermes-agent/pull/7151))
- **Path traversal, credential gate, DANGEROUS_PATTERNS gaps** ([#7156](https://github.com/NousResearch/hermes-agent/pull/7156))
- **API bind guard** — enforce `API_SERVER_KEY` for non-loopback binding ([#7455](https://github.com/NousResearch/hermes-agent/pull/7455))
- **Approval button authorization** — require auth for session continuation — @Cafexss ([#6930](https://github.com/NousResearch/hermes-agent/pull/6930))
- Path boundary enforcement in skill manager operations ([#7156](https://github.com/NousResearch/hermes-agent/pull/7156))
- DingTalk/API webhook URL origin validation, header injection rejection ([#7455](https://github.com/NousResearch/hermes-agent/pull/7455))
-
-### Reliability
- **Contextual error diagnostics** for invalid API responses ([#8565](https://github.com/NousResearch/hermes-agent/pull/8565))
- **Prevent 400 format errors** from triggering compression loop on Codex ([#6751](https://github.com/NousResearch/hermes-agent/pull/6751))
- **Don't halve context_length** on output-cap-too-large errors — @KUSH42 ([#6664](https://github.com/NousResearch/hermes-agent/pull/6664))
- **Recover primary client** on OpenAI transport errors ([#7108](https://github.com/NousResearch/hermes-agent/pull/7108))
- **Credential pool rotation** on billing-classified 400s ([#7112](https://github.com/NousResearch/hermes-agent/pull/7112))
- **Auto-increase stream read timeout** for local LLM providers ([#6967](https://github.com/NousResearch/hermes-agent/pull/6967))
- **Fall back to default certs** when CA bundle path doesn't exist ([#7352](https://github.com/NousResearch/hermes-agent/pull/7352))
- **Disambiguate usage-limit patterns** in error classifier — @sprmn24 ([#6836](https://github.com/NousResearch/hermes-agent/pull/6836))
- Harden cron script timeout and provider recovery ([#7079](https://github.com/NousResearch/hermes-agent/pull/7079))
- Gateway interrupt detection resilient to monitor task failures ([#8208](https://github.com/NousResearch/hermes-agent/pull/8208))
- Prevent unwanted session auto-reset after graceful gateway restarts ([#8299](https://github.com/NousResearch/hermes-agent/pull/8299))
- Prevent duplicate update prompt spam in gateway watcher ([#8343](https://github.com/NousResearch/hermes-agent/pull/8343))
- Deduplicate reasoning items in Responses API input ([#7946](https://github.com/NousResearch/hermes-agent/pull/7946))
-
-### Infrastructure
- **Multi-arch Docker image** — amd64 + arm64 ([#6124](https://github.com/NousResearch/hermes-agent/pull/6124))
- **Docker runs as non-root user** with virtualenv — @benbarclay contributing ([#8226](https://github.com/NousResearch/hermes-agent/pull/8226))
- **Use `uv`** for Docker dependency resolution to fix resolution-too-deep ([#6965](https://github.com/NousResearch/hermes-agent/pull/6965))
- **Container-aware Nix CLI** — auto-route into managed container — @alt-glitch ([#7543](https://github.com/NousResearch/hermes-agent/pull/7543))
- **Nix shared-state permission model** for interactive CLI users — @alt-glitch ([#6796](https://github.com/NousResearch/hermes-agent/pull/6796))
- **Per-profile subprocess HOME isolation** ([#7357](https://github.com/NousResearch/hermes-agent/pull/7357))
- Profile paths fixed in Docker — profiles go to mounted volume ([#7170](https://github.com/NousResearch/hermes-agent/pull/7170))
- Docker container gateway pathway hardened ([#8614](https://github.com/NousResearch/hermes-agent/pull/8614))
- Enable unbuffered stdout for live Docker logs ([#6749](https://github.com/NousResearch/hermes-agent/pull/6749))
- Install procps in Docker image — @HiddenPuppy ([#7032](https://github.com/NousResearch/hermes-agent/pull/7032))
- Shallow git clone for faster installation — @sosyz ([#8396](https://github.com/NousResearch/hermes-agent/pull/8396))
- `hermes update` always resets on stash conflict ([#7010](https://github.com/NousResearch/hermes-agent/pull/7010))
- Write update exit code before gateway restart (cgroup kill race) ([#8288](https://github.com/NousResearch/hermes-agent/pull/8288))
- Nix: `setupSecrets` optional, tirith runtime dep — @devorun, @ethernet8023 ([#6261](https://github.com/NousResearch/hermes-agent/pull/6261), [#6721](https://github.com/NousResearch/hermes-agent/pull/6721))
- launchd stop uses `bootout` so `KeepAlive` doesn't respawn ([#7119](https://github.com/NousResearch/hermes-agent/pull/7119))
-
---
-
-## 🐛 Notable Bug Fixes
-
- Fix: `/model` switch not persisting across gateway messages ([#7081](https://github.com/NousResearch/hermes-agent/pull/7081))
- Fix: session-scoped gateway model overrides ignored — @Hygaard ([#7662](https://github.com/NousResearch/hermes-agent/pull/7662))
- Fix: compaction model context length ignoring config — 3 related issues ([#8258](https://github.com/NousResearch/hermes-agent/pull/8258), [#8107](https://github.com/NousResearch/hermes-agent/pull/8107))
- Fix: OpenCode.ai context window resolved to 128K instead of 1M ([#6472](https://github.com/NousResearch/hermes-agent/pull/6472))
- Fix: Codex fallback auth-store lookup — @cherifya ([#6462](https://github.com/NousResearch/hermes-agent/pull/6462))
- Fix: duplicate completion notifications when process killed ([#7124](https://github.com/NousResearch/hermes-agent/pull/7124))
- Fix: agent daemon thread prevents orphan CLI processes on tab close ([#8557](https://github.com/NousResearch/hermes-agent/pull/8557))
- Fix: stale image attachment on text paste and voice input ([#7077](https://github.com/NousResearch/hermes-agent/pull/7077))
- Fix: DM thread session seeding causing cross-thread contamination ([#7084](https://github.com/NousResearch/hermes-agent/pull/7084))
- Fix: OpenClaw migration shows dry-run preview before executing ([#6769](https://github.com/NousResearch/hermes-agent/pull/6769))
- Fix: auth errors misclassified as retryable — @kuishou68 ([#7027](https://github.com/NousResearch/hermes-agent/pull/7027))
- Fix: Copilot-Integration-Id header missing ([#7083](https://github.com/NousResearch/hermes-agent/pull/7083))
- Fix: ACP session capabilities — @luyao618 ([#6985](https://github.com/NousResearch/hermes-agent/pull/6985))
- Fix: ACP PromptResponse usage from top-level fields ([#7086](https://github.com/NousResearch/hermes-agent/pull/7086))
- Fix: several failing/flaky tests on main — @dsocolobsky ([#6777](https://github.com/NousResearch/hermes-agent/pull/6777))
- Fix: backup marker filenames — @sprmn24 ([#8600](https://github.com/NousResearch/hermes-agent/pull/8600))
- Fix: `NoneType` in fast_mode check — @0xbyt4 ([#7350](https://github.com/NousResearch/hermes-agent/pull/7350))
- Fix: missing imports in uninstall.py — @JiayuuWang ([#7034](https://github.com/NousResearch/hermes-agent/pull/7034))
-
---
-
-## 📚 Documentation
-
- Platform adapter developer guide + WeCom Callback docs ([#7969](https://github.com/NousResearch/hermes-agent/pull/7969))
- Cron troubleshooting guide ([#7122](https://github.com/NousResearch/hermes-agent/pull/7122))
- Streaming timeout auto-detection for local LLMs ([#6990](https://github.com/NousResearch/hermes-agent/pull/6990))
- Tool-use enforcement documentation expanded ([#7984](https://github.com/NousResearch/hermes-agent/pull/7984))
- BlueBubbles pairing instructions ([#6548](https://github.com/NousResearch/hermes-agent/pull/6548))
- Telegram proxy support section ([#6348](https://github.com/NousResearch/hermes-agent/pull/6348))
- `hermes dump` and `hermes logs` CLI reference ([#6552](https://github.com/NousResearch/hermes-agent/pull/6552))
- `tool_progress_overrides` configuration reference ([#6364](https://github.com/NousResearch/hermes-agent/pull/6364))
- Compression model context length warning docs ([#7879](https://github.com/NousResearch/hermes-agent/pull/7879))
-
---
-
-## 👥 Contributors
-
-**269 merged PRs** from **24 contributors** across **487 commits**.
-
-### Community Contributors
- **@alt-glitch** (6 PRs) — Nix container-aware CLI, shared-state permissions, Matrix SQLite crypto store, bulk SSH/Modal file sync, Matrix mautrix compat
- **@SHL0MS** (2 PRs) — Creative divergence strategies skill, creative ideation skill
- **@sprmn24** (2 PRs) — Error classifier disambiguation, backup marker fix
- **@nicoloboschi** — Hindsight memory plugin feature parity
- **@Hygaard** — Session-scoped gateway model override fix
- **@jarvis-phw** — Discord allowed_channels whitelist
- **@Kathie-yu** — Honcho initOnSessionStart for tools mode
- **@hermes-agent-dhabibi** — Discord forum channel topic inheritance
- **@kira-ariaki** — Discord .log attachments and size limit
- **@cherifya** — Codex fallback auth-store lookup
- **@Cafexss** — Security: auth for session continuation
- **@KUSH42** — Compaction context_length fix
- **@kuishou68** — Auth error retryable classification fix
- **@luyao618** — ACP session capabilities
- **@ygd58** — HERMES_HOME_MODE env var override
- **@0xbyt4** — Fast mode NoneType fix
- **@JiayuuWang** — CLI uninstall import fix
- **@HiddenPuppy** — Docker procps installation
- **@dsocolobsky** — Test suite fixes
- **@bobashopcashier** (1 PR) — Graceful gateway drain before restart (salvaged into #7503 from #7290)
- **@benbarclay** — Docker image tag simplification
- **@sosyz** — Shallow git clone for faster install
- **@devorun** — Nix setupSecrets optional
- **@ethernet8023** — Nix tirith runtime dep
-
---
-
-**Full Changelog**: [v2026.4.8...v2026.4.13](https://github.com/NousResearch/hermes-agent/compare/v2026.4.8...v2026.4.13)
--- a/SECURITY.md
+++ b/SECURITY.md
@@ -1,84 +0,0 @@
-# Hermes Agent Security Policy
-
-This document outlines the security protocols, trust model, and deployment hardening guidelines for the **Hermes Agent** project.
-
-## 1. Vulnerability Reporting
-
-Hermes Agent does **not** operate a bug bounty program. Security issues should be reported via [GitHub Security Advisories (GHSA)](https://github.com/NousResearch/hermes-agent/security/advisories/new) or by emailing **security@nousresearch.com**. Do not open public issues for security vulnerabilities.
-
-### Required Submission Details
- **Title & Severity:** Concise description and CVSS score/rating.
- **Affected Component:** Exact file path and line range (e.g., `tools/approval.py:120-145`).
- **Environment:** Output of `hermes version`, commit SHA, OS, and Python version.
- **Reproduction:** Step-by-step Proof-of-Concept (PoC) against `main` or the latest release.
- **Impact:** Explanation of what trust boundary was crossed.
-
---
-
-## 2. Trust Model
-
-The core assumption is that Hermes is a **personal agent** with one trusted operator.
-
-### Operator & Session Trust
- **Single Tenant:** The system protects the operator from LLM actions, not from malicious co-tenants. Multi-user isolation must happen at the OS/host level.
- **Gateway Security:** Authorized callers (Telegram, Discord, Slack, etc.) receive equal trust. Session keys are used for routing, not as authorization boundaries.
- **Execution:** Defaults to `terminal.backend: local` (direct host execution). Container isolation (Docker, Modal, Daytona) is opt-in for sandboxing.
-
-### Dangerous Command Approval
-The approval system (`tools/approval.py`) is a core security boundary. Terminal commands, file operations, and other potentially destructive actions are gated behind explicit user confirmation before execution. The approval mode is configurable via `approvals.mode` in `config.yaml`:
- `"on"` (default) — prompts the user to approve dangerous commands.
- `"auto"` — auto-approves after a configurable delay.
- `"off"` — disables the gate entirely (break-glass; see Section 3).
-
-### Output Redaction
-`agent/redact.py` strips secret-like patterns (API keys, tokens, credentials) from all display output before it reaches the terminal or gateway platform. This prevents accidental credential leakage in chat logs, tool previews, and response text. Redaction operates on the display layer only — underlying values remain intact for internal agent operations.
-
-### Skills vs. MCP Servers
- **Installed Skills:** High trust. Equivalent to local host code; skills can read environment variables and run arbitrary commands.
- **MCP Servers:** Lower trust. MCP subprocesses receive a filtered environment (`_build_safe_env()` in `tools/mcp_tool.py`) — only safe baseline variables (`PATH`, `HOME`, `XDG_*`) plus variables explicitly declared in the server's `env` config block are passed through. Host credentials are stripped by default. Additionally, packages invoked via `npx`/`uvx` are checked against the OSV malware database before spawning.
-
-### Code Execution Sandbox
-The `execute_code` tool (`tools/code_execution_tool.py`) runs LLM-generated Python scripts in a child process with API keys and tokens stripped from the environment to prevent credential exfiltration. Only environment variables explicitly declared by loaded skills (via `env_passthrough`) or by the user in `config.yaml` (`terminal.env_passthrough`) are passed through. The child accesses Hermes tools via RPC, not direct API calls.
-
-### Subagents
- **No recursive delegation:** The `delegate_task` tool is disabled for child agents.
- **Depth limit:** `MAX_DEPTH = 2` — parent (depth 0) can spawn a child (depth 1); grandchildren are rejected.
- **Memory isolation:** Subagents run with `skip_memory=True` and do not have access to the parent's persistent memory provider. The parent receives only the task prompt and final response as an observation.
-
---
-
-## 3. Out of Scope (Non-Vulnerabilities)
-
-The following scenarios are **not** considered security breaches:
- **Prompt Injection:** Unless it results in a concrete bypass of the approval system, toolset restrictions, or container sandbox.
- **Public Exposure:** Deploying the gateway to the public internet without external authentication or network protection.
- **Trusted State Access:** Reports that require pre-existing write access to `~/.hermes/`, `.env`, or `config.yaml` (these are operator-owned files).
- **Default Behavior:** Host-level command execution when `terminal.backend` is set to `local` — this is the documented default, not a vulnerability.
- **Configuration Trade-offs:** Intentional break-glass settings such as `approvals.mode: "off"` or `terminal.backend: local` in production.
- **Tool-level read/access restrictions:** The agent has unrestricted shell access via the `terminal` tool by design. Reports that a specific tool (e.g., `read_file`) can access a resource are not vulnerabilities if the same access is available through `terminal`. Tool-level deny lists only constitute a meaningful security boundary when paired with equivalent restrictions on the terminal side (as with write operations, where `WRITE_DENIED_PATHS` is paired with the dangerous command approval system).
-
---
-
-## 4. Deployment Hardening & Best Practices
-
-### Filesystem & Network
- **Production sandboxing:** Use container backends (`docker`, `modal`, `daytona`) instead of `local` for untrusted workloads.
- **File permissions:** Run as non-root (the Docker image uses UID 10000); protect credentials with `chmod 600 ~/.hermes/.env` on local installs.
- **Network exposure:** Do not expose the gateway or API server to the public internet without VPN, Tailscale, or firewall protection. SSRF protection is enabled by default across all gateway platform adapters (Telegram, Discord, Slack, Matrix, Mattermost, etc.) with redirect validation. Note: the local terminal backend does not apply SSRF filtering, as it operates within the trusted operator's environment.
-
-### Skills & Supply Chain
- **Skill installation:** Review Skills Guard reports (`tools/skills_guard.py`) before installing third-party skills. The audit log at `~/.hermes/skills/.hub/audit.log` tracks every install and removal.
- **MCP safety:** OSV malware checking runs automatically for `npx`/`uvx` packages before MCP server processes are spawned.
- **CI/CD:** GitHub Actions are pinned to full commit SHAs. The `supply-chain-audit.yml` workflow blocks PRs containing `.pth` files or suspicious `base64`+`exec` patterns.
-
-### Credential Storage
- API keys and tokens belong exclusively in `~/.hermes/.env` — never in `config.yaml` or checked into version control.
- The credential pool system (`agent/credential_pool.py`) handles key rotation and fallback. Credentials are resolved from environment variables, not stored in plaintext databases.
-
---
-
-## 5. Disclosure Process
-
- **Coordinated Disclosure:** 90-day window or until a fix is released, whichever comes first.
- **Communication:** All updates occur via the GHSA thread or email correspondence with security@nousresearch.com.
- **Credits:** Reporters are credited in release notes unless anonymity is requested.
--- a/acp_registry/agent.json
+++ b/acp_registry/agent.json
@@ -1,12 +0,0 @@
-{
-  "schema_version": 1,
-  "name": "hermes-agent",
-  "display_name": "Hermes Agent",
-  "description": "AI agent by Nous Research with 90+ tools, persistent memory, and multi-platform support",
-  "icon": "icon.svg",
-  "distribution": {
-    "type": "command",
-    "command": "hermes",
-    "args": ["acp"]
-  }
-}
--- a/acp_registry/icon.svg
+++ b/acp_registry/icon.svg
@@ -1,25 +0,0 @@
-<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 64 64" width="64" height="64">
-  <defs>
-    <linearGradient id="gold" x1="0%" y1="0%" x2="0%" y2="100%">
-      <stop offset="0%" style="stop-color:#F5C542;stop-opacity:1" />
-      <stop offset="100%" style="stop-color:#D4961C;stop-opacity:1" />
-    </linearGradient>
-  </defs>
-  <!-- Staff -->
-  <rect x="30" y="10" width="4" height="46" rx="2" fill="url(#gold)" />
-  <!-- Wings (left) -->
-  <path d="M30 18 C24 14, 14 14, 10 18 C14 16, 22 16, 28 20" fill="#F5C542" opacity="0.9" />
-  <path d="M30 22 C26 19, 18 19, 14 22 C18 20, 24 20, 28 24" fill="#D4961C" opacity="0.8" />
-  <!-- Wings (right) -->
-  <path d="M34 18 C40 14, 50 14, 54 18 C50 16, 42 16, 36 20" fill="#F5C542" opacity="0.9" />
-  <path d="M34 22 C38 19, 46 19, 50 22 C46 20, 40 20, 36 24" fill="#D4961C" opacity="0.8" />
-  <!-- Left serpent -->
-  <path d="M32 48 C22 44, 20 38, 26 34 C20 36, 18 42, 24 46 C18 40, 22 30, 30 28 C24 32, 22 38, 28 42"
-        fill="none" stroke="#F5C542" stroke-width="2.5" stroke-linecap="round" />
-  <!-- Right serpent -->
-  <path d="M32 48 C42 44, 44 38, 38 34 C44 36, 46 42, 40 46 C46 40, 42 30, 34 28 C40 32, 42 38, 36 42"
-        fill="none" stroke="#D4961C" stroke-width="2.5" stroke-linecap="round" />
-  <!-- Orb at top -->
-  <circle cx="32" cy="10" r="4" fill="#F5C542" />
-  <circle cx="32" cy="10" r="2" fill="#FFF8E1" opacity="0.7" />
-</svg>
--- a/hermes_agent/agent/init.py
+++ b/hermes_agent/agent/init.py
--- a/agent/auxiliary_client.py
+++ b/agent/auxiliary_client.py
@@ -0,0 +1,596 @@
+"""Shared auxiliary OpenAI client for cheap/fast side tasks.
+
+Provides a single resolution chain so every consumer (context compression,
+session search, web extraction, vision analysis, browser vision) picks up
+the best available backend without duplicating fallback logic.
+
+Resolution order for text tasks (auto mode):
+  1. OpenRouter  (OPENROUTER_API_KEY)
+  2. Nous Portal (~/.hermes/auth.json active provider)
+  3. Custom endpoint (OPENAI_BASE_URL + OPENAI_API_KEY)
+  4. Codex OAuth (Responses API via chatgpt.com with gpt-5.3-codex,
+     wrapped to look like a chat.completions client)
+  5. Direct API-key providers (z.ai/GLM, Kimi/Moonshot, MiniMax, MiniMax-CN)
+     — checked via PROVIDER_REGISTRY entries with auth_type='api_key'
+  6. None
+
+Resolution order for vision/multimodal tasks (auto mode):
+  1. OpenRouter
+  2. Nous Portal
+  3. Codex OAuth, else None  (custom and API-key providers may lack vision)
+
+Per-task provider overrides (e.g. AUXILIARY_VISION_PROVIDER,
+CONTEXT_COMPRESSION_PROVIDER) can force a specific provider for each task:
+"openrouter", "nous", "codex", or "main" (= steps 3-5).
+Default "auto" follows the chains above.
+
+Per-task model overrides (e.g. AUXILIARY_VISION_MODEL,
+AUXILIARY_WEB_EXTRACT_MODEL) let callers use a different model slug
+than the provider's default.
+"""
+
+import json
+import logging
+import os
+from pathlib import Path
+from types import SimpleNamespace
+from typing import Any, Dict, List, Optional, Tuple
+
+from openai import OpenAI
+
+from hermes_constants import OPENROUTER_BASE_URL
+
+logger = logging.getLogger(__name__)
+
+# Default auxiliary models for direct API-key providers (cheap/fast for side tasks)
+_API_KEY_PROVIDER_AUX_MODELS: Dict[str, str] = {
+    "zai": "glm-4.5-flash",
+    "kimi-coding": "kimi-k2-turbo-preview",
+    "minimax": "MiniMax-M2.5-highspeed",
+    "minimax-cn": "MiniMax-M2.5-highspeed",
+}
+
+# OpenRouter app attribution headers
+_OR_HEADERS = {
+    "HTTP-Referer": "https://github.com/NousResearch/hermes-agent",
+    "X-OpenRouter-Title": "Hermes Agent",
+    "X-OpenRouter-Categories": "productivity,cli-agent",
+}
+
+# Nous Portal extra_body for product attribution.
+# Callers should pass this as extra_body in chat.completions.create()
+# when the auxiliary client is backed by Nous Portal.
+NOUS_EXTRA_BODY = {"tags": ["product=hermes-agent"]}
+
+# Set at resolve time — True if the auxiliary client points to Nous Portal
+auxiliary_is_nous: bool = False
+
+# Default auxiliary models per provider
+_OPENROUTER_MODEL = "google/gemini-3-flash-preview"
+_NOUS_MODEL = "gemini-3-flash"
+_NOUS_DEFAULT_BASE_URL = "https://inference-api.nousresearch.com/v1"
+_AUTH_JSON_PATH = Path.home() / ".hermes" / "auth.json"
+
+# Codex fallback: uses the Responses API (the only endpoint the Codex
+# OAuth token can access) with a fast model for auxiliary tasks.
+_CODEX_AUX_MODEL = "gpt-5.3-codex"
+_CODEX_AUX_BASE_URL = "https://chatgpt.com/backend-api/codex"
+
+
+# ── Codex Responses → chat.completions adapter ─────────────────────────────
+# All auxiliary consumers call client.chat.completions.create(**kwargs) and
+# read response.choices[0].message.content. This adapter translates those
+# calls to the Codex Responses API so callers don't need any changes.
+
+
+def _convert_content_for_responses(content: Any) -> Any:
+    """Convert chat.completions content to Responses API format.
+
+    chat.completions uses:
+      {"type": "text", "text": "..."}
+      {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
+
+    Responses API uses:
+      {"type": "input_text", "text": "..."}
+      {"type": "input_image", "image_url": "data:image/png;base64,..."}
+
+    If content is a plain string, it's returned as-is (the Responses API
+    accepts strings directly for text-only messages).
+    """
+    if isinstance(content, str):
+        return content
+    if not isinstance(content, list):
+        return str(content) if content else ""
+
+    converted: List[Dict[str, Any]] = []
+    for part in content:
+        if not isinstance(part, dict):
+            continue
+        ptype = part.get("type", "")
+        if ptype == "text":
+            converted.append({"type": "input_text", "text": part.get("text", "")})
+        elif ptype == "image_url":
+            # chat.completions nests the URL: {"image_url": {"url": "..."}}
+            image_data = part.get("image_url", {})
+            url = image_data.get("url", "") if isinstance(image_data, dict) else str(image_data)
+            entry: Dict[str, Any] = {"type": "input_image", "image_url": url}
+            # Preserve detail if specified
+            detail = image_data.get("detail") if isinstance(image_data, dict) else None
+            if detail:
+                entry["detail"] = detail
+            converted.append(entry)
+        elif ptype in ("input_text", "input_image"):
+            # Already in Responses format — pass through
+            converted.append(part)
+        else:
+            # Unknown content type — try to preserve as text
+            text = part.get("text", "")
+            if text:
+                converted.append({"type": "input_text", "text": text})
+
+    return converted or ""
+
+
+class _CodexCompletionsAdapter:
+    """Drop-in shim that accepts chat.completions.create() kwargs and
+    routes them through the Codex Responses streaming API."""
+
+    def __init__(self, real_client: OpenAI, model: str):
+        self._client = real_client
+        self._model = model
+
+    def create(self, **kwargs) -> Any:
+        messages = kwargs.get("messages", [])
+        model = kwargs.get("model", self._model)
+        temperature = kwargs.get("temperature")
+
+        # Separate system/instructions from conversation messages.
+        # Convert chat.completions multimodal content blocks to Responses
+        # API format (input_text / input_image instead of text / image_url).
+        instructions = "You are a helpful assistant."
+        input_msgs: List[Dict[str, Any]] = []
+        for msg in messages:
+            role = msg.get("role", "user")
+            content = msg.get("content") or ""
+            if role == "system":
+                instructions = content if isinstance(content, str) else str(content)
+            else:
+                input_msgs.append({
+                    "role": role,
+                    "content": _convert_content_for_responses(content),
+                })
+
+        resp_kwargs: Dict[str, Any] = {
+            "model": model,
+            "instructions": instructions,
+            "input": input_msgs or [{"role": "user", "content": ""}],
+            "store": False,
+        }
+
+        # Note: the Codex endpoint (chatgpt.com/backend-api/codex) does NOT
+        # support max_output_tokens or temperature — omit to avoid 400 errors.
+
+        # Tools support for flush_memories and similar callers
+        tools = kwargs.get("tools")
+        if tools:
+            converted = []
+            for t in tools:
+                fn = t.get("function", {}) if isinstance(t, dict) else {}
+                name = fn.get("name")
+                if not name:
+                    continue
+                converted.append({
+                    "type": "function",
+                    "name": name,
+                    "description": fn.get("description", ""),
+                    "parameters": fn.get("parameters", {}),
+                })
+            if converted:
+                resp_kwargs["tools"] = converted
+
+        # Stream and collect the response
+        text_parts: List[str] = []
+        tool_calls_raw: List[Any] = []
+        usage = None
+
+        try:
+            with self._client.responses.stream(**resp_kwargs) as stream:
+                for _event in stream:
+                    pass
+                final = stream.get_final_response()
+
+            # Extract text and tool calls from the Responses output
+            for item in getattr(final, "output", []):
+                item_type = getattr(item, "type", None)
+                if item_type == "message":
+                    for part in getattr(item, "content", []):
+                        ptype = getattr(part, "type", None)
+                        if ptype in ("output_text", "text"):
+                            text_parts.append(getattr(part, "text", ""))
+                elif item_type == "function_call":
+                    tool_calls_raw.append(SimpleNamespace(
+                        id=getattr(item, "call_id", ""),
+                        type="function",
+                        function=SimpleNamespace(
+                            name=getattr(item, "name", ""),
+                            arguments=getattr(item, "arguments", "{}"),
+                        ),
+                    ))
+
+            resp_usage = getattr(final, "usage", None)
+            if resp_usage:
+                usage = SimpleNamespace(
+                    prompt_tokens=getattr(resp_usage, "input_tokens", 0),
+                    completion_tokens=getattr(resp_usage, "output_tokens", 0),
+                    total_tokens=getattr(resp_usage, "total_tokens", 0),
+                )
+        except Exception as exc:
+            logger.debug("Codex auxiliary Responses API call failed: %s", exc)
+            raise
+
+        content = "".join(text_parts).strip() or None
+
+        # Build a response that looks like chat.completions
+        message = SimpleNamespace(
+            role="assistant",
+            content=content,
+            tool_calls=tool_calls_raw or None,
+        )
+        choice = SimpleNamespace(
+            index=0,
+            message=message,
+            finish_reason="stop" if not tool_calls_raw else "tool_calls",
+        )
+        return SimpleNamespace(
+            choices=[choice],
+            model=model,
+            usage=usage,
+        )
+
+
+class _CodexChatShim:
+    """Wraps the adapter to provide client.chat.completions.create()."""
+
+    def __init__(self, adapter: _CodexCompletionsAdapter):
+        self.completions = adapter
+
+
+class CodexAuxiliaryClient:
+    """OpenAI-client-compatible wrapper that routes through Codex Responses API.
+
+    Consumers can call client.chat.completions.create(**kwargs) as normal.
+    Also exposes .api_key and .base_url for introspection by async wrappers.
+    """
+
+    def __init__(self, real_client: OpenAI, model: str):
+        self._real_client = real_client
+        adapter = _CodexCompletionsAdapter(real_client, model)
+        self.chat = _CodexChatShim(adapter)
+        self.api_key = real_client.api_key
+        self.base_url = real_client.base_url
+
+    def close(self):
+        self._real_client.close()
+
+
+class _AsyncCodexCompletionsAdapter:
+    """Async version of the Codex Responses adapter.
+
+    Wraps the sync adapter via asyncio.to_thread() so async consumers
+    (web_tools, session_search) can await it as normal.
+    """
+
+    def __init__(self, sync_adapter: _CodexCompletionsAdapter):
+        self._sync = sync_adapter
+
+    async def create(self, **kwargs) -> Any:
+        import asyncio
+        return await asyncio.to_thread(self._sync.create, **kwargs)
+
+
+class _AsyncCodexChatShim:
+    def __init__(self, adapter: _AsyncCodexCompletionsAdapter):
+        self.completions = adapter
+
+
+class AsyncCodexAuxiliaryClient:
+    """Async-compatible wrapper matching AsyncOpenAI.chat.completions.create()."""
+
+    def __init__(self, sync_wrapper: "CodexAuxiliaryClient"):
+        sync_adapter = sync_wrapper.chat.completions
+        async_adapter = _AsyncCodexCompletionsAdapter(sync_adapter)
+        self.chat = _AsyncCodexChatShim(async_adapter)
+        self.api_key = sync_wrapper.api_key
+        self.base_url = sync_wrapper.base_url
+
+
+def _read_nous_auth() -> Optional[dict]:
+    """Read and validate ~/.hermes/auth.json for an active Nous provider.
+
+    Returns the provider state dict if Nous is active with tokens,
+    otherwise None.
+    """
+    try:
+        if not _AUTH_JSON_PATH.is_file():
+            return None
+        data = json.loads(_AUTH_JSON_PATH.read_text())
+        if data.get("active_provider") != "nous":
+            return None
+        provider = data.get("providers", {}).get("nous", {})
+        # Must have at least an access_token or agent_key
+        if not provider.get("agent_key") and not provider.get("access_token"):
+            return None
+        return provider
+    except Exception as exc:
+        logger.debug("Could not read Nous auth: %s", exc)
+        return None
+
+
+def _nous_api_key(provider: dict) -> str:
+    """Extract the best API key from a Nous provider state dict."""
+    return provider.get("agent_key") or provider.get("access_token", "")
+
+
+def _nous_base_url() -> str:
+    """Resolve the Nous inference base URL from env or default."""
+    return os.getenv("NOUS_INFERENCE_BASE_URL", _NOUS_DEFAULT_BASE_URL)
+
+
+def _read_codex_access_token() -> Optional[str]:
+    """Read a valid Codex OAuth access token from Hermes auth store (~/.hermes/auth.json)."""
+    try:
+        from hermes_cli.auth import _read_codex_tokens
+        data = _read_codex_tokens()
+        tokens = data.get("tokens", {})
+        access_token = tokens.get("access_token")
+        if isinstance(access_token, str) and access_token.strip():
+            return access_token.strip()
+        return None
+    except Exception as exc:
+        logger.debug("Could not read Codex auth for auxiliary client: %s", exc)
+        return None
+
+
+def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
+    """Try each API-key provider in PROVIDER_REGISTRY order.
+
+    Returns (client, model) for the first provider whose env var is set,
+    or (None, None) if none are configured.
+    """
+    try:
+        from hermes_cli.auth import PROVIDER_REGISTRY
+    except ImportError:
+        logger.debug("Could not import PROVIDER_REGISTRY for API-key fallback")
+        return None, None
+
+    for provider_id, pconfig in PROVIDER_REGISTRY.items():
+        if pconfig.auth_type != "api_key":
+            continue
+        # Check if any of the provider's env vars are set
+        api_key = ""
+        for env_var in pconfig.api_key_env_vars:
+            val = os.getenv(env_var, "").strip()
+            if val:
+                api_key = val
+                break
+        if not api_key:
+            continue
+        # Resolve base URL (with optional env-var override)
+        # Kimi Code keys (sk-kimi-) need api.kimi.com/coding/v1
+        env_url = ""
+        if pconfig.base_url_env_var:
+            env_url = os.getenv(pconfig.base_url_env_var, "").strip()
+        if env_url:
+            base_url = env_url.rstrip("/")
+        elif provider_id == "kimi-coding" and api_key.startswith("sk-kimi-"):
+            base_url = "https://api.kimi.com/coding/v1"
+        else:
+            base_url = pconfig.inference_base_url
+        model = _API_KEY_PROVIDER_AUX_MODELS.get(provider_id, "default")
+        logger.debug("Auxiliary text client: %s (%s)", pconfig.name, model)
+        extra = {}
+        if "api.kimi.com" in base_url.lower():
+            extra["default_headers"] = {"User-Agent": "KimiCLI/1.0"}
+        return OpenAI(api_key=api_key, base_url=base_url, **extra), model
+
+    return None, None
+
+
+# ── Provider resolution helpers ─────────────────────────────────────────────
+
+def _get_auxiliary_provider(task: str = "") -> str:
+    """Read the provider override for a specific auxiliary task.
+
+    Checks AUXILIARY_{TASK}_PROVIDER first (e.g. AUXILIARY_VISION_PROVIDER),
+    then CONTEXT_{TASK}_PROVIDER (for the compression section's summary_provider),
+    then falls back to "auto".  Expected values: "auto", "openrouter", "nous", "codex", "main".
+    """
+    if task:
+        for prefix in ("AUXILIARY_", "CONTEXT_"):
+            val = os.getenv(f"{prefix}{task.upper()}_PROVIDER", "").strip().lower()
+            if val and val != "auto":
+                return val
+    return "auto"
+
+
+def _try_openrouter() -> Tuple[Optional[OpenAI], Optional[str]]:
+    or_key = os.getenv("OPENROUTER_API_KEY")
+    if not or_key:
+        return None, None
+    logger.debug("Auxiliary client: OpenRouter")
+    return OpenAI(api_key=or_key, base_url=OPENROUTER_BASE_URL,
+                   default_headers=_OR_HEADERS), _OPENROUTER_MODEL
+
+
+def _try_nous() -> Tuple[Optional[OpenAI], Optional[str]]:
+    nous = _read_nous_auth()
+    if not nous:
+        return None, None
+    global auxiliary_is_nous
+    auxiliary_is_nous = True
+    logger.debug("Auxiliary client: Nous Portal")
+    return (
+        OpenAI(api_key=_nous_api_key(nous), base_url=_nous_base_url()),
+        _NOUS_MODEL,
+    )
+
+
+def _try_custom_endpoint() -> Tuple[Optional[OpenAI], Optional[str]]:
+    custom_base = os.getenv("OPENAI_BASE_URL")
+    custom_key = os.getenv("OPENAI_API_KEY")
+    if not custom_base or not custom_key:
+        return None, None
+    model = os.getenv("OPENAI_MODEL") or os.getenv("LLM_MODEL") or "gpt-4o-mini"
+    logger.debug("Auxiliary client: custom endpoint (%s)", model)
+    return OpenAI(api_key=custom_key, base_url=custom_base), model
+
+
+def _try_codex() -> Tuple[Optional[Any], Optional[str]]:
+    codex_token = _read_codex_access_token()
+    if not codex_token:
+        return None, None
+    logger.debug("Auxiliary client: Codex OAuth (%s via Responses API)", _CODEX_AUX_MODEL)
+    real_client = OpenAI(api_key=codex_token, base_url=_CODEX_AUX_BASE_URL)
+    return CodexAuxiliaryClient(real_client, _CODEX_AUX_MODEL), _CODEX_AUX_MODEL
+
+
+def _resolve_forced_provider(forced: str) -> Tuple[Optional[OpenAI], Optional[str]]:
+    """Resolve a specific forced provider.  Returns (None, None) if creds missing."""
+    if forced == "openrouter":
+        client, model = _try_openrouter()
+        if client is None:
+            logger.warning("auxiliary.provider=openrouter but OPENROUTER_API_KEY not set")
+        return client, model
+
+    if forced == "nous":
+        client, model = _try_nous()
+        if client is None:
+            logger.warning("auxiliary.provider=nous but Nous Portal not configured (run: hermes login)")
+        return client, model
+
+    if forced == "codex":
+        client, model = _try_codex()
+        if client is None:
+            logger.warning("auxiliary.provider=codex but no Codex OAuth token found (run: hermes model)")
+        return client, model
+
+    if forced == "main":
+        # "main" = skip OpenRouter/Nous, use the main chat model's credentials.
+        for try_fn in (_try_custom_endpoint, _try_codex, _resolve_api_key_provider):
+            client, model = try_fn()
+            if client is not None:
+                return client, model
+        logger.warning("auxiliary.provider=main but no main endpoint credentials found")
+        return None, None
+
+    # Unknown provider name: warn and return no client (callers do not retry auto)
+    logger.warning("Unknown auxiliary.provider=%r, no auxiliary client resolved", forced)
+    return None, None
+
+
+def _resolve_auto() -> Tuple[Optional[OpenAI], Optional[str]]:
+    """Full auto-detection chain: OpenRouter → Nous → custom → Codex → API-key → None."""
+    for try_fn in (_try_openrouter, _try_nous, _try_custom_endpoint,
+                   _try_codex, _resolve_api_key_provider):
+        client, model = try_fn()
+        if client is not None:
+            return client, model
+    logger.debug("Auxiliary client: none available")
+    return None, None
+
+
+# ── Public API ──────────────────────────────────────────────────────────────
+
+def get_text_auxiliary_client(task: str = "") -> Tuple[Optional[OpenAI], Optional[str]]:
+    """Return (client, default_model_slug) for text-only auxiliary tasks.
+
+    Args:
+        task: Optional task name ("compression", "web_extract") to check
+              for a task-specific provider override.
+
+    Callers may override the returned model with a per-task env var
+    (e.g. CONTEXT_COMPRESSION_MODEL, AUXILIARY_WEB_EXTRACT_MODEL).
+    """
+    forced = _get_auxiliary_provider(task)
+    if forced != "auto":
+        return _resolve_forced_provider(forced)
+    return _resolve_auto()
+
+
+def get_async_text_auxiliary_client(task: str = ""):
+    """Return (async_client, model_slug) for async consumers.
+
+    For standard providers returns (AsyncOpenAI, model). For Codex returns
+    (AsyncCodexAuxiliaryClient, model) which wraps the Responses API.
+    Returns (None, None) when no provider is available.
+    """
+    from openai import AsyncOpenAI
+
+    sync_client, model = get_text_auxiliary_client(task)
+    if sync_client is None:
+        return None, None
+
+    if isinstance(sync_client, CodexAuxiliaryClient):
+        return AsyncCodexAuxiliaryClient(sync_client), model
+
+    async_kwargs = {
+        "api_key": sync_client.api_key,
+        "base_url": str(sync_client.base_url),
+    }
+    if "openrouter" in str(sync_client.base_url).lower():
+        async_kwargs["default_headers"] = dict(_OR_HEADERS)
+    elif "api.kimi.com" in str(sync_client.base_url).lower():
+        async_kwargs["default_headers"] = {"User-Agent": "KimiCLI/1.0"}
+    return AsyncOpenAI(**async_kwargs), model
+
+
+def get_vision_auxiliary_client() -> Tuple[Optional[OpenAI], Optional[str]]:
+    """Return (client, default_model_slug) for vision/multimodal auxiliary tasks.
+
+    Checks AUXILIARY_VISION_PROVIDER for a forced provider, otherwise
+    auto-detects.  Callers may override the returned model with
+    AUXILIARY_VISION_MODEL.
+
+    In auto mode, only providers known to support multimodal are tried:
+    OpenRouter, Nous Portal, and Codex OAuth (gpt-5.3-codex supports
+    vision via the Responses API).  Custom endpoints and API-key
+    providers are skipped — they may not handle vision input.  To use
+    them, set AUXILIARY_VISION_PROVIDER explicitly.
+    """
+    forced = _get_auxiliary_provider("vision")
+    if forced != "auto":
+        return _resolve_forced_provider(forced)
+    # Auto: only multimodal-capable providers
+    for try_fn in (_try_openrouter, _try_nous, _try_codex):
+        client, model = try_fn()
+        if client is not None:
+            return client, model
+    logger.debug("Auxiliary vision client: none available (auto only tries OpenRouter/Nous/Codex)")
+    return None, None
+
+
+def get_auxiliary_extra_body() -> dict:
+    """Return extra_body kwargs for auxiliary API calls.
+
+    Includes Nous Portal product tags when the auxiliary client is backed
+    by Nous Portal. Returns empty dict otherwise.
+    """
+    return dict(NOUS_EXTRA_BODY) if auxiliary_is_nous else {}
+
+
+def auxiliary_max_tokens_param(value: int) -> dict:
+    """Return the correct max tokens kwarg for the auxiliary client's provider.
+
+    OpenRouter and local models use 'max_tokens'. Direct OpenAI with newer
+    models (gpt-4o, o-series, gpt-5+) requires 'max_completion_tokens'.
+    The Codex adapter translates max_tokens internally, so we use max_tokens
+    for it as well.
+    """
+    custom_base = os.getenv("OPENAI_BASE_URL", "")
+    or_key = os.getenv("OPENROUTER_API_KEY")
+    # Only use max_completion_tokens for direct OpenAI custom endpoints
+    if (not or_key
+            and _read_nous_auth() is None
+            and "api.openai.com" in custom_base.lower()):
+        return {"max_completion_tokens": value}
+    return {"max_tokens": value}
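Reviewer note: the rule in auxiliary_max_tokens_param can be read as a pure function of three inputs. The sketch below is only an illustration under that reading, not code from this commit. The name pick_max_tokens_kwarg and its boolean parameters are hypothetical stand-ins for the OPENROUTER_API_KEY, _read_nous_auth() and OPENAI_BASE_URL checks.

```python
def pick_max_tokens_kwarg(value, base_url="", has_openrouter_key=False, has_nous_auth=False):
    """Return the token-limit kwarg for a chat-completions call.

    Hypothetical helper: mirrors the decision in the diff above, with the
    environment lookups replaced by explicit arguments.
    """
    # Direct OpenAI is only assumed when no higher-priority provider is set up.
    direct_openai = "api.openai.com" in base_url.lower()
    if direct_openai and not has_openrouter_key and not has_nous_auth:
        return {"max_completion_tokens": value}
    # OpenRouter, Nous Portal, local and other endpoints take max_tokens.
    return {"max_tokens": value}


if __name__ == "__main__":
    print(pick_max_tokens_kwarg(512, "https://api.openai.com/v1"))
    print(pick_max_tokens_kwarg(512, "https://api.openai.com/v1", has_openrouter_key=True))
```

Passing the inputs explicitly makes the precedence easy to test: an OpenRouter key or Nous auth wins over a direct OpenAI base URL.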
--- a/agent/context_compressor.py
+++ b/agent/context_compressor.py
@@ -0,0 +1,365 @@
+"""Automatic context window compression for long conversations.
+
+Self-contained class with its own OpenAI client for summarization.
+Uses a cheap/fast auxiliary model (e.g. Gemini Flash) to summarize
+middle turns while protecting head and tail context.
+"""
+
+import logging
+import os
+from typing import Any, Dict, List, Optional
+
+from agent.auxiliary_client import get_text_auxiliary_client
+from agent.model_metadata import (
+    get_model_context_length,
+    estimate_messages_tokens_rough,
+)
+
+logger = logging.getLogger(__name__)
+
+
+class ContextCompressor:
+    """Compresses conversation context when approaching the model's context limit.
+
+    Algorithm: protect first N + last N turns, summarize everything in between.
+    Token tracking uses actual counts from API responses for accuracy.
+    """
+
+    def __init__(
+        self,
+        model: str,
+        threshold_percent: float = 0.85,
+        protect_first_n: int = 3,
+        protect_last_n: int = 4,
+        summary_target_tokens: int = 2500,
+        quiet_mode: bool = False,
+        summary_model_override: Optional[str] = None,
+        base_url: str = "",
+    ):
+        self.model = model
+        self.base_url = base_url
+        self.threshold_percent = threshold_percent
+        self.protect_first_n = protect_first_n
+        self.protect_last_n = protect_last_n
+        self.summary_target_tokens = summary_target_tokens
+        self.quiet_mode = quiet_mode
+
+        self.context_length = get_model_context_length(model, base_url=base_url)
+        self.threshold_tokens = int(self.context_length * threshold_percent)
+        self.compression_count = 0
+        self._context_probed = False  # True after a step-down from context error
+
+        self.last_prompt_tokens = 0
+        self.last_completion_tokens = 0
+        self.last_total_tokens = 0
+
+        self.client, default_model = get_text_auxiliary_client("compression")
+        self.summary_model = summary_model_override or default_model
+
+    def update_from_response(self, usage: Dict[str, Any]):
+        """Update tracked token usage from API response."""
+        self.last_prompt_tokens = usage.get("prompt_tokens", 0)
+        self.last_completion_tokens = usage.get("completion_tokens", 0)
+        self.last_total_tokens = usage.get("total_tokens", 0)
+
+    def should_compress(self, prompt_tokens: Optional[int] = None) -> bool:
+        """Check if context exceeds the compression threshold."""
+        tokens = prompt_tokens if prompt_tokens is not None else self.last_prompt_tokens
+        return tokens >= self.threshold_tokens
+
+    def should_compress_preflight(self, messages: List[Dict[str, Any]]) -> bool:
+        """Quick pre-flight check using rough estimate (before API call)."""
+        rough_estimate = estimate_messages_tokens_rough(messages)
+        return rough_estimate >= self.threshold_tokens
+
+    def get_status(self) -> Dict[str, Any]:
+        """Get current compression status for display/logging."""
+        return {
+            "last_prompt_tokens": self.last_prompt_tokens,
+            "threshold_tokens": self.threshold_tokens,
+            "context_length": self.context_length,
+            "usage_percent": (self.last_prompt_tokens / self.context_length * 100) if self.context_length else 0,
+            "compression_count": self.compression_count,
+        }
+
+    def _generate_summary(self, turns_to_summarize: List[Dict[str, Any]]) -> Optional[str]:
+        """Generate a concise summary of conversation turns.
+
+        Tries the auxiliary model first, then falls back to the user's main
+        model.  Returns None if all attempts fail — the caller should drop
+        the middle turns without a summary rather than inject a useless
+        placeholder.
+        """
+        parts = []
+        for msg in turns_to_summarize:
+            role = msg.get("role", "unknown")
+            content = msg.get("content") or ""
+            if len(content) > 2000:
+                content = content[:1000] + "\n...[truncated]...\n" + content[-500:]
+            tool_calls = msg.get("tool_calls", [])
+            if tool_calls:
+                tool_names = [tc.get("function", {}).get("name", "?") for tc in tool_calls if isinstance(tc, dict)]
+                content += f"\n[Tool calls: {', '.join(tool_names)}]"
+            parts.append(f"[{role.upper()}]: {content}")
+
+        content_to_summarize = "\n\n".join(parts)
+        prompt = f"""Summarize these conversation turns concisely. This summary will replace these turns in the conversation history.
+
+Write from a neutral perspective describing:
+1. What actions were taken (tool calls, searches, file operations)
+2. Key information or results obtained
+3. Important decisions or findings
+4. Relevant data, file names, or outputs
+
+Keep factual and informative. Target ~{self.summary_target_tokens} tokens.
+
+---
+TURNS TO SUMMARIZE:
+{content_to_summarize}
+---
+
+Write only the summary, starting with "[CONTEXT SUMMARY]:" prefix."""
+
+        # 1. Try the auxiliary model (cheap/fast)
+        if self.client:
+            try:
+                return self._call_summary_model(self.client, self.summary_model, prompt)
+            except Exception as e:
+                logger.warning("Failed to generate context summary with auxiliary model: %s", e)
+
+        # 2. Fallback: try the user's main model endpoint
+        fallback_client, fallback_model = self._get_fallback_client()
+        if fallback_client is not None:
+            try:
+                logger.info("Retrying context summary with main model (%s)", fallback_model)
+                summary = self._call_summary_model(fallback_client, fallback_model, prompt)
+                self.client = fallback_client
+                self.summary_model = fallback_model
+                return summary
+            except Exception as fallback_err:
+                logger.warning("Main model summary also failed: %s", fallback_err)
+
+        # 3. All models failed — return None so the caller drops turns without a summary
+        logger.warning("Context compression: no model available for summary. Middle turns will be dropped without summary.")
+        return None
+
+    def _call_summary_model(self, client, model: str, prompt: str) -> str:
+        """Make the actual LLM call to generate a summary. Raises on failure."""
+        kwargs = {
+            "model": model,
+            "messages": [{"role": "user", "content": prompt}],
+            "temperature": 0.3,
+            "timeout": 30.0,
+        }
+        # Most providers (OpenRouter, local models) use max_tokens.
+        # Direct OpenAI with newer models (gpt-4o, o-series, gpt-5+)
+        # requires max_completion_tokens instead.
+        try:
+            kwargs["max_tokens"] = self.summary_target_tokens * 2
+            response = client.chat.completions.create(**kwargs)
+        except Exception as first_err:
+            if "max_tokens" in str(first_err) or "unsupported_parameter" in str(first_err):
+                kwargs.pop("max_tokens", None)
+                kwargs["max_completion_tokens"] = self.summary_target_tokens * 2
+                response = client.chat.completions.create(**kwargs)
+            else:
+                raise
+
+        summary = response.choices[0].message.content.strip()
+        if not summary.startswith("[CONTEXT SUMMARY]:"):
+            summary = "[CONTEXT SUMMARY]: " + summary
+        return summary
+
+    def _get_fallback_client(self):
+        """Try to build a fallback client from the main model's endpoint config.
+
+        When the primary auxiliary client fails (e.g. stale OpenRouter key), this
+        creates a client using the user's active custom endpoint (OPENAI_BASE_URL)
+        so compression can still produce a real summary instead of a static string.
+
+        Returns (client, model) or (None, None).
+        """
+        custom_base = os.getenv("OPENAI_BASE_URL")
+        custom_key = os.getenv("OPENAI_API_KEY")
+        if not custom_base or not custom_key:
+            return None, None
+
+        # Don't fall back to the same provider that just failed
+        from hermes_constants import OPENROUTER_BASE_URL
+        if custom_base.rstrip("/") == OPENROUTER_BASE_URL.rstrip("/"):
+            return None, None
+
+        model = os.getenv("LLM_MODEL") or os.getenv("OPENAI_MODEL") or self.model
+        try:
+            from openai import OpenAI as _OpenAI
+            client = _OpenAI(api_key=custom_key, base_url=custom_base)
+            logger.debug("Built fallback auxiliary client: %s via %s", model, custom_base)
+            return client, model
+        except Exception as exc:
+            logger.debug("Could not build fallback auxiliary client: %s", exc)
+            return None, None
+
+    # ------------------------------------------------------------------
+    # Tool-call / tool-result pair integrity helpers
+    # ------------------------------------------------------------------
+
+    @staticmethod
+    def _get_tool_call_id(tc) -> str:
+        """Extract the call ID from a tool_call entry (dict or SimpleNamespace)."""
+        if isinstance(tc, dict):
+            return tc.get("id", "")
+        return getattr(tc, "id", "") or ""
+
+    def _sanitize_tool_pairs(self, messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
+        """Fix orphaned tool_call / tool_result pairs after compression.
+
+        Two failure modes:
+        1. A tool *result* references a call_id whose assistant tool_call was
+           removed (summarized/truncated).  The API rejects this with
+           "No tool call found for function call output with call_id ...".
+        2. An assistant message has tool_calls whose results were dropped.
+           The API rejects this because every tool_call must be followed by
+           a tool result with the matching call_id.
+
+        This method removes orphaned results and inserts stub results for
+        orphaned calls so the message list is always well-formed.
+        """
+        surviving_call_ids: set = set()
+        for msg in messages:
+            if msg.get("role") == "assistant":
+                for tc in msg.get("tool_calls") or []:
+                    cid = self._get_tool_call_id(tc)
+                    if cid:
+                        surviving_call_ids.add(cid)
+
+        result_call_ids: set = set()
+        for msg in messages:
+            if msg.get("role") == "tool":
+                cid = msg.get("tool_call_id")
+                if cid:
+                    result_call_ids.add(cid)
+
+        # 1. Remove tool results whose call_id has no matching assistant tool_call
+        orphaned_results = result_call_ids - surviving_call_ids
+        if orphaned_results:
+            messages = [
+                m for m in messages
+                if not (m.get("role") == "tool" and m.get("tool_call_id") in orphaned_results)
+            ]
+            if not self.quiet_mode:
+                logger.info("Compression sanitizer: removed %d orphaned tool result(s)", len(orphaned_results))
+
+        # 2. Add stub results for assistant tool_calls whose results were dropped
+        missing_results = surviving_call_ids - result_call_ids
+        if missing_results:
+            patched: List[Dict[str, Any]] = []
+            for msg in messages:
+                patched.append(msg)
+                if msg.get("role") == "assistant":
+                    for tc in msg.get("tool_calls") or []:
+                        cid = self._get_tool_call_id(tc)
+                        if cid in missing_results:
+                            patched.append({
+                                "role": "tool",
+                                "content": "[Result from earlier conversation — see context summary above]",
+                                "tool_call_id": cid,
+                            })
+            messages = patched
+            if not self.quiet_mode:
+                logger.info("Compression sanitizer: added %d stub tool result(s)", len(missing_results))
+
+        return messages
+
+    def _align_boundary_forward(self, messages: List[Dict[str, Any]], idx: int) -> int:
+        """Push a compress-start boundary forward past any orphan tool results.
+
+        If ``messages[idx]`` is a tool result, slide forward until we hit a
+        non-tool message so we don't start the summarised region mid-group.
+        """
+        while idx < len(messages) and messages[idx].get("role") == "tool":
+            idx += 1
+        return idx
+
+    def _align_boundary_backward(self, messages: List[Dict[str, Any]], idx: int) -> int:
+        """Pull a compress-end boundary backward to avoid splitting a
+        tool_call / result group.
+
+        If the message just before ``idx`` is an assistant message with
+        tool_calls, its tool results start at ``idx`` and would be
+        separated from their parent.  Move the boundary back by one so the
+        assistant message stays in the protected tail with its results.
+        """
+        if idx <= 0 or idx >= len(messages):
+            return idx
+        prev = messages[idx - 1]
+        if prev.get("role") == "assistant" and prev.get("tool_calls"):
+            # The results for this assistant turn sit at idx..idx+k.
+            # Keep the assistant message out of the summarised region too.
+            idx -= 1
+        return idx
+
+    def compress(self, messages: List[Dict[str, Any]], current_tokens: Optional[int] = None) -> List[Dict[str, Any]]:
+        """Compress conversation messages by summarizing middle turns.
+
+        Keeps first N + last N turns, summarizes everything in between.
+        After compression, orphaned tool_call / tool_result pairs are cleaned
+        up so the API never receives mismatched IDs.
+        """
+        n_messages = len(messages)
+        if n_messages <= self.protect_first_n + self.protect_last_n + 1:
+            if not self.quiet_mode:
+                print(f"⚠️  Cannot compress: only {n_messages} messages (need > {self.protect_first_n + self.protect_last_n + 1})")
+            return messages
+
+        compress_start = self.protect_first_n
+        compress_end = n_messages - self.protect_last_n
+        if compress_start >= compress_end:
+            return messages
+
+        # Adjust boundaries to avoid splitting tool_call/result groups.
+        compress_start = self._align_boundary_forward(messages, compress_start)
+        compress_end = self._align_boundary_backward(messages, compress_end)
+        if compress_start >= compress_end:
+            return messages
+
+        turns_to_summarize = messages[compress_start:compress_end]
+        display_tokens = current_tokens if current_tokens else self.last_prompt_tokens or estimate_messages_tokens_rough(messages)
+
+        if not self.quiet_mode:
+            print(f"\n📦 Context compression triggered ({display_tokens:,} tokens ≥ {self.threshold_tokens:,} threshold)")
+            print(f"   📊 Model context limit: {self.context_length:,} tokens ({self.threshold_percent*100:.0f}% = {self.threshold_tokens:,})")
+
+        if not self.quiet_mode:
+            print(f"   🗜️  Summarizing turns {compress_start+1}-{compress_end} ({len(turns_to_summarize)} turns)")
+
+        summary = self._generate_summary(turns_to_summarize)
+
+        compressed = []
+        for i in range(compress_start):
+            msg = messages[i].copy()
+            if i == 0 and msg.get("role") == "system" and self.compression_count == 0:
+                msg["content"] = (msg.get("content") or "") + "\n\n[Note: Some earlier conversation turns may be summarized to preserve context space.]"
+            compressed.append(msg)
+
+        if summary:
+            last_head_role = messages[compress_start - 1].get("role", "user") if compress_start > 0 else "user"
+            summary_role = "user" if last_head_role in ("assistant", "tool") else "assistant"
+            compressed.append({"role": summary_role, "content": summary})
+        else:
+            if not self.quiet_mode:
+                print("   ⚠️  No summary model available — middle turns dropped without summary")
+
+        for i in range(compress_end, n_messages):
+            compressed.append(messages[i].copy())
+
+        self.compression_count += 1
+
+        compressed = self._sanitize_tool_pairs(compressed)
+
+        if not self.quiet_mode:
+            new_estimate = estimate_messages_tokens_rough(compressed)
+            saved_estimate = display_tokens - new_estimate
+            print(f"   ✅ Compressed: {n_messages} → {len(compressed)} messages (~{saved_estimate:,} tokens saved)")
+            print(f"   💡 Compression #{self.compression_count} complete")
+
+        return compressed
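Reviewer note: the two repair steps in _sanitize_tool_pairs (drop orphaned results, stub missing ones) can be tried out on plain dict messages. The sketch below is a simplified, standalone illustration rather than the method from this commit. The function name sanitize_tool_pairs and the stub text are hypothetical, and it assumes every tool_call is a dict with an "id" key.

```python
def sanitize_tool_pairs(messages):
    """Return a copy of messages with tool calls and tool results paired up."""
    # IDs of tool calls that still have an assistant message.
    call_ids = {
        tc["id"]
        for m in messages if m.get("role") == "assistant"
        for tc in (m.get("tool_calls") or []) if tc.get("id")
    }
    # IDs that already have a tool result somewhere in the list.
    result_ids = {
        m["tool_call_id"]
        for m in messages
        if m.get("role") == "tool" and m.get("tool_call_id")
    }

    fixed = []
    for m in messages:
        cid = m.get("tool_call_id")
        # Step 1: drop results whose parent tool call was summarized away.
        if m.get("role") == "tool" and cid and cid not in call_ids:
            continue
        fixed.append(m)
        # Step 2: add a stub result right after a call that lost its result.
        if m.get("role") == "assistant":
            for tc in m.get("tool_calls") or []:
                if tc.get("id") and tc["id"] not in result_ids:
                    fixed.append({
                        "role": "tool",
                        "content": "[stub result]",
                        "tool_call_id": tc["id"],
                    })
    return fixed


if __name__ == "__main__":
    history = [
        {"role": "user", "content": "hi"},
        {"role": "tool", "tool_call_id": "a", "content": "orphaned result"},
        {"role": "assistant", "content": "", "tool_calls": [{"id": "b"}]},
        {"role": "user", "content": "next"},
    ]
    for msg in sanitize_tool_pairs(history):
        print(msg["role"], msg.get("tool_call_id", ""))
```

Inserting the stub directly after the assistant message keeps each result adjacent to its call, which is the ordering the docstring above says the API expects.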
--- a/agent/display.py
+++ b/agent/display.py
@@ -0,0 +1,469 @@
+"""CLI presentation -- spinner, kawaii faces, tool preview formatting.
+
+Pure display functions and classes with no AIAgent dependency.
+Used by AIAgent._execute_tool_calls for CLI feedback.
+"""
+
+import json
+import os
+import random
+import sys
+import threading
+import time
+
+# ANSI escape codes for coloring tool failure indicators
+_RED = "\033[31m"
+_RESET = "\033[0m"
+
+
+# =========================================================================
+# Tool preview (one-line summary of a tool call's primary argument)
+# =========================================================================
+
+def build_tool_preview(tool_name: str, args: dict, max_len: int = 40) -> str | None:
+    """Build a short preview of a tool call's primary argument for display."""
+    primary_args = {
+        "terminal": "command", "web_search": "query", "web_extract": "urls",
+        "read_file": "path", "write_file": "path", "patch": "path",
+        "search_files": "pattern", "browser_navigate": "url",
+        "browser_click": "ref", "browser_type": "text",
+        "image_generate": "prompt", "text_to_speech": "text",
+        "vision_analyze": "question", "mixture_of_agents": "user_prompt",
+        "skill_view": "name", "skills_list": "category",
+        "schedule_cronjob": "name",
+        "execute_code": "code", "delegate_task": "goal",
+        "clarify": "question", "skill_manage": "name",
+    }
+
+    if tool_name == "process":
+        action = args.get("action", "")
+        sid = args.get("session_id", "")
+        data = args.get("data", "")
+        timeout_val = args.get("timeout")
+        parts = [action]
+        if sid:
+            parts.append(sid[:16])
+        if data:
+            parts.append(f'"{data[:20]}"')
+        if timeout_val and action == "wait":
+            parts.append(f"{timeout_val}s")
+        return " ".join(parts) if parts else None
+
+    if tool_name == "todo":
+        todos_arg = args.get("todos")
+        merge = args.get("merge", False)
+        if todos_arg is None:
+            return "reading task list"
+        elif merge:
+            return f"updating {len(todos_arg)} task(s)"
+        else:
+            return f"planning {len(todos_arg)} task(s)"
+
+    if tool_name == "session_search":
+        query = args.get("query", "")
+        return f"recall: \"{query[:25]}{'...' if len(query) > 25 else ''}\""
+
+    if tool_name == "memory":
+        action = args.get("action", "")
+        target = args.get("target", "")
+        if action == "add":
+            content = args.get("content", "")
+            return f"+{target}: \"{content[:25]}{'...' if len(content) > 25 else ''}\""
+        elif action == "replace":
+            return f"~{target}: \"{args.get('old_text', '')[:20]}\""
+        elif action == "remove":
+            return f"-{target}: \"{args.get('old_text', '')[:20]}\""
+        return action
+
+    if tool_name == "send_message":
+        target = args.get("target", "?")
+        msg = args.get("message", "")
+        if len(msg) > 20:
+            msg = msg[:17] + "..."
+        return f"to {target}: \"{msg}\""
+
+    if tool_name.startswith("rl_"):
+        rl_previews = {
+            "rl_list_environments": "listing envs",
+            "rl_select_environment": args.get("name", ""),
+            "rl_get_current_config": "reading config",
+            "rl_edit_config": f"{args.get('field', '')}={args.get('value', '')}",
+            "rl_start_training": "starting",
+            "rl_check_status": args.get("run_id", "")[:16],
+            "rl_stop_training": f"stopping {args.get('run_id', '')[:16]}",
+            "rl_get_results": args.get("run_id", "")[:16],
+            "rl_list_runs": "listing runs",
+            "rl_test_inference": f"{args.get('num_steps', 3)} steps",
+        }
+        return rl_previews.get(tool_name)
+
+    key = primary_args.get(tool_name)
+    if not key:
+        for fallback_key in ("query", "text", "command", "path", "name", "prompt", "code", "goal"):
+            if fallback_key in args:
+                key = fallback_key
+                break
+
+    if not key or key not in args:
+        return None
+
+    value = args[key]
+    if isinstance(value, list):
+        value = value[0] if value else ""
+
+    preview = str(value).strip()
+    if not preview:
+        return None
+    if len(preview) > max_len:
+        preview = preview[:max_len - 3] + "..."
+    return preview
+
+
+# =========================================================================
+# KawaiiSpinner
+# =========================================================================
+
+class KawaiiSpinner:
+    """Animated spinner with kawaii faces for CLI feedback during tool execution."""
+
+    SPINNERS = {
+        'dots': ['⠋', '⠙', '⠹', '⠸', '⠼', '⠴', '⠦', '⠧', '⠇', '⠏'],
+        'bounce': ['⠁', '⠂', '⠄', '⡀', '⢀', '⠠', '⠐', '⠈'],
+        'grow': ['▁', '▂', '▃', '▄', '▅', '▆', '▇', '█', '▇', '▆', '▅', '▄', '▃', '▂'],
+        'arrows': ['←', '↖', '↑', '↗', '→', '↘', '↓', '↙'],
+        'star': ['✶', '✷', '✸', '✹', '✺', '✹', '✸', '✷'],
+        'moon': ['🌑', '🌒', '🌓', '🌔', '🌕', '🌖', '🌗', '🌘'],
+        'pulse': ['◜', '◠', '◝', '◞', '◡', '◟'],
+        'brain': ['🧠', '💭', '💡', '✨', '💫', '🌟', '💡', '💭'],
+        'sparkle': ['⁺', '˚', '*', '✧', '✦', '✧', '*', '˚'],
+    }
+
+    KAWAII_WAITING = [
+        "(｡◕‿◕｡)", "(◕‿◕✿)", "٩(◕‿◕｡)۶", "(✿◠‿◠)", "( ˘▽˘)っ",
+        "♪(´ε` )", "(◕ᴗ◕✿)", "ヾ(＾∇＾)", "(≧◡≦)", "(★ω★)",
+    ]
+
+    KAWAII_THINKING = [
+        "(｡•́︿•̀｡)", "(◔_◔)", "(¬‿¬)", "( •_•)>⌐■-■", "(⌐■_■)",
+        "(´･_･`)", "◉_◉", "(°ロ°)", "( ˘⌣˘)♡", "ヽ(>∀<☆)☆",
+        "٩(๑❛ᴗ❛๑)۶", "(⊙_⊙)", "(¬_¬)", "( ͡° ͜ʖ ͡°)", "ಠ_ಠ",
+    ]
+
+    THINKING_VERBS = [
+        "pondering", "contemplating", "musing", "cogitating", "ruminating",
+        "deliberating", "mulling", "reflecting", "processing", "reasoning",
+        "analyzing", "computing", "synthesizing", "formulating", "brainstorming",
+    ]
+
+    def __init__(self, message: str = "", spinner_type: str = 'dots'):
+        self.message = message
+        self.spinner_frames = self.SPINNERS.get(spinner_type, self.SPINNERS['dots'])
+        self.running = False
+        self.thread = None
+        self.frame_idx = 0
+        self.start_time = None
+        self.last_line_len = 0
+        # Capture stdout NOW, before any redirect_stdout(devnull) from
+        # child agents can replace sys.stdout with a black hole.
+        self._out = sys.stdout
+
+    def _write(self, text: str, end: str = '\n', flush: bool = False):
+        """Write to the stdout captured at spinner creation time."""
+        try:
+            self._out.write(text + end)
+            if flush:
+                self._out.flush()
+        except (ValueError, OSError):
+            pass
+
+    def _animate(self):
+        while self.running:
+            if os.getenv("HERMES_SPINNER_PAUSE"):
+                time.sleep(0.1)
+                continue
+            frame = self.spinner_frames[self.frame_idx % len(self.spinner_frames)]
+            elapsed = time.time() - self.start_time
+            line = f"  {frame} {self.message} ({elapsed:.1f}s)"
+            pad = max(self.last_line_len - len(line), 0)
+            self._write(f"\r{line}{' ' * pad}", end='', flush=True)
+            self.last_line_len = len(line)
+            self.frame_idx += 1
+            time.sleep(0.12)
+
+    def start(self):
+        if self.running:
+            return
+        self.running = True
+        self.start_time = time.time()
+        self.thread = threading.Thread(target=self._animate, daemon=True)
+        self.thread.start()
+
+    def update_text(self, new_message: str):
+        self.message = new_message
+
+    def print_above(self, text: str):
+        """Print a line above the spinner without disrupting animation.
+
+        Clears the current spinner line, prints the text, and lets the
+        next animation tick redraw the spinner on the line below.
+        Thread-safe: uses the captured stdout reference (self._out).
+        Works inside redirect_stdout(devnull) because _write bypasses
+        sys.stdout and writes to the stdout captured at spinner creation.
+        """
+        if not self.running:
+            self._write(f"  {text}", flush=True)
+            return
+        # Clear spinner line with spaces (not \033[K) to avoid garbled escape
+        # codes when prompt_toolkit's patch_stdout is active — same approach
+        # as stop(). Then print text; spinner redraws on next tick.
+        blanks = ' ' * max(self.last_line_len + 5, 40)
+        self._write(f"\r{blanks}\r  {text}", flush=True)
+
+    def stop(self, final_message: str | None = None):
+        self.running = False
+        if self.thread:
+            self.thread.join(timeout=0.5)
+        # Clear the spinner line with spaces instead of \033[K to avoid
+        # garbled escape codes when prompt_toolkit's patch_stdout is active.
+        blanks = ' ' * max(self.last_line_len + 5, 40)
+        self._write(f"\r{blanks}\r", end='', flush=True)
+        if final_message:
+            self._write(f"  {final_message}", flush=True)
+
+    def __enter__(self):
+        self.start()
+        return self
+
+    def __exit__(self, exc_type, exc_val, exc_tb):
+        self.stop()
+        return False
+
+
+# =========================================================================
+# Kawaii face arrays (used by AIAgent._execute_tool_calls for spinner text)
+# =========================================================================
+
+KAWAII_SEARCH = [
+    "♪(´ε` )", "(｡◕‿◕｡)", "ヾ(＾∇＾)", "(◕ᴗ◕✿)", "( ˘▽˘)っ",
+    "٩(◕‿◕｡)۶", "(✿◠‿◠)", "♪～(´ε｀ )", "(ノ´ヮ`)ノ*:・゚✧", "＼(◎o◎)／",
+]
+KAWAII_READ = [
+    "φ(゜▽゜*)♪", "( ˘▽˘)っ", "(⌐■_■)", "٩(｡•́‿•̀｡)۶", "(◕‿◕✿)",
+    "ヾ(＠⌒ー⌒＠)ノ", "(✧ω✧)", "♪(๑ᴖ◡ᴖ๑)♪", "(≧◡≦)", "( ´ ▽ ` )ノ",
+]
+KAWAII_TERMINAL = [
+    "ヽ(>∀<☆)ノ", "(ノ°∀°)ノ", "٩(^ᴗ^)۶", "ヾ(⌐■_■)ノ♪", "(•̀ᴗ•́)و",
+    "┗(＾0＾)┓", "(｀・ω・´)", "＼(￣▽￣)／", "(ง •̀_•́)ง", "ヽ(´▽`)/",
+]
+KAWAII_BROWSER = [
+    "(ノ°∀°)ノ", "(☞゚ヮ゚)☞", "( ͡° ͜ʖ ͡°)", "┌( ಠ_ಠ)┘", "(⊙_⊙)？",
+    "ヾ(•ω•`)o", "(￣ω￣)", "( ˇωˇ )", "(ᵔᴥᵔ)", "＼(◎o◎)／",
+]
+KAWAII_CREATE = [
+    "✧*。٩(ˊᗜˋ*)و✧", "(ﾉ◕ヮ◕)ﾉ*:・ﾟ✧", "ヽ(>∀<☆)ノ", "٩(♡ε♡)۶", "(◕‿◕)♡",
+    "✿◕ ‿ ◕✿", "(*≧▽≦)", "ヾ(＾-＾)ノ", "(☆▽☆)", "°˖✧◝(⁰▿⁰)◜✧˖°",
+]
+KAWAII_SKILL = [
+    "ヾ(＠⌒ー⌒＠)ノ", "(๑˃ᴗ˂)ﻭ", "٩(◕‿◕｡)۶", "(✿╹◡╹)", "ヽ(・∀・)ノ",
+    "(ノ´ヮ`)ノ*:・ﾟ✧", "♪(๑ᴖ◡ᴖ๑)♪", "(◠‿◠)", "٩(ˊᗜˋ*)و", "(＾▽＾)",
+    "ヾ(＾∇＾)", "(★ω★)/", "٩(｡•́‿•̀｡)۶", "(◕ᴗ◕✿)", "＼(◎o◎)／",
+    "(✧ω✧)", "ヽ(>∀<☆)ノ", "( ˘▽˘)っ", "(≧◡≦) ♡", "ヾ(￣▽￣)",
+]
+KAWAII_THINK = [
+    "(っ°Д°;)っ", "(；′⌒`)", "(・_・ヾ", "( ´_ゝ`)", "(￣ヘ￣)",
+    "(。-`ω´-)", "( ˘︹˘ )", "(¬_¬)", "ヽ(ー_ー )ノ", "(；一_一)",
+]
+KAWAII_GENERIC = [
+    "♪(´ε` )", "(◕‿◕✿)", "ヾ(＾∇＾)", "٩(◕‿◕｡)۶", "(✿◠‿◠)",
+    "(ノ´ヮ`)ノ*:・ﾟ✧", "ヽ(>∀<☆)ノ", "(☆▽☆)", "( ˘▽˘)っ", "(≧◡≦)",
+]
+
+
+# =========================================================================
+# Cute tool message (completion line that replaces the spinner)
+# =========================================================================
+
+def _detect_tool_failure(tool_name: str, result: str | None) -> tuple[bool, str]:
+    """Inspect a tool result string for signs of failure.
+
+    Returns ``(is_failure, suffix)`` where *suffix* is an informational tag
+    like ``" [exit 1]"`` for terminal failures, or ``" [error]"`` for generic
+    failures.  On success, returns ``(False, "")``.
+    """
+    if result is None:
+        return False, ""
+
+    if tool_name == "terminal":
+        try:
+            data = json.loads(result)
+            exit_code = data.get("exit_code")
+            if exit_code is not None and exit_code != 0:
+                return True, f" [exit {exit_code}]"
+        except (json.JSONDecodeError, TypeError, AttributeError):
+            pass
+        return False, ""
+
+    # Memory-specific: distinguish "full" from real errors
+    if tool_name == "memory":
+        try:
+            data = json.loads(result)
+            if data.get("success") is False and "exceed the limit" in data.get("error", ""):
+                return True, " [full]"
+        except (json.JSONDecodeError, TypeError, AttributeError):
+            pass
+
+    # Generic heuristic for non-terminal tools
+    lower = result[:500].lower()
+    if '"error"' in lower or '"failed"' in lower or result.startswith("Error"):
+        return True, " [error]"
+
+    return False, ""
+
+
+def get_cute_tool_message(
+    tool_name: str, args: dict, duration: float, result: str | None = None,
+) -> str:
+    """Generate a formatted tool completion line for CLI quiet mode.
+
+    Format: ``┊ {emoji} {verb:9} {detail}  {duration}``
+
+    When *result* is provided the line is checked for failure indicators.
+    Failed tool calls get an informational suffix such as ``[exit 1]``.
+    """
+    dur = f"{duration:.1f}s"
+    is_failure, failure_suffix = _detect_tool_failure(tool_name, result)
+
+    def _trunc(s, n=40):
+        s = str(s)
+        return (s[:n-3] + "...") if len(s) > n else s
+
+    def _path(p, n=35):
+        p = str(p)
+        return ("..." + p[-(n-3):]) if len(p) > n else p
+
+    def _wrap(line: str) -> str:
+        """Append failure suffix when the tool failed."""
+        if not is_failure:
+            return line
+        return f"{line}{failure_suffix}"
+
+    if tool_name == "web_search":
+        return _wrap(f"┊ 🔍 search    {_trunc(args.get('query', ''), 42)}  {dur}")
+    if tool_name == "web_extract":
+        urls = args.get("urls", [])
+        if urls:
+            url = urls[0] if isinstance(urls, list) else str(urls)
+            domain = url.replace("https://", "").replace("http://", "").split("/")[0]
+            extra = f" +{len(urls)-1}" if len(urls) > 1 else ""
+            return _wrap(f"┊ 📄 fetch     {_trunc(domain, 35)}{extra}  {dur}")
+        return _wrap(f"┊ 📄 fetch     pages  {dur}")
+    if tool_name == "web_crawl":
+        url = args.get("url", "")
+        domain = url.replace("https://", "").replace("http://", "").split("/")[0]
+        return _wrap(f"┊ 🕸️  crawl     {_trunc(domain, 35)}  {dur}")
+    if tool_name == "terminal":
+        return _wrap(f"┊ 💻 $         {_trunc(args.get('command', ''), 42)}  {dur}")
+    if tool_name == "process":
+        action = args.get("action", "?")
+        sid = args.get("session_id", "")[:12]
+        labels = {"list": "ls processes", "poll": f"poll {sid}", "log": f"log {sid}",
+                  "wait": f"wait {sid}", "kill": f"kill {sid}", "write": f"write {sid}", "submit": f"submit {sid}"}
+        return _wrap(f"┊ ⚙️  proc      {labels.get(action, f'{action} {sid}')}  {dur}")
+    if tool_name == "read_file":
+        return _wrap(f"┊ 📖 read      {_path(args.get('path', ''))}  {dur}")
+    if tool_name == "write_file":
+        return _wrap(f"┊ ✍️  write     {_path(args.get('path', ''))}  {dur}")
+    if tool_name == "patch":
+        return _wrap(f"┊ 🔧 patch     {_path(args.get('path', ''))}  {dur}")
+    if tool_name == "search_files":
+        pattern = _trunc(args.get("pattern", ""), 35)
+        target = args.get("target", "content")
+        verb = "find" if target == "files" else "grep"
+        return _wrap(f"┊ 🔎 {verb:9} {pattern}  {dur}")
+    if tool_name == "browser_navigate":
+        url = args.get("url", "")
+        domain = url.replace("https://", "").replace("http://", "").split("/")[0]
+        return _wrap(f"┊ 🌐 navigate  {_trunc(domain, 35)}  {dur}")
+    if tool_name == "browser_snapshot":
+        mode = "full" if args.get("full") else "compact"
+        return _wrap(f"┊ 📸 snapshot  {mode}  {dur}")
+    if tool_name == "browser_click":
+        return _wrap(f"┊ 👆 click     {args.get('ref', '?')}  {dur}")
+    if tool_name == "browser_type":
+        return _wrap(f"┊ ⌨️  type      \"{_trunc(args.get('text', ''), 30)}\"  {dur}")
+    if tool_name == "browser_scroll":
+        d = args.get("direction", "down")
+        arrow = {"down": "↓", "up": "↑", "right": "→", "left": "←"}.get(d, "↓")
+        return _wrap(f"┊ {arrow}  scroll    {d}  {dur}")
+    if tool_name == "browser_back":
+        return _wrap(f"┊ ◀️  back      {dur}")
+    if tool_name == "browser_press":
+        return _wrap(f"┊ ⌨️  press     {args.get('key', '?')}  {dur}")
+    if tool_name == "browser_close":
+        return _wrap(f"┊ 🚪 close     browser  {dur}")
+    if tool_name == "browser_get_images":
+        return _wrap(f"┊ 🖼️  images    extracting  {dur}")
+    if tool_name == "browser_vision":
+        return _wrap(f"┊ 👁️  vision    analyzing page  {dur}")
+    if tool_name == "todo":
+        todos_arg = args.get("todos")
+        merge = args.get("merge", False)
+        if todos_arg is None:
+            return _wrap(f"┊ 📋 plan      reading tasks  {dur}")
+        elif merge:
+            return _wrap(f"┊ 📋 plan      update {len(todos_arg)} task(s)  {dur}")
+        else:
+            return _wrap(f"┊ 📋 plan      {len(todos_arg)} task(s)  {dur}")
+    if tool_name == "session_search":
+        return _wrap(f"┊ 🔍 recall    \"{_trunc(args.get('query', ''), 35)}\"  {dur}")
+    if tool_name == "memory":
+        action = args.get("action", "?")
+        target = args.get("target", "")
+        if action == "add":
+            return _wrap(f"┊ 🧠 memory    +{target}: \"{_trunc(args.get('content', ''), 30)}\"  {dur}")
+        elif action == "replace":
+            return _wrap(f"┊ 🧠 memory    ~{target}: \"{_trunc(args.get('old_text', ''), 20)}\"  {dur}")
+        elif action == "remove":
+            return _wrap(f"┊ 🧠 memory    -{target}: \"{_trunc(args.get('old_text', ''), 20)}\"  {dur}")
+        return _wrap(f"┊ 🧠 memory    {action}  {dur}")
+    if tool_name == "skills_list":
+        return _wrap(f"┊ 📚 skills    list {args.get('category', 'all')}  {dur}")
+    if tool_name == "skill_view":
+        return _wrap(f"┊ 📚 skill     {_trunc(args.get('name', ''), 30)}  {dur}")
+    if tool_name == "image_generate":
+        return _wrap(f"┊ 🎨 create    {_trunc(args.get('prompt', ''), 35)}  {dur}")
+    if tool_name == "text_to_speech":
+        return _wrap(f"┊ 🔊 speak     {_trunc(args.get('text', ''), 30)}  {dur}")
+    if tool_name == "vision_analyze":
+        return _wrap(f"┊ 👁️  vision    {_trunc(args.get('question', ''), 30)}  {dur}")
+    if tool_name == "mixture_of_agents":
+        return _wrap(f"┊ 🧠 reason    {_trunc(args.get('user_prompt', ''), 30)}  {dur}")
+    if tool_name == "send_message":
+        return _wrap(f"┊ 📨 send      {args.get('target', '?')}: \"{_trunc(args.get('message', ''), 25)}\"  {dur}")
+    if tool_name == "schedule_cronjob":
+        return _wrap(f"┊ ⏰ schedule  {_trunc(args.get('name', args.get('prompt', 'task')), 30)}  {dur}")
+    if tool_name == "list_cronjobs":
+        return _wrap(f"┊ ⏰ jobs      listing  {dur}")
+    if tool_name == "remove_cronjob":
+        return _wrap(f"┊ ⏰ remove    job {args.get('job_id', '?')}  {dur}")
+    if tool_name.startswith("rl_"):
+        rl = {
+            "rl_list_environments": "list envs", "rl_select_environment": f"select {args.get('name', '')}",
+            "rl_get_current_config": "get config", "rl_edit_config": f"set {args.get('field', '?')}",
+            "rl_start_training": "start training", "rl_check_status": f"status {args.get('run_id', '?')[:12]}",
+            "rl_stop_training": f"stop {args.get('run_id', '?')[:12]}", "rl_get_results": f"results {args.get('run_id', '?')[:12]}",
+            "rl_list_runs": "list runs", "rl_test_inference": "test inference",
+        }
+        return _wrap(f"┊ 🧪 rl        {rl.get(tool_name, tool_name.replace('rl_', ''))}  {dur}")
+    if tool_name == "execute_code":
+        code = args.get("code", "")
+        first_line = code.strip().split("\n")[0] if code.strip() else ""
+        return _wrap(f"┊ 🐍 exec      {_trunc(first_line, 35)}  {dur}")
+    if tool_name == "delegate_task":
+        tasks = args.get("tasks")
+        if tasks and isinstance(tasks, list):
+            return _wrap(f"┊ 🔀 delegate  {len(tasks)} parallel tasks  {dur}")
+        return _wrap(f"┊ 🔀 delegate  {_trunc(args.get('goal', ''), 35)}  {dur}")
+
+    preview = build_tool_preview(tool_name, args) or ""
+    return _wrap(f"┊ ⚡ {tool_name[:9]:9} {_trunc(preview, 35)}  {dur}")
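The quiet-mode lines above rely on two truncation rules: free text keeps its head, while file paths keep their tail, since the file name is usually the informative part. A minimal standalone sketch of that idea follows; the names `trunc` and `tail_path` are illustrative stand-ins for the nested `_trunc` and `_path` helpers, not part of the patch.

```python
def trunc(s, n=40):
    """Keep the head of s; n is the maximum length including the ellipsis."""
    s = str(s)
    return (s[:n - 3] + "...") if len(s) > n else s


def tail_path(p, n=35):
    """Keep the tail of p, which usually holds the file name."""
    p = str(p)
    return ("..." + p[-(n - 3):]) if len(p) > n else p


if __name__ == "__main__":
    # A long query is cut from the right, a long path from the left.
    print(trunc("a" * 50, 10))
    print(tail_path("/very/long/directory/structure/to/some/file.py", 20))
```

In both cases the result never exceeds `n` characters, which keeps the duration column roughly aligned in the CLI output.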
--- a/hermes_agent/agent/insights.py
+++ b/hermes_agent/agent/insights.py
@@ -10,7 +10,7 @@ multi-platform architecture with additional cost estimation and platform
 breakdown capabilities.

 Usage:
-    from hermes_agent.agent.insights import InsightsEngine
+    from agent.insights import InsightsEngine
    engine = InsightsEngine(db)
    report = engine.generate(days=30)
    print(engine.format_terminal(report))
@@ -20,66 +20,134 @@ import json
 import time
 from collections import Counter, defaultdict
 from datetime import datetime
-from typing import Any, Dict, List
+from typing import Any, Dict, List, Optional

-from hermes_agent.providers.pricing import (
-    CanonicalUsage,
-    DEFAULT_PRICING,
-    estimate_usage_cost,
-    format_duration_compact,
-    has_known_pricing,
-)
+# =========================================================================
+# Model pricing (USD per million tokens) — approximate as of early 2026
+# =========================================================================
+MODEL_PRICING = {
+    # OpenAI
+    "gpt-4o": {"input": 2.50, "output": 10.00},
+    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
+    "gpt-4.1": {"input": 2.00, "output": 8.00},
+    "gpt-4.1-mini": {"input": 0.40, "output": 1.60},
+    "gpt-4.1-nano": {"input": 0.10, "output": 0.40},
+    "gpt-4.5-preview": {"input": 75.00, "output": 150.00},
+    "gpt-5": {"input": 10.00, "output": 30.00},
+    "gpt-5.4": {"input": 10.00, "output": 30.00},
+    "o3": {"input": 10.00, "output": 40.00},
+    "o3-mini": {"input": 1.10, "output": 4.40},
+    "o4-mini": {"input": 1.10, "output": 4.40},
+    # Anthropic
+    "claude-opus-4-20250514": {"input": 15.00, "output": 75.00},
+    "claude-sonnet-4-20250514": {"input": 3.00, "output": 15.00},
+    "claude-3-5-sonnet-20241022": {"input": 3.00, "output": 15.00},
+    "claude-3-5-haiku-20241022": {"input": 0.80, "output": 4.00},
+    "claude-3-opus-20240229": {"input": 15.00, "output": 75.00},
+    "claude-3-haiku-20240307": {"input": 0.25, "output": 1.25},
+    # DeepSeek
+    "deepseek-chat": {"input": 0.14, "output": 0.28},
+    "deepseek-reasoner": {"input": 0.55, "output": 2.19},
+    # Google
+    "gemini-2.5-pro": {"input": 1.25, "output": 10.00},
+    "gemini-2.5-flash": {"input": 0.15, "output": 0.60},
+    "gemini-2.0-flash": {"input": 0.10, "output": 0.40},
+    # Meta (via providers)
+    "llama-4-maverick": {"input": 0.50, "output": 0.70},
+    "llama-4-scout": {"input": 0.20, "output": 0.30},
+    # Z.AI / GLM (direct provider — pricing not published externally, treat as local)
+    "glm-5": {"input": 0.0, "output": 0.0},
+    "glm-4.7": {"input": 0.0, "output": 0.0},
+    "glm-4.5": {"input": 0.0, "output": 0.0},
+    "glm-4.5-flash": {"input": 0.0, "output": 0.0},
+    # Kimi / Moonshot (direct provider — pricing not published externally, treat as local)
+    "kimi-k2.5": {"input": 0.0, "output": 0.0},
+    "kimi-k2-thinking": {"input": 0.0, "output": 0.0},
+    "kimi-k2-turbo-preview": {"input": 0.0, "output": 0.0},
+    "kimi-k2-0905-preview": {"input": 0.0, "output": 0.0},
+    # MiniMax (direct provider; pricing not published externally, treat as local). Keys are lowercase because _get_pricing() lowercases the model name before lookup.
+    "minimax-m2.5": {"input": 0.0, "output": 0.0},
+    "minimax-m2.5-highspeed": {"input": 0.0, "output": 0.0},
+    "minimax-m2.1": {"input": 0.0, "output": 0.0},
+}

-_DEFAULT_PRICING = DEFAULT_PRICING
+# Fallback: unknown/custom models get zero cost (we can't assume pricing
+# for self-hosted models, custom OAI endpoints, local inference, etc.)
+_DEFAULT_PRICING = {"input": 0.0, "output": 0.0}


-def _has_known_pricing(model_name: str, provider: str = None, base_url: str = None) -> bool:
+def _has_known_pricing(model_name: str) -> bool:
    """Check if a model has known pricing (vs unknown/custom endpoint)."""
-    return has_known_pricing(model_name, provider=provider, base_url=base_url)
+    return _get_pricing(model_name) is not _DEFAULT_PRICING


-def _estimate_cost(
-    session_or_model: Dict[str, Any] | str,
-    input_tokens: int = 0,
-    output_tokens: int = 0,
-    *,
-    cache_read_tokens: int = 0,
-    cache_write_tokens: int = 0,
-    provider: str = None,
-    base_url: str = None,
-) -> tuple[float, str]:
-    """Estimate the USD cost for a session row or a model/token tuple."""
-    if isinstance(session_or_model, dict):
-        session = session_or_model
-        model = session.get("model") or ""
-        usage = CanonicalUsage(
-            input_tokens=session.get("input_tokens") or 0,
-            output_tokens=session.get("output_tokens") or 0,
-            cache_read_tokens=session.get("cache_read_tokens") or 0,
-            cache_write_tokens=session.get("cache_write_tokens") or 0,
-        )
-        provider = session.get("billing_provider")
-        base_url = session.get("billing_base_url")
-    else:
-        model = session_or_model or ""
-        usage = CanonicalUsage(
-            input_tokens=input_tokens,
-            output_tokens=output_tokens,
-            cache_read_tokens=cache_read_tokens,
-            cache_write_tokens=cache_write_tokens,
-        )
-    result = estimate_usage_cost(
-        model,
-        usage,
-        provider=provider,
-        base_url=base_url,
-    )
-    return float(result.amount_usd or 0.0), result.status
+def _get_pricing(model_name: str) -> Dict[str, float]:
+    """Look up pricing for a model. Uses fuzzy matching on model name.
+
+    Returns _DEFAULT_PRICING (zero cost) for unknown/custom models —
+    we can't assume costs for self-hosted endpoints, local inference, etc.
+    """
+    if not model_name:
+        return _DEFAULT_PRICING
+
+    # Strip provider prefix (e.g., "anthropic/claude-..." -> "claude-...")
+    bare = model_name.split("/")[-1].lower()
+
+    # Exact match first
+    if bare in MODEL_PRICING:
+        return MODEL_PRICING[bare]
+
+    # Fuzzy prefix match — prefer the LONGEST matching key to avoid
+    # e.g. "gpt-4o" matching before "gpt-4o-mini" for "gpt-4o-mini-2024-07-18"
+    best_match = None
+    best_len = 0
+    for key, price in MODEL_PRICING.items():
+        if bare.startswith(key) and len(key) > best_len:
+            best_match = price
+            best_len = len(key)
+    if best_match is not None:
+        return best_match
+
+    # Keyword heuristics (checked in most-specific-first order)
+    if "opus" in bare:
+        return {"input": 15.00, "output": 75.00}
+    if "sonnet" in bare:
+        return {"input": 3.00, "output": 15.00}
+    if "haiku" in bare:
+        return {"input": 0.80, "output": 4.00}
+    if "gpt-4o-mini" in bare:
+        return {"input": 0.15, "output": 0.60}
+    if "gpt-4o" in bare:
+        return {"input": 2.50, "output": 10.00}
+    if "gpt-5" in bare:
+        return {"input": 10.00, "output": 30.00}
+    if "deepseek" in bare:
+        return {"input": 0.14, "output": 0.28}
+    if "gemini" in bare:
+        return {"input": 0.15, "output": 0.60}
+
+    return _DEFAULT_PRICING
+
+
+def _estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
+    """Estimate the USD cost for a given model and token counts."""
+    pricing = _get_pricing(model)
+    return (input_tokens * pricing["input"] + output_tokens * pricing["output"]) / 1_000_000


 def _format_duration(seconds: float) -> str:
    """Format seconds into a human-readable duration string."""
-    return format_duration_compact(seconds)
+    if seconds < 60:
+        return f"{seconds:.0f}s"
+    minutes = seconds / 60
+    if minutes < 60:
+        return f"{minutes:.0f}m"
+    hours = minutes / 60
+    if hours < 24:
+        remaining_min = int(minutes % 60)
+        return f"{int(hours)}h {remaining_min}m" if remaining_min else f"{int(hours)}h"
+    days = hours / 24
+    return f"{days:.1f}d"


 def _bar_chart(values: List[int], max_width: int = 20) -> List[str]:
@@ -124,7 +192,6 @@ class InsightsEngine:
        # Gather raw data
        sessions = self._get_sessions(cutoff, source)
        tool_usage = self._get_tool_usage(cutoff, source)
-        skill_usage = self._get_skill_usage(cutoff, source)
        message_stats = self._get_message_stats(cutoff, source)

        if not sessions:
@@ -136,15 +203,6 @@ class InsightsEngine:
                "models": [],
                "platforms": [],
                "tools": [],
-                "skills": {
-                    "summary": {
-                        "total_skill_loads": 0,
-                        "total_skill_edits": 0,
-                        "total_skill_actions": 0,
-                        "distinct_skills_used": 0,
-                    },
-                    "top_skills": [],
-                },
                "activity": {},
                "top_sessions": [],
            }
@@ -154,7 +212,6 @@ class InsightsEngine:
        models = self._compute_model_breakdown(sessions)
        platforms = self._compute_platform_breakdown(sessions)
        tools = self._compute_tool_breakdown(tool_usage)
-        skills = self._compute_skill_breakdown(skill_usage)
        activity = self._compute_activity_patterns(sessions)
        top_sessions = self._compute_top_sessions(sessions)

@@ -167,7 +224,6 @@ class InsightsEngine:
            "models": models,
            "platforms": platforms,
            "tools": tools,
-            "skills": skills,
            "activity": activity,
            "top_sessions": top_sessions,
        }
@@ -178,30 +234,24 @@ class InsightsEngine:

    # Columns we actually need (skip system_prompt, model_config blobs)
    _SESSION_COLS = ("id, source, model, started_at, ended_at, "
-                     "message_count, tool_call_count, input_tokens, output_tokens, "
-                     "cache_read_tokens, cache_write_tokens, billing_provider, "
-                     "billing_base_url, billing_mode, estimated_cost_usd, "
-                     "actual_cost_usd, cost_status, cost_source")
-
-    # Pre-computed query strings — f-string evaluated once at class definition,
-    # not at runtime, so no user-controlled value can alter the query structure.
-    _GET_SESSIONS_WITH_SOURCE = (
-        f"SELECT {_SESSION_COLS} FROM sessions"
-        " WHERE started_at >= ? AND source = ?"
-        " ORDER BY started_at DESC"
-    )
-    _GET_SESSIONS_ALL = (
-        f"SELECT {_SESSION_COLS} FROM sessions"
-        " WHERE started_at >= ?"
-        " ORDER BY started_at DESC"
-    )
+                     "message_count, tool_call_count, input_tokens, output_tokens")

    def _get_sessions(self, cutoff: float, source: str = None) -> List[Dict]:
        """Fetch sessions within the time window."""
        if source:
-            cursor = self._conn.execute(self._GET_SESSIONS_WITH_SOURCE, (cutoff, source))
+            cursor = self._conn.execute(
+                f"""SELECT {self._SESSION_COLS} FROM sessions
+                    WHERE started_at >= ? AND source = ?
+                    ORDER BY started_at DESC""",
+                (cutoff, source),
+            )
        else:
-            cursor = self._conn.execute(self._GET_SESSIONS_ALL, (cutoff,))
+            cursor = self._conn.execute(
+                f"""SELECT {self._SESSION_COLS} FROM sessions
+                    WHERE started_at >= ?
+                    ORDER BY started_at DESC""",
+                (cutoff,),
+            )
        return [dict(row) for row in cursor.fetchall()]

    def _get_tool_usage(self, cutoff: float, source: str = None) -> List[Dict]:
@@ -296,82 +346,6 @@ class InsightsEngine:
            for name, count in tool_counts.most_common()
        ]

-    def _get_skill_usage(self, cutoff: float, source: str = None) -> List[Dict]:
-        """Extract per-skill usage from assistant tool calls."""
-        skill_counts: Dict[str, Dict[str, Any]] = {}
-
-        if source:
-            cursor = self._conn.execute(
-                """SELECT m.tool_calls, m.timestamp
-                   FROM messages m
-                   JOIN sessions s ON s.id = m.session_id
-                   WHERE s.started_at >= ? AND s.source = ?
-                     AND m.role = 'assistant' AND m.tool_calls IS NOT NULL""",
-                (cutoff, source),
-            )
-        else:
-            cursor = self._conn.execute(
-                """SELECT m.tool_calls, m.timestamp
-                   FROM messages m
-                   JOIN sessions s ON s.id = m.session_id
-                   WHERE s.started_at >= ?
-                     AND m.role = 'assistant' AND m.tool_calls IS NOT NULL""",
-                (cutoff,),
-            )
-
-        for row in cursor.fetchall():
-            try:
-                calls = row["tool_calls"]
-                if isinstance(calls, str):
-                    calls = json.loads(calls)
-                if not isinstance(calls, list):
-                    continue
-            except (json.JSONDecodeError, TypeError):
-                continue
-
-            timestamp = row["timestamp"]
-            for call in calls:
-                if not isinstance(call, dict):
-                    continue
-                func = call.get("function", {})
-                tool_name = func.get("name")
-                if tool_name not in {"skill_view", "skill_manage"}:
-                    continue
-
-                args = func.get("arguments")
-                if isinstance(args, str):
-                    try:
-                        args = json.loads(args)
-                    except (json.JSONDecodeError, TypeError):
-                        continue
-                if not isinstance(args, dict):
-                    continue
-
-                skill_name = args.get("name")
-                if not isinstance(skill_name, str) or not skill_name.strip():
-                    continue
-
-                entry = skill_counts.setdefault(
-                    skill_name,
-                    {
-                        "skill": skill_name,
-                        "view_count": 0,
-                        "manage_count": 0,
-                        "last_used_at": None,
-                    },
-                )
-                if tool_name == "skill_view":
-                    entry["view_count"] += 1
-                else:
-                    entry["manage_count"] += 1
-
-                if timestamp is not None and (
-                    entry["last_used_at"] is None or timestamp > entry["last_used_at"]
-                ):
-                    entry["last_used_at"] = timestamp
-
-        return list(skill_counts.values())
-
    def _get_message_stats(self, cutoff: float, source: str = None) -> Dict:
        """Get aggregate message statistics."""
        if source:
@@ -412,30 +386,21 @@ class InsightsEngine:
        """Compute high-level overview statistics."""
        total_input = sum(s.get("input_tokens") or 0 for s in sessions)
        total_output = sum(s.get("output_tokens") or 0 for s in sessions)
-        total_cache_read = sum(s.get("cache_read_tokens") or 0 for s in sessions)
-        total_cache_write = sum(s.get("cache_write_tokens") or 0 for s in sessions)
-        total_tokens = total_input + total_output + total_cache_read + total_cache_write
+        total_tokens = total_input + total_output
        total_tool_calls = sum(s.get("tool_call_count") or 0 for s in sessions)
        total_messages = sum(s.get("message_count") or 0 for s in sessions)

        # Cost estimation (weighted by model)
        total_cost = 0.0
-        actual_cost = 0.0
        models_with_pricing = set()
        models_without_pricing = set()
-        unknown_cost_sessions = 0
-        included_cost_sessions = 0
        for s in sessions:
            model = s.get("model") or ""
-            estimated, status = _estimate_cost(s)
-            total_cost += estimated
-            actual_cost += s.get("actual_cost_usd") or 0.0
+            inp = s.get("input_tokens") or 0
+            out = s.get("output_tokens") or 0
+            total_cost += _estimate_cost(model, inp, out)
            display = model.split("/")[-1] if "/" in model else (model or "unknown")
-            if status == "included":
-                included_cost_sessions += 1
-            elif status == "unknown":
-                unknown_cost_sessions += 1
-            if _has_known_pricing(model, s.get("billing_provider"), s.get("billing_base_url")):
+            if _has_known_pricing(model):
                models_with_pricing.add(display)
            else:
                models_without_pricing.add(display)
@@ -462,11 +427,8 @@ class InsightsEngine:
            "total_tool_calls": total_tool_calls,
            "total_input_tokens": total_input,
            "total_output_tokens": total_output,
-            "total_cache_read_tokens": total_cache_read,
-            "total_cache_write_tokens": total_cache_write,
            "total_tokens": total_tokens,
            "estimated_cost": total_cost,
-            "actual_cost": actual_cost,
            "total_hours": total_hours,
            "avg_session_duration": avg_duration,
            "avg_messages_per_session": total_messages / len(sessions) if sessions else 0,
@@ -478,15 +440,12 @@ class InsightsEngine:
            "date_range_end": date_range_end,
            "models_with_pricing": sorted(models_with_pricing),
            "models_without_pricing": sorted(models_without_pricing),
-            "unknown_cost_sessions": unknown_cost_sessions,
-            "included_cost_sessions": included_cost_sessions,
        }

    def _compute_model_breakdown(self, sessions: List[Dict]) -> List[Dict]:
        """Break down usage by model."""
        model_data = defaultdict(lambda: {
            "sessions": 0, "input_tokens": 0, "output_tokens": 0,
-            "cache_read_tokens": 0, "cache_write_tokens": 0,
            "total_tokens": 0, "tool_calls": 0, "cost": 0.0,
        })

@@ -498,18 +457,12 @@ class InsightsEngine:
            d["sessions"] += 1
            inp = s.get("input_tokens") or 0
            out = s.get("output_tokens") or 0
-            cache_read = s.get("cache_read_tokens") or 0
-            cache_write = s.get("cache_write_tokens") or 0
            d["input_tokens"] += inp
            d["output_tokens"] += out
-            d["cache_read_tokens"] += cache_read
-            d["cache_write_tokens"] += cache_write
-            d["total_tokens"] += inp + out + cache_read + cache_write
+            d["total_tokens"] += inp + out
            d["tool_calls"] += s.get("tool_call_count") or 0
-            estimate, status = _estimate_cost(s)
-            d["cost"] += estimate
-            d["has_pricing"] = _has_known_pricing(model, s.get("billing_provider"), s.get("billing_base_url"))
-            d["cost_status"] = status
+            d["cost"] += _estimate_cost(model, inp, out)
+            d["has_pricing"] = _has_known_pricing(model)

        result = [
            {"model": model, **data}
@@ -523,8 +476,7 @@ class InsightsEngine:
        """Break down usage by platform/source."""
        platform_data = defaultdict(lambda: {
            "sessions": 0, "messages": 0, "input_tokens": 0,
-            "output_tokens": 0, "cache_read_tokens": 0,
-            "cache_write_tokens": 0, "total_tokens": 0, "tool_calls": 0,
+            "output_tokens": 0, "total_tokens": 0, "tool_calls": 0,
        })

        for s in sessions:
@@ -534,13 +486,9 @@ class InsightsEngine:
            d["messages"] += s.get("message_count") or 0
            inp = s.get("input_tokens") or 0
            out = s.get("output_tokens") or 0
-            cache_read = s.get("cache_read_tokens") or 0
-            cache_write = s.get("cache_write_tokens") or 0
            d["input_tokens"] += inp
            d["output_tokens"] += out
-            d["cache_read_tokens"] += cache_read
-            d["cache_write_tokens"] += cache_write
-            d["total_tokens"] += inp + out + cache_read + cache_write
+            d["total_tokens"] += inp + out
            d["tool_calls"] += s.get("tool_call_count") or 0

        result = [
@@ -563,46 +511,6 @@ class InsightsEngine:
            })
        return result

-    def _compute_skill_breakdown(self, skill_usage: List[Dict]) -> Dict[str, Any]:
-        """Process per-skill usage into summary + ranked list."""
-        total_skill_loads = sum(s["view_count"] for s in skill_usage) if skill_usage else 0
-        total_skill_edits = sum(s["manage_count"] for s in skill_usage) if skill_usage else 0
-        total_skill_actions = total_skill_loads + total_skill_edits
-
-        top_skills = []
-        for skill in skill_usage:
-            total_count = skill["view_count"] + skill["manage_count"]
-            percentage = (total_count / total_skill_actions * 100) if total_skill_actions else 0
-            top_skills.append({
-                "skill": skill["skill"],
-                "view_count": skill["view_count"],
-                "manage_count": skill["manage_count"],
-                "total_count": total_count,
-                "percentage": percentage,
-                "last_used_at": skill.get("last_used_at"),
-            })
-
-        top_skills.sort(
-            key=lambda s: (
-                s["total_count"],
-                s["view_count"],
-                s["manage_count"],
-                s["last_used_at"] or 0,
-                s["skill"],
-            ),
-            reverse=True,
-        )
-
-        return {
-            "summary": {
-                "total_skill_loads": total_skill_loads,
-                "total_skill_edits": total_skill_edits,
-                "total_skill_actions": total_skill_actions,
-                "distinct_skills_used": len(skill_usage),
-            },
-            "top_skills": top_skills,
-        }
-
    def _compute_activity_patterns(self, sessions: List[Dict]) -> Dict:
        """Analyze activity patterns by day of week and hour."""
        day_counts = Counter()  # 0=Monday ... 6=Sunday
@@ -762,7 +670,10 @@ class InsightsEngine:
        lines.append(f"  Sessions:          {o['total_sessions']:<12}  Messages:        {o['total_messages']:,}")
        lines.append(f"  Tool calls:        {o['total_tool_calls']:<12,}  User messages:   {o['user_messages']:,}")
        lines.append(f"  Input tokens:      {o['total_input_tokens']:<12,}  Output tokens:   {o['total_output_tokens']:,}")
-        lines.append(f"  Total tokens:      {o['total_tokens']:,}")
+        cost_str = f"${o['estimated_cost']:.2f}"
+        if o.get("models_without_pricing"):
+            cost_str += " *"
+        lines.append(f"  Total tokens:      {o['total_tokens']:<12,}  Est. cost:       {cost_str}")
        if o["total_hours"] > 0:
            lines.append(f"  Active time:       ~{_format_duration(o['total_hours'] * 3600):<11}  Avg session:     ~{_format_duration(o['avg_session_duration'])}")
        lines.append(f"  Avg msgs/session:  {o['avg_messages_per_session']:.1f}")
@@ -772,10 +683,16 @@ class InsightsEngine:
        if report["models"]:
            lines.append("  🤖 Models Used")
            lines.append("  " + "─" * 56)
-            lines.append(f"  {'Model':<30} {'Sessions':>8} {'Tokens':>12}")
+            lines.append(f"  {'Model':<30} {'Sessions':>8} {'Tokens':>12} {'Cost':>8}")
            for m in report["models"]:
                model_name = m["model"][:28]
-                lines.append(f"  {model_name:<30} {m['sessions']:>8} {m['total_tokens']:>12,}")
+                if m.get("has_pricing"):
+                    cost_cell = f"${m['cost']:>6.2f}"
+                else:
+                    cost_cell = "     N/A"
+                lines.append(f"  {model_name:<30} {m['sessions']:>8} {m['total_tokens']:>12,} {cost_cell}")
+            if o.get("models_without_pricing"):
+                lines.append("  * Cost N/A for custom/self-hosted models")
            lines.append("")

        # Platform breakdown
@@ -798,28 +715,6 @@ class InsightsEngine:
                lines.append(f"  ... and {len(report['tools']) - 15} more tools")
            lines.append("")

-        # Skill usage
-        skills = report.get("skills", {})
-        top_skills = skills.get("top_skills", [])
-        if top_skills:
-            lines.append("  🧠 Top Skills")
-            lines.append("  " + "─" * 56)
-            lines.append(f"  {'Skill':<28} {'Loads':>7} {'Edits':>7} {'Last used':>11}")
-            for skill in top_skills[:10]:
-                last_used = "—"
-                if skill.get("last_used_at"):
-                    last_used = datetime.fromtimestamp(skill["last_used_at"]).strftime("%b %d")
-                lines.append(
-                    f"  {skill['skill'][:28]:<28} {skill['view_count']:>7,} {skill['manage_count']:>7,} {last_used:>11}"
-                )
-            summary = skills.get("summary", {})
-            lines.append(
-                f"  Distinct skills: {summary.get('distinct_skills_used', 0)}  "
-                f"Loads: {summary.get('total_skill_loads', 0):,}  "
-                f"Edits: {summary.get('total_skill_edits', 0):,}"
-            )
-            lines.append("")
-
        # Activity patterns
        act = report.get("activity", {})
        if act.get("by_day"):
@@ -878,6 +773,10 @@ class InsightsEngine:
        # Overview
        lines.append(f"**Sessions:** {o['total_sessions']} | **Messages:** {o['total_messages']:,} | **Tool calls:** {o['total_tool_calls']:,}")
        lines.append(f"**Tokens:** {o['total_tokens']:,} (in: {o['total_input_tokens']:,} / out: {o['total_output_tokens']:,})")
+        cost_note = ""
+        if o.get("models_without_pricing"):
+            cost_note = " _(excludes custom/self-hosted models)_"
+        lines.append(f"**Est. cost:** ${o['estimated_cost']:.2f}{cost_note}")
        if o["total_hours"] > 0:
            lines.append(f"**Active time:** ~{_format_duration(o['total_hours'] * 3600)} | **Avg session:** ~{_format_duration(o['avg_session_duration'])}")
        lines.append("")
@@ -886,7 +785,8 @@ class InsightsEngine:
        if report["models"]:
            lines.append("**🤖 Models:**")
            for m in report["models"][:5]:
-                lines.append(f"  {m['model'][:25]} — {m['sessions']} sessions, {m['total_tokens']:,} tokens")
+                cost_str = f"${m['cost']:.2f}" if m.get("has_pricing") else "N/A"
+                lines.append(f"  {m['model'][:25]} — {m['sessions']} sessions, {m['total_tokens']:,} tokens, {cost_str}")
            lines.append("")

        # Platforms (if multi-platform)
@@ -903,18 +803,6 @@ class InsightsEngine:
                lines.append(f"  {t['tool']} — {t['count']:,} calls ({t['percentage']:.1f}%)")
            lines.append("")

-        skills = report.get("skills", {})
-        if skills.get("top_skills"):
-            lines.append("**🧠 Top Skills:**")
-            for skill in skills["top_skills"][:5]:
-                suffix = ""
-                if skill.get("last_used_at"):
-                    suffix = f", last used {datetime.fromtimestamp(skill['last_used_at']).strftime('%b %d')}"
-                lines.append(
-                    f"  {skill['skill']} — {skill['view_count']:,} loads, {skill['manage_count']:,} edits{suffix}"
-                )
-            lines.append("")
-
        # Activity summary
        act = report.get("activity", {})
        if act.get("busiest_day") and act.get("busiest_hour"):
--- a/agent/model_metadata.py
+++ b/agent/model_metadata.py
@@ -0,0 +1,224 @@
+"""Model metadata, context lengths, and token estimation utilities.
+
+Pure utility functions with no AIAgent dependency. Used by ContextCompressor
+and run_agent.py for pre-flight context checks.
+"""
+
+import logging
+import os
+import re
+import time
+from pathlib import Path
+from typing import Any, Dict, List, Optional
+
+import requests
+import yaml
+
+from hermes_constants import OPENROUTER_MODELS_URL
+
+logger = logging.getLogger(__name__)
+
+_model_metadata_cache: Dict[str, Dict[str, Any]] = {}
+_model_metadata_cache_time: float = 0
+_MODEL_CACHE_TTL = 3600
+
+# Descending tiers for context length probing when the model is unknown.
+# We start high and step down on context-length errors until one works.
+CONTEXT_PROBE_TIERS = [
+    2_000_000,
+    1_000_000,
+    512_000,
+    200_000,
+    128_000,
+    64_000,
+    32_000,
+]
+
+DEFAULT_CONTEXT_LENGTHS = {
+    "anthropic/claude-opus-4": 200000,
+    "anthropic/claude-opus-4.5": 200000,
+    "anthropic/claude-opus-4.6": 200000,
+    "anthropic/claude-sonnet-4": 200000,
+    "anthropic/claude-sonnet-4-20250514": 200000,
+    "anthropic/claude-haiku-4.5": 200000,
+    "openai/gpt-4o": 128000,
+    "openai/gpt-4-turbo": 128000,
+    "openai/gpt-4o-mini": 128000,
+    "google/gemini-2.0-flash": 1048576,
+    "google/gemini-2.5-pro": 1048576,
+    "meta-llama/llama-3.3-70b-instruct": 131072,
+    "deepseek/deepseek-chat-v3": 65536,
+    "qwen/qwen-2.5-72b-instruct": 32768,
+    "glm-4.7": 202752,
+    "glm-5": 202752,
+    "glm-4.5": 131072,
+    "glm-4.5-flash": 131072,
+    "kimi-k2.5": 262144,
+    "kimi-k2-thinking": 262144,
+    "kimi-k2-turbo-preview": 262144,
+    "kimi-k2-0905-preview": 131072,
+    "MiniMax-M2.5": 204800,
+    "MiniMax-M2.5-highspeed": 204800,
+    "MiniMax-M2.1": 204800,
+}
+
+
+def fetch_model_metadata(force_refresh: bool = False) -> Dict[str, Dict[str, Any]]:
+    """Fetch model metadata from OpenRouter (cached for 1 hour)."""
+    global _model_metadata_cache, _model_metadata_cache_time
+
+    if not force_refresh and _model_metadata_cache and (time.time() - _model_metadata_cache_time) < _MODEL_CACHE_TTL:
+        return _model_metadata_cache
+
+    try:
+        response = requests.get(OPENROUTER_MODELS_URL, timeout=10)
+        response.raise_for_status()
+        data = response.json()
+
+        cache = {}
+        for model in data.get("data", []):
+            model_id = model.get("id", "")
+            cache[model_id] = {
+                "context_length": model.get("context_length", 128000),
+                "max_completion_tokens": model.get("top_provider", {}).get("max_completion_tokens", 4096),
+                "name": model.get("name", model_id),
+                "pricing": model.get("pricing", {}),
+            }
+            canonical = model.get("canonical_slug", "")
+            if canonical and canonical != model_id:
+                cache[canonical] = cache[model_id]
+
+        _model_metadata_cache = cache
+        _model_metadata_cache_time = time.time()
+        logger.debug("Fetched metadata for %s models from OpenRouter", len(cache))
+        return cache
+
+    except Exception as e:
+        logger.warning("Failed to fetch model metadata from OpenRouter: %s", e)
+        return _model_metadata_cache or {}
+
+
+def _get_context_cache_path() -> Path:
+    """Return path to the persistent context length cache file."""
+    hermes_home = Path(os.environ.get("HERMES_HOME", Path.home() / ".hermes"))
+    return hermes_home / "context_length_cache.yaml"
+
+
+def _load_context_cache() -> Dict[str, int]:
+    """Load the model+provider → context_length cache from disk."""
+    path = _get_context_cache_path()
+    if not path.exists():
+        return {}
+    try:
+        with open(path) as f:
+            data = yaml.safe_load(f) or {}
+        return data.get("context_lengths", {})
+    except Exception as e:
+        logger.debug("Failed to load context length cache: %s", e)
+        return {}
+
+
+def save_context_length(model: str, base_url: str, length: int) -> None:
+    """Persist a discovered context length for a model+provider combo.
+
+    Cache key is ``model@base_url`` so the same model name served from
+    different providers can have different limits.
+    """
+    key = f"{model}@{base_url}"
+    cache = _load_context_cache()
+    if cache.get(key) == length:
+        return  # already stored
+    cache[key] = length
+    path = _get_context_cache_path()
+    try:
+        path.parent.mkdir(parents=True, exist_ok=True)
+        with open(path, "w") as f:
+            yaml.dump({"context_lengths": cache}, f, default_flow_style=False)
+        logger.info("Cached context length %s → %s tokens", key, f"{length:,}")
+    except Exception as e:
+        logger.debug("Failed to save context length cache: %s", e)
+
+
+def get_cached_context_length(model: str, base_url: str) -> Optional[int]:
+    """Look up a previously discovered context length for model+provider."""
+    key = f"{model}@{base_url}"
+    cache = _load_context_cache()
+    return cache.get(key)
+
+
+def get_next_probe_tier(current_length: int) -> Optional[int]:
+    """Return the next lower probe tier, or None if already at minimum."""
+    for tier in CONTEXT_PROBE_TIERS:
+        if tier < current_length:
+            return tier
+    return None
+
+
+def parse_context_limit_from_error(error_msg: str) -> Optional[int]:
+    """Try to extract the actual context limit from an API error message.
+
+    Many providers include the limit in their error text, e.g.:
+      - "maximum context length is 32768 tokens"
+      - "context_length_exceeded: 131072"
+      - "Maximum context size 32768 exceeded"
+      - "model's max context length is 65536"
+    """
+    error_lower = error_msg.lower()
+    # Pattern: look for numbers near context-related keywords
+    patterns = [
+        r'(?:max(?:imum)?|limit)\s*(?:context\s*)?(?:length|size|window)?\s*(?:is|of|:)?\s*(\d{4,})',
+        r'context\s*(?:length|size|window)\s*(?:is|of|:)?\s*(\d{4,})',
+        r'(\d{4,})\s*(?:token)?\s*(?:context|limit)',
+        r'>\s*(\d{4,})\s*(?:max|limit|token)',  # "250000 tokens > 200000 maximum"
+        r'(\d{4,})\s*(?:max(?:imum)?)\b',  # "200000 maximum"
+    ]
+    for pattern in patterns:
+        match = re.search(pattern, error_lower)
+        if match:
+            limit = int(match.group(1))
+            # Sanity check: must be a reasonable context length
+            if 1024 <= limit <= 10_000_000:
+                return limit
+    return None
+
+
+def get_model_context_length(model: str, base_url: str = "") -> int:
+    """Get the context length for a model.
+
+    Resolution order:
+    1. Persistent cache (previously discovered via probing)
+    2. OpenRouter API metadata
+    3. Hardcoded DEFAULT_CONTEXT_LENGTHS (fuzzy match)
+    4. First probe tier (2M) — will be narrowed on first context error
+    """
+    # 1. Check persistent cache (model+provider)
+    if base_url:
+        cached = get_cached_context_length(model, base_url)
+        if cached is not None:
+            return cached
+
+    # 2. OpenRouter API metadata
+    metadata = fetch_model_metadata()
+    if model in metadata:
+        return metadata[model].get("context_length", 128000)
+
+    # 3. Hardcoded defaults (fuzzy match)
+    for default_model, length in DEFAULT_CONTEXT_LENGTHS.items():
+        if default_model in model or model in default_model:
+            return length
+
+    # 4. Unknown model — start at highest probe tier
+    return CONTEXT_PROBE_TIERS[0]
+
+
+def estimate_tokens_rough(text: str) -> int:
+    """Rough token estimate (~4 chars/token) for pre-flight checks."""
+    if not text:
+        return 0
+    return len(text) // 4
+
+
+def estimate_messages_tokens_rough(messages: List[Dict[str, Any]]) -> int:
+    """Rough token estimate for a message list (pre-flight only)."""
+    total_chars = sum(len(str(msg)) for msg in messages)
+    return total_chars // 4
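
Note on the probing helpers in agent/model_metadata.py: the sketch below shows how a caller might combine parse_context_limit_from_error and get_next_probe_tier after a context-length error. It is a minimal illustration, not the code path used by run_agent.py. The name next_context_length is hypothetical, and only one of the five regex patterns from the diff is reproduced.

```python
import re
from typing import Optional

# Copied from CONTEXT_PROBE_TIERS in the diff above.
CONTEXT_PROBE_TIERS = [2_000_000, 1_000_000, 512_000, 200_000, 128_000, 64_000, 32_000]

# First pattern from parse_context_limit_from_error, used alone for brevity.
_LIMIT_RE = re.compile(
    r"(?:max(?:imum)?|limit)\s*(?:context\s*)?(?:length|size|window)?\s*(?:is|of|:)?\s*(\d{4,})"
)


def next_context_length(current: int, error_msg: str) -> Optional[int]:
    """Hypothetical helper: choose the context length to retry with.

    Prefer a limit stated in the error text; otherwise step down to the
    next lower probe tier. Returns None when no lower tier is left.
    """
    match = _LIMIT_RE.search(error_msg.lower())
    if match:
        limit = int(match.group(1))
        # Same sanity bounds as the diff: reject implausible values.
        if 1024 <= limit <= 10_000_000:
            return limit
    for tier in CONTEXT_PROBE_TIERS:
        if tier < current:
            return tier
    return None
```

A caller would presumably persist the value that finally works with save_context_length(model, base_url, length), so later runs skip the probing.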
--- a/agent/prompt_builder.py
+++ b/agent/prompt_builder.py
@@ -0,0 +1,378 @@
+"""System prompt assembly -- identity, platform hints, skills index, context files.
+
+All functions are stateless. AIAgent._build_system_prompt() calls these to
+assemble pieces, then combines them with memory and ephemeral prompts.
+"""
+
+import logging
+import os
+import re
+from pathlib import Path
+from typing import Optional
+
+logger = logging.getLogger(__name__)
+
+# ---------------------------------------------------------------------------
+# Context file scanning — detect prompt injection in AGENTS.md, .cursorrules,
+# SOUL.md before they get injected into the system prompt.
+# ---------------------------------------------------------------------------
+
+_CONTEXT_THREAT_PATTERNS = [
+    (r'ignore\s+(previous|all|above|prior)\s+instructions', "prompt_injection"),
+    (r'do\s+not\s+tell\s+the\s+user', "deception_hide"),
+    (r'system\s+prompt\s+override', "sys_prompt_override"),
+    (r'disregard\s+(your|all|any)\s+(instructions|rules|guidelines)', "disregard_rules"),
+    (r'act\s+as\s+(if|though)\s+you\s+(have\s+no|don\'t\s+have)\s+(restrictions|limits|rules)', "bypass_restrictions"),
+    (r'<!--[^>]*(?:ignore|override|system|secret|hidden)[^>]*-->', "html_comment_injection"),
+    (r'<\s*div\s+style\s*=\s*["\'].*display\s*:\s*none', "hidden_div"),
+    (r'translate\s+.*\s+into\s+.*\s+and\s+(execute|run|eval)', "translate_execute"),
+    (r'curl\s+[^\n]*\$\{?\w*(KEY|TOKEN|SECRET|PASSWORD|CREDENTIAL|API)', "exfil_curl"),
+    (r'cat\s+[^\n]*(\.env|credentials|\.netrc|\.pgpass)', "read_secrets"),
+]
+
+_CONTEXT_INVISIBLE_CHARS = {
+    '\u200b', '\u200c', '\u200d', '\u2060', '\ufeff',
+    '\u202a', '\u202b', '\u202c', '\u202d', '\u202e',
+}
+
+
+def _scan_context_content(content: str, filename: str) -> str:
+    """Scan context file content for injection. Returns sanitized content."""
+    findings = []
+
+    # Check invisible unicode
+    for char in _CONTEXT_INVISIBLE_CHARS:
+        if char in content:
+            findings.append(f"invisible unicode U+{ord(char):04X}")
+
+    # Check threat patterns
+    for pattern, pid in _CONTEXT_THREAT_PATTERNS:
+        if re.search(pattern, content, re.IGNORECASE):
+            findings.append(pid)
+
+    if findings:
+        logger.warning("Context file %s blocked: %s", filename, ", ".join(findings))
+        return f"[BLOCKED: {filename} contained potential prompt injection ({', '.join(findings)}). Content not loaded.]"
+
+    return content
+
+# =========================================================================
+# Constants
+# =========================================================================
+
+DEFAULT_AGENT_IDENTITY = (
+    "You are Hermes Agent, an intelligent AI assistant created by Nous Research. "
+    "You are helpful, knowledgeable, and direct. You assist users with a wide "
+    "range of tasks including answering questions, writing and editing code, "
+    "analyzing information, creative work, and executing actions via your tools. "
+    "You communicate clearly, admit uncertainty when appropriate, and prioritize "
+    "being genuinely useful over being verbose unless otherwise directed below. "
+    "Be targeted and efficient in your exploration and investigations."
+)
+
+MEMORY_GUIDANCE = (
+    "You have persistent memory across sessions. Proactively save important things "
+    "you learn (user preferences, environment details, useful approaches) and do "
+    "(like a diary!) using the memory tool -- don't wait to be asked."
+)
+
+SESSION_SEARCH_GUIDANCE = (
+    "When the user references something from a past conversation or you suspect "
+    "relevant prior context exists, use session_search to recall it before asking "
+    "them to repeat themselves."
+)
+
+SKILLS_GUIDANCE = (
+    "After completing a complex task (5+ tool calls), fixing a tricky error, "
+    "or discovering a non-trivial workflow, consider saving the approach as a "
+    "skill with skill_manage so you can reuse it next time."
+)
+
+PLATFORM_HINTS = {
+    "whatsapp": (
+        "You are on a text messaging communication platform, WhatsApp. "
+        "Please do not use markdown as it does not render. "
+        "You can send media files natively: to deliver a file to the user, "
+        "include MEDIA:/absolute/path/to/file in your response. The file "
+        "will be sent as a native WhatsApp attachment — images (.jpg, .png, "
+        ".webp) appear as photos, videos (.mp4, .mov) play inline, and other "
+        "files arrive as downloadable documents. You can also include image "
+        "URLs in markdown format ![alt](url) and they will be sent as photos."
+    ),
+    "telegram": (
+        "You are on a text messaging communication platform, Telegram. "
+        "Please do not use markdown as it does not render. "
+        "You can send media files natively: to deliver a file to the user, "
+        "include MEDIA:/absolute/path/to/file in your response. Images "
+        "(.png, .jpg, .webp) appear as photos, audio (.ogg) sends as voice "
+        "bubbles, and videos (.mp4) play inline. You can also include image "
+        "URLs in markdown format ![alt](url) and they will be sent as native photos."
+    ),
+    "discord": (
+        "You are in a Discord server or group chat communicating with your user. "
+        "You can send media files natively: include MEDIA:/absolute/path/to/file "
+        "in your response. Images (.png, .jpg, .webp) are sent as photo "
+        "attachments, audio as file attachments. You can also include image URLs "
+        "in markdown format ![alt](url) and they will be sent as attachments."
+    ),
+    "slack": (
+        "You are in a Slack workspace communicating with your user. "
+        "You can send media files natively: include MEDIA:/absolute/path/to/file "
+        "in your response. Images (.png, .jpg, .webp) are uploaded as photo "
+        "attachments, audio as file attachments. You can also include image URLs "
+        "in markdown format ![alt](url) and they will be uploaded as attachments."
+    ),
+    "signal": (
+        "You are on a text messaging communication platform, Signal. "
+        "Please do not use markdown as it does not render. "
+        "You can send media files natively: to deliver a file to the user, "
+        "include MEDIA:/absolute/path/to/file in your response. Images "
+        "(.png, .jpg, .webp) appear as photos, audio as attachments, and other "
+        "files arrive as downloadable documents. You can also include image "
+        "URLs in markdown format ![alt](url) and they will be sent as photos."
+    ),
+    "cli": (
+        "You are a CLI AI Agent. Try not to use markdown but simple text "
+        "renderable inside a terminal."
+    ),
+}
+
+CONTEXT_FILE_MAX_CHARS = 20_000
+CONTEXT_TRUNCATE_HEAD_RATIO = 0.7
+CONTEXT_TRUNCATE_TAIL_RATIO = 0.2
+
+
+# =========================================================================
+# Skills index
+# =========================================================================
+
+def _read_skill_description(skill_file: Path, max_chars: int = 60) -> str:
+    """Read the description from a SKILL.md frontmatter, capped at max_chars."""
+    try:
+        raw = skill_file.read_text(encoding="utf-8")[:2000]
+        match = re.search(
+            r"^---\s*\n.*?description:\s*(.+?)\s*\n.*?^---",
+            raw, re.MULTILINE | re.DOTALL,
+        )
+        if match:
+            desc = match.group(1).strip().strip("'\"")
+            if len(desc) > max_chars:
+                desc = desc[:max_chars - 3] + "..."
+            return desc
+    except Exception:
+        pass
+    return ""
+
+
+def _skill_is_platform_compatible(skill_file: Path) -> bool:
+    """Quick check if a SKILL.md is compatible with the current OS platform.
+
+    Reads just enough to parse the ``platforms`` frontmatter field.
+    Skills without the field (the vast majority) are always compatible.
+    """
+    try:
+        from tools.skills_tool import _parse_frontmatter, skill_matches_platform
+        raw = skill_file.read_text(encoding="utf-8")[:2000]
+        frontmatter, _ = _parse_frontmatter(raw)
+        return skill_matches_platform(frontmatter)
+    except Exception:
+        return True  # Err on the side of showing the skill
+
+
+def build_skills_system_prompt() -> str:
+    """Build a compact skill index for the system prompt.
+
+    Scans ~/.hermes/skills/ for SKILL.md files grouped by category.
+    Includes per-skill descriptions from frontmatter so the model can
+    match skills by meaning, not just name.
+    Filters out skills incompatible with the current OS platform.
+    """
+    hermes_home = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
+    skills_dir = hermes_home / "skills"
+
+    if not skills_dir.exists():
+        return ""
+
+    # Collect skills with descriptions, grouped by category
+    # Each entry: (skill_name, description)
+    skills_by_category: dict[str, list[tuple[str, str]]] = {}
+    for skill_file in skills_dir.rglob("SKILL.md"):
+        # Skip skills incompatible with the current OS platform
+        if not _skill_is_platform_compatible(skill_file):
+            continue
+        rel_path = skill_file.relative_to(skills_dir)
+        parts = rel_path.parts
+        if len(parts) >= 2:
+            category = parts[0]
+            skill_name = parts[-2]
+        else:
+            category = "general"
+            skill_name = skill_file.parent.name
+        desc = _read_skill_description(skill_file)
+        skills_by_category.setdefault(category, []).append((skill_name, desc))
+
+    if not skills_by_category:
+        return ""
+
+    # Read category-level descriptions from DESCRIPTION.md
+    category_descriptions = {}
+    for category in skills_by_category:
+        desc_file = skills_dir / category / "DESCRIPTION.md"
+        if desc_file.exists():
+            try:
+                content = desc_file.read_text(encoding="utf-8")
+                match = re.search(r"^---\s*\n.*?description:\s*(.+?)\s*\n.*?^---", content, re.MULTILINE | re.DOTALL)
+                if match:
+                    category_descriptions[category] = match.group(1).strip()
+            except Exception as e:
+                logger.debug("Could not read skill description %s: %s", desc_file, e)
+
+    index_lines = []
+    for category in sorted(skills_by_category.keys()):
+        cat_desc = category_descriptions.get(category, "")
+        if cat_desc:
+            index_lines.append(f"  {category}: {cat_desc}")
+        else:
+            index_lines.append(f"  {category}:")
+        # Deduplicate and sort skills within each category
+        seen = set()
+        for name, desc in sorted(skills_by_category[category], key=lambda x: x[0]):
+            if name in seen:
+                continue
+            seen.add(name)
+            if desc:
+                index_lines.append(f"    - {name}: {desc}")
+            else:
+                index_lines.append(f"    - {name}")
+
+    return (
+        "## Skills (mandatory)\n"
+        "Before replying, scan the skills below. If one clearly matches your task, "
+        "load it with skill_view(name) and follow its instructions. "
+        "If a skill has issues, fix it with skill_manage(action='patch').\n"
+        "\n"
+        "<available_skills>\n"
+        + "\n".join(index_lines) + "\n"
+        "</available_skills>\n"
+        "\n"
+        "If none match, proceed normally without loading a skill."
+    )
+
+
+# =========================================================================
+# Context files (SOUL.md, AGENTS.md, .cursorrules)
+# =========================================================================
+
+def _truncate_content(content: str, filename: str, max_chars: int = CONTEXT_FILE_MAX_CHARS) -> str:
+    """Head/tail truncation with a marker in the middle."""
+    if len(content) <= max_chars:
+        return content
+    head_chars = int(max_chars * CONTEXT_TRUNCATE_HEAD_RATIO)
+    tail_chars = int(max_chars * CONTEXT_TRUNCATE_TAIL_RATIO)
+    head = content[:head_chars]
+    tail = content[-tail_chars:]
+    marker = f"\n\n[...truncated {filename}: kept {head_chars}+{tail_chars} of {len(content)} chars. Use file tools to read the full file.]\n\n"
+    return head + marker + tail
+
+
+def build_context_files_prompt(cwd: Optional[str] = None) -> str:
+    """Discover and load context files for the system prompt.
+
+    Discovery: AGENTS.md (recursive), .cursorrules / .cursor/rules/*.mdc,
+    SOUL.md (cwd then ~/.hermes/ fallback). Each capped at 20,000 chars.
+    """
+    if cwd is None:
+        cwd = os.getcwd()
+
+    cwd_path = Path(cwd).resolve()
+    sections = []
+
+    # AGENTS.md (hierarchical, recursive)
+    top_level_agents = None
+    for name in ["AGENTS.md", "agents.md"]:
+        candidate = cwd_path / name
+        if candidate.exists():
+            top_level_agents = candidate
+            break
+
+    if top_level_agents:
+        agents_files = []
+        for root, dirs, files in os.walk(cwd_path):
+            dirs[:] = [d for d in dirs if not d.startswith('.') and d not in ('node_modules', '__pycache__', 'venv', '.venv')]
+            for f in files:
+                if f.lower() == "agents.md":
+                    agents_files.append(Path(root) / f)
+        agents_files.sort(key=lambda p: len(p.parts))
+
+        total_agents_content = ""
+        for agents_path in agents_files:
+            try:
+                content = agents_path.read_text(encoding="utf-8").strip()
+                if content:
+                    rel_path = agents_path.relative_to(cwd_path)
+                    content = _scan_context_content(content, str(rel_path))
+                    total_agents_content += f"## {rel_path}\n\n{content}\n\n"
+            except Exception as e:
+                logger.debug("Could not read %s: %s", agents_path, e)
+
+        if total_agents_content:
+            total_agents_content = _truncate_content(total_agents_content, "AGENTS.md")
+            sections.append(total_agents_content)
+
+    # .cursorrules
+    cursorrules_content = ""
+    cursorrules_file = cwd_path / ".cursorrules"
+    if cursorrules_file.exists():
+        try:
+            content = cursorrules_file.read_text(encoding="utf-8").strip()
+            if content:
+                content = _scan_context_content(content, ".cursorrules")
+                cursorrules_content += f"## .cursorrules\n\n{content}\n\n"
+        except Exception as e:
+            logger.debug("Could not read .cursorrules: %s", e)
+
+    cursor_rules_dir = cwd_path / ".cursor" / "rules"
+    if cursor_rules_dir.exists() and cursor_rules_dir.is_dir():
+        mdc_files = sorted(cursor_rules_dir.glob("*.mdc"))
+        for mdc_file in mdc_files:
+            try:
+                content = mdc_file.read_text(encoding="utf-8").strip()
+                if content:
+                    content = _scan_context_content(content, f".cursor/rules/{mdc_file.name}")
+                    cursorrules_content += f"## .cursor/rules/{mdc_file.name}\n\n{content}\n\n"
+            except Exception as e:
+                logger.debug("Could not read %s: %s", mdc_file, e)
+
+    if cursorrules_content:
+        cursorrules_content = _truncate_content(cursorrules_content, ".cursorrules")
+        sections.append(cursorrules_content)
+
+    # SOUL.md (cwd first, then ~/.hermes/ fallback)
+    soul_path = None
+    for name in ["SOUL.md", "soul.md"]:
+        candidate = cwd_path / name
+        if candidate.exists():
+            soul_path = candidate
+            break
+    if not soul_path:
+        global_soul = Path.home() / ".hermes" / "SOUL.md"
+        if global_soul.exists():
+            soul_path = global_soul
+
+    if soul_path:
+        try:
+            content = soul_path.read_text(encoding="utf-8").strip()
+            if content:
+                content = _scan_context_content(content, "SOUL.md")
+                content = _truncate_content(content, "SOUL.md")
+                sections.append(
+                    f"## SOUL.md\n\nIf SOUL.md is present, embody its persona and tone. "
+                    f"Avoid stiff, generic replies; follow its guidance unless higher-priority "
+                    f"instructions override it.\n\n{content}"
+                )
+        except Exception as e:
+            logger.debug("Could not read SOUL.md from %s: %s", soul_path, e)
+
+    if not sections:
+        return ""
+    return "# Project Context\n\nThe following project context files have been loaded and should be followed:\n\n" + "\n".join(sections)
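
Note on _truncate_content in agent/prompt_builder.py: the head/tail split can be tried in isolation with the sketch below. It mirrors the 0.7/0.2 ratios from the diff. The function name truncate_head_tail and the shortened marker text are illustrative assumptions.

```python
HEAD_RATIO = 0.7  # mirrors CONTEXT_TRUNCATE_HEAD_RATIO
TAIL_RATIO = 0.2  # mirrors CONTEXT_TRUNCATE_TAIL_RATIO


def truncate_head_tail(content: str, filename: str, max_chars: int) -> str:
    """Keep the start and end of content, with a marker where text was cut."""
    if len(content) <= max_chars:
        return content
    head_chars = int(max_chars * HEAD_RATIO)
    tail_chars = int(max_chars * TAIL_RATIO)
    marker = (
        f"\n\n[...truncated {filename}: kept {head_chars}+{tail_chars} "
        f"of {len(content)} chars.]\n\n"
    )
    return content[:head_chars] + marker + content[-tail_chars:]


if __name__ == "__main__":
    sample = "".join(f"line {i}\n" for i in range(200))
    print(truncate_head_tail(sample, "AGENTS.md", max_chars=100))
```

The two ratios sum to 0.9, which leaves roughly 10% of the budget for the marker. With very small max_chars values, as in this demo, the marker alone can push the result past max_chars; at the 20,000-character default this should not happen.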
--- a/hermes_agent/providers/caching.py
+++ b/hermes_agent/providers/caching.py
@@ -12,24 +12,21 @@ import copy
 from typing import Any, Dict, List


-def _apply_cache_marker(msg: dict, cache_marker: dict, native_anthropic: bool = False) -> None:
+def _apply_cache_marker(msg: dict, cache_marker: dict) -> None:
    """Add cache_control to a single message, handling all format variations."""
    role = msg.get("role", "")
    content = msg.get("content")

    if role == "tool":
-        if native_anthropic:
-            msg["cache_control"] = cache_marker
+        msg["cache_control"] = cache_marker
        return

-    if content is None or content == "":
+    if content is None:
        msg["cache_control"] = cache_marker
        return

    if isinstance(content, str):
-        msg["content"] = [
-            {"type": "text", "text": content, "cache_control": cache_marker}
-        ]
+        msg["content"] = [{"type": "text", "text": content, "cache_control": cache_marker}]
        return

    if isinstance(content, list) and content:
@@ -41,7 +38,6 @@ def _apply_cache_marker(msg: dict, cache_marker: dict, native_anthropic: bool =
 def apply_anthropic_cache_control(
    api_messages: List[Dict[str, Any]],
    cache_ttl: str = "5m",
-    native_anthropic: bool = False,
 ) -> List[Dict[str, Any]]:
    """Apply system_and_3 caching strategy to messages for Anthropic models.

@@ -61,12 +57,12 @@ def apply_anthropic_cache_control(
    breakpoints_used = 0

    if messages[0].get("role") == "system":
-        _apply_cache_marker(messages[0], marker, native_anthropic=native_anthropic)
+        _apply_cache_marker(messages[0], marker)
        breakpoints_used += 1

    remaining = 4 - breakpoints_used
    non_sys = [i for i in range(len(messages)) if messages[i].get("role") != "system"]
    for idx in non_sys[-remaining:]:
-        _apply_cache_marker(messages[idx], marker, native_anthropic=native_anthropic)
+        _apply_cache_marker(messages[idx], marker)

    return messages
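
Note on the system_and_3 strategy in providers/caching.py: the selection logic can be read as "mark the system message, then spend the remaining breakpoints on the last non-system messages". The sketch below reproduces only that index selection, under the assumption that the budget is the 4 breakpoints used in the hunk. The function name pick_cache_breakpoints is hypothetical.

```python
from typing import Any, Dict, List


def pick_cache_breakpoints(
    messages: List[Dict[str, Any]], max_breakpoints: int = 4
) -> List[int]:
    """Return indices of messages that would receive a cache_control marker."""
    chosen: List[int] = []
    # A leading system message takes the first breakpoint.
    if messages and messages[0].get("role") == "system":
        chosen.append(0)
    remaining = max_breakpoints - len(chosen)
    # The rest go to the most recent non-system messages.
    non_sys = [i for i, m in enumerate(messages) if m.get("role") != "system"]
    chosen.extend(non_sys[-remaining:])
    return chosen
```

With this change the marker is applied to tool messages unconditionally, so a tool message among the last three non-system entries is marked like any other.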
--- a/agent/redact.py
+++ b/agent/redact.py
@@ -0,0 +1,161 @@
+"""Regex-based secret redaction for logs and tool output.
+
+Applies pattern matching to mask API keys, tokens, and credentials
+before they reach log files, verbose output, or gateway logs.
+
+Short tokens (< 18 chars) are fully masked. Longer tokens preserve
+the first 6 and last 4 characters for debuggability.
+"""
+
+import logging
+import os
+import re
+from typing import Optional
+
+logger = logging.getLogger(__name__)
+
+# Known API key prefixes -- match the prefix + contiguous token chars
+_PREFIX_PATTERNS = [
+    r"sk-[A-Za-z0-9_-]{10,}",           # OpenAI / OpenRouter / Anthropic (sk-ant-*)
+    r"ghp_[A-Za-z0-9]{10,}",            # GitHub PAT (classic)
+    r"github_pat_[A-Za-z0-9_]{10,}",    # GitHub PAT (fine-grained)
+    r"xox[baprs]-[A-Za-z0-9-]{10,}",    # Slack tokens
+    r"AIza[A-Za-z0-9_-]{30,}",          # Google API keys
+    r"pplx-[A-Za-z0-9]{10,}",           # Perplexity
+    r"fal_[A-Za-z0-9_-]{10,}",          # Fal.ai
+    r"fc-[A-Za-z0-9]{10,}",             # Firecrawl
+    r"bb_live_[A-Za-z0-9_-]{10,}",      # BrowserBase
+    r"gAAAA[A-Za-z0-9_=-]{20,}",        # Codex encrypted tokens
+    r"AKIA[A-Z0-9]{16}",                # AWS Access Key ID
+    r"sk_live_[A-Za-z0-9]{10,}",        # Stripe secret key (live)
+    r"sk_test_[A-Za-z0-9]{10,}",        # Stripe secret key (test)
+    r"rk_live_[A-Za-z0-9]{10,}",        # Stripe restricted key
+    r"SG\.[A-Za-z0-9_-]{10,}",          # SendGrid API key
+    r"hf_[A-Za-z0-9]{10,}",             # HuggingFace token
+    r"r8_[A-Za-z0-9]{10,}",             # Replicate API token
+    r"npm_[A-Za-z0-9]{10,}",            # npm access token
+    r"pypi-[A-Za-z0-9_-]{10,}",         # PyPI API token
+    r"dop_v1_[A-Za-z0-9]{10,}",         # DigitalOcean PAT
+    r"doo_v1_[A-Za-z0-9]{10,}",         # DigitalOcean OAuth
+    r"am_[A-Za-z0-9_-]{10,}",           # AgentMail API key
+]
+
+# ENV assignment patterns: KEY=value where KEY contains a secret-like name
+_SECRET_ENV_NAMES = r"(?:API_?KEY|TOKEN|SECRET|PASSWORD|PASSWD|CREDENTIAL|AUTH)"
+_ENV_ASSIGN_RE = re.compile(
+    rf"([A-Z_]*{_SECRET_ENV_NAMES}[A-Z_]*)\s*=\s*(['\"]?)(\S+)\2",
+    re.IGNORECASE,
+)
+
+# JSON field patterns: "apiKey": "value", "token": "value", etc.
+_JSON_KEY_NAMES = r"(?:api_?[Kk]ey|token|secret|password|access_token|refresh_token|auth_token|bearer)"
+_JSON_FIELD_RE = re.compile(
+    rf'("{_JSON_KEY_NAMES}")\s*:\s*"([^"]+)"',
+    re.IGNORECASE,
+)
+
+# Authorization headers
+_AUTH_HEADER_RE = re.compile(
+    r"(Authorization:\s*Bearer\s+)(\S+)",
+    re.IGNORECASE,
+)
+
+# Telegram bot tokens: bot<digits>:<token> or <digits>:<alphanum>
+_TELEGRAM_RE = re.compile(
+    r"(bot)?(\d{8,}):([-A-Za-z0-9_]{30,})",
+)
+
+# Private key blocks: -----BEGIN RSA PRIVATE KEY----- ... -----END RSA PRIVATE KEY-----
+_PRIVATE_KEY_RE = re.compile(
+    r"-----BEGIN[A-Z ]*PRIVATE KEY-----[\s\S]*?-----END[A-Z ]*PRIVATE KEY-----"
+)
+
+# Database connection strings: protocol://user:PASSWORD@host
+# Catches postgres, mysql, mongodb, redis, amqp URLs and redacts the password
+_DB_CONNSTR_RE = re.compile(
+    r"((?:postgres(?:ql)?|mysql|mongodb(?:\+srv)?|redis|amqp)://[^:]+:)([^@]+)(@)",
+    re.IGNORECASE,
+)
+
+# E.164 phone numbers: +<country><number>, 7-15 digits
+# Negative lookahead prevents matching hex strings or identifiers
+_SIGNAL_PHONE_RE = re.compile(r"(\+[1-9]\d{6,14})(?![A-Za-z0-9])")
+
+# Compile known prefix patterns into one alternation
+_PREFIX_RE = re.compile(
+    r"(?<![A-Za-z0-9_-])(" + "|".join(_PREFIX_PATTERNS) + r")(?![A-Za-z0-9_-])"
+)
+
+
+def _mask_token(token: str) -> str:
+    """Mask a token, keeping the first 6 and last 4 characters of long tokens."""
+    if len(token) < 18:
+        return "***"
+    return f"{token[:6]}...{token[-4:]}"
+
+
+def redact_sensitive_text(text: str) -> str:
+    """Apply all redaction patterns to a block of text.
+
+    Safe to call on any string -- non-matching text passes through unchanged.
+    Disabled when security.redact_secrets is false in config.yaml; checked here via HERMES_REDACT_SECRETS (0/false/no/off).
+    """
+    if not text:
+        return text
+    if os.getenv("HERMES_REDACT_SECRETS", "").lower() in ("0", "false", "no", "off"):
+        return text
+
+    # Known prefixes (sk-, ghp_, etc.)
+    text = _PREFIX_RE.sub(lambda m: _mask_token(m.group(1)), text)
+
+    # ENV assignments: OPENAI_API_KEY=sk-abc...
+    def _redact_env(m):
+        name, quote, value = m.group(1), m.group(2), m.group(3)
+        return f"{name}={quote}{_mask_token(value)}{quote}"
+    text = _ENV_ASSIGN_RE.sub(_redact_env, text)
+
+    # JSON fields: "apiKey": "value"
+    def _redact_json(m):
+        key, value = m.group(1), m.group(2)
+        return f'{key}: "{_mask_token(value)}"'
+    text = _JSON_FIELD_RE.sub(_redact_json, text)
+
+    # Authorization headers
+    text = _AUTH_HEADER_RE.sub(
+        lambda m: m.group(1) + _mask_token(m.group(2)),
+        text,
+    )
+
+    # Telegram bot tokens
+    def _redact_telegram(m):
+        prefix = m.group(1) or ""
+        digits = m.group(2)
+        return f"{prefix}{digits}:***"
+    text = _TELEGRAM_RE.sub(_redact_telegram, text)
+
+    # Private key blocks
+    text = _PRIVATE_KEY_RE.sub("[REDACTED PRIVATE KEY]", text)
+
+    # Database connection string passwords
+    text = _DB_CONNSTR_RE.sub(lambda m: f"{m.group(1)}***{m.group(3)}", text)
+
+    # E.164 phone numbers (Signal, WhatsApp)
+    def _redact_phone(m):
+        phone = m.group(1)
+        if len(phone) <= 8:
+            return phone[:2] + "****" + phone[-2:]
+        return phone[:4] + "****" + phone[-4:]
+    text = _SIGNAL_PHONE_RE.sub(_redact_phone, text)
+
+    return text
+
+
+class RedactingFormatter(logging.Formatter):
+    """Log formatter that redacts secrets from all log messages."""
+
+    def __init__(self, fmt=None, datefmt=None, style='%', **kwargs):
+        super().__init__(fmt, datefmt, style, **kwargs)
+
+    def format(self, record: logging.LogRecord) -> str:
+        original = super().format(record)
+        return redact_sensitive_text(original)
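Reviewer note: the masking rule and the ENV-assignment pattern added above can be exercised in isolation. The sketch below copies those two pieces under stand-alone names (`mask_token`, `redact_env_assignments`, which are illustrative and not part of the diff) so the behaviour on short, long, and quoted values is easy to check. It is a minimal reproduction, not the full `redact_sensitive_text` pipeline.

```python
import re


def mask_token(token: str) -> str:
    """Hide short tokens entirely; keep first 6 and last 4 chars of long ones."""
    if len(token) < 18:
        return "***"
    return f"{token[:6]}...{token[-4:]}"


# Same pattern shape as _ENV_ASSIGN_RE in the diff: NAME = optional quote, value, same quote.
SECRET_ENV_NAMES = r"(?:API_?KEY|TOKEN|SECRET|PASSWORD|PASSWD|CREDENTIAL|AUTH)"
ENV_ASSIGN_RE = re.compile(
    rf"([A-Z_]*{SECRET_ENV_NAMES}[A-Z_]*)\s*=\s*(['\"]?)(\S+)\2",
    re.IGNORECASE,
)


def redact_env_assignments(text: str) -> str:
    """Mask the value of any KEY=value pair whose KEY looks secret-like."""
    def _sub(m: re.Match) -> str:
        name, quote, value = m.group(1), m.group(2), m.group(3)
        return f"{name}={quote}{mask_token(value)}{quote}"
    return ENV_ASSIGN_RE.sub(_sub, text)


if __name__ == "__main__":
    print(redact_env_assignments("OPENAI_API_KEY=sk-abcdefghijklmnopqrstuvwxyz"))
    print(redact_env_assignments('PASSWORD="hunter2"'))
    print(redact_env_assignments("plain text stays as it is"))
```

Note that whitespace around `=` is normalised away by the replacement, which matches the behaviour of `_redact_env` in the diff.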
--- a/agent/skill_commands.py
+++ b/agent/skill_commands.py
@@ -0,0 +1,116 @@
+"""Skill slash commands — scan installed skills and build invocation messages.
+
+Shared between CLI (cli.py) and gateway (gateway/run.py) so both surfaces
+can invoke skills via /skill-name commands.
+"""
+
+import logging
+from pathlib import Path
+from typing import Any, Dict, Optional
+
+logger = logging.getLogger(__name__)
+
+_skill_commands: Dict[str, Dict[str, Any]] = {}
+
+
+def scan_skill_commands() -> Dict[str, Dict[str, Any]]:
+    """Scan ~/.hermes/skills/ and return a mapping of /command -> skill info.
+
+    Returns:
+        Dict mapping "/skill-name" to {name, description, skill_md_path, skill_dir}.
+    """
+    global _skill_commands
+    _skill_commands = {}
+    try:
+        from tools.skills_tool import SKILLS_DIR, _parse_frontmatter, skill_matches_platform
+        if not SKILLS_DIR.exists():
+            return _skill_commands
+        for skill_md in SKILLS_DIR.rglob("SKILL.md"):
+            if any(part in ('.git', '.github', '.hub') for part in skill_md.parts):
+                continue
+            try:
+                content = skill_md.read_text(encoding='utf-8')
+                frontmatter, body = _parse_frontmatter(content)
+                # Skip skills incompatible with the current OS platform
+                if not skill_matches_platform(frontmatter):
+                    continue
+                name = frontmatter.get('name', skill_md.parent.name)
+                description = frontmatter.get('description', '')
+                if not description:
+                    for line in body.strip().split('\n'):
+                        line = line.strip()
+                        if line and not line.startswith('#'):
+                            description = line[:80]
+                            break
+                cmd_name = name.lower().replace(' ', '-').replace('_', '-')
+                _skill_commands[f"/{cmd_name}"] = {
+                    "name": name,
+                    "description": description or f"Invoke the {name} skill",
+                    "skill_md_path": str(skill_md),
+                    "skill_dir": str(skill_md.parent),
+                }
+            except Exception:
+                continue
+    except Exception:
+        pass
+    return _skill_commands
+
+
+def get_skill_commands() -> Dict[str, Dict[str, Any]]:
+    """Return the current skill commands mapping (scan first if empty)."""
+    if not _skill_commands:
+        scan_skill_commands()
+    return _skill_commands
+
+
+def build_skill_invocation_message(cmd_key: str, user_instruction: str = "") -> Optional[str]:
+    """Build the user message content for a skill slash command invocation.
+
+    Args:
+        cmd_key: The command key including leading slash (e.g., "/gif-search").
+        user_instruction: Optional text the user typed after the command.
+
+    Returns:
+        The formatted message string, or None if the skill wasn't found.
+    """
+    commands = get_skill_commands()
+    skill_info = commands.get(cmd_key)
+    if not skill_info:
+        return None
+
+    skill_md_path = Path(skill_info["skill_md_path"])
+    skill_dir = Path(skill_info["skill_dir"])
+    skill_name = skill_info["name"]
+
+    try:
+        content = skill_md_path.read_text(encoding='utf-8')
+    except Exception:
+        return f"[Failed to load skill: {skill_name}]"
+
+    parts = [
+        f'[SYSTEM: The user has invoked the "{skill_name}" skill, indicating they want you to follow its instructions. The full skill content is loaded below.]',
+        "",
+        content.strip(),
+    ]
+
+    supporting = []
+    for subdir in ("references", "templates", "scripts", "assets"):
+        subdir_path = skill_dir / subdir
+        if subdir_path.exists():
+            for f in sorted(subdir_path.rglob("*")):
+                if f.is_file():
+                    rel = str(f.relative_to(skill_dir))
+                    supporting.append(rel)
+
+    if supporting:
+        parts.append("")
+        parts.append("[This skill has supporting files you can load with the skill_view tool:]")
+        for sf in supporting:
+            parts.append(f"- {sf}")
+        parts.append(f'\nTo view any of these, use: skill_view(name="{skill_name}", file="<path>")')
+
+    if user_instruction:
+        parts.append("")
+        parts.append(f"The user has provided the following instruction alongside the skill invocation: {user_instruction}")
+
+    return "\n".join(parts)
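Reviewer note: two small rules in `scan_skill_commands()` determine what users see, namely how a skill name becomes a slash command and how a description is derived when the frontmatter has none. The sketch below restates both as stand-alone helpers; `command_key` and `fallback_description` are hypothetical names used only for illustration and do not exist in the diff.

```python
def command_key(name: str) -> str:
    """Normalise a skill name into its slash-command key (lowercase, hyphenated)."""
    return "/" + name.lower().replace(" ", "-").replace("_", "-")


def fallback_description(body: str, limit: int = 80) -> str:
    """Return the first non-empty, non-heading line of the body, truncated to `limit`."""
    for line in body.strip().split("\n"):
        line = line.strip()
        if line and not line.startswith("#"):
            return line[:limit]
    return ""


if __name__ == "__main__":
    print(command_key("GIF Search"))
    print(fallback_description("# GIF Search\n\nFind and send GIFs.\nMore detail."))
```

One consequence of this normalisation is that two skills named `my_skill` and `my skill` map to the same key, so the one scanned later would overwrite the earlier entry.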
--- a/hermes_agent/agent/trajectory.py
+++ b/hermes_agent/agent/trajectory.py
--- a/scripts/batch_runner.py
+++ b/scripts/batch_runner.py
@@ -20,13 +20,9 @@ Usage:
    python batch_runner.py --dataset_file=data.jsonl --batch_size=10 --run_name=my_run --distribution=image_gen
 """

-import os
-import sys
-
-sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
-
 import json
 import logging
+import os
 import time
 from pathlib import Path
 from typing import List, Dict, Any, Optional, Tuple
@@ -35,17 +31,15 @@ from multiprocessing import Pool, Lock
 import traceback
 from rich.progress import Progress, SpinnerColumn, BarColumn, TextColumn, TimeRemainingColumn, MofNCompleteColumn
 from rich.console import Console
-
-logger = logging.getLogger(__name__)
 import fire

-from hermes_agent.agent.loop import AIAgent
-from hermes_agent.tools.distributions import (
+from run_agent import AIAgent
+from toolset_distributions import (
    list_distributions, 
    sample_toolsets_from_distribution,
    validate_distribution
 )
-from hermes_agent.tools.dispatch import TOOL_TO_TOOLSET_MAP
+from model_tools import TOOL_TO_TOOLSET_MAP


 # Global configuration for worker processes
@@ -134,7 +128,6 @@ def _extract_tool_stats(messages: List[Dict[str, Any]]) -> Dict[str, Dict[str, i
        # Track tool calls from assistant messages
        if msg["role"] == "assistant" and "tool_calls" in msg and msg["tool_calls"]:
            for tool_call in msg["tool_calls"]:
-                if not tool_call or not isinstance(tool_call, dict): continue
                tool_name = tool_call["function"]["name"]
                tool_call_id = tool_call["id"]
                
@@ -293,7 +286,7 @@ def _process_single_prompt(
                if config.get("verbose"):
                    print(f"   Prompt {prompt_index}: Docker image check failed: {img_err}", flush=True)

-        from hermes_agent.tools.terminal import register_task_env_overrides
+        from tools.terminal_tool import register_task_env_overrides
        overrides = {
            "docker_image": container_image,
            "modal_image": container_image,
@@ -448,7 +441,6 @@ def _process_batch_worker(args: Tuple) -> Dict[str, Any]:
            if not reasoning.get("has_any_reasoning", True):
                print(f"   🚫 Prompt {prompt_index} discarded (no reasoning in any turn)")
                discarded_no_reasoning += 1
-                completed_in_batch.append(prompt_index)
                continue
            
            # Get and normalize tool stats for consistent schema across all entries
@@ -566,10 +558,7 @@ class BatchRunner:
            provider_sort (str): Sort providers by price/throughput/latency (optional)
            max_tokens (int): Maximum tokens for model responses (optional, uses model default if not set)
            reasoning_config (Dict): OpenRouter reasoning config override (e.g. {"effort": "none"} to disable thinking)
-            prefill_messages (List[Dict]): Messages to prepend as prefilled conversation context (few-shot priming).
-                NOTE: Anthropic Sonnet 4.6+ and Opus 4.6+ reject a trailing assistant-role prefill
-                (400 error).  For those models use output_config.format or structured-output
-                schemas instead.  Safe here for user-role priming and for older Claude / non-Claude models.
+            prefill_messages (List[Dict]): Messages to prepend as prefilled conversation context (few-shot priming)
            max_samples (int): Only process the first N samples from the dataset (optional, processes all if not set)
        """
        self.dataset_file = Path(dataset_file)
@@ -617,7 +606,7 @@ class BatchRunner:
        # Create batches
        self.batches = self._create_batches()
        
-        print("📊 Batch Runner Initialized")
+        print(f"📊 Batch Runner Initialized")
        print(f"   Dataset: {self.dataset_file} ({len(self.dataset)} prompts)")
        print(f"   Batch size: {self.batch_size}")
        print(f"   Total batches: {len(self.batches)}")
@@ -712,7 +701,7 @@ class BatchRunner:
        """
        checkpoint_data["last_updated"] = datetime.now().isoformat()

-        from hermes_agent.utils import atomic_json_write
+        from utils import atomic_json_write
        if lock:
            with lock:
                atomic_json_write(self.checkpoint_file, checkpoint_data)
@@ -837,7 +826,7 @@ class BatchRunner:
            print("=" * 70)
            print(f"   Original dataset size:     {len(self.dataset):,} prompts")
            print(f"   Already completed:         {len(skipped_indices):,} prompts")
-            print("   ─────────────────────────────────────────")
+            print(f"   ─────────────────────────────────────────")
            print(f"   🎯 RESUMING WITH:          {len(filtered_entries):,} prompts")
            print(f"   New batches created:       {len(batches_to_process)}")
            print("=" * 70 + "\n")
@@ -899,7 +888,7 @@ class BatchRunner:
            ]
            
            print(f"✅ Created {len(tasks)} batch tasks")
-            print("🚀 Starting parallel batch processing...\n")
+            print(f"🚀 Starting parallel batch processing...\n")
            
            # Use rich Progress for better visual tracking with persistent bottom bar
            # redirect_stdout/stderr lets rich manage all output so progress bar stays clean
@@ -1026,7 +1015,7 @@ class BatchRunner:
                            tool_stats = data.get('tool_stats', {})
                            
                            # Check for invalid tool names (model hallucinations)
-                            invalid_tools = [k for k in tool_stats if k not in VALID_TOOLS]
+                            invalid_tools = [k for k in tool_stats.keys() if k not in VALID_TOOLS]
                            
                            if invalid_tools:
                                filtered_entries += 1
@@ -1068,7 +1057,7 @@ class BatchRunner:
        print(f"✅ Total trajectories in merged file: {total_entries - filtered_entries}")
        print(f"✅ Total batch files merged: {batch_files_found}")
        print(f"⏱️  Total duration: {round(time.time() - start_time, 2)}s")
-        print("\n📈 Tool Usage Statistics:")
+        print(f"\n📈 Tool Usage Statistics:")
        print("-" * 70)
        
        if total_tool_stats:
@@ -1095,7 +1084,7 @@ class BatchRunner:
        # Print reasoning coverage stats
        total_discarded = sum(r.get("discarded_no_reasoning", 0) for r in results)
        
-        print("\n🧠 Reasoning Coverage:")
+        print(f"\n🧠 Reasoning Coverage:")
        print("-" * 70)
        total_turns = total_reasoning_stats["total_assistant_turns"]
        with_reasoning = total_reasoning_stats["turns_with_reasoning"]
@@ -1112,8 +1101,8 @@ class BatchRunner:
            print(f"   🚫 Samples discarded (zero reasoning): {total_discarded:,}")
        
        print(f"\n💾 Results saved to: {self.output_dir}")
-        print("   - Trajectories: trajectories.jsonl (combined)")
-        print("   - Individual batches: batch_*.jsonl (for debugging)")
+        print(f"   - Trajectories: trajectories.jsonl (combined)")
+        print(f"   - Individual batches: batch_*.jsonl (for debugging)")
        print(f"   - Statistics: {self.stats_file.name}")
        print(f"   - Checkpoint: {self.checkpoint_file.name}")

@@ -1130,7 +1119,7 @@ def main(
    num_workers: int = 4,
    resume: bool = False,
    verbose: bool = False,
-    show_distributions: bool = False,
+    list_distributions: bool = False,
    ephemeral_system_prompt: str = None,
    log_prefix_chars: int = 100,
    providers_allowed: str = None,
@@ -1158,7 +1147,7 @@ def main(
        num_workers (int): Number of parallel worker processes (default: 4)
        resume (bool): Resume from checkpoint if run was interrupted (default: False)
        verbose (bool): Enable verbose logging (default: False)
-        show_distributions (bool): List available toolset distributions and exit
+        list_distributions (bool): List available toolset distributions and exit
        ephemeral_system_prompt (str): System prompt used during agent execution but NOT saved to trajectories (optional)
         log_prefix_chars (int): Number of characters to show in log previews for tool calls/responses (default: 100)
        providers_allowed (str): Comma-separated list of OpenRouter providers to allow (e.g. "anthropic,openai")
@@ -1166,7 +1155,7 @@ def main(
        providers_order (str): Comma-separated list of OpenRouter providers to try in order (e.g. "anthropic,openai,google")
        provider_sort (str): Sort providers by "price", "throughput", or "latency" (OpenRouter only)
        max_tokens (int): Maximum tokens for model responses (optional, uses model default if not set)
-        reasoning_effort (str): OpenRouter reasoning effort level: "none", "minimal", "low", "medium", "high", "xhigh" (default: "medium")
+        reasoning_effort (str): OpenRouter reasoning effort level: "xhigh", "high", "medium", "low", "minimal", "none" (default: "medium")
        reasoning_disabled (bool): Completely disable reasoning/thinking tokens (default: False)
        prefill_messages_file (str): Path to JSON file containing prefill messages (list of {role, content} dicts)
        max_samples (int): Only process the first N samples from the dataset (optional, processes all if not set)
@@ -1190,16 +1179,16 @@ def main(
                               --prefill_messages_file=configs/prefill_opus.json
        
        # List available distributions
-        python batch_runner.py --show_distributions
+        python batch_runner.py --list_distributions
    """
    # Handle list distributions
-    if show_distributions:
-        from hermes_agent.tools.distributions import print_distribution_info
-
+    if list_distributions:
+        from toolset_distributions import list_distributions as get_all_dists, print_distribution_info
+        
        print("📊 Available Toolset Distributions")
        print("=" * 70)
-
-        all_dists = list_distributions()
+        
+        all_dists = get_all_dists()
        for dist_name in sorted(all_dists.keys()):
            print_distribution_info(dist_name)
        
@@ -1235,7 +1224,7 @@ def main(
        print("🧠 Reasoning: DISABLED (effort=none)")
    elif reasoning_effort:
        # Use specified effort level
-        valid_efforts = ["none", "minimal", "low", "medium", "high", "xhigh"]
+        valid_efforts = ["xhigh", "high", "medium", "low", "minimal", "none"]
        if reasoning_effort not in valid_efforts:
            print(f"❌ Error: --reasoning_effort must be one of: {', '.join(valid_efforts)}")
            return
@@ -1249,7 +1238,7 @@ def main(
            with open(prefill_messages_file, 'r', encoding='utf-8') as f:
                prefill_messages = json.load(f)
            if not isinstance(prefill_messages, list):
-                print("❌ Error: prefill_messages_file must contain a JSON array of messages")
+                print(f"❌ Error: prefill_messages_file must contain a JSON array of messages")
                return
            print(f"💬 Loaded {len(prefill_messages)} prefill messages from {prefill_messages_file}")
        except Exception as e:
--- a/cli-config.yaml.example
+++ b/cli-config.yaml.example
@@ -7,38 +7,16 @@
 # =============================================================================
 model:
  # Default model to use (can be overridden with --model flag)
-  # Both "default" and "model" work as the key name here.
  default: "anthropic/claude-opus-4.6"
  
  # Inference provider selection:
-  #   "auto"         - Auto-detect from credentials (default)
-  #   "openrouter"   - OpenRouter (requires: OPENROUTER_API_KEY or OPENAI_API_KEY)
-  #   "nous"         - Nous Portal OAuth (requires: hermes login)
-  #   "nous-api"     - Nous Portal API key (requires: NOUS_API_KEY)
-  #   "anthropic"    - Direct Anthropic API (requires: ANTHROPIC_API_KEY)
-  #   "openai-codex" - OpenAI Codex (requires: hermes auth)
-  #   "copilot"      - GitHub Copilot / GitHub Models (requires: GITHUB_TOKEN)
-  #   "gemini"      - Use Google AI Studio direct (requires: GOOGLE_API_KEY or GEMINI_API_KEY)
-  #   "zai"         - Use z.ai / ZhipuAI GLM models (requires: GLM_API_KEY)
-  #   "kimi-coding"  - Kimi / Moonshot AI (requires: KIMI_API_KEY)
-  #   "minimax"      - MiniMax global (requires: MINIMAX_API_KEY)
-  #   "minimax-cn"   - MiniMax China (requires: MINIMAX_CN_API_KEY)
-  #   "huggingface"  - Hugging Face Inference (requires: HF_TOKEN)
-  #   "nvidia"       - NVIDIA NIM / build.nvidia.com (requires: NVIDIA_API_KEY)
-  #   "xiaomi"       - Xiaomi MiMo (requires: XIAOMI_API_KEY)
-  #   "arcee"        - Arcee AI Trinity models (requires: ARCEEAI_API_KEY)
-  #   "ollama-cloud" - Ollama Cloud (requires: OLLAMA_API_KEY — https://ollama.com/settings)
-  #   "kilocode"     - KiloCode gateway (requires: KILOCODE_API_KEY)
-  #   "ai-gateway"   - Vercel AI Gateway (requires: AI_GATEWAY_API_KEY)
-  #
-  # Local servers (LM Studio, Ollama, vLLM, llama.cpp):
-  #   "custom"       - Any OpenAI-compatible endpoint. Set base_url below.
-  #   Aliases: "lmstudio", "ollama", "vllm", "llamacpp" all map to "custom".
-  #   Example for LM Studio:
-  #     provider: "lmstudio"
-  #     base_url: "http://localhost:1234/v1"
-  #   No API key needed — local servers typically ignore auth.
-  #
+  #   "auto"       - Use Nous Portal if logged in, otherwise OpenRouter/env vars (default)
+  #   "openrouter" - Always use OpenRouter API key from OPENROUTER_API_KEY
+  #   "nous"       - Always use Nous Portal (requires: hermes login)
+  #   "zai"        - Use z.ai / ZhipuAI GLM models (requires: GLM_API_KEY)
+  #   "kimi-coding"- Use Kimi / Moonshot AI models (requires: KIMI_API_KEY)
+  #   "minimax"    - Use MiniMax global endpoint (requires: MINIMAX_API_KEY)
+  #   "minimax-cn" - Use MiniMax China endpoint (requires: MINIMAX_CN_API_KEY)
  # Can also be overridden with --provider flag or HERMES_INFERENCE_PROVIDER env var.
  provider: "auto"
  
@@ -46,56 +24,6 @@ model:
  # api_key: "your-key-here"  # Uncomment to set here instead of .env
  base_url: "https://openrouter.ai/api/v1"

-  # ── Token limits — two settings, easy to confuse ──────────────────────────
-  #
-  # context_length: TOTAL context window (input + output tokens combined).
-  #   Controls when Hermes compresses history and validates requests.
-  #   Leave unset — Hermes auto-detects the correct value from the provider.
-  #   Set manually only when auto-detection is wrong (e.g. a local server with
-  #   a custom num_ctx, or a proxy that doesn't expose /v1/models).
-  #
-  # context_length: 131072
-  #
-  # max_tokens: OUTPUT cap — maximum tokens the model may generate per response.
-  #   Unrelated to how long your conversation history can be.
-  #   The OpenAI-standard name "max_tokens" is a misnomer; Anthropic's native
-  #   API has since renamed it "max_output_tokens" for clarity.
-  #   Leave unset to use the model's native output ceiling (recommended).
-  #   Set only if you want to deliberately limit individual response length.
-  #
-# max_tokens: 8192
-
-# Named provider overrides (optional)
-# Use this for per-provider request timeouts, non-stream stale timeouts,
-# and per-model exceptions.
-# Applies to the primary turn client on every api_mode (OpenAI-wire, native
-# Anthropic, and Anthropic-compatible providers), the fallback chain, and
-# client rebuilds during credential rotation.  For OpenAI-wire chat
-# completions (streaming and non-streaming) the configured value is also
-# used as the per-request ``timeout=`` kwarg so it wins over the legacy
-# HERMES_API_TIMEOUT env var (which still applies when no config is set).
-# ``stale_timeout_seconds`` controls the non-streaming stale-call detector and
-# wins over the legacy HERMES_API_CALL_STALE_TIMEOUT env var. Leaving these
-# unset keeps the legacy defaults (HERMES_API_TIMEOUT=1800s,
-# HERMES_API_CALL_STALE_TIMEOUT=300s, native Anthropic 900s).
-#
-# Not currently wired for AWS Bedrock (bedrock_converse + AnthropicBedrock
-# SDK paths) — those use boto3 with its own timeout configuration.
-#
-# providers:
-#   ollama-local:
-#     request_timeout_seconds: 300   # Longer timeout for local cold-starts
-#     stale_timeout_seconds: 900     # Explicitly re-enable stale detection on local endpoints
-#   anthropic:
-#     request_timeout_seconds: 30    # Fast-fail cloud requests
-#     models:
-#       claude-opus-4.6:
-#         timeout_seconds: 600       # Longer timeout for extended-thinking Opus calls
-#   openai-codex:
-#     models:
-#       gpt-5.4:
-#         stale_timeout_seconds: 1800  # Longer non-stream stale timeout for slow large-context turns
-
 # =============================================================================
 # OpenRouter Provider Routing (only applies when using OpenRouter)
 # =============================================================================
@@ -147,12 +75,10 @@ model:
 #   - Messaging (Telegram/Discord): Uses MESSAGING_CWD from .env (default: home)
 terminal:
  backend: "local"
-  cwd: "."  # For local backend: "." = current directory. Ignored for remote backends unless a backend documents otherwise.
+  cwd: "."  # For local backend: "." = current directory. Ignored for remote backends.
  timeout: 180
-  docker_mount_cwd_to_workspace: false  # SECURITY: off by default. Opt in to mount the launch cwd into Docker /workspace.
  lifetime_seconds: 300
-  # sudo_password: "hunter2"  # Optional: pipe a sudo password via sudo -S. SECURITY WARNING: plaintext.
-  # sudo_password: ""         # Explicit empty password: try empty and never open the interactive sudo prompt.
+  # sudo_password: ""  # Enable sudo commands (pipes via sudo -S) - SECURITY WARNING: plaintext!

 # -----------------------------------------------------------------------------
 # OPTION 2: SSH remote execution
@@ -180,13 +106,6 @@ terminal:
 #   timeout: 180
 #   lifetime_seconds: 300
 #   docker_image: "nikolaik/python-nodejs:python3.11-nodejs20"
-#   docker_mount_cwd_to_workspace: true   # Explicit opt-in: mount your launch cwd into /workspace
-#   # Optional: explicitly forward selected env vars into Docker.
-#   # These values come from your current shell first, then ~/.hermes/.env.
-#   # Warning: anything forwarded here is visible to commands run in the container.
-#   docker_forward_env:
-#     - "GITHUB_TOKEN"
-#     - "NPM_TOKEN"

 # -----------------------------------------------------------------------------
 # OPTION 4: Singularity/Apptainer container
@@ -243,18 +162,13 @@ terminal:
 #
 # SECURITY WARNING: Password stored in plaintext!
 #
-# INTERACTIVE PROMPT: If sudo_password is unset and the CLI is running,
+# INTERACTIVE PROMPT: If no sudo_password is set and the CLI is running,
 # you'll be prompted to enter your password when sudo is needed:
 # - 45-second timeout (auto-skips if no input)
 # - Press Enter to skip (command fails gracefully)
 # - Password is hidden while typing
 # - Password is cached for the session
 #
-# EMPTY PASSWORDS: Setting sudo_password to an explicit empty string is different
-# from leaving it unset. Hermes will try an empty password via `sudo -S` and
-# will not open the interactive prompt. This is useful for passwordless sudo,
-# Touch ID sudo setups, and environments where prompting is just noise.
-#
 # ALTERNATIVES:
 # - SSH backend: Configure passwordless sudo on the remote server
 # - Containers: Run as root inside the container (no sudo needed)
@@ -263,20 +177,6 @@ terminal:
 # Example (add to your terminal section):
 #   sudo_password: "your-password-here"

-# =============================================================================
-# Security Scanning (tirith)
-# =============================================================================
-# Optional pre-exec command security scanning via tirith.
-# Detects homograph URLs, pipe-to-shell, terminal injection, env manipulation.
-# Install: brew install sheeki03/tap/tirith
-# Docs: https://github.com/sheeki03/tirith
-#
-# security:
-#   tirith_enabled: true        # Enable/disable tirith scanning
-#   tirith_path: "tirith"       # Path to tirith binary (supports ~ expansion)
-#   tirith_timeout: 5           # Scan timeout in seconds
-#   tirith_fail_open: true      # Allow commands if tirith unavailable
-
 # =============================================================================
 # Browser Tool Configuration
 # =============================================================================
@@ -295,36 +195,28 @@ browser:
 # 1. Tracks actual token usage from API responses (not estimates)
 # 2. When prompt_tokens >= threshold% of model's context_length, triggers compression
 # 3. Protects first 3 turns (system prompt, initial request, first response)
-# 4. Protects last N turns (default 20 messages = ~10 full turns of recent context)
+# 4. Protects last 4 turns (recent context is most relevant)
 # 5. Summarizes middle turns using a fast/cheap model
 # 6. Inserts summary as a user message, continues conversation seamlessly
 #
-# Post-compression tail budget is target_ratio × threshold × context_length:
-#   200K context, threshold 0.50, ratio 0.20 → 20K tokens of recent tail preserved
-#   1M   context, threshold 0.50, ratio 0.20 → 100K tokens of recent tail preserved
-#
 compression:
  # Enable automatic context compression (default: true)
  # Set to false if you prefer to manage context manually or want errors on overflow
  enabled: true
  
-  # Trigger compression at this % of model's context limit (default: 0.50 = 50%)
+  # Trigger compression at this % of model's context limit (default: 0.85 = 85%)
  # Lower values = more aggressive compression, higher values = compress later
-  threshold: 0.50
+  threshold: 0.85
  
-  # Fraction of the threshold to preserve as recent tail (default: 0.20 = 20%)
-  # e.g. 20% of 50% threshold = 10% of total context kept as recent messages.
-  # Summary output is separately capped at 12K tokens (Gemini output limit).
-  # Range: 0.10 - 0.80
-  target_ratio: 0.20
-
-  # Number of most-recent messages to always preserve (default: 20 ≈ 10 full turns)
-  # Higher values keep more recent conversation intact at the cost of more aggressive
-  # compression of older turns.
-  protect_last_n: 20
-
-  # To pin a specific model/provider for compression summaries, use the
-  # auxiliary section below (auxiliary.compression.provider / model).
+  # Model to use for generating summaries (fast/cheap recommended)
+  # This model compresses the middle turns into a concise summary.
+  # IMPORTANT: it receives the full middle section of the conversation, so it
+  # MUST support a context length at least as large as your main model's.
+  summary_model: "google/gemini-3-flash-preview"
+  
+  # Provider for the summary model (default: "auto")
+  # Options: "auto", "openrouter", "nous", "main"
+  # summary_provider: "auto"

 # =============================================================================
 # Auxiliary Models (Advanced — Experimental)
@@ -349,9 +241,7 @@ compression:
 #   "auto"       - Best available: OpenRouter → Nous Portal → main endpoint (default)
 #   "openrouter" - Force OpenRouter (requires OPENROUTER_API_KEY)
 #   "nous"       - Force Nous Portal (requires: hermes login)
-#   "gemini"      - Force Google AI Studio direct (requires: GOOGLE_API_KEY or GEMINI_API_KEY)
-#   "ollama-cloud" - Ollama Cloud (requires: OLLAMA_API_KEY)
-#   "codex"       - Force Codex OAuth (requires: hermes model → Codex).
+#   "codex"      - Force Codex OAuth (requires: hermes model → Codex).
 #                  Uses gpt-5.3-codex which supports vision.
 #   "main"       - Use your custom endpoint (OPENAI_BASE_URL + OPENAI_API_KEY).
 #                  Works with OpenAI API, local models, or any OpenAI-compatible
@@ -366,26 +256,11 @@ compression:
 #   vision:
 #     provider: "auto"
 #     model: ""              # e.g. "google/gemini-2.5-flash", "openai/gpt-4o"
-#     timeout: 30            # LLM API call timeout (seconds)
-#     download_timeout: 30   # Image HTTP download timeout (seconds)
-#                            # Increase for slow connections or self-hosted image servers
 #
 #   # Web page scraping / summarization + browser page text extraction
 #   web_extract:
 #     provider: "auto"
 #     model: ""
-#
-#   # Session search — summarizes matching past sessions
-#   session_search:
-#     provider: "auto"
-#     model: ""
-#     timeout: 30
-#     max_concurrency: 3    # Limit parallel summaries to reduce request-burst 429s
-#     extra_body: {}        # Provider-specific OpenAI-compatible request fields
-#                           # Example for providers that support request-body
-#                           # reasoning controls:
-#                           # extra_body:
-#                           #   enable_thinking: false

 # =============================================================================
 # Persistent Memory
@@ -443,25 +318,6 @@ session_reset:
  idle_minutes: 1440   # Inactivity timeout in minutes (default: 1440 = 24 hours)
  at_hour: 4           # Daily reset hour, 0-23 local time (default: 4 AM)

-# When true, group/channel chats use one session per participant when the platform
-# provides a user ID. This is the secure default and prevents users in the same
-# room from sharing context, interrupts, and token costs. Set false only if you
-# explicitly want one shared "room brain" per group/channel.
-group_sessions_per_user: true
-
-# ─────────────────────────────────────────────────────────────────────────────
-# Gateway Streaming
-# ─────────────────────────────────────────────────────────────────────────────
-# Stream tokens to messaging platforms in real-time. The bot sends a message
-# on first token, then progressively edits it as more tokens arrive.
-# Disabled by default — enable to try the streaming UX on Telegram/Discord/Slack.
-streaming:
-  enabled: false
-  # transport: edit           # "edit" = progressive editMessageText
-  # edit_interval: 0.3        # seconds between message edits
-  # buffer_threshold: 40      # chars before forcing an edit flush
-  # cursor: " ▉"              # cursor shown during streaming
-
 # =============================================================================
 # Skills Configuration
 # =============================================================================
@@ -474,15 +330,6 @@ skills:
  # Set to 0 to disable.
  creation_nudge_interval: 15

-  # External skill directories — share skills across tools/agents without
-  # copying them into ~/.hermes/skills/.  Each path is expanded (~ and ${VAR})
-  # and resolved to an absolute path.  External dirs are read-only: skill
-  # creation always writes to ~/.hermes/skills/.  Local skills take precedence
-  # when names collide.
-  # external_dirs:
-  #   - ~/.agents/skills
-  #   - /home/shared/team-skills
-
 # =============================================================================
 # Agent Behavior
 # =============================================================================
@@ -491,22 +338,6 @@ agent:
  # Higher = more room for complex tasks, but costs more tokens
  # Recommended: 20-30 for focused tasks, 50-100 for open exploration
  max_turns: 60
-
-  # Inactivity timeout for gateway agent runs (seconds, 0 = unlimited).
-  # The agent can run indefinitely when actively calling tools or receiving
-  # API responses.  Only fires after the agent has been idle for this duration.
-  # gateway_timeout: 1800
-
-  # Staged warning: send a warning before escalating to full timeout.
-  # Fires once per run when inactivity reaches this threshold (seconds).
-  # Set to 0 to disable the warning.
-  # gateway_timeout_warning: 900
-
-  # Graceful drain timeout for gateway stop/restart (seconds).
-  # The gateway stops accepting new work, waits for in-flight agents to
-  # finish, then interrupts anything still running after this timeout.
-  # 0 = no drain, interrupt immediately.
-  # restart_drain_timeout: 60
  
  # Enable verbose logging
  verbose: false
@@ -537,7 +368,7 @@ agent:
 # Toolsets
 # =============================================================================
 # Control which tools the agent has access to.
-# Use `hermes tools` to interactively enable/disable tools per platform.
+# Use "all" to enable everything, or specify individual toolsets.

 # =============================================================================
 # Platform Toolsets (per-platform tool configuration)
@@ -549,7 +380,7 @@ agent:
 #   - A preset like "hermes-cli" or "hermes-telegram" (curated tool set)
 #   - A list of individual toolsets to compose your own (see list below)
 #
-# Supported platform keys: cli, telegram, discord, whatsapp, slack, qqbot
+# Supported platform keys: cli, telegram, discord, whatsapp, slack
 #
 # Examples:
 #
@@ -571,14 +402,11 @@ agent:
 #     discord: [web, vision, skills, todo]
 #
 # If not set, defaults are:
-#   cli:           hermes-cli            (everything + cronjob management)
-#   telegram:      hermes-telegram       (terminal, file, web, vision, image, tts, browser, skills, todo, cronjob, messaging)
-#   discord:       hermes-discord        (same as telegram)
-#   whatsapp:      hermes-whatsapp       (same as telegram)
-#   slack:         hermes-slack          (same as telegram)
-#   signal:        hermes-signal         (same as telegram)
-#   homeassistant: hermes-homeassistant  (same as telegram)
-#   qqbot:            hermes-qqbot            (same as telegram)
+#   cli:      hermes-cli       (everything + cronjob management)
+#   telegram: hermes-telegram  (terminal, file, web, vision, image, tts, browser, skills, todo, cronjob, messaging)
+#   discord:  hermes-discord   (same as telegram)
+#   whatsapp: hermes-whatsapp  (same as telegram)
+#   slack:    hermes-slack     (same as telegram)
 #
 platform_toolsets:
  cli: [hermes-cli]
@@ -586,21 +414,6 @@ platform_toolsets:
  discord: [hermes-discord]
  whatsapp: [hermes-whatsapp]
  slack: [hermes-slack]
-  signal: [hermes-signal]
-  homeassistant: [hermes-homeassistant]
-  qqbot: [hermes-qqbot]
-
-# =============================================================================
-# Gateway Platform Settings
-# =============================================================================
-# Optional per-platform messaging settings.
-# Platform-specific knobs live under `extra`.
-#
-# platforms:
-#   telegram:
-#     reply_to_mode: "first"  # off | first | all
-#     extra:
-#       disable_link_previews: false  # Set true to suppress Telegram URL previews in bot messages

 # ─────────────────────────────────────────────────────────────────────────────
 # Available toolsets (use these names in platform_toolsets or the toolsets list)
@@ -615,7 +428,7 @@ platform_toolsets:
 #   terminal     - terminal, process
 #   file         - read_file, write_file, patch, search
 #   browser      - browser_navigate, browser_snapshot, browser_click, browser_type,
-#                  browser_scroll, browser_back, browser_press,
+#                  browser_scroll, browser_back, browser_press, browser_close,
 #                  browser_get_images, browser_vision  (requires BROWSERBASE_API_KEY)
 #   vision       - vision_analyze  (requires OPENROUTER_API_KEY)
 #   image_gen    - image_generate  (requires FAL_KEY)
@@ -623,8 +436,8 @@ platform_toolsets:
 #   skills_hub   - skill_hub (search/install/manage from online registries — user-driven only)
 #   moa          - mixture_of_agents  (requires OPENROUTER_API_KEY)
 #   todo         - todo (in-memory task planning, no deps)
-#   tts          - text_to_speech  (Edge TTS free, or ELEVENLABS/OPENAI/MINIMAX/MISTRAL key)
-#   cronjob      - cronjob (create/list/update/pause/resume/run/remove scheduled tasks)
+#   tts          - text_to_speech  (Edge TTS free, or ELEVENLABS/OPENAI key)
+#   cronjob      - schedule_cronjob, list_cronjobs, remove_cronjob
 #   rl           - rl_list_environments, rl_start_training, etc. (requires TINKER_API_KEY)
 #
 # PRESETS (curated bundles):
@@ -652,7 +465,7 @@ platform_toolsets:
 #   todo         - Task planning and tracking for multi-step work
 #   memory       - Persistent memory across sessions (personal notes + user profile)
 #   session_search - Search and recall past conversations (FTS5 + Gemini Flash summarization)
-#   tts          - Text-to-speech (Edge TTS free, ElevenLabs, OpenAI, MiniMax, Mistral)
+#   tts          - Text-to-speech (Edge TTS free, ElevenLabs, OpenAI)
 #   cronjob      - Schedule and manage automated tasks (CLI-only)
 #   rl           - RL training tools (Tinker-Atropos)
 #
@@ -660,11 +473,53 @@ platform_toolsets:
 #   debugging    - terminal + web + file (for troubleshooting)
 #   safe         - web + vision + moa (no terminal access)

-# NOTE: The top-level "toolsets" key is deprecated and ignored.
-# Tool configuration is managed per-platform via platform_toolsets above.
-# Use `hermes tools` to configure interactively, or edit platform_toolsets directly.
-#
-# CLI override: hermes chat --toolsets terminal,web,file
+# -----------------------------------------------------------------------------
+# OPTION 1: Enable all tools (default)
+# -----------------------------------------------------------------------------
+toolsets:
+  - all
+
+# -----------------------------------------------------------------------------
+# OPTION 2: Minimal - just web search and terminal
+# Great for: Simple coding tasks, quick lookups
+# -----------------------------------------------------------------------------
+# toolsets:
+#   - web
+#   - terminal
+
+# -----------------------------------------------------------------------------
+# OPTION 3: Research mode - no execution capabilities
+# Great for: Safe information gathering, research tasks
+# -----------------------------------------------------------------------------
+# toolsets:
+#   - web
+#   - vision
+#   - skills
+
+# -----------------------------------------------------------------------------
+# OPTION 4: Full automation - browser + terminal
+# Great for: Web scraping, automation tasks, testing
+# -----------------------------------------------------------------------------
+# toolsets:
+#   - terminal
+#   - browser
+#   - web
+
+# -----------------------------------------------------------------------------
+# OPTION 5: Creative mode - vision + image generation
+# Great for: Design work, image analysis, creative tasks
+# -----------------------------------------------------------------------------
+# toolsets:
+#   - vision
+#   - image_gen
+#   - web
+
+# -----------------------------------------------------------------------------
+# OPTION 6: Safe mode - no terminal or browser
+# Great for: Restricted environments, untrusted queries
+# -----------------------------------------------------------------------------
+# toolsets:
+#   - safe

 # =============================================================================
 # MCP (Model Context Protocol) Servers
@@ -700,38 +555,15 @@ platform_toolsets:
 #     args: ["-y", "@modelcontextprotocol/server-github"]
 #     env:
 #       GITHUB_PERSONAL_ACCESS_TOKEN: "ghp_..."
-#
-# Sampling (server-initiated LLM requests) — enabled by default.
-# Per-server config under the 'sampling' key:
-#   analysis:
-#     command: npx
-#     args: ["-y", "analysis-server"]
-#     sampling:
-#       enabled: true           # default: true
-#       model: "gemini-3-flash" # override model (optional)
-#       max_tokens_cap: 4096    # max tokens per request
-#       timeout: 30             # LLM call timeout (seconds)
-#       max_rpm: 10             # max requests per minute
-#       allowed_models: []      # model whitelist (empty = all)
-#       max_tool_rounds: 5      # tool loop limit (0 = disable)
-#       log_level: "info"       # audit verbosity

 # =============================================================================
 # Voice Transcription (Speech-to-Text)
 # =============================================================================
 # Automatically transcribe voice messages on messaging platforms.
-# Providers: local (free, faster-whisper) | groq (free tier) | openai (Whisper API) | mistral (Voxtral Transcribe)
-# Set the corresponding API key in .env: GROQ_API_KEY, OPENAI_API_KEY, or MISTRAL_API_KEY.
+# Requires OPENAI_API_KEY in .env (uses OpenAI Whisper API directly).
 stt:
  enabled: true
-  # provider: "local"          # auto-detected if omitted
-  local:
-    model: "base"              # tiny | base | small | medium | large-v3 | turbo
-    # language: ""             # auto-detect; set to "en", "es", "fr", etc. to force
-  openai:
-    model: "whisper-1"         # whisper-1 | gpt-4o-mini-transcribe | gpt-4o-transcribe
-  # mistral:
-  #   model: "voxtral-mini-latest"  # voxtral-mini-latest | voxtral-mini-2602
+  model: "whisper-1"  # whisper-1 (cheapest) | gpt-4o-mini-transcribe | gpt-4o-transcribe

 # =============================================================================
 # Response Pacing (Messaging Platforms)
@@ -770,16 +602,10 @@ code_execution:
 # Subagent Delegation
 # =============================================================================
 # The delegate_task tool spawns child agents with isolated context.
-# Supports single tasks and batch mode (default 3 parallel, configurable).
+# Supports single tasks and batch mode (up to 3 parallel).
 delegation:
  max_iterations: 50                          # Max tool-calling turns per child (default: 50)
-  # max_concurrent_children: 3                # Max parallel child agents (default: 3)
-  # max_spawn_depth: 1                        # Tree depth cap (1-3, default: 1 = flat). Raise to 2 or 3 to allow orchestrator children to spawn their own workers.
-  # orchestrator_enabled: true                # Kill switch for role="orchestrator" children (default: true).
-  # model: "google/gemini-3-flash-preview"    # Override model for subagents (empty = inherit parent)
-  # provider: "openrouter"                    # Override provider for subagents (empty = inherit parent)
-  #                                           # Resolves full credentials (base_url, api_key) automatically.
-  #                                           # Supported: openrouter, nous, zai, kimi-coding, minimax
+  default_toolsets: ["terminal", "file", "web"]  # Default toolsets for subagents

 # =============================================================================
 # Honcho Integration (Cross-Session User Modeling)
@@ -810,148 +636,7 @@ display:
  # Toggle at runtime with /verbose in the CLI
  tool_progress: all

-  # Gateway-only natural mid-turn assistant updates.
-  # When true, completed assistant status messages are sent as separate chat
-  # messages. This is independent of tool_progress and gateway streaming.
-  interim_assistant_messages: true
-
-  # What Enter does when Hermes is already busy in the CLI.
-  #   interrupt: Interrupt the current run and redirect Hermes (default)
-  #   queue:     Queue your message for the next turn
-  # Ctrl+C always interrupts regardless of this setting.
-  busy_input_mode: interrupt
-
-  # Background process notifications (gateway/messaging only).
-  # Controls how chatty the process watcher is when you use
-  # terminal(background=true, notify_on_complete=true) from Telegram/Discord/etc.
-  #   off:     No watcher messages at all
-  #   result:  Only the final completion message
-  #   error:   Only the final message when exit code != 0
-  #   all:     Running output updates + final message (default)
-  background_process_notifications: all
-
-
  # Play terminal bell when agent finishes a response.
  # Useful for long-running tasks — your terminal will ding when the agent is done.
  # Works over SSH. Most terminals can be configured to flash the taskbar or play a sound.
  bell_on_complete: false
-
-  # Show model reasoning/thinking before each response.
-  # When enabled, a dim box shows the model's thought process above the response.
-  # Toggle at runtime with /reasoning show or /reasoning hide.
-  show_reasoning: false
-
-  # Stream tokens to the terminal as they arrive instead of waiting for the
-  # full response. The response box opens on first token and text appears
-  # line-by-line. Tool calls are still captured silently.
-  # Stream tokens to the terminal in real-time. Disable to wait for full responses.
-  streaming: true
-
-  # ───────────────────────────────────────────────────────────────────────────
-  # Skin / Theme
-  # ───────────────────────────────────────────────────────────────────────────
-  # Customize CLI visual appearance — banner colors, spinner faces, tool prefix,
-  # response box label, and branding text. Change at runtime with /skin <name>.
-  #
-  # Built-in skins:
-  #   default  — Classic Hermes gold/kawaii
-  #   ares     — Crimson/bronze war-god theme with spinner wings
-  #   mono     — Clean grayscale monochrome
-  #   slate    — Cool blue developer-focused
-  #
-  # Custom skins: drop a YAML file in ~/.hermes/skins/<name>.yaml
-  # Schema (all fields optional, missing values inherit from default):
-  #
-  #   name: my-theme
-  #   description: Short description
-  #   colors:
-  #     banner_border: "#HEX"    # Panel border
-  #     banner_title: "#HEX"     # Panel title
-  #     banner_accent: "#HEX"    # Section headers (Available Tools, etc.)
-  #     banner_dim: "#HEX"       # Dim/muted text
-  #     banner_text: "#HEX"      # Body text (tool names, skill names)
-  #     ui_accent: "#HEX"        # UI accent color
-  #     response_border: "#HEX"  # Response box border color
-  #   spinner:
-  #     waiting_faces: ["(⚔)", "(⛨)"]       # Faces shown while waiting
-  #     thinking_faces: ["(⚔)", "(⌁)"]      # Faces shown while thinking
-  #     thinking_verbs: ["forging", "plotting"]  # Verbs for spinner messages
-  #     wings:                                # Optional left/right spinner decorations
-  #       - ["⟪⚔", "⚔⟫"]
-  #       - ["⟪▲", "▲⟫"]
-  #   branding:
-  #     agent_name: "My Agent"               # Banner title and branding
-  #     welcome: "Welcome message"           # Shown at CLI startup
-  #     response_label: " ⚔ Agent "         # Response box header label
-  #     prompt_symbol: "⚔ ❯ "              # Prompt symbol
-  #   tool_prefix: "╎"                       # Tool output line prefix (default: ┊)
-  #
-  skin: default
-
-# =============================================================================
-# Model Aliases — short names for /model command
-# =============================================================================
-# Map short aliases to exact (model, provider, base_url) tuples.
-# Used by /model tab completion and resolve_alias().
-# Aliases are checked BEFORE the models.dev catalog, so they can route
-# to endpoints not in the catalog (e.g. Ollama Cloud, local servers).
-#
-# model_aliases:
-#   opus:
-#     model: claude-opus-4-6
-#     provider: anthropic
-#   qwen:
-#     model: "qwen3.5:397b"
-#     provider: custom
-#     base_url: "https://ollama.com/v1"
-#   glm:
-#     model: glm-4.7
-#     provider: custom
-#     base_url: "https://ollama.com/v1"
-
-# =============================================================================
-# Privacy
-# =============================================================================
-# privacy:
-#   # Redact PII from the LLM context prompt.
-#   # When true, phone numbers are stripped and user/chat IDs are replaced
-#   # with deterministic hashes before being sent to the model.
-#   # Names and usernames are NOT affected (user-chosen, publicly visible).
-#   # Routing/delivery still uses the original values internally.
-#   redact_pii: false
-
-# =============================================================================
-# Shell-script hooks
-# =============================================================================
-# Register shell scripts as plugin-hook callbacks.  Each entry is executed as
-# a subprocess (shell=False, shlex.split) with a JSON payload on stdin.  On
-# stdout the script may return JSON that either blocks the tool call or
-# injects context into the next LLM call.
-#
-# Valid events (mirror hermes_cli.plugins.VALID_HOOKS):
-#   pre_tool_call, post_tool_call, pre_llm_call, post_llm_call,
-#   pre_api_request, post_api_request, on_session_start, on_session_end,
-#   on_session_finalize, on_session_reset, subagent_stop
-#
-# First-use consent: each (event, command) pair prompts once on a TTY, then
-# is persisted to ~/.hermes/shell-hooks-allowlist.json.  Non-interactive
-# runs (gateway, cron) need --accept-hooks, HERMES_ACCEPT_HOOKS=1, or the
-# hooks_auto_accept key below.
-#
-# See website/docs/user-guide/features/hooks.md for the full JSON wire
-# protocol and worked examples.
-#
-# hooks:
-#   pre_tool_call:
-#     - matcher: "terminal"
-#       command: "~/.hermes/agent-hooks/block-rm-rf.sh"
-#       timeout: 10
-#   post_tool_call:
-#     - matcher: "write_file|patch"
-#       command: "~/.hermes/agent-hooks/auto-format.sh"
-#   pre_llm_call:
-#     - command: "~/.hermes/agent-hooks/inject-cwd-context.sh"
-#   subagent_stop:
-#     - command: "~/.hermes/agent-hooks/log-orchestration.sh"
-#
-# hooks_auto_accept: false
--- a/cli.py
+++ b/cli.py
--- a/constraints-termux.txt
+++ b/constraints-termux.txt
@@ -1,15 +0,0 @@
-# Termux / Android dependency constraints for Hermes Agent.
-#
-# Usage:
-#   python -m pip install -e '.[termux]' -c constraints-termux.txt
-#
-# These pins keep the tested Android install path stable when upstream packages
-# move faster than Termux-compatible wheels / sdists.
-
-ipython<10
-jedi>=0.18.1,<0.20
-parso>=0.8.4,<0.9
-stack-data>=0.6,<0.7
-pexpect>4.3,<5
-matplotlib-inline>=0.1.7,<0.2
-asttokens>=2.1,<3
--- a/hermes_agent/cron/__init__.py
+++ b/hermes_agent/cron/__init__.py
@@ -7,26 +7,22 @@ This module provides scheduled task execution, allowing the agent to:
 - Execute tasks in isolated sessions (no prior context)

 Cron jobs are executed automatically by the gateway daemon:
-    hermes gateway install    # Install as a user service
-    sudo hermes gateway install --system  # Linux servers: boot-time system service
+    hermes gateway install    # Install as system service (recommended)
    hermes gateway            # Or run in foreground

 The gateway ticks the scheduler every 60 seconds. A file lock prevents
 duplicate execution if multiple processes overlap.
 """

-from hermes_agent.cron.jobs import (
+from cron.jobs import (
    create_job,
    get_job,
    list_jobs,
    remove_job,
    update_job,
-    pause_job,
-    resume_job,
-    trigger_job,
    JOBS_FILE,
 )
-from hermes_agent.cron.scheduler import tick
+from cron.scheduler import tick

 __all__ = [
    "create_job",
@@ -34,9 +30,6 @@ __all__ = [
    "list_jobs",
    "remove_job",
    "update_job",
-    "pause_job",
-    "resume_job",
-    "trigger_job",
    "tick",
    "JOBS_FILE",
 ]
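
Reviewer note on the docstring above: it says a file lock prevents duplicate execution when gateway processes overlap, but the scheduler code is not part of this hunk. The following is a minimal sketch of that pattern, assuming a POSIX host and a non-blocking `fcntl.flock`; `try_lock`, `guarded_tick`, and the lock path are hypothetical names used only for illustration and are not taken from `cron/scheduler.py`.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def try_lock(path):
    """Yield True if an exclusive lock on `path` was acquired, else False."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        try:
            # LOCK_NB makes flock fail immediately instead of waiting.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            acquired = True
        except BlockingIOError:
            acquired = False
        yield acquired
    finally:
        # Closing the descriptor releases the lock if it was held.
        os.close(fd)


def guarded_tick(lock_path, run_due_jobs):
    """Run `run_due_jobs` only if no other tick currently holds the lock."""
    with try_lock(lock_path) as acquired:
        if not acquired:
            return False  # another tick is in progress; skip this one
        run_due_jobs()
        return True


if __name__ == "__main__":
    lock = os.path.join(tempfile.mkdtemp(), "tick.lock")
    print(guarded_tick(lock, lambda: print("running due jobs")))
```

Skipping rather than waiting suits a 60-second tick: if the previous tick is still running, the next one picks up any due jobs.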
--- a/cron/jobs.py
+++ b/cron/jobs.py
@@ -0,0 +1,410 @@
+"""
+Cron job storage and management.
+
+Jobs are stored in ~/.hermes/cron/jobs.json
+Output is saved to ~/.hermes/cron/output/{job_id}/{timestamp}.md
+"""
+
+import json
+import tempfile
+import os
+import re
+import uuid
+from datetime import datetime, timedelta
+from pathlib import Path
+from typing import Optional, Dict, List, Any
+
+from hermes_time import now as _hermes_now
+
+try:
+    from croniter import croniter
+    HAS_CRONITER = True
+except ImportError:
+    HAS_CRONITER = False
+
+# =============================================================================
+# Configuration
+# =============================================================================
+
+HERMES_DIR = Path.home() / ".hermes"
+CRON_DIR = HERMES_DIR / "cron"
+JOBS_FILE = CRON_DIR / "jobs.json"
+OUTPUT_DIR = CRON_DIR / "output"
+
+
+def ensure_dirs():
+    """Ensure cron directories exist."""
+    CRON_DIR.mkdir(parents=True, exist_ok=True)
+    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
+
+
+# =============================================================================
+# Schedule Parsing
+# =============================================================================
+
+def parse_duration(s: str) -> int:
+    """
+    Parse duration string into minutes.
+    
+    Examples:
+        "30m" → 30
+        "2h" → 120
+        "1d" → 1440
+    """
+    s = s.strip().lower()
+    match = re.match(r'^(\d+)\s*(m|min|mins|minute|minutes|h|hr|hrs|hour|hours|d|day|days)$', s)
+    if not match:
+        raise ValueError(f"Invalid duration: '{s}'. Use format like '30m', '2h', or '1d'")
+    
+    value = int(match.group(1))
+    unit = match.group(2)[0]  # First char: m, h, or d
+    
+    multipliers = {'m': 1, 'h': 60, 'd': 1440}
+    return value * multipliers[unit]
+
+
+def parse_schedule(schedule: str) -> Dict[str, Any]:
+    """
+    Parse schedule string into structured format.
+    
+    Returns dict with:
+        - kind: "once" | "interval" | "cron"
+        - For "once": "run_at" (ISO timestamp)
+        - For "interval": "minutes" (int)
+        - For "cron": "expr" (cron expression)
+    
+    Examples:
+        "30m"              → once in 30 minutes
+        "2h"               → once in 2 hours
+        "every 30m"        → recurring every 30 minutes
+        "every 2h"         → recurring every 2 hours
+        "0 9 * * *"        → cron expression
+        "2026-02-03T14:00" → once at timestamp
+    """
+    schedule = schedule.strip()
+    original = schedule
+    schedule_lower = schedule.lower()
+    
+    # "every X" pattern → recurring interval
+    if schedule_lower.startswith("every "):
+        duration_str = schedule[6:].strip()
+        minutes = parse_duration(duration_str)
+        return {
+            "kind": "interval",
+            "minutes": minutes,
+            "display": f"every {minutes}m"
+        }
+    
+    # Check for cron expression (5 or 6 space-separated fields)
+    # Cron fields: minute hour day month weekday [year]
+    parts = schedule.split()
+    if len(parts) >= 5 and all(
+        re.match(r'^[\d\*\-,/]+$', p) for p in parts[:5]
+    ):
+        if not HAS_CRONITER:
+            raise ValueError("Cron expressions require 'croniter' package. Install with: pip install croniter")
+        # Validate cron expression
+        try:
+            croniter(schedule)
+        except Exception as e:
+            raise ValueError(f"Invalid cron expression '{schedule}': {e}") from e
+        return {
+            "kind": "cron",
+            "expr": schedule,
+            "display": schedule
+        }
+    
+    # ISO timestamp (contains T or looks like date)
+    if 'T' in schedule or re.match(r'^\d{4}-\d{2}-\d{2}', schedule):
+        try:
+            # Parse and validate
+            dt = datetime.fromisoformat(schedule.replace('Z', '+00:00'))
+            return {
+                "kind": "once",
+                "run_at": dt.isoformat(),
+                "display": f"once at {dt.strftime('%Y-%m-%d %H:%M')}"
+            }
+        except ValueError as e:
+            raise ValueError(f"Invalid timestamp '{schedule}': {e}") from e
+    
+    # Duration like "30m", "2h", "1d" → one-shot from now
+    try:
+        minutes = parse_duration(schedule)
+        run_at = _hermes_now() + timedelta(minutes=minutes)
+        return {
+            "kind": "once",
+            "run_at": run_at.isoformat(),
+            "display": f"once in {original}"
+        }
+    except ValueError:
+        pass
+    
+    raise ValueError(
+        f"Invalid schedule '{original}'. Use:\n"
+        f"  - Duration: '30m', '2h', '1d' (one-shot)\n"
+        f"  - Interval: 'every 30m', 'every 2h' (recurring)\n"
+        f"  - Cron: '0 9 * * *' (cron expression)\n"
+        f"  - Timestamp: '2026-02-03T14:00:00' (one-shot at time)"
+    )
+
+
+def _ensure_aware(dt: datetime) -> datetime:
+    """Make a naive datetime tz-aware using the configured timezone.
+
+    Handles backward compatibility: timestamps stored before timezone support
+    are naive (server-local).  We assume they were in the same timezone as
+    the current configuration so comparisons work without crashing.
+    """
+    if dt.tzinfo is None:
+        tz = _hermes_now().tzinfo
+        return dt.replace(tzinfo=tz)
+    return dt
+
+
+def compute_next_run(schedule: Dict[str, Any], last_run_at: Optional[str] = None) -> Optional[str]:
+    """
+    Compute the next run time for a schedule.
+
+    Returns ISO timestamp string, or None if no more runs.
+    """
+    now = _hermes_now()
+
+    if schedule["kind"] == "once":
+        run_at = _ensure_aware(datetime.fromisoformat(schedule["run_at"]))
+        # If in the future, return it; if in the past, no more runs
+        return schedule["run_at"] if run_at > now else None
+
+    elif schedule["kind"] == "interval":
+        minutes = schedule["minutes"]
+        if last_run_at:
+            # Next run is last_run + interval
+            last = _ensure_aware(datetime.fromisoformat(last_run_at))
+            next_run = last + timedelta(minutes=minutes)
+        else:
+            # First run is now + interval
+            next_run = now + timedelta(minutes=minutes)
+        return next_run.isoformat()
+
+    elif schedule["kind"] == "cron":
+        if not HAS_CRONITER:
+            return None
+        cron = croniter(schedule["expr"], now)
+        next_run = cron.get_next(datetime)
+        return next_run.isoformat()
+
+    return None
+
+
+# =============================================================================
+# Job CRUD Operations
+# =============================================================================
+
+def load_jobs() -> List[Dict[str, Any]]:
+    """Load all jobs from storage."""
+    ensure_dirs()
+    if not JOBS_FILE.exists():
+        return []
+    
+    try:
+        with open(JOBS_FILE, 'r', encoding='utf-8') as f:
+            data = json.load(f)
+            return data.get("jobs", [])
+    except (json.JSONDecodeError, IOError):
+        return []
+
+
+def save_jobs(jobs: List[Dict[str, Any]]):
+    """Save all jobs to storage."""
+    ensure_dirs()
+    fd, tmp_path = tempfile.mkstemp(dir=str(JOBS_FILE.parent), suffix='.tmp', prefix='.jobs_')
+    try:
+        with os.fdopen(fd, 'w', encoding='utf-8') as f:
+            json.dump({"jobs": jobs, "updated_at": _hermes_now().isoformat()}, f, indent=2)
+            f.flush()
+            os.fsync(f.fileno())
+        os.replace(tmp_path, JOBS_FILE)
+    except BaseException:
+        try:
+            os.unlink(tmp_path)
+        except OSError:
+            pass
+        raise
+
+
+def create_job(
+    prompt: str,
+    schedule: str,
+    name: Optional[str] = None,
+    repeat: Optional[int] = None,
+    deliver: Optional[str] = None,
+    origin: Optional[Dict[str, Any]] = None
+) -> Dict[str, Any]:
+    """
+    Create a new cron job.
+    
+    Args:
+        prompt: The prompt to run (must be self-contained)
+        schedule: Schedule string (see parse_schedule)
+        name: Optional friendly name
+        repeat: How many times to run (None = forever, 1 = once)
+        deliver: Where to deliver output ("origin", "local", "telegram", etc.)
+        origin: Source info where job was created (for "origin" delivery)
+    
+    Returns:
+        The created job dict
+    """
+    parsed_schedule = parse_schedule(schedule)
+    
+    # Auto-set repeat=1 for one-shot schedules if not specified
+    if parsed_schedule["kind"] == "once" and repeat is None:
+        repeat = 1
+    
+    # Default delivery to origin if available, otherwise local
+    if deliver is None:
+        deliver = "origin" if origin else "local"
+    
+    job_id = uuid.uuid4().hex[:12]
+    now = _hermes_now().isoformat()
+    
+    job = {
+        "id": job_id,
+        "name": name or prompt[:50].strip(),
+        "prompt": prompt,
+        "schedule": parsed_schedule,
+        "schedule_display": parsed_schedule.get("display", schedule),
+        "repeat": {
+            "times": repeat,  # None = forever
+            "completed": 0
+        },
+        "enabled": True,
+        "created_at": now,
+        "next_run_at": compute_next_run(parsed_schedule),
+        "last_run_at": None,
+        "last_status": None,
+        "last_error": None,
+        # Delivery configuration
+        "deliver": deliver,
+        "origin": origin,  # Tracks where job was created for "origin" delivery
+    }
+    
+    jobs = load_jobs()
+    jobs.append(job)
+    save_jobs(jobs)
+    
+    return job
+
+
+def get_job(job_id: str) -> Optional[Dict[str, Any]]:
+    """Return the job with the given ID, or None if no job matches."""
+    jobs = load_jobs()
+    for job in jobs:
+        if job["id"] == job_id:
+            return job
+    return None
+
+
+def list_jobs(include_disabled: bool = False) -> List[Dict[str, Any]]:
+    """List all jobs, optionally including disabled ones."""
+    jobs = load_jobs()
+    if not include_disabled:
+        jobs = [j for j in jobs if j.get("enabled", True)]
+    return jobs
+
+
+def update_job(job_id: str, updates: Dict[str, Any]) -> Optional[Dict[str, Any]]:
+    """Update a job by ID."""
+    jobs = load_jobs()
+    for i, job in enumerate(jobs):
+        if job["id"] == job_id:
+            jobs[i] = {**job, **updates}
+            save_jobs(jobs)
+            return jobs[i]
+    return None
+
+
+def remove_job(job_id: str) -> bool:
+    """Remove a job by ID."""
+    jobs = load_jobs()
+    original_len = len(jobs)
+    jobs = [j for j in jobs if j["id"] != job_id]
+    if len(jobs) < original_len:
+        save_jobs(jobs)
+        return True
+    return False
+
+
+def mark_job_run(job_id: str, success: bool, error: Optional[str] = None):
+    """
+    Mark a job as having been run.
+    
+    Updates last_run_at, last_status, increments completed count,
+    computes next_run_at, and auto-deletes if repeat limit reached.
+    """
+    jobs = load_jobs()
+    for i, job in enumerate(jobs):
+        if job["id"] == job_id:
+            now = _hermes_now().isoformat()
+            job["last_run_at"] = now
+            job["last_status"] = "ok" if success else "error"
+            job["last_error"] = error if not success else None
+            
+            # Increment completed count
+            if job.get("repeat"):
+                job["repeat"]["completed"] = job["repeat"].get("completed", 0) + 1
+                
+                # Check if we've hit the repeat limit
+                times = job["repeat"].get("times")
+                completed = job["repeat"]["completed"]
+                if times is not None and completed >= times:
+                    # Remove the job (limit reached)
+                    jobs.pop(i)
+                    save_jobs(jobs)
+                    return
+            
+            # Compute next run
+            job["next_run_at"] = compute_next_run(job["schedule"], now)
+            
+            # If no next run (one-shot completed), disable
+            if job["next_run_at"] is None:
+                job["enabled"] = False
+            
+            save_jobs(jobs)
+            return
+    
+    save_jobs(jobs)
+
+
+def get_due_jobs() -> List[Dict[str, Any]]:
+    """Get all jobs that are due to run now."""
+    now = _hermes_now()
+    jobs = load_jobs()
+    due = []
+    
+    for job in jobs:
+        if not job.get("enabled", True):
+            continue
+        
+        next_run = job.get("next_run_at")
+        if not next_run:
+            continue
+        
+        next_run_dt = _ensure_aware(datetime.fromisoformat(next_run))
+        if next_run_dt <= now:
+            due.append(job)
+    
+    return due
+
+
+def save_job_output(job_id: str, output: str):
+    """Save job output to file."""
+    ensure_dirs()
+    job_output_dir = OUTPUT_DIR / job_id
+    job_output_dir.mkdir(parents=True, exist_ok=True)
+    
+    timestamp = _hermes_now().strftime("%Y-%m-%d_%H-%M-%S")
+    output_file = job_output_dir / f"{timestamp}.md"
+    
+    with open(output_file, 'w', encoding='utf-8') as f:
+        f.write(output)
+    
+    return output_file
--- a/cron/scheduler.py
+++ b/cron/scheduler.py
@@ -0,0 +1,390 @@
+"""
+Cron job scheduler - executes due jobs.
+
+Provides tick() which checks for due jobs and runs them. The gateway
+calls this every 60 seconds from a background thread.
+
+Uses a file-based lock (~/.hermes/cron/.tick.lock) so only one tick
+runs at a time if multiple processes overlap.
+"""
+
+import asyncio
+import logging
+import os
+import sys
+import traceback
+
+# fcntl is Unix-only; on Windows use msvcrt for file locking
+try:
+    import fcntl
+except ImportError:
+    fcntl = None
+    try:
+        import msvcrt
+    except ImportError:
+        msvcrt = None
+from datetime import datetime
+from pathlib import Path
+from typing import Optional
+
+from hermes_time import now as _hermes_now
+
+logger = logging.getLogger(__name__)
+
+# Add parent directory to path for imports
+sys.path.insert(0, str(Path(__file__).parent.parent))
+
+from cron.jobs import get_due_jobs, mark_job_run, save_job_output
+
+# Resolve Hermes home directory (respects HERMES_HOME override)
+_hermes_home = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
+
+# File-based lock prevents concurrent ticks from gateway + daemon + systemd timer
+_LOCK_DIR = _hermes_home / "cron"
+_LOCK_FILE = _LOCK_DIR / ".tick.lock"
+
+
+def _resolve_origin(job: dict) -> Optional[dict]:
+    """Extract origin info from a job, returning {platform, chat_id, chat_name} or None."""
+    origin = job.get("origin")
+    if not origin:
+        return None
+    platform = origin.get("platform")
+    chat_id = origin.get("chat_id")
+    if platform and chat_id:
+        return origin
+    return None
+
+
+def _deliver_result(job: dict, content: str) -> None:
+    """
+    Deliver job output to the configured target (origin chat, specific platform, etc.).
+
+    Uses the standalone platform send functions from send_message_tool so delivery
+    works whether or not the gateway is running.
+    """
+    deliver = job.get("deliver", "local")
+    origin = _resolve_origin(job)
+
+    if deliver == "local":
+        return
+
+    # Resolve target platform + chat_id
+    if deliver == "origin":
+        if not origin:
+            logger.warning("Job '%s' deliver=origin but no origin stored, skipping delivery", job["id"])
+            return
+        platform_name = origin["platform"]
+        chat_id = origin["chat_id"]
+    elif ":" in deliver:
+        platform_name, chat_id = deliver.split(":", 1)
+    else:
+        # Bare platform name like "telegram" — need to resolve to origin or home channel
+        platform_name = deliver
+        if origin and origin.get("platform") == platform_name:
+            chat_id = origin["chat_id"]
+        else:
+            # Fall back to home channel
+            chat_id = os.getenv(f"{platform_name.upper()}_HOME_CHANNEL", "")
+            if not chat_id:
+                logger.warning("Job '%s' deliver=%s but no chat_id or home channel. Set via: hermes config set %s_HOME_CHANNEL <channel_id>", job["id"], deliver, platform_name.upper())
+                return
+
+    from tools.send_message_tool import _send_to_platform
+    from gateway.config import load_gateway_config, Platform
+
+    platform_map = {
+        "telegram": Platform.TELEGRAM,
+        "discord": Platform.DISCORD,
+        "slack": Platform.SLACK,
+        "whatsapp": Platform.WHATSAPP,
+        "signal": Platform.SIGNAL,
+    }
+    platform = platform_map.get(platform_name.lower())
+    if not platform:
+        logger.warning("Job '%s': unknown platform '%s' for delivery", job["id"], platform_name)
+        return
+
+    try:
+        config = load_gateway_config()
+    except Exception as e:
+        logger.error("Job '%s': failed to load gateway config for delivery: %s", job["id"], e)
+        return
+
+    pconfig = config.platforms.get(platform)
+    if not pconfig or not pconfig.enabled:
+        logger.warning("Job '%s': platform '%s' not configured/enabled", job["id"], platform_name)
+        return
+
+    # Run the async send in its own event loop; use a worker thread if a loop is already running here.
+    try:
+        try:
+            asyncio.get_running_loop()
+        except RuntimeError:
+            result = asyncio.run(_send_to_platform(platform, pconfig, chat_id, content))
+        else:
+            import concurrent.futures
+            with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
+                result = pool.submit(asyncio.run, _send_to_platform(platform, pconfig, chat_id, content)).result(timeout=30)
+    except Exception as e:
+        logger.error("Job '%s': delivery to %s:%s failed: %s", job["id"], platform_name, chat_id, e)
+        return
+
+    if result and result.get("error"):
+        logger.error("Job '%s': delivery error: %s", job["id"], result["error"])
+    else:
+        logger.info("Job '%s': delivered to %s:%s", job["id"], platform_name, chat_id)
+        # Mirror the delivered content into the target's gateway session
+        try:
+            from gateway.mirror import mirror_to_session
+            mirror_to_session(platform_name, chat_id, content, source_label="cron")
+        except Exception as mirror_err:
+            logger.debug("Job '%s': session mirror failed (non-fatal): %s", job["id"], mirror_err)
+
+
+def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
+    """
+    Execute a single cron job.
+    
+    Returns:
+        Tuple of (success, full_output_doc, final_response, error_message)
+    """
+    from run_agent import AIAgent
+    
+    job_id = job["id"]
+    job_name = job["name"]
+    prompt = job["prompt"]
+    origin = _resolve_origin(job)
+    
+    logger.info("Running job '%s' (ID: %s)", job_name, job_id)
+    logger.info("Prompt: %s", prompt[:100])
+
+    # Inject origin context so the agent's send_message tool knows the chat
+    if origin:
+        os.environ["HERMES_SESSION_PLATFORM"] = origin["platform"]
+        os.environ["HERMES_SESSION_CHAT_ID"] = str(origin["chat_id"])
+        if origin.get("chat_name"):
+            os.environ["HERMES_SESSION_CHAT_NAME"] = origin["chat_name"]
+
+    try:
+        # Re-read .env and config.yaml fresh every run so provider/key
+        # changes take effect without a gateway restart.
+        from dotenv import load_dotenv
+        try:
+            load_dotenv(str(_hermes_home / ".env"), override=True, encoding="utf-8")
+        except UnicodeDecodeError:
+            load_dotenv(str(_hermes_home / ".env"), override=True, encoding="latin-1")
+
+        model = os.getenv("HERMES_MODEL") or os.getenv("LLM_MODEL") or "anthropic/claude-opus-4.6"
+
+        # Load config.yaml for model, reasoning, prefill, toolsets, provider routing
+        _cfg = {}
+        try:
+            import yaml
+            _cfg_path = str(_hermes_home / "config.yaml")
+            if os.path.exists(_cfg_path):
+                with open(_cfg_path, encoding="utf-8") as _f:
+                    _cfg = yaml.safe_load(_f) or {}
+                _model_cfg = _cfg.get("model", {})
+                if isinstance(_model_cfg, str):
+                    model = _model_cfg
+                elif isinstance(_model_cfg, dict):
+                    model = _model_cfg.get("default", model)
+        except Exception as cfg_err:
+            logger.warning("Job '%s': could not read config.yaml, using defaults: %s", job_id, cfg_err)
+
+        # Reasoning config from env or config.yaml
+        reasoning_config = None
+        effort = os.getenv("HERMES_REASONING_EFFORT", "")
+        if not effort:
+            effort = str(_cfg.get("agent", {}).get("reasoning_effort", "")).strip()
+        if effort and effort.lower() != "none":
+            valid = ("xhigh", "high", "medium", "low", "minimal")
+            if effort.lower() in valid:
+                reasoning_config = {"enabled": True, "effort": effort.lower()}
+        elif effort.lower() == "none":
+            reasoning_config = {"enabled": False}
+
+        # Prefill messages from env or config.yaml
+        prefill_messages = None
+        prefill_file = os.getenv("HERMES_PREFILL_MESSAGES_FILE", "") or _cfg.get("prefill_messages_file", "")
+        if prefill_file:
+            import json as _json
+            pfpath = Path(prefill_file).expanduser()
+            if not pfpath.is_absolute():
+                pfpath = _hermes_home / pfpath
+            if pfpath.exists():
+                try:
+                    with open(pfpath, "r", encoding="utf-8") as _pf:
+                        prefill_messages = _json.load(_pf)
+                    if not isinstance(prefill_messages, list):
+                        prefill_messages = None
+                except Exception:
+                    prefill_messages = None
+
+        # Max iterations
+        max_iterations = _cfg.get("agent", {}).get("max_turns") or _cfg.get("max_turns") or 90
+
+        # Provider routing
+        pr = _cfg.get("provider_routing", {})
+
+        from hermes_cli.runtime_provider import (
+            resolve_runtime_provider,
+            format_runtime_provider_error,
+        )
+        try:
+            runtime = resolve_runtime_provider(
+                requested=os.getenv("HERMES_INFERENCE_PROVIDER"),
+            )
+        except Exception as exc:
+            message = format_runtime_provider_error(exc)
+            raise RuntimeError(message) from exc
+
+        agent = AIAgent(
+            model=model,
+            api_key=runtime.get("api_key"),
+            base_url=runtime.get("base_url"),
+            provider=runtime.get("provider"),
+            api_mode=runtime.get("api_mode"),
+            max_iterations=max_iterations,
+            reasoning_config=reasoning_config,
+            prefill_messages=prefill_messages,
+            providers_allowed=pr.get("only"),
+            providers_ignored=pr.get("ignore"),
+            providers_order=pr.get("order"),
+            provider_sort=pr.get("sort"),
+            quiet_mode=True,
+            session_id=f"cron_{job_id}_{_hermes_now().strftime('%Y%m%d_%H%M%S')}"
+        )
+        
+        result = agent.run_conversation(prompt)
+        
+        final_response = result.get("final_response", "")
+        if not final_response:
+            final_response = "(No response generated)"
+        
+        output = f"""# Cron Job: {job_name}
+
+**Job ID:** {job_id}
+**Run Time:** {_hermes_now().strftime('%Y-%m-%d %H:%M:%S')}
+**Schedule:** {job.get('schedule_display', 'N/A')}
+
+## Prompt
+
+{prompt}
+
+## Response
+
+{final_response}
+"""
+        
+        logger.info("Job '%s' completed successfully", job_name)
+        return True, output, final_response, None
+        
+    except Exception as e:
+        error_msg = f"{type(e).__name__}: {str(e)}"
+        logger.error("Job '%s' failed: %s", job_name, error_msg)
+        
+        output = f"""# Cron Job: {job_name} (FAILED)
+
+**Job ID:** {job_id}
+**Run Time:** {_hermes_now().strftime('%Y-%m-%d %H:%M:%S')}
+**Schedule:** {job.get('schedule_display', 'N/A')}
+
+## Prompt
+
+{prompt}
+
+## Error
+
+```
+{error_msg}
+
+{traceback.format_exc()}
+```
+"""
+        return False, output, "", error_msg
+
+    finally:
+        # Clean up injected env vars so they don't leak to other jobs
+        for key in ("HERMES_SESSION_PLATFORM", "HERMES_SESSION_CHAT_ID", "HERMES_SESSION_CHAT_NAME"):
+            os.environ.pop(key, None)
+
+
+def tick(verbose: bool = True) -> int:
+    """
+    Check and run all due jobs.
+    
+    Uses a file lock so only one tick runs at a time, even if the gateway's
+    in-process ticker and a standalone daemon or manual tick overlap.
+    
+    Args:
+        verbose: Whether to print status messages
+    
+    Returns:
+        Number of jobs executed (0 if another tick is already running)
+    """
+    _LOCK_DIR.mkdir(parents=True, exist_ok=True)
+
+    # Cross-platform file locking: fcntl on Unix, msvcrt on Windows
+    lock_fd = None
+    try:
+        lock_fd = open(_LOCK_FILE, "w")
+        if fcntl:
+            fcntl.flock(lock_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
+        elif msvcrt:
+            msvcrt.locking(lock_fd.fileno(), msvcrt.LK_NBLCK, 1)
+    except (OSError, IOError):
+        logger.debug("Tick skipped — another instance holds the lock")
+        if lock_fd is not None:
+            lock_fd.close()
+        return 0
+
+    try:
+        due_jobs = get_due_jobs()
+
+        if verbose and not due_jobs:
+            logger.info("%s - No jobs due", _hermes_now().strftime('%H:%M:%S'))
+            return 0
+
+        if verbose:
+            logger.info("%s - %s job(s) due", _hermes_now().strftime('%H:%M:%S'), len(due_jobs))
+
+        executed = 0
+        for job in due_jobs:
+            try:
+                success, output, final_response, error = run_job(job)
+
+                output_file = save_job_output(job["id"], output)
+                if verbose:
+                    logger.info("Output saved to: %s", output_file)
+
+                # Deliver the final response to the origin/target chat
+                deliver_content = final_response if success else f"⚠️ Cron job '{job.get('name', job['id'])}' failed:\n{error}"
+                if deliver_content:
+                    try:
+                        _deliver_result(job, deliver_content)
+                    except Exception as de:
+                        logger.error("Delivery failed for job %s: %s", job["id"], de)
+
+                mark_job_run(job["id"], success, error)
+                executed += 1
+
+            except Exception as e:
+                logger.error("Error processing job %s: %s", job['id'], e)
+                mark_job_run(job["id"], False, str(e))
+
+        return executed
+    finally:
+        if fcntl:
+            fcntl.flock(lock_fd, fcntl.LOCK_UN)
+        elif msvcrt:
+            try:
+                msvcrt.locking(lock_fd.fileno(), msvcrt.LK_UNLCK, 1)
+            except (OSError, IOError):
+                pass
+        lock_fd.close()
+
+
+if __name__ == "__main__":
+    tick(verbose=True)
--- a/datagen-config-examples/run_browser_tasks.sh
+++ b/datagen-config-examples/run_browser_tasks.sh
@@ -29,7 +29,7 @@ echo "📝 Logging to: $LOG_FILE"
 # Point to the example dataset in this directory
 SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"

-python scripts/batch_runner.py \
+python batch_runner.py \
  --dataset_file="$SCRIPT_DIR/example_browser_tasks.jsonl" \
  --batch_size=5 \
  --run_name="browser_tasks_example" \
--- a/datagen-config-examples/web_research.yaml
+++ b/datagen-config-examples/web_research.yaml
@@ -1,46 +0,0 @@
-# datagen-config-examples/web_research.yaml
-#
-# Batch data generation config for WebResearchEnv.
-# Generates tool-calling trajectories for multi-step web research tasks.
-#
-# Usage:
-#   python scripts/batch_runner.py \
-#     --config datagen-config-examples/web_research.yaml \
-#     --run_name web_research_v1
-
-environment: web-research
-
-# Toolsets available to the agent during data generation
-toolsets:
-  - web
-  - file
-
-# How many parallel workers to use
-num_workers: 4
-
-# Questions per batch
-batch_size: 20
-
-# Total trajectories to generate (comment out to run full dataset)
-max_items: 500
-
-# Model to use for generation (override with --model flag)
-model: openrouter/nousresearch/hermes-3-llama-3.1-405b
-
-# System prompt additions (ephemeral — not saved to trajectories)
-ephemeral_system_prompt: |
-  You are a highly capable research agent. When asked a factual question,
-  always use web_search to find current, accurate information before answering.
-  Cite at least 2 sources. Be concise and accurate.
-
-# Output directory
-output_dir: data/web_research_v1
-
-# Trajectory compression settings (for fitting into training token budgets)
-compression:
-  enabled: true
-  target_max_tokens: 16000
-
-# Eval settings
-eval_every: 100       # Run eval every N trajectories
-eval_size: 25         # Number of held-out questions per eval run
--- a/docker/SOUL.md
+++ b/docker/SOUL.md
@@ -1,15 +0,0 @@
-# Hermes Agent Persona
-
-<!--
-This file defines the agent's personality and tone.
-The agent will embody whatever you write here.
-Edit this to customize how Hermes communicates with you.
-
-Examples:
-  - "You are a warm, playful assistant who uses kaomoji occasionally."
-  - "You are a concise technical expert. No fluff, just facts."
-  - "You speak like a friendly coworker who happens to know everything."
-
-This file is loaded fresh each message -- no restart needed.
-Delete the contents (or this file) to use the default personality.
-->
--- a/docker/entrypoint.sh
+++ b/docker/entrypoint.sh
@@ -1,71 +0,0 @@
-#!/bin/bash
-# Docker/Podman entrypoint: bootstrap config files into the mounted volume, then run hermes.
-set -e
-
-HERMES_HOME="${HERMES_HOME:-/opt/data}"
-INSTALL_DIR="/opt/hermes"
-
-# --- Privilege dropping via gosu ---
-# When started as root (the default for Docker, or fakeroot in rootless Podman),
-# optionally remap the hermes user/group to match host-side ownership, fix volume
-# permissions, then re-exec as hermes.
-if [ "$(id -u)" = "0" ]; then
-    if [ -n "$HERMES_UID" ] && [ "$HERMES_UID" != "$(id -u hermes)" ]; then
-        echo "Changing hermes UID to $HERMES_UID"
-        usermod -u "$HERMES_UID" hermes
-    fi
-
-    if [ -n "$HERMES_GID" ] && [ "$HERMES_GID" != "$(id -g hermes)" ]; then
-        echo "Changing hermes GID to $HERMES_GID"
-        # -o allows non-unique GID (e.g. macOS GID 20 "staff" may already exist
-        # as "dialout" in the Debian-based container image)
-        groupmod -o -g "$HERMES_GID" hermes 2>/dev/null || true
-    fi
-
-    actual_hermes_uid=$(id -u hermes)
-    if [ "$(stat -c %u "$HERMES_HOME" 2>/dev/null)" != "$actual_hermes_uid" ]; then
-        echo "$HERMES_HOME is not owned by $actual_hermes_uid, fixing"
-        # In rootless Podman the container's "root" is mapped to an unprivileged
-        # host UID — chown will fail.  That's fine: the volume is already owned
-        # by the mapped user on the host side.
-        chown -R hermes:hermes "$HERMES_HOME" 2>/dev/null || \
-            echo "Warning: chown failed (rootless container?) — continuing anyway"
-    fi
-
-    echo "Dropping root privileges"
-    exec gosu hermes "$0" "$@"
-fi
-
-# --- Running as hermes from here ---
-source "${INSTALL_DIR}/.venv/bin/activate"
-
-# Create essential directory structure.  Cache and platform directories
-# (cache/images, cache/audio, platforms/whatsapp, etc.) are created on
-# demand by the application — don't pre-create them here so new installs
-# get the consolidated layout from get_hermes_dir().
-# The "home/" subdirectory is a per-profile HOME for subprocesses (git,
-# ssh, gh, npm …).  Without it those tools write to /root which is
-# ephemeral and shared across profiles.  See issue #4426.
-mkdir -p "$HERMES_HOME"/{cron,sessions,logs,hooks,memories,skills,skins,plans,workspace,home}
-
-# .env
-if [ ! -f "$HERMES_HOME/.env" ]; then
-    cp "$INSTALL_DIR/.env.example" "$HERMES_HOME/.env"
-fi
-
-# config.yaml
-if [ ! -f "$HERMES_HOME/config.yaml" ]; then
-    cp "$INSTALL_DIR/cli-config.yaml.example" "$HERMES_HOME/config.yaml"
-fi
-
-# SOUL.md
-if [ ! -f "$HERMES_HOME/SOUL.md" ]; then
-    cp "$INSTALL_DIR/docker/SOUL.md" "$HERMES_HOME/SOUL.md"
-fi
-
-# Sync bundled skills (manifest-based so user edits are preserved)
-if [ -d "$INSTALL_DIR/skills" ]; then
-    hermes-skills-sync
-fi
-
-exec hermes "$@"
--- a/environments/README.md
+++ b/environments/README.md
@@ -101,11 +101,21 @@ Available methods:

 ### Patches (`patches.py`)

-**Problem**: Some hermes-agent tools use `asyncio.run()` internally (e.g., the Modal backend). This crashes when called from inside Atropos's event loop because `asyncio.run()` cannot be nested.
+**Problem**: Some hermes-agent tools use `asyncio.run()` internally (e.g., mini-swe-agent's Modal backend via SWE-ReX). This crashes when called from inside Atropos's event loop because `asyncio.run()` cannot be nested.

-**Solution**: `ModalEnvironment` uses a dedicated `_AsyncWorker` background thread with its own event loop. The calling code sees a sync interface, but internally all async Modal SDK calls happen on the worker thread so they don't conflict with Atropos's loop. This is built directly into `tools/environments/modal.py` — no monkey-patching required.
+**Solution**: `patches.py` monkey-patches `SwerexModalEnvironment` to use a dedicated background thread (`_AsyncWorker`) with its own event loop. The calling code sees the same sync interface, but internally the async work happens on a separate thread that doesn't conflict with Atropos's loop.

-`patches.py` is now a no-op (kept for backward compatibility with imports).
+What gets patched:
+- `SwerexModalEnvironment.__init__` -- creates Modal deployment on a background thread
+- `SwerexModalEnvironment.execute` -- runs commands on the same background thread
+- `SwerexModalEnvironment.stop` -- stops deployment on the background thread
+
+The patches are:
+- **Idempotent** -- calling `apply_patches()` multiple times is safe
+- **Transparent** -- same interface and behavior, only the internal async execution changes
+- **Universal** -- also works in normal CLI use, where no event loop is running
+
+Applied automatically at import time by `hermes_base_env.py`.

 ### Tool Call Parsers (`tool_call_parsers/`)

--- a/environments/init.py
+++ b/environments/init.py
@@ -18,14 +18,9 @@ Benchmarks (eval-only):
    - benchmarks/terminalbench_2/: Terminal-Bench 2.0 evaluation
 """

-try:
-    from environments.agent_loop import AgentResult, HermesAgentLoop
-    from environments.tool_context import ToolContext
-    from environments.hermes_base_env import HermesAgentBaseEnv, HermesAgentEnvConfig
-except ImportError:
-    # atroposlib not installed — environments are unavailable but
-    # submodules like tool_call_parsers can still be imported directly.
-    pass
+from environments.agent_loop import AgentResult, HermesAgentLoop
+from environments.tool_context import ToolContext
+from environments.hermes_base_env import HermesAgentBaseEnv, HermesAgentEnvConfig

 __all__ = [
    "AgentResult",
--- a/environments/agent_loop.py
+++ b/environments/agent_loop.py
@@ -18,17 +18,12 @@ import logging
 import os
 import uuid
 from dataclasses import dataclass, field
-from typing import Any, Dict, List, Optional, Set, TYPE_CHECKING
+from typing import Any, Dict, List, Optional, Set

-if TYPE_CHECKING:
-    from hermes_agent.tools.budget_config import BudgetConfig
-
-from hermes_agent.tools.dispatch import handle_function_call
-from hermes_agent.tools.terminal import get_active_env
-from hermes_agent.tools.result_storage import maybe_persist_tool_result, enforce_turn_budget
+from model_tools import handle_function_call

 # Thread pool for running sync tool calls that internally use asyncio.run()
-# (e.g., the Modal/Docker/Daytona terminal backends). Running them in a separate
+# (e.g., mini-swe-agent's modal/docker/daytona backends). Running them in a separate
 # thread gives them a clean event loop so they don't deadlock inside Atropos's loop.
 # Size must be large enough for concurrent eval tasks (e.g., 89 TB2 tasks all
 # making tool calls). Too small = thread pool starvation, tasks queue for minutes.
@@ -44,9 +39,7 @@ def resize_tool_pool(max_workers: int):
    Safe to call before any tasks are submitted.
    """
    global _tool_executor
-    old_executor = _tool_executor
    _tool_executor = concurrent.futures.ThreadPoolExecutor(max_workers=max_workers)
-    old_executor.shutdown(wait=False)
    logger.info("Tool thread pool resized to %d workers", max_workers)

 logger = logging.getLogger(__name__)
@@ -143,7 +136,6 @@ class HermesAgentLoop:
        temperature: float = 1.0,
        max_tokens: Optional[int] = None,
        extra_body: Optional[Dict[str, Any]] = None,
-        budget_config: Optional["BudgetConfig"] = None,
    ):
        """
        Initialize the agent loop.
@@ -160,11 +152,7 @@ class HermesAgentLoop:
            extra_body: Extra parameters passed to the OpenAI client's create() call.
                        Used for OpenRouter provider preferences, transforms, etc.
                        e.g. {"provider": {"ignore": ["DeepInfra"]}}
-            budget_config: Tool result persistence budget. Controls per-tool
-                        thresholds, per-turn aggregate budget, and preview size.
-                        If None, uses DEFAULT_BUDGET (current hardcoded values).
        """
-        from hermes_agent.tools.budget_config import DEFAULT_BUDGET
        self.server = server
        self.tool_schemas = tool_schemas
        self.valid_tool_names = valid_tool_names
@@ -173,7 +161,6 @@ class HermesAgentLoop:
        self.temperature = temperature
        self.max_tokens = max_tokens
        self.extra_body = extra_body
-        self.budget_config = budget_config or DEFAULT_BUDGET

    async def run(self, messages: List[Dict[str, Any]]) -> AgentResult:
        """
@@ -190,7 +177,7 @@ class HermesAgentLoop:
        tool_errors: List[ToolError] = []

        # Per-loop TodoStore for the todo tool (ephemeral, dies with the loop)
-        from hermes_agent.tools.todo import TodoStore, todo_tool as _todo_tool
+        from tools.todo_tool import TodoStore, todo_tool as _todo_tool
        _todo_store = TodoStore()

        # Extract user task from first user message for browser_snapshot context
@@ -262,62 +249,23 @@ class HermesAgentLoop:
            reasoning = _extract_reasoning_from_message(assistant_msg)
            reasoning_per_turn.append(reasoning)

-            # Check for tool calls -- standard OpenAI spec.
-            # Fallback: if response has no structured tool_calls but content
-            # contains raw tool call tags (e.g. <tool_call>), parse them using
-            # hermes-agent's standalone parsers. This handles the case where
-            # ManagedServer's ToolCallTranslator couldn't parse because vLLM
-            # isn't installed.
-            if (
-                not assistant_msg.tool_calls
-                and assistant_msg.content
-                and self.tool_schemas
-                and "<tool_call>" in (assistant_msg.content or "")
-            ):
-                try:
-                    from environments.tool_call_parsers import get_parser
-                    fallback_parser = get_parser("hermes")
-                    parsed_content, parsed_calls = fallback_parser.parse(
-                        assistant_msg.content
-                    )
-                    if parsed_calls:
-                        assistant_msg.tool_calls = parsed_calls
-                        if parsed_content is not None:
-                            assistant_msg.content = parsed_content
-                        logger.debug(
-                            "Fallback parser extracted %d tool calls from raw content",
-                            len(parsed_calls),
-                        )
-                except Exception:
-                    pass  # Fall through to no tool calls
-
+            # Check for tool calls -- standard OpenAI spec
            if assistant_msg.tool_calls:
-                # Normalize tool calls to dicts — they may come as objects
-                # (OpenAI API) or dicts (vLLM ToolCallTranslator).
-                def _tc_to_dict(tc):
-                    if isinstance(tc, dict):
-                        return {
-                            "id": tc.get("id", f"call_{uuid.uuid4().hex[:8]}"),
-                            "type": "function",
-                            "function": {
-                                "name": tc.get("function", {}).get("name", tc.get("name", "")),
-                                "arguments": tc.get("function", {}).get("arguments", tc.get("arguments", "{}")),
-                            },
-                        }
-                    return {
-                        "id": tc.id,
-                        "type": "function",
-                        "function": {
-                            "name": tc.function.name,
-                            "arguments": tc.function.arguments,
-                        },
-                    }
-
                # Build the assistant message dict for conversation history
                msg_dict: Dict[str, Any] = {
                    "role": "assistant",
                    "content": assistant_msg.content or "",
-                    "tool_calls": [_tc_to_dict(tc) for tc in assistant_msg.tool_calls],
+                    "tool_calls": [
+                        {
+                            "id": tc.id,
+                            "type": "function",
+                            "function": {
+                                "name": tc.function.name,
+                                "arguments": tc.function.arguments,
+                            },
+                        }
+                        for tc in assistant_msg.tool_calls
+                    ],
                }

                # Preserve reasoning_content for multi-turn chat template handling
@@ -330,13 +278,8 @@ class HermesAgentLoop:

                # Execute each tool call via hermes-agent's dispatch
                for tc in assistant_msg.tool_calls:
-                    # Handle both object (OpenAI) and dict (vLLM) formats
-                    if isinstance(tc, dict):
-                        tool_name = tc.get("function", {}).get("name", tc.get("name", ""))
-                        tool_args_raw = tc.get("function", {}).get("arguments", tc.get("arguments", "{}"))
-                    else:
-                        tool_name = tc.function.name
-                        tool_args_raw = tc.function.arguments
+                    tool_name = tc.function.name
+                    tool_args_raw = tc.function.arguments

                    # Validate tool name
                    if tool_name not in self.valid_tool_names:
@@ -357,89 +300,78 @@ class HermesAgentLoop:
                            tool_name, turn + 1,
                        )
                    else:
-                        # Parse arguments
+                        # Parse arguments and dispatch
                        try:
                            args = json.loads(tool_args_raw)
-                        except json.JSONDecodeError as e:
-                            args = None
-                            tool_result = json.dumps(
-                                {"error": f"Invalid JSON in tool arguments: {e}. Please retry with valid JSON."}
-                            )
-                            tool_errors.append(ToolError(
-                                turn=turn + 1, tool_name=tool_name,
-                                arguments=tool_args_raw[:200],
-                                error=f"Invalid JSON: {e}",
-                                tool_result=tool_result,
-                            ))
+                        except json.JSONDecodeError:
+                            args = {}
                            logger.warning(
                                "Invalid JSON in tool call arguments for '%s': %s",
                                tool_name, tool_args_raw[:200],
                            )

-                        # Dispatch tool only if arguments parsed successfully
-                        if args is not None:
-                            try:
-                                if tool_name == "terminal":
-                                    backend = os.getenv("TERMINAL_ENV", "local")
-                                    cmd_preview = args.get("command", "")[:80]
-                                    logger.info(
-                                        "[%s] $ %s", self.task_id[:8], cmd_preview,
-                                    )
-
-                                tool_submit_time = _time.monotonic()
-
-                                # Todo tool -- handle locally (needs per-loop TodoStore)
-                                if tool_name == "todo":
-                                    tool_result = _todo_tool(
-                                        todos=args.get("todos"),
-                                        merge=args.get("merge", False),
-                                        store=_todo_store,
-                                    )
-                                    tool_elapsed = _time.monotonic() - tool_submit_time
-                                elif tool_name == "memory":
-                                    tool_result = json.dumps({"error": "Memory is not available in RL environments."})
-                                    tool_elapsed = _time.monotonic() - tool_submit_time
-                                elif tool_name == "session_search":
-                                    tool_result = json.dumps({"error": "Session search is not available in RL environments."})
-                                    tool_elapsed = _time.monotonic() - tool_submit_time
-                                else:
-                                    # Run tool calls in a thread pool so backends that
-                                    # use asyncio.run() internally (modal, docker, daytona) get
-                                    # a clean event loop instead of deadlocking.
-                                    loop = asyncio.get_event_loop()
-                                    # Capture current tool_name/args for the lambda
-                                    _tn, _ta, _tid = tool_name, args, self.task_id
-                                    tool_result = await loop.run_in_executor(
-                                        _tool_executor,
-                                        lambda: handle_function_call(
-                                            _tn, _ta, task_id=_tid,
-                                            user_task=_user_task,
-                                        ),
-                                    )
-                                    tool_elapsed = _time.monotonic() - tool_submit_time
-
-                                # Log slow tools and thread pool stats for debugging
-                                pool_active = _tool_executor._work_queue.qsize()
-                                if tool_elapsed > 30:
-                                    logger.warning(
-                                        "[%s] turn %d: %s took %.1fs (pool queue=%d)",
-                                        self.task_id[:8], turn + 1, tool_name,
-                                        tool_elapsed, pool_active,
-                                    )
-                            except Exception as e:
-                                tool_result = json.dumps(
-                                    {"error": f"Tool execution failed: {type(e).__name__}: {str(e)}"}
+                        try:
+                            if tool_name == "terminal":
+                                backend = os.getenv("TERMINAL_ENV", "local")
+                                cmd_preview = args.get("command", "")[:80]
+                                logger.info(
+                                    "[%s] $ %s", self.task_id[:8], cmd_preview,
                                )
-                                tool_errors.append(ToolError(
-                                    turn=turn + 1, tool_name=tool_name,
-                                    arguments=tool_args_raw[:200],
-                                    error=f"{type(e).__name__}: {str(e)}",
-                                    tool_result=tool_result,
-                                ))
-                                logger.error(
-                                    "Tool '%s' execution failed on turn %d: %s",
-                                    tool_name, turn + 1, e,
+
+                            tool_submit_time = _time.monotonic()
+
+                            # Todo tool -- handle locally (needs per-loop TodoStore)
+                            if tool_name == "todo":
+                                tool_result = _todo_tool(
+                                    todos=args.get("todos"),
+                                    merge=args.get("merge", False),
+                                    store=_todo_store,
                                )
+                                tool_elapsed = _time.monotonic() - tool_submit_time
+                            elif tool_name == "memory":
+                                tool_result = json.dumps({"error": "Memory is not available in RL environments."})
+                                tool_elapsed = _time.monotonic() - tool_submit_time
+                            elif tool_name == "session_search":
+                                tool_result = json.dumps({"error": "Session search is not available in RL environments."})
+                                tool_elapsed = _time.monotonic() - tool_submit_time
+                            else:
+                                # Run tool calls in a thread pool so backends that
+                                # use asyncio.run() internally (modal, docker, daytona) get
+                                # a clean event loop instead of deadlocking.
+                                loop = asyncio.get_event_loop()
+                                # Capture current tool_name/args for the lambda
+                                _tn, _ta, _tid = tool_name, args, self.task_id
+                                tool_result = await loop.run_in_executor(
+                                    _tool_executor,
+                                    lambda: handle_function_call(
+                                        _tn, _ta, task_id=_tid,
+                                        user_task=_user_task,
+                                    ),
+                                )
+                                tool_elapsed = _time.monotonic() - tool_submit_time
+
+                            # Log slow tools and thread pool stats for debugging
+                            pool_active = _tool_executor._work_queue.qsize()
+                            if tool_elapsed > 30:
+                                logger.warning(
+                                    "[%s] turn %d: %s took %.1fs (pool queue=%d)",
+                                    self.task_id[:8], turn + 1, tool_name,
+                                    tool_elapsed, pool_active,
+                                )
+                        except Exception as e:
+                            tool_result = json.dumps(
+                                {"error": f"Tool execution failed: {type(e).__name__}: {str(e)}"}
+                            )
+                            tool_errors.append(ToolError(
+                                turn=turn + 1, tool_name=tool_name,
+                                arguments=tool_args_raw[:200],
+                                error=f"{type(e).__name__}: {str(e)}",
+                                tool_result=tool_result,
+                            ))
+                            logger.error(
+                                "Tool '%s' execution failed on turn %d: %s",
+                                tool_name, turn + 1, e,
+                            )

                        # Also check if the tool returned an error in its JSON result
                        try:
@@ -457,31 +389,15 @@ class HermesAgentLoop:
                        except (json.JSONDecodeError, TypeError):
                            pass

-                    tc_id = tc.get("id", "") if isinstance(tc, dict) else tc.id
-                    tool_result = maybe_persist_tool_result(
-                        content=tool_result,
-                        tool_name=tool_name,
-                        tool_use_id=tc_id,
-                        env=get_active_env(self.task_id),
-                        config=self.budget_config,
-                    )
-
+                    # Add tool response to conversation
                    messages.append(
                        {
                            "role": "tool",
-                            "tool_call_id": tc_id,
+                            "tool_call_id": tc.id,
                            "content": tool_result,
                        }
                    )

-                num_tcs = len(assistant_msg.tool_calls)
-                if num_tcs > 0:
-                    enforce_turn_budget(
-                        messages[-num_tcs:],
-                        env=get_active_env(self.task_id),
-                        config=self.budget_config,
-                    )
-
                turn_elapsed = _time.monotonic() - turn_start
                logger.info(
                    "[%s] turn %d: api=%.1fs, %d tools, turn_total=%.1fs",
--- a/environments/agentic_opd_env.py
+++ b/environments/agentic_opd_env.py
--- a/environments/benchmarks/tblite/local.yaml
+++ b/environments/benchmarks/tblite/local.yaml
@@ -1,38 +0,0 @@
-# OpenThoughts-TBLite Evaluation -- Docker Backend (Local Compute)
-#
-# Runs tasks in Docker containers on the local machine.
-# Sandboxed like Modal but no cloud costs. Good for dev/testing.
-#
-# Usage:
-#   python environments/benchmarks/tblite/tblite_env.py evaluate \
-#       --config environments/benchmarks/tblite/local.yaml
-#
-#   # Override concurrency:
-#   python environments/benchmarks/tblite/tblite_env.py evaluate \
-#       --config environments/benchmarks/tblite/local.yaml \
-#       --env.eval_concurrency 4
-
-env:
-  enabled_toolsets: ["terminal", "file"]
-  max_agent_turns: 60
-  max_token_length: 32000
-  agent_temperature: 0.8
-  terminal_backend: "docker"
-  terminal_timeout: 300
-  tool_pool_size: 16
-  dataset_name: "NousResearch/openthoughts-tblite"
-  test_timeout: 600
-  task_timeout: 1200
-  eval_concurrency: 8          # max 8 tasks at once
-  tokenizer_name: "NousResearch/Hermes-3-Llama-3.1-8B"
-  use_wandb: false
-  wandb_name: "openthoughts-tblite-local"
-  ensure_scores_are_not_same: false
-  data_dir_to_save_evals: "environments/benchmarks/evals/openthoughts-tblite-local"
-
-openai:
-  base_url: "https://openrouter.ai/api/v1"
-  model_name: "anthropic/claude-sonnet-4"
-  server_type: "openai"
-  health_check: false
-  # api_key loaded from OPENROUTER_API_KEY in .env
--- a/environments/benchmarks/tblite/local_vllm.yaml
+++ b/environments/benchmarks/tblite/local_vllm.yaml
@@ -1,40 +0,0 @@
-# OpenThoughts-TBLite Evaluation -- Local vLLM Backend
-#
-# Runs against a local vLLM server with Docker sandboxes.
-#
-# Start the vLLM server from the atropos directory:
-#   python -m example_trainer.vllm_api_server \
-#       --model Qwen/Qwen3-4B-Instruct-2507 \
-#       --port 9001 \
-#       --gpu-memory-utilization 0.8 \
-#       --max-model-len=32000
-#
-# Then run:
-#   python environments/benchmarks/tblite/tblite_env.py evaluate \
-#       --config environments/benchmarks/tblite/local_vllm.yaml
-
-env:
-  enabled_toolsets: ["terminal", "file"]
-  max_agent_turns: 60
-  max_token_length: 16000
-  agent_temperature: 0.6
-  terminal_backend: "docker"
-  terminal_timeout: 300
-  tool_pool_size: 16
-  dataset_name: "NousResearch/openthoughts-tblite"
-  test_timeout: 600
-  task_timeout: 1200
-  eval_concurrency: 8
-  tool_call_parser: "hermes"
-  system_prompt: "You are an expert terminal agent. You MUST use the provided tools to complete tasks. Use the terminal tool to run shell commands, read_file to read files, write_file to write files, search_files to search, and patch to edit files. Do NOT write out solutions as text - execute them using the tools. Always start by exploring the environment with terminal commands."
-  tokenizer_name: "Qwen/Qwen3-4B-Instruct-2507"
-  use_wandb: false
-  wandb_name: "tblite-qwen3-4b-instruct"
-  ensure_scores_are_not_same: false
-  data_dir_to_save_evals: "environments/benchmarks/evals/tblite-qwen3-4b-local"
-
-openai:
-  base_url: "http://localhost:9001"
-  model_name: "Qwen/Qwen3-4B-Instruct-2507"
-  server_type: "vllm"
-  health_check: false
--- a/environments/benchmarks/terminalbench_2/default.yaml
+++ b/environments/benchmarks/terminalbench_2/default.yaml
@@ -29,10 +29,6 @@ env:
  wandb_name: "terminal-bench-2"
  ensure_scores_are_not_same: false
  data_dir_to_save_evals: "environments/benchmarks/evals/terminal-bench-2"
-  # CRITICAL: Limit concurrent Modal sandbox creations to avoid deadlocks.
-  # Modal's blocking calls (App.lookup, etc.) deadlock when too many sandboxes
-  # are created simultaneously inside thread pool workers via asyncio.run().
-  max_concurrent_tasks: 8

 openai:
  base_url: "https://openrouter.ai/api/v1"
--- a/environments/benchmarks/terminalbench_2/terminalbench2_env.py
+++ b/environments/benchmarks/terminalbench_2/terminalbench2_env.py
@@ -44,7 +44,7 @@ import tempfile
 import time
 import uuid
 from collections import defaultdict
-from pathlib import Path, PurePosixPath, PureWindowsPath
+from pathlib import Path
 from typing import Any, Dict, List, Optional, Tuple, Union

 # Ensure repo root is on sys.path for imports
@@ -60,7 +60,7 @@ from atroposlib.envs.server_handling.server_manager import APIServerConfig
 from environments.agent_loop import AgentResult, HermesAgentLoop
 from environments.hermes_base_env import HermesAgentBaseEnv, HermesAgentEnvConfig
 from environments.tool_context import ToolContext
-from hermes_agent.tools.terminal import (
+from tools.terminal_tool import (
    register_task_env_overrides,
    clear_task_env_overrides,
    cleanup_vm,
@@ -118,23 +118,6 @@ class TerminalBench2EvalConfig(HermesAgentEnvConfig):
        "Tasks exceeding this are scored as FAIL. Default 30 minutes.",
    )

-    # --- Concurrency control ---
-    max_concurrent_tasks: int = Field(
-        default=8,
-        description="Maximum number of tasks to run concurrently. "
-        "Limits concurrent Modal sandbox creations to avoid async/threading deadlocks. "
-        "Modal has internal limits and creating too many sandboxes simultaneously "
-        "causes blocking calls to deadlock inside the thread pool.",
-    )
-
-    # --- Eval concurrency ---
-    eval_concurrency: int = Field(
-        default=0,
-        description="Maximum number of tasks to evaluate in parallel. "
-        "0 means unlimited (all tasks run concurrently). "
-        "Set to 8 for local backends to avoid overwhelming the machine.",
-    )
-

 # Tasks that cannot run properly on Modal and are excluded from scoring.
 MODAL_INCOMPATIBLE_TASKS = {
@@ -148,62 +131,6 @@ MODAL_INCOMPATIBLE_TASKS = {
 # Tar extraction helper
 # =============================================================================

-def _normalize_tar_member_parts(member_name: str) -> list:
-    """Return safe path components for a tar member or raise ValueError."""
-    normalized_name = member_name.replace("\\", "/")
-    posix_path = PurePosixPath(normalized_name)
-    windows_path = PureWindowsPath(member_name)
-
-    if (
-        not normalized_name
-        or posix_path.is_absolute()
-        or windows_path.is_absolute()
-        or windows_path.drive
-    ):
-        raise ValueError(f"Unsafe archive member path: {member_name}")
-
-    parts = [part for part in posix_path.parts if part not in ("", ".")]
-    if not parts or any(part == ".." for part in parts):
-        raise ValueError(f"Unsafe archive member path: {member_name}")
-    return parts
-
-
-def _safe_extract_tar(tar: tarfile.TarFile, target_dir: Path) -> None:
-    """Extract a tar archive without allowing traversal or link entries."""
-    target_dir.mkdir(parents=True, exist_ok=True)
-    target_root = target_dir.resolve()
-
-    for member in tar.getmembers():
-        parts = _normalize_tar_member_parts(member.name)
-        target = target_dir.joinpath(*parts)
-        target_real = target.resolve(strict=False)
-
-        try:
-            target_real.relative_to(target_root)
-        except ValueError as exc:
-            raise ValueError(f"Unsafe archive member path: {member.name}") from exc
-
-        if member.isdir():
-            target_real.mkdir(parents=True, exist_ok=True)
-            continue
-
-        if not member.isfile():
-            raise ValueError(f"Unsupported archive member type: {member.name}")
-
-        target_real.parent.mkdir(parents=True, exist_ok=True)
-        extracted = tar.extractfile(member)
-        if extracted is None:
-            raise ValueError(f"Cannot read archive member: {member.name}")
-
-        with extracted, open(target_real, "wb") as dst:
-            shutil.copyfileobj(extracted, dst)
-
-        try:
-            os.chmod(target_real, member.mode & 0o777)
-        except OSError:
-            pass
-
-
 def _extract_base64_tar(b64_data: str, target_dir: Path):
    """Extract a base64-encoded tar.gz archive into target_dir."""
    if not b64_data:
@@ -211,7 +138,7 @@ def _extract_base64_tar(b64_data: str, target_dir: Path):
    raw = base64.b64decode(b64_data)
    buf = io.BytesIO(raw)
    with tarfile.open(fileobj=buf, mode="r:gz") as tar:
-        _safe_extract_tar(tar, target_dir)
+        tar.extractall(path=str(target_dir))


 # =============================================================================
@@ -502,14 +429,8 @@ class TerminalBench2EvalEnv(HermesAgentBaseEnv):
                    "error": "no_image",
                }

-            # --- 2. Register per-task image override ---
-            # Set both modal_image and docker_image so the task image is used
-            # regardless of which backend is configured.
-            register_task_env_overrides(task_id, {
-                "modal_image": modal_image,
-                "docker_image": modal_image,
-                "cwd": "/app",
-            })
+            # --- 2. Register per-task Modal image override ---
+            register_task_env_overrides(task_id, {"modal_image": modal_image})
            logger.info(
                "Task %s: registered image override for task_id %s",
                task_name, task_id[:8],
@@ -524,39 +445,17 @@ class TerminalBench2EvalEnv(HermesAgentBaseEnv):
            messages.append({"role": "user", "content": self.format_prompt(eval_item)})

            # --- 4. Run agent loop ---
-            # Use ManagedServer (Phase 2) for vLLM/SGLang backends to get
-            # token-level tracking via /generate. Falls back to direct
-            # ServerManager (Phase 1) for OpenAI endpoints.
-            if self._use_managed_server():
-                async with self.server.managed_server(
-                    tokenizer=self.tokenizer,
-                    preserve_think_blocks=bool(self.config.thinking_mode),
-                ) as managed:
-                    agent = HermesAgentLoop(
-                        server=managed,
-                        tool_schemas=tools,
-                        valid_tool_names=valid_names,
-                        max_turns=self.config.max_agent_turns,
-                        task_id=task_id,
-                        temperature=self.config.agent_temperature,
-                        max_tokens=self.config.max_token_length,
-                        extra_body=self.config.extra_body,
-                        budget_config=self.config.build_budget_config(),
-                    )
-                    result = await agent.run(messages)
-            else:
-                agent = HermesAgentLoop(
-                    server=self.server,
-                    tool_schemas=tools,
-                    valid_tool_names=valid_names,
-                    max_turns=self.config.max_agent_turns,
-                    task_id=task_id,
-                    temperature=self.config.agent_temperature,
-                    max_tokens=self.config.max_token_length,
-                    extra_body=self.config.extra_body,
-                    budget_config=self.config.build_budget_config(),
-                )
-                result = await agent.run(messages)
+            agent = HermesAgentLoop(
+                server=self.server,
+                tool_schemas=tools,
+                valid_tool_names=valid_names,
+                max_turns=self.config.max_agent_turns,
+                task_id=task_id,
+                temperature=self.config.agent_temperature,
+                max_tokens=self.config.max_token_length,
+                extra_body=self.config.extra_body,
+            )
+            result = await agent.run(messages)

            # --- 5. Verify -- run test suite in the agent's sandbox ---
            # Skip verification if the agent produced no meaningful output
@@ -834,23 +733,12 @@ class TerminalBench2EvalEnv(HermesAgentBaseEnv):
        print(f"  Tool thread pool: {self.config.tool_pool_size}")
        print(f"  Terminal timeout: {self.config.terminal_timeout}s/cmd")
        print(f"  Terminal lifetime: {self.config.terminal_lifetime}s (auto: task_timeout + 120)")
-        print(f"  Max concurrent tasks: {self.config.max_concurrent_tasks}")
        print(f"{'='*60}\n")

-        # Semaphore to limit concurrent Modal sandbox creations.
-        # Without this, all 86 tasks fire simultaneously, each creating a Modal
-        # sandbox via asyncio.run() inside a thread pool worker. Modal's blocking
-        # calls (App.lookup, etc.) deadlock when too many are created at once.
-        semaphore = asyncio.Semaphore(self.config.max_concurrent_tasks)
-
-        async def _eval_with_semaphore(item):
-            async with semaphore:
-                return await self._eval_with_timeout(item)
-
        # Fire all tasks with wall-clock timeout, track live accuracy on the bar
        total_tasks = len(self.all_eval_items)
        eval_tasks = [
-            asyncio.ensure_future(_eval_with_semaphore(item))
+            asyncio.ensure_future(self._eval_with_timeout(item))
            for item in self.all_eval_items
        ]

@@ -876,7 +764,7 @@ class TerminalBench2EvalEnv(HermesAgentBaseEnv):
            # Let cancellations propagate (finally blocks run cleanup_vm)
            await asyncio.gather(*eval_tasks, return_exceptions=True)
            # Belt-and-suspenders: clean up any remaining sandboxes
-            from hermes_agent.tools.terminal import cleanup_all_environments
+            from tools.terminal_tool import cleanup_all_environments
            cleanup_all_environments()
            print("All sandboxes cleaned up.")
            return
@@ -984,7 +872,7 @@ class TerminalBench2EvalEnv(HermesAgentBaseEnv):

        # Kill all remaining sandboxes. Timed-out tasks leave orphaned thread
        # pool workers still executing commands -- cleanup_all stops them.
-        from hermes_agent.tools.terminal import cleanup_all_environments
+        from tools.terminal_tool import cleanup_all_environments
        print("\nCleaning up all sandboxes...")
        cleanup_all_environments()

--- a/environments/benchmarks/yc_bench/yc_bench_env.py
+++ b/environments/benchmarks/yc_bench/yc_bench_env.py
@@ -549,7 +549,6 @@ class YCBenchEvalEnv(HermesAgentBaseEnv):
                temperature=self.config.agent_temperature,
                max_tokens=self.config.max_token_length,
                extra_body=self.config.extra_body,
-                budget_config=self.config.build_budget_config(),
            )
            result = await agent.run(messages)

@@ -709,7 +708,7 @@ class YCBenchEvalEnv(HermesAgentBaseEnv):
            tqdm.write("\n[INTERRUPTED] Stopping evaluation...")
            pbar.close()
            try:
-                from hermes_agent.tools.terminal import cleanup_all_environments
+                from tools.terminal_tool import cleanup_all_environments
                cleanup_all_environments()
            except Exception:
                pass
@@ -819,7 +818,7 @@ class YCBenchEvalEnv(HermesAgentBaseEnv):
            print(f"Results saved to: {self._streaming_path}")

        try:
-            from hermes_agent.tools.terminal import cleanup_all_environments
+            from tools.terminal_tool import cleanup_all_environments
            cleanup_all_environments()
        except Exception:
            pass
--- a/environments/hermes_base_env.py
+++ b/environments/hermes_base_env.py
@@ -62,15 +62,10 @@ from atroposlib.type_definitions import Item

 from environments.agent_loop import AgentResult, HermesAgentLoop
 from environments.tool_context import ToolContext
-from hermes_agent.tools.budget_config import (
-    DEFAULT_RESULT_SIZE_CHARS,
-    DEFAULT_TURN_BUDGET_CHARS,
-    DEFAULT_PREVIEW_SIZE_CHARS,
-)

 # Import hermes-agent toolset infrastructure
-from hermes_agent.tools.dispatch import get_tool_definitions
-from hermes_agent.tools.distributions import sample_toolsets_from_distribution
+from model_tools import get_tool_definitions
+from toolset_distributions import sample_toolsets_from_distribution

 logger = logging.getLogger(__name__)

@@ -165,32 +160,6 @@ class HermesAgentEnvConfig(BaseEnvConfig):
        "Options: hermes, mistral, llama3_json, qwen, deepseek_v3, etc.",
    )

-    # --- Tool result budget ---
-    # Defaults imported from tools.budget_config (single source of truth).
-    default_result_size_chars: int = Field(
-        default=DEFAULT_RESULT_SIZE_CHARS,
-        description="Default per-tool threshold (chars) for persisting large results "
-        "to sandbox. Results exceeding this are written to /tmp/hermes-results/ "
-        "and replaced with a preview. Per-tool registry values take precedence "
-        "unless overridden via tool_result_overrides.",
-    )
-    turn_budget_chars: int = Field(
-        default=DEFAULT_TURN_BUDGET_CHARS,
-        description="Aggregate char budget per assistant turn. If all tool results "
-        "in a single turn exceed this, the largest are persisted to disk first.",
-    )
-    preview_size_chars: int = Field(
-        default=DEFAULT_PREVIEW_SIZE_CHARS,
-        description="Size of the inline preview shown after a tool result is persisted.",
-    )
-    tool_result_overrides: Optional[Dict[str, int]] = Field(
-        default=None,
-        description="Per-tool threshold overrides (chars). Keys are tool names, "
-        "values are char thresholds. Overrides both the default and registry "
-        "per-tool values. Example: {'terminal': 10000, 'search_files': 5000}. "
-        "Note: read_file is pinned to infinity and cannot be overridden.",
-    )
-
    # --- Provider-specific parameters ---
    # Passed as extra_body to the OpenAI client's chat.completions.create() call.
    # Useful for OpenRouter provider preferences, transforms, route settings, etc.
@@ -207,16 +176,6 @@ class HermesAgentEnvConfig(BaseEnvConfig):
        "transforms, and other provider-specific settings.",
    )

-    def build_budget_config(self):
-        """Build a BudgetConfig from env config fields."""
-        from hermes_agent.tools.budget_config import BudgetConfig
-        return BudgetConfig(
-            default_result_size=self.default_result_size_chars,
-            turn_budget=self.turn_budget_chars,
-            preview_size=self.preview_size_chars,
-            tool_overrides=dict(self.tool_result_overrides) if self.tool_result_overrides else {},
-        )
-

 class HermesAgentBaseEnv(BaseEnv):
    """
@@ -270,12 +229,6 @@ class HermesAgentBaseEnv(BaseEnv):
        from environments.agent_loop import resize_tool_pool
        resize_tool_pool(config.tool_pool_size)

-        # Set tool_parser on the ServerManager so ManagedServer uses it
-        # for bidirectional tool call translation (raw text ↔ OpenAI tool_calls).
-        if hasattr(self.server, 'tool_parser'):
-            self.server.tool_parser = config.tool_call_parser
-            print(f"🔧 Tool parser: {config.tool_call_parser}")
-
        # Current group's resolved tools (set in collect_trajectories)
        self._current_group_tools: Optional[Tuple[List[Dict], Set[str]]] = None

@@ -513,14 +466,22 @@ class HermesAgentBaseEnv(BaseEnv):
        # Run the agent loop
        result: AgentResult
        if self._use_managed_server():
-            # Phase 2: ManagedServer with ToolCallTranslator -- exact tokens + logprobs
-            # tool_parser is set on ServerManager in __init__ and passed through
-            # to ManagedServer, which uses ToolCallTranslator for bidirectional
-            # translation between raw text and OpenAI tool_calls.
+            # Phase 2: ManagedServer with parser -- exact tokens + logprobs
+            # Load the tool call parser from registry based on config
+            from environments.tool_call_parsers import get_parser
+            try:
+                tc_parser = get_parser(self.config.tool_call_parser)
+            except KeyError:
+                logger.warning(
+                    "Tool call parser '%s' not found, falling back to 'hermes'",
+                    self.config.tool_call_parser,
+                )
+                tc_parser = get_parser("hermes")
+
            try:
                async with self.server.managed_server(
                    tokenizer=self.tokenizer,
-                    preserve_think_blocks=bool(self.config.thinking_mode),
+                    tool_call_parser=tc_parser,
                ) as managed:
                    agent = HermesAgentLoop(
                        server=managed,
@@ -531,7 +492,6 @@ class HermesAgentBaseEnv(BaseEnv):
                        temperature=self.config.agent_temperature,
                        max_tokens=self.config.max_token_length,
                        extra_body=self.config.extra_body,
-                        budget_config=self.config.build_budget_config(),
                    )
                    result = await agent.run(messages)
            except NotImplementedError:
@@ -549,7 +509,6 @@ class HermesAgentBaseEnv(BaseEnv):
                    temperature=self.config.agent_temperature,
                    max_tokens=self.config.max_token_length,
                    extra_body=self.config.extra_body,
-                    budget_config=self.config.build_budget_config(),
                )
                result = await agent.run(messages)
        else:
@@ -563,7 +522,6 @@ class HermesAgentBaseEnv(BaseEnv):
                temperature=self.config.agent_temperature,
                max_tokens=self.config.max_token_length,
                extra_body=self.config.extra_body,
-                budget_config=self.config.build_budget_config(),
            )
            result = await agent.run(messages)

--- a/environments/patches.py
+++ b/environments/patches.py
@@ -2,34 +2,187 @@
 Monkey patches for making hermes-agent tools work inside async frameworks (Atropos).

 Problem:
-    Some tools use asyncio.run() internally (e.g., Modal backend via SWE-ReX,
+    Some tools use asyncio.run() internally (e.g., mini-swe-agent's Modal backend,
    web_extract). This crashes when called from inside Atropos's event loop because
    asyncio.run() can't be nested.

 Solution:
-    The Modal environment (tools/environments/modal.py) now uses a dedicated
-    _AsyncWorker thread internally, making it safe for both CLI and Atropos use.
-    No monkey-patching is required.
+    Replace the problematic methods with versions that use a dedicated background
+    thread with its own event loop. The calling code sees the same sync interface --
+    call a function, get a result -- but internally the async work happens on a
+    separate thread that doesn't conflict with Atropos's loop.

-    This module is kept for backward compatibility. apply_patches() is a no-op.
+    These patches are safe for normal CLI use too: when there's no running event
+    loop, the behavior is identical (the background thread approach works regardless).
+
+What gets patched:
+    - SwerexModalEnvironment.__init__ -- creates Modal deployment on a background thread
+    - SwerexModalEnvironment.execute -- runs commands on the same background thread
+    - SwerexModalEnvironment.stop -- stops deployment on the background thread

 Usage:
    Call apply_patches() once at import time (done automatically by hermes_base_env.py).
-    This is idempotent and safe to call multiple times.
+    This is idempotent -- calling it multiple times is safe.
 """

+import asyncio
 import logging
+import threading
+from typing import Any

 logger = logging.getLogger(__name__)

 _patches_applied = False


+class _AsyncWorker:
+    """
+    A dedicated background thread with its own event loop.
+
+    Allows sync code to submit async coroutines and block for results,
+    even when called from inside another running event loop. Used to
+    bridge sync tool interfaces with async backends (Modal, SWE-ReX).
+    """
+
+    def __init__(self):
+        self._loop: asyncio.AbstractEventLoop | None = None
+        self._thread: threading.Thread | None = None
+        self._started = threading.Event()
+
+    def start(self):
+        """Start the background event loop thread."""
+        self._thread = threading.Thread(target=self._run_loop, daemon=True)
+        self._thread.start()
+        self._started.wait(timeout=30)
+
+    def _run_loop(self):
+        """Background thread entry point -- runs the event loop forever."""
+        self._loop = asyncio.new_event_loop()
+        asyncio.set_event_loop(self._loop)
+        self._started.set()
+        self._loop.run_forever()
+
+    def run_coroutine(self, coro, timeout=600):
+        """
+        Submit a coroutine to the background loop and block until it completes.
+
+        Safe to call from any thread, including threads that already have
+        a running event loop.
+        """
+        if self._loop is None or self._loop.is_closed():
+            raise RuntimeError("AsyncWorker loop is not running")
+        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
+        return future.result(timeout=timeout)
+
+    def stop(self):
+        """Stop the background event loop and join the thread."""
+        if self._loop and self._loop.is_running():
+            self._loop.call_soon_threadsafe(self._loop.stop)
+        if self._thread:
+            self._thread.join(timeout=10)
+
+
+def _patch_swerex_modal():
+    """
+    Monkey patch SwerexModalEnvironment to use a background thread event loop
+    instead of asyncio.run(). This makes it safe to call from inside Atropos's
+    async event loop.
+
+    The patched methods have the exact same interface and behavior -- the only
+    difference is HOW the async work is executed internally.
+    """
+    try:
+        from minisweagent.environments.extra.swerex_modal import (
+            SwerexModalEnvironment,
+            SwerexModalEnvironmentConfig,
+        )
+        from swerex.deployment.modal import ModalDeployment
+        from swerex.runtime.abstract import Command as RexCommand
+    except ImportError:
+        # mini-swe-agent or swe-rex not installed -- nothing to patch
+        logger.debug("mini-swe-agent Modal backend not available, skipping patch")
+        return
+
+    # Keep a reference to the original __init__ (not used by the patched methods)
+    _original_init = SwerexModalEnvironment.__init__
+
+    def _patched_init(self, **kwargs):
+        """Patched __init__: creates Modal deployment on a background thread."""
+        self.config = SwerexModalEnvironmentConfig(**kwargs)
+
+        # Start a dedicated event loop thread for all Modal async operations
+        self._worker = _AsyncWorker()
+        self._worker.start()
+
+        # Create AND start the deployment entirely on the worker's loop/thread
+        # so all gRPC channels and async state are bound to that loop
+        async def _create_and_start():
+            deployment = ModalDeployment(
+                image=self.config.image,
+                startup_timeout=self.config.startup_timeout,
+                runtime_timeout=self.config.runtime_timeout,
+                deployment_timeout=self.config.deployment_timeout,
+                install_pipx=self.config.install_pipx,
+                modal_sandbox_kwargs=self.config.modal_sandbox_kwargs,
+            )
+            await deployment.start()
+            return deployment
+
+        self.deployment = self._worker.run_coroutine(_create_and_start())
+
+    def _patched_execute(self, command: str, cwd: str = "", *, timeout: int | None = None) -> dict[str, Any]:
+        """Patched execute: runs commands on the background thread's loop."""
+        async def _do_execute():
+            return await self.deployment.runtime.execute(
+                RexCommand(
+                    command=command,
+                    shell=True,
+                    check=False,
+                    cwd=cwd or self.config.cwd,
+                    timeout=timeout or self.config.timeout,
+                    merge_output_streams=True,
+                    env=self.config.env if self.config.env else None,
+                )
+            )
+
+        output = self._worker.run_coroutine(_do_execute())
+        return {
+            "output": output.stdout,
+            "returncode": output.exit_code,
+        }
+
+    def _patched_stop(self):
+        """Patched stop: stops deployment on the background thread, then stops the thread."""
+        try:
+            self._worker.run_coroutine(
+                asyncio.wait_for(self.deployment.stop(), timeout=10),
+                timeout=15,
+            )
+        except Exception as exc:
+            logger.debug("Ignoring error while stopping Modal deployment: %s", exc)
+        finally:
+            self._worker.stop()
+
+    # Apply the patches
+    SwerexModalEnvironment.__init__ = _patched_init
+    SwerexModalEnvironment.execute = _patched_execute
+    SwerexModalEnvironment.stop = _patched_stop
+
+    logger.debug("Patched SwerexModalEnvironment for async-safe operation")
+
+
 def apply_patches():
-    """Apply all monkey patches needed for Atropos compatibility."""
+    """
+    Apply all monkey patches needed for Atropos compatibility.
+
+    Safe to call multiple times -- patches are only applied once.
+    Safe for normal CLI use -- patched code works identically when
+    there is no running event loop.
+    """
    global _patches_applied
    if _patches_applied:
        return

-    logger.debug("apply_patches() called; no patches needed (async safety is built-in)")
+    _patch_swerex_modal()
+
    _patches_applied = True
--- a/environments/tool_call_parsers/deepseek_v3_parser.py
+++ b/environments/tool_call_parsers/deepseek_v3_parser.py
@@ -10,13 +10,12 @@ Format uses special unicode tokens:
    <｜tool▁call▁end｜>
    <｜tool▁calls▁end｜>

-Fixes Issue #989: Support for multiple simultaneous tool calls.
+Based on vLLM's DeepSeekV3ToolParser.extract_tool_calls()
 """

 import re
 import uuid
-import logging
-from typing import List, Optional, Tuple
+from typing import List, Optional

 from openai.types.chat.chat_completion_message_tool_call import (
    ChatCompletionMessageToolCall,
@@ -25,7 +24,6 @@ from openai.types.chat.chat_completion_message_tool_call import (

 from environments.tool_call_parsers import ParseResult, ToolCallParser, register_parser

-logger = logging.getLogger(__name__)

@register_parser("deepseek_v3")
 class DeepSeekV3ToolCallParser(ToolCallParser):
@@ -34,56 +32,45 @@ class DeepSeekV3ToolCallParser(ToolCallParser):

    Uses special unicode tokens with fullwidth angle brackets and block elements.
    Extracts type, function name, and JSON arguments from the structured format.
-    Ensures all tool calls are captured when the model executes multiple actions.
    """

    START_TOKEN = "<｜tool▁calls▁begin｜>"

-    # Updated PATTERN: Using \s* instead of literal \n for increased robustness
-    # against variations in model formatting (Issue #989).
+    # Regex captures: type, function_name, function_arguments
    PATTERN = re.compile(
-        r"<｜tool▁call▁begin｜>(?P<type>.*?)<｜tool▁sep｜>(?P<function_name>.*?)\s*```json\s*(?P<function_arguments>.*?)\s*```\s*<｜tool▁call▁end｜>",
+        r"<｜tool▁call▁begin｜>(?P<type>.*?)<｜tool▁sep｜>(?P<function_name>.*?)\n```json\n(?P<function_arguments>.*?)\n```<｜tool▁call▁end｜>",
        re.DOTALL,
    )

    def parse(self, text: str) -> ParseResult:
-        """
-        Parses the input text and extracts all available tool calls.
-        """
        if self.START_TOKEN not in text:
            return text, None

        try:
-            # Using finditer to capture ALL tool calls in the sequence
-            matches = list(self.PATTERN.finditer(text))
+            matches = self.PATTERN.findall(text)
            if not matches:
                return text, None

            tool_calls: List[ChatCompletionMessageToolCall] = []
-            
            for match in matches:
-                func_name = match.group("function_name").strip()
-                func_args = match.group("function_arguments").strip()
-                
+                tc_type, func_name, func_args = match
                tool_calls.append(
                    ChatCompletionMessageToolCall(
                        id=f"call_{uuid.uuid4().hex[:8]}",
                        type="function",
                        function=Function(
-                            name=func_name,
-                            arguments=func_args,
+                            name=func_name.strip(),
+                            arguments=func_args.strip(),
                        ),
                    )
                )

-            if tool_calls:
-                # Content is text before the first tool call block
-                content_index = text.find(self.START_TOKEN)
-                content = text[:content_index].strip()
-                return content if content else None, tool_calls
+            if not tool_calls:
+                return text, None

-            return text, None
+            # Content is everything before the tool calls section
+            content = text[: text.find(self.START_TOKEN)].strip()
+            return content if content else None, tool_calls

-        except Exception as e:
-            logger.error(f"Error parsing DeepSeek V3 tool calls: {e}")
+        except Exception:
            return text, None
--- a/environments/tool_call_parsers/hermes_parser.py
+++ b/environments/tool_call_parsers/hermes_parser.py
@@ -49,8 +49,6 @@ class HermesToolCallParser(ToolCallParser):
                    continue

                tc_data = json.loads(raw_json)
-                if "name" not in tc_data:
-                    continue
                tool_calls.append(
                    ChatCompletionMessageToolCall(
                        id=f"call_{uuid.uuid4().hex[:8]}",
--- a/environments/tool_call_parsers/mistral_parser.py
+++ b/environments/tool_call_parsers/mistral_parser.py
@@ -10,6 +10,7 @@ The [TOOL_CALLS] token is the bot_token used by Mistral models.
 """

 import json
+import re
 import uuid
 from typing import List, Optional

@@ -41,6 +42,9 @@ class MistralToolCallParser(ToolCallParser):
    # The [TOOL_CALLS] token -- may appear as different strings depending on tokenizer
    BOT_TOKEN = "[TOOL_CALLS]"

+    # Fallback regex for pre-v11 format when JSON parsing fails
+    TOOL_CALL_REGEX = re.compile(r"\[?\s*(\{.*?\})\s*\]?", re.DOTALL)
+
    def parse(self, text: str) -> ParseResult:
        if self.BOT_TOKEN not in text:
            return text, None
@@ -67,13 +71,6 @@ class MistralToolCallParser(ToolCallParser):
                    tool_name = raw[:brace_idx].strip()
                    args_str = raw[brace_idx:]

-                    # Validate and clean the JSON arguments
-                    try:
-                        parsed_args = json.loads(args_str)
-                        args_str = json.dumps(parsed_args, ensure_ascii=False)
-                    except json.JSONDecodeError:
-                        pass  # Keep raw if parsing fails
-
                    tool_calls.append(
                        ChatCompletionMessageToolCall(
                            id=_generate_mistral_id(),
@@ -89,8 +86,6 @@ class MistralToolCallParser(ToolCallParser):
                        parsed = [parsed]

                    for tc in parsed:
-                        if "name" not in tc:
-                            continue
                        args = tc.get("arguments", {})
                        if isinstance(args, dict):
                            args = json.dumps(args, ensure_ascii=False)
@@ -105,14 +100,13 @@ class MistralToolCallParser(ToolCallParser):
                            )
                        )
                except json.JSONDecodeError:
-                    # Fallback: extract JSON objects using raw_decode
-                    decoder = json.JSONDecoder()
-                    idx = 0
-                    while idx < len(first_raw):
-                        try:
-                            obj, end_idx = decoder.raw_decode(first_raw, idx)
-                            if isinstance(obj, dict) and "name" in obj:
-                                args = obj.get("arguments", {})
+                    # Fallback regex extraction
+                    match = self.TOOL_CALL_REGEX.findall(first_raw)
+                    if match:
+                        for raw_json in match:
+                            try:
+                                tc = json.loads(raw_json)
+                                args = tc.get("arguments", {})
                                if isinstance(args, dict):
                                    args = json.dumps(args, ensure_ascii=False)
                                tool_calls.append(
@@ -120,13 +114,12 @@ class MistralToolCallParser(ToolCallParser):
                                        id=_generate_mistral_id(),
                                        type="function",
                                        function=Function(
-                                            name=obj["name"], arguments=args
+                                            name=tc["name"], arguments=args
                                        ),
                                    )
                                )
-                            idx = end_idx
-                        except json.JSONDecodeError:
-                            idx += 1
+                            except (json.JSONDecodeError, KeyError):
+                                continue

            if not tool_calls:
                return text, None
--- a/environments/tool_context.py
+++ b/environments/tool_context.py
@@ -31,9 +31,9 @@ from typing import Any, Dict, List, Optional
 import asyncio
 import concurrent.futures

-from hermes_agent.tools.dispatch import handle_function_call
-from hermes_agent.tools.terminal import cleanup_vm
-from hermes_agent.tools.browser.tool import cleanup_browser
+from model_tools import handle_function_call
+from tools.terminal_tool import cleanup_vm
+from tools.browser_tool import cleanup_browser

 logger = logging.getLogger(__name__)

@@ -53,6 +53,7 @@ def _run_tool_in_thread(tool_name: str, arguments: Dict[str, Any], task_id: str)
    try:
        loop = asyncio.get_running_loop()
        # We're in an async context -- need to run in thread
+        import concurrent.futures
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
            future = pool.submit(
                handle_function_call, tool_name, arguments, task_id
@@ -446,7 +447,7 @@ class ToolContext:
        """
        # Kill any background processes from this rollout (safety net)
        try:
-            from hermes_agent.tools.process_registry import process_registry
+            from tools.process_registry import process_registry
            killed = process_registry.kill_all(task_id=self.task_id)
            if killed:
                logger.debug("Process cleanup for task %s: killed %d process(es)", self.task_id, killed)
--- a/environments/web_research_env.py
+++ b/environments/web_research_env.py
@@ -1,719 +0,0 @@
-"""
-WebResearchEnv — RL Environment for Multi-Step Web Research
-============================================================
-
-Trains models to do accurate, efficient, multi-source web research.
-
-Reward signals:
-  - Answer correctness  (LLM judge, 0.0–1.0)
-  - Source diversity    (used ≥2 distinct domains)
-  - Efficiency          (penalizes excessive tool calls)
-  - Tool usage          (bonus for actually using web tools)
-
-Dataset: FRAMES benchmark (Google, 2024) — multi-hop factual questions
-  HuggingFace: google/frames-benchmark
-  Fallback:    built-in sample questions (no HF token needed)
-
-Usage:
-    # Phase 1 (OpenAI-compatible server)
-    python environments/web_research_env.py serve \\
-        --openai.base_url http://localhost:8000/v1 \\
-        --openai.model_name YourModel \\
-        --openai.server_type openai
-
-    # Process mode (offline data generation)
-    python environments/web_research_env.py process \\
-        --env.data_path_to_save_groups data/web_research.jsonl
-
-    # Standalone eval
-    python environments/web_research_env.py evaluate \\
-        --openai.base_url http://localhost:8000/v1 \\
-        --openai.model_name YourModel
-
-Built by: github.com/jackx707
-Inspired by: GroceryMind — production Hermes agent doing live web research
-             across German grocery stores (firecrawl + hermes-agent)
-"""
-
-from __future__ import annotations
-
-import asyncio
-import json
-import logging
-import os
-import random
-import re
-import sys
-from pathlib import Path
-from typing import Any, Dict, List, Optional, Tuple
-from urllib.parse import urlparse
-
-from pydantic import Field
-
-# Ensure hermes-agent root is on path
-_repo_root = Path(__file__).resolve().parent.parent
-if str(_repo_root) not in sys.path:
-    sys.path.insert(0, str(_repo_root))
-
-# ---------------------------------------------------------------------------
-# Optional HuggingFace datasets import
-# ---------------------------------------------------------------------------
-try:
-    from datasets import load_dataset
-    HF_AVAILABLE = True
-except ImportError:
-    HF_AVAILABLE = False
-
-from atroposlib.envs.base import ScoredDataGroup
-from atroposlib.envs.server_handling.server_manager import APIServerConfig
-from atroposlib.type_definitions import Item
-
-from environments.hermes_base_env import HermesAgentBaseEnv, HermesAgentEnvConfig
-from environments.agent_loop import AgentResult
-from environments.tool_context import ToolContext
-
-logger = logging.getLogger(__name__)
-
-# ---------------------------------------------------------------------------
-# Fallback sample dataset (used when HuggingFace is unavailable)
-# Multi-hop questions requiring real web search to answer.
-# ---------------------------------------------------------------------------
-SAMPLE_QUESTIONS = [
-    {
-        "question": "What is the current population of the capital city of the country that won the 2022 FIFA World Cup?",
-        "answer": "Buenos Aires has approximately 3 million people in the city proper, or around 15 million in the greater metro area.",
-        "difficulty": "medium",
-        "hops": 2,
-    },
-    {
-        "question": "Who is the CEO of the company that makes the most widely used open-source container orchestration platform?",
-        "answer": "The Linux Foundation oversees Kubernetes. CNCF (Cloud Native Computing Foundation) is the specific body — it does not have a traditional CEO but has an executive director.",
-        "difficulty": "medium",
-        "hops": 2,
-    },
-    {
-        "question": "What programming language was used to write the original version of the web framework used by Instagram?",
-        "answer": "Django, which Instagram was built on, is written in Python.",
-        "difficulty": "easy",
-        "hops": 2,
-    },
-    {
-        "question": "In what year was the university founded where the inventor of the World Wide Web currently holds a professorship?",
-        "answer": "Tim Berners-Lee holds a professorship at MIT (founded 1861) and the University of Southampton (founded 1952).",
-        "difficulty": "hard",
-        "hops": 3,
-    },
-    {
-        "question": "What is the latest stable version of the programming language that ranks #1 on the TIOBE index as of this year?",
-        "answer": "Python is currently #1 on TIOBE. The latest stable version should be verified via the official python.org site.",
-        "difficulty": "medium",
-        "hops": 2,
-    },
-    {
-        "question": "How many employees does the parent company of Instagram have?",
-        "answer": "Meta Platforms (parent of Instagram) employs approximately 70,000+ people as of recent reports.",
-        "difficulty": "medium",
-        "hops": 2,
-    },
-    {
-        "question": "What is the current interest rate set by the central bank of the country where the Eiffel Tower is located?",
-        "answer": "The European Central Bank sets rates for France/eurozone. The current rate should be verified — it has changed frequently in 2023-2025.",
-        "difficulty": "hard",
-        "hops": 2,
-    },
-    {
-        "question": "Which company acquired the startup founded by the creator of Oculus VR?",
-        "answer": "Palmer Luckey founded Oculus VR, which was acquired by Facebook (now Meta). He later founded Anduril Industries.",
-        "difficulty": "medium",
-        "hops": 2,
-    },
-    {
-        "question": "What is the market cap of the company that owns the most popular search engine in Russia?",
-        "answer": "Yandex (now split into separate entities after 2024 restructuring). Current market cap should be verified via financial sources.",
-        "difficulty": "hard",
-        "hops": 2,
-    },
-    {
-        "question": "What was the GDP growth rate of the country that hosted the most recent Summer Olympics?",
-        "answer": "Paris, France hosted the 2024 Summer Olympics. France's recent GDP growth should be verified via World Bank or IMF data.",
-        "difficulty": "hard",
-        "hops": 2,
-    },
-]
-
-
-# ---------------------------------------------------------------------------
-# Configuration
-# ---------------------------------------------------------------------------
-
-class WebResearchEnvConfig(HermesAgentEnvConfig):
-    """Configuration for the web research RL environment."""
-
-    # Reward weights
-    correctness_weight: float = Field(
-        default=0.6,
-        description="Weight for answer correctness in reward (LLM judge score).",
-    )
-    tool_usage_weight: float = Field(
-        default=0.2,
-        description="Weight for tool usage signal (did the model actually use web tools?).",
-    )
-    efficiency_weight: float = Field(
-        default=0.2,
-        description="Weight for efficiency signal (penalizes excessive tool calls).",
-    )
-    diversity_bonus: float = Field(
-        default=0.1,
-        description="Bonus reward for citing ≥2 distinct domains.",
-    )
-
-    # Efficiency thresholds
-    efficient_max_calls: int = Field(
-        default=5,
-        description="Maximum tool calls before efficiency penalty begins.",
-    )
-    heavy_penalty_calls: int = Field(
-        default=10,
-        description="Tool call count where efficiency penalty steepens.",
-    )
-
-    # Eval
-    eval_size: int = Field(
-        default=20,
-        description="Number of held-out items for evaluation.",
-    )
-    eval_split_ratio: float = Field(
-        default=0.1,
-        description="Fraction of dataset to hold out for evaluation (0.0–1.0).",
-    )
-
-    # Dataset
-    dataset_name: str = Field(
-        default="google/frames-benchmark",
-        description="HuggingFace dataset name for research questions.",
-    )
-
-
-# ---------------------------------------------------------------------------
-# Environment
-# ---------------------------------------------------------------------------
-
-class WebResearchEnv(HermesAgentBaseEnv):
-    """
-    RL environment for training multi-step web research skills.
-
-    The model is given a factual question requiring 2-3 hops of web research
-    and must use web_search / web_extract tools to find and synthesize the answer.
-
-    Reward is multi-signal:
-      60% — answer correctness (LLM judge)
-      20% — tool usage (did the model actually search the web?)
-      20% — efficiency (penalizes >5 tool calls)
-
-    Bonus +0.1 for source diversity (≥2 distinct domains cited).
-    """
-
-    name = "web-research"
-    env_config_cls = WebResearchEnvConfig
-
-    # Default toolsets for this environment — web + file for saving notes
-    default_toolsets = ["web", "file"]
-
-    @classmethod
-    def config_init(cls) -> Tuple[WebResearchEnvConfig, List[APIServerConfig]]:
-        """Default configuration for the web research environment."""
-        env_config = WebResearchEnvConfig(
-            enabled_toolsets=["web", "file"],
-            max_agent_turns=15,
-            agent_temperature=1.0,
-            system_prompt=(
-                "You are a highly capable research agent. When asked a factual question, "
-                "always use web_search to find current, accurate information before answering. "
-                "Cite at least 2 sources. Be concise and accurate."
-            ),
-            group_size=4,
-            total_steps=1000,
-            steps_per_eval=100,
-            use_wandb=True,
-            wandb_name="web-research",
-        )
-
-        server_configs = [
-            APIServerConfig(
-                base_url="https://openrouter.ai/api/v1",
-                model_name="anthropic/claude-sonnet-4.5",
-                server_type="openai",
-                api_key=os.getenv("OPENROUTER_API_KEY", ""),
-                health_check=False,
-            )
-        ]
-
-        return env_config, server_configs
-
-    def __init__(self, *args, **kwargs):
-        super().__init__(*args, **kwargs)
-        self._items: list[dict] = []
-        self._eval_items: list[dict] = []
-        self._index: int = 0
-
-        # Metrics tracking for wandb
-        self._reward_buffer: list[float] = []
-        self._correctness_buffer: list[float] = []
-        self._tool_usage_buffer: list[float] = []
-        self._efficiency_buffer: list[float] = []
-        self._diversity_buffer: list[float] = []
-
-    # ------------------------------------------------------------------
-    # 1. Setup — load dataset
-    # ------------------------------------------------------------------
-
-    async def setup(self) -> None:
-        """Load the FRAMES benchmark or fall back to built-in samples."""
-        if HF_AVAILABLE:
-            try:
-                logger.info("Loading FRAMES benchmark from HuggingFace...")
-                ds = load_dataset(self.config.dataset_name, split="test")
-                self._items = [
-                    {
-                        "question": row["Prompt"],
-                        "answer": row["Answer"],
-                        "difficulty": row.get("reasoning_types", "unknown"),
-                        "hops": 2,
-                    }
-                    for row in ds
-                ]
-                # Hold out for eval
-                eval_size = max(
-                    self.config.eval_size,
-                    int(len(self._items) * self.config.eval_split_ratio),
-                )
-                random.shuffle(self._items)
-                self._eval_items = self._items[:eval_size]
-                self._items = self._items[eval_size:]
-                logger.info(
-                    f"Loaded {len(self._items)} train / {len(self._eval_items)} eval items "
-                    f"from FRAMES benchmark."
-                )
-                return
-            except Exception as e:
-                logger.warning(f"Could not load FRAMES from HuggingFace: {e}. Using built-in samples.")
-
-        # Fallback
-        random.shuffle(SAMPLE_QUESTIONS)
-        split = max(1, len(SAMPLE_QUESTIONS) * 8 // 10)
-        self._items = SAMPLE_QUESTIONS[:split]
-        self._eval_items = SAMPLE_QUESTIONS[split:]
-        logger.info(
-            f"Using built-in sample dataset: {len(self._items)} train / "
-            f"{len(self._eval_items)} eval items."
-        )
-
-    # ------------------------------------------------------------------
-    # 2. get_next_item — return the next question
-    # ------------------------------------------------------------------
-
-    async def get_next_item(self) -> dict:
-        """Return the next item, cycling through the dataset."""
-        if not self._items:
-            raise RuntimeError("Dataset is empty. Did you call setup()?")
-        item = self._items[self._index % len(self._items)]
-        self._index += 1
-        return item
-
-    # ------------------------------------------------------------------
-    # 3. format_prompt — build the user-facing prompt
-    # ------------------------------------------------------------------
-
-    def format_prompt(self, item: dict) -> str:
-        """Format the research question as a task prompt."""
-        return (
-            f"Research the following question thoroughly using web search. "
-            f"You MUST search the web to find current, accurate information — "
-            f"do not rely solely on your training data.\n\n"
-            f"Question: {item['question']}\n\n"
-            f"Requirements:\n"
-            f"- Use web_search and/or web_extract tools to find information\n"
-            f"- Search at least 2 different sources\n"
-            f"- Provide a concise, accurate answer (2-4 sentences)\n"
-            f"- Cite the sources you used"
-        )
-
-    # ------------------------------------------------------------------
-    # 4. compute_reward — multi-signal scoring
-    # ------------------------------------------------------------------
-
-    async def compute_reward(
-        self,
-        item: dict,
-        result: AgentResult,
-        ctx: ToolContext,
-    ) -> float:
-        """
-        Multi-signal reward function:
-
-          correctness_weight * correctness  — LLM judge comparing answer to ground truth
-          tool_usage_weight  * tool_used    — binary: did the model use web tools?
-          efficiency_weight  * efficiency   — penalizes wasteful tool usage
-          + diversity_bonus                 — source diversity (≥2 distinct domains)
-        """
-        # Extract final response from messages (last assistant message with content)
-        final_response = ""
-        tools_used: list[str] = []
-        for msg in reversed(result.messages):
-            if msg.get("role") == "assistant" and msg.get("content") and not final_response:
-                final_response = msg["content"]
-            # Collect tool names from tool call messages
-            if msg.get("role") == "assistant" and msg.get("tool_calls"):
-                for tc in msg["tool_calls"]:
-                    fn = tc.get("function", {}) if isinstance(tc, dict) else {}
-                    name = fn.get("name", "")
-                    if name:
-                        tools_used.append(name)
-        tool_call_count: int = result.turns_used or len(tools_used)
-
-        cfg = self.config
-
-        # ---- Signal 1: Answer correctness (LLM judge) ----------------
-        correctness = await self._llm_judge(
-            question=item["question"],
-            expected=item["answer"],
-            model_answer=final_response,
-        )
-
-        # ---- Signal 2: Web tool usage --------------------------------
-        web_tools = {"web_search", "web_extract", "search", "firecrawl"}
-        tool_used = 1.0 if any(t in web_tools for t in tools_used) else 0.0
-
-        # ---- Signal 3: Efficiency ------------------------------------
-        if tool_call_count <= cfg.efficient_max_calls:
-            efficiency = 1.0
-        elif tool_call_count <= cfg.heavy_penalty_calls:
-            efficiency = 1.0 - (tool_call_count - cfg.efficient_max_calls) * 0.08
-        else:
-            efficiency = max(0.0, 1.0 - (tool_call_count - cfg.efficient_max_calls) * 0.12)
-
-        # ---- Bonus: Source diversity ---------------------------------
-        domains = self._extract_domains(final_response)
-        diversity = cfg.diversity_bonus if len(domains) >= 2 else 0.0
-
-        # ---- Combine ------------------------------------------------
-        reward = (
-            cfg.correctness_weight * correctness
-            + cfg.tool_usage_weight * tool_used
-            + cfg.efficiency_weight * efficiency
-            + diversity
-        )
-        reward = min(1.0, max(0.0, reward))  # clamp to [0, 1]
-
-        # Track for wandb
-        self._reward_buffer.append(reward)
-        self._correctness_buffer.append(correctness)
-        self._tool_usage_buffer.append(tool_used)
-        self._efficiency_buffer.append(efficiency)
-        self._diversity_buffer.append(diversity)
-
-        logger.debug(
-            f"Reward breakdown — correctness={correctness:.2f}, "
-            f"tool_used={tool_used:.1f}, efficiency={efficiency:.2f}, "
-            f"diversity={diversity:.1f} → total={reward:.3f}"
-        )
-
-        return reward
-
-    # ------------------------------------------------------------------
-    # 5. evaluate — run on held-out eval split
-    # ------------------------------------------------------------------
-
-    async def evaluate(self, *args, **kwargs) -> None:
-        """Run evaluation on the held-out split using the full agent loop with tools.
-
-        Each eval item runs through the same agent loop as training —
-        the model can use web_search, web_extract, etc. to research answers.
-        This measures actual agentic research capability, not just knowledge.
-        """
-        import time
-        import uuid
-        from environments.agent_loop import HermesAgentLoop
-        from environments.tool_context import ToolContext
-
-        items = self._eval_items
-        if not items:
-            logger.warning("No eval items available.")
-            return
-
-        eval_size = min(self.config.eval_size, len(items))
-        eval_items = items[:eval_size]
-
-        logger.info(f"Running eval on {len(eval_items)} questions (with agent loop + tools)...")
-        start_time = time.time()
-        samples = []
-
-        # Resolve tools once for all eval items
-        tools, valid_names = self._resolve_tools_for_group()
-
-        for i, item in enumerate(eval_items):
-            task_id = str(uuid.uuid4())
-            logger.info(f"Eval [{i+1}/{len(eval_items)}]: {item['question'][:80]}...")
-
-            try:
-                # Build messages
-                messages: List[Dict[str, Any]] = []
-                if self.config.system_prompt:
-                    messages.append({"role": "system", "content": self.config.system_prompt})
-                messages.append({"role": "user", "content": self.format_prompt(item)})
-
-                # Run the full agent loop with tools
-                agent = HermesAgentLoop(
-                    server=self.server,
-                    tool_schemas=tools,
-                    valid_tool_names=valid_names,
-                    max_turns=self.config.max_agent_turns,
-                    task_id=task_id,
-                    temperature=0.0,  # Deterministic for eval
-                    max_tokens=self.config.max_token_length,
-                    extra_body=self.config.extra_body,
-                    budget_config=self.config.build_budget_config(),
-                )
-                result = await agent.run(messages)
-
-                # Extract final response and tool usage from messages
-                final_response = ""
-                tool_call_count = 0
-                for msg in reversed(result.messages):
-                    if msg.get("role") == "assistant" and msg.get("content") and not final_response:
-                        final_response = msg["content"]
-                    if msg.get("role") == "assistant" and msg.get("tool_calls"):
-                        tool_call_count += len(msg["tool_calls"])
-
-                # Compute reward (includes LLM judge for correctness)
-                # Temporarily save buffer lengths so we can extract the
-                # correctness score without calling judge twice, and avoid
-                # polluting training metric buffers with eval data.
-                buf_len = len(self._correctness_buffer)
-                ctx = ToolContext(task_id)
-                try:
-                    reward = await self.compute_reward(item, result, ctx)
-                finally:
-                    ctx.cleanup()
-
-                # Extract correctness from the buffer (compute_reward appended it)
-                # then remove eval entries from training buffers
-                correctness = (
-                    self._correctness_buffer[buf_len]
-                    if len(self._correctness_buffer) > buf_len
-                    else 0.0
-                )
-                # Roll back buffers to avoid polluting training metrics
-                for buf in (
-                    self._reward_buffer, self._correctness_buffer,
-                    self._tool_usage_buffer, self._efficiency_buffer,
-                    self._diversity_buffer,
-                ):
-                    if len(buf) > buf_len:
-                        buf.pop()
-
-                samples.append({
-                    "prompt": item["question"],
-                    "response": final_response[:500],
-                    "expected": item["answer"],
-                    "correctness": correctness,
-                    "reward": reward,
-                    "tool_calls": tool_call_count,
-                    "turns": result.turns_used,
-                })
-
-                logger.info(
-                    f"  → correctness={correctness:.2f}, reward={reward:.3f}, "
-                    f"tools={tool_call_count}, turns={result.turns_used}"
-                )
-
-            except Exception as e:
-                logger.error(f"Eval error on item: {e}")
-                samples.append({
-                    "prompt": item["question"],
-                    "response": f"ERROR: {e}",
-                    "expected": item["answer"],
-                    "correctness": 0.0,
-                    "reward": 0.0,
-                    "tool_calls": 0,
-                    "turns": 0,
-                })
-
-        end_time = time.time()
-
-        # Compute aggregate metrics
-        correctness_scores = [s["correctness"] for s in samples]
-        rewards = [s["reward"] for s in samples]
-        tool_counts = [s["tool_calls"] for s in samples]
-        n = len(samples)
-
-        eval_metrics = {
-            "eval/mean_correctness": sum(correctness_scores) / n if n else 0.0,
-            "eval/mean_reward": sum(rewards) / n if n else 0.0,
-            "eval/mean_tool_calls": sum(tool_counts) / n if n else 0.0,
-            "eval/tool_usage_rate": sum(1 for t in tool_counts if t > 0) / n if n else 0.0,
-            "eval/n_items": n,
-        }
-
-        logger.info(
-            f"Eval complete — correctness={eval_metrics['eval/mean_correctness']:.3f}, "
-            f"reward={eval_metrics['eval/mean_reward']:.3f}, "
-            f"tool_usage={eval_metrics['eval/tool_usage_rate']:.0%}"
-        )
-
-        await self.evaluate_log(
-            metrics=eval_metrics,
-            samples=samples,
-            start_time=start_time,
-            end_time=end_time,
-        )
-
-    # ------------------------------------------------------------------
-    # 6. wandb_log — custom metrics
-    # ------------------------------------------------------------------
-
-    async def wandb_log(self, wandb_metrics: Optional[Dict] = None) -> None:
-        """Log reward breakdown metrics to wandb."""
-        if wandb_metrics is None:
-            wandb_metrics = {}
-
-        if self._reward_buffer:
-            n = len(self._reward_buffer)
-            wandb_metrics["train/mean_reward"] = sum(self._reward_buffer) / n
-            wandb_metrics["train/mean_correctness"] = sum(self._correctness_buffer) / n
-            wandb_metrics["train/mean_tool_usage"] = sum(self._tool_usage_buffer) / n
-            wandb_metrics["train/mean_efficiency"] = sum(self._efficiency_buffer) / n
-            wandb_metrics["train/mean_diversity"] = sum(self._diversity_buffer) / n
-            wandb_metrics["train/total_rollouts"] = n
-
-            # Accuracy buckets
-            wandb_metrics["train/correct_rate"] = (
-                sum(1 for c in self._correctness_buffer if c >= 0.7) / n
-            )
-            wandb_metrics["train/tool_usage_rate"] = (
-                sum(1 for t in self._tool_usage_buffer if t > 0) / n
-            )
-
-            # Clear buffers
-            self._reward_buffer.clear()
-            self._correctness_buffer.clear()
-            self._tool_usage_buffer.clear()
-            self._efficiency_buffer.clear()
-            self._diversity_buffer.clear()
-
-        await super().wandb_log(wandb_metrics)
-
-    # ------------------------------------------------------------------
-    # Private helpers
-    # ------------------------------------------------------------------
-
-    async def _llm_judge(
-        self,
-        question: str,
-        expected: str,
-        model_answer: str,
-    ) -> float:
-        """
-        Use the server's LLM to judge answer correctness.
-        Falls back to keyword heuristic if LLM call fails.
-        """
-        if not model_answer or not model_answer.strip():
-            return 0.0
-
-        judge_prompt = (
-            "You are an impartial judge evaluating the quality of an AI research answer.\n\n"
-            f"Question: {question}\n\n"
-            f"Reference answer: {expected}\n\n"
-            f"Model answer: {model_answer}\n\n"
-            "Score the model answer on a scale from 0.0 to 1.0 where:\n"
-            "  1.0 = fully correct and complete\n"
-            "  0.7 = mostly correct with minor gaps\n"
-            "  0.4 = partially correct\n"
-            "  0.1 = mentions relevant topic but wrong or very incomplete\n"
-            "  0.0 = completely wrong or no answer\n\n"
-            "Consider: factual accuracy, completeness, and relevance.\n"
-            'Respond with ONLY a JSON object: {"score": <float>, "reason": "<one sentence>"}'
-        )
-
-        try:
-            response = await self.server.chat_completion(
-                messages=[{"role": "user", "content": judge_prompt}],
-                n=1,
-                max_tokens=150,
-                temperature=0.0,
-                split="eval",
-            )
-            text = response.choices[0].message.content if response.choices else ""
-            parsed = self._parse_judge_json(text)
-            if parsed is not None:
-                return float(parsed)
-        except Exception as e:
-            logger.debug(f"LLM judge failed: {e}. Using heuristic.")
-
-        return self._heuristic_score(expected, model_answer)
-
-    @staticmethod
-    def _parse_judge_json(text: str) -> Optional[float]:
-        """Extract the score float from LLM judge JSON response."""
-        try:
-            clean = re.sub(r"```(?:json)?|```", "", text).strip()
-            data = json.loads(clean)
-            score = float(data.get("score", -1))
-            if 0.0 <= score <= 1.0:
-                return score
-        except Exception:
-            match = re.search(r'"score"\s*:\s*([0-9.]+)', text)
-            if match:
-                score = float(match.group(1))
-                if 0.0 <= score <= 1.0:
-                    return score
-        return None
-
-    @staticmethod
-    def _heuristic_score(expected: str, model_answer: str) -> float:
-        """Lightweight keyword overlap score as fallback."""
-        stopwords = {
-            "the", "a", "an", "is", "are", "was", "were", "of", "in", "on",
-            "at", "to", "for", "with", "and", "or", "but", "it", "its",
-            "this", "that", "as", "by", "from", "be", "has", "have", "had",
-        }
-
-        def tokenize(text: str) -> set:
-            tokens = re.findall(r'\b\w+\b', text.lower())
-            return {t for t in tokens if t not in stopwords and len(t) > 2}
-
-        expected_tokens = tokenize(expected)
-        answer_tokens = tokenize(model_answer)
-
-        if not expected_tokens:
-            return 0.5
-
-        overlap = len(expected_tokens & answer_tokens)
-        union = len(expected_tokens | answer_tokens)
-
-        jaccard = overlap / union if union > 0 else 0.0
-        recall = overlap / len(expected_tokens)
-        return min(1.0, 0.4 * jaccard + 0.6 * recall)
-
-    @staticmethod
-    def _extract_domains(text: str) -> set:
-        """Extract unique domains from URLs cited in the response."""
-        urls = re.findall(r'https?://[^\s\)>\]"\']+', text)
-        domains = set()
-        for url in urls:
-            try:
-                parsed = urlparse(url)
-                domain = parsed.netloc.lower().lstrip("www.")
-                if domain:
-                    domains.add(domain)
-            except Exception:
-                pass
-        return domains
-
-
-# ---------------------------------------------------------------------------
-# Entry point
-# ---------------------------------------------------------------------------
-
-if __name__ == "__main__":
-    WebResearchEnv.cli()
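
Review note on `_extract_domains` in the removed file above: `str.lstrip("www.")` strips any leading run of the characters `w` and `.`, not the literal prefix `www.`, so a host such as `web.example.com` would lose its leading `w`. The following is a minimal sketch of prefix-based stripping, assuming Python 3.9+ for `str.removeprefix`. The function name `extract_domains` is hypothetical and is not part of the repository.

```python
import re
from urllib.parse import urlparse


def extract_domains(text: str) -> set[str]:
    """Collect unique hostnames from URLs in text, dropping one leading 'www.'."""
    domains: set[str] = set()
    # Same URL pattern as the removed helper.
    for url in re.findall(r'https?://[^\s\)>\]"\']+', text):
        host = urlparse(url).netloc.lower()
        # removeprefix drops only the literal prefix; lstrip("www.") would
        # also eat the leading "w" of hosts such as "web.example.com".
        host = host.removeprefix("www.")
        if host:
            domains.add(host)
    return domains
```

With this version, `www.example.com` and `example.com` count as one domain for the diversity bonus, while `web.example.org` keeps its full name.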
--- a/flake.lock
+++ b/flake.lock
@@ -1,202 +0,0 @@
-{
-  "nodes": {
-    "flake-parts": {
-      "inputs": {
-        "nixpkgs-lib": [
-          "nixpkgs"
-        ]
-      },
-      "locked": {
-        "lastModified": 1772408722,
-        "narHash": "sha256-rHuJtdcOjK7rAHpHphUb1iCvgkU3GpfvicLMwwnfMT0=",
-        "owner": "hercules-ci",
-        "repo": "flake-parts",
-        "rev": "f20dc5d9b8027381c474144ecabc9034d6a839a3",
-        "type": "github"
-      },
-      "original": {
-        "owner": "hercules-ci",
-        "repo": "flake-parts",
-        "type": "github"
-      }
-    },
-    "nixpkgs": {
-      "locked": {
-        "lastModified": 1775036866,
-        "narHash": "sha256-ZojAnPuCdy657PbTq5V0Y+AHKhZAIwSIT2cb8UgAz/U=",
-        "owner": "NixOS",
-        "repo": "nixpkgs",
-        "rev": "6201e203d09599479a3b3450ed24fa81537ebc4e",
-        "type": "github"
-      },
-      "original": {
-        "owner": "NixOS",
-        "ref": "nixos-unstable",
-        "repo": "nixpkgs",
-        "type": "github"
-      }
-    },
-    "npm-lockfile-fix": {
-      "inputs": {
-        "nixpkgs": [
-          "nixpkgs"
-        ]
-      },
-      "locked": {
-        "lastModified": 1775903712,
-        "narHash": "sha256-2GV79U6iVH4gKAPWYrxUReB0S41ty/Y3dBLquU8AlaA=",
-        "owner": "jeslie0",
-        "repo": "npm-lockfile-fix",
-        "rev": "c6093acb0c0548e0f9b8b3d82918823721930fe8",
-        "type": "github"
-      },
-      "original": {
-        "owner": "jeslie0",
-        "repo": "npm-lockfile-fix",
-        "type": "github"
-      }
-    },
-    "pyproject-build-systems": {
-      "inputs": {
-        "nixpkgs": [
-          "nixpkgs"
-        ],
-        "pyproject-nix": "pyproject-nix",
-        "uv2nix": "uv2nix"
-      },
-      "locked": {
-        "lastModified": 1772555609,
-        "narHash": "sha256-3BA3HnUvJSbHJAlJj6XSy0Jmu7RyP2gyB/0fL7XuEDo=",
-        "owner": "pyproject-nix",
-        "repo": "build-system-pkgs",
-        "rev": "c37f66a953535c394244888598947679af231863",
-        "type": "github"
-      },
-      "original": {
-        "owner": "pyproject-nix",
-        "repo": "build-system-pkgs",
-        "type": "github"
-      }
-    },
-    "pyproject-nix": {
-      "inputs": {
-        "nixpkgs": [
-          "pyproject-build-systems",
-          "nixpkgs"
-        ]
-      },
-      "locked": {
-        "lastModified": 1769936401,
-        "narHash": "sha256-kwCOegKLZJM9v/e/7cqwg1p/YjjTAukKPqmxKnAZRgA=",
-        "owner": "nix-community",
-        "repo": "pyproject.nix",
-        "rev": "b0d513eeeebed6d45b4f2e874f9afba2021f7812",
-        "type": "github"
-      },
-      "original": {
-        "owner": "nix-community",
-        "repo": "pyproject.nix",
-        "type": "github"
-      }
-    },
-    "pyproject-nix_2": {
-      "inputs": {
-        "nixpkgs": [
-          "nixpkgs"
-        ]
-      },
-      "locked": {
-        "lastModified": 1772865871,
-        "narHash": "sha256-/ZTSg97aouL0SlPHaokA4r3iuH9QzHVuWPACD2CUCFY=",
-        "owner": "pyproject-nix",
-        "repo": "pyproject.nix",
-        "rev": "e537db02e72d553cea470976b9733581bcf5b3ed",
-        "type": "github"
-      },
-      "original": {
-        "owner": "pyproject-nix",
-        "repo": "pyproject.nix",
-        "type": "github"
-      }
-    },
-    "pyproject-nix_3": {
-      "inputs": {
-        "nixpkgs": [
-          "uv2nix",
-          "nixpkgs"
-        ]
-      },
-      "locked": {
-        "lastModified": 1771518446,
-        "narHash": "sha256-nFJSfD89vWTu92KyuJWDoTQJuoDuddkJV3TlOl1cOic=",
-        "owner": "pyproject-nix",
-        "repo": "pyproject.nix",
-        "rev": "eb204c6b3335698dec6c7fc1da0ebc3c6df05937",
-        "type": "github"
-      },
-      "original": {
-        "owner": "pyproject-nix",
-        "repo": "pyproject.nix",
-        "type": "github"
-      }
-    },
-    "root": {
-      "inputs": {
-        "flake-parts": "flake-parts",
-        "nixpkgs": "nixpkgs",
-        "npm-lockfile-fix": "npm-lockfile-fix",
-        "pyproject-build-systems": "pyproject-build-systems",
-        "pyproject-nix": "pyproject-nix_2",
-        "uv2nix": "uv2nix_2"
-      }
-    },
-    "uv2nix": {
-      "inputs": {
-        "nixpkgs": [
-          "pyproject-build-systems",
-          "nixpkgs"
-        ],
-        "pyproject-nix": [
-          "pyproject-build-systems",
-          "pyproject-nix"
-        ]
-      },
-      "locked": {
-        "lastModified": 1770770348,
-        "narHash": "sha256-A2GzkmzdYvdgmMEu5yxW+xhossP+txrYb7RuzRaqhlg=",
-        "owner": "pyproject-nix",
-        "repo": "uv2nix",
-        "rev": "5d1b2cb4fe3158043fbafbbe2e46238abbc954b0",
-        "type": "github"
-      },
-      "original": {
-        "owner": "pyproject-nix",
-        "repo": "uv2nix",
-        "type": "github"
-      }
-    },
-    "uv2nix_2": {
-      "inputs": {
-        "nixpkgs": [
-          "nixpkgs"
-        ],
-        "pyproject-nix": "pyproject-nix_3"
-      },
-      "locked": {
-        "lastModified": 1773039484,
-        "narHash": "sha256-+boo33KYkJDw9KItpeEXXv8+65f7hHv/earxpcyzQ0I=",
-        "owner": "pyproject-nix",
-        "repo": "uv2nix",
-        "rev": "b68be7cfeacbed9a3fa38a2b5adc0cfb81d9bb1f",
-        "type": "github"
-      },
-      "original": {
-        "owner": "pyproject-nix",
-        "repo": "uv2nix",
-        "type": "github"
-      }
-    }
-  },
-  "root": "root",
-  "version": 7
-}
--- a/flake.nix
+++ b/flake.nix
@@ -1,44 +0,0 @@
-{
-  description = "Hermes Agent - AI agent framework by Nous Research";
-
-  inputs = {
-    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
-    flake-parts = {
-      url = "github:hercules-ci/flake-parts";
-      inputs.nixpkgs-lib.follows = "nixpkgs";
-    };
-    pyproject-nix = {
-      url = "github:pyproject-nix/pyproject.nix";
-      inputs.nixpkgs.follows = "nixpkgs";
-    };
-    uv2nix = {
-      url = "github:pyproject-nix/uv2nix";
-      inputs.nixpkgs.follows = "nixpkgs";
-    };
-    pyproject-build-systems = {
-      url = "github:pyproject-nix/build-system-pkgs";
-      inputs.nixpkgs.follows = "nixpkgs";
-    };
-    npm-lockfile-fix = {
-      url = "github:jeslie0/npm-lockfile-fix";
-      inputs.nixpkgs.follows = "nixpkgs";
-    };
-  };
-
-  outputs =
-    inputs:
-    inputs.flake-parts.lib.mkFlake { inherit inputs; } {
-      systems = [
-        "x86_64-linux"
-        "aarch64-linux"
-        "aarch64-darwin"
-      ];
-
-      imports = [
-        ./nix/packages.nix
-        ./nix/nixosModules.nix
-        ./nix/checks.nix
-        ./nix/devShell.nix
-      ];
-    };
-}
--- a/hermes_agent/gateway/__init__.py
+++ b/hermes_agent/gateway/__init__.py
--- a/hermes_agent/gateway/channel_directory.py
+++ b/hermes_agent/gateway/channel_directory.py
@@ -9,48 +9,12 @@ action="list" and for resolving human-friendly channel names to numeric IDs.
 import json
 import logging
 from datetime import datetime
+from pathlib import Path
 from typing import Any, Dict, List, Optional

-from hermes_agent.cli.config import get_hermes_home
-from hermes_agent.utils import atomic_json_write
-
 logger = logging.getLogger(__name__)

-DIRECTORY_PATH = get_hermes_home() / "channel_directory.json"
-
-
-def _normalize_channel_query(value: str) -> str:
-    return value.lstrip("#").strip().lower()
-
-
-def _channel_target_name(platform_name: str, channel: Dict[str, Any]) -> str:
-    """Return the human-facing target label shown to users for a channel entry."""
-    name = channel["name"]
-    if platform_name == "discord" and channel.get("guild"):
-        return f"#{name}"
-    if platform_name != "discord" and channel.get("type"):
-        return f"{name} ({channel['type']})"
-    return name
-
-
-def _session_entry_id(origin: Dict[str, Any]) -> Optional[str]:
-    chat_id = origin.get("chat_id")
-    if not chat_id:
-        return None
-    thread_id = origin.get("thread_id")
-    if thread_id:
-        return f"{chat_id}:{thread_id}"
-    return str(chat_id)
-
-
-def _session_entry_name(origin: Dict[str, Any]) -> str:
-    base_name = origin.get("chat_name") or origin.get("user_name") or str(origin.get("chat_id"))
-    thread_id = origin.get("thread_id")
-    if not thread_id:
-        return base_name
-
-    topic_label = origin.get("chat_topic") or f"topic {thread_id}"
-    return f"{base_name} / {topic_label}"
+DIRECTORY_PATH = Path.home() / ".hermes" / "channel_directory.json"


 # ---------------------------------------------------------------------------
@@ -63,7 +27,7 @@ def build_channel_directory(adapters: Dict[Any, Any]) -> Dict[str, Any]:

    Returns the directory dict and writes it to DIRECTORY_PATH.
    """
-    from hermes_agent.gateway.config import Platform
+    from gateway.config import Platform

    platforms: Dict[str, List[Dict[str, str]]] = {}

@@ -76,15 +40,10 @@ def build_channel_directory(adapters: Dict[Any, Any]) -> Dict[str, Any]:
        except Exception as e:
            logger.warning("Channel directory: failed to build %s: %s", platform.value, e)

-    # Platforms that don't support direct channel enumeration get session-based
-    # discovery automatically.  Skip infrastructure entries that aren't messaging
-    # platforms — everything else falls through to _build_from_sessions().
-    _SKIP_SESSION_DISCOVERY = frozenset({"local", "api_server", "webhook"})
-    for plat in Platform:
-        plat_name = plat.value
-        if plat_name in _SKIP_SESSION_DISCOVERY or plat_name in platforms:
-            continue
-        platforms[plat_name] = _build_from_sessions(plat_name)
+    # Telegram, WhatsApp & Signal can't enumerate chats -- pull from session history
+    for plat_name in ("telegram", "whatsapp", "signal"):
+        if plat_name not in platforms:
+            platforms[plat_name] = _build_from_sessions(plat_name)

    directory = {
        "updated_at": datetime.now().isoformat(),
@@ -92,7 +51,9 @@ def build_channel_directory(adapters: Dict[Any, Any]) -> Dict[str, Any]:
    }

    try:
-        atomic_json_write(DIRECTORY_PATH, directory)
+        DIRECTORY_PATH.parent.mkdir(parents=True, exist_ok=True)
+        with open(DIRECTORY_PATH, "w", encoding="utf-8") as f:
+            json.dump(directory, f, indent=2, ensure_ascii=False)
    except Exception as e:
        logger.warning("Channel directory: failed to write: %s", e)

@@ -100,14 +61,14 @@ def build_channel_directory(adapters: Dict[Any, Any]) -> Dict[str, Any]:


 def _build_discord(adapter) -> List[Dict[str, str]]:
-    """Enumerate all text channels and forum channels the Discord bot can see."""
+    """Enumerate all text channels the Discord bot can see."""
    channels = []
    client = getattr(adapter, "_client", None)
    if not client:
        return channels

    try:
-        import discord as _discord  # noqa: F401 — SDK presence check
+        import discord as _discord
    except ImportError:
        return channels

@@ -119,15 +80,6 @@ def _build_discord(adapter) -> List[Dict[str, str]]:
                "guild": guild.name,
                "type": "channel",
            })
-        # Forum channels (type 15) — creating a message auto-spawns a thread post.
-        forums = getattr(guild, "forum_channels", None) or []
-        for ch in forums:
-            channels.append({
-                "id": str(ch.id),
-                "name": ch.name,
-                "guild": guild.name,
-                "type": "forum",
-            })
        # Also include DM-capable users we've interacted with is not
        # feasible via guild enumeration; those come from sessions.

@@ -138,13 +90,15 @@ def _build_discord(adapter) -> List[Dict[str, str]]:

 def _build_slack(adapter) -> List[Dict[str, str]]:
    """List Slack channels the bot has joined."""
+    channels = []
    # Slack adapter may expose a web client
    client = getattr(adapter, "_app", None) or getattr(adapter, "_client", None)
    if not client:
        return _build_from_sessions("slack")

    try:
-        from hermes_agent.tools.send_message import _send_slack  # noqa: F401
+        import asyncio
+        from tools.send_message_tool import _send_slack  # noqa: F401
        # Use the Slack Web API directly if available
    except Exception:
        pass
@@ -155,7 +109,7 @@ def _build_slack(adapter) -> List[Dict[str, str]]:

 def _build_from_sessions(platform_name: str) -> List[Dict[str, str]]:
    """Pull known channels/contacts from sessions.json origin data."""
-    sessions_path = get_hermes_home() / "sessions" / "sessions.json"
+    sessions_path = Path.home() / ".hermes" / "sessions" / "sessions.json"
    if not sessions_path.exists():
        return []

@@ -169,15 +123,14 @@ def _build_from_sessions(platform_name: str) -> List[Dict[str, str]]:
            origin = session.get("origin") or {}
            if origin.get("platform") != platform_name:
                continue
-            entry_id = _session_entry_id(origin)
-            if not entry_id or entry_id in seen_ids:
+            chat_id = origin.get("chat_id")
+            if not chat_id or chat_id in seen_ids:
                continue
-            seen_ids.add(entry_id)
+            seen_ids.add(chat_id)
            entries.append({
-                "id": entry_id,
-                "name": _session_entry_name(origin),
+                "id": str(chat_id),
+                "name": origin.get("chat_name") or origin.get("user_name") or str(chat_id),
                "type": session.get("chat_type", "dm"),
-                "thread_id": origin.get("thread_id"),
            })
    except Exception as e:
        logger.debug("Channel directory: failed to read sessions for %s: %s", platform_name, e)
@@ -200,15 +153,6 @@ def load_directory() -> Dict[str, Any]:
        return {"updated_at": None, "platforms": {}}


-def lookup_channel_type(platform_name: str, chat_id: str) -> Optional[str]:
-    """Return the channel ``type`` string (e.g. ``"channel"``, ``"forum"``) for *chat_id*, or *None* if unknown."""
-    directory = load_directory()
-    for ch in directory.get("platforms", {}).get(platform_name, []):
-        if ch.get("id") == chat_id:
-            return ch.get("type")
-    return None
-
-
 def resolve_channel_name(platform_name: str, name: str) -> Optional[str]:
    """
    Resolve a human-friendly channel name to a numeric ID.
@@ -223,25 +167,23 @@ def resolve_channel_name(platform_name: str, name: str) -> Optional[str]:
    if not channels:
        return None

-    query = _normalize_channel_query(name)
+    query = name.lstrip("#").lower()

-    # 1. Exact name match, including the display labels shown by send_message(action="list")
+    # 1. Exact name match
    for ch in channels:
-        if _normalize_channel_query(ch["name"]) == query:
-            return ch["id"]
-        if _normalize_channel_query(_channel_target_name(platform_name, ch)) == query:
+        if ch["name"].lower() == query:
            return ch["id"]

    # 2. Guild-qualified match for Discord ("GuildName/channel")
    if "/" in query:
        guild_part, ch_part = query.rsplit("/", 1)
        for ch in channels:
-            guild = ch.get("guild", "").strip().lower()
-            if guild == guild_part and _normalize_channel_query(ch["name"]) == ch_part:
+            guild = ch.get("guild", "").lower()
+            if guild == guild_part and ch["name"].lower() == ch_part:
                return ch["id"]

    # 3. Partial prefix match (only if unambiguous)
-    matches = [ch for ch in channels if _normalize_channel_query(ch["name"]).startswith(query)]
+    matches = [ch for ch in channels if ch["name"].lower().startswith(query)]
    if len(matches) == 1:
        return matches[0]["id"]

@@ -276,16 +218,17 @@ def format_directory_for_display() -> str:
            for guild_name, guild_channels in sorted(guilds.items()):
                lines.append(f"Discord ({guild_name}):")
                for ch in sorted(guild_channels, key=lambda c: c["name"]):
-                    lines.append(f"  discord:{_channel_target_name(plat_name, ch)}")
+                    lines.append(f"  discord:#{ch['name']}")
            if dms:
                lines.append("Discord (DMs):")
                for ch in dms:
-                    lines.append(f"  discord:{_channel_target_name(plat_name, ch)}")
+                    lines.append(f"  discord:{ch['name']}")
            lines.append("")
        else:
            lines.append(f"{plat_name.title()}:")
            for ch in channels:
-                lines.append(f"  {plat_name}:{_channel_target_name(plat_name, ch)}")
+                type_label = f" ({ch['type']})" if ch.get("type") else ""
+                lines.append(f"  {plat_name}:{ch['name']}{type_label}")
            lines.append("")

    lines.append('Use these as the "target" parameter when sending.')
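Review note: after this change, resolve_channel_name() normalizes the query with a plain lstrip("#").lower() and tries three strategies in order. The sketch below restates that order as a standalone function so the behavior is easy to check. The name resolve_name, the explicit channels argument, and the final "return None" are assumptions made for illustration; the real function loads its channels from the directory file.

```python
from typing import Dict, List, Optional


def resolve_name(channels: List[Dict[str, str]], name: str) -> Optional[str]:
    """Hypothetical standalone restatement of the post-change matching order."""
    query = name.lstrip("#").lower()

    # 1. Exact (case-insensitive) name match
    for ch in channels:
        if ch["name"].lower() == query:
            return ch["id"]

    # 2. Guild-qualified match ("GuildName/channel")
    if "/" in query:
        guild_part, ch_part = query.rsplit("/", 1)
        for ch in channels:
            if ch.get("guild", "").lower() == guild_part and ch["name"].lower() == ch_part:
                return ch["id"]

    # 3. Prefix match, accepted only when exactly one channel matches
    matches = [ch for ch in channels if ch["name"].lower().startswith(query)]
    if len(matches) == 1:
        return matches[0]["id"]

    # Assumed fallback: nothing matched, or the prefix was ambiguous
    return None


channels = [
    {"id": "1", "name": "general", "guild": "Hermes"},
    {"id": "2", "name": "gen-dev", "guild": "Hermes"},
    {"id": "3", "name": "bots", "guild": "Hermes"},
]
```

With this sample data, "#general" resolves by exact match, "Hermes/bots" resolves through the guild-qualified branch, and "gen" resolves to nothing because two channels share that prefix.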
--- a/gateway/config.py
+++ b/gateway/config.py
@@ -0,0 +1,445 @@
+"""
+Gateway configuration management.
+
+Handles loading and validating configuration for:
+- Connected platforms (Telegram, Discord, WhatsApp)
+- Home channels for each platform
+- Session reset policies
+- Delivery preferences
+"""
+
+import logging
+import os
+import json
+from pathlib import Path
+from dataclasses import dataclass, field
+from typing import Dict, List, Optional, Any
+from enum import Enum
+
+logger = logging.getLogger(__name__)
+
+
+class Platform(Enum):
+    """Supported messaging platforms."""
+    LOCAL = "local"
+    TELEGRAM = "telegram"
+    DISCORD = "discord"
+    WHATSAPP = "whatsapp"
+    SLACK = "slack"
+    SIGNAL = "signal"
+    HOMEASSISTANT = "homeassistant"
+
+
+@dataclass
+class HomeChannel:
+    """
+    Default destination for a platform.
+    
+    When a cron job specifies deliver="telegram" without a specific chat ID,
+    messages are sent to this home channel.
+    """
+    platform: Platform
+    chat_id: str
+    name: str  # Human-readable name for display
+    
+    def to_dict(self) -> Dict[str, Any]:
+        return {
+            "platform": self.platform.value,
+            "chat_id": self.chat_id,
+            "name": self.name,
+        }
+    
+    @classmethod
+    def from_dict(cls, data: Dict[str, Any]) -> "HomeChannel":
+        return cls(
+            platform=Platform(data["platform"]),
+            chat_id=str(data["chat_id"]),
+            name=data.get("name", "Home"),
+        )
+
+
+@dataclass
+class SessionResetPolicy:
+    """
+    Controls when sessions reset (lose context).
+    
+    Modes:
+    - "daily": Reset at a specific hour each day
+    - "idle": Reset after N minutes of inactivity
+    - "both": Whichever triggers first (daily boundary OR idle timeout)
+    - "none": Never auto-reset (context managed only by compression)
+    """
+    mode: str = "both"  # "daily", "idle", "both", or "none"
+    at_hour: int = 4  # Hour for daily reset (0-23, local time)
+    idle_minutes: int = 1440  # Minutes of inactivity before reset (24 hours)
+    
+    def to_dict(self) -> Dict[str, Any]:
+        return {
+            "mode": self.mode,
+            "at_hour": self.at_hour,
+            "idle_minutes": self.idle_minutes,
+        }
+    
+    @classmethod
+    def from_dict(cls, data: Dict[str, Any]) -> "SessionResetPolicy":
+        return cls(
+            mode=data.get("mode", "both"),
+            at_hour=data.get("at_hour", 4),
+            idle_minutes=data.get("idle_minutes", 1440),
+        )
+
+
+@dataclass
+class PlatformConfig:
+    """Configuration for a single messaging platform."""
+    enabled: bool = False
+    token: Optional[str] = None  # Bot token (Telegram, Discord)
+    api_key: Optional[str] = None  # API key if different from token
+    home_channel: Optional[HomeChannel] = None
+    
+    # Platform-specific settings
+    extra: Dict[str, Any] = field(default_factory=dict)
+    
+    def to_dict(self) -> Dict[str, Any]:
+        result = {
+            "enabled": self.enabled,
+            "extra": self.extra,
+        }
+        if self.token:
+            result["token"] = self.token
+        if self.api_key:
+            result["api_key"] = self.api_key
+        if self.home_channel:
+            result["home_channel"] = self.home_channel.to_dict()
+        return result
+    
+    @classmethod
+    def from_dict(cls, data: Dict[str, Any]) -> "PlatformConfig":
+        home_channel = None
+        if "home_channel" in data:
+            home_channel = HomeChannel.from_dict(data["home_channel"])
+        
+        return cls(
+            enabled=data.get("enabled", False),
+            token=data.get("token"),
+            api_key=data.get("api_key"),
+            home_channel=home_channel,
+            extra=data.get("extra", {}),
+        )
+
+
+@dataclass
+class GatewayConfig:
+    """
+    Main gateway configuration.
+    
+    Manages all platform connections, session policies, and delivery settings.
+    """
+    # Platform configurations
+    platforms: Dict[Platform, PlatformConfig] = field(default_factory=dict)
+    
+    # Session reset policies by type
+    default_reset_policy: SessionResetPolicy = field(default_factory=SessionResetPolicy)
+    reset_by_type: Dict[str, SessionResetPolicy] = field(default_factory=dict)
+    reset_by_platform: Dict[Platform, SessionResetPolicy] = field(default_factory=dict)
+    
+    # Reset trigger commands
+    reset_triggers: List[str] = field(default_factory=lambda: ["/new", "/reset"])
+    
+    # Storage paths
+    sessions_dir: Path = field(default_factory=lambda: Path.home() / ".hermes" / "sessions")
+    
+    # Delivery settings
+    always_log_local: bool = True  # Always save cron outputs to local files
+    
+    def get_connected_platforms(self) -> List[Platform]:
+        """Return list of platforms that are enabled and configured."""
+        connected = []
+        for platform, config in self.platforms.items():
+            if not config.enabled:
+                continue
+            # Platforms that use token/api_key auth
+            if config.token or config.api_key:
+                connected.append(platform)
+            # WhatsApp uses enabled flag only (bridge handles auth)
+            elif platform == Platform.WHATSAPP:
+                connected.append(platform)
+            # Signal uses extra dict for config (http_url + account)
+            elif platform == Platform.SIGNAL and config.extra.get("http_url"):
+                connected.append(platform)
+        return connected
+    
+    def get_home_channel(self, platform: Platform) -> Optional[HomeChannel]:
+        """Get the home channel for a platform."""
+        config = self.platforms.get(platform)
+        if config:
+            return config.home_channel
+        return None
+    
+    def get_reset_policy(
+        self, 
+        platform: Optional[Platform] = None,
+        session_type: Optional[str] = None
+    ) -> SessionResetPolicy:
+        """
+        Get the appropriate reset policy for a session.
+        
+        Priority: platform override > type override > default
+        """
+        # Platform-specific override takes precedence
+        if platform and platform in self.reset_by_platform:
+            return self.reset_by_platform[platform]
+        
+        # Type-specific override (dm, group, thread)
+        if session_type and session_type in self.reset_by_type:
+            return self.reset_by_type[session_type]
+        
+        return self.default_reset_policy
+    
+    def to_dict(self) -> Dict[str, Any]:
+        return {
+            "platforms": {
+                p.value: c.to_dict() for p, c in self.platforms.items()
+            },
+            "default_reset_policy": self.default_reset_policy.to_dict(),
+            "reset_by_type": {
+                k: v.to_dict() for k, v in self.reset_by_type.items()
+            },
+            "reset_by_platform": {
+                p.value: v.to_dict() for p, v in self.reset_by_platform.items()
+            },
+            "reset_triggers": self.reset_triggers,
+            "sessions_dir": str(self.sessions_dir),
+            "always_log_local": self.always_log_local,
+        }
+    
+    @classmethod
+    def from_dict(cls, data: Dict[str, Any]) -> "GatewayConfig":
+        platforms = {}
+        for platform_name, platform_data in data.get("platforms", {}).items():
+            try:
+                platform = Platform(platform_name)
+                platforms[platform] = PlatformConfig.from_dict(platform_data)
+            except ValueError:
+                pass  # Skip unknown platforms
+        
+        reset_by_type = {}
+        for type_name, policy_data in data.get("reset_by_type", {}).items():
+            reset_by_type[type_name] = SessionResetPolicy.from_dict(policy_data)
+        
+        reset_by_platform = {}
+        for platform_name, policy_data in data.get("reset_by_platform", {}).items():
+            try:
+                platform = Platform(platform_name)
+                reset_by_platform[platform] = SessionResetPolicy.from_dict(policy_data)
+            except ValueError:
+                pass
+        
+        default_policy = SessionResetPolicy()
+        if "default_reset_policy" in data:
+            default_policy = SessionResetPolicy.from_dict(data["default_reset_policy"])
+        
+        sessions_dir = Path.home() / ".hermes" / "sessions"
+        if "sessions_dir" in data:
+            sessions_dir = Path(data["sessions_dir"])
+        
+        return cls(
+            platforms=platforms,
+            default_reset_policy=default_policy,
+            reset_by_type=reset_by_type,
+            reset_by_platform=reset_by_platform,
+            reset_triggers=data.get("reset_triggers", ["/new", "/reset"]),
+            sessions_dir=sessions_dir,
+            always_log_local=data.get("always_log_local", True),
+        )
+
+
+def load_gateway_config() -> GatewayConfig:
+    """
+    Load gateway configuration from multiple sources.
+    
+    Priority (highest to lowest):
+    1. Environment variables
+    2. ~/.hermes/config.yaml session_reset section (default reset policy only)
+    3. ~/.hermes/gateway.json
+    4. Defaults
+    """
+    config = GatewayConfig()
+    
+    # Try loading from ~/.hermes/gateway.json
+    gateway_config_path = Path.home() / ".hermes" / "gateway.json"
+    if gateway_config_path.exists():
+        try:
+            with open(gateway_config_path, "r") as f:
+                data = json.load(f)
+                config = GatewayConfig.from_dict(data)
+        except Exception as e:
+            logger.warning("Failed to load %s: %s", gateway_config_path, e)
+    
+    # Bridge session_reset from config.yaml (the user-facing config file)
+    # into the gateway config. config.yaml takes precedence over gateway.json
+    # for session reset policy since that's where hermes setup writes it.
+    try:
+        import yaml
+        config_yaml_path = Path.home() / ".hermes" / "config.yaml"
+        if config_yaml_path.exists():
+            with open(config_yaml_path) as f:
+                yaml_cfg = yaml.safe_load(f) or {}
+            sr = yaml_cfg.get("session_reset")
+            if sr and isinstance(sr, dict):
+                config.default_reset_policy = SessionResetPolicy.from_dict(sr)
+    except Exception as e:
+        logger.debug("Could not read session_reset from config.yaml: %s", e)
+
+    # Override with environment variables
+    _apply_env_overrides(config)
+    
+    # --- Validate loaded values ---
+    policy = config.default_reset_policy
+
+    if not (0 <= policy.at_hour <= 23):
+        logger.warning(
+            "Invalid at_hour=%s (must be 0-23). Using default 4.", policy.at_hour
+        )
+        policy.at_hour = 4
+
+    if policy.idle_minutes is None or policy.idle_minutes <= 0:
+        logger.warning(
+            "Invalid idle_minutes=%s (must be positive). Using default 1440.",
+            policy.idle_minutes,
+        )
+        policy.idle_minutes = 1440
+
+    # Warn about empty bot tokens — platforms that loaded an empty string
+    # won't connect and the cause can be confusing without a log line.
+    _token_env_names = {
+        Platform.TELEGRAM: "TELEGRAM_BOT_TOKEN",
+        Platform.DISCORD: "DISCORD_BOT_TOKEN",
+        Platform.SLACK: "SLACK_BOT_TOKEN",
+    }
+    for platform, pconfig in config.platforms.items():
+        if not pconfig.enabled:
+            continue
+        env_name = _token_env_names.get(platform)
+        if env_name and pconfig.token is not None and not pconfig.token.strip():
+            logger.warning(
+                "%s is enabled but %s is empty. "
+                "The adapter will likely fail to connect.",
+                platform.value, env_name,
+            )
+
+    return config
+
+
+def _apply_env_overrides(config: GatewayConfig) -> None:
+    """Apply environment variable overrides to config."""
+    
+    # Telegram
+    telegram_token = os.getenv("TELEGRAM_BOT_TOKEN")
+    if telegram_token:
+        if Platform.TELEGRAM not in config.platforms:
+            config.platforms[Platform.TELEGRAM] = PlatformConfig()
+        config.platforms[Platform.TELEGRAM].enabled = True
+        config.platforms[Platform.TELEGRAM].token = telegram_token
+    
+    telegram_home = os.getenv("TELEGRAM_HOME_CHANNEL")
+    if telegram_home and Platform.TELEGRAM in config.platforms:
+        config.platforms[Platform.TELEGRAM].home_channel = HomeChannel(
+            platform=Platform.TELEGRAM,
+            chat_id=telegram_home,
+            name=os.getenv("TELEGRAM_HOME_CHANNEL_NAME", "Home"),
+        )
+    
+    # Discord
+    discord_token = os.getenv("DISCORD_BOT_TOKEN")
+    if discord_token:
+        if Platform.DISCORD not in config.platforms:
+            config.platforms[Platform.DISCORD] = PlatformConfig()
+        config.platforms[Platform.DISCORD].enabled = True
+        config.platforms[Platform.DISCORD].token = discord_token
+    
+    discord_home = os.getenv("DISCORD_HOME_CHANNEL")
+    if discord_home and Platform.DISCORD in config.platforms:
+        config.platforms[Platform.DISCORD].home_channel = HomeChannel(
+            platform=Platform.DISCORD,
+            chat_id=discord_home,
+            name=os.getenv("DISCORD_HOME_CHANNEL_NAME", "Home"),
+        )
+    
+    # WhatsApp (typically uses different auth mechanism)
+    whatsapp_enabled = os.getenv("WHATSAPP_ENABLED", "").lower() in ("true", "1", "yes")
+    if whatsapp_enabled:
+        if Platform.WHATSAPP not in config.platforms:
+            config.platforms[Platform.WHATSAPP] = PlatformConfig()
+        config.platforms[Platform.WHATSAPP].enabled = True
+    
+    # Slack
+    slack_token = os.getenv("SLACK_BOT_TOKEN")
+    if slack_token:
+        if Platform.SLACK not in config.platforms:
+            config.platforms[Platform.SLACK] = PlatformConfig()
+        config.platforms[Platform.SLACK].enabled = True
+        config.platforms[Platform.SLACK].token = slack_token
+        # Home channel
+        slack_home = os.getenv("SLACK_HOME_CHANNEL")
+        if slack_home:
+            config.platforms[Platform.SLACK].home_channel = HomeChannel(
+                platform=Platform.SLACK,
+                chat_id=slack_home,
+                name=os.getenv("SLACK_HOME_CHANNEL_NAME", ""),
+            )
+    
+    # Signal
+    signal_url = os.getenv("SIGNAL_HTTP_URL")
+    signal_account = os.getenv("SIGNAL_ACCOUNT")
+    if signal_url and signal_account:
+        if Platform.SIGNAL not in config.platforms:
+            config.platforms[Platform.SIGNAL] = PlatformConfig()
+        config.platforms[Platform.SIGNAL].enabled = True
+        config.platforms[Platform.SIGNAL].extra.update({
+            "http_url": signal_url,
+            "account": signal_account,
+            "ignore_stories": os.getenv("SIGNAL_IGNORE_STORIES", "true").lower() in ("true", "1", "yes"),
+        })
+        signal_home = os.getenv("SIGNAL_HOME_CHANNEL")
+        if signal_home:
+            config.platforms[Platform.SIGNAL].home_channel = HomeChannel(
+                platform=Platform.SIGNAL,
+                chat_id=signal_home,
+                name=os.getenv("SIGNAL_HOME_CHANNEL_NAME", "Home"),
+            )
+
+    # Home Assistant
+    hass_token = os.getenv("HASS_TOKEN")
+    if hass_token:
+        if Platform.HOMEASSISTANT not in config.platforms:
+            config.platforms[Platform.HOMEASSISTANT] = PlatformConfig()
+        config.platforms[Platform.HOMEASSISTANT].enabled = True
+        config.platforms[Platform.HOMEASSISTANT].token = hass_token
+        hass_url = os.getenv("HASS_URL")
+        if hass_url:
+            config.platforms[Platform.HOMEASSISTANT].extra["url"] = hass_url
+
+    # Session settings
+    idle_minutes = os.getenv("SESSION_IDLE_MINUTES")
+    if idle_minutes:
+        try:
+            config.default_reset_policy.idle_minutes = int(idle_minutes)
+        except ValueError:
+            pass
+    
+    reset_hour = os.getenv("SESSION_RESET_HOUR")
+    if reset_hour:
+        try:
+            config.default_reset_policy.at_hour = int(reset_hour)
+        except ValueError:
+            pass
+
+
+def save_gateway_config(config: GatewayConfig) -> None:
+    """Save gateway configuration to ~/.hermes/gateway.json."""
+    gateway_config_path = Path.home() / ".hermes" / "gateway.json"
+    gateway_config_path.parent.mkdir(parents=True, exist_ok=True)
+    
+    with open(gateway_config_path, "w") as f:
+        json.dump(config.to_dict(), f, indent=2)
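Review note: GatewayConfig.get_reset_policy() documents its precedence as platform override > type override > default. The sketch below reproduces that lookup with reduced stand-in classes so the order can be verified in isolation. Policy and Config are hypothetical names, and platforms are keyed by plain strings here rather than by the Platform enum.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Policy:
    """Stand-in for SessionResetPolicy with the same defaults."""
    mode: str = "both"
    at_hour: int = 4
    idle_minutes: int = 1440


@dataclass
class Config:
    """Stand-in for GatewayConfig, holding only the reset-policy fields."""
    default: Policy = field(default_factory=Policy)
    by_type: Dict[str, Policy] = field(default_factory=dict)
    by_platform: Dict[str, Policy] = field(default_factory=dict)

    def get_reset_policy(
        self,
        platform: Optional[str] = None,
        session_type: Optional[str] = None,
    ) -> Policy:
        # Platform-specific override wins
        if platform and platform in self.by_platform:
            return self.by_platform[platform]
        # Then the session-type override (dm, group, thread)
        if session_type and session_type in self.by_type:
            return self.by_type[session_type]
        # Otherwise the default policy
        return self.default


cfg = Config(
    by_type={"dm": Policy(mode="idle")},
    by_platform={"telegram": Policy(mode="none")},
)
```

A Telegram DM gets the platform policy even though a "dm" override exists, which is worth keeping in mind when configuring both maps.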
--- a/hermes_agent/gateway/delivery.py
+++ b/hermes_agent/gateway/delivery.py
@@ -12,9 +12,8 @@ import logging
 from pathlib import Path
 from datetime import datetime
 from dataclasses import dataclass
-from typing import Dict, List, Optional, Any
-
-from hermes_agent.cli.config import get_hermes_home
+from typing import Dict, List, Optional, Any, Union
+from enum import Enum

 logger = logging.getLogger(__name__)

@@ -38,7 +37,6 @@ class DeliveryTarget:
    """
    platform: Platform
    chat_id: Optional[str] = None  # None means use home channel
-    thread_id: Optional[str] = None
    is_origin: bool = False
    is_explicit: bool = False  # True if chat_id was explicitly specified
    
@@ -60,7 +58,6 @@ class DeliveryTarget:
                return cls(
                    platform=origin.platform,
                    chat_id=origin.chat_id,
-                    thread_id=origin.thread_id,
                    is_origin=True,
                )
            else:
@@ -70,15 +67,12 @@ class DeliveryTarget:
        if target == "local":
            return cls(platform=Platform.LOCAL)
        
-        # Check for platform:chat_id or platform:chat_id:thread_id format
+        # Check for platform:chat_id format
        if ":" in target:
-            parts = target.split(":", 2)
-            platform_str = parts[0]
-            chat_id = parts[1] if len(parts) > 1 else None
-            thread_id = parts[2] if len(parts) > 2 else None
+            platform_str, chat_id = target.split(":", 1)
            try:
                platform = Platform(platform_str)
-                return cls(platform=platform, chat_id=chat_id, thread_id=thread_id, is_explicit=True)
+                return cls(platform=platform, chat_id=chat_id, is_explicit=True)
            except ValueError:
                # Unknown platform, treat as local
                return cls(platform=Platform.LOCAL)
@@ -97,8 +91,6 @@ class DeliveryTarget:
            return "origin"
        if self.platform == Platform.LOCAL:
            return "local"
-        if self.chat_id and self.thread_id:
-            return f"{self.platform.value}:{self.chat_id}:{self.thread_id}"
        if self.chat_id:
            return f"{self.platform.value}:{self.chat_id}"
        return self.platform.value
@@ -122,7 +114,54 @@ class DeliveryRouter:
        """
        self.config = config
        self.adapters = adapters or {}
-        self.output_dir = get_hermes_home() / "cron" / "output"
+        self.output_dir = Path.home() / ".hermes" / "cron" / "output"
+    
+    def resolve_targets(
+        self,
+        deliver: Union[str, List[str]],
+        origin: Optional[SessionSource] = None
+    ) -> List[DeliveryTarget]:
+        """
+        Resolve delivery specification to concrete targets.
+        
+        Args:
+            deliver: Delivery spec - "origin", "telegram", ["local", "discord"], etc.
+            origin: The source where the request originated (for "origin" target)
+        
+        Returns:
+            List of resolved delivery targets
+        """
+        if isinstance(deliver, str):
+            deliver = [deliver]
+        
+        targets = []
+        seen_platforms = set()
+        
+        for target_str in deliver:
+            target = DeliveryTarget.parse(target_str, origin)
+            
+            # Resolve home channel if needed
+            if target.chat_id is None and target.platform != Platform.LOCAL:
+                home = self.config.get_home_channel(target.platform)
+                if home:
+                    target.chat_id = home.chat_id
+                else:
+                    # No home channel configured, skip this platform
+                    continue
+            
+            # Deduplicate
+            key = (target.platform, target.chat_id)
+            if key not in seen_platforms:
+                seen_platforms.add(key)
+                targets.append(target)
+        
+        # Always include local if configured
+        if self.config.always_log_local:
+            local_key = (Platform.LOCAL, None)
+            if local_key not in seen_platforms:
+                targets.append(DeliveryTarget(platform=Platform.LOCAL))
+        
+        return targets
    
    async def deliver(
        self,
@@ -215,7 +254,7 @@ class DeliveryRouter:
    def _save_full_output(self, content: str, job_id: str) -> Path:
        """Save full cron output to disk and return the file path."""
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
-        out_dir = get_hermes_home() / "cron" / "output"
+        out_dir = Path.home() / ".hermes" / "cron" / "output"
        out_dir.mkdir(parents=True, exist_ok=True)
        path = out_dir / f"{job_id}_{timestamp}.txt"
        path.write_text(content)
@@ -246,11 +285,56 @@ class DeliveryRouter:
                + f"\n\n... [truncated, full output saved to {saved_path}]"
            )
        
-        send_metadata = dict(metadata or {})
-        if target.thread_id and "thread_id" not in send_metadata:
-            send_metadata["thread_id"] = target.thread_id
-        return await adapter.send(target.chat_id, content, metadata=send_metadata or None)
+        return await adapter.send(target.chat_id, content, metadata=metadata)


+def parse_deliver_spec(
+    deliver: Optional[Union[str, List[str]]],
+    origin: Optional[SessionSource] = None,
+    default: str = "origin"
+) -> Union[str, List[str]]:
+    """
+    Normalize a delivery specification.
+    
+    If None or empty, returns the default.
+    """
+    if not deliver:
+        return default
+    return deliver


+def build_delivery_context_for_tool(
+    config: GatewayConfig,
+    origin: Optional[SessionSource] = None
+) -> Dict[str, Any]:
+    """
+    Build context for the schedule_cronjob tool to understand delivery options.
+    
+    This is passed to the tool so it can validate and explain delivery targets.
+    """
+    connected = config.get_connected_platforms()
+    
+    options = {
+        "origin": {
+            "description": "Back to where this job was created",
+            "available": origin is not None,
+        },
+        "local": {
+            "description": "Save to local files only",
+            "available": True,
+        }
+    }
+    
+    for platform in connected:
+        home = config.get_home_channel(platform)
+        options[platform.value] = {
+            "description": f"{platform.value.title()} home channel",
+            "available": True,
+            "home_channel": home.to_dict() if home else None,
+        }
+    
+    return {
+        "origin": origin.to_dict() if origin else None,
+        "options": options,
+        "always_log_local": config.always_log_local,
+    }
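Review note: with thread_id removed, DeliveryTarget.parse() splits a target on the first colon only, so everything after it becomes the chat_id. The sketch below shows the consequence for a three-part string. parse_target, KNOWN_PLATFORMS, and the tuple return value are hypothetical simplifications; the handling of a bare unknown name is an assumption, since that branch is not visible in this diff.

```python
from typing import Optional, Tuple

# Assumed to mirror the Platform enum values from gateway/config.py
KNOWN_PLATFORMS = {
    "local", "telegram", "discord", "whatsapp",
    "slack", "signal", "homeassistant",
}


def parse_target(target: str) -> Tuple[str, Optional[str]]:
    """Return (platform, chat_id); chat_id None means 'use the home channel'."""
    if target == "local":
        return ("local", None)

    if ":" in target:
        # Split once: any further colons stay inside chat_id
        platform, chat_id = target.split(":", 1)
        if platform in KNOWN_PLATFORMS:
            return (platform, chat_id)
        # Unknown platform is treated as local, as in the diff
        return ("local", None)

    # Bare platform name; falling back to local for unknown names is an assumption
    if target in KNOWN_PLATFORMS:
        return (target, None)
    return ("local", None)
```

A string such as "telegram:123:456", which previously carried a thread_id, now yields the chat_id "123:456", so callers that still emit the three-part form would target a chat that likely does not exist.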
--- a/hermes_agent/gateway/hooks.py
+++ b/hermes_agent/gateway/hooks.py
@@ -8,9 +8,8 @@ Hooks are discovered from ~/.hermes/hooks/ directories, each containing:

 Events:
  - gateway:startup     -- Gateway process starts
-  - session:start       -- New session created (first message of a new session)
-  - session:end         -- Session ends (user ran /new or /reset)
-  - session:reset       -- Session reset completed (new session entry created)
+  - session:start       -- New session created
+  - session:reset       -- User ran /new or /reset
  - agent:start         -- Agent begins processing a message
  - agent:step          -- Each turn in the tool-calling loop
  - agent:end           -- Agent finishes processing
@@ -21,14 +20,14 @@ Errors in hooks are caught and logged but never block the main pipeline.

 import asyncio
 import importlib.util
+import os
+from pathlib import Path
 from typing import Any, Callable, Dict, List, Optional

 import yaml

-from hermes_agent.cli.config import get_hermes_home

-
-HOOKS_DIR = get_hermes_home() / "hooks"
+HOOKS_DIR = Path(os.path.expanduser("~/.hermes/hooks"))


 class HookRegistry:
@@ -51,33 +50,14 @@ class HookRegistry:
        """Return metadata about all loaded hooks."""
        return list(self._loaded_hooks)

-    def _register_builtin_hooks(self) -> None:
-        """Register built-in hooks that are always active."""
-        try:
-            from hermes_agent.gateway.builtin_hooks.boot_md import handle as boot_md_handle
-
-            self._handlers.setdefault("gateway:startup", []).append(boot_md_handle)
-            self._loaded_hooks.append({
-                "name": "boot-md",
-                "description": "Run ~/.hermes/BOOT.md on gateway startup",
-                "events": ["gateway:startup"],
-                "path": "(builtin)",
-            })
-        except Exception as e:
-            print(f"[hooks] Could not load built-in boot-md hook: {e}", flush=True)
-
    def discover_and_load(self) -> None:
        """
        Scan the hooks directory for hook directories and load their handlers.

-        Also registers built-in hooks that are always active.
-
        Each hook directory must contain:
          - HOOK.yaml with at least 'name' and 'events' keys
          - handler.py with a top-level 'handle' function (sync or async)
        """
-        self._register_builtin_hooks()
-
        if not HOOKS_DIR.exists():
            return

--- a/hermes_agent/gateway/mirror.py
+++ b/hermes_agent/gateway/mirror.py
@@ -12,13 +12,12 @@ the full SessionStore machinery.
 import json
 import logging
 from datetime import datetime
+from pathlib import Path
 from typing import Optional

-from hermes_agent.cli.config import get_hermes_home
-
 logger = logging.getLogger(__name__)

-_SESSIONS_DIR = get_hermes_home() / "sessions"
+_SESSIONS_DIR = Path.home() / ".hermes" / "sessions"
 _SESSIONS_INDEX = _SESSIONS_DIR / "sessions.json"


@@ -27,7 +26,6 @@ def mirror_to_session(
    chat_id: str,
    message_text: str,
    source_label: str = "cli",
-    thread_id: Optional[str] = None,
 ) -> bool:
    """
    Append a delivery-mirror message to the target session's transcript.
@@ -39,9 +37,9 @@ def mirror_to_session(
    All errors are caught -- this is never fatal.
    """
    try:
-        session_id = _find_session_id(platform, str(chat_id), thread_id=thread_id)
+        session_id = _find_session_id(platform, str(chat_id))
        if not session_id:
-            logger.debug("Mirror: no session found for %s:%s:%s", platform, chat_id, thread_id)
+            logger.debug("Mirror: no session found for %s:%s", platform, chat_id)
            return False

        mirror_msg = {
@@ -59,11 +57,11 @@ def mirror_to_session(
        return True

    except Exception as e:
-        logger.debug("Mirror failed for %s:%s:%s: %s", platform, chat_id, thread_id, e)
+        logger.debug("Mirror failed for %s:%s: %s", platform, chat_id, e)
        return False


-def _find_session_id(platform: str, chat_id: str, thread_id: Optional[str] = None) -> Optional[str]:
+def _find_session_id(platform: str, chat_id: str) -> Optional[str]:
    """
    Find the active session_id for a platform + chat_id pair.

@@ -93,9 +91,6 @@ def _find_session_id(platform: str, chat_id: str, thread_id: Optional[str] = Non

        origin_chat_id = str(origin.get("chat_id", ""))
        if origin_chat_id == str(chat_id):
-            origin_thread_id = origin.get("thread_id")
-            if thread_id is not None and str(origin_thread_id or "") != str(thread_id):
-                continue
            updated = entry.get("updated_at", "")
            if updated > best_updated:
                best_updated = updated
@@ -116,9 +111,8 @@ def _append_to_jsonl(session_id: str, message: dict) -> None:

 def _append_to_sqlite(session_id: str, message: dict) -> None:
    """Append a message to the SQLite session database."""
-    db = None
    try:
-        from hermes_agent.state import SessionDB
+        from hermes_state import SessionDB
        db = SessionDB()
        db.append_message(
            session_id=session_id,
@@ -127,6 +121,3 @@ def _append_to_sqlite(session_id: str, message: dict) -> None:
        )
    except Exception as e:
        logger.debug("Mirror SQLite write failed: %s", e)
-    finally:
-        if db is not None:
-            db.close()
--- a/hermes_agent/gateway/pairing.py
+++ b/hermes_agent/gateway/pairing.py
@@ -21,14 +21,10 @@ Storage: ~/.hermes/pairing/
 import json
 import os
 import secrets
-import tempfile
-import threading
 import time
 from pathlib import Path
 from typing import Optional

-from hermes_agent.constants import get_hermes_dir
-

 # Unambiguous alphabet -- excludes 0/O, 1/I to prevent confusion
 ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"
@@ -43,33 +39,17 @@ LOCKOUT_SECONDS = 3600              # Lockout duration after too many failures
 MAX_PENDING_PER_PLATFORM = 3        # Max pending codes per platform
 MAX_FAILED_ATTEMPTS = 5             # Failed approvals before lockout

-PAIRING_DIR = get_hermes_dir("platforms/pairing", "pairing")
+PAIRING_DIR = Path(os.path.expanduser("~/.hermes/pairing"))


 def _secure_write(path: Path, data: str) -> None:
-    """Write data to file with restrictive permissions (owner read/write only).
-
-    Uses a temp-file + atomic rename so readers always see either the old
-    complete file or the new one — never a partial write.
-    """
+    """Write data to file with restrictive permissions (owner read/write only)."""
    path.parent.mkdir(parents=True, exist_ok=True)
-    fd, tmp_path = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
+    path.write_text(data, encoding="utf-8")
    try:
-        with os.fdopen(fd, "w", encoding="utf-8") as f:
-            f.write(data)
-            f.flush()
-            os.fsync(f.fileno())
-        os.replace(tmp_path, str(path))
-        try:
-            os.chmod(path, 0o600)
-        except OSError:
-            pass  # Windows doesn't support chmod the same way
-    except BaseException:
-        try:
-            os.unlink(tmp_path)
-        except OSError:
-            pass
-        raise
+        os.chmod(path, 0o600)
+    except OSError:
+        pass  # Windows doesn't support chmod the same way


 class PairingStore:
@@ -84,9 +64,6 @@ class PairingStore:

    def __init__(self):
        PAIRING_DIR.mkdir(parents=True, exist_ok=True)
-        # Protects all read-modify-write cycles. The gateway runs multiple
-        # platform adapters concurrently in threads sharing one PairingStore.
-        self._lock = threading.RLock()

    def _pending_path(self, platform: str) -> Path:
        return PAIRING_DIR / f"{platform}-pending.json"
@@ -126,7 +103,7 @@ class PairingStore:
        return results

    def _approve_user(self, platform: str, user_id: str, user_name: str = "") -> None:
-        """Add a user to the approved list. Must be called under self._lock."""
+        """Add a user to the approved list."""
        approved = self._load_json(self._approved_path(platform))
        approved[user_id] = {
            "user_name": user_name,
@@ -137,12 +114,11 @@ class PairingStore:
    def revoke(self, platform: str, user_id: str) -> bool:
        """Remove a user from the approved list. Returns True if found."""
        path = self._approved_path(platform)
-        with self._lock:
-            approved = self._load_json(path)
-            if user_id in approved:
-                del approved[user_id]
-                self._save_json(path, approved)
-                return True
+        approved = self._load_json(path)
+        if user_id in approved:
+            del approved[user_id]
+            self._save_json(path, approved)
+            return True
        return False

    # ----- Pending codes -----
@@ -158,37 +134,36 @@ class PairingStore:
          - Max pending codes reached for this platform
          - User/platform is in lockout due to failed attempts
        """
-        with self._lock:
-            self._cleanup_expired(platform)
+        self._cleanup_expired(platform)

-            # Check lockout
-            if self._is_locked_out(platform):
-                return None
+        # Check lockout
+        if self._is_locked_out(platform):
+            return None

-            # Check rate limit for this specific user
-            if self._is_rate_limited(platform, user_id):
-                return None
+        # Check rate limit for this specific user
+        if self._is_rate_limited(platform, user_id):
+            return None

-            # Check max pending
-            pending = self._load_json(self._pending_path(platform))
-            if len(pending) >= MAX_PENDING_PER_PLATFORM:
-                return None
+        # Check max pending
+        pending = self._load_json(self._pending_path(platform))
+        if len(pending) >= MAX_PENDING_PER_PLATFORM:
+            return None

-            # Generate cryptographically random code
-            code = "".join(secrets.choice(ALPHABET) for _ in range(CODE_LENGTH))
+        # Generate cryptographically random code
+        code = "".join(secrets.choice(ALPHABET) for _ in range(CODE_LENGTH))

-            # Store pending request
-            pending[code] = {
-                "user_id": user_id,
-                "user_name": user_name,
-                "created_at": time.time(),
-            }
-            self._save_json(self._pending_path(platform), pending)
+        # Store pending request
+        pending[code] = {
+            "user_id": user_id,
+            "user_name": user_name,
+            "created_at": time.time(),
+        }
+        self._save_json(self._pending_path(platform), pending)

-            # Record rate limit
-            self._record_rate_limit(platform, user_id)
+        # Record rate limit
+        self._record_rate_limit(platform, user_id)

-            return code
+        return code

    def approve_code(self, platform: str, code: str) -> Optional[dict]:
        """
@@ -196,25 +171,24 @@ class PairingStore:

        Returns {user_id, user_name} on success, None if code is invalid/expired.
        """
-        with self._lock:
-            self._cleanup_expired(platform)
-            code = code.upper().strip()
+        self._cleanup_expired(platform)
+        code = code.upper().strip()

-            pending = self._load_json(self._pending_path(platform))
-            if code not in pending:
-                self._record_failed_attempt(platform)
-                return None
+        pending = self._load_json(self._pending_path(platform))
+        if code not in pending:
+            self._record_failed_attempt(platform)
+            return None

-            entry = pending.pop(code)
-            self._save_json(self._pending_path(platform), pending)
+        entry = pending.pop(code)
+        self._save_json(self._pending_path(platform), pending)

-            # Add to approved list
-            self._approve_user(platform, entry["user_id"], entry.get("user_name", ""))
+        # Add to approved list
+        self._approve_user(platform, entry["user_id"], entry.get("user_name", ""))

-            return {
-                "user_id": entry["user_id"],
-                "user_name": entry.get("user_name", ""),
-            }
+        return {
+            "user_id": entry["user_id"],
+            "user_name": entry.get("user_name", ""),
+        }

    def list_pending(self, platform: str = None) -> list:
        """List pending pairing requests, optionally filtered by platform."""
@@ -236,13 +210,12 @@ class PairingStore:

    def clear_pending(self, platform: str = None) -> int:
        """Clear all pending requests. Returns count removed."""
-        with self._lock:
-            count = 0
-            platforms = [platform] if platform else self._all_platforms("pending")
-            for p in platforms:
-                pending = self._load_json(self._pending_path(p))
-                count += len(pending)
-                self._save_json(self._pending_path(p), {})
+        count = 0
+        platforms = [platform] if platform else self._all_platforms("pending")
+        for p in platforms:
+            pending = self._load_json(self._pending_path(p))
+            count += len(pending)
+            self._save_json(self._pending_path(p), {})
        return count

    # ----- Rate limiting and lockout -----
--- a/hermes_agent/gateway/platforms/ADDING_A_PLATFORM.md
+++ b/hermes_agent/gateway/platforms/ADDING_A_PLATFORM.md
@@ -173,7 +173,7 @@ platform_map = {
 }
 ```

-Without this, `cronjob(action="create", deliver="your_platform", ...)` silently fails.
+Without this, `schedule_cronjob(deliver="your_platform")` silently fails.

 ---

--- a/hermes_agent/gateway/platforms/init.py
+++ b/hermes_agent/gateway/platforms/init.py
@@ -9,11 +9,9 @@ Each adapter handles:
 """

 from .base import BasePlatformAdapter, MessageEvent, SendResult
-from .qqbot import QQAdapter

 __all__ = [
    "BasePlatformAdapter",
    "MessageEvent",
    "SendResult",
-    "QQAdapter",
 ]
--- a/gateway/platforms/base.py
+++ b/gateway/platforms/base.py
@@ -0,0 +1,971 @@
+"""
+Base platform adapter interface.
+
+All platform adapters (Telegram, Discord, WhatsApp) inherit from this
+and implement the required methods.
+"""
+
+import asyncio
+import logging
+import os
+import re
+import uuid
+from abc import ABC, abstractmethod
+from dataclasses import dataclass, field
+from datetime import datetime
+from pathlib import Path
+from typing import Dict, List, Optional, Any, Callable, Awaitable, Tuple
+from enum import Enum
+
+logger = logging.getLogger(__name__)
+
+import sys
+from pathlib import Path as _Path
+sys.path.insert(0, str(_Path(__file__).resolve().parents[2]))
+
+from gateway.config import Platform, PlatformConfig
+from gateway.session import SessionSource
+
+
+# ---------------------------------------------------------------------------
+# Image cache utilities
+#
+# When users send images on messaging platforms, we download them to a local
+# cache directory so they can be analyzed by the vision tool (which accepts
+# local file paths). This avoids issues with ephemeral platform URLs
+# (e.g. Telegram file URLs expire after ~1 hour).
+# ---------------------------------------------------------------------------
+
+# Default location: ~/.hermes/image_cache/
+IMAGE_CACHE_DIR = Path(os.path.expanduser("~/.hermes/image_cache"))
+
+
+def get_image_cache_dir() -> Path:
+    """Return the image cache directory, creating it if it doesn't exist."""
+    IMAGE_CACHE_DIR.mkdir(parents=True, exist_ok=True)
+    return IMAGE_CACHE_DIR
+
+
+def cache_image_from_bytes(data: bytes, ext: str = ".jpg") -> str:
+    """
+    Save raw image bytes to the cache and return the absolute file path.
+
+    Args:
+        data: Raw image bytes.
+        ext:  File extension including the dot (e.g. ".jpg", ".png").
+
+    Returns:
+        Absolute path to the cached image file as a string.
+    """
+    cache_dir = get_image_cache_dir()
+    filename = f"img_{uuid.uuid4().hex[:12]}{ext}"
+    filepath = cache_dir / filename
+    filepath.write_bytes(data)
+    return str(filepath)
+
+
+async def cache_image_from_url(url: str, ext: str = ".jpg") -> str:
+    """
+    Download an image from a URL and save it to the local cache.
+
+    Uses httpx for async download with a reasonable timeout.
+
+    Args:
+        url: The HTTP/HTTPS URL to download from.
+        ext: File extension including the dot (e.g. ".jpg", ".png").
+
+    Returns:
+        Absolute path to the cached image file as a string.
+    """
+    import httpx
+
+    async with httpx.AsyncClient(timeout=30.0, follow_redirects=True) as client:
+        response = await client.get(
+            url,
+            headers={
+                "User-Agent": "Mozilla/5.0 (compatible; HermesAgent/1.0)",
+                "Accept": "image/*,*/*;q=0.8",
+            },
+        )
+        response.raise_for_status()
+        return cache_image_from_bytes(response.content, ext)
+
+
+def cleanup_image_cache(max_age_hours: int = 24) -> int:
+    """
+    Delete cached images older than *max_age_hours*.
+
+    Returns the number of files removed.
+    """
+    import time
+
+    cache_dir = get_image_cache_dir()
+    cutoff = time.time() - (max_age_hours * 3600)
+    removed = 0
+    for f in cache_dir.iterdir():
+        if f.is_file() and f.stat().st_mtime < cutoff:
+            try:
+                f.unlink()
+                removed += 1
+            except OSError:
+                pass
+    return removed
+
+
+# ---------------------------------------------------------------------------
+# Audio cache utilities
+#
+# Same pattern as image cache -- voice messages from platforms are downloaded
+# here so the STT tool (OpenAI Whisper) can transcribe them from local files.
+# ---------------------------------------------------------------------------
+
+AUDIO_CACHE_DIR = Path(os.path.expanduser("~/.hermes/audio_cache"))
+
+
+def get_audio_cache_dir() -> Path:
+    """Return the audio cache directory, creating it if it doesn't exist."""
+    AUDIO_CACHE_DIR.mkdir(parents=True, exist_ok=True)
+    return AUDIO_CACHE_DIR
+
+
+def cache_audio_from_bytes(data: bytes, ext: str = ".ogg") -> str:
+    """
+    Save raw audio bytes to the cache and return the absolute file path.
+
+    Args:
+        data: Raw audio bytes.
+        ext:  File extension including the dot (e.g. ".ogg", ".mp3").
+
+    Returns:
+        Absolute path to the cached audio file as a string.
+    """
+    cache_dir = get_audio_cache_dir()
+    filename = f"audio_{uuid.uuid4().hex[:12]}{ext}"
+    filepath = cache_dir / filename
+    filepath.write_bytes(data)
+    return str(filepath)
+
+
+async def cache_audio_from_url(url: str, ext: str = ".ogg") -> str:
+    """
+    Download an audio file from a URL and save it to the local cache.
+
+    Args:
+        url: The HTTP/HTTPS URL to download from.
+        ext: File extension including the dot (e.g. ".ogg", ".mp3").
+
+    Returns:
+        Absolute path to the cached audio file as a string.
+    """
+    import httpx
+
+    async with httpx.AsyncClient(timeout=30.0, follow_redirects=True) as client:
+        response = await client.get(
+            url,
+            headers={
+                "User-Agent": "Mozilla/5.0 (compatible; HermesAgent/1.0)",
+                "Accept": "audio/*,*/*;q=0.8",
+            },
+        )
+        response.raise_for_status()
+        return cache_audio_from_bytes(response.content, ext)
+
+
+# ---------------------------------------------------------------------------
+# Document cache utilities
+#
+# Same pattern as image/audio cache -- documents from platforms are downloaded
+# here so the agent can reference them by local file path.
+# ---------------------------------------------------------------------------
+
+DOCUMENT_CACHE_DIR = Path(os.path.expanduser("~/.hermes/document_cache"))
+
+SUPPORTED_DOCUMENT_TYPES = {
+    ".pdf": "application/pdf",
+    ".md": "text/markdown",
+    ".txt": "text/plain",
+    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
+    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
+    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
+}
+
+
+def get_document_cache_dir() -> Path:
+    """Return the document cache directory, creating it if it doesn't exist."""
+    DOCUMENT_CACHE_DIR.mkdir(parents=True, exist_ok=True)
+    return DOCUMENT_CACHE_DIR
+
+
+def cache_document_from_bytes(data: bytes, filename: str) -> str:
+    """
+    Save raw document bytes to the cache and return the absolute file path.
+
+    The cached filename preserves the original human-readable name with a
+    unique prefix: ``doc_{uuid12}_{original_filename}``.
+
+    Args:
+        data: Raw document bytes.
+        filename: Original filename (e.g. "report.pdf").
+
+    Returns:
+        Absolute path to the cached document file as a string.
+
+    Raises:
+        ValueError: If the sanitized path escapes the cache directory.
+    """
+    cache_dir = get_document_cache_dir()
+    # Sanitize: strip directory components, null bytes, and surrounding whitespace
+    safe_name = Path(filename).name if filename else "document"
+    safe_name = safe_name.replace("\x00", "").strip()
+    if not safe_name or safe_name in (".", ".."):
+        safe_name = "document"
+    cached_name = f"doc_{uuid.uuid4().hex[:12]}_{safe_name}"
+    filepath = cache_dir / cached_name
+    # Final safety check: ensure path stays inside cache dir
+    if not filepath.resolve().is_relative_to(cache_dir.resolve()):
+        raise ValueError(f"Path traversal rejected: {filename!r}")
+    filepath.write_bytes(data)
+    return str(filepath)
+
+
+def cleanup_document_cache(max_age_hours: int = 24) -> int:
+    """
+    Delete cached documents older than *max_age_hours*.
+
+    Returns the number of files removed.
+    """
+    import time
+
+    cache_dir = get_document_cache_dir()
+    cutoff = time.time() - (max_age_hours * 3600)
+    removed = 0
+    for f in cache_dir.iterdir():
+        if f.is_file() and f.stat().st_mtime < cutoff:
+            try:
+                f.unlink()
+                removed += 1
+            except OSError:
+                pass
+    return removed
+
+
+class MessageType(Enum):
+    """Types of incoming messages."""
+    TEXT = "text"
+    PHOTO = "photo"
+    VIDEO = "video"
+    AUDIO = "audio"
+    VOICE = "voice"
+    DOCUMENT = "document"
+    STICKER = "sticker"
+    COMMAND = "command"  # /command style
+
+
+@dataclass
+class MessageEvent:
+    """
+    Incoming message from a platform.
+    
+    Normalized representation that all adapters produce.
+    """
+    # Message content
+    text: str
+    message_type: MessageType = MessageType.TEXT
+    
+    # Source information
+    source: Optional[SessionSource] = None
+    
+    # Original platform data
+    raw_message: Any = None
+    message_id: Optional[str] = None
+    
+    # Media attachments
+    media_urls: List[str] = field(default_factory=list)
+    media_types: List[str] = field(default_factory=list)
+    
+    # Reply context
+    reply_to_message_id: Optional[str] = None
+    
+    # Timestamps
+    timestamp: datetime = field(default_factory=datetime.now)
+    
+    def is_command(self) -> bool:
+        """Check if this is a command message (e.g., /new, /reset)."""
+        return self.text.startswith("/")
+    
+    def get_command(self) -> Optional[str]:
+        """Extract command name if this is a command message."""
+        if not self.is_command():
+            return None
+        # Split on space and get first word, strip the /
+        parts = self.text.split(maxsplit=1)
+        return parts[0][1:].lower() if parts else None
+    
+    def get_command_args(self) -> str:
+        """Get the arguments after a command."""
+        if not self.is_command():
+            return self.text
+        parts = self.text.split(maxsplit=1)
+        return parts[1] if len(parts) > 1 else ""
+
+
+@dataclass 
+class SendResult:
+    """Result of sending a message."""
+    success: bool
+    message_id: Optional[str] = None
+    error: Optional[str] = None
+    raw_response: Any = None
+
+
+# Type for message handlers
+MessageHandler = Callable[[MessageEvent], Awaitable[Optional[str]]]
+
+
+class BasePlatformAdapter(ABC):
+    """
+    Base class for platform adapters.
+    
+    Subclasses implement platform-specific logic for:
+    - Connecting and authenticating
+    - Receiving messages
+    - Sending messages/responses
+    - Handling media
+    """
+    
+    def __init__(self, config: PlatformConfig, platform: Platform):
+        self.config = config
+        self.platform = platform
+        self._message_handler: Optional[MessageHandler] = None
+        self._running = False
+        
+        # Track active message handlers per session for interrupt support
+        # Key: session_key (e.g., chat_id), Value: asyncio.Event that is set to signal an interrupt
+        self._active_sessions: Dict[str, asyncio.Event] = {}
+        self._pending_messages: Dict[str, MessageEvent] = {}
+    
+    @property
+    def name(self) -> str:
+        """Human-readable name for this adapter."""
+        return self.platform.value.title()
+    
+    @property
+    def is_connected(self) -> bool:
+        """Check if adapter is currently connected."""
+        return self._running
+    
+    def set_message_handler(self, handler: MessageHandler) -> None:
+        """
+        Set the handler for incoming messages.
+        
+        The handler receives a MessageEvent and should return
+        an optional response string.
+        """
+        self._message_handler = handler
+    
+    @abstractmethod
+    async def connect(self) -> bool:
+        """
+        Connect to the platform and start receiving messages.
+        
+        Returns True if connection was successful.
+        """
+        pass
+    
+    @abstractmethod
+    async def disconnect(self) -> None:
+        """Disconnect from the platform."""
+        pass
+    
+    @abstractmethod
+    async def send(
+        self,
+        chat_id: str,
+        content: str,
+        reply_to: Optional[str] = None,
+        metadata: Optional[Dict[str, Any]] = None
+    ) -> SendResult:
+        """
+        Send a message to a chat.
+        
+        Args:
+            chat_id: The chat/channel ID to send to
+            content: Message content (may be markdown)
+            reply_to: Optional message ID to reply to
+            metadata: Additional platform-specific options
+        
+        Returns:
+            SendResult with success status and message ID
+        """
+        pass
+
+    async def edit_message(
+        self,
+        chat_id: str,
+        message_id: str,
+        content: str,
+    ) -> SendResult:
+        """
+        Edit a previously sent message. Optional — platforms that don't
+        support editing return success=False and callers fall back to
+        sending a new message.
+        """
+        return SendResult(success=False, error="Not supported")
+
+    async def send_typing(self, chat_id: str) -> None:
+        """
+        Send a typing indicator.
+        
+        Override in subclasses if the platform supports it.
+        """
+        pass
+    
+    async def send_image(
+        self,
+        chat_id: str,
+        image_url: str,
+        caption: Optional[str] = None,
+        reply_to: Optional[str] = None,
+    ) -> SendResult:
+        """
+        Send an image natively via the platform API.
+        
+        Override in subclasses to send images as proper attachments
+        instead of plain-text URLs. Default falls back to sending the
+        URL as a text message.
+        """
+        # Fallback: send URL as text (subclasses override for native images)
+        text = f"{caption}\n{image_url}" if caption else image_url
+        return await self.send(chat_id=chat_id, content=text, reply_to=reply_to)
+    
+    async def send_animation(
+        self,
+        chat_id: str,
+        animation_url: str,
+        caption: Optional[str] = None,
+        reply_to: Optional[str] = None,
+    ) -> SendResult:
+        """
+        Send an animated GIF natively via the platform API.
+        
+        Override in subclasses to send GIFs as proper animations
+        (e.g., Telegram send_animation) so they auto-play inline.
+        Default falls back to send_image.
+        """
+        return await self.send_image(chat_id=chat_id, image_url=animation_url, caption=caption, reply_to=reply_to)
+    
+    @staticmethod
+    def _is_animation_url(url: str) -> bool:
+        """Check if a URL points to an animated GIF (vs a static image)."""
+        lower = url.lower().split('?')[0]  # Strip query params
+        return lower.endswith('.gif')
+
+    @staticmethod
+    def extract_images(content: str) -> Tuple[List[Tuple[str, str]], str]:
+        """
+        Extract image URLs from markdown and HTML image tags in a response.
+        
+        Finds patterns like:
+        - ![alt text](https://example.com/image.png)
+        - <img src="https://example.com/image.png">
+        - <img src="https://example.com/image.png"></img>
+        
+        Args:
+            content: The response text to scan.
+        
+        Returns:
+            Tuple of (list of (url, alt_text) pairs, cleaned content with image tags removed).
+        """
+        images = []
+        cleaned = content
+        
+        # Match markdown images: ![alt](url)
+        md_pattern = r'!\[([^\]]*)\]\((https?://[^\s\)]+)\)'
+        for match in re.finditer(md_pattern, content):
+            alt_text = match.group(1)
+            url = match.group(2)
+            # Only extract URLs that look like actual images
+            if any(url.lower().endswith(ext) or ext in url.lower() for ext in
+                   ['.png', '.jpg', '.jpeg', '.gif', '.webp', 'fal.media', 'fal-cdn', 'replicate.delivery']):
+                images.append((url, alt_text))
+        
+        # Match HTML img tags: <img src="url"> or <img src="url"></img> or <img src="url"/>
+        html_pattern = r'<img\s+src=["\']?(https?://[^\s"\'<>]+)["\']?\s*/?>\s*(?:</img>)?'
+        for match in re.finditer(html_pattern, content):
+            url = match.group(1)
+            images.append((url, ""))
+        
+        # Remove only the matched image tags from content (not all markdown images)
+        if images:
+            extracted_urls = {url for url, _ in images}
+            def _remove_if_extracted(match):
+                url = match.group(2) if match.lastindex >= 2 else match.group(1)
+                return '' if url in extracted_urls else match.group(0)
+            cleaned = re.sub(md_pattern, _remove_if_extracted, cleaned)
+            cleaned = re.sub(html_pattern, _remove_if_extracted, cleaned)
+            # Clean up leftover blank lines
+            cleaned = re.sub(r'\n{3,}', '\n\n', cleaned).strip()
+        
+        return images, cleaned
+    
+    async def send_voice(
+        self,
+        chat_id: str,
+        audio_path: str,
+        caption: Optional[str] = None,
+        reply_to: Optional[str] = None,
+    ) -> SendResult:
+        """
+        Send an audio file as a native voice message via the platform API.
+        
+        Override in subclasses to send audio as voice bubbles (Telegram)
+        or file attachments (Discord). Default falls back to sending the
+        file path as text.
+        """
+        text = f"🔊 Audio: {audio_path}"
+        if caption:
+            text = f"{caption}\n{text}"
+        return await self.send(chat_id=chat_id, content=text, reply_to=reply_to)
+
+    async def send_video(
+        self,
+        chat_id: str,
+        video_path: str,
+        caption: Optional[str] = None,
+        reply_to: Optional[str] = None,
+    ) -> SendResult:
+        """
+        Send a video natively via the platform API.
+
+        Override in subclasses to send videos as inline playable media.
+        Default falls back to sending the file path as text.
+        """
+        text = f"🎬 Video: {video_path}"
+        if caption:
+            text = f"{caption}\n{text}"
+        return await self.send(chat_id=chat_id, content=text, reply_to=reply_to)
+
+    async def send_document(
+        self,
+        chat_id: str,
+        file_path: str,
+        caption: Optional[str] = None,
+        file_name: Optional[str] = None,
+        reply_to: Optional[str] = None,
+    ) -> SendResult:
+        """
+        Send a document/file natively via the platform API.
+
+        Override in subclasses to send files as downloadable attachments.
+        Default falls back to sending the file path as text.
+        """
+        text = f"📎 File: {file_path}"
+        if caption:
+            text = f"{caption}\n{text}"
+        return await self.send(chat_id=chat_id, content=text, reply_to=reply_to)
+
+    async def send_image_file(
+        self,
+        chat_id: str,
+        image_path: str,
+        caption: Optional[str] = None,
+        reply_to: Optional[str] = None,
+    ) -> SendResult:
+        """
+        Send a local image file natively via the platform API.
+
+        Unlike send_image() which takes a URL, this takes a local file path.
+        Override in subclasses for native photo attachments.
+        Default falls back to sending the file path as text.
+        """
+        text = f"🖼️ Image: {image_path}"
+        if caption:
+            text = f"{caption}\n{text}"
+        return await self.send(chat_id=chat_id, content=text, reply_to=reply_to)
+
+    @staticmethod
+    def extract_media(content: str) -> Tuple[List[Tuple[str, bool]], str]:
+        """
+        Extract MEDIA:<path> tags and [[audio_as_voice]] directives from response text.
+        
+        The TTS tool returns responses like:
+            [[audio_as_voice]]
+            MEDIA:/path/to/audio.ogg
+        
+        Args:
+            content: The response text to scan.
+        
+        Returns:
+            Tuple of (list of (path, is_voice) pairs, cleaned content with tags removed).
+        """
+        media = []
+        cleaned = content
+        
+        # Check for [[audio_as_voice]] directive
+        has_voice_tag = "[[audio_as_voice]]" in content
+        cleaned = cleaned.replace("[[audio_as_voice]]", "")
+        
+        # Extract MEDIA:<path> tags (the path ends at the first whitespace character)
+        media_pattern = r'MEDIA:(\S+)'
+        for match in re.finditer(media_pattern, content):
+            path = match.group(1).strip()
+            if path:
+                media.append((path, has_voice_tag))
+        
+        # Remove MEDIA tags from content
+        if media:
+            cleaned = re.sub(media_pattern, '', cleaned)
+            cleaned = re.sub(r'\n{3,}', '\n\n', cleaned).strip()
+        
+        return media, cleaned
+    
+    async def _keep_typing(self, chat_id: str, interval: float = 2.0) -> None:
+        """
+        Continuously send typing indicator until cancelled.
+        
+        Telegram/Discord typing status expires after ~5 seconds, so we refresh every
+        2 seconds by default to recover quickly after progress messages interrupt it.
+        """
+        try:
+            while True:
+                await self.send_typing(chat_id)
+                await asyncio.sleep(interval)
+        except asyncio.CancelledError:
+            pass  # Normal cancellation when handler completes
+    
+    async def handle_message(self, event: MessageEvent) -> None:
+        """
+        Process an incoming message.
+        
+        This method returns quickly by spawning background tasks.
+        This allows new messages to be processed even while an agent is running,
+        enabling interruption support.
+        """
+        if not self._message_handler:
+            return
+        
+        session_key = event.source.chat_id
+        
+        # Check if there's already an active handler for this session
+        if session_key in self._active_sessions:
+            # Store this as a pending message - it will interrupt the running agent
+            print(f"[{self.name}] ⚡ New message while session {session_key} is active - triggering interrupt")
+            self._pending_messages[session_key] = event
+            # Signal the interrupt (the processing task checks this)
+            self._active_sessions[session_key].set()
+            return  # Don't process now - will be handled after current task finishes
+        
+        # Spawn background task to process this message
+        asyncio.create_task(self._process_message_background(event, session_key))
+    
+    @staticmethod
+    def _get_human_delay() -> float:
+        """
+        Return a random delay in seconds for human-like response pacing.
+
+        Reads from env vars:
+          HERMES_HUMAN_DELAY_MODE: "off" (default) | "natural" | "custom"
+          HERMES_HUMAN_DELAY_MIN_MS: minimum delay in ms (default 800, custom mode)
+          HERMES_HUMAN_DELAY_MAX_MS: maximum delay in ms (default 2500, custom mode)
+        """
+        import random
+
+        mode = os.getenv("HERMES_HUMAN_DELAY_MODE", "off").lower()
+        if mode == "off":
+            return 0.0
+        min_ms = int(os.getenv("HERMES_HUMAN_DELAY_MIN_MS", "800"))
+        max_ms = int(os.getenv("HERMES_HUMAN_DELAY_MAX_MS", "2500"))
+        if mode == "natural":
+            min_ms, max_ms = 800, 2500
+        return random.uniform(min_ms / 1000.0, max_ms / 1000.0)
+
+    async def _process_message_background(self, event: MessageEvent, session_key: str) -> None:
+        """Background task that actually processes the message."""
+        # Create interrupt event for this session
+        interrupt_event = asyncio.Event()
+        self._active_sessions[session_key] = interrupt_event
+        
+        # Start continuous typing indicator (refreshes every 2 seconds)
+        typing_task = asyncio.create_task(self._keep_typing(event.source.chat_id))
+        
+        try:
+            # Call the handler (this can take a while with tool calls)
+            response = await self._message_handler(event)
+            
+            # Send response if any
+            if not response:
+                logger.warning("[%s] Handler returned empty/None response for %s", self.name, event.source.chat_id)
+            if response:
+                # Extract MEDIA:<path> tags (from TTS tool) before other processing
+                media_files, response = self.extract_media(response)
+                
+                # Extract image URLs and send them as native platform attachments
+                images, text_content = self.extract_images(response)
+                if images:
+                    logger.info("[%s] extract_images found %d image(s) in response (%d chars)", self.name, len(images), len(response))
+                
+                # Send the text portion first (if any remains after extractions)
+                if text_content:
+                    logger.info("[%s] Sending response (%d chars) to %s", self.name, len(text_content), event.source.chat_id)
+                    result = await self.send(
+                        chat_id=event.source.chat_id,
+                        content=text_content,
+                        reply_to=event.message_id
+                    )
+                    
+                    # Log send failures (don't raise - user already saw tool progress)
+                    if not result.success:
+                        logger.error("[%s] Failed to send response: %s", self.name, result.error)
+                        # Try sending without markdown as fallback
+                        fallback_result = await self.send(
+                            chat_id=event.source.chat_id,
+                            content=f"(Response formatting failed, plain text:)\n\n{text_content[:3500]}",
+                            reply_to=event.message_id
+                        )
+                        if not fallback_result.success:
+                            logger.error("[%s] Fallback send also failed: %s", self.name, fallback_result.error)
+                
+                # Human-like pacing delay between text and media
+                human_delay = self._get_human_delay()
+                
+                # Send extracted images as native attachments
+                if images:
+                    logger.info("[%s] Extracted %d image(s) to send as attachments", self.name, len(images))
+                for image_url, alt_text in images:
+                    if human_delay > 0:
+                        await asyncio.sleep(human_delay)
+                    try:
+                        logger.info("[%s] Sending image: %s (alt=%s)", self.name, image_url[:80], alt_text[:30] if alt_text else "")
+                        # Route animated GIFs through send_animation for proper playback
+                        if self._is_animation_url(image_url):
+                            img_result = await self.send_animation(
+                                chat_id=event.source.chat_id,
+                                animation_url=image_url,
+                                caption=alt_text if alt_text else None,
+                            )
+                        else:
+                            img_result = await self.send_image(
+                                chat_id=event.source.chat_id,
+                                image_url=image_url,
+                                caption=alt_text if alt_text else None,
+                            )
+                        if not img_result.success:
+                            logger.error("[%s] Failed to send image: %s", self.name, img_result.error)
+                    except Exception as img_err:
+                        logger.error("[%s] Error sending image: %s", self.name, img_err, exc_info=True)
+                
+                # Send extracted media files — route by file type
+                _AUDIO_EXTS = {'.ogg', '.opus', '.mp3', '.wav', '.m4a'}
+                _VIDEO_EXTS = {'.mp4', '.mov', '.avi', '.mkv', '.3gp'}
+                _IMAGE_EXTS = {'.jpg', '.jpeg', '.png', '.webp', '.gif'}
+
+                for media_path, is_voice in media_files:
+                    if human_delay > 0:
+                        await asyncio.sleep(human_delay)
+                    try:
+                        ext = Path(media_path).suffix.lower()
+                        if ext in _AUDIO_EXTS:
+                            media_result = await self.send_voice(
+                                chat_id=event.source.chat_id,
+                                audio_path=media_path,
+                            )
+                        elif ext in _VIDEO_EXTS:
+                            media_result = await self.send_video(
+                                chat_id=event.source.chat_id,
+                                video_path=media_path,
+                            )
+                        elif ext in _IMAGE_EXTS:
+                            media_result = await self.send_image_file(
+                                chat_id=event.source.chat_id,
+                                image_path=media_path,
+                            )
+                        else:
+                            media_result = await self.send_document(
+                                chat_id=event.source.chat_id,
+                                file_path=media_path,
+                            )
+
+                        if not media_result.success:
+                            logger.error("[%s] Failed to send media (%s): %s", self.name, ext, media_result.error)
+                    except Exception as media_err:
+                        logger.error("[%s] Error sending media: %s", self.name, media_err, exc_info=True)
+            
+            # Check if there's a pending message that was queued during our processing
+            if session_key in self._pending_messages:
+                pending_event = self._pending_messages.pop(session_key)
+                print(f"[{self.name}] 📨 Processing queued message from interrupt")
+                # Clean up current session before processing pending
+                if session_key in self._active_sessions:
+                    del self._active_sessions[session_key]
+                typing_task.cancel()
+                try:
+                    await typing_task
+                except asyncio.CancelledError:
+                    pass
+                # Process the pending message inline (awaited recursive call, not a new task)
+                await self._process_message_background(pending_event, session_key)
+                return  # Already cleaned up
+                
+        except Exception as e:
+            logger.error(
+                "[%s] Error handling message: %s", self.name, e, exc_info=True
+            )
+        finally:
+            # Stop typing indicator
+            typing_task.cancel()
+            try:
+                await typing_task
+            except asyncio.CancelledError:
+                pass
+            # Clean up session tracking
+            if session_key in self._active_sessions:
+                del self._active_sessions[session_key]
+    
+    def has_pending_interrupt(self, session_key: str) -> bool:
+        """Check if there's a pending interrupt for a session."""
+        return session_key in self._active_sessions and self._active_sessions[session_key].is_set()
+    
+    def get_pending_message(self, session_key: str) -> Optional[MessageEvent]:
+        """Get and clear any pending message for a session."""
+        return self._pending_messages.pop(session_key, None)
+    
+    def build_source(
+        self,
+        chat_id: str,
+        chat_name: Optional[str] = None,
+        chat_type: str = "dm",
+        user_id: Optional[str] = None,
+        user_name: Optional[str] = None,
+        thread_id: Optional[str] = None,
+        chat_topic: Optional[str] = None,
+        user_id_alt: Optional[str] = None,
+        chat_id_alt: Optional[str] = None,
+    ) -> SessionSource:
+        """Helper to build a SessionSource for this platform."""
+        # Normalize empty topic to None
+        if chat_topic is not None and not chat_topic.strip():
+            chat_topic = None
+        return SessionSource(
+            platform=self.platform,
+            chat_id=str(chat_id),
+            chat_name=chat_name,
+            chat_type=chat_type,
+            user_id=str(user_id) if user_id else None,
+            user_name=user_name,
+            thread_id=str(thread_id) if thread_id else None,
+            chat_topic=chat_topic.strip() if chat_topic else None,
+            user_id_alt=user_id_alt,
+            chat_id_alt=chat_id_alt,
+        )
+    
+    @abstractmethod
+    async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
+        """
+        Get information about a chat/channel.
+        
+        Returns dict with at least:
+        - name: Chat name
+        - type: "dm", "group", "channel"
+        """
+        pass
+    
+    def format_message(self, content: str) -> str:
+        """
+        Format a message for this platform.
+        
+        Override in subclasses to handle platform-specific formatting
+        (e.g., Telegram MarkdownV2, Discord markdown).
+        
+        Default implementation returns content as-is.
+        """
+        return content
+    
+    def truncate_message(self, content: str, max_length: int = 4096) -> List[str]:
+        """
+        Split a long message into chunks, preserving code block boundaries.
+
+        When a split falls inside a triple-backtick code block, the fence is
+        closed at the end of the current chunk and reopened (with the original
+        language tag) at the start of the next chunk.  Multi-chunk responses
+        receive indicators like ``(1/3)``.
+
+        Args:
+            content: The full message content
+            max_length: Maximum length per chunk (platform-specific)
+
+        Returns:
+            List of message chunks
+        """
+        if len(content) <= max_length:
+            return [content]
+
+        INDICATOR_RESERVE = 10   # room for " (XX/XX)"
+        FENCE_CLOSE = "\n```"
+
+        chunks: List[str] = []
+        remaining = content
+        # When the previous chunk ended mid-code-block, this holds the
+        # language tag (possibly "") so we can reopen the fence.
+        carry_lang: Optional[str] = None
+
+        while remaining:
+            # If we're continuing a code block from the previous chunk,
+            # prepend a new opening fence with the same language tag.
+            prefix = f"```{carry_lang}\n" if carry_lang is not None else ""
+
+            # How much body text we can fit after accounting for the prefix,
+            # a potential closing fence, and the chunk indicator.
+            headroom = max_length - INDICATOR_RESERVE - len(prefix) - len(FENCE_CLOSE)
+            if headroom < 1:
+                headroom = max_length // 2
+
+            # Everything remaining fits in one final chunk
+            if len(prefix) + len(remaining) <= max_length - INDICATOR_RESERVE:
+                chunks.append(prefix + remaining)
+                break
+
+            # Find a natural split point (prefer newlines, then spaces)
+            region = remaining[:headroom]
+            split_at = region.rfind("\n")
+            if split_at < headroom // 2:
+                split_at = region.rfind(" ")
+            if split_at < 1:
+                split_at = headroom
+
+            chunk_body = remaining[:split_at]
+            remaining = remaining[split_at:].lstrip()
+
+            full_chunk = prefix + chunk_body
+
+            # Walk only the chunk_body (not the prefix we prepended) to
+            # determine whether we end inside an open code block.
+            in_code = carry_lang is not None
+            lang = carry_lang or ""
+            for line in chunk_body.split("\n"):
+                stripped = line.strip()
+                if stripped.startswith("```"):
+                    if in_code:
+                        in_code = False
+                        lang = ""
+                    else:
+                        in_code = True
+                        tag = stripped[3:].strip()
+                        lang = tag.split()[0] if tag else ""
+
+            if in_code:
+                # Close the orphaned fence so the chunk is valid on its own
+                full_chunk += FENCE_CLOSE
+                carry_lang = lang
+            else:
+                carry_lang = None
+
+            chunks.append(full_chunk)
+
+        # Append chunk indicators when the response spans multiple messages
+        if len(chunks) > 1:
+            total = len(chunks)
+            chunks = [
+                f"{chunk} ({i + 1}/{total})" for i, chunk in enumerate(chunks)
+            ]
+
+        return chunks
--- a/gateway/platforms/discord.py
+++ b/gateway/platforms/discord.py
@@ -0,0 +1,976 @@
+"""
+Discord platform adapter.
+
+Uses discord.py library for:
+- Receiving messages from servers and DMs
+- Sending responses back
+- Handling threads and channels
+"""
+
+import asyncio
+import logging
+import os
+from typing import Dict, List, Optional, Any
+
+logger = logging.getLogger(__name__)
+
+try:
+    import discord
+    from discord import Message as DiscordMessage, Intents
+    from discord.ext import commands
+    DISCORD_AVAILABLE = True
+except ImportError:
+    DISCORD_AVAILABLE = False
+    discord = None
+    DiscordMessage = Any
+    Intents = Any
+    commands = None
+
+import sys
+from pathlib import Path as _Path
+sys.path.insert(0, str(_Path(__file__).resolve().parents[2]))
+
+from gateway.config import Platform, PlatformConfig
+from gateway.platforms.base import (
+    BasePlatformAdapter,
+    MessageEvent,
+    MessageType,
+    SendResult,
+    cache_image_from_url,
+    cache_audio_from_url,
+)
+
+
+def check_discord_requirements() -> bool:
+    """Check if Discord dependencies are available."""
+    return DISCORD_AVAILABLE
+
+
+class DiscordAdapter(BasePlatformAdapter):
+    """
+    Discord bot adapter.
+    
+    Handles:
+    - Receiving messages from servers and DMs
+    - Sending responses with Discord markdown
+    - Thread support
+    - Native slash commands (/ask, /reset, /status, /stop)
+    - Button-based exec approvals
+    - Auto-threading for long conversations
+    - Reaction-based feedback
+    """
+    
+    # Discord message limits
+    MAX_MESSAGE_LENGTH = 2000
+    
+    def __init__(self, config: PlatformConfig):
+        super().__init__(config, Platform.DISCORD)
+        self._client: Optional[commands.Bot] = None
+        self._ready_event = asyncio.Event()
+        self._allowed_user_ids: set = set()  # For button approval authorization
+    
+    async def connect(self) -> bool:
+        """Connect to Discord and start receiving events."""
+        if not DISCORD_AVAILABLE:
+            print(f"[{self.name}] discord.py not installed. Run: pip install discord.py")
+            return False
+        
+        if not self.config.token:
+            print(f"[{self.name}] No bot token configured")
+            return False
+        
+        try:
+            # Set up intents -- members intent needed for username-to-ID resolution
+            intents = Intents.default()
+            intents.message_content = True
+            intents.dm_messages = True
+            intents.guild_messages = True
+            intents.members = True
+            
+            # Create bot
+            self._client = commands.Bot(
+                command_prefix="!",  # Not really used, we handle raw messages
+                intents=intents,
+            )
+            
+            # Parse allowed user entries (may contain usernames or IDs)
+            allowed_env = os.getenv("DISCORD_ALLOWED_USERS", "")
+            if allowed_env:
+                self._allowed_user_ids = {
+                    uid.strip() for uid in allowed_env.split(",") if uid.strip()
+                }
+            
+            adapter_self = self  # capture for closure
+            
+            # Register event handlers
+            @self._client.event
+            async def on_ready():
+                print(f"[{adapter_self.name}] Connected as {adapter_self._client.user}")
+                
+                # Resolve any usernames in the allowed list to numeric IDs
+                await adapter_self._resolve_allowed_usernames()
+                
+                # Sync slash commands with Discord
+                try:
+                    synced = await adapter_self._client.tree.sync()
+                    print(f"[{adapter_self.name}] Synced {len(synced)} slash command(s)")
+                except Exception as e:
+                    print(f"[{adapter_self.name}] Slash command sync failed: {e}")
+                adapter_self._ready_event.set()
+            
+            @self._client.event
+            async def on_message(message: DiscordMessage):
+                # Ignore bot's own messages
+                if message.author == self._client.user:
+                    return
+                await self._handle_message(message)
+            
+            # Register slash commands
+            self._register_slash_commands()
+            
+            # Start the bot in background (keep a reference so the task is not garbage-collected)
+            self._client_task = asyncio.create_task(self._client.start(self.config.token))
+            
+            # Wait for ready
+            await asyncio.wait_for(self._ready_event.wait(), timeout=30)
+            
+            self._running = True
+            return True
+            
+        except asyncio.TimeoutError:
+            print(f"[{self.name}] Timeout waiting for connection")
+            return False
+        except Exception as e:
+            print(f"[{self.name}] Failed to connect: {e}")
+            return False
+    
+    async def disconnect(self) -> None:
+        """Disconnect from Discord."""
+        if self._client:
+            try:
+                await self._client.close()
+            except Exception as e:
+                print(f"[{self.name}] Error during disconnect: {e}")
+        
+        self._running = False
+        self._client = None
+        self._ready_event.clear()
+        print(f"[{self.name}] Disconnected")
+    
+    async def send(
+        self,
+        chat_id: str,
+        content: str,
+        reply_to: Optional[str] = None,
+        metadata: Optional[Dict[str, Any]] = None
+    ) -> SendResult:
+        """Send a message to a Discord channel."""
+        if not self._client:
+            return SendResult(success=False, error="Not connected")
+        
+        try:
+            # Get the channel
+            channel = self._client.get_channel(int(chat_id))
+            if not channel:
+                channel = await self._client.fetch_channel(int(chat_id))
+            
+            if not channel:
+                return SendResult(success=False, error=f"Channel {chat_id} not found")
+            
+            # Format and split message if needed
+            formatted = self.format_message(content)
+            chunks = self.truncate_message(formatted, self.MAX_MESSAGE_LENGTH)
+            
+            message_ids = []
+            reference = None
+            
+            if reply_to:
+                try:
+                    ref_msg = await channel.fetch_message(int(reply_to))
+                    reference = ref_msg
+                except Exception as e:
+                    logger.debug("Could not fetch reply-to message: %s", e)
+            
+            for i, chunk in enumerate(chunks):
+                msg = await channel.send(
+                    content=chunk,
+                    reference=reference if i == 0 else None,
+                )
+                message_ids.append(str(msg.id))
+            
+            return SendResult(
+                success=True,
+                message_id=message_ids[0] if message_ids else None,
+                raw_response={"message_ids": message_ids}
+            )
+            
+        except Exception as e:
+            return SendResult(success=False, error=str(e))
+
+    async def edit_message(
+        self,
+        chat_id: str,
+        message_id: str,
+        content: str,
+    ) -> SendResult:
+        """Edit a previously sent Discord message."""
+        if not self._client:
+            return SendResult(success=False, error="Not connected")
+        try:
+            channel = self._client.get_channel(int(chat_id))
+            if not channel:
+                channel = await self._client.fetch_channel(int(chat_id))
+            msg = await channel.fetch_message(int(message_id))
+            formatted = self.format_message(content)
+            if len(formatted) > self.MAX_MESSAGE_LENGTH:
+                formatted = formatted[:self.MAX_MESSAGE_LENGTH - 3] + "..."
+            await msg.edit(content=formatted)
+            return SendResult(success=True, message_id=message_id)
+        except Exception as e:
+            return SendResult(success=False, error=str(e))
+
+    async def send_voice(
+        self,
+        chat_id: str,
+        audio_path: str,
+        caption: Optional[str] = None,
+        reply_to: Optional[str] = None,
+    ) -> SendResult:
+        """Send audio as a Discord file attachment."""
+        if not self._client:
+            return SendResult(success=False, error="Not connected")
+        
+        try:
+            import io
+            
+            channel = self._client.get_channel(int(chat_id))
+            if not channel:
+                channel = await self._client.fetch_channel(int(chat_id))
+            if not channel:
+                return SendResult(success=False, error=f"Channel {chat_id} not found")
+            
+            if not os.path.exists(audio_path):
+                return SendResult(success=False, error=f"Audio file not found: {audio_path}")
+            
+            # Determine filename from path
+            filename = os.path.basename(audio_path)
+            
+            with open(audio_path, "rb") as f:
+                file = discord.File(io.BytesIO(f.read()), filename=filename)
+                msg = await channel.send(
+                    content=caption if caption else None,
+                    file=file,
+                )
+                return SendResult(success=True, message_id=str(msg.id))
+        
+        except Exception as e:
+            print(f"[{self.name}] Failed to send audio: {e}")
+            return await super().send_voice(chat_id, audio_path, caption, reply_to)
+    
+    async def send_image_file(
+        self,
+        chat_id: str,
+        image_path: str,
+        caption: Optional[str] = None,
+        reply_to: Optional[str] = None,
+    ) -> SendResult:
+        """Send a local image file natively as a Discord file attachment."""
+        if not self._client:
+            return SendResult(success=False, error="Not connected")
+        
+        try:
+            import io
+            
+            channel = self._client.get_channel(int(chat_id))
+            if not channel:
+                channel = await self._client.fetch_channel(int(chat_id))
+            if not channel:
+                return SendResult(success=False, error=f"Channel {chat_id} not found")
+            
+            if not os.path.exists(image_path):
+                return SendResult(success=False, error=f"Image file not found: {image_path}")
+            
+            filename = os.path.basename(image_path)
+            
+            with open(image_path, "rb") as f:
+                file = discord.File(io.BytesIO(f.read()), filename=filename)
+                msg = await channel.send(
+                    content=caption if caption else None,
+                    file=file,
+                )
+                return SendResult(success=True, message_id=str(msg.id))
+        
+        except Exception as e:
+            print(f"[{self.name}] Failed to send local image: {e}")
+            return await super().send_image_file(chat_id, image_path, caption, reply_to)
+
+    async def send_image(
+        self,
+        chat_id: str,
+        image_url: str,
+        caption: Optional[str] = None,
+        reply_to: Optional[str] = None,
+    ) -> SendResult:
+        """Send an image natively as a Discord file attachment."""
+        if not self._client:
+            return SendResult(success=False, error="Not connected")
+        
+        try:
+            import aiohttp
+            
+            channel = self._client.get_channel(int(chat_id))
+            if not channel:
+                channel = await self._client.fetch_channel(int(chat_id))
+            if not channel:
+                return SendResult(success=False, error=f"Channel {chat_id} not found")
+            
+            # Download the image and send as a Discord file attachment
+            # (Discord renders attachments inline, unlike plain URLs)
+            async with aiohttp.ClientSession() as session:
+                async with session.get(image_url, timeout=aiohttp.ClientTimeout(total=30)) as resp:
+                    if resp.status != 200:
+                        raise Exception(f"Failed to download image: HTTP {resp.status}")
+                    
+                    image_data = await resp.read()
+                    
+                    # Determine the file extension from the Content-Type header
+                    content_type = resp.headers.get("content-type", "image/png")
+                    ext = "png"
+                    if "jpeg" in content_type or "jpg" in content_type:
+                        ext = "jpg"
+                    elif "gif" in content_type:
+                        ext = "gif"
+                    elif "webp" in content_type:
+                        ext = "webp"
+                    
+                    import io
+                    file = discord.File(io.BytesIO(image_data), filename=f"image.{ext}")
+                    
+                    msg = await channel.send(
+                        content=caption if caption else None,
+                        file=file,
+                    )
+                    return SendResult(success=True, message_id=str(msg.id))
+        
+        except ImportError:
+            print(f"[{self.name}] aiohttp not installed, falling back to URL. Run: pip install aiohttp")
+            return await super().send_image(chat_id, image_url, caption, reply_to)
+        except Exception as e:
+            print(f"[{self.name}] Failed to send image attachment, falling back to URL: {e}")
+            return await super().send_image(chat_id, image_url, caption, reply_to)
+    
+    async def send_typing(self, chat_id: str) -> None:
+        """Send typing indicator."""
+        if self._client:
+            try:
+                channel = self._client.get_channel(int(chat_id))
+                if channel:
+                    await channel.typing()
+            except Exception:
+                pass  # Ignore typing indicator failures
+    
+    async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
+        """Get information about a Discord channel."""
+        if not self._client:
+            return {"name": "Unknown", "type": "dm"}
+        
+        try:
+            channel = self._client.get_channel(int(chat_id))
+            if not channel:
+                channel = await self._client.fetch_channel(int(chat_id))
+            
+            if not channel:
+                return {"name": str(chat_id), "type": "dm"}
+            
+            # Determine channel type
+            if isinstance(channel, discord.DMChannel):
+                chat_type = "dm"
+                name = channel.recipient.name if channel.recipient else str(chat_id)
+            elif isinstance(channel, discord.Thread):
+                chat_type = "thread"
+                name = channel.name
+            elif isinstance(channel, discord.TextChannel):
+                chat_type = "channel"
+                name = f"#{channel.name}"
+                if channel.guild:
+                    name = f"{channel.guild.name} / {name}"
+            else:
+                chat_type = "channel"
+                name = getattr(channel, "name", str(chat_id))
+            
+            return {
+                "name": name,
+                "type": chat_type,
+                "guild_id": str(channel.guild.id) if hasattr(channel, "guild") and channel.guild else None,
+                "guild_name": channel.guild.name if hasattr(channel, "guild") and channel.guild else None,
+            }
+        except Exception as e:
+            return {"name": str(chat_id), "type": "dm", "error": str(e)}
+    
+    async def _resolve_allowed_usernames(self) -> None:
+        """
+        Resolve non-numeric entries in DISCORD_ALLOWED_USERS to Discord user IDs.
+
+        Users can specify usernames (e.g. "teknium") or display names instead of
+        raw numeric IDs.  After resolution, the env var and internal set are updated
+        so authorization checks work with IDs only.
+        """
+        if not self._allowed_user_ids or not self._client:
+            return
+
+        numeric_ids = set()
+        to_resolve = set()
+
+        for entry in self._allowed_user_ids:
+            if entry.isdigit():
+                numeric_ids.add(entry)
+            else:
+                to_resolve.add(entry.lower())
+
+        if not to_resolve:
+            return
+
+        print(f"[{self.name}] Resolving {len(to_resolve)} username(s): {', '.join(to_resolve)}")
+        resolved_count = 0
+
+        for guild in self._client.guilds:
+            # Fetch full member list (requires members intent)
+            try:
+                members = guild.members
+                if len(members) < (guild.member_count or 0):
+                    members = [m async for m in guild.fetch_members(limit=None)]
+            except Exception as e:
+                logger.warning("Failed to fetch members for guild %s: %s", guild.name, e)
+                continue
+
+            for member in members:
+                name_lower = member.name.lower()
+                display_lower = member.display_name.lower()
+                global_lower = (member.global_name or "").lower()
+
+                matched = name_lower in to_resolve or display_lower in to_resolve or global_lower in to_resolve
+                if matched:
+                    uid = str(member.id)
+                    numeric_ids.add(uid)
+                    resolved_count += 1
+                    matched_name = name_lower if name_lower in to_resolve else (
+                        display_lower if display_lower in to_resolve else global_lower
+                    )
+                    to_resolve.discard(matched_name)
+                    print(f"[{self.name}] Resolved '{matched_name}' -> {uid} ({member.name}#{member.discriminator})")
+
+            if not to_resolve:
+                break
+
+        if to_resolve:
+            print(f"[{self.name}] Could not resolve usernames: {', '.join(to_resolve)}")
+
+        # Update internal set and env var so gateway auth checks use IDs
+        self._allowed_user_ids = numeric_ids
+        os.environ["DISCORD_ALLOWED_USERS"] = ",".join(sorted(numeric_ids))
+        if resolved_count:
+            print(f"[{self.name}] Updated DISCORD_ALLOWED_USERS with {resolved_count} resolved ID(s)")
+
+    def format_message(self, content: str) -> str:
+        """
+        Format message for Discord.
+        
+        Discord uses its own markdown variant.
+        """
+        # Discord markdown is fairly standard, no special escaping needed
+        return content
+    
+    def _register_slash_commands(self) -> None:
+        """Register Discord slash commands on the command tree."""
+        if not self._client:
+            return
+
+        tree = self._client.tree
+
+        @tree.command(name="ask", description="Ask Hermes a question")
+        @discord.app_commands.describe(question="Your question for Hermes")
+        async def slash_ask(interaction: discord.Interaction, question: str):
+            await interaction.response.defer()
+            event = self._build_slash_event(interaction, question)
+            await self.handle_message(event)
+            # The response is sent via the normal send() flow
+            # Send a followup to close the interaction if needed
+            try:
+                await interaction.followup.send("Processing complete~", ephemeral=True)
+            except Exception as e:
+                logger.debug("Discord followup failed: %s", e)
+
+        @tree.command(name="new", description="Start a new conversation")
+        async def slash_new(interaction: discord.Interaction):
+            await interaction.response.defer(ephemeral=True)
+            event = self._build_slash_event(interaction, "/reset")
+            await self.handle_message(event)
+            try:
+                await interaction.followup.send("New conversation started~", ephemeral=True)
+            except Exception as e:
+                logger.debug("Discord followup failed: %s", e)
+
+        @tree.command(name="reset", description="Reset your Hermes session")
+        async def slash_reset(interaction: discord.Interaction):
+            await interaction.response.defer(ephemeral=True)
+            event = self._build_slash_event(interaction, "/reset")
+            await self.handle_message(event)
+            try:
+                await interaction.followup.send("Session reset~", ephemeral=True)
+            except Exception as e:
+                logger.debug("Discord followup failed: %s", e)
+
+        @tree.command(name="model", description="Show or change the model")
+        @discord.app_commands.describe(name="Model name (e.g. anthropic/claude-sonnet-4). Leave empty to see current.")
+        async def slash_model(interaction: discord.Interaction, name: str = ""):
+            await interaction.response.defer(ephemeral=True)
+            event = self._build_slash_event(interaction, f"/model {name}".strip())
+            await self.handle_message(event)
+            try:
+                await interaction.followup.send("Done~", ephemeral=True)
+            except Exception as e:
+                logger.debug("Discord followup failed: %s", e)
+
+        @tree.command(name="personality", description="Set a personality")
+        @discord.app_commands.describe(name="Personality name. Leave empty to list available.")
+        async def slash_personality(interaction: discord.Interaction, name: str = ""):
+            await interaction.response.defer(ephemeral=True)
+            event = self._build_slash_event(interaction, f"/personality {name}".strip())
+            await self.handle_message(event)
+            try:
+                await interaction.followup.send("Done~", ephemeral=True)
+            except Exception as e:
+                logger.debug("Discord followup failed: %s", e)
+
+        @tree.command(name="retry", description="Retry your last message")
+        async def slash_retry(interaction: discord.Interaction):
+            await interaction.response.defer(ephemeral=True)
+            event = self._build_slash_event(interaction, "/retry")
+            await self.handle_message(event)
+            try:
+                await interaction.followup.send("Retrying~", ephemeral=True)
+            except Exception as e:
+                logger.debug("Discord followup failed: %s", e)
+
+        @tree.command(name="undo", description="Remove the last exchange")
+        async def slash_undo(interaction: discord.Interaction):
+            await interaction.response.defer(ephemeral=True)
+            event = self._build_slash_event(interaction, "/undo")
+            await self.handle_message(event)
+            try:
+                await interaction.followup.send("Done~", ephemeral=True)
+            except Exception as e:
+                logger.debug("Discord followup failed: %s", e)
+
+        @tree.command(name="status", description="Show Hermes session status")
+        async def slash_status(interaction: discord.Interaction):
+            await interaction.response.defer(ephemeral=True)
+            event = self._build_slash_event(interaction, "/status")
+            await self.handle_message(event)
+            try:
+                await interaction.followup.send("Status sent~", ephemeral=True)
+            except Exception as e:
+                logger.debug("Discord followup failed: %s", e)
+
+        @tree.command(name="sethome", description="Set this chat as the home channel")
+        async def slash_sethome(interaction: discord.Interaction):
+            await interaction.response.defer(ephemeral=True)
+            event = self._build_slash_event(interaction, "/sethome")
+            await self.handle_message(event)
+            try:
+                await interaction.followup.send("Done~", ephemeral=True)
+            except Exception as e:
+                logger.debug("Discord followup failed: %s", e)
+
+        @tree.command(name="stop", description="Stop the running Hermes agent")
+        async def slash_stop(interaction: discord.Interaction):
+            await interaction.response.defer(ephemeral=True)
+            event = self._build_slash_event(interaction, "/stop")
+            await self.handle_message(event)
+            try:
+                await interaction.followup.send("Stop requested~", ephemeral=True)
+            except Exception as e:
+                logger.debug("Discord followup failed: %s", e)
+
+        @tree.command(name="compress", description="Compress conversation context")
+        async def slash_compress(interaction: discord.Interaction):
+            await interaction.response.defer(ephemeral=True)
+            event = self._build_slash_event(interaction, "/compress")
+            await self.handle_message(event)
+            try:
+                await interaction.followup.send("Done~", ephemeral=True)
+            except Exception as e:
+                logger.debug("Discord followup failed: %s", e)
+
+        @tree.command(name="title", description="Set or show the session title")
+        @discord.app_commands.describe(name="Session title. Leave empty to show current.")
+        async def slash_title(interaction: discord.Interaction, name: str = ""):
+            await interaction.response.defer(ephemeral=True)
+            event = self._build_slash_event(interaction, f"/title {name}".strip())
+            await self.handle_message(event)
+            try:
+                await interaction.followup.send("Done~", ephemeral=True)
+            except Exception as e:
+                logger.debug("Discord followup failed: %s", e)
+
+        @tree.command(name="resume", description="Resume a previously-named session")
+        @discord.app_commands.describe(name="Session name to resume. Leave empty to list sessions.")
+        async def slash_resume(interaction: discord.Interaction, name: str = ""):
+            await interaction.response.defer(ephemeral=True)
+            event = self._build_slash_event(interaction, f"/resume {name}".strip())
+            await self.handle_message(event)
+            try:
+                await interaction.followup.send("Done~", ephemeral=True)
+            except Exception as e:
+                logger.debug("Discord followup failed: %s", e)
+
+        @tree.command(name="usage", description="Show token usage for this session")
+        async def slash_usage(interaction: discord.Interaction):
+            await interaction.response.defer(ephemeral=True)
+            event = self._build_slash_event(interaction, "/usage")
+            await self.handle_message(event)
+            try:
+                await interaction.followup.send("Done~", ephemeral=True)
+            except Exception as e:
+                logger.debug("Discord followup failed: %s", e)
+
+        @tree.command(name="provider", description="Show available providers")
+        async def slash_provider(interaction: discord.Interaction):
+            await interaction.response.defer(ephemeral=True)
+            event = self._build_slash_event(interaction, "/provider")
+            await self.handle_message(event)
+            try:
+                await interaction.followup.send("Done~", ephemeral=True)
+            except Exception as e:
+                logger.debug("Discord followup failed: %s", e)
+
+        @tree.command(name="help", description="Show available commands")
+        async def slash_help(interaction: discord.Interaction):
+            await interaction.response.defer(ephemeral=True)
+            event = self._build_slash_event(interaction, "/help")
+            await self.handle_message(event)
+            try:
+                await interaction.followup.send("Done~", ephemeral=True)
+            except Exception as e:
+                logger.debug("Discord followup failed: %s", e)
+
+        @tree.command(name="insights", description="Show usage insights and analytics")
+        @discord.app_commands.describe(days="Number of days to analyze (default: 7)")
+        async def slash_insights(interaction: discord.Interaction, days: int = 7):
+            await interaction.response.defer(ephemeral=True)
+            event = self._build_slash_event(interaction, f"/insights {days}")
+            await self.handle_message(event)
+            try:
+                await interaction.followup.send("Done~", ephemeral=True)
+            except Exception as e:
+                logger.debug("Discord followup failed: %s", e)
+
+        @tree.command(name="reload-mcp", description="Reload MCP servers from config")
+        async def slash_reload_mcp(interaction: discord.Interaction):
+            await interaction.response.defer(ephemeral=True)
+            event = self._build_slash_event(interaction, "/reload-mcp")
+            await self.handle_message(event)
+            try:
+                await interaction.followup.send("Done~", ephemeral=True)
+            except Exception as e:
+                logger.debug("Discord followup failed: %s", e)
+
+        @tree.command(name="update", description="Update Hermes Agent to the latest version")
+        async def slash_update(interaction: discord.Interaction):
+            await interaction.response.defer(ephemeral=True)
+            event = self._build_slash_event(interaction, "/update")
+            await self.handle_message(event)
+            try:
+                await interaction.followup.send("Update initiated~", ephemeral=True)
+            except Exception as e:
+                logger.debug("Discord followup failed: %s", e)
+
+    def _build_slash_event(self, interaction: discord.Interaction, text: str) -> MessageEvent:
+        """Build a MessageEvent from a Discord slash command interaction."""
+        is_dm = isinstance(interaction.channel, discord.DMChannel)
+        chat_type = "dm" if is_dm else "group"
+        chat_name = ""
+        if not is_dm and hasattr(interaction.channel, "name"):
+            chat_name = interaction.channel.name
+            if hasattr(interaction.channel, "guild") and interaction.channel.guild:
+                chat_name = f"{interaction.channel.guild.name} / #{chat_name}"
+        
+        # Get channel topic (if available)
+        chat_topic = getattr(interaction.channel, "topic", None)
+
+        source = self.build_source(
+            chat_id=str(interaction.channel_id),
+            chat_name=chat_name,
+            chat_type=chat_type,
+            user_id=str(interaction.user.id),
+            user_name=interaction.user.display_name,
+            chat_topic=chat_topic,
+        )
+
+        msg_type = MessageType.COMMAND if text.startswith("/") else MessageType.TEXT
+        return MessageEvent(
+            text=text,
+            message_type=msg_type,
+            source=source,
+            raw_message=interaction,
+        )
+
+    async def send_exec_approval(
+        self, chat_id: str, command: str, approval_id: str
+    ) -> SendResult:
+        """
+        Send a button-based exec approval prompt for a dangerous command.
+
+        Returns SendResult. The approval is resolved when a user clicks a button.
+        """
+        if not self._client or not DISCORD_AVAILABLE:
+            return SendResult(success=False, error="Not connected")
+
+        try:
+            channel = self._client.get_channel(int(chat_id))
+            if not channel:
+                channel = await self._client.fetch_channel(int(chat_id))
+
+            embed = discord.Embed(
+                title="Command Approval Required",
+                description=f"```\n{command[:500]}\n```",
+                color=discord.Color.orange(),
+            )
+            embed.set_footer(text=f"Approval ID: {approval_id}")
+
+            view = ExecApprovalView(
+                approval_id=approval_id,
+                allowed_user_ids=self._allowed_user_ids,
+            )
+
+            msg = await channel.send(embed=embed, view=view)
+            return SendResult(success=True, message_id=str(msg.id))
+
+        except Exception as e:
+            return SendResult(success=False, error=str(e))
+
+    async def _handle_message(self, message: DiscordMessage) -> None:
+        """Handle incoming Discord messages."""
+        # In server channels (not DMs), require the bot to be @mentioned
+        # UNLESS the channel is in the free-response list.
+        #
+        # Config:
+        #   DISCORD_FREE_RESPONSE_CHANNELS: Comma-separated channel IDs where the
+        #       bot responds to every message without needing a mention.
+        #   DISCORD_REQUIRE_MENTION: Set to "false" to disable mention requirement
+        #       globally (all channels become free-response). Default: "true".
+        
+        if not isinstance(message.channel, discord.DMChannel):
+            # Check if this channel is in the free-response list
+            free_channels_raw = os.getenv("DISCORD_FREE_RESPONSE_CHANNELS", "")
+            free_channels = {ch.strip() for ch in free_channels_raw.split(",") if ch.strip()}
+            channel_id = str(message.channel.id)
+            
+            # Global override: if DISCORD_REQUIRE_MENTION=false, all channels are free
+            require_mention = os.getenv("DISCORD_REQUIRE_MENTION", "true").lower() not in ("false", "0", "no")
+            
+            is_free_channel = channel_id in free_channels
+            
+            if require_mention and not is_free_channel:
+                # Must be @mentioned to respond
+                if self._client.user not in message.mentions:
+                    return  # Silently ignore messages that don't mention the bot
+            
+            # Strip the bot mention from the message text so the agent sees clean input
+            if self._client.user and self._client.user in message.mentions:
+                message.content = message.content.replace(f"<@{self._client.user.id}>", "").strip()
+                message.content = message.content.replace(f"<@!{self._client.user.id}>", "").strip()
+        
+        # Determine message type
+        msg_type = MessageType.TEXT
+        if message.content.startswith("/"):
+            msg_type = MessageType.COMMAND
+        elif message.attachments:
+            # Check attachment types
+            for att in message.attachments:
+                if att.content_type:
+                    if att.content_type.startswith("image/"):
+                        msg_type = MessageType.PHOTO
+                    elif att.content_type.startswith("video/"):
+                        msg_type = MessageType.VIDEO
+                    elif att.content_type.startswith("audio/"):
+                        msg_type = MessageType.AUDIO
+                    else:
+                        msg_type = MessageType.DOCUMENT
+                    break
+        
+        # Determine chat type
+        if isinstance(message.channel, discord.DMChannel):
+            chat_type = "dm"
+            chat_name = message.author.name
+        elif isinstance(message.channel, discord.Thread):
+            chat_type = "thread"
+            chat_name = message.channel.name
+        else:
+            chat_type = "group"  # Treat server channels as groups
+            chat_name = getattr(message.channel, "name", str(message.channel.id))
+            if hasattr(message.channel, "guild") and message.channel.guild:
+                chat_name = f"{message.channel.guild.name} / #{chat_name}"
+        
+        # Get thread ID if in a thread
+        thread_id = None
+        if isinstance(message.channel, discord.Thread):
+            thread_id = str(message.channel.id)
+        
+        # Get channel topic (if available - TextChannels have topics, DMs/threads don't)
+        chat_topic = getattr(message.channel, "topic", None)
+        
+        # Build source
+        source = self.build_source(
+            chat_id=str(message.channel.id),
+            chat_name=chat_name,
+            chat_type=chat_type,
+            user_id=str(message.author.id),
+            user_name=message.author.display_name,
+            thread_id=thread_id,
+            chat_topic=chat_topic,
+        )
+        
+        # Build media URLs -- download image attachments to local cache so the
+        # vision tool can access them reliably (Discord CDN URLs can expire).
+        media_urls = []
+        media_types = []
+        for att in message.attachments:
+            content_type = att.content_type or "unknown"
+            if content_type.startswith("image/"):
+                try:
+                    # Determine extension from content type (image/png -> .png)
+                    ext = "." + content_type.split("/")[-1].split(";")[0]
+                    if ext not in (".jpg", ".jpeg", ".png", ".gif", ".webp"):
+                        ext = ".jpg"
+                    cached_path = await cache_image_from_url(att.url, ext=ext)
+                    media_urls.append(cached_path)
+                    media_types.append(content_type)
+                    print(f"[Discord] Cached user image: {cached_path}", flush=True)
+                except Exception as e:
+                    print(f"[Discord] Failed to cache image attachment: {e}", flush=True)
+                    # Fall back to the CDN URL if caching fails
+                    media_urls.append(att.url)
+                    media_types.append(content_type)
+            elif content_type.startswith("audio/"):
+                try:
+                    ext = "." + content_type.split("/")[-1].split(";")[0]
+                    if ext not in (".ogg", ".mp3", ".wav", ".webm", ".m4a"):
+                        ext = ".ogg"
+                    cached_path = await cache_audio_from_url(att.url, ext=ext)
+                    media_urls.append(cached_path)
+                    media_types.append(content_type)
+                    print(f"[Discord] Cached user audio: {cached_path}", flush=True)
+                except Exception as e:
+                    print(f"[Discord] Failed to cache audio attachment: {e}", flush=True)
+                    media_urls.append(att.url)
+                    media_types.append(content_type)
+            else:
+                # Other attachments: keep the original URL
+                media_urls.append(att.url)
+                media_types.append(content_type)
+        
+        event = MessageEvent(
+            text=message.content,
+            message_type=msg_type,
+            source=source,
+            raw_message=message,
+            message_id=str(message.id),
+            media_urls=media_urls,
+            media_types=media_types,
+            reply_to_message_id=str(message.reference.message_id) if message.reference else None,
+            timestamp=message.created_at,
+        )
+        
+        await self.handle_message(event)
+
+
+# ---------------------------------------------------------------------------
+# Discord UI Components (outside the adapter class)
+# ---------------------------------------------------------------------------
+
+if DISCORD_AVAILABLE:
+
+    class ExecApprovalView(discord.ui.View):
+        """
+        Interactive button view for exec approval of dangerous commands.
+
+        Shows three buttons: Allow Once (green), Always Allow (blue), Deny (red).
+        Only users in the allowed list can click. The view times out after 5 minutes.
+        """
+
+        def __init__(self, approval_id: str, allowed_user_ids: set):
+            super().__init__(timeout=300)  # 5-minute timeout
+            self.approval_id = approval_id
+            self.allowed_user_ids = allowed_user_ids
+            self.resolved = False
+
+        def _check_auth(self, interaction: discord.Interaction) -> bool:
+            """Verify the user clicking is authorized."""
+            if not self.allowed_user_ids:
+                return True  # No allowlist = anyone can approve
+            return str(interaction.user.id) in self.allowed_user_ids
+
+        async def _resolve(
+            self, interaction: discord.Interaction, action: str, color: discord.Color
+        ):
+            """Resolve the approval and update the message."""
+            if self.resolved:
+                await interaction.response.send_message(
+                    "This approval has already been resolved~", ephemeral=True
+                )
+                return
+
+            if not self._check_auth(interaction):
+                await interaction.response.send_message(
+                    "You're not authorized to approve commands~", ephemeral=True
+                )
+                return
+
+            self.resolved = True
+
+            # Update the embed with the decision
+            embed = interaction.message.embeds[0] if interaction.message.embeds else None
+            if embed:
+                embed.color = color
+                embed.set_footer(text=f"{action} by {interaction.user.display_name}")
+
+            # Disable all buttons
+            for child in self.children:
+                child.disabled = True
+
+            await interaction.response.edit_message(embed=embed, view=self)
+
+            # Store the approval decision (only "allow_always" is persisted here)
+            try:
+                from tools.approval import approve_permanent
+                if action == "allow_once":
+                    pass  # One-time approval handled by gateway
+                elif action == "allow_always":
+                    approve_permanent(self.approval_id)
+            except ImportError:
+                pass
+
+        @discord.ui.button(label="Allow Once", style=discord.ButtonStyle.green)
+        async def allow_once(
+            self, interaction: discord.Interaction, button: discord.ui.Button
+        ):
+            await self._resolve(interaction, "allow_once", discord.Color.green())
+
+        @discord.ui.button(label="Always Allow", style=discord.ButtonStyle.blurple)
+        async def allow_always(
+            self, interaction: discord.Interaction, button: discord.ui.Button
+        ):
+            await self._resolve(interaction, "allow_always", discord.Color.blue())
+
+        @discord.ui.button(label="Deny", style=discord.ButtonStyle.red)
+        async def deny(
+            self, interaction: discord.Interaction, button: discord.ui.Button
+        ):
+            await self._resolve(interaction, "deny", discord.Color.red())
+
+        async def on_timeout(self):
+            """Handle view timeout -- disable buttons and mark as expired."""
+            self.resolved = True
+            for child in self.children:
+                child.disabled = True
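
Review note: the mention gating in `_handle_message` above can be read as a small pure function. The sketch below is an illustration only, not code from this commit: `should_respond` and its `env` parameter are hypothetical names, and the environment is passed in as a dict so the rule can be checked without a Discord client.

```python
def should_respond(channel_id: str, is_dm: bool, bot_mentioned: bool, env: dict) -> bool:
    """Hypothetical helper mirroring the gating rule in _handle_message."""
    # DMs are never gated on a mention.
    if is_dm:
        return True

    # Comma-separated channel IDs where the bot answers every message.
    free_raw = env.get("DISCORD_FREE_RESPONSE_CHANNELS", "")
    free_channels = {ch.strip() for ch in free_raw.split(",") if ch.strip()}

    # Global override: "false", "0" or "no" disables the mention requirement.
    require_mention = env.get("DISCORD_REQUIRE_MENTION", "true").lower() not in (
        "false", "0", "no"
    )

    if require_mention and channel_id not in free_channels:
        return bot_mentioned
    return True
```

Under these assumptions, a server message without a mention is ignored unless its channel is listed in `DISCORD_FREE_RESPONSE_CHANNELS` or `DISCORD_REQUIRE_MENTION` is turned off.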
--- a/hermes_agent/gateway/platforms/homeassistant.py
+++ b/hermes_agent/gateway/platforms/homeassistant.py
@@ -19,7 +19,7 @@ import os
 import time
 import uuid
 from datetime import datetime
-from typing import Any, Dict, Optional, Set
+from typing import Any, Dict, List, Optional, Set

 try:
    import aiohttp
@@ -28,8 +28,8 @@ except ImportError:
    AIOHTTP_AVAILABLE = False
    aiohttp = None  # type: ignore[assignment]

-from hermes_agent.gateway.config import Platform, PlatformConfig
-from hermes_agent.gateway.platforms.base import (
+from gateway.config import Platform, PlatformConfig
+from gateway.platforms.base import (
    BasePlatformAdapter,
    MessageEvent,
    MessageType,
@@ -83,7 +83,6 @@ class HomeAssistantAdapter(BasePlatformAdapter):
        self._watch_domains: Set[str] = set(extra.get("watch_domains", []))
        self._watch_entities: Set[str] = set(extra.get("watch_entities", []))
        self._ignore_entities: Set[str] = set(extra.get("ignore_entities", []))
-        self._watch_all: bool = bool(extra.get("watch_all", False))
        self._cooldown_seconds: int = int(extra.get("cooldown_seconds", 30))

        # Cooldown tracking: entity_id -> last_event_timestamp
@@ -114,18 +113,7 @@ class HomeAssistantAdapter(BasePlatformAdapter):
                return False

            # Dedicated REST session for send() calls
-            self._rest_session = aiohttp.ClientSession(
-                timeout=aiohttp.ClientTimeout(total=30)
-            )
-
-            # Warn if no event filters are configured
-            if not self._watch_domains and not self._watch_entities and not self._watch_all:
-                logger.warning(
-                    "[%s] No watch_domains, watch_entities, or watch_all configured. "
-                    "All state_changed events will be dropped. Configure filters in "
-                    "your HA platform config to receive events.",
-                    self.name,
-                )
+            self._rest_session = aiohttp.ClientSession()

            # Start background listener
            self._listen_task = asyncio.create_task(self._listen_loop())
@@ -142,10 +130,8 @@ class HomeAssistantAdapter(BasePlatformAdapter):
        ws_url = self._hass_url.replace("http://", "ws://").replace("https://", "wss://")
        ws_url = f"{ws_url}/api/websocket"

-        self._session = aiohttp.ClientSession(
-            timeout=aiohttp.ClientTimeout(total=30)
-        )
-        self._ws = await self._session.ws_connect(ws_url, heartbeat=30, timeout=30)
+        self._session = aiohttp.ClientSession()
+        self._ws = await self._session.ws_connect(ws_url, heartbeat=30)

        # Step 1: Receive auth_required
        msg = await self._ws.receive_json()
@@ -271,17 +257,13 @@ class HomeAssistantAdapter(BasePlatformAdapter):
        if entity_id in self._ignore_entities:
            return

-        # Apply domain/entity watch filters (closed by default — require
-        # explicit watch_domains, watch_entities, or watch_all to forward)
+        # Apply domain/entity watch filters
        domain = entity_id.split(".")[0] if "." in entity_id else ""
        if self._watch_domains or self._watch_entities:
            domain_match = domain in self._watch_domains if self._watch_domains else False
            entity_match = entity_id in self._watch_entities if self._watch_entities else False
            if not domain_match and not entity_match:
                return
-        elif not self._watch_all:
-            # No filters configured and watch_all is off — drop the event
-            return

        # Apply cooldown
        now = time.time()
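
Review note: with the `watch_all` branch removed in the hunk above, the filter appears to become open by default, so an adapter with no `watch_domains` or `watch_entities` forwards every `state_changed` event that is not ignored. The sketch below restates that post-change logic as a standalone function. `passes_watch_filters` is a hypothetical name used only for illustration, and the cooldown step is left out.

```python
from typing import Set


def passes_watch_filters(
    entity_id: str,
    watch_domains: Set[str],
    watch_entities: Set[str],
    ignore_entities: Set[str],
) -> bool:
    """Hypothetical restatement of the filter as it reads after this diff."""
    # Explicitly ignored entities are always dropped.
    if entity_id in ignore_entities:
        return False

    # "light.kitchen" -> "light"; IDs without a dot have no domain.
    domain = entity_id.split(".")[0] if "." in entity_id else ""

    # If any filter is configured, the event must match at least one of them.
    if watch_domains or watch_entities:
        return domain in watch_domains or entity_id in watch_entities

    # No filters configured: the event is forwarded.
    return True
```

If that reading is right, deployments that relied on the previous closed-by-default behavior would need to set `watch_domains` or `watch_entities` to avoid receiving every event.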
@@ -437,8 +419,9 @@ class HomeAssistantAdapter(BasePlatformAdapter):
        except Exception as e:
            return SendResult(success=False, error=str(e))

-    async def send_typing(self, chat_id: str, metadata=None) -> None:
+    async def send_typing(self, chat_id: str) -> None:
        """No typing indicator for Home Assistant."""
+        pass

    async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
        """Return basic info about the HA event channel."""
--- a/hermes_agent/gateway/platforms/signal.py
+++ b/hermes_agent/gateway/platforms/signal.py
@@ -17,17 +17,17 @@ import json
 import logging
 import os
 import random
+import re
 import time
-import uuid
 from datetime import datetime, timezone
 from pathlib import Path
 from typing import Dict, List, Optional, Any
-from urllib.parse import quote, unquote
+from urllib.parse import unquote

 import httpx

-from hermes_agent.gateway.config import Platform, PlatformConfig
-from hermes_agent.gateway.platforms.base import (
+from gateway.config import Platform, PlatformConfig
+from gateway.platforms.base import (
    BasePlatformAdapter,
    MessageEvent,
    MessageType,
@@ -37,7 +37,6 @@ from hermes_agent.gateway.platforms.base import (
    cache_document_from_bytes,
    cache_image_from_url,
 )
-from hermes_agent.gateway.platforms.helpers import redact_phone

 logger = logging.getLogger(__name__)

@@ -52,10 +51,22 @@ SSE_RETRY_DELAY_MAX = 60.0
 HEALTH_CHECK_INTERVAL = 30.0  # seconds between health checks
 HEALTH_CHECK_STALE_THRESHOLD = 120.0  # seconds without SSE activity before concern

+# E.164 phone number pattern for redaction
+_PHONE_RE = re.compile(r"\+[1-9]\d{6,14}")
+
+
 # ---------------------------------------------------------------------------
 # Helpers
 # ---------------------------------------------------------------------------

+def _redact_phone(phone: str) -> str:
+    """Redact a phone number for logging: +15551234567 -> +155****4567."""
+    if not phone:
+        return "<none>"
+    if len(phone) <= 8:
+        return (phone[:2] + "****" + phone[-2:]) if len(phone) > 4 else "****"
+    return phone[:4] + "****" + phone[-4:]
+

 def _parse_comma_list(value: str) -> List[str]:
    """Split a comma-separated string into a list, stripping whitespace."""
@@ -93,20 +104,6 @@ def _is_audio_ext(ext: str) -> bool:
    return ext.lower() in (".mp3", ".wav", ".ogg", ".m4a", ".aac")


-_EXT_TO_MIME = {
-    ".jpg": "image/jpeg", ".jpeg": "image/jpeg", ".png": "image/png",
-    ".gif": "image/gif", ".webp": "image/webp",
-    ".ogg": "audio/ogg", ".mp3": "audio/mpeg", ".wav": "audio/wav",
-    ".m4a": "audio/mp4", ".aac": "audio/aac",
-    ".mp4": "video/mp4", ".pdf": "application/pdf", ".zip": "application/zip",
-}
-
-
-def _ext_to_mime(ext: str) -> str:
-    """Map file extension to MIME type."""
-    return _EXT_TO_MIME.get(ext.lower(), "application/octet-stream")
-
-
 def _render_mentions(text: str, mentions: list) -> str:
    """Replace Signal mention placeholders (\\uFFFC) with readable @identifiers.

@@ -128,27 +125,6 @@ def _render_mentions(text: str, mentions: list) -> str:
    return text


-def _is_signal_service_id(value: str) -> bool:
-    """Return True if *value* already looks like a Signal service identifier."""
-    if not value:
-        return False
-    if value.startswith("PNI:") or value.startswith("u:"):
-        return True
-    try:
-        uuid.UUID(value)
-        return True
-    except (ValueError, AttributeError, TypeError):
-        return False
-
-
-def _looks_like_e164_number(value: str) -> bool:
-    """Return True for a plausible E.164 phone number."""
-    if not value or not value.startswith("+"):
-        return False
-    digits = value[1:]
-    return digits.isdigit() and 7 <= len(digits) <= 15
-
-
 def check_signal_requirements() -> bool:
    """Check if Signal is configured (has URL and account)."""
    return bool(os.getenv("SIGNAL_HTTP_URL") and os.getenv("SIGNAL_ACCOUNT"))
@@ -182,14 +158,6 @@ class SignalAdapter(BasePlatformAdapter):
        self._sse_task: Optional[asyncio.Task] = None
        self._health_monitor_task: Optional[asyncio.Task] = None
        self._typing_tasks: Dict[str, asyncio.Task] = {}
-        # Per-chat typing-indicator backoff. When signal-cli reports
-        # NETWORK_FAILURE (recipient offline / unroutable), base.py's
-        # _keep_typing refresh loop would otherwise hammer sendTyping every
-        # ~2s indefinitely, producing WARNING-level log spam and pointless
-        # RPC traffic. We track consecutive failures per chat and skip the
-        # RPC during a cooldown window instead.
-        self._typing_failures: Dict[str, int] = {}
-        self._typing_skip_until: Dict[str, float] = {}
        self._running = False
        self._last_sse_activity = 0.0
        self._sse_response: Optional[httpx.Response] = None
@@ -197,19 +165,8 @@ class SignalAdapter(BasePlatformAdapter):
        # Normalize account for self-message filtering
        self._account_normalized = self.account.strip()

-        # Track recently sent message timestamps to prevent echo-back loops
-        # in Note to Self / self-chat mode (mirrors WhatsApp recentlySentIds)
-        self._recent_sent_timestamps: set = set()
-        self._max_recent_timestamps = 50
-        # Signal increasingly exposes ACI/PNI UUIDs as stable recipient IDs.
-        # Keep a best-effort mapping so outbound sends can upgrade from a
-        # phone number to the corresponding UUID when signal-cli prefers it.
-        self._recipient_uuid_by_number: Dict[str, str] = {}
-        self._recipient_number_by_uuid: Dict[str, str] = {}
-        self._recipient_cache_lock = asyncio.Lock()
-
        logger.info("Signal adapter initialized: url=%s account=%s groups=%s",
-                     self.http_url, redact_phone(self.account),
+                     self.http_url, _redact_phone(self.account),
                     "enabled" if self.group_allow_from else "disabled")

    # ------------------------------------------------------------------
@@ -222,41 +179,25 @@ class SignalAdapter(BasePlatformAdapter):
            logger.error("Signal: SIGNAL_HTTP_URL and SIGNAL_ACCOUNT are required")
            return False

-        # Acquire scoped lock to prevent duplicate Signal listeners for the same phone
-        lock_acquired = False
-        try:
-            if not self._acquire_platform_lock('signal-phone', self.account, 'Signal account'):
-                return False
-            lock_acquired = True
-        except Exception as e:
-            logger.warning("Signal: Could not acquire phone lock (non-fatal): %s", e)
-
        self.client = httpx.AsyncClient(timeout=30.0)
+
+        # Health check — verify signal-cli daemon is reachable
        try:
-            # Health check — verify signal-cli daemon is reachable
-            try:
-                resp = await self.client.get(f"{self.http_url}/api/v1/check", timeout=10.0)
-                if resp.status_code != 200:
-                    logger.error("Signal: health check failed (status %d)", resp.status_code)
-                    return False
-            except Exception as e:
-                logger.error("Signal: cannot reach signal-cli at %s: %s", self.http_url, e)
+            resp = await self.client.get(f"{self.http_url}/api/v1/check", timeout=10.0)
+            if resp.status_code != 200:
+                logger.error("Signal: health check failed (status %d)", resp.status_code)
                return False
+        except Exception as e:
+            logger.error("Signal: cannot reach signal-cli at %s: %s", self.http_url, e)
+            return False

-            self._running = True
-            self._last_sse_activity = time.time()
-            self._sse_task = asyncio.create_task(self._sse_listener())
-            self._health_monitor_task = asyncio.create_task(self._health_monitor())
+        self._running = True
+        self._last_sse_activity = time.time()
+        self._sse_task = asyncio.create_task(self._sse_listener())
+        self._health_monitor_task = asyncio.create_task(self._health_monitor())

-            logger.info("Signal: connected to %s", self.http_url)
-            return True
-        finally:
-            if not self._running:
-                if self.client:
-                    await self.client.aclose()
-                    self.client = None
-                if lock_acquired:
-                    self._release_platform_lock()
+        logger.info("Signal: connected to %s", self.http_url)
+        return True

    async def disconnect(self) -> None:
        """Stop SSE listener and clean up."""
@@ -285,8 +226,6 @@ class SignalAdapter(BasePlatformAdapter):
            await self.client.aclose()
            self.client = None

-        self._release_platform_lock()
-
        logger.info("Signal: disconnected")

    # ------------------------------------------------------------------
@@ -295,7 +234,7 @@ class SignalAdapter(BasePlatformAdapter):

    async def _sse_listener(self) -> None:
        """Listen for SSE events from signal-cli daemon."""
-        url = f"{self.http_url}/api/v1/events?account={quote(self.account, safe='')}"
+        url = f"{self.http_url}/api/v1/events?account={self.account}"
        backoff = SSE_RETRY_DELAY_INITIAL

        while self._running:
@@ -321,12 +260,6 @@ class SignalAdapter(BasePlatformAdapter):
                            line = line.strip()
                            if not line:
                                continue
-                            # SSE keepalive comments (":") prove the connection
-                            # is alive — update activity so the health monitor
-                            # doesn't report false idle warnings.
-                            if line.startswith(":"):
-                                self._last_sse_activity = time.time()
-                                continue
                            # Parse SSE data lines
                            if line.startswith("data:"):
                                data_str = line[5:].strip()
@@ -392,9 +325,7 @@ class SignalAdapter(BasePlatformAdapter):
        """Force SSE reconnection by closing the current response."""
        if self._sse_response and not self._sse_response.is_stream_consumed:
            try:
-                task = asyncio.create_task(self._sse_response.aclose())
-                self._background_tasks.add(task)
-                task.add_done_callback(self._background_tasks.discard)
+                asyncio.create_task(self._sse_response.aclose())
            except Exception:
                pass
            self._sse_response = None
@@ -408,26 +339,10 @@ class SignalAdapter(BasePlatformAdapter):
        # Unwrap nested envelope if present
        envelope_data = envelope.get("envelope", envelope)

-        # Handle syncMessage: extract "Note to Self" messages (sent to own account)
-        # while still filtering other sync events (read receipts, typing, etc.)
-        is_note_to_self = False
+        # Filter syncMessage envelopes (sent transcripts, read receipts, etc.)
+        # signal-cli may send syncMessage as null instead of omitting it, so test for the key, not its value
        if "syncMessage" in envelope_data:
-            sync_msg = envelope_data.get("syncMessage")
-            if sync_msg and isinstance(sync_msg, dict):
-                sent_msg = sync_msg.get("sentMessage")
-                if sent_msg and isinstance(sent_msg, dict):
-                    dest = sent_msg.get("destinationNumber") or sent_msg.get("destination")
-                    sent_ts = sent_msg.get("timestamp")
-                    if dest == self._account_normalized:
-                        # Check if this is an echo of our own outbound reply
-                        if sent_ts and sent_ts in self._recent_sent_timestamps:
-                            self._recent_sent_timestamps.discard(sent_ts)
-                            return
-                        # Genuine user Note to Self — promote to dataMessage
-                        is_note_to_self = True
-                        envelope_data = {**envelope_data, "dataMessage": sent_msg}
-            if not is_note_to_self:
-                return
+            return

        # Extract sender info
        sender = (
@@ -437,14 +352,13 @@ class SignalAdapter(BasePlatformAdapter):
        )
        sender_name = envelope_data.get("sourceName", "")
        sender_uuid = envelope_data.get("sourceUuid", "")
-        self._remember_recipient_identifiers(sender, sender_uuid)

        if not sender:
            logger.debug("Signal: ignoring envelope with no sender")
            return

-        # Self-message filtering — prevent reply loops (but allow Note to Self)
-        if self._account_normalized and sender == self._account_normalized and not is_note_to_self:
+        # Self-message filtering — prevent reply loops
+        if self._account_normalized and sender == self._account_normalized:
            return

        # Filter stories
@@ -490,8 +404,9 @@ class SignalAdapter(BasePlatformAdapter):

        # Process attachments
        attachments_data = data_message.get("attachments", [])
-        media_urls = []
-        media_types = []
+        image_paths = []
+        audio_path = None
+        document_paths = []

        if attachments_data and not getattr(self, "ignore_attachments", False):
            for att in attachments_data:
@@ -505,10 +420,12 @@ class SignalAdapter(BasePlatformAdapter):
                try:
                    cached_path, ext = await self._fetch_attachment(att_id)
                    if cached_path:
-                        # Use contentType from Signal if available, else map from extension
-                        content_type = att.get("contentType") or _ext_to_mime(ext)
-                        media_urls.append(cached_path)
-                        media_types.append(content_type)
+                        if _is_image_ext(ext):
+                            image_paths.append(cached_path)
+                        elif _is_audio_ext(ext):
+                            audio_path = cached_path
+                        else:
+                            document_paths.append(cached_path)
                except Exception:
                    logger.exception("Signal: failed to fetch attachment %s", att_id)

@@ -523,13 +440,12 @@ class SignalAdapter(BasePlatformAdapter):
            chat_id_alt=group_id if is_group else None,
        )

-        # Determine message type from media
+        # Determine message type
        msg_type = MessageType.TEXT
-        if media_types:
-            if any(mt.startswith("audio/") for mt in media_types):
-                msg_type = MessageType.VOICE
-            elif any(mt.startswith("image/") for mt in media_types):
-                msg_type = MessageType.PHOTO
+        if audio_path:
+            msg_type = MessageType.VOICE
+        elif image_paths:
+            msg_type = MessageType.IMAGE

        # Parse timestamp from envelope data (milliseconds since epoch)
        ts_ms = envelope_data.get("timestamp", 0)
@@ -546,74 +462,17 @@ class SignalAdapter(BasePlatformAdapter):
            source=source,
            text=text or "",
            message_type=msg_type,
-            media_urls=media_urls,
-            media_types=media_types,
+            image_paths=image_paths,
+            audio_path=audio_path,
+            document_paths=document_paths,
            timestamp=timestamp,
        )

        logger.debug("Signal: message from %s in %s: %s",
-                      redact_phone(sender), chat_id[:20], (text or "")[:50])
+                      _redact_phone(sender), chat_id[:20], (text or "")[:50])

        await self.handle_message(event)

-    def _remember_recipient_identifiers(self, number: Optional[str], service_id: Optional[str]) -> None:
-        """Cache any number↔UUID mapping observed from Signal envelopes."""
-        if not number or not service_id or not _is_signal_service_id(service_id):
-            return
-        self._recipient_uuid_by_number[number] = service_id
-        self._recipient_number_by_uuid[service_id] = number
-
-    def _extract_contact_uuid(self, contact: Any, phone_number: str) -> Optional[str]:
-        """Best-effort extraction of a Signal service ID from listContacts output."""
-        if not isinstance(contact, dict):
-            return None
-
-        number = contact.get("number")
-        recipient = contact.get("recipient")
-        service_id = contact.get("uuid") or contact.get("serviceId")
-        if not service_id:
-            profile = contact.get("profile")
-            if isinstance(profile, dict):
-                service_id = profile.get("serviceId") or profile.get("uuid")
-
-        if service_id and _is_signal_service_id(service_id):
-            matches_number = number == phone_number or recipient == phone_number
-            if matches_number:
-                return service_id
-        return None
-
-    async def _resolve_recipient(self, chat_id: str) -> str:
-        """Return the preferred Signal recipient identifier for a direct chat."""
-        if (
-            not chat_id
-            or chat_id.startswith("group:")
-            or _is_signal_service_id(chat_id)
-            or not _looks_like_e164_number(chat_id)
-        ):
-            return chat_id
-
-        cached = self._recipient_uuid_by_number.get(chat_id)
-        if cached:
-            return cached
-
-        async with self._recipient_cache_lock:
-            cached = self._recipient_uuid_by_number.get(chat_id)
-            if cached:
-                return cached
-
-            contacts = await self._rpc("listContacts", {
-                "account": self.account,
-                "allRecipients": True,
-            })
-            if isinstance(contacts, list):
-                for contact in contacts:
-                    number = contact.get("number") if isinstance(contact, dict) else None
-                    service_id = self._extract_contact_uuid(contact, chat_id)
-                    if number and service_id:
-                        self._remember_recipient_identifiers(number, service_id)
-
-            return self._recipient_uuid_by_number.get(chat_id, chat_id)
-
    # ------------------------------------------------------------------
    # Attachment Handling
    # ------------------------------------------------------------------
@@ -622,19 +481,12 @@ class SignalAdapter(BasePlatformAdapter):
        """Fetch an attachment via JSON-RPC and cache it. Returns (path, ext)."""
        result = await self._rpc("getAttachment", {
            "account": self.account,
-            "id": attachment_id,
+            "attachmentId": attachment_id,
        })

        if not result:
            return None, ""

-        # Handle dict response (signal-cli returns {"data": "base64..."})
-        if isinstance(result, dict):
-            result = result.get("data")
-            if not result:
-                logger.warning("Signal: attachment response missing 'data' key")
-                return None, ""
-
        # Result is base64-encoded file content
        raw_data = base64.b64decode(result)
        ext = _guess_extension(raw_data)
@@ -652,22 +504,8 @@ class SignalAdapter(BasePlatformAdapter):
    # JSON-RPC Communication
    # ------------------------------------------------------------------

-    async def _rpc(
-        self,
-        method: str,
-        params: dict,
-        rpc_id: str = None,
-        *,
-        log_failures: bool = True,
-    ) -> Any:
-        """Send a JSON-RPC 2.0 request to signal-cli daemon.
-
-        When ``log_failures=False``, error and exception paths log at DEBUG
-        instead of WARNING — used by the typing-indicator path to silence
-        repeated NETWORK_FAILURE spam for unreachable recipients while
-        still preserving visibility for the first occurrence and for
-        unrelated RPCs.
-        """
+    async def _rpc(self, method: str, params: dict, rpc_id: str = None) -> Any:
+        """Send a JSON-RPC 2.0 request to signal-cli daemon."""
        if not self.client:
            logger.warning("Signal: RPC called but client not connected")
            return None
@@ -692,19 +530,13 @@ class SignalAdapter(BasePlatformAdapter):
            data = resp.json()

            if "error" in data:
-                if log_failures:
-                    logger.warning("Signal RPC error (%s): %s", method, data["error"])
-                else:
-                    logger.debug("Signal RPC error (%s): %s", method, data["error"])
+                logger.warning("Signal RPC error (%s): %s", method, data["error"])
                return None

            return data.get("result")

        except Exception as e:
-            if log_failures:
-                logger.warning("Signal RPC %s failed: %s", method, e)
-            else:
-                logger.debug("Signal RPC %s failed: %s", method, e)
+            logger.warning("Signal RPC %s failed: %s", method, e)
            return None

    # ------------------------------------------------------------------
@@ -714,65 +546,31 @@ class SignalAdapter(BasePlatformAdapter):
    async def send(
        self,
        chat_id: str,
-        content: str,
-        reply_to: Optional[str] = None,
-        metadata: Optional[Dict[str, Any]] = None,
+        text: str,
+        reply_to_message_id: Optional[str] = None,
+        **kwargs,
    ) -> SendResult:
        """Send a text message."""
        await self._stop_typing_indicator(chat_id)

        params: Dict[str, Any] = {
            "account": self.account,
-            "message": content,
+            "message": text,
        }

        if chat_id.startswith("group:"):
            params["groupId"] = chat_id[6:]
        else:
-            params["recipient"] = [await self._resolve_recipient(chat_id)]
+            params["recipient"] = [chat_id]

        result = await self._rpc("send", params)

        if result is not None:
-            self._track_sent_timestamp(result)
-            # Use the timestamp from the RPC result as a pseudo message_id.
-            # Signal doesn't have real message IDs, but the stream consumer
-            # needs a truthy value to follow its edit→fallback path correctly.
-            _msg_id = str(result.get("timestamp", "")) if isinstance(result, dict) else None
-            return SendResult(success=True, message_id=_msg_id or None)
+            return SendResult(success=True)
        return SendResult(success=False, error="RPC send failed")

-    def _track_sent_timestamp(self, rpc_result) -> None:
-        """Record outbound message timestamp for echo-back filtering."""
-        ts = rpc_result.get("timestamp") if isinstance(rpc_result, dict) else None
-        if ts:
-            self._recent_sent_timestamps.add(ts)
-            if len(self._recent_sent_timestamps) > self._max_recent_timestamps:
-                self._recent_sent_timestamps.pop()
-
-    async def send_typing(self, chat_id: str, metadata=None) -> None:
-        """Send a typing indicator.
-
-        base.py's ``_keep_typing`` refresh loop calls this every ~2s while
-        the agent is processing. If signal-cli returns NETWORK_FAILURE for
-        this recipient (offline, unroutable, group membership lost, etc.)
-        the unmitigated behaviour is: a WARNING log every 2 seconds for as
-        long as the agent keeps running. Instead we:
-
-        - silence the WARNING after the first consecutive failure (subsequent
-          attempts log at DEBUG) so transport issues are still visible once
-          but don't flood the log,
-        - skip the RPC entirely during an exponential cooldown window once
-          three consecutive failures have happened, so we stop hammering
-          signal-cli with requests it can't deliver.
-
-        A successful sendTyping clears the counters.
-        """
-        now = time.monotonic()
-        skip_until = self._typing_skip_until.get(chat_id, 0.0)
-        if now < skip_until:
-            return
-
+    async def send_typing(self, chat_id: str) -> None:
+        """Send a typing indicator."""
        params: Dict[str, Any] = {
            "account": self.account,
        }
@@ -780,28 +578,9 @@ class SignalAdapter(BasePlatformAdapter):
        if chat_id.startswith("group:"):
            params["groupId"] = chat_id[6:]
        else:
-            params["recipient"] = [await self._resolve_recipient(chat_id)]
+            params["recipient"] = [chat_id]

-        fails = self._typing_failures.get(chat_id, 0)
-        result = await self._rpc(
-            "sendTyping",
-            params,
-            rpc_id="typing",
-            log_failures=(fails == 0),
-        )
-
-        if result is None:
-            fails += 1
-            self._typing_failures[chat_id] = fails
-            # After 3 consecutive failures, back off exponentially (16s,
-            # 32s, 60s cap) to stop spamming signal-cli for a recipient
-            # that clearly isn't reachable right now.
-            if fails >= 3:
-                backoff = min(60.0, 16.0 * (2 ** (fails - 3)))
-                self._typing_skip_until[chat_id] = now + backoff
-        else:
-            self._typing_failures.pop(chat_id, None)
-            self._typing_skip_until.pop(chat_id, None)
+        await self._rpc("sendTyping", params, rpc_id="typing")

    async def send_image(
        self,
@@ -841,53 +620,13 @@ class SignalAdapter(BasePlatformAdapter):
        if chat_id.startswith("group:"):
            params["groupId"] = chat_id[6:]
        else:
-            params["recipient"] = [await self._resolve_recipient(chat_id)]
+            params["recipient"] = [chat_id]

        result = await self._rpc("send", params)
        if result is not None:
-            self._track_sent_timestamp(result)
            return SendResult(success=True)
        return SendResult(success=False, error="RPC send with attachment failed")

-    async def _send_attachment(
-        self,
-        chat_id: str,
-        file_path: str,
-        media_label: str,
-        caption: Optional[str] = None,
-    ) -> SendResult:
-        """Send any file as a Signal attachment via RPC.
-
-        Shared implementation for send_document, send_image_file, send_voice,
-        and send_video — avoids duplicating the validation/routing/RPC logic.
-        """
-        await self._stop_typing_indicator(chat_id)
-
-        try:
-            file_size = Path(file_path).stat().st_size
-        except FileNotFoundError:
-            return SendResult(success=False, error=f"{media_label} file not found: {file_path}")
-
-        if file_size > SIGNAL_MAX_ATTACHMENT_SIZE:
-            return SendResult(success=False, error=f"{media_label} too large ({file_size} bytes)")
-
-        params: Dict[str, Any] = {
-            "account": self.account,
-            "message": caption or "",
-            "attachments": [file_path],
-        }
-
-        if chat_id.startswith("group:"):
-            params["groupId"] = chat_id[6:]
-        else:
-            params["recipient"] = [await self._resolve_recipient(chat_id)]
-
-        result = await self._rpc("send", params)
-        if result is not None:
-            self._track_sent_timestamp(result)
-            return SendResult(success=True)
-        return SendResult(success=False, error=f"RPC send {media_label.lower()} failed")
-
    async def send_document(
        self,
        chat_id: str,
@@ -897,53 +636,46 @@ class SignalAdapter(BasePlatformAdapter):
        **kwargs,
    ) -> SendResult:
        """Send a document/file attachment."""
-        return await self._send_attachment(chat_id, file_path, "File", caption)
+        await self._stop_typing_indicator(chat_id)

-    async def send_image_file(
-        self,
-        chat_id: str,
-        image_path: str,
-        caption: Optional[str] = None,
-        reply_to: Optional[str] = None,
-        **kwargs,
-    ) -> SendResult:
-        """Send a local image file as a native Signal attachment.
+        if not Path(file_path).exists():
+            return SendResult(success=False, error="File not found")

-        Called by the gateway media delivery flow when MEDIA: tags containing
-        image paths are extracted from agent responses.
-        """
-        return await self._send_attachment(chat_id, image_path, "Image", caption)
+        params: Dict[str, Any] = {
+            "account": self.account,
+            "message": caption or "",
+            "attachments": [file_path],
+        }

-    async def send_voice(
-        self,
-        chat_id: str,
-        audio_path: str,
-        caption: Optional[str] = None,
-        reply_to: Optional[str] = None,
-        **kwargs,
-    ) -> SendResult:
-        """Send an audio file as a Signal attachment.
+        if chat_id.startswith("group:"):
+            params["groupId"] = chat_id[6:]
+        else:
+            params["recipient"] = [chat_id]

-        Signal does not distinguish voice messages from file attachments at
-        the API level, so this routes through the same RPC send path.
-        """
-        return await self._send_attachment(chat_id, audio_path, "Audio", caption)
-
-    async def send_video(
-        self,
-        chat_id: str,
-        video_path: str,
-        caption: Optional[str] = None,
-        reply_to: Optional[str] = None,
-        **kwargs,
-    ) -> SendResult:
-        """Send a video file as a Signal attachment."""
-        return await self._send_attachment(chat_id, video_path, "Video", caption)
+        result = await self._rpc("send", params)
+        if result is not None:
+            return SendResult(success=True)
+        return SendResult(success=False, error="RPC send document failed")

    # ------------------------------------------------------------------
    # Typing Indicators
    # ------------------------------------------------------------------

+    async def _start_typing_indicator(self, chat_id: str) -> None:
+        """Start a typing indicator loop for a chat."""
+        if chat_id in self._typing_tasks:
+            return  # Already running
+
+        async def _typing_loop():
+            try:
+                while True:
+                    await self.send_typing(chat_id)
+                    await asyncio.sleep(TYPING_INTERVAL)
+            except asyncio.CancelledError:
+                pass
+
+        self._typing_tasks[chat_id] = asyncio.create_task(_typing_loop())
+
    async def _stop_typing_indicator(self, chat_id: str) -> None:
        """Stop a typing indicator loop for a chat."""
        task = self._typing_tasks.pop(chat_id, None)
@@ -953,15 +685,6 @@ class SignalAdapter(BasePlatformAdapter):
                await task
            except asyncio.CancelledError:
                pass
-        # Reset per-chat typing backoff state so the next agent turn starts
-        # fresh rather than inheriting a cooldown from a prior conversation.
-        self._typing_failures.pop(chat_id, None)
-        self._typing_skip_until.pop(chat_id, None)
-
-    async def stop_typing(self, chat_id: str) -> None:
-        """Public interface for stopping typing — called by base adapter's
-        _keep_typing finally block to clean up platform-level typing tasks."""
-        await self._stop_typing_indicator(chat_id)

    # ------------------------------------------------------------------
    # Chat Info
--- a/gateway/platforms/slack.py
+++ b/gateway/platforms/slack.py
@@ -0,0 +1,429 @@
+"""
+Slack platform adapter.
+
+Uses slack-bolt (Python) with Socket Mode for:
+- Receiving messages from channels and DMs
+- Sending responses back
+- Handling slash commands
+- Thread support
+"""
+
+import asyncio
+import os
+from typing import Dict, List, Optional, Any
+
+try:
+    from slack_bolt.async_app import AsyncApp
+    from slack_bolt.adapter.socket_mode.async_handler import AsyncSocketModeHandler
+    from slack_sdk.web.async_client import AsyncWebClient
+    SLACK_AVAILABLE = True
+except ImportError:
+    SLACK_AVAILABLE = False
+    AsyncApp = Any
+    AsyncSocketModeHandler = Any
+    AsyncWebClient = Any
+
+import sys
+from pathlib import Path as _Path
+sys.path.insert(0, str(_Path(__file__).resolve().parents[2]))
+
+from gateway.config import Platform, PlatformConfig
+from gateway.platforms.base import (
+    BasePlatformAdapter,
+    MessageEvent,
+    MessageType,
+    SendResult,
+    cache_image_from_url,
+    cache_audio_from_url,
+)
+
+
+def check_slack_requirements() -> bool:
+    """Check if Slack dependencies are available."""
+    return SLACK_AVAILABLE
+
+
+class SlackAdapter(BasePlatformAdapter):
+    """
+    Slack bot adapter using Socket Mode.
+
+    Requires two tokens:
+      - SLACK_BOT_TOKEN (xoxb-...) for API calls
+      - SLACK_APP_TOKEN (xapp-...) for Socket Mode connection
+
+    Features:
+      - DMs and channel messages (mention-gated in channels)
+      - Thread support
+      - File/image/audio attachments
+      - Slash commands (/hermes)
+      - Typing indicators are a no-op (Slack has no typing API for bots)
+    """
+
+    MAX_MESSAGE_LENGTH = 4000  # Slack's limit is higher, but mrkdwn formatting can inflate the message
+
+    def __init__(self, config: PlatformConfig):
+        super().__init__(config, Platform.SLACK)
+        self._app: Optional[AsyncApp] = None
+        self._handler: Optional[AsyncSocketModeHandler] = None
+        self._bot_user_id: Optional[str] = None
+
+    async def connect(self) -> bool:
+        """Connect to Slack via Socket Mode."""
+        if not SLACK_AVAILABLE:
+            print("[Slack] slack-bolt not installed. Run: pip install slack-bolt")
+            return False
+
+        bot_token = self.config.token
+        app_token = os.getenv("SLACK_APP_TOKEN")
+
+        if not bot_token:
+            print("[Slack] SLACK_BOT_TOKEN not set")
+            return False
+        if not app_token:
+            print("[Slack] SLACK_APP_TOKEN not set")
+            return False
+
+        try:
+            self._app = AsyncApp(token=bot_token)
+
+            # Get our own bot user ID for mention detection
+            auth_response = await self._app.client.auth_test()
+            self._bot_user_id = auth_response.get("user_id")
+            bot_name = auth_response.get("user", "unknown")
+
+            # Register message event handler
+            @self._app.event("message")
+            async def handle_message_event(event, say):
+                await self._handle_slack_message(event)
+
+            # Register slash command handler
+            @self._app.command("/hermes")
+            async def handle_hermes_command(ack, command):
+                await ack()
+                await self._handle_slash_command(command)
+
+            # Start Socket Mode handler in background (hold a task reference so it is not garbage-collected)
+            self._handler = AsyncSocketModeHandler(self._app, app_token)
+            self._socket_task = asyncio.create_task(self._handler.start_async())
+
+            self._running = True
+            print(f"[Slack] Connected as @{bot_name} (Socket Mode)")
+            return True
+
+        except Exception as e:
+            print(f"[Slack] Connection failed: {e}")
+            return False
+
+    async def disconnect(self) -> None:
+        """Disconnect from Slack."""
+        if self._handler:
+            await self._handler.close_async()
+        self._running = False
+        print("[Slack] Disconnected")
+
+    async def send(
+        self,
+        chat_id: str,
+        content: str,
+        reply_to: Optional[str] = None,
+        metadata: Optional[Dict[str, Any]] = None,
+    ) -> SendResult:
+        """Send a message to a Slack channel or DM."""
+        if not self._app:
+            return SendResult(success=False, error="Not connected")
+
+        try:
+            kwargs = {
+                "channel": chat_id,
+                "text": content,
+            }
+
+            # Reply in thread if thread_ts is available
+            if reply_to:
+                kwargs["thread_ts"] = reply_to
+            elif metadata and metadata.get("thread_ts"):
+                kwargs["thread_ts"] = metadata["thread_ts"]
+
+            result = await self._app.client.chat_postMessage(**kwargs)
+
+            return SendResult(
+                success=True,
+                message_id=result.get("ts"),
+                raw_response=result,
+            )
+
+        except Exception as e:
+            print(f"[Slack] Send error: {e}")
+            return SendResult(success=False, error=str(e))
+
+    async def edit_message(
+        self,
+        chat_id: str,
+        message_id: str,
+        content: str,
+    ) -> SendResult:
+        """Edit a previously sent Slack message."""
+        if not self._app:
+            return SendResult(success=False, error="Not connected")
+        try:
+            await self._app.client.chat_update(
+                channel=chat_id,
+                ts=message_id,
+                text=content,
+            )
+            return SendResult(success=True, message_id=message_id)
+        except Exception as e:
+            return SendResult(success=False, error=str(e))
+
+    async def send_typing(self, chat_id: str) -> None:
+        """Slack doesn't have a direct typing indicator API for bots."""
+        pass
+
+    async def send_image_file(
+        self,
+        chat_id: str,
+        image_path: str,
+        caption: Optional[str] = None,
+        reply_to: Optional[str] = None,
+    ) -> SendResult:
+        """Send a local image file to Slack by uploading it."""
+        if not self._app:
+            return SendResult(success=False, error="Not connected")
+
+        try:
+            import os
+            if not os.path.exists(image_path):
+                return SendResult(success=False, error=f"Image file not found: {image_path}")
+
+            result = await self._app.client.files_upload_v2(
+                channel=chat_id,
+                file=image_path,
+                filename=os.path.basename(image_path),
+                initial_comment=caption or "",
+                thread_ts=reply_to,
+            )
+            return SendResult(success=True, raw_response=result)
+
+        except Exception as e:
+            print(f"[{self.name}] Failed to send local image: {e}")
+            return await super().send_image_file(chat_id, image_path, caption, reply_to)
+
+    async def send_image(
+        self,
+        chat_id: str,
+        image_url: str,
+        caption: Optional[str] = None,
+        reply_to: Optional[str] = None,
+    ) -> SendResult:
+        """Send an image to Slack by uploading the URL as a file."""
+        if not self._app:
+            return SendResult(success=False, error="Not connected")
+
+        try:
+            import httpx
+
+            # Download the image first
+            async with httpx.AsyncClient(timeout=30.0, follow_redirects=True) as client:
+                response = await client.get(image_url)
+                response.raise_for_status()
+
+            result = await self._app.client.files_upload_v2(
+                channel=chat_id,
+                content=response.content,
+                filename="image.png",
+                initial_comment=caption or "",
+                thread_ts=reply_to,
+            )
+
+            return SendResult(success=True, raw_response=result)
+
+        except Exception as e:
+            # Fall back to sending the URL as text
+            text = f"{caption}\n{image_url}" if caption else image_url
+            return await self.send(chat_id=chat_id, content=text, reply_to=reply_to)
+
+    async def send_voice(
+        self,
+        chat_id: str,
+        audio_path: str,
+        caption: Optional[str] = None,
+        reply_to: Optional[str] = None,
+    ) -> SendResult:
+        """Send an audio file to Slack."""
+        if not self._app:
+            return SendResult(success=False, error="Not connected")
+        try:
+            import os  # local import, as in send_image_file; needed for os.path.basename below
+            result = await self._app.client.files_upload_v2(
+                channel=chat_id,
+                file=audio_path,
+                filename=os.path.basename(audio_path),
+                initial_comment=caption or "",
+                thread_ts=reply_to,
+            )
+            return SendResult(success=True, raw_response=result)
+
+        except Exception as e:
+            return SendResult(success=False, error=str(e))
+
+    async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
+        """Get information about a Slack channel."""
+        if not self._app:
+            return {"name": chat_id, "type": "unknown"}
+
+        try:
+            result = await self._app.client.conversations_info(channel=chat_id)
+            channel = result.get("channel", {})
+            is_dm = channel.get("is_im", False)
+            return {
+                "name": channel.get("name", chat_id),
+                "type": "dm" if is_dm else "group",
+            }
+        except Exception:
+            return {"name": chat_id, "type": "unknown"}
+
+    # ----- Internal handlers -----
+
+    async def _handle_slack_message(self, event: dict) -> None:
+        """Handle an incoming Slack message event."""
+        # Ignore bot messages (including our own)
+        if event.get("bot_id") or event.get("subtype") == "bot_message":
+            return
+
+        # Ignore message edits and deletions
+        subtype = event.get("subtype")
+        if subtype in ("message_changed", "message_deleted"):
+            return
+
+        text = event.get("text", "")
+        user_id = event.get("user", "")
+        channel_id = event.get("channel", "")
+        thread_ts = event.get("thread_ts") or event.get("ts")
+        ts = event.get("ts", "")
+
+        # Determine if this is a DM or channel message
+        channel_type = event.get("channel_type", "")
+        is_dm = channel_type == "im"
+
+        # In channels, only respond if bot is mentioned
+        if not is_dm and self._bot_user_id:
+            if f"<@{self._bot_user_id}>" not in text:
+                return
+            # Strip the bot mention from the text
+            text = text.replace(f"<@{self._bot_user_id}>", "").strip()
+
+        # Determine message type
+        msg_type = MessageType.TEXT
+        if text.startswith("/"):
+            msg_type = MessageType.COMMAND
+
+        # Handle file attachments
+        media_urls = []
+        media_types = []
+        files = event.get("files", [])
+        for f in files:
+            mimetype = f.get("mimetype", "unknown")
+            url = f.get("url_private_download") or f.get("url_private", "")
+            if mimetype.startswith("image/") and url:
+                try:
+                    ext = "." + mimetype.split("/")[-1].split(";")[0]
+                    if ext not in (".jpg", ".jpeg", ".png", ".gif", ".webp"):
+                        ext = ".jpg"
+                    # Slack private URLs require the bot token as auth header
+                    cached = await self._download_slack_file(url, ext)
+                    media_urls.append(cached)
+                    media_types.append(mimetype)
+                    msg_type = MessageType.PHOTO
+                except Exception as e:
+                    print(f"[Slack] Failed to cache image: {e}", flush=True)
+            elif mimetype.startswith("audio/") and url:
+                try:
+                    ext = "." + mimetype.split("/")[-1].split(";")[0]
+                    if ext not in (".ogg", ".mp3", ".wav", ".webm", ".m4a"):
+                        ext = ".ogg"
+                    cached = await self._download_slack_file(url, ext, audio=True)
+                    media_urls.append(cached)
+                    media_types.append(mimetype)
+                    msg_type = MessageType.VOICE
+                except Exception as e:
+                    print(f"[Slack] Failed to cache audio: {e}", flush=True)
+
+        # Build source
+        source = self.build_source(
+            chat_id=channel_id,
+            chat_name=channel_id,  # Will be resolved later if needed
+            chat_type="dm" if is_dm else "group",
+            user_id=user_id,
+            thread_id=thread_ts,
+        )
+
+        msg_event = MessageEvent(
+            text=text,
+            message_type=msg_type,
+            source=source,
+            raw_message=event,
+            message_id=ts,
+            media_urls=media_urls,
+            media_types=media_types,
+            reply_to_message_id=thread_ts if thread_ts != ts else None,
+        )
+
+        await self.handle_message(msg_event)
+
+    async def _handle_slash_command(self, command: dict) -> None:
+        """Handle /hermes slash command."""
+        text = command.get("text", "").strip()
+        user_id = command.get("user_id", "")
+        channel_id = command.get("channel_id", "")
+
+        # Map subcommands to gateway commands
+        subcommand_map = {
+            "new": "/reset", "reset": "/reset",
+            "status": "/status", "stop": "/stop",
+            "help": "/help",
+            "model": "/model", "personality": "/personality",
+            "retry": "/retry", "undo": "/undo",
+        }
+        first_word = text.split()[0] if text else ""
+        if first_word in subcommand_map:
+            # Preserve arguments after the subcommand
+            rest = text[len(first_word):].strip()
+            text = f"{subcommand_map[first_word]} {rest}".strip() if rest else subcommand_map[first_word]
+        elif text:
+            pass  # Treat as a regular question
+        else:
+            text = "/help"
+
+        source = self.build_source(
+            chat_id=channel_id,
+            chat_type="dm",  # Slash commands are always in DM-like context
+            user_id=user_id,
+        )
+
+        event = MessageEvent(
+            text=text,
+            message_type=MessageType.COMMAND if text.startswith("/") else MessageType.TEXT,
+            source=source,
+            raw_message=command,
+        )
+
+        await self.handle_message(event)
+
+    async def _download_slack_file(self, url: str, ext: str, audio: bool = False) -> str:
+        """Download a Slack file using the bot token for auth."""
+        import httpx
+
+        bot_token = self.config.token
+        async with httpx.AsyncClient(timeout=30.0, follow_redirects=True) as client:
+            response = await client.get(
+                url,
+                headers={"Authorization": f"Bearer {bot_token}"},
+            )
+            response.raise_for_status()
+
+        if audio:
+            from gateway.platforms.base import cache_audio_from_bytes
+            return cache_audio_from_bytes(response.content, ext)
+        else:
+            from gateway.platforms.base import cache_image_from_bytes
+            return cache_image_from_bytes(response.content, ext)
--- a/gateway/platforms/telegram.py
+++ b/gateway/platforms/telegram.py
@@ -0,0 +1,794 @@
+"""
+Telegram platform adapter.
+
+Uses python-telegram-bot library for:
+- Receiving messages from users/groups
+- Sending responses back
+- Handling media and commands
+"""
+
+import asyncio
+import logging
+import os
+import re
+from typing import Dict, List, Optional, Any
+
+logger = logging.getLogger(__name__)
+
+try:
+    from telegram import Update, Bot, Message
+    from telegram.ext import (
+        Application,
+        CommandHandler,
+        MessageHandler as TelegramMessageHandler,
+        ContextTypes,
+        filters,
+    )
+    from telegram.constants import ParseMode, ChatType
+    TELEGRAM_AVAILABLE = True
+except ImportError:
+    TELEGRAM_AVAILABLE = False
+    Update = Any
+    Bot = Any
+    Message = Any
+    Application = Any
+    CommandHandler = Any
+    TelegramMessageHandler = Any
+    filters = None
+    ParseMode = None
+    ChatType = None
+
+    # Mock ContextTypes so type annotations using ContextTypes.DEFAULT_TYPE
+    # don't crash during class definition when the library isn't installed.
+    class _MockContextTypes:
+        DEFAULT_TYPE = Any
+    ContextTypes = _MockContextTypes
+
+import sys
+from pathlib import Path as _Path
+sys.path.insert(0, str(_Path(__file__).resolve().parents[2]))
+
+from gateway.config import Platform, PlatformConfig
+from gateway.platforms.base import (
+    BasePlatformAdapter,
+    MessageEvent,
+    MessageType,
+    SendResult,
+    cache_image_from_bytes,
+    cache_audio_from_bytes,
+    cache_document_from_bytes,
+    SUPPORTED_DOCUMENT_TYPES,
+)
+
+
+def check_telegram_requirements() -> bool:
+    """Check if Telegram dependencies are available."""
+    return TELEGRAM_AVAILABLE
+
+
+# Matches every character that MarkdownV2 requires to be backslash-escaped
+# when it appears outside a code span or fenced code block.
+_MDV2_ESCAPE_RE = re.compile(r'([_*\[\]()~`>#\+\-=|{}.!\\])')
+
+
+def _escape_mdv2(text: str) -> str:
+    """Escape Telegram MarkdownV2 special characters with a preceding backslash."""
+    return _MDV2_ESCAPE_RE.sub(r'\\\1', text)
+
+
+def _strip_mdv2(text: str) -> str:
+    """Strip MarkdownV2 escape backslashes to produce clean plain text.
+
+    Also removes MarkdownV2 bold markers (*text* -> text) so the fallback
+    doesn't show stray asterisks from header/bold conversion.
+    """
+    # Remove escape backslashes before special characters
+    cleaned = re.sub(r'\\([_*\[\]()~`>#\+\-=|{}.!\\])', r'\1', text)
+    # Remove MarkdownV2 bold markers that format_message converted from **bold**
+    cleaned = re.sub(r'\*([^*]+)\*', r'\1', cleaned)
+    return cleaned
+
+
+class TelegramAdapter(BasePlatformAdapter):
+    """
+    Telegram bot adapter.
+    
+    Handles:
+    - Receiving messages from users and groups
+    - Sending responses with Telegram markdown
+    - Forum topics (thread_id support)
+    - Media messages
+    """
+    
+    # Telegram message limits
+    MAX_MESSAGE_LENGTH = 4096
+    
+    def __init__(self, config: PlatformConfig):
+        super().__init__(config, Platform.TELEGRAM)
+        self._app: Optional[Application] = None
+        self._bot: Optional[Bot] = None
+    
+    async def connect(self) -> bool:
+        """Connect to Telegram and start polling for updates."""
+        if not TELEGRAM_AVAILABLE:
+            print(f"[{self.name}] python-telegram-bot not installed. Run: pip install python-telegram-bot")
+            return False
+        
+        if not self.config.token:
+            print(f"[{self.name}] No bot token configured")
+            return False
+        
+        try:
+            # Build the application
+            self._app = Application.builder().token(self.config.token).build()
+            self._bot = self._app.bot
+            
+            # Register handlers
+            self._app.add_handler(TelegramMessageHandler(
+                filters.TEXT & ~filters.COMMAND,
+                self._handle_text_message
+            ))
+            self._app.add_handler(TelegramMessageHandler(
+                filters.COMMAND,
+                self._handle_command
+            ))
+            self._app.add_handler(TelegramMessageHandler(
+                filters.PHOTO | filters.VIDEO | filters.AUDIO | filters.VOICE | filters.Document.ALL | filters.Sticker.ALL,
+                self._handle_media_message
+            ))
+            
+            # Start polling in background
+            await self._app.initialize()
+            await self._app.start()
+            await self._app.updater.start_polling(allowed_updates=Update.ALL_TYPES)
+            
+            # Register bot commands so Telegram shows a hint menu when users type /
+            try:
+                from telegram import BotCommand
+                await self._bot.set_my_commands([
+                    BotCommand("new", "Start a new conversation"),
+                    BotCommand("reset", "Reset conversation history"),
+                    BotCommand("model", "Show or change the model"),
+                    BotCommand("personality", "Set a personality"),
+                    BotCommand("retry", "Retry your last message"),
+                    BotCommand("undo", "Remove the last exchange"),
+                    BotCommand("status", "Show session info"),
+                    BotCommand("stop", "Stop the running agent"),
+                    BotCommand("sethome", "Set this chat as the home channel"),
+                    BotCommand("compress", "Compress conversation context"),
+                    BotCommand("title", "Set or show the session title"),
+                    BotCommand("resume", "Resume a previously-named session"),
+                    BotCommand("usage", "Show token usage for this session"),
+                    BotCommand("provider", "Show available providers"),
+                    BotCommand("insights", "Show usage insights and analytics"),
+                    BotCommand("update", "Update Hermes to the latest version"),
+                    BotCommand("reload_mcp", "Reload MCP servers from config"),
+                    BotCommand("help", "Show available commands"),
+                ])
+            except Exception as e:
+                print(f"[{self.name}] Could not register command menu: {e}")
+            
+            self._running = True
+            print(f"[{self.name}] Connected and polling for updates")
+            return True
+            
+        except Exception as e:
+            print(f"[{self.name}] Failed to connect: {e}")
+            return False
+    
+    async def disconnect(self) -> None:
+        """Stop polling and disconnect."""
+        if self._app:
+            try:
+                await self._app.updater.stop()
+                await self._app.stop()
+                await self._app.shutdown()
+            except Exception as e:
+                print(f"[{self.name}] Error during disconnect: {e}")
+        
+        self._running = False
+        self._app = None
+        self._bot = None
+        print(f"[{self.name}] Disconnected")
+    
+    async def send(
+        self,
+        chat_id: str,
+        content: str,
+        reply_to: Optional[str] = None,
+        metadata: Optional[Dict[str, Any]] = None
+    ) -> SendResult:
+        """Send a message to a Telegram chat."""
+        if not self._bot:
+            return SendResult(success=False, error="Not connected")
+        
+        try:
+            # Format and split message if needed
+            formatted = self.format_message(content)
+            chunks = self.truncate_message(formatted, self.MAX_MESSAGE_LENGTH)
+            
+            message_ids = []
+            thread_id = metadata.get("thread_id") if metadata else None
+            
+            for i, chunk in enumerate(chunks):
+                # Try Markdown first, fall back to plain text if it fails
+                try:
+                    msg = await self._bot.send_message(
+                        chat_id=int(chat_id),
+                        text=chunk,
+                        parse_mode=ParseMode.MARKDOWN_V2,
+                        reply_to_message_id=int(reply_to) if reply_to and i == 0 else None,
+                        message_thread_id=int(thread_id) if thread_id else None,
+                    )
+                except Exception as md_error:
+                    # Markdown parsing failed, try plain text
+                    if "parse" in str(md_error).lower() or "markdown" in str(md_error).lower():
+                        logger.warning("[%s] MarkdownV2 parse failed, falling back to plain text: %s", self.name, md_error)
+                        # Strip MDV2 escape backslashes so the user doesn't
+                        # see raw backslashes littered through the message.
+                        plain_chunk = _strip_mdv2(chunk)
+                        msg = await self._bot.send_message(
+                            chat_id=int(chat_id),
+                            text=plain_chunk,
+                            parse_mode=None,  # Plain text
+                            reply_to_message_id=int(reply_to) if reply_to and i == 0 else None,
+                            message_thread_id=int(thread_id) if thread_id else None,
+                        )
+                    else:
+                        raise  # Re-raise if not a parse error
+                message_ids.append(str(msg.message_id))
+            
+            return SendResult(
+                success=True,
+                message_id=message_ids[0] if message_ids else None,
+                raw_response={"message_ids": message_ids}
+            )
+            
+        except Exception as e:
+            return SendResult(success=False, error=str(e))
+
+    async def edit_message(
+        self,
+        chat_id: str,
+        message_id: str,
+        content: str,
+    ) -> SendResult:
+        """Edit a previously sent Telegram message."""
+        if not self._bot:
+            return SendResult(success=False, error="Not connected")
+        try:
+            formatted = self.format_message(content)
+            try:
+                await self._bot.edit_message_text(
+                    chat_id=int(chat_id),
+                    message_id=int(message_id),
+                    text=formatted,
+                    parse_mode=ParseMode.MARKDOWN_V2,
+                )
+            except Exception:
+                # Fallback: retry without markdown formatting
+                await self._bot.edit_message_text(
+                    chat_id=int(chat_id),
+                    message_id=int(message_id),
+                    text=content,
+                )
+            return SendResult(success=True, message_id=message_id)
+        except Exception as e:
+            return SendResult(success=False, error=str(e))
+
+    async def send_voice(
+        self,
+        chat_id: str,
+        audio_path: str,
+        caption: Optional[str] = None,
+        reply_to: Optional[str] = None,
+    ) -> SendResult:
+        """Send audio as a native Telegram voice message or audio file."""
+        if not self._bot:
+            return SendResult(success=False, error="Not connected")
+        
+        try:
+            import os
+            if not os.path.exists(audio_path):
+                return SendResult(success=False, error=f"Audio file not found: {audio_path}")
+            
+            with open(audio_path, "rb") as audio_file:
+                # .ogg/.opus files -> send as a native voice message (playable bubble)
+                if audio_path.lower().endswith((".ogg", ".opus")):
+                    msg = await self._bot.send_voice(
+                        chat_id=int(chat_id),
+                        voice=audio_file,
+                        caption=caption[:1024] if caption else None,
+                        reply_to_message_id=int(reply_to) if reply_to else None,
+                    )
+                else:
+                    # .mp3 and others -> send as audio file
+                    msg = await self._bot.send_audio(
+                        chat_id=int(chat_id),
+                        audio=audio_file,
+                        caption=caption[:1024] if caption else None,
+                        reply_to_message_id=int(reply_to) if reply_to else None,
+                    )
+            return SendResult(success=True, message_id=str(msg.message_id))
+        except Exception as e:
+            print(f"[{self.name}] Failed to send voice/audio: {e}")
+            return await super().send_voice(chat_id, audio_path, caption, reply_to)
+    
+    async def send_image_file(
+        self,
+        chat_id: str,
+        image_path: str,
+        caption: Optional[str] = None,
+        reply_to: Optional[str] = None,
+    ) -> SendResult:
+        """Send a local image file natively as a Telegram photo."""
+        if not self._bot:
+            return SendResult(success=False, error="Not connected")
+        
+        try:
+            import os
+            if not os.path.exists(image_path):
+                return SendResult(success=False, error=f"Image file not found: {image_path}")
+            
+            with open(image_path, "rb") as image_file:
+                msg = await self._bot.send_photo(
+                    chat_id=int(chat_id),
+                    photo=image_file,
+                    caption=caption[:1024] if caption else None,
+                    reply_to_message_id=int(reply_to) if reply_to else None,
+                )
+            return SendResult(success=True, message_id=str(msg.message_id))
+        except Exception as e:
+            print(f"[{self.name}] Failed to send local image: {e}")
+            return await super().send_image_file(chat_id, image_path, caption, reply_to)
+
+    async def send_image(
+        self,
+        chat_id: str,
+        image_url: str,
+        caption: Optional[str] = None,
+        reply_to: Optional[str] = None,
+    ) -> SendResult:
+        """Send an image natively as a Telegram photo.
+        
+        Tries URL-based send first (fast, works for <5MB images).
+        Falls back to downloading and uploading as file (supports up to 10MB).
+        """
+        if not self._bot:
+            return SendResult(success=False, error="Not connected")
+        
+        try:
+            # Telegram can send photos directly from URLs (up to ~5MB)
+            msg = await self._bot.send_photo(
+                chat_id=int(chat_id),
+                photo=image_url,
+                caption=caption[:1024] if caption else None,  # Telegram caption limit
+                reply_to_message_id=int(reply_to) if reply_to else None,
+            )
+            return SendResult(success=True, message_id=str(msg.message_id))
+        except Exception as e:
+            logger.warning("[%s] URL-based send_photo failed (%s), trying file upload", self.name, e)
+            # Fallback: download and upload as file (supports up to 10MB)
+            try:
+                import httpx
+                async with httpx.AsyncClient(timeout=30.0, follow_redirects=True) as client:
+                    resp = await client.get(image_url)
+                    resp.raise_for_status()
+                    image_data = resp.content
+                
+                msg = await self._bot.send_photo(
+                    chat_id=int(chat_id),
+                    photo=image_data,
+                    caption=caption[:1024] if caption else None,
+                    reply_to_message_id=int(reply_to) if reply_to else None,
+                )
+                return SendResult(success=True, message_id=str(msg.message_id))
+            except Exception as e2:
+                logger.error("[%s] File upload send_photo also failed: %s", self.name, e2)
+                # Final fallback: send URL as text
+                return await super().send_image(chat_id, image_url, caption, reply_to)
+    
+    async def send_animation(
+        self,
+        chat_id: str,
+        animation_url: str,
+        caption: Optional[str] = None,
+        reply_to: Optional[str] = None,
+    ) -> SendResult:
+        """Send an animated GIF natively as a Telegram animation (auto-plays inline)."""
+        if not self._bot:
+            return SendResult(success=False, error="Not connected")
+        
+        try:
+            msg = await self._bot.send_animation(
+                chat_id=int(chat_id),
+                animation=animation_url,
+                caption=caption[:1024] if caption else None,
+                reply_to_message_id=int(reply_to) if reply_to else None,
+            )
+            return SendResult(success=True, message_id=str(msg.message_id))
+        except Exception as e:
+            print(f"[{self.name}] Failed to send animation, falling back to photo: {e}")
+            # Fallback: try as a regular photo
+            return await self.send_image(chat_id, animation_url, caption, reply_to)
+
+    async def send_typing(self, chat_id: str) -> None:
+        """Send typing indicator."""
+        if self._bot:
+            try:
+                await self._bot.send_chat_action(
+                    chat_id=int(chat_id),
+                    action="typing"
+                )
+            except Exception:
+                pass  # Ignore typing indicator failures
+    
+    async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
+        """Get information about a Telegram chat."""
+        if not self._bot:
+            return {"name": "Unknown", "type": "dm"}
+        
+        try:
+            chat = await self._bot.get_chat(int(chat_id))
+            
+            chat_type = "dm"
+            if chat.type == ChatType.GROUP:
+                chat_type = "group"
+            elif chat.type == ChatType.SUPERGROUP:
+                chat_type = "group"
+                if chat.is_forum:
+                    chat_type = "forum"
+            elif chat.type == ChatType.CHANNEL:
+                chat_type = "channel"
+            
+            return {
+                "name": chat.title or chat.full_name or str(chat_id),
+                "type": chat_type,
+                "username": chat.username,
+                "is_forum": getattr(chat, "is_forum", False),
+            }
+        except Exception as e:
+            return {"name": str(chat_id), "type": "dm", "error": str(e)}
+    
+    def format_message(self, content: str) -> str:
+        """
+        Convert standard markdown to Telegram MarkdownV2 format.
+
+        Protected regions (code blocks, inline code) are extracted first so
+        their contents are never modified.  Standard markdown constructs
+        (headers, bold, italic, links) are translated to MarkdownV2 syntax,
+        and all remaining special characters are escaped.
+        """
+        if not content:
+            return content
+
+        placeholders: dict = {}
+        counter = [0]
+
+        def _ph(value: str) -> str:
+            """Stash *value* behind a placeholder token that survives escaping."""
+            key = f"\x00PH{counter[0]}\x00"
+            counter[0] += 1
+            placeholders[key] = value
+            return key
+
+        text = content
+
+        # 1) Protect fenced code blocks (``` ... ```)
+        text = re.sub(
+            r'(```(?:[^\n]*\n)?[\s\S]*?```)',
+            lambda m: _ph(m.group(0)),
+            text,
+        )
+
+        # 2) Protect inline code (`...`)
+        text = re.sub(r'(`[^`]+`)', lambda m: _ph(m.group(0)), text)
+
+        # 3) Convert markdown links – escape the display text; inside the URL
+        #    only ')' and '\' need escaping per the MarkdownV2 spec.
+        def _convert_link(m):
+            display = _escape_mdv2(m.group(1))
+            url = m.group(2).replace('\\', '\\\\').replace(')', '\\)')
+            return _ph(f'[{display}]({url})')
+
+        text = re.sub(r'\[([^\]]+)\]\(([^)]+)\)', _convert_link, text)
+
+        # 4) Convert markdown headers (## Title) → bold *Title*
+        def _convert_header(m):
+            inner = m.group(1).strip()
+            # Strip redundant bold markers that may appear inside a header
+            inner = re.sub(r'\*\*(.+?)\*\*', r'\1', inner)
+            return _ph(f'*{_escape_mdv2(inner)}*')
+
+        text = re.sub(
+            r'^#{1,6}\s+(.+)$', _convert_header, text, flags=re.MULTILINE
+        )
+
+        # 5) Convert bold: **text** → *text* (MarkdownV2 bold)
+        text = re.sub(
+            r'\*\*(.+?)\*\*',
+            lambda m: _ph(f'*{_escape_mdv2(m.group(1))}*'),
+            text,
+        )
+
+        # 6) Convert italic: *text* (single asterisk) → _text_ (MarkdownV2 italic)
+        #    [^*\n]+ prevents matching across newlines (which would corrupt
+        #    bullet lists using * markers and multi-line content).
+        text = re.sub(
+            r'\*([^*\n]+)\*',
+            lambda m: _ph(f'_{_escape_mdv2(m.group(1))}_'),
+            text,
+        )
+
+        # 7) Escape remaining special characters in plain text
+        text = _escape_mdv2(text)
+
+        # 8) Restore placeholders in reverse insertion order so that
+        #    nested references (a placeholder inside another) resolve correctly.
+        for key in reversed(list(placeholders.keys())):
+            text = text.replace(key, placeholders[key])
+
+        return text
+    
+    async def _handle_text_message(self, update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
+        """Handle incoming text messages."""
+        if not update.message or not update.message.text:
+            return
+        
+        event = self._build_message_event(update.message, MessageType.TEXT)
+        await self.handle_message(event)
+    
+    async def _handle_command(self, update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
+        """Handle incoming command messages."""
+        if not update.message or not update.message.text:
+            return
+        
+        event = self._build_message_event(update.message, MessageType.COMMAND)
+        await self.handle_message(event)
+    
+    async def _handle_media_message(self, update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
+        """Handle incoming media messages, downloading images to local cache."""
+        if not update.message:
+            return
+        
+        msg = update.message
+        
+        # Determine media type
+        if msg.sticker:
+            msg_type = MessageType.STICKER
+        elif msg.photo:
+            msg_type = MessageType.PHOTO
+        elif msg.video:
+            msg_type = MessageType.VIDEO
+        elif msg.audio:
+            msg_type = MessageType.AUDIO
+        elif msg.voice:
+            msg_type = MessageType.VOICE
+        elif msg.document:
+            msg_type = MessageType.DOCUMENT
+        else:
+            msg_type = MessageType.DOCUMENT
+        
+        event = self._build_message_event(msg, msg_type)
+        
+        # Add caption as text
+        if msg.caption:
+            event.text = msg.caption
+        
+        # Handle stickers: describe via vision tool with caching
+        if msg.sticker:
+            await self._handle_sticker(msg, event)
+            await self.handle_message(event)
+            return
+        
+        # Download photo to local image cache so the vision tool can access it
+        # even after Telegram's ephemeral file URLs expire (~1 hour).
+        if msg.photo:
+            try:
+                # msg.photo is a list of PhotoSize sorted by size; take the largest
+                photo = msg.photo[-1]
+                file_obj = await photo.get_file()
+                # Download the image bytes directly into memory
+                image_bytes = await file_obj.download_as_bytearray()
+                # Determine extension from the file path if available
+                ext = ".jpg"
+                if file_obj.file_path:
+                    for candidate in [".png", ".webp", ".gif", ".jpeg", ".jpg"]:
+                        if file_obj.file_path.lower().endswith(candidate):
+                            ext = candidate
+                            break
+                # Save to cache and populate media_urls with the local path
+                cached_path = cache_image_from_bytes(bytes(image_bytes), ext=ext)
+                event.media_urls = [cached_path]
+                event.media_types = [f"image/{'jpeg' if ext in ('.jpg', '.jpeg') else ext.lstrip('.')}"]
+                print(f"[Telegram] Cached user photo: {cached_path}", flush=True)
+            except Exception as e:
+                print(f"[Telegram] Failed to cache photo: {e}", flush=True)
+        
+        # Download voice/audio messages to cache for STT transcription
+        if msg.voice:
+            try:
+                file_obj = await msg.voice.get_file()
+                audio_bytes = await file_obj.download_as_bytearray()
+                cached_path = cache_audio_from_bytes(bytes(audio_bytes), ext=".ogg")
+                event.media_urls = [cached_path]
+                event.media_types = ["audio/ogg"]
+                print(f"[Telegram] Cached user voice: {cached_path}", flush=True)
+            except Exception as e:
+                print(f"[Telegram] Failed to cache voice: {e}", flush=True)
+        elif msg.audio:
+            try:
+                file_obj = await msg.audio.get_file()
+                audio_bytes = await file_obj.download_as_bytearray()
+                cached_path = cache_audio_from_bytes(bytes(audio_bytes), ext=".mp3")
+                event.media_urls = [cached_path]
+                event.media_types = ["audio/mpeg"]
+                print(f"[Telegram] Cached user audio: {cached_path}", flush=True)
+            except Exception as e:
+                print(f"[Telegram] Failed to cache audio: {e}", flush=True)
+
+        # Download document files to cache for agent processing
+        elif msg.document:
+            doc = msg.document
+            try:
+                # Determine file extension
+                ext = ""
+                original_filename = doc.file_name or ""
+                if original_filename:
+                    _, ext = os.path.splitext(original_filename)
+                    ext = ext.lower()
+
+                # If no extension from filename, reverse-lookup from MIME type
+                if not ext and doc.mime_type:
+                    mime_to_ext = {v: k for k, v in SUPPORTED_DOCUMENT_TYPES.items()}
+                    ext = mime_to_ext.get(doc.mime_type, "")
+
+                # Check if supported
+                if ext not in SUPPORTED_DOCUMENT_TYPES:
+                    supported_list = ", ".join(sorted(SUPPORTED_DOCUMENT_TYPES.keys()))
+                    event.text = (
+                        f"Unsupported document type '{ext or 'unknown'}'. "
+                        f"Supported types: {supported_list}"
+                    )
+                    print(f"[Telegram] Unsupported document type: {ext or 'unknown'}", flush=True)
+                    await self.handle_message(event)
+                    return
+
+                # Check file size (Telegram Bot API limit: 20 MB)
+                MAX_DOC_BYTES = 20 * 1024 * 1024
+                if not doc.file_size or doc.file_size > MAX_DOC_BYTES:
+                    event.text = (
+                        "The document is too large or its size could not be verified. "
+                        "Maximum: 20 MB."
+                    )
+                    print(f"[Telegram] Document too large: {doc.file_size} bytes", flush=True)
+                    await self.handle_message(event)
+                    return
+
+                # Download and cache
+                file_obj = await doc.get_file()
+                doc_bytes = await file_obj.download_as_bytearray()
+                raw_bytes = bytes(doc_bytes)
+                cached_path = cache_document_from_bytes(raw_bytes, original_filename or f"document{ext}")
+                mime_type = SUPPORTED_DOCUMENT_TYPES[ext]
+                event.media_urls = [cached_path]
+                event.media_types = [mime_type]
+                print(f"[Telegram] Cached user document: {cached_path}", flush=True)
+
+                # For text files, inject content into event.text (capped at 100 KB)
+                MAX_TEXT_INJECT_BYTES = 100 * 1024
+                if ext in (".md", ".txt") and len(raw_bytes) <= MAX_TEXT_INJECT_BYTES:
+                    try:
+                        text_content = raw_bytes.decode("utf-8")
+                        display_name = original_filename or f"document{ext}"
+                        display_name = re.sub(r'[^\w.\- ]', '_', display_name)
+                        injection = f"[Content of {display_name}]:\n{text_content}"
+                        if event.text:
+                            event.text = f"{injection}\n\n{event.text}"
+                        else:
+                            event.text = injection
+                    except UnicodeDecodeError:
+                        print("[Telegram] Could not decode text file as UTF-8, skipping content injection", flush=True)
+
+            except Exception as e:
+                print(f"[Telegram] Failed to cache document: {e}", flush=True)
+
+        await self.handle_message(event)
+    
+    async def _handle_sticker(self, msg: Message, event: "MessageEvent") -> None:
+        """
+        Describe a Telegram sticker via vision analysis, with caching.
+
+        For static stickers (WEBP), we download, analyze with vision, and cache
+        the description by file_unique_id. For animated/video stickers, we inject
+        a placeholder noting the emoji.
+        """
+        from gateway.sticker_cache import (
+            get_cached_description,
+            cache_sticker_description,
+            build_sticker_injection,
+            build_animated_sticker_injection,
+            STICKER_VISION_PROMPT,
+        )
+
+        sticker = msg.sticker
+        emoji = sticker.emoji or ""
+        set_name = sticker.set_name or ""
+
+        # Animated and video stickers can't be analyzed as static images
+        if sticker.is_animated or sticker.is_video:
+            event.text = build_animated_sticker_injection(emoji)
+            return
+
+        # Check the cache first
+        cached = get_cached_description(sticker.file_unique_id)
+        if cached:
+            event.text = build_sticker_injection(
+                cached["description"], cached.get("emoji", emoji), cached.get("set_name", set_name)
+            )
+            print(f"[Telegram] Sticker cache hit: {sticker.file_unique_id}", flush=True)
+            return
+
+        # Cache miss -- download and analyze
+        try:
+            file_obj = await sticker.get_file()
+            image_bytes = await file_obj.download_as_bytearray()
+            cached_path = cache_image_from_bytes(bytes(image_bytes), ext=".webp")
+            print(f"[Telegram] Analyzing sticker: {cached_path}", flush=True)
+
+            from tools.vision_tools import vision_analyze_tool
+            import json as _json
+
+            result_json = await vision_analyze_tool(
+                image_url=cached_path,
+                user_prompt=STICKER_VISION_PROMPT,
+            )
+            result = _json.loads(result_json)
+
+            if result.get("success"):
+                description = result.get("analysis", "a sticker")
+                cache_sticker_description(sticker.file_unique_id, description, emoji, set_name)
+                event.text = build_sticker_injection(description, emoji, set_name)
+            else:
+                # Vision failed -- use emoji as fallback
+                event.text = build_sticker_injection(
+                    f"a sticker with emoji {emoji}" if emoji else "a sticker",
+                    emoji, set_name,
+                )
+        except Exception as e:
+            print(f"[Telegram] Sticker analysis error: {e}", flush=True)
+            event.text = build_sticker_injection(
+                f"a sticker with emoji {emoji}" if emoji else "a sticker",
+                emoji, set_name,
+            )
+
+    def _build_message_event(self, message: Message, msg_type: MessageType) -> MessageEvent:
+        """Build a MessageEvent from a Telegram message."""
+        chat = message.chat
+        user = message.from_user
+        
+        # Determine chat type
+        chat_type = "dm"
+        if chat.type in (ChatType.GROUP, ChatType.SUPERGROUP):
+            chat_type = "group"
+        elif chat.type == ChatType.CHANNEL:
+            chat_type = "channel"
+        
+        # Build source
+        source = self.build_source(
+            chat_id=str(chat.id),
+            chat_name=chat.title or (chat.full_name if hasattr(chat, "full_name") else None),
+            chat_type=chat_type,
+            user_id=str(user.id) if user else None,
+            user_name=user.full_name if user else None,
+            thread_id=str(message.message_thread_id) if message.message_thread_id else None,
+        )
+        
+        return MessageEvent(
+            text=message.text or "",
+            message_type=msg_type,
+            source=source,
+            raw_message=message,
+            message_id=str(message.message_id),
+            timestamp=message.date,
+        )
--- a/gateway/platforms/whatsapp.py
+++ b/gateway/platforms/whatsapp.py
@@ -0,0 +1,638 @@
+"""
+WhatsApp platform adapter.
+
+WhatsApp integration is more complex than Telegram/Discord because:
+- No official bot API for personal accounts
+- Business API requires Meta Business verification
+- Most solutions use web-based automation
+
+This adapter supports multiple backends:
+1. WhatsApp Business API (requires Meta verification)
+2. whatsapp-web.js (via Node.js subprocess) - for personal accounts
+3. Baileys (via Node.js subprocess) - alternative for personal accounts
+
+For simplicity, we'll implement a generic interface that can work
+with different backends via a bridge pattern.
+"""
+
+import asyncio
+import json
+import logging
+import os
+import platform
+import subprocess
+from pathlib import Path
+from typing import Dict, List, Optional, Any
+
+_IS_WINDOWS = platform.system() == "Windows"
+
+logger = logging.getLogger(__name__)
+
+
+def _kill_port_process(port: int) -> None:
+    """Kill any process listening on the given TCP port."""
+    try:
+        if _IS_WINDOWS:
+            # Use netstat to find the PID bound to this port, then taskkill
+            result = subprocess.run(
+                ["netstat", "-ano", "-p", "TCP"],
+                capture_output=True, text=True, timeout=5,
+            )
+            for line in result.stdout.splitlines():
+                parts = line.split()
+                if len(parts) >= 5 and parts[3] == "LISTENING":
+                    local_addr = parts[1]
+                    if local_addr.endswith(f":{port}"):
+                        try:
+                            subprocess.run(
+                                ["taskkill", "/PID", parts[4], "/F"],
+                                capture_output=True, timeout=5,
+                            )
+                        except subprocess.SubprocessError:
+                            pass
+        else:
+            result = subprocess.run(
+                ["fuser", f"{port}/tcp"],
+                capture_output=True, timeout=5,
+            )
+            if result.returncode == 0:
+                subprocess.run(
+                    ["fuser", "-k", f"{port}/tcp"],
+                    capture_output=True, timeout=5,
+                )
+    except Exception:
+        pass
+
+import sys
+sys.path.insert(0, str(Path(__file__).resolve().parents[2]))
+
+from gateway.config import Platform, PlatformConfig
+from gateway.platforms.base import (
+    BasePlatformAdapter,
+    MessageEvent,
+    MessageType,
+    SendResult,
+    cache_image_from_url,
+    cache_audio_from_url,
+)
+
+
+def check_whatsapp_requirements() -> bool:
+    """
+    Check if WhatsApp dependencies are available.
+    
+    WhatsApp requires a Node.js bridge for most implementations.
+    """
+    # Check for Node.js
+    try:
+        result = subprocess.run(
+            ["node", "--version"],
+            capture_output=True,
+            text=True,
+            timeout=5
+        )
+        return result.returncode == 0
+    except Exception:
+        return False
+
+
+class WhatsAppAdapter(BasePlatformAdapter):
+    """
+    WhatsApp adapter.
+    
+    This implementation uses a simple HTTP bridge pattern where:
+    1. A Node.js process runs the WhatsApp Web client
+    2. Messages are forwarded via HTTP/IPC to this Python adapter
+    3. Responses are sent back through the bridge
+    
+    The actual Node.js bridge implementation can vary:
+    - whatsapp-web.js based
+    - Baileys based
+    - Business API based
+    
+    Configuration:
+    - bridge_script: Path to the Node.js bridge script
+    - bridge_port: Port for HTTP communication (default: 3000)
+    - session_path: Path to store WhatsApp session data
+    """
+    
+    # WhatsApp message limits
+    MAX_MESSAGE_LENGTH = 65536  # WhatsApp allows far longer messages than Telegram's 4096 chars
+    
+    # Default bridge location relative to the hermes-agent install
+    _DEFAULT_BRIDGE_DIR = Path(__file__).resolve().parents[2] / "scripts" / "whatsapp-bridge"
+
+    def __init__(self, config: PlatformConfig):
+        super().__init__(config, Platform.WHATSAPP)
+        self._bridge_process: Optional[subprocess.Popen] = None
+        self._bridge_port: int = config.extra.get("bridge_port", 3000)
+        self._bridge_script: Optional[str] = config.extra.get(
+            "bridge_script",
+            str(self._DEFAULT_BRIDGE_DIR / "bridge.js"),
+        )
+        self._session_path: Path = Path(config.extra.get(
+            "session_path",
+            Path.home() / ".hermes" / "whatsapp" / "session"
+        ))
+        self._message_queue: asyncio.Queue = asyncio.Queue()
+        self._bridge_log_fh = None
+        self._bridge_log: Optional[Path] = None
+    
+    async def connect(self) -> bool:
+        """
+        Start the WhatsApp bridge.
+        
+        This launches the Node.js bridge process and waits for it to be ready.
+        """
+        if not check_whatsapp_requirements():
+            logger.warning("[%s] Node.js not found. WhatsApp requires Node.js.", self.name)
+            return False
+        
+        bridge_path = Path(self._bridge_script)
+        if not bridge_path.exists():
+            logger.warning("[%s] Bridge script not found: %s", self.name, bridge_path)
+            return False
+        
+        logger.info("[%s] Bridge found at %s", self.name, bridge_path)
+        
+        # Auto-install npm dependencies if node_modules doesn't exist
+        bridge_dir = bridge_path.parent
+        if not (bridge_dir / "node_modules").exists():
+            print(f"[{self.name}] Installing WhatsApp bridge dependencies...")
+            try:
+                install_result = subprocess.run(
+                    ["npm", "install", "--silent"],
+                    cwd=str(bridge_dir),
+                    capture_output=True,
+                    text=True,
+                    timeout=60,
+                )
+                if install_result.returncode != 0:
+                    print(f"[{self.name}] npm install failed: {install_result.stderr}")
+                    return False
+                print(f"[{self.name}] Dependencies installed")
+            except Exception as e:
+                print(f"[{self.name}] Failed to install dependencies: {e}")
+                return False
+        
+        try:
+            # Ensure session directory exists
+            self._session_path.mkdir(parents=True, exist_ok=True)
+            
+            # Kill any orphaned bridge from a previous gateway run
+            _kill_port_process(self._bridge_port)
+            # Give the OS a moment to release the port without blocking the event loop
+            await asyncio.sleep(1)
+            
+            # Start the bridge process in its own process group.
+            # Route output to a log file so QR codes, errors, and reconnection
+            # messages are preserved for troubleshooting.
+            whatsapp_mode = os.getenv("WHATSAPP_MODE", "self-chat")
+            self._bridge_log = self._session_path.parent / "bridge.log"
+            bridge_log_fh = open(self._bridge_log, "a")
+            self._bridge_log_fh = bridge_log_fh
+            self._bridge_process = subprocess.Popen(
+                [
+                    "node",
+                    str(bridge_path),
+                    "--port", str(self._bridge_port),
+                    "--session", str(self._session_path),
+                    "--mode", whatsapp_mode,
+                ],
+                stdout=bridge_log_fh,
+                stderr=bridge_log_fh,
+                preexec_fn=None if _IS_WINDOWS else os.setsid,
+            )
+            
+            # Wait for the bridge to connect to WhatsApp.
+            # Phase 1: wait for the HTTP server to come up (up to 15s).
+            # Phase 2: wait for WhatsApp status: connected (up to 15s more).
+            import aiohttp
+            http_ready = False
+            data = {}
+            for attempt in range(15):
+                await asyncio.sleep(1)
+                if self._bridge_process.poll() is not None:
+                    print(f"[{self.name}] Bridge process died (exit code {self._bridge_process.returncode})")
+                    print(f"[{self.name}] Check log: {self._bridge_log}")
+                    self._close_bridge_log()
+                    return False
+                try:
+                    async with aiohttp.ClientSession() as session:
+                        async with session.get(
+                            f"http://localhost:{self._bridge_port}/health",
+                            timeout=aiohttp.ClientTimeout(total=2)
+                        ) as resp:
+                            if resp.status == 200:
+                                http_ready = True
+                                data = await resp.json()
+                                if data.get("status") == "connected":
+                                    print(f"[{self.name}] Bridge ready (status: connected)")
+                                    break
+                except Exception:
+                    continue
+
+            if not http_ready:
+                print(f"[{self.name}] Bridge HTTP server did not start in 15s")
+                print(f"[{self.name}] Check log: {self._bridge_log}")
+                self._close_bridge_log()
+                return False
+            
+            # Phase 2: HTTP is up but WhatsApp may still be connecting.
+            # Give it more time to authenticate with saved credentials.
+            if data.get("status") != "connected":
+                print(f"[{self.name}] Bridge HTTP ready, waiting for WhatsApp connection...")
+                for attempt in range(15):
+                    await asyncio.sleep(1)
+                    if self._bridge_process.poll() is not None:
+                        print(f"[{self.name}] Bridge process died during connection")
+                        print(f"[{self.name}] Check log: {self._bridge_log}")
+                        self._close_bridge_log()
+                        return False
+                    try:
+                        async with aiohttp.ClientSession() as session:
+                            async with session.get(
+                                f"http://localhost:{self._bridge_port}/health",
+                                timeout=aiohttp.ClientTimeout(total=2)
+                            ) as resp:
+                                if resp.status == 200:
+                                    data = await resp.json()
+                                    if data.get("status") == "connected":
+                                        print(f"[{self.name}] Bridge ready (status: connected)")
+                                        break
+                    except Exception:
+                        continue
+                else:
+                    # Still not connected — warn but proceed (bridge may
+                    # auto-reconnect later, e.g. after a code 515 restart).
+                    print(f"[{self.name}] ⚠ WhatsApp not connected after 30s")
+                    print(f"[{self.name}]   Bridge log: {self._bridge_log}")
+                    print(f"[{self.name}]   If session expired, re-pair: hermes whatsapp")
+            
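+            # Mark the adapter as running before the poll loop starts, and keep a
+            # reference to the task so it is not garbage-collected mid-execution.
+            self._running = True
+            self._poll_task = asyncio.create_task(self._poll_messages())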
+            print(f"[{self.name}] Bridge started on port {self._bridge_port}")
+            return True
+            
+        except Exception as e:
+            logger.error("[%s] Failed to start bridge: %s", self.name, e, exc_info=True)
+            self._close_bridge_log()
+            return False
+    
+    def _close_bridge_log(self) -> None:
+        """Close the bridge log file handle if open."""
+        if self._bridge_log_fh:
+            try:
+                self._bridge_log_fh.close()
+            except Exception:
+                pass
+            self._bridge_log_fh = None
+
+    async def disconnect(self) -> None:
+        """Stop the WhatsApp bridge and clean up any orphaned processes."""
+        if self._bridge_process:
+            try:
+                # Kill the entire process group so child node processes die too
+                import signal
+                try:
+                    if _IS_WINDOWS:
+                        self._bridge_process.terminate()
+                    else:
+                        os.killpg(os.getpgid(self._bridge_process.pid), signal.SIGTERM)
+                except (ProcessLookupError, PermissionError):
+                    self._bridge_process.terminate()
+                await asyncio.sleep(1)
+                if self._bridge_process.poll() is None:
+                    try:
+                        if _IS_WINDOWS:
+                            self._bridge_process.kill()
+                        else:
+                            os.killpg(os.getpgid(self._bridge_process.pid), signal.SIGKILL)
+                    except (ProcessLookupError, PermissionError):
+                        self._bridge_process.kill()
+            except Exception as e:
+                print(f"[{self.name}] Error stopping bridge: {e}")
+        
+        # Also kill any orphaned bridge processes on our port
+        _kill_port_process(self._bridge_port)
+        
+        self._running = False
+        self._bridge_process = None
+        self._close_bridge_log()
+        print(f"[{self.name}] Disconnected")
+    
+    async def send(
+        self,
+        chat_id: str,
+        content: str,
+        reply_to: Optional[str] = None,
+        metadata: Optional[Dict[str, Any]] = None
+    ) -> SendResult:
+        """Send a message via the WhatsApp bridge."""
+        if not self._running:
+            return SendResult(success=False, error="Not connected")
+        
+        try:
+            import aiohttp
+            
+            async with aiohttp.ClientSession() as session:
+                payload = {
+                    "chatId": chat_id,
+                    "message": content,
+                }
+                if reply_to:
+                    payload["replyTo"] = reply_to
+                
+                async with session.post(
+                    f"http://localhost:{self._bridge_port}/send",
+                    json=payload,
+                    timeout=aiohttp.ClientTimeout(total=30)
+                ) as resp:
+                    if resp.status == 200:
+                        data = await resp.json()
+                        return SendResult(
+                            success=True,
+                            message_id=data.get("messageId"),
+                            raw_response=data
+                        )
+                    else:
+                        error = await resp.text()
+                        return SendResult(success=False, error=error)
+                        
+        except ImportError:
+            return SendResult(
+                success=False, 
+                error="aiohttp not installed. Run: pip install aiohttp"
+            )
+        except Exception as e:
+            return SendResult(success=False, error=str(e))
+
+    async def edit_message(
+        self,
+        chat_id: str,
+        message_id: str,
+        content: str,
+    ) -> SendResult:
+        """Edit a previously sent message via the WhatsApp bridge."""
+        if not self._running:
+            return SendResult(success=False, error="Not connected")
+        try:
+            import aiohttp
+            async with aiohttp.ClientSession() as session:
+                async with session.post(
+                    f"http://localhost:{self._bridge_port}/edit",
+                    json={
+                        "chatId": chat_id,
+                        "messageId": message_id,
+                        "message": content,
+                    },
+                    timeout=aiohttp.ClientTimeout(total=15)
+                ) as resp:
+                    if resp.status == 200:
+                        return SendResult(success=True, message_id=message_id)
+                    else:
+                        error = await resp.text()
+                        return SendResult(success=False, error=error)
+        except Exception as e:
+            return SendResult(success=False, error=str(e))
+
+    async def _send_media_to_bridge(
+        self,
+        chat_id: str,
+        file_path: str,
+        media_type: str,
+        caption: Optional[str] = None,
+        file_name: Optional[str] = None,
+    ) -> SendResult:
+        """Send any media file via bridge /send-media endpoint."""
+        if not self._running:
+            return SendResult(success=False, error="Not connected")
+        try:
+            import aiohttp
+
+            if not os.path.exists(file_path):
+                return SendResult(success=False, error=f"File not found: {file_path}")
+
+            payload: Dict[str, Any] = {
+                "chatId": chat_id,
+                "filePath": file_path,
+                "mediaType": media_type,
+            }
+            if caption:
+                payload["caption"] = caption
+            if file_name:
+                payload["fileName"] = file_name
+
+            async with aiohttp.ClientSession() as session:
+                async with session.post(
+                    f"http://localhost:{self._bridge_port}/send-media",
+                    json=payload,
+                    timeout=aiohttp.ClientTimeout(total=120),
+                ) as resp:
+                    if resp.status == 200:
+                        data = await resp.json()
+                        return SendResult(
+                            success=True,
+                            message_id=data.get("messageId"),
+                            raw_response=data,
+                        )
+                    else:
+                        error = await resp.text()
+                        return SendResult(success=False, error=error)
+
+        except Exception as e:
+            return SendResult(success=False, error=str(e))
+
+    async def send_image(
+        self,
+        chat_id: str,
+        image_url: str,
+        caption: Optional[str] = None,
+        reply_to: Optional[str] = None,
+    ) -> SendResult:
+        """Download image URL to cache, send natively via bridge."""
+        try:
+            local_path = await cache_image_from_url(image_url)
+            return await self._send_media_to_bridge(chat_id, local_path, "image", caption)
+        except Exception:
+            return await super().send_image(chat_id, image_url, caption, reply_to)
+
+    async def send_image_file(
+        self,
+        chat_id: str,
+        image_path: str,
+        caption: Optional[str] = None,
+        reply_to: Optional[str] = None,
+    ) -> SendResult:
+        """Send a local image file natively via bridge."""
+        return await self._send_media_to_bridge(chat_id, image_path, "image", caption)
+
+    async def send_video(
+        self,
+        chat_id: str,
+        video_path: str,
+        caption: Optional[str] = None,
+        reply_to: Optional[str] = None,
+    ) -> SendResult:
+        """Send a video natively via bridge — plays inline in WhatsApp."""
+        return await self._send_media_to_bridge(chat_id, video_path, "video", caption)
+
+    async def send_document(
+        self,
+        chat_id: str,
+        file_path: str,
+        caption: Optional[str] = None,
+        file_name: Optional[str] = None,
+        reply_to: Optional[str] = None,
+    ) -> SendResult:
+        """Send a document/file as a downloadable attachment via bridge."""
+        return await self._send_media_to_bridge(
+            chat_id, file_path, "document", caption,
+            file_name or os.path.basename(file_path),
+        )
+
+    async def send_typing(self, chat_id: str) -> None:
+        """Send typing indicator via bridge."""
+        if not self._running:
+            return
+        
+        try:
+            import aiohttp
+            
+            async with aiohttp.ClientSession() as session:
+                await session.post(
+                    f"http://localhost:{self._bridge_port}/typing",
+                    json={"chatId": chat_id},
+                    timeout=aiohttp.ClientTimeout(total=5)
+                )
+        except Exception:
+            pass  # Ignore typing indicator failures
+    
+    async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
+        """Get information about a WhatsApp chat."""
+        if not self._running:
+            return {"name": "Unknown", "type": "dm"}
+        
+        try:
+            import aiohttp
+            
+            async with aiohttp.ClientSession() as session:
+                async with session.get(
+                    f"http://localhost:{self._bridge_port}/chat/{chat_id}",
+                    timeout=aiohttp.ClientTimeout(total=10)
+                ) as resp:
+                    if resp.status == 200:
+                        data = await resp.json()
+                        return {
+                            "name": data.get("name", chat_id),
+                            "type": "group" if data.get("isGroup") else "dm",
+                            "participants": data.get("participants", []),
+                        }
+        except Exception as e:
+            logger.debug("Could not get WhatsApp chat info for %s: %s", chat_id, e)
+        
+        return {"name": chat_id, "type": "dm"}
+    
+    async def _poll_messages(self) -> None:
+        """Poll the bridge for incoming messages."""
+        try:
+            import aiohttp
+        except ImportError:
+            print(f"[{self.name}] aiohttp not installed, message polling disabled")
+            return
+        
+        while self._running:
+            try:
+                async with aiohttp.ClientSession() as session:
+                    async with session.get(
+                        f"http://localhost:{self._bridge_port}/messages",
+                        timeout=aiohttp.ClientTimeout(total=30)
+                    ) as resp:
+                        if resp.status == 200:
+                            messages = await resp.json()
+                            for msg_data in messages:
+                                event = await self._build_message_event(msg_data)
+                                if event:
+                                    await self.handle_message(event)
+            except asyncio.CancelledError:
+                break
+            except Exception as e:
+                print(f"[{self.name}] Poll error: {e}")
+                await asyncio.sleep(5)
+            
+            await asyncio.sleep(1)  # Poll interval
+    
+    async def _build_message_event(self, data: Dict[str, Any]) -> Optional[MessageEvent]:
+        """Build a MessageEvent from bridge message data, caching image and voice media locally."""
+        try:
+            # Determine message type
+            msg_type = MessageType.TEXT
+            if data.get("hasMedia"):
+                media_type = data.get("mediaType", "")
+                if "image" in media_type:
+                    msg_type = MessageType.PHOTO
+                elif "video" in media_type:
+                    msg_type = MessageType.VIDEO
+                elif "audio" in media_type or "ptt" in media_type:  # ptt = voice note
+                    msg_type = MessageType.VOICE
+                else:
+                    msg_type = MessageType.DOCUMENT
+            
+            # Determine chat type
+            is_group = data.get("isGroup", False)
+            chat_type = "group" if is_group else "dm"
+            
+            # Build source
+            source = self.build_source(
+                chat_id=data.get("chatId", ""),
+                chat_name=data.get("chatName"),
+                chat_type=chat_type,
+                user_id=data.get("senderId"),
+                user_name=data.get("senderName"),
+            )
+            
+            # Download image and voice media URLs to the local cache so tools
+            # (e.g. the vision tool) can access them reliably regardless of URL expiration.
+            raw_urls = data.get("mediaUrls", [])
+            cached_urls = []
+            media_types = []
+            for url in raw_urls:
+                if msg_type == MessageType.PHOTO and url.startswith(("http://", "https://")):
+                    try:
+                        cached_path = await cache_image_from_url(url, ext=".jpg")
+                        cached_urls.append(cached_path)
+                        media_types.append("image/jpeg")
+                        print(f"[{self.name}] Cached user image: {cached_path}", flush=True)
+                    except Exception as e:
+                        print(f"[{self.name}] Failed to cache image: {e}", flush=True)
+                        cached_urls.append(url)
+                        media_types.append("image/jpeg")
+                elif msg_type == MessageType.VOICE and url.startswith(("http://", "https://")):
+                    try:
+                        cached_path = await cache_audio_from_url(url, ext=".ogg")
+                        cached_urls.append(cached_path)
+                        media_types.append("audio/ogg")
+                        print(f"[{self.name}] Cached user voice: {cached_path}", flush=True)
+                    except Exception as e:
+                        print(f"[{self.name}] Failed to cache voice: {e}", flush=True)
+                        cached_urls.append(url)
+                        media_types.append("audio/ogg")
+                else:
+                    cached_urls.append(url)
+                    media_types.append("unknown")
+            
+            return MessageEvent(
+                text=data.get("body", ""),
+                message_type=msg_type,
+                source=source,
+                raw_message=data,
+                message_id=data.get("messageId"),
+                media_urls=cached_urls,
+                media_types=media_types,
+            )
+        except Exception as e:
+            print(f"[{self.name}] Error building event: {e}")
+            return None
+
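Note on _build_message_event: the message-type branch above can be read as a small pure function. The sketch below is illustrative only and is not part of this patch. The name classify_media and the plain-string return values are hypothetical stand-ins for the MessageType members used in the adapter. One assumed difference: it treats a mediaType of None as an empty string, whereas in the code above data.get("mediaType", "") would return None if the bridge sent an explicit null.

```python
from typing import Optional


def classify_media(has_media: bool, media_type: Optional[str]) -> str:
    """Map bridge media metadata to a coarse message kind (hypothetical helper)."""
    if not has_media:
        return "text"
    kind = media_type or ""  # assumption: treat a missing/None type as empty
    if "image" in kind:
        return "photo"
    if "video" in kind:
        return "video"
    if "audio" in kind or "ptt" in kind:  # ptt = push-to-talk voice note
        return "voice"
    # Anything else with media attached is handled as a generic document.
    return "document"
```

Keeping the mapping in one function like this would make the fallback to "document" easy to unit-test without a running bridge.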
--- a/gateway/run.py
+++ b/gateway/run.py
--- a/gateway/session.py
+++ b/gateway/session.py
@@ -0,0 +1,772 @@
+"""
+Session management for the gateway.
+
+Handles:
+- Session context tracking (where messages come from)
+- Session storage (conversations persisted to disk)
+- Reset policy evaluation (when to start fresh)
+- Dynamic system prompt injection (agent knows its context)
+"""
+
+import logging
+import os
+import json
+import uuid
+from pathlib import Path
+from datetime import datetime, timedelta
+from dataclasses import dataclass, field
+from typing import Dict, List, Optional, Any
+
+from .config import (
+    Platform,
+    GatewayConfig,
+    SessionResetPolicy,
+    HomeChannel,
+)
+
+logger = logging.getLogger(__name__)
+
+
+@dataclass
+class SessionSource:
+    """
+    Describes where a message originated from.
+    
+    This information is used to:
+    1. Route responses back to the right place
+    2. Inject context into the system prompt
+    3. Track origin for cron job delivery
+    """
+    platform: Platform
+    chat_id: str
+    chat_name: Optional[str] = None
+    chat_type: str = "dm"  # "dm", "group", "channel", "thread"
+    user_id: Optional[str] = None
+    user_name: Optional[str] = None
+    thread_id: Optional[str] = None  # For forum topics, Discord threads, etc.
+    chat_topic: Optional[str] = None  # Channel topic/description (Discord, Slack)
+    user_id_alt: Optional[str] = None  # Signal UUID (alternative to phone number)
+    chat_id_alt: Optional[str] = None  # Signal group internal ID
+    
+    @property
+    def description(self) -> str:
+        """Human-readable description of the source."""
+        if self.platform == Platform.LOCAL:
+            return "CLI terminal"
+        
+        parts = []
+        if self.chat_type == "dm":
+            parts.append(f"DM with {self.user_name or self.user_id or 'user'}")
+        elif self.chat_type == "group":
+            parts.append(f"group: {self.chat_name or self.chat_id}")
+        elif self.chat_type == "channel":
+            parts.append(f"channel: {self.chat_name or self.chat_id}")
+        else:
+            parts.append(self.chat_name or self.chat_id)
+        
+        if self.thread_id:
+            parts.append(f"thread: {self.thread_id}")
+        
+        return ", ".join(parts)
+    
+    def to_dict(self) -> Dict[str, Any]:
+        d = {
+            "platform": self.platform.value,
+            "chat_id": self.chat_id,
+            "chat_name": self.chat_name,
+            "chat_type": self.chat_type,
+            "user_id": self.user_id,
+            "user_name": self.user_name,
+            "thread_id": self.thread_id,
+            "chat_topic": self.chat_topic,
+        }
+        if self.user_id_alt:
+            d["user_id_alt"] = self.user_id_alt
+        if self.chat_id_alt:
+            d["chat_id_alt"] = self.chat_id_alt
+        return d
+    
+    @classmethod
+    def from_dict(cls, data: Dict[str, Any]) -> "SessionSource":
+        return cls(
+            platform=Platform(data["platform"]),
+            chat_id=str(data["chat_id"]),
+            chat_name=data.get("chat_name"),
+            chat_type=data.get("chat_type", "dm"),
+            user_id=data.get("user_id"),
+            user_name=data.get("user_name"),
+            thread_id=data.get("thread_id"),
+            chat_topic=data.get("chat_topic"),
+            user_id_alt=data.get("user_id_alt"),
+            chat_id_alt=data.get("chat_id_alt"),
+        )
+    
+    @classmethod
+    def local_cli(cls) -> "SessionSource":
+        """Create a source representing the local CLI."""
+        return cls(
+            platform=Platform.LOCAL,
+            chat_id="cli",
+            chat_name="CLI terminal",
+            chat_type="dm",
+        )
+
+
+@dataclass
+class SessionContext:
+    """
+    Full context for a session, used for dynamic system prompt injection.
+    
+    The agent receives this information to understand:
+    - Where messages are coming from
+    - What platforms are available
+    - Where it can deliver scheduled task outputs
+    """
+    source: SessionSource
+    connected_platforms: List[Platform]
+    home_channels: Dict[Platform, HomeChannel]
+    
+    # Session metadata
+    session_key: str = ""
+    session_id: str = ""
+    created_at: Optional[datetime] = None
+    updated_at: Optional[datetime] = None
+    
+    def to_dict(self) -> Dict[str, Any]:
+        return {
+            "source": self.source.to_dict(),
+            "connected_platforms": [p.value for p in self.connected_platforms],
+            "home_channels": {
+                p.value: hc.to_dict() for p, hc in self.home_channels.items()
+            },
+            "session_key": self.session_key,
+            "session_id": self.session_id,
+            "created_at": self.created_at.isoformat() if self.created_at else None,
+            "updated_at": self.updated_at.isoformat() if self.updated_at else None,
+        }
+
+
+def build_session_context_prompt(context: SessionContext) -> str:
+    """
+    Build the dynamic system prompt section that tells the agent about its context.
+    
+    This is injected into the system prompt so the agent knows:
+    - Where messages are coming from
+    - What platforms are connected
+    - Where it can deliver scheduled task outputs
+    """
+    lines = [
+        "## Current Session Context",
+        "",
+    ]
+    
+    # Source info
+    platform_name = context.source.platform.value.title()
+    if context.source.platform == Platform.LOCAL:
+        lines.append(f"**Source:** {platform_name} (the machine running this agent)")
+    else:
+        lines.append(f"**Source:** {platform_name} ({context.source.description})")
+    
+    # Channel topic (if available - provides context about the channel's purpose)
+    if context.source.chat_topic:
+        lines.append(f"**Channel Topic:** {context.source.chat_topic}")
+
+    # User identity (especially useful for WhatsApp where multiple people DM)
+    if context.source.user_name:
+        lines.append(f"**User:** {context.source.user_name}")
+    elif context.source.user_id:
+        lines.append(f"**User ID:** {context.source.user_id}")
+    
+    # Connected platforms
+    platforms_list = ["local (files on this machine)"]
+    for p in context.connected_platforms:
+        if p != Platform.LOCAL:
+            platforms_list.append(f"{p.value}: Connected ✓")
+    
+    lines.append(f"**Connected Platforms:** {', '.join(platforms_list)}")
+    
+    # Home channels
+    if context.home_channels:
+        lines.append("")
+        lines.append("**Home Channels (default destinations):**")
+        for platform, home in context.home_channels.items():
+            lines.append(f"  - {platform.value}: {home.name} (ID: {home.chat_id})")
+    
+    # Delivery options for scheduled tasks
+    lines.append("")
+    lines.append("**Delivery options for scheduled tasks:**")
+    
+    # Origin delivery
+    if context.source.platform == Platform.LOCAL:
+        lines.append("- `\"origin\"` → Local output (saved to files)")
+    else:
+        lines.append(f"- `\"origin\"` → Back to this chat ({context.source.chat_name or context.source.chat_id})")
+    
+    # Local always available
+    lines.append("- `\"local\"` → Save to local files only (~/.hermes/cron/output/)")
+    
+    # Platform home channels
+    for platform, home in context.home_channels.items():
+        lines.append(f"- `\"{platform.value}\"` → Home channel ({home.name})")
+    
+    # Note about explicit targeting
+    lines.append("")
+    lines.append("*For explicit targeting, use `\"platform:chat_id\"` format if the user provides a specific chat ID.*")
+    
+    return "\n".join(lines)
+
+
+@dataclass
+class SessionEntry:
+    """
+    Entry in the session store.
+    
+    Maps a session key to its current session ID and metadata.
+    """
+    session_key: str
+    session_id: str
+    created_at: datetime
+    updated_at: datetime
+    
+    # Origin metadata for delivery routing
+    origin: Optional[SessionSource] = None
+    
+    # Display metadata
+    display_name: Optional[str] = None
+    platform: Optional[Platform] = None
+    chat_type: str = "dm"
+    
+    # Token tracking
+    input_tokens: int = 0
+    output_tokens: int = 0
+    total_tokens: int = 0
+    
+    # Set when a session was created because the previous one expired;
+    # consumed once by the message handler to inject a notice into context
+    was_auto_reset: bool = False
+    
+    def to_dict(self) -> Dict[str, Any]:
+        result = {
+            "session_key": self.session_key,
+            "session_id": self.session_id,
+            "created_at": self.created_at.isoformat(),
+            "updated_at": self.updated_at.isoformat(),
+            "display_name": self.display_name,
+            "platform": self.platform.value if self.platform else None,
+            "chat_type": self.chat_type,
+            "input_tokens": self.input_tokens,
+            "output_tokens": self.output_tokens,
+            "total_tokens": self.total_tokens,
+        }
+        if self.origin:
+            result["origin"] = self.origin.to_dict()
+        return result
+    
+    @classmethod
+    def from_dict(cls, data: Dict[str, Any]) -> "SessionEntry":
+        origin = None
+        if "origin" in data and data["origin"]:
+            origin = SessionSource.from_dict(data["origin"])
+        
+        platform = None
+        if data.get("platform"):
+            try:
+                platform = Platform(data["platform"])
+            except ValueError:
+                pass
+        
+        return cls(
+            session_key=data["session_key"],
+            session_id=data["session_id"],
+            created_at=datetime.fromisoformat(data["created_at"]),
+            updated_at=datetime.fromisoformat(data["updated_at"]),
+            origin=origin,
+            display_name=data.get("display_name"),
+            platform=platform,
+            chat_type=data.get("chat_type", "dm"),
+            input_tokens=data.get("input_tokens", 0),
+            output_tokens=data.get("output_tokens", 0),
+            total_tokens=data.get("total_tokens", 0),
+        )
+
+
+def build_session_key(source: SessionSource) -> str:
+    """Build a deterministic session key from a message source.
+
+    This is the single source of truth for session key construction.
+    WhatsApp DMs include chat_id (multi-user); other DMs do not (single owner).
+    """
+    platform = source.platform.value
+    if source.chat_type == "dm":
+        if platform == "whatsapp" and source.chat_id:
+            return f"agent:main:{platform}:dm:{source.chat_id}"
+        return f"agent:main:{platform}:dm"
+    return f"agent:main:{platform}:{source.chat_type}:{source.chat_id}"
+
+
+class SessionStore:
+    """
+    Manages session storage and retrieval.
+    
+    Uses SQLite (via SessionDB) for session metadata and message transcripts.
+    Falls back to legacy JSONL files if SQLite is unavailable.
+    """
+    
+    def __init__(self, sessions_dir: Path, config: GatewayConfig,
+                 has_active_processes_fn=None,
+                 on_auto_reset=None):
+        self.sessions_dir = sessions_dir
+        self.config = config
+        self._entries: Dict[str, SessionEntry] = {}
+        self._loaded = False
+        self._has_active_processes_fn = has_active_processes_fn
+        # on_auto_reset is deprecated — memory flush now runs proactively
+        # via the background session expiry watcher in GatewayRunner.
+        self._pre_flushed_sessions: set = set()  # session_ids already flushed by watcher
+        
+        # Initialize SQLite session database
+        self._db = None
+        try:
+            from hermes_state import SessionDB
+            self._db = SessionDB()
+        except Exception as e:
+            print(f"[gateway] Warning: SQLite session store unavailable, falling back to JSONL: {e}")
+    
+    def _ensure_loaded(self) -> None:
+        """Load sessions index from disk if not already loaded."""
+        if self._loaded:
+            return
+        
+        self.sessions_dir.mkdir(parents=True, exist_ok=True)
+        sessions_file = self.sessions_dir / "sessions.json"
+        
+        if sessions_file.exists():
+            try:
+                with open(sessions_file, "r", encoding="utf-8") as f:
+                    data = json.load(f)
+                    for key, entry_data in data.items():
+                        self._entries[key] = SessionEntry.from_dict(entry_data)
+            except Exception as e:
+                print(f"[gateway] Warning: Failed to load sessions: {e}")
+        
+        self._loaded = True
+    
+    def _save(self) -> None:
+        """Save sessions index to disk (kept for session key -> ID mapping)."""
+        self.sessions_dir.mkdir(parents=True, exist_ok=True)
+        sessions_file = self.sessions_dir / "sessions.json"
+        
+        data = {key: entry.to_dict() for key, entry in self._entries.items()}
+        with open(sessions_file, "w", encoding="utf-8") as f:
+            json.dump(data, f, indent=2)
+    
+    def _generate_session_key(self, source: SessionSource) -> str:
+        """Generate a session key from a source."""
+        return build_session_key(source)
+    
+    def _is_session_expired(self, entry: SessionEntry) -> bool:
+        """Check if a session has expired based on its reset policy.
+        
+        Works from the entry alone — no SessionSource needed.
+        Used by the background expiry watcher to proactively flush memories.
+        Sessions with active background processes are never considered expired.
+        """
+        if self._has_active_processes_fn:
+            if self._has_active_processes_fn(entry.session_key):
+                return False
+
+        policy = self.config.get_reset_policy(
+            platform=entry.platform,
+            session_type=entry.chat_type,
+        )
+
+        if policy.mode == "none":
+            return False
+
+        now = datetime.now()
+
+        if policy.mode in ("idle", "both"):
+            idle_deadline = entry.updated_at + timedelta(minutes=policy.idle_minutes)
+            if now > idle_deadline:
+                return True
+
+        if policy.mode in ("daily", "both"):
+            today_reset = now.replace(
+                hour=policy.at_hour,
+                minute=0, second=0, microsecond=0,
+            )
+            if now.hour < policy.at_hour:
+                today_reset -= timedelta(days=1)
+            if entry.updated_at < today_reset:
+                return True
+
+        return False
+
+    def _should_reset(self, entry: SessionEntry, source: SessionSource) -> bool:
+        """
+        Check if a session should be reset based on policy.
+        
+        Sessions with active background processes are never reset.
+        """
+        if self._has_active_processes_fn:
+            session_key = self._generate_session_key(source)
+            if self._has_active_processes_fn(session_key):
+                return False
+
+        policy = self.config.get_reset_policy(
+            platform=source.platform,
+            session_type=source.chat_type
+        )
+        
+        if policy.mode == "none":
+            return False
+        
+        now = datetime.now()
+        
+        if policy.mode in ("idle", "both"):
+            idle_deadline = entry.updated_at + timedelta(minutes=policy.idle_minutes)
+            if now > idle_deadline:
+                return True
+        
+        if policy.mode in ("daily", "both"):
+            today_reset = now.replace(
+                hour=policy.at_hour, 
+                minute=0, 
+                second=0, 
+                microsecond=0
+            )
+            if now.hour < policy.at_hour:
+                today_reset -= timedelta(days=1)
+            
+            if entry.updated_at < today_reset:
+                return True
+        
+        return False
+    
+    def has_any_sessions(self) -> bool:
+        """Check if any sessions have ever been created (across all platforms).
+
+        Uses the SQLite database as the source of truth because it preserves
+        historical session records (ended sessions still count).  The in-memory
+        ``_entries`` dict replaces entries on reset, so ``len(_entries)`` would
+        stay at 1 for single-platform users — which is the bug this fixes.
+
+        The current session is already in the DB by the time this is called
+        (get_or_create_session runs first), so we check ``> 1``.
+        """
+        if self._db:
+            try:
+                return self._db.session_count() > 1
+            except Exception:
+                pass  # fall through to heuristic
+        # Fallback: check if sessions.json was loaded with existing data.
+        # This covers the rare case where the DB is unavailable.
+        self._ensure_loaded()
+        return len(self._entries) > 1
+    
+    def get_or_create_session(
+        self, 
+        source: SessionSource,
+        force_new: bool = False
+    ) -> SessionEntry:
+        """
+        Get an existing session or create a new one.
+        
+        Evaluates reset policy to determine if the existing session is stale.
+        Creates a session record in SQLite when a new session starts.
+        """
+        self._ensure_loaded()
+        
+        session_key = self._generate_session_key(source)
+        now = datetime.now()
+        
+        if session_key in self._entries and not force_new:
+            entry = self._entries[session_key]
+            
+            if not self._should_reset(entry, source):
+                entry.updated_at = now
+                self._save()
+                return entry
+            else:
+                # Session is being auto-reset.  The background expiry watcher
+                # should have already flushed memories proactively; discard
+                # the marker so it doesn't accumulate.
+                was_auto_reset = True
+                self._pre_flushed_sessions.discard(entry.session_id)
+                if self._db:
+                    try:
+                        self._db.end_session(entry.session_id, "session_reset")
+                    except Exception as e:
+                        logger.debug("Session DB operation failed: %s", e)
+        else:
+            was_auto_reset = False
+        
+        # Create new session
+        session_id = f"{now.strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:8]}"
+        
+        entry = SessionEntry(
+            session_key=session_key,
+            session_id=session_id,
+            created_at=now,
+            updated_at=now,
+            origin=source,
+            display_name=source.chat_name,
+            platform=source.platform,
+            chat_type=source.chat_type,
+            was_auto_reset=was_auto_reset,
+        )
+        
+        self._entries[session_key] = entry
+        self._save()
+        
+        # Create session in SQLite
+        if self._db:
+            try:
+                self._db.create_session(
+                    session_id=session_id,
+                    source=source.platform.value,
+                    user_id=source.user_id,
+                )
+            except Exception as e:
+                print(f"[gateway] Warning: Failed to create SQLite session: {e}")
+        
+        return entry
+    
+    def update_session(
+        self, 
+        session_key: str,
+        input_tokens: int = 0,
+        output_tokens: int = 0
+    ) -> None:
+        """Update a session's metadata after an interaction."""
+        self._ensure_loaded()
+        
+        if session_key in self._entries:
+            entry = self._entries[session_key]
+            entry.updated_at = datetime.now()
+            entry.input_tokens += input_tokens
+            entry.output_tokens += output_tokens
+            entry.total_tokens = entry.input_tokens + entry.output_tokens
+            self._save()
+            
+            if self._db:
+                try:
+                    self._db.update_token_counts(
+                        entry.session_id, input_tokens, output_tokens
+                    )
+                except Exception as e:
+                    logger.debug("Session DB operation failed: %s", e)
+    
+    def reset_session(self, session_key: str) -> Optional[SessionEntry]:
+        """Force reset a session, creating a new session ID."""
+        self._ensure_loaded()
+        
+        if session_key not in self._entries:
+            return None
+        
+        old_entry = self._entries[session_key]
+        
+        # End old session in SQLite
+        if self._db:
+            try:
+                self._db.end_session(old_entry.session_id, "session_reset")
+            except Exception as e:
+                logger.debug("Session DB operation failed: %s", e)
+        
+        now = datetime.now()
+        session_id = f"{now.strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:8]}"
+        
+        new_entry = SessionEntry(
+            session_key=session_key,
+            session_id=session_id,
+            created_at=now,
+            updated_at=now,
+            origin=old_entry.origin,
+            display_name=old_entry.display_name,
+            platform=old_entry.platform,
+            chat_type=old_entry.chat_type,
+        )
+        
+        self._entries[session_key] = new_entry
+        self._save()
+        
+        # Create new session in SQLite
+        if self._db:
+            try:
+                self._db.create_session(
+                    session_id=session_id,
+                    source=old_entry.platform.value if old_entry.platform else "unknown",
+                    user_id=old_entry.origin.user_id if old_entry.origin else None,
+                )
+            except Exception as e:
+                logger.debug("Session DB operation failed: %s", e)
+        
+        return new_entry
+
+    def switch_session(self, session_key: str, target_session_id: str) -> Optional[SessionEntry]:
+        """Switch a session key to point at an existing session ID.
+
+        Used by ``/resume`` to restore a previously named session.
+        Ends the current session in SQLite (like reset), but instead of
+        generating a fresh session ID, re-uses ``target_session_id`` so the
+        old transcript is loaded on the next message.
+        """
+        self._ensure_loaded()
+
+        if session_key not in self._entries:
+            return None
+
+        old_entry = self._entries[session_key]
+
+        # Don't switch if already on that session
+        if old_entry.session_id == target_session_id:
+            return old_entry
+
+        # End the current session in SQLite
+        if self._db:
+            try:
+                self._db.end_session(old_entry.session_id, "session_switch")
+            except Exception as e:
+                logger.debug("Session DB end_session failed: %s", e)
+
+        now = datetime.now()
+        new_entry = SessionEntry(
+            session_key=session_key,
+            session_id=target_session_id,
+            created_at=now,
+            updated_at=now,
+            origin=old_entry.origin,
+            display_name=old_entry.display_name,
+            platform=old_entry.platform,
+            chat_type=old_entry.chat_type,
+        )
+
+        self._entries[session_key] = new_entry
+        self._save()
+        return new_entry
+
+    def list_sessions(self, active_minutes: Optional[int] = None) -> List[SessionEntry]:
+        """List all sessions, optionally filtered by activity."""
+        self._ensure_loaded()
+        
+        entries = list(self._entries.values())
+        
+        if active_minutes is not None:
+            cutoff = datetime.now() - timedelta(minutes=active_minutes)
+            entries = [e for e in entries if e.updated_at >= cutoff]
+        
+        entries.sort(key=lambda e: e.updated_at, reverse=True)
+        
+        return entries
+    
+    def get_transcript_path(self, session_id: str) -> Path:
+        """Get the path to a session's legacy transcript file."""
+        return self.sessions_dir / f"{session_id}.jsonl"
+    
+    def append_to_transcript(self, session_id: str, message: Dict[str, Any]) -> None:
+        """Append a message to a session's transcript (SQLite + legacy JSONL)."""
+        # Write to SQLite
+        if self._db:
+            try:
+                self._db.append_message(
+                    session_id=session_id,
+                    role=message.get("role", "unknown"),
+                    content=message.get("content"),
+                    tool_name=message.get("tool_name"),
+                    tool_calls=message.get("tool_calls"),
+                    tool_call_id=message.get("tool_call_id"),
+                )
+            except Exception as e:
+                logger.debug("Session DB operation failed: %s", e)
+        
+        # Also write legacy JSONL (keeps existing tooling working during transition)
+        transcript_path = self.get_transcript_path(session_id)
+        with open(transcript_path, "a", encoding="utf-8") as f:
+            f.write(json.dumps(message, ensure_ascii=False) + "\n")
+    
+    def rewrite_transcript(self, session_id: str, messages: List[Dict[str, Any]]) -> None:
+        """Replace the entire transcript for a session with new messages.
+        
+        Used by /retry, /undo, and /compress to persist modified conversation history.
+        Rewrites both SQLite and legacy JSONL storage.
+        """
+        # SQLite: clear old messages and re-insert
+        if self._db:
+            try:
+                self._db.clear_messages(session_id)
+                for msg in messages:
+                    self._db.append_message(
+                        session_id=session_id,
+                        role=msg.get("role", "unknown"),
+                        content=msg.get("content"),
+                        tool_name=msg.get("tool_name"),
+                        tool_calls=msg.get("tool_calls"),
+                        tool_call_id=msg.get("tool_call_id"),
+                    )
+            except Exception as e:
+                logger.debug("Failed to rewrite transcript in DB: %s", e)
+        
+        # JSONL: overwrite the file
+        transcript_path = self.get_transcript_path(session_id)
+        with open(transcript_path, "w", encoding="utf-8") as f:
+            for msg in messages:
+                f.write(json.dumps(msg, ensure_ascii=False) + "\n")
+
+    def load_transcript(self, session_id: str) -> List[Dict[str, Any]]:
+        """Load all messages from a session's transcript."""
+        # Try SQLite first
+        if self._db:
+            try:
+                messages = self._db.get_messages_as_conversation(session_id)
+                if messages:
+                    return messages
+            except Exception as e:
+                logger.debug("Could not load messages from DB: %s", e)
+        
+        # Fall back to legacy JSONL
+        transcript_path = self.get_transcript_path(session_id)
+        
+        if not transcript_path.exists():
+            return []
+        
+        messages = []
+        with open(transcript_path, "r", encoding="utf-8") as f:
+            for line in f:
+                line = line.strip()
+                if line:
+                    messages.append(json.loads(line))
+        
+        return messages
+
+
+def build_session_context(
+    source: SessionSource,
+    config: GatewayConfig,
+    session_entry: Optional[SessionEntry] = None
+) -> SessionContext:
+    """
+    Build a full session context from a source and config.
+    
+    This is used to inject context into the agent's system prompt.
+    """
+    connected = config.get_connected_platforms()
+    
+    home_channels = {}
+    for platform in connected:
+        home = config.get_home_channel(platform)
+        if home:
+            home_channels[platform] = home
+    
+    context = SessionContext(
+        source=source,
+        connected_platforms=connected,
+        home_channels=home_channels,
+    )
+    
+    if session_entry:
+        context.session_key = session_entry.session_key
+        context.session_id = session_entry.session_id
+        context.created_at = session_entry.created_at
+        context.updated_at = session_entry.updated_at
+    
+    return context
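
The JSONL half of `rewrite_transcript` and `load_transcript` above can be exercised in isolation. The following is a minimal sketch, not the gateway's actual code: `rewrite_jsonl`, `load_jsonl`, and `demo_round_trip` are hypothetical stand-ins that take a file path directly instead of going through `get_transcript_path`, and the SQLite path is left out entirely.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def rewrite_jsonl(path: Path, messages: List[Dict[str, Any]]) -> None:
    """Overwrite the file with one JSON object per line (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        for msg in messages:
            # ensure_ascii=False keeps non-ASCII text readable in the file
            f.write(json.dumps(msg, ensure_ascii=False) + "\n")


def load_jsonl(path: Path) -> List[Dict[str, Any]]:
    """Read messages back, skipping blank lines; a missing file yields []."""
    if not path.exists():
        return []
    messages: List[Dict[str, Any]] = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                messages.append(json.loads(line))
    return messages


def demo_round_trip() -> List[Dict[str, Any]]:
    """Simulate an /undo: drop the last message and persist the shorter history."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "session.jsonl"
        rewrite_jsonl(
            path,
            [
                {"role": "user", "content": "héllo"},
                {"role": "assistant", "content": "hi"},
            ],
        )
        rewrite_jsonl(path, load_jsonl(path)[:-1])
        return load_jsonl(path)
```

Opening the file in `"w"` mode truncates it first, which is what makes the rewrite a full replacement and not an append.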
--- a/gateway/status.py
+++ b/gateway/status.py
@@ -0,0 +1,61 @@
+"""
+Gateway runtime status helpers.
+
+Provides PID-file based detection of whether the gateway daemon is running,
+used by send_message's check_fn to gate availability in the CLI.
+
+The PID file lives at ``{HERMES_HOME}/gateway.pid``.  HERMES_HOME defaults to
+``~/.hermes`` but can be overridden via the environment variable.  This means
+separate HERMES_HOME directories naturally get separate PID files — a property
+that will be useful when we add named profiles (multiple agents running
+concurrently under distinct configurations).
+"""
+
+import os
+from pathlib import Path
+from typing import Optional
+
+
+def _get_pid_path() -> Path:
+    """Return the path to the gateway PID file, respecting HERMES_HOME."""
+    home = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
+    return home / "gateway.pid"
+
+
+def write_pid_file() -> None:
+    """Write the current process PID to the gateway PID file."""
+    pid_path = _get_pid_path()
+    pid_path.parent.mkdir(parents=True, exist_ok=True)
+    pid_path.write_text(str(os.getpid()))
+
+
+def remove_pid_file() -> None:
+    """Remove the gateway PID file if it exists."""
+    try:
+        _get_pid_path().unlink(missing_ok=True)
+    except Exception:
+        pass
+
+
+def get_running_pid() -> Optional[int]:
+    """Return the PID of a running gateway instance, or ``None``.
+
+    Checks the PID file and verifies the process is actually alive.
+    Cleans up stale PID files automatically.
+    """
+    pid_path = _get_pid_path()
+    if not pid_path.exists():
+        return None
+    try:
+        pid = int(pid_path.read_text().strip())
+        os.kill(pid, 0)  # signal 0 = existence check, no actual signal sent
+        return pid
+    except (ValueError, ProcessLookupError, PermissionError):
+        # Stale or corrupt PID file; PermissionError is also treated as "not running"
+        remove_pid_file()
+        return None
+
+
+def is_gateway_running() -> bool:
+    """Check if the gateway daemon is currently running."""
+    return get_running_pid() is not None
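
The signal-0 probe used by `get_running_pid` can be sketched on its own. This is an illustrative variation under stated assumptions, not the code in the diff: `pid_is_alive` and `parse_pid` are hypothetical names, the sketch assumes a POSIX system (on Windows `os.kill` has different semantics), and unlike the diff it reports `PermissionError` as "alive", since that error means the process exists but belongs to another user.

```python
import os
from typing import Optional


def pid_is_alive(pid: int) -> bool:
    """Probe a PID with signal 0 (POSIX); no signal is actually delivered."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no process with this PID
    except PermissionError:
        return True  # process exists but is owned by another user
    return True


def parse_pid(text: str) -> Optional[int]:
    """Parse PID-file contents; return None for corrupt or non-positive values."""
    try:
        pid = int(text.strip())
    except ValueError:
        return None
    # PIDs <= 0 address process groups in kill(2), so reject them here
    return pid if pid > 0 else None
```

Whether a permission failure should count as "running" is a design choice: the diff takes the conservative route of cleaning up the PID file, which is reasonable when the gateway is always started by the same user.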