feat: make tinker-atropos RL training fully optional

The tinker-atropos submodule and its heavy dependencies (atroposlib, tinker, wandb, fastapi, uvicorn) were being installed for all users by default, adding significant install time and disk usage for most users who don't need RL training capabilities.

Changes:
- install.sh: Only init mini-swe-agent submodule by default; skip tinker-atropos clone and install entirely
- install.sh: Remove --recurse-submodules from git clone (only fetches what's needed)
- pyproject.toml: Add [rl] optional dependency group for explicit opt-in
- rl_training_tool.py: Move LOGS_DIR.mkdir() from module-level to lazy init (_ensure_logs_dir) to avoid side effects on import
- README.md: Update contributor quick start to not auto-fetch tinker-atropos; add RL opt-in instructions

Users who want RL training can opt in with:
  git submodule update --init tinker-atropos
  uv pip install -e ./tinker-atropos
fix: use session_key instead of chat_id for adapter interrupt lookups
2026-03-12 09:11:44 -07:00 · 2026-03-12 08:35:45 -07:00 · 2026-03-12 08:26:24 -07:00 · 2026-03-12 08:23:05 -07:00 · 2026-03-12 08:21:36 -07:00 · 2026-03-12 08:20:12 -07:00
167 changed files with 18653 additions and 2327 deletions
@@ -201,6 +201,18 @@ VOICE_TOOLS_OPENAI_KEY=
 # WHATSAPP_ENABLED=false
 # WHATSAPP_ALLOWED_USERS=15551234567

+# Email (IMAP/SMTP — send and receive emails as Hermes)
+# For Gmail: enable 2FA → create App Password at https://myaccount.google.com/apppasswords
+# EMAIL_ADDRESS=hermes@gmail.com
+# EMAIL_PASSWORD=xxxx xxxx xxxx xxxx
+# EMAIL_IMAP_HOST=imap.gmail.com
+# EMAIL_IMAP_PORT=993
+# EMAIL_SMTP_HOST=smtp.gmail.com
+# EMAIL_SMTP_PORT=587
+# EMAIL_POLL_INTERVAL=15
+# EMAIL_ALLOWED_USERS=your@email.com
+# EMAIL_HOME_ADDRESS=your@email.com
+
 # Gateway-wide: allow ALL users without an allowlist (default: false = deny)
 # Only set to true if you intentionally want open access.
 # GATEWAY_ALLOW_ALL_USERS=false
@@ -34,7 +34,7 @@ jobs:
      - name: Run tests
        run: |
          source .venv/bin/activate
-          python -m pytest tests/ -q --ignore=tests/integration --tb=short
+          python -m pytest tests/ -q --ignore=tests/integration --tb=short -n auto
        env:
          # Ensure tests don't accidentally call real APIs
          OPENROUTER_API_KEY: ""
@@ -1,51 +1,55 @@
-/venv/
-/_pycache/
-*.pyc*
-__pycache__/
-.venv/
-.vscode/
-.env
-.env.local
-.env.development.local
-.env.test.local
-.env.production.local
-.env.development
-.env.test
-export*
-__pycache__/model_tools.cpython-310.pyc
-__pycache__/web_tools.cpython-310.pyc
-logs/
-data/
-.pytest_cache/
-tmp/
-temp_vision_images/
-hermes-*/*
-examples/
-tests/quick_test_dataset.jsonl
-tests/sample_dataset.jsonl
-run_datagen_kimik2-thinking.sh
-run_datagen_megascience_glm4-6.sh
-run_datagen_sonnet.sh
-source-data/*
-run_datagen_megascience_glm4-6.sh
-data/*
-node_modules/
-browser-use/
-agent-browser/
-# Private keys
-*.ppk
-*.pem
-privvy*
-images/
-__pycache__/
-hermes_agent.egg-info/
-wandb/
-testlogs
-
-# CLI config (may contain sensitive SSH paths)
-cli-config.yaml
-
-# Skills Hub state (lives in ~/.hermes/skills/.hub/ at runtime, but just in case)
-skills/.hub/
+/venv/
+/_pycache/
+*.pyc*
+__pycache__/
+.venv/
+.vscode/
+.env
+.env.local
+.env.development.local
+.env.test.local
+.env.production.local
+.env.development
+.env.test
+export*
+__pycache__/model_tools.cpython-310.pyc
+__pycache__/web_tools.cpython-310.pyc
+logs/
+data/
+.pytest_cache/
+tmp/
+temp_vision_images/
+hermes-*/*
+examples/
+tests/quick_test_dataset.jsonl
+tests/sample_dataset.jsonl
+run_datagen_kimik2-thinking.sh
+run_datagen_megascience_glm4-6.sh
+run_datagen_sonnet.sh
+source-data/*
+run_datagen_megascience_glm4-6.sh
+data/*
+node_modules/
+browser-use/
+agent-browser/
+# Private keys
+*.ppk
+*.pem
+privvy*
+images/
+__pycache__/
+hermes_agent.egg-info/
+wandb/
+testlogs
+
+# CLI config (may contain sensitive SSH paths)
+cli-config.yaml
+
+# Skills Hub state (lives in ~/.hermes/skills/.hub/ at runtime, but just in case)
+skills/.hub/
 ignored/
 .worktrees/
+environments/benchmarks/evals/
+
+# Release script temp files
+.release_notes.md
@@ -0,0 +1,291 @@
+# OpenAI-Compatible API Server for Hermes Agent
+
+## Motivation
+
+Every major chat frontend (Open WebUI 126k★, LobeChat 73k★, LibreChat 34k★,
+AnythingLLM 56k★, NextChat 87k★, ChatBox 39k★, Jan 26k★, HF Chat-UI 8k★,
+big-AGI 7k★) connects to backends via the OpenAI-compatible REST API with
+SSE streaming. By exposing this endpoint, hermes-agent becomes instantly
+usable as a backend for all of them — no custom adapters needed.
+
+## What It Enables
+
+```
+┌──────────────────┐
+│  Open WebUI      │──┐
+│  LobeChat        │  │    POST /v1/chat/completions
+│  LibreChat       │  ├──► Authorization: Bearer <key>     ┌─────────────────┐
+│  AnythingLLM     │  │    {"messages": [...]}             │  hermes-agent   │
+│  NextChat        │  │                                    │  gateway        │
+│  Any OAI client  │──┘    ◄── SSE streaming response      │  (API server)   │
+└──────────────────┘                                        └─────────────────┘
+```
+
+A user would:
+1. Set `API_SERVER_ENABLED=true` in `~/.hermes/.env`
+2. Run `hermes gateway` (API server starts alongside Telegram/Discord/etc.)
+3. Point Open WebUI (or any frontend) at `http://localhost:8642/v1`
+4. Chat with hermes-agent through any OpenAI-compatible UI
+
+## Endpoints
+
+| Method | Path | Purpose |
+|--------|------|---------|
+| POST | `/v1/chat/completions` | Chat with the agent (streaming + non-streaming) |
+| GET | `/v1/models` | List available "models" (returns hermes-agent as a model) |
+| GET | `/health` | Health check |
+
+## Architecture
+
+### Option A: Gateway Platform Adapter (recommended)
+
+Create `gateway/platforms/api_server.py` as a new platform adapter that
+extends `BasePlatformAdapter`. This is the cleanest approach because:
+
+- Reuses all gateway infrastructure (session management, auth, context building)
+- Runs in the same async loop as other adapters
+- Gets message handling, interrupt support, and session persistence for free
+- Follows the established pattern (like Telegram, Discord, etc.)
+- Uses `aiohttp.web` (already a dependency) for the HTTP server
+
+The adapter would start an `aiohttp.web.Application` server in `connect()`
+and route incoming HTTP requests through the standard `handle_message()` pipeline.
+
+### Option B: Standalone Component
+
+A separate HTTP server class in `gateway/api_server.py` that creates its own
+AIAgent instances directly. Simpler but duplicates session/auth logic.
+
+**Recommendation: Option A** — fits the existing architecture, less code to
+maintain, gets all gateway features for free.
+
+## Request/Response Format
+
+### Chat Completions (non-streaming)
+
+```
+POST /v1/chat/completions
+Authorization: Bearer hermes-api-key-here
+Content-Type: application/json
+
+{
+  "model": "hermes-agent",
+  "messages": [
+    {"role": "system", "content": "You are a helpful assistant."},
+    {"role": "user", "content": "What files are in the current directory?"}
+  ],
+  "stream": false,
+  "temperature": 0.7
+}
+```
+
+Response:
+```json
+{
+  "id": "chatcmpl-abc123",
+  "object": "chat.completion",
+  "created": 1710000000,
+  "model": "hermes-agent",
+  "choices": [{
+    "index": 0,
+    "message": {
+      "role": "assistant",
+      "content": "Here are the files in the current directory:\n..."
+    },
+    "finish_reason": "stop"
+  }],
+  "usage": {
+    "prompt_tokens": 50,
+    "completion_tokens": 200,
+    "total_tokens": 250
+  }
+}
+```
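A minimal sketch of how the adapter might assemble the non-streaming response body shown above. The helper name `build_chat_completion` and its parameters are hypothetical, and the token counts are assumed to come from the agent's own usage accounting.

```python
import time
import uuid


def build_chat_completion(content, model="hermes-agent",
                          prompt_tokens=0, completion_tokens=0):
    """Assemble a Chat Completions response body (hypothetical helper)."""
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex[:12]}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": content},
            "finish_reason": "stop",
        }],
        "usage": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            # total_tokens is derived so the three fields cannot disagree
            "total_tokens": prompt_tokens + completion_tokens,
        },
    }
```

Deriving `total_tokens` rather than accepting it as an argument keeps the usage block internally consistent.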
+
+### Chat Completions (streaming)
+
+Same request with `"stream": true`. Response is SSE:
+
+```
+data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
+
+data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Here "},"finish_reason":null}]}
+
+data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"are "},"finish_reason":null}]}
+
+data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
+
+data: [DONE]
+```
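The framing above can be sketched as a small generator. This is an illustration only: `sse_events` is a hypothetical name, and it emits the same four kinds of events as the example (role chunk, content chunks, finish chunk, `[DONE]` sentinel).

```python
import json


def sse_events(completion_id, pieces):
    """Yield SSE-framed byte strings for one streamed completion."""
    def frame(delta, finish_reason=None):
        chunk = {
            "id": completion_id,
            "object": "chat.completion.chunk",
            "choices": [{"index": 0, "delta": delta,
                         "finish_reason": finish_reason}],
        }
        # Each SSE event is a "data: " line followed by a blank line.
        return f"data: {json.dumps(chunk)}\n\n".encode()

    yield frame({"role": "assistant"})       # role chunk comes first
    for piece in pieces:
        yield frame({"content": piece})      # one chunk per text piece
    yield frame({}, finish_reason="stop")    # empty delta closes the choice
    yield b"data: [DONE]\n\n"                # sentinel, not JSON
```

In Phase 1, `pieces` would hold a single element (the complete response); in Phase 2 it would be fed from the token queue.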
+
+### Models List
+
+```
+GET /v1/models
+Authorization: Bearer hermes-api-key-here
+```
+
+Response:
+```json
+{
+  "object": "list",
+  "data": [{
+    "id": "hermes-agent",
+    "object": "model",
+    "created": 1710000000,
+    "owned_by": "hermes-agent"
+  }]
+}
+```
+
+## Key Design Decisions
+
+### 1. Session Management
+
+The OpenAI Chat Completions API is stateless: each request includes the full conversation.
+But hermes-agent sessions have persistent state (memory, skills, tool context).
+
+**Approach: Hybrid**
+- Default: Stateless. Each request is independent. The `messages` array IS
+  the conversation. No session persistence between requests.
+- Opt-in persistent sessions via `X-Session-ID` header. When provided, the
+  server maintains session state across requests (conversation history,
+  memory context, tool state). This enables richer agent behavior.
+- The session ID also enables interrupt support — a subsequent request with
+  the same session ID while one is running triggers an interrupt.
+
+### 2. Streaming
+
+The agent's `run_conversation()` is synchronous and returns the full response.
+For real SSE streaming, we need to emit chunks as they're generated.
+
+**Phase 1 (MVP):** Run agent in a thread, return the complete response as
+a single SSE chunk + `[DONE]`. This works with all frontends — they just see
+a fast single-chunk response. Not true streaming but functional.
+
+**Phase 2:** Add a response callback to AIAgent that emits text chunks as the
+LLM generates them. The API server captures these via a queue and streams them
+as SSE events. This gives real token-by-token streaming.
+
+**Phase 3:** Stream tool execution progress too — emit tool call/result events
+as the agent works, giving frontends visibility into what the agent is doing.
+
+### 3. Tool Transparency
+
+Two modes:
+- **Opaque (default):** Frontends see only the final response. Tool calls
+  happen server-side and are invisible. Best for general-purpose UIs.
+- **Transparent (opt-in via header):** Tool calls are emitted as OpenAI-format
+  tool_call/tool_result messages in the stream. Useful for agent-aware frontends.
+
+### 4. Authentication
+
+- Bearer token via `Authorization: Bearer <key>` header
+- Token configured via `API_SERVER_KEY` env var
+- Optional: allow unauthenticated local-only access (127.0.0.1 bind)
+- Follows the same pattern as other platform adapters
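A rough sketch of the bearer-token check described in the list above. The function name, its parameters, and the choice to deny when no key is configured are assumptions for illustration, not the adapter's actual behavior.

```python
import hmac


def is_authorized(headers, expected_key, remote_addr="",
                  allow_local_unauthenticated=False):
    """Return True if the request may proceed (hypothetical helper)."""
    # Optional local-only bypass, as mentioned in the design notes.
    if allow_local_unauthenticated and remote_addr in ("127.0.0.1", "::1"):
        return True
    if not expected_key:
        return False  # assumption: no key configured means deny
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Constant-time comparison avoids leaking key contents through timing.
    return hmac.compare_digest(token.strip().encode(), expected_key.encode())
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does not depend on how many leading characters match.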
+
+### 5. Model Mapping
+
+Frontends send `"model": "hermes-agent"` (or any other name). The actual LLM model
+used is configured server-side in config.yaml. The API server maps any
+requested model name to the configured hermes-agent model.
+
+Optionally, allow model passthrough: if the frontend sends
+`"model": "anthropic/claude-sonnet-4"`, the agent uses that model. Controlled
+by a config flag.
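The mapping rule could look roughly like this. `resolve_model` is a hypothetical helper, and treating the literal name `hermes-agent` as an alias for the configured model even when passthrough is on is an assumption.

```python
def resolve_model(requested, configured, allow_override=False):
    """Pick the LLM model to use for a request (hypothetical helper)."""
    # Passthrough only applies when the config flag is set and the client
    # asked for something other than the generic "hermes-agent" alias.
    if allow_override and requested and requested != "hermes-agent":
        return requested
    # Otherwise every requested name maps to the server-side model.
    return configured
```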
+
+## Configuration
+
+```yaml
+# In config.yaml
+api_server:
+  enabled: true
+  port: 8642
+  host: "127.0.0.1"        # localhost only by default
+  key: "your-secret-key"   # or via API_SERVER_KEY env var
+  allow_model_override: false  # let clients choose the model
+  max_concurrent: 5         # max simultaneous requests
+```
+
+Environment variables:
+```bash
+API_SERVER_ENABLED=true
+API_SERVER_PORT=8642
+API_SERVER_HOST=127.0.0.1
+API_SERVER_KEY=your-secret-key
+```
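One way the environment variables above might be read into a config object. The `ApiServerConfig` dataclass and `load_api_server_config` function are hypothetical; the default of `enabled=False` when the variable is unset and the accepted truthy strings are assumptions.

```python
import os
from dataclasses import dataclass


@dataclass
class ApiServerConfig:
    enabled: bool = False
    host: str = "127.0.0.1"   # localhost only by default
    port: int = 8642
    key: str = ""


def load_api_server_config(env=None):
    """Build the config from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env
    return ApiServerConfig(
        enabled=env.get("API_SERVER_ENABLED", "").lower() in ("true", "1", "yes"),
        host=env.get("API_SERVER_HOST", "127.0.0.1"),
        port=int(env.get("API_SERVER_PORT", "8642")),
        key=env.get("API_SERVER_KEY", ""),
    )
```

Accepting an explicit `env` mapping keeps the loader easy to test without touching the real process environment.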
+
+## Implementation Plan
+
+### Phase 1: MVP (non-streaming) — PR
+
+1. `gateway/platforms/api_server.py` — new adapter
+   - aiohttp.web server with endpoints:
+     - `POST /v1/chat/completions` — Chat Completions API (universal compat)
+     - `POST /v1/responses` — Responses API (server-side state, tool preservation)
+     - `GET /v1/models` — list available models
+     - `GET /health` — health check
+   - Bearer token auth middleware
+   - Non-streaming responses (run agent, return full result)
+   - Chat Completions: stateless, messages array is the conversation
+   - Responses API: server-side conversation storage via previous_response_id
+     - Store full internal conversation (including tool calls) keyed by response ID
+     - On subsequent requests, reconstruct full context from stored chain
+   - Frontend system prompt layered on top of hermes-agent's core prompt
+
+2. `gateway/config.py` — add `Platform.API_SERVER` enum + config
+
+3. `gateway/run.py` — register adapter in `_create_adapter()`
+
+4. Tests in `tests/gateway/test_api_server.py`
+
+### Phase 2: SSE Streaming
+
+1. Add response streaming to both endpoints
+   - Chat Completions: `choices[0].delta.content` SSE format
+   - Responses API: semantic events (response.output_text.delta, etc.)
+   - Run agent in thread, collect output via callback queue
+   - Handle client disconnect (cancel agent)
+
+2. Add `stream_callback` parameter to `AIAgent.run_conversation()`
+
+### Phase 3: Enhanced Features
+
+1. Tool call transparency mode (opt-in)
+2. Model passthrough/override
+3. Concurrent request limiting
+4. Usage tracking / rate limiting
+5. CORS headers for browser-based frontends
+6. GET /v1/responses/{id} — retrieve stored response
+7. DELETE /v1/responses/{id} — delete stored response
+
+## Files Changed
+
+| File | Change |
+|------|--------|
+| `gateway/platforms/api_server.py` | NEW — main adapter (~300 lines) |
+| `gateway/config.py` | Add Platform.API_SERVER + config (~20 lines) |
+| `gateway/run.py` | Register adapter in _create_adapter() (~10 lines) |
+| `tests/gateway/test_api_server.py` | NEW — tests (~200 lines) |
+| `cli-config.yaml.example` | Add api_server section |
+| `README.md` | Mention API server in platform list |
+
+## Compatibility Matrix
+
+Once implemented, hermes-agent works as a drop-in backend for:
+
+| Frontend | Stars | How to Connect |
+|----------|-------|---------------|
+| Open WebUI | 126k | Settings → Connections → Add OpenAI API, URL: `http://localhost:8642/v1` |
+| NextChat | 87k | BASE_URL env var |
+| LobeChat | 73k | Custom provider endpoint |
+| AnythingLLM | 56k | LLM Provider → Generic OpenAI |
+| Oobabooga | 42k | Already a backend, not a frontend |
+| ChatBox | 39k | API Host setting |
+| LibreChat | 34k | librechat.yaml custom endpoint |
+| Chatbot UI | 29k | Custom API endpoint |
+| Jan | 26k | Remote model config |
+| AionUI | 18k | Custom API endpoint |
+| HF Chat-UI | 8k | OPENAI_BASE_URL env var |
+| big-AGI | 7k | Custom endpoint |
@@ -0,0 +1,705 @@
+# Streaming LLM Response Support for Hermes Agent
+
+## Overview
+
+Add token-by-token streaming of LLM responses across all platforms. When enabled,
+users see the response typing out live instead of waiting for the full generation.
+Streaming is opt-in via config, defaults to off, and all existing non-streaming
+code paths remain intact as the default.
+
+## Design Principles
+
+1. **Feature-flagged**: `streaming.enabled: true` in config.yaml. Off by default.
+   When off, all existing code paths are unchanged — zero risk to current behavior.
+2. **Callback-based**: A simple `stream_callback(text_delta: str)` function injected
+   into AIAgent. The agent doesn't know or care what the consumer does with tokens.
+3. **Graceful degradation**: If the provider doesn't support streaming, or streaming
+   fails for any reason, silently fall back to the non-streaming path.
+4. **Platform-agnostic core**: The streaming mechanism in AIAgent works the same
+   regardless of whether the consumer is CLI, Telegram, Discord, or the API server.
+
+---
+
+## Architecture
+
+```
+                              stream_callback(delta)
+                                    │
+  ┌─────────────┐    ┌─────────────▼──────────────┐
+  │  LLM API    │    │      queue.Queue()          │
+  │  (stream)   │───►│  thread-safe bridge between │
+  │             │    │  agent thread & consumer    │
+  └─────────────┘    └─────────────┬──────────────┘
+                                   │
+                    ┌──────────────┼──────────────┐
+                    │              │              │
+              ┌─────▼─────┐ ┌─────▼─────┐ ┌─────▼─────┐
+              │    CLI     │ │  Gateway  │ │ API Server│
+              │ print to   │ │ edit msg  │ │ SSE event │
+              │ terminal   │ │ on Tg/Dc  │ │ to client │
+              └───────────┘ └───────────┘ └───────────┘
+```
+
+The agent runs in a thread. The callback puts tokens into a thread-safe queue.
+Each consumer reads the queue in its own context (async task, main thread, etc.).
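The bridge described above can be illustrated with the standard library alone. This is a simplified sketch: `run_with_stream` and `fake_agent` are hypothetical stand-ins, and `None` is used as the end-of-stream marker as in the rest of this plan.

```python
import queue
import threading


def run_with_stream(produce):
    """Run `produce(callback)` in a thread and collect what it emits."""
    q = queue.Queue()

    def worker():
        try:
            produce(q.put)   # producer calls callback(delta) per token
        finally:
            q.put(None)      # None marks end of stream, even on error

    threading.Thread(target=worker, daemon=True).start()

    received = []
    while True:
        delta = q.get()      # blocks until the producer emits something
        if delta is None:
            break
        received.append(delta)
    return received


def fake_agent(callback):
    """Stand-in for the agent thread; emits three tokens."""
    for token in ["Hel", "lo", "!"]:
        callback(token)
```

Putting the sentinel in a `finally` block means the consumer is released even if the producer raises.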
+
+---
+
+## Configuration
+
+### config.yaml
+
+```yaml
+streaming:
+  enabled: false          # Master switch. Default off.
+  # Per-platform overrides (optional):
+  # cli: true             # Override for CLI only
+  # telegram: true        # Override for Telegram only
+  # discord: false        # Keep Discord non-streaming
+  # api_server: true      # Override for API server
+```
+
+### Environment variables
+
+```
+HERMES_STREAMING_ENABLED=true    # Master switch via env
+```
+
+### How the flag is read
+
+- **CLI**: `load_cli_config()` reads `streaming.enabled`, sets env var. AIAgent
+  checks at init time.
+- **Gateway**: `_run_agent()` reads config, decides whether to pass
+  `stream_callback` to the AIAgent constructor.
+- **API server**: For Chat Completions `stream=true` requests, always uses streaming
+  regardless of config (the client is explicitly requesting it). For non-stream
+  requests, uses config.
+
+### Precedence
+
+1. API server: client's `stream` field overrides everything
+2. Per-platform config override (e.g., `streaming.telegram: true`)
+3. Master `streaming.enabled` flag
+4. Default: off
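The precedence list can be sketched as a single function. `streaming_enabled` is a hypothetical name, and the sketch assumes that only an explicit `stream=true` from an API client forces streaming, while `stream=false` or an absent field falls through to the config, as described under "How the flag is read".

```python
def streaming_enabled(platform, stream_cfg, client_stream=None):
    """Resolve the streaming flag for one request (hypothetical helper)."""
    # 1. An explicit client request on the API server wins.
    if platform == "api_server" and client_stream is True:
        return True
    # 2. Per-platform override, if one is present in the config.
    if stream_cfg.get(platform) is not None:
        return bool(stream_cfg[platform])
    # 3. Master flag, 4. default off.
    return bool(stream_cfg.get("enabled", False))
```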
+
+---
+
+## Implementation Plan
+
+### Phase 1: Core streaming infrastructure in AIAgent
+
+**File: run_agent.py**
+
+#### 1a. Add stream_callback parameter to __init__ (~5 lines)
+
+```python
+from typing import Callable, Optional  # module-level import
+def __init__(self, ..., stream_callback: Optional[Callable[[Optional[str]], None]] = None, ...):
+    self.stream_callback = stream_callback
+```
+
+No other init changes. The callback is optional — when None, everything
+works exactly as before.
+
+#### 1b. Add _run_streaming_chat_completion() method (~65 lines)
+
+New method for Chat Completions API streaming:
+
+```python
+def _run_streaming_chat_completion(self, api_kwargs: dict):
+    """Stream a chat completion, emitting text tokens via stream_callback.
+    
+    Returns a fake response object compatible with the non-streaming code path.
+    Falls back to non-streaming on any error.
+    """
+    stream_kwargs = dict(api_kwargs)
+    stream_kwargs["stream"] = True
+    stream_kwargs["stream_options"] = {"include_usage": True}
+    
+    accumulated_content = []
+    accumulated_tool_calls = {}  # index -> {id, name, arguments}
+    final_usage = None
+    
+    try:
+        stream = self.client.chat.completions.create(**stream_kwargs)
+        
+        for chunk in stream:
+            if not chunk.choices:
+                # Usage-only chunk (final)
+                if chunk.usage:
+                    final_usage = chunk.usage
+                continue
+            
+            delta = chunk.choices[0].delta
+            
+            # Text content — emit via callback
+            if delta.content:
+                accumulated_content.append(delta.content)
+                if self.stream_callback:
+                    try:
+                        self.stream_callback(delta.content)
+                    except Exception:
+                        pass
+            
+            # Tool call deltas — accumulate silently
+            if delta.tool_calls:
+                for tc_delta in delta.tool_calls:
+                    idx = tc_delta.index
+                    if idx not in accumulated_tool_calls:
+                        accumulated_tool_calls[idx] = {
+                            "id": tc_delta.id or "",
+                            "name": "", "arguments": ""
+                        }
+                    if tc_delta.function:
+                        if tc_delta.function.name:
+                            accumulated_tool_calls[idx]["name"] = tc_delta.function.name
+                        if tc_delta.function.arguments:
+                            accumulated_tool_calls[idx]["arguments"] += tc_delta.function.arguments
+        
+        # Build fake response compatible with existing code
+        tool_calls = []
+        for idx in sorted(accumulated_tool_calls):
+            tc = accumulated_tool_calls[idx]
+            if tc["name"]:
+                tool_calls.append(SimpleNamespace(
+                    id=tc["id"], type="function",
+                    function=SimpleNamespace(name=tc["name"], arguments=tc["arguments"]),
+                ))
+        
+        return SimpleNamespace(
+            choices=[SimpleNamespace(
+                message=SimpleNamespace(
+                    content="".join(accumulated_content) or "",
+                    tool_calls=tool_calls or None,
+                    role="assistant",
+                ),
+                finish_reason="tool_calls" if tool_calls else "stop",
+            )],
+            usage=final_usage,
+            model=self.model,
+        )
+    
+    except Exception as e:
+        logger.debug("Streaming failed, falling back to non-streaming: %s", e)
+        return self.client.chat.completions.create(**api_kwargs)
+```
+
+#### 1c. Modify _run_codex_stream() for Responses API (~10 lines)
+
+The method already iterates the stream. Add callback emission:
+
+```python
+def _run_codex_stream(self, api_kwargs: dict):
+    with self.client.responses.stream(**api_kwargs) as stream:
+        for event in stream:
+            # Emit text deltas if streaming callback is set
+            if self.stream_callback and hasattr(event, 'type'):
+                if event.type == 'response.output_text.delta':
+                    try:
+                        self.stream_callback(event.delta)
+                    except Exception:
+                        pass
+        return stream.get_final_response()
+```
+
+#### 1d. Modify _interruptible_api_call() (~5 lines)
+
+Add the streaming branch:
+
+```python
+def _call():
+    try:
+        if self.api_mode == "codex_responses":
+            result["response"] = self._run_codex_stream(api_kwargs)
+        elif self.stream_callback is not None:
+            result["response"] = self._run_streaming_chat_completion(api_kwargs)
+        else:
+            result["response"] = self.client.chat.completions.create(**api_kwargs)
+    except Exception as e:
+        result["error"] = e
+```
+
+#### 1e. Signal end-of-stream to consumers (~5 lines)
+
+After the API call returns, signal the callback that streaming is done
+so consumers can finalize (remove cursor, close SSE, etc.):
+
+```python
+# In run_conversation(), after _interruptible_api_call returns:
+if self.stream_callback:
+    try:
+        self.stream_callback(None)  # None = end of stream signal
+    except Exception:
+        pass
+```
+
+Consumers check: `if delta is None: finalize()`
+
+**Tests for Phase 1:** (~150 lines)
+- Test _run_streaming_chat_completion with mocked stream
+- Test fallback to non-streaming on error
+- Test tool_call accumulation during streaming
+- Test stream_callback receives correct deltas
+- Test None signal at end of stream
+- Test streaming disabled when callback is None
+
+---
+
+### Phase 2: Gateway consumers (Telegram, Discord, etc.)
+
+**File: gateway/run.py**
+
+#### 2a. Read streaming config (~15 lines)
+
+In `_run_agent()`, before creating the AIAgent:
+
+```python
+# Read streaming config
+_streaming_enabled = False
+try:
+    # Check per-platform override first
+    platform_key = source.platform.value if source.platform else ""
+    _stream_cfg = {}  # loaded from config.yaml streaming section
+    if _stream_cfg.get(platform_key) is not None:
+        _streaming_enabled = bool(_stream_cfg[platform_key])
+    else:
+        _streaming_enabled = bool(_stream_cfg.get("enabled", False))
+except Exception:
+    pass
+# Env var override
+if os.getenv("HERMES_STREAMING_ENABLED", "").lower() in ("true", "1", "yes"):
+    _streaming_enabled = True
+```
+
+#### 2b. Set up queue + callback (~15 lines)
+
+```python
+_stream_q = None
+_stream_done = None
+_stream_msg_id = [None]  # mutable ref for the async task
+
+if _streaming_enabled:
+    import queue
+    _stream_q = queue.Queue()
+    _stream_done = threading.Event()
+    
+    def _on_token(delta):
+        if delta is None:
+            _stream_done.set()
+        else:
+            _stream_q.put(delta)
+```
+
+Pass `stream_callback=_on_token` to the AIAgent constructor.
+
+#### 2c. Telegram/Discord stream preview task (~50 lines)
+
+```python
+async def stream_preview():
+    """Progressively edit a message with streaming tokens."""
+    if not _stream_q:
+        return
+    adapter = self.adapters.get(source.platform)
+    if not adapter:
+        return
+    
+    accumulated = []
+    token_count = 0
+    last_edit = 0.0
+    MIN_TOKENS = 20          # Don't show until enough context
+    EDIT_INTERVAL = 1.5      # Respect Telegram rate limits
+    
+    try:
+        while not _stream_done.is_set():
+            try:
+                chunk = _stream_q.get_nowait()
+                accumulated.append(chunk)
+                token_count += 1
+            except queue.Empty:
+                # Yield to the event loop instead of blocking it on the queue
+                await asyncio.sleep(0.1)
+                continue
+            
+            now = time.monotonic()
+            if token_count >= MIN_TOKENS and (now - last_edit) >= EDIT_INTERVAL:
+                preview = "".join(accumulated) + " ▌"
+                if _stream_msg_id[0] is None:
+                    r = await adapter.send(
+                        chat_id=source.chat_id,
+                        content=preview,
+                        metadata=_thread_metadata,
+                    )
+                    if r.success and r.message_id:
+                        _stream_msg_id[0] = r.message_id
+                else:
+                    await adapter.edit_message(
+                        chat_id=source.chat_id,
+                        message_id=_stream_msg_id[0],
+                        content=preview,
+                    )
+                last_edit = now
+        
+        # Drain remaining tokens
+        while not _stream_q.empty():
+            accumulated.append(_stream_q.get_nowait())
+        
+        # Final edit — remove cursor, show complete text
+        if _stream_msg_id[0] and accumulated:
+            await adapter.edit_message(
+                chat_id=source.chat_id,
+                message_id=_stream_msg_id[0],
+                content="".join(accumulated),
+            )
+    
+    except asyncio.CancelledError:
+        # Clean up on cancel
+        if _stream_msg_id[0] and accumulated:
+            try:
+                await adapter.edit_message(
+                    chat_id=source.chat_id,
+                    message_id=_stream_msg_id[0],
+                    content="".join(accumulated),
+                )
+            except Exception:
+                pass
+    except Exception as e:
+        logger.debug("stream_preview error: %s", e)
+```
+
+#### 2d. Skip final send if already streamed (~10 lines)
+
+In `_process_message_background()` (base.py), after getting the response,
+if streaming was active and `_stream_msg_id[0]` is set, the final response
+was already delivered via progressive edits. Skip the normal `self.send()`
+call to avoid duplicating the message.
+
+This is the most delicate integration point — we need to communicate from
+the gateway's `_run_agent` back to the base adapter's response sender that
+the response was already delivered. Options:
+
+- **Option A**: Return a special marker in the result dict:
+  `result["_streamed_msg_id"] = _stream_msg_id[0]`
+  The base adapter checks this and skips `send()`.
+  
+- **Option B**: Edit the already-sent message with the final response
+  (which may differ slightly from accumulated tokens due to think-block
+  stripping, etc.) and don't send a new one.
+
+- **Option C**: The stream preview task handles the FULL final response
+  (including any post-processing), and the handler returns None to skip
+  the normal send path.
+
+Recommended: **Option A** — cleanest separation. The result dict already
+carries metadata; adding one more field is low-risk.
+
+**Platform-specific considerations:**
+
+| Platform | Edit support | Rate limits | Streaming approach |
+|----------|-------------|-------------|-------------------|
+| Telegram | ✅ edit_message_text | ~20 edits/min | Edit every 1.5s |
+| Discord | ✅ message.edit | 5 edits/5s per message | Edit every 1.2s |
+| Slack | ✅ chat.update | Tier 3 (~50/min) | Edit every 1.5s |
+| WhatsApp | ❌ no edit support | N/A | Skip streaming, use normal path |
+| HomeAssistant | ❌ no edit | N/A | Skip streaming |
+| API Server | ✅ SSE native | No limit | Real SSE events |
+
+WhatsApp and HomeAssistant fall back to non-streaming automatically because
+they don't support message editing.
+
+**Tests for Phase 2:** (~100 lines)
+- Test stream_preview sends/edits correctly
+- Test skip-final-send when streaming delivered
+- Test WhatsApp/HA graceful fallback
+- Test streaming disabled per-platform config
+- Test thread_id metadata forwarded in stream messages
+
+---
+
+### Phase 3: CLI streaming
+
+**File: cli.py**
+
+#### 3a. Set up callback in the CLI chat loop (~20 lines)
+
+In `_chat_once()` or wherever the agent is invoked:
+
+```python
+if streaming_enabled:
+    _stream_q = queue.Queue()
+    _stream_done = threading.Event()
+    
+    def _cli_stream_callback(delta):
+        if delta is None:
+            _stream_done.set()
+        else:
+            _stream_q.put(delta)
+    
+    agent.stream_callback = _cli_stream_callback
+```
+
+#### 3b. Token display thread/task (~30 lines)
+
+Start a thread that reads the queue and prints tokens:
+
+```python
+def _stream_display():
+    """Print tokens to terminal as they arrive."""
+    first_token = True
+    while not _stream_done.is_set():
+        try:
+            delta = _stream_q.get(timeout=0.1)
+        except queue.Empty:
+            continue
+        if first_token:
+            # Print response box top border
+            _cprint(f"\n{top}")
+            first_token = False
+        sys.stdout.write(delta)
+        sys.stdout.flush()
+    # Drain remaining
+    while not _stream_q.empty():
+        sys.stdout.write(_stream_q.get_nowait())
+    sys.stdout.flush()
+    # Print bottom border
+    _cprint(f"\n\n{bot}")
+```
+
+**Integration challenge: prompt_toolkit**
+
+The CLI uses prompt_toolkit which controls the terminal. Writing directly
+to stdout while prompt_toolkit is active can cause display corruption.
+The existing KawaiiSpinner already solves this by using prompt_toolkit's
+`patch_stdout` context. The streaming display would need to do the same.
+
+Alternative: use `_cprint()` for each token chunk (routes through
+prompt_toolkit's renderer). But this might be slow for individual tokens.
+
+Recommended approach: accumulate tokens in small batches (e.g., every 50ms)
+and `_cprint()` the batch. This balances display responsiveness with
+prompt_toolkit compatibility.
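The batching idea could be sketched as follows. `read_batch` is a hypothetical helper, and the 50 ms window is the figure suggested above rather than a measured value.

```python
import queue
import time


def read_batch(q, window=0.05):
    """Collect deltas for up to `window` seconds. Returns (text, done)."""
    parts = []
    deadline = time.monotonic() + window
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return "".join(parts), False      # window elapsed
        try:
            delta = q.get(timeout=remaining)
        except queue.Empty:
            return "".join(parts), False      # nothing more arrived in time
        if delta is None:
            return "".join(parts), True       # end-of-stream sentinel
        parts.append(delta)
```

The display thread would call `read_batch` in a loop, pass each non-empty batch to `_cprint()`, and stop once `done` is true.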
+
+**Tests for Phase 3:** (~50 lines)
+- Test CLI streaming callback setup
+- Test response box borders with streaming
+- Test fallback when streaming disabled
+
+---
+
+### Phase 4: API Server real streaming
+
+**File: gateway/platforms/api_server.py**
+
+Replace the pseudo-streaming `_write_sse_chat_completion()` with real
+token-by-token SSE when the agent supports it.
+
+#### 4a. Wire streaming callback for stream=true requests (~20 lines)
+
+```python
+if stream:
+    _stream_q = queue.Queue()
+    
+    def _api_stream_callback(delta):
+        _stream_q.put(delta)  # None = done
+    
+    # Pass callback to _run_agent
+    result, usage = await self._run_agent(
+        ..., stream_callback=_api_stream_callback,
+    )
+```
+
+#### 4b. Real SSE writer (~40 lines)
+
+```python
+async def _write_real_sse(self, request, completion_id, model, stream_q):
+    response = web.StreamResponse(
+        headers={"Content-Type": "text/event-stream", "Cache-Control": "no-cache"},
+    )
+    await response.prepare(request)
+    
+    # Role chunk
+    await response.write(...)
+    
+    # Stream content chunks as they arrive
+    while True:
+        try:
+            delta = await asyncio.get_running_loop().run_in_executor(
+                None, lambda: stream_q.get(timeout=0.1)
+            )
+        except queue.Empty:
+            continue
+        
+        if delta is None:  # End of stream
+            break
+        
+        chunk = {"id": completion_id, "object": "chat.completion.chunk", ...
+                 "choices": [{"delta": {"content": delta}, ...}]}
+        await response.write(f"data: {json.dumps(chunk)}\n\n".encode())
+    
+    # Finish + [DONE]
+    await response.write(...)
+    await response.write(b"data: [DONE]\n\n")
+    return response
+```
+
+**Challenge: concurrent execution**
+
+The agent runs in a thread executor. SSE writing happens in the async event
+loop. The queue bridges them. But `_run_agent()` currently awaits the full
+result before returning. For real streaming, we need to start the agent in
+the background and stream tokens while it runs:
+
+```python
+# Start agent in background
+agent_task = asyncio.create_task(self._run_agent_async(...))
+
+# Stream tokens while agent runs
+await self._write_real_sse(request, ..., stream_q)
+
+# Agent is done by now (stream_q received None)
+result, usage = await agent_task
+```
+
+This requires splitting `_run_agent` into an async version that doesn't
+block waiting for the result, or running it in a separate task.
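A self-contained sketch of the background-task pattern described above, under stated assumptions: `fake_agent`, `consume`, and `handle_request` are hypothetical stand-ins for the blocking agent call, the SSE writer, and the request handler. The sketch puts the `None` sentinel on the queue in a `finally` block so the consumer cannot wait forever if the agent raises.

```python
import asyncio
import queue


def fake_agent(stream_q: queue.Queue) -> str:
    """Stand-in for the blocking agent call; runs in a worker thread."""
    try:
        for token in ("Hel", "lo", "!"):
            stream_q.put(token)
        return "Hello!"
    finally:
        # Always signal end-of-stream, even if the agent raised.
        stream_q.put(None)


async def consume(stream_q: queue.Queue) -> list:
    """Stand-in for the SSE writer: collect tokens until the sentinel."""
    loop = asyncio.get_running_loop()
    chunks = []
    while True:
        try:
            # Poll the thread-safe queue without blocking the event loop.
            delta = await loop.run_in_executor(
                None, lambda: stream_q.get(timeout=0.1)
            )
        except queue.Empty:
            continue
        if delta is None:
            break
        chunks.append(delta)
    return chunks


async def handle_request():
    stream_q = queue.Queue()
    # Start the agent in the background, then stream while it runs.
    agent_task = asyncio.create_task(asyncio.to_thread(fake_agent, stream_q))
    chunks = await consume(stream_q)
    result = await agent_task  # re-raises any agent exception here
    return chunks, result
```

`asyncio.to_thread` requires Python 3.9 or later; on older versions `loop.run_in_executor` can start the agent instead.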
+
+**Responses API SSE format:**
+
+For `/v1/responses` with `stream=true`, the SSE events are different:
+
+```
+event: response.output_text.delta
+data: {"type":"response.output_text.delta","delta":"Hello"}
+
+event: response.completed
+data: {"type":"response.completed","response":{...}}
+```
+
+This needs a separate SSE writer that emits Responses API format events.
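One way such a writer could frame its events is sketched below. `format_sse_event` is a hypothetical helper, not an existing function in `api_server.py`; it only reproduces the `event:` / `data:` framing shown above and assumes the event name is repeated in the payload's `type` field.

```python
import json


def format_sse_event(event_type: str, payload: dict) -> bytes:
    """Encode one named SSE event with a JSON body."""
    # The event name is mirrored into the payload as "type".
    body = {"type": event_type, **payload}
    data = json.dumps(body, separators=(",", ":"))
    # A blank line terminates each SSE event.
    return f"event: {event_type}\ndata: {data}\n\n".encode("utf-8")
```

The Chat Completions writer would keep its own `data:`-only framing, so the two formats stay in separate code paths.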
+
+**Tests for Phase 4:** (~80 lines)
+- Test real SSE streaming with mocked agent
+- Test SSE event format (Chat Completions vs Responses)
+- Test client disconnect during streaming
+- Test fallback to pseudo-streaming when callback not available
+
+---
+
+## Integration Issues & Edge Cases
+
+### 1. Tool calls during streaming
+
+When the model returns tool calls instead of text, no text tokens are emitted.
+The stream_callback is simply never called with text. After tools execute, the
+next API call may produce the final text response — streaming picks up again.
+
+The stream preview task needs to handle this: if no tokens arrive during a
+tool-call round, don't send/edit any message. The tool progress messages
+continue working as before.
+
+### 2. Duplicate messages
+
+The biggest risk: the agent sends the final response normally (via the
+existing send path) AND the stream preview already showed it. The user
+sees the response twice.
+
+Prevention: when streaming is active and tokens were delivered, the final
+response send must be suppressed. The `result["_streamed_msg_id"]` marker
+tells the base adapter to skip its normal send.
+
+### 3. Response post-processing
+
+The final response may differ from the accumulated streamed tokens:
+- Think block stripping (`<think>...</think>` removed)
+- Trailing whitespace cleanup
+- Tool result media tag appending
+
+The stream preview shows raw tokens. The final edit should use the
+post-processed version. This means the final edit (removing the cursor)
+should use the post-processed `final_response`, not just the accumulated
+stream text.
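A minimal sketch of the choice described above, assuming think blocks are always closed by the time the final edit runs. `postprocess_streamed_text` and `final_edit_text` are hypothetical names, and stripping leading whitespace as well as trailing is an assumption made here so that a removed leading think block leaves no blank line behind.

```python
import re
from typing import Optional

# Non-greedy match so multiple think blocks are removed independently.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def postprocess_streamed_text(raw: str) -> str:
    """Remove <think>...</think> blocks and surrounding whitespace."""
    return _THINK_BLOCK.sub("", raw).strip()


def final_edit_text(accumulated: str, final_response: Optional[str]) -> str:
    """Prefer the agent's post-processed response for the last edit."""
    if final_response:
        return final_response
    # Fallback: clean up the raw stream if no final response is available.
    return postprocess_streamed_text(accumulated)
```

An unclosed `<think>` tag (for example after an interrupt) is left untouched by this regex and would need separate handling.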
+
+### 4. Context compression during streaming
+
+If the agent triggers context compression mid-conversation, the streaming
+tokens from BEFORE compression are from a different context than those
+after. This isn't a problem in practice — compression happens between
+API calls, not during streaming.
+
+### 5. Interrupt during streaming
+
+User sends a new message while streaming → interrupt. The stream is killed
+(HTTP connection closed), accumulated tokens are shown as-is (no cursor),
+and the interrupt message is processed normally. This is already handled by
+`_interruptible_api_call` closing the client.
+
+### 6. Multi-model / fallback
+
+If the primary model fails and the agent falls back to a different model,
+streaming state resets. The fallback call may or may not support streaming.
+The graceful fallback in `_run_streaming_chat_completion` handles this.
+
+### 7. Rate limiting on edits
+
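+- Telegram: ~20 edits/minute (~1 every 3 seconds to be safe)
+- Discord: 5 edits per 5 seconds per message
+- Slack: ~50 API calls/minute
+
+The 1.5s edit interval fits within the Discord and Slack limits above, but it
+works out to ~40 edits/minute, which exceeds the ~20 edits/minute Telegram
+estimate; Telegram likely needs a longer interval (about 3s). If we get
+429 rate limit errors on edits, just skip that edit cycle and try next time.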
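The skip-and-retry behaviour can be sketched as a small throttle. Everything here is illustrative: `EditThrottle`, `try_edit`, and the `RateLimited` exception are hypothetical names, and `RateLimited` stands in for whatever error a platform adapter raises on HTTP 429.

```python
import time
from typing import Callable, Optional


class RateLimited(Exception):
    """Hypothetical stand-in for a platform's HTTP 429 error."""


class EditThrottle:
    """Allow at most one edit attempt per `min_interval` seconds."""

    def __init__(self, min_interval: float = 1.5,
                 clock: Callable[[], float] = time.monotonic):
        self.min_interval = min_interval
        self._clock = clock
        self._last_attempt: Optional[float] = None

    def should_edit(self) -> bool:
        now = self._clock()
        if (self._last_attempt is not None
                and now - self._last_attempt < self.min_interval):
            return False
        self._last_attempt = now  # record the attempt, not the success
        return True


def try_edit(throttle: EditThrottle,
             edit_fn: Callable[[str], None], text: str) -> bool:
    """Attempt one edit; return False if throttled or rate limited."""
    if not throttle.should_edit():
        return False
    try:
        edit_fn(text)
    except RateLimited:
        return False  # skip this cycle; the next cycle sends newer text
    return True
```

Because each edit sends the full accumulated text, a skipped cycle loses nothing: the next successful edit carries everything that arrived in between.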
+
+---
+
+## Files Changed Summary
+
+| File | Phase | Changes |
+|------|-------|---------|
+| `run_agent.py` | 1 | +stream_callback param, +_run_streaming_chat_completion(), modify _run_codex_stream(), modify _interruptible_api_call() |
+| `gateway/run.py` | 2 | +streaming config reader, +queue/callback setup, +stream_preview task, +skip-final-send logic |
+| `gateway/platforms/base.py` | 2 | +check for _streamed_msg_id in response handler |
+| `cli.py` | 3 | +streaming setup, +token display, +response box integration |
+| `gateway/platforms/api_server.py` | 4 | +real SSE writer, +streaming callback wiring |
+| `hermes_cli/config.py` | 1 | +streaming config defaults |
+| `cli-config.yaml.example` | 1 | +streaming section |
+| `tests/test_streaming.py` | 1-4 | NEW — ~380 lines of tests |
+
+**Total new code**: ~500 lines across all phases
+**Total test code**: ~380 lines
+
+---
+
+## Rollout Plan
+
+1. **Phase 1** (core): Merge to main. Streaming disabled by default.
+   Zero impact on existing behavior. Can be tested by setting the
+   `HERMES_STREAMING_ENABLED=true` environment variable.
+
+2. **Phase 2** (gateway): Merge to main. Test on Telegram manually.
+   Enable per-platform: `streaming.telegram: true` in config.
+
+3. **Phase 3** (CLI): Merge to main. Test in terminal.
+   Enable: `streaming.cli: true` or `streaming.enabled: true`.
+
+4. **Phase 4** (API server): Merge to main. Test with Open WebUI.
+   Auto-enabled when client sends `stream: true`.
+
+Each phase is independently mergeable and testable. Streaming stays
+off by default throughout. Once all phases are stable, consider
+changing the default to enabled.
+
+---
+
+## Config Reference (final state)
+
+```yaml
+# config.yaml
+streaming:
+  enabled: false          # Master switch (default: off)
+  cli: true               # Per-platform override
+  telegram: true
+  discord: true
+  slack: true
+  api_server: true        # API server always streams when client requests it
+  edit_interval: 1.5      # Seconds between message edits (default: 1.5)
+  min_tokens: 20          # Tokens before first display (default: 20)
+```
+
+```bash
+# Environment variable override
+HERMES_STREAMING_ENABLED=true
+```
@@ -32,7 +32,12 @@ hermes-agent/
 │   ├── commands.py       # Slash command definitions + SlashCommandCompleter
 │   ├── callbacks.py      # Terminal callbacks (clarify, sudo, approval)
 │   ├── setup.py          # Interactive setup wizard
-│   └── skin_engine.py    # Skin/theme engine — CLI visual customization
+│   ├── skin_engine.py    # Skin/theme engine — CLI visual customization
+│   ├── skills_config.py  # `hermes skills` — enable/disable skills per platform
+│   ├── tools_config.py   # `hermes tools` — enable/disable tools per platform
+│   ├── skills_hub.py     # `/skills` slash command (search, browse, install)
+│   ├── models.py         # Model catalog, provider model lists
+│   └── auth.py           # Provider credential resolution
 ├── tools/                # Tool implementations (one file per tool)
 │   ├── registry.py       # Central tool registry (schemas, handlers, dispatch)
 │   ├── approval.py       # Dangerous command detection
@@ -49,9 +54,10 @@ hermes-agent/
 │   ├── run.py            # Main loop, slash commands, message dispatch
 │   ├── session.py        # SessionStore — conversation persistence
 │   └── platforms/        # Adapters: telegram, discord, slack, whatsapp, homeassistant, signal
+├── acp_adapter/          # ACP server (VS Code / Zed / JetBrains integration)
 ├── cron/                 # Scheduler (jobs.py, scheduler.py)
 ├── environments/         # RL training environments (Atropos)
-├── tests/                # Pytest suite (~2500+ tests)
+├── tests/                # Pytest suite (~3000 tests)
 └── batch_runner.py       # Parallel batch processing
 ```

@@ -333,7 +339,7 @@ The `_isolate_hermes_home` autouse fixture in `tests/conftest.py` redirects `HER

 ```bash
 source .venv/bin/activate
-python -m pytest tests/ -q          # Full suite (~2500 tests, ~2 min)
+python -m pytest tests/ -q          # Full suite (~3000 tests, ~3 min)
 python -m pytest tests/test_model_tools.py -q   # Toolset resolution
 python -m pytest tests/test_cli_init.py -q       # CLI config loading
 python -m pytest tests/gateway/ -q               # Gateway tests
@@ -333,6 +333,8 @@ metadata:
  hermes:
    tags: [Category, Subcategory, Keywords]
    related_skills: [other-skill-name]
+    fallback_for_toolsets: [web]       # Optional — show only when toolset is unavailable
+    requires_toolsets: [terminal]      # Optional — show only when toolset is available
 ---

 # Skill Title
@@ -367,6 +369,48 @@ platforms: [windows]          # Windows only

 If the field is omitted or empty, the skill loads on all platforms (backward compatible). See `skills/apple/` for examples of macOS-only skills.

+### Conditional skill activation
+
+Skills can declare conditions that control when they appear in the system prompt, based on which tools and toolsets are available in the current session. This is primarily used for **fallback skills** — alternatives that should only be shown when a primary tool is unavailable.
+
+Four fields are supported under `metadata.hermes`:
+
+```yaml
+metadata:
+  hermes:
+    fallback_for_toolsets: [web]      # Show ONLY when these toolsets are unavailable
+    requires_toolsets: [terminal]     # Show ONLY when these toolsets are available
+    fallback_for_tools: [web_search]  # Show ONLY when these specific tools are unavailable
+    requires_tools: [terminal]        # Show ONLY when these specific tools are available
+```
+
+**Semantics:**
+- `fallback_for_*`: The skill is a backup. It is **hidden** when the listed tools/toolsets are available, and **shown** when they are unavailable. Use this for free alternatives to premium tools.
+- `requires_*`: The skill needs certain tools to function. It is **hidden** when the listed tools/toolsets are unavailable. Use this for skills that depend on specific capabilities (e.g., a skill that only makes sense with terminal access).
+- If both are specified, both conditions must be satisfied for the skill to appear.
+- If neither is specified, the skill is always shown (backward compatible).
+
+**Examples:**
+
+```yaml
+# DuckDuckGo search — shown when Firecrawl (web toolset) is unavailable
+metadata:
+  hermes:
+    fallback_for_toolsets: [web]
+
+# Smart home skill — only useful when terminal is available
+metadata:
+  hermes:
+    requires_toolsets: [terminal]
+
+# Local browser fallback — shown when Browserbase is unavailable
+metadata:
+  hermes:
+    fallback_for_toolsets: [browser]
+```
+
+The filtering happens at prompt build time in `agent/prompt_builder.py`. The `build_skills_system_prompt()` function receives the set of available tools and toolsets from the agent and uses `_skill_should_show()` to evaluate each skill's conditions.
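The semantics above could be evaluated roughly as follows. This is a sketch, not the actual `_skill_should_show()` from `agent/prompt_builder.py`: the function name, its signature, and the treatment of multi-entry lists (a fallback is hidden as soon as any listed primary is available; a requirement fails as soon as any listed dependency is missing) are assumptions.

```python
from typing import Mapping, Sequence, Set


def skill_should_show(
    conditions: Mapping[str, Sequence[str]],
    available_tools: Set[str],
    available_toolsets: Set[str],
) -> bool:
    """Return True if a skill's metadata.hermes conditions are all met."""
    # fallback_for_*: hide when any listed primary is available (assumption).
    if any(ts in available_toolsets
           for ts in conditions.get("fallback_for_toolsets", ())):
        return False
    if any(t in available_tools
           for t in conditions.get("fallback_for_tools", ())):
        return False
    # requires_*: hide when any listed dependency is missing (assumption).
    if any(ts not in available_toolsets
           for ts in conditions.get("requires_toolsets", ())):
        return False
    if any(t not in available_tools
           for t in conditions.get("requires_tools", ())):
        return False
    # No conditions, or all satisfied: show the skill.
    return True
```

A skill with no conditions passes every check, which matches the backward-compatible default described above.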
+
 ### Skill guidelines

 - **No external dependencies unless absolutely necessary.** Prefer stdlib Python, curl, and existing Hermes tools (`web_extract`, `terminal`, `read_file`).
@@ -41,7 +41,6 @@ After installation:

 ```bash
 source ~/.bashrc    # reload shell (or: source ~/.zshrc)
-hermes setup        # configure your LLM provider
 hermes              # start chatting!
 ```

@@ -51,9 +50,12 @@ hermes              # start chatting!

 ```bash
 hermes              # Interactive CLI — start a conversation
-hermes model        # Switch provider or model
-hermes setup        # Re-run the setup wizard
+hermes model        # Choose your LLM provider and model
+hermes tools        # Configure which tools are enabled
+hermes config set   # Set individual config values
 hermes gateway      # Start the messaging gateway (Telegram, Discord, etc.)
+hermes setup        # Run the full setup wizard (configures everything at once)
+hermes claw migrate # Migrate from OpenClaw (if coming from OpenClaw)
 hermes update       # Update to the latest version
 hermes doctor       # Diagnose any issues
 ```
@@ -86,6 +88,35 @@ All documentation lives at **[hermes-agent.nousresearch.com/docs](https://hermes

 ---

+## Migrating from OpenClaw
+
+If you're coming from OpenClaw, Hermes can automatically import your settings, memories, skills, and API keys.
+
+**During first-time setup:** The setup wizard (`hermes setup`) automatically detects `~/.openclaw` and offers to migrate before configuration begins.
+
+**Anytime after install:**
+
+```bash
+hermes claw migrate              # Interactive migration (full preset)
+hermes claw migrate --dry-run    # Preview what would be migrated
+hermes claw migrate --preset user-data   # Migrate without secrets
+hermes claw migrate --overwrite  # Overwrite existing conflicts
+```
+
+What gets imported:
+- **SOUL.md** — persona file
+- **Memories** — MEMORY.md and USER.md entries
+- **Skills** — user-created skills → `~/.hermes/skills/openclaw-imports/`
+- **Command allowlist** — approval patterns
+- **Messaging settings** — platform configs, allowed users, working directory
+- **API keys** — allowlisted secrets (Telegram, OpenRouter, OpenAI, Anthropic, ElevenLabs)
+- **TTS assets** — workspace audio files
+- **Workspace instructions** — AGENTS.md (with `--workspace-target`)
+
+See `hermes claw migrate --help` for all options, or use the `openclaw-migration` skill for an interactive agent-guided migration with dry-run previews.
+
+---
+
 ## Contributing

 We welcome contributions! See the [Contributing Guide](https://hermes-agent.nousresearch.com/docs/developer-guide/contributing) for development setup, code style, and PR process.
@@ -93,8 +124,9 @@ We welcome contributions! See the [Contributing Guide](https://hermes-agent.nous
 Quick start for contributors:

 ```bash
-git clone --recurse-submodules https://github.com/NousResearch/hermes-agent.git
+git clone https://github.com/NousResearch/hermes-agent.git
 cd hermes-agent
+git submodule update --init mini-swe-agent   # required terminal backend
 curl -LsSf https://astral.sh/uv/install.sh | sh
 uv venv .venv --python 3.11
 source .venv/bin/activate
@@ -103,6 +135,12 @@ uv pip install -e "./mini-swe-agent"
 python -m pytest tests/ -q
 ```

+> **RL Training (optional):** To work on the RL/Tinker-Atropos integration, also run:
+> ```bash
+> git submodule update --init tinker-atropos
+> uv pip install -e "./tinker-atropos"
+> ```
+
 ---

 ## Community
@@ -0,0 +1,383 @@
+# Hermes Agent v0.2.0 (v2026.3.12)
+
+**Release Date:** March 12, 2026
+
+> First tagged release since v0.1.0 (the initial pre-public foundation). In just over two weeks, Hermes Agent went from a small internal project to a full-featured AI agent platform — thanks to an explosion of community contributions. This release covers **216 merged pull requests** from **63 contributors**, resolving **119 issues**.
+
+---
+
+## ✨ Highlights
+
+- **Multi-Platform Messaging Gateway** — Telegram, Discord, Slack, WhatsApp, Signal, Email (IMAP/SMTP), and Home Assistant platforms with unified session management, media attachments, and per-platform tool configuration.
+
+- **MCP (Model Context Protocol) Client** — Native MCP support with stdio and HTTP transports, reconnection, resource/prompt discovery, and sampling (server-initiated LLM requests). ([#291](https://github.com/NousResearch/hermes-agent/pull/291) — @0xbyt4, [#301](https://github.com/NousResearch/hermes-agent/pull/301), [#753](https://github.com/NousResearch/hermes-agent/pull/753))
+
+- **Skills Ecosystem** — 70+ bundled and optional skills across 15+ categories with a Skills Hub for community discovery, per-platform enable/disable, conditional activation based on tool availability, and prerequisite validation. ([#743](https://github.com/NousResearch/hermes-agent/pull/743) — @teyrebaz33, [#785](https://github.com/NousResearch/hermes-agent/pull/785) — @teyrebaz33)
+
+- **Centralized Provider Router** — Unified `call_llm()`/`async_call_llm()` API replaces scattered provider logic across vision, summarization, compression, and trajectory saving. All auxiliary consumers route through a single code path with automatic credential resolution. ([#1003](https://github.com/NousResearch/hermes-agent/pull/1003))
+
+- **ACP Server** — VS Code, Zed, and JetBrains editor integration via the Agent Client Protocol (ACP) standard. ([#949](https://github.com/NousResearch/hermes-agent/pull/949))
+
+- **CLI Skin/Theme Engine** — Data-driven visual customization: banners, spinners, colors, branding. 7 built-in skins + custom YAML skins.
+
+- **Git Worktree Isolation** — `hermes -w` launches isolated agent sessions in git worktrees for safe parallel work on the same repo. ([#654](https://github.com/NousResearch/hermes-agent/pull/654))
+
+- **Filesystem Checkpoints & Rollback** — Automatic snapshots before destructive operations with `/rollback` to restore. ([#824](https://github.com/NousResearch/hermes-agent/pull/824))
+
+- **3,289 Tests** — From near-zero test coverage to a comprehensive test suite covering agent, gateway, tools, cron, and CLI.
+
+---
+
+## 🏗️ Core Agent & Architecture
+
+### Provider & Model Support
+- Centralized provider router with `resolve_provider_client()` + `call_llm()` API ([#1003](https://github.com/NousResearch/hermes-agent/pull/1003))
+- Nous Portal as first-class provider in setup ([#644](https://github.com/NousResearch/hermes-agent/issues/644))
+- OpenAI Codex (Responses API) with ChatGPT subscription support ([#43](https://github.com/NousResearch/hermes-agent/pull/43)) — @grp06
+- Codex OAuth vision support + multimodal content adapter
+- Validate `/model` against live API instead of hardcoded lists
+- Self-hosted Firecrawl support ([#460](https://github.com/NousResearch/hermes-agent/pull/460)) — @caentzminger
+- Kimi Code API support ([#635](https://github.com/NousResearch/hermes-agent/pull/635)) — @christomitov
+- MiniMax model ID update ([#473](https://github.com/NousResearch/hermes-agent/pull/473)) — @tars90percent
+- OpenRouter provider routing configuration (provider_preferences)
+- Nous credential refresh on 401 errors ([#571](https://github.com/NousResearch/hermes-agent/pull/571), [#269](https://github.com/NousResearch/hermes-agent/pull/269)) — @rewbs
+- z.ai/GLM, Kimi/Moonshot, MiniMax, Azure OpenAI as first-class providers
+- Unified `/model` and `/provider` into single view
+
+### Agent Loop & Conversation
+- Simple fallback model for provider resilience ([#740](https://github.com/NousResearch/hermes-agent/pull/740))
+- Shared iteration budget across parent + subagent delegation
+- Iteration budget pressure via tool result injection
+- Configurable subagent provider/model with full credential resolution
+- Handle 413 payload-too-large via compression instead of aborting ([#153](https://github.com/NousResearch/hermes-agent/pull/153)) — @tekelala
+- Retry with rebuilt payload after compression ([#616](https://github.com/NousResearch/hermes-agent/pull/616)) — @tripledoublev
+- Auto-compress pathologically large gateway sessions ([#628](https://github.com/NousResearch/hermes-agent/issues/628))
+- Tool call repair middleware — auto-lowercase and invalid tool handler
+- Reasoning effort configuration and `/reasoning` command ([#921](https://github.com/NousResearch/hermes-agent/pull/921))
+- Detect and block file re-read/search loops after context compression ([#705](https://github.com/NousResearch/hermes-agent/pull/705)) — @0xbyt4
+
+### Session & Memory
+- Session naming with unique titles, auto-lineage, rich listing, and resume by name ([#720](https://github.com/NousResearch/hermes-agent/pull/720))
+- Interactive session browser with search filtering ([#733](https://github.com/NousResearch/hermes-agent/pull/733))
+- Display previous messages when resuming a session ([#734](https://github.com/NousResearch/hermes-agent/pull/734))
+- Honcho AI-native cross-session user modeling ([#38](https://github.com/NousResearch/hermes-agent/pull/38)) — @erosika
+- Proactive async memory flush on session expiry
+- Smart context length probing with persistent caching + banner display
+- `/resume` command for switching to named sessions in gateway
+- Session reset policy for messaging platforms
+
+---
+
+## 📱 Messaging Platforms (Gateway)
+
+### Telegram
+- Native file attachments: send_document + send_video
+- Document file processing for PDF, text, and Office files — @tekelala
+- Forum topic session isolation ([#766](https://github.com/NousResearch/hermes-agent/pull/766)) — @spanishflu-est1918
+- Browser screenshot sharing via MEDIA: protocol ([#657](https://github.com/NousResearch/hermes-agent/pull/657))
+- Location support for find-nearby skill
+- TTS voice message accumulation fix ([#176](https://github.com/NousResearch/hermes-agent/pull/176)) — @Bartok9
+- Improved error handling and logging ([#763](https://github.com/NousResearch/hermes-agent/pull/763)) — @aydnOktay
+- Italic regex newline fix + 43 format tests ([#204](https://github.com/NousResearch/hermes-agent/pull/204)) — @0xbyt4
+
+### Discord
+- Channel topic included in session context ([#248](https://github.com/NousResearch/hermes-agent/pull/248)) — @Bartok9
+- DISCORD_ALLOW_BOTS config for bot message filtering ([#758](https://github.com/NousResearch/hermes-agent/pull/758))
+- Document and video support ([#784](https://github.com/NousResearch/hermes-agent/pull/784))
+- Improved error handling and logging ([#761](https://github.com/NousResearch/hermes-agent/pull/761)) — @aydnOktay
+
+### Slack
+- App_mention 404 fix + document/video support ([#784](https://github.com/NousResearch/hermes-agent/pull/784))
+- Structured logging replacing print statements — @aydnOktay
+
+### WhatsApp
+- Native media sending — images, videos, documents ([#292](https://github.com/NousResearch/hermes-agent/pull/292)) — @satelerd
+- Multi-user session isolation ([#75](https://github.com/NousResearch/hermes-agent/pull/75)) — @satelerd
+- Cross-platform port cleanup replacing Linux-only fuser ([#433](https://github.com/NousResearch/hermes-agent/pull/433)) — @Farukest
+- DM interrupt key mismatch fix ([#350](https://github.com/NousResearch/hermes-agent/pull/350)) — @Farukest
+
+### Signal
+- Full Signal messenger gateway via signal-cli-rest-api ([#405](https://github.com/NousResearch/hermes-agent/issues/405))
+- Media URL support in message events ([#871](https://github.com/NousResearch/hermes-agent/pull/871))
+
+### Email (IMAP/SMTP)
+- New email gateway platform — @0xbyt4
+
+### Home Assistant
+- REST tools + WebSocket gateway integration ([#184](https://github.com/NousResearch/hermes-agent/pull/184)) — @0xbyt4
+- Service discovery and enhanced setup
+- Toolset mapping fix ([#538](https://github.com/NousResearch/hermes-agent/pull/538)) — @Himess
+
+### Gateway Core
+- Expose subagent tool calls and thinking to users ([#186](https://github.com/NousResearch/hermes-agent/pull/186)) — @cutepawss
+- Configurable background process watcher notifications ([#840](https://github.com/NousResearch/hermes-agent/pull/840))
+- `edit_message()` for Telegram/Discord/Slack with fallback
+- `/compress`, `/usage`, `/update` slash commands
+- Eliminated 3x SQLite message duplication in gateway sessions ([#873](https://github.com/NousResearch/hermes-agent/pull/873))
+- Stabilize system prompt across gateway turns for cache hits ([#754](https://github.com/NousResearch/hermes-agent/pull/754))
+- MCP server shutdown on gateway exit ([#796](https://github.com/NousResearch/hermes-agent/pull/796)) — @0xbyt4
+- Pass session_db to AIAgent, fixing session_search error ([#108](https://github.com/NousResearch/hermes-agent/pull/108)) — @Bartok9
+- Persist transcript changes in /retry, /undo; fix /reset attribute ([#217](https://github.com/NousResearch/hermes-agent/pull/217)) — @Farukest
+- UTF-8 encoding fix preventing Windows crashes ([#369](https://github.com/NousResearch/hermes-agent/pull/369)) — @ch3ronsa
+
+---
+
+## 🖥️ CLI & User Experience
+
+### Interactive CLI
+- Data-driven skin/theme engine — 7 built-in skins (default, ares, mono, slate, poseidon, sisyphus, charizard) + custom YAML skins
+- `/personality` command with custom personality + disable support ([#773](https://github.com/NousResearch/hermes-agent/pull/773)) — @teyrebaz33
+- User-defined quick commands that bypass the agent loop ([#746](https://github.com/NousResearch/hermes-agent/pull/746)) — @teyrebaz33
+- `/reasoning` command for effort level and display toggle ([#921](https://github.com/NousResearch/hermes-agent/pull/921))
+- `/verbose` slash command to toggle debug at runtime ([#94](https://github.com/NousResearch/hermes-agent/pull/94)) — @cesareth
+- `/insights` command — usage analytics, cost estimation & activity patterns ([#552](https://github.com/NousResearch/hermes-agent/pull/552))
+- `/background` command for managing background processes
+- `/help` formatting with command categories
+- Bell-on-complete — terminal bell when agent finishes ([#738](https://github.com/NousResearch/hermes-agent/pull/738))
+- Up/down arrow history navigation
+- Clipboard image paste (Alt+V / Ctrl+V)
+- Loading indicators for slow slash commands ([#882](https://github.com/NousResearch/hermes-agent/pull/882))
+- Spinner flickering fix under patch_stdout ([#91](https://github.com/NousResearch/hermes-agent/pull/91)) — @0xbyt4
+- `--quiet/-Q` flag for programmatic single-query mode
+- `--fuck-it-ship-it` flag to bypass all approval prompts ([#724](https://github.com/NousResearch/hermes-agent/pull/724)) — @dmahan93
+- Tools summary flag ([#767](https://github.com/NousResearch/hermes-agent/pull/767)) — @luisv-1
+- Terminal blinking fix on SSH ([#284](https://github.com/NousResearch/hermes-agent/pull/284)) — @ygd58
+- Multi-line paste detection fix ([#84](https://github.com/NousResearch/hermes-agent/pull/84)) — @0xbyt4
+
+### Setup & Configuration
+- Modular setup wizard with section subcommands and tool-first UX
+- Container resource configuration prompts
+- Backend validation for required binaries
+- Config migration system (currently v7)
+- API keys properly routed to .env instead of config.yaml ([#469](https://github.com/NousResearch/hermes-agent/pull/469)) — @ygd58
+- Atomic write for .env to prevent API key loss on crash ([#954](https://github.com/NousResearch/hermes-agent/pull/954))
+- `hermes tools` — per-platform tool enable/disable with curses UI
+- `hermes doctor` for health checks across all configured providers
+- `hermes update` with auto-restart for gateway service
+- Show update-available notice in CLI banner
+- Multiple named custom providers
+- Shell config detection improvement for PATH setup ([#317](https://github.com/NousResearch/hermes-agent/pull/317)) — @mehmetkr-31
+- Consistent HERMES_HOME and .env path resolution ([#51](https://github.com/NousResearch/hermes-agent/pull/51), [#48](https://github.com/NousResearch/hermes-agent/pull/48)) — @deankerr
+- Docker backend fix on macOS + subagent auth for Nous Portal ([#46](https://github.com/NousResearch/hermes-agent/pull/46)) — @rsavitt
+
+---
+
+## 🔧 Tool System
+
+### MCP (Model Context Protocol)
+- Native MCP client with stdio + HTTP transports ([#291](https://github.com/NousResearch/hermes-agent/pull/291) — @0xbyt4, [#301](https://github.com/NousResearch/hermes-agent/pull/301))
+- Sampling support — server-initiated LLM requests ([#753](https://github.com/NousResearch/hermes-agent/pull/753))
+- Resource and prompt discovery
+- Automatic reconnection and security hardening
+- Banner integration, `/reload-mcp` command
+- `hermes tools` UI integration
+
+### Browser
+- Local browser backend — zero-cost headless Chromium (no Browserbase needed)
+- Console/errors tool, annotated screenshots, auto-recording, dogfood QA skill ([#745](https://github.com/NousResearch/hermes-agent/pull/745))
+- Screenshot sharing via MEDIA: on all messaging platforms ([#657](https://github.com/NousResearch/hermes-agent/pull/657))
+
+### Terminal & Execution
+- `execute_code` sandbox with json_parse, shell_quote, retry helpers
+- Docker: custom volume mounts ([#158](https://github.com/NousResearch/hermes-agent/pull/158)) — @Indelwin
+- Daytona cloud sandbox backend ([#451](https://github.com/NousResearch/hermes-agent/pull/451)) — @rovle
+- SSH backend fix ([#59](https://github.com/NousResearch/hermes-agent/pull/59)) — @deankerr
+- Shell noise filtering and login shell execution for environment consistency
+- Head+tail truncation for execute_code stdout overflow
+- Configurable background process notification modes
+
+### File Operations
+- Filesystem checkpoints and `/rollback` command ([#824](https://github.com/NousResearch/hermes-agent/pull/824))
+- Structured tool result hints (next-action guidance) for patch and search_files ([#722](https://github.com/NousResearch/hermes-agent/issues/722))
+- Docker volumes passed to sandbox container config ([#687](https://github.com/NousResearch/hermes-agent/pull/687)) — @manuelschipper
+
+---
+
+## 🧩 Skills Ecosystem
+
+### Skills System
+- Per-platform skill enable/disable ([#743](https://github.com/NousResearch/hermes-agent/pull/743)) — @teyrebaz33
+- Conditional skill activation based on tool availability ([#785](https://github.com/NousResearch/hermes-agent/pull/785)) — @teyrebaz33
+- Skill prerequisites — hide skills with unmet dependencies ([#659](https://github.com/NousResearch/hermes-agent/pull/659)) — @kshitijk4poor
+- Optional skills — shipped but not activated by default
+- `hermes skills browse` — paginated hub browsing
+- Skills sub-category organization
+- Platform-conditional skill loading
+- Atomic skill file writes ([#551](https://github.com/NousResearch/hermes-agent/pull/551)) — @aydnOktay
+- Skills sync data loss prevention ([#563](https://github.com/NousResearch/hermes-agent/pull/563)) — @0xbyt4
+- Dynamic skill slash commands for CLI and gateway
+
+### New Skills (selected)
+- **ASCII Art** — pyfiglet (571 fonts), cowsay, image-to-ascii ([#209](https://github.com/NousResearch/hermes-agent/pull/209)) — @0xbyt4
+- **ASCII Video** — Full production pipeline ([#854](https://github.com/NousResearch/hermes-agent/pull/854)) — @SHL0MS
+- **DuckDuckGo Search** — Firecrawl fallback ([#267](https://github.com/NousResearch/hermes-agent/pull/267)) — @gamedevCloudy; DDGS API expansion ([#598](https://github.com/NousResearch/hermes-agent/pull/598)) — @areu01or00
+- **Solana Blockchain** — Wallet balances, USD pricing, token names ([#212](https://github.com/NousResearch/hermes-agent/pull/212)) — @gizdusum
+- **AgentMail** — Agent-owned email inboxes ([#330](https://github.com/NousResearch/hermes-agent/pull/330)) — @teyrebaz33
+- **Polymarket** — Prediction market data (read-only) ([#629](https://github.com/NousResearch/hermes-agent/pull/629))
+- **OpenClaw Migration** — Official migration tool ([#570](https://github.com/NousResearch/hermes-agent/pull/570)) — @unmodeled-tyler
+- **Domain Intelligence** — Passive recon: subdomains, SSL, WHOIS, DNS ([#136](https://github.com/NousResearch/hermes-agent/pull/136)) — @FurkanL0
+- **Superpowers** — Software development skills ([#137](https://github.com/NousResearch/hermes-agent/pull/137)) — @kaos35
+- **Hermes-Atropos** — RL environment development skill ([#815](https://github.com/NousResearch/hermes-agent/pull/815))
+- Plus: arXiv search, OCR/documents, Excalidraw diagrams, YouTube transcripts, GIF search, Pokémon player, Minecraft modpack server, OpenHue (Philips Hue), Google Workspace, Notion, PowerPoint, Obsidian, find-nearby, and 40+ MLOps skills
+
+---
+
+## 🔒 Security & Reliability
+
+### Security Hardening
+- Path traversal fix in skill_view — prevented reading arbitrary files ([#220](https://github.com/NousResearch/hermes-agent/issues/220)) — @Farukest
+- Shell injection prevention in sudo password piping ([#65](https://github.com/NousResearch/hermes-agent/pull/65)) — @leonsgithub
+- Dangerous command detection: multiline bypass fix ([#233](https://github.com/NousResearch/hermes-agent/pull/233)) — @Farukest; tee/process substitution patterns ([#280](https://github.com/NousResearch/hermes-agent/pull/280)) — @dogiladeveloper
+- Symlink boundary check fix in skills_guard ([#386](https://github.com/NousResearch/hermes-agent/pull/386)) — @Farukest
+- Symlink bypass fix in write deny list on macOS ([#61](https://github.com/NousResearch/hermes-agent/pull/61)) — @0xbyt4
+- Multi-word prompt injection bypass prevention ([#192](https://github.com/NousResearch/hermes-agent/pull/192)) — @0xbyt4
+- Cron prompt injection scanner bypass fix ([#63](https://github.com/NousResearch/hermes-agent/pull/63)) — @0xbyt4
+- Enforce 0600/0700 file permissions on sensitive files ([#757](https://github.com/NousResearch/hermes-agent/pull/757))
+- .env file permissions restricted to owner-only ([#529](https://github.com/NousResearch/hermes-agent/pull/529)) — @Himess
+- `--force` flag properly blocked from overriding dangerous verdicts ([#388](https://github.com/NousResearch/hermes-agent/pull/388)) — @Farukest
+- FTS5 query sanitization + DB connection leak fix ([#565](https://github.com/NousResearch/hermes-agent/pull/565)) — @0xbyt4
+- Expand secret redaction patterns + config toggle to disable
+- In-memory permanent allowlist to prevent data leak ([#600](https://github.com/NousResearch/hermes-agent/pull/600)) — @alireza78a
+
+### Atomic Writes (data loss prevention)
+- sessions.json ([#611](https://github.com/NousResearch/hermes-agent/pull/611)) — @alireza78a
+- Cron jobs ([#146](https://github.com/NousResearch/hermes-agent/pull/146)) — @alireza78a
+- .env config ([#954](https://github.com/NousResearch/hermes-agent/pull/954))
+- Process checkpoints ([#298](https://github.com/NousResearch/hermes-agent/pull/298)) — @aydnOktay
+- Batch runner ([#297](https://github.com/NousResearch/hermes-agent/pull/297)) — @aydnOktay
+- Skill files ([#551](https://github.com/NousResearch/hermes-agent/pull/551)) — @aydnOktay
+
+### Reliability
+- Guard all print() against OSError for systemd/headless environments ([#963](https://github.com/NousResearch/hermes-agent/pull/963))
+- Reset all retry counters at start of run_conversation ([#607](https://github.com/NousResearch/hermes-agent/pull/607)) — @0xbyt4
+- Return deny on approval callback timeout instead of None ([#603](https://github.com/NousResearch/hermes-agent/pull/603)) — @0xbyt4
+- Fix None message content crashes across codebase ([#277](https://github.com/NousResearch/hermes-agent/pull/277))
+- Fix context overrun crash with local LLM backends ([#403](https://github.com/NousResearch/hermes-agent/pull/403)) — @ch3ronsa
+- Prevent `_flush_sentinel` from leaking to external APIs ([#227](https://github.com/NousResearch/hermes-agent/pull/227)) — @Farukest
+- Prevent conversation_history mutation in callers ([#229](https://github.com/NousResearch/hermes-agent/pull/229)) — @Farukest
+- Fix systemd restart loop ([#614](https://github.com/NousResearch/hermes-agent/pull/614)) — @voidborne-d
+- Close file handles and sockets to prevent fd leaks ([#568](https://github.com/NousResearch/hermes-agent/pull/568) — @alireza78a, [#296](https://github.com/NousResearch/hermes-agent/pull/296) — @alireza78a, [#709](https://github.com/NousResearch/hermes-agent/pull/709) — @memosr)
+- Prevent data loss in clipboard PNG conversion ([#602](https://github.com/NousResearch/hermes-agent/pull/602)) — @0xbyt4
+- Eliminate shell noise from terminal output ([#293](https://github.com/NousResearch/hermes-agent/pull/293)) — @0xbyt4
+- Timezone-aware now() for prompt, cron, and execute_code ([#309](https://github.com/NousResearch/hermes-agent/pull/309)) — @areu01or00
+
+### Windows Compatibility
+- Guard POSIX-only process functions ([#219](https://github.com/NousResearch/hermes-agent/pull/219)) — @Farukest
+- Windows native support via Git Bash + ZIP-based update fallback
+- pywinpty for PTY support ([#457](https://github.com/NousResearch/hermes-agent/pull/457)) — @shitcoinsherpa
+- Explicit UTF-8 encoding on all config/data file I/O ([#458](https://github.com/NousResearch/hermes-agent/pull/458)) — @shitcoinsherpa
+- Windows-compatible path handling ([#354](https://github.com/NousResearch/hermes-agent/pull/354), [#390](https://github.com/NousResearch/hermes-agent/pull/390)) — @Farukest
+- Regex-based search output parsing for drive-letter paths ([#533](https://github.com/NousResearch/hermes-agent/pull/533)) — @Himess
+- Auth store file lock for Windows ([#455](https://github.com/NousResearch/hermes-agent/pull/455)) — @shitcoinsherpa
+
+---
+
+## 🐛 Notable Bug Fixes
+
+- Fix DeepSeek V3 tool call parser silently dropping multi-line JSON arguments ([#444](https://github.com/NousResearch/hermes-agent/pull/444)) — @PercyDikec
+- Fix gateway transcript losing 1 message per turn due to offset mismatch ([#395](https://github.com/NousResearch/hermes-agent/pull/395)) — @PercyDikec
+- Fix /retry command silently discarding the agent's final response ([#441](https://github.com/NousResearch/hermes-agent/pull/441)) — @PercyDikec
+- Fix max-iterations retry returning empty string after think-block stripping ([#438](https://github.com/NousResearch/hermes-agent/pull/438)) — @PercyDikec
+- Fix max-iterations retry using hardcoded max_tokens ([#436](https://github.com/NousResearch/hermes-agent/pull/436)) — @Farukest
+- Fix Codex status dict key mismatch ([#448](https://github.com/NousResearch/hermes-agent/pull/448)) and visibility filter ([#446](https://github.com/NousResearch/hermes-agent/pull/446)) — @PercyDikec
+- Strip \<think\> blocks from final user-facing responses ([#174](https://github.com/NousResearch/hermes-agent/pull/174)) — @Bartok9
+- Fix \<think\> block regex stripping visible content when model discusses tags literally ([#786](https://github.com/NousResearch/hermes-agent/issues/786))
+- Fix Mistral 422 errors from leftover finish_reason in assistant messages ([#253](https://github.com/NousResearch/hermes-agent/pull/253)) — @Sertug17
+- Fix OPENROUTER_API_KEY resolution order across all code paths ([#295](https://github.com/NousResearch/hermes-agent/pull/295)) — @0xbyt4
+- Fix OPENAI_BASE_URL API key priority ([#420](https://github.com/NousResearch/hermes-agent/pull/420)) — @manuelschipper
+- Fix Anthropic "prompt is too long" 400 error not detected as context length error ([#813](https://github.com/NousResearch/hermes-agent/issues/813))
+- Fix SQLite session transcript accumulating duplicate messages — 3-4x token inflation ([#860](https://github.com/NousResearch/hermes-agent/issues/860))
+- Fix setup wizard skipping API key prompts on first install ([#748](https://github.com/NousResearch/hermes-agent/pull/748))
+- Fix setup wizard showing OpenRouter model list for Nous Portal ([#575](https://github.com/NousResearch/hermes-agent/pull/575)) — @PercyDikec
+- Fix provider selection not persisting when switching via hermes model ([#881](https://github.com/NousResearch/hermes-agent/pull/881))
+- Fix Docker backend failing when docker not in PATH on macOS ([#889](https://github.com/NousResearch/hermes-agent/pull/889))
+- Fix ClawHub Skills Hub adapter for API endpoint changes ([#286](https://github.com/NousResearch/hermes-agent/pull/286)) — @BP602
+- Fix Honcho auto-enable when API key is present ([#243](https://github.com/NousResearch/hermes-agent/pull/243)) — @Bartok9
+- Fix duplicate 'skills' subparser crash on Python 3.11+ ([#898](https://github.com/NousResearch/hermes-agent/issues/898))
+- Fix memory tool entry parsing when content contains section sign ([#162](https://github.com/NousResearch/hermes-agent/pull/162)) — @aydnOktay
+- Fix piped install silently aborting when interactive prompts fail ([#72](https://github.com/NousResearch/hermes-agent/pull/72)) — @cutepawss
+- Fix false positives in recursive delete detection ([#68](https://github.com/NousResearch/hermes-agent/pull/68)) — @cutepawss
+- Fix Ruff lint warnings across codebase ([#608](https://github.com/NousResearch/hermes-agent/pull/608)) — @JackTheGit
+- Fix Anthropic native base URL fail-fast ([#173](https://github.com/NousResearch/hermes-agent/pull/173)) — @adavyas
+- Fix install.sh creating ~/.hermes before moving Node.js directory ([#53](https://github.com/NousResearch/hermes-agent/pull/53)) — @JoshuaMart
+- Fix SystemExit traceback during atexit cleanup on Ctrl+C ([#55](https://github.com/NousResearch/hermes-agent/pull/55)) — @bierlingm
+- Restore missing MIT license file ([#620](https://github.com/NousResearch/hermes-agent/pull/620)) — @stablegenius49
+
+---
+
+## 🧪 Testing
+
+- **3,289 tests** across agent, gateway, tools, cron, and CLI
+- Parallelized test suite with pytest-xdist ([#802](https://github.com/NousResearch/hermes-agent/pull/802)) — @OutThisLife
+- Unit tests batch 1: 8 core modules ([#60](https://github.com/NousResearch/hermes-agent/pull/60)) — @0xbyt4
+- Unit tests batch 2: 8 more modules ([#62](https://github.com/NousResearch/hermes-agent/pull/62)) — @0xbyt4
+- Unit tests batch 3: 8 untested modules ([#191](https://github.com/NousResearch/hermes-agent/pull/191)) — @0xbyt4
+- Unit tests batch 4: 5 security/logic-critical modules ([#193](https://github.com/NousResearch/hermes-agent/pull/193)) — @0xbyt4
+- AIAgent (run_agent.py) unit tests ([#67](https://github.com/NousResearch/hermes-agent/pull/67)) — @0xbyt4
+- Trajectory compressor tests ([#203](https://github.com/NousResearch/hermes-agent/pull/203)) — @0xbyt4
+- Clarify tool tests ([#121](https://github.com/NousResearch/hermes-agent/pull/121)) — @Bartok9
+- Telegram format tests — 43 tests for italic/bold/code rendering ([#204](https://github.com/NousResearch/hermes-agent/pull/204)) — @0xbyt4
+- Vision tools type hints + 42 tests ([#792](https://github.com/NousResearch/hermes-agent/pull/792))
+- Compressor tool-call boundary regression tests ([#648](https://github.com/NousResearch/hermes-agent/pull/648)) — @intertwine
+- Test structure reorganization ([#34](https://github.com/NousResearch/hermes-agent/pull/34)) — @0xbyt4
+- Shell noise elimination + fix 36 test failures ([#293](https://github.com/NousResearch/hermes-agent/pull/293)) — @0xbyt4
+
+---
+
+## 🔬 RL & Evaluation Environments
+
+- WebResearchEnv — Multi-step web research RL environment ([#434](https://github.com/NousResearch/hermes-agent/pull/434)) — @jackx707
+- Modal sandbox concurrency limits to avoid deadlocks ([#621](https://github.com/NousResearch/hermes-agent/pull/621)) — @voteblake
+- Hermes-atropos-environments bundled skill ([#815](https://github.com/NousResearch/hermes-agent/pull/815))
+- Local vLLM instance support for evaluation — @dmahan93
+- YC-Bench long-horizon agent benchmark environment
+- OpenThoughts-TBLite evaluation environment and scripts
+
+---
+
+## 📚 Documentation
+
+- Full documentation website (Docusaurus) with 37+ pages
+- Comprehensive platform setup guides for Telegram, Discord, Slack, WhatsApp, Signal, Email
+- AGENTS.md — development guide for AI coding assistants
+- CONTRIBUTING.md ([#117](https://github.com/NousResearch/hermes-agent/pull/117)) — @Bartok9
+- Slash commands reference ([#142](https://github.com/NousResearch/hermes-agent/pull/142)) — @Bartok9
+- Comprehensive AGENTS.md accuracy audit ([#732](https://github.com/NousResearch/hermes-agent/pull/732))
+- Skin/theme system documentation
+- MCP documentation and examples
+- Docs accuracy audit — 35+ corrections
+- Documentation typo fixes ([#825](https://github.com/NousResearch/hermes-agent/pull/825), [#439](https://github.com/NousResearch/hermes-agent/pull/439)) — @JackTheGit
+- CLI config precedence and terminology standardization ([#166](https://github.com/NousResearch/hermes-agent/pull/166), [#167](https://github.com/NousResearch/hermes-agent/pull/167), [#168](https://github.com/NousResearch/hermes-agent/pull/168)) — @Jr-kenny
+- Telegram token regex documentation ([#713](https://github.com/NousResearch/hermes-agent/pull/713)) — @VolodymyrBg
+
+---
+
+## 👥 Contributors
+
+Thank you to the 63 contributors who made this release possible! In just over two weeks, the Hermes Agent community came together to ship an extraordinary amount of work.
+
+### Core
+- **@teknium1** — 43 PRs: Project lead, core architecture, provider router, sessions, skills, CLI, documentation
+
+### Top Community Contributors
+- **@0xbyt4** — 40 PRs: MCP client, Home Assistant, security fixes (symlink, prompt injection, cron), extensive test coverage (6 batches), ascii-art skill, shell noise elimination, skills sync, Telegram formatting, and dozens more
+- **@Farukest** — 16 PRs: Security hardening (path traversal, dangerous command detection, symlink boundary), Windows compatibility (POSIX guards, path handling), WhatsApp fixes, max-iterations retry, gateway fixes
+- **@aydnOktay** — 11 PRs: Atomic writes (process checkpoints, batch runner, skill files), error handling improvements across Telegram, Discord, code execution, transcription, TTS, and skills
+- **@Bartok9** — 9 PRs: CONTRIBUTING.md, slash commands reference, Discord channel topics, think-block stripping, TTS fix, Honcho fix, session count fix, clarify tests
+- **@PercyDikec** — 7 PRs: DeepSeek V3 parser fix, /retry response discard, gateway transcript offset, Codex status/visibility, max-iterations retry, setup wizard fix
+- **@teyrebaz33** — 5 PRs: Skills enable/disable system, quick commands, personality customization, conditional skill activation
+- **@alireza78a** — 5 PRs: Atomic writes (cron, sessions), fd leak prevention, security allowlist, code execution socket cleanup
+- **@shitcoinsherpa** — 3 PRs: Windows support (pywinpty, UTF-8 encoding, auth store lock)
+- **@Himess** — 3 PRs: Cron/HomeAssistant/Daytona fix, Windows drive-letter parsing, .env permissions
+- **@satelerd** — 2 PRs: WhatsApp native media, multi-user session isolation
+- **@rovle** — 1 PR: Daytona cloud sandbox backend (4 commits)
+- **@erosika** — 1 PR: Honcho AI-native memory integration
+- **@dmahan93** — 1 PR: --fuck-it-ship-it flag + RL environment work
+- **@SHL0MS** — 1 PR: ASCII video skill
+
+### All Contributors
+@0xbyt4, @BP602, @Bartok9, @Farukest, @FurkanL0, @Himess, @Indelwin, @JackTheGit, @JoshuaMart, @Jr-kenny, @OutThisLife, @PercyDikec, @SHL0MS, @Sertug17, @VencentSoliman, @VolodymyrBg, @adavyas, @alireza78a, @areu01or00, @aydnOktay, @batuhankocyigit, @bierlingm, @caentzminger, @cesareth, @ch3ronsa, @christomitov, @cutepawss, @deankerr, @dmahan93, @dogiladeveloper, @dragonkhoi, @erosika, @gamedevCloudy, @gizdusum, @grp06, @intertwine, @jackx707, @jdblackstar, @johnh4098, @kaos35, @kshitijk4poor, @leonsgithub, @luisv-1, @manuelschipper, @mehmetkr-31, @memosr, @PeterFile, @rewbs, @rovle, @rsavitt, @satelerd, @spanishflu-est1918, @stablegenius49, @tars90percent, @tekelala, @teknium1, @teyrebaz33, @tripledoublev, @unmodeled-tyler, @voidborne-d, @voteblake, @ygd58
+
+---
+
+**Full Changelog**: [v0.1.0...v2026.3.12](https://github.com/NousResearch/hermes-agent/compare/v0.1.0...v2026.3.12)
@@ -17,7 +17,10 @@ Resolution order for text tasks (auto mode):
 Resolution order for vision/multimodal tasks (auto mode):
  1. OpenRouter
  2. Nous Portal
-  3. None  (steps 3-5 are skipped — they may not support multimodal)
+  3. Codex OAuth (gpt-5.3-codex supports vision via Responses API)
+  4. Custom endpoint (for local vision models: Qwen-VL, LLaVA, Pixtral, etc.)
+  5. None  (API-key providers like z.ai/Kimi/MiniMax are skipped —
+     they may not support multimodal)

 Per-task provider overrides (e.g. AUXILIARY_VISION_PROVIDER,
 CONTEXT_COMPRESSION_PROVIDER) can force a specific provider for each task:
@@ -440,7 +443,7 @@ def _try_custom_endpoint() -> Tuple[Optional[OpenAI], Optional[str]]:
    custom_key = os.getenv("OPENAI_API_KEY")
    if not custom_base or not custom_key:
        return None, None
-    model = os.getenv("OPENAI_MODEL") or os.getenv("LLM_MODEL") or "gpt-4o-mini"
+    model = os.getenv("OPENAI_MODEL") or "gpt-4o-mini"
    logger.debug("Auxiliary client: custom endpoint (%s)", model)
    return OpenAI(api_key=custom_key, base_url=custom_base), model
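Note on this hunk: the custom endpoint no longer falls back to `LLM_MODEL`, so only `OPENAI_MODEL` can override the default. A minimal sketch of that lookup, assuming a hypothetical helper `pick_custom_model` that does not exist in the codebase and takes an injectable environment mapping for easier testing:

```python
import os
from typing import Mapping, Optional

# Default taken from the hunk above.
DEFAULT_AUX_MODEL = "gpt-4o-mini"


def pick_custom_model(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper mirroring the new fallback order.

    OPENAI_MODEL wins when set and non-empty; LLM_MODEL is ignored;
    otherwise the default auxiliary model is used.
    """
    env = os.environ if env is None else env
    return env.get("OPENAI_MODEL") or DEFAULT_AUX_MODEL
```

With this order, a user who only sets `LLM_MODEL` gets the default model for auxiliary calls against a custom endpoint.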

@@ -499,6 +502,205 @@ def _resolve_auto() -> Tuple[Optional[OpenAI], Optional[str]]:
    return None, None


+# ── Centralized Provider Router ─────────────────────────────────────────────
+#
+# resolve_provider_client() is the single entry point for creating a properly
+# configured client given a (provider, model) pair.  It handles auth lookup,
+# base URL resolution, provider-specific headers, and API format differences
+# (Chat Completions vs Responses API for Codex).
+#
+# All auxiliary consumer code should go through this or the public helpers
+# below — never look up auth env vars ad-hoc.
+
+
+def _to_async_client(sync_client, model: str):
+    """Convert a sync client to its async counterpart, preserving Codex routing."""
+    from openai import AsyncOpenAI
+
+    if isinstance(sync_client, CodexAuxiliaryClient):
+        return AsyncCodexAuxiliaryClient(sync_client), model
+
+    async_kwargs = {
+        "api_key": sync_client.api_key,
+        "base_url": str(sync_client.base_url),
+    }
+    base_lower = str(sync_client.base_url).lower()
+    if "openrouter" in base_lower:
+        async_kwargs["default_headers"] = dict(_OR_HEADERS)
+    elif "api.kimi.com" in base_lower:
+        async_kwargs["default_headers"] = {"User-Agent": "KimiCLI/1.0"}
+    return AsyncOpenAI(**async_kwargs), model
+
+
+def resolve_provider_client(
+    provider: str,
+    model: Optional[str] = None,
+    async_mode: bool = False,
+    raw_codex: bool = False,
+) -> Tuple[Optional[Any], Optional[str]]:
+    """Central router: given a provider name and optional model, return a
+    configured client with the correct auth, base URL, and API format.
+
+    The returned client always exposes ``.chat.completions.create()`` — for
+    Codex/Responses API providers, an adapter handles the translation
+    transparently.
+
+    Args:
+        provider: Provider identifier.  One of:
+            "openrouter", "nous", "openai-codex" (or "codex"),
+            "zai", "kimi-coding", "minimax", "minimax-cn",
+            "custom" (or "main"; OPENAI_BASE_URL + OPENAI_API_KEY),
+            "auto" (full auto-detection chain).
+        model: Model slug override.  If None, uses the provider's default
+               auxiliary model.
+        async_mode: If True, return an async-compatible client.
+        raw_codex: If True, return a raw OpenAI client for Codex providers
+            instead of wrapping in CodexAuxiliaryClient.  Use this when
+            the caller needs direct access to responses.stream() (e.g.,
+            the main agent loop).
+
+    Returns:
+        (client, resolved_model), or (None, None) if auth is unavailable
+        or the provider cannot be resolved.
+    """
+    # Normalise aliases
+    provider = (provider or "auto").strip().lower()
+    if provider == "codex":
+        provider = "openai-codex"
+    if provider == "main":
+        provider = "custom"
+
+    # ── Auto: try all providers in priority order ────────────────────
+    if provider == "auto":
+        client, resolved = _resolve_auto()
+        if client is None:
+            return None, None
+        final_model = model or resolved
+        return (_to_async_client(client, final_model) if async_mode
+                else (client, final_model))
+
+    # ── OpenRouter ───────────────────────────────────────────────────
+    if provider == "openrouter":
+        client, default = _try_openrouter()
+        if client is None:
+            logger.warning("resolve_provider_client: openrouter requested "
+                           "but OPENROUTER_API_KEY not set")
+            return None, None
+        final_model = model or default
+        return (_to_async_client(client, final_model) if async_mode
+                else (client, final_model))
+
+    # ── Nous Portal (OAuth) ──────────────────────────────────────────
+    if provider == "nous":
+        client, default = _try_nous()
+        if client is None:
+            logger.warning("resolve_provider_client: nous requested "
+                           "but Nous Portal not configured (run: hermes login)")
+            return None, None
+        final_model = model or default
+        return (_to_async_client(client, final_model) if async_mode
+                else (client, final_model))
+
+    # ── OpenAI Codex (OAuth → Responses API) ─────────────────────────
+    if provider == "openai-codex":
+        if raw_codex:
+            # Return the raw OpenAI client for callers that need direct
+            # access to responses.stream() (e.g., the main agent loop).
+            codex_token = _read_codex_access_token()
+            if not codex_token:
+                logger.warning("resolve_provider_client: openai-codex requested "
+                               "but no Codex OAuth token found (run: hermes model)")
+                return None, None
+            final_model = model or _CODEX_AUX_MODEL
+            raw_client = OpenAI(api_key=codex_token, base_url=_CODEX_AUX_BASE_URL)
+            return (raw_client, final_model)
+        # Standard path: wrap in CodexAuxiliaryClient adapter
+        client, default = _try_codex()
+        if client is None:
+            logger.warning("resolve_provider_client: openai-codex requested "
+                           "but no Codex OAuth token found (run: hermes model)")
+            return None, None
+        final_model = model or default
+        return (_to_async_client(client, final_model) if async_mode
+                else (client, final_model))
+
+    # ── Custom endpoint (OPENAI_BASE_URL + OPENAI_API_KEY) ───────────
+    if provider == "custom":
+        # Try custom first, then codex, then API-key providers
+        for try_fn in (_try_custom_endpoint, _try_codex,
+                       _resolve_api_key_provider):
+            client, default = try_fn()
+            if client is not None:
+                final_model = model or default
+                return (_to_async_client(client, final_model) if async_mode
+                        else (client, final_model))
+        logger.warning("resolve_provider_client: custom/main requested "
+                       "but no endpoint credentials found")
+        return None, None
+
+    # ── API-key providers from PROVIDER_REGISTRY ─────────────────────
+    try:
+        from hermes_cli.auth import PROVIDER_REGISTRY, _resolve_kimi_base_url
+    except ImportError:
+        logger.debug("hermes_cli.auth not available for provider %s", provider)
+        return None, None
+
+    pconfig = PROVIDER_REGISTRY.get(provider)
+    if pconfig is None:
+        logger.warning("resolve_provider_client: unknown provider %r", provider)
+        return None, None
+
+    if pconfig.auth_type == "api_key":
+        # Find the first configured API key
+        api_key = ""
+        for env_var in pconfig.api_key_env_vars:
+            api_key = os.getenv(env_var, "").strip()
+            if api_key:
+                break
+        if not api_key:
+            logger.warning("resolve_provider_client: provider %s has no API "
+                           "key configured (tried: %s)",
+                           provider, ", ".join(pconfig.api_key_env_vars))
+            return None, None
+
+        # Resolve base URL (env override → provider-specific logic → default)
+        base_url_override = os.getenv(pconfig.base_url_env_var, "").strip() if pconfig.base_url_env_var else ""
+        if provider == "kimi-coding":
+            base_url = _resolve_kimi_base_url(api_key, pconfig.inference_base_url, base_url_override)
+        elif base_url_override:
+            base_url = base_url_override
+        else:
+            base_url = pconfig.inference_base_url
+
+        default_model = _API_KEY_PROVIDER_AUX_MODELS.get(provider, "")
+        final_model = model or default_model
+
+        # Provider-specific headers
+        headers = {}
+        if "api.kimi.com" in base_url.lower():
+            headers["User-Agent"] = "KimiCLI/1.0"
+
+        client = OpenAI(api_key=api_key, base_url=base_url,
+                        **({"default_headers": headers} if headers else {}))
+        logger.debug("resolve_provider_client: %s (%s)", provider, final_model)
+        return (_to_async_client(client, final_model) if async_mode
+                else (client, final_model))
+
+    elif pconfig.auth_type in ("oauth_device_code", "oauth_external"):
+        # OAuth providers — route through their specific try functions
+        if provider == "nous":
+            return resolve_provider_client("nous", model, async_mode)
+        if provider == "openai-codex":
+            return resolve_provider_client("openai-codex", model, async_mode)
+        # Other OAuth providers not directly supported
+        logger.warning("resolve_provider_client: OAuth provider %s not "
+                       "directly supported, try 'auto'", provider)
+        return None, None
+
+    logger.warning("resolve_provider_client: unhandled auth_type %s for %s",
+                   pconfig.auth_type, provider)
+    return None, None
+
+
 # ── Public API ──────────────────────────────────────────────────────────────

 def get_text_auxiliary_client(task: str = "") -> Tuple[Optional[OpenAI], Optional[str]]:
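Note on the router added in this hunk: the alias normalisation, the "first configured API key" scan, and the base URL precedence (env override, then registry default) can be sketched in isolation. This is a simplified illustration, not the module's code: `ProviderConfig` is a hypothetical stand-in for a `PROVIDER_REGISTRY` entry, the three helper names are invented for the example, and the Kimi-specific base URL logic is left out.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class ProviderConfig:
    """Hypothetical stand-in for a PROVIDER_REGISTRY entry."""
    api_key_env_vars: Tuple[str, ...]
    inference_base_url: str
    base_url_env_var: str = ""


def normalize_provider(name: Optional[str]) -> str:
    """Lower-case, trim, and map the aliases handled by the router."""
    provider = (name or "auto").strip().lower()
    return {"codex": "openai-codex", "main": "custom"}.get(provider, provider)


def first_api_key(cfg: ProviderConfig, env: Mapping[str, str]) -> str:
    """Return the first non-blank key among the configured env vars."""
    for var in cfg.api_key_env_vars:
        value = env.get(var, "").strip()
        if value:
            return value
    return ""


def resolve_base_url(cfg: ProviderConfig, env: Mapping[str, str]) -> str:
    """An env override wins; otherwise use the registry default."""
    override = ""
    if cfg.base_url_env_var:
        override = env.get(cfg.base_url_env_var, "").strip()
    return override or cfg.inference_base_url
```

Passing the environment as a mapping keeps the helpers free of global state, which makes the precedence rules straightforward to check.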
@@ -513,8 +715,8 @@ def get_text_auxiliary_client(task: str = "") -> Tuple[Optional[OpenAI], Optiona
    """
    forced = _get_auxiliary_provider(task)
    if forced != "auto":
-        return _resolve_forced_provider(forced)
-    return _resolve_auto()
+        return resolve_provider_client(forced)
+    return resolve_provider_client("auto")


 def get_async_text_auxiliary_client(task: str = ""):
@@ -524,24 +726,10 @@ def get_async_text_auxiliary_client(task: str = ""):
    (AsyncCodexAuxiliaryClient, model) which wraps the Responses API.
    Returns (None, None) when no provider is available.
    """
-    from openai import AsyncOpenAI
-
-    sync_client, model = get_text_auxiliary_client(task)
-    if sync_client is None:
-        return None, None
-
-    if isinstance(sync_client, CodexAuxiliaryClient):
-        return AsyncCodexAuxiliaryClient(sync_client), model
-
-    async_kwargs = {
-        "api_key": sync_client.api_key,
-        "base_url": str(sync_client.base_url),
-    }
-    if "openrouter" in str(sync_client.base_url).lower():
-        async_kwargs["default_headers"] = dict(_OR_HEADERS)
-    elif "api.kimi.com" in str(sync_client.base_url).lower():
-        async_kwargs["default_headers"] = {"User-Agent": "KimiCLI/1.0"}
-    return AsyncOpenAI(**async_kwargs), model
+    forced = _get_auxiliary_provider(task)
+    if forced != "auto":
+        return resolve_provider_client(forced, async_mode=True)
+    return resolve_provider_client("auto", async_mode=True)


 def get_vision_auxiliary_client() -> Tuple[Optional[OpenAI], Optional[str]]:
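Note on this hunk: the inline sync-to-async conversion moved into `_to_async_client`, which re-applies provider-specific default headers based on the client's base URL. A minimal sketch of just that header selection, assuming a hypothetical function `default_headers_for` and a placeholder `EXAMPLE_OR_HEADERS` (the real `_OR_HEADERS` contents are not shown in this diff):

```python
from typing import Dict

# Placeholder only; the real _OR_HEADERS values are not visible in this diff.
EXAMPLE_OR_HEADERS: Dict[str, str] = {"X-Title": "example-app"}


def default_headers_for(base_url: str) -> Dict[str, str]:
    """Pick default headers by substring match on the base URL."""
    base_lower = base_url.lower()
    if "openrouter" in base_lower:
        # Copy so callers cannot mutate the shared constant.
        return dict(EXAMPLE_OR_HEADERS)
    if "api.kimi.com" in base_lower:
        return {"User-Agent": "KimiCLI/1.0"}
    return {}
```

Keeping this selection in one place means the sync and async paths cannot drift apart, which appears to be the motivation for routing both through the same helper.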
@@ -559,7 +747,7 @@ def get_vision_auxiliary_client() -> Tuple[Optional[OpenAI], Optional[str]]:
    """
    forced = _get_auxiliary_provider("vision")
    if forced != "auto":
-        return _resolve_forced_provider(forced)
+        return resolve_provider_client(forced)
    # Auto: try providers known to support multimodal first, then fall
    # back to the user's custom endpoint.  Many local models (Qwen-VL,
    # LLaVA, Pixtral, etc.) support vision — skipping them entirely
@@ -573,6 +761,21 @@ def get_vision_auxiliary_client() -> Tuple[Optional[OpenAI], Optional[str]]:
    return None, None


+def get_async_vision_auxiliary_client():
+    """Return (async_client, model_slug) for async vision consumers.
+
+    Properly handles Codex routing — unlike manually constructing
+    AsyncOpenAI from a sync client, this preserves the Responses API
+    adapter for Codex providers.
+
+    Returns (None, None) when no provider is available.
+    """
+    sync_client, model = get_vision_auxiliary_client()
+    if sync_client is None:
+        return None, None
+    return _to_async_client(sync_client, model)
+
+
 def get_auxiliary_extra_body() -> dict:
    """Return extra_body kwargs for auxiliary API calls.
    
@@ -598,3 +801,253 @@ def auxiliary_max_tokens_param(value: int) -> dict:
            and "api.openai.com" in custom_base.lower()):
        return {"max_completion_tokens": value}
    return {"max_tokens": value}
+
+
+# ── Centralized LLM Call API ────────────────────────────────────────────────
+#
+# call_llm() and async_call_llm() own the full request lifecycle:
+#   1. Resolve provider + model from task config (or explicit args)
+#   2. Get or create a cached client for that provider
+#   3. Format request args for the provider + model (max_tokens handling, etc.)
+#   4. Make the API call
+#   5. Return the response
+#
+# Every auxiliary LLM consumer should use these instead of manually
+# constructing clients and calling .chat.completions.create().
+
+# Client cache: (provider, async_mode) -> (client, default_model)
+_client_cache: Dict[tuple, tuple] = {}
+
+
+def _get_cached_client(
+    provider: str, model: Optional[str] = None, async_mode: bool = False,
+) -> Tuple[Optional[Any], Optional[str]]:
+    """Get or create a cached client for the given provider."""
+    cache_key = (provider, async_mode)
+    if cache_key in _client_cache:
+        cached_client, cached_default = _client_cache[cache_key]
+        return cached_client, model or cached_default
+    client, default_model = resolve_provider_client(provider, model, async_mode)
+    if client is not None:
+        _client_cache[cache_key] = (client, default_model)
+    return client, model or default_model
+
+
+def _resolve_task_provider_model(
+    task: Optional[str] = None,
+    provider: Optional[str] = None,
+    model: Optional[str] = None,
+) -> Tuple[str, Optional[str]]:
+    """Determine provider + model for a call.
+
+    Priority:
+      1. Explicit provider/model args (always win)
+      2. Env var overrides (AUXILIARY_{TASK}_PROVIDER, etc.)
+      3. Config file (auxiliary.{task}.provider/model or compression.*)
+      4. "auto" (full auto-detection chain)
+
+    Returns (provider, model) where model may be None (use provider default).
+    """
+    if provider:
+        return provider, model
+
+    if task:
+        # Check env var overrides first
+        env_provider = _get_auxiliary_provider(task)
+        if env_provider != "auto":
+            # Check for env var model override too
+            env_model = None
+            for prefix in ("AUXILIARY_", "CONTEXT_"):
+                val = os.getenv(f"{prefix}{task.upper()}_MODEL", "").strip()
+                if val:
+                    env_model = val
+                    break
+            return env_provider, model or env_model
+
+        # Read from config file
+        try:
+            from hermes_cli.config import load_config
+            config = load_config()
+        except ImportError:
+            return "auto", model
+
+        # Check auxiliary.{task} section
+        aux = config.get("auxiliary", {})
+        task_config = aux.get(task, {})
+        cfg_provider = task_config.get("provider", "").strip() or None
+        cfg_model = task_config.get("model", "").strip() or None
+
+        # Backwards compat: compression section has its own keys
+        if task == "compression" and not cfg_provider:
+            comp = config.get("compression", {})
+            cfg_provider = comp.get("summary_provider", "").strip() or None
+            cfg_model = cfg_model or comp.get("summary_model", "").strip() or None
+
+        if cfg_provider and cfg_provider != "auto":
+            return cfg_provider, model or cfg_model
+        return "auto", model or cfg_model
+
+    return "auto", model
+
+
+def _build_call_kwargs(
+    provider: str,
+    model: str,
+    messages: list,
+    temperature: Optional[float] = None,
+    max_tokens: Optional[int] = None,
+    tools: Optional[list] = None,
+    timeout: float = 30.0,
+    extra_body: Optional[dict] = None,
+) -> dict:
+    """Build kwargs for .chat.completions.create() with model/provider adjustments."""
+    kwargs: Dict[str, Any] = {
+        "model": model,
+        "messages": messages,
+        "timeout": timeout,
+    }
+
+    if temperature is not None:
+        kwargs["temperature"] = temperature
+
+    if max_tokens is not None:
+        # Codex adapter handles max_tokens internally; OpenRouter/Nous use max_tokens.
+        # Direct OpenAI api.openai.com with newer models needs max_completion_tokens.
+        if provider == "custom":
+            custom_base = os.getenv("OPENAI_BASE_URL", "")
+            if "api.openai.com" in custom_base.lower():
+                kwargs["max_completion_tokens"] = max_tokens
+            else:
+                kwargs["max_tokens"] = max_tokens
+        else:
+            kwargs["max_tokens"] = max_tokens
+
+    if tools:
+        kwargs["tools"] = tools
+
+    # Provider-specific extra_body
+    merged_extra = dict(extra_body or {})
+    if provider == "nous" or auxiliary_is_nous:
+        merged_extra["tags"] = [*merged_extra.get("tags", []), "product=hermes-agent"]
+    if merged_extra:
+        kwargs["extra_body"] = merged_extra
+
+    return kwargs
+
+
+def call_llm(
+    task: Optional[str] = None,
+    *,
+    provider: Optional[str] = None,
+    model: Optional[str] = None,
+    messages: list,
+    temperature: Optional[float] = None,
+    max_tokens: Optional[int] = None,
+    tools: Optional[list] = None,
+    timeout: float = 30.0,
+    extra_body: Optional[dict] = None,
+) -> Any:
+    """Centralized synchronous LLM call.
+
+    Resolves provider + model (from task config, explicit args, or auto-detect),
+    handles auth, request formatting, and model-specific arg adjustments.
+
+    Args:
+        task: Auxiliary task name ("compression", "vision", "web_extract",
+              "session_search", "skills_hub", "mcp", "flush_memories").
+              Reads provider:model from config/env. Ignored if provider is set.
+        provider: Explicit provider override.
+        model: Explicit model override.
+        messages: Chat messages list.
+        temperature: Sampling temperature (None = provider default).
+        max_tokens: Max output tokens (handles max_tokens vs max_completion_tokens).
+        tools: Tool definitions (for function calling).
+        timeout: Request timeout in seconds.
+        extra_body: Additional request body fields.
+
+    Returns:
+        Response object with .choices[0].message.content
+
+    Raises:
+        RuntimeError: If no provider is configured.
+    """
+    resolved_provider, resolved_model = _resolve_task_provider_model(
+        task, provider, model)
+
+    client, final_model = _get_cached_client(resolved_provider, resolved_model)
+    if client is None:
+        # Fallback: try openrouter
+        if resolved_provider != "openrouter":
+            logger.warning("Provider %s unavailable, falling back to openrouter",
+                           resolved_provider)
+            client, final_model = _get_cached_client(
+                "openrouter", resolved_model or _OPENROUTER_MODEL)
+    if client is None:
+        raise RuntimeError(
+            f"No LLM provider configured for task={task} provider={resolved_provider}. "
+            f"Run: hermes setup")
+
+    kwargs = _build_call_kwargs(
+        resolved_provider, final_model, messages,
+        temperature=temperature, max_tokens=max_tokens,
+        tools=tools, timeout=timeout, extra_body=extra_body)
+
+    # Handle max_tokens vs max_completion_tokens retry
+    try:
+        return client.chat.completions.create(**kwargs)
+    except Exception as first_err:
+        err_str = str(first_err)
+        if "max_tokens" in kwargs and ("max_tokens" in err_str or "unsupported_parameter" in err_str):
+            kwargs.pop("max_tokens", None)
+            kwargs["max_completion_tokens"] = max_tokens
+            return client.chat.completions.create(**kwargs)
+        raise
+
+
+async def async_call_llm(
+    task: Optional[str] = None,
+    *,
+    provider: Optional[str] = None,
+    model: Optional[str] = None,
+    messages: list,
+    temperature: Optional[float] = None,
+    max_tokens: Optional[int] = None,
+    tools: Optional[list] = None,
+    timeout: float = 30.0,
+    extra_body: Optional[dict] = None,
+) -> Any:
+    """Centralized asynchronous LLM call.
+
+    Same as call_llm() but async. See call_llm() for full documentation.
+    """
+    resolved_provider, resolved_model = _resolve_task_provider_model(
+        task, provider, model)
+
+    client, final_model = _get_cached_client(
+        resolved_provider, resolved_model, async_mode=True)
+    if client is None:
+        if resolved_provider != "openrouter":
+            logger.warning("Provider %s unavailable, falling back to openrouter",
+                           resolved_provider)
+            client, final_model = _get_cached_client(
+                "openrouter", resolved_model or _OPENROUTER_MODEL,
+                async_mode=True)
+    if client is None:
+        raise RuntimeError(
+            f"No LLM provider configured for task={task} provider={resolved_provider}. "
+            f"Run: hermes setup")
+
+    kwargs = _build_call_kwargs(
+        resolved_provider, final_model, messages,
+        temperature=temperature, max_tokens=max_tokens,
+        tools=tools, timeout=timeout, extra_body=extra_body)
+
+    try:
+        return await client.chat.completions.create(**kwargs)
+    except Exception as first_err:
+        err_str = str(first_err)
+        if "max_tokens" in kwargs and ("max_tokens" in err_str or "unsupported_parameter" in err_str):
+            kwargs.pop("max_tokens", None)
+            kwargs["max_completion_tokens"] = max_tokens
+            return await client.chat.completions.create(**kwargs)
+        raise
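
The priority order documented in `_resolve_task_provider_model` above can be illustrated with a standalone sketch. This is a simplified restatement, not the function from the diff: the helper name `resolve` and the dict-based `env` and `config` arguments are assumptions made so the example runs without `os.environ` or `hermes_cli.config`, and it leaves out the `CONTEXT_` prefix and the legacy `compression` section.

```python
from typing import Mapping, Optional, Tuple


def resolve(
    task: Optional[str],
    provider: Optional[str],
    model: Optional[str],
    env: Mapping[str, str],
    config: Mapping[str, dict],
) -> Tuple[str, Optional[str]]:
    """Simplified stand-in for the priority chain (hypothetical helper)."""
    # 1. An explicit provider always wins.
    if provider:
        return provider, model
    if not task:
        return "auto", model

    # 2. Env var override: AUXILIARY_<TASK>_PROVIDER / AUXILIARY_<TASK>_MODEL.
    env_provider = env.get(f"AUXILIARY_{task.upper()}_PROVIDER", "").strip()
    if env_provider and env_provider != "auto":
        env_model = env.get(f"AUXILIARY_{task.upper()}_MODEL", "").strip() or None
        return env_provider, model or env_model

    # 3. Config file: auxiliary.<task>.provider / auxiliary.<task>.model.
    task_cfg = (config.get("auxiliary") or {}).get(task) or {}
    cfg_provider = (task_cfg.get("provider") or "").strip() or None
    cfg_model = (task_cfg.get("model") or "").strip() or None
    if cfg_provider and cfg_provider != "auto":
        return cfg_provider, model or cfg_model

    # 4. Fall through to auto-detection, keeping any configured model.
    return "auto", model or cfg_model


# An env override beats the config file for the same task.
print(resolve(
    "vision", None, None,
    env={"AUXILIARY_VISION_PROVIDER": "openrouter"},
    config={"auxiliary": {"vision": {"provider": "zai", "model": "glm-4.5"}}},
))
```

An explicit `model` argument survives every branch, so a caller can pin the model while still letting the provider come from the environment or the config file.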
@@ -9,7 +9,7 @@ import logging
 import os
 from typing import Any, Dict, List, Optional

-from agent.auxiliary_client import get_text_auxiliary_client
+from agent.auxiliary_client import call_llm
 from agent.model_metadata import (
    get_model_context_length,
    estimate_messages_tokens_rough,
@@ -53,8 +53,7 @@ class ContextCompressor:
        self.last_completion_tokens = 0
        self.last_total_tokens = 0

-        self.client, default_model = get_text_auxiliary_client("compression")
-        self.summary_model = summary_model_override or default_model
+        self.summary_model = summary_model_override or ""

    def update_from_response(self, usage: Dict[str, Any]):
        """Update tracked token usage from API response."""
@@ -120,84 +119,30 @@ TURNS TO SUMMARIZE:

 Write only the summary, starting with "[CONTEXT SUMMARY]:" prefix."""

-        # 1. Try the auxiliary model (cheap/fast)
-        if self.client:
-            try:
-                return self._call_summary_model(self.client, self.summary_model, prompt)
-            except Exception as e:
-                logging.warning(f"Failed to generate context summary with auxiliary model: {e}")
-
-        # 2. Fallback: try the user's main model endpoint
-        fallback_client, fallback_model = self._get_fallback_client()
-        if fallback_client is not None:
-            try:
-                logger.info("Retrying context summary with main model (%s)", fallback_model)
-                summary = self._call_summary_model(fallback_client, fallback_model, prompt)
-                self.client = fallback_client
-                self.summary_model = fallback_model
-                return summary
-            except Exception as fallback_err:
-                logging.warning(f"Main model summary also failed: {fallback_err}")
-
-        # 3. All models failed — return None so the caller drops turns without a summary
-        logging.warning("Context compression: no model available for summary. Middle turns will be dropped without summary.")
-        return None
-
-    def _call_summary_model(self, client, model: str, prompt: str) -> str:
-        """Make the actual LLM call to generate a summary. Raises on failure."""
-        kwargs = {
-            "model": model,
-            "messages": [{"role": "user", "content": prompt}],
-            "temperature": 0.3,
-            "timeout": 30.0,
-        }
-        # Most providers (OpenRouter, local models) use max_tokens.
-        # Direct OpenAI with newer models (gpt-4o, o-series, gpt-5+)
-        # requires max_completion_tokens instead.
+        # Use the centralized LLM router — handles provider resolution,
+        # auth, and fallback internally.
        try:
-            kwargs["max_tokens"] = self.summary_target_tokens * 2
-            response = client.chat.completions.create(**kwargs)
-        except Exception as first_err:
-            if "max_tokens" in str(first_err) or "unsupported_parameter" in str(first_err):
-                kwargs.pop("max_tokens", None)
-                kwargs["max_completion_tokens"] = self.summary_target_tokens * 2
-                response = client.chat.completions.create(**kwargs)
-            else:
-                raise
-
-        summary = response.choices[0].message.content.strip()
-        if not summary.startswith("[CONTEXT SUMMARY]:"):
-            summary = "[CONTEXT SUMMARY]: " + summary
-        return summary
-
-    def _get_fallback_client(self):
-        """Try to build a fallback client from the main model's endpoint config.
-
-        When the primary auxiliary client fails (e.g. stale OpenRouter key), this
-        creates a client using the user's active custom endpoint (OPENAI_BASE_URL)
-        so compression can still produce a real summary instead of a static string.
-
-        Returns (client, model) or (None, None).
-        """
-        custom_base = os.getenv("OPENAI_BASE_URL")
-        custom_key = os.getenv("OPENAI_API_KEY")
-        if not custom_base or not custom_key:
-            return None, None
-
-        # Don't fallback to the same provider that just failed
-        from hermes_constants import OPENROUTER_BASE_URL
-        if custom_base.rstrip("/") == OPENROUTER_BASE_URL.rstrip("/"):
-            return None, None
-
-        model = os.getenv("LLM_MODEL") or os.getenv("OPENAI_MODEL") or self.model
-        try:
-            from openai import OpenAI as _OpenAI
-            client = _OpenAI(api_key=custom_key, base_url=custom_base)
-            logger.debug("Built fallback auxiliary client: %s via %s", model, custom_base)
-            return client, model
-        except Exception as exc:
-            logger.debug("Could not build fallback auxiliary client: %s", exc)
-            return None, None
+            call_kwargs = {
+                "task": "compression",
+                "messages": [{"role": "user", "content": prompt}],
+                "temperature": 0.3,
+                "max_tokens": self.summary_target_tokens * 2,
+                "timeout": 30.0,
+            }
+            if self.summary_model:
+                call_kwargs["model"] = self.summary_model
+            response = call_llm(**call_kwargs)
+            summary = response.choices[0].message.content.strip()
+            if not summary.startswith("[CONTEXT SUMMARY]:"):
+                summary = "[CONTEXT SUMMARY]: " + summary
+            return summary
+        except RuntimeError:
+            logging.warning("Context compression: no provider available for "
+                            "summary. Middle turns will be dropped without summary.")
+            return None
+        except Exception as e:
+            logging.warning("Failed to generate context summary: %s", e)
+            return None

    # ------------------------------------------------------------------
    # Tool-call / tool-result pair integrity helpers
@@ -53,8 +53,10 @@ DEFAULT_CONTEXT_LENGTHS = {
    "glm-5": 202752,
    "glm-4.5": 131072,
    "glm-4.5-flash": 131072,
+    "kimi-for-coding": 262144,
    "kimi-k2.5": 262144,
    "kimi-k2-thinking": 262144,
+    "kimi-k2-thinking-turbo": 262144,
    "kimi-k2-turbo-preview": 262144,
    "kimi-k2-0905-preview": 131072,
    "MiniMax-M2.5": 204800,
@@ -131,6 +131,14 @@ PLATFORM_HINTS = {
        "files arrive as downloadable documents. You can also include image "
        "URLs in markdown format ![alt](url) and they will be sent as photos."
    ),
+    "email": (
+        "You are communicating via email. Write clear, well-structured responses "
+        "suitable for email. Use plain text formatting (no markdown). "
+        "Keep responses concise but complete. You can send file attachments — "
+        "include MEDIA:/absolute/path/to/file in your response. The subject line "
+        "is preserved for threading. Do not include greetings or sign-offs unless "
+        "contextually appropriate."
+    ),
    "cli": (
        "You are a CLI AI Agent. Try not to use markdown but simple text "
        "renderable inside a terminal."
@@ -179,7 +187,58 @@ def _skill_is_platform_compatible(skill_file: Path) -> bool:
        return True  # Err on the side of showing the skill


-def build_skills_system_prompt() -> str:
+def _read_skill_conditions(skill_file: Path) -> dict:
+    """Extract conditional activation fields from SKILL.md frontmatter."""
+    try:
+        from tools.skills_tool import _parse_frontmatter
+        raw = skill_file.read_text(encoding="utf-8")[:2000]
+        frontmatter, _ = _parse_frontmatter(raw)
+        hermes = frontmatter.get("metadata", {}).get("hermes", {})
+        return {
+            "fallback_for_toolsets": hermes.get("fallback_for_toolsets", []),
+            "requires_toolsets": hermes.get("requires_toolsets", []),
+            "fallback_for_tools": hermes.get("fallback_for_tools", []),
+            "requires_tools": hermes.get("requires_tools", []),
+        }
+    except Exception:
+        return {}
+
+
+def _skill_should_show(
+    conditions: dict,
+    available_tools: "set[str] | None",
+    available_toolsets: "set[str] | None",
+) -> bool:
+    """Return False if the skill's conditional activation rules exclude it."""
+    if available_tools is None and available_toolsets is None:
+        return True  # No filtering info — show everything (backward compat)
+
+    at = available_tools or set()
+    ats = available_toolsets or set()
+
+    # fallback_for: hide when the primary tool/toolset IS available
+    for ts in conditions.get("fallback_for_toolsets", []):
+        if ts in ats:
+            return False
+    for t in conditions.get("fallback_for_tools", []):
+        if t in at:
+            return False
+
+    # requires: hide when a required tool/toolset is NOT available
+    for ts in conditions.get("requires_toolsets", []):
+        if ts not in ats:
+            return False
+    for t in conditions.get("requires_tools", []):
+        if t not in at:
+            return False
+
+    return True
+
+
+def build_skills_system_prompt(
+    available_tools: "set[str] | None" = None,
+    available_toolsets: "set[str] | None" = None,
+) -> str:
    """Build a compact skill index for the system prompt.

    Scans ~/.hermes/skills/ for SKILL.md files grouped by category.
@@ -202,6 +261,10 @@ def build_skills_system_prompt() -> str:
        # Skip skills incompatible with the current OS platform
        if not _skill_is_platform_compatible(skill_file):
            continue
+        # Skip skills whose conditional activation rules exclude them
+        conditions = _read_skill_conditions(skill_file)
+        if not _skill_should_show(conditions, available_tools, available_toolsets):
+            continue
        rel_path = skill_file.relative_to(skills_dir)
        parts = rel_path.parts
        if len(parts) >= 2:
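
To make the conditional activation rules above concrete, here is a condensed, self-contained restatement of the filter with a few sample inputs. The function name `should_show` and the two sample condition dicts are hypothetical; only the four condition keys come from the diff.

```python
from typing import Optional, Set


def should_show(conditions: dict,
                tools: Optional[Set[str]],
                toolsets: Optional[Set[str]]) -> bool:
    """Condensed restatement of the skill filter (hypothetical helper)."""
    if tools is None and toolsets is None:
        return True  # no filtering info: show everything
    tools, toolsets = tools or set(), toolsets or set()
    # Hide a fallback skill when the thing it substitutes for is available.
    if toolsets & set(conditions.get("fallback_for_toolsets", [])):
        return False
    if tools & set(conditions.get("fallback_for_tools", [])):
        return False
    # Hide a skill when anything it requires is missing.
    if not set(conditions.get("requires_toolsets", [])) <= toolsets:
        return False
    if not set(conditions.get("requires_tools", [])) <= tools:
        return False
    return True


# Hypothetical skill that only makes sense when no "web" toolset is loaded.
fallback_skill = {"fallback_for_toolsets": ["web"]}
# Hypothetical skill that needs a "terminal" tool to be useful.
terminal_skill = {"requires_tools": ["terminal"]}

print(should_show(fallback_skill, set(), {"web"}))       # hidden: primary toolset present
print(should_show(fallback_skill, set(), {"file"}))      # shown: primary toolset absent
print(should_show(terminal_skill, {"terminal"}, set()))  # shown: requirement met
print(should_show(terminal_skill, None, None))           # shown: no filter info given
```

A skill with an empty conditions dict passes every check, which matches the `except Exception: return {}` path in `_read_skill_conditions`: a skill whose frontmatter cannot be parsed is still listed.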
@@ -626,6 +626,10 @@ code_execution:
 delegation:
  max_iterations: 50                          # Max tool-calling turns per child (default: 50)
  default_toolsets: ["terminal", "file", "web"]  # Default toolsets for subagents
+  # model: "google/gemini-3-flash-preview"    # Override model for subagents (empty = inherit parent)
+  # provider: "openrouter"                    # Override provider for subagents (empty = inherit parent)
+  #                                           # Resolves full credentials (base_url, api_key) automatically.
+  #                                           # Supported: openrouter, nous, zai, kimi-coding, minimax

 # =============================================================================
 # Honcho Integration (Cross-Session User Modeling)
@@ -670,6 +674,11 @@ display:
  # Works over SSH. Most terminals can be configured to flash the taskbar or play a sound.
  bell_on_complete: false

+  # Show model reasoning/thinking before each response.
+  # When enabled, a dim box shows the model's thought process above the response.
+  # Toggle at runtime with /reasoning show or /reasoning hide.
+  show_reasoning: false
+
  # ───────────────────────────────────────────────────────────────────────────
  # Skin / Theme
  # ───────────────────────────────────────────────────────────────────────────
@@ -205,6 +205,7 @@ def load_cli_config() -> Dict[str, Any]:
        "display": {
            "compact": False,
            "resume_display": "full",
+            "show_reasoning": False,
            "skin": "default",
        },
        "clarify": {
@@ -217,6 +218,8 @@ def load_cli_config() -> Dict[str, Any]:
        "delegation": {
            "max_iterations": 45,  # Max tool-calling turns per child agent
            "default_toolsets": ["terminal", "file", "web"],  # Default toolsets for subagents
+            "model": "",       # Subagent model override (empty = inherit parent model)
+            "provider": "",    # Subagent provider override (empty = inherit parent provider)
        },
    }
    
@@ -413,7 +416,7 @@ from model_tools import get_tool_definitions, get_toolset_for_tool
 # Extracted CLI modules (Phase 3)
 from hermes_cli.banner import (
    cprint as _cprint, _GOLD, _BOLD, _DIM, _RST,
-    VERSION, HERMES_AGENT_LOGO, HERMES_CADUCEUS, COMPACT_BANNER,
+    VERSION, RELEASE_DATE, HERMES_AGENT_LOGO, HERMES_CADUCEUS, COMPACT_BANNER,
    get_available_skills as _get_available_skills,
    build_welcome_banner,
 )
@@ -990,7 +993,7 @@ def build_welcome_banner(console: Console, model: str, cwd: str, tools: List[dic
    # Wrap in a panel with the title
    outer_panel = Panel(
        layout_table,
-        title=f"[bold {_title_c}]{_agent_name} {VERSION}[/]",
+        title=f"[bold {_title_c}]{_agent_name} v{VERSION} ({RELEASE_DATE})[/]",
        border_style=_border_c,
        padding=(0, 2),
    )
@@ -1060,6 +1063,12 @@ def save_config_value(key_path: str, value: any) -> bool:
        with open(config_path, 'w') as f:
            yaml.dump(config, f, default_flow_style=False, sort_keys=False)
        
+        # Enforce owner-only permissions on config files (contain API keys)
+        try:
+            os.chmod(config_path, 0o600)
+        except (OSError, NotImplementedError):
+            pass
+        
        return True
    except Exception as e:
        logger.error("Failed to save config: %s", e)
@@ -1090,6 +1099,7 @@ class HermesCLI:
        compact: bool = False,
        resume: str = None,
        checkpoints: bool = False,
+        pass_session_id: bool = False,
    ):
        """
        Initialize the Hermes CLI.
@@ -1104,9 +1114,11 @@ class HermesCLI:
            verbose: Enable verbose logging
            compact: Use compact display mode
            resume: Session ID to resume (restores conversation history from SQLite)
+            pass_session_id: Include the session ID in the agent's system prompt
        """
        # Initialize Rich console
        self.console = Console()
+        self.config = CLI_CONFIG
        self.compact = compact if compact is not None else CLI_CONFIG["display"].get("compact", False)
        # tool_progress: "off", "new", "all", "verbose" (from config.yaml display section)
        self.tool_progress_mode = CLI_CONFIG["display"].get("tool_progress", "all")
@@ -1114,15 +1126,22 @@ class HermesCLI:
        self.resume_display = CLI_CONFIG["display"].get("resume_display", "full")
        # bell_on_complete: play terminal bell (\a) when agent finishes a response
        self.bell_on_complete = CLI_CONFIG["display"].get("bell_on_complete", False)
+        # show_reasoning: display model thinking/reasoning before the response
+        self.show_reasoning = CLI_CONFIG["display"].get("show_reasoning", False)
        self.verbose = verbose if verbose is not None else (self.tool_progress_mode == "verbose")
        
        # Configuration - priority: CLI args > env vars > config file
-        # Model can come from: CLI arg, LLM_MODEL env, OPENAI_MODEL env (custom endpoint), or config
-        self.model = model or os.getenv("LLM_MODEL") or os.getenv("OPENAI_MODEL") or CLI_CONFIG["model"]["default"]
+        # Model comes from: CLI arg or config.yaml (single source of truth).
+        # LLM_MODEL/OPENAI_MODEL env vars are NOT checked — config.yaml is
+        # authoritative.  This avoids conflicts in multi-agent setups where
+        # env vars would stomp each other.
+        _model_config = CLI_CONFIG.get("model", {})
+        _config_model = _model_config.get("default", "") if isinstance(_model_config, dict) else (_model_config or "")
+        self.model = model or _config_model or "anthropic/claude-opus-4.6"
        # Track whether model was explicitly chosen by the user or fell back
        # to the global default.  Provider-specific normalisation may override
        # the default silently but should warn when overriding an explicit choice.
-        self._model_is_default = not (model or os.getenv("LLM_MODEL") or os.getenv("OPENAI_MODEL"))
+        self._model_is_default = not model

        self._explicit_api_key = api_key
        self._explicit_base_url = base_url
@@ -1177,6 +1196,7 @@ class HermesCLI:
            cp_cfg = {"enabled": cp_cfg}
        self.checkpoints_enabled = checkpoints or cp_cfg.get("enabled", False)
        self.checkpoint_max_snapshots = cp_cfg.get("max_snapshots", 50)
+        self.pass_session_id = pass_session_id
        
        # Ephemeral system prompt: env var takes precedence, then config
        self.system_prompt = (
@@ -1243,6 +1263,10 @@ class HermesCLI:
        self._command_running = False
        self._command_status = ""

+        # Background task tracking: {task_id: threading.Thread}
+        self._background_tasks: Dict[str, threading.Thread] = {}
+        self._background_task_counter = 0
+
    def _invalidate(self, min_interval: float = 0.25) -> None:
        """Throttled UI repaint — prevents terminal blinking on slow/SSH connections."""
        import time as _time
@@ -1484,11 +1508,13 @@ class HermesCLI:
                platform="cli",
                session_db=self._session_db,
                clarify_callback=self._clarify_callback,
+                reasoning_callback=self._on_reasoning if self.show_reasoning else None,
                honcho_session_key=self.session_id,
                fallback_model=self._fallback_model,
                thinking_callback=self._on_thinking,
                checkpoints_enabled=self.checkpoints_enabled,
                checkpoint_max_snapshots=self.checkpoint_max_snapshots,
+                pass_session_id=self.pass_session_id,
            )
            # Apply any pending title now that the session exists in the DB
            if self._pending_title and self._session_db:
@@ -1942,18 +1968,22 @@ class HermesCLI:
        )
    
    def show_help(self):
-        """Display help information."""
-        _cprint(f"\n{_BOLD}+{'-' * 50}+{_RST}")
-        _cprint(f"{_BOLD}|{' ' * 14}(^_^)? Available Commands{' ' * 10}|{_RST}")
-        _cprint(f"{_BOLD}+{'-' * 50}+{_RST}\n")
-        
-        for cmd, desc in COMMANDS.items():
-            _cprint(f"  {_GOLD}{cmd:<15}{_RST} {_DIM}-{_RST} {desc}")
-        
+        """Display help information with categorized commands."""
+        from hermes_cli.commands import COMMANDS_BY_CATEGORY
+
+        _cprint(f"\n{_BOLD}+{'-' * 55}+{_RST}")
+        _cprint(f"{_BOLD}|{' ' * 14}(^_^)? Available Commands{' ' * 15}|{_RST}")
+        _cprint(f"{_BOLD}+{'-' * 55}+{_RST}")
+
+        for category, commands in COMMANDS_BY_CATEGORY.items():
+            _cprint(f"\n  {_BOLD}── {category} ──{_RST}")
+            for cmd, desc in commands.items():
+                _cprint(f"    {_GOLD}{cmd:<15}{_RST} {_DIM}-{_RST} {desc}")
+
        if _skill_commands:
            _cprint(f"\n  ⚡ {_BOLD}Skill Commands{_RST} ({len(_skill_commands)} installed):")
            for cmd, info in sorted(_skill_commands.items()):
-                _cprint(f"  {_GOLD}{cmd:<22}{_RST} {_DIM}-{_RST} {info['description']}")
+                _cprint(f"    {_GOLD}{cmd:<22}{_RST} {_DIM}-{_RST} {info['description']}")

        _cprint(f"\n  {_DIM}Tip: Just type your message to chat with Hermes!{_RST}")
        _cprint(f"  {_DIM}Multi-line: Alt+Enter for a new line{_RST}")
@@ -2239,6 +2269,72 @@ class HermesCLI:
        remaining = len(self.conversation_history)
        print(f"  {remaining} message(s) remaining in history.")
    
+    def _show_model_and_providers(self):
+        """Unified /model and /provider display.
+
+        Shows current model + provider, then lists all authenticated
+        providers with their available models so users can switch easily.
+        """
+        from hermes_cli.models import (
+            curated_models_for_provider, list_available_providers,
+            normalize_provider, _PROVIDER_LABELS,
+        )
+        from hermes_cli.auth import resolve_provider as _resolve_provider
+
+        # Resolve current provider
+        raw_provider = normalize_provider(self.provider)
+        if raw_provider == "auto":
+            try:
+                current = _resolve_provider(
+                    self.requested_provider,
+                    explicit_api_key=self._explicit_api_key,
+                    explicit_base_url=self._explicit_base_url,
+                )
+            except Exception:
+                current = "openrouter"
+        else:
+            current = raw_provider
+        current_label = _PROVIDER_LABELS.get(current, current)
+
+        print(f"\n  Current: {self.model} via {current_label}")
+        print()
+
+        # Show all authenticated providers with their models
+        providers = list_available_providers()
+        authed = [p for p in providers if p["authenticated"]]
+        unauthed = [p for p in providers if not p["authenticated"]]
+
+        if authed:
+            print("  Authenticated providers & models:")
+            for p in authed:
+                is_active = p["id"] == current
+                marker = " ← active" if is_active else ""
+                print(f"    [{p['id']}]{marker}")
+                curated = curated_models_for_provider(p["id"])
+                if curated:
+                    for mid, desc in curated:
+                        current_marker = " ← current" if (is_active and mid == self.model) else ""
+                        print(f"      {mid}{current_marker}")
+                else:
+                    print(f"      (use /model {p['id']}:<model-name>)")
+                print()
+
+        if unauthed:
+            names = ", ".join(p["label"] for p in unauthed)
+            print(f"  Not configured: {names}")
+            print(f"  Run: hermes setup")

+            print()
+
+        print("  Switch model:    /model <model-name>")
+        print("  Switch provider: /model <provider>:<model-name>")
+        if authed and len(authed) > 1:
+            # Show a concrete example with a non-active provider
+            other = next((p for p in authed if p["id"] != current), authed[0])
+            other_models = curated_models_for_provider(other["id"])
+            if other_models:
+                example_model = other_models[0][0]
+                print(f"  Example: /model {other['id']}:{example_model}")
+
    def _handle_prompt_command(self, cmd: str):
        """Handle the /prompt command to view or set system prompt."""
        parts = cmd.split(maxsplit=1)
@@ -2293,6 +2389,19 @@ class HermesCLI:
            print("    /personality    - Use a predefined personality")
            print()
    
+
+    @staticmethod
+    def _resolve_personality_prompt(value) -> str:
+        """Accept string or dict personality value; return system prompt string."""
+        if isinstance(value, dict):
+            parts = [value.get("system_prompt", "")]
+            if value.get("tone"):
+                parts.append(f'Tone: {value["tone"]}')
+            if value.get("style"):
+                parts.append(f'Style: {value["style"]}')
+            return "\n".join(p for p in parts if p)
+        return str(value)
+
    def _handle_personality_command(self, cmd: str):
        """Handle the /personality command to set predefined personalities."""
        parts = cmd.split(maxsplit=1)
@@ -2301,8 +2410,16 @@ class HermesCLI:
            # Set personality
            personality_name = parts[1].strip().lower()
            
-            if personality_name in self.personalities:
-                self.system_prompt = self.personalities[personality_name]
+            if personality_name in ("none", "default", "neutral"):
+                self.system_prompt = ""
+                self.agent = None  # Force re-init
+                if save_config_value("agent.system_prompt", ""):
+                    print("(^_^)b Personality cleared (saved to config)")
+                else:
+                    print("(^_^) Personality cleared (session only)")
+                print("  No personality overlay — using base agent behavior.")
+            elif personality_name in self.personalities:
+                self.system_prompt = self._resolve_personality_prompt(self.personalities[personality_name])
                self.agent = None  # Force re-init
                if save_config_value("agent.system_prompt", self.system_prompt):
                    print(f"(^_^)b Personality set to '{personality_name}' (saved to config)")
@@ -2311,7 +2428,7 @@ class HermesCLI:
                print(f"  \"{self.system_prompt[:60]}{'...' if len(self.system_prompt) > 60 else ''}\"")
            else:
                print(f"(._.) Unknown personality: {personality_name}")
-                print(f"  Available: {', '.join(self.personalities.keys())}")
+                print(f"  Available: none, {', '.join(self.personalities.keys())}")
        else:
            # Show available personalities
            print()
@@ -2319,8 +2436,13 @@ class HermesCLI:
            print("|" + " " * 12 + "(^o^)/ Personalities" + " " * 15 + "|")
            print("+" + "-" * 50 + "+")
            print()
+            print(f"  {'none':<12} - (no personality overlay)")
            for name, prompt in self.personalities.items():
-                print(f"  {name:<12} - \"{prompt}\"")
+                if isinstance(prompt, dict):
+                    preview = prompt.get("description") or prompt.get("system_prompt", "")[:50]
+                else:
+                    preview = str(prompt)[:50]
+                print(f"  {name:<12} - {preview}")
            print()
            print("  Usage: /personality <name>")
            print()
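
As a usage illustration for the personality changes above, the sketch below copies the logic of `_resolve_personality_prompt` into a free function and runs it on both accepted shapes. The `pirate` entry and its field values are made up for the example; the keys `system_prompt`, `tone`, `style`, and `description` are the ones the diff reads.

```python
def resolve_personality_prompt(value) -> str:
    """Accept a string or dict personality value; return the system prompt."""
    if isinstance(value, dict):
        parts = [value.get("system_prompt", "")]
        if value.get("tone"):
            parts.append(f'Tone: {value["tone"]}')
        if value.get("style"):
            parts.append(f'Style: {value["style"]}')
        # Empty pieces are dropped so a dict with only a tone still works.
        return "\n".join(p for p in parts if p)
    return str(value)


# Hypothetical dict-style personality entry.
pirate = {
    "description": "Talks like a pirate",
    "system_prompt": "You are a pirate.",
    "tone": "playful",
}

print(resolve_personality_prompt(pirate))
print(resolve_personality_prompt("You are terse."))
```

Note that `description` is used only for the `/personality` listing preview and does not reach the resolved prompt.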
@@ -2677,7 +2799,11 @@ class HermesCLI:
                        base_url_for_probe = runtime.get("base_url", "")
                    except Exception as e:
                        provider_label = _PROVIDER_LABELS.get(target_provider, target_provider)
-                        print(f"(>_<) Could not resolve credentials for provider '{provider_label}': {e}")
+                        if target_provider == "custom":
+                            print("(>_<) Custom endpoint not configured. Set OPENAI_BASE_URL and OPENAI_API_KEY,")
+                            print("      or run: hermes setup → Custom OpenAI-compatible endpoint")
+                        else:
+                            print(f"(>_<) Could not resolve credentials for provider '{provider_label}': {e}")
                        print(f"(^_^) Current model unchanged: {self.model}")
                        return True

@@ -2724,65 +2850,9 @@ class HermesCLI:
                            print(f"  Reason: {message}")
                        print("  Note: Model will revert on restart. Use a verified model to save to config.")
            else:
-                from hermes_cli.models import curated_models_for_provider, normalize_provider, _PROVIDER_LABELS
-                from hermes_cli.auth import resolve_provider as _resolve_provider
-                # Resolve "auto" to the actual provider using credential detection
-                raw_provider = normalize_provider(self.provider)
-                if raw_provider == "auto":
-                    try:
-                        display_provider = _resolve_provider(
-                            self.requested_provider,
-                            explicit_api_key=self._explicit_api_key,
-                            explicit_base_url=self._explicit_base_url,
-                        )
-                    except Exception:
-                        display_provider = "openrouter"
-                else:
-                    display_provider = raw_provider
-                provider_label = _PROVIDER_LABELS.get(display_provider, display_provider)
-                print(f"\n  Current model:    {self.model}")
-                print(f"  Current provider: {provider_label}")
-                print()
-                curated = curated_models_for_provider(display_provider)
-                if curated:
-                    print(f"  Available models ({provider_label}):")
-                    for mid, desc in curated:
-                        marker = " ←" if mid == self.model else ""
-                        label = f"  {desc}" if desc else ""
-                        print(f"    {mid}{label}{marker}")
-                    print()
-                print("  Usage: /model <model-name>")
-                print("         /model provider:model-name  (to switch provider)")
-                print("  Example: /model openrouter:anthropic/claude-sonnet-4.5")
-                print("  See /provider for available providers")
+                self._show_model_and_providers()
        elif cmd_lower == "/provider":
-            from hermes_cli.models import list_available_providers, normalize_provider, _PROVIDER_LABELS
-            from hermes_cli.auth import resolve_provider as _resolve_provider
-            # Resolve current provider
-            raw_provider = normalize_provider(self.provider)
-            if raw_provider == "auto":
-                try:
-                    current = _resolve_provider(
-                        self.requested_provider,
-                        explicit_api_key=self._explicit_api_key,
-                        explicit_base_url=self._explicit_base_url,
-                    )
-                except Exception:
-                    current = "openrouter"
-            else:
-                current = raw_provider
-            current_label = _PROVIDER_LABELS.get(current, current)
-            print(f"\n  Current provider: {current_label} ({current})\n")
-            providers = list_available_providers()
-            print("  Available providers:")
-            for p in providers:
-                marker = " ← active" if p["id"] == current else ""
-                auth = "✓" if p["authenticated"] else "✗"
-                aliases = f"  (also: {', '.join(p['aliases'])})" if p["aliases"] else ""
-                print(f"    [{auth}] {p['id']:<14} {p['label']}{aliases}{marker}")
-            print()
-            print("  Switch: /model provider:model-name")
-            print("  Setup:  hermes setup")
+            self._show_model_and_providers()
        elif cmd_lower.startswith("/prompt"):
            # Use original case so prompt text isn't lowercased
            self._handle_prompt_command(cmd_original)
@@ -2807,6 +2877,8 @@ class HermesCLI:
            self._show_gateway_status()
        elif cmd_lower == "/verbose":
            self._toggle_verbose()
+        elif cmd_lower.startswith("/reasoning"):
+            self._handle_reasoning_command(cmd_original)
        elif cmd_lower == "/compress":
            self._manual_compress()
        elif cmd_lower == "/usage":
@@ -2820,12 +2892,37 @@ class HermesCLI:
                self._reload_mcp()
        elif cmd_lower.startswith("/rollback"):
            self._handle_rollback_command(cmd_original)
+        elif cmd_lower.startswith("/background"):
+            self._handle_background_command(cmd_original)
        elif cmd_lower.startswith("/skin"):
            self._handle_skin_command(cmd_original)
        else:
-            # Check for skill slash commands (/gif-search, /axolotl, etc.)
+            # Check for user-defined quick commands (bypass agent loop, no LLM call)
            base_cmd = cmd_lower.split()[0]
-            if base_cmd in _skill_commands:
+            quick_commands = self.config.get("quick_commands", {})
+            if base_cmd.lstrip("/") in quick_commands:
+                qcmd = quick_commands[base_cmd.lstrip("/")]
+                if qcmd.get("type") == "exec":
+                    import subprocess
+                    exec_cmd = qcmd.get("command", "")
+                    if exec_cmd:
+                        try:
+                            result = subprocess.run(
+                                exec_cmd, shell=True, capture_output=True,
+                                text=True, timeout=30
+                            )
+                            output = result.stdout.strip() or result.stderr.strip()
+                            self.console.print(output if output else "[dim]Command returned no output[/]")
+                        except subprocess.TimeoutExpired:
+                            self.console.print("[bold red]Quick command timed out (30s)[/]")
+                        except Exception as e:
+                            self.console.print(f"[bold red]Quick command error: {e}[/]")
+                    else:
+                        self.console.print(f"[bold red]Quick command '{base_cmd}' has no command defined[/]")
+                else:
+                    self.console.print(f"[bold red]Quick command '{base_cmd}' has unsupported type (only 'exec' is supported)[/]")
+            # Check for skill slash commands (/gif-search, /axolotl, etc.)
+            elif base_cmd in _skill_commands:
                user_instruction = cmd_original[len(base_cmd):].strip()
                msg = build_skill_invocation_message(base_cmd, user_instruction)
                if msg:
@@ -2841,6 +2938,113 @@ class HermesCLI:
        
        return True
    
+    def _handle_background_command(self, cmd: str):
+        """Handle /background <prompt> — run a prompt in a separate background session.
+
+        Spawns a new AIAgent in a background thread with its own session.
+        When it completes, prints the result to the CLI without modifying
+        the active session's conversation history.
+        """
+        parts = cmd.strip().split(maxsplit=1)
+        if len(parts) < 2 or not parts[1].strip():
+            _cprint("  Usage: /background <prompt>")
+            _cprint("  Example: /background Summarize the top HN stories today")
+            _cprint("  The task runs in a separate session and results display here when done.")
+            return
+
+        prompt = parts[1].strip()
+        self._background_task_counter += 1
+        task_num = self._background_task_counter
+        task_id = f"bg_{datetime.now().strftime('%H%M%S')}_{uuid.uuid4().hex[:6]}"
+
+        # Make sure we have valid credentials
+        if not self._ensure_runtime_credentials():
+            _cprint("  (>_<) Cannot start background task: no valid credentials.")
+            return
+
+        _cprint(f"  🔄 Background task #{task_num} started: \"{prompt[:60]}{'...' if len(prompt) > 60 else ''}\"")
+        _cprint(f"  Task ID: {task_id}")
+        _cprint("  You can continue chatting; results will appear when done.\n")
+
+        def run_background():
+            try:
+                bg_agent = AIAgent(
+                    model=self.model,
+                    api_key=self.api_key,
+                    base_url=self.base_url,
+                    provider=self.provider,
+                    api_mode=self.api_mode,
+                    max_iterations=self.max_turns,
+                    enabled_toolsets=self.enabled_toolsets,
+                    quiet_mode=True,
+                    verbose_logging=False,
+                    session_id=task_id,
+                    platform="cli",
+                    session_db=self._session_db,
+                    reasoning_config=self.reasoning_config,
+                    providers_allowed=self._providers_only,
+                    providers_ignored=self._providers_ignore,
+                    providers_order=self._providers_order,
+                    provider_sort=self._provider_sort,
+                    provider_require_parameters=self._provider_require_params,
+                    provider_data_collection=self._provider_data_collection,
+                    fallback_model=self._fallback_model,
+                )
+
+                result = bg_agent.run_conversation(
+                    user_message=prompt,
+                    task_id=task_id,
+                )
+
+                response = result.get("final_response", "") if result else ""
+                if not response and result and result.get("error"):
+                    response = f"Error: {result['error']}"
+
+                # Display result in the CLI (thread-safe via patch_stdout)
+                print()
+                _cprint(f"{_GOLD}{'─' * 40}{_RST}")
+                _cprint(f"  ✅ Background task #{task_num} complete")
+                _cprint(f"  Prompt: \"{prompt[:60]}{'...' if len(prompt) > 60 else ''}\"")
+                _cprint(f"{_GOLD}{'─' * 40}{_RST}")
+                if response:
+                    try:
+                        from hermes_cli.skin_engine import get_active_skin
+                        _skin = get_active_skin()
+                        label = _skin.get_branding("response_label", "⚕ Hermes")
+                        _resp_color = _skin.get_color("response_border", "#CD7F32")
+                    except Exception:
+                        label = "⚕ Hermes"
+                        _resp_color = "#CD7F32"
+
+                    _chat_console = ChatConsole()
+                    _chat_console.print(Panel(
+                        response,
+                        title=f"[bold]{label} (background #{task_num})[/bold]",
+                        title_align="left",
+                        border_style=_resp_color,
+                        box=rich_box.HORIZONTALS,
+                        padding=(1, 2),
+                    ))
+                else:
+                    _cprint("  (No response generated)")
+
+                # Play bell if enabled
+                if self.bell_on_complete:
+                    sys.stdout.write("\a")
+                    sys.stdout.flush()
+
+            except Exception as e:
+                print()
+                _cprint(f"  ❌ Background task #{task_num} failed: {e}")
+            finally:
+                self._background_tasks.pop(task_id, None)
+                if self._app:
+                    self._invalidate(min_interval=0)
+
+        thread = threading.Thread(target=run_background, daemon=True, name=f"bg-task-{task_id}")
+        self._background_tasks[task_id] = thread
+        thread.start()
+
    def _handle_skin_command(self, cmd: str):
        """Handle /skin [name] — show or change the display skin."""
        try:
@@ -2900,6 +3104,77 @@ class HermesCLI:
        }
        self.console.print(labels.get(self.tool_progress_mode, ""))

+    def _handle_reasoning_command(self, cmd: str):
+        """Handle /reasoning — manage effort level and display toggle.
+
+        Usage:
+            /reasoning              Show current effort level and display state
+            /reasoning <level>      Set reasoning effort (none, minimal, low, medium, high, xhigh)
+            /reasoning show|on      Show model thinking/reasoning in output
+            /reasoning hide|off     Hide model thinking/reasoning from output
+        """
+        parts = cmd.strip().split(maxsplit=1)
+
+        if len(parts) < 2:
+            # Show current state
+            rc = self.reasoning_config
+            if rc is None:
+                level = "medium (default)"
+            elif rc.get("enabled") is False:
+                level = "none (disabled)"
+            else:
+                level = rc.get("effort", "medium")
+            display_state = "on ✓" if self.show_reasoning else "off"
+            _cprint(f"  {_GOLD}Reasoning effort:  {level}{_RST}")
+            _cprint(f"  {_GOLD}Reasoning display: {display_state}{_RST}")
+            _cprint(f"  {_DIM}Usage: /reasoning <none|minimal|low|medium|high|xhigh|show|hide>{_RST}")
+            return
+
+        arg = parts[1].strip().lower()
+
+        # Display toggle
+        if arg in ("show", "on"):
+            self.show_reasoning = True
+            if self.agent:
+                self.agent.reasoning_callback = self._on_reasoning
+            save_config_value("display.show_reasoning", True)
+            _cprint(f"  {_GOLD}✓ Reasoning display: ON (saved){_RST}")
+            _cprint(f"  {_DIM}  Model thinking will be shown during and after each response.{_RST}")
+            return
+        if arg in ("hide", "off"):
+            self.show_reasoning = False
+            if self.agent:
+                self.agent.reasoning_callback = None
+            save_config_value("display.show_reasoning", False)
+            _cprint(f"  {_GOLD}✓ Reasoning display: OFF (saved){_RST}")
+            return
+
+        # Effort level change
+        parsed = _parse_reasoning_config(arg)
+        if parsed is None:
+            _cprint(f"  {_DIM}(._.) Unknown argument: {arg}{_RST}")
+            _cprint(f"  {_DIM}Valid levels: none, minimal, low, medium, high, xhigh{_RST}")
+            _cprint(f"  {_DIM}Display:      show, hide{_RST}")
+            return
+
+        self.reasoning_config = parsed
+        self.agent = None  # Force agent re-init with new reasoning config
+
+        if save_config_value("agent.reasoning_effort", arg):
+            _cprint(f"  {_GOLD}✓ Reasoning effort set to '{arg}' (saved to config){_RST}")
+        else:
+            _cprint(f"  {_GOLD}✓ Reasoning effort set to '{arg}' (session only){_RST}")
+
+    def _on_reasoning(self, reasoning_text: str):
+        """Callback for intermediate reasoning display during tool-call loops."""
+        lines = reasoning_text.strip().splitlines()
+        if len(lines) > 5:
+            preview = "\n".join(lines[:5])
+            preview += f"\n  ... ({len(lines) - 5} more lines)"
+        else:
+            preview = reasoning_text.strip()
+        _cprint(f"  {_DIM}[thinking] {preview}{_RST}")
+
    def _manual_compress(self):
        """Manually trigger context compression on the current conversation."""
        if not self.conversation_history or len(self.conversation_history) < 4:
@@ -3333,6 +3608,19 @@ class HermesCLI:
                                continue
                            print(f"\n⚡ New message detected, interrupting...")
                            self.agent.interrupt(interrupt_msg)
+                            # Debug: log to file (stdout may be devnull from redirect_stdout)
+                            try:
+                                import pathlib as _pl
+                                _dbg = _pl.Path.home() / ".hermes" / "interrupt_debug.log"
+                                with open(_dbg, "a") as _f:
+                                    import time as _t
+                                    _f.write(f"{_t.strftime('%H:%M:%S')} interrupt fired: msg={str(interrupt_msg)[:60]!r}, "
+                                             f"children={len(self.agent._active_children)}, "
+                                             f"parent._interrupt={self.agent._interrupt_requested}\n")
+                                    for _ci, _ch in enumerate(self.agent._active_children):
+                                        _f.write(f"  child[{_ci}]._interrupt={_ch._interrupt_requested}\n")
+                            except Exception:
+                                pass
                            break
                    except queue.Empty:
                        pass  # Queue empty or timeout, continue waiting
@@ -3369,6 +3657,24 @@ class HermesCLI:
                if response and pending_message:
                    response = response + "\n\n---\n_[Interrupted - processing new message]_"
            
+            # Display reasoning (thinking) box if enabled and available
+            if self.show_reasoning and result:
+                reasoning = result.get("last_reasoning")
+                if reasoning:
+                    w = shutil.get_terminal_size().columns
+                    r_label = " Reasoning "
+                    r_fill = w - 2 - len(r_label)
+                    r_top = f"{_DIM}┌─{r_label}{'─' * max(r_fill - 1, 0)}┐{_RST}"
+                    r_bot = f"{_DIM}└{'─' * (w - 2)}┘{_RST}"
+                    # Collapse long reasoning: show first 10 lines
+                    lines = reasoning.strip().splitlines()
+                    if len(lines) > 10:
+                        display_reasoning = "\n".join(lines[:10])
+                        display_reasoning += f"\n{_DIM}  ... ({len(lines) - 10} more lines){_RST}"
+                    else:
+                        display_reasoning = reasoning.strip()
+                    _cprint(f"\n{r_top}\n{_DIM}{display_reasoning}{_RST}\n{r_bot}")
+
            if response:
                # Use a Rich Panel for the response box — adapts to terminal
                # width at render time instead of hard-coding border length.
@@ -3531,7 +3837,17 @@ class HermesCLI:
                selected = state["selected"]
                choices = state["choices"]
                if 0 <= selected < len(choices):
-                    state["response_queue"].put(choices[selected])
+                    chosen = choices[selected]
+                    if chosen == "view":
+                        # Toggle full command display without closing the prompt
+                        state["show_full"] = True
+                        # Remove the "view" option since it's been used
+                        state["choices"] = [c for c in choices if c != "view"]
+                        if state["selected"] >= len(state["choices"]):
+                            state["selected"] = len(state["choices"]) - 1
+                        event.app.invalidate()
+                        return
+                    state["response_queue"].put(chosen)
                self._approval_state = None
                event.app.invalidate()
                return
@@ -3574,6 +3890,16 @@ class HermesCLI:
                payload = (text, images) if images else text
                if self._agent_running and not (text and text.startswith("/")):
                    self._interrupt_queue.put(payload)
+                    # Debug: log to file when message enters interrupt queue
+                    try:
+                        import pathlib as _pl
+                        _dbg = _pl.Path.home() / ".hermes" / "interrupt_debug.log"
+                        with open(_dbg, "a") as _f:
+                            import time as _t
+                            _f.write(f"{_t.strftime('%H:%M:%S')} ENTER: queued interrupt msg={str(payload)[:60]!r}, "
+                                     f"agent_running={self._agent_running}\n")
+                    except Exception:
+                        pass
                else:
                    self._pending_input.put(payload)
                event.app.current_buffer.reset(append_to_history=True)
@@ -4079,13 +4405,18 @@ class HermesCLI:
            description = state["description"]
            choices = state["choices"]
            selected = state.get("selected", 0)
+            show_full = state.get("show_full", False)

-            cmd_display = command[:70] + '...' if len(command) > 70 else command
+            if show_full or len(command) <= 70:
+                cmd_display = command
+            else:
+                cmd_display = command[:70] + '...'
            choice_labels = {
                "once": "Allow once",
                "session": "Allow for this session",
                "always": "Add to permanent allowlist",
                "deny": "Deny",
+                "view": "Show full command",
            }
            preview_lines = _wrap_panel_text(description, 60)
            preview_lines.extend(_wrap_panel_text(cmd_display, 60))
@@ -4257,7 +4588,7 @@ class HermesCLI:
                    
                    # Check for commands
                    if isinstance(user_input, str) and user_input.startswith("/"):
-                        print(f"\n⚙️  {user_input}")
+                        _cprint(f"\n⚙️  {user_input}")
                        if not self.process_command(user_input):
                            self._should_exit = True
                            # Schedule app exit
@@ -4356,6 +4687,7 @@ def main(
    base_url: str = None,
    max_turns: int = None,
    verbose: bool = False,
+    quiet: bool = False,
    compact: bool = False,
    list_tools: bool = False,
    list_toolsets: bool = False,
@@ -4364,6 +4696,7 @@ def main(
    worktree: bool = False,
    w: bool = False,
    checkpoints: bool = False,
+    pass_session_id: bool = False,
 ):
    """
    Hermes Agent CLI - Interactive AI Assistant
@@ -4469,6 +4802,7 @@ def main(
        compact=compact,
        resume=resume,
        checkpoints=checkpoints,
+        pass_session_id=pass_session_id,
    )

    # Inject worktree context into agent's system prompt
@@ -4498,10 +4832,22 @@ def main(
    
    # Handle single query mode
    if query:
-        cli.show_banner()
-        cli.console.print(f"[bold blue]Query:[/] {query}")
-        cli.chat(query)
-        cli._print_exit_summary()
+        if quiet:
+            # Quiet mode: suppress banner, spinner, tool previews.
+            # Only print the final response and parseable session info.
+            cli.tool_progress_mode = "off"
+            if cli._init_agent():
+                cli.agent.quiet_mode = True
+                result = cli.agent.run_conversation(query)
+                response = result.get("final_response", "") if isinstance(result, dict) else str(result)
+                if response:
+                    print(response)
+                print(f"\nsession_id: {cli.session_id}")
+        else:
+            cli.show_banner()
+            cli.console.print(f"[bold blue]Query:[/] {query}")
+            cli.chat(query)
+            cli._print_exit_summary()
        return
    
    # Run interactive mode
@@ -32,10 +32,29 @@ JOBS_FILE = CRON_DIR / "jobs.json"
 OUTPUT_DIR = CRON_DIR / "output"


+def _secure_dir(path: Path):
+    """Set directory to owner-only access (0700). No-op on Windows."""
+    try:
+        os.chmod(path, 0o700)
+    except (OSError, NotImplementedError):
+        pass  # Windows or other platforms where chmod is not supported
+
+
+def _secure_file(path: Path):
+    """Set file to owner-only read/write (0600). No-op on Windows."""
+    try:
+        if path.exists():
+            os.chmod(path, 0o600)
+    except (OSError, NotImplementedError):
+        pass
+
+
 def ensure_dirs():
-    """Ensure cron directories exist."""
+    """Ensure cron directories exist with secure permissions."""
    CRON_DIR.mkdir(parents=True, exist_ok=True)
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
+    _secure_dir(CRON_DIR)
+    _secure_dir(OUTPUT_DIR)


 # =============================================================================
@@ -149,16 +168,22 @@ def parse_schedule(schedule: str) -> Dict[str, Any]:


 def _ensure_aware(dt: datetime) -> datetime:
-    """Make a naive datetime tz-aware using the configured timezone.
+    """Return a timezone-aware datetime in the configured Hermes timezone.

-    Handles backward compatibility: timestamps stored before timezone support
-    are naive (server-local).  We assume they were in the same timezone as
-    the current configuration so comparisons work without crashing.
+    Backward compatibility:
+    - Older stored timestamps may be naive.
+    - Naive values are interpreted as *system-local wall time* (the timezone
+      `datetime.now()` used when they were created), then converted to the
+      configured Hermes timezone.
+
+    This preserves the relative ordering of legacy naive timestamps across
+    timezone changes and keeps due jobs from being wrongly treated as not yet due.
    """
+    target_tz = _hermes_now().tzinfo
    if dt.tzinfo is None:
-        tz = _hermes_now().tzinfo
-        return dt.replace(tzinfo=tz)
-    return dt
+        local_tz = datetime.now().astimezone().tzinfo
+        return dt.replace(tzinfo=local_tz).astimezone(target_tz)
+    return dt.astimezone(target_tz)


 def compute_next_run(schedule: Dict[str, Any], last_run_at: Optional[str] = None) -> Optional[str]:
@@ -223,6 +248,7 @@ def save_jobs(jobs: List[Dict[str, Any]]):
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, JOBS_FILE)
+        _secure_file(JOBS_FILE)
    except BaseException:
        try:
            os.unlink(tmp_path)
@@ -400,11 +426,13 @@ def save_job_output(job_id: str, output: str):
    ensure_dirs()
    job_output_dir = OUTPUT_DIR / job_id
    job_output_dir.mkdir(parents=True, exist_ok=True)
+    _secure_dir(job_output_dir)
    
    timestamp = _hermes_now().strftime("%Y-%m-%d_%H-%M-%S")
    output_file = job_output_dir / f"{timestamp}.md"
    
    with open(output_file, 'w', encoding='utf-8') as f:
        f.write(output)
+    _secure_file(output_file)
    
    return output_file
@@ -45,7 +45,7 @@ _LOCK_FILE = _LOCK_DIR / ".tick.lock"


 def _resolve_origin(job: dict) -> Optional[dict]:
-    """Extract origin info from a job, returning {platform, chat_id, chat_name} or None."""
+    """Extract origin info from a job, preserving any extra routing metadata."""
    origin = job.get("origin")
    if not origin:
        return None
@@ -69,6 +69,8 @@ def _deliver_result(job: dict, content: str) -> None:
    if deliver == "local":
        return

+    thread_id = None
+
    # Resolve target platform + chat_id
    if deliver == "origin":
        if not origin:
@@ -76,6 +78,7 @@ def _deliver_result(job: dict, content: str) -> None:
            return
        platform_name = origin["platform"]
        chat_id = origin["chat_id"]
+        thread_id = origin.get("thread_id")
    elif ":" in deliver:
        platform_name, chat_id = deliver.split(":", 1)
    else:
@@ -83,6 +86,7 @@ def _deliver_result(job: dict, content: str) -> None:
        platform_name = deliver
        if origin and origin.get("platform") == platform_name:
            chat_id = origin["chat_id"]
+            thread_id = origin.get("thread_id")
        else:
            # Fall back to home channel
            chat_id = os.getenv(f"{platform_name.upper()}_HOME_CHANNEL", "")
@@ -99,6 +103,7 @@ def _deliver_result(job: dict, content: str) -> None:
        "slack": Platform.SLACK,
        "whatsapp": Platform.WHATSAPP,
        "signal": Platform.SIGNAL,
+        "email": Platform.EMAIL,
    }
    platform = platform_map.get(platform_name.lower())
    if not platform:
@@ -118,13 +123,13 @@ def _deliver_result(job: dict, content: str) -> None:

    # Run the async send in a fresh event loop (safe from any thread)
    try:
-        result = asyncio.run(_send_to_platform(platform, pconfig, chat_id, content))
+        result = asyncio.run(_send_to_platform(platform, pconfig, chat_id, content, thread_id=thread_id))
    except RuntimeError:
        # asyncio.run() fails if there's already a running loop in this thread;
        # spin up a new thread to avoid that.
        import concurrent.futures
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
-            future = pool.submit(asyncio.run, _send_to_platform(platform, pconfig, chat_id, content))
+            future = pool.submit(asyncio.run, _send_to_platform(platform, pconfig, chat_id, content, thread_id=thread_id))
            result = future.result(timeout=30)
    except Exception as e:
        logger.error("Job '%s': delivery to %s:%s failed: %s", job["id"], platform_name, chat_id, e)
@@ -137,7 +142,7 @@ def _deliver_result(job: dict, content: str) -> None:
        # Mirror the delivered content into the target's gateway session
        try:
            from gateway.mirror import mirror_to_session
-            mirror_to_session(platform_name, chat_id, content, source_label="cron")
+            mirror_to_session(platform_name, chat_id, content, source_label="cron", thread_id=thread_id)
        except Exception as e:
            logger.warning("Job '%s': mirror_to_session failed: %s", job["id"], e)

@@ -175,7 +180,7 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
        except UnicodeDecodeError:
            load_dotenv(str(_hermes_home / ".env"), override=True, encoding="latin-1")

-        model = os.getenv("HERMES_MODEL") or os.getenv("LLM_MODEL") or "anthropic/claude-opus-4.6"
+        model = os.getenv("HERMES_MODEL") or "anthropic/claude-opus-4.6"

        # Load config.yaml for model, reasoning, prefill, toolsets, provider routing
        _cfg = {}
@@ -0,0 +1,110 @@
+# Migrating from OpenClaw to Hermes Agent
+
+This guide covers how to import your OpenClaw settings, memories, skills, and API keys into Hermes Agent.
+
+## Three Ways to Migrate
+
+### 1. Automatic (during first-time setup)
+
+When you run `hermes setup` for the first time and Hermes detects `~/.openclaw`, it automatically offers to import your OpenClaw data before configuration begins. Just accept the prompt and everything is handled for you.
+
+### 2. CLI Command (quick, scriptable)
+
+```bash
+hermes claw migrate                      # Full migration with confirmation prompt
+hermes claw migrate --dry-run            # Preview what would happen
+hermes claw migrate --preset user-data   # Migrate without API keys/secrets
+hermes claw migrate --yes                # Skip confirmation prompt
+```
+
+**All options:**
+
+| Flag | Description |
+|------|-------------|
+| `--source PATH` | Path to OpenClaw directory (default: `~/.openclaw`) |
+| `--dry-run` | Preview only — no files are modified |
+| `--preset {user-data,full}` | Migration preset (default: `full`). `user-data` excludes secrets |
+| `--overwrite` | Overwrite existing files (default: skip conflicts) |
+| `--migrate-secrets` | Include allowlisted secrets (auto-enabled with `full` preset) |
+| `--workspace-target PATH` | Copy workspace instructions (AGENTS.md) to this absolute path |
+| `--skill-conflict {skip,overwrite,rename}` | How to handle skill name conflicts (default: `skip`) |
+| `--yes`, `-y` | Skip confirmation prompts |
+
+### 3. Agent-Guided (interactive, with previews)
+
+Ask the agent to run the migration for you:
+
+```
+> Migrate my OpenClaw setup to Hermes
+```
+
+The agent will use the `openclaw-migration` skill to:
+1. Run a dry-run first to preview changes
+2. Ask about conflict resolution (SOUL.md, skills, etc.)
+3. Let you choose between `user-data` and `full` presets
+4. Execute the migration with your choices
+5. Print a detailed summary of what was migrated
+
+## What Gets Migrated
+
+### `user-data` preset
+| Item | Source | Destination |
+|------|--------|-------------|
+| SOUL.md | `~/.openclaw/workspace/SOUL.md` | `~/.hermes/SOUL.md` |
+| Memory entries | `~/.openclaw/workspace/MEMORY.md` | `~/.hermes/memories/MEMORY.md` |
+| User profile | `~/.openclaw/workspace/USER.md` | `~/.hermes/memories/USER.md` |
+| Skills | `~/.openclaw/workspace/skills/` | `~/.hermes/skills/openclaw-imports/` |
+| Command allowlist | `~/.openclaw/workspace/exec_approval_patterns.yaml` | Merged into `~/.hermes/config.yaml` |
+| Messaging settings | `~/.openclaw/config.yaml` (TELEGRAM_ALLOWED_USERS, MESSAGING_CWD) | `~/.hermes/.env` |
+| TTS assets | `~/.openclaw/workspace/tts/` | `~/.hermes/tts/` |
+
+### `full` preset (adds to `user-data`)
+| Item | Source | Destination |
+|------|--------|-------------|
+| Telegram bot token | `~/.openclaw/config.yaml` | `~/.hermes/.env` |
+| OpenRouter API key | `~/.openclaw/.env` or config | `~/.hermes/.env` |
+| OpenAI API key | `~/.openclaw/.env` or config | `~/.hermes/.env` |
+| Anthropic API key | `~/.openclaw/.env` or config | `~/.hermes/.env` |
+| ElevenLabs API key | `~/.openclaw/.env` or config | `~/.hermes/.env` |
+
+Only the allowlisted secrets listed above are ever imported. Other credentials are skipped and reported.
+
+## Conflict Handling
+
+By default, the migration **will not overwrite** existing Hermes data:
+
+- **SOUL.md** — skipped if one already exists in `~/.hermes/`
+- **Memory entries** — skipped if memories already exist (to avoid duplicates)
+- **Skills** — skipped if a skill with the same name already exists
+- **API keys** — skipped if the key is already set in `~/.hermes/.env`
+
+To overwrite conflicts, use `--overwrite`. The migration creates backups before overwriting.
+
+For skills, you can also use `--skill-conflict rename` to import conflicting skills under a new name (e.g., `skill-name-imported`).
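
The three `--skill-conflict` modes can be sketched as a small path-resolution helper. This is an illustration under assumptions, not the migration script's code: `resolve_skill_destination` is a hypothetical name, the `-imported` suffix follows the example above, and the numeric fallback for repeated renames is an invented detail.

```python
from pathlib import Path


def resolve_skill_destination(dest_dir: Path, name: str, mode: str = "skip"):
    """Return the path an imported skill should be written to, or None to skip.

    Hypothetical helper: `mode` mirrors the documented --skill-conflict values.
    """
    target = dest_dir / name
    if not target.exists():
        return target  # no conflict, import under the original name
    if mode == "skip":
        return None  # leave the existing skill untouched
    if mode == "overwrite":
        return target  # the caller is expected to back up the old skill first
    if mode == "rename":
        candidate = dest_dir / f"{name}-imported"
        counter = 2
        # Assumption: append a counter if the renamed path is also taken.
        while candidate.exists():
            candidate = dest_dir / f"{name}-imported-{counter}"
            counter += 1
        return candidate
    raise ValueError(f"unknown conflict mode: {mode!r}")
```

Returning `None` for a skipped skill lets the caller record it under "Conflicts" in the migration report instead of treating it as an error.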
+
+## Migration Report
+
+Every migration (including dry runs) produces a report showing:
+- **Migrated items** — what was successfully imported
+- **Conflicts** — items skipped because they already exist
+- **Skipped items** — items not found in the source
+- **Errors** — items that failed to import
+
+When the migration is actually executed (not a dry run), the full report is saved to `~/.hermes/migration/openclaw/<timestamp>/`.
+
+## Troubleshooting
+
+### "OpenClaw directory not found"
+The migration looks for `~/.openclaw` by default. If your OpenClaw directory is located elsewhere, use `--source`:
+```bash
+hermes claw migrate --source /path/to/.openclaw
+```
+
+### "Migration script not found"
+The migration script ships with Hermes Agent. If you installed via pip (not git clone), the `optional-skills/` directory may not be present. Install the skill from the Skills Hub:
+```bash
+hermes skills install openclaw-migration
+```
+
+### Memory overflow
+If your OpenClaw MEMORY.md or USER.md exceeds Hermes' character limits, excess entries are exported to an overflow file in the migration report directory. You can review that file and manually add the most important entries.
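
As a rough illustration of the overflow behavior, the sketch below splits a list of entries against a character budget. The function name `split_entries_by_limit`, the way characters are counted, and the rule that everything after the first overflowing entry also overflows are all assumptions; the actual limits and splitting logic are not stated here.

```python
def split_entries_by_limit(entries, char_limit):
    """Keep entries in order until the budget is used up; the rest overflow.

    Hypothetical sketch: counts raw characters per entry and, once one
    entry does not fit, sends every later entry to the overflow list so
    that the original ordering is preserved.
    """
    kept, overflow, used = [], [], 0
    for entry in entries:
        if not overflow and used + len(entry) <= char_limit:
            kept.append(entry)
            used += len(entry)
        else:
            overflow.append(entry)
    return kept, overflow
```

The second list corresponds to what would be written to the overflow file for manual review.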
@@ -18,9 +18,14 @@ Benchmarks (eval-only):
    - benchmarks/terminalbench_2/: Terminal-Bench 2.0 evaluation
 """

-from environments.agent_loop import AgentResult, HermesAgentLoop
-from environments.tool_context import ToolContext
-from environments.hermes_base_env import HermesAgentBaseEnv, HermesAgentEnvConfig
+try:
+    from environments.agent_loop import AgentResult, HermesAgentLoop
+    from environments.tool_context import ToolContext
+    from environments.hermes_base_env import HermesAgentBaseEnv, HermesAgentEnvConfig
+except ImportError:
+    # atroposlib not installed — environments are unavailable but
+    # submodules like tool_call_parsers can still be imported directly.
+    pass

 __all__ = [
    "AgentResult",
@@ -249,23 +249,62 @@ class HermesAgentLoop:
            reasoning = _extract_reasoning_from_message(assistant_msg)
            reasoning_per_turn.append(reasoning)

-            # Check for tool calls -- standard OpenAI spec
+            # Check for tool calls -- standard OpenAI spec.
+            # Fallback: if response has no structured tool_calls but content
+            # contains raw tool call tags (e.g. <tool_call>), parse them using
+            # hermes-agent's standalone parsers. This handles the case where
+            # ManagedServer's ToolCallTranslator couldn't parse because vLLM
+            # isn't installed.
+            if (
+                not assistant_msg.tool_calls
+                and assistant_msg.content
+                and self.tool_schemas
+                and "<tool_call>" in (assistant_msg.content or "")
+            ):
+                try:
+                    from environments.tool_call_parsers import get_parser
+                    fallback_parser = get_parser("hermes")
+                    parsed_content, parsed_calls = fallback_parser.parse(
+                        assistant_msg.content
+                    )
+                    if parsed_calls:
+                        assistant_msg.tool_calls = parsed_calls
+                        if parsed_content is not None:
+                            assistant_msg.content = parsed_content
+                        logger.debug(
+                            "Fallback parser extracted %d tool calls from raw content",
+                            len(parsed_calls),
+                        )
+                except Exception:
+                    pass  # Fall through to no tool calls
+
            if assistant_msg.tool_calls:
+                # Normalize tool calls to dicts — they may come as objects
+                # (OpenAI API) or dicts (vLLM ToolCallTranslator).
+                def _tc_to_dict(tc):
+                    if isinstance(tc, dict):
+                        return {
+                            "id": tc.get("id", f"call_{uuid.uuid4().hex[:8]}"),
+                            "type": "function",
+                            "function": {
+                                "name": tc.get("function", {}).get("name", tc.get("name", "")),
+                                "arguments": tc.get("function", {}).get("arguments", tc.get("arguments", "{}")),
+                            },
+                        }
+                    return {
+                        "id": tc.id,
+                        "type": "function",
+                        "function": {
+                            "name": tc.function.name,
+                            "arguments": tc.function.arguments,
+                        },
+                    }
+
                # Build the assistant message dict for conversation history
                msg_dict: Dict[str, Any] = {
                    "role": "assistant",
                    "content": assistant_msg.content or "",
-                    "tool_calls": [
-                        {
-                            "id": tc.id,
-                            "type": "function",
-                            "function": {
-                                "name": tc.function.name,
-                                "arguments": tc.function.arguments,
-                            },
-                        }
-                        for tc in assistant_msg.tool_calls
-                    ],
+                    "tool_calls": [_tc_to_dict(tc) for tc in assistant_msg.tool_calls],
                }

                # Preserve reasoning_content for multi-turn chat template handling
@@ -278,8 +317,13 @@ class HermesAgentLoop:

                # Execute each tool call via hermes-agent's dispatch
                for tc in assistant_msg.tool_calls:
-                    tool_name = tc.function.name
-                    tool_args_raw = tc.function.arguments
+                    # Handle both object (OpenAI) and dict (vLLM) formats
+                    if isinstance(tc, dict):
+                        tool_name = tc.get("function", {}).get("name", tc.get("name", ""))
+                        tool_args_raw = tc.get("function", {}).get("arguments", tc.get("arguments", "{}"))
+                    else:
+                        tool_name = tc.function.name
+                        tool_args_raw = tc.function.arguments

                    # Validate tool name
                    if tool_name not in self.valid_tool_names:
@@ -390,10 +434,11 @@ class HermesAgentLoop:
                            pass

                    # Add tool response to conversation
+                    tc_id = tc.get("id", "") if isinstance(tc, dict) else tc.id
                    messages.append(
                        {
                            "role": "tool",
-                            "tool_call_id": tc.id,
+                            "tool_call_id": tc_id,
                            "content": tool_result,
                        }
                    )
@@ -0,0 +1,38 @@
+# OpenThoughts-TBLite Evaluation -- Docker Backend (Local Compute)
+#
+# Runs tasks in Docker containers on the local machine.
+# Sandboxed like Modal but no cloud costs. Good for dev/testing.
+#
+# Usage:
+#   python environments/benchmarks/tblite/tblite_env.py evaluate \
+#       --config environments/benchmarks/tblite/local.yaml
+#
+#   # Override concurrency:
+#   python environments/benchmarks/tblite/tblite_env.py evaluate \
+#       --config environments/benchmarks/tblite/local.yaml \
+#       --env.eval_concurrency 4
+
+env:
+  enabled_toolsets: ["terminal", "file"]
+  max_agent_turns: 60
+  max_token_length: 32000
+  agent_temperature: 0.8
+  terminal_backend: "docker"
+  terminal_timeout: 300
+  tool_pool_size: 16
+  dataset_name: "NousResearch/openthoughts-tblite"
+  test_timeout: 600
+  task_timeout: 1200
+  eval_concurrency: 8          # max 8 tasks at once
+  tokenizer_name: "NousResearch/Hermes-3-Llama-3.1-8B"
+  use_wandb: false
+  wandb_name: "openthoughts-tblite-local"
+  ensure_scores_are_not_same: false
+  data_dir_to_save_evals: "environments/benchmarks/evals/openthoughts-tblite-local"
+
+openai:
+  base_url: "https://openrouter.ai/api/v1"
+  model_name: "anthropic/claude-sonnet-4"
+  server_type: "openai"
+  health_check: false
+  # api_key loaded from OPENROUTER_API_KEY in .env
@@ -0,0 +1,40 @@
+# OpenThoughts-TBLite Evaluation -- Local vLLM Backend
+#
+# Runs against a local vLLM server with Docker sandboxes.
+#
+# Start the vLLM server from the atropos directory:
+#   python -m example_trainer.vllm_api_server \
+#       --model Qwen/Qwen3-4B-Instruct-2507 \
+#       --port 9001 \
+#       --gpu-memory-utilization 0.8 \
+#       --max-model-len=32000
+#
+# Then run:
+#   python environments/benchmarks/tblite/tblite_env.py evaluate \
+#       --config environments/benchmarks/tblite/local_vllm.yaml
+
+env:
+  enabled_toolsets: ["terminal", "file"]
+  max_agent_turns: 60
+  max_token_length: 16000
+  agent_temperature: 0.6
+  terminal_backend: "docker"
+  terminal_timeout: 300
+  tool_pool_size: 16
+  dataset_name: "NousResearch/openthoughts-tblite"
+  test_timeout: 600
+  task_timeout: 1200
+  eval_concurrency: 8
+  tool_call_parser: "hermes"
+  system_prompt: "You are an expert terminal agent. You MUST use the provided tools to complete tasks. Use the terminal tool to run shell commands, read_file to read files, write_file to write files, search_files to search, and patch to edit files. Do NOT write out solutions as text - execute them using the tools. Always start by exploring the environment with terminal commands."
+  tokenizer_name: "Qwen/Qwen3-4B-Instruct-2507"
+  use_wandb: false
+  wandb_name: "tblite-qwen3-4b-instruct"
+  ensure_scores_are_not_same: false
+  data_dir_to_save_evals: "environments/benchmarks/evals/tblite-qwen3-4b-local"
+
+openai:
+  base_url: "http://localhost:9001"
+  model_name: "Qwen/Qwen3-4B-Instruct-2507"
+  server_type: "vllm"
+  health_check: false
@@ -127,6 +127,14 @@ class TerminalBench2EvalConfig(HermesAgentEnvConfig):
        "causes blocking calls to deadlock inside the thread pool.",
    )

+    # --- Eval concurrency ---
+    eval_concurrency: int = Field(
+        default=0,
+        description="Maximum number of tasks to evaluate in parallel. "
+        "0 means unlimited (all tasks run concurrently). "
+        "Set to 8 for local backends to avoid overwhelming the machine.",
+    )
+

 # Tasks that cannot run properly on Modal and are excluded from scoring.
 MODAL_INCOMPATIBLE_TASKS = {
@@ -201,7 +209,7 @@ class TerminalBench2EvalEnv(HermesAgentBaseEnv):

            # Agent settings -- TB2 tasks are complex, need many turns
            max_agent_turns=60,
-            max_token_length=16000,
+            max_token_length=***
            agent_temperature=0.6,
            system_prompt=None,

@@ -225,7 +233,7 @@ class TerminalBench2EvalEnv(HermesAgentBaseEnv):
            steps_per_eval=1,
            total_steps=1,

-            tokenizer_name="NousResearch/Hermes-3-Llama-3.1-8B",
+            tokenizer_name="NousRe...1-8B",
            use_wandb=True,
            wandb_name="terminal-bench-2",
            ensure_scores_are_not_same=False,  # Binary rewards may all be 0 or 1
@@ -237,7 +245,7 @@ class TerminalBench2EvalEnv(HermesAgentBaseEnv):
                base_url="https://openrouter.ai/api/v1",
                model_name="anthropic/claude-sonnet-4",
                server_type="openai",
-                api_key=os.getenv("OPENROUTER_API_KEY", ""),
+                api_key=os.get...EY", ""),
                health_check=False,
            )
        ]
@@ -438,8 +446,14 @@ class TerminalBench2EvalEnv(HermesAgentBaseEnv):
                    "error": "no_image",
                }

-            # --- 2. Register per-task Modal image override ---
-            register_task_env_overrides(task_id, {"modal_image": modal_image, "cwd": "/app"})
+            # --- 2. Register per-task image override ---
+            # Set both modal_image and docker_image so the task image is used
+            # regardless of which backend is configured.
+            register_task_env_overrides(task_id, {
+                "modal_image": modal_image,
+                "docker_image": modal_image,
+                "cwd": "/app",
+            })
            logger.info(
                "Task %s: registered image override for task_id %s",
                task_name, task_id[:8],
@@ -454,17 +468,37 @@ class TerminalBench2EvalEnv(HermesAgentBaseEnv):
            messages.append({"role": "user", "content": self.format_prompt(eval_item)})

            # --- 4. Run agent loop ---
-            agent = HermesAgentLoop(
-                server=self.server,
-                tool_schemas=tools,
-                valid_tool_names=valid_names,
-                max_turns=self.config.max_agent_turns,
-                task_id=task_id,
-                temperature=self.config.agent_temperature,
-                max_tokens=self.config.max_token_length,
-                extra_body=self.config.extra_body,
-            )
-            result = await agent.run(messages)
+            # Use ManagedServer (Phase 2) for vLLM/SGLang backends to get
+            # token-level tracking via /generate. Falls back to direct
+            # ServerManager (Phase 1) for OpenAI endpoints.
+            if self._use_managed_server():
+                async with self.server.managed_server(
+                    tokenizer=self.tokenizer,
+                    preserve_think_blocks=bool(self.config.thinking_mode),
+                ) as managed:
+                    agent = HermesAgentLoop(
+                        server=managed,
+                        tool_schemas=tools,
+                        valid_tool_names=valid_names,
+                        max_turns=self.config.max_agent_turns,
+                        task_id=task_id,
+                        temperature=self.config.agent_temperature,
+                        max_tokens=self.config.max_token_length,
+                        extra_body=self.config.extra_body,
+                    )
+                    result = await agent.run(messages)
+            else:
+                agent = HermesAgentLoop(
+                    server=self.server,
+                    tool_schemas=tools,
+                    valid_tool_names=valid_names,
+                    max_turns=self.config.max_agent_turns,
+                    task_id=task_id,
+                    temperature=self.config.agent_temperature,
+                    max_tokens=self.config.max_token_length,
+                    extra_body=self.config.extra_body,
+                )
+                result = await agent.run(messages)

            # --- 5. Verify -- run test suite in the agent's sandbox ---
            # Skip verification if the agent produced no meaningful output
@@ -479,446 +513,3 @@ class TerminalBench2EvalEnv(HermesAgentBaseEnv):
                reward = 0.0
            else:
                # Run tests in a thread so the blocking ctx.terminal() calls
-                # don't freeze the entire event loop (which would stall all
-                # other tasks, tqdm updates, and timeout timers).
-                ctx = ToolContext(task_id)
-                try:
-                    loop = asyncio.get_event_loop()
-                    reward = await loop.run_in_executor(
-                        None,  # default thread pool
-                        self._run_tests, eval_item, ctx, task_name,
-                    )
-                except Exception as e:
-                    logger.error("Task %s: test verification failed: %s", task_name, e)
-                    reward = 0.0
-                finally:
-                    ctx.cleanup()
-
-            passed = reward == 1.0
-            status = "PASS" if passed else "FAIL"
-            elapsed = time.time() - task_start
-            tqdm.write(f"  [{status}] {task_name} (turns={result.turns_used}, {elapsed:.0f}s)")
-            logger.info(
-                "Task %s: reward=%.1f, turns=%d, finished=%s",
-                task_name, reward, result.turns_used, result.finished_naturally,
-            )
-
-            out = {
-                "passed": passed,
-                "reward": reward,
-                "task_name": task_name,
-                "category": category,
-                "turns_used": result.turns_used,
-                "finished_naturally": result.finished_naturally,
-                "messages": result.messages,
-            }
-            self._save_result(out)
-            return out
-
-        except Exception as e:
-            elapsed = time.time() - task_start
-            logger.error("Task %s: rollout failed: %s", task_name, e, exc_info=True)
-            tqdm.write(f"  [ERROR] {task_name}: {e} ({elapsed:.0f}s)")
-            out = {
-                "passed": False, "reward": 0.0,
-                "task_name": task_name, "category": category,
-                "error": str(e),
-            }
-            self._save_result(out)
-            return out
-
-        finally:
-            # --- Cleanup: clear overrides, sandbox, and temp files ---
-            clear_task_env_overrides(task_id)
-            try:
-                cleanup_vm(task_id)
-            except Exception as e:
-                logger.debug("VM cleanup for %s: %s", task_id[:8], e)
-            if task_dir and task_dir.exists():
-                shutil.rmtree(task_dir, ignore_errors=True)
-
-    def _run_tests(
-        self, item: Dict[str, Any], ctx: ToolContext, task_name: str
-    ) -> float:
-        """
-        Upload and execute the test suite in the agent's sandbox, then
-        download the verifier output locally to read the reward.
-
-        Follows Harbor's verification pattern:
-        1. Upload tests/ directory into the sandbox
-        2. Execute test.sh inside the sandbox
-        3. Download /logs/verifier/ directory to a local temp dir
-        4. Read reward.txt locally with native Python I/O
-
-        Downloading locally avoids issues with the file_read tool on
-        the Modal VM and matches how Harbor handles verification.
-
-        TB2 test scripts (test.sh) typically:
-        1. Install pytest via uv/pip
-        2. Run pytest against the test files in /tests/
-        3. Write results to /logs/verifier/reward.txt
-
-        Args:
-            item: The TB2 task dict (contains tests_tar, test_sh)
-            ctx: ToolContext scoped to this task's sandbox
-            task_name: For logging
-
-        Returns:
-            1.0 if tests pass, 0.0 otherwise
-        """
-        tests_tar = item.get("tests_tar", "")
-        test_sh = item.get("test_sh", "")
-
-        if not test_sh:
-            logger.warning("Task %s: no test_sh content, reward=0", task_name)
-            return 0.0
-
-        # Create required directories in the sandbox
-        ctx.terminal("mkdir -p /tests /logs/verifier")
-
-        # Upload test files into the sandbox (binary-safe via base64)
-        if tests_tar:
-            tests_temp = Path(tempfile.mkdtemp(prefix=f"tb2-tests-{task_name}-"))
-            try:
-                _extract_base64_tar(tests_tar, tests_temp)
-                ctx.upload_dir(str(tests_temp), "/tests")
-            except Exception as e:
-                logger.warning("Task %s: failed to upload test files: %s", task_name, e)
-            finally:
-                shutil.rmtree(tests_temp, ignore_errors=True)
-
-        # Write the test runner script (test.sh)
-        ctx.write_file("/tests/test.sh", test_sh)
-        ctx.terminal("chmod +x /tests/test.sh")
-
-        # Execute the test suite
-        logger.info(
-            "Task %s: running test suite (timeout=%ds)",
-            task_name, self.config.test_timeout,
-        )
-        test_result = ctx.terminal(
-            "bash /tests/test.sh",
-            timeout=self.config.test_timeout,
-        )
-
-        exit_code = test_result.get("exit_code", -1)
-        output = test_result.get("output", "")
-
-        # Download the verifier output directory locally, then read reward.txt
-        # with native Python I/O. This avoids issues with file_read on the
-        # Modal VM and matches Harbor's verification pattern.
-        reward = 0.0
-        local_verifier_dir = Path(tempfile.mkdtemp(prefix=f"tb2-verifier-{task_name}-"))
-        try:
-            ctx.download_dir("/logs/verifier", str(local_verifier_dir))
-
-            reward_file = local_verifier_dir / "reward.txt"
-            if reward_file.exists() and reward_file.stat().st_size > 0:
-                content = reward_file.read_text().strip()
-                if content == "1":
-                    reward = 1.0
-                elif content == "0":
-                    reward = 0.0
-                else:
-                    # Unexpected content -- try parsing as float
-                    try:
-                        reward = float(content)
-                    except (ValueError, TypeError):
-                        logger.warning(
-                            "Task %s: reward.txt content unexpected (%r), "
-                            "falling back to exit_code=%d",
-                            task_name, content, exit_code,
-                        )
-                        reward = 1.0 if exit_code == 0 else 0.0
-            else:
-                # reward.txt not written -- fall back to exit code
-                logger.warning(
-                    "Task %s: reward.txt not found after download, "
-                    "falling back to exit_code=%d",
-                    task_name, exit_code,
-                )
-                reward = 1.0 if exit_code == 0 else 0.0
-        except Exception as e:
-            logger.warning(
-                "Task %s: failed to download verifier dir: %s, "
-                "falling back to exit_code=%d",
-                task_name, e, exit_code,
-            )
-            reward = 1.0 if exit_code == 0 else 0.0
-        finally:
-            shutil.rmtree(local_verifier_dir, ignore_errors=True)
-
-        # Log test output for debugging failures
-        if reward == 0.0:
-            output_preview = output[-500:] if output else "(no output)"
-            logger.info(
-                "Task %s: FAIL (exit_code=%d)\n%s",
-                task_name, exit_code, output_preview,
-            )
-
-        return reward
-
-    # =========================================================================
-    # Evaluate -- main entry point for the eval subcommand
-    # =========================================================================
-
-    async def _eval_with_timeout(self, item: Dict[str, Any]) -> Dict:
-        """
-        Wrap rollout_and_score_eval with a per-task wall-clock timeout.
-
-        If the task exceeds task_timeout seconds, it's automatically scored
-        as FAIL. This prevents any single task from hanging indefinitely.
-        """
-        task_name = item.get("task_name", "unknown")
-        category = item.get("category", "unknown")
-        try:
-            return await asyncio.wait_for(
-                self.rollout_and_score_eval(item),
-                timeout=self.config.task_timeout,
-            )
-        except asyncio.TimeoutError:
-            from tqdm import tqdm
-            elapsed = self.config.task_timeout
-            tqdm.write(f"  [TIMEOUT] {task_name} (exceeded {elapsed}s wall-clock limit)")
-            logger.error("Task %s: wall-clock timeout after %ds", task_name, elapsed)
-            out = {
-                "passed": False, "reward": 0.0,
-                "task_name": task_name, "category": category,
-                "error": f"timeout ({elapsed}s)",
-            }
-            self._save_result(out)
-            return out
-
-    async def evaluate(self, *args, **kwargs) -> None:
-        """
-        Run Terminal-Bench 2.0 evaluation over all tasks.
-
-        This is the main entry point when invoked via:
-            python environments/terminalbench2_env.py evaluate
-
-        Runs all tasks through rollout_and_score_eval() via asyncio.gather()
-        (same pattern as GPQA and other Atropos eval envs). Each task is
-        wrapped with a wall-clock timeout so hung tasks auto-fail.
-
-        Suppresses noisy Modal/terminal output (HERMES_QUIET) so the tqdm
-        bar stays visible.
-        """
-        start_time = time.time()
-
-        # Route all logging through tqdm.write() so the progress bar stays
-        # pinned at the bottom while log lines scroll above it.
-        from tqdm import tqdm
-
-        class _TqdmHandler(logging.Handler):
-            def emit(self, record):
-                try:
-                    tqdm.write(self.format(record))
-                except Exception:
-                    self.handleError(record)
-
-        handler = _TqdmHandler()
-        handler.setFormatter(logging.Formatter(
-            "%(asctime)s [%(name)s] %(levelname)s: %(message)s",
-            datefmt="%H:%M:%S",
-        ))
-        root = logging.getLogger()
-        root.handlers = [handler]  # Replace any existing handlers
-        root.setLevel(logging.INFO)
-
-        # Silence noisy third-party loggers that flood the output
-        logging.getLogger("httpx").setLevel(logging.WARNING)      # Every HTTP request
-        logging.getLogger("openai").setLevel(logging.WARNING)     # OpenAI client retries
-        logging.getLogger("rex-deploy").setLevel(logging.WARNING) # Swerex deployment
-        logging.getLogger("rex_image_builder").setLevel(logging.WARNING)  # Image builds
-
-        print(f"\n{'='*60}")
-        print("Starting Terminal-Bench 2.0 Evaluation")
-        print(f"{'='*60}")
-        print(f"  Dataset: {self.config.dataset_name}")
-        print(f"  Total tasks: {len(self.all_eval_items)}")
-        print(f"  Max agent turns: {self.config.max_agent_turns}")
-        print(f"  Task timeout: {self.config.task_timeout}s")
-        print(f"  Terminal backend: {self.config.terminal_backend}")
-        print(f"  Tool thread pool: {self.config.tool_pool_size}")
-        print(f"  Terminal timeout: {self.config.terminal_timeout}s/cmd")
-        print(f"  Terminal lifetime: {self.config.terminal_lifetime}s (auto: task_timeout + 120)")
-        print(f"  Max concurrent tasks: {self.config.max_concurrent_tasks}")
-        print(f"{'='*60}\n")
-
-        # Semaphore to limit concurrent Modal sandbox creations.
-        # Without this, all 86 tasks fire simultaneously, each creating a Modal
-        # sandbox via asyncio.run() inside a thread pool worker. Modal's blocking
-        # calls (App.lookup, etc.) deadlock when too many are created at once.
-        semaphore = asyncio.Semaphore(self.config.max_concurrent_tasks)
-
-        async def _eval_with_semaphore(item):
-            async with semaphore:
-                return await self._eval_with_timeout(item)
-
-        # Fire all tasks with wall-clock timeout, track live accuracy on the bar
-        total_tasks = len(self.all_eval_items)
-        eval_tasks = [
-            asyncio.ensure_future(_eval_with_semaphore(item))
-            for item in self.all_eval_items
-        ]
-
-        results = []
-        passed_count = 0
-        pbar = tqdm(total=total_tasks, desc="Evaluating TB2", dynamic_ncols=True)
-        try:
-            for coro in asyncio.as_completed(eval_tasks):
-                result = await coro
-                results.append(result)
-                if result and result.get("passed"):
-                    passed_count += 1
-                done = len(results)
-                pct = (passed_count / done * 100) if done else 0
-                pbar.set_postfix_str(f"pass={passed_count}/{done} ({pct:.1f}%)")
-                pbar.update(1)
-        except (KeyboardInterrupt, asyncio.CancelledError):
-            pbar.close()
-            print(f"\n\nInterrupted! Cleaning up {len(eval_tasks)} tasks...")
-            # Cancel all pending tasks
-            for task in eval_tasks:
-                task.cancel()
-            # Let cancellations propagate (finally blocks run cleanup_vm)
-            await asyncio.gather(*eval_tasks, return_exceptions=True)
-            # Belt-and-suspenders: clean up any remaining sandboxes
-            from tools.terminal_tool import cleanup_all_environments
-            cleanup_all_environments()
-            print("All sandboxes cleaned up.")
-            return
-        finally:
-            pbar.close()
-
-        end_time = time.time()
-
-        # Filter out None results (shouldn't happen, but be safe)
-        valid_results = [r for r in results if r is not None]
-
-        if not valid_results:
-            print("Warning: No valid evaluation results obtained")
-            return
-
-        # ---- Compute metrics ----
-        total = len(valid_results)
-        passed = sum(1 for r in valid_results if r.get("passed"))
-        overall_pass_rate = passed / total if total > 0 else 0.0
-
-        # Per-category breakdown
-        cat_results: Dict[str, List[Dict]] = defaultdict(list)
-        for r in valid_results:
-            cat_results[r.get("category", "unknown")].append(r)
-
-        # Build metrics dict
-        eval_metrics = {
-            "eval/pass_rate": overall_pass_rate,
-            "eval/total_tasks": total,
-            "eval/passed_tasks": passed,
-            "eval/evaluation_time_seconds": end_time - start_time,
-        }
-
-        # Per-category metrics
-        for category, cat_items in sorted(cat_results.items()):
-            cat_passed = sum(1 for r in cat_items if r.get("passed"))
-            cat_total = len(cat_items)
-            cat_pass_rate = cat_passed / cat_total if cat_total > 0 else 0.0
-            cat_key = category.replace(" ", "_").replace("-", "_").lower()
-            eval_metrics[f"eval/pass_rate_{cat_key}"] = cat_pass_rate
-
-        # Store metrics for wandb_log
-        self.eval_metrics = [(k, v) for k, v in eval_metrics.items()]
-
-        # ---- Print summary ----
-        print(f"\n{'='*60}")
-        print("Terminal-Bench 2.0 Evaluation Results")
-        print(f"{'='*60}")
-        print(f"Overall Pass Rate: {overall_pass_rate:.4f} ({passed}/{total})")
-        print(f"Evaluation Time: {end_time - start_time:.1f} seconds")
-
-        print("\nCategory Breakdown:")
-        for category, cat_items in sorted(cat_results.items()):
-            cat_passed = sum(1 for r in cat_items if r.get("passed"))
-            cat_total = len(cat_items)
-            cat_rate = cat_passed / cat_total if cat_total > 0 else 0.0
-            print(f"  {category}: {cat_rate:.1%} ({cat_passed}/{cat_total})")
-
-        # Print individual task results
-        print("\nTask Results:")
-        for r in sorted(valid_results, key=lambda x: x.get("task_name", "")):
-            status = "PASS" if r.get("passed") else "FAIL"
-            turns = r.get("turns_used", "?")
-            error = r.get("error", "")
-            extra = f" (error: {error})" if error else ""
-            print(f"  [{status}] {r['task_name']} (turns={turns}){extra}")
-
-        print(f"{'='*60}\n")
-
-        # Build sample records for evaluate_log (includes full conversations)
-        samples = [
-            {
-                "task_name": r.get("task_name"),
-                "category": r.get("category"),
-                "passed": r.get("passed"),
-                "reward": r.get("reward"),
-                "turns_used": r.get("turns_used"),
-                "error": r.get("error"),
-                "messages": r.get("messages"),
-            }
-            for r in valid_results
-        ]
-
-        # Log evaluation results
-        try:
-            await self.evaluate_log(
-                metrics=eval_metrics,
-                samples=samples,
-                start_time=start_time,
-                end_time=end_time,
-                generation_parameters={
-                    "temperature": self.config.agent_temperature,
-                    "max_tokens": self.config.max_token_length,
-                    "max_agent_turns": self.config.max_agent_turns,
-                    "terminal_backend": self.config.terminal_backend,
-                },
-            )
-        except Exception as e:
-            print(f"Error logging evaluation results: {e}")
-
-        # Close streaming file
-        if hasattr(self, "_streaming_file") and not self._streaming_file.closed:
-            self._streaming_file.close()
-            print(f"  Live results saved to: {self._streaming_path}")
-
-        # Kill all remaining sandboxes. Timed-out tasks leave orphaned thread
-        # pool workers still executing commands -- cleanup_all stops them.
-        from tools.terminal_tool import cleanup_all_environments
-        print("\nCleaning up all sandboxes...")
-        cleanup_all_environments()
-
-        # Shut down the tool thread pool so orphaned workers from timed-out
-        # tasks are killed immediately instead of retrying against dead
-        # sandboxes and spamming the console with TimeoutError warnings.
-        from environments.agent_loop import _tool_executor
-        _tool_executor.shutdown(wait=False, cancel_futures=True)
-        print("Done.")
-
-    # =========================================================================
-    # Wandb logging
-    # =========================================================================
-
-    async def wandb_log(self, wandb_metrics: Optional[Dict] = None):
-        """Log TB2-specific metrics to wandb."""
-        if wandb_metrics is None:
-            wandb_metrics = {}
-
-        # Add stored eval metrics
-        for metric_name, metric_value in self.eval_metrics:
-            wandb_metrics[metric_name] = metric_value
-        self.eval_metrics = []
-
-        await super().wandb_log(wandb_metrics)
-
-
-if __name__ == "__main__":
-    TerminalBench2EvalEnv.cli()
@@ -229,6 +229,12 @@ class HermesAgentBaseEnv(BaseEnv):
        from environments.agent_loop import resize_tool_pool
        resize_tool_pool(config.tool_pool_size)

+        # Set tool_parser on the ServerManager so ManagedServer uses it
+        # for bidirectional tool call translation (raw text ↔ OpenAI tool_calls).
+        if hasattr(self.server, 'tool_parser'):
+            self.server.tool_parser = config.tool_call_parser
+            print(f"🔧 Tool parser: {config.tool_call_parser}")
+
        # Current group's resolved tools (set in collect_trajectories)
        self._current_group_tools: Optional[Tuple[List[Dict], Set[str]]] = None

@@ -466,22 +472,14 @@ class HermesAgentBaseEnv(BaseEnv):
        # Run the agent loop
        result: AgentResult
        if self._use_managed_server():
-            # Phase 2: ManagedServer with parser -- exact tokens + logprobs
-            # Load the tool call parser from registry based on config
-            from environments.tool_call_parsers import get_parser
-            try:
-                tc_parser = get_parser(self.config.tool_call_parser)
-            except KeyError:
-                logger.warning(
-                    "Tool call parser '%s' not found, falling back to 'hermes'",
-                    self.config.tool_call_parser,
-                )
-                tc_parser = get_parser("hermes")
-
+            # Phase 2: ManagedServer with ToolCallTranslator -- exact tokens + logprobs
+            # tool_parser is set on ServerManager in __init__ and passed through
+            # to ManagedServer, which uses ToolCallTranslator for bidirectional
+            # translation between raw text and OpenAI tool_calls.
            try:
                async with self.server.managed_server(
                    tokenizer=self.tokenizer,
-                    tool_call_parser=tc_parser,
+                    preserve_think_blocks=bool(self.config.thinking_mode),
                ) as managed:
                    agent = HermesAgentLoop(
                        server=managed,
@@ -114,11 +114,27 @@ def _patch_swerex_modal():
        self._worker = _AsyncWorker()
        self._worker.start()

+        # Pre-build a modal.Image with pip fix for Modal's legacy image builder.
+        # Modal requires `python -m pip` to work during image build, but some
+        # task images (e.g., TBLite's broken-python) have intentionally broken pip.
+        # Fix: remove stale pip dist-info and reinstall via ensurepip before Modal
+        # tries to use it. This is a no-op for images where pip already works.
+        import modal as _modal
+        image_spec = self.config.image
+        if isinstance(image_spec, str):
+            image_spec = _modal.Image.from_registry(
+                image_spec,
+                setup_dockerfile_commands=[
+                    "RUN rm -rf /usr/local/lib/python*/site-packages/pip* 2>/dev/null; "
+                    "python -m ensurepip --upgrade --default-pip 2>/dev/null || true",
+                ],
+            )
+
        # Create AND start the deployment entirely on the worker's loop/thread
        # so all gRPC channels and async state are bound to that loop
        async def _create_and_start():
            deployment = ModalDeployment(
-                image=self.config.image,
+                image=image_spec,
                startup_timeout=self.config.startup_timeout,
                runtime_timeout=self.config.runtime_timeout,
                deployment_timeout=self.config.deployment_timeout,
@@ -17,6 +17,26 @@ logger = logging.getLogger(__name__)
 DIRECTORY_PATH = Path.home() / ".hermes" / "channel_directory.json"


+def _session_entry_id(origin: Dict[str, Any]) -> Optional[str]:
+    chat_id = origin.get("chat_id")
+    if not chat_id:
+        return None
+    thread_id = origin.get("thread_id")
+    if thread_id:
+        return f"{chat_id}:{thread_id}"
+    return str(chat_id)
+
+
+def _session_entry_name(origin: Dict[str, Any]) -> str:
+    base_name = origin.get("chat_name") or origin.get("user_name") or str(origin.get("chat_id"))
+    thread_id = origin.get("thread_id")
+    if not thread_id:
+        return base_name
+
+    topic_label = origin.get("chat_topic") or f"topic {thread_id}"
+    return f"{base_name} / {topic_label}"
+
+
 # ---------------------------------------------------------------------------
 # Build / refresh
 # ---------------------------------------------------------------------------
@@ -41,7 +61,7 @@ def build_channel_directory(adapters: Dict[Any, Any]) -> Dict[str, Any]:
            logger.warning("Channel directory: failed to build %s: %s", platform.value, e)

    # Telegram, WhatsApp & Signal can't enumerate chats -- pull from session history
-    for plat_name in ("telegram", "whatsapp", "signal"):
+    for plat_name in ("telegram", "whatsapp", "signal", "email"):
        if plat_name not in platforms:
            platforms[plat_name] = _build_from_sessions(plat_name)

@@ -123,14 +143,15 @@ def _build_from_sessions(platform_name: str) -> List[Dict[str, str]]:
            origin = session.get("origin") or {}
            if origin.get("platform") != platform_name:
                continue
-            chat_id = origin.get("chat_id")
-            if not chat_id or chat_id in seen_ids:
+            entry_id = _session_entry_id(origin)
+            if not entry_id or entry_id in seen_ids:
                continue
-            seen_ids.add(chat_id)
+            seen_ids.add(entry_id)
            entries.append({
-                "id": str(chat_id),
-                "name": origin.get("chat_name") or origin.get("user_name") or str(chat_id),
+                "id": entry_id,
+                "name": _session_entry_name(origin),
                "type": session.get("chat_type", "dm"),
+                "thread_id": origin.get("thread_id"),
            })
    except Exception as e:
        logger.debug("Channel directory: failed to read sessions for %s: %s", platform_name, e)
@@ -28,6 +28,7 @@ class Platform(Enum):
    SLACK = "slack"
    SIGNAL = "signal"
    HOMEASSISTANT = "homeassistant"
+    EMAIL = "email"


@dataclass
@@ -167,6 +168,9 @@ class GatewayConfig:
            # Signal uses extra dict for config (http_url + account)
            elif platform == Platform.SIGNAL and config.extra.get("http_url"):
                connected.append(platform)
+            # Email uses extra dict for config (address + imap_host + smtp_host)
+            elif platform == Platform.EMAIL and config.extra.get("address"):
+                connected.append(platform)
        return connected
    
    def get_home_channel(self, platform: Platform) -> Optional[HomeChannel]:
@@ -288,6 +292,18 @@ def load_gateway_config() -> GatewayConfig:
            sr = yaml_cfg.get("session_reset")
            if sr and isinstance(sr, dict):
                config.default_reset_policy = SessionResetPolicy.from_dict(sr)
+
+            # Bridge discord settings from config.yaml to env vars
+            # (env vars take precedence — only set if not already defined)
+            discord_cfg = yaml_cfg.get("discord", {})
+            if isinstance(discord_cfg, dict):
+                if "require_mention" in discord_cfg and not os.getenv("DISCORD_REQUIRE_MENTION"):
+                    os.environ["DISCORD_REQUIRE_MENTION"] = str(discord_cfg["require_mention"]).lower()
+                frc = discord_cfg.get("free_response_channels")
+                if frc is not None and not os.getenv("DISCORD_FREE_RESPONSE_CHANNELS"):
+                    if isinstance(frc, list):
+                        frc = ",".join(str(v) for v in frc)
+                    os.environ["DISCORD_FREE_RESPONSE_CHANNELS"] = str(frc)
    except Exception:
        pass
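Reviewer note on the hunk above: the precedence rule (an environment variable wins, config.yaml only fills gaps) may be easier to check in isolation. The sketch below restates the added logic as a pure function; `bridge_discord_settings` and the `env` mapping parameter are hypothetical names introduced here so the behavior can be exercised without touching `os.environ`.

```python
from typing import Any, MutableMapping


def bridge_discord_settings(discord_cfg: Any, env: MutableMapping[str, str]) -> None:
    """Copy discord.* YAML settings into env, without overriding existing values."""
    if not isinstance(discord_cfg, dict):
        return
    # An unset or empty variable counts as "not defined", mirroring `not os.getenv(...)`.
    if "require_mention" in discord_cfg and not env.get("DISCORD_REQUIRE_MENTION"):
        env["DISCORD_REQUIRE_MENTION"] = str(discord_cfg["require_mention"]).lower()
    frc = discord_cfg.get("free_response_channels")
    if frc is not None and not env.get("DISCORD_FREE_RESPONSE_CHANNELS"):
        if isinstance(frc, list):
            frc = ",".join(str(v) for v in frc)
        env["DISCORD_FREE_RESPONSE_CHANNELS"] = str(frc)


# Hypothetical inputs: the env var for require_mention is already set, so YAML is ignored for it.
env = {"DISCORD_REQUIRE_MENTION": "true"}
bridge_discord_settings({"require_mention": False, "free_response_channels": [111, 222]}, env)
print(env)
```

Note that a YAML boolean is lowercased to "true" or "false", which matches the string values the Discord adapter compares against.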

@@ -420,6 +436,28 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
        if hass_url:
            config.platforms[Platform.HOMEASSISTANT].extra["url"] = hass_url

+    # Email
+    email_addr = os.getenv("EMAIL_ADDRESS")
+    email_pwd = os.getenv("EMAIL_PASSWORD")
+    email_imap = os.getenv("EMAIL_IMAP_HOST")
+    email_smtp = os.getenv("EMAIL_SMTP_HOST")
+    if all([email_addr, email_pwd, email_imap, email_smtp]):
+        if Platform.EMAIL not in config.platforms:
+            config.platforms[Platform.EMAIL] = PlatformConfig()
+        config.platforms[Platform.EMAIL].enabled = True
+        config.platforms[Platform.EMAIL].extra.update({
+            "address": email_addr,
+            "imap_host": email_imap,
+            "smtp_host": email_smtp,
+        })
+        email_home = os.getenv("EMAIL_HOME_ADDRESS")
+        if email_home:
+            config.platforms[Platform.EMAIL].home_channel = HomeChannel(
+                platform=Platform.EMAIL,
+                chat_id=email_home,
+                name=os.getenv("EMAIL_HOME_ADDRESS_NAME", "Home"),
+            )
+
    # Session settings
    idle_minutes = os.getenv("SESSION_IDLE_MINUTES")
    if idle_minutes:
@@ -37,6 +37,7 @@ class DeliveryTarget:
    """
    platform: Platform
    chat_id: Optional[str] = None  # None means use home channel
+    thread_id: Optional[str] = None
    is_origin: bool = False
    is_explicit: bool = False  # True if chat_id was explicitly specified
    
@@ -58,6 +59,7 @@ class DeliveryTarget:
                return cls(
                    platform=origin.platform,
                    chat_id=origin.chat_id,
+                    thread_id=origin.thread_id,
                    is_origin=True,
                )
            else:
@@ -150,7 +152,7 @@ class DeliveryRouter:
                    continue
            
            # Deduplicate
-            key = (target.platform, target.chat_id)
+            key = (target.platform, target.chat_id, target.thread_id)
            if key not in seen_platforms:
                seen_platforms.add(key)
                targets.append(target)
@@ -285,7 +287,10 @@ class DeliveryRouter:
                + f"\n\n... [truncated, full output saved to {saved_path}]"
            )
        
-        return await adapter.send(target.chat_id, content, metadata=metadata)
+        send_metadata = dict(metadata or {})
+        if target.thread_id and "thread_id" not in send_metadata:
+            send_metadata["thread_id"] = target.thread_id
+        return await adapter.send(target.chat_id, content, metadata=send_metadata or None)


 def parse_deliver_spec(
@@ -26,6 +26,7 @@ def mirror_to_session(
    chat_id: str,
    message_text: str,
    source_label: str = "cli",
+    thread_id: Optional[str] = None,
 ) -> bool:
    """
    Append a delivery-mirror message to the target session's transcript.
@@ -37,9 +38,9 @@ def mirror_to_session(
    All errors are caught -- this is never fatal.
    """
    try:
-        session_id = _find_session_id(platform, str(chat_id))
+        session_id = _find_session_id(platform, str(chat_id), thread_id=thread_id)
        if not session_id:
-            logger.debug("Mirror: no session found for %s:%s", platform, chat_id)
+            logger.debug("Mirror: no session found for %s:%s:%s", platform, chat_id, thread_id)
            return False

        mirror_msg = {
@@ -57,11 +58,11 @@ def mirror_to_session(
        return True

    except Exception as e:
-        logger.debug("Mirror failed for %s:%s: %s", platform, chat_id, e)
+        logger.debug("Mirror failed for %s:%s:%s: %s", platform, chat_id, thread_id, e)
        return False


-def _find_session_id(platform: str, chat_id: str) -> Optional[str]:
+def _find_session_id(platform: str, chat_id: str, thread_id: Optional[str] = None) -> Optional[str]:
    """
    Find the active session_id for a platform + chat_id pair.

@@ -91,6 +92,9 @@ def _find_session_id(platform: str, chat_id: str) -> Optional[str]:

        origin_chat_id = str(origin.get("chat_id", ""))
        if origin_chat_id == str(chat_id):
+            origin_thread_id = origin.get("thread_id")
+            if thread_id is not None and str(origin_thread_id or "") != str(thread_id):
+                continue
            updated = entry.get("updated_at", "")
            if updated > best_updated:
                best_updated = updated
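Reviewer note on the hunk above: a hedged sketch of the thread-aware matching rule, written against an in-memory list rather than the real session store. The function name `find_session_id`, the `entries` argument, and the `session_id` field are assumptions for illustration; only the `origin`, `chat_id`, `thread_id`, and `updated_at` comparisons come from the diff.

```python
from typing import Any, Dict, List, Optional


def find_session_id(
    entries: List[Dict[str, Any]],
    platform: str,
    chat_id: str,
    thread_id: Optional[str] = None,
) -> Optional[str]:
    """Return the most recently updated session matching platform, chat and (optionally) thread."""
    best_id: Optional[str] = None
    best_updated = ""
    for entry in entries:
        origin = entry.get("origin") or {}
        if origin.get("platform") != platform:
            continue
        if str(origin.get("chat_id", "")) != str(chat_id):
            continue
        # Only filter on thread when the caller asked for one; None matches any thread.
        if thread_id is not None and str(origin.get("thread_id") or "") != str(thread_id):
            continue
        updated = entry.get("updated_at", "")
        if updated > best_updated:
            best_updated = updated
            best_id = entry.get("session_id")  # field name is an assumption
    return best_id


# Hypothetical entries: same chat, two different threads.
entries = [
    {"session_id": "s1", "updated_at": "2026-03-12T08:00:00",
     "origin": {"platform": "telegram", "chat_id": "42", "thread_id": "7"}},
    {"session_id": "s2", "updated_at": "2026-03-12T09:00:00",
     "origin": {"platform": "telegram", "chat_id": "42", "thread_id": "9"}},
]
print(find_session_id(entries, "telegram", "42", thread_id="7"))
print(find_session_id(entries, "telegram", "42"))
```

One consequence worth keeping in mind: callers that omit `thread_id` keep the old behavior and get the most recent session for the chat, whichever thread it belongs to.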
@@ -24,7 +24,7 @@ from pathlib import Path as _Path
 sys.path.insert(0, str(_Path(__file__).resolve().parents[2]))

 from gateway.config import Platform, PlatformConfig
-from gateway.session import SessionSource
+from gateway.session import SessionSource, build_session_key


 # ---------------------------------------------------------------------------
@@ -516,6 +516,7 @@ class BasePlatformAdapter(ABC):
        audio_path: str,
        caption: Optional[str] = None,
        reply_to: Optional[str] = None,
+        **kwargs,
    ) -> SendResult:
        """
        Send an audio file as a native voice message via the platform API.
@@ -535,6 +536,7 @@ class BasePlatformAdapter(ABC):
        video_path: str,
        caption: Optional[str] = None,
        reply_to: Optional[str] = None,
+        **kwargs,
    ) -> SendResult:
        """
        Send a video natively via the platform API.
@@ -554,6 +556,7 @@ class BasePlatformAdapter(ABC):
        caption: Optional[str] = None,
        file_name: Optional[str] = None,
        reply_to: Optional[str] = None,
+        **kwargs,
    ) -> SendResult:
        """
        Send a document/file natively via the platform API.
@@ -572,6 +575,7 @@ class BasePlatformAdapter(ABC):
        image_path: str,
        caption: Optional[str] = None,
        reply_to: Optional[str] = None,
+        **kwargs,
    ) -> SendResult:
        """
        Send a local image file natively via the platform API.
@@ -646,7 +650,7 @@ class BasePlatformAdapter(ABC):
        if not self._message_handler:
            return
        
-        session_key = event.source.chat_id
+        session_key = build_session_key(event.source)
        
        # Check if there's already an active handler for this session
        if session_key in self._active_sessions:
@@ -72,11 +72,11 @@ class DiscordAdapter(BasePlatformAdapter):
    async def connect(self) -> bool:
        """Connect to Discord and start receiving events."""
        if not DISCORD_AVAILABLE:
-            print(f"[{self.name}] discord.py not installed. Run: pip install discord.py")
+            logger.error("[%s] discord.py not installed. Run: pip install discord.py", self.name)
            return False
        
        if not self.config.token:
-            print(f"[{self.name}] No bot token configured")
+            logger.error("[%s] No bot token configured", self.name)
            return False
        
        try:
@@ -105,7 +105,7 @@ class DiscordAdapter(BasePlatformAdapter):
            # Register event handlers
            @self._client.event
            async def on_ready():
-                print(f"[{adapter_self.name}] Connected as {adapter_self._client.user}")
+                logger.info("[%s] Connected as %s", adapter_self.name, adapter_self._client.user)
                
                # Resolve any usernames in the allowed list to numeric IDs
                await adapter_self._resolve_allowed_usernames()
@@ -113,16 +113,30 @@ class DiscordAdapter(BasePlatformAdapter):
                # Sync slash commands with Discord
                try:
                    synced = await adapter_self._client.tree.sync()
-                    print(f"[{adapter_self.name}] Synced {len(synced)} slash command(s)")
-                except Exception as e:
-                    print(f"[{adapter_self.name}] Slash command sync failed: {e}")
+                    logger.info("[%s] Synced %d slash command(s)", adapter_self.name, len(synced))
+                except Exception as e:  # pragma: no cover - defensive logging
+                    logger.warning("[%s] Slash command sync failed: %s", adapter_self.name, e, exc_info=True)
                adapter_self._ready_event.set()
            
            @self._client.event
            async def on_message(message: DiscordMessage):
-                # Ignore bot's own messages
+                # Always ignore our own messages
                if message.author == self._client.user:
                    return
+                
+                # Bot message filtering (DISCORD_ALLOW_BOTS):
+                #   "none"     — ignore all other bots (default)
+                #   "mentions" — accept bot messages only when they @mention us
+                #   "all"      — accept all bot messages
+                if getattr(message.author, "bot", False):
+                    allow_bots = os.getenv("DISCORD_ALLOW_BOTS", "none").lower().strip()
+                    if allow_bots == "none":
+                        return
+                    elif allow_bots == "mentions":
+                        if not self._client.user or self._client.user not in message.mentions:
+                            return
+                    # "all" (or any unrecognized value) falls through to _handle_message
+                
                await self._handle_message(message)
            
            # Register slash commands
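Reviewer note on the hunk above: the three `DISCORD_ALLOW_BOTS` modes can be summarized as a small decision function. This is a sketch of the logic in the diff, not the adapter's actual API; `should_accept_bot_message` and its parameters are hypothetical names.

```python
def should_accept_bot_message(allow_bots_raw: str, bot_is_mentioned: bool) -> bool:
    """Decide whether a message authored by another bot should be handled."""
    allow_bots = allow_bots_raw.lower().strip()
    if allow_bots == "none":
        return False
    if allow_bots == "mentions":
        return bot_is_mentioned
    # "all" lands here, and so does any other value, exactly as in the diff.
    return True


for mode in ("none", "mentions", "all"):
    print(mode, should_accept_bot_message(mode, bot_is_mentioned=False))
```

As written in the diff, a misspelled value such as "mention" is treated like "all"; if that is not intended, the fall-through branch could compare against "all" explicitly and reject everything else.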
@@ -138,10 +152,10 @@ class DiscordAdapter(BasePlatformAdapter):
            return True
            
        except asyncio.TimeoutError:
-            print(f"[{self.name}] Timeout waiting for connection")
+            logger.error("[%s] Timeout waiting for connection to Discord", self.name, exc_info=True)
            return False
-        except Exception as e:
-            print(f"[{self.name}] Failed to connect: {e}")
+        except Exception as e:  # pragma: no cover - defensive logging
+            logger.error("[%s] Failed to connect to Discord: %s", self.name, e, exc_info=True)
            return False
    
    async def disconnect(self) -> None:
@@ -149,13 +163,13 @@ class DiscordAdapter(BasePlatformAdapter):
        if self._client:
            try:
                await self._client.close()
-            except Exception as e:
-                print(f"[{self.name}] Error during disconnect: {e}")
+            except Exception as e:  # pragma: no cover - defensive logging
+                logger.warning("[%s] Error during disconnect: %s", self.name, e, exc_info=True)
        
        self._running = False
        self._client = None
        self._ready_event.clear()
-        print(f"[{self.name}] Disconnected")
+        logger.info("[%s] Disconnected", self.name)
    
    async def send(
        self,
@@ -204,7 +218,8 @@ class DiscordAdapter(BasePlatformAdapter):
                raw_response={"message_ids": message_ids}
            )
            
-        except Exception as e:
+        except Exception as e:  # pragma: no cover - defensive logging
+            logger.error("[%s] Failed to send Discord message: %s", self.name, e, exc_info=True)
            return SendResult(success=False, error=str(e))

    async def edit_message(
@@ -226,7 +241,8 @@ class DiscordAdapter(BasePlatformAdapter):
                formatted = formatted[:self.MAX_MESSAGE_LENGTH - 3] + "..."
            await msg.edit(content=formatted)
            return SendResult(success=True, message_id=message_id)
-        except Exception as e:
+        except Exception as e:  # pragma: no cover - defensive logging
+            logger.error("[%s] Failed to edit Discord message %s: %s", self.name, message_id, e, exc_info=True)
            return SendResult(success=False, error=str(e))

    async def send_voice(
@@ -263,8 +279,8 @@ class DiscordAdapter(BasePlatformAdapter):
                )
                return SendResult(success=True, message_id=str(msg.id))
        
-        except Exception as e:
-            print(f"[{self.name}] Failed to send audio: {e}")
+        except Exception as e:  # pragma: no cover - defensive logging
+            logger.error("[%s] Failed to send audio, falling back to base adapter: %s", self.name, e, exc_info=True)
            return await super().send_voice(chat_id, audio_path, caption, reply_to)
    
    async def send_image_file(
@@ -300,8 +316,8 @@ class DiscordAdapter(BasePlatformAdapter):
                )
                return SendResult(success=True, message_id=str(msg.id))
        
-        except Exception as e:
-            print(f"[{self.name}] Failed to send local image: {e}")
+        except Exception as e:  # pragma: no cover - defensive logging
+            logger.error("[%s] Failed to send local image, falling back to base adapter: %s", self.name, e, exc_info=True)
            return await super().send_image_file(chat_id, image_path, caption, reply_to)

    async def send_image(
@@ -353,10 +369,19 @@ class DiscordAdapter(BasePlatformAdapter):
                    return SendResult(success=True, message_id=str(msg.id))
        
        except ImportError:
-            print(f"[{self.name}] aiohttp not installed, falling back to URL. Run: pip install aiohttp")
+            logger.warning(
+                "[%s] aiohttp not installed, falling back to URL. Run: pip install aiohttp",
+                self.name,
+                exc_info=True,
+            )
            return await super().send_image(chat_id, image_url, caption, reply_to)
-        except Exception as e:
-            print(f"[{self.name}] Failed to send image attachment, falling back to URL: {e}")
+        except Exception as e:  # pragma: no cover - defensive logging
+            logger.error(
+                "[%s] Failed to send image attachment, falling back to URL: %s",
+                self.name,
+                e,
+                exc_info=True,
+            )
            return await super().send_image(chat_id, image_url, caption, reply_to)
    
    async def send_typing(self, chat_id: str, metadata=None) -> None:
@@ -404,7 +429,8 @@ class DiscordAdapter(BasePlatformAdapter):
                "guild_id": str(channel.guild.id) if hasattr(channel, "guild") and channel.guild else None,
                "guild_name": channel.guild.name if hasattr(channel, "guild") and channel.guild else None,
            }
-        except Exception as e:
+        except Exception as e:  # pragma: no cover - defensive logging
+            logger.error("[%s] Failed to get chat info for %s: %s", self.name, chat_id, e, exc_info=True)
            return {"name": str(chat_id), "type": "dm", "error": str(e)}
    
    async def _resolve_allowed_usernames(self) -> None:
@@ -749,6 +775,46 @@ class DiscordAdapter(BasePlatformAdapter):
        except Exception as e:
            return SendResult(success=False, error=str(e))

+    def _get_parent_channel_id(self, channel: Any) -> Optional[str]:
+        """Return the parent channel ID for a Discord thread-like channel, if present."""
+        parent = getattr(channel, "parent", None)
+        if parent is not None and getattr(parent, "id", None) is not None:
+            return str(parent.id)
+        parent_id = getattr(channel, "parent_id", None)
+        if parent_id is not None:
+            return str(parent_id)
+        return None
+
+    def _is_forum_parent(self, channel: Any) -> bool:
+        """Best-effort check for whether a Discord channel is a forum channel."""
+        if channel is None:
+            return False
+        forum_cls = getattr(discord, "ForumChannel", None)
+        if forum_cls and isinstance(channel, forum_cls):
+            return True
+        channel_type = getattr(channel, "type", None)
+        if channel_type is not None:
+            type_value = getattr(channel_type, "value", channel_type)
+            if type_value == 15:
+                return True
+        return False
+
+    def _format_thread_chat_name(self, thread: Any) -> str:
+        """Build a readable chat name for thread-like Discord channels, including forum context when available."""
+        thread_name = getattr(thread, "name", None) or str(getattr(thread, "id", "thread"))
+        parent = getattr(thread, "parent", None)
+        guild = getattr(thread, "guild", None) or getattr(parent, "guild", None)
+        guild_name = getattr(guild, "name", None)
+        parent_name = getattr(parent, "name", None)
+
+        if self._is_forum_parent(parent) and guild_name and parent_name:
+            return f"{guild_name} / {parent_name} / {thread_name}"
+        if parent_name and guild_name:
+            return f"{guild_name} / #{parent_name} / {thread_name}"
+        if parent_name:
+            return f"{parent_name} / {thread_name}"
+        return thread_name
+
    async def _handle_message(self, message: DiscordMessage) -> None:
        """Handle incoming Discord messages."""
        # In server channels (not DMs), require the bot to be @mentioned
@@ -759,28 +825,33 @@ class DiscordAdapter(BasePlatformAdapter):
        #       bot responds to every message without needing a mention.
        #   DISCORD_REQUIRE_MENTION: Set to "false" to disable mention requirement
        #       globally (all channels become free-response). Default: "true".
-        
+        #       Can also be set via discord.require_mention in config.yaml.
+
+        thread_id = None
+        parent_channel_id = None
+        is_thread = isinstance(message.channel, discord.Thread)
+        if is_thread:
+            thread_id = str(message.channel.id)
+            parent_channel_id = self._get_parent_channel_id(message.channel)
+
        if not isinstance(message.channel, discord.DMChannel):
-            # Check if this channel is in the free-response list
            free_channels_raw = os.getenv("DISCORD_FREE_RESPONSE_CHANNELS", "")
            free_channels = {ch.strip() for ch in free_channels_raw.split(",") if ch.strip()}
-            channel_id = str(message.channel.id)
-            
-            # Global override: if DISCORD_REQUIRE_MENTION=false, all channels are free
+            channel_ids = {str(message.channel.id)}
+            if parent_channel_id:
+                channel_ids.add(parent_channel_id)
+
            require_mention = os.getenv("DISCORD_REQUIRE_MENTION", "true").lower() not in ("false", "0", "no")
-            
-            is_free_channel = channel_id in free_channels
-            
+            is_free_channel = bool(channel_ids & free_channels)
+
            if require_mention and not is_free_channel:
-                # Must be @mentioned to respond
                if self._client.user not in message.mentions:
-                    return  # Silently ignore messages that don't mention the bot
-            
-            # Strip the bot mention from the message text so the agent sees clean input
+                    return
+
            if self._client.user and self._client.user in message.mentions:
                message.content = message.content.replace(f"<@{self._client.user.id}>", "").strip()
                message.content = message.content.replace(f"<@!{self._client.user.id}>", "").strip()
-        
+
        # Determine message type
        msg_type = MessageType.TEXT
        if message.content.startswith("/"):
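Reviewer note on the hunk above: the free-response check now matches on either the thread's own ID or its parent channel's ID. A minimal sketch of that set intersection, with `is_free_channel` as a hypothetical helper name and made-up channel IDs:

```python
from typing import Optional


def is_free_channel(channel_id: str, parent_channel_id: Optional[str], free_channels_raw: str) -> bool:
    """True when the channel, or its parent for threads, is listed as free-response."""
    free_channels = {ch.strip() for ch in free_channels_raw.split(",") if ch.strip()}
    channel_ids = {str(channel_id)}
    if parent_channel_id:
        channel_ids.add(str(parent_channel_id))
    return bool(channel_ids & free_channels)


# Hypothetical IDs: thread 900 lives under channel 100, which is listed as free-response.
print(is_free_channel("900", "100", "100, 200"))
print(is_free_channel("900", None, "100, 200"))
```

The practical effect is that threads created inside a free-response channel inherit its behavior without each thread ID having to be listed in `DISCORD_FREE_RESPONSE_CHANNELS`.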
@@ -803,20 +874,15 @@ class DiscordAdapter(BasePlatformAdapter):
        if isinstance(message.channel, discord.DMChannel):
            chat_type = "dm"
            chat_name = message.author.name
-        elif isinstance(message.channel, discord.Thread):
+        elif is_thread:
            chat_type = "thread"
-            chat_name = message.channel.name
+            chat_name = self._format_thread_chat_name(message.channel)
        else:
-            chat_type = "group"  # Treat server channels as groups
+            chat_type = "group"
            chat_name = getattr(message.channel, "name", str(message.channel.id))
            if hasattr(message.channel, "guild") and message.channel.guild:
                chat_name = f"{message.channel.guild.name} / #{chat_name}"
-        
-        # Get thread ID if in a thread
-        thread_id = None
-        if isinstance(message.channel, discord.Thread):
-            thread_id = str(message.channel.id)
-        
+
        # Get channel topic (if available - TextChannels have topics, DMs/threads don't)
        chat_topic = getattr(message.channel, "topic", None)
        
@@ -0,0 +1,533 @@
+"""
+Email platform adapter for the Hermes gateway.
+
+Allows users to interact with Hermes by sending emails.
+Uses IMAP to receive and SMTP to send messages.
+
+Environment variables:
+    EMAIL_IMAP_HOST     — IMAP server host (e.g., imap.gmail.com)
+    EMAIL_IMAP_PORT     — IMAP server port (default: 993)
+    EMAIL_SMTP_HOST     — SMTP server host (e.g., smtp.gmail.com)
+    EMAIL_SMTP_PORT     — SMTP server port (default: 587)
+    EMAIL_ADDRESS       — Email address for the agent
+    EMAIL_PASSWORD      — Email password or app-specific password
+    EMAIL_POLL_INTERVAL — Seconds between mailbox checks (default: 15)
+    EMAIL_ALLOWED_USERS — Comma-separated list of allowed sender addresses
+"""
+
+import asyncio
+import email as email_lib
+import imaplib
+import logging
+import os
+import re
+import smtplib
+import uuid
+from datetime import datetime
+from email.header import decode_header
+from email.mime.multipart import MIMEMultipart
+from email.mime.text import MIMEText
+from email.mime.base import MIMEBase
+from email import encoders
+from pathlib import Path
+from typing import Any, Dict, List, Optional
+
+from gateway.platforms.base import (
+    BasePlatformAdapter,
+    MessageEvent,
+    MessageType,
+    SendResult,
+    cache_document_from_bytes,
+    cache_image_from_bytes,
+)
+from gateway.config import Platform, PlatformConfig
+
+logger = logging.getLogger(__name__)
+
+# Gmail-safe max length per email body
+MAX_MESSAGE_LENGTH = 50_000
+
+# Supported image extensions for inline detection
+_IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}
+
+
+def check_email_requirements() -> bool:
+    """Check whether the required email environment variables are set."""
+    addr = os.getenv("EMAIL_ADDRESS")
+    pwd = os.getenv("EMAIL_PASSWORD")
+    imap = os.getenv("EMAIL_IMAP_HOST")
+    smtp = os.getenv("EMAIL_SMTP_HOST")
+    if not all([addr, pwd, imap, smtp]):
+        return False
+    return True
+
+
+def _decode_header_value(raw: str) -> str:
+    """Decode an RFC 2047 encoded email header into a plain string."""
+    parts = decode_header(raw)
+    decoded = []
+    for part, charset in parts:
+        if isinstance(part, bytes):
+            decoded.append(part.decode(charset or "utf-8", errors="replace"))
+        else:
+            decoded.append(part)
+    return "".join(decoded)
+
+
+def _extract_text_body(msg: email_lib.message.Message) -> str:
+    """Extract the plain-text body from a potentially multipart email."""
+    if msg.is_multipart():
+        for part in msg.walk():
+            content_type = part.get_content_type()
+            disposition = str(part.get("Content-Disposition", ""))
+            # Skip attachments
+            if "attachment" in disposition:
+                continue
+            if content_type == "text/plain":
+                payload = part.get_payload(decode=True)
+                if payload:
+                    charset = part.get_content_charset() or "utf-8"
+                    return payload.decode(charset, errors="replace")
+        # Fallback: try text/html and strip tags
+        for part in msg.walk():
+            content_type = part.get_content_type()
+            disposition = str(part.get("Content-Disposition", ""))
+            if "attachment" in disposition:
+                continue
+            if content_type == "text/html":
+                payload = part.get_payload(decode=True)
+                if payload:
+                    charset = part.get_content_charset() or "utf-8"
+                    html = payload.decode(charset, errors="replace")
+                    return _strip_html(html)
+        return ""
+    else:
+        payload = msg.get_payload(decode=True)
+        if payload:
+            charset = msg.get_content_charset() or "utf-8"
+            text = payload.decode(charset, errors="replace")
+            if msg.get_content_type() == "text/html":
+                return _strip_html(text)
+            return text
+        return ""
+
+
+def _strip_html(html: str) -> str:
+    """Naive HTML tag stripper for fallback text extraction."""
+    text = re.sub(r"<br\s*/?>", "\n", html, flags=re.IGNORECASE)
+    text = re.sub(r"<p[^>]*>", "\n", text, flags=re.IGNORECASE)
+    text = re.sub(r"</p>", "\n", text, flags=re.IGNORECASE)
+    text = re.sub(r"<[^>]+>", "", text)
+    text = re.sub(r"&nbsp;", " ", text)
+    text = re.sub(r"&amp;", "&", text)
+    text = re.sub(r"&lt;", "<", text)
+    text = re.sub(r"&gt;", ">", text)
+    text = re.sub(r"\n{3,}", "\n\n", text)
+    return text.strip()
+
+
+def _extract_email_address(raw: str) -> str:
+    """Extract bare email address from 'Name <addr>' format."""
+    match = re.search(r"<([^>]+)>", raw)
+    if match:
+        return match.group(1).strip().lower()
+    return raw.strip().lower()
+
+
+def _extract_attachments(msg: email_lib.message.Message) -> List[Dict[str, Any]]:
+    """Extract attachment metadata and cache files locally."""
+    attachments = []
+    if not msg.is_multipart():
+        return attachments
+
+    for part in msg.walk():
+        disposition = str(part.get("Content-Disposition", ""))
+        if "attachment" not in disposition and "inline" not in disposition:
+            continue
+        # Skip text/plain and text/html body parts
+        content_type = part.get_content_type()
+        if content_type in ("text/plain", "text/html") and "attachment" not in disposition:
+            continue
+
+        filename = part.get_filename()
+        if filename:
+            filename = _decode_header_value(filename)
+        else:
+            ext = part.get_content_subtype() or "bin"
+            filename = f"attachment.{ext}"
+
+        payload = part.get_payload(decode=True)
+        if not payload:
+            continue
+
+        ext = Path(filename).suffix.lower()
+        if ext in _IMAGE_EXTS:
+            cached_path = cache_image_from_bytes(payload, ext)
+            attachments.append({
+                "path": cached_path,
+                "filename": filename,
+                "type": "image",
+                "media_type": content_type,
+            })
+        else:
+            cached_path = cache_document_from_bytes(payload, filename)
+            attachments.append({
+                "path": cached_path,
+                "filename": filename,
+                "type": "document",
+                "media_type": content_type,
+            })
+
+    return attachments
+
+
+class EmailAdapter(BasePlatformAdapter):
+    """Email gateway adapter using IMAP (receive) and SMTP (send)."""
+
+    def __init__(self, config: PlatformConfig):
+        super().__init__(config, Platform.EMAIL)
+
+        self._address = os.getenv("EMAIL_ADDRESS", "")
+        self._password = os.getenv("EMAIL_PASSWORD", "")
+        self._imap_host = os.getenv("EMAIL_IMAP_HOST", "")
+        self._imap_port = int(os.getenv("EMAIL_IMAP_PORT", "993"))
+        self._smtp_host = os.getenv("EMAIL_SMTP_HOST", "")
+        self._smtp_port = int(os.getenv("EMAIL_SMTP_PORT", "587"))
+        self._poll_interval = int(os.getenv("EMAIL_POLL_INTERVAL", "15"))
+
+        # Track message IDs we've already processed to avoid duplicates
+        self._seen_uids: set = set()
+        self._poll_task: Optional[asyncio.Task] = None
+
+        # Map chat_id (sender email) -> last subject + message-id for threading
+        self._thread_context: Dict[str, Dict[str, str]] = {}
+
+        logger.info("[Email] Adapter initialized for %s", self._address)
+
+    async def connect(self) -> bool:
+        """Connect to the IMAP server and start polling for new messages."""
+        try:
+            # Test IMAP connection
+            imap = imaplib.IMAP4_SSL(self._imap_host, self._imap_port)
+            imap.login(self._address, self._password)
+            # Record the UIDs of all existing messages so we only process new ones
+            imap.select("INBOX")
+            status, data = imap.uid("search", None, "ALL")
+            if status == "OK" and data[0]:
+                for uid in data[0].split():
+                    self._seen_uids.add(uid)
+            imap.logout()
+            logger.info("[Email] IMAP connection test passed. %d existing messages skipped.", len(self._seen_uids))
+        except Exception as e:
+            logger.error("[Email] IMAP connection failed: %s", e)
+            return False
+
+        try:
+            # Test SMTP connection
+            smtp = smtplib.SMTP(self._smtp_host, self._smtp_port)
+            smtp.starttls()
+            smtp.login(self._address, self._password)
+            smtp.quit()
+            logger.info("[Email] SMTP connection test passed.")
+        except Exception as e:
+            logger.error("[Email] SMTP connection failed: %s", e)
+            return False
+
+        self._running = True
+        self._poll_task = asyncio.create_task(self._poll_loop())
+        logger.info("[Email] Connected as %s", self._address)
+        return True
+
+    async def disconnect(self) -> None:
+        """Stop polling and disconnect."""
+        self._running = False
+        if self._poll_task:
+            self._poll_task.cancel()
+            try:
+                await self._poll_task
+            except asyncio.CancelledError:
+                pass
+            self._poll_task = None
+        logger.info("[Email] Disconnected.")
+
+    async def _poll_loop(self) -> None:
+        """Poll IMAP for new messages at regular intervals."""
+        while self._running:
+            try:
+                await self._check_inbox()
+            except asyncio.CancelledError:
+                break
+            except Exception as e:
+                logger.error("[Email] Poll error: %s", e)
+            await asyncio.sleep(self._poll_interval)
+
+    async def _check_inbox(self) -> None:
+        """Check INBOX for unseen messages and dispatch them."""
+        # Run IMAP operations in a thread to avoid blocking the event loop
+        loop = asyncio.get_running_loop()
+        messages = await loop.run_in_executor(None, self._fetch_new_messages)
+        for msg_data in messages:
+            await self._dispatch_message(msg_data)
+
+    def _fetch_new_messages(self) -> List[Dict[str, Any]]:
+        """Fetch new (unseen) messages from IMAP. Runs in executor thread."""
+        results = []
+        try:
+            imap = imaplib.IMAP4_SSL(self._imap_host, self._imap_port)
+            imap.login(self._address, self._password)
+            imap.select("INBOX")
+
+            # Use UIDs rather than sequence numbers: sequence numbers shift on expunge
+            status, data = imap.uid("search", None, "UNSEEN")
+            if status != "OK" or not data[0]:
+                imap.logout()
+                return results
+
+            for uid in data[0].split():
+                if uid in self._seen_uids:
+                    continue
+                self._seen_uids.add(uid)
+
+                status, msg_data = imap.uid("fetch", uid, "(RFC822)")
+                if status != "OK":
+                    continue
+
+                raw_email = msg_data[0][1]
+                msg = email_lib.message_from_bytes(raw_email)
+
+                sender_raw = msg.get("From", "")
+                sender_addr = _extract_email_address(sender_raw)
+                sender_name = _decode_header_value(sender_raw)
+                # Remove email from name if present
+                if "<" in sender_name:
+                    sender_name = sender_name.split("<")[0].strip().strip('"')
+
+                subject = _decode_header_value(msg.get("Subject", "(no subject)"))
+                message_id = msg.get("Message-ID", "")
+                in_reply_to = msg.get("In-Reply-To", "")
+                body = _extract_text_body(msg)
+                attachments = _extract_attachments(msg)
+
+                results.append({
+                    "uid": uid,
+                    "sender_addr": sender_addr,
+                    "sender_name": sender_name,
+                    "subject": subject,
+                    "message_id": message_id,
+                    "in_reply_to": in_reply_to,
+                    "body": body,
+                    "attachments": attachments,
+                    "date": msg.get("Date", ""),
+                })
+
+            imap.logout()
+        except Exception as e:
+            logger.error("[Email] IMAP fetch error: %s", e)
+        return results
+
+    async def _dispatch_message(self, msg_data: Dict[str, Any]) -> None:
+        """Convert a fetched email into a MessageEvent and dispatch it."""
+        sender_addr = msg_data["sender_addr"]
+
+        # Skip self-messages
+        if sender_addr == self._address.lower():
+            return
+
+        subject = msg_data["subject"]
+        body = msg_data["body"].strip()
+        attachments = msg_data["attachments"]
+
+        # Build message text: include subject as context
+        text = body
+        if subject and not subject.lower().startswith("re:"):
+            text = f"[Subject: {subject}]\n\n{body}"
+
+        # Determine message type and media
+        media_urls = []
+        media_types = []
+        msg_type = MessageType.TEXT
+
+        for att in attachments:
+            media_urls.append(att["path"])
+            media_types.append(att["media_type"])
+            if att["type"] == "image":
+                msg_type = MessageType.PHOTO
+
+        # Store thread context for reply threading
+        self._thread_context[sender_addr] = {
+            "subject": subject,
+            "message_id": msg_data["message_id"],
+        }
+
+        source = self.build_source(
+            chat_id=sender_addr,
+            chat_name=msg_data["sender_name"] or sender_addr,
+            chat_type="dm",
+            user_id=sender_addr,
+            user_name=msg_data["sender_name"] or sender_addr,
+        )
+
+        event = MessageEvent(
+            text=text or "(empty email)",
+            message_type=msg_type,
+            source=source,
+            message_id=msg_data["message_id"],
+            media_urls=media_urls,
+            media_types=media_types,
+            reply_to_message_id=msg_data["in_reply_to"] or None,
+        )
+
+        logger.info("[Email] New message from %s: %s", sender_addr, subject)
+        await self.handle_message(event)
+
+    async def send(
+        self,
+        chat_id: str,
+        content: str,
+        reply_to: Optional[str] = None,
+        metadata: Optional[Dict[str, Any]] = None,
+    ) -> SendResult:
+        """Send an email reply to the given address."""
+        try:
+            loop = asyncio.get_running_loop()
+            message_id = await loop.run_in_executor(
+                None, self._send_email, chat_id, content, reply_to
+            )
+            return SendResult(success=True, message_id=message_id)
+        except Exception as e:
+            logger.error("[Email] Send failed to %s: %s", chat_id, e)
+            return SendResult(success=False, error=str(e))
+
+    def _send_email(
+        self,
+        to_addr: str,
+        body: str,
+        reply_to_msg_id: Optional[str] = None,
+    ) -> str:
+        """Send an email via SMTP. Runs in executor thread."""
+        msg = MIMEMultipart()
+        msg["From"] = self._address
+        msg["To"] = to_addr
+
+        # Thread context for reply
+        ctx = self._thread_context.get(to_addr, {})
+        subject = ctx.get("subject", "Hermes Agent")
+        if not subject.lower().startswith("re:"):
+            subject = f"Re: {subject}"
+        msg["Subject"] = subject
+
+        # Threading headers
+        original_msg_id = reply_to_msg_id or ctx.get("message_id")
+        if original_msg_id:
+            msg["In-Reply-To"] = original_msg_id
+            msg["References"] = original_msg_id
+
+        msg_id = f"<hermes-{uuid.uuid4().hex[:12]}@{self._address.split('@')[1]}>"
+        msg["Message-ID"] = msg_id
+
+        msg.attach(MIMEText(body, "plain", "utf-8"))
+
+        smtp = smtplib.SMTP(self._smtp_host, self._smtp_port)
+        smtp.starttls()
+        smtp.login(self._address, self._password)
+        smtp.send_message(msg)
+        smtp.quit()
+
+        logger.info("[Email] Sent reply to %s (subject: %s)", to_addr, subject)
+        return msg_id
+
+    async def send_typing(self, chat_id: str) -> None:
+        """Email has no typing indicator — no-op."""
+        pass
+
+    async def send_image(
+        self,
+        chat_id: str,
+        image_url: str,
+        caption: Optional[str] = None,
+        reply_to: Optional[str] = None,
+    ) -> SendResult:
+        """Send an image URL as part of an email body."""
+        text = caption or ""
+        text += f"\n\nImage: {image_url}"
+        return await self.send(chat_id, text.strip(), reply_to)
+
+    async def send_document(
+        self,
+        chat_id: str,
+        file_path: str,
+        caption: Optional[str] = None,
+        file_name: Optional[str] = None,
+        reply_to: Optional[str] = None,
+    ) -> SendResult:
+        """Send a file as an email attachment."""
+        try:
+            loop = asyncio.get_running_loop()
+            message_id = await loop.run_in_executor(
+                None,
+                self._send_email_with_attachment,
+                chat_id,
+                caption or "",
+                file_path,
+                file_name,
+            )
+            return SendResult(success=True, message_id=message_id)
+        except Exception as e:
+            logger.error("[Email] Send document failed: %s", e)
+            return SendResult(success=False, error=str(e))
+
+    def _send_email_with_attachment(
+        self,
+        to_addr: str,
+        body: str,
+        file_path: str,
+        file_name: Optional[str] = None,
+    ) -> str:
+        """Send an email with a file attachment via SMTP."""
+        msg = MIMEMultipart()
+        msg["From"] = self._address
+        msg["To"] = to_addr
+
+        ctx = self._thread_context.get(to_addr, {})
+        subject = ctx.get("subject", "Hermes Agent")
+        if not subject.lower().startswith("re:"):
+            subject = f"Re: {subject}"
+        msg["Subject"] = subject
+
+        original_msg_id = ctx.get("message_id")
+        if original_msg_id:
+            msg["In-Reply-To"] = original_msg_id
+            msg["References"] = original_msg_id
+
+        msg_id = f"<hermes-{uuid.uuid4().hex[:12]}@{self._address.split('@')[1]}>"
+        msg["Message-ID"] = msg_id
+
+        if body:
+            msg.attach(MIMEText(body, "plain", "utf-8"))
+
+        # Attach file
+        p = Path(file_path)
+        fname = file_name or p.name
+        with open(p, "rb") as f:
+            part = MIMEBase("application", "octet-stream")
+            part.set_payload(f.read())
+            encoders.encode_base64(part)
+            part.add_header("Content-Disposition", "attachment", filename=fname)
+            msg.attach(part)
+
+        smtp = smtplib.SMTP(self._smtp_host, self._smtp_port)
+        smtp.starttls()
+        smtp.login(self._address, self._password)
+        smtp.send_message(msg)
+        smtp.quit()
+
+        return msg_id
+
+    async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
+        """Return basic info about the email chat."""
+        ctx = self._thread_context.get(chat_id, {})
+        return {
+            "name": chat_id,
+            "type": "dm",
+            "chat_id": chat_id,
+            "subject": ctx.get("subject", ""),
+        }
@@ -9,6 +9,7 @@ Uses slack-bolt (Python) with Socket Mode for:
 """

 import asyncio
+import logging
 import os
 import re
 from typing import Dict, List, Optional, Any
@@ -41,6 +42,9 @@ from gateway.platforms.base import (
 )


+logger = logging.getLogger(__name__)
+
+
 def check_slack_requirements() -> bool:
    """Check if Slack dependencies are available."""
    return SLACK_AVAILABLE
@@ -73,17 +77,19 @@ class SlackAdapter(BasePlatformAdapter):
    async def connect(self) -> bool:
        """Connect to Slack via Socket Mode."""
        if not SLACK_AVAILABLE:
-            print("[Slack] slack-bolt not installed. Run: pip install slack-bolt")
+            logger.error(
+                "[Slack] slack-bolt not installed. Run: pip install slack-bolt",
+            )
            return False

        bot_token = self.config.token
        app_token = os.getenv("SLACK_APP_TOKEN")

        if not bot_token:
-            print("[Slack] SLACK_BOT_TOKEN not set")
+            logger.error("[Slack] SLACK_BOT_TOKEN not set")
            return False
        if not app_token:
-            print("[Slack] SLACK_APP_TOKEN not set")
+            logger.error("[Slack] SLACK_APP_TOKEN not set")
            return False

        try:
@@ -117,19 +123,22 @@ class SlackAdapter(BasePlatformAdapter):
            asyncio.create_task(self._handler.start_async())

            self._running = True
-            print(f"[Slack] Connected as @{bot_name} (Socket Mode)")
+            logger.info("[Slack] Connected as @%s (Socket Mode)", bot_name)
            return True

-        except Exception as e:
-            print(f"[Slack] Connection failed: {e}")
+        except Exception as e:  # pragma: no cover - defensive logging
+            logger.error("[Slack] Connection failed: %s", e, exc_info=True)
            return False

    async def disconnect(self) -> None:
        """Disconnect from Slack."""
        if self._handler:
-            await self._handler.close_async()
+            try:
+                await self._handler.close_async()
+            except Exception as e:  # pragma: no cover - defensive logging
+                logger.warning("[Slack] Error while closing Socket Mode handler: %s", e, exc_info=True)
        self._running = False
-        print("[Slack] Disconnected")
+        logger.info("[Slack] Disconnected")

    async def send(
        self,
@@ -162,8 +171,8 @@ class SlackAdapter(BasePlatformAdapter):
                raw_response=result,
            )

-        except Exception as e:
-            print(f"[Slack] Send error: {e}")
+        except Exception as e:  # pragma: no cover - defensive logging
+            logger.error("[Slack] Send error: %s", e, exc_info=True)
            return SendResult(success=False, error=str(e))

    async def edit_message(
@@ -182,7 +191,14 @@ class SlackAdapter(BasePlatformAdapter):
                text=content,
            )
            return SendResult(success=True, message_id=message_id)
-        except Exception as e:
+        except Exception as e:  # pragma: no cover - defensive logging
+            logger.error(
+                "[Slack] Failed to edit message %s in channel %s: %s",
+                message_id,
+                chat_id,
+                e,
+                exc_info=True,
+            )
            return SendResult(success=False, error=str(e))

    async def send_typing(self, chat_id: str, metadata=None) -> None:
@@ -214,8 +230,14 @@ class SlackAdapter(BasePlatformAdapter):
            )
            return SendResult(success=True, raw_response=result)

-        except Exception as e:
-            print(f"[{self.name}] Failed to send local image: {e}")
+        except Exception as e:  # pragma: no cover - defensive logging
+            logger.error(
+                "[%s] Failed to send local Slack image %s: %s",
+                self.name,
+                image_path,
+                e,
+                exc_info=True,
+            )
            return await super().send_image_file(chat_id, image_path, caption, reply_to)

    async def send_image(
@@ -247,7 +269,13 @@ class SlackAdapter(BasePlatformAdapter):

            return SendResult(success=True, raw_response=result)

-        except Exception as e:
+        except Exception as e:  # pragma: no cover - defensive logging
+            logger.warning(
+                "[Slack] Failed to upload image from URL %s, falling back to text: %s",
+                image_url,
+                e,
+                exc_info=True,
+            )
            # Fall back to sending the URL as text
            text = f"{caption}\n{image_url}" if caption else image_url
            return await self.send(chat_id=chat_id, content=text, reply_to=reply_to)
@@ -273,7 +301,13 @@ class SlackAdapter(BasePlatformAdapter):
            )
            return SendResult(success=True, raw_response=result)

-        except Exception as e:
+        except Exception as e:  # pragma: no cover - defensive logging
+            logger.error(
+                "[Slack] Failed to send audio file %s: %s",
+                audio_path,
+                e,
+                exc_info=True,
+            )
            return SendResult(success=False, error=str(e))

    async def send_video(
@@ -300,8 +334,14 @@ class SlackAdapter(BasePlatformAdapter):
            )
            return SendResult(success=True, raw_response=result)

-        except Exception as e:
-            print(f"[{self.name}] Failed to send video: {e}")
+        except Exception as e:  # pragma: no cover - defensive logging
+            logger.error(
+                "[%s] Failed to send video %s: %s",
+                self.name,
+                video_path,
+                e,
+                exc_info=True,
+            )
            return await super().send_video(chat_id, video_path, caption, reply_to)

    async def send_document(
@@ -331,8 +371,14 @@ class SlackAdapter(BasePlatformAdapter):
            )
            return SendResult(success=True, raw_response=result)

-        except Exception as e:
-            print(f"[{self.name}] Failed to send document: {e}")
+        except Exception as e:  # pragma: no cover - defensive logging
+            logger.error(
+                "[%s] Failed to send document %s: %s",
+                self.name,
+                file_path,
+                e,
+                exc_info=True,
+            )
            return await super().send_document(chat_id, file_path, caption, file_name, reply_to)

    async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
@@ -348,7 +394,13 @@ class SlackAdapter(BasePlatformAdapter):
                "name": channel.get("name", chat_id),
                "type": "dm" if is_dm else "group",
            }
-        except Exception:
+        except Exception as e:  # pragma: no cover - defensive logging
+            logger.error(
+                "[Slack] Failed to fetch chat info for %s: %s",
+                chat_id,
+                e,
+                exc_info=True,
+            )
            return {"name": chat_id, "type": "unknown"}

    # ----- Internal handlers -----
@@ -403,8 +455,8 @@ class SlackAdapter(BasePlatformAdapter):
                    media_urls.append(cached)
                    media_types.append(mimetype)
                    msg_type = MessageType.PHOTO
-                except Exception as e:
-                    print(f"[Slack] Failed to cache image: {e}", flush=True)
+                except Exception as e:  # pragma: no cover - defensive logging
+                    logger.warning("[Slack] Failed to cache image from %s: %s", url, e, exc_info=True)
            elif mimetype.startswith("audio/") and url:
                try:
                    ext = "." + mimetype.split("/")[-1].split(";")[0]
@@ -414,8 +466,8 @@ class SlackAdapter(BasePlatformAdapter):
                    media_urls.append(cached)
                    media_types.append(mimetype)
                    msg_type = MessageType.VOICE
-                except Exception as e:
-                    print(f"[Slack] Failed to cache audio: {e}", flush=True)
+                except Exception as e:  # pragma: no cover - defensive logging
+                    logger.warning("[Slack] Failed to cache audio from %s: %s", url, e, exc_info=True)
            elif url:
                # Try to handle as a document attachment
                try:
@@ -437,7 +489,7 @@ class SlackAdapter(BasePlatformAdapter):
                    file_size = f.get("size", 0)
                    MAX_DOC_BYTES = 20 * 1024 * 1024
                    if not file_size or file_size > MAX_DOC_BYTES:
-                        print(f"[Slack] Document too large or unknown size: {file_size}", flush=True)
+                        logger.warning("[Slack] Document too large or unknown size: %s", file_size)
                        continue

                    # Download and cache
@@ -449,7 +501,7 @@ class SlackAdapter(BasePlatformAdapter):
                    media_urls.append(cached_path)
                    media_types.append(doc_mime)
                    msg_type = MessageType.DOCUMENT
-                    print(f"[Slack] Cached user document: {cached_path}", flush=True)
+                    logger.debug("[Slack] Cached user document: %s", cached_path)

                    # Inject text content for .txt/.md files (capped at 100 KB)
                    MAX_TEXT_INJECT_BYTES = 100 * 1024
@@ -466,8 +518,8 @@ class SlackAdapter(BasePlatformAdapter):
                        except UnicodeDecodeError:
                            pass  # Binary content, skip injection

-                except Exception as e:
-                    print(f"[Slack] Failed to cache document: {e}", flush=True)
+                except Exception as e:  # pragma: no cover - defensive logging
+                    logger.warning("[Slack] Failed to cache document from %s: %s", url, e, exc_info=True)

        # Build source
        source = self.build_source(
@@ -114,11 +114,14 @@ class TelegramAdapter(BasePlatformAdapter):
    async def connect(self) -> bool:
        """Connect to Telegram and start polling for updates."""
        if not TELEGRAM_AVAILABLE:
-            print(f"[{self.name}] python-telegram-bot not installed. Run: pip install python-telegram-bot")
+            logger.error(
+                "[%s] python-telegram-bot not installed. Run: pip install python-telegram-bot",
+                self.name,
+            )
            return False
        
        if not self.config.token:
-            print(f"[{self.name}] No bot token configured")
+            logger.error("[%s] No bot token configured", self.name)
            return False
        
        try:
@@ -173,14 +176,19 @@ class TelegramAdapter(BasePlatformAdapter):
                    BotCommand("help", "Show available commands"),
                ])
            except Exception as e:
-                print(f"[{self.name}] Could not register command menu: {e}")
+                logger.warning(
+                    "[%s] Could not register Telegram command menu: %s",
+                    self.name,
+                    e,
+                    exc_info=True,
+                )
            
            self._running = True
-            print(f"[{self.name}] Connected and polling for updates")
+            logger.info("[%s] Connected and polling for Telegram updates", self.name)
            return True
            
        except Exception as e:
-            print(f"[{self.name}] Failed to connect: {e}")
+            logger.error("[%s] Failed to connect to Telegram: %s", self.name, e, exc_info=True)
            return False
    
    async def disconnect(self) -> None:
@@ -191,12 +199,12 @@ class TelegramAdapter(BasePlatformAdapter):
                await self._app.stop()
                await self._app.shutdown()
            except Exception as e:
-                print(f"[{self.name}] Error during disconnect: {e}")
+                logger.warning("[%s] Error during Telegram disconnect: %s", self.name, e, exc_info=True)
        
        self._running = False
        self._app = None
        self._bot = None
-        print(f"[{self.name}] Disconnected")
+        logger.info("[%s] Disconnected from Telegram", self.name)
    
    async def send(
        self,
@@ -252,6 +260,7 @@ class TelegramAdapter(BasePlatformAdapter):
            )
            
        except Exception as e:
+            logger.error("[%s] Failed to send Telegram message: %s", self.name, e, exc_info=True)
            return SendResult(success=False, error=str(e))

    async def edit_message(
@@ -281,6 +290,13 @@ class TelegramAdapter(BasePlatformAdapter):
                )
            return SendResult(success=True, message_id=message_id)
        except Exception as e:
+            logger.error(
+                "[%s] Failed to edit Telegram message %s: %s",
+                self.name,
+                message_id,
+                e,
+                exc_info=True,
+            )
            return SendResult(success=False, error=str(e))

    async def send_voice(
@@ -323,7 +339,12 @@ class TelegramAdapter(BasePlatformAdapter):
                    )
            return SendResult(success=True, message_id=str(msg.message_id))
        except Exception as e:
-            print(f"[{self.name}] Failed to send voice/audio: {e}")
+            logger.error(
+                "[%s] Failed to send Telegram voice/audio, falling back to base adapter: %s",
+                self.name,
+                e,
+                exc_info=True,
+            )
            return await super().send_voice(chat_id, audio_path, caption, reply_to)
    
    async def send_image_file(
@@ -332,6 +353,7 @@ class TelegramAdapter(BasePlatformAdapter):
        image_path: str,
        caption: Optional[str] = None,
        reply_to: Optional[str] = None,
+        **kwargs,
    ) -> SendResult:
        """Send a local image file natively as a Telegram photo."""
        if not self._bot:
@@ -351,9 +373,74 @@ class TelegramAdapter(BasePlatformAdapter):
                )
            return SendResult(success=True, message_id=str(msg.message_id))
        except Exception as e:
-            print(f"[{self.name}] Failed to send local image: {e}")
+            logger.error(
+                "[%s] Failed to send Telegram local image, falling back to base adapter: %s",
+                self.name,
+                e,
+                exc_info=True,
+            )
            return await super().send_image_file(chat_id, image_path, caption, reply_to)

+    async def send_document(
+        self,
+        chat_id: str,
+        file_path: str,
+        caption: Optional[str] = None,
+        file_name: Optional[str] = None,
+        reply_to: Optional[str] = None,
+        **kwargs,
+    ) -> SendResult:
+        """Send a document/file natively as a Telegram file attachment."""
+        if not self._bot:
+            return SendResult(success=False, error="Not connected")
+
+        try:
+            if not os.path.exists(file_path):
+                return SendResult(success=False, error=f"File not found: {file_path}")
+
+            display_name = file_name or os.path.basename(file_path)
+
+            with open(file_path, "rb") as f:
+                msg = await self._bot.send_document(
+                    chat_id=int(chat_id),
+                    document=f,
+                    filename=display_name,
+                    caption=caption[:1024] if caption else None,
+                    reply_to_message_id=int(reply_to) if reply_to else None,
+                )
+            return SendResult(success=True, message_id=str(msg.message_id))
+        except Exception as e:
+            logger.error("[%s] Failed to send Telegram document, falling back to base adapter: %s", self.name, e, exc_info=True)
+            return await super().send_document(chat_id, file_path, caption, file_name, reply_to)
+
+    async def send_video(
+        self,
+        chat_id: str,
+        video_path: str,
+        caption: Optional[str] = None,
+        reply_to: Optional[str] = None,
+        **kwargs,
+    ) -> SendResult:
+        """Send a video natively as a Telegram video message."""
+        if not self._bot:
+            return SendResult(success=False, error="Not connected")
+
+        try:
+            if not os.path.exists(video_path):
+                return SendResult(success=False, error=f"Video file not found: {video_path}")
+
+            with open(video_path, "rb") as f:
+                msg = await self._bot.send_video(
+                    chat_id=int(chat_id),
+                    video=f,
+                    caption=caption[:1024] if caption else None,
+                    reply_to_message_id=int(reply_to) if reply_to else None,
+                )
+            return SendResult(success=True, message_id=str(msg.message_id))
+        except Exception as e:
+            logger.error("[%s] Failed to send Telegram video, falling back to base adapter: %s", self.name, e, exc_info=True)
+            return await super().send_video(chat_id, video_path, caption, reply_to)
+
    async def send_image(
        self,
        chat_id: str,
@@ -382,7 +469,12 @@ class TelegramAdapter(BasePlatformAdapter):
            )
            return SendResult(success=True, message_id=str(msg.message_id))
        except Exception as e:
-            logger.warning("[%s] URL-based send_photo failed (%s), trying file upload", self.name, e)
+            logger.warning(
+                "[%s] URL-based send_photo failed, trying file upload: %s",
+                self.name,
+                e,
+                exc_info=True,
+            )
            # Fallback: download and upload as file (supports up to 10MB)
            try:
                import httpx
@@ -399,7 +491,12 @@ class TelegramAdapter(BasePlatformAdapter):
                )
                return SendResult(success=True, message_id=str(msg.message_id))
            except Exception as e2:
-                logger.error("[%s] File upload send_photo also failed: %s", self.name, e2)
+                logger.error(
+                    "[%s] File upload send_photo also failed: %s",
+                    self.name,
+                    e2,
+                    exc_info=True,
+                )
                # Final fallback: send URL as text
                return await super().send_image(chat_id, image_url, caption, reply_to)
    
@@ -426,7 +523,12 @@ class TelegramAdapter(BasePlatformAdapter):
            )
            return SendResult(success=True, message_id=str(msg.message_id))
        except Exception as e:
-            print(f"[{self.name}] Failed to send animation, falling back to photo: {e}")
+            logger.error(
+                "[%s] Failed to send Telegram animation, falling back to photo: %s",
+                self.name,
+                e,
+                exc_info=True,
+            )
            # Fallback: try as a regular photo
            return await self.send_image(chat_id, animation_url, caption, reply_to)

@@ -440,8 +542,14 @@ class TelegramAdapter(BasePlatformAdapter):
                    action="typing",
                    message_thread_id=int(_typing_thread) if _typing_thread else None,
                )
-            except Exception:
-                pass  # Ignore typing indicator failures
+            except Exception as e:
+                # Typing failures are non-fatal; log at debug level only.
+                logger.debug(
+                    "[%s] Failed to send Telegram typing indicator: %s",
+                    self.name,
+                    e,
+                    exc_info=True,
+                )
    
    async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
        """Get information about a Telegram chat."""
@@ -468,6 +576,13 @@ class TelegramAdapter(BasePlatformAdapter):
                "is_forum": getattr(chat, "is_forum", False),
            }
        except Exception as e:
+            logger.error(
+                "[%s] Failed to get Telegram chat info for %s: %s",
+                self.name,
+                chat_id,
+                e,
+                exc_info=True,
+            )
            return {"name": str(chat_id), "type": "dm", "error": str(e)}
    
    def format_message(self, content: str) -> str:
@@ -656,9 +771,9 @@ class TelegramAdapter(BasePlatformAdapter):
                cached_path = cache_image_from_bytes(bytes(image_bytes), ext=ext)
                event.media_urls = [cached_path]
                event.media_types = [f"image/{ext.lstrip('.')}"]
-                print(f"[Telegram] Cached user photo: {cached_path}", flush=True)
+                logger.info("[Telegram] Cached user photo at %s", cached_path)
            except Exception as e:
-                print(f"[Telegram] Failed to cache photo: {e}", flush=True)
+                logger.warning("[Telegram] Failed to cache photo: %s", e, exc_info=True)
        
        # Download voice/audio messages to cache for STT transcription
        if msg.voice:
@@ -668,9 +783,9 @@ class TelegramAdapter(BasePlatformAdapter):
                cached_path = cache_audio_from_bytes(bytes(audio_bytes), ext=".ogg")
                event.media_urls = [cached_path]
                event.media_types = ["audio/ogg"]
-                print(f"[Telegram] Cached user voice: {cached_path}", flush=True)
+                logger.info("[Telegram] Cached user voice at %s", cached_path)
            except Exception as e:
-                print(f"[Telegram] Failed to cache voice: {e}", flush=True)
+                logger.warning("[Telegram] Failed to cache voice: %s", e, exc_info=True)
        elif msg.audio:
            try:
                file_obj = await msg.audio.get_file()
@@ -678,9 +793,9 @@ class TelegramAdapter(BasePlatformAdapter):
                cached_path = cache_audio_from_bytes(bytes(audio_bytes), ext=".mp3")
                event.media_urls = [cached_path]
                event.media_types = ["audio/mp3"]
-                print(f"[Telegram] Cached user audio: {cached_path}", flush=True)
+                logger.info("[Telegram] Cached user audio at %s", cached_path)
            except Exception as e:
-                print(f"[Telegram] Failed to cache audio: {e}", flush=True)
+                logger.warning("[Telegram] Failed to cache audio: %s", e, exc_info=True)

        # Download document files to cache for agent processing
        elif msg.document:
@@ -705,7 +820,7 @@ class TelegramAdapter(BasePlatformAdapter):
                        f"Unsupported document type '{ext or 'unknown'}'. "
                        f"Supported types: {supported_list}"
                    )
-                    print(f"[Telegram] Unsupported document type: {ext or 'unknown'}", flush=True)
+                    logger.info("[Telegram] Unsupported document type: %s", ext or "unknown")
                    await self.handle_message(event)
                    return

@@ -716,7 +831,7 @@ class TelegramAdapter(BasePlatformAdapter):
                        "The document is too large or its size could not be verified. "
                        "Maximum: 20 MB."
                    )
-                    print(f"[Telegram] Document too large: {doc.file_size} bytes", flush=True)
+                    logger.info("[Telegram] Document too large: %s bytes", doc.file_size)
                    await self.handle_message(event)
                    return

@@ -728,7 +843,7 @@ class TelegramAdapter(BasePlatformAdapter):
                mime_type = SUPPORTED_DOCUMENT_TYPES[ext]
                event.media_urls = [cached_path]
                event.media_types = [mime_type]
-                print(f"[Telegram] Cached user document: {cached_path}", flush=True)
+                logger.info("[Telegram] Cached user document at %s", cached_path)

                # For text files, inject content into event.text (capped at 100 KB)
                MAX_TEXT_INJECT_BYTES = 100 * 1024
@@ -743,10 +858,13 @@ class TelegramAdapter(BasePlatformAdapter):
                        else:
                            event.text = injection
                    except UnicodeDecodeError:
-                        print(f"[Telegram] Could not decode text file as UTF-8, skipping content injection", flush=True)
+                        logger.warning(
+                            "[Telegram] Could not decode text file as UTF-8, skipping content injection",
+                            exc_info=True,
+                        )

            except Exception as e:
-                print(f"[Telegram] Failed to cache document: {e}", flush=True)
+                logger.warning("[Telegram] Failed to cache document: %s", e, exc_info=True)

        await self.handle_message(event)
    
@@ -781,7 +899,7 @@ class TelegramAdapter(BasePlatformAdapter):
            event.text = build_sticker_injection(
                cached["description"], cached.get("emoji", emoji), cached.get("set_name", set_name)
            )
-            print(f"[Telegram] Sticker cache hit: {sticker.file_unique_id}", flush=True)
+            logger.info("[Telegram] Sticker cache hit: %s", sticker.file_unique_id)
            return

        # Cache miss -- download and analyze
@@ -789,7 +907,7 @@ class TelegramAdapter(BasePlatformAdapter):
            file_obj = await sticker.get_file()
            image_bytes = await file_obj.download_as_bytearray()
            cached_path = cache_image_from_bytes(bytes(image_bytes), ext=".webp")
-            print(f"[Telegram] Analyzing sticker: {cached_path}", flush=True)
+            logger.info("[Telegram] Analyzing sticker at %s", cached_path)

            from tools.vision_tools import vision_analyze_tool
            import json as _json
@@ -811,7 +929,7 @@ class TelegramAdapter(BasePlatformAdapter):
                    emoji, set_name,
                )
        except Exception as e:
-            print(f"[Telegram] Sticker analysis error: {e}", flush=True)
+            logger.warning("[Telegram] Sticker analysis error: %s", e, exc_info=True)
            event.text = build_sticker_injection(
                f"a sticker with emoji {emoji}" if emoji else "a sticker",
                emoji, set_name,
@@ -181,8 +181,8 @@ class WhatsAppAdapter(BasePlatformAdapter):
            
            # Kill any orphaned bridge from a previous gateway run
            _kill_port_process(self._bridge_port)
-            import time
-            time.sleep(1)
+            import asyncio
+            await asyncio.sleep(1)
            
            # Start the bridge process in its own process group.
            # Route output to a log file so QR codes, errors, and reconnection
@@ -187,6 +187,30 @@ def _resolve_runtime_agent_kwargs() -> dict:
    }


+def _resolve_gateway_model() -> str:
+    """Read model from env/config — mirrors the resolution in _run_agent_sync.
+
+    Without this, temporary AIAgent instances (memory flush, /compress) fall
+    back to the hardcoded default ("anthropic/claude-opus-4.6") which fails
+    when the active provider is openai-codex.
+    """
+    model = os.getenv("HERMES_MODEL") or os.getenv("LLM_MODEL") or "anthropic/claude-opus-4.6"
+    try:
+        import yaml as _y
+        _cfg_path = _hermes_home / "config.yaml"
+        if _cfg_path.exists():
+            with open(_cfg_path, encoding="utf-8") as _f:
+                _cfg = _y.safe_load(_f) or {}
+            _model_cfg = _cfg.get("model", {})
+            if isinstance(_model_cfg, str):
+                model = _model_cfg
+            elif isinstance(_model_cfg, dict):
+                model = _model_cfg.get("default", model)
+    except Exception:
+        pass
+    return model
+
+
 class GatewayRunner:
    """
    Main gateway controller.
@@ -204,6 +228,7 @@ class GatewayRunner:
        self._prefill_messages = self._load_prefill_messages()
        self._ephemeral_system_prompt = self._load_ephemeral_system_prompt()
        self._reasoning_config = self._load_reasoning_config()
+        self._show_reasoning = self._load_show_reasoning()
        self._provider_routing = self._load_provider_routing()
        self._fallback_model = self._load_fallback_model()

@@ -258,8 +283,14 @@ class GatewayRunner:
            if not runtime_kwargs.get("api_key"):
                return

+            # Resolve model from config — AIAgent's default is OpenRouter-
+            # formatted ("anthropic/claude-opus-4.6") which fails when the
+            # active provider is openai-codex.
+            model = _resolve_gateway_model()
+
            tmp_agent = AIAgent(
                **runtime_kwargs,
+                model=model,
                max_iterations=8,
                quiet_mode=True,
                enabled_toolsets=["memory", "skills"],
@@ -391,6 +422,20 @@ class GatewayRunner:
        logger.warning("Unknown reasoning_effort '%s', using default (medium)", effort)
        return None

+    @staticmethod
+    def _load_show_reasoning() -> bool:
+        """Load show_reasoning toggle from config.yaml display section."""
+        try:
+            import yaml as _y
+            cfg_path = _hermes_home / "config.yaml"
+            if cfg_path.exists():
+                with open(cfg_path, encoding="utf-8") as _f:
+                    cfg = _y.safe_load(_f) or {}
+                return bool(cfg.get("display", {}).get("show_reasoning", False))
+        except Exception:
+            pass
+        return False
+
    @staticmethod
    def _load_background_notifications_mode() -> str:
        """Load background process notification mode from config or env var.
@@ -672,6 +717,13 @@ class GatewayRunner:
                return None
            return HomeAssistantAdapter(config)

+        elif platform == Platform.EMAIL:
+            from gateway.platforms.email import EmailAdapter, check_email_requirements
+            if not check_email_requirements():
+                logger.warning("Email: EMAIL_ADDRESS, EMAIL_PASSWORD, EMAIL_IMAP_HOST, or EMAIL_SMTP_HOST not set")
+                return None
+            return EmailAdapter(config)
+
        return None
    
    def _is_user_authorized(self, source: SessionSource) -> bool:
@@ -701,6 +753,7 @@ class GatewayRunner:
            Platform.WHATSAPP: "WHATSAPP_ALLOWED_USERS",
            Platform.SLACK: "SLACK_ALLOWED_USERS",
            Platform.SIGNAL: "SIGNAL_ALLOWED_USERS",
+            Platform.EMAIL: "EMAIL_ALLOWED_USERS",
        }
        platform_allow_all_map = {
            Platform.TELEGRAM: "TELEGRAM_ALLOW_ALL_USERS",
@@ -708,6 +761,7 @@ class GatewayRunner:
            Platform.WHATSAPP: "WHATSAPP_ALLOW_ALL_USERS",
            Platform.SLACK: "SLACK_ALLOW_ALL_USERS",
            Platform.SIGNAL: "SIGNAL_ALLOW_ALL_USERS",
+            Platform.EMAIL: "EMAIL_ALLOW_ALL_USERS",
        }

        # Per-platform allow-all flag (e.g., DISCORD_ALLOW_ALL_USERS=true)
@@ -806,7 +860,8 @@ class GatewayRunner:
        _known_commands = {"new", "reset", "help", "status", "stop", "model",
                          "personality", "retry", "undo", "sethome", "set-home",
                          "compress", "usage", "insights", "reload-mcp", "reload_mcp",
-                          "update", "title", "resume", "provider", "rollback"}
+                          "update", "title", "resume", "provider", "rollback",
+                          "background", "reasoning"}
        if command and command in _known_commands:
            await self.hooks.emit(f"command:{command}", {
                "platform": source.platform.value if source.platform else "",
@@ -868,7 +923,39 @@ class GatewayRunner:

        if command == "rollback":
            return await self._handle_rollback_command(event)
+
+        if command == "background":
+            return await self._handle_background_command(event)
+
+        if command == "reasoning":
+            return await self._handle_reasoning_command(event)
        
+        # User-defined quick commands (bypass agent loop, no LLM call)
+        if command:
+            quick_commands = self.config.get("quick_commands", {})
+            if command in quick_commands:
+                qcmd = quick_commands[command]
+                if qcmd.get("type") == "exec":
+                    exec_cmd = qcmd.get("command", "")
+                    if exec_cmd:
+                        try:
+                            proc = await asyncio.create_subprocess_shell(
+                                exec_cmd,
+                                stdout=asyncio.subprocess.PIPE,
+                                stderr=asyncio.subprocess.PIPE,
+                            )
+                            stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=30)
+                            output = (stdout or stderr).decode().strip()
+                            return output if output else "Command returned no output."
+                        except asyncio.TimeoutError:
+                            return "Quick command timed out (30s)."
+                        except Exception as e:
+                            return f"Quick command error: {e}"
+                    else:
+                        return f"Quick command '/{command}' has no command defined."
+                else:
+                    return f"Quick command '/{command}' has unsupported type (only 'exec' is supported)."
+
        # Skill slash commands: /skill-name loads the skill and sends to agent
        if command:
            try:
@@ -901,6 +988,10 @@ class GatewayRunner:
            elif user_text in ("no", "n", "deny", "cancel", "nope"):
                self._pending_approvals.pop(session_key_preview)
                return "❌ Command denied."
+            elif user_text in ("full", "show", "view", "show full", "view full"):
+                # Show full command without consuming the approval
+                cmd = self._pending_approvals[session_key_preview]["command"]
+                return f"Full command:\n\n```\n{cmd}\n```\n\nReply yes/no to approve or deny."
            # If it's not clearly an approval/denial, fall through to normal processing
        
        # Get or create session
@@ -950,9 +1041,12 @@ class GatewayRunner:
        # repeated truncation/context failures.  Detect this early and
        # compress proactively — before the agent even starts.  (#628)
        #
-        # Thresholds are derived from the SAME compression config the
-        # agent uses (compression.threshold × model context length) so
-        # CLI and messaging platforms behave identically.
+        # Token source priority:
+        # 1. Actual API-reported prompt_tokens from the last turn
+        #    (stored in session_entry.last_prompt_tokens)
+        # 2. Rough char-based estimate (str(msg)//4); the thresholds are
+        #    raised 1.4x in this case to offset its overestimation on
+        #    tool-heavy conversations (code/JSON tokenizes at 5-7+ chars/token).
        # -----------------------------------------------------------------
        if history and len(history) >= 4:
            from agent.model_metadata import (
@@ -1003,31 +1097,48 @@ class GatewayRunner:
                _compress_token_threshold = int(
                    _hyg_context_length * _hyg_threshold_pct
                )
-                # Warn if still huge after compression (95% of context)
                _warn_token_threshold = int(_hyg_context_length * 0.95)

                _msg_count = len(history)
-                _approx_tokens = estimate_messages_tokens_rough(history)
+
+                # Prefer actual API-reported tokens from the last turn
+                # (stored in session entry) over the rough char-based estimate.
+                # The rough estimate (str(msg)//4) overestimates by 30-50% on
+                # tool-heavy/code-heavy conversations, causing premature compression.
+                _stored_tokens = session_entry.last_prompt_tokens
+                if _stored_tokens > 0:
+                    _approx_tokens = _stored_tokens
+                    _token_source = "actual"
+                else:
+                    _approx_tokens = estimate_messages_tokens_rough(history)
+                    # Apply safety factor only for rough estimates
+                    _compress_token_threshold = int(
+                        _compress_token_threshold * 1.4
+                    )
+                    _warn_token_threshold = int(_warn_token_threshold * 1.4)
+                    _token_source = "estimated"

                _needs_compress = _approx_tokens >= _compress_token_threshold

                if _needs_compress:
                    logger.info(
-                        "Session hygiene: %s messages, ~%s tokens — auto-compressing "
+                        "Session hygiene: %s messages, ~%s tokens (%s) — auto-compressing "
                        "(threshold: %s%% of %s = %s tokens)",
-                        _msg_count, f"{_approx_tokens:,}",
+                        _msg_count, f"{_approx_tokens:,}", _token_source,
                        int(_hyg_threshold_pct * 100),
                        f"{_hyg_context_length:,}",
                        f"{_compress_token_threshold:,}",
                    )

                    _hyg_adapter = self.adapters.get(source.platform)
+                    _hyg_meta = {"thread_id": source.thread_id} if source.thread_id else None
                    if _hyg_adapter:
                        try:
                            await _hyg_adapter.send(
                                source.chat_id,
                                f"🗜️ Session is large ({_msg_count} messages, "
-                                f"~{_approx_tokens:,} tokens). Auto-compressing..."
+                                f"~{_approx_tokens:,} tokens). Auto-compressing...",
+                                metadata=_hyg_meta,
                            )
                        except Exception:
                            pass
@@ -1047,6 +1158,7 @@ class GatewayRunner:
                            if len(_hyg_msgs) >= 4:
                                _hyg_agent = AIAgent(
                                    **_hyg_runtime,
+                                    model=_hyg_model,
                                    max_iterations=4,
                                    quiet_mode=True,
                                    enabled_toolsets=["memory"],
@@ -1065,6 +1177,8 @@ class GatewayRunner:
                                self.session_store.rewrite_transcript(
                                    session_entry.session_id, _compressed
                                )
+                                # Reset stored token count — transcript was rewritten
+                                session_entry.last_prompt_tokens = 0
                                history = _compressed
                                _new_count = len(_compressed)
                                _new_tokens = estimate_messages_tokens_rough(
@@ -1085,7 +1199,8 @@ class GatewayRunner:
                                            f"🗜️ Compressed: {_msg_count} → "
                                            f"{_new_count} messages, "
                                            f"~{_approx_tokens:,} → "
-                                            f"~{_new_tokens:,} tokens"
+                                            f"~{_new_tokens:,} tokens",
+                                            metadata=_hyg_meta,
                                        )
                                    except Exception:
                                        pass
@@ -1105,7 +1220,8 @@ class GatewayRunner:
                                                "after compression "
                                                f"(~{_new_tokens:,} tokens). "
                                                "Consider using /reset to start "
-                                                "fresh if you experience issues."
+                                                "fresh if you experience issues.",
+                                                metadata=_hyg_meta,
                                            )
                                        except Exception:
                                            pass
@@ -1117,6 +1233,7 @@ class GatewayRunner:
                        # Compression failed and session is dangerously large
                        if _approx_tokens >= _warn_token_threshold:
                            _hyg_adapter = self.adapters.get(source.platform)
+                            _hyg_meta = {"thread_id": source.thread_id} if source.thread_id else None
                            if _hyg_adapter:
                                try:
                                    await _hyg_adapter.send(
@@ -1126,7 +1243,8 @@ class GatewayRunner:
                                        f"~{_approx_tokens:,} tokens) and "
                                        "auto-compression failed. Consider "
                                        "using /compress or /reset to avoid "
-                                        "issues."
+                                        "issues.",
+                                        metadata=_hyg_meta,
                                    )
                                except Exception:
                                    pass
@@ -1256,7 +1374,20 @@ class GatewayRunner:
            
            response = agent_result.get("final_response", "")
            agent_messages = agent_result.get("messages", [])
-            
+
+            # Prepend reasoning/thinking if display is enabled
+            if getattr(self, "_show_reasoning", False) and response:
+                last_reasoning = agent_result.get("last_reasoning")
+                if last_reasoning:
+                    # Collapse long reasoning to keep messages readable
+                    lines = last_reasoning.strip().splitlines()
+                    if len(lines) > 15:
+                        display_reasoning = "\n".join(lines[:15])
+                        display_reasoning += f"\n_... ({len(lines) - 15} more lines)_"
+                    else:
+                        display_reasoning = last_reasoning.strip()
+                    response = f"💭 **Reasoning:**\n```\n{display_reasoning}\n```\n\n{response}"
+
            # Emit agent:end hook
            await self.hooks.emit("agent:end", {
                **hook_ctx,
@@ -1338,8 +1469,11 @@ class GatewayRunner:
                        skip_db=agent_persisted,
                    )
            
-            # Update session
-            self.session_store.update_session(session_entry.session_key)
+            # Update session with actual prompt token count from the agent
+            self.session_store.update_session(
+                session_entry.session_key,
+                last_prompt_tokens=agent_result.get("last_prompt_tokens", 0),
+            )
            
            return response
            
@@ -1444,7 +1578,9 @@ class GatewayRunner:
            "`/resume [name]` — Resume a previously-named session",
            "`/usage` — Show token usage for this session",
            "`/insights [days]` — Show usage insights and analytics",
+            "`/reasoning [level|show|hide]` — Set reasoning effort or toggle display",
            "`/rollback [number]` — List or restore filesystem checkpoints",
+            "`/background <prompt>` — Run a prompt in a separate background session",
            "`/reload-mcp` — Reload MCP servers from config",
            "`/update` — Update Hermes Agent to the latest version",
            "`/help` — Show this message",
@@ -1475,7 +1611,7 @@ class GatewayRunner:
        config_path = _hermes_home / 'config.yaml'

        # Resolve current model and provider from config
-        current = os.getenv("HERMES_MODEL") or os.getenv("LLM_MODEL") or "anthropic/claude-opus-4.6"
+        current = os.getenv("HERMES_MODEL") or "anthropic/claude-opus-4.6"
        current_provider = "openrouter"
        try:
            if config_path.exists():
@@ -1678,14 +1814,39 @@ class GatewayRunner:

        if not args:
            lines = ["🎭 **Available Personalities**\n"]
+            lines.append("• `none` — (no personality overlay)")
            for name, prompt in personalities.items():
-                preview = prompt[:50] + "..." if len(prompt) > 50 else prompt
+                if isinstance(prompt, dict):
+                    preview = prompt.get("description") or prompt.get("system_prompt", "")[:50]
+                else:
+                    preview = prompt[:50] + "..." if len(prompt) > 50 else prompt
                lines.append(f"• `{name}` — {preview}")
            lines.append(f"\nUsage: `/personality <name>`")
            return "\n".join(lines)

-        if args in personalities:
-            new_prompt = personalities[args]
+        def _resolve_prompt(value):
+            if isinstance(value, dict):
+                parts = [value.get("system_prompt", "")]
+                if value.get("tone"):
+                    parts.append(f'Tone: {value["tone"]}')
+                if value.get("style"):
+                    parts.append(f'Style: {value["style"]}')
+                return "\n".join(p for p in parts if p)
+            return str(value)
+
+        if args in ("none", "default", "neutral"):
+            try:
+                if "agent" not in config or not isinstance(config.get("agent"), dict):
+                    config["agent"] = {}
+                config["agent"]["system_prompt"] = ""
+                with open(config_path, "w", encoding="utf-8") as f:
+                    yaml.dump(config, f, default_flow_style=False, sort_keys=False)
+            except Exception as e:
+                return f"⚠️ Failed to save personality change: {e}"
+            self._ephemeral_system_prompt = ""
+            return "🎭 Personality cleared — using base agent behavior.\n_(takes effect on next message)_"
+        elif args in personalities:
+            new_prompt = _resolve_prompt(personalities[args])

            # Write to config.yaml, same pattern as CLI save_config_value.
            try:
@@ -1702,7 +1863,7 @@ class GatewayRunner:

            return f"🎭 Personality set to **{args}**\n_(takes effect on next message)_"

-        available = ", ".join(f"`{n}`" for n in personalities.keys())
+        available = "`none`, " + ", ".join(f"`{n}`" for n in personalities.keys())
        return f"Unknown personality: `{args}`\n\nAvailable: {available}"
    
    async def _handle_retry_command(self, event: MessageEvent) -> str:
@@ -1726,6 +1887,8 @@ class GatewayRunner:
        # Truncate history to before the last user message and persist
        truncated = history[:last_user_idx]
        self.session_store.rewrite_transcript(session_entry.session_id, truncated)
+        # Reset stored token count — transcript was truncated
+        session_entry.last_prompt_tokens = 0
        
        # Re-send by creating a fake text event with the old message
        retry_event = MessageEvent(
@@ -1757,6 +1920,8 @@ class GatewayRunner:
        removed_msg = history[last_user_idx].get("content", "")
        removed_count = len(history) - last_user_idx
        self.session_store.rewrite_transcript(session_entry.session_id, history[:last_user_idx])
+        # Reset stored token count — transcript was truncated
+        session_entry.last_prompt_tokens = 0
        
        preview = removed_msg[:40] + "..." if len(removed_msg) > 40 else removed_msg
        return f"↩️ Undid {removed_count} message(s).\nRemoved: \"{preview}\""
@@ -1850,6 +2015,279 @@ class GatewayRunner:
            )
        return f"❌ {result['error']}"

+    async def _handle_background_command(self, event: MessageEvent) -> str:
+        """Handle /background <prompt> — run a prompt in a separate background session.
+
+        Spawns a new AIAgent in a background thread with its own session.
+        When it completes, sends the result back to the same chat without
+        modifying the active session's conversation history.
+        """
+        prompt = event.get_command_args().strip()
+        if not prompt:
+            return (
+                "Usage: /background <prompt>\n"
+                "Example: /background Summarize the top HN stories today\n\n"
+                "Runs the prompt in a separate session. "
+                "You can keep chatting — the result will appear here when done."
+            )
+
+        source = event.source
+        task_id = f"bg_{datetime.now().strftime('%H%M%S')}_{os.urandom(3).hex()}"
+
+        # Fire-and-forget the background task
+        asyncio.create_task(
+            self._run_background_task(prompt, source, task_id)
+        )
+
+        preview = prompt[:60] + ("..." if len(prompt) > 60 else "")
+        return f'🔄 Background task started: "{preview}"\nTask ID: {task_id}\nYou can keep chatting — results will appear when done.'
+
+    async def _run_background_task(
+        self, prompt: str, source: "SessionSource", task_id: str
+    ) -> None:
+        """Execute a background agent task and deliver the result to the chat."""
+        from run_agent import AIAgent
+
+        adapter = self.adapters.get(source.platform)
+        if not adapter:
+            logger.warning("No adapter for platform %s in background task %s", source.platform, task_id)
+            return
+
+        _thread_metadata = {"thread_id": source.thread_id} if source.thread_id else None
+
+        try:
+            runtime_kwargs = _resolve_runtime_agent_kwargs()
+            if not runtime_kwargs.get("api_key"):
+                await adapter.send(
+                    source.chat_id,
+                    f"❌ Background task {task_id} failed: no provider credentials configured.",
+                    metadata=_thread_metadata,
+                )
+                return
+
+            # Read model from config via shared helper
+            model = _resolve_gateway_model()
+
+            # Determine toolset (same logic as _run_agent)
+            default_toolset_map = {
+                Platform.LOCAL: "hermes-cli",
+                Platform.TELEGRAM: "hermes-telegram",
+                Platform.DISCORD: "hermes-discord",
+                Platform.WHATSAPP: "hermes-whatsapp",
+                Platform.SLACK: "hermes-slack",
+                Platform.SIGNAL: "hermes-signal",
+                Platform.HOMEASSISTANT: "hermes-homeassistant",
+                Platform.EMAIL: "hermes-email",
+            }
+            platform_toolsets_config = {}
+            try:
+                config_path = _hermes_home / 'config.yaml'
+                if config_path.exists():
+                    import yaml
+                    with open(config_path, 'r', encoding="utf-8") as f:
+                        user_config = yaml.safe_load(f) or {}
+                    platform_toolsets_config = user_config.get("platform_toolsets", {})
+            except Exception:
+                pass
+
+            platform_config_key = {
+                Platform.LOCAL: "cli",
+                Platform.TELEGRAM: "telegram",
+                Platform.DISCORD: "discord",
+                Platform.WHATSAPP: "whatsapp",
+                Platform.SLACK: "slack",
+                Platform.SIGNAL: "signal",
+                Platform.HOMEASSISTANT: "homeassistant",
+                Platform.EMAIL: "email",
+            }.get(source.platform, "telegram")
+
+            config_toolsets = platform_toolsets_config.get(platform_config_key)
+            if config_toolsets and isinstance(config_toolsets, list):
+                enabled_toolsets = config_toolsets
+            else:
+                default_toolset = default_toolset_map.get(source.platform, "hermes-telegram")
+                enabled_toolsets = [default_toolset]
+
+            platform_key = "cli" if source.platform == Platform.LOCAL else source.platform.value
+
+            pr = self._provider_routing
+            max_iterations = int(os.getenv("HERMES_MAX_ITERATIONS", "90"))
+
+            def run_sync():
+                agent = AIAgent(
+                    model=model,
+                    **runtime_kwargs,
+                    max_iterations=max_iterations,
+                    quiet_mode=True,
+                    verbose_logging=False,
+                    enabled_toolsets=enabled_toolsets,
+                    reasoning_config=self._reasoning_config,
+                    providers_allowed=pr.get("only"),
+                    providers_ignored=pr.get("ignore"),
+                    providers_order=pr.get("order"),
+                    provider_sort=pr.get("sort"),
+                    provider_require_parameters=pr.get("require_parameters", False),
+                    provider_data_collection=pr.get("data_collection"),
+                    session_id=task_id,
+                    platform=platform_key,
+                    session_db=self._session_db,
+                    fallback_model=self._fallback_model,
+                )
+
+                return agent.run_conversation(
+                    user_message=prompt,
+                    task_id=task_id,
+                )
+
+            loop = asyncio.get_running_loop()
+            result = await loop.run_in_executor(None, run_sync)
+
+            response = result.get("final_response", "") if result else ""
+            if not response and result and result.get("error"):
+                response = f"Error: {result['error']}"
+
+            # Extract media files from the response
+            if response:
+                media_files, response = adapter.extract_media(response)
+                images, text_content = adapter.extract_images(response)
+
+                preview = prompt[:60] + ("..." if len(prompt) > 60 else "")
+                header = f'✅ Background task complete\nPrompt: "{preview}"\n\n'
+
+                if text_content:
+                    await adapter.send(
+                        chat_id=source.chat_id,
+                        content=header + text_content,
+                        metadata=_thread_metadata,
+                    )
+                elif not images and not media_files:
+                    await adapter.send(
+                        chat_id=source.chat_id,
+                        content=header + "(No response generated)",
+                        metadata=_thread_metadata,
+                    )
+
+                # Send extracted images
+                for image_url, alt_text in (images or []):
+                    try:
+                        await adapter.send_image(
+                            chat_id=source.chat_id,
+                            image_url=image_url,
+                            caption=alt_text,
+                        )
+                    except Exception:
+                        pass
+
+                # Send media files
+                for media_path in (media_files or []):
+                    try:
+                        await adapter.send_file(
+                            chat_id=source.chat_id,
+                            file_path=media_path,
+                        )
+                    except Exception:
+                        pass
+            else:
+                preview = prompt[:60] + ("..." if len(prompt) > 60 else "")
+                await adapter.send(
+                    chat_id=source.chat_id,
+                    content=f'✅ Background task complete\nPrompt: "{preview}"\n\n(No response generated)',
+                    metadata=_thread_metadata,
+                )
+
+        except Exception as e:
+            logger.exception("Background task %s failed", task_id)
+            try:
+                await adapter.send(
+                    chat_id=source.chat_id,
+                    content=f"❌ Background task {task_id} failed: {e}",
+                    metadata=_thread_metadata,
+                )
+            except Exception:
+                pass
+
+    async def _handle_reasoning_command(self, event: MessageEvent) -> str:
+        """Handle /reasoning command — manage reasoning effort and display toggle.
+
+        Usage:
+            /reasoning              Show current effort level and display state
+            /reasoning <level>      Set reasoning effort (none, minimal, low, medium, high, xhigh)
+            /reasoning show|on      Show model reasoning in responses
+            /reasoning hide|off     Hide model reasoning from responses
+        """
+        import yaml
+
+        args = event.get_command_args().strip().lower()
+        config_path = _hermes_home / "config.yaml"
+
+        def _save_config_key(key_path: str, value):
+            """Save a dot-separated key to config.yaml."""
+            try:
+                user_config = {}
+                if config_path.exists():
+                    with open(config_path, encoding="utf-8") as f:
+                        user_config = yaml.safe_load(f) or {}
+                keys = key_path.split(".")
+                current = user_config
+                for k in keys[:-1]:
+                    if k not in current or not isinstance(current[k], dict):
+                        current[k] = {}
+                    current = current[k]
+                current[keys[-1]] = value
+                with open(config_path, "w", encoding="utf-8") as f:
+                    yaml.dump(user_config, f, default_flow_style=False, sort_keys=False)
+                return True
+            except Exception as e:
+                logger.error("Failed to save config key %s: %s", key_path, e)
+                return False
+
+        if not args:
+            # Show current state
+            rc = self._reasoning_config
+            if rc is None:
+                level = "medium (default)"
+            elif rc.get("enabled") is False:
+                level = "none (disabled)"
+            else:
+                level = rc.get("effort", "medium")
+            display_state = "on ✓" if self._show_reasoning else "off"
+            return (
+                "🧠 **Reasoning Settings**\n\n"
+                f"**Effort:** `{level}`\n"
+                f"**Display:** {display_state}\n\n"
+                "_Usage:_ `/reasoning <none|minimal|low|medium|high|xhigh|show|hide>`"
+            )
+
+        # Display toggle
+        if args in ("show", "on"):
+            self._show_reasoning = True
+            _save_config_key("display.show_reasoning", True)
+            return "🧠 ✓ Reasoning display: **ON**\nModel thinking will be shown before each response."
+
+        if args in ("hide", "off"):
+            self._show_reasoning = False
+            _save_config_key("display.show_reasoning", False)
+            return "🧠 ✓ Reasoning display: **OFF**"
+
+        # Effort level change
+        effort = args  # already stripped and lower-cased above
+        if effort == "none":
+            parsed = {"enabled": False}
+        elif effort in ("xhigh", "high", "medium", "low", "minimal"):
+            parsed = {"enabled": True, "effort": effort}
+        else:
+            return (
+                f"⚠️ Unknown argument: `{effort}`\n\n"
+                "**Valid levels:** none, minimal, low, medium, high, xhigh\n"
+                "**Display:** show, hide"
+            )
+
+        self._reasoning_config = parsed
+        if _save_config_key("agent.reasoning_effort", effort):
+            return f"🧠 ✓ Reasoning effort set to `{effort}` (saved to config)\n_(takes effect on next message)_"
+        else:
+            return f"🧠 ✓ Reasoning effort set to `{effort}` (this session only)"
+
    async def _handle_compress_command(self, event: MessageEvent) -> str:
        """Handle /compress command -- manually compress conversation context."""
        source = event.source
@@ -1867,6 +2305,9 @@ class GatewayRunner:
            if not runtime_kwargs.get("api_key"):
                return "No provider configured -- cannot compress."

+            # Resolve model from config (same reason as memory flush above).
+            model = _resolve_gateway_model()
+
            msgs = [
                {"role": m.get("role"), "content": m.get("content")}
                for m in history
@@ -1877,6 +2318,7 @@ class GatewayRunner:

            tmp_agent = AIAgent(
                **runtime_kwargs,
+                model=model,
                max_iterations=4,
                quiet_mode=True,
                enabled_toolsets=["memory"],
@@ -1890,6 +2332,10 @@ class GatewayRunner:
            )

            self.session_store.rewrite_transcript(session_entry.session_id, compressed)
+            # Reset stored token count — transcript changed, old value is stale
+            self.session_store.update_session(
+                session_entry.session_key, last_prompt_tokens=0,
+            )
            new_count = len(compressed)
            new_tokens = estimate_messages_tokens_rough(compressed)

@@ -2531,6 +2977,7 @@ class GatewayRunner:
            Platform.SLACK: "hermes-slack",
            Platform.SIGNAL: "hermes-signal",
            Platform.HOMEASSISTANT: "hermes-homeassistant",
+            Platform.EMAIL: "hermes-email",
        }
        
        # Try to load platform_toolsets from config
@@ -2554,6 +3001,7 @@ class GatewayRunner:
            Platform.SLACK: "slack",
            Platform.SIGNAL: "signal",
            Platform.HOMEASSISTANT: "homeassistant",
+            Platform.EMAIL: "email",
        }.get(source.platform, "telegram")
        
        # Use config override if present (list of toolsets), otherwise hardcoded default
@@ -2707,7 +3155,7 @@ class GatewayRunner:

                    # Restore typing indicator
                    await asyncio.sleep(0.3)
-                    await adapter.send_typing(source.chat_id)
+                    await adapter.send_typing(source.chat_id, metadata=_progress_metadata)

                except queue.Empty:
                    await asyncio.sleep(0.3)
@@ -2785,21 +3233,7 @@ class GatewayRunner:
            except Exception:
                pass

-            model = os.getenv("HERMES_MODEL") or os.getenv("LLM_MODEL") or "anthropic/claude-opus-4.6"
-
-            try:
-                import yaml as _y
-                _cfg_path = _hermes_home / "config.yaml"
-                if _cfg_path.exists():
-                    with open(_cfg_path, encoding="utf-8") as _f:
-                        _cfg = _y.safe_load(_f) or {}
-                    _model_cfg = _cfg.get("model", {})
-                    if isinstance(_model_cfg, str):
-                        model = _model_cfg
-                    elif isinstance(_model_cfg, dict):
-                        model = _model_cfg.get("default", model)
-            except Exception:
-                pass
+            model = _resolve_gateway_model()

            try:
                runtime_kwargs = _resolve_runtime_agent_kwargs()
@@ -2902,6 +3336,13 @@ class GatewayRunner:
            
            # Return final response, or a message if something went wrong
            final_response = result.get("final_response")
+
+            # Extract last actual prompt token count from the agent's compressor
+            _last_prompt_toks = 0
+            _agent = agent_holder[0]
+            if _agent and hasattr(_agent, "context_compressor"):
+                _last_prompt_toks = getattr(_agent.context_compressor, "last_prompt_tokens", 0)
+
            if not final_response:
                error_msg = f"⚠️ {result['error']}" if result.get("error") else "(No response generated)"
                return {
@@ -2910,6 +3351,7 @@ class GatewayRunner:
                    "api_calls": result.get("api_calls", 0),
                    "tools": tools_holder[0] or [],
                    "history_offset": len(agent_history),
+                    "last_prompt_tokens": _last_prompt_toks,
                }
            
            # Scan tool results for MEDIA:<path> tags that need to be delivered
@@ -2949,10 +3391,12 @@ class GatewayRunner:
            
            return {
                "final_response": final_response,
+                "last_reasoning": result.get("last_reasoning"),
                "messages": result_holder[0].get("messages", []) if result_holder[0] else [],
                "api_calls": result_holder[0].get("api_calls", 0) if result_holder[0] else 0,
                "tools": tools_holder[0] or [],
                "history_offset": len(agent_history),
+                "last_prompt_tokens": _last_prompt_toks,
            }
        
        # Start progress message sender if enabled
@@ -2974,17 +3418,19 @@ class GatewayRunner:
        # Monitor for interrupts from the adapter (new messages arriving)
        async def monitor_for_interrupt():
            adapter = self.adapters.get(source.platform)
-            if not adapter:
+            if not adapter or not session_key:
                return
            
-            chat_id = source.chat_id
            while True:
                await asyncio.sleep(0.2)  # Check every 200ms
-                # Check if adapter has a pending interrupt for this session
-                if hasattr(adapter, 'has_pending_interrupt') and adapter.has_pending_interrupt(chat_id):
+                # Check if adapter has a pending interrupt for this session.
+                # Must use session_key (build_session_key output) — NOT
+                # source.chat_id — because the adapter stores interrupt events
+                # under the full session key.
+                if hasattr(adapter, 'has_pending_interrupt') and adapter.has_pending_interrupt(session_key):
                    agent = agent_holder[0]
                    if agent:
-                        pending_event = adapter.get_pending_message(chat_id)
+                        pending_event = adapter.get_pending_message(session_key)
                        pending_text = pending_event.text if pending_event else None
                        logger.debug("Interrupt detected from adapter, signaling agent...")
                        agent.interrupt(pending_text)
@@ -3001,10 +3447,11 @@ class GatewayRunner:
            result = result_holder[0]
            adapter = self.adapters.get(source.platform)
            
-            # Get pending message from adapter if interrupted
+            # Get pending message from adapter if interrupted.
+            # Use session_key (not source.chat_id) to match adapter's storage keys.
            pending = None
            if result and result.get("interrupted") and adapter:
-                pending_event = adapter.get_pending_message(source.chat_id)
+                pending_event = adapter.get_pending_message(session_key) if session_key else None
                if pending_event:
                    pending = pending_event.text
                elif result.get("interrupt_message"):
@@ -3016,8 +3463,8 @@ class GatewayRunner:
                # Clear the adapter's interrupt event so the next _run_agent call
                # doesn't immediately re-trigger the interrupt before the new agent
                # even makes its first API call (this was causing an infinite loop).
-                if adapter and hasattr(adapter, '_active_sessions') and source.chat_id in adapter._active_sessions:
-                    adapter._active_sessions[source.chat_id].clear()
+                if adapter and hasattr(adapter, '_active_sessions') and session_key and session_key in adapter._active_sessions:
+                    adapter._active_sessions[session_key].clear()
                
                # Don't send the interrupted response to the user — it's just noise
                # like "Operation interrupted." They already know they sent a new
@@ -241,6 +241,9 @@ class SessionEntry:
    output_tokens: int = 0
    total_tokens: int = 0
    
+    # Last API-reported prompt tokens (for accurate compression pre-check)
+    last_prompt_tokens: int = 0
+    
    # Set when a session was created because the previous one expired;
    # consumed once by the message handler to inject a notice into context
    was_auto_reset: bool = False
@@ -257,6 +260,7 @@ class SessionEntry:
            "input_tokens": self.input_tokens,
            "output_tokens": self.output_tokens,
            "total_tokens": self.total_tokens,
+            "last_prompt_tokens": self.last_prompt_tokens,
        }
        if self.origin:
            result["origin"] = self.origin.to_dict()
@@ -287,6 +291,7 @@ class SessionEntry:
            input_tokens=data.get("input_tokens", 0),
            output_tokens=data.get("output_tokens", 0),
            total_tokens=data.get("total_tokens", 0),
+            last_prompt_tokens=data.get("last_prompt_tokens", 0),
        )


@@ -301,6 +306,8 @@ def build_session_key(source: SessionSource) -> str:
        if platform == "whatsapp" and source.chat_id:
            return f"agent:main:{platform}:dm:{source.chat_id}"
        return f"agent:main:{platform}:dm"
+    if source.thread_id:
+        return f"agent:main:{platform}:{source.chat_type}:{source.chat_id}:{source.thread_id}"
    return f"agent:main:{platform}:{source.chat_type}:{source.chat_id}"
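
Reviewer note: the thread-aware key above is what makes the `session_key` interrupt lookups in this change necessary. Below is a minimal sketch of that interaction, assuming a hypothetical `Source` dataclass standing in for `SessionSource` and a plain dict standing in for the adapter's interrupt storage. It mirrors only the non-DM branch shown in this hunk.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Source:
    """Hypothetical stand-in for SessionSource (only the fields used here)."""
    platform: str
    chat_type: str
    chat_id: str
    thread_id: Optional[str] = None


def session_key_for(source: Source) -> str:
    """Mirror the non-DM branch of build_session_key from the hunk above."""
    base = f"agent:main:{source.platform}:{source.chat_type}:{source.chat_id}"
    return f"{base}:{source.thread_id}" if source.thread_id else base


# Hypothetical interrupt store, keyed by the full session key.
pending: Dict[str, str] = {}

thread_a = Source("slack", "group", "C123", thread_id="t1")
thread_b = Source("slack", "group", "C123", thread_id="t2")
pending[session_key_for(thread_a)] = "new message in thread t1"

# Both threads share a chat_id, so a chat_id lookup cannot tell them apart.
same_chat = thread_a.chat_id == thread_b.chat_id
```

In this sketch, looking up by `chat_id` would either miss the stored entry or attribute it to the wrong thread, while the full key keeps the two threads separate.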


@@ -550,7 +557,8 @@ class SessionStore:
        self, 
        session_key: str,
        input_tokens: int = 0,
-        output_tokens: int = 0
+        output_tokens: int = 0,
+        last_prompt_tokens: int = None,
    ) -> None:
        """Update a session's metadata after an interaction."""
        self._ensure_loaded()
@@ -560,6 +568,8 @@ class SessionStore:
            entry.updated_at = datetime.now()
            entry.input_tokens += input_tokens
            entry.output_tokens += output_tokens
+            if last_prompt_tokens is not None:
+                entry.last_prompt_tokens = last_prompt_tokens
            entry.total_tokens = entry.input_tokens + entry.output_tokens
            self._save()
            
@@ -11,4 +11,5 @@ Provides subcommands for:
 - hermes cron          - Manage cron jobs
 """

-__version__ = "v1.0.0"
+__version__ = "0.2.0"
+__release_date__ = "2026.3.12"
@@ -108,14 +108,6 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
        auth_type="oauth_external",
        inference_base_url=DEFAULT_CODEX_BASE_URL,
    ),
-    "nous-api": ProviderConfig(
-        id="nous-api",
-        name="Nous Portal (API Key)",
-        auth_type="api_key",
-        inference_base_url="https://inference-api.nousresearch.com/v1",
-        api_key_env_vars=("NOUS_API_KEY",),
-        base_url_env_var="NOUS_BASE_URL",
-    ),
    "zai": ProviderConfig(
        id="zai",
        name="Z.AI / GLM",
@@ -521,7 +513,6 @@ def resolve_provider(

    # Normalize provider aliases
    _PROVIDER_ALIASES = {
-        "nous_api": "nous-api", "nousapi": "nous-api", "nous-portal-api": "nous-api",
        "glm": "zai", "z-ai": "zai", "z.ai": "zai", "zhipu": "zai",
        "kimi": "kimi-coding", "moonshot": "kimi-coding",
        "minimax-china": "minimax-cn", "minimax_cn": "minimax-cn",
@@ -1103,6 +1094,19 @@ def fetch_nous_models(
                continue
            model_ids.append(mid)

+    # Sort: opus > pro > everything else (haiku, flash, ...) > sonnet. Sonnet is
+    # cheap/fast, so users who want the best model should see opus first.
+    def _model_priority(mid: str) -> tuple:
+        low = mid.lower()
+        if "opus" in low:
+            return (0, mid)
+        if "pro" in low and "sonnet" not in low:
+            return (1, mid)
+        if "sonnet" in low:
+            return (3, mid)
+        return (2, mid)
+
+    model_ids.sort(key=_model_priority)
    return list(dict.fromkeys(model_ids))
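
Reviewer note: the ordering rule above can be exercised in isolation. The sketch below copies the priority function from this hunk into a standalone helper. The model IDs are made-up placeholders, not real catalog entries, and `order_models` is a hypothetical wrapper name.

```python
from typing import List, Tuple


def model_priority(mid: str) -> Tuple[int, str]:
    """Same tiers as _model_priority above: opus, pro, everything else, sonnet."""
    low = mid.lower()
    if "opus" in low:
        return (0, mid)
    if "pro" in low and "sonnet" not in low:
        return (1, mid)
    if "sonnet" in low:
        return (3, mid)
    return (2, mid)


def order_models(model_ids: List[str]) -> List[str]:
    """Sort by tier (then by ID), and drop duplicates while keeping order."""
    return list(dict.fromkeys(sorted(model_ids, key=model_priority)))


# Placeholder IDs chosen only to hit each tier once, plus one duplicate.
ordered = order_models(
    ["a/sonnet-x", "b/flash-y", "c/opus-z", "d/pro-w", "b/flash-y"]
)
```

Because the match is a plain substring test, any ID containing `pro` (for example inside a longer word) would land in tier 1. That may or may not be intended.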


@@ -1667,8 +1671,12 @@ def _prompt_model_selection(model_ids: List[str], current_model: str = "") -> Op


 def _save_model_choice(model_id: str) -> None:
-    """Save the selected model to config.yaml and .env."""
-    from hermes_cli.config import save_config, load_config, save_env_value
+    """Save the selected model to config.yaml (single source of truth).
+
+    The model is stored in config.yaml only — NOT in .env.  This avoids
+    conflicts in multi-agent setups where env vars would stomp each other.
+    """
+    from hermes_cli.config import save_config, load_config

    config = load_config()
    # Always use dict format so provider/base_url can be stored alongside
@@ -1677,7 +1685,6 @@ def _save_model_choice(model_id: str) -> None:
    else:
        config["model"] = {"default": model_id}
    save_config(config)
-    save_env_value("LLM_MODEL", model_id)


 def login_command(args) -> None:
@@ -62,7 +62,7 @@ def _skin_branding(key: str, fallback: str) -> str:
 # ASCII Art & Branding
 # =========================================================================

-from hermes_cli import __version__ as VERSION
+from hermes_cli import __version__ as VERSION, __release_date__ as RELEASE_DATE

 HERMES_AGENT_LOGO = """[bold #FFD700]██╗  ██╗███████╗██████╗ ███╗   ███╗███████╗███████╗       █████╗  ██████╗ ███████╗███╗   ██╗████████╗[/]
 [bold #FFD700]██║  ██║██╔════╝██╔══██╗████╗ ████║██╔════╝██╔════╝      ██╔══██╗██╔════╝ ██╔════╝████╗  ██║╚══██╔══╝[/]
@@ -380,7 +380,7 @@ def build_welcome_banner(console: Console, model: str, cwd: str,
    border_color = _skin_color("banner_border", "#CD7F32")
    outer_panel = Panel(
        layout_table,
-        title=f"[bold {title_color}]{agent_name} {VERSION}[/]",
+        title=f"[bold {title_color}]{agent_name} v{VERSION} ({RELEASE_DATE})[/]",
        border_style=border_color,
        padding=(0, 2),
    )
@@ -105,10 +105,14 @@ def approval_callback(cli, command: str, description: str) -> str:
    """Prompt for dangerous command approval through the TUI.

    Shows a selection UI with choices: once / session / always / deny.
+    When the command is longer than 70 characters, a "view" option is
+    included so the user can reveal the full text before deciding.
    """
    timeout = 60
    response_queue = queue.Queue()
    choices = ["once", "session", "always", "deny"]
+    if len(command) > 70:
+        choices.append("view")

    cli._approval_state = {
        "command": command,
@@ -0,0 +1,135 @@
+"""Shared curses-based multi-select checklist for Hermes CLI.
+
+Used by both ``hermes tools`` and ``hermes skills`` to present a
+toggleable list of items.  Falls back to a numbered text UI when
+curses is unavailable (Windows without curses, piped stdin, etc.).
+"""
+
+from typing import List, Set
+
+from hermes_cli.colors import Colors, color
+
+
+def curses_checklist(
+    title: str,
+    items: List[str],
+    pre_selected: Set[int],
+) -> Set[int]:
+    """Multi-select checklist.  Returns set of **selected** indices.
+
+    Args:
+        title: Header text shown at the top of the checklist.
+        items: Display labels for each row.
+        pre_selected: Indices that start checked.
+
+    Returns:
+        The indices the user confirmed as checked.  On cancel (ESC/q),
+        returns ``pre_selected`` unchanged.
+    """
+    try:
+        import curses
+        selected = set(pre_selected)
+        result = [None]
+
+        def _ui(stdscr):
+            curses.curs_set(0)
+            if curses.has_colors():
+                curses.start_color()
+                curses.use_default_colors()
+                curses.init_pair(1, curses.COLOR_GREEN, -1)
+                curses.init_pair(2, curses.COLOR_YELLOW, -1)
+                curses.init_pair(3, 8, -1)  # dim gray
+            cursor = 0
+            scroll_offset = 0
+
+            while True:
+                stdscr.clear()
+                max_y, max_x = stdscr.getmaxyx()
+
+                # Header
+                try:
+                    hattr = curses.A_BOLD | (curses.color_pair(2) if curses.has_colors() else 0)
+                    stdscr.addnstr(0, 0, title, max_x - 1, hattr)
+                    stdscr.addnstr(
+                        1, 0,
+                        "  ↑↓ navigate  SPACE toggle  ENTER confirm  ESC cancel",
+                        max_x - 1, curses.A_DIM,
+                    )
+                except curses.error:
+                    pass
+
+                # Scrollable item list
+                visible_rows = max(1, max_y - 4)  # rows 3..max_y-2; the draw loop stops at max_y - 1
+                if cursor < scroll_offset:
+                    scroll_offset = cursor
+                elif cursor >= scroll_offset + visible_rows:
+                    scroll_offset = cursor - visible_rows + 1
+
+                for draw_i, i in enumerate(
+                    range(scroll_offset, min(len(items), scroll_offset + visible_rows))
+                ):
+                    y = draw_i + 3
+                    if y >= max_y - 1:
+                        break
+                    check = "✓" if i in selected else " "
+                    arrow = "→" if i == cursor else " "
+                    line = f" {arrow} [{check}] {items[i]}"
+
+                    attr = curses.A_NORMAL
+                    if i == cursor:
+                        attr = curses.A_BOLD
+                        if curses.has_colors():
+                            attr |= curses.color_pair(1)
+                    try:
+                        stdscr.addnstr(y, 0, line, max_x - 1, attr)
+                    except curses.error:
+                        pass
+
+                stdscr.refresh()
+                key = stdscr.getch()
+
+                if key in (curses.KEY_UP, ord("k")):
+                    cursor = (cursor - 1) % len(items)
+                elif key in (curses.KEY_DOWN, ord("j")):
+                    cursor = (cursor + 1) % len(items)
+                elif key == ord(" "):
+                    selected.symmetric_difference_update({cursor})
+                elif key in (curses.KEY_ENTER, 10, 13):
+                    result[0] = set(selected)
+                    return
+                elif key in (27, ord("q")):
+                    result[0] = set(pre_selected)
+                    return
+
+        curses.wrapper(_ui)
+        return result[0] if result[0] is not None else set(pre_selected)
+
+    except Exception:
+        pass  # fall through to numbered fallback
+
+    # ── Numbered text fallback ────────────────────────────────────────────
+    selected = set(pre_selected)
+    print(color(f"\n  {title}", Colors.YELLOW))
+    print(color("  Toggle by number, Enter to confirm.\n", Colors.DIM))
+
+    while True:
+        for i, label in enumerate(items):
+            check = "✓" if i in selected else " "
+            print(f"    {i + 1:3}. [{check}] {label}")
+        print()
+
+        try:
+            raw = input(color("  Number to toggle, 's' to save, 'q' to cancel: ", Colors.DIM)).strip()
+        except (KeyboardInterrupt, EOFError):
+            return set(pre_selected)
+
+        if raw.lower() == "s" or raw == "":
+            return selected
+        if raw.lower() == "q":
+            return set(pre_selected)
+        try:
+            idx = int(raw) - 1
+            if 0 <= idx < len(items):
+                selected.symmetric_difference_update({idx})
+        except ValueError:
+            print(color("  Invalid input", Colors.DIM))
@@ -0,0 +1,296 @@
+"""hermes claw — OpenClaw migration commands.
+
+Usage:
+    hermes claw migrate              # Interactive migration from ~/.openclaw
+    hermes claw migrate --dry-run    # Preview what would be migrated
+    hermes claw migrate --preset full --overwrite  # Full migration, overwrite conflicts
+"""
+
+import importlib.util
+import logging
+import sys
+from pathlib import Path
+
+from hermes_cli.config import get_hermes_home, get_config_path, load_config, save_config
+from hermes_cli.setup import (
+    Colors,
+    color,
+    print_header,
+    print_info,
+    print_success,
+    print_warning,
+    print_error,
+    prompt_yes_no,
+    prompt_choice,
+)
+
+logger = logging.getLogger(__name__)
+
+PROJECT_ROOT = Path(__file__).parent.parent.resolve()
+
+_OPENCLAW_SCRIPT = (
+    PROJECT_ROOT
+    / "optional-skills"
+    / "migration"
+    / "openclaw-migration"
+    / "scripts"
+    / "openclaw_to_hermes.py"
+)
+
+# Fallback: user may have installed the skill from the Hub
+_OPENCLAW_SCRIPT_INSTALLED = (
+    get_hermes_home()
+    / "skills"
+    / "migration"
+    / "openclaw-migration"
+    / "scripts"
+    / "openclaw_to_hermes.py"
+)
+
+
+def _find_migration_script() -> Path | None:
+    """Find the openclaw_to_hermes.py script in known locations."""
+    for candidate in [_OPENCLAW_SCRIPT, _OPENCLAW_SCRIPT_INSTALLED]:
+        if candidate.exists():
+            return candidate
+    return None
+
+
+def _load_migration_module(script_path: Path):
+    """Dynamically load the migration script as a module."""
+    spec = importlib.util.spec_from_file_location("openclaw_to_hermes", script_path)
+    if spec is None or spec.loader is None:
+        return None
+    mod = importlib.util.module_from_spec(spec)
+    # Register in sys.modules so @dataclass can resolve the module
+    # (Python 3.11+ requires this for dynamically loaded modules)
+    sys.modules[spec.name] = mod
+    try:
+        spec.loader.exec_module(mod)
+    except Exception:
+        sys.modules.pop(spec.name, None)
+        raise
+    return mod
+
+
+def claw_command(args):
+    """Route hermes claw subcommands."""
+    action = getattr(args, "claw_action", None)
+
+    if action == "migrate":
+        _cmd_migrate(args)
+    else:
+        print("Usage: hermes claw migrate [options]")
+        print()
+        print("Commands:")
+        print("  migrate          Migrate settings from OpenClaw to Hermes")
+        print()
+        print("Run 'hermes claw migrate --help' for migration options.")
+
+
+def _cmd_migrate(args):
+    """Run the OpenClaw → Hermes migration."""
+    source_dir = Path(getattr(args, "source", None) or Path.home() / ".openclaw")
+    dry_run = getattr(args, "dry_run", False)
+    preset = getattr(args, "preset", "full")
+    overwrite = getattr(args, "overwrite", False)
+    migrate_secrets = getattr(args, "migrate_secrets", False)
+    workspace_target = getattr(args, "workspace_target", None)
+    skill_conflict = getattr(args, "skill_conflict", "skip")
+
+    # The "full" preset always migrates secrets, overriding the migrate_secrets argument
+    if preset == "full":
+        migrate_secrets = True
+
+    print()
+    print(
+        color(
+            "┌─────────────────────────────────────────────────────────┐",
+            Colors.MAGENTA,
+        )
+    )
+    print(
+        color(
+            "│          ⚕ Hermes — OpenClaw Migration                 │",
+            Colors.MAGENTA,
+        )
+    )
+    print(
+        color(
+            "└─────────────────────────────────────────────────────────┘",
+            Colors.MAGENTA,
+        )
+    )
+
+    # Check source directory
+    if not source_dir.is_dir():
+        print()
+        print_error(f"OpenClaw directory not found: {source_dir}")
+        print_info("Make sure your OpenClaw installation is at the expected path.")
+        print_info("You can specify a custom path: hermes claw migrate --source /path/to/.openclaw")
+        return
+
+    # Find the migration script
+    script_path = _find_migration_script()
+    if not script_path:
+        print()
+        print_error("Migration script not found.")
+        print_info("Expected at one of:")
+        print_info(f"  {_OPENCLAW_SCRIPT}")
+        print_info(f"  {_OPENCLAW_SCRIPT_INSTALLED}")
+        print_info("Make sure the openclaw-migration skill is installed.")
+        return
+
+    # Show what we're doing
+    hermes_home = get_hermes_home()
+    print()
+    print_header("Migration Settings")
+    print_info(f"Source:      {source_dir}")
+    print_info(f"Target:      {hermes_home}")
+    print_info(f"Preset:      {preset}")
+    print_info(f"Mode:        {'dry run (preview only)' if dry_run else 'execute'}")
+    print_info(f"Overwrite:   {'yes' if overwrite else 'no (skip conflicts)'}")
+    print_info(f"Secrets:     {'yes (allowlisted only)' if migrate_secrets else 'no'}")
+    if skill_conflict != "skip":
+        print_info(f"Skill conflicts: {skill_conflict}")
+    if workspace_target:
+        print_info(f"Workspace:   {workspace_target}")
+    print()
+
+    # For execute mode (non-dry-run), confirm unless --yes was passed
+    if not dry_run and not getattr(args, "yes", False):
+        if not prompt_yes_no("Proceed with migration?", default=True):
+            print_info("Migration cancelled.")
+            return
+
+    # Ensure config.yaml exists before migration tries to read it
+    config_path = get_config_path()
+    if not config_path.exists():
+        save_config(load_config())
+
+    # Load and run the migration
+    try:
+        mod = _load_migration_module(script_path)
+        if mod is None:
+            print_error("Could not load migration script.")
+            return
+
+        selected = mod.resolve_selected_options(None, None, preset=preset)
+        ws_target = Path(workspace_target).resolve() if workspace_target else None
+
+        migrator = mod.Migrator(
+            source_root=source_dir.resolve(),
+            target_root=hermes_home.resolve(),
+            execute=not dry_run,
+            workspace_target=ws_target,
+            overwrite=overwrite,
+            migrate_secrets=migrate_secrets,
+            output_dir=None,
+            selected_options=selected,
+            preset_name=preset,
+            skill_conflict_mode=skill_conflict,
+        )
+        report = migrator.migrate()
+    except Exception as e:
+        print()
+        print_error(f"Migration failed: {e}")
+        logger.debug("OpenClaw migration error", exc_info=True)
+        return
+
+    # Print results
+    _print_migration_report(report, dry_run)
+
+
+def _print_migration_report(report: dict, dry_run: bool):
+    """Print a formatted migration report."""
+    summary = report.get("summary", {})
+    migrated = summary.get("migrated", 0)
+    skipped = summary.get("skipped", 0)
+    conflicts = summary.get("conflict", 0)
+    errors = summary.get("error", 0)
+    total = migrated + skipped + conflicts + errors
+
+    print()
+    if dry_run:
+        print_header("Dry Run Results")
+        print_info("No files were modified. This is a preview of what would happen.")
+    else:
+        print_header("Migration Results")
+
+    print()
+
+    # Detailed items
+    items = report.get("items", [])
+    if items:
+        # Group by status
+        migrated_items = [i for i in items if i.get("status") == "migrated"]
+        skipped_items = [i for i in items if i.get("status") == "skipped"]
+        conflict_items = [i for i in items if i.get("status") == "conflict"]
+        error_items = [i for i in items if i.get("status") == "error"]
+
+        if migrated_items:
+            label = "Would migrate" if dry_run else "Migrated"
+            print(color(f"  ✓ {label}:", Colors.GREEN))
+            for item in migrated_items:
+                kind = item.get("kind", "unknown")
+                dest = item.get("destination", "")
+                if dest:
+                    dest_short = str(dest).replace(str(Path.home()), "~")
+                    print(f"      {kind:<22s} → {dest_short}")
+                else:
+                    print(f"      {kind}")
+            print()
+
+        if conflict_items:
+            print(color("  ⚠ Conflicts (skipped — use --overwrite to force):", Colors.YELLOW))
+            for item in conflict_items:
+                kind = item.get("kind", "unknown")
+                reason = item.get("reason", "already exists")
+                print(f"      {kind:<22s}  {reason}")
+            print()
+
+        if skipped_items:
+            print(color("  ─ Skipped:", Colors.DIM))
+            for item in skipped_items:
+                kind = item.get("kind", "unknown")
+                reason = item.get("reason", "")
+                print(f"      {kind:<22s}  {reason}")
+            print()
+
+        if error_items:
+            print(color("  ✗ Errors:", Colors.RED))
+            for item in error_items:
+                kind = item.get("kind", "unknown")
+                reason = item.get("reason", "unknown error")
+                print(f"      {kind:<22s}  {reason}")
+            print()
+
+    # Summary line
+    parts = []
+    if migrated:
+        action = "would migrate" if dry_run else "migrated"
+        parts.append(f"{migrated} {action}")
+    if conflicts:
+        parts.append(f"{conflicts} conflict(s)")
+    if skipped:
+        parts.append(f"{skipped} skipped")
+    if errors:
+        parts.append(f"{errors} error(s)")
+
+    if parts:
+        print_info(f"Summary: {', '.join(parts)}")
+    else:
+        print_info("Nothing to migrate.")
+
+    # Output directory
+    output_dir = report.get("output_dir")
+    if output_dir:
+        print_info(f"Full report saved to: {output_dir}")
+
+    if dry_run:
+        print()
+        print_info("To execute the migration, run without --dry-run:")
+        print_info(f"  hermes claw migrate --preset {report.get('preset', 'full')}")
+    elif migrated:
+        print()
+        print_success("Migration complete!")
@@ -254,6 +254,7 @@ def _wayland_save(dest: Path) -> bool:
            )

        if not dest.exists() or dest.stat().st_size == 0:
+            dest.unlink(missing_ok=True)
            return False

        # BMP needs conversion to PNG (common in WSLg where only BMP
@@ -47,7 +47,7 @@ def _fetch_models_from_api(access_token: str) -> List[str]:
        if item.get("supported_in_api") is False:
            continue
        visibility = item.get("visibility", "")
-        if isinstance(visibility, str) and visibility.strip().lower() == "hidden":
+        if isinstance(visibility, str) and visibility.strip().lower() in ("hide", "hidden"):
            continue
        priority = item.get("priority")
        rank = int(priority) if isinstance(priority, (int, float)) else 10_000
@@ -97,7 +97,7 @@ def _read_cache_models(codex_home: Path) -> List[str]:
            if item.get("supported_in_api") is False:
                continue
            visibility = item.get("visibility")
-            if isinstance(visibility, str) and visibility.strip().lower() == "hidden":
+            if isinstance(visibility, str) and visibility.strip().lower() in ("hide", "hidden"):
                continue
            priority = item.get("priority")
            rank = int(priority) if isinstance(priority, (int, float)) else 10_000
@@ -13,37 +13,55 @@ from typing import Any
 from prompt_toolkit.completion import Completer, Completion


-COMMANDS = {
-    "/help": "Show this help message",
-    "/tools": "List available tools",
-    "/toolsets": "List available toolsets",
-    "/model": "Show or change the current model",
-    "/provider": "Show available providers and current provider",
-    "/prompt": "View/set custom system prompt",
-    "/personality": "Set a predefined personality",
-    "/clear": "Clear screen and reset conversation (fresh start)",
-    "/history": "Show conversation history",
-    "/new": "Start a new conversation (reset history)",
-    "/reset": "Reset conversation only (keep screen)",
-    "/retry": "Retry the last message (resend to agent)",
-    "/undo": "Remove the last user/assistant exchange",
-    "/save": "Save the current conversation",
-    "/config": "Show current configuration",
-    "/cron": "Manage scheduled tasks (list, add, remove)",
-    "/skills": "Search, install, inspect, or manage skills from online registries",
-    "/platforms": "Show gateway/messaging platform status",
-    "/verbose": "Cycle tool progress display: off → new → all → verbose",
-    "/compress": "Manually compress conversation context (flush memories + summarize)",
-    "/title": "Set a title for the current session (usage: /title My Session Name)",
-    "/usage": "Show token usage for the current session",
-    "/insights": "Show usage insights and analytics (last 30 days)",
-    "/paste": "Check clipboard for an image and attach it",
-    "/reload-mcp": "Reload MCP servers from config.yaml",
-    "/rollback": "List or restore filesystem checkpoints (usage: /rollback [number])",
-    "/skin": "Show or change the display skin/theme",
-    "/quit": "Exit the CLI (also: /exit, /q)",
+# Commands organized by category for better help display
+COMMANDS_BY_CATEGORY = {
+    "Session": {
+        "/new": "Start a new conversation (reset history)",
+        "/reset": "Reset conversation only (keep screen)",
+        "/clear": "Clear screen and reset conversation (fresh start)",
+        "/history": "Show conversation history",
+        "/save": "Save the current conversation",
+        "/retry": "Retry the last message (resend to agent)",
+        "/undo": "Remove the last user/assistant exchange",
+        "/title": "Set a title for the current session (usage: /title My Session Name)",
+        "/compress": "Manually compress conversation context (flush memories + summarize)",
+        "/rollback": "List or restore filesystem checkpoints (usage: /rollback [number])",
+        "/background": "Run a prompt in the background (usage: /background <prompt>)",
+    },
+    "Configuration": {
+        "/config": "Show current configuration",
+        "/model": "Show or change the current model",
+        "/provider": "Show available providers and current provider",
+        "/prompt": "View/set custom system prompt",
+        "/personality": "Set a predefined personality",
+        "/verbose": "Cycle tool progress display: off → new → all → verbose",
+        "/reasoning": "Manage reasoning effort and display (usage: /reasoning [level|show|hide])",
+        "/skin": "Show or change the display skin/theme",
+    },
+    "Tools & Skills": {
+        "/tools": "List available tools",
+        "/toolsets": "List available toolsets",
+        "/skills": "Search, install, inspect, or manage skills from online registries",
+        "/cron": "Manage scheduled tasks (list, add, remove)",
+        "/reload-mcp": "Reload MCP servers from config.yaml",
+    },
+    "Info": {
+        "/help": "Show this help message",
+        "/usage": "Show token usage for the current session",
+        "/insights": "Show usage insights and analytics (last 30 days)",
+        "/platforms": "Show gateway/messaging platform status",
+        "/paste": "Check clipboard for an image and attach it",
+    },
+    "Exit": {
+        "/quit": "Exit the CLI (also: /exit, /q)",
+    },
 }

+# Flat dict for backwards compatibility and autocomplete
+COMMANDS = {}
+for category_commands in COMMANDS_BY_CATEGORY.values():
+    COMMANDS.update(category_commands)
+

 class SlashCommandCompleter(Completer):
    """Autocomplete for built-in slash commands and optional skill commands."""
@@ -17,6 +17,7 @@ import platform
 import stat
 import subprocess
 import sys
+import tempfile
 from pathlib import Path
 from typing import Dict, Any, Optional, List, Tuple

@@ -47,13 +48,32 @@ def get_project_root() -> Path:
    """Get the project installation directory."""
    return Path(__file__).parent.parent.resolve()

+def _secure_dir(path):
+    """Set directory to owner-only access (0700). No-op on Windows."""
+    try:
+        os.chmod(path, 0o700)
+    except (OSError, NotImplementedError):
+        pass
+
+
+def _secure_file(path):
+    """Set file to owner-only read/write (0600). No-op on Windows."""
+    try:
+        if os.path.exists(str(path)):
+            os.chmod(path, 0o600)
+    except (OSError, NotImplementedError):
+        pass
+
+
 def ensure_hermes_home():
-    """Ensure ~/.hermes directory structure exists."""
+    """Ensure ~/.hermes directory structure exists with secure permissions."""
    home = get_hermes_home()
-    (home / "cron").mkdir(parents=True, exist_ok=True)
-    (home / "sessions").mkdir(parents=True, exist_ok=True)
-    (home / "logs").mkdir(parents=True, exist_ok=True)
-    (home / "memories").mkdir(parents=True, exist_ok=True)
+    home.mkdir(parents=True, exist_ok=True)
+    _secure_dir(home)
+    for subdir in ("cron", "sessions", "logs", "memories"):
+        d = home / subdir
+        d.mkdir(parents=True, exist_ok=True)
+        _secure_dir(d)


 # =============================================================================
@@ -106,17 +126,41 @@ DEFAULT_CONFIG = {
        "summary_provider": "auto",
    },
    
-    # Auxiliary model overrides (advanced).  By default Hermes auto-selects
-    # the provider and model for each side task.  Set these to override.
+    # Auxiliary model config — provider:model for each side task.
+    # Format: provider is the provider name, model is the model slug.
+    # "auto" for provider = auto-detect best available provider.
+    # Empty model = use provider's default auxiliary model.
+    # All tasks fall back to openrouter:google/gemini-3-flash-preview if
+    # the configured provider is unavailable.
    "auxiliary": {
        "vision": {
-            "provider": "auto",    # auto | openrouter | nous | main
+            "provider": "auto",    # auto | openrouter | nous | codex | custom
            "model": "",           # e.g. "google/gemini-2.5-flash", "gpt-4o"
        },
        "web_extract": {
            "provider": "auto",
            "model": "",
        },
+        "compression": {
+            "provider": "auto",
+            "model": "",
+        },
+        "session_search": {
+            "provider": "auto",
+            "model": "",
+        },
+        "skills_hub": {
+            "provider": "auto",
+            "model": "",
+        },
+        "mcp": {
+            "provider": "auto",
+            "model": "",
+        },
+        "flush_memories": {
+            "provider": "auto",
+            "model": "",
+        },
    },
    
    "display": {
@@ -124,6 +168,7 @@ DEFAULT_CONFIG = {
        "personality": "kawaii",
        "resume_display": "full",
        "bell_on_complete": False,
+        "show_reasoning": False,
        "skin": "default",
    },
    
@@ -163,7 +208,16 @@ DEFAULT_CONFIG = {
        "memory_char_limit": 2200,   # ~800 tokens at 2.75 chars/token
        "user_char_limit": 1375,     # ~500 tokens at 2.75 chars/token
    },
-    
+
+    # Subagent delegation — override the provider:model used by delegate_task
+    # so child agents can run on a different (cheaper/faster) provider and model.
+    # Uses the same runtime provider resolution as CLI/gateway startup, so all
+    # configured providers (OpenRouter, Nous, Z.ai, Kimi, etc.) are supported.
+    "delegation": {
+        "model": "",       # e.g. "google/gemini-3-flash-preview" (empty = inherit parent model)
+        "provider": "",    # e.g. "openrouter" (empty = inherit parent provider + credentials)
+    },
+
    # Ephemeral prefill messages file — JSON list of {role, content} dicts
    # injected at the start of every API call for few-shot priming.
    # Never saved to sessions, logs, or trajectories.
@@ -178,11 +232,23 @@ DEFAULT_CONFIG = {
    # Empty string means use server-local time.
    "timezone": "",

+    # Discord platform settings (gateway mode)
+    "discord": {
+        "require_mention": True,       # Require @mention to respond in server channels
+        "free_response_channels": "",  # Comma-separated channel IDs where bot responds without mention
+    },
+
    # Permanently allowed dangerous command patterns (added via "always" approval)
    "command_allowlist": [],
+    # User-defined quick commands that bypass the agent loop (type: exec only)
+    "quick_commands": {},
+    # Custom personalities — add your own entries here
+    # Supports string format: {"name": "system prompt"}
+    # Or dict format: {"name": {"description": "...", "system_prompt": "...", "tone": "...", "style": "..."}}
+    "personalities": {},

    # Config schema version - bump this when adding new required fields
-    "_config_version": 6,
+    "_config_version": 7,
 }

 # =============================================================================
@@ -207,14 +273,6 @@ REQUIRED_ENV_VARS = {}
 # Optional environment variables that enhance functionality
 OPTIONAL_ENV_VARS = {
    # ── Provider (handled in provider selection, not shown in checklists) ──
-    "NOUS_API_KEY": {
-        "description": "Nous Portal API key (direct API key access to Nous inference)",
-        "prompt": "Nous Portal API key",
-        "url": "https://portal.nousresearch.com",
-        "password": True,
-        "category": "provider",
-        "advanced": True,
-    },
    "NOUS_BASE_URL": {
        "description": "Nous Portal base URL override",
        "prompt": "Nous Portal base URL (leave empty for default)",
@@ -872,6 +930,7 @@ def save_config(config: Dict[str, Any]):
        normalized,
        extra_content=_COMMENTED_SECTIONS if sections else None,
    )
+    _secure_file(config_path)


 def load_env() -> Dict[str, str]:
@@ -922,8 +981,20 @@ def save_env_value(key: str, value: str):
            lines[-1] += "\n"
        lines.append(f"{key}={value}\n")
    
-    with open(env_path, 'w', **write_kw) as f:
-        f.writelines(lines)
+    fd, tmp_path = tempfile.mkstemp(dir=str(env_path.parent), suffix='.tmp', prefix='.env_')
+    try:
+        with os.fdopen(fd, 'w', **write_kw) as f:
+            f.writelines(lines)
+            f.flush()
+            os.fsync(f.fileno())
+        os.replace(tmp_path, env_path)
+    except BaseException:
+        try:
+            os.unlink(tmp_path)
+        except OSError:
+            pass
+        raise
+    _secure_file(env_path)

    # Restrict .env permissions to owner-only (contains API keys)
    if not _IS_WINDOWS:
@@ -998,6 +1069,14 @@ def show_config():
    print(f"  Max turns:    {config.get('agent', {}).get('max_turns', DEFAULT_CONFIG['agent']['max_turns'])}")
    print(f"  Toolsets:     {', '.join(config.get('toolsets', ['all']))}")
    
+    # Display
+    print()
+    print(color("◆ Display", Colors.CYAN, Colors.BOLD))
+    display = config.get('display', {})
+    print(f"  Personality:  {display.get('personality', 'kawaii')}")
+    print(f"  Reasoning:    {'on' if display.get('show_reasoning', False) else 'off'}")
+    print(f"  Bell:         {'on' if display.get('bell_on_complete', False) else 'off'}")
+
    # Terminal
    print()
    print(color("◆ Terminal", Colors.CYAN, Colors.BOLD))
@@ -0,0 +1,140 @@
+"""Shared curses-based UI components for Hermes CLI.
+
+Used by `hermes tools` and `hermes skills` for interactive checklists.
+Provides a curses multi-select with keyboard navigation, plus a
+text-based numbered fallback for terminals without curses support.
+"""
+from typing import List, Set
+
+from hermes_cli.colors import Colors, color
+
+
+def curses_checklist(
+    title: str,
+    items: List[str],
+    selected: Set[int],
+    *,
+    cancel_returns: Set[int] | None = None,
+) -> Set[int]:
+    """Curses multi-select checklist. Returns set of selected indices.
+
+    Args:
+        title: Header line displayed above the checklist.
+        items: Display labels for each row.
+        selected: Indices that start checked (pre-selected).
+        cancel_returns: Returned on ESC/q. Defaults to the original *selected*.
+    """
+    if cancel_returns is None:
+        cancel_returns = set(selected)
+
+    try:
+        import curses
+        chosen = set(selected)
+        result_holder: list = [None]
+
+        def _draw(stdscr):
+            curses.curs_set(0)
+            if curses.has_colors():
+                curses.start_color()
+                curses.use_default_colors()
+                curses.init_pair(1, curses.COLOR_GREEN, -1)
+                curses.init_pair(2, curses.COLOR_YELLOW, -1)
+                curses.init_pair(3, 8, -1)  # dim gray
+            cursor = 0
+            scroll_offset = 0
+
+            while True:
+                stdscr.clear()
+                max_y, max_x = stdscr.getmaxyx()
+
+                # Header
+                try:
+                    hattr = curses.A_BOLD
+                    if curses.has_colors():
+                        hattr |= curses.color_pair(2)
+                    stdscr.addnstr(0, 0, title, max_x - 1, hattr)
+                    stdscr.addnstr(
+                        1, 0,
+                        "  ↑↓ navigate  SPACE toggle  ENTER confirm  ESC cancel",
+                        max_x - 1, curses.A_DIM,
+                    )
+                except curses.error:
+                    pass
+
+                # Scrollable item list
+                visible_rows = max_y - 3
+                if cursor < scroll_offset:
+                    scroll_offset = cursor
+                elif cursor >= scroll_offset + visible_rows:
+                    scroll_offset = cursor - visible_rows + 1
+
+                for draw_i, i in enumerate(
+                    range(scroll_offset, min(len(items), scroll_offset + visible_rows))
+                ):
+                    y = draw_i + 3
+                    if y >= max_y - 1:
+                        break
+                    check = "✓" if i in chosen else " "
+                    arrow = "→" if i == cursor else " "
+                    line = f" {arrow} [{check}] {items[i]}"
+                    attr = curses.A_NORMAL
+                    if i == cursor:
+                        attr = curses.A_BOLD
+                        if curses.has_colors():
+                            attr |= curses.color_pair(1)
+                    try:
+                        stdscr.addnstr(y, 0, line, max_x - 1, attr)
+                    except curses.error:
+                        pass
+
+                stdscr.refresh()
+                key = stdscr.getch()
+
+                if key in (curses.KEY_UP, ord("k")):
+                    cursor = (cursor - 1) % len(items)
+                elif key in (curses.KEY_DOWN, ord("j")):
+                    cursor = (cursor + 1) % len(items)
+                elif key == ord(" "):
+                    chosen.symmetric_difference_update({cursor})
+                elif key in (curses.KEY_ENTER, 10, 13):
+                    result_holder[0] = set(chosen)
+                    return
+                elif key in (27, ord("q")):
+                    result_holder[0] = cancel_returns
+                    return
+
+        curses.wrapper(_draw)
+        return result_holder[0] if result_holder[0] is not None else cancel_returns
+
+    except Exception:
+        return _numbered_fallback(title, items, selected, cancel_returns)
+
+
+def _numbered_fallback(
+    title: str,
+    items: List[str],
+    selected: Set[int],
+    cancel_returns: Set[int],
+) -> Set[int]:
+    """Text-based toggle fallback for terminals without curses."""
+    chosen = set(selected)
+    print(color(f"\n  {title}", Colors.YELLOW))
+    print(color("  Toggle by number, Enter to confirm.\n", Colors.DIM))
+
+    while True:
+        for i, label in enumerate(items):
+            marker = color("[✓]", Colors.GREEN) if i in chosen else "[ ]"
+            print(f"  {marker} {i + 1:>2}. {label}")
+        print()
+        try:
+            val = input(color("  Toggle # (or Enter to confirm): ", Colors.DIM)).strip()
+        except (KeyboardInterrupt, EOFError):
+            return cancel_returns
+        if not val:
+            break
+        # Ignore non-numeric or out-of-range input instead of cancelling
+        if val.isdecimal() and 1 <= int(val) <= len(items):
+            chosen.symmetric_difference_update({int(val) - 1})
+        print()
+
+    return chosen
@@ -490,13 +490,16 @@ def run_doctor(args):
            print(f"\r  {color('⚠', Colors.YELLOW)} Anthropic API {color(f'({e})', Colors.DIM)}                 ")

    # -- API-key providers (Z.AI/GLM, Kimi, MiniMax, MiniMax-CN) --
+    # Tuple: (name, env_vars, default_url, base_env, supports_health_check)
+    # If supports_health_check is False, skip the /models health check and just show "key configured"
    _apikey_providers = [
-        ("Z.AI / GLM",      ("GLM_API_KEY", "ZAI_API_KEY", "Z_AI_API_KEY"), "https://api.z.ai/api/paas/v4/models", "GLM_BASE_URL"),
-        ("Kimi / Moonshot",  ("KIMI_API_KEY",),                              "https://api.moonshot.ai/v1/models",   "KIMI_BASE_URL"),
-        ("MiniMax",          ("MINIMAX_API_KEY",),                            "https://api.minimax.io/v1/models",    "MINIMAX_BASE_URL"),
-        ("MiniMax (China)",  ("MINIMAX_CN_API_KEY",),                         "https://api.minimaxi.com/v1/models",  "MINIMAX_CN_BASE_URL"),
+        ("Z.AI / GLM",      ("GLM_API_KEY", "ZAI_API_KEY", "Z_AI_API_KEY"), "https://api.z.ai/api/paas/v4/models", "GLM_BASE_URL", True),
+        ("Kimi / Moonshot",  ("KIMI_API_KEY",),                              "https://api.moonshot.ai/v1/models",   "KIMI_BASE_URL", True),
+        # MiniMax APIs don't support /models endpoint — https://github.com/NousResearch/hermes-agent/issues/811
+        ("MiniMax",          ("MINIMAX_API_KEY",),                            None,                                  "MINIMAX_BASE_URL", False),
+        ("MiniMax (China)",  ("MINIMAX_CN_API_KEY",),                         None,                                  "MINIMAX_CN_BASE_URL", False),
    ]
-    for _pname, _env_vars, _default_url, _base_env in _apikey_providers:
+    for _pname, _env_vars, _default_url, _base_env, _supports_health_check in _apikey_providers:
        _key = ""
        for _ev in _env_vars:
            _key = os.getenv(_ev, "")
@@ -504,6 +507,10 @@ def run_doctor(args):
                break
        if _key:
            _label = _pname.ljust(20)
+            # Some providers (like MiniMax) don't support /models endpoint
+            if not _supports_health_check:
+                print(f"  {color('✓', Colors.GREEN)} {_label} {color('(key configured)', Colors.DIM)}")
+                continue
            print(f"  Checking {_pname} API...", end="", flush=True)
            try:
                import httpx
@@ -518,6 +518,32 @@ _PLATFORMS = [
        "emoji": "📡",
        "token_var": "SIGNAL_HTTP_URL",
    },
+    {
+        "key": "email",
+        "label": "Email",
+        "emoji": "📧",
+        "token_var": "EMAIL_ADDRESS",
+        "setup_instructions": [
+            "1. Use a dedicated email account for your Hermes agent",
+            "2. For Gmail: enable 2FA, then create an App Password at",
+            "   https://myaccount.google.com/apppasswords",
+            "3. For other providers: use your email password or app-specific password",
+            "4. IMAP must be enabled on your email account",
+        ],
+        "vars": [
+            {"name": "EMAIL_ADDRESS", "prompt": "Email address", "password": False,
+             "help": "The email address Hermes will use (e.g., hermes@gmail.com)."},
+            {"name": "EMAIL_PASSWORD", "prompt": "Email password (or app password)", "password": True,
+             "help": "For Gmail, use an App Password (not your regular password)."},
+            {"name": "EMAIL_IMAP_HOST", "prompt": "IMAP host", "password": False,
+             "help": "e.g., imap.gmail.com for Gmail, outlook.office365.com for Outlook."},
+            {"name": "EMAIL_SMTP_HOST", "prompt": "SMTP host", "password": False,
+             "help": "e.g., smtp.gmail.com for Gmail, smtp.office365.com for Outlook."},
+            {"name": "EMAIL_ALLOWED_USERS", "prompt": "Allowed sender emails (comma-separated)", "password": False,
+             "is_allowlist": True,
+             "help": "Only emails from these addresses will be processed."},
+        ],
+    },
 ]


@@ -543,6 +569,15 @@ def _platform_status(platform: dict) -> str:
        if val or account:
            return "partially configured"
        return "not configured"
+    if platform.get("key") == "email":
+        pwd = get_env_value("EMAIL_PASSWORD")
+        imap = get_env_value("EMAIL_IMAP_HOST")
+        smtp = get_env_value("EMAIL_SMTP_HOST")
+        if all([val, pwd, imap, smtp]):
+            return "configured"
+        if any([val, pwd, imap, smtp]):
+            return "partially configured"
+        return "not configured"
    if val:
        return "configured"
    return "not configured"
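The email branch above reduces to an all/any check over four settings. A minimal standalone sketch of that classification follows, assuming the values arrive in a plain mapping rather than through `get_env_value`; the helper name `email_status` is hypothetical and not part of this change.

```python
from typing import Mapping

# The four settings the email branch of _platform_status inspects.
REQUIRED_EMAIL_VARS = (
    "EMAIL_ADDRESS",
    "EMAIL_PASSWORD",
    "EMAIL_IMAP_HOST",
    "EMAIL_SMTP_HOST",
)


def email_status(env: Mapping[str, str]) -> str:
    """Classify email setup (hypothetical helper mirroring the diff's logic)."""
    values = [env.get(name, "") for name in REQUIRED_EMAIL_VARS]
    if all(values):
        # Every required setting is non-empty.
        return "configured"
    if any(values):
        # At least one, but not all, settings are present.
        return "partially configured"
    return "not configured"
```

With only `EMAIL_ADDRESS` set, this reports "partially configured", which is the case the generic single-variable check would have reported as "configured".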
@@ -22,6 +22,8 @@ Usage:
    hermes update              # Update to latest version
    hermes uninstall           # Uninstall Hermes Agent
    hermes sessions browse     # Interactive session picker with search
+    hermes claw migrate        # Migrate from OpenClaw to Hermes
+    hermes claw migrate --dry-run  # Preview migration without changes
 """

 import argparse
@@ -51,7 +53,7 @@ os.environ.setdefault("MSWEA_SILENT_STARTUP", "1")

 import logging

-from hermes_cli import __version__
+from hermes_cli import __version__, __release_date__
 from hermes_constants import OPENROUTER_BASE_URL

 logger = logging.getLogger(__name__)
@@ -477,6 +479,10 @@ def cmd_chat(args):
    except Exception:
        pass

+    # --yolo: bypass all dangerous command approvals
+    if getattr(args, "yolo", False):
+        os.environ["HERMES_YOLO_MODE"] = "1"
+
    # Import and run the CLI
    from cli import main as cli_main
    
@@ -486,10 +492,12 @@ def cmd_chat(args):
        "provider": getattr(args, "provider", None),
        "toolsets": args.toolsets,
        "verbose": args.verbose,
+        "quiet": getattr(args, "quiet", False),
        "query": args.query,
        "resume": getattr(args, "resume", None),
        "worktree": getattr(args, "worktree", False),
        "checkpoints": getattr(args, "checkpoints", False),
+        "pass_session_id": getattr(args, "pass_session_id", False),
    }
    # Filter out None values
    kwargs = {k: v for k, v in kwargs.items() if v is not None}
@@ -826,7 +834,9 @@ def cmd_model(args):
        _model_flow_named_custom(config, _custom_provider_map[selected_provider])
    elif selected_provider == "remove-custom":
        _remove_custom_provider(config)
-    elif selected_provider in ("zai", "kimi-coding", "minimax", "minimax-cn"):
+    elif selected_provider == "kimi-coding":
+        _model_flow_kimi(config, current_model)
+    elif selected_provider in ("zai", "minimax", "minimax-cn"):
        _model_flow_api_key_provider(config, selected_provider, current_model)


@@ -1337,8 +1347,10 @@ _PROVIDER_MODELS = {
        "glm-4.5-flash",
    ],
    "kimi-coding": [
+        "kimi-for-coding",
        "kimi-k2.5",
        "kimi-k2-thinking",
+        "kimi-k2-thinking-turbo",
        "kimi-k2-turbo-preview",
        "kimi-k2-0905-preview",
    ],
@@ -1355,8 +1367,112 @@ _PROVIDER_MODELS = {
 }


+def _model_flow_kimi(config, current_model=""):
+    """Kimi / Moonshot model selection with automatic endpoint routing.
+
+    - sk-kimi-* keys   → api.kimi.com/coding/v1  (Kimi Coding Plan)
+    - Other keys        → api.moonshot.ai/v1      (legacy Moonshot)
+
+    No manual base URL prompt — endpoint is determined by key prefix.
+    """
+    from hermes_cli.auth import (
+        PROVIDER_REGISTRY, KIMI_CODE_BASE_URL, _prompt_model_selection,
+        _save_model_choice, deactivate_provider,
+    )
+    from hermes_cli.config import get_env_value, save_env_value, load_config, save_config
+
+    provider_id = "kimi-coding"
+    pconfig = PROVIDER_REGISTRY[provider_id]
+    key_env = pconfig.api_key_env_vars[0] if pconfig.api_key_env_vars else ""
+    base_url_env = pconfig.base_url_env_var or ""
+
+    # Step 1: Check / prompt for API key
+    existing_key = ""
+    for ev in pconfig.api_key_env_vars:
+        existing_key = get_env_value(ev) or os.getenv(ev, "")
+        if existing_key:
+            break
+
+    if not existing_key:
+        print(f"No {pconfig.name} API key configured.")
+        if key_env:
+            try:
+                new_key = input(f"{key_env} (or Enter to cancel): ").strip()
+            except (KeyboardInterrupt, EOFError):
+                print()
+                return
+            if not new_key:
+                print("Cancelled.")
+                return
+            save_env_value(key_env, new_key)
+            existing_key = new_key
+            print("API key saved.")
+            print()
+    else:
+        print(f"  {pconfig.name} API key: {existing_key[:8]}... ✓")
+        print()
+
+    # Step 2: Auto-detect endpoint from key prefix
+    is_coding_plan = existing_key.startswith("sk-kimi-")
+    if is_coding_plan:
+        effective_base = KIMI_CODE_BASE_URL
+        print(f"  Detected Kimi Coding Plan key → {effective_base}")
+    else:
+        effective_base = pconfig.inference_base_url
+        print(f"  Using Moonshot endpoint → {effective_base}")
+    # Clear any manual base URL override so auto-detection works at runtime
+    if base_url_env and get_env_value(base_url_env):
+        save_env_value(base_url_env, "")
+    print()
+
+    # Step 3: Model selection — show appropriate models for the endpoint
+    if is_coding_plan:
+        # Coding Plan models (kimi-for-coding first)
+        model_list = [
+            "kimi-for-coding",
+            "kimi-k2.5",
+            "kimi-k2-thinking",
+            "kimi-k2-thinking-turbo",
+        ]
+    else:
+        # Legacy Moonshot models
+        model_list = _PROVIDER_MODELS.get(provider_id, [])
+
+    if model_list:
+        selected = _prompt_model_selection(model_list, current_model=current_model)
+    else:
+        try:
+            selected = input("Enter model name: ").strip()
+        except (KeyboardInterrupt, EOFError):
+            selected = None
+
+    if selected:
+        # Clear custom endpoint if set (avoid confusion)
+        if get_env_value("OPENAI_BASE_URL"):
+            save_env_value("OPENAI_BASE_URL", "")
+            save_env_value("OPENAI_API_KEY", "")
+
+        _save_model_choice(selected)
+
+        # Update config with provider and base URL
+        cfg = load_config()
+        model = cfg.get("model")
+        if not isinstance(model, dict):
+            model = {"default": model} if model else {}
+            cfg["model"] = model
+        model["provider"] = provider_id
+        model["base_url"] = effective_base
+        save_config(cfg)
+        deactivate_provider()
+
+        endpoint_label = "Kimi Coding" if is_coding_plan else "Moonshot"
+        print(f"Default model set to: {selected} (via {endpoint_label})")
+    else:
+        print("No change.")
+
+
 def _model_flow_api_key_provider(config, provider_id, current_model=""):
-    """Generic flow for API-key providers (z.ai, Kimi, MiniMax)."""
+    """Generic flow for API-key providers (z.ai, MiniMax)."""
    from hermes_cli.auth import (
        PROVIDER_REGISTRY, _prompt_model_selection, _save_model_choice,
        _update_config_for_provider, deactivate_provider,
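The endpoint routing in `_model_flow_kimi` depends only on the key prefix. A minimal sketch of that decision in isolation follows. The two hosts are taken from the function's docstring; the `https://` scheme, the constant `MOONSHOT_BASE_URL`, and the helper name `resolve_kimi_endpoint` are assumptions for illustration (the real code reads the Moonshot URL from `pconfig.inference_base_url`).

```python
from typing import Tuple

# Hosts as listed in the _model_flow_kimi docstring; scheme is assumed.
KIMI_CODE_BASE_URL = "https://api.kimi.com/coding/v1"
MOONSHOT_BASE_URL = "https://api.moonshot.ai/v1"  # hypothetical constant


def resolve_kimi_endpoint(api_key: str) -> Tuple[str, str]:
    """Return (base_url, label) for a key, based only on its prefix."""
    if api_key.startswith("sk-kimi-"):
        # Kimi Coding Plan keys carry the sk-kimi- prefix.
        return KIMI_CODE_BASE_URL, "Kimi Coding"
    # Any other key is treated as a legacy Moonshot key.
    return MOONSHOT_BASE_URL, "Moonshot"
```

Because the decision is a pure function of the key, no base URL prompt is needed, which matches the flow clearing any manual base URL override before continuing.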
@@ -1479,7 +1595,7 @@ def cmd_config(args):

 def cmd_version(args):
    """Show version."""
-    print(f"Hermes Agent v{__version__}")
+    print(f"Hermes Agent v{__version__} ({__release_date__})")
    print(f"Project: {PROJECT_ROOT}")
    
    # Show Python version
@@ -1884,6 +2000,18 @@ For more help on a command:
        default=False,
        help="Run in an isolated git worktree (for parallel agents)"
    )
+    parser.add_argument(
+        "--yolo",
+        action="store_true",
+        default=False,
+        help="Bypass all dangerous command approval prompts (use at your own risk)"
+    )
+    parser.add_argument(
+        "--pass-session-id",
+        action="store_true",
+        default=False,
+        help="Include the session ID in the agent's system prompt"
+    )
    
    subparsers = parser.add_subparsers(dest="command", help="Command to run")
    
@@ -1918,6 +2046,11 @@ For more help on a command:
        action="store_true",
        help="Verbose output"
    )
+    chat_parser.add_argument(
+        "-Q", "--quiet",
+        action="store_true",
+        help="Quiet mode for programmatic use: suppress banner, spinner, and tool previews. Only output the final response and session info."
+    )
    chat_parser.add_argument(
        "--resume", "-r",
        metavar="SESSION_ID",
@@ -1944,6 +2077,18 @@ For more help on a command:
        default=False,
        help="Enable filesystem checkpoints before destructive file operations (use /rollback to restore)"
    )
+    chat_parser.add_argument(
+        "--yolo",
+        action="store_true",
+        default=False,
+        help="Bypass all dangerous command approval prompts (use at your own risk)"
+    )
+    chat_parser.add_argument(
+        "--pass-session-id",
+        action="store_true",
+        default=False,
+        help="Include the session ID in the agent's system prompt"
+    )
    chat_parser.set_defaults(func=cmd_chat)

    # =========================================================================
@@ -2230,8 +2375,8 @@ For more help on a command:
    # =========================================================================
    skills_parser = subparsers.add_parser(
        "skills",
-        help="Skills Hub — search, install, and manage skills from online registries",
-        description="Search, install, inspect, audit, and manage skills from GitHub, ClawHub, and other registries."
+        help="Search, install, configure, and manage skills",
+        description="Search, install, inspect, audit, configure, and manage skills from GitHub, ClawHub, and other registries."
    )
    skills_subparsers = skills_parser.add_subparsers(dest="skills_action")

@@ -2256,7 +2401,7 @@ For more help on a command:
    skills_inspect.add_argument("identifier", help="Skill identifier")

    skills_list = skills_subparsers.add_parser("list", help="List installed skills")
-    skills_list.add_argument("--source", default="all", choices=["all", "hub", "builtin"])
+    skills_list.add_argument("--source", default="all", choices=["all", "hub", "builtin", "local"])

    skills_audit = skills_subparsers.add_parser("audit", help="Re-scan installed hub skills")
    skills_audit.add_argument("name", nargs="?", help="Specific skill to audit (default: all)")
@@ -2285,9 +2430,17 @@ For more help on a command:
    tap_rm = tap_subparsers.add_parser("remove", help="Remove a tap")
    tap_rm.add_argument("name", help="Tap name to remove")

+    # config sub-action: interactive enable/disable
+    skills_subparsers.add_parser("config", help="Interactive skill configuration — enable/disable individual skills")
+
    def cmd_skills(args):
-        from hermes_cli.skills_hub import skills_command
-        skills_command(args)
+        # Route 'config' action to skills_config module
+        if getattr(args, 'skills_action', None) == 'config':
+            from hermes_cli.skills_config import skills_command as skills_config_command
+            skills_config_command(args)
+        else:
+            from hermes_cli.skills_hub import skills_command
+            skills_command(args)

    skills_parser.set_defaults(func=cmd_skills)

@@ -2299,13 +2452,17 @@ For more help on a command:
        help="Configure which tools are enabled per platform",
        description="Interactive tool configuration — enable/disable tools for CLI, Telegram, Discord, etc."
    )
+    tools_parser.add_argument(
+        "--summary",
+        action="store_true",
+        help="Print a summary of enabled tools per platform and exit"
+    )

    def cmd_tools(args):
        from hermes_cli.tools_config import tools_command
        tools_command(args)

    tools_parser.set_defaults(func=cmd_tools)
-
    # =========================================================================
    # sessions command
    # =========================================================================
@@ -2528,6 +2685,69 @@ For more help on a command:

    insights_parser.set_defaults(func=cmd_insights)

+    # =========================================================================
+    # claw command (OpenClaw migration)
+    # =========================================================================
+    claw_parser = subparsers.add_parser(
+        "claw",
+        help="OpenClaw migration tools",
+        description="Migrate settings, memories, skills, and API keys from OpenClaw to Hermes"
+    )
+    claw_subparsers = claw_parser.add_subparsers(dest="claw_action")
+
+    # claw migrate
+    claw_migrate = claw_subparsers.add_parser(
+        "migrate",
+        help="Migrate from OpenClaw to Hermes",
+        description="Import settings, memories, skills, and API keys from an OpenClaw installation"
+    )
+    claw_migrate.add_argument(
+        "--source",
+        help="Path to OpenClaw directory (default: ~/.openclaw)"
+    )
+    claw_migrate.add_argument(
+        "--dry-run",
+        action="store_true",
+        help="Preview what would be migrated without making changes"
+    )
+    claw_migrate.add_argument(
+        "--preset",
+        choices=["user-data", "full"],
+        default="full",
+        help="Migration preset (default: full). 'user-data' excludes secrets"
+    )
+    claw_migrate.add_argument(
+        "--overwrite",
+        action="store_true",
+        help="Overwrite existing files (default: skip conflicts)"
+    )
+    claw_migrate.add_argument(
+        "--migrate-secrets",
+        action="store_true",
+        help="Include allowlisted secrets (TELEGRAM_BOT_TOKEN, API keys, etc.)"
+    )
+    claw_migrate.add_argument(
+        "--workspace-target",
+        help="Absolute path to copy workspace instructions into"
+    )
+    claw_migrate.add_argument(
+        "--skill-conflict",
+        choices=["skip", "overwrite", "rename"],
+        default="skip",
+        help="How to handle skill name conflicts (default: skip)"
+    )
+    claw_migrate.add_argument(
+        "--yes", "-y",
+        action="store_true",
+        help="Skip confirmation prompts"
+    )
+
+    def cmd_claw(args):
+        from hermes_cli.claw import claw_command
+        claw_command(args)
+
+    claw_parser.set_defaults(func=cmd_claw)
+
    # =========================================================================
    # version command
    # =========================================================================
@@ -31,6 +31,19 @@ OPENROUTER_MODELS: list[tuple[str, str]] = [
 ]

 _PROVIDER_MODELS: dict[str, list[str]] = {
+    "nous": [
+        "claude-opus-4-6",
+        "claude-sonnet-4-6",
+        "gpt-5.4",
+        "gemini-3-flash",
+        "gemini-3.0-pro-preview",
+        "deepseek-v3.2",
+    ],
+    "openai-codex": [
+        "gpt-5.2-codex",
+        "gpt-5.1-codex-mini",
+        "gpt-5.1-codex-max",
+    ],
    "zai": [
        "glm-5",
        "glm-4.7",
@@ -38,8 +51,10 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
        "glm-4.5-flash",
    ],
    "kimi-coding": [
+        "kimi-for-coding",
        "kimi-k2.5",
        "kimi-k2-thinking",
+        "kimi-k2-thinking-turbo",
        "kimi-k2-turbo-preview",
        "kimi-k2-0905-preview",
    ],
@@ -164,10 +179,22 @@ def parse_model_input(raw: str, current_provider: str) -> tuple[str, str]:


 def curated_models_for_provider(provider: Optional[str]) -> list[tuple[str, str]]:
-    """Return ``(model_id, description)`` tuples for a provider's curated list."""
+    """Return ``(model_id, description)`` tuples for a provider's model list.
+
+    Tries to fetch the live model list from the provider's API first,
+    falling back to the static ``_PROVIDER_MODELS`` catalog if the API
+    is unreachable.
+    """
    normalized = normalize_provider(provider)
    if normalized == "openrouter":
        return list(OPENROUTER_MODELS)
+
+    # Try live API first (Codex, Nous, etc. all support /models)
+    live = provider_model_ids(normalized)
+    if live:
+        return [(m, "") for m in live]
+
+    # Fallback to static catalog
    models = _PROVIDER_MODELS.get(normalized, [])
    return [(m, "") for m in models]

@@ -184,7 +211,11 @@ def normalize_provider(provider: Optional[str]) -> str:


 def provider_model_ids(provider: Optional[str]) -> list[str]:
-    """Return the best known model catalog for a provider."""
+    """Return the best known model catalog for a provider.
+
+    Tries live API endpoints for providers that support them (Codex, Nous),
+    falling back to static lists.
+    """
    normalized = normalize_provider(provider)
    if normalized == "openrouter":
        return model_ids()
@@ -192,6 +223,17 @@ def provider_model_ids(provider: Optional[str]) -> list[str]:
        from hermes_cli.codex_models import get_codex_model_ids

        return get_codex_model_ids()
+    if normalized == "nous":
+        # Try live Nous Portal /models endpoint
+        try:
+            from hermes_cli.auth import fetch_nous_models, resolve_nous_runtime_credentials
+            creds = resolve_nous_runtime_credentials()
+            if creds:
+                live = fetch_nous_models(creds.get("api_key", ""), creds.get("base_url", ""))
+                if live:
+                    return live
+        except Exception:
+            pass
    return list(_PROVIDER_MODELS.get(normalized, []))
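The live-first, static-fallback pattern used by `provider_model_ids` can be sketched on its own as below. This is an illustration under stated assumptions, not the project's code: `STATIC_CATALOG`, `model_ids_with_fallback`, and the injected `fetch_live` callable are hypothetical stand-ins for `_PROVIDER_MODELS` and the provider-specific fetchers.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for the static _PROVIDER_MODELS catalog.
STATIC_CATALOG: Dict[str, List[str]] = {
    "kimi-coding": ["kimi-for-coding", "kimi-k2.5"],
}


def model_ids_with_fallback(
    provider: str,
    fetch_live: Callable[[str], List[str]],
) -> List[str]:
    """Prefer the live model list; fall back to the static catalog."""
    try:
        live = fetch_live(provider)
    except Exception:
        # An unreachable API must never break model selection.
        live = []
    if live:
        return list(live)
    return list(STATIC_CATALOG.get(provider, []))
```

Passing the fetcher in as a parameter keeps the sketch testable without network access; an empty live result is treated the same as a failure, as in the diff.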


@@ -263,6 +305,15 @@ def validate_requested_model(
            "message": "Model names cannot contain spaces.",
        }

+    # Custom endpoints can serve any model — skip validation
+    if normalized == "custom":
+        return {
+            "accepted": True,
+            "persist": True,
+            "recognized": False,
+            "message": None,
+        }
+
    # Probe the live API to check if the model actually exists
    api_models = fetch_api_models(api_key, base_url)

@@ -0,0 +1,181 @@
+"""
+Skills configuration for Hermes Agent.
+`hermes skills config` enters this module.
+
+Toggle individual skills or categories on/off, globally or per-platform.
+Config stored in ~/.hermes/config.yaml under:
+
+  skills:
+    disabled: [skill-a, skill-b]          # global disabled list
+    platform_disabled:                    # per-platform overrides
+      telegram: [skill-c]
+      cli: []
+"""
+from typing import Dict, List, Optional, Set
+
+from hermes_cli.config import load_config, save_config
+from hermes_cli.colors import Colors, color
+
+PLATFORMS = {
+    "cli":      "🖥️  CLI",
+    "telegram": "📱 Telegram",
+    "discord":  "💬 Discord",
+    "slack":    "💼 Slack",
+    "whatsapp": "📱 WhatsApp",
+    "signal":   "📡 Signal",
+    "email":    "📧 Email",
+}
+
+# ─── Config Helpers ───────────────────────────────────────────────────────────
+
+def get_disabled_skills(config: dict, platform: Optional[str] = None) -> Set[str]:
+    """Return disabled skill names. A platform list, if present, replaces the global list; otherwise the global list is used."""
+    skills_cfg = config.get("skills", {})
+    global_disabled = set(skills_cfg.get("disabled", []))
+    if platform is None:
+        return global_disabled
+    platform_disabled = skills_cfg.get("platform_disabled", {}).get(platform)
+    if platform_disabled is None:
+        return global_disabled
+    return set(platform_disabled)
+
+
+def save_disabled_skills(config: dict, disabled: Set[str], platform: Optional[str] = None):
+    """Persist disabled skill names to config."""
+    config.setdefault("skills", {})
+    if platform is None:
+        config["skills"]["disabled"] = sorted(disabled)
+    else:
+        config["skills"].setdefault("platform_disabled", {})
+        config["skills"]["platform_disabled"][platform] = sorted(disabled)
+    save_config(config)
+
+
+# ─── Skill Discovery ─────────────────────────────────────────────────────────
+
+def _list_all_skills() -> List[dict]:
+    """Return all installed skills (ignoring disabled state)."""
+    try:
+        from tools.skills_tool import _find_all_skills
+        return _find_all_skills(skip_disabled=True)
+    except Exception:
+        return []
+
+
+def _get_categories(skills: List[dict]) -> List[str]:
+    """Return sorted unique category names (None -> 'uncategorized')."""
+    return sorted({s["category"] or "uncategorized" for s in skills})
+
+
+# ─── Platform Selection ──────────────────────────────────────────────────────
+
+def _select_platform() -> Optional[str]:
+    """Ask user which platform to configure, or global."""
+    options = [("global", "All platforms (global default)")] + list(PLATFORMS.items())
+    print()
+    print(color("  Configure skills for:", Colors.BOLD))
+    for i, (key, label) in enumerate(options, 1):
+        print(f"  {i}. {label}")
+    print()
+    try:
+        raw = input(color("  Select [1]: ", Colors.YELLOW)).strip()
+    except (KeyboardInterrupt, EOFError):
+        return None
+    if not raw:
+        return None  # global
+    try:
+        idx = int(raw) - 1
+        if 0 <= idx < len(options):
+            key = options[idx][0]
+            return None if key == "global" else key
+    except ValueError:
+        pass
+    return None
+
+
+# ─── Category Toggle ─────────────────────────────────────────────────────────
+
+def _toggle_by_category(skills: List[dict], disabled: Set[str]) -> Set[str]:
+    """Toggle all skills in a category at once."""
+    from hermes_cli.curses_ui import curses_checklist
+
+    categories = _get_categories(skills)
+    cat_labels = []
+    # A category is "enabled" (checked) when NOT all its skills are disabled
+    pre_selected = set()
+    for i, cat in enumerate(categories):
+        cat_skills = [s["name"] for s in skills if (s["category"] or "uncategorized") == cat]
+        cat_labels.append(f"{cat} ({len(cat_skills)} skills)")
+        if not all(s in disabled for s in cat_skills):
+            pre_selected.add(i)
+
+    chosen = curses_checklist(
+        "Categories — toggle entire categories",
+        cat_labels, pre_selected, cancel_returns=pre_selected,
+    )
+
+    new_disabled = set(disabled)
+    for i, cat in enumerate(categories):
+        cat_skills = {s["name"] for s in skills if (s["category"] or "uncategorized") == cat}
+        if i in chosen:
+            new_disabled -= cat_skills  # category enabled → remove from disabled
+        else:
+            new_disabled |= cat_skills  # category disabled → add to disabled
+    return new_disabled
+
+
+# ─── Entry Point ──────────────────────────────────────────────────────────────
+
+def skills_command(args=None):
+    """Entry point for `hermes skills config`."""
+    from hermes_cli.curses_ui import curses_checklist
+
+    config = load_config()
+    skills = _list_all_skills()
+
+    if not skills:
+        print(color("  No skills installed.", Colors.DIM))
+        return
+
+    # Step 1: Select platform
+    platform = _select_platform()
+    platform_label = PLATFORMS.get(platform, "All platforms") if platform else "All platforms"
+
+    # Step 2: Select mode — individual or by category
+    print()
+    print(color(f"  Configure for: {platform_label}", Colors.DIM))
+    print()
+    print("  1. Toggle individual skills")
+    print("  2. Toggle by category")
+    print()
+    try:
+        mode = input(color("  Select [1]: ", Colors.YELLOW)).strip() or "1"
+    except (KeyboardInterrupt, EOFError):
+        return
+
+    disabled = get_disabled_skills(config, platform)
+
+    if mode == "2":
+        new_disabled = _toggle_by_category(skills, disabled)
+    else:
+        # Build labels and map indices → skill names
+        labels = [
+            f"{s['name']}  ({s['category'] or 'uncategorized'})  —  {s['description'][:55]}"
+            for s in skills
+        ]
+        # "selected" = enabled (not disabled) — matches the [✓] convention
+        pre_selected = {i for i, s in enumerate(skills) if s["name"] not in disabled}
+        chosen = curses_checklist(
+            f"Skills for {platform_label}",
+            labels, pre_selected, cancel_returns=pre_selected,
+        )
+        # Anything NOT chosen is disabled
+        new_disabled = {skills[i]["name"] for i in range(len(skills)) if i not in chosen}
+
+    if new_disabled == disabled:
+        print(color("  No changes.", Colors.DIM))
+        return
+
+    save_disabled_skills(config, new_disabled, platform)
+    enabled_count = sum(1 for s in skills if s["name"] not in new_disabled)  # ignore stale names no longer installed
+    print(color(f"✓ Saved: {enabled_count} enabled, {len(new_disabled)} disabled ({platform_label}).", Colors.GREEN))
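The lookup rule in `get_disabled_skills` is easy to misread: a per-platform list overrides the global list outright rather than merging with it, and an explicit empty list counts as an override. The sketch below copies the function from the diff and runs it against the sample config from the module docstring; the `sample` variable name is only for illustration.

```python
from typing import Optional, Set


def get_disabled_skills(config: dict, platform: Optional[str] = None) -> Set[str]:
    """Return disabled skill names; a platform list replaces the global one."""
    skills_cfg = config.get("skills", {})
    global_disabled = set(skills_cfg.get("disabled", []))
    if platform is None:
        return global_disabled
    platform_disabled = skills_cfg.get("platform_disabled", {}).get(platform)
    if platform_disabled is None:
        # No entry for this platform: inherit the global list.
        return global_disabled
    # An entry exists (even an empty one): it overrides the global list.
    return set(platform_disabled)


# Sample config taken from the module docstring.
sample = {
    "skills": {
        "disabled": ["skill-a", "skill-b"],
        "platform_disabled": {"telegram": ["skill-c"], "cli": []},
    }
}

print(sorted(get_disabled_skills(sample)))              # global list
print(sorted(get_disabled_skills(sample, "telegram")))  # platform override
print(sorted(get_disabled_skills(sample, "cli")))       # empty override
print(sorted(get_disabled_skills(sample, "discord")))   # falls back to global
```

Under this config, `cli` has nothing disabled even though two skills are disabled globally, because its empty list is an explicit override.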
@@ -407,14 +407,16 @@ def do_inspect(identifier: str, console: Optional[Console] = None) -> None:


 def do_list(source_filter: str = "all", console: Optional[Console] = None) -> None:
-    """List installed skills, distinguishing builtins from hub-installed."""
+    """List installed skills, distinguishing hub, builtin, and local skills."""
    from tools.skills_hub import HubLockFile, ensure_hub_dirs
+    from tools.skills_sync import _read_manifest
    from tools.skills_tool import _find_all_skills

    c = console or _console
    ensure_hub_dirs()
    lock = HubLockFile()
    hub_installed = {e["name"]: e for e in lock.list_installed()}
+    builtin_names = set(_read_manifest())

    all_skills = _find_all_skills()

@@ -424,30 +426,42 @@ def do_list(source_filter: str = "all", console: Optional[Console] = None) -> No
    table.add_column("Source", style="dim")
    table.add_column("Trust", style="dim")

+    hub_count = 0
+    builtin_count = 0
+    local_count = 0
+
    for skill in sorted(all_skills, key=lambda s: (s.get("category") or "", s["name"])):
        name = skill["name"]
        category = skill.get("category", "")
        hub_entry = hub_installed.get(name)

        if hub_entry:
+            source_type = "hub"
            source_display = hub_entry.get("source", "hub")
            trust = hub_entry.get("trust_level", "community")
-        else:
+            hub_count += 1
+        elif name in builtin_names:
+            source_type = "builtin"
            source_display = "builtin"
            trust = "builtin"
+            builtin_count += 1
+        else:
+            source_type = "local"
+            source_display = "local"
+            trust = "local"
+            local_count += 1

-        if source_filter == "hub" and not hub_entry:
-            continue
-        if source_filter == "builtin" and hub_entry:
+        if source_filter != "all" and source_filter != source_type:
            continue

-        trust_style = {"builtin": "bright_cyan", "trusted": "green", "community": "yellow"}.get(trust, "dim")
+        trust_style = {"builtin": "bright_cyan", "trusted": "green", "community": "yellow", "local": "dim"}.get(trust, "dim")
        trust_label = "official" if source_display == "official" else trust
        table.add_row(name, category, source_display, f"[{trust_style}]{trust_label}[/]")

    c.print(table)
-    c.print(f"[dim]{len(hub_installed)} hub-installed, "
-            f"{len(all_skills) - len(hub_installed)} builtin[/]\n")
+    c.print(
+        f"[dim]{hub_count} hub-installed, {builtin_count} builtin, {local_count} local[/]\n"
+    )


 def do_audit(name: Optional[str] = None, console: Optional[Console] = None) -> None:
@@ -1014,7 +1028,7 @@ def _print_skills_help(console: Console) -> None:
        "  [cyan]search[/] <query>              Search registries for skills\n"
        "  [cyan]install[/] <identifier>        Install a skill (with security scan)\n"
        "  [cyan]inspect[/] <identifier>        Preview a skill without installing\n"
-        "  [cyan]list[/] [--source hub|builtin] List installed skills\n"
+        "  [cyan]list[/] [--source hub|builtin|local] List installed skills\n"
        "  [cyan]audit[/] [name]                Re-scan hub skills for security\n"
        "  [cyan]uninstall[/] <name>            Remove a hub-installed skill\n"
        "  [cyan]publish[/] <path> --repo <r>   Publish a skill to GitHub via PR\n"
@@ -208,6 +208,7 @@ def show_status(args):
        "WhatsApp": ("WHATSAPP_ENABLED", None),
        "Signal": ("SIGNAL_HTTP_URL", "SIGNAL_HOME_CHANNEL"),
        "Slack": ("SLACK_BOT_TOKEN", None),
+        "Email": ("EMAIL_ADDRESS", "EMAIL_HOME_ADDRESS"),
    }
    
    for name, (token_var, home_var) in platforms.items():
@@ -11,7 +11,7 @@ the `platform_toolsets` key.

 import sys
 from pathlib import Path
-from typing import Dict, List, Set
+from typing import Dict, List, Optional, Set

 import os

@@ -108,6 +108,8 @@ PLATFORMS = {
    "discord":  {"label": "💬 Discord",    "default_toolset": "hermes-discord"},
    "slack":    {"label": "💼 Slack",      "default_toolset": "hermes-slack"},
    "whatsapp": {"label": "📱 WhatsApp",   "default_toolset": "hermes-whatsapp"},
+    "signal":   {"label": "📡 Signal",     "default_toolset": "hermes-signal"},
+    "email":    {"label": "📧 Email",      "default_toolset": "hermes-email"},
 }


@@ -308,6 +310,22 @@ def _get_enabled_platforms() -> List[str]:
    return enabled


+def _platform_toolset_summary(config: dict, platforms: Optional[List[str]] = None) -> Dict[str, Set[str]]:
+    """Return a summary of enabled toolsets per platform.
+
+    When ``platforms`` is None, this uses ``_get_enabled_platforms`` to
+    auto-detect platforms. Tests can pass an explicit list to avoid relying
+    on environment variables.
+    """
+    if platforms is None:
+        platforms = _get_enabled_platforms()
+
+    summary: Dict[str, Set[str]] = {}
+    for pkey in platforms:
+        summary[pkey] = _get_platform_tools(config, pkey)
+    return summary
+
+
 def _get_platform_tools(config: dict, platform: str) -> Set[str]:
    """Resolve which individual toolset names are enabled for a platform."""
    from toolsets import resolve_toolset, TOOLSETS
@@ -447,6 +465,7 @@ def _prompt_choice(question: str, choices: list, default: int = 0) -> int:

 def _prompt_toolset_checklist(platform_label: str, enabled: Set[str]) -> Set[str]:
    """Multi-select checklist of toolsets. Returns set of selected toolset keys."""
+    from hermes_cli.curses_ui import curses_checklist

    labels = []
    for ts_key, ts_label, ts_desc in CONFIGURABLE_TOOLSETS:
@@ -455,112 +474,18 @@ def _prompt_toolset_checklist(platform_label: str, enabled: Set[str]) -> Set[str
            suffix = "  [no API key]"
        labels.append(f"{ts_label}  ({ts_desc}){suffix}")

-    pre_selected_indices = [
+    pre_selected = {
        i for i, (ts_key, _, _) in enumerate(CONFIGURABLE_TOOLSETS)
        if ts_key in enabled
-    ]
+    }

-    # Curses-based multi-select — arrow keys + space to toggle + enter to confirm.
-    # simple_term_menu has rendering bugs in tmux, iTerm, and other terminals.
-    try:
-        import curses
-        selected = set(pre_selected_indices)
-        result_holder = [None]
-
-        def _curses_checklist(stdscr):
-            curses.curs_set(0)
-            if curses.has_colors():
-                curses.start_color()
-                curses.use_default_colors()
-                curses.init_pair(1, curses.COLOR_GREEN, -1)
-                curses.init_pair(2, curses.COLOR_YELLOW, -1)
-                curses.init_pair(3, 8, -1)  # dim gray
-            cursor = 0
-            scroll_offset = 0
-
-            while True:
-                stdscr.clear()
-                max_y, max_x = stdscr.getmaxyx()
-                header = f"Tools for {platform_label}  —  ↑↓ navigate, SPACE toggle, ENTER confirm"
-                try:
-                    stdscr.addnstr(0, 0, header, max_x - 1, curses.A_BOLD | curses.color_pair(2) if curses.has_colors() else curses.A_BOLD)
-                except curses.error:
-                    pass
-
-                visible_rows = max_y - 3
-                if cursor < scroll_offset:
-                    scroll_offset = cursor
-                elif cursor >= scroll_offset + visible_rows:
-                    scroll_offset = cursor - visible_rows + 1
-
-                for draw_i, i in enumerate(range(scroll_offset, min(len(labels), scroll_offset + visible_rows))):
-                    y = draw_i + 2
-                    if y >= max_y - 1:
-                        break
-                    check = "✓" if i in selected else " "
-                    arrow = "→" if i == cursor else " "
-                    line = f" {arrow} [{check}] {labels[i]}"
-
-                    attr = curses.A_NORMAL
-                    if i == cursor:
-                        attr = curses.A_BOLD
-                        if curses.has_colors():
-                            attr |= curses.color_pair(1)
-                    try:
-                        stdscr.addnstr(y, 0, line, max_x - 1, attr)
-                    except curses.error:
-                        pass
-
-                stdscr.refresh()
-                key = stdscr.getch()
-
-                if key in (curses.KEY_UP, ord('k')):
-                    cursor = (cursor - 1) % len(labels)
-                elif key in (curses.KEY_DOWN, ord('j')):
-                    cursor = (cursor + 1) % len(labels)
-                elif key == ord(' '):
-                    if cursor in selected:
-                        selected.discard(cursor)
-                    else:
-                        selected.add(cursor)
-                elif key in (curses.KEY_ENTER, 10, 13):
-                    result_holder[0] = {CONFIGURABLE_TOOLSETS[i][0] for i in selected}
-                    return
-                elif key in (27, ord('q')):  # ESC or q
-                    result_holder[0] = enabled
-                    return
-
-        curses.wrapper(_curses_checklist)
-        return result_holder[0] if result_holder[0] is not None else enabled
-
-    except Exception:
-        pass  # fall through to numbered toggle
-
-    # Final fallback: numbered toggle (Windows without curses, etc.)
-    selected = set(pre_selected_indices)
-    print(color(f"\n  Tools for {platform_label}", Colors.YELLOW))
-    print(color("  Toggle by number, Enter to confirm.\n", Colors.DIM))
-
-    while True:
-        for i, label in enumerate(labels):
-            marker = color("[✓]", Colors.GREEN) if i in selected else "[ ]"
-            print(f"  {marker} {i + 1:>2}. {label}")
-        print()
-        try:
-            val = input(color("  Toggle # (or Enter to confirm): ", Colors.DIM)).strip()
-            if not val:
-                break
-            idx = int(val) - 1
-            if 0 <= idx < len(labels):
-                if idx in selected:
-                    selected.discard(idx)
-                else:
-                    selected.add(idx)
-        except (ValueError, KeyboardInterrupt, EOFError):
-            return enabled
-        print()
-
-    return {CONFIGURABLE_TOOLSETS[i][0] for i in selected}
+    chosen = curses_checklist(
+        f"Tools for {platform_label}",
+        labels,
+        pre_selected,
+        cancel_returns=pre_selected,
+    )
+    return {CONFIGURABLE_TOOLSETS[i][0] for i in chosen}


 # ─── Provider-Aware Configuration ────────────────────────────────────────────
@@ -874,6 +799,26 @@ def tools_command(args=None, first_install: bool = False, config: dict = None):
    enabled_platforms = _get_enabled_platforms()

    print()
+
+    # Non-interactive summary mode for CLI usage
+    if getattr(args, "summary", False):
+        total = len(CONFIGURABLE_TOOLSETS)
+        print(color("⚕ Tool Summary", Colors.CYAN, Colors.BOLD))
+        print()
+        summary = _platform_toolset_summary(config, enabled_platforms)
+        for pkey in enabled_platforms:
+            pinfo = PLATFORMS[pkey]
+            enabled = summary.get(pkey, set())
+            count = len(enabled)
+            print(color(f"  {pinfo['label']}", Colors.BOLD) + color(f"  ({count}/{total})", Colors.DIM))
+            if enabled:
+                for ts_key in sorted(enabled):
+                    label = next((l for k, l, _ in CONFIGURABLE_TOOLSETS if k == ts_key), ts_key)
+                    print(color(f"    ✓ {label}", Colors.GREEN))
+            else:
+                print(color("    (none enabled)", Colors.DIM))
+        print()
+        return
    print(color("⚕ Hermes Tool Configuration", Colors.CYAN, Colors.BOLD))
    print(color("  Enable or disable tools per platform.", Colors.DIM))
    print(color("  Tools that need API keys will be configured when enabled.", Colors.DIM))
@@ -941,22 +886,68 @@ def tools_command(args=None, first_install: bool = False, config: dict = None):
        platform_choices.append(f"Configure {pinfo['label']}  ({count}/{total} enabled)")
        platform_keys.append(pkey)

+    if len(platform_keys) > 1:
+        platform_choices.append("Configure all platforms (global)")
    platform_choices.append("Reconfigure an existing tool's provider or API key")
    platform_choices.append("Done")

+    # Index offsets for the extra options after per-platform entries
+    _global_idx = len(platform_keys) if len(platform_keys) > 1 else -1
+    _reconfig_idx = len(platform_keys) + (1 if len(platform_keys) > 1 else 0)
+    _done_idx = _reconfig_idx + 1
+
    while True:
        idx = _prompt_choice("Select an option:", platform_choices, default=0)

        # "Done" selected
-        if idx == len(platform_keys) + 1:
+        if idx == _done_idx:
            break

        # "Reconfigure" selected
-        if idx == len(platform_keys):
+        if idx == _reconfig_idx:
            _reconfigure_tool(config)
            print()
            continue

+        # "Configure all platforms (global)" selected
+        if idx == _global_idx:
+            # Use the union of all platforms' current tools as the starting state
+            all_current = set()
+            for pk in platform_keys:
+                all_current |= _get_platform_tools(config, pk)
+            new_enabled = _prompt_toolset_checklist("All platforms", all_current)
+            if new_enabled != all_current:
+                for pk in platform_keys:
+                    prev = _get_platform_tools(config, pk)
+                    added = new_enabled - prev
+                    removed = prev - new_enabled
+                    pinfo_inner = PLATFORMS[pk]
+                    if added or removed:
+                        print(color(f"  {pinfo_inner['label']}:", Colors.DIM))
+                        for ts in sorted(added):
+                            label = next((l for k, l, _ in CONFIGURABLE_TOOLSETS if k == ts), ts)
+                            print(color(f"    + {label}", Colors.GREEN))
+                        for ts in sorted(removed):
+                            label = next((l for k, l, _ in CONFIGURABLE_TOOLSETS if k == ts), ts)
+                            print(color(f"    - {label}", Colors.RED))
+                    # Configure API keys for newly enabled tools
+                    for ts_key in sorted(added):
+                        if (TOOL_CATEGORIES.get(ts_key) or TOOLSET_ENV_REQUIREMENTS.get(ts_key)):
+                            if not _toolset_has_keys(ts_key):
+                                _configure_toolset(ts_key, config)
+                    _save_platform_tools(config, pk, new_enabled)
+                save_config(config)
+                print(color("  ✓ Saved configuration for all platforms", Colors.GREEN))
+                # Update choice labels
+                for ci, pk in enumerate(platform_keys):
+                    new_count = len(_get_platform_tools(config, pk))
+                    total = len(CONFIGURABLE_TOOLSETS)
+                    platform_choices[ci] = f"Configure {PLATFORMS[pk]['label']}  ({new_count}/{total} enabled)"
+            else:
+                print(color("  No changes", Colors.DIM))
+            print()
+            continue
+
        pkey = platform_keys[idx]
        pinfo = PLATFORMS[pkey]
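
The index bookkeeping in the hunk above can be checked in isolation. The following is a minimal sketch, assuming the menu is built exactly as in the diff; `build_menu` is a hypothetical helper introduced only for illustration (the diff computes these values inline), and the per-platform labels are simplified.

```python
from typing import List, Tuple


def build_menu(platform_labels: List[str]) -> Tuple[List[str], int, int, int]:
    """Return (choices, global_idx, reconfig_idx, done_idx).

    Hypothetical helper: mirrors the inline arithmetic from the diff.
    """
    choices = [f"Configure {label}" for label in platform_labels]
    n = len(platform_labels)
    has_global = n > 1  # the "all platforms" entry only appears with 2+ platforms
    if has_global:
        choices.append("Configure all platforms (global)")
    choices.append("Reconfigure an existing tool's provider or API key")
    choices.append("Done")

    # -1 is a sentinel: it can never equal a real (non-negative) menu index.
    global_idx = n if has_global else -1
    reconfig_idx = n + (1 if has_global else 0)
    done_idx = reconfig_idx + 1
    return choices, global_idx, reconfig_idx, done_idx


choices, global_idx, reconfig_idx, done_idx = build_menu(["CLI", "Telegram"])
print(choices[global_idx], "|", choices[reconfig_idx], "|", choices[done_idx])
```

With a single platform the global entry is omitted, so "Reconfigure" and "Done" keep the positions they had before this change.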

@@ -189,29 +189,30 @@ class MiniSWERunner:
        )
        self.logger = logging.getLogger(__name__)
        
-        # Initialize OpenAI client - defaults to OpenRouter
-        from openai import OpenAI
-        
-        client_kwargs = {}
-        
-        # Default to OpenRouter if no base_url provided
-        if base_url:
-            client_kwargs["base_url"] = base_url
+        # Initialize LLM client via centralized provider router.
+        # If explicit api_key/base_url are provided (e.g. from CLI args),
+        # construct directly.  Otherwise use the router for OpenRouter.
+        if api_key or base_url:
+            from openai import OpenAI
+            client_kwargs = {
+                "base_url": base_url or "https://openrouter.ai/api/v1",
+                "api_key": api_key or os.getenv(
+                    "OPENROUTER_API_KEY",
+                    os.getenv("ANTHROPIC_API_KEY",
+                              os.getenv("OPENAI_API_KEY", ""))),
+            }
+            self.client = OpenAI(**client_kwargs)
        else:
-            client_kwargs["base_url"] = "https://openrouter.ai/api/v1"
-
-
-        
-        # Handle API key - OpenRouter is the primary provider
-        if api_key:
-            client_kwargs["api_key"] = api_key
-        else:
-            client_kwargs["api_key"] = os.getenv(
-                "OPENROUTER_API_KEY",
-                os.getenv("ANTHROPIC_API_KEY", os.getenv("OPENAI_API_KEY", ""))
-            )
-        
-        self.client = OpenAI(**client_kwargs)
+            from agent.auxiliary_client import resolve_provider_client
+            self.client, _ = resolve_provider_client("openrouter", model=model)
+            if self.client is None:
+                # Fallback: try auto-detection
+                self.client, _ = resolve_provider_client("auto", model=model)
+            if self.client is None:
+                from openai import OpenAI
+                self.client = OpenAI(
+                    base_url="https://openrouter.ai/api/v1",
+                    api_key=os.getenv("OPENROUTER_API_KEY", ""))
        
        # Environment will be created per-task
        self.env = None
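
The nested `os.getenv` defaults in the explicit-credentials branch have a subtle property: the first variable that is present wins, even when its value is empty. A minimal sketch of that lookup order follows, assuming a plain mapping stands in for the process environment; `pick_api_key` is a hypothetical name, not part of the diff.

```python
from typing import Mapping


def pick_api_key(explicit: str, env: Mapping[str, str]) -> str:
    """Hypothetical helper mirroring `api_key or os.getenv(A, os.getenv(B, os.getenv(C, "")))`."""
    if explicit:
        return explicit
    # os.getenv only falls back to its default when the variable is absent,
    # so a variable set to "" still shadows the later ones.
    for name in ("OPENROUTER_API_KEY", "ANTHROPIC_API_KEY", "OPENAI_API_KEY"):
        if name in env:
            return env[name]
    return ""


print(pick_api_key("", {"OPENAI_API_KEY": "sk-example"}))
```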
@@ -14,6 +14,22 @@ metadata:

 Use this skill when a user wants to move their OpenClaw setup into Hermes Agent with minimal manual cleanup.

+## CLI Command
+
+For a quick migration straight from the command line, use the built-in CLI command:
+
+```bash
+hermes claw migrate              # Full interactive migration
+hermes claw migrate --dry-run    # Preview what would be migrated
+hermes claw migrate --preset user-data   # Migrate without secrets
+hermes claw migrate --overwrite  # Overwrite existing conflicts
+hermes claw migrate --source /custom/path/.openclaw  # Custom source
+```
+
+The CLI command runs the same migration script described below. Use this skill (via the agent) when you want an interactive, guided migration with dry-run previews and per-item conflict resolution.
+
+**First-time setup:** The `hermes setup` wizard automatically detects `~/.openclaw` and offers migration before configuration begins.
+
 ## What this skill does

 It uses `scripts/openclaw_to_hermes.py` to:
@@ -0,0 +1,218 @@
+# Checkpoint & Rollback — Implementation Plan
+
+## Goal
+
+Automatic filesystem snapshots before destructive file operations, with user-facing rollback. The agent never sees or interacts with this — it's transparent infrastructure.
+
+## Design Principles
+
+1. **Not a tool** — the LLM never knows about it. Zero prompt tokens, zero tool schema overhead.
+2. **Once per turn** — checkpoint at most once per conversation turn (user message → agent response cycle), triggered lazily on the first file-mutating operation. Not on every write.
+3. **Opt-in via config** — disabled by default, enabled with `checkpoints: true` in config.yaml.
+4. **Works on any directory** — uses a shadow git repo completely separate from the user's project git. Works on git repos, non-git directories, anything.
+5. **User-facing rollback** — `/rollback` slash command (CLI + gateway) to list and restore checkpoints. Also `hermes rollback` CLI subcommand.
+
+## Architecture
+
+```
+~/.hermes/checkpoints/
+  {sha256(abs_dir)[:16]}/       # Shadow git repo per working directory
+    HEAD, refs/, objects/...    # Standard git internals
+    HERMES_WORKDIR              # Original dir path (for display)
+    info/exclude                # Default excludes (node_modules, .env, etc.)
+```
+
+### Core: CheckpointManager (new file: tools/checkpoint_manager.py)
+
+Adapted from PR #559's CheckpointStore. Key changes from the PR:
+
+- **Not a tool** — no schema, no registry entry, no handler
+- **Turn-scoped deduplication** — tracks `_checkpointed_dirs: Set[str]` per turn
+- **Configurable** — reads `checkpoints` config key
+- **Pruning** — keeps last N snapshots per directory (default 50), prunes on take
+
+```python
+class CheckpointManager:
+    def __init__(self, enabled: bool = False, max_snapshots: int = 50):
+        self.enabled = enabled
+        self.max_snapshots = max_snapshots
+        self._checkpointed_dirs: Set[str] = set()  # reset each turn
+
+    def new_turn(self):
+        """Call at start of each conversation turn to reset dedup."""
+        self._checkpointed_dirs.clear()
+
+    def ensure_checkpoint(self, working_dir: str, reason: str = "auto") -> None:
+        """Take a checkpoint if enabled and not already done this turn."""
+        if not self.enabled:
+            return
+        abs_dir = str(Path(working_dir).resolve())
+        if abs_dir in self._checkpointed_dirs:
+            return
+        self._checkpointed_dirs.add(abs_dir)
+        try:
+            self._take(abs_dir, reason)
+        except Exception as e:
+            logger.debug("Checkpoint failed (non-fatal): %s", e)
+
+    def list_checkpoints(self, working_dir: str) -> List[dict]:
+        """List available checkpoints for a directory."""
+        ...
+
+    def restore(self, working_dir: str, commit_hash: str) -> dict:
+        """Restore files to a checkpoint state."""
+        ...
+
+    def _take(self, working_dir: str, reason: str):
+        """Shadow git: add -A + commit. Prune if over max_snapshots."""
+        ...
+
+    def _prune(self, shadow_repo: Path):
+        """Keep only last max_snapshots commits."""
+        ...
+```
+
+### Integration Point: run_agent.py
+
+The AIAgent already owns the conversation loop. Add CheckpointManager as an instance attribute:
+
+```python
+class AIAgent:
+    def __init__(self, ...):
+        ...
+        # Checkpoint manager — reads config to determine if enabled
+        self._checkpoint_mgr = CheckpointManager(
+            enabled=config.get("checkpoints", False),
+            max_snapshots=config.get("checkpoint_max_snapshots", 50),
+        )
+```
+
+**Turn boundary** — in `run_conversation()`, call `new_turn()` once per user message, before entering the tool-calling loop. Calling it on every loop iteration would reset deduplication mid-turn and break the once-per-turn rule:
+
+```python
+# At the top of run_conversation(), before the main loop:
+self._checkpoint_mgr.new_turn()
+```
+
+**Trigger point** — in `_execute_tool_calls()`, before dispatching file-mutating tools:
+
+```python
+# Before the handle_function_call dispatch:
+if function_name in ("write_file", "patch"):
+    # Determine working dir from the file path in the args
+    file_path = function_args.get("path", "")  # old_string is patch content, not a path
+    if file_path:
+        work_dir = str(Path(file_path).parent.resolve())
+        self._checkpoint_mgr.ensure_checkpoint(work_dir, f"before {function_name}")
+```
+
+This means:
+- First `write_file` in a turn → checkpoint (fast, one `git add -A && git commit`)
+- Subsequent writes in the same turn → no-op (already checkpointed)
+- Next turn (new user message) → fresh checkpoint eligibility
+
+### Config
+
+Add to `DEFAULT_CONFIG` in `hermes_cli/config.py`:
+
+```python
+"checkpoints": False,          # Enable filesystem checkpoints before destructive ops
+"checkpoint_max_snapshots": 50, # Max snapshots to keep per directory
+```
+
+User enables with:
+```yaml
+# ~/.hermes/config.yaml
+checkpoints: true
+```
+
+### User-Facing Rollback
+
+**CLI slash command** — add `/rollback` to `process_command()` in `cli.py`:
+
+```
+/rollback           — List recent checkpoints for the current directory
+/rollback <number>  — Restore files to the checkpoint at that position in the list
+```
+
+Shows a numbered list:
+```
+📸 Checkpoints for /home/user/project:
+  1. abc1234  2026-03-09 21:15  before write_file (3 files changed)
+  2. def5678  2026-03-09 20:42  before patch (1 file changed)
+  3. ghi9012  2026-03-09 20:30  before write_file (2 files changed)
+
+Use /rollback <number> to restore, e.g. /rollback 1
+```
+
+**Gateway slash command** — add `/rollback` to gateway/run.py with the same behavior.
+
+**CLI subcommand** — `hermes rollback` (optional, lower priority).
+
+### What Gets Excluded (not checkpointed)
+
+Same as the PR's defaults — written to the shadow repo's `info/exclude`:
+
+```
+node_modules/
+dist/
+build/
+.env
+.env.*
+__pycache__/
+*.pyc
+.DS_Store
+*.log
+.cache/
+.venv/
+.git/
+```
+
+Also respects the project's `.gitignore` if present: git applies `.gitignore` files found in the work tree even when `GIT_DIR` points at the shadow repo, so no extra configuration is needed.
+
+### Safety
+
+- `ensure_checkpoint()` wraps everything in try/except — a checkpoint failure never blocks the actual file operation
+- Shadow repo is completely isolated — GIT_DIR + GIT_WORK_TREE env vars, never touches user's .git
+- If git isn't installed, checkpoints are silently disabled
+- Large directories: add a file count check — skip checkpoint if >50K files to avoid slowdowns
+
+## Files to Create/Modify
+
+| File | Change |
+|------|--------|
+| `tools/checkpoint_manager.py` | **NEW** — CheckpointManager class (adapted from PR #559) |
+| `run_agent.py` | Add CheckpointManager init + trigger in `_execute_tool_calls()` |
+| `hermes_cli/config.py` | Add `checkpoints` + `checkpoint_max_snapshots` to DEFAULT_CONFIG |
+| `cli.py` | Add `/rollback` slash command handler |
+| `gateway/run.py` | Add `/rollback` slash command handler |
+| `tests/tools/test_checkpoint_manager.py` | **NEW** — tests (adapted from PR #559's tests) |
+
+## What We Take From PR #559
+
+- `_shadow_repo_path()` — deterministic path hashing ✅
+- `_git_env()` — GIT_DIR/GIT_WORK_TREE isolation ✅
+- `_run_git()` — subprocess wrapper with timeout ✅
+- `_init_shadow_repo()` — shadow repo initialization ✅
+- `DEFAULT_EXCLUDES` list ✅
+- Test structure and patterns ✅
+
+## What We Change From PR #559
+
+- **Remove tool schema/registry** — not a tool
+- **Remove injection into file_operations.py and patch_parser.py** — trigger from run_agent.py instead
+- **Add turn-scoped deduplication** — one checkpoint per turn, not per operation
+- **Add pruning** — keep last N snapshots
+- **Add config flag** — opt-in, not mandatory
+- **Add /rollback command** — user-facing restore UI
+- **Add file count guard** — skip huge directories
+
+## Implementation Order
+
+1. `tools/checkpoint_manager.py` — core class with take/list/restore/prune
+2. `tests/tools/test_checkpoint_manager.py` — tests
+3. `hermes_cli/config.py` — config keys
+4. `run_agent.py` — integration (init + trigger)
+5. `cli.py` — `/rollback` slash command
+6. `gateway/run.py` — `/rollback` slash command
+7. Full test suite run + manual smoke test
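
The plan names `_shadow_repo_path()` and `_git_env()` but does not show their bodies. The following is a minimal sketch of both, assuming the `~/.hermes/checkpoints/{sha256(abs_dir)[:16]}/` layout from the Architecture section; the function bodies, the `CHECKPOINT_BASE` constant, and the `base` parameter are assumptions for illustration, not the code from PR #559.

```python
import hashlib
import os
from pathlib import Path
from typing import Dict

# Assumed location, taken from the Architecture section of the plan.
CHECKPOINT_BASE = Path.home() / ".hermes" / "checkpoints"


def shadow_repo_path(working_dir: str, base: Path = CHECKPOINT_BASE) -> Path:
    """Map a working directory to a deterministic shadow repo path."""
    abs_dir = str(Path(working_dir).resolve())
    digest = hashlib.sha256(abs_dir.encode("utf-8")).hexdigest()[:16]
    return base / digest


def git_env(shadow_repo: Path, working_dir: str) -> Dict[str, str]:
    """Build an environment that points git at the shadow repo only."""
    env = os.environ.copy()
    env["GIT_DIR"] = str(shadow_repo)  # git metadata lives here
    env["GIT_WORK_TREE"] = str(Path(working_dir).resolve())  # files being snapshotted
    return env


repo = shadow_repo_path("/tmp")
print(repo.name, git_env(repo, "/tmp")["GIT_WORK_TREE"])
```

Passing this environment to every `git` subprocess is what keeps the user's own `.git` directory untouched: git never discovers it because `GIT_DIR` is set explicitly.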
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

 [project]
 name = "hermes-agent"
-version = "0.1.0"
+version = "0.2.0"
 description = "The self-improving AI agent — creates skills from experience, improves them during use, and runs anywhere"
 readme = "README.md"
 requires-python = ">=3.11"
@@ -40,7 +40,7 @@ dependencies = [
 [project.optional-dependencies]
 modal = ["swe-rex[modal]>=1.4.0"]
 daytona = ["daytona>=0.148.0"]
-dev = ["pytest", "pytest-asyncio", "mcp>=1.2.0"]
+dev = ["pytest", "pytest-asyncio", "pytest-xdist", "mcp>=1.2.0"]
 messaging = ["python-telegram-bot>=20.0", "discord.py>=2.0", "aiohttp>=3.9.0", "slack-bolt>=1.18.0", "slack-sdk>=3.27.0"]
 cron = ["croniter"]
 slack = ["slack-bolt>=1.18.0", "slack-sdk>=3.27.0"]
@@ -53,6 +53,13 @@ pty = [
 honcho = ["honcho-ai>=2.0.1"]
 mcp = ["mcp>=1.2.0"]
 homeassistant = ["aiohttp>=3.9.0"]
+rl = [
+  "atroposlib @ git+https://github.com/NousResearch/atropos.git",
+  "tinker @ git+https://github.com/thinking-machines-lab/tinker.git",
+  "fastapi>=0.104.0",
+  "uvicorn[standard]>=0.24.0",
+  "wandb>=0.15.0",
+]
 yc-bench = ["yc-bench @ git+https://github.com/collinear-ai/yc-bench.git"]
 all = [
  "hermes-agent[modal]",
@@ -84,4 +91,4 @@ testpaths = ["tests"]
 markers = [
    "integration: marks tests requiring external services (API keys, Modal, etc.)",
 ]
-addopts = "-m 'not integration'"
+addopts = "-m 'not integration' -n auto"
@@ -99,6 +99,51 @@ from agent.trajectory import (
 )


+class _SafeWriter:
+    """Transparent stdout wrapper that catches OSError from broken pipes.
+
+    When hermes-agent runs as a systemd service, Docker container, or headless
+    daemon, the stdout pipe can become unavailable (idle timeout, buffer
+    exhaustion, socket reset). Any print() call then raises
+    ``OSError: [Errno 5] Input/output error``, which can crash
+    run_conversation() — especially via double-fault when the except handler
+    also tries to print.
+
+    This wrapper delegates all writes to the underlying stream and silently
+    catches OSError.  It is installed once at the start of run_conversation()
+    and is transparent when stdout is healthy (negligible overhead on the happy path).
+    """
+
+    __slots__ = ("_inner",)
+
+    def __init__(self, inner):
+        object.__setattr__(self, "_inner", inner)
+
+    def write(self, data):
+        try:
+            return self._inner.write(data)
+        except OSError:
+            return len(data) if isinstance(data, str) else 0
+
+    def flush(self):
+        try:
+            self._inner.flush()
+        except OSError:
+            pass
+
+    def fileno(self):
+        return self._inner.fileno()
+
+    def isatty(self):
+        try:
+            return self._inner.isatty()
+        except OSError:
+            return False
+
+    def __getattr__(self, name):
+        return getattr(self._inner, name)
+
+
 class IterationBudget:
    """Thread-safe shared iteration counter for parent and child agents.

@@ -173,6 +218,7 @@ class AIAgent:
        session_id: str = None,
        tool_progress_callback: callable = None,
        thinking_callback: callable = None,
+        reasoning_callback: callable = None,
        clarify_callback: callable = None,
        step_callback: callable = None,
        max_tokens: int = None,
@@ -187,6 +233,7 @@ class AIAgent:
        fallback_model: Dict[str, Any] = None,
        checkpoints_enabled: bool = False,
        checkpoint_max_snapshots: int = 50,
+        pass_session_id: bool = False,
    ):
        """
        Initialize the AI Agent.
@@ -241,6 +288,7 @@ class AIAgent:
        self.ephemeral_system_prompt = ephemeral_system_prompt
        self.platform = platform  # "cli", "telegram", "discord", "whatsapp", etc.
        self.skip_context_files = skip_context_files
+        self.pass_session_id = pass_session_id
        self.log_prefix_chars = log_prefix_chars
        self.log_prefix = f"{log_prefix} " if log_prefix else ""
        # Store effective base URL for feature detection (prompt caching, reasoning, etc.)
@@ -260,6 +308,7 @@ class AIAgent:

        self.tool_progress_callback = tool_progress_callback
        self.thinking_callback = thinking_callback
+        self.reasoning_callback = reasoning_callback
        self.clarify_callback = clarify_callback
        self.step_callback = step_callback
        self._last_reported_tool = None  # Track for "new tool" mode
@@ -297,6 +346,13 @@ class AIAgent:
        self._use_prompt_caching = is_openrouter and is_claude
        self._cache_ttl = "5m"  # Default 5-minute TTL (1.25x write cost)
        
+        # Iteration budget pressure: warn the LLM as it approaches max_iterations.
+        # Warnings are injected into the last tool result JSON (not as separate
+        # messages) so they don't break message structure or invalidate caching.
+        self._budget_caution_threshold = 0.7   # 70% — nudge to start wrapping up
+        self._budget_warning_threshold = 0.9   # 90% — urgent, respond now
+        self._budget_pressure_enabled = True
+
        # Persistent error log -- always writes WARNING+ to ~/.hermes/logs/errors.log
        # so tool failures, API errors, etc. are inspectable after the fact.
        from agent.redact import RedactingFormatter
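
The hunk above only sets the thresholds; how they turn into a warning is not shown here. The following is a minimal sketch of one way it could work, assuming the warning is merged into a JSON object tool result as the comment describes. The function names, the `>=` comparisons, the message wording, and the `_budget_warning` key are all assumptions for illustration.

```python
import json
from typing import Optional

CAUTION_THRESHOLD = 0.7  # from the diff: nudge to start wrapping up
WARNING_THRESHOLD = 0.9  # from the diff: urgent, respond now


def budget_pressure_note(used: int, max_iterations: int) -> Optional[str]:
    """Return a warning string once the budget crosses a threshold, else None."""
    if max_iterations <= 0:
        return None
    fraction = used / max_iterations
    if fraction >= WARNING_THRESHOLD:
        return f"Iteration budget nearly exhausted ({used}/{max_iterations}); respond now."
    if fraction >= CAUTION_THRESHOLD:
        return f"Iteration budget at {used}/{max_iterations}; start wrapping up."
    return None


def inject_into_tool_result(result_json: str, note: Optional[str]) -> str:
    """Attach the note to a JSON object result; leave anything else untouched."""
    if note is None:
        return result_json
    try:
        payload = json.loads(result_json)
    except json.JSONDecodeError:
        return result_json
    if not isinstance(payload, dict):
        return result_json
    payload["_budget_warning"] = note  # hypothetical key name
    return json.dumps(payload)


print(inject_into_tool_result('{"ok": true}', budget_pressure_note(9, 10)))
```

Carrying the note inside an existing tool result, rather than appending a new message, keeps the message list the same length, which matches the stated goal of not breaking message structure.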
@@ -364,36 +420,50 @@ class AIAgent:
                ]:
                    logging.getLogger(quiet_logger).setLevel(logging.ERROR)
        
-        # Initialize OpenAI client - defaults to OpenRouter
-        client_kwargs = {}
-        
-        # Default to OpenRouter if no base_url provided
-        if base_url:
-            client_kwargs["base_url"] = base_url
+        # Initialize OpenAI client via centralized provider router.
+        # The router handles auth resolution, base URL, headers, and
+        # Codex wrapping for all known providers.
+        # raw_codex=True because the main agent needs direct responses.stream()
+        # access for Codex Responses API streaming.
+        if api_key and base_url:
+            # Explicit credentials from CLI/gateway — construct directly.
+            # The runtime provider resolver already handled auth for us.
+            client_kwargs = {"api_key": api_key, "base_url": base_url}
+            effective_base = base_url
+            if "openrouter" in effective_base.lower():
+                client_kwargs["default_headers"] = {
+                    "HTTP-Referer": "https://github.com/NousResearch/hermes-agent",
+                    "X-OpenRouter-Title": "Hermes Agent",
+                    "X-OpenRouter-Categories": "productivity,cli-agent",
+                }
+            elif "api.kimi.com" in effective_base.lower():
+                client_kwargs["default_headers"] = {
+                    "User-Agent": "KimiCLI/1.3",
+                }
        else:
-            client_kwargs["base_url"] = OPENROUTER_BASE_URL
-        
-        # Handle API key - OpenRouter is the primary provider
-        if api_key:
-            client_kwargs["api_key"] = api_key
-        else:
-            # Primary: OPENROUTER_API_KEY, fallback to direct provider keys
-            client_kwargs["api_key"] = os.getenv("OPENROUTER_API_KEY", "")
-        
-        # OpenRouter app attribution — shows hermes-agent in rankings/analytics
-        effective_base = client_kwargs.get("base_url", "")
-        if "openrouter" in effective_base.lower():
-            client_kwargs["default_headers"] = {
-                "HTTP-Referer": "https://github.com/NousResearch/hermes-agent",
-                "X-OpenRouter-Title": "Hermes Agent",
-                "X-OpenRouter-Categories": "productivity,cli-agent",
-            }
-        elif "api.kimi.com" in effective_base.lower():
-            # Kimi Code API requires a recognized coding-agent User-Agent
-            # (see https://github.com/MoonshotAI/kimi-cli)
-            client_kwargs["default_headers"] = {
-                "User-Agent": "KimiCLI/1.0",
-            }
+            # No explicit creds — use the centralized provider router
+            from agent.auxiliary_client import resolve_provider_client
+            _routed_client, _ = resolve_provider_client(
+                self.provider or "auto", model=self.model, raw_codex=True)
+            if _routed_client is not None:
+                client_kwargs = {
+                    "api_key": _routed_client.api_key,
+                    "base_url": str(_routed_client.base_url),
+                }
+                # Preserve any default_headers the router set
+                if hasattr(_routed_client, '_default_headers') and _routed_client._default_headers:
+                    client_kwargs["default_headers"] = dict(_routed_client._default_headers)
+            else:
+                # Final fallback: try raw OpenRouter key
+                client_kwargs = {
+                    "api_key": os.getenv("OPENROUTER_API_KEY", ""),
+                    "base_url": OPENROUTER_BASE_URL,
+                    "default_headers": {
+                        "HTTP-Referer": "https://github.com/NousResearch/hermes-agent",
+                        "X-OpenRouter-Title": "Hermes Agent",
+                        "X-OpenRouter-Categories": "productivity,cli-agent",
+                    },
+                }
        
        self._client_kwargs = client_kwargs  # stored for rebuilding after interrupt
        try:
@@ -1397,7 +1467,14 @@ class AIAgent:
                    prompt_parts.append(user_block)

        has_skills_tools = any(name in self.valid_tool_names for name in ['skills_list', 'skill_view', 'skill_manage'])
-        skills_prompt = build_skills_system_prompt() if has_skills_tools else ""
+        if has_skills_tools:
+            avail_toolsets = {ts for ts, avail in check_toolset_requirements().items() if avail}
+            skills_prompt = build_skills_system_prompt(
+                available_tools=self.valid_tool_names,
+                available_toolsets=avail_toolsets,
+            )
+        else:
+            skills_prompt = ""
        if skills_prompt:
            prompt_parts.append(skills_prompt)

@@ -1408,9 +1485,10 @@ class AIAgent:

        from hermes_time import now as _hermes_now
        now = _hermes_now()
-        prompt_parts.append(
-            f"Conversation started: {now.strftime('%A, %B %d, %Y %I:%M %p')}"
-        )
+        timestamp_line = f"Conversation started: {now.strftime('%A, %B %d, %Y %I:%M %p')}"
+        if self.pass_session_id and self.session_id:
+            timestamp_line += f"\nSession ID: {self.session_id}"
+        prompt_parts.append(timestamp_line)

        platform_key = (self.platform or "").lower().strip()
        if platform_key in PLATFORM_HINTS:
@@ -1772,6 +1850,7 @@ class AIAgent:
        allowed_keys = {
            "model", "instructions", "input", "tools", "store",
            "reasoning", "include", "max_output_tokens", "temperature",
+            "tool_choice", "parallel_tool_calls", "prompt_cache_key",
        }
        normalized: Dict[str, Any] = {
            "model": model,
@@ -1797,6 +1876,12 @@ class AIAgent:
        if isinstance(temperature, (int, float)):
            normalized["temperature"] = float(temperature)

+        # Pass through tool_choice, parallel_tool_calls, prompt_cache_key
+        for passthrough_key in ("tool_choice", "parallel_tool_calls", "prompt_cache_key"):
+            val = api_kwargs.get(passthrough_key)
+            if val is not None:
+                normalized[passthrough_key] = val
+
        if allow_stream:
            stream = api_kwargs.get("stream")
            if stream is not None and stream is not True:
@@ -2175,75 +2260,6 @@ class AIAgent:

    # ── Provider fallback ──────────────────────────────────────────────────

-    # API-key providers: provider → (base_url, [env_var_names])
-    _FALLBACK_API_KEY_PROVIDERS = {
-        "openrouter": (OPENROUTER_BASE_URL, ["OPENROUTER_API_KEY"]),
-        "zai": ("https://api.z.ai/api/paas/v4", ["ZAI_API_KEY", "Z_AI_API_KEY"]),
-        "kimi-coding": ("https://api.moonshot.ai/v1", ["KIMI_API_KEY"]),
-        "minimax": ("https://api.minimax.io/v1", ["MINIMAX_API_KEY"]),
-        "minimax-cn": ("https://api.minimaxi.com/v1", ["MINIMAX_CN_API_KEY"]),
-    }
-
-    # OAuth providers: provider → (resolver_import_path, api_mode)
-    # Each resolver returns {"api_key": ..., "base_url": ...}.
-    _FALLBACK_OAUTH_PROVIDERS = {
-        "openai-codex": ("resolve_codex_runtime_credentials", "codex_responses"),
-        "nous": ("resolve_nous_runtime_credentials", "chat_completions"),
-    }
-
-    def _resolve_fallback_credentials(
-        self, fb_provider: str, fb_config: dict
-    ) -> Optional[tuple]:
-        """Resolve credentials for a fallback provider.
-
-        Returns (api_key, base_url, api_mode) on success, or None on failure.
-        Handles three cases:
-          1. OAuth providers (openai-codex, nous) — call credential resolver
-          2. API-key providers (openrouter, zai, etc.) — read env var
-          3. Custom endpoints — use base_url + api_key_env from config
-        """
-        # ── 1. OAuth providers ────────────────────────────────────────
-        if fb_provider in self._FALLBACK_OAUTH_PROVIDERS:
-            resolver_name, api_mode = self._FALLBACK_OAUTH_PROVIDERS[fb_provider]
-            try:
-                import hermes_cli.auth as _auth
-                resolver = getattr(_auth, resolver_name)
-                creds = resolver()
-                return creds["api_key"], creds["base_url"], api_mode
-            except Exception as e:
-                logging.warning(
-                    "Fallback to %s failed (credential resolution): %s",
-                    fb_provider, e,
-                )
-                return None
-
-        # ── 2. API-key providers ──────────────────────────────────────
-        fb_key = (fb_config.get("api_key") or "").strip()
-        if not fb_key:
-            key_env = (fb_config.get("api_key_env") or "").strip()
-            if key_env:
-                fb_key = os.getenv(key_env, "")
-            elif fb_provider in self._FALLBACK_API_KEY_PROVIDERS:
-                for env_var in self._FALLBACK_API_KEY_PROVIDERS[fb_provider][1]:
-                    fb_key = os.getenv(env_var, "")
-                    if fb_key:
-                        break
-        if not fb_key:
-            logging.warning(
-                "Fallback model configured but no API key found for provider '%s'",
-                fb_provider,
-            )
-            return None
-
-        # ── 3. Resolve base URL ───────────────────────────────────────
-        fb_base_url = (fb_config.get("base_url") or "").strip()
-        if not fb_base_url and fb_provider in self._FALLBACK_API_KEY_PROVIDERS:
-            fb_base_url = self._FALLBACK_API_KEY_PROVIDERS[fb_provider][0]
-        if not fb_base_url:
-            fb_base_url = OPENROUTER_BASE_URL
-
-        return fb_key, fb_base_url, "chat_completions"
-
    def _try_activate_fallback(self) -> bool:
        """Switch to the configured fallback model/provider.

@@ -2251,6 +2267,10 @@ class AIAgent:
        OpenAI client, model slug, and provider in-place so the retry loop
        can continue with the new backend.  One-shot: returns False if
        already activated or not configured.
+
+        Uses the centralized provider router (resolve_provider_client) for
+        auth resolution and client construction — no duplicated provider→key
+        mappings.
        """
        if self._fallback_activated or not self._fallback_model:
            return False
@@ -2261,25 +2281,31 @@ class AIAgent:
        if not fb_provider or not fb_model:
            return False

-        resolved = self._resolve_fallback_credentials(fb_provider, fb)
-        if resolved is None:
-            return False
-        fb_key, fb_base_url, fb_api_mode = resolved
-
-        # Build new client
+        # Use centralized router for client construction.
+        # raw_codex=True because the main agent needs direct responses.stream()
+        # access for Codex providers.
        try:
-            client_kwargs = {"api_key": fb_key, "base_url": fb_base_url}
-            if "openrouter" in fb_base_url.lower():
-                client_kwargs["default_headers"] = {
-                    "HTTP-Referer": "https://github.com/NousResearch/hermes-agent",
-                    "X-OpenRouter-Title": "Hermes Agent",
-                    "X-OpenRouter-Categories": "productivity,cli-agent",
-                }
-            elif "api.kimi.com" in fb_base_url.lower():
-                client_kwargs["default_headers"] = {"User-Agent": "KimiCLI/1.0"}
+            from agent.auxiliary_client import resolve_provider_client
+            fb_client, _ = resolve_provider_client(
+                fb_provider, model=fb_model, raw_codex=True)
+            if fb_client is None:
+                logging.warning(
+                    "Fallback to %s failed: provider not configured",
+                    fb_provider)
+                return False

-            self.client = OpenAI(**client_kwargs)
-            self._client_kwargs = client_kwargs
+            # Determine api_mode from provider
+            fb_api_mode = "chat_completions"
+            if fb_provider == "openai-codex":
+                fb_api_mode = "codex_responses"
+            fb_base_url = str(fb_client.base_url)
+
+            # Swap client and config in-place
+            self.client = fb_client
+            self._client_kwargs = {
+                "api_key": fb_client.api_key,
+                "base_url": fb_base_url,
+            }
            old_model = self.model
            self.model = fb_model
            self.provider = fb_provider
@@ -2333,7 +2359,10 @@ class AIAgent:
                "instructions": instructions,
                "input": self._chat_messages_to_responses_input(payload_messages),
                "tools": self._responses_tools(),
+                "tool_choice": "auto",
+                "parallel_tool_calls": True,
                "store": False,
+                "prompt_cache_key": self.session_id,
            }

            if reasoning_enabled:
@@ -2373,16 +2402,26 @@ class AIAgent:

        extra_body = {}

-        if provider_preferences:
-            extra_body["provider"] = provider_preferences
-
        _is_openrouter = "openrouter" in self.base_url.lower()
+
+        # Provider preferences (only, ignore, order, sort) are OpenRouter-
+        # specific, so send them only when the base URL is OpenRouter.
+        # TODO: Nous Portal will add transparent proxy support — re-enable
+        # for _is_nous when their backend is updated.
+        if provider_preferences and _is_openrouter:
+            extra_body["provider"] = provider_preferences
        _is_nous = "nousresearch" in self.base_url.lower()

        _is_mistral = "api.mistral.ai" in self.base_url.lower()
        if (_is_openrouter or _is_nous) and not _is_mistral:
            if self.reasoning_config is not None:
-                extra_body["reasoning"] = self.reasoning_config
+                rc = dict(self.reasoning_config)
+                # Nous Portal requires reasoning enabled — don't send
+                # enabled=false to it (would cause 400).
+                if _is_nous and rc.get("enabled") is False:
+                    pass  # omit reasoning entirely for Nous when disabled
+                else:
+                    extra_body["reasoning"] = rc
            else:
                extra_body["reasoning"] = {
                    "enabled": True,
@@ -2406,10 +2445,26 @@ class AIAgent:
        """
        reasoning_text = self._extract_reasoning(assistant_message)

+        # Fallback: extract inline <think> blocks from content when no structured
+        # reasoning fields are present (some models/providers embed thinking
+        # directly in the content rather than returning separate API fields).
+        if not reasoning_text:
+            content = assistant_message.content or ""
+            think_blocks = re.findall(r'<think>(.*?)</think>', content, flags=re.DOTALL)
+            if think_blocks:
+                combined = "\n\n".join(b.strip() for b in think_blocks if b.strip())
+                reasoning_text = combined or None
+
        if reasoning_text and self.verbose_logging:
            preview = reasoning_text[:100] + "..." if len(reasoning_text) > 100 else reasoning_text
            logging.debug(f"Captured reasoning ({len(reasoning_text)} chars): {preview}")

+        if reasoning_text and self.reasoning_callback:
+            try:
+                self.reasoning_callback(reasoning_text)
+            except Exception as e:
+                logging.debug("reasoning_callback raised: %s", e)
+
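Review note: the inline `<think>` fallback above can be exercised on its own. The following is a minimal sketch that reuses the regex from this hunk; `extract_inline_reasoning` is a hypothetical helper name for illustration, not a function in the repository.

```python
import re
from typing import Optional


def extract_inline_reasoning(content: Optional[str]) -> Optional[str]:
    """Join the text of every <think>...</think> block found in content.

    Hypothetical standalone helper mirroring the fallback in the hunk above.
    """
    # DOTALL lets "." match newlines, so multi-line think blocks are captured.
    blocks = re.findall(r"<think>(.*?)</think>", content or "", flags=re.DOTALL)
    # Skip blocks that are empty or whitespace-only after stripping.
    combined = "\n\n".join(b.strip() for b in blocks if b.strip())
    return combined or None


if __name__ == "__main__":
    sample = "<think>plan A</think>answer<think>\ncheck\n</think>"
    print(extract_inline_reasoning(sample))
```

Whitespace-only blocks are dropped, so content holding only an empty `<think></think>` pair yields `None`, which matches the `combined or None` line in the hunk.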
        msg = {
            "role": "assistant",
            "content": assistant_message.content or "",
@@ -2488,6 +2543,31 @@ class AIAgent:

        return msg

+    @staticmethod
+    def _sanitize_tool_calls_for_strict_api(api_msg: dict) -> dict:
+        """Strip Codex Responses API fields from tool_calls for strict providers.
+
+        Providers like Mistral strictly validate the Chat Completions schema
+        and reject unknown fields (call_id, response_item_id) with 422.
+        These fields are preserved in the internal message history — this
+        method only modifies the outgoing API copy.
+
+        Creates new tool_call dicts rather than mutating in-place, so the
+        original messages list retains call_id/response_item_id for Codex
+        Responses API compatibility (e.g. if the session falls back to a
+        Codex provider later).
+        """
+        tool_calls = api_msg.get("tool_calls")
+        if not isinstance(tool_calls, list):
+            return api_msg
+        _STRIP_KEYS = {"call_id", "response_item_id"}
+        api_msg["tool_calls"] = [
+            {k: v for k, v in tc.items() if k not in _STRIP_KEYS}
+            if isinstance(tc, dict) else tc
+            for tc in tool_calls
+        ]
+        return api_msg
+
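Review note: the docstring's claim that the stored history keeps `call_id` and `response_item_id` depends on the caller passing a shallow copy, as the call sites in this diff do with `msg.copy()`. The sketch below is a standalone copy of the same logic with made-up sample data; `sanitize_tool_calls` and `STRIP_KEYS` are illustrative names, not the repository's.

```python
STRIP_KEYS = {"call_id", "response_item_id"}


def sanitize_tool_calls(api_msg: dict) -> dict:
    """Replace tool_calls with copies that omit Codex-only keys."""
    tool_calls = api_msg.get("tool_calls")
    if not isinstance(tool_calls, list):
        return api_msg
    # Build new dicts; the original tool_call dicts are never mutated.
    api_msg["tool_calls"] = [
        {k: v for k, v in tc.items() if k not in STRIP_KEYS}
        if isinstance(tc, dict) else tc
        for tc in tool_calls
    ]
    return api_msg


# Sample message (invented values) as it might sit in the internal history.
stored = {
    "role": "assistant",
    "tool_calls": [{"id": "c1", "call_id": "x", "type": "function"}],
}
# A shallow copy is enough: only the top-level "tool_calls" key is rebound.
outgoing = sanitize_tool_calls(stored.copy())

if __name__ == "__main__":
    print(outgoing["tool_calls"])
    print(stored["tool_calls"])
```

If a caller passed the stored message itself instead of a copy, the `tool_calls` key on the stored message would be rebound and the Codex fields would be lost from history.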
    def flush_memories(self, messages: list = None, min_turns: int = None):
        """Give the model one turn to persist memories before context is lost.

@@ -2525,6 +2605,7 @@ class AIAgent:

        try:
            # Build API messages for the flush call
+            _is_strict_api = "api.mistral.ai" in self.base_url.lower()
            api_messages = []
            for msg in messages:
                api_msg = msg.copy()
@@ -2535,6 +2616,8 @@ class AIAgent:
                api_msg.pop("reasoning", None)
                api_msg.pop("finish_reason", None)
                api_msg.pop("_flush_sentinel", None)
+                if _is_strict_api:
+                    self._sanitize_tool_calls_for_strict_api(api_msg)
                api_messages.append(api_msg)

            if self._cached_system_prompt:
@@ -2553,19 +2636,22 @@ class AIAgent:

            # Use auxiliary client for the flush call when available --
            # it's cheaper and avoids Codex Responses API incompatibility.
-            from agent.auxiliary_client import get_text_auxiliary_client
-            aux_client, aux_model = get_text_auxiliary_client()
+            from agent.auxiliary_client import call_llm as _call_llm
+            _aux_available = True
+            try:
+                response = _call_llm(
+                    task="flush_memories",
+                    messages=api_messages,
+                    tools=[memory_tool_def],
+                    temperature=0.3,
+                    max_tokens=5120,
+                    timeout=30.0,
+                )
+            except RuntimeError:
+                _aux_available = False
+                response = None

-            if aux_client:
-                api_kwargs = {
-                    "model": aux_model,
-                    "messages": api_messages,
-                    "tools": [memory_tool_def],
-                    "temperature": 0.3,
-                    "max_tokens": 5120,
-                }
-                response = aux_client.chat.completions.create(**api_kwargs, timeout=30.0)
-            elif self.api_mode == "codex_responses":
+            if not _aux_available and self.api_mode == "codex_responses":
                # No auxiliary client -- use the Codex Responses path directly
                codex_kwargs = self._build_api_kwargs(api_messages)
                codex_kwargs["tools"] = self._responses_tools([memory_tool_def])
@@ -2573,7 +2659,7 @@ class AIAgent:
                if "max_output_tokens" in codex_kwargs:
                    codex_kwargs["max_output_tokens"] = 5120
                response = self._run_codex_stream(codex_kwargs)
-            else:
+            elif not _aux_available:
                api_kwargs = {
                    "model": self.model,
                    "messages": api_messages,
@@ -2585,7 +2671,7 @@ class AIAgent:

            # Extract tool calls from the response, handling both API formats
            tool_calls = []
-            if self.api_mode == "codex_responses" and not aux_client:
+            if self.api_mode == "codex_responses" and not _aux_available:
                assistant_msg, _ = self._normalize_codex_response(response)
                if assistant_msg and assistant_msg.tool_calls:
                    tool_calls = assistant_msg.tool_calls
@@ -2691,7 +2777,7 @@ class AIAgent:

        return compressed, new_system_prompt

-    def _execute_tool_calls(self, assistant_message, messages: list, effective_task_id: str) -> None:
+    def _execute_tool_calls(self, assistant_message, messages: list, effective_task_id: str, api_call_count: int = 0) -> None:
        """Execute tool calls from the assistant message and append results to messages."""
        for i, tool_call in enumerate(assistant_message.tool_calls, 1):
            # SAFETY: check interrupt BEFORE starting each tool.
@@ -2938,6 +3024,51 @@ class AIAgent:
            if self.tool_delay > 0 and i < len(assistant_message.tool_calls):
                time.sleep(self.tool_delay)

+        # ── Budget pressure injection ─────────────────────────────────
+        # After all tool calls in this turn are processed, check if we're
+        # approaching max_iterations. If so, inject a warning into the LAST
+        # tool result's JSON so the LLM sees it naturally when reading results.
+        budget_warning = self._get_budget_warning(api_call_count)
+        if budget_warning and messages and messages[-1].get("role") == "tool":
+            last_content = messages[-1]["content"]
+            try:
+                parsed = json.loads(last_content)
+                if isinstance(parsed, dict):
+                    parsed["_budget_warning"] = budget_warning
+                    messages[-1]["content"] = json.dumps(parsed, ensure_ascii=False)
+                else:
+                    messages[-1]["content"] = last_content + f"\n\n{budget_warning}"
+            except (json.JSONDecodeError, TypeError):
+                messages[-1]["content"] = last_content + f"\n\n{budget_warning}"
+            if not self.quiet_mode:
+                remaining = self.max_iterations - api_call_count
+                tier = "⚠️  WARNING" if api_call_count / self.max_iterations >= self._budget_warning_threshold else "💡 CAUTION"
+                print(f"{self.log_prefix}{tier}: {remaining} iterations remaining")
+
+    def _get_budget_warning(self, api_call_count: int) -> Optional[str]:
+        """Return a budget pressure string, or None if not yet needed.
+
+        Two-tier system:
+          - Caution (70%): nudge to consolidate work
+          - Warning (90%): urgent, must respond now
+        """
+        if not self._budget_pressure_enabled or self.max_iterations <= 0:
+            return None
+        progress = api_call_count / self.max_iterations
+        remaining = self.max_iterations - api_call_count
+        if progress >= self._budget_warning_threshold:
+            return (
+                f"[BUDGET WARNING: Iteration {api_call_count}/{self.max_iterations}. "
+                f"Only {remaining} iteration(s) left. "
+                "Provide your final response NOW. No more tool calls unless absolutely critical.]"
+            )
+        if progress >= self._budget_caution_threshold:
+            return (
+                f"[BUDGET: Iteration {api_call_count}/{self.max_iterations}. "
+                f"{remaining} iterations left. Start consolidating your work.]"
+            )
+        return None
+
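Review note: the two-tier decision in `_get_budget_warning` reduces to a small pure function. This sketch assumes the 70% and 90% thresholds named in the docstring; the real values live in `_budget_caution_threshold` and `_budget_warning_threshold`, whose defaults are not shown in this diff. `budget_tier` is a hypothetical name.

```python
from typing import Optional


def budget_tier(api_call_count: int, max_iterations: int,
                caution: float = 0.7, warning: float = 0.9) -> Optional[str]:
    """Return "warning", "caution", or None for the current progress.

    Threshold defaults are assumptions taken from the docstring above.
    """
    if max_iterations <= 0:
        return None
    progress = api_call_count / max_iterations
    # Check the stricter tier first so it wins when both thresholds are met.
    if progress >= warning:
        return "warning"
    if progress >= caution:
        return "caution"
    return None


if __name__ == "__main__":
    for count in (5, 8, 10):
        print(count, budget_tier(count, 10))
```

Checking the warning threshold before the caution threshold matters: any progress value that passes 90% also passes 70%, so the order of the two `if` statements decides which message is injected.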
    def _handle_max_iterations(self, messages: list, api_call_count: int) -> str:
        """Request a summary when max iterations are reached. Returns the final response text."""
        print(f"⚠️  Reached maximum iterations ({self.max_iterations}). Requesting summary...")
@@ -2952,11 +3083,14 @@ class AIAgent:
        try:
            # Build API messages, stripping internal-only fields
            # (finish_reason, reasoning) that strict APIs like Mistral reject with 422
+            _is_strict_api = "api.mistral.ai" in self.base_url.lower()
            api_messages = []
            for msg in messages:
                api_msg = msg.copy()
                for internal_field in ("reasoning", "finish_reason"):
                    api_msg.pop(internal_field, None)
+                if _is_strict_api:
+                    self._sanitize_tool_calls_for_strict_api(api_msg)
                api_messages.append(api_msg)

            effective_system = self._cached_system_prompt or ""
@@ -3087,6 +3221,11 @@ class AIAgent:
        Returns:
            Dict: Complete conversation result with final response and message history
        """
+        # Guard stdout against OSError from broken pipes (systemd/headless/daemon).
+        # Installed once, transparent when stdout is healthy, prevents crash on write.
+        if not isinstance(sys.stdout, _SafeWriter):
+            sys.stdout = _SafeWriter(sys.stdout)
+
        # Generate unique task_id if not provided to isolate VMs between concurrent tasks
        effective_task_id = task_id or str(uuid.uuid4())
        
@@ -3330,6 +3469,12 @@ class AIAgent:
                # Remove finish_reason - not accepted by strict APIs (e.g. Mistral)
                if "finish_reason" in api_msg:
                    api_msg.pop("finish_reason")
+                # Strip Codex Responses API fields (call_id, response_item_id) for
+                # strict providers like Mistral that reject unknown fields with 422.
+                # Uses new dicts so the internal messages list retains the fields
+                # for Codex Responses compatibility.
+                if "api.mistral.ai" in self.base_url.lower():
+                    self._sanitize_tool_calls_for_strict_api(api_msg)
                # Keep 'reasoning_details' - OpenRouter uses this for multi-turn reasoning context
                # The signature field helps maintain reasoning continuity
                api_messages.append(api_msg)
@@ -3399,7 +3544,7 @@ class AIAgent:
            
            api_start_time = time.time()
            retry_count = 0
-            max_retries = 6  # Increased to allow longer backoff periods
+            max_retries = 3
            compression_attempts = 0
            max_compression_attempts = 3
            codex_auth_retry_attempted = False
@@ -3802,6 +3947,7 @@ class AIAgent:
                        'token limit', 'too many tokens', 'reduce the length',
                        'exceeds the limit', 'context window',
                        'request entity too large',  # OpenRouter/Nous 413 safety net
+                        'prompt is too long',  # Anthropic: "prompt is too long: N tokens > M maximum"
                    ])
                    
                    if is_context_length_error:
@@ -3869,8 +4015,11 @@ class AIAgent:
                    # These indicate a problem with the request itself (bad model ID,
                    # invalid API key, forbidden, etc.) and will never succeed on retry.
                    # Note: 413 and context-length errors are excluded — handled above.
+                    # Also catch local validation errors (ValueError, TypeError) — these
+                    # are programming bugs, not transient failures.
+                    is_local_validation_error = isinstance(api_error, (ValueError, TypeError))
                    is_client_status_error = isinstance(status_code, int) and 400 <= status_code < 500 and status_code != 413
-                    is_client_error = (is_client_status_error or any(phrase in error_msg for phrase in [
+                    is_client_error = (is_local_validation_error or is_client_status_error or any(phrase in error_msg for phrase in [
                        'error code: 401', 'error code: 403',
                        'error code: 404', 'error code: 422',
                        'is not a valid model', 'invalid model', 'model not found',
@@ -4183,7 +4332,8 @@ class AIAgent:
                    
                    messages.append(assistant_msg)
                    
-                    self._execute_tool_calls(assistant_message, messages, effective_task_id)
+                    _msg_count_before_tools = len(messages)
+                    self._execute_tool_calls(assistant_message, messages, effective_task_id, api_call_count)

                    # Refund the iteration if the ONLY tool(s) called were
                    # execute_code (programmatic tool calling).  These are
@@ -4192,7 +4342,20 @@ class AIAgent:
                    if _tc_names == {"execute_code"}:
                        self.iteration_budget.refund()
                    
-                    if self.compression_enabled and self.context_compressor.should_compress():
+                    # Estimate next prompt size using real token counts from the
+                    # last API response + rough estimate of newly appended tool
+                    # results.  This catches cases where tool results push the
+                    # context past the limit that last_prompt_tokens alone misses
+                    # (e.g. large file reads, web extractions).
+                    _compressor = self.context_compressor
+                    _new_tool_msgs = messages[_msg_count_before_tools:]
+                    _new_chars = sum(len(str(m.get("content", "") or "")) for m in _new_tool_msgs)
+                    _estimated_next_prompt = (
+                        _compressor.last_prompt_tokens
+                        + _compressor.last_completion_tokens
+                        + _new_chars // 3  # conservative: JSON-heavy tool results ≈ 3 chars/token
+                    )
+                    if self.compression_enabled and _compressor.should_compress(_estimated_next_prompt):
                        messages, active_system_prompt = self._compress_context(
                            messages, system_message,
                            approx_tokens=self.context_compressor.last_prompt_tokens,
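Review note: the next-prompt estimate in this hunk is plain arithmetic and can be checked in isolation. A minimal sketch, assuming the same 3 characters per token ratio as the diff; `estimate_next_prompt_tokens` and the sample numbers are illustrative, not part of the repository.

```python
def estimate_next_prompt_tokens(last_prompt_tokens: int,
                                last_completion_tokens: int,
                                new_messages: list,
                                chars_per_token: int = 3) -> int:
    """Estimate the size of the next prompt after new tool results.

    Real token counts from the last response are added to a rough,
    character-based estimate for the messages appended since then.
    """
    # Treat missing or None content as empty, as the diff does.
    new_chars = sum(len(str(m.get("content", "") or "")) for m in new_messages)
    return last_prompt_tokens + last_completion_tokens + new_chars // chars_per_token


if __name__ == "__main__":
    tool_results = [
        {"role": "tool", "content": "x" * 300},
        {"role": "tool", "content": None},
    ]
    print(estimate_next_prompt_tokens(1000, 200, tool_results))
```

Dividing by 3 rather than the more common 4 characters per token makes the estimate err on the high side, so compression triggers a little early instead of a little late.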
@@ -4415,9 +4578,17 @@ class AIAgent:
        if final_response and not interrupted:
            self._honcho_sync(original_user_message, final_response)

+        # Extract reasoning from the last assistant message (if any)
+        last_reasoning = None
+        for msg in reversed(messages):
+            if msg.get("role") == "assistant" and msg.get("reasoning"):
+                last_reasoning = msg["reasoning"]
+                break
+
        # Build result with interrupt info if applicable
        result = {
            "final_response": final_response,
+            "last_reasoning": last_reasoning,
            "messages": messages,
            "api_calls": api_call_count,
            "completed": completed,
@@ -572,17 +572,16 @@ clone_repo() {
        fi
    else
        # Try SSH first (for private repo access), fall back to HTTPS
-        # Use --recurse-submodules to also clone mini-swe-agent and tinker-atropos
        # GIT_SSH_COMMAND disables interactive prompts and sets a short timeout
        # so SSH fails fast instead of hanging when no key is configured.
        log_info "Trying SSH clone..."
        if GIT_SSH_COMMAND="ssh -o BatchMode=yes -o ConnectTimeout=5" \
-           git clone --branch "$BRANCH" --recurse-submodules "$REPO_URL_SSH" "$INSTALL_DIR" 2>/dev/null; then
+           git clone --branch "$BRANCH" "$REPO_URL_SSH" "$INSTALL_DIR" 2>/dev/null; then
            log_success "Cloned via SSH"
        else
            rm -rf "$INSTALL_DIR" 2>/dev/null  # Clean up partial SSH clone
            log_info "SSH failed, trying HTTPS..."
-            if git clone --branch "$BRANCH" --recurse-submodules "$REPO_URL_HTTPS" "$INSTALL_DIR"; then
+            if git clone --branch "$BRANCH" "$REPO_URL_HTTPS" "$INSTALL_DIR"; then
                log_success "Cloned via HTTPS"
            else
                log_error "Failed to clone repository"
@@ -593,10 +592,12 @@ clone_repo() {

    cd "$INSTALL_DIR"

-    # Ensure submodules are initialized and updated (for existing installs or if --recurse failed)
-    log_info "Initializing submodules (mini-swe-agent, tinker-atropos)..."
-    git submodule update --init --recursive
-    log_success "Submodules ready"
+    # Only init mini-swe-agent (terminal tool backend — required).
+    # tinker-atropos (RL training) is optional and heavy — users can opt in later
+    # with: git submodule update --init tinker-atropos && uv pip install -e ./tinker-atropos
+    log_info "Initializing mini-swe-agent submodule (terminal backend)..."
+    git submodule update --init mini-swe-agent
+    log_success "Submodule ready"

    log_success "Repository ready"
 }
@@ -679,12 +680,11 @@ install_deps() {
        log_warn "mini-swe-agent not found (run: git submodule update --init)"
    fi

-    log_info "Installing tinker-atropos (RL training backend)..."
+    # tinker-atropos (RL training) is optional — skip by default.
+    # To enable RL tools: git submodule update --init tinker-atropos && uv pip install -e "./tinker-atropos"
    if [ -d "tinker-atropos" ] && [ -f "tinker-atropos/pyproject.toml" ]; then
-        $UV_CMD pip install -e "./tinker-atropos" || log_warn "tinker-atropos install failed (RL tools may not work)"
-        log_success "tinker-atropos installed"
-    else
-        log_warn "tinker-atropos not found (run: git submodule update --init)"
+        log_info "tinker-atropos submodule found — skipping install (optional, for RL training)"
+        log_info "  To install: $UV_CMD pip install -e \"./tinker-atropos\""
    fi

    log_success "All dependencies installed"
@@ -0,0 +1,540 @@
+#!/usr/bin/env python3
+"""Hermes Agent Release Script
+
+Generates changelogs and creates GitHub releases with CalVer tags.
+
+Usage:
+    # Preview changelog (dry run)
+    python scripts/release.py
+
+    # Preview with semver bump
+    python scripts/release.py --bump minor
+
+    # Create the release
+    python scripts/release.py --bump minor --publish
+
+    # First release (no previous tag)
+    python scripts/release.py --bump minor --publish --first-release
+
+    # Override CalVer date (e.g. for a belated release)
+    python scripts/release.py --bump minor --publish --date 2026.3.15
+"""
+
+import argparse
+import json
+import os
+import re
+import subprocess
+import sys
+from collections import defaultdict
+from datetime import datetime
+from pathlib import Path
+
+REPO_ROOT = Path(__file__).resolve().parent.parent
+VERSION_FILE = REPO_ROOT / "hermes_cli" / "__init__.py"
+PYPROJECT_FILE = REPO_ROOT / "pyproject.toml"
+
+# ──────────────────────────────────────────────────────────────────────
+# Git email → GitHub username mapping
+# ──────────────────────────────────────────────────────────────────────
+
+# Auto-extracted from noreply emails + manual overrides
+AUTHOR_MAP = {
+    # teknium (multiple emails)
+    "teknium1@gmail.com": "teknium1",
+    "teknium@nousresearch.com": "teknium1",
+    "127238744+teknium1@users.noreply.github.com": "teknium1",
+    # contributors (from noreply pattern)
+    "35742124+0xbyt4@users.noreply.github.com": "0xbyt4",
+    "82637225+kshitijk4poor@users.noreply.github.com": "kshitijk4poor",
+    "16443023+stablegenius49@users.noreply.github.com": "stablegenius49",
+    "185121704+stablegenius49@users.noreply.github.com": "stablegenius49",
+    "101283333+batuhankocyigit@users.noreply.github.com": "batuhankocyigit",
+    "126368201+vilkasdev@users.noreply.github.com": "vilkasdev",
+    "137614867+cutepawss@users.noreply.github.com": "cutepawss",
+    "96793918+memosr@users.noreply.github.com": "memosr",
+    "131039422+SHL0MS@users.noreply.github.com": "SHL0MS",
+    "77628552+raulvidis@users.noreply.github.com": "raulvidis",
+    "145567217+Aum08Desai@users.noreply.github.com": "Aum08Desai",
+    "256820943+kshitij-eliza@users.noreply.github.com": "kshitij-eliza",
+    "44278268+shitcoinsherpa@users.noreply.github.com": "shitcoinsherpa",
+    "104278804+Sertug17@users.noreply.github.com": "Sertug17",
+    "112503481+caentzminger@users.noreply.github.com": "caentzminger",
+    "258577966+voidborne-d@users.noreply.github.com": "voidborne-d",
+    "70424851+insecurejezza@users.noreply.github.com": "insecurejezza",
+    "259807879+Bartok9@users.noreply.github.com": "Bartok9",
+    # contributors (manual mapping from git names)
+    "dmayhem93@gmail.com": "dmahan93",
+    "samherring99@gmail.com": "samherring99",
+    "desaiaum08@gmail.com": "Aum08Desai",
+    "shannon.sands.1979@gmail.com": "shannonsands",
+    "shannon@nousresearch.com": "shannonsands",
+    "eri@plasticlabs.ai": "Erosika",
+    "hjcpuro@gmail.com": "hjc-puro",
+    "xaydinoktay@gmail.com": "aydnOktay",
+    "abdullahfarukozden@gmail.com": "Farukest",
+    "lovre.pesut@gmail.com": "rovle",
+    "hakanerten02@hotmail.com": "teyrebaz33",
+    "alireza78.crypto@gmail.com": "alireza78a",
+    "brooklyn.bb.nicholson@gmail.com": "brooklynnicholson",
+    "gpickett00@gmail.com": "gpickett00",
+    "mcosma@gmail.com": "wakamex",
+    "clawdia.nash@proton.me": "clawdia-nash",
+    "pickett.austin@gmail.com": "austinpickett",
+    "jaisehgal11299@gmail.com": "jaisup",
+    "percydikec@gmail.com": "PercyDikec",
+    "dean.kerr@gmail.com": "deankerr",
+    "socrates1024@gmail.com": "socrates1024",
+    "satelerd@gmail.com": "satelerd",
+    "numman.ali@gmail.com": "nummanali",
+    "0xNyk@users.noreply.github.com": "0xNyk",
+    "0xnykcd@googlemail.com": "0xNyk",
+    "buraysandro9@gmail.com": "buray",
+    "contact@jomar.fr": "joshmartinelle",
+    "camilo@tekelala.com": "tekelala",
+    "vincentcharlebois@gmail.com": "vincentcharlebois",
+    "aryan@synvoid.com": "aryansingh",
+    "johnsonblake1@gmail.com": "blakejohnson",
+    "bryan@intertwinesys.com": "bryanyoung",
+    "christo.mitov@gmail.com": "christomitov",
+    "hermes@nousresearch.com": "NousResearch",
+    "openclaw@sparklab.ai": "openclaw",
+    "semihcvlk53@gmail.com": "Himess",
+    "erenkar950@gmail.com": "erenkarakus",
+    "adavyasharma@gmail.com": "adavyas",
+    "acaayush1111@gmail.com": "aayushchaudhary",
+    "jason@outland.art": "jasonoutland",
+    "mrflu1918@proton.me": "SPANISHFLU",
+    "morganemoss@gmai.com": "mormio",
+    "kopjop926@gmail.com": "cesareth",
+    "fuleinist@gmail.com": "fuleinist",
+    "jack.47@gmail.com": "JackTheGit",
+    "dalvidjr2022@gmail.com": "Jr-kenny",
+    "m@statecraft.systems": "mbierling",
+    "balyan.sid@gmail.com": "balyansid",
+}
+
+
+def git(*args, cwd=None):
+    """Run a git command and return stdout."""
+    result = subprocess.run(
+        ["git"] + list(args),
+        capture_output=True, text=True,
+        cwd=cwd or str(REPO_ROOT),
+    )
+    if result.returncode != 0:
+        print(f"git {' '.join(args)} failed: {result.stderr}", file=sys.stderr)
+        return ""
+    return result.stdout.strip()
+
+
+def get_last_tag():
+    """Get the most recent CalVer tag."""
+    tags = git("tag", "--list", "v20*", "--sort=-v:refname")
+    if tags:
+        return tags.split("\n")[0]
+    return None
+
+
+def get_current_version():
+    """Read current semver from __init__.py."""
+    content = VERSION_FILE.read_text()
+    match = re.search(r'__version__\s*=\s*"([^"]+)"', content)
+    return match.group(1) if match else "0.0.0"
+
+
+def bump_version(current: str, part: str) -> str:
+    """Bump a semver version string."""
+    parts = current.split(".")
+    if len(parts) != 3:
+        parts = ["0", "0", "0"]
+    major, minor, patch = int(parts[0]), int(parts[1]), int(parts[2])
+
+    if part == "major":
+        major += 1
+        minor = 0
+        patch = 0
+    elif part == "minor":
+        minor += 1
+        patch = 0
+    elif part == "patch":
+        patch += 1
+    else:
+        raise ValueError(f"Unknown bump part: {part}")
+
+    return f"{major}.{minor}.{patch}"
+
+
+def update_version_files(semver: str, calver_date: str):
+    """Update version strings in source files."""
+    # Update __init__.py
+    content = VERSION_FILE.read_text()
+    content = re.sub(
+        r'__version__\s*=\s*"[^"]+"',
+        f'__version__ = "{semver}"',
+        content,
+    )
+    content = re.sub(
+        r'__release_date__\s*=\s*"[^"]+"',
+        f'__release_date__ = "{calver_date}"',
+        content,
+    )
+    VERSION_FILE.write_text(content)
+
+    # Update pyproject.toml
+    pyproject = PYPROJECT_FILE.read_text()
+    pyproject = re.sub(
+        r'^version\s*=\s*"[^"]+"',
+        f'version = "{semver}"',
+        pyproject,
+        flags=re.MULTILINE,
+    )
+    PYPROJECT_FILE.write_text(pyproject)
+
+
+def resolve_author(name: str, email: str) -> str:
+    """Resolve a git author to a GitHub @mention."""
+    # Try email lookup first
+    gh_user = AUTHOR_MAP.get(email)
+    if gh_user:
+        return f"@{gh_user}"
+
+    # Try noreply pattern
+    noreply_match = re.match(r"(\d+)\+(.+)@users\.noreply\.github\.com", email)
+    if noreply_match:
+        return f"@{noreply_match.group(2)}"
+
+    # Try username@users.noreply.github.com
+    noreply_match2 = re.match(r"(.+)@users\.noreply\.github\.com", email)
+    if noreply_match2:
+        return f"@{noreply_match2.group(1)}"
+
+    # Fallback to git name
+    return name
+
+
+def categorize_commit(subject: str) -> str:
+    """Categorize a commit by its conventional commit prefix."""
+    subject_lower = subject.lower()
+
+    # Match conventional commit patterns
+    patterns = {
+        "breaking": [r"^breaking[\s:(]", r"^\w+(\([^)]*\))?!:", r".*breaking change"],
+        "features": [r"^feat[\s:(]", r"^feature[\s:(]", r"^add[\s:(]"],
+        "fixes": [r"^fix[\s:(]", r"^bugfix[\s:(]", r"^bug[\s:(]", r"^hotfix[\s:(]"],
+        "improvements": [r"^improve[\s:(]", r"^perf[\s:(]", r"^enhance[\s:(]",
+                         r"^refactor[\s:(]", r"^cleanup[\s:(]", r"^clean[\s:(]",
+                         r"^update[\s:(]", r"^optimize[\s:(]"],
+        "docs": [r"^doc[\s:(]", r"^docs[\s:(]"],
+        "tests": [r"^test[\s:(]", r"^tests[\s:(]"],
+        "chore": [r"^chore[\s:(]", r"^ci[\s:(]", r"^build[\s:(]",
+                  r"^deps[\s:(]", r"^bump[\s:(]"],
+    }
+
+    for category, regexes in patterns.items():
+        for regex in regexes:
+            if re.match(regex, subject_lower):
+                return category
+
+    # Heuristic fallbacks
+    if any(w in subject_lower for w in ["add ", "new ", "implement", "support "]):
+        return "features"
+    if any(w in subject_lower for w in ["fix ", "fixed ", "resolve", "patch "]):
+        return "fixes"
+    if any(w in subject_lower for w in ["refactor", "cleanup", "improve", "update "]):
+        return "improvements"
+
+    return "other"
+
+
+def clean_subject(subject: str) -> str:
+    """Clean up a commit subject for display."""
+    # Remove conventional commit prefix, including an optional "(scope)" and "!" marker
+    cleaned = re.sub(r"^(feat|fix|docs|chore|refactor|test|perf|ci|build|improve|add|update|cleanup|hotfix|breaking|enhance|optimize|bugfix|bug|feature|tests|deps|bump)(\([^)]*\))?!?[\s:]+\s*", "", subject, flags=re.IGNORECASE)
+    # Trim surrounding whitespace
+    cleaned = cleaned.strip()
+    # Capitalize first letter
+    if cleaned:
+        cleaned = cleaned[0].upper() + cleaned[1:]
+    return cleaned
+
+
+def get_commits(since_tag=None):
+    """Get commits since a tag (or all commits if None)."""
+    if since_tag:
+        range_spec = f"{since_tag}..HEAD"
+    else:
+        range_spec = "HEAD"
+
+    # Format: hash|author_name|author_email|subject
+    log = git(
+        "log", range_spec,
+        "--format=%H|%an|%ae|%s",
+        "--no-merges",
+    )
+
+    if not log:
+        return []
+
+    commits = []
+    for line in log.split("\n"):
+        if not line.strip():
+            continue
+        parts = line.split("|", 3)
+        if len(parts) != 4:
+            continue
+        sha, name, email, subject = parts
+        commits.append({
+            "sha": sha,
+            "short_sha": sha[:8],
+            "author_name": name,
+            "author_email": email,
+            "subject": subject,
+            "category": categorize_commit(subject),
+            "github_author": resolve_author(name, email),
+        })
+
+    return commits
+
+
+def get_pr_number(subject: str) -> "str | None":
+    """Extract PR number from commit subject if present."""
+    match = re.search(r"#(\d+)", subject)
+    if match:
+        return match.group(1)
+    return None
+
+
+def generate_changelog(commits, tag_name, semver, repo_url="https://github.com/NousResearch/hermes-agent",
+                       prev_tag=None, first_release=False):
+    """Generate markdown changelog from categorized commits."""
+    lines = []
+
+    # Header
+    now = datetime.now()
+    date_str = now.strftime("%B %d, %Y")
+    lines.append(f"# Hermes Agent v{semver} ({tag_name})")
+    lines.append("")
+    lines.append(f"**Release Date:** {date_str}")
+    lines.append("")
+
+    if first_release:
+        lines.append("> 🎉 **First official release!** This marks the beginning of regular weekly releases")
+        lines.append("> for Hermes Agent. See below for everything included in this initial release.")
+        lines.append("")
+
+    # Group commits by category
+    categories = defaultdict(list)
+    all_authors = set()
+    teknium_aliases = {"@teknium1"}
+
+    for commit in commits:
+        categories[commit["category"]].append(commit)
+        author = commit["github_author"]
+        if author not in teknium_aliases:
+            all_authors.add(author)
+
+    # Category display order and emoji
+    category_order = [
+        ("breaking", "⚠️ Breaking Changes"),
+        ("features", "✨ Features"),
+        ("improvements", "🔧 Improvements"),
+        ("fixes", "🐛 Bug Fixes"),
+        ("docs", "📚 Documentation"),
+        ("tests", "🧪 Tests"),
+        ("chore", "🏗️ Infrastructure"),
+        ("other", "📦 Other Changes"),
+    ]
+
+    for cat_key, cat_title in category_order:
+        cat_commits = categories.get(cat_key, [])
+        if not cat_commits:
+            continue
+
+        lines.append(f"## {cat_title}")
+        lines.append("")
+
+        for commit in cat_commits:
+            subject = clean_subject(commit["subject"])
+            pr_num = get_pr_number(commit["subject"])
+            author = commit["github_author"]
+
+            # Build the line
+            parts = [f"- {subject}"]
+            if pr_num:
+                parts.append(f"([#{pr_num}]({repo_url}/pull/{pr_num}))")
+            else:
+                parts.append(f"([`{commit['short_sha']}`]({repo_url}/commit/{commit['sha']}))")
+
+            if author not in teknium_aliases:
+                parts.append(f"— {author}")
+
+            lines.append(" ".join(parts))
+
+        lines.append("")
+
+    # Contributors section
+    if all_authors:
+        # Sort contributors by commit count
+        author_counts = defaultdict(int)
+        for commit in commits:
+            author = commit["github_author"]
+            if author not in teknium_aliases:
+                author_counts[author] += 1
+
+        sorted_authors = sorted(author_counts.items(), key=lambda x: -x[1])
+
+        lines.append("## 👥 Contributors")
+        lines.append("")
+        lines.append("Thank you to everyone who contributed to this release!")
+        lines.append("")
+        for author, count in sorted_authors:
+            commit_word = "commit" if count == 1 else "commits"
+            lines.append(f"- {author} ({count} {commit_word})")
+        lines.append("")
+
+    # Full changelog link
+    if prev_tag:
+        lines.append(f"**Full Changelog**: [{prev_tag}...{tag_name}]({repo_url}/compare/{prev_tag}...{tag_name})")
+    else:
+        lines.append(f"**Full Changelog**: [{tag_name}]({repo_url}/commits/{tag_name})")
+    lines.append("")
+
+    return "\n".join(lines)
+
+
+def main():
+    parser = argparse.ArgumentParser(description="Hermes Agent Release Tool")
+    parser.add_argument("--bump", choices=["major", "minor", "patch"],
+                        help="Which semver component to bump")
+    parser.add_argument("--publish", action="store_true",
+                        help="Actually create the tag and GitHub release (otherwise dry run)")
+    parser.add_argument("--date", type=str,
+                        help="Override CalVer date (format: YYYY.M.D)")
+    parser.add_argument("--first-release", action="store_true",
+                        help="Mark as first release (no previous tag expected)")
+    parser.add_argument("--output", type=str,
+                        help="Write changelog to file instead of stdout")
+    args = parser.parse_args()
+
+    # Determine CalVer date
+    if args.date:
+        calver_date = args.date
+    else:
+        now = datetime.now()
+        calver_date = f"{now.year}.{now.month}.{now.day}"
+
+    tag_name = f"v{calver_date}"
+
+    # Check for existing tag with same date
+    existing = git("tag", "--list", tag_name)
+    if existing:
+        # Append a suffix for same-day releases
+        suffix = 2
+        while git("tag", "--list", f"{tag_name}.{suffix}"):
+            suffix += 1
+        tag_name = f"{tag_name}.{suffix}"
+        calver_date = f"{calver_date}.{suffix}"
+        print(f"Note: Tag v{calver_date.rsplit('.', 1)[0]} already exists, using {tag_name}")
+
+    # Determine semver
+    current_version = get_current_version()
+    if args.bump:
+        new_version = bump_version(current_version, args.bump)
+    else:
+        new_version = current_version
+
+    # Get previous tag
+    prev_tag = get_last_tag()
+    if not prev_tag and not args.first_release:
+        print("No previous tags found. Use --first-release for the initial release.")
+        print(f"Would create tag: {tag_name}")
+        print(f"Would set version: {new_version}")
+
+    # Get commits
+    commits = get_commits(since_tag=prev_tag)
+    if not commits:
+        print("No new commits since last tag.")
+        if not args.first_release:
+            return
+
+    print(f"{'='*60}")
+    print(f"  Hermes Agent Release Preview")
+    print(f"{'='*60}")
+    print(f"  CalVer tag:      {tag_name}")
+    print(f"  SemVer:          v{current_version} → v{new_version}")
+    print(f"  Previous tag:    {prev_tag or '(none — first release)'}")
+    print(f"  Commits:         {len(commits)}")
+    print(f"  Unique authors:  {len(set(c['github_author'] for c in commits))}")
+    print(f"  Mode:            {'PUBLISH' if args.publish else 'DRY RUN'}")
+    print(f"{'='*60}")
+    print()
+
+    # Generate changelog
+    changelog = generate_changelog(
+        commits, tag_name, new_version,
+        prev_tag=prev_tag,
+        first_release=args.first_release,
+    )
+
+    if args.output:
+        Path(args.output).write_text(changelog)
+        print(f"Changelog written to {args.output}")
+    else:
+        print(changelog)
+
+    if args.publish:
+        print(f"\n{'='*60}")
+        print("  Publishing release...")
+        print(f"{'='*60}")
+
+        # Update version files
+        if args.bump:
+            update_version_files(new_version, calver_date)
+            print(f"  ✓ Updated version files to v{new_version} ({calver_date})")
+
+            # Commit version bump
+            git("add", str(VERSION_FILE), str(PYPROJECT_FILE))
+            git("commit", "-m", f"chore: bump version to v{new_version} ({calver_date})")
+            print(f"  ✓ Committed version bump")
+
+        # Create annotated tag
+        git("tag", "-a", tag_name, "-m",
+            f"Hermes Agent v{new_version} ({calver_date})\n\nWeekly release")
+        print(f"  ✓ Created tag {tag_name}")
+
+        # Push
+        git("push", "origin", "HEAD", "--tags")
+        print(f"  ✓ Pushed to origin")
+
+        # Create GitHub release
+        changelog_file = REPO_ROOT / ".release_notes.md"
+        changelog_file.write_text(changelog)
+
+        result = subprocess.run(
+            ["gh", "release", "create", tag_name,
+             "--title", f"Hermes Agent v{new_version} ({calver_date})",
+             "--notes-file", str(changelog_file)],
+            capture_output=True, text=True,
+            cwd=str(REPO_ROOT),
+        )
+
+        changelog_file.unlink(missing_ok=True)
+
+        if result.returncode == 0:
+            print(f"  ✓ GitHub release created: {result.stdout.strip()}")
+        else:
+            print(f"  ✗ GitHub release failed: {result.stderr}")
+            print(f"    Tag was created. Create the release manually:")
+            print(f"    gh release create {tag_name} --title 'Hermes Agent v{new_version} ({calver_date})'")
+
+        print(f"\n  🎉 Release v{new_version} ({tag_name}) published!")
+    else:
+        print(f"\n{'='*60}")
+        print(f"  Dry run complete. To publish, add --publish")
+        print(f"  Example: python scripts/release.py --bump minor --publish")
+        print(f"{'='*60}")
+
+
+if __name__ == "__main__":
+    main()
@@ -115,7 +115,7 @@ A config for this would look like:

 Reference: Pre-Tokenized Dataset Documentation.

-We reccomend this approach when you want granular control over the prompt formatting, special tokens, and masking, whilst letting Axolotl handle the tokenization. This is very useful if your dataset has unique prompts that differ across samples and where one single general template wouldn’t suffice.
+We recommend this approach when you want granular control over the prompt formatting, special tokens, and masking, whilst letting Axolotl handle the tokenization. This is very useful if your dataset has unique prompts that differ across samples and where one single general template wouldn’t suffice.

 In the example below, you could see that there is no proper structure. At the same time, it’s very flexible as there are no constraints on how your prompt can look.

@@ -583,7 +583,7 @@ A config for this would look like:

 Reference: Pre-Tokenized Dataset Documentation.

-We reccomend this approach when you want granular control over the prompt formatting, special tokens, and masking, whilst letting Axolotl handle the tokenization. This is very useful if your dataset has unique prompts that differ across samples and where one single general template wouldn’t suffice.
+We recommend this approach when you want granular control over the prompt formatting, special tokens, and masking, whilst letting Axolotl handle the tokenization. This is very useful if your dataset has unique prompts that differ across samples and where one single general template wouldn’t suffice.

 In the example below, you could see that there is no proper structure. At the same time, it’s very flexible as there are no constraints on how your prompt can look.

@@ -796,7 +796,7 @@ A config for this would look like:

 Reference: Pre-Tokenized Dataset Documentation.

-We reccomend this approach when you want granular control over the prompt formatting, special tokens, and masking, whilst letting Axolotl handle the tokenization. This is very useful if your dataset has unique prompts that differ across samples and where one single general template wouldn’t suffice.
+We recommend this approach when you want granular control over the prompt formatting, special tokens, and masking, whilst letting Axolotl handle the tokenization. This is very useful if your dataset has unique prompts that differ across samples and where one single general template wouldn’t suffice.

 In the example below, you could see that there is no proper structure. At the same time, it’s very flexible as there are no constraints on how your prompt can look.

@@ -1387,7 +1387,7 @@ trainer = SFTTrainer(
 For **advanced installation instructions** or if you see weird errors during installations:

 1. Install `torch` and `triton`. Go to <https://pytorch.org> to install it. For example `pip install torch torchvision torchaudio triton`
-2. Confirm if CUDA is installated correctly. Try `nvcc`. If that fails, you need to install `cudatoolkit` or CUDA drivers.
+2. Confirm if CUDA is installed correctly. Try `nvcc`. If that fails, you need to install `cudatoolkit` or CUDA drivers.
 3. Install `xformers` manually. You can try installing `vllm` and seeing if `vllm` succeeds. Check if `xformers` succeeded with `python -m xformers.info` Go to <https://github.com/facebookresearch/xformers>. Another option is to install `flash-attn` for Ampere GPUs.
 4. Double check that your versions of Python, CUDA, CUDNN, `torch`, `triton`, and `xformers` are compatible with one another. The [PyTorch Compatibility Matrix](https://github.com/pytorch/pytorch/blob/main/RELEASE.md#release-compatibility-matrix) may be useful.
 5. Finally, install `bitsandbytes` and check it with `python -m bitsandbytes`
@@ -1824,7 +1824,7 @@ For LLMs, datasets are collections of data that can be used to train our models.
 [datasets-guide](https://docs.unsloth.ai/get-started/fine-tuning-llms-guide/datasets-guide)
 {% endcontent-ref %}

-For most of our notebook examples, we utilize the [Alpaca dataset](https://docs.unsloth.ai/basics/tutorial-how-to-finetune-llama-3-and-use-in-ollama#id-6.-alpaca-dataset) however other notebooks like Vision will use different datasets which may need images in the answer ouput as well.
+For most of our notebook examples, we utilize the [Alpaca dataset](https://docs.unsloth.ai/basics/tutorial-how-to-finetune-llama-3-and-use-in-ollama#id-6.-alpaca-dataset) however other notebooks like Vision will use different datasets which may need images in the answer output as well.

 ## 4. Understand Training Hyperparameters

@@ -13280,7 +13280,7 @@ if __name__ == '__main__':
 ## :detective: Extra Findings & Tips

 1. We find using lower KV cache quantization (4bit) seems to degrade generation quality via empirical tests - more tests need to be done, but we suggest using `q8_0` cache quantization. The goal of quantization is to support longer context lengths since the KV cache uses quite a bit of memory.
-2. We found the `down_proj` in this model to be extremely sensitive to quantitation. We had to redo some of our dyanmic quants which used 2bits for `down_proj` and now we use 3bits as the minimum for all these matrices.
+2. We found the `down_proj` in this model to be extremely sensitive to quantization. We had to redo some of our dynamic quants which used 2bits for `down_proj` and now we use 3bits as the minimum for all these matrices.
 3. Using `llama.cpp` 's Flash Attention backend does result in somewhat faster decoding speeds. Use `-DGGML_CUDA_FA_ALL_QUANTS=ON` when compiling. Note it's also best to set your CUDA architecture as found in <https://developer.nvidia.com/cuda-gpus> to reduce compilation times, then set it via `-DCMAKE_CUDA_ARCHITECTURES="80"`&#x20;
 4. Using a `min_p=0.01`is probably enough. `llama.cpp`defaults to 0.1, which is probably not necessary. Since a temperature of 0.3 is used anyways, we most likely will very unlikely sample low probability tokens, so removing very unlikely tokens is a good idea. DeepSeek recommends 0.0 temperature for coding tasks.

@@ -16682,7 +16682,7 @@ Advanced flags which might be useful if you see breaking finetunes, or you want

 <table><thead><tr><th width="397.4666748046875">Environment variable</th><th>Purpose</th><th data-hidden></th></tr></thead><tbody><tr><td><code>os.environ["UNSLOTH_RETURN_LOGITS"] = "1"</code></td><td>Forcibly returns logits - useful for evaluation if logits are needed.</td><td></td></tr><tr><td><code>os.environ["UNSLOTH_COMPILE_DISABLE"] = "1"</code></td><td>Disables auto compiler. Could be useful to debug incorrect finetune results.</td><td></td></tr><tr><td><code>os.environ["UNSLOTH_DISABLE_FAST_GENERATION"] = "1"</code></td><td>Disables fast generation for generic models.</td><td></td></tr><tr><td><code>os.environ["UNSLOTH_ENABLE_LOGGING"] = "1"</code></td><td>Enables auto compiler logging - useful to see which functions are compiled or not.</td><td></td></tr><tr><td><code>os.environ["UNSLOTH_FORCE_FLOAT32"] = "1"</code></td><td>On float16 machines, use float32 and not float16 mixed precision. Useful for Gemma 3.</td><td></td></tr><tr><td><code>os.environ["UNSLOTH_STUDIO_DISABLED"] = "1"</code></td><td>Disables extra features.</td><td></td></tr><tr><td><code>os.environ["UNSLOTH_COMPILE_DEBUG"] = "1"</code></td><td>Turns on extremely verbose <code>torch.compile</code>logs.</td><td></td></tr><tr><td><code>os.environ["UNSLOTH_COMPILE_MAXIMUM"] = "0"</code></td><td>Enables maximum <code>torch.compile</code>optimizations - not recommended.</td><td></td></tr><tr><td><code>os.environ["UNSLOTH_COMPILE_IGNORE_ERRORS"] = "1"</code></td><td>Can turn this off to enable fullgraph parsing.</td><td></td></tr><tr><td><code>os.environ["UNSLOTH_FULLGRAPH"] = "0"</code></td><td>Enable <code>torch.compile</code> fullgraph mode</td><td></td></tr><tr><td><code>os.environ["UNSLOTH_DISABLE_AUTO_UPDATES"] = "1"</code></td><td>Forces no updates to <code>unsloth-zoo</code></td><td></td></tr></tbody></table>

-Another possiblity is maybe the model uploads we uploaded are corrupted, but unlikely. Try the following:
+Another possibility, though unlikely, is that the model files we uploaded are corrupted. Try the following:

 ```python
 model, tokenizer = FastVisionModel.from_pretrained(
@@ -855,7 +855,7 @@ To run Unsloth directly on Windows:
 For **advanced installation instructions** or if you see weird errors during installations:

 1. Install `torch` and `triton`. Go to <https://pytorch.org> to install it. For example `pip install torch torchvision torchaudio triton`
-2. Confirm if CUDA is installated correctly. Try `nvcc`. If that fails, you need to install `cudatoolkit` or CUDA drivers.
+2. Confirm if CUDA is installed correctly. Try `nvcc`. If that fails, you need to install `cudatoolkit` or CUDA drivers.
 3. Install `xformers` manually. You can try installing `vllm` and seeing if `vllm` succeeds. Check if `xformers` succeeded with `python -m xformers.info` Go to <https://github.com/facebookresearch/xformers>. Another option is to install `flash-attn` for Ampere GPUs.
 4. Double check that your versions of Python, CUDA, CUDNN, `torch`, `triton`, and `xformers` are compatible with one another. The [PyTorch Compatibility Matrix](https://github.com/pytorch/pytorch/blob/main/RELEASE.md#release-compatibility-matrix) may be useful.
 5. Finally, install `bitsandbytes` and check it with `python -m bitsandbytes`
@@ -2994,7 +2994,7 @@ if __name__ == '__main__':
 ## :detective: Extra Findings & Tips

 1. We find using lower KV cache quantization (4bit) seems to degrade generation quality via empirical tests - more tests need to be done, but we suggest using `q8_0` cache quantization. The goal of quantization is to support longer context lengths since the KV cache uses quite a bit of memory.
-2. We found the `down_proj` in this model to be extremely sensitive to quantitation. We had to redo some of our dyanmic quants which used 2bits for `down_proj` and now we use 3bits as the minimum for all these matrices.
+2. We found the `down_proj` in this model to be extremely sensitive to quantization. We had to redo some of our dynamic quants which used 2bits for `down_proj` and now we use 3bits as the minimum for all these matrices.
 3. Using `llama.cpp` 's Flash Attention backend does result in somewhat faster decoding speeds. Use `-DGGML_CUDA_FA_ALL_QUANTS=ON` when compiling. Note it's also best to set your CUDA architecture as found in <https://developer.nvidia.com/cuda-gpus> to reduce compilation times, then set it via `-DCMAKE_CUDA_ARCHITECTURES="80"`&#x20;
 4. Using a `min_p=0.01`is probably enough. `llama.cpp`defaults to 0.1, which is probably not necessary. Since a temperature of 0.3 is used anyways, we most likely will very unlikely sample low probability tokens, so removing very unlikely tokens is a good idea. DeepSeek recommends 0.0 temperature for coding tasks.

@@ -3509,7 +3509,7 @@ Advanced flags which might be useful if you see breaking finetunes, or you want

 <table><thead><tr><th width="397.4666748046875">Environment variable</th><th>Purpose</th><th data-hidden></th></tr></thead><tbody><tr><td><code>os.environ["UNSLOTH_RETURN_LOGITS"] = "1"</code></td><td>Forcibly returns logits - useful for evaluation if logits are needed.</td><td></td></tr><tr><td><code>os.environ["UNSLOTH_COMPILE_DISABLE"] = "1"</code></td><td>Disables auto compiler. Could be useful to debug incorrect finetune results.</td><td></td></tr><tr><td><code>os.environ["UNSLOTH_DISABLE_FAST_GENERATION"] = "1"</code></td><td>Disables fast generation for generic models.</td><td></td></tr><tr><td><code>os.environ["UNSLOTH_ENABLE_LOGGING"] = "1"</code></td><td>Enables auto compiler logging - useful to see which functions are compiled or not.</td><td></td></tr><tr><td><code>os.environ["UNSLOTH_FORCE_FLOAT32"] = "1"</code></td><td>On float16 machines, use float32 and not float16 mixed precision. Useful for Gemma 3.</td><td></td></tr><tr><td><code>os.environ["UNSLOTH_STUDIO_DISABLED"] = "1"</code></td><td>Disables extra features.</td><td></td></tr><tr><td><code>os.environ["UNSLOTH_COMPILE_DEBUG"] = "1"</code></td><td>Turns on extremely verbose <code>torch.compile</code>logs.</td><td></td></tr><tr><td><code>os.environ["UNSLOTH_COMPILE_MAXIMUM"] = "0"</code></td><td>Enables maximum <code>torch.compile</code>optimizations - not recommended.</td><td></td></tr><tr><td><code>os.environ["UNSLOTH_COMPILE_IGNORE_ERRORS"] = "1"</code></td><td>Can turn this off to enable fullgraph parsing.</td><td></td></tr><tr><td><code>os.environ["UNSLOTH_FULLGRAPH"] = "0"</code></td><td>Enable <code>torch.compile</code> fullgraph mode</td><td></td></tr><tr><td><code>os.environ["UNSLOTH_DISABLE_AUTO_UPDATES"] = "1"</code></td><td>Forces no updates to <code>unsloth-zoo</code></td><td></td></tr></tbody></table>

-Another possiblity is maybe the model uploads we uploaded are corrupted, but unlikely. Try the following:
+Another possibility, though unlikely, is that the model files we uploaded are corrupted. Try the following:

 **Examples:**

@@ -9120,7 +9120,7 @@ For LLMs, datasets are collections of data that can be used to train our models.
 [datasets-guide](https://docs.unsloth.ai/get-started/fine-tuning-llms-guide/datasets-guide)
 {% endcontent-ref %}

-For most of our notebook examples, we utilize the [Alpaca dataset](https://docs.unsloth.ai/basics/tutorial-how-to-finetune-llama-3-and-use-in-ollama#id-6.-alpaca-dataset) however other notebooks like Vision will use different datasets which may need images in the answer ouput as well.
+For most of our notebook examples, we utilize the [Alpaca dataset](https://docs.unsloth.ai/basics/tutorial-how-to-finetune-llama-3-and-use-in-ollama#id-6.-alpaca-dataset) however other notebooks like Vision will use different datasets which may need images in the answer output as well.

 ## 4. Understand Training Hyperparameters

@@ -8,6 +8,7 @@ metadata:
  hermes:
    tags: [search, duckduckgo, web-search, free, fallback]
    related_skills: [arxiv]
+    fallback_for_toolsets: [web]
 ---

 # DuckDuckGo Search
@@ -9,8 +9,7 @@ from agent.context_compressor import ContextCompressor
 @pytest.fixture()
 def compressor():
    """Create a ContextCompressor with mocked dependencies."""
-    with patch("agent.context_compressor.get_model_context_length", return_value=100000), \
-         patch("agent.context_compressor.get_text_auxiliary_client", return_value=(None, None)):
+    with patch("agent.context_compressor.get_model_context_length", return_value=100000):
        c = ContextCompressor(
            model="test/model",
            threshold_percent=0.85,
@@ -119,14 +118,11 @@ class TestGenerateSummaryNoneContent:
    """Regression: content=None (from tool-call-only assistant messages) must not crash."""

    def test_none_content_does_not_crash(self):
-        mock_client = MagicMock()
        mock_response = MagicMock()
        mock_response.choices = [MagicMock()]
        mock_response.choices[0].message.content = "[CONTEXT SUMMARY]: tool calls happened"
-        mock_client.chat.completions.create.return_value = mock_response

-        with patch("agent.context_compressor.get_model_context_length", return_value=100000), \
-             patch("agent.context_compressor.get_text_auxiliary_client", return_value=(mock_client, "test-model")):
+        with patch("agent.context_compressor.get_model_context_length", return_value=100000):
            c = ContextCompressor(model="test", quiet_mode=True)

        messages = [
@@ -139,14 +135,14 @@ class TestGenerateSummaryNoneContent:
            {"role": "user", "content": "thanks"},
        ]

-        summary = c._generate_summary(messages)
+        with patch("agent.context_compressor.call_llm", return_value=mock_response):
+            summary = c._generate_summary(messages)
        assert isinstance(summary, str)
        assert "CONTEXT SUMMARY" in summary

    def test_none_content_in_system_message_compress(self):
        """System message with content=None should not crash during compress."""
-        with patch("agent.context_compressor.get_model_context_length", return_value=100000), \
-             patch("agent.context_compressor.get_text_auxiliary_client", return_value=(None, None)):
+        with patch("agent.context_compressor.get_model_context_length", return_value=100000):
            c = ContextCompressor(model="test", quiet_mode=True, protect_first_n=2, protect_last_n=2)

        msgs = [{"role": "system", "content": None}] + [
@@ -165,12 +161,12 @@ class TestCompressWithClient:
        mock_response.choices[0].message.content = "[CONTEXT SUMMARY]: stuff happened"
        mock_client.chat.completions.create.return_value = mock_response

-        with patch("agent.context_compressor.get_model_context_length", return_value=100000), \
-             patch("agent.context_compressor.get_text_auxiliary_client", return_value=(mock_client, "test-model")):
+        with patch("agent.context_compressor.get_model_context_length", return_value=100000):
            c = ContextCompressor(model="test", quiet_mode=True)

        msgs = [{"role": "user" if i % 2 == 0 else "assistant", "content": f"msg {i}"} for i in range(10)]
-        result = c.compress(msgs)
+        with patch("agent.context_compressor.call_llm", return_value=mock_response):
+            result = c.compress(msgs)

        # Should have summary message in the middle
        contents = [m.get("content", "") for m in result]
@@ -184,8 +180,7 @@ class TestCompressWithClient:
        mock_response.choices[0].message.content = "[CONTEXT SUMMARY]: compressed middle"
        mock_client.chat.completions.create.return_value = mock_response

-        with patch("agent.context_compressor.get_model_context_length", return_value=100000), \
-             patch("agent.context_compressor.get_text_auxiliary_client", return_value=(mock_client, "test-model")):
+        with patch("agent.context_compressor.get_model_context_length", return_value=100000):
            c = ContextCompressor(
                model="test",
                quiet_mode=True,
@@ -212,7 +207,8 @@ class TestCompressWithClient:
            {"role": "user", "content": "later 4"},
        ]

-        result = c.compress(msgs)
+        with patch("agent.context_compressor.call_llm", return_value=mock_response):
+            result = c.compress(msgs)

        answered_ids = {
            msg.get("tool_call_id")
@@ -232,8 +228,7 @@ class TestCompressWithClient:
        mock_response.choices[0].message.content = "[CONTEXT SUMMARY]: stuff happened"
        mock_client.chat.completions.create.return_value = mock_response

-        with patch("agent.context_compressor.get_model_context_length", return_value=100000), \
-             patch("agent.context_compressor.get_text_auxiliary_client", return_value=(mock_client, "test-model")):
+        with patch("agent.context_compressor.get_model_context_length", return_value=100000):
            c = ContextCompressor(model="test", quiet_mode=True, protect_first_n=2, protect_last_n=2)

        # Last head message (index 1) is "assistant" → summary should be "user"
@@ -245,7 +240,8 @@ class TestCompressWithClient:
            {"role": "user", "content": "msg 4"},
            {"role": "assistant", "content": "msg 5"},
        ]
-        result = c.compress(msgs)
+        with patch("agent.context_compressor.call_llm", return_value=mock_response):
+            result = c.compress(msgs)
        summary_msg = [m for m in result if "CONTEXT SUMMARY" in (m.get("content") or "")]
        assert len(summary_msg) == 1
        assert summary_msg[0]["role"] == "user"
@@ -258,8 +254,7 @@ class TestCompressWithClient:
        mock_response.choices[0].message.content = "[CONTEXT SUMMARY]: stuff happened"
        mock_client.chat.completions.create.return_value = mock_response

-        with patch("agent.context_compressor.get_model_context_length", return_value=100000), \
-             patch("agent.context_compressor.get_text_auxiliary_client", return_value=(mock_client, "test-model")):
+        with patch("agent.context_compressor.get_model_context_length", return_value=100000):
            c = ContextCompressor(model="test", quiet_mode=True, protect_first_n=3, protect_last_n=2)

        # Last head message (index 2) is "user" → summary should be "assistant"
@@ -273,20 +268,18 @@ class TestCompressWithClient:
            {"role": "user", "content": "msg 6"},
            {"role": "assistant", "content": "msg 7"},
        ]
-        result = c.compress(msgs)
+        with patch("agent.context_compressor.call_llm", return_value=mock_response):
+            result = c.compress(msgs)
        summary_msg = [m for m in result if "CONTEXT SUMMARY" in (m.get("content") or "")]
        assert len(summary_msg) == 1
        assert summary_msg[0]["role"] == "assistant"

    def test_summarization_does_not_start_tail_with_tool_outputs(self):
-        mock_client = MagicMock()
        mock_response = MagicMock()
        mock_response.choices = [MagicMock()]
        mock_response.choices[0].message.content = "[CONTEXT SUMMARY]: compressed middle"
-        mock_client.chat.completions.create.return_value = mock_response

-        with patch("agent.context_compressor.get_model_context_length", return_value=100000), \
-             patch("agent.context_compressor.get_text_auxiliary_client", return_value=(mock_client, "test-model")):
+        with patch("agent.context_compressor.get_model_context_length", return_value=100000):
            c = ContextCompressor(
                model="test",
                quiet_mode=True,
@@ -309,7 +302,8 @@ class TestCompressWithClient:
            {"role": "user", "content": "latest user"},
        ]

-        result = c.compress(msgs)
+        with patch("agent.context_compressor.call_llm", return_value=mock_response):
+            result = c.compress(msgs)

        called_ids = {
            tc["id"]
@@ -8,6 +8,8 @@ from agent.prompt_builder import (
    _scan_context_content,
    _truncate_content,
    _read_skill_description,
+    _read_skill_conditions,
+    _skill_should_show,
    build_skills_system_prompt,
    build_context_files_prompt,
    CONTEXT_FILE_MAX_CHARS,
@@ -277,3 +279,177 @@ class TestPromptBuilderConstants:
        assert "telegram" in PLATFORM_HINTS
        assert "discord" in PLATFORM_HINTS
        assert "cli" in PLATFORM_HINTS
+
+
+# =========================================================================
+# Conditional skill activation
+# =========================================================================
+
+class TestReadSkillConditions:
+    def test_no_conditions_returns_empty_lists(self, tmp_path):
+        skill_file = tmp_path / "SKILL.md"
+        skill_file.write_text("---\nname: test\ndescription: A skill\n---\n")
+        conditions = _read_skill_conditions(skill_file)
+        assert conditions["fallback_for_toolsets"] == []
+        assert conditions["requires_toolsets"] == []
+        assert conditions["fallback_for_tools"] == []
+        assert conditions["requires_tools"] == []
+
+    def test_reads_fallback_for_toolsets(self, tmp_path):
+        skill_file = tmp_path / "SKILL.md"
+        skill_file.write_text(
+            "---\nname: ddg\ndescription: DuckDuckGo\nmetadata:\n  hermes:\n    fallback_for_toolsets: [web]\n---\n"
+        )
+        conditions = _read_skill_conditions(skill_file)
+        assert conditions["fallback_for_toolsets"] == ["web"]
+
+    def test_reads_requires_toolsets(self, tmp_path):
+        skill_file = tmp_path / "SKILL.md"
+        skill_file.write_text(
+            "---\nname: openhue\ndescription: Hue lights\nmetadata:\n  hermes:\n    requires_toolsets: [terminal]\n---\n"
+        )
+        conditions = _read_skill_conditions(skill_file)
+        assert conditions["requires_toolsets"] == ["terminal"]
+
+    def test_reads_multiple_conditions(self, tmp_path):
+        skill_file = tmp_path / "SKILL.md"
+        skill_file.write_text(
+            "---\nname: test\ndescription: Test\nmetadata:\n  hermes:\n    fallback_for_toolsets: [browser]\n    requires_tools: [terminal]\n---\n"
+        )
+        conditions = _read_skill_conditions(skill_file)
+        assert conditions["fallback_for_toolsets"] == ["browser"]
+        assert conditions["requires_tools"] == ["terminal"]
+
+    def test_missing_file_returns_empty(self, tmp_path):
+        conditions = _read_skill_conditions(tmp_path / "missing.md")
+        assert conditions == {}
+
+
+class TestSkillShouldShow:
+    def test_no_filter_info_always_shows(self):
+        assert _skill_should_show({}, None, None) is True
+
+    def test_empty_conditions_always_shows(self):
+        assert _skill_should_show(
+            {"fallback_for_toolsets": [], "requires_toolsets": [],
+             "fallback_for_tools": [], "requires_tools": []},
+            {"web_search"}, {"web"}
+        ) is True
+
+    def test_fallback_hidden_when_toolset_available(self):
+        conditions = {"fallback_for_toolsets": ["web"], "requires_toolsets": [],
+                      "fallback_for_tools": [], "requires_tools": []}
+        assert _skill_should_show(conditions, set(), {"web"}) is False
+
+    def test_fallback_shown_when_toolset_unavailable(self):
+        conditions = {"fallback_for_toolsets": ["web"], "requires_toolsets": [],
+                      "fallback_for_tools": [], "requires_tools": []}
+        assert _skill_should_show(conditions, set(), set()) is True
+
+    def test_requires_shown_when_toolset_available(self):
+        conditions = {"fallback_for_toolsets": [], "requires_toolsets": ["terminal"],
+                      "fallback_for_tools": [], "requires_tools": []}
+        assert _skill_should_show(conditions, set(), {"terminal"}) is True
+
+    def test_requires_hidden_when_toolset_missing(self):
+        conditions = {"fallback_for_toolsets": [], "requires_toolsets": ["terminal"],
+                      "fallback_for_tools": [], "requires_tools": []}
+        assert _skill_should_show(conditions, set(), set()) is False
+
+    def test_fallback_for_tools_hidden_when_tool_available(self):
+        conditions = {"fallback_for_toolsets": [], "requires_toolsets": [],
+                      "fallback_for_tools": ["web_search"], "requires_tools": []}
+        assert _skill_should_show(conditions, {"web_search"}, set()) is False
+
+    def test_fallback_for_tools_shown_when_tool_missing(self):
+        conditions = {"fallback_for_toolsets": [], "requires_toolsets": [],
+                      "fallback_for_tools": ["web_search"], "requires_tools": []}
+        assert _skill_should_show(conditions, set(), set()) is True
+
+    def test_requires_tools_hidden_when_tool_missing(self):
+        conditions = {"fallback_for_toolsets": [], "requires_toolsets": [],
+                      "fallback_for_tools": [], "requires_tools": ["terminal"]}
+        assert _skill_should_show(conditions, set(), set()) is False
+
+    def test_requires_tools_shown_when_tool_available(self):
+        conditions = {"fallback_for_toolsets": [], "requires_toolsets": [],
+                      "fallback_for_tools": [], "requires_tools": ["terminal"]}
+        assert _skill_should_show(conditions, {"terminal"}, set()) is True
+
+
+class TestBuildSkillsSystemPromptConditional:
+    def test_fallback_skill_hidden_when_primary_available(self, monkeypatch, tmp_path):
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+        skill_dir = tmp_path / "skills" / "search" / "duckduckgo"
+        skill_dir.mkdir(parents=True)
+        (skill_dir / "SKILL.md").write_text(
+            "---\nname: duckduckgo\ndescription: Free web search\nmetadata:\n  hermes:\n    fallback_for_toolsets: [web]\n---\n"
+        )
+        result = build_skills_system_prompt(
+            available_tools=set(),
+            available_toolsets={"web"},
+        )
+        assert "duckduckgo" not in result
+
+    def test_fallback_skill_shown_when_primary_unavailable(self, monkeypatch, tmp_path):
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+        skill_dir = tmp_path / "skills" / "search" / "duckduckgo"
+        skill_dir.mkdir(parents=True)
+        (skill_dir / "SKILL.md").write_text(
+            "---\nname: duckduckgo\ndescription: Free web search\nmetadata:\n  hermes:\n    fallback_for_toolsets: [web]\n---\n"
+        )
+        result = build_skills_system_prompt(
+            available_tools=set(),
+            available_toolsets=set(),
+        )
+        assert "duckduckgo" in result
+
+    def test_requires_skill_hidden_when_toolset_missing(self, monkeypatch, tmp_path):
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+        skill_dir = tmp_path / "skills" / "iot" / "openhue"
+        skill_dir.mkdir(parents=True)
+        (skill_dir / "SKILL.md").write_text(
+            "---\nname: openhue\ndescription: Hue lights\nmetadata:\n  hermes:\n    requires_toolsets: [terminal]\n---\n"
+        )
+        result = build_skills_system_prompt(
+            available_tools=set(),
+            available_toolsets=set(),
+        )
+        assert "openhue" not in result
+
+    def test_requires_skill_shown_when_toolset_available(self, monkeypatch, tmp_path):
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+        skill_dir = tmp_path / "skills" / "iot" / "openhue"
+        skill_dir.mkdir(parents=True)
+        (skill_dir / "SKILL.md").write_text(
+            "---\nname: openhue\ndescription: Hue lights\nmetadata:\n  hermes:\n    requires_toolsets: [terminal]\n---\n"
+        )
+        result = build_skills_system_prompt(
+            available_tools=set(),
+            available_toolsets={"terminal"},
+        )
+        assert "openhue" in result
+
+    def test_unconditional_skill_always_shown(self, monkeypatch, tmp_path):
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+        skill_dir = tmp_path / "skills" / "general" / "notes"
+        skill_dir.mkdir(parents=True)
+        (skill_dir / "SKILL.md").write_text(
+            "---\nname: notes\ndescription: Take notes\n---\n"
+        )
+        result = build_skills_system_prompt(
+            available_tools=set(),
+            available_toolsets=set(),
+        )
+        assert "notes" in result
+
+    def test_no_args_shows_all_skills(self, monkeypatch, tmp_path):
+        """Backward compat: calling with no args shows everything."""
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+        skill_dir = tmp_path / "skills" / "search" / "duckduckgo"
+        skill_dir.mkdir(parents=True)
+        (skill_dir / "SKILL.md").write_text(
+            "---\nname: duckduckgo\ndescription: Free web search\nmetadata:\n  hermes:\n    fallback_for_toolsets: [web]\n---\n"
+        )
+        result = build_skills_system_prompt()
+        assert "duckduckgo" in result
@@ -1,6 +1,7 @@
 """Shared fixtures for the hermes-agent test suite."""

 import os
+import signal
 import sys
 import tempfile
 from pathlib import Path
@@ -48,3 +49,21 @@ def mock_config():
        "memory": {"memory_enabled": False, "user_profile_enabled": False},
        "command_allowlist": [],
    }
+
+
+# ── Global test timeout ─────────────────────────────────────────────────────
+# Fail any individual test that runs longer than 30 seconds (SIGALRM-based,
+# so Unix-only). Prevents hanging tests (subprocess spawns, blocking I/O)
+# from stalling the entire test suite.
+
+def _timeout_handler(signum, frame):
+    raise TimeoutError("Test exceeded 30 second timeout")
+
+@pytest.fixture(autouse=True)
+def _enforce_test_timeout():
+    """Kill any individual test that takes longer than 30 seconds."""
+    old = signal.signal(signal.SIGALRM, _timeout_handler)
+    signal.alarm(30)
+    yield
+    signal.alarm(0)
+    signal.signal(signal.SIGALRM, old)
@@ -16,6 +16,7 @@ class TestResolveOrigin:
                "platform": "telegram",
                "chat_id": "123456",
                "chat_name": "Test Chat",
+                "thread_id": "42",
            }
        }
        result = _resolve_origin(job)
@@ -24,6 +25,7 @@ class TestResolveOrigin:
        assert result["platform"] == "telegram"
        assert result["chat_id"] == "123456"
        assert result["chat_name"] == "Test Chat"
+        assert result["thread_id"] == "42"

    def test_no_origin(self):
        assert _resolve_origin({}) is None
@@ -68,6 +70,41 @@ class TestDeliverResultMirrorLogging:
        assert any("mirror_to_session failed" in r.message for r in caplog.records), \
            f"Expected 'mirror_to_session failed' warning in logs, got: {[r.message for r in caplog.records]}"

+    def test_origin_delivery_preserves_thread_id(self):
+        """Origin delivery should forward thread_id to send/mirror helpers."""
+        from gateway.config import Platform
+
+        pconfig = MagicMock()
+        pconfig.enabled = True
+        mock_cfg = MagicMock()
+        mock_cfg.platforms = {Platform.TELEGRAM: pconfig}
+
+        job = {
+            "id": "test-job",
+            "deliver": "origin",
+            "origin": {
+                "platform": "telegram",
+                "chat_id": "-1001",
+                "thread_id": "17585",
+            },
+        }
+
+        with patch("gateway.config.load_gateway_config", return_value=mock_cfg), \
+             patch("tools.send_message_tool._send_to_platform", return_value={"success": True}) as send_mock, \
+             patch("gateway.mirror.mirror_to_session") as mirror_mock, \
+             patch("asyncio.run", side_effect=lambda coro: None):
+            _deliver_result(job, "hello")
+
+        send_mock.assert_called_once()
+        assert send_mock.call_args.kwargs["thread_id"] == "17585"
+        mirror_mock.assert_called_once_with(
+            "telegram",
+            "-1001",
+            "hello",
+            source_label="cron",
+            thread_id="17585",
+        )
+

 class TestRunJobConfigLogging:
    """Verify that config.yaml parse failures are logged, not silently swallowed."""
@@ -0,0 +1,305 @@
+"""Tests for /background gateway slash command.
+
+Tests the _handle_background_command handler (run a prompt in a separate
+background session) across gateway messenger platforms.
+"""
+
+import asyncio
+import os
+from unittest.mock import AsyncMock, MagicMock, patch
+
+import pytest
+
+from gateway.config import Platform
+from gateway.platforms.base import MessageEvent
+from gateway.session import SessionSource
+
+
+def _make_event(text="/background", platform=Platform.TELEGRAM,
+                user_id="12345", chat_id="67890"):
+    """Build a MessageEvent for testing."""
+    source = SessionSource(
+        platform=platform,
+        user_id=user_id,
+        chat_id=chat_id,
+        user_name="testuser",
+    )
+    return MessageEvent(text=text, source=source)
+
+
+def _make_runner():
+    """Create a bare GatewayRunner with minimal mocks."""
+    from gateway.run import GatewayRunner
+    runner = object.__new__(GatewayRunner)
+    runner.adapters = {}
+    runner._session_db = None
+    runner._reasoning_config = None
+    runner._provider_routing = {}
+    runner._fallback_model = None
+    runner._running_agents = {}
+
+    mock_store = MagicMock()
+    runner.session_store = mock_store
+
+    from gateway.hooks import HookRegistry
+    runner.hooks = HookRegistry()
+
+    return runner
+
+
+# ---------------------------------------------------------------------------
+# _handle_background_command
+# ---------------------------------------------------------------------------
+
+
+class TestHandleBackgroundCommand:
+    """Tests for GatewayRunner._handle_background_command."""
+
+    @pytest.mark.asyncio
+    async def test_no_prompt_shows_usage(self):
+        """Running /background with no prompt shows usage."""
+        runner = _make_runner()
+        event = _make_event(text="/background")
+        result = await runner._handle_background_command(event)
+        assert "Usage:" in result
+        assert "/background" in result
+
+    @pytest.mark.asyncio
+    async def test_empty_prompt_shows_usage(self):
+        """Running /background with only whitespace shows usage."""
+        runner = _make_runner()
+        event = _make_event(text="/background   ")
+        result = await runner._handle_background_command(event)
+        assert "Usage:" in result
+
+    @pytest.mark.asyncio
+    async def test_valid_prompt_starts_task(self):
+        """Running /background with a prompt returns confirmation and starts task."""
+        runner = _make_runner()
+
+        # Patch asyncio.create_task so no real task is scheduled; each call
+        # is recorded with a stand-in mock task instead.
+        created_tasks = []
+
+        def capture_task(coro, *args, **kwargs):
+            # Close the coroutine to avoid warnings
+            coro.close()
+            mock_task = MagicMock()
+            created_tasks.append(mock_task)
+            return mock_task
+
+        with patch("gateway.run.asyncio.create_task", side_effect=capture_task):
+            event = _make_event(text="/background Summarize the top HN stories")
+            result = await runner._handle_background_command(event)
+
+        assert "🔄" in result
+        assert "Background task started" in result
+        assert "bg_" in result  # task ID starts with bg_
+        assert "Summarize the top HN stories" in result
+        assert len(created_tasks) == 1  # background task was created
+
+    @pytest.mark.asyncio
+    async def test_prompt_truncated_in_preview(self):
+        """Long prompts are truncated to 60 chars in the confirmation message."""
+        runner = _make_runner()
+        long_prompt = "A" * 100
+
+        with patch("gateway.run.asyncio.create_task", side_effect=lambda c, **kw: (c.close(), MagicMock())[1]):
+            event = _make_event(text=f"/background {long_prompt}")
+            result = await runner._handle_background_command(event)
+
+        assert "..." in result
+        # Should not contain the full prompt
+        assert long_prompt not in result
+
+    @pytest.mark.asyncio
+    async def test_task_id_is_unique(self):
+        """Each background task gets a unique task ID."""
+        runner = _make_runner()
+        task_ids = set()
+
+        with patch("gateway.run.asyncio.create_task", side_effect=lambda c, **kw: (c.close(), MagicMock())[1]):
+            for i in range(5):
+                event = _make_event(text=f"/background task {i}")
+                result = await runner._handle_background_command(event)
+                # Extract task ID from result (format: "Task ID: bg_HHMMSS_hex")
+                for line in result.split("\n"):
+                    if "Task ID:" in line:
+                        tid = line.split("Task ID:")[1].strip()
+                        task_ids.add(tid)
+
+        assert len(task_ids) == 5  # all unique
+
+    @pytest.mark.asyncio
+    async def test_works_across_platforms(self):
+        """The /background command works for all platforms."""
+        for platform in [Platform.TELEGRAM, Platform.DISCORD, Platform.SLACK]:
+            runner = _make_runner()
+            with patch("gateway.run.asyncio.create_task", side_effect=lambda c, **kw: (c.close(), MagicMock())[1]):
+                event = _make_event(
+                    text="/background test task",
+                    platform=platform,
+                )
+                result = await runner._handle_background_command(event)
+                assert "Background task started" in result
+
+
+# ---------------------------------------------------------------------------
+# _run_background_task
+# ---------------------------------------------------------------------------
+
+
+class TestRunBackgroundTask:
+    """Tests for GatewayRunner._run_background_task (the actual execution)."""
+
+    @pytest.mark.asyncio
+    async def test_no_adapter_returns_silently(self):
+        """When no adapter is available, the task returns without error."""
+        runner = _make_runner()
+        source = SessionSource(
+            platform=Platform.TELEGRAM,
+            user_id="12345",
+            chat_id="67890",
+            user_name="testuser",
+        )
+        # No adapters set — should not raise
+        await runner._run_background_task("test prompt", source, "bg_test")
+
+    @pytest.mark.asyncio
+    async def test_no_credentials_sends_error(self):
+        """When provider credentials are missing, an error is sent."""
+        runner = _make_runner()
+        mock_adapter = AsyncMock()
+        mock_adapter.send = AsyncMock()
+        runner.adapters[Platform.TELEGRAM] = mock_adapter
+
+        source = SessionSource(
+            platform=Platform.TELEGRAM,
+            user_id="12345",
+            chat_id="67890",
+            user_name="testuser",
+        )
+
+        with patch("gateway.run._resolve_runtime_agent_kwargs", return_value={"api_key": None}):
+            await runner._run_background_task("test prompt", source, "bg_test")
+
+        # Should have sent an error message
+        mock_adapter.send.assert_called_once()
+        call_args = mock_adapter.send.call_args
+        assert "failed" in call_args[1].get("content", call_args[0][1] if len(call_args[0]) > 1 else "").lower()
+
+    @pytest.mark.asyncio
+    async def test_successful_task_sends_result(self):
+        """When the agent completes successfully, the result is sent."""
+        runner = _make_runner()
+        mock_adapter = AsyncMock()
+        mock_adapter.send = AsyncMock()
+        mock_adapter.extract_media = MagicMock(return_value=([], "Hello from background!"))
+        mock_adapter.extract_images = MagicMock(return_value=([], "Hello from background!"))
+        runner.adapters[Platform.TELEGRAM] = mock_adapter
+
+        source = SessionSource(
+            platform=Platform.TELEGRAM,
+            user_id="12345",
+            chat_id="67890",
+            user_name="testuser",
+        )
+
+        mock_result = {"final_response": "Hello from background!", "messages": []}
+
+        with patch("gateway.run._resolve_runtime_agent_kwargs", return_value={"api_key": "test-key"}), \
+             patch("run_agent.AIAgent") as MockAgent:
+            mock_agent_instance = MagicMock()
+            mock_agent_instance.run_conversation.return_value = mock_result
+            MockAgent.return_value = mock_agent_instance
+
+            await runner._run_background_task("say hello", source, "bg_test")
+
+        # Should have sent the result
+        mock_adapter.send.assert_called_once()
+        call_args = mock_adapter.send.call_args
+        content = call_args[1].get("content", call_args[0][1] if len(call_args[0]) > 1 else "")
+        assert "Background task complete" in content
+        assert "Hello from background!" in content
+
+    @pytest.mark.asyncio
+    async def test_exception_sends_error_message(self):
+        """When the agent raises an exception, an error message is sent."""
+        runner = _make_runner()
+        mock_adapter = AsyncMock()
+        mock_adapter.send = AsyncMock()
+        runner.adapters[Platform.TELEGRAM] = mock_adapter
+
+        source = SessionSource(
+            platform=Platform.TELEGRAM,
+            user_id="12345",
+            chat_id="67890",
+            user_name="testuser",
+        )
+
+        with patch("gateway.run._resolve_runtime_agent_kwargs", side_effect=RuntimeError("boom")):
+            await runner._run_background_task("test prompt", source, "bg_test")
+
+        mock_adapter.send.assert_called_once()
+        call_args = mock_adapter.send.call_args
+        content = call_args[1].get("content", call_args[0][1] if len(call_args[0]) > 1 else "")
+        assert "failed" in content.lower()
+
+
+# ---------------------------------------------------------------------------
+# /background in help and known_commands
+# ---------------------------------------------------------------------------
+
+
+class TestBackgroundInHelp:
+    """Verify /background appears in help text and known commands."""
+
+    @pytest.mark.asyncio
+    async def test_background_in_help_output(self):
+        """The /help output includes /background."""
+        runner = _make_runner()
+        event = _make_event(text="/help")
+        result = await runner._handle_help_command(event)
+        assert "/background" in result
+
+    def test_background_is_known_command(self):
+        """The "background" command name appears in _handle_message's source."""
+        from gateway.run import GatewayRunner
+        import inspect
+        source = inspect.getsource(GatewayRunner._handle_message)
+        assert '"background"' in source
+
+
+# ---------------------------------------------------------------------------
+# CLI /background command definition
+# ---------------------------------------------------------------------------
+
+
+class TestBackgroundInCLICommands:
+    """Verify /background is registered in the CLI command system."""
+
+    def test_background_in_commands_dict(self):
+        """The /background command is in the COMMANDS dict."""
+        from hermes_cli.commands import COMMANDS
+        assert "/background" in COMMANDS
+
+    def test_background_in_session_category(self):
+        """The /background command is in the Session category."""
+        from hermes_cli.commands import COMMANDS_BY_CATEGORY
+        assert "/background" in COMMANDS_BY_CATEGORY["Session"]
+
+    def test_background_autocompletes(self):
+        """The /background command appears in autocomplete results."""
+        from hermes_cli.commands import SlashCommandCompleter
+        from prompt_toolkit.document import Document
+
+        completer = SlashCommandCompleter()
+        doc = Document("backgro")  # Partial match
+        completions = list(completer.get_completions(doc, None))
+        # Text doesn't start with / so no completions
+        assert len(completions) == 0
+
+        doc = Document("/backgro")  # With slash prefix
+        completions = list(completer.get_completions(doc, None))
+        cmd_displays = [str(c.display) for c in completions]
+        assert any("/background" in d for d in cmd_displays)
@@ -0,0 +1,135 @@
+"""Tests for BasePlatformAdapter topic-aware session handling."""
+
+import asyncio
+from types import SimpleNamespace
+
+import pytest
+
+from gateway.config import Platform, PlatformConfig
+from gateway.platforms.base import BasePlatformAdapter, MessageEvent, SendResult
+from gateway.session import SessionSource, build_session_key
+
+
+class DummyTelegramAdapter(BasePlatformAdapter):
+    def __init__(self):
+        super().__init__(PlatformConfig(enabled=True, token="fake-token"), Platform.TELEGRAM)
+        self.sent = []
+        self.typing = []
+
+    async def connect(self) -> bool:
+        return True
+
+    async def disconnect(self) -> None:
+        return None
+
+    async def send(self, chat_id, content, reply_to=None, metadata=None) -> SendResult:
+        self.sent.append(
+            {
+                "chat_id": chat_id,
+                "content": content,
+                "reply_to": reply_to,
+                "metadata": metadata,
+            }
+        )
+        return SendResult(success=True, message_id="1")
+
+    async def send_typing(self, chat_id: str, metadata=None) -> None:
+        self.typing.append({"chat_id": chat_id, "metadata": metadata})
+        return None
+
+    async def get_chat_info(self, chat_id: str):
+        return {"id": chat_id}
+
+
+def _make_event(chat_id: str, thread_id: str, message_id: str = "1") -> MessageEvent:
+    return MessageEvent(
+        text="hello",
+        source=SessionSource(
+            platform=Platform.TELEGRAM,
+            chat_id=chat_id,
+            chat_type="group",
+            thread_id=thread_id,
+        ),
+        message_id=message_id,
+    )
+
+
+class TestBasePlatformTopicSessions:
+    @pytest.mark.asyncio
+    async def test_handle_message_does_not_interrupt_different_topic(self, monkeypatch):
+        adapter = DummyTelegramAdapter()
+        adapter.set_message_handler(lambda event: asyncio.sleep(0, result=None))
+
+        active_event = _make_event("-1001", "10")
+        adapter._active_sessions[build_session_key(active_event.source)] = asyncio.Event()
+
+        scheduled = []
+
+        def fake_create_task(coro):
+            scheduled.append(coro)
+            coro.close()
+            return SimpleNamespace()
+
+        monkeypatch.setattr(asyncio, "create_task", fake_create_task)
+
+        await adapter.handle_message(_make_event("-1001", "11"))
+
+        assert len(scheduled) == 1
+        assert adapter._pending_messages == {}
+
+    @pytest.mark.asyncio
+    async def test_handle_message_interrupts_same_topic(self, monkeypatch):
+        adapter = DummyTelegramAdapter()
+        adapter.set_message_handler(lambda event: asyncio.sleep(0, result=None))
+
+        active_event = _make_event("-1001", "10")
+        adapter._active_sessions[build_session_key(active_event.source)] = asyncio.Event()
+
+        scheduled = []
+
+        def fake_create_task(coro):
+            scheduled.append(coro)
+            coro.close()
+            return SimpleNamespace()
+
+        monkeypatch.setattr(asyncio, "create_task", fake_create_task)
+
+        pending_event = _make_event("-1001", "10", message_id="2")
+        await adapter.handle_message(pending_event)
+
+        assert scheduled == []
+        assert adapter.get_pending_message(build_session_key(pending_event.source)) == pending_event
+
+    @pytest.mark.asyncio
+    async def test_process_message_background_replies_in_same_topic(self):
+        adapter = DummyTelegramAdapter()
+        typing_calls = []
+
+        async def handler(_event):
+            await asyncio.sleep(0)
+            return "ack"
+
+        async def hold_typing(_chat_id, interval=2.0, metadata=None):
+            typing_calls.append({"chat_id": _chat_id, "metadata": metadata})
+            await asyncio.Event().wait()
+
+        adapter.set_message_handler(handler)
+        adapter._keep_typing = hold_typing
+
+        event = _make_event("-1001", "17585")
+        await adapter._process_message_background(event, build_session_key(event.source))
+
+        assert adapter.sent == [
+            {
+                "chat_id": "-1001",
+                "content": "ack",
+                "reply_to": "1",
+                "metadata": {"thread_id": "17585"},
+            }
+        ]
+        assert typing_calls == [
+            {
+                "chat_id": "-1001",
+                "metadata": {"thread_id": "17585"},
+            }
+        ]
@@ -111,6 +111,13 @@ class TestResolveChannelName:
        with self._setup(tmp_path, platforms):
            assert resolve_channel_name("telegram", "nonexistent") is None

+    def test_topic_name_resolves_to_composite_id(self, tmp_path):
+        platforms = {
+            "telegram": [{"id": "-1001:17585", "name": "Coaching Chat / topic 17585", "type": "group"}]
+        }
+        with self._setup(tmp_path, platforms):
+            assert resolve_channel_name("telegram", "Coaching Chat / topic 17585") == "-1001:17585"
+

 class TestBuildFromSessions:
    def _write_sessions(self, tmp_path, sessions_data):
@@ -169,6 +176,42 @@ class TestBuildFromSessions:

        assert len(entries) == 1

+    def test_keeps_distinct_topics_with_same_chat_id(self, tmp_path):
+        self._write_sessions(tmp_path, {
+            "group_root": {
+                "origin": {"platform": "telegram", "chat_id": "-1001", "chat_name": "Coaching Chat"},
+                "chat_type": "group",
+            },
+            "topic_a": {
+                "origin": {
+                    "platform": "telegram",
+                    "chat_id": "-1001",
+                    "chat_name": "Coaching Chat",
+                    "thread_id": "17585",
+                },
+                "chat_type": "group",
+            },
+            "topic_b": {
+                "origin": {
+                    "platform": "telegram",
+                    "chat_id": "-1001",
+                    "chat_name": "Coaching Chat",
+                    "thread_id": "17587",
+                },
+                "chat_type": "group",
+            },
+        })
+
+        with patch.object(Path, "home", return_value=tmp_path):
+            entries = _build_from_sessions("telegram")
+
+        ids = {entry["id"] for entry in entries}
+        names = {entry["name"] for entry in entries}
+        assert ids == {"-1001", "-1001:17585", "-1001:17587"}
+        assert "Coaching Chat" in names
+        assert "Coaching Chat / topic 17585" in names
+        assert "Coaching Chat / topic 17587" in names
+

 class TestFormatDirectoryForDisplay:
    def test_empty_directory(self, tmp_path):
@@ -181,6 +224,7 @@ class TestFormatDirectoryForDisplay:
            "telegram": [
                {"id": "123", "name": "Alice", "type": "dm"},
                {"id": "456", "name": "Dev Group", "type": "group"},
+                {"id": "-1001:17585", "name": "Coaching Chat / topic 17585", "type": "group"},
            ]
        })
        with patch("gateway.channel_directory.DIRECTORY_PATH", cache_file):
@@ -189,6 +233,7 @@ class TestFormatDirectoryForDisplay:
        assert "Telegram:" in result
        assert "telegram:Alice" in result
        assert "telegram:Dev Group" in result
+        assert "telegram:Coaching Chat / topic 17585" in result

    def test_discord_grouped_by_guild(self, tmp_path):
        cache_file = _write_directory(tmp_path, {
@@ -24,10 +24,11 @@ class TestParseTargetPlatformChat:
        assert target.chat_id is None

    def test_origin_with_source(self):
-        origin = SessionSource(platform=Platform.TELEGRAM, chat_id="789")
+        origin = SessionSource(platform=Platform.TELEGRAM, chat_id="789", thread_id="42")
        target = DeliveryTarget.parse("origin", origin=origin)
        assert target.platform == Platform.TELEGRAM
        assert target.chat_id == "789"
+        assert target.thread_id == "42"
        assert target.is_origin is True

    def test_origin_without_source(self):
@@ -64,7 +65,7 @@ class TestParseDeliverSpec:

 class TestTargetToStringRoundtrip:
    def test_origin_roundtrip(self):
-        origin = SessionSource(platform=Platform.TELEGRAM, chat_id="111")
+        origin = SessionSource(platform=Platform.TELEGRAM, chat_id="111", thread_id="42")
        target = DeliveryTarget.parse("origin", origin=origin)
        assert target.to_string() == "origin"

@@ -0,0 +1,117 @@
+"""Tests for Discord bot message filtering (DISCORD_ALLOW_BOTS)."""
+
+import asyncio
+import os
+import unittest
+from unittest.mock import AsyncMock, MagicMock, patch
+
+
+def _make_author(*, bot: bool = False, is_self: bool = False):
+    """Create a mock Discord author."""
+    author = MagicMock()
+    author.bot = bot
+    author.id = 99999 if is_self else 12345
+    author.name = "TestBot" if bot else "TestUser"
+    author.display_name = author.name
+    return author
+
+
+def _make_message(*, author=None, content="hello", mentions=None, is_dm=False):
+    """Create a mock Discord message."""
+    msg = MagicMock()
+    msg.author = author or _make_author()
+    msg.content = content
+    msg.attachments = []
+    msg.mentions = mentions or []
+    if is_dm:
+        import discord
+        msg.channel = MagicMock(spec=discord.DMChannel)
+        msg.channel.id = 111
+    else:
+        msg.channel = MagicMock()
+        msg.channel.id = 222
+        msg.channel.name = "test-channel"
+        msg.channel.guild = MagicMock()
+        msg.channel.guild.name = "TestServer"
+        # A plain MagicMock (no spec) already fails isinstance checks for DMChannel and Thread; the name below is only a label
+        type(msg.channel).__name__ = "TextChannel"
+    return msg
+
+
+class TestDiscordBotFilter(unittest.TestCase):
+    """Test the DISCORD_ALLOW_BOTS filtering logic."""
+
+    def _run_filter(self, message, allow_bots="none", client_user=None):
+        """Simulate the on_message filter logic and return whether message was accepted."""
+        # Mirrors the on_message filter logic in discord.py; this is a hand-written copy, so keep the two in sync
+        if message.author == client_user:
+            return False  # own messages always ignored
+
+        if getattr(message.author, "bot", False):
+            allow = allow_bots.lower().strip()
+            if allow == "none":
+                return False
+            elif allow == "mentions":
+                if not client_user or client_user not in message.mentions:
+                    return False
+            # "all" falls through
+
+        return True  # message accepted
+
+    def test_own_messages_always_ignored(self):
+        """Bot's own messages are always ignored regardless of allow_bots."""
+        bot_user = _make_author(is_self=True)
+        msg = _make_message(author=bot_user)
+        self.assertFalse(self._run_filter(msg, "all", bot_user))
+
+    def test_human_messages_always_accepted(self):
+        """Human messages are always accepted regardless of allow_bots."""
+        human = _make_author(bot=False)
+        msg = _make_message(author=human)
+        self.assertTrue(self._run_filter(msg, "none"))
+        self.assertTrue(self._run_filter(msg, "mentions"))
+        self.assertTrue(self._run_filter(msg, "all"))
+
+    def test_allow_bots_none_rejects_bots(self):
+        """With allow_bots=none, all other bot messages are rejected."""
+        bot = _make_author(bot=True)
+        msg = _make_message(author=bot)
+        self.assertFalse(self._run_filter(msg, "none"))
+
+    def test_allow_bots_all_accepts_bots(self):
+        """With allow_bots=all, all bot messages are accepted."""
+        bot = _make_author(bot=True)
+        msg = _make_message(author=bot)
+        self.assertTrue(self._run_filter(msg, "all"))
+
+    def test_allow_bots_mentions_rejects_without_mention(self):
+        """With allow_bots=mentions, bot messages without @mention are rejected."""
+        our_user = _make_author(is_self=True)
+        bot = _make_author(bot=True)
+        msg = _make_message(author=bot, mentions=[])
+        self.assertFalse(self._run_filter(msg, "mentions", our_user))
+
+    def test_allow_bots_mentions_accepts_with_mention(self):
+        """With allow_bots=mentions, bot messages with @mention are accepted."""
+        our_user = _make_author(is_self=True)
+        bot = _make_author(bot=True)
+        msg = _make_message(author=bot, mentions=[our_user])
+        self.assertTrue(self._run_filter(msg, "mentions", our_user))
+
+    def test_default_is_none(self):
+        """Default behavior (no env var) should be 'none'."""
+        with patch.dict(os.environ, {}, clear=True):
+            self.assertEqual(os.getenv("DISCORD_ALLOW_BOTS", "none"), "none")
+
+    def test_case_insensitive(self):
+        """Allow_bots value should be case-insensitive."""
+        bot = _make_author(bot=True)
+        msg = _make_message(author=bot)
+        self.assertTrue(self._run_filter(msg, "ALL"))
+        self.assertTrue(self._run_filter(msg, "All"))
+        self.assertFalse(self._run_filter(msg, "NONE"))
+        self.assertFalse(self._run_filter(msg, "None"))
+
+
+if __name__ == "__main__":
+    unittest.main()
@@ -0,0 +1,249 @@
+"""Tests for Discord free-response defaults and mention gating."""
+
+from datetime import datetime, timezone
+from types import SimpleNamespace
+from unittest.mock import AsyncMock, MagicMock
+import sys
+
+import pytest
+
+from gateway.config import PlatformConfig
+
+
+def _ensure_discord_mock():
+    """Install a mock discord module when discord.py isn't available."""
+    if "discord" in sys.modules and hasattr(sys.modules["discord"], "__file__"):
+        return
+
+    discord_mod = MagicMock()
+    discord_mod.Intents.default.return_value = MagicMock()
+    discord_mod.Client = MagicMock
+    discord_mod.File = MagicMock
+    discord_mod.DMChannel = type("DMChannel", (), {})
+    discord_mod.Thread = type("Thread", (), {})
+    discord_mod.ForumChannel = type("ForumChannel", (), {})
+    discord_mod.ui = SimpleNamespace(View=object, button=lambda *a, **k: (lambda fn: fn), Button=object)
+    discord_mod.ButtonStyle = SimpleNamespace(success=1, primary=2, danger=3, green=1, blurple=2, red=3)
+    discord_mod.Color = SimpleNamespace(orange=lambda: 1, green=lambda: 2, blue=lambda: 3, red=lambda: 4)
+    discord_mod.Interaction = object
+    discord_mod.Embed = MagicMock
+
+    ext_mod = MagicMock()
+    commands_mod = MagicMock()
+    commands_mod.Bot = MagicMock
+    ext_mod.commands = commands_mod
+
+    sys.modules.setdefault("discord", discord_mod)
+    sys.modules.setdefault("discord.ext", ext_mod)
+    sys.modules.setdefault("discord.ext.commands", commands_mod)
+
+
+_ensure_discord_mock()
+
+import gateway.platforms.discord as discord_platform  # noqa: E402
+from gateway.platforms.discord import DiscordAdapter  # noqa: E402
+
+
+class FakeDMChannel:
+    def __init__(self, channel_id: int = 1, name: str = "dm"):
+        self.id = channel_id
+        self.name = name
+
+
+class FakeTextChannel:
+    def __init__(self, channel_id: int = 1, name: str = "general", guild_name: str = "Hermes Server"):
+        self.id = channel_id
+        self.name = name
+        self.guild = SimpleNamespace(name=guild_name)
+        self.topic = None
+
+
+class FakeForumChannel:
+    def __init__(self, channel_id: int = 1, name: str = "support-forum", guild_name: str = "Hermes Server"):
+        self.id = channel_id
+        self.name = name
+        self.guild = SimpleNamespace(name=guild_name)
+        self.type = 15
+        self.topic = None
+
+
+class FakeThread:
+    def __init__(self, channel_id: int = 1, name: str = "thread", parent=None, guild_name: str = "Hermes Server"):
+        self.id = channel_id
+        self.name = name
+        self.parent = parent
+        self.parent_id = getattr(parent, "id", None)
+        self.guild = getattr(parent, "guild", None) or SimpleNamespace(name=guild_name)
+        self.topic = None
+
+
+@pytest.fixture
+def adapter(monkeypatch):
+    monkeypatch.setattr(discord_platform.discord, "DMChannel", FakeDMChannel, raising=False)
+    monkeypatch.setattr(discord_platform.discord, "Thread", FakeThread, raising=False)
+    monkeypatch.setattr(discord_platform.discord, "ForumChannel", FakeForumChannel, raising=False)
+
+    config = PlatformConfig(enabled=True, token="fake-token")
+    adapter = DiscordAdapter(config)
+    adapter._client = SimpleNamespace(user=SimpleNamespace(id=999))
+    adapter.handle_message = AsyncMock()
+    return adapter
+
+
+def make_message(*, channel, content: str, mentions=None):
+    author = SimpleNamespace(id=42, display_name="Jezza", name="Jezza")
+    return SimpleNamespace(
+        id=123,
+        content=content,
+        mentions=list(mentions or []),
+        attachments=[],
+        reference=None,
+        created_at=datetime.now(timezone.utc),
+        channel=channel,
+        author=author,
+    )
+
+
+@pytest.mark.asyncio
+async def test_discord_defaults_to_require_mention(adapter, monkeypatch):
+    """Default behavior: require @mention in server channels."""
+    monkeypatch.delenv("DISCORD_REQUIRE_MENTION", raising=False)
+    monkeypatch.delenv("DISCORD_FREE_RESPONSE_CHANNELS", raising=False)
+
+    message = make_message(channel=FakeTextChannel(channel_id=123), content="hello from channel")
+
+    await adapter._handle_message(message)
+
+    # Should be ignored — no mention, require_mention defaults to true
+    adapter.handle_message.assert_not_awaited()
+
+
+@pytest.mark.asyncio
+async def test_discord_free_response_in_server_channels(adapter, monkeypatch):
+    monkeypatch.setenv("DISCORD_REQUIRE_MENTION", "false")
+    monkeypatch.delenv("DISCORD_FREE_RESPONSE_CHANNELS", raising=False)
+
+    message = make_message(channel=FakeTextChannel(channel_id=123), content="hello from channel")
+
+    await adapter._handle_message(message)
+
+    adapter.handle_message.assert_awaited_once()
+    event = adapter.handle_message.await_args.args[0]
+    assert event.text == "hello from channel"
+    assert event.source.chat_id == "123"
+    assert event.source.chat_type == "group"
+
+
+@pytest.mark.asyncio
+async def test_discord_free_response_in_threads(adapter, monkeypatch):
+    monkeypatch.setenv("DISCORD_REQUIRE_MENTION", "false")
+    monkeypatch.delenv("DISCORD_FREE_RESPONSE_CHANNELS", raising=False)
+
+    thread = FakeThread(channel_id=456, name="Ghost reader skill")
+    message = make_message(channel=thread, content="hello from thread")
+
+    await adapter._handle_message(message)
+
+    adapter.handle_message.assert_awaited_once()
+    event = adapter.handle_message.await_args.args[0]
+    assert event.text == "hello from thread"
+    assert event.source.chat_id == "456"
+    assert event.source.thread_id == "456"
+    assert event.source.chat_type == "thread"
+
+
+@pytest.mark.asyncio
+async def test_discord_forum_threads_are_handled_as_threads(adapter, monkeypatch):
+    monkeypatch.setenv("DISCORD_REQUIRE_MENTION", "false")
+    monkeypatch.delenv("DISCORD_FREE_RESPONSE_CHANNELS", raising=False)
+
+    forum = FakeForumChannel(channel_id=222, name="support-forum")
+    thread = FakeThread(channel_id=456, name="Can Hermes reply here?", parent=forum)
+    message = make_message(channel=thread, content="hello from forum post")
+
+    await adapter._handle_message(message)
+
+    adapter.handle_message.assert_awaited_once()
+    event = adapter.handle_message.await_args.args[0]
+    assert event.text == "hello from forum post"
+    assert event.source.chat_id == "456"
+    assert event.source.thread_id == "456"
+    assert event.source.chat_type == "thread"
+    assert event.source.chat_name == "Hermes Server / support-forum / Can Hermes reply here?"
+
+
+@pytest.mark.asyncio
+async def test_discord_can_still_require_mentions_when_enabled(adapter, monkeypatch):
+    monkeypatch.setenv("DISCORD_REQUIRE_MENTION", "true")
+    monkeypatch.delenv("DISCORD_FREE_RESPONSE_CHANNELS", raising=False)
+
+    message = make_message(channel=FakeTextChannel(channel_id=789), content="ignored without mention")
+
+    await adapter._handle_message(message)
+
+    adapter.handle_message.assert_not_awaited()
+
+
+@pytest.mark.asyncio
+async def test_discord_free_response_channel_overrides_mention_requirement(adapter, monkeypatch):
+    monkeypatch.setenv("DISCORD_REQUIRE_MENTION", "true")
+    monkeypatch.setenv("DISCORD_FREE_RESPONSE_CHANNELS", "789,999")
+
+    message = make_message(channel=FakeTextChannel(channel_id=789), content="allowed without mention")
+
+    await adapter._handle_message(message)
+
+    adapter.handle_message.assert_awaited_once()
+    event = adapter.handle_message.await_args.args[0]
+    assert event.text == "allowed without mention"
+
+
+@pytest.mark.asyncio
+async def test_discord_forum_parent_in_free_response_list_allows_forum_thread(adapter, monkeypatch):
+    monkeypatch.setenv("DISCORD_REQUIRE_MENTION", "true")
+    monkeypatch.setenv("DISCORD_FREE_RESPONSE_CHANNELS", "222")
+
+    forum = FakeForumChannel(channel_id=222, name="support-forum")
+    thread = FakeThread(channel_id=333, name="Forum topic", parent=forum)
+    message = make_message(channel=thread, content="allowed from forum thread")
+
+    await adapter._handle_message(message)
+
+    adapter.handle_message.assert_awaited_once()
+    event = adapter.handle_message.await_args.args[0]
+    assert event.text == "allowed from forum thread"
+    assert event.source.chat_id == "333"
+
+
+@pytest.mark.asyncio
+async def test_discord_accepts_and_strips_bot_mentions_when_required(adapter, monkeypatch):
+    monkeypatch.setenv("DISCORD_REQUIRE_MENTION", "true")
+    monkeypatch.delenv("DISCORD_FREE_RESPONSE_CHANNELS", raising=False)
+
+    bot_user = adapter._client.user
+    message = make_message(
+        channel=FakeTextChannel(channel_id=321),
+        content=f"<@{bot_user.id}> hello with mention",
+        mentions=[bot_user],
+    )
+
+    await adapter._handle_message(message)
+
+    adapter.handle_message.assert_awaited_once()
+    event = adapter.handle_message.await_args.args[0]
+    assert event.text == "hello with mention"
+
+
+@pytest.mark.asyncio
+async def test_discord_dms_ignore_mention_requirement(adapter, monkeypatch):
+    monkeypatch.setenv("DISCORD_REQUIRE_MENTION", "true")
+    monkeypatch.delenv("DISCORD_FREE_RESPONSE_CHANNELS", raising=False)
+
+    message = make_message(channel=FakeDMChannel(channel_id=654), content="dm without mention")
+
+    await adapter._handle_message(message)
+
+    adapter.handle_message.assert_awaited_once()
+    event = adapter.handle_message.await_args.args[0]
+    assert event.text == "dm without mention"
+    assert event.source.chat_type == "dm"
@@ -0,0 +1,124 @@
+"""Tests verifying interrupt key consistency between adapter and gateway.
+
+Regression test for a bug where monitor_for_interrupt() in _run_agent used
+source.chat_id to query the adapter, but the adapter stores interrupts under
+the full session key (build_session_key output).  This mismatch meant
+interrupts were never detected, causing subagents to ignore new messages.
+"""
+
+import asyncio
+
+import pytest
+
+from gateway.config import Platform, PlatformConfig
+from gateway.platforms.base import BasePlatformAdapter, MessageEvent, SendResult
+from gateway.session import SessionSource, build_session_key
+
+
+class StubAdapter(BasePlatformAdapter):
+    """Minimal adapter for interrupt tests."""
+
+    def __init__(self):
+        super().__init__(PlatformConfig(enabled=True, token="test"), Platform.TELEGRAM)
+
+    async def connect(self):
+        return True
+
+    async def disconnect(self):
+        pass
+
+    async def send(self, chat_id, content, reply_to=None, metadata=None):
+        return SendResult(success=True, message_id="1")
+
+    async def send_typing(self, chat_id, metadata=None):
+        pass
+
+    async def get_chat_info(self, chat_id):
+        return {"id": chat_id}
+
+
+def _source(chat_id="123456", chat_type="dm", thread_id=None):
+    return SessionSource(
+        platform=Platform.TELEGRAM,
+        chat_id=chat_id,
+        chat_type=chat_type,
+        thread_id=thread_id,
+    )
+
+
+class TestInterruptKeyConsistency:
+    """Ensure adapter interrupt methods are queried with session_key, not chat_id."""
+
+    def test_session_key_differs_from_chat_id_for_dm(self):
+        """Session key for a DM is NOT the same as chat_id."""
+        source = _source("123456", "dm")
+        session_key = build_session_key(source)
+        assert session_key != source.chat_id
+        assert session_key == "agent:main:telegram:dm"
+
+    def test_session_key_differs_from_chat_id_for_group(self):
+        """Session key for a group chat includes prefix, unlike raw chat_id."""
+        source = _source("-1001234", "group")
+        session_key = build_session_key(source)
+        assert session_key != source.chat_id
+        assert "agent:main:" in session_key
+        assert source.chat_id in session_key
+
+    @pytest.mark.asyncio
+    async def test_has_pending_interrupt_requires_session_key(self):
+        """has_pending_interrupt returns True only when queried with session_key."""
+        adapter = StubAdapter()
+        source = _source("123456", "dm")
+        session_key = build_session_key(source)
+
+        # Simulate adapter storing interrupt under session_key
+        interrupt_event = asyncio.Event()
+        adapter._active_sessions[session_key] = interrupt_event
+        interrupt_event.set()
+
+        # Using session_key → found
+        assert adapter.has_pending_interrupt(session_key) is True
+
+        # Using chat_id → NOT found (this was the bug)
+        assert adapter.has_pending_interrupt(source.chat_id) is False
+
+    @pytest.mark.asyncio
+    async def test_get_pending_message_requires_session_key(self):
+        """get_pending_message returns the event only with session_key."""
+        adapter = StubAdapter()
+        source = _source("123456", "dm")
+        session_key = build_session_key(source)
+
+        event = MessageEvent(text="hello", source=source, message_id="42")
+        adapter._pending_messages[session_key] = event
+
+        # Using chat_id → None (the bug)
+        assert adapter.get_pending_message(source.chat_id) is None
+
+        # Using session_key → found
+        result = adapter.get_pending_message(session_key)
+        assert result is event
+
+    @pytest.mark.asyncio
+    async def test_handle_message_stores_under_session_key(self):
+        """handle_message stores pending messages under session_key, not chat_id."""
+        adapter = StubAdapter()
+        adapter.set_message_handler(lambda event: asyncio.sleep(0, result=None))
+
+        source = _source("-1001234", "group")
+        session_key = build_session_key(source)
+
+        # Mark session as active
+        adapter._active_sessions[session_key] = asyncio.Event()
+
+        # Send a second message while session is active
+        event = MessageEvent(text="interrupt!", source=source, message_id="2")
+        await adapter.handle_message(event)
+
+        # Stored under session_key
+        assert session_key in adapter._pending_messages
+        # NOT stored under chat_id
+        assert source.chat_id not in adapter._pending_messages
+
+        # Interrupt event was set
+        assert adapter._active_sessions[session_key].is_set()
@@ -57,6 +57,26 @@ class TestFindSessionId:

        assert result == "sess_new"

+    def test_thread_id_disambiguates_same_chat(self, tmp_path):
+        sessions_dir, index_file = _setup_sessions(tmp_path, {
+            "topic_a": {
+                "session_id": "sess_topic_a",
+                "origin": {"platform": "telegram", "chat_id": "-1001", "thread_id": "10"},
+                "updated_at": "2026-01-01T00:00:00",
+            },
+            "topic_b": {
+                "session_id": "sess_topic_b",
+                "origin": {"platform": "telegram", "chat_id": "-1001", "thread_id": "11"},
+                "updated_at": "2026-02-01T00:00:00",
+            },
+        })
+
+        with patch.object(mirror_mod, "_SESSIONS_DIR", sessions_dir), \
+             patch.object(mirror_mod, "_SESSIONS_INDEX", index_file):
+            result = _find_session_id("telegram", "-1001", thread_id="10")
+
+        assert result == "sess_topic_a"
+
    def test_no_match_returns_none(self, tmp_path):
        sessions_dir, index_file = _setup_sessions(tmp_path, {
            "sess": {
@@ -146,6 +166,29 @@ class TestMirrorToSession:
        assert msg["mirror"] is True
        assert msg["mirror_source"] == "cli"

+    def test_successful_mirror_uses_thread_id(self, tmp_path):
+        sessions_dir, index_file = _setup_sessions(tmp_path, {
+            "topic_a": {
+                "session_id": "sess_topic_a",
+                "origin": {"platform": "telegram", "chat_id": "-1001", "thread_id": "10"},
+                "updated_at": "2026-01-01T00:00:00",
+            },
+            "topic_b": {
+                "session_id": "sess_topic_b",
+                "origin": {"platform": "telegram", "chat_id": "-1001", "thread_id": "11"},
+                "updated_at": "2026-02-01T00:00:00",
+            },
+        })
+
+        with patch.object(mirror_mod, "_SESSIONS_DIR", sessions_dir), \
+             patch.object(mirror_mod, "_SESSIONS_INDEX", index_file), \
+             patch("gateway.mirror._append_to_sqlite"):
+            result = mirror_to_session("telegram", "-1001", "Hello topic!", source_label="cron", thread_id="10")
+
+        assert result is True
+        assert (sessions_dir / "sess_topic_a.jsonl").exists()
+        assert not (sessions_dir / "sess_topic_b.jsonl").exists()
+
    def test_no_matching_session(self, tmp_path):
        sessions_dir, index_file = _setup_sessions(tmp_path, {})

@@ -0,0 +1,60 @@
+"""Regression test: /retry must return the agent response, not None.
+
+Before the fix in PR #441, _handle_retry_command() called
+_handle_message(retry_event) but discarded its return value with `return None`,
+so users never received the final response.
+"""
+import pytest
+from unittest.mock import AsyncMock, MagicMock
+from gateway.run import GatewayRunner
+from gateway.platforms.base import MessageEvent, MessageType
+
+
+@pytest.fixture
+def gateway(tmp_path):
+    config = MagicMock()
+    config.sessions_dir = tmp_path
+    config.max_context_messages = 20
+    gw = GatewayRunner.__new__(GatewayRunner)
+    gw.config = config
+    gw.session_store = MagicMock()
+    return gw
+
+
+@pytest.mark.asyncio
+async def test_retry_returns_response_not_none(gateway):
+    """_handle_retry_command must return the inner handler response, not None."""
+    gateway.session_store.get_or_create_session.return_value = MagicMock(
+        session_id="test-session"
+    )
+    gateway.session_store.load_transcript.return_value = [
+        {"role": "user", "content": "Hello Hermes"},
+        {"role": "assistant", "content": "Hi there!"},
+    ]
+    gateway.session_store.rewrite_transcript = MagicMock()
+    expected_response = "Hi there! (retried)"
+    gateway._handle_message = AsyncMock(return_value=expected_response)
+    event = MessageEvent(
+        text="/retry",
+        message_type=MessageType.TEXT,
+        source=MagicMock(),
+    )
+    result = await gateway._handle_retry_command(event)
+    assert result is not None, "/retry must not return None"
+    assert result == expected_response
+
+
+@pytest.mark.asyncio
+async def test_retry_no_previous_message(gateway):
+    """If there is no previous user message, return early with a message."""
+    gateway.session_store.get_or_create_session.return_value = MagicMock(
+        session_id="test-session"
+    )
+    gateway.session_store.load_transcript.return_value = []
+    event = MessageEvent(
+        text="/retry",
+        message_type=MessageType.TEXT,
+        source=MagicMock(),
+    )
+    result = await gateway._handle_retry_command(event)
+    assert result == "No previous message to retry."
@@ -0,0 +1,134 @@
+"""Tests for topic-aware gateway progress updates."""
+
+import importlib
+import sys
+import time
+import types
+from types import SimpleNamespace
+
+import pytest
+
+from gateway.config import Platform, PlatformConfig
+from gateway.platforms.base import BasePlatformAdapter, SendResult
+from gateway.session import SessionSource
+
+
+class ProgressCaptureAdapter(BasePlatformAdapter):
+    def __init__(self):
+        super().__init__(PlatformConfig(enabled=True, token="fake-token"), Platform.TELEGRAM)
+        self.sent = []
+        self.edits = []
+        self.typing = []
+
+    async def connect(self) -> bool:
+        return True
+
+    async def disconnect(self) -> None:
+        return None
+
+    async def send(self, chat_id, content, reply_to=None, metadata=None) -> SendResult:
+        self.sent.append(
+            {
+                "chat_id": chat_id,
+                "content": content,
+                "reply_to": reply_to,
+                "metadata": metadata,
+            }
+        )
+        return SendResult(success=True, message_id="progress-1")
+
+    async def edit_message(self, chat_id, message_id, content) -> SendResult:
+        self.edits.append(
+            {
+                "chat_id": chat_id,
+                "message_id": message_id,
+                "content": content,
+            }
+        )
+        return SendResult(success=True, message_id=message_id)
+
+    async def send_typing(self, chat_id, metadata=None) -> None:
+        self.typing.append({"chat_id": chat_id, "metadata": metadata})
+
+    async def get_chat_info(self, chat_id: str):
+        return {"id": chat_id}
+
+
+class FakeAgent:
+    def __init__(self, **kwargs):
+        self.tool_progress_callback = kwargs["tool_progress_callback"]
+        self.tools = []
+
+    def run_conversation(self, message, conversation_history=None, task_id=None):
+        self.tool_progress_callback("terminal", "pwd")
+        time.sleep(0.35)
+        self.tool_progress_callback("browser_navigate", "https://example.com")
+        time.sleep(0.35)
+        return {
+            "final_response": "done",
+            "messages": [],
+            "api_calls": 1,
+        }
+
+
+def _make_runner(adapter):
+    gateway_run = importlib.import_module("gateway.run")
+    GatewayRunner = gateway_run.GatewayRunner
+
+    runner = object.__new__(GatewayRunner)
+    runner.adapters = {Platform.TELEGRAM: adapter}
+    runner._prefill_messages = []
+    runner._ephemeral_system_prompt = ""
+    runner._reasoning_config = None
+    runner._provider_routing = {}
+    runner._fallback_model = None
+    runner._session_db = None
+    runner._running_agents = {}
+    runner.hooks = SimpleNamespace(loaded_hooks=False)
+    return runner
+
+
+@pytest.mark.asyncio
+async def test_run_agent_progress_stays_in_originating_topic(monkeypatch, tmp_path):
+    monkeypatch.setenv("HERMES_TOOL_PROGRESS_MODE", "all")
+
+    fake_dotenv = types.ModuleType("dotenv")
+    fake_dotenv.load_dotenv = lambda *args, **kwargs: None
+    monkeypatch.setitem(sys.modules, "dotenv", fake_dotenv)
+
+    fake_run_agent = types.ModuleType("run_agent")
+    fake_run_agent.AIAgent = FakeAgent
+    monkeypatch.setitem(sys.modules, "run_agent", fake_run_agent)
+
+    adapter = ProgressCaptureAdapter()
+    runner = _make_runner(adapter)
+    gateway_run = importlib.import_module("gateway.run")
+    monkeypatch.setattr(gateway_run, "_hermes_home", tmp_path)
+    monkeypatch.setattr(gateway_run, "_resolve_runtime_agent_kwargs", lambda: {"api_key": "fake"})
+    source = SessionSource(
+        platform=Platform.TELEGRAM,
+        chat_id="-1001",
+        chat_type="group",
+        thread_id="17585",
+    )
+
+    result = await runner._run_agent(
+        message="hello",
+        context_prompt="",
+        history=[],
+        source=source,
+        session_id="sess-1",
+        session_key="agent:main:telegram:group:-1001:17585",
+    )
+
+    assert result["final_response"] == "done"
+    assert adapter.sent == [
+        {
+            "chat_id": "-1001",
+            "content": '💻 terminal: "pwd"',
+            "reply_to": None,
+            "metadata": {"thread_id": "17585"},
+        }
+    ]
+    assert adapter.edits
+    assert all(call["metadata"] == {"thread_id": "17585"} for call in adapter.typing)
@@ -368,6 +368,17 @@ class TestWhatsAppDMSessionKeyConsistency:
        key = build_session_key(source)
        assert key == "agent:main:discord:group:guild-123"

+    def test_group_thread_includes_thread_id(self):
+        """Forum-style threads need a distinct session key within one group."""
+        source = SessionSource(
+            platform=Platform.TELEGRAM,
+            chat_id="-1002285219667",
+            chat_type="group",
+            thread_id="17585",
+        )
+        key = build_session_key(source)
+        assert key == "agent:main:telegram:group:-1002285219667:17585"
+

 class TestSessionStoreEntriesAttribute:
    """Regression: /reset must access _entries, not _sessions."""
@@ -429,3 +440,119 @@ class TestHasAnySessions:

        store._entries = {"key1": MagicMock()}
        assert store.has_any_sessions() is False
+
+
+class TestLastPromptTokens:
+    """Tests for the last_prompt_tokens field — actual API token tracking."""
+
+    def test_session_entry_default(self):
+        """New sessions should have last_prompt_tokens=0."""
+        from gateway.session import SessionEntry
+        from datetime import datetime
+        entry = SessionEntry(
+            session_key="test",
+            session_id="s1",
+            created_at=datetime.now(),
+            updated_at=datetime.now(),
+        )
+        assert entry.last_prompt_tokens == 0
+
+    def test_session_entry_roundtrip(self):
+        """last_prompt_tokens should survive serialization/deserialization."""
+        from gateway.session import SessionEntry
+        from datetime import datetime
+        entry = SessionEntry(
+            session_key="test",
+            session_id="s1",
+            created_at=datetime.now(),
+            updated_at=datetime.now(),
+            last_prompt_tokens=42000,
+        )
+        d = entry.to_dict()
+        assert d["last_prompt_tokens"] == 42000
+        restored = SessionEntry.from_dict(d)
+        assert restored.last_prompt_tokens == 42000
+
+    def test_session_entry_from_old_data(self):
+        """Old session data without last_prompt_tokens should default to 0."""
+        from gateway.session import SessionEntry
+        data = {
+            "session_key": "test",
+            "session_id": "s1",
+            "created_at": "2025-01-01T00:00:00",
+            "updated_at": "2025-01-01T00:00:00",
+            "input_tokens": 100,
+            "output_tokens": 50,
+            "total_tokens": 150,
+            # No last_prompt_tokens — old format
+        }
+        entry = SessionEntry.from_dict(data)
+        assert entry.last_prompt_tokens == 0
+
+    def test_update_session_sets_last_prompt_tokens(self, tmp_path):
+        """update_session should store the actual prompt token count."""
+        config = GatewayConfig()
+        with patch("gateway.session.SessionStore._ensure_loaded"):
+            store = SessionStore(sessions_dir=tmp_path, config=config)
+        store._loaded = True
+        store._db = None
+        store._save = MagicMock()
+
+        from gateway.session import SessionEntry
+        from datetime import datetime
+        entry = SessionEntry(
+            session_key="k1",
+            session_id="s1",
+            created_at=datetime.now(),
+            updated_at=datetime.now(),
+        )
+        store._entries = {"k1": entry}
+
+        store.update_session("k1", last_prompt_tokens=85000)
+        assert entry.last_prompt_tokens == 85000
+
+    def test_update_session_none_does_not_change(self, tmp_path):
+        """update_session with default (None) should not change last_prompt_tokens."""
+        config = GatewayConfig()
+        with patch("gateway.session.SessionStore._ensure_loaded"):
+            store = SessionStore(sessions_dir=tmp_path, config=config)
+        store._loaded = True
+        store._db = None
+        store._save = MagicMock()
+
+        from gateway.session import SessionEntry
+        from datetime import datetime
+        entry = SessionEntry(
+            session_key="k1",
+            session_id="s1",
+            created_at=datetime.now(),
+            updated_at=datetime.now(),
+            last_prompt_tokens=50000,
+        )
+        store._entries = {"k1": entry}
+
+        store.update_session("k1")  # No last_prompt_tokens arg
+        assert entry.last_prompt_tokens == 50000  # unchanged
+
+    def test_update_session_zero_resets(self, tmp_path):
+        """update_session with last_prompt_tokens=0 should reset the field."""
+        config = GatewayConfig()
+        with patch("gateway.session.SessionStore._ensure_loaded"):
+            store = SessionStore(sessions_dir=tmp_path, config=config)
+        store._loaded = True
+        store._db = None
+        store._save = MagicMock()
+
+        from gateway.session import SessionEntry
+        from datetime import datetime
+        entry = SessionEntry(
+            session_key="k1",
+            session_id="s1",
+            created_at=datetime.now(),
+            updated_at=datetime.now(),
+            last_prompt_tokens=85000,
+        )
+        store._entries = {"k1": entry}
+
+        store.update_session("k1", last_prompt_tokens=0)
+        assert entry.last_prompt_tokens == 0
@@ -8,9 +8,19 @@ The hygiene system uses the SAME compression config as the agent:
 so CLI and messaging platforms behave identically.
 """

-import pytest
+import importlib
+import sys
+import types
+from datetime import datetime
+from types import SimpleNamespace
 from unittest.mock import patch, MagicMock, AsyncMock
+
+import pytest
+
 from agent.model_metadata import estimate_messages_tokens_rough
+from gateway.config import GatewayConfig, Platform, PlatformConfig
+from gateway.platforms.base import BasePlatformAdapter, MessageEvent, SendResult
+from gateway.session import SessionEntry, SessionSource


 # ---------------------------------------------------------------------------
@@ -41,6 +51,32 @@ def _make_large_history_tokens(target_tokens: int) -> list:
    return _make_history(n_msgs, content_size=content_size)


+class HygieneCaptureAdapter(BasePlatformAdapter):
+    def __init__(self):
+        super().__init__(PlatformConfig(enabled=True, token="fake-token"), Platform.TELEGRAM)
+        self.sent = []
+
+    async def connect(self) -> bool:
+        return True
+
+    async def disconnect(self) -> None:
+        return None
+
+    async def send(self, chat_id, content, reply_to=None, metadata=None) -> SendResult:
+        self.sent.append(
+            {
+                "chat_id": chat_id,
+                "content": content,
+                "reply_to": reply_to,
+                "metadata": metadata,
+            }
+        )
+        return SendResult(success=True, message_id="hygiene-1")
+
+    async def get_chat_info(self, chat_id: str):
+        return {"id": chat_id}
+
+
 # ---------------------------------------------------------------------------
 # Detection threshold tests (model-aware, unified with compression config)
 # ---------------------------------------------------------------------------
@@ -202,3 +238,90 @@ class TestTokenEstimation:
        # Should be well above the 170K threshold for a 200k model
        threshold = int(200_000 * 0.85)
        assert tokens > threshold
+
+
+@pytest.mark.asyncio
+async def test_session_hygiene_messages_stay_in_originating_topic(monkeypatch, tmp_path):
+    fake_dotenv = types.ModuleType("dotenv")
+    fake_dotenv.load_dotenv = lambda *args, **kwargs: None
+    monkeypatch.setitem(sys.modules, "dotenv", fake_dotenv)
+
+    class FakeCompressAgent:
+        def __init__(self, **kwargs):
+            self.model = kwargs.get("model")
+
+        def _compress_context(self, messages, *_args, **_kwargs):
+            return ([{"role": "assistant", "content": "compressed"}], None)
+
+    fake_run_agent = types.ModuleType("run_agent")
+    fake_run_agent.AIAgent = FakeCompressAgent
+    monkeypatch.setitem(sys.modules, "run_agent", fake_run_agent)
+
+    gateway_run = importlib.import_module("gateway.run")
+    GatewayRunner = gateway_run.GatewayRunner
+
+    adapter = HygieneCaptureAdapter()
+    runner = object.__new__(GatewayRunner)
+    runner.config = GatewayConfig(
+        platforms={Platform.TELEGRAM: PlatformConfig(enabled=True, token="fake-token")}
+    )
+    runner.adapters = {Platform.TELEGRAM: adapter}
+    runner.hooks = SimpleNamespace(emit=AsyncMock(), loaded_hooks=False)
+    runner.session_store = MagicMock()
+    runner.session_store.get_or_create_session.return_value = SessionEntry(
+        session_key="agent:main:telegram:group:-1001:17585",
+        session_id="sess-1",
+        created_at=datetime.now(),
+        updated_at=datetime.now(),
+        platform=Platform.TELEGRAM,
+        chat_type="group",
+    )
+    runner.session_store.load_transcript.return_value = _make_history(6, content_size=400)
+    runner.session_store.has_any_sessions.return_value = True
+    runner.session_store.rewrite_transcript = MagicMock()
+    runner.session_store.append_to_transcript = MagicMock()
+    runner._running_agents = {}
+    runner._pending_messages = {}
+    runner._pending_approvals = {}
+    runner._session_db = None
+    runner._is_user_authorized = lambda _source: True
+    runner._set_session_env = lambda _context: None
+    runner._run_agent = AsyncMock(
+        return_value={
+            "final_response": "ok",
+            "messages": [],
+            "tools": [],
+            "history_offset": 0,
+            "last_prompt_tokens": 0,
+        }
+    )
+
+    monkeypatch.setattr(gateway_run, "_hermes_home", tmp_path)
+    monkeypatch.setattr(gateway_run, "_resolve_runtime_agent_kwargs", lambda: {"api_key": "fake"})
+    monkeypatch.setattr(
+        "agent.model_metadata.get_model_context_length",
+        lambda *_args, **_kwargs: 100,
+    )
+    monkeypatch.setenv("TELEGRAM_HOME_CHANNEL", "795544298")
+
+    event = MessageEvent(
+        text="hello",
+        source=SessionSource(
+            platform=Platform.TELEGRAM,
+            chat_id="-1001",
+            chat_type="group",
+            thread_id="17585",
+        ),
+        message_id="1",
+    )
+
+    result = await runner._handle_message(event)
+
+    assert result == "ok"
+    assert len(adapter.sent) == 2
+    assert adapter.sent[0]["chat_id"] == "-1001"
+    assert "Session is large" in adapter.sent[0]["content"]
+    assert adapter.sent[0]["metadata"] == {"thread_id": "17585"}
+    assert adapter.sent[1]["chat_id"] == "-1001"
+    assert "Compressed:" in adapter.sent[1]["content"]
+    assert adapter.sent[1]["metadata"] == {"thread_id": "17585"}
@@ -20,6 +20,7 @@ from gateway.config import Platform, PlatformConfig
 from gateway.platforms.base import (
    MessageEvent,
    MessageType,
+    SendResult,
    SUPPORTED_DOCUMENT_TYPES,
 )

@@ -336,3 +337,203 @@ class TestDocumentDownloadBlock:
        await adapter._handle_media_message(update, MagicMock())
        # handle_message should still be called (the handler catches the exception)
        adapter.handle_message.assert_called_once()
+
+
+# ---------------------------------------------------------------------------
+# TestSendDocument — outbound file attachment delivery
+# ---------------------------------------------------------------------------
+
+class TestSendDocument:
+    """Tests for TelegramAdapter.send_document() — sending files to users."""
+
+    @pytest.fixture()
+    def connected_adapter(self, adapter):
+        """Adapter with a mock bot attached."""
+        bot = AsyncMock()
+        adapter._bot = bot
+        return adapter
+
+    @pytest.mark.asyncio
+    async def test_send_document_success(self, connected_adapter, tmp_path):
+        """Sending a local file calls bot.send_document and returns success."""
+        # Create a real temp file
+        test_file = tmp_path / "report.pdf"
+        test_file.write_bytes(b"%PDF-1.4 fake content")
+
+        mock_msg = MagicMock()
+        mock_msg.message_id = 99
+        connected_adapter._bot.send_document = AsyncMock(return_value=mock_msg)
+
+        result = await connected_adapter.send_document(
+            chat_id="12345",
+            file_path=str(test_file),
+            caption="Here's the report",
+        )
+
+        assert result.success is True
+        assert result.message_id == "99"
+        connected_adapter._bot.send_document.assert_called_once()
+        call_kwargs = connected_adapter._bot.send_document.call_args[1]
+        assert call_kwargs["chat_id"] == 12345
+        assert call_kwargs["filename"] == "report.pdf"
+        assert call_kwargs["caption"] == "Here's the report"
+
+    @pytest.mark.asyncio
+    async def test_send_document_custom_filename(self, connected_adapter, tmp_path):
+        """The file_name parameter overrides the basename for display."""
+        test_file = tmp_path / "doc_abc123_ugly.csv"
+        test_file.write_bytes(b"a,b,c\n1,2,3")
+
+        mock_msg = MagicMock()
+        mock_msg.message_id = 100
+        connected_adapter._bot.send_document = AsyncMock(return_value=mock_msg)
+
+        result = await connected_adapter.send_document(
+            chat_id="12345",
+            file_path=str(test_file),
+            file_name="clean_data.csv",
+        )
+
+        assert result.success is True
+        call_kwargs = connected_adapter._bot.send_document.call_args[1]
+        assert call_kwargs["filename"] == "clean_data.csv"
+
+    @pytest.mark.asyncio
+    async def test_send_document_file_not_found(self, connected_adapter):
+        """A missing file returns an error without calling the Telegram API."""
+        result = await connected_adapter.send_document(
+            chat_id="12345",
+            file_path="/nonexistent/file.pdf",
+        )
+
+        assert result.success is False
+        assert "not found" in result.error.lower()
+        connected_adapter._bot.send_document.assert_not_called()
+
+    @pytest.mark.asyncio
+    async def test_send_document_not_connected(self, adapter):
+        """If the bot is None, a "Not connected" error is returned."""
+        result = await adapter.send_document(
+            chat_id="12345",
+            file_path="/some/file.pdf",
+        )
+
+        assert result.success is False
+        assert "Not connected" in result.error
+
+    @pytest.mark.asyncio
+    async def test_send_document_caption_truncated(self, connected_adapter, tmp_path):
+        """Captions longer than 1024 chars are truncated."""
+        test_file = tmp_path / "data.json"
+        test_file.write_bytes(b"{}")
+
+        mock_msg = MagicMock()
+        mock_msg.message_id = 101
+        connected_adapter._bot.send_document = AsyncMock(return_value=mock_msg)
+
+        long_caption = "x" * 2000
+        await connected_adapter.send_document(
+            chat_id="12345",
+            file_path=str(test_file),
+            caption=long_caption,
+        )
+
+        call_kwargs = connected_adapter._bot.send_document.call_args[1]
+        assert len(call_kwargs["caption"]) == 1024
+
+    @pytest.mark.asyncio
+    async def test_send_document_api_error_falls_back(self, connected_adapter, tmp_path):
+        """If the Telegram API raises, fall back to the base-class text message."""
+        test_file = tmp_path / "file.pdf"
+        test_file.write_bytes(b"data")
+
+        connected_adapter._bot.send_document = AsyncMock(
+            side_effect=RuntimeError("Telegram API error")
+        )
+
+        # The base-class fallback calls self.send(), which also goes through
+        # _bot, so mock send() here to avoid cascading errors.
+        connected_adapter.send = AsyncMock(
+            return_value=SendResult(success=True, message_id="fallback")
+        )
+
+        result = await connected_adapter.send_document(
+            chat_id="12345",
+            file_path=str(test_file),
+        )
+
+        # Should have fallen back to base class
+        assert result.success is True
+        assert result.message_id == "fallback"
+
+    @pytest.mark.asyncio
+    async def test_send_document_reply_to(self, connected_adapter, tmp_path):
+        """The reply_to string is forwarded as an integer reply_to_message_id."""
+        test_file = tmp_path / "spec.md"
+        test_file.write_bytes(b"# Spec")
+
+        mock_msg = MagicMock()
+        mock_msg.message_id = 102
+        connected_adapter._bot.send_document = AsyncMock(return_value=mock_msg)
+
+        await connected_adapter.send_document(
+            chat_id="12345",
+            file_path=str(test_file),
+            reply_to="50",
+        )
+
+        call_kwargs = connected_adapter._bot.send_document.call_args[1]
+        assert call_kwargs["reply_to_message_id"] == 50
+
+
+# ---------------------------------------------------------------------------
+# TestSendVideo — outbound video delivery
+# ---------------------------------------------------------------------------
+
+class TestSendVideo:
+    """Tests for TelegramAdapter.send_video() — sending videos to users."""
+
+    @pytest.fixture()
+    def connected_adapter(self, adapter):
+        bot = AsyncMock()
+        adapter._bot = bot
+        return adapter
+
+    @pytest.mark.asyncio
+    async def test_send_video_success(self, connected_adapter, tmp_path):
+        test_file = tmp_path / "clip.mp4"
+        test_file.write_bytes(b"\x00\x00\x00\x1c" + b"ftyp" + b"\x00" * 100)
+
+        mock_msg = MagicMock()
+        mock_msg.message_id = 200
+        connected_adapter._bot.send_video = AsyncMock(return_value=mock_msg)
+
+        result = await connected_adapter.send_video(
+            chat_id="12345",
+            video_path=str(test_file),
+            caption="Check this out",
+        )
+
+        assert result.success is True
+        assert result.message_id == "200"
+        connected_adapter._bot.send_video.assert_called_once()
+
+    @pytest.mark.asyncio
+    async def test_send_video_file_not_found(self, connected_adapter):
+        result = await connected_adapter.send_video(
+            chat_id="12345",
+            video_path="/nonexistent/video.mp4",
+        )
+
+        assert result.success is False
+        assert "not found" in result.error.lower()
+
+    @pytest.mark.asyncio
+    async def test_send_video_not_connected(self, adapter):
+        result = await adapter.send_video(
+            chat_id="12345",
+            video_path="/some/video.mp4",
+        )
+
+        assert result.success is False
+        assert "Not connected" in result.error
@@ -0,0 +1,340 @@
+"""Tests for hermes claw commands."""
+
+from argparse import Namespace
+from types import ModuleType
+from unittest.mock import MagicMock, patch
+
+import pytest
+
+from hermes_cli import claw as claw_mod
+
+
+# ---------------------------------------------------------------------------
+# _find_migration_script
+# ---------------------------------------------------------------------------
+
+
+class TestFindMigrationScript:
+    """Test script discovery in known locations."""
+
+    def test_finds_project_root_script(self, tmp_path):
+        script = tmp_path / "openclaw_to_hermes.py"
+        script.write_text("# placeholder")
+        with patch.object(claw_mod, "_OPENCLAW_SCRIPT", script):
+            assert claw_mod._find_migration_script() == script
+
+    def test_finds_installed_script(self, tmp_path):
+        installed = tmp_path / "installed.py"
+        installed.write_text("# placeholder")
+        with (
+            patch.object(claw_mod, "_OPENCLAW_SCRIPT", tmp_path / "nonexistent.py"),
+            patch.object(claw_mod, "_OPENCLAW_SCRIPT_INSTALLED", installed),
+        ):
+            assert claw_mod._find_migration_script() == installed
+
+    def test_returns_none_when_missing(self, tmp_path):
+        with (
+            patch.object(claw_mod, "_OPENCLAW_SCRIPT", tmp_path / "a.py"),
+            patch.object(claw_mod, "_OPENCLAW_SCRIPT_INSTALLED", tmp_path / "b.py"),
+        ):
+            assert claw_mod._find_migration_script() is None
+
+
+# ---------------------------------------------------------------------------
+# claw_command routing
+# ---------------------------------------------------------------------------
+
+
+class TestClawCommand:
+    """Test the claw_command router."""
+
+    def test_routes_to_migrate(self):
+        args = Namespace(claw_action="migrate", source=None, dry_run=True,
+                         preset="full", overwrite=False, migrate_secrets=False,
+                         workspace_target=None, skill_conflict="skip", yes=False)
+        with patch.object(claw_mod, "_cmd_migrate") as mock:
+            claw_mod.claw_command(args)
+        mock.assert_called_once_with(args)
+
+    def test_shows_help_for_no_action(self, capsys):
+        args = Namespace(claw_action=None)
+        claw_mod.claw_command(args)
+        captured = capsys.readouterr()
+        assert "migrate" in captured.out
+
+
+# ---------------------------------------------------------------------------
+# _cmd_migrate
+# ---------------------------------------------------------------------------
+
+
+class TestCmdMigrate:
+    """Test the migrate command handler."""
+
+    def test_error_when_source_missing(self, tmp_path, capsys):
+        args = Namespace(
+            source=str(tmp_path / "nonexistent"),
+            dry_run=True, preset="full", overwrite=False,
+            migrate_secrets=False, workspace_target=None,
+            skill_conflict="skip", yes=False,
+        )
+        claw_mod._cmd_migrate(args)
+        captured = capsys.readouterr()
+        assert "not found" in captured.out
+
+    def test_error_when_script_missing(self, tmp_path, capsys):
+        openclaw_dir = tmp_path / ".openclaw"
+        openclaw_dir.mkdir()
+        args = Namespace(
+            source=str(openclaw_dir),
+            dry_run=True, preset="full", overwrite=False,
+            migrate_secrets=False, workspace_target=None,
+            skill_conflict="skip", yes=False,
+        )
+        with (
+            patch.object(claw_mod, "_OPENCLAW_SCRIPT", tmp_path / "a.py"),
+            patch.object(claw_mod, "_OPENCLAW_SCRIPT_INSTALLED", tmp_path / "b.py"),
+        ):
+            claw_mod._cmd_migrate(args)
+        captured = capsys.readouterr()
+        assert "Migration script not found" in captured.out
+
+    def test_dry_run_succeeds(self, tmp_path, capsys):
+        openclaw_dir = tmp_path / ".openclaw"
+        openclaw_dir.mkdir()
+        script = tmp_path / "script.py"
+        script.write_text("# placeholder")
+
+        # Build a fake migration module
+        fake_mod = ModuleType("openclaw_to_hermes")
+        fake_mod.resolve_selected_options = MagicMock(return_value={"soul", "memory"})
+        fake_migrator = MagicMock()
+        fake_migrator.migrate.return_value = {
+            "summary": {"migrated": 0, "skipped": 5, "conflict": 0, "error": 0},
+            "items": [
+                {"kind": "soul", "status": "skipped", "reason": "Not found"},
+            ],
+            "preset": "full",
+        }
+        fake_mod.Migrator = MagicMock(return_value=fake_migrator)
+
+        args = Namespace(
+            source=str(openclaw_dir),
+            dry_run=True, preset="full", overwrite=False,
+            migrate_secrets=False, workspace_target=None,
+            skill_conflict="skip", yes=False,
+        )
+
+        with (
+            patch.object(claw_mod, "_find_migration_script", return_value=script),
+            patch.object(claw_mod, "_load_migration_module", return_value=fake_mod),
+            patch.object(claw_mod, "get_config_path", return_value=tmp_path / "config.yaml"),
+            patch.object(claw_mod, "save_config"),
+            patch.object(claw_mod, "load_config", return_value={}),
+        ):
+            claw_mod._cmd_migrate(args)
+
+        captured = capsys.readouterr()
+        assert "Dry Run Results" in captured.out
+        assert "5 skipped" in captured.out
+
+    def test_execute_with_confirmation(self, tmp_path, capsys):
+        openclaw_dir = tmp_path / ".openclaw"
+        openclaw_dir.mkdir()
+        config_path = tmp_path / "config.yaml"
+        config_path.write_text("agent:\n  max_turns: 90\n")
+
+        fake_mod = ModuleType("openclaw_to_hermes")
+        fake_mod.resolve_selected_options = MagicMock(return_value={"soul"})
+        fake_migrator = MagicMock()
+        fake_migrator.migrate.return_value = {
+            "summary": {"migrated": 2, "skipped": 1, "conflict": 0, "error": 0},
+            "items": [
+                {"kind": "soul", "status": "migrated", "destination": str(tmp_path / "SOUL.md")},
+                {"kind": "memory", "status": "migrated", "destination": str(tmp_path / "memories/MEMORY.md")},
+            ],
+        }
+        fake_mod.Migrator = MagicMock(return_value=fake_migrator)
+
+        args = Namespace(
+            source=str(openclaw_dir),
+            dry_run=False, preset="user-data", overwrite=False,
+            migrate_secrets=False, workspace_target=None,
+            skill_conflict="skip", yes=False,
+        )
+
+        with (
+            patch.object(claw_mod, "_find_migration_script", return_value=tmp_path / "s.py"),
+            patch.object(claw_mod, "_load_migration_module", return_value=fake_mod),
+            patch.object(claw_mod, "get_config_path", return_value=config_path),
+            patch.object(claw_mod, "prompt_yes_no", return_value=True),
+        ):
+            claw_mod._cmd_migrate(args)
+
+        captured = capsys.readouterr()
+        assert "Migration Results" in captured.out
+        assert "Migration complete!" in captured.out
+
+    def test_execute_cancelled_by_user(self, tmp_path, capsys):
+        openclaw_dir = tmp_path / ".openclaw"
+        openclaw_dir.mkdir()
+        config_path = tmp_path / "config.yaml"
+        config_path.write_text("")
+
+        args = Namespace(
+            source=str(openclaw_dir),
+            dry_run=False, preset="full", overwrite=False,
+            migrate_secrets=False, workspace_target=None,
+            skill_conflict="skip", yes=False,
+        )
+
+        with (
+            patch.object(claw_mod, "_find_migration_script", return_value=tmp_path / "s.py"),
+            patch.object(claw_mod, "prompt_yes_no", return_value=False),
+        ):
+            claw_mod._cmd_migrate(args)
+
+        captured = capsys.readouterr()
+        assert "Migration cancelled" in captured.out
+
+    def test_execute_with_yes_skips_confirmation(self, tmp_path, capsys):
+        openclaw_dir = tmp_path / ".openclaw"
+        openclaw_dir.mkdir()
+        config_path = tmp_path / "config.yaml"
+        config_path.write_text("")
+
+        fake_mod = ModuleType("openclaw_to_hermes")
+        fake_mod.resolve_selected_options = MagicMock(return_value=set())
+        fake_migrator = MagicMock()
+        fake_migrator.migrate.return_value = {
+            "summary": {"migrated": 0, "skipped": 0, "conflict": 0, "error": 0},
+            "items": [],
+        }
+        fake_mod.Migrator = MagicMock(return_value=fake_migrator)
+
+        args = Namespace(
+            source=str(openclaw_dir),
+            dry_run=False, preset="full", overwrite=False,
+            migrate_secrets=False, workspace_target=None,
+            skill_conflict="skip", yes=True,
+        )
+
+        with (
+            patch.object(claw_mod, "_find_migration_script", return_value=tmp_path / "s.py"),
+            patch.object(claw_mod, "_load_migration_module", return_value=fake_mod),
+            patch.object(claw_mod, "get_config_path", return_value=config_path),
+            patch.object(claw_mod, "prompt_yes_no") as mock_prompt,
+        ):
+            claw_mod._cmd_migrate(args)
+
+        mock_prompt.assert_not_called()
+
+    def test_handles_migration_error(self, tmp_path, capsys):
+        openclaw_dir = tmp_path / ".openclaw"
+        openclaw_dir.mkdir()
+        config_path = tmp_path / "config.yaml"
+        config_path.write_text("")
+
+        args = Namespace(
+            source=str(openclaw_dir),
+            dry_run=True, preset="full", overwrite=False,
+            migrate_secrets=False, workspace_target=None,
+            skill_conflict="skip", yes=False,
+        )
+
+        with (
+            patch.object(claw_mod, "_find_migration_script", return_value=tmp_path / "s.py"),
+            patch.object(claw_mod, "_load_migration_module", side_effect=RuntimeError("boom")),
+            patch.object(claw_mod, "get_config_path", return_value=config_path),
+            patch.object(claw_mod, "save_config"),
+            patch.object(claw_mod, "load_config", return_value={}),
+        ):
+            claw_mod._cmd_migrate(args)
+
+        captured = capsys.readouterr()
+        assert "Migration failed" in captured.out
+
+    def test_full_preset_enables_secrets(self, tmp_path, capsys):
+        """The 'full' preset should set migrate_secrets=True automatically."""
+        openclaw_dir = tmp_path / ".openclaw"
+        openclaw_dir.mkdir()
+
+        fake_mod = ModuleType("openclaw_to_hermes")
+        fake_mod.resolve_selected_options = MagicMock(return_value=set())
+        fake_migrator = MagicMock()
+        fake_migrator.migrate.return_value = {
+            "summary": {"migrated": 0, "skipped": 0, "conflict": 0, "error": 0},
+            "items": [],
+        }
+        fake_mod.Migrator = MagicMock(return_value=fake_migrator)
+
+        args = Namespace(
+            source=str(openclaw_dir),
+            dry_run=True, preset="full", overwrite=False,
+            migrate_secrets=False,  # Not explicitly set by user
+            workspace_target=None,
+            skill_conflict="skip", yes=False,
+        )
+
+        with (
+            patch.object(claw_mod, "_find_migration_script", return_value=tmp_path / "s.py"),
+            patch.object(claw_mod, "_load_migration_module", return_value=fake_mod),
+            patch.object(claw_mod, "get_config_path", return_value=tmp_path / "config.yaml"),
+            patch.object(claw_mod, "save_config"),
+            patch.object(claw_mod, "load_config", return_value={}),
+        ):
+            claw_mod._cmd_migrate(args)
+
+        # Migrator should have been called with migrate_secrets=True
+        call_kwargs = fake_mod.Migrator.call_args[1]
+        assert call_kwargs["migrate_secrets"] is True
+
+
+# ---------------------------------------------------------------------------
+# _print_migration_report
+# ---------------------------------------------------------------------------
+
+
+class TestPrintMigrationReport:
+    """Test the report formatting function."""
+
+    def test_dry_run_report(self, capsys):
+        report = {
+            "summary": {"migrated": 2, "skipped": 1, "conflict": 1, "error": 0},
+            "items": [
+                {"kind": "soul", "status": "migrated", "destination": "/home/user/.hermes/SOUL.md"},
+                {"kind": "memory", "status": "migrated", "destination": "/home/user/.hermes/memories/MEMORY.md"},
+                {"kind": "skills", "status": "conflict", "reason": "already exists"},
+                {"kind": "tts-assets", "status": "skipped", "reason": "not found"},
+            ],
+            "preset": "full",
+        }
+        claw_mod._print_migration_report(report, dry_run=True)
+        captured = capsys.readouterr()
+        assert "Dry Run Results" in captured.out
+        assert "Would migrate" in captured.out
+        assert "2 would migrate" in captured.out
+        assert "--dry-run" in captured.out
+
+    def test_execute_report(self, capsys):
+        report = {
+            "summary": {"migrated": 3, "skipped": 0, "conflict": 0, "error": 0},
+            "items": [
+                {"kind": "soul", "status": "migrated", "destination": "/home/user/.hermes/SOUL.md"},
+            ],
+            "output_dir": "/home/user/.hermes/migration/openclaw/20250312T120000",
+        }
+        claw_mod._print_migration_report(report, dry_run=False)
+        captured = capsys.readouterr()
+        assert "Migration Results" in captured.out
+        assert "Migrated" in captured.out
+        assert "Full report saved to" in captured.out
+
+    def test_empty_report(self, capsys):
+        report = {
+            "summary": {"migrated": 0, "skipped": 0, "conflict": 0, "error": 0},
+            "items": [],
+        }
+        claw_mod._print_migration_report(report, dry_run=False)
+        captured = capsys.readouterr()
+        assert "Nothing to migrate" in captured.out
@@ -11,8 +11,8 @@ EXPECTED_COMMANDS = {
    "/help", "/tools", "/toolsets", "/model", "/provider", "/prompt",
    "/personality", "/clear", "/history", "/new", "/reset", "/retry",
    "/undo", "/save", "/config", "/cron", "/skills", "/platforms",
-    "/verbose", "/compress", "/title", "/usage", "/insights", "/paste",
-    "/reload-mcp", "/rollback", "/skin", "/quit",
+    "/verbose", "/reasoning", "/compress", "/title", "/usage", "/insights", "/paste",
+    "/reload-mcp", "/rollback", "/background", "/skin", "/quit",
 }


@@ -0,0 +1,97 @@
+import json
+
+from hermes_cli.auth import _update_config_for_provider, get_active_provider
+from hermes_cli.config import load_config, save_config
+from hermes_cli.setup import setup_model_provider
+
+
+def _clear_provider_env(monkeypatch):
+    for key in (
+        "NOUS_API_KEY",
+        "OPENROUTER_API_KEY",
+        "OPENAI_BASE_URL",
+        "OPENAI_API_KEY",
+        "LLM_MODEL",
+    ):
+        monkeypatch.delenv(key, raising=False)
+
+
+
+def test_nous_oauth_setup_keeps_current_model_when_syncing_disk_provider(
+    tmp_path, monkeypatch
+):
+    monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+    _clear_provider_env(monkeypatch)
+
+    config = load_config()
+
+    prompt_choices = iter([0, 2])
+    monkeypatch.setattr(
+        "hermes_cli.setup.prompt_choice",
+        lambda *args, **kwargs: next(prompt_choices),
+    )
+    monkeypatch.setattr("hermes_cli.setup.prompt", lambda *args, **kwargs: "")
+
+    def _fake_login_nous(*args, **kwargs):
+        auth_path = tmp_path / "auth.json"
+        auth_path.write_text(json.dumps({"active_provider": "nous", "providers": {}}))
+        _update_config_for_provider("nous", "https://inference.example.com/v1")
+
+    monkeypatch.setattr("hermes_cli.auth._login_nous", _fake_login_nous)
+    monkeypatch.setattr(
+        "hermes_cli.auth.resolve_nous_runtime_credentials",
+        lambda *args, **kwargs: {
+            "base_url": "https://inference.example.com/v1",
+            "api_key": "nous-key",
+        },
+    )
+    monkeypatch.setattr(
+        "hermes_cli.auth.fetch_nous_models",
+        lambda *args, **kwargs: ["gemini-3-flash"],
+    )
+
+    setup_model_provider(config)
+    save_config(config)
+
+    reloaded = load_config()
+
+    assert isinstance(reloaded["model"], dict)
+    assert reloaded["model"]["provider"] == "nous"
+    assert reloaded["model"]["base_url"] == "https://inference.example.com/v1"
+    assert reloaded["model"]["default"] == "anthropic/claude-opus-4.6"
+
+
+def test_custom_setup_clears_active_oauth_provider(tmp_path, monkeypatch):
+    monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+    _clear_provider_env(monkeypatch)
+
+    auth_path = tmp_path / "auth.json"
+    auth_path.write_text(json.dumps({"active_provider": "nous", "providers": {}}))
+
+    config = load_config()
+
+    monkeypatch.setattr("hermes_cli.setup.prompt_choice", lambda *args, **kwargs: 3)
+
+    prompt_values = iter(
+        [
+            "https://custom.example/v1",
+            "custom-api-key",
+            "custom/model",
+            "",
+        ]
+    )
+    monkeypatch.setattr(
+        "hermes_cli.setup.prompt",
+        lambda *args, **kwargs: next(prompt_values),
+    )
+
+    setup_model_provider(config)
+    save_config(config)
+
+    reloaded = load_config()
+
+    assert get_active_provider() is None
+    assert isinstance(reloaded["model"], dict)
+    assert reloaded["model"]["provider"] == "custom"
+    assert reloaded["model"]["base_url"] == "https://custom.example/v1"
+    assert reloaded["model"]["default"] == "custom/model"
@@ -0,0 +1,284 @@
+"""Tests for OpenClaw migration integration in the setup wizard."""
+
+from argparse import Namespace
+from types import ModuleType
+from unittest.mock import MagicMock, patch
+
+from hermes_cli import setup as setup_mod
+
+
+# ---------------------------------------------------------------------------
+# _offer_openclaw_migration — unit tests
+# ---------------------------------------------------------------------------
+
+
+class TestOfferOpenclawMigration:
+    """Test the _offer_openclaw_migration helper in isolation."""
+
+    def test_skips_when_no_openclaw_dir(self, tmp_path):
+        """Should return False immediately when ~/.openclaw does not exist."""
+        with patch("hermes_cli.setup.Path.home", return_value=tmp_path):
+            assert setup_mod._offer_openclaw_migration(tmp_path / ".hermes") is False
+
+    def test_skips_when_migration_script_missing(self, tmp_path):
+        """Should return False when the migration script file is absent."""
+        openclaw_dir = tmp_path / ".openclaw"
+        openclaw_dir.mkdir()
+        with (
+            patch("hermes_cli.setup.Path.home", return_value=tmp_path),
+            patch.object(setup_mod, "_OPENCLAW_SCRIPT", tmp_path / "nonexistent.py"),
+        ):
+            assert setup_mod._offer_openclaw_migration(tmp_path / ".hermes") is False
+
+    def test_skips_when_user_declines(self, tmp_path):
+        """Should return False when user declines the migration prompt."""
+        openclaw_dir = tmp_path / ".openclaw"
+        openclaw_dir.mkdir()
+        script = tmp_path / "openclaw_to_hermes.py"
+        script.write_text("# placeholder")
+        with (
+            patch("hermes_cli.setup.Path.home", return_value=tmp_path),
+            patch.object(setup_mod, "_OPENCLAW_SCRIPT", script),
+            patch.object(setup_mod, "prompt_yes_no", return_value=False),
+        ):
+            assert setup_mod._offer_openclaw_migration(tmp_path / ".hermes") is False
+
+    def test_runs_migration_when_user_accepts(self, tmp_path):
+        """Should dynamically load the script and run the Migrator."""
+        openclaw_dir = tmp_path / ".openclaw"
+        openclaw_dir.mkdir()
+
+        # Create a fake hermes home with config
+        hermes_home = tmp_path / ".hermes"
+        hermes_home.mkdir()
+        config_path = hermes_home / "config.yaml"
+        config_path.write_text("agent:\n  max_turns: 90\n")
+
+        # Build a fake migration module
+        fake_mod = ModuleType("openclaw_to_hermes")
+        fake_mod.resolve_selected_options = MagicMock(return_value={"soul", "memory"})
+        fake_migrator = MagicMock()
+        fake_migrator.migrate.return_value = {
+            "summary": {"migrated": 3, "skipped": 1, "conflict": 0, "error": 0},
+            "output_dir": str(hermes_home / "migration"),
+        }
+        fake_mod.Migrator = MagicMock(return_value=fake_migrator)
+
+        script = tmp_path / "openclaw_to_hermes.py"
+        script.write_text("# placeholder")
+
+        with (
+            patch("hermes_cli.setup.Path.home", return_value=tmp_path),
+            patch.object(setup_mod, "_OPENCLAW_SCRIPT", script),
+            patch.object(setup_mod, "prompt_yes_no", return_value=True),
+            patch.object(setup_mod, "get_config_path", return_value=config_path),
+            patch("importlib.util.spec_from_file_location") as mock_spec_fn,
+        ):
+            # Wire up the fake module loading
+            mock_spec = MagicMock()
+            mock_spec.loader = MagicMock()
+            mock_spec_fn.return_value = mock_spec
+
+            def exec_module(mod):
+                mod.resolve_selected_options = fake_mod.resolve_selected_options
+                mod.Migrator = fake_mod.Migrator
+
+            mock_spec.loader.exec_module = exec_module
+
+            result = setup_mod._offer_openclaw_migration(hermes_home)
+
+        assert result is True
+        fake_mod.resolve_selected_options.assert_called_once_with(
+            None, None, preset="full"
+        )
+        fake_mod.Migrator.assert_called_once()
+        call_kwargs = fake_mod.Migrator.call_args[1]
+        assert call_kwargs["execute"] is True
+        assert call_kwargs["overwrite"] is False
+        assert call_kwargs["migrate_secrets"] is True
+        assert call_kwargs["preset_name"] == "full"
+        fake_migrator.migrate.assert_called_once()
+
+    def test_handles_migration_error_gracefully(self, tmp_path):
+        """Should catch exceptions and return False."""
+        openclaw_dir = tmp_path / ".openclaw"
+        openclaw_dir.mkdir()
+        hermes_home = tmp_path / ".hermes"
+        hermes_home.mkdir()
+        config_path = hermes_home / "config.yaml"
+        config_path.write_text("")
+
+        script = tmp_path / "openclaw_to_hermes.py"
+        script.write_text("# placeholder")
+
+        with (
+            patch("hermes_cli.setup.Path.home", return_value=tmp_path),
+            patch.object(setup_mod, "_OPENCLAW_SCRIPT", script),
+            patch.object(setup_mod, "prompt_yes_no", return_value=True),
+            patch.object(setup_mod, "get_config_path", return_value=config_path),
+            patch(
+                "importlib.util.spec_from_file_location",
+                side_effect=RuntimeError("boom"),
+            ),
+        ):
+            result = setup_mod._offer_openclaw_migration(hermes_home)
+
+        assert result is False
+
+    def test_creates_config_if_missing(self, tmp_path):
+        """Should bootstrap config.yaml before running migration."""
+        openclaw_dir = tmp_path / ".openclaw"
+        openclaw_dir.mkdir()
+        hermes_home = tmp_path / ".hermes"
+        hermes_home.mkdir()
+        config_path = hermes_home / "config.yaml"
+        # config does NOT exist yet
+
+        script = tmp_path / "openclaw_to_hermes.py"
+        script.write_text("# placeholder")
+
+        with (
+            patch("hermes_cli.setup.Path.home", return_value=tmp_path),
+            patch.object(setup_mod, "_OPENCLAW_SCRIPT", script),
+            patch.object(setup_mod, "prompt_yes_no", return_value=True),
+            patch.object(setup_mod, "get_config_path", return_value=config_path),
+            patch.object(setup_mod, "load_config", return_value={"agent": {}}),
+            patch.object(setup_mod, "save_config") as mock_save,
+            patch(
+                "importlib.util.spec_from_file_location",
+                side_effect=RuntimeError("stop early"),
+            ),
+        ):
+            setup_mod._offer_openclaw_migration(hermes_home)
+
+        # save_config should have been called to bootstrap the file
+        mock_save.assert_called_once_with({"agent": {}})
+
+
+# ---------------------------------------------------------------------------
+# Integration with run_setup_wizard — first-time flow
+# ---------------------------------------------------------------------------
+
+
+def _first_time_args() -> Namespace:
+    return Namespace(
+        section=None,
+        non_interactive=False,
+        reset=False,
+    )
+
+
+class TestSetupWizardOpenclawIntegration:
+    """Verify _offer_openclaw_migration is called during first-time setup."""
+
+    def test_migration_offered_during_first_time_setup(self, tmp_path):
+        """On first-time setup, _offer_openclaw_migration should be called."""
+        args = _first_time_args()
+
+        with (
+            patch.object(setup_mod, "ensure_hermes_home"),
+            patch.object(setup_mod, "load_config", return_value={}),
+            patch.object(setup_mod, "get_hermes_home", return_value=tmp_path),
+            patch.object(setup_mod, "get_env_value", return_value=""),
+            patch("hermes_cli.auth.get_active_provider", return_value=None),
+            # User presses Enter to start
+            patch("builtins.input", return_value=""),
+            # Mock the migration offer
+            patch.object(
+                setup_mod, "_offer_openclaw_migration", return_value=False
+            ) as mock_migration,
+            # Mock the actual setup sections so they don't run
+            patch.object(setup_mod, "setup_model_provider"),
+            patch.object(setup_mod, "setup_terminal_backend"),
+            patch.object(setup_mod, "setup_agent_settings"),
+            patch.object(setup_mod, "setup_gateway"),
+            patch.object(setup_mod, "setup_tools"),
+            patch.object(setup_mod, "save_config"),
+            patch.object(setup_mod, "_print_setup_summary"),
+        ):
+            setup_mod.run_setup_wizard(args)
+
+        mock_migration.assert_called_once_with(tmp_path)
+
+    def test_migration_reloads_config_on_success(self, tmp_path):
+        """When migration returns True, config should be reloaded."""
+        args = _first_time_args()
+        call_order = []
+
+        def tracking_load_config():
+            call_order.append("load_config")
+            return {}
+
+        with (
+            patch.object(setup_mod, "ensure_hermes_home"),
+            patch.object(setup_mod, "load_config", side_effect=tracking_load_config),
+            patch.object(setup_mod, "get_hermes_home", return_value=tmp_path),
+            patch.object(setup_mod, "get_env_value", return_value=""),
+            patch("hermes_cli.auth.get_active_provider", return_value=None),
+            patch("builtins.input", return_value=""),
+            patch.object(setup_mod, "_offer_openclaw_migration", return_value=True),
+            patch.object(setup_mod, "setup_model_provider"),
+            patch.object(setup_mod, "setup_terminal_backend"),
+            patch.object(setup_mod, "setup_agent_settings"),
+            patch.object(setup_mod, "setup_gateway"),
+            patch.object(setup_mod, "setup_tools"),
+            patch.object(setup_mod, "save_config"),
+            patch.object(setup_mod, "_print_setup_summary"),
+        ):
+            setup_mod.run_setup_wizard(args)
+
+        # load_config called twice: once at start, once after migration
+        assert call_order.count("load_config") == 2
+
+    def test_reloaded_config_flows_into_remaining_setup_sections(self, tmp_path):
+        args = _first_time_args()
+        initial_config = {}
+        reloaded_config = {"model": {"provider": "openrouter"}}
+
+        with (
+            patch.object(setup_mod, "ensure_hermes_home"),
+            patch.object(
+                setup_mod,
+                "load_config",
+                side_effect=[initial_config, reloaded_config],
+            ),
+            patch.object(setup_mod, "get_hermes_home", return_value=tmp_path),
+            patch.object(setup_mod, "get_env_value", return_value=""),
+            patch("hermes_cli.auth.get_active_provider", return_value=None),
+            patch("builtins.input", return_value=""),
+            patch.object(setup_mod, "_offer_openclaw_migration", return_value=True),
+            patch.object(setup_mod, "setup_model_provider") as setup_model_provider,
+            patch.object(setup_mod, "setup_terminal_backend"),
+            patch.object(setup_mod, "setup_agent_settings"),
+            patch.object(setup_mod, "setup_gateway"),
+            patch.object(setup_mod, "setup_tools"),
+            patch.object(setup_mod, "save_config"),
+            patch.object(setup_mod, "_print_setup_summary"),
+        ):
+            setup_mod.run_setup_wizard(args)
+
+        setup_model_provider.assert_called_once_with(reloaded_config)
+
+    def test_migration_not_offered_for_existing_install(self, tmp_path):
+        """Returning users should not see the migration prompt."""
+        args = _first_time_args()
+
+        with (
+            patch.object(setup_mod, "ensure_hermes_home"),
+            patch.object(setup_mod, "load_config", return_value={}),
+            patch.object(setup_mod, "get_hermes_home", return_value=tmp_path),
+            patch.object(
+                setup_mod,
+                "get_env_value",
+                side_effect=lambda k: "sk-xxx" if k == "OPENROUTER_API_KEY" else "",
+            ),
+            patch("hermes_cli.auth.get_active_provider", return_value=None),
+            # Returning user picks "Exit"
+            patch.object(setup_mod, "prompt_choice", return_value=9),
+            patch.object(
+                setup_mod, "_offer_openclaw_migration", return_value=False
+            ) as mock_migration,
+        ):
+            setup_mod.run_setup_wizard(args)
+
+        mock_migration.assert_not_called()
@@ -0,0 +1,211 @@
+"""Tests for hermes_cli/skills_config.py and skills_tool disabled filtering."""
+import pytest
+from unittest.mock import patch, MagicMock
+
+
+# ---------------------------------------------------------------------------
+# get_disabled_skills
+# ---------------------------------------------------------------------------
+
+class TestGetDisabledSkills:
+    def test_empty_config(self):
+        from hermes_cli.skills_config import get_disabled_skills
+        assert get_disabled_skills({}) == set()
+
+    def test_reads_global_disabled(self):
+        from hermes_cli.skills_config import get_disabled_skills
+        config = {"skills": {"disabled": ["skill-a", "skill-b"]}}
+        assert get_disabled_skills(config) == {"skill-a", "skill-b"}
+
+    def test_reads_platform_disabled(self):
+        from hermes_cli.skills_config import get_disabled_skills
+        config = {"skills": {
+            "disabled": ["skill-a"],
+            "platform_disabled": {"telegram": ["skill-b"]}
+        }}
+        assert get_disabled_skills(config, platform="telegram") == {"skill-b"}
+
+    def test_platform_falls_back_to_global(self):
+        from hermes_cli.skills_config import get_disabled_skills
+        config = {"skills": {"disabled": ["skill-a"]}}
+        # no platform_disabled for cli -> falls back to global
+        assert get_disabled_skills(config, platform="cli") == {"skill-a"}
+
+    def test_missing_skills_key(self):
+        from hermes_cli.skills_config import get_disabled_skills
+        assert get_disabled_skills({"other": "value"}) == set()
+
+    def test_empty_disabled_list(self):
+        from hermes_cli.skills_config import get_disabled_skills
+        assert get_disabled_skills({"skills": {"disabled": []}}) == set()
+
+
+# ---------------------------------------------------------------------------
+# save_disabled_skills
+# ---------------------------------------------------------------------------
+
+class TestSaveDisabledSkills:
+    @patch("hermes_cli.skills_config.save_config")
+    def test_saves_global_sorted(self, mock_save):
+        from hermes_cli.skills_config import save_disabled_skills
+        config = {}
+        save_disabled_skills(config, {"skill-z", "skill-a"})
+        assert config["skills"]["disabled"] == ["skill-a", "skill-z"]
+        mock_save.assert_called_once()
+
+    @patch("hermes_cli.skills_config.save_config")
+    def test_saves_platform_disabled(self, mock_save):
+        from hermes_cli.skills_config import save_disabled_skills
+        config = {}
+        save_disabled_skills(config, {"skill-x"}, platform="telegram")
+        assert config["skills"]["platform_disabled"]["telegram"] == ["skill-x"]
+
+    @patch("hermes_cli.skills_config.save_config")
+    def test_saves_empty(self, mock_save):
+        from hermes_cli.skills_config import save_disabled_skills
+        config = {"skills": {"disabled": ["skill-a"]}}
+        save_disabled_skills(config, set())
+        assert config["skills"]["disabled"] == []
+
+    @patch("hermes_cli.skills_config.save_config")
+    def test_creates_skills_key(self, mock_save):
+        from hermes_cli.skills_config import save_disabled_skills
+        config = {}
+        save_disabled_skills(config, {"skill-x"})
+        assert "skills" in config
+        assert "disabled" in config["skills"]
+
+
+# ---------------------------------------------------------------------------
+# _is_skill_disabled
+# ---------------------------------------------------------------------------
+
+class TestIsSkillDisabled:
+    @patch("hermes_cli.config.load_config")
+    def test_globally_disabled(self, mock_load):
+        mock_load.return_value = {"skills": {"disabled": ["bad-skill"]}}
+        from tools.skills_tool import _is_skill_disabled
+        assert _is_skill_disabled("bad-skill") is True
+
+    @patch("hermes_cli.config.load_config")
+    def test_globally_enabled(self, mock_load):
+        mock_load.return_value = {"skills": {"disabled": ["other"]}}
+        from tools.skills_tool import _is_skill_disabled
+        assert _is_skill_disabled("good-skill") is False
+
+    @patch("hermes_cli.config.load_config")
+    def test_platform_disabled(self, mock_load):
+        mock_load.return_value = {"skills": {
+            "disabled": [],
+            "platform_disabled": {"telegram": ["tg-skill"]}
+        }}
+        from tools.skills_tool import _is_skill_disabled
+        assert _is_skill_disabled("tg-skill", platform="telegram") is True
+
+    @patch("hermes_cli.config.load_config")
+    def test_platform_enabled_overrides_global(self, mock_load):
+        mock_load.return_value = {"skills": {
+            "disabled": ["skill-a"],
+            "platform_disabled": {"telegram": []}
+        }}
+        from tools.skills_tool import _is_skill_disabled
+        # telegram has explicit empty list -> skill-a is NOT disabled for telegram
+        assert _is_skill_disabled("skill-a", platform="telegram") is False
+
+    @patch("hermes_cli.config.load_config")
+    def test_platform_falls_back_to_global(self, mock_load):
+        mock_load.return_value = {"skills": {"disabled": ["skill-a"]}}
+        from tools.skills_tool import _is_skill_disabled
+        # no platform_disabled for cli -> global
+        assert _is_skill_disabled("skill-a", platform="cli") is True
+
+    @patch("hermes_cli.config.load_config")
+    def test_empty_config(self, mock_load):
+        mock_load.return_value = {}
+        from tools.skills_tool import _is_skill_disabled
+        assert _is_skill_disabled("any-skill") is False
+
+    @patch("hermes_cli.config.load_config")
+    def test_exception_returns_false(self, mock_load):
+        mock_load.side_effect = Exception("config error")
+        from tools.skills_tool import _is_skill_disabled
+        assert _is_skill_disabled("any-skill") is False
+
+    @patch("hermes_cli.config.load_config")
+    @patch.dict("os.environ", {"HERMES_PLATFORM": "discord"})
+    def test_env_var_platform(self, mock_load):
+        mock_load.return_value = {"skills": {
+            "platform_disabled": {"discord": ["discord-skill"]}
+        }}
+        from tools.skills_tool import _is_skill_disabled
+        assert _is_skill_disabled("discord-skill") is True
+
+
+# ---------------------------------------------------------------------------
+# _find_all_skills — disabled filtering
+# ---------------------------------------------------------------------------
+
+class TestFindAllSkillsFiltering:
+    @patch("tools.skills_tool._get_disabled_skill_names", return_value={"my-skill"})
+    @patch("tools.skills_tool.skill_matches_platform", return_value=True)
+    @patch("tools.skills_tool.SKILLS_DIR")
+    def test_disabled_skill_excluded(self, mock_dir, mock_platform, mock_disabled, tmp_path):
+        skill_dir = tmp_path / "my-skill"
+        skill_dir.mkdir()
+        skill_md = skill_dir / "SKILL.md"
+        skill_md.write_text("---\nname: my-skill\ndescription: A test skill\n---\nContent")
+        mock_dir.exists.return_value = True
+        mock_dir.rglob.return_value = [skill_md]
+        from tools.skills_tool import _find_all_skills
+        skills = _find_all_skills()
+        assert not any(s["name"] == "my-skill" for s in skills)
+
+    @patch("tools.skills_tool._get_disabled_skill_names", return_value=set())
+    @patch("tools.skills_tool.skill_matches_platform", return_value=True)
+    @patch("tools.skills_tool.SKILLS_DIR")
+    def test_enabled_skill_included(self, mock_dir, mock_platform, mock_disabled, tmp_path):
+        skill_dir = tmp_path / "my-skill"
+        skill_dir.mkdir()
+        skill_md = skill_dir / "SKILL.md"
+        skill_md.write_text("---\nname: my-skill\ndescription: A test skill\n---\nContent")
+        mock_dir.exists.return_value = True
+        mock_dir.rglob.return_value = [skill_md]
+        from tools.skills_tool import _find_all_skills
+        skills = _find_all_skills()
+        assert any(s["name"] == "my-skill" for s in skills)
+
+    @patch("tools.skills_tool._get_disabled_skill_names", return_value={"my-skill"})
+    @patch("tools.skills_tool.skill_matches_platform", return_value=True)
+    @patch("tools.skills_tool.SKILLS_DIR")
+    def test_skip_disabled_returns_all(self, mock_dir, mock_platform, mock_disabled, tmp_path):
+        """skip_disabled=True ignores the disabled set (for config UI)."""
+        skill_dir = tmp_path / "my-skill"
+        skill_dir.mkdir()
+        skill_md = skill_dir / "SKILL.md"
+        skill_md.write_text("---\nname: my-skill\ndescription: A test skill\n---\nContent")
+        mock_dir.exists.return_value = True
+        mock_dir.rglob.return_value = [skill_md]
+        from tools.skills_tool import _find_all_skills
+        skills = _find_all_skills(skip_disabled=True)
+        assert any(s["name"] == "my-skill" for s in skills)
+
+
+# ---------------------------------------------------------------------------
+# _get_categories
+# ---------------------------------------------------------------------------
+
+class TestGetCategories:
+    def test_extracts_unique_categories(self):
+        from hermes_cli.skills_config import _get_categories
+        skills = [
+            {"name": "a", "category": "mlops", "description": ""},
+            {"name": "b", "category": "coding", "description": ""},
+            {"name": "c", "category": "mlops", "description": ""},
+        ]
+        cats = _get_categories(skills)
+        assert cats == ["coding", "mlops"]
+
+    def test_none_becomes_uncategorized(self):
+        from hermes_cli.skills_config import _get_categories
+        skills = [{"name": "a", "category": None, "description": ""}]
+        assert "uncategorized" in _get_categories(skills)
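Review note: the `get_disabled_skills` and `_is_skill_disabled` tests above pin down a precedence rule. A platform entry under `platform_disabled` replaces the global `disabled` list (even when it is empty), and a platform with no entry falls back to the global list. The sketch below is a hypothetical re-implementation inferred only from those assertions; the name `get_disabled_skills_sketch` is an assumption and this is not the code in `hermes_cli/skills_config.py`.

```python
from typing import Optional, Set


def get_disabled_skills_sketch(config: dict, platform: Optional[str] = None) -> Set[str]:
    """Hypothetical lookup mirroring the behaviour the tests assert."""
    skills_cfg = config.get("skills") or {}
    global_disabled = set(skills_cfg.get("disabled") or [])

    if platform is None:
        return global_disabled

    per_platform = skills_cfg.get("platform_disabled") or {}
    if platform in per_platform:
        # An explicit per-platform list, even an empty one, replaces the global list.
        return set(per_platform[platform] or [])

    # No entry for this platform: fall back to the global list.
    return global_disabled
```

Treating an explicit empty list as an override is what lets a platform re-enable a skill that is disabled globally, which is the case `test_platform_enabled_overrides_global` exercises.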
@@ -1,13 +1,23 @@
 from io import StringIO

+import pytest
 from rich.console import Console

 from hermes_cli.skills_hub import do_list


-def test_do_list_initializes_hub_dir(monkeypatch, tmp_path):
+class _DummyLockFile:
+    def __init__(self, installed):
+        self._installed = installed
+
+    def list_installed(self):
+        return self._installed
+
+
+@pytest.fixture()
+def hub_env(monkeypatch, tmp_path):
+    """Set up isolated hub directory paths and return the hub directory."""
    import tools.skills_hub as hub
-    import tools.skills_tool as skills_tool

    hub_dir = tmp_path / "skills" / ".hub"
    monkeypatch.setattr(hub, "SKILLS_DIR", tmp_path / "skills")
@@ -17,15 +27,98 @@ def test_do_list_initializes_hub_dir(monkeypatch, tmp_path):
    monkeypatch.setattr(hub, "AUDIT_LOG", hub_dir / "audit.log")
    monkeypatch.setattr(hub, "TAPS_FILE", hub_dir / "taps.json")
    monkeypatch.setattr(hub, "INDEX_CACHE_DIR", hub_dir / "index-cache")
+
+    return hub_dir
+
+
+# ---------------------------------------------------------------------------
+# Fixtures for common skill setups
+# ---------------------------------------------------------------------------
+
+_HUB_ENTRY = {"name": "hub-skill", "source": "github", "trust_level": "community"}
+
+_ALL_THREE_SKILLS = [
+    {"name": "hub-skill", "category": "x", "description": "hub"},
+    {"name": "builtin-skill", "category": "x", "description": "builtin"},
+    {"name": "local-skill", "category": "x", "description": "local"},
+]
+
+_BUILTIN_MANIFEST = {"builtin-skill": "abc123"}
+
+
+@pytest.fixture()
+def three_source_env(monkeypatch, hub_env):
+    """Populate hub/builtin/local skills for source-classification tests."""
+    import tools.skills_hub as hub
+    import tools.skills_sync as skills_sync
+    import tools.skills_tool as skills_tool
+
+    monkeypatch.setattr(hub, "HubLockFile", lambda: _DummyLockFile([_HUB_ENTRY]))
+    monkeypatch.setattr(skills_tool, "_find_all_skills", lambda: list(_ALL_THREE_SKILLS))
+    monkeypatch.setattr(skills_sync, "_read_manifest", lambda: dict(_BUILTIN_MANIFEST))
+
+    return hub_env
+
+
+def _capture(source_filter: str = "all") -> str:
+    """Run do_list into a string buffer and return the output."""
+    sink = StringIO()
+    console = Console(file=sink, force_terminal=False, color_system=None)
+    do_list(source_filter=source_filter, console=console)
+    return sink.getvalue()
+
+
+# ---------------------------------------------------------------------------
+# Tests
+# ---------------------------------------------------------------------------
+
+
+def test_do_list_initializes_hub_dir(monkeypatch, hub_env):
+    import tools.skills_sync as skills_sync
+    import tools.skills_tool as skills_tool
+
    monkeypatch.setattr(skills_tool, "_find_all_skills", lambda: [])
+    monkeypatch.setattr(skills_sync, "_read_manifest", lambda: {})

-    console = Console(file=StringIO(), force_terminal=False, color_system=None)
-
+    hub_dir = hub_env
    assert not hub_dir.exists()

-    do_list(console=console)
+    _capture()

    assert hub_dir.exists()
    assert (hub_dir / "lock.json").exists()
    assert (hub_dir / "quarantine").is_dir()
    assert (hub_dir / "index-cache").is_dir()
+
+
+def test_do_list_distinguishes_hub_builtin_and_local(three_source_env):
+    output = _capture()
+
+    assert "hub-skill" in output
+    assert "builtin-skill" in output
+    assert "local-skill" in output
+    assert "1 hub-installed, 1 builtin, 1 local" in output
+
+
+def test_do_list_filter_local(three_source_env):
+    output = _capture(source_filter="local")
+
+    assert "local-skill" in output
+    assert "builtin-skill" not in output
+    assert "hub-skill" not in output
+
+
+def test_do_list_filter_hub(three_source_env):
+    output = _capture(source_filter="hub")
+
+    assert "hub-skill" in output
+    assert "builtin-skill" not in output
+    assert "local-skill" not in output
+
+
+def test_do_list_filter_builtin(three_source_env):
+    output = _capture(source_filter="builtin")
+
+    assert "builtin-skill" in output
+    assert "hub-skill" not in output
+    assert "local-skill" not in output
@@ -0,0 +1,35 @@
+"""Test that skills subparser doesn't conflict (regression test for #898)."""
+
+import argparse
+
+
+def test_no_duplicate_skills_subparser():
+    """Ensure 'skills' subparser is only registered once to avoid Python 3.11+ crash.
+
+    Python 3.11 changed argparse to raise an exception on duplicate subparser
+    names instead of silently overwriting (see CPython #94331).
+
+    This test will fail with:
+        argparse.ArgumentError: argument command: conflicting subparser: skills
+
+    if the duplicate 'skills' registration is reintroduced.
+    """
+    # Force a fresh import of the module where the parser is constructed.
+    # If there are duplicate 'skills' subparsers, this import will raise
+    # argparse.ArgumentError at module load time.
+    import importlib
+    import sys
+
+    # Remove cached module if present
+    if 'hermes_cli.main' in sys.modules:
+        del sys.modules['hermes_cli.main']
+
+    try:
+        import hermes_cli.main  # noqa: F401
+    except argparse.ArgumentError as e:
+        if "conflicting subparser" in str(e):
+            raise AssertionError(
+                f"Duplicate subparser detected: {e}. "
+                "See issue #898 for details."
+            ) from e
+        raise
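One way to keep the duplicate registration from coming back is to make registration idempotent. The sketch below is illustrative only: `register_once` is a hypothetical helper, not part of `hermes_cli`, and it assumes the subparsers action exposes its name-to-parser map through `choices`.

```python
import argparse


def register_once(subparsers, name, **kwargs):
    """Return the existing subparser for `name`, or create it if missing.

    Hypothetical helper: guards against registering the same name twice.
    """
    existing = subparsers.choices.get(name)
    if existing is not None:
        return existing
    return subparsers.add_parser(name, **kwargs)


parser = argparse.ArgumentParser(prog="demo")
subparsers = parser.add_subparsers(dest="command")

first = register_once(subparsers, "skills", help="Manage skills")
second = register_once(subparsers, "skills", help="Manage skills")

# Both calls resolve to the same parser object, so no conflict is raised.
same_parser = first is second
args = parser.parse_args(["skills"])
```

With a guard like this, a second registration attempt is a no-op on any Python version, rather than a silent overwrite or an `ArgumentError`.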
@@ -1,6 +1,6 @@
 """Tests for hermes_cli.tools_config platform tool persistence."""

-from hermes_cli.tools_config import _get_platform_tools
+from hermes_cli.tools_config import _get_platform_tools, _platform_toolset_summary


 def test_get_platform_tools_uses_default_when_platform_not_configured():
@@ -17,3 +17,12 @@ def test_get_platform_tools_preserves_explicit_empty_selection():
    enabled = _get_platform_tools(config, "cli")

    assert enabled == set()
+
+
+def test_platform_toolset_summary_uses_explicit_platform_list():
+    config = {}
+
+    summary = _platform_toolset_summary(config, platforms=["cli"])
+
+    assert set(summary.keys()) == {"cli"}
+    assert summary["cli"] == _get_platform_tools(config, "cli")
@@ -579,7 +579,7 @@ class WebToolsTester:
            "results": self.test_results,
            "environment": {
                "firecrawl_api_key": check_firecrawl_api_key(),
-                "nous_api_key": check_auxiliary_model(),
+                "auxiliary_model": check_auxiliary_model(),
                "debug_mode": get_debug_session_info()["enabled"]
            }
        }
@@ -0,0 +1,141 @@
+#!/usr/bin/env python3
+"""Run a real interrupt test with actual AIAgent + delegate child.
+
+Not a pytest test — runs directly as a script for live testing.
+"""
+
+import threading
+import time
+import sys
+import os
+
+sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+
+from unittest.mock import MagicMock, patch
+from run_agent import AIAgent, IterationBudget
+from tools.delegate_tool import _run_single_child
+from tools.interrupt import set_interrupt, is_interrupted
+
+set_interrupt(False)
+
+# Create parent agent (minimal)
+parent = AIAgent.__new__(AIAgent)
+parent._interrupt_requested = False
+parent._interrupt_message = None
+parent._active_children = []
+parent.quiet_mode = True
+parent.model = "test/model"
+parent.base_url = "http://localhost:1"
+parent.api_key = "test"
+parent.provider = "test"
+parent.api_mode = "chat_completions"
+parent.platform = "cli"
+parent.enabled_toolsets = ["terminal", "file"]
+parent.providers_allowed = None
+parent.providers_ignored = None
+parent.providers_order = None
+parent.provider_sort = None
+parent.max_tokens = None
+parent.reasoning_config = None
+parent.prefill_messages = None
+parent._session_db = None
+parent._delegate_depth = 0
+parent._delegate_spinner = None
+parent.tool_progress_callback = None
+parent.iteration_budget = IterationBudget(max_total=100)
+parent._client_kwargs = {"api_key": "test", "base_url": "http://localhost:1"}
+
+child_started = threading.Event()
+result_holder = [None]
+
+
+def run_delegate():
+    with patch("run_agent.OpenAI") as MockOpenAI:
+        mock_client = MagicMock()
+
+        def slow_create(**kwargs):
+            time.sleep(3)
+            resp = MagicMock()
+            resp.choices = [MagicMock()]
+            resp.choices[0].message.content = "Done"
+            resp.choices[0].message.tool_calls = None
+            resp.choices[0].message.refusal = None
+            resp.choices[0].finish_reason = "stop"
+            resp.usage.prompt_tokens = 100
+            resp.usage.completion_tokens = 10
+            resp.usage.total_tokens = 110
+            resp.usage.prompt_tokens_details = None
+            return resp
+
+        mock_client.chat.completions.create = slow_create
+        mock_client.close = MagicMock()
+        MockOpenAI.return_value = mock_client
+
+        original_init = AIAgent.__init__
+
+        def patched_init(self_agent, *a, **kw):
+            original_init(self_agent, *a, **kw)
+            child_started.set()
+
+        with patch.object(AIAgent, "__init__", patched_init):
+            try:
+                result = _run_single_child(
+                    task_index=0,
+                    goal="Test slow task",
+                    context=None,
+                    toolsets=["terminal"],
+                    model="test/model",
+                    max_iterations=5,
+                    parent_agent=parent,
+                    task_count=1,
+                    override_provider="test",
+                    override_base_url="http://localhost:1",
+                    override_api_key="test",
+                    override_api_mode="chat_completions",
+                )
+                result_holder[0] = result
+            except Exception as e:
+                print(f"ERROR in delegate: {e}")
+                import traceback
+                traceback.print_exc()
+
+
+print("Starting agent thread...")
+agent_thread = threading.Thread(target=run_delegate, daemon=True)
+agent_thread.start()
+
+started = child_started.wait(timeout=10)
+if not started:
+    print("ERROR: Child never started")
+    sys.exit(1)
+
+time.sleep(0.5)
+
+print(f"Active children: {len(parent._active_children)}")
+for i, c in enumerate(parent._active_children):
+    print(f"  Child {i}: _interrupt_requested={c._interrupt_requested}")
+
+t0 = time.monotonic()
+parent.interrupt("User typed a new message")
+print("Called parent.interrupt()")
+
+for i, c in enumerate(parent._active_children):
+    print(f"  Child {i} after interrupt: _interrupt_requested={c._interrupt_requested}")
+print(f"Global is_interrupted: {is_interrupted()}")
+
+agent_thread.join(timeout=10)
+elapsed = time.monotonic() - t0
+print(f"Agent thread finished in {elapsed:.2f}s")
+
+result = result_holder[0]
+if result:
+    print(f"Status: {result['status']}")
+    print(f"Duration: {result['duration_seconds']}s")
+    if elapsed < 2.0:
+        print("✅ PASS: Interrupt detected quickly!")
+    else:
+        print(f"❌ FAIL: Took {elapsed:.2f}s — interrupt was too slow or not detected")
+else:
+    print("❌ FAIL: No result!")
+
+set_interrupt(False)
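The script passes only if the child notices the interrupt well before its 3-second mocked API call returns. A minimal sketch of that idea, assuming a worker that waits on a `threading.Event` instead of sleeping; the names `interruptible_call` and `stop` are hypothetical, and this is not a description of how `AIAgent.interrupt()` is implemented.

```python
import threading
import time


def interruptible_call(stop: threading.Event, duration: float) -> str:
    """Simulate a slow call that returns early when `stop` is set."""
    # Event.wait returns True as soon as the event is set, else False on timeout.
    if stop.wait(timeout=duration):
        return "interrupted"
    return "completed"


stop = threading.Event()
outcome = []

worker = threading.Thread(
    target=lambda: outcome.append(interruptible_call(stop, 3.0)),
    daemon=True,
)

t0 = time.monotonic()
worker.start()
time.sleep(0.1)        # let the worker begin waiting
stop.set()             # request the interrupt
worker.join(timeout=5)
elapsed = time.monotonic() - t0
```

Because the worker blocks on the event rather than on `time.sleep`, it wakes as soon as the interrupt is requested, which is the behaviour the 2-second pass threshold in the script is checking for.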
@@ -6,6 +6,11 @@ Verifies that:
 - Preflight compression proactively compresses oversized sessions before API calls
 """

+import pytest
+pytestmark = pytest.mark.skip(reason="Hangs in non-interactive environments")
+
+
+
 import uuid
 from types import SimpleNamespace
 from unittest.mock import MagicMock, patch
@@ -396,3 +401,73 @@ class TestPreflightCompression:
            result = agent.run_conversation("hello", conversation_history=big_history)

        mock_compress.assert_not_called()
+
+
+class TestToolResultPreflightCompression:
+    """Compression should trigger when tool results push context past the threshold."""
+
+    def test_large_tool_results_trigger_compression(self, agent):
+        """When tool results push estimated tokens past threshold, compress before next call."""
+        agent.compression_enabled = True
+        agent.context_compressor.context_length = 200_000
+        agent.context_compressor.threshold_tokens = 140_000
+        agent.context_compressor.last_prompt_tokens = 130_000
+        agent.context_compressor.last_completion_tokens = 5_000
+
+        tc = SimpleNamespace(
+            id="tc1", type="function",
+            function=SimpleNamespace(name="web_search", arguments='{"query":"test"}'),
+        )
+        tool_resp = _mock_response(
+            content=None, finish_reason="stop", tool_calls=[tc],
+            usage={"prompt_tokens": 130_000, "completion_tokens": 5_000, "total_tokens": 135_000},
+        )
+        ok_resp = _mock_response(
+            content="Done after compression", finish_reason="stop",
+            usage={"prompt_tokens": 50_000, "completion_tokens": 100, "total_tokens": 50_100},
+        )
+        agent.client.chat.completions.create.side_effect = [tool_resp, ok_resp]
+        large_result = "x" * 100_000
+
+        with (
+            patch("run_agent.handle_function_call", return_value=large_result),
+            patch.object(agent, "_compress_context") as mock_compress,
+            patch.object(agent, "_persist_session"),
+            patch.object(agent, "_save_trajectory"),
+            patch.object(agent, "_cleanup_task_resources"),
+        ):
+            mock_compress.return_value = (
+                [{"role": "user", "content": "hello"}], "compressed prompt",
+            )
+            result = agent.run_conversation("hello")
+
+        mock_compress.assert_called_once()
+        assert result["completed"] is True
+
+    def test_anthropic_prompt_too_long_safety_net(self, agent):
+        """Anthropic 'prompt is too long' error triggers compression as safety net."""
+        err_400 = Exception(
+            "Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', "
+            "'message': 'prompt is too long: 233153 tokens > 200000 maximum'}}"
+        )
+        err_400.status_code = 400
+        ok_resp = _mock_response(content="Recovered", finish_reason="stop")
+        agent.client.chat.completions.create.side_effect = [err_400, ok_resp]
+        prefill = [
+            {"role": "user", "content": "previous"},
+            {"role": "assistant", "content": "answer"},
+        ]
+
+        with (
+            patch.object(agent, "_compress_context") as mock_compress,
+            patch.object(agent, "_persist_session"),
+            patch.object(agent, "_save_trajectory"),
+            patch.object(agent, "_cleanup_task_resources"),
+        ):
+            mock_compress.return_value = (
+                [{"role": "user", "content": "hello"}], "compressed",
+            )
+            result = agent.run_conversation("hello", conversation_history=prefill)
+
+        mock_compress.assert_called_once()
+        assert result["completed"] is True
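The numbers in `test_large_tool_results_trigger_compression` can be checked by hand. The sketch below assumes a rough four-characters-per-token estimate; that ratio and the helper name `should_compress` are assumptions for illustration, and the agent's real estimator may differ.

```python
CHARS_PER_TOKEN = 4  # assumed heuristic, not taken from the agent code


def should_compress(last_prompt_tokens, last_completion_tokens,
                    new_tool_result_chars, threshold_tokens):
    """Estimate the next prompt size and compare it with the threshold."""
    estimated = (
        last_prompt_tokens
        + last_completion_tokens
        + new_tool_result_chars // CHARS_PER_TOKEN
    )
    return estimated, estimated >= threshold_tokens


estimated, compress = should_compress(
    last_prompt_tokens=130_000,
    last_completion_tokens=5_000,
    new_tool_result_chars=100_000,
    threshold_tokens=140_000,
)
# 130_000 + 5_000 + 25_000 = 160_000 estimated tokens
```

Under that assumption the previous turn alone (135,000 tokens) sits below the 140,000 threshold, and it is the 100,000-character tool result that pushes the estimate over it.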
@@ -0,0 +1,486 @@
+"""
+Tests for environments/agent_loop.py — HermesAgentLoop.
+
+Tests the multi-turn agent engine using mocked servers, without needing
+real API keys or running servers.
+"""
+
+import asyncio
+import json
+import sys
+from dataclasses import dataclass
+from pathlib import Path
+from typing import Any, Dict, List, Optional
+from unittest.mock import MagicMock
+
+import pytest
+
+# Ensure repo root is importable
+sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
+
+try:
+    from environments.agent_loop import (
+        AgentResult,
+        HermesAgentLoop,
+        ToolError,
+        _extract_reasoning_from_message,
+        resize_tool_pool,
+    )
+except ImportError:
+    pytest.skip("atroposlib not installed", allow_module_level=True)
+
+
+# ─── Mock server infrastructure ─────────────────────────────────────────
+
+
+@dataclass
+class MockFunction:
+    name: str
+    arguments: str
+
+
+@dataclass
+class MockToolCall:
+    id: str
+    function: MockFunction
+    type: str = "function"
+
+
+@dataclass
+class MockMessage:
+    content: Optional[str]
+    role: str = "assistant"
+    tool_calls: Optional[List[MockToolCall]] = None
+    reasoning_content: Optional[str] = None
+    reasoning: Optional[str] = None
+    reasoning_details: Optional[list] = None
+
+
+@dataclass
+class MockChoice:
+    message: MockMessage
+    finish_reason: str = "stop"
+    index: int = 0
+
+
+@dataclass
+class MockChatCompletion:
+    choices: List[MockChoice]
+    id: str = "chatcmpl-mock"
+    model: str = "mock-model"
+
+
+class MockServer:
+    """
+    Mock server that returns pre-configured responses in sequence.
+    Mimics the chat_completion() interface.
+    """
+
+    def __init__(self, responses: List[MockChatCompletion]):
+        self.responses = responses
+        self.call_count = 0
+        self.call_history: List[Dict[str, Any]] = []
+
+    async def chat_completion(self, **kwargs) -> MockChatCompletion:
+        self.call_history.append(kwargs)
+        if self.call_count >= len(self.responses):
+            # Return a simple text response if we run out
+            return MockChatCompletion(
+                choices=[MockChoice(message=MockMessage(content="Done."))]
+            )
+        resp = self.responses[self.call_count]
+        self.call_count += 1
+        return resp
+
+
+def make_text_response(content: str) -> MockChatCompletion:
+    """Create a simple text-only response (no tool calls)."""
+    return MockChatCompletion(
+        choices=[MockChoice(message=MockMessage(content=content))]
+    )
+
+
+def make_tool_response(
+    tool_name: str,
+    arguments: dict,
+    content: str = "",
+    tool_call_id: str = "call_001",
+) -> MockChatCompletion:
+    """Create a response with a single tool call."""
+    return MockChatCompletion(
+        choices=[
+            MockChoice(
+                message=MockMessage(
+                    content=content,
+                    tool_calls=[
+                        MockToolCall(
+                            id=tool_call_id,
+                            function=MockFunction(
+                                name=tool_name,
+                                arguments=json.dumps(arguments),
+                            ),
+                        )
+                    ],
+                ),
+                finish_reason="tool_calls",
+            )
+        ]
+    )
+
+
+# ─── Tests ───────────────────────────────────────────────────────────────
+
+
+class TestAgentResult:
+    def test_defaults(self):
+        result = AgentResult(messages=[])
+        assert result.messages == []
+        assert result.managed_state is None
+        assert result.turns_used == 0
+        assert result.finished_naturally is False
+        assert result.reasoning_per_turn == []
+        assert result.tool_errors == []
+
+
+class TestExtractReasoning:
+    def test_reasoning_content_field(self):
+        msg = MockMessage(content="hello", reasoning_content="I think...")
+        assert _extract_reasoning_from_message(msg) == "I think..."
+
+    def test_reasoning_field(self):
+        msg = MockMessage(content="hello", reasoning="Let me consider...")
+        assert _extract_reasoning_from_message(msg) == "Let me consider..."
+
+    def test_reasoning_details(self):
+        detail = MagicMock()
+        detail.text = "Detail reasoning"
+        msg = MockMessage(content="hello", reasoning_details=[detail])
+        assert _extract_reasoning_from_message(msg) == "Detail reasoning"
+
+    def test_reasoning_details_dict_format(self):
+        msg = MockMessage(
+            content="hello",
+            reasoning_details=[{"text": "Dict reasoning"}],
+        )
+        assert _extract_reasoning_from_message(msg) == "Dict reasoning"
+
+    def test_no_reasoning(self):
+        msg = MockMessage(content="hello")
+        assert _extract_reasoning_from_message(msg) is None
+
+    def test_reasoning_content_takes_priority(self):
+        msg = MockMessage(
+            content="hello",
+            reasoning_content="First",
+            reasoning="Second",
+        )
+        assert _extract_reasoning_from_message(msg) == "First"
+
+
+class TestHermesAgentLoop:
+    """Test the agent loop with mock servers."""
+
+    @pytest.fixture
+    def basic_tools(self):
+        """Minimal tool schema for testing."""
+        return [
+            {
+                "type": "function",
+                "function": {
+                    "name": "terminal",
+                    "description": "Run a command",
+                    "parameters": {
+                        "type": "object",
+                        "properties": {
+                            "command": {
+                                "type": "string",
+                                "description": "Command to run",
+                            }
+                        },
+                        "required": ["command"],
+                    },
+                },
+            },
+            {
+                "type": "function",
+                "function": {
+                    "name": "read_file",
+                    "description": "Read a file",
+                    "parameters": {
+                        "type": "object",
+                        "properties": {
+                            "path": {"type": "string"},
+                        },
+                        "required": ["path"],
+                    },
+                },
+            },
+        ]
+
+    @pytest.fixture
+    def valid_names(self):
+        return {"terminal", "read_file", "todo"}
+
+    @pytest.mark.asyncio
+    async def test_simple_text_response(self, basic_tools, valid_names):
+        """Model responds with text only, no tool calls."""
+        server = MockServer([make_text_response("Hello! How can I help?")])
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=basic_tools,
+            valid_tool_names=valid_names,
+            max_turns=10,
+        )
+        messages = [{"role": "user", "content": "Hi"}]
+        result = await agent.run(messages)
+
+        assert result.finished_naturally is True
+        assert result.turns_used == 1
+        assert len(result.messages) >= 2  # user + assistant
+        assert result.messages[-1]["role"] == "assistant"
+        assert result.messages[-1]["content"] == "Hello! How can I help?"
+
+    @pytest.mark.asyncio
+    async def test_tool_call_then_text(self, basic_tools, valid_names):
+        """Model calls a tool, then responds with text."""
+        server = MockServer([
+            make_tool_response("todo", {"todos": [{"id": "1", "content": "test", "status": "pending"}]}),
+            make_text_response("I created a todo for you."),
+        ])
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=basic_tools,
+            valid_tool_names=valid_names,
+            max_turns=10,
+        )
+        messages = [{"role": "user", "content": "Create a todo"}]
+        result = await agent.run(messages)
+
+        assert result.finished_naturally is True
+        assert result.turns_used == 2
+        # Should have: user, assistant (tool_call), tool (result), assistant (text)
+        roles = [m["role"] for m in result.messages]
+        assert roles == ["user", "assistant", "tool", "assistant"]
+
+    @pytest.mark.asyncio
+    async def test_max_turns_reached(self, basic_tools, valid_names):
+        """Model keeps calling tools until max_turns is hit."""
+        # Create responses that always call a tool
+        responses = [
+            make_tool_response("todo", {"todos": [{"id": str(i), "content": f"task {i}", "status": "pending"}]}, tool_call_id=f"call_{i}")
+            for i in range(10)
+        ]
+        server = MockServer(responses)
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=basic_tools,
+            valid_tool_names=valid_names,
+            max_turns=3,
+        )
+        messages = [{"role": "user", "content": "Keep going"}]
+        result = await agent.run(messages)
+
+        assert result.finished_naturally is False
+        assert result.turns_used == 3
+
+    @pytest.mark.asyncio
+    async def test_unknown_tool_name(self, basic_tools, valid_names):
+        """Model calls a tool not in valid_tool_names."""
+        server = MockServer([
+            make_tool_response("nonexistent_tool", {"arg": "val"}),
+            make_text_response("OK, that didn't work."),
+        ])
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=basic_tools,
+            valid_tool_names=valid_names,
+            max_turns=10,
+        )
+        messages = [{"role": "user", "content": "Call something weird"}]
+        result = await agent.run(messages)
+
+        # Should record a tool error
+        assert len(result.tool_errors) >= 1
+        assert result.tool_errors[0].tool_name == "nonexistent_tool"
+
+    @pytest.mark.asyncio
+    async def test_empty_response(self, basic_tools, valid_names):
+        """Server returns empty response."""
+        server = MockServer([MockChatCompletion(choices=[])])
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=basic_tools,
+            valid_tool_names=valid_names,
+            max_turns=10,
+        )
+        messages = [{"role": "user", "content": "Hi"}]
+        result = await agent.run(messages)
+
+        assert result.finished_naturally is False
+        assert result.turns_used == 1
+
+    @pytest.mark.asyncio
+    async def test_api_error_handling(self, basic_tools, valid_names):
+        """Server raises an exception."""
+
+        class FailingServer:
+            async def chat_completion(self, **kwargs):
+                raise ConnectionError("Server unreachable")
+
+        agent = HermesAgentLoop(
+            server=FailingServer(),
+            tool_schemas=basic_tools,
+            valid_tool_names=valid_names,
+            max_turns=10,
+        )
+        messages = [{"role": "user", "content": "Hi"}]
+        result = await agent.run(messages)
+
+        assert result.finished_naturally is False
+        assert result.turns_used == 1
+
+    @pytest.mark.asyncio
+    async def test_tools_passed_to_server(self, basic_tools, valid_names):
+        """Verify tools are passed in the chat_completion kwargs."""
+        server = MockServer([make_text_response("OK")])
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=basic_tools,
+            valid_tool_names=valid_names,
+            max_turns=10,
+        )
+        messages = [{"role": "user", "content": "Hi"}]
+        await agent.run(messages)
+
+        assert len(server.call_history) == 1
+        assert "tools" in server.call_history[0]
+        assert server.call_history[0]["tools"] == basic_tools
+
+    @pytest.mark.asyncio
+    async def test_extra_body_forwarded(self, basic_tools, valid_names):
+        """extra_body should be forwarded to server."""
+        extra = {"provider": {"ignore": ["DeepInfra"]}}
+        server = MockServer([make_text_response("OK")])
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=basic_tools,
+            valid_tool_names=valid_names,
+            max_turns=10,
+            extra_body=extra,
+        )
+        messages = [{"role": "user", "content": "Hi"}]
+        await agent.run(messages)
+
+        assert server.call_history[0].get("extra_body") == extra
+
+    @pytest.mark.asyncio
+    async def test_managed_state_returned(self, basic_tools, valid_names):
+        """If server has get_state(), result should include managed_state."""
+        server = MockServer([make_text_response("OK")])
+        server.get_state = lambda: {"nodes": [{"test": True}]}
+
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=basic_tools,
+            valid_tool_names=valid_names,
+            max_turns=10,
+        )
+        messages = [{"role": "user", "content": "Hi"}]
+        result = await agent.run(messages)
+
+        assert result.managed_state is not None
+        assert "nodes" in result.managed_state
+
+    @pytest.mark.asyncio
+    async def test_no_managed_state_without_get_state(self, basic_tools, valid_names):
+        """Regular server without get_state() should return None managed_state."""
+        server = MockServer([make_text_response("OK")])
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=basic_tools,
+            valid_tool_names=valid_names,
+            max_turns=10,
+        )
+        messages = [{"role": "user", "content": "Hi"}]
+        result = await agent.run(messages)
+
+        assert result.managed_state is None
+
+    @pytest.mark.asyncio
+    async def test_memory_tool_blocked(self, basic_tools):
+        """Memory tool should return error in RL environments."""
+        valid = {"terminal", "read_file", "todo", "memory"}
+        server = MockServer([
+            make_tool_response("memory", {"action": "add", "target": "user", "content": "test"}),
+            make_text_response("Done"),
+        ])
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=basic_tools,
+            valid_tool_names=valid,
+            max_turns=10,
+        )
+        messages = [{"role": "user", "content": "Remember this"}]
+        result = await agent.run(messages)
+
+        # Find the tool response
+        tool_msgs = [m for m in result.messages if m["role"] == "tool"]
+        assert len(tool_msgs) >= 1
+        tool_result = json.loads(tool_msgs[0]["content"])
+        assert "error" in tool_result
+        assert "not available" in tool_result["error"].lower()
+
+    @pytest.mark.asyncio
+    async def test_session_search_blocked(self, basic_tools):
+        """session_search should return error in RL environments."""
+        valid = {"terminal", "read_file", "todo", "session_search"}
+        server = MockServer([
+            make_tool_response("session_search", {"query": "test"}),
+            make_text_response("Done"),
+        ])
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=basic_tools,
+            valid_tool_names=valid,
+            max_turns=10,
+        )
+        messages = [{"role": "user", "content": "Search sessions"}]
+        result = await agent.run(messages)
+
+        tool_msgs = [m for m in result.messages if m["role"] == "tool"]
+        assert len(tool_msgs) >= 1
+        tool_result = json.loads(tool_msgs[0]["content"])
+        assert "error" in tool_result
+
+    @pytest.mark.asyncio
+    async def test_reasoning_content_preserved(self, basic_tools, valid_names):
+        """Reasoning content should be extracted and preserved."""
+        resp = MockChatCompletion(
+            choices=[
+                MockChoice(
+                    message=MockMessage(
+                        content="The answer is 42.",
+                        reasoning_content="Let me think about this step by step...",
+                    )
+                )
+            ]
+        )
+        server = MockServer([resp])
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=basic_tools,
+            valid_tool_names=valid_names,
+            max_turns=10,
+        )
+        messages = [{"role": "user", "content": "What is the meaning of life?"}]
+        result = await agent.run(messages)
+
+        assert len(result.reasoning_per_turn) == 1
+        assert result.reasoning_per_turn[0] == "Let me think about this step by step..."
+
+
+class TestResizeToolPool:
+    def test_resize_works(self):
+        """resize_tool_pool should not raise."""
+        resize_tool_pool(16)  # Small pool for testing
+        resize_tool_pool(128)  # Restore default
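The `TestExtractReasoning` cases imply a lookup order: `reasoning_content` first, then `reasoning`, then `reasoning_details` (entries given as objects or dicts). A minimal sketch consistent with those tests follows; `extract_reasoning` is a hypothetical stand-in, not the code in `environments/agent_loop.py`, and returning the first detail entry that carries text is an assumption.

```python
from types import SimpleNamespace
from typing import Any, Optional


def extract_reasoning(message: Any) -> Optional[str]:
    """Return reasoning text from a message, or None if there is none."""
    for field in ("reasoning_content", "reasoning"):
        value = getattr(message, field, None)
        if value:
            return value

    for detail in getattr(message, "reasoning_details", None) or []:
        # Details may be plain dicts or objects with a `text` attribute.
        if isinstance(detail, dict):
            text = detail.get("text")
        else:
            text = getattr(detail, "text", None)
        if text:
            return text
    return None


both = SimpleNamespace(reasoning_content="First", reasoning="Second")
as_dict = SimpleNamespace(reasoning_details=[{"text": "Dict reasoning"}])
plain = SimpleNamespace(content="hello")
```

Here `extract_reasoning(both)` picks the `reasoning_content` value, `extract_reasoning(as_dict)` reads the dict entry, and `extract_reasoning(plain)` yields `None`.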
@@ -0,0 +1,552 @@
+"""Integration tests for HermesAgentLoop tool calling.
+
+Tests the full agent loop with real LLM calls via OpenRouter.
+Uses stepfun/step-3.5-flash:free by default (zero cost), then falls back to
+google/gemini-2.0-flash-001 and anthropic/claude-sonnet-4 when rate limited.
+
+These tests verify:
+1. Single tool call: model calls a tool, gets result, responds
+2. Multi-tool call: model calls multiple tools in one turn
+3. Multi-turn: model calls tools across multiple turns
+4. Unknown tool rejection: model calling a non-existent tool gets an error
+5. Max turns: loop stops when max_turns is reached
+6. No tools: model responds without calling any tools
+7. Tool error handling: tool execution errors are captured
+
+Run:
+    pytest tests/test_agent_loop_tool_calling.py -v
+    pytest tests/test_agent_loop_tool_calling.py -v -k "single"  # run one test
+"""
+
+import asyncio
+import json
+import os
+import sys
+from pathlib import Path
+from typing import Any, Dict, List, Set
+from unittest.mock import patch
+
+import pytest
+
+pytestmark = pytest.mark.skip(reason="Live API integration test — hangs in batch runs")
+
+# Ensure repo root is importable
+_repo_root = Path(__file__).resolve().parent.parent
+if str(_repo_root) not in sys.path:
+    sys.path.insert(0, str(_repo_root))
+
+try:
+    from environments.agent_loop import AgentResult, HermesAgentLoop
+    from atroposlib.envs.server_handling.openai_server import OpenAIServer  # noqa: F401
+except ImportError:
+    pytest.skip("atroposlib not installed", allow_module_level=True)
+
+
+# =========================================================================
+# Test infrastructure
+# =========================================================================
+
+# Models to try, in order of preference (free first)
+_MODELS = [
+    "stepfun/step-3.5-flash:free",
+    "google/gemini-2.0-flash-001",
+    "anthropic/claude-sonnet-4",
+]
+
+def _get_api_key():
+    key = os.getenv("OPENROUTER_API_KEY", "")
+    if not key:
+        pytest.skip("OPENROUTER_API_KEY not set")
+    return key
+
+
+def _make_server(model: str = None):
+    """Create an OpenAI server for testing."""
+    from atroposlib.envs.server_handling.openai_server import OpenAIServer
+    from atroposlib.envs.server_handling.server_manager import APIServerConfig
+
+    config = APIServerConfig(
+        base_url="https://openrouter.ai/api/v1",
+        model_name=model or _MODELS[0],
+        server_type="openai",
+        api_key=_get_api_key(),
+        health_check=False,
+    )
+    return OpenAIServer(config)
+
+
+async def _try_models(test_fn):
+    """Try each model in order; move to the next only on a rate-limit error."""
+    last_error = None
+    for model in _MODELS:
+        try:
+            server = _make_server(model)
+            return await test_fn(server, model)
+        except Exception as e:
+            last_error = e
+            if "rate" in str(e).lower() or "limit" in str(e).lower():
+                continue  # Rate limited, try next model
+            raise  # Real error
+    pytest.skip(f"All models failed. Last error: {last_error}")
+
+
+# =========================================================================
+# Fake tools for testing
+# =========================================================================
+
+# Simple calculator tool
+CALC_TOOL = {
+    "type": "function",
+    "function": {
+        "name": "calculate",
+        "description": "Calculate a math expression. Returns the numeric result.",
+        "parameters": {
+            "type": "object",
+            "properties": {
+                "expression": {
+                    "type": "string",
+                    "description": "Math expression to evaluate, e.g. '2 + 3'"
+                }
+            },
+            "required": ["expression"],
+        },
+    },
+}
+
+# Weather lookup tool
+WEATHER_TOOL = {
+    "type": "function",
+    "function": {
+        "name": "get_weather",
+        "description": "Get the current weather for a city. Returns temperature and conditions.",
+        "parameters": {
+            "type": "object",
+            "properties": {
+                "city": {
+                    "type": "string",
+                    "description": "City name, e.g. 'Tokyo'"
+                }
+            },
+            "required": ["city"],
+        },
+    },
+}
+
+# Lookup tool (always succeeds)
+LOOKUP_TOOL = {
+    "type": "function",
+    "function": {
+        "name": "lookup",
+        "description": "Look up a fact. Returns a short answer string.",
+        "parameters": {
+            "type": "object",
+            "properties": {
+                "query": {
+                    "type": "string",
+                    "description": "What to look up"
+                }
+            },
+            "required": ["query"],
+        },
+    },
+}
+
+# Error tool (always fails)
+ERROR_TOOL = {
+    "type": "function",
+    "function": {
+        "name": "failing_tool",
+        "description": "A tool that always fails with an error.",
+        "parameters": {
+            "type": "object",
+            "properties": {
+                "input": {"type": "string"}
+            },
+            "required": ["input"],
+        },
+    },
+}
+
+
+def _fake_tool_handler(tool_name: str, args: Dict[str, Any], **kwargs) -> str:
+    """Handle fake tool calls for testing."""
+    if tool_name == "calculate":
+        expr = args.get("expression", "0")
+        try:
+            # Restricted eval for simple math (test-only; empty builtins are not a real sandbox)
+            result = eval(expr, {"__builtins__": {}}, {})
+            return json.dumps({"result": result})
+        except Exception as e:
+            return json.dumps({"error": str(e)})
+
+    elif tool_name == "get_weather":
+        city = args.get("city", "Unknown")
+        # Return canned weather
+        return json.dumps({
+            "city": city,
+            "temperature": 22,
+            "conditions": "sunny",
+            "humidity": 45,
+        })
+
+    elif tool_name == "lookup":
+        query = args.get("query", "")
+        return json.dumps({"answer": f"The answer to '{query}' is 42."})
+
+    elif tool_name == "failing_tool":
+        raise RuntimeError("This tool always fails!")
+
+    return json.dumps({"error": f"Unknown tool: {tool_name}"})
+
+
+# =========================================================================
+# Tests
+# =========================================================================
+
+@pytest.mark.asyncio
+async def test_single_tool_call():
+    """Model should call a single tool, get the result, and respond."""
+
+    async def _run(server, model):
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=[WEATHER_TOOL],
+            valid_tool_names={"get_weather"},
+            max_turns=5,
+            temperature=0.0,
+            max_tokens=500,
+        )
+
+        messages = [
+            {"role": "user", "content": "What's the weather in Tokyo? Use the get_weather tool."},
+        ]
+
+        with patch("environments.agent_loop.handle_function_call", side_effect=_fake_tool_handler):
+            result = await agent.run(messages)
+
+        assert isinstance(result, AgentResult)
+        assert result.turns_used >= 2, f"Expected at least 2 turns (tool call + response), got {result.turns_used}"
+
+        # Verify a tool call happened
+        tool_calls_found = False
+        for msg in result.messages:
+            if msg.get("role") == "assistant" and msg.get("tool_calls"):
+                for tc in msg["tool_calls"]:
+                    if tc["function"]["name"] == "get_weather":
+                        tool_calls_found = True
+                        args = json.loads(tc["function"]["arguments"])
+                        assert "city" in args
+        assert tool_calls_found, "Model should have called get_weather"
+
+        # Verify tool result is in conversation
+        tool_results = [m for m in result.messages if m.get("role") == "tool"]
+        assert len(tool_results) >= 1, "Should have at least one tool result"
+
+        # Verify the final message is a non-empty assistant response
+        final_msg = result.messages[-1]
+        assert final_msg["role"] == "assistant"
+        assert final_msg["content"], "Final response should have content"
+
+        return result
+
+    await _try_models(_run)
+
+
+@pytest.mark.asyncio
+async def test_multi_tool_single_turn():
+    """Model should call multiple tools, ideally in a single turn."""
+
+    async def _run(server, model):
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=[WEATHER_TOOL, CALC_TOOL],
+            valid_tool_names={"get_weather", "calculate"},
+            max_turns=5,
+            temperature=0.0,
+            max_tokens=500,
+        )
+
+        messages = [
+            {"role": "user", "content": (
+                "I need two things at once: "
+                "1) What's the weather in Paris? Use get_weather. "
+                "2) What is 15 * 7? Use calculate. "
+                "Call BOTH tools in a single response."
+            )},
+        ]
+
+        with patch("environments.agent_loop.handle_function_call", side_effect=_fake_tool_handler):
+            result = await agent.run(messages)
+
+        # Count distinct tools called
+        tools_called = set()
+        for msg in result.messages:
+            if msg.get("role") == "assistant" and msg.get("tool_calls"):
+                for tc in msg["tool_calls"]:
+                    tools_called.add(tc["function"]["name"])
+
+        # At minimum, both tools should have been called (maybe in different turns)
+        assert "get_weather" in tools_called, f"get_weather not called. Called: {tools_called}"
+        assert "calculate" in tools_called, f"calculate not called. Called: {tools_called}"
+
+        return result
+
+    await _try_models(_run)
+
+
+@pytest.mark.asyncio
+async def test_multi_turn_conversation():
+    """Agent should handle multiple turns of tool calls."""
+
+    async def _run(server, model):
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=[LOOKUP_TOOL, CALC_TOOL],
+            valid_tool_names={"lookup", "calculate"},
+            max_turns=10,
+            temperature=0.0,
+            max_tokens=500,
+        )
+
+        messages = [
+            {"role": "user", "content": (
+                "First, use the lookup tool to look up 'meaning of life'. "
+                "Then use calculate to compute 6 * 7. "
+                "Do these in separate tool calls, one at a time."
+            )},
+        ]
+
+        with patch("environments.agent_loop.handle_function_call", side_effect=_fake_tool_handler):
+            result = await agent.run(messages)
+
+        # Should have used both tools
+        tools_called = set()
+        for msg in result.messages:
+            if msg.get("role") == "assistant" and msg.get("tool_calls"):
+                for tc in msg["tool_calls"]:
+                    tools_called.add(tc["function"]["name"])
+
+        assert "lookup" in tools_called, f"lookup not called. Called: {tools_called}"
+        assert "calculate" in tools_called, f"calculate not called. Called: {tools_called}"
+
+        # Should finish naturally
+        assert result.finished_naturally, "Should finish naturally after answering"
+
+        return result
+
+    await _try_models(_run)
+
+
+@pytest.mark.asyncio
+async def test_unknown_tool_rejected():
+    """If the model calls a tool not in valid_tool_names, it gets an error."""
+
+    async def _run(server, model):
+        # Only allow "calculate" but give schema for both
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=[CALC_TOOL, WEATHER_TOOL],
+            valid_tool_names={"calculate"},  # weather NOT allowed
+            max_turns=5,
+            temperature=0.0,
+            max_tokens=500,
+        )
+
+        messages = [
+            {"role": "user", "content": "What's the weather in London? Use get_weather."},
+        ]
+
+        with patch("environments.agent_loop.handle_function_call", side_effect=_fake_tool_handler):
+            result = await agent.run(messages)
+
+        # Check if get_weather was called and rejected
+        if result.tool_errors:
+            weather_errors = [e for e in result.tool_errors if e.tool_name == "get_weather"]
+            assert len(weather_errors) > 0, "get_weather should have been rejected"
+            assert "Unknown tool" in weather_errors[0].error
+
+        return result
+
+    await _try_models(_run)
+
+
+@pytest.mark.asyncio
+async def test_max_turns_limit():
+    """Agent should stop after max_turns even if model keeps calling tools."""
+
+    async def _run(server, model):
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=[LOOKUP_TOOL],
+            valid_tool_names={"lookup"},
+            max_turns=2,  # Very low limit
+            temperature=0.0,
+            max_tokens=500,
+        )
+
+        messages = [
+            {"role": "user", "content": (
+                "Keep looking up facts. Look up 'fact 1', then 'fact 2', "
+                "then 'fact 3', then 'fact 4'. Do them one at a time."
+            )},
+        ]
+
+        with patch("environments.agent_loop.handle_function_call", side_effect=_fake_tool_handler):
+            result = await agent.run(messages)
+
+        assert result.turns_used <= 2, f"Should stop at max_turns=2, used {result.turns_used}"
+        assert not result.finished_naturally, "Should NOT finish naturally (hit max_turns)"
+
+        return result
+
+    await _try_models(_run)
+
+
+@pytest.mark.asyncio
+async def test_no_tools_direct_response():
+    """When no tools are useful, model should respond directly."""
+
+    async def _run(server, model):
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=[WEATHER_TOOL],
+            valid_tool_names={"get_weather"},
+            max_turns=5,
+            temperature=0.0,
+            max_tokens=200,
+        )
+
+        messages = [
+            {"role": "user", "content": "What is 2 + 2? Just answer directly, no tools needed."},
+        ]
+
+        with patch("environments.agent_loop.handle_function_call", side_effect=_fake_tool_handler):
+            result = await agent.run(messages)
+
+        assert result.finished_naturally, "Should finish naturally with a direct response"
+        assert result.turns_used == 1, f"Should take exactly 1 turn for a direct answer, took {result.turns_used}"
+
+        final = result.messages[-1]
+        assert final["role"] == "assistant"
+        assert final["content"], "Should have text content"
+        assert "4" in final["content"], "Should contain the answer '4'"
+
+        return result
+
+    await _try_models(_run)
+
+
+@pytest.mark.asyncio
+async def test_tool_error_handling():
+    """Tool execution errors should be captured and reported to the model."""
+
+    async def _run(server, model):
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=[ERROR_TOOL],
+            valid_tool_names={"failing_tool"},
+            max_turns=5,
+            temperature=0.0,
+            max_tokens=500,
+        )
+
+        messages = [
+            {"role": "user", "content": "Please call the failing_tool with input 'test'."},
+        ]
+
+        with patch("environments.agent_loop.handle_function_call", side_effect=_fake_tool_handler):
+            result = await agent.run(messages)
+
+        # The tool error should be recorded
+        assert len(result.tool_errors) >= 1, "Should have at least one tool error"
+        assert "RuntimeError" in result.tool_errors[0].error or "always fails" in result.tool_errors[0].error
+
+        # The error should be in the conversation as a tool result
+        tool_results = [m for m in result.messages if m.get("role") == "tool"]
+        assert len(tool_results) >= 1
+        error_result = json.loads(tool_results[0]["content"])
+        assert "error" in error_result
+
+        return result
+
+    await _try_models(_run)
+
+
+@pytest.mark.asyncio
+async def test_agent_result_structure():
+    """Verify the AgentResult has all expected fields populated."""
+
+    async def _run(server, model):
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=[CALC_TOOL],
+            valid_tool_names={"calculate"},
+            max_turns=5,
+            temperature=0.0,
+            max_tokens=300,
+        )
+
+        messages = [
+            {"role": "user", "content": "What is 3 + 4? Use the calculate tool."},
+        ]
+
+        with patch("environments.agent_loop.handle_function_call", side_effect=_fake_tool_handler):
+            result = await agent.run(messages)
+
+        # Structural checks
+        assert isinstance(result, AgentResult)
+        assert isinstance(result.messages, list)
+        assert len(result.messages) >= 3, "Should have at least user + assistant(tool call) + tool result"
+        assert isinstance(result.turns_used, int)
+        assert result.turns_used > 0
+        assert isinstance(result.finished_naturally, bool)
+        assert isinstance(result.tool_errors, list)
+        assert isinstance(result.reasoning_per_turn, list)
+
+        # Messages should follow OpenAI format
+        for msg in result.messages:
+            assert "role" in msg, f"Message missing 'role': {msg}"
+            assert msg["role"] in ("system", "user", "assistant", "tool"), f"Invalid role: {msg['role']}"
+
+        return result
+
+    await _try_models(_run)
+
+
+@pytest.mark.asyncio
+async def test_conversation_history_preserved():
+    """The full conversation history should be in result.messages."""
+
+    async def _run(server, model):
+        agent = HermesAgentLoop(
+            server=server,
+            tool_schemas=[WEATHER_TOOL],
+            valid_tool_names={"get_weather"},
+            max_turns=5,
+            temperature=0.0,
+            max_tokens=500,
+        )
+
+        messages = [
+            {"role": "system", "content": "You are a helpful weather assistant."},
+            {"role": "user", "content": "What's the weather in Berlin? Use get_weather."},
+        ]
+
+        with patch("environments.agent_loop.handle_function_call", side_effect=_fake_tool_handler):
+            result = await agent.run(messages)
+
+        # System message should be preserved
+        assert result.messages[0]["role"] == "system"
+        assert "weather assistant" in result.messages[0]["content"]
+
+        # User message should be preserved
+        assert result.messages[1]["role"] == "user"
+        assert "Berlin" in result.messages[1]["content"]
+
+        # Conversation should contain at least one tool result
+        roles = [m["role"] for m in result.messages]
+        assert "tool" in roles, "Should have tool results in conversation"
+
+        return result
+
+    await _try_models(_run)
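Several of the tests above repeat the same loop to gather the tool names requested by assistant messages. A minimal sketch of how that loop could be factored out follows. The helper name `collect_tool_names` and the sample messages are hypothetical and not part of this change; they assume only the OpenAI-style message shape the tests already rely on.

```python
from typing import Any, Dict, List, Set


def collect_tool_names(messages: List[Dict[str, Any]]) -> Set[str]:
    """Return the names of all tools requested by assistant messages."""
    names: Set[str] = set()
    for msg in messages:
        if msg.get("role") != "assistant":
            continue
        # tool_calls may be missing or None on plain text replies
        for tc in msg.get("tool_calls") or []:
            names.add(tc["function"]["name"])
    return names


# Hypothetical conversation in the same shape the tests inspect
example_messages = [
    {"role": "user", "content": "Weather in Paris and 15 * 7?"},
    {"role": "assistant", "content": None, "tool_calls": [
        {"function": {"name": "get_weather", "arguments": '{"city": "Paris"}'}},
        {"function": {"name": "calculate", "arguments": '{"expression": "15 * 7"}'}},
    ]},
    {"role": "tool", "content": '{"result": 105}'},
    {"role": "assistant", "content": "Sunny, and 15 * 7 = 105."},
]
called = collect_tool_names(example_messages)
```

With such a helper, an assertion like `assert "calculate" in collect_tool_names(result.messages)` would replace the nested loops in each test.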
@@ -0,0 +1,359 @@
+"""Integration tests for HermesAgentLoop with a local vLLM server.
+
+Tests the full Phase 2 flow: ManagedServer + tool calling with a real
+vLLM backend, producing actual token IDs and logprobs for RL training.
+
+Requires a running vLLM server. Start one from the atropos directory:
+
+    python -m example_trainer.vllm_api_server \
+        --model Qwen/Qwen3-4B-Thinking-2507 \
+        --port 9001 \
+        --gpu-memory-utilization 0.8 \
+        --max-model-len=32000
+
+Tests are automatically skipped if the server is not reachable.
+
+Run:
+    pytest tests/test_agent_loop_vllm.py -v
+    pytest tests/test_agent_loop_vllm.py -v -k "single"
+"""
+
+import asyncio
+import json
+import os
+import sys
+from pathlib import Path
+from typing import Any, Dict
+from unittest.mock import patch
+
+import pytest
+import requests
+
+# Ensure repo root is importable
+_repo_root = Path(__file__).resolve().parent.parent
+if str(_repo_root) not in sys.path:
+    sys.path.insert(0, str(_repo_root))
+
+try:
+    from environments.agent_loop import AgentResult, HermesAgentLoop
+except ImportError:
+    pytest.skip("atroposlib not installed", allow_module_level=True)
+
+
+# =========================================================================
+# Configuration
+# =========================================================================
+
+VLLM_HOST = "localhost"
+VLLM_PORT = 9001
+VLLM_BASE_URL = f"http://{VLLM_HOST}:{VLLM_PORT}"
+VLLM_MODEL = "Qwen/Qwen3-4B-Thinking-2507"
+
+
+def _vllm_is_running() -> bool:
+    """Check if the vLLM server is reachable."""
+    try:
+        r = requests.get(f"{VLLM_BASE_URL}/health", timeout=3)
+        return r.status_code == 200
+    except Exception:
+        return False
+
+
+# Skip all tests in this module if vLLM is not running
+pytestmark = pytest.mark.skipif(
+    not _vllm_is_running(),
+    reason=(
+        f"vLLM server not reachable at {VLLM_BASE_URL}. "
+        "Start it with: python -m example_trainer.vllm_api_server "
+        f"--model {VLLM_MODEL} --port {VLLM_PORT} "
+        "--gpu-memory-utilization 0.8 --max-model-len=32000"
+    ),
+)
+
+
+# =========================================================================
+# Server setup
+# =========================================================================
+
+def _make_server_manager():
+    """Create a ServerManager pointing to the local vLLM server."""
+    from atroposlib.envs.server_handling.server_manager import (
+        ServerManager,
+        APIServerConfig,
+    )
+
+    config = APIServerConfig(
+        base_url=VLLM_BASE_URL,
+        model_name=VLLM_MODEL,
+        server_type="vllm",
+        health_check=False,
+    )
+    sm = ServerManager([config], tool_parser="hermes")
+    sm.servers[0].server_healthy = True
+    return sm
+
+
+def _get_tokenizer():
+    """Load the tokenizer for the model."""
+    from transformers import AutoTokenizer
+    return AutoTokenizer.from_pretrained(VLLM_MODEL)
+
+
+# =========================================================================
+# Fake tools
+# =========================================================================
+
+WEATHER_TOOL = {
+    "type": "function",
+    "function": {
+        "name": "get_weather",
+        "description": "Get the current weather for a city. Returns temperature and conditions.",
+        "parameters": {
+            "type": "object",
+            "properties": {
+                "city": {
+                    "type": "string",
+                    "description": "City name, e.g. 'Tokyo'",
+                }
+            },
+            "required": ["city"],
+        },
+    },
+}
+
+CALC_TOOL = {
+    "type": "function",
+    "function": {
+        "name": "calculate",
+        "description": "Calculate a math expression. Returns the numeric result.",
+        "parameters": {
+            "type": "object",
+            "properties": {
+                "expression": {
+                    "type": "string",
+                    "description": "Math expression, e.g. '2 + 3'",
+                }
+            },
+            "required": ["expression"],
+        },
+    },
+}
+
+
+def _fake_tool_handler(tool_name: str, args: Dict[str, Any], **kwargs) -> str:
+    """Handle fake tool calls for testing."""
+    if tool_name == "get_weather":
+        city = args.get("city", "Unknown")
+        return json.dumps({
+            "city": city,
+            "temperature": 22,
+            "conditions": "sunny",
+            "humidity": 45,
+        })
+    elif tool_name == "calculate":
+        expr = args.get("expression", "0")
+        try:
+            result = eval(expr, {"__builtins__": {}}, {})
+            return json.dumps({"result": result})
+        except Exception as e:
+            return json.dumps({"error": str(e)})
+    return json.dumps({"error": f"Unknown tool: {tool_name}"})
+
+
+# =========================================================================
+# Tests
+# =========================================================================
+
+@pytest.mark.asyncio
+async def test_vllm_single_tool_call():
+    """vLLM model calls a tool, gets result, responds — full Phase 2 flow."""
+    sm = _make_server_manager()
+    tokenizer = _get_tokenizer()
+
+    async with sm.managed_server(tokenizer=tokenizer) as managed:
+        agent = HermesAgentLoop(
+            server=managed,
+            tool_schemas=[WEATHER_TOOL],
+            valid_tool_names={"get_weather"},
+            max_turns=5,
+            temperature=0.6,
+            max_tokens=1000,
+        )
+
+        messages = [
+            {"role": "user", "content": "What's the weather in Tokyo? Use the get_weather tool."},
+        ]
+
+        with patch("environments.agent_loop.handle_function_call", side_effect=_fake_tool_handler):
+            result = await agent.run(messages)
+
+    assert isinstance(result, AgentResult)
+    assert result.turns_used >= 2, f"Expected at least 2 turns, got {result.turns_used}"
+
+    # Verify tool call happened
+    tool_calls_found = False
+    for msg in result.messages:
+        if msg.get("role") == "assistant" and msg.get("tool_calls"):
+            for tc in msg["tool_calls"]:
+                if tc["function"]["name"] == "get_weather":
+                    tool_calls_found = True
+                    args = json.loads(tc["function"]["arguments"])
+                    assert "city" in args
+    assert tool_calls_found, "Model should have called get_weather"
+
+    # Verify tool results in conversation
+    tool_results = [m for m in result.messages if m.get("role") == "tool"]
+    assert len(tool_results) >= 1
+
+
+@pytest.mark.asyncio
+async def test_vllm_multi_tool_calls():
+    """vLLM model calls multiple tools across turns."""
+    sm = _make_server_manager()
+    tokenizer = _get_tokenizer()
+
+    async with sm.managed_server(tokenizer=tokenizer) as managed:
+        agent = HermesAgentLoop(
+            server=managed,
+            tool_schemas=[WEATHER_TOOL, CALC_TOOL],
+            valid_tool_names={"get_weather", "calculate"},
+            max_turns=10,
+            temperature=0.6,
+            max_tokens=1000,
+        )
+
+        messages = [
+            {"role": "user", "content": (
+                "I need two things: "
+                "1) What's the weather in Paris? Use get_weather. "
+                "2) What is 15 * 7? Use calculate."
+            )},
+        ]
+
+        with patch("environments.agent_loop.handle_function_call", side_effect=_fake_tool_handler):
+            result = await agent.run(messages)
+
+    # Both tools should be called
+    tools_called = set()
+    for msg in result.messages:
+        if msg.get("role") == "assistant" and msg.get("tool_calls"):
+            for tc in msg["tool_calls"]:
+                tools_called.add(tc["function"]["name"])
+
+    assert "get_weather" in tools_called, f"get_weather not called. Called: {tools_called}"
+    assert "calculate" in tools_called, f"calculate not called. Called: {tools_called}"
+
+
+@pytest.mark.asyncio
+async def test_vllm_managed_server_produces_nodes():
+    """ManagedServer should produce SequenceNodes with tokens and logprobs."""
+    sm = _make_server_manager()
+    tokenizer = _get_tokenizer()
+
+    async with sm.managed_server(tokenizer=tokenizer) as managed:
+        agent = HermesAgentLoop(
+            server=managed,
+            tool_schemas=[WEATHER_TOOL],
+            valid_tool_names={"get_weather"},
+            max_turns=5,
+            temperature=0.6,
+            max_tokens=1000,
+        )
+
+        messages = [
+            {"role": "user", "content": "What's the weather in Berlin? Use get_weather."},
+        ]
+
+        with patch("environments.agent_loop.handle_function_call", side_effect=_fake_tool_handler):
+            result = await agent.run(messages)
+
+        # Get the managed state — should have SequenceNodes
+        state = managed.get_state()
+
+    assert state is not None, "ManagedServer should return state"
+    nodes = state.get("nodes", [])
+    assert len(nodes) >= 1, f"Should have at least 1 node, got {len(nodes)}"
+
+    node = nodes[0]
+    assert hasattr(node, "tokens"), "Node should have tokens"
+    assert hasattr(node, "logprobs"), "Node should have logprobs"
+    assert len(node.tokens) > 0, "Tokens should not be empty"
+    assert len(node.logprobs) > 0, "Logprobs should not be empty"
+    assert len(node.tokens) == len(node.logprobs), (
+        f"Tokens ({len(node.tokens)}) and logprobs ({len(node.logprobs)}) should have same length"
+    )
+
+
+@pytest.mark.asyncio
+async def test_vllm_no_tools_direct_response():
+    """vLLM model should respond directly when no tools are needed."""
+    sm = _make_server_manager()
+    tokenizer = _get_tokenizer()
+
+    async with sm.managed_server(tokenizer=tokenizer) as managed:
+        agent = HermesAgentLoop(
+            server=managed,
+            tool_schemas=[WEATHER_TOOL],
+            valid_tool_names={"get_weather"},
+            max_turns=5,
+            temperature=0.6,
+            max_tokens=500,
+        )
+
+        messages = [
+            {"role": "user", "content": "What is 2 + 2? Answer directly, no tools."},
+        ]
+
+        with patch("environments.agent_loop.handle_function_call", side_effect=_fake_tool_handler):
+            result = await agent.run(messages)
+
+    assert result.finished_naturally, "Should finish naturally"
+    assert result.turns_used == 1, f"Should take 1 turn, took {result.turns_used}"
+
+    final = result.messages[-1]
+    assert final["role"] == "assistant"
+    assert final["content"], "Should have content"
+
+
+@pytest.mark.asyncio
+async def test_vllm_thinking_content_extracted():
+    """Qwen3-Thinking model should produce reasoning content."""
+    sm = _make_server_manager()
+    tokenizer = _get_tokenizer()
+
+    async with sm.managed_server(
+        tokenizer=tokenizer,
+        preserve_think_blocks=True,
+    ) as managed:
+        agent = HermesAgentLoop(
+            server=managed,
+            tool_schemas=[CALC_TOOL],
+            valid_tool_names={"calculate"},
+            max_turns=5,
+            temperature=0.6,
+            max_tokens=1000,
+        )
+
+        messages = [
+            {"role": "user", "content": "What is 123 * 456? Use the calculate tool."},
+        ]
+
+        with patch("environments.agent_loop.handle_function_call", side_effect=_fake_tool_handler):
+            result = await agent.run(messages)
+
+    # Qwen3-Thinking should generate <think> blocks
+    # Check if any content contains thinking markers
+    has_thinking = False
+    for msg in result.messages:
+        content = msg.get("content", "") or ""
+        if "<think>" in content or "</think>" in content:
+            has_thinking = True
+            break
+
+    # Also check reasoning_per_turn
+    has_reasoning = any(r for r in result.reasoning_per_turn if r)
+
+    # At least one of these should be true for a thinking model
+    assert has_thinking or has_reasoning, (
+        "Qwen3-Thinking should produce <think> blocks or reasoning content"
+    )
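Both fake tool handlers evaluate the `calculate` expression with `eval` and empty builtins, which restricts name lookup but is not a real sandbox. If a stricter fake is wanted, one option is to walk the parsed AST and allow only arithmetic. The sketch below is an assumption, not what this change does: `safe_calculate` is a hypothetical name, and it supports only `+`, `-`, `*`, `/` over numeric literals.

```python
import ast
import operator

# Whitelisted operators; anything else is rejected
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calculate(expr: str):
    """Evaluate +, -, *, / over numeric literals; reject everything else."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attribute access, etc. all end up here
        raise ValueError(f"Unsupported expression: {expr!r}")

    return _eval(ast.parse(expr, mode="eval"))
```

A handler could call `safe_calculate(expr)` inside its existing `try`/`except` so that rejected expressions still come back as a JSON `{"error": ...}` payload.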